python - How to perform a function between specific columns of multi-indexed data using Pandas
How can I calculate the correlation between the 1st column (a) of us and the 1st column (d) of jp, and then extend this by creating a loop that calculates the correlation between (b, e) and (c, f), as defined in the desired output?
Sample input:

    import numpy as np
    import pandas as pd

    columns = pd.MultiIndex.from_arrays(
        [['us', 'us', 'us', 'jp', 'jp', 'jp'],
         ['a', 'b', 'c', 'd', 'e', 'f']],
        names=['cty', 'tenor'])
    hier_df = pd.DataFrame(np.random.randn(12, 6), columns=columns)
    hier_df
Desired output:

    a  d  0.8
    b  e  0
    c  f  0.2
If you want to use a loop, you can use zip to iterate over the two sub-frames in parallel:
    data = []
    # iterating over a DataFrame yields its column labels
    for col1, col2 in zip(hier_df['us'], hier_df['jp']):
        data.append((col1, col2, hier_df['us'][col1].corr(hier_df['jp'][col2])))
    data = pd.DataFrame(data)
    # write a tab-separated file
    data.to_csv(filename, sep='\t', index=False, header=False)

File contents:

    a  d  0.130997264133
    b  e  0.740703734042
    c  f  0.033917870807
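The same loop can be wrapped in a small helper so the pairing rule is explicit. This is only a sketch: the function name `pairwise_corr`, the output column names, and the fixed random seed are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def pairwise_corr(left, right):
    """Correlate the i-th column of `left` with the i-th column of `right`.

    Hypothetical helper: columns are paired by position, not by label.
    """
    if left.shape[1] != right.shape[1]:
        raise ValueError("both frames need the same number of columns")
    rows = [(lc, rc, left[lc].corr(right[rc]))
            for lc, rc in zip(left.columns, right.columns)]
    return pd.DataFrame(rows, columns=['left', 'right', 'corr'])


# Sample data shaped like the question's; the seed is fixed only so reruns match.
rng = np.random.default_rng(42)
columns = pd.MultiIndex.from_arrays(
    [['us', 'us', 'us', 'jp', 'jp', 'jp'],
     ['a', 'b', 'c', 'd', 'e', 'f']],
    names=['cty', 'tenor'])
hier_df = pd.DataFrame(rng.standard_normal((12, 6)), columns=columns)

result = pairwise_corr(hier_df['us'], hier_df['jp'])
print(result)
```

Returning a DataFrame with named columns keeps the pairs readable, and the result can still be written out with `to_csv` as in the loop above.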
A more efficient way of doing this is to use corrwith. It requires the column names of the two frames to be identical, though, so the jp columns are renamed first:
    hier_df['us'].corrwith(hier_df['jp'].rename(columns={'d': 'a', 'e': 'b', 'f': 'c'}))

    Out:
    tenor
    a    0.130997
    b    0.740704
    c    0.033918
    dtype: float64
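If the rename dictionary is tedious to write by hand, it can probably be built from the column positions instead. The sketch below assumes both sub-frames have the same number of columns in the intended pairing order. The variable names and the fixed seed are illustrative, and the last lines simply check that corrwith agrees with the pair-by-pair loop.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's; the seed is an assumption for repeatability.
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_arrays(
    [['us', 'us', 'us', 'jp', 'jp', 'jp'],
     ['a', 'b', 'c', 'd', 'e', 'f']],
    names=['cty', 'tenor'])
hier_df = pd.DataFrame(rng.standard_normal((12, 6)), columns=columns)

us, jp = hier_df['us'], hier_df['jp']

# Map jp's labels onto us's labels by position: d -> a, e -> b, f -> c.
mapping = dict(zip(jp.columns, us.columns))
by_corrwith = us.corrwith(jp.rename(columns=mapping))

# The same pairs computed one at a time, for comparison.
by_loop = pd.Series(
    {u: us[u].corr(jp[j]) for u, j in zip(us.columns, jp.columns)})

print(by_corrwith)
print(np.allclose(by_corrwith.reindex(by_loop.index).values, by_loop.values))
```

Because corrwith aligns on column labels, any jp column without a matching us label after the rename would not be paired, so it is worth checking the mapping before relying on the result.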