python - Sum of Every Two Columns in Pandas DataFrame


I have a problem when using pandas. My task is this:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([(1,2,3,4,5,6),(1,2,3,4,5,6),(1,2,3,4,5,6)], columns=['a','b','c','d','e','f'])

    Out:
       a  b  c  d  e  f
    0  1  2  3  4  5  6
    1  1  2  3  4  5  6
    2  1  2  3  4  5  6

What I want is an output dataframe that looks like this:

    Out:
       s1  s2  s3
    0   3   7  11
    1   3   7  11
    2   3   7  11

That is to say, sum the columns (a,b), (c,d), (e,f) separately and rename the resulting columns to (s1,s2,s3). How can I solve this problem in pandas? Thank you very much.

1) Perform a groupby over the columns by supplying axis=1. Per @boud's comment, a minor tweak to the grouping array (adding 1) makes the group labels start at 1 instead of 0:

df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s') 


The grouping is performed according to this array, which assigns the same label to each pair of adjacent columns:

np.arange(len(df.columns)) // 2 # array([0, 0, 1, 1, 2, 2], dtype=int32) 
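One caveat worth noting: to my knowledge, passing axis=1 to groupby is deprecated in pandas 2.1 and later, and the suggested replacement is to group the transposed frame. The following is a minimal sketch of that variant, not the original answer's code. The name `labels` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, 5, 6)] * 3, columns=list("abcdef"))

# One label per column: array([1, 1, 2, 2, 3, 3])
labels = np.arange(len(df.columns)) // 2 + 1

# Transpose so the columns become rows, group those rows by label,
# sum each group, then transpose back to the original orientation.
result = df.T.groupby(labels).sum().T.add_prefix("s")
print(result)
```

The transpose copies the data, so on large frames this is likely slower than grouping along axis=1 directly where that is still supported.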

2) Use np.add.reduceat as a faster alternative:

    df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
    df.columns = df.columns + 1
    df.add_prefix('s')

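To see why this works, it may help to look at reduceat on a single row. Each index in the second argument marks where a segment starts, and every segment runs up to the next start index (the last one runs to the end of the axis). The variable names below are illustrative only.

```python
import numpy as np

row = np.array([[1, 2, 3, 4, 5, 6]])

# Start index of each pair of columns: array([0, 2, 4])
starts = np.arange(row.shape[1])[::2]

# Sums row[:, 0:2], row[:, 2:4] and row[:, 4:] along axis 1
sums = np.add.reduceat(row, starts, axis=1)
print(sums)  # [[ 3  7 11]]
```

With an odd number of columns the final segment would contain just the last column, so that column would pass through unpaired.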

Timings:

For a DataFrame of 1 million rows and 20 columns:

    from string import ascii_lowercase

    np.random.seed(42)
    df = pd.DataFrame(np.random.randint(0, 10, (10**6, 20)), columns=list(ascii_lowercase[:20]))
    df.shape
    (1000000, 20)

    def with_groupby(df):
        return df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')

    def with_reduceat(df):
        df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
        df.columns = df.columns + 1
        return df.add_prefix('s')

    # test whether both give the same output
    with_groupby(df).equals(with_reduceat(df))
    True

    %timeit with_groupby(df.copy())
    1 loop, best of 3: 1.11 s per loop

    %timeit with_reduceat(df.copy())   # <--- (>3x faster)
    1 loop, best of 3: 345 ms per loop
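The %timeit magic is only available in IPython/Jupyter. As a rough sketch, the comparison could be reproduced in a plain script with the standard timeit module. Two things here are assumptions rather than part of the original benchmark: the frame is much smaller (10,000 rows) so the script runs quickly, and the groupby variant is swapped for a transposed groupby so it also runs on pandas versions where axis=1 is no longer accepted. The function name `with_transposed_groupby` is hypothetical, and the timings will differ from the figures above.

```python
import timeit
from string import ascii_lowercase

import numpy as np
import pandas as pd

np.random.seed(42)
# Smaller than the original 10**6 rows, purely to keep the run short
df = pd.DataFrame(np.random.randint(0, 10, (10**4, 20)),
                  columns=list(ascii_lowercase[:20]))

def with_transposed_groupby(df):
    # Group the transposed frame instead of passing axis=1
    labels = np.arange(len(df.columns)) // 2 + 1
    return df.T.groupby(labels).sum().T.add_prefix('s')

def with_reduceat(df):
    starts = np.arange(len(df.columns))[::2]
    out = pd.DataFrame(np.add.reduceat(df.values, starts, axis=1))
    out.columns = out.columns + 1
    return out.add_prefix('s')

a = with_transposed_groupby(df)
b = with_reduceat(df)
# Compare values and column names rather than dtypes, which can vary by platform
same = np.array_equal(a.values, b.values) and list(a.columns) == list(b.columns)
print("same result:", same)

for func in (with_transposed_groupby, with_reduceat):
    seconds = timeit.timeit(lambda: func(df), number=3) / 3
    print(f"{func.__name__}: {seconds:.4f} s per call")
```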
