python - Sum of Every Two Columns in Pandas DataFrame
When using pandas, I have a problem. My task is this:
df = pd.DataFrame([(1,2,3,4,5,6),(1,2,3,4,5,6),(1,2,3,4,5,6)], columns=['a','b','c','d','e','f'])
Out:
   a  b  c  d  e  f
0  1  2  3  4  5  6
1  1  2  3  4  5  6
2  1  2  3  4  5  6
What I want is an output dataframe that looks like this:
Out:
   s1  s2  s3
0   3   7  11
1   3   7  11
2   3   7  11
That is to say, sum the columns (a,b), (c,d), (e,f) separately and rename the resulting columns to (s1,s2,s3). How can I solve this problem in pandas? Thank you very much.
1) Perform a groupby w.r.t. columns by supplying axis=1. Per @boud's comment, you want a minor tweak in the grouping array:
df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')
The grouping gets performed according to this condition:
np.arange(len(df.columns)) // 2 # array([0, 0, 1, 1, 2, 2], dtype=int32)
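As a sketch of the grouping step on the question's frame: recent pandas releases deprecate the axis=1 argument of groupby, so this variant groups the transposed frame instead. That swap is an assumption about your environment and not part of the original answer, and the name `labels` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, 5, 6)] * 3, columns=list("abcdef"))

# One label per column: columns 0-1 -> 1, columns 2-3 -> 2, columns 4-5 -> 3
labels = np.arange(len(df.columns)) // 2 + 1

# Group the rows of the transposed frame (i.e. the original columns),
# sum within each group, then transpose back and prefix the labels.
result = df.T.groupby(labels).sum().T.add_prefix("s")
print(result)
```

On older pandas versions the one-liner with axis=1 shown above should give the same columns s1, s2, s3.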
2) Use np.add.reduceat, which is a faster alternative:
df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
df.columns = df.columns + 1
df.add_prefix('s')
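To see what reduceat is doing here, a small sketch on the question's data may help: the index array marks where each segment starts, and np.add.reduceat sums from each start up to the next start (the last segment runs to the end of the axis). The names `starts`, `sums` and `out` are illustrative, not from the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, 5, 6)] * 3, columns=list("abcdef"))

# Start position of every pair of columns: 0, 2, 4
starts = np.arange(0, len(df.columns), 2)

# Sum each row segment [start, next_start) along the column axis
sums = np.add.reduceat(df.to_numpy(), starts, axis=1)

# Rebuild a frame, keeping the original row index and naming columns s1, s2, ...
out = pd.DataFrame(
    sums,
    index=df.index,
    columns=[f"s{i}" for i in range(1, len(starts) + 1)],
)
print(out)
```

Passing index=df.index keeps the original row labels, which the shorter version resets to a default range. With an odd number of columns, the last group would contain a single column.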
Timings for a df of 1 million rows spanning 20 columns:
import numpy as np
import pandas as pd
from string import ascii_lowercase

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 10, (10**6, 20)), columns=list(ascii_lowercase[:20]))
df.shape
(1000000, 20)

def with_groupby(df):
    return df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')

def with_reduceat(df):
    df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
    df.columns = df.columns + 1
    return df.add_prefix('s')

# test whether both give the same o/p
with_groupby(df).equals(with_reduceat(df))
True

%timeit with_groupby(df.copy())
1 loop, best of 3: 1.11 s per loop

%timeit with_reduceat(df.copy())  # <--- (>3x faster)
1 loop, best of 3: 345 ms per loop
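The %timeit lines above only work inside IPython. As a rough sketch, a similar comparison can be run as a plain script with the standard timeit module. Assumptions here: a smaller frame (10,000 rows) so it finishes quickly, a transposed groupby in place of axis=1 (deprecated in recent pandas), and the hypothetical name `with_groupby_t`. Timings will differ from the figures quoted above.

```python
import timeit
from string import ascii_lowercase

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Smaller than the 1-million-row frame so the script runs in a moment
df = pd.DataFrame(rng.integers(0, 10, (10**4, 20)), columns=list(ascii_lowercase[:20]))

def with_groupby_t(df):
    # Transposed variant of the groupby approach (assumption, see lead-in)
    labels = np.arange(len(df.columns)) // 2 + 1
    return df.T.groupby(labels).sum().T.add_prefix("s")

def with_reduceat(df):
    starts = np.arange(0, len(df.columns), 2)
    out = pd.DataFrame(np.add.reduceat(df.to_numpy(), starts, axis=1), index=df.index)
    out.columns = out.columns + 1
    return out.add_prefix("s")

# Check that both approaches produce the same numbers before timing them
same = np.array_equal(with_groupby_t(df).to_numpy(), with_reduceat(df).to_numpy())
print("same values:", same)

for fn in (with_groupby_t, with_reduceat):
    # Best of 3 single runs, in seconds
    best = min(timeit.repeat(lambda: fn(df), number=1, repeat=3))
    print(f"{fn.__name__}: {best:.4f} s")
```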