python - Add successive rows in Pandas if they match on some columns
I have a DataFrame like the following one:
id  url        seconds
1   email      9
1   email      3
1   app        5
1   app        9
1   faceboook  50
1   faceboook  7
1   faceboook  39
1   faceboook  10
1   email      39
1   email      5
1   email      57
1   faceboook  7
1   faceboook  32
1   faceboook  3
2   app        11
2   app        10
2   email      56
2   faceboook  9
2   faceboook  46
2   faceboook  16
2   email      21
I want to sum the 'seconds' column over successive views of the same url by the same id. This is the result I'm looking for:
id  url        seconds
1   email      12
1   app        14
1   faceboook  106
1   email      101
1   faceboook  42
2   app        21
2   email      56
2   faceboook  71
2   email      21
df.groupby(['id', 'url']).sum() does not work in this case, because it sums all rows with the same url and id, not only the successive ones.
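To see the problem concretely, here is a minimal sketch that rebuilds the sample data above as a DataFrame (values copied from the question, including the 'faceboook' spelling) and runs the plain groupby on the seconds column:

```python
import pandas as pd

# Sample data copied from the question ('faceboook' spelling kept as-is).
df = pd.DataFrame({
    "id": [1] * 14 + [2] * 7,
    "url": ["email", "email", "app", "app",
            "faceboook", "faceboook", "faceboook", "faceboook",
            "email", "email", "email",
            "faceboook", "faceboook", "faceboook",
            "app", "app", "email",
            "faceboook", "faceboook", "faceboook", "email"],
    "seconds": [9, 3, 5, 9, 50, 7, 39, 10, 39, 5, 57, 7, 32, 3,
                11, 10, 56, 9, 46, 16, 21],
})

# Grouping only by (id, url) puts every row with the same pair into one group,
# regardless of whether the rows are adjacent.
naive = df.groupby(["id", "url"])["seconds"].sum()
print(naive)
# (1, 'email') -> 113, i.e. the two separate runs 12 + 101 merged together
```

Both email runs of id 1 end up in a single group, which is exactly the merging the question wants to avoid.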
Any ideas?
You can use groupby with a Series created by comparing (ne) the url column with its shifted version, and finally applying cumsum to that boolean mask:
print ((df.url.ne(df.url.shift())).cumsum())
0     1
1     1
2     2
3     2
4     3
5     3
6     3
7     3
8     4
9     4
10    4
11    5
12    5
13    5
14    6
15    6
16    7
17    8
18    8
19    8
20    9
Name: url, dtype: int32
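The run key is built in two small steps: flag each row whose url differs from the row above, then take a running total of those flags. A minimal sketch on a toy Series (the values are made up for illustration, not taken from the question):

```python
import pandas as pd

urls = pd.Series(["email", "email", "app", "app", "email"])

# True wherever a value differs from the row above it. The first row is
# compared against NaN (introduced by shift), so it is always True.
starts = urls.ne(urls.shift())

# Summing the booleans cumulatively gives each consecutive run its own label.
run_id = starts.cumsum()
print(run_id.tolist())  # [1, 1, 2, 2, 3]
```

Note that the final "email" gets label 3, not 1, which is what keeps non-adjacent runs apart.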
print (df['seconds'].groupby([(df.url.ne(df.url.shift())).cumsum(), df.id, df.url]).sum())
url  id  url
1    1   email         12
2    1   app           14
3    1   faceboook    106
4    1   email        101
5    1   faceboook     42
6    2   app           21
7    2   email         56
8    2   faceboook     71
9    2   email         21
Name: seconds, dtype: int64

Then drop the helper level and reset the index to get the desired DataFrame:

print (df['seconds'].groupby([(df.url.ne(df.url.shift())).cumsum(), df.id, df.url])
                    .sum()
                    .reset_index(level=0, drop=True)
                    .reset_index())
   id        url  seconds
0   1      email       12
1   1        app       14
2   1  faceboook      106
3   1      email      101
4   1  faceboook       42
5   2        app       21
6   2      email       56
7   2  faceboook       71
8   2      email       21
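For reference, a self-contained sketch of the same run-labelling approach on the question's data. Two deliberate differences from the answer, both assumptions rather than requirements: a run here also ends when id changes, and it uses df.groupby(...)['seconds'] instead of df['seconds'].groupby(...), which yields the same sums.

```python
import pandas as pd

# Sample data copied from the question ('faceboook' spelling kept as-is).
df = pd.DataFrame({
    "id": [1] * 14 + [2] * 7,
    "url": ["email", "email", "app", "app",
            "faceboook", "faceboook", "faceboook", "faceboook",
            "email", "email", "email",
            "faceboook", "faceboook", "faceboook",
            "app", "app", "email",
            "faceboook", "faceboook", "faceboook", "email"],
    "seconds": [9, 3, 5, 9, 50, 7, 39, 10, 39, 5, 57, 7, 32, 3,
                11, 10, 56, 9, 46, 16, 21],
})

# A new run starts whenever url OR id differs from the previous row.
# (Checking id too is an assumption added here, not part of the answer.)
changed = df["url"].ne(df["url"].shift()) | df["id"].ne(df["id"].shift())
run = changed.cumsum().rename("run")

result = (
    df.groupby([run, "id", "url"])["seconds"]
      .sum()
      .reset_index(level="run", drop=True)  # the run label is only a helper
      .reset_index()                         # back to flat id/url/seconds columns
)
print(result)
```

Checking id as well means a run always ends at an id boundary, so a sequence such as ids 1, 2, 1 with the same url yields three runs instead of merging the two id-1 rows. For the sample data both keys give the same result.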