r - Creating Groups with dplyr's "group_by" then Using stringr or Set Operations to Find Differences Between Groups
I would like to use dplyr, and stringr if possible, or at least stay within the tidyverse, to achieve the following:
I need to group the data by caseworker and client, then compare "task" and "task2" to find the categories in "task2" that are not in "task", along with the total time associated with each such "task2" category.
"task" can have categories that are not in "task2", but I'm only interested in finding the categories in "task2" that are not in "task". It would be great to be able to create new columns that show the specific entries in "task2" that are not in "task", along with the associated "time" value.
The end result should show 4 new columns for client chris: one column for "iron shirt" and one for its associated "time" of 45, and a column for "do homework" and a column for its "time" of 21. There would be 2 new columns for client eric: one for "iron shirt" and one for its associated time of 12.
    caseworker <- c("john", "john", "john", "john", "john", "john", "john", "john",
                    "john", "kim", "kim")
    client <- c("chris", "chris", "chris", "chris", "chris", "chris", "chris",
                "chris", "chris", "eric", "eric")
    task <- c("feed cat", "feed cat", "feed cat", "make dinner", "make dinner",
              "make dinner", "buy groceries", "buy groceries", "buy groceries",
              "do homework", "do homework")
    task2 <- c("feed cat", "iron shirt", "iron shirt", "do homework", "do homework",
               "do homework", "make dinner", "feed cat", "feed cat", "do homework",
               "iron shirt")
    time <- c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12)
    df <- data.frame(caseworker, client, task, task2, time)
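We can try the following: within each caseworker/client group, keep only the rows whose "task2" value is in setdiff(task2, task), then sum "time" per "task2" and spread the result to wide format.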
    library(dplyr)
    library(tidyr)

    df %>%
        group_by(caseworker, client) %>%
        # keep rows whose task2 does not occur in task for this group
        filter(task2 %in% setdiff(task2, task)) %>%
        group_by(task2, add = TRUE) %>%
        summarise(time = sum(time)) %>%
        spread(task2, time)
    #  caseworker client `do homework` `iron shirt`
    #*     <fctr> <fctr>         <dbl>        <dbl>
    #1       john  chris            21           45
    #2        kim   eric            NA           12
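On newer package versions the same idea can be written a little differently. The sketch below assumes dplyr 1.0.0 or later (where the `add` argument of `group_by()` became `.add`) and tidyr 1.0.0 or later (where `pivot_wider()` supersedes `spread()`). The name `result` and the use of `stringsAsFactors = FALSE` are my own choices for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, kept as character columns
df <- data.frame(
  caseworker = c(rep("john", 9), rep("kim", 2)),
  client     = c(rep("chris", 9), rep("eric", 2)),
  task  = c("feed cat", "feed cat", "feed cat", "make dinner", "make dinner",
            "make dinner", "buy groceries", "buy groceries", "buy groceries",
            "do homework", "do homework"),
  task2 = c("feed cat", "iron shirt", "iron shirt", "do homework", "do homework",
            "do homework", "make dinner", "feed cat", "feed cat", "do homework",
            "iron shirt"),
  time  = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(caseworker, client) %>%
  # setdiff() is asymmetric: values in task2 that never occur in task
  filter(task2 %in% setdiff(task2, task)) %>%
  # add task2 to the existing grouping instead of replacing it
  group_by(task2, .add = TRUE) %>%
  summarise(time = sum(time), .groups = "drop") %>%
  # one column per remaining task2 category, filled with the summed time
  pivot_wider(names_from = task2, values_from = time)

result
```

This should give one row per caseworker/client pair, with 21 and 45 for chris under `do homework` and `iron shirt`, and NA and 12 for eric, matching the totals asked for in the question.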