r - How can I split a large data.frame into smaller ones without using a loop?
I have a large data frame (20k rows). It contains a date/timestamp, some text, and the delta between the first timestamp and each subsequent timestamp.
                 date   text time.diff
1 2016-03-09 15:50:07 text 1     0.000
2 2016-03-09 15:50:10 text 2     2.808
3 2016-03-09 15:50:17 text 3    10.128
4 2016-03-09 15:50:53 text 4    45.952
5 2016-03-09 21:26:15 text 5    65.053
I'd like to be able to split the data frame into smaller chunks based on the values contained in time.diff (say, chunks of 60 seconds). For example, splitting it in 2 can be done with subset, but if I have a larger frame, I'll end up writing 1000s of lines of code!
I could create a loop to iterate through the larger data frame and accomplish this task, but I know that using loops in R is rather slow.
So I'm wondering what approach I can take to split the larger frame into many smaller frames in a way that doesn't use a loop and can increment the smaller data frame names, e.g. df.sub.1, df.sub.2 ... df.sub.3
# split into 2 frames based on matched criteria
df.split1 <- subset(df.tosplit, time.diff <= 60)
df.split2 <- subset(df.tosplit, time.diff > 60)

> df.split1
                 date   text time.diff
1 2016-03-09 15:50:07 text 1     0.000
2 2016-03-09 15:50:10 text 2     2.808
3 2016-03-09 15:50:17 text 3    10.128
4 2016-03-09 15:50:53 text 4    45.952

> df.split2
                 date   text time.diff
5 2016-03-09 21:26:15 text 5    65.053
6 2016-03-09 21:26:20 text 6    85.110
I've included sample code to create the first 6 lines, which should be enough for folks to suggest a way forward here.
# create data
date <- c("2016-03-09 15:50:07", "2016-03-09 15:50:10", "2016-03-09 15:50:17",
          "2016-03-09 15:50:53", "2016-03-09 21:26:15", "2016-03-09 21:26:20")
text <- c("text 1", "text 2", "text 3", "text 4", "text 5", "text 6")
time.diff <- c(0, 2.808, 10.128, 45.952, 65.053, 85.110)
df.tosplit <- data.frame(date, text, time.diff)
Using split():
split(df, paste0("df.split", df$time.diff %/% 60))

$df.split0
                  dat   text time.diff
1 2016-03-09 15:50:07 text 1     0.000
2 2016-03-09 15:50:10 text 2     2.808
3 2016-03-09 15:50:17 text 3    10.128
4 2016-03-09 15:50:53 text 4    45.952

$df.split1
                  dat   text time.diff
5 2016-03-09 21:26:15 text 5    65.053
6 2016-03-09 21:26:20 text 6    85.110
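To connect this back to the question's own objects, here is a minimal sketch that applies the same idea to df.tosplit and names the pieces df.sub.1, df.sub.2, and so on. The names bin and chunks are introduced here for illustration only. Splitting on the numeric bin and renaming afterwards is a small variation on the call above: it keeps the list in numeric order, whereas names built with paste0 before splitting are sorted as text, so a "10" would come before a "2".

```r
# Sample data copied from the question
df.tosplit <- data.frame(
  date = c("2016-03-09 15:50:07", "2016-03-09 15:50:10", "2016-03-09 15:50:17",
           "2016-03-09 15:50:53", "2016-03-09 21:26:15", "2016-03-09 21:26:20"),
  text = c("text 1", "text 2", "text 3", "text 4", "text 5", "text 6"),
  time.diff = c(0, 2.808, 10.128, 45.952, 65.053, 85.110)
)

# Integer division gives the 60-second bin: 0 for [0, 60), 1 for [60, 120), ...
bin <- df.tosplit$time.diff %/% 60

# split() returns a list with one data frame per bin, ordered by bin number
chunks <- split(df.tosplit, bin)

# Rename the list elements so that numbering starts at 1 (hypothetical naming scheme)
names(chunks) <- paste0("df.sub.", as.integer(names(chunks)) + 1)

names(chunks)
chunks[["df.sub.1"]]
```

Note that %/% puts a value of exactly 60 into the second bin, while the subset() call in the question (time.diff <= 60) keeps it in the first, so the boundary handling differs slightly.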
A more exotic way (see explanation here):
list2env(split(df, paste0("df.split", df$time.diff %/% 60)), .GlobalEnv)
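If the pieces do not need to exist as separate global variables, it is often simpler to keep them in the list and process them with sapply() or lapply(), which also avoids writing an explicit loop. The sketch below assumes the question's time.diff values. The names chunks, rows.per.chunk and mean.diff are illustrative and not part of the original answer.

```r
# Reduced version of the question's sample data (the date column is omitted for brevity)
df.tosplit <- data.frame(
  text = c("text 1", "text 2", "text 3", "text 4", "text 5", "text 6"),
  time.diff = c(0, 2.808, 10.128, 45.952, 65.053, 85.110)
)

# One data frame per 60-second bin, kept together in a single list
chunks <- split(df.tosplit, df.tosplit$time.diff %/% 60)

# Apply a function to every chunk without an explicit loop
rows.per.chunk <- sapply(chunks, nrow)
mean.diff <- sapply(chunks, function(d) mean(d$time.diff))

rows.per.chunk
mean.diff
```

A single chunk is still available by name, for example chunks[["0"]] for the first 60 seconds.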