r - How can I split a large data.frame into smaller ones without using a loop?


I have a large data frame (20k rows). It contains a date/timestamp, some text, and the delta between the first timestamp and each subsequent timestamp.

                     date   text time.diff
    1 2016-03-09 15:50:07 text 1     0.000
    2 2016-03-09 15:50:10 text 2     2.808
    3 2016-03-09 15:50:17 text 3    10.128
    4 2016-03-09 15:50:53 text 4    45.952
    5 2016-03-09 21:26:15 text 5    65.053

I'd like to be able to split the data frame into smaller chunks based on the values contained in time.diff (say, chunks of 60 seconds). For example, splitting it in two using subset() can be done as in the snippet below, but if I have a larger frame, I'll end up writing thousands of lines of code!

I could create a loop to iterate through the larger data frame and accomplish this task, but I know that using loops in R is rather slow.

So I'm wondering what approach I can take to split the larger frame into many smaller frames in a way that doesn't use a loop and that can increment the smaller data frame names, e.g. df.sub.1, df.sub.2, df.sub.3, ...

    # split into 2 frames based on matched criteria
    df.split1 <- subset(df.tosplit, time.diff <= 60)
    df.split2 <- subset(df.tosplit, time.diff > 60)

    > df.split1
                     date   text time.diff
    1 2016-03-09 15:50:07 text 1     0.000
    2 2016-03-09 15:50:10 text 2     2.808
    3 2016-03-09 15:50:17 text 3    10.128
    4 2016-03-09 15:50:53 text 4    45.952
    > df.split2
                     date   text time.diff
    5 2016-03-09 21:26:15 text 5    65.053
    6 2016-03-09 21:26:20 text 6    85.110

I've included sample code to create the first 6 lines, which should be enough for folks to suggest a way forward here.

    # create data
    date <- c("2016-03-09 15:50:07", "2016-03-09 15:50:10", "2016-03-09 15:50:17",
              "2016-03-09 15:50:53", "2016-03-09 21:26:15", "2016-03-09 21:26:20")
    text <- c("text 1", "text 2", "text 3", "text 4", "text 5", "text 6")
    time.diff <- c(0, 2.808, 10.128, 45.952, 65.053, 85.110)
    df.tosplit <- data.frame(date, text, time.diff)

using split():

    split(df.tosplit, paste0("df.split", df.tosplit$time.diff %/% 60))

    $df.split0
                     date   text time.diff
    1 2016-03-09 15:50:07 text 1     0.000
    2 2016-03-09 15:50:10 text 2     2.808
    3 2016-03-09 15:50:17 text 3    10.128
    4 2016-03-09 15:50:53 text 4    45.952

    $df.split1
                     date   text time.diff
    5 2016-03-09 21:26:15 text 5    65.053
    6 2016-03-09 21:26:20 text 6    85.110
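To make the grouping step explicit, here is a minimal sketch using the question's sample data. The intermediate names bin and chunks are only illustrative and are not part of the original answer. Note that the list elements are ordered by sorted name, so with ten or more bins "df.split10" would come before "df.split2".

```r
# Sample data from the question
df.tosplit <- data.frame(
  date = c("2016-03-09 15:50:07", "2016-03-09 15:50:10", "2016-03-09 15:50:17",
           "2016-03-09 15:50:53", "2016-03-09 21:26:15", "2016-03-09 21:26:20"),
  text = paste("text", 1:6),
  time.diff = c(0, 2.808, 10.128, 45.952, 65.053, 85.110)
)

# Integer division assigns each row to a 60-second bin:
# 0 for [0, 60), 1 for [60, 120), and so on
bin <- df.tosplit$time.diff %/% 60

# split() returns a named list with one data frame per bin
chunks <- split(df.tosplit, paste0("df.split", bin))

names(chunks)           # "df.split0" "df.split1"
sapply(chunks, nrow)    # number of rows in each chunk
chunks[["df.split1"]]   # pull out a single chunk by name
```

Keeping the chunks in a list means further per-chunk work can be done with lapply() or sapply() rather than by juggling many separately named variables.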

A more exotic way, which turns each element of the list into a separate variable in the global environment:

    list2env(split(df.tosplit, paste0("df.split", df.tosplit$time.diff %/% 60)), .GlobalEnv)
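If filling the global workspace with many variables is a concern, the same list2env() call can target a separate environment instead. This is only a sketch of that variation, not part of the original answer; the name chunk.env is hypothetical, and the sample data is trimmed to the two columns needed here.

```r
# Reduced version of the question's sample data
df.tosplit <- data.frame(
  text = paste("text", 1:6),
  time.diff = c(0, 2.808, 10.128, 45.952, 65.053, 85.110)
)
chunks <- split(df.tosplit, paste0("df.split", df.tosplit$time.diff %/% 60))

# list2env() copies each list element into an environment under its name.
# A fresh environment (hypothetical name: chunk.env) keeps the workspace clean.
chunk.env <- list2env(chunks, envir = new.env())

ls(chunk.env)                         # names of the variables that were created
get("df.split1", envir = chunk.env)   # same data frame as chunks$df.split1
```

With 20k rows split into 60-second bins, the number of variables can grow quickly, which is the main reason to prefer a list or a dedicated environment over .GlobalEnv.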

