R - Splitting a string a few characters after the delimiter


I have a large data set of names and states that I need to split. After splitting, I want to create a new row for each name and state. My data strings span multiple lines, like this:

"Peter Johnson, IN Chet Charles, TX Ed Walsh, AZ"
"Ralph Hogan, TX, Michael Johnson, FL"

I need the data to look like this:

attr      name            state
1         Peter Johnson   IN
2         Chet Charles    TX
3         Ed Walsh        AZ
4         Ralph Hogan     TX
5         Michael Johnson FL

I can't figure out how to do this. Perhaps I could split somehow a few characters after the comma? Any help would be appreciated.

If the strings span multiple lines, we can create a delimiter with gsub, split the strings using strsplit, create a data.frame from the components of each split in the output list, and rbind them together.

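    # `lines` is defined in the Data section; state codes are two upper-case letters
    d1 <- do.call(rbind, lapply(
      # replace the whitespace or comma after each state code with ";", then split
      strsplit(gsub("([A-Z]{2})(\\s+|,)", "\\1;", lines), "[,;]"),
      function(x) {
        x1 <- trimws(x)
        # odd positions hold names, even positions hold states
        data.frame(name = x1[c(TRUE, FALSE)], state = x1[c(FALSE, TRUE)])
      }))
    cbind(attr = seq_len(nrow(d1)), d1)
    #  attr            name state
    #1    1   Peter Johnson    IN
    #2    2    Chet Charles    TX
    #3    3        Ed Walsh    AZ
    #4    4     Ralph Hogan    TX
    #5    5 Michael Johnson    FL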
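To make the gsub and strsplit steps easier to follow, here is a minimal sketch that prints the intermediate results. It assumes the state codes are two upper-case letters, and the names `marked` and `pieces` are introduced here only for illustration.

```r
# Sample input, assuming upper-case two-letter state codes
lines <- c("Peter Johnson, IN Chet Charles, TX Ed Walsh, AZ",
           "Ralph Hogan, TX, Michael Johnson, FL")

# Step 1: replace the whitespace or comma that follows a state code with ";"
marked <- gsub("([A-Z]{2})(\\s+|,)", "\\1;", lines)
print(marked)

# Step 2: split on both "," and ";" so that names and states alternate,
# then trim the leftover spaces around each piece
pieces <- lapply(strsplit(marked, "[,;]"), trimws)
print(pieces)
```

Note that the pattern treats any two consecutive capital letters followed by whitespace or a comma as a state code, so a name containing such a sequence would be split in the wrong place.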

Or it can be done in a more compact way:

    library(data.table)
    # put each "name, state" pair on its own line and let fread parse it as CSV
    fread(paste(gsub("([A-Z]{2})(\\s+|,)", "\\1\n", lines), collapse = "\n"),
          col.names = c("names", "state"), header = FALSE)[, attr := 1:.N][]
    #             names state attr
    #1:   Peter Johnson    IN    1
    #2:    Chet Charles    TX    2
    #3:        Ed Walsh    AZ    3
    #4:     Ralph Hogan    TX    4
    #5: Michael Johnson    FL    5
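The same newline trick can also be tried without data.table. The following is a rough base R sketch that swaps fread for read.csv with its `text` argument. It is not part of the original answer; it assumes upper-case state codes, and `csv_text` and `d2` are hypothetical names.

```r
# Sample input, assuming upper-case two-letter state codes
lines <- c("Peter Johnson, IN Chet Charles, TX Ed Walsh, AZ",
           "Ralph Hogan, TX, Michael Johnson, FL")

# Put each "name, state" pair on its own line
csv_text <- paste(gsub("([A-Z]{2})(\\s+|,)", "\\1\n", lines), collapse = "\n")

# Parse the text as comma-separated values; strip.white drops the
# spaces left around the fields
d2 <- read.csv(text = csv_text, header = FALSE, strip.white = TRUE,
               col.names = c("names", "state"), stringsAsFactors = FALSE)

# Add the running row number as the first column
d2 <- cbind(attr = seq_len(nrow(d2)), d2)
print(d2)
```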

Data

    # two input lines, separated by an explicit newline
    lines <- readLines(textConnection("Peter Johnson, IN Chet Charles, TX Ed Walsh, AZ\nRalph Hogan, TX, Michael Johnson, FL"))
