R - Splitting a string a few characters after the delimiter
I have a large data set of names and states that I need to split. After splitting, I want to create new rows for each name and state. My data strings are in multiple lines like this:

    "Peter Johnson, IN Chet Charles, TX Ed Walsh, AZ"
    "Ralph Hogan, TX, Michael Johnson, FL"

I need the data to look like this:

    attr  name             state
    1     Peter Johnson    IN
    2     Chet Charles     TX
    3     Ed Walsh         AZ
    4     Ralph Hogan      TX
    5     Michael Johnson  FL

I can't figure out how to do this, perhaps by splitting somehow a few characters after the comma? Any help is appreciated.
If there are multiple line strings, we can create a delimiter with gsub, split the strings using strsplit, create a data.frame from the components of each split in the output list, and rbind them together.
    # Replace the whitespace or comma after each two-letter state code with ";",
    # then split on "," and ";" so names and states alternate
    d1 <- do.call(rbind, lapply(strsplit(gsub("([A-Z]{2})(\\s+|,)", "\\1;", lines), "[,;]"),
      function(x) {
        x1 <- trimws(x)
        # odd positions are names, even positions are states
        data.frame(name = x1[c(TRUE, FALSE)], state = x1[c(FALSE, TRUE)])
      }))
    cbind(attr = seq_len(nrow(d1)), d1)
    #  attr            name state
    #1    1   Peter Johnson    IN
    #2    2    Chet Charles    TX
    #3    3        Ed Walsh    AZ
    #4    4     Ralph Hogan    TX
    #5    5 Michael Johnson    FL

Or it can be done in a compact way:

    library(data.table)
    # Put each name/state pair on its own line and let fread parse it as CSV text
    fread(paste(gsub("([A-Z]{2})(\\s+|,)", "\\1\n", lines), collapse = "\n"),
          col.names = c("names", "state"), header = FALSE)[, attr := 1:.N][]
    #             names state attr
    #1:   Peter Johnson    IN    1
    #2:    Chet Charles    TX    2
    #3:        Ed Walsh    AZ    3
    #4:     Ralph Hogan    TX    4
    #5: Michael Johnson    FL    5

Data

    lines <- readLines(textConnection("Peter Johnson, IN Chet Charles, TX Ed Walsh, AZ
    Ralph Hogan, TX, Michael Johnson, FL"))
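To see why the delimiter trick works, the intermediate steps can be inspected on their own. The following is a minimal sketch, assuming the input is a character vector (called `x` here purely for illustration) and that every state is a two-letter uppercase code. A name that itself ends in two capitals followed by a space or comma would presumably be split in the wrong place, so this is not a general parser.

```r
# Hypothetical input vector mirroring the two lines from the question
x <- c("Peter Johnson, IN Chet Charles, TX Ed Walsh, AZ",
       "Ralph Hogan, TX, Michael Johnson, FL")

# Step 1: replace the whitespace or comma that follows a two-letter
# uppercase state code with ";" to mark the end of each record
marked <- gsub("([A-Z]{2})(\\s+|,)", "\\1;", x)
print(marked)

# Step 2: split on "," and ";" and trim the pieces, so that names and
# states alternate within each list element
pieces <- lapply(strsplit(marked, "[,;]"), trimws)
print(pieces[[2]])
```

After step 1 the first element reads "Peter Johnson, IN;Chet Charles, TX;Ed Walsh, AZ", which is why splitting on both "," and ";" yields alternating name and state entries that can be picked out with recycled logical indices.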