R - Splitting a string a few characters after the delimiter
I have a large data set of names and states that I need to split. After splitting, I want to create a new row for each name and state. The data strings are on multiple lines, like this:
"Peter Johnson, IN Chet Charles, TX Ed Walsh, AZ"
"Ralph Hogan, TX, Michael Johnson, FL"
I need the data to look like this:
attr  name             state
1     Peter Johnson    IN
2     Chet Charles     TX
3     Ed Walsh         AZ
4     Ralph Hogan      TX
5     Michael Johnson  FL
I can't figure out how to do this. Perhaps I could somehow split a few characters after the comma? Any help is appreciated.
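The idea of splitting "a few characters after the comma" can be expressed directly with a PCRE lookbehind. The sketch below is not from either answer; it assumes every state code is exactly two upper-case letters preceded by a comma and exactly one space (the lookbehind must be fixed-width), and the names `records`, `fields` and `out` are illustrative only.

```r
# Sample data from the question (state codes assumed to be upper case).
lines <- c("Peter Johnson, IN Chet Charles, TX Ed Walsh, AZ",
           "Ralph Hogan, TX, Michael Johnson, FL")

# Split on the run of spaces/commas that follows ", XX" (comma, one space,
# two-letter code). The lookbehind keeps the state code in the left piece.
records <- unlist(strsplit(lines, "(?<=, [A-Z]{2})[ ,]+", perl = TRUE))

# Each record is now "Name, ST"; split it once more at the comma.
fields <- do.call(rbind, strsplit(records, ",\\s*"))

out <- data.frame(attr  = seq_along(records),
                  name  = fields[, 1],
                  state = fields[, 2],
                  stringsAsFactors = FALSE)
out
```

Because the consumed part `[ ,]+` always matches at least one character, each split point removes only the separator and never the state code itself.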
If these are multiple-line strings, we can create a delimiter with gsub, split the strings using strsplit, create a data.frame from the alternating components of each split in the output list, and rbind them together.
d1 <- do.call(rbind, lapply(strsplit(gsub("([A-Z]{2})(\\s+|,)", "\\1;", lines), "[,;]"),
  function(x) {
    x1 <- trimws(x)
    # names sit at odd positions, state codes at even positions
    data.frame(name = x1[c(TRUE, FALSE)], state = x1[c(FALSE, TRUE)])
  }))
cbind(attr = seq_len(nrow(d1)), d1)
#  attr            name state
#1    1   Peter Johnson    IN
#2    2    Chet Charles    TX
#3    3        Ed Walsh    AZ
#4    4     Ralph Hogan    TX
#5    5 Michael Johnson    FL
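To make the gsub/strsplit step easier to follow, here is a sketch that runs it on a single line and keeps each intermediate value. It assumes, as the answer's regex does, that state codes are two upper-case letters; the variable names are illustrative.

```r
# One line of the sample data (state codes assumed upper case).
line <- "Peter Johnson, IN Chet Charles, TX Ed Walsh, AZ"

# Step 1: replace the whitespace or comma that follows each two-letter
# upper-case code with ";" so that records are separated by ";".
marked <- gsub("([A-Z]{2})(\\s+|,)", "\\1;", line)
# marked is now "Peter Johnson, IN;Chet Charles, TX;Ed Walsh, AZ"
# (the final "AZ" is untouched because nothing follows it)

# Step 2: split on either delimiter; names and states now alternate.
parts <- trimws(strsplit(marked, "[,;]")[[1]])
# parts is now c("Peter Johnson", "IN", "Chet Charles", "TX", "Ed Walsh", "AZ")

# Step 3: logical indices are recycled, so c(TRUE, FALSE) picks the odd
# positions (names) and c(FALSE, TRUE) picks the even ones (states).
df1 <- data.frame(name  = parts[c(TRUE, FALSE)],
                  state = parts[c(FALSE, TRUE)],
                  stringsAsFactors = FALSE)
df1
```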
Or it can be done in a more compact way with fread from data.table:
library(data.table)
# turn the separator after each state code into a newline, then parse as CSV
fread(paste(gsub("([A-Z]{2})(\\s+|,)", "\\1\n", lines), collapse = "\n"),
      col.names = c("names", "state"), header = FALSE)[, attr := 1:.N][]
#             names state attr
#1:   Peter Johnson    IN    1
#2:    Chet Charles    TX    2
#3:        Ed Walsh    AZ    3
#4:     Ralph Hogan    TX    4
#5: Michael Johnson    FL    5
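If data.table is not available, the same "newline after each state code, then parse as CSV" idea can be sketched in base R with read.csv(text = ...). This swaps fread for read.csv and is not part of the original answer; it again assumes two-letter upper-case state codes, and relies on strip.white = TRUE to drop the space left after each comma.

```r
# Sample data, one element per input line.
lines <- c("Peter Johnson, IN Chet Charles, TX Ed Walsh, AZ",
           "Ralph Hogan, TX, Michael Johnson, FL")

# End every record after its state code, then join all lines into one text.
csv_text <- paste(gsub("([A-Z]{2})(\\s+|,)", "\\1\n", lines), collapse = "\n")

# Parse the text as a two-column CSV; strip.white trims " IN", " Michael ...".
res <- read.csv(text = csv_text, header = FALSE,
                col.names = c("names", "state"),
                strip.white = TRUE, stringsAsFactors = FALSE)

# Add the running row number, as the fread version does with 1:.N.
res$attr <- seq_len(nrow(res))
res
```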
Data
lines <- readLines(textConnection("Peter Johnson, IN Chet Charles, TX Ed Walsh, AZ
Ralph Hogan, TX, Michael Johnson, FL"))