POS tagger - Selecting text from corresponding tags in a sequence in R

- April 15, 2011

I am trying to extract the text that corresponds to a sequence of tags in a sentence. I first get the part-of-speech tags for each sentence in a text file. My code:

    library(NLP)
    library(openNLP)

    postext <- "The verifone is not working, when customers slide card nothing happens. The screen is frozen. We rebooted but it did not help."
    postext1 <- c("the verifone not working", "scanner not scanning", "printer offline",
                  "when customers slide card nothing happens. screen frozen. rebooted did not help.")

    # Tokenize x into words and tag each word with its part of speech
    tagpos <- function(x, ...) {
      s <- as.String(x)
      word_token_annotator <- Maxent_Word_Token_Annotator()
      a2 <- Annotation(1L, "sentence", 1L, nchar(s))
      a2 <- annotate(s, word_token_annotator, a2)
      a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
      a3w <- a3[a3$type == "word"]
      postags <- unlist(lapply(a3w$features, `[[`, "POS"))
      postagged <- paste(sprintf("%s/%s", s[a3w], postags), collapse = " ")
      list(postagged = postagged, postags = postags)
    }

    dd1 <- do.call(rbind, strsplit(as.character(postext), ' '))
    dd_v1 <- tagpos(dd1)$postagged
    dd_v1

Output:

    [1] "The/DT verifone/NNP is/VBZ not/RB working/VBG ,/, when/WRB customers/NNS slide/NN card/NN nothing/NN happens/VBZ ./. The/DT screen/NN is/VBZ frozen/VBN ./. We/PRP rebooted/VBD but/CC it/PRP did/VBD not/RB help/VB ./."

I want to extract the text that belongs to a sequence of tags. For example, I want to extract the text for the tags 'NNP', 'VBZ', 'RB', 'VBG' in sequence, across the entire text file, wherever that sequence occurs in the sentences.

My desired output is:

    [1] verifone is not working

Thank you for your help.

This is a rather naive approach and, in case you have plenty of strings, it will be slow, but give it a try:

    # Construct the sequences of words and tags from the tagged string,
    # so that both share the same indices (a regex would probably be nicer)
    words <- sapply(strsplit(strsplit(dd_v1, " ")[[1]], "/"), "[", 1)
    tags <- sapply(strsplit(strsplit(dd_v1, "/")[[1]][-1], " "), "[", 1)

    # Define constants
    matchseq <- c('NNP', 'VBZ', 'RB', 'VBG')
    totaltags <- length(tags)
    searchlength <- length(matchseq)

    # Loop through subvectors and store starting points of possible matches
    startpoints <- c()
    for (i in seq_len(totaltags - searchlength + 1)) {
      if (identical(tags[i:(i + searchlength - 1)], matchseq)) startpoints <- c(startpoints, i)
    }

    # Print results, if there are any
    if (!is.null(startpoints)) paste(words[startpoints:(startpoints + searchlength - 1)], collapse = " ")

If you find more than one location, you can e.g. loop over startpoints and print every single sequence separately.
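A minimal sketch of that last idea, collecting every match instead of only the first. The helper name `find_tag_sequences` is hypothetical and not part of NLP or openNLP. The tagged string is hard-coded in the same `word/TAG` format as the output in the question, with Penn Treebank tags in their usual upper case, so the example runs without openNLP installed.

```r
# Tagged text in "word/TAG" format (hard-coded so no tagger is needed here)
tagged <- "The/DT verifone/NNP is/VBZ not/RB working/VBG ,/, when/WRB customers/NNS slide/NN card/NN nothing/NN happens/VBZ ./. The/DT screen/NN is/VBZ frozen/VBN ./. We/PRP rebooted/VBD but/CC it/PRP did/VBD not/RB help/VB ./."

# Hypothetical helper: return the text of every run of tokens whose tags
# are exactly equal to matchseq, in order of appearance.
find_tag_sequences <- function(tagged, matchseq) {
  tokens <- strsplit(tagged, " ", fixed = TRUE)[[1]]
  words <- sub("/[^/]*$", "", tokens)  # everything before the last "/"
  tags <- sub("^.*/", "", tokens)      # everything after the last "/"
  n <- length(matchseq)
  if (n == 0 || length(tags) < n) return(character(0))

  # Test every possible starting position, including the last one
  starts <- seq_len(length(tags) - n + 1)
  is_hit <- vapply(starts,
                   function(i) identical(tags[i:(i + n - 1)], matchseq),
                   logical(1))

  # Build one string per matching start position
  vapply(starts[is_hit],
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

find_tag_sequences(tagged, c("NNP", "VBZ", "RB", "VBG"))
find_tag_sequences(tagged, c("DT", "NN", "VBZ"))
```

Because the words are taken from the tagged string rather than from the space-split input, punctuation tokens such as `,` keep the word and tag indices aligned. With real data, `tagged` would be the `postagged` element returned by the tagging function.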
