Web Scraping with xpathSApply (R) - Only text without a class


I'm trying to extract text from the following structure:

<p class="id1"> title or something </p>     <p> text text text </p> <p> more text </p> <p class="id2"> something else </p>

When I use:

text_info <- xpathSApply(parsed, "//p", xmlValue)

the result is:

[1] 'title or something' [2] 'text text text' [3] 'more text' [4] 'something else' 

I want only the text inside the <p> elements that have no class:

[1] 'text text text' [2] 'more text' 

I'm currently using the following code, but it takes a long time because I have many texts:

text_info <- setdiff(xpathSApply(parsed, "//p", xmlValue), xpathSApply(parsed, "//p[@class]", xmlValue))

Is there a way to extract only the paragraphs that have no class with a single xpathSApply call?
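Besides needing two queries, the setdiff() approach compares text values rather than nodes, and setdiff() returns unique values. The sketch below assumes the XML package and a hypothetical input in which two unclassed paragraphs share the same text; it illustrates that the repeated paragraph collapses into a single entry (and an unclassed paragraph whose text happened to match a classed one would be dropped too).

```r
library(XML)

# Hypothetical input: two unclassed paragraphs with identical text
html <- '<p class="id1"> title </p><p> same text </p><p> same text </p><p class="id2"> other </p>'
doc <- htmlParse(html, asText = TRUE)

all_p     <- xpathSApply(doc, "//p", xmlValue, trim = TRUE)
classed_p <- xpathSApply(doc, "//p[@class]", xmlValue, trim = TRUE)

# setdiff() works on the text values and de-duplicates its result,
# so the two "same text" paragraphs end up as one element
via_setdiff <- setdiff(all_p, classed_p)
print(via_setdiff)
# [1] "same text"
```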

You can use not() in XPath.

xpathSApply(doc, "//p[not(@class)]", xmlValue, trim = TRUE)
# [1] "text text text" "more text"

This selects only the <p> elements that have no class attribute.

Data:

library(XML)
doc <- htmlParse('<p class="id1"> title or something </p>     <p> text text text </p> <p> more text </p> <p class="id2"> something else </p>', asText = TRUE)
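Putting the data and the query together, a self-contained sketch might look like this. The helper name unclassed_text() and the second, duplicated input are hypothetical and only there for illustration; trim = TRUE is passed through xpathSApply() to xmlValue().

```r
library(XML)

# Hypothetical helper: text of every <p> that has no class attribute
unclassed_text <- function(doc) {
  xpathSApply(doc, "//p[not(@class)]", xmlValue, trim = TRUE)
}

doc <- htmlParse('<p class="id1"> title or something </p>     <p> text text text </p> <p> more text </p> <p class="id2"> something else </p>', asText = TRUE)
print(unclassed_text(doc))
# [1] "text text text" "more text"

# Hypothetical input with repeated text: the node-level filter keeps both
dup <- htmlParse('<p> same text </p><p> same text </p>', asText = TRUE)
print(unclassed_text(dup))
# [1] "same text" "same text"
```

Because the filter is part of the XPath expression, the unclassed nodes are selected in one query, so repeated paragraphs are kept and there is no second query or value comparison as with setdiff().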
