xml - Web Scraping with xpathSApply (R)

I'm trying to extract text from the following structure:

<p class="id1"> title or something </p> <p> text text text </p> <p> more text </p> <p class="id2"> something else </p>

When I use:

text_info <- xpathSApply(parsed, "//p", xmlValue)

The result is:

[1] 'title or something' [2] 'text text text' [3] 'more text' [4] 'something else'

I only want the text inside the <p> elements that have no class:

[1] 'text text text' [2] 'more text'

I'm using the following code, but it takes a long time because I have many texts:

text_info <- setdiff(xpathSApply(parsed, "//p", xmlValue), xpathSApply(parsed, "//p[@class]", xmlValue))

Is there a way to extract only the <p> elements that have no class using a single xpathSApply call?

You can use not() in XPath.

xpathSApply(doc, "//p[not(@class)]", xmlValue, trim = TRUE)
# [1] "text text text" "more text"

This selects only the <p> elements that have no class attribute.

Data:

library(XML)
doc <- htmlParse('<p class="id1"> title or something </p> <p> text text text </p> <p> more text </p> <p class="id2"> something else </p>')
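
One thing worth checking before swapping the two approaches: setdiff() compares text values rather than nodes, so it can return fewer items than the not() query. The sketch below uses a made-up HTML snippet (the `html` string and the variable names are illustrative assumptions, not taken from the question) containing a repeated unclassed paragraph to show the difference.

```r
library(XML)

# Hypothetical sample with the same unclassed paragraph appearing twice
html <- '<p class="id1"> title </p><p> same text </p><p> same text </p><p class="id2"> other </p>'
doc <- htmlParse(html, asText = TRUE)

# Single query: every <p> without a class attribute, duplicates kept
no_class <- xpathSApply(doc, "//p[not(@class)]", xmlValue, trim = TRUE)

# Two-query approach from the question: set difference on the text values
all_p <- xpathSApply(doc, "//p", xmlValue, trim = TRUE)
with_class <- xpathSApply(doc, "//p[@class]", xmlValue, trim = TRUE)
via_setdiff <- setdiff(all_p, with_class)

length(no_class)     # two paragraphs selected
length(via_setdiff)  # one value left, because setdiff() drops duplicates
```

For the same reason, setdiff() would also drop an unclassed paragraph whose text happens to match a classed one, so the not() predicate is likely the closer match to "paragraphs with no class" in addition to needing only one pass over the document.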
