xml - Web scraping with xpathSApply (R): only text without a class
I'm trying to extract text from the following structure:
<p class="id1"> title or something </p> <p> text text text </p> <p> more text </p> <p class="id2"> something else </p>
When I use:
text_info <- xpathSApply(parsed, "//p", xmlValue)
The result is:
[1] 'title or something' [2] 'text text text' [3] 'more text' [4] 'something else'
I want only the text inside <p> elements that have no class:
[1] 'text text text' [2] 'more text'
I'm using the following code, but it takes a long time and I have many texts:
text_info <- setdiff(xpathSApply(parsed, "//p", xmlValue), xpathSApply(parsed, "//p[@class]", xmlValue))
Is there a way to extract only the elements that have no class using a single xpathSApply call?
You can use not() in XPath.
xpathSApply(doc, "//p[not(@class)]", xmlValue, trim = TRUE)
# [1] "text text text" "more text"
This selects the <p> elements that have no class attribute.
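As a small sketch of what the predicate matches: not(@class) tests for the absence of the attribute, so an element with an empty class="" still counts as having one. The sample markup and the names demo, no_class and no_or_empty below are made up for illustration and are not from the question.

```r
library(XML)

# Hypothetical sample markup, not the data from the question
demo <- htmlParse(
  '<p class="a">classed</p><p class="">empty class</p><p>plain</p>',
  asText = TRUE
)

# Only <p> elements that carry no class attribute at all
no_class <- xpathSApply(demo, "//p[not(@class)]", xmlValue, trim = TRUE)

# Variant that also treats an empty class="" as "no class"
no_or_empty <- xpathSApply(demo, "//p[not(@class) or @class='']",
                           xmlValue, trim = TRUE)
```

If your pages may contain empty class attributes, the second expression is probably the one you want; otherwise the shorter predicate is enough.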
Data:
library(XML)
doc <- htmlParse('<p class="id1"> title or something </p> <p> text text text </p> <p> more text </p> <p class="id2"> something else </p>')
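Speed aside, the setdiff() approach from the question can also give different results, because it compares the extracted strings rather than the nodes: it removes duplicates and drops any unclassed paragraph whose text also appears in a classed one. A minimal sketch, assuming a made-up document (dup and the variable names are hypothetical):

```r
library(XML)

# Hypothetical markup with repeated text, to expose the difference
dup <- htmlParse(
  '<p class="id1">same</p><p>same</p><p>other</p><p>other</p>',
  asText = TRUE
)

all_p   <- xpathSApply(dup, "//p", xmlValue, trim = TRUE)
classed <- xpathSApply(dup, "//p[@class]", xmlValue, trim = TRUE)

# String-based difference: loses the unclassed "same" and the repeated "other"
via_setdiff <- setdiff(all_p, classed)

# Node-based selection: keeps every <p> without a class, in document order
via_xpath <- xpathSApply(dup, "//p[not(@class)]", xmlValue, trim = TRUE)
```

Here via_setdiff holds a single element while via_xpath holds three, so the single XPath query is both the faster and the safer choice when texts can repeat.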