python - In BeautifulSoup, Ignore Children Elements While Getting Parent Element Data

- September 15, 2010

I have HTML as follows:

<html>
    <div class="maindiv">
        text data here
        <br>
        continued text data
        <br>
        <div class="somename">
            text & data I want to omit
        </div>
    </div>
</html>

I am trying to get only the text found in the maindiv element, without getting the text data found in the somename element. In most cases, in my experience anyway, text data is contained within some child element. However, I have run into this particular case where the data seems to be contained willy-nilly and is a bit harder to filter.

My approach is as follows:

textdata = soup.find('div', class_='maindiv').get_text()

This gets all the text data found within the maindiv element, including the text data found in the somename div element.

The logic I'd like to use is more along the lines of: textdata = soup.find('div', class_='maindiv').get_text(recursive=False), which would omit any text data found within the somename element.

I know the recursive=False argument works for locating only parent-level elements when searching the DOM structure using BeautifulSoup, but it can't be used with the .get_text() method.
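Although .get_text() does not accept recursive=False, find_all() does, and it can match text nodes as well as tags. The following is a minimal sketch, assuming Beautiful Soup 4.4.0 or later (where the string argument is available; older versions call it text) and the built-in html.parser backend. The HTML constant is a stand-in for the markup in the question, with the ampersand escaped.

```python
from bs4 import BeautifulSoup

# Stand-in for the question's markup (ampersand escaped as &amp;)
HTML = """
<html>
    <div class="maindiv">
        text data here
        <br>
        continued text data
        <br>
        <div class="somename">
            text &amp; data to omit
        </div>
    </div>
</html>
"""

soup = BeautifulSoup(HTML, "html.parser")
maindiv = soup.find("div", class_="maindiv")

# string=True matches text nodes; recursive=False restricts the
# search to direct children of maindiv, so nested text is skipped.
direct_strings = maindiv.find_all(string=True, recursive=False)

# Drop whitespace-only nodes and trim the remaining ones.
pieces = [s.strip() for s in direct_strings if s.strip()]
textdata = " ".join(pieces)
print(textdata)  # text data here continued text data
```

This leaves the parsed tree untouched, but it only sees text that sits directly inside maindiv; text wrapped in an inline child such as a span would be skipped too.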

I've considered the approach of finding all the text, then subtracting the string data found in the somename element from the string data found in the maindiv element, but I'm looking for something a little more efficient.
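For comparison, the subtraction approach described above might look roughly like the sketch below. It is an illustration rather than the asker's actual code: the HTML constant stands in for the question's markup, and str.replace with a count of 1 is one possible way to do the subtraction.

```python
from bs4 import BeautifulSoup

# Stand-in for the question's markup (ampersand escaped as &amp;)
HTML = """
<html>
    <div class="maindiv">
        text data here
        <br>
        continued text data
        <br>
        <div class="somename">
            text &amp; data to omit
        </div>
    </div>
</html>
"""

soup = BeautifulSoup(HTML, "html.parser")
maindiv = soup.find("div", class_="maindiv")
somename = maindiv.find("div", class_="somename")

full_text = maindiv.get_text()    # includes the child's text
child_text = somename.get_text()  # the part to subtract

# Remove the first occurrence of the child's text from the full text.
# Fragile: if the same text also appears earlier in maindiv,
# the wrong occurrence is removed.
remaining = full_text.replace(child_text, "", 1)

# Collapse runs of whitespace left over from the markup's indentation.
subtracted = " ".join(remaining.split())
print(subtracted)  # text data here continued text data
```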

Not that far from your subtracting method, but one way (at least in Python 3) is to discard all the child divs.

s = soup.find('div', class_='maindiv')

# Remove every nested div (and its contents) from the tree
for child in s.find_all("div"):
    child.decompose()

print(s.get_text())

This would print something like:

text data here          continued text data

That might be a bit more efficient and flexible than subtracting strings, though it still needs to go through the children first.
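One thing to keep in mind is that decompose() permanently removes the child divs from the parsed tree. If the rest of the script still needs them, a variation is to work on a copy. The sketch below assumes Beautiful Soup 4.4.0 or later, where copy.copy() on a Tag returns a detached copy; the helper name text_without_child_divs is hypothetical, and the HTML constant stands in for the question's markup.

```python
import copy

from bs4 import BeautifulSoup

# Stand-in for the question's markup (ampersand escaped as &amp;)
HTML = """
<html>
    <div class="maindiv">
        text data here
        <br>
        continued text data
        <br>
        <div class="somename">
            text &amp; data to omit
        </div>
    </div>
</html>
"""


def text_without_child_divs(tag):
    """Return tag's text with nested divs excluded (hypothetical helper)."""
    clone = copy.copy(tag)  # detached copy; the original tree is not modified
    for child in clone.find_all("div"):
        child.decompose()
    # Collapse the whitespace left over from the markup's indentation.
    return " ".join(clone.get_text().split())


soup = BeautifulSoup(HTML, "html.parser")
maindiv = soup.find("div", class_="maindiv")

result = text_without_child_divs(maindiv)
print(result)  # text data here continued text data

# The somename div is still present in the original soup.
print(soup.find("div", class_="somename") is not None)  # True
```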
