python - In BeautifulSoup, Ignore Children Elements While Getting Parent Element Data
I have HTML as follows:
    <html>
    <div class="maindiv"> text data here <br> continued text data <br>
    <div class="somename"> text & data I want to omit </div>
    </div>
    </html>
I am trying to get only the text found in the maindiv element, without getting the text data found in the somename element. In most cases, in my experience anyway, text data is contained within some child element. I have run into this particular case where the data seems to be contained willy-nilly, which makes it a bit harder to filter.
My approach is as follows:
    textdata = soup.find('div', class_='maindiv').get_text()
This gets all the text data found within the maindiv element, including the text data found in the somename div element.
The logic I'd like to use is more along the lines of:
    textdata = soup.find('div', class_='maindiv').get_text(recursive=False)
which would omit any text data found within the somename element.
I know the recursive=False argument works for locating only parent-level elements when searching the DOM structure using BeautifulSoup, but it can't be used with the .get_text() method.
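To illustrate what recursive=False does when searching, here is a minimal sketch. The markup and the inner class name are made up for the illustration and are not from the page above; it assumes the bs4 package and the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: somename is a direct child of maindiv,
# and inner (an invented class name) is nested one level deeper.
html = """
<div class="maindiv">
  <div class="somename"><div class="inner">nested</div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
main = soup.find("div", class_="maindiv")

# The default search descends through every level below maindiv.
all_divs = main.find_all("div")

# recursive=False restricts the search to maindiv's direct children.
direct_divs = main.find_all("div", recursive=False)

print(len(all_divs), len(direct_divs))
```

The first search finds both nested divs, while the second finds only somename.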
I've thought of the approach of finding all the text, then subtracting the string data found in the somename element from the string data found in the maindiv element, but I'm looking for something a little more efficient.
This is not that far from your subtracting method, but one way to do it (at least in Python 3) is to discard all the child divs.
    s = soup.find('div', class_='maindiv')
    # Remove every div nested inside maindiv from the tree (this modifies soup)
    for child in s.find_all("div"):
        child.decompose()
    print(s.get_text())
This would print something like:
text data here continued text data
That might be a bit more efficient and flexible than subtracting strings, though it still needs to go through all the children first.
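If modifying the tree is a concern, one possible alternative is to collect only the text nodes that are direct children of maindiv. The sketch below is not part of the answer above; it assumes the bs4 package with the built-in html.parser, writes the ampersand as &amp; to keep the sample markup valid, and joins the pieces with a single space as an arbitrary choice.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question.
html = """<html> <div class="maindiv"> text data here <br> continued text data <br>
<div class="somename"> text &amp; data I want to omit </div> </div> </html>"""

soup = BeautifulSoup(html, "html.parser")
main = soup.find("div", class_="maindiv")

# string=True matches text nodes only; recursive=False limits the search
# to maindiv's direct children, so text inside somename is never visited.
direct_text = main.find_all(string=True, recursive=False)

# Drop whitespace-only nodes and join the rest with a single space.
textdata = " ".join(s.strip() for s in direct_text if s.strip())
print(textdata)  # text data here continued text data
```

Unlike decompose(), this leaves the somename element in the soup, so it can still be queried afterwards.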