python - Regular expressions in Beautiful Soup don't work
I'm trying to find a tag whose class attribute contains the substring: borderbox flightbox p2.
For example, this one: <div class="borderbox flightbox p2 my-repeat-animation ng-scope"...
So I suppose this should work:
soup.find_all('div',class_=re.compile(r"borderbox flightbox p2"+".*"))
But it can't find anything. Do you have any advice?
This should do what you want:
def match_tag(tag, classes):
    # True only for a div that carries every class listed in `classes`
    return (tag.name == 'div'
            and 'class' in tag.attrs
            and all(c in tag['class'] for c in classes))

divs = soup.find_all(lambda t: match_tag(t, ['borderbox', 'flightbox', 'p2']))
In BeautifulSoup 4, a regex passed as the class_ argument is applied to each CSS class individually. That is, BeautifulSoup checks each individual CSS class held by the div to see whether it matches the regular expression you gave it. Put in code, it's doing something like:
for css_class in div['class']:
    if regexp.search(css_class):
        yield div
Of course, no individual class you have is going to match your regex; 'borderbox flightbox p2' is not found in 'borderbox', 'flightbox', or 'p2'.
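To see the per-class check described above in isolation, here is a small sketch that uses only the standard re module. It does not call BeautifulSoup at all; the css_classes list is just an assumption about what div['class'] would hold for the example tag in the question (the attribute value split on whitespace).

```python
import re

# Assumed stand-in for div['class'] on the tag from the question:
# the class attribute split into individual class names.
css_classes = "borderbox flightbox p2 my-repeat-animation ng-scope".split()

# The pattern from the question.
pattern = re.compile(r"borderbox flightbox p2" + ".*")

# Per-class check: the pattern is tried against one class name at a time,
# and no single name contains the whole three-word string.
per_class_hits = [c for c in css_classes if pattern.search(c)]
print(per_class_hits)  # []

# The same pattern does match the unsplit attribute string, which is
# why the regex looks like it ought to work.
joined_hit = pattern.search(" ".join(css_classes)) is not None
print(joined_hit)  # True
```

So the regex itself is fine; it is simply being tested against pieces that are too small to contain it.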
The solution is to use BeautifulSoup's ability to take a function to do the matching for you. match_tag checks to see that (1) the tag is a div, and (2) the tag has every CSS class specified in the argument classes.
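Here is a rough end-to-end sketch of the function-based match, run against some made-up HTML. The sample markup and the choice of the built-in "html.parser" are assumptions for illustration, not something from the question. The last lines show a CSS selector via soup.select as a possible alternative; that is not part of the answer above, just another way to require several classes at once.

```python
from bs4 import BeautifulSoup

# Made-up sample markup: only the first div has all three classes.
html = """
<div class="borderbox flightbox p2 my-repeat-animation ng-scope">match</div>
<div class="borderbox p2">missing flightbox</div>
<span class="borderbox flightbox p2">not a div</span>
<div>no class attribute</div>
"""

soup = BeautifulSoup(html, "html.parser")


def match_tag(tag, classes):
    # True only for a div that carries every class listed in `classes`.
    return (tag.name == 'div'
            and 'class' in tag.attrs
            and all(c in tag['class'] for c in classes))


# find_all calls the lambda once per tag and keeps the tags it accepts.
divs = soup.find_all(lambda t: match_tag(t, ['borderbox', 'flightbox', 'p2']))
print([d.get_text() for d in divs])  # ['match']

# Alternative (not from the answer above): a CSS selector that chains
# the three class names on a div.
selected = soup.select('div.borderbox.flightbox.p2')
print([d.get_text() for d in selected])  # ['match']
```

Neither approach cares about the order of the classes in the attribute or about extra classes such as ng-scope.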