python - Regular expressions in Beautiful Soup don't work


I'm trying to find a tag whose class attribute contains the substring borderbox flightbox p2.

For example, this one: <div class="borderbox flightbox p2 my-repeat-animation ng-scope"...

So I suppose this should work:

soup.find_all('div',class_=re.compile(r"borderbox flightbox p2"+".*")) 

But it can't find anything. Do you have any advice?

This should do what you want:

def match_tag(tag, classes):
    # True only for a div that has a class attribute containing every requested class
    return (tag.name == 'div'
            and 'class' in tag.attrs
            and all(c in tag['class'] for c in classes))

divs = soup.find_all(lambda t: match_tag(t, ['borderbox', 'flightbox', 'p2']))

In BeautifulSoup 4, a regex passed as the class_ argument is applied to each CSS class individually. That is, BeautifulSoup checks each individual CSS class held by the div to see whether it matches the regular expression you gave it. Put in code, it's doing something like:

for css_class in div['class']:
    if regexp.search(css_class):
        yield div

Of course, no individual class you have is going to match your regex; 'borderbox flightbox p2' is not found in any of 'borderbox', 'flightbox', or 'p2'.
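To make that concrete, here is a minimal sketch using only the standard re module. It imitates the per-class check described above rather than calling BeautifulSoup itself, and the names per_class_hits and single are made up for illustration.

```python
import re

# The pattern from the question, and the class list of the example div.
pattern = re.compile(r"borderbox flightbox p2" + ".*")
classes = "borderbox flightbox p2 my-repeat-animation ng-scope".split()

# Test each class on its own: no single class contains the full
# space-separated string, so nothing matches.
per_class_hits = [c for c in classes if pattern.search(c)]
print(per_class_hits)  # []

# A pattern aimed at one class name does match an individual class.
single = re.compile(r"^flight")
single_hits = [c for c in classes if single.search(c)]
print(single_hits)  # ['flightbox']
```

So a regex can help when you want to match part of one class name, but it is the wrong tool for requiring several classes at once.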

The solution is to use BeautifulSoup's ability to take a function to do the matching for you. match_tag checks to see that (1) the tag is a div, and (2) the tag has every CSS class specified by the argument classes.
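If you want to try it end to end, here is a small self-contained sketch. The sample markup is made up for the test (only the first div is modeled on the one in the question), and it assumes bs4 is installed and uses the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the first div should match.
html = """
<div class="borderbox flightbox p2 my-repeat-animation ng-scope">match</div>
<div class="borderbox flightbox">missing p2</div>
<span class="borderbox flightbox p2">not a div</span>
<div>no class attribute</div>
"""

soup = BeautifulSoup(html, "html.parser")


def match_tag(tag, classes):
    """Return True for a div that carries every class in `classes`."""
    return (tag.name == 'div'
            and 'class' in tag.attrs
            and all(c in tag['class'] for c in classes))


divs = soup.find_all(lambda t: match_tag(t, ['borderbox', 'flightbox', 'p2']))
texts = [d.get_text() for d in divs]
print(texts)  # ['match']
```

The order of the classes in the attribute doesn't matter here, since each required class is looked up in the tag's class list separately.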

