Python 2.7 - Combine stripping white space and HTML tags


I'm looking for a way to strip both HTML tags and white space from text parsed with Beautiful Soup. The problem is that I can't combine the two.

Here is the whole script:

    # -*- coding: utf-8 -*-
    from urllib2 import urlopen
    from bs4 import BeautifulSoup as bs

    word = "drop"
    url = ('http://civil.ge/eng/category.php?id=10')
    soup = bs(urlopen(url).read())
    # find() returns only the first matching block
    titz = soup.find("div", {"class": "archtype_category_block"})

    for t in titz.find_all('div', {'class': 'archive_type_article_title'}):
        if word in t.encode('utf-8').strip():
            print t.prettify()

The result of prettify() is:

<div class="archive_type_article_title">  prosecutors drop objection release of ex-mod officials pretrial     detention </div> 

And get_text() gives clean text, but with lots of white space before and after it. Are there any solutions for this?

Thanks!

I used Python 3 and wasn't able to reproduce the spacing problem. Maybe that is the answer!

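I would change print t.prettify() to print ' '.join(t.get_text().split()) and see if that fixes the problem. get_text() drops the tags, split() with no arguments breaks the text on any run of white space (including the leading and trailing newlines), and ' '.join() puts the words back together with single spaces.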
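To show the white space part on its own, here is a minimal sketch that doesn't need Beautiful Soup at all. The function name collapse_whitespace and the sample string are made up for illustration; the string only imitates what get_text() might return for the title in the question.

```python
def collapse_whitespace(text):
    """Return text with every run of white space reduced to one space."""
    # split() with no arguments splits on any white space (spaces, tabs,
    # newlines) and discards empty pieces, so leading and trailing white
    # space disappears as well.
    return " ".join(text.split())


# Hypothetical sample, imitating get_text() output for the title above.
raw = "\n  prosecutors drop objection release of ex-mod officials pretrial     detention \n"
clean = collapse_whitespace(raw)
print(repr(clean))
```

Plain strip() would only trim the ends and leave the run of spaces before "detention" in place, which is why the split/join pair is used here.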

Also, your code only gets the first archtype_category_block. Maybe that is what you want, but if you want all of them you have to change titz = soup.find("div", {"class": "archtype_category_block"}) to for titz in soup.find_all("div", {"class": "archtype_category_block"}): and indent the inner loop under it.
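Putting both suggestions together, a rough sketch could look like the following. It assumes bs4 is installed, and it runs against a small made-up HTML string (SAMPLE_HTML) instead of the live page, so the markup is only my guess at the page structure based on the class names in the question. The function name matching_titles is also hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up HTML that imitates the structure described in the question.
SAMPLE_HTML = """
<div class="archtype_category_block">
  <div class="archive_type_article_title">
    prosecutors drop   objection
  </div>
</div>
<div class="archtype_category_block">
  <div class="archive_type_article_title">
    another   headline
  </div>
  <div class="archive_type_article_title">
    charges will drop
  </div>
</div>
"""


def matching_titles(html, word):
    """Collect cleaned-up titles containing word from every category block."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # find_all() visits every block, not just the first one.
    for block in soup.find_all("div", {"class": "archtype_category_block"}):
        for t in block.find_all("div", {"class": "archive_type_article_title"}):
            # get_text() drops the tags; split/join collapses the white space.
            text = " ".join(t.get_text().split())
            if word in text:
                titles.append(text)
    return titles


print(matching_titles(SAMPLE_HTML, "drop"))
```

Checking the word against the cleaned text rather than the encoded tag also avoids matching on attribute names or class values by accident.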

