python - How can I scrape the correct number of URLs from an infinite-scroll webpage?


I am trying to scrape URLs from a webpage. I am using this code:

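    from bs4 import BeautifulSoup
    import urllib2  # Python 2 only; on Python 3 use urllib.request instead

    # Fetch the category page and parse the returned HTML.
    url = urllib2.urlopen("http://www.barneys.com/barneys-new-york/men/clothing/shirts/dress/classic#sz=176&pageviewchange=true")
    content = url.read()
    soup = BeautifulSoup(content, "html.parser")

    # Every product thumbnail is an <a> tag with the class "thumb-link".
    links = soup.find_all("a", {"class": "thumb-link"})
    for link in links:
        print(link.get('href'))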

But I'm getting only 48 links in the output instead of 176. What am I doing wrong?

What I did was use Postman's Interceptor feature to look at the call the website made each time it loaded the next set of 36 shirts. From there I replicated those calls in code. It looks like you can't dump all 176 items at once, so I requested them 36 at a time, the way the website did.
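One detail that helps explain the difference between the two URLs: in the question's URL, `sz=176` sits after the `#`, which makes it part of the fragment. A fragment is handled by the browser (here, presumably by the page's JavaScript) and is never sent to the server, so a plain HTTP request only ever receives the initial batch of products. The short sketch below uses Python 3's standard `urllib.parse` to show what the server actually sees. The shortened answer-style URL is only an illustration, not a verified endpoint.

```python
from urllib.parse import urlsplit, parse_qs

# URL from the question: the parameters come after "#", so they are a fragment.
question_url = ("http://www.barneys.com/barneys-new-york/men/clothing/shirts/"
                "dress/classic#sz=176&pageviewchange=true")

# Answer-style URL (shortened for illustration): parameters come after "?".
answer_url = ("http://www.barneys.com/barneys-new-york/men/clothing/shirts/"
              "dress/classic?start=1&format=page-element&sz=36")

q = urlsplit(question_url)
a = urlsplit(answer_url)

print(repr(q.query))      # '' (nothing is sent to the server as a parameter)
print(q.fragment)         # sz=176&pageviewchange=true (stays in the browser)
print(parse_qs(a.query))  # the parameters the server really receives
```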

    from bs4 import BeautifulSoup
    import requests

    urls = []

    # Replicate the site's own requests, which load the shirts 36 at a time.
    for i in range(1, 5):
        offset = 36 * i
        r = requests.get('http://www.barneys.com/barneys-new-york/men/clothing/shirts/dress/classic?start=1&format=page-element&sz={}&_=1434647715868'.format(offset))
        soup = BeautifulSoup(r.text, "html.parser")

        links = soup.find_all("a", {"class": "thumb-link"})

        for link in links:
            # Stop collecting once all 176 product links have been gathered.
            if len(urls) < 176:
                print(link.get('href'))
                urls.append(link.get('href'))
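A possible variation on the loop above, as a rough sketch only: rather than hard-coding the number of requests, keep asking for the next batch until the target count is reached or a batch contains nothing new, and skip any link already seen. The names `ThumbLinkParser`, `collect_links`, `fetch_page`, and `fake_fetch` are hypothetical, and the standard-library `html.parser` is swapped in for BeautifulSoup so the example runs without extra packages (Python 3). How the real endpoint interprets `start` and `sz` is an assumption here, so the network call is replaced by a stand-in that generates HTML.

```python
from html.parser import HTMLParser


class ThumbLinkParser(HTMLParser):
    """Collects href values from <a class="thumb-link"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "thumb-link" in classes and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def collect_links(fetch_page, page_size=36, limit=176):
    """Fetch batches until `limit` unique links exist or a batch adds nothing new."""
    seen = []
    start = 0
    while len(seen) < limit:
        parser = ThumbLinkParser()
        parser.feed(fetch_page(start, page_size))
        new = [h for h in parser.hrefs if h not in seen]
        if not new:
            break  # the site has no more items to offer
        seen.extend(new)
        start += page_size
    return seen[:limit]


def fake_fetch(start, size, total=100):
    """Hypothetical stand-in for the HTTP request: returns HTML for one batch."""
    stop = min(start + size, total)
    return "".join(
        '<a class="thumb-link" href="/shirt/{}">x</a>'.format(n)
        for n in range(start, stop)
    )


links = collect_links(fake_fetch)
print(len(links))  # 100, because the stand-in only has 100 items
```

In real use, `fetch_page` would wrap a `requests.get` call to the endpoint found with Postman and return `r.text`. The stop-on-empty check means the loop ends cleanly even if the catalogue holds fewer items than expected.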
