python - How can I scrape the correct number of URLs from an infinite-scroll webpage?
I'm trying to scrape URLs from a webpage using this code:
from bs4 import BeautifulSoup
import urllib2

url = urllib2.urlopen("http://www.barneys.com/barneys-new-york/men/clothing/shirts/dress/classic#sz=176&pageviewchange=true")
content = url.read()
soup = BeautifulSoup(content)
links = soup.find_all("a", {"class": "thumb-link"})
for link in links:
    print(link.get('href'))
But I'm only getting 48 links in the output instead of 176. What am I doing wrong?
What I did was use Postman's Interceptor feature to look at the call the website made each time it loaded the next set of 36 shirts, and then replicated those calls in my code. I couldn't dump all 176 items at once, so I replicated the 36-at-a-time requests the website itself made.
from bs4 import BeautifulSoup
import requests

urls = []
for i in range(1, 5):
    offset = 36 * i
    r = requests.get('http://www.barneys.com/barneys-new-york/men/clothing/shirts/dress/classic?start=1&format=page-element&sz={}&_=1434647715868'.format(offset))
    soup = BeautifulSoup(r.text)
    links = soup.find_all("a", {"class": "thumb-link"})
    for link in links:
        if len(urls) < 176:
            print(link.get('href'))
            urls.append(link.get('href'))
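Because each lazy-load response can overlap the previous one (the `sz` parameter grows cumulatively), collecting hrefs and deduplicating them is safer than counting raw links. A minimal, self-contained sketch of that idea; the helper name `dedupe_links` is illustrative, and the 176 cap comes from the page size in the question:

```python
def dedupe_links(hrefs, limit=176):
    """Keep each href once, in first-seen order, up to `limit` items.

    Overlapping lazy-load responses repeat earlier products, so a plain
    list of every <a class="thumb-link"> href will contain duplicates.
    """
    seen = set()
    out = []
    for h in hrefs:
        if h not in seen:
            seen.add(h)
            out.append(h)
            if len(out) == limit:
                break
    return out

# Example: overlapping responses, capped at 3 unique links.
print(dedupe_links(["/a", "/b", "/a", "/c", "/d"], limit=3))  # ['/a', '/b', '/c']
```

Feeding the hrefs from every response through this filter gives exactly the unique product URLs, regardless of how much the paginated responses overlap.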