python - Scrapy only outputting an open bracket

- February 15, 2010

I'm trying to scrape the title and URL of Khan Academy pages under the math/science/economics sections. However, it is only outputting an open bracket, and before that it only happened to scrape the start URL.

    from openbar_index.items import OpenBarIndexItem
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


    class OpenBarSpider(CrawlSpider):
        """
        Scrapes website URLs for educational websites and commits the
        URLs/webpage names/text to a document.
        """

        name = 'openbar'
        allowed_domains = 'khanacademy.org'
        start_urls = [
            "https://www.khanacademy.org"
        ]

        rules = [
            Rule(SgmlLinkExtractor(allow=['/math/']), callback='parse_item', follow=True),
            Rule(SgmlLinkExtractor(allow=['/science/']), callback='parse_item', follow=True),
            Rule(SgmlLinkExtractor(allow=['/economics-finance-domain/']), callback='parse_item', follow=True)
        ]

        def parse_item(self, response):
            item = OpenBarIndexItem()
            url = response.url
            item['url'] = url
            item['title'] = response.xpath('/html/head/title/text()').extract()
            yield item

Does anyone have an idea why this is happening, or any tips on how to fix it?

The problem is the assignment of allowed_domains. According to the documentation, it must be a list, not a string. A string potentially results in requests being filtered by Scrapy as offsite requests, because there is no valid domain to match against.

So adding square brackets, as in the next line, should fix it:

    allowed_domains = ['khanacademy.org']
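To see why a bare string can leave no valid domain, here is a minimal sketch using only the standard library. It assumes the offsite filter builds one host regex by iterating over allowed_domains; the helpers `build_host_regex` and `is_allowed` are hypothetical names written for illustration and are not Scrapy's actual code. Iterating over a string yields single characters, so the pattern ends up allowing hosts like "k" or "g" rather than "khanacademy.org".

```python
import re


def build_host_regex(allowed_domains):
    # Hypothetical helper (not Scrapy's API): accept a host that equals,
    # or is a subdomain of, any entry produced by iterating allowed_domains.
    alternatives = "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(r"^(.*\.)?(%s)$" % alternatives)


def is_allowed(host, allowed_domains):
    # True when the host matches the regex built from allowed_domains.
    return bool(build_host_regex(allowed_domains).search(host))


host = "www.khanacademy.org"

# A string is iterated character by character: 'k', 'h', 'a', 'n', ...
chars = list("khanacademy.org")

as_string = is_allowed(host, "khanacademy.org")   # domains are single characters
as_list = is_allowed(host, ["khanacademy.org"])   # one real domain

print(chars[:4])   # ['k', 'h', 'a', 'n']
print(as_string)   # False
print(as_list)     # True
```

Under this assumption, every link extracted from the start page would be treated as offsite when allowed_domains is a string, which would match the symptom of only the start URL being scraped.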
