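Just check the official documentation. I would make one small change, so that the spider runs only when you execute
    python myscript.py
and not every time something is imported from that file. Just add an
    if __name__ == "__main__":
guard around the code that starts the crawl: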

    import scrapy
    from scrapy.crawler import CrawlerProcess

    class MySpider(scrapy.Spider):
        # Your spider definition
        pass

    if __name__ == "__main__":
        process = CrawlerProcess({
            'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
        })

        process.crawl(MySpider)
        process.start()  # the script will block here until the crawling is finished

Now save the file as myscript.py and run "python myscript.py".
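
If it is not obvious what the guard buys you, here is a small sketch that needs no Scrapy at all. It writes a throwaway file (the name myscript_demo.py, the crawl function and the ran flag are all made up for this illustration, with crawl standing in for process.crawl / process.start), then loads it once the way an import would and once the way "python file.py" would:

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# Hypothetical stand-in for myscript.py: crawl() takes the place of
# process.crawl(MySpider) followed by process.start().
SOURCE = '''
ran = False

def crawl():
    global ran
    ran = True
    print("crawling...")

if __name__ == "__main__":
    crawl()
'''


def load_as_import(path):
    """Load the file the way `import` does: __name__ is the module name."""
    spec = importlib.util.spec_from_file_location("myscript_demo", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.ran


def load_as_script(path):
    """Run the file the way `python file.py` does: __name__ is "__main__"."""
    namespace = runpy.run_path(str(path), run_name="__main__")
    return namespace["ran"]


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "myscript_demo.py"
    script.write_text(SOURCE)
    ran_on_import = load_as_import(script)   # guard is skipped
    ran_as_script = load_as_script(script)   # guard fires, crawl() runs

print("ran on import:", ran_on_import)
print("ran as script:", ran_as_script)
```

The import path leaves the flag untouched, while the script path sets it. The same thing happens with the real spider: other modules can import MySpider from myscript.py without kicking off a crawl.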



