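You need to set DOWNLOAD_DELAY in your project's settings.py. Note that you may also need to limit concurrency. The default per-domain concurrency is 8, so you are hitting the website with up to 8 simultaneous requests.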

# settings.py
DOWNLOAD_DELAY = 1
CONCURRENT_REQUESTS_PER_DOMAIN = 2

Starting with Scrapy 1.0, you can also place custom settings in the spider itself, so you could do something like this:
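
from scrapy import Spider


class DmozSpider(Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
    ]

    # Per-spider overrides of the project-wide settings
    custom_settings = {
        "DOWNLOAD_DELAY": 5,
        "CONCURRENT_REQUESTS_PER_DOMAIN": 2,
    }

Delay and concurrency are set per downloader slot, not per request. To check which values are actually in effect for your downloads, you could try something like this: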

# Spider method: logs the delay and concurrency of the slot that serves this site.
def parse(self, response):
    # The slot is looked up here by the site's hostname.
    slot = self.crawler.engine.downloader.slots["www.dmoz.org"]
    delay = slot.delay
    concurrency = slot.concurrency
    self.log("Delay {}, concurrency {} for request {}".format(delay, concurrency, response.request))
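
If you would rather not hardcode "www.dmoz.org" in that check, here is a minimal sketch of how the slot key could be derived instead. It assumes the default behaviour, where the key is the request's hostname unless a "download_slot" entry in the request meta overrides it. The helper name slot_key_for is hypothetical and not part of Scrapy.

```python
from urllib.parse import urlparse


def slot_key_for(url, meta=None):
    """Hypothetical helper: guess the downloader slot key for a URL.

    Assumption: the key is the hostname of the URL, unless the request
    meta carries an explicit "download_slot" value.
    """
    meta = meta or {}
    if "download_slot" in meta:
        return meta["download_slot"]
    # urlparse(...).hostname is None for URLs without a host part.
    return urlparse(url).hostname or ""


# Inside parse() you could then write, for example:
#   key = slot_key_for(response.url, response.meta)
#   slot = self.crawler.engine.downloader.slots[key]
```

This keeps the same check working when the spider crawls more than one domain. Treat it as a debugging aid only, since it reads the downloader's internals.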


