Building on Pablo Hoffman's solution, you can use the following decorator on the `process_item` method of a Pipeline object so that it checks the `pipeline` attribute of your spider to decide whether it should execute. For example:
```python
import functools

from scrapy import log


def check_spider_pipeline(process_item_method):

    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):

        # message template for debugging
        msg = '%%s %s pipeline step' % (self.__class__.__name__,)

        # if class is in the spider's pipeline, then use the
        # process_item method normally.
        if self.__class__ in spider.pipeline:
            spider.log(msg % 'executing', level=log.DEBUG)
            return process_item_method(self, item, spider)

        # otherwise, just return the untouched item (skip this step in
        # the pipeline)
        else:
            spider.log(msg % 'skipping', level=log.DEBUG)
            return item

    return wrapper
```
For this decorator to work correctly, the spider must have a `pipeline` attribute containing a collection of the pipeline objects you want to use to process its items, for example:
```python
from scrapy.spider import BaseSpider


class MySpider(BaseSpider):

    pipeline = set([
        pipelines.Save,
        pipelines.Validate,
    ])

    def parse(self, response):
        # insert scrapy goodness here
        return item
```
And then in a `pipelines.py` file:
```python
class Save(object):

    @check_spider_pipeline
    def process_item(self, item, spider):
        # do saving here
        return item


class Validate(object):

    @check_spider_pipeline
    def process_item(self, item, spider):
        # do validating here
        return item
```
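To see the skip/execute behavior in isolation, here is a minimal, self-contained sketch of the same decorator pattern using stub classes in place of Scrapy's spider and logging (the stubs and the dict-based items are assumptions for illustration only, not Scrapy's real API):

```python
import functools


def check_spider_pipeline(process_item_method):
    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):
        # run the step only if this pipeline class is opted into
        if self.__class__ in spider.pipeline:
            return process_item_method(self, item, spider)
        return item  # otherwise pass the item through untouched
    return wrapper


class Save(object):
    @check_spider_pipeline
    def process_item(self, item, spider):
        item['saved'] = True
        return item


class Validate(object):
    @check_spider_pipeline
    def process_item(self, item, spider):
        item['validated'] = True
        return item


class StubSpider(object):
    # this spider opts into only the Save step
    pipeline = set([Save])


spider = StubSpider()
item = {}
item = Save().process_item(item, spider)      # executes
item = Validate().process_item(item, spider)  # skipped
# item is now {'saved': True}; Validate never touched it
```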
All Pipeline objects should still be defined in `ITEM_PIPELINES` in settings, and in the correct order (it would be nice to change this so that the order could be specified on the spider instead).
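As a sketch of what that settings entry might look like (the module path `myproject.pipelines` is a placeholder for your own project's layout), modern Scrapy expresses `ITEM_PIPELINES` as a dict mapping each pipeline's import path to an order value:

```python
# settings.py (sketch; 'myproject' is a hypothetical project name)
# Lower numbers run first. In the older Scrapy versions that this
# answer's log API targets, ITEM_PIPELINES was a plain ordered list
# of import paths instead of a dict.
ITEM_PIPELINES = {
    'myproject.pipelines.Save': 100,
    'myproject.pipelines.Validate': 200,
}
```

Note that the `pipeline` set on the spider only controls *which* steps run; the run order always comes from this setting.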