
Calling the Scrapy framework in a loop raises twisted.internet.error.ReactorNotRestartable: the problem and a fix

For the past couple of days I have been stuck on a problem: I needed to call the Scrapy framework in a loop.

These are the approaches given in the official docs (but they only cover running multiple spiders in the same process, not restarting):

https://doc.scrapy.org/en/latest/topics/practices.html#running-multiple-spiders-in-the-same-process

from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
import logging
import time

import psycopg2

# Log to the console
configure_logging()
# CrawlerRunner reads the project settings from settings.py
runner = CrawlerRunner(get_project_settings())

@defer.inlineCallbacks
def crawl():
    logging.info("new cycle starting")
    yield runner.crawl('name')
    # Stop the reactor once this crawl finishes so reactor.run() returns
    reactor.stop()

def pgsql():
    conn = psycopg2.connect(database="1111", user="1111",
                            password="1111", host="1111", port="1111")
    cursor = conn.cursor()
    cursor.execute('SELECT id, subtidf_id, sp_detail, status, tasksum_id FROM public.stask WHERE position({} in subtidf_id) != 0;'.format("'netyicc'"))
    rows = cursor.fetchall()
    print(rows)
    for row in rows:
        if row[-2] == 101:
            cursor.execute('1111;'.format(row[0]))
            conn.commit()
            crawl()
            reactor.run()
            cursor.execute('111111;'.format(row[0]))
            conn.commit() 
    cursor.close()
    conn.close()


if __name__ == '__main__':
    while True:
        pgsql()
        time.sleep(60)

I started with the approach above, and it raised the exception: twisted.internet.error.ReactorNotRestartable

After three days of searching I could not fix it; eventually I learned that reactor.run() may only be called once per process.

So I worked around it with multiprocessing, and that succeeded.

This is the working code:

from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
import logging
import multiprocessing
import time

import psycopg2

# Log to the console
configure_logging()
# CrawlerRunner reads the project settings from settings.py
runner = CrawlerRunner(get_project_settings())

@defer.inlineCallbacks
def crawl():
    logging.info("new cycle starting")
    yield runner.crawl('name')
    # Stop the reactor once this crawl finishes so reactor.run() returns
    reactor.stop()

def pgsql():
    conn = psycopg2.connect(database="1111", user="1111",
                            password="1111", host="1111", port="1111")
    cursor = conn.cursor()
    cursor.execute('1111)!=0 ;'.format("'netyicc'"))
    rows = cursor.fetchall()
    print(rows)
    for row in rows:
        if row[-2] == 101:
            cursor.execute('1111;'.format(row[0]))
            conn.commit()
            crawl()
            reactor.run()
            cursor.execute('111111;'.format(row[0]))
            conn.commit() 
    cursor.close()
    conn.close()


if __name__ == '__main__':
    while True:
        # Run each cycle in a brand-new child process, so
        # reactor.run() executes at most once per process.
        process = multiprocessing.Process(target=pgsql)
        process.start()
        process.join()
        time.sleep(60)
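An alternative with the same effect, not from the original post: launch each crawl through the scrapy command-line tool in a child OS process, so the reactor's entire lifetime is contained in the child. A sketch; the spider name "name" is a placeholder, and the command is made injectable so the helper can be exercised without Scrapy installed:

```python
import subprocess

def crawl_once(cmd=("scrapy", "crawl", "name")):
    # Run one crawl in a child process. The Twisted reactor starts
    # and stops entirely inside that process, so the parent can call
    # this helper in a loop without hitting ReactorNotRestartable.
    result = subprocess.run(list(cmd))
    return result.returncode
```

In the loop above you would call crawl_once() in place of crawl() / reactor.run(); the trade-off is that each cycle pays the full startup cost of a new Python and Scrapy process.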

Reprinted from www.mshxw.com. Original article: https://www.mshxw.com/it/840573.html