我会考虑使用 自定义的Retry
Middleware
,它类似于内置的。
实施示例(未经测试):
import logginglogger = logging.getLogger(__name__)class RetryMiddleware(object): def process_response(self, request, response, spider): if 'var PageIsLoaded = false;' in response.body: logger.warning('parse_page encountered an incomplete rendering of {}'.format(response.url)) return self._retry(request) or response return response def _retry(self, request): logger.debug("Retrying %(request)s", {'request': request}) retryreq = request.copy() retryreq.dont_filter = True return retryreq并且不要忘记激活它。



