Multiple scrapy-redis instances cannot crawl at the same time

I start two scrapy tasks at the same time and then push a start_url into redis,
but only task A runs; task B only begins crawling after task A is stopped.

The reason seems to be that requests are not saved in redis while scrapy-redis is running; only the dupefilter is saved.
Requests are saved to redis only after the crawl is stopped with Ctrl+C.

Alternatively, task B only starts crawling after I push another start_url to redis.
What's going on?


version:
python 3.6
Scrapy (1.5.0)
scrapy-redis (0.6.8)

settings.py

SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER_PERSIST = True
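
For completeness, this is roughly what my setup looks like; the spider name, redis_key, URL, and redis host/port below are placeholders (a minimal sketch of a RedisSpider that waits for start URLs pushed to a redis list):

myspider.py (sketch)

from scrapy_redis.spiders import RedisSpider

class MySpider(RedisSpider):
    name = "myspider"
    # the spider idles and waits for URLs pushed to this redis list
    redis_key = "myspider:start_urls"

    def parse(self, response):
        # yield items / follow-up requests here
        yield {"url": response.url}

push_start_url.py (sketch)

import redis

r = redis.StrictRedis(host="localhost", port=6379)
# pushing a start URL; whichever idle spider pops it begins crawling
r.lpush("myspider:start_urls", "http://example.com")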
The reason I found: generating the next_url is too slow, i.e. each request is popped from redis almost immediately after it is pushed, so no serialized requests ever accumulate in redis, and the other scrapy instance has nothing to read and cannot crawl at the same time.
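
One way to check this is to watch the pending-request count in redis while both instances are running; given the cause above, it should stay at or near zero. A rough sketch (the key name assumes the default scrapy-redis priority queue, and host/port are placeholders):

watch_queue.py (sketch)

import time

import redis

r = redis.StrictRedis(host="localhost", port=6379)

while True:
    # with the default scrapy-redis priority queue, pending requests are
    # stored in a sorted set named "<spider>:requests"
    pending = r.zcard("myspider:requests")
    print("pending requests:", pending)
    time.sleep(1)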


So how can this problem be solved?
