When Scrapy crawls a page that returns a 404, the spider stops.

What should I do when Scrapy crawls a page and gets a 404 back?

http://www.example.com/artiles/1
http://www.example.com/artiles/2
.
http://www.example.com/artile.
For example, I want to grab a total of 20 pages like the ones above. Some of these pages do not exist and return 404, and then Scrapy stops.

problem description

the environmental background of the problem and what methods you have tried

related codes

http://www.example.com/artiles/1
http://www.example.com/artiles/2
.
http://www.example.com/artile.

what result do you expect? What is the error message actually seen?

How do I solve the problem of the spider stopping?

Aug.04,2021

You can catch the exception with try and except, and then skip the 404 URL.
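A minimal sketch of that idea, assuming the URL pattern from the question and a hypothetical spider name. Note that in Scrapy, a failed request is usually caught with an errback on the Request rather than a literal try/except; the errback plays the role of the except branch here:

import scrapy
from scrapy.spidermiddlewares.httperror import HttpError


class ArticleSpider(scrapy.Spider):
    name = "articles"  # hypothetical name, not from the question

    def start_requests(self):
        # request pages 1..20; some of them return 404
        for i in range(1, 21):
            yield scrapy.Request(
                f"http://www.example.com/artiles/{i}",
                callback=self.parse,
                errback=self.handle_error,
            )

    def parse(self, response):
        # normal parsing for pages that exist
        yield {"url": response.url}

    def handle_error(self, failure):
        # the "except" branch: log the 404 and move on,
        # so the spider keeps crawling the remaining URLs
        if failure.check(HttpError):
            response = failure.value.response
            self.logger.info("skipping %s (HTTP %s)", response.url, response.status)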


handle_httpstatus_list (official documentation): https://docs.scrapy.org/en/latest/topics/spider-middleware.html#std:reqmeta-handle_httpstatus_list

from scrapy.spiders import CrawlSpider

class MySpider(CrawlSpider):
    handle_httpstatus_list = [404]

This way you can process the 404 responses in the callback function you define for your Request.
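A minimal, runnable sketch of that approach, using the URL pattern from the question; the spider name is illustrative, and a plain scrapy.Spider is used instead of CrawlSpider for brevity:

import scrapy


class ArticleSpider(scrapy.Spider):
    name = "articles"  # hypothetical name
    handle_httpstatus_list = [404]
    start_urls = [f"http://www.example.com/artiles/{i}" for i in range(1, 21)]

    def parse(self, response):
        # with handle_httpstatus_list, 404 responses reach this callback
        # instead of being dropped, so we can skip them and keep crawling
        if response.status == 404:
            self.logger.info("page not found, skipping: %s", response.url)
            return
        yield {"url": response.url}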
