Scrapy only crawls the start_urls when using LinkExtractor.

The code is as follows. The start_urls pages are crawled and their data extracted, but the LinkExtractor does not match any other links:

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    }
    start_urls = [
        "https://chaoshi.detail.tmall.com/item.htm?id=576632421624&tbpm=3"
    ]
    rules = (
        Rule(LinkExtractor(allow=(r"https://chaoshi.detail.tmall.com/item.htm\?id=\d+&tbpm=3")),
             process_request="request_tagPage", callback="parse_item", follow=True),
    )

    def request_tagPage(self, request):
        newRequest = request.replace(headers=self.headers)
        return newRequest

    def parse_item(self, response):
        print(response.url)

To follow other links you need to use the CrawlSpider class; the rules attribute is ignored by the default scrapy.Spider. I don't know whether you are using the default Spider or which base class your spider inherits from.
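To illustrate the point, here is a minimal sketch that drops the snippet from the question into a CrawlSpider subclass. The class name TmallSpider and the spider name "tmall" are placeholders not present in the original post; everything else is taken from the question's code.

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class TmallSpider(CrawlSpider):  # rules are only processed by CrawlSpider, not scrapy.Spider
    name = "tmall"  # placeholder name

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    }
    start_urls = [
        "https://chaoshi.detail.tmall.com/item.htm?id=576632421624&tbpm=3",
    ]
    rules = (
        Rule(
            LinkExtractor(allow=r"https://chaoshi.detail.tmall.com/item.htm\?id=\d+&tbpm=3"),
            process_request="request_tagPage",
            callback="parse_item",
            follow=True,
        ),
    )

    def request_tagPage(self, request, response=None):
        # Re-issue the extracted request with the custom headers attached.
        # (Newer Scrapy versions also pass the originating response to
        # process_request, hence the optional second parameter.)
        return request.replace(headers=self.headers)

    def parse_item(self, response):
        print(response.url)

If the spider subclasses scrapy.Spider instead, the rules attribute is silently ignored, which matches the behaviour described in the question: only the start_urls are requested and no extracted links are followed.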
