Why doesn't the deny setting in CrawlSpider take effect?

The deny option is set in the Rule, but it does not take effect.

The code is as follows:

"123123":(
        Rule(LinkExtractor(allow="\d+-\d+-\d+/.*?-.*?.shtml", deny=("http://search.******.com.cn/.*?")),
         callback="parse_item", follow=True),
        Rule(LinkExtractor(allow="a[href^="http"]",deny_domains=("http://auto.******.com.cn")), follow=True)
        )

At runtime the debug log still shows the spider following links that should have been denied.

Mar.25,2022

You excluded 123123.com.cn and crawled sina.com.cn, right?

Remove the protocol header from deny_domains and try using the bare domain name directly. deny_domains expects host names, not URLs, so a value with "http://" will never match anything.
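
For example, a minimal sketch of what the corrected spider might look like (example.com.cn is a hypothetical stand-in for the censored ****** domain, and the CSS-selector-style allow pattern from the question is dropped because allow takes regular expressions, not selectors):

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class NewsSpider(CrawlSpider):  # hypothetical spider name
    name = "news"
    allowed_domains = ["example.com.cn"]  # stand-in for the censored domain
    start_urls = ["http://example.com.cn/"]

    rules = (
        Rule(LinkExtractor(allow=r"\d+-\d+-\d+/.*?-.*?\.shtml",
                           # deny is a regex matched against the full URL,
                           # so a scheme is fine here; escape the dots
                           deny=(r"http://search\.example\.com\.cn/",)),
             callback="parse_item", follow=True),
        # deny_domains takes bare host names, not URLs with a scheme
        Rule(LinkExtractor(deny_domains=("auto.example.com.cn",)),
             follow=True),
    )

    def parse_item(self, response):
        self.logger.info("parsed %s", response.url)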


Setting deny and deny_domains still has no effect.
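
If in doubt, the two behaviors can be checked without running a full crawl by feeding a LinkExtractor a canned response. This is a sketch using hypothetical example.com.cn URLs:

from scrapy.http import HtmlResponse
from scrapy.linkextractors import LinkExtractor

body = (b'<a href="http://auto.example.com.cn/x.shtml">a</a>'
        b'<a href="http://news.example.com.cn/y.shtml">b</a>')
response = HtmlResponse("http://example.com.cn/", body=body, encoding="utf-8")

# With the scheme included, the host name never matches, so nothing is filtered
broken = LinkExtractor(deny_domains=("http://auto.example.com.cn",))
print([link.url for link in broken.extract_links(response)])  # both links survive

# With the bare host name, the auto.* link is filtered as intended
fixed = LinkExtractor(deny_domains=("auto.example.com.cn",))
print([link.url for link in fixed.extract_links(response)])   # only the news.* link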
