Some questions and doubts about Scrapy

My proxy middleware (in settings it is registered with priority 544, and the built-in proxy middleware is set to None):

import logging

import requests


class IpProxyMiddleware(object):
    def __init__(self, ip=""):
        self.ip = ip

    def process_request(self, request, spider):
        # fetch a fresh proxy from the local proxy pool for every outgoing request
        self.ip = requests.get("http://localhost:5555/random").text
        logging.info("IP:" + self.ip)
        request.meta["proxy"] = "http://" + self.ip
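For context, a minimal sketch of what the settings described above might look like; the project and module names are assumptions:

# settings.py -- minimal sketch; "myproject.middlewares" is an assumed path
DOWNLOADER_MIDDLEWARES = {
    # disable Scrapy's built-in proxy middleware
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": None,
    # enable the custom middleware with the priority mentioned above
    "myproject.middlewares.IpProxyMiddleware": 544,
}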
  • Wonder: every time a Request is created with a URL and a callback, will the process_request method be executed, and therefore the local proxy API be called once per request to get a proxy IP? That's what the docs say: this method is called for each request that passes through the downloader middleware.
  • Question: how do I arrange the parse callback so that, when a non-200 status code comes back, the proxy IP gets switched? And is there a problem with the following code? I used it, but it didn't work, and I don't know where to check (see the sketch after this snippet).
        # inside the spider's parse callback; Request here is scrapy.Request
        if response.status != 200:
            logging.error("--------IP has been banned! Retry again~ --------")
            yield Request(url=response.url, meta={"change_proxy": True}, callback=self.followees_parse)
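Two things commonly make this pattern appear to do nothing: non-200 responses are filtered out by HttpErrorMiddleware before they ever reach the callback unless those status codes are explicitly allowed, and a re-yielded Request for the same URL is dropped by the duplicate filter unless dont_filter=True is set. A hedged sketch combining both, assuming the callback name from the snippet above and a hypothetical spider class:

from scrapy import Request, Spider


class FolloweesSpider(Spider):
    name = "followees"  # hypothetical spider name
    # let these non-200 responses reach the callback instead of being dropped
    handle_httpstatus_list = [403, 429]

    def followees_parse(self, response):
        if response.status != 200:
            self.logger.error("--------IP has been banned! Retry again~ --------")
            # dont_filter=True so the dupefilter does not silently discard the retry
            yield Request(
                url=response.url,
                meta={"change_proxy": True},
                callback=self.followees_parse,
                dont_filter=True,
            )
            return
        # ... normal parsing of a successful response goes here ...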

Take a look with the debugger and step through the code with breakpoints.
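One way to do that is to start the crawl from a plain Python script instead of the scrapy command, so breakpoints in the middleware and the callbacks are hit under an ordinary debugger; a minimal sketch, assuming the spider class sketched above and an assumed import path:

# run_debug.py -- minimal sketch; the spider import path is an assumption
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from myproject.spiders.followees import FolloweesSpider

if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())
    process.crawl(FolloweesSpider)
    # set breakpoints in process_request / followees_parse, then debug this script
    process.start()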
