Scrapy request problem

I want to build an IP proxy pool, so I wrote a generator in the downloader middleware: each call returns the next IP address, which I then use in process_request. But every time I run my crawler it consumes three values, i.e. process_request runs three times even though I only request one page. I don't understand why.

import random

from scrapy import signals

# user_agent_list is defined elsewhere in the project; placeholder here
user_agent_list = ["Mozilla/5.0"]

class ProxyMiddleware:
    def canshu(self):
        # the proxy pool: only three (placeholder) addresses
        aa = ["192.168.1.2", "11.22.33", "44.55.66"]
        return aa

    def order(self):
        # generator that yields the next proxy each time __next__() is called
        aa = self.canshu()
        for i in aa:
            yield i

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your middleware.
        s = cls()
        s.a = s.order()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def spider_opened(self, spider):
        pass

    def process_request(self, request, spider):
        # set a random User-Agent and (commented out) a proxy for this request
        aa = self.a.__next__()
        ua = random.choice(user_agent_list)
        print("this time ua:", ua)
        request.headers.setdefault("User-Agent", ua)
        # request.meta["proxy"] = "http://" + aa
        print("ip:", aa)
        return None
    

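For reference, the middleware is enabled in settings.py roughly like this (a sketch: the module path "myproject.middlewares" and the priority 543 are just placeholders, not values from the project above):

# settings.py (sketch; path and priority are placeholders)
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.ProxyMiddleware": 543,
}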
(screenshot of the crawl log attached)

Jun.28,2021

Looking at the log, there are 3 requests. After your spider starts it first requests robots.txt, because ROBOTSTXT_OBEY=True in your settings; then it requests http://news.sina.com.cn/, which returns a 302 redirect to https, so it requests https://news.sina.com.cn/ and finally succeeds. Every one of those requests goes through your middleware, so process_request runs three times.
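If the goal is one proxy per page, two things follow from that explanation (a sketch, not tested against your project): setting ROBOTSTXT_OBEY = False and starting directly from the https URL removes the two extra requests, and wrapping the pool in itertools.cycle keeps __next__() from raising StopIteration once the three addresses have been handed out.

# settings.py (sketch): skip the robots.txt request
ROBOTSTXT_OBEY = False

# middleware (sketch): repeat the pool instead of exhausting it
import itertools

def order(self):
    # itertools.cycle yields the addresses forever, so redirects and
    # later pages still get a proxy without raising StopIteration
    return itertools.cycle(self.canshu())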

Thank you, yuanshi, for the answer.
