Timeout and connection errors when Scrapy uses proxy IPs to scrape data

When I add proxy IPs to Scrapy, it can hardly crawl any data. I have checked the errors reported in the log, and I am hoping someone can point out the underlying causes in more detail.

Error messages:

1. twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting http://open.douyucdn.cn/api/RoomApi/room/1355623 took longer than 180.0 seconds.

2. twisted.internet.error.ConnectionRefusedError: Connection was refused by other side: 111: Connection refused

3. twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]

4. twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]

Here is the proxy middleware I wrote:
middlewares.py

import json
import logging
import random


class ZhimaProxyMiddleware(object):
    """Assign a random proxy from the local pool to Douyu requests."""

    def __init__(self):
        # Load the proxy pool written by the validation script below.
        with open("ip_pool.txt", "r") as f:
            self.proxy_dict = json.loads(f.read())
            self.proxy_list = self.proxy_dict["proxy"]

    def process_request(self, request, spider):
        try:
            if "douyu" in request.url:
                request.meta["proxy"] = random.choice(self.proxy_list)["http"]
        # random.choice raises IndexError on an empty pool, so catch it too
        except (IndexError, ValueError) as error:
            logging.error("failed to assign proxy: {}".format(error))
        finally:
            # Returning None tells Scrapy to keep processing the request.
            return None
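
Not something I have added yet, but a process_exception hook on the class above could log which proxy produced each failure and let Scrapy's built-in RetryMiddleware reschedule the request. A minimal sketch; it assumes RetryMiddleware stays enabled, as it is by default:

    def process_exception(self, request, exception, spider):
        # Record which proxy produced the timeout/connection error so the
        # bad entry in ip_pool.txt can be identified and removed.
        bad_proxy = request.meta.get("proxy")
        logging.warning("proxy {} failed with {!r}".format(bad_proxy, exception))
        # Returning None passes the failure on, so RetryMiddleware can
        # reschedule the request; process_request then picks a new random
        # proxy on the retry.
        return None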

The proxies are validated when they are extracted from the provider.
Here is the validation code:

import requests
import json

# Zhima proxy extraction API (URL omitted in the original post)
zhima_url = ""
# URL used to test whether a proxy works
test_url = "http://www.qq.com/"
# pool of validated proxies
zhima = {"proxy": []}

def zhima_proxy():
    """Fetch proxies from the Zhima API and keep only the ones that work."""
    response = requests.get(zhima_url)
    proxy_dict = json.loads(response.content.decode())

    for data in proxy_dict["data"]:
        proxy_ip = data["ip"]
        proxy_port = data["port"]
        proxy = {
            "http": "http://{}:{}".format(proxy_ip, proxy_port),
            "https": "https://{}:{}".format(proxy_ip, proxy_port)
        }
        try:
            # a short timeout keeps a dead proxy from blocking the check
            res = requests.get(test_url, proxies=proxy, timeout=10, verify=False)
        except Exception as error:
            print("proxy check failed: {}".format(error))
            continue
        else:
            if res.status_code == 200:
                zhima["proxy"].append(proxy)
            else:
                continue
    with open("ip_pool.txt", "w") as f:
        f.write(json.dumps(zhima))
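
For completeness, I run the script like this and then point the middleware at the resulting file (a minimal sketch):

if __name__ == "__main__":
    zhima_proxy()
    # Quick sanity check on the pool the middleware will load.
    with open("ip_pool.txt", "r") as f:
        pool = json.loads(f.read())
    print("usable proxies: {}".format(len(pool["proxy"])))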

Online proxy-checking tools also report that these proxies are valid,
but I still don't know why these errors keep occurring.

I have made sure the middleware is enabled (registration shown below).
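
The registration in settings.py looks like this (the project package name and the priority value 543 are illustrative, not from my actual settings):

DOWNLOADER_MIDDLEWARES = {
    # "myproject" stands in for the real project package name
    "myproject.middlewares.ZhimaProxyMiddleware": 543,
}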
Thank you

Mar. 05, 2021

The proxies may have expired by the time you use them.


I have the same problem, except I use the Mogu proxy service. How did you solve it?


Check each proxy's availability again at the moment you actually use it, not just when you fetch it from the provider.
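
A minimal sketch of that idea, reusing the test URL from the validation script (the 5-second timeout is an arbitrary choice):

import random
import requests

def pick_live_proxy(proxy_list, test_url="http://www.qq.com/", timeout=5):
    # Try candidates in random order and return the first proxy that
    # responds right now; return None if the whole pool is dead.
    candidates = proxy_list[:]
    random.shuffle(candidates)
    for proxy in candidates:
        try:
            res = requests.get(test_url, proxies=proxy, timeout=timeout, verify=False)
            if res.status_code == 200:
                return proxy
        except requests.RequestException:
            continue
    return None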
