Can we set a proxy for a spider that uses scrapy_splash?

When I implemented a spider with Scrapy, I wanted to rotate its proxy so that the server wouldn't block my requests for coming too frequently from a single IP. I already knew how to change the proxy in plain Scrapy, either with a middleware or by setting the request's meta directly.

However, I use the package scrapy_splash to execute JavaScript for my spider, and there I found it difficult to change the proxy, because, as I understand it, scrapy_splash uses a separate Splash server to render the site's JS for us.

In fact, the proxy works fine when I use Scrapy alone, but it stops working once I use scrapy_splash.

So is there any way to set a proxy for scrapy_splash requests?


Edited 4 hours later:

I have set the related settings in settings.py and written the following in middlewares.py. As mentioned above, this works for Scrapy but not for scrapy_splash:

import json
import random


class RandomIpProxyMiddleware(object):
    def __init__(self, ip=""):
        self.ip = ip
        ip_get()  # refreshes the proxy list on disk (defined elsewhere)
        with open("carhome\\ip.json", "r") as f:
            self.IPPool = json.load(f)

    def process_request(self, request, spider):
        # Pick a random proxy for every outgoing request.
        thisip = random.choice(self.IPPool)
        request.meta["proxy"] = "http://{}".format(thisip["ipaddr"])
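One thing worth noting: with scrapy_splash, it is the Splash server that fetches the page, so `request.meta["proxy"]` only affects the hop from Scrapy to Splash, not Splash's own outgoing requests. A sketch of how the middleware above could be adapted to also cover Splash requests, by writing the proxy into the Splash args instead (the class name and the in-memory pool are my own; Splash accepts a full proxy URL in its `proxy` argument):

```python
import random


class RandomIpSplashProxyMiddleware(object):
    """Sketch: route the proxy to Splash when a request is a SplashRequest.

    ip_pool is assumed to be a list of dicts like {"ipaddr": "1.2.3.4:8080"},
    e.g. loaded from the same carhome\\ip.json file as in the original.
    """

    def __init__(self, ip_pool=None):
        self.ip_pool = ip_pool or []

    def process_request(self, request, spider):
        if not self.ip_pool:
            return
        thisip = random.choice(self.ip_pool)
        proxy = "http://{}".format(thisip["ipaddr"])
        if "splash" in request.meta:
            # SplashRequest: tell the Splash server itself to use the proxy.
            request.meta["splash"]["args"]["proxy"] = proxy
        else:
            # Plain scrapy.Request: the usual meta key is enough.
            request.meta["proxy"] = proxy
```

The `"splash"` meta key is what scrapy_splash sets on its own requests, so checking for it is a simple way to tell the two request types apart.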

And here is the code in the spider with scrapy_splash:

    yield scrapy_splash.SplashRequest(
        item, callback=self.parse, args={"wait": 0.5})
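Since the Splash HTTP API accepts a `proxy` argument (a full proxy URL), the proxy can also be passed per request through the SplashRequest `args`. A minimal sketch, assuming a pool of entries shaped like the ip.json file above (the pool contents and the helper name are placeholders):

```python
import random

# Hypothetical in-memory pool; in practice this would be loaded
# from a file such as carhome\ip.json.
IP_POOL = [{"ipaddr": "1.2.3.4:8080"}]


def splash_args(wait=0.5):
    """Build SplashRequest args with a randomly chosen proxy."""
    thisip = random.choice(IP_POOL)
    return {"wait": wait, "proxy": "http://{}".format(thisip["ipaddr"])}
```

Usage would then look like `yield scrapy_splash.SplashRequest(item, callback=self.parse, args=splash_args())`, so Splash itself makes the page request through the proxy.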

Here is the code in the spider without this plugin:

    yield scrapy.Request(item, callback=self.parse)

Mar. 30, 2021

If you want to build a private proxy pool, you can try this solution, which can turn your Android device or your home PC into a proxy server: https://github.com/xapanyun/p.
