Dianping's latest anti-crawling: can it detect proxy IPs that rotate every second?

I have recently been crawling Dianping's store home pages. The URLs look like http://m.dianping.com/shop/4094416. Because Dianping applies IP-based anti-crawling, I built a dynamic IP tunnel that switches IP within seconds, that is, every request goes out from a different IP. I verified this against http://httpbin.org/get, and each request does indeed use a different IP.
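The rotation check described above can be sketched roughly as follows. This is only a minimal sketch using the Python standard library; the proxy address in the usage note and the helper names `fetch_origin` and `rotation_report` are assumptions for illustration, not something from the original post. It relies on http://httpbin.org/get echoing the caller's address in the `origin` field of its JSON response.

```python
import json
from urllib.request import ProxyHandler, build_opener


def fetch_origin(proxy_url, timeout=10.0):
    """Ask httpbin which IP it sees when the request goes through proxy_url."""
    opener = build_opener(ProxyHandler({"http": proxy_url}))
    with opener.open("http://httpbin.org/get", timeout=timeout) as resp:
        return json.load(resp)["origin"]


def rotation_report(origins):
    """Summarise a list of observed exit IPs (one entry per request)."""
    unique = set(origins)
    return {
        "requests": len(origins),
        "unique_ips": len(unique),
        # True only if no exit IP was ever reused across the sample.
        "rotates_every_request": len(unique) == len(origins),
    }
```

To use it, collect a handful of samples, for example `rotation_report([fetch_origin("http://127.0.0.1:8888") for _ in range(5)])`, where the tunnel address is a placeholder. Note that this only confirms the exit IP changes; it says nothing about other signals the target site may look at.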

However, when I crawled Dianping's store home pages with this method, keeping the interval between requests at 1 s, I still ended up banned, as shown below:

clipboard.png

Response headers of the banned request:

{"Date": "Thu, 07 Jun 2018 17:45:05 GMT", "Content-Type": "application/octet-stream", "Content-Length": "0", "M-Appkey": "com.sankuai.rc.mtsi.optimus", "M-SpanName": "OptimusController.optimusAuthorize", "M-Host": "10.73.137.220", "M-TraceId": "3536539434466270722", "Pragma": "no-cache", "Cache-Control": "no-cache", "Vary": "User-Agent, Accept-Encoding", "Age": "0", "Accept-Ranges": "bytes", "Connection": "keep-alive"}
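While crawling, it can help to recognise this kind of response automatically so the crawler can back off instead of retrying blindly. The sketch below is a heuristic built only from the single header sample quoted above (an `M-Appkey` or `M-SpanName` mentioning "optimus" plus an empty body); the function name is hypothetical, and whether every ban response looks like this is an assumption.

```python
def looks_like_optimus_block(headers):
    """Heuristic: does this response look like the ban response quoted above?"""
    # Header names are case-insensitive, so normalise the keys first.
    h = {k.lower(): str(v) for k, v in headers.items()}
    from_optimus = (
        "optimus" in h.get("m-appkey", "").lower()
        or "optimus" in h.get("m-spanname", "").lower()
    )
    empty_body = h.get("content-length") == "0"
    return from_optimus and empty_body
```

A normal store page would not carry these markers, so a `True` result is a reasonable signal to pause and inspect what the request looked like rather than to switch IP again.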

banIP

Request headers (including the cookie) when opening http://m.dianping.com/shop/4094416 in Chrome:

clipboard.png

As you can see, these request headers are perfectly ordinary, so how does Dianping detect the crawler?

Mar.18,2021

1. There may be JS algorithms that compute a browser fingerprint.
2. Characteristic values exposed by the Chrome driver.

Technology is moving so fast that switching IP is no longer a trump card. With various characteristic values and fingerprints, the site does not need to match your IP in order to ban you.
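To make the point concrete, here is a purely illustrative sketch of how a server could rate-limit by fingerprint while ignoring the IP entirely. It is not Dianping's actual mechanism, which is not known from this thread; the attribute names and the `FingerprintLimiter` class are assumptions made up for the example.

```python
import hashlib
from collections import Counter


def fingerprint(attrs):
    """Hash a set of client attributes into a short, stable identifier."""
    # Sort the keys so the same attributes always give the same hash.
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class FingerprintLimiter:
    """Counts requests per fingerprint; the client IP plays no role."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.seen = Counter()

    def allow(self, ip, attrs):
        # ip is accepted but deliberately ignored, to show that
        # rotating it does not reset the counter.
        fp = fingerprint(attrs)
        self.seen[fp] += 1
        return self.seen[fp] <= self.max_requests
```

With a limit of two, a client sending identical attributes from three different IPs is allowed twice and rejected on the third request. Under this kind of scheme, a crawler that changes only its IP keeps presenting the same fingerprint.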


http://www.dianping.com/searc.
I have also been crawling Dianping recently. The JSON returned by the map URL cannot be accessed without a cookie.
The anti-crawling is particularly strong. I found that the _lxsdk value in the cookie changes every time the page is refreshed.
It is a headache: requests keep failing with 403 or 302.


In recent days, even with paid Sesame HTTP proxy IPs, I have run into this on detail pages that have reviews at the end, similar to http://www.dianping.com/shop/.


Hello, have you solved this? If the information still cannot be fetched after changing the IP for each request, what exactly is the anti-crawling mechanism detecting?
