Crawler gets a 302 redirect

When the crawler starts, it is redirected to an error page. What should I do? The target URL is:
http://www.gzcc.gov.cn/data/l.
The crawler's error log is in the attached screenshot (clipboard.png).

Mar. 11, 2021

Open the page in a browser, check which request headers it sends, and then set the same request headers in your crawler.


The anti-crawling check on this kind of site is easy to get around. If you only want the page source, just disguise your request headers as a browser's. These are the headers the browser sends:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate, sdch
Accept-Language:zh-CN,zh;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Cookie:ASP.NET_SessionId=bqtygl55xovvgp45ajwmuj45
DNT:1
Host:www.gzcc.gov.cn
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0
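If you do want to reuse a whole block like the one above, it can be converted into a dict instead of being retyped by hand. This is only a sketch: `parse_raw_headers` is a hypothetical helper name, and it assumes each line has the `Name:value` shape shown above.

```python
def parse_raw_headers(raw: str) -> dict:
    """Turn 'Name:value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only; the value may contain more colons.
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            continue  # skip anything that is not a 'Name:value' line
        headers[name.strip()] = value.strip()
    return headers


# A few of the headers listed above, pasted as-is.
RAW = """
Accept-Language:zh-CN,zh;q=0.8
Connection:keep-alive
Host:www.gzcc.gov.cn
Upgrade-Insecure-Requests:1
"""

headers = parse_raw_headers(RAW)
print(headers["Host"])
```

The resulting dict can be passed as the `headers` argument of a request. A session cookie copied this way will expire, so it is usually better to leave the Cookie line out.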

Of course, in practice you don't need to fill in that many fields. Taking Python's requests library as an example:

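import requests
url = 'http://www.gzcc.gov.cn/data/l.'  # URL as given (truncated) in the question
# Headers are passed as a dict; a browser-like User-Agent is usually enough.
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0'}
resp = requests.get(url=url, headers=header)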
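To check whether the headers really are what triggers the 302, one option is to send the request without following redirects and look at the status code and the Location header. The sketch below assumes the requests library is installed; `check_redirect` and `is_redirect` are hypothetical helper names, not part of requests.

```python
import requests

# Status codes that tell the client to go to another URL.
REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_redirect(status_code: int) -> bool:
    """Return True if the status code is an HTTP redirect."""
    return status_code in REDIRECT_CODES


def check_redirect(url: str, headers: dict, timeout: float = 10):
    """Fetch url without following redirects.

    Returns (status_code, location), where location is the value of the
    Location header, or None if the server did not send one.
    """
    resp = requests.get(
        url,
        headers=headers,
        allow_redirects=False,  # keep the 302 itself instead of its target
        timeout=timeout,
    )
    return resp.status_code, resp.headers.get("Location")
```

Calling `check_redirect(url, {})` and then `check_redirect(url, header)` with a browser User-Agent lets you compare the two results. If only the first call returns a 302 pointing at the error page, the missing headers were the cause.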

Disguise the request headers so the request looks like it comes from a browser.
