Why does a torrent file fetched by a Python crawler have size 0 when the same download URL works normally in a browser?

1. When I visit a web page and download the embedded torrent file with a browser, the file size is normal, and the Xunlei (Thunder) download tool also fetches it correctly. When I fetch it with a Python crawler, however, the downloaded data has size 0.
2. This is the code I wrote:

import requests
from lxml import etree

url = "http://www.gawu88.space/thread-9431970-1-1.html"
headers = {
    "Cookie":"__cfduid=d15f7eb39310b0301f07e1f744ca70a3d1526800937; _ga=GA1.2.942865751.1526800940; A8tI_2132_saltkey=njU69xqb; A8tI_2132_lastvisit=1526797339; A8tI_2132_adult_warn=1; A8tI_2132_auth=7d44BRr5TCxDGN9zYzcgtvgTYZzopZtEOJjzAO323fO%2BdvFoIjRzKH31yzmid2IjzmB9bQ5PLK%2B1iWLRV%2BnD6zp8PwkV; A8tI_2132_lastcheckfeed=7589318%7C1526800977; A8tI_2132_smile=2D1; A8tI_2132_atarget=1; _gid=GA1.2.849215201.1527331040; cus_cookie=5; A8tI_2132_adv_gid=18; A8tI_2132_self_unique_code=6357ea0d-3640-91bf-a290-cdc483f40ded; A8tI_2132_ignore_notice=1; __insp_wid=1484672786; __insp_nv=true; __insp_targlpu=aHR0cDovL3d3dy5nYXd1ODguc3BhY2UvcG9ydGFsLmh0bWw%3D; __insp_targlpt=6K665Z2b6Zeo5oi3X_adj_WQp_iuuuWdm1%2FmgKflkKfmiJDkurrorrrlnZs%3D; __insp_norec_sess=true; A8tI_2132_sign_close=1; A8tI_2132_notification_readed_ids=57457151; A8tI_2132_noticeTitle=1; A8tI_2132_notification_unread_tips=1527519801; A8tI_2132_credit_max_num=0; A8tI_2132_credit_remain_num=0; A8tI_2132_sendmail=1; A8tI_2132_st_t=7589318%7C1527520644%7C1dc26593f0230c7c6b43bde6c98103c9; A8tI_2132_forum_lastvisit=D_180_1526811032D_181_1527427919D_815_1527520227D_798_1527520644; A8tI_2132_visitedfid=798D815D181D307D791D216D11D180D142D27; A8tI_2132_ulastactivity=1527520644%7C0; A8tI_2132_self_uid=7589318; A8tI_2132_self_fid=798; A8tI_2132_st_p=7589318%7C1527520650%7C570a2893a0834543f205c6bc2090a236; A8tI_2132_viewid=tid_9478918; A8tI_2132_self_tid=9478918; A8tI_2132_lastact=1527520651%09misc.php%09seccode; A8tI_2132_seccode=129607798.bd627f2e523f8c47f4; __insp_slim=1527520653270",
    "Host":"www.gawu88.space",
    "Referer":"http://www.gawu88.space/forum-798-1.html",
    "Accept-Encoding":"",
    "User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36",
}
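response = requests.get(url, headers=headers)
print(response.text)
html = etree.HTML(response.text)
# Single quotes around the XPath so the double-quoted attribute value inside it is valid Python
hrefs = "http://www.gawu88.space/" + html.xpath('//span[@style="white-space: nowrap"]/a/@href')[0]
req = requests.get(hrefs, headers=headers)
file_name = "f:/1.torrent"
# The with block closes the file on exit, so no explicit f.close() is needed
with open(file_name, "wb") as f:
    f.write(req.content)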

3. If I leave out the headers, the downloaded file is no longer 0 bytes, but it is still an empty torrent that contains no download data.
4. What I want to know is why I can't download the torrent file. Is there a solution, or is something wrong with how I built the request headers? Any help would be appreciated. Thank you.

Mar.14,2021

Why not try the all-purpose Wireshark? Capture the traffic, copy all the request headers the browser sends, and remove them one at a time to see which header makes the difference. It is also possible that the server requires you to send a GET request to the Referer page first, or that the download request itself returned an error. Either way, capture the packets and have a look.
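
The remove-one-header-at-a-time check can be scripted. The following is only a rough sketch, not a confirmed fix for this site: `find_required_headers` and `torrent_size` are hypothetical helper names, and it assumes that `headers` is the complete set copied from a browser request that works, and that a 0-byte body is the failure signal.

```python
from typing import Callable, Dict, List


def find_required_headers(
    fetch_size: Callable[[Dict[str, str]], int],
    headers: Dict[str, str],
) -> List[str]:
    """Return the header names whose removal makes the download come back empty."""
    required = []
    for name in headers:
        # Send every header except the one under test
        trimmed = {k: v for k, v in headers.items() if k != name}
        if fetch_size(trimmed) == 0:
            required.append(name)
    return required


def torrent_size(download_url: str, page_url: str, headers: Dict[str, str]) -> int:
    """Request page_url first, as a browser would, then download_url; return the body size."""
    # Imported here so find_required_headers can be used without requests installed
    import requests

    with requests.Session() as session:
        session.headers.update(headers)
        # Visit the referring page first so the server can set any cookies it expects
        session.get(page_url, timeout=30)
        reply = session.get(download_url, headers={"Referer": page_url}, timeout=30)
        # Status code and content type often reveal a redirect or an HTML error page
        print(reply.status_code, reply.headers.get("Content-Type"), len(reply.content))
        return len(reply.content)
```

To use it against the real site, something like `find_required_headers(lambda h: torrent_size(hrefs, url, h), headers)` would list the headers that cannot be dropped. If every header comes back as required, the full set itself is already failing and the capture needs to be compared against the browser's request again.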
