[python crawler] how to grab the penalty information links from the Shenzhen Stock Exchange?

problem description

I am a Python beginner, and my goal is to crawl the penalty details on the Shenzhen Stock Exchange website in bulk; each record links to a ".pdf" file. The web page (http://www.szse.cn/disclosure...) and the corresponding source code are as follows:

[screenshot: the page and its source code]

the environment in which the problem occurs and what methods I have tried

the code I wrote is as follows:

from urllib.request import urlopen
from bs4 import BeautifulSoup

# fetch the raw HTML of the penalty record list page
html = urlopen("http://www.szse.cn/disclosure/listed/credit/record/index.html").read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
# find all anchor tags whose href is the javascript placeholder
link = soup.find_all("a", attrs={"href": "javascript:void(0);"})

the result of execution is

>>> link
[<a class="" href="javascript:void(0);"></a>, <a class="ml10" href="javascript:void(0);"></a>, <a class="ml10" href="javascript:void(0);"></a>]

didn"t catch the link you wanted to grab.
considering that the attribute "encode-open" appears only once in the source code, it is changed to this:

link=soup.find_all("a",attrs={"encode-open":re.compile(r".*\.pdf")})

but an empty list is returned:

>>> link               
[]

How can I grab this link? Thank you.

Sep. 30, 2021

this page is not static; it is loaded dynamically by JS.
that is, what you crawl with urlopen is only the source code before the JS renders, and the penalty records only appear in the page after the JS has run, so you can't get the data this way.
you either need to render the page in a real browser (for example with Selenium) or find the background request that actually returns the data (check the Network tab in your browser's DevTools) and call it directly; see the sketch below.
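
Here is a minimal sketch of the Selenium route, assuming the rendered page really does carry the PDF path in an "encode-open" attribute (as in the question) and that a fixed sleep is enough for the records to load; the driver setup and wait time are assumptions you should adjust for your environment.

import re
import time

from bs4 import BeautifulSoup
from selenium import webdriver

url = "http://www.szse.cn/disclosure/listed/credit/record/index.html"

driver = webdriver.Chrome()   # requires ChromeDriver on your PATH
driver.get(url)
time.sleep(3)                 # crude wait for the JS to fill in the record table
html = driver.page_source     # HTML *after* rendering, unlike urlopen
driver.quit()

soup = BeautifulSoup(html, "html.parser")
# the same query that returned [] before should now find the links
links = soup.find_all("a", attrs={"encode-open": re.compile(r".*\.pdf")})
for a in links:
    print(a.get("encode-open"))

The DevTools route is usually faster: the record list is fetched by an XHR request that returns JSON, and you can call that URL directly with urllib or requests and read the PDF paths out of the JSON, with no browser needed; the exact URL and parameters have to be copied from the Network tab.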