[python crawler] how to grab the penalty information links from the Shenzhen Stock Exchange?

problem description

I am a Python beginner, and my goal is to crawl the penalty details on the Shenzhen Stock Exchange website in bulk; each record links to a ".pdf" file. The web page (http://www.szse.cn/disclosure...) and the corresponding source code are as follows:

[screenshot: the page and its source code]

the environment in which the problem occurs and what methods I have tried

the code I wrote is as follows:

from urllib.request import urlopen
from bs4 import BeautifulSoup

# fetch the raw HTML of the penalty record list page
html = urlopen("http://www.szse.cn/disclosure/listed/credit/record/index.html").read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
# find all anchor tags whose href is the javascript placeholder
link = soup.find_all("a", attrs={"href": "javascript:void(0);"})

the result of execution is

>>> link
[<a class="" href="javascript:void(0);"></a>, <a class="ml10" href="javascript:void(0);"></a>, <a class="ml10" href="javascript:void(0);"></a>]

didn"t catch the link you wanted to grab.
considering that the attribute "encode-open" appears only once in the source code, it is changed to this:

link=soup.find_all("a",attrs={"encode-open":re.compile(r".*\.pdf")})

but an empty list is returned:

>>> link               
[]

How can I grab this link? Thank you.

Sep. 30, 2021

this page is not static; it is loaded dynamically by JS.
that is, what you crawl with urlopen is only the source code before the JS renders, and the penalty records only appear in the page after the JS has run, so you can't get the data this way.
you either need to render the page in a real browser (for example with Selenium) or find the background request that actually returns the data (check the Network tab in your browser's DevTools) and call it directly; see the sketch below.
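
Here is a minimal sketch of the Selenium route, assuming the rendered page really does carry the PDF path in an "encode-open" attribute (as in the question) and that a fixed sleep is enough for the records to load; the driver setup and wait time are assumptions you should adjust for your environment.

import re
import time

from bs4 import BeautifulSoup
from selenium import webdriver

url = "http://www.szse.cn/disclosure/listed/credit/record/index.html"

driver = webdriver.Chrome()   # requires ChromeDriver on your PATH
driver.get(url)
time.sleep(3)                 # crude wait for the JS to fill in the record table
html = driver.page_source     # HTML *after* rendering, unlike urlopen
driver.quit()

soup = BeautifulSoup(html, "html.parser")
# the same query that returned [] before should now find the links
links = soup.find_all("a", attrs={"encode-open": re.compile(r".*\.pdf")})
for a in links:
    print(a.get("encode-open"))

The DevTools route is usually faster: the record list is fetched by an XHR request that returns JSON, and you can call that URL directly with urllib or requests and read the PDF paths out of the JSON, with no browser needed; the exact URL and parameters have to be copied from the Network tab.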