A single-threaded crawler can fetch the pages, but a multi-threaded crawler cannot open the URL?

As the title says: a single crawler can fetch the pages, but the multi-threaded crawler cannot open the URL. Is the interval between requests from the multi-threaded crawler too short, so that it triggers the website's anti-crawler mechanism?

Web-crawler python

May.22,2021

Pay attention to adding a delay between requests. I usually only run batch downloads when downloading images.
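One way to keep the threads but still space out the requests is a shared throttle that every worker passes through before fetching. The sketch below is only an illustration of that idea: the names `Throttle` and `crawl`, the `fetch` callback, and the default interval of one second are all assumptions, not something from the question.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Throttle:
    """Space out request starts by at least `min_interval` seconds across all threads."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def crawl(urls, fetch, min_interval=1.0, workers=4):
    """Call `fetch(url)` for every URL in a thread pool, throttling request starts.

    `fetch` is a hypothetical callback supplied by the caller.
    Results are returned in the same order as `urls`.
    """
    throttle = Throttle(min_interval)

    def task(url):
        throttle.wait()
        return fetch(url)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, urls))
```

With the requests library, the callback could be something like `lambda u: requests.get(u, timeout=10).status_code`. If the site still refuses connections, increasing `min_interval` is the first thing to try.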

You can try switching to a different proxy for each request. Most likely the IP was put on a blacklist because the access rate was too high.
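A minimal sketch of rotating proxies between requests, assuming you already have a list of working proxy URLs. The class name `ProxyRotator` is hypothetical and the addresses in the usage note are placeholders; the returned dictionary uses the scheme-to-URL mapping that the requests library accepts in its `proxies` argument.

```python
import itertools
import threading


class ProxyRotator:
    """Hand out proxies round-robin; safe to share between crawler threads."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("at least one proxy URL is required")
        self._cycle = itertools.cycle(proxy_urls)
        self._lock = threading.Lock()

    def next_proxies(self):
        # Advance the shared iterator under a lock so two threads never
        # step it at the same moment.
        with self._lock:
            proxy = next(self._cycle)
        # Use the same proxy for both plain and TLS traffic.
        return {"http": proxy, "https": proxy}
```

Usage could look like `rotator = ProxyRotator(["http://203.0.113.1:8080", "http://203.0.113.2:8080"])` followed by `requests.get(url, proxies=rotator.next_proxies(), timeout=10)` inside each worker. Rotating proxies does not replace a delay between requests; a site can still block on other signals.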


An error occurred when crawling Xiaozhu short-term rentals with Python3.
I just started with Python and, following https: blog.csdn.net mtbaby ., wanted to crawl Xiaozhu short-term rental listings, but then my IP was blocked. I then looked into proxy IPs, but still can't get the information. import requests from lxml im...

Web-crawler python

Feb.28,2021
How to clean up some unwanted HTML attributes in crawler data
For example, given the following data: <p id="a">data, I just want to keep data. Is there a quick way to do this? ...

Web-crawler python pyspider scrapy

Mar.01,2021
Playback information cannot be fetched continuously when using the bilibili API.
API: http: api.bilibili.com x web. There are already 700,000 aids in the database. Every morning we fetch video playback updates by aid, and a problem suddenly appeared in the early hours of this morning. Every time we get 200 to 300 pieces of data, there w...

Web-crawler python

Mar.02,2021
Why can't the content of the tag be crawled in this situation?
As shown in the figure, only the tag is returned, but the content is gone. I haven't been learning crawlers for long, and I don't know what I'm doing wrong. ...

Web-crawler python

Mar.02,2021
The <script> tags in the HTML are all exactly the same. How can you tell them apart?
<html> <script> 1 </script> <script> 2 </script> .... </html> There should be no problem when loading. If I want to get a specified script tag, I can get the element by getting the <script> array and then using the su...

Requests web-crawler python javascript

Mar.03,2021
Python 3.6 read/write file transcoding
I scraped the source of a website. How can I write it to a txt document? Here is my code and the error report ...

Web-crawler python

Mar.03,2021
Simulated login to Lagou: one of the parameters in the POST form, signature, is generated as soon as the login page loads, before any account information is entered, but I can't find it.
Simulating login to Lagou. One of the parameters in the POST form, signature, is generated as soon as the login page loads, before any account information is entered, but I can't find it. There is a result when searching for signature in the html with F...

Web-crawler python

Mar.05,2021
Multiple scrapy-redis instances cannot crawl at the same time
Open two scrapy tasks at the same time, then push a start_url into redis, but only one scrapy task, A, is running; when A is stopped, task B begins to crawl. The reason seems to be that requests is not saved in redis while...

Scrapyd scrapy web-crawler python-crawler python

Mar.05,2021
When using selenium to drive chrome, certain elements on the website cannot be found. It is a course learning platform.
After I log in to the website through selenium, I want to start automatically clicking some buttons on the web page. I can't find them through xpath positioning. The code is as follows (the account and password are not important; you need to log in to enter the...

Selenium chrome web-crawler python

Mar.09,2021
How to determine the data argument in python requests.post?
How should the data argument of requests.post be determined when building a crawler request such as requests.post(url, data=post_data) # pseudo code. The content of post_data differs from website to website. How should this content...

Post web-crawler python

Mar.12,2021
Problems with simulated login using requests cookies
As the title says, I tried to use cookies to simulate login to www.jianshu.com, but failed. I came here to find some ideas. The process of simulation: open F12 and look at the cookies. There were quite a lot of cookies under Network, so I first added all of them and found that it didn't wor...

Requests web-crawler python

Mar.14,2021
Weibo scrolling load crawler problem
When browsing someone's Weibo home page, not all of the content is loaded at once; it is split into three loads. When I scroll to a certain position, another request is issued, but the content doesn't exist, and the request address is the same, a...

Web-crawler python

Mar.14,2021
How to write selenium in scrapy
...

Web-crawler python

Mar.16,2021
I wrote a program to crawl Amazon pages by following an example, but there are many errors I don't understand. Please help!
Crawl the title and price of goods on Amazon China under Mobile phone -> Mobile Communications -> Apple Phone. Its URL= https: www.amazon.cn s ref=s. My python code is as follows: import requests from bs4 import BeautifulSoup import re # HTML import ti...

Web-crawler python

Mar.16,2021
Python selenium crawler
option.add_argument("--start-maximized") and self.driver.maximize_window(): what is the difference between these two ways of maximizing? ...

Web-crawler python

Mar.17,2021
Dianping's latest anti-crawling: does it detect dynamic proxy IPs that switch within seconds?
I have been crawling the front page of Dianping's stores recently. The URL is similar to http: m.dianping.com shop 4094416. Because Dianping has anti-crawling against IPs, I built a dynamic IP tunnel that can switch IPs within seconds, that is, to change an IP...

Web-crawler python

Mar.18,2021
Why can't selenium search be located?
**** ...

Web-crawler python

Mar.18,2021
Check selenium does not return content
...

Web-crawler python

Mar.18,2021
How does Qichacha's search do its anti-crawling?
https: www.qichacha.com I crawled with a headless browser, simulating keyword searches with a dynamic IP that changes every 5 seconds, without logging in. At first I could search keywords, but later I could not. I don't know what anti-crawling measure is being used? ...

Web-crawler python

Mar.19,2021
Why can't my xpath match?
<item> <title> <![CDATA[ IP ]]> < title> <link> <![CDATA[ cyzone_title_list=etree.HTML (response.text.encode ( utf-8 )) .xpath ( item title text () ) Isn't text in title? http: www.cyzone.cn rss link ...

Web-crawler python

Mar.20,2021
