Question about choosing a web crawler framework

I need to crawl an article from a website, including all of its JS, CSS, and HTML files, and save it as my own article. The article content is loaded asynchronously through AJAX. Which approach is better for this kind of requirement? Scrapy with Splash and Puppeteer seem to be similar in principle. Are there any other frameworks besides these two that would suit my needs? I am choosing between Node and Python for the language and would appreciate advice.

Apr.19,2022

Selenium works well, although it is inefficient.


If the articles are fetched through AJAX, why not call that interface directly?
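A minimal sketch of that idea in Python, using only the standard library. It assumes the endpoint returns JSON with `title` and `content` fields, and that `content` is already HTML. Both field names and the URL in the usage note are hypothetical; the real ones would have to be read from the browser's network tab.

```python
import html
import json
from urllib.request import Request, urlopen


def fetch_article(api_url, opener=urlopen):
    """Request the same JSON that the page loads via AJAX."""
    # Some endpoints only answer requests that look like XHR calls,
    # so these headers are sent as a precaution (an assumption).
    request = Request(api_url, headers={
        "X-Requested-With": "XMLHttpRequest",
        "Accept": "application/json",
    })
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


def to_html(article):
    """Wrap the hypothetical 'title' and 'content' fields in a minimal page."""
    title = html.escape(article["title"])  # the title is treated as plain text
    body = article["content"]              # assumed to be HTML already
    return (
        "<html><head><title>{0}</title></head>"
        "<body><h1>{0}</h1>{1}</body></html>"
    ).format(title, body)
```

Usage would look something like `to_html(fetch_article("https://example.com/api/article/1"))`. This avoids running a browser at all, but it only works if the interface is reachable without signed parameters or a login session. It also does not collect the page's JS and CSS files, so a headless browser is still needed if those must be saved as well.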


In the end, I chose Puppeteer.


I think the old-school combination of Scrapy and bs4 would also work here.


For dynamic web pages loaded through AJAX, I recommend Selenium.
