Problems encountered in crawling articles

I ran into problems when fetching headline articles.
Why does the article content differ depending on how the page is fetched?

I fetched the page with curl and saved the DOM to a file, did the same with PHPSPIDER, inspected the DOM in Google Chrome (right-click, Inspect Element), and also looked at it through View Source. Unexpectedly, all of these DOMs are different. The browser case is easy to understand: the DOM changes after the page's JS has run.

Any guidance would be appreciated. What I am trying to do is crawl the headline article at a link specified by the user.

Php
Nov.23,2021

The content is probably generated dynamically by JS. If you only want the content, parse the string in the JS script directly.
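As a rough sketch of the "parse the string in the script" idea, the Python snippet below (standard library only) pulls a JSON object out of an inline `<script>` in the raw HTML, i.e. the HTML as curl returns it, before any JS has run. The variable name `articleInfo`, the sample HTML, and the `content` field are made up for illustration. The real page will use its own names and may encode the string differently, so check the source saved by curl first. The same approach should carry over to PHP with `preg_match` and `json_decode`.

```python
import html
import json
import re

# Hypothetical raw HTML, standing in for what curl saves to a file.
SAMPLE_HTML = """
<html><body>
<div id="article"></div>
<script>
var articleInfo = {"title": "Hello", "content": "&lt;p&gt;Body text&lt;/p&gt;"};
</script>
</body></html>
"""


def extract_article(raw_html, var_name="articleInfo"):
    """Return the object assigned to `var_name` in an inline script, or None.

    Assumes the assigned value is strict JSON and ends with "};".
    """
    # Non-greedy match from the opening brace to the first "}" followed by ";".
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(\{.*?\})\s*;"
    match = re.search(pattern, raw_html, re.DOTALL)
    if match is None:
        return None
    data = json.loads(match.group(1))
    # In this sample the article body is HTML-escaped inside the string.
    if isinstance(data.get("content"), str):
        data["content"] = html.unescape(data["content"])
    return data


if __name__ == "__main__":
    print(extract_article(SAMPLE_HTML))
```

A regular expression is enough here because the sample object contains no nested `};` sequence. A page whose script data is not strict JSON would need a more tolerant parser.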

If you want to crawl based on the DOM, you can use Google's Headless Chromium.

Headless Chromium
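One possible way to get the DOM after the JS has run is headless Chromium's `--dump-dom` switch, which prints the serialized DOM to stdout. The Python sketch below only wraps that command line. The binary name `chromium` is an assumption (it may be `chromium-browser`, `google-chrome`, or a full path on your system), and the helper names are hypothetical.

```python
import shutil
import subprocess


def build_dump_dom_command(url, binary="chromium"):
    """Build the argument list for a headless DOM dump of `url`."""
    return [binary, "--headless", "--dump-dom", url]


def fetch_rendered_dom(url, binary="chromium", timeout=30):
    """Run headless Chromium and return the DOM it prints to stdout."""
    # Fail early with a clear error if the assumed binary is not installed.
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} not found on PATH")
    result = subprocess.run(
        build_dump_dom_command(url, binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling `fetch_rendered_dom()` with the article link should return markup much closer to what Inspect Element shows than to what curl saves. Content that loads late may still be missing, because the dump is taken shortly after the page loads.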
