How to get the list of articles when crawling infoQ?

Problem description

I want to crawl infoQ articles, for example the articles under the AI topic, but I can't work out how the site requests the article list when it loads.
I am using Gecco, a Java crawler framework.

Background and what I have tried

Looking at the XHR requests, the payload is as follows:

{"type":1,"size":12,"id":31,"score":1546988400000}

This is the request sent on first load. After scrolling down with the mouse wheel, more articles are loaded via Ajax, with this request:

{"type":1,"size":12,"id":31,"score":1546495717917}

After that, loading further articles requires clicking the "load more" button, and the request format is the same as above.

All of these requests go to the same address:

https://www.infoq.cn/public/v1/article/getList

How does the site decide which articles to return?
Does it depend on how far the page has been scrolled?
How can I get the list of articles?


I recommend taking a look at Selenium.
Many websites now use anti-crawling techniques. Selenium simulates browser clicks and lets you read the DOM at the same time, so it is worth a try.


new Date(1546495717917)
Thu Jan 03 2019 14:08:37 GMT+0800
new Date(1546988400000)
Wed Jan 09 2019 07:00:00 GMT+0800

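So score appears to be a millisecond timestamp, and each request asks for 12 articles (size) from before that time. It does not depend on the scroll distance.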
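
If that reading is right, you can call the endpoint directly instead of driving a browser. Below is a rough Java sketch of the idea, not a confirmed client for the site. It uses the JDK's own java.net.http client (Java 11+) rather than Gecco. The class and method names are made up for illustration. Using POST with a JSON Content-Type is an assumption based on the request carrying a JSON payload. The response format is not shown in the question, so the sketch does not parse it. Presumably the score for the next page comes from the last article in the previous response, but that is a guess. The site may also require extra headers or cookies.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Instant;

public class InfoqListRequest {
    // Endpoint taken from the question.
    static final String LIST_URL = "https://www.infoq.cn/public/v1/article/getList";

    // Builds the JSON payload seen in the XHR requests.
    // Plain concatenation keeps the number formatting locale-independent.
    static String buildBody(int type, int size, int id, long score) {
        return "{\"type\":" + type + ",\"size\":" + size
                + ",\"id\":" + id + ",\"score\":" + score + "}";
    }

    // Interprets score as milliseconds since the Unix epoch (UTC).
    static Instant scoreToInstant(long score) {
        return Instant.ofEpochMilli(score);
    }

    // Assumption: the endpoint expects a POST with a JSON body.
    static HttpRequest buildRequest(String body) {
        return HttpRequest.newBuilder(URI.create(LIST_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Sends one request and returns the raw response text.
    // The response structure is unknown here, so it is left unparsed.
    static String fetchPage(HttpClient client, long score) throws Exception {
        HttpRequest request = buildRequest(buildBody(1, 12, 31, score));
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Offline demo using the two score values from the question.
        long first = 1546988400000L;
        long second = 1546495717917L;
        System.out.println(buildBody(1, 12, 31, first));
        System.out.println(scoreToInstant(first));   // 2019-01-08T23:00:00Z
        System.out.println(scoreToInstant(second));  // 2019-01-03T06:08:37.917Z
    }
}
```

To page through the list, you would call fetchPage(HttpClient.newHttpClient(), score) repeatedly, each time passing a score taken from the previous response, and stop when no more articles come back.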
