Given Sina Weibo's current structure, how do I crawl all the posts of a single user?

Recently I followed a well-informed blogger on Sina Weibo. He has more than 20,000 Weibo posts, mostly in plain text.

If you have worked on data collection and crawlers, could you share how you would think about and approach this? (I am still feeling my way through it too.)

Node.js java javascript

Mar.17,2021

I wrote a Weibo crawler before using Puppeteer. It fully simulates user behavior, so in my experience it was not detected and blocked.
You can take a look at this library.
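
The browser-simulation idea in this answer can be sketched as a scroll-and-collect loop. This is only a minimal sketch, not the answerer's code: `page` is assumed to be a Puppeteer `Page` that is already logged in and showing the user's profile, and `.post-text` is a hypothetical selector that has to be replaced after inspecting the real page.

```javascript
// Scroll a page opened by Puppeteer until no new posts load, then return them.
// `page` is a Puppeteer Page; ".post-text" is a hypothetical selector.
async function scrollAndCollect(page, selector = ".post-text", maxRounds = 50, waitMs = 1500) {
  let texts = [];
  for (let round = 0; round < maxRounds; round++) {
    // Scroll to the bottom the way a user would, to trigger lazy loading.
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    // Give the page time to load the next batch of posts.
    await new Promise((resolve) => setTimeout(resolve, waitMs));
    // Read the text of every element that currently matches the selector.
    const current = await page.$$eval(selector, (nodes) =>
      nodes.map((n) => n.textContent.trim())
    );
    if (current.length === texts.length) break; // nothing new appeared
    texts = current;
  }
  return texts;
}
```

With Puppeteer installed, `page` would typically come from `puppeteer.launch()`, then `browser.newPage()` and `page.goto(...)`. The loop stops as soon as one scroll adds no new matching elements, so `maxRounds` is only a safety limit.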

Crawling Weibo is not permitted. Please read Weibo's user agreement carefully. So if you do it, do it quietly rather than with so much fanfare.

Java
I have never crawled Weibo, but the idea is to first obtain the authentication cookie, token, and so on, then capture packets with Fiddler to find the interface that requests the data, and then use Jsoup to extract the Weibo posts and persist them.
As for the source, there should be an App interface, a PC page, or an H5 page; see which one is easiest to work with.
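
The steps in this answer (reuse a logged-in cookie, call the data interface found in the packet capture, then store the posts) can be sketched as a paging loop. This is a minimal sketch in JavaScript for Node.js 18+ rather than Java with Jsoup, and every specific in it is an assumption: the endpoint URL, the `uid` and `page` parameters, and the `list` field of the response are hypothetical placeholders for whatever the capture actually shows.

```javascript
// Hypothetical endpoint: replace with the interface seen in the packet capture.
const ENDPOINT = "https://example.com/api/user_posts";

// Build the URL and headers for one page of a user's posts.
function buildRequest(uid, page, cookie) {
  const url = new URL(ENDPOINT);
  url.searchParams.set("uid", String(uid));   // assumed parameter name
  url.searchParams.set("page", String(page)); // assumed parameter name
  return { url: url.toString(), headers: { Cookie: cookie } };
}

// Fetch one page with the global fetch of Node.js 18+ and return its posts.
async function fetchPage(uid, page, cookie) {
  const { url, headers } = buildRequest(uid, page, cookie);
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  const body = await res.json();
  return Array.isArray(body.list) ? body.list : []; // assumed response shape
}

// Walk the pages until an empty page comes back, passing each post to save().
// getPage is injectable so the loop can be exercised without network access.
async function crawlUser(uid, cookie, save, getPage = fetchPage, delayMs = 1000) {
  let total = 0;
  for (let page = 1; ; page++) {
    const posts = await getPage(uid, page, cookie);
    if (posts.length === 0) break;
    for (const post of posts) {
      await save(post);
      total++;
    }
    // Pause between pages to keep the request rate low.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return total;
}
```

`save` can be anything that persists one post, for example a function that appends a JSON line to a file or inserts a database row. Stopping on the first empty page is itself an assumption about how the interface signals the end of the list.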

I previously wrote a simulated login in Java and crawled my own private messages.
Because I was lazy, instead of using Weibo's API I used Fiddler to capture packets, analyze the parameters, simulate a browser login, send requests, and parse the JSON.
The disadvantage is that this approach is relatively passive: if they change a single parameter, the program stops working.
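
Since the weak point described here is that the captured interface can change without notice, one option is to validate each response while parsing it, so a change shows up as an error instead of as silently empty data. The sketch below assumes a hypothetical response with a `list` array whose items have `id`, `created_at`, and `text` fields; the real names have to be taken from the captured traffic.

```javascript
// Turn one captured JSON response into plain post records.
// The field names (list, id, created_at, text) are assumptions for illustration.
function parsePosts(jsonText) {
  const body = JSON.parse(jsonText);
  if (!body || !Array.isArray(body.list)) {
    // Fail loudly so a changed response format is noticed immediately.
    throw new Error("Unexpected response shape: missing 'list' array");
  }
  return body.list.map((item) => ({
    id: String(item.id),
    createdAt: item.created_at ?? null,
    // Posts are mostly plain text, so strip any inline HTML tags.
    text: String(item.text ?? "").replace(/<[^>]*>/g, "").trim(),
  }));
}
```

The regular expression is only a rough tag stripper for mostly plain-text posts; it is not a general HTML parser.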

If I were asked to write another one now, I would choose to write a Chrome extension.
After all, it runs in the browser, so there is no need to worry about authentication; just crawl.
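
The extension idea works because a content script runs inside the page where the user is already logged in. A minimal sketch of the collecting part is shown below; `.post-text` is a hypothetical selector, and the extension manifest and the step that exports the collected posts are left out.

```javascript
// Content-script sketch: runs inside the already logged-in page,
// so no cookie or token handling is needed.
// ".post-text" is a hypothetical selector; inspect the real page to find one.
function collectPostTexts(root, selector = ".post-text") {
  const seen = new Set();
  for (const node of root.querySelectorAll(selector)) {
    const text = (node.textContent || "").trim();
    if (text) seen.add(text); // the Set drops duplicates across repeated scans
  }
  return [...seen];
}

// Only runs where a DOM exists (for example, in a browser tab).
if (typeof document !== "undefined") {
  console.log(collectPostTexts(document));
}
```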

If you can't be bothered to write an extension, you can take a look at this:
"Without writing code: Webscraper grabs Li Xiaolai in 30 seconds"

Weibo has its own open platform, so you can get the data through Weibo's API. There is no need to use a crawler.
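
A request to the open platform could look roughly like the sketch below. The endpoint `statuses/user_timeline`, its `access_token`, `uid`, `page`, and `count` parameters, and the `statuses` field of the response are stated here from memory as assumptions; check them, and any limits on whose posts and how many can be read, against the open platform documentation before relying on this.

```javascript
// Assumed endpoint of the Weibo open platform; verify against its documentation.
const TIMELINE_URL = "https://api.weibo.com/2/statuses/user_timeline.json";

// Build the URL for one page of a user's timeline.
function buildTimelineUrl(accessToken, uid, page = 1, count = 50) {
  const params = new URLSearchParams({
    access_token: accessToken,
    uid: String(uid),
    page: String(page),
    count: String(count),
  });
  return `${TIMELINE_URL}?${params}`;
}

// Fetch one page using the global fetch of Node.js 18+.
async function fetchTimelinePage(accessToken, uid, page) {
  const res = await fetch(buildTimelineUrl(accessToken, uid, page));
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const body = await res.json();
  return body.statuses ?? []; // assumed response field
}
```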


