Scrapy: parse JS code, or use a regular expression?

I am crawling a website with Scrapy. The data is generated by JS. The script, extracted with XPath, looks like this:

define("page_data",
        {
            "uiConfig": {
                "type": "root",
                "items":[
                    {
                        "comid": "itemBasic",
                        "items":[
                            {
                                "id":123,
                                "data":
                            }
                        ]
                    }
                ]
            }
        }
    );

Is there any way to get at this data? Because of the large number of requests, Selenium is not being considered for now. Is there a way to manipulate this data the way JS would, for example by executing it as JS? Or, if I use a regular expression, how should I match it?


If the text you get is this regular, it is very simple and you don't even need a regular expression: drop the first line and the last line, then call data = json.loads(content), and you can get the value through data['uiConfig']['items'][0]['items'][0]['data'].
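A minimal sketch of the "drop the first and last line" approach. The sample text mirrors the snippet in the question, but the question leaves the value of "data" blank, so "placeholder" here is an assumed stand-in, and strip_wrapper is a hypothetical helper name.

```python
import json

# Assumed sample input. The original snippet elides the value of "data",
# so "placeholder" is a stand-in chosen only to make the JSON valid.
script_text = '''define("page_data",
        {
            "uiConfig": {
                "type": "root",
                "items": [
                    {
                        "comid": "itemBasic",
                        "items": [
                            {"id": 123, "data": "placeholder"}
                        ]
                    }
                ]
            }
        }
    );'''


def strip_wrapper(text):
    """Drop the first line (define(...) and the last line ());)."""
    lines = text.strip().splitlines()
    return "\n".join(lines[1:-1])


data = json.loads(strip_wrapper(script_text))
value = data["uiConfig"]["items"][0]["items"][0]["data"]
print(value)
```

This only works while the wrapper really does occupy exactly the first and last lines; if the site minifies the script onto one line, a pattern-based extraction is the safer choice.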


To be honest, I don't quite understand the requirement. You have this data, and what you need is the value of "data" inside it, right?
If so, you can use a regular expression to match the dictionary-like text that follows page_data, then deserialize it into a Python dictionary with json.loads(). After that, take out the content you want with ordinary Python key and index access.
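The regex-then-deserialize idea might look roughly like this. The pattern, the helper name extract_page_data, and the sample string are all assumptions made for illustration; the "data" value is a placeholder because the question does not show it.

```python
import json
import re

# Assumed sample input; "placeholder" stands in for the elided "data" value.
script_text = (
    'define("page_data", {"uiConfig": {"type": "root", "items": '
    '[{"comid": "itemBasic", "items": [{"id": 123, "data": "placeholder"}]}]}});'
)

# Capture everything from the first "{" after "page_data" up to the last "}"
# that is followed by the closing ")". The capture is deliberately greedy,
# because the whole nested object is wanted, not just its first inner part.
PAGE_DATA_RE = re.compile(
    r'define\(\s*"page_data"\s*,\s*(\{.*\})\s*\)\s*;?', re.DOTALL
)


def extract_page_data(text):
    """Return the page_data object as a dict, or None if it cannot be parsed."""
    match = PAGE_DATA_RE.search(text)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


page = extract_page_data(script_text)
if page is not None:
    print(page["uiConfig"]["items"][0]["items"][0]["data"])
```

In a spider, the text passed to extract_page_data would be whatever string your XPath query returns for the script element.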

If the text can't be deserialized, you can use a regular expression directly to find the data you need. Remember to use non-greedy matching.
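A small sketch of why non-greedy matching matters when pulling a single field straight out of the text. The fragment and its values are made up for illustration, and the pattern assumes the wanted value is a plain quoted string with no escaped quotes inside it.

```python
import re

# Hypothetical fragment; the field values are invented for this example.
fragment = '{"id": 123, "data": "first", "note": "second"}'

# Non-greedy: (.*?) stops at the first closing quote after "data".
non_greedy = re.search(r'"data"\s*:\s*"(.*?)"', fragment)

# Greedy: (.*) runs on to the last quote in the text and swallows other fields.
greedy = re.search(r'"data"\s*:\s*"(.*)"', fragment)

# Numeric fields can be matched on digits alone.
item_id = re.search(r'"id"\s*:\s*(\d+)', fragment)

print(non_greedy.group(1))  # first
print(greedy.group(1))      # first", "note": "second
print(item_id.group(1))     # 123
```

Scrapy selectors also offer .re() and .re_first(), so the same pattern can be applied directly to the result of an XPath query.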

If you want to execute the JS... well, your data doesn't really look like executable JS, so it seems you can't just run it.
If you do want to execute JS in Python, search Baidu: there are third-party packages that can parse and execute JS code.
