How to speed up php curl_multi access with cookie?

problem description

For work I need to curl a URL, and I must be logged in before the key fields are returned. Previously I used plain curl, accessing the page with a saved cookie file, but only 20 results come back per page and it was too slow: each curl call took roughly 1 to 1.3 s, and when there were many results (up to 50 pages), the curl calls would time out.

the environmental background of the problem and what methods you have tried

I then limited the number of pages: if more than 5 pages were returned, I only let it curl 5 of them. That kept it from being too slow, but it also returned fewer results. Later I learned about curl_multi and tried it myself. Without cookies, this multi-handle approach really is much faster: 25 sequential curls take about 28 seconds, while the curl_multi version takes about 2 seconds. But as soon as I add cookie access, the speed drops back to that of ordinary curl, or even a little slower. I don't know whether I wrote it wrong. The code is attached below.


related codes

This is the plain sequential curl version:

    if ($arr[0]["total"]["pcount"] > 1) {
        // cap the number of pages at 10
        $arr[0]["total"]["pcount"] = $arr[0]["total"]["pcount"] > 10 ? 10 : $arr[0]["total"]["pcount"];
        $contents = array();
        for ($i = 2; $i <= $arr[0]["total"]["pcount"]; $i++) {
            // fetch each page with the saved login cookie attached
            $send_url = "http://******.cn/ashx/GetList.ashx?pageIndex=" . $i . "&keys=" . htmlspecialchars($tj) . "&Lx=3";
            $ch = curl_init($send_url);
            curl_setopt($ch, CURLOPT_HEADER, 0);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            //curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
            curl_setopt($ch, CURLOPT_COOKIE, $cookie);
            $contents[] = curl_exec($ch);
            curl_close($ch);
        }
        $arr = array();
        foreach ($contents as $k => $v) {
            $arr[$k] = json_decode($v, true);
        }
    }
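
To see where each second goes, a minimal timing sketch could look like the following. It assumes the same $send_url and $cookie values as above; CURLINFO_TOTAL_TIME reports the time cURL itself spent on the transfer, so comparing it with wall-clock time shows whether the delay is in the request or in the surrounding code:

    // minimal timing sketch (assumes $send_url and $cookie exist as above)
    $start = microtime(true);
    $ch = curl_init($send_url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIE, $cookie);
    $body = curl_exec($ch);
    // time cURL spent on the transfer itself
    $transfer = curl_getinfo($ch, CURLINFO_TOTAL_TIME);
    curl_close($ch);
    printf("wall: %.3fs, transfer: %.3fs\n", microtime(true) - $start, $transfer);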
    
    
    //curl_multi version
    if ($arr[0]["total"]["pcount"] > 1) {
        $chArr = [];
        for ($i = 2; $i <= $arr[0]["total"]["pcount"]; $i++) {
            $chArr[$i] = "http://*****.cn/ashx/GetList.ashx?pageIndex=$i&keys=" . htmlspecialchars($tj) . "&Lx=3";
        }
        // fetch all remaining pages concurrently
        $result = $this->postMulti($chArr, $cookie);
        $arr = array_merge($arr, $result);
    }
    
    //curl_multi helper
    public static function postMulti($chArr, $cookie)
    {
        $ch_list = array();
        $multi_ch = curl_multi_init();
        // $chArr is keyed by page index (starting at 2)
        foreach ($chArr as $i => $url) {
            $ch_list[$i] = curl_init($url);
            curl_setopt($ch_list[$i], CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch_list[$i], CURLOPT_COOKIE, $cookie); // attach the saved login cookie
            curl_setopt($ch_list[$i], CURLOPT_TIMEOUT, 30);
            //curl_setopt($ch_list[$i], CURLOPT_COOKIEFILE, $cookie_file);
            curl_multi_add_handle($multi_ch, $ch_list[$i]);
        }

        // start the transfers
        $active = null;
        do {
            $mrc = curl_multi_exec($multi_ch, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);

        // drive the handles until all transfers finish (CURLM_OK means no multi-level error)
        while ($active && $mrc == CURLM_OK) {
            if (curl_multi_select($multi_ch) == -1) {
                usleep(100); // select() failed; back off briefly instead of busy-looping
            }
            do {
                $mrc = curl_multi_exec($multi_ch, $active);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }

        // collect the responses and count the HTTP 200s
        $result = array();
        $true_request = 0;
        foreach ($ch_list as $k => $ch) {
            $result[$k] = curl_multi_getcontent($ch);
            if (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200) {
                $true_request += 1;
            }
            curl_multi_remove_handle($multi_ch, $ch);
            curl_close($ch);
        }
        curl_multi_close($multi_ch);

        $arr = array();
        foreach ($result as $k => $v) {
            $arr[$k] = json_decode($v, true);
        }
        return $arr;
    }
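
For reference, a minimal usage sketch of the helper above; the class name SomeClass, the keyword "foo", and the cookie value are placeholders, since none of them appear in the question:

    // hypothetical standalone call to postMulti (real class name not shown in the question)
    $cookie = "ASP.NET_SessionId=xxxxxxxx"; // placeholder for the saved login cookie
    $chArr = [
        2 => "http://******.cn/ashx/GetList.ashx?pageIndex=2&keys=foo&Lx=3",
        3 => "http://******.cn/ashx/GetList.ashx?pageIndex=3&keys=foo&Lx=3",
    ];
    $pages = SomeClass::postMulti($chArr, $cookie); // decoded JSON responses keyed by page index

One thing worth checking, though it is only a guess from the symptoms: many servers lock session state per session ID and serialize concurrent requests that share one session cookie, which would make curl_multi fall back to sequential speed exactly as described.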

what result do you expect? What is the error message actually seen?

Php
Mar.28,2021

  1. Grasp the nature of the problem you are trying to solve; don't confine yourself to one fixed routine.
  2. The problem you need to solve is how to crawl data. The obstacle is inefficiency; specifically, too little data is crawled per unit of time.
  3. What you need to do is improve efficiency, not keep studying curl.
  4. Solution: reduce the time per request and increase the number of concurrent requests.
  5. Reducing the time: whether the interface itself is slow or there is a problem in the method you call, record the time spent on each operation to narrow it down.
  6. Increasing concurrency: curl_multi simulates multithreading, and you can also create multiple processes to raise concurrency (see the sketch after this list).
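
A minimal sketch of the batching idea from point 6, reusing the postMulti helper from the question; the class name SomeClass, the function name fetchInBatches, and the batch size of 10 are all illustrative assumptions, not from the original:

    // hypothetical wrapper that runs curl_multi in fixed-size batches
    function fetchInBatches(array $urls, $cookie, $batchSize = 10)
    {
        $all = [];
        // chunk the URL list, preserving the page-index keys
        foreach (array_chunk($urls, $batchSize, true) as $batch) {
            // each batch's requests run concurrently inside postMulti
            $all += SomeClass::postMulti($batch, $cookie);
        }
        return $all;
    }

Batching caps how many connections are open at once, so a long page list raises concurrency without overwhelming either end.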

Open your mind:

  1. Can you read the database directly instead of going through the interface?
  2. Try Python's scrapy.