Pyspider runs correctly when debugging, but automatic runs produce no results.

1. I wrote a pyspider script. Debugging runs without errors and the results are inserted into the database, but after the first successful automatic run it never produces results again: the log messages all report success, yet no data is inserted.

2. The code is as follows:

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2018-04-27 11:58:40
# Project: tripadvistor_bj

from pyspider.libs.base_handler import *
import pymongo

class Handler(BaseHandler):

    crawl_config = {
        "headers": {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER"
        }
    }

    client = pymongo.MongoClient("localhost")
    db = client["trip"]

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl("https://www.tripadvisor.cn/Attractions-g294212-Activities-Beijing.html",
                   callback=self.index_page, validate_cert=False, auto_recrawl=True)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc(".listing_info .listing_title a").items():
            self.crawl(each.attr.href, callback=self.detail_page, validate_cert=False)

        next = response.doc(".pagination .nav.next").attr.href
        self.crawl(next, callback=self.index_page, validate_cert=False)

    @config(priority=2)
    def detail_page(self, response):
        name = response.doc("#HEADING").text() + response.doc("#HEADING > span").text()
        addr = response.doc("span.street-address").text()
        type = response.doc("span.header_detail.attraction_details > div > a:nth-child(1)").text()
        phone = response.doc("div.blEntry.phone > span:nth-child(2)").text()
        desc = response.doc("div.prw_rup.prw_common_location_description > div > div.text").text()

        return {
            "url": response.url,
            "title": response.doc("title").text(),
            "name": name,
            "addr": addr,
            "type": type,
            "tel": phone,
            "desc": desc
        }

    def on_result(self, result):
        if result:
            self.save_to_mongo(result)

    def save_to_mongo(self, result):
        if self.db["beijing"].insert(result):
            print("save to mongo", result)

3. Manual debugging results:
[I 180428 10:55:10 tornado_fetcher:419] [200] tripadvistor_bj:2370a7545cee745073ca951c016aa06f https://www.tripadvisor.cn/At. 0.83s
[I 180428 10:55:12 tornado_fetcher:419] [200] tripadvistor_bj:a89af7ac6933d3284facd55ec0210234 https://www.tripadvisor.cn/At. 0.72s
Checking the database, one record has been added.
Automatic run log:
[I 180428 10:56:03 scheduler:906] task done tripadvistor_bj:on_start data:,on_start
[I 180428 10:56:04 scheduler:858] restart task tripadvistor_bj:on_finished data:,on_finished
[I 180428 10:56:04 scheduler:965] select tripadvistor_bj:on_finished data:,on_finished
[I 180428 10:56:04 tornado_fetcher:188] [200] tripadvistor_bj:on_finished data:,on_finished 0s
[I 180428 10:56:04 processor:202] process tripadvistor_bj:on_finished data:,on_finished-> [180428] len:11-> result:None fol:0 msg:0 err:None
[I 180428 10:56:05 scheduler:906] task done tripadvistor_bj:on_finished data:,on_finished
Checking the database again, no new data has been added.
Please give me some advice on how to track down the problem. Also, what does the number after tornado_fetcher: mean, and does it have anything to do with this?

Mar.10,2021

pyspider automatically deduplicates tasks: links that have already been crawled will not be crawled again. See https://codeshelper.com/q/10.
pyspider manual: http://docs.pyspider.org/en/latest/apis/self.crawl/#itag
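For illustration, a minimal sketch of the per-request form of itag described on that manual page; the "v1" value and the method body below are placeholder assumptions, not code from the question:

from pyspider.libs.base_handler import *

class Handler(BaseHandler):

    def index_page(self, response):
        for each in response.doc(".listing_info .listing_title a").items():
            # itag is stored with the task; on the next run it is compared with the
            # previously stored value, and if it differs pyspider treats the task as
            # modified and re-crawls it even though the URL has been seen before.
            self.crawl(each.attr.href, callback=self.detail_page,
                       validate_cert=False,
                       itag="v1")  # placeholder version marker; change it to force a re-crawl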


I would like to change it like this:

class Handler(BaseHandler):

    crawl_config = {
        'itag': 'v223'
    }

How does itag work here, and what does 'v223' refer to in the code?


I have a similar problem. When I debug step by step manually I can get the data, but when it comes to running automatically, it just reports this:
[I 180819 21:07:33 tornado_fetcher:188] [200] meizitu_com:on_finished data:,on_finished 0s
[I 180819 21:07:33 processor:202] process meizitu_com:on_finished data:,on_finished-> [180819] len:11-> result:None fol:0 msg:0 err:None
[I 180819 21:07:33 scheduler:906] task done meizitu_com:on_finished data:,on_finished
