Monitoring page changes and scheduled incremental crawling

I have a project in which I want to start crawling a page at 19:00 every day, retry every 30 minutes until the new (incremental) content appears, and then repeat the cycle at 19:00 the next day. The configuration is as follows:

@every(minutes=30)
def on_start(self):
    ...


@config(age=24 * 60 * 60)
def index_page(self, response):
    ...
  1. With this setup (every = 30 minutes, age = 24 hours), does it achieve a timed start?
    If I want a scheduled job that starts at 19:00 every day, is there a more appropriate way to make the first run begin at 19:00?
  2. In addition, the URL of the project's web page can change even when the content is the same. Apart from manually comparing against the local database, is there a more appropriate way to monitor the page and crawl only the increments?
Jun.13,2021

I solved the first problem myself:
call Python's date and time interface and use an if statement to check the current time.
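A minimal sketch of that time check, assuming the decorators in the question are pyspider's and that on_start keeps firing every 30 minutes. The names START, should_crawl and last_increment_date are hypothetical, and how the last successful date is stored is left open:

```python
from datetime import date, datetime, time

# Assumed start of the daily polling window (19:00 local time).
START = time(19, 0)


def should_crawl(now, last_increment_date=None):
    """Decide whether this 30-minute tick should trigger a crawl.

    now: the current datetime.
    last_increment_date: the date on which new content was last found,
    or None if nothing has been found yet (hypothetical bookkeeping).
    """
    if now.time() < START:
        # Before 19:00: stay idle until the window opens.
        return False
    # At or after 19:00: keep polling until today's increment is found.
    return last_increment_date != now.date()


if __name__ == "__main__":
    print(should_crawl(datetime.now()))
```

Inside on_start, the crawl call would then sit behind something like if should_crawl(datetime.now(), last_date), so ticks before 19:00, or after the day's increment has been found, do nothing.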
As for the second question: since the URL can change, comparing against the local database, as you describe, may be the only way to do it right now.
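One way to keep that comparison cheap is to store a fingerprint of the page content rather than keying on the URL. The sketch below is only an illustration under assumptions: content_fingerprint and is_new are hypothetical helpers, and an in-memory set stands in for the local database.

```python
import hashlib


def content_fingerprint(text):
    """Hash the page text after collapsing whitespace, ignoring the URL."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_new(text, seen):
    """Return True and record the fingerprint if this content is unseen.

    seen: a set of fingerprints; in a real project this would be a
    lookup against the local database (assumption).
    """
    fingerprint = content_fingerprint(text)
    if fingerprint in seen:
        return False
    seen.add(fingerprint)
    return True


if __name__ == "__main__":
    seen = set()
    print(is_new("same article body", seen))
    print(is_new("same   article body\n", seen))
```

Because the key is derived from the content, the same article fetched from a different URL is still recognized as already crawled. What to hash (the whole page or only the extracted article text) is a design choice that depends on how much of the page changes between requests.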
