For updating records at the 100k scale with celery, is it better to create 100k tasks, or to have one task scan the table in a loop? What are the advantages of each?

The database has 100k records, which may grow to 200k within half a year, and should not exceed 1 million in the end.

Server configuration:
Python 3.6, celery + RabbitMQ
CVM, Ubuntu 16.04, 1 GB RAM, 1 core
Database: PostgreSQL 10, limited to 100 connections

The structure of the table is as follows:

[table structure screenshot: 1111.png]

The last_update field is the time of the last request (we need to update each record at least once per hour, allowing an error of 10 minutes).
The uuid field determines the parameter passed to the other party's API when the request is made.

The last_update of each record may differ, depending on when the record was added, and it changes each time the record is updated.
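(To put the requirement in numbers: refreshing 100k records once an hour is roughly 100,000 / 3,600 ≈ 28 outbound requests per second on average, and about 280 per second at the 1 million ceiling, assuming the updates are spread evenly over the hour.)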

The idea of our current program is:
Create a task A in celery that runs every hour. It queries all records whose last update was more than 1 hour ago, builds a URL for each record in a for loop, and sends each URL to asynchronous task B.

Task B's job is simple: request data from the obtained URL, write it to the database, and update the last_update field.
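Roughly, what we have in mind looks like the sketch below (the broker URL, API_URL, and the fetch_stale_records / save_record database helpers are placeholders, not our real code):

from datetime import datetime, timedelta

import requests
from celery import Celery

app = Celery('tasks', broker='amqp://localhost//')

API_URL = 'https://example.com/api/{uuid}'  # placeholder for the other party's API

@app.task
def task_a():
    # Scheduled hourly (e.g. via celery beat): find every record whose
    # last_update is more than an hour old and hand each one to task B.
    cutoff = datetime.utcnow() - timedelta(hours=1)
    for record_id, uuid in fetch_stale_records(cutoff):  # placeholder DB helper
        task_b.delay(record_id, API_URL.format(uuid=uuid))

@app.task
def task_b(record_id, url):
    # Request the data for one record, write it back, and refresh last_update.
    rv = requests.get(url, timeout=30)
    save_record(record_id, rv.json(), datetime.utcnow())  # placeholder DB helper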

This way we only need to define 2 celery tasks, but it doesn't feel very robust.
It is said on the Internet that celery can handle millions of tasks, so I am considering whether to create a celery task for each record instead.

So I'm venturing to post here and ask the more experienced: in my case, which approach is better? Do you have any suggestions for improvement?

Thank you very much

Mar.20,2021

With the OP's current implementation, there is actually already one task instance for each record.
First of all, let's make two definitions:

  1. task: the celery method you define, for example:

@celery.task
def celery_task():
    pass

  2. task instance: an actual run of that task, created when you call it, for example:

task_instance = celery_task.delay()

So the OP's own design is:
two tasks (task 1: query; task 2: request and update), and up to a million task instances (if the data grows that far), that is, one task instance created for each eligible record.

Since it is not very convenient to reply to the OP's follow-up questions in the comments, I will answer them here.
Plan 1:
Increase the number of celery consumers and the number of worker processes.
Not recommended: there are many uncontrollable factors and it may not achieve the desired result.
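(For reference, on the celery side this is just a matter of starting the worker with more processes, or starting additional named workers; something like the lines below, where proj is a placeholder for the OP's project module.)

celery -A proj worker --loglevel=info --concurrency=8
celery -A proj worker -n worker2@%h --concurrency=8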
Plan 2 (my personal suggestion, adjust it to your situation):
Add your own "in progress" flag for each record.
I don't know how the OP runs celery, so the sketch below assumes tasks are already published and consumed through Redis, and uses Redis to store the flag as well.

import requests

# celery, db and redis below are assumed to be the application's existing objects.
two_hours = 2 * 60 * 60  # flag expiry in seconds

# Hourly task: find the records that need refreshing and dispatch an
# update task for each one that is not already being handled.
@celery.task
def query_from_db():
    results = db.query  # query the records whose last_update is more than 1 hour old
    for result in results:
        if redis.get(result.id):
            continue
        # Set a flag so that, while the update task for this record has not
        # finished yet, the next run of query_from_db will skip it.
        # update_result deletes result.id from redis when it is done;
        # the two-hour expiry keeps the record from being blocked forever
        # if that task dies.
        redis.set(result.id, 'something', two_hours)
        update_result.delay(result.id)

@celery.task
def update_result(result_id):
    result = db.query.get(result_id)
    rv = requests.get(.....)
    result.update(rv.json())
    redis.delete(result_id)