For updating records at the 100k scale with celery, is it better to create 100k tasks, or to have one task scan the table in a loop? What are the advantages of each?

The database has 100k records, which may grow to 200k within half a year, and should not exceed 1 million in the end.

Server configuration:
Python 3.6, celery + RabbitMQ
CVM, Ubuntu 16.04, 1 GB RAM, 1 core
Database: PostgreSQL 10, limited to 100 connections

The structure of the table is as follows:

[table structure screenshot: 1111.png]

The last_update field is the time of the last request (we need to update each record at least once per hour, allowing an error of 10 minutes).
The uuid field determines the parameter passed to the other party's API when the request is made.

The last_update of each record may differ, depending on when the record was added, and it changes each time the record is updated.
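(To put the requirement in numbers: refreshing 100k records once an hour is roughly 100,000 / 3,600 ≈ 28 outbound requests per second on average, and about 280 per second at the 1 million ceiling, assuming the updates are spread evenly over the hour.)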

The idea of our current program is:
Create a task A in celery that runs every hour. It queries all records whose last update was more than 1 hour ago, builds a URL for each record in a for loop, and sends each URL to asynchronous task B.

Task B's job is simple: request data from the obtained URL, write it to the database, and update the last_update field.
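Roughly, what we have in mind looks like the sketch below (the broker URL, API_URL, and the fetch_stale_records / save_record database helpers are placeholders, not our real code):

from datetime import datetime, timedelta

import requests
from celery import Celery

app = Celery('tasks', broker='amqp://localhost//')

API_URL = 'https://example.com/api/{uuid}'  # placeholder for the other party's API

@app.task
def task_a():
    # Scheduled hourly (e.g. via celery beat): find every record whose
    # last_update is more than an hour old and hand each one to task B.
    cutoff = datetime.utcnow() - timedelta(hours=1)
    for record_id, uuid in fetch_stale_records(cutoff):  # placeholder DB helper
        task_b.delay(record_id, API_URL.format(uuid=uuid))

@app.task
def task_b(record_id, url):
    # Request the data for one record, write it back, and refresh last_update.
    rv = requests.get(url, timeout=30)
    save_record(record_id, rv.json(), datetime.utcnow())  # placeholder DB helper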

This way we only need to define 2 celery tasks, but it doesn't feel very robust.
It is said on the Internet that celery can handle millions of tasks, so I am considering whether to create a celery task for each record instead.

So I'm venturing to post here and ask the more experienced: in my case, which approach is better? Do you have any suggestions for improvement?

Thank you very much

Mar.20,2021

With the OP's current implementation, there is actually already one task instance for each record.
First of all, let's make two definitions:

  1. task: the celery method you define, for example:

@celery.task
def celery_task():
    pass

  2. task instance: an actual run of that task, created when you call it, for example:

task_instance = celery_task.delay()

So the OP's own design is:
two tasks (task 1: query; task 2: request and update), and up to a million task instances (if the data grows that far), that is, one task instance created for each eligible record.

Since it is not very convenient to reply to the OP's follow-up questions in the comments, I will answer them here.
Plan 1:
Increase the number of celery consumers and the number of worker processes.
Not recommended: there are many uncontrollable factors and it may not achieve the desired result.
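(For reference, on the celery side this is just a matter of starting the worker with more processes, or starting additional named workers; something like the lines below, where proj is a placeholder for the OP's project module.)

celery -A proj worker --loglevel=info --concurrency=8
celery -A proj worker -n worker2@%h --concurrency=8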
Plan 2 (my personal suggestion, adjust it to your situation):
Add your own "in progress" flag for each record.
I don't know how the OP runs celery, so the sketch below assumes tasks are already published and consumed through Redis, and uses Redis to store the flag as well.

import requests

# celery, db and redis below are assumed to be the application's existing objects.
two_hours = 2 * 60 * 60  # flag expiry in seconds

# Hourly task: find the records that need refreshing and dispatch an
# update task for each one that is not already being handled.
@celery.task
def query_from_db():
    results = db.query  # query the records whose last_update is more than 1 hour old
    for result in results:
        if redis.get(result.id):
            continue
        # Set a flag so that, while the update task for this record has not
        # finished yet, the next run of query_from_db will skip it.
        # update_result deletes result.id from redis when it is done;
        # the two-hour expiry keeps the record from being blocked forever
        # if that task dies.
        redis.set(result.id, 'something', two_hours)
        update_result.delay(result.id)

@celery.task
def update_result(result_id):
    result = db.query.get(result_id)
    rv = requests.get(.....)
    result.update(rv.json())
    redis.delete(result_id)