How to design sub-tables for a MySQL table with hundreds of millions of rows?

There is a reading-reward log table with approximately 100 million rows (storage size 50 GB).
Table structure: id, num (count), uid (user id), acid (article id), ac_url (article path), atime, channel (channel).
This table has three commonly used queries:
1. Query by uid for the user's cumulative reading rewards: sum(num)
2. Query by atime for the cumulative reading rewards within a time range: sum(num)
3. Query by uid + atime for the cumulative reading rewards: sum(num)
All of the above queries are now very slow. How should I split the table into sub-tables or partitions?
If I split by day to suit the time ranges in queries 2 and 3, won't every query have to join across tables?
If I split by the last four digits of the user id, does that only speed up query 1? What about the time query or the combined query?

Also, how can the old data be written into the sub-tables as quickly as possible?

I would sincerely appreciate advice on how to solve this. Thank you!

Mar.07,2021

  1. Use MySQL middleware to split the table (for example, by month). This is not a good solution.
  2. A distributed database is recommended, such as TiDB or Aliyun's commercial distributed database.

  • Split by a hash of uid (or by the last four digits, as you said); this supports queries 1 and 3.

    • Advantage: concurrency. Splitting by uid spreads the concurrent load evenly across the tables; splitting by time does not solve the concurrency problem.
  • For query 2, summarize the tables above at a fixed time every day and store the result in a separate table (or sub-tables, e.g. monthly).
As the answer above said, TiDB is also fine.
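
A minimal sketch of the uid-hash routing described above, to show why queries 1 and 3 each touch only one sub-table. The table name `read_reward_log`, the shard count of 16, and the use of CRC32 as the hash are assumptions for illustration; the `%s` placeholders follow the style used by common Python MySQL drivers.

```python
import zlib

NUM_SHARDS = 16  # hypothetical shard count; pick one that fits your load


def shard_table(uid: int, num_shards: int = NUM_SHARDS) -> str:
    """Return the sub-table that holds every row for this uid."""
    # crc32 is stable across processes and machines, so the same uid
    # always maps to the same sub-table.
    bucket = zlib.crc32(str(uid).encode("ascii")) % num_shards
    return f"read_reward_log_{bucket}"  # hypothetical table name


def query_total_by_uid(uid: int):
    """Query 1: sum(num) for one user, read from a single sub-table."""
    sql = f"SELECT COALESCE(SUM(num), 0) FROM {shard_table(uid)} WHERE uid = %s"
    return sql, (uid,)


def query_total_by_uid_and_time(uid: int, start, end):
    """Query 3: sum(num) for one user in [start, end), still one sub-table."""
    sql = (
        f"SELECT COALESCE(SUM(num), 0) FROM {shard_table(uid)} "
        "WHERE uid = %s AND atime >= %s AND atime < %s"
    )
    return sql, (uid, start, end)
```

With this routing, an index on (uid, atime) inside each sub-table would likely serve both queries; query 2 has no uid to route on, which is why it is handled by a separate daily summary.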

Why not add two tables that store sum(num) directly, one keyed by uid and one keyed by atime? The log is append-only (rows are created, never updated), and writes will be far fewer than queries, not to mention that every sum query has to scan the entire result set.
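
One way this could look, sketched with Python's built-in sqlite3 as an in-memory stand-in for MySQL. The table names `reward_log`, `uid_total`, and `day_total`, and the choice of one row per day for the atime-keyed table, are assumptions. In MySQL the two-step bump below would more likely be a single `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE reward_log (id INTEGER PRIMARY KEY, uid INTEGER,
                                 num INTEGER, atime TEXT);
        CREATE TABLE uid_total (uid INTEGER PRIMARY KEY, total INTEGER NOT NULL);
        CREATE TABLE day_total (day TEXT PRIMARY KEY, total INTEGER NOT NULL);
    """)
    return conn


def _bump(conn, table, key_col, key, num):
    # Create the counter row if it is missing, then add to it.
    conn.execute(f"INSERT OR IGNORE INTO {table} ({key_col}, total) VALUES (?, 0)", (key,))
    conn.execute(f"UPDATE {table} SET total = total + ? WHERE {key_col} = ?", (num, key))


def record_reward(conn, uid, num, atime):
    """Append one log row and bump both running totals in one transaction."""
    day = atime[:10]  # 'YYYY-MM-DD' prefix of 'YYYY-MM-DD HH:MM:SS'
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO reward_log (uid, num, atime) VALUES (?, ?, ?)",
                     (uid, num, atime))
        _bump(conn, "uid_total", "uid", uid, num)
        _bump(conn, "day_total", "day", day, num)


def total_for_uid(conn, uid):
    """Query 1 becomes a primary-key lookup."""
    row = conn.execute("SELECT total FROM uid_total WHERE uid = ?", (uid,)).fetchone()
    return row[0] if row else 0


def total_for_days(conn, first_day, last_day):
    """Query 2 sums at most one row per day in the (inclusive) range."""
    row = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM day_total WHERE day BETWEEN ? AND ?",
        (first_day, last_day)).fetchone()
    return row[0]
```

These two tables cover queries 1 and 2 only at day granularity; query 3 would still need the raw log or a further table keyed by (uid, day).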


If you insist on your existing plan, you can only choose one of the two. Decide based on how often each query has historically been called and what each option costs.

For sub-tables, I used to take the modulus uid % 50 (the same idea as a hash), giving for example table_0, table_1, ..., table_49. The disadvantage is that querying by time takes a little more effort.
Whether to split by time or by uid mainly depends on which query matters most.
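
To make the trade-off concrete, here is a small sketch of the uid % 50 layout using Python's sqlite3 in memory as a stand-in for MySQL. The column set and helper names are assumptions. A uid lookup reads one table, while a time-range sum has to visit all 50.

```python
import sqlite3

NUM_TABLES = 50  # table_0 .. table_49, as in the uid % 50 scheme


def table_for(uid):
    return f"table_{uid % NUM_TABLES}"


def open_db():
    conn = sqlite3.connect(":memory:")
    for i in range(NUM_TABLES):
        conn.execute(f"CREATE TABLE table_{i} (uid INTEGER, num INTEGER, atime TEXT)")
    return conn


def insert(conn, uid, num, atime):
    conn.execute(f"INSERT INTO {table_for(uid)} (uid, num, atime) VALUES (?, ?, ?)",
                 (uid, num, atime))


def sum_by_uid(conn, uid):
    """By-uid query: exactly one sub-table is read."""
    row = conn.execute(
        f"SELECT COALESCE(SUM(num), 0) FROM {table_for(uid)} WHERE uid = ?",
        (uid,)).fetchone()
    return row[0]


def sum_by_time(conn, start, end):
    """By-time query: any sub-table may hold matching rows, so visit all of them."""
    total = 0
    for i in range(NUM_TABLES):
        row = conn.execute(
            f"SELECT COALESCE(SUM(num), 0) FROM table_{i} "
            "WHERE atime >= ? AND atime < ?", (start, end)).fetchone()
        total += row[0]
    return total
```

The fan-out in `sum_by_time` is the extra effort mentioned above; in practice the per-table queries could be run in parallel or replaced by a precomputed summary.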

In addition, with data in the hundreds of millions of rows, why stick with MySQL? You could switch to Elasticsearch or something similar.
If you often query summary data, you can also periodically roll the data up into a summary table automatically to make queries easier.

Statistics features do not require very high real-time performance or accuracy. It is recommended to build a new summary table and update it with the incremental data every evening, solving the performance problem through pre-computation.
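
A rough sketch of such an incremental evening job, again using Python's sqlite3 in memory in place of MySQL. The tables `reward_log`, `daily_summary`, and `rollup_state`, and the idea of tracking progress with the last processed id, are assumptions rather than anything stated in the thread. On a real MySQL server with concurrent writers, ids can become visible out of order, so a production job would need a safety margin or a time-based cut-off.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE reward_log (id INTEGER PRIMARY KEY AUTOINCREMENT,
                                 uid INTEGER, num INTEGER, atime TEXT);
        CREATE TABLE daily_summary (day TEXT, uid INTEGER, total INTEGER NOT NULL,
                                    PRIMARY KEY (day, uid));
        CREATE TABLE rollup_state (name TEXT PRIMARY KEY, last_id INTEGER NOT NULL);
        INSERT INTO rollup_state VALUES ('daily_summary', 0);
    """)
    return conn


def run_rollup(conn):
    """Fold log rows added since the last run into daily_summary.

    Returns the number of log rows processed.
    """
    with conn:  # one transaction: summary rows and watermark move together
        last_id = conn.execute(
            "SELECT last_id FROM rollup_state WHERE name = 'daily_summary'").fetchone()[0]
        max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM reward_log").fetchone()[0]
        if max_id == last_id:
            return 0
        rows = conn.execute(
            "SELECT substr(atime, 1, 10), uid, SUM(num), COUNT(*) FROM reward_log "
            "WHERE id > ? AND id <= ? GROUP BY substr(atime, 1, 10), uid",
            (last_id, max_id)).fetchall()
        processed = 0
        for day, uid, total, count in rows:
            conn.execute("INSERT OR IGNORE INTO daily_summary (day, uid, total) "
                         "VALUES (?, ?, 0)", (day, uid))
            conn.execute("UPDATE daily_summary SET total = total + ? "
                         "WHERE day = ? AND uid = ?", (total, day, uid))
            processed += count
        conn.execute("UPDATE rollup_state SET last_id = ? WHERE name = 'daily_summary'",
                     (max_id,))
    return processed


def sum_for_range(conn, first_day, last_day, uid=None):
    """Serve the time-range sums from the summary; pass uid to filter by user."""
    sql = "SELECT COALESCE(SUM(total), 0) FROM daily_summary WHERE day BETWEEN ? AND ?"
    params = [first_day, last_day]
    if uid is not None:
        sql += " AND uid = ?"
        params.append(uid)
    return conn.execute(sql, params).fetchone()[0]
```

Because the summary is keyed by (day, uid), it answers whole-day ranges only; rows written since the last run are not counted until the next run, which matches the relaxed real-time requirement described above.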
