Elasticsearch holds a large amount of data: how do I summarize a whole index?

There are multiple indices recording product data, each close to 20 GB.

I want to summarize each of these indices, for example computing statistics for all merchants in an index and saving them to a new index, but an aggs query cannot be paged. I know about scroll and scan, but I have also read that:

If a scroll query contains an aggregation, only the initial search response contains the aggregation results.
A scan query does not support aggregations.

So, if I want to compute statistics over an entire index, what are my options?

Mar.29,2021

ES paging scheme

  1. Do not use from/size.
  2. Obtain the sort values with search_after. Each request returns only the sort values and no other document data; these sort values are saved in ES.
  3. Page through the data with search_after as well: the starting point of each page is taken from the sort index saved in ES, and each request returns the full documents.

Advantages: access is fast. In theory a single page can be fetched on the order of seconds, and the load on ES is small.
Disadvantages: a separate thread is needed to maintain the array of sort values. Because documents in the paged index may be deleted or added before the sort index in ES has been updated, pages can occasionally drift: with low probability, a document is squeezed off its page by new data, or the last document of the previous page also appears on the current page (and the last document of the current page appears again on the next page).
Implementation details: at startup the program creates a sort index in ES, then uses search_after to repeatedly compute the sort values of the index being paged and saves them in ES. This process runs in a loop. To read a page, you only need to look up that page's sort values in the sort index and use them to query the paged index, which returns the data for that page.
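The scheme above (a precomputed sort index plus search_after reads) can be simulated in memory to see both how pages are located and why a stale sort index drifts. This is a minimal sketch, not the answer's implementation: the sort index is a plain Python list instead of an ES index, `page_from` imitates one search_after request with all three sort fields descending (as in the requests below), and `make_doc`, `build_sort_index` and `read_page` are hypothetical helpers.

```python
SORT_FIELDS = ("randomDouble", "randomInt", "phone")


def sort_key(doc: dict) -> tuple:
    # The values ES would return in each hit's "sort" array.
    return tuple(doc[f] for f in SORT_FIELDS)


def page_from(docs: list, size: int, after=None) -> list:
    """One search_after request: DESC on every sort field, strictly after `after`."""
    ordered = sorted(docs, key=sort_key, reverse=True)
    if after is not None:
        # With every field DESC, "after" means a strictly smaller key.
        ordered = [d for d in ordered if sort_key(d) < after]
    return ordered[:size]


def build_sort_index(docs: list, size: int) -> list:
    """search_after value that starts each page; entry 0 (page 1) is None."""
    starts, after = [None], None
    while True:
        page = page_from(docs, size, after)
        if not page:
            break
        after = sort_key(page[-1])  # last hit of this page starts the next one
        starts.append(after)
    # The last entry points past the final page, so drop it
    # (but keep page 1 even when docs is empty).
    return starts[:-1] or [None]


def read_page(docs: list, starts: list, n: int, size: int) -> list:
    """Fetch page n (1-based) using the saved sort index."""
    return page_from(docs, size, starts[n - 1])


def make_doc(i: int) -> dict:
    # Hypothetical documents: only randomInt varies, so it decides the order.
    return {"randomDouble": 0.5, "randomInt": i, "phone": f"phone-{i}"}


def ids(page: list) -> list:
    return [d["randomInt"] for d in page]


if __name__ == "__main__":
    docs = [make_doc(i) for i in range(1, 7)]
    starts = build_sort_index(docs, size=3)
    print(ids(read_page(docs, starts, 1, 3)), ids(read_page(docs, starts, 2, 3)))

    # Delete doc 6 without rebuilding the sort index.
    deleted = [d for d in docs if d["randomInt"] != 6]
    print(ids(read_page(deleted, starts, 1, 3)), ids(read_page(deleted, starts, 2, 3)))

    # Insert doc 7 without rebuilding the sort index.
    inserted = docs + [make_doc(7)]
    print(ids(read_page(inserted, starts, 1, 3)), ids(read_page(inserted, starts, 2, 3)))
```

With a fresh sort index the pages are `[6, 5, 4]` and `[3, 2, 1]`. After the deletion, doc 3 shows up on both pages; after the insertion, doc 4 is skipped entirely. These are the two kinds of drift listed under Disadvantages, and they go away once the background thread rebuilds the sort index.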

Overview of the RESTful requests:

  1. RESTful requests to obtain the sort values

First page:
GET test_delete/_search
{
  "size": 15,
  "sort": [
    {"randomDouble": "DESC"},
    {"randomInt": "DESC"},
    {"phone": "DESC"}
  ],
  "_source": "{}"
}
Take the sort values of the last hit and pass them as search_after to get the next page.
Page N:
GET test_delete/_search
{
  "size": 15,
  "sort": [
    {"randomDouble": "DESC"},
    {"randomInt": "DESC"},
    {"phone": "DESC"}
  ],
  "search_after": [],
  "_source": "{}"
}
This process runs in a separate thread that continuously updates the sort values.
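The loop in these requests (send a page, read the last hit's sort values, send them back as search_after) can be expressed as two small helpers that build the request bodies. This is a sketch under assumptions: `build_request` and `next_search_after` are hypothetical names, the body is a plain dict to be serialized as JSON for `GET test_delete/_search`, and it uses `"_source": false`, which ES documents for disabling source retrieval (the requests above use `"_source": "{}"` for the same purpose). Passing `with_source=True` gives the shape of the page-data requests in step 2.

```python
import json
from typing import Optional, Sequence

# Same sort clause as the requests above.
SORT = [{"randomDouble": "desc"}, {"randomInt": "desc"}, {"phone": "desc"}]


def build_request(size: int, search_after: Optional[Sequence] = None,
                  with_source: bool = False) -> dict:
    """Body for GET test_delete/_search.

    with_source=False -> sort-values request (no document source).
    with_source=True  -> page-data request (full documents).
    """
    body = {"size": size, "sort": list(SORT)}
    if not with_source:
        body["_source"] = False
    if search_after is not None:
        body["search_after"] = list(search_after)
    return body


def next_search_after(response: dict) -> Optional[list]:
    """Sort values of the last hit, to be sent as search_after for the next page."""
    hits = response["hits"]["hits"]
    # A sorted search returns each hit's sort values in hit["sort"].
    return hits[-1]["sort"] if hits else None


if __name__ == "__main__":
    print(json.dumps(build_request(15), indent=2))
    # Response shaped like an ES search response; values are made up for illustration.
    fake_response = {"hits": {"hits": [{"_id": "1", "sort": [0.93, 42, "13800000000"]}]}}
    print(json.dumps(build_request(15, next_search_after(fake_response)), indent=2))
```

When `next_search_after` returns `None`, the last page has been reached and the sort-values thread can start its next pass.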

  2. RESTful requests to fetch the data of a given page

First page:
GET test_delete/_search
{
  "size": 15,
  "sort": [
    {"randomDouble": "DESC"},
    {"randomInt": "DESC"},
    {"phone": "DESC"}
  ]
}
Page N:
GET test_delete/_search
{
  "size": 15,
  "sort": [
    {"randomDouble": "DESC"},
    {"randomInt": "DESC"},
    {"phone": "DESC"}
  ],
  "search_after": []
}
Compared with the requests that obtain the sort values, these requests drop the "_source": "{}" setting, so each page returns the full documents.
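Coming back to the question, one way to use this paging scheme for statistics is to walk the whole index page by page and aggregate on the client. The sketch below is an illustration only, not part of the answer's method. `merchant_id`, `price` and `seq` are hypothetical field names. `make_fake_fetch` is an in-memory stand-in for the page-data request (sorted on `seq` DESC). In a real setup, each `fetch_page` call would send a search_after request, and the resulting dict could then be written to a new index.

```python
from collections import defaultdict
from typing import Callable, Optional


def summarize_by_merchant(fetch_page: Callable[[Optional[list]], list]) -> dict:
    """Walk every page with search_after and accumulate per-merchant totals.

    fetch_page(search_after) stands in for one sorted search request: it must
    return that page's hits (each with "_source" and "sort"), or [] when done.
    """
    stats = defaultdict(lambda: {"count": 0, "total_price": 0.0})
    after = None
    while True:
        hits = fetch_page(after)
        if not hits:
            return dict(stats)
        for hit in hits:
            src = hit["_source"]
            entry = stats[src["merchant_id"]]  # hypothetical field
            entry["count"] += 1
            entry["total_price"] += src["price"]  # hypothetical field
        after = hits[-1]["sort"]  # resume after the last hit of this page


def make_fake_fetch(docs: list, size: int) -> Callable[[Optional[list]], list]:
    """In-memory stand-in for GET <index>/_search sorted on "seq" DESC."""
    ordered = sorted(docs, key=lambda d: d["seq"], reverse=True)

    def fetch(after: Optional[list]) -> list:
        rest = [d for d in ordered if after is None or d["seq"] < after[0]]
        return [{"_source": d, "sort": [d["seq"]]} for d in rest[:size]]

    return fetch


if __name__ == "__main__":
    docs = [{"seq": i, "merchant_id": "m1" if i % 2 else "m2", "price": float(i)}
            for i in range(1, 6)]
    print(summarize_by_merchant(make_fake_fetch(docs, size=2)))
```

Every document is read once, so the cost grows with the index size (about 20 GB per index here). Running this per index in the background, as with the sort-values thread, avoids blocking readers.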
