Elasticsearch JSON aggregation search

We now need to count user operations. The data volume is too large to keep in MySQL, so we plan to use Elasticsearch, but we have run into a problem. Here is the document structure:

{
  "_index": "test_vpn_operation-2018-04-08",
  "_type": "vpn_operation",
  "_id": "AWKiw45UfMhbLGkLP-v3",
  "_score": null,
  "_source": {
    "app_version": "2.0.0",
    "dateline": 1523149062,
    "channel": "pc",
    "edition": "jichu",
    "message": "{\"date_time\":\"2018-04-08T08:59:07\",\"account_id\":447189,\"operation\":[{\"account_id\":\"447189\",\"operation_id\":1,\"timestamp\":1523149062},{\"account_id\":\"447189\",\"operation_id\":1,\"timestamp\":1523149063}],\"channel\":\"pc\",\"edition\":\"jichu\",\"app_version\":\"2.0.0\",\"dateline\":1523149062}",
    "type": "vpn_operation",
    "path": "/data/wwwroot/vpnApi/elklog/vpn_operation_record_20180408",
    "@timestamp": "2018-04-08T00:59:07.722Z",
    "account_id": 447189,
    "date_time": "2018-04-08T08:59:07",
    "@version": "1",
    "host": "JG-otter",
    "operation": [
      {
        "operation_id": 1,
        "account_id": "447189",
        "timestamp": 1523149062
      },
      {
        "operation_id": 1,
        "account_id": "447189",
        "timestamp": 1523149063
      }
    ]
  },
  "fields": {
    "date_time": [
      1523177947000
    ],
    "@timestamp": [
      1523149147722
    ]
  },
  "highlight": {
    "message": [
      "{\"date_time\":\"2018-04-08T08:59:@kibana-highlighted-field@07@/kibana-highlighted-field@\",\"account_id\":447189,\"operation\":[{\"account_id\":\"447189\",\"operation_id\":1,\"timestamp\":1523149062},{\"account_id\":\"447189\",\"operation_id\":1,\"timestamp\":1523149063}],\"channel\":\"pc\",\"edition\":\"jichu\",\"app_version\":\"2.0.0\",\"dateline\":1523149062}"
    ]
  },
  "sort": [
    1523149147722
  ]
}

I originally intended to aggregate on operation.operation_id to get statistics similar to MySQL's GROUP BY, but the results were inaccurate — it behaves as if the operation_id field did not exist under operation. Is there any way to achieve this, or how do I need to adjust the structure to meet the requirement?
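For context, the flat aggregation presumably being attempted looks something like this sketch (index pattern and request shape are assumptions, written in Elasticsearch console syntax); with the default object mapping, the array elements are flattened at index time, so per-element statistics come out wrong:

```json
POST test_vpn_operation-*/_search
{
  "size": 0,
  "aggs": {
    "by_operation_id": {
      "terms": { "field": "operation.operation_id" }
    }
  }
}
```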

Mar.02,2021

guess 1: in your mapping, the array of objects is probably indexed with the default object type rather than as nested documents. With the object type, an array of objects is flattened, so the association between the fields inside each array element is lost. See the following three official documents:

an array of inner objects: ide/cn/elasticsearch/guide/cn/complex-core-fields.html#object-arrays
nested objects: ide/cn/elasticsearch/guide/cn/nested-objects.html
nested object mapping: ide/cn/elasticsearch/guide/cn/nested-mapping.html
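If flattening is indeed the cause, the usual fix is to map operation as nested and wrap the terms aggregation in a nested aggregation. A sketch, using ES 6.x-style syntax and the index/type/field names from the document above (note that changing the mapping requires reindexing existing data):

```json
PUT test_vpn_operation-2018-04-08
{
  "mappings": {
    "vpn_operation": {
      "properties": {
        "operation": {
          "type": "nested",
          "properties": {
            "operation_id": { "type": "integer" },
            "account_id":   { "type": "keyword" },
            "timestamp":    { "type": "long" }
          }
        }
      }
    }
  }
}
```

With that mapping in place, the GROUP BY-style count becomes:

```json
POST test_vpn_operation-*/_search
{
  "size": 0,
  "aggs": {
    "ops": {
      "nested": { "path": "operation" },
      "aggs": {
        "by_operation_id": {
          "terms": { "field": "operation.operation_id" }
        }
      }
    }
  }
}
```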

guess 2: perhaps some of your documents are missing the operation.operation_id field, which would skew the results. In that case you can set a default value for the field via the aggregation's missing parameter. Official document: ide/en/elasticsearch/reference/current/search-aggregations-metrics-sum-aggregation.html#_missing_value_9 (missing value)
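A sketch of the missing parameter (the linked section describes it for the sum metric aggregation, but bucket aggregations such as terms accept it as well; the default value 0 here is an arbitrary placeholder):

```json
POST test_vpn_operation-*/_search
{
  "size": 0,
  "aggs": {
    "by_operation_id": {
      "terms": {
        "field": "operation.operation_id",
        "missing": 0
      }
    }
  }
}
```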

guess 3: perhaps the operation.operation_id field has different types across documents — for example, it was first indexed as an int and later as a string. (I do this against best practice myself, because my upstream data source is logs, and field changes in logs are common.) My approach is to query only the indices created after the change, because a mapping change only takes effect for newly created indices. For example, if the mapping is changed on March 1, it only applies starting with the index-2018-03-02 index. At that point a cross-index query spanning the change, such as one over the whole of March, will go wrong; instead, query only March 2 through March 31.
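Concretely, instead of a wildcard that spans the mapping change, you can enumerate only the indices created after it (index names here are hypothetical, following the daily-index pattern in the document; Kibana console syntax, where lines starting with # are comments):

```json
# risky: the wildcard also matches indices created before the March 1 mapping change
GET test_vpn_operation-2018-03-*/_search

# safer: list only indices created on or after March 2
GET test_vpn_operation-2018-03-02,test_vpn_operation-2018-03-03/_search
```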
