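Elasticsearch: counting duplicate and unique values

I used several aggregations here. They are listed below in the order in which they execute.

For duplicates:

- Terms aggregation
- Stats bucket aggregation

For non-duplicates:

- Terms aggregation
  - Bucket selector aggregation (as a sub-aggregation)
- Sum bucket aggregation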
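To make the intent of the two pipelines concrete, here is a minimal in-memory sketch in Python that mirrors what the aggregations compute. It does not talk to Elasticsearch; the function name `count_duplicates_and_uniques` and the sample list are illustrative assumptions, chosen to match the names that appear in the sample response.

```python
from collections import Counter


def count_duplicates_and_uniques(values):
    """Mimic the two aggregation pipelines on an in-memory list (illustrative only)."""
    # Terms aggregation: one bucket per distinct value, with its doc_count.
    buckets = Counter(values)

    # min_doc_count: 2 keeps only values seen at least twice;
    # the "count" reported by stats_bucket is the number of such buckets.
    duplicate_keys = sum(1 for c in buckets.values() if c >= 2)

    # bucket_selector keeps buckets whose _count == 1; sum_bucket then adds up
    # their doc counts, which equals the number of such buckets.
    unique_keys = sum(c for c in buckets.values() if c == 1)

    return duplicate_keys, unique_keys


# Hypothetical sample data matching the names in the example response.
names = ["jane", "jane", "joe", "joe", "john", "john", "jack", "steve"]
print(count_duplicates_and_uniques(names))  # (3, 2)
```

One difference from this sketch: in Elasticsearch both counts only cover the buckets that the terms aggregation actually returns (10 by default), so a field with many distinct values may need a larger `size` on each terms aggregation.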

Aggregation query:

POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "duplicate_aggs": {
      "terms": {
        "field": "firstname.keyword",
        "min_doc_count": 2
      }
    },
    "duplicate_bucketcount": {
      "stats_bucket": {
        "buckets_path": "duplicate_aggs._count"
      }
    },
    "nonduplicate_aggs": {
      "terms": {
        "field": "firstname.keyword"
      },
      "aggs": {
        "equal_one": {
          "bucket_selector": {
            "buckets_path": {
              "count": "_count"
            },
            "script": "params.count == 1"
          }
        }
      }
    },
    "nonduplicate_bucketcount": {
      "sum_bucket": {
        "buckets_path": "nonduplicate_aggs._count"
      }
    }
  }
}

Response:

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "duplicate_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "jane", "doc_count": 2 },
        { "key": "joe", "doc_count": 2 },
        { "key": "john", "doc_count": 2 }
      ]
    },
    "nonduplicate_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "jack", "doc_count": 1 },
        { "key": "steve", "doc_count": 1 }
      ]
    },
    "duplicate_bucketcount": {
      "count": 3,
      "min": 2,
      "max": 2,
      "avg": 2,
      "sum": 6
    },
    "nonduplicate_bucketcount": {
      "value": 2
    }
  }
}

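Note that in the response above, duplicate_bucketcount.count is the number of buckets returned by duplicate_aggs, that is, the number of keys that are duplicated (3 here). Likewise, nonduplicate_bucketcount.value is the number of keys that occur exactly once (2 here).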
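As a small illustration of reading those two numbers out of the response on the client side, the following sketch works on the response parsed into a Python dict. The helper name `summarize` is hypothetical, and the dict is a trimmed copy of the sample response rather than the output of a live query.

```python
def summarize(response):
    """Return (duplicated key count, unique key count) from a parsed search response."""
    aggs = response["aggregations"]
    # stats_bucket reports how many buckets duplicate_aggs produced.
    duplicates = aggs["duplicate_bucketcount"]["count"]
    # sum_bucket adds the doc counts of the buckets that passed the selector.
    uniques = int(aggs["nonduplicate_bucketcount"]["value"])
    return duplicates, uniques


# Trimmed copy of the sample response (only the fields used above).
response = {
    "aggregations": {
        "duplicate_bucketcount": {"count": 3, "min": 2, "max": 2, "avg": 2, "sum": 6},
        "nonduplicate_bucketcount": {"value": 2},
    }
}
print(summarize(response))  # (3, 2)
```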

Let me know if this helps!
