Well, I used several aggregations here. Below is the list of them, in the order in which they are executed.
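For duplicates:
- Terms aggregation
- Stats bucket aggregation
For non-duplicates:
- Terms aggregation
- Bucket selector aggregation (as a sub-aggregation)
- Sum bucket aggregation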
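To make it clearer what each step in the list computes, here is a small in-memory Python sketch that mimics the four aggregations on a plain list of first names. It is only an illustration, not how Elasticsearch evaluates them. The function name simulate_duplicate_aggs and the sample names (reconstructed to match the response further down) are hypothetical. The sketch also assumes every distinct name gets its own bucket, which real terms aggregations do not guarantee.

```python
from collections import Counter


def simulate_duplicate_aggs(firstnames):
    """Hypothetical in-memory imitation of the four aggregations.

    Assumption: every distinct name gets a bucket; real terms aggregations
    return only the top `size` terms (10 by default).
    """
    # terms aggregation: one bucket per distinct value, with its doc count
    counts = Counter(firstnames)

    # duplicate_aggs: terms with "min_doc_count": 2 keeps keys seen at least twice
    duplicate_counts = [c for c in counts.values() if c >= 2]

    # duplicate_bucketcount: stats_bucket over "duplicate_aggs._count"
    n = len(duplicate_counts)
    stats = {
        "count": n,
        "min": min(duplicate_counts) if n else None,
        "max": max(duplicate_counts) if n else None,
        "avg": sum(duplicate_counts) / n if n else None,
        "sum": sum(duplicate_counts),
    }

    # nonduplicate_aggs + bucket_selector "params.count == 1": keep single-doc keys
    nonduplicate = {k: c for k, c in counts.items() if c == 1}

    # nonduplicate_bucketcount: sum_bucket over "nonduplicate_aggs._count"
    return {
        "duplicate_bucketcount": stats,
        "nonduplicate_keys": sorted(nonduplicate),
        "nonduplicate_bucketcount": sum(nonduplicate.values()),
    }


# Hypothetical sample data matching the 8 documents in the response
names = ["jane", "jane", "joe", "joe", "john", "john", "jack", "steve"]
result = simulate_duplicate_aggs(names)
print(result["duplicate_bucketcount"])  # {'count': 3, 'min': 2, 'max': 2, 'avg': 2.0, 'sum': 6}
print(result["nonduplicate_keys"], result["nonduplicate_bucketcount"])  # ['jack', 'steve'] 2
```

One caveat the sketch hides: both terms aggregations return only the top 10 terms by default, ordered by descending doc_count. On an index with many distinct names, the real query may therefore miss some keys, and single-occurrence names are the ones most likely to be cut off. Raising size on the terms aggregations is one way to address this.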
Aggregation query:
    POST <your_index_name>/_search
    {
      "size": 0,
      "aggs": {
        "duplicate_aggs": {
          "terms": {
            "field": "firstname.keyword",
            "min_doc_count": 2
          }
        },
        "duplicate_bucketcount": {
          "stats_bucket": {
            "buckets_path": "duplicate_aggs._count"
          }
        },
        "nonduplicate_aggs": {
          "terms": {
            "field": "firstname.keyword"
          },
          "aggs": {
            "equal_one": {
              "bucket_selector": {
                "buckets_path": {
                  "count": "_count"
                },
                "script": "params.count == 1"
              }
            }
          }
        },
        "nonduplicate_bucketcount": {
          "sum_bucket": {
            "buckets_path": "nonduplicate_aggs._count"
          }
        }
      }
    }

Response:
    {
      "took": 10,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 8,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "duplicate_aggs": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            { "key": "jane", "doc_count": 2 },
            { "key": "joe", "doc_count": 2 },
            { "key": "john", "doc_count": 2 }
          ]
        },
        "nonduplicate_aggs": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            { "key": "jack", "doc_count": 1 },
            { "key": "steve", "doc_count": 1 }
          ]
        },
        "duplicate_bucketcount": {
          "count": 3,
          "min": 2,
          "max": 2,
          "avg": 2,
          "sum": 6
        },
        "nonduplicate_bucketcount": {
          "value": 2
        }
      }
    }

Note that in the response above, the key duplicate_bucketcount.count has the value 3. This is the bucket count, i.e. the number of keys that are duplicated. Likewise, nonduplicate_bucketcount.value (2) is the number of keys that occur exactly once.
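If you consume this response from application code, the counts can be read straight out of the parsed JSON. The sketch below is a hypothetical Python helper (the names summarize and RESPONSE_TEXT are made up for illustration). It assumes the response has already been fetched and works on a trimmed copy of the aggregations shown above. It also reads duplicate_bucketcount.sum, which (2 + 2 + 2 = 6) is the number of documents whose firstname is duplicated.

```python
import json

# Trimmed copy of the response above (only the fields read below).
RESPONSE_TEXT = """
{
  "aggregations": {
    "duplicate_bucketcount": {"count": 3, "min": 2, "max": 2, "avg": 2, "sum": 6},
    "nonduplicate_bucketcount": {"value": 2}
  }
}
"""


def summarize(response):
    """Hypothetical helper: pull the headline numbers out of a parsed response."""
    aggs = response["aggregations"]
    return {
        # distinct firstname values that occur at least twice
        "duplicate_keys": aggs["duplicate_bucketcount"]["count"],
        # documents whose firstname is one of those duplicated values
        "duplicate_docs": aggs["duplicate_bucketcount"]["sum"],
        # firstname values that occur exactly once
        "unique_keys": aggs["nonduplicate_bucketcount"]["value"],
    }


summary = summarize(json.loads(RESPONSE_TEXT))
print(summary)  # {'duplicate_keys': 3, 'duplicate_docs': 6, 'unique_keys': 2}
```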
Let me know if this helps!



