Not ideal, but I think it can satisfy your needs. Assuming `field1` is the field you use to define "duplicate" documents, change that field's mapping as follows:
```json
PUT /lastseen
{
  "mappings": {
    "test": {
      "properties": {
        "field1": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "field2": {
          "type": "string"
        },
        "lastseen": {
          "type": "long"
        }
      }
    }
  }
}
```
What this does is add a `.raw` sub-field that is `not_analyzed`, meaning it is indexed as-is, without being analyzed and broken down into terms. This is what makes finding the "somewhat duplicate" documents possible.
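To see why the `not_analyzed` sub-field matters, here is a rough plain-Python sketch (this is not Elasticsearch code, and the tokenizer below is a deliberate simplification of the standard analyzer):

```python
# Rough illustration (plain Python, not Elasticsearch) of the difference
# between an analyzed field and a not_analyzed sub-field.

def analyzed_terms(value):
    # Simplified stand-in for the standard analyzer: lowercase + whitespace split.
    return value.lower().split()

def not_analyzed_terms(value):
    # A not_analyzed field is indexed as a single, untouched term.
    return [value]

doc = "dinner carrot potato broccoli"

print(analyzed_terms(doc))      # ['dinner', 'carrot', 'potato', 'broccoli']
print(not_analyzed_terms(doc))  # ['dinner carrot potato broccoli']
```

A `terms` aggregation buckets by indexed terms, so on the analyzed `field1` it would count "dinner", "carrot", etc. individually, while on `field1.raw` it buckets whole values, which is what duplicate detection needs here.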
Then you need to use a `terms` aggregation on `field1.raw` (to find the duplicates) and a `top_hits` sub-aggregation to get a single document per `field1` value:
```json
GET /lastseen/test/_search
{
  "size": 0,
  "query": {
    "query_string": {
      "query": "dinner"
    }
  },
  "aggs": {
    "field1_unique": {
      "terms": {
        "field": "field1.raw",
        "size": 2
      },
      "aggs": {
        "first_one": {
          "top_hits": {
            "size": 1,
            "sort": [{"lastseen": {"order": "desc"}}]
          }
        }
      }
    }
  }
}
```
Additionally, the single document returned by `top_hits` is the one with the highest `lastseen` value (made possible by `"sort": [{"lastseen": {"order": "desc"}}]`). The results you get will look like this (note that they are under `aggregations`, not `hits`):
```json
...
"aggregations": {
  "field1_unique": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "dinner carrot potato broccoli",
        "doc_count": 2,
        "first_one": {
          "hits": {
            "total": 2,
            "max_score": null,
            "hits": [
              {
                "_index": "lastseen",
                "_type": "test",
                "_id": "AU60ZObtjKWeJgeyudI-",
                "_score": null,
                "_source": {
                  "field1": "dinner carrot potato broccoli",
                  "field2": "something here",
                  "lastseen": 1000
                },
                "sort": [1000]
              }
            ]
          }
        }
      },
      {
        "key": "fish chicken something",
        "doc_count": 2,
        "first_one": {
          "hits": {
            "total": 2,
            "max_score": null,
            "hits": [
              {
                "_index": "lastseen",
                "_type": "test",
                "_id": "AU60ZObtjKWeJgeyudJA",
                "_score": null,
                "_source": {
                  "field1": "fish chicken something",
                  "field2": "dinner",
                  "lastseen": 2000
                },
                "sort": [2000]
              }
            ]
          }
        }
      }
    ]
  }
}
```
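If it helps, here is a minimal Python sketch of post-processing such a response, assuming it has been parsed into a dict (e.g. by an Elasticsearch client or `json.loads`); the bucket and aggregation names match the query above, and the `response` literal is just an abridged copy of the sample result:

```python
# Minimal sketch: pull one document per field1 value out of the
# aggregation response above. Assumes the response is already a dict.

response = {
    "aggregations": {
        "field1_unique": {
            "buckets": [
                {
                    "key": "dinner carrot potato broccoli",
                    "doc_count": 2,
                    "first_one": {"hits": {"hits": [
                        {"_source": {"field1": "dinner carrot potato broccoli",
                                     "field2": "something here",
                                     "lastseen": 1000}}
                    ]}},
                },
                {
                    "key": "fish chicken something",
                    "doc_count": 2,
                    "first_one": {"hits": {"hits": [
                        {"_source": {"field1": "fish chicken something",
                                     "field2": "dinner",
                                     "lastseen": 2000}}
                    ]}},
                },
            ]
        }
    }
}

def unique_docs(resp):
    # One document per field1 value: the newest hit from each bucket,
    # since the top_hits sub-aggregation sorts by lastseen descending.
    buckets = resp["aggregations"]["field1_unique"]["buckets"]
    return [b["first_one"]["hits"]["hits"][0]["_source"] for b in buckets]

for doc in unique_docs(response):
    print(doc["field1"], doc["lastseen"])
```

Each bucket's `doc_count` tells you how many duplicates existed, while `unique_docs` keeps only the most recently seen copy of each.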


