What you want to achieve cannot be done in a single query. The first query filters the documents and fetches the IDs of the documents whose terms need to be counted. Assume you have the following mapping:

{
  "test": {
    "mappings": {
      "_doc": {
        "properties": {
          "details": {
            "type": "text",
            "store": true,
            "term_vector": "with_positions_offsets_payloads"
          },
          "name": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Suppose your query returns the following two documents:

{
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "details": "There is some content about cars here. Lots of cars!",
          "name": "n1"
        }
      },
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "details": "This page is all about cars",
          "name": "n2"
        }
      }
    ]
  }
}

From this response you can collect the IDs of all documents that matched the query. Here they are:
"_id": "1" and
"_id": "2"
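Collecting those IDs, and turning them into the body of a multi term vectors request that asks for the details field of each document, takes only a few lines of client code. This is a minimal Python sketch that assumes the search response has already been parsed into a dict. The helper names `matching_ids` and `mtermvectors_body` are hypothetical and not part of any Elasticsearch client.

```python
from typing import Any


def matching_ids(search_response: dict[str, Any]) -> list[str]:
    """Collect the _id of every hit in an already-parsed search response."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def mtermvectors_body(doc_ids: list[str], field: str) -> dict[str, Any]:
    """Build a _mtermvectors request body asking for one field per document.

    Hypothetical helper: it only assembles the JSON shape, it sends nothing.
    """
    return {"docs": [{"_id": doc_id, "fields": [field]} for doc_id in doc_ids]}


# Trimmed copy of the search response above (only the keys used here).
search_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "1", "_source": {"name": "n1"}},
            {"_id": "2", "_source": {"name": "n2"}},
        ],
    }
}

ids = matching_ids(search_response)
body = mtermvectors_body(ids, "details")
print(ids)  # ['1', '2']
print(body)
```

Keep in mind that hits.hits only holds the current page of results (10 hits by default), so for larger result sets you would need to page through the search results before building the request.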
Now we use the _mtermvectors API to get the frequency (count) of each term in the given field:

GET test/_doc/_mtermvectors
{
  "docs": [
    { "_id": "1", "fields": ["details"] },
    { "_id": "2", "fields": ["details"] }
  ]
}

This returns the following response:
{
  "docs": [
    {
      "_index": "test",
      "_type": "_doc",
      "_id": "1",
      "_version": 1,
      "found": true,
      "took": 8,
      "term_vectors": {
        "details": {
          "field_statistics": { "sum_doc_freq": 15, "doc_count": 2, "sum_ttf": 16 },
          "terms": {
            .... ,
            "cars": {
              "term_freq": 2,
              "tokens": [
                { "position": 5, "start_offset": 28, "end_offset": 32 },
                { "position": 9, "start_offset": 47, "end_offset": 51 }
              ]
            },
            ....
          }
        }
      }
    },
    {
      "_index": "test",
      "_type": "_doc",
      "_id": "2",
      "_version": 1,
      "found": true,
      "took": 2,
      "term_vectors": {
        "details": {
          "field_statistics": { "sum_doc_freq": 15, "doc_count": 2, "sum_ttf": 16 },
          "terms": {
            .... ,
            "cars": {
              "term_freq": 1,
              "tokens": [
                { "position": 5, "start_offset": 23, "end_offset": 27 }
              ]
            },
            ....
          }
        }
      }
    }
  ]
}

Note that I have used .... to stand in for the data of the other terms in the field, because the term vectors API returns details for every term in the field. You can extract the information for the term you need from this response; here I have shown it for cars, and the value you are interested in is term_freq.
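Pulling term_freq for one term out of every document in the response, and summing it if you want a total across all matching documents, can be sketched in Python as below. It assumes the _mtermvectors response has already been parsed into a dict; `term_freqs` is a hypothetical helper, and documents that were not found or do not contain the term are counted as 0.

```python
from typing import Any


def term_freqs(mtv_response: dict[str, Any], field: str, term: str) -> dict[str, int]:
    """Map each document _id to the term_freq of `term` in `field`.

    Hypothetical helper: documents without term vectors for `field`
    (e.g. "found": false) or without the term get a count of 0.
    """
    counts: dict[str, int] = {}
    for doc in mtv_response["docs"]:
        terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
        counts[doc["_id"]] = terms.get(term, {}).get("term_freq", 0)
    return counts


# Trimmed copy of the _mtermvectors response above; the "...." entries
# for the other terms are left out.
mtv_response = {
    "docs": [
        {"_id": "1", "found": True,
         "term_vectors": {"details": {"terms": {"cars": {"term_freq": 2}}}}},
        {"_id": "2", "found": True,
         "term_vectors": {"details": {"terms": {"cars": {"term_freq": 1}}}}},
    ]
}

per_doc = term_freqs(mtv_response, "details", "cars")
print(per_doc)                # {'1': 2, '2': 1}
print(sum(per_doc.values()))  # 3
```

The term has to be looked up in its analyzed form: the details field uses the default standard analyzer, which lowercases tokens, so "Cars" in the text is stored under the key "cars".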



