Elasticsearch: Counting terms in documents

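What you want to achieve cannot be done in a single query. The first query filters the documents and retrieves the IDs of those whose terms need to be counted. Suppose you have the following mapping: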

{
  "test": {
    "mappings": {
      "_doc": {
        "properties": {
          "details": {
            "type": "text",
            "store": true,
            "term_vector": "with_positions_offsets_payloads"
          },
          "name": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Suppose your query returns the following two documents:

{
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "details": "There is some content about cars here. Lots of cars!",
          "name": "n1"
        }
      },
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "details": "This page is all about cars",
          "name": "n2"
        }
      }
    ]
  }
}

From the response above, you can collect the IDs of all documents that matched the query. Here they are "_id": "1" and "_id": "2".
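If you script this step, collecting the IDs is a simple walk over hits.hits. The following is a minimal Python sketch, assuming the search response has already been parsed into a dict (for example with json.loads). The function name collect_ids is hypothetical, and the sample response is a trimmed-down copy of the one above that keeps only the fields used here.

```python
import json


def collect_ids(search_response: dict) -> list:
    """Return the _id of every hit in a parsed search response (hypothetical helper)."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


# Trimmed-down copy of the search response shown above.
response = json.loads("""
{"hits": {"total": 2, "hits": [
  {"_index": "test", "_type": "_doc", "_id": "1", "_source": {"name": "n1"}},
  {"_index": "test", "_type": "_doc", "_id": "2", "_source": {"name": "n2"}}
]}}
""")

print(collect_ids(response))  # ['1', '2']
```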

Next, use the _mtermvectors API to get the frequency (count) of each term in the given field:

POST test/_doc/_mtermvectors
{
  "docs": [
    {
      "_id": "1",
      "fields": ["details"]
    },
    {
      "_id": "2",
      "fields": ["details"]
    }
  ]
}
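Because the request body contains one entry per document ID, it can be generated from the IDs collected in the previous step. The following is a minimal Python sketch. The helper name build_mtermvectors_body is hypothetical. It only builds the JSON body shown above and does not send it to the cluster.

```python
import json


def build_mtermvectors_body(doc_ids, field):
    """Build an _mtermvectors body: one entry per document, all for the same field."""
    return {"docs": [{"_id": doc_id, "fields": [field]} for doc_id in doc_ids]}


body = build_mtermvectors_body(["1", "2"], "details")
print(json.dumps(body))
# {"docs": [{"_id": "1", "fields": ["details"]}, {"_id": "2", "fields": ["details"]}]}
```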

This returns the following result:

{
  "docs": [
    {
      "_index": "test",
      "_type": "_doc",
      "_id": "1",
      "_version": 1,
      "found": true,
      "took": 8,
      "term_vectors": {
        "details": {
          "field_statistics": {
            "sum_doc_freq": 15,
            "doc_count": 2,
            "sum_ttf": 16
          },
          "terms": {
            ....,
            "cars": {
              "term_freq": 2,
              "tokens": [
                {
                  "position": 5,
                  "start_offset": 28,
                  "end_offset": 32
                },
                {
                  "position": 9,
                  "start_offset": 47,
                  "end_offset": 51
                }
              ]
            },
            ....
          }
        }
      }
    },
    {
      "_index": "test",
      "_type": "_doc",
      "_id": "2",
      "_version": 1,
      "found": true,
      "took": 2,
      "term_vectors": {
        "details": {
          "field_statistics": {
            "sum_doc_freq": 15,
            "doc_count": 2,
            "sum_ttf": 16
          },
          "terms": {
            ....,
            "cars": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 5,
                  "start_offset": 23,
                  "end_offset": 27
                }
              ]
            },
            ....
          }
        }
      }
    }
  ]
}

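Note that I used .... to stand in for the data of the other terms in the field. The term vectors API returns term details for every term in the field, so I omitted the rest. You can extract the information for the term you need from this response. Here I showed cars, and the field you are interested in is term_freq: 2 in document 1 and 1 in document 2.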
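The extraction can also be scripted. The following is a minimal Python sketch, assuming the _mtermvectors response has been parsed into a dict. The helper name term_freqs is hypothetical. The sample response below is trimmed to the keys the helper reads. Its "lots" and "page" entries stand in for the omitted "...." terms, and their counts follow from the document text. Terms must be looked up in their analyzed form. The mapping above sets no analyzer, so the default standard analyzer applies and terms are lowercased. Documents that were not found, or that lack the term, count as 0. Summing the per-document values gives the total number of occurrences across the matched documents.

```python
def term_freqs(mtv_response: dict, field: str, term: str) -> dict:
    """Map each document _id to the term_freq of `term` in `field` (0 if absent)."""
    counts = {}
    for doc in mtv_response["docs"]:
        # A document that was not found has no "term_vectors" key.
        terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
        counts[doc["_id"]] = terms.get(term, {}).get("term_freq", 0)
    return counts


# Trimmed-down copy of the _mtermvectors response shown above.
response = {
    "docs": [
        {"_id": "1", "found": True,
         "term_vectors": {"details": {"terms": {
             "cars": {"term_freq": 2}, "lots": {"term_freq": 1}}}}},
        {"_id": "2", "found": True,
         "term_vectors": {"details": {"terms": {
             "cars": {"term_freq": 1}, "page": {"term_freq": 1}}}}},
    ]
}

per_doc = term_freqs(response, "details", "cars")
print(per_doc)               # {'1': 2, '2': 1}
print(sum(per_doc.values()))  # 3
```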

