
Elasticsearch: Counting terms in a document


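What you want to achieve cannot be done in a single query. The first query filters the documents and returns the IDs of the documents whose terms need to be counted. Suppose you have the following mapping: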

{
  "test": {
    "mappings": {
      "_doc": {
        "properties": {
          "details": {
            "type": "text",
            "store": true,
            "term_vector": "with_positions_offsets_payloads"
          },
          "name": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Suppose your query returns the following two documents:

{
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "details": "There is some content about cars here. Lots of cars!",
          "name": "n1"
        }
      },
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "details": "This page is all about cars",
          "name": "n2"
        }
      }
    ]
  }
}

From the response above you can collect the IDs of all documents that matched the query. Here we have:

"_id": "1"
"_id": "2"
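
Collecting the IDs from the search response could be sketched as follows. This is a minimal illustration that works on the response as a plain Python dict; the helper name `extract_doc_ids` and the trimmed `sample_response` are assumptions made for the example, not part of the original answer, and no Elasticsearch client is involved.

```python
from typing import Any, Dict, List


def extract_doc_ids(search_response: Dict[str, Any]) -> List[str]:
    """Return the _id of every hit in a search response (hypothetical helper)."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [hit["_id"] for hit in hits]


# Trimmed copy of the sample search response, keeping only the keys used here.
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "test", "_type": "_doc", "_id": "1"},
            {"_index": "test", "_type": "_doc", "_id": "2"},
        ],
    }
}

doc_ids = extract_doc_ids(sample_response)
print(doc_ids)  # ['1', '2']
```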

Now we use the _mtermvectors API to get the frequency (count) of each term in the given field:

POST test/_doc/_mtermvectors
{
  "docs": [
    {
      "_id": "1",
      "fields": [
        "details"
      ]
    },
    {
      "_id": "2",
      "fields": [
        "details"
      ]
    }
  ]
}
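
With more than a handful of IDs, the request body is easier to generate than to write by hand. The sketch below builds the same body from a list of IDs; `build_mtermvectors_body` is a hypothetical helper name, and the code only produces the JSON payload, it does not send it.

```python
import json
from typing import Any, Dict, List


def build_mtermvectors_body(doc_ids: List[str], field: str) -> Dict[str, Any]:
    """Build an _mtermvectors request body asking for one field per document."""
    return {"docs": [{"_id": doc_id, "fields": [field]} for doc_id in doc_ids]}


body = build_mtermvectors_body(["1", "2"], "details")
print(json.dumps(body, indent=2))
```

The printed JSON can then be used as the body of the request to test/_doc/_mtermvectors.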

This returns the following result:

{
  "docs": [
    {
      "_index": "test",
      "_type": "_doc",
      "_id": "1",
      "_version": 1,
      "found": true,
      "took": 8,
      "term_vectors": {
        "details": {
          "field_statistics": {
            "sum_doc_freq": 15,
            "doc_count": 2,
            "sum_ttf": 16
          },
          "terms": {
            ....,
            "cars": {
              "term_freq": 2,
              "tokens": [
                {
                  "position": 5,
                  "start_offset": 28,
                  "end_offset": 32
                },
                {
                  "position": 9,
                  "start_offset": 47,
                  "end_offset": 51
                }
              ]
            },
            ....
          }
        }
      }
    },
    {
      "_index": "test",
      "_type": "_doc",
      "_id": "2",
      "_version": 1,
      "found": true,
      "took": 2,
      "term_vectors": {
        "details": {
          "field_statistics": {
            "sum_doc_freq": 15,
            "doc_count": 2,
            "sum_ttf": 16
          },
          "terms": {
            ....,
            "cars": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 5,
                  "start_offset": 23,
                  "end_offset": 27
                }
              ]
            },
            ....
          }
        }
      }
    }
  ]
}
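
To see where the position and offset values for cars come from, the sample texts can be tokenized locally. The sketch below is only a rough approximation that splits on word characters and lowercases; it is not the Elasticsearch analyzer, and `token_occurrences` is a hypothetical helper. For these two short sentences it yields the same positions and character offsets as the response above.

```python
import re
from typing import Dict, List


def token_occurrences(text: str, term: str) -> List[Dict[str, int]]:
    """List position and character offsets of each occurrence of `term`.

    Assumption: tokens are runs of word characters, compared case-insensitively.
    """
    occurrences = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        if match.group().lower() == term:
            occurrences.append(
                {
                    "position": position,
                    "start_offset": match.start(),
                    "end_offset": match.end(),
                }
            )
    return occurrences


doc1 = "There is some content about cars here. Lots of cars!"
doc2 = "This page is all about cars"

print(token_occurrences(doc1, "cars"))
# [{'position': 5, 'start_offset': 28, 'end_offset': 32},
#  {'position': 9, 'start_offset': 47, 'end_offset': 51}]
print(token_occurrences(doc2, "cars"))
# [{'position': 5, 'start_offset': 23, 'end_offset': 27}]
```

The number of occurrences found for each document corresponds to that document's term_freq for the term.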

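Note that I used .... to stand for the data of the other terms, because the term vectors API returns term-related details for every term in the field. You can extract the information for whichever term you need from the response above, as shown here for cars; the field you are interested in is term_freq.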
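
Pulling term_freq out of the response for a single term could look like the sketch below. It operates on the response as a plain Python dict; `term_frequencies` is a hypothetical helper, and `sample` is a trimmed copy of the response above that keeps only the keys the function reads.

```python
from typing import Any, Dict


def term_frequencies(response: Dict[str, Any], field: str, term: str) -> Dict[str, int]:
    """Map each document _id to the term_freq of `term` in `field` (0 if absent)."""
    counts: Dict[str, int] = {}
    for doc in response.get("docs", []):
        terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
        counts[doc["_id"]] = terms.get(term, {}).get("term_freq", 0)
    return counts


# Trimmed copy of the _mtermvectors response shown above.
sample = {
    "docs": [
        {"_id": "1", "term_vectors": {"details": {"terms": {"cars": {"term_freq": 2}}}}},
        {"_id": "2", "term_vectors": {"details": {"terms": {"cars": {"term_freq": 1}}}}},
    ]
}

counts = term_frequencies(sample, "details", "cars")
print(counts)                # {'1': 2, '2': 1}
print(sum(counts.values()))  # 3
```

Summing the per-document values gives the total number of occurrences across the matched documents.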


