es match

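****** This post is only a record of things I used in a project, so that I don't have to search all over again the next time I run into them. It reflects my own understanding and is for reference only!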

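Because the official documentation says little about match_phrase, you can refer to this article, which goes into more detail and includes a comparison of match and match_phrase.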

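note: match and match_phrase both analyze (tokenize) the search text before querying. However, there is one point in the article above, the part highlighted in red in its figure, that I don't quite understand. Here is an example:
It uses the edge_ngram tokenizer.
ngram splits a word into every substring within the configured lengths; for example, with min_gram 1 and max_gram 2, name becomes n, na, a, am, m, me, e. edge_ngram only produces grams anchored at the start of the word, i.e. n, na.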
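
The difference between the two can be sketched in a few lines of Python. This only illustrates the splitting rule for a single word with min_gram 1 and max_gram 2; it is not how Elasticsearch implements it, and the function names `ngrams` and `edge_ngrams` are made up for this example.

```python
def ngrams(word, min_gram=1, max_gram=2):
    """All substrings of length min_gram..max_gram, ordered by start offset."""
    return [
        word[i:i + n]
        for i in range(len(word))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(word)
    ]


def edge_ngrams(word, min_gram=1, max_gram=2):
    """Only the substrings that start at the first character of the word."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


print(ngrams("name"))       # ['n', 'na', 'a', 'am', 'm', 'me', 'e']
print(edge_ngrams("name"))  # ['n', 'na']
```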

1. Create the mapping and specify a custom edge_ngram analyzer

PUT localhost:9200/edge_ngram_custom_example
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_edge_ngram"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_edge_ngram": {
          "tokenizer": "custom_edge_ngram"
        }
      },
      "tokenizer": {
        "custom_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 2,
          "token_chars": [
            "letter",
            "punctuation",
            "symbol",
            "digit"
          ]
        }
      }
    }
  }
}

2. Add data

POST localhost:9200/edge_ngram_custom_example/_doc/2

{
    "content": "that isnot a test"
}

3. Query 1

POST localhost:9200/edge_ngram_custom_example/_search

{
    "query":{
        "match_phrase":{
            "content": {
                "query": "th is a t"
                // ,"slop": 0
            }
            
        }
    }
}

Result:
{
    "took": 18,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.7260926,
        "hits": [
            {
                "_index": "edge_ngram_custom_example",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.7260926,
                "_source": {
                    "content": "that isnot a test"
                }
            }
        ]
    }
}

4. Query 2

POST localhost:9200/edge_ngram_custom_example/_search
{
    "query":{
        "match_phrase":{
            "content": {
                "query": "th i a t"
                // ,"slop": 0
            }
            
        }
    }
}

Result:
{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

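As you can see, the only difference between query 1 and query 2 is that the query text of query 2 is missing one s. So let's look at how the document text is tokenized.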

POST localhost:9200/edge_ngram_custom_example/_analyze
{
   "analyzer": "my_edge_ngram",
   "text": "that isnot a test"
}

Result:
{
    "tokens": [
        {
            "token": "t",
            "start_offset": 0,
            "end_offset": 1,
            "type": "word",
            "position": 0
        },
        {
            "token": "th",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 1
        },
        {
            "token": "i",
            "start_offset": 5,
            "end_offset": 6,
            "type": "word",
            "position": 2
        },
        {
            "token": "is",
            "start_offset": 5,
            "end_offset": 7,
            "type": "word",
            "position": 3
        },
        {
            "token": "a",
            "start_offset": 11,
            "end_offset": 12,
            "type": "word",
            "position": 4
        },
        {
            "token": "t",
            "start_offset": 13,
            "end_offset": 14,
            "type": "word",
            "position": 5
        },
        {
            "token": "te",
            "start_offset": 13,
            "end_offset": 15,
            "type": "word",
            "position": 6
        }
    ]
}

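You can see that "is" is tokenized into "i" and "is". According to the statement above, the positions must be consecutive. With "i" and "is" sitting between "th" and "a", in theory they cannot be consecutive at all. Yet the query whose terms sit at positions 1, 3, 4, 5 finds the document, while the one at positions 1, 2, 4, 5 does not. I don't quite understand why; if you know, please leave a comment.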
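
One possible explanation, offered as a sketch rather than a confirmed answer: match_phrase analyzes the query text with the field's search analyzer, and when no search_analyzer is set that is the same my_edge_ngram analyzer used at index time. If so, the query "th is a t" is not four terms but is itself split into t, th, i, is, a, t, and it is those terms that must sit at consecutive positions. The Python below is a simplified model of that behavior, not Elasticsearch code; `edge_ngram_tokens` and `phrase_matches` are hypothetical helpers, and splitting on whitespace only is an assumption based on the token_chars setting above.

```python
def edge_ngram_tokens(text, min_gram=1, max_gram=2):
    """Simplified model of the custom_edge_ngram tokenizer.

    Assumptions: words are split on whitespace only, and every gram
    gets its own position, as the _analyze output above shows.
    """
    tokens = []
    for word in text.split():
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens  # the list index of each token is its position


def phrase_matches(doc_tokens, query_tokens):
    """Model of slop = 0: query tokens must occur at consecutive positions."""
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


doc = edge_ngram_tokens("that isnot a test")
for query in ("th is a t", "th i a t"):
    q = edge_ngram_tokens(query)
    print(query, "->", q, "match:", phrase_matches(doc, q))
```

Under this model, query 1 becomes t, th, i, is, a, t, which lines up exactly with document positions 0 to 5, while query 2 becomes t, th, i, a, t, and the document has "is" between "i" and "a", so nothing lines up. Running _analyze on the two query strings themselves would show whether this is what actually happens.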

When reposting, please credit the source: this article is reposted from www.mshxw.com
Article URL: https://www.mshxw.com/it/734808.html