Elasticsearch IK Analyzer

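1. Comparison

Using the built-in analyzer

The built-in analyzer splits a Chinese sentence into single characters, which is of little use for search.

Using the IK analyzer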
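
To reproduce the comparison, the same text can be sent to the `_analyze` API once with the built-in `standard` analyzer and once with an IK analyzer. The sketch below only builds the two requests. The node address `http://localhost:9200` and the sample text are assumptions, not values from the original setup.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local Elasticsearch node


def analyze_request(analyzer: str, text: str) -> urllib.request.Request:
    """Build a POST /_analyze request for the given analyzer and text."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_texts(response_body: bytes) -> list:
    """Extract the token strings from an _analyze response body."""
    return [t["token"] for t in json.loads(response_body)["tokens"]]


# The sample text is an assumption; any Chinese sentence works.
for name in ("standard", "ik_max_word"):
    req = analyze_request(name, "中华人民共和国国歌")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # To run against a live node:
    # with urllib.request.urlopen(req) as resp:
    #     print(name, token_texts(resp.read()))
```

The request body is printed with JSON `\u` escapes; Elasticsearch decodes them back to the original characters.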

2. Installing the IK analyzer

Resources
https://github.com/medcl/elasticsearch-analysis-ik/releases
Link: https://pan.baidu.com/s/1dTzBN6fr1ieks25qDqA26A
Extraction code: 0cc3

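Create an ik directory under plugins in the Elasticsearch installation directory
mkdir /usr/local/es/elasticsearch-7.2.0/plugins/ik
Install the unzip command
yum -y install unzip
Extract the archive into the ik directory (the plugin version must match the Elasticsearch version, 7.2.0 here)
unzip elasticsearch-analysis-ik-7.2.0.zip -d /usr/local/es/elasticsearch-7.2.0/plugins/ik

Restart Elasticsearch to load the plugin.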
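
After the restart, one way to confirm that the plugin was loaded is to list the installed plugins with `GET /_cat/plugins?format=json`. A minimal sketch, assuming the node listens on `http://localhost:9200` and that IK is reported under the component name `analysis-ik` (check this against your own output):

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed node address
IK_COMPONENT = "analysis-ik"      # assumed plugin name as reported by _cat/plugins


def ik_installed(plugins: list) -> bool:
    """Return True if any entry of a _cat/plugins listing is the IK plugin."""
    return any(p.get("component") == IK_COMPONENT for p in plugins)


def fetch_plugins() -> list:
    """Fetch the plugin listing from a live node as parsed JSON."""
    with urllib.request.urlopen(f"{ES_URL}/_cat/plugins?format=json") as resp:
        return json.loads(resp.read())


# Hand-written sample in the shape _cat/plugins returns; not captured output.
sample = [{"name": "node-1", "component": "analysis-ik", "version": "7.2.0"}]
print(ik_installed(sample))
```

Against a running node, `ik_installed(fetch_plugins())` performs the same check on the real listing.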

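3. Usage

IK analyzers
ik_max_word: splits the text at the finest granularity, emitting as many words as possible, including overlapping ones.
ik_smart: splits the text at the coarsest granularity; characters already assigned to one word are not reused by another word.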

{
    "tokens": [
        {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0},
        {"token": "中华人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1},
        {"token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2},
        {"token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3},
        {"token": "人民共和国", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4},
        {"token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5},
        {"token": "共和国", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6},
        {"token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7},
        {"token": "国", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8},
        {"token": "国歌", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9}
    ]
}
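
The difference between the two modes shows up in the offsets: `ik_max_word` emits tokens whose character ranges overlap, while `ik_smart` assigns each character to at most one token. The sketch below checks a token list for overlaps. The first list is a subset of the response above; the second is a hand-written non-overlapping segmentation used only for illustration, not captured `ik_smart` output.

```python
def overlapping_pairs(tokens):
    """Return pairs of token strings whose [start_offset, end_offset) ranges intersect."""
    pairs = []
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:]:
            if a["start_offset"] < b["end_offset"] and b["start_offset"] < a["end_offset"]:
                pairs.append((a["token"], b["token"]))
    return pairs


# Subset of the ik_max_word response shown above.
max_word = [
    {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7},
    {"token": "中华人民", "start_offset": 0, "end_offset": 4},
    {"token": "共和国", "start_offset": 4, "end_offset": 7},
    {"token": "国歌", "start_offset": 7, "end_offset": 9},
]

# Hand-written coarse segmentation (illustrative, not real ik_smart output).
coarse = [
    {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7},
    {"token": "国歌", "start_offset": 7, "end_offset": 9},
]

print(len(overlapping_pairs(max_word)))  # count of overlapping pairs in the fine-grained list
print(len(overlapping_pairs(coarse)))    # count of overlapping pairs in the coarse list
```

Overlapping tokens increase recall at query time because a search for a shorter word such as 共和国 can still hit a document that was indexed from the longer phrase.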

4. Creating an index that uses the IK analyzer

Create the index

{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "ik" : {
                    "tokenizer" : "ik_max_word"
                }
            }
        }
    },
    "mappings" : {
        "properties" : {
            "username" : {"type" : "text", "analyzer" : "ik_max_word"}
        }
    }
}
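
The body above is sent with a `PUT` to the index name. A sketch of that request, assuming a node at `http://localhost:9200` and a hypothetical index name `users` (the original does not name the index):

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed node address
INDEX = "users"                   # hypothetical index name

# Settings and mapping copied from the request body above.
INDEX_BODY = {
    "settings": {"analysis": {"analyzer": {"ik": {"tokenizer": "ik_max_word"}}}},
    "mappings": {
        "properties": {"username": {"type": "text", "analyzer": "ik_max_word"}}
    },
}


def create_index_request(index: str, body: dict) -> urllib.request.Request:
    """Build the PUT /<index> request that creates the index."""
    return urllib.request.Request(
        f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request(INDEX, INDEX_BODY)
print(req.get_method(), req.full_url)
# To execute against a live node: urllib.request.urlopen(req)
```

Note that the mapping refers to the plugin's `ik_max_word` analyzer directly, so the custom analyzer named `ik` in `settings` is defined but not used by the `username` field.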

Add data

Query
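
A sketch of what these two steps might look like: a document is indexed with `POST /<index>/_doc` and then searched with a `match` query on `username`. The index name `users` and the sample values are assumptions chosen for illustration.

```python
import json

INDEX = "users"  # hypothetical index name


def index_doc_call(username: str):
    """Return (method, path, body) for indexing one document."""
    return "POST", f"/{INDEX}/_doc", {"username": username}


def match_query_call(text: str):
    """Return (method, path, body) for a match query on the username field."""
    return "POST", f"/{INDEX}/_search", {"query": {"match": {"username": text}}}


# Sample values are illustrative only.
for method, path, body in (index_doc_call("中华人民共和国国歌"), match_query_call("国歌")):
    print(method, path, json.dumps(body))
```

Because `username` is analyzed with `ik_max_word`, the query text is tokenized the same way as the stored value, so a search for 国歌 should match a document whose indexed tokens include 国歌.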


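5. Custom dictionaries

Without a custom dictionary, IK does not recognize a name such as 朴国昌 and splits it into single characters: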

{
    "tokens": [
        {"token": "你好", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0},
        {"token": "我", "start_offset": 3, "end_offset": 4, "type": "CN_CHAR", "position": 1},
        {"token": "朴", "start_offset": 4, "end_offset": 5, "type": "CN_CHAR", "position": 2},
        {"token": "国", "start_offset": 5, "end_offset": 6, "type": "CN_CHAR", "position": 3},
        {"token": "昌", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 4}
    ]
}

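Create a custom directory for the dictionary files
mkdir custom

Extension dictionary: custom/myext.dic

Stopword dictionary: custom/myext_stopword.dic


Register both files in IKAnalyzer.cfg.xml
vim IKAnalyzer.cfg.xml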
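
A sketch of what the three files might contain, written as a small script so the layout is explicit. The dictionary files hold one word per line in UTF-8, and `IKAnalyzer.cfg.xml` references them through the `ext_dict` and `ext_stopwords` entries, with paths relative to the directory that contains the XML file. The words 史珍香 and 朴国昌 are taken from the output further down; the stopword entry 的 and the temporary output directory are hypothetical.

```python
import tempfile
from pathlib import Path

CFG_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <entry key="ext_dict">{ext_dict}</entry>
    <entry key="ext_stopwords">{ext_stopwords}</entry>
</properties>
"""


def write_dictionary(path: Path, words) -> None:
    """Write one word per line, UTF-8 encoded, creating parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(words) + "\n", encoding="utf-8")


def write_ik_config(config_dir: Path, ext_words, stop_words) -> str:
    """Create custom/*.dic under config_dir and return the matching cfg XML."""
    write_dictionary(config_dir / "custom" / "myext.dic", ext_words)
    write_dictionary(config_dir / "custom" / "myext_stopword.dic", stop_words)
    xml = CFG_TEMPLATE.format(
        ext_dict="custom/myext.dic", ext_stopwords="custom/myext_stopword.dic"
    )
    (config_dir / "IKAnalyzer.cfg.xml").write_text(xml, encoding="utf-8")
    return xml


# A temporary directory stands in for the real IK config directory.
demo_dir = Path(tempfile.mkdtemp())
xml = write_ik_config(demo_dir, ["史珍香", "朴国昌"], ["的"])
print(sorted(p.name for p in demo_dir.rglob("*") if p.is_file()))
```

On a real node, back up the existing `IKAnalyzer.cfg.xml` before overwriting it, since the shipped file may carry other entries.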

Restart Elasticsearch; the custom words are then returned as whole tokens:

{
    "tokens": [
        {"token": "你好", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0},
        {"token": "史珍香", "start_offset": 2, "end_offset": 5, "type": "CN_WORD", "position": 1},
        {"token": "我", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 2},
        {"token": "朴国昌", "start_offset": 7, "end_offset": 10, "type": "CN_WORD", "position": 3}
    ]
}

When reposting, please credit the source: www.mshxw.com
Article URL: https://www.mshxw.com/it/710243.html