Java Training Notes: An Introduction to the HanLP Word Segmentation Tool


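HanLP is a Java toolkit made up of a collection of models and algorithms, with the goal of bringing natural language processing into production use. It is more than a word segmenter: it also provides lexical analysis, syntactic parsing, semantic understanding, and related functionality. HanLP is full-featured, fast, cleanly structured, built on up-to-date corpora, and customizable.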
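HanLP is fully open source, including its dictionaries. It does not depend on any other JAR. Internally it uses a set of high-speed data structures such as the double-array trie, DAWG, and AhoCorasickDoubleArrayTrie, and these underlying components are open source as well.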
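As a rough illustration of why trie-style structures suit dictionary lookup, here is a minimal sketch of forward maximum matching over a plain HashMap-based trie. It is not HanLP's double-array trie and not HanLP's actual segmentation algorithm. The class TrieSegmenterSketch, its methods, and the three-word dictionary are all hypothetical, invented only for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a HashMap-based trie plus forward maximum matching. */
class TrieSegmenterSketch {

    /** One trie node: child links keyed by character, plus an end-of-word flag. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    /** Adds one dictionary word to the trie. */
    void addWord(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.isWord = true;
    }

    /** Returns the length of the longest dictionary word starting at offset, or 0 if none. */
    int longestMatch(String text, int offset) {
        Node node = root;
        int best = 0;
        for (int i = offset; i < text.length(); i++) {
            node = node.children.get(text.charAt(i));
            if (node == null) {
                break; // no dictionary word continues with this character
            }
            if (node.isWord) {
                best = i - offset + 1; // remember the longest match seen so far
            }
        }
        return best;
    }

    /** Forward maximum matching; characters not covered by the dictionary become single-character tokens. */
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int len = Math.max(1, longestMatch(text, pos));
            tokens.add(text.substring(pos, pos + len));
            pos += len;
        }
        return tokens;
    }

    public static void main(String[] args) {
        TrieSegmenterSketch dict = new TrieSegmenterSketch();
        for (String w : new String[]{"中国", "首都", "北京"}) {
            dict.addWord(w); // toy dictionary, assumed for illustration
        }
        System.out.println(dict.segment("中国的首都是北京"));
    }
}
```

With this toy dictionary, main prints [中国, 的, 首都, 是, 北京]. A double-array trie encodes the same character transitions in two integer arrays instead of per-node hash maps, which is what makes it compact and fast to load.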
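Through the utility class HanLP, every feature can be invoked with a single statement; the documentation is detailed and the library works out of the box. The underlying algorithms are carefully optimized: in the fastest segmentation mode, throughput can reach 20 million characters per second while needing only 120 MB of memory. On the I/O side, dictionaries load very quickly, and a fast start takes only 500 ms.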
POM file

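<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.iqilu</groupId>
    <artifactId>Segment</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>Hello</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.hankcs</groupId>
            <artifactId>hanlp</artifactId>
            <version>portable-1.3.2</version>
        </dependency>
    </dependencies>
</project>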

DemoSegment.java
package com.iqilu;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.common.Term;

import java.util.List;

public class DemoSegment {
    public static void main(String[] args) {
        // Ambiguous Chinese sentences; the trailing comments are rough English glosses.
        String[] testCase = new String[]{
                "商品和服务", // Goods and services
                "结婚的和尚未结婚的确实在干扰分词啊", // Married and not-yet-married people really do trip up word segmentation
                "买水果然后来世博园最后去世博园", // Buy fruit, then come to the Expo park, and finally go to the Expo park
                "中国的首都是北京", // China's capital is Beijing
                "欢迎新老师生前来就餐", // New and returning teachers and students are welcome to come and dine
                "工信处女干事每月经过下属科室都要亲口交代24口交换机等技术性器件的安装工作", // The female officer of the industry and information office must personally brief each subordinate department every month on installing technical devices such as 24-port switches
                "随着页游兴起到现在的页游繁盛，依赖于存档进行逻辑判断的设计减少了，但这块也不能完全忽略掉。", // From the rise of web games to their current boom, designs that rely on save files for logic checks have declined, but this area cannot be ignored entirely.
        };
        for (String sentence : testCase) {
            // Segment with the default segmenter; each Term prints as word/part-of-speech tag.
            List<Term> termList = HanLP.segment(sentence);
            System.out.println(termList);
        }
    }
}

Result

[商品/n, 和/c, 服务/vn]
[结婚/v, 的/uj, 和/c, 尚未/d, 结婚/v, 的/uj, 确实/ad, 在/p, 干扰/v, 分词/n, 啊/y]
[买/v, 水果/n, 然后/c, 来/v, 世博园/j, 最后/f, 去/v, 世博园/j]
[中国/ns, 的/uj, 首都/n, 是/v, 北京/ns]
[欢迎/v, 新/a, 老师/n, 生前/t, 来/v, 就餐/v]
[工信处/n, 女/b, 干事/n, 每月/r, 经过/p, 下属/v, 科室/n, 都要/nr, 亲口/d, 交代/v, 24/m, 口/q, 交换机/n, 等/u, 技术性/n, 器件/n, 的/uj, 安装/v, 工作/vn]
[随着/p, 页/q, 游兴/n, 起/v, 到/v, 现在/t, 的/uj, 页游/nz, 繁盛/an, ，/w, 依赖于/v, 存档/vn, 进行/v, 逻辑/n, 判断/v, 的/uj, 设计/vn, 减少/v, 了/ul, ，/w, 但/c, 这块/r, 也/d, 不能/v, 完全/ad, 忽略/v, 掉/v, 。/w]

A Java word segmentation tool is just one of many Java development tools; you will come across more related topics later on.
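Each entry in the printed result has the form word/tag, where the tag is a part-of-speech label. To make that format concrete, here is a small sketch that parses one printed line and keeps the words whose tag starts with a given prefix. The class TaggedOutputSketch and its methods are hypothetical helpers written for this article, not part of HanLP, and the sample line in main is hard-coded rather than produced by running the library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: parses the "word/tag" text printed by the segmentation demo. */
class TaggedOutputSketch {

    /** Splits one printed line such as "[a/n, b/v]" into {word, tag} pairs. */
    static List<String[]> parse(String line) {
        List<String[]> pairs = new ArrayList<>();
        String body = line.trim();
        if (body.startsWith("[") && body.endsWith("]")) {
            body = body.substring(1, body.length() - 1); // drop the surrounding brackets
        }
        for (String item : body.split(", ")) {
            int slash = item.lastIndexOf('/'); // the tag follows the last slash
            if (slash <= 0) {
                continue; // skip empty or malformed entries
            }
            pairs.add(new String[]{item.substring(0, slash), item.substring(slash + 1)});
        }
        return pairs;
    }

    /** Keeps the words whose tag starts with the given prefix, e.g. "n" for noun-like tags. */
    static List<String> wordsWithTagPrefix(List<String[]> pairs, String prefix) {
        List<String> words = new ArrayList<>();
        for (String[] pair : pairs) {
            if (pair[1].startsWith(prefix)) {
                words.add(pair[0]);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Sample line hard-coded for illustration; it is not produced by calling HanLP here.
        List<String[]> pairs = parse("[中国/ns, 的/uj, 首都/n, 是/v, 北京/ns]");
        System.out.println(wordsWithTagPrefix(pairs, "n"));
    }
}
```

For the hard-coded sample line, main prints [中国, 首都, 北京], because both n and ns begin with the prefix n. In real code, it is usually simpler to read the word and part-of-speech information directly from each Term object instead of parsing printed text.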

Please credit the source when reposting: this article is reposted from www.mshxw.com
Article URL: https://www.mshxw.com/it/351950.html