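This article shows how to convert Chinese characters to pinyin on Android with recognition of polyphonic characters (characters that have more than one reading). It is shared here for reference; the details follow.
Where the problem came from
While sorting place names by their first letter, I ran into a bug: 长沙 (Changsha) was converted to the pinyin zhangsha, and 重庆 (Chongqing) was converted to zhong qing, so the sort order came out wrong.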
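The pinyin conversion table and the polyphone recognition dictionary
1. A dictionary of words for each polyphonic character
2. A pinyin table keyed by the numeric value of each character's binary encoding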
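Key code
1. First, the text to convert is encoded as "gb2312". In this encoding a Chinese character generally occupies two bytes. If the result is a single byte, the character code is returned directly; if it is two bytes, an offset is computed from them. The code is as follows: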
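private int getChsAscii(String chs) {
    int asc = 0;
    try {
        // Encode as GB2312: ASCII takes one byte, a Chinese character takes two.
        byte[] bytes = chs.getBytes("gb2312");
        if (bytes == null || bytes.length > 2 || bytes.length <= 0) {
            throw new RuntimeException("illegal resource string");
        }
        if (bytes.length == 1) {
            // Single byte: use the character code directly.
            asc = bytes[0];
        }
        if (bytes.length == 2) {
            // Both bytes are negative as signed Java bytes; adding 256 gives the unsigned value.
            int highByte = 256 + bytes[0];
            int lowByte = 256 + bytes[1];
            // Combine into a 16-bit value, then subtract 65536 to get the (negative) table offset.
            asc = (256 * highByte + lowByte) - 256 * 256;
        }
    } catch (Exception e) {
        System.out.println("ERROR:ChineseSpelling.class-getChsAscii(String chs)" + e);
    }
    return asc;
}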
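As a worked example of the offset arithmetic above, here is a minimal sketch. It takes the GB2312 bytes directly so that it does not depend on which charsets the runtime ships. The class name `Gb2312CodeSketch` and the method `gb2312Code` are illustrative names, not part of the original project. The sample bytes 0xB0 0xA1 are the GB2312 encoding of 啊.

```java
final class Gb2312CodeSketch {

    /** Maps a 1- or 2-byte GB2312 encoding to the signed code used for table lookup. */
    static int gb2312Code(byte[] bytes) {
        if (bytes == null || bytes.length == 0 || bytes.length > 2) {
            throw new IllegalArgumentException("expected 1 or 2 bytes");
        }
        if (bytes.length == 1) {
            // Single-byte (ASCII) character: the code is the byte itself.
            return bytes[0];
        }
        // Treat both bytes as unsigned, combine them, then subtract 65536.
        int high = bytes[0] & 0xFF;
        int low = bytes[1] & 0xFF;
        return ((high << 8) | low) - 0x10000;
    }

    public static void main(String[] args) {
        byte[] a = {(byte) 0xB0, (byte) 0xA1}; // GB2312 bytes of U+554A
        // 0xB0A1 = 45217, and 45217 - 65536 = -20319
        System.out.println(gb2312Code(a)); // prints -20319
    }
}
```

In the article's method the bytes come from `chs.getBytes("gb2312")`; masking with `0xFF` here gives the same result as adding 256 to a negative byte.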
2. The pinyin obtained for each single character is then checked against the polyphone dictionary HashMap. The code is as follows:
public String getSellingWithPolyphone(String chs) {
    // Load the dictionary on first use (also covers the case where the map is still null).
    if (polyphoneMap == null || polyphoneMap.isEmpty()) {
        polyphoneMap = initDictionary();
    }
    String key, value, resultPy = null;
    buffer = new StringBuilder();
    for (int i = 0; i < chs.length(); i++) {
        key = chs.substring(i, i + 1);
        if (key.getBytes().length >= 2) {
            // Multi-byte character: look up its default pinyin.
            value = (String) convert(key);
            if (value == null) {
                value = "unknown";
            }
        } else {
            // Single-byte character: keep it unchanged.
            value = key;
        }
        resultPy = value;
        String left = null; // Take one extra character to the left
        if (i >= 1 && i + 1 <= chs.length()) {
            left = chs.substring(i - 1, i + 1);
            if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(left)) {
                resultPy = value;
            }
        }
        String right = null; // Take one extra character to the right, e.g. [长]沙
        if (i <= chs.length() - 2) {
            right = chs.substring(i, i + 2);
            if (polyphoneMap.containsKey(right)) {
                resultPy = polyphoneMap.get(right);
            }
        }
        String middle = null; // Take one extra character on each side, e.g. 龙[爪]槐
        if (i >= 1 && i + 2 <= chs.length()) {
            middle = chs.substring(i - 1, i + 2);
            if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(middle)) {
                resultPy = value;
            }
        }
        String left3 = null; // Take two extra characters to the left, e.g. 芈月[传], 列车长
        if (i >= 2 && i + 1 <= chs.length()) {
            left3 = chs.substring(i - 2, i + 1);
            if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(left3)) {
                resultPy = value;
            }
        }
        String right3 = null; // Take two extra characters to the right, e.g. [长]孙无忌
        if (i <= chs.length() - 3) {
            right3 = chs.substring(i, i + 3);
            if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(right3)) {
                resultPy = value;
            }
        }
        buffer.append(resultPy);
    }
    return buffer.toString();
}
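To make the context-window idea easier to follow, here is a simplified sketch that is not the article's implementation. It covers only the "look to the right" case, as in [长]沙: for each character it tries the 3-character and then the 2-character window starting at that position, and falls back to a default reading when no word matches. The class name, both tables, and their contents are hypothetical and stand in for the real pinyin table and dictionary. The strings are written as Unicode escapes: U+957F U+6C99 is 长沙 and U+91CD U+5E86 is 重庆.

```java
import java.util.HashMap;
import java.util.Map;

final class PolyphoneLookupSketch {
    // Hypothetical default reading per character (what a plain table lookup would return).
    static final Map<String, String> DEFAULT = new HashMap<>();
    // Hypothetical word dictionary: word -> reading of the word's FIRST character.
    static final Map<String, String> WORDS = new HashMap<>();

    static {
        DEFAULT.put("\u957F", "zhang"); // default reading that causes the bug
        DEFAULT.put("\u6C99", "sha");
        DEFAULT.put("\u91CD", "zhong"); // default reading that causes the bug
        DEFAULT.put("\u5E86", "qing");
        WORDS.put("\u957F\u6C99", "chang"); // Changsha
        WORDS.put("\u91CD\u5E86", "chong"); // Chongqing
    }

    static String toPinyin(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            String ch = text.substring(i, i + 1);
            // Unknown characters (e.g. ASCII) pass through unchanged.
            String reading = DEFAULT.getOrDefault(ch, ch);
            // Prefer the longer window, then the shorter one.
            for (int len = 3; len >= 2; len--) {
                if (i + len <= text.length()) {
                    String hit = WORDS.get(text.substring(i, i + len));
                    if (hit != null) {
                        reading = hit;
                        break;
                    }
                }
            }
            out.append(reading);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPinyin("\u957F\u6C99")); // prints changsha
        System.out.println(toPinyin("\u91CD\u5E86")); // prints chongqing
    }
}
```

Keying the dictionary by whole word keeps each lookup to a single hash probe per window. Handling a polyphonic character in the middle or at the end of a word would need extra windows, as the article's method attempts.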
3. Parse the contents of the file in assets into a HashMap.
public HashMap<String, String> initDictionary() {
    String fileName = "py4j.dic";
    InputStreamReader inputReader = null;
    BufferedReader bufferedReader = null;
    HashMap<String, String> polyphoneMap = new HashMap<String, String>();
    try {
        inputReader = new InputStreamReader(MyApplication.mContext.getResources().getAssets().open(fileName), "UTF-8");
        bufferedReader = new BufferedReader(inputReader);
        String line = null;
        while ((line = bufferedReader.readLine()) != null) {
            // Each line holds a pinyin, PINYIN_SEPARATOR, then words joined by WORD_SEPARATOR.
            String[] arr = line.split(PINYIN_SEPARATOR);
            // Skip lines without a word list instead of failing on arr[1].
            if (arr.length > 1 && isNotEmpty(arr[1])) {
                String[] dyzs = arr[1].split(WORD_SEPARATOR);
                for (String dyz : dyzs) {
                    if (isNotEmpty(dyz)) {
                        // Map each word to the pinyin it is listed under.
                        polyphoneMap.put(dyz.trim(), arr[0]);
                    }
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (inputReader != null) {
            try {
                inputReader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (bufferedReader != null) {
            try {
                bufferedReader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return polyphoneMap;
}
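The article does not show the dictionary file format or the separator values. The sketch below assumes one entry per line in the form `pinyin#word1/word2`, so `"#"` and `"/"` are assumed values for `PINYIN_SEPARATOR` and `WORD_SEPARATOR`. It reads from any `Reader` so the parsing can be tried without an Android context, and the class name `DictionaryParseSketch` is illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

final class DictionaryParseSketch {
    static final String PINYIN_SEPARATOR = "#"; // assumed value
    static final String WORD_SEPARATOR = "/";   // assumed value

    /** Parses "pinyin#word1/word2" lines into a word -> pinyin map. */
    static Map<String, String> parse(Reader source) {
        Map<String, String> wordToPinyin = new HashMap<>();
        // try-with-resources closes the reader even if parsing fails.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(PINYIN_SEPARATOR);
                if (parts.length < 2) {
                    continue; // no word list on this line
                }
                for (String word : parts[1].split(WORD_SEPARATOR)) {
                    String trimmed = word.trim();
                    if (!trimmed.isEmpty()) {
                        wordToPinyin.put(trimmed, parts[0].trim());
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return wordToPinyin;
    }

    public static void main(String[] args) {
        // Two sample lines: chang -> U+957F U+6C99, U+957F U+5EA6; chong -> U+91CD U+5E86
        String sample = "chang#\u957F\u6C99/\u957F\u5EA6\nchong#\u91CD\u5E86\n";
        Map<String, String> map = parse(new StringReader(sample));
        System.out.println(map.get("\u957F\u6C99")); // prints chang
    }
}
```

On Android the `Reader` would wrap the asset stream, for example an `InputStreamReader` over `getAssets().open("py4j.dic")` with UTF-8, as in the article's method.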
Source code on GitHub: https://github.com/loveburce/ChinesePolyphone.git