Detecting a file's encoding in Java with the third-party library cpdetector

Background

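I am building a music player in Java. One of its features reads local .lrc files as lyrics, but those files are saved in different encodings, so some of them come out garbled when read. I therefore needed a way to determine a file's encoding at read time.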
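To make the problem concrete, here is a minimal sketch of how the garbling arises: the same bytes are decoded once with the charset they were written in and once with a different one. The class name `MojibakeDemo`, the helper `decode` and the sample text are illustrative assumptions, not part of the player, and the example assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;

class MojibakeDemo {
    // Decodes raw bytes using the named charset (hypothetical helper for this demo).
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "\u4f60\u597d" is a two-character Chinese greeting, written with escapes
        // so the source file's own encoding does not matter.
        String lyric = "\u4f60\u597d";
        byte[] gbkBytes = lyric.getBytes(Charset.forName("GBK"));

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(decode(gbkBytes, "GBK").equals(lyric));
        // Decoding the same bytes as UTF-8 does not restore the text.
        System.out.println(decode(gbkBytes, "UTF-8").equals(lyric));
    }
}
```

The bytes on disk are identical in both cases; only the charset assumed by the reader differs, which is why the encoding has to be known before the file is decoded.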

How I solved it

A popular library for this is cpdetector. I first searched for it in the Maven remote repository and added the dependency.

However, when I re-imported the project, Maven reported that the package could not be found.

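Looking for the cause: the small print at the bottom of the repository page appears to say that the jar has to be downloaded from the project's official site before it can be used.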

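From the official site, https://sourceforge.net/projects/cpdetector/files/, I downloaded an archive and copied the jars I needed (cpdetector_1.0.10.jar, chardet-1.0.jar and antlr-2.7.4.jar) into the project's libs folder.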

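I then added these three jars to Maven by hand as system-scoped dependencies and refreshed the project, after which the library was usable.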

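<dependency>
    <groupId>cpdetector</groupId>
    <artifactId>cpdetector</artifactId>
    <version>1.0.10</version>
    <scope>system</scope>
    <systemPath>${basedir}/libs/cpdetector_1.0.10.jar</systemPath>
</dependency>

<dependency>
    <groupId>chardet</groupId>
    <artifactId>chardet</artifactId>
    <version>1.0</version>
    <scope>system</scope>
    <systemPath>${basedir}/libs/chardet-1.0.jar</systemPath>
</dependency>

<dependency>
    <groupId>antlr</groupId>
    <artifactId>antlr</artifactId>
    <version>2.7.4</version>
    <scope>system</scope>
    <systemPath>${basedir}/libs/antlr-2.7.4.jar</systemPath>
</dependency>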

A usage example follows; the logic is simple.

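import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;

import info.monitorenter.cpdetector.io.ASCIIDetector;
import info.monitorenter.cpdetector.io.CodepageDetectorProxy;
import info.monitorenter.cpdetector.io.JChardetFacade;
import info.monitorenter.cpdetector.io.ParsingDetector;
import info.monitorenter.cpdetector.io.UnicodeDetector;

public static String getCharsetName(File file) throws IOException {
    // Default to UTF-8 in case no charset is detected
    String charsetName = "UTF-8";
    // Obtain the CodepageDetectorProxy instance
    CodepageDetectorProxy detector = CodepageDetectorProxy.getInstance();
    // Register the detectors; these rely on the other two jars added above (antlr and chardet)
    detector.add(new ParsingDetector(false));
    detector.add(JChardetFacade.getInstance());
    detector.add(ASCIIDetector.getInstance());
    detector.add(UnicodeDetector.getInstance());
    // Ask the registered detectors to detect the file's charset
    Charset charset = detector.detectCodepage(file.toURI().toURL());
    if (charset != null) charsetName = charset.name();
    return charsetName;
}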
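Once a charset name is available, the lyrics file can be read with it. The following is a minimal sketch using only the standard library; the class `LyricReader` and its methods `resolveCharset` and `readLines` are hypothetical names introduced for illustration, and falling back to UTF-8 for a name the JVM does not recognise is an assumption of this sketch rather than something cpdetector prescribes.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

class LyricReader {
    // Resolves a charset name, falling back to UTF-8 when the name is
    // null, malformed, or not supported by this JVM.
    public static Charset resolveCharset(String charsetName) {
        try {
            return Charset.forName(charsetName);
        } catch (IllegalArgumentException e) {
            return StandardCharsets.UTF_8;
        }
    }

    // Reads every line of the file, decoding it with the given charset name.
    public static List<String> readLines(File file, String charsetName) throws IOException {
        return Files.readAllLines(file.toPath(), resolveCharset(charsetName));
    }

    public static void main(String[] args) throws IOException {
        // Write a small GBK-encoded sample file so the demo is self-contained.
        File lrc = File.createTempFile("demo", ".lrc");
        lrc.deleteOnExit();
        Files.write(lrc.toPath(), "[00:01.00]\u4f60\u597d".getBytes(Charset.forName("GBK")));

        // In the player, the second argument would come from getCharsetName(lrc).
        for (String line : readLines(lrc, "GBK")) {
            System.out.println(line);
        }
    }
}
```

Note that Files.readAllLines throws an exception if the bytes are not valid in the chosen charset, so a wrong guess surfaces as an error instead of silently garbled lyrics.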

The detection results are quite accurate, noticeably more so than some of the code snippets found online.
