
Java DOM: transforming and parsing arbitrary strings with invalid XML characters?


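As @VGR and @kjhughes pointed out in the comments below the question, Base64 is indeed a possible answer to my question. I now also have a further solution that is based on escaping. I have written two functions, escapeInvalidXmlCharacters(String string) and unescapeInvalidXmlCharacters(String string), which can be used as follows.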

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    // The statements below belong inside a method that declares "throws Exception".
    String string = "text#text##text#0;text" + '\u0000' + "text<text&text#";
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    Element element = document.createElement("element");
    element.appendChild(document.createTextNode(escapeInvalidXmlCharacters(string)));
    document.appendChild(element);
    TransformerFactory.newInstance().newTransformer().transform(new DOMSource(document), new StreamResult(new File("test.xml")));
    // creates <?xml version="1.0" encoding="UTF-8" standalone="no"?><element>text##text####text##0;text#0;text&lt;text&amp;text##</element>
    document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("test.xml"));
    System.out.println(unescapeInvalidXmlCharacters(document.getDocumentElement().getTextContent()).equals(string));
    // prints true

escapeInvalidXmlCharacters(String string) and unescapeInvalidXmlCharacters(String string):

    public static String escapeInvalidXmlCharacters(String string) {
        StringBuilder stringBuilder = new StringBuilder();
        for (int i = 0, codePoint = 0; i < string.length(); i += Character.charCount(codePoint)) {
            codePoint = string.codePointAt(i);
            if (codePoint == '#') {
                // '#' is the escape character, so a literal '#' is doubled.
                stringBuilder.append("##");
            } else if (codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                    || codePoint >= 0x20 && codePoint <= 0xD7FF
                    || codePoint >= 0xE000 && codePoint <= 0xFFFD
                    || codePoint >= 0x10000 && codePoint <= 0x10FFFF) {
                // Code points that XML 1.0 allows are copied unchanged.
                stringBuilder.appendCodePoint(codePoint);
            } else {
                // Anything else is written as #<decimal code point>;
                stringBuilder.append("#" + codePoint + ";");
            }
        }
        return stringBuilder.toString();
    }

    public static String unescapeInvalidXmlCharacters(String string) {
        StringBuilder stringBuilder = new StringBuilder();
        boolean escaped = false;
        for (int i = 0, codePoint = 0; i < string.length(); i += Character.charCount(codePoint)) {
            codePoint = string.codePointAt(i);
            if (escaped) {
                // The previous '#' was an escape character: take this code point literally.
                stringBuilder.appendCodePoint(codePoint);
                escaped = false;
            } else if (codePoint == '#') {
                // Look ahead for a run of decimal digits terminated by ';'.
                StringBuilder intBuilder = new StringBuilder();
                int j;
                for (j = i + 1; j < string.length(); j += Character.charCount(codePoint)) {
                    codePoint = string.codePointAt(j);
                    if (codePoint == ';') {
                        escaped = true;
                        break;
                    }
                    if (codePoint >= '0' && codePoint <= '9') {
                        intBuilder.appendCodePoint(codePoint);
                    } else {
                        break;
                    }
                }
                if (escaped) {
                    try {
                        // Decode #<digits>; and continue after the ';'.
                        codePoint = Integer.parseInt(intBuilder.toString());
                        stringBuilder.appendCodePoint(codePoint);
                        escaped = false;
                        i = j;
                    } catch (IllegalArgumentException e) {
                        // No digits or not a valid code point: treat '#' as a plain escape.
                        codePoint = '#';
                        escaped = true;
                    }
                } else {
                    // Not a numeric escape (for example "##"): the next code point is literal.
                    codePoint = '#';
                    escaped = true;
                }
            } else {
                stringBuilder.appendCodePoint(codePoint);
            }
        }
        return stringBuilder.toString();
    }
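The long range test inside escapeInvalidXmlCharacters corresponds to the Char production of XML 1.0 (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). As a small sketch, that test could be pulled out into a named predicate. The class XmlChars and the methods isValidXmlChar and containsInvalidXmlChar are hypothetical names chosen for this illustration and are not part of the original answer; String.codePoints() assumes Java 8 or later.

```java
final class XmlChars {

    // True if codePoint matches the XML 1.0 Char production.
    static boolean isValidXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Hypothetical helper: reports whether the string holds any code point
    // that XML 1.0 does not allow (unpaired surrogates count as invalid).
    static boolean containsInvalidXmlChar(String s) {
        return s.codePoints().anyMatch(cp -> !isValidXmlChar(cp));
    }

    public static void main(String[] args) {
        System.out.println(containsInvalidXmlChar("text<text&text#"));           // prints false
        System.out.println(containsInvalidXmlChar("text" + '\u0000' + "text")); // prints true
    }
}
```

A false result from containsInvalidXmlChar does not mean escaping can be skipped: a literal '#' still has to be doubled, otherwise unescaping would misread it.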

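Note that the escaping functions above may be quite inefficient and could be written in a better way. Feel free to post suggestions for improving the code in the comments.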
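For comparison, the Base64 route credited to the commenters at the top is not spelled out in this answer. A minimal sketch of how it might look, assuming Java 8 or later for java.util.Base64 and UTF-8 as the byte encoding; the class Base64TextNode and its encode and decode helpers are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64TextNode {

    // Turns an arbitrary string into Base64 text, which uses only
    // A-Z, a-z, 0-9, '+', '/' and '=' and is therefore legal XML content.
    static String encode(String raw) {
        return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses encode(): Base64 text back to the original string.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "text#text" + '\u0000' + "text<text&text#";
        String encoded = encode(original); // could be passed to document.createTextNode(...)
        System.out.println(decode(encoded).equals(original)); // prints true
    }
}
```

The trade-off is that the stored text is no longer human-readable. Unpaired surrogates would also be replaced during the UTF-8 conversion, whereas the escaping functions write them out as numeric escapes.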


