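As @VGR and @kjhughes pointed out in the comments below the question, Base64 is indeed a possible answer to my question. I now also have a further solution to my problem, one based on escaping. I have written two functions, escapeInvalidXmlCharacters(String string) and unescapeInvalidXmlCharacters(String string), which can be used in the following way.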

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    String string = "text#text##text#0;text" + '\u0000' + "text<text&text#";
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    Element element = document.createElement("element");
    element.appendChild(document.createTextNode(escapeInvalidXmlCharacters(string)));
    document.appendChild(element);
    TransformerFactory.newInstance().newTransformer().transform(new DOMSource(document), new StreamResult(new File("test.xml")));
    // creates <?xml version="1.0" encoding="UTF-8" standalone="no"?><element>text##text####text##0;text#0;text&lt;text&amp;text##</element>
    document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("test.xml"));
    System.out.println(unescapeInvalidXmlCharacters(document.getDocumentElement().getTextContent()).equals(string));
    // prints true

The functions escapeInvalidXmlCharacters(String string) and unescapeInvalidXmlCharacters(String string):

    public static String escapeInvalidXmlCharacters(String string) {
        StringBuilder stringBuilder = new StringBuilder();
        for (int i = 0, codePoint = 0; i < string.length(); i += Character.charCount(codePoint)) {
            codePoint = string.codePointAt(i);
            if (codePoint == '#') {
                // '#' is the escape character, so a literal '#' is doubled.
                stringBuilder.append("##");
            } else if (codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                    || codePoint >= 0x20 && codePoint <= 0xD7FF
                    || codePoint >= 0xE000 && codePoint <= 0xFFFD
                    || codePoint >= 0x10000 && codePoint <= 0x10FFFF) {
                // Allowed by the XML 1.0 Char production: copy unchanged.
                stringBuilder.appendCodePoint(codePoint);
            } else {
                // Not allowed in XML 1.0: write as #<decimal code point>;
                stringBuilder.append("#" + codePoint + ";");
            }
        }
        return stringBuilder.toString();
    }

    public static String unescapeInvalidXmlCharacters(String string) {
        StringBuilder stringBuilder = new StringBuilder();
        boolean escaped = false;
        for (int i = 0, codePoint = 0; i < string.length(); i += Character.charCount(codePoint)) {
            codePoint = string.codePointAt(i);
            if (escaped) {
                // The previous '#' was not a numeric escape: take this character literally.
                stringBuilder.appendCodePoint(codePoint);
                escaped = false;
            } else if (codePoint == '#') {
                // Try to read "#<digits>;" starting after the '#'.
                StringBuilder intBuilder = new StringBuilder();
                int j;
                for (j = i + 1; j < string.length(); j += Character.charCount(codePoint)) {
                    codePoint = string.codePointAt(j);
                    if (codePoint == ';') {
                        escaped = true;
                        break;
                    }
                    if (codePoint >= 48 && codePoint <= 57) { // '0'..'9'
                        intBuilder.appendCodePoint(codePoint);
                    } else {
                        break;
                    }
                }
                if (escaped) {
                    try {
                        codePoint = Integer.parseInt(intBuilder.toString());
                        stringBuilder.appendCodePoint(codePoint);
                        escaped = false;
                        // i now points at ';'. Escaped values are always BMP code points,
                        // so the loop increment then steps past it.
                        i = j;
                    } catch (IllegalArgumentException e) {
                        // Empty or out-of-range number: treat the '#' as a plain escape character.
                        codePoint = '#';
                        escaped = true;
                    }
                } else {
                    codePoint = '#';
                    escaped = true;
                }
            } else {
                stringBuilder.appendCodePoint(codePoint);
            }
        }
        return stringBuilder.toString();
    }

Please note that these functions are probably quite inefficient and could be written in a better way. Feel free to post suggestions for improving the code in the comments.
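For comparison, the Base64 route mentioned at the top could look roughly like the sketch below. This is only my illustration, not something from the comments: the class name Base64TextNodeSketch and the helpers encodeForXml and decodeFromXml are made up for this example, and I assume UTF-8 as the intermediate byte encoding. The Base64 alphabet contains only characters that are legal in XML text, so the encoded value can go straight into createTextNode.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64TextNodeSketch {
    // Hypothetical helper: turn arbitrary text into a Base64 string.
    // Assumption: UTF-8 is used as the intermediate byte encoding.
    static String encodeForXml(String string) {
        return Base64.getEncoder().encodeToString(string.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical helper: reverse of encodeForXml.
    static String decodeFromXml(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String string = "text#text" + '\u0000' + "text<text&text#";
        String encoded = encodeForXml(string);
        // The encoded form only uses A-Z, a-z, 0-9, '+', '/' and '='.
        System.out.println(encoded);
        System.out.println(decodeFromXml(encoded).equals(string)); // prints true
    }
}
```

The trade-off is that the text is no longer human-readable in the XML file and grows by about a third, whereas the escaping approach leaves valid characters untouched. Also note that String.getBytes replaces unpaired surrogates, so a string containing those would not survive this particular sketch unchanged.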



