OpenXML tag search

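The problem with trying to find the tags is that words do not always appear in the underlying XML in the form they have in Word. For example, in your sample XML the <!TAG1!> tag is split across multiple runs, as shown below: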

    <w:r>
        <w:rPr>
            <w:lang w:val="en-GB"/>
        </w:rPr>
        <w:t>&lt;!TAG1</w:t>
    </w:r>
    <w:proofErr w:type="gramEnd"/>
    <w:r>
        <w:rPr>
            <w:lang w:val="en-GB"/>
        </w:rPr>
        <w:t>!&gt;</w:t>
    </w:r>

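As pointed out in the comments, this is sometimes caused by the spelling and grammar checker, but that is not the only possible cause. For example, applying different styles to different parts of the tag can also cause it.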

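One way to handle this is to take the InnerText of each Paragraph and compare it against your Regex. The InnerText property returns the plain text of the paragraph, without any formatting or other XML from the underlying document getting in the way.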
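To illustrate why matching on the paragraph's plain text works, here is a minimal sketch that uses only System.Xml.Linq rather than the Open XML SDK. The SplitTagDemo class and its PlainText helper are hypothetical names introduced for this illustration; PlainText simply concatenates the w:t elements, which only approximates what InnerText returns.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class SplitTagDemo
{
    // WordprocessingML main namespace.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: joins the text of every w:t element in the paragraph.
    // This is an approximation of Paragraph.InnerText, not the SDK implementation.
    public static string PlainText(XElement paragraph) =>
        string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value));

    public static void Main()
    {
        // The tag is split across two runs, as in the XML from the question.
        string xml =
            "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
            "<w:r><w:t>&lt;!TAG1</w:t></w:r>" +
            "<w:proofErr w:type=\"gramEnd\"/>" +
            "<w:r><w:t>!&gt;</w:t></w:r>" +
            "</w:p>";

        XElement paragraph = XElement.Parse(xml);
        Regex regex = new Regex("<!(.)*?!>");

        // No single run contains the whole tag...
        foreach (XElement t in paragraph.Descendants(W + "t"))
        {
            Console.WriteLine($"run text '{t.Value}' matches: {regex.IsMatch(t.Value)}");
        }

        // ...but the concatenated paragraph text does.
        string text = PlainText(paragraph);
        Console.WriteLine($"paragraph text '{text}' matches: {regex.IsMatch(text)}");
    }
}
```

Neither run matches the pattern on its own, while the joined text does, which is why the search is done per paragraph rather than per run.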

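Once you have found the tags, replacing the text is the next problem. For the reasons above, you cannot simply replace the InnerText with some new text, because it would not be clear which parts of the text belong to which Run. The simplest way around this is to remove all the existing Runs and add a new Run with a Text property containing the new text.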

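The following code finds the tags and replaces them immediately, rather than using two passes as you suggested in the question. That is only to keep the example simple; it should show everything you need.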

    using System.Text.RegularExpressions;
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    private static void ReplaceTags(string filename)
    {
        Regex regex = new Regex("<!(.)*?!>", RegexOptions.Compiled);

        // open the document for editing (second argument: isEditable = true)
        using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(filename, true))
        {
            // grab the header parts and replace tags there
            foreach (HeaderPart headerPart in wordDocument.MainDocumentPart.HeaderParts)
            {
                ReplaceParagraphParts(headerPart.Header, regex);
            }

            // now do the document
            ReplaceParagraphParts(wordDocument.MainDocumentPart.Document, regex);

            // now replace the footer parts
            foreach (FooterPart footerPart in wordDocument.MainDocumentPart.FooterParts)
            {
                ReplaceParagraphParts(footerPart.Footer, regex);
            }
        }
    }

    private static void ReplaceParagraphParts(OpenXmlElement element, Regex regex)
    {
        foreach (var paragraph in element.Descendants<Paragraph>())
        {
            Match match = regex.Match(paragraph.InnerText);
            if (match.Success)
            {
                // create a new run and set its value to the correct text
                // this must be done before the child runs are removed otherwise
                // paragraph.InnerText will be empty
                Run newRun = new Run();
                newRun.AppendChild(new Text(paragraph.InnerText.Replace(match.Value, "some new value")));

                // remove any child runs
                paragraph.RemoveAllChildren<Run>();

                // add the newly created run
                paragraph.AppendChild(newRun);
            }
        }
    }

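The downside of the above approach is that any styles you had will be lost. These could be copied from the existing Runs, but if there are multiple Runs with different properties you will need to work out which properties need to be copied where. If that is what you need, there is nothing stopping you from creating multiple Runs with different properties in the code above. Other elements will also be lost (for example, any symbols), so those would need to be accounted for too.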
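As one possible way to keep some formatting, the sketch below copies the run properties of the first Run in the paragraph onto the replacement Run. It assumes the DocumentFormat.OpenXml package is referenced and that reusing the first Run's formatting for the whole paragraph is acceptable, which will not be true for every document. The TagReplacer class, the ReplaceKeepingFirstRunStyle method, and the replacement parameter are hypothetical names, not part of the original code. Unlike the code above, it replaces every match of the pattern in the paragraph text.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TagReplacer
{
    // Hypothetical variant of ReplaceParagraphParts that keeps the first run's formatting.
    public static void ReplaceKeepingFirstRunStyle(
        OpenXmlElement element, Regex regex, string replacement)
    {
        // Materialise the list first so the tree is not modified while it is being enumerated.
        foreach (Paragraph paragraph in element.Descendants<Paragraph>().ToList())
        {
            // Read the text before any runs are removed.
            string text = paragraph.InnerText;
            if (!regex.IsMatch(text))
            {
                continue;
            }

            // Deep-copy the first run's properties, if it has any.
            Run firstRun = paragraph.Elements<Run>().FirstOrDefault();
            RunProperties copiedProperties = firstRun?.RunProperties == null
                ? null
                : (RunProperties)firstRun.RunProperties.CloneNode(true);

            Run newRun = new Run();
            if (copiedProperties != null)
            {
                // Run properties must come before the text inside a run.
                newRun.AppendChild(copiedProperties);
            }

            // The evaluator form avoids treating '$' in the replacement as a substitution pattern.
            newRun.AppendChild(new Text(regex.Replace(text, m => replacement))
            {
                // Preserve leading and trailing spaces in the merged text.
                Space = SpaceProcessingModeValues.Preserve
            });

            paragraph.RemoveAllChildren<Run>();
            paragraph.AppendChild(newRun);
        }
    }
}
```

This still collapses the paragraph into a single Run, so formatting that differed between the original Runs, and non-Run content such as symbols, would need separate handling.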
