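I would use a recursive method that takes your starting tag and iterates over its child nodes. For each TextNode, print its content. For each Element, recurse into its child nodes.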
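    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.nodes.Node;
    import org.jsoup.nodes.TextNode;
    import org.jsoup.select.Elements;

    public class Main {

        public static void main(String[] args) throws IOException {
            // I put your HTML in the body tag in a local file
            Document doc = Jsoup.parse(new File("input/20160505.html"), "UTF-8");
            Elements elements = doc.getElementsByTag("body");
            Element rootTag = elements.get(0);
            printTextOfTag(rootTag);
        }

        // Prints every text node below currentTag, in document order.
        public static void printTextOfTag(Element currentTag) {
            List<Node> nodes = currentTag.childNodes();
            for (Node n : nodes) {
                if (n instanceof TextNode) {
                    // Leaf text: print it
                    System.out.println(((TextNode) n).text());
                } else if (n instanceof Element) {
                    // Nested element: descend into its children
                    printTextOfTag((Element) n);
                }
            }
        }
    }

Output: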
    This is the first text
    More text here
    Another line of text
    Text in the span
    Another text in span
    This is another line
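If you want to try the recursion without adding jsoup to the classpath, here is a rough sketch of the same idea using the JDK's built-in XML DOM parser instead. Note that this is a swapped-in parser, not jsoup: it only accepts well-formed markup (for example `<br/>` rather than `<br>`), so it is no substitute for jsoup on real-world HTML. The class name `TextWalker`, the method `collectText`, and the sample markup are all made up for illustration. It also collects the text into a list and skips whitespace-only nodes, which the printing version above does not do.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; uses the JDK's XML DOM, not jsoup.
class TextWalker {

    // Parses a well-formed XML/XHTML string and returns its
    // non-blank text chunks in document order.
    static List<String> collectText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> out = new ArrayList<>();
        collect(doc.getDocumentElement(), out);
        return out;
    }

    // Same recursion as printTextOfTag: text nodes are collected,
    // element nodes are descended into, everything else is ignored.
    static void collect(Node current, List<String> out) {
        NodeList children = current.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.add(text);
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect(child, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample markup, not the HTML from the question.
        String xml = "<body>First text<br/><span>Inside span</span>Last text</body>";
        for (String line : collectText(xml)) {
            System.out.println(line);
        }
    }
}
```

Returning a list instead of printing directly makes it easier to post-process the text (join it, filter it, or compare it in a test) while keeping the traversal itself unchanged.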



