- 1、Jsoup是什么
- 2、Jsoup解析URL返回结果
- 3、Jsoup解析HTML片段
Jsoup是java的HTML解析器,可以解析请求URL的返回结果,可以解析HTML的片段内容,其实主要用来解析HTML内容的。
pom.xml文件引入:
2、Jsoup解析URL返回结果org.jsoup jsoup 1.14.3
package com.xxx.xxx.utils;
import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class JsoupHttpUtil {
public static void main(String[] args) {
Document doc = JsoupHttpUtil.get("https://www.baidu.com");
System.out.println(doc.toString());
}
public static Document get(String url){
try{
Connection conn = Jsoup.connect(url);
conn.header("Accept", "text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01");
conn.header("Referer", url);
conn.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36");
conn.header("X-Requested-With", "XMLHttpRequest");
return conn.get();
} catch (IOException e) {
e.printStackTrace();
return null;
}
}
}
3、Jsoup解析HTML片段
HTML片段:
代码:
String each = "点击跳转此链接下一页";
//将html片段转成Document对象
Document div = Jsoup.parse(each);
//获取到所有的属性有target的标签,然后取第一个
Element a = div.getElementsByAttribute("target").get(0);
//获取a标签内部的所有的em的元素集合
Elements em = a.getElementsByTag("em");



