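You want to use FormElement. This is a useful Jsoup feature: it finds the fields declared inside a form and posts them for you. Before submitting the form, you can set the field values through the Jsoup API.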
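To make "posts them for you" a little more concrete, here is a minimal sketch of how name/value pairs from a form are typically serialized into an application/x-www-form-urlencoded request body. It uses only the Java standard library (Java 10 or later). The class name FormBodySketch and the field names are hypothetical, and this is not Jsoup's actual implementation; Jsoup does the equivalent work internally when the form is submitted.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration, not part of Jsoup.
public class FormBodySketch {

    // Joins the fields as name=value pairs separated by '&',
    // URL-encoding every name and every value.
    public static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String name = URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8);
            String value = URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8);
            body.add(name + "=" + value);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Made-up field names and values; a LinkedHashMap keeps the form order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("search_type", "corp");
        fields.put("name", "cali & co");
        System.out.println(encode(fields)); // prints search_type=corp&name=cali+%26+co
    }
}
```

With FormElement you never build this string yourself; you only change the field values and let Jsoup collect and encode them.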
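Note:
In the sample code below, you will always see a call to Element#select followed by a call to Elements#first. For example:

    responsedocument.select("form#aspnetForm").first()

Jsoup 1.11.1 introduced a more efficient alternative: Element#selectFirst. You can use it as a drop-in replacement for the original pair of calls. For example:

    responsedocument.select("form#aspnetForm").first()

can be replaced with

    responsedocument.selectFirst("form#aspnetForm")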
Sample code

    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.nodes.FormElement;

    // The statements below belong inside a method that declares "throws IOException".

    // * Connect to website
    String url = "http://kepler.sos.ca.gov/";
    Connection.Response resp = Jsoup.connect(url)
            .timeout(30000)
            .method(Connection.Method.GET)
            .execute();

    // * Find the form
    Document responsedocument = resp.parse();
    Element potentialForm = responsedocument.select("form#aspnetForm").first();
    checkElement("form element", potentialForm);
    FormElement form = (FormElement) potentialForm;

    // * Fill in the form and submit it
    // ** Search Type
    Element radioButtonListSearchType = form.select("[name$=RadioButtonList_SearchType]").first();
    checkElement("search type radio button list", radioButtonListSearchType);
    radioButtonListSearchType.attr("checked", "checked");

    // ** Name search
    Element textBoxNameSearch = form.select("[name$=TextBox_NameSearch]").first();
    checkElement("name search text box", textBoxNameSearch);
    textBoxNameSearch.val("cali");

    // ** Submit the form, reusing the cookies from the first response
    Document searchResults = form.submit().cookies(resp.cookies()).post();

    // * Extract results (entity numbers in this sample)
    for (Element entityNumber : searchResults.select("table[id$=SearchResults_Corp] > tbody > tr > td:first-of-type:not(td[colspan=5])")) {
        System.out.println(entityNumber.text());
    }

    // Fails fast with a readable message when a selector matches nothing.
    public static void checkElement(String name, Element elem) {
        if (elem == null) {
            throw new RuntimeException("Unable to find " + name);
        }
    }

Output (as of this writing)

    C3036475
    C3027305
    C3236514
    C3027304
    C3034012
    C3035110
    C3028330
    C3035378
    C3124793
    C3734637
In this example, we will log in to the GitHub website using the FormElement class.

    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Element;
    import org.jsoup.nodes.FormElement;

    // The statements below belong inside a method that declares "throws IOException".

    // # Constants used in this example
    final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36";
    final String LOGIN_FORM_URL = "https://github.com/login";
    final String USERNAME = "yourUsername";
    final String PASSWORD = "yourPassword";

    // # Go to login page
    Connection.Response loginFormResponse = Jsoup.connect(LOGIN_FORM_URL)
            .method(Connection.Method.GET)
            .userAgent(USER_AGENT)
            .execute();

    // # Fill the login form
    // ## Find the form first...
    FormElement loginForm = (FormElement) loginFormResponse.parse()
            .select("div#login > form").first();
    checkElement("Login Form", loginForm);

    // ## ... then "type" the username ...
    Element loginField = loginForm.select("#login_field").first();
    checkElement("Login Field", loginField);
    loginField.val(USERNAME);

    // ## ... and "type" the password
    Element passwordField = loginForm.select("#password").first();
    checkElement("Password Field", passwordField);
    passwordField.val(PASSWORD);

    // # Now send the form for login
    Connection.Response loginActionResponse = loginForm.submit()
            .cookies(loginFormResponse.cookies())
            .userAgent(USER_AGENT)
            .execute();

    System.out.println(loginActionResponse.parse().html());

    // Fails fast with a readable message when a selector matches nothing.
    public static void checkElement(String name, Element elem) {
        if (elem == null) {
            throw new RuntimeException("Unable to find " + name);
        }
    }

All form data is handled for us by the FormElement class (including detection of the form method). Calling FormElement#submit builds a ready-made Connection. All we have to do is complete this connection with additional headers (cookies, user agent, etc.) and execute it.
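As a rough illustration of why the cookies from the first response are passed along, the sketch below shows how a map of cookie names to values ends up as a single Cookie request header, which is how the server recognizes that the form submission belongs to the same session as the page load. It uses only the Java standard library. The class name CookieHeaderSketch and the cookie names and values are made up for this example, and Jsoup builds the real header for you when you call cookies(...) on the connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration, not part of Jsoup.
public class CookieHeaderSketch {

    // Formats cookies as "name=value" pairs separated by "; ",
    // which is the layout of the value of a Cookie request header.
    public static String cookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(cookie -> cookie.getKey() + "=" + cookie.getValue())
                .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        // Made-up cookies standing in for the ones returned with the login page.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("session_id", "abc123");
        cookies.put("logged_in", "no");
        System.out.println("Cookie: " + cookieHeader(cookies));
        // prints Cookie: session_id=abc123; logged_in=no
    }
}
```

If the cookies were left out, the server would likely treat the submission as coming from a new visitor, and many sites reject the request in that case.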



