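Where did you get the values of viewstate and eventvalidation? For one thing, they should not end with "…"; you must have omitted part of them. For another, they should not be hard-coded.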
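One solution is:
1. Retrieve the page at the URL "http://www.indiapost.gov.in/pin/" without any form data.
2. Parse the page and retrieve form values such as __VIEWSTATE and __EVENTVALIDATION (you can use BeautifulSoup).
3. Get the search results (a second HTTP request) by submitting the vital form data from step 2.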
Update:
Based on the idea above, I made a few modifications to your code to get it working:

    # Python 2 code: urllib.FancyURLopener and urllib.urlencode live in
    # urllib.request / urllib.parse in Python 3.
    import urllib
    from bs4 import BeautifulSoup

    # NOTE: this dict is defined but never passed to the opener below;
    # only the User-Agent (MyOpener.version) is actually sent.
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Origin': 'http://www.indiapost.gov.in',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Referer': 'http://www.indiapost.gov.in/pin/',
        'Accept-Encoding': 'gzip,deflate,sdch',
        'Accept-Language': 'en-US,en;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
    }

    class MyOpener(urllib.FancyURLopener):
        version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

    myopener = MyOpener()
    url = 'http://www.indiapost.gov.in/pin/'

    # first HTTP request without form data
    f = myopener.open(url)
    soup = BeautifulSoup(f)

    # parse and retrieve two vital form values
    viewstate = soup.select("#__VIEWSTATE")[0]['value']
    eventvalidation = soup.select("#__EVENTVALIDATION")[0]['value']

    formData = (
        ('__EVENTVALIDATION', eventvalidation),
        ('__VIEWSTATE', viewstate),
        ('__VIEWSTATEENCRYPTED', ''),
        ('txt_offname', ''),
        ('ddl_dist', '0'),
        ('txt_dist_on', ''),
        ('ddl_state', '1'),
        ('btn_state', 'Search'),
        ('txt_stateon', ''),
        ('hdn_tabchoice', '1'),
        ('search_on', 'Search'),
    )
    encodedFields = urllib.urlencode(formData)

    # second HTTP request with form data
    f = myopener.open(url, encodedFields)

    # Actually we'd better use BeautifulSoup once again to retrieve the
    # results (instead of writing out the whole HTML file). Besides, since
    # the result is split into multiple pages, we need to send more HTTP
    # requests.
    try:
        fout = open('tmp.html', 'w')
    except IOError:
        print('Could not open output file\n')
    else:
        # only write when the file was opened successfully
        fout.writelines(f.readlines())
        fout.close()
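
If you would rather not list the hidden fields by hand, the parsing step can be generalized to collect every hidden input on the page. The following is only a sketch under a few assumptions: it is written for Python 3 and uses the standard library's html.parser instead of BeautifulSoup, the helper names (HiddenFieldParser, extract_hidden_fields, build_post_body) are made up for illustration, and SAMPLE_HTML is an invented stand-in rather than the real markup of the page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            # A missing value attribute is treated as an empty string.
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def build_post_body(html, extra_fields):
    """Merge the page's hidden fields with the search fields and URL-encode them."""
    fields = extract_hidden_fields(html)
    fields.update(extra_fields)
    return urlencode(fields).encode("ascii")


# Invented sample markup (an assumption, not the real page).
SAMPLE_HTML = """
<form method="post" action="./">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTA4/abc+def=" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgK=" />
  <input type="text" name="txt_offname" />
</form>
"""

if __name__ == "__main__":
    body = build_post_body(SAMPLE_HTML, {"ddl_state": "1", "btn_state": "Search"})
    print(body.decode("ascii"))
```

In a real run you would pass the decoded text of the first response to build_post_body and send the returned bytes as the body of the second request. Note that urlencode escapes characters such as "/", "+" and "=" that commonly occur in these Base64-like values.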


