A Python solution using BeautifulSoup4 (Edit: with proper skipping. Edit3: using class="details" to select the table):
    from bs4 import BeautifulSoup

    html = """
    <table class="details" border="0" cellpadding="5" cellspacing="2" width="95%">
      <tr valign="top">
        <th>Tests</th>
        <th>Failures</th>
        <th>Success Rate</th>
        <th>Average Time</th>
        <th>Min Time</th>
        <th>Max Time</th>
      </tr>
      <tr valign="top">
        <td>103</td>
        <td>24</td>
        <td>76.70%</td>
        <td>71 ms</td>
        <td>0 ms</td>
        <td>829 ms</td>
      </tr>
    </table>"""

    # Name the parser explicitly so bs4 does not have to guess one.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "details"})

    # The first tr contains the field names.
    headings = [th.get_text() for th in table.find("tr").find_all("th")]

    datasets = []
    for row in table.find_all("tr")[1:]:  # skip the heading row
        # list() so each dataset can be iterated more than once on Python 3.
        dataset = list(zip(headings, (td.get_text() for td in row.find_all("td"))))
        datasets.append(dataset)

    print(datasets)

The result looks like this:

    [[('Tests', '103'), ('Failures', '24'), ('Success Rate', '76.70%'), ('Average Time', '71 ms'), ('Min Time', '0 ms'), ('Max Time', '829 ms')]]

Edit2: To produce the desired output, use something like this:

    for dataset in datasets:
        for field in dataset:
            print("{0:<16}: {1}".format(field[0], field[1]))

Result:

    Tests           : 103
    Failures        : 24
    Success Rate    : 76.70%
    Average Time    : 71 ms
    Min Time        : 0 ms
    Max Time        : 829 ms
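If installing BeautifulSoup4 is not an option, roughly the same heading/value pairing can be sketched with the standard library's html.parser instead. This is a swapped-in technique, not the approach above: `TableParser` and `table_to_dicts` are hypothetical names made up for illustration, and the sketch assumes a single, non-nested table whose first row holds the headings. It returns one dict per data row rather than a list of tuples.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_to_dicts(html):
    """Return one {heading: value} dict per data row of a simple table."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    # Assumption: the first row contains the field names.
    headings, *data_rows = parser.rows
    return [dict(zip(headings, row)) for row in data_rows]


sample = """
<table class="details">
  <tr><th>Tests</th><th>Failures</th><th>Success Rate</th></tr>
  <tr><td>103</td><td>24</td><td>76.70%</td></tr>
</table>
"""

records = table_to_dicts(sample)
for record in records:
    for name, value in record.items():
        print("{0:<16}: {1}".format(name, value))
```

With dicts, a single field can be looked up by heading, for example `records[0]["Failures"]`. Unlike the BeautifulSoup version, this sketch does not filter by class and would merge the rows of every table in the input.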



