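Run the code and you will get the data you need from that table. To try extracting data from this element, all you need to do is wrap the entire HTML element pasted above in a triple-quoted string: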
html=''' '''
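
import csv
from bs4 import BeautifulSoup

outfile = open("table_data.csv", "w", newline='')
writer = csv.writer(outfile)

tree = BeautifulSoup(html, "lxml")
table_tag = tree.select("table")[0]
# One inner list per <tr>, holding the text of each <th>/<td> cell
tab_data = [[item.text for item in row_data.select("th,td")] for row_data in table_tag.select("tr")]

for data in tab_data:
    writer.writerow(data)
    print(' '.join(data))

outfile.close()  # flush the rows to disk

I have tried to split the code into parts to make it easier to follow. What I did above is a nested for loop written as a list comprehension. Here is the same process written out step by step: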

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "lxml")
table = soup.find('table')

list_of_rows = []
for row in table.find_all('tr'):
    list_of_cells = []
    # Header cells (<th>) and data cells (<td>) are collected alike
    for cell in row.find_all(["th", "td"]):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

for item in list_of_rows:
    print(' '.join(item))

Result:

Date Open High Low Close Volume Market Cap
Sep 14, 2017 3875.37 3920.60 3153.86 3154.95 2,716,310,000 64,191,600,000
Sep 13, 2017 4131.98 3789.92 3882.59 2,219,410,000 68,432,200,000
Sep 12, 2017 4168.88 4344.65 4085.22 4130.81 1,864,530,000 69,033,400,000
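
One thing worth noting about the CSV file: several cells in this table (the dates, Volume, Market Cap) contain commas themselves. Below is a minimal sketch of how csv.writer deals with that. It uses only the standard library, the `rows` list is a hand-copied sample of the output above rather than anything parsed from the page, and an in-memory `io.StringIO` buffer stands in for table_data.csv, so treat it as an illustration and not as part of the original answer.

```python
import csv
import io

# Hand-copied sample of the printed result; in the answer these rows come from tab_data.
rows = [
    ["Date", "Open", "High", "Low", "Close", "Volume", "Market Cap"],
    ["Sep 14, 2017", "3875.37", "3920.60", "3153.86", "3154.95",
     "2,716,310,000", "64,191,600,000"],
]

# In-memory stand-in for table_data.csv (newline="" mirrors the open() call above).
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Cells containing commas are wrapped in double quotes by the writer,
# so reading the text back gives the same cells again.
round_trip = list(csv.reader(io.StringIO(csv_text, newline="")))
print(round_trip == rows)
```

So the commas inside values such as 2,716,310,000 should not split a cell into several columns when the file is read back with csv.reader or opened in a spreadsheet.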



