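Figure 1
Only the table data inside the red box is needed.
Press F12 to open the browser's developer tools.
Figure 2
Then press Ctrl+R to reload the current page.
Figure 3
This file contains the data we want.
Click Headers.
Figure 4
Copy the URL: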
http://push2.eastmoney.com/api/qt/clist/get?cb=jQuery112304931261550818704_1635070090942&fid=f62&po=1&pz=50&pn=1&np=1&fltt=2&invt=2&ut=b2884a393a59ad64002292a3e90d46a5&fs=m%3A0%2Bt%3A6%2Bf%3A!2%2Cm%3A0%2Bt%3A13%2Bf%3A!2%2Cm%3A0%2Bt%3A80%2Bf%3A!2%2Cm%3A1%2Bt%3A2%2Bf%3A!2%2Cm%3A1%2Bt%3A23%2Bf%3A!2%2Cm%3A0%2Bt%3A7%2Bf%3A!2%2Cm%3A1%2Bt%3A3%2Bf%3A!2&fields=f12%2Cf14%2Cf2%2Cf3%2Cf62%2Cf184%2Cf66%2Cf69%2Cf72%2Cf75%2Cf78%2Cf81%2Cf84%2Cf87%2Cf204%2Cf205%2Cf124%2Cf1%2Cf13
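Before requesting the URL, it can help to see what its query string contains. The sketch below uses only the standard library and a shortened copy of the URL (most parameters are dropped for readability). Reading pz as the page size and pn as the page number is an assumption based on the parameter values, not something the site documents here.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

# Shortened copy of the URL above: only a few of its parameters are kept.
url = ("http://push2.eastmoney.com/api/qt/clist/get"
       "?cb=jQuery112304931261550818704_1635070090942"
       "&fid=f62&po=1&pz=50&pn=1&fields=f12%2Cf14%2Cf2%2Cf3")

parts = urlsplit(url)
params = parse_qs(parts.query)   # percent-decoding is applied; values are lists

# The fields parameter is a comma-separated list of column identifiers.
fields = params["fields"][0].split(",")
print(fields)

# Assumption: pn is the page number, so pn=2 would request the next page.
params["pn"] = ["2"]
page2_url = urlunsplit(parts._replace(query=urlencode(params, doseq=True)))
print(page2_url)
```

The fields list is what later appears as keys such as f12 and f14 in the response.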
In Python, the requests module can fetch this URL with get:
import requests
url = 'http://push2.eastmoney.com/api/qt/clist/get?cb=jQuery112301936958418195105_1634991380727&fid=f62&po=1&pz=50&pn=1&np=1&fltt=2&invt=2&ut=b2884a393a59ad64002292a3e90d46a5&fs=m%3A0%2Bt%3A6%2Bf%3A!2%2Cm%3A0%2Bt%3A13%2Bf%3A!2%2Cm%3A0%2Bt%3A80%2Bf%3A!2%2Cm%3A1%2Bt%3A2%2Bf%3A!2%2Cm%3A1%2Bt%3A23%2Bf%3A!2%2Cm%3A0%2Bt%3A7%2Bf%3A!2%2Cm%3A1%2Bt%3A3%2Bf%3A!2&fields=f12%2Cf14%2Cf2%2Cf3%2Cf62%2Cf184%2Cf66%2Cf69%2Cf72%2Cf75%2Cf78%2Cf81%2Cf84%2Cf87%2Cf204%2Cf205%2Cf124%2Cf1%2Cf13'
r = requests.get(url).text
print(r)
Figure 5
r contains the data we need; .text returns the response body as a string.
The last line in Figure 4 lists the variable names that correspond to the columns of the table on the original web page.
r stores the corresponding values.
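Because the URL carries a cb=jQuery... callback parameter, the response text is a JSON document wrapped in a function call. As an alternative to matching the text with regular expressions, the wrapper can be stripped and the rest parsed with the json module. This is only a sketch: the sample string is made up, the helper name strip_jsonp is hypothetical, and the nesting under data and diff is an assumption about the response shape that should be checked against the real output in Figure 5.

```python
import json

def strip_jsonp(text):
    """Parse the JSON found between the first '(' and the last ')'."""
    start = text.index("(") + 1
    end = text.rindex(")")
    return json.loads(text[start:end])

# Made-up, shortened response in the callback(...) shape (not real data).
sample = ('jQuery1123_1634991380727({"data":{"diff":['
          '{"f12":"000001","f14":"Demo A","f2":12.34,"f66":-5.0}]}});')

payload = strip_jsonp(sample)
for row in payload["data"]["diff"]:
    # Each row is a dict keyed by the field names from the URL.
    print(row["f12"], row["f14"], row["f2"], row["f66"])
```

With the real response, the same helper would be called as strip_jsonp(r).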
Getting the data for each column
This requires the re module.
import requests
import re
url = 'http://push2.eastmoney.com/api/qt/clist/get?cb=jQuery112301936958418195105_1634991380727&fid=f62&po=1&pz=50&pn=1&np=1&fltt=2&invt=2&ut=b2884a393a59ad64002292a3e90d46a5&fs=m%3A0%2Bt%3A6%2Bf%3A!2%2Cm%3A0%2Bt%3A13%2Bf%3A!2%2Cm%3A0%2Bt%3A80%2Bf%3A!2%2Cm%3A1%2Bt%3A2%2Bf%3A!2%2Cm%3A1%2Bt%3A23%2Bf%3A!2%2Cm%3A0%2Bt%3A7%2Bf%3A!2%2Cm%3A1%2Bt%3A3%2Bf%3A!2&fields=f12%2Cf14%2Cf2%2Cf3%2Cf62%2Cf184%2Cf66%2Cf69%2Cf72%2Cf75%2Cf78%2Cf81%2Cf84%2Cf87%2Cf204%2Cf205%2Cf124%2Cf1%2Cf13'
r = requests.get(url).text
# Raw strings keep the backslashes in \d and \. intact.
# An optional leading minus is allowed for every numeric field, since change % and net inflow can be negative.
f12=re.findall(r'"f12":"\d*\.{0,1}\d*"',r)
f14=re.findall(r'"f14":".{3,4}"{0,1}',r)
f2=re.findall(r'"f2":-{0,1}\d*\.{0,1}\d*',r)
f3=re.findall(r'"f3":-{0,1}\d*\.{0,1}\d*',r)
f62=re.findall(r'"f62":-{0,1}\d*\.{0,1}\d*',r)
f184=re.findall(r'"f184":-{0,1}\d*\.{0,1}\d*',r)
f66=re.findall(r'"f66":-{0,1}\d*\.{0,1}\d*',r)
f69=re.findall(r'"f69":-{0,1}\d*\.{0,1}\d*',r)
f72=re.findall(r'"f72":-{0,1}\d*\.{0,1}\d*',r)
f75=re.findall(r'"f75":-{0,1}\d*\.{0,1}\d*',r)
f78=re.findall(r'"f78":-{0,1}\d*\.{0,1}\d*',r)
f81=re.findall(r'"f81":-{0,1}\d*\.{0,1}\d*',r)
f84=re.findall(r'"f84":-{0,1}\d*\.{0,1}\d*',r)
f87=re.findall(r'"f87":-{0,1}\d*\.{0,1}\d*',r)
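The result is as follows:
Figure 6
The columns are all much alike, so each one is matched with a regular expression.
A link for learning regular expressions: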
https://www.cnblogs.com/magicking/p/8986869.html
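The patterns used in this post are built from pieces that can all match nothing (\d* and an optional dot), which is why the value later has to be picked out of the findall result by position. The sketch below shows this on a made-up fragment in the same "key":value style, and then a capture-group variant that avoids counting positions. The sample text is hypothetical, not real output.

```python
import re

# Made-up fragment in the same "key":value style as r (not real data).
sample = '"f12":"000001","f14":"Demo","f2":12.34,"f66":-5.0'

# Step 1: match the whole key/value pair.
pair = re.findall(r'"f2":\d*\.{0,1}\d*', sample)[0]
print(pair)

# Step 2: the pattern also matches the empty string, so findall yields an
# empty match at each non-digit character before the value.
pieces = re.findall(r'\d*\.{0,1}\d*', pair)
print(pieces)
print(pieces[5])   # the number itself sits at index 5

# Alternative: a capture group returns the value directly.
value = re.search(r'"f2":(-?\d+(?:\.\d+)?)', sample).group(1)
negative = re.search(r'"f66":(-?\d+(?:\.\d+)?)', sample).group(1)
print(float(value), float(negative))
```

The capture-group form does not depend on how many characters precede the number, so it keeps working if a key gets longer or shorter.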
Extract each row of data and store it as one element of a list
This requires the pandas and re modules.
import re
import pandas as pd
info=[]
for i in range(50):
    # Each f*[i] looks like "f2":12.34. The empty matches before the value
    # put it at index 5 (index 6 for f12, whose value is quoted).
    dm=re.findall(r"\d*",f12[i])[6]
    name=re.findall(r"\w*\s{0,1}\w*",f14[i])[5]
    zxj=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f2[i])[5])
    tdzdf=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f3[i])[5])
    zlje=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f62[i])[5])
    zljzb=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f184[i])[5])
    sblrje=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f66[i])[5])
    sblrjzb=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f69[i])[5])
    blrje=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f72[i])[5])
    blrjzb=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f75[i])[5])
    mlrje=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f78[i])[5])
    mlrjzb=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f81[i])[5])
    llrje=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f84[i])[5])
    llrjzb=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f87[i])[5])
    # One single-row DataFrame per stock; blrje and blrjzb are included so no extracted column is dropped.
    info.append(pd.DataFrame({'dm':dm,'name':name,'zxj':zxj,
                              'tdzdf':tdzdf,'zlje':zlje,'zljzb':zljzb,
                              'sblrje':sblrje,'sblrjzb':sblrjzb,
                              'blrje':blrje,'blrjzb':blrjzb,
                              'mlrje':mlrje,'mlrjzb':mlrjzb,
                              'llrje':llrje,'llrjzb':llrjzb},index=[i]))
sj=pd.concat(info)
The last line concatenates the rows into a single table.
Data conversion: data held in Python is not saved once the program ends, so export it to an Excel file for storage.
Learning link: https://www.cnblogs.com/wtmb/p/13501463.html
sj.to_excel('D:myfilesj.xlsx',sheet_name="p1",index=False)
Figure 7
Open the result
Figure 8
Read the Excel data, then write it to a MySQL table and print it
This requires the pymysql module.
import pymysql
Read the data
Learning link: https://www.cnblogs.com/lj821022/p/8232764.html
data=pd.read_excel('D:myfilesj.xlsx')
Connect to the database and write the data to a database table
data.to_sql(name='nt',con='mysql+pymysql://root:123456@localhost:3306/mysql?charset=utf8',if_exists='replace',index=False)
Connect to the database, create a cursor, and print the output
Learning link: https://blog.csdn.net/kongsuhongbaby/article/details/84948205
db = pymysql.connect(host='localhost',user='root',password='123456',database='mysql',port=3306)
cursor = db.cursor()
cursor.execute('''select * from nt''')
results = cursor.fetchall()
for row in results:
    print(row)
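The connect, cursor, execute, fetchall sequence can also be written so that the cursor and connection are closed automatically, even if a query raises an error. The sketch below shows that pattern with Python's built-in sqlite3 module as a stand-in, purely so it runs without a MySQL server; it is not the pymysql code used in this post, and the table contents are made up. Note that pymysql uses %s as its parameter placeholder where sqlite3 uses ?.

```python
import sqlite3
from contextlib import closing

# sqlite3 in-memory database as a stand-in for the MySQL server (assumption
# for illustration only; the rows below are made up).
with closing(sqlite3.connect(":memory:")) as db:
    with closing(db.cursor()) as cursor:
        cursor.execute("create table nt (dm text, name text, zxj real)")
        cursor.executemany(
            "insert into nt values (?, ?, ?)",
            [("000001", "Demo A", 12.34), ("000002", "Demo B", 5.6)],
        )
        db.commit()
        # Parameters are passed separately instead of being pasted into the SQL.
        cursor.execute("select * from nt where zxj > ?", (10,))
        results = cursor.fetchall()
# Both the cursor and the connection are closed at this point.

for row in results:
    print(row)
```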
Close the cursor and disconnect
cursor.close()
db.close()
Complete code
import requests
import pandas as pd
import re
import pymysql
url = 'http://push2.eastmoney.com/api/qt/clist/get?cb=jQuery112301936958418195105_1634991380727&fid=f62&po=1&pz=50&pn=1&np=1&fltt=2&invt=2&ut=b2884a393a59ad64002292a3e90d46a5&fs=m%3A0%2Bt%3A6%2Bf%3A!2%2Cm%3A0%2Bt%3A13%2Bf%3A!2%2Cm%3A0%2Bt%3A80%2Bf%3A!2%2Cm%3A1%2Bt%3A2%2Bf%3A!2%2Cm%3A1%2Bt%3A23%2Bf%3A!2%2Cm%3A0%2Bt%3A7%2Bf%3A!2%2Cm%3A1%2Bt%3A3%2Bf%3A!2&fields=f12%2Cf14%2Cf2%2Cf3%2Cf62%2Cf184%2Cf66%2Cf69%2Cf72%2Cf75%2Cf78%2Cf81%2Cf84%2Cf87%2Cf204%2Cf205%2Cf124%2Cf1%2Cf13'
r = requests.get(url).text
# Raw strings keep the backslashes in \d and \. intact.
# An optional leading minus is allowed for every numeric field, since change % and net inflow can be negative.
f12=re.findall(r'"f12":"\d*\.{0,1}\d*"',r)
f14=re.findall(r'"f14":".{3,4}"{0,1}',r)
f2=re.findall(r'"f2":-{0,1}\d*\.{0,1}\d*',r)
f3=re.findall(r'"f3":-{0,1}\d*\.{0,1}\d*',r)
f62=re.findall(r'"f62":-{0,1}\d*\.{0,1}\d*',r)
f184=re.findall(r'"f184":-{0,1}\d*\.{0,1}\d*',r)
f66=re.findall(r'"f66":-{0,1}\d*\.{0,1}\d*',r)
f69=re.findall(r'"f69":-{0,1}\d*\.{0,1}\d*',r)
f72=re.findall(r'"f72":-{0,1}\d*\.{0,1}\d*',r)
f75=re.findall(r'"f75":-{0,1}\d*\.{0,1}\d*',r)
f78=re.findall(r'"f78":-{0,1}\d*\.{0,1}\d*',r)
f81=re.findall(r'"f81":-{0,1}\d*\.{0,1}\d*',r)
f84=re.findall(r'"f84":-{0,1}\d*\.{0,1}\d*',r)
f87=re.findall(r'"f87":-{0,1}\d*\.{0,1}\d*',r)
info=[]
for i in range(50):
    # The empty matches before the value put it at index 5 (index 6 for the quoted f12).
    dm=re.findall(r"\d*",f12[i])[6]
    name=re.findall(r"\w*\s{0,1}\w*",f14[i])[5]
    zxj=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f2[i])[5])
    tdzdf=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f3[i])[5])
    zlje=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f62[i])[5])
    zljzb=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f184[i])[5])
    sblrje=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f66[i])[5])
    sblrjzb=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f69[i])[5])
    blrje=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f72[i])[5])
    blrjzb=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f75[i])[5])
    mlrje=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f78[i])[5])
    mlrjzb=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f81[i])[5])
    llrje=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f84[i])[5])
    llrjzb=float(re.findall(r"-{0,1}\d*\.{0,1}\d*",f87[i])[5])
    # One single-row DataFrame per stock; blrje and blrjzb are included so no extracted column is dropped.
    info.append(pd.DataFrame({'dm':dm,'name':name,'zxj':zxj,
                              'tdzdf':tdzdf,'zlje':zlje,'zljzb':zljzb,
                              'sblrje':sblrje,'sblrjzb':sblrjzb,
                              'blrje':blrje,'blrjzb':blrjzb,
                              'mlrje':mlrje,'mlrjzb':mlrjzb,
                              'llrje':llrje,'llrjzb':llrjzb},index=[i]))
sj=pd.concat(info)
sj.to_excel('D:myfilesj.xlsx',sheet_name="p1",index=False)
data=pd.read_excel('D:myfilesj.xlsx')
data.to_sql(name='nt',con='mysql+pymysql://root:123456@localhost:3306/mysql?charset=utf8',if_exists='replace',index=False)
db = pymysql.connect(host='localhost',user='root',password='123456',database='mysql',port=3306)
cursor = db.cursor()
cursor.execute('''select * from nt''')
results = cursor.fetchall()
for row in results:
    print(row)
cursor.close()
db.close()



