Introduction
This article was inspired by: https://blog.csdn.net/qq_43613793/article/details/104268536
Thanks to that author for the learning material!
ECharts is an open-source data visualization library from Baidu that has won over many developers with its smooth interactions and polished chart designs. Python, meanwhile, is an expressive language well suited to data processing. When data analysis meets data visualization, pyecharts is born.
Result preview
First, analyze the page. The nationwide summary data sits under this node ↓
Next, the per-province data sits under this node ↓
Then, hunting for one province's historical data, we find data that looks very much like it ↓
A comparison confirms that this is indeed the historical data for a single province ↓
Then find its URL, and notice that replacing province=heilongjiang with another province's pinyin yields that province's history.
Comparing the responses with and without the callback parameter: with it, the JSON comes back wrapped in an extra string and parentheses, so it is no longer a plain dict and isn't as easy to hand straight to json (I'm not sure it's actually hard, but that was my impression).
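For reference, stripping a JSONP wrapper only takes one regex; a minimal sketch using the same pattern the scraper applies later (the callback name `conf` and the payload are made up for illustration):

```python
import json
import re

# A made-up JSONP-style response: a callback name wrapping a JSON body
jsonp = 'conf({"data": {"province": "guangdong", "conNum": 123}})'

# Lazily skip the wrapper prefix, greedily capture the outermost {...},
# then parse it as ordinary JSON
payload = json.loads(re.match(".*?({.*}).*", jsonp)[1])
print(payload['data']['conNum'])  # -> 123
```

So the callback-wrapped form is perfectly usable; the wrapper just has to be peeled off first.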
Next, locate the data for countries around the world ↓
(I never found one JSON file covering all the data I wanted, so I had to track each piece down separately; if anyone finds one, please share it in the comments.)
Once all the data has been located, it's time to write the code to scrape, process, and analyze it.
Load the libraries
# Preload every library that might be needed
import re
import json
import time
import requests
import pandas as pd
from pyecharts.charts import *
from pyecharts import options as opts
from pyecharts.commons.utils import JsCode
from pyecharts.globals import ThemeType, ChartType
from bs4 import BeautifulSoup
from selenium import webdriver
Parsing the site with Selenium
https://news.sina.cn/zt_d/yiqing0121
Reference article: https://www.cnblogs.com/stin/p/7929601.html
url = 'https://news.sina.cn/zt_d/yiqing0121'
try:
    # Browser driver options
    chrome_options = webdriver.ChromeOptions()
    # Don't load images
    prefs = {"profile.managed_default_content_settings.images": 2}
    chrome_options.add_experimental_option("prefs", prefs)
    # Run Chrome in headless (no-UI) mode
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    # Load the Chrome driver; fill in the actual path to your own chromedriver
    driver = webdriver.Chrome(options=chrome_options,
                              executable_path=r'C:\Program Files\Google\Chrome\Application\chromedriver'
                              )
    driver.get(url)                              # send the request
    html = driver.page_source                    # grab the page's HTML source
    html = BeautifulSoup(html, "html.parser")    # parse the HTML
    driver.quit()                                # quit the browser driver
except Exception as e:
    print('Exception:', e)
Scrape per-province data for China
(The child tags here are fairly messy and I don't know how to avoid also scraping the grandchild nodes; if you know how, please share in the comments, thanks!)
p_list = []
def provinces(html):
    for i in range(34):
        hi = []
        for ht in html.find('div', attrs={'data-index': i}).find_all('span', recursive=False):
            if ht.em is not None:
                # strip out the text of the scraped grandchild node
                h2 = ht.em.text
                ht = ht.text.replace(h2, '').replace(' ', '')
                hi.append(ht)
            else:
                ht = ht.text.replace('- -', '0')
                hi.append(ht)
        p_list.append(hi)
    return p_list
provinces(html)
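To illustrate what `recursive=False` and the `em` handling above actually do, here is a self-contained sketch on a toy snippet (the real page's markup is more complex; the structure below is only an assumption for demonstration):

```python
from bs4 import BeautifulSoup

# Toy HTML mimicking the page structure: a province row whose second
# <span> contains a grandchild <em> (a daily-increment badge)
snippet = '<div data-index="0"><span>广东</span><span>12 <em>+3</em></span><span>- -</span></div>'

soup = BeautifulSoup(snippet, 'html.parser')
row = []
# recursive=False restricts find_all to direct children of the <div>
for span in soup.find('div', attrs={'data-index': '0'}).find_all('span', recursive=False):
    if span.em is not None:
        # remove the grandchild <em> text, then strip spaces
        row.append(span.text.replace(span.em.text, '').replace(' ', ''))
    else:
        # the site shows '- -' for missing values; treat it as 0
        row.append(span.text.replace('- -', '0'))
print(row)  # -> ['广东', '12', '0']
```

`recursive=False` only prevents `find_all` from descending into nested tags; the `<em>` text still appears in `span.text`, which is why it has to be subtracted manually.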
Processing the China data
English column names make the subsequent pandas processing easier, so we attach English labels to the scraped data.
n_data = pd.DataFrame(p_list)
n_data.drop(columns=[3, 7], inplace=True)
n_data['p_name'] = n_data[0]
n_data['econNum'] = n_data[1]
n_data['value'] = n_data[2]
n_data['asymptomNum'] = n_data[4]
n_data['deathNum'] = n_data[5]
n_data['cureNum'] = n_data[6]
del n_data[0]
del n_data[1]
del n_data[2]
del n_data[4]
del n_data[5]
del n_data[6]
n_data
Scrape data for countries worldwide
worldlist:https://news.sina.com.cn/project/fymap/ncp2020_full_data.json
url = 'https://news.sina.com.cn/project/fymap/ncp2020_full_data.json'
headers = {
    # fill in your own browser's headers
}
response = requests.get(url, headers=headers)
f_data = json.loads(re.match(".*?({.*}).*", response.text)[1])['data']
worldlist = f_data['worldlist']
worldlist[:2];
China's entry in worldlist uses different attribute names from the other countries and regions, so it is handled separately to give every record the same keys.
w_list = []
cn = worldlist[0]['name']
cn_values = f_data['gntotal']
cn_cureNum = f_data['curetotal']
cn_deathNum = f_data['deathtotal']
cn_conadd = f_data['add_daily']['addcon']
cn_cureadd = f_data['add_daily']['addcure']
cn_deathadd = f_data['add_daily']['adddeath']
cn_dict = {'country':cn, 'value':cn_values, 'cureNum':cn_cureNum,
'deathNum':cn_deathNum, 'conadd':cn_conadd, 'cureadd':cn_cureadd, 'deathadd':cn_deathadd
}
w_list.append(cn_dict)
for x in range(1, len(worldlist)):
    country = worldlist[x]['name']
    values = worldlist[x]['value']
    cureNum = worldlist[x]['cureNum']
    deathNum = worldlist[x]['deathNum']
    conadd = worldlist[x]['conadd']
    cureadd = worldlist[x]['cureadd']
    deathadd = worldlist[x]['deathadd']
    f_dict = {'country': country, 'value': values, 'cureNum': cureNum,
              'deathNum': deathNum, 'conadd': conadd, 'cureadd': cureadd, 'deathadd': deathadd
              }
    w_list.append(f_dict)
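Since every non-China record shares the same keys, the loop can also be written as a comprehension over a key list; a sketch with made-up sample records shaped like the `worldlist` entries:

```python
# Two made-up records shaped like the non-China entries of worldlist
worldlist = [
    {'name': '日本', 'value': 100, 'cureNum': 80, 'deathNum': 2,
     'conadd': 5, 'cureadd': 3, 'deathadd': 0},
    {'name': '韩国', 'value': 90, 'cureNum': 70, 'deathNum': 1,
     'conadd': 4, 'cureadd': 2, 'deathadd': 0},
]

# Rename 'name' to 'country' and copy the shared keys for each record
keys = ['value', 'cureNum', 'deathNum', 'conadd', 'cureadd', 'deathadd']
w_list = [{'country': c['name'], **{k: c[k] for k in keys}} for c in worldlist]
print(w_list[0]['country'], w_list[0]['value'])  # -> 日本 100
```

Either form produces the same list of dicts; the comprehension just avoids seven temporary variables per record.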
Convert to a DataFrame
w_data = pd.DataFrame(w_list)
w_data
This preprocessing is done so that the country names line up with the world map later (without it, the world map won't render; the data also contains non-country entries such as the Ruby Princess cruise ship). The lookup table is pyecharts' own country-name list.
name_en = pd.read_excel('国家中英文对照.xlsx')
w_data = pd.merge(w_data, name_en, left_on='country', right_on='c_name', how='inner')
w_data = w_data[['name','country','value','cureNum','deathNum','conadd','cureadd','deathadd']]
w_data
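The `how='inner'` merge silently drops any country whose name isn't in the lookup table, which is exactly how the cruise-ship entries disappear. A toy sketch with made-up rows; `indicator=True` is a handy way to check what got dropped:

```python
import pandas as pd

# Made-up rows: the cruise ship has no match in the name table
w_data = pd.DataFrame({'country': ['中国', '美国', '红宝石公主号'],
                       'value': [100000, 30000000, 700]})
name_en = pd.DataFrame({'c_name': ['中国', '美国'],
                        'name': ['China', 'United States']})

# how='inner' keeps only countries present in the lookup table
merged = pd.merge(w_data, name_en, left_on='country', right_on='c_name', how='inner')

# how='outer' with indicator=True reveals the rows that would be dropped
check = pd.merge(w_data, name_en, left_on='country', right_on='c_name',
                 how='outer', indicator=True)
dropped = check.loc[check['_merge'] == 'left_only', 'country'].tolist()
print(merged[['name', 'value']])
print(dropped)  # -> ['红宝石公主号']
```

Running the `indicator` check once is a cheap way to confirm that only non-country entries are lost, rather than real countries with mismatched spellings.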
Scrape historical data for Guangdong
historylist: https://gwpre.sina.cn/interface/news/ncp/data.d.json?mod=province&province=guangdong
response = requests.get('https://gwpre.sina.cn/interface/news/ncp/data.d.json?mod=province&province=guangdong').json()
gddata = response['data']
gdlist = gddata['historylist']
gdlist[:2];
gd_list = []
for y in range(len(gdlist)):
    ymd = gdlist[y]['ymd']
    gd_conNum = gdlist[y]['conNum']
    gd_cureNum = gdlist[y]['cureNum']
    gd_deathNum = gdlist[y]['deathNum']
    gd_econNum = gdlist[y]['econNum']
    gd_conadd = gdlist[y]['conadd']
    gd_dict = {
        'ymd': ymd, 'gd_conNum': gd_conNum, 'gd_cureNum': gd_cureNum,
        'gd_deathNum': gd_deathNum, 'gd_econNum': gd_econNum, 'gd_conadd': gd_conadd
    }
    gd_list.append(gd_dict)
gd_list;
gd_data = pd.DataFrame(gd_list)
gd_data.head()
Save the data
with pd.ExcelWriter(r'新浪疫情数据.xlsx') as writer:
    n_data.to_excel(writer, sheet_name='China', index=False)   # save the data
    w_data.to_excel(writer, sheet_name='World', index=False)   # save the data
    gd_data.to_excel(writer, sheet_name='Guangdong', index=False)
National summary statistics
dc_name = ['新增境外输入', '新增无症状','新增确诊', '新增死亡', '新增治愈']
d_comp = []
total_dict = dict()
comp_dict = dict()
for zong_shu in html.find('div', attrs={'class': 't_list'}).find_all('div'):
    total_dict[zong_shu.h5.text] = zong_shu.b.text
for xin_zeng in html.find('div', attrs={'class': 't_list'}).find_all('h4'):
    # d_comp.append(xin_zeng.code.text.replace('+','').replace('-',''))
    d_comp.append(xin_zeng.code.text.replace('+', ''))
d_comp = [d_comp[1], d_comp[2], d_comp[4], d_comp[5], d_comp[6]]
for i in range(len(d_comp)):
    comp_dict[dc_name[i]] = d_comp[i]
# total_dict
comp_dict
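The final pairing loop is equivalent to `dict(zip(...))`; a quick sketch with placeholder values in place of the scraped ones:

```python
dc_name = ['新增境外输入', '新增无症状', '新增确诊', '新增死亡', '新增治愈']
d_comp = ['12', '20', '55', '0', '60']  # placeholder values for illustration

# zip pairs each label with its value; dict() builds the mapping in one step
comp_dict = dict(zip(dc_name, d_comp))
print(comp_dict['新增确诊'])  # -> 55
```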
Tidy the data
t_list = [total_dict]
comp_list = [comp_dict]
total = pd.DataFrame(t_list)
comp = pd.DataFrame(comp_list)
total = total.unstack(level=0)
total = total.reset_index(drop=True, level=-1)
# total
comp = comp.unstack(level=0)
comp = comp.reset_index(drop=True, level=-1)
comp
Get the timestamp
subtime = html.find('div', attrs={'class':'t_tit'}).find('span')
subtime = subtime.text
subtime
With the scraping done, we read the data back, tidy it up, and move on to visualization.
Read the China data
data_n = pd.read_excel('新浪疫情数据.xlsx', 'China', index_col=0)
data_n.fillna(0, inplace=True)
data_n['deathNum'] = data_n['deathNum'].astype('int64')
data_n.head()
Read the world data
data_w = pd.read_excel('新浪疫情数据.xlsx', 'World', index_col=0)
data_w.fillna(0, inplace=True)
data_w
Read the Guangdong data
data_gd = pd.read_excel('新浪疫情数据.xlsx', 'Guangdong', parse_dates = ['ymd'], index_col=0)
data_gd.fillna(0, inplace=True)
data_gd['gd_deathNum'] = data_gd['gd_deathNum'].astype('int64')
data_gd['gd_cureNum'] = data_gd['gd_cureNum'].astype('int64')
data_gd
Daily data for 2021
gd_2021 = data_gd.loc['2021']
gd_2021.index = gd_2021.index.strftime('%m-%d')
gd_2021
Daily data for 2020
gd_2020 = data_gd.loc['2020']
gd_2020.index = gd_2020.index.strftime('%m-%d')
gd_2020
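Selecting a whole year's rows relies on pandas' partial string indexing on a DatetimeIndex; in modern pandas this must go through `.loc` (plain `df['2021']` is treated as column selection and raises a KeyError). A minimal sketch on a toy frame:

```python
import pandas as pd

# Toy frame with a DatetimeIndex spanning the 2020/2021 boundary
idx = pd.to_datetime(['2020-12-30', '2020-12-31', '2021-01-01', '2021-01-02'])
df = pd.DataFrame({'gd_conNum': [100, 101, 103, 106]}, index=idx)

# Partial string indexing via .loc selects every row in that year
df_2020 = df.loc['2020']
df_2021 = df.loc['2021']
print(len(df_2020), len(df_2021))  # -> 2 2
```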
With the data tidied up, on to the visualizations!
Load pyecharts
# Set pyecharts' global display options for JupyterLab
from pyecharts.globals import CurrentConfig, NotebookType, SymbolType
CurrentConfig.NOTEBOOK_TYPE = NotebookType.JUPYTER_LAB
import pyecharts.options as opts  # and load the pyecharts options
from pyecharts.charts import *
from pyecharts.components import *
from pyecharts.faker import Faker
figsize = opts.InitOpts(bg_color='rgb(5, 46, 112, 0.5)')  # figure size and background color, e.g. rgb(225, 225, 225, 0.5)
# width='1200px', height='600px',
Bar().load_javascript();   # the JavaScript must be loaded once before plotting!!
Line().load_javascript();
Map().load_javascript();
Page().load_javascript();
Pie().load_javascript();
WordCloud().load_javascript();
Table().load_javascript();
HeatMap().load_javascript()
Test it
No problems!
# gauge
from pyecharts.charts import Gauge
Gauge().load_javascript()
Gauge().add("", [("完成率", 99)]).render_notebook()
China summary table
I couldn't figure out how to style the table colors here; if you know how, please share, thanks!
table = (
Table()
.add(headers=['中国疫情','数据'],
rows = [['累计确诊', data_w.iloc[0][1]],
['累计治愈', data_w.iloc[0][2]],
['累计死亡', data_w.iloc[0][3]],
['新增确诊', data_w.iloc[0][4]],
['新增治愈', data_w.iloc[0][5]],
['新增死亡', data_w.iloc[0][6]],
],
attributes = {"class": "fl-table"}
)
)
table.render_notebook()
# table.render('tables.html')
National data pie chart
outer_data_pair = [list(z) for z in zip(total.index.tolist(), total.values.tolist())]
inner_data_pair = [list(z) for z in zip(comp.index.tolist(), comp.values.tolist())]
# draw the pie chart
pie1 = Pie(figsize)
pie1.add(' ', outer_data_pair, radius=['60%','80%'])
pie1.add(' ', inner_data_pair, radius=[0,'40%'])
pie1.set_global_opts(title_opts=opts.TitleOpts(title='全国数据饼图', pos_right='1%', title_textstyle_opts=opts.TextStyleOpts(color='#FFFF99')),
legend_opts=opts.LegendOpts(orient='vertical', pos_top='10%', pos_right='1%', textstyle_opts=opts.TextStyleOpts(color='#FFFF99')))
pie1.set_series_opts(label_opts=opts.LabelOpts(position="outside",
formatter=" {per|{b}: {c}}",
background_color="#eee",
border_color="#aaa",
border_width=1,
border_radius=4,
rich={
"per": {
"color": "#eee",
"backgroundColor": "#334455",
"padding": [2, 4],
"borderRadius": 2,
},
},
))
pie1.set_colors(['#EF9050', '#3B7BA9', '#6FB27C', '#FFAF34', '#D8BFD8', '#00BFFF', '#FF2400'])
pie1.render_notebook()
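The `[list(z) for z in zip(...)]` idiom used to build `outer_data_pair` recurs in the map and word-cloud cells below; it simply pairs labels with values. A sketch with placeholder numbers:

```python
names = ['累计确诊', '累计治愈', '累计死亡']   # placeholder labels
values = [1000, 900, 10]                       # placeholder numbers

# zip pairs each label with its value; pyecharts expects [name, value] pairs
data_pair = [list(z) for z in zip(names, values)]
print(data_pair)  # -> [['累计确诊', 1000], ['累计治愈', 900], ['累计死亡', 10]]
```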
National data maps
Cumulative confirmed
map1 = Map(figsize)
map1.add('', [list(z) for z in zip(data_n.index.tolist(),data_n.value.tolist())],
maptype='china')
map1.set_global_opts(
title_opts = opts.TitleOpts(title='全国累计确诊', pos_right='1%', title_textstyle_opts=opts.TextStyleOpts(color='#FFFF99')),
visualmap_opts = opts.VisualMapOpts(max_=4000, textstyle_opts=opts.TextStyleOpts(color='#FFFF99'), pos_left='left'),
legend_opts=opts.LegendOpts(textstyle_opts=opts.TextStyleOpts(color='#FFFF99'))
)
map1.load_javascript();
map1.render('map1.html')
Active confirmed
map2 = Map(figsize)
map2.add('', [list(z) for z in zip(data_n.index.tolist(),data_n.econNum.tolist())],
maptype='china')
map2.set_global_opts(
title_opts = opts.TitleOpts(title='全国现存确诊', title_textstyle_opts=opts.TextStyleOpts(color='#FFFF99')),
visualmap_opts = opts.VisualMapOpts(max_=1000, is_piecewise=True,
textstyle_opts=opts.TextStyleOpts(color='#FFFF99'),
pos_left='right',
pieces=[
{'min':1001, 'label':'>1000', "color": "#E3170D"},
{'min':501, 'max':1000, 'label':'501~1000', "color": "#ff585e"},
{'min':101, 'max':500, 'label':'101~500', "color": "#FF9912"},
{'min':11, 'max':100, 'label':'11~100', "color": "#FFE384"},
{'min':0, 'max':10, 'label':'0~10', "color": "#FFFAF0"},
]),
legend_opts=opts.LegendOpts(textstyle_opts=opts.TextStyleOpts(color='#FFFF99'))
)
map2.load_javascript();
map2.render('map2.html')
Per-province heat map
heatmap = HeatMap(figsize)
value = [[i, j, int(data_n.iloc[i][j])] for i in range(34) for j in range(5)]
heatmap.add_xaxis(data_n.index.tolist())
heatmap.add_yaxis('', ['现存确诊', '累计确诊', '现存无症状', '累计死亡', '累计治愈'], value)
heatmap.set_global_opts(title_opts=opts.TitleOpts(title="各省数据HeatMap", pos_right='40%', title_textstyle_opts=opts.TextStyleOpts(color='#FFFF99')),
xaxis_opts=opts.AxisOpts(axisline_opts=opts.AxisLineOpts(linestyle_opts=opts.LineStyleOpts(color='#FFFF99'))),
yaxis_opts=opts.AxisOpts(axisline_opts=opts.AxisLineOpts(linestyle_opts=opts.LineStyleOpts(color='#FFFF99'))),
visualmap_opts=opts.VisualMapOpts(max_=2000, pos_right='right', textstyle_opts=opts.TextStyleOpts(color='#FFFF99')),
datazoom_opts=[opts.DataZoomOpts(), opts.DataZoomOpts(type_="inside")],
)
heatmap.render_notebook()
World data map
map3 = Map(figsize)
map3.add('', [list(z) for z in zip(data_w.index.tolist(), data_w.value.tolist())], is_map_symbol_show=False,
maptype='world')
map3.set_global_opts(
title_opts = opts.TitleOpts(title='世界累计确诊', pos_right='40%', title_textstyle_opts=opts.TextStyleOpts(color='#FFFF99')),
visualmap_opts = opts.VisualMapOpts(max_=5000000, textstyle_opts=opts.TextStyleOpts(color='#FFFF99'), pos_left='left'),
legend_opts=opts.LegendOpts(textstyle_opts=opts.TextStyleOpts(color='#FFFF99'))
)
map3.set_series_opts(label_opts=opts.LabelOpts(is_show=False))
map3.load_javascript();
map3.render('map3.html')
World summary table
d_w = data_w.iloc[:,1:]
d_w = d_w.sum()
d_w
table1 = (
Table(figsize)
.add(headers=['世界疫情','数据'],
rows = [['累计确诊', d_w.value],
['累计治愈', d_w.cureNum],
['累计死亡', d_w.deathNum],
['新增确诊', d_w.conadd],
['新增治愈', d_w.cureadd],
['新增死亡', d_w.deathadd],
],
)
)
table1.render_notebook()
Guangdong charts
2021 bar chart
bar1 = Bar(figsize)
bar1.add_xaxis(gd_2021.index.tolist())
bar1.add_yaxis('新增确诊', gd_2021.gd_conadd.tolist(), yaxis_index=1, color='#E3170D')
bar1.add_yaxis('现存确诊', gd_2021.gd_econNum.tolist(), yaxis_index=1, color='#F0FFFF')
bar1.add_yaxis('累计确诊', gd_2021.gd_conNum.tolist(), yaxis_index=0, color='#8A2BE2')
bar1.add_yaxis('累计治愈', gd_2021.gd_cureNum.tolist(), yaxis_index=0, color='#FFFF00')
bar1.add_yaxis('累计死亡', gd_2021.gd_deathNum.tolist(), yaxis_index=1, color='#00FF00')
bar1.set_series_opts(label_opts=opts.LabelOpts(is_show=False))
bar1.extend_axis(yaxis=opts.AxisOpts( axisline_opts=opts.AxisLineOpts(linestyle_opts=opts.LineStyleOpts(color='#FFFF99'))))
bar1.set_global_opts(title_opts=opts.TitleOpts(title='2021广东疫情数据', pos_left = 'left', padding=[1,5], title_textstyle_opts=opts.TextStyleOpts(color='#FFFF99')),
xaxis_opts=opts.AxisOpts(axisline_opts=opts.AxisLineOpts(linestyle_opts=opts.LineStyleOpts(color='#FFFF99'))),
yaxis_opts=opts.AxisOpts(axisline_opts=opts.AxisLineOpts(linestyle_opts=opts.LineStyleOpts(color='#FFFF99'))),
datazoom_opts=[opts.DataZoomOpts(), opts.DataZoomOpts(type_="inside")],
legend_opts=opts.LegendOpts(textstyle_opts=opts.TextStyleOpts(color='#FFFF99')),
tooltip_opts=opts.TooltipOpts(is_show=True, trigger="axis", textstyle_opts=opts.TextStyleOpts(color='#FFFF99')),
)
bar1.render_notebook()
2020 line chart
line1 = Line(figsize)
line1.add_xaxis(gd_2020.index.tolist())
line1.add_yaxis('新增确诊', gd_2020.gd_conadd.tolist(), yaxis_index=1, color='#E3170D', is_smooth=True, symbol="none")
line1.add_yaxis('累计确诊', gd_2020.gd_conNum.tolist(), yaxis_index=0, color='#FFFF00', is_smooth=True, symbol="none")
line1.add_yaxis('累计治愈', gd_2020.gd_cureNum.tolist(), yaxis_index=0, color='#00FF00', is_smooth=True, symbol="none")
line1.add_yaxis('累计死亡', gd_2020.gd_deathNum.tolist(), yaxis_index=1, color='#F0FFFF', is_smooth=True, symbol="none")
line1.add_yaxis('现存确诊', gd_2020.gd_econNum.tolist(), yaxis_index=1, color='#8A2BE2', is_smooth=True, symbol="none")
line1.set_series_opts(label_opts=opts.LabelOpts(is_show=False))
line1.extend_axis(yaxis=opts.AxisOpts( axisline_opts=opts.AxisLineOpts(linestyle_opts=opts.LineStyleOpts(color='#FFFF99'))))
line1.set_global_opts(title_opts=opts.TitleOpts(title='2020广东疫情数据', pos_left = 'right', padding=[1,5], title_textstyle_opts=opts.TextStyleOpts(color='#FFFF99')),
xaxis_opts=opts.AxisOpts(axisline_opts=opts.AxisLineOpts(linestyle_opts=opts.LineStyleOpts(color='#FFFF99'))),
yaxis_opts=opts.AxisOpts(axisline_opts=opts.AxisLineOpts(linestyle_opts=opts.LineStyleOpts(color='#FFFF99'))),
legend_opts=opts.LegendOpts(textstyle_opts=opts.TextStyleOpts(color='#FFFF99')),
tooltip_opts=opts.TooltipOpts(is_show=True, trigger="axis", textstyle_opts=opts.TextStyleOpts(color='#FFFF99')),
datazoom_opts=[opts.DataZoomOpts(), opts.DataZoomOpts(type_="inside")],
)
line1.render_notebook()
Word cloud
wc = (
WordCloud()
.add("", [list(z) for z in zip(list(data_w.country), list(data_w["value"]))],
word_gap=0)
)
wc.render('wc.html')
Build the titles
title = Pie().set_global_opts(title_opts=opts.TitleOpts(title="2021疫情数据大屏", title_textstyle_opts=opts.TextStyleOpts(font_size=40, color='#FFFF99'), pos_top=0))
title.render_notebook()
subtitle = Pie().set_global_opts(title_opts=opts.TitleOpts(subtitle=(subtime),
subtitle_textstyle_opts=opts.TextStyleOpts(font_size=15, color='#FFFF99'),
pos_top=0
)
)
subtitle.render_notebook()
Assemble the dashboard
page = Page(layout=Page.DraggablePageLayout, page_title='2021疫情数据大屏')
page.add(
table,
pie1,
map1,
map2,
heatmap,
map3,
table1,
bar1,
line1,
wc,
title,
subtitle
)
page.render()
Running this produces render.html; open it and you'll see a control widget in the top-left corner, and every chart can be dragged around. Arrange the layout to your liking, then click the widget to save a .json layout file.
Once you have the .json file, run the line below to get the finished dashboard.
Page.save_resize_html("render.html", cfg_file=r"chart_config.json", dest="my_new_charts.html")
The final result
If you already know each chart's exact position, you can use the following code directly and then re-run the line above once more.
from bs4 import BeautifulSoup
with open("render.html", "r+", encoding='utf-8') as html:
    html_bf = BeautifulSoup(html, 'lxml')
    divs = html_bf.select('.chart-container')
    divs[0]["style"] = "width:25px;height:350px;position:absolute;top:1197px;left:234px;"
    divs[1]["style"] = "width:925px;height:500px;position:absolute;top:696.6666870117188px;left:-95px;"
    divs[2]["style"] = "width:882px;height:596px;position:absolute;top:75.33333587646484px;left:800px;"
    divs[3]["style"] = "width:786px;height:596px;position:absolute;top:76px;left:6px;"
    divs[4]["style"] = "width:828px;height:500px;position:absolute;top:697.6666870117188px;left:852px;"
    divs[5]["style"] = "width:1646px;height:773px;position:absolute;top:1576.3333740234375px;left:21px;"
    divs[6]["style"] = "width:48px;height:351px;position:absolute;top:1197px;left:1244px;"
    divs[7]["style"] = "width:815px;height:500px;position:absolute;top:2369.666748046875px;left:20px;"
    divs[8]["style"] = "width:811px;height:500px;position:absolute;top:2369.33349609375px;left:856px;"
    divs[9]["style"] = "width:1565px;height:580px;position:absolute;top:1089px;left:46px;"
    divs[10]["style"] = "width:374px;height:76px;position:absolute;top:12.666666984558105px;left:639px;"
    divs[11]["style"] = "width:233px;height:75px;position:absolute;top:40.333335876464844px;left:989px;"
    body = html_bf.find("body")
    body["background"] = ""  # background color
    html_new = str(html_bf)
    html.seek(0, 0)
    html.truncate()
    html.write(html_new)
Some of the code in this article was adapted from other posts that I can no longer track down; my apologies to those authors. Please leave a link to your article in the comments; if anything infringes, contact me and I will remove it. Thanks again!
Everyone is welcome to discuss, share, and improve the code!
Original article; please credit the source when reposting: https://blog.csdn.net/qq_25834057/article/details/120687310



