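When Mankiw's Macroeconomics introduces the CPI (consumer price index), it gives an example: at the time, Avatar ranked first at the box office with 761 million US dollars, but once inflation is taken into account it drops to 14th, and Gone with the Wind, released in 1939, takes first place.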
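Inspecting the response shows that the POST request returns JSON. To read the data inside more easily, format it with an online tool such as Be JSON (an online JSON validator and formatter). Once formatted, the JSON clearly contains the fields we need most: the movie title, the release date, and the cumulative box office. The formatted JSON is shown in the figure below: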
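At this point the data variable can be used as a dictionary, and the split function can be applied to the release date so that the field keeps only the release year. As the figure below shows, the title, release year, and box office of the first 10 entries are printed.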
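As a minimal sketch of that step, assuming the response keeps the `data` → `table0` layout that appears in the code section of this post. The two records are copied from that response, and `extract_rows` is a helper name invented for this illustration:

```python
# Two records copied from the response shown in the code section.
data = {"data": {"table0": [
    {"MovieName": "长津湖", "ReleaseTime": "2021-09-30", "BoxOffice": 5689581646},
    {"MovieName": "战狼2", "ReleaseTime": "2017-07-27", "BoxOffice": 5688740633},
]}}

def extract_rows(payload):
    """Return (title, release year, box office) tuples from the parsed JSON."""
    rows = []
    for each in payload["data"]["table0"]:
        year = each["ReleaseTime"].split("-")[0]  # "2021-09-30" -> "2021"
        rows.append((each["MovieName"], year, each["BoxOffice"]))
    return rows

rows = extract_rows(data)
for title, year, box_office in rows:
    print(title, year, box_office)
```

Note that the year stays a string after `split`; convert it with `int()` before doing arithmetic on it.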
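That is not a problem, though. Looking closely at the POST that returns this JSON, the secret turns out to be "top:50": the request only returns the top 50 ranks to the front end. Also note "type:1", which distinguishes between all, domestic, and imported films. Readers can try it: when imported is selected, type becomes 2.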
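We are in luck: the simplest possible POST gets the response directly. As the figure below shows, we have obtained the top 500 entries and escaped the limit of the front-end page. This situation comes up often when writing crawlers: the server sends data to the front end, and the front end displays only part of it. Renrendai's loan data is one example. The page does not show the borrower's reason for the loan, but F12 reveals that the server response does include that field; the front end simply does not display it.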
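I originally planned to use the CPI for the adjustment, but after looking at the CPI data I found that it is remarkably stable. I was very surprised: it certainly feels as if prices have kept shooting up since I was a child, so why is the CPI so stable? If you are curious, search for the reasons; they are quite interesting.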
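According to this article, 100 yuan in the 1990s is roughly equal to 1000 yuan in 2020; 100 yuan in the 2000s is equivalent to 300-400 yuan in 2020, for which we take 350 yuan; and 100 yuan in the 2010s is roughly equal to 157 yuan in 2020.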
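To make the arithmetic concrete, here is a small sketch that turns those three rates into a conversion. `YUAN_2020_PER_100` and `to_2020_yuan` are names invented for this illustration, and applying a single rate to a whole decade is the same simplification the quoted figures make:

```python
# Value in 2020 yuan of 100 yuan from each decade, as quoted above.
YUAN_2020_PER_100 = {1990: 1000, 2000: 350, 2010: 157}

def to_2020_yuan(amount, year):
    """Convert an amount from `year` into 2020 yuan using the decade rates."""
    decade = year // 10 * 10  # e.g. 1995 -> 1990
    return amount * YUAN_2020_PER_100[decade] / 100

print(to_2020_yuan(100, 1995))  # 1000.0
print(to_2020_yuan(100, 2005))  # 350.0
print(to_2020_yuan(100, 2015))  # 157.0
```

Years outside 1990-2019 raise a `KeyError`, because no rate is quoted for them.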
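So the approach is now clear: add the price level of the corresponding year to the Df from Section 1, then adjust the box office figures. The code and results are shown in the figure below, where the price level has been matched to each release year.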
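The code section does this with pandas. As a dependency-free sketch of the same idea, the snippet below fits a straight line through the four price-level points and rescales a box office figure. The function names are invented for this illustration, and the 2021 base year is an assumption read off the constant used in the code section:

```python
# Price level at 1990, 2000, 2010, 2020, scaled so that 1990 = 1 and 2020 = 10.
xs = [0, 10, 20, 30]  # years since 1990
ys = [1000 / 1000, 1000 / 350, 1000 / 157, 1000 / 100]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

a, b = fit_line(xs, ys)

def price_level(year):
    """Fitted price level for a given calendar year."""
    return a + b * (year - 1990)

def adjust(box_office, year, base_year=2021):
    """Express a box office figure from `year` in `base_year` money."""
    return box_office * price_level(base_year) / price_level(year)

print(round(price_level(2021), 6))
print(round(adjust(5688740633, 2017)))  # 战狼2, released in 2017
```

Because the fitted price level rises over time, older films are scaled up more than recent ones, which is what reshuffles the ranking.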
4. Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from pylab import *
import requests
plt.rcParams['axes.unicode_minus']=False  # make minus signs render correctly
mpl.rcParams['font.sans-serif'] = ['SimHei']  # font that can display Chinese labels
# Ordinary least squares (OLS)
def standRegres(xArr,yArr):
    xMat = np.asarray(xArr, dtype=float)
    yMat = np.asarray(yArr, dtype=float)
    xTx = xMat.T @ xMat
    if np.linalg.det(xTx) == 0.0:
        print("This matrix is singular, cannot do inverse")
        return
    # Normal equations: ws = (X^T X)^-1 X^T y
    ws = np.linalg.inv(xTx) @ (xMat.T @ yMat)
    return ws
data = {"status":1,"des":"成功","userstatus":0,"version":0,"data":{"table0":[{"MovieName":"长津湖","AvgAudienceCount":23,"ReleaseTime":"2021-09-30","AvgBoxOffice":47,"BoxOffice":5689581646,"Irank":1,"EnMovieID":703496},{"MovieName":"战狼2","AvgAudienceCount":37,"ReleaseTime":"2017-07-27","AvgBoxOffice":36,"BoxOffice":5688740633,"Irank":2,"EnMovieID":641515},{"MovieName":"你好,李焕英","AvgAudienceCount":24,"ReleaseTime":"2021-02-12","AvgBoxOffice":45,"BoxOffice":5413303171,"Irank":3,"EnMovieID":662746},{"MovieName":"哪吒之魔童降世","AvgAudienceCount":23,"ReleaseTime":"2019-07-26","AvgBoxOffice":36,"BoxOffice":5035020595,"Irank":4,"EnMovieID":662685},{"MovieName":"流浪地球","AvgAudienceCount":29,"ReleaseTime":"2019-02-05","AvgBoxOffice":45,"BoxOffice":4686808164,"Irank":5,"EnMovieID":642412},{"MovieName":"唐人街探案3","AvgAudienceCount":29,"ReleaseTime":"2021-02-12","AvgBoxOffice":48,"BoxOffice":4522345605,"Irank":6,"EnMovieID":676314},{"MovieName":"复仇者联盟4:终局之战","AvgAudienceCount":23,"ReleaseTime":"2019-04-24","AvgBoxOffice":49,"BoxOffice":4250383910,"Irank":7,"EnMovieID":670808},{"MovieName":"红海行动","AvgAudienceCount":33,"ReleaseTime":"2018-02-16","AvgBoxOffice":39,"BoxOffice":3651886398,"Irank":8,"EnMovieID":655823},{"MovieName":"唐人街探案2","AvgAudienceCount":39,"ReleaseTime":"2018-02-16","AvgBoxOffice":39,"BoxOffice":3397688097,"Irank":9,"EnMovieID":663419},{"MovieName":"美人鱼","AvgAudienceCount":43,"ReleaseTime":"2016-02-08","AvgBoxOffice":37,"BoxOffice":3397175023,"Irank":10,"EnMovieID":626153},{"MovieName":"我和我的祖国","AvgAudienceCount":35,"ReleaseTime":"2019-09-30","AvgBoxOffice":38,"BoxOffice":3176119334,"Irank":11,"EnMovieID":691481},{"MovieName":"八佰","AvgAudienceCount":20,"ReleaseTime":"2020-08-21","AvgBoxOffice":38,"BoxOffice":3102323734,"Irank":12,"EnMovieID":669412},{"MovieName":"我不是药神","AvgAudienceCount":27,"ReleaseTime":"2018-07-05","AvgBoxOffice":35,"BoxOffice":3099961063,"Irank":13,"EnMovieID":676313},{"MovieName":"中国机长","AvgAudienceCount":26,"ReleaseTime":"2019-09-30","AvgBoxOffice":37,
"BoxOffice":2913117677,"Irank":14,"EnMovieID":681319},{"MovieName":"我和我的家乡","AvgAudienceCount":19,"ReleaseTime":"2020-10-01","AvgBoxOffice":39,"BoxOffice":2828832552,"Irank":15,"EnMovieID":701620},{"MovieName":"速度与激情8","AvgAudienceCount":30,"ReleaseTime":"2017-04-14","AvgBoxOffice":37,"BoxOffice":2670959285,"Irank":16,"EnMovieID":659757},{"MovieName":"西虹市首富","AvgAudienceCount":28,"ReleaseTime":"2018-07-27","AvgBoxOffice":35,"BoxOffice":2547571742,"Irank":17,"EnMovieID":671983},{"MovieName":"捉妖记","AvgAudienceCount":41,"ReleaseTime":"2015-07-16","AvgBoxOffice":37,"BoxOffice":2441462276,"Irank":18,"EnMovieID":627896},{"MovieName":"速度与激情7","AvgAudienceCount":42,"ReleaseTime":"2015-04-12","AvgBoxOffice":39,"BoxOffice":2426586547,"Irank":19,"EnMovieID":629625},{"MovieName":"复仇者联盟3:无限战争","AvgAudienceCount":19,"ReleaseTime":"2018-05-11","AvgBoxOffice":38,"BoxOffice":2390537273,"Irank":20,"EnMovieID":675789},{"MovieName":"捉妖记2","AvgAudienceCount":44,"ReleaseTime":"2018-02-16","AvgBoxOffice":38,"BoxOffice":2237154621,"Irank":21,"EnMovieID":656875},{"MovieName":"疯狂的外星人","AvgAudienceCount":30,"ReleaseTime":"2019-02-05","AvgBoxOffice":42,"BoxOffice":2214254201,"Irank":22,"EnMovieID":638300},{"MovieName":"羞羞的铁拳","AvgAudienceCount":25,"ReleaseTime":"2017-09-30","AvgBoxOffice":33,"BoxOffice":2201748735,"Irank":23,"EnMovieID":661004},{"MovieName":"海王","AvgAudienceCount":18,"ReleaseTime":"2018-12-07","AvgBoxOffice":36,"BoxOffice":2013198359,"Irank":24,"EnMovieID":665526},{"MovieName":"变形金刚4:绝迹重生","AvgAudienceCount":50,"ReleaseTime":"2014-06-27","AvgBoxOffice":42,"BoxOffice":1977522388,"Irank":25,"EnMovieID":612232},{"MovieName":"前任3:再见前任","AvgAudienceCount":29,"ReleaseTime":"2017-12-29","AvgBoxOffice":35,"BoxOffice":1941740154,"Irank":26,"EnMovieID":663359},{"MovieName":"毒液:致命守护者","AvgAudienceCount":17,"ReleaseTime":"2018-11-09","AvgBoxOffice":36,"BoxOffice":1870680440,"Irank":27,"EnMovieID":662209},{"MovieName":"功夫瑜伽","AvgAudienceCount":33,"ReleaseTime":"2017-01-28","AvgBoxOffice":3
8,"BoxOffice":1752603744,"Irank":28,"EnMovieID":629898},{"MovieName":"飞驰人生","AvgAudienceCount":25,"ReleaseTime":"2019-02-05","AvgBoxOffice":42,"BoxOffice":1729373180,"Irank":29,"EnMovieID":676018},{"MovieName":"烈火英雄","AvgAudienceCount":19,"ReleaseTime":"2019-08-01","AvgBoxOffice":36,"BoxOffice":1707188998,"Irank":30,"EnMovieID":692321},{"MovieName":"侏罗纪世界2","AvgAudienceCount":19,"ReleaseTime":"2018-06-15","AvgBoxOffice":36,"BoxOffice":1695881571,"Irank":31,"EnMovieID":667168},{"MovieName":"寻龙诀","AvgAudienceCount":40,"ReleaseTime":"2015-12-18","AvgBoxOffice":36,"BoxOffice":1682742863,"Irank":32,"EnMovieID":614981},{"MovieName":"西游伏妖篇","AvgAudienceCount":36,"ReleaseTime":"2017-01-28","AvgBoxOffice":39,"BoxOffice":1655926405,"Irank":33,"EnMovieID":619719},{"MovieName":"港囧","AvgAudienceCount":40,"ReleaseTime":"2015-09-25","AvgBoxOffice":33,"BoxOffice":1614103585,"Irank":34,"EnMovieID":618038},{"MovieName":"姜子牙","AvgAudienceCount":19,"ReleaseTime":"2020-10-01","AvgBoxOffice":40,"BoxOffice":1602983421,"Irank":35,"EnMovieID":682630},{"MovieName":"少年的你","AvgAudienceCount":16,"ReleaseTime":"2019-10-25","AvgBoxOffice":36,"BoxOffice":1559025893,"Irank":36,"EnMovieID":680681},{"MovieName":"变形金刚5:最后的骑士","AvgAudienceCount":23,"ReleaseTime":"2017-06-23","AvgBoxOffice":37,"BoxOffice":1551242789,"Irank":37,"EnMovieID":656946},{"MovieName":"疯狂动物城","AvgAudienceCount":28,"ReleaseTime":"2016-03-04","AvgBoxOffice":34,"BoxOffice":1534528494,"Irank":38,"EnMovieID":643235},{"MovieName":"我和我的父辈","AvgAudienceCount":16,"ReleaseTime":"2021-09-30","AvgBoxOffice":43,"BoxOffice":1474411166,"Irank":39,"EnMovieID":706356},{"MovieName":"魔兽","AvgAudienceCount":25,"ReleaseTime":"2016-06-08","AvgBoxOffice":37,"BoxOffice":1472297906,"Irank":40,"EnMovieID":402117},{"MovieName":"复仇者联盟2:奥创纪元","AvgAudienceCount":29,"ReleaseTime":"2015-05-12","AvgBoxOffice":40,"BoxOffice":1464392888,"Irank":41,"EnMovieID":631792},{"MovieName":"夏洛特烦恼","AvgAudienceCount":33,"ReleaseTime":"2015-09-30","AvgBoxOffice":32,"BoxOffic
e":1447823756,"Irank":42,"EnMovieID":628183},{"MovieName":"速度与激情:特别行动","AvgAudienceCount":15,"ReleaseTime":"2019-08-23","AvgBoxOffice":36,"BoxOffice":1434299899,"Irank":43,"EnMovieID":682202},{"MovieName":"送你一朵小红花","AvgAudienceCount":12,"ReleaseTime":"2020-12-31","AvgBoxOffice":37,"BoxOffice":1432524430,"Irank":44,"EnMovieID":701874},{"MovieName":"芳华","AvgAudienceCount":25,"ReleaseTime":"2017-12-15","AvgBoxOffice":34,"BoxOffice":1422584326,"Irank":45,"EnMovieID":659453},{"MovieName":"侏罗纪世界","AvgAudienceCount":33,"ReleaseTime":"2015-06-10","AvgBoxOffice":38,"BoxOffice":1420732578,"Irank":46,"EnMovieID":348959},{"MovieName":"蜘蛛侠:英雄远征","AvgAudienceCount":17,"ReleaseTime":"2019-06-28","AvgBoxOffice":36,"BoxOffice":1417682748,"Irank":47,"EnMovieID":682139},{"MovieName":"头号玩家","AvgAudienceCount":18,"ReleaseTime":"2018-03-30","AvgBoxOffice":36,"BoxOffice":1396660613,"Irank":48,"EnMovieID":657862},{"MovieName":"速度与激情9","AvgAudienceCount":13,"ReleaseTime":"2021-05-21","AvgBoxOffice":39,"BoxOffice":1392333894,"Irank":49,"EnMovieID":682199},{"MovieName":"后来的我们","AvgAudienceCount":21,"ReleaseTime":"2018-04-28","AvgBoxOffice":34,"BoxOffice":1361525311,"Irank":50,"EnMovieID":663327}]}}
url = "https://ys.endata.cn/enlib-api/api/home/getrank_mainland.do"
# top: how many ranks to return; type: which list to query (1 or 2, as seen in the POST body)
myData = {'top':'500','type':'1'}
res = requests.post(url,data=myData)
res = res.json()
print(len(res["data"]["table0"]))
# Build the DataFrame in one pass, keeping only the year of the release date
dataDf = pd.DataFrame([{"MovieName":each["MovieName"],
                        "ReleaseTime":each["ReleaseTime"].split("-")[0],
                        "BoxOffice":each["BoxOffice"]}
                       for each in res["data"]["table0"]])
# kind='barh': horizontal bar chart
# figsize: figure size
# legend=False: hide the legend
# color: bar colors
# fontsize: font size of the tick labels
# [::-1]: reverse the order
dataDf[:20][::-1].plot("MovieName","BoxOffice",kind='barh',figsize=(16,8),fontsize=15,legend=False,
color=["grey","gold","darkviolet","turquoise","r","g","b","c",
"k","darkorange","lightgreen","plum", "tan","khaki", "pink", "skyblue","lawngreen","salmon"])
plt.savefig("before",dpi=500)
print(dataDf["ReleaseTime"].value_counts())
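# Price level at 1990, 2000, 2010 and 2020, scaled so that 1990 = 1 and 2020 = 10
priceLevel = np.array([1/1000,1/350,1/157,1/100])*1000
plt.scatter([1990,2000,2010,2020],priceLevel)
# Fit priceLevel = ws[0] + ws[1]*x, where x is the number of years since 1990
ws = standRegres([[1,0],[1,10],[1,20],[1,30]],priceLevel)
plt.scatter([0,10,20,30],priceLevel,color="r")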
x = np.linspace(0,30)
y = float(ws[0])+float(ws[1])*x
plt.xticks([0,10,20,30],[1990,2000,2010,2020])
plt.plot(x,y)
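# Fitted price level for each year from 2000 (x=10) to 2021 (x=31)
priceLevelNew = []
for each in range(10,32):
    priceLevelNew.append(float(ws[0])+float(ws[1])*each)
# Look up the price level by release year (the list only covers 2000-2021)
for eachIndex in dataDf.index:
    dataDf.loc[eachIndex,"priceLevel"] = priceLevelNew[int(dataDf.iloc[eachIndex]["ReleaseTime"])-2000]
# 9.938608 is the fitted price level for 2021, so newBoxOffice is expressed in 2021 money
dataDf["newBoxOffice"] = dataDf["BoxOffice"]*9.938608/dataDf["priceLevel"]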
dataDf = dataDf.sort_values(by="newBoxOffice",ascending=False)
dataDf
dataDf[:20][::-1].plot("MovieName","newBoxOffice",kind='barh',figsize=(16,8),fontsize=15,legend=False,
color=["grey","gold","darkviolet","turquoise","r","g","b","c",
"k","darkorange","lightgreen","plum", "tan","khaki", "pink", "skyblue","lawngreen","salmon"])
plt.savefig("after",dpi=500)