| Year | R&D expenditure (100 million yuan) | GDP (100 million yuan) | Energy consumption (10,000 tons of standard coal) |
|---|---|---|---|
| 2013 | 11906 | 568845 | 379732 |
| 2012 | 10298.4 | 519470.1 | 361732 |
| 2011 | 8687 | 473104.1 | 348001.66 |
| 2010 | 7062.6 | 401512.8 | 324939.15 |
| 2009 | 5802.1 | 340902.8 | 306647.15 |
| 2008 | 4616 | 314045.4 | 291448.29 |
| 2007 | 3710.2 | 265810.3 | 280507.94 |
| 2006 | 3003.1 | 216314.4 | 258676.3 |
| 2005 | 2450 | 184937.4 | 235996.65 |
| 2004 | 1966.3 | 159878.3 | 213455.99 |
| 2003 | 1539.6 | 135822.8 | 183791.82 |
| 2002 | 1287.6 | 120332.7 | 159430.99 |
| 2001 | 1042.5 | 109655.2 | 150405.8 |
| 2000 | 895.7 | 99214.55 | 145530.86 |
| 1999 | 678.9 | 89677.05 | 140569 |
| 1998 | 551.1 | 84402.28 | 136184 |
| 1997 | 509.2 | 78973.03 | 135909 |
| 1996 | 404.5 | 71176.59 | 135192 |
| 1995 | 348.7 | 60793.73 | 131176 |
| 1994 | 265.09 | 48197.86 | 122737 |
| 1993 | 240.27 | 35333.92 | 115993 |
| 1992 | 297.92 | 26923.48 | 109170 |
| 1991 | 163.36 | 21781.5 | 103783 |
| 1990 | 125 | 18667.82 | 98703 |
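2. Data preprocessing
2.1 Imports
First save the data above as 2.2.xlsx, keeping the original Chinese column headers that the rename step below expects.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.formula.api import ols
2.2 Read the data and normalize the column names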
data = pd.read_excel('2.2.xlsx')
data=data.rename(columns={'年份':'Year','R&D支出(亿元)':'RD','GDP(亿元) ':'GDP','能源消耗(万吨标准煤)':'energyConsume'})
print(data)
Data after processing:
    Year        RD        GDP  energyConsume
0   2013  11906.00  568845.00      379732.00
1   2012  10298.40  519470.10      361732.00
2   2011   8687.00  473104.05      348001.66
3   2010   7062.60  401512.80      324939.15
4   2009   5802.10  340902.81      306647.15
5   2008   4616.00  314045.43      291448.29
6   2007   3710.20  265810.31      280507.94
7   2006   3003.10  216314.43      258676.30
8   2005   2450.00  184937.37      235996.65
9   2004   1966.30  159878.34      213455.99
10  2003   1539.60  135822.76      183791.82
11  2002   1287.60  120332.69      159430.99
12  2001   1042.50  109655.17      150405.80
13  2000    895.70   99214.55      145530.86
14  1999    678.90   89677.05      140569.00
15  1998    551.10   84402.28      136184.00
16  1997    509.20   78973.03      135909.00
17  1996    404.50   71176.59      135192.00
18  1995    348.70   60793.73      131176.00
19  1994    265.09   48197.86      122737.00
20  1993    240.27   35333.92      115993.00
21  1992    297.92   26923.48      109170.00
22  1991    163.36   21781.50      103783.00
23  1990    125.00   18667.82       98703.00
3. Descriptive statistics
describe = data.describe()
print(describe)
Result:
              Year            RD            GDP  energyConsume
count    24.000000     24.000000      24.000000      24.000000
mean   2001.500000   2827.130833  185240.544583  202904.691667
std       7.071068   3466.864991  166428.586244   92240.219636
min    1990.000000    125.000000   18667.820000   98703.000000
25%    1995.750000    390.550000   68580.875000  134188.000000
50%    2001.500000   1165.050000  114993.930000  154918.395000
75%    2007.250000   3936.650000  277869.090000  283243.027500
max    2013.000000  11906.000000  568845.000000  379732.000000
4. Assessing the relationships between the independent and dependent variables
# Figure 1: R&D expenditure vs. GDP
plt.figure(1)
plt.scatter(data['RD'], data['GDP'])
plt.xlabel('$RD$')
plt.ylabel('$GDP$')
plt.title('RD-GDP')
# Figure 2: GDP vs. energy consumption
plt.figure(2)
plt.scatter(data['GDP'], data['energyConsume'])
plt.xlabel('$GDP$')
plt.ylabel('$energyConsume$')
plt.title('GDP-energyConsume')
# Figure 3: energy consumption vs. R&D expenditure
plt.figure(3)
plt.scatter(data['energyConsume'], data['RD'])
plt.xlabel('$energyConsume$')
plt.ylabel('$RD$')
plt.title('energyConsume-RD')
plt.ioff()
plt.show()
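The scatter plots show that for some years after 1990 the values change slowly, so the points cluster densely at the low end of each plot. In that period the economy was still relatively small, and R&D expenditure, GDP, and energy consumption all grew slowly.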
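To put rough numbers on the slow early growth, here is a minimal sketch that compares year-over-year GDP increases at the start and at the end of the sample. It uses only six GDP values copied from the table above; the variable names (`gdp`, `increase`, `early`, `late`) are illustrative and not part of the original analysis.

```python
import pandas as pd

# GDP (100 million yuan) for selected years, copied from the table above.
gdp = pd.Series(
    {1990: 18667.82, 1991: 21781.5, 1992: 26923.48,
     2011: 473104.1, 2012: 519470.1, 2013: 568845.0}
)

# Difference from the previous listed value. The entry for 2011 spans the
# gap back to 1992 and is therefore not a yearly change; it is not used.
increase = gdp.diff()

early = increase.loc[[1991, 1992]]  # yearly increases at the start
late = increase.loc[[2012, 2013]]   # yearly increases at the end

print(early.round(2).to_dict())
print(late.round(2).to_dict())
```

The early yearly increases are an order of magnitude smaller than the late ones, which is why the early points sit so close together in the scatter plots.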
print(data[['RD','GDP','energyConsume']].corr())
Result:
                     RD       GDP  energyConsume
RD             1.000000  0.989405       0.948299
GDP            0.989405  1.000000       0.979595
energyConsume  0.948299  0.979595       1.000000
The matrix shows a strong positive correlation between every pair of variables (all coefficients are above 0.94).
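For readers who want to see what `corr()` computes, the following sketch evaluates the Pearson coefficient by hand and compares it with pandas. It assumes a five-row subset of the table (2009 to 2013) purely for brevity, so its value will differ from the full-sample 0.989405; the helper name `pearson` is hypothetical.

```python
import numpy as np
import pandas as pd

# Rows for 2009-2013, copied from the table above (a subset for brevity).
sample = pd.DataFrame({
    "RD": [5802.1, 7062.6, 8687.0, 10298.4, 11906.0],
    "GDP": [340902.8, 401512.8, 473104.1, 519470.1, 568845.0],
})


def pearson(x, y):
    """Pearson r: co-variation of x and y scaled by their individual spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


manual = pearson(sample["RD"], sample["GDP"])
from_pandas = sample["RD"].corr(sample["GDP"])  # Pearson is the default method

print(manual, from_pandas)
```

Both numbers should agree to floating-point precision, since `Series.corr` uses the Pearson method by default.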
6. Linear regression with two explanatory variables
lm = ols('GDP ~ RD + energyConsume', data=data).fit()
print(lm.summary())
Result:
                            OLS Regression Results
==============================================================================
Dep. Variable:                    GDP   R-squared:                       0.996
Model:                            OLS   Adj. R-squared:                  0.995
Method:                 Least Squares   F-statistic:                     2545.
Date:                Sun, 03 Oct 2021   Prob (F-statistic):           8.80e-26
Time:                        15:21:09   Log-Likelihood:                -256.14
No. Observations:                  24   AIC:                             518.3
Df Residuals:                      21   BIC:                             521.8
Df Model:                           2
Covariance Type:            nonrobust
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept     -4.648e+04   1.09e+04     -4.274      0.000   -6.91e+04   -2.39e+04
RD               28.8123      2.116     13.618      0.000      24.412      33.212
energyConsume     0.7405      0.080      9.312      0.000       0.575       0.906
==============================================================================
Omnibus:                       10.013   Durbin-Watson:                   0.580
Prob(Omnibus):                  0.007   Jarque-Bera (JB):                2.182
Skew:                          -0.057   Prob(JB):                        0.336
Kurtosis:                       1.527   Cond. No.                     1.06e+06
==============================================================================
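Reference table
Testing showed that taking GDP as the dependent variable and RD and energyConsume as the independent variables describes the relationships in the data better than the alternatives.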
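As a cross-check on the fit reported in the summary, the same model can be solved as a plain least-squares problem in NumPy. This is only a sketch: the arrays are typed in from the table at the top of the post (which rounds some values), and `np.linalg.lstsq` stands in for statsmodels here, so the coefficients should land close to the summary values rather than match them digit for digit.

```python
import numpy as np

# Columns copied from the table above, ordered from 2013 down to 1990.
rd = np.array([
    11906, 10298.4, 8687, 7062.6, 5802.1, 4616, 3710.2, 3003.1,
    2450, 1966.3, 1539.6, 1287.6, 1042.5, 895.7, 678.9, 551.1,
    509.2, 404.5, 348.7, 265.09, 240.27, 297.92, 163.36, 125,
])
gdp = np.array([
    568845, 519470.1, 473104.1, 401512.8, 340902.8, 314045.4,
    265810.3, 216314.4, 184937.4, 159878.3, 135822.8, 120332.7,
    109655.2, 99214.55, 89677.05, 84402.28, 78973.03, 71176.59,
    60793.73, 48197.86, 35333.92, 26923.48, 21781.5, 18667.82,
])
energy = np.array([
    379732, 361732, 348001.66, 324939.15, 306647.15, 291448.29,
    280507.94, 258676.3, 235996.65, 213455.99, 183791.82, 159430.99,
    150405.8, 145530.86, 140569, 136184, 135909, 135192,
    131176, 122737, 115993, 109170, 103783, 98703,
])

# Design matrix: a column of ones for the intercept, then the two regressors.
X = np.column_stack([np.ones_like(rd), rd, energy])

# Least-squares solution of X @ beta ~ gdp.
beta, *_ = np.linalg.lstsq(X, gdp, rcond=None)
intercept, b_rd, b_energy = beta

# R-squared: share of the variation in GDP captured by the fitted values.
residuals = gdp - X @ beta
r_squared = 1 - (residuals ** 2).sum() / ((gdp - gdp.mean()) ** 2).sum()

print(f"Intercept={intercept:.1f}  RD={b_rd:.4f}  energyConsume={b_energy:.4f}")
print(f"R-squared={r_squared:.3f}")
```

Writing the model as a design matrix makes the role of the intercept explicit: it is simply the coefficient on the column of ones.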
Model test:
First, R-squared = 0.996 means the model explains about 99.6% of the variation in GDP, so the fit is very good.
Second, Prob (F-statistic) = 8.80e-26. Taking $\alpha = 0.05$, we have $8.80 \times 10^{-26} < 0.05$, so we reject the null hypothesis that all slope coefficients are zero; that is, the model as a whole is significant.
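The model-test figures can be roughly reproduced from the rounded values in the summary. The sketch below is a back-of-the-envelope check, not the computation statsmodels performs: it uses the standard formulas for adjusted R-squared and for the F-statistic implied by R-squared, and for the p-value it relies on a closed form for the upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2 (as with Df Model = 2 here). That shortcut is an assumption added for this example.

```python
n = 24                 # No. Observations in the summary
k = 2                  # Df Model: regressors RD and energyConsume
df_resid = n - k - 1   # Df Residuals: 21

r_squared = 0.996      # rounded value from the summary
f_reported = 2545.0    # F-statistic from the summary

# Adjusted R-squared penalizes R-squared for the number of regressors.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_resid

# F-statistic implied by R-squared. Because 0.996 is rounded, this only
# approximates the reported 2545.
f_from_r2 = (r_squared / k) / ((1 - r_squared) / df_resid)

# Upper-tail probability P(F > f) for an F(2, df_resid) variable.
# Closed form valid only for numerator degrees of freedom equal to 2.
p_value = (1 + 2 * f_reported / df_resid) ** (-df_resid / 2)

alpha = 0.05
print(f"adjusted R^2 ~ {adj_r_squared:.4f}, F implied by R^2 ~ {f_from_r2:.0f}")
print(f"p = {p_value:.2e}, reject H0 at alpha={alpha}: {p_value < alpha}")
```

The p-value computed this way comes out on the order of 1e-25, in line with the Prob (F-statistic) entry, while the F value implied by the rounded R-squared is only in the same ballpark as 2545.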
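Coefficient tests:
The p-values for Intercept, RD, and energyConsume are all displayed as 0.000 (rounded to three decimals), well below 0.05, so all three coefficients are statistically significant.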
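The t column and the confidence intervals in the summary follow directly from the coef and std err columns, which the sketch below recomputes. The inputs are the rounded values printed in the summary, so results match only approximately; the critical value 2.080 (two-sided 5% level, 21 degrees of freedom) is taken from a standard t table and hard-coded as an assumption to keep the example free of extra dependencies.

```python
# (coef, std err) pairs copied from the summary; both are rounded there.
estimates = {
    "Intercept": (-4.648e4, 1.09e4),
    "RD": (28.8123, 2.116),
    "energyConsume": (0.7405, 0.080),
}

# Two-sided 5% critical value of Student's t with 21 degrees of freedom
# (Df Residuals), from a standard t table.
T_CRIT_21 = 2.080

results = {}
for name, (coef, se) in estimates.items():
    t_stat = coef / se  # test statistic for H0: coefficient = 0
    ci = (coef - T_CRIT_21 * se, coef + T_CRIT_21 * se)  # 95% interval
    results[name] = {
        "t": t_stat,
        "ci": ci,
        "significant": abs(t_stat) > T_CRIT_21,
    }
    print(f"{name:14s} t={t_stat:8.3f}  95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

A coefficient is significant at the 5% level exactly when its interval excludes zero, which is the case for all three rows.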



