Regression diagnostics were briefly introduced here; this post shows how to compute the quantities involved using Python.
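2. Computation
import statsmodels.api as sm
# Use the Boston housing data as an example
# (load_boston was removed in scikit-learn 1.2, so this needs an older version)
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

boston = load_boston()
X = boston['data']
y = boston['target']
# Add a column of ones (the intercept)
X = sm.add_constant(X)
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2021)
# Fit the linear regression model
ols = sm.OLS(y_train, X_train)
models = ols.fit()
# Compute the fitted values on the training set
y_predict = models.predict(X_train)
# Influence object that provides the diagnostics used below
outliers = models.get_influence()
2.1 Computing the residuals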
y_train - y_predict  # shape (n_samples,); same values as models.resid
2.2 Computing the studentized residuals
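resids1 = outliers.resid_studentized_external  # shape (n_samples,)
or
resids2 = outliers.resid_studentized_internal  # shape (n_samples,)
The two differ in how the error variance is estimated: the internal version uses all observations, while the external version re-estimates the variance with the i-th observation left out. On this data the two sets of values differ only slightly.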
2.3 Plotting the residuals
import matplotlib.pyplot as plt

plt.scatter(y_predict, resids1)
plt.xlabel('y_predict')
plt.ylabel('resid')
plt.yticks(range(-5, 6))
# Dashed reference lines at +2 and -2
plt.axhline(y=2, color='r', linestyle='--')
plt.axhline(y=-2, color='r', linestyle='--')
plt.show()
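2.4 Computing Cook's distance
# cooks_distance returns a tuple: (distances, p-values)
cook, cook_pvalues = outliers.cooks_distance  # each has shape (n_samples,)
2.5 Hat matrix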
h = outliers.hat_matrix_diag  # diagonal of the hat matrix (leverages), shape (n_samples,)
2.6 DFFITS values
dffits = outliers.dffits  # tuple: (DFFITS values, threshold)
dffits[0]  # DFFITS values, shape (n_samples,)
dffits[1]  # shape (), a single number: the suggested cutoff



