Machine Learning Algorithm Performance Metrics (Regression Problems)

Dataset: housing.csv (regression problem)
Reference book: Machine Learning Mastery With Python: Understand Your Data, Create Accurate Models and Work Projects End-to-End
Download link: https://github.com/aoyinke/ML_learner

Overview

This article covers three performance metrics for regression problems: the concept behind each one, points to watch out for, and how to compute it with sklearn.

Regression metrics covered: Mean Absolute Error, Mean Squared Error, and R^2.

Note that the error scores reported below are negated. This is a quirk of the cross_val_score() function: scikit-learn's scoring convention is that a larger value is always better, so error metrics such as MAE and MSE are returned as negative numbers (hence the neg_ prefix in the scoring names).
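To read negated scores as ordinary errors, flip the sign. The sketch below uses made-up per-fold values (not results from housing.csv) in the shape that cross_val_score() returns for a neg_ scoring name. The variable names are illustrative only.

```python
# Hypothetical per-fold scores, shaped like the output of cross_val_score
# with scoring='neg_mean_squared_error' (made-up numbers for illustration).
neg_mse_scores = [-16.0, -25.0, -36.0]

# Flip the sign to recover the ordinary, non-negative MSE of each fold.
mse_scores = [-s for s in neg_mse_scores]

# Average across folds, as the article's print statements do with .mean().
mean_mse = sum(mse_scores) / len(mse_scores)

print("MSE per fold:", mse_scores)
print("Mean MSE: %.3f" % mean_mse)
```

With negated scores, a value closer to zero means a smaller error.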

Data preparation

from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
filename = 'housing.csv'
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
# sep=r'\s+' splits on whitespace; delim_whitespace=True is deprecated in recent pandas
dataframe = read_csv(filename, sep=r'\s+', names=names)
array = dataframe.values
X = array[:,0:13]
Y = array[:,13]
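Mean Absolute Error (MAE)

Concept:

Absolute error: the absolute value of the difference between a predicted value and the true value. MAE is the mean of these absolute errors:

MAE = (1/n) Σ |xi – x|

n = the number of errors,
Σ = summation symbol (which means "add them all up"),
|xi – x| = the absolute errors.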
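As a minimal sketch of the definition above, MAE can be computed by hand in plain Python. The helper name mae_by_hand and the toy values are assumptions made for illustration; they are not part of sklearn or of the housing dataset.

```python
def mae_by_hand(y_true, y_pred):
    """Mean of the absolute differences between true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values (made up): absolute errors are 1, 0, 2 and 1.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mae = mae_by_hand(y_true, y_pred)
print("MAE: %.3f" % mae)  # (1 + 0 + 2 + 1) / 4 = 1.000
```

Because every error contributes in proportion to its size, MAE is expressed in the same units as the target variable.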
Code:

from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
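# random_state only has an effect when shuffle=True; recent scikit-learn raises an error if it is set without shuffling
kfold = KFold(n_splits=10)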
model = LinearRegression()
scoring = 'neg_mean_absolute_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("MAE: %.3f (%.3f)" % (results.mean(), results.std()) ) 
# MAE: -4.005 (2.084)
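Mean Squared Error (MSE)

Concept:

MSE is the mean of the squared differences between the observed values and the values predicted by the regression:

MSE = (1/n) Σ (Yi – Ŷi)^2

n = the number of data points,
Yi = original or observed y-value,
Ŷi (Y-hat) = y-value from regression.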
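The same toy approach works for MSE. This is a sketch under the definition given above; mse_by_hand and the sample values are hypothetical names and numbers chosen for illustration.

```python
def mse_by_hand(y_true, y_pred):
    """Mean of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values (made up): squared errors are 1, 0, 4 and 1.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mse = mse_by_hand(y_true, y_pred)
print("MSE: %.3f" % mse)  # (1 + 0 + 4 + 1) / 4 = 1.500
```

Squaring makes large errors count disproportionately: the single error of 2 contributes 4 of the 6 total here, whereas under MAE it would contribute 2 of 4.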

Code:

from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
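# random_state only has an effect when shuffle=True; recent scikit-learn raises an error if it is set without shuffling
kfold = KFold(n_splits=10)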
model = LinearRegression()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("MSE: %.3f (%.3f)" % (results.mean(), results.std()) )
# MSE: -34.705 (45.574)
R Squared

Concept:

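R-squared (R2) is a statistical measure that represents the proportion of the variance of the dependent variable that is explained by the independent variable or variables in a regression model. Whereas correlation describes the strength of the relationship between an independent and a dependent variable, R-squared describes the extent to which the variance of one variable explains the variance of the other. So if a model's R2 is 0.50, roughly half of the observed variation can be explained by the model's inputs.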
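A common way to compute this proportion is R2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the observed values from their mean. The sketch below assumes that definition; r2_by_hand and the toy values are illustrative and not taken from the housing data.

```python
def r2_by_hand(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (undefined when y_true is constant)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 2.0, 7.0]

# A reasonable set of predictions explains part of the variance.
r2_good = r2_by_hand(y_true, [2.0, 5.0, 4.0, 8.0])

# Always predicting the mean of y_true explains none of it.
mean_value = sum(y_true) / len(y_true)
r2_mean = r2_by_hand(y_true, [mean_value] * len(y_true))

# Predictions worse than the mean give a negative score.
r2_bad = r2_by_hand(y_true, [7.0, 2.0, 5.0, 3.0])

print("good: %.3f, mean: %.3f, bad: %.3f" % (r2_good, r2_mean, r2_bad))
```

Under this definition the score is not bounded below by zero, so individual cross-validation folds can score negative when the model does worse than predicting the mean.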

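What is a good R2:
It depends on the field of study; what counts as a good R2 differs from field to field. In the social sciences, for example, an R2 of 0.5 is already considered good, whereas in finance an R2 of about 0.7 is needed to be considered good.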

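Limitations:

    A high or low R2 score does not tell you whether the chosen model is good or bad.
    It does not tell you whether the data and the predictions are biased.
    You cannot use R2 alone to judge whether you chose the right regression algorithm.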

Code:

from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
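# random_state only has an effect when shuffle=True; recent scikit-learn raises an error if it is set without shuffling
kfold = KFold(n_splits=10)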
model = LinearRegression()
scoring = 'r2'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print("R^2: %.3f (%.3f)" % (results.mean(), results.std()) )
# R^2: 0.203 (0.595)