How to use the MNIST dataset (how the MNIST dataset is made)


1. Import packages
import pandas as pd
import numpy as np
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
2. Load your own data
data_wide = pd.read_csv("./data/mode_wide.csv", index_col=0)  # index_col=0: use the first column as the row index
data_wide
      choice  cost.car  cost.carpool  cost.bus  cost.rail   time.car  time.carpool   time.bus  time.rail
1        car  1.507010      2.335612  1.800512   2.358920  18.503200     26.338233  20.867794  30.033469
2       rail  6.056998      2.896919  2.237128   1.855450  31.311107     34.256956  67.181889  60.293126
3        car  5.794677      2.137454  2.576385   2.747479  22.547429     23.255171  63.309057  49.171643
4        car  1.869144      2.572427  1.903518   2.268276  26.090282     29.896023  19.752704  13.472675
5        car  2.498952      1.722010  2.686000   2.973866   4.699140     12.414084  43.092039  39.743252
...      ...       ...           ...       ...        ...        ...           ...        ...        ...
449     rail  6.990901      0.515137  2.066044   2.171174  48.022792     44.501577  27.271918  18.966319
450      car  4.591647      2.891148  1.900379   1.794407  29.444192     33.727087  66.117345  39.842459
451      car  3.236237      1.206815  1.754674   2.023671  16.349017     18.975074  23.387729  43.298276
452      bus  6.932740      1.171861  2.461495   2.612489  65.420641     60.481668  52.404315  48.370662
453  carpool  6.531509      1.408171  2.214791   1.856338  59.566073     55.141406  67.815635  73.447286

453 rows × 9 columns

2. Process the data

Encode the chosen travel mode as an integer label y:

y = 1 (car);

y = 2 (carpool);

y = 3 (rail);

y = 4 (bus);

def choice_to_y(choice):
    if choice == 'car':
        return 1
    elif choice == 'carpool':
        return 2
    elif choice == 'rail':
        return 3
    else:
        return 4

data_wide['y'] = data_wide['choice'].map(choice_to_y)
data_wide
      choice  cost.car  cost.carpool  cost.bus  cost.rail   time.car  time.carpool   time.bus  time.rail  y
1        car  1.507010      2.335612  1.800512   2.358920  18.503200     26.338233  20.867794  30.033469  1
2       rail  6.056998      2.896919  2.237128   1.855450  31.311107     34.256956  67.181889  60.293126  3
3        car  5.794677      2.137454  2.576385   2.747479  22.547429     23.255171  63.309057  49.171643  1
4        car  1.869144      2.572427  1.903518   2.268276  26.090282     29.896023  19.752704  13.472675  1
5        car  2.498952      1.722010  2.686000   2.973866   4.699140     12.414084  43.092039  39.743252  1
...      ...       ...           ...       ...        ...        ...           ...        ...        ...  .
449     rail  6.990901      0.515137  2.066044   2.171174  48.022792     44.501577  27.271918  18.966319  3
450      car  4.591647      2.891148  1.900379   1.794407  29.444192     33.727087  66.117345  39.842459  1
451      car  3.236237      1.206815  1.754674   2.023671  16.349017     18.975074  23.387729  43.298276  1
452      bus  6.932740      1.171861  2.461495   2.612489  65.420641     60.481668  52.404315  48.370662  4
453  carpool  6.531509      1.408171  2.214791   1.856338  59.566073     55.141406  67.815635  73.447286  2

453 rows × 10 columns
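
The same encoding can also be written as a dictionary lookup. The sketch below is one possible alternative, not the author's code: the five-row `sample` frame and the `CHOICE_TO_Y` name are made up for illustration. With a dictionary, an unexpected value in `choice` becomes NaN and can be reported, whereas the final `else` branch above labels it as bus (4).

```python
import pandas as pd

# Hypothetical mini-sample with the same 'choice' column as data_wide
sample = pd.DataFrame({"choice": ["car", "rail", "bus", "carpool", "car"]})

# Explicit label table (illustrative name); values missing from it map to NaN
CHOICE_TO_Y = {"car": 1, "carpool": 2, "rail": 3, "bus": 4}

sample["y"] = sample["choice"].map(CHOICE_TO_Y)

# Fail loudly if any choice was not covered by the table
unmapped = sample["y"].isna()
if unmapped.any():
    raise ValueError(f"unmapped choices: {sample.loc[unmapped, 'choice'].unique()}")

sample["y"] = sample["y"].astype(int)
print(sample)
```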

3. Define the independent variables X and the dependent variable y
data_wide.columns
Index(['choice', 'cost.car', 'cost.carpool', 'cost.bus', 'cost.rail',
       'time.car', 'time.carpool', 'time.bus', 'time.rail', 'y'],
      dtype='object')
X = data_wide[['cost.car', 'cost.carpool', 'cost.bus', 'cost.rail','time.car', 'time.carpool', 'time.bus', 'time.rail']]
y = data_wide['y']
4. Configure the logit model and evaluate it
model = LogisticRegression(multi_class='multinomial', solver='lbfgs')

# define the model evaluation procedure: n_splits is the K in K-fold; n_repeats is how many times the whole K-fold split is repeated
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model and collect the scores
n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report the model performance 
print('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))  
Mean Accuracy: 0.665 (0.061)
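
To see what the reported mean and standard deviation are computed over, the sketch below reruns the same evaluation on synthetic data. It assumes a stand-in dataset from `make_classification` with the same shape as the travel-mode data (453 rows, 8 features, 4 classes); the `_demo` names are illustrative, and the accuracy it prints will not match the 0.665 above.

```python
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for the travel-mode data: 453 rows, 8 features, 4 classes
X_demo, y_demo = make_classification(
    n_samples=453, n_features=8, n_informative=5, n_redundant=0,
    n_classes=4, random_state=1,
)

model_demo = LogisticRegression(max_iter=1000)
cv_demo = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model_demo, X_demo, y_demo, scoring="accuracy", cv=cv_demo)

# One accuracy value per fold per repeat: 10 folds * 3 repeats = 30 scores
print(len(scores))
print("Mean Accuracy: %.3f (%.3f)" % (mean(scores), std(scores)))
```

The value in parentheses is the spread across those 30 held-out folds, so it indicates how much the accuracy depends on the particular split.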
5. Fit the model
model.fit(X, y)
D:\ANACONDA\lib\site-packages\sklearn\linear_model\_logistic.py:818: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG,

LogisticRegression(multi_class='multinomial')
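
The ConvergenceWarning above names two remedies: raising max_iter or scaling the data. A minimal sketch of both, assuming synthetic stand-in data rather than the author's file; the `_demo` and `scaled_model` names are illustrative, and the last four columns are multiplied by 30 only to mimic time columns on a larger scale than the cost columns.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 453 rows, 8 features, 4 classes
X_demo, y_demo = make_classification(
    n_samples=453, n_features=8, n_informative=5, n_redundant=0,
    n_classes=4, random_state=1,
)
X_demo[:, 4:] *= 30  # mimic "time" columns measured on a larger scale

# Standardize every feature, then fit with a higher iteration limit
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Record any warnings raised during fitting instead of printing them
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scaled_model.fit(X_demo, y_demo)

convergence_warnings = [w for w in caught if issubclass(w.category, ConvergenceWarning)]
print("convergence warnings:", len(convergence_warnings))
print("iterations used:", scaled_model.named_steps["logisticregression"].n_iter_)
```

Putting the scaler inside a pipeline also means that, under cross-validation, the scaling statistics are learned from the training folds only.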
6. Create a new data point and predict the result
# generate a new data point (8 random feature values)
new_data = np.random.rand(8)
new_data
array([0.11880174, 0.16505872, 0.14297278, 0.50355392, 0.87629855,
       0.91189688, 0.57073101, 0.19178997])
# predict
# predicted probability of each class for the new data point
yhat = model.predict_proba([new_data])

# print the predicted probabilities
print('Predicted Probabilities: %s' % yhat[0])
Predicted Probabilities: [0.3749058  0.20228137 0.20380141 0.21901142]


D:\ANACONDA\lib\site-packages\sklearn\base.py:451: UserWarning: X does not have valid feature names, but LogisticRegression was fitted with feature names
  "X does not have valid feature names, but"
This gives one approach to running a multinomial logit regression on your own data.
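
The predicted probabilities come from a softmax over one linear score per class. The sketch below checks that relationship by hand. It assumes synthetic stand-in data from `make_classification` and the default lbfgs solver, which fits a multinomial model when there are more than two classes in recent scikit-learn versions; the `_demo` names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: 453 rows, 8 features, 4 classes
X_demo, y_demo = make_classification(
    n_samples=453, n_features=8, n_informative=5, n_redundant=0,
    n_classes=4, random_state=1,
)
model_demo = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

x_new = X_demo[:1]  # one row, shape (1, 8)

# Linear score for each of the 4 classes: x @ W^T + b
scores = x_new @ model_demo.coef_.T + model_demo.intercept_
scores -= scores.max(axis=1, keepdims=True)  # subtract the max for numerical stability

# Softmax turns the scores into probabilities that sum to 1
manual = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
builtin = model_demo.predict_proba(x_new)

print(np.allclose(manual, builtin))
```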

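The warning above appears because the model was fitted on a pandas DataFrame, which carries column names, while new_data is a plain NumPy array with no feature names. The prediction is still computed; the values are matched to the features by position.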
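
One way to avoid the warning is to wrap the new data point in a DataFrame with the same column names used for fitting. The sketch below assumes random stand-in data in place of mode_wide.csv; the `_demo`, `new_row`, and `label_names` names are illustrative, with `label_names` following the encoding from step 2.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

feature_cols = ['cost.car', 'cost.carpool', 'cost.bus', 'cost.rail',
                'time.car', 'time.carpool', 'time.bus', 'time.rail']

# Random stand-in data (not the author's dataset), labels 1..4
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.random((200, 8)), columns=feature_cols)
y_demo = rng.integers(1, 5, size=200)

model_demo = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# New data point as a one-row DataFrame with matching column names
new_row = pd.DataFrame([rng.random(8)], columns=feature_cols)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    proba = model_demo.predict_proba(new_row)[0]

# Columns of predict_proba follow model.classes_, so pair them explicitly
label_names = {1: "car", 2: "carpool", 3: "rail", 4: "bus"}
for cls, p in zip(model_demo.classes_, proba):
    print(label_names[int(cls)], round(float(p), 3))
```

Pairing the probabilities with `classes_` also makes it clear which entry belongs to which travel mode.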
