Downloading Common Machine Learning Datasets

Downloading datasets
Getting datasets from the sklearn library

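Many of the datasets used when learning machine learning are hosted on the UCI Machine Learning Repository. This note collects the links so they are easy to find again.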

UCI website (old version): https://archive.ics.uci.edu/ml/index.php

UCI website (new version): https://archive-beta.ics.uci.edu/

Downloading datasets

All of the download links below point to the old UCI site.

Iris dataset: https://archive.ics.uci.edu/ml/datasets/Iris

Wine dataset: https://archive.ics.uci.edu/ml/datasets/Wine

Boston housing dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/housing/

Lenses (contact lenses) dataset: https://archive.ics.uci.edu/ml/datasets/lenses

Horse Colic dataset: http://archive.ics.uci.edu/ml/datasets/Horse+Colic

Bank Marketing dataset (marketing campaigns of a Portuguese banking institution): http://archive.ics.uci.edu/ml/datasets/Bank+Marketing

1984 US Congressional Voting Records dataset: http://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records

Mushroom dataset (for finding the features that poisonous mushrooms share): https://archive.ics.uci.edu/ml/datasets/mushroom
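
Most UCI data files are plain comma-separated text without a header row, so once downloaded they can be read directly with pandas. The following is only a minimal sketch: the helper name read_uci_table and the column names are made up for illustration, and the two sample rows merely imitate the layout of the iris data file so that the example runs offline.

```python
import io

import pandas as pd


def read_uci_table(source, column_names):
    """Read a headerless, comma-separated data file into a DataFrame.

    `source` may be a local path, a URL, or an open file-like object.
    """
    return pd.read_csv(source, header=None, names=column_names)


# Two rows in the same layout as the iris data file (4 measurements + label).
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)

# Hypothetical column names; the raw file itself carries no header.
iris_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

iris_df = read_uci_table(sample, iris_columns)
print(iris_df)
```

The same call should also accept the path of a file downloaded from one of the links above, as long as a matching list of column names is supplied.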

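A few more datasets are on Kaggle (they cannot be downloaded without logging in):
San Francisco crime classification: https://www.kaggle.com/c/sf-crime
Titanic survivor prediction: https://www.kaggle.com/c/titanic/data
Handwritten digit recognition: https://www.kaggle.com/c/digit-recognizer/data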

Getting datasets from the sklearn library

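I only found this out later: some of these datasets ship with sklearn and can be loaded with a single function call, which saves a lot of effort. There seem to be only about 12 of them, though. The loader returns a dict-like Bunch object (it prints much like JSON, but it is a Python object, not a JSON string). The code in this note uses the wine dataset as the example.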
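
To check which loaders a given installation actually provides, one option is to list the functions in sklearn.datasets whose names start with load_. This is a rough sketch: the filter also picks up file loaders such as load_files and load_svmlight_file, so the count it prints is not exactly the number of bundled datasets, and it varies between scikit-learn versions.

```python
import sklearn.datasets

# Built-in loaders follow the naming pattern load_<name>.
loader_names = sorted(
    name
    for name in dir(sklearn.datasets)
    if name.startswith("load_") and callable(getattr(sklearn.datasets, name))
)

print(len(loader_names))
print(loader_names)
```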

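wine: a dict-like Bunch object
wine.data: the feature values
wine.feature_names: the name of each feature column
wine.target: the class label of each sample
wine.target_names: the names of the classes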
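
The attributes listed above can be inspected directly. A short sketch, assuming scikit-learn is installed:

```python
from sklearn.datasets import load_wine

wine = load_wine()

print(type(wine).__name__)       # the container type (a dict subclass)
print(wine.data.shape)           # (number of samples, number of features)
print(len(wine.feature_names))   # one name per feature column
print(wine.target.shape)         # one label per sample
print(list(wine.target_names))   # names of the classes

# Attribute access and key access return the same underlying object.
assert wine.data is wine["data"]
```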

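If wine.data and wine.target are concatenated into a DataFrame, the result is [178 rows x 14 columns]: columns 0 to 12 are the 13 features and column 13 (the 14th column) is the label. wine.feature_names + ['class'] can be used as its column names.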
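
The concatenation described above can be sketched as follows. The label column name "class" is an arbitrary choice for this example, and the as_frame=True shortcut at the end assumes scikit-learn 0.23 or newer.

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()

# 13 feature columns first, then the label as the 14th column.
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_df["class"] = wine.target  # "class" is a name chosen for this sketch

print(wine_df.shape)
print(wine_df.head())

# Assumption: scikit-learn >= 0.23, where the loaders accept as_frame=True
# and return a ready-made DataFrame whose label column is named "target".
wine_frame = load_wine(as_frame=True).frame
print(wine_frame.columns[-1])
```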

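from sklearn.datasets import load_wine, load_iris, load_breast_cancer
import pprint

# load_boston is left out: it was removed in scikit-learn 1.2,
# so importing it fails on current versions.
wine = load_wine()
iris = load_iris()
breast_cancer = load_breast_cancer()

# Pretty-print the whole Bunch: DESCR, data, feature_names, target, ...
pprint.pprint(wine)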

'''
Printed output:
"D:\Programming Software\Python3.9.1\python.exe" "D:/Program Space/Python/sklearn_machinelearning/src/Test/main.py"
{'DESCR': '.. _wine_dataset:\n'
          '\n'
          'Wine recognition dataset\n'
          '------------------------\n'
          '\n'
          '**Data Set Characteristics:**\n'
          '\n'
          '    :Number of Instances: 178 (50 in each of three classes)\n'
          '    :Number of Attributes: 13 numeric, predictive attributes and '
          'the class\n'
          '    :Attribute Information:\n'
          ' \t\t- Alcohol\n'
          ' \t\t- Malic acid\n'
          ' \t\t- Ash\n'
          '\t\t- Alcalinity of ash  \n'
          ' \t\t- Magnesium\n'
          '\t\t- Total phenols\n'
          ' \t\t- Flavanoids\n'
          ' \t\t- Nonflavanoid phenols\n'
          ' \t\t- Proanthocyanins\n'
          '\t\t- Color intensity\n'
          ' \t\t- Hue\n'
          ' \t\t- OD280/OD315 of diluted wines\n'
          ' \t\t- Proline\n'
          '\n'
          '    - class:\n'
          '            - class_0\n'
          '            - class_1\n'
          '            - class_2\n'
          '\t\t\n'
          '    :Summary Statistics:\n'
          '    \n'
          '    ============================= ==== ===== ======= =====\n'
          '                                   Min   Max   Mean     SD\n'
          '    ============================= ==== ===== ======= =====\n'
          '    Alcohol:                      11.0  14.8    13.0   0.8\n'
          '    Malic Acid:                   0.74  5.80    2.34  1.12\n'
          '    Ash:                          1.36  3.23    2.36  0.27\n'
          '    Alcalinity of Ash:            10.6  30.0    19.5   3.3\n'
          '    Magnesium:                    70.0 162.0    99.7  14.3\n'
          '    Total Phenols:                0.98  3.88    2.29  0.63\n'
          '    Flavanoids:                   0.34  5.08    2.03  1.00\n'
          '    Nonflavanoid Phenols:         0.13  0.66    0.36  0.12\n'
          '    Proanthocyanins:              0.41  3.58    1.59  0.57\n'
          '    Colour Intensity:              1.3  13.0     5.1   2.3\n'
          '    Hue:                          0.48  1.71    0.96  0.23\n'
          '    OD280/OD315 of diluted wines: 1.27  4.00    2.61  0.71\n'
          '    Proline:                       278  1680     746   315\n'
          '    ============================= ==== ===== ======= =====\n'
          '\n'
          '    :Missing Attribute Values: None\n'
          '    :Class Distribution: class_0 (59), class_1 (71), class_2 (48)\n'
          '    :Creator: R.A. Fisher\n'
          '    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\n'
          '    :Date: July, 1988\n'
          '\n'
          'This is a copy of UCI ML Wine recognition datasets.\n'
          'https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data\n'
          '\n'
          'The data is the results of a chemical analysis of wines grown in '
          'the same\n'
          'region in Italy by three different cultivators. There are thirteen '
          'different\n'
          'measurements taken for different constituents found in the three '
          'types of\n'
          'wine.\n'
          '\n'
          'Original Owners: \n'
          '\n'
          'Forina, M. et al, PARVUS - \n'
          'An Extendible Package for Data Exploration, Classification and '
          'Correlation. \n'
          'Institute of Pharmaceutical and Food Analysis and Technologies,\n'
          'Via Brigata Salerno, 16147 Genoa, Italy.\n'
          '\n'
          'Citation:\n'
          '\n'
          'Lichman, M. (2013). UCI Machine Learning Repository\n'
          '[https://archive.ics.uci.edu/ml]. Irvine, CA: University of '
          'California,\n'
          'School of Information and Computer Science. \n'
          '\n'
          '.. topic:: References\n'
          '\n'
          '  (1) S. Aeberhard, D. Coomans and O. de Vel, \n'
          '  Comparison of Classifiers in High Dimensional Settings, \n'
          '  Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. '
          'of  \n'
          '  Mathematics and Statistics, James Cook University of North '
          'Queensland. \n'
          '  (Also submitted to Technometrics). \n'
          '\n'
          '  The data was used with many others for comparing various \n'
          '  classifiers. The classes are separable, though only RDA \n'
          '  has achieved 100% correct classification. \n'
          '  (RDA : 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed '
          'data)) \n'
          '  (All results using the leave-one-out technique) \n'
          '\n'
          '  (2) S. Aeberhard, D. Coomans and O. de Vel, \n'
          '  "THE CLASSIFICATION PERFORMANCE OF RDA" \n'
          '  Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. '
          'of \n'
          '  Mathematics and Statistics, James Cook University of North '
          'Queensland. \n'
          '  (Also submitted to Journal of Chemometrics).\n',
 'data': array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
        1.065e+03],
       [1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
        1.050e+03],
       [1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
        1.185e+03],
       ...,
       [1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
        8.350e+02],
       [1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
        8.400e+02],
       [1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
        5.600e+02]]),
 'feature_names': ['alcohol',
                   'malic_acid',
                   'ash',
                   'alcalinity_of_ash',
                   'magnesium',
                   'total_phenols',
                   'flavanoids',
                   'nonflavanoid_phenols',
                   'proanthocyanins',
                   'color_intensity',
                   'hue',
                   'od280/od315_of_diluted_wines',
                   'proline'],
 'frame': None,
 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2]),
 'target_names': array(['class_0', 'class_1', 'class_2'], dtype='
