Use logistic regression and neural networks to recognize handwritten digits (0 to 9).
1.1 Dataset
ex3data1.mat contains 5000 training examples, each a 20 pixel by 20 pixel grayscale image of a digit.
The 20 by 20 grid of pixels is “unrolled” into a 400-dimensional vector. Each of these training examples becomes a single row in our data matrix X.
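The unrolling step can be illustrated with a made-up 20 by 20 array (not the real data). This sketch assumes that each row of X was unrolled in MATLAB's column-major order. The `.T` in the plotting code further down suggests this. NumPy's `reshape` defaults to row-major order, so a plain reshape of such a row comes back transposed.

```python
import numpy as np

# Hypothetical 20x20 "image" whose pixel values are 0..399, so we can
# track where each pixel ends up after unrolling.
img = np.arange(400, dtype=float).reshape(20, 20)

# NumPy's default (order="C") unrolls row by row.
row_c = img.reshape(400)
# order="F" unrolls column by column, like MATLAB's X(:) (assumed for ex3data1.mat).
row_f = img.reshape(400, order="F")

# Reshaping a column-major row with the default C order gives the transpose
# of the original image, which is what the .T in the plotting code undoes.
back = row_f.reshape(20, 20)
print(np.array_equal(back.T, img))                             # True
print(np.array_equal(row_f.reshape(20, 20, order="F"), img))   # True
```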
The second part of the training set is a 5000-dimensional vector y that contains labels for the training set.
The digit “0” is labeled as “10”, while the digits “1” to “9” are labeled as “1” to “9” in their natural order.
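You may prefer labels that match the digits directly. In that case, a small helper can map 10 back to 0. `to_zero_based_digits` is a hypothetical convenience function and is not part of the exercise. If you are following the course's own 1 to 10 convention, you can keep the labels as they are. `y % 10` gives the same result.

```python
import numpy as np


def to_zero_based_digits(y):
    """Map labels 1..10 (10 meaning the digit 0) to digits 0..9.

    Hypothetical helper. `y` may be the (m, 1) column vector returned by
    loadmat; it is flattened to shape (m,) first.
    """
    y = np.asarray(y).ravel().astype(np.int64)
    return np.where(y == 10, 0, y)


# Hypothetical labels in the dataset's 1..10 convention.
y_raw = np.array([[10], [1], [9], [10], [5]], dtype=np.uint8)
print(to_zero_based_digits(y_raw))  # [0 1 9 0 5]
```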
Reading the .mat file in Python requires SciPy (scipy.io.loadmat).
# 1 Multi-class Classification
# 1.1 Dataset
from scipy.io import loadmat
data=loadmat('ex3data1.mat')
print(data)
print(data['X'].shape,data['y'].shape)
{'__header__': b'MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Sun Oct 16 13:09:09 2011', '__version__': '1.0', '__globals__': [], 'X': array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]), 'y': array([[10],
[10],
[10],
...,
[ 9],
[ 9],
[ 9]], dtype=uint8)}
(5000, 400) (5000, 1)
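`loadmat` returns a plain dict. Besides the arrays, it contains the metadata entries `__header__`, `__version__` and `__globals__`, which is why `print(data)` above is so noisy. Below is a minimal sketch of a hypothetical helper, `summarize_mat`, that lists only the arrays. The dict is a synthetic stand-in, so the example runs without ex3data1.mat. Note also that y comes back with shape (5000, 1), because MATLAB has no 1-D arrays. `data['y'].ravel()` or `loadmat('ex3data1.mat', squeeze_me=True)` gives a flat vector instead.

```python
import numpy as np


def summarize_mat(mat_dict):
    """Return {name: (shape, dtype)} for the arrays in a loadmat() result,
    skipping the '__header__', '__version__' and '__globals__' entries."""
    return {
        name: (value.shape, str(value.dtype))
        for name, value in mat_dict.items()
        if not name.startswith("__")
    }


# Hypothetical stand-in for loadmat('ex3data1.mat'), built with NumPy only.
fake = {
    "__header__": b"MATLAB 5.0 MAT-file",
    "__version__": "1.0",
    "__globals__": [],
    "X": np.zeros((5000, 400)),
    "y": np.ones((5000, 1), dtype=np.uint8),
}
print(summarize_mat(fake))  # {'X': ((5000, 400), 'float64'), 'y': ((5000, 1), 'uint8')}
```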
1.2 Visualizing the data
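Randomly select 100 rows from X and display them as a 10 by 10 grid of images.
The code below is something I found through a Baidu search; I don't yet understand exactly what matshow does...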
# 1.2 Visualizing the data
import numpy as np
import matplotlib.pyplot as plt
# X has 5000 rows; pick 100 distinct row indices from [0, 5000)
sample_idx = np.random.choice(np.arange(data['X'].shape[0]), 100, replace=False)
sample_images = data['X'][sample_idx, :]
print(sample_images)
fig, ax_array = plt.subplots(nrows=10, ncols=10, sharey=True, sharex=True, figsize=(12, 12))
for r in range(10):
    for c in range(10):
        # Each row is an unrolled 20x20 image stored column-major (MATLAB order),
        # so reshape it and transpose it to display the digit upright
        ax_array[r, c].matshow(sample_images[10 * r + c].reshape((20, 20)).T, cmap='binary')
plt.show()
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
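For reference, matplotlib's `Axes.matshow` is essentially `imshow` preset for matrices. It draws the first row at the top, uses nearest-neighbour interpolation with an equal aspect ratio, and moves the x ticks to the top. One alternative avoids creating 100 separate axes: tile the samples into a single 200 by 200 array and show it once. The sketch below assumes column-major rows, as above. The function `make_mosaic` is hypothetical, and the random matrix only stands in for data['X']. It also samples rows without replacement via `numpy.random.default_rng`.

```python
import numpy as np


def make_mosaic(images, rows=10, cols=10, side=20):
    """Tile rows*cols unrolled (side x side) images into one 2-D array.

    Hypothetical helper. Assumes each row of `images` was unrolled
    column-major, so every tile is transposed to appear upright.
    """
    images = np.asarray(images)
    if images.shape != (rows * cols, side * side):
        raise ValueError(f"expected shape {(rows * cols, side * side)}, got {images.shape}")
    # (rows*cols, side*side) -> (rows, cols, side, side); swapping the last
    # two axes applies the per-tile transpose.
    tiles = images.reshape(rows, cols, side, side).transpose(0, 1, 3, 2)
    # Reorder to (rows, side, cols, side) so pixel rows of tiles in the same
    # grid row sit next to each other, then flatten into a single image.
    return tiles.transpose(0, 2, 1, 3).reshape(rows * side, cols * side)


rng = np.random.default_rng(0)
X = rng.random((5000, 400))                       # hypothetical stand-in for data['X']
idx = rng.choice(X.shape[0], 100, replace=False)  # 100 distinct row indices
mosaic = make_mosaic(X[idx])
print(mosaic.shape)  # (200, 200)
```

Sample k lands in grid cell (k // 10, k % 10), the same placement as the subplot loop above. To display the mosaic, call `plt.imshow(mosaic, cmap='binary')`, then `plt.axis('off')` and `plt.show()`.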



