Contents
1 Neural Networks
1.1 Visualizing the data
1.2 Model representation
1.3 Feedforward (forward propagation) and cost function
1.4 Regularized cost function
2 Backpropagation
2.1 Sigmoid gradient
2.2 Random initialization
2.3 Backpropagation
2.4 Gradient checking
2.5 Regularized Neural Networks
2.6 Learning parameters using fmincg
1 Neural Networks
1.1 Visualizing the data
1.2 Model representation
Same as Part 2 of ex3.
1.3 Feedforward (forward propagation) and cost function
Use scikit-learn's built-in one-hot encoder for the labels.
Implement the feedforward computation that computes hθ(x(i)) for every example i and sums the cost over all examples. The code should work for a dataset of any size and any number of labels. Using the loaded parameters Theta1 and Theta2, the cost should come out to about 0.287629.
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from scipy.io import loadmat

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# 1 Neural Networks
# 1.3 Feedforward and cost function
def forward_propagate(X, theta1, theta2):
    m = X.shape[0]
    a1 = np.insert(X, 0, values=np.ones(m), axis=1)  # add bias column
    z2 = a1 * theta1.T
    a2 = np.insert(sigmoid(z2), 0, values=np.ones(m), axis=1)
    z3 = a2 * theta2.T
    h = sigmoid(z3)
    return a1, z2, a2, z3, h

def cost(theta1, theta2, X, y):
    m = X.shape[0]
    X = np.matrix(X)
    y = np.matrix(y)
    a1, z2, a2, z3, h = forward_propagate(X, theta1, theta2)
    J = 0
    for i in range(m):
        first_term = np.multiply(-y[i, :], np.log(h[i, :]))
        second_term = np.multiply((1 - y[i, :]), np.log(1 - h[i, :]))
        J += np.sum(first_term - second_term)
    J = J / m
    return J
data=loadmat('ex4data1.mat')
X = data['X']
y = data['y']
print(X.shape, y.shape)
encoder = OneHotEncoder(sparse=False)
y_onehot = encoder.fit_transform(y)
print(y_onehot.shape)
print(y[0], y_onehot[0,:])
weight = loadmat("ex4weights.mat")
theta1, theta2 = weight['Theta1'], weight['Theta2']
print(theta1.shape, theta2.shape)
print(cost(theta1, theta2,X, y_onehot))
(5000, 400) (5000, 1)
(5000, 10)
[10] [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
(25, 401) (10, 26)
0.2876291651613187
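The per-example loop in cost can also be written as one fully vectorized expression; a minimal sketch, assuming h and y_onehot are plain (m, 10) NumPy arrays rather than matrices:

```python
import numpy as np

def cost_vectorized(h, y_onehot):
    # Cross-entropy summed over all examples and labels, then averaged over m.
    m = h.shape[0]
    return np.sum(-y_onehot * np.log(h) - (1 - y_onehot) * np.log(1 - h)) / m

# Tiny hand-made check with two examples and two classes.
h = np.array([[0.9, 0.1],
              [0.2, 0.8]])
y = np.array([[1.0, 0.0],
              [0.0, 1.0]])
print(cost_vectorized(h, y))
```

This avoids the Python-level loop entirely and should return the same value as the loop version on the same inputs.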
1.4 Regularized cost function
Your code should in general work with Θ(1) and Θ(2) of any size. Using the loaded Theta1 and Theta2 with λ = 1, the cost should be about 0.383770.
# 1.4 Regularized cost function
learning_rate = 1
costReg = cost(theta1, theta2, X, y_onehot) + (float(learning_rate) / (2 * X.shape[0])) * (
    np.sum(np.power(theta1[:, 1:], 2)) + np.sum(np.power(theta2[:, 1:], 2)))
print(costReg)
0.3837698590909234
2 Backpropagation
2.1 Sigmoid gradient
When z = 0, the gradient should be exactly 0.25.
# 2 Backpropagation
# 2.1 Sigmoid gradient
def sigmoid_gradient(z):
    return np.multiply(sigmoid(z), 1 - sigmoid(z))

print(sigmoid_gradient(0))
0.25
2.2 Random initialization
One effective strategy for random initialization is to select values for Θ(l) uniformly at random in the range [−ε_init, ε_init]. You should use ε_init = 0.12.
# 2.2 Random initialization
input_size = 400
hidden_size = 25
num_labels = 10

# np.random.random(size) returns `size` uniform floats in [0, 1);
# shifting by 0.5 and scaling by 0.24 maps them into [-0.12, 0.12].
params = (np.random.random(size=hidden_size * (input_size + 1) + num_labels * (hidden_size + 1)) - 0.5) * 0.24
print(params)
[ 0.10470569 0.06136579 0.00691982 ... -0.08727087 -0.00776181 0.02983205]
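The fixed ε_init = 0.12 is not arbitrary: the ex4 handout suggests tying it to the layer sizes via ε_init = √6 / √(L_in + L_out). A quick sketch checking that this heuristic reproduces roughly 0.12 for the input-to-hidden layer used here:

```python
import numpy as np

def epsilon_init(l_in, l_out):
    # Heuristic from the ex4 handout: scale the init range by the fan-in/fan-out.
    return np.sqrt(6) / np.sqrt(l_in + l_out)

# 400 input units feeding 25 hidden units:
eps = epsilon_init(400, 25)
print(eps)  # roughly 0.12, matching the fixed value used above
```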
2.3 Backpropagation
For each training example, the algorithm first runs a forward pass to compute the activations (z(2), a(2), z(3), a(3)), then propagates the error terms backwards through the layers.
2.4 Gradient checking
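The handout's gradient check compares the analytic gradient against a central-difference approximation, (J(θ + ε) − J(θ − ε)) / (2ε) with ε around 1e-4. A minimal self-contained sketch of the idea, here verifying sigmoid_gradient against a numerical derivative of sigmoid:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_gradient(z):
    return sigmoid(z) * (1 - sigmoid(z))

def numerical_gradient(f, z, eps=1e-4):
    # Central difference: (f(z + eps) - f(z - eps)) / (2 * eps)
    return (f(z + eps) - f(z - eps)) / (2 * eps)

for z in [-2.0, 0.0, 1.5]:
    analytic = sigmoid_gradient(z)
    numeric = numerical_gradient(sigmoid, z)
    print(z, analytic, numeric, abs(analytic - numeric))
```

The same central-difference idea applies to back_propagate below: perturb each entry of the unrolled params vector, recompute the cost, and compare against the returned grad; for a correct implementation the relative difference should be very small (the handout quotes on the order of 1e-9).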
2.5 Regularized Neural Networks
# 2.3 Backpropagation
def back_propagate(params, input_size, hidden_size, num_labels, X, y, learning_rate):
    theta1 = np.matrix(np.reshape(params[:hidden_size * (input_size + 1)], (hidden_size, (input_size + 1))))
    theta2 = np.matrix(np.reshape(params[hidden_size * (input_size + 1):], (num_labels, (hidden_size + 1))))
    J = cost(theta1, theta2, X, y)
    delta1 = np.zeros(theta1.shape)  # (25, 401)
    delta2 = np.zeros(theta2.shape)  # (10, 26)
    a1, z2, a2, z3, h = forward_propagate(X, theta1, theta2)
    m = X.shape[0]
    for t in range(m):
        a1t = a1[t, :]  # (1, 401)
        z2t = z2[t, :]  # (1, 25)
        a2t = a2[t, :]  # (1, 26)
        ht = h[t, :]    # (1, 10)
        yt = y[t, :]    # (1, 10)
        d3t = ht - yt   # (1, 10)
        z2t = np.insert(z2t, 0, values=np.ones(1))  # (1, 26), prepend bias term
        # The sigmoid gradient must be evaluated at the pre-activation z2t,
        # not the activation a2t -- using a2t here is what drags accuracy down.
        d2t = np.multiply((theta2.T * d3t.T).T, sigmoid_gradient(z2t))  # (1, 26)
        delta1 = delta1 + (d2t[:, 1:]).T * a1t
        delta2 = delta2 + d3t.T * a2t
    delta1 = delta1 / m
    delta2 = delta2 / m
    # 2.5 Regularized Neural Networks
    J += (float(learning_rate) / (2 * m)) * (np.sum(np.power(theta1[:, 1:], 2)) + np.sum(np.power(theta2[:, 1:], 2)))
    delta1[:, 1:] = delta1[:, 1:] + (theta1[:, 1:] * learning_rate) / m
    delta2[:, 1:] = delta2[:, 1:] + (theta2[:, 1:] * learning_rate) / m
    grad = np.concatenate((np.ravel(delta1), np.ravel(delta2)))
    return J, grad
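back_propagate starts by unrolling the flat params vector back into theta1 and theta2; a quick sketch of that round-trip with the sizes used here (400 inputs, 25 hidden units, 10 labels):

```python
import numpy as np

input_size, hidden_size, num_labels = 400, 25, 10

theta1 = np.zeros((hidden_size, input_size + 1))  # (25, 401)
theta2 = np.zeros((num_labels, hidden_size + 1))  # (10, 26)

# Flatten both matrices into one parameter vector...
params = np.concatenate((np.ravel(theta1), np.ravel(theta2)))
print(params.shape)  # (10285,) = 25*401 + 10*26

# ...and recover the original shapes, exactly as back_propagate does.
t1 = np.reshape(params[:hidden_size * (input_size + 1)], (hidden_size, input_size + 1))
t2 = np.reshape(params[hidden_size * (input_size + 1):], (num_labels, hidden_size + 1))
print(t1.shape, t2.shape)  # (25, 401) (10, 26)
```

Unrolling is what lets a generic optimizer like scipy.optimize.minimize treat all the weights as a single 1-D vector.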
2.6 Learning parameters using fmincg
# 2.6 Learning parameters using fmincg (here: scipy.optimize.minimize with TNC)
from scipy.optimize import minimize

fmin = minimize(fun=back_propagate, x0=params,
                args=(input_size, hidden_size, num_labels, X, y_onehot, learning_rate),
                method='TNC', jac=True, options={'maxiter': 250})
print(fmin)
X = np.matrix(X)
thetafinal1 = np.matrix(np.reshape(fmin.x[:hidden_size * (input_size + 1)], (hidden_size, (input_size + 1))))
thetafinal2 = np.matrix(np.reshape(fmin.x[hidden_size * (input_size + 1):], (num_labels, (hidden_size + 1))))
a1, z2, a2, z3, h = forward_propagate(X, thetafinal1, thetafinal2 )
y_pred = np.array(np.argmax(h, axis=1) + 1)
print(y_pred)
from sklearn.metrics import classification_report  # evaluation report
print(classification_report(y, y_pred))
fun: 1.714892084546992
jac: array([ 3.60878519e-02, 8.30071260e-06, -2.04683724e-05, ...,
3.76088105e-02, 3.75763793e-02, 6.87872656e-04])
message: 'Converged (|f_n-f_(n-1)| ~= 0)'
nfev: 172
nit: 9
status: 1
success: True
x: array([ 0.06030615, 0.04150356, -0.10234186, ..., 1.51481312,
1.16313088, -1.44561905])
[[10]
[10]
[10]
...
[ 9]
[ 9]
[10]]
precision recall f1-score support
1 0.94 0.88 0.91 500
2 0.63 0.87 0.73 500
3 0.75 0.54 0.63 500
4 0.67 0.77 0.71 500
5 0.60 0.65 0.63 500
6 0.98 0.63 0.76 500
7 0.94 0.64 0.76 500
8 0.74 0.67 0.70 500
9 0.63 0.65 0.64 500
10 0.64 0.95 0.77 500
accuracy 0.72 5000
macro avg 0.75 0.72 0.72 5000
weighted avg 0.75 0.72 0.72 5000
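Note the + 1 in np.argmax(h, axis=1) + 1 above: the ex4 labels run 1 through 10, with 10 standing in for the digit 0, so column 0 of h corresponds to label 1. A small sketch with a hypothetical two-example output matrix:

```python
import numpy as np

# Hypothetical network outputs for two examples over 10 classes.
h = np.array([[0.10, 0.70, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.03],
              [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.90]])

# argmax gives column indices 0-9; adding 1 maps them onto the 1-10 labels.
y_pred = np.argmax(h, axis=1) + 1
print(y_pred)  # [ 2 10]
```

Forgetting the offset makes every prediction off by one class and wrecks the comparison against y.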
If your implementation is correct, you should see a reported training accuracy of about 95.3% (this may vary by about 1% due to the random initialization).
The run shown above only reached 72% because, in the original code, sigmoid_gradient was applied to the activation a2t instead of the pre-activation z2t inside back_propagate; evaluating it at z2t (with the bias term inserted) should bring the accuracy back in line with the expected value.
3 Visualizing the hidden layer
# 3 Visualizing the hidden layer
import matplotlib
import matplotlib.pyplot as plt

hidden_layer = thetafinal1[:, 1:]  # drop the bias column
print(hidden_layer.shape)

fig, ax_array = plt.subplots(nrows=5, ncols=5, sharey=True, sharex=True, figsize=(12, 12))
for r in range(5):
    for c in range(5):
        ax_array[r, c].matshow(np.array(hidden_layer[5 * r + c].reshape((20, 20))), cmap=matplotlib.cm.binary)
        plt.xticks(np.array([]))
        plt.yticks(np.array([]))
plt.show()
Honestly, the videos got a bit rough around this point; some of the derivations I still had to look up and work through on my own. And redoing the assignment in Python was hard going, with endless little problems that kept sending me back to Baidu for answers, ugh.
I'll come back and look at the remaining errors once my head is clearer...



