Contents
1 Neural Networks
1.1 Visualizing the data
1.2 Model representation
1.3 Feedforward (forward propagation) and cost function
1.4 Regularized cost function
2 Backpropagation
2.1 Sigmoid gradient
2.2 Random initialization
2.3 Backpropagation
2.4 Gradient checking
2.5 Regularized Neural Networks
2.6 Learning parameters using fmincg
1 Neural Networks
1.1 Visualizing the data
1.2 Model representation
Same as Part 2 of ex3.
1.3 Feedforward (forward propagation) and cost function
Use scikit-learn's built-in one-hot encoder for the labels.
Implement the feedforward computation that computes hθ(x(i)) for every example i and sums the cost over all examples. The code should work for a dataset of any size and any number of labels. Using the loaded parameters Theta1 and Theta2, the cost should come out to about 0.287629.
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from scipy.io import loadmat

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# 1 Neural Networks
# 1.3 Feedforward and cost function
def forward_propagate(X, theta1, theta2):
    m = X.shape[0]
    a1 = np.insert(X, 0, values=np.ones(m), axis=1)  # add bias column
    z2 = a1 * theta1.T
    a2 = np.insert(sigmoid(z2), 0, values=np.ones(m), axis=1)
    z3 = a2 * theta2.T
    h = sigmoid(z3)
    return a1, z2, a2, z3, h

def cost(theta1, theta2, X, y):
    m = X.shape[0]
    X = np.matrix(X)
    y = np.matrix(y)
    a1, z2, a2, z3, h = forward_propagate(X, theta1, theta2)
    J = 0
    for i in range(m):
        first_term = np.multiply(-y[i, :], np.log(h[i, :]))
        second_term = np.multiply((1 - y[i, :]), np.log(1 - h[i, :]))
        J += np.sum(first_term - second_term)
    J = J / m
    return J
data=loadmat('ex4data1.mat')
X = data['X']
y = data['y']
print(X.shape, y.shape)
encoder = OneHotEncoder(sparse=False)
y_onehot = encoder.fit_transform(y)
print(y_onehot.shape)
print(y[0], y_onehot[0,:])
weight = loadmat("ex4weights.mat")
theta1, theta2 = weight['Theta1'], weight['Theta2']
print(theta1.shape, theta2.shape)
print(cost(theta1, theta2,X, y_onehot))
(5000, 400) (5000, 1)
(5000, 10)
[10] [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
(25, 401) (10, 26)
0.2876291651613187
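The per-example loop in cost can also be written as one fully vectorized expression; a minimal sketch, assuming h and y_onehot are plain (m, 10) NumPy arrays rather than matrices:

```python
import numpy as np

def cost_vectorized(h, y_onehot):
    # Cross-entropy summed over all examples and labels, then averaged over m.
    m = h.shape[0]
    return np.sum(-y_onehot * np.log(h) - (1 - y_onehot) * np.log(1 - h)) / m

# Tiny hand-made check with two examples and two classes.
h = np.array([[0.9, 0.1],
              [0.2, 0.8]])
y = np.array([[1.0, 0.0],
              [0.0, 1.0]])
print(cost_vectorized(h, y))
```

This avoids the Python-level loop entirely and should return the same value as the loop version on the same inputs.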
1.4 Regularized cost function
Your code should in general work with Θ(1) and Θ(2) of any size. Using the loaded Theta1 and Theta2 with λ = 1, the cost should be about 0.383770.
# 1.4 Regularized cost function
learning_rate = 1
costReg = cost(theta1, theta2, X, y_onehot) + (float(learning_rate) / (2 * X.shape[0])) * (
    np.sum(np.power(theta1[:, 1:], 2)) + np.sum(np.power(theta2[:, 1:], 2)))
print(costReg)
0.3837698590909234
2 Backpropagation
2.1 Sigmoid gradient
When z = 0, the gradient should be exactly 0.25.
# 2 Backpropagation
# 2.1 Sigmoid gradient
def sigmoid_gradient(z):
    return np.multiply(sigmoid(z), 1 - sigmoid(z))

print(sigmoid_gradient(0))
0.25
2.2 Random initialization
One effective strategy for random initialization is to select values for Θ(l) uniformly at random in the range [−ε_init, ε_init]. You should use ε_init = 0.12.
# 2.2 Random initialization
input_size = 400
hidden_size = 25
num_labels = 10

# np.random.random(size) returns `size` uniform floats in [0, 1);
# shifting by 0.5 and scaling by 0.24 maps them into [-0.12, 0.12].
params = (np.random.random(size=hidden_size * (input_size + 1) + num_labels * (hidden_size + 1)) - 0.5) * 0.24
print(params)
[ 0.10470569 0.06136579 0.00691982 ... -0.08727087 -0.00776181 0.02983205]
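The fixed ε_init = 0.12 is not arbitrary: the ex4 handout suggests tying it to the layer sizes via ε_init = √6 / √(L_in + L_out). A quick sketch checking that this heuristic reproduces roughly 0.12 for the input-to-hidden layer used here:

```python
import numpy as np

def epsilon_init(l_in, l_out):
    # Heuristic from the ex4 handout: scale the init range by the fan-in/fan-out.
    return np.sqrt(6) / np.sqrt(l_in + l_out)

# 400 input units feeding 25 hidden units:
eps = epsilon_init(400, 25)
print(eps)  # roughly 0.12, matching the fixed value used above
```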
2.3 Backpropagation
For each training example, the algorithm first runs a forward pass to compute the activations (z(2), a(2), z(3), a(3)), then propagates the error terms backwards through the layers.
2.4 Gradient checking
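The handout's gradient check compares the analytic gradient against a central-difference approximation, (J(θ + ε) − J(θ − ε)) / (2ε) with ε around 1e-4. A minimal self-contained sketch of the idea, here verifying sigmoid_gradient against a numerical derivative of sigmoid:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_gradient(z):
    return sigmoid(z) * (1 - sigmoid(z))

def numerical_gradient(f, z, eps=1e-4):
    # Central difference: (f(z + eps) - f(z - eps)) / (2 * eps)
    return (f(z + eps) - f(z - eps)) / (2 * eps)

for z in [-2.0, 0.0, 1.5]:
    analytic = sigmoid_gradient(z)
    numeric = numerical_gradient(sigmoid, z)
    print(z, analytic, numeric, abs(analytic - numeric))
```

The same central-difference idea applies to back_propagate below: perturb each entry of the unrolled params vector, recompute the cost, and compare against the returned grad; for a correct implementation the relative difference should be very small (the handout quotes on the order of 1e-9).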
2.5 Regularized Neural Networks
# 2.3 Backpropagation
def back_propagate(params, input_size, hidden_size, num_labels, X, y, learning_rate):
    theta1 = np.matrix(np.reshape(params[:hidden_size * (input_size + 1)], (hidden_size, (input_size + 1))))
    theta2 = np.matrix(np.reshape(params[hidden_size * (input_size + 1):], (num_labels, (hidden_size + 1))))
    J = cost(theta1, theta2, X, y)
    delta1 = np.zeros(theta1.shape)  # (25, 401)
    delta2 = np.zeros(theta2.shape)  # (10, 26)
    a1, z2, a2, z3, h = forward_propagate(X, theta1, theta2)
    m = X.shape[0]
    for t in range(m):
        a1t = a1[t, :]  # (1, 401)
        z2t = z2[t, :]  # (1, 25)
        a2t = a2[t, :]  # (1, 26)
        ht = h[t, :]    # (1, 10)
        yt = y[t, :]    # (1, 10)
        d3t = ht - yt   # (1, 10)
        z2t = np.insert(z2t, 0, values=np.ones(1))  # (1, 26), prepend bias term
        # The sigmoid gradient must be evaluated at the pre-activation z2t,
        # not the activation a2t -- using a2t here is what drags accuracy down.
        d2t = np.multiply((theta2.T * d3t.T).T, sigmoid_gradient(z2t))  # (1, 26)
        delta1 = delta1 + (d2t[:, 1:]).T * a1t
        delta2 = delta2 + d3t.T * a2t
    delta1 = delta1 / m
    delta2 = delta2 / m
    # 2.5 Regularized Neural Networks
    J += (float(learning_rate) / (2 * m)) * (np.sum(np.power(theta1[:, 1:], 2)) + np.sum(np.power(theta2[:, 1:], 2)))
    delta1[:, 1:] = delta1[:, 1:] + (theta1[:, 1:] * learning_rate) / m
    delta2[:, 1:] = delta2[:, 1:] + (theta2[:, 1:] * learning_rate) / m
    grad = np.concatenate((np.ravel(delta1), np.ravel(delta2)))
    return J, grad
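back_propagate starts by unrolling the flat params vector back into theta1 and theta2; a quick sketch of that round-trip with the sizes used here (400 inputs, 25 hidden units, 10 labels):

```python
import numpy as np

input_size, hidden_size, num_labels = 400, 25, 10

theta1 = np.zeros((hidden_size, input_size + 1))  # (25, 401)
theta2 = np.zeros((num_labels, hidden_size + 1))  # (10, 26)

# Flatten both matrices into one parameter vector...
params = np.concatenate((np.ravel(theta1), np.ravel(theta2)))
print(params.shape)  # (10285,) = 25*401 + 10*26

# ...and recover the original shapes, exactly as back_propagate does.
t1 = np.reshape(params[:hidden_size * (input_size + 1)], (hidden_size, input_size + 1))
t2 = np.reshape(params[hidden_size * (input_size + 1):], (num_labels, hidden_size + 1))
print(t1.shape, t2.shape)  # (25, 401) (10, 26)
```

Unrolling is what lets a generic optimizer like scipy.optimize.minimize treat all the weights as a single 1-D vector.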
2.6 Learning parameters using fmincg
# 2.6 Learning parameters using fmincg (here: scipy.optimize.minimize with TNC)
from scipy.optimize import minimize

fmin = minimize(fun=back_propagate, x0=params,
                args=(input_size, hidden_size, num_labels, X, y_onehot, learning_rate),
                method='TNC', jac=True, options={'maxiter': 250})
print(fmin)
X = np.matrix(X)
thetafinal1 = np.matrix(np.reshape(fmin.x[:hidden_size * (input_size + 1)], (hidden_size, (input_size + 1))))
thetafinal2 = np.matrix(np.reshape(fmin.x[hidden_size * (input_size + 1):], (num_labels, (hidden_size + 1))))
a1, z2, a2, z3, h = forward_propagate(X, thetafinal1, thetafinal2 )
y_pred = np.array(np.argmax(h, axis=1) + 1)
print(y_pred)
from sklearn.metrics import classification_report  # evaluation report
print(classification_report(y, y_pred))
fun: 1.714892084546992
jac: array([ 3.60878519e-02, 8.30071260e-06, -2.04683724e-05, ...,
3.76088105e-02, 3.75763793e-02, 6.87872656e-04])
message: 'Converged (|f_n-f_(n-1)| ~= 0)'
nfev: 172
nit: 9
status: 1
success: True
x: array([ 0.06030615, 0.04150356, -0.10234186, ..., 1.51481312,
1.16313088, -1.44561905])
[[10]
[10]
[10]
...
[ 9]
[ 9]
[10]]
precision recall f1-score support
1 0.94 0.88 0.91 500
2 0.63 0.87 0.73 500
3 0.75 0.54 0.63 500
4 0.67 0.77 0.71 500
5 0.60 0.65 0.63 500
6 0.98 0.63 0.76 500
7 0.94 0.64 0.76 500
8 0.74 0.67 0.70 500
9 0.63 0.65 0.64 500
10 0.64 0.95 0.77 500
accuracy 0.72 5000
macro avg 0.75 0.72 0.72 5000
weighted avg 0.75 0.72 0.72 5000
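Note the + 1 in np.argmax(h, axis=1) + 1 above: the ex4 labels run 1 through 10, with 10 standing in for the digit 0, so column 0 of h corresponds to label 1. A small sketch with a hypothetical two-example output matrix:

```python
import numpy as np

# Hypothetical network outputs for two examples over 10 classes.
h = np.array([[0.10, 0.70, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.03],
              [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.90]])

# argmax gives column indices 0-9; adding 1 maps them onto the 1-10 labels.
y_pred = np.argmax(h, axis=1) + 1
print(y_pred)  # [ 2 10]
```

Forgetting the offset makes every prediction off by one class and wrecks the comparison against y.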
If your implementation is correct, you should see a reported training accuracy of about 95.3% (this may vary by about 1% due to the random initialization).
The run shown above only reached 72% because, in the original code, sigmoid_gradient was applied to the activation a2t instead of the pre-activation z2t inside back_propagate; evaluating it at z2t (with the bias term inserted) should bring the accuracy back in line with the expected value.
3 Visualizing the hidden layer
# 3 Visualizing the hidden layer
import matplotlib
import matplotlib.pyplot as plt

hidden_layer = thetafinal1[:, 1:]  # drop the bias column
print(hidden_layer.shape)

fig, ax_array = plt.subplots(nrows=5, ncols=5, sharey=True, sharex=True, figsize=(12, 12))
for r in range(5):
    for c in range(5):
        ax_array[r, c].matshow(np.array(hidden_layer[5 * r + c].reshape((20, 20))), cmap=matplotlib.cm.binary)
        plt.xticks(np.array([]))
        plt.yticks(np.array([]))
plt.show()
Honestly, the videos got a bit rough around this point; some of the derivations I still had to look up and work through on my own. And redoing the assignment in Python was hard going, with endless little problems that kept sending me back to Baidu for answers, ugh.
I'll come back and look at the remaining errors once my head is clearer...



