Multilayer Perceptron_multilayer perception?

Multilayer Perceptron Lab Report


Experimental content

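Implement an MLP with one input layer, one hidden layer, and one output layer, where the output layer has two output neurons. In the code below, the input layer takes two features plus a bias term, and the hidden layer has two neurons.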

Experimental results

Loss function: mean squared error (MSE)
Activation function: Sigmoid
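For reference, the forward pass and loss computed by the code can be written as follows. This is a sketch of the notation only. $W^{(1)}$ and $W^{(2)}$ stand for the arrays `w_input` and `w_hidden`, each of shape $3 \times 2$ with the last row holding the bias weights. $\tilde{x}$ and $\tilde{h}$ are the input and hidden activations with a constant 1 appended for the bias.

$$
\begin{aligned}
\tilde{x} &= (x_1,\ x_2,\ 1), & h &= \sigma\big(\tilde{x}\,W^{(1)}\big) \in \mathbb{R}^{2},\\
\tilde{h} &= (h_1,\ h_2,\ 1), & \hat{y} &= \sigma\big(\tilde{h}\,W^{(2)}\big) \in \mathbb{R}^{2},\\
L &= \frac{1}{2}\sum_{k=1}^{2}\big(y_k - \hat{y}_k\big)^2, & \sigma(t) &= \frac{1}{1 + e^{-t}}.
\end{aligned}
$$

The factor $\frac{1}{2}$ is the mean over the two output neurons, which matches the loss computed in the training loop.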

Implementation:

# -*- coding: utf-8 -*-
# @Author  : sido
# @Software: PyCharm
import numpy as np
import matplotlib.pyplot as plt
'''
Loss function: MSE
Activation function: Sigmoid
'''

def sigmoid(x):  # activation function
    return 1 / (1 + np.exp(-x))

# def MSE(y_pred, y):
#     return ((y_pred - y) @ (y_pred - y).T) / len(y)
#
# def MAE(y_pred, y):
#     return np.mean(abs(y_pred - y))


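# ---------------------------- Parameter initialization -----------------------------
# np.random.seed(10)  # uncomment for reproducible runs
x_input = np.array([0.05, 0.1, 1])  # shape: (3,); inputs x1, x2 plus a constant 1 for the bias
# w_input = np.array([[0.15, 0.20], [0.25, 0.30], [0.35, 0.35]])  # shape: (3, 2)
w_input = np.random.rand(3, 2)  # input -> hidden weights, last row holds the biases
# w_hidden = [[0.4, 0.45], [0.50, 0.55], [0.60, 0.60]]  # shape: (3, 2)
w_hidden = np.random.rand(3, 2)  # hidden -> output weights, last row holds the biases
y = np.array([0.1, 0.99])  # target, shape: (2,)

a = 0.1  # learning rate
k = 1001  # stopping condition: run 1001 iterations
history_loss = []  # loss recorded at each iteration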

for i in range(k):
    # ----------------------------- Forward pass --------------------------
    h_input = x_input @ w_input  # shape: (2,)
    h_output = np.hstack((sigmoid(h_input), 1))  # shape: (3,), bias input 1 appended
    y_pred_input = h_output @ w_hidden  # shape: (2,)
    y_pred_output = sigmoid(y_pred_input)  # shape: (2,)

    temp = np.subtract(y, y_pred_output)  # error y - y_pred, shape: (2,)

    # ------------------------------ Compute gradients ----------------------------
    # error at the output pre-activation: dL/dz_out = (y_pred - y) * sigmoid'
    step1 = (- temp * y_pred_output * (1 - y_pred_output)).reshape(-1, 1)  # shape: (2, 1)
    step2 = (step1 @ np.expand_dims(h_output, axis=0)).T  # dL/dw_hidden, shape: (3, 2)

    # backpropagate through w_hidden (bias row excluded), using the weights before the update
    step3 = w_hidden[:2] @ step1  # dL/dh, shape: (2, 1)
    step4 = step3 * (h_output[:2] * (1 - h_output[:2])).reshape(-1, 1)  # shape: (2, 1)
    step5 = (step4 @ np.expand_dims(x_input, axis=0)).T  # dL/dw_input, shape: (3, 2)

    # ---------------------------- Update parameters -----------------------------
    w_hidden -= a * step2
    w_input -= a * step5
    # ---------------------------- Compute and record the loss -------------------------
    mse = (temp @ temp) / len(temp)  # temp is 1-D, so temp @ temp is the sum of squares
    history_loss.append(mse)
    if i % 100 == 0:
        print(f"# Iteration {i} loss: ", mse)

# -------------------------------- Plot the loss curve --------------------------------
plt.title("Training Loss")
plt.xlabel("Iteration")
plt.ylabel("MSE loss")
plt.plot(range(k), history_loss, color='red')
plt.show()

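Sample output (the weights are initialized randomly and the seed is commented out, so the exact values differ from run to run):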

# Iteration 0 loss:  0.2797712656430154
# Iteration 100 loss:  0.04744961852163526
# Iteration 200 loss:  0.016702498979428736
# Iteration 300 loss:  0.008973068224617735
# Iteration 400 loss:  0.005788804421514717
# Iteration 500 loss:  0.004132668628735767
# Iteration 600 loss:  0.0031471498591124007
# Iteration 700 loss:  0.0025065495053741916
# Iteration 800 loss:  0.0020631623751121582
# Iteration 900 loss:  0.0017414387680463237
# Iteration 1000 loss:  0.0014992082423502138
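A decreasing loss does not by itself prove that every gradient is correct. For example, a mistake in the hidden-layer gradient can still let the output layer learn. A finite-difference gradient check is a common way to confirm a hand-written backward pass. The sketch below re-implements the forward pass and gradients for the same 2-2-2 network and compares them against central differences. The helper names `forward`, `loss`, `analytic_grads`, and `numeric_grad`, as well as the seed 0, are illustrative and not part of the original code. The hidden-layer error is computed as the weighted sum `w_hidden[:2] @ delta_out`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_input, w_hidden):
    """Forward pass for the 2-2-2 network; x already contains the bias input 1."""
    h = np.hstack((sigmoid(x @ w_input), 1.0))   # (3,): hidden activations + bias input
    y_pred = sigmoid(h @ w_hidden)               # (2,)
    return h, y_pred

def loss(x, y, w_input, w_hidden):
    """MSE averaged over the two outputs, as in the training loop."""
    _, y_pred = forward(x, w_input, w_hidden)
    return np.mean((y - y_pred) ** 2)

def analytic_grads(x, y, w_input, w_hidden):
    """Backpropagation returning (dL/dw_input, dL/dw_hidden)."""
    h, y_pred = forward(x, w_input, w_hidden)
    delta_out = (y_pred - y) * y_pred * (1 - y_pred)              # (2,)
    grad_hidden = np.outer(h, delta_out)                          # (3, 2)
    delta_h = (w_hidden[:2] @ delta_out) * h[:2] * (1 - h[:2])    # (2,)
    grad_input = np.outer(x, delta_h)                             # (3, 2)
    return grad_input, grad_hidden

def numeric_grad(f, w, eps=1e-6):
    """Central-difference estimate of df/dw, perturbing one entry of w at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = f()
        w[idx] = old - eps
        f_minus = f()
        w[idx] = old                      # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = np.array([0.05, 0.1, 1.0])
y = np.array([0.1, 0.99])
w_input = rng.random((3, 2))
w_hidden = rng.random((3, 2))

g_in, g_hid = analytic_grads(x, y, w_input, w_hidden)
n_in = numeric_grad(lambda: loss(x, y, w_input, w_hidden), w_input)
n_hid = numeric_grad(lambda: loss(x, y, w_input, w_hidden), w_hidden)
print("max |diff| w_input :", np.max(np.abs(g_in - n_in)))
print("max |diff| w_hidden:", np.max(np.abs(g_hid - n_hid)))
```

As a rough rule of thumb, differences far above the finite-difference error (here well below 1e-6) point to a bug in the backward pass.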

Experimental analysis

The main difficulty of the experiment lies in computing the gradients and implementing backpropagation.
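The chain rule gives the gradients below. They use $\tilde{x}$ = `x_input`, $\tilde{h}$ = `h_output`, $\hat{y}$ = `y_pred_output`, $z^{(2)} = \tilde{h}W^{(2)}$ = `y_pred_input`, $W^{(1)}$ = `w_input`, $W^{(2)}$ = `w_hidden`, and $L = \frac{1}{2}\sum_k (y_k - \hat{y}_k)^2$. The symbols $\delta^{(2)}$ and $\delta^{(1)}$ (the error at the output and hidden pre-activations) are introduced here for readability and are not variables in the code. Each line is meant to match the quantity that `step1` through `step5` compute.

$$
\begin{aligned}
\delta^{(2)}_k &= \frac{\partial L}{\partial z^{(2)}_k} = (\hat{y}_k - y_k)\,\hat{y}_k(1 - \hat{y}_k) && \text{(step1)}\\
\frac{\partial L}{\partial W^{(2)}_{jk}} &= \tilde{h}_j\,\delta^{(2)}_k && \text{(step2)}\\
\frac{\partial L}{\partial h_j} &= \sum_{k=1}^{2} W^{(2)}_{jk}\,\delta^{(2)}_k, \quad j = 1, 2 && \text{(step3)}\\
\delta^{(1)}_j &= \frac{\partial L}{\partial h_j}\, h_j(1 - h_j) && \text{(step4)}\\
\frac{\partial L}{\partial W^{(1)}_{ij}} &= \tilde{x}_i\,\delta^{(1)}_j && \text{(step5)}
\end{aligned}
$$

The hidden-layer error in step3 sums the output errors weighted by the entries of $W^{(2)}$. The bias row is excluded because the bias input does not depend on the hidden layer. Step3 must also use $W^{(2)}$ as it was before the update $W \leftarrow W - a\,\nabla_W L$ is applied.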

When computing y_pred_output * (1 - y_pred_output), do not wrap it in an absolute value. Also make sure it uses the activation y_pred_output and is not mistakenly written as y_pred_input * (1 - y_pred_input). The sigmoid derivative is expressed in terms of the sigmoid's output, not its input.
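The point about using the output rather than the input can be checked numerically. The derivative of the sigmoid equals $s(1-s)$ evaluated at the activation $s = \sigma(z)$, so applying the same formula to the pre-activation $z$ gives a different and wrong value. In the sketch below, `sigmoid` matches the code, while `sigmoid_grad_from_output` and the sample points in `z` are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad_from_output(s):
    """Derivative of the sigmoid expressed through its output s = sigmoid(z)."""
    return s * (1.0 - s)

z = np.array([-2.0, 0.0, 0.5, 3.0])   # pre-activations (like y_pred_input)
s = sigmoid(z)                         # activations (like y_pred_output)

eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # central difference

correct = sigmoid_grad_from_output(s)  # s * (1 - s): agrees with the numeric derivative
wrong = sigmoid_grad_from_output(z)    # z * (1 - z): the mistake described above

print(np.max(np.abs(correct - numeric)))   # close to zero (finite-difference error only)
print(np.max(np.abs(wrong - numeric)))     # clearly nonzero
```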
Sigmoid function:
$$y = \frac{1}{1 + e^{-x}}$$
Derivative:
$$y' = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \frac{1}{1 + e^{-x}}\left(1 - \frac{1}{1 + e^{-x}}\right) = y(1 - y)$$

Pay close attention to how the shapes of vectors and matrices change throughout the computation.
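Use reshape, transpose, and similar operations to change array shapes.
Do not confuse shape (2,) with shape (2, 1). The first is a one-dimensional array, and the second is a two-dimensional column.
Note: transpose cannot turn a one-dimensional array into a column. For an array of shape (2,), .T returns the same shape (2,), so use reshape(-1, 1) or np.expand_dims instead.

Conclusions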

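Successfully implemented the multilayer perceptron described in the experimental content.
Successfully implemented backpropagation to update the parameters.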

Completing the experiment took 3 to 4 hours, most of it spent implementing backpropagation.

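First, my understanding of backpropagation was not yet thorough and I was out of practice. Second, I was not fluent with matrix operations or with NumPy, and could not keep track of the shape of the vector or matrix produced by each computation. To avoid errors caused by mismatched matrix shapes, I annotated the shape of every intermediate result next to the code, which also took a lot of time. The experiment was completed in the end, and the loss converged as the iterations went on. However, the code is hardly reusable: it only fits the specific multilayer perceptron given in the experimental content.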
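One way to address both the reusability issue and the manual shape bookkeeping is to loop over a list of weight matrices and check shapes with `assert` statements instead of comments. The sketch below is illustrative only. The `MLP` class, its method names, and the layer sizes are assumptions, not the original code. It keeps the experiment's conventions: sigmoid in every layer, MSE averaged over the outputs, a single training sample, and biases stored as the last row of each weight matrix.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MLP:
    """Fully connected sigmoid network; each weight matrix has shape (n_in + 1, n_out),
    with the last row holding the bias weights."""

    def __init__(self, sizes, seed=None):
        rng = np.random.default_rng(seed)
        self.weights = [rng.random((n_in + 1, n_out))
                        for n_in, n_out in zip(sizes[:-1], sizes[1:])]

    def forward(self, x):
        """Return the bias-augmented input of every layer and the final output."""
        inputs = []
        a = x
        for w in self.weights:
            a_aug = np.append(a, 1.0)              # (n_in + 1,): append the bias input
            assert a_aug.shape == (w.shape[0],)
            inputs.append(a_aug)
            a = sigmoid(a_aug @ w)                 # (n_out,)
        return inputs, a

    def train_step(self, x, y, lr):
        """One gradient-descent step on MSE for one sample; returns the loss before the update."""
        inputs, y_pred = self.forward(x)
        assert y_pred.shape == y.shape
        # output-layer error: dL/dy_pred = 2 (y_pred - y) / n, times sigmoid'
        delta = 2.0 * (y_pred - y) / y.size * y_pred * (1.0 - y_pred)
        grads = [None] * len(self.weights)
        for layer in reversed(range(len(self.weights))):
            a_aug = inputs[layer]
            grads[layer] = np.outer(a_aug, delta)  # same shape as self.weights[layer]
            if layer > 0:
                a = a_aug[:-1]                     # activations without the bias entry
                # propagate through the weights (bias row excluded), then through sigmoid'
                delta = (self.weights[layer][:-1] @ delta) * a * (1.0 - a)
        # all gradients were computed with the old weights; update only now
        for w, g in zip(self.weights, grads):
            assert w.shape == g.shape
            w -= lr * g
        return np.mean((y - y_pred) ** 2)

# usage: the same 2-2-2 network, input and target as in the experiment
net = MLP([2, 2, 2], seed=0)
x = np.array([0.05, 0.1])
y = np.array([0.1, 0.99])
history_loss = [net.train_step(x, y, lr=0.1) for _ in range(1001)]
print(history_loss[0], history_loss[-1])
```

Changing the `sizes` list (for example `[2, 4, 3, 2]`) changes the architecture without touching the training code. The asserts then catch shape mismatches at the point where they happen.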

Takeaways
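I gained a more intuitive understanding and feel for the multilayer perceptron. Working through the implementation let me experience the gradient-descent process repeatedly. It also deepened my understanding of how the loss propagates through the network and how the parameters are updated. Finally, I became more familiar with matrix operations and with using NumPy.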
