I have just started learning neural networks. I previously studied a blog post that explained an arxiv.org paper on solving ordinary differential equations with a neural network; the post introduced the paper's idea and worked through an ODE example. Here I write down my own understanding and try to reproduce the paper's coupled ODE system example, including the problems that come up and how to work around them.
Links to the original post and paper:

- Tensorflow一个很简单的神经网络求解常微分及偏微分方程 (Trytobenice, CSDN): https://blog.csdn.net/qq_39817721/article/details/88875099
- NNforDEsPT (arXiv): https://arxiv.org/pdf/1902.05563.pdf

First, for the first-order ODE on page 3 of the paper, the original post already explains things in detail: a single-hidden-layer network with 10 neurons is used, the loss function is the square of the ODE residual, and with the Adam optimizer the fitting error over the domain stays below 0.01. In addition, a commenter pointed out that replacing every tf.zeros() in the post's code with tf.random_normal() further speeds up the loss decrease and improves the final accuracy, and that bias terms can be added. The overall network structure is roughly as shown below:
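Before moving on, the single-hidden-layer setup just described can be sketched in plain numpy (my own sketch, not the post's TensorFlow code; the weight names mirror the ones used in the code later). It shows the trial network and the exact input derivative that the ODE residual needs, checked against a finite difference:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(size=(1, 10))    # hidden layer weights
b = rng.normal(size=(10,))      # hidden layer bias
W1 = rng.normal(size=(10, 1))   # output layer weights
b1 = rng.normal(size=(1,))      # output layer bias

def net(x):
    """Single-hidden-layer network: x -> sigmoid(x W + b) W1 + b1."""
    h = sigmoid(x @ W + b)
    return h @ W1 + b1

def dnet_dx(x):
    """Exact derivative w.r.t. x, using sigmoid'(z) = s(z) * (1 - s(z))."""
    h = sigmoid(x @ W + b)
    return (h * (1 - h) * W) @ W1

# finite-difference check of the analytic derivative
x = np.linspace(0, 1, 5).reshape(-1, 1)
eps = 1e-6
fd = (net(x + eps) - net(x - eps)) / (2 * eps)
print(np.max(np.abs(fd - dnet_dx(x))))  # agreement up to finite-difference error
```

The same derivative identity is what the `dif_1` / `dif_2` lines in the full code below rely on.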
Based on this approach, consider the following coupled system of ordinary differential equations:
The paper solves it with the same single-hidden-layer network, still with 10 hidden neurons; since there are now two ODEs to satisfy, the output layer naturally becomes 2 neurons, so the network structure turns into the figure below:
However, as the paper notes, both boundary conditions are given at the same end of the domain, x = 0, which can make the solve numerically unstable. This is related to the loss hypersurface and can cause the network to converge to a local minimum. The fix proposed in the paper is to split the training sample points and train the network in several stages, gradually enlarging the range of training samples as training proceeds.
Following this idea, when stage k finishes, the trained weights are kept and stage k+1 continues from them, so in stage k+1 the network only has to learn the newly added sample points, as shown below:
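The staged schedule described above can be sketched as follows (bookkeeping only; the window boundaries match the code later in this post, and `train_stage` is a hypothetical placeholder for one round of optimizer steps on the same, carried-over weights):

```python
import numpy as np

x_total = np.linspace(0.0, 3.0, 100)

# expanding training windows; each later stage is a superset of the earlier ones,
# and the network weights are carried over between stages (never re-initialized)
stage_windows = [
    x_total[x_total <= 1.0],  # stage 1: roughly [0, 1]
    x_total[x_total <= 2.0],  # stage 2: roughly [0, 2]
    x_total,                  # stage 3: all of [0, 3]
]

def train_stage(samples, epochs=10000):
    """Hypothetical placeholder: run `epochs` optimizer steps on `samples`."""
    pass

for k, window in enumerate(stage_windows, start=1):
    train_stage(window)  # stage k+1 mostly has to learn the newly added points
    print(f"stage {k}: {len(window)} samples, up to x = {window[-1]:.2f}")
```

Because every window is a prefix of the next, each stage only extends the region the network has already fitted.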
The paper's simulation results are shown below, in three parts from top to bottom. The first part shows the estimated (fitted) values of the two functions. Taking one of them as an example, the yellow dashed line is the result of the first stage: as the paper describes, stage one trains on samples in [0, 1], so the yellow dashed curve matches the analytic solution well at the front (the error / ODE-residual plots underneath confirm this) but drifts further off the farther you go. Likewise, stage two trains on samples in [0, 2], so a larger portion of the fit matches the analytic solution, though the tail still deviates noticeably; only after stage three does the fitted solution almost coincide with the analytic one.
This is incremental training; how much to enlarge the window each stage and how many iterations to run per stage should be tuned to the specific problem. The Python code below is adapted from the original post (TensorFlow 2.6.0, numpy 1.19.5):
# ===============================================
# Time: 2021-09-28
# File: coupled_v3.py
# Version: 3.0
# Title: coupled differential equation solver
# Note1: analytic solution not derived in this script
# Note2: train by stages [0,1] -> [0,2] -> [0,3]
# ===============================================
# package used in this code:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
# ===============================================
# training sample points and analytic solution
x_total = np.linspace(0, 3, 100, endpoint=True)  # training points in [0, 3]
x_train_1st = x_total[:32]  # stage 1: samples roughly in [0, 1]
x_train_2nd = x_total[:68]  # stage 2: samples roughly in [0, 2]
# y_trail_1 = # i don't know analytic solution
# y_trail_2 = # if you know, please update
# ===============================================
# hidden layer setting
x1 = tf.compat.v1.placeholder("float", [None, 1]) # placeholder
W = tf.Variable(tf.random.normal([1, 10])) # hidden layer weights
b = tf.Variable(tf.random.normal([10])) # hidden layer bias
y1 = tf.nn.sigmoid(tf.matmul(x1, W) + b) # sigmoid function
# ===============================================
# output layer setting
W1 = tf.Variable(tf.random.normal([10, 1])) # output weights of phi_1, phi_2
W2 = tf.Variable(tf.random.normal([10, 1]))
b1 = tf.Variable(tf.random.normal([1])) # output bias of phi_1, phi_2
b2 = tf.Variable(tf.random.normal([1]))
# ===============================================
# trial solutions: network outputs for phi_1, phi_2
y_1 = tf.matmul(y1, W1) + b1
y_2 = tf.matmul(y1, W2) + b2
# ===============================================
# differential operator: with y1 = sigmoid(x W + b), the exact derivative is
# dy1/dx = y1 * (1 - y1) * W, since sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)),
# hence dy_k/dx = (y1 * (1 - y1) * W) @ W_k
dif_1 = tf.matmul(tf.multiply(y1 * (1 - y1), W), W1)
dif_2 = tf.matmul(tf.multiply(y1 * (1 - y1), W), W2)
# ===============================================
# loss function
# residuals of the coupled system:
#   phi_1' = cos(x) + phi_1^2 + phi_2 - (1 + x^2 + sin^2(x))
#   phi_2' = 2x - (1 + x^2) sin(x) + phi_1 phi_2
de1 = dif_1 - tf.cos(x1) - y_1 ** 2 - y_2 + 1 + x1 ** 2 + (tf.sin(x1)) ** 2
de2 = dif_2 - 2 * x1 + (1 + x1 ** 2) * tf.sin(x1) - y_1 * y_2
t_loss = de1 ** 2 + de2 ** 2
# boundary penalties at x = 0: phi_1(0) = 0, phi_2(0) = 1 (first sample must be x = 0)
loss = tf.reduce_mean(t_loss) + (y_1[0]) ** 2 + (y_2[0] - 1) ** 2
# ===============================================
# training optimizer
train_step = tf.compat.v1.train.AdamOptimizer(1e-3).minimize(loss) # AdamOptimizer
init = tf.compat.v1.global_variables_initializer()
sess = tf.compat.v1.InteractiveSession()
sess.run(init)
# ===============================================
# training process: 1st training (samples roughly in [0, 1])
x_t = x_train_1st.reshape(-1, 1)
for i in range(10000):
sess.run(train_step, feed_dict={x1: x_t})
if i % 50 == 0:
total_loss = sess.run(loss, feed_dict={x1: x_t})
print("iteration={}".format(i))
print("loss={}".format(total_loss))
# data: 1st training (evaluate on the full domain [0, 3])
x_1st = x_total.reshape(-1, 1)
y_output_1_1st = sess.run(y_1, feed_dict={x1: x_1st}).flatten()
y_output_2_1st = sess.run(y_2, feed_dict={x1: x_1st}).flatten()
y_output_loss_1st = sess.run(t_loss, feed_dict={x1: x_1st}).flatten()
# ===============================================
# training process: 2nd training (samples roughly in [0, 2])
x_t = x_train_2nd.reshape(-1, 1)
for i in range(10000):
sess.run(train_step, feed_dict={x1: x_t})
if i % 50 == 0:
total_loss = sess.run(loss, feed_dict={x1: x_t})
print("iteration={}".format(i))
print("loss={}".format(total_loss))
# data: 2nd training (evaluate on the full domain [0, 3])
x_2nd = x_total.reshape(-1, 1)
y_output_1_2nd = sess.run(y_1, feed_dict={x1: x_2nd}).flatten()
y_output_2_2nd = sess.run(y_2, feed_dict={x1: x_2nd}).flatten()
y_output_loss_2nd = sess.run(t_loss, feed_dict={x1: x_2nd}).flatten()
# ===============================================
# training process: 3rd training (all samples in [0, 3])
x_t = x_total.reshape(-1, 1)
for i in range(10000):
sess.run(train_step, feed_dict={x1: x_t})
if i % 50 == 0:
total_loss = sess.run(loss, feed_dict={x1: x_t})
print("iteration={}".format(i))
print("loss={}".format(total_loss))
# data: 3rd training (evaluate on the full domain [0, 3])
x_3rd = x_total.reshape(-1, 1)
y_output_1_3rd = sess.run(y_1, feed_dict={x1: x_3rd}).flatten()
y_output_2_3rd = sess.run(y_2, feed_dict={x1: x_3rd}).flatten()
y_output_loss_3rd = sess.run(t_loss, feed_dict={x1: x_3rd}).flatten()
# ===============================================
# training model saving
saver = tf.compat.v1.train.Saver(max_to_keep=1)  # save the trained model
saver.save(sess, 'ckpt/nn.ckpt', global_step=10000)
# to reload it in a later session: saver.restore(sess, "ckpt/nn.ckpt-10000")
# ===============================================
# plot fitting result
plt.figure(1)
plt.xlabel("x")
plt.ylabel("fitting result of phi_1, phi_2")
plt.plot(x_total, y_output_1_1st, color='green', linestyle='--') # phi_1 NN First Iteration Solution
plt.plot(x_total, y_output_1_2nd, color='purple', linestyle='--') # phi_1 NN Second Iteration Solution
plt.plot(x_total, y_output_1_3rd, color='blue', linestyle='-') # phi_1 NN Third Iteration Solution
plt.plot(x_total, y_output_2_1st, color='orange', linestyle='--') # phi_2 NN First Iteration Solution
plt.plot(x_total, y_output_2_2nd, color='maroon', linestyle='--') # phi_2 NN Second Iteration Solution
plt.plot(x_total, y_output_2_3rd, color='red', linestyle='-') # phi_2 NN Third Iteration Solution
plt.legend(
['phi_1 NN First Iteration Solution', 'phi_1 NN Second Iteration Solution', 'phi_1 NN Third Iteration Solution',
'phi_2 NN First Iteration Solution', 'phi_2 NN Second Iteration Solution', 'phi_2 NN Third Iteration Solution'],
loc='upper left')
plt.xlim(0, 3)
plt.show()
The program output is shown below (I only plotted the three fitted curves for each of the two functions; since I did not derive the analytic solution, it is not plotted):
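Although I left the analytic solution open above, it can actually be read off the residuals `de1` and `de2` in the code: phi_1(x) = sin(x) and phi_2(x) = 1 + x^2 make both residuals vanish identically and satisfy the boundary conditions phi_1(0) = 0, phi_2(0) = 1. A quick substitution check (my own verification, not from the original post):

```python
import numpy as np

x = np.linspace(0.0, 3.0, 100)
phi1 = np.sin(x)          # candidate phi_1
phi2 = 1.0 + x ** 2       # candidate phi_2
dphi1 = np.cos(x)         # phi_1'
dphi2 = 2.0 * x           # phi_2'

# same residuals as de1, de2 in the code above, evaluated on the candidates
de1 = dphi1 - np.cos(x) - phi1 ** 2 - phi2 + 1 + x ** 2 + np.sin(x) ** 2
de2 = dphi2 - 2 * x + (1 + x ** 2) * np.sin(x) - phi1 * phi2
print(np.max(np.abs(de1)), np.max(np.abs(de2)))  # both at machine-precision zero
```

Every term cancels symbolically, so both residuals are zero up to floating-point rounding over the whole interval [0, 3].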
The results are close to the paper's. To sum up, a few thoughts:
- You can try the code above yourself: if you train a single stage directly on all the sample points in [0, 3], the fit can come out different on every run and far from the analytic solution, which confirms the problem is real.
- For a function-fitting problem, if the network is fairly simple and the local-minimum issue described in the paper shows up, gradually enlarging the range of training samples is one way to avoid it. (Could adding more layers solve it instead?)
- Since two coupled equations can be solved, solving a matrix differential equation should also be feasible; the loss can then be defined as the Euclidean norm (or another norm) of the equation residual.
- If a simple method solves the problem, don't reach for a complicated one, even if the latter looks more impressive.
- Solving partial differential equations is worth trying next; the paper covers that as well.
- Analyzing neural networks at the theoretical level is still hard; much of the time it is just verify, try, and verify again.
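On the norm-loss point above: `t_loss = de1 ** 2 + de2 ** 2` in the code is already the squared Euclidean norm of the two-component residual vector, and the same pattern extends to any number of equations. A minimal sketch (the residual values here are made up for illustration):

```python
import numpy as np

# stack the per-equation residuals into one vector per sample point:
# R(x) = [de1(x), de2(x), ...]; the loss is then mean ||R(x)||_2^2 over samples
def norm_loss(residuals):
    """residuals: array of shape (n_samples, n_equations)."""
    return np.mean(np.sum(residuals ** 2, axis=1))

R = np.array([[0.1, -0.2],
              [0.0,  0.3]])
print(norm_loss(R))  # mean of (0.01 + 0.04) and (0.00 + 0.09), i.e. 0.07
```

Swapping `np.sum(residuals ** 2, axis=1)` for another norm (e.g. `np.max(np.abs(residuals), axis=1)`) changes which residual components dominate the gradient.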
If anything above is wrong, corrections are welcome; feel free to discuss.



