I have just started learning neural networks. I previously studied a blog post that explained an arxiv.org paper on solving ordinary differential equations with a neural network; the post introduced the paper's idea and worked through an ODE example. Here I write down my own understanding and try to reproduce the paper's coupled ODE system example, including the problems that come up and how to work around them.
Links to the original post and paper:

- Tensorflow一个很简单的神经网络求解常微分及偏微分方程 (Trytobenice, CSDN): https://blog.csdn.net/qq_39817721/article/details/88875099
- NNforDEsPT (arXiv): https://arxiv.org/pdf/1902.05563.pdf

First, for the first-order ODE on page 3 of the paper, the original post already explains things in detail: a single-hidden-layer network with 10 neurons is used, the loss function is the square of the ODE residual, and with the Adam optimizer the fitting error over the domain stays below 0.01. In addition, a commenter pointed out that replacing every tf.zeros() in the post's code with tf.random_normal() further speeds up the loss decrease and improves the final accuracy, and that bias terms can be added. The overall network structure is roughly as shown below:
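Before moving on, the single-hidden-layer setup just described can be sketched in plain numpy (my own sketch, not the post's TensorFlow code; the weight names mirror the ones used in the code later). It shows the trial network and the exact input derivative that the ODE residual needs, checked against a finite difference:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(size=(1, 10))    # hidden layer weights
b = rng.normal(size=(10,))      # hidden layer bias
W1 = rng.normal(size=(10, 1))   # output layer weights
b1 = rng.normal(size=(1,))      # output layer bias

def net(x):
    """Single-hidden-layer network: x -> sigmoid(x W + b) W1 + b1."""
    h = sigmoid(x @ W + b)
    return h @ W1 + b1

def dnet_dx(x):
    """Exact derivative w.r.t. x, using sigmoid'(z) = s(z) * (1 - s(z))."""
    h = sigmoid(x @ W + b)
    return (h * (1 - h) * W) @ W1

# finite-difference check of the analytic derivative
x = np.linspace(0, 1, 5).reshape(-1, 1)
eps = 1e-6
fd = (net(x + eps) - net(x - eps)) / (2 * eps)
print(np.max(np.abs(fd - dnet_dx(x))))  # agreement up to finite-difference error
```

The same derivative identity is what the `dif_1` / `dif_2` lines in the full code below rely on.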
Based on this approach, consider the following coupled system of ordinary differential equations:
The paper solves it with the same single-hidden-layer network, still with 10 hidden neurons; since there are now two ODEs to satisfy, the output layer naturally becomes 2 neurons, so the network structure turns into the figure below:
However, as the paper notes, both boundary conditions are given at the same end of the domain, x = 0, which can make the solve numerically unstable. This is related to the loss hypersurface and can cause the network to converge to a local minimum. The fix proposed in the paper is to split the training sample points and train the network in several stages, gradually enlarging the range of training samples as training proceeds.
Following this idea, when stage k finishes, the trained weights are kept and stage k+1 continues from them, so in stage k+1 the network only has to learn the newly added sample points, as shown below:
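The staged schedule described above can be sketched as follows (bookkeeping only; the window boundaries match the code later in this post, and `train_stage` is a hypothetical placeholder for one round of optimizer steps on the same, carried-over weights):

```python
import numpy as np

x_total = np.linspace(0.0, 3.0, 100)

# expanding training windows; each later stage is a superset of the earlier ones,
# and the network weights are carried over between stages (never re-initialized)
stage_windows = [
    x_total[x_total <= 1.0],  # stage 1: roughly [0, 1]
    x_total[x_total <= 2.0],  # stage 2: roughly [0, 2]
    x_total,                  # stage 3: all of [0, 3]
]

def train_stage(samples, epochs=10000):
    """Hypothetical placeholder: run `epochs` optimizer steps on `samples`."""
    pass

for k, window in enumerate(stage_windows, start=1):
    train_stage(window)  # stage k+1 mostly has to learn the newly added points
    print(f"stage {k}: {len(window)} samples, up to x = {window[-1]:.2f}")
```

Because every window is a prefix of the next, each stage only extends the region the network has already fitted.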
The paper's simulation results are shown below, in three parts from top to bottom. The first part shows the estimated (fitted) values of the two functions. Taking one of them as an example, the yellow dashed line is the result of the first stage: as the paper describes, stage one trains on samples in [0, 1], so the yellow dashed curve matches the analytic solution well at the front (the error / ODE-residual plots underneath confirm this) but drifts further off the farther you go. Likewise, stage two trains on samples in [0, 2], so a larger portion of the fit matches the analytic solution, though the tail still deviates noticeably; only after stage three does the fitted solution almost coincide with the analytic one.
This is incremental training; how much to enlarge the window each stage and how many iterations to run per stage should be tuned to the specific problem. The Python code below is adapted from the original post (TensorFlow 2.6.0, numpy 1.19.5):
# ===============================================
# Time: 2021-09-28
# File: coupled_v3.py
# Version: 3.0
# Title: coupled differential equation solver
# Note1: analytic solution not derived in this script
# Note2: train by stages [0,1] -> [0,2] -> [0,3]
# ===============================================
# package used in this code:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
# ===============================================
# training sample points and analytic solution
x_total = np.linspace(0, 3, 100, endpoint=True)  # training points in [0, 3]
x_train_1st = x_total[:32]  # stage 1: samples roughly in [0, 1]
x_train_2nd = x_total[:68]  # stage 2: samples roughly in [0, 2]
# y_trail_1 = # i don't know analytic solution
# y_trail_2 = # if you know, please update
# ===============================================
# hidden layer setting
x1 = tf.compat.v1.placeholder("float", [None, 1]) # placeholder
W = tf.Variable(tf.random.normal([1, 10])) # hidden layer weights
b = tf.Variable(tf.random.normal([10])) # hidden layer bias
y1 = tf.nn.sigmoid(tf.matmul(x1, W) + b) # sigmoid function
# ===============================================
# output layer setting
W1 = tf.Variable(tf.random.normal([10, 1])) # output weights of phi_1, phi_2
W2 = tf.Variable(tf.random.normal([10, 1]))
b1 = tf.Variable(tf.random.normal([1])) # output bias of phi_1, phi_2
b2 = tf.Variable(tf.random.normal([1]))
# ===============================================
# trial solutions: network outputs for phi_1, phi_2
y_1 = tf.matmul(y1, W1) + b1
y_2 = tf.matmul(y1, W2) + b2
# ===============================================
# differential operator: with y1 = sigmoid(x W + b), the exact derivative is
# dy1/dx = y1 * (1 - y1) * W, since sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)),
# hence dy_k/dx = (y1 * (1 - y1) * W) @ W_k
dif_1 = tf.matmul(tf.multiply(y1 * (1 - y1), W), W1)
dif_2 = tf.matmul(tf.multiply(y1 * (1 - y1), W), W2)
# ===============================================
# loss function
# residuals of the coupled system:
#   phi_1' = cos(x) + phi_1^2 + phi_2 - (1 + x^2 + sin^2(x))
#   phi_2' = 2x - (1 + x^2) sin(x) + phi_1 phi_2
de1 = dif_1 - tf.cos(x1) - y_1 ** 2 - y_2 + 1 + x1 ** 2 + (tf.sin(x1)) ** 2
de2 = dif_2 - 2 * x1 + (1 + x1 ** 2) * tf.sin(x1) - y_1 * y_2
t_loss = de1 ** 2 + de2 ** 2
# boundary penalties at x = 0: phi_1(0) = 0, phi_2(0) = 1 (first sample must be x = 0)
loss = tf.reduce_mean(t_loss) + (y_1[0]) ** 2 + (y_2[0] - 1) ** 2
# ===============================================
# training optimizer
train_step = tf.compat.v1.train.AdamOptimizer(1e-3).minimize(loss) # AdamOptimizer
init = tf.compat.v1.global_variables_initializer()
sess = tf.compat.v1.InteractiveSession()
sess.run(init)
# ===============================================
# training process: 1st training (samples roughly in [0, 1])
x_t = x_train_1st.reshape(-1, 1)
for i in range(10000):
sess.run(train_step, feed_dict={x1: x_t})
if i % 50 == 0:
total_loss = sess.run(loss, feed_dict={x1: x_t})
print("iteration={}".format(i))
print("loss={}".format(total_loss))
# data: 1st training (evaluate on the full domain [0, 3])
x_1st = x_total.reshape(-1, 1)
y_output_1_1st = sess.run(y_1, feed_dict={x1: x_1st}).flatten()
y_output_2_1st = sess.run(y_2, feed_dict={x1: x_1st}).flatten()
y_output_loss_1st = sess.run(t_loss, feed_dict={x1: x_1st}).flatten()
# ===============================================
# training process: 2nd training (samples roughly in [0, 2])
x_t = x_train_2nd.reshape(-1, 1)
for i in range(10000):
sess.run(train_step, feed_dict={x1: x_t})
if i % 50 == 0:
total_loss = sess.run(loss, feed_dict={x1: x_t})
print("iteration={}".format(i))
print("loss={}".format(total_loss))
# data: 2nd training (evaluate on the full domain [0, 3])
x_2nd = x_total.reshape(-1, 1)
y_output_1_2nd = sess.run(y_1, feed_dict={x1: x_2nd}).flatten()
y_output_2_2nd = sess.run(y_2, feed_dict={x1: x_2nd}).flatten()
y_output_loss_2nd = sess.run(t_loss, feed_dict={x1: x_2nd}).flatten()
# ===============================================
# training process: 3rd training (all samples in [0, 3])
x_t = x_total.reshape(-1, 1)
for i in range(10000):
sess.run(train_step, feed_dict={x1: x_t})
if i % 50 == 0:
total_loss = sess.run(loss, feed_dict={x1: x_t})
print("iteration={}".format(i))
print("loss={}".format(total_loss))
# data: 3rd training (evaluate on the full domain [0, 3])
x_3rd = x_total.reshape(-1, 1)
y_output_1_3rd = sess.run(y_1, feed_dict={x1: x_3rd}).flatten()
y_output_2_3rd = sess.run(y_2, feed_dict={x1: x_3rd}).flatten()
y_output_loss_3rd = sess.run(t_loss, feed_dict={x1: x_3rd}).flatten()
# ===============================================
# training model saving
saver = tf.compat.v1.train.Saver(max_to_keep=1)  # save the trained model
saver.save(sess, 'ckpt/nn.ckpt', global_step=10000)
# to reload it in a later session: saver.restore(sess, "ckpt/nn.ckpt-10000")
# ===============================================
# plot fitting result
plt.figure(1)
plt.xlabel("x")
plt.ylabel("fitting result of phi_1, phi_2")
plt.plot(x_total, y_output_1_1st, color='green', linestyle='--') # phi_1 NN First Iteration Solution
plt.plot(x_total, y_output_1_2nd, color='purple', linestyle='--') # phi_1 NN Second Iteration Solution
plt.plot(x_total, y_output_1_3rd, color='blue', linestyle='-') # phi_1 NN Third Iteration Solution
plt.plot(x_total, y_output_2_1st, color='orange', linestyle='--') # phi_2 NN First Iteration Solution
plt.plot(x_total, y_output_2_2nd, color='maroon', linestyle='--') # phi_2 NN Second Iteration Solution
plt.plot(x_total, y_output_2_3rd, color='red', linestyle='-') # phi_2 NN Third Iteration Solution
plt.legend(
['phi_1 NN First Iteration Solution', 'phi_1 NN Second Iteration Solution', 'phi_1 NN Third Iteration Solution',
'phi_2 NN First Iteration Solution', 'phi_2 NN Second Iteration Solution', 'phi_2 NN Third Iteration Solution'],
loc='upper left')
plt.xlim(0, 3)
plt.show()
The program output is shown below (I only plotted the three fitted curves for each of the two functions; since I did not derive the analytic solution, it is not plotted):
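Although I left the analytic solution open above, it can actually be read off the residuals `de1` and `de2` in the code: phi_1(x) = sin(x) and phi_2(x) = 1 + x^2 make both residuals vanish identically and satisfy the boundary conditions phi_1(0) = 0, phi_2(0) = 1. A quick substitution check (my own verification, not from the original post):

```python
import numpy as np

x = np.linspace(0.0, 3.0, 100)
phi1 = np.sin(x)          # candidate phi_1
phi2 = 1.0 + x ** 2       # candidate phi_2
dphi1 = np.cos(x)         # phi_1'
dphi2 = 2.0 * x           # phi_2'

# same residuals as de1, de2 in the code above, evaluated on the candidates
de1 = dphi1 - np.cos(x) - phi1 ** 2 - phi2 + 1 + x ** 2 + np.sin(x) ** 2
de2 = dphi2 - 2 * x + (1 + x ** 2) * np.sin(x) - phi1 * phi2
print(np.max(np.abs(de1)), np.max(np.abs(de2)))  # both at machine-precision zero
```

Every term cancels symbolically, so both residuals are zero up to floating-point rounding over the whole interval [0, 3].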
The results are close to the paper's. To sum up, a few thoughts:
- You can try the code above yourself: if you train a single stage directly on all the sample points in [0, 3], the fit can come out different on every run and far from the analytic solution, which confirms the problem is real.
- For a function-fitting problem, if the network is fairly simple and the local-minimum issue described in the paper shows up, gradually enlarging the range of training samples is one way to avoid it. (Could adding more layers solve it instead?)
- Since two coupled equations can be solved, solving a matrix differential equation should also be feasible; the loss can then be defined as the Euclidean norm (or another norm) of the equation residual.
- If a simple method solves the problem, don't reach for a complicated one, even if the latter looks more impressive.
- Solving partial differential equations is worth trying next; the paper covers that as well.
- Analyzing neural networks at the theoretical level is still hard; much of the time it is just verify, try, and verify again.
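On the norm-loss point above: `t_loss = de1 ** 2 + de2 ** 2` in the code is already the squared Euclidean norm of the two-component residual vector, and the same pattern extends to any number of equations. A minimal sketch (the residual values here are made up for illustration):

```python
import numpy as np

# stack the per-equation residuals into one vector per sample point:
# R(x) = [de1(x), de2(x), ...]; the loss is then mean ||R(x)||_2^2 over samples
def norm_loss(residuals):
    """residuals: array of shape (n_samples, n_equations)."""
    return np.mean(np.sum(residuals ** 2, axis=1))

R = np.array([[0.1, -0.2],
              [0.0,  0.3]])
print(norm_loss(R))  # mean of (0.01 + 0.04) and (0.00 + 0.09), i.e. 0.07
```

Swapping `np.sum(residuals ** 2, axis=1)` for another norm (e.g. `np.max(np.abs(residuals), axis=1)`) changes which residual components dominate the gradient.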
If anything above is wrong, corrections are welcome; feel free to discuss.



