
Solving a Coupled System of Ordinary Differential Equations with a Simple Neural Network in TensorFlow



I have just started learning about neural networks. In an earlier post I studied the approach, taken from a paper on arxiv.org, of solving ordinary differential equations with a neural network; that post explained the paper's idea and worked through an ODE example. Here I write down my own understanding and try to reproduce the coupled-ODE example from the paper, along with the problems that came up and how I resolved them.

The original post and paper:

Tensorflow一个很简单的神经网络求解常微分及偏微分方程 (Trytobenice, CSDN blog): https://blog.csdn.net/qq_39817721/article/details/88875099
NNforDEsPT (arxiv.org): https://arxiv.org/pdf/1902.05563.pdf

First, for the first-order ODE on page 3 of the paper, the original post already explains things in detail: a single-hidden-layer network with 10 neurons is designed, the loss is set to the squared residual of the ODE, and within the training range an Adam optimizer brings the fitting error under 0.01. Additionally, a commenter pointed out that replacing every tf.zeros() in the original code with tf.random_normal() further speeds up the loss decrease and improves the final accuracy, and that bias terms can be added. The overall network looks roughly like the figure below:
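The derivative trick used later in the code (dif_1, dif_2) comes from the network's closed form: for N(x) = Vᵀσ(Wx + b) + c with scalar input x, dN/dx = Σⱼ Vⱼ·σⱼ(1 − σⱼ)·Wⱼ, since σ'(z) = σ(z)(1 − σ(z)). A minimal numpy sketch (all names are illustrative, not from the original code) checking this against a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=10)  # input-to-hidden weights
b = rng.normal(size=10)  # hidden biases
V = rng.normal(size=10)  # hidden-to-output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def net(x):
    # N(x) = V^T sigmoid(W x + b), output bias omitted for brevity
    return sigmoid(W * x + b) @ V

def net_dx(x):
    # closed-form derivative, using sigma' = sigma * (1 - sigma)
    s = sigmoid(W * x + b)
    return (s * (1.0 - s) * W) @ V

x0, h = 0.5, 1e-6
fd = (net(x0 + h) - net(x0 - h)) / (2 * h)  # central finite difference
print(abs(fd - net_dx(x0)))  # agreement to roughly floating-point precision
```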

Based on this idea, consider the following coupled system of ODEs:

ψ₁'(x) = cos(x) + ψ₁² + ψ₂ − (1 + x² + sin²(x))
ψ₂'(x) = 2x − (1 + x²)·sin(x) + ψ₁·ψ₂,  with ψ₁(0) = 0, ψ₂(0) = 1

The paper's approach is to solve it with the same single-hidden-layer network as before, still with 10 hidden neurons; since there are now two ODEs, the output layer naturally grows to 2 neurons, and the network becomes the one shown below:

However, as the paper notes, the given boundary conditions both sit at the same end of the domain (x = 0), which can make the neural-network solution numerically unstable. The issue is tied to the loss hypersurface, and the network may converge to a local minimum. The paper's remedy is to split the training sample points and train the network over multiple rounds, gradually enlarging the range of training samples as training proceeds.

Following this idea, after round k finishes, the trained weights are kept and round k+1 begins; in round k+1 the network then mainly has to learn the newly added sample points, as shown below:
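The round-by-round schedule can be sketched like this (plain numpy; the cut-off indices 32 and 68 match the ones used in the full code further down, and the training call itself is only indicated in a comment):

```python
import numpy as np

x_total = np.linspace(0, 3, 100)
# growing training windows, one per round; weights persist between rounds
windows = [x_total[:32], x_total[:68], x_total]
for k, w in enumerate(windows, start=1):
    # here the same optimizer loop would be run on w
    print("round {}: {} samples, x up to {:.2f}".format(k, len(w), w[-1]))
```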

The paper's simulation result is shown below. From top to bottom it has three parts; the first part shows the estimates (fits) of the two functions. Taking one of them as an example, the yellow dashed line is the first-round result: since the first round trains only on samples in [0,1], the yellow curve tracks the analytic solution well on the early segment, as the error or residual values below confirm, but drifts further off the later it goes. Likewise, the second round trains on samples in [0,2], so a longer stretch of the fit matches the analytic solution, though the tail still deviates noticeably; only in the third round does the fit nearly coincide with the analytic solution everywhere.

This is incremental training; how much to extend the range each time, and how many iterations to run per round, should be tuned to the specific problem. The Python code below is adapted from the original post (TensorFlow 2.6.0, numpy 1.19.5):

# ===============================================
# Time: 2021-09-28
# Title: coupled_v3.py
# Version: 3.0
# Title: coupled differential equation solver
# Note1: analytic solution not derived here
# Note2: train in stages [0,1] + [1,2] + [2,3]
# ===============================================
# package used in this code:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()
# ===============================================
# training sample points and analytic solution
x_total = np.linspace(0, 3, 100, endpoint=True)  # training points
x_train_1st = x_total[:32]  # 1st iteration
x_train_2nd = x_total[:68]  # 2nd iteration
# y_trail_1 =  # i don't know analytic solution
# y_trail_2 =  # if you know, please update
# ===============================================
# hidden layer setting
x1 = tf.compat.v1.placeholder("float", [None, 1])  # placeholder
W = tf.Variable(tf.random.normal([1, 10]))  # hidden layer weights
b = tf.Variable(tf.random.normal([10]))  # hidden layer bias
y1 = tf.nn.sigmoid(tf.matmul(x1, W) + b)  # sigmoid function
# ===============================================
# output layer setting
W1 = tf.Variable(tf.random.normal([10, 1]))  # output weights of phi_1, phi_2
W2 = tf.Variable(tf.random.normal([10, 1]))
b1 = tf.Variable(tf.random.normal([1]))  # output bias of phi_1, phi_2
b2 = tf.Variable(tf.random.normal([1]))
# ===============================================
# training result
y_1 = tf.matmul(y1, W1) + b1
y_2 = tf.matmul(y1, W2) + b2
# ===============================================
# differential operator
dif_1 = tf.matmul(tf.multiply(y1 * (1 - y1), W), W1)
dif_2 = tf.matmul(tf.multiply(y1 * (1 - y1), W), W2)
# ===============================================
# loss function
# coupled differential equation example
de1 = dif_1 - tf.cos(x1) - y_1 ** 2 - y_2 + 1 + x1 ** 2 + (tf.sin(x1)) ** 2
de2 = dif_2 - 2 * x1 + (1 + x1 ** 2) * tf.sin(x1) - y_1 * y_2
t_loss = de1 ** 2 + de2 ** 2
loss = tf.reduce_mean(t_loss) + (y_1[0]) ** 2 + (y_2[0] - 1) ** 2  # loss
# ===============================================
# training optimizer
train_step = tf.compat.v1.train.AdamOptimizer(1e-3).minimize(loss)  # AdamOptimizer
init = tf.compat.v1.global_variables_initializer()
sess = tf.compat.v1.InteractiveSession()
sess.run(init)
# ===============================================
# training process: 1st round
x_t = x_train_1st.reshape(-1, 1)  # column vector of training inputs
for i in range(10000):
    sess.run(train_step, feed_dict={x1: x_t})
    if i % 50 == 0:
        total_loss = sess.run(loss, feed_dict={x1: x_t})
        print("iteration={}".format(i))
        print("loss={}".format(total_loss))
# data: 1st round, evaluated on the full range
x_1st = x_total.reshape(-1, 1)
y_output_1_1st = sess.run(y_1, feed_dict={x1: x_1st}).flatten()
y_output_2_1st = sess.run(y_2, feed_dict={x1: x_1st}).flatten()
y_output_loss_1st = sess.run(t_loss, feed_dict={x1: x_1st}).flatten()
# ===============================================
# training process: 2nd round
x_t = x_train_2nd.reshape(-1, 1)  # column vector of training inputs
for i in range(10000):
    sess.run(train_step, feed_dict={x1: x_t})
    if i % 50 == 0:
        total_loss = sess.run(loss, feed_dict={x1: x_t})
        print("iteration={}".format(i))
        print("loss={}".format(total_loss))
# data: 2nd round, evaluated on the full range
x_2nd = x_total.reshape(-1, 1)
y_output_1_2nd = sess.run(y_1, feed_dict={x1: x_2nd}).flatten()
y_output_2_2nd = sess.run(y_2, feed_dict={x1: x_2nd}).flatten()
y_output_loss_2nd = sess.run(t_loss, feed_dict={x1: x_2nd}).flatten()
# ===============================================
# training process: 3rd round
x_t = x_total.reshape(-1, 1)  # column vector of training inputs
for i in range(10000):
    sess.run(train_step, feed_dict={x1: x_t})
    if i % 50 == 0:
        total_loss = sess.run(loss, feed_dict={x1: x_t})
        print("iteration={}".format(i))
        print("loss={}".format(total_loss))
# data: 3rd round, evaluated on the full range
x_3rd = x_total.reshape(-1, 1)
y_output_1_3rd = sess.run(y_1, feed_dict={x1: x_3rd}).flatten()
y_output_2_3rd = sess.run(y_2, feed_dict={x1: x_3rd}).flatten()
y_output_loss_3rd = sess.run(t_loss, feed_dict={x1: x_3rd}).flatten()
# ===============================================
# training model saving
saver = tf.compat.v1.train.Saver(max_to_keep=1)  # save the trained model
saver.save(sess, 'ckpt/nn.ckpt', global_step=10000)
saver.restore(sess, "ckpt/nn.ckpt-10000")  # reload to check the checkpoint round-trips
# ===============================================
# plot fitting result
plt.figure(1)
plt.xlabel("x")
plt.ylabel("fitting result of phi_1, phi_2")
plt.plot(x_total, y_output_1_1st, color='green', linestyle='--')  # phi_1 NN First Iteration Solution
plt.plot(x_total, y_output_1_2nd, color='purple', linestyle='--')  # phi_1 NN Second Iteration Solution
plt.plot(x_total, y_output_1_3rd, color='blue', linestyle='-')  # phi_1 NN Third Iteration Solution
plt.plot(x_total, y_output_2_1st, color='orange', linestyle='--')  # phi_2 NN First Iteration Solution
plt.plot(x_total, y_output_2_2nd, color='maroon', linestyle='--')  # phi_2 NN Second Iteration Solution
plt.plot(x_total, y_output_2_3rd, color='red', linestyle='-')  # phi_2 NN Third Iteration Solution
plt.legend(
    ['phi_1 NN First Iteration Solution', 'phi_1 NN Second Iteration Solution', 'phi_1 NN Third Iteration Solution',
     'phi_2 NN First Iteration Solution', 'phi_2 NN Second Iteration Solution', 'phi_2 NN Third Iteration Solution'],
    loc='upper left')
plt.xlim(0, 3)
plt.show()

The program output is shown below (I only plotted the three fitted curves for each of the two functions; I did not derive the analytic solution, so it is not drawn):
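A side note on the analytic solution left undrawn above: substituting ψ₁(x) = sin(x) and ψ₂(x) = 1 + x² makes both residuals de1 and de2 from the code vanish, and also satisfies ψ₁(0) = 0, ψ₂(0) = 1, so this pair solves the system. A quick stdlib check via finite differences:

```python
import math

def p1(x):  # candidate phi_1
    return math.sin(x)

def p2(x):  # candidate phi_2
    return 1 + x**2

def dp(f, x, h=1e-6):  # central finite difference
    return (f(x + h) - f(x - h)) / (2 * h)

def r1(x):  # mirrors de1 in the TensorFlow code
    return dp(p1, x) - math.cos(x) - p1(x)**2 - p2(x) + 1 + x**2 + math.sin(x)**2

def r2(x):  # mirrors de2 in the TensorFlow code
    return dp(p2, x) - 2*x + (1 + x**2)*math.sin(x) - p1(x)*p2(x)

for xv in [0.0, 0.7, 1.5, 2.9]:
    print(round(r1(xv), 6), round(r2(xv), 6))  # both ~0 at every point
```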

The results are close to those in the paper. To wrap up, a few thoughts:

  1. Try the code above yourself: if you instead train a single round on all the sample points in [0,3], the fit can come out different on every run and far from the analytic solution, which confirms the problem is real.
  2. For a function-fitting problem where the network is fairly simple and the local-minimum issue described in the paper shows up, gradually enlarging the range of fitting samples is one way around it. (Could adding more layers also solve it?)
  3. Since two coupled equations can be solved, solving matrix equations should be feasible too; the loss can then be defined as the Euclidean norm (or another norm) of the differential-equation residual.
  4. If a lean method solves the problem, don't reach for an elaborate one, even if the latter looks impressive.
  5. Solving partial differential equations is worth trying next; the original paper covers that as well.
  6. Analyzing neural networks at the theory level is still hard; much of the time it is verify, try, then verify again.
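Point 3 can be made concrete: stack the n residuals at each sample point into a vector R(xᵢ) ∈ ℝⁿ and take the mean squared Euclidean norm as the loss. A tiny numpy sketch (residual_norm_loss is an illustrative name; the graph version would be tf.reduce_mean(tf.reduce_sum(R**2, axis=1))):

```python
import numpy as np

def residual_norm_loss(R):
    """R: (num_points, n) array of per-equation residuals at each sample."""
    return np.mean(np.sum(R**2, axis=1))  # mean of ||R(x_i)||^2

# two sample points, two equations
R = np.array([[0.1, -0.2],
              [0.0,  0.3]])
print(residual_norm_loss(R))  # (0.05 + 0.09) / 2 = 0.07
```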

If anything above is wrong, corrections are welcome; feel free to get in touch and discuss.
