Matrix Derivatives: Study Notes


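Contents

1. Machine Learning Derived by Hand: Matrix Derivatives

1.1 Introduction
1.2 Why ML needs matrix derivatives
1.3 Vector functions and a first impression of matrix derivatives
1.4 Matrix derivatives: the YX stretch rule
1.5 Examples of common matrix derivative formulas
1.6 Supplementary details on differentiation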

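1. Machine Learning Derived by Hand: Matrix Derivatives

Bilibili video link

1.1 Introduction

(1) Theory

- Why ML needs matrix derivatives
- Vector functions and a first impression of matrix derivatives
- Matrix derivatives: the YX stretch rule

(2) Practice

- Examples of common matrix derivative formulas
- Supplementary notes on matrix derivatives
- Least squares

1.2 Why ML needs matrix derivatives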

Vectorizing the data makes the computation simpler.
Consider the following system of equations:
$$y_1=W_1x_{11}+W_2x_{12}$$
$$y_2=W_1x_{21}+W_2x_{22}$$

In vectorized form it can be written compactly as
$$\begin{bmatrix}y_1\\ y_2\end{bmatrix}=\begin{bmatrix}x_{11}&x_{12}\\ x_{21}&x_{22}\end{bmatrix}\begin{bmatrix}W_1\\ W_2\end{bmatrix}\tag{1}$$
$$Y=XW\tag{2}$$
As shown above, no matter how many x, y and W terms we add, the whole system can always be written as Eq. (2).
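A minimal numpy sketch of Eq. (2), using made-up values for X and W (they are illustrative assumptions, not from the original notes). The equation-by-equation sums and the single matrix product give the same Y, and the same `X @ W` expression keeps working when more samples or weights are added.

```python
import numpy as np

# Illustrative values: two equations (rows of X), two weights.
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
W = np.array([5.0, 6.0])

# Equation by equation, as in the original system.
y1 = W[0] * X[0, 0] + W[1] * X[0, 1]
y2 = W[0] * X[1, 0] + W[1] * X[1, 1]

# Vectorized form, Eq. (2): Y = XW.
Y = X @ W
print(Y)  # [17. 39.]

# Adding more rows (samples) or columns (weights) does not change the expression.
rng = np.random.default_rng(0)
X_big = rng.random((5, 3))
W_big = rng.random(3)
Y_big = X_big @ W_big  # shape (5,)
```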
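for loops vs. numpy matrix operations

Vectorized computation is fast

Let's run the same computation on the same data once with a for loop and once with numpy's matrix operations, and compare the timings.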

```python
# -*- coding: utf-8 -*-
# @Project: zc
# @Author: zc
# @File name: numpy_new_test
# @Create time: 2022/3/16 18:43
import time

import numpy as np

# Two random vectors with 10 million elements each
a = np.random.rand(10000000)
b = np.random.rand(10000000)

# Dot product computed by numpy (vectorized)
time_cur = time.perf_counter()
c = a.dot(b)
time_later = time.perf_counter()
print(f"c={c}")
vec_time = 1000 * (time_later - time_cur)  # elapsed time in ms
print("vectorized is " + str(vec_time) + "ms")
print()

# The same dot product with a plain Python for loop
c = 0
time_cur = time.perf_counter()
for i in range(a.size):
    c += a[i] * b[i]
time_later = time.perf_counter()
print(f"c={c}")
loop_time = 1000 * (time_later - time_cur)  # elapsed time in ms
print("Loop is " + str(loop_time) + "ms")
print()

# How many times slower the loop is than numpy
print("times is " + str(loop_time / vec_time))
```

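```
# Time for the vectorized version (numpy)
c=2499945.9800939467
vectorized is 7.472991943359375ms

# Time for the for-loop version
c=2499945.9800934764
Loop is 3543.708086013794ms

# numpy was about 474 times faster than the for loop
times is 474.2020482388974
```

In the author's run numpy was about 474 times faster; the exact ratio depends on the machine. The two values of c differ only in the last few digits because the floating-point additions are performed in a different order.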
1.3 Vector functions and a first impression of matrix derivatives

Scalar function: a function whose output is a scalar.
$$f(x)=x^2;\quad x\in\mathbb{R};\quad f(x)=x^2\in\mathbb{R}$$
$$f(x)=x_1^2+x_2^2;\quad x=[x_1,x_2]\in\mathbb{R}^2;\quad f(x)=x_1^2+x_2^2\in\mathbb{R}\tag{3}$$
Functions with a scalar input and a vector or matrix output:
$$f(x)=\begin{bmatrix}f_1(x)=x\\ f_2(x)=x^2\end{bmatrix};\quad x\in\mathbb{R};\quad \begin{bmatrix}f_1(x)\\ f_2(x)\end{bmatrix}\in\mathbb{R}^2\tag{4}$$
$$f(x)=\begin{bmatrix}f_{11}(x)=x&f_{12}(x)=x^2\\ f_{21}(x)=x^3&f_{22}(x)=x^4\end{bmatrix};\quad x\in\mathbb{R};\quad \begin{bmatrix}f_{11}(x)&f_{12}(x)\\ f_{21}(x)&f_{22}(x)\end{bmatrix}\in\mathbb{R}^{2\times 2}\tag{5}$$
Functions with a vector input and a matrix output:
$$f(x_1,x_2)=\begin{bmatrix}f_{11}(x)=x_1+x_2&f_{12}(x)=x_1^2+x_2^2\\ f_{21}(x)=x_1^3+x_2^3&f_{22}(x)=x_1^4+x_2^4\end{bmatrix};\quad x\in\mathbb{R}^2;\quad \begin{bmatrix}f_{11}(x)&f_{12}(x)\\ f_{21}(x)&f_{22}(x)\end{bmatrix}\in\mathbb{R}^{2\times 2}\tag{6}$$
The essence of differentiation
$$\frac{\partial A}{\partial B}=\ ?$$
This means: take the derivative of every element of A with respect to every element of B.

1.4 Matrix derivatives: the YX stretch rule

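Scalars stay unchanged and vectors are stretched: the one in front is stretched horizontally, the one at the back vertically (YX: Y is in front and is stretched horizontally; X is at the back and is stretched vertically).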
(1) Suppose f(x) is a scalar and x is a vector:
$$f(x_1,x_2,\dots,x_n)=x_1+x_2+\dots+x_n\tag{7}$$
$$x=[x_1,x_2,\dots,x_n]^T\tag{8}$$
The scalar f(x) stays as it is and the vector x is stretched. In $\frac{\partial f(x)}{\partial x}$ the order is YX with X at the back, so x is stretched vertically while the scalar f(x) is unchanged:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n}\end{bmatrix}\tag{9}$$
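Eq. (9) can be checked numerically for the f in Eq. (7), where every partial derivative equals 1. This is a minimal sketch that estimates the derivatives with central finite differences (a verification technique chosen here, not part of the notes); the helper `numerical_gradient`, the step `h` and the test point are illustrative assumptions. Following the YX rule, the result is laid out as an (n, 1) column.

```python
import numpy as np

def f(x):
    # Eq. (7): f(x) = x_1 + x_2 + ... + x_n, a scalar
    return np.sum(x)

def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of d func / d x as an (n, 1) column (YX rule)."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros((x.size, 1))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i, 0] = (func(x + step) - func(x - step)) / (2 * h)
    return grad

x = np.array([0.5, -1.0, 2.0, 3.0])
g = numerical_gradient(f, x)
print(g.ravel())  # every entry is (approximately) 1
```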
(2) Suppose f(x) is a vector and x is a scalar. Since x is a scalar, it stays unchanged; since Y comes first in YX, Y is stretched horizontally:
$$f(x)=\begin{bmatrix}f_1(x)\\ f_2(x)\\ \vdots\\ f_n(x)\end{bmatrix}\tag{10}$$
The scalar x is unchanged, and Y = f(x), which is in front, is stretched horizontally:
$$\frac{\partial f(x)}{\partial x}=\left[\frac{\partial f_1(x)}{\partial x},\frac{\partial f_2(x)}{\partial x},\dots,\frac{\partial f_n(x)}{\partial x}\right]\tag{11}$$
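A small sketch of Eq. (11), assuming the made-up example $f(x)=[x, x^2, x^3]^T$ (n = 3) evaluated at x = 2. The hypothetical helper `derivative_row` estimates each $\partial f_i/\partial x$ with central differences and lays the results out horizontally as a (1, n) row, matching the analytic row $[1, 2x, 3x^2]$.

```python
import numpy as np

def f(x):
    # Illustrative instance of Eq. (10) with n = 3: f(x) = [x, x^2, x^3]^T
    return np.array([x, x**2, x**3])

def derivative_row(func, x, h=1e-6):
    """Central-difference derivative of a vector-valued func of a scalar x,
    laid out as a (1, n) row (YX rule: Y in front is stretched horizontally)."""
    d = (func(x + h) - func(x - h)) / (2 * h)
    return d.reshape(1, -1)

x = 2.0
D = derivative_row(f, x)
analytic = np.array([[1.0, 2 * x, 3 * x**2]])  # [1, 2x, 3x^2] = [1, 4, 12]
print(D.shape)  # (1, 3)
```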
(3) Suppose f(x) is a vector function and x is a vector:
$$f(x)=\begin{bmatrix}f_1(x)\\ f_2(x)\\ \vdots\\ f_n(x)\end{bmatrix};\quad x=\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}\tag{12}$$
First stretch X: X is at the back of YX, so it is stretched vertically, while f(x) is kept as it is for now:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n}\end{bmatrix}\tag{13}$$
Then stretch Y = f(x): Y is in front, so it is stretched horizontally:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n}\end{bmatrix}=\begin{bmatrix}\frac{\partial f_1(x)}{\partial x_1}&\frac{\partial f_2(x)}{\partial x_1}&\dots&\frac{\partial f_n(x)}{\partial x_1}\\ \frac{\partial f_1(x)}{\partial x_2}&\frac{\partial f_2(x)}{\partial x_2}&\dots&\frac{\partial f_n(x)}{\partial x_2}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_1(x)}{\partial x_n}&\frac{\partial f_2(x)}{\partial x_n}&\dots&\frac{\partial f_n(x)}{\partial x_n}\end{bmatrix}\tag{14}$$

1.5 Examples of common matrix derivative formulas

(1) f(x) is a scalar and x is a vector
$$f(x)=A^TX\tag{15}$$
$$A=[a_1,a_2,\dots,a_n]^T;\quad X=[x_1,x_2,\dots,x_n]^T\tag{16}$$

Since f(x) is a scalar it stays unchanged; X is at the back of YX, so X is stretched vertically:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n}\end{bmatrix}\tag{17}$$
Since $f(x)=\sum_{i=1}^n a_ix_i$, the partial derivatives are:
$$\frac{\partial f(x)}{\partial x_i}=a_i\tag{18}$$
So the derivative is:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}a_1\\ a_2\\ \vdots\\ a_n\end{bmatrix}=A\tag{19}$$
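A quick numerical check of Eq. (19): for $f(x)=A^TX$ the gradient should be A itself. This sketch draws A and X at random (illustrative values, not from the notes), keeps both as (n, 1) columns, and estimates the gradient with central differences, a verification technique chosen here rather than taken from the derivation.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5
A = rng.normal(size=(n, 1))  # column vector A = [a_1, ..., a_n]^T
X = rng.normal(size=(n, 1))  # column vector X = [x_1, ..., x_n]^T

def f(x):
    # Eq. (15): f(x) = A^T x, a 1x1 array converted to a Python float
    return (A.T @ x).item()

h = 1e-6
grad = np.zeros((n, 1))  # YX rule: stretched vertically into a column
for i in range(n):
    e = np.zeros((n, 1))
    e[i, 0] = h
    grad[i, 0] = (f(X + e) - f(X - e)) / (2 * h)

print(np.allclose(grad, A, atol=1e-6))  # True
```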
(2) f(x) is a quadratic form and x is a column vector
$$f(x)=X^TAX=\sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j\tag{20}$$
$$X=[x_1,x_2,\dots,x_n]^T;\quad A=\begin{bmatrix}a_{11}&a_{12}&\dots&a_{1n}\\ a_{21}&a_{22}&\dots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\dots&a_{nn}\end{bmatrix}\tag{21}$$
f(x) is a scalar, so in YX the X is stretched vertically:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\ \frac{\partial f(x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(x)}{\partial x_n}\end{bmatrix}=\begin{bmatrix}\sum_{j=1}^n a_{1j}x_j+\sum_{i=1}^n a_{i1}x_i\\ \sum_{j=1}^n a_{2j}x_j+\sum_{i=1}^n a_{i2}x_i\\ \vdots\\ \sum_{j=1}^n a_{nj}x_j+\sum_{i=1}^n a_{in}x_i\end{bmatrix}\tag{22}$$
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\sum_{j=1}^n a_{1j}x_j\\ \sum_{j=1}^n a_{2j}x_j\\ \vdots\\ \sum_{j=1}^n a_{nj}x_j\end{bmatrix}+\begin{bmatrix}\sum_{i=1}^n a_{i1}x_i\\ \sum_{i=1}^n a_{i2}x_i\\ \vdots\\ \sum_{i=1}^n a_{in}x_i\end{bmatrix}\tag{23}$$
The first vector in (23) is $AX$:
$$\begin{bmatrix}\sum_{j=1}^n a_{1j}x_j\\ \sum_{j=1}^n a_{2j}x_j\\ \vdots\\ \sum_{j=1}^n a_{nj}x_j\end{bmatrix}=\begin{bmatrix}a_{11}&a_{12}&\dots&a_{1n}\\ a_{21}&a_{22}&\dots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\dots&a_{nn}\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}=AX\tag{25}$$
The second vector is $A^TX$:
$$\begin{bmatrix}\sum_{i=1}^n a_{i1}x_i\\ \sum_{i=1}^n a_{i2}x_i\\ \vdots\\ \sum_{i=1}^n a_{in}x_i\end{bmatrix}=\begin{bmatrix}a_{11}&a_{21}&\dots&a_{n1}\\ a_{12}&a_{22}&\dots&a_{n2}\\ \vdots&\vdots&\ddots&\vdots\\ a_{1n}&a_{2n}&\dots&a_{nn}\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}=A^TX\tag{26}$$
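The index sums in (22) and their matrix forms in (25) and (26) can be checked numerically before the two pieces are combined. This is a sketch with a randomly generated, non-symmetric A and a random x (illustrative values, not from the notes). As an independent check it also estimates the gradient of $x^TAx$ with central finite differences, a verification technique swapped in here, not part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))  # general matrix, not necessarily symmetric
x = rng.normal(size=n)

# Eq. (22), component by component: df/dx_k = sum_j a_kj x_j + sum_i a_ik x_i
grad_loop = np.zeros(n)
for k in range(n):
    first = sum(A[k, j] * x[j] for j in range(n))   # k-th entry of AX, Eq. (25)
    second = sum(A[i, k] * x[i] for i in range(n))  # k-th entry of A^T X, Eq. (26)
    grad_loop[k] = first + second

# The same two pieces written as matrix products
grad_matrix = A @ x + A.T @ x

# Independent check: central differences on f(x) = x^T A x
def f(v):
    return v @ A @ v

h = 1e-6
grad_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])
print(np.allclose(grad_loop, grad_matrix))  # True
```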
$$\frac{\partial f(x)}{\partial x}=\frac{\partial (X^TAX)}{\partial x}=AX+A^TX=(A+A^T)X\tag{27}$$
When A is a symmetric matrix, $A^T=A$, and the expression above becomes:
$$\frac{\partial f(x)}{\partial x}=\frac{\partial (X^TAX)}{\partial x}=AX+A^TX=2AX\tag{28}$$

1.6 Supplementary details on differentiation

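Difference between the numerator layout and the denominator layout:
For details, see this Zhihu post by an expert: Numerator vs. denominator layout explained

- Denominator layout: the YX stretch rule. X is at the back, as in the fraction Y/X.
- Numerator layout: the XY stretch rule. X is in front, as in the fraction X/Y.
- Difference: the two layouts differ only in the direction in which the vectors are stretched. The mnemonic itself never changes: the one in front is stretched horizontally, the one at the back vertically.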
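The difference between the two layouts is easiest to see on a vector function of a vector, as in (14). A minimal sketch, assuming a made-up $f:\mathbb{R}^3\to\mathbb{R}^2$ and a hypothetical helper `jacobian_denominator` that estimates derivatives with central differences: the YX rule (denominator layout) gives a 3×2 matrix whose rows follow x and whose columns follow f, and the XY rule (numerator layout) gives its transpose, a 2×3 matrix.

```python
import numpy as np

def f(x):
    # Illustrative f: R^3 -> R^2, f(x) = [x1*x2, x2 + x3^2]
    return np.array([x[0] * x[1], x[1] + x[2] ** 2])

def jacobian_denominator(func, x, h=1e-6):
    """YX rule (denominator layout): row i holds [df_1/dx_i, ..., df_m/dx_i], shape (n, m)."""
    x = np.asarray(x, dtype=float)
    rows = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        # derivative of the whole vector f w.r.t. x_i, laid out horizontally
        rows.append((func(x + e) - func(x - e)) / (2 * h))
    return np.array(rows)

x = np.array([1.0, 2.0, 3.0])
J_den = jacobian_denominator(f, x)  # denominator layout, shape (3, 2)
J_num = J_den.T                     # numerator layout, shape (2, 3)
print(J_den.shape, J_num.shape)  # (3, 2) (2, 3)
```

Analytically, at x = [1, 2, 3] the denominator-layout matrix is [[2, 0], [1, 1], [0, 6]].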

Article URL: https://www.mshxw.com/it/770106.html