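1. Hand-Derived Machine Learning: Matrix Derivatives
Bilibili link
1.1 Introduction
(1) Theory
Why ML needs matrix derivatives; vector functions and a first look at matrix derivatives; matrix derivatives and the YX stretching rule
(2) Practice
Examples of common matrix derivative formulas; supplementary notes on matrix derivatives; least squares
1.2 Why ML needs matrix derivatives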
Vectorized data makes the computation simpler.
Consider a system of equations:
$$y_1=W_1x_{11}+W_2x_{12}$$
$$y_2=W_1x_{21}+W_2x_{22}$$
In vectorized form this can be abbreviated as
$$\begin{bmatrix}y_1\\y_2\end{bmatrix}=\begin{bmatrix}x_{11}&x_{12}\\x_{21}&x_{22}\end{bmatrix}\begin{bmatrix}W_1\\W_2\end{bmatrix}\tag{1}$$
$$Y=XW\tag{2}$$
As this shows, no matter how many x, y, and w terms we add, the system can still be written as equation (2).
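As a small check of equation (2), the sketch below builds a 2×2 system and confirms that the matrix product gives the same result as the two equations written out one by one. The numeric values of `X` and `W` are made up for illustration and do not come from the original post.

```python
import numpy as np

# Illustrative values only (an assumption); any numbers would do.
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # rows are (x11, x12) and (x21, x22)
W = np.array([0.5, -1.0])    # (W1, W2)

# Element-by-element form of the two equations.
y1 = W[0] * X[0, 0] + W[1] * X[0, 1]
y2 = W[0] * X[1, 0] + W[1] * X[1, 1]

# Vectorized form, Y = XW.
Y = X @ W

print(Y)
print(np.allclose(Y, [y1, y2]))
```

Adding more rows to `X` adds more equations without changing the line `Y = X @ W`.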
for loops versus numpy matrix operations
Vectorized computation is also fast.
Let us process the same data both ways and compare a for loop with numpy's matrix operations.
```python
# -*- coding: utf-8 -*-
# @Project: zc
# @Author: zc
# @File name: numpy_new_test
# @Create time: 2022/3/16 18:43
import time

import numpy as np

# Two random vectors with ten million entries each.
a = np.random.rand(10000000)
b = np.random.rand(10000000)

# Vectorized dot product.
time_cur = time.time()
c = a.dot(b)
time_later = time.time()
print(f"c={c}")
vec_time = 1000 * (time_later - time_cur)
print("vectorized is " + str(vec_time) + "ms")
print()

# The same dot product with an explicit Python loop.
c = 0
time_cur = time.time()
for i in range(a.size):
    c += a[i] * b[i]
time_later = time.time()
print(f"c={c}")
loop_time = 1000 * (time_later - time_cur)
print("Loop is " + str(loop_time) + "ms")
print()
print("times is " + str(loop_time / vec_time))
```
Output of one run:
```
# Vectorized time, computed with numpy
c=2499945.9800939467
vectorized is 7.472991943359375ms
# for-loop time, computed with a for loop
c=2499945.9800934764
Loop is 3543.708086013794ms
# numpy turns out to be about 474 times faster than the for loop
times is 474.2020482388974
```
1.3 Vector functions and a first look at matrix derivatives
Scalar function: a function whose output is a scalar
$$f(x)=x^2;\quad x\in R;\quad f(x)=x^2\in R$$
$$f(x)=x_1^2+x_2^2;\quad x=[x_1,x_2]\in R^2,\quad f(x)=x_1^2+x_2^2\in R\tag{3}$$
Scalar input, vector- or matrix-valued output
$$f(x)=\begin{bmatrix}f_1(x)=x\\f_2(x)=x^2\end{bmatrix};\quad x\in R;\quad \begin{bmatrix}f_1(x)\\f_2(x)\end{bmatrix}\in R^2\tag{4}$$
$$f(x)=\begin{bmatrix}f_{11}(x)=x&f_{12}(x)=x^2\\f_{21}(x)=x^3&f_{22}(x)=x^4\end{bmatrix};\quad x\in R;\quad \begin{bmatrix}f_{11}(x)&f_{12}(x)\\f_{21}(x)&f_{22}(x)\end{bmatrix}\in R^{2\times 2}\tag{5}$$
Vector input, matrix-valued output
$$f(x_1,x_2)=\begin{bmatrix}f_{11}(x)=x_1+x_2&f_{12}(x)=x_1^2+x_2^2\\f_{21}(x)=x_1^3+x_2^3&f_{22}(x)=x_1^4+x_2^4\end{bmatrix};\quad x\in R^2;\quad \begin{bmatrix}f_{11}(x)&f_{12}(x)\\f_{21}(x)&f_{22}(x)\end{bmatrix}\in R^{2\times 2}\tag{6}$$
The essence of differentiation
$\frac{\partial A}{\partial B}=?$ It means taking the derivative of every element of A with respect to every element of B.
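To make "every element of A with respect to every element of B" concrete, here is a minimal numerical sketch that uses the matrix-valued function from equation (6). The helper name `elementwise_derivative`, the evaluation point, and the use of central finite differences are assumptions made for illustration; the original post does not include this code.

```python
import numpy as np

def f(x):
    """Matrix-valued function from equation (6); x has two components."""
    x1, x2 = x
    return np.array([[x1 + x2,       x1**2 + x2**2],
                     [x1**3 + x2**3, x1**4 + x2**4]])

def elementwise_derivative(func, x, eps=1e-6):
    """Hypothetical helper: result[i, j, k] approximates d f_ij / d x_k."""
    x = np.asarray(x, dtype=float)
    out_shape = func(x).shape
    result = np.zeros(out_shape + x.shape)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        # Central difference with respect to the k-th input component.
        result[..., k] = (func(x + step) - func(x - step)) / (2 * eps)
    return result

x0 = np.array([1.0, 2.0])   # arbitrary evaluation point
D = elementwise_derivative(f, x0)
print(D.shape)   # one entry per (element of f, element of x)
print(D[0, 1])   # derivatives of f_12 = x1^2 + x2^2 with respect to x1 and x2
```

The result holds 2·2·2 numbers: four output elements, each differentiated with respect to two input elements. How those numbers are arranged into a matrix is a layout question, which is what the stretching rule in the next section addresses.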
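1.4 Matrix derivatives: the YX stretching rule
Scalars stay unchanged and vectors are stretched: the one in front is stretched horizontally, the one behind is stretched vertically (YX: Y is in front and is stretched horizontally; X is behind and is stretched vertically).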
(1) Suppose $f(x)$ is a scalar and x is a vector. Then:
$$f(x_1,x_2,\dots,x_n)=x_1+x_2+\dots+x_n\tag{7}$$
$$x=[x_1,x_2,\dots,x_n]^T\tag{8}$$
Keep the scalar f(x) unchanged and stretch the vector x. Reading $\frac{\partial f(x)}{\partial x}$ as YX, X is behind, so it is stretched vertically, while the scalar f(x) stays unchanged. This gives:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}\tag{9}$$
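As a quick numerical illustration of equation (9), the sketch below differentiates the sum function of equation (7) and stacks the partial derivatives vertically. The helper name `gradient_column`, the test point, and the finite-difference step are assumptions for this example only.

```python
import numpy as np

def f(x):
    """Scalar function from equation (7): the sum of the components."""
    return np.sum(x)

def gradient_column(func, x, eps=1e-6):
    """Hypothetical helper: stack the partial derivatives vertically (n x 1)."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros((x.size, 1))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i, 0] = (func(x + step) - func(x - step)) / (2 * eps)
    return grad

x0 = np.array([0.3, -1.2, 2.0, 5.0])   # arbitrary point
g = gradient_column(f, x0)
print(g.shape)                          # a column: X was stretched vertically
print(np.allclose(g, np.ones((4, 1))))  # each partial derivative of the sum is 1
```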
(2) Suppose $f(x)$ is a vector and x is a scalar. Since x is a scalar, it stays unchanged; since Y is in front in YX, Y is stretched horizontally. We start from:
$$f(x)=\begin{bmatrix}f_1(x)\\f_2(x)\\\vdots\\f_n(x)\end{bmatrix}\tag{10}$$
The scalar X stays unchanged, and Y=f(x), which is in front, is stretched horizontally:
$$\frac{\partial f(x)}{\partial x}=\left[\frac{\partial f_1(x)}{\partial x},\frac{\partial f_2(x)}{\partial x},\dots,\frac{\partial f_n(x)}{\partial x}\right]\tag{11}$$
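The horizontal layout of equation (11) can be sketched numerically as follows. The particular function $(x, x^2, x^3)$ and the helper name `derivative_row` are hypothetical choices for illustration, not part of the original post.

```python
import numpy as np

def f(x):
    """Hypothetical vector function of a scalar: (x, x^2, x^3)."""
    return np.array([x, x**2, x**3])

def derivative_row(func, x, eps=1e-6):
    """Hypothetical helper: lay the component derivatives out horizontally (1 x n)."""
    d = (func(x + eps) - func(x - eps)) / (2 * eps)
    return d.reshape(1, -1)

x0 = 2.0
row = derivative_row(f, x0)
print(row.shape)   # a row: Y was stretched horizontally
print(np.allclose(row, [[1.0, 2 * x0, 3 * x0**2]], atol=1e-5))
```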
(3) Suppose $f(x)$ is a vector function and x is a vector.
$$f(x)=\begin{bmatrix}f_1(x)\\f_2(x)\\\vdots\\f_n(x)\end{bmatrix};\quad x=\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}\tag{12}$$
First stretch X: X is behind in YX, so it is stretched vertically, while $f(x)$ is kept unchanged for now.
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}\tag{13}$$
Then stretch $Y=f(x)$: Y is in front, so it is stretched horizontally.
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}=\begin{bmatrix}\frac{\partial f_1(x)}{\partial x_1}&\frac{\partial f_2(x)}{\partial x_1}&\dots&\frac{\partial f_n(x)}{\partial x_1}\\\frac{\partial f_1(x)}{\partial x_2}&\frac{\partial f_2(x)}{\partial x_2}&\dots&\frac{\partial f_n(x)}{\partial x_2}\\\vdots&\vdots&\ddots&\vdots\\\frac{\partial f_1(x)}{\partial x_n}&\frac{\partial f_2(x)}{\partial x_n}&\dots&\frac{\partial f_n(x)}{\partial x_n}\end{bmatrix}\tag{14}$$
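The matrix in equation (14) has one row per input component and one column per output component. A minimal sketch of that arrangement, using a made-up function from $R^2$ to $R^2$ and a hypothetical helper `derivative_yx` based on central differences:

```python
import numpy as np

def f(x):
    """Hypothetical vector function R^2 -> R^2, used only for illustration."""
    x1, x2 = x
    return np.array([x1 * x2, x1 + x2**2])

def derivative_yx(func, x, eps=1e-6):
    """Hypothetical helper: entry (i, j) approximates d f_j / d x_i, as in equation (14)."""
    x = np.asarray(x, dtype=float)
    n_out = func(x).size
    result = np.zeros((x.size, n_out))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # Row i: every component of f differentiated with respect to x_i.
        result[i, :] = (func(x + step) - func(x - step)) / (2 * eps)
    return result

x0 = np.array([3.0, 2.0])
D = derivative_yx(f, x0)
# Analytic values: d(x1*x2)/dx1 = x2, d(x1 + x2^2)/dx1 = 1,
#                  d(x1*x2)/dx2 = x1, d(x1 + x2^2)/dx2 = 2*x2.
expected = np.array([[x0[1], 1.0],
                     [x0[0], 2 * x0[1]]])
print(np.allclose(D, expected, atol=1e-5))
```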
1.5 Examples of common matrix derivative formulas
(1) $f(x)$ is a scalar and x is a vector
$$f(x)=A^TX\tag{15}$$
$$A=[a_1,a_2,\dots,a_n]^T;\quad X=[x_1,x_2,\dots,x_n]^T\tag{16}$$
Since f(x) is a scalar, it stays unchanged; X is behind in YX, so X is stretched vertically. This gives:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}\tag{17}$$
Since $f(x)=\sum_{i=1}^na_ix_i$, the partial derivatives are:
$$\frac{\partial f(x)}{\partial x_i}=a_i\tag{18}$$
Therefore the derivative is:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}a_1\\a_2\\\vdots\\a_n\end{bmatrix}=A\tag{19}$$
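Equation (19) is easy to sanity-check numerically. The sketch below draws random vectors (the seed, the size `n = 5`, and the finite-difference step are arbitrary assumptions) and compares a central-difference gradient of $A^TX$ with $A$ itself.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n = 5
a = rng.normal(size=n)           # plays the role of A in equation (16)
x0 = rng.normal(size=n)          # point at which the gradient is evaluated

def f(x):
    """f(x) = A^T X from equation (15)."""
    return a @ x

eps = 1e-6
grad = np.zeros(n)
for i in range(n):
    step = np.zeros(n)
    step[i] = eps
    grad[i] = (f(x0 + step) - f(x0 - step)) / (2 * eps)

print(np.allclose(grad, a, atol=1e-6))
```

Because the function is linear, the gradient does not depend on `x0`.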
(2) $f(x)$ is a quadratic form and x is a column vector
$$f(x)=X^TAX=\sum_{i=1}^n\sum_{j=1}^na_{ij}x_ix_j\tag{20}$$
$$X=[x_1,x_2,\dots,x_n]^T;\quad A=\begin{bmatrix}a_{11}&a_{12}&\dots&a_{1n}\\a_{21}&a_{22}&\dots&a_{2n}\\\vdots&\vdots&\ddots&\vdots\\a_{n1}&a_{n2}&\dots&a_{nn}\end{bmatrix}\tag{21}$$
f(x) is a scalar, so X in YX is stretched vertically. Each $x_k$ appears in the terms with $i=k$ and in the terms with $j=k$, so differentiating (20) with respect to $x_k$ produces two sums:
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}=\begin{bmatrix}\sum_{j=1}^na_{1j}x_j+\sum_{i=1}^na_{i1}x_i\\\sum_{j=1}^na_{2j}x_j+\sum_{i=1}^na_{i2}x_i\\\vdots\\\sum_{j=1}^na_{nj}x_j+\sum_{i=1}^na_{in}x_i\end{bmatrix}\tag{22}$$
$$\frac{\partial f(x)}{\partial x}=\begin{bmatrix}\sum_{j=1}^na_{1j}x_j\\\sum_{j=1}^na_{2j}x_j\\\vdots\\\sum_{j=1}^na_{nj}x_j\end{bmatrix}+\begin{bmatrix}\sum_{i=1}^na_{i1}x_i\\\sum_{i=1}^na_{i2}x_i\\\vdots\\\sum_{i=1}^na_{in}x_i\end{bmatrix}\tag{23}$$
$$\begin{bmatrix}\sum_{j=1}^na_{1j}x_j\\\sum_{j=1}^na_{2j}x_j\\\vdots\\\sum_{j=1}^na_{nj}x_j\end{bmatrix}=\begin{bmatrix}a_{11}&a_{12}&\dots&a_{1n}\\a_{21}&a_{22}&\dots&a_{2n}\\\vdots&\vdots&\ddots&\vdots\\a_{n1}&a_{n2}&\dots&a_{nn}\end{bmatrix}\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}=AX\tag{25}$$
$$\begin{bmatrix}\sum_{i=1}^na_{i1}x_i\\\sum_{i=1}^na_{i2}x_i\\\vdots\\\sum_{i=1}^na_{in}x_i\end{bmatrix}=\begin{bmatrix}a_{11}&a_{21}&\dots&a_{n1}\\a_{12}&a_{22}&\dots&a_{n2}\\\vdots&\vdots&\ddots&\vdots\\a_{1n}&a_{2n}&\dots&a_{nn}\end{bmatrix}\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}=A^TX\tag{26}$$
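The two identities in equations (25) and (26) can be checked by writing out the sums explicitly. This is only a sketch: the random matrix, the seed, and the size `n = 4` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))   # arbitrary matrix, not necessarily symmetric
x = rng.normal(size=n)

# Component k of the first vector: sum over j of a_kj * x_j.
row_sums = np.array([sum(A[k, j] * x[j] for j in range(n)) for k in range(n)])
# Component k of the second vector: sum over i of a_ik * x_i.
col_sums = np.array([sum(A[i, k] * x[i] for i in range(n)) for k in range(n)])

print(np.allclose(row_sums, A @ x))     # equation (25)
print(np.allclose(col_sums, A.T @ x))   # equation (26)
```

Summing along a row of A gives AX, while summing down a column gives the transpose product.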
$$\frac{\partial f(x)}{\partial x}=\frac{\partial (X^TAX)}{\partial x}=AX+A^TX=(A+A^T)X\tag{27}$$
When A is a symmetric matrix, so that $A^T=A$, the expression above becomes:
$$\frac{\partial f(x)}{\partial x}=\frac{\partial (X^TAX)}{\partial x}=AX+A^TX=2AX\tag{28}$$
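A short numerical check of equations (27) and (28) follows. The helper names `quad` and `numeric_grad`, the random test matrix, and the finite-difference step are assumptions introduced for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.normal(size=(n, n))   # arbitrary, generally non-symmetric
x0 = rng.normal(size=n)

def quad(x, M):
    """Quadratic form x^T M x."""
    return x @ M @ x

def numeric_grad(func, x, eps=1e-6):
    """Hypothetical helper: central-difference gradient of a scalar function."""
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (func(x + step) - func(x - step)) / (2 * eps)
    return g

# General A: the gradient should match (A + A^T) x, equation (27).
g_general = numeric_grad(lambda x: quad(x, A), x0)
print(np.allclose(g_general, (A + A.T) @ x0, atol=1e-6))

# Symmetric S: the gradient should match 2 S x, equation (28).
S = (A + A.T) / 2
g_sym = numeric_grad(lambda x: quad(x, S), x0)
print(np.allclose(g_sym, 2 * S @ x0, atol=1e-6))
```

Note that `2 * A @ x0` alone would not match `g_general` for a non-symmetric `A`; the shortcut in equation (28) relies on symmetry.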
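1.6 Supplementary details on differentiation
The difference between numerator layout and denominator layout:
For details, see the linked Zhihu post: explanation of numerator and denominator layouts
Denominator layout corresponds to the YX stretching rule; numerator layout corresponds to the XY stretching rule. When X comes first, as in the fraction X/Y, it is numerator layout; when X comes last, as in the fraction Y/X, it is denominator layout.
Difference: the two layouts differ in the direction in which vectors are stretched during differentiation. The mnemonic for the stretch direction stays the same: the one in front is stretched horizontally, the one behind is stretched vertically.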
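In practice the two layouts hold the same numbers and differ by a transpose. The sketch below illustrates this with a made-up linear map $f(x)=Mx$; the matrix `M`, the helper name `jacobian_numerator_layout`, and the convention that numerator layout puts one row per output component are stated here as assumptions for the example.

```python
import numpy as np

# Hypothetical linear map f(x) = M x from R^2 to R^3.
M = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

def f(x):
    return M @ x

def jacobian_numerator_layout(func, x, eps=1e-6):
    """Hypothetical helper: entry (j, i) approximates d f_j / d x_i."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # Column i: all outputs differentiated with respect to x_i.
        cols.append((func(x + step) - func(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)

x0 = np.array([0.5, -1.5])
num = jacobian_numerator_layout(f, x0)   # one row per output, shape (3, 2)
den = num.T                              # one row per input, shape (2, 3)

print(num.shape, den.shape)
print(np.allclose(num, M), np.allclose(den, M.T))
```

Under the denominator layout used throughout this post, the derivative of $Mx$ with respect to $x$ is therefore $M^T$, which is consistent with equation (19) for the single-row case.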



