根据欧式距离矩阵(Euclidean Distance Matrix, EDM) 的定义,它是一个n × times ×n的矩阵,表示欧式空间中一组n个点的空间距离,每个元素为点之间的距离平方。假设 { x i ∈ R d : i ∈ { 1 , . . . , n } } {x_i in mathbb{R}^d : i in {1,...,n}} {xi∈Rd:i∈{1,...,n}}
X = ( ∣ ∣ ∣ x 1 x 2 . . . x n ∣ ∣ ∣ ) d × n X =begin{pmatrix} | & | & & |\ x_1 & x_2 &... & x_n\ | & | & & |\ end{pmatrix}_{dtimes n} X=⎝⎛∣x1∣∣x2∣...∣xn∣⎠⎞d×n
距离矩阵,
D
i
j
=
∣
∣
x
i
−
x
j
∣
∣
2
2
D_{ij} = ||x_i-x_j||^2_2
Dij=∣∣xi−xj∣∣22
=
(
x
i
−
x
j
)
T
(
x
i
−
x
j
)
=(x_i - x_j)^T(x_i - x_j)
=(xi−xj)T(xi−xj)
=
x
i
T
x
i
+
x
j
x
j
T
−
2
x
i
T
x
j
=x_i^Tx_i + x_jx_j^T-2x_i^Tx_j
=xiTxi+xjxjT−2xiTxj
=
(
0
d
12
2
d
13
2
…
d
1
n
2
d
21
2
0
d
23
2
…
d
2
n
2
d
31
2
d
32
2
0
…
d
3
n
2
⋮
⋮
⋮
⋱
⋮
d
n
1
2
d
n
2
2
d
1
n
2
…
0
)
=begin{pmatrix} 0 &d_{12}^2 & d_{13}^2 & dots & d_{1n}^2\ d_{21}^2&0 & d_{23}^2 & dots & d_{2n}^2\ d_{31}^2 & d_{32}^2 & 0&dots & d_{3n}^2\ vdots&vdots&vdots &ddots & vdots\ d_{n1}^2 & d_{n2}^2 & d_{1n}^2 & dots& 0\ end{pmatrix}
=⎝⎜⎜⎜⎜⎜⎛0d212d312⋮dn12d1220d322⋮dn22d132d2320⋮d1n2………⋱…d1n2d2n2d3n2⋮0⎠⎟⎟⎟⎟⎟⎞
欧式距离矩阵,edm(X)定义为
e
d
m
(
X
)
=
d
e
f
d
i
a
g
(
G
)
+
d
i
a
g
(
G
)
T
−
2
G
edm(X) overset{mathrm{def}}{=} diag(G) + diag(G)^T - 2G
edm(X)=defdiag(G)+diag(G)T−2G
其中,格拉姆矩阵(Gram matrix)
G
=
X
T
X
G=X^TX
G=XTX,
d
i
a
g
(
G
)
diag(G)
diag(G)为含有对角线元素的列向量。
在计算两个矩阵
X
d
×
m
X_{dtimes m}
Xd×m和
Y
d
×
n
Y_{dtimes n}
Yd×n之间的欧式距离矩阵时,可从上式推导出两个矩阵的欧式距离矩阵公式:
G
X
=
X
T
X
G_X = X^TX
GX=XTX
G
Y
=
Y
T
Y
G_Y = Y^TY
GY=YTY
e
d
m
(
X
)
=
d
i
a
g
(
G
X
)
+
d
i
a
g
(
G
Y
)
T
−
2
X
T
Y
edm(X) = diag(G_X) + diag(G_Y)^T - 2X^TY
edm(X)=diag(GX)+diag(GY)T−2XTY
=
(
x
11
x
22
⋮
x
m
m
)
m
×
1
+
(
y
11
y
22
⋮
y
n
n
)
n
×
1
T
−
2
X
T
Y
=begin{pmatrix} x_{11}\ x_{22}\ vdots\ x_{mm}\ end{pmatrix}_{mtimes1} + begin{pmatrix} y_{11}\ y_{22}\ vdots\ y_{nn}\ end{pmatrix}^T_{ntimes1} - 2X^TY
=⎝⎜⎜⎜⎛x11x22⋮xmm⎠⎟⎟⎟⎞m×1+⎝⎜⎜⎜⎛y11y22⋮ynn⎠⎟⎟⎟⎞n×1T−2XTY
这里的 G X G_X GX和 G Y G_Y GY的对角线元素会经过numpy数组的广播机制(broadcasting),分别广播成 n × n ntimes n n×n和 m × m mtimes m m×m的矩阵,进行相加。EDM矩阵的大小应为 m × n mtimes n m×n。
例子:
X
=
(
1
2
3
2
3
4
0
1
2
)
3
×
3
T
;
Y
=
(
1
2
3
4
3
2
)
2
×
3
T
X = begin{pmatrix} 1&2&3\ 2&3&4\ 0&1&2\ end{pmatrix}^T_{3times 3}; Y = begin{pmatrix} 1&2&3\ 4&3&2\ end{pmatrix}^T_{2times 3}
X=⎝⎛120231342⎠⎞3×3T;Y=(142332)2×3T
Python代码:
import numpy as np
def euclidean_dist(x, y=None):
"""
Compute all pairwise distances between vectors in X and Y matrices.
:param x: numpy array, with size of (d, m)
:param y: numpy array, with size of (d, n)
:return: EDM: numpy array, with size of (m,n).
Each entry in EDM_{i,j} represents the distance between row i in x and row j in y.
"""
if y is None:
y = x
# calculate Gram matrices
G_x = np.matmul(x.T, x)
G_y = np.matmul(y.T, y)
# convert diagonal matrix into column vector
diag_Gx = np.reshape(np.diag(G_x), (-1, 1))
diag_Gy = np.reshape(np.diag(G_y), (-1, 1))
# Compute Euclidean distance matrix
EDM = diag_Gx + diag_Gy.T - 2*np.matmul(x.T, y) # broadcasting
print('This is matrix EDM: ')
print(EDM)
return EDM
x = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [0.0, 1.0, 2.0]]).T
y = np.array([[1.0, 2.0, 3.0], [4.0, 3.0, 2.0]]).T
euclidean_dist(x,y)
输出: E D M = ( 0 11 3 8 3 20 ) 3 × 2 EDM = begin{pmatrix} 0&11\ 3&8\ 3&20\ end{pmatrix}_{3times 2} EDM=⎝⎛03311820⎠⎞3×2
备注: 欧式距离矩阵中的元素为距离的平方,若需要距离,可用np.sqrt(EDM)开方。
参考文献:
- https://www.dabblingbadger.com/blog/2020/2/27/implementing-euclidean-distance-matrix-calculations-from-scratch-in-python
- Dokmanic, Ivan, et al. “Euclidean distance matrices: essential theory, algorithms, and applications.” IEEE Signal Processing Magazine 32.6 (2015): 12-30.
- Albanie, Samuel. “Euclidean Distance Matrix Trick.” Retrieved from Visual Geometry Group, University of Oxford (2019).
- https://ccrma.stanford.edu/~dattorro/EDM.pdf



