import pandas as pd

# Four observations of two variables: dogs and cats
df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=['dogs', 'cats'])
print(df.cov())
Result:
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667
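Calculation:
E[dogs]=(1+0+2+1)/4=1
E[cats]=(2+3+0+1)/4=1.5
cov(dogs,cats)
=Σ(dogs_i-E[dogs])(cats_i-E[cats])/(n-1)   (sample estimate of E[(dogs-E[dogs])(cats-E[cats])])
=[(1-1)(2-1.5)+(0-1)(3-1.5)+(2-1)(0-1.5)+(1-1)(1-1.5)]/(4-1)
=-1
This is the value at index (dogs,cats) of the result.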
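The hand calculation above can be checked with a few lines of plain Python. This is only a sketch that repeats the same arithmetic; the variable names (`mean_dogs`, `cov_dogs_cats`, and so on) are illustrative and not part of pandas.

```python
# Data from the example above.
dogs = [1, 0, 2, 1]
cats = [2, 3, 0, 1]

n = len(dogs)
mean_dogs = sum(dogs) / n  # 1.0
mean_cats = sum(cats) / n  # 1.5

# Sum the products of the deviations from each mean,
# then divide by n - 1 (sample covariance).
cov_dogs_cats = sum(
    (d - mean_dogs) * (c - mean_cats) for d, c in zip(dogs, cats)
) / (n - 1)

print(cov_dogs_cats)  # -1.0
```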
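cov() is the covariance function. Covariance measures how two variables vary together: it is positive when they tend to move in the same direction and negative when they tend to move in opposite directions.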
$$var(X) = S^2 = \cfrac{\sum_{i=1}^n (X_i-\overline X)(X_i-\overline X)}{n-1}$$
$$cov(X,Y) = \cfrac{\sum_{i=1}^n (X_i-\overline X)(Y_i-\overline Y)}{n-1}$$
(this is the formula used for the result above)
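As a rough check, the n-1 formula can be applied to every pair of columns and compared with `df.cov()`. The helper `sample_cov` below is a hypothetical name introduced for this sketch, not a pandas or NumPy function; the variance is just the special case `sample_cov(x, x)`.

```python
import numpy as np
import pandas as pd

def sample_cov(x, y):
    """Sample covariance: sum of deviation products divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], columns=['dogs', 'cats'])

# Build the full covariance matrix pair by pair; the diagonal holds the variances.
manual = pd.DataFrame(
    [[sample_cov(df[a], df[b]) for b in df.columns] for a in df.columns],
    index=df.columns,
    columns=df.columns,
)

print(manual)
print(np.allclose(manual.values, df.cov().values))  # True
```

The diagonal entries 0.666667 and 1.666667 in the result are therefore var(dogs) and var(cats).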
$$cov(X,Y) = E[(X-E[X])(Y-E[Y])]$$
$$= E[XY]-2E[X]E[Y]+E[X]E[Y]$$
$$= E[XY]-E[X]E[Y]$$
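The identity can be checked numerically on the same data. Note one assumption in this sketch: the expectation E[·] is taken as a plain average over the n observations (dividing by n), so both sides give the population covariance; multiplying by n/(n-1) recovers the sample covariance that `df.cov()` reports.

```python
import numpy as np

x = np.array([1, 0, 2, 1], dtype=float)  # dogs
y = np.array([2, 3, 0, 1], dtype=float)  # cats
n = len(x)

# Definition: E[(X - E[X])(Y - E[Y])], with E as the mean over n observations.
cov_def = np.mean((x - x.mean()) * (y - y.mean()))

# Shortcut from the derivation: E[XY] - E[X]E[Y].
cov_short = np.mean(x * y) - x.mean() * y.mean()

print(cov_def, cov_short)  # both -0.75

# Rescale from the n divisor to the n - 1 divisor used by df.cov().
print(cov_short * n / (n - 1))
```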



