In the causal graph over the user's historical interaction distribution D, the user U, the user's group-level preference M over item types, the item I, and the prediction Y, there are two backdoor paths:

U <- D -> M -> Y

M <- U -> Y

Since it is the embedding of U that we want to correct, the second path (mediated by M) need not be considered. For the first backdoor path, we could block either D -> U or D -> M; however, M is computed from both U and D, so its values are hard to estimate and inconvenient to block. The simplest way to cut this path is therefore to block D -> U.
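The first backdoor path can be verified mechanically. Below is a minimal sketch (plain Python, no causal-inference library; the edge list encodes the causal graph described above, and only the empty conditioning set is considered) that enumerates the open backdoor paths between U and Y:

```python
# Directed edges of the causal graph:
# D -> U, D -> M, U -> M, U -> Y, M -> Y, I -> Y
EDGES = {("D", "U"), ("D", "M"), ("U", "M"), ("U", "Y"), ("M", "Y"), ("I", "Y")}

def undirected_paths(src, dst, path=None):
    """Enumerate simple paths from src to dst, ignoring edge direction."""
    path = path or [src]
    if src == dst:
        yield path
        return
    for a, b in EDGES:
        nxt = b if a == src else a if b == src else None
        if nxt is not None and nxt not in path:
            yield from undirected_paths(nxt, dst, path + [nxt])

def is_open_backdoor(path):
    """A path from U to Y is an open backdoor path (with an empty
    conditioning set) iff it starts with an edge pointing INTO U
    and contains no collider."""
    if (path[1], path[0]) not in EDGES:       # first edge must enter U
        return False
    for i in range(1, len(path) - 1):
        into_prev = (path[i - 1], path[i]) in EDGES
        into_next = (path[i + 1], path[i]) in EDGES
        if into_prev and into_next:           # collider blocks the path
            return False
    return True

backdoors = [p for p in undirected_paths("U", "Y") if is_open_backdoor(p)]
```

Running it yields the single open backdoor path `U -> D -> M -> Y` (traversed as U <- D -> M -> Y), matching the analysis above.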
| Symbol | Meaning |
|---|---|
| $\mathbf{u}=[\mathbf{u}_1,...,\mathbf{u}_K],\ \mathbf{u}_k\in\mathbb{R}^H$ | user representation |
| $\mathbf{x}=[x_{u,1},...,x_{u,K}]$ | user features |
| $\mathbf{d}_u=[p_u(g_1),...,p_u(g_N)]$ | user's historical propensity toward each item group |
| $\mathbf{m}=M(\mathbf{d},\mathbf{u})\in\mathbb{R}^H$ | user's group-level representation based on historical interactions |
| $\mathcal{H}_u$ | set of items I in the user's interaction history |
| $\mathbf{q}^i=[q_{g_1}^i,...,q_{g_N}^i]\in\mathbb{R}^N$ | probability of item I belonging to each group |
| $\mathbf{v}=[\mathbf{v}_1,...,\mathbf{v}_N],\ \mathbf{v}_n\in\mathbb{R}^H$ | group representations |
$$
\begin{aligned}
P&(Y|U=\mathbf{u},I=\mathbf{i}) \\
&=\frac{\sum_{\mathbf{d}\in\mathcal{D}}\sum_{\mathbf{m}\in\mathcal{M}}P(\mathbf{d})P(\mathbf{u}|\mathbf{d})P(\mathbf{m}|\mathbf{d},\mathbf{u})P(\mathbf{i})P(Y|\mathbf{u},\mathbf{i},\mathbf{m})}{P(\mathbf{u})P(\mathbf{i})} \\
&=\sum_{\mathbf{d}\in\mathcal{D}}\sum_{\mathbf{m}\in\mathcal{M}}P(\mathbf{d}|\mathbf{u})P(\mathbf{m}|\mathbf{d},\mathbf{u})P(Y|\mathbf{u},\mathbf{i},\mathbf{m}) \\
&=\sum_{\mathbf{d}\in\mathcal{D}}P(\mathbf{d}|\mathbf{u})P(Y|\mathbf{u},\mathbf{i},M(\mathbf{d},\mathbf{u})) \\
&=P(\mathbf{d}_u|\mathbf{u})P(Y|\mathbf{u},\mathbf{i},M(\mathbf{d}_u,\mathbf{u})) \\
P&(Y|do(U=\mathbf{u}),I=\mathbf{i}) \\
&=\sum_{\mathbf{d}\in\mathcal{D}}P(\mathbf{d}|do(U=\mathbf{u}))P(Y|do(U=\mathbf{u}),\mathbf{i},M(\mathbf{d},do(U=\mathbf{u}))) \\
&=\sum_{\mathbf{d}\in\mathcal{D}}P(\mathbf{d})P(Y|do(U=\mathbf{u}),\mathbf{i},M(\mathbf{d},do(U=\mathbf{u}))) \\
&=\sum_{\mathbf{d}\in\mathcal{D}}P(\mathbf{d})P(Y|\mathbf{u},\mathbf{i},M(\mathbf{d},\mathbf{u}))
\end{aligned}
$$
Since D ranges over an unbounded space, the backdoor-adjusted formula above is made tractable by estimating $\mathbf{d}_u$ only from the items the user has actually interacted with:
$p_u(g_n)=\displaystyle\sum_{i\in\mathcal{H}_u}p(g_n|i)p(i|u)=\frac{\sum_{i\in\mathcal{H}_u}q_{g_n}^i}{|\mathcal{H}_u|}$
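In code, $\mathbf{d}_u$ is simply the mean of the group-membership vectors of the interacted items. A minimal NumPy sketch (the names `q` and `H_u` are stand-ins for $\mathbf{q}^i$ and $\mathcal{H}_u$):

```python
import numpy as np

def user_group_distribution(q: np.ndarray, H_u: list) -> np.ndarray:
    """d_u = [p_u(g_1), ..., p_u(g_N)]: the average of q^i over the
    items in the user's interaction history H_u.

    q   : (num_items, N) matrix, row i is q^i over the N groups
    H_u : indices of the items the user interacted with
    """
    return q[H_u].mean(axis=0)
```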
$$
\begin{aligned}
P&(Y|do(U=\mathbf{u}),I=\mathbf{i}) \\
&=\sum_{\mathbf{d}\in\mathcal{D}}P(\mathbf{d})P(Y|\mathbf{u},\mathbf{i},M(\mathbf{d},\mathbf{u})) \\
&\approx\sum_{\mathbf{d}\in\mathcal{D}}P(\mathbf{d})f(\mathbf{u},\mathbf{i},M(\mathbf{d},\mathbf{u})) \\
&=f(\mathbf{u},\mathbf{i},M(\sum_{\mathbf{d}\in\mathcal{D}}P(\mathbf{d})\mathbf{d},\mathbf{u})) \\
&=f(\mathbf{u},\mathbf{i},M(\bar{\mathbf{d}},\mathbf{u}))
\end{aligned}
$$
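The expectation $\bar{\mathbf{d}}=\sum_{\mathbf{d}\in\mathcal{D}}P(\mathbf{d})\,\mathbf{d}$ can be estimated from the observed user distributions. A sketch under the assumption that every observed $\mathbf{d}_u$ is equally likely (i.e. $P(\mathbf{d})=1/\#\text{users}$):

```python
import numpy as np

def expected_confounder(D_mat: np.ndarray) -> np.ndarray:
    """bar{d} = sum_d P(d) * d.

    D_mat : (num_users, N) matrix whose rows are the d_u vectors;
    each row is weighted by P(d) = 1 / num_users.
    """
    probs = np.full(len(D_mat), 1.0 / len(D_mat))
    return probs @ D_mat
```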
An FM (factorization machine) can be used to compute $M(\bar{\mathbf{d}},\mathbf{u})$:
$$
\begin{aligned}
M(\bar{\mathbf{d}},\mathbf{u})&=\sum_{a=1}^N\sum_{b=1}^K p(g_a)\mathbf{v}_a\odot x_{u,b}\mathbf{u}_b \\
&=\sum_{a=1}^{N+K}\sum_{b=1}^{N+K}w_a\mathbf{c}_a\odot w_b\mathbf{c}_b
\end{aligned}
$$
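Because $\odot$ is bilinear, the double sum in the second line factorizes as $\sum_a\sum_b w_a\mathbf{c}_a\odot w_b\mathbf{c}_b=(\sum_a w_a\mathbf{c}_a)\odot(\sum_b w_b\mathbf{c}_b)$, so $M(\bar{\mathbf{d}},\mathbf{u})$ costs $O((N+K)H)$ instead of $O((N+K)^2H)$. A sketch of that trick (the shapes are assumptions: `v` is $(N,H)$, `u` is $(K,H)$, while `d_bar` and `x_u` hold scalar weights):

```python
import numpy as np

def fm_group_representation(d_bar, v, x_u, u):
    """M(bar{d}, u) via the FM trick: the full double sum
    sum_a sum_b (w_a c_a) ⊙ (w_b c_b) equals s ⊙ s with s = sum_a w_a c_a."""
    w = np.concatenate([d_bar, x_u])       # scalar weights, length N+K
    c = np.concatenate([v, u], axis=0)     # embeddings, shape (N+K, H)
    s = (w[:, None] * c).sum(axis=0)       # s = sum_a w_a c_a, in R^H
    return s * s                           # element-wise square
```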
where

$$
\begin{aligned}
\mathbf{w}&=[\bar{\mathbf{d}},\mathbf{x}_u] \\
\mathbf{c}&=[\mathbf{v},\mathbf{u}]
\end{aligned}
$$
The interaction history is split into two groups by timestamp, and the KL divergence between them quantifies how much the user's interests have drifted. The prediction of a plain recommender is then fused with that of the deconfounded model:
$$
\begin{aligned}
\eta_u&=KL(\mathbf{d}_u^1\|\mathbf{d}_u^2)+KL(\mathbf{d}_u^2\|\mathbf{d}_u^1) \\
&=\sum_{n=1}^N P_u^1(g_n)\log\frac{P_u^1(g_n)}{P_u^2(g_n)}+\sum_{n=1}^N P_u^2(g_n)\log\frac{P_u^2(g_n)}{P_u^1(g_n)} \\
Y_{u,i}&=(1-\hat{\eta}_u)\cdot Y_{u,i}^{RS}+\hat{\eta}_u\cdot Y_{u,i}^{DECRS}
\end{aligned}
$$
where $\hat{\eta}_u$ is the min-max-scaled weight with a smoothing hyperparameter $\alpha$:
$$\hat{\eta}_u=\left(\frac{\eta_u-\eta_{min}}{\eta_{max}-\eta_{min}}\right)^\alpha$$
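Putting the fusion step together as code (a sketch; `eps` is an assumed guard against zero probabilities inside the KL terms):

```python
import numpy as np

def symmetric_kl(p1, p2, eps=1e-12):
    """eta_u = KL(d_u^1 || d_u^2) + KL(d_u^2 || d_u^1)."""
    p1 = np.asarray(p1, dtype=float) + eps
    p2 = np.asarray(p2, dtype=float) + eps
    return float(np.sum(p1 * np.log(p1 / p2)) + np.sum(p2 * np.log(p2 / p1)))

def fused_score(y_rs, y_decrs, eta_u, eta_min, eta_max, alpha=1.0):
    """Min-max scale eta_u with exponent alpha, then blend the plain
    recommender's prediction with the deconfounded one."""
    eta_hat = ((eta_u - eta_min) / (eta_max - eta_min)) ** alpha
    return (1 - eta_hat) * y_rs + eta_hat * y_decrs
```

Users whose two history halves diverge strongly (large $\eta_u$) lean on the deconfounded prediction; stable users keep the plain recommender's score.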



