State transition matrix

$$
\begin{aligned}
A &= \left[a_{ij}\right]_{N \times N} \\
a_{ij} &= P(i_{t+1} = q_j \mid i_t = q_i)
\end{aligned}
$$
Observation probability matrix

$$
\begin{aligned}
B &= \left[b_{j}(k)\right]_{N \times M} \\
b_j(k) &= P(o_{t}=v_k \mid i_t=q_j)
\end{aligned}
$$
Suppose there are 4 boxes, each containing red and white balls:

| Box | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Red balls | 5 | 3 | 6 | 8 |
| White balls | 5 | 7 | 4 | 2 |

Balls are drawn as follows to produce an observation sequence of colors. To start, pick one of the 4 boxes uniformly at random, draw one ball from it, record its color, and put it back. Then move from the current box to the next one by this rule: from box 1, always move to box 2; from box 2 or box 3, move to the neighboring box on the left with probability 0.4 or on the right with probability 0.6; from box 4, stay in box 4 or move to box 3, each with probability 0.5. After the move, draw one ball from the new box, record its color, and put it back. Repeating this 5 times yields an observation sequence of ball colors
$$
O = \{\text{red},\text{red},\text{white},\text{white},\text{red}\}
$$
During this process, the observer only sees the sequence of ball colors, not which box each ball was drawn from; that is, the sequence of boxes is hidden.
- State set
  $$Q = \{\text{box }1, \text{box }2, \text{box }3, \text{box }4\}$$
- Observation set
  $$V = \{\text{red}, \text{white}\}$$
- Initial probability distribution
  $$\pi = (0.25, 0.25, 0.25, 0.25)^T$$
- State transition probability matrix, modeling $P(i_{t+1} \mid i_1, i_2, \dots, i_t, o_1, o_2, \dots, o_t)$:
  $$A=\left[\begin{array}{cccc} 0 & 1 & 0 & 0 \\ 0.4 & 0 & 0.6 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0 & 0.5 & 0.5 \end{array}\right]$$
- Observation probability matrix, modeling $P(o_{t+1} \mid i_1, i_2, \dots, i_{t+1}, o_1, o_2, \dots, o_t)$:
  $$B=\left[\begin{array}{ll} 0.5 & 0.5 \\ 0.3 & 0.7 \\ 0.6 & 0.4 \\ 0.8 & 0.2 \end{array}\right]$$
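Before turning to inference, the generative procedure above can be simulated directly. The sketch below is a minimal illustration; the `sample` helper and the encoding 0 = red, 1 = white are my own choices, not fixed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters of the box-and-ball model above (assumed encoding: 0 = red, 1 = white)
pi = np.array([0.25, 0.25, 0.25, 0.25])
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.4, 0.0, 0.6, 0.0],
              [0.0, 0.4, 0.0, 0.6],
              [0.0, 0.0, 0.5, 0.5]])
B = np.array([[0.5, 0.5],
              [0.3, 0.7],
              [0.6, 0.4],
              [0.8, 0.2]])

def sample(T=5):
    """Draw one (hidden box sequence, observed color sequence) pair of length T."""
    states, obs = [], []
    s = rng.choice(4, p=pi)                    # pick the first box uniformly
    for _ in range(T):
        states.append(int(s))
        obs.append(int(rng.choice(2, p=B[s])))  # draw a ball, record its color
        s = rng.choice(4, p=A[s])              # move to the next box
    return states, obs

states, obs = sample()
```

Only `obs` would be visible to the observer; `states` is exactly the hidden box sequence the text says cannot be observed.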
Two assumptions

Homogeneous Markov assumption:

$$
P\left(i_{t} \mid i_{t-1}, o_{t-1}, \cdots, i_{1}, o_{1}\right)=P\left(i_{t} \mid i_{t-1}\right), \quad t=1,2, \cdots, T
$$
Observation independence assumption:

$$
P\left(o_{t} \mid i_{T}, o_{T}, i_{T-1}, o_{T-1}, \cdots, i_{t+1}, o_{t+1}, i_{t}, i_{t-1}, o_{t-1}, \cdots, i_{1}, o_{1}\right)=P\left(o_{t} \mid i_{t}\right)
$$
That is everything we know; now let us see what we can do with it.
Three problems

Probability computation problem (evaluation): given the model $\lambda=(A, B, \pi)$ and an observation sequence $O=\left(o_{1}, o_{2}, \cdots, o_{T}\right)$, compute the probability $P(O \mid \lambda)$ that the observation sequence $O$ occurs under model $\lambda$.
Direct computation:

$$
P(O\mid\lambda) = \sum_{I}P(O,I\mid\lambda) = \sum_{I}P(O\mid I,\lambda)\cdot P(I\mid\lambda)
$$
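This sum ranges over all $N^T$ possible state sequences, so the direct method costs $O(T N^T)$ and is only feasible for tiny $T$. A brute-force sketch on the box example (the `brute_force` helper and the 0 = red, 1 = white encoding are illustrative assumptions):

```python
from itertools import product

# Parameters of the box example
pi = [0.25, 0.25, 0.25, 0.25]
A = [[0, 1, 0, 0],
     [0.4, 0, 0.6, 0],
     [0, 0.4, 0, 0.6],
     [0, 0, 0.5, 0.5]]
B = [[0.5, 0.5],
     [0.3, 0.7],
     [0.6, 0.4],
     [0.8, 0.2]]
O = [0, 0, 1, 1, 0]  # red, red, white, white, red (0 = red, 1 = white)

def brute_force(O):
    """P(O|lambda) by summing P(O, I|lambda) over every state sequence I."""
    N, T = len(A), len(O)
    total = 0.0
    for I in product(range(N), repeat=T):
        p = pi[I[0]] * B[I[0]][O[0]]
        for t in range(1, T):
            p *= A[I[t - 1]][I[t]] * B[I[t]][O[t]]
        total += p
    return total

print(brute_force(O))  # ≈ 0.026862016, after enumerating 4^5 = 1024 sequences
```

Already at $T=100$ with $N=4$ states this enumeration would be hopeless, which is what motivates the forward and backward algorithms below.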
Forward algorithm. Define the forward probability as

$$
\alpha_t(i) = P(o_1,o_2,...,o_t,i_t=q_i\mid\lambda)
$$
Then the quantity we want is

$$
P(O\mid\lambda) = P(o_1,o_2,...,o_T\mid\lambda) = \sum_{i=1}^{N}P(o_1,o_2,...,o_T,i_T=q_i\mid\lambda) = \sum_{i=1}^{N}\alpha_T(i)
$$
(1) Initialization

$$
\alpha_1(i) = P(o_1,i_1=q_i\mid\lambda) = P(i_1=q_i\mid\lambda)\cdot P(o_1\mid i_1=q_i,\lambda) = \pi_i\, b_i(o_1)
$$
(2) Recursion

$$
\begin{aligned}
\alpha_{t+1}(i) &= P(o_1,o_2,...,o_t,o_{t+1},i_{t+1}=q_i\mid\lambda) \\
&= P(o_1,o_2,...,o_t,i_{t+1}=q_i\mid\lambda)\cdot P(o_{t+1}\mid o_1,o_2,...,o_t,i_{t+1} = q_i,\lambda) \\
&= \Big[\sum_{j=1}^{N} P(o_1,o_2,...,o_t,i_t=q_j,i_{t+1}=q_i\mid\lambda)\Big]\cdot P(o_{t+1}\mid i_{t+1}=q_i) \\
&= \Big[\sum_{j=1}^{N}P(o_1,o_2,...,o_t,i_t=q_j\mid\lambda)\cdot P(i_{t+1}=q_i\mid o_1,o_2,...,o_t,i_t=q_j,\lambda)\Big]\cdot b_i(o_{t+1}) \\
&= \Big[\sum_{j=1}^{N}\alpha_t(j)\cdot P(i_{t+1}=q_i\mid i_t=q_j,\lambda)\Big]\cdot b_i(o_{t+1}) \\
&= \Big[\sum_{j=1}^{N}\alpha_t(j)\cdot a_{ji}\Big]\cdot b_i(o_{t+1})
\end{aligned}
$$
(3) Termination

$$
P(O\mid\lambda) = \sum_{i=1}^{N}\alpha_T(i)
$$
Define the observation data and the $\lambda$ parameters ($\pi, A, B$). Note that the columns of B list the red probability first, so the observations must be encoded with 0 = red, 1 = white:

```python
import numpy as np
O = [0, 0, 1, 1, 0]   # red, red, white, white, red (0 = red, 1 = white)
Pi = [0.25, 0.25, 0.25, 0.25]
A = [
    [0, 1, 0, 0],
    [0.4, 0, 0.6, 0],
    [0, 0.4, 0, 0.6],
    [0, 0, 0.5, 0.5]
]
B = [
    [0.5, 0.5],
    [0.3, 0.7],
    [0.6, 0.4],
    [0.8, 0.2]
]
```
Define the model

```python
class ForwardModel:
    def __init__(self, Pi, A, B) -> None:
        self.Pi = Pi
        self.A = A
        self.B = B
        self.N = len(A)          # number of states
    def predict(self, O):
        alpha = np.zeros(shape=(self.N,), dtype=float)
        # Initialization: alpha_1(i) = pi_i * b_i(o_1)
        for i in range(self.N):
            alpha[i] = self.Pi[i] * self.B[i][O[0]]
        # Recursion over o_2, ..., o_T (o_1 was already consumed above)
        for observation in O[1:]:
            temp = np.zeros_like(alpha)
            for i in range(self.N):
                for j in range(self.N):
                    temp[i] += alpha[j] * self.A[j][i]
                temp[i] = temp[i] * self.B[i][observation]
            alpha = temp
        # Termination: P(O|lambda) = sum_i alpha_T(i)
        return np.sum(alpha)
```
Predict

```python
model = ForwardModel(Pi, A, B)
model.predict(O)
```

0.026862016

Readers can verify for themselves that this is correct.
Backward algorithm. Define the backward probability as

$$
\beta_{t}(i)=P\left(o_{t+1}, o_{t+2}, \cdots, o_{T} \mid i_{t}=q_{i}, \lambda\right)
$$
Then

$$
\begin{aligned}
P(O\mid\lambda) &= P(o_1,o_2,...,o_T\mid\lambda) \\
&=\sum_{i=1}^{N}P(o_1,o_2,...,o_T,i_1=q_i\mid\lambda) \\
&=\sum_{i=1}^{N}P(o_1\mid o_2,...,o_T,i_1=q_i,\lambda)\cdot P(o_2,...,o_T,i_1=q_i\mid\lambda) \\
&=\sum_{i=1}^{N}P(o_1\mid i_1=q_i,\lambda)\cdot P(o_2,...,o_T\mid i_1=q_i,\lambda)\cdot P(i_1=q_i\mid\lambda) \\
&=\sum_{i=1}^{N}b_i(o_1)\cdot \beta_1(i)\cdot \pi_i
\end{aligned}
$$
(1) Initialization

$$
\beta_T(i) = 1
$$
(2) Recursion

$$
\begin{aligned}
\beta_{t}(i) &= P\left(o_{t+1}, o_{t+2}, \cdots, o_{T} \mid i_{t}=q_{i}, \lambda\right) \\
&=\sum_{j=1}^N P\left(o_{t+1}, o_{t+2}, \cdots, o_{T},i_{t+1}=q_j \mid i_{t}=q_{i}, \lambda\right) \\
&=\sum_{j=1}^N P\left(o_{t+1}, o_{t+2}, \cdots, o_{T}\mid i_{t+1}=q_j,i_{t}=q_{i}, \lambda\right)\cdot P(i_{t+1}=q_j\mid i_t=q_i,\lambda) \\
&=\sum_{j=1}^N P\left(o_{t+1}, o_{t+2}, \cdots, o_{T}\mid i_{t+1}=q_j,i_{t}=q_{i}, \lambda\right)\cdot a_{ij}
\end{aligned}
$$
The next step is the key one: conditioned on the state $i_{t+1}=q_j$, the future observations $o_{t+1},\dots,o_T$ are independent of $i_t$ (this follows from the two assumptions above; in graphical-model terms, $i_{t+1}$ blocks every path from $i_t$ to the future observations), so $i_t=q_i$ can be dropped from the conditioning:
$$
\begin{aligned}
\beta_t(i) &= \sum_{j=1}^{N}P(o_{t+1},...,o_T\mid i_{t+1}=q_j,\lambda)\cdot a_{ij} \\
&=\sum_{j=1}^{N}P(o_{t+2},...,o_T\mid i_{t+1}=q_j,\lambda)\cdot P(o_{t+1}\mid o_{t+2},...,o_T,i_{t+1}=q_j,\lambda) \cdot a_{ij} \\
&= \sum_{j=1}^{N}\beta_{t+1}(j)\cdot b_j(o_{t+1})\cdot a_{ij}
\end{aligned}
$$
(3) Termination

$$
P(O\mid\lambda) =\sum_{i=1}^{N}b_i(o_1)\cdot \beta_1(i)\cdot \pi_i
$$
The data and parameter definitions are the same as above, so they are omitted here.

Define the model
```python
class BackwardModel:
    def __init__(self, Pi, A, B) -> None:
        self.Pi = Pi
        self.A = A
        self.B = B
        self.N = len(A)          # number of states
    def predict(self, O):
        # Initialization: beta_T(i) = 1
        beta = np.ones(shape=(self.N,))
        # Recursion over o_T, ..., o_2 (o_1 is used only in the termination step)
        for o in reversed(O[1:]):
            temp = np.zeros_like(beta)
            for i in range(self.N):
                for j in range(self.N):
                    temp[i] += beta[j] * self.B[j][o] * self.A[i][j]
            beta = temp
        # Termination: P(O|lambda) = sum_i b_i(o_1) * beta_1(i) * pi_i
        res = 0.0
        for i in range(self.N):
            res += self.B[i][O[0]] * beta[i] * self.Pi[i]
        return res
```
Predict

```python
model = BackwardModel(Pi, A, B)
model.predict(O)
```

0.026862016

As we can see, this matches the value obtained earlier with the forward algorithm.
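Since both algorithms compute the same quantity, a stronger consistency check is the identity $P(O\mid\lambda)=\sum_{i=1}^{N}\alpha_t(i)\,\beta_t(i)$, which holds at every time step $t$. Below is a small self-contained vectorized sketch of this check; the helper names `forward_all`/`backward_all` are mine, not from the code above:

```python
import numpy as np

pi = np.array([0.25, 0.25, 0.25, 0.25])
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.4, 0.0, 0.6, 0.0],
              [0.0, 0.4, 0.0, 0.6],
              [0.0, 0.0, 0.5, 0.5]])
B = np.array([[0.5, 0.5],
              [0.3, 0.7],
              [0.6, 0.4],
              [0.8, 0.2]])
O = [0, 0, 1, 1, 0]  # red, red, white, white, red (0 = red, 1 = white)

def forward_all(O):
    """alpha[t, i] for all t, using alpha_{t+1} = (alpha_t A) * b(o_{t+1})."""
    T = len(O)
    alpha = np.zeros((T, len(pi)))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha

def backward_all(O):
    """beta[t, i] for all t, using beta_t = A (b(o_{t+1}) * beta_{t+1})."""
    T = len(O)
    beta = np.ones((T, len(pi)))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

alpha, beta = forward_all(O), backward_all(O)
probs = (alpha * beta).sum(axis=1)  # one estimate of P(O|lambda) per time step
```

All entries of `probs` should agree (up to floating point), and the last one reduces to $\sum_i \alpha_T(i)$, the forward termination formula.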
Learning problem (Learning): the Baum-Welch algorithm. The rest is left as a placeholder for now, to be filled in tomorrow.



