在pymc3中边缘化离散潜在变量，提升认知诊断(CDM)模型参数估计的速度和有效性。

在pymc3中，主要使用NUTS算法，但是此算法并不支持对潜在离散随机变量进行抽样。因此，当直接对潜在离散随机变量进行抽样时，pymc3会使用组合采样（CompoundStep）方法，降低抽样速度。想要在NUTS算法下对潜在离散随机变量进行抽样，需要将其边缘化。

本文以DINA为例

被试的作答正确概率为

代表被试n (1,...,N) 在题目i (1,...,I) 上的作答结果，和分别是题目i的失误和猜测概率，为被试n在题目i上的理想作答，当理想作答为1时，作答正确的概率为，当理想作答为0时，作答概率为。

代表被试n是否掌握了属性k，代表题目i是否考察了属性k，只有当被试n掌握题目i所有考察的属性时，理想作答才为1。

模拟数据生成

theano是pymc3的后端框架，对pymc3模型中的array进行操作时，需要使用theano的方法

import itertools
import numpy as np
import pymc3 as pm
import theano.tensor as tt
N = 300
I = 14
K = 3
all_p = np.array(list(itertools.product(*[[0,1] for k in range(K)])))

# 随机生成被试属性掌握模式
att_idx = np.random.randint(0,len(all_p),N)
att = all_p[att_idx]

# 生成Q矩阵
Q = np.vstack([all_p[1:] for i in range(2)])

# 生成guessing和slipping
s = np.random.uniform(0,0.2,(1,I))
g = np.random.uniform(0,0.2,(1,I))

# 计算理想作答
eta = (att.reshape(N,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)

# 计算作答正确的概率
p = (1-s-g)*eta+g

# 生成具体作答结果
Y = np.random.binomial(1,p)

非边缘化方法

在非边缘化方法中，pymc3模型的构建包括以下几个步骤

个体属于各个属性掌握模式的概率.
从概率中抽出一个属性掌握模式作为被试的属性掌握模式
计算被试作答正确的概率

代码如下,N/C/I分别代表被试数量。属性掌握模式类别数量（all_p的行数），题目数量。

# 所有属性掌握模式对应的理想作答
all_p_eta = eta = (all_p.reshape(-1,1,K)>=Q.reshape(1,I,K)).prod(axis=-1)
all_p_eta = tt.as_tensor(all_p_eta)

with pm.Model() as DINA:
    
    # 个体属于各个属性掌握模式的概率,shape=N*C
    omega_c = pm.Dirichlet("att_pattern_p",np.ones(len(all_p)),shape=(N,len(all_p)))
    
    # 从概率中抽出一个属性掌握模式作为被试的属性掌握模式,shape=N
    att_pattern = pm.Categorical("att_pattern",omega_c,shape=(N))
    
    # 将属性掌握模式转化为理想作答，并计算作答正确的概率
    eta = all_p_eta[att_pattern]
    s_sample = pm.Beta("s",1,1,shape=(1,I))
    g_sample = pm.Beta("g",1,1,shape=(1,I))
    p_sample = eta*(1-s_sample-g_sample)+g_sample
    pm.Bernoulli("Y",p_sample,observed=Y)


with DINA:
    tr1 = pm.sample(1000,tune=3000,return_inferencedata=True,cores=4)

Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>NUTS: [g, s, att_pattern_p]
>CategoricalGibbsMetropolis: [att_pattern]

 100.00% [16000/16000 15:03<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 3_000 tune and 1_000 draw iterations (12_000 + 4_000 draws total) took 905 seconds.
/home/anaconda/anaconda3/envs/pymc3/lib/python3.7/site-packages/arviz/stats/diagnostics.py:586: RuntimeWarning: divide by zero encountered in double_scalars
  (between_chain_variance / within_chain_variance + num_samples - 1) / (num_samples)
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

att_pattern_p

array([[0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.14, 0.19],
       [0.11, 0.11, 0.14, 0.19, 0.11, 0.11, 0.11, 0.11],
       [0.15, 0.16, 0.11, 0.11, 0.12, 0.12, 0.11, 0.11],
       [0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.14, 0.19],
       [0.11, 0.11, 0.14, 0.19, 0.11, 0.11, 0.11, 0.11],
       [0.11, 0.11, 0.11, 0.11, 0.14, 0.2 , 0.11, 0.11],
       [0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.19, 0.14],
       [0.14, 0.19, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11],
       [0.11, 0.11, 0.19, 0.14, 0.11, 0.11, 0.11, 0.11],
       [0.19, 0.14, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11],
       [0.14, 0.19, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11],
       [0.11, 0.11, 0.11, 0.11, 0.19, 0.14, 0.11, 0.11],
       [0.11, 0.11, 0.11, 0.11, 0.14, 0.2 , 0.11, 0.11],
       [0.11, 0.11, 0.12, 0.11, 0.18, 0.15, 0.11, 0.11],
       [0.11, 0.11, 0.19, 0.14, 0.11, 0.11, 0.11, 0.11],
       [0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.2 , 0.14],
       [0.14, 0.19, 0.11, 0.11, 0.12, 0.11, 0.11, 0.11],
       [0.11, 0.11, 0.11, 0.11, 0.14, 0.19, 0.11, 0.11],
       [0.11, 0.11, 0.2 , 0.14, 0.11, 0.11, 0.11, 0.11],
       [0.13, 0.19, 0.12, 0.11, 0.11, 0.11, 0.11, 0.11],
...
       [0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.19, 0.14],
       [0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.14, 0.2 ],
       [0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.19, 0.14],
       [0.11, 0.11, 0.14, 0.2 , 0.11, 0.11, 0.11, 0.11],
       [0.13, 0.13, 0.15, 0.12, 0.12, 0.11, 0.13, 0.11],
       [0.14, 0.19, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11],
       [0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.2 , 0.14],
       [0.11, 0.11, 0.11, 0.11, 0.2 , 0.14, 0.11, 0.11],
       [0.19, 0.14, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11],
       [0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.19, 0.13],
       [0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.14, 0.19],
       [0.11, 0.11, 0.14, 0.19, 0.11, 0.11, 0.11, 0.11],
       [0.11, 0.11, 0.19, 0.14, 0.11, 0.11, 0.11, 0.11],
       [0.17, 0.17, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11],
       [0.12, 0.12, 0.13, 0.12, 0.13, 0.11, 0.14, 0.13],
       [0.11, 0.11, 0.11, 0.11, 0.14, 0.2 , 0.11, 0.11],
       [0.14, 0.13, 0.16, 0.12, 0.11, 0.11, 0.11, 0.11],
       [0.16, 0.16, 0.11, 0.11, 0.13, 0.11, 0.11, 0.11],
       [0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.14, 0.19],
       [0.11, 0.11, 0.19, 0.14, 0.11, 0.11, 0.11, 0.11]])

最后att_pattern的采样结果就是pymc3模型对被试属性掌握模式的估计结果

边缘化方法

在边缘化方法中，不再将个体属于各个属性掌握模式的概率抽样为具体的属性掌握模式，而是直接通过个体属于各个属性掌握模式的概率计算个体作答正确的概率。

边缘化的pymc3模型的构建包括以下几个步骤

个体属于各个属性掌握模式的概率
计算被试作答正确的概率。.代表被试n属于第c类属性掌握模式的概率，代表第c类属性掌握模式答对第i题的概率。被试n答对第i题的概率等于被试属于类别1到c的概率乘上类别1到c作答正确的概率的和，如下

在当前代码中，此步骤通过矩阵点乘实现。

设被试数量N=3，掌握模式类别数量C=2，题目数量I=2

,N行c列矩阵，

C行I列矩阵，则这三个被试在题目i上的作答概率是二者的点乘，即

代码如下,N/C/I分别代表被试数量。属性掌握模式类别数量（all_p的行数），题目数量。

all_p_eta = tt.as_tensor(all_p_eta)
with pm.Model() as DINA_marginal:
    
    # 个体属于各个属性掌握模式的概率,shape=N*C
    att_pattern_p = pm.Dirichlet("att_pattern_p",np.ones(len(all_p)),shape=(N,len(all_p)))
    
    s_sample = pm.Beta("s",1,1,shape=(1,I))
    g_sample = pm.Beta("g",1,1,shape=(1,I))
    
    # 所有可能的属性掌握模式作答正确的概率，shape=C*I
    all_p_p = all_p_eta*(1-s_sample-g_sample)+g_sample
    
    # 被试作答正确概率矩阵，shape=N*I
    p_sample = att_pattern_p.dot(all_p_p)
    pm.Bernoulli("Y",p_sample,observed=Y)

with DINA_marginal:
    tr2 = pm.sample(1000,tune=3000,return_inferencedata=True,cores=4)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [g, s, att_pattern_p, pi]

100.00% [8000/8000 16:59<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 3_000 tune and 1_000 draw iterations (6_000 + 2_000 draws total) took 1020 seconds.
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The estimated number of effective samples is smaller than 200 for some parameters.

最后att_pattern_p 的采样结果就是pymc3模型估计的结果

      [[0.17, 0.17, 0.11, 0.09, 0.17, 0.13, 0.1 , 0.06],
       [0.09, 0.1 , 0.09, 0.1 , 0.12, 0.1 , 0.25, 0.15],
       [0.14, 0.09, 0.22, 0.16, 0.1 , 0.06, 0.11, 0.13],
       [0.16, 0.15, 0.16, 0.12, 0.1 , 0.14, 0.09, 0.08],
       [0.05, 0.06, 0.05, 0.07, 0.06, 0.09, 0.07, 0.53],
       [0.09, 0.11, 0.07, 0.07, 0.11, 0.38, 0.07, 0.09],
       [0.17, 0.17, 0.11, 0.09, 0.17, 0.13, 0.09, 0.06],
       [0.13, 0.14, 0.1 , 0.08, 0.22, 0.15, 0.11, 0.06],
       [0.11, 0.08, 0.12, 0.1 , 0.17, 0.09, 0.22, 0.11],
       [0.1 , 0.14, 0.08, 0.08, 0.14, 0.29, 0.08, 0.09],
       [0.08, 0.07, 0.09, 0.09, 0.09, 0.09, 0.13, 0.37],
       [0.17, 0.11, 0.18, 0.1 , 0.17, 0.09, 0.13, 0.06],
       [0.25, 0.15, 0.15, 0.09, 0.15, 0.08, 0.09, 0.05],
       [0.14, 0.1 , 0.1 , 0.07, 0.23, 0.18, 0.11, 0.07],
       [0.15, 0.28, 0.11, 0.12, 0.11, 0.12, 0.07, 0.06],
       [0.14, 0.1 , 0.1 , 0.07, 0.23, 0.11, 0.19, 0.07],
       [0.14, 0.23, 0.1 , 0.11, 0.1 , 0.18, 0.07, 0.07],
       [0.05, 0.05, 0.05, 0.07, 0.05, 0.07, 0.08, 0.57],
       [0.08, 0.1 , 0.09, 0.09, 0.1 , 0.32, 0.09, 0.12],
       [0.06, 0.05, 0.07, 0.08, 0.07, 0.08, 0.12, 0.46],
...
       [0.15, 0.1 , 0.28, 0.12, 0.11, 0.07, 0.12, 0.06],
       [0.11, 0.1 , 0.08, 0.07, 0.14, 0.32, 0.09, 0.09],
       [0.15, 0.11, 0.11, 0.07, 0.27, 0.12, 0.12, 0.06],
       [0.13, 0.22, 0.09, 0.16, 0.1 , 0.1 , 0.1 , 0.1 ],
       [0.09, 0.13, 0.1 , 0.1 , 0.09, 0.22, 0.07, 0.2 ],
       [0.13, 0.2 , 0.09, 0.1 , 0.12, 0.14, 0.12, 0.09],
       [0.08, 0.11, 0.11, 0.33, 0.07, 0.1 , 0.07, 0.13],
       [0.13, 0.12, 0.13, 0.14, 0.13, 0.15, 0.1 , 0.11],
       [0.14, 0.23, 0.1 , 0.11, 0.1 , 0.19, 0.07, 0.07],
       [0.09, 0.11, 0.11, 0.18, 0.08, 0.12, 0.09, 0.24],
       [0.15, 0.11, 0.11, 0.07, 0.27, 0.12, 0.12, 0.06],
       [0.11, 0.15, 0.08, 0.09, 0.1 , 0.32, 0.07, 0.09],
       [0.06, 0.07, 0.07, 0.12, 0.05, 0.08, 0.08, 0.47],
       [0.09, 0.07, 0.13, 0.08, 0.13, 0.12, 0.26, 0.12],
       [0.13, 0.13, 0.14, 0.16, 0.14, 0.11, 0.11, 0.08],
       [0.11, 0.1 , 0.08, 0.07, 0.15, 0.32, 0.09, 0.09],
       [0.08, 0.08, 0.1 , 0.09, 0.1 , 0.09, 0.33, 0.12],
       [0.15, 0.16, 0.15, 0.12, 0.11, 0.09, 0.14, 0.08],
       [0.14, 0.14, 0.1 , 0.08, 0.22, 0.15, 0.11, 0.07],
       [0.2 , 0.12, 0.21, 0.1 , 0.13, 0.08, 0.1 , 0.05]])

参考文献：

Cognitive Diagnosis Model

Problem with hierachical occupancy model - Questions - PyMC Discourse

Naive Bayes model with PyMC3 - #9 by krasserm - Questions - PyMC Discourse

在pymc3中边缘化离散潜在变量，提升认知诊断(CDM)模型参数估计的速度和有效性。

Python相关栏目本月热门文章