How does mask_zero in the Keras Embedding layer work?

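In fact, setting mask_zero=True on the Embedding layer does not make it return zero vectors. The Embedding layer's own behavior does not change: it still returns the embedding vector stored at index zero. You can confirm this by inspecting the Embedding layer's weights (m.layers[0].get_weights() in the example you mention). What the setting does change is the behavior of subsequent layers, such as RNN layers.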

If you look at the source code of the Embedding layer, you will see a method called compute_mask:

def compute_mask(self, inputs, mask=None):
    if not self.mask_zero:
        return None
    output_mask = K.not_equal(inputs, 0)
    return output_mask
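To see what this mask looks like for a concrete batch, here is a minimal sketch that uses NumPy as a stand-in for the Keras backend call. It assumes that np.not_equal behaves like K.not_equal for this purpose, and the variable names are illustrative only.

```python
import numpy as np

# One padded sequence; index 0 is the padding value.
inputs = np.array([[1, 0, 2, 0]])

# NumPy stand-in (an assumption) for K.not_equal(inputs, 0):
# True where the index is non-zero, False at the padded positions.
output_mask = np.not_equal(inputs, 0)
print(output_mask)
```

The mask has the same (batch, timesteps) shape as the integer input, so there is one boolean per timestep rather than one per embedding dimension.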

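This output mask is passed as the mask argument to the following layers that support masking. This is implemented in the __call__ method of the base Layer class: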

# Handle mask propagation.
previous_mask = _collect_previous_mask(inputs)
user_kwargs = copy.copy(kwargs)
if not is_all_none(previous_mask):
    # The previous layer generated a mask.
    if has_arg(self.call, 'mask'):
        if 'mask' not in kwargs:
            # If mask is explicitly passed to __call__,
            # we should override the default mask.
            kwargs['mask'] = previous_mask

This lets the following layers ignore the masked input steps, i.e. leave them out of their computation. Here is a minimal example:

import numpy as np
from keras.layers import Input, Embedding, LSTM
from keras.models import Model

data_in = np.array([
    [1, 0, 2, 0]
])

x = Input(shape=(4,))
e = Embedding(5, 5, mask_zero=True)(x)
rnn = LSTM(3, return_sequences=True)(e)

m = Model(inputs=x, outputs=rnn)
m.predict(data_in)

array([[[-0.00084503, -0.00413611,  0.00049972],
        [-0.00084503, -0.00413611,  0.00049972],
        [-0.00144554, -0.00115775, -0.00293898],
        [-0.00144554, -0.00115775, -0.00293898]]], dtype=float32)

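As you can see, the outputs of the LSTM layer at the second and fourth timesteps are identical to the outputs at the first and third timesteps, respectively. This means those timesteps have been masked.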
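The repeated outputs can be reproduced with a toy recurrent loop. The sketch below is not the Keras LSTM: it is a hypothetical tanh RNN written in NumPy (the function masked_simple_rnn and its weight arguments are made up for illustration), under the assumption that a masked step simply leaves the state untouched.

```python
import numpy as np

def masked_simple_rnn(x, mask, w_in, w_rec):
    """Toy tanh RNN over one sequence that skips masked timesteps.

    x:    (timesteps, features) inputs, e.g. embedding vectors
    mask: (timesteps,) booleans, False marks a step to skip
    """
    state = np.zeros(w_rec.shape[0])
    outputs = []
    for x_t, keep in zip(x, mask):
        if keep:
            state = np.tanh(x_t @ w_in + state @ w_rec)
        # On a masked step the state is left as it was, so the
        # previous output is repeated.
        outputs.append(state)
    return np.stack(outputs)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))                  # 4 timesteps of 5-dim vectors
mask = np.array([True, False, True, False])  # same pattern as [1, 0, 2, 0]
out = masked_simple_rnn(x, mask,
                        rng.normal(size=(5, 3)),
                        rng.normal(size=(3, 3)))
print(out)
```

Rows 0 and 1 of out match, as do rows 2 and 3, which is the same pattern as in the LSTM output above.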

Update: The mask is also taken into account when computing the loss, because the loss functions are internally augmented to support masking using weighted_masked_objective:

def weighted_masked_objective(fn):
    """Adds support for masking and sample-weighting to an objective function.
    It transforms an objective function `fn(y_true, y_pred)`
    into a sample-weighted, cost-masked objective function
    `fn(y_true, y_pred, weights, mask)`.
    # Arguments
        fn: The objective function to wrap,
            with signature `fn(y_true, y_pred)`.
    # Returns
        A function with signature `fn(y_true, y_pred, weights, mask)`.
    """

and when compiling the model:

weighted_losses = [weighted_masked_objective(fn) for fn in loss_functions]

You can verify this with the following example:

import numpy as np
from keras.layers import Input, Embedding, Dense
from keras.models import Model

data_in = np.array([[1, 2, 0, 0]])
data_out = np.arange(12).reshape(1, 4, 3)

x = Input(shape=(4,))
e = Embedding(5, 5, mask_zero=True)(x)
d = Dense(3)(e)

m = Model(inputs=x, outputs=d)
m.compile(loss='mse', optimizer='adam')
preds = m.predict(data_in)
loss = m.evaluate(data_in, data_out, verbose=0)
print(preds)
print('Computed Loss:', loss)

[[[ 0.009682    0.02505393 -0.00632722]
  [ 0.01756451  0.05928303  0.0153951 ]
  [-0.00146054 -0.02064196 -0.04356086]
  [-0.00146054 -0.02064196 -0.04356086]]]
Computed Loss: 9.041069030761719

# verify that only the first two outputs
# have been considered in the computation of loss
print(np.square(preds[0, 0:2] - data_out[0, 0:2]).mean())

9.041070036475277
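The arithmetic behind that check can be sketched in plain NumPy. The function below is a hypothetical re-implementation (masked_mse is not a Keras function) of what a masked, unweighted MSE is assumed to do: zero out the per-timestep loss at masked steps, then rescale by the fraction of unmasked steps. The all-zero predictions are made up so that the example needs no trained model.

```python
import numpy as np

def masked_mse(y_true, y_pred, mask):
    # Per-timestep MSE over the feature axis: shape (batch, timesteps)
    score = np.mean(np.square(y_pred - y_true), axis=-1)
    mask = mask.astype(score.dtype)
    score = score * mask            # masked timesteps contribute 0
    score = score / np.mean(mask)   # rescale by the unmasked fraction
    return np.mean(score)

data_in = np.array([[1, 2, 0, 0]])
y_true = np.arange(12, dtype=float).reshape(1, 4, 3)
y_pred = np.zeros((1, 4, 3))        # hypothetical predictions
mask = data_in != 0                 # what compute_mask would produce

loss = masked_mse(y_true, y_pred, mask)
# Plain MSE restricted to the two unmasked timesteps
reference = np.square(y_pred[0, 0:2] - y_true[0, 0:2]).mean()
print(loss, reference)
```

Both values agree, so averaging the masked scores and dividing by the mean of the mask is equivalent to averaging over the unmasked timesteps only.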

