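Actually, setting mask_zero=True on the Embedding layer does not make it return a zero vector. The behavior of the Embedding layer itself does not change: it still returns the embedding vector stored at index zero. You can confirm this by inspecting the Embedding layer's weights (i.e. m.layers[0].get_weights() in the example you mentioned). What changes instead is the behavior of the layers that follow it, such as RNN layers.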
If you inspect the source code of the Embedding layer, you will see a method called compute_mask:

    def compute_mask(self, inputs, mask=None):
        if not self.mask_zero:
            return None
        output_mask = K.not_equal(inputs, 0)
        return output_mask

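To make the two points above concrete, here is a minimal pure-Python sketch (not the Keras implementation): the lookup returns whatever vector is stored in row 0, and the mask is derived separately from the input ids. The table values and the helper names `embed` and `compute_mask` are made up for illustration.

```python
# Toy embedding table: 5 tokens, 3 dimensions each (arbitrary values).
embedding_table = [
    [0.5, -0.1, 0.3],  # row 0: an ordinary vector, not zeros
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]


def embed(token_ids):
    """Plain table lookup; masking does not alter the returned vectors."""
    return [embedding_table[i] for i in token_ids]


def compute_mask(token_ids, mask_zero=True):
    """Same rule as the method above: True wherever the input id is not 0."""
    if not mask_zero:
        return None
    return [i != 0 for i in token_ids]


token_ids = [1, 0, 2, 0]
vectors = embed(token_ids)
mask = compute_mask(token_ids)

print(vectors[1])  # the vector stored in row 0, unchanged
print(mask)        # the mask is carried alongside the vectors
```

The mask is extra information that travels next to the embedded sequence; the embedded values themselves are left alone.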
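This output mask is passed, as the mask argument, to the following layers that support masking. This is implemented in the __call__ method of the base layer, Layer: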

    # Handle mask propagation.
    previous_mask = _collect_previous_mask(inputs)
    user_kwargs = copy.copy(kwargs)
    if not is_all_none(previous_mask):
        # The previous layer generated a mask.
        if has_arg(self.call, 'mask'):
            if 'mask' not in kwargs:
                # If mask is explicitly passed to __call__,
                # we should override the default mask.
                kwargs['mask'] = previous_mask

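The propagation rule in that snippet can be sketched in isolation. This is a simplified stand-in, not Keras code: `MaskAwareLayer`, `PlainLayer`, and `invoke` are hypothetical names, and `has_arg` is reimplemented here with `inspect.signature`.

```python
import inspect


def has_arg(fn, name):
    """Return True if callable `fn` accepts a parameter called `name`."""
    return name in inspect.signature(fn).parameters


class MaskAwareLayer:
    def call(self, inputs, mask=None):
        return ("mask-aware", mask)


class PlainLayer:
    def call(self, inputs):
        return ("plain", None)


def invoke(layer, inputs, previous_mask, **kwargs):
    """Forward `previous_mask` only if the layer's call() accepts a mask
    and the caller did not pass one explicitly."""
    if previous_mask is not None and has_arg(layer.call, "mask"):
        kwargs.setdefault("mask", previous_mask)
    return layer.call(inputs, **kwargs)


mask = [True, False, True, False]
ids = [1, 0, 2, 0]

result_aware = invoke(MaskAwareLayer(), ids, mask)
result_plain = invoke(PlainLayer(), ids, mask)
result_override = invoke(MaskAwareLayer(), ids, mask, mask=[True] * 4)

print(result_aware)     # receives the mask from the previous layer
print(result_plain)     # call() has no mask parameter, so none is passed
print(result_override)  # an explicitly passed mask wins
```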
This makes the following layers ignore these input steps (i.e. leave them out of their computations). Here is a minimal example:

    # Imports assumed for the standalone Keras 2.x API used in this answer.
    import numpy as np
    from keras.layers import Input, Embedding, LSTM
    from keras.models import Model

    data_in = np.array([
      [1, 0, 2, 0]
    ])

    x = Input(shape=(4,))
    e = Embedding(5, 5, mask_zero=True)(x)
    rnn = LSTM(3, return_sequences=True)(e)

    m = Model(inputs=x, outputs=rnn)
    m.predict(data_in)

    array([[[-0.00084503, -0.00413611,  0.00049972],
            [-0.00084503, -0.00413611,  0.00049972],
            [-0.00144554, -0.00115775, -0.00293898],
            [-0.00144554, -0.00115775, -0.00293898]]], dtype=float32)

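As you can see, the LSTM outputs for the second and fourth timesteps are the same as the outputs for the first and third timesteps, respectively. This means those timesteps have been masked.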
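The repeated outputs are what you would expect if a masked step simply carries the previous state forward instead of updating it. The toy scalar recurrent layer below sketches that idea; it is not the LSTM itself, and `toy_rnn` and its weights `w` and `u` are made up for illustration.

```python
import math


def toy_rnn(inputs, mask, w=0.5, u=0.8):
    """Scalar recurrent layer: h_t = tanh(w * x_t + u * h_{t-1}).

    At masked steps (mask[t] is False) the state is not updated, so the
    previous output is emitted again.
    """
    h = 0.0
    outputs = []
    for x_t, keep in zip(inputs, mask):
        if keep:
            h = math.tanh(w * x_t + u * h)
        outputs.append(h)
    return outputs


token_ids = [1, 0, 2, 0]
mask = [t != 0 for t in token_ids]
inputs = [float(t) for t in token_ids]  # stand-in for embedded inputs

masked_outputs = toy_rnn(inputs, mask)
unmasked_outputs = toy_rnn(inputs, [True] * len(inputs))

print(masked_outputs)    # steps 2 and 4 repeat steps 1 and 3
print(unmasked_outputs)  # without a mask, every step updates the state
```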
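Update:
The mask is also taken into account when the loss is computed, because the loss functions are internally wrapped to support masking using weighted_masked_objective: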

    def weighted_masked_objective(fn):
        """Adds support for masking and sample-weighting to an objective function.
        It transforms an objective function `fn(y_true, y_pred)`
        into a sample-weighted, cost-masked objective function
        `fn(y_true, y_pred, weights, mask)`.
        # Arguments
            fn: The objective function to wrap,
                with signature `fn(y_true, y_pred)`.
        # Returns
            A function with signature `fn(y_true, y_pred, weights, mask)`.
        """

This happens when the model is compiled:

    weighted_losses = [weighted_masked_objective(fn) for fn in loss_functions]

You can verify this with the following example:

    # Imports assumed for the standalone Keras 2.x API used in this answer.
    import numpy as np
    from keras.layers import Input, Embedding, Dense
    from keras.models import Model

    data_in = np.array([[1, 2, 0, 0]])
    data_out = np.arange(12).reshape(1,4,3)

    x = Input(shape=(4,))
    e = Embedding(5, 5, mask_zero=True)(x)
    d = Dense(3)(e)

    m = Model(inputs=x, outputs=d)
    m.compile(loss='mse', optimizer='adam')
    preds = m.predict(data_in)
    loss = m.evaluate(data_in, data_out, verbose=0)
    print(preds)
    print('Computed Loss:', loss)

    [[[ 0.009682    0.02505393 -0.00632722]
      [ 0.01756451  0.05928303  0.0153951 ]
      [-0.00146054 -0.02064196 -0.04356086]
      [-0.00146054 -0.02064196 -0.04356086]]]
    Computed Loss: 9.041069030761719

    # verify that only the first two outputs
    # have been considered in the computation of loss
    print(np.square(preds[0,0:2] - data_out[0,0:2]).mean())

    9.041070036475277

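For intuition, the masked loss can be recomputed by hand from the printed predictions. The sketch below assumes the wrapper zeroes the per-timestep scores at masked positions and then rescales by the mean of the mask, which is my reading of what weighted_masked_objective does rather than a copy of it. The helper names `mse_per_step` and `masked_mean` are made up.

```python
# Predictions printed above, and the targets np.arange(12).reshape(1, 4, 3)[0].
preds = [
    [0.009682, 0.02505393, -0.00632722],
    [0.01756451, 0.05928303, 0.0153951],
    [-0.00146054, -0.02064196, -0.04356086],
    [-0.00146054, -0.02064196, -0.04356086],
]
targets = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
mask = [1.0, 1.0, 0.0, 0.0]  # from data_in = [1, 2, 0, 0]


def mse_per_step(y_true, y_pred):
    """Mean squared error over the feature axis, one score per timestep."""
    return [
        sum((t - p) ** 2 for t, p in zip(t_row, p_row)) / len(t_row)
        for t_row, p_row in zip(y_true, y_pred)
    ]


def masked_mean(scores, mask):
    """Zero out masked scores, rescale by the mean of the mask, then average."""
    mean_mask = sum(mask) / len(mask)
    rescaled = [s * m / mean_mask for s, m in zip(scores, mask)]
    return sum(rescaled) / len(rescaled)


scores = mse_per_step(targets, preds)
masked_loss = masked_mean(scores, mask)
unmasked_loss = sum(scores) / len(scores)

print(round(masked_loss, 4))    # agrees with the loss reported above
print(round(unmasked_loss, 4))  # what the loss would be without the mask
```

Rescaling by the mean of the mask makes the result an average over the unmasked timesteps only, which is why it matches the manual check on the first two outputs.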


