torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True, device=None, dtype=None)
The operation is defined mathematically as:
$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$
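where $\mathrm{E}[x]$ is the expectation (mean) and $\mathrm{Var}[x]$ is the variance of $x$, both computed over the trailing dimensions given by normalized_shape; $\beta$ and $\gamma$ are learnable parameters; and $\epsilon > 0$ is a small constant (eps, default 1e-05) that keeps the denominator away from zero.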
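To make the formula concrete, here is a minimal sketch in plain Python that applies it to a flat list of numbers. The helper name layer_norm_1d is hypothetical and not part of PyTorch. The sketch assumes the biased variance (dividing by n), which is what LayerNorm uses, and defaults $\gamma$ to 1 and $\beta$ to 0, as at initialization.

```python
import math

def layer_norm_1d(x, gamma=None, beta=None, eps=1e-5):
    """Apply y = (x - E[x]) / sqrt(Var[x] + eps) * gamma + beta to a flat list.

    Hypothetical helper for illustration only; not a PyTorch function.
    """
    n = len(x)
    gamma = gamma if gamma is not None else [1.0] * n  # scale, defaults to 1
    beta = beta if beta is not None else [0.0] * n     # shift, defaults to 0
    mean = sum(x) / n                                  # E[x]
    var = sum((v - mean) ** 2 for v in x) / n          # Var[x], biased (divides by n)
    denom = math.sqrt(var + eps)
    return [(v - mean) / denom * g + b for v, g, b in zip(x, gamma, beta)]

y = layer_norm_1d([1.0, 2.0, 3.0, 4.0])
print([round(v, 4) for v in y])  # [-1.3416, -0.4472, 0.4472, 1.3416]
```

The output has mean 0 and variance very close to 1. Passing gamma and beta then rescales and shifts each element independently.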
import torch
import torch.nn as nn

N, C, H, W = 12, 3, 256, 256
input = torch.randn(N, C, H, W)  # input data
# Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
layer_norm = nn.LayerNorm([C, H, W])
output = layer_norm(input)
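A quick way to check what the example above does is to inspect the parameter shapes and the per-sample statistics of the output. The following is a sketch only: it uses a smaller input than the example (an assumption made to keep the check fast) and a fixed seed so the run is repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                     # fixed seed so the run is repeatable

N, C, H, W = 2, 3, 8, 8                  # smaller sizes than the example, for speed
x = torch.randn(N, C, H, W)
layer_norm = nn.LayerNorm([C, H, W])
out = layer_norm(x)

# gamma (weight) and beta (bias) have the same shape as normalized_shape
weight_shape = tuple(layer_norm.weight.shape)
bias_shape = tuple(layer_norm.bias.shape)

# Each sample is normalized independently over its C*H*W elements
per_sample_mean = out.mean(dim=(1, 2, 3))
per_sample_var = out.var(dim=(1, 2, 3), unbiased=False)

print(tuple(out.shape), weight_shape, bias_shape)
print(per_sample_mean, per_sample_var)
```

With the default initialization ($\gamma = 1$, $\beta = 0$), each of the N samples should come out with mean close to 0 and variance close to 1, while the batch dimension is left untouched.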



