Using Some Common PyTorch Methods

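nn.ModuleList in PyTorch works like a Python list that stores nn.Module objects, and the sub-modules it holds can be iterated over. Unlike a plain list, it also registers each sub-module with the parent module. For example, the multi-head mechanism in GAT can be implemented with nn.ModuleList.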

import torch
import torch.nn as nn
import torch.nn.functional as F

# GATLayer comes from the author's local layer module (not shown here)
from layer import GATLayer
from torch.nn import ModuleList


class GAT_NET(nn.Module):
    def __init__(self, num_input, num_hidden, num_classes, num_heads=3, dropout=0.5):
        super(GAT_NET, self).__init__()
        # gat1 holds num_heads GATLayer modules, one per attention head
        self.gat1 = ModuleList(GATLayer(num_input, num_hidden) for _ in range(num_heads))
        # gat2 takes the concatenated output of all heads
        self.gat2 = GATLayer(num_hidden * num_heads, num_classes)
        self.dropout = dropout

    def forward(self, adj, H):
        # Run every head and concatenate the results along the feature dimension
        H = torch.cat([gat(adj, H) for gat in self.gat1], dim=1)
        H = F.dropout(H, self.dropout, training=self.training)
        H = self.gat2(adj, H)
        return F.softmax(H, dim=1)
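
The difference between nn.ModuleList and a plain Python list can be checked with a small sketch. It uses nn.Linear as a stand-in for GATLayer (whose code is not shown in this post), and the class names WithModuleList and WithPlainList are made up for illustration only.

```python
import torch
import torch.nn as nn


class WithModuleList(nn.Module):
    def __init__(self, num_heads=3):
        super().__init__()
        # nn.ModuleList registers each sub-module with the parent module
        self.heads = nn.ModuleList(nn.Linear(4, 2) for _ in range(num_heads))

    def forward(self, x):
        # Concatenate the outputs of all heads along the feature dimension
        return torch.cat([head(x) for head in self.heads], dim=1)


class WithPlainList(nn.Module):
    def __init__(self, num_heads=3):
        super().__init__()
        # A plain Python list is not registered, so its parameters stay invisible
        self.heads = [nn.Linear(4, 2) for _ in range(num_heads)]


registered = WithModuleList()
unregistered = WithPlainList()

# 3 heads, each with a weight and a bias
num_registered = len(list(registered.parameters()))
# Nothing is registered through the plain list
num_unregistered = len(list(unregistered.parameters()))

out = registered(torch.randn(5, 4))
print(num_registered, num_unregistered, out.shape)
```

Parameters that are not registered are skipped by the optimizer, by .to(device), and by state_dict(), which is why a list of layers should normally be wrapped in nn.ModuleList.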

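contiguous in PyTorch: view needs a tensor whose memory layout is compatible with the requested shape, which in practice usually means a contiguous tensor. If the tensor has gone through an operation that changes its strides, such as transpose, call contiguous() on it before calling view.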

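t = torch.arange(1, 13).reshape(3, 4)
# tensor([[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]])
# t.stride() = (4, 1): move 4 elements to reach the next row, 1 element to reach the next column
t.stride()  # -> (4, 1)
# Swap rows and columns with transpose
t2 = t.transpose(1, 0)
# tensor([[ 1, 5, 9], [ 2, 6, 10], [ 3, 7, 11], [ 4, 8, 12]])
# t2.stride() = (1, 4): move 1 element to reach the next row, 4 elements to reach the next column (still indexing into t's original storage)
t2.stride()  # -> (1, 4)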

t.is_contiguous()  # -> True

t2.is_contiguous()  # -> False
# flatten() reads elements in logical (row-major) order, so t2 and t3 give the same result, which differs from t
t.flatten()  # -> tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

t2.flatten()  # -> tensor([ 1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12])

t3 = t2.contiguous()

t3.flatten()  # -> tensor([ 1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12])
# view fails on a tensor whose elements are not stored contiguously: t2.view(12, 1) raises an error
t2.view(12, 1)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
in
----> 1 t2.view(12,1)

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

t.view(12, 1)
# tensor([[ 1], [ 2], [ 3], [ 4], [ 5], [ 6], [ 7], [ 8], [ 9], [10], [11], [12]])
# t2.contiguous() copies the data into contiguous order, so view works without an error
t3.view(12, 1)
# tensor([[ 1], [ 5], [ 9], [ 2], [ 6], [10], [ 3], [ 7], [11], [ 4], [ 8], [12]])
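
As a rough check of what happens underneath, the sketch below compares memory addresses with data_ptr() and tries reshape, which the error message itself suggests. The variable names (shares_storage, copied, view_failed, same_result) are only illustrative.

```python
import torch

t = torch.arange(1, 13).reshape(3, 4)
t2 = t.transpose(1, 0)  # same storage as t, only the strides are swapped

# transpose does not copy: both tensors start at the same memory address
shares_storage = t2.data_ptr() == t.data_ptr()

# contiguous() copies the data into a new row-major block of memory
t3 = t2.contiguous()
copied = t3.data_ptr() != t.data_ptr()

# view() raises RuntimeError on the non-contiguous tensor
try:
    t2.view(12, 1)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape() falls back to a copy when a view is not possible
r = t2.reshape(12, 1)
same_result = torch.equal(r, t3.view(12, 1))

print(shares_storage, copied, view_failed, same_result, t3.stride())
```

In short, t2.reshape(12, 1) gives the same values as t2.contiguous().view(12, 1); the explicit contiguous() call just makes the copy visible in the code.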

LayerNorm:

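Unlike BatchNorm, LayerNorm does not track running (global) mean and variance statistics, so train() and eval() have no effect on it. In most cases you only need to specify normalized_shape.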
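
The train()/eval() claim can be verified with a short sketch that runs the same input through LayerNorm and BatchNorm1d in both modes. The input is shifted and scaled on purpose (an arbitrary choice for this example) so that batch statistics and running statistics clearly differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Data with mean around 5 and standard deviation around 3
x = torch.randn(8, 4) * 3 + 5

ln = nn.LayerNorm(4)
bn = nn.BatchNorm1d(4)

# LayerNorm: the output is identical in train and eval mode
ln.train()
ln_train = ln(x)
ln.eval()
ln_eval = ln(x)
ln_same = torch.allclose(ln_train, ln_eval)

# BatchNorm: train mode uses batch statistics, eval mode uses running statistics
bn.train()
bn_train = bn(x)
bn.eval()
bn_eval = bn(x)
bn_same = torch.allclose(bn_train, bn_eval)

# LayerNorm keeps no buffers, BatchNorm keeps its running statistics as buffers
ln_buffers = [name for name, _ in ln.named_buffers()]
bn_buffers = [name for name, _ in bn.named_buffers()]

print(ln_same, bn_same, ln_buffers, bn_buffers)
```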

torch.nn.LayerNorm(normalized_shape: Union[int, List[int], torch.Size], eps: float = 1e-05, elementwise_affine: bool = True)

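normalized_shape: if an integer such as 4 is passed, it is treated as a list containing that single integer. LayerNorm then normalizes over the last dimension of the input, and the integer must equal the size of that last dimension.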

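Suppose the input has shape [3, 4]. Mean and variance are computed for each of the 3 length-4 vectors, giving 3 means and 3 variances, and each of the 3 rows is normalized separately (the 4 numbers in every row then have mean 0 and variance 1, up to the effect of eps). The weight and bias of LayerNorm each hold 4 numbers, which are reused for all 3 rows in the affine transform (multiply by the matching entry of weight, then add the matching entry of bias); both are learned during backpropagation.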
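If a list or torch.Size is passed, for example [3, 4] or torch.Size([3, 4]), normalization is applied over the last two dimensions of the input, and those last two dimensions must also have size [3, 4].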

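Suppose the input again has shape [3, 4]. The mean and variance are first computed over all 12 numbers, and then all 12 numbers are normalized with them. weight and bias also hold 12 numbers each, and every normalized number gets its own affine transform (multiply by the matching entry of weight, then add the matching entry of bias); both are learned during backpropagation.
Suppose the input has shape [N, 3, 4]. The same operation is applied to each of the N slices of shape [3, 4]; the only difference is that weight and bias are reused N times in the affine transform.
Suppose the input has shape [N, T, 3, 4]. The behaviour is the same, and there can be even more leading dimensions.
Note: the shape of weight and bias in LayerNorm is exactly the normalized_shape that was passed in.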
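
Both cases above can be reproduced by hand. The following sketch compares nn.LayerNorm against a manual computation using the biased variance; the sizes (N = 2) and the variable names are arbitrary choices for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Case 1: integer normalized_shape, input of shape [3, 4]
x = torch.randn(3, 4)
ln_last = nn.LayerNorm(4)
y = ln_last(x)

# Per-row mean and biased variance over the last dimension
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)
# weight starts as ones and bias as zeros, so the affine step changes nothing here
y_manual = (x - mean) / torch.sqrt(var + ln_last.eps)
matches_row_wise = torch.allclose(y, y_manual, atol=1e-5)

# Case 2: normalized_shape [3, 4], input of shape [N, 3, 4] with N = 2
xb = torch.randn(2, 3, 4)
ln_last_two = nn.LayerNorm([3, 4])
yb = ln_last_two(xb)

# One mean and one variance per [3, 4] slice, computed over its 12 numbers
mean_b = xb.mean(dim=(-2, -1), keepdim=True)
var_b = xb.var(dim=(-2, -1), unbiased=False, keepdim=True)
yb_manual = (xb - mean_b) / torch.sqrt(var_b + ln_last_two.eps)
matches_slice_wise = torch.allclose(yb, yb_manual, atol=1e-5)

# weight and bias take the shape of normalized_shape
weight_shape_int = tuple(ln_last.weight.shape)
weight_shape_list = tuple(ln_last_two.weight.shape)

print(matches_row_wise, matches_slice_wise, weight_shape_int, weight_shape_list)
```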

eps: a small value added to the variance in the denominator during normalization, to avoid division by zero.

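elementwise_affine: if set to False, the LayerNorm layer has no learnable parameters at all.

If set to True (the default), the layer has learnable parameters weight and bias for the affine transform: after the input is normalized to mean 0 and variance 1, it is multiplied by weight and then bias is added.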
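
A minimal sketch of the elementwise_affine switch follows. Filling weight with 2.0 and bias with 1.0 is just an arbitrary way to make the affine step visible, not something a real training run would do by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

ln_affine = nn.LayerNorm(4)  # elementwise_affine=True by default
ln_plain = nn.LayerNorm(4, elementwise_affine=False)

affine_params = [name for name, _ in ln_affine.named_parameters()]
plain_params = [name for name, _ in ln_plain.named_parameters()]

# Overwrite weight and bias by hand so that the affine transform is no longer the identity
with torch.no_grad():
    ln_affine.weight.fill_(2.0)
    ln_affine.bias.fill_(1.0)

x = torch.randn(3, 4)
# Output with affine = normalized output * weight + bias
affine_matches = torch.allclose(ln_affine(x), ln_plain(x) * 2.0 + 1.0, atol=1e-5)

print(affine_params, plain_params, affine_matches)
```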
