Deep Learning with PyTorch: Training a Classifier

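The previous sections covered how to define a neural network, compute the loss, and update the network's weights. Now we can train a complete classifier.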

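We will perform the following steps in order:

Load and normalize the CIFAR10 training and test datasets using torchvision
Define a convolutional neural network
Define a loss function
Train the network on the training data
Test the network on the test data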

Getting the data

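Generally, when you have to deal with image, text, audio or video data, you can use standard Python packages that load the data into a NumPy array, and then convert that array into a torch.*Tensor.

For images, packages such as Pillow and OpenCV are useful
For audio, packages such as scipy and librosa
For text, either raw Python or Cython based loading, or NLTK and SpaCy are useful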

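For computer vision specifically, there is a package called torchvision. It provides dataset loaders for common datasets such as ImageNet, CIFAR10 and MNIST (torchvision.datasets, used together with torch.utils.data.DataLoader), as well as data transforms for images (torchvision.transforms).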

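This is very convenient and avoids writing boilerplate code. In this tutorial we use the CIFAR10 dataset. It has ten classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels.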

1. Load and normalize CIFAR10

Load CIFAR10 with torchvision:

import torch
import torchvision
import torchvision.transforms as transforms

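The torchvision datasets return PIL images with pixel values in 0-255. ToTensor converts them to tensors in the range [0, 1], and Normalize then maps those tensors to the range [-1, 1].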

# Compose chains several preprocessing steps from torchvision.transforms
transform = transforms.Compose(
    [transforms.ToTensor(),
     # normalize each channel of the tensor image: (x - mean) / std
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=0)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=0)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

ToTensor() converts a PIL image or a numpy.ndarray of shape (H, W, C) into a tensor of shape (C, H, W). For uint8 input with values in 0-255, it scales every value into [0, 1] by simply dividing by 255.

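transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)) operates on the tensor that ToTensor has already scaled to [0, 1]. For each channel it computes image = (image - mean) / std, so with mean = std = 0.5 every value in [0, 1] is mapped to [-1, 1]. Note that this only produces zero mean and unit variance when mean and std are the dataset's actual per-channel statistics; with 0.5/0.5 it is simply a rescaling. In practice, mean and std are sometimes taken from statistics computed on the ImageNet training set (a common choice is mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)).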
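The arithmetic of the two transforms can be reproduced in plain NumPy. The sketch below mimics ToTensor (for uint8 input) and Normalize on a fake image, plus the inverse used later in imshow. The helper names to_tensor_like, normalize_like and unnormalize_like are hypothetical illustrations, not torchvision APIs.

```python
import numpy as np

# Per-channel statistics matching the tutorial's transforms.Normalize call.
MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def to_tensor_like(img_hwc_uint8):
    """Mimic ToTensor for uint8 input: (H, W, C) in 0..255 -> (C, H, W) in [0, 1]."""
    chw = np.transpose(img_hwc_uint8, (2, 0, 1))
    return chw.astype(np.float32) / 255.0


def normalize_like(x_chw, mean=MEAN, std=STD):
    """Mimic Normalize: (x - mean) / std, applied per channel."""
    # Reshape to (C, 1, 1) so the per-channel values broadcast over H and W.
    return (x_chw - mean[:, None, None]) / std[:, None, None]


def unnormalize_like(x_chw, mean=MEAN, std=STD):
    """Inverse of Normalize: x * std + mean (img / 2 + 0.5 when both are 0.5)."""
    return x_chw * std[:, None, None] + mean[:, None, None]


# A fake 32x32 RGB image that contains the extreme pixel values 0 and 255.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)
img[0, 0] = 0
img[0, 1] = 255

x = to_tensor_like(img)
y = normalize_like(x)
print(x.shape, float(x.min()), float(x.max()))  # (3, 32, 32) 0.0 1.0
print(float(y.min()), float(y.max()))           # -1.0 1.0
```

Applying unnormalize_like to y recovers x, which is exactly what img / 2 + 0.5 does in the imshow helper.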

To make this more concrete, let's display some training images:

import matplotlib.pyplot as plt
import numpy as np
def imshow(img):
    img = img / 2 + 0.5     # unnormalize: inverse of Normalize, img = img * std + mean
    npimg = img.numpy()     # convert to a NumPy array
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # reorder (C, H, W) to (H, W, C) for matplotlib
    plt.show()


# get a batch of random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # image tensors and their class labels

# show images
imshow(torchvision.utils.make_grid(images))  # make_grid tiles the batch into a single image
# print labels
print(' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))

2. Define a convolutional neural network

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # convolution -> ReLU -> max pooling
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))  # fully connected layer + ReLU
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # output layer: one raw score (logit) per class
        return x


net = Net()
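Why does fc1 expect 16 * 5 * 5 inputs? The spatial size can be traced through the layers using the standard output-size formula for convolution and pooling (no padding, dilation 1). This is a plain-Python sketch; conv2d_out and maxpool_out are illustrative helpers, not PyTorch functions.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output side length of a square Conv2d (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def maxpool_out(size, kernel=2, stride=2):
    """Output side length of MaxPool2d with the default ceil_mode=False."""
    return (size - kernel) // stride + 1


side = 32                          # CIFAR10 images are 3x32x32
side = conv2d_out(side, 5)         # conv1: 32 -> 28
side = maxpool_out(side)           # pool:  28 -> 14
side = conv2d_out(side, 5)         # conv2: 14 -> 10
side = maxpool_out(side)           # pool:  10 -> 5
flat_features = 16 * side * side   # conv2 has 16 output channels
print(side, flat_features)         # 5 400
```

If you change the input size or a kernel size, rerunning this arithmetic tells you the new in_features value for fc1.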

3. Define a loss function and optimizer

We use a classification cross-entropy loss and SGD with momentum.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
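Both pieces can be sketched for a single sample and a single scalar parameter in plain Python. nn.CrossEntropyLoss combines log-softmax and negative log-likelihood (averaged over the batch by default), and this sketch assumes PyTorch's SGD momentum form with the default dampening=0 and nesterov=False, where the buffer accumulates raw gradients and the learning rate is applied afterwards. The function names are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def sgd_momentum_step(param, grad, buf, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update (dampening=0, nesterov=False)."""
    buf = momentum * buf + grad   # velocity buffer
    param = param - lr * buf      # step against the accumulated gradient
    return param, buf


# Equal logits over 10 classes: the loss is log(10), the value at chance level.
print(round(cross_entropy([0.0] * 10, 3), 4))  # 2.3026

# Constant gradient of 1.0: the buffer grows toward 1 / (1 - momentum) = 10.
p, v = 0.0, 0.0
for _ in range(3):
    p, v = sgd_momentum_step(p, 1.0, v)
print(round(v, 2))  # 2.71
```

A freshly initialized network usually starts with a loss near log(10) ≈ 2.30, which is a handy sanity check for the first printed loss values.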

4. Train the network

We simply loop over our data iterator, feed the inputs to the network, and optimize.

for epoch in range(10):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics: average loss over the last 2000 mini-batches
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')
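How often does the loop print? With 50,000 CIFAR10 training images and batch_size = 4, a quick calculation gives the number of mini-batches per epoch and the points at which the i % 2000 == 1999 condition fires. This assumes the DataLoader's default drop_last=False.

```python
import math

num_train_images = 50000   # size of the CIFAR10 training set
batch_size = 4

# With drop_last=False, DataLoader yields ceil(N / batch_size) mini-batches per epoch.
batches_per_epoch = math.ceil(num_train_images / batch_size)

# The loop prints when i % 2000 == 1999, i.e. after every 2000 mini-batches.
print_points = [i + 1 for i in range(batches_per_epoch) if i % 2000 == 1999]

print(batches_per_epoch)   # 12500
print(print_points)        # [2000, 4000, 6000, 8000, 10000, 12000]
```

So each epoch produces six loss lines, and the last 500 mini-batches of every epoch are never printed because running_loss is reset at the start of the next epoch.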

Let's quickly save our trained model:

PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

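5. Test the network on the test data

We have trained the network for 10 passes (epochs) over the training dataset. But we need to check whether the network has learned anything at all.

We will check this by predicting the class label that the neural network outputs and comparing it against the ground truth. If the prediction is correct, we count the sample as a correct prediction.

First, let's display some images from the test set to get familiar with them.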

dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join(f'{classes[labels[j]]:5s}' for j in range(4)))

Next, let's load back our saved model (note: saving and reloading the model wasn't necessary here; we only did it to illustrate how):

net = Net()
net.load_state_dict(torch.load(PATH))

Now let's see what the neural network thinks these examples are:

outputs = net(images)

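The outputs are energies (unnormalized scores, or logits) for the 10 classes. The higher the energy for a class, the more the network thinks the image belongs to that class. So let's get the index of the highest energy: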

_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join(f'{classes[predicted[j]]:5s}'
                              for j in range(4)))
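torch.max(outputs, 1) returns a pair (values, indices) taken along dimension 1, and only the indices are needed as class predictions. The same idea in NumPy, on made-up scores for a batch of 2 images and 4 classes, also shows that converting the energies to probabilities with softmax does not change which class wins.

```python
import numpy as np

# Hypothetical scores for a batch of 2 images over 4 classes (rows = images).
outputs = np.array([[1.0, 3.0, 0.5, -1.0],
                    [0.2, 0.1, 2.5, 2.0]])

# np.max / np.argmax along axis 1 give the two parts that torch.max(outputs, 1) returns.
values = outputs.max(axis=1)
predicted = outputs.argmax(axis=1)
print(values.tolist(), predicted.tolist())   # [3.0, 2.5] [1, 2]

# Softmax turns each row into probabilities that sum to 1; the argmax is unchanged.
exp = np.exp(outputs - outputs.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)
print(probs.argmax(axis=1).tolist())          # [1, 2]
```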

Now let's look at how the network performs on the whole test dataset.

correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')

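That looks much better than chance, which is 10% accuracy (randomly picking one class out of 10). It seems the network has learned something. Next, let's see which classes performed well and which did not: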

# prepare to count predictions for each class
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

# again no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1


# print accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')
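The counting logic for overall and per-class accuracy can be checked on a tiny, made-up set of labels and predictions without running the network. This sketch swaps the two dictionaries for collections.Counter, which only tracks classes that actually appear, so a class with no test samples cannot cause a division by zero.

```python
from collections import Counter

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# Made-up class indices for 6 test images (illustration only).
labels = [3, 3, 5, 5, 0, 8]
predictions = [3, 5, 5, 5, 0, 1]

total_pred = Counter(classes[lab] for lab in labels)
correct_pred = Counter(classes[lab] for lab, pred in zip(labels, predictions) if lab == pred)

overall = 100 * sum(correct_pred.values()) // len(labels)
print(f'overall: {overall} %')  # overall: 66 %
for name in total_pred:
    acc = 100 * correct_pred[name] / total_pred[name]
    print(f'{name:5s} {acc:.1f} %')
```

Note that 100 * correct // total uses integer division, so the overall figure is truncated (4/6 prints as 66, not 67).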

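6. Training on GPU

Just like you transfer a tensor onto the GPU, you transfer the neural network onto the GPU. First, let's define our device as the first visible CUDA device if CUDA is available: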

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Assuming that we are on a CUDA machine, this should print a CUDA device:

print(device)

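The rest of this section assumes that device is a CUDA device. The following method then recursively goes over all modules and converts their parameters and buffers to CUDA tensors: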

net.to(device)

Remember that you also have to send the inputs and targets to the GPU at every step:

inputs, labels = data[0].to(device), data[1].to(device)
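Putting the pieces together, one device-agnostic training step might look like the sketch below. It falls back to the CPU when CUDA is unavailable, and it uses a tiny stand-in model and a random CIFAR10-shaped batch (both hypothetical) instead of Net and trainloader. One detail worth knowing: nn.Module.to(device) moves the module in place, while Tensor.to(device) returns a new tensor, which is why the inputs must be reassigned.

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# A tiny stand-in model and one fake CIFAR10-shaped batch (hypothetical data).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

data = (torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,)))

# Tensor.to returns a copy on the target device, so reassign the results.
inputs, labels = data[0].to(device), data[1].to(device)

optimizer.zero_grad()
outputs = model(inputs)            # model and inputs now live on the same device
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()

print(next(model.parameters()).device == inputs.device)  # True
```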
