[Neural Networks] Data preprocessing with transforms

Using transforms in torchvision

Typical usage

from torchvision import transforms

data_transform = {
    "train": transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
}

transforms.RandomResizedCrop()

"""
A crop of random size (default: of 0.08 to 1.0) of the original size and a random
aspect ratio (default: of 3/4 to 4/3) of the original aspect ratio is made. This crop
is finally resized to given size.
This is popularly used to train the Inception networks.
"""

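Crops a random region of the image, with a random area and a random aspect ratio, and then resizes the crop to the given size.
By default the crop covers between 0.08 and 1.0 of the original image area (scale=(0.08, 1.0)) and its aspect ratio lies between 3/4 and 4/3.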

from PIL import Image
import matplotlib.pyplot as plt

img = Image.open('./img.png')
img.size  # PIL reports the size as (width, height)
plt.imshow(img)

img2 = transforms.RandomResizedCrop(224)(img)
plt.imshow(img2)

transforms.RandomHorizontalFlip()

"""Horizontally flip the given image randomly with a given probability.
    If the image is torch Tensor, it is expected
    to have [..., H, W] shape, where ... means an arbitrary number of leading
    dimensions

    Args:
        p (float): probability of the image being flipped. Default value is 0.5
    """

Randomly flips the given image (a PIL image or a Tensor) horizontally with the given probability; the default probability is 0.5.

img3 = transforms.RandomHorizontalFlip()(img)
img3.size, plt.imshow(img3)

transforms.ToTensor()

"""
Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor. This transform does not support torchscript.

    Converts a PIL Image or numpy.ndarray (H x W x C) in the range
    [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
    if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1)
    or if the numpy.ndarray has dtype = np.uint8

    In the other cases, tensors are returned without scaling.

    .. note::
        Because the input image is scaled to [0.0, 1.0], this transformation should not be used when
        transforming target image masks. See the `references`_ for implementing the transforms for image masks.
"""

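Converts a numpy ndarray or a PIL.Image of shape (H, W, C) into a Tensor of shape (C, H, W).
For uint8 arrays and the PIL modes listed above, the values are also divided by 255, which scales them from [0, 255] to [0.0, 1.0]; other inputs are returned without scaling.
The channel order depends on how the image was read. For example, cv2 gives (B, G, R), while PIL.Image gives (R, G, B).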

print(transforms.ToTensor()(img))

transforms.Normalize()

"""
    Normalize a tensor image with mean and standard deviation.
    This transform does not support PIL Image.
    Given mean: ``(mean[1],...,mean[n])`` and std: ``(std[1],..,std[n])`` for ``n``
    channels, this transform will normalize each channel of the input
    ``torch.*Tensor`` i.e.,
    ``output[channel] = (input[channel] - mean[channel]) / std[channel]``

    .. note::
        This transform acts out of place, i.e., it does not mutate the input tensor.

    Args:
        mean (sequence): Sequence of means for each channel.
        std (sequence): Sequence of standard deviations for each channel.
        inplace(bool,optional): Bool to make this operation in-place.
"""

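Normalizes each channel of the image using the given mean and standard deviation. The input must be a Tensor; PIL images are not supported.
Parameters:
mean: the mean of each channel;
std: the standard deviation of each channel;
inplace: whether to perform the operation in place.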

print(transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))(transforms.ToTensor()(img)))

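Here the pixel values, which ToTensor has already scaled to [0, 1], are mapped to [-1, 1]: with mean = 0.5 and std = 0.5, each value becomes (x - 0.5) / 0.5 = 2x - 1. This is a linear shift and rescale that centers the data around zero. It does not change the shape of the distribution, so it does not by itself make the data normally distributed.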
