【Andrew Ng Deep Learning】04


(1)What do you think applying this filter to a grayscale image will do?
$$\begin{bmatrix} 0 & 1 & -1 & 0 \\ 1 & 3 & -3 & -1 \\ 1 & 3 & -3 & -1 \\ 0 & 1 & -1 & 0 \end{bmatrix}$$
[A]Detect 45 degree edges.
[B]Detect horizontal edges.
[C]Detect vertical edges.
[D]Detect image contrast.
Answer: C
Explanation: Run the following code and observe the results yourself.

import torch
from torch.nn import functional as F

if __name__ == '__main__':
    # Test input 1: a 45-degree diagonal edge
    # x = torch.tensor([[[
    #     [1, 1, 1, 1, 1, 1, 1, 1],
    #     [1, 1, 1, 1, 1, 1, 1, 0],
    #     [1, 1, 1, 1, 1, 1, 0, 0],
    #     [1, 1, 1, 1, 1, 0, 0, 0],
    #     [1, 1, 1, 1, 0, 0, 0, 0],
    #     [1, 1, 1, 0, 0, 0, 0, 0],
    #     [1, 1, 0, 0, 0, 0, 0, 0],
    #     [1, 0, 0, 0, 0, 0, 0, 0],
    # ]]], dtype=torch.float)

    # Test input 2: a horizontal edge
    # x = torch.tensor([[[
    #     [1, 1, 1, 1, 1, 1, 1, 1],
    #     [1, 1, 1, 1, 1, 1, 1, 1],
    #     [1, 1, 1, 1, 1, 1, 1, 1],
    #     [1, 1, 1, 1, 1, 1, 1, 1],
    #     [0, 0, 0, 0, 0, 0, 0, 0],
    #     [0, 0, 0, 0, 0, 0, 0, 0],
    #     [0, 0, 0, 0, 0, 0, 0, 0],
    #     [0, 0, 0, 0, 0, 0, 0, 0],
    # ]]], dtype=torch.float)

    # Test input 3: a vertical edge
    # x = torch.tensor([[[
    #     [1, 1, 1, 1, 0, 0, 0, 0],
    #     [1, 1, 1, 1, 0, 0, 0, 0],
    #     [1, 1, 1, 1, 0, 0, 0, 0],
    #     [1, 1, 1, 1, 0, 0, 0, 0],
    #     [1, 1, 1, 1, 0, 0, 0, 0],
    #     [1, 1, 1, 1, 0, 0, 0, 0],
    #     [1, 1, 1, 1, 0, 0, 0, 0],
    #     [1, 1, 1, 1, 0, 0, 0, 0],
    # ]]], dtype=torch.float)

    # Active input: a 45-degree edge in the top half, a vertical edge in the bottom half
    x = torch.tensor([[[
        [1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 0],
        [1, 1, 1, 1, 1, 1, 0, 0],
        [1, 1, 1, 1, 1, 0, 0, 0],
        [1, 1, 1, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 0, 0, 0, 0],
    ]]], dtype=torch.float)


    # The filter from the question
    kernel = torch.tensor([[[
        [0, 1, -1, 0],
        [1, 3, -3, -1],
        [1, 3, -3, -1],
        [0, 1, -1, 0],
    ]]], dtype=torch.float)

    output = F.conv2d(input=x, weight=kernel)
    print(output)

Note: you may find that this kernel also responds to 45-degree edges, but it clearly detects vertical edges much better than 45-degree ones.
Further reading: how to determine the detection direction of an edge-detection operator

(2)Suppose your input is a 300 by 300 color (RGB) image, and you are not using a convolutional network. If the first hidden layer has 100 neurons, each one fully connected to the input, how many parameters does this hidden layer have (including the bias parameters)?
[A] 9,000,001
[B] 9,000,100
[C] 27,000,001
[D] 27,000,100
Answer: D
Explanation: The image is first flattened, so the input x has shape (300*300*3, n) and the first hidden layer has shape (100, n). W therefore has shape (100, 300*300*3) and b has shape (100, 1).
The number of parameters in this hidden layer is thus $300 \times 300 \times 3 \times 100 + 100 = 27{,}000{,}100$.
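
As a sanity check, the same count can be read off a PyTorch nn.Linear layer (a minimal sketch, not part of the original quiz solution; the layer shapes mirror the question):

import torch
from torch import nn

# Fully connected layer from the flattened 300x300x3 image to 100 neurons
fc = nn.Linear(in_features=300 * 300 * 3, out_features=100)
print(sum(p.numel() for p in fc.parameters()))  # 27000100 = 27,000,000 weights + 100 biases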

(3)Suppose your input is a 300 by 300 color (RGB) image, and you use a convolutional layer with 100 filters that are each 5x5. How many parameters does this hidden layer have (including the bias parameters)?
[A] 2501
[B] 2600
[C] 7500
[D] 7600
Answer: D
Explanation: There are 100 filters, each of size 5x5x3 (the number of channels matches the input), and each filter has one bias term, so the total number of parameters is $100 \times (5 \times 5 \times 3 + 1) = 7600$.
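
The same count can be verified with nn.Conv2d (a minimal sketch; the layer arguments mirror the question):

import torch
from torch import nn

# 100 filters of size 5x5 applied to a 3-channel (RGB) input
conv = nn.Conv2d(in_channels=3, out_channels=100, kernel_size=5)
print(sum(p.numel() for p in conv.parameters()))  # 7600 = 7500 weights + 100 biases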

(4)You have an input volume that is 63x63x16, and convolve it with 32 filters that are each 7x7, using a stride of 2 and no padding. What is the output volume?
[A] 29x29x32
[B] 29x29x16
[C] 16x16x32
[D] 16x16x16
Answer: A
Explanation: Each filter convolved with the input produces an output of size $\lfloor \frac{n+2p-f}{s}+1 \rfloor \times \lfloor \frac{n+2p-f}{s}+1 \rfloor \times 1$, i.e. $\lfloor \frac{63+0-7}{2}+1 \rfloor = 29$, giving $29 \times 29 \times 1$; each filter's output has a single channel because the filter's channel count matches the input's. Stacking the outputs of all 32 filters gives a total output volume of $29 \times 29 \times 32$.
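
A quick shape check with F.conv2d on a random input (a sketch mirroring the question's dimensions; PyTorch uses NCHW layout):

import torch
from torch.nn import functional as F

x = torch.randn(1, 16, 63, 63)   # 63x63x16 input volume
w = torch.randn(32, 16, 7, 7)    # 32 filters, each 7x7x16
print(F.conv2d(x, w, stride=2, padding=0).shape)  # torch.Size([1, 32, 29, 29])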

(5)You have an input volume that is 15x15x8, and pad it using “pad=2”. What is the dimension of the resulting volume (after padding)?
[A] 17x17x10
[B] 17x17x8
[C] 19x19x8
[D] 19x19x12
Answer: C
Explanation: Padding adds nothing along the channel dimension, so A and D are wrong; padding is applied on both left/right and top/bottom, so B is wrong ($15 + 2 + 2 = 19$). The answer is C.
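
This can be checked with F.pad (a sketch; pad=(2, 2, 2, 2) pads left/right/top/bottom of the last two dimensions):

import torch
from torch.nn import functional as F

x = torch.randn(1, 8, 15, 15)              # 15x15x8 input volume
print(F.pad(x, pad=(2, 2, 2, 2)).shape)    # torch.Size([1, 8, 19, 19])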

(6)You have an input volume that is 63x63x16, and convolve it with 32 filters that are each 7x7, and stride of 1. You want to use a “same” convolution. What is the padding?
[A] 1
[B] 2
[C] 3
[D] 7
Answer: C
Explanation: A "same" convolution keeps the spatial dimensions unchanged after convolution.
Setting $\lfloor \frac{n+2p-f}{s}+1 \rfloor = n$ and substituting $s=1$, $f=7$ gives $p=3$.
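
A quick check that padding=3 indeed keeps the spatial size unchanged (a sketch with random weights):

import torch
from torch.nn import functional as F

x = torch.randn(1, 16, 63, 63)
w = torch.randn(32, 16, 7, 7)
print(F.conv2d(x, w, stride=1, padding=3).shape)  # torch.Size([1, 32, 63, 63]) -- spatial size preserved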

(7)You have an input volume that is 32x32x16, and apply max pooling with a stride of 2 and a filter size of 2. What is the output volume?
[A] 32x32x8
[B] 15x15x16
[C] 16x16x16
[D] 16x16x8
Answer: C
Explanation: The output-size formula for convolutional layers also applies to pooling layers: $\lfloor \frac{n+2p-f}{s}+1 \rfloor = \lfloor \frac{32+2\times 0-2}{2}+1 \rfloor = 16$. Pooling is applied to each channel independently, so the number of channels does not change.
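
Shape check with F.max_pool2d (a sketch mirroring the question):

import torch
from torch.nn import functional as F

x = torch.randn(1, 16, 32, 32)   # 32x32x16 input volume
print(F.max_pool2d(x, kernel_size=2, stride=2).shape)  # torch.Size([1, 16, 16, 16]) -- channels unchanged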

(8)Because pooling layers do not have parameters, they do not affect the backpropagation (derivatives) calculation.
[A] True
[B] False
Answer: B
Explanation: Although a pooling layer has no parameters to update, it still affects the backpropagation (derivative) computation.
Further reading: backpropagation through pooling layers
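
A minimal illustration (not part of the original post) that gradients still flow through a parameter-free max-pooling layer, being routed only to the max element of each window:

import torch
from torch.nn import functional as F

x = torch.tensor([[[[1., 2.],
                    [3., 4.]]]], requires_grad=True)
y = F.max_pool2d(x, kernel_size=2)   # the max of the 2x2 window is 4
y.sum().backward()
print(x.grad)                        # only the position of the max receives gradient 1; the rest get 0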

(9)In lecture we talked about “parameter sharing” as a benefit of using convolutional networks. Which of the following statements about parameter sharing in ConvNets are true? (Check all that apply.)
[A]It allows gradient descent to set many of the parameters to zero, thus making the connections sparse.
[B]It allows a feature detector to be used in multiple locations throughout the whole input image/input volume.
[C]It reduces the total number of parameters, thus reducing overfitting.
[D]It allows parameters learned for one task to be shared even for a different task (transfer learning).
Answer: B, C
Explanation: The lecture's definition of parameter sharing: "A feature detector that's useful in one part of the image is probably useful in another part of the image." So B is correct.
Compared with a fully connected layer, a convolutional kernel uses far fewer parameters (see questions (2) and (3)), which helps reduce overfitting, so C is also correct.

(10)In lecture we talked about “sparsity of connections” as a benefit of using convolutional layers. What does this mean?
[A]Regularization causes gradient descent to set many of the parameters to zero.
[B]Each filter is connected to every channel in the previous layer.
[C]Each activation in the next layer depends on only a small number of activations from the previous layer.
[D]Each layer in a convolutional network is connected only to two other layers.
Answer: C
Explanation: The lecture's definition of sparsity of connections: "in each layer, each output value depends only on a small number of inputs."
