Learn AI with Mu Li -- Anchor Box Code Walkthrough -- 1

Introduction to Anchor Boxes

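Object detection algorithms usually sample a large number of regions in the input image, decide whether each region contains an object of interest, and adjust the region's edges to predict the object's ground-truth bounding box more accurately. **Anchor boxes:** multiple bounding boxes of different sizes and aspect ratios, each generated with a pixel of the image as its center.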

Generating Anchor Boxes

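There are many ways to generate anchor boxes, but they all build on expanding outward from a pixel center. The main parameters are the box size $s$ and the aspect ratio $r$, where the size $s$ is expressed as a proportion of the original image.
- Let the input image have height $h$ and width $w$. In pixels, the anchor box has width $hs\sqrt{r}$ and height $hs/\sqrt{r}$; dividing by the image width and height respectively puts both on the same normalized scale, giving a width of $s\sqrt{r}\cdot h/w$ and a height of $s/\sqrt{r}$ (Mu Li's lecture slides get this formula wrong; the version here is rearranged from the formula in the code)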
- ```
  import numpy as np
  w = 1
  h = 1
  s = 0.72
  r = 8/9

  h1 = h  * np.sqrt(s * r)
  w1 = w * np.sqrt(s / r)

  print(h1*w1)  # --> 0.72
  print(h1)     # --> 0.799999
  print(w1)     # --> 0.9
```
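- The code above follows the reading given earlier, treating $s$ as the fraction of the image area and $r$ as height/width. Mu Li's anchor generation code, however, computes the box as follows: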
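- ```
  import numpy as np
  w = 1
  h = 1
  s = 0.72
  r = 8/9
  # Same form as the w/h lines in multibox_prior below (normalized coordinates)
  w1 = s * np.sqrt(r) * h / w
  h1 = s / np.sqrt(r)
  print(w1 / h1)  # --> 0.88888
  print(h1 * w1)  # --> 0.5183999 (= s**2)
```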
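- The aspect ratio matches the definition ($w_1/h_1 = r$, so in this code $r$ is width/height), but the area does not match the result above: it comes out as $s^2$ rather than $s$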
- If we set the anchor sizes to $s$: [0.72, 0.64, 0.36], $n$ values in total ($n=3$), and the aspect ratios to $r$: [1/1, 1/2, 1/3], $m$ values in total ($m=3$), then pairing every size with every ratio would give 9 anchor boxes. In practice, however, we only use the combinations that contain $s_1$ or $r_1$, so:
- the selected pairs are $(s_1,r_1), (s_1,r_2), (s_1,r_3), (s_2,r_1), (s_3,r_1)$
- Hence each pixel is the center of $n+m-1$ anchor boxes, and the whole image has $wh(n+m-1)$ anchor boxes
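As a quick check of the count, here is a small plain-Python sketch (the list and variable names are just for illustration) that enumerates the (size, ratio) pairs in the same order multibox_prior builds them: every size with the first ratio, then the first size with each remaining ratio.

```python
sizes = [0.72, 0.64, 0.36]   # n = 3 sizes
ratios = [1, 1 / 2, 1 / 3]   # m = 3 aspect ratios

# Every size paired with ratios[0], then sizes[0] paired with ratios[1:]
pairs = [(s, ratios[0]) for s in sizes] + [(sizes[0], r) for r in ratios[1:]]

for s, r in pairs:
    print(s, r)

print(len(pairs))  # n + m - 1 = 5
```

Pairs such as (0.64, 1/2) never appear, which is what keeps the per-pixel count linear in n + m instead of n * m.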
The following function generates all anchor boxes in the image:

import torch

def multibox_prior(data, sizes, ratios):
    '''Generate anchor boxes centered on every pixel.

    args:
        data: tensor
            [..., height, width] (only the last two dimensions are used)
        sizes: list of sizes s
        ratios: list of aspect ratios r
    returns:
        tensor of shape [1, height * width * (n + m - 1), 4],
        each box as (xmin, ymin, xmax, ymax) in normalized coordinates
    '''
    in_height, in_width = data.shape[-2:]
    device, num_sizes, num_ratios = data.device, len(sizes), len(ratios)
    boxes_per_pixel = num_sizes + num_ratios - 1
    size_tensor = torch.tensor(sizes, device=device)
    ratio_tensor = torch.tensor(ratios, device=device)
    # To move an anchor to the center of a pixel we need an offset: a pixel
    # has height 1 and width 1, so the offset to its center is 0.5
    offset_h, offset_w = 0.5, 0.5
    # Scale steps so coordinates are expressed as fractions of the image
    steps_h = 1.0 / in_height
    steps_w = 1.0 / in_width

    center_h = (torch.arange(in_height, device=device) + offset_h) * steps_h
    center_w = (torch.arange(in_width, device=device) + offset_w) * steps_w
    # Turn the center points into a grid, then flatten it for matching
    shift_y, shift_x = torch.meshgrid(center_h, center_w, indexing='ij')
    shift_y, shift_x = shift_y.reshape(-1), shift_x.reshape(-1)
    # Note: this concatenates every size combined with the first ratio, and
    # the first size combined with every ratio after the first one
    w = torch.cat((size_tensor * torch.sqrt(ratio_tensor[0]),
                   sizes[0] * torch.sqrt(ratio_tensor[1:]))) * in_height / in_width
    h = torch.cat((size_tensor / torch.sqrt(ratio_tensor[0]),
                   sizes[0] / torch.sqrt(ratio_tensor[1:])))
    # Half-widths/half-heights of all anchors, transposed and repeated once
    # per grid point so the count matches the number of centers
    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
        in_height * in_width, 1) / 2
    # Each center repeated boxes_per_pixel times in a row
    out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y],
                           dim=1).repeat_interleave(boxes_per_pixel, dim=0)
    output = out_grid + anchor_manipulations
    return output.unsqueeze(0)
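To see what the `* in_height / in_width` factor does, here is a minimal check under assumed values (a hypothetical 561 x 728 image and one (s, r) pair, not from the original post). Converting the normalized width and height back to pixels shows that in this code r is the width-to-height ratio in pixels, and that the pixel area is $(s \cdot h)^2$.

```python
import math

# Hypothetical image size and anchor parameters (not from the original post)
in_height, in_width = 561, 728
s, r = 0.75, 2.0

# Normalized width/height, same form as the w/h lines in multibox_prior
w_norm = s * math.sqrt(r) * in_height / in_width
h_norm = s / math.sqrt(r)

# Back to pixels: x coordinates scale by in_width, y coordinates by in_height
w_px = w_norm * in_width
h_px = h_norm * in_height

print(w_px / h_px)                         # ~2.0, i.e. r = width / height
print(w_px * h_px / (s * in_height) ** 2)  # ~1.0, area is (s * h)^2
```

The factor only compensates for non-square images: without it, a box with r = 1 would be square in normalized coordinates but stretched in pixels.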

The code above involves several programming details, functions, and chained calls, which are explained next:

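Difference between repeat and repeat_interleave

repeat: repeats the whole tensor along the chosen dimensions and concatenates the copies; you pass the repeat count for each dimension directly (e.g. a.repeat(2, 1) means 2 copies along dim 0 and 1 along dim 1)
repeat_interleave: repeats each element (or each slice along the chosen dimension) in place; it takes a dim argument that you need to set, since without it the input is flattened first
The difference in detail: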

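import torch
a = torch.rand([3, 2])  # random values, so your output will differ
# Along the first dimension (dim=0)
print(a)
print(a.repeat_interleave(2, dim=0))  # shape after repeating: [6, 2]
print(a.repeat(2, 1))                 # shape after repeating: [6, 2]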

###############
output:
a:
    tensor([[0.72, 0.34],
            [0.05, 0.59],
            [0.87, 0.19]])
repeat_interleave:
    tensor([[0.72, 0.34],
            [0.72, 0.34],
            [0.05, 0.59],
            [0.05, 0.59],
            [0.87, 0.19],
            [0.87, 0.19]])
repeat:
    tensor([[0.72, 0.34],
            [0.05, 0.59],
            [0.87, 0.19],
            [0.72, 0.34],
            [0.05, 0.59],
            [0.87, 0.19]])
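Because torch.rand changes every run, a deterministic version makes the difference easier to read. This sketch uses torch.arange instead and also shows what happens when dim is omitted:

```python
import torch

a = torch.arange(6).reshape(3, 2)   # deterministic stand-in for torch.rand

interleaved = a.repeat_interleave(2, dim=0)  # each row repeated in place
tiled = a.repeat(2, 1)                       # whole tensor tiled along dim 0
flat = a.repeat_interleave(2)                # no dim: input is flattened first

print(interleaved.tolist())  # [[0, 1], [0, 1], [2, 3], [2, 3], [4, 5], [4, 5]]
print(tiled.tolist())        # [[0, 1], [2, 3], [4, 5], [0, 1], [2, 3], [4, 5]]
print(flat.tolist())         # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```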

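Building anchor_manipulations:

First, torch.stack stacks the four 1-D tensors (-w, -h, w, h), each of length n+m-1, along a new leading dimension
out_grid is built in a similar way, except that it stacks along dim=1 and repeat is replaced by repeat_interleave
The results, step by step: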

anchor1 = torch.stack((-w, -h, w, h))
print(anchor1.shape)  # --> [4, n+m-1]
anchor2 = torch.stack((-w, -h, w, h)).T
print(anchor2.shape)  # --> [n+m-1, 4]
anchor3 = torch.stack((-w, -h, w, h)).T.repeat(in_height * in_width, 1)
print(anchor3.shape)  # --> [(n+m-1) * in_height * in_width, 4]
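Here is a shape-only sketch with hypothetical sizes (n = m = 3, so 5 boxes per pixel, on a 2 x 3 grid), using random stand-ins for w, h, shift_x and shift_y. It shows why the two sides line up: row i of the repeated anchor offsets holds box i % 5, while row i of out_grid holds the center of pixel i // 5.

```python
import torch

boxes_per_pixel = 5          # n + m - 1 with n = m = 3
in_height, in_width = 2, 3   # hypothetical tiny grid
num_pixels = in_height * in_width

w = torch.rand(boxes_per_pixel)    # stand-ins for the real widths/heights
h = torch.rand(boxes_per_pixel)
shift_x = torch.rand(num_pixels)   # stand-ins for the flattened centers
shift_y = torch.rand(num_pixels)

anchor1 = torch.stack((-w, -h, w, h))
anchor2 = anchor1.T
anchor3 = anchor2.repeat(num_pixels, 1)            # tile all 5 boxes per pixel
out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y],
                       dim=1).repeat_interleave(boxes_per_pixel, dim=0)

print(anchor1.shape)   # torch.Size([4, 5])
print(anchor2.shape)   # torch.Size([5, 4])
print(anchor3.shape)   # torch.Size([30, 4])
print(out_grid.shape)  # torch.Size([30, 4])

# Row 7 = box 7 % 5 = 2 placed at pixel 7 // 5 = 1
assert torch.equal(anchor3[7], anchor2[2])
assert out_grid[7, 0] == shift_x[1]
```

Using repeat on one side and repeat_interleave on the other is what pairs every box shape with every center exactly once.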

Intersection over Union (IoU)

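The Jaccard index, here called intersection over union, is the ratio of the area where two bounding boxes intersect to the area of their union, as shown in the schematic and the worked diagram.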
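Written out for two boxes $A$ and $B$ in $(x_1, y_1, x_2, y_2)$ form, the union area is the sum of the two box areas minus the intersection, and the intersection's width and height are clipped at 0 when the boxes do not overlap:

$$
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
$$

$$
|A \cap B| = \max\bigl(0, \min(x_2^A, x_2^B) - \max(x_1^A, x_1^B)\bigr) \cdot \max\bigl(0, \min(y_2^A, y_2^B) - \max(y_1^A, y_1^B)\bigr)
$$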
IoU is used to measure the similarity between an anchor box and a ground-truth bounding box, as well as between different anchor boxes. The code is as follows:

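import torch

def box_iou(boxes1, boxes2):
    '''Pairwise IoU between two sets of boxes.

    args:
        boxes1: tensor
            [num_boxes1, 4], each box as (xmin, ymin, xmax, ymax)
        boxes2: same as above, [num_boxes2, 4]
    returns:
        tensor of shape [num_boxes1, num_boxes2]
    '''
    box_area = lambda boxes: ((boxes[:, 2] - boxes[:, 0]) *
                              (boxes[:, 3] - boxes[:, 1]))
    area1 = box_area(boxes1)  # [num_boxes1]
    area2 = box_area(boxes2)  # [num_boxes2]
    # Intersection corners, broadcast to [num_boxes1, num_boxes2, 2]
    inter_upperlefts = torch.max(boxes1[:, None, :2], boxes2[:, :2])
    inter_lowerrights = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])
    # A negative width/height means no overlap, so clamp it to 0
    inters = (inter_lowerrights - inter_upperlefts).clamp(min=0)
    inter_areas = inters[:, :, 0] * inters[:, :, 1]
    union_areas = area1[:, None] + area2 - inter_areas
    return inter_areas / union_areas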
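For checking results by hand, here is a minimal scalar sketch for a single pair of boxes in (xmin, ymin, xmax, ymax) form; the function name iou_pair is hypothetical. It follows the same steps as box_iou: intersection corners from max/min, clamping at 0, then intersection over union.

```python
def iou_pair(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Upper-left corner of the intersection: element-wise max
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    # Lower-right corner of the intersection: element-wise min
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    # Negative width/height means no overlap, so clamp at 0
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou_pair((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7, about 0.142857
print(iou_pair((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0, no overlap
print(iou_pair((0, 0, 1, 1), (0, 0, 1, 1)))  # 1.0, identical boxes
```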

The code above contains the following details:

Defining a function with lambda:
- Format:
- lambda input_params: expression
- ```
a = lambda x,y: x+y
print(a(1,2))  # --> 3
```
Broadcasting in inter_upperlefts and inter_lowerrights

Two things are involved here: using None to add a dimension, and the broadcasting that torch.max performs when comparing values:

Using None

boxes1 = torch.tensor([[0, 0.1, 0.08, 0.52],
                       [1, 0.55, 0.2, 0.9]])
print(boxes1[:, :2].shape)
# --> torch.Size([2, 2])
print(boxes1[:, None, :2].shape)
# --> torch.Size([2, 1, 2])
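Indexing with None inserts a new axis of size 1 at that position, so it is equivalent to calling unsqueeze on that dimension. A minimal sketch confirming this on the same tensor:

```python
import torch

boxes1 = torch.tensor([[0, 0.1, 0.08, 0.52],
                       [1, 0.55, 0.2, 0.9]])

expanded = boxes1[:, None, :2]      # None inserts a new axis of size 1
same = boxes1[:, :2].unsqueeze(1)   # equivalent explicit form

print(expanded.shape)               # torch.Size([2, 1, 2])
print(torch.equal(expanded, same))  # True
```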

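How torch.max compares:
torch.max(t1, t2) with t1 of shape [m, 1, n] and t2 of shape [q, n] broadcasts both to [m, q, n]: for each i, t1[i, 0, :] is compared element-wise with every row of t2, giving a [q, n] slice of element-wise maxima, so the output has shape [m, q, n]. A demonstration follows (torch.min in the following line of box_iou works the same way):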

boxes1 = torch.tensor([[0, 0.1, 0.08, 0.52], [1, 0.55, 0.2, 0.9]])
boxes2 = torch.tensor([[0, 0.1, 0.2, 0.3],[0.15, 0.2, 0.4, 0.4], [0.63, 0.05, 0.88, 0.98],  [0.66, 0.45, 0.8, 0.8], [0.57, 0.3, 0.92, 0.9]])
inter_upperlefts = torch.max(boxes1[:, None, :2], boxes2[:3, :2])  # first 3 boxes, (xmin, ymin) only

print(inter_upperlefts)  # -->
  
  tensor([[[0.00, 0.10],
           [0.15, 0.20],
           [0.63, 0.10]],

          [[1.00, 0.55],
           [1.00, 0.55],
           [1.00, 0.55]]])
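In box_iou itself, boxes2 is not sliced to 3 rows: all 5 boxes take part. A minimal sketch with the same two tensors, printing the broadcast shapes ([2, 1, 2] against [5, 2] gives [2, 5, 2]) and the clamped intersection sizes:

```python
import torch

boxes1 = torch.tensor([[0, 0.1, 0.08, 0.52], [1, 0.55, 0.2, 0.9]])
boxes2 = torch.tensor([[0, 0.1, 0.2, 0.3], [0.15, 0.2, 0.4, 0.4],
                       [0.63, 0.05, 0.88, 0.98], [0.66, 0.45, 0.8, 0.8],
                       [0.57, 0.3, 0.92, 0.9]])

# [2, 1, 2] vs [5, 2] broadcasts to [2, 5, 2]: every box in boxes1
# is compared with every box in boxes2
upperlefts = torch.max(boxes1[:, None, :2], boxes2[:, :2])
lowerrights = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])
inters = (lowerrights - upperlefts).clamp(min=0)

print(upperlefts.shape)  # torch.Size([2, 5, 2])
print(inters.shape)      # torch.Size([2, 5, 2])
print(inters[0, 0])      # intersection width/height of boxes1[0] and boxes2[0]
```

The second row of boxes1 has xmin greater than xmax, so every intersection width it produces is negative and clamp turns it into 0.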

torch.clamp() and tensor.clamp()

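clamp limits a tensor's values to a given range. It is used here because a negative intersection width or height means the two boxes do not intersect, so it should become 0. Usage: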

x = torch.FloatTensor([-1,-2, -10, 1,2,3,4,5,6,7,8,9])
print(x.clamp(min=0))
print(torch.clamp(x, min=0, max=7))

output:
    tensor([0., 0., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
    tensor([0., 0., 0., 1., 2., 3., 4., 5., 6., 7., 7., 7.])

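Summary:

This post covered two functions: one generates multiple anchor boxes for an image from the given sizes and aspect ratios, and the other computes the IoU between boxes.
- Generating multiple anchor boxes from sizes and aspect ratios
  - Get the number of pixels along the height and width
  - Set the offset (0.5) to place each anchor's center at a pixel center
  - Compute the anchors for a single pixel from the aspect ratios, the sizes, and the combination rule
  - Apply this at every pixel to obtain the anchors for all pixels
- Computing anchor IoU:
  - The main thing is to understand the computation; note that the image coordinate system differs from the Cartesian one, with the y axis pointing the opposite way (downward)
  - Use the broadcasting of max/min to compute the intersection corners, then compute areas from those coordinates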

