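First, sort all detection boxes by confidence score, from highest to lowest.
Then take the first box as the sentinel: keep it, compute the IoU between it and every other box, and filter out (delete) every box whose IoU exceeds the threshold.
Among the remaining boxes, the sentinel moves back by one position and that box is now treated as the first: keep it, compute the IoU between it and the other boxes, and delete those above the threshold. Repeat until no boxes remain; the boxes that were kept are the final detections.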
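Every step above depends on the IoU between the sentinel and another box. As a minimal sketch, assuming boxes are given as `(x1, y1, x2, y2)` with inclusive pixel coordinates (hence the `+ 1`), a scalar IoU helper could look like this. The function name `iou` is only illustrative.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with inclusive pixel coordinates."""
    # Width and height of the intersection rectangle.
    iw = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]) + 1
    ih = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0  # the boxes do not overlap
    inter = iw * ih
    area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
    area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
    # Union = sum of both areas minus the part counted twice.
    return inter / (area_a + area_b - inter)
```

For example, two 10x10 boxes shifted horizontally by 5 pixels share a 5x10 region, so their IoU is 50 / 150 = 1/3.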
soft-nms/py_cpu_nms.py at master · bharatsingh430/soft-nms (github.com)
import numpy as np

def py_cpu_nms(dets, thresh):
    """Pure Python NMS baseline."""
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]

    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # indices sorted by confidence, highest first

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)  # the first index is the sentinel and is kept as a detection
        # Intersection of the sentinel with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)  # IoU

        inds = np.where(ovr <= thresh)[0]  # positions of boxes at or below the IoU threshold
        order = order[inds + 1]  # +1 because ovr was computed over order[1:]

    return keep
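The `inds + 1` step in the loop is easy to misread, so here is a small standalone illustration. The IoU values and indices are made up for the example; the point is only that `ovr` is computed over `order[1:]`, so positions in `ovr` are shifted by one relative to `order`.

```python
import numpy as np

order = np.array([3, 0, 2, 1])    # box indices sorted by score, best first (hypothetical)
ovr = np.array([0.8, 0.1, 0.6])   # IoU of box 3 with boxes 0, 2, 1, i.e. with order[1:]
thresh = 0.5

inds = np.where(ovr <= thresh)[0]  # positions inside order[1:] that survive
survivors = order[inds + 1]        # shift by one to index into order itself

print(inds, survivors)
```

Only the box at position 1 of `order[1:]` has an IoU at or below the threshold, and that position corresponds to `order[2]`, which is box 2.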
softnms
[1704.04503v2] Soft-NMS -- Improving Object Detection With One Line of Code (arxiv.org)
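It helps with detection in dense crowds, that is, when objects are very close to each other.
The boxes do not need to be sorted in advance. The first box is compared with all boxes after it by confidence, and the box with the highest confidence is moved to the first position to act as the sentinel. Then the IoU between this first box and every box after it is computed, and the scores are updated in one of the following ways:
1. Linear: if a box's IoU is greater than the threshold, its weight is W = 1 - IOU, otherwise W = 1. The updated confidence is the box's confidence multiplied by W.
2. Gaussian: the updated confidence is the box's confidence multiplied by np.exp(-(IOU*IOU)/scale), where scale = 0.5 (the sigma parameter in the code below).
3. Traditional NMS: if a box's IoU is greater than the threshold, its weight is W = 0, otherwise W = 1. The updated confidence is the box's confidence multiplied by W.
If the updated confidence falls below the confidence threshold, the box is deleted and the total number of boxes decreases by one.
The sentinel then moves back by one position and the IoU with all boxes after it is computed again, until the last position is reached.
So there are two thresholds here: one is the IoU threshold, the other is the threshold on the updated confidence.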
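The three weighting rules and the second threshold can be sketched in plain Python as follows. This is only an illustration of the rescoring step, not the paper's implementation; the names `soft_nms_weight` and `rescore` and the string method names are assumptions made for this example, while the default values mirror those described above.

```python
import math

def soft_nms_weight(iou, method="gaussian", iou_thresh=0.3, sigma=0.5):
    """Weight applied to a box's score given its IoU with the sentinel."""
    if method == "linear":
        return 1.0 - iou if iou > iou_thresh else 1.0
    if method == "gaussian":
        return math.exp(-(iou * iou) / sigma)
    if method == "hard":  # traditional NMS expressed as a 0/1 weight
        return 0.0 if iou > iou_thresh else 1.0
    raise ValueError(f"unknown method: {method}")

def rescore(score, iou, score_thresh=0.001, **kwargs):
    """Return the updated score, or None when the box should be dropped."""
    new_score = score * soft_nms_weight(iou, **kwargs)
    return new_score if new_score >= score_thresh else None
```

With an IoU of 0.5 against the sentinel, a box scored 0.9 drops to 0.45 under the linear rule, to about 0.55 under the Gaussian rule, and is removed entirely under traditional NMS.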
Official implementation from the paper: soft-nms/cpu_nms.pyx at master · bharatsingh430/soft-nms (github.com)
import numpy as np
cimport numpy as np

def cpu_soft_nms(np.ndarray[float, ndim=2] boxes, float sigma=0.5, float Nt=0.3, float threshold=0.001, unsigned int method=0):
    # Two thresholds: Nt is the IoU threshold, threshold is the score threshold.
    # The boxes do not need to be sorted beforehand.
    cdef unsigned int N = boxes.shape[0]
    cdef float iw, ih, box_area
    cdef float ua
    cdef int pos = 0
    cdef float maxscore = 0
    cdef int maxpos = 0
    cdef float x1,x2,y1,y2,tx1,tx2,ty1,ty2,ts,area,weight,ov

    for i in range(N):
        maxscore = boxes[i, 4]  # score of the box currently in position i
        maxpos = i

        tx1 = boxes[i,0]
        ty1 = boxes[i,1]
        tx2 = boxes[i,2]
        ty2 = boxes[i,3]
        ts = boxes[i,4]

        pos = i + 1
        # get max box: scan all later boxes for the highest score
        while pos < N:
            if maxscore < boxes[pos, 4]:
                maxscore = boxes[pos, 4]
                maxpos = pos
            pos = pos + 1

        # add max box as a detection: copy the highest-scoring box into position i
        boxes[i,0] = boxes[maxpos,0]
        boxes[i,1] = boxes[maxpos,1]
        boxes[i,2] = boxes[maxpos,2]
        boxes[i,3] = boxes[maxpos,3]
        boxes[i,4] = boxes[maxpos,4]

        # swap ith box with position of max box
        boxes[maxpos,0] = tx1
        boxes[maxpos,1] = ty1
        boxes[maxpos,2] = tx2
        boxes[maxpos,3] = ty2
        boxes[maxpos,4] = ts

        # reload the sentinel (now the highest-scoring box) into the t* variables
        tx1 = boxes[i,0]
        ty1 = boxes[i,1]
        tx2 = boxes[i,2]
        ty2 = boxes[i,3]
        ts = boxes[i,4]

        pos = i + 1
        # NMS iterations, note that N changes if detection boxes fall below threshold
        while pos < N:  # for every later box, compute its IoU with the sentinel and decay its score
            x1 = boxes[pos, 0]
            y1 = boxes[pos, 1]
            x2 = boxes[pos, 2]
            y2 = boxes[pos, 3]
            s = boxes[pos, 4]

            area = (x2 - x1 + 1) * (y2 - y1 + 1)
            iw = (min(tx2, x2) - max(tx1, x1) + 1)
            if iw > 0:  # the intersection has positive width
                ih = (min(ty2, y2) - max(ty1, y1) + 1)
                if ih > 0:  # the intersection has positive height
                    ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih)
                    ov = iw * ih / ua  # iou between max box and detection box

                    if method == 1:  # linear
                        if ov > Nt:
                            weight = 1 - ov
                        else:
                            weight = 1
                    elif method == 2:  # gaussian
                        weight = np.exp(-(ov * ov)/sigma)
                    else:  # original NMS
                        if ov > Nt:
                            weight = 0
                        else:
                            weight = 1

                    boxes[pos, 4] = weight*boxes[pos, 4]  # decay the score

                    # if box score falls below threshold, discard the box by swapping with last box
                    # update N
                    if boxes[pos, 4] < threshold:
                        boxes[pos,0] = boxes[N-1, 0]
                        boxes[pos,1] = boxes[N-1, 1]
                        boxes[pos,2] = boxes[N-1, 2]
                        boxes[pos,3] = boxes[N-1, 3]
                        boxes[pos,4] = boxes[N-1, 4]
                        N = N - 1
                        pos = pos - 1

            pos = pos + 1

    keep = [i for i in range(N)]  # the first N boxes are the ones to keep
    return keep
fastnms
[1904.02689] YOLACT: Real-time Instance Segmentation (arxiv.org)
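Because IoU is computed between one box m and another box n, IOU(m, n) = IOU(n, m), so the IoU values of all box pairs form a symmetric matrix. The diagonal holds each box's IoU with itself (always 1) and carries no information, so only the strict upper triangle is kept and everything else, including the diagonal, is set to 0.
Then take the maximum IoU of each column. Since the boxes are sorted by confidence, this is each box's largest overlap with any higher-scoring box. If a column's maximum is at or below the threshold, that box is kept, and the kept boxes are the final detections.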
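The two steps above can be sketched for a single class in NumPy. This is a simplified illustration and not the YOLACT code: the helper names `pairwise_iou` and `fast_nms_single_class` are made up for this example, and boxes are assumed to be `(x1, y1, x2, y2)` with non-zero area.

```python
import numpy as np

def pairwise_iou(boxes):
    """IoU matrix of shape (n, n) for boxes of shape (n, 4) in (x1, y1, x2, y2) format."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    # Broadcast every box against every other box.
    iw = np.clip(np.minimum(x2[:, None], x2[None, :]) - np.maximum(x1[:, None], x1[None, :]), 0, None)
    ih = np.clip(np.minimum(y2[:, None], y2[None, :]) - np.maximum(y1[:, None], y1[None, :]), 0, None)
    inter = iw * ih
    return inter / (areas[:, None] + areas[None, :] - inter)

def fast_nms_single_class(boxes, scores, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    order = np.argsort(-scores)                     # sort by confidence, descending
    iou = np.triu(pairwise_iou(boxes[order]), k=1)  # strict upper triangle only
    iou_max = iou.max(axis=0)                       # per column: max IoU with a higher-scoring box
    return order[iou_max <= iou_threshold]
```

One design consequence worth noting: a box that is itself suppressed can still suppress others. For three boxes in a chain where B overlaps A and C overlaps B but barely overlaps A, this sketch keeps only A, whereas the sequential NMS described earlier would keep A and C.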
The official implementation is: yolact/detection.py at 57b8f2d95e62e2e649b382f516ab41f949b57239 · dbolya/yolact (github.com)
def fast_nms(self, boxes, masks, scores, iou_threshold:float=0.5, top_k:int=200, second_threshold:bool=False):
    # scores holds the sorted confidences, idx the corresponding original indices
    scores, idx = scores.sort(1, descending=True)  # sort by confidence, descending

    # keep the top_k (200 by default) boxes per class
    idx = idx[:, :top_k].contiguous()
    scores = scores[:, :top_k]  # the top_k confidences

    num_classes, num_dets = idx.size()  # first dim: classes, second dim: detections

    # gather the corresponding boxes and masks
    boxes = boxes[idx.view(-1), :].view(num_classes, num_dets, 4)
    masks = masks[idx.view(-1), :].view(num_classes, num_dets, -1)

    iou = jaccard(boxes, boxes)  # pairwise IoU matrix per class
    iou.triu_(diagonal=1)  # keep the upper triangle; lower triangle and diagonal become 0
    iou_max, _ = iou.max(dim=1)  # maximum of each column of the IoU matrix

    # Now just filter out the ones higher than the threshold
    keep = (iou_max <= iou_threshold)  # keep boxes whose column maximum is at or below the threshold

    # We should also only keep detections over the confidence threshold, but at the cost of
    # maxing out your detection count for every image, you can just not do that. Because we
    # have such a minimal amount of computation per detection (matrix multiplication only),
    # this increase doesn't affect us much (+0.2 mAP for 34 -> 33 fps), so we leave it out.
    # However, when you implement this in your method, you should do this second threshold.
    if second_threshold:
        keep *= (scores > self.conf_thresh)

    # Assign each kept detection to its corresponding class
    # (the filtering is applied per class)
    classes = torch.arange(num_classes, device=boxes.device)[:, None].expand_as(keep)
    classes = classes[keep]

    boxes = boxes[keep]
    masks = masks[keep]
    scores = scores[keep]

    # only keep the top cfg.max_num_detections highest scores across all classes
    scores, idx = scores.sort(0, descending=True)
    idx = idx[:cfg.max_num_detections]
    scores = scores[:cfg.max_num_detections]

    classes = classes[idx]
    boxes = boxes[idx]
    masks = masks[idx]

    return boxes, masks, classes, scores
matrixnms
[2003.10152] SOLOv2: Dynamic and Fast Instance Segmentation (arxiv.org)
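It is a combination of softnms and fastnms.
Because IoU is computed between one box m and another box n, IOU(m, n) = IOU(n, m), so the IoU values of all pairs form a symmetric matrix. The diagonal holds each box's IoU with itself (always 1) and carries no information, so only the strict upper triangle is kept and everything else, including the diagonal, is set to 0.
Then take the maximum IoU of each column to obtain a one-dimensional vector. The IoU matrix was (m, m), m rows and m columns; the result is (1, m), one row and m columns.
Next, a linear or Gaussian kernel is applied to both the full IoU matrix and the column maxima, and the two are combined column by column to obtain the penalty matrix.
Finally, the penalty is multiplied with the confidence scores. Boxes whose updated confidence is above the threshold are kept as detections and the rest are deleted.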
[[0. 0.42246028 0.21570093 0.93108057 0.95719326 0.94367218]]
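To make the penalty computation concrete, here is a minimal NumPy sketch that starts from a precomputed IoU matrix of score-sorted boxes of a single class. It is an illustration under those assumptions rather than the SOLOv2 code; the function name `matrix_nms_decay` is hypothetical.

```python
import numpy as np

def matrix_nms_decay(iou, kernel="gaussian", sigma=2.0):
    """Per-box decay factors from an (n, n) IoU matrix of score-sorted, same-class boxes."""
    iou = np.triu(iou, k=1)           # keep only IoU with higher-scoring boxes
    compensate = iou.max(axis=0)      # (n,): how strongly each box is itself overlapped
    compensate = compensate[:, None]  # row i now carries the value of suppressor i
    if kernel == "gaussian":
        decay = np.exp(-sigma * iou ** 2) / np.exp(-sigma * compensate ** 2)
    elif kernel == "linear":
        decay = (1 - iou) / (1 - compensate)
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    return decay.min(axis=0)          # column-wise minimum: strongest penalty per box

# Hypothetical IoU matrix: box 1 overlaps box 0 heavily, box 2 is isolated.
iou = np.array([[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
gaussian_decay = matrix_nms_decay(iou, kernel="gaussian")
linear_decay = matrix_nms_decay(iou, kernel="linear")
```

Here only box 1 is penalized: its factor is exp(-2 * 0.8^2) with the Gaussian kernel and 1 - 0.8 = 0.2 with the linear kernel, while boxes 0 and 2 keep a factor of 1. The updated scores are the original scores multiplied by these factors.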
import torch

def matrix_nms(seg_masks, cate_labels, cate_scores, kernel='gaussian', sigma=2.0, sum_masks=None):
    """Matrix NMS for multi-class masks.
    Args:
        seg_masks (Tensor): shape (n, h, w)
        cate_labels (Tensor): shape (n), mask labels in descending order
        cate_scores (Tensor): shape (n), mask scores in descending order
        kernel (str): 'linear' or 'gaussian'
        sigma (float): std in gaussian method
        sum_masks (Tensor): The sum of seg_masks
    Returns:
        Tensor: cate_scores_update, tensors of shape (n)
    """
    n_samples = len(cate_labels)
    if n_samples == 0:
        return []
    if sum_masks is None:
        # Sum each mask to get its area (the number of 1s); output shape (n)
        sum_masks = seg_masks.sum((1, 2)).float()
    # Flatten each mask into a 1-D vector: (n, h*w)
    seg_masks = seg_masks.reshape(n_samples, -1).float()
    # inter. Multiply the flattened masks by their own transpose to get the intersections: (n, n)
    inter_matrix = torch.mm(seg_masks, seg_masks.transpose(1, 0))
    # union. Broadcast the mask areas from shape (n) to (n, n), for example:
    #   tensor([182.1339, 179.4209, 185.8073])
    #   tensor([[182.1339, 179.4209, 185.8073],
    #           [182.1339, 179.4209, 185.8073],
    #           [182.1339, 179.4209, 185.8073]])
    sum_masks_x = sum_masks.expand(n_samples, n_samples)
    # iou. Compute the IoU matrix and zero the lower triangle and diagonal, leaving the upper triangle
    iou_matrix = (inter_matrix / (sum_masks_x + sum_masks_x.transpose(1, 0) - inter_matrix)).triu(diagonal=1)
    # label_specific matrix.
    # Broadcast the labels from shape (n) to (n, n)
    cate_labels_x = cate_labels.expand(n_samples, n_samples)
    # 1 where two masks share the same class, 0 otherwise (upper triangle only)
    label_matrix = (cate_labels_x == cate_labels_x.transpose(1, 0)).float().triu(diagonal=1)

    # IoU compensation
    # Mask out pairs of different classes (element-wise product), then take the maximum of each column
    compensate_iou, _ = (iou_matrix * label_matrix).max(0)
    # Broadcast the column maxima to (n, n) and transpose. Note the transpose:
    # row i now holds the maximum that belongs to box i
    compensate_iou = compensate_iou.expand(n_samples, n_samples).transpose(1, 0)

    # IoU decay
    # Same-class IoU values, used by the kernels below
    decay_iou = iou_matrix * label_matrix

    # matrix nms
    if kernel == 'gaussian':  # Gaussian kernel
        decay_matrix = torch.exp(-1 * sigma * (decay_iou ** 2))  # applied to the IoU matrix
        compensate_matrix = torch.exp(-1 * sigma * (compensate_iou ** 2))  # applied to the transposed maxima
        decay_coefficient, _ = (decay_matrix / compensate_matrix).min(0)
    elif kernel == 'linear':  # linear kernel
        decay_matrix = (1-decay_iou)/(1-compensate_iou)
        decay_coefficient, _ = decay_matrix.min(0)
    else:
        raise NotImplementedError

    # update the score.
    cate_scores_update = cate_scores * decay_coefficient
    # The caller keeps only the boxes whose updated score is above the threshold
    return cate_scores_update
The official implementation is:
SOLO/matrix_nms.py at 0c689aec145cb0a7a62f14c83b920b65e64faa1e · WXinlong/SOLO (github.com)
Other implementations:
nms/matrix_nms.py at 2a065084d8ed0b905600d0a178f497452b65dc95 · AmberzzZZ/nms (github.com)
一文打尽目标检测NMS——效率提升篇 - 知乎 (zhihu.com)



