The Many Variants of Non-Maximum Suppression in Detection Networks

Every Non-Maximum Suppression (NMS) variant described here operates on one class at a time: suppression is run separately for each class.

Traditional NMS

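First, sort all detection boxes by confidence, from highest to lowest.

Then take the first box as the sentinel. This box is kept. Compute the IoU of every other box with it, and discard (delete) every box whose IoU exceeds the threshold.

Among the boxes that remain, the sentinel moves one position forward and the next box is treated as the first: it is kept, the IoU of every other remaining box with it is computed, and boxes whose IoU exceeds the threshold are discarded. Repeat until no boxes remain. The kept boxes are the final detections.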

soft-nms/py_cpu_nms.py at master · bharatsingh430/soft-nms (github.com)

import numpy as np

def py_cpu_nms(dets, thresh):
    """Pure Python NMS baseline."""
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]

    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # indices of the boxes, sorted by descending confidence

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)  # the highest-scoring remaining box is always kept
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
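        ovr = inter / (areas[i] + areas[order[1:]] - inter)  # IoU of box i with every remaining box

        inds = np.where(ovr <= thresh)[0]  # positions of the boxes whose IoU does not exceed the threshold
        order = order[inds + 1]  # +1 because ovr was computed over order[1:], so positions are offset by one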

    return keep
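To see the procedure on concrete numbers, here is a minimal, self-contained sketch of the same greedy loop run on four made-up boxes. The helper names `iou_one_to_many` and `greedy_nms` and the toy coordinates are illustrative assumptions, not part of the linked repository. The IoU uses the same +1 pixel convention as py_cpu_nms.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an (n, 4) array of boxes."""
    xx1 = np.maximum(box[0], boxes[:, 0])
    yy1 = np.maximum(box[1], boxes[:, 1])
    xx2 = np.minimum(box[2], boxes[:, 2])
    yy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
    area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    return inter / (area + areas - inter)

def greedy_nms(dets, thresh):
    """Return the indices of the kept rows of dets, an (n, 5) array with the score in column 4."""
    order = dets[:, 4].argsort()[::-1]  # descending confidence
    keep = []
    while order.size > 0:
        i = order[0]                    # current sentinel
        keep.append(int(i))
        ovr = iou_one_to_many(dets[i, :4], dets[order[1:], :4])
        order = order[1:][ovr <= thresh]  # drop boxes that overlap the sentinel too much
    return keep

dets = np.array([
    [10, 10, 50, 50, 0.90],      # box 0: highest score
    [12, 12, 52, 52, 0.80],      # box 1: near-duplicate of box 0
    [100, 100, 150, 150, 0.70],  # box 2: a separate object
    [11, 9, 49, 51, 0.60],       # box 3: another near-duplicate of box 0
])

keep = greedy_nms(dets, thresh=0.5)
print(keep)  # [0, 2]
```

Boxes 1 and 3 overlap box 0 with an IoU of roughly 0.83 and 0.91, so both are removed in the first round; box 2 does not overlap anything and survives.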

Soft-NMS

[1704.04503v2] Soft-NMS -- Improving Object Detection With One Line of Code (arxiv.org)

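Soft-NMS helps in crowded scenes, that is, when objects are close together and their boxes overlap heavily.

The boxes are not sorted up front, because the scores change while the algorithm runs and a one-time ordering would go stale. Instead, at each step the current first box is compared with all the boxes after it, and the box with the highest confidence is swapped into the first position to act as the sentinel. The IoU between the sentinel and every box after it is then computed, and each of those boxes has its confidence rescaled in one of three ways:

1. Linear: if the box's IoU exceeds the IoU threshold, the weight is W = 1 - IoU, otherwise W = 1. The updated confidence is the old confidence multiplied by W.

2. Gaussian: the updated confidence is the old confidence multiplied by np.exp(-(IOU*IOU)/sigma), with sigma = 0.5 by default. No IoU threshold is involved; every overlapping box is decayed.

3. Traditional NMS: if the box's IoU exceeds the IoU threshold, the weight is W = 0, otherwise W = 1. The updated confidence is the old confidence multiplied by W.

If the updated confidence falls below the confidence threshold, the box is deleted and the total number of boxes decreases by one.

The sentinel then moves one position forward and the same IoU computation is repeated against all the boxes after it, until the last position is reached.

There are therefore two thresholds: the IoU threshold, and the threshold applied to the updated confidence.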
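As a small worked example of the three weighting rules, the sketch below computes the decay factor for a single neighbour. The function name `soft_nms_weight` and the sample score and IoU are made up for illustration; the default values (IoU threshold 0.3, sigma 0.5, score threshold 0.001) mirror the defaults in the signature of cpu_soft_nms.

```python
import math

def soft_nms_weight(iou, method="gaussian", iou_thresh=0.3, sigma=0.5):
    """Factor by which a neighbour's score is multiplied, given its IoU with the sentinel."""
    if method == "linear":
        return 1.0 - iou if iou > iou_thresh else 1.0
    if method == "gaussian":
        return math.exp(-(iou * iou) / sigma)
    if method == "hard":  # traditional NMS expressed as a 0/1 weight
        return 0.0 if iou > iou_thresh else 1.0
    raise ValueError(f"unknown method: {method}")

score_thresh = 0.001   # threshold on the updated confidence
score, iou = 0.8, 0.6  # hypothetical neighbour: score 0.8, IoU 0.6 with the sentinel

for method in ("linear", "gaussian", "hard"):
    new_score = score * soft_nms_weight(iou, method)
    status = "kept" if new_score >= score_thresh else "dropped"
    print(f"{method:8s} -> {new_score:.4f} ({status})")
```

With these numbers the linear rule scales the score to 0.32, the Gaussian rule to about 0.39, and the traditional rule to 0, so only the traditional rule removes the box outright.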

Official implementation from the paper: soft-nms/cpu_nms.pyx at master · bharatsingh430/soft-nms (github.com)

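import numpy as np
cimport numpy as np

def cpu_soft_nms(np.ndarray[float, ndim=2] boxes, float sigma=0.5, float Nt=0.3, float threshold=0.001, unsigned int method=0):
    # Two thresholds: Nt is the IoU threshold, threshold is the confidence threshold.
    # The boxes do not need to be sorted beforehand. Note that boxes is modified in place.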
    cdef unsigned int N = boxes.shape[0]
    cdef float iw, ih, box_area
    cdef float ua
    cdef int pos = 0
    cdef float maxscore = 0
    cdef int maxpos = 0
    cdef float x1,x2,y1,y2,tx1,tx2,ty1,ty2,ts,area,weight,ov

    for i in range(N):
        maxscore = boxes[i, 4]  # start from the confidence of the current (ith) box
        maxpos = i

        tx1 = boxes[i,0]
        ty1 = boxes[i,1]
        tx2 = boxes[i,2]
        ty2 = boxes[i,3]
        ts = boxes[i,4]

        pos = i + 1
        # get max box
        while pos < N:   # scan all later boxes for the highest confidence
            if maxscore < boxes[pos, 4]: 
                maxscore = boxes[pos, 4]
                maxpos = pos
            pos = pos + 1

        # add max box as a detection: copy the max-score box into position i
        boxes[i,0] = boxes[maxpos,0]
        boxes[i,1] = boxes[maxpos,1]
        boxes[i,2] = boxes[maxpos,2]
        boxes[i,3] = boxes[maxpos,3]
        boxes[i,4] = boxes[maxpos,4]

        # swap ith box with position of max box
        boxes[maxpos,0] = tx1  # move the old ith box into the slot the max box came from
        boxes[maxpos,1] = ty1
        boxes[maxpos,2] = tx2
        boxes[maxpos,3] = ty2
        boxes[maxpos,4] = ts

        tx1 = boxes[i,0]  # reload tx1..ts from position i, which now holds the max-score box (the sentinel)
        ty1 = boxes[i,1]
        tx2 = boxes[i,2]
        ty2 = boxes[i,3]
        ts = boxes[i,4]

        pos = i + 1
        # NMS iterations, note that N changes if detection boxes fall below threshold
        while pos < N:  # for every later box, compute its IoU with the sentinel and decay its score
            x1 = boxes[pos, 0]
            y1 = boxes[pos, 1]
            x2 = boxes[pos, 2]
            y2 = boxes[pos, 3]
            s = boxes[pos, 4]

            area = (x2 - x1 + 1) * (y2 - y1 + 1)
            iw = (min(tx2, x2) - max(tx1, x1) + 1)
            if iw > 0:  # the intersection has positive width
                ih = (min(ty2, y2) - max(ty1, y1) + 1)
                if ih > 0:  # the intersection has positive height
                    ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih)
                    ov = iw * ih / ua #iou between max box and detection box

                    if method == 1: # linear
                        if ov > Nt: 
                            weight = 1 - ov
                        else:
                            weight = 1
                    elif method == 2: # gaussian
                        weight = np.exp(-(ov * ov)/sigma)
                    else: # original NMS
                        if ov > Nt: 
                            weight = 0
                        else:
                            weight = 1

                    boxes[pos, 4] = weight*boxes[pos, 4]  # decay the confidence

                    # if box score falls below threshold, discard the box by swapping with last box
                    # update N
                    if boxes[pos, 4] < threshold:    # updated confidence is below the threshold: drop the box and decrement N
                        boxes[pos,0] = boxes[N-1, 0]
                        boxes[pos,1] = boxes[N-1, 1]
                        boxes[pos,2] = boxes[N-1, 2]
                        boxes[pos,3] = boxes[N-1, 3]
                        boxes[pos,4] = boxes[N-1, 4]
                        N = N - 1
                        pos = pos - 1

            pos = pos + 1

    keep = [i for i in range(N)]      # the first N rows of the reordered boxes array are the boxes to keep
    return keep
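Because cpu_soft_nms reorders boxes in place, the indices it returns refer to the modified array rather than to the caller's original order. The following NumPy sketch is a simplified re-implementation, not the official code: it leaves the input untouched and returns original indices together with the decayed scores. The function name `soft_nms`, its string-valued `method` argument, and the toy boxes are assumptions made for illustration. One deliberate simplification: it drops any remaining box whose score is below the threshold, whereas the Cython version only re-checks boxes that overlap the sentinel.

```python
import numpy as np

def soft_nms(dets, method="gaussian", sigma=0.5, iou_thresh=0.3, score_thresh=0.001):
    """Soft-NMS on an (n, 5) array of [x1, y1, x2, y2, score].

    Returns (indices into dets, decayed scores) of the surviving boxes,
    in the order in which they were selected as sentinel.
    """
    x1, y1, x2, y2 = (dets[:, k].astype(np.float64) for k in range(4))
    scores = dets[:, 4].astype(np.float64)  # astype copies, so dets is not modified
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    remaining = np.arange(len(dets))
    kept, kept_scores = [], []

    while remaining.size > 0:
        # Pick the highest-scoring remaining box as the sentinel.
        m = remaining[scores[remaining].argmax()]
        kept.append(int(m))
        kept_scores.append(float(scores[m]))
        remaining = remaining[remaining != m]

        # IoU between the sentinel and every remaining box (+1 pixel convention).
        w = np.maximum(0.0, np.minimum(x2[m], x2[remaining]) - np.maximum(x1[m], x1[remaining]) + 1)
        h = np.maximum(0.0, np.minimum(y2[m], y2[remaining]) - np.maximum(y1[m], y1[remaining]) + 1)
        inter = w * h
        iou = inter / (areas[m] + areas[remaining] - inter)

        if method == "linear":
            weight = np.where(iou > iou_thresh, 1.0 - iou, 1.0)
        elif method == "gaussian":
            weight = np.exp(-(iou * iou) / sigma)
        else:  # traditional NMS
            weight = np.where(iou > iou_thresh, 0.0, 1.0)

        scores[remaining] *= weight
        remaining = remaining[scores[remaining] >= score_thresh]

    return kept, kept_scores

dets = np.array([
    [10, 10, 50, 50, 0.90],
    [12, 12, 52, 52, 0.80],      # heavy overlap with box 0
    [100, 100, 150, 150, 0.70],  # isolated box
    [11, 9, 49, 51, 0.60],       # heavy overlap with box 0
])

soft_kept, soft_scores = soft_nms(dets, method="gaussian")
hard_kept, _ = soft_nms(dets, method="hard")
print(soft_kept, [round(s, 3) for s in soft_scores])
print(hard_kept)
```

On this input the Gaussian rule keeps all four boxes but pushes the two near-duplicates far down the ranking, while the traditional rule keeps only boxes 0 and 2.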

Fast NMS

[1904.02689] YOLACT: Real-time Instance Segmentation (arxiv.org)

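The boxes are first sorted by confidence in descending order. IoU is symmetric, IoU(m, n) = IoU(n, m), so the matrix of pairwise IoU values is symmetric. Its diagonal holds the IoU of each box with itself, which is always 1 and carries no information. It is therefore enough to keep the strict upper triangle and set everything else, the lower triangle and the diagonal, to 0. After that, entry (i, j) is the IoU between box j and a higher-scoring box i.

Then take the maximum IoU of each column, that is, the largest overlap of each box with any higher-scoring box. If a column's maximum does not exceed the threshold, that box is kept. The kept boxes are the final detections.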
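The two steps above can be sketched in a few lines of NumPy. This is an illustrative single-class version with made-up boxes, not the YOLACT code; the names `pairwise_iou` and `fast_nms` are assumptions, and the IoU here uses plain continuous coordinates without a +1 offset. The example is chosen to show a trade-off noted in the YOLACT paper: a box that has already been suppressed can still suppress others, so Fast NMS may remove slightly more boxes than traditional NMS.

```python
import numpy as np

def pairwise_iou(boxes):
    """Return the (n, n) IoU matrix for an (n, 4) array of [x1, y1, x2, y2] boxes."""
    x1, y1, x2, y2 = boxes.T
    area = (x2 - x1) * (y2 - y1)
    iw = np.clip(np.minimum(x2[:, None], x2[None, :]) - np.maximum(x1[:, None], x1[None, :]), 0, None)
    ih = np.clip(np.minimum(y2[:, None], y2[None, :]) - np.maximum(y1[:, None], y1[None, :]), 0, None)
    inter = iw * ih
    return inter / (area[:, None] + area[None, :] - inter)

def fast_nms(boxes, scores, iou_thresh=0.5):
    """Return the original indices of the boxes kept by Fast NMS (single class)."""
    order = np.argsort(-scores)                      # descending confidence
    iou = np.triu(pairwise_iou(boxes[order]), k=1)   # zero the lower triangle and the diagonal
    col_max = iou.max(axis=0)                        # largest IoU with any higher-scoring box
    return order[col_max <= iou_thresh]

# A chain of three boxes: A overlaps B, B overlaps C, but A and C barely overlap.
boxes = np.array([
    [0, 0, 10, 10],   # A
    [3, 0, 13, 10],   # B: IoU(A, B) = 70 / 130, about 0.54
    [6, 0, 16, 10],   # C: IoU(B, C) about 0.54, IoU(A, C) = 0.25
], dtype=np.float64)
scores = np.array([0.9, 0.8, 0.7])

kept = fast_nms(boxes, scores)
print(kept)  # [0]
```

Traditional NMS would keep A and C here, because B is deleted before it is ever compared with C. Fast NMS keeps only A, since C's column maximum still includes its overlap with B.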

Official implementation: yolact/detection.py at 57b8f2d95e62e2e649b382f516ab41f949b57239 · dbolya/yolact (github.com)

def fast_nms(self, boxes, masks, scores, iou_threshold:float=0.5, top_k:int=200, second_threshold:bool=False):
        # scores: confidences after sorting; idx: the corresponding original indices
        scores, idx = scores.sort(1, descending=True)  # sort each class's confidences from high to low
    
        # keep only the top_k (200 by default) boxes per class
        idx = idx[:, :top_k].contiguous()
        scores = scores[:, :top_k]         # confidences of the top_k boxes
    
        num_classes, num_dets = idx.size()   # dim 0 is the class, dim 1 is the number of detections

        # gather the corresponding boxes and masks
        boxes = boxes[idx.view(-1), :].view(num_classes, num_dets, 4)
        masks = masks[idx.view(-1), :].view(num_classes, num_dets, -1)

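        iou = jaccard(boxes, boxes)  # pairwise IoU matrix for each class
        iou.triu_(diagonal=1)   # keep the strict upper triangle; the lower triangle and diagonal become 0
        iou_max, _ = iou.max(dim=1)    # maximum of each column of the IoU matrix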

        # Now just filter out the ones higher than the threshold
        keep = (iou_max <= iou_threshold)    # keep a box if its column maximum does not exceed the threshold

        # We should also only keep detections over the confidence threshold, but at the cost of
        # maxing out your detection count for every image, you can just not do that. Because we
        # have such a minimal amount of computation per detection (matrix multiplication only),
        # this increase doesn't affect us much (+0.2 mAP for 34 -> 33 fps), so we leave it out.
        # However, when you implement this in your method, you should do this second threshold.
        if second_threshold:
            keep *= (scores > self.conf_thresh)

        # Assign each kept detection to its corresponding class
        # done per class: build a class index for every detection, then select the kept ones
        classes = torch.arange(num_classes, device=boxes.device)[:, None].expand_as(keep)
        classes = classes[keep]

        boxes = boxes[keep]
        masks = masks[keep]
        scores = scores[keep]
        
        # only keep the top cfg.max_num_detections highest scores across all classes
        scores, idx = scores.sort(0, descending=True)
        idx = idx[:cfg.max_num_detections]
        scores = scores[:cfg.max_num_detections]

        classes = classes[idx]
        boxes = boxes[idx]
        masks = masks[idx]

        return boxes, masks, classes, scores

Matrix NMS

[2003.10152] SOLOv2: Dynamic and Fast Instance Segmentation (arxiv.org)

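Matrix NMS combines Soft-NMS and Fast NMS: confidences are decayed rather than zeroed, as in Soft-NMS, and everything is computed from the IoU matrix in one pass, as in Fast NMS.

Starting from the upper-triangular IoU matrix of the score-sorted detections, take the maximum of each column. This gives a vector with one entry per detection: the IoU matrix is (m, m), m rows by m columns, and the column maxima form a (1, m) vector, 1 row by m columns.

Next, a linear or Gaussian kernel is applied to the whole IoU matrix and to the column maxima, and the two results are combined column by column to give the penalty matrix.

Finally, the penalty is multiplied into the confidences. Detections whose updated confidence is above the threshold are kept; the rest are deleted.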

[[0. 0.42246028 0.21570093 0.93108057 0.95719326 0.94367218]]
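To make the decay computation concrete, here is a minimal NumPy sketch for a single class. It is an illustration under stated assumptions, not the SOLOv2 code: the function name `matrix_nms_decay` is hypothetical, the input is a ready-made IoU matrix for three score-sorted boxes A, B, C (IoU(A, B) = IoU(B, C) = 70/130, IoU(A, C) = 0.25), and the Gaussian kernel is written as exp(-sigma * IoU^2) with sigma = 2.0, following the defaults of the matrix_nms listing.

```python
import numpy as np

def matrix_nms_decay(iou, kernel="gaussian", sigma=2.0):
    """Per-box decay factors from an (n, n) IoU matrix of score-sorted, same-class boxes."""
    iou = np.triu(iou, k=1)        # iou[i, j]: overlap of box j with a higher-scoring box i
    col_max = iou.max(axis=0)      # how strongly each box overlaps any higher-scoring box
    compensate = col_max[:, None]  # row i carries the column maximum of box i
    if kernel == "gaussian":
        decay = np.exp(-sigma * iou ** 2) / np.exp(-sigma * compensate ** 2)
    elif kernel == "linear":
        decay = (1 - iou) / (1 - compensate)
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    return decay.min(axis=0)       # smallest factor in each column

x = 70 / 130
iou = np.array([
    [1.0, x,   0.25],
    [x,   1.0, x],
    [0.25, x,  1.0],
])
scores = np.array([0.9, 0.8, 0.7])

decay = matrix_nms_decay(iou)
updated = scores * decay
print(np.round(decay, 4))
print(np.round(updated, 4))
```

The top box A is left unchanged (factor 1), B is decayed by its overlap with A, and C is only decayed by its smaller overlap with A: the penalty that B would impose on C is divided by B's own penalty, which cancels it out. C's updated score therefore ends up above B's.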

import torch


def matrix_nms(seg_masks, cate_labels, cate_scores, kernel='gaussian', sigma=2.0, sum_masks=None):
    """Matrix NMS for multi-class masks.
    Args:
        seg_masks (Tensor): shape (n, h, w)
        cate_labels (Tensor): shape (n), mask labels in descending order
        cate_scores (Tensor): shape (n), mask scores in descending order
        kernel (str):  'linear' or 'gaussian'
        sigma (float): coefficient of the gaussian kernel, exp(-sigma * iou^2)
        sum_masks (Tensor): The sum of seg_masks
    Returns:
        Tensor: cate_scores_update, tensors of shape (n)
    """
    n_samples = len(cate_labels)
    if n_samples == 0:
        return []
    if sum_masks is None:
        # sum each mask to get its area (the number of 1s); output shape (n)
        sum_masks = seg_masks.sum((1, 2)).float()
    # flatten every mask into a vector: (n, h*w)
    seg_masks = seg_masks.reshape(n_samples, -1).float()
    # inter. multiply the flattened masks by their own transpose to get the pairwise intersection areas: (n, n)
    inter_matrix = torch.mm(seg_masks, seg_masks.transpose(1, 0))
    # union. broadcast the mask areas from shape (n) to (n, n); each row repeats the full area vector, e.g.:
    '''
    tensor([182.1339, 179.4209, 185.8073])
    tensor([[182.1339, 179.4209, 185.8073],
        [182.1339, 179.4209, 185.8073],
        [182.1339, 179.4209, 185.8073]])
    '''
    sum_masks_x = sum_masks.expand(n_samples, n_samples)
    # iou. build the IoU matrix and zero the lower triangle and diagonal, leaving the upper triangle
    iou_matrix = (inter_matrix / (sum_masks_x + sum_masks_x.transpose(1, 0) - inter_matrix)).triu(diagonal=1)
    # label_specific matrix.
    # broadcast the labels from shape (n) to (n, n)
    cate_labels_x = cate_labels.expand(n_samples, n_samples)
    # 1 where two masks share the same class, 0 otherwise (upper triangle only)
    label_matrix = (cate_labels_x == cate_labels_x.transpose(1, 0)).float().triu(diagonal=1)

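    # IoU compensation
    # zero out cross-class pairs (element-wise product), then take the maximum of each column
    compensate_iou, _ = (iou_matrix * label_matrix).max(0)
    # broadcast the column maxima to (n, n) and transpose, so that row i holds the maximum of mask i.
    # Note the transpose here!
    compensate_iou = compensate_iou.expand(n_samples, n_samples).transpose(1, 0)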

    # IoU decay
    # IoU matrix restricted to same-class pairs, used by the kernels below
    decay_iou = iou_matrix * label_matrix

    # matrix nms
    if kernel == 'gaussian':  # gaussian kernel
        decay_matrix = torch.exp(-1 * sigma * (decay_iou ** 2))  # kernel applied to the IoU matrix
        compensate_matrix = torch.exp(-1 * sigma * (compensate_iou ** 2))  # kernel applied to the transposed column maxima
        decay_coefficient, _ = (decay_matrix / compensate_matrix).min(0)
    elif kernel == 'linear':  # linear kernel
        decay_matrix = (1-decay_iou)/(1-compensate_iou)
        decay_coefficient, _ = decay_matrix.min(0)
    else:
        raise NotImplementedError

    # update the score.
    cate_scores_update = cate_scores * decay_coefficient
    # the caller then keeps only the masks whose updated score is above the threshold
    return cate_scores_update

Official implementation:

SOLO/matrix_nms.py at 0c689aec145cb0a7a62f14c83b920b65e64faa1e · WXinlong/SOLO (github.com)

Other implementations:

nms/matrix_nms.py at 2a065084d8ed0b905600d0a178f497452b65dc95 · AmberzzZZ/nms (github.com)

一文打尽目标检测NMS——效率提升篇 - 知乎 (zhihu.com)
