[PyTorch] Implementing a Simplified YOLOv3 from Scratch (Part 5): The YOLOv3 Loss Function, Turning the Loss Expression into Multi-Dimensional Tensors

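Continuing from the previous post:
The YOLOv3 loss function is a modification of YOLOv2's. The biggest change is that the classification loss becomes binary cross-entropy, because YOLOv3 drops softmax and uses independent logistic (sigmoid) classifiers instead.
For a more detailed explanation of how YOLO works, see:
A history of YOLO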
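To illustrate why dropping softmax goes together with binary cross-entropy, here is a minimal sketch with made-up logits for one box and three classes. Softmax forces the class probabilities to sum to 1, so overlapping labels such as "person" and "woman" (the example the YOLOv3 paper uses) compete with each other. Per-class sigmoids score each class independently, and BCE treats each class as its own yes/no problem. The logits and labels here are hypothetical.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw class scores for one box and 3 classes: "person", "woman", "car"
logits = torch.tensor([2.0, 1.5, -3.0])

softmax_probs = F.softmax(logits, dim=0)  # sums to 1: classes compete
sigmoid_probs = torch.sigmoid(logits)     # each class scored independently

# Multi-label ground truth: the box is both "person" and "woman"
target = torch.tensor([1.0, 1.0, 0.0])

# Binary cross-entropy on sigmoid outputs, averaged over the classes
bce = F.binary_cross_entropy(sigmoid_probs, target)

print(softmax_probs, softmax_probs.sum().item())  # only one class can exceed 0.5
print(sigmoid_probs)                              # both "person" and "woman" exceed 0.5
print(bce.item())
```

Applying sigmoid and then `F.binary_cross_entropy` gives the same value as `F.binary_cross_entropy_with_logits` on the raw logits. The latter is the numerically safer choice when the sigmoid is not needed elsewhere.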

The YOLOv3 loss function is:
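For reference, a commonly cited form of this loss is given below. It uses the notation of the YOLOv1/v2 papers, with the YOLOv3 change to binary cross-entropy. The YOLOv3 paper does not print a single loss formula, so treat this as an assumed reconstruction rather than the exact expression from the figure. Hats denote ground-truth values. $\mathbb{1}_{ij}^{\text{obj}}$ is 1 when the $j$-th box of cell $i$ is responsible for an object, and $\mathbb{1}_{ij}^{\text{noobj}}$ marks the remaining boxes.

$$
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} \mathbb{1}_{ij}^{\text{obj}} \left[(x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2\right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} \mathbb{1}_{ij}^{\text{obj}} \left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&- \sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} \mathbb{1}_{ij}^{\text{obj}} \left[\hat{C}_i \log C_i + (1-\hat{C}_i)\log(1-C_i)\right] \\
&- \lambda_{\text{noobj}} \sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} \mathbb{1}_{ij}^{\text{noobj}} \left[\hat{C}_i \log C_i + (1-\hat{C}_i)\log(1-C_i)\right] \\
&- \sum_{i=0}^{S^2-1}\sum_{j=0}^{B-1} \mathbb{1}_{ij}^{\text{obj}} \sum_{c\in\text{classes}} \left[\hat{p}_i(c)\log p_i(c) + (1-\hat{p}_i(c))\log\left(1-p_i(c)\right)\right]
\end{aligned}
$$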

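1. The width and height errors in the loss above are square-rooted because large objects naturally produce larger losses than small ones, and the square root narrows this gap. The actual code does not do this. Instead, it weights the box loss by box_loss_scale = 2 - w·h (normalized box area), which likewise gives small objects a larger weight.
2. The coefficients in front of the individual loss terms balance the localization loss against the classification loss, and the object loss against the no-object loss.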
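To see how much the square root helps, compare two boxes with the same 10% relative width error. The plain-Python sketch below uses made-up widths, given as fractions of the image. It computes the squared error with and without the square root, plus the `2 - w*h` weight that `get_target` in the code below stores as `box_loss_scale`.

```python
import math


def sq_err(pred, true):
    """Plain squared error."""
    return (pred - true) ** 2


def sqrt_sq_err(pred, true):
    """Squared error of square roots, as in the width/height term of the loss."""
    return (math.sqrt(pred) - math.sqrt(true)) ** 2


def box_loss_scale(w, h):
    """Weight used in get_target: 2 - normalized area (w, h in [0, 1])."""
    return 2 - w * h


# Hypothetical widths with the same 10% relative error
big_true, big_pred = 0.8, 0.88
small_true, small_pred = 0.1, 0.11

plain_ratio = sq_err(big_pred, big_true) / sq_err(small_pred, small_true)
sqrt_ratio = sqrt_sq_err(big_pred, big_true) / sqrt_sq_err(small_pred, small_true)

print(f"plain squared-error ratio (big/small): {plain_ratio:.1f}")  # 64.0
print(f"sqrt squared-error ratio  (big/small): {sqrt_ratio:.1f}")   # 8.0
print(box_loss_scale(0.8, 0.8), box_loss_scale(0.1, 0.1))           # big box gets the smaller weight
```

With plain squared error, the large box's loss is 64 times the small box's loss for the same relative mistake. Under the square root, the gap shrinks to 8 times, because the error then grows linearly with width instead of quadratically.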

When implementing this in PyTorch, the loss expression is written with multi-dimensional tensors. The idea is as follows:

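**Idea:** Look closely at the loss function in the figure. First, the two summation signs can be swapped, since the order of summation does not matter. Second, a multi-dimensional tensor naturally represents the indices of the summed elements, so each summation sign can be replaced by one tensor dimension. The $S^2$ cells of the feature map split into $S \times S$, i.e. two dimensions. For the summands, the two factors must have the same shape to be multiplied elementwise. So the indicator $\mathbb{1}$ (I) and the error term X should both have shape [B, S, S, 1]. This contains exactly $S^2B$ elements, which is the number of terms in the double sum. Finally, reduce the elementwise product in one call: torch.sum gives the double sum directly, and torch.mean gives the same sum divided by the number of elements, i.e. a constant rescaling of the loss. The code below uses the same idea with an extra leading batch dimension, e.g. masks of shape [bs, 3, in_h, in_w].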
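The equivalence described above can be checked on a toy example. The sizes S = 3 and B = 2 and the random values below are hypothetical. The sketch computes one masked term of the loss, once with explicit nested loops over rows, columns and boxes, and once as a single tensor product followed by a reduction.

```python
import torch

torch.manual_seed(0)
S, B = 3, 2  # hypothetical toy sizes: 3x3 grid, 2 boxes per cell

# X: per-box error term; I: 0/1 indicator "box b of cell (row, col) is responsible for an object"
X = torch.rand(B, S, S, 1)
I = (torch.rand(B, S, S, 1) > 0.5).float()

# Explicit double sum: S*S cells (split into rows and cols) times B boxes
loop_sum = 0.0
for row in range(S):
    for col in range(S):
        for b in range(B):
            loop_sum += I[b, row, col, 0].item() * X[b, row, col, 0].item()

# Tensor version: elementwise product, then one reduction over all S*S*B elements
tensor_sum = torch.sum(I * X)
tensor_mean = torch.mean(I * X)  # the same sum divided by S*S*B

print(loop_sum, tensor_sum.item(), tensor_mean.item() * S * S * B)
```

All three printed values agree up to float32 rounding. In real code the indicator is usually a boolean mask used for indexing (`X[mask]`), which skips the zero terms instead of multiplying by them.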

The PyTorch code is given below:

```python
import math

import torch
import torch.nn as nn


class YOLOLoss(nn.Module):
    def __init__(self, anchors, num_classes, input_shape, cuda=False,
                 anchors_mask=((6, 7, 8), (3, 4, 5), (0, 1, 2))):
        super(YOLOLoss, self).__init__()
        # anchors: the 9 clustered anchor sizes in pixels on the 416x416 input, shape [9, 2]
        self.anchors = anchors
        self.num_classes = num_classes
        self.bbox_attrs = 5 + num_classes
        self.input_shape = input_shape
        # Anchor indices per feature map: l = 0 (13x13) uses the 3 largest anchors
        self.anchors_mask = [list(m) for m in anchors_mask]
        # Kept from the original configuration; not used in this simplified loss (nor is `cuda`)
        self.giou = True
        self.balance = [0.4, 1.0, 4]
        self.box_ratio = 0.05
        self.obj_ratio = 5 * (input_shape[0] * input_shape[1]) / (416 ** 2)
        self.ignore_threshold = 0.5

    @staticmethod
    def xywh_to_xyxy(boxes):
        '''Convert [..., 4] boxes from (cx, cy, w, h) to (x1, y1, x2, y2).'''
        xy, wh = boxes[..., 0:2], boxes[..., 2:4]
        return torch.cat((xy - wh / 2, xy + wh / 2), dim=-1)

    def get_target(self, l, targets, anchors, in_h, in_w):
        '''
        Build the ground-truth side of the loss.
        :param l: which feature map is processed, in [0, 1, 2]. DarkNet53 yields three maps
            (13x13, 26x26, 52x52); each grid cell predicts 3 boxes, one per anchor size
            (found by clustering), so 3 x 3 = 9 anchors in total.
        :param targets: list of length bs; targets[b] has shape [realBoxNum, 5]: x, y (box center),
            w, h and the class index. Coordinates are fractions of the 416x416 image produced by the
            letterbox (imagebox) preprocessing, i.e. pixel coordinates divided by 416 (see earlier posts).
        :param anchors: the 9 anchors [[10,13], [16,30], [33,23], [30,61], [62,45], [59,119],
            [116,90], [156,198], [373,326]], already divided by the stride, i.e. in feature-map
            units, so that they are comparable with targets. Shape [9, 2].
        :param in_h: feature-map height
        :param in_w: feature-map width
        :return:
        y_true: shape [bs, 3, in_h, in_w, 4+1+num_classes], 3 anchors per cell.
            [..., 0:2] center offset inside the cell, [..., 2:4] log(w/anchor_w), log(h/anchor_h).
        noobj_mask: shape [bs, 3, in_h, in_w]; 1 for background, 0 for the anchor of the cell that
            contains a ground-truth center and is responsible for that box.
        box_loss_scale: shape [bs, 3, in_h, in_w]; 2 - w*h (normalized area), so large objects get
            a smaller weight and small objects a larger one.
        obj_mask: boolean mask of the positive positions, same shape as noobj_mask.
        '''
        bs = len(targets)
        y_true = torch.zeros(bs, 3, in_h, in_w, self.bbox_attrs)
        noobj_mask = torch.ones(bs, 3, in_h, in_w)
        box_loss_scale = torch.zeros(bs, 3, in_h, in_w)
        # The 3 anchors of this feature map, shape [3, 2]
        three_anchors = anchors[self.anchors_mask[l]]
        # Anchors centered at (0, 0) as (x1, y1, x2, y2), shape [3, 4]: only shapes are compared
        anchor_boxes = self.xywh_to_xyxy(torch.cat((torch.zeros(3, 2), three_anchors), dim=-1))
        for b in range(bs):
            # Skip images without ground-truth boxes
            if len(targets[b]) == 0:
                continue
            # Scale to feature-map units on a copy, so the caller's targets are not modified
            batch_target = targets[b].detach().cpu().float().clone()
            batch_target[:, [0, 2]] *= in_w
            batch_target[:, [1, 3]] *= in_h
            # Ground-truth boxes also moved to (0, 0), as (x1, y1, x2, y2)
            gt_boxes = self.xywh_to_xyxy(
                torch.cat((torch.zeros(len(batch_target), 2), batch_target[:, 2:4]), dim=-1))
            # Best-matching anchor for each ground-truth box, shape [realBoxNum]
            best_anchor = torch.argmax(self.calculate_iou(gt_boxes, anchor_boxes), dim=-1)
            for t, k in enumerate(best_anchor.tolist()):
                x, y, w, h = batch_target[t, :4].tolist()
                # Grid cell containing the box center (clamped in case x == in_w)
                i = min(int(math.floor(x)), in_w - 1)
                j = min(int(math.floor(y)), in_h - 1)
                # Class of the ground-truth box
                c = int(batch_target[t, 4])
                # This anchor is responsible for an object, so it is not background
                noobj_mask[b, k, j, i] = 0
                # Encode the box the same way get_ignore decodes predictions
                y_true[b, k, j, i, 0] = x - i
                y_true[b, k, j, i, 1] = y - j
                y_true[b, k, j, i, 2] = math.log(w / three_anchors[k, 0].item())
                y_true[b, k, j, i, 3] = math.log(h / three_anchors[k, 1].item())
                # Ground-truth confidence and one-hot class
                y_true[b, k, j, i, 4] = 1
                y_true[b, k, j, i, 5 + c] = 1
                # The larger the object, the smaller this weight: helps small objects
                box_loss_scale[b, k, j, i] = 2 - w * h / in_w / in_h
        obj_mask = y_true[..., 4] == 1
        return y_true, noobj_mask, box_loss_scale, obj_mask

    def get_ignore(self, l, prediction, targets, scaled_anchors, in_h, in_w, noobj_mask):
        '''
        :param prediction: raw head output, one of
            [bs, 3*(5+num_classes), 13, 13], [bs, 3*(5+num_classes), 26, 26], [bs, 3*(5+num_classes), 52, 52]
        :param l, targets, scaled_anchors, in_h, in_w: as in get_target
        :param noobj_mask: output of get_target
        :return:
        noobj_mask: predictions whose best IoU with any ground-truth box exceeds ignore_threshold are
            also set to 0, i.e. ignored: they are not penalized as background.
        pred_boxes: decoded predictions (cx, cy, w, h) in feature-map units, shape [bs, 3, in_h, in_w, 4]
        '''
        bs = prediction.size(0)
        # -> [bs, 3, in_h, in_w, 5+num_classes]
        prediction = prediction.view(bs, 3, self.bbox_attrs, in_h, in_w).permute(0, 1, 3, 4, 2).contiguous()
        device = prediction.device
        # Center offsets inside the cell, and raw width/height adjustments
        x = torch.sigmoid(prediction[..., 0])
        y = torch.sigmoid(prediction[..., 1])
        w = prediction[..., 2]
        h = prediction[..., 3]
        # Top-left corner of every grid cell, broadcastable to [bs, 3, in_h, in_w]
        grid_x = torch.arange(in_w, dtype=torch.float32, device=device).view(1, 1, 1, in_w)
        grid_y = torch.arange(in_h, dtype=torch.float32, device=device).view(1, 1, in_h, 1)
        # Anchor sizes of this map, broadcastable to [bs, 3, in_h, in_w]
        anchors_l = scaled_anchors[self.anchors_mask[l]].to(device)
        anchor_w = anchors_l[:, 0].view(1, 3, 1, 1)
        anchor_h = anchors_l[:, 1].view(1, 3, 1, 1)
        # Decoded boxes, shape [bs, 3, in_h, in_w, 4]
        pred_boxes = torch.stack([x + grid_x, y + grid_y,
                                  torch.exp(w) * anchor_w, torch.exp(h) * anchor_h], dim=-1)
        for b in range(bs):
            if len(targets[b]) == 0:
                continue
            # Flatten to [3*in_h*in_w, 4]
            pred_boxes_for_ignore = pred_boxes[b].view(-1, 4)
            batch_target = targets[b][:, :4].detach().float().to(device).clone()
            batch_target[:, [0, 2]] *= in_w
            batch_target[:, [1, 3]] *= in_h
            # IoU of every ground-truth box with every prediction (corner format required)
            anch_ious = self.calculate_iou(self.xywh_to_xyxy(batch_target),
                                           self.xywh_to_xyxy(pred_boxes_for_ignore))
            # Best IoU of each prediction with any ground-truth box, reshaped to [3, in_h, in_w]
            anch_ious_max, _ = torch.max(anch_ious, dim=0)
            anch_ious_max = anch_ious_max.view(pred_boxes[b].size()[:3])
            # Predictions that already overlap a ground truth well are excluded from the no-object loss
            noobj_mask[b][anch_ious_max > self.ignore_threshold] = 0
        return noobj_mask, pred_boxes

    def calculate_iou(self, box_as, box_bs):
        '''
        One-to-many IoU between every box in box_as and every box in box_bs.
        box_as: shape [A, 4], each row (x1, y1, x2, y2): top-left and bottom-right corners
        box_bs: shape [B, 4]
        returns ious: shape [A, B]; row a holds the B IoUs of box a
        '''
        A, B = box_as.shape[0], box_bs.shape[0]
        # Each box of A repeated B times: [A, B, 4]
        box1 = box_as.unsqueeze(1).expand(A, B, 4)
        # A copies of all of B: [A, B, 4]
        box2 = box_bs.unsqueeze(0).expand(A, B, 4)
        # After the expansion, one-to-many IoU becomes elementwise one-to-one IoU.
        # Intersection rectangle; negative extents mean no overlap and are clamped to 0
        left_top = torch.max(box1[..., 0:2], box2[..., 0:2])
        bottom_right = torch.min(box1[..., 2:4], box2[..., 2:4])
        inter_wh = torch.clamp(bottom_right - left_top, min=0)
        inter_area = inter_wh[..., 0] * inter_wh[..., 1]
        # Area of each box
        area1 = (box1[..., 2] - box1[..., 0]) * (box1[..., 3] - box1[..., 1])
        area2 = (box2[..., 2] - box2[..., 0]) * (box2[..., 3] - box2[..., 1])
        # IoU = intersection / union; stays [A, B] even when A or B is 1
        return inter_area / (area1 + area2 - inter_area).clamp(min=1e-9)

    # Assemble the final loss
    def forward(self, l, input, targets=None):
        bs, _, in_h, in_w = input.shape
        device = input.device
        if targets is None:
            targets = [torch.zeros(0, 5) for _ in range(bs)]
        # -> [bs, 3, in_h, in_w, 5+num_classes]
        prediction = input.view(bs, 3, self.bbox_attrs, in_h, in_w).permute(0, 1, 3, 4, 2).contiguous()
        x = torch.sigmoid(prediction[..., 0])
        y = torch.sigmoid(prediction[..., 1])
        w = prediction[..., 2]
        h = prediction[..., 3]
        conf = torch.sigmoid(prediction[..., 4])
        pred_cls = torch.sigmoid(prediction[..., 5:])
        # Stride of this map on the input image: size(2) is the height, size(3) the width
        stride_h = self.input_shape[0] / in_h
        stride_w = self.input_shape[1] / in_w
        # Anchors in feature-map units, shape [9, 2]
        scaled_anchors = torch.tensor([(a_w / stride_w, a_h / stride_h) for a_w, a_h in self.anchors],
                                      dtype=torch.float32)
        # Ground truth and ignore mask
        y_true, noobj_mask, box_loss_scale, obj_mask = self.get_target(l, targets, scaled_anchors, in_h, in_w)
        y_true, noobj_mask = y_true.to(device), noobj_mask.to(device)
        box_loss_scale, obj_mask = box_loss_scale.to(device), obj_mask.to(device)
        noobj_mask, pred_boxes = self.get_ignore(l, input, targets, scaled_anchors, in_h, in_w, noobj_mask)
        noobj_mask = noobj_mask.bool()
        # Number of ground-truth boxes assigned to this map
        n = torch.sum(obj_mask)
        loss = torch.zeros((), device=device)
        # Box, class and positive-confidence losses only exist when there are targets
        if n != 0:
            scale = box_loss_scale[obj_mask]
            # Weighted squared error per positive box, then averaged into a scalar
            x_loss = torch.mean((x[obj_mask] - y_true[..., 0][obj_mask]) ** 2 * scale)
            y_loss = torch.mean((y[obj_mask] - y_true[..., 1][obj_mask]) ** 2 * scale)
            w_loss = torch.mean((w[obj_mask] - y_true[..., 2][obj_mask]) ** 2 * scale)
            h_loss = torch.mean((h[obj_mask] - y_true[..., 3][obj_mask]) ** 2 * scale)
            # Sum the position losses and weight them by 0.1
            position_loss = x_loss + y_loss + w_loss + h_loss
            loss = loss + position_loss * 0.1
            # Classification: independent sigmoid outputs, hence binary cross-entropy
            class_bce = nn.BCELoss(reduction='none')(pred_cls[obj_mask], y_true[..., 5:][obj_mask])
            loss = loss + torch.mean(class_bce * scale.unsqueeze(-1))
            # Confidence of positives (sigmoid already applied, so BCELoss), weighted up by 5
            conf_loss_obj = nn.BCELoss()(conf[obj_mask], y_true[..., 4][obj_mask])
            loss = loss + conf_loss_obj * 5
        # Background confidence loss, present whether or not the image contains objects
        if noobj_mask.any():
            conf_loss_noobj = nn.BCELoss()(conf[noobj_mask], y_true[..., 4][noobj_mask])
            loss = loss + conf_loss_noobj
        return loss
```
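In `get_ignore`, predictions are decoded in feature-map units as x = sigmoid(t_x) + grid_x and w = exp(t_w) · anchor_w. Training targets should live in the space the network outputs, so the matching encoding uses the center offset inside the cell for x and y, and the log-ratio to the anchor for w and h. Below is a minimal plain-Python round-trip sketch of that mapping, assuming the standard YOLOv3 parametrization. `encode_box`, `decode_box` and the example box are hypothetical. The 13×13 map and the (116, 90) anchor follow the post's 416-pixel setup.

```python
import math


def encode_box(x, y, w, h, anchor_w, anchor_h):
    """Hypothetical helper: encode a ground-truth box given in feature-map units.

    Returns the responsible grid cell (i, j) and the targets (tx, ty, tw, th):
    tx, ty are center offsets inside the cell (what sigmoid(t_x) should output),
    tw, th are log-ratios to the anchor (what the raw t_w, t_h should output).
    """
    i, j = math.floor(x), math.floor(y)
    return (i, j), (x - i, y - j, math.log(w / anchor_w), math.log(h / anchor_h))


def decode_box(i, j, sx, sy, tw, th, anchor_w, anchor_h):
    """Hypothetical helper mirroring get_ignore: x = sigmoid(t_x) + grid_x, w = exp(t_w) * anchor_w.

    sx, sy are already-sigmoided center offsets.
    """
    return (i + sx, j + sy, math.exp(tw) * anchor_w, math.exp(th) * anchor_h)


# 13x13 map of a 416x416 input -> stride 32; anchor (116, 90) px in feature-map units
in_w = in_h = 13
stride = 416 / in_w
anchor_w, anchor_h = 116 / stride, 90 / stride

# Hypothetical ground-truth box as fractions of the input (cx, cy, w, h), scaled to the map
box = (0.5 * in_w, 0.4 * in_h, 0.3 * in_w, 0.25 * in_h)

(i, j), (tx, ty, tw, th) = encode_box(*box, anchor_w, anchor_h)
decoded = decode_box(i, j, tx, ty, tw, th, anchor_w, anchor_h)

print((i, j), (tx, ty, tw, th))
print(decoded)  # matches box up to floating-point error
```

Because tx and ty always fall in [0, 1), a sigmoid output can reach them. If absolute grid coordinates were stored as targets instead, the position loss would compare quantities on different scales.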
