Pytorch_Retinaface: -inf values in loc_t cause the loss to become "nan"









Pytorch_Retinaface project repository



1. Data + cropping (data augmentation)



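The dataset contains ground-truth boxes with zero area (all four coordinates are [0, 0, 0, 0]). On top of that, the cropping code in ./data/data_augment.py (lines 67-69), which is part of the data augmentation, does not require boxes to exceed any minimum number of pixels, even though its comment says they must be larger than 16 pixels. As a result, gt boxes of 0 pixels are passed on to the later stages.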

        # make sure that the cropped image contains at least one face > 16 pixel at training image scale
        b_w_t = (boxes_t[:, 2] - boxes_t[:, 0] + 1) / w * img_dim
        b_h_t = (boxes_t[:, 3] - boxes_t[:, 1] + 1) / h * img_dim
        mask_b = np.minimum(b_w_t, b_h_t) > 0.0
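
One way to close this gap is to make the threshold match the comment and to measure box size without the `+ 1`, so that a [0, 0, 0, 0] box really has size 0. The sketch below is an assumption about how such a filter could look, not code from the repository: `filter_small_boxes` and `min_size` are hypothetical names, and the sample boxes are made up for illustration.

```python
import numpy as np


def filter_small_boxes(boxes, w, h, img_dim, min_size=16.0):
    """Return a boolean mask of boxes whose shorter side, rescaled to the
    training resolution, is larger than min_size pixels.

    boxes: array of shape [n, 4] as (x1, y1, x2, y2) in crop pixels.
    w, h:  width and height of the crop.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    # No "+ 1" here, so a degenerate [0, 0, 0, 0] box has width and height 0.
    b_w_t = (boxes[:, 2] - boxes[:, 0]) / w * img_dim
    b_h_t = (boxes[:, 3] - boxes[:, 1]) / h * img_dim
    return np.minimum(b_w_t, b_h_t) > min_size


boxes = np.array([
    [0, 0, 0, 0],          # degenerate ground truth
    [100, 100, 400, 500],  # ordinary face box
    [10, 10, 14, 14],      # tiny box, 4 pixels wide
])
mask = filter_small_boxes(boxes, w=640, h=640, img_dim=640)
kept = boxes[mask]
```

With these inputs only the second box survives. Whether 16 pixels is the right cutoff for a given dataset is a separate choice; the point is only that the mask should reject zero-size boxes before they reach the matching step.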



2. The matching problem



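The matching code is the match function in ./utils/box_utils.py. It works as follows: compute the IoU between every anchor and every gt; for each gt, find the maximum IoU over the anchors and the id of that anchor; then, for each anchor, find the maximum IoU over the gts and the id of that gt. After that, the loop at line 142 writes each gt's id into its best anchor. This is where the problem lies: the loop guarantees that if anchorA is the best anchor for gtX, anchorA ends up holding gtX's id, but the IoU computed earlier is not updated. So when several gts pick the same anchor and a later gt (one with zero area) has a very small IoU (0) with that anchor, the anchor takes the later gt's id while keeping the old IoU. The subsequent check that sets low-IoU anchors to 0 (background) therefore has no effect, and the zero-pixel, zero-area gt is passed to the loss computation as a real box, which causes the problem.
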
def match(threshold, truths, priors, variances, labels, landms, loc_t, conf_t, landm_t, idx):

    # compute the IoU between every anchor and every gt
    overlaps = jaccard(
        truths,
        point_form(priors)
    )
    # (Bipartite Matching)
    # [1,num_objects] best prior for each ground truth
    # for each gt: the maximum IoU over the anchors and the id of that anchor
    best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)

    # ignore hard gt
    valid_gt_idx = best_prior_overlap[:, 0] >= 0.2
    best_prior_idx_filter = best_prior_idx[valid_gt_idx, :]
    if best_prior_idx_filter.shape[0] <= 0:
        loc_t[idx] = 0
        conf_t[idx] = 0
        return

    # [1,num_priors] best ground truth for each prior
    best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
    best_truth_idx.squeeze_(0)
    best_truth_overlap.squeeze_(0)
    best_prior_idx.squeeze_(1)
    best_prior_idx_filter.squeeze_(1)
    best_prior_overlap.squeeze_(1)
    best_truth_overlap.index_fill_(0, best_prior_idx_filter, 2)  # ensure best prior
    # TODO refactor: index  best_prior_idx with long tensor
    # ensure every gt matches with its prior of max overlap
    # Problem: this makes anchorA (the best anchor of gtX) hold gtX's id, but the
    # IoU computed above is left unchanged. When several gts pick the same anchor
    # and a later, zero-area gt has a tiny (0) IoU with it, the anchor takes the
    # later id while keeping the old IoU, so the background check below has no
    # effect and the zero-area gt reaches the loss as a real box.
    for j in range(best_prior_idx.size(0)):     # decide which box this anchor predicts
        best_truth_idx[best_prior_idx[j]] = j
    matches = truths[best_truth_idx]            # Shape: [num_priors,4]  bbox matched to each anchor
    conf = labels[best_truth_idx]               # Shape: [num_priors]    label matched to each anchor
    conf[best_truth_overlap < threshold] = 0    # label as background: everything with overlap < 0.35 is a negative sample
    loc = encode(matches, priors, variances)

    matches_landm = landms[best_truth_idx]
    landm = encode_landm(matches_landm, priors, variances)
    loc_t[idx] = loc    # [num_priors,4] encoded offsets to learn
    conf_t[idx] = conf  # [num_priors] top class label for each prior
    landm_t[idx] = landm
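
To see why such a match shows up as -inf in loc_t, it helps to look at what the box encoding does with a zero-size box. The sketch below assumes that `encode` follows the usual SSD-style form, where the width and height targets are the log of the gt size divided by the anchor size. `encode_sketch` is a hypothetical NumPy stand-in written for illustration, not the repository's function, and the variances (0.1, 0.2) are assumed defaults.

```python
import numpy as np


def encode_sketch(matched, priors, variances=(0.1, 0.2)):
    """SSD-style box encoding (assumed form, for illustration only).

    matched: [n, 4] ground-truth boxes as (x1, y1, x2, y2).
    priors:  [n, 4] anchors as (cx, cy, w, h).
    """
    # Offset of the gt centre from the anchor centre, scaled by anchor size.
    g_cxcy = (matched[:, :2] + matched[:, 2:]) / 2 - priors[:, :2]
    g_cxcy = g_cxcy / (variances[0] * priors[:, 2:])
    # Log ratio of gt size to anchor size; a zero-size gt gives log(0).
    g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
    with np.errstate(divide="ignore"):  # silence the log(0) warning
        g_wh = np.log(g_wh) / variances[1]
    return np.concatenate([g_cxcy, g_wh], axis=1)


priors = np.array([[0.5, 0.5, 0.2, 0.2]])
good = encode_sketch(np.array([[0.4, 0.4, 0.6, 0.6]]), priors)
bad = encode_sketch(np.array([[0.0, 0.0, 0.0, 0.0]]), priors)
```

Under this assumption the centre offsets of the degenerate box stay finite, but both size targets are -inf, and any regression loss computed against them is no longer finite.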

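As an example, take two gts, gt_0 and gt_1, and two anchors, anchorA and anchorB, with the following IoU values:

IoU       gt_0   gt_1
anchorA   0      0
anchorB   0.9    0

The gt index held by each anchor is [1, 0], and the anchor index held by each gt is [1, 1].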

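After the loop at line 142 has run, the gt index held by each anchor becomes [1, 1], but anchorB still carries its original IoU of 0.9. The step that sets negative samples to 0 then goes wrong: the anchor points to the wrong gt yet is still marked as a positive sample. Likewise, if the IoU between the anchor and gt_1 is not 0 but a very small number, the anchor is again matched to the second-best gt.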
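
The overwrite can be reproduced on a two-by-two IoU matrix. The sketch below uses NumPy instead of torch and gives gt_1 a tiny IoU (0.01) with anchorB so that the argmax is unambiguous, which is the "very small number" case; because ties on all-zero entries resolve to the first index here, the starting indices differ from the ones quoted above. `reassign_guarded` and `min_iou` are hypothetical names for one possible guard, not a fix taken from the repository.

```python
import numpy as np

# Rows are gts, columns are anchors (same layout as `overlaps` in match).
overlaps = np.array([
    [0.0, 0.9],   # gt_0 vs (anchorA, anchorB)
    [0.0, 0.01],  # gt_1 (degenerate) vs (anchorA, anchorB)
])

best_prior_idx = overlaps.argmax(axis=1)   # best anchor for each gt
best_prior_overlap = overlaps.max(axis=1)  # IoU of that best anchor
best_truth_idx = overlaps.argmax(axis=0)   # best gt for each anchor
best_truth_overlap = overlaps.max(axis=0)  # IoU of that best gt


def reassign_original(best_truth_idx, best_prior_idx):
    """Same loop as in match: the last gt that claims an anchor wins."""
    out = best_truth_idx.copy()
    for j in range(best_prior_idx.shape[0]):
        out[best_prior_idx[j]] = j
    return out


def reassign_guarded(best_truth_idx, best_prior_idx, best_prior_overlap,
                     min_iou=0.2):
    """Hypothetical variant: only gts that pass the IoU filter may claim an
    anchor, and claims are applied in ascending IoU order so that the gt
    with the highest IoU is written last."""
    out = best_truth_idx.copy()
    for j in np.argsort(best_prior_overlap):
        if best_prior_overlap[j] >= min_iou:
            out[best_prior_idx[j]] = j
    return out


buggy = reassign_original(best_truth_idx, best_prior_idx)
fixed = reassign_guarded(best_truth_idx, best_prior_idx, best_prior_overlap)
```

Here both gts pick anchorB, so the original loop leaves anchorB pointing at gt_1 while its stored IoU is still 0.9, and the anchor stays positive. The guarded variant skips gt_1 because its best IoU is below `min_iou`, and anchorB keeps gt_0.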



Copyright notice: this is an original article by weixin_47343182, licensed under CC 4.0 BY-SA. Reposts must include a link to the original source and this notice.