-inf in loc_t causes the loss to become "nan" in the Pytorch_Retinaface project – 小飞侠

Post author: xfxia
Post published: September 30, 2023
Post category: Other

Pytorch_Retinaface project repository

1. Data + cropping (data augmentation)

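The dataset contains ground-truth boxes with zero area (all four coordinates are [0, 0, 0, 0]). In addition, the data augmentation part of the cropping code in `./data/data_augment.py`, `lines 67-69`, does not actually require a box to be larger than any number of pixels (even though the comment says it must be larger than 16 pixels), so gt boxes covering 0 pixels are passed on to the later stages.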

        # make sure that the cropped image contains at least one face > 16 pixel at training image scale
        b_w_t = (boxes_t[:, 2] - boxes_t[:, 0] + 1) / w * img_dim
        b_h_t = (boxes_t[:, 3] - boxes_t[:, 1] + 1) / h * img_dim
        mask_b = np.minimum(b_w_t, b_h_t) > 0.0
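
To make the effect concrete, here is a small NumPy sketch of the same size check. The box values, `w`, `h`, and `img_dim` are made-up illustration values, and `min_size = 16.0` is an assumption taken from the code comment, not from the repository. It suggests that a `[0, 0, 0, 0]` box passes the `> 0.0` test because of the `+ 1` in the width and height, while a 16-pixel threshold would drop it.

```python
import numpy as np

# Hypothetical values for illustration only.
w, h, img_dim = 640, 640, 640
boxes_t = np.array([
    [100.0, 120.0, 180.0, 220.0],  # a normal face box
    [0.0, 0.0, 0.0, 0.0],          # degenerate gt box with zero area
])

# Same size computation as in the snippet above.
b_w_t = (boxes_t[:, 2] - boxes_t[:, 0] + 1) / w * img_dim
b_h_t = (boxes_t[:, 3] - boxes_t[:, 1] + 1) / h * img_dim
min_side = np.minimum(b_w_t, b_h_t)

mask_original = min_side > 0.0   # the check as written
min_size = 16.0                  # assumed threshold, taken from the comment
mask_strict = min_side > min_size

print(mask_original)  # the degenerate box is kept
print(mask_strict)    # the degenerate box is dropped
```

Whether 16 pixels is the right threshold for a given dataset is a separate question; the point is only that `> 0.0` never rejects a box.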

2. The matching problem

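The `match` function in the matching code `./utils/box_utils.py` works as follows: compute the IoU between every anchor and every gt; for each gt, find its maximum IoU over all anchors and the id of that anchor; then, for each anchor, find its maximum IoU over all gts and the id of that gt; finally, at `line 142`, write each gt's id into the anchor that is its best match. This last step is where the problem lies. It guarantees that if anchorA is the best anchor for gtX, then anchorA is assigned gtX's id, but the IoU computed earlier is left unchanged. So when several gts share the same best anchor, and a later gt (with zero area) has a very small IoU (0) with that anchor, the anchor takes the later gt's id while its IoU is not updated. The subsequent check that sets low-overlap anchors to 0 therefore has no effect, and the gt with 0 pixels and zero area ends up being treated as a box and passed to the loss computation, which causes the problem.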

def match(threshold, truths, priors, variances, labels, landms, loc_t, conf_t, landm_t, idx):

    # compute the IoU between every anchor and every gt
    overlaps = jaccard(
        truths,
        point_form(priors)
    )
    # (Bipartite Matching)
    # [1,num_objects] best prior for each ground truth
    # for each gt: its maximum IoU over all anchors and the id of that anchor
    best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)

    # ignore hard gt 
    valid_gt_idx = best_prior_overlap[:, 0] >= 0.2
    best_prior_idx_filter = best_prior_idx[valid_gt_idx, :]
    if best_prior_idx_filter.shape[0] <= 0:
        loc_t[idx] = 0
        conf_t[idx] = 0
        return

    # [1,num_priors] best ground truth for each prior
    best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
    best_truth_idx.squeeze_(0)
    best_truth_overlap.squeeze_(0)
    best_prior_idx.squeeze_(1)
    best_prior_idx_filter.squeeze_(1)
    best_prior_overlap.squeeze_(1)
    best_truth_overlap.index_fill_(0, best_prior_idx_filter, 2)  # ensure best prior
    # TODO refactor: index  best_prior_idx with long tensor
    # ensure every gt matches with its prior of max overlap
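    # Problem: this loop guarantees that gtX's best anchor, anchorA, is assigned
    # gtX's id, but the IoU computed above is not updated. When several gts share
    # one best anchor and a later gt (zero area) has a very small IoU (0) with it,
    # the anchor takes the later gt's id while keeping the old IoU, so the
    # background check below has no effect and the zero-area gt is passed on
    # to the loss computation as a box.
    for j in range(best_prior_idx.size(0)):     # decide which box this anchor predicts
        best_truth_idx[best_prior_idx[j]] = j
    matches = truths[best_truth_idx]            # Shape: [num_priors,4] the bbox matched to each anchor
    conf = labels[best_truth_idx]               # Shape: [num_priors]   the label matched to each anchor
    conf[best_truth_overlap < threshold] = 0    # label as background: every anchor with overlap < 0.35 becomes a negative sample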
    loc = encode(matches, priors, variances)

    matches_landm = landms[best_truth_idx]
    landm = encode_landm(matches_landm, priors, variances)
    loc_t[idx] = loc    # [num_priors,4] encoded offsets to learn
    conf_t[idx] = conf  # [num_priors] top class label for each prior
    landm_t[idx] = landm
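
The `-inf` in `loc_t` would come from the `encode` call near the end of `match`. The sketch below is a NumPy re-implementation of the usual SSD-style box encoding, which is assumed here to be what `encode` in `./utils/box_utils.py` does. The name `encode_sketch`, the single prior, and the variances `(0.1, 0.2)` are illustrative assumptions. A matched box with zero width and height makes the size ratio 0, and the logarithm of 0 is `-inf`.

```python
import numpy as np

def encode_sketch(matched, priors, variances=(0.1, 0.2)):
    """SSD-style encoding (assumed). matched: [N, 4] as (x1, y1, x2, y2);
    priors: [N, 4] as (cx, cy, w, h)."""
    # Offset of the box center from the prior center, scaled by prior size.
    g_cxcy = (matched[:, :2] + matched[:, 2:]) / 2 - priors[:, :2]
    g_cxcy = g_cxcy / (variances[0] * priors[:, 2:])
    # Log of the box size relative to the prior size.
    g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
    with np.errstate(divide="ignore"):  # silence the log(0) warning
        g_wh = np.log(g_wh) / variances[1]
    return np.concatenate([g_cxcy, g_wh], axis=1)

priors = np.array([[0.5, 0.5, 0.1, 0.1]])         # one hypothetical anchor
normal_gt = np.array([[0.45, 0.45, 0.55, 0.55]])  # ordinary gt box
zero_gt = np.array([[0.0, 0.0, 0.0, 0.0]])        # zero-area gt box

loc_normal = encode_sketch(normal_gt, priors)
loc_zero = encode_sketch(zero_gt, priors)
print(loc_normal)
print(loc_zero)  # the last two entries are -inf
```

A regression loss computed against a target that contains `-inf` is no longer finite, which is consistent with the "nan" loss in the title.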

As an example, take two gts, gt_0 and gt_1, and two anchors, anchorA and anchorB, with the following IoUs:

| Item | gt_0 | gt_1 |
| --- | --- | --- |
| anchorA | 0 | 0 |
| anchorB | 0.9 | 0 |
The gt index assigned to each anchor is [1, 0], and the anchor index assigned to each gt is [1, 1].

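After the loop at line 142 has run, the gt index assigned to each anchor becomes [1, 1], but the IoU recorded for anchorB (index 1) is still 0.9. The step that sets negative samples to 0 then goes wrong: anchorB is matched to the wrong gt yet is still labeled as a positive sample. Likewise, if the IoU between the anchor and gt_1 is not exactly 0 but a very small number, the same mismatch occurs.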
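
The scenario can be reproduced with a toy NumPy version of the assignment logic. This is a sketch under stated assumptions, not the repository's code: gt_1 is given an IoU of 0.01 with anchorB (the "very small number" case) so that its best anchor is unambiguous, and the function `assign` and its flag `only_valid_gts` are hypothetical names. With the flag on, the loop skips gts that failed the `>= 0.2` filter, which is one possible way to avoid the overwrite.

```python
import numpy as np

threshold = 0.35
# Rows are gts, columns are anchors, as in jaccard(truths, priors).
# gt_1 is nearly degenerate: its best IoU (0.01) is also with anchorB.
overlaps = np.array([
    [0.0, 0.9],    # gt_0 vs anchorA, anchorB
    [0.0, 0.01],   # gt_1 vs anchorA, anchorB
])
labels = np.array([1, 1])

def assign(overlaps, only_valid_gts):
    best_prior_idx = overlaps.argmax(axis=1)      # best anchor per gt
    best_prior_overlap = overlaps.max(axis=1)
    valid_gt = best_prior_overlap >= 0.2          # same filter as in match
    best_truth_idx = overlaps.argmax(axis=0)      # best gt per anchor
    best_truth_overlap = overlaps.max(axis=0)
    best_truth_overlap[best_prior_idx[valid_gt]] = 2  # ensure best prior
    for j in range(best_prior_idx.shape[0]):
        if only_valid_gts and not valid_gt[j]:
            continue  # hypothetical fix: skip gts that failed the filter
        best_truth_idx[best_prior_idx[j]] = j
    conf = labels[best_truth_idx].copy()
    conf[best_truth_overlap < threshold] = 0      # label as background
    return best_truth_idx, conf

idx_orig, conf_orig = assign(overlaps, only_valid_gts=False)
idx_fixed, conf_fixed = assign(overlaps, only_valid_gts=True)
print(idx_orig, conf_orig)    # anchorB is positive but matched to gt_1
print(idx_fixed, conf_fixed)  # anchorB is positive and matched to gt_0
```

In both runs anchorB stays a positive sample because its overlap was forced to 2; the difference is only which gt it is encoded against. Filtering zero-area boxes before matching would be an alternative way to remove the trigger.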

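Copyright notice: this is an original article by weixin_47343182, licensed under the CC 4.0 BY-SA license. When reposting, please include the original source link and this notice.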

Original link: https://blog.csdn.net/weixin_47343182/article/details/119005977