yolov3–17–yolo-mobilenetv2–Summary of debugging errors




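Yolov-1-Workflow for training YOLOv3 on a custom dataset on the TX2 (VOC2007-TX2-GPU)

Yolov–2–A comprehensive overview of TensorRT, the deep learning performance optimization and acceleration engine

Yolov–3–Optimizing and accelerating yolov3 in TensorRT (Caffe-based)

yolov-5-Object detection: the YOLOv2 algorithm explained in detail

yolov–8–Implementing YOLO v3 in Tensorflow

yolov–9–Pruning optimization for YOLO v3

yolov–10–Evaluation metrics for object detection models: detailed explanation and concepts

yolov–11–Training log for the original YOLO v3 and computing evaluation metrics such as mAP, AP, recall, precision, and time

yolov–12–In-depth analysis of YOLOv3 principles and key points

yolov–14–Network structure of the lightweight MobilenetV2 model–concepts explained

yolov–15–The most detailed analysis of Yolov3 bounding-box prediction–improvements

yolov3–16–A detailed explanation of padding in convolution operations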


CUDA_VISIBLE_DEVICES=4 python train.py --gpu=4 &

Debugging error 1

(pytorch1.1.0-py2.7_cuda9.0) Liqing@user-ubuntu:~/hangyu/stronger-yolo-c/v3$ WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-11-01 23:38:28.294675: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-01 23:38:28.324619: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299985000 Hz
2019-11-01 23:38:28.330328: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7f09f4ce3070 executing computations on platform Host. Devices:
2019-11-01 23:38:28.330377: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from weights/mobilenet_v2_1.0_224.ckpt
Reading annotation for 1/181
Reading annotation for 101/181
Saving cached annotations to /home/Liqing/hangyu/stronger-yolo-c/v3/eval/cache/annots.pkl
/home/Liqing/hangyu/stronger-yolo-c/v3/eval/voc_eval.py:194: RuntimeWarning: invalid value encountered in divide
  rec = tp / float(npos)
Reading annotation for 1/181
Reading annotation for 101/181
Saving cached annotations to /home/Liqing/hangyu/stronger-yolo-c/v3/eval/cache/annots.pkl
Reading annotation for 1/181


[0. 0. 0. … 0. 0. 0.]

[nan nan nan … nan nan nan]

nan

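    # compute precision recall
    fp = np.cumsum(fp)
    tp = np.cumsum(tp)
    print(tp)   # added for debugging: cumulative true positives
    # npos counts the non-difficult ground-truth boxes for this class;
    # if it is 0, this division is 0/0 and yields nan
    rec = tp / float(npos)
    print(rec)  # added for debugging: recall
    # avoid divide by zero in case the first detection matches a difficult
    # ground truth
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
    ap = voc_ap(rec, prec, use_07_metric)
    print(ap)   # added for debugging: average precision
    return rec, prec, ap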
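The printed values (tp all zeros, rec all nan) are what a 0/0 division produces, which happens when npos is 0 for a class. Below is a minimal plain-Python sketch of the same precision/recall step with an explicit guard. The function name `precision_recall` and the decision to raise an error are assumptions for illustration, not part of voc_eval.py.

```python
def precision_recall(tp_flags, fp_flags, npos):
    """Cumulative precision and recall over detections sorted by confidence.

    tp_flags / fp_flags: per-detection 0/1 flags (hypothetical inputs).
    npos: number of non-difficult ground-truth boxes for the class.
    """
    if npos == 0:
        # No ground truth was counted for this class. Fail loudly instead
        # of silently dividing by zero and propagating nan into the AP.
        raise ValueError("npos is 0: no ground-truth boxes for this class")

    rec, prec = [], []
    tp_sum = fp_sum = 0
    for tp, fp in zip(tp_flags, fp_flags):
        tp_sum += tp
        fp_sum += fp
        rec.append(tp_sum / float(npos))
        # The denominator can be 0 when the first detections match a
        # "difficult" ground truth (neither tp nor fp); clamp it to 1.
        prec.append(tp_sum / float(max(tp_sum + fp_sum, 1)))
    return rec, prec
```

Raising early makes a labelling problem visible at evaluation time instead of surfacing later as a nan AP.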





WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-11-14 21:11:32.104982: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-14 21:11:32.117354: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299985000 Hz
2019-11-14 21:11:32.122381: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7f337ba03df0 executing computations on platform Host. Devices:
2019-11-14 21:11:32.122419: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from weights/mobilenet_v2_1.0_224.ckpt
2019-11-14 21:53:59.213951: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at save_restore_v2_ops.cc:134 : Resource exhausted: weights/yolo.ckpt-1.data-00000-of-00001.tempstate5297156131404480268; No space left on device
Traceback (most recent call last):
  File "train.py", line 159, in <module>
    Yolo_train().train()
  File "train.py", line 149, in train
    self.__save.save(self.__sess, os.path.join(self.__weights_dir, 'yolo.ckpt-%d' % period))
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1171, in save
    {self.saver_def.filename_tensor_name: checkpoint_file})
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: weights/yolo.ckpt-1.data-00000-of-00001.tempstate5297156131404480268; No space left on device
         [[node load_save/save_1/SaveV2 (defined at train.py:80) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
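The traceback above fails inside the checkpoint save with "No space left on device", so the disk holding the weights directory was full when training tried to write yolo.ckpt-1. One way to catch this earlier is to check free space before saving. This is a rough sketch, assuming a Unix system (os.statvfs is not available on Windows); the helper names and the size threshold are hypothetical.

```python
import os


def free_bytes(path):
    """Bytes available to the current user on the filesystem holding `path`."""
    stats = os.statvfs(path)  # Unix-only system call
    return stats.f_bavail * stats.f_frsize


def has_room_for_checkpoint(weights_dir, required_bytes):
    """True if `weights_dir` has at least `required_bytes` free."""
    return free_bytes(weights_dir) >= required_bytes


if __name__ == "__main__":
    # Hypothetical threshold: require 500 MB before writing a checkpoint.
    if not has_room_for_checkpoint(".", 500 * 1024 * 1024):
        print("Not enough disk space for a checkpoint; free some space first")
```

If the disk is filling up with checkpoints rather than other data, limiting how many are retained through the `max_to_keep` argument of `tf.train.Saver` may also help.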



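Solution:


The class names in my dataset's labels differed in letter case from the training class names (I changed them all to lowercase). With mismatched names no ground-truth box is counted for the class, so npos is 0 and rec = tp / float(npos) evaluates 0/0, which is consistent with the nan output under Debugging error 1.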
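A case mismatch like this can be detected and fixed directly in the annotation files. The sketch below assumes VOC-style XML annotations with `<object><name>` elements. The `CLASSES` list and both helper names are hypothetical; the real class list lives in the training configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list, for illustration only.
CLASSES = ["person", "car"]


def unknown_class_names(xml_text, classes=CLASSES):
    """Return the set of <object><name> values that are not in `classes`."""
    root = ET.fromstring(xml_text)
    names = {obj.findtext("name", default="").strip()
             for obj in root.iter("object")}
    return {n for n in names if n not in classes}


def lowercase_class_names(xml_text):
    """Return the annotation XML with every <object><name> lowercased."""
    root = ET.fromstring(xml_text)
    for obj in root.iter("object"):
        name = obj.find("name")
        if name is not None and name.text:
            name.text = name.text.strip().lower()
    # tostring() returns ASCII bytes by default; decode to get text.
    return ET.tostring(root).decode("ascii")
```

Running `unknown_class_names` over every annotation before training or evaluation would flag a label such as "Person" when the class list only contains "person".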



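Copyright notice: This is an original article by qq_33869371, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.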