yolov3–17–yolo-mobilenetv2: summary of debugging errors

CUDA_VISIBLE_DEVICES=4 python train.py --gpu=4 &

Debugging error 1

(pytorch1.1.0-py2.7_cuda9.0) Liqing@user-ubuntu:~/hangyu/stronger-yolo-c/v3$ WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-11-01 23:38:28.294675: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-01 23:38:28.324619: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299985000 Hz
2019-11-01 23:38:28.330328: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7f09f4ce3070 executing computations on platform Host. Devices:
2019-11-01 23:38:28.330377: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from weights/mobilenet_v2_1.0_224.ckpt
Reading annotation for 1/181
Reading annotation for 101/181
Saving cached annotations to /home/Liqing/hangyu/stronger-yolo-c/v3/eval/cache/annots.pkl
/home/Liqing/hangyu/stronger-yolo-c/v3/eval/voc_eval.py:194: RuntimeWarning: invalid value encountered in divide
  rec = tp / float(npos)
Reading annotation for 1/181
Reading annotation for 101/181
Saving cached annotations to /home/Liqing/hangyu/stronger-yolo-c/v3/eval/cache/annots.pkl
Reading annotation for 1/181

[0. 0. 0. … 0. 0. 0.]

[nan nan nan … nan nan nan]

nan

    # compute precision recall
    fp = np.cumsum(fp)
    tp = np.cumsum(tp)
    print tp   # added for debugging
    rec = tp / float(npos)
    print rec  # added for debugging
    # avoid divide by zero in case the first detection matches a difficult
    # ground truth
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
    ap = voc_ap(rec, prec, use_07_metric)
    print ap  # added for debugging
    return rec, prec, ap
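
The printed values (an all-zero `tp`, an all-nan `rec`, and a nan `ap`) are what NumPy produces when `npos` is 0, because 0 / 0.0 evaluates to nan and raises the `invalid value encountered in divide` warning. The following is a minimal sketch of that behaviour with an optional guard. The helper `safe_recall` is hypothetical and not part of `voc_eval.py`, and returning zeros for a class with no ground-truth boxes is an assumption, not the author's fix.

```python
import numpy as np


def safe_recall(tp, npos):
    """Cumulative recall; zeros when the class has no ground-truth boxes.

    Hypothetical helper for illustration, not part of voc_eval.py.
    """
    tp = np.asarray(tp, dtype=np.float64)
    if npos == 0:
        # Assumption: report recall as 0 rather than nan for an empty class.
        return np.zeros_like(tp)
    return tp / float(npos)


# Reproduce the situation from the log: no true positives, no positives at all.
tp = np.cumsum(np.zeros(5))
with np.errstate(invalid="ignore"):  # silence the RuntimeWarning for the demo
    raw = tp / float(0)              # 0 / 0.0 gives nan for every element

print(raw)
print(safe_recall(tp, 0))
```

A guard like this only hides the symptom. When `npos` is 0 for a class that should have objects, the ground-truth annotations for that class are not being matched at all, and that is the thing to investigate.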

WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-11-14 21:11:32.104982: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-14 21:11:32.117354: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299985000 Hz
2019-11-14 21:11:32.122381: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7f337ba03df0 executing computations on platform Host. Devices:
2019-11-14 21:11:32.122419: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from weights/mobilenet_v2_1.0_224.ckpt
2019-11-14 21:53:59.213951: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at save_restore_v2_ops.cc:134 : Resource exhausted: weights/yolo.ckpt-1.data-00000-of-00001.tempstate5297156131404480268; No space left on device
Traceback (most recent call last):
  File "train.py", line 159, in <module>
    Yolo_train().train()
  File "train.py", line 149, in train
    self.__save.save(self.__sess, os.path.join(self.__weights_dir, 'yolo.ckpt-%d' % period))
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1171, in save
    {self.saver_def.filename_tensor_name: checkpoint_file})
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: weights/yolo.ckpt-1.data-00000-of-00001.tempstate5297156131404480268; No space left on device
         [[node load_save/save_1/SaveV2 (defined at train.py:80) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
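
This second traceback ends in `No space left on device`: the disk holding `weights/` filled up while the checkpoint was being written. The OOM hint on the last line refers to memory, not disk. The sketch below shows one way to check free space before a checkpoint is saved. The helpers `free_bytes` and `has_room` and the 500 MB threshold are hypothetical and do not come from `train.py`. `os.statvfs` is available on Unix under both Python 2.7 and Python 3.

```python
import os


def free_bytes(path):
    """Bytes available to an unprivileged user on the filesystem holding `path`."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def has_room(path, required_bytes):
    """True if at least `required_bytes` are free where `path` lives."""
    return free_bytes(path) >= required_bytes


# Hypothetical threshold; choose it from the real size of one checkpoint.
REQUIRED = 500 * 1024 * 1024

if not has_room(".", REQUIRED):
    print("Less than 500 MB free; free up disk space before saving a checkpoint.")
```

In a training loop, a check like this would run just before the `save` call, with the weights directory as `path`. On Python 3, `shutil.disk_usage` returns the same information.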

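Solution:

The class names in my own dataset's labels did not match the training class names in letter case (fixed by converting them all to lowercase).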
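
A case mismatch of this kind would plausibly leave every detection unmatched, which fits the all-zero `tp` and nan recall in the first log. The sketch below lowercases the class names in one annotation. It assumes the labels are Pascal VOC style XML with `<object><name>` entries, which the source does not state explicitly. The sample XML and the helper `lowercase_class_names` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation with one mixed-case class name.
SAMPLE = """<annotation>
  <filename>000001.jpg</filename>
  <object><name>Car</name></object>
  <object><name>person</name></object>
</annotation>"""


def lowercase_class_names(root):
    """Lowercase every <object><name> in place; return how many were changed."""
    changed = 0
    for obj in root.findall("object"):
        name = obj.find("name")
        if name is None or not name.text:
            continue
        fixed = name.text.strip().lower()
        if fixed != name.text:
            name.text = fixed
            changed += 1
    return changed


root = ET.fromstring(SAMPLE)
changed = lowercase_class_names(root)
names = [obj.find("name").text for obj in root.findall("object")]
print(changed)
print(names)
```

For files on disk, the same function can be applied to `ET.parse(path).getroot()` and the tree written back with `tree.write(path)`. If the evaluation reuses cached annotations (the log shows `eval/cache/annots.pkl`), the cache file may need to be removed so that the corrected names are read again.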

Original link: https://blog.csdn.net/qq_33869371/article/details/102869478
