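Yolov-1-Workflow for training YOLOv3 on your own dataset on the TX2 (VOC2007-TX2-GPU)
Yolov-2-A complete guide to TensorRT, the deep learning performance optimization and acceleration engine
Yolov-3-Optimizing and accelerating yolov3 in TensorRT (Caffe-based)
yolov-11-Training log for the original YOLO v3 and computing evaluation metrics: mAP, AP, recall, precision, time, etc.
yolov-14-Network structure of the lightweight model MobilenetV2: concept walkthrough
yolov-15-The most detailed analysis of Yolov3 bounding-box prediction: improvements
yolov3-16-A detailed explanation of padding in convolution operations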
CUDA_VISIBLE_DEVICES=4 python train.py --gpu=4 &
Debugging error 1
(pytorch1.1.0-py2.7_cuda9.0) Liqing@user-ubuntu:~/hangyu/stronger-yolo-c/v3$ WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-11-01 23:38:28.294675: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-01 23:38:28.324619: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299985000 Hz
2019-11-01 23:38:28.330328: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7f09f4ce3070 executing computations on platform Host. Devices:
2019-11-01 23:38:28.330377: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from weights/mobilenet_v2_1.0_224.ckpt
Reading annotation for 1/181
Reading annotation for 101/181
Saving cached annotations to /home/Liqing/hangyu/stronger-yolo-c/v3/eval/cache/annots.pkl
/home/Liqing/hangyu/stronger-yolo-c/v3/eval/voc_eval.py:194: RuntimeWarning: invalid value encountered in divide
rec = tp / float(npos)
Reading annotation for 1/181
Reading annotation for 101/181
Saving cached annotations to /home/Liqing/hangyu/stronger-yolo-c/v3/eval/cache/annots.pkl
Reading annotation for 1/181
[0. 0. 0. … 0. 0. 0.]
[nan nan nan … nan nan nan]
nan
# compute precision and recall (excerpt from eval/voc_eval.py)
fp = np.cumsum(fp)
tp = np.cumsum(tp)
print(tp)  # added for debugging
rec = tp / float(npos)  # 0 / 0 gives nan when tp and npos are both zero
print(rec)  # added for debugging
# avoid divide by zero in case the first detection matches a difficult
# ground truth
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
ap = voc_ap(rec, prec, use_07_metric)
print(ap)  # added for debugging
return rec, prec, ap
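The printed values show `tp` as all zeros and `rec` as all `nan`, which is what `0 / 0` produces when `npos` (the number of ground-truth objects counted for the class) is zero. Below is a minimal sketch of a guard that fails loudly instead of returning `nan`. It is written in plain Python as a stand-in for the NumPy code in `voc_eval.py`; the function name `precision_recall` and the error message are illustrative assumptions, not part of the original script.

```python
import sys
from itertools import accumulate

# Machine epsilon for 64-bit floats, the same value as np.finfo(np.float64).eps
EPS = sys.float_info.epsilon


def precision_recall(tp_flags, fp_flags, npos):
    """Return (recall, precision) lists from per-detection TP/FP flags.

    Hypothetical helper: raises instead of silently producing nan when
    the class has no ground-truth objects (npos == 0).
    """
    if npos == 0:
        raise ValueError(
            "npos is 0: no ground-truth objects were counted for this class; "
            "check that label class names match the evaluated class names"
        )
    tp = list(accumulate(tp_flags))  # cumulative true positives
    fp = list(accumulate(fp_flags))  # cumulative false positives
    rec = [t / float(npos) for t in tp]
    # Same divide-by-zero protection as the original precision line
    prec = [t / max(t + f, EPS) for t, f in zip(tp, fp)]
    return rec, prec


if __name__ == "__main__":
    rec, prec = precision_recall([1, 0, 1], [0, 1, 0], npos=2)
    print(rec)
    print(prec)
```

Raising early points directly at the annotation data rather than leaving a `RuntimeWarning` buried in the TensorFlow log output.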
Debugging error 2
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-11-14 21:11:32.104982: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-14 21:11:32.117354: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299985000 Hz
2019-11-14 21:11:32.122381: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7f337ba03df0 executing computations on platform Host. Devices:
2019-11-14 21:11:32.122419: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from weights/mobilenet_v2_1.0_224.ckpt
2019-11-14 21:53:59.213951: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at save_restore_v2_ops.cc:134 : Resource exhausted: weights/yolo.ckpt-1.data-00000-of-00001.tempstate5297156131404480268; No space left on device
Traceback (most recent call last):
File "train.py", line 159, in <module>
Yolo_train().train()
File "train.py", line 149, in train
self.__save.save(self.__sess, os.path.join(self.__weights_dir, 'yolo.ckpt-%d' % period))
File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1171, in save
{self.saver_def.filename_tensor_name: checkpoint_file})
File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: weights/yolo.ckpt-1.data-00000-of-00001.tempstate5297156131404480268; No space left on device
[[node load_save/save_1/SaveV2 (defined at train.py:80) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
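Despite the OOM hint on the last line, the `ResourceExhaustedError` above comes from the disk rather than from memory: the message is `No space left on device`, raised while the saver was writing `weights/yolo.ckpt-1`. Below is a minimal sketch of checking free disk space before saving a checkpoint. It uses Python 3's `shutil.disk_usage`, which is not available in the Python 2.7 environment shown in the log; the helper name `has_free_space` and the 1 GiB threshold are assumptions for illustration only.

```python
import os
import shutil


def has_free_space(path, min_free_bytes=1 << 30):
    """Return True if the filesystem holding `path` has at least
    `min_free_bytes` free.

    Hypothetical helper; the 1 GiB default is an arbitrary threshold.
    """
    # Fall back to the current directory if `path` does not exist yet.
    target = path if os.path.exists(path) else "."
    usage = shutil.disk_usage(target)  # named tuple: total, used, free
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    weights_dir = "weights"  # directory name taken from the log above
    if has_free_space(weights_dir):
        print("Enough free space to save a checkpoint")
    else:
        print("Low disk space: free some space before saving checkpoints")
```

Calling such a check right before the saver runs would turn a crash after a long training period into an early, readable warning.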
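Solution:
The class names in my dataset's labels did not match the training class names in letter case; changing them all to lowercase resolved it.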
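A minimal sketch of how such a mismatch could be detected and fixed in VOC-style XML annotations, assuming each object's class is stored in an `<object><name>` element as in VOC2007. The function names and the sample class list are hypothetical and not taken from the original project.

```python
import xml.etree.ElementTree as ET


def find_case_mismatches(xml_text, classes):
    """Return label names that match a training class only after lowercasing."""
    known = set(classes)
    lowered = {c.lower() for c in classes}
    root = ET.fromstring(xml_text)
    names = [
        obj.findtext("name", default="").strip() for obj in root.iter("object")
    ]
    return sorted({n for n in names if n not in known and n.lower() in lowered})


def lowercase_labels(xml_text):
    """Return the annotation XML with every object name lowercased."""
    root = ET.fromstring(xml_text)
    for obj in root.iter("object"):
        name = obj.find("name")
        if name is not None and name.text:
            name.text = name.text.strip().lower()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<annotation>"
        "<object><name>Car</name></object>"
        "<object><name>person</name></object>"
        "</annotation>"
    )
    classes = ["car", "person"]  # hypothetical training class list
    print(find_case_mismatches(sample, classes))  # names differing only in case
    fixed = lowercase_labels(sample)
    print(find_case_mismatches(fixed, classes))  # mismatches left after the fix
```

If cached annotations such as `eval/cache/annots.pkl` were written before the labels were corrected, they would presumably need to be regenerated so that the evaluation reads the fixed names.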
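Copyright notice: This is an original article by qq_33869371, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.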