Out of memory during Faster R-CNN inference, 11 GB 1080 Ti #821
Description
Expected results
Bounding-box detection results drawn on the output images.
Actual results
[E net_async_base.cc:377] [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch-nightly_1547287162138/work/caffe2/core/context_gpu.cu:415: out of memory
Error from operator:
input: "gpu_0/res4_7_branch2b" input: "gpu_0/res4_7_branch2c_w" output: "gpu_0/res4_7_branch2c" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "stride" i: 1 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "dilation" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x59 (0x7f7674fb8249 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0x29f42cb (0x7f7677e542cb in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x139a395 (0x7f76767fa395 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: + 0x1516d54 (0x7f7676976d54 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x3cd (0x7f7676983eed in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x1a0 (0x7f767696ba70 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: + 0x14796a5 (0x7f76768d96a5 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f76b691e094 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: + 0x13e96a2 (0x7f76b69246a2 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x273 (0x7f76b5aa28e3 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #10: + 0xb8678 (0x7f76c8d67678 in /home/anaconda3/envs/caffe2_py2/bin/../lib/libstdc++.so.6)
frame #11: + 0x76db (0x7f76cfa306db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #12: clone + 0x3f (0x7f76cefb488f in /lib/x86_64-linux-gnu/libc.so.6)
, op Conv
[E net_async_base.cc:129] Rethrowing exception from the run of 'generalized_rcnn'
WARNING workspace.py: 204: Original python traceback for operator 157
in network generalized_rcnn
in exception above (most recent call last):
WARNING workspace.py: 209: File "tools/infer_simple.py", line 209, in
WARNING workspace.py: 209: File "tools/infer_simple.py", line 135, in main
WARNING workspace.py: 209: File "/home/Detectron/detectron/core/test_engine.py", line 329, in initialize_model_from_cfg
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/model_builder.py", line 124, in create
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/optimizer.py", line 54, in build_data_parallel_model
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/model_builder.py", line 169, in _single_gpu_build_func
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/FPN.py", line 63, in add_fpn_ResNet101_conv5_body
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/FPN.py", line 104, in add_fpn_onto_conv_body
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 112, in add_ResNet_convX_body
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 85, in add_stage
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 183, in add_residual_block
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 331, in bottleneck_transformation
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/detector.py", line 407, in ConvAffine
WARNING workspace.py: 209: File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/cnn.py", line 97, in Conv
WARNING workspace.py: 209: File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/brew.py", line 107, in scope_wrapper
WARNING workspace.py: 209: File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/helpers/conv.py", line 186, in conv
WARNING workspace.py: 209: File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/helpers/conv.py", line 139, in _ConvBase
Traceback (most recent call last):
File "tools/infer_simple.py", line 209, in
main(args)
File "tools/infer_simple.py", line 158, in main
model, im, None, timers=timers
File "/home/Detectron/detectron/core/test.py", line 63, in im_detect_all
scores, boxes, im_scale = im_detect_bbox_aug(model, im, box_proposals)
File "/home/Detectron/detectron/core/test.py", line 238, in im_detect_bbox_aug
model, im, scale, max_size, box_proposals
File "/home/Detectron/detectron/core/test.py", line 333, in im_detect_bbox_scale
model, im, target_scale, target_max_size, boxes=box_proposals
File "/home/Detectron/detectron/core/test.py", line 160, in im_detect_bbox
workspace.RunNet(model.net.Proto().name)
File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 236, in RunNet
StringifyNetName(name), num_iter, allow_fail,
File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 197, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch-nightly_1547287162138/work/caffe2/core/context_gpu.cu:415: out of memory
Error from operator:
input: "gpu_0/res4_7_branch2b" input: "gpu_0/res4_7_branch2c_w" output: "gpu_0/res4_7_branch2c" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "stride" i: 1 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "dilation" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"frame #0: c10::ThrowEnforceNotMet(char const, int, char const, std::string const&, void const) + 0x59 (0x7f7674fb8249 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0x29f42cb (0x7f7677e542cb in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x139a395 (0x7f76767fa395 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: + 0x1516d54 (0x7f7676976d54 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x3cd (0x7f7676983eed in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x1a0 (0x7f767696ba70 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: + 0x14796a5 (0x7f76768d96a5 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f76b691e094 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: + 0x13e96a2 (0x7f76b69246a2 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x273 (0x7f76b5aa28e3 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #10: + 0xb8678 (0x7f76c8d67678 in /home/anaconda3/envs/caffe2_py2/bin/../lib/libstdc++.so.6)
frame #11: + 0x76db (0x7f76cfa306db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #12: clone + 0x3f (0x7f76cefb488f in /lib/x86_64-linux-gnu/libc.so.6)
Detailed steps to reproduce
python infer_simple.py ……
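The command above is truncated; for reference, a typical infer_simple.py invocation follows the pattern from the Detectron getting-started guide (the config, weights, and image paths below are placeholders, not the exact ones used here):

```
python tools/infer_simple.py \
    --cfg configs/12_2017_baselines/e2e_faster_rcnn_R-101-FPN_2x.yaml \
    --output-dir /tmp/detectron-output \
    --image-ext jpg \
    --wts /path/to/model_final.pkl \
    /path/to/images
```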
When I run infer_simple.py with TEST.BBOX_AUG enabled, GPU memory keeps rising until inference fails with out of memory.
Does anyone else have this problem?
I tested it and found that this does not happen when TEST.BBOX_AUG is set to False. When it is set to True, the out-of-memory error appears after processing about forty or fifty images.
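A possible mitigation (untested on this exact setup) is to disable the multi-scale bbox augmentation in the config before the model is built, and/or to ask Caffe2 for the CUB caching allocator so buffers freed between differently sized inputs get reused rather than re-allocated. A minimal sketch, assuming Detectron's detectron.core.config API and Caffe2's --caffe2_cuda_memory_pool GlobalInit flag:

```python
from caffe2.python import workspace
from detectron.core.config import assert_and_infer_cfg, cfg, merge_cfg_from_file

# Use the CUB caching memory pool so allocations for varying input sizes
# are recycled instead of growing the footprint ('cub' is a standard
# Caffe2 option, but verify your build supports it).
workspace.GlobalInit(['caffe2', '--caffe2_cuda_memory_pool=cub'])

merge_cfg_from_file('configs/my_faster_rcnn_R-101-FPN.yaml')  # hypothetical path
cfg.TEST.BBOX_AUG.ENABLED = False  # skip multi-scale test-time augmentation
assert_and_infer_cfg(cache_urls=False)  # cfg becomes immutable after this call
```

With BBOX_AUG disabled, im_detect_all falls back to a single im_detect_bbox pass per image at one scale, which is consistent with the observation that memory stays stable when TEST.BBOX_AUG is False.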
System information
- Operating system: Ubuntu 18.04 LTS
- Compiler version: gcc version 7.3.0
- CUDA version: 10.0.130
- cuDNN version: 7.4.2
- NVIDIA driver version: 410.79
- GPU models (for all devices if they are not all the same): GTX 1080 Ti, 11 GB
- PYTHONPATH environment variable: anaconda3/bin/python
- python --version output: 2.7
- Anything else that seems relevant: pretrained weights model/ImageNetPretrained/MSRA/R-101.pkl
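To quantify the per-image growth, GPU memory can be polled between images. A small helper sketch (assumes nvidia-smi is on PATH; the loop in the comment is illustrative, not the exact inference script):

```python
import subprocess

def gpu_mem_used_mb(device=0):
    """Return the GPU's current memory usage in MiB via nvidia-smi."""
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=memory.used',
         '--format=csv,noheader,nounits', '-i', str(device)])
    return int(out.strip())

# Illustrative use inside the inference loop:
#   for i, im_name in enumerate(image_list):
#       cls_boxes, _, _ = infer_engine.im_detect_all(model, im, None, timers=timers)
#       print('image %d: %d MiB used' % (i, gpu_mem_used_mb()))
```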