This repository was archived by the owner on Nov 21, 2023. It is now read-only.

Out of memory for Inference of Faster R-CNN, 11G 1080 Ti #821

@PumayHui

Description


Expected results

Output bounding box results in images.

Actual results

[E net_async_base.cc:377] [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch-nightly_1547287162138/work/caffe2/core/context_gpu.cu:415: out of memory
Error from operator:
input: "gpu_0/res4_7_branch2b" input: "gpu_0/res4_7_branch2c_w" output: "gpu_0/res4_7_branch2c" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "stride" i: 1 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "dilation" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x59 (0x7f7674fb8249 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0x29f42cb (0x7f7677e542cb in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x139a395 (0x7f76767fa395 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: + 0x1516d54 (0x7f7676976d54 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x3cd (0x7f7676983eed in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x1a0 (0x7f767696ba70 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: + 0x14796a5 (0x7f76768d96a5 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f76b691e094 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: + 0x13e96a2 (0x7f76b69246a2 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x273 (0x7f76b5aa28e3 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #10: + 0xb8678 (0x7f76c8d67678 in /home/anaconda3/envs/caffe2_py2/bin/../lib/libstdc++.so.6)
frame #11: + 0x76db (0x7f76cfa306db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #12: clone + 0x3f (0x7f76cefb488f in /lib/x86_64-linux-gnu/libc.so.6)
, op Conv
[E net_async_base.cc:129] Rethrowing exception from the run of 'generalized_rcnn'
WARNING workspace.py: 204: Original python traceback for operator 157 in network generalized_rcnn in exception above (most recent call last):
WARNING workspace.py: 209: File "tools/infer_simple.py", line 209, in <module>
WARNING workspace.py: 209: File "tools/infer_simple.py", line 135, in main
WARNING workspace.py: 209: File "/home/Detectron/detectron/core/test_engine.py", line 329, in initialize_model_from_cfg
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/model_builder.py", line 124, in create
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/optimizer.py", line 54, in build_data_parallel_model
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/model_builder.py", line 169, in _single_gpu_build_func
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/FPN.py", line 63, in add_fpn_ResNet101_conv5_body
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/FPN.py", line 104, in add_fpn_onto_conv_body
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 112, in add_ResNet_convX_body
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 85, in add_stage
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 183, in add_residual_block
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/ResNet.py", line 331, in bottleneck_transformation
WARNING workspace.py: 209: File "/home/Detectron/detectron/modeling/detector.py", line 407, in ConvAffine
WARNING workspace.py: 209: File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/cnn.py", line 97, in Conv
WARNING workspace.py: 209: File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/brew.py", line 107, in scope_wrapper
WARNING workspace.py: 209: File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/helpers/conv.py", line 186, in conv
WARNING workspace.py: 209: File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/helpers/conv.py", line 139, in _ConvBase
Traceback (most recent call last):
File "tools/infer_simple.py", line 209, in <module>
main(args)
File "tools/infer_simple.py", line 158, in main
model, im, None, timers=timers
File "/home/Detectron/detectron/core/test.py", line 63, in im_detect_all
scores, boxes, im_scale = im_detect_bbox_aug(model, im, box_proposals)
File "/home/Detectron/detectron/core/test.py", line 238, in im_detect_bbox_aug
model, im, scale, max_size, box_proposals
File "/home/Detectron/detectron/core/test.py", line 333, in im_detect_bbox_scale
model, im, target_scale, target_max_size, boxes=box_proposals
File "/home/Detectron/detectron/core/test.py", line 160, in im_detect_bbox
workspace.RunNet(model.net.Proto().name)
File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 236, in RunNet
StringifyNetName(name), num_iter, allow_fail,
File "/home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 197, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch-nightly_1547287162138/work/caffe2/core/context_gpu.cu:415: out of memory
Error from operator:
input: "gpu_0/res4_7_branch2b" input: "gpu_0/res4_7_branch2c_w" output: "gpu_0/res4_7_branch2c" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "stride" i: 1 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "dilation" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x59 (0x7f7674fb8249 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0x29f42cb (0x7f7677e542cb in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x139a395 (0x7f76767fa395 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: + 0x1516d54 (0x7f7676976d54 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x3cd (0x7f7676983eed in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x1a0 (0x7f767696ba70 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: + 0x14796a5 (0x7f76768d96a5 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f76b691e094 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: + 0x13e96a2 (0x7f76b69246a2 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x273 (0x7f76b5aa28e3 in /home/anaconda3/envs/caffe2_py2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #10: + 0xb8678 (0x7f76c8d67678 in /home/anaconda3/envs/caffe2_py2/bin/../lib/libstdc++.so.6)
frame #11: + 0x76db (0x7f76cfa306db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #12: clone + 0x3f (0x7f76cefb488f in /lib/x86_64-linux-gnu/libc.so.6)
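
The operator that fails is the 1x1 convolution writing gpu_0/res4_7_branch2c. As a rough back-of-envelope sketch of how large that single output blob would be, the snippet below assumes the usual ResNet-101 layout (1024 output channels at stride 16 for res4), float32 data, and a batch of one image. None of these values appear in the log, and the function name and the input sizes in the usage note are purely illustrative.

```python
def blob_size_mib(height, width, channels=1024, stride=16, bytes_per_elem=4):
    """Estimate the size in MiB of one NCHW activation blob for a single image.

    channels=1024 and stride=16 are ASSUMED values for the res4 stage of
    ResNet-101; they are not taken from the error log.
    """
    # Ceiling division approximates the spatial size after downsampling.
    out_h = -(-height // stride)
    out_w = -(-width // stride)
    return channels * out_h * out_w * bytes_per_elem / (1024.0 * 1024.0)
```

For example, `blob_size_mib(800, 1333)` and `blob_size_mib(1200, 2000)` give the estimate for two hypothetical input sizes. This only sizes one blob; it says nothing about cuDNN workspace or how much memory the rest of the network holds.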

Detailed steps to reproduce

python infer_simple.py ……

When I run infer_simple.py with TEST.BBOX_AUG enabled, GPU memory usage keeps rising until the process finally runs out of memory.
Has anyone else run into this problem?
I tested it and found that this does not happen when TEST.BBOX_AUG is set to False. When it is set to True, the out-of-memory error appears after about forty or fifty images.
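
One way to confirm the per-image growth described above is to sample GPU memory after each image. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names are hypothetical and are not part of Detectron.

```python
import subprocess


def parse_memory_used(smi_output):
    """Turn nvidia-smi CSV output (one MiB value per line) into a list of ints."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpu_memory_used_mib():
    """Return used memory in MiB for each visible GPU (requires nvidia-smi)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]
    )
    return parse_memory_used(out.decode("utf-8"))


def is_steadily_growing(samples, min_increase_mib=1):
    """True if every sample exceeds the previous one by at least min_increase_mib."""
    return len(samples) > 1 and all(
        b - a >= min_increase_mib for a, b in zip(samples, samples[1:])
    )
```

Appending `gpu_memory_used_mib()[0]` to a list after each image in the inference loop and passing that list to `is_steadily_growing` would show whether usage climbs with every image or only jumps at certain ones.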

System information

  • Operating system: Ubuntu 18.04 LTS
  • Compiler version: gcc version 7.3.0
  • CUDA version: 10.0.130
  • cuDNN version: 7.4.2
  • NVIDIA driver version: 410.79
  • GPU models (for all devices if they are not all the same): model/ImageNetPretrained/MSRA/R-101.pkl
  • PYTHONPATH environment variable: anaconda3/bin/python
  • python --version output: 2.7
  • Anything else that seems relevant: GTX 1080Ti, 11G
