This repository was archived by the owner on Nov 21, 2023. It is now read-only.

IsType<T>() ASSERT FAILED [e2e_mask_rcnn_R-50-C4_1x.yaml] #891

@qvks

Expected results

Successful testing on coco, as per usual.

Actual results

Traceback (most recent call last):
  File "tools/test_net.py", line 116, in <module>
    check_expected_results=True,
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test_engine.py", line 128, in run_inference
    all_results = result_getter()
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test_engine.py", line 108, in result_getter
    multi_gpu=multi_gpu_testing
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test_engine.py", line 159, in test_net_on_dataset
    weights_file, dataset_name, proposal_file, output_dir, gpu_id=gpu_id
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test_engine.py", line 258, in test_net
    model, im, box_proposals, timers
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test.py", line 66, in im_detect_all
    model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, boxes=box_proposals
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test.py", line 158, in im_detect_bbox
    workspace.RunNet(model.net.Proto().name)
  File "/vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/workspace.py", line 237, in RunNet
    StringifyNetName(name), num_iter, allow_fail,
  File "/vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/workspace.py", line 198, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: IsType<T>() ASSERT FAILED at /opt/conda/conda-bld/pytorch-nightly_1551157756140/work/aten/src/ATen/core/blob.h:77, please report a bug to PyTorch. wrong type for the Blob instance. Blob contains nullptr (uninitialized) while caller expects caffe2::Tensor.
Offending Blob name: gpu_0/conv_rpn_w.
Error from operator:
input: "gpu_0/res4_5_sum" input: "gpu_0/conv_rpn_w" input: "gpu_0/conv_rpn_b" output: "gpu_0/conv_rpn" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "order" s: "NCHW" } arg { name: "pad" i: 1 } arg { name: "stride" i: 1 } arg { name: "exhaustive_search" i: 0 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN" (Get at /opt/conda/conda-bld/pytorch-nightly_1551157756140/work/aten/src/ATen/core/blob.h:77)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f68638f59d5 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: caffe2::Tensor const& caffe2::Blob::Get<caffe2::Tensor>() const + 0xf0 (0x7f686400f750 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/caffe2_pybind11_state_gpu.cpython-36m-x86_64-linux-gnu.so)
frame #2: caffe2::Tensor const& caffe2::OperatorBase::Input<caffe2::Tensor>(int, c10::DeviceType) + 0x301 (0x7f686408bdf1 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/caffe2_pybind11_state_gpu.cpython-36m-x86_64-linux-gnu.so)
frame #3: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x38 (0x7f682545f428 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: caffe2::CudnnConvOp::RunOnDevice() + 0x198 (0x7f682544dd08 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: <unknown function> + 0x13970c5 (0x7f68253ba0c5 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f684c460964 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: <unknown function> + 0x16b5549 (0x7f684c467549 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: c10::ThreadPool::main_loop(unsigned long) + 0x273 (0x7f684b47b773 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: <unknown function> + 0xafc5c (0x7f6869201c5c in /vol/bitbucket/rm2815/anaconda3/bin/../lib/libstdc++.so.6)
frame #10: <unknown function> + 0x76db (0x7f6877b6d6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #11: clone + 0x3f (0x7f687789688f in /lib/x86_64-linux-gnu/libc.so.6)
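
Since the error reports `gpu_0/conv_rpn_w` as uninitialized, one thing that may be worth checking is whether that blob was saved in the weights file at all. Below is a minimal sketch of such a check, not part of Detectron itself. It assumes the `.pkl` file is a pickled dict that keeps its parameters under a top-level `blobs` key with unscoped names (without the `gpu_0/` prefix); the helper names `load_blob_names` and `missing_blobs` are hypothetical.

```python
import pickle


def load_blob_names(weights_path):
    """Return the set of blob names stored in a pickled weights file."""
    with open(weights_path, "rb") as f:
        data = pickle.load(f)
    # Assumption: parameters sit under a top-level "blobs" key. If that key
    # is absent, treat the whole dict as the blob table instead.
    blobs = data["blobs"] if isinstance(data, dict) and "blobs" in data else data
    return set(blobs.keys())


def missing_blobs(weights_path, required, scope="gpu_0/"):
    """List the required blobs that are not present in the weights file.

    Names are matched both as given and with the scope prefix stripped,
    because saved names are assumed to be unscoped.
    """
    stored = load_blob_names(weights_path)
    missing = []
    for name in required:
        unscoped = name[len(scope):] if name.startswith(scope) else name
        if name not in stored and unscoped not in stored:
            missing.append(name)
    return missing
```

For this report the call would be something like `missing_blobs("tmp/model_final.pkl", ["gpu_0/conv_rpn_w", "gpu_0/conv_rpn_b"])`. An empty list would suggest the blobs are in the file and the problem lies in how they are loaded; a non-empty list would point at the saved weights.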

Detailed steps to reproduce

In the Detectron dir:

python tools/test_net.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-C4_1x.yaml \
    TEST.WEIGHTS tmp/model_final.pkl \
    NUM_GPUS 1 \
    OUTPUT_DIR tmp/test

After successfully training via:

python tools/train_net.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-C4_1x.yaml \
    OUTPUT_DIR tmp/

System information

  • Operating system: Ubuntu 18.04.2 LTS
  • Compiler version: GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
  • CUDA version: 10.1
  • cuDNN version: 7.5.0.56
  • NVIDIA driver version: 430.09
  • GPU models (for all devices if they are not all the same): TITAN Xp
  • PYTHONPATH environment variable: n/a
  • python --version output: Python 3.6.5 :: Anaconda, Inc.
  • Anything else that seems relevant: PyTorch version: 1.1.0

Training works. Inference with the model below also works:

python tools/test_net.py \
    --cfg configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml \
    TEST.WEIGHTS /tmp/detectron-output/train/coco_2014_train/generalized_rcnn/model_final.pkl \
    NUM_GPUS 1
