IsType<T>() ASSERT FAILED [e2e_mask_rcnn_R-50-C4_1x.yaml] #891
Description
Expected results
Testing on COCO completes successfully, as usual.
Actual results
Traceback (most recent call last):
  File "tools/test_net.py", line 116, in <module>
    check_expected_results=True,
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test_engine.py", line 128, in run_inference
    all_results = result_getter()
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test_engine.py", line 108, in result_getter
    multi_gpu=multi_gpu_testing
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test_engine.py", line 159, in test_net_on_dataset
    weights_file, dataset_name, proposal_file, output_dir, gpu_id=gpu_id
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test_engine.py", line 258, in test_net
    model, im, box_proposals, timers
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test.py", line 66, in im_detect_all
    model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, boxes=box_proposals
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test.py", line 158, in im_detect_bbox
    workspace.RunNet(model.net.Proto().name)
  File "/vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/workspace.py", line 237, in RunNet
    StringifyNetName(name), num_iter, allow_fail,
  File "/vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/workspace.py", line 198, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: IsType<T>() ASSERT FAILED at /opt/conda/conda-bld/pytorch-nightly_1551157756140/work/aten/src/ATen/core/blob.h:77, please report a bug to PyTorch. wrong type for the Blob instance. Blob contains nullptr (uninitialized) while caller expects caffe2::Tensor.
Offending Blob name: gpu_0/conv_rpn_w.
Error from operator:
input: "gpu_0/res4_5_sum" input: "gpu_0/conv_rpn_w" input: "gpu_0/conv_rpn_b" output: "gpu_0/conv_rpn" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "order" s: "NCHW" } arg { name: "pad" i: 1 } arg { name: "stride" i: 1 } arg { name: "exhaustive_search" i: 0 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN" (Get at /opt/conda/conda-bld/pytorch-nightly_1551157756140/work/aten/src/ATen/core/blob.h:77)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f68638f59d5 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: caffe2::Tensor const& caffe2::Blob::Get<caffe2::Tensor>() const + 0xf0 (0x7f686400f750 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/caffe2_pybind11_state_gpu.cpython-36m-x86_64-linux-gnu.so)
frame #2: caffe2::Tensor const& caffe2::OperatorBase::Input<caffe2::Tensor>(int, c10::DeviceType) + 0x301 (0x7f686408bdf1 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/caffe2_pybind11_state_gpu.cpython-36m-x86_64-linux-gnu.so)
frame #3: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x38 (0x7f682545f428 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: caffe2::CudnnConvOp::RunOnDevice() + 0x198 (0x7f682544dd08 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: <unknown function> + 0x13970c5 (0x7f68253ba0c5 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f684c460964 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: <unknown function> + 0x16b5549 (0x7f684c467549 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: c10::ThreadPool::main_loop(unsigned long) + 0x273 (0x7f684b47b773 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: <unknown function> + 0xafc5c (0x7f6869201c5c in /vol/bitbucket/rm2815/anaconda3/bin/../lib/libstdc++.so.6)
frame #10: <unknown function> + 0x76db (0x7f6877b6d6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #11: clone + 0x3f (0x7f687789688f in /lib/x86_64-linux-gnu/libc.so.6)
Detailed steps to reproduce
In the Detectron directory:
python tools/test_net.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-C4_1x.yaml \
    TEST.WEIGHTS tmp/model_final.pkl \
    NUM_GPUS 1 \
    OUTPUT_DIR tmp/test
This was run after training completed successfully via:
python tools/train_net.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-C4_1x.yaml \
    OUTPUT_DIR tmp/
System information
- Operating system: Ubuntu 18.04.2 LTS
- Compiler version: GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
- CUDA version: 10.1
- cuDNN version: 7.5.0.56
- NVIDIA driver version: 430.09
- GPU models (for all devices if they are not all the same): TITAN Xp
- PYTHONPATH environment variable: n/a
- python --version output: Python 3.6.5 :: Anaconda, Inc.
- Anything else that seems relevant: PyTorch version: 1.1.0
Training works. Inference with the following config and weights also works:
python tools/test_net.py \
    --cfg configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml \
    TEST.WEIGHTS /tmp/detectron-output/train/coco_2014_train/generalized_rcnn/model_final.pkl \
    NUM_GPUS 1
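The error above says that gpu_0/conv_rpn_w is still uninitialized when the Conv operator runs. One way to narrow this down is to check whether the weights file passed as TEST.WEIGHTS contains that blob at all. The following is a minimal sketch, not part of Detectron: the helper names load_blob_names and missing_blobs are hypothetical. It also assumes that the .pkl file is a pickled dict holding the weights under a top-level "blobs" key (falling back to the top-level dict otherwise), and that blob names are stored without the gpu_0/ scope.

```python
import pickle


def load_blob_names(weights_path):
    """Return the set of blob names stored in a pickled weights file."""
    with open(weights_path, "rb") as f:
        # encoding="latin1" lets Python 3 read pickles written by Python 2.
        data = pickle.load(f, encoding="latin1")
    # Assumption: weights live under a top-level "blobs" key; if that key is
    # absent, treat the top-level dict itself as the blob mapping.
    blobs = data["blobs"] if "blobs" in data else data
    return set(blobs.keys())


def missing_blobs(weights_path, required):
    """Return the required blob names that are absent from the file.

    Names are compared without their device scope, so "gpu_0/conv_rpn_w"
    is looked up as "conv_rpn_w" (an assumption about how names are saved).
    """
    names = load_blob_names(weights_path)
    unscoped = [name.split("/", 1)[-1] for name in required]
    return [name for name in unscoped if name not in names]
```

For example, missing_blobs("tmp/model_final.pkl", ["gpu_0/conv_rpn_w", "gpu_0/conv_rpn_b"]) returning a non-empty list would suggest that tmp/model_final.pkl is not the file written by the training run. Note that the working command above loads its weights from a train/coco_2014_train/generalized_rcnn/ subdirectory. An empty list would instead point to the loading step rather than the file contents.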