Open
Description
When I run the unit test in Docker (CPU version), it reports an error:
root@85a655cc87d1:/app/codalab# python run_local_test.py
2020-05-07 06:36:42 INFO run_local_test.py: ##################################################
2020-05-07 06:36:42 INFO run_local_test.py: Begin running local test using
2020-05-07 06:36:42 INFO run_local_test.py: code_dir = AutoDL_sample_code_submission
2020-05-07 06:36:42 INFO run_local_test.py: dataset_dir = miniciao
2020-05-07 06:36:42 INFO run_local_test.py: ##################################################
2020-05-07 06:36:42 INFO run_local_test.py: Cleaning existing output directory of last run: /app/codalab/AutoDL_sample_result_submission
2020-05-07 06:36:42 INFO run_local_test.py: Cleaning existing output directory of last run: /app/codalab/AutoDL_scoring_output
python /app/codalab/AutoDL_ingestion_program/ingestion.py --dataset_dir=/app/codalab/AutoDL_sample_data/miniciao --code_dir=/app/codalab/AutoDL_sample_code_submission --time_budget=1200.0
python /app/codalab/AutoDL_scoring_program/score.py --solution_dir=/app/codalab/AutoDL_sample_data/miniciao
2020-05-07 06:36:43,653 INFO score.py: ===== Start scoring program. Version: v20191204 =====
2020-05-07 06:36:44,673 INFO ingestion.py: ************************************************
2020-05-07 06:36:44,673 INFO ingestion.py: ******** Processing dataset Miniciao ********
2020-05-07 06:36:44,673 INFO ingestion.py: ************************************************
2020-05-07 06:36:44,673 INFO ingestion.py: Reading training set and test set...
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/tensor_array_ops.py:162: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2020-05-07 06:36:44,928 INFO ingestion.py: Creating model...this process should not exceed 20min.
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/app/codalab/AutoDL_sample_code_submission/Auto_Image/model.py", line 19, in <lambda>
threading.Thread(target=lambda: torch.cuda.synchronize()),
File "/usr/local/lib/python3.5/dist-packages/torch/cuda/__init__.py", line 398, in synchronize
_lazy_init()
File "/usr/local/lib/python3.5/dist-packages/torch/cuda/__init__.py", line 192, in _lazy_init
_check_driver()
File "/usr/local/lib/python3.5/dist-packages/torch/cuda/__init__.py", line 102, in _check_driver
http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
2020-05-07 06:36:46,014 INFO ingestion.py: Initialization success, time spent so far 1.0854098796844482 sec
2020-05-07 06:36:46,014 ERROR ingestion.py: Failed to initializing model.
2020-05-07 06:36:46,015 ERROR ingestion.py: Encountered exception:
Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Traceback (most recent call last):
File "/app/codalab/AutoDL_ingestion_program/ingestion.py", line 339, in <module>
M = Model(D_train.get_metadata()) # The metadata of D_train and D_test only differ in sample_count
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/app/codalab/AutoDL_ingestion_program/ingestion.py", line 208, in time_limit
yield
File "/app/codalab/AutoDL_ingestion_program/ingestion.py", line 339, in <module>
M = Model(D_train.get_metadata()) # The metadata of D_train and D_test only differ in sample_count
File "/app/codalab/AutoDL_sample_code_submission/model.py", line 54, in __init__
self.domain_model = DomainModel(self.metadata)
File "/app/codalab/AutoDL_sample_code_submission/Auto_Image/model.py", line 42, in __init__
super(Model, self).__init__(metadata)
File "/app/codalab/AutoDL_sample_code_submission/Auto_Image/skeleton/projects/logic.py", line 88, in __init__
self.build()
File "/app/codalab/AutoDL_sample_code_submission/Auto_Image/model.py", line 66, in build
self.model_9.init(model_dir=model_path, gain=1.0)
File "/app/codalab/AutoDL_sample_code_submission/Auto_Image/architectures/resnet.py", line 244, in init
model_dir=self.model_dir)
File "/usr/local/lib/python3.5/dist-packages/torch/hub.py", line 499, in load_state_dict_from_url
return torch.load(cached_file, map_location=map_location)
File "/usr/local/lib/python3.5/dist-packages/torch/serialization.py", line 426, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.5/dist-packages/torch/serialization.py", line 613, in _load
result = unpickler.load()
File "/usr/local/lib/python3.5/dist-packages/torch/serialization.py", line 576, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "/usr/local/lib/python3.5/dist-packages/torch/serialization.py", line 155, in default_restore_location
result = fn(storage, location)
File "/usr/local/lib/python3.5/dist-packages/torch/serialization.py", line 131, in _cuda_deserialize
device = validate_cuda_device(location)
File "/usr/local/lib/python3.5/dist-packages/torch/serialization.py", line 115, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
2020-05-07 06:36:46,035 INFO ingestion.py: ===== Start core part of ingestion program. Version: v20191204 =====
2020-05-07 06:36:46,039 INFO ingestion.py: Failed to run ingestion.
2020-05-07 06:36:46,039 ERROR ingestion.py: Encountered exception:
name 'M' is not defined
Traceback (most recent call last):
File "/app/codalab/AutoDL_ingestion_program/ingestion.py", line 358, in <module>
if not hasattr(M, attr):
NameError: name 'M' is not defined
2020-05-07 06:36:46,044 INFO ingestion.py: Wrote the file end.txt marking the end of ingestion.
2020-05-07 06:36:46,045 INFO ingestion.py: [-] Done, but encountered some errors during ingestion.
2020-05-07 06:36:46,045 INFO ingestion.py: [-] Overall time spent 0.01 sec
2020-05-07 06:36:46,079 INFO ingestion.py: [Ingestion terminated]
At first I thought it was a network issue while downloading the training data, but I tried running the test through a proxy, and I also downloaded the r9-xxx.pth.tar file manually. Even after building on another machine (with Docker, of course), I still had no luck.
It's weird that the log reports:
Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False.
even though I'm using the CPU version of the Docker image.
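For what it's worth, the workaround the error message hints at might look roughly like the sketch below. This is only a guess on my side, not the actual code of the sample submission: `pick_map_location`, `load_checkpoint` and `synchronize_if_cuda` are hypothetical helper names I made up for illustration. The only library calls used are `torch.load(..., map_location=...)`, `torch.device`, `torch.cuda.is_available()` and `torch.cuda.synchronize()`.

```python
def pick_map_location(cuda_available):
    """Return the device string that checkpoint storages should be mapped to."""
    return "cuda" if cuda_available else "cpu"


def load_checkpoint(path):
    """Load a checkpoint, remapping CUDA-saved tensors to the CPU when no GPU is usable.

    Hypothetical helper: `path` is assumed to point at an already downloaded
    checkpoint file.
    """
    import torch  # imported lazily so pick_map_location works without torch installed

    device = pick_map_location(torch.cuda.is_available())
    # map_location is the torch.load argument named in the RuntimeError text.
    return torch.load(path, map_location=torch.device(device))


def synchronize_if_cuda():
    """Call torch.cuda.synchronize() only when a CUDA device is actually available."""
    import torch

    if torch.cuda.is_available():
        torch.cuda.synchronize()
```

Judging from the traceback, the weights are loaded through `torch.hub.load_state_dict_from_url`, which forwards a `map_location` argument to `torch.load`, so passing the CPU device there may be enough. The first exception (from `torch.cuda.synchronize()` in the background thread) looks like a separate problem that a guard such as `synchronize_if_cuda` would presumably avoid.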