Load checkpoints within a DeterministicEngine #2197

@H4dr1en

🐛 Bug description

When a checkpoint saved during training with an Engine is loaded into a DeterministicEngine, the following error is raised:

src/training/engine.py:44: in valid
    self.trainer.valid(self.my_task, checkpoint_path)
src/training/trainers/single_trainer.py:65: in valid
    valid_engine.run(valid_loader)
../../miniconda3/envs/training-py36/lib/python3.6/site-packages/ignite/engine/engine.py:701: in run
    return self._internal_run()
../../miniconda3/envs/training-py36/lib/python3.6/site-packages/ignite/engine/engine.py:774: in _internal_run
    self._handle_exception(e)
../../miniconda3/envs/training-py36/lib/python3.6/site-packages/ignite/engine/engine.py:469: in _handle_exception
    raise e
../../miniconda3/envs/training-py36/lib/python3.6/site-packages/ignite/engine/engine.py:751: in _internal_run
    self._fire_event(Events.EPOCH_COMPLETED)
../../miniconda3/envs/training-py36/lib/python3.6/site-packages/ignite/engine/engine.py:424: in _fire_event
    func(*first, *(event_args + others), **kwargs)
../../miniconda3/envs/training-py36/lib/python3.6/site-packages/ignite/handlers/checkpoint.py:373: in __call__
    checkpoint = self._setup_checkpoint()
../../miniconda3/envs/training-py36/lib/python3.6/site-packages/ignite/handlers/checkpoint.py:437: in _setup_checkpoint
    checkpoint[k] = obj.state_dict()
../../miniconda3/envs/training-py36/lib/python3.6/site-packages/ignite/engine/deterministic.py:186: in state_dict
    state_dict = super(DeterministicEngine, self).state_dict()
../../miniconda3/envs/training-py36/lib/python3.6/site-packages/ignite/engine/engine.py:504: in state_dict
    return OrderedDict([(k, getattr(self.state, k)) for k in keys])

.0 = <tuple_iterator object at 0x166c104a8>

>   return OrderedDict([(k, getattr(self.state, k)) for k in keys])
E   AttributeError: 'State' object has no attribute 'rng_states'
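The failing line builds the state dict by calling `getattr(self.state, k)` for every registered key, and the traceback shows that `rng_states` is among those keys when `state_dict()` is called from `DeterministicEngine`. Any registered key that was never set on the state therefore raises. Below is a minimal, standard-library-only sketch of that pattern. `ToyEngine` and `State` are hypothetical stand-ins for illustration, not ignite classes.

```python
from collections import OrderedDict


class State:
    """Hypothetical stand-in for the engine state: a plain attribute bag."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


class ToyEngine:
    """Hypothetical engine that serializes a fixed set of state attributes."""

    def __init__(self, user_keys=()):
        self.state = State(epoch=0, iteration=0, max_epochs=1)
        # Required keys plus any extra keys registered by a subclass.
        self.keys = ("epoch", "iteration", "max_epochs") + tuple(user_keys)

    def state_dict(self):
        # Same pattern as the failing line: every key must exist on self.state.
        return OrderedDict([(k, getattr(self.state, k)) for k in self.keys])


# A DeterministicEngine-like setup registers an extra "rng_states" key,
# but nothing has set state.rng_states yet.
engine = ToyEngine(user_keys=("rng_states",))
try:
    engine.state_dict()
except AttributeError as err:
    print(err)  # 'State' object has no attribute 'rng_states'
```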

Expected behavior:
Since the checkpoint has no rng_states, the DeterministicEngine should emit a warning and ignore the missing rng_states, recreating them on the fly.
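A minimal sketch of the expected fallback, using only the standard library. `ToyDeterministicEngine` and `capture_rng_states` are hypothetical names, and only the state of Python's `random` module is captured; this illustrates the proposed behavior and is not ignite's implementation.

```python
import random
import warnings
from collections import OrderedDict


class State:
    """Hypothetical stand-in for the engine state (a plain attribute bag)."""


def capture_rng_states():
    # Hypothetical helper: only Python's `random` module is captured here;
    # a real engine would also need torch (and possibly numpy) RNG states.
    return {"random": random.getstate()}


class ToyDeterministicEngine:
    """Hypothetical engine illustrating the fallback proposed in this issue."""

    def __init__(self):
        self.state = State()
        self.state.epoch = 0
        self.state.iteration = 0

    def load_state_dict(self, state_dict):
        self.state.epoch = state_dict["epoch"]
        self.state.iteration = state_dict["iteration"]
        if "rng_states" in state_dict:
            self.state.rng_states = state_dict["rng_states"]
            random.setstate(self.state.rng_states["random"])
        else:
            # Checkpoint came from a plain Engine: warn and recreate RNG states.
            warnings.warn("Checkpoint has no 'rng_states'; recreating them on the fly.")
            self.state.rng_states = capture_rng_states()

    def state_dict(self):
        # Capture the current RNG states instead of reading a possibly missing attribute.
        sd = OrderedDict([(k, getattr(self.state, k)) for k in ("epoch", "iteration")])
        sd["rng_states"] = capture_rng_states()
        return sd


# Checkpoint saved by a plain Engine: no "rng_states" key.
old_checkpoint = {"epoch": 3, "iteration": 300}
engine = ToyDeterministicEngine()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    engine.load_state_dict(old_checkpoint)
print(len(caught))  # 1
print(engine.state_dict()["epoch"])  # 3
```

Capturing the RNG states inside `state_dict()` means saving never depends on a previously loaded value, so a checkpoint written by a plain Engine can be resumed and re-saved without the AttributeError.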

How to reproduce:

  1. Train a PyTorch model with an Engine
  2. Save a checkpoint
  3. Resume the training using a DeterministicEngine

Environment

  • PyTorch Version (e.g., 1.4): 1.7.1
  • Ignite Version (e.g., 0.3.0): 0.4.6
  • OS (e.g., Linux): macOS
  • How you installed Ignite (conda, pip, source): pip
  • Python version: 3.6.10
