Having trouble changing the save_steps parameter for a resumed job.

I'm running into the following issue. Let's say I start a job with:

```
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_full.yaml
```

In `config_full.yaml` I have `save_steps: 1000`. At some point I realize that saving every 1000 steps is too frequent, so I stop the job, edit `config_full.yaml` to set `save_steps: 10000`, and restart the job. Resuming from the checkpoint works as expected, but checkpoints are still saved every 1000 steps (the original value). What am I doing wrong?
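One thing that may be worth checking, offered as an unconfirmed guess rather than a diagnosis: the Hugging Face Trainer writes a `trainer_state.json` file into each checkpoint directory, and in some `transformers` versions that file appears to record `save_steps`. If the resumed run takes the interval from that file instead of the edited YAML, the old value would win. The sketch below assumes the key is named `save_steps`; the helper names `recorded_save_steps` and `override_save_steps` are hypothetical and not part of any library.

```python
import json
from pathlib import Path
from typing import Optional

# File name the Hugging Face Trainer writes into each checkpoint directory.
STATE_FILE = "trainer_state.json"


def recorded_save_steps(checkpoint_dir: str) -> Optional[int]:
    """Return the save_steps value stored in the checkpoint, or None if absent."""
    state_path = Path(checkpoint_dir) / STATE_FILE
    state = json.loads(state_path.read_text())
    # Assumption: the key is called "save_steps"; some versions may not store it.
    return state.get("save_steps")


def override_save_steps(checkpoint_dir: str, new_value: int) -> None:
    """Rewrite save_steps in the checkpoint's state file, leaving other keys intact.

    Back up the file first; this edits the checkpoint in place.
    """
    state_path = Path(checkpoint_dir) / STATE_FILE
    state = json.loads(state_path.read_text())
    state["save_steps"] = new_value
    state_path.write_text(json.dumps(state, indent=2))
```

If `recorded_save_steps` returns 1000 for the checkpoint being resumed from, that would be consistent with the stale interval coming from the checkpoint rather than from `config_full.yaml`. Whether rewriting the value is enough to change the behavior depends on the installed `transformers` version, so treat the override as an experiment.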

Having troubles changing save_steps parameter for a resumed job. #195
