docs/source/deep_dives/checkpointer.rst (7 additions, 7 deletions)

@@ -293,8 +293,8 @@ For more details about each file, please check the End-to-End tutorial mentioned
 │ ├── adapter_model.pt
 │ ├── adapter_model.safetensors
 │ ├── config.json
-│ ├── ft-model-00001-of-00002.safetensors
-│ ├── ft-model-00002-of-00002.safetensors
+│ ├── model-00001-of-00002.safetensors
+│ ├── model-00002-of-00002.safetensors
 │ ├── generation_config.json
 │ ├── LICENSE.txt
 │ ├── model.safetensors.index.json
@@ -313,8 +313,8 @@ For more details about each file, please check the End-to-End tutorial mentioned
 │ ├── adapter_model.pt
 │ ├── adapter_model.safetensors
 │ ├── config.json
-│ ├── ft-model-00001-of-00002.safetensors
-│ ├── ft-model-00002-of-00002.safetensors
+│ ├── model-00001-of-00002.safetensors
+│ ├── model-00002-of-00002.safetensors
 │ ├── generation_config.json
 │ ├── LICENSE.txt
 │ ├── model.safetensors.index.json
@@ -394,7 +394,7 @@ you'll need to **update** the following fields in your configs:

 **resume_from_checkpoint**: Set it to True;

-**checkpoint_files**: change the path to ``epoch_{YOUR_EPOCH}/ft-model={}-of-{}.safetensors``;
+**checkpoint_files**: change the path to ``epoch_{YOUR_EPOCH}/model-{}-of-{}.safetensors``;

 Notice that we do **not** change our checkpoint_dir or output_dir. Since we are resuming from checkpoint, we know
 to look for it in the output_dir.
@@ -405,8 +405,8 @@ to look for it in the output_dir.
     # checkpoint files. Note that you will need to update this
     # section of the config with the intermediate checkpoint files
     checkpoint_files: [
-        epoch_{YOUR_EPOCH}/ft-model-00001-of-00002.safetensors,
-        epoch_{YOUR_EPOCH}/ft-model-00001-of-00002.safetensors,
+        epoch_{YOUR_EPOCH}/model-00001-of-00002.safetensors,
+        epoch_{YOUR_EPOCH}/model-00002-of-00002.safetensors,
     ]

     # set to True if restarting training
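
As a quick sanity check before resuming, you can confirm that the renamed shards actually exist under the epoch folder. A minimal sketch, assuming a placeholder ``output_dir`` and epoch number:

```python
from pathlib import Path

# Assumed placeholders: torchtune's output_dir and the epoch to resume from.
output_dir = Path("/tmp/torchtune/llama3_2_1B/lora_single_device")
epoch = 0

# The shards follow the model-{}-of-{}.safetensors naming from this change.
shards = sorted((output_dir / f"epoch_{epoch}").glob("model-*-of-*.safetensors"))
assert shards, f"no model-*-of-*.safetensors shards found under epoch_{epoch}"
for shard in shards:
    print(shard.name)
```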
docs/source/tutorials/e2e_flow.rst (7 additions, 7 deletions)
@@ -142,8 +142,8 @@ There are 3 types of folders:
 │ ├── adapter_model.pt
 │ ├── adapter_model.safetensors
 │ ├── config.json
-│ ├── ft-model-00001-of-00002.safetensors
-│ ├── ft-model-00002-of-00002.safetensors
+│ ├── model-00001-of-00002.safetensors
+│ ├── model-00002-of-00002.safetensors
 │ ├── generation_config.json
 │ ├── LICENSE.txt
 │ ├── model.safetensors.index.json
@@ -168,7 +168,7 @@ Let's understand the files:
 Let's understand the files:

 - ``adapter_model.safetensors`` and ``adapter_model.pt`` are your LoRA trained adapter weights. We save a duplicated .pt version to facilitate resuming from checkpoint.
-- ``ft-model-{}-of-{}.safetensors`` are your trained full model weights (not adapters). When LoRA finetuning, these are only present if we set ``save_adapter_weights_only=False``. In that case, we merge the merged base model with trained adapters, making inference easier.
+- ``model-{}-of-{}.safetensors`` are your trained full model weights (not adapters). When LoRA finetuning, these are only present if we set ``save_adapter_weights_only=False``. In that case, we merge the base model with the trained adapters, making inference easier.
 - ``adapter_config.json`` is used by Hugging Face PEFT when loading an adapter (more on that later);
 - ``model.safetensors.index.json`` is used by Hugging Face ``from_pretrained()`` when loading the model weights (more on that later);
 - All other files were originally in the checkpoint_dir. They are automatically copied during training. Files over 100MiB and ending in .safetensors, .pth, .pt, or .bin are ignored, keeping the output lightweight.
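
To make these roles concrete, here is a sketch of how the files are typically consumed downstream, assuming Hugging Face ``transformers`` and ``peft`` are installed; the checkpoint path and base model name are illustrative placeholders:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed placeholder: an epoch folder written by torchtune.
ckpt_dir = "/tmp/torchtune/llama3_2_1B/lora_single_device/epoch_0"

# Option 1: load the merged weights directly. from_pretrained() reads
# model.safetensors.index.json to resolve the sharded
# model-{}-of-{}.safetensors files.
model = AutoModelForCausalLM.from_pretrained(ckpt_dir)
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)

# Option 2: load the base model and attach the LoRA adapter. PEFT reads
# adapter_config.json and adapter_model.safetensors from the same folder.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
peft_model = PeftModel.from_pretrained(base, ckpt_dir)
```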
@@ -223,8 +223,8 @@ Notice that we are using the merged weights, and not the LoRA adapters.
     _component_: torchtune.training.FullModelHFCheckpointer
     checkpoint_dir: ${output_dir}
     checkpoint_files: [
-        ft-model-00001-of-00002.safetensors,
-        ft-model-00002-of-00002.safetensors,
+        model-00001-of-00002.safetensors,
+        model-00002-of-00002.safetensors,
     ]
     output_dir: ${output_dir}
     model_type: LLAMA3_2
@@ -299,8 +299,8 @@ Let's modify ``custom_generation_config.yaml`` to include the following changes.
     _component_: torchtune.training.FullModelHFCheckpointer
     checkpoint_dir: ${checkpoint_dir}
     checkpoint_files: [
-        ft-model-00001-of-00002.safetensors,
-        ft-model-00002-of-00002.safetensors,
+        model-00001-of-00002.safetensors,
+        model-00002-of-00002.safetensors,
     ]
     output_dir: ${output_dir}
     model_type: LLAMA3_2
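
As a final smoke test that the renamed shards load end to end, here is a short generation sketch using ``transformers`` directly (a stand-in for torchtune's own generate recipe; the path is again a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed placeholder: output directory containing the merged weights.
ckpt_dir = "/tmp/torchtune/llama3_2_1B/lora_single_device/epoch_0"

tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
model = AutoModelForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.bfloat16)

# Generate a short completion to confirm the checkpoint loads correctly.
inputs = tokenizer("Tell me a joke.", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```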