@@ -83,6 +83,33 @@ and in most cases training can slow-down quite a bit as a result of this activat
 To enable activation checkpointing, use the ``enable_activation_checkpointing`` config entry or flag
 in any of our recipes, e.g. ``enable_activation_checkpointing=True``.
 
+.. _glossary_act_off:
+
+Activation Offloading
+---------------------
+
+*What's going on here?*
+
+You may have just read about activation checkpointing! Similar to checkpointing, offloading is a memory
+efficiency technique that saves GPU VRAM by temporarily moving activations to the CPU and bringing
+them back when they are needed in the backward pass.
+
+See the `PyTorch autograd hook tutorial <https://pytorch.org/tutorials/intermediate/autograd_saved_tensors_hooks_tutorial.html#saving-tensors-to-cpu>`_
+for more details about how this is implemented through ``saved_tensors_hooks``.
+
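+The snippet below is a minimal sketch of that mechanism (not torchtune's actual implementation):
+a pair of hooks moves each saved activation to the CPU when autograd records it and restores it
+when the backward pass asks for it.
+
+.. code-block:: python
+
+    import torch
+
+    def pack_to_cpu(tensor):
+        # Called when autograd saves an activation: keep only a CPU copy,
+        # remembering the original device.
+        return tensor.device, tensor.to("cpu")
+
+    def unpack_from_cpu(packed):
+        # Called when the backward pass needs the activation: move it back.
+        device, tensor = packed
+        return tensor.to(device)
+
+    x = torch.randn(1024, 1024, device="cuda", requires_grad=True)
+    with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
+        y = (x @ x).sum()
+    y.backward()  # saved activations are brought back from the CPU here
+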
+This setting is especially helpful for larger batch sizes or longer context lengths when you're memory constrained.
+However, these memory savings can come at the cost of training speed (i.e. tokens per second), since it takes runtime
+and resources to move tensors from GPU to CPU and back. The implementation in torchtune uses multiple CUDA streams
+in order to overlap the extra communication with the computation and hide the extra runtime. Because the communication
+workload varies with the number and size of tensors being offloaded, it is common not to offload every
+single activation. In fact, offloading can be used in conjunction with activation checkpointing, in which case every
+activation is either recomputed later in the backward pass or brought back from the CPU.
+
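+As a rough illustration of the stream overlap (again a simplified sketch, not torchtune's
+implementation), a device-to-host copy can be issued on a side stream into pinned CPU memory
+so that it runs concurrently with compute on the default stream:
+
+.. code-block:: python
+
+    import torch
+
+    d2h_stream = torch.cuda.Stream()
+
+    def offload_async(tensor):
+        # Allocate pinned CPU memory so the copy can be truly asynchronous.
+        cpu_buf = torch.empty(tensor.shape, dtype=tensor.dtype, pin_memory=True)
+        # Make the side stream wait until the activation has been produced.
+        d2h_stream.wait_stream(torch.cuda.current_stream())
+        with torch.cuda.stream(d2h_stream):
+            cpu_buf.copy_(tensor, non_blocking=True)
+        # Tell the caching allocator the tensor is still in use on the side
+        # stream so its memory isn't reused before the copy completes.
+        tensor.record_stream(d2h_stream)
+        # Callers must synchronize with d2h_stream before reading cpu_buf.
+        return cpu_buf
+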
+*Sounds great! How do I use it?*
+
+To enable activation offloading, use the ``enable_activation_offloading`` config entry or flag
+in our LoRA finetuning single device recipe, e.g. ``enable_activation_offloading=True``.
+
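+For example, with the single device LoRA recipe (the config name below is illustrative;
+run ``tune ls`` to see the configs available in your install):
+
+.. code-block:: bash
+
+    tune run lora_finetune_single_device --config llama3/8B_lora_single_device \
+        enable_activation_offloading=True
+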
 .. _glossary_grad_accm:
 
 Gradient Accumulation