.. _wandb_logging:

===========================
Logging to Weights & Biases
===========================

.. customcarditem::
   :header: Logging to Weights & Biases
   :card_description: Log metrics and model checkpoints to W&B
   :image: _static/img/torchtune_workspace.png
   :link: examples/wandb_logging.html
   :tags: logging,wandb


Torchtune supports logging your training runs to `Weights & Biases <https://wandb.ai>`_.
| 16 | + |
| 17 | +.. note:: |
| 18 | + |
| 19 | + You will need to install the `wandb` package to use this feature. |
| 20 | + You can install it via pip: |
| 21 | + |
| 22 | + .. code-block:: bash |
| 23 | +
|
| 24 | + pip install wandb |
| 25 | +
|
| 26 | + Then you need to login with your API key using the W&B CLI: |
| 27 | + |
| 28 | + .. code-block:: bash |
| 29 | +
|
| 30 | + wandb login |
| 31 | +
|
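
    Alternatively, you can log in from Python, for example inside a notebook;
    ``wandb.login()`` will prompt for your API key if you are not already
    authenticated:

    .. code-block:: python

        import wandb

        wandb.login()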


Metric Logger
-------------

The only change you need to make is to add the built-in ``WandBLogger`` to your config. Weights & Biases will then log your metrics for you (see below for logging model checkpoints as well).

.. code-block:: yaml

    # enable logging to the built-in WandBLogger
    metric_logger:
      _component_: torchtune.utils.metric_logging.WandBLogger
      # the W&B project to log to
      project: torchtune

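If you would rather not edit the config file, you can apply the same change at launch time using the ``tune`` CLI's ``key=value`` override syntax. A minimal sketch, assuming the ``tune run`` entry point and the Llama2 full finetune recipe and config names; substitute whichever recipe and config you are actually running:

.. code-block:: bash

    tune run full_finetune_single_device --config llama2/7B_full_single_device \
        metric_logger._component_=torchtune.utils.metric_logging.WandBLogger \
        metric_logger.project=torchtune
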
We automatically grab the config from the recipe you are running and log it to W&B. You can find it in the W&B overview tab, and the actual file in the ``Files`` tab.

.. note::

    Click on this sample `project to see the W&B workspace <https://wandb.ai/capecape/torchtune>`_.
    The config used to train the models can be found `here <https://wandb.ai/capecape/torchtune/runs/6053ofw0/files/torchtune_config_j67sb73v.yaml>`_.

Logging Model Checkpoints to W&B
--------------------------------

You can also log your model checkpoints to W&B by modifying the ``save_checkpoint`` method of the recipe you are running.

A suggested approach would be something like this:

.. code-block:: python

    from pathlib import Path

    import wandb

    from torchtune import utils

    def save_checkpoint(self, epoch: int) -> None:
        ...
        # save the checkpoint to W&B as an Artifact
        # depending on the Checkpointer class, the file will be named differently;
        # here is an example for the full_finetune case
        checkpoint_file = Path.joinpath(
            self._checkpointer._output_dir, f"torchtune_model_{epoch}"
        ).with_suffix(".pt")
        wandb_at = wandb.Artifact(
            name=f"torchtune_model_{epoch}",
            type="model",
            # description of the model checkpoint
            description="Model checkpoint",
            # you can add whatever metadata you want as a dict
            metadata={
                utils.SEED_KEY: self.seed,
                utils.EPOCHS_KEY: self.epochs_run,
                utils.TOTAL_EPOCHS_KEY: self.total_epochs,
                utils.MAX_STEPS_KEY: self.max_steps_per_epoch,
            },
        )
        # attach the checkpoint file to the artifact and upload it
        wandb_at.add_file(str(checkpoint_file))
        wandb.log_artifact(wandb_at)
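
Once logged, a checkpoint can be pulled back down through the public W&B API. A minimal sketch, assuming the artifact name from the snippet above and that the run was logged to a project named ``torchtune`` under your default entity; adjust the artifact path (``entity/project/name:alias``) as needed:

.. code-block:: python

    import wandb

    api = wandb.Api()
    # fetch the latest version of the epoch-0 checkpoint artifact
    artifact = api.artifact("torchtune/torchtune_model_0:latest", type="model")
    # download() returns the local directory containing torchtune_model_0.pt
    checkpoint_dir = artifact.download()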