Conversation

@Ankur-singh (Contributor) commented Jan 24, 2025

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Please link to any issues this PR addresses. #2281

Changelog

What are the changes made in this PR?

  • Implemented cat subcommand for the CLI
  • Wrote basic test-cases for tune cat

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

When the file doesn't exist:

(tune) ankur@nuc:~/github/torchtune$ tune cat llama4/8B_full.yaml
usage: tune cat [-h] config_name
tune cat: error: Config 'llama4/8B_full.yaml' not found.

When a non-YAML file is passed:

(tune) ankur@nuc:~/github/torchtune$ tune cat llama4/8B_full.json
usage: tune cat [-h] config_name
tune cat: error: Invalid config format: 'llama4/8B_full.json'. Must be YAML (.yaml/.yml)

When a recipe name is passed:

(tune) ankur@nuc:~/github/torchtune$ tune cat quantize
'quantize' is a recipe, not a config. Please use a config name.

When a correct config is passed:

(tune) ankur@nuc:~/github/torchtune$ tune cat llama2/7B_full
output_dir: /tmp/torchtune/llama2_7B/full
tokenizer:
    _component_: torchtune.models.llama2.llama2_tokenizer
    path: /tmp/Llama-2-7b-hf/tokenizer.model
    max_seq_len: null
dataset:
    _component_: torchtune.datasets.alpaca_dataset
    packed: false
seed: null
shuffle: true
model:
    _component_: torchtune.models.llama2.llama2_7b
checkpointer:
    _component_: torchtune.training.FullModelHFCheckpointer
    checkpoint_dir: /tmp/Llama-2-7b-hf
    checkpoint_files:
    - pytorch_model-00001-of-00002.bin
    - pytorch_model-00002-of-00002.bin
    recipe_checkpoint: null
    output_dir: ${output_dir}
    model_type: LLAMA2
resume_from_checkpoint: false
batch_size: 2
epochs: 1
optimizer:
    _component_: torch.optim.AdamW
    fused: true
    lr: 2e-5
loss:
    _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
max_steps_per_epoch: null
gradient_accumulation_steps: 1
compile: false
optimizer_in_bwd: false
device: cuda
enable_activation_checkpointing: true
enable_activation_offloading: false
dtype: bf16
metric_logger:
    _component_: torchtune.training.metric_logging.DiskLogger
    log_dir: ${output_dir}/logs
log_every_n_steps: 1
log_peak_memory_stats: true
profiler:
    _component_: torchtune.training.setup_torch_profiler
    enabled: false
    output_dir: ${output_dir}/profiling_outputs
    cpu: true
    cuda: true
    profile_memory: false
    with_stack: false
    record_shapes: true
    with_flops: false
    wait_steps: 5
    warmup_steps: 3
    active_steps: 2
    num_cycles: 1
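
For reference, a minimal sketch of validation logic that would produce the error messages shown in the scenarios above. This is hypothetical, not the merged implementation: `known_recipes` and `known_configs` are stand-ins for torchtune's recipe registry.

import argparse
from pathlib import Path

def validate_config_arg(parser: argparse.ArgumentParser, config_str: str) -> None:
    # Hypothetical stand-ins for torchtune's recipe/config registry.
    known_recipes = {"quantize"}
    known_configs = {"llama2/7B_full"}

    suffix = Path(config_str).suffix
    if suffix and suffix not in (".yaml", ".yml"):
        # Non-YAML extension: reject with a usage error.
        parser.error(f"Invalid config format: '{config_str}'. Must be YAML (.yaml/.yml)")
    if config_str in known_recipes:
        print(f"'{config_str}' is a recipe, not a config. Please use a config name.")
        return
    if config_str not in known_configs and not Path(config_str).exists():
        # Neither a registered config nor a local YAML file.
        parser.error(f"Config '{config_str}' not found.")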

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.

  • I did not change any public API
  • I have added an example to docs or docstrings

@pytorch-bot (bot) commented Jan 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2298

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Jan 24, 2025
@Ankur-singh (Contributor, Author):

Still a work in progress, but I’d love to gather as much feedback as possible while it's still in the oven!

cc @RdoubleA @joecummings

@SalmanMohammadi (Contributor):

Thanks so much for the awesome RFC and for putting this up @Ankur-singh!

Would you be able to attach some example output(s) from using the command to the PR description?

@Ankur-singh (Contributor, Author):

@SalmanMohammadi Thank you for the comment. I have added the command outputs for the different scenarios. This should make it easy for others as well. Again, thank you very much.

@joecummings (Member) left a review:

Command looks good, just a few comments on docstrings and impl

$ tune cat non_existent_config
Config 'non_existent_config' not found.

$ tune cat some_recipe
Member:

I don't think we need an example for a recipe.

Config 'non_existent_config' not found.

$ tune cat some_recipe
'some_recipe' is a recipe, not a config. Please use a config name.
Member:

Can you add an additional line down here saying that they can find all the cat-able configs via the tune ls command? You can also provide an example of launching a run overriding a key that was found via the new tune cat command.

Contributor Author:

That's a great idea. In fact, this was the original motivation.

from torchtune._cli.subcommand import Subcommand
from torchtune._recipe_registry import Config, get_all_recipes

ROOT = Path(torchtune.__file__).parent.parent
Member:

Ugh, we need to find a different way to do this (laziness on my part, I apologize). I suspect the call to import torchtune significantly slows down the execution of the CLI.

The key part here is that we want to find the underlying project in which the torchtune package is installed. For a local installation AND for a PyPI installation, it should be two levels up from the __init__.py file. So theoretically it's three levels up from any of the CLI files.
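
A minimal sketch of that relative-path approach (a sketch under the assumption that the CLI modules live in torchtune/_cli/, not the final implementation):

from pathlib import Path

# torchtune/_cli/cat.py -> torchtune/_cli -> torchtune -> project root,
# without paying for `import torchtune` at CLI startup.
ROOT = Path(__file__).parent.parent.parent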

Contributor Author:

Yes, importing torchtune is slow. I will address this issue. We will need to update other CLI files that import torchtune as well, but it might be better to handle that in a separate PR to avoid scope creep.

Member:

Yep, let's do this in another PR.

if config.name == config_str:
    return config

def _print_file(self, file: str) -> None:
Member:

_print_yaml_file is a little more descriptive.

@Ankur-singh (Contributor, Author) commented Jan 28, 2025:

Can you please elaborate further? What do you mean by "descriptive"?

Member:

It's a nit for sure, but this doesn't just print any file. It only takes in and prints a YAML file, so the name should be something like print_yaml_file.

Contributor Author:

Yes, that's a good point. Will update the function name.
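
A minimal sketch of the renamed helper (assumed body, reconstructed from the yaml.dump call discussed below; shown as a free function for brevity, though in the PR it is a method on the subcommand class):

import yaml

def _print_yaml_file(file: str) -> None:
    # Parse the YAML config and pretty-print it, preserving the key
    # order from the file so the output matches the configs on GitHub.
    with open(file, "r") as f:
        data = yaml.safe_load(f)
    print(yaml.dump(data, default_flow_style=False, sort_keys=False), end="")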

"cat",
prog="tune cat",
help="Pretty print a config",
description="Pretty print a config",
Member:

Can you add something related to WHY someone might want to do this? E.g. pretty print a config, making it easy to know which keys you can override when launching a training run (or something a little more concise)

task: full_finetune
...

$ tune cat non_existent_config
Member:

No need for negative examples

epilog=textwrap.dedent(
    """\
    examples:
        $ tune cat llama2/7B_full
Member:

Can you add an example of `tune cat LOCALFILE.yaml`?

yaml.dump(
    data,
    default_flow_style=False,
    sort_keys=False,
Contributor:

I personally find the sorted configs easier to read when printed.

Contributor Author:

@SalmanMohammadi I can make it an argument. Something like --sort. Thoughts?

Contributor:

cc @ebsmothers @joecummings thoughts?

Member:

I like the idea of a --sort argument. I agree that sorted configs are a little easier to read; however, they wouldn't look like our configs on GitHub, which would be confusing, so the default should be to match what we show on GitHub.
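
A minimal self-contained sketch of the --sort flag being discussed (hypothetical wiring; the thread does not show whether this exact flag was merged):

import argparse
import yaml

parser = argparse.ArgumentParser(prog="tune cat")
parser.add_argument("config_name")
parser.add_argument(
    "--sort",
    action="store_true",
    help="Sort config keys alphabetically when printing",
)
args = parser.parse_args()

with open(args.config_name) as f:
    data = yaml.safe_load(f)
# The default (unsorted) keeps the key order of the configs on GitHub.
print(yaml.dump(data, default_flow_style=False, sort_keys=args.sort), end="")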

@SalmanMohammadi (Contributor):

Would you be up for adding a little section for your shiny new command in the docs? : )

https://github.com/pytorch/torchtune/blob/main/docs/source/tune_cli.rst

@codecov-commenter commented:

Codecov Report

Attention: Patch coverage is 10.41667% with 86 lines in your changes missing coverage. Please review.

Project coverage is 23.88%. Comparing base (d23fa93) to head (3b103f9).
Report is 34 commits behind head on main.

Files with missing lines           Patch %   Lines
torchtune/_cli/cat.py                0.00%   46 Missing ⚠️
tests/torchtune/_cli/test_cat.py    20.83%   38 Missing ⚠️
torchtune/_cli/tune.py               0.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            main    #2298       +/-   ##
==========================================
+ Coverage   9.19%   23.88%   +14.68%     
==========================================
  Files        292      360       +68     
  Lines      17740    21295     +3555     
==========================================
+ Hits        1631     5086     +3455     
- Misses     16109    16209      +100     


@Ankur-singh (Contributor, Author):

@SalmanMohammadi and @joecummings Thank you so much for all the amazing feedback. I have made all the requested changes.

# Pretty print the contents of LOCALFILE.yaml
$ tune cat LOCALFILE.yaml

# Example of launching a run overriding a key found via the `tune cat` command:
Member:

super nit, but take a look at how I did this for tune ls: https://github.com/pytorch/torchtune/blob/d4465c8b21629949a404f821528badb35962502f/torchtune/_cli/ls.py#L37

It should be unindented OUT of the examples section and instead be considered a followup.

e.g.

examples:
	$ tune cat llama2/7B_full
	...

	$ tune cat LOCALFILE.yaml
	...

You can now easily override a key based on your findings from `tune cat`:
	$ tune run full_finetune_distributed ...

Need to find all the "cat"-able configs? Try `tune ls`!

Contributor Author:

Done!

self._parser = subparsers.add_parser(
    "cat",
    prog="tune cat",
    help="Pretty print a config, making it easy to override parameters when using `tune run`",
Member:

nit: "making it easy to know which parameters you can override with tune run.

Contributor Author:

Done

@joecummings (Member) left a review:

This looks awesome! Just two more nits I swear :)

@joecummings (Member) left a review:

🫡🫡🫡

@joecummings merged commit e6b9064 into meta-pytorch:main Jan 30, 2025
3 checks passed
@Ankur-singh deleted the add-cat-cmd branch March 4, 2025 07:24