Readme Updates #1664
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1664
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit b7507ea with merge base b4fea32.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
README.md
Outdated
Alternatively, you can install a nightly build of torchtune to gain access to the latest features not yet available in the stable release.

```bash
pip install --pre torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```
Should we only recommend torchtune nightlies together with torch/torchao/torchvision nightlies? In other words, is it possible that someone will install stable torchtune with nightly torch and have issues?
Yeah this is a good point. Will update
Fwiw we can support torchtune nightlies + PyTorch stable, but prob won't mention it cause it just complicates things
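For reference, a combined nightly install could look roughly like the sketch below; the package set and index URL are assumptions based on the standard PyTorch nightly channel, not wording from this PR.

```bash
# Sketch only: install nightly torch/torchvision/torchao alongside the torchtune nightly.
# The CPU index matches the snippet above; swap in a CUDA index if you have a GPU.
pip install --pre torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cpu
pip install --pre torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```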
### Downloading a model
The "Downloading a model" section recommends Llama 3 instead of Llama 3.1.
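A sketch of what the updated 8B command could look like; the model ID and ignore pattern are assumptions modeled on the 70B command quoted later in this thread.

```bash
# Hypothetical 8B download pointing at the 3.1 release (model ID and ignore pattern are assumptions)
tune download meta-llama/Meta-Llama-3.1-8B-Instruct \
  --output-dir /tmp/Meta-Llama-3.1-8B-Instruct \
  --ignore-patterns "original/consolidated.00.pth" \
  --hf-token <HF_TOKEN>
```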
README.md
Outdated
## Design Principles
---

## Llama3 and Llama3.1
Why Llama 3? I think it's fine to just use 3.1. This may get stale soon, so maybe keeping it just as "Llama" is better.
Yeah I will drop 3, probably will keep 3.1 in just to be explicit (otherwise it's not clear what we're referring to). One thing is that the tutorial in the docs is still for Llama3, not 3.1. So strictly speaking it's not 100% correct, but maybe that's a minor point
README.md
Outdated
## Llama3 and Llama3.1

torchtune supports fine-tuning for the Llama3/Llama3.1 8B, 70B, and 405B size models. You can fine-tune the 8B model with LoRA, QLoRA and full fine-tunes on one or more GPUs. You can also fine-tune the 70B model with QLoRA on a single device or LoRA and full fine-tunes on multiple devices. Finally, you can fine-tune the 405B model on a single node with QLoRA. For all the details, take a look at our [tutorial](https://pytorch.org/torchtune/main/tutorials/llama3.html).
Imo, less is more. No need to specify 8B/70B/405B in 3 different lines. Something like this should be good:
"torchtune supports fine-tuning for the Llama3/Llama3.1 8B, 70B, and 405B size models with LoRA, QLoRA and full fine-tunes on one or more GPUs, depending on the model size. For all the details, take a look at our tutorial."
I think we absolutely need a section in getting started on running on a custom dataset, maybe even listing out the dataset types we support and pointing to docs for more details. I don't see any mention of multimodal except in the install section explaining why we depend on torchvision; we should be highlighting this somewhere.
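As a rough illustration of what such a section could point at, one possible shape for a custom-dataset override via the CLI is sketched below; the component path, parameter names, and config name are assumptions, not text from this PR.

```bash
# Hypothetical example: point a built-in recipe at your own Hugging Face dataset.
# Component path, parameter names, and config name are assumptions for illustration only.
tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device \
  dataset._component_=torchtune.datasets.instruct_dataset \
  dataset.source=my_org/my_instruct_dataset
```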
README.md
Outdated
LoRA 70B

Note that the download command for the Meta-Llama3 70B model slightly differs from download commands for the 8B models. This is because we use the HuggingFace [safetensor](https://huggingface.co/docs/safetensors/en/index) model format to load the model. To download the 70B model, run
I don't think this warning is really necessary. But we should add a link to our models API docs, which cover downloading every model, not only Llama. IMO, Llama here should just be an example, not a full tutorial for downloading every Llama model.
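To illustrate that point, the same command works for any supported model; for example, something along these lines (model ID chosen purely as a hypothetical, non-Llama example).

```bash
# Hypothetical non-Llama example; any supported Hugging Face model ID can be substituted.
# (--hf-token is only needed for gated repos.)
tune download mistralai/Mistral-7B-Instruct-v0.2 --output-dir /tmp/Mistral-7B-Instruct-v0.2
```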
README.md
Outdated
Note that the download command for the Meta-Llama3 70B model slightly differs from download commands for the 8B models. This is because we use the HuggingFace [safetensor](https://huggingface.co/docs/safetensors/en/index) model format to load the model. To download the 70B model, run

```bash
tune download meta-llama/Meta-Llama-3.1-70b --hf-token <> --output-dir /tmp/Meta-Llama-3.1-70b --ignore-patterns "original/consolidated*"
```
Not sure I fully follow why for Llama 8B we just share `tune run`, but for 70B we also have download instructions here.
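For context, the corresponding `tune run` invocations would look roughly like the sketch below; the config names are assumptions based on torchtune's llama3_1 config directory rather than text quoted in this PR.

```bash
# Single-device LoRA fine-tune of the 8B model (config name is an assumption)
tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device

# Multi-GPU LoRA fine-tune of the 70B model on 8 GPUs (config name is an assumption)
tune run --nproc_per_node 8 lora_finetune_distributed --config llama3_1/70B_lora
```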
README.md
Outdated
torchtune is designed to be easy to understand, use and extend.
You can find a full list of all our Llama3 configs [here](recipes/configs/llama3) and Llama3.1 configs [here.](recipes/configs/llama3_1)
This is nice, but I don't think we should only talk about Llama. Maybe shout out Llama 3.1 and replace the llama3 link with just recipes/configs.
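Relatedly, another lightweight way to surface every config (not just the Llama ones) would be to point users at the CLI, roughly as sketched below; the copy destination is a placeholder.

```bash
# List all built-in recipes and their example configs
tune ls

# Copy a config locally to customize it (destination filename is a placeholder)
tune cp llama3_1/8B_lora_single_device my_custom_config.yaml
```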
## Community Contributions
### Community Contributions
are we keeping this? I remember we had a discussion about it a while ago
Yeah I planned to keep it for now
I would punt that one to you and/or @pbontrager. The goal of these changes is just to get us from 6 months behind to approximately present-day. And imo custom datasets can get hairy quickly; I think the readme should be simple and to-the-point, leveraging our live docs as a reference. E.g. even axolotl's readme basically only has two sentences on datasets and just points to their docs.
| 8 x A100 | LoRA | Llama2-70B | Batch Size = 4, Seq Length = 4096 | 26.4 GB | 3384 |
| 8 x A100 | Full Finetune * | Llama2-70B | Batch Size = 4, Seq Length = 4096 | 70.4 GB | 2032 |

| Fine-tuning method | Devices | Recipe | Example config(s) |
|:-:|:-:|:-:|:-:|
I think it's worth adding another column with [Speed optimized, memory optimized], or it may not be clear to the user why examples are duplicated and numbers are different.
probably worth also adding bsz and max_seq_len
On the second point, I followed your advice and used fixed bsz and max_seq_len for all examples. I call this out in the note just above the table. I left out the speed vs memory optimized column because it's really only applicable to about 50% of the rows, so it might be redundant. I tried to instead include the hardware type to indicate which rows can be run in a more memory-constrained environment.
README.md
Outdated
To get started with fine-tuning your first LLM with torchtune, see our tutorial on [fine-tuning Llama2 7B](https://pytorch.org/torchtune/main/tutorials/first_finetune_tutorial.html). Our [end-to-end workflow](https://pytorch.org/torchtune/main/tutorials/e2e_flow.html) tutorial will show you how to evaluate, quantize and run inference with this model. The rest of this section will provide a quick overview of these steps with Llama2.
To get started with torchtune, see our [Fine-Tune Your First LLM Tutorial](https://pytorch.org/torchtune/main/tutorials/first_finetune_tutorial.html). Our [end-to-end workflow](https://pytorch.org/torchtune/main/tutorials/e2e_flow.html) tutorial will show you how to evaluate, quantize and run inference with this model. The rest of this section will provide a quick overview of these steps with Llama3.

If you have a more custom workflow or need additional information on torchtune components and recipes, please check out our documentation page
TODO: update
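As an aside on the end-to-end workflow referenced in this snippet, the evaluate/quantize/inference steps map onto recipes roughly as sketched below; the recipe and config names are assumptions about torchtune's CLI, not text from this PR.

```bash
# Rough sketch of the post-training steps (recipe and config names are assumptions)
tune run eleuther_eval --config eleuther_evaluation   # evaluate with the EleutherAI eval harness
tune run quantize --config quantization               # quantize the fine-tuned checkpoint
tune run generate --config generation                 # run inference with the tuned model
```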
joecummings left a comment:
Controversial, but I would be in favor of dropping the full section on Llama 3 and Llama 3.1. We already say what models we support and can highlight new model updates at the top of the README in the recent updates section.
joecummings left a comment:
meh
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #1664      +/-   ##
==========================================
- Coverage   71.11%   69.13%    -1.98%
==========================================
  Files         297      298        +1
  Lines       15120    15165       +45
==========================================
- Hits        10752    10485      -267
- Misses       4368     4680      +312

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Long-overdue updates to our README. Changes include:

- Updating memory and tokens/sec numbers to report for Llama3.1 instead of Llama2.
- Providing an (imo) clearer version of our table of supported recipes.
- Generally reorganizing things and reworking the overall flow.
- A bunch of other small/cosmetic changes, but these are the main ones.