
Conversation

@ebsmothers
Contributor

Long-overdue updates to our README. Changes include:

1. Updating memory and tokens/sec numbers to report Llama3.1 instead of Llama2.

2. Providing an (IMO) clearer version of our table of supported recipes.

3. Generally reorganizing things. Now the flow is:

  • Intro
    • Models
    • Recipes
    • Memory efficiency and perf
  • Install
  • Getting Started
    • Downloading models
    • Running recipes
    • Modifying configs
  • Llama3 and 3.1
  • Community
    • Community Contributions
  • Acknowledgments
  • License
4. Updating the intro bullet points slightly for clarity; now the first three more explicitly outline the three tables that follow.
5. Removing the design principles section.

A bunch of other small/cosmetic changes, but these are the main ones.


pytorch-bot bot commented Sep 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1664

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b7507ea with merge base b4fea32:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Sep 24, 2024
README.md Outdated
Alternatively, you can install a nightly build of torchtune to gain access to the latest features not yet available in the stable release.

```bash
pip install --pre torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```
Contributor

Should we only recommend torchtune nightlies together with torch/torchao/torchvision nightlies? In other words, is it possible that someone will install stable torchtune with nightly torch and have issues?

Contributor Author

Yeah this is a good point. Will update

Contributor Author

FWIW we can support torchtune nightlies + PyTorch stable, but we probably won't mention it since it just complicates things.
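
A minimal sketch of the matching-nightlies install being discussed, assuming the CPU nightly index (a CUDA index URL would be used instead for GPU builds):

```bash
# Install the torch/torchvision/torchao nightlies and the torchtune nightly
# from the same nightly index so the versions stay in sync.
pip install --pre torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cpu
pip install --pre torchtune --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```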

 

### Downloading a model

Contributor

"Downloading a model" recommends Llama 3 instead of 3.1.
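
For reference, a hedged sketch of a 3.1 version of the 8B download command, mirroring the 70B command quoted below (the exact repo id and ignore pattern are assumptions):

```bash
# Download Llama 3.1 8B Instruct, skipping the original-format checkpoint
tune download meta-llama/Meta-Llama-3.1-8B-Instruct \
  --output-dir /tmp/Meta-Llama-3.1-8B-Instruct \
  --ignore-patterns "original/consolidated.00.pth"
```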

README.md Outdated
## Design Principles
---

## Llama3 and Llama3.1
Contributor

Why Llama 3? I think it's fine to just use 3.1. This may become stable soon, so maybe keeping it just as "Llama" is better.

Contributor Author

Yeah, I will drop 3 but probably keep 3.1 in, just to be explicit (otherwise it's not clear what we're referring to). One caveat: the tutorial in the docs is still for Llama3, not 3.1, so strictly speaking it's not 100% correct, but maybe that's a minor point.

README.md Outdated

## Llama3 and Llama3.1

torchtune supports fine-tuning for the Llama3/Llama3.1 8B, 70B, and 405B size models. You can fine-tune the 8B model with LoRA, QLoRA and full fine-tunes on one or more GPUs. You can also fine-tune the 70B model with QLoRA on a single device or LoRA and full-finetunes on multiple devices. Finally, you can fine-tune the 405B model on a single node with QLoRA. For all the details, take a look at our [tutorial](https://pytorch.org/torchtune/main/tutorials/llama3.html).
Contributor

IMO, less is more. No need to spell out 8B/70B/405B across three different lines. Something like this should be good:

"torchtune supports fine-tuning for the Llama3/Llama3.1 8B, 70B, and 405B size models with LoRA, QLoRA and full fine-tunes on one or more GPUs, depending on the model size. For all the details, take a look at our tutorial."

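For illustration, a single-device run under this condensed framing might look like the sketch below (the config name is an assumption based on torchtune's llama3_1 config directory):

```bash
# LoRA fine-tune Llama 3.1 8B on a single GPU
tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device
```
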
@RdoubleA
Collaborator

I think we absolutely need a section in Getting Started on running on a custom dataset, maybe even listing out the dataset types we support and pointing to the docs for more details.

I don't see any mention of multimodal except in the install section (for why we depend on torchvision); we should be highlighting this somewhere.

README.md Outdated
LoRA 70B

Note that the download command for the Meta-Llama3 70B model slightly differs from download commands for the 8B models. This is because we use the HuggingFace [safetensor](https://huggingface.co/docs/safetensors/en/index) model format to load the model. To download the 70B model, run
Contributor

I don't think this warning is really necessary. But we should add a link to our models API reference, which covers downloading every model, not only Llama. IMO, Llama here should just be an example, not a full tutorial for downloading every Llama model.

README.md Outdated

Note that the download command for the Meta-Llama3 70B model slightly differs from download commands for the 8B models. This is because we use the HuggingFace [safetensor](https://huggingface.co/docs/safetensors/en/index) model format to load the model. To download the 70B model, run
```bash
tune download meta-llama/Meta-Llama-3.1-70b --hf-token <> --output-dir /tmp/Meta-Llama-3.1-70b --ignore-patterns "original/consolidated*"
```
Contributor

Not sure I fully follow why for Llama 8B we just share `tune run`, but for 70B we also have download instructions here.

README.md Outdated

torchtune is designed to be easy to understand, use and extend.
You can find a full list of all our Llama3 configs [here](recipes/configs/llama3) and Llama3.1 configs [here.](recipes/configs/llama3_1)
Contributor

This is nice, but I don't think we should only talk about Llama. Maybe shout out Llama 3.1 and replace llama3 with just recipes/configs.
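
One option along these lines: lean on `tune ls`, which already enumerates built-in recipes and their configs, rather than per-model config links. A sketch:

```bash
# List all built-in recipes and their example configs
tune ls

# Narrow the output to Llama 3.1 configs
tune ls | grep llama3_1
```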


## Community Contributions
### Community Contributions
Contributor

Are we keeping this? I remember we had a discussion about it a while ago.

Contributor Author

Yeah I planned to keep it for now

@ebsmothers
Contributor Author

> I think we absolutely need a section in Getting Started on running on a custom dataset, maybe even listing out the dataset types we support and pointing to the docs for more details.
>
> I don't see any mention of multimodal except in the install section (for why we depend on torchvision); we should be highlighting this somewhere.

I would punt that one to you and/or @pbontrager. The goal of these changes is just to get us from six months behind to approximately present-day. And IMO custom datasets can get hairy quickly; I think the README should be simple and to-the-point, leveraging our live docs as a reference. E.g. even axolotl's README basically only has two sentences on datasets and just points to their docs.
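
For context, even a two-sentence dataset pointer could carry one hedged override example, along these lines (my_org/my_dataset is hypothetical, and the dotted-key syntax assumes torchtune's command-line config overrides):

```bash
# Point the config's default dataset at a custom Hugging Face dataset
# via a command-line config override (my_org/my_dataset is hypothetical)
tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device \
  dataset.source=my_org/my_dataset
```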

| 8 x A100 | LoRA | Llama2-70B | Batch Size = 4, Seq Length = 4096 | 26.4 GB | 3384 |
| 8 x A100 | Full Finetune * | Llama2-70B | Batch Size = 4, Seq Length = 4096 | 70.4 GB | 2032 |
| Fine-tuning method | Devices | Recipe | Example config(s) |
|:-:|:-:|:-:|:-:|
Contributor

I think it's worth adding another column with [speed optimized, memory optimized], or it may not be clear to the user why examples are duplicated and the numbers are different.

Contributor

Probably worth also adding bsz and max_seq_len.

Contributor Author

On the second point, I followed your advice and used fixed bsz and max_seq_len for all examples; I call this out in the note just above the table. I left out the speed- vs. memory-optimized column because it's really only applicable to about 50% of the rows, so it might be redundant. I tried to instead include the hardware type to indicate which rows can be run in a more memory-constrained environment.
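
As a reading aid, any table row could then be reproduced by overriding those fixed settings, roughly as in this sketch (batch_size and tokenizer.max_seq_len are assumed config keys):

```bash
# Re-run a benchmark row with the fixed batch size and sequence length
tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device \
  batch_size=2 tokenizer.max_seq_len=2048
```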

README.md Outdated
To get started with fine-tuning your first LLM with torchtune, see our tutorial on [fine-tuning Llama2 7B](https://pytorch.org/torchtune/main/tutorials/first_finetune_tutorial.html). Our [end-to-end workflow](https://pytorch.org/torchtune/main/tutorials/e2e_flow.html) tutorial will show you how to evaluate, quantize and run inference with this model. The rest of this section will provide a quick overview of these steps with Llama2.
To get started with torchtune, see our [Fine-Tune Your First LLM Tutorial](https://pytorch.org/torchtune/main/tutorials/first_finetune_tutorial.html). Our [end-to-end workflow](https://pytorch.org/torchtune/main/tutorials/e2e_flow.html) tutorial will show you how to evaluate, quantize and run inference with this model. The rest of this section will provide a quick overview of these steps with Llama3.

If you have a more custom workflow or need additional information on torchtune components and recipes, please check out our documentation page
Contributor Author

TODO: update

@joecummings (Member) left a comment

Controversial, but I would be in favor of dropping the full section on Llama 3 and Llama 3.1. We already say what models we support and can highlight new model updates at the top of the README in recent updates section

@joecummings (Member) left a comment

meh

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.13%. Comparing base (50b24e5) to head (cc4dbff).
Report is 7 commits behind head on main.

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #1664      +/-   ##
==========================================
- Coverage   71.11%   69.13%   -1.98%     
==========================================
  Files         297      298       +1     
  Lines       15120    15165      +45     
==========================================
- Hits        10752    10485     -267     
- Misses       4368     4680     +312     
```
| Flag | Coverage Δ |
|---|---|
| | 69.13% <ø> (-1.98%) ⬇️ |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@ebsmothers ebsmothers merged commit 30b8519 into meta-pytorch:main Sep 24, 2024
@ebsmothers ebsmothers deleted the readme-updates branch September 24, 2024 21:18
