
Conversation


@joecummings joecummings commented Feb 18, 2025

What?

See title

Why?

We already have a few dev recipes that require torchdata support (link, link). We wanted to properly test these before adding a new dependency, since they are nice-to-haves rather than requirements for most post-training work. However, we aim to support step-based checkpointing soon (see here), which necessitates the ability to resume training from a step within an epoch. For this, we need a dataloader that maintains state, which is already available and tested in torchdata: StatefulDataLoader.
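
To make the mechanics concrete, here's a minimal sketch of the StatefulDataLoader API this relies on (the import path and the `state_dict`/`load_state_dict` methods come from torchdata's `stateful_dataloader` module; the toy dataset and step count are purely illustrative, not torchtune code):

```python
# Minimal sketch: StatefulDataLoader is a drop-in replacement for
# torch.utils.data.DataLoader that can snapshot and restore its position.
from torchdata.stateful_dataloader import StatefulDataLoader

dataset = list(range(10))  # stand-in for a real torchtune dataset
loader = StatefulDataLoader(dataset, batch_size=2, shuffle=True)

for step, batch in enumerate(loader):
    if step == 2:
        # Capture the loader's position mid-epoch.
        dl_state = loader.state_dict()
        break

# Later (e.g. after a job restart), rebuild the loader and restore its state.
resumed_loader = StatefulDataLoader(dataset, batch_size=2, shuffle=True)
resumed_loader.load_state_dict(dl_state)
for batch in resumed_loader:
    ...  # iteration continues from where the previous run stopped
```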

FAQs?

  1. Why are we only adding support for the stable torchdata release right now? This doesn't mirror how we handle other PyTorch deps.

New features of immediate interest to torchtune come out of torchdata slowly, so I don't think we'll need nightly support very often. In addition, the current way we handle PyTorch deps is a little cumbersome: the user has to install torch, torchao, and torchvision themselves before installing the package itself. Adding torchdata on top of that for a required part of each recipe would be too much IMO. We should look at a future fix for this, such as building a torchtune[nightly] package that automatically pulls in the latest nightlies from the PyTorch packages. Not sure how feasible this is.

  2. What is the release schedule for torchdata?

Currently, torchdata releases on the same cadence as PyTorch core. A faster release schedule would definitely be better for torchtune, especially if we only depend on the stable torchdata package. cc @ramanishsingh

  3. Is it worth pulling in a new dependency just for a single feature?

For one, having torchdata successfully integrated into torchtune would mean we can more easily integrate multi-dataset support, iterable datasets, and multi-threaded sample packing, so it's not just one feature. But also, StatefulDataLoader makes our lives so much easier for step-based checkpointing (a rough sketch of what that looks like follows below) and is much better tested than anything we would build ourselves, so yes, I do think it's worth it.
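
For step-based checkpointing specifically, here's how the dataloader state could be bundled with the rest of the training state. This is illustrative only, not torchtune's actual checkpointing code; `save_step_checkpoint` and `load_step_checkpoint` are hypothetical helpers:

```python
import torch

def save_step_checkpoint(path, model, optimizer, dataloader, step):
    # Hypothetical helper: store the StatefulDataLoader's state alongside
    # the usual model/optimizer state so a run can resume mid-epoch.
    torch.save(
        {
            "step": step,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "dataloader": dataloader.state_dict(),
        },
        path,
    )

def load_step_checkpoint(path, model, optimizer, dataloader):
    # Hypothetical helper: restore everything, including the dataloader's
    # position, then return the step to resume from.
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    dataloader.load_state_dict(ckpt["dataloader"])
    return ckpt["step"]
```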

cc @scotts @ebsmothers @pbontrager @divyanshk


pytorch-bot bot commented Feb 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2408

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 11f028e with merge base e6cba25:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Feb 18, 2025
@joecummings joecummings merged commit a65d23c into meta-pytorch:main Feb 19, 2025
17 checks passed
joecummings added a commit to joecummings/torchtune that referenced this pull request Feb 27, 2025
pbontrager pushed a commit to pbontrager/torchtune that referenced this pull request Mar 17, 2025
pbontrager added a commit that referenced this pull request Mar 17, 2025
Co-authored-by: Joe Cummings <[email protected]>
Co-authored-by: jxtngx <[email protected]>
Co-authored-by: salman <[email protected]>
Co-authored-by: Mark <[email protected]>
Co-authored-by: Mark Obozov <[email protected]>
Co-authored-by: Felipe Mello <[email protected]>
Co-authored-by: ebsmothers <[email protected]>
