docs: Add comprehensive custom data guide and fix missing _component_ #2889
Summary
This PR addresses two related documentation issues to improve the new-user experience:
- Custom data usage information is scattered across multiple documentation pages
- Missing `_component_` field in the instruct dataset examples
Why This Matters
As noted in #2215 by @johnowhitaker, finding how to use custom data requires searching through multiple documentation pages. This is frustrating for new users who just want to get started with their own data. This PR consolidates all custom data information into a single, easy-to-find guide.
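For context, a minimal sketch of the kind of dataset config entry this PR is concerned with is shown here. The `_component_` key names the builder that torchtune instantiates from the config, which is why an example that omits it cannot be used as written. The file path is hypothetical, and the remaining keys are assumed to be forwarded to `instruct_dataset` and on to Hugging Face `load_dataset`; treat this as an illustration, not the exact text of the updated docs.

```yaml
# Hypothetical dataset section of a torchtune recipe config.
dataset:
  # Builder to instantiate; this is the field the old examples omitted.
  _component_: torchtune.datasets.instruct_dataset
  # Load a local JSON file through Hugging Face load_dataset.
  source: json
  data_files: data/my_data.json  # hypothetical path
  split: train
  # Mask the prompt tokens out of the loss.
  train_on_input: False
```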
What's Included
New Custom Data Quick Start Guide (`custom_data_quickstart.rst`)
Bug Fixes in `instruct_datasets.rst`
- Added `_component_: torchtune.datasets.instruct_dataset` to YAML examples
Testing
Impact
This documentation directly addresses the #1 user question when starting with torchtune: how to use custom data. It should reduce support burden and improve user onboarding.
Fixes #2215
Fixes #2221
cc @RdoubleA for review