@ebsmothers (Contributor) commented Mar 13, 2025

Fix the teacher checkpoint path in one of our knowledge distillation (KD) configs.

@pytorch-bot commented Mar 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2496

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3609dba with merge base dab36d2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Mar 13, 2025
@pbontrager (Contributor) left a comment

I'm uncomfortable with this PR because it makes a lot of assumptions about where things are saved during fine-tuning, and those assumptions might break in the future. It also makes this recipe depend on having run another recipe first, which is new. Could you add more context on how common it is to fine-tune the teacher first?

  teacher_checkpointer:
    _component_: torchtune.training.FullModelHFCheckpointer
-   checkpoint_dir: /tmp/Meta-Llama-3.1-8B-Instruct/
+   checkpoint_dir: /tmp/torchtune/llama3_1_8B/lora/epoch_0

Why is epoch 0 the right default here?

@codecov-commenter commented Mar 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 23.15%. Comparing base (dab36d2) to head (745a5e9).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2496   +/-   ##
=======================================
  Coverage   23.15%   23.15%           
=======================================
  Files         379      379           
  Lines       22838    22838           
=======================================
  Hits         5289     5289           
  Misses      17549    17549           

☔ View full report in Codecov by Sentry.

@ebsmothers (Contributor, Author) commented

@pbontrager these are fair comments. To be honest, I'm not sure what the right thing to do here is, but I would point to GRPO, where we do something pretty similar (though of course that is only in dev for now). As discussed in the blog post, the results are better when the teacher model is fine-tuned first, so I'd argue this is the right default. That said, I understand your point that hard-coding epoch_0 is a bit finicky.
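One way to avoid hard-coding `epoch_0` would be to resolve the teacher directory when the KD run is launched. Below is a minimal sketch, assuming the teacher fine-tuning run saves checkpoints as `<output_dir>/epoch_<N>` subdirectories, as the new path `/tmp/torchtune/llama3_1_8B/lora/epoch_0` suggests. `latest_epoch_dir` is a hypothetical helper for illustration, not part of torchtune.

```python
import re
from pathlib import Path

# Hypothetical helper (not a torchtune API). Assumes checkpoints are saved as
# <output_dir>/epoch_<N>, matching /tmp/torchtune/llama3_1_8B/lora/epoch_0.
EPOCH_DIR_PATTERN = re.compile(r"^epoch_(\d+)$")


def latest_epoch_dir(output_dir: str) -> Path:
    """Return the epoch_<N> subdirectory of output_dir with the largest N."""
    root = Path(output_dir)
    if not root.is_dir():
        # Surface the cross-recipe dependency with an actionable message.
        raise FileNotFoundError(
            f"{root} does not exist; fine-tune the teacher model before running KD."
        )
    candidates = []
    for child in root.iterdir():
        match = EPOCH_DIR_PATTERN.match(child.name)
        # Ignore files and anything that is not named epoch_<N>.
        if match and child.is_dir():
            candidates.append((int(match.group(1)), child))
    if not candidates:
        raise FileNotFoundError(f"No epoch_<N> checkpoint directories found in {root}.")
    # Compare epoch numbers as integers so that epoch_10 ranks above epoch_2.
    _, newest = max(candidates, key=lambda item: item[0])
    return newest
```

The returned path could then be supplied as the `teacher_checkpointer.checkpoint_dir` value. Failing early with a clear message also makes the dependency on a prior fine-tuning run explicit, which is the concern raised in the review.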

@ebsmothers ebsmothers changed the title update teacher checkpointer paths in KD configs update teacher checkpointer paths in KD config Mar 14, 2025
@ebsmothers ebsmothers merged commit ab8c23e into meta-pytorch:main Mar 14, 2025
17 checks passed
pbontrager pushed a commit to pbontrager/torchtune that referenced this pull request Mar 17, 2025
ianbarber pushed a commit to ianbarber/torchtune that referenced this pull request Mar 19, 2025

5 participants