Update KVCache maximum sequence length configuration in PPO recipe #2412
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2412. Note: links to docs will display an error until the docs builds have been completed.

✅ No failures as of commit a07588a with merge base 504cbea. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Codecov Report — Attention: Patch coverage is

Additional details and impacted files (Coverage Diff, main vs. #2412):

| | main | #2412 | +/- |
|---|---|---|---|
| Coverage | 63.87% | 23.47% | -40.41% |
| Files | 368 | 373 | +5 |
| Lines | 21873 | 22403 | +530 |
| Hits | 13971 | 5258 | -8713 |
| Misses | 7902 | 17145 | +9243 |

☔ View full report in Codecov by Sentry.
`max_seq_len` is set in the PPO recipe when using KV caching.
The bug was raised in #2064. When setting up KV caches, we use `tokenizer.max_seq_len` to determine the shape of the KV cache. As this was not configured in the config, we would error out.

The direct fix I'm introducing in this PR is to dynamically construct the KV cache shape based on the context length of the current batch, rather than the fixed `tokenizer.max_seq_len`.

I also introduce the default `tokenizer.max_seq_len=512`, as this was the config I used in #2066. This is a more sensible value to start out with than `null`.

A sketch of the batch-dependent cache sizing follows. Please see the training logs below for this branch:
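As a rough illustration only (not the exact diff in this PR), here is a minimal sketch of sizing the cache from the current batch, assuming torchtune's `TransformerDecoder.setup_caches(batch_size, dtype, decoder_max_seq_len=...)` API; the helper name `setup_kv_cache_for_batch` and the `max_generated_tokens` parameter are hypothetical:

```python
import torch

from torchtune.modules import TransformerDecoder


def setup_kv_cache_for_batch(
    model: TransformerDecoder,
    prompt_tokens: torch.Tensor,  # [batch_size, context_len] padded prompt batch
    max_generated_tokens: int,
    dtype: torch.dtype = torch.bfloat16,
) -> None:
    """Size KV caches from the current batch instead of a fixed tokenizer.max_seq_len.

    Hypothetical helper: the cache only needs to hold the longest (padded) prompt in
    this batch plus the tokens about to be generated, so the recipe no longer depends
    on tokenizer.max_seq_len being set in the config (it previously defaulted to null).
    """
    batch_size, context_len = prompt_tokens.shape
    cache_len = context_len + max_generated_tokens

    # Allocate the caches on the same device as the prompt tokens.
    with torch.device(prompt_tokens.device):
        model.setup_caches(
            batch_size=batch_size,
            dtype=dtype,
            decoder_max_seq_len=cache_len,
        )
```

Because the cache shape now depends on the batch, caches sized for an earlier, shorter batch would need to be torn down or rebuilt before the next generation step; the exact handling in the recipe may differ from this sketch.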