[memory Improvement] delete logits before bwd #1235
Conversation
🔗 Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1235.
✅ No failures as of commit 71920dd with merge base 1157b94. (Generated by Dr. CI; updates every 15 minutes.)
Codecov Report

@@            Coverage Diff             @@
##             main    #1235      +/-   ##
==========================================
+ Coverage   67.81%   71.33%   +3.52%
==========================================
  Files         219      221       +2
  Lines        9908    10013     +105
==========================================
+ Hits         6719     7143     +424
+ Misses       3189     2870     -319
ebsmothers left a comment:
Nice find! I think this should be a free win. Looks like some CI jobs are failing, but otherwise no concerns from me.
Hi @felipemello1, I saw that you added the "nproc8" config to the table. Could you elaborate on what that is? Is it FSDP with 8x GPUs? I'm asking because I see it has a lower memory footprint on FFT.
@musabgultekin you are correct, 8x A100 with FSDP.
@SalmanMohammadi since you are touching some RL recipes, if you have a chance to test it there too and you don't mind, there may be similar gains :)
Hey @felipemello1. Thanks for your work here - a very neat change. I can take a look to see if there's a neat way around this. As an aside, how did the improvements you saw in memory usage scale with batch size/max seq len?
I think that doing .detach().cpu() should do the trick? But then it may make the code a bit ugly/weird. Before going down the rabbit hole of how to organize the code, I guess we could just test it, deleting the metrics logging, and see if it impacts memory?
13GB less for 24k sequence len with QLoRA, bsz=1, but I don't have a graph showing the % diff for multiple seq_len and bsz :/
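A minimal sketch of the .detach().cpu() idea discussed above, assuming a hypothetical single-device step; `model`, `batch`, and `loss_fn` are placeholders rather than torchtune recipe attributes:

```python
import torch

def training_step(model, batch, loss_fn):
    tokens, labels = batch["tokens"], batch["labels"]
    logits = model(tokens)  # [b, s, vocab] -- the large activation

    loss = loss_fn(logits, labels)

    # Keep any logits-derived metric as a detached CPU scalar so logging
    # still works after the big tensor is released.
    logit_mean_for_logging = logits.detach().mean().cpu()

    # Drop the Python reference before backward so the memory can be
    # reclaimed as soon as autograd no longer needs it.
    del logits
    loss.backward()

    return loss, logit_mean_for_logging
```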
Thanks so much for pointing me towards this @felipemello1. I made a hopefully minimal change to maintain the logging behaviour whilst deleting the unnecessary logits. This only affects DPO, since RLHF is still in the works, but I've done my best to be ruthless about freeing memory wherever I can when implementing RLHF. If you're interested, I wouldn't say no to a review in the relevant functions :)

Context
What is the purpose of this PR?
Inspired by #1046
Releasing the logits before the backward pass frees memory and reduces peak allocated memory.
Changelog
Delete the logits before backward for the LoRA/FFT/QAT recipes. I didn't do it for RL, since there is a bit more complexity there. LoRA distributed already had it. See the sketch below for the general shape of the change.
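A simplified, hypothetical sketch of the pattern (the real recipes shift labels and reshape the logits differently; `model`, `tokens`, and `labels` here are placeholders):

```python
import torch
import torch.nn.functional as F

def step(model: torch.nn.Module, tokens: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    logits = model(tokens)  # [b, s, vocab_size]

    # Flatten to [b * s, vocab_size] vs. [b * s] for cross entropy.
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))

    # The full-vocabulary logits are no longer referenced after the loss is
    # computed; deleting the reference before backward lowers peak allocated memory.
    del logits

    loss.backward()
    return loss
```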
Test plan
Ran it for 5 epochs with and without the change: same loss and tok/s, but lower memory.
- pre-commit hooks (`pre-commit install`)
- `pytest tests`
- `pytest tests -m integration_test`
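One way to reproduce the memory comparison (this is not the PR's actual benchmarking setup, just PyTorch's built-in CUDA memory stats):

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the recipe for a fixed number of steps here ...

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak allocated memory: {peak_gib:.2f} GiB")
```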