HPU support #36424
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker @muellerzr PR is ready for review. I made sure the (trainer, fsdp, deepspeed) tests ran successfully on both Gaudi1 and Gaudi2, in single- and multi-device settings.
tests/fsdp/test_fsdp.py
Outdated
# the file doesn't exist in the repo
if not os.path.exists("utils/testing_scripts/fsdp_cpu_offloading.py"):
    raise unittest.SkipTest("FSDP CPU offloading script not found!")
couldn't find this file, is this test still relevant?
no idea cc @muellerzr
I think it's meant to be:

from functools import partial

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from accelerate import Accelerator

# verify we have FSDP activation support ready by importing:
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    checkpoint_wrapper,
    CheckpointImpl,
    apply_activation_checkpointing,
)
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

model_id = "HuggingFaceM4/tiny-random-Llama3ForCausalLM"
model = AutoModelForCausalLM.from_pretrained(model_id)
model.train()
model.gradient_checkpointing_enable()

accelerator = Accelerator()
model = accelerator.prepare(model)

check_fn = lambda submodule: isinstance(submodule, LlamaDecoderLayer)
non_reentrant_wrapper = partial(
    checkpoint_wrapper,
    offload_to_cpu=False,
    checkpoint_impl=CheckpointImpl.NO_REENTRANT,
)
apply_activation_checkpointing(
    model, checkpoint_wrapper_fn=non_reentrant_wrapper, check_fn=check_fn
)

print(model)
rand_input = torch.LongTensor([[0, 1, 0, 1]]).to(0)
model(rand_input)

Was referenced in #31161 but never actually added? 😅
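(Presumably such a script would be launched under FSDP with parameter offloading enabled, along the lines of accelerate launch --num_processes 2 --use_fsdp --fsdp_offload_params true utils/testing_scripts/fsdp_cpu_offloading.py; those flags are assumed from accelerate's CLI, not quoted from this PR.)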
Should I leave it for another PR? The file path utils/testing_scripts/fsdp_cpu_offloading.py doesn't make sense in the transformers repo.
ArthurZucker
left a comment
Nice! Missing for me is a bit of doc on:
- what HPU is
- how anyone could run on HPU (a sketch follows below)

But that's it!
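For readers with the same questions: HPU is the accelerator in Intel Gaudi devices, exposed to PyTorch as a torch.device("hpu") backend. A minimal sketch of targeting it, assuming the Intel Gaudi / SynapseAI software stack is installed (the habana_frameworks import is what registers the device; setting PT_HPU_LAZY_MODE before that import is an assumption on my part):

import os

# Eager mode rather than the default lazy mode, per the PR description.
os.environ["PT_HPU_LAZY_MODE"] = "0"

import torch
import habana_frameworks.torch  # noqa: F401  (registers the "hpu" device with PyTorch)

x = torch.ones(2, 2, device="hpu")
print(x.device)  # e.g. hpu:0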
muellerzr
left a comment
Thanks! Added a note for our apparent missing test file 👀
ArthurZucker
left a comment
Let's go!
muellerzr
left a comment
Everything looks good from the Trainer side in my eyes; the only thing we may want is to add an accelerate import check to flag it as a requirement (the release will go live tonight).

Added! Target version is 1.50, right? @muellerzr
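For reference, a minimal sketch of the kind of import/version guard being discussed (the helper name and minimum version below are illustrative, not the actual transformers implementation):

import importlib.metadata
import importlib.util

from packaging import version

MIN_ACCELERATE = "1.5.0"  # hypothetical floor; the thread targets the upcoming release

def require_accelerate(min_version: str = MIN_ACCELERATE) -> None:
    # Fail fast with a clear message if accelerate is missing or too old.
    if importlib.util.find_spec("accelerate") is None:
        raise ImportError("accelerate is required: pip install accelerate")
    installed = importlib.metadata.version("accelerate")
    if version.parse(installed) < version.parse(min_version):
        raise ImportError(f"accelerate>={min_version} is required, found {installed}")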
What does this PR do?
This PR introduces upstream support for the HPU torch device/backend.
It focuses on enabling out-of-the-box support in eager mode (PT_HPU_LAZY_MODE=0), while optimum-habana will continue to enable optimized paths that make use of the lazy mode and the advanced features of the SynapseAI software stack. This is part of a set of three PRs.
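As a rough sketch of the out-of-the-box eager path this PR targets (the checkpoint name is illustrative, and running in eager mode via the environment variable is taken from the PR description; everything else is an assumption, not a quote of the PR's code):

import os

os.environ["PT_HPU_LAZY_MODE"] = "0"  # eager mode, per the PR description

import torch
import habana_frameworks.torch  # noqa: F401  (assumes the Gaudi software stack is installed)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to("hpu")

inputs = tokenizer("Hello from Gaudi", return_tensors="pt").to("hpu")
with torch.no_grad():
    out = model(**inputs)
print(out.logits.shape)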
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.