HPU support #3378 (Merged)
Commits (166 total, all authored by IlyasMoutawwakil):

4f462b0  init
7b51103  style
9d7376e  is_hpu_available
069b88a  fix
cd3cbb9  import habana_frameworks.torch.distributed.hccl
2493abe  style
32cbc88  test
5fd4de2  initialize dist proc group
7f72745  revert
f66c5df  set backend to hccl only if hccl initialization sets a local rank
2a4130d  force backend hccl and multi_hpu type when sure of distributed launch
fa1bc44  style
d3e24c5  pass accelerator tests
00cc283  pas big modeling tests with bigger atol/rtol for accelerators
97081da  fix hpu device count and skip tests requiring hpu:x
ddcb3ca  hpu autocast
6de389c  hpu rng_state
ae9a76b  hpu launch
5b8b0b2  hpu special device placement
a2f8040  hpu launch
6abecdd  rng state
7bc37dc  distributed data loop tests
ef1de61  enforce non contiguity after device memory allocation
1b6905e  pass fsdp tests
defe3fa  enforce pt_hpu_lazy_mode=0 when fsdp testing
9551ce3  pass cli tests
9c84fe7  pass and document grad sync tests
6f00591  pass kwargs handler and autocast tests
c94bfbd  memory utils
61235d3  found source of int64 errors
0896a50  skip some modeling utils tests
e974758  enable int64
ee08748  skip optimizer tests
6f0fbe4  pass checkpointing tests
c5c50c6  pass accelerator tests with safetensors main
34010c9  more hpu stuff
9f75a6e  Merge branch 'main' into hpu-support
e80b484  style
5cacc31  remove PT_HPU_LAZY_MODE and PT_ENABLE_INT64_SUPPORT as they should be…
f006c4e  start testing on gaudi2
19e652a  support fp16 on gaudi2
40d22b1  add testing order
eb37c43  custom hpu fsdp env dict
dc4ca51  fix torch trace malloc
74b307a  test ddp half precision comm hooks
5a6d5ef  fix
5a1c0c9  fix
50d9e71  remove lower bound for hpu
f0579e8  use 0.72 as lower bound
dfc82ec  lower lower bound
176e3d2  order deepspeed tests
6c688d0  fix
b078e90  deepspeed_use_hpu
0dcb46a  assert non lazy mode with offloaded optimizer
5abb1a4  make patching torch with habana frameworks the default
b63a6fa  less of require_non_hpu
36f8794  skip test_multi_device_merge_fsdp_weights for now as it halts
ab5cbb0  skip another flaky test
e318161  format
0c040c3  use habana_visible_modules
6f5977e  patch torch hpu device count
f1e196f  avoid setting HABANA_VISIBLE_MODULES
2772b68  don't play with habana visible devices/modules
7d1ef62  only with hpu
427c313  fixes and skips
be91183  skip
5c0cd84  fix device ids and add some todos
ae1431a  skip offloading with generate()
d383ea5  fix
0b62d52  reduced atol/rtol for hpu
f2504a5  fix
f5cf0d5  tag deepspeed tests that should run first
ac434c2  enable a test path that was skipped
1501105  revert a test that was customized for gaudi1
8b5708e  some patching to enable HABANA_VISIBLE_MODULES
8935766  fix zero3 test
d8301cd  misc
6ce9e3a  test DTensor TP
42775d2  remove gaudi1
788e95f  test
03b391e  style
2247739  comment
07ba582  pass pad_across_processes
647dfab  require_fp16
8e63b29  pass memory utils test
6b1d131  test_ddp_comm_hook
7803291  skip half precision comm hooks on hpu
2883ca1  fix
007d4a8  is_fp16_available
9c12fae  fp16
324d6df  tp as part of integration tests
839c6be  fix
3e548f4  write_basic_config
f67a898  safetensors
f449d3f  local sgd and masked_fill_fwd_i64
79ef8a5  fix num_processes in test_load_states_by_steps
f772b76  fp8 support
6218cec  test
31872f6  Merge branch 'main' into hpu-support
610c68b  fix
347db07  add a workflow
5fc5a2a  Update src/accelerate/accelerator.py
dc7a773  review comments
9606f0d  ci
6b77bc4  style
d556021  comments
e2fe2cc  test
05e6861  habana_frameworks.torch
ef6192c  patch device count
59b51e5  fix
c6731f5  fix
66ec449  require_fp8
28dae91  fix
ec9c562  fix
53f99c3  gaudi 1
5f9928d  remove unnecessary
ddbece5  fixed maskd fill error in transformers
72bd312  style
506d07e  balanced_memory pass on hpu
ae67bcc  remove for now
405b857  run first
27be94c  Apply suggestions from code review
4e0e966  Merge branch 'main' into hpu-support
e2a8d85  style after merge
03e2646  Update src/accelerate/accelerator.py
3ed87c1  Update src/accelerate/utils/transformer_engine.py
2dcab3e  Merge branch 'main' into hpu-support
55b0d3c  empty cache review comments
bd2afc3  test_scirpt.py error messages
75e5b81  AccelerateTestCase for accelerator state cleanup
e5dfad4  test
ed84e7b  add gaudi1 workflow
a05e54a  fp8 avilability
eb0b3a3  fix
7b2650a  reduce batch size
9b227d8  concurrency
8cf20cd  check cuda as well
7c4897b  nits and comments
d0485f1  mark fsdp tests that require_fp16
c37aefd  style
bdae68d  mark deepspeed fp16 tests
d919931  update image
efd2a27  fix
394b687  updated
4f76d2c  better msgs
b3dd375  skip pippy
17d43ab  test
db16287  test on 2 device
e359c01  support up to 1% relative error in test_accelerate
e9cfca4  skip hpu fp16
ac41600  allow for 1 byte differene
8571ef4  revert torch_device change
3115ee4  style
7c6a44a  skip memory release since it's flaky
e8f9a48  add accelerator state cleanup to fixture
3face36  fix
06c1f53  atol
75aaabd  fix
21fca86  more rtol
a99c297  equal grad test
81a37be  revert
92775af  pass pippy on gaudi2 and skip on gaudi1
ce13eeb  enable sd 1.5 test with require fp16
04983cc  added warning on memory release
5efbe8c  don't log warning in memory release as it requires PartialState to be…
4847474  Apply suggestions from code review
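Many of the early commits (is_hpu_available, import habana_frameworks.torch.distributed.hccl, forcing the hccl backend) revolve around detecting Gaudi devices and wiring up the distributed process group. The sketch below only illustrates that pattern, assuming the habana_frameworks package is installed; the function names are hypothetical and this is not the PR's actual code.

```python
import importlib.util

import torch
import torch.distributed as dist


def is_hpu_available() -> bool:
    # Hypothetical check: importing habana_frameworks.torch patches torch
    # in place and exposes a `torch.hpu` namespace, similar to `torch.cuda`.
    if importlib.util.find_spec("habana_frameworks") is None:
        return False
    import habana_frameworks.torch  # noqa: F401

    return hasattr(torch, "hpu") and torch.hpu.is_available()


def init_hpu_process_group() -> None:
    # Importing the hccl module registers the "hccl" collective backend
    # with torch.distributed; rank and world size are read from the usual
    # launcher environment variables (RANK, WORLD_SIZE, ...).
    import habana_frameworks.torch.distributed.hccl  # noqa: F401

    dist.init_process_group(backend="hccl")
```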
New workflow file (diff hunk @@ -0,0 +1,77 @@):

```yaml
name: Gaudi1 tests (scheduled)

on:
  workflow_dispatch:
  schedule:
    - cron: "0 2 * * *"

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  run_gaudi1_tests:
    name: Test on Gaudi1
    runs-on:
      group: aws-dl1-24xlarge

    container:
      image: docker://vault.habana.ai/gaudi-docker/1.20.0/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
      options: --runtime=habana --shm-size=64G --cap-add=sys_nice --env HABANA_VISIBLE_DEVICES=0,1
      env:
        OMPI_MCA_btl_vader_single_copy_mechanism: none
        PT_ENABLE_INT64_SUPPORT: 1
        PT_HPU_LAZY_MODE: 0
        RUN_SLOW: 1

    steps:
      - name: HL-SMI (1)
        run: |
          hl-smi
          echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
          echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"

      - name: Extract HPU visible modules
        id: add-modules
        run: |
          export HABANA_VISIBLE_MODULES=$(hl-smi -Q module_id -f csv,noheader | tr '\n' ',' | sed 's/,$//')
          echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}" >> $GITHUB_ENV

      - name: HL-SMI (2)
        run: |
          hl-smi
          echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
          echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"

      - name: Checkout to Accelerate
        uses: actions/checkout@v4

      - name: Install Accelerate with Transformers & DeepSpeed
        run: |
          pip install -e .[testing] \
            git+https://github.com/HabanaAI/[email protected] \
            git+https://github.com/huggingface/transformers.git@hpu-support

      - name: Run CLI tests
        run: |
          make test_cli

      - name: Run Core tests
        run: |
          make test_core

      - name: Run Big Modeling tests
        run: |
          make test_big_modeling

      - name: Run FSDP integration tests
        run: |
          make test_fsdp

      - name: Run DeepSpeed integration tests
        run: |
          make test_deepspeed

      - name: Run Examples tests
        run: |
          make test_examples
```
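A note on the HL-SMI steps above: HABANA_VISIBLE_DEVICES is fixed to 0,1 by the container options, while the "Extract HPU visible modules" step queries hl-smi for the module IDs of those devices and exports them as HABANA_VISIBLE_MODULES for the later test steps. This appears to mirror the "use habana_visible_modules" and "some patching to enable HABANA_VISIBLE_MODULES" commits, where multi-HPU launches select devices by module ID rather than by device index. The tr/sed pipeline simply joins hl-smi's one-ID-per-line output into a comma-separated list (for example, "0" and "1" on separate lines become "0,1").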
Diff hunk @@ -22,7 +22,7 @@ (adds pytest-order to the test_prod extras):

```diff
@@ -22,7 +22,7 @@
     "ruff ~= 0.6.4",
 ]
 extras["docs"] = []
-extras["test_prod"] = ["pytest>=7.2.0,<=8.0.0", "pytest-xdist", "pytest-subtests", "parameterized"]
+extras["test_prod"] = ["pytest>=7.2.0,<=8.0.0", "pytest-xdist", "pytest-subtests", "parameterized", "pytest-order"]
 extras["test_dev"] = [
     "datasets",
     "diffusers",
```

Contributor comment on the pytest-order addition: "TIL 👀"
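pytest-order is the mechanism behind the ordering-related commits above ("add testing order", "order deepspeed tests", "tag deepspeed tests that should run first", "run first"): it lets individual tests be annotated with an explicit execution order. A minimal sketch of the marker usage, as an illustration rather than the PR's actual tests:

```python
import pytest


@pytest.mark.order("first")
def test_runs_before_the_rest():
    # pytest-order schedules tests marked "first" ahead of unmarked tests.
    assert True


@pytest.mark.order(2)
def test_runs_in_position_two():
    # Numeric indices define a relative ordering among marked tests.
    assert True
```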
Review thread (on including TP in test_core):
"not sure TP should be part of test_core, tell me if you want me to revert this."
"yeah i don't think we want that cc @muellerzr"
"Agreed"