rocr/clr: Add SDMA linear swap copy support #4540
Merged
Pull request overview
Adds SDMA linear swap support across ROCR runtime and ROCclr, enabling gfx94x/gfx95x to submit SDMA swap packets via the existing SDMA prologue/body/epilogue submission path and sharing fan-out logic between copy and swap.
Changes:
- Introduces SDMA `SDMA_PKT_COPY_LINEAR_SWAP` packet definition and enables swap support detection for gfx94x+.
- Adds `BlitSdma::BuildSwapCopyCommand`/`SubmitLinearSwapBody` and routes swap operations through `GpuAgent::DmaCopyFanOutOp`.
- Refactors batch-copy op descriptor field naming/usage (`num_dsts` → `num_entries`) and updates ROCclr batching to emit multi-entry ops.
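The shared fan-out idea in the list above can be pictured with a small mock. This is only a sketch: `MockSdmaEngine`, `FanOutOp`, `Entry`, `OpType` and `Status` are hypothetical names invented for illustration, not the PR's `GpuAgent::DmaCopyFanOutOp` or any ROCR type. It shows one routine handling both op types, with the capability check placed before any per-entry resource is acquired.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; none of these names come from the PR.
enum class OpType { kLinearCopy, kLinearSwap };
enum class Status { kSuccess, kUnsupported, kInvalidArgument };

struct Entry {
  void* dst;
  void* src;
  std::size_t size;
};

// Mock engine that only records what it was asked to do.
struct MockSdmaEngine {
  bool swap_supported = false;
  int signals_allocated = 0;
  int copies = 0;
  int swaps = 0;

  bool SwapSupported() const { return swap_supported; }
  void SubmitCopy(const Entry&) { ++copies; }
  void SubmitSwap(const Entry&) { ++swaps; }
};

// One fan-out routine for both op types. The capability check runs before
// the loop, so an unsupported swap returns without allocating anything.
Status FanOutOp(MockSdmaEngine& engine, OpType type,
                const std::vector<Entry>& entries) {
  if (entries.empty()) return Status::kInvalidArgument;
  if (type == OpType::kLinearSwap && !engine.SwapSupported()) {
    return Status::kUnsupported;
  }
  for (const Entry& e : entries) {
    ++engine.signals_allocated;  // stands in for per-entry signal allocation
    if (type == OpType::kLinearSwap) {
      engine.SubmitSwap(e);
    } else {
      engine.SubmitCopy(e);
    }
  }
  return Status::kSuccess;
}
```

Keeping the op type as a parameter means the loop and its bookkeeping exist once, and only the per-entry submit call differs between copy and swap.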
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h | Updates public batch-copy op docs/fields to use num_entries and describes swap op type. |
| projects/rocr-runtime/runtime/hsa-runtime/core/runtime/hsa_ext_amd.cpp | Updates batch-copy validation logic for num_entries and adds multi-entry SWAP validation. |
| projects/rocr-runtime/runtime/hsa-runtime/core/runtime/amd_gpu_agent.cpp | Refactors SDMA fan-out to be op-type driven and adds swap routing. |
| projects/rocr-runtime/runtime/hsa-runtime/core/runtime/amd_blit_sdma.cpp | Implements swap body submission and builds SDMA swap packets. |
| projects/rocr-runtime/runtime/hsa-runtime/core/inc/sdma_registers.h | Adds SDMA swap sub-op constant and packet struct definition. |
| projects/rocr-runtime/runtime/hsa-runtime/core/inc/amd_gpu_agent.h | Declares swap and generalized fan-out APIs. |
| projects/rocr-runtime/runtime/hsa-runtime/core/inc/amd_blit_sdma.h | Extends SDMA blit interface with swap support and capability query. |
| projects/clr/rocclr/device/rocm/rocblit.cpp | Extends ROCclr batch building to include SWAP and uses num_entries for multi/broadcast. |
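To make the `num_entries` rename and the multi-entry swap validation from the table more concrete, here is a minimal sketch. `BatchOpSketch` and `ValidateMultiEntrySwap` are hypothetical; the real descriptor layout lives in `hsa_ext_amd.h` and is not reproduced here. The field names `src_list`, `dst_list`, `size_list` and `reserved0` are taken from the commit message, but their types, the requirement that `reserved0` be zero, and the per-entry checks are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical descriptor, not the real hsa_ext_amd.h layout.
struct BatchOpSketch {
  union {
    std::uint32_t num_entries;  // new name
    std::uint32_t num_dsts;     // deprecated alias sharing the same storage
  };
  void** src_list;
  void** dst_list;
  const std::size_t* size_list;
  std::uint64_t reserved0;
};

// Returns true when a multi-entry swap descriptor looks well formed.
bool ValidateMultiEntrySwap(const BatchOpSketch& op) {
  if (op.num_entries == 0) return false;  // swap always uses the list form
  if (!op.src_list || !op.dst_list || !op.size_list) return false;
  if (op.reserved0 != 0) return false;  // assumption: reserved must be zero
  for (std::uint32_t i = 0; i < op.num_entries; ++i) {
    // Assumption: every entry needs both buffers and a non-zero size.
    if (!op.src_list[i] || !op.dst_list[i] || op.size_list[i] == 0) {
      return false;
    }
  }
  return true;
}
```

Because both names occupy the same storage, callers that still use the old field name should continue to compile while new code uses `num_entries`.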
b1f46bb to a5865f6
dayatsin-amd approved these changes on Mar 29, 2026
cc45f44 to 29fdc4e
gandryey approved these changes on Mar 30, 2026
- Add SDMA_PKT_COPY_LINEAR_SWAP packet (SDMA_SUBOP_COPY_SWAP=9) with 30-bit count for gfx94X/gfx95X
- Add BlitSdma::BuildSwapCopyCommand and SubmitLinearSwapBody to build and submit swap packets via the existing prologue/body/epilogue path
- Refactor DmaCopyFanOut into DmaCopyFanOutOp parameterised by hsa_amd_memory_copy_op_type_t so copy and swap share the same fan-out logic without code duplication
- Add GpuAgent::DmaCopySwap and wire HSA_AMD_MEMORY_COPY_OP_LINEAR_SWAP in DmaCopyBatch (rejects num_entries==0; swap always uses list form)
- Check SwapSupported() early in DmaCopyFanOutOp before allocating signals to avoid resource leaks on unsupported hardware
- CLR rocrCopyBufferBatch: group swap ops into multi-entry ops using the same MultiArrays struct as linear multi, bypassing broadcast grouping; set src_agent for validation routing; assert src_size == dst_size (asymmetric swap reserved for future use)
- Update hsa_ext_amd.cpp validation for multi-entry swap (num_entries, src_list/dst_list/size_list, reserved0) and single-entry swap
- Rename num_dsts -> num_entries across the public API and all callers; keep num_dsts as deprecated union alias for backward compatibility
- Update LINEAR_SWAP docs to describe both multi-entry and single-entry forms

Made-with: Cursor
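A 30-bit count field caps how many bytes a single swap packet can describe, so a larger transfer would need several packets. The sketch below shows one way such a split could look. It is not taken from the PR: `SplitForCountField`, `Chunk` and `kMaxBytesPerPacket` are hypothetical names, and it assumes the field stores the byte count directly, giving a per-packet limit of 2^30 - 1 bytes. The PR does not state the exact encoding or whether it splits large swaps this way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: the 30-bit field holds the byte count itself.
constexpr std::uint64_t kCountBits = 30;
constexpr std::uint64_t kMaxBytesPerPacket = (1ull << kCountBits) - 1;

// Hypothetical description of one packet's share of the transfer.
struct Chunk {
  std::uint64_t offset;  // byte offset from the start of both buffers
  std::uint64_t bytes;   // bytes covered by this packet
};

// Splits a transfer into pieces that each fit the count field.
std::vector<Chunk> SplitForCountField(std::uint64_t total_bytes) {
  std::vector<Chunk> chunks;
  std::uint64_t offset = 0;
  while (offset < total_bytes) {
    const std::uint64_t bytes =
        std::min(total_bytes - offset, kMaxBytesPerPacket);
    chunks.push_back({offset, bytes});
    offset += bytes;
  }
  return chunks;
}
```

Under this assumption a transfer of exactly `kMaxBytesPerPacket` bytes fits in one packet, and one extra byte spills into a second packet.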
29fdc4e to 8b6226b
bwelton approved these changes on Mar 31, 2026
systems-assistant bot pushed a commit to ROCm/clr that referenced this pull request on Apr 1, 2026
- Add SDMA_PKT_COPY_LINEAR_SWAP packet for gfx94X/gfx95X
- Add BlitSdma::BuildSwapCopyCommand and SubmitLinearSwapBody to build and submit swap packets via the existing prologue/body/epilogue path
- Refactor DmaCopyFanOut into DmaCopyFanOutOp parameterised by hsa_amd_memory_copy_op_type_t so copy and swap share the same fan-out logic without code duplication

[rocm-systems] ROCm/rocm-systems#4540 (commit bdc33c3)
systems-assistant bot pushed a commit to ROCm/ROCR-Runtime that referenced this pull request on Apr 1, 2026, with the same commit message as the ROCm/clr commit above.

[rocm-systems] ROCm/rocm-systems#4540 (commit bdc33c3)
iassiour pushed a commit that referenced this pull request on Apr 1, 2026, with the same three-item commit message as the ROCm/clr commit above.