rocr/clr: Add SDMA linear swap copy support #4540

Merged
saleelk merged 1 commit into develop from users/saleelk/sdmaSwap
Apr 1, 2026

@saleelk saleelk commented Mar 29, 2026

  • Add SDMA_PKT_COPY_LINEAR_SWAP packet for gfx94X/gfx95X
  • Add BlitSdma::BuildSwapCopyCommand and SubmitLinearSwapBody to build and submit swap packets via the existing prologue/body/epilogue path
  • Refactor DmaCopyFanOut into DmaCopyFanOutOp parameterised by hsa_amd_memory_copy_op_type_t so copy and swap share the same fan-out logic without code duplication
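The shared fan-out idea in the last bullet can be illustrated with a small host-side sketch. This is not the ROCR implementation: `OpType`, `Entry`, `CopyBody`, `SwapBody`, and `FanOutOp` are hypothetical names, and plain memory operations stand in for SDMA packet submission. It only shows how one loop can serve both copy and swap when the operation type is a parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for hsa_amd_memory_copy_op_type_t (illustrative only).
enum class OpType { kLinearCopy, kLinearSwap };

// One transfer entry: destination, source, and size in bytes.
struct Entry {
  uint8_t* dst;
  uint8_t* src;
  size_t size;
};

// Per-op "body": the only part that differs between copy and swap.
inline void CopyBody(const Entry& e) { std::memcpy(e.dst, e.src, e.size); }
inline void SwapBody(const Entry& e) {
  std::swap_ranges(e.src, e.src + e.size, e.dst);
}

// Shared fan-out loop: walks every entry once and dispatches on the op type.
// Returns the number of entries that were actually processed.
inline size_t FanOutOp(OpType op, const std::vector<Entry>& entries) {
  size_t submitted = 0;
  for (const Entry& e : entries) {
    if (e.size == 0) continue;  // nothing to transfer for this entry
    switch (op) {
      case OpType::kLinearCopy: CopyBody(e); break;
      case OpType::kLinearSwap: SwapBody(e); break;
    }
    ++submitted;
  }
  return submitted;
}
```

Calling `FanOutOp(OpType::kLinearSwap, entries)` exchanges the contents of each `src`/`dst` pair, while the same call with `OpType::kLinearCopy` overwrites `dst` with `src`. The iteration and bookkeeping are written once.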

Motivation

Technical Details

JIRA ID

Test Plan

Test Result

Submission Checklist

Copilot AI left a comment

Pull request overview

Adds SDMA linear swap support across ROCR runtime and ROCclr, enabling gfx94x/gfx95x to submit SDMA swap packets via the existing SDMA prologue/body/epilogue submission path and sharing fan-out logic between copy and swap.

Changes:

  • Introduces the SDMA_PKT_COPY_LINEAR_SWAP SDMA packet definition and enables swap support detection for gfx94x+.
  • Adds BlitSdma::BuildSwapCopyCommand / SubmitLinearSwapBody and routes swap operations through GpuAgent::DmaCopyFanOutOp.
  • Refactors batch-copy op descriptor field naming/usage (num_dsts -> num_entries) and updates ROCclr batching to emit multi-entry ops.
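The commit message for this PR says num_dsts is kept as a deprecated union alias of num_entries. A minimal sketch of that pattern is below. `CopyOpSketch` is a hypothetical, reduced struct; the real descriptor in hsa_ext_amd.h has more fields and its exact layout is not shown in this PR page.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced op descriptor (illustrative only).
struct CopyOpSketch {
  union {
    uint32_t num_entries;  // preferred name going forward
    uint32_t num_dsts;     // deprecated alias; shares storage with num_entries
  };
};

// The alias must not change the struct's size, so existing binaries that
// were built against the old field name keep the same layout.
static_assert(sizeof(CopyOpSketch) == sizeof(uint32_t),
              "alias must not grow the descriptor");

// New code reads the count through the preferred name.
inline uint32_t EntryCount(const CopyOpSketch& op) { return op.num_entries; }
```

Because both names are the same type at the same offset, old callers that assign `num_dsts` keep compiling against the new header. Reading one member after writing the other is well-defined in C, which is how a C header would be consumed. In strict C++ it relies on behaviour that mainstream compilers support rather than on a guarantee in the standard.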

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h | Updates public batch-copy op docs/fields to use num_entries and describes swap op type. |
| projects/rocr-runtime/runtime/hsa-runtime/core/runtime/hsa_ext_amd.cpp | Updates batch-copy validation logic for num_entries and adds multi-entry SWAP validation. |
| projects/rocr-runtime/runtime/hsa-runtime/core/runtime/amd_gpu_agent.cpp | Refactors SDMA fan-out to be op-type driven and adds swap routing. |
| projects/rocr-runtime/runtime/hsa-runtime/core/runtime/amd_blit_sdma.cpp | Implements swap body submission and builds SDMA swap packets. |
| projects/rocr-runtime/runtime/hsa-runtime/core/inc/sdma_registers.h | Adds SDMA swap sub-op constant and packet struct definition. |
| projects/rocr-runtime/runtime/hsa-runtime/core/inc/amd_gpu_agent.h | Declares swap and generalized fan-out APIs. |
| projects/rocr-runtime/runtime/hsa-runtime/core/inc/amd_blit_sdma.h | Extends SDMA blit interface with swap support and capability query. |
| projects/clr/rocclr/device/rocm/rocblit.cpp | Extends ROCclr batch building to include SWAP and uses num_entries for multi/broadcast. |

@saleelk saleelk force-pushed the users/saleelk/sdmaSwap branch 2 times, most recently from b1f46bb to a5865f6 Compare March 29, 2026 04:59
@saleelk saleelk force-pushed the users/saleelk/sdmaSwap branch from cc45f44 to 29fdc4e Compare March 30, 2026 20:31
- Add SDMA_PKT_COPY_LINEAR_SWAP packet (SDMA_SUBOP_COPY_SWAP=9) with
  30-bit count for gfx94X/gfx95X
- Add BlitSdma::BuildSwapCopyCommand and SubmitLinearSwapBody to build
  and submit swap packets via the existing prologue/body/epilogue path
- Refactor DmaCopyFanOut into DmaCopyFanOutOp parameterised by
  hsa_amd_memory_copy_op_type_t so copy and swap share the same
  fan-out logic without code duplication
- Add GpuAgent::DmaCopySwap and wire HSA_AMD_MEMORY_COPY_OP_LINEAR_SWAP
  in DmaCopyBatch (rejects num_entries==0; swap always uses list form)
- Check SwapSupported() early in DmaCopyFanOutOp before allocating
  signals to avoid resource leaks on unsupported hardware
- CLR rocrCopyBufferBatch: group swap ops into multi-entry ops using
  the same MultiArrays struct as linear multi, bypassing broadcast
  grouping; set src_agent for validation routing; assert src_size ==
  dst_size (asymmetric swap reserved for future use)
- Update hsa_ext_amd.cpp validation for multi-entry swap (num_entries,
  src_list/dst_list/size_list, reserved0) and single-entry swap
- Rename num_dsts -> num_entries across the public API and all callers;
  keep num_dsts as deprecated union alias for backward compatibility
- Update LINEAR_SWAP docs to describe both multi-entry and single-entry
  forms

Made-with: Cursor
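The commit message notes that the swap packet carries a 30-bit count. One consequence is that a large transfer may need to be split across several packets. The sketch below shows that splitting under stated assumptions: the count field is taken to hold a byte count directly, so one packet covers at most 2^30 - 1 bytes. The real encoding may differ, and the PR page does not say how the runtime chunks transfers. `SwapChunk`, `SplitSwap`, and `kMaxSwapBytesPerPacket` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: a 30-bit count field limits one packet to (2^30 - 1) bytes.
constexpr uint64_t kMaxSwapBytesPerPacket = (1ull << 30) - 1;

// One packet's worth of work: byte offset into the buffers and length.
struct SwapChunk {
  uint64_t offset;
  uint64_t bytes;
};

// Splits a transfer of `total` bytes into chunks no larger than `max_bytes`.
// Returns an empty list when there is nothing to transfer or the limit is 0.
inline std::vector<SwapChunk> SplitSwap(
    uint64_t total, uint64_t max_bytes = kMaxSwapBytesPerPacket) {
  std::vector<SwapChunk> chunks;
  if (max_bytes == 0) return chunks;
  uint64_t offset = 0;
  while (offset < total) {
    const uint64_t bytes = std::min(total - offset, max_bytes);
    chunks.push_back({offset, bytes});
    offset += bytes;
  }
  return chunks;
}
```

Each chunk would then map to one swap packet whose source and destination addresses are advanced by `offset`.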
@saleelk saleelk force-pushed the users/saleelk/sdmaSwap branch from 29fdc4e to 8b6226b Compare March 31, 2026 21:40
@saleelk saleelk requested review from a team as code owners March 31, 2026 21:40
@saleelk saleelk merged commit bdc33c3 into develop Apr 1, 2026
68 of 77 checks passed
@saleelk saleelk deleted the users/saleelk/sdmaSwap branch April 1, 2026 16:51
systems-assistant bot pushed a commit to ROCm/clr that referenced this pull request Apr 1, 2026
[rocm-systems] ROCm/rocm-systems#4540 (commit bdc33c3)
systems-assistant bot pushed a commit to ROCm/ROCR-Runtime that referenced this pull request Apr 1, 2026
[rocm-systems] ROCm/rocm-systems#4540 (commit bdc33c3)
iassiour pushed a commit that referenced this pull request Apr 1, 2026