@nieubank
Contributor

Description

This commit adds the OrtExternalResourceImporter implementation for the NvTensorRtRtx execution provider, enabling zero-copy D3D12-to-CUDA memory sharing and GPU synchronization.

Implementation:

  • NvTrtRtxExternalResourceImporterImpl: Full implementation of the OrtExternalResourceImporter interface using CUDA Driver APIs
  • Memory import: cuImportExternalMemory for D3D12_RESOURCE and D3D12_HEAP
  • Semaphore import: cuImportExternalSemaphore for D3D12_FENCE
  • Tensor creation: CreateTensorFromMemory wraps imported CUDA device pointers
  • Synchronization: WaitSemaphore/SignalSemaphore using cuWaitExternalSemaphoresAsync/cuSignalExternalSemaphoresAsync

Tests (nv_external_resource_importer_test.cc):

  • CreateExternalResourceImporter: Basic importer creation
  • CanImportMemoryCapabilities: D3D12 Resource/Heap capability queries
  • CanImportSemaphoreCapabilities: D3D12 Fence capability queries
  • ImportD3D12SharedResource: Memory import validation
  • CreateTensorFromImportedMemory: Tensor creation with CUDA device ptr verification
  • ImportD3D12Fence: Semaphore import validation
  • WaitAndSignalSemaphore: Bidirectional D3D12-CUDA sync
  • FullInferenceWithExternalMemory: E2E test with ReLU model verifying D3D12 upload -> CUDA inference -> D3D12 readback pipeline
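
The ordering that WaitAndSignalSemaphore and FullInferenceWithExternalMemory exercise can be sketched on the CPU. This is only an illustrative model under stated assumptions: `TimelineFence` and `RunPipeline` are hypothetical names, a `std::vector` stands in for the shared D3D12 resource, and a thread stands in for the CUDA stream. It does not call D3D12, CUDA, or ONNX Runtime. It only shows the monotonically increasing fence-value handshake (upload signals 1, inference waits on 1 and signals 2, readback waits on 2).

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical CPU-only stand-in for a timeline fence (not the D3D12 or CUDA API).
class TimelineFence {
 public:
  // Advance the fence to `value`; the value never moves backwards.
  void Signal(uint64_t value) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      value_ = std::max(value_, value);
    }
    cv_.notify_all();
  }

  // Block until the fence has reached at least `value`.
  void Wait(uint64_t value) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return value_ >= value; });
  }

  uint64_t Completed() {
    std::lock_guard<std::mutex> lock(mu_);
    return value_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  uint64_t value_ = 0;
};

// Models upload -> inference (ReLU) -> readback over one shared buffer.
std::vector<float> RunPipeline(const std::vector<float>& input) {
  TimelineFence fence;
  std::vector<float> shared(input.size());  // stands in for the shared resource

  // "Inference" side: wait for the upload, apply ReLU in place, then signal.
  std::thread inference([&] {
    fence.Wait(1);
    for (float& v : shared) v = std::max(v, 0.0f);
    fence.Signal(2);
  });

  // "Graphics" side: upload, signal, then wait before reading results back.
  std::copy(input.begin(), input.end(), shared.begin());
  fence.Signal(1);
  fence.Wait(2);
  std::vector<float> readback = shared;

  inference.join();
  assert(fence.Completed() == 2);
  return readback;
}
```

Because both sides touch the same buffer without copying it, the fence values are the only thing that keeps the ReLU step from reading before the upload finishes or the readback from running before inference completes.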

Motivation and Context

#26821

@nieubank requested a review from skottmckay on December 18, 2025, 21:05