Fix segfault in splitBatchedState for distributed multi-GPU batched evolution (NVIDIA#3771)

huaweil-nv · 1tnguyen · taalexander · commit 80b06888cd50 · 2026-01-26T21:35:16.000-04:00
This commit fixes a critical bug in the distributed batched state handling
for cudaq.evolve() with store_intermediate_results=ALL on multi-GPU systems.

Root causes fixed:
1. distributeBatchedStateData: Incorrect batch index calculation caused
   out-of-bounds memory access when distributing state data across GPUs.
2. splitBatchedState: Combined the local (per-GPU) dimension with the global
   batch size, yielding a wrong per-state size and wrong state count per GPU.
3. cudm_solver.py: Assumed splitBatchedState returns all batch_size states,
   but in distributed mode it correctly returns only the local subset.
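
Illustration of root cause 2 (a minimal sketch, not the actual C++ code):
the function names below are hypothetical, and the sketch assumes each GPU
holds a whole number of states. It only shows why dividing a per-GPU
element count by the global batch size gives the wrong sizes, and how
tracking the single-state dimension avoids that.

```python
def split_sizes_buggy(local_dimension: int, batch_size: int) -> tuple[int, int]:
    """Hypothetical pre-fix sizing: mixes a local count with a global one."""
    # local_dimension counts only the elements held by this GPU, while
    # batch_size counts states across all GPUs, so the quotient is too small.
    state_size = local_dimension // batch_size
    return state_size, batch_size


def split_sizes_fixed(local_dimension: int, single_state_dimension: int) -> tuple[int, int]:
    """Hypothetical post-fix sizing: derive the local state count instead."""
    if single_state_dimension <= 0:
        raise ValueError("single_state_dimension must be positive")
    if local_dimension % single_state_dimension != 0:
        raise ValueError("local data does not hold a whole number of states")
    num_local_states = local_dimension // single_state_dimension
    return single_state_dimension, num_local_states


# Assumed example: 4 states of dimension 8 split evenly over 2 GPUs,
# so each GPU holds 16 elements.
buggy = split_sizes_buggy(local_dimension=16, batch_size=4)
fixed = split_sizes_fixed(local_dimension=16, single_state_dimension=8)
```

In this assumed example the pre-fix arithmetic describes 4 states of size 4
on a GPU that actually holds 2 states of size 8, which is consistent with
the out-of-range reads described above.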

Changes:
- Add singleStateDimension field to CuDensityMatState to track individual
  state dimension within a batch
- Fix batch index calculation using cuDensityMat API's batchModeLocation
- Update splitBatchedState to use singleStateDimension for correct sizing
- Update Python solver to handle distributed partial results correctly
- Add comprehensive MPI tests for distributed batched evolution scenarios

Signed-off-by: huaweil <huaweil@nvidia.com>
Co-authored-by: Thien Nguyen <58006629+1tnguyen@users.noreply.github.com>
diff --git a/runtime/nvqir/cudensitymat/CuDensityMatState.h b/runtime/nvqir/cudensitymat/CuDensityMatState.h
@@ -31,14 +31,12 @@ class CuDensityMatState : public cudaq::SimulationState {
   // For batched states in distributed mode, dimension < batchSize *
   // singleStateDimension.
   std::size_t singleStateDimension = 0;
-  bool borrowedData = false;
 
 public:
   // Create a state with a size and data pointer.
   // Note: the underlying cudm state is not yet initialized as we don't know the
   // dimensions of sub-systems.
-  // If `borrowed` is true, the state does not own the device data pointer.
-  CuDensityMatState(std::size_t s, void *ptr, bool borrowed = false);
+  CuDensityMatState(std::size_t s, void *ptr);
 
   // Default constructor
   CuDensityMatState() {}