Commit 80b0688
Fix segfault in splitBatchedState for distributed multi-GPU batched evolution (NVIDIA#3771)
This commit fixes a critical bug in the distributed batched state handling
for cudaq.evolve() with store_intermediate_results=ALL on multi-GPU systems.
Root causes fixed:
1. distributeBatchedStateData: Incorrect batch index calculation caused
out-of-bounds memory access when distributing state data across GPUs.
2. splitBatchedState: Used local dimension with global batch size, causing
incorrect state size calculation and wrong number of states per GPU.
3. cudm_solver.py: Assumed splitBatchedState returns all batch_size states,
but in distributed mode it correctly returns only the local subset.
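
To illustrate root cause 2, here is a minimal Python sketch of the sizing rule, not the actual C++ implementation. The helper name `split_batched_state` and the assumption that each rank holds a whole number of contiguous states are hypothetical. The point is that the number of local states comes from the local buffer length and the single-state dimension, never from the global batch size.

```python
def split_batched_state(local_data, single_state_dim):
    """Split a flat local buffer into per-state chunks.

    Hypothetical helper for illustration only. It assumes the local buffer
    stores whole states back to back, each of length single_state_dim.
    """
    if single_state_dim <= 0:
        raise ValueError("single_state_dim must be positive")
    if len(local_data) % single_state_dim != 0:
        raise ValueError("local buffer is not a whole number of states")

    # Derive the local state count from local quantities only.
    num_local_states = len(local_data) // single_state_dim
    return [
        local_data[i * single_state_dim:(i + 1) * single_state_dim]
        for i in range(num_local_states)
    ]


# Example: a global batch of 4 two-element states spread over 2 ranks,
# so this rank holds only 2 of them.
local_buffer = [1.0, 0.0, 0.0, 1.0]
states = split_batched_state(local_buffer, single_state_dim=2)
```

Dividing the local buffer length by the global batch size (4 here) would instead give a state size of 1, which is the kind of mismatch described above.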
Changes:
- Add singleStateDimension field to CuDensityMatState to track individual
state dimension within a batch
- Fix batch index calculation using cuDensityMat API's batchModeLocation
- Update splitBatchedState to use singleStateDimension for correct sizing
- Update Python solver to handle distributed partial results correctly
- Add comprehensive MPI tests for distributed batched evolution scenarios
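
The solver-side change can be sketched in the same spirit. The snippet below is an assumption-laden illustration, not the code in cudm_solver.py: the function names are hypothetical, and the contiguous block partition of batch indices across ranks is assumed rather than taken from the source. It shows a caller accepting only the local subset and mapping it back to global batch indices instead of expecting batch_size states.

```python
def local_batch_range(batch_size, num_ranks, rank):
    """Global batch indices owned by one rank.

    Assumes a contiguous block partition; the real layout may differ.
    """
    base, extra = divmod(batch_size, num_ranks)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return range(start, start + count)


def label_local_results(local_states, batch_size, num_ranks, rank):
    """Map the states returned on this rank to their global batch indices."""
    indices = local_batch_range(batch_size, num_ranks, rank)
    # The caller must expect the local count, not batch_size.
    if len(local_states) != len(indices):
        raise ValueError(
            f"expected {len(indices)} local states, got {len(local_states)}"
        )
    return dict(zip(indices, local_states))


# Example: 5 states over 2 ranks; rank 1 owns global indices 3 and 4.
labeled = label_local_results(["s3", "s4"], batch_size=5, num_ranks=2, rank=1)
```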
Signed-off-by: huaweil <huaweil@nvidia.com>
Co-authored-by: Thien Nguyen <58006629+1tnguyen@users.noreply.github.com>
1 parent 40ed2d8 commit 80b0688
1 file changed, +1 -3 lines changed