[FIX] Uninitialized access fixes + improved initcheck error reporting #348

aliceb-nv · 2025-08-25T17:19:56Z

Up until now, compute-sanitizer's initcheck tool was unusable on the codebase due to the numerous false positives generated by CCCL and cuSPARSE. Recent CUDA 12.8 improvements now allow code to exclude such false positives more finely by marking memory ranges as treat-initialized.

This PR contains a few uninitialized access fixes throughout the codebase, and silences common false positives.
In a future PR, automated initcheck runs could be added to CI (as is already done for CUB in CCCL).

copy-pr-bot · 2025-08-25T17:20:05Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

aliceb-nv · 2025-08-26T11:02:50Z

/ok to test 1a97bc6

github-actions · 2025-09-04T09:07:06Z

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

akifcorduk · 2025-09-04T15:02:55Z

ci/compute-sanitizer-suppressions.xml

@@ -0,0 +1,225 @@
+<?xml version="1.0" encoding="utf-8"?>


Do we need this xml?

It is necessary to run compute-sanitizer's initcheck tool (compute-sanitizer --tool initcheck) without false positives. It will be useful if we include initcheck runs as part of CI in the future (which I think we should consider).
CCCL has a similar file on their repo: https://github.com/NVIDIA/cccl/blob/main/ci/compute-sanitizer-suppressions.xml

akifcorduk · 2025-09-04T15:23:11Z

cpp/src/linear_programming/pdhg.cu

@@ -122,6 +124,10 @@ void pdhg_solver_t<i_t, f_t>::compute_At_y()
 {
  // A_t @ y

+  // cusparse flags a false positive here on the destination tmp buffer, silence it
+  cuopt::mark_span_as_initialized(make_span(current_saddle_point_state_.get_current_AtY()),


Instead of adding this call before every cusparse call, can't we wrap the cusparse calls in a function? I guess we only use a few of them.

Good point, I will add a wrapper for cusparseSpMV.
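A minimal host-only sketch of the wrapper idea discussed above, not the code that landed in the PR. `initcheck_stub` is a hypothetical stand-in for `cuopt::mark_span_as_initialized` (whose full signature is not shown in this thread), and the SpMV call is passed in as a generic callable so the example does not depend on cuSPARSE. It only illustrates that the "mark the destination buffer, then call" sequence can live in one place.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for cuopt::mark_span_as_initialized.
// It only records what was marked so the behaviour can be checked on the host.
struct initcheck_stub {
  std::size_t marked_calls = 0;
  std::size_t marked_bytes = 0;

  void mark_initialized(const void* ptr, std::size_t bytes)
  {
    if (ptr == nullptr || bytes == 0) return;  // same early-out as in the PR's helper
    ++marked_calls;
    marked_bytes += bytes;
  }
};

// Marks the destination buffer as initialized, then forwards to the SpMV callable.
// `spmv` is assumed to take (T* out, std::size_t n) and return a status value.
template <typename T, typename SpMV>
auto spmv_with_initialized_output(initcheck_stub& checker, T* out, std::size_t n, SpMV&& spmv)
{
  assert(out != nullptr || n == 0);
  checker.mark_initialized(out, n * sizeof(T));
  return std::forward<SpMV>(spmv)(out, n);
}
```

With a wrapper of this shape, call sites such as `compute_At_y()` would no longer need their own silencing call; whether that is how the PR's wrapper is structured is an assumption.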

akifcorduk · 2025-09-04T15:23:39Z

cpp/src/mip/diversity/population.cu

@@ -745,6 +745,7 @@ void population_t<i_t, f_t>::print()
    if (index.first == 0 && solutions[0].first) {
      CUOPT_LOG_DEBUG(" Best feasible: %f", solutions[index.first].second.get_user_objective());
    }
+    if (index.first == 0 && !solutions[0].first) continue;


Is this ever triggered, or is it to prevent some static analysis?

IIRC, if the population is empty, the total_excess value is uninitialized (and thus appears in the logs as a very long float value). This is functionally harmless, but triggers a positive in initcheck. I decided to keep this check since it also makes the logs a bit cleaner.

akifcorduk · 2025-09-04T15:24:39Z

cpp/src/mip/local_search/rounding/constraint_prop.cu

@@ -816,6 +816,7 @@ bool constraint_prop_t<i_t, f_t>::is_problem_ii(problem_t<i_t, f_t>& problem)
 {
  bounds_update.calculate_activity_on_problem_bounds(problem);
  bounds_update.calculate_infeasible_redundant_constraints(problem);
+  multi_probe.calculate_activity(problem, problem.handle_ptr);


Why do we need that? Are we using multi_probe activity somewhere without initializing it?

sort_by_implied_slack_consumption uses multi_probe.min/max_activity; I seem to remember that constraint_prop::apply_round is sometimes called with an uninitialized multi_probe activity. I haven't been able to reproduce it with the current main branch, so maybe this has been fixed.
Do you think this is problematic?

cpp/src/mip/presolve/trivial_presolve.cuh

cpp/src/mip/problem/problem.cu

akifcorduk · 2025-09-04T15:33:07Z

cpp/src/mip/relaxed_lp/lp_state.cuh

+
+    // zero-fill the newly allocated space
+    if (prev_primal_size < problem.n_variables) {
+      thrust::fill(problem.handle_ptr->get_thrust_policy(),


Here I would clamp_within_bounds for the remaining vars.
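A host-side sketch of the two options in this thread, using `std::vector` in place of the device buffers. The PR zero-fills the newly allocated tail; the suggestion is to additionally clamp the new entries into their variable bounds. `resize_and_clamp_tail` and its parameters are hypothetical names for illustration and are not taken from lp_state.cuh.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Grows `values` to `new_size` and gives every new entry a defined value:
// zero, clamped into [lower[i], upper[i]]. Existing entries are left untouched.
inline void resize_and_clamp_tail(std::vector<double>& values,
                                  const std::vector<double>& lower,
                                  const std::vector<double>& upper,
                                  std::size_t new_size)
{
  assert(lower.size() >= new_size && upper.size() >= new_size);
  const std::size_t old_size = values.size();

  // std::vector value-fills on resize; a device buffer such as rmm::device_uvector
  // leaves new elements uninitialized, which is why the PR needs an explicit fill.
  values.resize(new_size, 0.0);

  for (std::size_t i = old_size; i < new_size; ++i) {
    assert(lower[i] <= upper[i]);
    values[i] = std::clamp(values[i], lower[i], upper[i]);
  }
}
```

Clamping matters when zero lies outside a variable's bounds: a zero-filled tail is initialized but may not be bound-feasible.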

cpp/src/mip/solution/solution.cu

akifcorduk · 2025-09-04T15:35:16Z

cpp/src/mip/solution/solution.cu

@@ -476,6 +489,8 @@ template <typename i_t, typename f_t>
 f_t solution_t<i_t, f_t>::get_quality(const rmm::device_uvector<f_t>& cstr_weights,
                                      const rmm::device_scalar<f_t>& objective_weight)
 {
+  compute_constraints();


I don't think we need that. get_quality should only be called after compute_feasibility(); instead of adding the call here, we should call compute_feasibility() before the point where the values were uninitialized.

I am not fully confident in this approach, since it puts the responsibility for maintaining an invariant (constraint values are up to date) on the caller. get_quality shouldn't be particularly performance-critical.
What do you think?
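One possible middle ground between the two positions, sketched on a toy host-side class: the callee keeps ownership of the invariant, but only recomputes when the assignment has changed since the last computation. This dirty-flag approach is not proposed by either participant in the thread, and `toy_solution_t` with all its members is hypothetical; the real `solution_t` works on device data.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

class toy_solution_t {
 public:
  explicit toy_solution_t(std::vector<double> assignment) : assignment_(std::move(assignment)) {}

  // Any mutation invalidates the cached constraint value.
  void set_value(std::size_t i, double v)
  {
    assignment_.at(i) = v;
    stale_ = true;
  }

  // Stand-in for compute_constraints(): here simply the sum of the assignment.
  void compute_constraints()
  {
    activity_ = std::accumulate(assignment_.begin(), assignment_.end(), 0.0);
    stale_    = false;
    ++recompute_count_;
  }

  // The callee enforces the invariant, but skips the work when the cache is fresh.
  double get_quality(double weight)
  {
    if (stale_) compute_constraints();
    return weight * activity_;
  }

  int recompute_count() const { return recompute_count_; }

 private:
  std::vector<double> assignment_;
  double activity_     = 0.0;
  bool stale_          = true;
  int recompute_count_ = 0;
};
```

The cost is that every mutating path has to set the flag, which is its own invariant to maintain.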

akifcorduk · 2025-09-04T15:37:23Z

cpp/src/utilities/cuda_helpers.cuh

+
+  if (size == 0 || ptr == nullptr) return;
+
+#if defined(CUDA_API_PER_THREAD_DEFAULT_STREAM)


Isn't CUDA_API_PER_THREAD_DEFAULT_STREAM always defined on newer cudart versions?

I'm not sure 😅
The documentation isn't crystal clear about it:
https://docs.nvidia.com/cuda/cuda-driver-api/stream-sync-behavior.html

github-actions · 2025-09-13T09:06:20Z

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

aliceb-nv added 2 commits August 25, 2025 10:27

merge initcheck changes

e469439

further fixes, add initcheck suppression file

ee493a8

aliceb-nv requested review from a team as code owners August 25, 2025 17:19

aliceb-nv requested a review from AyodeAwe August 25, 2025 17:19

aliceb-nv added the non-breaking (Introduces a non-breaking change) and improvement (Improves an existing functionality) labels Aug 25, 2025

aliceb-nv requested review from akifcorduk and Kh4ster August 25, 2025 17:19

aliceb-nv marked this pull request as draft August 25, 2025 17:20

rgsl888prabhu and others added 2 commits August 26, 2025 11:02

Support cuda 12.9 (#269)

55140df

Support cuda 12.9 ## Issue closes #211 Authors: - Ramakrishnap (https://github.com/rgsl888prabhu) Approvers: - Trevor McKay (https://github.com/tmckayus) URL: #269

workaround to build on 12.9

1a97bc6

anandhkb added this to the 25.10 milestone Aug 26, 2025

Merge branch 'branch-25.10' into initchecks

7e2ca66

aliceb-nv changed the title ~~[DRAFT] Uninitialized access fixes + improved initcheck error reporting~~ Uninitialized access fixes + improved initcheck error reporting Aug 27, 2025

aliceb-nv changed the title ~~Uninitialized access fixes + improved initcheck error reporting~~ [FIX] Uninitialized access fixes + improved initcheck error reporting Aug 27, 2025

aliceb-nv marked this pull request as ready for review August 27, 2025 09:31

tmckayus approved these changes Aug 27, 2025

Merge branch 'branch-25.10' into initchecks

da49032

akifcorduk reviewed Sep 4, 2025

akifcorduk closed this Sep 4, 2025

akifcorduk reopened this Sep 4, 2025

address review comments

267ee2d

