Skip to content

Conversation

aliceb-nv
Copy link
Contributor

Up until now, compute-sanitizer's initcheck tool was unusable on the codebase due to the numerous false positives generated by CCCL and cuSparse. Recent CUDA 12.8 improvements now allow code to exclude such false positives more finely by marking memory ranges as treat-initialized.

This PR contains a few uninitialized access fixes throughout the codebase, and silences common false positives.
In a future PR, automated initcheck runs could be performed by CI (as is the case with CUB in CCCL)

@aliceb-nv aliceb-nv requested review from a team as code owners August 25, 2025 17:19
@aliceb-nv aliceb-nv requested a review from AyodeAwe August 25, 2025 17:19
@aliceb-nv aliceb-nv added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Aug 25, 2025
@aliceb-nv aliceb-nv marked this pull request as draft August 25, 2025 17:20
Copy link

copy-pr-bot bot commented Aug 25, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@aliceb-nv
Copy link
Contributor Author

/ok to test 1a97bc6

@anandhkb anandhkb added this to the 25.10 milestone Aug 26, 2025
@aliceb-nv aliceb-nv changed the title [DRAFT] Uninitialized access fixes + improved initcheck error reporting Uninitialized access fixes + improved initcheck error reporting Aug 27, 2025
@aliceb-nv aliceb-nv changed the title Uninitialized access fixes + improved initcheck error reporting [FIX] Uninitialized access fixes + improved initcheck error reporting Aug 27, 2025
@aliceb-nv aliceb-nv marked this pull request as ready for review August 27, 2025 09:31
Copy link

github-actions bot commented Sep 4, 2025

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

@@ -0,0 +1,225 @@
<?xml version="1.0" encoding="utf-8"?>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this xml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is necessary to run compute-sanitizer --initchecks without false positives. It will be useful if we include initcheck runs as part of CI in the future (which I think we should consider)
CCCL has a similar file on their repo: https://github.com/NVIDIA/cccl/blob/main/ci/compute-sanitizer-suppressions.xml

@@ -122,6 +124,10 @@ void pdhg_solver_t<i_t, f_t>::compute_At_y()
{
// A_t @ y

// cusparse flags a false positive here on the destination tmp buffer, silence it
cuopt::mark_span_as_initialized(make_span(current_saddle_point_state_.get_current_AtY()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding this call before all cusparse calls, can't we wrap cusparse calls in a function. I guess we use few of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I will add a wrapper for cusparsespmv

@@ -745,6 +745,7 @@ void population_t<i_t, f_t>::print()
if (index.first == 0 && solutions[0].first) {
CUOPT_LOG_DEBUG(" Best feasible: %f", solutions[index.first].second.get_user_objective());
}
if (index.first == 0 && !solutions[0].first) continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I this ever triggered or is it to prevent some static analysis?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC ff the population is empty, the total_excess value is uninitialized (and thus this appears in the logs as a very long float value). This is functionally harmless, but triggers a positive in initcheck. I decided to keep this check since it also makes the logs a bit cleaner

@@ -816,6 +816,7 @@ bool constraint_prop_t<i_t, f_t>::is_problem_ii(problem_t<i_t, f_t>& problem)
{
bounds_update.calculate_activity_on_problem_bounds(problem);
bounds_update.calculate_infeasible_redundant_constraints(problem);
multi_probe.calculate_activity(problem, problem.handle_ptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need that? Are we using multi_probe activity somewhere without initializing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort_by_implied_slack_consumption uses multi_probe.min/max_activity; I seem to remember that constraint_prop::apply_round is sometimes called with an uninitialized multi_probe activity. I haven't been able to reproduce it with the current main branch so maybe this has been fixed
Do you think this is problematic?


// zero-fill the newly allocated space
if (prev_primal_size < problem.n_variables) {
thrust::fill(problem.handle_ptr->get_thrust_policy(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here i would clamp_within_bounds for the remaining vars.

@@ -476,6 +489,8 @@ template <typename i_t, typename f_t>
f_t solution_t<i_t, f_t>::get_quality(const rmm::device_uvector<f_t>& cstr_weights,
const rmm::device_scalar<f_t>& objective_weight)
{
compute_constraints();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need that. get_quality should only be called after compute_feasibility(), instead of here we should put the compute_feasibility() before where it was uninitialized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not fully confident in this approach since this puts the responsibility of ensuring an invariant (constraint values are up-to-date) to the caller. get_quality shouldn't really be too performance critical
What do you think?


if (size == 0 || ptr == nullptr) return;

#if defined(CUDA_API_PER_THREAD_DEFAULT_STREAM)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't CUDA_API_PER_THREAD_DEFAULT_STREAM always defined on newer cudart versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure😅
The documentation isn't crystal clear about it
https://docs.nvidia.com/cuda/cuda-driver-api/stream-sync-behavior.html

@akifcorduk akifcorduk closed this Sep 4, 2025
@akifcorduk akifcorduk reopened this Sep 4, 2025
Copy link

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants