-
Notifications
You must be signed in to change notification settings - Fork 71
[FIX] Uninitialized access fixes + improved initcheck error reporting #348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-25.10
Are you sure you want to change the base?
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
Support cuda 12.9 ## Issue closes #211 Authors: - Ramakrishnap (https://github.com/rgsl888prabhu) Approvers: - Trevor McKay (https://github.com/tmckayus) URL: #269
/ok to test 1a97bc6 |
🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you! If this is an "epic" issue, then please add the "epic" label to this issue. |
@@ -0,0 +1,225 @@ | |||
<?xml version="1.0" encoding="utf-8"?> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this xml?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is necessary to run compute-sanitizer --initchecks without false positives. It will be useful if we include initcheck runs as part of CI in the future (which I think we should consider)
CCCL has a similar file on their repo: https://github.com/NVIDIA/cccl/blob/main/ci/compute-sanitizer-suppressions.xml
cpp/src/linear_programming/pdhg.cu
Outdated
@@ -122,6 +124,10 @@ void pdhg_solver_t<i_t, f_t>::compute_At_y() | |||
{ | |||
// A_t @ y | |||
|
|||
// cusparse flags a false positive here on the destination tmp buffer, silence it | |||
cuopt::mark_span_as_initialized(make_span(current_saddle_point_state_.get_current_AtY()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding this call before all cusparse calls, can't we wrap cusparse calls in a function. I guess we use few of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I will add a wrapper for cusparsespmv
@@ -745,6 +745,7 @@ void population_t<i_t, f_t>::print() | |||
if (index.first == 0 && solutions[0].first) { | |||
CUOPT_LOG_DEBUG(" Best feasible: %f", solutions[index.first].second.get_user_objective()); | |||
} | |||
if (index.first == 0 && !solutions[0].first) continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I this ever triggered or is it to prevent some static analysis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC ff the population is empty, the total_excess value is uninitialized (and thus this appears in the logs as a very long float value). This is functionally harmless, but triggers a positive in initcheck. I decided to keep this check since it also makes the logs a bit cleaner
@@ -816,6 +816,7 @@ bool constraint_prop_t<i_t, f_t>::is_problem_ii(problem_t<i_t, f_t>& problem) | |||
{ | |||
bounds_update.calculate_activity_on_problem_bounds(problem); | |||
bounds_update.calculate_infeasible_redundant_constraints(problem); | |||
multi_probe.calculate_activity(problem, problem.handle_ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need that? Are we using multi_probe activity somewhere without initializing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sort_by_implied_slack_consumption
uses multi_probe.min/max_activity; I seem to remember that constraint_prop::apply_round is sometimes called with an uninitialized multi_probe activity. I haven't been able to reproduce it with the current main branch so maybe this has been fixed
Do you think this is problematic?
|
||
// zero-fill the newly allocated space | ||
if (prev_primal_size < problem.n_variables) { | ||
thrust::fill(problem.handle_ptr->get_thrust_policy(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here i would clamp_within_bounds for the remaining vars.
@@ -476,6 +489,8 @@ template <typename i_t, typename f_t> | |||
f_t solution_t<i_t, f_t>::get_quality(const rmm::device_uvector<f_t>& cstr_weights, | |||
const rmm::device_scalar<f_t>& objective_weight) | |||
{ | |||
compute_constraints(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need that. get_quality should only be called after compute_feasibility()
, instead of here we should put the compute_feasibility()
before where it was uninitialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not fully confident in this approach since this puts the responsibility of ensuring an invariant (constraint values are up-to-date) to the caller. get_quality
shouldn't really be too performance critical
What do you think?
|
||
if (size == 0 || ptr == nullptr) return; | ||
|
||
#if defined(CUDA_API_PER_THREAD_DEFAULT_STREAM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't CUDA_API_PER_THREAD_DEFAULT_STREAM always defined on newer cudart versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure😅
The documentation isn't crystal clear about it
https://docs.nvidia.com/cuda/cuda-driver-api/stream-sync-behavior.html
🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you! If this is an "epic" issue, then please add the "epic" label to this issue. |
Up until now, compute-sanitizer's initcheck tool was unusable on the codebase due to the numerous false positives generated by CCCL and cuSparse. Recent CUDA 12.8 improvements now allow code to exclude such false positives more finely by marking memory ranges as treat-initialized.
This PR contains a few uninitialized access fixes throughout the codebase, and silences common false positives.
In a future PR, automated initcheck runs could be performed by CI (as is the case with CUB in CCCL)