-
Notifications
You must be signed in to change notification settings - Fork 71
[FIX] Uninitialized access fixes + improved initcheck error reporting #348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-25.10
Are you sure you want to change the base?
Changes from 6 commits
e469439
ee493a8
55140df
1a97bc6
7e2ca66
da49032
267ee2d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,225 @@ | ||
<?xml version="1.0" encoding="utf-8"?> | ||
<ComputeSanitizerOutput> | ||
<record> | ||
<kind>Initcheck</kind> | ||
<what> | ||
<text>Uninitialized __global__ memory read of size 4 bytes</text> | ||
<size>4</size> | ||
</what> | ||
<where> | ||
<func>.*</func> | ||
</where> | ||
<hostStack> | ||
<frame> | ||
<module>.*libcuda.so.*</module> | ||
</frame> | ||
<frame> | ||
<func>cusparseCsr2cscEx2</func> | ||
<module>.*libcusparse.so.*</module> | ||
</frame> | ||
</hostStack> | ||
</record> | ||
<record> | ||
<kind>Initcheck</kind> | ||
<what> | ||
<text>Uninitialized __global__ memory read of size 4 bytes</text> | ||
<size>4</size> | ||
</what> | ||
<where> | ||
<func>ThreadLoad</func> | ||
</where> | ||
<hostStack> | ||
<frame> | ||
<module>.*libcuda.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>libcudart.*</module> | ||
</frame> | ||
<frame> | ||
<func>cudaLaunchKernel</func> | ||
</frame> | ||
<frame> | ||
<func>.*cub::.*::Device(Segmented)?(Reduce|Scan)(SingleTile)?Kernel.*</func> | ||
</frame> | ||
</hostStack> | ||
</record> | ||
<record> | ||
<kind>Initcheck</kind> | ||
<what> | ||
<text>Uninitialized __global__ memory read of size 2 bytes</text> | ||
<size>2</size> | ||
</what> | ||
<where> | ||
<func>ThreadLoad</func> | ||
</where> | ||
<hostStack> | ||
<frame> | ||
<module>.*libcuda.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>libcudart.*</module> | ||
</frame> | ||
<frame> | ||
<func>cudaLaunchKernel</func> | ||
</frame> | ||
<frame> | ||
<func>.*cub::.*::Device(Segmented)?(Reduce|Scan)(SingleTile)?Kernel.*</func> | ||
</frame> | ||
</hostStack> | ||
</record> | ||
<record> | ||
<kind>Initcheck</kind> | ||
<what> | ||
<text>Uninitialized __global__ memory read of size 8 bytes</text> | ||
<size>8</size> | ||
</what> | ||
<where> | ||
<func>DeviceSegmentedReduceKernel</func> | ||
</where> | ||
</record> | ||
<record> | ||
<kind>Initcheck</kind> | ||
<what> | ||
<text>Uninitialized __global__ memory read of size 4 bytes</text> | ||
<size>4</size> | ||
</what> | ||
<where> | ||
<func>ThreadLoad</func> | ||
</where> | ||
<hostStack> | ||
<frame> | ||
<module>.*libcuda.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>libcudart.*</module> | ||
</frame> | ||
<frame> | ||
<module>libcudart.*</module> | ||
</frame> | ||
<frame> | ||
<module>.*libcuopt.*</module> | ||
</frame> | ||
<frame> | ||
<func>.*Device(Reduce|Scan)Kernel.*</func> | ||
</frame> | ||
</hostStack> | ||
</record> | ||
<!-- Rule matching cccl's pattern of copying tuples back to host after reduce_by_keys, which contain uninitialized padding --> | ||
<!-- Because of aggressive inlining, thrust calls are elided out of the host stack, which prevents a more finely grained rule. In practice this is good enough --> | ||
<record> | ||
<kind>InitcheckApiError</kind> | ||
<level>Error</level> | ||
<what> | ||
<text>Host API uninitialized memory access</text> | ||
<accessSize>16</accessSize> | ||
</what> | ||
<hostStack> | ||
<frame> | ||
<func>cuMemcpyDtoHAsync.*</func> | ||
<module>.*libcuda.so.*</module> | ||
</frame> | ||
</hostStack> | ||
</record> | ||
<!-- Suppress uninit copies on rmm::device_vector copy constructor - often vector members are allocated but not filled --> | ||
<record> | ||
<kind>InitcheckApiError</kind> | ||
<level>Error</level> | ||
<what> | ||
<text>Host API uninitialized memory access</text> | ||
</what> | ||
<hostStack> | ||
<frame> | ||
<func>cuMemcpyAsync</func> | ||
<module>.*libcuda.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>.*libcudart.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>.*libcudart.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>.*libcudart.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>.*librmm.so.*</module> | ||
</frame> | ||
<frame> | ||
<func>rmm::device_buffer::device_buffer</func> | ||
<module>.*librmm.so.*</module> | ||
</frame> | ||
</hostStack> | ||
</record> | ||
<record> | ||
<kind>InitcheckApiError</kind> | ||
<level>Error</level> | ||
<what> | ||
<text>Host API uninitialized memory access</text> | ||
</what> | ||
<hostStack> | ||
<frame> | ||
<func>cuMemcpyAsync</func> | ||
<module>.*libcuda.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>.*libcudart.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>.*libcudart.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>.*libcudart.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>.*librmm.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>.*librmm.so.*</module> | ||
</frame> | ||
<frame> | ||
<func>rmm::device_uvector.*::device_uvector</func> | ||
<module>.*libcuopt.so.*</module> | ||
</frame> | ||
</hostStack> | ||
</record> | ||
<!-- Uninitialized device-to-device copies are usually harmless - if actualy bogus, errors may be caught later on --> | ||
<record> | ||
<kind>InitcheckApiError</kind> | ||
<level>Error</level> | ||
<what> | ||
<text>Host API uninitialized memory access</text> | ||
</what> | ||
<hostStack> | ||
<frame> | ||
<func>cuMemcpyDtoDAsync.*</func> | ||
<module>.*libcuda.so.*</module> | ||
</frame> | ||
</hostStack> | ||
</record> | ||
<record> | ||
<kind>InitcheckApiError</kind> | ||
<level>Error</level> | ||
<what> | ||
<text>Host API uninitialized memory access</text> | ||
</what> | ||
<hostStack> | ||
<frame> | ||
<func>cuMemcpyAsync</func> | ||
<module>.*libcuda.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>.*libcudart.so.*</module> | ||
</frame> | ||
<frame> | ||
<module>.*libcudart.so.*</module> | ||
</frame> | ||
<frame> | ||
<func>cudaMemcpyAsync</func> | ||
</frame> | ||
<frame> | ||
<func>rmm::device_buffer::resize</func> | ||
<module>.*librmm.so.*</module> | ||
</frame> | ||
</hostStack> | ||
</record> | ||
</ComputeSanitizerOutput> |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,8 @@ | |
#include <linear_programming/utilities/ping_pong_graph.cuh> | ||
#include <linear_programming/utils.cuh> | ||
#include <mip/mip_constants.hpp> | ||
#include <utilities/copy_helpers.hpp> | ||
#include <utilities/cuda_helpers.cuh> | ||
|
||
#include <raft/sparse/detail/cusparse_macros.h> | ||
#include <raft/sparse/detail/cusparse_wrappers.h> | ||
|
@@ -122,6 +124,10 @@ void pdhg_solver_t<i_t, f_t>::compute_At_y() | |
{ | ||
// A_t @ y | ||
|
||
// cusparse flags a false positive here on the destination tmp buffer, silence it | ||
cuopt::mark_span_as_initialized(make_span(current_saddle_point_state_.get_current_AtY()), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Instead of adding this call before all cusparse calls, can't we wrap cusparse calls in a function. I guess we use few of them. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point, I will add a wrapper for cusparsespmv |
||
handle_ptr_->get_stream()); | ||
|
||
RAFT_CUSPARSE_TRY(raft::sparse::detail::cusparsespmv(handle_ptr_->get_cusparse_handle(), | ||
CUSPARSE_OPERATION_NON_TRANSPOSE, | ||
reusable_device_scalar_value_1_.data(), | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -745,6 +745,7 @@ void population_t<i_t, f_t>::print() | |
if (index.first == 0 && solutions[0].first) { | ||
CUOPT_LOG_DEBUG(" Best feasible: %f", solutions[index.first].second.get_user_objective()); | ||
} | ||
if (index.first == 0 && !solutions[0].first) continue; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I this ever triggered or is it to prevent some static analysis? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IIRC ff the population is empty, the total_excess value is uninitialized (and thus this appears in the logs as a very long float value). This is functionally harmless, but triggers a positive in initcheck. I decided to keep this check since it also makes the logs a bit cleaner |
||
CUOPT_LOG_DEBUG("%d : %f\t%f\t%f\t%d", | ||
i, | ||
index.second, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -816,6 +816,7 @@ bool constraint_prop_t<i_t, f_t>::is_problem_ii(problem_t<i_t, f_t>& problem) | |
{ | ||
bounds_update.calculate_activity_on_problem_bounds(problem); | ||
bounds_update.calculate_infeasible_redundant_constraints(problem); | ||
multi_probe.calculate_activity(problem, problem.handle_ptr); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we need that? Are we using multi_probe activity somewhere without initializing it? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
bool problem_ii = bounds_update.infeas_constraints_count > 0; | ||
return problem_ii; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this xml?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is necessary to run compute-sanitizer --initchecks without false positives. It will be useful if we include initcheck runs as part of CI in the future (which I think we should consider)
CCCL has a similar file on their repo: https://github.com/NVIDIA/cccl/blob/main/ci/compute-sanitizer-suppressions.xml