Re-enabling precomputed kNN on host for UMAP #7915
viclafargue wants to merge 4 commits into rapidsai:main
Conversation
Summary by CodeRabbit
Walkthrough: C++ now checks CUDA pointer attributes for precomputed KNN graph pointers and conditionally copies host-accessible KNN indices/distances into the launcher outputs; a RAFT CUDA error wrapper include was added. Python docs were fixed, and extraction now preserves the input memory type.
Actionable comments posted: 3
Inline comments:
In `@cpp/include/cuml/manifold/common.hpp`:
- Around lines 109-117: the code reads cudaPointerAttributes (attr) from
cudaPointerGetAttributes for knn_graph.knn_indices and knn_graph.knn_dists
without checking the return status, which can leave attr unchanged on failure.
Wrap both cudaPointerGetAttributes calls with RAFT_CUDA_TRY (or an equivalent)
so the CUDA return value is checked immediately, inspect attr only after a
successful call, and handle failures (e.g., return true or propagate the error)
before using attr to classify the memory type.
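A host-only sketch of the pattern this comment suggests. The names `mock_error`, `mock_attributes`, `check_status`, and `is_host_memory` are stand-ins invented here so the classification logic can compile without a GPU; in the actual code they would correspond to `cudaError_t`, `cudaPointerAttributes`, `RAFT_CUDA_TRY`, and the helper in `common.hpp`:

```cpp
#include <cassert>
#include <stdexcept>

// Host-only stand-ins for the CUDA runtime types (cudaError_t, cudaMemoryType,
// cudaPointerAttributes), so the decision logic can be shown without a GPU.
enum mock_error { mock_success = 0, mock_invalid_value = 1 };
enum mock_memory_type { mem_unregistered, mem_host, mem_device, mem_managed };
struct mock_attributes { mock_memory_type type = mem_unregistered; };

// Mimics the RAFT_CUDA_TRY pattern: check the return status immediately.
inline void check_status(mock_error err) {
  if (err != mock_success) throw std::runtime_error("pointer attributes query failed");
}

// Classify a pointer as host-side unless the attributes query succeeded AND
// reported device or managed memory. On failure we conservatively return true
// ("host"), so the caller falls back to copying the KNN arrays.
bool is_host_memory(mock_error status, const mock_attributes& attr) {
  try {
    check_status(status);
  } catch (const std::runtime_error&) {
    return true;
  }
  return attr.type != mem_device && attr.type != mem_managed;
}
```

The key point is that `attr` is only inspected after the status check, and a failed query degrades to the safe path (a copy) rather than reading indeterminate attributes.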
In `@cpp/src/umap/knn_graph/algo.cuh`:
- Around lines 220-223: the copy length passed to raft::copy uses int
arithmetic (inputsA.n * n_neighbors), which can overflow for large graphs.
Cast one operand to size_t (e.g., static_cast<size_t>(inputsA.n) * n_neighbors)
so the multiplication is performed in size_t when calling raft::copy for
out.knn_indices and out.knn_dists; apply the same change in both
specializations that copy from inputsA.knn_graph.knn_indices/knn_dists.
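A minimal illustration of the overflow the comment describes, on a typical 64-bit platform. The sizes below are hypothetical and mirror the snippet's `inputsA.n` and `n_neighbors`; the `raft::copy` call itself is not reproduced:

```cpp
#include <climits>
#include <cstddef>

// Hypothetical sizes: ~70M rows with 32 neighbors each.
const int inputsA_n   = 70'000'000;
const int n_neighbors = 32;

// Computing inputsA_n * n_neighbors in int would need 2'240'000'000, which
// exceeds INT_MAX (2'147'483'647) and is signed-overflow undefined behavior.
// Casting one operand first performs the multiplication in size_t; this is
// the length that should be passed to raft::copy.
const std::size_t copy_len = static_cast<std::size_t>(inputsA_n) * n_neighbors;
```

Casting only one operand is enough: the other operand is promoted to `size_t` by the usual arithmetic conversions before the multiply.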
In `@python/cuml/cuml/manifold/umap/umap.pyx`:
- Around lines 763-765: fix the spelling mistake in the public docstring in
python/cuml/cuml/manifold/umap/umap.pyx by replacing "embeedings" with
"embeddings" in the sentence "should match the metric used to train the UMAP
embeedings." so the generated API docs are correct, and run codespell (or the
repository's spelling check) to verify no other typos remain.
📒 Files selected for processing (3)
- cpp/include/cuml/manifold/common.hpp
- cpp/src/umap/knn_graph/algo.cuh
- python/cuml/cuml/manifold/umap/umap.pyx
Related issue: #7143
jinsolp left a comment:
Thanks @viclafargue! This looks like a more conservative version of the reverted PR, making it more robust against accessing non-device-accessible arrays in the kernel.
I have one question:
From cpp/include/cuml/manifold/common.hpp (outdated):

```cpp
if (attr.devicePointer == nullptr ||
    (attr.type != cudaMemoryTypeDevice && attr.type != cudaMemoryTypeManaged))
```
I'm worried about the behavior of this on HMM-configure machines (which was what made us revert the previous change).
Will attr.devicePointer always have a value as long as the pts is device-accessible?
aamijar replied, quoting the cudaPointerGetAttributes documentation:

> Returns in *attributes the attributes of the pointer ptr. If pointer was not allocated in, mapped by or registered with context supporting unified addressing cudaErrorInvalidValue is returned.

Maybe we should check for this case.
viclafargue replied:

@jinsolp If my understanding is correct, we want to use the precomputed KNN directly when it is actual device (or, exceptionally, managed) memory, and perform a copy in any other case. I agree that checking the value of devicePointer is not strictly necessary, since the real check is done on the memory type. Whether or not the machine is configured with HMM/ATS, host memory would be detected as either unregistered or host. I pushed a simplified version of the check.
Here is a little experiment I ran on my GPU configured with HMM:
| Allocation | type | devicePointer | hostPointer | device |
| --- | --- | --- | --- | --- |
| malloc | Unregistered | set | set | -1 |
| cudaMallocHost | Host | set | set | 0 |
| cudaHostAlloc(mapped) | Host | set | set | 0 |
| cudaMalloc | Device | set | NULL | 0 |
| cudaMallocManaged | Managed | set | set | 0 |
| cudaHostRegister'd malloc | Host | set | set | 0 |
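For reference, a sketch of the kind of probe that produces rows like the ones above. The field names follow the `cudaPointerAttributes` struct from the CUDA runtime; this requires the CUDA toolkit and a device to run, and is only a sketch, not the exact program used for the table:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Query and print the attributes of one pointer, one table row per call.
static void probe(const char* label, const void* ptr) {
  cudaPointerAttributes attr{};
  cudaError_t err = cudaPointerGetAttributes(&attr, ptr);
  if (err != cudaSuccess) {
    // Pre-CUDA 11, plain malloc'd pointers returned cudaErrorInvalidValue here.
    std::printf("%-28s | query failed: %s\n", label, cudaGetErrorString(err));
    cudaGetLastError();  // clear the sticky error
    return;
  }
  const char* type =
    attr.type == cudaMemoryTypeDevice  ? "Device"  :
    attr.type == cudaMemoryTypeManaged ? "Managed" :
    attr.type == cudaMemoryTypeHost    ? "Host"    : "Unregistered";
  std::printf("%-28s | type=%s devicePointer=%s hostPointer=%s device=%d\n",
              label, type,
              attr.devicePointer ? "set" : "NULL",
              attr.hostPointer   ? "set" : "NULL",
              attr.device);
}

int main() {
  void *h = std::malloc(64), *ph = nullptr, *d = nullptr, *m = nullptr;
  cudaMallocHost(&ph, 64);
  cudaMalloc(&d, 64);
  cudaMallocManaged(&m, 64);
  probe("malloc", h);
  probe("cudaMallocHost", ph);
  probe("cudaMalloc", d);
  probe("cudaMallocManaged", m);
  cudaFree(d); cudaFree(m); cudaFreeHost(ph); std::free(h);
  return 0;
}
```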
@aamijar Thanks for spotting this, but it looks like the issue is limited to earlier versions of CUDA (pre-11.0). Since the lowest version we support is CUDA 12.9, the check may not be necessary.
jinsolp left a comment:
Approving! I believe checking only the attr type will make this more robust compared to what we had previously.
Restores the feature introduced in #7481.
(testing CI for now)