
Re-enabling precomputed kNN on host for UMAP #7915

Open
viclafargue wants to merge 4 commits into rapidsai:main from viclafargue:umap-precomputed-knn-on-host2

Conversation

@viclafargue
Contributor

Restores the feature introduced in #7481.

(testing CI for now)

@copy-pr-bot

copy-pr-bot bot commented Mar 19, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@github-actions github-actions bot added the Cython / Python (Cython or Python issue) and CUDA/C++ labels Mar 19, 2026
@viclafargue viclafargue added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Mar 19, 2026
@viclafargue viclafargue marked this pull request as ready for review March 19, 2026 17:31
@viclafargue viclafargue requested review from a team as code owners March 19, 2026 17:31
@coderabbitai

coderabbitai bot commented Mar 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e875e00c-7950-45ad-a771-b178e3e46e03

📥 Commits

Reviewing files that changed from the base of the PR and between b837710 and b78210f.

📒 Files selected for processing (1)
  • cpp/include/cuml/manifold/common.hpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • cpp/include/cuml/manifold/common.hpp

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of precomputed KNN graphs: now validates pointer memory type and conditionally stages or copies graph data to the active device to ensure correct memory usage.
  • Documentation

    • Fixed typos and clarified that precomputed KNN graphs should be provided as CPU-accessible arrays (e.g., NumPy) for more efficient memory usage.

Walkthrough

C++ now checks CUDA pointer attributes for precomputed KNN graph pointers and conditionally copies host-accessible KNN indices/distances into launcher outputs; a RAFT CUDA error wrapper include was added. Python docs were fixed and extraction now preserves the input memory type.

Changes

Cohort / File(s) Summary
CUDA Memory Validation
cpp/include/cuml/manifold/common.hpp
manifold_precomputed_knn_inputs_t::alloc_knn_graph() now calls cudaPointerGetAttributes (via RAFT_CUDA_TRY) to detect device/managed pointers and returns accordingly; added raft/core/error.hpp include.
Conditional KNN Data Population
cpp/src/umap/knn_graph/algo.cuh
Launcher specializations for precomputed KNN inputs now copy knn_indices/knn_dists into out with raft::copy when alloc_knn_graph() indicates copying is needed; otherwise keep direct pointer assignment.
Docs & Memory Handling
python/cuml/cuml/manifold/umap/umap.pyx
Docstrings corrected ("embeddings") and updated to recommend CPU-accessible precomputed KNN arrays; extract_knn_graph(...) now called with mem_type=False to preserve input memory type.
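As a rough illustration of the workflow the docs change above recommends (passing the precomputed kNN graph as CPU-accessible arrays), here is a minimal NumPy sketch that builds such a graph on the host. The brute-force computation is purely illustrative, and the `precomputed_knn=(indices, dists)` usage mentioned in the closing comment is an assumption about the cuML UMAP API, not taken from this PR's diff.

```python
import numpy as np

def brute_force_knn(X, n_neighbors):
    """Brute-force kNN on host memory: returns (indices, distances),
    each of shape (n_samples, n_neighbors), as NumPy arrays."""
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (X * X).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clamp tiny negatives from rounding
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    dist = np.sqrt(np.take_along_axis(d2, idx, axis=1))
    return idx.astype(np.int64), dist.astype(np.float32)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8)).astype(np.float32)
knn_indices, knn_dists = brute_force_knn(X, n_neighbors=15)

# With this PR, host (NumPy) arrays like these could presumably be passed
# to cuML's UMAP, e.g. UMAP(precomputed_knn=(knn_indices, knn_dists)),
# without first moving them to the GPU: the C++ side now detects the
# pointer's memory type and copies to the device only when needed.
```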

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • jinsolp
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): the title accurately summarizes the main change, re-enabling precomputed kNN on host for UMAP, which is reflected in all three modified files.
  • Description check (✅ Passed): the description relates to the changeset by noting that it restores a feature from PR #7481, which aligns with the precomputed kNN functionality changes across the codebase.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3


Inline comments:
In `@cpp/include/cuml/manifold/common.hpp`:
- Around line 109-117: The code reads cudaPointerAttributes (attr) from
cudaPointerGetAttributes for knn_graph.knn_indices and knn_graph.knn_dists
without checking return status, which can leave attr unchanged on failure; wrap
both cudaPointerGetAttributes calls with RAFT_CUDA_TRY (or equivalent) to check
the CUDA return value immediately and only inspect attr after a successful call,
and ensure failures are handled (e.g., return true or propagate) before using
attr to classify memory type.

In `@cpp/src/umap/knn_graph/algo.cuh`:
- Around line 220-223: The product passed to raft::copy uses int arithmetic
(inputsA.n * n_neighbors) which can overflow for large graphs; change the
copy-length argument to a size_t by casting one operand (e.g.,
static_cast<size_t>(inputsA.n) * n_neighbors) when calling raft::copy for
out.knn_indices and out.knn_dists so the multiplication is performed in size_t;
apply the same change in both specializations that copy from inputsA.knn_graph
(the calls to raft::copy involving out.knn_indices/out.knn_dists and
inputsA.knn_graph.knn_indices/knn_dists).

In `@python/cuml/cuml/manifold/umap/umap.pyx`:
- Around line 763-765: Fix the spelling mistake in the public docstring in
python/cuml/cuml/manifold/umap/umap.pyx by replacing "embeedings" with
"embeddings" in the sentence "should match the metric used to train the UMAP
embeedings."; update the docstring where that exact phrase appears so the
generated API docs are correct and run codespell (or your repository's spelling
check) to verify no other typos remain.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 103b5ecf-2bf1-400d-9f4e-8be183f7b397

📥 Commits

Reviewing files that changed from the base of the PR and between 69261e4 and 1f4e782.

📒 Files selected for processing (3)
  • cpp/include/cuml/manifold/common.hpp
  • cpp/src/umap/knn_graph/algo.cuh
  • python/cuml/cuml/manifold/umap/umap.pyx

@jinsolp
Contributor

jinsolp commented Mar 19, 2026

Related issue: #7143

Contributor

@jinsolp jinsolp left a comment


Thanks @viclafargue! This looks like a more conservative version of the reverted PR, making it more robust against accessing non-device-accessible arrays in the kernel.
I have one question:

Comment on lines +113 to +114
if (attr.devicePointer == nullptr ||
(attr.type != cudaMemoryTypeDevice && attr.type != cudaMemoryTypeManaged))
Contributor


I'm worried about the behavior of this on HMM-configured machines (which is what made us revert the previous change).
Will attr.devicePointer always have a value as long as the ptr is device-accessible?

Member


https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__UNIFIED.html#group__CUDART__UNIFIED_1gd89830e17d399c064a2f3c3fa8bb4390

Returns in *attributes the attributes of the pointer ptr. If pointer was not allocated in, mapped by or registered with context supporting unified addressing cudaErrorInvalidValue is returned.

Maybe we should check for this case.

Contributor Author

@viclafargue viclafargue Mar 23, 2026


@jinsolp If my understanding is correct, we want to use the pre-computed KNN directly if it is actual device (or, exceptionally, managed) memory, and perform a copy in any other case. I agree that checking the value of devicePointer is not strictly necessary, as the actual check is done with the memory type. Indeed, whether or not the machine is configured with HMM/ATS, host memory would be detected as either unregistered or host. I pushed a simplified version of the check.

Here is a little experiment I ran on my GPU configured with HMM:

Allocation                  | type         | devicePointer | hostPointer | device
----------------------------+--------------+---------------+-------------+-------
malloc                      | Unregistered | set           | set         | -1
cudaMallocHost              | Host         | set           | set         | 0
cudaHostAlloc(mapped)       | Host         | set           | set         | 0
cudaMalloc                  | Device       | set           | NULL        | 0
cudaMallocManaged           | Managed      | set           | set         | 0
cudaHostRegister'd malloc   | Host         | set           | set         | 0

@aamijar Thanks for spotting this, but it looks like the issue is limited to earlier CUDA versions (pre-11.0). Since the lowest version we support is CUDA 12.9, the check may not be necessary.

Contributor

@jinsolp jinsolp left a comment


Approving! I believe checking only the memory type attribute will make this more robust compared to what we had previously.


Labels

CUDA/C++ · Cython / Python (Cython or Python issue) · improvement (Improvement / enhancement to an existing function) · non-breaking (Non-breaking change)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants