FIX: Implement input validation in UMAP accel wrapper by shivansh023023 · Pull Request #7837 · rapidsai/cuml

shivansh023023 · 2026-02-28T03:45:41Z

Title: [FEA] Implement input validation in UMAP accel wrapper

Description:

This PR implements the TODO for adding input validation to the UMAP accel wrapper.

Changes:

Added sklearn.utils.validation.check_array to _gpu_fit, _gpu_fit_transform, and _gpu_transform.

Ensured the validated array is assigned back to X.

Maintained support for the force_all_finite parameter to allow user-controlled validation behavior.

This ensures that accel users receive clear error messages when passing NaN or Inf values, consistent with standard scikit-learn estimators.
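As a rough illustration of the behavior described above, here is a dependency-free sketch of a finite-value check with an opt-out flag. It is not scikit-learn's `check_array` and not the code in this PR; the helper name `validate_finite` and its `allow_non_finite` parameter are hypothetical stand-ins for the `force_all_finite` handling.

```python
import math


def validate_finite(rows, allow_non_finite=False):
    """Hypothetical stand-in for a finite-value check on a 2D input."""
    if allow_non_finite:
        # Caller opted out of the check, similar in spirit to force_all_finite=False.
        return rows
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isfinite(value):
                raise ValueError(
                    f"Input contains non-finite value {value!r} "
                    f"at row {i}, column {j}."
                )
    return rows


# Finite data passes through unchanged.
clean = validate_finite([[0.0, 1.5], [2.0, 3.0]])

# NaN or Inf produces an explicit error instead of propagating silently.
try:
    validate_finite([[0.0, float("nan")]])
except ValueError as exc:
    message = str(exc)
```

The point of the sketch is only that the check runs before any computation and returns the validated input so the caller can assign it back to `X`.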

copy-pr-bot · 2026-02-28T03:45:45Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-02-28T03:50:19Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


Walkthrough

Adds input validation using check_array and numpy to GPU UMAP proxy methods, enforces dtype=float32 and order="C", accepts CSR sparse inputs, adds class- and method-level docstrings, updates _gpu_transform and _gpu_inverse_transform to accept **kwargs and forwards kwargs to underlying GPU calls; underlying GPU calls remain unchanged.

Changes

Cohort / File(s)	Summary
GPU UMAP proxy `python/cuml/cuml/accel/_overrides/umap.py`	Adds imports `numpy as np` and `check_array`; adds class docstring; applies `check_array` validation (with `dtype=float32`, `order="C"`, `accept_sparse="csr"`, `force_all_finite` handling) in `_gpu_fit`, `_gpu_fit_transform`, `_gpu_transform`, and `_gpu_inverse_transform`; updates `_gpu_transform` and `_gpu_inverse_transform` signatures to accept `**kwargs` and forwards kwargs to `self._gpu.*` calls; replaces inline comments with method docstrings.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'FIX: Implement input validation in UMAP accel wrapper' directly and clearly describes the main change: adding input validation to the UMAP accelerator wrapper.
Description check	✅ Passed	The description is directly related to the changeset, explaining the implementation of input validation for the UMAP accel wrapper and its benefits for error messaging consistency.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/accel/_wrappers/umap.py`:
- Line 8: Replace the sklearn import with cuML's adapter so accel input types
are preserved: remove "from sklearn.utils.validation import check_array" and
instead import check_array from cuml.thirdparty_adapters.adapters (e.g., "from
cuml.thirdparty_adapters.adapters import check_array"), keeping the same
function name so existing calls to check_array in this module (the uses that
currently coerce cuDF/CuPy to NumPy) will call the cuML adapter which checks
cuml_accel_enabled() and properly handles NumPy, pandas, cuDF, CuPy and sparse
inputs.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed4de0a and a9b0755.

📒 Files selected for processing (1)

python/cuml/cuml/accel/_wrappers/umap.py


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/accel/_wrappers/umap.py`:
- Around line 27-29: The function _gpu_fit_transform currently accepts **kwargs
for signature compatibility but does not use it, triggering ARG002; fix this by
explicitly marking kwargs as unused inside _gpu_fit_transform (for example add a
single line like "del kwargs" or "unused_kwargs = kwargs" at the start of the
function) so linters know the parameter is intentionally unused while leaving
the signature unchanged.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9b0755 and 54e36f0.

📒 Files selected for processing (1)

python/cuml/cuml/accel/_wrappers/umap.py

coderabbitai · 2026-02-28T04:04:30Z

python/cuml/cuml/accel/_wrappers/umap.py

    def _gpu_fit_transform(self, X, y=None, force_all_finite=True, **kwargs):
        # **kwargs is here for signature compatibility - umap.UMAP has them,
        # but ignores all but the ones named here.


⚠️ Potential issue | 🟡 Minor

Handle intentionally unused kwargs explicitly to satisfy lint.

Line 27 introduces kwargs for signature compatibility, but it is unused and currently triggers ARG002. Please explicitly mark it unused.

Proposed fix

     def _gpu_fit_transform(self, X, y=None, force_all_finite=True, **kwargs):
         # **kwargs is here for signature compatibility - umap.UMAP has them,
         # but ignores all but the ones named here.
+        del kwargs  # intentionally unused; kept for signature compatibility
         # Validate input to handle non-finite values (NaN, Inf)
         X = check_array(

🧰 Tools

🪛 Ruff (0.15.2)

[warning] 27-27: Unused method argument: kwargs

(ARG002)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@python/cuml/cuml/accel/_wrappers/umap.py` around lines 27 - 29, The function _gpu_fit_transform currently accepts **kwargs for signature compatibility but does not use it, triggering ARG002; fix this by explicitly marking kwargs as unused inside _gpu_fit_transform (for example add a single line like "del kwargs" or "unused_kwargs = kwargs" at the start of the function) so linters know the parameter is intentionally unused while leaving the signature unchanged.
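The pattern this comment proposes can be sketched in isolation. The class and method below are hypothetical and unrelated to the cuML proxy classes; they only show a method that accepts `**kwargs` for signature compatibility and discards them explicitly, so the intent is visible to readers and tools.

```python
class EstimatorSketch:
    """Hypothetical estimator used only to illustrate the pattern."""

    def fit_transform(self, X, y=None, **kwargs):
        # **kwargs exists so the signature matches the wrapped API.
        del kwargs  # intentionally unused
        # Placeholder "transform": return a shallow copy of each row.
        return [list(row) for row in X]


# Extra keyword arguments are accepted and ignored.
result = EstimatorSketch().fit_transform([[1, 2], [3, 4]], some_option=True)
```

Note that a later review comment in this thread argues against discarding kwargs this way for `fit`, because it silently drops options the caller supplied.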

shivansh023023 · 2026-02-28T07:22:19Z

/label category: improvement
/label breaking: non-breaking

coderabbitai

🧹 Nitpick comments (2)

python/cuml/cuml/accel/_overrides/umap.py (2)
41-48: Add **kwargs for dispatch signature compatibility.

The dispatch mechanism in _call_gpu_method forwards **kwargs to all GPU methods. Unlike _gpu_fit and _gpu_fit_transform, this method doesn't accept **kwargs, which could cause TypeError if any kwargs are passed (e.g., via sklearn 1.3+ metadata routing).
♻️ Proposed fix
-    def _gpu_transform(self, X, force_all_finite=True):
+    def _gpu_transform(self, X, force_all_finite=True, **kwargs):
         """Transform the data with GPU-accelerated input validation."""
+        del kwargs  # Handle intentionally unused kwargs for signature compatibility
+
         # Validate input during transform
         X = check_array(
             X, accept_sparse="csr", force_all_finite=force_all_finite
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/accel/_overrides/umap.py` around lines 41 - 48, The
_gpu_transform method lacks a **kwargs parameter so kwargs forwarded by the
dispatch path (_call_gpu_method) can cause a TypeError; update the
_gpu_transform signature to accept **kwargs and forward them to
self._gpu.transform (i.e., call self._gpu.transform(X, **kwargs)) while keeping
the input validation via check_array intact so X is still validated before
dispatch.
50-52: Consider adding input validation for consistency.

This method lacks the check_array validation that was added to the other GPU methods. While inverse transform may have different input characteristics, validating for non-finite values would maintain consistency and prevent silent data corruption. Additionally, consider adding **kwargs for dispatch signature compatibility.
♻️ Proposed fix for consistency
-    def _gpu_inverse_transform(self, X):
+    def _gpu_inverse_transform(self, X, force_all_finite=True, **kwargs):
         """Inverse transform the data."""
+        del kwargs  # Handle intentionally unused kwargs for signature compatibility
+
+        # Validate input to handle non-finite values (NaN, Inf)
+        X = check_array(
+            X, accept_sparse="csr", force_all_finite=force_all_finite
+        )
+
         return self._gpu.inverse_transform(X)
Based on learnings: "Silent data corruption from type coercion, incorrect handling of cuDF vs pandas vs NumPy inputs, or missing validation causing crashes on invalid input must be addressed."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/accel/_overrides/umap.py` around lines 50 - 52, The
_gpu_inverse_transform method currently returns self._gpu.inverse_transform(X)
without input validation or dispatch-compatible signature; add validation using
the same check_array call used in other GPU methods to reject non-finite values
and ensure consistent dtype/ndim handling, and update the method signature to
accept **kwargs (def _gpu_inverse_transform(self, X, **kwargs):) so it matches
the dispatch signature before forwarding to self._gpu.inverse_transform(X,
**kwargs).
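The failure mode described in these two nitpicks can be reproduced with a simplified stand-in. `ProxySketch` below is hypothetical and far smaller than the real dispatch code; it assumes only what the comment states, namely that `_call_gpu_method` forwards `**kwargs` to every `_gpu_*` method.

```python
class ProxySketch:
    """Hypothetical, simplified stand-in for a kwargs-forwarding dispatcher."""

    def _call_gpu_method(self, method, *args, **kwargs):
        # Forward all positional and keyword arguments to `_gpu_<method>`.
        return getattr(self, f"_gpu_{method}")(*args, **kwargs)

    def _gpu_transform(self, X):
        # No **kwargs: any forwarded keyword argument raises TypeError.
        return X

    def _gpu_inverse_transform(self, X, **kwargs):
        # Accepts forwarded keyword arguments.
        return X


proxy = ProxySketch()

try:
    proxy._call_gpu_method("transform", [[1.0]], extra=True)
    transform_failed = False
except TypeError:
    transform_failed = True

inverse = proxy._call_gpu_method("inverse_transform", [[1.0]], extra=True)
```

With the signature widened to accept `**kwargs`, the same call succeeds, which is the change both proposed fixes make.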


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 18761288-1cdd-45ea-bba5-4a345173d74e

📥 Commits

Reviewing files that changed from the base of the PR and between 54e36f0 and b2aa0b9.

📒 Files selected for processing (1)

python/cuml/cuml/accel/_overrides/umap.py

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/accel/_overrides/umap.py`:
- Line 8: The validation currently calls check_array with default dtype/order
which causes an extra allocation before UMAP's input_to_cuml_array; update the
pre-validation to call check_array(..., dtype=np.float32, order="C") (ensure
numpy is imported as np) so the array is validated/converted to the same float32
C-contiguous layout UMAP expects, or remove the redundant pre-check and validate
only via input_to_cuml_array(...) in the fit path (refer to uses of check_array
and input_to_cuml_array in this module, including the blocks around the existing
calls at the locations noted).
- Line 21: The override currently deletes kwargs (del kwargs) which silently
drops fit-time options forwarded by ProxyBase; instead, in the UMAP GPU override
(the function overriding cuml.manifold.UMAP.fit in
python/cuml/cuml/accel/_overrides/umap.py) validate and extract the supported
subset (e.g., convert_dtype, knn_graph) and forward only those to the underlying
GPU implementation, and for any remaining kwargs that are CPU-only raise
UnsupportedOnGPU; ensure the behavior mirrors cuml.manifold.UMAP.fit’s supported
kwargs and reject others explicitly rather than discarding them.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 45219cdd-8a86-4028-ae84-e16c39354ee6

📥 Commits

Reviewing files that changed from the base of the PR and between b2aa0b9 and 6d73d28.

📒 Files selected for processing (1)

python/cuml/cuml/accel/_overrides/umap.py


coderabbitai · 2026-03-07T12:53:35Z

python/cuml/cuml/accel/_overrides/umap.py

-        # force_alL_finite is in here for _signature_ compatibility
-        # with umap.UMAP, but we don't properly implement it (yet).
+        """Fit the UMAP model with GPU-accelerated input validation."""
+        del kwargs  # Handle intentionally unused kwargs for signature compatibility


⚠️ Potential issue | 🟠 Major

Don't silently drop fit kwargs.

ProxyBase forwards method kwargs into these overrides. Deleting them here means the GPU path ignores fit-time options instead of honoring them or triggering a clean CPU fallback; cuml.manifold.UMAP.fit already supports extra kwargs such as convert_dtype and knn_graph. Please forward the supported subset after validation, and raise UnsupportedOnGPU for any remaining CPU-only kwargs rather than discarding them.

Also applies to: 31-31

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@python/cuml/cuml/accel/_overrides/umap.py` at line 21, The override currently deletes kwargs (del kwargs) which silently drops fit-time options forwarded by ProxyBase; instead, in the UMAP GPU override (the function overriding cuml.manifold.UMAP.fit in python/cuml/cuml/accel/_overrides/umap.py) validate and extract the supported subset (e.g., convert_dtype, knn_graph) and forward only those to the underlying GPU implementation, and for any remaining kwargs that are CPU-only raise UnsupportedOnGPU; ensure the behavior mirrors cuml.manifold.UMAP.fit’s supported kwargs and reject others explicitly rather than discarding them.
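One way to read this request is as a small filtering step that runs before the GPU call. The sketch below is an assumption-laden illustration, not the PR's code: `UnsupportedOnGPU` is redefined locally as a stand-in for the exception named in the comment, `SUPPORTED_FIT_KWARGS` lists only the two names the comment mentions, and `split_fit_kwargs` is a hypothetical helper.

```python
class UnsupportedOnGPU(Exception):
    """Local stand-in for the exception named in the review comment."""


# Assumed from the review comment; the real supported set may differ.
SUPPORTED_FIT_KWARGS = frozenset({"convert_dtype", "knn_graph"})


def split_fit_kwargs(kwargs):
    """Return the kwargs to forward, or raise if any are not supported."""
    unsupported = sorted(set(kwargs) - SUPPORTED_FIT_KWARGS)
    if unsupported:
        # Reject explicitly instead of discarding the options silently.
        raise UnsupportedOnGPU(f"Unsupported fit kwargs on GPU: {unsupported}")
    return dict(kwargs)


forwarded = split_fit_kwargs({"convert_dtype": False})

try:
    split_fit_kwargs({"some_cpu_only_option": 1})  # hypothetical option name
    rejected = False
except UnsupportedOnGPU as exc:
    rejected = "some_cpu_only_option" in str(exc)
```

Raising rather than deleting gives the caller, or a fallback mechanism, a chance to react to options the GPU path cannot honor.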

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/accel/_overrides/umap.py`:
- Around line 22-29: The wrapper methods _gpu_fit, _gpu_fit_transform,
_gpu_transform, and _gpu_inverse_transform currently force dtype=np.float32 in
the check_array call which ignores a user-provided convert_dtype kwarg; update
each method to read convert_dtype = kwargs.get("convert_dtype", True) and only
pass dtype=np.float32 to check_array when convert_dtype is True (otherwise pass
None), keeping other check_array arguments the same, then forward the original
kwargs to self._gpu.*() so the underlying UMAP API contract is respected.
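The dtype selection this comment asks for reduces to a small conditional. The helper below is a hypothetical sketch that uses a string in place of `np.float32` so it runs without NumPy; only the `kwargs.get("convert_dtype", True)` lookup is taken from the comment itself.

```python
def validation_dtype(kwargs, target="float32"):
    """Pick the dtype to request during validation (hypothetical helper).

    Returns `target` when conversion is allowed and None otherwise, so the
    validator leaves the input dtype untouched.
    """
    convert_dtype = kwargs.get("convert_dtype", True)
    return target if convert_dtype else None


default_dtype = validation_dtype({})
kept_dtype = validation_dtype({"convert_dtype": False})
```

In the wrapper, the returned value would be passed as the `dtype` argument to the validation call, and the original kwargs would still be forwarded to the underlying GPU method.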

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0174c43a-6d49-4921-a348-b6f550b7cf97

📥 Commits

Reviewing files that changed from the base of the PR and between 6d73d28 and 0de0ced.

📒 Files selected for processing (1)

python/cuml/cuml/accel/_overrides/umap.py


shivansh023023 · 2026-03-09T11:25:02Z

/label category: improvement
/label breaking: non-breaking

shivansh023023 · 2026-03-09T11:31:20Z

Hi maintainers, I've resolved all the technical feedback from CodeRabbit and fixed the linter/docstring issues. Could a maintainer please apply the following labels so the Label Checker can pass?

category: improvement

breaking: non-breaking

shivansh023023 · 2026-03-09T11:38:39Z

/label category: improvement
/label breaking: non-breaking

csadorf · 2026-03-13T16:23:52Z

Can you address the conflict, please?

shivansh023023 · 2026-03-15T09:52:19Z

Hi @csadorf, I have resolved the merge conflicts and rebased the branch onto the latest main. I also took the opportunity to ensure the validation logic is consistent across the new _overrides structure and integrated the float32/C-order memory optimizations.

shivansh023023 requested a review from a team as a code owner February 28, 2026 03:45

shivansh023023 requested a review from divyegala February 28, 2026 03:45

github-actions bot added the Cython / Python label Feb 28, 2026

coderabbitai bot reviewed Feb 28, 2026


coderabbitai bot reviewed Feb 28, 2026


coderabbitai bot reviewed Mar 7, 2026


csadorf added the improvement and non-breaking labels Mar 13, 2026

shivansh023023 added 5 commits March 15, 2026 15:05

FIX: Add docstrings and handle unused kwargs in UMAP accel

ba27906

FIX: Align signatures, add validation, and include docstrings for UMAP accel

afbc993

bbbbFIX: Optimize memory with float32/C-order and forward fit kwargs

5325f50

FIX: Optimize memory with float32/C-order and forward fit kwargs

ac5c97b

FIX: Respect convert_dtype and optimize memory in UMAP accel

c5f1ad1

shivansh023023 force-pushed the fix-accel-umap-nan-validation branch from dc1e7f8 to c5f1ad1 Compare March 15, 2026 09:46
