Skip to content

FIX: Implement input validation in UMAP accel wrapper#7837

Open
shivansh023023 wants to merge 5 commits intorapidsai:mainfrom
shivansh023023:fix-accel-umap-nan-validation
Open

FIX: Implement input validation in UMAP accel wrapper#7837
shivansh023023 wants to merge 5 commits intorapidsai:mainfrom
shivansh023023:fix-accel-umap-nan-validation

Conversation

@shivansh023023
Copy link
Copy Markdown

Title: [FEA] Implement input validation in UMAP accel wrapper

Description:

This PR implements the TODO for adding input validation to the UMAP accel wrapper.

Changes:

Added sklearn.utils.validation.check_array to _gpu_fit, _gpu_fit_transform, and _gpu_transform.

Ensured the validated array is assigned back to X.

Maintained support for force_all_finite parameter to allow user-controlled validation behavior.

This ensures that accel users receive clear error messages when passing NaN or Inf values, consistent with standard scikit-learn estimators.

@shivansh023023 shivansh023023 requested a review from a team as a code owner February 28, 2026 03:45
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Feb 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the Cython / Python Cython or Python issue label Feb 28, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds input validation using check_array and numpy to GPU UMAP proxy methods, enforces dtype=float32 and order="C", accepts CSR sparse inputs, adds class- and method-level docstrings, updates _gpu_transform and _gpu_inverse_transform to accept **kwargs and forwards kwargs to underlying GPU calls; underlying GPU calls remain unchanged.

Changes

Cohort / File(s) Summary
GPU UMAP proxy
python/cuml/cuml/accel/_overrides/umap.py
Adds imports numpy as np and check_array; adds class docstring; applies check_array validation (with dtype=float32, order="C", accept_sparse="csr", force_all_finite handling) in _gpu_fit, _gpu_fit_transform, _gpu_transform, and _gpu_inverse_transform; updates _gpu_transform and _gpu_inverse_transform signatures to accept **kwargs and forwards kwargs to self._gpu.* calls; replaces inline comments with method docstrings.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'FIX: Implement input validation in UMAP accel wrapper' directly and clearly describes the main change: adding input validation to the UMAP accelerator wrapper.
Description check ✅ Passed The description is directly related to the changeset, explaining the implementation of input validation for the UMAP accel wrapper and its benefits for error messaging consistency.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/accel/_wrappers/umap.py`:
- Line 8: Replace the sklearn import with cuML's adapter so accel input types
are preserved: remove "from sklearn.utils.validation import check_array" and
instead import check_array from cuml.thirdparty_adapters.adapters (e.g., "from
cuml.thirdparty_adapters.adapters import check_array"), keeping the same
function name so existing calls to check_array in this module (the uses that
currently coerce cuDF/CuPy to NumPy) will call the cuML adapter which checks
cuml_accel_enabled() and properly handles NumPy, pandas, cuDF, CuPy and sparse
inputs.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed4de0a and a9b0755.

📒 Files selected for processing (1)
  • python/cuml/cuml/accel/_wrappers/umap.py

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/accel/_wrappers/umap.py`:
- Around line 27-29: The function _gpu_fit_transform currently accepts **kwargs
for signature compatibility but does not use it, triggering ARG002; fix this by
explicitly marking kwargs as unused inside _gpu_fit_transform (for example add a
single line like "del kwargs" or "unused_kwargs = kwargs" at the start of the
function) so linters know the parameter is intentionally unused while leaving
the signature unchanged.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9b0755 and 54e36f0.

📒 Files selected for processing (1)
  • python/cuml/cuml/accel/_wrappers/umap.py

Comment on lines 27 to 29
def _gpu_fit_transform(self, X, y=None, force_all_finite=True, **kwargs):
# **kwargs is here for signature compatibility - umap.UMAP has them,
# but ignores all but the ones named here.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Handle intentionally unused kwargs explicitly to satisfy lint.

Line 27 introduces kwargs for signature compatibility, but it is unused and currently triggers ARG002. Please explicitly mark it unused.

Proposed fix
 def _gpu_fit_transform(self, X, y=None, force_all_finite=True, **kwargs):
     # **kwargs is here for signature compatibility - umap.UMAP has them,
     # but ignores all but the ones named here.
+    del kwargs  # intentionally unused; kept for signature compatibility
 
     # Validate input to handle non-finite values (NaN, Inf)
     X = check_array(
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 27-27: Unused method argument: kwargs

(ARG002)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/accel/_wrappers/umap.py` around lines 27 - 29, The function
_gpu_fit_transform currently accepts **kwargs for signature compatibility but
does not use it, triggering ARG002; fix this by explicitly marking kwargs as
unused inside _gpu_fit_transform (for example add a single line like "del
kwargs" or "unused_kwargs = kwargs" at the start of the function) so linters
know the parameter is intentionally unused while leaving the signature
unchanged.

@shivansh023023
Copy link
Copy Markdown
Author

/label category: improvement
/label breaking: non-breaking

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
python/cuml/cuml/accel/_overrides/umap.py (2)

41-48: Add **kwargs for dispatch signature compatibility.

The dispatch mechanism in _call_gpu_method forwards **kwargs to all GPU methods. Unlike _gpu_fit and _gpu_fit_transform, this method doesn't accept **kwargs, which could cause TypeError if any kwargs are passed (e.g., via sklearn 1.3+ metadata routing).

♻️ Proposed fix
-    def _gpu_transform(self, X, force_all_finite=True):
+    def _gpu_transform(self, X, force_all_finite=True, **kwargs):
         """Transform the data with GPU-accelerated input validation."""
+        del kwargs  # Handle intentionally unused kwargs for signature compatibility
+
         # Validate input during transform
         X = check_array(
             X, accept_sparse="csr", force_all_finite=force_all_finite
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/accel/_overrides/umap.py` around lines 41 - 48, The
_gpu_transform method lacks a **kwargs parameter so kwargs forwarded by the
dispatch path (_call_gpu_method) can cause a TypeError; update the
_gpu_transform signature to accept **kwargs and forward them to
self._gpu.transform (i.e., call self._gpu.transform(X, **kwargs)) while keeping
the input validation via check_array intact so X is still validated before
dispatch.

50-52: Consider adding input validation for consistency.

This method lacks the check_array validation that was added to the other GPU methods. While inverse transform may have different input characteristics, validating for non-finite values would maintain consistency and prevent silent data corruption. Additionally, consider adding **kwargs for dispatch signature compatibility.

♻️ Proposed fix for consistency
-    def _gpu_inverse_transform(self, X):
+    def _gpu_inverse_transform(self, X, force_all_finite=True, **kwargs):
         """Inverse transform the data."""
+        del kwargs  # Handle intentionally unused kwargs for signature compatibility
+
+        # Validate input to handle non-finite values (NaN, Inf)
+        X = check_array(
+            X, accept_sparse="csr", force_all_finite=force_all_finite
+        )
+
         return self._gpu.inverse_transform(X)

Based on learnings: "Silent data corruption from type coercion, incorrect handling of cuDF vs pandas vs NumPy inputs, or missing validation causing crashes on invalid input must be addressed."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/accel/_overrides/umap.py` around lines 50 - 52, The
_gpu_inverse_transform method currently returns self._gpu.inverse_transform(X)
without input validation or dispatch-compatible signature; add validation using
the same check_array call used in other GPU methods to reject non-finite values
and ensure consistent dtype/ndim handling, and update the method signature to
accept **kwargs (def _gpu_inverse_transform(self, X, **kwargs):) so it matches
the dispatch signature before forwarding to self._gpu.inverse_transform(X,
**kwargs).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@python/cuml/cuml/accel/_overrides/umap.py`:
- Around line 41-48: The _gpu_transform method lacks a **kwargs parameter so
kwargs forwarded by the dispatch path (_call_gpu_method) can cause a TypeError;
update the _gpu_transform signature to accept **kwargs and forward them to
self._gpu.transform (i.e., call self._gpu.transform(X, **kwargs)) while keeping
the input validation via check_array intact so X is still validated before
dispatch.
- Around line 50-52: The _gpu_inverse_transform method currently returns
self._gpu.inverse_transform(X) without input validation or dispatch-compatible
signature; add validation using the same check_array call used in other GPU
methods to reject non-finite values and ensure consistent dtype/ndim handling,
and update the method signature to accept **kwargs (def
_gpu_inverse_transform(self, X, **kwargs):) so it matches the dispatch signature
before forwarding to self._gpu.inverse_transform(X, **kwargs).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 18761288-1cdd-45ea-bba5-4a345173d74e

📥 Commits

Reviewing files that changed from the base of the PR and between 54e36f0 and b2aa0b9.

📒 Files selected for processing (1)
  • python/cuml/cuml/accel/_overrides/umap.py

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/accel/_overrides/umap.py`:
- Line 8: The validation currently calls check_array with default dtype/order
which causes an extra allocation before UMAP's input_to_cuml_array; update the
pre-validation to call check_array(..., dtype=np.float32, order="C") (ensure
numpy is imported as np) so the array is validated/converted to the same float32
C-contiguous layout UMAP expects, or remove the redundant pre-check and validate
only via input_to_cuml_array(...) in the fit path (refer to uses of check_array
and input_to_cuml_array in this module, including the blocks around the existing
calls at the locations noted).
- Line 21: The override currently deletes kwargs (del kwargs) which silently
drops fit-time options forwarded by ProxyBase; instead, in the UMAP GPU override
(the function overriding cuml.manifold.UMAP.fit in
python/cuml/cuml/accel/_overrides/umap.py) validate and extract the supported
subset (e.g., convert_dtype, knn_graph) and forward only those to the underlying
GPU implementation, and for any remaining kwargs that are CPU-only raise
UnsupportedOnGPU; ensure the behavior mirrors cuml.manifold.UMAP.fit’s supported
kwargs and reject others explicitly rather than discarding them.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 45219cdd-8a86-4028-ae84-e16c39354ee6

📥 Commits

Reviewing files that changed from the base of the PR and between b2aa0b9 and 6d73d28.

📒 Files selected for processing (1)
  • python/cuml/cuml/accel/_overrides/umap.py

# force_alL_finite is in here for _signature_ compatibility
# with umap.UMAP, but we don't properly implement it (yet).
"""Fit the UMAP model with GPU-accelerated input validation."""
del kwargs # Handle intentionally unused kwargs for signature compatibility
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't silently drop fit kwargs.

ProxyBase forwards method kwargs into these overrides. Deleting them here means the GPU path ignores fit-time options instead of honoring them or triggering a clean CPU fallback; cuml.manifold.UMAP.fit already supports extra kwargs such as convert_dtype and knn_graph. Please forward the supported subset after validation, and raise UnsupportedOnGPU for any remaining CPU-only kwargs rather than discarding them.

Also applies to: 31-31

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/accel/_overrides/umap.py` at line 21, The override currently
deletes kwargs (del kwargs) which silently drops fit-time options forwarded by
ProxyBase; instead, in the UMAP GPU override (the function overriding
cuml.manifold.UMAP.fit in python/cuml/cuml/accel/_overrides/umap.py) validate
and extract the supported subset (e.g., convert_dtype, knn_graph) and forward
only those to the underlying GPU implementation, and for any remaining kwargs
that are CPU-only raise UnsupportedOnGPU; ensure the behavior mirrors
cuml.manifold.UMAP.fit’s supported kwargs and reject others explicitly rather
than discarding them.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/accel/_overrides/umap.py`:
- Around line 22-29: The wrapper methods _gpu_fit, _gpu_fit_transform,
_gpu_transform, and _gpu_inverse_transform currently force dtype=np.float32 in
the check_array call which ignores a user-provided convert_dtype kwarg; update
each method to read convert_dtype = kwargs.get("convert_dtype", True) and only
pass dtype=np.float32 to check_array when convert_dtype is True (otherwise pass
None), keeping other check_array arguments the same, then forward the original
kwargs to self._gpu.*() so the underlying UMAP API contract is respected.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0174c43a-6d49-4921-a348-b6f550b7cf97

📥 Commits

Reviewing files that changed from the base of the PR and between 6d73d28 and 0de0ced.

📒 Files selected for processing (1)
  • python/cuml/cuml/accel/_overrides/umap.py

@shivansh023023
Copy link
Copy Markdown
Author

/label category: improvement
/label breaking: non-breaking

@shivansh023023
Copy link
Copy Markdown
Author

Hi maintainers, I've resolved all the technical feedback from CodeRabbit and fixed the linter/docstring issues. Could a maintainer please apply the following labels so the Label Checker can pass?

category: improvement

breaking: non-breaking

@shivansh023023
Copy link
Copy Markdown
Author

/label category: improvement
/label breaking: non-breaking

@csadorf csadorf added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Mar 13, 2026
@csadorf
Copy link
Copy Markdown
Contributor

csadorf commented Mar 13, 2026

Can you address the conflict, please

@shivansh023023 shivansh023023 force-pushed the fix-accel-umap-nan-validation branch from dc1e7f8 to c5f1ad1 Compare March 15, 2026 09:46
@shivansh023023
Copy link
Copy Markdown
Author

Hi @csadorf, I have resolved the merge conflicts and rebased the branch onto the latest main. I also took the opportunity to ensure the validation logic is consistent across the new _overrides structure and integrated the float32/C-order memory optimizations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants