FIX: Implement input validation in UMAP accel wrapper#7837
FIX: Implement input validation in UMAP accel wrapper#7837shivansh023023 wants to merge 5 commits intorapidsai:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds input validation using Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/cuml/cuml/accel/_wrappers/umap.py`:
- Line 8: Replace the sklearn import with cuML's adapter so accel input types
are preserved: remove "from sklearn.utils.validation import check_array" and
instead import check_array from cuml.thirdparty_adapters.adapters (e.g., "from
cuml.thirdparty_adapters.adapters import check_array"), keeping the same
function name so existing calls to check_array in this module (the uses that
currently coerce cuDF/CuPy to NumPy) will call the cuML adapter which checks
cuml_accel_enabled() and properly handles NumPy, pandas, cuDF, CuPy and sparse
inputs.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/cuml/cuml/accel/_wrappers/umap.py`:
- Around line 27-29: The function _gpu_fit_transform currently accepts **kwargs
for signature compatibility but does not use it, triggering ARG002; fix this by
explicitly marking kwargs as unused inside _gpu_fit_transform (for example add a
single line like "del kwargs" or "unused_kwargs = kwargs" at the start of the
function) so linters know the parameter is intentionally unused while leaving
the signature unchanged.
| def _gpu_fit_transform(self, X, y=None, force_all_finite=True, **kwargs): | ||
| # **kwargs is here for signature compatibility - umap.UMAP has them, | ||
| # but ignores all but the ones named here. |
There was a problem hiding this comment.
Handle intentionally unused kwargs explicitly to satisfy lint.
Line 27 introduces kwargs for signature compatibility, but it is unused and currently triggers ARG002. Please explicitly mark it unused.
Proposed fix
def _gpu_fit_transform(self, X, y=None, force_all_finite=True, **kwargs):
# **kwargs is here for signature compatibility - umap.UMAP has them,
# but ignores all but the ones named here.
+ del kwargs # intentionally unused; kept for signature compatibility
# Validate input to handle non-finite values (NaN, Inf)
X = check_array(🧰 Tools
🪛 Ruff (0.15.2)
[warning] 27-27: Unused method argument: kwargs
(ARG002)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/cuml/cuml/accel/_wrappers/umap.py` around lines 27 - 29, The function
_gpu_fit_transform currently accepts **kwargs for signature compatibility but
does not use it, triggering ARG002; fix this by explicitly marking kwargs as
unused inside _gpu_fit_transform (for example add a single line like "del
kwargs" or "unused_kwargs = kwargs" at the start of the function) so linters
know the parameter is intentionally unused while leaving the signature
unchanged.
|
/label category: improvement |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
python/cuml/cuml/accel/_overrides/umap.py (2)
41-48: Add**kwargsfor dispatch signature compatibility.The dispatch mechanism in
_call_gpu_methodforwards**kwargsto all GPU methods. Unlike_gpu_fitand_gpu_fit_transform, this method doesn't accept**kwargs, which could causeTypeErrorif any kwargs are passed (e.g., via sklearn 1.3+ metadata routing).♻️ Proposed fix
- def _gpu_transform(self, X, force_all_finite=True): + def _gpu_transform(self, X, force_all_finite=True, **kwargs): """Transform the data with GPU-accelerated input validation.""" + del kwargs # Handle intentionally unused kwargs for signature compatibility + # Validate input during transform X = check_array( X, accept_sparse="csr", force_all_finite=force_all_finite )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/cuml/cuml/accel/_overrides/umap.py` around lines 41 - 48, The _gpu_transform method lacks a **kwargs parameter so kwargs forwarded by the dispatch path (_call_gpu_method) can cause a TypeError; update the _gpu_transform signature to accept **kwargs and forward them to self._gpu.transform (i.e., call self._gpu.transform(X, **kwargs)) while keeping the input validation via check_array intact so X is still validated before dispatch.
50-52: Consider adding input validation for consistency.This method lacks the
check_arrayvalidation that was added to the other GPU methods. While inverse transform may have different input characteristics, validating for non-finite values would maintain consistency and prevent silent data corruption. Additionally, consider adding**kwargsfor dispatch signature compatibility.♻️ Proposed fix for consistency
- def _gpu_inverse_transform(self, X): + def _gpu_inverse_transform(self, X, force_all_finite=True, **kwargs): """Inverse transform the data.""" + del kwargs # Handle intentionally unused kwargs for signature compatibility + + # Validate input to handle non-finite values (NaN, Inf) + X = check_array( + X, accept_sparse="csr", force_all_finite=force_all_finite + ) + return self._gpu.inverse_transform(X)Based on learnings: "Silent data corruption from type coercion, incorrect handling of cuDF vs pandas vs NumPy inputs, or missing validation causing crashes on invalid input must be addressed."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/cuml/cuml/accel/_overrides/umap.py` around lines 50 - 52, The _gpu_inverse_transform method currently returns self._gpu.inverse_transform(X) without input validation or dispatch-compatible signature; add validation using the same check_array call used in other GPU methods to reject non-finite values and ensure consistent dtype/ndim handling, and update the method signature to accept **kwargs (def _gpu_inverse_transform(self, X, **kwargs):) so it matches the dispatch signature before forwarding to self._gpu.inverse_transform(X, **kwargs).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@python/cuml/cuml/accel/_overrides/umap.py`:
- Around line 41-48: The _gpu_transform method lacks a **kwargs parameter so
kwargs forwarded by the dispatch path (_call_gpu_method) can cause a TypeError;
update the _gpu_transform signature to accept **kwargs and forward them to
self._gpu.transform (i.e., call self._gpu.transform(X, **kwargs)) while keeping
the input validation via check_array intact so X is still validated before
dispatch.
- Around line 50-52: The _gpu_inverse_transform method currently returns
self._gpu.inverse_transform(X) without input validation or dispatch-compatible
signature; add validation using the same check_array call used in other GPU
methods to reject non-finite values and ensure consistent dtype/ndim handling,
and update the method signature to accept **kwargs (def
_gpu_inverse_transform(self, X, **kwargs):) so it matches the dispatch signature
before forwarding to self._gpu.inverse_transform(X, **kwargs).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 18761288-1cdd-45ea-bba5-4a345173d74e
📒 Files selected for processing (1)
python/cuml/cuml/accel/_overrides/umap.py
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/cuml/cuml/accel/_overrides/umap.py`:
- Line 8: The validation currently calls check_array with default dtype/order
which causes an extra allocation before UMAP's input_to_cuml_array; update the
pre-validation to call check_array(..., dtype=np.float32, order="C") (ensure
numpy is imported as np) so the array is validated/converted to the same float32
C-contiguous layout UMAP expects, or remove the redundant pre-check and validate
only via input_to_cuml_array(...) in the fit path (refer to uses of check_array
and input_to_cuml_array in this module, including the blocks around the existing
calls at the locations noted).
- Line 21: The override currently deletes kwargs (del kwargs) which silently
drops fit-time options forwarded by ProxyBase; instead, in the UMAP GPU override
(the function overriding cuml.manifold.UMAP.fit in
python/cuml/cuml/accel/_overrides/umap.py) validate and extract the supported
subset (e.g., convert_dtype, knn_graph) and forward only those to the underlying
GPU implementation, and for any remaining kwargs that are CPU-only raise
UnsupportedOnGPU; ensure the behavior mirrors cuml.manifold.UMAP.fit’s supported
kwargs and reject others explicitly rather than discarding them.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 45219cdd-8a86-4028-ae84-e16c39354ee6
📒 Files selected for processing (1)
python/cuml/cuml/accel/_overrides/umap.py
| # force_alL_finite is in here for _signature_ compatibility | ||
| # with umap.UMAP, but we don't properly implement it (yet). | ||
| """Fit the UMAP model with GPU-accelerated input validation.""" | ||
| del kwargs # Handle intentionally unused kwargs for signature compatibility |
There was a problem hiding this comment.
Don't silently drop fit kwargs.
ProxyBase forwards method kwargs into these overrides. Deleting them here means the GPU path ignores fit-time options instead of honoring them or triggering a clean CPU fallback; cuml.manifold.UMAP.fit already supports extra kwargs such as convert_dtype and knn_graph. Please forward the supported subset after validation, and raise UnsupportedOnGPU for any remaining CPU-only kwargs rather than discarding them.
Also applies to: 31-31
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/cuml/cuml/accel/_overrides/umap.py` at line 21, The override currently
deletes kwargs (del kwargs) which silently drops fit-time options forwarded by
ProxyBase; instead, in the UMAP GPU override (the function overriding
cuml.manifold.UMAP.fit in python/cuml/cuml/accel/_overrides/umap.py) validate
and extract the supported subset (e.g., convert_dtype, knn_graph) and forward
only those to the underlying GPU implementation, and for any remaining kwargs
that are CPU-only raise UnsupportedOnGPU; ensure the behavior mirrors
cuml.manifold.UMAP.fit’s supported kwargs and reject others explicitly rather
than discarding them.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/cuml/cuml/accel/_overrides/umap.py`:
- Around line 22-29: The wrapper methods _gpu_fit, _gpu_fit_transform,
_gpu_transform, and _gpu_inverse_transform currently force dtype=np.float32 in
the check_array call which ignores a user-provided convert_dtype kwarg; update
each method to read convert_dtype = kwargs.get("convert_dtype", True) and only
pass dtype=np.float32 to check_array when convert_dtype is True (otherwise pass
None), keeping other check_array arguments the same, then forward the original
kwargs to self._gpu.*() so the underlying UMAP API contract is respected.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0174c43a-6d49-4921-a348-b6f550b7cf97
📒 Files selected for processing (1)
python/cuml/cuml/accel/_overrides/umap.py
|
/label category: improvement |
|
Hi maintainers, I've resolved all the technical feedback from CodeRabbit and fixed the linter/docstring issues. Could a maintainer please apply the following labels so the Label Checker can pass? category: improvement breaking: non-breaking |
|
/label category: improvement |
|
Can you address the conflict, please |
dc1e7f8 to
c5f1ad1
Compare
|
Hi @csadorf, I have resolved the merge conflicts and rebased the branch onto the latest main. I also took the opportunity to ensure the validation logic is consistent across the new _overrides structure and integrated the float32/C-order memory optimizations. |
Title: [FEA] Implement input validation in UMAP accel wrapper
Description:
This PR implements the TODO for adding input validation to the UMAP accel wrapper.
Changes:
Added sklearn.utils.validation.check_array to _gpu_fit, _gpu_fit_transform, and _gpu_transform.
Ensured the validated array is assigned back to X.
Maintained support for force_all_finite parameter to allow user-controlled validation behavior.
This ensures that accel users receive clear error messages when passing NaN or Inf values, consistent with standard scikit-learn estimators.