Optimize data transfer for Pipeline in cuml.accel#7835

Merged
rapids-bot[bot] merged 4 commits into rapidsai:main from jcrist:cuml-accel-pipeline
Mar 3, 2026

Conversation

@jcrist
Member

@jcrist jcrist commented Feb 27, 2026

This PR adds an optimization for Pipelines when running under cuml.accel. When the tail end of the operations performed by a pipeline is all accelerated, we can avoid intermediate host<->device transfers and pass the intermediates on device.

For example, a pipeline of unaccelerated -> unaccelerated -> accelerated -> accelerated wouldn't need to convert the intermediate result back to numpy between steps 3 and 4 and could pass that on device. A pipeline of unaccelerated -> accelerated -> unaccelerated -> accelerated though would keep passing intermediates on host since there's not a contiguous block of accelerated estimators.
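The tail-detection heuristic described above can be sketched as follows. This is a hedged illustration: `split_accelerated_tail` and the `is_accelerated` predicate are invented names for this sketch, not cuml.accel's actual API.

```python
# Hypothetical sketch: find the longest suffix of pipeline steps that is
# entirely accelerated; only that suffix may pass intermediates on device
# without host round-trips.
def split_accelerated_tail(steps, is_accelerated):
    """Return (host_steps, device_steps) where device_steps is the
    longest all-accelerated run at the end of `steps`."""
    split = len(steps)
    # Walk backwards from the last step; stop at the first
    # unaccelerated step encountered.
    for i in range(len(steps) - 1, -1, -1):
        if is_accelerated(steps[i]):
            split = i
        else:
            break
    return steps[:split], steps[split:]


host_steps, device_steps = split_accelerated_tail(
    ["cpu", "gpu", "cpu", "gpu"], lambda s: s == "gpu"
)
# Only the final step qualifies: the earlier accelerated step is not part
# of a contiguous accelerated tail.
assert device_steps == ["gpu"]
```

With steps like `["cpu", "cpu", "gpu", "gpu"]` the last two steps form the device tail, matching the first example above; with alternating steps only the final step qualifies, matching the second.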

The actual implementation is roughly 100 lines. The remainder is tests, plus a refactor to better distinguish between patches (which mutate the original module) and overrides (which overlay the original module).

Fixes #7778. Supersedes #7782.

Previously the accelerator only supported "overrides" (though we weren't
consistent in our terminology). An override:

- Doesn't mutate the original (non-accelerated) module
- Is only visible to consumers not in the accelerator's exclude list.
  For example, the override for `sklearn.linear_model.LinearRegression`
  is visible to external consumers, but any usage within sklearn itself
  will see the original (non-accelerated) version.

Sometimes, though, we also need to patch sklearn itself, mutating the
original module. Previously we accomplished this with a one-off hack, but
now we need to make this properly supported. Overrides should be
preferred when possible, but patches are sometimes necessary to achieve
robust acceleration in the simplest way.

To do this, we:

- Standardize our internal terminology. An `override` is a non-mutating,
  overlay layer in an accelerated module, only visible to external
  consumers. A `patch` is a mutation applied to the original module, and
  is visible to both internal and external consumers.
- Add builtin support for patches to the accelerator, including a test.
- Rename the `_wrappers` directory to `_overrides`.
- Add a new `_patches` directory to contain any patches.
- Move the `sklearn.utils.all_estimators` patch to the new `_patches`
  directory.
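The override/patch distinction above can be illustrated with a toy sketch. This is not the accelerator's actual mechanism; `apply_patch` and `make_override` are invented names for illustration.

```python
import types

# A stand-in for a third-party module like sklearn.linear_model.
original = types.ModuleType("fakelib")
original.Estimator = type("Estimator", (), {})


def apply_patch(module, name, value):
    # A patch mutates the original module: every consumer, internal or
    # external, observes the new attribute.
    setattr(module, name, value)


def make_override(module, overlay):
    # An override leaves the original module untouched and builds a
    # shadow module; only consumers routed to the shadow (those not on
    # the accelerator's exclude list) see the accelerated attributes.
    shadow = types.ModuleType(module.__name__)
    shadow.__dict__.update(module.__dict__)
    shadow.__dict__.update(overlay)
    return shadow


Accelerated = type("Accelerated", (), {})
shadow = make_override(original, {"Estimator": Accelerated})
assert shadow.Estimator is Accelerated        # external consumers see the override
assert original.Estimator is not Accelerated  # internal consumers are unaffected
```

In the real accelerator the routing is done through an import hook rather than by handing out shadow modules directly, but the visibility semantics are the same.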
@jcrist jcrist self-assigned this Feb 27, 2026
@jcrist jcrist added the improvement Improvement / enhancement to an existing function label Feb 27, 2026
@jcrist jcrist requested a review from a team as a code owner February 27, 2026 20:31
@jcrist jcrist added non-breaking Non-breaking change cuml-accel Issues related to cuml.accel labels Feb 27, 2026
@jcrist jcrist requested a review from dantegd February 27, 2026 20:31
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Feb 27, 2026
@coderabbitai

coderabbitai bot commented Feb 27, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Optimized sklearn Pipeline execution to reduce GPU↔CPU transfers and auto-handle output formats for accelerated operations.
    • Exposed a wrapped all_estimators helper for consistent estimator discovery.
    • Improved module transform/registration to provide more reliable accelerated module overrides.
  • Tests

    • Added extensive tests for pipeline device/host data transfer and fallback behavior to ensure numpy-facing outputs.

Walkthrough

Adds a device-aware sklearn.Pipeline patch to reduce host-device transfers, introduces ModuleTransform to separate overrides and patches, updates accelerator registration to use override/patch sets, adds ensure_host for host transfers, simplifies pytest plugin initialization, and expands pipeline-related tests.
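The `ensure_host` helper mentioned above can be sketched roughly as below. The real cuml.accel version checks `cupy.ndarray` and cupyx sparse types explicitly; this standalone sketch duck-types on `.get()` (CuPy's device-to-host copy method) so it runs without a GPU, and `FakeDeviceArray` is a stand-in for a CuPy array.

```python
import numpy as np


def ensure_host(value):
    """Return `value` as host data, copying from device if needed.

    Sketch only: a fuller version must check types explicitly rather
    than duck-type (a dict also has a `.get` method, for example).
    """
    if isinstance(value, np.ndarray):
        return value
    get = getattr(value, "get", None)
    if callable(get) and not isinstance(value, dict):
        return get()  # cupy.ndarray.get() returns a numpy.ndarray
    return value


class FakeDeviceArray:
    """Stands in for cupy.ndarray in this sketch."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def get(self):
        return self._data.copy()


out = ensure_host(FakeDeviceArray([1, 2, 3]))
assert isinstance(out, np.ndarray) and out.tolist() == [1, 2, 3]
```

Host-side inputs (NumPy arrays, lists) pass through unchanged, which is what lets the proxy call paths use one helper regardless of where the intermediate currently lives.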

Changes

Cohort / File(s) Summary
Pipeline patch
python/cuml/cuml/accel/_patches/sklearn/pipeline.py
New module that determines pipeline output device via get_output_type() and wraps Pipeline methods with patch_method() to enforce output-type contexts, validate hyperparams, post-process outputs (convert device arrays/sparse to host when needed), and export Pipeline.
Sklearn utils wrapper
python/cuml/cuml/accel/_patches/sklearn/utils.py
Replaces prior patched all_estimators with a direct wrapper all_estimators() around _all_estimators, applies functools.wraps, updates module exports.
Accelerator transform framework
python/cuml/cuml/accel/accelerator.py
Introduces ModuleTransform (separate override + patch namespaces), replaces prior patch storage with transforms, adds _load_namespace, and updates loader/finder/install flows to apply transforms.
Accelerated modules config
python/cuml/cuml/accel/core.py
Replaces literal ACCELERATED_MODULES with computed union of new private _OVERRIDES and _PATCHES sets; registration now supplies override and/or patch per module.
Estimator proxy host handling
python/cuml/cuml/accel/estimator_proxy.py
Adds ensure_host() to transfer CuPy/cupyx sparse data to host; adjusts GPU/CPU call paths to conditionally return device arrays or force host outputs and wrap results according to set_output configuration.
Pytest plugin init
python/cuml/cuml/accel/pytest_plugin.py
Removes pre-install sklearn patch step and now calls install() unconditionally in pytest initial conftest hook; updates SPDX year.
Tests: accelerator
python/cuml/cuml_accel_tests/test_accelerator.py
Renames test to reflect override semantics, updates register usage to pass override, and adds test_accelerator_module_patch verifying direct patch mappings.
Tests: pipeline
python/cuml/cuml_accel_tests/test_pipeline.py
Adds extensive pipeline tests: MockMethod, HostTransformer, patch_methods fixture, and parameterized tests validating device/host transfer behavior, fallback-to-host semantics, and numpy outputs.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • csadorf
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.30% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Optimize data transfer for Pipeline in cuml.accel' clearly and specifically summarizes the main change in the PR.
Description check ✅ Passed The description is well-related to the changeset, explaining the optimization strategy and mentioning the refactor to distinguish patches from overrides.
Linked Issues check ✅ Passed The PR implementation meets the core objective from #7778: ensuring Pipeline works under cuml.accel with optimized data transfer for accelerated tail steps.
Out of Scope Changes check ✅ Passed All changes are directly related to Pipeline optimization and the supporting refactor to distinguish overrides from patches, with no unrelated modifications detected.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
python/cuml/cuml/accel/_patches/sklearn/pipeline.py (1)

29-37: Rename unused loop variable `name` to `_name`.

The name variable from tuple unpacking is not used within the loop body.

✨ Suggested fix
-        for name, step in (
+        for _name, step in (
             reversed(pipeline.steps) if reverse else pipeline.steps
         ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/accel/_patches/sklearn/pipeline.py` around lines 29 - 37,
The loop in flat_steps unpacks (name, step) but never uses name; change the loop
variable from name to _name in the for statement that iterates over
pipeline.steps (and reversed(pipeline.steps)) so the unused variable is clearly
marked (i.e., replace "for name, step in ..." with "for _name, step in ...").
Ensure the rest of flat_steps, Pipeline check, and yield logic remain unchanged.
python/cuml/cuml/accel/_patches/sklearn/utils.py (1)

38-44: Consider replacing the bare assert with a more informative error.

If the assertion fails (e.g., due to an unexpected state where multiple proxied classes share the same name), a bare AssertionError provides no context. A descriptive exception would aid debugging.

🛡️ Suggested improvement
         if len(cls_list) == 1:
             estimators.append((name, cls_list[0]))
         else:
             proxied_cls = [cls for cls in cls_list if is_proxy(cls)]
-            assert len(proxied_cls) == 1
+            if len(proxied_cls) != 1:
+                raise RuntimeError(
+                    f"Expected exactly one proxy class for {name!r}, "
+                    f"found {len(proxied_cls)}"
+                )
             estimators.append((name, proxied_cls[0]))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/accel/_patches/sklearn/utils.py` around lines 38 - 44,
Replace the bare assert in the loop over estimator_groups with an explicit
exception that includes context: after collecting proxied_cls via is_proxy for a
given name, if len(proxied_cls) != 1 raise a ValueError (or RuntimeError) that
states the estimator group name and the number of proxied classes found (and
optionally lists their types) instead of using assert, then append the single
proxied class to estimators as before; this change touches the loop using
estimator_groups, proxied_cls, is_proxy, and the estimators list.
python/cuml/cuml_accel_tests/test_pipeline.py (1)

221-221: Consider prefixing unused unpacked variables with underscore.

The y_test variable is unpacked but never used in these test functions.

✨ Suggested fix
# Line 221
-    X_train, X_test, y_train, y_test = regression_data
+    X_train, X_test, y_train, _y_test = regression_data

# Line 307
-    X_train, X_test, y_train, y_test = regression_data
+    X_train, X_test, y_train, _y_test = regression_data

Also applies to: 307-307

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml_accel_tests/test_pipeline.py` at line 221, The test unpacks
regression_data into X_train, X_test, y_train, y_test but y_test is unused;
update the unpacking to prefix the unused variable with an underscore (e.g.,
X_train, X_test, y_train, _y_test) in the test functions that perform this
unpack (references: variables X_train, X_test, y_train, y_test in the test
file), and apply the same change at the other occurrence around the 307 context
so the linter/unused-variable warnings are resolved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/accel/_patches/sklearn/pipeline.py`:
- Around line 78-83: The conditional in the pipeline code uses mixed `and`/`or`
without parentheses causing `is_cp_sparse(out)` to trigger conversion
unconditionally; update the condition in the block that checks
GlobalSettings().output_type (used around variable out in the pipeline) to
require the output type be in (None, "numpy") AND that the output is either a
cupy ndarray or a cupy sparse (i.e. change the test to
GlobalSettings().output_type in (None, "numpy") and (isinstance(out, cp.ndarray)
or is_cp_sparse(out))) so conversion with out.get() only happens when both the
desired output type and out's type match.

---

Nitpick comments:
In `@python/cuml/cuml_accel_tests/test_pipeline.py`:
- Line 221: The test unpacks regression_data into X_train, X_test, y_train,
y_test but y_test is unused; update the unpacking to prefix the unused variable
with an underscore (e.g., X_train, X_test, y_train, _y_test) in the test
functions that perform this unpack (references: variables X_train, X_test,
y_train, y_test in the test file), and apply the same change at the other
occurrence around the 307 context so the linter/unused-variable warnings are
resolved.

In `@python/cuml/cuml/accel/_patches/sklearn/pipeline.py`:
- Around line 29-37: The loop in flat_steps unpacks (name, step) but never uses
name; change the loop variable from name to _name in the for statement that
iterates over pipeline.steps (and reversed(pipeline.steps)) so the unused
variable is clearly marked (i.e., replace "for name, step in ..." with "for
_name, step in ..."). Ensure the rest of flat_steps, Pipeline check, and yield
logic remain unchanged.

In `@python/cuml/cuml/accel/_patches/sklearn/utils.py`:
- Around line 38-44: Replace the bare assert in the loop over estimator_groups
with an explicit exception that includes context: after collecting proxied_cls
via is_proxy for a given name, if len(proxied_cls) != 1 raise a ValueError (or
RuntimeError) that states the estimator group name and the number of proxied
classes found (and optionally lists their types) instead of using assert, then
append the single proxied class to estimators as before; this change touches the
loop using estimator_groups, proxied_cls, is_proxy, and the estimators list.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed4de0a and 750627f.

📒 Files selected for processing (24)
  • python/cuml/cuml/accel/_overrides/__init__.py
  • python/cuml/cuml/accel/_overrides/hdbscan.py
  • python/cuml/cuml/accel/_overrides/sklearn/__init__.py
  • python/cuml/cuml/accel/_overrides/sklearn/cluster.py
  • python/cuml/cuml/accel/_overrides/sklearn/covariance.py
  • python/cuml/cuml/accel/_overrides/sklearn/decomposition.py
  • python/cuml/cuml/accel/_overrides/sklearn/ensemble.py
  • python/cuml/cuml/accel/_overrides/sklearn/kernel_ridge.py
  • python/cuml/cuml/accel/_overrides/sklearn/linear_model.py
  • python/cuml/cuml/accel/_overrides/sklearn/manifold.py
  • python/cuml/cuml/accel/_overrides/sklearn/neighbors.py
  • python/cuml/cuml/accel/_overrides/sklearn/preprocessing.py
  • python/cuml/cuml/accel/_overrides/sklearn/svm.py
  • python/cuml/cuml/accel/_overrides/umap.py
  • python/cuml/cuml/accel/_patches/__init__.py
  • python/cuml/cuml/accel/_patches/sklearn/__init__.py
  • python/cuml/cuml/accel/_patches/sklearn/pipeline.py
  • python/cuml/cuml/accel/_patches/sklearn/utils.py
  • python/cuml/cuml/accel/accelerator.py
  • python/cuml/cuml/accel/core.py
  • python/cuml/cuml/accel/estimator_proxy.py
  • python/cuml/cuml/accel/pytest_plugin.py
  • python/cuml/cuml_accel_tests/test_accelerator.py
  • python/cuml/cuml_accel_tests/test_pipeline.py

@jcrist jcrist force-pushed the cuml-accel-pipeline branch from 750627f to f4b2e63 Compare February 27, 2026 20:42
@@ -1,5 +1,5 @@
 #
-# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION.
Member Author


Previously the accelerator only supported "overrides" (though we weren't consistent in our terminology).

An override:

  • Doesn't mutate the original (non-accelerated) module
  • Is only visible to consumers not in the accelerator's exclude list. For example, the override for sklearn.linear_model.LinearRegression is visible to external consumers, but any usage within sklearn itself will see the original (non-accelerated) version.

Sometimes, though, we also need to patch sklearn itself, mutating the original module. Previously we accomplished this with a one-off hack, but now we have a need to make this properly supported. Overrides should be preferred when possible, but patches are sometimes necessary to achieve robust acceleration in the simplest way.

The changes in this file:

  • Standardize our internal terminology. An override is a non-mutating, overlay layer in an accelerated module, only visible to external consumers. A patch is a mutation applied to the original module, and is visible to both internal and external consumers.
  • Add builtin support for patches to the accelerator

This corresponds with a few changes in other files:

  • Rename the _wrappers directory to _overrides.
  • Add a new _patches directory to contain any patches.
  • Move the sklearn.utils.all_estimators patch to the new _patches directory. We want to always apply this patch, since without it all_estimators returns incorrect results.


try:
    install()
except RuntimeError:
Member Author


This RuntimeError could never occur and was mistakenly leftover from the old implementation. Safe to rip this out.

# reference.html#pytest.hookspec.pytest_load_initial_conftests

# Apply sklearn patches BEFORE installing cuml.accel to prevent duplicates
apply_sklearn_patches()
Member Author


Patches are now always applied, no need to hack this in here.

This adds an optimization to `Pipeline` when running under cuml.accel.
If a pipeline is composed of accelerated estimators (or only starts with
unaccelerated estimators), then the accelerated estimators will now pass
data on device rather than doing a series of device<->host transfers.
@jcrist jcrist force-pushed the cuml-accel-pipeline branch from f4b2e63 to fc6bfcf Compare February 27, 2026 20:44

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
python/cuml/cuml/accel/_patches/sklearn/pipeline.py (1)

27-37: Rename unused loop variable `name` to `_name`.

The loop variable name is not used within the loop body. Per Python convention, prefix unused variables with an underscore.

♻️ Proposed fix
     def flat_steps(pipeline):
         """Iterate over steps potentially nested pipelines"""
-        for name, step in (
+        for _name, step in (
             reversed(pipeline.steps) if reverse else pipeline.steps
         ):
             if step in (None, "passthrough"):
                 continue
             if isinstance(step, Pipeline):
                 yield from flat_steps(step)
             else:
                 yield step
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/accel/_patches/sklearn/pipeline.py` around lines 27 - 37, In
the flat_steps function, the loop binds an unused variable name; rename it to
_name to follow Python convention for unused variables by changing the loop
header in flat_steps from "for name, step in (reversed(pipeline.steps) if
reverse else pipeline.steps):" to "for _name, step in (reversed(pipeline.steps)
if reverse else pipeline.steps):" so linters and readers know the first element
is intentionally unused.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@python/cuml/cuml/accel/_patches/sklearn/pipeline.py`:
- Around line 27-37: In the flat_steps function, the loop binds an unused
variable name; rename it to _name to follow Python convention for unused
variables by changing the loop header in flat_steps from "for name, step in
(reversed(pipeline.steps) if reverse else pipeline.steps):" to "for _name, step
in (reversed(pipeline.steps) if reverse else pipeline.steps):" so linters and
readers know the first element is intentionally unused.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 750627f and fc6bfcf.

📒 Files selected for processing (4)
  • python/cuml/cuml/accel/_patches/sklearn/pipeline.py
  • python/cuml/cuml/accel/core.py
  • python/cuml/cuml/accel/estimator_proxy.py
  • python/cuml/cuml_accel_tests/test_pipeline.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • python/cuml/cuml/accel/core.py

jcrist added 2 commits March 2, 2026 10:46
Some methods may return dtypes not supported on device, and others would
never be chained in a context where this pipeline optimization would
take effect. It's easier to only enable device arrays for *transform*
methods than to optionally gate the inverse.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml_accel_tests/test_pipeline.py`:
- Line 65: The linter warnings are from unused parameters/variables: in the fit
method (def fit(self, X, y=None)) keep the public API but mark y as
intentionally unused by adding "del y" or referencing it as "_ = y" at the top
of fit; for the test files rename local variables y_test to _y_test (or prefix
with underscore) where they are unused (references at the test functions around
the previous y_test occurrences) to silence unused-variable warnings while
preserving behavior.
- Around line 306-321: The test test_pipeline_data_transfer_with_host_fallback
is currently patching Ridge at the proxy-facing API (patch_methods(Ridge, "fit",
"predict")) so it checks the wrong interception point; to validate device→host
conversion before CPU fallback you should patch the CPU-side estimator methods
where ensure_host runs (i.e., the CPU estimator class/methods invoked after
fallback) instead of the proxy Ridge methods, then assert the intercepted
fit/predict argument types are numpy.ndarray (not cupy) and that
pipeline.predict returns np.ndarray; update patch_methods to target the CPU
estimator implementation used in the fallback call path (the actual estimator
class/method names invoked after ensure_host) and change the type assertions
accordingly.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc6bfcf and 5130dad.

📒 Files selected for processing (2)
  • python/cuml/cuml/accel/estimator_proxy.py
  • python/cuml/cuml_accel_tests/test_pipeline.py

Contributor

@csadorf csadorf left a comment


Fantastic!

@jcrist
Member Author

jcrist commented Mar 3, 2026

/merge

@rapids-bot rapids-bot bot merged commit 9ab9f4e into rapidsai:main Mar 3, 2026
238 of 249 checks passed
@jcrist jcrist deleted the cuml-accel-pipeline branch March 3, 2026 03:09

Labels

cuml-accel Issues related to cuml.accel Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change

Development

Successfully merging this pull request may close these issues.

Add cuml.accel support for Pipeline
