Optimize data transfer for Pipeline in cuml.accel #7835
rapids-bot[bot] merged 4 commits into rapidsai:main

Conversation
Previously the accelerator only supported "overrides" (though we weren't consistent in our terminology). An override:

- Doesn't mutate the original (non-accelerated) module
- Is only visible to consumers not in the accelerator's exclude list. For example, the override for `sklearn.linear_models.LinearRegression` is visible to external consumers, but any usage within sklearn itself will see the original (non-accelerated) version.

Sometimes though we also need to patch sklearn itself, mutating the original module. Previously we accomplished this with a one-off hack, but now I have a need to make this something more supported. Overrides should be preferred when possible, but patches are sometimes necessary to achieve robust acceleration in the simplest way. To do this, we:

- Standardize our internal terminology. An `override` is a non-mutating overlay layer in an accelerated module, only visible to external consumers. A `patch` is a mutation applied to the original module, and is visible to both internal and external consumers.
- Add builtin support for patches to the accelerator, including a test.
- Rename the `_wrappers` directory to `_overrides`.
- Add a new `_patches` directory to contain any patches.
- Move the `sklearn.utils.all_estimators` patch to the new `_patches` directory.
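The override/patch distinction can be sketched in a few lines. This is a minimal illustration only: `apply_patch` and `make_override_module` are hypothetical names, not the actual cuml.accel API, and the real accelerator routes imports through an import hook rather than handing out overlay modules directly.

```python
import types


def apply_patch(module, name, replacement):
    """Patch: mutate the original module in place. Every consumer,
    including the library's own internals, sees the replacement."""
    setattr(module, name, replacement)


def make_override_module(original, overrides):
    """Override: build an overlay module that shadows selected names
    without touching the original. Only consumers routed to the
    overlay (e.g. via an import hook) see the accelerated versions."""
    overlay = types.ModuleType(original.__name__)
    overlay.__dict__.update(original.__dict__)
    overlay.__dict__.update(overrides)
    return overlay


# Demo with a stand-in "library" module.
lib = types.ModuleType("lib")
lib.f = lambda: "original"

overlay = make_override_module(lib, {"f": lambda: "accelerated"})
assert lib.f() == "original"         # original untouched by the override
assert overlay.f() == "accelerated"  # external consumers see the override

apply_patch(lib, "f", lambda: "patched")
assert lib.f() == "patched"          # a patch is visible everywhere
```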
📝 Walkthrough (summary by CodeRabbit)

Adds a device-aware sklearn.Pipeline patch to reduce host-device transfers, introduces ModuleTransform to separate overrides and patches, updates accelerator registration to use override/patch sets, adds ensure_host for host transfers, simplifies pytest plugin initialization, and expands pipeline-related tests.
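The `ensure_host` helper mentioned in the walkthrough is not shown in this review. A rough duck-typed sketch of what such a helper might look like (hypothetical; the real implementation presumably checks cupy's array and sparse types explicitly rather than relying on a `.get` attribute, which would also match unrelated objects like dicts):

```python
def ensure_host(x):
    """Return a host-resident copy of ``x``.

    CuPy device arrays (dense and sparse) expose ``.get()`` to copy
    their contents to host memory; host objects pass through unchanged.
    This sketch uses duck typing so it runs even without cupy installed;
    a real implementation would use isinstance checks against cupy types.
    """
    return x.get() if hasattr(x, "get") else x


# Stand-in for a cupy.ndarray: a device object whose .get() returns host data.
class FakeDeviceArray:
    def __init__(self, data):
        self._data = data

    def get(self):
        return list(self._data)  # "host copy" of the device buffer


assert ensure_host([1, 2, 3]) == [1, 2, 3]                    # host passthrough
assert ensure_host(FakeDeviceArray((1, 2, 3))) == [1, 2, 3]   # device -> host
```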
Actionable comments posted: 1
🧹 Nitpick comments (3)
python/cuml/cuml/accel/_patches/sklearn/pipeline.py (1)
29-37: Rename unused loop variable `name` to `_name`.

The `name` variable from tuple unpacking is not used within the loop body.

✨ Suggested fix

```diff
-    for name, step in (
+    for _name, step in (
         reversed(pipeline.steps) if reverse else pipeline.steps
     ):
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@python/cuml/cuml/accel/_patches/sklearn/pipeline.py` around lines 29-37, the loop in flat_steps unpacks (name, step) but never uses name; change the loop variable from name to _name in the for statement that iterates over pipeline.steps (and reversed(pipeline.steps)) so the unused variable is clearly marked (i.e., replace "for name, step in ..." with "for _name, step in ..."). Ensure the rest of flat_steps, the Pipeline check, and the yield logic remain unchanged.

python/cuml/cuml/accel/_patches/sklearn/utils.py (1)
38-44: Consider replacing the bare `assert` with a more informative error.

If the assertion fails (e.g., due to an unexpected state where multiple proxied classes share the same name), a bare `AssertionError` provides no context. A descriptive exception would aid debugging.

🛡️ Suggested improvement

```diff
     if len(cls_list) == 1:
         estimators.append((name, cls_list[0]))
     else:
         proxied_cls = [cls for cls in cls_list if is_proxy(cls)]
-        assert len(proxied_cls) == 1
+        if len(proxied_cls) != 1:
+            raise RuntimeError(
+                f"Expected exactly one proxy class for {name!r}, "
+                f"found {len(proxied_cls)}"
+            )
         estimators.append((name, proxied_cls[0]))
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@python/cuml/cuml/accel/_patches/sklearn/utils.py` around lines 38-44, replace the bare assert in the loop over estimator_groups with an explicit exception that includes context: after collecting proxied_cls via is_proxy for a given name, if len(proxied_cls) != 1 raise a ValueError (or RuntimeError) that states the estimator group name and the number of proxied classes found (and optionally lists their types) instead of using assert, then append the single proxied class to estimators as before; this change touches the loop using estimator_groups, proxied_cls, is_proxy, and the estimators list.

python/cuml/cuml_accel_tests/test_pipeline.py (1)
221: Consider prefixing unused unpacked variables with an underscore.

The `y_test` variable is unpacked but never used in these test functions.

✨ Suggested fix

```diff
 # Line 221
-    X_train, X_test, y_train, y_test = regression_data
+    X_train, X_test, y_train, _y_test = regression_data

 # Line 307
-    X_train, X_test, y_train, y_test = regression_data
+    X_train, X_test, y_train, _y_test = regression_data
```

Also applies to: 307

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@python/cuml/cuml_accel_tests/test_pipeline.py` at line 221, the test unpacks regression_data into X_train, X_test, y_train, y_test but y_test is unused; update the unpacking to prefix the unused variable with an underscore (e.g., X_train, X_test, y_train, _y_test) in the test functions that perform this unpack, and apply the same change at the other occurrence around line 307 so the linter/unused-variable warnings are resolved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/cuml/cuml/accel/_patches/sklearn/pipeline.py`:
- Around line 78-83: The conditional in the pipeline code uses mixed `and`/`or`
without parentheses causing `is_cp_sparse(out)` to trigger conversion
unconditionally; update the condition in the block that checks
GlobalSettings().output_type (used around variable out in the pipeline) to
require the output type be in (None, "numpy") AND that the output is either a
cupy ndarray or a cupy sparse (i.e. change the test to
GlobalSettings().output_type in (None, "numpy") and (isinstance(out, cp.ndarray)
or is_cp_sparse(out))) so conversion with out.get() only happens when both the
desired output type and out's type match.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
- python/cuml/cuml/accel/_overrides/__init__.py
- python/cuml/cuml/accel/_overrides/hdbscan.py
- python/cuml/cuml/accel/_overrides/sklearn/__init__.py
- python/cuml/cuml/accel/_overrides/sklearn/cluster.py
- python/cuml/cuml/accel/_overrides/sklearn/covariance.py
- python/cuml/cuml/accel/_overrides/sklearn/decomposition.py
- python/cuml/cuml/accel/_overrides/sklearn/ensemble.py
- python/cuml/cuml/accel/_overrides/sklearn/kernel_ridge.py
- python/cuml/cuml/accel/_overrides/sklearn/linear_model.py
- python/cuml/cuml/accel/_overrides/sklearn/manifold.py
- python/cuml/cuml/accel/_overrides/sklearn/neighbors.py
- python/cuml/cuml/accel/_overrides/sklearn/preprocessing.py
- python/cuml/cuml/accel/_overrides/sklearn/svm.py
- python/cuml/cuml/accel/_overrides/umap.py
- python/cuml/cuml/accel/_patches/__init__.py
- python/cuml/cuml/accel/_patches/sklearn/__init__.py
- python/cuml/cuml/accel/_patches/sklearn/pipeline.py
- python/cuml/cuml/accel/_patches/sklearn/utils.py
- python/cuml/cuml/accel/accelerator.py
- python/cuml/cuml/accel/core.py
- python/cuml/cuml/accel/estimator_proxy.py
- python/cuml/cuml/accel/pytest_plugin.py
- python/cuml/cuml_accel_tests/test_accelerator.py
- python/cuml/cuml_accel_tests/test_pipeline.py
Force-pushed from 750627f to f4b2e63 (Compare)
```diff
@@ -1,5 +1,5 @@
 #
-# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION.
```
Previously the accelerator only supported "overrides" (though we weren't consistent in our terminology). An override:

- Doesn't mutate the original (non-accelerated) module
- Is only visible to consumers not in the accelerator's exclude list. For example, the override for `sklearn.linear_models.LinearRegression` is visible to external consumers, but any usage within sklearn itself will see the original (non-accelerated) version.

Sometimes though we also need to patch sklearn itself, mutating the original module. Previously we accomplished this with a one-off hack, but now we have a need to make this something more supported. Overrides should be preferred when possible, but patches are sometimes necessary to achieve robust acceleration in the simplest way.

The changes in this file:

- Standardize our internal terminology. An `override` is a non-mutating overlay layer in an accelerated module, only visible to external consumers. A `patch` is a mutation applied to the original module, and is visible to both internal and external consumers.
- Add builtin support for patches to the accelerator.

This corresponds with a few changes in other files:

- Rename the `_wrappers` directory to `_overrides`.
- Add a new `_patches` directory to contain any patches.
- Move the `sklearn.utils.all_estimators` patch to the new `_patches` directory. We want to always apply this patch, since without it `all_estimators` returns incorrect results.
```python
try:
    install()
except RuntimeError:
```
This RuntimeError could never occur and was mistakenly leftover from the old implementation. Safe to rip this out.
```python
# reference.html#pytest.hookspec.pytest_load_initial_conftests

# Apply sklearn patches BEFORE installing cuml.accel to prevent duplicates
apply_sklearn_patches()
```
Patches are now always applied, no need to hack this in here.
This adds an optimization to `Pipeline` when running under cuml.accel. If a pipeline is composed of accelerated estimators (or only starts with unaccelerated estimators), then the accelerated estimators will now pass data on device rather than doing a series of device<->host transfers.
Force-pushed from f4b2e63 to fc6bfcf (Compare)
🧹 Nitpick comments (1)
python/cuml/cuml/accel/_patches/sklearn/pipeline.py (1)
27-37: Rename unused loop variable `name` to `_name`.

The loop variable `name` is not used within the loop body. Per Python convention, prefix unused variables with an underscore.

♻️ Proposed fix

```diff
 def flat_steps(pipeline):
     """Iterate over steps potentially nested pipelines"""
-    for name, step in (
+    for _name, step in (
         reversed(pipeline.steps) if reverse else pipeline.steps
     ):
         if step in (None, "passthrough"):
             continue
         if isinstance(step, Pipeline):
             yield from flat_steps(step)
         else:
             yield step
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@python/cuml/cuml/accel/_patches/sklearn/pipeline.py` around lines 27-37, in the flat_steps function the loop binds an unused variable name; rename it to _name to follow Python convention for unused variables by changing the loop header from "for name, step in (reversed(pipeline.steps) if reverse else pipeline.steps):" to "for _name, step in (reversed(pipeline.steps) if reverse else pipeline.steps):" so linters and readers know the first element is intentionally unused.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- python/cuml/cuml/accel/_patches/sklearn/pipeline.py
- python/cuml/cuml/accel/core.py
- python/cuml/cuml/accel/estimator_proxy.py
- python/cuml/cuml_accel_tests/test_pipeline.py
🚧 Files skipped from review as they are similar to previous changes (1)
- python/cuml/cuml/accel/core.py
Some methods may return dtypes not supported on device, and others would never be chained in a context where this pipeline optimization would take effect. Easier to only enable device arrays for *transform* methods than to optionally gate the inverse.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/cuml/cuml_accel_tests/test_pipeline.py`:
- Line 65: The linter warnings are from unused parameters/variables: in the fit
method (def fit(self, X, y=None)) keep the public API but mark y as
intentionally unused by adding "del y" or referencing it as "_ = y" at the top
of fit; for the test files rename local variables y_test to _y_test (or prefix
with underscore) where they are unused (references at the test functions around
the previous y_test occurrences) to silence unused-variable warnings while
preserving behavior.
- Around line 306-321: The test test_pipeline_data_transfer_with_host_fallback
is currently patching Ridge at the proxy-facing API (patch_methods(Ridge, "fit",
"predict")) so it checks the wrong interception point; to validate device→host
conversion before CPU fallback you should patch the CPU-side estimator methods
where ensure_host runs (i.e., the CPU estimator class/methods invoked after
fallback) instead of the proxy Ridge methods, then assert the intercepted
fit/predict argument types are numpy.ndarray (not cupy) and that
pipeline.predict returns np.ndarray; update patch_methods to target the CPU
estimator implementation used in the fallback call path (the actual estimator
class/method names invoked after ensure_host) and change the type assertions
accordingly.
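The interception strategy suggested here can be built from a small recording wrapper. The sketch below uses a stand-in estimator rather than the actual test helpers or sklearn classes; in the real test, the wrapped class would be the CPU-side estimator invoked after fallback, and the assertions would check for `numpy.ndarray`:

```python
import functools


def record_arg_types(cls, method_name, log):
    """Wrap ``cls.method_name`` so each call appends the type of its
    first positional argument to ``log`` before delegating."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, X, *args, **kwargs):
        log.append(type(X))
        return original(self, X, *args, **kwargs)

    setattr(cls, method_name, wrapper)


# Stand-in for a CPU-side estimator reached after host fallback.
class CPUEstimator:
    def fit(self, X, y=None):
        return self

    def predict(self, X):
        return [0] * len(X)


calls = []
record_arg_types(CPUEstimator, "fit", calls)
record_arg_types(CPUEstimator, "predict", calls)

est = CPUEstimator().fit([[1.0], [2.0]])
est.predict([[3.0]])
assert calls == [list, list]  # each intercepted call saw host-side data
```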
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- python/cuml/cuml/accel/estimator_proxy.py
- python/cuml/cuml_accel_tests/test_pipeline.py
/merge
This PR adds an optimization for `Pipeline`s when running under `cuml.accel`. In cases where the tail end of operations performed by a pipeline are all accelerated, we can avoid intermediate host<->device transfers and pass the intermediates on device.

For example, a pipeline of `unaccelerated -> unaccelerated -> accelerated -> accelerated` wouldn't need to convert the intermediate result back to numpy between steps 3 and 4 and could pass that on device. A pipeline of `unaccelerated -> accelerated -> unaccelerated -> accelerated`, though, would keep passing intermediates on host, since there's not a contiguous block of accelerated estimators at the tail.

The actual implementation of this is roughly ~100 lines. The remainder is tests, as well as a refactor to better distinguish between patches (which mutate the original module) and overrides (which overlay over the original module).
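The "contiguous accelerated tail" rule described above can be modeled as a backwards scan over the steps. This is a simplified sketch, not the PR's implementation: `is_accelerated` stands in for however cuml.accel detects proxied estimators, and steps are plain (name, marker) tuples:

```python
def accelerated_tail(steps, is_accelerated):
    """Return the names of the longest contiguous run of accelerated
    steps at the end of a pipeline. Only this run can safely pass
    intermediates on device; everything before it stays on host."""
    tail = []
    for name, step in reversed(steps):
        if not is_accelerated(step):
            break
        tail.append(name)
    tail.reverse()
    return tail


# Trailing accelerated block: steps 3 and 4 can exchange device arrays.
steps = [("s1", "cpu"), ("s2", "cpu"), ("s3", "gpu"), ("s4", "gpu")]
assert accelerated_tail(steps, lambda s: s == "gpu") == ["s3", "s4"]

# Non-contiguous pipeline: only the final step is in the tail, so there
# is no intermediate device handoff to save.
steps = [("s1", "cpu"), ("s2", "gpu"), ("s3", "cpu"), ("s4", "gpu")]
assert accelerated_tail(steps, lambda s: s == "gpu") == ["s4"]
```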
Fixes #7778. Supersedes #7782.