Add sparse input support to `ElasticNet`/`Lasso` by jcrist · Pull Request #7943 · rapidsai/cuml

jcrist · 2026-04-01T18:46:54Z

This PR:

- Adds sparse input support to `ElasticNet` and `Lasso`, based on the existing QN solver. To accomplish this, the default of `solver` changes to `'auto'`, which uses `'cd'` for dense inputs and `'qn'` for sparse inputs.
- Consolidates the tests for `ElasticNet` and `Lasso` in `test_elastic_net.py`. Previously these were duplicated and spread among a few files.
- Adds tests for the new sparse functionality.
- Updates the `cuml.accel` integration and docs accordingly. Most new xfails are due to numerical equivalence checks or lack of support for `dual_gap_`.
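
The solver dispatch described in the first item can be sketched in a few lines. This is a minimal illustration, not the code from the PR: `resolve_solver` is a hypothetical helper name, and the exception type raised for `'cd'` with sparse input is an assumption.

```python
def resolve_solver(solver: str, input_is_sparse: bool) -> str:
    """Hypothetical helper: map the user-facing solver to the one that runs."""
    if solver not in ("auto", "cd", "qn"):
        raise ValueError(f"solver must be 'auto', 'cd', or 'qn', got {solver!r}")
    if solver == "auto":
        # Per the PR description: 'cd' when dense, 'qn' when sparse.
        return "qn" if input_is_sparse else "cd"
    if solver == "cd" and input_is_sparse:
        # The review walkthrough says 'cd' rejects sparse inputs;
        # the exception type used here is an assumption.
        raise ValueError("solver='cd' does not support sparse inputs")
    return solver


print(resolve_solver("auto", input_is_sparse=True))
print(resolve_solver("auto", input_is_sparse=False))
```

Keeping the resolution in one place means an explicit `solver='qn'` still works for dense inputs, while only the `'auto'` default depends on the input type.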

Fixes #7912.

One test was duplicative and was just deleted. The other was moved unchanged.

coderabbitai · 2026-04-01T19:01:33Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added sparse input support for ElasticNet and Lasso models.
- Added "auto" solver option that automatically selects between coordinate descent and quasi-Newton solvers based on input sparsity.
- Added sparse_coef_ property to return coefficient matrix in sparse format.
Bug Fixes
- Added validation to reject multi-output targets in GPU fitting with appropriate error messaging.
Documentation
- Updated limitations documentation to reflect improved sparse input handling.

Walkthrough

This PR adds sparse matrix support to ElasticNet and Lasso models by introducing SparseInputTagMixin, implementing auto-solver selection (choosing "qn" for sparse inputs, "cd" for dense), adding sparse_coef_ property, and validating multi-output targets in accelerated code paths. Documentation, tests, and xfail lists are updated accordingly.
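
To make the `sparse_coef_` idea concrete, the sketch below shows what a CSR representation of a single coefficient row holds. It is a pure-Python illustration under stated assumptions: `dense_row_to_csr` is a hypothetical name, and the actual property presumably builds the matrix with a sparse library rather than by hand.

```python
def dense_row_to_csr(coef):
    """Return (data, indices, indptr) for a 1 x n_features CSR matrix."""
    data, indices = [], []
    for j, value in enumerate(coef):
        if value != 0.0:
            data.append(value)   # stored nonzero value
            indices.append(j)    # its column (feature) index
    # One row only, so indptr marks where that row starts and ends in `data`.
    indptr = [0, len(data)]
    return data, indices, indptr


coef = [0.0, 1.5, 0.0, 0.0, -2.0]
data, indices, indptr = dense_row_to_csr(coef)
print(data, indices, indptr)  # [1.5, -2.0] [1, 4] [0, 2]
```

Because the L1 penalty tends to drive many coefficients to exactly zero, storing only the nonzero entries can be much smaller than the dense `coef_` vector.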

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: `docs/source/cuml-accel/limitations.rst` | Removed "If X is sparse" fallback condition from ElasticNet and Lasso CPU fallback lists, reflecting new sparse support. |
| Core Implementation: `python/cuml/cuml/linear_model/elastic_net.py`, `python/cuml/cuml/linear_model/lasso.py` | Added sparse support via `SparseInputTagMixin`, changed solver default to "auto" with logic to select "qn" for sparse and "cd" for dense inputs, added `sparse_coef_` property returning CSR sparse matrix, tightened "cd" solver to explicitly reject sparse inputs. |
| Accelerated Code Paths: `python/cuml/cuml/accel/_overrides/sklearn/linear_model.py` | Updated ElasticNet and Lasso `_gpu_fit` to convert targets via `input_to_cuml_array`, validate non-multi-output targets, and raise `UnsupportedOnGPU` for multi-output cases. |
| Test Coverage: `python/cuml/tests/test_elastic_net.py` | Added Hypothesis-based solver compatibility tests, sparse regression tests with tight tolerances against scikit-learn, `sparse_coef_` verification, and solver error handling assertions. |
| Test Infrastructure: `python/cuml/tests/test_linear_model.py`, `python/cuml/tests/test_exceptions.py` | Removed legacy ElasticNet tests from `test_linear_model.py` and `test_exceptions.py` (consolidated into dedicated `test_elastic_net.py`). |
| CI Expectations: `python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml` | Updated xfail list to replace multi-output sparse tests with sparse format-specific xfails, added sample-weight consistency xfails for ElasticNet, and adjusted estimator-check xfails. |
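
The multi-output validation mentioned for the accelerated code paths can be illustrated with a shape check. This is a sketch under assumptions: the `UnsupportedOnGPU` class below is a local stand-in for the exception named above, `check_single_output` is a hypothetical helper, and treating an `(n, 1)` column as single-output is a choice made here, not something the summary confirms.

```python
class UnsupportedOnGPU(ValueError):
    """Local stand-in for the exception named in the summary table."""


def check_single_output(y_shape):
    """Raise if a target with the given shape would be multi-output."""
    if len(y_shape) > 2:
        raise ValueError(f"y must be 1-D or 2-D, got shape {y_shape}")
    # Assumption: a single column (n, 1) still counts as single-output.
    if len(y_shape) == 2 and y_shape[1] > 1:
        raise UnsupportedOnGPU("Multi-output targets are not supported")


check_single_output((100,))    # passes silently
check_single_output((100, 1))  # passes silently
try:
    check_single_output((100, 3))
except UnsupportedOnGPU as exc:
    print(f"falling back: {exc}")
```

Raising a dedicated exception gives the acceleration layer a signal it can catch to fall back to the CPU implementation instead of failing the user's call.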

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

CI Update xfail list #7768: Modifies xfail configurations for ElasticNet/Lasso test expectations, overlapping with CI expectation updates in this PR.

Suggested labels

cuml-accel

Suggested reviewers

divyegala
dantegd
viclafargue

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Add sparse input support to `ElasticNet`/`Lasso`' directly describes the main changes: adding sparse input support to both ElasticNet and Lasso models. |
| Description check | ✅ Passed | The description comprehensively covers the changeset: sparse input support addition, solver default change to 'auto', test consolidation, new sparse tests, and cuml.accel updates. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue `#7912` by implementing sparse matrix input support for ElasticNet/Lasso with proper solver selection ('auto' chooses 'cd' for dense, 'qn' for sparse). |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing sparse input support: new sparse handling logic, solver updates, test consolidation, and documentation/xfail updates align with the stated objective. |

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

python/cuml/cuml/linear_model/lasso.py (1)
33-46: ⚠️ Potential issue | 🟡 Minor

Clarify that `selection` only affects coordinate descent.

`solver='auto'` now resolves to `'qn'` for sparse inputs, but the `selection` paragraph still reads as if it always changes fitting behavior. Please mirror `ElasticNet` here and state that `selection` is only used when the resolved solver is `'cd'`.

As per coding guidelines, "Missing docstrings for public methods, undocumented hyperparameters, or missing scikit-learn compatibility notes in documentation must be addressed."

Also applies to: 125-125
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/linear_model/lasso.py` around lines 33 - 46, Update the
Lasso docstring to clarify that the selection parameter only applies when the
coordinate descent solver is used: explicitly state that when solver resolves to
'cd' (including when solver='auto' resolves to 'cd' for dense inputs) the
selection option ('cyclic' or 'random') affects coefficient updates, and that
selection is ignored when the resolved solver is 'qn' (including when
solver='auto' resolves to 'qn' for sparse inputs); mirror the wording used in
ElasticNet's docstring so the behavior is consistent and also update any other
Lasso method docstring mentioning selection to the same wording.
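
One possible shape for the requested docstring wording is sketched below. The class is a stand-in that exists only to carry the docstring; the text is illustrative and is not the wording used in cuML's `ElasticNet` or `Lasso`.

```python
class LassoDocSketch:
    """Stand-in class used only to carry a docstring (not cuML's actual text).

    Parameters
    ----------
    solver : {'auto', 'cd', 'qn'}
        Solver used to fit the model. 'auto' resolves to 'cd' for dense
        inputs and to 'qn' for sparse inputs.
    selection : {'cyclic', 'random'}
        Order in which coefficients are updated. Only used when the resolved
        solver is 'cd'; ignored when the resolved solver is 'qn', including
        when solver='auto' resolves to 'qn' for sparse inputs.
    """


print(LassoDocSketch.__doc__)
```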

🧹 Nitpick comments (1)

python/cuml/tests/test_elastic_net.py (1)
369-390: Please exercise at least one sparse-array input here.

This only covers scipy.sparse.csr_matrix, but the new code path is keyed off generic sparse detection and the same PR adds separate sparse-array / other sparse-format xfails upstream. Adding a csr_array case here would give us a local guardrail for the sparse-array path too.

As per coding guidelines, "Test files must validate numerical correctness by comparing with scikit-learn, include edge case coverage (empty datasets, single sample, high-dimensional data), test fit/predict/transform consistency, and test different input types (cuDF, pandas, NumPy)."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/tests/test_elastic_net.py` around lines 369 - 390, The
test_sparse function currently only exercises scipy.sparse.csr_matrix; add a
sparse-array case to exercise the new sparse-array code path by parameterizing
or branching on an input_type and passing a scipy.sparse.csr_array into the
training/prediction code. Concretely, update the test_sparse parametrize to
include an input_type (e.g., ["csr_matrix", "csr_array"]) or add a small loop
inside test_sparse that converts X to scipy.sparse.csr_array when the csr_array
case is selected, then proceed to construct cu_model/sk_model, fit, and compare
coef_/intercept_/score as before (leave test names and assertions unchanged) so
the same numerical comparisons cover both csr_matrix and csr_array inputs.
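
The input side of that suggestion can be sketched as follows. It only builds the same data in both containers and checks they agree; fitting and comparing the cuML and scikit-learn models is omitted because it needs a GPU environment. `SPARSE_CONTAINERS` and `make_sparse_inputs` are hypothetical names, and `scipy.sparse.csr_array` requires SciPy 1.8 or newer.

```python
import numpy as np
import scipy.sparse as sp

# Containers to exercise: the legacy matrix API and the newer array API.
SPARSE_CONTAINERS = {"csr_matrix": sp.csr_matrix, "csr_array": sp.csr_array}


def make_sparse_inputs(n_samples=20, n_features=5, seed=0):
    """Build one dense matrix and wrap it in each sparse container."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    X[rng.random(X.shape) < 0.7] = 0.0  # zero out roughly 70% of entries
    return {name: ctor(X) for name, ctor in SPARSE_CONTAINERS.items()}, X


inputs, X_dense = make_sparse_inputs()
for name, X_sparse in inputs.items():
    # Both containers should be detected as sparse and hold the same values.
    assert sp.issparse(X_sparse), name
    assert X_sparse.format == "csr", name
    np.testing.assert_array_equal(X_sparse.toarray(), X_dense)
```

In a pytest suite the same dictionary could feed `pytest.mark.parametrize`, so the existing assertions run once per container type.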

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a72e33dd-6a9a-49eb-81be-ea08f41b8910

📥 Commits

Reviewing files that changed from the base of the PR and between ea4d5a2 and 84b564d.

📒 Files selected for processing (8)

docs/source/cuml-accel/limitations.rst
python/cuml/cuml/accel/_overrides/sklearn/linear_model.py
python/cuml/cuml/linear_model/elastic_net.py
python/cuml/cuml/linear_model/lasso.py
python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml
python/cuml/tests/test_elastic_net.py
python/cuml/tests/test_exceptions.py
python/cuml/tests/test_linear_model.py

💤 Files with no reviewable changes (3)

python/cuml/tests/test_exceptions.py
docs/source/cuml-accel/limitations.rst
python/cuml/tests/test_linear_model.py

coderabbitai · 2026-04-01T19:01:37Z

python/cuml/cuml/linear_model/elastic_net.py

+    Base,
+    InteropMixin,
+    LinearPredictMixin,
+    RegressorMixin,
+    SparseInputTagMixin,
+    FMajorInputTagMixin,


⚠️ Potential issue | 🟠 Major

The new sparse contract is broader than the behavior we still xfail.

SparseInputTagMixin plus is_sparse(X) now routes any scipy/cupyx sparse input through QN, but this same PR still adds xfails for sparse sample_weight equivalence and several csc_* / lil_* cases in python/cuml/cuml_accel_tests/upstream/scikit-learn/xfail-list.yaml. Please canonicalize or gate the unsupported sparse formats/combinations here, or narrow the advertised sparse contract, before defaulting all sparse inputs into the GPU path.

As per coding guidelines, "Function and parameter names or defaults must match scikit-learn without justification; behavior for edge cases (empty arrays, single sample) must match scikit-learn."

Also applies to: 249-275

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@python/cuml/cuml/linear_model/elastic_net.py` around lines 30 - 35, The current SparseInputTagMixin together with is_sparse(X) routes all scipy/cupyx sparse inputs into the QN/GPU path even though we still mark some sparse cases as xfail (e.g., sample_weight on sparse inputs and specific csc_*/lil_* cases in upstream/scikit-learn/xfail-list.yaml); update ElasticNet's sparse handling to either (a) narrow the sparse contract by changing the gating logic in SparseInputTagMixin/is_sparse(X) so only supported sparse formats (e.g., CSR/CSC without sample_weight) are routed to the QN GPU path, or (b) add explicit guards in ElasticNet (or the QN entrypoint) that detect the unsupported combinations (sparse sample_weight, csc/lil formats) and fall back to the CPU/scikit-learn code path; reference and modify the SparseInputTagMixin, is_sparse(X) checks and the ElasticNet QN dispatch to ensure parity with scikit-learn edge-case behavior and avoid sending xfailed cases to the GPU path.
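
Option (b) from the comment above, an explicit guard in front of the QN dispatch, might look like the sketch below. Everything here is hypothetical: the function name, the return labels, and in particular the set of supported formats, which simply mirrors the "e.g." in the comment and is not an established fact about the QN solver.

```python
# Assumption: mirrors the "CSR/CSC" example in the review comment only.
SUPPORTED_SPARSE_FORMATS = frozenset({"csr", "csc"})


def route_sparse_input(fmt: str, has_sample_weight: bool) -> str:
    """Hypothetical guard: decide where a sparse fit should run."""
    if fmt not in SUPPORTED_SPARSE_FORMATS:
        return "cpu-fallback"  # e.g. 'lil', flagged as xfailed upstream
    if has_sample_weight:
        return "cpu-fallback"  # sparse + sample_weight is flagged above
    return "gpu-qn"


for fmt, weighted in [("csr", False), ("lil", False), ("csr", True)]:
    print(fmt, weighted, "->", route_sparse_input(fmt, weighted))
```

The alternative the comment mentions, canonicalizing every sparse input to one format before dispatch, would replace the first branch with a conversion step rather than a fallback.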

jcrist added 4 commits April 1, 2026 11:49

Rename test_coordinate_descent -> test_elastic_net

2a90b4b

Move ElasticNet/Lasso tests all to test_elastic_net

b419c96

One test was duplicative and was just deleted. The other was moved unchanged.

Add sparse support to ElasticNet/Lasso

ccb0bb5

Update cuml-accel

84b564d

jcrist self-assigned this Apr 1, 2026

jcrist requested a review from a team as a code owner April 1, 2026 18:46

jcrist added the `feature request` and `non-breaking` labels Apr 1, 2026

jcrist requested a review from divyegala April 1, 2026 18:46

jcrist added the algo: linear-model label Apr 1, 2026

github-actions bot added the `Cython / Python` label Apr 1, 2026

coderabbitai bot reviewed Apr 1, 2026

jcrist wants to merge 4 commits into `rapidsai:main` from `jcrist:sparse-elastic-net`

2 participants
