Skip to content

Conversation

PicoCentauri
Copy link
Collaborator

@PicoCentauri PicoCentauri commented Feb 13, 2025

Fixes #237

Reopening of #238

Adding fixes for updates with scikit-learn and scipy. Scikit-learn's estimator checks will fail when validate_data is not used. These fixes will pass scikit-learn's estimator checks but are causing unit tests to fail.

Contributor (creator of PR) checklist

  • Tests updated (for new features and bugfixes)?
  • Documentation updated (for new features)?
  • Issue referenced (for PRs that solve an issue)?

For Reviewer

  • CHANGELOG updated if important change?

📚 Documentation preview 📚: https://scikit-matter--239.org.readthedocs.build/en/239/

@PicoCentauri PicoCentauri force-pushed the fix-sklearn branch 2 times, most recently from 75e03da to 879057b Compare February 13, 2025 13:21
PicoCentauri and others added 6 commits February 13, 2025 16:33
* replacing `self._validate_data` with `validate_data`

* Additional fixes

* Fixing scikit-learn warnings

* Fixing PCovR to work with 1D column vectors

* Fixing examples

* Changing shape of y in pcovr tests

* Fixing PCovR to not mess with data shape

---------

Co-authored-by: Christian Jorgensen <[email protected]>
@PicoCentauri PicoCentauri changed the title Fix sklearn Update to latest versions of sklearn and scipy Apr 17, 2025
@cajchristian cajchristian requested a review from rosecers April 17, 2025 16:12
Copy link
Collaborator

@rosecers rosecers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general things look good -- I don't think we need to do the validation checks in the selector subclasses because they're taken care of in the superclass. I can approve once codecov passes with sufficient tests.

y : ignored

Returns
-------
score : numpy.ndarray of (n_to_select_from_)
:math:`\pi` importance for the given samples or features
"""
if y is not None:
validate_data(self, X, y.ravel(), reset=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the checks already present in the superclass, I do not think these validations are necessary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. We need to keep the validation for X even though it's not necessary because scikit-learn now enforces that the shape of X passed for score() must be consistent with the shape of X used in fit(). The validation for y is not necessary and I have removed it.

if y is not None:
validate_data(self, X, y.ravel(), reset=False)
else:
validate_data(self, X, reset=False) # present for API consistency
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note as above. These matrices will already be validated in GreedySelector.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - see above.

if y is not None:
validate_data(self, X, y.ravel(), reset=False)
else:
validate_data(self, X, reset=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note as above. These matrices will already be validated in GreedySelector.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - see above

if y is not None:
validate_data(self, X, y.ravel(), reset=False)
else:
validate_data(self, X, reset=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note as above. These matrices will already be validated in GreedySelector.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - see above

@rosecers rosecers merged commit 295c3e5 into main May 5, 2025
12 checks passed
@rosecers rosecers deleted the fix-sklearn branch May 5, 2025 17:08
@rosecers rosecers mentioned this pull request May 5, 2025
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Keep up with API changes of scikit-learn version 1.6.0
3 participants