Centralised input validation with validate_data #7816

Draft
betatim wants to merge 5 commits into rapidsai:main from betatim:more-xfails

betatim commented Feb 18, 2026

This PR tries to move cuml towards a central validate_data function that converts inputs to a cuml array and applies checks that catch invalid user input.

The goal is to make cuml a more robust library that provides helpful error messages to users when they make a mistake. It also reduces the number of xfailed checks in the scikit-learn compat test suite.

The main effort right now is towards check_fit1d, check_fit2d_predict1d, check_estimators_unfitted, and check_requires_y_none.

check_supervised_y_no_nan is also getting addressed, but I need to benchmark this to see how big the performance penalty is (same for check_estimators_nan_inf). You could argue that if we can't handle these inputs we need to check for them, and that comparing against a version that incorrectly skips the check isn't the right comparison to make. In the status quo, things just crash if you pass NaN or inf.

The main idea of validate_data is to centralise the checking, exception raising, etc. It works quite well, but quite a few estimators behave differently from the majority, for example by performing input validation inside the solver or by accepting sparse input. I'm still working out the right balance between making validate_data more configurable and duplicating a few checks in these special estimators.
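
To make the discussion concrete, here is a rough sketch of what "one central place for checks" could look like. It is not the code in this PR and not cuml's API: the function name validate_data_sketch, its keyword arguments, and the error messages are all made up for illustration, and plain nested lists stand in for cuml arrays so the example runs without a GPU.

```python
import math


def validate_data_sketch(X, y=None, *, requires_y=False, allow_nonfinite=False):
    """Hypothetical central validation helper (illustration only).

    Plain nested lists stand in for device arrays; the name, arguments and
    messages are assumptions, not cuml's actual validate_data.
    """
    # Reject empty or 1D input (the situation check_fit1d exercises).
    if len(X) == 0:
        raise ValueError("Found input with 0 samples.")
    if not all(isinstance(row, (list, tuple)) for row in X):
        raise ValueError("Expected 2D input for X, got 1D input instead.")

    # Every row must have the same, non-zero number of features.
    n_features = len(X[0])
    if n_features == 0 or any(len(row) != n_features for row in X):
        raise ValueError("X must be a rectangular 2D array with >= 1 feature.")

    # Supervised estimators need y (the situation check_requires_y_none exercises).
    if requires_y and y is None:
        raise ValueError("This estimator requires y to be passed, but y is None.")
    if y is not None and len(y) != len(X):
        raise ValueError(f"X has {len(X)} samples but y has {len(y)}.")

    # Optional NaN/inf check; this full pass over the data is the part
    # whose cost would need benchmarking.
    if not allow_nonfinite:
        if any(not math.isfinite(v) for row in X for v in row):
            raise ValueError("Input X contains NaN or infinity.")
        if y is not None and any(not math.isfinite(v) for v in y):
            raise ValueError("Input y contains NaN or infinity.")

    # "Conversion" step: normalise everything to floats.
    X_out = [[float(v) for v in row] for row in X]
    y_out = None if y is None else [float(v) for v in y]
    return X_out, y_out


# An estimator's fit() would call the helper once instead of hand-rolling checks.
try:
    validate_data_sketch([1.0, 2.0, 3.0])
except ValueError as exc:
    print(f"rejected: {exc}")
```

The keyword arguments are where the configurability question shows up: each estimator that deviates from the majority either needs another switch like allow_nonfinite here, or has to repeat some of these checks itself.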

Early WIP to discuss with @jcrist

Closes #7428

copy-pr-bot bot commented Feb 18, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

github-actions bot added the "Cython / Python" label (Cython or Python issue) Feb 18, 2026

betatim commented Feb 18, 2026

/ok to test c36b55c

Labels

Cython / Python Cython or Python issue

Development

Successfully merging this pull request may close these issues.

Implement Unified Array Digestion System

2 participants