CI Make nightly cuml.accel integration test stricter #7631
rapids-bot[bot] merged 2 commits into rapidsai:main from
Conversation
@@ -1,6 +1,6 @@
-- reason: AUC standard deviation differs slightly with cuml.accel in sklearn 1.8
+- reason: AUC standard deviation differs slightly with cuml.accel in sklearn >= 1.7.2
I used 1.8.0 and 1.7.2 locally and noticed that this xfail should also include 1.7.2. Eventually 1.7.2 might become the "intermediate" scikit-learn version we test against, so I'm updating this now while the memory is fresh.
 - "sklearn.metrics._plot.tests.test_roc_curve_display::test_roc_curve_from_cv_results_legend_label[single-None]"
 - "sklearn.metrics._plot.tests.test_roc_curve_display::test_roc_curve_from_cv_results_legend_label[single-curve_kwargs1]"
-- reason: Search CV sample weight equivalence differs with cuml.accel in sklearn 1.8
+- reason: Search CV sample weight equivalence differs with cuml.accel in sklearn 1.7.2
rapids-logger "Analyzing test results"
./python/cuml/cuml_accel_tests/upstream/summarize-results.py \
  --config ./python/cuml/cuml_accel_tests/upstream/scikit-learn/test_config.yaml \
  "${RAPIDS_TESTS_DIR}/junit-cuml-accel-scikit-learn.xml"
I think we can also drop this call (and the set +e above) and just run the tests like normal. The summarize script doesn't get us much of anything IMO.
I thought about that and decided to keep it so that those who want to know can track the pass rate manually. If someone wants to quote the pass rate, I'd prefer they use a number from a CI run rather than compute something locally (which would make it virtually impossible to ever understand how they came up with the number).
I mean, they can always take the numbers output in the pytest summary and calculate it themselves if they want to. I suspect this case will never come up and the whole thing is unnecessary. Still, now that CI has passed, I'm not sure ripping it out is worth another CI cycle.
jameslamb
left a comment
Giving this a ci-codeowners approval, sounds great to me 😁
/merge
This changes the nightly cuml.accel integration test with scikit-learn to use a strict "fail on anything" setup, the same as we use on Pull Requests.

This solves the problem that we have to choose an arbitrary threshold to declare "CI passes" (there is no great way to justify 80% over 85% or 87.325%) and that different versions of scikit-learn have a different number of tests. For example, 1.8.0 has about 44,000 test cases and v1.7.2 has 41,472; about 41,000 are shared between those two versions, about 1,000 only exist in 1.7.2, and about 4,000 are new in 1.8.0. This means the pass rate can change quite a bit without cuml.accel having gotten any worse.

We could also reconsider how we calculate the pass rate. For example, the denominator of the pass rate includes skipped tests. Virtually all of the ~2,500 skipped tests that are only in 1.8.0 are related to the array API. The reason they are skipped has more to do with what is installed in the test environment (pytorch, cupy, etc.) and which environment variables are set than with the quality of cuml.accel.

The important thing is that we do not start failing tests we used to pass, or start passing tests we used to fail. And of course, if new versions bring new tests that we fail, that needs fixing.

Authors:
- Tim Head (https://github.com/betatim)

Approvers:
- Jim Crist-Harif (https://github.com/jcrist)
- James Lamb (https://github.com/jameslamb)

URL: rapidsai#7631
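The point about skipped tests in the denominator can be illustrated with a toy calculation. This is only a sketch, not the repository's summarize-results.py; the test and skip counts are rounded figures from the description above, and the failure count is made up for illustration:

```python
# Illustrative sketch: how the pass rate shifts depending on whether
# skipped tests count in the denominator. Counts are rounded/hypothetical.
import xml.etree.ElementTree as ET

# A minimal junit-style summary, shaped like pytest's junit XML output.
JUNIT = '<testsuite name="pytest" tests="44000" failures="100" errors="0" skipped="2500"/>'

suite = ET.fromstring(JUNIT)
tests = int(suite.get("tests"))
failures = int(suite.get("failures")) + int(suite.get("errors"))
skipped = int(suite.get("skipped"))
passed = tests - failures - skipped

rate_with_skips = passed / tests               # skipped tests inflate the denominator
rate_without_skips = passed / (tests - skipped)

print(f"{rate_with_skips:.3f} vs {rate_without_skips:.3f}")  # → 0.941 vs 0.998
```

With ~2,500 array-API skips in the denominator the rate looks several points worse, even though none of those skips say anything about cuml.accel itself.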