
CI Make nightly cuml.accel integration test stricter#7631

Merged
rapids-bot[bot] merged 2 commits into rapidsai:main from betatim:fix-integration-tests
Dec 23, 2025

Conversation

@betatim
Member

@betatim betatim commented Dec 22, 2025

This changes the nightly cuml.accel integration test with scikit-learn to use a strict "fail on anything" setup, the same one we use on pull requests.

This solves the problem that we have to choose an arbitrary threshold to declare that "CI passes" (there is no good way to justify 80% over 85% or 87.325%), and that different versions of scikit-learn have different numbers of tests. For example, 1.8.0 has about 44000 test cases and v1.7.2 has 41472; about 41000 are shared between the two versions, about 1000 exist only in 1.7.2, and 4000 are new in 1.8.0. This means the pass rate can change quite a bit without cuml.accel having gotten any worse.
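To make the denominator effect concrete, here is a small illustrative sketch. The failure count is hypothetical; the per-version totals come from the numbers above. The same set of failures yields a different pass rate under each scikit-learn version:

```python
# Hypothetical illustration: an identical set of failing tests produces
# different pass rates purely because the total test count differs per version.
failing = 2000          # hypothetical failure count, same under both versions
total_1_7_2 = 41472     # test cases in scikit-learn 1.7.2 (from the description)
total_1_8_0 = 44000     # approximate test cases in scikit-learn 1.8.0

rate_1_7_2 = (total_1_7_2 - failing) / total_1_7_2
rate_1_8_0 = (total_1_8_0 - failing) / total_1_8_0
print(f"pass rate under 1.7.2: {rate_1_7_2:.1%}")
print(f"pass rate under 1.8.0: {rate_1_8_0:.1%}")
```

Any fixed threshold sitting between those two rates would flip between pass and fail on a version bump alone, which is exactly the arbitrariness the strict setup avoids.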

We could also reconsider how we calculate the pass rate. For example, the denominator of the pass rate includes skipped tests. Virtually all of the ~2500 skipped tests that are only in 1.8.0 are related to the array API. The reason they are skipped has more to do with what is installed in the test environment (pytorch, cupy, etc) and which environment variables are set than with the quality of cuml.accel.
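As a rough sketch (all counts hypothetical) of how much the skip-counting choice can move the headline number:

```python
# Hypothetical counts illustrating the effect of including skipped tests
# in the denominator of the pass rate.
passed, failed, skipped = 39000, 1500, 2500

rate_including_skips = passed / (passed + failed + skipped)
rate_excluding_skips = passed / (passed + failed)
print(f"including skips: {rate_including_skips:.1%}")  # skips count against us
print(f"excluding skips: {rate_excluding_skips:.1%}")
```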

The important thing is that we do not start failing tests we used to pass, or start passing tests we used to fail. And of course, if new versions bring new tests that we fail, that needs fixing.

@betatim betatim requested review from a team as code owners December 22, 2025 13:54
@betatim betatim requested review from jcrist and rockhowse December 22, 2025 13:54
@github-actions github-actions bot added the Cython / Python and ci labels Dec 22, 2025
@@ -1,6 +1,6 @@
- reason: AUC standard deviation differs slightly with cuml.accel in sklearn 1.8
- reason: AUC standard deviation differs slightly with cuml.accel in sklearn >= 1.7.2
Member Author

@betatim betatim Dec 22, 2025


I used 1.8.0 and 1.7.2 locally and noticed that this xfail should also include 1.7.2. Eventually 1.7.2 might become the "intermediate" scikit-learn version we test against, so I'm updating this now while the memory is fresh.

- "sklearn.metrics._plot.tests.test_roc_curve_display::test_roc_curve_from_cv_results_legend_label[single-None]"
- "sklearn.metrics._plot.tests.test_roc_curve_display::test_roc_curve_from_cv_results_legend_label[single-curve_kwargs1]"
- reason: Search CV sample weight equivalence differs with cuml.accel in sklearn 1.8
- reason: Search CV sample weight equivalence differs with cuml.accel in sklearn 1.7.2
Member Author


(as above)

Comment on lines 21 to 24
rapids-logger "Analyzing test results"
./python/cuml/cuml_accel_tests/upstream/summarize-results.py \
--config ./python/cuml/cuml_accel_tests/upstream/scikit-learn/test_config.yaml \
"${RAPIDS_TESTS_DIR}/junit-cuml-accel-scikit-learn.xml"
Member


I think we can also drop this call (and the set +e above) and just run the tests like normal. The summarize script doesn't get us much of anything IMO.

Member Author


I thought about that and decided to keep it so that those who want to know can track the pass rate manually. I'd prefer that people who want to quote the pass rate use a number from a CI run rather than something computed locally, which would make it virtually impossible to ever understand how they arrived at the number.

Member


I mean, they can always take the numbers in the pytest summary and calculate it themselves if they want to. I suspect this case will never come up and the whole thing is unnecessary. Still, now that CI has passed, I'm not sure ripping it out is worth another CI cycle.

@betatim betatim added the non-breaking and improvement labels Dec 23, 2025
Member

@jameslamb jameslamb left a comment


Giving this a ci-codeowners approval, sounds great to me 😁

@jameslamb jameslamb removed the request for review from rockhowse December 23, 2025 18:48
@jcrist
Member

jcrist commented Dec 23, 2025

/merge

@rapids-bot rapids-bot bot merged commit 6bd5bf2 into rapidsai:main Dec 23, 2025
109 checks passed
mani-builds pushed a commit to mani-builds/cuml that referenced this pull request Jan 11, 2026
(commit message repeats the PR description above)

Authors:
  - Tim Head (https://github.com/betatim)

Approvers:
  - Jim Crist-Harif (https://github.com/jcrist)
  - James Lamb (https://github.com/jameslamb)

URL: rapidsai#7631
