[FEA] Implemented adjusted_mutual_info_score and expected_mutual_information #7673
Draft
mani-builds wants to merge 51 commits into rapidsai:main from
Conversation
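For context on the metrics named in the PR title, adjusted mutual information and its expected-MI correction can be sketched in pure Python. This is a simplified CPU reference using the standard definitions (arithmetic averaging of the entropies), not the implementation in this PR; the helper names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a labeling, in nats."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def mutual_information(u, v):
    """MI between two labelings of the same n samples."""
    n = len(u)
    cu, cv = Counter(u), Counter(v)
    return sum(
        nij / n * math.log(n * nij / (cu[i] * cv[j]))
        for (i, j), nij in Counter(zip(u, v)).items()
    )

def expected_mutual_information(a, b, n):
    """E[MI] over all contingency tables with row sums `a`, column sums `b`."""
    emi = 0.0
    for ai in a:
        for bj in b:
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                # log-probability of this cell count under the hypergeometric model
                log_p = (
                    math.lgamma(ai + 1) + math.lgamma(bj + 1)
                    + math.lgamma(n - ai + 1) + math.lgamma(n - bj + 1)
                    - math.lgamma(n + 1) - math.lgamma(nij + 1)
                    - math.lgamma(ai - nij + 1) - math.lgamma(bj - nij + 1)
                    - math.lgamma(n - ai - bj + nij + 1)
                )
                emi += nij / n * math.log(n * nij / (ai * bj)) * math.exp(log_p)
    return emi

def adjusted_mutual_info(u, v):
    """AMI = (MI - E[MI]) / (mean(H(u), H(v)) - E[MI])."""
    n = len(u)
    mi = mutual_information(u, v)
    emi = expected_mutual_information(
        list(Counter(u).values()), list(Counter(v).values()), n
    )
    denom = 0.5 * (entropy(u) + entropy(v)) - emi  # "arithmetic" averaging
    return 1.0 if denom == 0 else (mi - emi) / denom
```

AMI is 1.0 for identical partitions (up to label permutation) and near 0 for unrelated ones; for example, `adjusted_mutual_info([0, 0, 1, 1], [1, 1, 0, 0])` is exactly 1.0.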
This PR removes pre-release upper bound pinnings from non-RAPIDS dependencies. The presence of pre-release indicators like `<...a0` tells pip "pre-releases are OK, even if `--pre` was not passed to pip install." RAPIDS projects currently use such constraints in situations where it's not actually desirable to get pre-releases. xref: rapidsai/build-planning#144 Authors: - Bradley Dice (https://github.com/bdice) - Gil Forsyth (https://github.com/gforsyth) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Gil Forsyth (https://github.com/gforsyth) URL: rapidsai#7568
Admin merge per build-eng request.

We have unnecessary `__syncthreads` calls in the UMAP optimize step that can lead to hangs. The kernel has a big `while (row < nnz)` loop where each thread progresses through a few rows. If some threads in a block break out of this loop while other threads are still working, `__syncthreads` will hang. We don't need `__syncthreads` here because all threads read/write independent slots in both `grads_buffer` and `current_buffer`.
Admin merging per build-eng request. Reverting rapidsai#7481 because it needs further testing of pointer residency on HMM-configured machines. Related issue: rapidsai#7540
The UCI dataset repository is currently down (and has been down before). `sklearn` includes a utility for accessing this same dataset, downloading it from a more reliable mirror. Here we update the `cuml.accel` example notebook to use that utility function instead to get our docs building again. --------- Co-authored-by: Simon Adorf <sadorf@nvidia.com>
Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: rapidsai#7583
Enable the `merge_barriers` setting in `.github/ops-bot.yaml` to enable the new merge barriers plugin. Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - Bradley Dice (https://github.com/bdice) URL: rapidsai#7579
This PR updates the numba-cuda version to `>=0.22.1,<0.23.0`. Authors: - https://github.com/brandon-b-miller - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#7549
While cuML itself is compatible with scikit-learn 1.8, some of our dependencies (umap-learn, hdbscan, xgboost) are not. A very small number of tests are not yet compatible either.
This is a fairly substantial rewrite/refactor of our _existing_ type reflection system. There should be no user-visible changes from this refactor (though in the future we do want to make changes). For now this is just trying to simplify the internals so the existing system is easier to understand, reason about, and modify.

- Moved (almost) all logic to a single file `cuml.internals.outputs` instead of it being strewn across 5+ files.
- Removed all the context managers in `api_context_managers.py` in favor of a simpler, more readable mechanism.
- Removed all the decorators in `api_decorators.py` in favor of a single `reflect` decorator with sane defaults and only a few configurable knobs.
- Removed `set_api_output_type`; this feature was unnecessary, as the `reflect` decorator can handle everything without an escape hatch.
- Reduced state management for type reflection decisions down to 3 places (a combination of `GlobalSettings().output_type`, `Base.output_type`, and the array input type, depending on the call). The decision of what output type to return is now entirely in one location, and the conversion is also encompassed within a single function. This should hopefully be much easier to understand.
- Removed the auto-decorating `Base` metaclass in favor of explicit decorators. This was done by logging the original auto-decorated versions, then inspecting each one when adding explicit versions to ensure they were accurate. Not everything that was decorated before needed to be decorated.
- Removed decorators on functions that don't need them. This is mostly functions that return non-arrays and don't make any nested calls requiring a `CumlArray` output.
- Fixed a few decorators that weren't applied properly (e.g. `LinearRegression.predict`). These are bugfixes.

Once this is in, we should have an easier time making behavior changes and deprecating features (rapidsai#7426), since the new implementation is simpler and has fewer moving pieces. Fixes rapidsai#5022.
Authors: - Jim Crist-Harif (https://github.com/jcrist) Approvers: - Simon Adorf (https://github.com/csadorf) URL: rapidsai#7539
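As a rough illustration of the single-decorator design described above, the output-type decision can live in one place. The names, fallback order, and conversion table below are hypothetical stand-ins, not the actual `cuml.internals.outputs` API:

```python
import functools

# Hypothetical sketch: one `reflect` decorator resolves the output type from
# (1) a global setting, (2) the estimator's `output_type`, or (3) the input's
# type, then converts the result in a single place.
_global_output_type = None  # stand-in for GlobalSettings().output_type
_CONVERTERS = {"list": list, "tuple": tuple}

def reflect(func):
    @functools.wraps(func)
    def wrapper(self, X, *args, **kwargs):
        result = func(self, X, *args, **kwargs)
        out_type = (
            _global_output_type
            or getattr(self, "output_type", None)
            or type(X).__name__          # mirror the input type by default
        )
        return _CONVERTERS.get(out_type, list)(result)
    return wrapper

class Model:
    output_type = None  # None means "reflect the input type"

    @reflect
    def predict(self, X):
        return (x * 2 for x in X)  # internal computation, any iterable
```

With this, `Model().predict([1, 2])` returns a list, `Model().predict((1, 2))` returns a tuple, and setting `output_type` on the estimator overrides the reflection.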
Resolves rapidsai#7517. The `optimize_batch_kernel` and `optimize_batch_kernel_reg` kernels use the condition `while (row < nnz)` to check for out-of-bounds access, and iterate with `row += skip_size` inside the loop. However, when `row` is close to `INT32_MAX` the addition can overflow, wrapping `row` around to a negative number. The loop then runs again since `row < nnz` still holds, leading to a CUDA illegal memory access because `row` is negative. One fix is to check for the overflow case and break out of the while loop: `if (row > nnz - skip_size) break;` Update: we instead use `size_t` for `row` and `skip_size`. See rapidsai#7517 for a concrete reproducer. Authors: - Anupam (https://github.com/aamijar) Approvers: - Jinsol Park (https://github.com/jinsolp) - Victor Lafargue (https://github.com/viclafargue) URL: rapidsai#7587
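The wraparound is easy to demonstrate outside CUDA. Here is a small Python model of the failure and both fixes, with 32-bit signed wrapping emulated by hand (the constants are illustrative, matching the shape of the reproducer rather than its exact values):

```python
INT32_MAX = 2**31 - 1

def wrap_int32(x):
    """Emulate C's 32-bit signed wraparound on addition."""
    return (x + 2**31) % 2**32 - 2**31

nnz = INT32_MAX        # nnz near the 32-bit limit, as in the reproducer
skip_size = 1 << 20
row = nnz - 10         # a thread's last in-bounds row

# Buggy update: `row += skip_size` wraps to a large negative value, so the
# `row < nnz` check passes again and the body runs with a negative index.
bad_row = wrap_int32(row + skip_size)
assert bad_row < 0 and bad_row < nnz

# Fix 1: break before the addition can overflow.
assert row > nnz - skip_size   # the new guard fires, so we would break here

# Fix 2 (the one adopted): widen row/skip_size so no wraparound occurs.
wide_row = row + skip_size     # Python ints model size_t here
assert wide_row >= nnz         # loop exits normally
```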
## Summary - Update `run_ctests.sh` to first try the installed test location (CI/conda environments) and fall back to the build directory (devcontainer environments) - Enables testing in devcontainers with `test-cuml-cpp` xref: rapidsai/devcontainers#630 Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: rapidsai#7596
…uctor (rapidsai#7604) closes rapidsai#7603

Fixes missing `epsilon` and `svmType` fields in the `SVC` class constructor's aggregate initialization of `SvmParameter`. After rapidsai#7461 added `max_outer_iter` to `SvmParameter`, the constructor in `svc.cu` was updated but still omitted the final two fields:

```C++
// Before (missing epsilon and svmType):
param(SvmParameter{C, cache_size, max_outer_iter, -1, nochange_steps, tol, verbosity})
// After (explicit initialization):
param(SvmParameter{C, cache_size, max_outer_iter, -1, nochange_steps, tol, verbosity, 0, C_SVC})
```

While C++ value-initializes omitted aggregate members to zero, relying on that implicitly is fragile as the struct evolves. The missing `svmType` field coincided with intermittent CI failures in `SmoSolverTest/0.SvcTest` with the error:

> "Incorrect training: cannot calculate the constant in the decision function"

This occurred because a wrong `svmType` value could cause the SMO solver to misinterpret training data, leading to invalid support vector selection.

Authors: - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Divye Gala (https://github.com/divyegala) - Simon Adorf (https://github.com/csadorf) URL: rapidsai#7604
I was seeing issues with the RAPIDS pip devcontainers solving dask-ml. Adding a lower bound to constrain the solution space appears to fix the problem.
Here is the error I saw:
```
Collecting dask-ml (from -r /tmp/rapids.requirements.txt (line 23))
Using cached dask_ml-2025.1.0-py3-none-any.whl.metadata (6.0 kB)
Using cached dask_ml-2024.4.4-py3-none-any.whl.metadata (5.9 kB)
Using cached dask_ml-2024.3.20-py3-none-any.whl.metadata (4.3 kB)
Using cached dask_ml-2023.3.24-py3-none-any.whl.metadata (4.3 kB)
Using cached dask_ml-2022.5.27-py3-none-any.whl.metadata (4.3 kB)
Using cached dask_ml-2022.1.22-py3-none-any.whl.metadata (4.3 kB)
Using cached dask_ml-2021.11.30-py3-none-any.whl.metadata (4.3 kB)
Using cached dask_ml-2021.11.16-py3-none-any.whl.metadata (4.3 kB)
Using cached dask_ml-2021.10.17-py3-none-any.whl.metadata (4.3 kB)
Using cached dask_ml-1.9.0-py3-none-any.whl.metadata (4.2 kB)
Using cached dask_ml-1.8.0-py3-none-any.whl.metadata (2.7 kB)
Using cached dask_ml-1.7.0-py3-none-any.whl.metadata (2.7 kB)
Using cached dask_ml-1.6.0-py3-none-any.whl.metadata (2.7 kB)
Using cached dask_ml-1.5.0-py3-none-any.whl.metadata (2.7 kB)
Using cached dask_ml-1.4.0-py3-none-any.whl.metadata (2.7 kB)
Using cached dask_ml-1.3.0-py3-none-any.whl.metadata (2.9 kB)
Using cached dask_ml-1.2.0-py3-none-any.whl.metadata (2.9 kB)
Using cached dask_ml-1.1.1-py3-none-any.whl.metadata (2.9 kB)
Using cached dask_ml-1.1.0-py3-none-any.whl.metadata (2.9 kB)
Using cached dask_ml-1.0.0-py3-none-any.whl.metadata (2.9 kB)
Using cached dask_ml-0.13.0-py2.py3-none-any.whl.metadata (3.1 kB)
Using cached dask_ml-0.12.0-py2.py3-none-any.whl.metadata (3.0 kB)
Using cached dask_ml-0.11.0-py2.py3-none-any.whl.metadata (2.8 kB)
Using cached dask_ml-0.10.0-py2.py3-none-any.whl.metadata (2.9 kB)
Using cached dask_ml-0.9.0-py2.py3-none-any.whl.metadata (2.9 kB)
Using cached dask-ml-0.8.0.tar.gz (243 kB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'error'
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [23 lines of output]
Traceback (most recent call last):
File "/home/coder/.local/share/venvs/rapids/lib/python3.13/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
main()
~~~~^^
File "/home/coder/.local/share/venvs/rapids/lib/python3.13/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
json_out["return_val"] = hook(**hook_input["kwargs"])
~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/coder/.local/share/venvs/rapids/lib/python3.13/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 143, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-jqfmfxhw/overlay/lib/python3.13/site-packages/setuptools/build_meta.py", line 331, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=[])
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-jqfmfxhw/overlay/lib/python3.13/site-packages/setuptools/build_meta.py", line 301, in _get_build_requires
self.run_setup()
~~~~~~~~~~~~~~^^
File "/tmp/pip-build-env-jqfmfxhw/overlay/lib/python3.13/site-packages/setuptools/build_meta.py", line 512, in run_setup
super().run_setup(setup_script=setup_script)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-jqfmfxhw/overlay/lib/python3.13/site-packages/setuptools/build_meta.py", line 317, in run_setup
exec(code, locals())
~~~~^^^^^^^^^^^^^^^^
File "<string>", line 4, in <module>
ModuleNotFoundError: No module named 'numpy'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed to build 'dask-ml' when getting requirements to build wheel
```
<details><summary>Failing `requirements.txt` file:</summary>
```
--extra-index-url=https://download.pytorch.org/whl/cu130
--extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple
--extra-index-url=https://pypi.nvidia.com
aiobotocore>=2.2.0
boto3>=1.21.21
botocore>=1.24.21
cachetools
certifi
click >=8.1
cloudpickle
cmake>=3.26.4,!=3.30.0
cmake>=3.30.4
cmake>=3.30.4,<4
confluent-kafka>=2.8.0,<2.9.0
cramjam
cuda-core==0.3.*
cuda-python>=13.0.0,<14.0a0
cuda-toolkit[nvcc,nvrtc]==13.*
cupy-cuda13x>=13.6.0
cython>=3.0.0,<3.2.0
cython>=3.0.3,<3.2.0
cython>=3.1.2,<3.2.0
dask-ml
fastavro>=0.22.9
fsspec>=0.6.0
fsspec[http]>=0.6.0
graphviz
hdbscan>=0.8.39,<0.8.40
hypothesis>=6.0,<7
hypothesis>=6.131.7
identify>=2.5.20
ipykernel
ipython
joblib>=0.11
jupyter_client
libucx-cu13>=1.19.0,<1.20
matplotlib
mmh3
moto[server]>=4.0.8
mpi4py
msgpack
nanoarrow
nbconvert
nbformat
nbsphinx
networkx>=2.5.1
networkx>=3.2
nltk
notebook
notebook>=0.5.0
numba-cuda>=0.22.1,<0.23.0
numba-cuda[cu13]>=0.22.1,<0.23.0
numba>=0.60.0,<0.62.0
numpy >=1.23,<3.0
numpy>=1.23,<3.0
numpydoc
numpydoc<1.9
numpydoc>=1.1.0
nvidia-libnvcomp-cu13==5.1.0.21
nvidia-ml-py>=12
nvidia-nccl-cu13>=2.19
nvtx>=0.2.1
ogb
openmpi >=5.0
openpyxl
packaging
pandas
pandas>=1.3
pandas>=2.0,<2.4.0
polars>=1.29,<1.35
pre-commit
psutil
pyarrow>=15.0.0,!=17.0.0; platform_machine=='aarch64'
pyarrow>=15.0.0; platform_machine=='x86_64'
pydata-sphinx-theme!=0.14.2
pynndescent
pytest
pytest-asyncio
pytest-asyncio>=1.0.0
pytest-benchmark
pytest-cases
pytest-cases>=3.8.2
pytest-cov
pytest-forked
pytest-httpserver
pytest-mpl
pytest-rerunfailures!=16.0.0
pytest-timeout
pytest-xdist
pytest<9.0
pytest<9.0.0
python-louvain
pyyaml
pyyaml>=6
rangehttpserver
rapids-build-backend>=0.4.0,<0.5.0
rapids-dask-dependency==26.2.*,>=0.0.0a0
rapids-logger==0.2.*,>=0.0.0a0
recommonmark
rich
s3fs>=2022.3.0
scikit-build-core[pyproject]>=0.10.0
scikit-learn
scikit-learn>=0.23.1
scikit-learn>=1.4
scikit-learn>=1.4,<1.8.0
scipy
scipy>=1.11.0
seaborn
sentence-transformers
setuptools
setuptools>=61.0.0
setuptools>=64.0.0
sphinx
sphinx-click
sphinx-click>=2.7.1
sphinx-copybutton
sphinx-markdown-tables
sphinx-rtd-theme>=0.5.1
sphinx_rtd_theme
statsmodels
streamz
structlog
tenacity
tensordict>=0.1.2
torch-geometric>=2.5,<2.7
torch>=2.9.0
treelite>=4.6.1,<5.0.0
typing-extensions; python_version < '3.11'
typing_extensions>=4.0.0
tzdata
umap-learn==0.5.7
wheel
xxhash
zarr>=3.0.0,<3.2.0,<4.0.0
zict>=2.0.0
zstandard
```
</details>
Authors:
- Bradley Dice (https://github.com/bdice)
Approvers:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)
URL: rapidsai#7600
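The fix is a lower bound on the `dask-ml` line in the requirements, which rules out the ancient sdists that pip was backtracking through. The exact bound chosen by this PR isn't shown above; the version here is only illustrative, picked from the wheels listed in the log:

```
dask-ml>=2023.3.24
```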
This:

- Adds a new `CUML_ACCEL_LOG_LEVEL` environment variable for configuring the level of the `cuml.accel` logger. This log level is used if a level isn't explicitly configured via other means (`cuml.accel.install` or `-v` in the CLI).
- Uses the `CUML_ACCEL_LOG_LEVEL` environment variable to forward logging configuration on to subprocesses by default. Fixes rapidsai#7572.
- Also updates the log handler to flush on every write, avoiding delays in logs appearing. Thanks to @betatim for reporting this issue.

Authors: - Jim Crist-Harif (https://github.com/jcrist) Approvers: - Simon Adorf (https://github.com/csadorf) URL: rapidsai#7602
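The fallback order (explicit configuration wins, then the environment variable, then a default) can be sketched like this. This is a simplified model; `resolve_log_level` is a hypothetical helper, and the real `cuml.accel` logger configuration may differ in its details:

```python
import logging
import os

def resolve_log_level(explicit=None, default="warning"):
    """Resolve the cuml.accel log level: explicit > env var > default (sketch)."""
    name = explicit or os.environ.get("CUML_ACCEL_LOG_LEVEL") or default
    return getattr(logging, name.upper())

# The env var is only consulted when no level was configured explicitly,
# which is also how it forwards the setting to subprocesses.
os.environ["CUML_ACCEL_LOG_LEVEL"] = "debug"
assert resolve_log_level() == logging.DEBUG           # env var as fallback
assert resolve_log_level(explicit="info") == logging.INFO  # explicit wins
```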
Closes rapidsai#7377. This PR optimizes `build_condensed_hierarchy` in HDBSCAN. Our previous implementation runs a top-down BFS tree traversal, where the GPU kernel is launched for every level of the tree. This is very slow because the tree is not balanced. This PR introduces a bottom-up approach that pointer-chases up to the parent on the CPU using OpenMP threads. This is much faster without any accuracy loss in the final result. The table below shows the two main parts of our HDBSCAN implementation (build linkage, and condense). `adjusted_rand_score` is computed against our implementation using brute force graph build + the original GPU condense implementation.

- BF + orig: brute force MR graph build + original top-down GPU condense
- NND + orig: nn-descent MR graph build + original top-down GPU condense
- BF + new: brute force MR graph build + new bottom-up CPU condense in this PR
- NND + new: nn-descent MR graph build + new bottom-up CPU condense in this PR

<img width="652" height="559" alt="Screenshot 2025-11-06 at 6 50 26 PM" src="https://github.com/user-attachments/assets/66864ca7-5e46-46e0-affd-f3578f89f3ec" />

Authors: - Jinsol Park (https://github.com/jinsolp) - Simon Adorf (https://github.com/csadorf) Approvers: - Divye Gala (https://github.com/divyegala) - Simon Adorf (https://github.com/csadorf) URL: rapidsai#7459
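The bottom-up idea — each node independently chases parent pointers to the root, instead of launching one kernel per tree level — can be sketched on a toy tree. This is illustrative only, not HDBSCAN's actual condensation logic:

```python
def depths_bottom_up(parent):
    """Depth of every node; parent[i] is node i's parent, roots point to themselves.

    Each node's walk is independent, which is what the PR parallelizes
    across OpenMP threads on the CPU. A level-wise top-down traversal would
    instead need one pass (one kernel launch) per tree level.
    """
    depths = [0] * len(parent)
    for node in range(len(parent)):   # embarrassingly parallel across nodes
        d, cur = 0, node
        while parent[cur] != cur:     # pointer-chase up to the root
            cur = parent[cur]
            d += 1
        depths[node] = d
    return depths

# Unbalanced chain 0 <- 1 <- 2 <- 3: the worst case for level-wise kernels,
# but a single independent walk per node here.
assert depths_bottom_up([0, 0, 1, 2]) == [0, 1, 2, 3]
```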
…7594) Due to the recent issues with precomputed kNN failing in CI, I added changes to `test_umap_outliers` to isolate this test from other issues and focus only on what it's supposed to be testing. The reason we were using precomputed kNN in this test was to do a faster run of CPU UMAP for comparison. Instead, we now run the full cuML UMAP (which internally computes the kNN) and hardwire the threshold from CPU UMAP. Note: a good embedding should fall into the range of ±25, while bad embeddings with outliers have embedding values around 800. Authors: - Jinsol Park (https://github.com/jinsolp) - Simon Adorf (https://github.com/csadorf) Approvers: - Divye Gala (https://github.com/divyegala) - Victor Lafargue (https://github.com/viclafargue) URL: rapidsai#7594
This PR reverts rapidsai#7588. and updates the test suite to ensure compatibility with scikit-learn 1.8. Closes rapidsai#7599. Authors: - Simon Adorf (https://github.com/csadorf) Approvers: - Jim Crist-Harif (https://github.com/jcrist) - Gil Forsyth (https://github.com/gforsyth) - Bradley Dice (https://github.com/bdice) URL: rapidsai#7589
Resolves rapidsai#7554. Depends on rapidsai/cuvs#1610 (CI won't pass until this is merged). What does this PR do?

1. Removes lingering **unused** raft headers that will be deprecated, such as `#include <raft/spatial/knn/knn.cuh>`, `#include <raft/distance/distance.cuh>`, etc.
2. ~~Updates to raft::memory_type_from_pointer instead of the deprecated raft::spatial::knn::detail::utils::pointer_residency.~~
3. Removes `metric_processor` from `knn.hpp` and `knn.cu`. The only special metric processing needed is for correlation distance, which we can handle in `knn.cu` instead of using the class from `processing.cuh` in raft. The cosine distance is supported in ivf_flat and ivf_pq in cuvs, so we do **not** need to use the inner-product metric and the special processing that was there before.
4. Uses `build_dendrogram_host` from cuvs instead of raft.

Authors: - Anupam (https://github.com/aamijar) Approvers: - Victor Lafargue (https://github.com/viclafargue) URL: rapidsai#7561
Closes rapidsai#7466. Authors: - Simon Adorf (https://github.com/csadorf) Approvers: - Bradley Dice (https://github.com/bdice) - Jim Crist-Harif (https://github.com/jcrist) URL: rapidsai#7605
This is the 2nd part of the deprecation plan for changing the behavior of `max_iter` for `SVC`/`SVR`. Previously we added a `TotalIters` wrapper so users could opt-in to the new behavior, now we make the new behavior the default and deprecate the wrapper. `max_iter` now always places a limit on _total iterations_ in the solver. Follow-up to rapidsai#7461. Authors: - Jim Crist-Harif (https://github.com/jcrist) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#7612
This removes the remaining bits of deprecated functionality set to be dropped for release 26.02: - Removes deprecated `MulticlassClassifier` class - Removes deprecated `n_iter` param to `TSNE` - Removes deprecated `n_neighbors` param to `AgglomerativeClustering` - Removes deprecated `metric=None` support to `AgglomerativeClustering` Authors: - Jim Crist-Harif (https://github.com/jcrist) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#7613
This: - Removes the deprecated `normalize` parameter from all affected models (`cuml.linear_model`/`cuml.dask.linear_model`) - Removes support for `normalize` at the C++ level as well - Updates the C++ tests accordingly (mostly just deleting no longer necessary test cases) Follow-up to rapidsai#7415. Authors: - Jim Crist-Harif (https://github.com/jcrist) Approvers: - Anupam (https://github.com/aamijar) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#7611
Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Bradley Dice (https://github.com/bdice) URL: rapidsai#7619
This drops a few places where we were relying on `float(one_element_array)` to convert an array to a float. This behavior is deprecated in numpy; instead we can rely on `.item()` to do the same. Fixes rapidsai#7617 Authors: - Jim Crist-Harif (https://github.com/jcrist) Approvers: - Tim Head (https://github.com/betatim) URL: rapidsai#7618
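The change in spelling is small; a minimal example of the NumPy behavior described above:

```python
import numpy as np

x = np.array([3.5])

# Deprecated in NumPy (scheduled to become an error):
#   value = float(x)   # implicit conversion of a one-element array
# Supported spelling: extract the scalar explicitly.
value = x.item()
assert value == 3.5 and isinstance(value, float)

# .item() also works on 0-d arrays:
assert np.asarray(2.0).item() == 2.0
```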
rapidsai#5147 Authors: - Victor Lafargue (https://github.com/viclafargue) Approvers: - Jim Crist-Harif (https://github.com/jcrist) URL: rapidsai#7590
…_descent` (rapidsai#7620) We don't support deterministic NN Descent. This PR raises a warning to the user when `build_algo` is given as `nn_descent` and `random_state` is also set. Authors: - Jinsol Park (https://github.com/jinsolp) Approvers: - Victor Lafargue (https://github.com/viclafargue) - Jim Crist-Harif (https://github.com/jcrist) URL: rapidsai#7620
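The new check amounts to something like the following. The parameter names come from the PR description; the surrounding function is an illustrative sketch, not the actual cuML code:

```python
import warnings

def check_build_algo(build_algo, random_state):
    """Warn when a random_state is set but nn_descent cannot honor it (sketch)."""
    if build_algo == "nn_descent" and random_state is not None:
        warnings.warn(
            "build_algo='nn_descent' is not deterministic; "
            "random_state will not make results reproducible"
        )

# Warns only for the nn_descent + random_state combination.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_build_algo("nn_descent", random_state=42)
assert len(caught) == 1
```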
Resolves rapidsai#7352. This PR documents `k-means++` as an init option in the `KMeans` estimator. Authors: - Anupam (https://github.com/aamijar) Approvers: - Jim Crist-Harif (https://github.com/jcrist) URL: rapidsai#7615
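For reference, k-means++ seeding samples each new center with probability proportional to the squared distance from the nearest already-chosen center. A toy 1-D sketch of that rule (not cuML's implementation; the first center is normally uniform-random but is fixed here for determinism):

```python
import random

def kmeanspp_init(points, k, seed=0):
    """Toy 1-D k-means++ seeding sketch."""
    rng = random.Random(seed)
    centers = [points[0]]  # normally uniform-random; fixed for reproducibility
    while len(centers) < k:
        # squared distance of every point to its nearest chosen center
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        # sample the next center proportionally to those squared distances
        centers.append(points[rng.choices(range(len(points)), weights=d2)[0]])
    return centers

# Two tight clusters: the second center lands in the far cluster with
# overwhelming probability, spreading the seeds out.
centers = kmeanspp_init([0.0, 0.1, 10.0, 10.1], 2)
assert min(centers) < 1.0 and max(centers) > 9.0
```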
…7424) Running the common tests from scikit-learn on the naive bayes estimators led to frequent CUDA memory errors. This PR addresses this by rewriting the naive bayes algorithms to use cupy instead of CUDA kernels defined on the fly. This is a good thing in general, as it reduces unnecessary complexity in the code base and increases maintainability. The goal of this PR is to achieve correctness and stable test runs. Once we have this we can investigate potential speedups in the cupy implementation in a new PR.

<details><summary>Some benchmarking on `CategoricalNB` done by Cursor</summary>
<p>

# CategoricalNB Final Performance Comparison

## Custom CUDA Kernels vs CuPy Implementation

### Test Environment

- GPU: (Current CUDA device)
- cuML version: 26.2.0
- Benchmark runs: 5 iterations per configuration
- Results show mean ± std deviation
- Times are in milliseconds

---

## 🎯 Dense Input Performance

### Before and After Optimization

| Config | Samples | Features | Cats | **Original Kernels** | **CuPy Optimized** | **Speedup vs Kernel** |
|--------|---------|----------|------|---------------------|--------------------|-----------------------|
| **Fit Time (ms)** | | | | | | |
| Small | 1K | 50 | 5 | 1.30 ± 0.02 | **12.93 ± 0.08** ✅ | **0.10x** (10x slower) |
| Medium | 5K | 100 | 10 | 1.34 ± 0.01 | **24.83 ± 0.13** ✅ | **0.05x** (18x slower) |
| Medium | 10K | 50 | 5 | 1.33 ± 0.02 | **13.31 ± 0.06** ✅ | **0.10x** (10x slower) |
| Large | 50K | 50 | 5 | 2.19 ± 0.50 | **14.40 ± 0.54** ✅ | **0.15x** (6.6x slower) |

### Remaining Gap vs Kernels

While still 6-18x slower than custom kernels, this is now **acceptable** because:

- Dense CategoricalNB is rarely used (sparse is the primary use case)
- Performance is now in the **10-25ms range** (totally usable)
- Eliminates kernel maintenance burden
- Much simpler, maintainable code

---

## ✅ Sparse Input Performance - MAINTAINED

### Three-Way Comparison

| Config | Samples | Features | Cats | Density | **Original Kernels** | **CuPy Optimized** | **Ratio** |
|--------|---------|----------|------|---------|---------------------|--------------------|-----------|
| **Fit Time (ms)** | | | | | | | |
| Small | 1K | 100 | 5 | 5.0% | 21.74 ± 0.12 | 22.11 ± 0.10 | 0.98x |
| Small | 1K | 1K | 10 | 1.0% | 24.28 ± 0.33 | 24.66 ± 0.28 | 0.99x |
| Medium | 5K | 500 | 5 | 5.0% | 35.97 ± 0.33 | 38.52 ± 0.21 | 0.93x |
| Medium | 10K | 1K | 10 | 1.0% | 39.94 ± 0.20 | 40.76 ± 0.11 | 0.98x |
| Large | 10K | 5K | 20 | 0.5% | 52.26 ± 0.35 | 52.57 ± 0.12 | 0.99x |
| Large | 50K | 1K | 5 | 1.0% | 74.85 ± 0.50 | 74.87 ± 0.47 | 1.00x |
| **Predict Time (ms)** | | | | | | | |
| Small | 1K | 100 | 5 | 5.0% | 18.84 ± 0.04 | 19.06 ± 0.02 | 0.99x |
| Small | 1K | 1K | 10 | 1.0% | 20.14 ± 0.07 | 20.42 ± 0.04 | 0.99x |
| Medium | 5K | 500 | 5 | 5.0% | 63.57 ± 0.15 | 66.88 ± 0.06 | 0.95x |
| Medium | 10K | 1K | 10 | 1.0% | 58.56 ± 0.08 | 62.01 ± 0.08 | 0.94x |
| Large | 10K | 5K | 20 | 0.5% | 98.25 ± 0.03 | 90.70 ± 0.06 | **1.08x** ⚡ |
| Large | 50K | 1K | 5 | 1.0% | 297.36 ± 0.10 | 296.97 ± 0.09 | 1.00x |

**Result**: Sparse performance remains **equivalent** (0.93-1.08x) with the optimized CuPy implementation! ✅

</p>
</details>

Authors: - Tim Head (https://github.com/betatim) - Simon Adorf (https://github.com/csadorf) - Jim Crist-Harif (https://github.com/jcrist) Approvers: - Jim Crist-Harif (https://github.com/jcrist) URL: rapidsai#7424
…#7566) Closes rapidsai#7132. This PR ensures that we don't default to copying data to device memory if a precomputed knn graph is given. Authors: - Jinsol Park (https://github.com/jinsolp) Approvers: - Jim Crist-Harif (https://github.com/jcrist) URL: rapidsai#7566
…idsai#7626) This is otherwise a recurring issue when committing from devcontainers. Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Bradley Dice (https://github.com/bdice) URL: rapidsai#7626
Follow up to rapidsai#7595. Depends on rapidsai/raft#2894 and rapidsai/cuvs#1639. This PR sets the seed to null if the user passes no seed. The deterministic cusparse algorithm in raft is used if the user passes a seed. Authors: - Anupam (https://github.com/aamijar) Approvers: - Victor Lafargue (https://github.com/viclafargue) - Jinsol Park (https://github.com/jinsolp) URL: rapidsai#7608
This applies the cleanup work of rapidsai#7317 to `cuml.naive_bayes`. It's a followup to rapidsai#7424. Highlights:

- Several followup fixes from rapidsai#7424, mostly ripping out leftover debugging code that was no longer needed.
- Removal of validation, conversion, and initialization of fitted attributes from `__init__`.
- Fixed `CumlArrayDescriptor` definitions to only define fitted attributes for models that support them, rather than defining the same fitted attributes across all naive bayes estimators.
- Removal of `cuml.prims.array`; this module is no longer used.
- Simplification of code paths, removal of extraneous definitions.
- Docstring cleanups.

Given this is a very low priority module, I didn't dwell too much on the individual implementations, instead only handling the end goals in rapidsai#7317. _There are still improvements that could be made to this module_. In particular:

- There are visible inefficiencies in the cupy code and category handling. Work is repeated and extraneous copies are made. Given the priorities here, I don't think this is worth working on unless someone suddenly needs these models to be much faster.
- These classifiers still don't handle non-numeric inputs (like the rest of cuml does). Punting on this for now, but we likely do want to support this in the future (if for nothing else, I'd like to rip out `cuml.prims.labels` completely). I'll open a followup issue.

Authors: - Jim Crist-Harif (https://github.com/jcrist) Approvers: - Victor Lafargue (https://github.com/viclafargue) URL: rapidsai#7623
…#7629) Stopgap until NVIDIA/numba-cuda#676 is in. Authors: - Jim Crist-Harif (https://github.com/jcrist) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - Bradley Dice (https://github.com/bdice) URL: rapidsai#7629
This is an empty commit to trigger a build. It is used when builds get stuck with an old ABI. Rebuilding updates them to the new one.
This PR removes the dependency on the separate cumlprims_mg library by moving all of its code directly into cuML. The cumlprims_mg library provided multi-node multi-GPU (MNMG) primitives used by several cuML algorithms. By inlining this code, we simplify the build process, reduce external dependencies, and make it easier to maintain the MNMG functionality alongside the algorithms that use it.
Note: The move keeps the namespace MLCommon for the new-to-cuML code, to facilitate easy identification for further moves and cleanups.
Changes:
- New (i.e. moved files from cumlprims_mg) source files added to cuml/cpp/src_prims/opg/
- Build system changes
- Added new source files to cuml_objs target in cpp/CMakeLists.txt
- Removed cumlprims_mg::cumlprims_mg from link libraries
- Deleted cmake/thirdparty/get_cumlprims_mg.cmake
- Removed ENABLE_CUMLPRIMS_MG and CUML_EXCLUDE_CUMLPRIMS_MG_FROM_ALL CMake options
- Dependency/configuration updates
- Updated dependencies.yaml to remove libcumlprims
- Updated all conda environment YAML files
- Updated conda recipes (libcuml/recipe.yaml, cuml/recipe.yaml)
- Updated CI scripts (build_wheel_libcuml.sh, build_wheel_cuml.sh)
- Updated ci/release/update-version.sh
- Python changes
- Updated Cython declarations in opg_data_utils_mg.pxd
- Updated python/libcuml/CMakeLists.txt and python/cuml/CMakeLists.txt
- Updated libcuml/load.py to no longer load libcumlprims_mg.so
- Updated linkage tests
- Documentation updates
- Updated BUILD.md, cpp/README.md, python/cuml/README.md
Authors:
- Dante Gama Dessavre (https://github.com/dantegd)
- Divye Gala (https://github.com/divyegala)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Bradley Dice (https://github.com/bdice)
- Divye Gala (https://github.com/divyegala)
URL: rapidsai#7585
This deprecates the `handle` argument to all classes, methods, and functions. `cuml.Handle` is a bit of a relic, and doesn't necessarily provide a solid user-facing benefit. A few things it was purported to do: **Allow for asynchronous execution** All cuml python APIs are synchronous, we do not support asynchronous execution. Any docs or claims that this is currently possible are incorrect. Further, the python apis we're mimicking and the ecosystem we're extending don't support asynchronous execution. `cupy` (which we make heavy use of in cuml) does have some mechanisms for asynchronous execution - if/when we want to support asynchronous operation, we're likely to take an approach that piggybacks off their configuration (for better ecosystem support) than rely on the existing `cuml.Handle` class that doesn't really match the task. Handles are for specifying and caching resources, not for configuring sync/async operation. **Specify the stream of execution** This _was_ possible with `cuml.Handle`, but didn't provide a meaningful benefit. `cuml` apis are all synchronous, so the stream they execute on doesn't matter. Further, not every function or estimator that accepted a `cuml.Handle` made use of the handle (or respected the stream), making this specification kind of moot. `cuml.Handle` is really only used for functions part of `libcuml` - anything written using cupy/cudf/etc... ignores them completely. Given we provide synchronous APIs only (and rely on threads for concurrency, matching python conventions), it doesn't make sense to expose this at the user level necessarily. Better to make it an implementation detail of APIs that rely on `libcuml`. **Specify the number of streams in a backing stream pool** A few of our algorithms support using multiple streams from a pool on the handle. In some cases (`cuml.ensemble`) we also exposed a top-level `n_streams` argument, which seems preferable. 
I've added this to `LinearSVC` (the only other algorithm that uses this AFAICT). Elevating `n_streams` to a top-level parameter makes it more discoverable by users, and also lets us avoid modifying or constructing a handle within `__init__`, better following sklearn conventions.

**Specify a `DeviceResourcesSNMG`**

This is a relatively new use case currently only supported by `HDBSCAN` and `UMAP`. Some of our algorithms support running on multiple GPUs on a single node (when configured). Previously this was supported by passing in a `pylibraft.common.handle.DeviceResourcesSNMG` instead of a `pylibraft.common.handle.Handle`. There were several problems with that though (see rapidsai#7465, rapidsai#7059). Instead, we now elevate `device_ids` to a top-level parameter for these models. Like with `n_streams`, this better elevates this feature, and keeps our configuration and `__init__` simpler. `DeviceResourcesSNMG` is now an implementation detail, not something user-facing.

**Hold other resources to share across calls**

A `Handle` contains many resources (cusolver handles, cublas handles, ...), which are created lazily and may have some costs to initialize. The previous model would create a unique `Handle` per estimator or function call by default, preventing sharing of these initialization costs. Expert users could create a handle once and pass it around, but that required extra plumbing on their part with no other real benefit. Instead, we now cache a handle per thread (some resources have concurrency or thread limitations; given python's thread-based concurrency model, keeping handles thread-local avoids any issues on that front). For single-threaded programs only a single `Handle` will now be created, letting us avoid any repeated init costs.

---

Given the above arguments, `Handle` and `DeviceResourcesSNMG` are now implementation details, and are no longer user facing.
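The per-thread caching described above can be sketched with `threading.local`; this is an illustrative stdlib-only sketch (the `get_handle` name and the `object()` stand-in for an expensive `Handle` are hypothetical, not cuml's actual internals):

```python
import threading

_local = threading.local()

def get_handle():
    """Return a per-thread cached handle, creating it lazily on first use."""
    handle = getattr(_local, "handle", None)
    if handle is None:
        # Stand-in for constructing an expensive Handle with lazily-created
        # backing resources; created at most once per thread.
        handle = _local.handle = object()
    return handle

# Repeated calls on the same thread share one handle...
assert get_handle() is get_handle()

# ...while another thread gets its own, avoiding cross-thread sharing.
results = []
t = threading.Thread(target=lambda: results.append(get_handle()))
t.start()
t.join()
assert results[0] is not get_handle()
```

Keeping the cache thread-local sidesteps the concurrency limitations of the underlying resources without any locking.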
To accomplish this, this PR:

- Modifies all relevant estimators and functions to warn if a user specifies a `handle`. During the deprecation cycle the specified `handle` will continue to be used. The warning informs the user the parameter is deprecated, and if the estimator supports `n_streams`/`device_ids` will also include a note to use that instead.
- The internal `*MG` estimators (e.g. `LogisticRegressionMG`) still do include a `handle` parameter, since for now the distributed comms are also attached to that object and need to be provided somehow by the caller. Since these are internal(ish) classes, I'm ok with keeping the `handle` parameter around on them for now. In the long run we probably will want to rethink our multi-gpu APIs since the `*MG` classes are a bit unwieldy, but no sense changing them for now.
- All estimators that make use of a handle's stream pool now include an `n_streams` parameter for configuring the pool size. Users are recommended to use that by the docs and deprecation warnings.
- All estimators that support multi-gpu execution now include a `device_ids` parameter for configuring the devices used. Users are recommended to use that by the docs and deprecation warnings.
- Estimators or functions that don't have a handle manually specified will use a cached thread-local handle, unless `n_streams`/`device_ids` are specified.
- Accessing `cuml.Handle` (a re-export of `pylibraft.common.handle.Handle`) now also raises a deprecation warning. This re-export will be removed in the following release.
- Docstrings are updated to note the deprecation and inform the user of alternate parameters like `n_streams`/`device_ids`.
- Tests are updated to no longer specify a `handle` (we had a few modules that parametrized across handle/no-handle).
- New tests are added to test the deprecation warnings and that execution with a handle specified still works in all cases.
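The deprecation pattern in the first bullet can be sketched in pure Python; the `Estimator` class, message wording, and warning category below are illustrative assumptions, not cuml's actual implementation:

```python
import warnings

class Estimator:
    """Illustrative estimator showing the handle-deprecation pattern:
    warn on `handle`, but keep honoring it during the deprecation cycle."""

    def __init__(self, handle=None, n_streams=0):
        if handle is not None:
            warnings.warn(
                "`handle` is deprecated and will be removed in a future "
                "release; use `n_streams` to configure the stream pool "
                "instead.",
                FutureWarning,
            )
        # During the deprecation cycle the provided handle is still used.
        self.handle = handle
        self.n_streams = n_streams

# Passing a handle emits a FutureWarning but still works.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    est = Estimator(handle=object())
assert any(issubclass(w.category, FutureWarning) for w in caught)
assert est.handle is not None

# The new-style parameter produces no warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    Estimator(n_streams=4)
assert not caught
```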
Fixes rapidsai#6869
Fixes rapidsai#7059
Fixes rapidsai#7465
Authors:
- Jim Crist-Harif (https://github.com/jcrist)
Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)
URL: rapidsai#7628
This changes the nightly cuml.accel integration test with scikit-learn to use a strict "fail on anything" setup, the same one we use on pull requests. This solves the problem that we otherwise have to choose an arbitrary threshold to declare "CI passes" (there is no great way to justify 80% over 85% or 87.325%), and that different versions of scikit-learn have a different number of tests. For example, v1.8.0 has about 44,000 test cases while v1.7.2 has 41,472; about 41,000 are shared between the two versions, about 1,000 only exist in 1.7.2, and 4,000 are new in 1.8.0. This means the pass rate can change quite a bit without cuml.accel having gotten any worse. We could also reconsider how we calculate the pass rate. For example, the denominator of the pass rate includes skipped tests. Virtually all of the ~2,500 skipped tests that are only in 1.8.0 are related to the array API. The reason they are skipped has more to do with what is installed in the test environment (pytorch, cupy, etc) and which environment variables are set than with the quality of cuml.accel. The important thing is that we do not start failing tests we used to pass, or start passing tests we used to fail. And of course if new versions bring new tests that we fail, that needs fixing.
Authors:
- Tim Head (https://github.com/betatim)
Approvers:
- Jim Crist-Harif (https://github.com/jcrist)
- James Lamb (https://github.com/jameslamb)
URL: rapidsai#7631
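One way to express the "do not start failing tests we used to pass" policy is to diff current results against a recorded baseline rather than compare raw pass rates. This is an illustrative sketch of that idea (the `regressions` helper and dict format are assumptions, not the actual CI script):

```python
def regressions(baseline, current):
    """Tests that passed in the baseline run but fail (or vanished) now.

    `baseline` and `current` map test names to a pass/fail bool; tests
    that are new in `current` cannot regress, and tests removed upstream
    with a failing baseline entry are ignored.
    """
    return sorted(
        name
        for name, passed in baseline.items()
        if passed and not current.get(name, False)
    )

baseline = {"test_a": True, "test_b": False, "test_c": True}
current = {"test_a": True, "test_b": False, "test_c": False}

# Only test_c went from passing to failing, regardless of the overall
# pass rate or how many brand-new tests the newer version added.
assert regressions(baseline, current) == ["test_c"]
```

Unlike a percentage threshold, this comparison is insensitive to upstream adding or removing thousands of tests between scikit-learn releases.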
…pidsai#7634)
This makes the notebook more robust by not relying on remote datasets.
Closes rapidsai#7633
Authors:
- Simon Adorf (https://github.com/csadorf)
Approvers:
- Divye Gala (https://github.com/divyegala)
URL: rapidsai#7634
Replace fetched datasets with synthetic generated data to make tests more robust and eliminate network dependencies.
Closes rapidsai#3161; Closes rapidsai#5158; Closes rapidsai#6558; Closes rapidsai#7639
Authors:
- Simon Adorf (https://github.com/csadorf)
Approvers:
- Bradley Dice (https://github.com/bdice)
URL: rapidsai#7637
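The principle behind that change - deterministic synthetic data instead of a network fetch - can be sketched with the stdlib alone. In practice one would likely reach for `sklearn.datasets.make_classification` or `make_blobs`; this self-contained stand-in (the `make_blobs` name and parameters are hypothetical) just shows that seeded generation yields a reproducible dataset with no download:

```python
import random

def make_blobs(n_samples=100, centers=((0.0, 0.0), (5.0, 5.0)),
               scale=1.0, seed=0):
    """Generate a tiny seeded 2-D clustering dataset, one Gaussian blob
    per center, with labels interleaved round-robin across centers."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % len(centers)
        cx, cy = centers[label]
        X.append((rng.gauss(cx, scale), rng.gauss(cy, scale)))
        y.append(label)
    return X, y

X, y = make_blobs(n_samples=10)
assert len(X) == len(y) == 10
assert set(y) == {0, 1}
# Same seed, same data: the test never depends on a remote server.
assert make_blobs(n_samples=10) == make_blobs(n_samples=10)
```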
Closes #7294