Releases: cupy/cupy

v9.4.0

26 Aug 07:39
58f3db2

This is the release note of v9.4.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Compile with SASS (CUBIN) for CUDA versions >= 11.1 (#5097)

The NVRTC compile process now produces SASS (CUBIN files) instead of PTX, so that kernels compiled with a newer CUDA Toolkit version can run on earlier CUDA drivers. Check the CUDA Compatibility Guide and NVRTC Documentation for detailed information. We believe most users will not be affected by this change, but you can revert to the previous behavior by setting the CUPY_COMPILE_WITH_PTX=1 environment variable.
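As a minimal sketch of the opt-out described above, the variable can be set for a child process from Python. Only CUPY_COMPILE_WITH_PTX comes from this note; the helper names and the script path are hypothetical, and the sketch assumes CuPy reads the variable from the process environment.

```python
import os
import subprocess
import sys


def env_with_ptx(base=None):
    """Return a copy of an environment with PTX compilation re-enabled."""
    env = dict(os.environ if base is None else base)
    env["CUPY_COMPILE_WITH_PTX"] = "1"  # variable named in the release note
    return env


def run_with_ptx(script_path):
    """Run a (hypothetical) CuPy script in a child process using PTX."""
    return subprocess.run([sys.executable, script_path], env=env_with_ptx())
```

Building a fresh environment for the child process leaves the parent process untouched, which makes it easy to compare both compile modes side by side.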

Support for AMD ROCm 4.3

Support for ROCm 4.3 has been added in this release, and binary wheels are provided as well. Note that there is currently an issue with ROCm 4.3 that prevents it from running in several environments. The current workaround is to set the LLVM_PATH environment variable to the llvm folder included in the ROCm 4.3 installation (e.g., export LLVM_PATH=/opt/rocm-4.3/llvm).

Changes

Enhancements

  • Compile with SASS for CUDA versions >= 11.1 (#5611)
  • Allow to compile using PTX with an envvar (#5634)
  • Add ncclAvg and ncclBfloat16 for NCCL (#5656)
  • Fix version check for new ROCm version definition (#5661)
  • Rest of version check fix for new ROCm version definition (#5670)

Bug Fixes

  • Fix FFT convolve for shapes containing 1 (#5613)
  • Fix the RTC call path for HIP (#5620)
  • Fix compute capability check (#5646)
  • Fix squareness checks (#5652)
  • Fix unique for empty array (#5658)

Code Fixes

  • Fix kernel names to be consistent (#5625)
  • Remove unnecessary comments (#5635)

Documentation

  • Update Sphinx to 4.1.2 (#5616)
  • __array_function__ feature by default (#5653)
  • Support ROCm v4.3 in document (#5674)

Tests

  • Increase test timeout (#5615)
  • Increase timeout for CUDA 11.4 tests (#5617)
  • Add CI for ROCm 4.3 (#5632)
  • Reload GPG key for ROCm 4.2 test (#5637)
  • Fix cubic for_all_dtypes_combination tests (#5639)
  • Add a workaround for ROCm 4.3.0 for testing (#5663)
  • Fix skipTest in test_decomp_lu (#5672)

Others

  • Bump version to v9.4.0 (#5680)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@grlee77 @leofang @yashasvimisra2798

v10.0.0b1

05 Aug 08:20
4ebc827
Pre-release

This is the release note of v10.0.0b1. See here for the complete list of solved issues and merged PRs.

Highlights

CuPy now supports CUDA 11.4 (cupy-cuda114)

Along with the new CUDA toolkit version, support for NCCL 2.10.3 and cuDNN 8.2.2 libraries is added.

Compute capability 86 support for GPUs of the RTX 30X0 and AX000 series is also added.
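The wheel names quoted in these notes (cupy-cuda102, cupy-cuda110, cupy-cuda114, and so on) appear to follow a major-plus-minor pattern. The helper below is a hypothetical illustration of that pattern only; it is not a CuPy API and it does not cover ROCm wheels such as cupy-rocm-4-2.

```python
def cuda_wheel_name(cuda_version):
    """Build a wheel name such as 'cupy-cuda114' from a string like '11.4'.

    Assumption: the name is 'cupy-cuda' followed by the major and minor
    version digits, as in the wheel names quoted in these release notes.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"cupy-cuda{int(major)}{int(minor)}"


print(cuda_wheel_name("11.4"))
```

The result could then be passed to pip, keeping in mind the known issues below about wheels that are not yet published.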

Google Summer of Code

CuPy is participating in Google Summer of Code under the NumFOCUS organization.

Our student @povinsahu1909 is working hard to add support for sparse linear algebra solvers and to increase the compatibility of the new random number generation API.

Compile with SASS (CUBIN) for CUDA versions >= 11.1 (#5097)

The NVRTC compile process now produces SASS (CUBIN files) instead of PTX, so that kernels compiled with a newer CUDA Toolkit version can run on earlier CUDA drivers. Check the CUDA Compatibility Guide and NVRTC Documentation for detailed information.

Changes without compatibility

Support the new DLPack exchange protocol (#5306)

By adopting the new DLPack exchange protocol proposed in the Python array API standard, cupy.fromDlpack has been deprecated in favor of cupy.from_dlpack.
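A minimal sketch of the new entry point, assuming the producer object implements the __dlpack__ method from the array API standard and holds its data in GPU memory. The wrapper name to_cupy is hypothetical; cupy.from_dlpack is the function named in this note.

```python
def to_cupy(obj):
    """Hand an array-like object to CuPy through the DLPack protocol."""
    if not hasattr(obj, "__dlpack__"):
        raise TypeError(
            f"{type(obj).__name__} does not implement __dlpack__"
        )
    import cupy  # imported lazily so the check above runs on any machine

    return cupy.from_dlpack(obj)
```

Code that still calls cupy.fromDlpack with a capsule keeps working for now, but migrating call sites to a wrapper like this one makes the later removal a one-line change.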

Known Issues

  • cupy-cuda102, cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

New Features

  • Texture memory 2D/3D affine transformations (#5171)
  • Support the new DLPack exchange protocol (#5306)
  • Add cupyx.scipy.sparse.linalg.lsmr (#5331)
  • JIT: Support all atomic intrinsics (#5387)
  • Expose _GUFunc through cupyx (#5408)
  • Add geometric distribution to new Generator (#5443)
  • Support Numba-like jit.gridsize() syntax in CuPy JIT (#5461)
  • Support Numba-like jit.laneid() and jit.warpsize syntax in CuPy JIT (#5462)
  • Add cupyx.scipy.sparse.linalg.cgs (#5524)
  • Add hypergeometric distribution to new Generator (#5560)

Enhancements

  • Compile with SASS for CUDA versions >= 11.1 (#5097)
  • Support NCCL v2.9.9 (#5268)
  • Support CUDA 11.4 and compute_86 (#5434)
  • Update NumPy/SciPy pinning in setup.py (#5453)
  • Make matrix_power support stacked matrices (#5458)
  • Support hipSPARSE and fix streams not set in some generic APIs in cuSPARSE (#5472)
  • Add cudaDeviceDisablePeerAccess wrapper (#5495)
  • Support cuDNN v8.2.2 (#5516)
  • Support NCCL v2.10.3: library installer and document (#5521)

Bug Fixes

  • JIT: Fix supported dtype of atomic_add on HIP (#5383)
  • Fix cupy.nanmedian's axis parameter to accept a sequence other than a tuple (#5389)
  • Fix astype from boolean (#5410)
  • Fix compatibility issues of ndarray.view (#5428)
  • Fix types attribute of ufunc (#5448)
  • Fix new DLPack protocol error messages and tests (#5449)
  • texture_memory option in affine_transform not supported by HIP (#5464)
  • Fix linalg.lstsq for empty matrix (#5467)
  • Fix reshape (#5470)
  • Fix random generator output not being raveled (#5478)
  • Fix random integers (#5479)
  • Fix availability tests in cuSOLVER and cuSPARSE (#5492)
  • Add missing hipSPARSE include to builder (#5515)
  • prune cuFFT static lib by major cc ver (#5531)
  • Fix casts from bool in ufunc inputs (#5539)
  • Access cudaMemoryType in the pointer attributes and fix for HIP (#5544)
  • Fix casts in ufunc outputs (#5550)
  • Code fix for {cu, roc}SOLVER (#5558)
  • Fix CUDA API call on module initialization (#5561)
  • Fix the RTC call path for HIP (#5569)
  • Fix broadcast error messages (#5579)

Code Fixes

  • Do not call cudnnGetVersion on import (#5326)
  • JIT: Fix __call__() for built-in functions (#5361)
  • Add HIP symbol redefinitions (#5362)
  • Remove the data member use_32bit_indexing from CArray (#5376)
  • Use dtype.name instead dtype.char (#5444)
  • Try to use -I in hipRTC (#5486)
  • Hide modules from public APIs (#5522)
  • consistent kernel names (#5551)
  • Use the new macro __HIP_PLATFORM_AMD__ at build time (#5554)

Documentation

  • Add upgrade guide for v10 (#5278)
  • Update tag lines in package description and docs index (#5399)
  • Fix typo in apply_along_axis (#5432)
  • Fix indent of Returns section (#5433)
  • Update user_guide/basic.rst device agnostic section (#5435)
  • Support CUDA 11.4 on documents (#5447)
  • Update install guide with new NumPy/SciPy versions (#5454)
  • Use from_dlpack instead of fromDlpack (#5488)
  • Use Sphinx 4.1.0 (#5489)
  • Bump ReadTheDocs configuration to version 2 (#5491)
  • Fix docs of eigh and eigvalsh (#5494)
  • Add a lingering doc page for fromDlpack() (#5509)
  • Document scipy.fft backend usage (#5514)
  • Replaced the links for NumPy docs as per issue #3418 (#5548)
  • Use Sphinx's envvar construct (#5570)
  • Fix intersphinx for SciPy 1.7.1 docs (#5587)

Installation

  • Fix license_file option in setup.cfg (#5406)
  • Import numpy before Cython (#5482)

Tests

  • Add tests for num_to_num's optional parameters (#5337)
  • Add script for ROCm CI on Jenkins (#5378)
  • Skip unwrap tests for numpy<1.21 (#5384)
  • Enable strict xfail in pytest (#5407)
  • Remove xfail in windows jitify test (#5409)
  • Fix preloading slow tests (#5440)
  • Add script for CUDA 11.4 CI on FlexCI (#5457)
  • Increase memory for CUDA 11.4 tests (#5477)
  • Fix DLPack test for ROCm/HIP (#5485)
  • Fix "Revert test decorators order" (#5498)
  • Fix some tests for HIP (#5501)
  • Fix FlexCI Linux tests (#5505)
  • Add CUDA 11.4 for FlexCI helper script (#5528)
  • Increase timeout for CUDA 11.4 tests (#5575)
  • Update tests to install all requirements and add PATH (#5576)
  • Add Cython to all requirements (#5577)

Others

  • Notify conflict by mergify (#5371)
  • Fix mergify to only comment when pull-request is open (#5439)
  • Fix mergify condition (#5513)
  • Add auto notify bot for hip label (#5538)
  • Use pull_request_target instead for auto notify bot (#5541)
  • Fix auto notify bot for issues (#5546)
  • Disable Mergify's auto-merge (#5556)
  • Bump version to v10.0.0b1 (#5595)
  • Fix signal tests for scipy 1.7.0 (#5368)
  • Fix numpy.unwrap for NumPy 1.21 (#5385)
  • Fix signaltools medfilt for scipy>=1.7.0 (#5386)
  • Fix deprecated numpy.typeDict utilization (#5388)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@12rambau @grlee77 @leofang @maxim-belkin @Palash-Vishnani @povinsahu1909 @the-lay

v9.3.0

05 Aug 08:20
c8a3cc9

This is the release note of v9.3.0. See here for the complete list of solved issues and merged PRs.

Highlights

CuPy now supports CUDA 11.4 (cupy-cuda114)

Along with the new CUDA toolkit version, support for NCCL 2.10.3 and cuDNN 8.2.2 libraries is added.

Compute capability 86 support for GPUs of the RTX 30X0 and AX000 series is also added.

Known Issues

  • cupy-cuda102, cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

Enhancements

  • Support NCCL v2.9.9 (#5402)
  • Update NumPy/SciPy pinning in setup.py (#5471)
  • Support CUDA 11.4 and support compute_86 (#5519)
  • Support cuDNN v8.2.2 (#5523)
  • Make matrix_power support stacked matrices (#5525)
  • Support NCCL v2.10.3: library installer and document (#5526)

Bug Fixes

  • JIT: Fix supported dtype of atomic_add on HIP (#5405)
  • Fix cupy.nanmedian's axis parameter to accept a sequence other than a tuple (#5416)
  • Fix compatibility issues of ndarray.view (#5442)
  • Fix types attribute of ufunc (#5455)
  • Fix random integers (#5484)
  • Fix random generator output not being raveled (#5487)
  • Fix astype from boolean (#5490)
  • Fix reshape (#5504)
  • Fix linalg.lstsq for empty matrix (#5506)
  • Add missing checks and _setStream() (#5507)
  • Fix availability tests in cuSOLVER and cuSPARSE (#5534)
  • prune cufft static lib by major cc ver (#5536)
  • Fix casts from bool in ufunc inputs (#5549)
  • Code fix for {cu, roc}SOLVER (#5566)
  • Access cudaMemoryType in the pointer attributes and fix for HIP (#5571)
  • Fix broadcast error messages (#5584)
  • Fix casts in ufunc outputs (#5589)
  • Fix broken build on CUDA 9.2 (#5598)

Code Fixes

  • Remove the data member use_32bit_indexing from CArray (#5414)
  • JIT: Fix __call__() for built-in functions (#5422)
  • Do not call cudnnGetVersion on import (#5446)
  • Add HIP symbol redefinitions (#5475)
  • Try to use -I in hipRTC (#5502)
  • Hide modules from public APIs (#5533)
  • Use the new macro __HIP_PLATFORM_AMD__ at build time (#5565)

Documentation

  • Update tag lines in package description and docs index (#5415)
  • Fix typo in apply_along_axis (#5441)
  • Fix indent of Returns section (#5452)
  • Update user_guide/basic.rst device agnostic section (#5456)
  • Update install guide with new NumPy/SciPy versions (#5465)
  • Bump ReadTheDocs configuration to version 2 (#5497)
  • Fix docs of eigh and eigvalsh (#5499)
  • Use Sphinx 4.1.0 (#5500)
  • Document scipy.fft backend usage (#5532)
  • Support CUDA 11.4 on documents (#5535)
  • Replaced the links for NumPy docs as per issue #3418 (#5553)
  • Use Sphinx's envvar construct (#5586)
  • Fix intersphinx for SciPy 1.7.1 docs (#5588)

Installation

  • Fix license_file option in setup.cfg (#5411)
  • Import numpy before Cython (#5483)

Tests

  • Skip unwrap tests for numpy<1.21 (#5412)
  • Remove xfail in windows jitify test (#5418)
  • Enable strict xfail in pytest (#5423)
  • Add missing DLPack test for complex numbers (#5425)
  • Fix unwrap tests for v9 (#5426)
  • Fix preloading slow tests (#5445)
  • Add script for ROCm CI on Jenkins (#5468)
  • Add script for CUDA 11.4 CI on FlexCI (#5473)
  • Increase memory for CUDA 11.4 tests (#5480)
  • Fix "Revert test decorators order" (#5518)
  • Fix FlexCI Linux tests (#5520)
  • Add CUDA 11.4 for FlexCI helper script (#5543)
  • Fix scipy requirement in tests (#5563)
  • Fix some tests for HIP (#5578)
  • Update tests to install all requirements and add PATH (#5581)
  • Add Cython to all requirements (#5582)

Others

  • Notify conflict by mergify (#5419)
  • Fix mergify to only comment when pull-request is open (#5510)
  • Fix mergify condition (#5517)
  • Add auto notify bot for hip label (#5540)
  • Use pull_request_target instead for auto notify bot (#5542)
  • Fix auto notify bot for issues (#5547)
  • Disable Mergify's auto-merge (#5562)
  • Bump version to v9.3.0 (#5596)
  • Fix deprecated numpy.typeDict utilization (#5403)
  • Fix signal tests for SciPy 1.7.0 (#5413)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@12rambau @leofang @maxim-belkin @Palash-Vishnani

v10.0.0a2

24 Jun 08:32
827dfba
Pre-release

This is the release note of v10.0.0a2. See here for the complete list of solved issues and merged PRs.

Highlights

  • CuPy now supports CUDA 11.3 (cupy-cuda113) and AMD ROCm 4.2 (cupy-rocm-4-2), and binary wheels are available on PyPI.
  • The following Python syntax and new APIs can now be used in JIT target functions.
    • Calling len, min, max Python built-ins.
      • len(arr): Equivalent to arr.shape[0].
      • min(scalar1, scalar2, ...): Returns the minimum value of the inputs.
      • max(scalar1, scalar2, ...): Returns the maximum value of the inputs.
    • Accessing .ndim, .size attributes of ndarray.
    • Unpacking nested tuples.
      • (x, y), z = ...
    • jit.grid() API, similar to numba.cuda.grid.
      • x, y, z = cupyx.jit.grid(3) (x is equal to threadIdx.x + blockIdx.x * blockDim.x.)
    • Warp shuffle and sync functions.
      • cupyx.jit.shfl_down_sync(mask, var, val_id) (__shfl_down_sync(mask, var, val_id))
  • cupyx.scipy.sparse.{coo,csr,csc}_matrix now provides the reshape method.
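The JIT additions listed above (jit.grid() and the len built-in) can be sketched as follows. The kernel and helper names are hypothetical, the kernel has not been checked against a specific CuPy version, and running it needs a GPU; global_thread_index is only a CPU-side model of the index formula quoted in the list.

```python
def global_thread_index(thread_idx, block_idx, block_dim):
    """CPU-side model of the value jit.grid(1) yields for one thread."""
    return thread_idx + block_idx * block_dim


try:
    import cupy
    from cupyx import jit
except ImportError:  # CuPy is not installed; only the model above is usable
    cupy = jit = None

if jit is not None:
    @jit.rawkernel()
    def add_self(x, y):
        i = jit.grid(1)      # threadIdx.x + blockIdx.x * blockDim.x
        if i < len(x):       # len(x) is equivalent to x.shape[0]
            y[i] = x[i] + x[i]

    def run_add_self(n=1000, threads=128):
        """Launch add_self on n elements (needs a GPU; not called here)."""
        x = cupy.arange(n, dtype=cupy.float32)
        y = cupy.empty_like(x)
        blocks = (n + threads - 1) // threads  # round up to cover all elements
        add_self((blocks,), (threads,), (x, y))
        return y
```

The bounds check against len(x) matters because the last block usually contains threads whose index falls past the end of the array.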

Changes without compatibility

Drop CUDA 9.2 & NCCL 2.4 Support (#5214)

CUDA 9.2 and NCCL 2.4 are no longer supported in CuPy v10.

Changes in Stream behavior (#5251)

The same cupy.cuda.Stream instance can now safely be shared between multiple threads. To achieve this, CuPy v10 will not destroy the stream (i.e., call cudaStreamDestroy) if the stream is the current stream of any thread.
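A minimal sketch of the sharing pattern this change allows. The helper run_on_shared_stream is hypothetical and accepts any context manager, so it can be exercised without a GPU; demo_with_cupy shows the intended use with a cupy.cuda.Stream and assumes a working CuPy installation.

```python
import threading


def run_on_shared_stream(stream, work, n_threads=4):
    """Call work(slot) from n_threads threads, each inside `with stream:`."""
    results = [None] * n_threads

    def worker(slot):
        with stream:  # each thread makes the shared stream its current stream
            results[slot] = work(slot)

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def demo_with_cupy():
    """Hypothetical usage; needs a working CuPy installation and a GPU."""
    import cupy  # imported lazily so the helper above works without CuPy

    stream = cupy.cuda.Stream(non_blocking=True)
    sums = run_on_shared_stream(
        stream, lambda slot: cupy.arange(slot + 1).sum()
    )
    stream.synchronize()  # wait for the work queued by all threads
    return [int(s) for s in sums]
```

Keeping a reference to the stream in the calling thread until synchronize() returns avoids relying on when the stream object is garbage collected.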

Known Issues

  • cupy-cuda111 wheels only support CUDA 11.1.1 and will not work with CUDA 11.1.0 (#5313).
  • cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

New Features

  • Add reshape method for COO, CSR and CSC matrices (#5301)
  • Support len, min, max, .ndim, .size in jit (#5319)
  • Support nested tuple unpack in CuPy JIT (#5332)
  • Support Numba-like jit.grid() syntax in CuPy JIT (#5334)
  • Support warp shuffle and sync functions in CuPy JIT (#5335)

Enhancements

  • Do not use handles unless requested in cupy.show_config() (#5073)
  • Fix to allow sharing a Stream instance between threads (#5251)
  • Adding GUFunc order, dtype and casting kwarg support (#5260)
  • Support nan, posinf, neginf in cupy.nan_to_num (#5295)
  • Use independent version of hipFFT for ROCm 4.1 and later (#5318)
  • Support cuTENSOR v1.3.1 (#5338)
  • Support cuDNN v8.2.1 (#5357)

Performance Improvements

  • Make cuTENSOR available in cupy.einsum (#5203)

Bug Fixes

  • Fix check_availability for cupy.cusolver (#5207)
  • Fix MemoryAsync to keep a weakref to stream (#5264)
  • Fix cuFFT callback for sm_61 etc (#5304)
  • Fix cuDNN preloading (#5327)
  • Fix large arrays assignment (#5330)
  • Ensure source array is C-contiguous before copying to CUDAArray (#5342)
  • Increase test coverage for Generalized Universal Functions (#5344)
  • Remove unnecessary print (#5374)

Code Fixes

  • Fix cub repository url (#5236)
  • Code and comment fixes for stream (#5243)
  • Use cdef instead of cpdef where appropriate (#5274)

Documentation

  • Fix matmul docstring (#5174)
  • Update list of wheels in README (#5267)
  • Add user guide for FFT (#5272)
  • Bump CuPy version in docs (#5277)
  • Add user guide for streams & events (#5283)
  • Fix deadlink to tutorial and reorder in README (#5287)
  • Document ExternalStream (#5305)
  • Add ROCm 4.2 support to install docs (#5354)
  • user_guide/basic.rst: various improvements (#5356)

Installation

  • Drop support for CUDA 9.2 & NCCL 2.4 (#5214)
  • Add upper restrictions to NumPy/SciPy versions (#5225)
  • Exclude Cython 3 from setup_requires (#5273)

Tests

  • Fix threading memory pool tests (#5263)
  • Temporarily remove the async pool test from TestAllocator (#5308)
  • Fix Windows CI kernel cache (#5310)
  • Tentatively skip unstable MemoryPoolAsync tests (#5350)
  • Xfail random generator tests for HIP (#5355)
  • Tentatively pin to SciPy 1.6 in Windows CI (#5366)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @eternalphane @leofang @maxim-belkin @povinsahu1909

v9.2.0

24 Jun 08:32
83d5e6d

This is the release note of v9.2.0. See here for the complete list of solved issues and merged PRs.

Highlights

  • CuPy now supports CUDA 11.3 (cupy-cuda113) and AMD ROCm 4.2 (cupy-rocm-4-2), and binary wheels are available on PyPI.

Known Issues

  • cupy-cuda111 wheels only support CUDA 11.1.1 and will not work with CUDA 11.1.0 (#5313).
  • cupy-cuda110 and cupy-cuda111 wheels are not available yet in PyPI. In the meantime, they can be downloaded from the Assets section below. See #4971 for detailed instructions.

Changes

Enhancements

  • Add CUDA 11.3 headers (#5232)
  • Do not use handles unless requested in cupy.show_config() (#5285)
  • Use independent version of hipFFT for ROCm 4.1 and later (#5351)
  • Support cuTENSOR v1.3.1 (#5370)
  • Support cuDNN v8.2.1 (#5372)

Bug Fixes

  • MemoryAsyncPool: Use the "current" mempool instead of the "default" one (#5271)
  • Fix MemoryAsync to keep a weakref to stream (#5307)
  • Fix cuFFT callback for sm_61 etc (#5325)
  • Fix large arrays assignment (#5333)
  • Fix check_availability for cupy.cusolver (#5336)
  • Fix cuDNN preloading (#5365)
  • Ensure source array is C-contiguous before copying to CUDAArray (#5375)
  • Remove unnecessary print (#5377)

Code Fixes

  • Use cdef instead of cpdef where appropriate (#5274)
  • Fix cub repository url (#5288)

Documentation

  • Fix matmul docstring (#5281)
  • Update list of wheels in README (#5284)
  • Add user guide for FFT (#5286)
  • Fix deadlink to tutorial and reorder in README (#5291)
  • Add user guide for streams & events (#5302)
  • Document ExternalStream (#5312)
  • user_guide/basic.rst: various improvements (#5356)
  • Add ROCm 4.2 support to install docs (#5360)

Installation

  • Exclude Cython 3 from setup_requires (#5273)
  • Add upper restrictions to NumPy/SciPy versions (#5321)

Tests

  • Fix threading memory pool tests (#5289)
  • Fix Windows CI kernel cache (#5317)
  • Xfail random generator tests for HIP (#5359)
  • Tentatively pin to SciPy 1.6 in Windows CI (#5369)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@leofang @maxim-belkin

v10.0.0a1

27 May 07:50
b01641d
Pre-release

This is the release note of v10.0.0a1. See here for the complete list of solved issues and merged PRs.

Highlights

CUDA 11.0 and 11.1 wheels not available yet in PyPI (#4971)

In the meantime, they can be downloaded from the Assets section below. See #4971 for the detailed instructions.

Changes without compatibility

Current stream is now managed per device (#5172)

CuPy now automatically manages the stream switching when changing a device, so the user is not responsible for changing the stream anymore.

This pull request also includes a bug fix for #5143. Existing code that mixes with stream: blocks and stream.use() may get different results, because the stream set via the use() API is not reactivated when a stream context exits.

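import cupy

s1 = cupy.cuda.Stream()
s2 = cupy.cuda.Stream()
s3 = cupy.cuda.Stream()
with s1:
    s2.use()  # replaces s1 as the current stream inside this block
    with s3:
        pass
    # Exiting the s3 block does not reactivate s2 (which was set via use()).
    cupy.cuda.get_current_stream()  # -> CuPy v10 returns `s1` instead of `s2`.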

Make cupy.cuda.Device context manager interface thread safe (#5083)

Since the first versions of CuPy, using a single cupy.cuda.Device context manager object from multiple threads has restored the wrong previous device on exit. The correct device is now restored, so user code that relied on the old behavior may need to be updated.

Deprecate cupyx.allow_synchronize and cupyx.DeviceSynchronized APIs (#5226)

These APIs, which were used to detect when synchronization with a device happened, have been deprecated because they do not behave reliably.

Changes

Note: many of these PRs have been backported to the v9 series and are already available in its releases.

New Features

  • CUDA 11.2: Add MemoryAsyncPool to support malloc_async (#4592)
  • Add APIs for creating NumPy arrays backed by pinned memory (#4870)
  • Support cuSPARSELt (#4883)
  • Add gamma distributions to random API (#4905)
  • Add random for uniform [0, 1) generation (#4906)
  • Add poisson distribution to random API (#4927)
  • Add SciPy compatible connected_components (#4940)
  • Support shared memory in CuPy JIT (#4950)
  • Add cupyx.scipy.sparse.kronsum() (#4968)
  • Add hfft2, ihfft2, hfftn, and ihfftn to cupyx.scipy.fft (#4996)
  • CuPy JIT: Print kernel code (#5017)
  • Add cupyx.jit.atomic_add (#5169)
  • CUDA 11.2/11.3: Support MemoryAsyncPool statistics and limits (#5177)

Enhancements

  • Ability to pass structured data types by value as kernel parameters (#4829)
  • Move the NVTX module to cupy_backends.cuda.libs (#4930)
  • Disable CUB SpMV on CUDA 11.x (#4949)
  • CuPy JIT: Readable compile error messages (#4991)
  • Fix JIT test failures on ROCm (#4998)
  • Mark cupyx.jit.rawkernel as experimental (#5005)
  • HIP: add -ftz=true (#5007)
  • Give gufunc a name (#5013)
  • CuPy JIT: Use C++-like typing rule in 'cuda' mode (#5028)
  • Add PCI Bus ID to show_config (#5037)
  • Print cuSPARSELt version in show_config (#5054)
  • Support custom getsource option in CuPy JIT (#5071)
  • Make cupy.cuda.Device context manager interface thread safe (#5083)
  • Add a new argument out to cupy.asnumpy() (#5155)
  • Support cuSPARSELt v0.1.0 (#5158)
  • Per device stream (#5172)
  • cuTENSOR v1.3.0 for library installer (#5192)
  • Add sum_labels to cupyx.scipy.ndimage.measure (#5200)
  • Support NCCL v2.9.8 (#5201)
  • Fix thrust compilation for ROCm 4.2.0 (#5209)
  • Add NVCC path and Python version to show_config (#5215)
  • Add CUDA 11.3 headers (#5218)
  • Add libraries for CUDA 11.3 (#5219)
  • Remove syncdetect APIs (#5226)

Bug Fixes

  • Use THRUST_OPTIONAL_CPP11_CONSTEXPR (#5002)
  • Use async memcpy in ndarray.copy (#5004)
  • Fix DLPack lanes (#5045)
  • Disable cuFFT plan cache on CUDA 11.1 (#5046)
  • Support PTDS in CuPy memory pool (#5072)
  • CuPy JIT: Fix range type (#5077)
  • Fix poisson to support lam array (#5087)
  • Adjust PATH when preloading to load cuDNN v8 correctly on Windows (#5103)
  • Bugfix for typing rule of CuPy JIT (#5125)
  • Fix TypeError in svds (#5140)
  • Properly handle non-contiguous RHS in cupyx.scipy.sparse.linalg.spsolve (#5168)
  • Fix integer scatter_add failure on Windows (#5173)
  • MemoryAsyncPool: Use the "current" mempool instead of the "default" one (#5191)
  • Fix matmul for input with relaxed strides (#5205)
  • Add check_availability for cuTensor routines (#5206)
  • Fix windows constexpr (#5233)
  • Remove duplicated subtraction in cupy.random.Generator.integers (#5247)

Code Fixes

  • Rename cupy.core submodule to cupy._core (#3820)
  • Fix some internal cpdef functions to cdef in _kernel.pyx (#5084)
  • Remove cupy.cupy (#5121)
  • Cosmetic change in cuSPARSELt stub header (#5149)
  • Cosmetic changes of CuPy JIT implementation (#5152)

Documentation

  • Follow the latest NumPy/SciPy docs style (#4945)
  • Fix docs: cupy-cuda112 now on PyPI (#4957)
  • Update installation guide for Conda-Forge (#4985)
  • CuPy JIT documentation (#5012)
  • Document cupyx.time.repeat (#5015)
  • Document cupy.cuda.runtime.getDeviceProperties (#5016)
  • More documentation on the supported backends (#5019)
  • Add links to Anaconda, Gitter, StackOverflow (#5020)
  • Improve the documentation on interoperability (#5023)
  • Document CFunctionAllocator and ManagedMemory (#5025)
  • Fix code block in installation guide (#5033)
  • Improve comments for memory and stream API usage (#5060)
  • Point to the correct numpy random docs (#5088)
  • Add user guide (#5093)
  • Add ROCm limitations to docs (#5107)
  • Reorganize API reference pages (#5108)
  • Revise ROCm doc (#5122)
  • Fix docs of scatter_add (#5129)
  • Mention baseline API change in upgrade guide (#5131)
  • Fix ROCm wheel install steps (#5133)
  • Fix docstring in coo.py (#5139)
  • Fix docs in stream.pyx (#5144)
  • cuDNN v8.2 on documentation (#5148)
  • Mention PTDS in ROCm Limitation (#5159)
  • Use Sphinx 4 (#5188)
  • cuTENSOR v1.3 on documentation (#5196)
  • Fix cuSPARSELt not covered in docs (#5221)
  • Add cupyx.scipy.ndimage.sum_labels to docs (#5223)
  • Improve README (#5254)
  • Update logo image (#5255)
  • Tentatively remove CUDA 11.3 from support list (#5256)

Installation

  • Fix Windows dll loading for Conda (#4974)
  • Add warnings for duplicate installation (#5032)
  • cuDNN v8.2.0 for library installer (#5146)
  • Bump version to v10.0.0a1 (#5269)

Examples

  • Fix cuSPARSELt example not to use internal function (#4995)
  • Update examples for current version of CuPy (#4999)

Tests

  • Refactor random tests (#4907)
  • Tentatively pin CI to ROCm 4.0.1 (#4961)
  • Fix cutensor import in the test (#4965)
  • Make install_tests runnable without depending on current path (#4969)
  • Avoid using pip install -e on Windows CI for performance (#4970)
  • Update known base branches in flexCI config (#4973)
  • Update list of known branches (#4982)
  • Fix TestStream cleanup (#5042)
  • Mark some memory tests as testing.slow (#5061)
  • Fix stream usage on D2D copy test under HIP (#5091)
  • Xfail tests for random distribution generator under HIP/ROCm (#5096)
  • Adjust testing tolerance for hfftn for HIP/ROCm (#5099)
  • Use current device in tests (#5127)
  • Fix for updated FlexCI base image (#5164)
  • Relax tolerance of cupyx.jit.atomic_add test (#5186)
  • Test build for ROCm 4.0 and latest (#5224)
  • Fix mergify configuration (#5248)

Others

  • Use bot mode in automatic backport (#5051)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @beingaryan @eternalphane @grlee77 @insertinterestingnamehere @keckj @leofang @povinsahu1909 @UmashankarTriforce

v9.1.0

27 May 07:48
51210b3

This is the release note of v9.1.0. See here for the complete list of solved issues and merged PRs.

Highlights

CUDA 11.0 and 11.1 wheels not available yet in PyPI (#4971)

In the meantime, they can be downloaded from the Assets section below. See #4971 for the detailed instructions.

Changes without compatibility

Make cupy.cuda.Device context manager interface thread safe (#5083)

Since the first versions of CuPy, using a single cupy.cuda.Device context manager object from multiple threads has restored the wrong previous device on exit. The correct device is now restored, so user code that relied on the old behavior may need to be updated.

Changes

Enhancements

  • Add cupyx.jit.atomic_add (#5181)
  • Support custom getsource option in CuPy JIT (#5089)
  • Fix JIT test failures on ROCm (#5101)
  • Make cupy.cuda.Device context manager interface thread safe (#5147)
  • Fix thrust compilation for ROCm 4.2.0 (#5212)
  • Add sum_labels to cupyx.scipy.ndimage.measure (#5222)
  • Support cuSPARSELt v0.1.0 (#5227)
  • Fix Stream destructor not taking care of PTDS (#5228)
  • NCCL v2.9.8 (#5229)
  • Add NVCC path and Python version to show_config (#5230)
  • cuTENSOR v1.3.0 for library installer (#5234)
  • Add libraries for CUDA 11.3 (#5235)

Bug Fixes

  • Fix DLPack lanes (#5094)
  • Fix TypeError in svds (#5161)
  • Fix integer scatter_add failure on Windows (#5178)
  • Properly handle non-contiguous RHS in cupyx.scipy.sparse.linalg.spsolve (#5180)
  • Fix poisson to support lam array (#5182)
  • Fix matmul for input with relaxed strides (#5240)
  • Add check_availability for cuTensor routines (#5244)
  • Fix windows constexpr (#5250)
  • Remove duplicated subtraction in cupy.random.Generator.integers (#5261)

Code Fixes

  • Remove cupy.cupy (#5137)
  • Cosmetic change in cuSPARSELt stub header (#5160)
  • Cosmetic changes of CuPy JIT implementation (#5162)

Documentation

  • Mention baseline API change in upgrade guide (#5132)
  • Fix docstring in coo.py (#5141)
  • Fix docs in stream.pyx (#5150)
  • Fix docs of scatter_add (#5153)
  • Fix ROCm wheel install steps (#5154)
  • Mention PTDS in ROCm Limitation (#5166)
  • Use Sphinx 4 (#5198)
  • cuDNN v8.2 on documentation (#5217)
  • Fix cuSPARSELt not covered in docs (#5231)
  • cuTENSOR v1.3 on documentation (#5238)
  • Add cupyx.scipy.ndimage.sum_labels to docs (#5245)
  • Update logo image (#5257)
  • Improve README (#5259)

Installation

  • cuDNN v8.2.0 for library installer (#5216)
  • Bump version to v9.1.0 (#5270)

Tests

  • Use current device in tests (#5151)
  • Fix stream usage on D2D copy test under HIP (#5157)
  • Fix for updated FlexCI base image (#5167)
  • Relax tolerance of cupyx.jit.atomic_add test (#5187)
  • Test build for ROCm 4.0 and latest (#5239)
  • Avoid using pip install -e on Windows CI for performance (#5242)
  • Fix mergify configuration (#5249)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @beingaryan @eternalphane @grlee77 @insertinterestingnamehere @leofang

v9.0.0

22 Apr 05:59
8de4cf8

This is the release note of v9.0.0.

This release note only covers the changes since v9.0.0rc1 release. Read the blog for the details of new features introduced in CuPy v9!

Highlights

NVIDIA cuSPARSELt

CuPy now integrates the Python binding for the cuSPARSELt library that accelerates sparse matrix multiplications on NVIDIA Ampere GPUs. We are planning to start using it in CuPy sparse APIs to transparently improve performance.

RAPIDS cuGraph

cupyx.scipy.sparse.csgraph is added to the API with support for the connected_components method. The support for cuGraph is optional and can be installed through conda-forge or by manually building CuPy. Currently, PyPI wheels do not have built-in support for cuGraph.

Add MemoryAsyncPool to support malloc_async (#5034)

Setting the allocator with cupy.cuda.set_allocator(cupy.cuda.MemoryAsyncPool().malloc) makes it possible to use the stream-ordered memory allocations introduced in CUDA 11.2.
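A minimal sketch of opting in to the async pool, assuming CuPy v9 or later built against CUDA 11.2 or later and a visible GPU. The helper name enable_async_pool is hypothetical; it only wraps the two calls named above and reports failure instead of raising.

```python
# Sketch only: `enable_async_pool` is a hypothetical helper, not a CuPy API.

def enable_async_pool():
    """Install MemoryAsyncPool as the allocator; return True on success."""
    try:
        import cupy
        pool = cupy.cuda.MemoryAsyncPool()
        cupy.cuda.set_allocator(pool.malloc)
    except Exception:
        # CuPy is missing, no GPU is visible, or the CUDA runtime/driver
        # does not support stream-ordered allocation.
        return False
    return True


if enable_async_pool():
    import cupy
    a = cupy.arange(10)  # this allocation goes through the async pool
    print(int(a.sum()))
else:
    print("MemoryAsyncPool is not available in this environment")
```

Call the helper once at program start, before any arrays are created, so that later allocations go through the new allocator.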

APIs for creating NumPy arrays backed by pinned memory (#5100)

By using cupyx.empty_pinned(), cupyx.empty_like_pinned(), cupyx.zeros_pinned(), and cupyx.zeros_like_pinned(), it is possible to obtain NumPy ndarrays whose storage is located in pinned memory, which improves the performance of data movement.
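As an illustration, the following sketch allocates a host buffer with cupyx.zeros_pinned() and falls back to an ordinary pageable NumPy array when CuPy or a GPU is unavailable. The helper name make_host_buffer and the fallback behavior are assumptions made for this example, not part of the release.

```python
import numpy


def make_host_buffer(shape, dtype=numpy.float32):
    """Return a zero-filled NumPy array, pinned when CuPy can provide it."""
    try:
        import cupyx
        # Pinned (page-locked) host memory, exposed as a numpy.ndarray.
        return cupyx.zeros_pinned(shape, dtype=dtype)
    except Exception:
        # Hypothetical fallback: ordinary pageable host memory.
        return numpy.zeros(shape, dtype=dtype)


buf = make_host_buffer((4, 4))
buf[...] = 1.0  # fill on the host like any other NumPy array
print(type(buf).__name__, buf.shape, buf.dtype)
```

Either way the result is a regular numpy.ndarray, so host-side code does not need to know whether the buffer is pinned.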

CUDA 11.0 and 11.1 wheels not available yet in PyPI (#4971)

In the meantime, they can be downloaded from the Assets section below. See #4971 for the detailed instructions.

Changes

See here for the complete list of solved issues and merged PRs after the v9.0.0rc1 release. For all changes in the v9 series, please refer to the release notes of the pre-releases (alpha1, beta1, beta2, beta3, rc1).

New Features

  • Support shared memory in CuPy JIT (#4977)
  • Support cuSPARSELt (#4994)
  • Add random for uniform [0, 1) generation (#5003)
  • CUDA 11.2: Add MemoryAsyncPool to support malloc_async (#5034)
  • Add poisson distribution to random API (#5036)
  • CuPy JIT: Print kernel code (#5038)
  • Add gamma distributions to random API (#5086)
  • Add APIs for creating NumPy arrays backed by pinned memory (#5100)
  • Add SciPy compatible connected_components (#5113)

Enhancements

  • Disable CUB SpMV on CUDA 11.x (#4978)
  • Move the NVTX module to cupy_backends.cuda.libs (#5014)
  • HIP: add -ftz=true (#5035)
  • CuPy JIT: Readable compile error messages (#5041)
  • CuPy JIT: Use C++-like typing rule in 'cuda' mode (#5053)
  • Mark cupyx.jit.rawkernel as experimental (#5057)
  • Add PCI Bus ID to show_config (#5062)
  • Print cuSPARSELt version in show_config (#5065)
  • Give gufunc a name (#5085)

Bug Fixes

  • Use THRUST_OPTIONAL_CPP11_CONSTEXPR (#5011)
  • Disable cuFFT plan cache on CUDA 11.1 (#5068)
  • Use async memcpy in ndarray.copy (#5078)
  • CuPy JIT: Fix range type (#5081)
  • Support PTDS in CuPy memory pool (#5082)
  • Adjust PATH when preloading to load cuDNN v8 correctly on Windows (#5116)

Code Fixes

  • Rename cupy.core submodule to cupy._core (#4987)
  • Fix some internal cpdef functions to cdef in _kernel.pyx (#5098)

Documentation

  • Fix docs: cupy-cuda112 now on PyPI (#4990)
  • Update installation guide for Conda-Forge (#4993)
  • Document cupyx.time.repeat (#5027)
  • Document cupy.cuda.runtime.getDeviceProperties (#5029)
  • Doc: Add links to Anaconda, Gitter, StackOverflow (#5030)
  • More documentation on the supported backends (#5039)
  • Fix code block in installation guide (#5043)
  • Document CFunctionAllocator and ManagedMemory (#5059)
  • Improve the documentation on interoperability (#5064)
  • CuPy JIT documentation (#5076)
  • Improve comments for memory and stream API usage (#5079)
  • Add user guide (#5109)
  • Reorganize API reference pages (#5114)
  • Point to the correct numpy random docs (#5115)
  • Follow the latest NumPy/SciPy docs style (#5118)
  • Add ROCm limitations to docs (#5119)
  • Revise ROCm doc (#5123)

Installation

  • Fix Windows dll loading for Conda (#5106)

Examples

  • Update examples for current version of CuPy (#5009)
  • Fix cuSPARSELt example not to use internal function (#5066)

Tests

  • Tentatively pin CI to ROCm 4.0.1 (#4976)
  • Update known base branches in flexCI config (#4980)
  • Fix cutensor import in the test (#4981)
  • Update list of known branches (#4989)
  • Make install_tests runnable without depending on current path (#4992)
  • Fix TestStream cleanup (#5052)
  • Mark some memory tests as testing.slow (#5063)
  • Refactor random tests (#5102)

Others

  • Use bot mode in automatic backport (#5058)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @leofang @povinsahu1909

v9.0.0rc1

25 Mar 06:51
30e0045
Pre-release

This is the release note of v9.0.0rc1. See here for the complete list of solved issues and merged PRs.

We are planning to release the final v9.0.0 on April 22nd. Please start testing your workload with this release. See the Upgrade Guide for the list of possible breaking changes.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

CuPy JIT (#4774)

Creating raw kernels out of Python functions is now possible thanks to the introduction of the @cupyx.jit.rawkernel decorator.

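import cupy
import numpy
from cupyx import jit

@jit.rawkernel()
def f(x, y, z, n):
    # Grid-stride loop: start at the global thread index and
    # advance by the total number of threads in the grid.
    tid = jit.threadIdx.x + jit.blockIdx.x * jit.blockDim.x
    ntid = jit.blockDim.x * jit.gridDim.x
    for i in range(tid, n, ntid):
        z[i] = x[i] + y[i]

n = numpy.uint32(1024)
x = cupy.arange(n)
y = cupy.arange(n)
z = cupy.empty((n,), dtype='l')
# Launch with a grid of 16 blocks, 16 threads per block.
f[16, 16](x, y, z, n)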

Support for Generalized Universal Functions (#4675)

We have added an interface to support Generalized Universal Functions based on the one in Dask. Currently, it is used in matmul to ensure compatibility with NumPy's __array_ufunc__ dispatching.

cuTENSOR Support in Binary Packages (#4600)

cuTENSOR support is now enabled in wheel packages. To use cuTENSOR features you will need to install the shared library using python -m cupyx.tools.install_library --cuda 11.2 --library cutensor after installing wheels.

New Sphinx Theme in Documentation (#4351)

Following NumPy, we have adopted the pydata_sphinx_theme in our documentation site starting from this release.

CUDA 11.0 and 11.1 wheels not available yet in PyPI (#4971)

In the meantime, they can be downloaded from the Assets section below. See #4971 for the detailed instructions.

Changes without compatibility

cupy.cuda.nccl is hidden by default (#4919)

The NCCL wrapper is no longer imported in cupy/cuda/__init__.py, so it must be imported explicitly from cupy.cuda.nccl.
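A short sketch of the explicit import, assuming a CuPy v9 installation in which the NCCL shared library may or may not be present. The helper name load_nccl is hypothetical; get_version() is the module's existing version query.

```python
def load_nccl():
    """Return the cupy.cuda.nccl module, or None if it cannot be imported."""
    try:
        # Explicit import: the wrapper is no longer pulled in by cupy.cuda.
        from cupy.cuda import nccl
    except ImportError:
        # CuPy is not installed, or NCCL support is unavailable.
        return None
    return nccl


nccl = load_nccl()
if nccl is None:
    print("NCCL is not available")
else:
    print("NCCL version:", nccl.get_version())
```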

Drop NCCL & cuDNN shared libraries from wheels (#4850, #4932)

NCCL and cuDNN shared libraries are no longer bundled in the wheels. To activate features using NCCL / cuDNN in CuPy v9, you will need to install these libraries using the python -m cupyx.tools.install_library tool after installing CuPy wheels. See the Installation Guide for details.

By eliminating the default bundling of cuDNN & NCCL, we have further reduced the wheel size by 5x on average.

Deprecate cupy.bool, cupy.int, cupy.float and cupy.complex (#4790)

Following NumPy 1.20 API, these aliases for the Python scalar types have been deprecated.
cupy.bool_, cupy.int_, cupy.float_ and cupy.complex_ should be used instead when required.
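The migration might look like the sketch below. It uses cupy.bool_ as named above, and additionally the Python builtin float and the explicit cupy.float64; those two substitutions follow NumPy 1.20's own deprecation guidance and are an assumption here rather than something this release note prescribes. The helper load_array_module is hypothetical and falls back to NumPy so the snippet also runs without a GPU.

```python
def load_array_module():
    """Return cupy when it is importable and a GPU is usable, else numpy."""
    try:
        import cupy
        cupy.zeros(1)  # forces device initialization; fails without a GPU
        return cupy
    except Exception:
        import numpy
        return numpy


xp = load_array_module()

# Before (deprecated alias): xp.zeros(3, dtype=xp.float)
a = xp.zeros(3, dtype=float)        # Python builtin, same dtype as the old alias
b = xp.ones(3, dtype=xp.float64)    # explicit width
mask = xp.zeros(3, dtype=xp.bool_)  # underscore-suffixed scalar type

print(a.dtype, b.dtype, mask.dtype)
```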

Docker image updated to CUDA 11.2 and Python 3.8

The official Docker image is now updated to use CUDA 11.2 and Python 3.8.

Changes

New Features

  • LOBPCG solver - cupyx.scipy.sparse.linalg.lobpcg (#4281)
  • Add diagonal and setdiag methods for COO sparse matrices (#4664)
  • Support for Generalized Universal Functions (#4675)
  • Support batched pinv (#4686)
  • Add CuPy JIT Kernel definition (#4774)
  • Add cupy.random.Generator.standard_normal (#4885)
  • Support tuple in CuPy JIT (#4890)
  • Add exponential distribution to random API (#4915)
  • Support tuple indexing in CuPy JIT (#4939)
  • Support __syncthreads() in CuPy JIT (#4941)

Enhancements

  • Support nvrtcGetSupportedArchs (#4691)
  • Update DLPack support (#4695)
  • Bump cuDNN to v8.1.1 in library installer tool (#4780)
  • Support norm='forward'/'backward' in cupy.fft functions (#4797)
  • Fix for flake8 F541 (#4803)
  • Complete build only when all of the essential modules are available (#4815)
  • Support norm='forward'/'backward' in cupyx.scipy.fft functions (#4816)
  • Support cuSparse functions for matrix conversion added in CUDA 11.2 (#4844)
  • Add NCCL to library installer (#4848)
  • Improve cuTENSOR installer (#4852)
  • Support cupy.ndarray type shift in cupy.roll (#4884)
  • Fix uniform random generation interval (#4894)
  • Use NVCC --threads option when building CuPy (#4908)
  • Bump headers to CUDA 11.2.2 (#4911)
  • Update preload to look for lib directory to support cuTENSOR/NCCL (#4912)
  • Move the NCCL module to cupy_backends.cuda.libs (#4919)
  • Add cupy/cuda/cutensor.py (#4920)

Performance Improvements

  • Improve batched SVD (#4731)
  • Avoid evaluating PTDS environment variable every time (#4842)

Bug Fixes

  • Fix dtypes in cupy.linalg (#4363)
  • Fix: avoid redeclaring attributes (#4764)
  • Windows: Fix compiler error for CUB block reduction kernels (#4771)
  • Support int argument for Dirichlet shape (#4772)
  • Windows: Fix histogram test failures (#4777)
  • Windows: fix sparse matrix indexing type (#4778)
  • Unify linux/windows randint with NumPy (#4808)
  • Improve/fix csc/csr argmax/argmin (#4813)
  • ROCm: Fix sorting bug (#4823)
  • Fixed choice function for 0 samples from 0 candidates (#4830)
  • Fix redeclaration of sparse warning classes (#4837)
  • Fix cuFFT callback compilations - v2 (#4853)
  • Solve UnboundLocalError on copy_from_host_async (#4900)
  • Add out arg verifier in new random interface. (#4904)
  • Fix compilation error due to invalid complex-to-real casting in _SimpleReductionKernel (#4909)
  • Fix C++ compilation error (#4922)
  • Fix cutensor import (#4933)
  • Fix flaky CUDAarray tests (#4946)
  • Declare CArray._indexing() only in CuPy JIT mode (#4951)

Code Fixes

  • Rename submodules under cupy.testing package (#3868)
  • Fix: code quality issues (#4587)
  • Use newest versions of stylecheck packages (#4694)
  • Clean-up sparse max/min argmax/argmin (#4860)

Documentation

  • Use pydata_sphinx_theme in Sphinx (#4351)
  • Remove cupy-cuda112 support from documentation (#4761)
  • Revert "Remove cupy-cuda112 support from documentation" (#4785)
  • Fix broken Stream docs (#4843)
  • Reformat environment variables table (#4845)
  • Revert memory back to reference (#4857)
  • Update wheel list in README (#4910)
  • Merge ROCm installation guide (#4928)
  • Document that cuDNN and NCCL are no longer included (#4932)
  • Update install docs (#4943)

Installation

  • Support optional dependencies from Conda-Forge (#4873)
  • Bump version to v9.0.0rc1 (#4953)
  • Bump Docker image to use CUDA 11.2 (#4972)

Tests

  • Show config on Windows CI (#4649)
  • Windows: Fix test condition for CUB device kernels (#4776)
  • Xfail some tests for cupyx.scipy.statistics.correlation under ROCm/HIP (#4781)
  • Windows: fix vectorize tests (#4794)
  • Windows: fix OOM errors in the CI (#4801)
  • Windows: Fix sepfir2d tests (#4804)
  • Windows: Fix cuTENSOR tests (#4806)
  • Windows: Fix cuTENSOR tests (#4818)
  • Remove AppVeyor configurations (#4836)
  • Windows: Fix test_poly1d_pow_scalar (#4854)
  • Fix for flake8 E741 (#4888)
  • Windows: Skip failing cuDNN tests (#4893)
  • Add names for workflows (#4913)
  • Prioritize FlexCI daemon in Windows CI (#4916)
  • Fix to work with scheduled FlexCI job (#4929)
  • Change irfft tests tolerance (#4937)
  • Xfail tests for ndarray indexing under HIP (#4653)
  • Adjust tolerance of TestPolyArithmeticDiffTypes under HIP/ROCm (#4657)
  • Xfail tests in polynomial roots (#4658)
  • Xfail tests for manipulation dims under HIP/ROCm (#4662)
  • Xfail TestPolyfitParametersCombinations when deg == 0 under ROCm/HIP (#4758)
  • Xfail TestPolyfitCovMode when deg == 0 under ROCm/HIP (#4759)
  • Xfail TestInvh under ROCm/HIP (#4760)
  • ROCm: remove the need to set HCC_AMDGPU_TARGET at runtime (#4766)
  • Assert MT19937 not implemented in hipRAND (#4769)
  • Xfail chi-squared test for some random functions under ROCm/HIP (#4770)
  • Remove duplicated typedef in example when HIP (#4782)
  • Xfail cuDNN version check test under ROCm/HIP (#4791)
  • Remove solved xfail mark for msort (#4792)
  • Fix to test checking HIP version (#4859)
  • Xfail test on sparse handle under ROCm/HIP (#4861)
  • Xfail some tests under ROCm/HIP (#4868)
  • Xfail some conditions of ndimage filter under ROCm/HIP (#4877)
  • Xfail some conditions of ndimage interpolation tests under ROCm/HIP (#4878)
  • Xfail some conditions of ndimage measurements under ROCm/HIP (#4879)
  • Xfail some conditions of signal tests under ROCm/HIP (#4880)

Others

  • Add CODEOWNERS file (#4757)
  • Add GitHub Actions workflow for automatic backport (#4812)
  • Fix pytest opts for Windows CI (#4820)
  • Use access token for automated backport (#4833)
  • Fix automated backport workflow (#4835)
  • Use pull_request_target trigger in backport automation (#4841)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @aryamccarthy @grlee77 @leofang @mattvend @povinsahu1909 @venkywonka @viantirreau @withshubh

v8.6.0

25 Mar 06:50
ec025b3

This is the release note of v8.6.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Notes

Final release for v8.x series

We expect this version to be the final release for v8.x series. Please start testing your workloads with the latest v9.x pre-release.

CUDA 11.0 and 11.1 wheels for Windows not available yet in PyPI (#4971)

In the meantime, they can be downloaded from the Assets section below. See #4971 for the detailed instructions.

Changes

Enhancements

  • Bump cuDNN to v8.1.1 in library installer tool (#4795)
  • Update DLPack support (#4849)
  • Bump headers to CUDA 11.2.2 (#4917)

Bug Fixes

  • [v8] Fix linalg.pinv on empty matrices (#4783)
  • Windows: Fix histogram test failures (#4784)
  • Windows: fix sparse matrix indexing type (#4796)
  • Support int argument for Dirichlet shape (#4798)
  • Windows: Fix compiler error for CUB block reduction kernels (#4814)
  • ROCm: Fix sorting bug (#4826)
  • Unify linux/windows randint with NumPy (#4827)
  • Fix dtypes in cupy.linalg (#4839)
  • Fixed choice function for 0 samples from 0 candidates (#4851)
  • Improve/fix csc/csr argmax/argmin (#4858)
  • Fix cooperative kernel launch (#4887)

Code Fixes

  • Use newest versions of stylecheck packages (#4800)
  • Fix: code quality issues (#4832)

Documentation

  • Remove cupy-cuda112 support from documentation (#4762)
  • Revert "Remove cupy-cuda112 support from documentation" (#4786)
  • Reformat environment variables table (#4856)

Installation

  • Bump version to v8.6.0 (#4954)

Tests

  • Windows: Fix test condition for CUB device kernels (#4793)
  • Windows: Fix cuTENSOR tests (#4818)
  • Remove AppVeyor configurations (#4846)
  • Windows: fix OOM errors in the CI (#4862)
  • Fix raw kernel test (#4871)
  • Windows: Fix test_poly1d_pow_scalar (#4889)
  • Windows: Skip failing cuDNN tests (#4901)
  • Add names for workflows (#4914)
  • Show config on Windows CI (#4918)
  • Prioritize FlexCI daemon in Windows CI (#4921)
  • Fix to work with scheduled FlexCI job (#4931)

Others

  • Add CODEOWNERS file (#4788)
  • Fix pytest opts for Windows CI (#4822)
  • Rename submodules under cupy.testing package (#4876)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @aryamccarthy @leofang @povinsahu1909 @withshubh