Releases: NVIDIA/MatX
v0.4.0
New Features
- slice optimization to use builtin tensor function when possible by @luitjens in #360
- Slice support for std::array shapes by @luitjens in #363 (see the slice sketch after this list)
- svd power iteration example, benchmark and unit tests. by @luitjens in #366
- matmul: support real/complex tensors by @kshitij12345 in #362
- Adding sign/index operators by @luitjens in #369
- optimized cast and conj op to return a tensor view when possible. by @luitjens in #371
- implement QR for small batched matrices. by @luitjens in #373
- Implement block power iteration (qr iterations) for svd by @luitjens in #375
- Added output iterator support for CUB sums, and converted all sum() by @cliffburdick in #380
- Removing inheritance from std::iterator by @cliffburdick in #381
- DLPack support by @cliffburdick in #392
- Adding ref-count for DLPack by @cliffburdick in #394
- updating cub optimization selection for >= 2.0 by @tylera-nvidia in #395
- Refactored make_tensor to allow lvalue init by @cliffburdick in #397
- Updated notebook documentation and refactored some code by @cliffburdick in #398
- Allow 0-stride dimensions for cublas input/output by @tbensonatl in #400
- 16-bit float reductions + updated softmax by @cliffburdick in #399
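For context on the slice-related changes above, here is a minimal sketch of taking a non-owning slice view. It is written against the current public MatX API; the exact `slice` signature, the `matxEnd` sentinel, and the behavior in v0.4.0 are assumptions and may differ.

```cpp
#include <matx.h>

using namespace matx;

int main() {
  cudaStream_t stream = 0;

  auto t = make_tensor<float>({8, 8});
  (t = 1.0f).run(stream);

  // Take rows 2..5 and every column as a non-owning view (no copy);
  // matxEnd marks "up to the end" of that dimension
  auto v = slice(t, {2, 0}, {6, matxEnd});

  auto out = make_tensor<float>({4, 8});
  (out = v * 2.0f).run(stream);

  cudaStreamSynchronize(stream);
  return 0;
}
```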
Bug Fixes
- Fix Duplicate Print and remove member prints by @tylera-nvidia in #364
- cublasLT col major detection fix. by @luitjens in #368
- Fixes for 32b mode by @cliffburdick in #388
- Fixed a bogus maybe-uninitialized warning/error in release mode by @cliffburdick in #389
- Fixed issue with using const pointers by @cliffburdick in #393
- Generator Printing Patch by @tylera-nvidia in #370
New Contributors
- @kshitij12345 made their first contribution in #362
- @tbensonatl made their first contribution in #400
Full Changelog: v0.3.0...v0.4.0
v0.3.0
v0.3.0 is a major release with over 100 features and bug fixes. Releases will be cut more frequently after this one to support users who are not living at HEAD.
What's Changed
- Added squeeze operator by @cliffburdick in #163
- Change name of squeeze to flatten by @cliffburdick in #164
- Updated version of cuTENSOR and fixed paths by @cliffburdick in #166
- Added reduction example with einsum by @cliffburdick in #168
- Fixed bug with wrong type on argmin/max by @cliffburdick in #170
- Fixed missing return on operator() for sum by @cliffburdick in #171
- Fixed error with reduction with invalid indices. Only shows up on Jetson by @cliffburdick in #172
- Fixed bug with matmul use-after-free by @cliffburdick in #173
- Added test for batched GEMMs by @cliffburdick in #174
- Throw an exception if using SetVals on non-managed pointer by @cliffburdick in #176
- Added missing assert in release mode by @cliffburdick in #178
- Fixed einsum in release mode by @cliffburdick in #179
- Updates to docs by @cliffburdick in #180
- Added unit test for transpose and fixed bug with grid size by @cliffburdick in #181
- Fix grid dimensions for transpose. by @galv in #182
- Added missing include by @cliffburdick in #184
- Remove CUB from sum reduction while bug is being investigated by @cliffburdick in #186
- Fix for cub reductions by @luitjens in #187
- Reenable CUB tests by @cliffburdick in #188
- Fixing incorrect parameter to CUB sort for 2D tensors by @cliffburdick in #190
- Remove 4D restriction on Clone by @cliffburdick in #191
- Added support for N-D convolutions by @cliffburdick in #189
- Download RAPIDS.cmake only if it does not exist. by @cwharris in #192
- Fix 11.4 compilation issues by @cliffburdick in #195
- Improve FFT batching by @cliffburdick in #196
- Fixed argmax initialization value by @cliffburdick in #198
- Fix issue #199 by @pkestene in #200
- Fix type on concatenate by @cliffburdick in #201
- Fix documentation typo by @dagardner-nv in #202
- Missing host annotation on some generators by @cliffburdick in #203
- Fixed TotalSize on cub operators by @cliffburdick in #204
- Implementing remap operator. by @luitjens in #205
- Update reverse/shift APIs by @luitjens in #207
- batching conv1d across filters. by @luitjens in #208
- Added Print for operators by @cliffburdick in #211
- Complex div by @cliffburdick in #213
- Added lcollapse and rcollapse operator by @luitjens in #212
- Baseops by @luitjens in #214
- Only allow View() on contiguous tensors. by @luitjens in #215
- Remove caching on some CUB types temporarily by @cliffburdick in #216
- Fixed convolution mode SAME and added unit tests by @cliffburdick in #217
- Added convolution VALID support by @cliffburdick in #218
- Allow operators on cumsum by @cliffburdick in #219
- Using async allocation in median() by @cliffburdick in #220
- Various CUB fixes -- got rid of offset pointers (async allocation + copy), allowed operators on more types, and fixed caching on sort by @cliffburdick in #222
- Fixed memory leak on CUB cache bypass by @cliffburdick in #223
- Update to pipe type through for scalars on set operation by @tylera-nvidia in #225
- Added complex version of mean and variance by @cliffburdick in #227
- Fixed FFT batching for non-contiguous tensors by @cliffburdick in #228
- Added fmod operator by @cliffburdick in #230
- Fmod by @cliffburdick in #231
- Changing name to fmod by @cliffburdick in #232
- Cloneop by @luitjens in #233
- Making the shift parameter in shift an operator by @luitjens in #234
- Change sign of shift to match python/matlab. by @luitjens in #235
- Changing output operator type to by-value to allow temporary operators to be used as an output type. by @luitjens in #236
- Adding slice() operator. by @luitjens in #237
- Fix cuTensorNet workspace size by @leofang in #241
- adding permute operator by @luitjens in #239
- Cleaning up operators/transforms. by @luitjens in #243
- Rapids cmake no fetch by @cliffburdick in #245
- Cleanup of include directory by @luitjens in #246
- Fixed conv SAME mode by @cliffburdick in #248
- Use singleton on GIL interpreter by @cliffburdick in #249
- make owning a runtime parameter by @luitjens in #247
- Fixed bug with batched 1D convolution size by @cliffburdick in #250
- Adding 2d convolution tests by @luitjens in #251
- Properly initialize pybind object by @cliffburdick in #252
- Fixed sum() using wrong iterator type by @cliffburdick in #253
- g++11 fixes by @cliffburdick in #254
- Fixed size on conv and added benchmarks by @cliffburdick in #256
- Adding unit tests for collapse with remap by @luitjens in #255
- Collapse tests by @luitjens in #257
- adding madd function to improve convolution throughput by @luitjens in #258
- Conv opt by @luitjens in #259
- Fixed compiler errors in release mode by @cliffburdick in #261
- Add streaming make_tensor APIs. by @luitjens in #262
- adding random benchmark by @luitjens in #264
- remove deprecated APIs in make_tensor by @luitjens in #266
- Host unit tests by @luitjens in #267
- Fixed bug with FFT size shorter than length of tensor by @cliffburdick in #270
- removing unused pybind call made before pybind initialize by @tylera-nvidia in #271
- Fixed visualization tests by @cliffburdick in #275
- Fix cmake function check_python_libs. by @pkestene in #274
- Support CubSortSegmented by @tylera-nvidia in #272
- Executor cleanup. by @luitjens in #277
- Transpose operators changes by @luitjens in #278
- Remove Deprecated Shape and add metadata to Print by @tylera-nvidia in #280
- Update Documentation by @tylera-nvidia in #282
- NVTX Macros by @tylera-nvidia in #276
- Adding throw to file reading by @tylera-nvidia in #281
- Adding str() function to generators and operators by @luitjens in #283
- Added reshape op by @luitjens in #287
- 0D tensor printing was broken since they don't have a stride by @cliffburdick in #289
- Allow hermitian to take any rank by @cliffburdick in #292
- Hermitian nd by @cliffburdick in #293
- Fixed batched inverse by @cliffburdick in #294
- Added 4D matmul unit test and fixed batching bug by @cliffburdick in #297
- Fixing batched half precision complex GEMM by @cliffburdick in #298
- Rename simple_pipeline to simple_radar_pipeline for added clarity by @awthomp in #299
- Remove cuda::std::min/max by @cliffburdick in #301
- Fixed chained concatenations by @cliffburdick ...
v0.2.5
Minor fix on name collision: changed the MAX name so it does not collide with other libraries (#162).
Minor fix
Fixed an argmin initialization issue that sometimes gave wrong results.
v0.2.3
- Improved error messages
- Added support for the `einsum` function. Includes tensor contractions, GEMMs with transposed outputs, dot products, and trace (see the sketch after this list)
- Integrated cuTENSOR library
- Added real/imag/r2c operators
- Added `chirp` function
- Added file readers for .mat files
- Fixes to conv2, fft2
- Switched to CUB for certain reductions. Results in a 4x speedup in some cases
- Added `find()` and `find_idx()` functions
- Added `unique()` function
- Many CMake fixes to clean up transitive targets
- Added casting operators
- Added negate operator
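As a rough illustration of the einsum support mentioned above, the sketch below expresses a plain GEMM as a tensor contraction. The `cutensor::einsum` namespace and signature follow current MatX documentation and are an assumption for this release; cuTENSOR support must be enabled at build time.

```cpp
#include <matx.h>

using namespace matx;

int main() {
  cudaStream_t stream = 0;

  // 2D operands for a plain GEMM expressed through einsum
  auto a = make_tensor<float>({8, 16});
  auto b = make_tensor<float>({16, 4});
  auto c = make_tensor<float>({8, 4});

  (a = 1.0f).run(stream);
  (b = 2.0f).run(stream);

  // "ij,jk->ik" contracts over the shared index j, i.e. a matrix multiply
  cutensor::einsum(c, "ij,jk->ik", stream, a, b);

  cudaStreamSynchronize(stream);
  return 0;
}
```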
N-D Tensors
Added support for N-D tensors for:
- Operators
- FFTs
- GEMMs
- Reductions
- Solver
- Tensor/operator accesses
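A minimal sketch of what N-D support enables, treating a rank-3 tensor as a batch of matrices. The `matmul` and `sum` free-function signatures are assumptions based on MatX documentation from this era and may differ in detail.

```cpp
#include <matx.h>

using namespace matx;

int main() {
  cudaStream_t stream = 0;

  // Rank-3 tensors treated as a batch of 32 2D matrices
  auto a = make_tensor<float>({32, 64, 128});
  auto b = make_tensor<float>({32, 128, 16});
  auto c = make_tensor<float>({32, 64, 16});

  (a = 1.0f).run(stream);
  (b = 1.0f).run(stream);

  // Batched GEMM: the leading dimension acts as the batch
  matmul(c, a, b, stream);

  // Per-batch reduction: the output rank selects how many leading
  // dimensions are kept; the trailing dimensions are summed
  auto s = make_tensor<float>({32});
  sum(s, c, stream);

  cudaStreamSynchronize(stream);
  return 0;
}
```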
v0.2.1
Added unlimited concatenation of tensors
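A hedged sketch of concatenating several operators along one axis; the `concat(axis, ops...)` form follows the current MatX documentation and is assumed to apply to this release.

```cpp
#include <matx.h>

using namespace matx;

int main() {
  cudaStream_t stream = 0;

  auto a = make_tensor<float>({4});
  auto b = make_tensor<float>({6});
  auto c = make_tensor<float>({3});
  auto out = make_tensor<float>({13});

  (a = 1.0f).run(stream);
  (b = 2.0f).run(stream);
  (c = 3.0f).run(stream);

  // concat(axis, ops...) lazily stitches any number of operators along an axis
  (out = concat(0, a, b, c)).run(stream);

  cudaStreamSynchronize(stream);
  return 0;
}
```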
Tensor class refactoring
This release makes major changes to the main tensor class to allow custom types for storage and descriptors. In addition, static tensor descriptors are now possible, enabling compile-time pointer arithmetic. As of this release it is no longer recommended to construct tensor_t objects directly; prefer the make_ variants of the functions instead.
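A minimal sketch of the recommended creation path, assuming the `make_tensor` form documented for this API (shape passed as a braced list, with storage and descriptor types deduced):

```cpp
#include <matx.h>

using namespace matx;

int main() {
  cudaStream_t stream = 0;

  // Preferred since this release: make_tensor deduces the storage and
  // descriptor types instead of constructing tensor_t directly
  auto t = make_tensor<float>({16, 16});

  // Fill with a lazily evaluated expression
  (t = 2.0f).run(stream);

  cudaStreamSynchronize(stream);
  return 0;
}
```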
Other features of this release are:
- Refactored tensor class to use generic storage and descriptors
- Added comments on all make functions; fixed spectrogram examples
- Added concatenation operator
- Added static tensors
- Added const on all operator() where applicable
- Added more ways to create tensors
- Changed convolution example to use static tensor sizes
- Added documentation for the make_ functions
v0.1.1
- Added make_tensor helper functions
- Updated Black-Scholes example
- Moved host-specific defines into separate file
- Updated build system to better track libcuda++ and nvbench
- Improved release mode speed by turning off assertion checking
- Improved host operator creation time by storing intermediate variables
- Updated recursive filter example to error if not enough shared memory is available
v0.1.0
First public release of MatX. A brief list of supported features:
- Frontend API for cuBLAS, CUTLASS, cuFFT, cuSolver, cuRAND, and CUB
- All standard POD data types supported, as well as fp16/bf16 and complex
- Template expression trees to generate optimized device kernels (see the sketch after this list)
- Examples for both performance and accuracy
- Over 500 unit tests
- Benchmarks using nvbench
- Native CMake build system
- and more!
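To illustrate the expression-tree point above, here is a small hedged sketch: the right-hand side is composed lazily and only executes when run() is called, fusing the whole element-wise expression into one kernel. Operator and generator names follow current MatX documentation and are assumptions for v0.1.0.

```cpp
#include <matx.h>

using namespace matx;

int main() {
  cudaStream_t stream = 0;

  auto a = make_tensor<float>({1024});
  auto b = make_tensor<float>({1024});
  auto c = make_tensor<float>({1024});

  (a = 1.0f).run(stream);
  (b = 4.0f).run(stream);

  // The right-hand side builds an expression tree; nothing executes until
  // run() launches a single fused element-wise kernel on the stream
  (c = a * 2.0f + sqrt(b)).run(stream);

  cudaStreamSynchronize(stream);
  return 0;
}
```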