v0.2.13

yongwww released this 20 Aug 23:23

0973054

What's Changed

test: add top_k_sampling_with_variable_k test by @JasonJ2021 in #1505
benchmark: add moe to benchmark by @nv-yunzheq in #1497
update allreduce to match trtllm by @nvjullin in #1507
Support cuda<12.8 built for trtllm_allreduce_fusion. by @strgrb in #1508
gpt-oss: Add MXFP8 x MXFP4 CUTLASS MOE for SM100 and BF16 x MXFP4 CUTLASS for SM90 + SwigluBias Activation by @djmmoss in #1396
tuner: Trtllm-gen Fp4 MoE Autotuner by @IwakuraRein in #1475
refactor fp4 masked gemm cute-dsl implementation and add manual cache by @yzh119 in #1521
fix: add missing 'requests' when building the package with AOT by @EmilienM in #1517
Fix cuda-python v13.0 import compatibility by @yongwww in #1455
misc: add license of spdlog for packaging by @yzh119 in #1522
Fix linking errors with CUDA 13 by @yongwww in #1523
release: bump version to v0.2.13 by @yongwww in #1524

New Contributors

@JasonJ2021 made their first contribution in #1505
@nv-yunzheq made their first contribution in #1497
@nvjullin made their first contribution in #1507
@strgrb made their first contribution in #1508
@djmmoss made their first contribution in #1396

Full Changelog: v0.2.12...v0.2.13
