v0.2.13
What's Changed
- test: add top_k_sampling_with_variable_k test by @JasonJ2021 in #1505
- benchmark: add moe to benchmark by @nv-yunzheq in #1497
- update allreduce to match trtllm by @nvjullin in #1507
- Support builds with CUDA < 12.8 for trtllm_allreduce_fusion by @strgrb in #1508
- gpt-oss: Add MXFP8 x MXFP4 CUTLASS MOE for SM100 and BF16 x MXFP4 CUTLASS for SM90 + SwigluBias Activation by @djmmoss in #1396
- tuner: Trtllm-gen Fp4 MoE Autotuner by @IwakuraRein in #1475
- refactor fp4 masked gemm cute-dsl implementation and add manual cache by @yzh119 in #1521
- fix: add missing 'requests' when building the package with AOT by @EmilienM in #1517
- Fix cuda-python v13.0 import compatibility by @yongwww in #1455
- misc: add license of spdlog for packaging by @yzh119 in #1522
- Fix linking errors with CUDA 13 by @yongwww in #1523
- release: bump version to v0.2.13 by @yongwww in #1524
New Contributors
- @JasonJ2021 made their first contribution in #1505
- @nv-yunzheq made their first contribution in #1497
- @nvjullin made their first contribution in #1507
- @strgrb made their first contribution in #1508
- @djmmoss made their first contribution in #1396
Full Changelog: v0.2.12...v0.2.13