v0.3.1

@yongwww released this 05 Sep 06:24
3c1e8d7

What's Changed

  • hotfix: change MAX_JOBS in aot ci by @yzh119 in #1621
  • fix: export MAX_JOBS for AOT build by @yongwww in #1626
  • feat: initial support for SM103, SM110, SM120, SM121 by @aleozlx in #1608
  • perf: Fix the tactic sorting in TrtllmGenBatchedGemmRunner::getValidConfigIndices by @jinyangyuan-nvidia in #1615
  • Fix cute dsl gemm API wrong arg name and silent error when passing wrong kwargs by @fzyzcjy in #1619
  • bugfix: fix merge_attention_state in BatchAttention w/ gqa-group-size in Qwen family by @happierpig in #1614
  • bugfix: fix multi-gpu/node unit-test: skip when there aren't enough GPUs in test_trtllm_mnnvl_allreduce by @bkryu in #1627
  • ci: add cuda-13 unittests to CI by @yzh119 in #1603
  • Revert "hotfix: change MAX_JOBS in aot ci (#1621)" by @yzh119 in #1629
  • patch mm segfault & patch cubin avail. by @aleozlx in #1628
  • bugfix: fix flashinfer_benchmark.py IMA when running a test list by @bkryu in #1625
  • feat: cutlass fp4 gemm bringup for SM120 & SM121 by @yongwww in #1609
  • feat: update flashinfer-cli by @yzh119 in #1613
  • bugfix: trtllm-gen fmha sm101 and sm100 compatibility by @cyx-6 in #1631
  • bugfix: collect all modules to aot by @yzh119 in #1622
  • fix: pass workspace for trtllm-gen attention by @yyihuang in #1635
  • feat: cutlass fp8 gemm bringup for SM120 & SM121 by @yongwww in #1610
  • test: pytest.mark.xfail on deepgemm by @yongwww in #1636
  • release: bump version v0.3.1 by @yongwww in #1637

Full Changelog: v0.3.0...v0.3.1