Release v0.3.1 · flashinfer-ai/flashinfer

What's Changed

hotfix: change MAX_JOBS in aot ci by @yzh119 in #1621
fix: export MAX_JOBS for AOT build by @yongwww in #1626
feat: initial support for SM103, SM110, SM120, SM121 by @aleozlx in #1608
perf: Fix the tactic sorting in TrtllmGenBatchedGemmRunner::getValidConfigIndices by @jinyangyuan-nvidia in #1615
Fix CuTe DSL GEMM API wrong arg name and silent error when passing wrong kwargs by @fzyzcjy in #1619
bugfix: fix merge_attention_state in BatchAttention w/ gqa-group-size in Qwen family by @happierpig in #1614
bugfix: fix multi-gpu/node unit-test: skip when there aren't enough GPUs in test_trtllm_mnnvl_allreduce by @bkryu in #1627
ci: add cuda-13 unittests to CI by @yzh119 in #1603
Revert "hotfix: change MAX_JOBS in aot ci (#1621)" by @yzh119 in #1629
patch mm segfault & patch cubin availability by @aleozlx in #1628
bugfix: fix flashinfer_benchmark.py IMA when running a test list by @bkryu in #1625
feat: cutlass fp4 gemm bringup for SM120 & SM121 by @yongwww in #1609
feat: update flashinfer-cli by @yzh119 in #1613
bugfix: trtllm-gen fmha sm101 and sm100 compatibility by @cyx-6 in #1631
bugfix: collect all modules to aot by @yzh119 in #1622
fix: pass workspace for trtllm-gen attention by @yyihuang in #1635
feat: cutlass fp8 gemm bringup for SM120 & SM121 by @yongwww in #1610
test: pytest.mark.xfail on deepgemm by @yongwww in #1636
release: bump version v0.3.1 by @yongwww in #1637

Full Changelog: v0.3.0...v0.3.1