v1.1.0rc1
Pre-release
Announcement Highlights:
- Model Support
- API
- Benchmark
- Feature
- Documentation
What's Changed
- [https://nvbugs/5455651][fix] Make ngram use XQA attention on Blackwell by @mikeiovine in #6873
- [https://nvbugs/5441714][chore] remove skip on disagg n-gram test by @raayandhar in #6872
- [None][feat] Add Tencent HunYuanMoEV1 model support by @qianbiaoxiang in #5521
- [None][chore] Add tests for non-existent and completed request cancellation by @achartier in #6840
- [None][doc] Update gpt-oss doc on MoE support matrix by @hlu1 in #6908
- [https://nvbugs/5394685][fix] using static scheduler 2CTA MLA as WAR for an accuracy issue by @PerkzZheng in #6896
- [https://nvbugs/5437106][fix] Add L4 Scout benchmarking WAR option in deploy guide by @JunyiXu-nv in #6829
- [None][fix] Fix the responsibility boundary between the assert and tllmException files by @Fan-Yunfan in #6723
- [None][fix] Correct reporting of torch_dtype for ModelConfig class. by @FrankD412 in #6800
- [None][fix] Fix perfect router. by @bobboli in #6797
- [https://nvbugs/5415862][fix] Update cublas as 12.9.1 and cuda memory alignment as 256 by @Wanli-Jiang in #6501
- [None][fix] Update tests to use standardized uppercase backend identifiers by @bo-nv in #6921
- [TRTLLM-7141][infra] Use repo mirrors to avoid intermittent network failures by @chzblych in #6836
- [None][doc] Modify the description for mla chunked context by @jmydurant in #6929
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #6914
- [None][chore] add a EditorConfig config by @zhenhuaw-me in #6897
- [https://nvbugs/5451373][fix] : Fix the accuracy issue when using FP8 context MLA by @peaceh-nv in #6881
- [https://nvbugs/5405041][fix] Update wide-ep doc by @qiaoxj07 in #6933
- [None][chore] Move the Mamba cache into a separate file by @tomeras91 in #6796
- [https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… by @liji-nv in #6858
- [https://nvbugs/5394685][fix] proper fix for the accuracy issue in 2CTA MLA kernels by @PerkzZheng in #6941
- [https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 by @yifeizhang-c in #6537
- [None][test] Add accuracy evaluation for AutoDeploy by @ajrasane in #6764
- [None][fix] Make TP work for Triton MOE (in addition to the EP we are using) by @dongfengy in #6722
- [TRTLLM-5863][feat] Support MoE INT8 Weight-Only-Quantization in PyTorch Workflow by @Yuening-wa in #6629
- [https://nvbugs/5401114][fix] Unwaive Gemma3 tests by @brb-nv in #6952
- [None][chore] Bump version to 1.1.0rc1 by @yiqingy0 in #6953
- [TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options by @dcampora in #6831
- [None][fix] Skip Topk if 0 by @IzzyPutterman in #6934
- [None][fix] Use RAII to automatically manage the allocation and release of va_list, fixing a potential resource leak by @Fan-Yunfan in #6758
- [None][feat] Support Yarn on Qwen3 by @byshiue in #6785
- [None][feat] Add single block version renormalized routing kernel by @ChristinaZ in #6756
- [None][infra] Waive failed cases in main branch by @EmmaQiaoCh in #6951
- [https://nvbugs/5390853][fix] Fix _test_openai_lora.py - disable cuda graph by @amitz-nv in #6965
- [https://nvbugs/5451028][fix] Constrain NemotronSuper test parameters to prevent OOMs by @Naveassaf in #6970
- [None][infra] update feature_combination_matrix of disaggregated and Eagle3 by @leslie-fang25 in #6945
- [None][doc] Update gpt oss doc by @bobboli in #6954
- [None][feat] Support accurate device iteration time by @kaiyux in #6906
- [TRTLLM-7030][fix] Uppercase the default value in pd-config by @Shixiaowei02 in #6981
- [None][fix] Fix the macro name by @ChristinaZ in #6983
- [None][infra] Waive failed tests on main 0818 by @EmmaQiaoCh in #6992
- [None][chore] Remove duplicate test waives by @yiqingy0 in #6998
- [None][fix] Clean up linking to CUDA stub libraries in build_wheel.py by @MartinMarciniszyn in #6823
- [None][infra] Cherry-pick #6836 from main branch and improve SSH connection (#6971) by @chzblych in #7005
- [TRTLLM-7158][feat] Introduce sampler options in trtllm bench by @dcampora in #6855
- [None][infra] Enable accuracy test for mtp and chunked prefill by @leslie-fang25 in #6314
- [None][autodeploy] Doc: fix link path in trtllm bench doc by @Fridah-nv in #7007
- [https://nvbugs/5371480][fix] Enable test_phi3_small_8k by @Wanli-Jiang in #6938
- [TRTLLM-7014][chore] Add accuracy test for ctx and gen workers with different models by @reasonsolo in #6741
- [None][refactor] Refactor Torch Compile Backend, MoeLoadBalancer and warmup Logic by @yizhang-nv in #6615
- [None][infra] Stricter CodeRabbit PR title generation instructions by @venkywonka in #6918
- [TRTLLM-6960][fix] enable scaled_mm tests by @dc3671 in #6936
- [TRTLLM-6991][chore] add DeepSeek-R1 FP8 accuracy tests on Blackwell by @lfr-0531 in #6710
- [TRTLLM-6541][test] Add NIM Related Cases [StarCoder2_7B] and [Codestral_22B_V01] by @fredricz-20070104 in #6939
- [https://nvbugs/5454875][ci] Unwaive Mistral Small 3.1 test by @2ez4bz in #7011
- [TRTLLM-6541][test] Add NIM Related Cases Part 1 by @crazydemo in #6684
- [https://nvbugs/5458798][fix] Relaxed test threshold, added documentation by @MrGeva in #6997
- [None][opt] Add batch wait timeout in fetching requests by @Shunkangz in #6923
- [None][chore] Remove closed bugs by @xinhe-nv in #6969
- [None][fix] acceptance rate calculation fix in benchmark_serving by @zerollzeng in #6746
- [None][doc] Add more documents for large-scale EP by @kaiyux in #7029
- [None][chore] Update wide-ep gen-only scripts by @qiaoxj07 in #6995
- [TRTLLM-7263][fix] Prevent recreation of cublas handles in lora_grouped_gemm every call by @amitz-nv in #6968
- [https://nvbugs/5458874][fix] Fix Nemotron-H flaky CUDA graph / overlap scheduler test by @tomeras91 in #6996
- [https://nvbugs/5455140][fix] unwaive DSR1-fp4 throughput_tp8 by @lfr-0531 in #7022
- [None][chore] Remove duplicate test waives by @yiqingy0 in #7044
- [None][infra] Waive failed tests on main 08/19 by @EmmaQiaoCh in #7037
- [None][feat] Use Separate QKV Input Layout for Context MLA by @zhhuang-nv in #6538
- [https://nvbugs/5444937][chore] Fixing KV events tests by @pcastonguay in #7004
- [https://nvbugs/5451296][bug] Cherry-pick #7017 from release/1.0 branch by @chzblych in #7043
- [None][fix] Accommodate Phi3/4 to work with ModelOpt's FP8 ckpts in Torch by @moraxu in #6761
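Several entries above touch speculative decoding, including the acceptance-rate calculation fix in benchmark_serving (#6746). As an illustration only (this is a hypothetical sketch, not the repository's actual benchmark_serving code), the acceptance rate is conventionally the number of draft tokens accepted divided by the number proposed:

```python
# Hypothetical sketch of a speculative-decoding acceptance-rate metric.
# `proposed[i]` and `accepted[i]` are draft-token counts for decode step i;
# the function names and signature are illustrative, not from TensorRT-LLM.

def acceptance_rate(proposed: list[int], accepted: list[int]) -> float:
    """Return accepted draft tokens / proposed draft tokens across all steps."""
    total_proposed = sum(proposed)
    if total_proposed == 0:
        return 0.0  # no drafts proposed: avoid division by zero
    return sum(accepted) / total_proposed

# Three steps proposing 4 drafts each; 4, 2, and 0 accepted -> 6/12 = 0.5
print(acceptance_rate([4, 4, 4], [4, 2, 0]))
```

Aggregating counts before dividing (rather than averaging per-step ratios) weights every proposed token equally, which is the usual convention for reporting this metric.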
New Contributors
- @qianbiaoxiang made their first contribution in #5521
- @ajrasane made their first contribution in #6764
- @fredricz-20070104 made their first contribution in #6939
Full Changelog: v1.1.0rc0...v1.1.0rc1