Pull requests: vllm-project/vllm
Pull requests list
[Bugfix] Invalidate CPU metadata shadows after query extension
bug
Something isn't working
speculative-decoding
v1
#45766
opened Jun 16, 2026 by
houj04
[TieredOffloading] Bound Secondary Tier lookup times
v1
#45765
opened Jun 16, 2026 by
varun-sundar-rabindranath
Contributor
[BugFix] Support MLA model identification for draft models Kimi(deeps…
bug
Something isn't working
#45764
opened Jun 16, 2026 by
baolongsun
3 of 4 tasks
[Bugfix] Fix Qwen3 prompt tool-call reasoning false positive
bug
Something isn't working
qwen
Related to Qwen models
#45763
opened Jun 16, 2026 by
alexbi29
[Docs] Update stale LMCache examples
documentation
Improvements or additions to documentation
kv-connector
#45762
opened Jun 16, 2026 by
sammshen
Contributor
Revert "[Model Runner V2][Bugfix] Fix MRV2 LoRA warmup" (#35536)
bug
Something isn't working
nvidia
qwen
Related to Qwen models
v1
#45761
opened Jun 16, 2026 by
vllm-agent
Contributor
Draft
[Frontend] Remove AsyncMicrobatchTokenizer.
ready
ONLY add when PR is ready to merge/full CI is needed
#45759
opened Jun 16, 2026 by
noooop
Collaborator
4 tasks
[XPU] Fix Triton attn fp8/bf16 check failing
intel-gpu
Related to Intel GPU
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#45758
opened Jun 16, 2026 by
zhenwei-intel
Contributor
4 tasks
[CPUOffloading] Guard CPU eviction check
v1
#45757
opened Jun 16, 2026 by
varun-sundar-rabindranath
Contributor
[Frontend] [Parser] Migrate Nemotron V3 to streaming parser engine
#45755
opened Jun 16, 2026 by
bbrowning
Collaborator
[Bugfix] DiffusionGemma: only pop a request's logprobs when it commits (#45689)
bug
Something isn't working
#45754
opened Jun 16, 2026 by
waynehacking8
Contributor
[Rust Frontend] Add CORS support
rust
#45753
opened Jun 16, 2026 by
tahsintunan
Contributor
Turboquant native fp8 v4 store
ci/build
v1
#45748
opened Jun 16, 2026 by
sladyn98
Contributor
[Bugfix][ROCm] Fix rocm_aiter_per_tensor_quant custom op aliasing
bug
Something isn't working
rocm
Related to AMD ROCm
#45747
opened Jun 15, 2026 by
Rohan138
Contributor
DO NOT MERGE
ci/build
ready
ONLY add when PR is ready to merge/full CI is needed
rocm
Related to AMD ROCm
#45746
opened Jun 15, 2026 by
AndreasKaratzas
Member
[M3] Enable FP8 sparse GQA
ci/build
#45744
opened Jun 15, 2026 by
gau-nernst
Contributor
4 tasks
[M3] Tune Triton indexer score decode for spec-decode
#45743
opened Jun 15, 2026 by
gau-nernst
Contributor
4 tasks
Pre-Commit CI Speedup
ci/build
ready
ONLY add when PR is ready to merge/full CI is needed
#45740
opened Jun 15, 2026 by
AndreasKaratzas
Member
Draft
[Perf] Restore zero-init of swizzled NVFP4 scale buffer to recover Blackwell decode throughput
#45739
opened Jun 15, 2026 by
qiching
Contributor
[NVFP4] Support clamped SwiGLU-OAI (SwigluBias) on FlashInfer-CUTLASS MoE
nvidia
#45738
opened Jun 15, 2026 by
ywang96
Member
[KV-Offloading] : Expose CPU cache usage metric
kv-connector
v1
#45737
opened Jun 15, 2026 by
varun-sundar-rabindranath
Contributor
Draft
[Quantization] Extend ModelOpt mixed precision and NVFP4 runtime formats
#45735
opened Jun 15, 2026 by
baonudesifeizhai
Contributor
4 tasks
docs: multi-server vLLM deployment issues and solutions
documentation
Improvements or additions to documentation
#45732
opened Jun 15, 2026 by
hsjlyj