Motivation.
This issue outlines the development roadmap for vllm-omni in Q1 2026. Our primary focus this quarter is architectural refactoring to support dynamic execution graphs, expanded support for diffusion and omni models, and disaggregated serving capabilities. Contributions and discussion are welcome :)
Please attach your design doc using this template in your RFC :)
Proposed Change.
🛠 Core Architecture & Infrastructure
Refactoring the core execution pipeline to support more dynamic and efficient workflows.
P0:
Entrypoint:
Hardware Abstraction: Implement a plugin system to support diverse hardware backends (NPU, TPU, XPU, Metal, etc.). @faaany [RFC]: Hardware Abstraction Layer & Plugin System for Unified Backend Support #702 [Hardware] Support platforms and plugin system #774
For NPU related issues, please refer to [RFC]: vLLM-Omni NPU 2026 Q1 Roadmap #886 for details.
For XPU related issues, please refer to [RFC]: vLLM-Omni XPU 2026 Q1 Roadmap #1127 for details.
🚀 Disaggregated Serving & Distributed Systems
Enhancing serving capabilities for large-scale and multi-node deployments.
P0:
🎨 Feature: Diffusion Pipeline #814
Major feature additions to improve the flexibility and performance of image/video generation.
⚡️ Feature: Omni(AR+DiT) Pipeline
Enhancing the multimodal interaction capabilities.
P0:
Performance & Execution:
Memory Management:
Streaming I/O:
vllm upstream). [Feature] add session based streaming input support to v1 vllm#28973 [Feature]: Streaming Input for Qwen3-TTS #1766
🤖 Reinforcement Learning & Model Support
Expanding the ecosystem of supported models and training feedback loops.
veRL+veOmni+vLLM-Omni stack. [RFC]: support customized scheduler & pipelines for diffusion model with arbitrary returns #686 [RFC] Support Qwen-Image Flow-GRPO Training based on vLLM-Omni verl-project/verl#4639 [RFC]: Reinforcement learning support on vllm-omni #778 [Misc] Support WorkerWrapperBase and CustomPipeline for Diffusion Worker #764
Targeted Model Optimizations
📊 Benchmarks, Metrics & Logging
Improving observability and establishing performance baselines.
P0:
P1:
🧪 CI/CD & Quality Assurance
Please check our design doc.
Feedback Period.
No response
CC List.
@ywang96 @Gaohan123 @tzhouam @ZJY0516 @DarkLight1337 @Isotr0py @SamitHuang @david6666666
Any Other Things.
No response