[RFC]: vLLM-Omni 2026 Q1 Roadmap #677

### Motivation.

This issue outlines the development roadmap for **vllm-omni** in Q1 2026. Our primary focus for this quarter is refactoring the architecture to support dynamic execution graphs, expanding support for diffusion and omni models, and introducing disaggregated serving capabilities. Contributions and discussion are welcome :)

Please attach your design doc using this [template](https://docs.google.com/document/d/12YxSsVeD1jvL-InClkeAEnZyWFDndz_65JmXvsamuV4/edit?usp=sharing) in your RFC :)

### Proposed Change.

### 🛠 Core Architecture & Infrastructure

Refactoring the core execution pipeline to support more dynamic and efficient workflows.

P0: 
* [x] **Entrypoint:** 
    * [x] Stage Config refactoring: simplify and optimize stage configurations to improve clarity, maintainability, and extensibility. #1115
    * [x] Single-Stage CLI: support a CLI for serving models in single-stage mode. #939
    * [x] Stage DP: implement a DP coordinator at the stage level to coordinate DP execution. #984 #1465
    * [x] Entrypoint refactoring to align with vLLM upstream #967 #1908 
    
* [x] **Hardware Abstraction:** Implement a plugin system to support diverse hardware backends (NPU, TPU, XPU, Metal, etc.). @faaany #702 #774  
**For NPU-related issues, please refer to #886 for details.**
**For XPU-related issues, please refer to #1127 for details.**


### 🚀 Disaggregated Serving & Distributed Systems

Enhancing serving capabilities for large-scale and multi-node deployments.

P0: 
* [x] **Multi-Node Serving:** Support for multi-node setups using the Mooncake Connector. #1019 
* [x] **Full EPDG Disaggregation:** Complete disaggregation of the Encode, Prefill, Decode, and Generate stages. #1303 #1863 #1912 #1999
* [x] **Model Support:** Add support for Bagel and HunyuanImage3.0 models in disaggregated setups.  #726  #759 

### 🎨 Feature: Diffusion Pipeline #814 

Major feature additions to improve the flexibility and performance of image/video generation. 

### ⚡️ Feature: Omni(AR+DiT) Pipeline

Enhancing the multimodal interaction capabilities.

P0: 
* [x] **Performance & Execution:**
    * [x] Support Quantization. #1764 
    * [x] Support CUDA Graph execution for each stage.  #669 #1205 
    * [x] Support Async Chunked Computation across stages. #727 #742 

* [ ] **Memory Management:**
    * [ ] CPU offloading for KV Cache. #1150 

* [x] **Streaming I/O:**
    * [x] Support multimodal streaming input (aligning with `vllm` upstream). https://github.com/vllm-project/vllm/pull/28973 #1766
    * [x] Support audio streaming output. #1438 


### 🤖 Reinforcement Learning & Model Support

Expanding the ecosystem of supported models and training feedback loops.
* [x] **RL Integration:** Support for the `veRL` + `veOmni` + `vLLM-Omni` stack. #686 https://github.com/volcengine/verl/issues/4639 #778 #764 


### Targeted Model Optimizations

- [x]   Qwen3-Omni  #409 #727 #1151 #951 #1016 #962 
- [x]   Qwen3-TTS #938 
- [x]   Hunyuan-Image 3.0 #759 #1085 #1935 #1323 
- [x]   Bagel #726 #936 
- [x]   Qwen-Image family #1682 
- [x]   WAN 2.2 #1350 #1365 


### 📊 Benchmarks, Metrics & Logging

Improving observability and establishing performance baselines.

P0: 
* [x] **Benchmarks:** Establish standard benchmarks for both Diffusion and Omni pipelines. 
    * [x] Diffusion #529 #1657 
    * [x] Omni #780 
* [ ] **Metrics Refactor:** 
    * [x] Expose granular per-request information.  #891 
    * [ ] Add system-level metrics. 🙋
* [x] **Profiling:** PyTorch Profiler integration (align with vLLM). #650 #651 #709 
* [ ] **Logging:** Remove print statements and optimize log verbosity.  @Bounty-hunter 

P1: 
* [ ] **Metrics Refactor:** Support returning metrics via logs and Prometheus integration. 🙋
* [ ] **Profiling:**  NVIDIA Nsight Systems integration (align with vLLM). #1098 



### 🧪 CI/CD & Quality Assurance 
Please check our design [doc](https://docs.google.com/document/d/18SQUBSMeq-2Zp6jgfmkRn7Xha7Ai5JmuoTWwaaGo6sk/edit?tab=t.0#heading=h.7vrmys3vrq1m).

* [x] Multi-tiered tests #400 
* [x] Diverse hardware backends #1721 

### Feedback Period.

_No response_

### CC List.

@ywang96 @Gaohan123 @tzhouam @ZJY0516 @DarkLight1337 @Isotr0py @SamitHuang @david6666666 

### Any Other Things.

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://vllm-omni.readthedocs.io), which can answer lots of frequently asked questions.
