[RFC]: Qwen3-omni performance analysis #696

### Motivation.

https://arxiv.org/pdf/2509.17765 reports the theoretical performance of Qwen3-omni. We need to analyze and optimize vllm-omni to achieve comparable performance.
 
# Performance metrics
+ End-to-end first-packet latency
+ Thinker/Talker TPS
+ Generation RTF (time to generate one unit of audio / playback duration of one unit of audio, 80 ms)
<img width="1058" height="458" alt="Image" src="https://github.com/user-attachments/assets/4670596c-ba4c-49a2-834b-98881215c7bb" />

End-to-end first-packet latency = 72 + 88 + 57 + 14 + 3 = 234 ms
RTF = (1000/75 + 1000/140 + 14 + 3) / 80 ≈ 0.47

These two metrics can be understood as the speech equivalents of "TTFT" and "TPOT".
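
The arithmetic above can be reproduced with a short script. This is a minimal sketch: the numbers are copied from the two formulas, while the names attached to each term are assumptions about what the term represents and should be checked against the table in the paper.

```python
# Values copied from the formulas above. The name given to each term is an
# assumption about what it represents; verify against the paper's table.
PREPROCESS_MS = 72     # assumed: preprocessing latency before the Thinker runs
THINKER_TTFT_MS = 88   # assumed: Thinker time to first token
TALKER_TTFT_MS = 57    # assumed: Talker time to first token
MTP_MS = 14            # assumed: MTP module cost per step
CODEC_MS = 3           # assumed: codec decoder cost per step
THINKER_TPS = 75       # assumed: Thinker tokens per second
TALKER_TPS = 140       # assumed: Talker tokens per second
AUDIO_UNIT_MS = 80     # playback duration of one unit of audio


def first_packet_latency_ms() -> float:
    """Sum of the stage latencies on the path to the first audio packet."""
    return PREPROCESS_MS + THINKER_TTFT_MS + TALKER_TTFT_MS + MTP_MS + CODEC_MS


def generation_rtf() -> float:
    """Time to generate one audio unit divided by its playback duration."""
    per_unit_ms = 1000 / THINKER_TPS + 1000 / TALKER_TPS + MTP_MS + CODEC_MS
    return per_unit_ms / AUDIO_UNIT_MS


if __name__ == "__main__":
    print(f"first-packet latency: {first_packet_latency_ms()} ms")
    print(f"generation RTF: {generation_rtf():.2f}")
```

An RTF below 1 means audio is generated faster than it is played back, which is the condition for gap-free streaming.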

# How to get metrics from vllm-omni
Run the benchmark script `vllm-omni/benchmarks/qwen3-omni/vllm_omni/eval_qwen3_moe_omni.sh` and read the summary metrics from the log.

**End-to-end first-packet latency**: Since streaming output is not currently supported, we can set the Thinker's max output length to 1 to estimate it approximately.

**RTF**: Once streaming audio output is supported, we can read it from the logged metrics.
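
Once per-packet timestamps are available, both metrics could be derived along the following lines. This is only a sketch, not vllm-omni code: `summarize_audio_stream`, `AudioStreamMetrics`, and the timestamp inputs are hypothetical names, and it assumes each packet carries 80 ms of audio and that the mean gap between packets approximates the generation time per packet.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AudioStreamMetrics:
    first_packet_latency_s: float
    mean_inter_packet_latency_s: float
    rtf: float


def summarize_audio_stream(
    request_start_s: float,
    packet_arrival_s: Sequence[float],
    packet_audio_s: float = 0.080,  # assumption: 80 ms of audio per packet
) -> AudioStreamMetrics:
    """Derive speech 'TTFT' and 'TPOT'-style metrics from packet arrival times."""
    if not packet_arrival_s:
        raise ValueError("at least one audio packet is required")

    # Speech analogue of TTFT: request start to first audio packet.
    first_packet_latency = packet_arrival_s[0] - request_start_s

    # Speech analogue of TPOT: mean gap between consecutive packets.
    gaps = [b - a for a, b in zip(packet_arrival_s, packet_arrival_s[1:])]
    if gaps:
        mean_gap = sum(gaps) / len(gaps)
        rtf = mean_gap / packet_audio_s
    else:
        # A single packet gives no steady-state information.
        mean_gap = float("nan")
        rtf = float("nan")

    return AudioStreamMetrics(first_packet_latency, mean_gap, rtf)
```

For example, packets arriving 0.25 s, 0.29 s, and 0.33 s after the request starts give a first-packet latency of 0.25 s and an RTF of about 0.5.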

# How to analyze
If the performance does not meet expectations, we can set `VLLM_TORCH_PROFILER_DIR` to enable further analysis.
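
One way to launch the benchmark with the profiler directory set is sketched below. It rests on several assumptions: that the script runs without arguments from the repository root, that `bash` is available, and that the benchmark itself starts and stops the profiler so traces are written. The helper names `profiler_env` and `run_benchmark_with_profiler` are hypothetical.

```python
import os
import subprocess
from pathlib import Path

# Script path taken from this RFC, relative to the vllm-omni repository root.
BENCH_SCRIPT = "benchmarks/qwen3-omni/vllm_omni/eval_qwen3_moe_omni.sh"


def profiler_env(trace_dir: str) -> dict:
    """Return a copy of the current environment with the profiler dir set."""
    path = Path(trace_dir).resolve()
    path.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ)
    env["VLLM_TORCH_PROFILER_DIR"] = str(path)
    return env


def run_benchmark_with_profiler(repo_root: str, trace_dir: str):
    """Run the benchmark script with the profiler directory configured.

    Assumption: the script needs no arguments; adjust the command if it does.
    """
    return subprocess.run(
        ["bash", BENCH_SCRIPT],
        cwd=repo_root,
        env=profiler_env(trace_dir),
        check=True,
    )
```

The resulting trace files can then be opened in a trace viewer such as Perfetto to see which stage dominates the latency.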

# Scenario
+ AudioVisual Video to Text (dataset: WorldSense)
+ Text to speech (dataset: SEED)
+ AudioVisual Video to speech (dataset: WorldSense)

The datasets can be selected according to https://arxiv.org/pdf/2509.17765

### Proposed Change.

Current setup: batch size = 1, with CUDA graph enabled for the Thinker.

Todo:
- [ ] Offline benchmark: support warm-up. @Bounty-hunter
- [ ] Text to speech @Bounty-hunter
- [ ] Offline benchmark: support multimodal datasets. @GG-li
- [ ] Online benchmark
- [ ] AudioVisual Video to Text
- [ ] AudioVisual Video to speech
- [ ] Support streaming audio output, plus audio first-packet latency and audio inter-packet latency metrics
- [ ] Operator fusion, e.g., https://github.com/vllm-project/vllm-omni/pull/734/changes

### Feedback Period.

_No response_

### CC List.

_No response_

### Any Other Things.

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://vllm-omni.readthedocs.io), which can answer lots of frequently asked questions.
