[Feature] Split #1303 Part 2: Qwen PD integration by ahengljh · Pull Request #1912 · vllm-project/vllm-omni

ahengljh · 2026-03-16T07:12:04Z

Purpose

This PR is part 2 of the #1303 split series.

Part 1 (#1863) merged the PD disaggregation scaffolding. This PR carries only the Qwen3-Omni integration layer that consumes that scaffolding.

Scope

This PR includes only:

vllm_omni/model_executor/models/qwen3_omni/qwen3_omni.py
vllm_omni/model_executor/stage_input_processors/qwen3_omni.py
tests/model_executor/stage_input_processors/test_qwen3_omni_stage_processors.py

Notes

The branch has been rebased logically onto current main, so it does not re-introduce the part 1 scaffolding diff.
The PR keeps the PD thinker->talker merge logic in the Qwen integration layer and adds focused unit coverage for the stage processor behavior.
A follow-up split can still carry broader integration / e2e coverage separately if needed.

Bring the split-2 branch back in line with vllm-project#1303 by pairing the Qwen model and stage-processor changes with the PD runtime wiring they depend on.

Includes the orchestrator routing changes in omni.py/async_omni.py, stage worker PD flags and KV-transfer restoration in omni_stage.py, the connector flush in omni_llm.py, and the unit-test package markers from the original branch.

Co-authored-by: spencerr221 <liubingyu62@gmail.com>
Signed-off-by: Jinheng Li <ahengljh@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25468fc951

chatgpt-codex-connector · 2026-03-16T07:24:28Z

+    if 0 <= index < len(prefill_stage.engine_outputs):
+        return prefill_stage.engine_outputs[index]
+    if prefill_stage.engine_outputs:
+        return prefill_stage.engine_outputs[-1]


Return None when no prefill request ID matches

In PD mode, _match_prefill_output falls back to index/last even after failing to find a matching request_id, so thinker2talker() can merge prefill embeddings from a different request into the current decode request. This corrupts talker context whenever prefill/decode output lists are not perfectly aligned for a step (for example, different ready-request sets across stages). When no ID match exists, the safe behavior is to skip merging (return None) rather than fall back to a positional match.
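
The stricter matching suggested here could look roughly like the sketch below. It is an illustration only, not the PR's code: `FakeOutput` and `match_prefill_output_strict` are hypothetical names, and it assumes each prefill output exposes a `request_id` attribute.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FakeOutput:
    """Hypothetical stand-in for an engine output; only request_id matters here."""

    request_id: str
    payload: str = ""


def match_prefill_output_strict(
    engine_outputs: Sequence[FakeOutput], request_id: str
) -> Optional[FakeOutput]:
    """Return the prefill output whose request_id matches, else None.

    There is deliberately no index/last fallback: a miss tells the caller
    to skip the merge instead of borrowing another request's embeddings.
    """
    for output in engine_outputs:
        if output.request_id == request_id:
            return output
    return None


prefill_outputs = [FakeOutput("req-a", "emb-a"), FakeOutput("req-b", "emb-b")]
hit = match_prefill_output_strict(prefill_outputs, "req-b")
miss = match_prefill_output_strict(prefill_outputs, "req-c")
```

With this shape, the caller would merge only when the result is not None, so a misaligned step degrades to "no merge" rather than cross-request contamination.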

chatgpt-codex-connector · 2026-03-16T07:24:28Z

+                    val = prefill_output.outputs[0].multimodal_output.get(key)
+                except Exception:
+                    pass
+            return val.detach().to(device=device, dtype=torch.float) if val is not None else None


Ensure _tts always provides tensors for talker prefill

_tts() can now return None when both decode and prefill outputs lack a TTS embedding, and that None is stored in additional_information. The talker prefill path later calls info_dict.get("tts_*_embed").to(...) unconditionally, which raises at runtime ('NoneType' object has no attribute 'to'). This is exactly the pd_no_tts_anywhere path introduced by the new logic, so this should either keep failing fast here or synthesize tensor defaults before forwarding to talker.
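
The "keep failing fast" option might be sketched as a small guard that runs before the information is forwarded to the talker. This is an assumption-laden illustration: `require_tts_embeddings` is a hypothetical helper, and the key names `tts_x_embed` / `tts_y_embed` are placeholders, since the review only refers to the keys as `tts_*_embed`.

```python
from typing import Any, Iterable, Mapping


def require_tts_embeddings(info: Mapping[str, Any], keys: Iterable[str]) -> None:
    """Raise a descriptive error if any required embedding is absent or None.

    Hypothetical guard: it surfaces the problem where the data is assembled,
    instead of as an AttributeError deep inside the talker prefill path.
    """
    missing = [key for key in keys if info.get(key) is None]
    if missing:
        raise ValueError(f"missing TTS embeddings for talker prefill: {missing}")


# Placeholder keys and values, for illustration only.
info = {"tts_x_embed": [0.0], "tts_y_embed": None}
try:
    require_tts_embeddings(info, ["tts_x_embed", "tts_y_embed"])
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```

The alternative named in the comment, synthesizing tensor defaults, would replace the `raise` with code that fills in a tensor of the expected shape; that shape is not stated in this thread, so it is left out here.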

yenuo26 · 2026-03-16T11:28:26Z

+import warnings
+from collections import defaultdict
+from typing import Any
+from unittest.mock import MagicMock


Maybe you can use pytest-mock here, as in #1315.

ahengljh requested a review from hsliuustc0106 as a code owner March 16, 2026 07:12

ahengljh force-pushed the split/1303-pd-qwen branch from 25468fc to 5b6b234 Compare March 16, 2026 07:22

chatgpt-codex-connector Bot reviewed Mar 16, 2026

yenuo26 reviewed Mar 16, 2026

hsliuustc0106 mentioned this pull request Mar 19, 2026

[RFC]: vLLM-Omni 2026 Q1 Roadmap #677

Open

38 tasks

akshatvishu mentioned this pull request May 13, 2026

[Bug]: Stale OmniStage import and type annotation remain in pd_utils.py #3542

Closed

1 task

ahengljh wants to merge 1 commit into vllm-project:main from ahengljh:split/1303-pd-qwen

2 participants
