
[Feature] Split #1303 Part 2: Qwen PD integration (#1912)

Open: ahengljh wants to merge 1 commit into vllm-project:main from ahengljh:split/1303-pd-qwen


ahengljh (Contributor) commented Mar 16, 2026

Purpose

This PR is part 2 of the #1303 split series.

Part 1 (#1863) merged the PD disaggregation scaffolding. This PR carries only the Qwen3-Omni integration layer that consumes that scaffolding.

Scope

This PR includes only:

  • vllm_omni/model_executor/models/qwen3_omni/qwen3_omni.py
  • vllm_omni/model_executor/stage_input_processors/qwen3_omni.py
  • tests/model_executor/stage_input_processors/test_qwen3_omni_stage_processors.py

Notes

  • The branch has been rebased logically onto current main, so it does not re-introduce the part 1 scaffolding diff.
  • The PR keeps the PD thinker->talker merge logic in the Qwen integration layer and adds focused unit coverage for the stage processor behavior.
  • A follow-up split can still carry broader integration / e2e coverage separately if needed.

ahengljh requested a review from hsliuustc0106 as a code owner on March 16, 2026 07:12
Bring the split-2 branch back in line with vllm-project#1303 by pairing the Qwen model and stage-processor changes with the PD runtime wiring they depend on.

Includes the orchestrator routing changes in omni.py/async_omni.py, stage worker PD flags and KV-transfer restoration in omni_stage.py, the connector flush in omni_llm.py, and the unit-test package markers from the original branch.

Co-authored-by: spencerr221 <liubingyu62@gmail.com>
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
ahengljh force-pushed the split/1303-pd-qwen branch from 25468fc to 5b6b234 on March 16, 2026 07:22

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25468fc951


Comment on lines +171 to +174
    if 0 <= index < len(prefill_stage.engine_outputs):
        return prefill_stage.engine_outputs[index]
    if prefill_stage.engine_outputs:
        return prefill_stage.engine_outputs[-1]


[P1] Return None when no prefill request ID matches

In PD mode, _match_prefill_output falls back to index/last even after failing to find a matching request_id, so thinker2talker() can merge prefill embeddings from a different request into the current decode request. This corrupts talker context whenever prefill/decode output lists are not perfectly aligned for a step (for example, different ready-request sets across stages), and the safe behavior here is to skip merging (None) rather than positional fallback when no ID match exists.
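The behavior suggested above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `FakeOutput` and `match_prefill_output` are hypothetical stand-ins, and the only assumption carried over from the review is that each output exposes a `request_id`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FakeOutput:
    """Hypothetical stand-in for an engine output; only request_id matters here."""

    request_id: str


def match_prefill_output(
    engine_outputs: Sequence[FakeOutput], request_id: str
) -> Optional[FakeOutput]:
    """Return the prefill output whose request_id matches, else None."""
    for output in engine_outputs:
        if output.request_id == request_id:
            return output
    # No ID match: return None so the caller skips merging,
    # instead of guessing by index or taking the last output.
    return None


prefill_outputs = [FakeOutput("req-a"), FakeOutput("req-b")]
matched = match_prefill_output(prefill_outputs, "req-b")
missing = match_prefill_output(prefill_outputs, "req-c")
```

With this shape, a caller would merge prefill embeddings only when the result is not None, so a misaligned step leaves the decode request's context untouched.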


        val = prefill_output.outputs[0].multimodal_output.get(key)
    except Exception:
        pass
    return val.detach().to(device=device, dtype=torch.float) if val is not None else None


[P2] Ensure _tts always provides tensors for talker prefill

_tts() can now return None when both the decode and prefill outputs lack a TTS embedding, and that None is stored in additional_information. The talker prefill path later calls info_dict.get("tts_*_embed").to(...) unconditionally, which raises AttributeError at runtime ('NoneType' object has no attribute 'to'). This is exactly the pd_no_tts_anywhere path introduced by the new logic, so _tts() should either keep failing fast here or synthesize tensor defaults before forwarding to the talker.
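The fail-fast option can be sketched without any tensor library. Everything below is an assumption for illustration: `select_tts_embed`, the plain-dict inputs, the decode-before-prefill lookup order, and the key `tts_example_embed` are hypothetical and not taken from the PR.

```python
from typing import Any, Mapping, Optional


def select_tts_embed(
    decode_info: Mapping[str, Any],
    prefill_info: Optional[Mapping[str, Any]],
    key: str,
) -> Any:
    """Prefer the decode-side value, fall back to prefill, else fail fast."""
    val = decode_info.get(key)
    if val is None and prefill_info is not None:
        val = prefill_info.get(key)
    if val is None:
        # Raise here, close to the cause, rather than letting None
        # reach a later call that assumes a tensor.
        raise ValueError(f"missing TTS embedding {key!r} in decode and prefill outputs")
    return val


# Hypothetical key and payload, used only to exercise the fallback.
embed = select_tts_embed({}, {"tts_example_embed": [0.1, 0.2]}, "tts_example_embed")
```

Raising at selection time keeps the error next to the missing data. Synthesizing a default tensor instead would keep the pipeline running, but it needs a shape and dtype the talker accepts, which this sketch does not try to guess.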


import warnings
from collections import defaultdict
from typing import Any
from unittest.mock import MagicMock

Collaborator comment: Maybe you can use pytest-mock here, as in #1315.
