[Feature] Support Prefill-Decode disaggregation via vLLM KV transfer by ahengljh · Pull Request #1303 · vllm-project/vllm-omni

ahengljh · 2026-02-10T09:08:44Z

Split Plan

This PR is being split into smaller pieces for easier review and merging:

#1863: PD disaggregation scaffolding only (pd_utils.py, Mooncake patch module, PD stage YAML).
Follow-up PR: live orchestrator / stage wiring that consumes the scaffolding from #1863.
Follow-up PR: Qwen3-Omni-specific PD integration plus unit and e2e coverage.

This umbrella PR remains the full implementation reference while the smaller split PRs land.

Summary

Implements Prefill-Decode (PD) disaggregation for the thinker stage in vLLM-Omni, reusing vLLM's native KV connector infrastructure (MooncakeConnector). Splits the thinker into separate prefill (KV producer) and decode (KV consumer) GPU instances, connected via RDMA/TCP KV cache transfer.
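A minimal sketch of the producer/consumer pairing described above, written with plain dictionaries instead of the real stage YAML schema. The `kv_connector` and `kv_role` keys and the `kv_producer`/`kv_consumer` values are assumed to follow vLLM's KV transfer configuration; the stage dict layout and the `validate_pd_pair` helper are hypothetical and only illustrate the kind of checks the PR describes (matching connector, correct roles, matching TP size).

```python
# Hypothetical stage descriptions; the real YAML schema in the PR may differ.
prefill_stage = {
    "stage_id": 0,
    "is_prefill_only": True,
    "tensor_parallel_size": 1,
    "kv_transfer_config": {
        "kv_connector": "MooncakeConnector",
        "kv_role": "kv_producer",  # prefill side produces KV cache
    },
}
decode_stage = {
    "stage_id": 1,
    "is_decode_only": True,
    "tensor_parallel_size": 1,
    "kv_transfer_config": {
        "kv_connector": "MooncakeConnector",
        "kv_role": "kv_consumer",  # decode side consumes KV cache
    },
}


def validate_pd_pair(prefill, decode):
    """Raise ValueError if the two stages cannot form a PD pair (illustrative)."""
    p_cfg = prefill.get("kv_transfer_config")
    d_cfg = decode.get("kv_transfer_config")
    if not p_cfg or not d_cfg:
        raise ValueError("both PD stages need a kv_transfer_config")
    if p_cfg.get("kv_connector") != d_cfg.get("kv_connector"):
        raise ValueError("prefill and decode must use the same kv_connector")
    if p_cfg.get("kv_role") != "kv_producer":
        raise ValueError("prefill stage must have kv_role=kv_producer")
    if d_cfg.get("kv_role") != "kv_consumer":
        raise ValueError("decode stage must have kv_role=kv_consumer")
    p_tp = prefill.get("tensor_parallel_size", 1)
    d_tp = decode.get("tensor_parallel_size", 1)
    if p_tp != d_tp:
        raise ValueError(f"tensor_parallel_size mismatch: {p_tp} vs {d_tp}")


# Passes silently for a well-formed pair.
validate_pd_pair(prefill_stage, decode_stage)
```

Failing loudly here, rather than falling back to defaults, keeps a misconfigured pair from starting up and then hanging on a KV transfer that can never complete.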

Architecture

Changes (17 files, ~4900 lines)

File	Lines	Description
`vllm_omni/entrypoints/omni.py`	+527	Core orchestration: PD detection, validation, prefill SP prep, routing, KV params lifecycle
`vllm_omni/entrypoints/async_omni.py`	+136	Async (online serving) PD routing with same merge semantics as sync path
`vllm_omni/entrypoints/omni_llm.py`	+68	`_flush_kv_connector_sends()` for batch-mode KV flush
`vllm_omni/entrypoints/omni_stage.py`	+112	Stage worker PD support: kv_transfer_params backup/restore, finish_reason check
`vllm_omni/distributed/kv_transfer/patched_mooncake_connector.py`	+272	Patched MooncakeConnector: remote_request_id injection, save-patch-restore for group_kv_pull
`vllm_omni/distributed/kv_transfer/monkey_patch.py`	+100	Version-checked monkey-patch to swap in PatchedMooncakeConnector
`vllm_omni/model_executor/stage_input_processors/qwen3_omni.py`	+148	PD embedding merge (`_merge_pd_embeddings`) for thinker→talker transition
`vllm_omni/model_executor/models/qwen3_omni/qwen3_omni.py`	+73	Zero-padding with safety threshold for PD embed/hidden mismatch
`qwen3_omni_moe_pd_separation.yaml`	+199	Production YAML config for 3-GPU PD deployment
`tests/entrypoints/test_pd_disaggregation.py`	+1468	42 unit tests covering detection, validation, routing, SP prep, cleanup, YAML, monkey-patch
`tests/.../test_qwen3_omni_stage_processors.py`	+1592	45 unit tests for stage input processors including PD merge, async chunk, audio pipeline
`tests/e2e/offline_inference/test_qwen3_omni_pd.py`	+66	E2E offline tests: text-only and video→audio through full PD pipeline
`tests/e2e/online_serving/test_qwen3_omni_pd.py`	+122	E2E online tests: text→text and mix→text+audio via OpenAI API
`tests/e2e/stage_configs/qwen3_omni_pd_ci.yaml`	+184	CI stage config with `load_format: dummy` for testing without real weights

Test Plan

Automated Tests (all passing)

Unit tests (pytest tests/entrypoints/test_pd_disaggregation.py -v):

TestDetectPDSeparation (4 tests) — PD pair detection in 2/4-stage pipelines
TestValidatePDConfig (6 tests) — config validation: mismatched connector/role/buffer errors
TestGetPDConnectorInfo (3 tests) — engine_id and bootstrap_addr extraction
TestPreparePrefillSamplingParams (4 tests) — max_tokens=1, KV param injection, no mutation
TestPrefillStopNeutralization (4 tests) — stop=[], stop_token_ids=[], include_stop_str_in_output=False
TestSamplingParamsAutoDuplication (1 test) — auto-dup for 4-stage pipeline
TestNormalizeKVTransferParams (3 tests) — dict/None/dataclass conversion
TestKvCfgToDict (3 tests) — dict/None/dataclass with empty-dict default
TestPDRouting (3 tests) — prefill receives max_tokens=1, decode gets original prompt, correct KV flags
TestKVParamsCleanup (4 tests) — drop/pop/fallback lifecycle
TestTPSizeValidation (3 tests) — matching/mismatched/default TP size
TestPDYAMLConfig (1 test) — production YAML loads and validates
TestMooncakeConnectorPatch (4 tests) — subclass check, remote_request_id, stage payload flags
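The prefill sampling-params behaviour exercised by the tests above (max_tokens=1, neutralized stop fields, KV param injection, no mutation of the caller's object) can be sketched as follows. `SamplingParamsStub` is a stand-in for vLLM's `SamplingParams`, `prepare_prefill_params` is a hypothetical helper name, and carrying the KV params under `extra_args["kv_transfer_params"]` is an assumption about how they are passed.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class SamplingParamsStub:
    """Stand-in for vLLM's SamplingParams; only the fields used here."""
    max_tokens: Optional[int] = 256
    stop: list = field(default_factory=list)
    stop_token_ids: list = field(default_factory=list)
    include_stop_str_in_output: bool = False
    extra_args: dict = field(default_factory=dict)


def prepare_prefill_params(sp, kv_transfer_params):
    """Return a copy of `sp` tuned for a prefill-only stage.

    max_tokens=1 makes the prefill stage stop after one token, and the
    empty stop lists mean it can only finish by hitting the length limit.
    replace() builds a new object, so the caller's params stay untouched.
    """
    return replace(
        sp,
        max_tokens=1,
        stop=[],
        stop_token_ids=[],
        include_stop_str_in_output=False,
        extra_args={**sp.extra_args, "kv_transfer_params": dict(kv_transfer_params)},
    )


user_params = SamplingParamsStub(max_tokens=128, stop=["</s>"], stop_token_ids=[2])
# The payload below is a hypothetical placeholder, not the connector's real schema.
prefill_params = prepare_prefill_params(user_params, {"do_remote_decode": True})
```

The decode stage would then receive the original `user_params`, which is why the helper copies instead of editing in place.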

Stage input processor tests (pytest tests/model_executor/stage_input_processors/test_qwen3_omni_stage_processors.py -v):

TestMergePDEmbeddings (9 tests) — overlap, empty prefill/decode, missing keys, edge cases
TestGetPrefillStage (5 tests) — PD active/inactive, no outputs, wrong source
TestThinker2TalkerPDMode (8 tests) — PD merge, overlap, TTS fallback, graceful error
TestPDAudioPipelineIntegration (3 tests) — full PD audio chain, prompt context, non-PD fallback
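As a rough illustration of the merge cases listed above (overlap, empty prefill, empty decode), here is a sketch that operates on plain lists of per-token rows. It assumes the prefill stage supplies rows for the prompt positions and the decode stage supplies rows for the generated positions, with the first `overlap` decode rows repeating the tail of the prefill. That overlap convention and the function name are assumptions for illustration, not the PR's actual `_merge_pd_embeddings` logic.

```python
def merge_pd_rows(prefill_rows, decode_rows, overlap=0):
    """Concatenate prefill and decode rows, dropping duplicated decode rows.

    prefill_rows / decode_rows: lists of per-token rows (e.g. embedding vectors).
    overlap: number of leading decode rows that repeat the prefill tail (assumed).
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    if not prefill_rows:
        # Nothing from prefill, so nothing in decode can be a duplicate.
        return list(decode_rows)
    if not decode_rows:
        return list(prefill_rows)
    overlap = min(overlap, len(decode_rows))
    return list(prefill_rows) + list(decode_rows[overlap:])


merged = merge_pd_rows([[0.1], [0.2]], [[0.2], [0.3]], overlap=1)
```

A tensor-based version would do the same thing with slicing and concatenation along the sequence dimension.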

E2E tests (require 3x GPUs + model):

test_pd_text_only — offline text generation through PD pipeline
test_pd_video_to_audio — offline video→audio through full 4-stage PD pipeline
test_pd_text_to_text — online text→text via OpenAI API
test_pd_mix_to_text_audio — online multimodal→text+audio via OpenAI API

Manual verification

Pre-commit checks pass (ruff check, ruff format, typos)
Unit tests pass: 42/42 in test_pd_disaggregation.py
Unit tests pass: 45/45 in test_qwen3_omni_stage_processors.py
All PR review comments addressed (22/25 fixed, 2 N/A, 1 deferred)

How to run E2E tests

# Offline (3x GPU, model downloaded)
CUDA_VISIBLE_DEVICES=0,1,2 python -m pytest tests/e2e/offline_inference/test_qwen3_omni_pd.py -v -s

# Online serving (3x GPU, model downloaded)
CUDA_VISIBLE_DEVICES=0,1,2 python -m pytest tests/e2e/online_serving/test_qwen3_omni_pd.py -v -s

# Unit tests (no GPU needed)
python -m pytest tests/entrypoints/test_pd_disaggregation.py -v
python -m pytest tests/model_executor/stage_input_processors/test_qwen3_omni_stage_processors.py -v

GPU Layout (default YAML, TP=1, 3 GPUs)

GPU	Stage	Role
0	Stage 0	Thinker Prefill (KV producer)
1	Stage 1	Thinker Decode (KV consumer)
2	Stage 2 + 3	Talker + Code2Wav

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4fb129bceb

lishunyang12

WIP design feedback -- the PD disaggregation idea is sound but the implementation has some structural issues worth sorting out before polish.

ahengljh · 2026-02-24T01:10:00Z

WIP design feedback -- the PD disaggregation idea is sound but the implementation has some structural issues worth sorting out before polish.

Thank you for the comments. I'll work on them soon.

hsliuustc0106 · 2026-02-24T07:08:09Z

@vllm-omni-reviewer

…iew comments
- Remove non-PD files: gpu_ar_model_runner.py (debug logging only), omni_ar_scheduler.py and omni_generation_scheduler.py (general compat shims, not PD-specific), pd_server_patch_guide.md (superseded by monkey_patch.py)
- Downgrade all KV-DIAG logging from WARNING to DEBUG (omni_llm.py, omni_stage.py)
- Strip verbose per-step/per-batch diagnostic scaffolding from omni_llm.py and omni_stage.py
- patched_mooncake_connector: call super().add_new_req() instead of skipping; use copy-and-restore pattern in group_kv_pull
- omni.py: refactor _detect_pd_separation to single-pass; deduplicate _kv_cfg_to_dict/_normalize_kv_transfer_params into _to_dict()
- async_omni.py: unify PD routing merge semantics with sync path
- qwen3_omni stage_input_processors: replace hardcoded "0"/"24" layer keys with named constants
- qwen3_omni model: document zero-padding safety for PD disaggregation
- omni_llm: add comment explaining why _flush_kv_connector_sends reaches into vLLM internals
PR scope reduced from 15 to 11 files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

hsliuustc0106

Summary

This PR implements Prefill-Decode (PD) disaggregation for vLLM-Omni. While the feature is architecturally sound, the implementation has several critical issues that need to be addressed.

Critical Issues:

Memory leak: _pd_kv_params_by_req never cleaned up on request failure
Silent failures in config parsing with empty dict fallbacks
Race conditions in state management despite locks
Fragile monkey-patching of vLLM internals
Hardcoded defaults (bootstrap port 25201) without documentation

Moderate Issues:

Complex state management spread across multiple dictionaries
Inconsistent error handling (some raise, some return None)
Missing validation for edge cases
No version compatibility checks for vLLM
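For the monkey-patching and version-check items above, one possible shape is sketched below. It swaps a class attribute on a module for a subclass, warns when the installed version is outside a tested range, keeps the original `__qualname__`, and is idempotent. The module here is a `types.SimpleNamespace` stand-in, and `TESTED_VERSION_PREFIXES`, `apply_connector_patch`, and `make_patched` are hypothetical names; none of this is the PR's actual `monkey_patch.py`.

```python
import types
import warnings

TESTED_VERSION_PREFIXES = ("0.11.",)  # hypothetical range, not taken from the PR


class OriginalConnector:
    def describe(self):
        return "original"


def apply_connector_patch(module, attr, installed_version, make_patched):
    """Replace module.<attr> with a subclass built by make_patched(original)."""
    original = getattr(module, attr)
    if getattr(original, "_pd_patched", False):
        return original  # already patched, do nothing
    if not installed_version.startswith(TESTED_VERSION_PREFIXES):
        warnings.warn(
            f"patch was not tested against version {installed_version}",
            RuntimeWarning,
            stacklevel=2,
        )
    patched = make_patched(original)
    if not issubclass(patched, original):
        raise TypeError("patched connector must subclass the original")
    # Keep the original names so logs and class lookups look unchanged.
    patched.__name__ = original.__name__
    patched.__qualname__ = original.__qualname__
    patched._pd_patched = True
    setattr(module, attr, patched)
    return patched


def make_patched(base):
    class Patched(base):
        def describe(self):
            return "patched:" + super().describe()
    return Patched


fake_module = types.SimpleNamespace(Connector=OriginalConnector)
patched_cls = apply_connector_patch(fake_module, "Connector", "0.11.0", make_patched)
```

Subclassing rather than replacing wholesale means any upstream method the patch does not override keeps its current behaviour.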

Minor Issues:

Debug-level logging for important events
Large PR mixing feature + tests makes review difficult

Recommendation: Request changes - address memory leak and silent failures before merge.
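The memory-leak concern can be illustrated with a small sketch: per-request KV params are kept in a dict keyed by request id, and the entry is removed in a `finally` block so it is dropped even when decoding raises. Only the `_pd_kv_params_by_req` name comes from the review; the `PDKVParamStore` class and `run_decode` helper are hypothetical.

```python
import threading


class PDKVParamStore:
    """Thread-safe map of request id -> KV transfer params (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pd_kv_params_by_req = {}

    def put(self, req_id, params):
        with self._lock:
            self._pd_kv_params_by_req[req_id] = params

    def peek(self, req_id):
        with self._lock:
            return self._pd_kv_params_by_req.get(req_id)

    def pop(self, req_id):
        with self._lock:
            return self._pd_kv_params_by_req.pop(req_id, None)

    def __len__(self):
        with self._lock:
            return len(self._pd_kv_params_by_req)


def run_decode(store, req_id, decode_fn):
    """Run decode_fn with the stored params; always drop the entry afterwards."""
    try:
        return decode_fn(store.peek(req_id))
    finally:
        # Runs on success and on failure, so failed requests do not leak entries.
        store.pop(req_id)
```

Because `pop` uses a default of `None`, calling it again from a later defense-in-depth cleanup pass is harmless.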

lishunyang12

Thanks for addressing the earlier feedback -- the single-pass detection rewrite, _to_dict dedup, logging downgrade, and super().add_new_req() call all look correct now. A few remaining items:

…ests, e2e
- Neutralize stop/stop_token_ids in prefill sampling params to ensure finish_reason='length' (prevents MooncakeConnector KV transfer cancel)
- Add _DEFAULT_MOONCAKE_BOOTSTRAP_PORT named constant
- Add tensor_parallel_size validation in PD config check
- Improve error messages with type info for kv_transfer_config parsing
- Add defense-in-depth cleanup of _pd_kv_params_by_req after generation
- Upgrade auto-duplication log to WARNING with suppression hint
- Downgrade per-request PD routing/trace logs from INFO to DEBUG
- Add vLLM version compatibility warning in monkey_patch.py
- Use dynamic __qualname__ from original MooncakeConnector
- Add padding threshold warning (512 tokens) in model zero-padding
- Add clarifying comments on threading model, merge order, save-patch-restore
- Add unit tests: stop neutralization, failure/leak cleanup, TP validation
- Add PD e2e tests for both text and audio modalities (offline + online)
- Add PD CI stage config with load_format: dummy
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

spencerr221 · 2026-03-05T09:08:23Z

The unit test cases for this feature, along with the end-to-end (e2e) test cases for all three modalities (text_only, text-to-audio, and video-to-audio), have been successfully executed on the offline staging environment (Yellow Zone) using machine 82.

natureofnature · 2026-03-05T09:40:28Z

+        # multimodal_mask only selects audio/image/video token positions,
+        # which always lie within the prompt (prefill) portion where real
+        # embeddings exist.
+        target_len = thinker_result_ids.shape[-1]


Can we have a unified workflow for other models in PD disaggregation?

ahengljh · 2026-03-06

Yes, that's the direction we want to go. The PD orchestration logic in omni.py/async_omni.py is already model-agnostic: it only looks at the is_prefill_only/is_decode_only flags and kv_transfer_config in the YAML.

The model-specific part is only in stage_input_processors (the embedding merge in _merge_pd_embeddings with layer keys "0" and "24"). For other models, they'd need their own stage_input_processor but can reuse the PD orchestration as-is.

We can extract a common PD embedding merge base with configurable layer keys to make it easier. Will track this as a follow-up.
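To make the model-agnostic claim concrete, here is a sketch of single-pass PD pair detection that looks only at the `is_prefill_only` and `is_decode_only` flags on each stage config. The flag names come from the reply above; the dict layout and the `detect_pd_pair` helper are hypothetical, not the PR's `_detect_pd_separation`.

```python
def detect_pd_pair(stage_configs):
    """Return (prefill_index, decode_index), or None if PD is not configured.

    Walks the stage list once and pairs the first prefill-only stage
    with the decode-only stage that follows it.
    """
    prefill_idx = None
    for idx, cfg in enumerate(stage_configs):
        if cfg.get("is_prefill_only"):
            if prefill_idx is not None:
                raise ValueError("more than one prefill-only stage found")
            prefill_idx = idx
        elif cfg.get("is_decode_only"):
            if prefill_idx is None:
                raise ValueError("decode-only stage appears before a prefill-only stage")
            return prefill_idx, idx
    if prefill_idx is not None:
        raise ValueError("prefill-only stage has no matching decode-only stage")
    return None


# A 4-stage pipeline shaped like the default layout: prefill, decode, talker, code2wav.
four_stage = [
    {"name": "thinker_prefill", "is_prefill_only": True},
    {"name": "thinker_decode", "is_decode_only": True},
    {"name": "talker"},
    {"name": "code2wav"},
]
pd_pair = detect_pd_pair(four_stage)
```

Nothing in the helper refers to Qwen3-Omni, so a different model's stage list would go through the same path.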

hsliuustc0106 · 2026-03-11

It's not reasonable to change model files to support PD.

natureofnature · 2026-03-05T09:44:24Z

@R2-Y PTAL

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Jinheng Li <ahengljh@gmail.com>

Move duplicated prefill→decode routing code from omni.py and async_omni.py into PDDisaggregationMixin._prepare_pd_decode_routing() in pd_utils.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Jinheng Li <ahengljh@gmail.com>

hsliuustc0106 · 2026-03-12T04:14:39Z

I agree with @lishunyang12 that this PR should be split into several atomic PRs.

When the talker generates a long output, the flattened codec codes (seq_len * num_quantizers) can exceed the code2wav model's max_model_len, causing a ValueError. Truncate to fit within the 65536 token limit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Jinheng Li <ahengljh@gmail.com>
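The arithmetic behind this truncation can be sketched as follows: with `num_quantizers` codes per frame, at most `65536 // num_quantizers` whole frames fit under the limit. The helper name and the frame-major flattening order are assumptions for illustration; the actual code in the PR may flatten differently.

```python
MAX_CODE2WAV_TOKENS = 65536  # limit quoted in the commit message


def truncate_and_flatten(frames, num_quantizers, max_tokens=MAX_CODE2WAV_TOKENS):
    """Keep only as many whole frames as fit, then flatten to one token list.

    frames: list of frames, each a list of `num_quantizers` codec codes.
    """
    max_frames = max_tokens // num_quantizers
    kept = frames[:max_frames]
    return [code for frame in kept for code in frame]


# 10 frames of 4 codes each, with a small limit to show the effect.
demo_frames = [[i, i, i, i] for i in range(10)]
flat = truncate_and_flatten(demo_frames, num_quantizers=4, max_tokens=18)
```

Truncating on frame boundaries avoids feeding the code2wav stage a partial frame at the end of the sequence.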

Remove qwen3_omni_moe_pd_multiconnector.yaml and restore the original qwen3_omni_moe_pd_separation.yaml stage config. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Jinheng Li <ahengljh@gmail.com>

lishunyang12 · 2026-03-13T02:42:56Z

PR 1 — Core PD infra
pd_utils.py (detection, validation, KV normalization helpers)
kv_transfer/monkey_patch.py + init.py
omni_stage.py changes (stage worker PD support)
Corresponding unit tests (detection/validation subset of test_pd_disaggregation.py)

@ahengljh Hi, please follow these instructions to scope down this PR. Otherwise maintainers will not have the bandwidth to review it, and I don't think it can be integrated in the near future.

lishunyang12 · 2026-03-13T02:46:04Z

@ahengljh Another way to move this forward is to present your PR at our weekly meeting and get the maintainers in sync with your design ideas; see https://docs.google.com/document/d/1pdUBiS_7mdOUNDtdwy-9OUf7jsMWN324BIbps5olDME/edit?tab=t.0#heading=h.l9hdvzveucma.

ahengljh · 2026-03-13T02:48:38Z

@ahengljh Another way to move this forward is to present your PR at our weekly meeting and get the maintainers in sync with your design ideas; see https://docs.google.com/document/d/1pdUBiS_7mdOUNDtdwy-9OUf7jsMWN324BIbps5olDME/edit?tab=t.0#heading=h.l9hdvzveucma.

Thank you, Shunyang. I have actually presented this PR in an internal discussion with @hsliuustc0106, but as you suggested, we also believe splitting this PR into smaller ones will be better for everyone, so I am working on it.

Signed-off-by: Jinheng Li <ahengljh@gmail.com>

Bring the split-2 branch back in line with vllm-project#1303 by pairing the Qwen model and stage-processor changes with the PD runtime wiring they depend on. Includes the orchestrator routing changes in omni.py/async_omni.py, stage worker PD flags and KV-transfer restoration in omni_stage.py, the connector flush in omni_llm.py, and the unit-test package markers from the original branch. Co-authored-by: spencerr221 <liubingyu62@gmail.com> Signed-off-by: Jinheng Li <ahengljh@gmail.com>

…ng (vllm-project#1863) Signed-off-by: Jinheng Li <ahengljh@gmail.com>

Carry only the remaining PD test coverage from vllm-project#1303 after split 1 and the corrected split 2 are accounted for. This commit contains the PD entrypoint unit tests plus the offline/online Qwen e2e coverage and the CI-only PD stage config fixture. Co-authored-by: spencerr221 <liubingyu62@gmail.com> Signed-off-by: Jinheng Li <ahengljh@gmail.com>

…rator architecture

Adapts the Prefill-Decode (PD) disaggregation feature from PR vllm-project#1303 to the refactored single-process Orchestrator architecture introduced in PR vllm-project#1908.

Key changes:
- engine/async_omni_engine.py: Add _detect_pd_config() which detects PD stage pairs, applies MooncakeConnector monkey patch, and extracts the bootstrap address; passes pd_config to Orchestrator
- engine/orchestrator.py: Add PD routing logic in _forward_to_next_stage; capture prefill KV params from outputs and inject into decode SP via _build_pd_decode_params(); clean up _pd_kv_params on request completion
- entrypoints/omni_base.py: Inherit PDDisaggregationMixin; add stage_configs property; call _init_pd_state() on init
- entrypoints/omni.py: Expand sampling params for PD before resolving; inject per-request prefill SP modifications
- entrypoints/async_omni.py: Same sampling param expansions for async path
- entrypoints/pd_utils.py: Replace stage_list -> stage_configs references
- model_executor/stage_input_processors/qwen3_omni.py: Add PD embedding merge in thinker2talker(); fix talker2code2wav() dimension slicing and add truncation guard for code2wav max prompt length
- model_executor/models/qwen3_omni/qwen3_omni.py: Safety zero-padding in _thinker_to_talker_prefill(); safety clamping in _get_talker_user_parts() for PD length mismatches
- New YAML configs: qwen3_omni_moe_pd_separation.yaml (production), qwen3_omni_pd_ci.yaml (CI with dummy weights)
- New tests: test_pd_disaggregation.py (adapted for new arch; old-arch integration tests marked xfail), test_qwen3_omni_stage_processors.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ng (vllm-project#1863) Signed-off-by: Jinheng Li <ahengljh@gmail.com> Signed-off-by: yiliu30 <yi4.liu@intel.com>

ahengljh requested a review from hsliuustc0106 as a code owner February 10, 2026 09:08

chatgpt-codex-connector Bot reviewed Feb 10, 2026

Comment thread vllm_omni/entrypoints/omni.py Outdated

Comment thread vllm_omni/entrypoints/async_omni.py Outdated

ahengljh mentioned this pull request Feb 12, 2026

[RFC]: Support Prefill-Decode Disaggregation for vLLM-Omni Thinker Stage via vLLM KV Transfer JiusiServe/vllm-omni#92

Open

1 task

ahengljh changed the title ~~[Feature][WIP] Support Prefill-Decode disaggregation via vLLM KV transfer #1~~ [Feature][WIP] Support Prefill-Decode disaggregation via vLLM KV transfer Feb 12, 2026

lishunyang12 reviewed Feb 21, 2026

ahengljh force-pushed the feat/pd-disaggregation branch from b315e6b to 606e7cf Compare February 25, 2026 08:04

hsliuustc0106 requested changes Feb 27, 2026

lishunyang12 reviewed Feb 27, 2026

Comment thread vllm_omni/entrypoints/omni.py Outdated

Comment thread vllm_omni/entrypoints/omni_stage.py Outdated

Comment thread vllm_omni/distributed/kv_transfer/patched_mooncake_connector.py Outdated

Comment thread vllm_omni/model_executor/stage_input_processors/qwen3_omni.py Outdated

ahengljh force-pushed the feat/pd-disaggregation branch from bddce52 to df087f3 Compare March 2, 2026 02:44

ahengljh force-pushed the feat/pd-disaggregation branch 2 times, most recently from 68e0f9e to 0483a26 Compare March 2, 2026 06:44

Vivo50E mentioned this pull request Mar 3, 2026

[RFC]: Support KV Cache CPU Offloading #1150

Open

ahengljh requested a review from hsliuustc0106 March 3, 2026 08:06

ahengljh force-pushed the feat/pd-disaggregation branch 6 times, most recently from a27763c to 2a8212d Compare March 5, 2026 09:03

ahengljh changed the title ~~[Feature][WIP] Support Prefill-Decode disaggregation via vLLM KV transfer~~ [Feature] Support Prefill-Decode disaggregation via vLLM KV transfer Mar 5, 2026

natureofnature reviewed Mar 5, 2026

natureofnature reviewed Mar 6, 2026

Comment thread vllm_omni/entrypoints/omni_stage.py

natureofnature reviewed Mar 6, 2026

Comment thread vllm_omni/entrypoints/omni_stage.py

wuhang2014 reviewed Mar 11, 2026

Comment thread vllm_omni/model_executor/stage_configs/qwen3_omni_moe_pd_multiconnector.yaml Outdated

ahengljh and others added 2 commits March 11, 2026 16:50

[Fix] Remove confusing GPU layout example from PD multiconnector config

f035e9c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Jinheng Li <ahengljh@gmail.com>

Gaohan123 added this to the v0.18.0 milestone Mar 11, 2026

ahengljh and others added 2 commits March 13, 2026 09:56

ahengljh mentioned this pull request Mar 13, 2026

[Feature] Split #1303 Part 1: PD disaggregation scaffolding #1863

Merged

Vivo50E mentioned this pull request Mar 13, 2026

[RFC]: Multi-Stage KV Cache Management Roadmap #1867

Open

1 task

hsliuustc0106 pushed a commit that referenced this pull request Mar 16, 2026

[Feature] Split #1303 Part 1: PD disaggregation scaffolding (#1863)

88caaf1

Signed-off-by: Jinheng Li <ahengljh@gmail.com>

ahengljh mentioned this pull request Mar 16, 2026

[Feature] Split #1303 Part 2: Qwen PD integration #1912

Open

wtomin pushed a commit to wtomin/vllm-omni that referenced this pull request Mar 16, 2026

[Feature] Split vllm-project#1303 Part 1: PD disaggregation scaffoldi…

df65c69

…ng (vllm-project#1863) Signed-off-by: Jinheng Li <ahengljh@gmail.com>

lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request Mar 16, 2026

[Feature] Split vllm-project#1303 Part 1: PD disaggregation scaffoldi…

59c3d04

…ng (vllm-project#1863) Signed-off-by: Jinheng Li <ahengljh@gmail.com>

tangbinh pushed a commit to tangbinh/vllm-omni that referenced this pull request Mar 18, 2026

[Feature] Split vllm-project#1303 Part 1: PD disaggregation scaffoldi…

0ca1d7a

…ng (vllm-project#1863) Signed-off-by: Jinheng Li <ahengljh@gmail.com>

Gaohan123 removed this from the v0.18.0 milestone Mar 20, 2026

yiliu30 pushed a commit to yiliu30/vllm-omni-fork that referenced this pull request Mar 20, 2026

[Feature] Split vllm-project#1303 Part 1: PD disaggregation scaffoldi…

902cd36

…ng (vllm-project#1863) Signed-off-by: Jinheng Li <ahengljh@gmail.com> Signed-off-by: yiliu30 <yi4.liu@intel.com>

amy-why-3459 mentioned this pull request Mar 26, 2026

[RFC]: Omni-Modality Q2 Roadmap #2207

Open

spencerr221 mentioned this pull request Mar 26, 2026

[Feature] Support Prefill-Decode disaggregation via vLLM KV transfer #2220

Merged

5 tasks

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[Feature] Split vllm-project#1303 Part 1: PD disaggregation scaffoldi…

cd924aa

…ng (vllm-project#1863) Signed-off-by: Jinheng Li <ahengljh@gmail.com>

akshatvishu mentioned this pull request May 13, 2026

[Bug]: Stale OmniStage import and type annotation remain in pd_utils.py #3542

Closed

1 task

natureofnature mentioned this pull request May 15, 2026

[RFC]: Qwen3-Omni Stage Transfer via Mooncake Transfer Engine #3635

Open

1 task

daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request May 28, 2026

[Feature] Split vllm-project#1303 Part 1: PD disaggregation scaffoldi…

88df441

…ng (vllm-project#1863) Signed-off-by: Jinheng Li <ahengljh@gmail.com>

Gaohan123 removed the ready label to trigger buildkite CI label Jun 3, 2026

quyifei23 pushed a commit to quyifei23/vllm-omni that referenced this pull request Jun 6, 2026

[Feature] Split vllm-project#1303 Part 1: PD disaggregation scaffoldi…

5671a77

…ng (vllm-project#1863) Signed-off-by: Jinheng Li <ahengljh@gmail.com>
