
[Feature] Support Prefill-Decode disaggregation via vLLM KV transfer#1303

Open
ahengljh wants to merge 23 commits into vllm-project:main from ahengljh:feat/pd-disaggregation

@ahengljh ahengljh commented Feb 10, 2026

Contributor

Split Plan

This PR is being split into smaller pieces for easier review and merging:

  1. #1863: PD disaggregation scaffolding only (pd_utils.py, Mooncake patch module, PD stage YAML).
  2. Follow-up PR: live orchestrator / stage wiring that consumes the scaffolding from #1863.
  3. Follow-up PR: Qwen3-Omni-specific PD integration plus unit and e2e coverage.

This umbrella PR remains the full implementation reference while the smaller split PRs land.


Summary

Implements Prefill-Decode (PD) disaggregation for the thinker stage in vLLM-Omni, reusing vLLM's native KV connector infrastructure (MooncakeConnector). Splits the thinker into separate prefill (KV producer) and decode (KV consumer) GPU instances, connected via RDMA/TCP KV cache transfer.

Architecture

[images: architecture diagrams]

Changes (17 files, ~4900 lines)

| File | Lines | Description |
| --- | --- | --- |
| vllm_omni/entrypoints/omni.py | +527 | Core orchestration: PD detection, validation, prefill SP prep, routing, KV params lifecycle |
| vllm_omni/entrypoints/async_omni.py | +136 | Async (online serving) PD routing with the same merge semantics as the sync path |
| vllm_omni/entrypoints/omni_llm.py | +68 | _flush_kv_connector_sends() for batch-mode KV flush |
| vllm_omni/entrypoints/omni_stage.py | +112 | Stage worker PD support: kv_transfer_params backup/restore, finish_reason check |
| vllm_omni/distributed/kv_transfer/patched_mooncake_connector.py | +272 | Patched MooncakeConnector: remote_request_id injection, save-patch-restore for group_kv_pull |
| vllm_omni/distributed/kv_transfer/monkey_patch.py | +100 | Version-checked monkey-patch to swap in PatchedMooncakeConnector |
| vllm_omni/model_executor/stage_input_processors/qwen3_omni.py | +148 | PD embedding merge (_merge_pd_embeddings) for thinker→talker transition |
| vllm_omni/model_executor/models/qwen3_omni/qwen3_omni.py | +73 | Zero-padding with safety threshold for PD embed/hidden mismatch |
| qwen3_omni_moe_pd_separation.yaml | +199 | Production YAML config for 3-GPU PD deployment |
| tests/entrypoints/test_pd_disaggregation.py | +1468 | 42 unit tests covering detection, validation, routing, SP prep, cleanup, YAML, monkey-patch |
| tests/.../test_qwen3_omni_stage_processors.py | +1592 | 45 unit tests for stage input processors including PD merge, async chunk, audio pipeline |
| tests/e2e/offline_inference/test_qwen3_omni_pd.py | +66 | E2E offline tests: text-only and video→audio through full PD pipeline |
| tests/e2e/online_serving/test_qwen3_omni_pd.py | +122 | E2E online tests: text→text and mix→text+audio via OpenAI API |
| tests/e2e/stage_configs/qwen3_omni_pd_ci.yaml | +184 | CI stage config with load_format: dummy for testing without real weights |
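
For readers unfamiliar with the pattern, the version-checked swap listed for monkey_patch.py can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the PR's code: `TESTED_PREFIXES`, `apply_patch`, and the version string are hypothetical, and a `types.SimpleNamespace` plus two stub classes stand in for the real vLLM module and connector so the snippet runs without vLLM installed.

```python
import types
import warnings

# Hypothetical: version prefixes the patch is assumed to be validated against.
TESTED_PREFIXES = ("0.16.",)


class MooncakeConnector:
    """Stub for the upstream connector class (not the real vLLM class)."""

    def describe(self) -> str:
        return "original"


class PatchedMooncakeConnector(MooncakeConnector):
    """Stub for the patched subclass."""

    def describe(self) -> str:
        return "patched"


def apply_patch(module, version: str) -> bool:
    """Swap module.MooncakeConnector for the patched subclass, at most once."""
    original = module.MooncakeConnector
    if issubclass(original, PatchedMooncakeConnector):
        return False  # already patched; repeated calls are no-ops
    if not version.startswith(TESTED_PREFIXES):
        warnings.warn(f"patch not validated against version {version}")
    # Carry over the original identity so logs and reprs stay recognisable.
    PatchedMooncakeConnector.__name__ = original.__name__
    PatchedMooncakeConnector.__qualname__ = original.__qualname__
    module.MooncakeConnector = PatchedMooncakeConnector
    return True


fake_module = types.SimpleNamespace(MooncakeConnector=MooncakeConnector)
first = apply_patch(fake_module, "0.16.0")   # performs the swap
second = apply_patch(fake_module, "0.16.0")  # detects the earlier swap
```

Warning on an unvalidated version rather than raising keeps the patch usable on newer releases, at the cost of possible silent breakage; a stricter deployment might prefer to raise.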

Test Plan

Automated Tests (all passing)

Unit tests (pytest tests/entrypoints/test_pd_disaggregation.py -v):

  • TestDetectPDSeparation (4 tests) — PD pair detection in 2/4-stage pipelines
  • TestValidatePDConfig (6 tests) — config validation: mismatched connector/role/buffer errors
  • TestGetPDConnectorInfo (3 tests) — engine_id and bootstrap_addr extraction
  • TestPreparePrefillSamplingParams (4 tests) — max_tokens=1, KV param injection, no mutation
  • TestPrefillStopNeutralization (4 tests) — stop=[], stop_token_ids=[], include_stop_str_in_output=False
  • TestSamplingParamsAutoDuplication (1 test) — auto-dup for 4-stage pipeline
  • TestNormalizeKVTransferParams (3 tests) — dict/None/dataclass conversion
  • TestKvCfgToDict (3 tests) — dict/None/dataclass with empty-dict default
  • TestPDRouting (3 tests) — prefill receives max_tokens=1, decode gets original prompt, correct KV flags
  • TestKVParamsCleanup (4 tests) — drop/pop/fallback lifecycle
  • TestTPSizeValidation (3 tests) — matching/mismatched/default TP size
  • TestPDYAMLConfig (1 test) — production YAML loads and validates
  • TestMooncakeConnectorPatch (4 tests) — subclass check, remote_request_id, stage payload flags

Stage input processor tests (pytest tests/model_executor/stage_input_processors/test_qwen3_omni_stage_processors.py -v):

  • TestMergePDEmbeddings (9 tests) — overlap, empty prefill/decode, missing keys, edge cases
  • TestGetPrefillStage (5 tests) — PD active/inactive, no outputs, wrong source
  • TestThinker2TalkerPDMode (8 tests) — PD merge, overlap, TTS fallback, graceful error
  • TestPDAudioPipelineIntegration (3 tests) — full PD audio chain, prompt context, non-PD fallback

E2E tests (require 3x GPUs + model):

  • test_pd_text_only — offline text generation through PD pipeline
  • test_pd_video_to_audio — offline video→audio through full 4-stage PD pipeline
  • test_pd_text_to_text — online text→text via OpenAI API
  • test_pd_mix_to_text_audio — online multimodal→text+audio via OpenAI API

Manual verification

  • Pre-commit checks pass (ruff check, ruff format, typos)
  • Unit tests pass: 42/42 in test_pd_disaggregation.py
  • Unit tests pass: 45/45 in test_qwen3_omni_stage_processors.py
  • All PR review comments addressed (22/25 fixed, 2 N/A, 1 deferred)

How to run E2E tests

# Offline (3x GPU, model downloaded)
CUDA_VISIBLE_DEVICES=0,1,2 python -m pytest tests/e2e/offline_inference/test_qwen3_omni_pd.py -v -s

# Online serving (3x GPU, model downloaded)
CUDA_VISIBLE_DEVICES=0,1,2 python -m pytest tests/e2e/online_serving/test_qwen3_omni_pd.py -v -s

# Unit tests (no GPU needed)
python -m pytest tests/entrypoints/test_pd_disaggregation.py -v
python -m pytest tests/model_executor/stage_input_processors/test_qwen3_omni_stage_processors.py -v

GPU Layout (default YAML, TP=1, 3 GPUs)

| GPU | Stage | Role |
| --- | --- | --- |
| 0 | Stage 0 | Thinker Prefill (KV producer) |
| 1 | Stage 1 | Thinker Decode (KV consumer) |
| 2 | Stage 2 + 3 | Talker + Code2Wav |
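
As a rough illustration of how a layout like this might be detected and validated, the sketch below expresses the four stages as plain dicts and checks the prefill/decode pair. The `is_prefill_only`, `is_decode_only`, and `kv_transfer_config` names come from this PR's discussion; `stage_id`, `devices`, `kv_connector`, `kv_role`, and both function names are assumptions for the example, and the real YAML schema may differ.

```python
from typing import Optional

# Hypothetical stage list mirroring the default 3-GPU layout.
STAGES = [
    {"stage_id": 0, "devices": [0], "is_prefill_only": True,
     "kv_transfer_config": {"kv_connector": "MooncakeConnector",
                            "kv_role": "kv_producer"}},
    {"stage_id": 1, "devices": [1], "is_decode_only": True,
     "kv_transfer_config": {"kv_connector": "MooncakeConnector",
                            "kv_role": "kv_consumer"}},
    {"stage_id": 2, "devices": [2]},  # talker
    {"stage_id": 3, "devices": [2]},  # code2wav shares GPU 2
]


def detect_pd_pair(stages: list[dict]) -> Optional[tuple[int, int]]:
    """Single pass: return (prefill_id, decode_id), or None if PD is off."""
    prefill_id = decode_id = None
    for stage in stages:
        if stage.get("is_prefill_only"):
            prefill_id = stage["stage_id"]
        elif stage.get("is_decode_only"):
            decode_id = stage["stage_id"]
    if prefill_id is None and decode_id is None:
        return None
    if prefill_id is None or decode_id is None:
        raise ValueError("PD needs both a prefill and a decode stage")
    return prefill_id, decode_id


def validate_pd_pair(stages: list[dict], pair: tuple[int, int]) -> None:
    """Reject mismatched connectors or roles between the two stages."""
    by_id = {s["stage_id"]: s for s in stages}
    p_cfg = by_id[pair[0]].get("kv_transfer_config") or {}
    d_cfg = by_id[pair[1]].get("kv_transfer_config") or {}
    if p_cfg.get("kv_connector") != d_cfg.get("kv_connector"):
        raise ValueError("prefill and decode must use the same connector")
    roles = (p_cfg.get("kv_role"), d_cfg.get("kv_role"))
    if roles != ("kv_producer", "kv_consumer"):
        raise ValueError("expected kv_producer -> kv_consumer roles")


pair = detect_pd_pair(STAGES)
validate_pd_pair(STAGES, pair)
```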

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4fb129bceb


Comment thread vllm_omni/entrypoints/omni.py Outdated
Comment thread vllm_omni/entrypoints/async_omni.py Outdated
@ahengljh ahengljh changed the title from "[Feature][WIP] Support Prefill-Decode disaggregation via vLLM KV transfer #1" to "[Feature][WIP] Support Prefill-Decode disaggregation via vLLM KV transfer" Feb 12, 2026

@lishunyang12 lishunyang12 left a comment

Collaborator

WIP design feedback -- the PD disaggregation idea is sound but the implementation has some structural issues worth sorting out before polish.

Comment thread vllm_omni/distributed/kv_transfer/monkey_patch.py
Comment thread vllm_omni/distributed/kv_transfer/monkey_patch.py
Comment thread vllm_omni/entrypoints/omni.py Outdated
Comment thread vllm_omni/entrypoints/omni.py Outdated
Comment thread vllm_omni/entrypoints/omni_llm.py Outdated
Comment thread vllm_omni/entrypoints/omni_llm.py Outdated
Comment thread vllm_omni/entrypoints/async_omni.py Outdated
Comment thread vllm_omni/model_executor/stage_input_processors/qwen3_omni.py Outdated
Comment thread vllm_omni/model_executor/models/qwen3_omni/qwen3_omni.py
Comment thread vllm_omni/worker/gpu_ar_model_runner.py Outdated
@ahengljh

Contributor Author

> WIP design feedback -- the PD disaggregation idea is sound but the implementation has some structural issues worth sorting out before polish.

Thank you for the comments; I'll work on them soon.

@hsliuustc0106

Collaborator

@vllm-omni-reviewer

@ahengljh ahengljh force-pushed the feat/pd-disaggregation branch from b315e6b to 606e7cf February 25, 2026 08:04
ahengljh added a commit to ahengljh/vllm-omni that referenced this pull request Feb 27, 2026
…iew comments

- Remove non-PD files: gpu_ar_model_runner.py (debug logging only),
  omni_ar_scheduler.py and omni_generation_scheduler.py (general compat
  shims, not PD-specific), pd_server_patch_guide.md (superseded by
  monkey_patch.py)
- Downgrade all KV-DIAG logging from WARNING to DEBUG (omni_llm.py,
  omni_stage.py)
- Strip verbose per-step/per-batch diagnostic scaffolding from
  omni_llm.py and omni_stage.py
- patched_mooncake_connector: call super().add_new_req() instead of
  skipping; use copy-and-restore pattern in group_kv_pull
- omni.py: refactor _detect_pd_separation to single-pass; deduplicate
  _kv_cfg_to_dict/_normalize_kv_transfer_params into _to_dict()
- async_omni.py: unify PD routing merge semantics with sync path
- qwen3_omni stage_input_processors: replace hardcoded "0"/"24" layer
  keys with named constants
- qwen3_omni model: document zero-padding safety for PD disaggregation
- omni_llm: add comment explaining why _flush_kv_connector_sends
  reaches into vLLM internals

PR scope reduced from 15 to 11 files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@hsliuustc0106 hsliuustc0106 left a comment

Collaborator

Summary

This PR implements Prefill-Decode (PD) disaggregation for vLLM-Omni. While the feature is architecturally sound, the implementation has several critical issues that need to be addressed.

Critical Issues:

  • Memory leak: _pd_kv_params_by_req never cleaned up on request failure
  • Silent failures in config parsing with empty dict fallbacks
  • Race conditions in state management despite locks
  • Fragile monkey-patching of vLLM internals
  • Hardcoded defaults (bootstrap port 25201) without documentation

Moderate Issues:

  • Complex state management spread across multiple dictionaries
  • Inconsistent error handling (some raise, some return None)
  • Missing validation for edge cases
  • No version compatibility checks for vLLM

Minor Issues:

  • Debug-level logging for important events
  • Large PR mixing feature + tests makes review difficult

Recommendation: Request changes - address memory leak and silent failures before merge.
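
One way to address the leak flagged above is to tie each entry's lifetime to a context manager, so the entry is removed whether generation succeeds or fails. The sketch below is an assumption-laden illustration, not the PR's implementation: `PDRequestState`, `track`, `pending`, and the `remote_engine_id` key are hypothetical names, and only the idea of a per-request dict guarded by a lock comes from the review.

```python
import threading
from contextlib import contextmanager
from typing import Any, Iterator


class PDRequestState:
    """Hypothetical holder for per-request KV transfer parameters."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._kv_params_by_req: dict[str, dict[str, Any]] = {}

    @contextmanager
    def track(self, req_id: str, kv_params: dict[str, Any]) -> Iterator[None]:
        """Register kv_params for req_id and always drop them on exit."""
        with self._lock:
            self._kv_params_by_req[req_id] = kv_params
        try:
            yield
        finally:
            # Runs on success and on failure, so entries cannot accumulate.
            with self._lock:
                self._kv_params_by_req.pop(req_id, None)

    def pending(self) -> int:
        """Number of requests whose KV params are still held."""
        with self._lock:
            return len(self._kv_params_by_req)


state = PDRequestState()
try:
    with state.track("req-1", {"remote_engine_id": "prefill-0"}):
        raise RuntimeError("simulated generation failure")
except RuntimeError:
    pass  # the entry for req-1 has already been removed by the finally block
```

Using `pop(req_id, None)` keeps the cleanup idempotent, so a second defensive cleanup elsewhere in the pipeline would not raise.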

Comment thread vllm_omni/entrypoints/omni.py
Comment thread vllm_omni/entrypoints/omni.py
Comment thread vllm_omni/entrypoints/omni.py Outdated
Comment thread vllm_omni/entrypoints/omni.py Outdated
Comment thread vllm_omni/distributed/kv_transfer/monkey_patch.py
Comment thread vllm_omni/entrypoints/omni.py
Comment thread vllm_omni/entrypoints/async_omni.py Outdated
Comment thread vllm_omni/model_executor/models/qwen3_omni/qwen3_omni.py Outdated
Comment thread tests/entrypoints/test_pd_disaggregation.py

@lishunyang12 lishunyang12 left a comment

Collaborator

Thanks for addressing the earlier feedback -- the single-pass detection rewrite, _to_dict dedup, logging downgrade, and super().add_new_req() call all look correct now. A few remaining items:

Comment thread vllm_omni/entrypoints/omni.py Outdated
Comment thread vllm_omni/entrypoints/omni_stage.py Outdated
Comment thread vllm_omni/distributed/kv_transfer/patched_mooncake_connector.py Outdated
Comment thread vllm_omni/model_executor/stage_input_processors/qwen3_omni.py Outdated
@ahengljh ahengljh force-pushed the feat/pd-disaggregation branch from bddce52 to df087f3 March 2, 2026 02:44
ahengljh added a commit to ahengljh/vllm-omni that referenced this pull request Mar 2, 2026
ahengljh added a commit to ahengljh/vllm-omni that referenced this pull request Mar 2, 2026
…ests, e2e

- Neutralize stop/stop_token_ids in prefill sampling params to ensure
  finish_reason='length' (prevents MooncakeConnector KV transfer cancel)
- Add _DEFAULT_MOONCAKE_BOOTSTRAP_PORT named constant
- Add tensor_parallel_size validation in PD config check
- Improve error messages with type info for kv_transfer_config parsing
- Add defense-in-depth cleanup of _pd_kv_params_by_req after generation
- Upgrade auto-duplication log to WARNING with suppression hint
- Downgrade per-request PD routing/trace logs from INFO to DEBUG
- Add vLLM version compatibility warning in monkey_patch.py
- Use dynamic __qualname__ from original MooncakeConnector
- Add padding threshold warning (512 tokens) in model zero-padding
- Add clarifying comments on threading model, merge order, save-patch-restore
- Add unit tests: stop neutralization, failure/leak cleanup, TP validation
- Add PD e2e tests for both text and audio modalities (offline + online)
- Add PD CI stage config with load_format: dummy

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ahengljh ahengljh force-pushed the feat/pd-disaggregation branch 2 times, most recently from 68e0f9e to 0483a26 March 2, 2026 06:44
@ahengljh ahengljh requested a review from hsliuustc0106 March 3, 2026 08:06
@ahengljh ahengljh force-pushed the feat/pd-disaggregation branch 6 times, most recently from a27763c to 2a8212d March 5, 2026 09:03
@ahengljh ahengljh changed the title from "[Feature][WIP] Support Prefill-Decode disaggregation via vLLM KV transfer" to "[Feature] Support Prefill-Decode disaggregation via vLLM KV transfer" Mar 5, 2026
@spencerr221

Contributor

The unit test cases for this feature, along with the end-to-end (e2e) test cases for all three modalities (text_only, text-to-audio, and video-to-audio), have been successfully executed on the offline staging environment (Yellow Zone) using machine 82.

Comment thread vllm_omni/distributed/kv_transfer/monkey_patch.py
Comment thread vllm_omni/entrypoints/async_omni.py Outdated
# multimodal_mask only selects audio/image/video token positions,
# which always lie within the prompt (prefill) portion where real
# embeddings exist.
target_len = thinker_result_ids.shape[-1]

Contributor

Can we have a unified workflow for other models in PD disaggregation?

Contributor Author

Yes, that's the direction we want to go. The PD orchestration logic in omni.py/async_omni.py is already model-agnostic — it only looks at is_prefill_only/is_decode_only flags and kv_transfer_config in the YAML.

The model-specific part is only in stage_input_processors (the embedding merge in _merge_pd_embeddings with layer keys "0" and "24"). For other models, they'd need their own stage_input_processor but can reuse the PD orchestration as-is.

We can extract a common PD embedding merge base with configurable layer keys to make it easier. Will track this as a follow-up.
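
A shared merge helper with configurable layer keys might look roughly like the sketch below. Everything here is an assumption made for illustration: plain nested lists stand in for tensors, `merge_pd_embeddings` and its `overlap` parameter are hypothetical, and the actual overlap handling in `_merge_pd_embeddings` may differ. Only the default keys "0" and "24" are taken from the comment above.

```python
from typing import Optional, Sequence

Rows = list[list[float]]  # one row of floats per token position


def merge_pd_embeddings(prefill: Optional[dict[str, Rows]],
                        decode: Optional[dict[str, Rows]],
                        layer_keys: Sequence[str] = ("0", "24"),
                        overlap: int = 0) -> dict[str, Rows]:
    """Concatenate prefill and decode rows per layer key along the sequence axis.

    overlap is the number of leading decode positions assumed to repeat the
    tail of the prefill sequence; those rows are dropped from the decode side.
    """
    prefill = prefill or {}
    decode = decode or {}
    merged: dict[str, Rows] = {}
    for key in layer_keys:
        p_rows = prefill.get(key, [])
        d_rows = decode.get(key, [])
        if not p_rows and not d_rows:
            continue  # key missing on both sides: nothing to merge
        skip = overlap if p_rows else 0  # no trimming without a prefill part
        merged[key] = list(p_rows) + list(d_rows[skip:])
    return merged


prefill_out = {"0": [[0.1], [0.2], [0.3]], "24": [[1.0], [2.0], [3.0]]}
decode_out = {"0": [[0.3], [0.4]], "24": [[3.0], [4.0]]}
merged = merge_pd_embeddings(prefill_out, decode_out, overlap=1)
```

A model with different hidden-state layers would pass its own `layer_keys` and leave the orchestration untouched.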

Collaborator

It's not reasonable to change model files to support PD.

@natureofnature

Contributor

@R2-Y PTAL

Comment thread vllm_omni/entrypoints/omni_stage.py
Comment thread vllm_omni/entrypoints/omni_stage.py
Comment thread vllm_omni/model_executor/stage_configs/qwen3_omni_moe_pd_multiconnector.yaml Outdated
ahengljh and others added 2 commits March 11, 2026 16:50
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
Move duplicated prefill→decode routing code from omni.py and
async_omni.py into PDDisaggregationMixin._prepare_pd_decode_routing()
in pd_utils.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 11, 2026
@hsliuustc0106

Collaborator

I agree with @lishunyang12: this PR should be split into several atomic PRs.

ahengljh and others added 2 commits March 13, 2026 09:56
When the talker generates a long output, the flattened codec codes
(seq_len * num_quantizers) can exceed the code2wav model's max_model_len,
causing a ValueError. Truncate to fit within the 65536 token limit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
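
The truncation described in the commit message above can be illustrated with a small sketch. The 65536 limit and the seq_len * num_quantizers flattened length come from the message; the function name, the frame-major layout, the choice to cut on a whole-frame boundary, and the value of 16 quantizers are assumptions for the example.

```python
def truncate_codec_codes(flat_codes: list[int], num_quantizers: int,
                         max_len: int = 65536) -> list[int]:
    """Trim flattened codec codes to at most max_len, on a frame boundary.

    Assumes a frame-major layout: each frame contributes num_quantizers
    consecutive codes, so the flat length is seq_len * num_quantizers.
    """
    if len(flat_codes) <= max_len:
        return flat_codes
    keep_frames = max_len // num_quantizers
    return flat_codes[: keep_frames * num_quantizers]


# 5000 frames x 16 quantizers = 80000 codes, which exceeds the limit.
codes = list(range(5000 * 16))
trimmed = truncate_codec_codes(codes, num_quantizers=16)
```

Cutting on a frame boundary avoids handing the downstream model a partial frame; when max_len is not a multiple of num_quantizers the result is slightly shorter than the limit.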
Remove qwen3_omni_moe_pd_multiconnector.yaml and restore the original
qwen3_omni_moe_pd_separation.yaml stage config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
@lishunyang12

Collaborator

PR 1: Core PD infra

  • pd_utils.py (detection, validation, KV normalization helpers)
  • kv_transfer/monkey_patch.py + __init__.py
  • omni_stage.py changes (stage worker PD support)
  • Corresponding unit tests (detection/validation subset of test_pd_disaggregation.py)

@ahengljh Hi, please follow these instructions to scope down this PR. Otherwise maintainers will not have the bandwidth to review it, and I don't think it can be integrated in the near future.

@lishunyang12

Collaborator

@ahengljh Another way to continue is to present your PR at our weekly meeting and get the maintainers in sync with your design ideas; see https://docs.google.com/document/d/1pdUBiS_7mdOUNDtdwy-9OUf7jsMWN324BIbps5olDME/edit?tab=t.0#heading=h.l9hdvzveucma.

@ahengljh

ahengljh commented Mar 13, 2026

Contributor Author

> @ahengljh Another way to continue is to present your PR at our weekly meeting and get the maintainers in sync with your design ideas; see https://docs.google.com/document/d/1pdUBiS_7mdOUNDtdwy-9OUf7jsMWN324BIbps5olDME/edit?tab=t.0#heading=h.l9hdvzveucma.

Thank you, Shunyang. I have actually presented this PR in an internal discussion with @hsliuustc0106, but as you suggested, we also believe that splitting this PR into smaller ones will be better for everyone, so I am working on it.

hsliuustc0106 pushed a commit that referenced this pull request Mar 16, 2026
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
ahengljh added a commit to ahengljh/vllm-omni that referenced this pull request Mar 16, 2026
Bring the split-2 branch back in line with vllm-project#1303 by pairing the Qwen model and stage-processor changes with the PD runtime wiring they depend on.

Includes the orchestrator routing changes in omni.py/async_omni.py, stage worker PD flags and KV-transfer restoration in omni_stage.py, the connector flush in omni_llm.py, and the unit-test package markers from the original branch.

Co-authored-by: spencerr221 <liubingyu62@gmail.com>
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
wtomin pushed a commit to wtomin/vllm-omni that referenced this pull request Mar 16, 2026
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request Mar 16, 2026
tangbinh pushed a commit to tangbinh/vllm-omni that referenced this pull request Mar 18, 2026
ahengljh added a commit to ahengljh/vllm-omni that referenced this pull request Mar 18, 2026
Carry only the remaining PD test coverage from vllm-project#1303 after split 1 and the corrected split 2 are accounted for.

This commit contains the PD entrypoint unit tests plus the offline/online Qwen e2e coverage and the CI-only PD stage config fixture.

Co-authored-by: spencerr221 <liubingyu62@gmail.com>
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
spencerr221 added a commit to spencerr221/vllm-omni that referenced this pull request Mar 18, 2026
…rator architecture

Adapts the Prefill-Decode (PD) disaggregation feature from PR vllm-project#1303
to the refactored single-process Orchestrator architecture introduced
in PR vllm-project#1908.

Key changes:
- engine/async_omni_engine.py: Add _detect_pd_config() which detects
  PD stage pairs, applies MooncakeConnector monkey patch, and extracts
  the bootstrap address; passes pd_config to Orchestrator
- engine/orchestrator.py: Add PD routing logic in _forward_to_next_stage;
  capture prefill KV params from outputs and inject into decode SP via
  _build_pd_decode_params(); clean up _pd_kv_params on request completion
- entrypoints/omni_base.py: Inherit PDDisaggregationMixin; add
  stage_configs property; call _init_pd_state() on init
- entrypoints/omni.py: Expand sampling params for PD before resolving;
  inject per-request prefill SP modifications
- entrypoints/async_omni.py: Same sampling param expansions for async path
- entrypoints/pd_utils.py: Replace stage_list -> stage_configs references
- model_executor/stage_input_processors/qwen3_omni.py: Add PD embedding
  merge in thinker2talker(); fix talker2code2wav() dimension slicing and
  add truncation guard for code2wav max prompt length
- model_executor/models/qwen3_omni/qwen3_omni.py: Safety zero-padding
  in _thinker_to_talker_prefill(); safety clamping in
  _get_talker_user_parts() for PD length mismatches
- New YAML configs: qwen3_omni_moe_pd_separation.yaml (production),
  qwen3_omni_pd_ci.yaml (CI with dummy weights)
- New tests: test_pd_disaggregation.py (adapted for new arch; old-arch
  integration tests marked xfail), test_qwen3_omni_stage_processors.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Gaohan123 Gaohan123 removed this from the v0.18.0 milestone Mar 20, 2026
yiliu30 pushed a commit to yiliu30/vllm-omni-fork that referenced this pull request Mar 20, 2026
…ng (vllm-project#1863)

Signed-off-by: Jinheng Li <ahengljh@gmail.com>

Signed-off-by: yiliu30 <yi4.liu@intel.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request May 28, 2026
…ng (vllm-project#1863)

Signed-off-by: Jinheng Li <ahengljh@gmail.com>
@Gaohan123 Gaohan123 removed the ready label to trigger buildkite CI label Jun 3, 2026
quyifei23 pushed a commit to quyifei23/vllm-omni that referenced this pull request Jun 6, 2026
…ng (vllm-project#1863)

Signed-off-by: Jinheng Li <ahengljh@gmail.com>