
Three-stage disaggregated deployment (Encoder + Transformer + Decoder) #947

Open
fuheaven wants to merge 451 commits into ModelTC:main from fuheaven:disagg

Conversation

@fuheaven
Contributor

Summary

This PR integrates the disaggregated deployment mode provided by Mooncake into the runner, giving LightX2V full three-stage disaggregated deployment: the inference pipeline can be split into three nodes, Encoder, Transformer, and Decoder. The VAE Decoder is deployed independently on the Decoder node. The Wan and Qwen model families are supported.

Features

  1. Disaggregated deployment integrates the Mooncake engine for efficient RDMA transfers; IO can reach the GPU's theoretical peak bandwidth.
  2. The text encoder integrates LightLLM optimizations, usable at the kernel level or the service level, improving performance by about 30%.
  3. Unlike the standalone disagg mode previously contributed by Mooncake, this work is integrated into the local runner; the Wan runner and Qwen runner are currently supported.
  4. In Mooncake's disagg mode, the stages run as separate threads of a single process; that tightly coupled producer-consumer model cannot handle high-concurrency scenarios. This PR decouples the stages into independent processes, so the three stages (encoder + transformer + decoder) can be deployed on different GPUs or different machines, improving throughput under high concurrency.

Disaggregated architecture

Via the disagg_mode configuration parameter, the inference pipeline is physically split into three independent services. Data flows through two Mooncake transfers: Phase1 (Encoder → Transformer) and Phase2 (Transformer → Decoder):

  • Encoder role (disagg_mode="encoder"):
    Loads only the Text Encoder, the Image Encoder (for I2V / I2I), and the VAE Encoder, skipping the DiT and the VAE Decoder.
    Runs feature extraction and delivers context, clip_encoder_out, vae_encoder_out, latent_shape, etc. to the Transformer node via Mooncake Phase1.
  • Transformer role (disagg_mode="transformer"):
    Loads only the DiT model, skipping the Encoders and the VAE Decoder (in three-stage mode, decoding is handled by the Decoder node).
    After startup it waits for Phase1 data; on receipt it verifies hashes, assembles the inputs, and runs denoising. If decoder_engine_rank is configured, it sends the denoised latents to the Decoder node via Mooncake Phase2 instead of running VAE decoding locally.
  • Decoder role (disagg_mode="decode"):
    Loads only the VAE Decoder, skipping the Text/Image Encoders and the DiT.
    After startup it waits to receive Phase2 data; once the latents arrive from the Transformer it runs VAE decoding and saves the output video/image. Task completion status and result files both land on the Decoder node.
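The Phase1 handoff described above (payload fields plus a hash check before assembling transformer inputs) can be sketched as follows. This is an illustrative sketch only: `make_payload`, `verify_payload`, the pickle serialization, and the example latent shape are assumptions for demonstration, not the Mooncake wire format or the actual PR code.

```python
import hashlib
import pickle

def make_payload(context, clip_encoder_out, vae_encoder_out, latent_shape):
    """Bundle Phase1 fields with a content hash (illustrative, not the Mooncake format)."""
    body = pickle.dumps({
        "context": context,
        "clip_encoder_out": clip_encoder_out,
        "vae_encoder_out": vae_encoder_out,
        "latent_shape": latent_shape,
    })
    return {"body": body, "sha256": hashlib.sha256(body).hexdigest()}

def verify_payload(payload):
    """Receiver side: recompute the hash before assembling transformer inputs."""
    if hashlib.sha256(payload["body"]).hexdigest() != payload["sha256"]:
        raise ValueError("Phase1 payload hash mismatch")
    return pickle.loads(payload["body"])

# Example latent shape is made up for illustration
payload = make_payload([0.1, 0.2], None, None, (16, 21, 60, 104))
data = verify_payload(payload)
```

A corrupted payload fails verification instead of silently producing wrong inputs, which is the consistency guarantee the hash check provides.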

helloyongyang and others added 30 commits December 4, 2025 04:44
Co-authored-by: Yang Yong (雍洋) <yongyang1030@163.com>
deploy: update deployment-related environments
1. Add MLU Dockerfile
2. Add a directory for storing Dockerfiles
Tidy VAReader & OmniVAReader
Tidy VARecorder & X264VARecorder
VARecorder with stream, use buffer stream
Tidy env WORKER_RANK, READER_RANK, RECORDER_RANK
Support voice type selection
chengtao-lv and others added 27 commits February 5, 2026 20:21
Add option: ulysses qkv_fusion

---------

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: wangshankun <wangshankun2011@hotmail.com>
# Intel Support for LightX2V

## Summary
This PR adds Intel support for LightX2V, enabling video generation and
image generation on Intel GPUs.

## End-to-End Performance
On PTL integrated GPUs (iGPUs), we have achieved native-level
performance leveraging the torch_sdpa kernel.
| Models            | Configuration                  | Time    |
|-------------------|--------------------------------|---------|
| Wan2.1-T2V-1.3B   | 33 frames, 480×848, 20 steps   | 197.80s |
| Z-image-turbo     | 16:9 ratio, 9 steps            | 57s     |

## Usage

### Environment Setup
Set the platform environment variable for Intel iGPUs (Windows):
```bash
set PLATFORM=intel_xpu
```
### Wan Models (Text-to-Video)

```python
"""
Wan2.1 text-to-video generation example.
This example demonstrates how to use LightX2V with Wan2.1 model for T2V generation.
"""

from lightx2v import LightX2VPipeline

# Initialize pipeline for Wan2.1 T2V task
pipe = LightX2VPipeline(
    model_path=r"xxx\models\Wan2.1-T2V-1.3B",
    model_cls="wan2.1",
    task="t2v",
)
# Option A: create the generator from a config JSON file
pipe.create_generator(
    config_json="../../configs/platforms/intel_xpu/wan_t2v_1_3.json"
)

# Option B: create the generator manually with explicit parameters (overrides Option A)
pipe.create_generator(
    attn_mode="torch_sdpa",
    infer_steps=50,
    height=480,  # Can be set to 720 for higher resolution
    width=832,  # Can be set to 1280 for higher resolution
    num_frames=33,
    guidance_scale=5.0,
    sample_shift=5.0,
)

seed = 42
prompt = "a cat"
negative_prompt = "镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
save_result_path = "./output.mp4"

pipe.generate(
    seed=seed,
    prompt=prompt,
    negative_prompt=negative_prompt,
    save_result_path=save_result_path,
)
```

### Z-image-turbo Models (Text-to-Image)
```python
"""
Z-Image text-to-image generation example.
This example demonstrates how to use LightX2V with the Z-Image-Turbo model
for T2I generation.
"""

from lightx2v import LightX2VPipeline

# Initialize pipeline for Z-Image-edit T2I task
pipe = LightX2VPipeline(
    model_path=r"xxxx\models\Z-Image-Turbo",
    model_cls="z_image",
    task="t2i",
)

# Alternative: create generator from config JSON file
pipe.create_generator(
    config_json="../../configs/platforms/intel_xpu/z_image_turbo_t2i.json"
)

# Create generator manually with specified parameters (overrides the config above)
pipe.create_generator(
    attn_mode="torch_sdpa",
    aspect_ratio="16:9",
    infer_steps=9,
    guidance_scale=1,
)

# Generation parameters
seed = 42
prompt = (
    'A coffee shop entrance features a chalkboard sign reading '
    '"Qwen Coffee 😊 $2 per cup," with a neon light beside it displaying '
    '"通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and '
    'beneath the poster is written '
    '"π≈3.1415926-53589793-23846264-33832795-02384197". '
    'Ultra HD, 4K, cinematic composition.'
)
negative_prompt = ""
save_result_path = "./output.png"

# Generate image
pipe.generate(
    seed=seed,
    prompt=prompt,
    negative_prompt=negative_prompt,
    save_result_path=save_result_path,
)
```

## Installation

### Prerequisites
For the Intel platform, install dependencies with the following commands:
```bash
pip install --no-cache-dir -r requirements_win.txt
pip install --no-cache-dir torch==2.9.1+xpu torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/xpu
pip install --no-cache-dir -e .
```
## Platform Detection
Verify Intel XPU availability with the following code:
```python
import torch

print(torch.xpu.is_available())  # True when an Intel XPU device is usable
```
---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gushiqiao <975033167>
Add canvas operation mode to service deployment

---------

Co-authored-by: qinxinyi <qxy118045534@163.com>
# Enable Disaggregation Feature

## Summary

This PR introduces a **disaggregation architecture** to LightX2V,
enabling distributed deployment of the video generation pipeline across
multiple devices or machines.

## What's New

### Core Functionality
- **Service Decoupling**: Separate encoder and transformer services that
can run independently
- **High-Performance Communication**: ZeroMQ and RDMA-based messaging
with Mooncake transfer engine
- **Flexible Deployment**: Support for single-machine multi-GPU and
cross-machine distributed setups

### New Components
- `lightx2v/disagg/`: Complete disaggregation package
  - `conn.py`: Data connection and management
  - `services/encoder.py`: Encoder service implementation
  - `services/transformer.py`: Transformer service implementation
  - `examples/`: Usage examples for WAN I2V and T2V models

## Key Benefits

1. **Resource Flexibility**: Distribute compute-intensive tasks across
multiple devices
2. **Scalability**: Easy horizontal scaling for production deployments
3. **Memory Efficiency**: Run large models on hardware-constrained
environments
4. **Service-Oriented**: Build microservice-based video generation
systems

## Usage Example

```shell
python3 lightx2v/disagg/examples/wan_t2v_service.py
```

See `lightx2v/disagg/examples/` for complete working examples.

## Backward Compatibility

✅ This is an **optional feature** that doesn't affect existing
functionality:
- Default mode preserves current behavior
- All existing APIs remain unchanged
- Users can opt-in to use disaggregation when needed

## Testing

- ✅ Tested with WAN I2V and T2V models
- ✅ Verified cross-device communication stability
- ✅ Validated accuracy matches single-machine mode

## Files Changed

- Added: `lightx2v/disagg/` package with all disaggregation modules
- Modified: None (purely additive)

## Future Enhancements

- Automatic service discovery
- Load balancing across multiple workers
- Enhanced monitoring and health checks

---

**Type**: Feature  
**Breaking Changes**: None  
**Documentation**: Included in `lightx2v/disagg/examples/`

---------

Co-authored-by: jasonzhang517 <yzhang298@e.ntu.edu.sg>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: helloyongyang <yongyang1030@163.com>
…l) support (ModelTC#918)

# Summary
Add FP8 and Flash Attention optimizations for lightx2v_intel_xpu,
enabling future expansion of Intel-optimized kernels.

# E2E Perf 
## Wan2.1-T2V-1.3B (33 frames, 480×848, 20 steps)

Configuration | Time | Speedup
-- | -- | --
Before PR (torch_sdpa) | 197s | 1.00x
After PR (sycl_kernels) | 170.55s | 1.13x

# Usage Example
```python

import time
from lightx2v import LightX2VPipeline


# Initialize pipeline for Wan2.1 T2V task
pipe = LightX2VPipeline(
    model_path=r"xxxx\Wan2.1-T2V-1.3B",
    model_cls="wan2.1",
    task="t2v",
)

pipe.create_generator(
    config_json=r"xxx\LightX2V\configs\platforms\intel_xpu\wan_t2v_1_3_xpu_flash_attn.json"
)



seed = 42
prompt = "a bird"
negative_prompt = "镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
save_result_path = "./output.mp4"

s = time.time()
pipe.generate(
    seed=seed,
    prompt=prompt,
    negative_prompt=negative_prompt,
    save_result_path=save_result_path,
)
e = time.time()

print("generate time:", e - s)
```

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: helloyongyang <yongyang1030@163.com>
…ofile) (ModelTC#909)

## Description

This PR replaces native PyTorch operators with TensorRT engines, delivering a significant performance boost for the VAE (Encoder / Decoder) of the Qwen Image models. For the two very different task types, T2I (fixed sizes) and I2I (variable sizes), it provides a two-pronged TRT acceleration scheme and refactors the underlying loading components.

### Key Features

1. **Unified low-level TensorRT VAE loader (`vae_trt.py`)**:
   - Uses a single `trt_engine_path` argument together with the `vae_type: "tensorrt"` config switch.
   - Robust **PyTorch fallback mechanism**: if environment detection fails, the engine file is missing, or pre-allocating GPU memory OOMs, inference automatically falls back to the native PyTorch VAE operators, keeping the inference path safe and robust.

2. **T2I: static-shape engines + lazy load**
   - Because T2I generates images at a small set of fixed aspect ratios, a separate static engine is pre-built per resolution, completely eliminating dynamic-shape overhead.
   - A **lazy-load** strategy loads an engine pair (~5 GB of GPU memory per pair) only on the first request at a given resolution; switching resolution automatically releases the old engines and loads the new ones. Compared with loading everything up front (~25 GB), this greatly reduces memory usage and fits end-to-end inference.

3. **I2I: multi-profile dynamic engine integration**
   - For uncontrolled, arbitrary input sizes, a single engine carries 9 classic opt shapes (including 512x512, 1024x1024, 720p, 1080p, etc.).
   - At inference time the closest profile is matched dynamically, so TensorRT allocates the best memory layout and kernel path.
   - The engines stay resident in GPU memory; Encoder + Decoder together take about 1.0-1.2 GB.

4. **Accompanying documentation (`QwenImageVAETensorRT.md`)**
   - Adds a configuration and best-practice guide for the VAE TRT optimization.
   - Includes benchmark data for both standalone tests and end-to-end service mode, plus a root-cause analysis of the performance differences.
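The dynamic profile matching described in point 3 above can be sketched as picking the opt shape closest to the input resolution. The profile names and the L1 distance metric below are illustrative assumptions, not the actual TensorRT selection logic:

```python
# Illustrative opt shapes (a subset of the 9 profiles; names are assumptions)
PROFILES = {
    "512x512": (512, 512),
    "1024x1024": (1024, 1024),
    "480p_16:9": (480, 848),
    "720p_16:9": (720, 1280),
    "1080p_16:9": (1080, 1920),
}

def nearest_profile(height, width):
    """Pick the profile whose (H, W) is closest to the input, by L1 distance."""
    return min(
        PROFILES,
        key=lambda name: abs(PROFILES[name][0] - height) + abs(PROFILES[name][1] - width),
    )
```

For example, a 1000x1000 input would land on the 1024x1024 profile, so TensorRT runs with a memory layout and kernel path tuned near the actual shape.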

---

## Performance Benchmark

Measured on a single NVIDIA H100 (80 GB).

### 1. T2I Static Shape — standalone VAE test

| Ratio | PT Enc (ms) | TRT Enc (ms) | Enc speedup | PT Dec (ms) | TRT Dec (ms) | Dec speedup |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 16:9 | 66.53 | **32.70** | **2.03x** | 103.65 | **49.66** | **2.09x** |
| 9:16 | 65.72 | **32.22** | **2.04x** | 103.02 | **50.71** | **2.03x** |
| 1:1 | 78.16 | **41.95** | **1.86x** | 121.91 | **61.52** | **1.98x** |
| 4:3 | 73.99 | **37.23** | **1.99x** | 114.45 | **54.75** | **2.09x** |
| 3:4 | 31.74 | **17.33** | **1.83x** | 50.77 | **26.86** | **1.89x** |

> **Encoder ~1.95x, Decoder ~2.02x**

### 2. T2I Static Shape — end-to-end service mode (Qwen-Image-2512, 5 steps, VAE Decoder)

> T2I has no VAE Encoder, so only the Decoder is measured.

| Ratio | PT Dec (ms) | TRT Dec (ms) | Dec speedup | First load (ms) |
| :---: | :---: | :---: | :---: | :---: |
| 16:9 | 189.3 | **88.4** | **2.14x** | 343.9 |
| 9:16 | 179.6 | **85.6** | **2.10x** | 226.4 |
| 1:1 | 157.6 | **106.2** | **1.48x** | 304.1 |
| 4:3 | 148.7 | **94.7** | **1.57x** | 238.0 |
| 3:4 | 70.4 | **46.1** | **1.53x** | 178.2 |

> **Decoder average ~1.8x**. "First load" is the one-time cost of lazy-loading when switching resolution; subsequent requests at the same resolution do not incur it.

### 3. I2I Multi-Profile — standalone VAE test (average of 10 runs)

**Encoder**:

| Resolution | PT Enc (ms) | TRT Enc (ms) | Speedup |
| :---: | :---: | :---: | :---: |
| 512x512 | 11.00 | **8.53** | **1.29x** |
| 1024x1024 | 42.85 | **27.56** | **1.55x** |
| 480p 16:9 | 17.25 | **12.00** | **1.44x** |
| 720p 16:9 | 38.00 | **25.35** | **1.50x** |
| 768p 4:3 | 31.98 | **21.76** | **1.47x** |

> **Encoder average ~1.45x**

**Decoder**:

| Resolution | PT Dec (ms) | TRT Dec (ms) | Speedup |
| :---: | :---: | :---: | :---: |
| 512x512 | 17.60 | **12.78** | **1.38x** |
| 1024x1024 | 68.16 | **44.93** | **1.52x** |
| 480p 16:9 | 27.67 | **18.85** | **1.47x** |
| 720p 16:9 | 60.24 | **40.80** | **1.48x** |
| 768p 4:3 | 51.14 | **34.92** | **1.46x** |

> **Decoder average ~1.46x; overall ~1.45x**

### 4. I2I Multi-Profile — end-to-end service mode (qwen-image-edit-251130, 4 steps)

| Resolution | PT Enc → TRT Enc | Enc speedup | PT Dec → TRT Dec | Dec speedup |
| :---: | :---: | :---: | :---: | :---: |
| 512x512 | 48.5 → **28.8** | **1.68x** | 138.4 → **134.0** | **1.03x** |
| 1024x1024 | 48.2 → **28.4** | **1.70x** | 152.7 → **133.3** | **1.15x** |
| 480p 16:9 | 48.7 → **29.6** | **1.64x** | 140.4 → **134.4** | **1.04x** |
| 720p 16:9 | 48.6 → **30.1** | **1.62x** | 139.0 → **134.2** | **1.04x** |
| 768p 4:3 | 49.2 → **29.8** | **1.65x** | 152.8 → **134.8** | **1.13x** |

> **Encoder ~1.66x, Decoder ~1.08x**
>
> The Decoder speedup is lower than in the standalone test because `postprocess(output_type="pil")` adds a roughly constant ~80-90 ms of CPU time (tensor → PIL conversion) that TRT cannot accelerate, which mathematically dilutes the ratio. For the speedup of the TRT engine kernels themselves, refer to the standalone test numbers.
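The dilution effect can be checked with a quick calculation. The numbers below reuse the standalone 512x512 decode row, and the constant CPU postprocess cost is taken as 85 ms, an assumption within the stated ~80-90 ms range:

```python
# Kernel-only decode times from the standalone 512x512 row (ms)
pt_kernel, trt_kernel = 17.60, 12.78
overhead = 85.0  # assumed constant tensor -> PIL postprocess cost (ms)

# A fixed cost added to both sides pulls the ratio toward 1.0
kernel_speedup = pt_kernel / trt_kernel
e2e_speedup = (pt_kernel + overhead) / (trt_kernel + overhead)

print(f"kernel-only: {kernel_speedup:.2f}x, with overhead: {e2e_speedup:.2f}x")
```

This gives a kernel-only speedup of ~1.38x shrinking to ~1.05x once the fixed overhead is included, close to the ~1.03x measured end to end.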

---

## Changes Made

- Refactored `lightx2v/models/video_encoders/trt/qwen_image/vae_trt.py`
  - Unified static / multi-profile loading logic
  - Implemented lazy load for T2I static engines (auto load/release per resolution)
  - PyTorch fallback mechanism
- Added T2I TRT config: `configs/qwen_image/qwen_image_t2i_2512_trt.json`
- Added I2I TRT config: `configs/qwen_image/qwen_image_i2i_2511_trt.json`
- Added shell scripts: `scripts/qwen_image/qwen_image_t2i_2512_trt.sh`, `scripts/qwen_image/qwen_image_i2i_2511_trt.sh`
- Added documentation: `examples/BeginnerGuide/ZH_CN/QwenImageVAETensorRT.md`, `examples/BeginnerGuide/EN/QwenImageVAETensorRT.md`
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a three-stage disaggregated deployment mode for large generative models (such as Wan and Qwen Image) in the LightX2V framework. By splitting the inference pipeline into independent Encoder, Transformer, and Decoder services, it significantly optimizes GPU memory usage, improves system throughput, and supports flexible deployment across devices or machines. Integrating the Mooncake transfer engine and LightLLM optimizations ensures efficient data transfer and better encoding performance, providing a more stable and scalable solution for high-resolution, long-duration generation scenarios.

Highlights

  • Three-stage disaggregated deployment: Introduces full Encoder / Transformer / Decoder disaggregation for the LightX2V inference pipeline, so the inference process can be split into independent services deployed on different GPUs or nodes.
  • Mooncake engine integration: Integrates the high-performance Mooncake transfer engine with RDMA/TCP support, enabling efficient data transfer between Encoder and Transformer (Phase1) and between Transformer and Decoder (Phase2).
  • Extended model support: Both the Wan and Qwen model families support full three-stage disaggregated deployment, including independent deployment of the VAE Decoder.
  • Performance optimizations: The Text Encoder integrates LightLLM optimizations (kernel or service), improving performance by up to 30%. Decoupling the stages into independent processes improves throughput under high concurrency.
  • Memory optimization: Disaggregated deployment significantly reduces per-node GPU memory usage, since each node loads only the model components it needs, particularly with the Decoder node handling VAE decoding independently.
  • Detailed deployment guide: Adds a detailed Chinese deployment guide covering configuration, service startup and request flow, and choosing between RDMA and TCP.


Changelog
  • configs/qwen_image/qwen_image_i2i_disagg_decode.json
    • Added new config for the Decoder role of Qwen Image I2I disaggregated deployment.
  • configs/qwen_image/qwen_image_i2i_disagg_encoder.json
    • Added new config for the Encoder role of Qwen Image I2I disaggregated deployment.
  • configs/qwen_image/qwen_image_i2i_disagg_transformer.json
    • Added new config for the Transformer role of Qwen Image I2I disaggregated deployment.
  • configs/qwen_image/qwen_image_t2i_disagg_decode.json
    • Added new config for the Decoder role of Qwen Image T2I disaggregated deployment.
  • configs/qwen_image/qwen_image_t2i_disagg_encoder.json
    • Added new config for the Encoder role of Qwen Image T2I disaggregated deployment.
  • configs/qwen_image/qwen_image_t2i_disagg_transformer.json
    • Added new config for the Transformer role of Qwen Image T2I disaggregated deployment.
  • configs/wan/wan_i2v_disagg_decode.json
    • Added new config for the Decoder role of Wan I2V disaggregated deployment.
  • configs/wan/wan_i2v_disagg_encoder.json
    • Added new config for the Encoder role of Wan I2V disaggregated deployment.
  • configs/wan/wan_i2v_disagg_transformer.json
    • Added new config for the Transformer role of Wan I2V disaggregated deployment.
  • configs/wan/wan_t2v_disagg_decode.json
    • Added new config for the Decoder role of Wan T2V disaggregated deployment.
  • configs/wan/wan_t2v_disagg_encoder.json
    • Added new config for the Encoder role of Wan T2V disaggregated deployment.
  • configs/wan/wan_t2v_disagg_transformer.json
    • Added new config for the Transformer role of Wan T2V disaggregated deployment.
  • examples/BeginnerGuide/ZH_CN/DisaggSplitDeploy.md
    • Added a new Chinese guide detailing disaggregated deployment of diffusion models.
  • lightx2v/disagg/disagg_mixin.py
    • Added a new mixin class implementing Mooncake-based distributed communication.
  • lightx2v/models/runners/base_runner.py
    • Updated init_scheduler to use NullScheduler in decode mode.
  • lightx2v/models/runners/default_runner.py
    • Updated init_modules and end_run to correctly handle models that may be None in disaggregated mode.
  • lightx2v/models/runners/qwen_image/qwen_image_runner.py
    • Integrated DisaggMixin; updated load_model to load role-specific components; modified init_modules and run_pipeline to support disaggregation logic; adjusted set_target_shape to recover shapes in transformer mode.
  • lightx2v/models/runners/wan/wan_runner.py
    • Integrated DisaggMixin; updated load_model to load role-specific components; added a _run_transformer_role method; modified run_pipeline to support disaggregation logic.
  • lightx2v/models/schedulers/scheduler.py
    • Added a NullScheduler class as a placeholder for disaggregated roles that need no scheduler.
  • scripts/base/base.sh
    • Added setup of the MOONCAKE_CONFIG_PATH environment variable.
  • scripts/server/disagg/qwen/post_qwen_i2i.py
    • Added a Python script that sends requests to the Qwen Image I2I three-stage disaggregated service.
  • scripts/server/disagg/qwen/post_qwen_t2i.py
    • Added a Python script that sends requests to the Qwen Image T2I three-stage disaggregated service.
  • scripts/server/disagg/qwen/start_qwen_i2i_disagg.sh
    • Added a shell script that starts the Qwen Image I2I three-stage disaggregated service.
  • scripts/server/disagg/qwen/start_qwen_t2i_disagg.sh
    • Added a shell script that starts the Qwen Image T2I three-stage disaggregated service.
  • scripts/server/disagg/wan/post_wan_i2v.py
    • Added a Python script that sends requests to the Wan I2V three-stage disaggregated service.
  • scripts/server/disagg/wan/post_wan_t2v.py
    • Added a Python script that sends requests to the Wan T2V three-stage disaggregated service.
  • scripts/server/disagg/wan/start_wan_i2v_disagg.sh
    • Added a shell script that starts the Wan I2V three-stage disaggregated service.
  • scripts/server/disagg/wan/start_wan_t2v_disagg.sh
    • Added a shell script that starts the Wan T2V three-stage disaggregated service.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR introduces a very important three-stage disaggregated deployment feature (Encoder + Transformer + Decoder). It is an excellent piece of engineering that can effectively optimize GPU memory usage and inference throughput for large generative models in distributed environments.

The implementation is comprehensive, covering low-level communication (based on Mooncake), the core logic (DisaggMixin), integration with the existing runners, and the configs, documentation, and test scripts on top. The overall design is well considered, for example:

  • DisaggMixin is used to share the disaggregation logic, keeping the code structure clean.
  • Each role (encoder, transformer, decode) loads only the models it needs, effectively reducing GPU memory usage.
  • Hash verification of transferred data guarantees data consistency.
  • Thorough Chinese documentation and ready-to-use startup and test scripts greatly lower the barrier to entry.

I found a few minor issues in the documentation and script comments and have left suggestions in the individual review comments, in the hope of making the feature even better. Overall, this is a high-quality contribution.

Comment on lines +358 to +366
```bash
python -m lightx2v.server \
--model_cls wan2.1 \
--task t2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/wan/wan_t2v_disagg_decode.json \
--host 0.0.0.0 \
--port 8004
```
Contributor

medium

Hi, this documentation is very detailed and very helpful for users to understand and use the disaggregated deployment feature.
The example commands in section 3.1 (manually starting the services) use the two environment variables $model_path and ${lightx2v_path}. Users reading this section directly may not know how to set them.
Consider adding a short note here reminding users to set these variables first, with a pointer to how they are defined in scripts/server/disagg/wan/start_wan_t2v_disagg.sh. For example:

> **Note**: the `$model_path` and `${lightx2v_path}` variables in the commands below must be set beforehand. `$lightx2v_path` should point to the project root directory, and `$model_path` to the directory containing the model files.

This would improve the usability of the documentation.

# GPU_T : Transformer (port 8005)
#
# Override GPUs via environment variables:
# GPU_ENCODER=4 GPU_TRANSFORMER=5 GPU_DECODER=6 ./start_wan_i2v_disagg_all.sh
Contributor

medium

In this script's comments, the example command references start_wan_i2v_disagg_all.sh, but the script's actual file name is start_wan_i2v_disagg.sh. This looks like a small typo.

Suggested change
# GPU_ENCODER=4 GPU_TRANSFORMER=5 GPU_DECODER=6 ./start_wan_i2v_disagg_all.sh
# GPU_ENCODER=4 GPU_TRANSFORMER=5 GPU_DECODER=6 ./start_wan_i2v_disagg.sh

# GPU_T : Transformer (port 8003)
#
# Override GPUs via environment variables:
# GPU_ENCODER=4 GPU_TRANSFORMER=5 GPU_DECODER=6 ./start_wan_t2v_disagg_all.sh
Contributor
medium

Similar to the other script, the comment's example command references start_wan_t2v_disagg_all.sh, but the script's actual file name is start_wan_t2v_disagg.sh. Please fix this typo.

Suggested change
# GPU_ENCODER=4 GPU_TRANSFORMER=5 GPU_DECODER=6 ./start_wan_t2v_disagg_all.sh
# GPU_ENCODER=4 GPU_TRANSFORMER=5 GPU_DECODER=6 ./start_wan_t2v_disagg.sh
