[New Model]: DreamX-World

### The model to consider.

- DreamX-World-5B: https://huggingface.co/GD-ML/DreamX-World-5B
- DreamX-World-5B-Cam: https://huggingface.co/GD-ML/DreamX-World-5B-Cam
- Upstream implementation: https://github.com/AMAP-ML/DreamX-World
- Project page: https://amap-ml.github.io/DreamX_World
- Technical report: https://arxiv.org/abs/2606.16993

### The closest model vllm-omni already supports.

The closest match is `WanPipeline` / `Wan22Pipeline`, in particular when used with `Wan-AI/Wan2.2-TI2V-5B-Diffusers`.

DreamX-World is a Wan2.2 TI2V-based world model. The Hugging Face metadata for `GD-ML/DreamX-World-5B` lists `architecture: ti2v`, `library: diffusers`, and `base_model: Wan-AI/Wan2.2-TI2V-5B`. The upstream inference instructions also require the Wan2.2 TI2V 5B base checkpoint plus DreamX transformer weights.

### What's your difficulty of supporting the model you want?

Likely integration points:

- DreamX-World-5B-Cam uses image + caption + camera/action controls, with inputs such as `image_path`, `caption`, `action_seq`, and `action_speed_list`.
- The camera/action command space includes `w/s/a/d/i/k/j/l` and composed commands such as `wj` or `dj`; vllm-omni may need request schema/API support for these controls.
- The checkpoint layout appears to combine a Wan2.2 TI2V base checkpoint with DreamX-specific transformer weights, so loading may not map directly to the existing pure Wan2.2 Diffusers path.
- The short-horizon model generates 704x1280 videos at either 121 frames (24 FPS) or 81 frames (16 FPS), roughly 5 seconds in both cases, and the upstream scripts support multi-GPU Ulysses/ring parallel settings.
- DreamX-World-5B is autoregressive and supports long-horizon generation up to 1 minute at 16 FPS, which may require a generation loop beyond the standard Wan2.2 pipeline.
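To make the request-schema point concrete, here is a minimal sketch of how the camera/action inputs could be validated before reaching the pipeline. The field names `image_path`, `caption`, `action_seq`, and `action_speed_list` and the key set `w/s/a/d/i/k/j/l` come from the upstream inputs listed above. The `CameraControlRequest` class is hypothetical and not an existing vllm-omni type. The rules that a composed command is a combination of distinct keys, that there is one speed per command, and that speeds are positive are assumptions to check against the upstream code.

```python
from dataclasses import dataclass

# Single-key commands listed in this issue; their semantics are defined upstream.
ALLOWED_KEYS = frozenset("wsadikjl")


@dataclass
class CameraControlRequest:
    """Hypothetical request payload; not an existing vllm-omni type."""

    image_path: str
    caption: str
    action_seq: list[str]
    action_speed_list: list[float]

    def __post_init__(self) -> None:
        if not self.action_seq:
            raise ValueError("action_seq must not be empty")
        for cmd in self.action_seq:
            # A command is one key ("w") or a composed command ("wj", "dj").
            if not cmd or set(cmd) - ALLOWED_KEYS:
                raise ValueError(f"unknown command: {cmd!r}")
            # Assumption: a composed command does not repeat a key.
            if len(set(cmd)) != len(cmd):
                raise ValueError(f"repeated key in command: {cmd!r}")
        # Assumption: one speed value per command.
        if len(self.action_speed_list) != len(self.action_seq):
            raise ValueError("action_speed_list must match action_seq in length")
        # Assumption: speeds are strictly positive.
        if any(speed <= 0 for speed in self.action_speed_list):
            raise ValueError("speeds must be positive")


request = CameraControlRequest(
    image_path="scene.png",
    caption="a quiet street at dusk",
    action_seq=["w", "wj", "dj"],
    action_speed_list=[1.0, 1.0, 0.5],
)
```

Rejecting malformed commands at the API boundary would keep control-specific error handling out of the diffusion pipeline itself.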

A useful first milestone might be supporting `DreamX-World-5B-Cam` for image-conditioned camera/action-controlled generation, then handling the autoregressive long-horizon `DreamX-World-5B` path separately.
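For the long-horizon path, one minute at 16 FPS is 60 × 16 = 960 frames, far more than a single 81-frame short-horizon clip. The sketch below only illustrates what a chunked generation loop would have to schedule. The `plan_chunks` helper, the 81-frame chunk size, and the one-frame overlap used as conditioning for the next chunk are all assumptions for illustration, not the upstream autoregressive scheme.

```python
def plan_chunks(
    total_frames: int, chunk_frames: int = 81, overlap: int = 1
) -> list[tuple[int, int]]:
    """Return half-open (start, end) frame windows covering total_frames.

    Hypothetical helper: each window after the first re-uses the last
    `overlap` frames of the previous window as conditioning context.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not 0 <= overlap < chunk_frames:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_frames")

    windows: list[tuple[int, int]] = []
    start = 0
    while True:
        # Clip the final window so it never runs past the requested length.
        end = min(start + chunk_frames, total_frames)
        windows.append((start, end))
        if end >= total_frames:
            break
        # The next window begins inside the current one by `overlap` frames.
        start = end - overlap
    return windows


# One minute at 16 FPS.
windows = plan_chunks(total_frames=60 * 16)
print(len(windows), windows[0], windows[-1])
```

Under these assumptions a one-minute request becomes 12 sequential passes, so the serving layer would need to keep per-request state across passes instead of treating each request as a single pipeline call.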

### Use case and motivation

DreamX-World is a general-purpose interactive world model for controllable world simulation. Supporting it in vllm-omni would let users serve Wan2.2-derived world generation workloads with vllm-omni's existing video-generation stack and optimizations.

The main use cases are:

- image-to-video world exploration with explicit camera/action control;
- interactive scene navigation and transformation from prompts;
- long-horizon world generation for simulation, game-like environments, and robotics/world-model research.

### Existing issue search

I searched existing open and closed issues in `vllm-project/vllm-omni` for `DreamX`, `DreamX-World`, and `AMAP-ML`, and did not find a duplicate.
