[New Model]: DreamX-World #4570

@hsliuustc0106

The model to consider.

The closest model vllm-omni already supports.

WanPipeline / Wan22Pipeline, especially Wan-AI/Wan2.2-TI2V-5B-Diffusers.

DreamX-World is a Wan2.2 TI2V-based world model. The Hugging Face metadata for GD-ML/DreamX-World-5B lists architecture: ti2v, library: diffusers, and base_model: Wan-AI/Wan2.2-TI2V-5B. The upstream inference instructions also require the Wan2.2 TI2V 5B base checkpoint plus DreamX transformer weights.

What's your difficulty of supporting the model you want?

Likely integration points:

  • DreamX-World-5B-Cam uses image + caption + camera/action controls, with inputs such as image_path, caption, action_seq, and action_speed_list.
  • The camera/action command space includes w/s/a/d/i/k/j/l and composed commands such as wj or dj; vllm-omni may need request schema/API support for these controls.
  • The checkpoint layout appears to combine a Wan2.2 TI2V base checkpoint with DreamX-specific transformer weights, so loading may not map directly to the existing pure Wan2.2 Diffusers path.
  • The short-horizon model generates 704x1280 videos as either 121 frames at 24 FPS or 81 frames at 16 FPS (about 5 seconds in both cases), and the upstream scripts support multi-GPU Ulysses/ring parallel settings.
  • DreamX-World-5B is autoregressive and supports long-horizon generation up to 1 minute at 16 FPS, which may require a generation loop beyond the standard Wan2.2 pipeline.
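
To make the control inputs above concrete, here is a minimal sketch of how a camera/action request could be validated before it reaches the pipeline. The field names image_path, caption, action_seq, and action_speed_list come from the upstream inputs listed above. Everything else is an assumption for illustration: the `CameraActionRequest` class is hypothetical and not an existing vllm-omni schema, a composed command is assumed to be any combination of distinct base keys, and one speed value per command is assumed.

```python
from dataclasses import dataclass

# Base camera/action keys named in this issue; their semantics are defined upstream.
BASE_COMMANDS = frozenset("wsadikjl")


def is_valid_command(command: str) -> bool:
    """Assumed rule: a command is one or more distinct base keys, e.g. "w" or "wj"."""
    return (
        len(command) > 0
        and len(set(command)) == len(command)
        and set(command) <= BASE_COMMANDS
    )


@dataclass(frozen=True)
class CameraActionRequest:
    """Hypothetical request shape; not an existing vllm-omni schema."""

    image_path: str
    caption: str
    action_seq: list[str]
    action_speed_list: list[float]

    def __post_init__(self) -> None:
        if not self.action_seq:
            raise ValueError("action_seq must contain at least one command")
        bad = [c for c in self.action_seq if not is_valid_command(c)]
        if bad:
            raise ValueError(f"unsupported commands: {bad}")
        # Assumption: one speed value per command.
        if len(self.action_speed_list) != len(self.action_seq):
            raise ValueError("action_speed_list must match action_seq in length")


# Example request using a single command and two composed commands.
request = CameraActionRequest(
    image_path="scene.png",
    caption="a quiet street at dusk",
    action_seq=["w", "wj", "dj"],
    action_speed_list=[1.0, 0.5, 0.5],
)
```

Rejecting malformed controls at the request boundary keeps the error close to the caller instead of surfacing later inside the diffusion loop. The real rules for composing commands should be taken from the upstream DreamX-World scripts.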

A useful first milestone might be supporting DreamX-World-5B-Cam for image-conditioned camera/action-controlled generation, then handling the autoregressive long-horizon DreamX-World-5B path separately.
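
For the long-horizon path, the sketch below shows the kind of generation loop that may be needed, plus the frame arithmetic behind the numbers above. It is an illustration under stated assumptions, not the upstream algorithm: `rollout` and `generate_chunk` are hypothetical names, each chunk is assumed to be conditioned only on the last frame produced so far, the first frame of each chunk is assumed to repeat that conditioning frame, and the 81-frame chunk size is borrowed from the short-horizon setting. Frames are plain integers so the loop can run without a model.

```python
from typing import Callable, List


def clip_seconds(num_frames: int, fps: int) -> float:
    """Duration of a clip in seconds."""
    return num_frames / fps


def rollout(
    first_frame: int,
    total_frames: int,
    chunk_frames: int,
    generate_chunk: Callable[[int, int], List[int]],
) -> List[int]:
    """Extend a video chunk by chunk until total_frames is reached.

    Assumption: generate_chunk(last_frame, n) returns n frames whose first
    frame repeats the conditioning frame, so that frame is skipped.
    """
    if chunk_frames < 2:
        raise ValueError("chunk_frames must be at least 2 to make progress")
    frames = [first_frame]
    while len(frames) < total_frames:
        chunk = generate_chunk(frames[-1], chunk_frames)
        frames.extend(chunk[1:])
    return frames[:total_frames]


def fake_chunk(last_frame: int, n: int) -> List[int]:
    """Stand-in for the model: frame k is just the integer k."""
    return [last_frame + i for i in range(n)]


# One minute at 16 FPS is 60 * 16 = 960 frames.
target_frames = 60 * 16
video = rollout(0, target_frames, 81, fake_chunk)

short_24 = clip_seconds(121, 24)  # just over 5 seconds
short_16 = clip_seconds(81, 16)   # just over 5 seconds
```

With these assumptions each chunk contributes 80 new frames, so a one-minute video needs 12 chunk calls. How the upstream model actually carries context between chunks would need to be confirmed against its inference scripts.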

Use case and motivation

DreamX-World is a general-purpose interactive world model for controllable world simulation. Supporting it in vllm-omni would let users serve Wan2.2-derived world generation workloads with vllm-omni's existing video-generation stack and optimizations.

The main use cases are:

  • image-to-video world exploration with explicit camera/action control;
  • interactive scene navigation and transformation from prompts;
  • long-horizon world generation for simulation, game-like environments, and robotics/world-model research.

Existing issue search

I searched existing open and closed issues in vllm-project/vllm-omni for DreamX, DreamX-World, and AMAP-ML, and did not find a duplicate.

Labels

diffusion, help wanted, high priority, new model, world model
