The model to consider.
The closest model vllm-omni already supports.
WanPipeline / Wan22Pipeline, especially Wan-AI/Wan2.2-TI2V-5B-Diffusers.
DreamX-World is a Wan2.2 TI2V-based world model. The Hugging Face metadata for GD-ML/DreamX-World-5B lists architecture: ti2v, library: diffusers, and base_model: Wan-AI/Wan2.2-TI2V-5B. The upstream inference instructions also require the Wan2.2 TI2V 5B base checkpoint plus DreamX transformer weights.
What's your difficulty of supporting the model you want?
Likely integration points:
- DreamX-World-5B-Cam uses image + caption + camera/action controls, with inputs such as image_path, caption, action_seq, and action_speed_list.
- The camera/action command space includes w/s/a/d/i/k/j/l and composed commands such as wj or dj; vllm-omni may need request schema/API support for these controls.
- The checkpoint layout appears to combine a Wan2.2 TI2V base checkpoint with DreamX-specific transformer weights, so loading may not map directly to the existing pure Wan2.2 Diffusers path.
- The short-horizon model generates 704x1280 videos at 121 frames at 24 FPS or 81 frames at 16 FPS (about 5 seconds in either case), and the upstream scripts support multi-GPU Ulysses/ring parallel settings.
- DreamX-World-5B is autoregressive and supports long-horizon generation up to 1 minute at 16 FPS, which may require a generation loop beyond the standard Wan2.2 pipeline.
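To make the request-schema point concrete, here is a minimal sketch of how the camera/action inputs could be validated before they reach the pipeline. `CameraControlRequest` and `is_valid_command` are hypothetical names, not existing vllm-omni or DreamX-World APIs. The sketch assumes that a composed command is two distinct keys from the listed alphabet, that speeds are floats, and that there is one speed per action; the upstream rules may differ.

```python
from dataclasses import dataclass

# Single-key commands listed in the issue text above.
SINGLE_KEYS = frozenset("wsadikjl")


def is_valid_command(cmd: str) -> bool:
    """Accept a single key or a composition of two distinct keys (e.g. "wj").

    Assumption: composition means two distinct keys from the same alphabet.
    The upstream implementation may restrict which pairs are allowed.
    """
    return (
        1 <= len(cmd) <= 2
        and all(ch in SINGLE_KEYS for ch in cmd)
        and len(set(cmd)) == len(cmd)
    )


@dataclass
class CameraControlRequest:
    """Hypothetical request payload; not an existing vllm-omni type."""

    image_path: str
    caption: str
    action_seq: list[str]
    action_speed_list: list[float]

    def __post_init__(self) -> None:
        # Reject commands outside the documented key set early.
        bad = [c for c in self.action_seq if not is_valid_command(c)]
        if bad:
            raise ValueError(f"unsupported action commands: {bad}")
        # Assumption: exactly one speed value per action command.
        if len(self.action_speed_list) != len(self.action_seq):
            raise ValueError("action_speed_list must have one speed per action")


request = CameraControlRequest(
    image_path="scene.png",
    caption="a quiet street at dusk",
    action_seq=["w", "wj", "dj"],
    action_speed_list=[1.0, 1.0, 0.5],
)
```

Validating at the request boundary would keep malformed control sequences out of the generation loop, whatever the final API shape turns out to be.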
A useful first milestone might be supporting DreamX-World-5B-Cam for image-conditioned camera/action-controlled generation, then handling the autoregressive long-horizon DreamX-World-5B path separately.
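For sizing the long-horizon path, a rough back-of-the-envelope sketch may help. The issue does not state how many frames DreamX-World-5B emits per autoregressive step or whether consecutive steps overlap, so `frames_per_chunk` and `overlap` below are illustrative parameters only; the value 81 is borrowed from the short-horizon setting purely as an example.

```python
import math


def clip_seconds(num_frames: int, fps: int) -> float:
    """Duration of a clip in seconds."""
    return num_frames / fps


def frames_for(duration_s: float, fps: int) -> int:
    """Frames needed to cover a duration at a given frame rate."""
    return math.ceil(duration_s * fps)


def chunks_needed(total_frames: int, frames_per_chunk: int, overlap: int = 0) -> int:
    """Autoregressive steps needed if each step emits `frames_per_chunk` frames
    and reuses `overlap` frames of the previous step as context.

    Both parameters are assumptions for illustration, not upstream values.
    """
    if total_frames <= frames_per_chunk:
        return 1
    stride = frames_per_chunk - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than frames_per_chunk")
    return 1 + math.ceil((total_frames - frames_per_chunk) / stride)


short_24 = clip_seconds(121, 24)        # short-horizon clip at 24 FPS
short_16 = clip_seconds(81, 16)         # short-horizon clip at 16 FPS
long_frames = frames_for(60, 16)        # one minute at 16 FPS
steps = chunks_needed(long_frames, 81)  # hypothetical 81-frame chunks, no overlap
```

Under these assumed numbers, a one-minute video is 960 frames and would take 12 chunked steps, which suggests that the long-horizon path needs an outer loop around a Wan2.2-style single-shot pipeline rather than a single larger call.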
Use case and motivation
DreamX-World is a general-purpose interactive world model for controllable world simulation. Supporting it in vllm-omni would let users serve Wan2.2-derived world generation workloads with vllm-omni's existing video-generation stack and optimizations.
The main use cases are:
- image-to-video world exploration with explicit camera/action control;
- interactive scene navigation and transformation from prompts;
- long-horizon world generation for simulation, game-like environments, and robotics/world-model research.
Existing issue search
I searched existing open and closed issues in vllm-project/vllm-omni for DreamX, DreamX-World, and AMAP-ML, and did not find a duplicate.