Dianbing Xi1,2,*, Jiepeng Wang2,*,‡, Yuanzhi Liang2, Xi Qiu2, Yuchi Huo1, Rui Wang1,†, Chi Zhang2,†, Xuelong Li2,†
*Equal contribution. †Corresponding author. ‡Project leader.
1State Key Laboratory of CAD&CG, Zhejiang University
2Institute of Artificial Intelligence, China Telecom (TeleAI)
📄 Paper · 🌐 Project Page · 🤗 ModelScope
OmniVDiff enables controllable video generation and understanding in a unified video diffusion framework.
- Create a conda environment named `ovdiff`:

  ```bash
  conda create -n ovdiff python=3.10.9
  conda activate ovdiff
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Install our modified version of the `diffusers` library. Navigate to the `diffusers` directory and run (a quick import check is sketched after this list):

  ```bash
  pip install -e .
  ```
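To confirm that Python picks up the modified, editable `diffusers` install rather than a previously installed PyPI version, a minimal sanity check like the one below can help; it only prints the resolved version and install path.

```python
# Quick sanity check: verify the editable (modified) diffusers install is the
# one being imported. The printed path should point into this repository's
# diffusers/ directory, not into site-packages.
import diffusers

print("diffusers version:", diffusers.__version__)
print("loaded from:", diffusers.__file__)
```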
The OmniVDiff model weights are available on the ModelScope Hub.
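If you prefer to fetch the weights programmatically, the `modelscope` SDK's `snapshot_download` can be used as in the minimal sketch below. Note that the repository ID `TeleAI/OmniVDiff` and the `./checkpoints` cache directory are placeholders; substitute the actual model ID from the ModelScope page.

```python
# Minimal sketch: download the OmniVDiff weights from the ModelScope Hub.
# NOTE: "TeleAI/OmniVDiff" is a placeholder model ID; replace it with the
# repository ID listed on the project's ModelScope page.
from modelscope import snapshot_download

model_dir = snapshot_download("TeleAI/OmniVDiff", cache_dir="./checkpoints")
print(f"Weights downloaded to: {model_dir}")
```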
- Navigate to the `inference` directory:

  ```bash
  cd inference
  ```

- Run batch inference:

  ```bash
  python batch_infer.py
  ```

- To choose the conditioning modality, pass `--idx_cond_modality` (a small Python wrapper that runs the same sweep is sketched after this list):

  ```bash
  # -1: no condition, 0: rgb, 1: depth, 2: canny, 3: segment
  python batch_infer.py --idx_cond_modality -1 --output_dir "./output_cond=-1"
  python batch_infer.py --idx_cond_modality 0 --output_dir "./output_cond=0"
  python batch_infer.py --idx_cond_modality 1 --output_dir "./output_cond=1"
  python batch_infer.py --idx_cond_modality 2 --output_dir "./output_cond=2"
  python batch_infer.py --idx_cond_modality 3 --output_dir "./output_cond=3"
  ```
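If you would rather drive the sweep from Python (e.g., to add logging or skip modalities), a minimal wrapper around the same commands could look like the sketch below. It uses only the flags shown above and assumes it is run from the `inference` directory.

```python
# Convenience sketch: sweep batch inference over all conditioning modalities.
# Uses only the flags shown above (--idx_cond_modality, --output_dir); run it
# from the inference/ directory.
import subprocess

MODALITIES = {-1: "no condition", 0: "rgb", 1: "depth", 2: "canny", 3: "segment"}

for idx, name in MODALITIES.items():
    print(f"Running batch inference with condition modality {idx} ({name})")
    subprocess.run(
        [
            "python", "batch_infer.py",
            "--idx_cond_modality", str(idx),
            "--output_dir", f"./output_cond={idx}",
        ],
        check=True,
    )
```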
We provide an example configuration for training on 2 GPUs with `batch_size=1`.
You can modify the configuration file (`.yaml`) to adjust the number of GPUs for different hardware setups.
- Navigate to the `finetune` directory:

  ```bash
  cd finetune
  ```

- Enable cached latents before training. Before starting the actual training, enable the following option in `train.sh` to use cached latents:

  ```bash
  --check_cache "true"
  ```

  This will generate and store latent representations for faster training:

  ```bash
  bash train.sh
  ```

- Disable the option and start training. After the latent cache has been prepared, disable the option (set it to `"false"` or comment it out) and begin training:

  ```bash
  bash train.sh
  ```

We sincerely thank the developers of the open-source repositories our work builds on; their contributions have been invaluable to our research.
If you find our work helpful in your research, please consider citing it using the BibTeX entries below.
```bibtex
@article{xdb2025OmniVDiff,
  author  = {Xi, Dianbing and Wang, Jiepeng and Liang, Yuanzhi and Qiu, Xi and Huo, Yuchi and Wang, Rui and Zhang, Chi and Li, Xuelong},
  title   = {OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding},
  journal = {arXiv preprint arXiv:2504.10825},
  year    = {2025},
}

@misc{xdb2025CtrlVDiff,
  title         = {CtrlVDiff: Controllable Video Generation via Unified Multimodal Video Diffusion},
  author        = {Dianbing Xi and Jiepeng Wang and Yuanzhi Liang and Xi Qiu and Jialun Liu and Hao Pan and Yuchi Huo and Rui Wang and Haibin Huang and Chi Zhang and Xuelong Li},
  year          = {2025},
  eprint        = {2511.21129},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2511.21129},
}
```