This repository describes the data preprocessing pipeline used in the Helios paper. We also provide toy training data here.
```bash
# Activate conda environment
conda activate helios
```

To train your own video generation model, create JSON files following this format:
```json
[
    {
        "cut": [0, 81],
        "crop": [0, 832, 0, 480],
        "fps": 24.0,
        "num_frames": 81,
        "resolution": {
            "height": 480,
            "width": 832
        },
        "cap": [
            "A stunning mid-afternoon ..."
        ],
        "path": "videos/2_240_ori81.mp4"
    },
    {
        "cut": [0, 81],
        ...
    }
    ...
]
```
and arrange video files following this structure:
```
📦 example/
├── 📂 toy_data/
│   ├── 📂 videos
│   │   ├── 2_240_ori81.mp4
│   │   ├── 239_120_ori129.mp4
│   │   └── ...
│   └── 📄 toy_data_1.json
│
├── 📂 toy_data_2/
│   ├── A.mp4
│   ├── B.mp4
│   ├── ...
│   └── 📄 toy_data_2.json
...
```
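Before preprocessing, it can help to sanity-check the JSON annotations against the video files on disk. The sketch below is a minimal validator, not part of the Helios codebase; it assumes `cut` is a `[start, end)` frame range (so its length equals `num_frames`), `crop` is `[left, right, top, bottom]` in pixels, and `path` is relative to the directory containing the JSON file — all inferred from the example above.

```python
import json
import os

def validate_entry(entry, data_root):
    """Return a list of consistency errors for one metadata entry (empty if OK)."""
    errors = []
    start_frame, end_frame = entry["cut"]
    # Assumption: "cut" is a [start, end) frame range, so its length
    # should equal "num_frames".
    if entry["num_frames"] != end_frame - start_frame:
        errors.append("num_frames does not match the cut range")
    # Assumption: "crop" is [left, right, top, bottom] in pixels, so the
    # cropped size should equal the stated resolution.
    left, right, top, bottom = entry["crop"]
    res = entry["resolution"]
    if (right - left, bottom - top) != (res["width"], res["height"]):
        errors.append("crop size does not match resolution")
    # Assumption: "path" is relative to the directory holding the JSON file.
    if not os.path.isfile(os.path.join(data_root, entry["path"])):
        errors.append("missing video file: " + entry["path"])
    return errors

def validate_metadata(json_path):
    """Validate every entry in a metadata JSON file such as toy_data_1.json."""
    data_root = os.path.dirname(os.path.abspath(json_path))
    with open(json_path) as f:
        entries = json.load(f)
    return {entry["path"]: validate_entry(entry, data_root) for entry in entries}
```

For example, `validate_metadata("example/toy_data/toy_data_1.json")` returns a mapping from each video path to its list of errors, which should be empty for a well-formed dataset.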
These data can be used for training Stage-1, Stage-2, and Stage-3.
```bash
# Remember to modify the input and output paths before running
bash get_short-latents.sh
```

These data can only be used for training Stage-3.
```bash
# Remember to modify the input and output paths before running
bash get_ode-pairs.sh
```

If you want to use the Self-Forcing training approach, prepare the text embeddings:
```bash
# Remember to modify the input and output paths before running
bash get_text-embedding.sh
```

- This project wouldn't be possible without the following open-source repositories: OpenSora Plan, OpenSora, Video-Dataset-Scripts