Cong Wang1*,
Zexuan Deng1*,
Zhiwei Jiang1†,
Fei Shen2,
Yafeng Yin1,
Shiwei Gan1,
Zifeng Cheng1,
Shiping Ge1,
Qing Gu1
1 Nanjing University,
2 National University of Singapore
* Equal contribution.
† Corresponding author.
[arXiv]
Note: In the preview GIFs, the left side shows the generated result and the right side shows the original video.
.
├── configs/ # Configuration files
├── metrics/ # Evaluation metrics
├── models/ # Model architectures
├── pipelines/ # Data processing pipelines
├── scripts/ # Utility scripts
├── signdatasets/ # Dataset handling
├── train.sh # Training script
├── train_stage_1.py # Stage 1 training (single frame)
├── train_stage_2.py # Stage 1 training (Temporal-Attention Layer)
├── train_compress_vq_multicond.py # Stage 2 training
├── train_multihead_t2vqpgpt.py # Stage 3 training
└── utils.py # Utility functions
The system is trained in three main stages:

1. Stage 1: single-frame training followed by Temporal-Attention Layer training
   - Files: `train_stage_1.py`, `train_stage_2.py`
2. Stage 2: compressed and quantized multi-condition (VQ) tokenization training
   - File: `train_compress_vq_multicond.py`
3. Stage 3: multi-head text-to-VQ token translator (GPT) training
   - File: `train_multihead_t2vqpgpt.py`
- Install dependencies: `pip install -r requirements.txt`
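A full setup might look like the sketch below; the conda environment name, Python version, and the interactive `accelerate config` step are illustrative assumptions rather than requirements stated by the repository (the training commands below expect `accelerate_config.yaml`, and Stage 3 uses `accelerate_config_bf16.yaml`).

```bash
# Illustrative setup; environment name and Python version are assumptions.
conda create -n signvip python=3.10 -y
conda activate signvip

# Install the pinned dependencies.
pip install -r requirements.txt

# Create the accelerate config referenced by the training commands below
# (repeat with accelerate_config_bf16.yaml for the Stage 3 bf16 run).
accelerate config --config_file accelerate_config.yaml
```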
Training can be launched with the provided `train.sh` script or by running the stage-specific commands below:
# Stage 1: Single frame training
accelerate launch \
--config_file accelerate_config.yaml \
--num_processes 2 \
--gpu_ids "0,1" \
train_stage_1.py \
--config "configs/stage1/stage_1_multicond_RWTH.yaml"
# Stage 1: Temporal-Attention Layer training
accelerate launch \
--config_file accelerate_config.yaml \
--num_processes 2 \
--gpu_ids "0,1" \
train_stage_2.py \
--config "configs/stage2/stage_2_RWTH.yaml"
accelerate launch \
--config_file accelerate_config.yaml \
--num_processes 2 \
--gpu_ids "0,1" \
train_compress_vq_multicond.py \
--config "configs/vq/vq_multicond_RWTH_compress.yaml"
accelerate launch \
--config_file accelerate_config_bf16.yaml \
--num_processes 2 \
--gpu_ids "0,1" \
train_multihead_t2vqpgpt.py \
--config "configs/gpt/multihead_t2vqpgpt_RWTH.yaml"
To process the dataset and generate VQ tokens:
# Process train dataset
python get_compress_vq_pose_latent.py \
--config /path/to/config_train.yaml \
--output_dir /path/to/output/train_processed_videos/
# Process validation dataset
python get_compress_vq_pose_latent.py \
--config /path/to/config_val.yaml \
--output_dir /path/to/output/val_processed_videos/
# Process test dataset
python get_compress_vq_pose_latent.py \
--config /path/to/config_test.yaml \
--output_dir /path/to/output/test_processed_videos/
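Since the three splits differ only in the config file and output directory, they can also be processed back to back with a small loop (the `/path/to/...` values remain placeholders, as above):

```bash
# Process train/val/test in sequence; replace the placeholder paths.
for split in train val test; do
  python get_compress_vq_pose_latent.py \
    --config "/path/to/config_${split}.yaml" \
    --output_dir "/path/to/output/${split}_processed_videos/"
done
```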
To generate videos from the processed data:
python eval_compress_vq_video.py \
--config /path/to/config_test.yaml \
--input_dir /path/to/test_processed_videos \
--video_base_path /path/to/original_videos \
--pose_size 12 # 12 for RWTH, 64 for How2Sign
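For How2Sign the same command applies with `--pose_size 64` and the corresponding config; the config path below is a placeholder:

```bash
python eval_compress_vq_video.py \
  --config /path/to/how2sign_config_test.yaml \
  --input_dir /path/to/test_processed_videos \
  --video_base_path /path/to/original_videos \
  --pose_size 64  # How2Sign uses 64 instead of 12
```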
Several evaluation scripts are provided:

- `eval_multihead_t2vqpgpt.py`: Evaluates the token translator
- `eval_compress_video_from_origin.py`: Evaluates video compression
- `eval_compress_vq_video.py`: Evaluates quantized video compression
- `combined_t2s_eval.py`: Combined evaluation of text-to-sign translation
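The exact command-line flags are defined in each script; assuming they follow the same `--config` convention as the training scripts, an invocation might look like:

```bash
# Assumed invocation; check the script's argparse options for the exact flags.
python eval_multihead_t2vqpgpt.py \
  --config configs/gpt/multihead_t2vqpgpt_RWTH.yaml
```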
The project includes several utility scripts organized in the `scripts/` directory.

RWTH-T scripts, located in `scripts/RWTH-T/`:

- `1_make_video.py`: Creates video files from raw data
- `2_check_video.py`: Validates video files
- `3_process_annotion.py`: Processes annotation files
How2Sign scripts, located in `scripts/how2sign/` (a run-through sketch follows this section):

- `1_create_json.py`: Creates initial JSON metadata
- `2_clip_videos.py`: Clips videos to appropriate lengths
- `3_check_clip_videos.py`: Validates clipped videos
- `4_crop_and_resize_videos.py`: Processes video dimensions
- `5_create_final_json.py`: Generates final dataset metadata
Other directories:

- `scripts/hamer/`: Scripts for processing the HAMER dataset
- `scripts/sk/`: Scripts for processing the SK (DWPose) dataset
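The numeric prefixes indicate the intended execution order. For example, the How2Sign preprocessing pipeline would be run roughly as follows; script-specific arguments are omitted and should be checked in each file:

```bash
# Run the How2Sign preprocessing scripts in numbered order.
# Any script-specific arguments are omitted here; inspect each script first.
cd scripts/how2sign
python 1_create_json.py
python 2_clip_videos.py
python 3_check_clip_videos.py
python 4_crop_and_resize_videos.py
python 5_create_final_json.py
```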
If you find SignViP useful for your research and applications, please cite using this BibTeX:
@article{wang2025advanced,
title={Advanced Sign Language Video Generation with Compressed and Quantized Multi-Condition Tokenization},
author={Wang, Cong and Deng, Zexuan and Jiang, Zhiwei and Shen, Fei and Yin, Yafeng and Gan, Shiwei and Cheng, Zifeng and Ge, Shiping and Gu, Qing},
journal={arXiv preprint arXiv:2506.15980},
year={2025}
}