This repository contains the VASA implementation separated from EMOPortraits, with all components properly configured for standalone training.
- Clone the repository with submodules:
# Clone with submodules included
git clone --recurse-submodules https://github.com/johndpope/VASA-1-hack.git
cd VASA-1-hack
# Or if you already cloned without submodules:
git submodule update --init --recursive
# Create conda environment
conda create -n vasa python=3.10
conda activate vasa
# Install PyTorch (adjust for your CUDA version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install required packages
pip install omegaconf wandb opencv-python pillow scipy matplotlib tqdm
pip install transformers diffusers accelerate
pip install facenet-pytorch insightface hsemotion-onnx
pip install mediapipe
pip install l2cs memory-profiler rich
# EMOPortraits setup
cd nemo
./bootstrap.sh
cd ..
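Before moving on, a quick import check confirms the core Python dependencies are usable (a minimal sketch covering only the packages installed above):

```python
# check_env.py -- sanity check for the core dependencies installed above
import torch
import cv2
import omegaconf
import wandb

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"OpenCV {cv2.__version__}")
print(f"OmegaConf {omegaconf.__version__}")
print(f"wandb {wandb.__version__}")
```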
- Create necessary symlinks:
# Create symlink for repos (required for relative paths)
ln -s nemo/repos repos
- Download pre-trained volumetric avatar model:
The pre-trained model should be placed in:
nemo/logs/Retrain_with_17_V1_New_rand_MM_SEC_4_drop_02_stm_10_CV_05_1_1/checkpoints/328_model.pth
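Once downloaded, a quick existence check avoids a confusing failure later (a minimal sketch; it only verifies the file is present, not that the weights are valid):

```python
# verify_checkpoint.py -- confirm the pre-trained volumetric avatar weights are in place
from pathlib import Path

ckpt = Path(
    "nemo/logs/Retrain_with_17_V1_New_rand_MM_SEC_4_drop_02_stm_10_CV_05_1_1"
    "/checkpoints/328_model.pth"
)
if ckpt.exists():
    print(f"Found checkpoint ({ckpt.stat().st_size / 1e6:.0f} MB): {ckpt}")
else:
    print(f"Checkpoint missing, expected at: {ckpt}")
```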
- Prepare your training data:
# Create directories
mkdir -p junk cache checkpoints
# Place your training videos in the junk directory
# Videos should be .mp4 format
cp your_training_videos/*.mp4 junk/
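Before training, it is worth confirming the clips actually decode (a minimal sketch using OpenCV, which is installed above; `junk/` is the default video folder):

```python
# check_videos.py -- confirm the training clips in junk/ can be decoded
from pathlib import Path
import cv2

video_dir = Path("junk")
for video in sorted(video_dir.glob("*.mp4")):
    cap = cv2.VideoCapture(str(video))
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    print(f"{video.name}: opened={cap.isOpened()}, frames={n_frames}, fps={fps:.1f}")
    cap.release()
```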
VASA-1-hack/
├── nemo/ # Git submodule: nemo repository (base EMOPortraits code)
│ ├── models/ # Model implementations
│ ├── networks/ # Network architectures
│ ├── losses/ # Loss functions
│ ├── datasets/ # Dataset loaders
│ ├── repos/ # External repositories (face_par_off, etc.)
│ └── logs/ # Pre-trained model checkpoints
│
├── vasa_*.py # VASA-specific implementations
│ ├── vasa_trainer.py # Main training script
│ ├── vasa_model.py # VASA model architecture
│ ├── vasa_dataset.py # VASA dataset handler
│ ├── vasa_scheduler.py # Diffusion scheduler
│ └── vasa_lip_normalizer.py # Lip normalization utilities
│
├── vasa_config.yaml # Main configuration file
├── video_tracker.py # Video tracking utilities
├── syncnet.py # Sync network implementation
│
├── data/ # Data files
│ └── aligned_keypoints_3d.npy
├── losses/ # Loss model weights
│ └── loss_model_weights/
├── junk/ # Training videos directory
├── cache/ # Cache for processed data
├── checkpoints/ # Model checkpoints
└── repos/ # Symlink to nemo/repos
Edit `vasa_config.yaml` to configure paths and training parameters:
paths:
  volumetric_model: "nemo/logs/[...]/328_model.pth"  # Pre-trained model
  volumetric_config: "nemo/models/stage_1/volumetric_avatar/va.yaml"
  data_dir: "data"
  video_folder: "junk"          # Your training videos directory
  cache_dir: "cache"
  checkpoint_dir: "checkpoints"

train:
  batch_size: 1
  num_epochs: 4000
  lr: 1e-3
  # ... other training parameters
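Since OmegaConf is installed above, the YAML can be loaded and inspected the same way before launching a run (a sketch; whether vasa_trainer.py itself uses OmegaConf is an assumption, but the key names follow the snippet above):

```python
# inspect_config.py -- load and print the training configuration
from omegaconf import OmegaConf

cfg = OmegaConf.load("vasa_config.yaml")
print(OmegaConf.to_yaml(cfg))                      # full resolved config
print(cfg.paths.video_folder)                      # e.g. "junk"
print(cfg.train.batch_size, cfg.train.num_epochs)
```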
python test_vasa_setup.py
Expected output:
✓ Config loaded successfully
✓ All paths exist
✓ All modules import correctly
✓ Setup looks good! You can now run vasa_trainer.py
Use the standard configuration for training on your complete dataset:
# Uses vasa_config.yaml by default
python vasa_trainer.py
# Or explicitly specify the config
python vasa_trainer.py --config vasa_config.yaml
Key parameters in `vasa_config.yaml`:
- `window_size: 50` - Full 50-frame windows
- `n_layers: 8` - Full 8 transformer layers
- `num_steps: 1000` - Full 1000 diffusion steps
- `batch_size: 1` - Adjust based on GPU memory
- `num_epochs: 4000` - Full training schedule
Use the overfitting configuration for rapid testing and debugging:
# Use the overfitting configuration
python vasa_trainer.py --config overfit_config.yaml
Key differences in `overfit_config.yaml`:
- `window_size: 20` - Smaller windows for faster processing
- `n_layers: 2` - Reduced transformer depth (2x-4x faster)
- `num_steps: 100` - Reduced diffusion steps (10x faster)
- `batch_size: 4` - Larger batch for better GPU utilization
- `num_epochs: 100` - Shorter training for quick iteration
- `max_videos: 100` - Limited dataset size
- `num_workers: 8` - Multi-threaded data loading
- No augmentation - Pure overfitting test
When to use overfitting mode:
- Testing new model architectures
- Debugging training pipeline
- Verifying data loading and caching
- Quick convergence tests
- Checking if model can overfit to small dataset (sanity check)
Both training modes support WandB logging:
# View training progress
# Visit the URL printed at training start, e.g.:
# wandb: 🚀 View run at https://wandb.ai/your-username/vasa/runs/run-id
For overfitting mode, runs are grouped as "overfit-experiments" in WandB for easy comparison.
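Because overfitting runs share a group, they can also be pulled programmatically for side-by-side comparison (a sketch using the public wandb API; `your-username/vasa` is a placeholder project path, and the metric name is hypothetical):

```python
# compare_overfit_runs.py -- list runs in the "overfit-experiments" group
import wandb

api = wandb.Api()
runs = api.runs("your-username/vasa", filters={"group": "overfit-experiments"})
for run in runs:
    # "loss" is a placeholder; use whatever metric key the trainer actually logs
    print(run.name, run.state, run.summary.get("loss"))
```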
To use a different dataset (e.g., CelebV-HQ):
# Edit the config file or create a custom one
# Update video_folder path in the config:
# video_folder: "/path/to/your/dataset"
# For example, using CelebV-HQ:
# video_folder: "/media/12TB/Downloads/CelebV-HQ/celebvhq/35666"
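One way to do this without touching the checked-in config is to derive a new file programmatically (a sketch using OmegaConf; the CelebV-HQ path is just the example from the comments above):

```python
# make_dataset_config.py -- derive a config that points at a different video folder
from omegaconf import OmegaConf

cfg = OmegaConf.load("vasa_config.yaml")
cfg.paths.video_folder = "/media/12TB/Downloads/CelebV-HQ/celebvhq/35666"
OmegaConf.save(cfg, "celebvhq_config.yaml")
# then run: python vasa_trainer.py --config celebvhq_config.yaml
```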
The trainer will:
- Load the pre-trained volumetric avatar model
- Process videos from the configured directory
- Cache processed windows for faster subsequent epochs
- Save checkpoints periodically based on `save_freq`
- Save checkpoints to `checkpoints/` (or `checkpoints_overfit/` for overfitting mode)
- Log to Weights & Biases (if enabled)
| Parameter | Vanilla Training | Overfitting Mode | Speedup |
|---|---|---|---|
| Window Size | 50 frames | 20 frames | 2.5x |
| Transformer Layers | 8 | 2 | 4x |
| Diffusion Steps | 1000 | 100 | 10x |
| Batch Size | 1 | 4 | 4x |
| Workers | 0 | 8 | Parallel loading |
| Epoch Time (RTX 5090) | ~5 min | ~1.5 min | 3.3x |
| Convergence | 1000+ epochs | 10-20 epochs | 50x+ |
The project uses Python's logging module with three configurable levels defined in `nemo/logger.py:28-30`:
# log_level = logging.WARNING # Minimal output - only warnings and errors
log_level = logging.INFO # Standard output - informational messages (default)
# log_level = logging.DEBUG # Verbose output - detailed debugging information
Logging Levels Explained:
- WARNING (`logging.WARNING`)
  - Shows only warnings, errors, and critical messages
  - Use when you want minimal console output during training
  - Best for production runs where you only need to know about issues
- INFO (`logging.INFO`) - Currently Active
  - Shows informational messages, warnings, and errors
  - Provides training progress, epoch updates, and key metrics
  - Default and recommended level for normal training runs
  - Balances visibility with readability
- DEBUG (`logging.DEBUG`)
  - Shows all messages including detailed debugging information
  - Includes tensor shapes, gradient information, and internal state
  - Use when troubleshooting model issues or understanding data flow
  - Can be verbose - recommended only for debugging sessions
To change the logging level:
- Edit `nemo/logger.py` line 29
- Uncomment the desired level and comment out the others
- The change takes effect on the next run
Additional Features:
- Logs are saved to a `project.log` file for later review
- Rich formatting with color-coded output and timestamps
- Third-party library logging is suppressed to reduce noise
- TorchDebugger class available for advanced PyTorch debugging
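The overall pattern described above (a selectable level, Rich console output, a project.log file, and silenced third-party loggers) looks roughly like this; it is a sketch of the approach, not the literal contents of nemo/logger.py:

```python
# Sketch of the logging setup described above (not the literal nemo/logger.py)
import logging
from rich.logging import RichHandler

# log_level = logging.WARNING   # minimal output
log_level = logging.INFO        # standard output (default)
# log_level = logging.DEBUG     # verbose debugging

logging.basicConfig(
    level=log_level,
    format="%(message)s",
    datefmt="[%X]",
    handlers=[
        RichHandler(rich_tracebacks=True),   # color-coded console output with timestamps
        logging.FileHandler("project.log"),  # persisted log for later review
    ],
)

# Suppress chatty third-party libraries (names are illustrative)
for noisy in ("urllib3", "matplotlib", "PIL"):
    logging.getLogger(noisy).setLevel(logging.WARNING)

logger = logging.getLogger("vasa")
logger.info("Logger initialised at level %s", logging.getLevelName(log_level))
```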
- `ModuleNotFoundError: No module named 'logger'`
  # The logger module is in nemo; paths are already configured
  # If you still have issues, check that nemo was cloned properly
- `FileNotFoundError: './repos/face_par_off/res/cp/79999_iter.pth'`
  # Ensure the symlink exists: ln -s nemo/repos repos
- `ValueError: num_samples should be a positive integer value, but got num_samples=0`
  # No videos found. Add videos to the junk/ directory: cp your_video.mp4 junk/
- `FileNotFoundError: Config file not found at channel_config.yaml`
  # Copy it from EMOPortraits or create a basic one
- `CUDA out of memory`
  - Reduce `batch_size` in vasa_config.yaml
  - Enable gradient checkpointing (see the sketch after this list)
  - Reduce `sequence_length` in the dataset config
- FFmpeg warnings
  - These can be safely ignored if not processing audio
  - To fix: `pip install ffmpeg-python`
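On the gradient-checkpointing suggestion above: how it is wired in depends on the model code, but the general PyTorch pattern is to wrap memory-heavy submodules with `torch.utils.checkpoint` (a generic sketch, not the actual VASA model API):

```python
# Generic gradient-checkpointing pattern -- trades extra compute for lower memory
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class CheckpointedBlock(nn.Module):
    """Wraps an expensive submodule so activations are recomputed during backward."""

    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # use_reentrant=False is the recommended mode in recent PyTorch versions
        return checkpoint(self.block, x, use_reentrant=False)


# Stand-in transformer-style block with hypothetical sizes
block = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
wrapped = CheckpointedBlock(block)
out = wrapped(torch.randn(4, 512, requires_grad=True))
out.sum().backward()
```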
If you're missing files, you'll need these from EMOPortraits:
- `channel_config.yaml` - Channel configuration
- `syncnet.py` - Sync network implementation
- `data/aligned_keypoints_3d.npy` - 3D keypoint alignments
- `losses/loss_model_weights/*.pth` - Pre-trained loss models
- Pre-trained volumetric avatar checkpoint
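A small preflight script can confirm these are all present before launching training (a sketch; the paths are the ones listed above and in the setup steps):

```python
# preflight.py -- check that the files borrowed from EMOPortraits are present
from pathlib import Path

required = [
    "channel_config.yaml",
    "syncnet.py",
    "data/aligned_keypoints_3d.npy",
    "nemo/logs/Retrain_with_17_V1_New_rand_MM_SEC_4_drop_02_stm_10_CV_05_1_1"
    "/checkpoints/328_model.pth",
]
for name in required:
    status = "ok" if Path(name).exists() else "MISSING"
    print(f"[{status:>7}] {name}")

# Loss weights are a directory of .pth files
weights = list(Path("losses/loss_model_weights").glob("*.pth"))
status = "ok" if weights else "MISSING"
print(f"[{status:>7}] losses/loss_model_weights/*.pth ({len(weights)} files)")

# The repos symlink created during setup
status = "ok" if Path("repos").is_symlink() else "MISSING"
print(f"[{status:>7}] repos -> nemo/repos symlink")
```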
Training progress is logged to:
- Console: Real-time training metrics
- Weights & Biases: Detailed metrics and visualizations (if enabled)
- Checkpoints: Saved every N epochs to `checkpoints/`
Monitor training:
# Watch training logs
tail -f project.log
# Check W&B dashboard
# https://wandb.ai/YOUR_USERNAME/vasa/
- VASA-specific code: Root directory (`vasa_*.py`)
- Base EMOPortraits code: `nemo/` directory
- Configuration: `vasa_config.yaml`
- Training data: `junk/` directory
- Model outputs: `checkpoints/` directory
- Separated VASA components from EMOPortraits codebase
- Fixed all hardcoded paths to be relative or configurable
- Proper module imports with sys.path management
- Configurable paths via vasa_config.yaml
- Auto-detection of project directories in nemo code
- Clean separation between VASA-specific and base code
Update nemo to latest version:
cd nemo
git pull origin main
cd ..
git add nemo
git commit -m "Update nemo submodule to latest"
Lock to specific nemo version:
cd nemo
git checkout <commit-hash>
cd ..
git add nemo
git commit -m "Lock nemo to specific version"
- The volumetric model must be pre-trained (from EMOPortraits)
- Training requires at least one video in the `junk/` directory
- All paths in configs are relative to the project root
- The `repos` symlink is required for backward compatibility
- Training requires significant GPU memory (recommended: 24GB+)
- Some imports show FFmpeg warnings (can be ignored)
- Initial dataset processing can be slow (cached afterward)
This project is licensed under the MIT License - see the LICENSE file for details.
Note: The nemo submodule and other dependencies may have their own licenses.
- EMOPortraits team for the base implementation
- VASA paper authors for the architecture design
- Contributors to the nemo repository