VASA-1-hack

This repository contains the VASA implementation separated from EMOPortraits, with all components properly configured for standalone training.

Setup Instructions

  1. Clone the repository with submodules:
# Clone with submodules included
git clone --recurse-submodules https://github.com/johndpope/VASA-1-hack.git
cd VASA-1-hack

# Or if you already cloned without submodules:
git submodule update --init --recursive
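
If the nemo/ directory ends up empty, the submodule was not pulled; a quick check (an illustrative Python sketch, not part of the repo):

# verify the nemo submodule is populated
from pathlib import Path

nemo = Path("nemo")
if not nemo.is_dir() or not any(nemo.iterdir()):
    print("nemo/ is missing or empty -- run: git submodule update --init --recursive")
else:
    print("nemo submodule looks populated")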

Prerequisites

# Create conda environment
conda create -n vasa python=3.10
conda activate vasa

# Install PyTorch (adjust for your CUDA version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install required packages
pip install omegaconf wandb opencv-python pillow scipy matplotlib tqdm
pip install transformers diffusers accelerate
pip install facenet-pytorch insightface hsemotion-onnx
pip install mediapipe 
pip install l2cs memory-profiler rich


# EMOPortraits base code (nemo submodule) setup
cd nemo
./bootstrap.sh
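
After installing, a quick sanity check confirms that PyTorch sees your GPU and that the key packages import (an illustrative sketch; the package list mirrors the pip commands above):

# check_env.py -- environment sanity check (illustrative, not part of the repo)
import importlib

import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

# packages installed in the steps above
for name in ["omegaconf", "wandb", "cv2", "transformers", "diffusers", "mediapipe"]:
    try:
        importlib.import_module(name)
        print(f"ok: {name}")
    except ImportError as exc:
        print(f"missing: {name} ({exc})")
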
  2. Create necessary symlinks:
# Create symlink for repos (required for relative paths)
ln -s nemo/repos repos
  3. Download pre-trained volumetric avatar model:

The pre-trained model should be placed in:

nemo/logs/Retrain_with_17_V1_New_rand_MM_SEC_4_drop_02_stm_10_CV_05_1_1/checkpoints/328_model.pth
  4. Prepare your training data:
# Create directories
mkdir -p junk cache checkpoints

# Place your training videos in the junk directory
# Videos should be .mp4 format
cp your_training_videos/*.mp4 junk/
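
To confirm the trainer will actually find data (and to avoid the num_samples=0 error listed under Troubleshooting), you can list the videos it should pick up (an illustrative sketch, assuming the loader simply reads junk/*.mp4):

# list_training_videos.py -- illustrative check, not part of the repo
from pathlib import Path

videos = sorted(Path("junk").glob("*.mp4"))
print(f"Found {len(videos)} training video(s) in junk/")
for video in videos[:10]:
    print(f"  {video.name} ({video.stat().st_size / 1e6:.1f} MB)")
if not videos:
    print("No .mp4 files found -- training would fail with num_samples=0")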

📁 Project Structure

VASA-1-hack/
├── nemo/                        # Git submodule: nemo repository (base EMOPortraits code)
│   ├── models/                  # Model implementations
│   ├── networks/                # Network architectures
│   ├── losses/                  # Loss functions
│   ├── datasets/                # Dataset loaders
│   ├── repos/                   # External repositories (face_par_off, etc.)
│   └── logs/                    # Pre-trained model checkpoints
│
├── vasa_*.py                    # VASA-specific implementations
│   ├── vasa_trainer.py          # Main training script
│   ├── vasa_model.py            # VASA model architecture
│   ├── vasa_dataset.py          # VASA dataset handler
│   ├── vasa_scheduler.py        # Diffusion scheduler
│   └── vasa_lip_normalizer.py   # Lip normalization utilities
│
├── vasa_config.yaml             # Main configuration file
├── video_tracker.py             # Video tracking utilities
├── syncnet.py                   # Sync network implementation
│
├── data/                        # Data files
│   └── aligned_keypoints_3d.npy
├── losses/                      # Loss model weights
│   └── loss_model_weights/
├── junk/                        # Training videos directory
├── cache/                       # Cache for processed data
├── checkpoints/                 # Model checkpoints
└── repos/                       # Symlink to nemo/repos

⚙️ Configuration

Edit vasa_config.yaml to configure paths and training parameters:

paths:
  volumetric_model: "nemo/logs/[...]/328_model.pth"  # Pre-trained model
  volumetric_config: "nemo/models/stage_1/volumetric_avatar/va.yaml"
  data_dir: "data"
  video_folder: "junk"  # Your training videos directory
  cache_dir: "cache"
  checkpoint_dir: "checkpoints"

train:
  batch_size: 1
  num_epochs: 4000
  lr: 1e-3
  # ... other training parameters
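
Before starting a long run, it can help to load the config and verify the paths it points to (an illustrative sketch using OmegaConf; the key names follow the excerpt above and may differ in your config):

# check_config.py -- illustrative config/path check, not part of the repo
from pathlib import Path

from omegaconf import OmegaConf

cfg = OmegaConf.load("vasa_config.yaml")

for key in ["volumetric_model", "volumetric_config", "data_dir", "video_folder", "cache_dir", "checkpoint_dir"]:
    path = Path(str(cfg.paths[key]))
    status = "ok" if path.exists() else "MISSING"
    print(f"{status:8s} paths.{key} = {path}")

print(f"batch_size={cfg.train.batch_size}, num_epochs={cfg.train.num_epochs}, lr={cfg.train.lr}")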

🏃 Running Training

Test the Setup

python test_vasa_setup.py

Expected output:

✓ Config loaded successfully
✓ All paths exist
✓ All modules import correctly
✓ Setup looks good! You can now run vasa_trainer.py

Training Modes

1. Vanilla Training (Full Dataset)

Use the standard configuration for training on your complete dataset:

# Uses vasa_config.yaml by default
python vasa_trainer.py

# Or explicitly specify the config
python vasa_trainer.py --config vasa_config.yaml

Key parameters in vasa_config.yaml:

  • window_size: 50 - Full 50-frame windows
  • n_layers: 8 - Full 8 transformer layers
  • num_steps: 1000 - Full 1000 diffusion steps
  • batch_size: 1 - Adjust based on GPU memory
  • num_epochs: 4000 - Full training schedule

2. Overfitting Training (Fast Convergence Testing)

Use the overfitting configuration for rapid testing and debugging:

# Use the overfitting configuration
python vasa_trainer.py --config overfit_config.yaml

Key differences in overfit_config.yaml:

  • window_size: 20 - Smaller windows for faster processing
  • n_layers: 2 - Reduced transformer depth (2x-4x faster)
  • num_steps: 100 - Reduced diffusion steps (10x faster)
  • batch_size: 4 - Larger batch for better GPU utilization
  • num_epochs: 100 - Shorter training for quick iteration
  • max_videos: 100 - Limited dataset size
  • num_workers: 8 - Multi-threaded data loading
  • No augmentation - Pure overfitting test

When to use overfitting mode:

  • Testing new model architectures
  • Debugging training pipeline
  • Verifying data loading and caching
  • Quick convergence tests
  • Checking if model can overfit to small dataset (sanity check)

Monitoring Training

Both training modes support WandB logging:

# View training progress
# Visit the URL printed at training start, e.g.:
# wandb: 🚀 View run at https://wandb.ai/your-username/vasa/runs/run-id

For overfitting mode, runs are grouped as "overfit-experiments" in WandB for easy comparison.
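
The grouping uses wandb's standard group argument; a minimal sketch of how such a run might be initialized (illustrative only; the trainer's actual WandB setup may differ):

import wandb

# project/group names follow the README text above; config values are examples
run = wandb.init(project="vasa", group="overfit-experiments", config={"window_size": 20, "n_layers": 2})
wandb.log({"loss": 0.42, "epoch": 1})
run.finish()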

Custom Dataset Path

To use a different dataset (e.g., CelebV-HQ):

# Edit the config file or create a custom one
# Update video_folder path in the config:
# video_folder: "/path/to/your/dataset"

# For example, using CelebV-HQ:
# video_folder: "/media/12TB/Downloads/CelebV-HQ/celebvhq/35666"
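
If you prefer not to edit vasa_config.yaml in place, you can write a variant config programmatically (an illustrative sketch with OmegaConf; the CelebV-HQ path is just the example above):

from omegaconf import OmegaConf

cfg = OmegaConf.load("vasa_config.yaml")
cfg.paths.video_folder = "/media/12TB/Downloads/CelebV-HQ/celebvhq/35666"  # example path from above
OmegaConf.save(cfg, "vasa_config_celebvhq.yaml")

Then launch training with: python vasa_trainer.py --config vasa_config_celebvhq.yaml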

The trainer will:

  • Load the pre-trained volumetric avatar model
  • Process videos from the configured directory
  • Cache processed windows for faster subsequent epochs (the general pattern is sketched after this list)
  • Save checkpoints periodically based on save_freq
  • Save checkpoints to checkpoints/ (or checkpoints_overfit/ for overfitting mode)
  • Log to Weights & Biases (if enabled)
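
The caching step above follows a common pattern: derive a stable key from the video and window index, and reuse a saved tensor dict on later epochs. A hypothetical sketch (vasa_dataset.py's actual keys and cache contents will differ):

import hashlib
from pathlib import Path

import torch

CACHE_DIR = Path("cache")

def cached_window(video_path: str, window_idx: int, build_fn):
    """Return a processed window, loading it from cache/ when available."""
    key = hashlib.md5(f"{video_path}:{window_idx}".encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.pt"
    if cache_file.exists():
        return torch.load(cache_file)
    window = build_fn(video_path, window_idx)  # expensive preprocessing happens here
    CACHE_DIR.mkdir(exist_ok=True)
    torch.save(window, cache_file)
    return window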

Performance Comparison

| Parameter | Vanilla Training | Overfitting Mode | Speedup |
| --- | --- | --- | --- |
| Window Size | 50 frames | 20 frames | 2.5x |
| Transformer Layers | 8 | 2 | 4x |
| Diffusion Steps | 1000 | 100 | 10x |
| Batch Size | 1 | 4 | 4x |
| Workers | 0 | 8 | Parallel loading |
| Epoch Time (RTX 5090) | ~5 min | ~1.5 min | 3.3x |
| Convergence | 1000+ epochs | 10-20 epochs | 50x+ |

📝 Logging Configuration

Logging Levels (nemo/logger.py)

The project uses Python's logging module with three configurable levels defined in nemo/logger.py:28-30:

# log_level = logging.WARNING    # Minimal output - only warnings and errors
log_level = logging.INFO         # Standard output - informational messages (default)
# log_level = logging.DEBUG       # Verbose output - detailed debugging information

Logging Levels Explained:

  1. WARNING (logging.WARNING)

    • Shows only warnings, errors, and critical messages
    • Use when you want minimal console output during training
    • Best for production runs where you only need to know about issues
  2. INFO (logging.INFO) - Currently Active

    • Shows informational messages, warnings, and errors
    • Provides training progress, epoch updates, and key metrics
    • Default and recommended level for normal training runs
    • Balances visibility with readability
  3. DEBUG (logging.DEBUG)

    • Shows all messages including detailed debugging information
    • Includes tensor shapes, gradient information, and internal state
    • Use when troubleshooting model issues or understanding data flow
    • Can be verbose - recommended only for debugging sessions

To change the logging level:

  1. Edit nemo/logger.py line 29
  2. Uncomment the desired level and comment out the others
  3. The change takes effect on next run

Additional Features (a rough sketch of this setup follows the list):

  • Logs are saved to project.log file for later review
  • Rich formatting with color-coded output and timestamps
  • Third-party library logging is suppressed to reduce noise
  • TorchDebugger class available for advanced PyTorch debugging
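
Putting those pieces together, a logger with these properties can be built roughly as follows (a hypothetical sketch; nemo/logger.py's actual implementation may differ):

import logging

from rich.logging import RichHandler

log_level = logging.INFO

# rich console output plus a plain-text copy in project.log
logging.basicConfig(
    level=log_level,
    format="%(message)s",
    datefmt="[%X]",
    handlers=[RichHandler(rich_tracebacks=True), logging.FileHandler("project.log")],
)

# quiet noisy third-party libraries
for noisy in ("urllib3", "matplotlib", "PIL"):
    logging.getLogger(noisy).setLevel(logging.WARNING)

logger = logging.getLogger("vasa")
logger.info("logger initialized")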

🔧 Troubleshooting

Common Issues and Solutions

  1. ModuleNotFoundError: No module named 'logger'

    # The logger module is in nemo, paths are already configured
    # If still having issues, check that nemo is cloned properly
  2. FileNotFoundError: './repos/face_par_off/res/cp/79999_iter.pth'

    # Ensure the symlink exists:
    ln -s nemo/repos repos
  3. ValueError: num_samples should be a positive integer value, but got num_samples=0

    # No videos found. Add videos to junk/ directory:
    cp your_video.mp4 junk/
  4. FileNotFoundError: Config file not found at channel_config.yaml

    # Copy from EMOPortraits or create a basic one
  5. CUDA out of memory

    • Reduce batch_size in vasa_config.yaml
    • Enable gradient checkpointing
    • Reduce sequence_length in dataset config
  6. FFmpeg warnings

    • These can be safely ignored if not processing audio
    • To fix: pip install ffmpeg-python

Required Files from EMOPortraits

If you're missing files, you'll need these from EMOPortraits (a quick existence check follows the list):

  • channel_config.yaml - Channel configuration
  • syncnet.py - Sync network implementation
  • data/aligned_keypoints_3d.npy - 3D keypoint alignments
  • losses/loss_model_weights/*.pth - Pre-trained loss models
  • Pre-trained volumetric avatar checkpoint
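
A quick way to see which of these are present (an illustrative sketch; paths follow the list above):

# check_emoportraits_files.py -- illustrative, not part of the repo
from pathlib import Path

required = [
    "channel_config.yaml",
    "syncnet.py",
    "data/aligned_keypoints_3d.npy",
    "nemo/logs/Retrain_with_17_V1_New_rand_MM_SEC_4_drop_02_stm_10_CV_05_1_1/checkpoints/328_model.pth",
]
for item in required:
    print(("ok      " if Path(item).exists() else "MISSING ") + item)

weights = list(Path("losses/loss_model_weights").glob("*.pth"))
print(f"loss model weights found: {len(weights)}")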

📊 Monitoring Training

Training progress is logged to:

  • Console: Real-time training metrics
  • Weights & Biases: Detailed metrics and visualizations (if enabled)
  • Checkpoints: Saved every N epochs to checkpoints/

Monitor training:

# Watch training logs
tail -f project.log

# Check W&B dashboard
# https://wandb.ai/YOUR_USERNAME/vasa/

🛠️ Development

Project Organization

  • VASA-specific code: Root directory (vasa_*.py)
  • Base EMOPortraits code: nemo/ directory
  • Configuration: vasa_config.yaml
  • Training data: junk/ directory
  • Model outputs: checkpoints/ directory

Key Improvements Made

  1. Separated VASA components from EMOPortraits codebase
  2. Fixed all hardcoded paths to be relative or configurable
  3. Proper module imports with sys.path management
  4. Configurable paths via vasa_config.yaml
  5. Auto-detection of project directories in nemo code
  6. Clean separation between VASA-specific and base code

Working with the Submodule

Update nemo to latest version:

cd nemo
git pull origin main
cd ..
git add nemo
git commit -m "Update nemo submodule to latest"

Lock to specific nemo version:

cd nemo
git checkout <commit-hash>
cd ..
git add nemo
git commit -m "Lock nemo to specific version"

📝 Notes

  • The volumetric model must be pre-trained (from EMOPortraits)
  • Training requires at least one video in the junk/ directory
  • All paths in configs are relative to the project root
  • The repos symlink is required for backward compatibility

🚨 Known Issues

  • Training requires significant GPU memory (recommended: 24GB+)
  • Some imports show FFmpeg warnings (can be ignored)
  • Initial dataset processing can be slow (cached afterward)

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Note: The nemo submodule and other dependencies may have their own licenses.

🙏 Acknowledgments

  • EMOPortraits team for the base implementation
  • VASA paper authors for the architecture design
  • Contributors to the nemo repository

About

wip - running some training with overfitting - https://wandb.ai/snoozie/vasa-overfit
