DINOv2 extended pre-training

This is a fork of the original DINOv2 repository (https://github.com/facebookresearch/dinov2). It has been adapted to facilitate extended pre-training of pathology foundation models, either with ExPLoRA (https://arxiv.org/abs/2406.10973) or by fully retraining the model.

Installation

This code was developed on a 64-bit Debian GNU/Linux system (x86_64), running kernel version 6.1.0-26-amd64 (Debian 6.1.112-1) with dynamic preemption enabled (PREEMPT_DYNAMIC). It has been run on an NVIDIA GeForce RTX 3090 GPU, but should be compatible with other GPUs.

First, create the conda environment:

conda env create -f conda.yaml  
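Then activate the environment and check that PyTorch can see the GPU. The environment name below is an assumption; use the name defined in conda.yaml:

conda activate dinov2
python -c "import torch; print(torch.cuda.is_available())"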

FAQ: Environment problems

You may run into a problem with cuml-cu12. The correct package depends on your CUDA version: for example, if you are using CUDA 11.7, you will have to install cuml-cu11 instead.
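For instance, a possible fix on a CUDA 11.x system might look like the following; the package index and availability are assumptions, so check the RAPIDS installation guide for your exact setup:

pip uninstall cuml-cu12
pip install cuml-cu11 --extra-index-url=https://pypi.nvidia.com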

Pretrained models

Currently, three models are implemented: UNI, UNI2 and Virchow2. However, the extended pre-training strategy has been implemented for UNI only, as it was the only model with available training parameters at the time of our study.

In order to train other models, you will need to implement the corresponding ViT architecture and modify the load_weights function in dinov2/dinov2/models/__init__.py.
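As a rough illustration, loading externally released weights into a newly added backbone usually comes down to mapping the checkpoint's state dict onto the new architecture. The sketch below is not the repository's actual load_weights implementation; the helper name and the wrapper keys it looks for are assumptions:

import torch

def load_external_vit_weights(model: torch.nn.Module, checkpoint_path: str) -> torch.nn.Module:
    # Hypothetical helper: load a released checkpoint into a freshly built ViT.
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    # Some releases wrap the weights under a "model", "teacher" or "state_dict" key.
    for key in ("model", "teacher", "state_dict"):
        if isinstance(state_dict, dict) and key in state_dict:
            state_dict = state_dict[key]
            break
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    # Inspect what did not match to catch naming mismatches between architectures.
    print(f"Missing keys: {missing}")
    print(f"Unexpected keys: {unexpected}")
    return model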

| Name | Group | Released | SSL | WSIs | Tiles | Patients | Batch size | Iterations | Architecture | Parameters | Embed dim | Input size | Dataset | Links |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UNI | Mahmood Lab | Aug 2023[*] | DINOv2 | 100K | 100M | | | | ViT-L | | 1024 | 224 | proprietary (Mass-100K) | |
| Virchow 2 | Paige / Microsoft | Aug 2024[*] | DINOv2 (+ ECT and KDE) | 3.1M | 2B | 230K | 4096 | | ViT-H with 4 registers | 632M | 3584 | 224 | proprietary (from MSKCC and international sites) | |

Extended pre-training

You can run the extended pre-training using:

PYTHONPATH=$(pwd) python dinov2/train/train.py path_to_your_config_file

The configurations used in our study are stored in the configs folder. Arguments from ssl_default_config.yaml are used when no config file is given, and for any argument not specified in the selected configuration file.
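For example, pointing the training script at one of the study configurations could look like this; the file name is illustrative, so use an actual file from the configs folder:

PYTHONPATH=$(pwd) python dinov2/train/train.py configs/uni_explora.yaml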

YAML parameters

Refer to the DINOv2 documentation for the parameters that were already present for self-supervised learning.

Other added parameters:

  • wandb: whether wandb (Weights & Biases) is used for logging. You will need an account for the logging to work properly; otherwise the code will repeatedly print progress information to the console.
  • data_path: path to your patch dataset.
  • data_type: use images if all your patches are stored as image files under data_path, or hdf5 if they are stored in a .hdf5 file. Patches are resized to 224 × 224 pixels. You will then have to fill in data_path and output_dir.
  • train_strategy: explora or full, selecting the extended pre-training strategy. Defaults to full.
  • kde_loss_weight: weight of the KDE loss.
  • kde_kappa: kappa argument of the KDE loss.
  • koleo_loss_weight: weight of the KoLeo loss.

These parameters go in the following sections of the configuration file:
wandb:
train:
  data_type:
  data_path:
  output_dir:

student:
  train_strategy:
  pretrained_weights:

dino:
  kde_loss_weight:
  kde_kappa:
  koleo_loss_weight:
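For illustration, a filled-in configuration might look like the following; all paths and numeric values are placeholders, not recommended settings:

wandb: true
train:
  data_type: hdf5
  data_path: /path/to/patches.hdf5
  output_dir: /path/to/output

student:
  train_strategy: explora
  pretrained_weights: /path/to/uni_weights.pth

dino:
  kde_loss_weight: 1.0
  kde_kappa: 5
  koleo_loss_weight: 0.1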
