This is a fork of the original DINOv2 repository. It has been adapted to facilitate extended pre-training of pathology foundation models using ExPLoRA, or to fully retrain the model.
This code was developed on a 64-bit Debian GNU/Linux system (x86_64), running kernel version 6.1.0-26-amd64 (Debian 6.1.112-1), with dynamic preemption enabled (PREEMPT_DYNAMIC). It has been run on an NVIDIA GeForce RTX 3090 GPU, but should be compatible with other GPUs.
First, you have to install the environment:
```bash
conda env create -f conda.yaml
```
You may run into a problem with cuml-cu12; this depends on your CUDA version. For example, if you are using CUDA 11.7, you will have to install cuml-cu11 instead.
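As a rough guide, the swap could be done inside the environment once it is created. The environment name placeholder and the NVIDIA package index URL below are assumptions; adapt them to your conda.yaml and to the RAPIDS installation instructions for your CUDA version:

```bash
# Hypothetical fix for CUDA 11.x setups: replace the CUDA 12 build of cuML
# with the CUDA 11 build (use the environment name defined in conda.yaml).
conda activate <env-name-from-conda.yaml>
pip uninstall -y cuml-cu12
pip install cuml-cu11 --extra-index-url=https://pypi.nvidia.com
```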
Currently, three models are implemented: UNI, UNI2 and Virchow2. However, the extended pre-training strategy has been implemented for UNI only, as it was the only model with available training parameters at the time of our study.
In order to train other models, you will need to create the corresponding ViT architecture and modify the `load_weights` function in `dinov2/dinov2/models/__init__.py`.
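As a rough sketch of what such a modification could look like (the actual signature of `load_weights` in this repository may differ, and the model name, checkpoint path handling and key remapping below are purely illustrative):

```python
# Illustrative only: adding a branch to load_weights for a hypothetical "virchow2" entry.
import torch

def load_weights(model, arch_name, weights_path):
    """Load a released checkpoint into the corresponding ViT backbone."""
    state_dict = torch.load(weights_path, map_location="cpu")
    if arch_name == "virchow2":
        # Remap checkpoint keys to this repository's module names if they differ.
        state_dict = {k.replace("backbone.", ""): v for k, v in state_dict.items()}
    # strict=False reports mismatched keys instead of raising, which helps while adapting a new model.
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print(f"Missing keys: {missing}\nUnexpected keys: {unexpected}")
    return model
```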
Name | Group | Released | SSL | WSIs | Tiles | Patients | Batch size | Iterations | Architecture | Parameters | Embed dim | Input size | Dataset | Links |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UNI | Mahmood Lab | Aug 2023[*] | DINOv2 | 100K | 100M | | | | ViT-L | | 1024 | 224 | proprietary (Mass-100K) | |
Virchow 2 | Paige / Microsoft | Aug 2024[*] | DINOv2 (+ ECT and KDE) | 3.1M | 2B | 230K | 4096 | | ViT-H with 4 registers | 632M | 3584 | 224 | proprietary (from MSKCC and international sites) | |
You can run the extended pre-training using:
```bash
PYTHONPATH=$(pwd) python dinov2/train/train.py path_to_your_config_file
```
The different configuration files we used in our study are stored in the configs folder. Arguments from ssl_default_config.yaml are used if no config file is given, or if an argument is not specified in the selected configuration file.
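For instance, a configuration file only needs to specify the fields that differ from the defaults. The key names below follow the upstream DINOv2 defaults and the file name is hypothetical; check ssl_default_config.yaml in this repository for the exact keys:

```yaml
# minimal_config.yaml (illustrative): only the listed values override the defaults,
# every other argument falls back to ssl_default_config.yaml.
train:
  batch_size_per_gpu: 32
  output_dir: /path/to/output
student:
  arch: vit_large
```

It can then be launched with `PYTHONPATH=$(pwd) python dinov2/train/train.py minimal_config.yaml`.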
Refer to the DINOv2 documentation for parameters that were already present for self-supervised learning.
Other added parameters:

- `wandb`: controls whether wandb is used for logging. You will need an account in order for the logging to work properly; otherwise the code will constantly print prompts.
- `data_path`: path to your patch dataset.
- `data_type`: use `images` if you store all your patches in the given `data_path`, or `hdf5` if you store them in a .hdf5 file. The patches will be resized to 224 × 224 pixels. You will then have to fill in `data_path` and `output_dir`.
- `train_strategy`: `explora` or `full` for the extended pre-training strategy. Default is `full`.
- `kde_loss_weight`: weight for the KDE loss.
- `kde_kappa`: kappa argument of the KDE loss.
- `koleo_loss_weight`: weight for the KoLeo loss.
These parameters are located under the following sections of the configuration file:

```yaml
wandb:
train:
  data_type:
  data_path:
  output_dir:
student:
  train_strategy:
  pretrained_weights:
dino:
  kde_loss_weight:
  kde_kappa:
  koleo_loss_weight:
```
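As an illustration, a configuration for ExPLoRA extended pre-training on a local patch folder could look like the following; all paths and loss-weight values are placeholders rather than settings from the study:

```yaml
wandb: true                              # set to false to disable wandb logging
train:
  data_type: images                      # or "hdf5" for patches stored in a .hdf5 file
  data_path: /path/to/patches
  output_dir: /path/to/output
student:
  train_strategy: explora                # or "full" to retrain all weights
  pretrained_weights: /path/to/uni_checkpoint.pth
dino:
  kde_loss_weight: 1.0                   # placeholder values; tune for your data
  kde_kappa: 5
  koleo_loss_weight: 0.1
```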