Description
🚀 The feature
Note: To track the progress of the project check out this board.
This is the 2nd phase of TorchVision's modernization project (see phase 1). We aim to keep TorchVision relevant by ensuring it provides, off the shelf, all the necessary primitives, model architectures and recipe utilities to produce SOTA results for the supported Computer Vision tasks.
1. New Primitives
To enable our users to reproduce the latest state-of-the-art research, we will enhance TorchVision with the following data augmentations, layers, losses and other operators (a brief usage sketch follows each list):
Data Augmentations
- Augmix - Adding AugMix implementation #5411
- Large Scale Jitter - Adding Scale Jitter transform for detection #5435 Fix bbox scaling estimation for Large Scale Jitter #5446 Make ScaleJitter proportional #5559
- Fixed Size Crop - Adding FixedSizeCrop transform #5607
- Random Shortest Size - Adding RandomShortestSize transform #5610
- Simple CopyPaste - Add SimpleCopyPaste augmentation #5825
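As an illustration, here is a minimal sketch composing the new AugMix transform (#5411) into a standard classification pipeline; exact defaults are release-dependent, and the detection transforms above (ScaleJitter, FixedSizeCrop, RandomShortestSize, SimpleCopyPaste) plug into detection pipelines analogously:

```python
# Minimal sketch: AugMix (#5411) in a classification pipeline.
# Expects a PIL image as input; severity/mixture_width follow the
# AugMix paper's defaults.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.AugMix(severity=3, mixture_width=3),
    transforms.PILToTensor(),
    transforms.ConvertImageDtype(torch.float),
])
```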
Layers
- DropBlock - New Feature: add DropBlock layer #5416
- Conv3dNormActivation - Add Conv2dNormActivation and Conv3dNormActivation Blocks #5445
- MLP - Adding multi-layer perceptron in ops #6053
- Permute - Move Permute layer to ops #6055
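A minimal sketch of how these building blocks combine, assuming they are exposed under torchvision.ops as in the PRs above (defaults may differ across releases):

```python
# Minimal sketch: chaining the new ops-level layers.
import torch
from torchvision.ops import MLP, Conv2dNormActivation, DropBlock2d, Permute

block = torch.nn.Sequential(
    Conv2dNormActivation(3, 64, kernel_size=3),  # conv + BatchNorm + ReLU
    DropBlock2d(p=0.1, block_size=7),            # structured dropout for feature maps
    Permute([0, 2, 3, 1]),                       # NCHW -> NHWC so the MLP acts on channels
    MLP(64, [128, 64]),                          # per-position multi-layer perceptron
)
out = block(torch.rand(2, 3, 32, 32))            # -> shape (2, 32, 32, 64)
```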
Losses
- Generalized-IoU loss - add FCOS #4961
- Distance-IoU & Complete-IoU loss - Added CIOU loss function #5776 Distance IoU #5786 Adding ciou and diou support in `_box_loss()` #5984
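These land as functional losses in torchvision.ops; a minimal sketch, with boxes in (x1, y1, x2, y2) format:

```python
# Minimal sketch: the IoU-based box regression losses.
import torch
from torchvision.ops import (
    complete_box_iou_loss,
    distance_box_iou_loss,
    generalized_box_iou_loss,
)

pred = torch.tensor([[0.0, 0.0, 10.0, 10.0]], requires_grad=True)
target = torch.tensor([[2.0, 2.0, 12.0, 12.0]])

giou = generalized_box_iou_loss(pred, target, reduction="mean")
diou = distance_box_iou_loss(pred, target, reduction="mean")
ciou = complete_box_iou_loss(pred, target, reduction="mean")
ciou.backward()  # all three are differentiable w.r.t. the predictions
```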
Operators added in PyTorch Core
- Better EMA support in `AveragedModel` - Remove state_dict from AveragedModel and use buffers instead pytorch#71763
- Add support of empty output in SyncBatchNorm - Fix SyncBatchNorm for empty inputs pytorch#74944
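For context, a minimal EMA sketch on top of torch.optim.swa_utils.AveragedModel; the decay value and loop are illustrative, and use_buffers (available in newer PyTorch releases) also averages buffers such as BatchNorm statistics:

```python
# Minimal sketch: EMA via PyTorch Core's AveragedModel.
import torch
from torch.optim.swa_utils import AveragedModel

model = torch.nn.Linear(10, 2)
decay = 0.999  # illustrative EMA decay

ema = AveragedModel(
    model,
    avg_fn=lambda avg, new, num_averaged: decay * avg + (1.0 - decay) * new,
    use_buffers=True,  # average buffers (e.g. BatchNorm stats) too
)

for _ in range(10):  # training loop, sketched
    # ... optimizer step on `model` ...
    ema.update_parameters(model)
```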
2. New Architectures & Model Iterations
To ensure that our users have access to the most popular SOTA models, we will add the following architectures along with pre-trained weights. Moreover, we will improve existing architectures with commonly adopted optimizations introduced in follow-up research (usage sketches follow the lists below):
Image Classification
- ConvNeXt - Adding ConvNeXt architecture in prototype #5197 Adding more ConvNeXt variants + Speed optimizations #5253 Graduate ConvNeXt to main TorchVision area #5330
- EfficientNetV2 - Adding EfficientNetV2 architecture #5450
- Swin Transformer - Adding Swin Transformer architecture #5491 add swin_s and swin_b variants and improved swin_t #6048
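With the multi-weight API these are instantiated as below (a minimal sketch; enum member names can vary across releases):

```python
# Minimal sketch: loading the new classification models with pretrained weights.
from torchvision.models import (
    ConvNeXt_Tiny_Weights,
    EfficientNet_V2_S_Weights,
    Swin_T_Weights,
    convnext_tiny,
    efficientnet_v2_s,
    swin_t,
)

models = [
    convnext_tiny(weights=ConvNeXt_Tiny_Weights.IMAGENET1K_V1),
    efficientnet_v2_s(weights=EfficientNet_V2_S_Weights.IMAGENET1K_V1),
    swin_t(weights=Swin_T_Weights.IMAGENET1K_V1),
]
```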
Object Detection & Segmentation
- FCOS - add FCOS #4961
- Post-paper optimizations for RetinaNet, FasterRCNN & MaskRCNN Post-paper Detection Optimizations #5444
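A minimal sketch for the new FCOS detector, assuming COCO weights are available per #4961:

```python
# Minimal sketch: inference with the new FCOS detector.
import torch
from torchvision.models.detection import FCOS_ResNet50_FPN_Weights, fcos_resnet50_fpn

model = fcos_resnet50_fpn(weights=FCOS_ResNet50_FPN_Weights.COCO_V1).eval()
with torch.inference_mode():
    predictions = model([torch.rand(3, 480, 640)])  # list of dicts: boxes, labels, scores
```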
Video Classification
3. Improved Training Recipes & Pre-trained models
To ensure that our users have access to strong baselines and SOTA weights, we will improve our training recipes to incorporate the newly released primitives and offer improved pre-trained models (a sketch of the weights API follows the pre-trained weights list):
Reference Scripts
- Update EMA to use PyTorch Core's new implementation - Simplify EMA to use Pytorch's update_parameters #5469
- Add support of new Detection primitives in Reference Scripts - Detection recipe enhancements #5715
Pre-trained weights
- Improve the accuracy of Classification models - Adding improved MobileNetV2 weights #5560 Add shufflenetv2 1.5 and 2.0 weights #5906 Adding resnext101 64x4d model #5935 Add weight for mnasnet0_75 and mnasnet1_3 #6019
- Close the gap with SOTA for Object Detection & Segmentation models - Add RetinaNet improved weights #5756 Add FasterRCNN improved weights #5763 Add MaskRCNN improved weights #5773
- Add weakly-supervised weights for ViT and RegNets - Add SWAG Vision Transformer Weight #5714 Add regnet model from SWAG #5722 Add regnet_y_128gf from SWAG #5732 Adding the huge vision transformer from SWAG #5721 Add SWAG model weight that only the linear head is finetuned to ImageNet1K #5793
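A minimal sketch of how the weights enums separate the original and improved recipes, plus the SWAG weights (member names are release-dependent):

```python
# Minimal sketch: choosing between original (V1), improved (V2) and SWAG weights.
from torchvision.models import (
    MobileNet_V2_Weights,
    ViT_H_14_Weights,
    mobilenet_v2,
    vit_h_14,
)

baseline = mobilenet_v2(weights=MobileNet_V2_Weights.IMAGENET1K_V1)
improved = mobilenet_v2(weights=MobileNet_V2_Weights.IMAGENET1K_V2)  # recipe from #5560
swag = vit_h_14(weights=ViT_H_14_Weights.IMAGENET1K_SWAG_E2E_V1)     # weights from #5721
```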
Other Candidates
There are several other Operators (#5414), Losses (#2980), Augmentations (#3817) and Models (#2707) proposed by the community. Here are some potential candidates that we could implement depending on bandwidth. Contributions are welcome for any of the below:
- AutoAugment for Detection - Implement AutoAugment for Detection #6224
- Deformable DeTR
- Polynomial LR scheduler (upstream to Core)
- Shortcut Regularizer (FX-based)