🚀 The feature
Swin Transformers were added to TorchVision in #5491 thanks to the work of @xiaohu2015 and @jdsgomes. At that point, the official code of SwinV2 had not been released, so we decided to postpone its addition. Later, @YosuaMichael refactored the class in #6088 to make it compatible with 3D use-cases. With the code of V2 now available, we should consider adding it to TorchVision.
Here is what needs to be done:
- Examine whether it's possible to extend the existing Swin Transformer class to support both V1 and V2 models. EfficientNets were recently extended in this way. On the other hand, if the implementations are substantially different, we can follow the approach we took for MobileNets v2 and v3 and use separate classes.
- Provide an implementation that uses TorchVision building blocks and idioms. See New Model Architectures - Implementation and Documentation Details #5319 for pointers.
- Provide proof in the form of pre-trained weights for the smallest variant that reproduce the accuracy reported in the paper.
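To make the first option more concrete, here is a rough sketch of how a single block class could cover both versions by injecting the attention layer and toggling the normalization order. It assumes the two V2 changes described in the SwinV2 paper (scaled cosine attention with a learnable, clamped per-head scale, and residual post-normalization). The names `DotProductAttention`, `CosineAttention` and `Block` are hypothetical and are not existing TorchVision classes; window partitioning, shifting, position biases and the MLP are left out on purpose, so this is only an illustration of the extension point, not a proposed implementation.

```python
import math

import torch
import torch.nn.functional as F
from torch import nn


class DotProductAttention(nn.Module):
    """V1-style multi-head attention (hypothetical; no windows, no position bias)."""

    def __init__(self, dim: int, num_heads: int) -> None:
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def scores(self, q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
        # Scaled dot product: q @ k^T / sqrt(head_dim)
        return (q * q.shape[-1] ** -0.5) @ k.transpose(-2, -1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, c // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (b, heads, n, head_dim)
        attn = self.scores(q, k).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, c)
        return self.proj(out)


class CosineAttention(DotProductAttention):
    """V2-style attention: cosine similarity times a learnable per-head scale."""

    def __init__(self, dim: int, num_heads: int) -> None:
        super().__init__(dim, num_heads)
        # Stored in log space, initialised so that the scale starts at 10.
        self.logit_scale = nn.Parameter(torch.log(10 * torch.ones(num_heads, 1, 1)))

    def scores(self, q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
        sim = F.normalize(q, dim=-1) @ F.normalize(k, dim=-1).transpose(-2, -1)
        # Clamp so the effective scale never exceeds 100.
        scale = torch.clamp(self.logit_scale, max=math.log(100.0)).exp()
        return sim * scale


class Block(nn.Module):
    """One residual attention block; attn_layer and post_norm select V1 or V2 behaviour."""

    def __init__(self, dim, num_heads, attn_layer=DotProductAttention, post_norm=False):
        super().__init__()
        self.post_norm = post_norm
        self.norm = nn.LayerNorm(dim)
        self.attn = attn_layer(dim, num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.post_norm:
            return x + self.norm(self.attn(x))  # V2: normalise the branch output
        return x + self.attn(self.norm(x))  # V1: normalise the branch input


x = torch.randn(2, 49, 96)  # (batch, tokens in a 7x7 window, channels)
v1_block = Block(96, 3)
v2_block = Block(96, 3, attn_layer=CosineAttention, post_norm=True)
y1, y2 = v1_block(x), v2_block(x)
```

If the real differences turn out to be limited to changes of this kind, passing the block or attention layer as a constructor argument may be enough; if V2 also needs different patch merging or position-bias machinery throughout, separate classes are probably cleaner.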
Motivation, pitch
We should keep TorchVision fresh by offering the latest influential architectures to support the work of ML researchers and practitioners.
Alternatives
No response
Additional context
No response
cc @datumbox