Add SwinV2 in TorchVision #6242

Closed
@datumbox

🚀 The feature

Swin Transformers were added to TorchVision in #5491 thanks to the work of @xiaohu2015 and @jdsgomes. At that point, the official code of SwinV2 had not been released, so we decided to postpone its addition. Later, @YosuaMichael refactored the class in #6088 to make it compatible with 3D use-cases. With the code of V2 now available, we should consider adding it to TorchVision.

Here is what needs to be done:

  • Examine whether the existing Swin Transformer class can be extended to support both V1 and V2 models. The EfficientNet class was recently extended in a similar way. If, on the other hand, the two implementations are substantially different, we can follow the approach we took for MobileNetV2 and MobileNetV3 and use separate classes.
  • Provide an implementation that uses TorchVision building blocks and idioms. See New Model Architectures - Implementation and Documentation Details #5319 for pointers.
  • Provide proof, in the form of pre-trained weights for the smallest variant, that the implementation reproduces the accuracy reported in the paper.
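
To make the first option more concrete, here is a minimal sketch of the "one class, pluggable block" pattern. It is plain Python with stand-in classes rather than real `nn.Module` layers, and every name in it (`Backbone`, `BlockV1`, `BlockV2`, `build_v1`, `build_v2`) is hypothetical and used only for illustration. It is not an existing TorchVision API, and the widths 96 and 192 are arbitrary example values.

```python
from typing import Callable, List


class BlockV1:
    """Stand-in for a V1-style block (hypothetical; holds no real layers)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim

    def describe(self) -> str:
        return f"v1-block(dim={self.dim})"


class BlockV2(BlockV1):
    """Stand-in for a V2-style block that overrides only what differs."""

    def describe(self) -> str:
        return f"v2-block(dim={self.dim})"


class Backbone:
    """A single backbone class; the block type is injected by the caller."""

    def __init__(
        self,
        dims: List[int],
        block: Callable[[int], BlockV1] = BlockV1,
    ) -> None:
        # One block per stage width; the backbone never names V1 or V2 itself.
        self.blocks = [block(d) for d in dims]

    def summary(self) -> List[str]:
        return [b.describe() for b in self.blocks]


def build_v1(dims: List[int]) -> Backbone:
    """Builder for the V1 variant (uses the default block)."""
    return Backbone(dims)


def build_v2(dims: List[int]) -> Backbone:
    """Builder for the V2 variant (same backbone, different block)."""
    return Backbone(dims, block=BlockV2)


if __name__ == "__main__":
    print(build_v1([96, 192]).summary())
    print(build_v2([96, 192]).summary())
```

If the V2 differences can be confined to the injected pieces like this, a shared class stays readable. If they leak into the backbone itself, separate classes are probably the cleaner choice.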

Motivation, pitch

We should keep TorchVision fresh by offering the latest influential architectures to support the work of ML researchers and practitioners.

Alternatives

No response

Additional context

No response

cc @datumbox
