
Port SwinTransformer3d from torchmultimodal #6499

Closed
@oke-aditya

Description


🚀 The feature

The main idea is to port the SwinTransformer3d model from torchmultimodal to torchvision.

The port needs to respect the nuances and code structure of torchvision.

Reference implementation in torchmultimodal:
https://github.com/facebookresearch/multimodal/blob/main/torchmultimodal/modules/encoders/swin_transformer_3d_encoder.py

Notebook that loads the original pretrained weights and compares them:
https://github.com/facebookresearch/multimodal/blob/main/examples/omnivore/LoadOriginalPretrainedWeightAndCompare.ipynb

We need to port the implementation as well as the weights.
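Porting the weights largely comes down to renaming checkpoint keys from one codebase's module layout to the other's, then checking that nothing is missing or left over. Below is a minimal sketch in plain Python, assuming the naming differences are simple prefix renames. The prefixes in `PREFIX_RENAMES` and the helpers `remap_state_dict` / `check_keys` are hypothetical placeholders for illustration, not the real torchmultimodal or torchvision key names; the actual mapping has to be read off the real checkpoints.

```python
from typing import Dict, List, Set, Tuple

# HYPOTHETICAL prefix renames (source prefix -> destination prefix).
# Placeholders only: the real key names must come from the actual checkpoints.
PREFIX_RENAMES: List[Tuple[str, str]] = [
    ("patch_embed.", "patch_embed."),
    ("layers.", "features."),
]


def remap_state_dict(
    src: Dict[str, object], renames: List[Tuple[str, str]]
) -> Dict[str, object]:
    """Return a new dict with keys renamed by the first matching prefix rule.

    Keys matching no rule are kept unchanged. Raises KeyError if two source
    keys would land on the same destination key, so a bad mapping table fails
    loudly instead of silently overwriting weights.
    """
    dst: Dict[str, object] = {}
    for key, value in src.items():
        new_key = key
        for old, new in renames:
            if key.startswith(old):
                new_key = new + key[len(old):]
                break
        if new_key in dst:
            raise KeyError(f"duplicate destination key: {new_key}")
        dst[new_key] = value
    return dst


def check_keys(remapped: Dict[str, object], expected: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) key sets for the remapped dict."""
    got = set(remapped)
    return expected - got, got - expected


if __name__ == "__main__":
    # Toy "checkpoint": values stand in for tensors.
    src = {
        "patch_embed.proj.weight": 1,
        "layers.0.blocks.0.attn.qkv.weight": 2,
        "norm.weight": 3,
    }
    expected = {
        "patch_embed.proj.weight",
        "features.0.blocks.0.attn.qkv.weight",
        "norm.weight",
    }
    remapped = remap_state_dict(src, PREFIX_RENAMES)
    missing, unexpected = check_keys(remapped, expected)
    print("missing:", missing, "unexpected:", unexpected)
```

With real models, the same check is also available from PyTorch itself: `nn.Module.load_state_dict(..., strict=False)` returns the missing and unexpected keys, and the output-comparison step from the notebook above would follow once the keys line up.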

Motivation, pitch

The plan is to first port SwinTransformer3dV1 together with its pretrained weights. Once that is done, we can consider a SwinTransformer3dV2. There is no paper or reference implementation for it, but it might bring benefits similar to those seen in the 2D case.

Alternatives

No response

Additional context

Additionally, as discussed with @YosuaMichael, the paper also mentions that SwinTransformerV2 can be used for object detection tasks. If possible, we should explore this, but only after the items above are finished.
