Allow arbitrary-sized images by dynamic masking: upstream changes from Swin-Transformer-Object-Detection / SOLQ

Hi!

To combine the Swin Transformer backbone with the Deformable DETR detector, [SOLQ](https://github.com/megvii-research/SOLQ/blob/main/models/swin_transformer.py) made some changes to `swin_transformer.py` that compute the padding mask dynamically and so allow arbitrary-sized input images (I think this is supported only with relative positional encoding).
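To illustrate what I mean, here is a rough, framework-free sketch of the idea as I understand it: pad the feature map up to a multiple of the window size and build the shifted-window attention mask from the actual input size at forward time, instead of from a fixed `input_resolution`. The function names `pad_to_window` and `shifted_window_mask` are hypothetical, the `-100.0` fill value is an assumption, and the real implementations in the linked files work on tensors, so please treat this as a sketch and not as their code.

```python
def pad_to_window(h, w, window_size):
    """Return (pad_bottom, pad_right) that make h and w multiples of window_size."""
    pad_b = (window_size - h % window_size) % window_size
    pad_r = (window_size - w % window_size) % window_size
    return pad_b, pad_r


def shifted_window_mask(h, w, window_size, shift_size, fill=-100.0):
    """Build the shifted-window attention mask for an h x w feature map.

    Returns one (window_size**2 x window_size**2) matrix per window, holding
    0.0 where two positions may attend to each other and `fill` where not.
    Hypothetical helper: nothing here depends on a fixed input resolution.
    """
    pad_b, pad_r = pad_to_window(h, w, window_size)
    hp, wp = h + pad_b, w + pad_r  # padded size, known only at runtime

    def bands(n):
        # Three bands along one axis: untouched part, and the two parts
        # that end up sharing windows after the cyclic shift.
        return [(0, n - window_size),
                (n - window_size, n - shift_size),
                (n - shift_size, n)]

    # Label every position with a region id (3 x 3 regions at most).
    region = [[0] * wp for _ in range(hp)]
    region_id = 0
    for r0, r1 in bands(hp):
        for c0, c1 in bands(wp):
            for i in range(r0, r1):
                for j in range(c0, c1):
                    region[i][j] = region_id
            region_id += 1

    # Partition into windows; positions from different regions are masked.
    masks = []
    for wi in range(0, hp, window_size):
        for wj in range(0, wp, window_size):
            ids = [region[i][j]
                   for i in range(wi, wi + window_size)
                   for j in range(wj, wj + window_size)]
            masks.append([[0.0 if a == b else fill for b in ids] for a in ids])
    return masks


# A 5 x 7 feature map with window size 4 is padded to 8 x 8, i.e. 4 windows.
masks = shifted_window_mask(5, 7, window_size=4, shift_size=2)
print(pad_to_window(5, 7, 4), len(masks))
```

With this kind of runtime computation the backbone no longer needs the image size at construction time, which is what makes it usable with detectors that feed variable-sized batches.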

Similar edits were made by your colleagues in https://github.com/SwinTransformer/Swin-Transformer-Object-Detection/blob/master/mmdet/models/backbones/swin_transformer.py

If this interests you, maybe you could import those edits from SOLQ / Swin-Transformer-Object-Detection or implement similar ones. This would make it simpler to experiment with SimMIM checkpoints / backbone code in an object detection context and to make sure that checkpoints load correctly.
