Description
🚀 Feature
A simple torchvision.ops utility to convert segmentation masks to bounding boxes.
Motivation
This has a few use cases.
- It makes it easier to use semantic segmentation datasets for object detection, so the data pipeline can be simpler. Bounding boxes are represented as xyxy in torchvision.ops by convention, so the masks should probably be converted to xyxy format.
- It makes it easier to compare the performance of a segmentation model against a detection model. Say the detection model performs well on a segmentation dataset. Then it would be better to go ahead with the detection model, as it is faster in real-time use cases, than to train a segmentation model.
New Pipeline
from torch.utils.data import Dataset
from torchvision.ops import masks_to_boxes, box_convert

class SegmentationToDetectionDataset(Dataset):
    def __getitem__(self, idx):
        # segmentation_masks: (N, H, W) masks for sample idx (loading omitted here).
        boxes_xyxy = masks_to_boxes(segmentation_masks)
        # Convert the boxes to COCO format (xywh) if needed.
        boxes_xywh = box_convert(boxes_xyxy, in_fmt="xyxy", out_fmt="xywh")
        return boxes_xywh
Pitch
Port the masks_to_boxes function from mDeTR.
masks_to_boxes was also used in DeTR.
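To make the pitch concrete, here is a minimal sketch of what such a function could compute. It is not the DeTR or mDeTR code; the name masks_to_boxes_sketch and the choice to return an all-zero box for an empty mask are assumptions made for illustration only.

```python
import torch


def masks_to_boxes_sketch(masks: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: (N, H, W) masks -> (N, 4) boxes in xyxy format."""
    n = masks.shape[0]
    boxes = torch.zeros((n, 4), dtype=torch.float32, device=masks.device)
    for i, mask in enumerate(masks):
        # Row (y) and column (x) indices of every non-zero pixel.
        ys, xs = torch.where(mask != 0)
        if ys.numel() == 0:
            # Assumption: an empty mask keeps an all-zero box.
            continue
        boxes[i, 0] = xs.min()  # x_min
        boxes[i, 1] = ys.min()  # y_min
        boxes[i, 2] = xs.max()  # x_max
        boxes[i, 3] = ys.max()  # y_max
    return boxes


masks = torch.zeros((2, 5, 6))
masks[0, 1:3, 2:5] = 1  # rows 1..2, columns 2..4
masks[1, 4, 0] = 1      # a single pixel
boxes = masks_to_boxes_sketch(masks)
# boxes is [[2, 1, 4, 2], [0, 4, 0, 4]] as float32
```

Working from pixel indices rather than from arithmetic on the mask values means the result does not depend on the mask dtype.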
Alternatives
The above function assumes masks of shape (N, H, W), i.e. (num_masks, height, width), given as a floating-point tensor.
IIRC, we used a boolean tensor in draw_segmentation_masks (after Nicolas refactored it). So perhaps we should be using a boolean tensor here as well? Though I see no particular reason for this util to be valid only for instance segmentation.
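If boolean masks were adopted, a semantic segmentation label map would first have to be split into one boolean mask per class. The sketch below shows one way to do that; the helper name label_map_to_bool_masks and the ignore_index parameter are hypothetical and not part of torchvision.

```python
import torch


def label_map_to_bool_masks(label_map: torch.Tensor, ignore_index: int = 0) -> torch.Tensor:
    """Hypothetical helper: (H, W) integer label map -> (N, H, W) boolean masks."""
    # Class ids present in the image, sorted, without the ignored (background) id.
    class_ids = torch.unique(label_map)
    class_ids = class_ids[class_ids != ignore_index]
    # Broadcast (1, H, W) against (N, 1, 1) to get one boolean mask per class.
    return label_map.unsqueeze(0) == class_ids.view(-1, 1, 1)


label_map = torch.tensor([[0, 1, 1],
                          [2, 2, 0]])
bool_masks = label_map_to_bool_masks(label_map)
# bool_masks has shape (2, 2, 3) and dtype torch.bool
```

Each resulting mask covers every pixel of its class, so a single box may span several disconnected objects. That is one reason the question of semantic versus instance masks matters here.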
Additional context
I can port this. We will probably need a few tests to ensure it works correctly, especially a test for float16 overflow.
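A float16 test could look roughly like the sketch below. It first shows two float16 limits that can affect coordinate arithmetic, then checks that an index-based box extraction gives the same result for float16 and float32 masks. The helper box_from_mask, the sentinel value 1e8 and the image size are assumptions chosen for illustration, not taken from the DeTR code.

```python
import torch

# float16 cannot hold large sentinel values: converting 1e8 gives inf.
sentinel_fp16 = torch.tensor(1e8, dtype=torch.float32).to(torch.float16)
# Integers above 2048 are not all exactly representable: 2049 rounds to 2048.
coord_fp16 = torch.tensor(2049.0, dtype=torch.float32).to(torch.float16)


def box_from_mask(mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: one (H, W) mask -> xyxy box, computed from indices."""
    ys, xs = torch.where(mask != 0)
    return torch.stack([xs.min(), ys.min(), xs.max(), ys.max()]).to(torch.float32)


def test_float16_mask_matches_float32() -> None:
    # A wide image so that x coordinates exceed 2048.
    mask = torch.zeros((8, 3000), dtype=torch.float32)
    mask[2:5, 2049:2990] = 1
    expected = torch.tensor([2049.0, 2.0, 2989.0, 4.0])
    assert torch.equal(box_from_mask(mask), expected)
    assert torch.equal(box_from_mask(mask.to(torch.float16)), expected)


test_float16_mask_matches_float32()
```

The test passes because the coordinates come from integer indices and never go through float16 arithmetic. An implementation that multiplies a float16 mask by a coordinate grid would need the same check.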