Description
🚀 Feature
A fill parameter would allow drawing semi-transparent filled boxes. This is particularly useful for Mask R-CNN models.
This would complete the utils for object detection and instance segmentation (at least with rectangular boxes).
Motivation
In instance segmentation models, we care about the masks too, not just the bounding boxes. A fill parameter lets us fill each box semi-transparently. The parameter is optional, so it does not affect performance when it is left unset.
Pitch
Add a fill parameter as follows:
fill: Optional[List[Union[str, Tuple[int, int, int]]]] = None,
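For context, here is a minimal sketch of how a semi-transparent fill behaves in Pillow on its own, independent of the proposed function. It assumes an RGB image drawn on through ImageDraw.Draw(img, "RGBA"), where a fill color carrying an alpha component is blended into the existing pixels. The image size, colors, and box coordinates are made up for illustration.

```python
import numpy as np
from PIL import Image, ImageDraw

# Illustrative 64x64 white RGB image (size and colors are arbitrary).
img = Image.new("RGB", (64, 64), (255, 255, 255))

# The "RGBA" draw mode lets colors with an alpha component blend into the RGB image.
draw = ImageDraw.Draw(img, "RGBA")
draw.rectangle(
    [16, 16, 48, 48],
    outline=(255, 0, 0, 255),  # opaque red border
    fill=(255, 0, 0, 128),     # roughly half-transparent red interior
    width=1,
)

pixels = np.array(img)   # still an H x W x 3 array: the image itself stays RGB
inside = pixels[32, 32]  # a pixel inside the box, red blended with white
outside = pixels[4, 4]   # a pixel outside the box, untouched
```

Note that the alpha value comes from a four-element (R, G, B, A) color, so a three-element RGB tuple would produce an opaque fill.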
Here is the complete running code with a few edits:
from typing import List, Optional, Tuple, Union

import numpy as np
import torch
from PIL import Image, ImageDraw, ImageFont


@torch.no_grad()
def draw_bounding_boxes(
    image: torch.Tensor,
    boxes: torch.Tensor,
    labels: Optional[List[str]] = None,
    colors: Optional[List[Union[str, Tuple[int, int, int]]]] = None,
    fill: Optional[List[Union[str, Tuple[int, int, int]]]] = None,
    width: int = 1,
    font: Optional[str] = None,
    font_size: int = 10
) -> torch.Tensor:
    """
    Draws bounding boxes on given image.
    The values of the input image should be uint8 between 0 and 255.
    Args:
        image (Tensor): Tensor of shape (C x H x W)
        boxes (Tensor): Tensor of size (N, 4) containing bounding boxes in (xmin, ymin, xmax, ymax) format. Note that
            the boxes are absolute coordinates with respect to the image. In other words: `0 <= xmin < xmax < W` and
            `0 <= ymin < ymax < H`.
        labels (List[str]): List containing the labels of bounding boxes.
        colors (List[Union[str, Tuple[int, int, int]]]): List containing the colors of bounding boxes. The colors can
            be represented as `str` or `Tuple[int, int, int]`.
        fill (List[Union[str, Tuple[int, int, int]]]): List containing the fill colors of bounding boxes, one per
            box. Pillow only blends a fill that carries an alpha value, e.g. an `(R, G, B, A)` tuple.
        width (int): Width of bounding box.
        font (str): A filename containing a TrueType font. If the file is not found in this filename, the loader may
            also search in other directories, such as the `fonts/` directory on Windows or `/Library/Fonts/`,
            `/System/Library/Fonts/` and `~/Library/Fonts/` on macOS.
        font_size (int): The requested font size in points.
    """

    if not isinstance(image, torch.Tensor):
        raise TypeError(f"Tensor expected, got {type(image)}")
    elif image.dtype != torch.uint8:
        raise ValueError(f"Tensor uint8 expected, got {image.dtype}")
    elif image.dim() != 3:
        raise ValueError("Pass individual images, not batches")

    # Convert C x H x W tensor to an H x W x C PIL image
    ndarr = image.permute(1, 2, 0).numpy()
    img_to_draw = Image.fromarray(ndarr)

    img_boxes = boxes.to(torch.int64).tolist()

    # "RGBA" draw mode blends colors that have an alpha value into the RGB image
    draw = ImageDraw.Draw(img_to_draw, "RGBA")
    txt_font = ImageFont.load_default() if font is None else ImageFont.truetype(font=font, size=font_size)

    for i, bbox in enumerate(img_boxes):
        color = None if colors is None else colors[i]
        # Pick the fill for this box; passing the whole list to rectangle() was a bug
        fill_color = None if fill is None else fill[i]
        draw.rectangle(bbox, width=width, outline=color, fill=fill_color)

        if labels is not None:
            draw.text((bbox[0], bbox[1]), labels[i], fill=color, font=txt_font)

    return torch.from_numpy(np.array(img_to_draw)).permute(2, 0, 1)
This makes Mask R-CNN output clearer, and people can experiment with the fill parameter, for example a confidence-based fill or a different fill color per class.
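One way such a fill list could be built is sketched below. The helper name make_fill_colors, the class palette, and the score-to-alpha mapping are all hypothetical and not part of the proposal. The sketch only assumes that each fill entry may be an (R, G, B, A) tuple, whose alpha Pillow blends in the "RGBA" draw mode.

```python
from typing import Dict, List, Sequence, Tuple

RGBA = Tuple[int, int, int, int]

# Hypothetical palette: one base RGB color per class id.
CLASS_COLORS: Dict[int, Tuple[int, int, int]] = {
    1: (255, 0, 0),
    2: (0, 255, 0),
    3: (0, 0, 255),
}


def make_fill_colors(
    class_ids: Sequence[int],
    scores: Sequence[float],
    max_alpha: int = 160,
) -> List[RGBA]:
    """Return one RGBA fill per box: color chosen by class, opacity by confidence."""
    if len(class_ids) != len(scores):
        raise ValueError("class_ids and scores must have the same length")
    fills: List[RGBA] = []
    for class_id, score in zip(class_ids, scores):
        # Fall back to gray for classes missing from the palette.
        r, g, b = CLASS_COLORS.get(class_id, (128, 128, 128))
        # Clamp the score to [0, 1] before scaling it to an alpha value.
        clamped = min(max(float(score), 0.0), 1.0)
        fills.append((r, g, b, int(round(clamped * max_alpha))))
    return fills


fills = make_fill_colors(class_ids=[1, 3, 7], scores=[0.9, 0.5, 1.2])
```

The resulting list has one entry per box and could be passed as the fill argument of the proposed draw_bounding_boxes, so that more confident detections appear more opaque.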
Additional context
I can send a PR for this 😅 I'm attaching the outputs of the above code.
(Sorry PyTorch logo 🙏 )