🐛 Describe the bug
Bug Report: Incorrect Box Slicing in Faster R-CNN's postprocess_detections
Minimal Reproduction Code
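import torch
import torchvision
# weights= replaces the deprecated pretrained=True argument (torchvision >= 0.13)
weights = torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=weights)
detector.eval()  # in training mode the model requires targets and never reaches postprocess_detections
data = [torch.zeros((3, 1080, 1920), dtype=torch.float32)]  # one 1080x1920 RGB image
with torch.no_grad():
    detections = detector(data)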
Description
The bug is in roi_heads.py (line 701), in the postprocess_detections method of RoIHeads, and it affects the processing of Faster R-CNN outputs. When the current implementation removes background-class predictions, it slices the box tensor along the wrong dimension.
Problem Location
The problematic code segment:
for boxes, scores, image_shape in zip(pred_boxes_list, pred_scores_list, image_shapes):
    ...
    # remove predictions with the background label
    boxes = boxes[:, 1:]  # Incorrect slicing
    scores = scores[:, 1:]
    labels = labels[:, 1:]
    ...
Root Cause
- The boxes tensor has shape [N, num_classes * 4], where each class contributes 4 coordinate values.
- The current slice boxes[:, 1:] operates on that flattened last dimension (class * coordinates) instead of on the class dimension alone. It therefore drops only the first coordinate of the background box rather than all 4 of its values.
- This misaligns boxes with scores and labels, which are both sliced along the class dimension.
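The toy illustration below shows what `[:, 1:]` does to each of the two possible layouts. It uses only plain `torch` with hypothetical sizes: 2 proposals and 3 classes, with class 0 as background. Each entry is filled with `100 * class + coordinate` so the surviving values are easy to read. The example makes no claim about which layout torchvision actually produces.

```python
import torch

# Hypothetical toy sizes: 2 proposals, 3 classes (class 0 is background).
N, num_classes = 2, 3

# Entry value = 100 * class + coordinate index, e.g. class 1 holds 100..103.
cls = torch.arange(num_classes).view(1, num_classes, 1) * 100
coord = torch.arange(4).view(1, 1, 4)
boxes_3d = (cls + coord).expand(N, num_classes, 4)  # [N, num_classes, 4]
boxes_flat = boxes_3d.reshape(N, num_classes * 4)   # [N, num_classes * 4]

# Flattened layout: [:, 1:] drops a single value (x1 of the background box).
flat_sliced = boxes_flat[:, 1:]
print(flat_sliced.shape)            # torch.Size([2, 11])
print(flat_sliced[0, :4].tolist())  # [1, 2, 3, 100]

# Proposed fix: reshape first, then drop the whole background box.
fixed = boxes_flat.reshape(-1, num_classes, 4)[:, 1:, :]
print(fixed.shape)                  # torch.Size([2, 2, 4])

# If boxes is already [N, num_classes, 4], [:, 1:] slices classes directly
# and matches the reshape-then-slice result.
print(torch.equal(boxes_3d[:, 1:], fixed))  # True
```

The last line is the case worth checking against the real tensor. If the tensor already has the 3-D layout, the existing slice and the proposed fix behave identically.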
Expected Behavior
The boxes tensor should first be reshaped to [N, num_classes, 4] before slicing, so that the class and coordinate dimensions are separated and [:, 1:] removes the whole background box.
Proposed Fix
for boxes, scores, image_shape in zip(pred_boxes_list, pred_scores_list, image_shapes):
    ...
    # remove predictions with the background label
    boxes = boxes.reshape(-1, num_classes, 4)  # Proper dimension separation
    boxes = boxes[:, 1:, :]  # Correct class dimension slicing
    scores = scores[:, 1:]
    labels = labels[:, 1:]
    ...
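Below is a self-contained sketch of the proposed fix as a standalone function, so it can be exercised without a model. `drop_background` is a hypothetical helper, not a torchvision function. It assumes two things:

- `labels` holds class indices with shape `[N, num_classes]`.
- The caller later flattens everything to one row per (proposal, class) pair.

The final checks confirm that, after flattening, row `i` of the boxes, scores, and labels all refer to the same proposal and class.

```python
import torch


def drop_background(boxes, scores, labels, num_classes):
    """Hypothetical helper: remove class 0 (background) from per-class predictions.

    boxes:  [N, num_classes * 4] or [N, num_classes, 4]
    scores: [N, num_classes]
    labels: [N, num_classes]
    """
    # reshape accepts either box layout and yields [N, num_classes, 4]
    boxes = boxes.reshape(-1, num_classes, 4)[:, 1:, :]
    scores = scores[:, 1:]
    labels = labels[:, 1:]
    return boxes, scores, labels


# Toy inputs with hypothetical sizes.
N, num_classes = 2, 3
boxes = torch.rand(N, num_classes * 4)
scores = torch.softmax(torch.rand(N, num_classes), dim=-1)
labels = torch.arange(num_classes).view(1, -1).expand_as(scores)

fg_boxes, fg_scores, fg_labels = drop_background(boxes, scores, labels, num_classes)

# Flatten to one instance per (proposal, foreground class) pair.
fg_boxes = fg_boxes.reshape(-1, 4)
fg_scores = fg_scores.reshape(-1)
fg_labels = fg_labels.reshape(-1)
print(fg_boxes.shape, fg_scores.shape, fg_labels.tolist())
# torch.Size([4, 4]) torch.Size([4]) [1, 2, 1, 2]

# Row 1 is proposal 0, class 2: its box must be columns 8..11 of the flat input.
print(torch.equal(fg_boxes[1], boxes[0, 8:12]))  # True
print(torch.equal(fg_scores[1], scores[0, 2]))   # True
```

Because `reshape(-1, num_classes, 4)` is a no-op on a tensor that is already `[N, num_classes, 4]`, the helper gives the same result for either input layout.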
Impact
The current implementation leads to:
- Misaligned boxes and their corresponding scores/labels
- Potentially incorrect final detection results
- Silent failure without explicit errors
Versions
branch: 6473b77