Overview:
I updated torch and torchvision to the latest versions. A nice addition is that negative samples (images with no objects) can now be included in RCNN training. However, loss_rpn_box_reg becomes NaN when I provide negative samples.
I am training a pedestrian detector on a custom dataset. If an entry has no label, I treat that image as a negative sample. This is the code snippet I use:
def __getitem__(self, idx):
    img_path, x1, y1, x2, y2, label = self.imgs[idx].split(",")
    img = Image.open(img_path).convert("RGB")
    boxes = []
    if label:
        pos = np.asarray([[y1, y2], [x1, x2]]).astype(float)
        xmin = np.min(pos[1])
        xmax = np.max(pos[1])
        ymin = np.min(pos[0])
        ymax = np.max(pos[0])
        boxes.append([xmin, ymin, xmax, ymax])
        labels = torch.ones((1,), dtype=torch.int64)
        iscrowd = torch.zeros((1,), dtype=torch.int64)
    else:
        boxes.append([0.0, 0.0, 0.0, 0.0])
        labels = torch.zeros((1,), dtype=torch.int64)
        iscrowd = torch.zeros((0,), dtype=torch.int64)
    # convert everything into a torch.Tensor
    boxes = torch.as_tensor(boxes, dtype=torch.float32)
    image_id = torch.tensor([idx])
    area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
    target = {}
    target["boxes"] = boxes
    target["labels"] = labels
    target["image_id"] = image_id
    target["area"] = area
    target["iscrowd"] = iscrowd
    if self.transforms is not None:
        img, target = self.transforms(img, target)
    return img, target
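For comparison, here is a minimal sketch of a negative-sample branch that returns empty targets instead of a placeholder box. It assumes that torchvision's negative-sample support expects an image without objects to carry boxes of shape (0, 4) and labels of shape (0,). The helper name make_target is hypothetical and only used for illustration.

```python
import torch


def make_target(idx, box=None):
    """Build a detection target dict.

    box is [xmin, ymin, xmax, ymax] for a positive sample,
    or None for a negative sample (hypothetical helper).
    """
    if box is not None:
        boxes = torch.as_tensor([box], dtype=torch.float32)
        labels = torch.ones((1,), dtype=torch.int64)
    else:
        # Assumption: a negative sample carries no boxes at all, so every
        # per-box tensor has a leading dimension of 0.
        boxes = torch.zeros((0, 4), dtype=torch.float32)
        labels = torch.zeros((0,), dtype=torch.int64)

    # Both lines also work for zero boxes and then produce empty tensors.
    area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
    iscrowd = torch.zeros((boxes.shape[0],), dtype=torch.int64)

    return {
        "boxes": boxes,
        "labels": labels,
        "image_id": torch.tensor([idx]),
        "area": area,
        "iscrowd": iscrowd,
    }
```

With this layout, iscrowd and area always have one entry per box, so the positive and negative branches stay consistent with each other.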
The training seems to work fine if I replace the following line:
boxes.append([0.0,0.0,0.0,0.0])
with
boxes.append([0.0,0.0,0.1,0.1])
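So I'm guessing the NaN comes from the box being degenerate: with [0.0, 0.0, 0.0, 0.0], xmin equals xmax and ymin equals ymax, so the box has zero width and zero height.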
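A small sketch of why a zero-size box could produce non-finite values, assuming the box regression uses the usual R-CNN parameterization in which the width target is log(gt_width / anchor_width). The function names is_degenerate and width_target are hypothetical, and this illustrates the arithmetic only, not the library's actual code path.

```python
import math


def is_degenerate(box):
    """Return True if [xmin, ymin, xmax, ymax] has no positive width or height."""
    xmin, ymin, xmax, ymax = box
    return xmax <= xmin or ymax <= ymin


def width_target(gt_box, anchor_box):
    """Width regression target log(gt_width / anchor_width) (assumed form)."""
    gt_w = gt_box[2] - gt_box[0]
    anchor_w = anchor_box[2] - anchor_box[0]
    if gt_w == 0.0:
        # In floating-point tensor math log(0) evaluates to -inf; math.log
        # would raise instead, so the value is returned explicitly here.
        return float("-inf")
    return math.log(gt_w / anchor_w)


anchor = [0.0, 0.0, 16.0, 16.0]
zero_box = [0.0, 0.0, 0.0, 0.0]
tiny_box = [0.0, 0.0, 0.1, 0.1]

print(is_degenerate(zero_box), width_target(zero_box, anchor))
print(is_degenerate(tiny_box), width_target(tiny_box, anchor))
```

Under this assumption the zero-size box yields an infinite target, which can turn into NaN once it passes through the loss, while the 0.1 x 0.1 box yields a large but finite one. A check like is_degenerate in the dataset would catch such boxes before training.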
Setup:
Torch: 1.5.0
Torchvision: 0.6.0
Nvidia: 440.33
Cuda: 10.2