Commit 6693b2c6 authored by ekka, committed by Francisco Massa

Add checks to roi_heads in detection module (#1091)

* add float32 to keypoint_rcnn docs

* add float32 to faster_rcnn docs

* add float32 to mask_rcnn

* Update faster_rcnn.py

* Update keypoint_rcnn.py

* Update mask_rcnn.py

* Update faster_rcnn.py

* make keypoints float

* make masks uint8

* Update keypoint_rcnn.py

* make labels Int64

* make labels Int64

* make labels Int64

* Add checks for boxes, labels, masks, keypoints

* update mask dim

* remove dtype

* check only if targets is not None

* account for targets being a list

* update target to be list of dict

* Update faster_rcnn.py

* Update keypoint_rcnn.py

* allow boxes to be of float16 type as well

* remove checks on mask
parent 8e60cf40
@@ -30,11 +30,11 @@ class FasterRCNN(GeneralizedRCNN):
The behavior of the model changes depending if it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets dictionary,
During training, the model expects both the input tensors, as well as targets (a list of dictionaries),
containing:
- boxes (Tensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with values
- boxes (FloatTensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with values
between 0 and H and 0 and W
- labels (Tensor[N]): the class label for each ground-truth box
- labels (Int64Tensor[N]): the class label for each ground-truth box
The model returns a Dict[Tensor] during training, containing the classification and regression
losses for both the RPN and the R-CNN.
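To make the dtypes in this hunk concrete, here is a minimal training sketch; the builder call, class count, image size, and box coordinates are illustrative and not part of this commit:

```python
import torch
import torchvision

# Hypothetical toy setup: one image with two ground-truth boxes.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False, pretrained_backbone=False, num_classes=3)
model.train()

images = [torch.rand(3, 300, 400)]  # list of CHW tensors in [0, 1]
targets = [{
    # boxes must be floating point: FloatTensor[N, 4] in [x0, y0, x1, y1]
    "boxes": torch.tensor([[10., 20., 100., 150.],
                           [30., 40., 200., 250.]], dtype=torch.float32),
    # labels must be int64: Int64Tensor[N]
    "labels": torch.tensor([1, 2], dtype=torch.int64),
}]

loss_dict = model(images, targets)  # Dict[Tensor] of RPN and R-CNN losses
print({k: v.item() for k, v in loss_dict.items()})
```

In evaluation mode the same model, called without targets, returns predictions instead of losses.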
@@ -42,9 +42,9 @@ class FasterRCNN(GeneralizedRCNN):
During inference, the model requires only the input tensors, and returns the post-processed
predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as
follows:
- boxes (Tensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with values between
- boxes (FloatTensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with values between
0 and H and 0 and W
- labels (Tensor[N]): the predicted labels for each image
- labels (Int64Tensor[N]): the predicted labels for each image
- scores (Tensor[N]): the scores of each prediction
Arguments:
@@ -298,11 +298,11 @@ def fasterrcnn_resnet50_fpn(pretrained=False, progress=True,
The behavior of the model changes depending if it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets dictionary,
During training, the model expects both the input tensors, as well as targets (a list of dictionaries),
containing:
- boxes (``Tensor[N, 4]``): the ground-truth boxes in ``[x0, y0, x1, y1]`` format, with values
- boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x0, y0, x1, y1]`` format, with values
between ``0`` and ``H`` and ``0`` and ``W``
- labels (``Tensor[N]``): the class label for each ground-truth box
- labels (``Int64Tensor[N]``): the class label for each ground-truth box
The model returns a ``Dict[Tensor]`` during training, containing the classification and regression
losses for both the RPN and the R-CNN.
@@ -310,9 +310,9 @@ def fasterrcnn_resnet50_fpn(pretrained=False, progress=True,
During inference, the model requires only the input tensors, and returns the post-processed
predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as
follows:
- boxes (``Tensor[N, 4]``): the predicted boxes in ``[x0, y0, x1, y1]`` format, with values between
- boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x0, y0, x1, y1]`` format, with values between
``0`` and ``H`` and ``0`` and ``W``
- labels (``Tensor[N]``): the predicted labels for each image
- labels (``Int64Tensor[N]``): the predicted labels for each image
- scores (``Tensor[N]``): the scores of each prediction
Example::
......
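The Example:: block is collapsed in this view; as a rough inference sketch under the documented output format (assumes the pretrained weights can be downloaded):

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Two dummy images of different sizes; real code would pass actual image tensors.
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x)  # List[Dict[Tensor]], one dict per input image

print(predictions[0]["boxes"].dtype)   # torch.float32
print(predictions[0]["labels"].dtype)  # torch.int64
print(predictions[0]["scores"].shape)
```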
@@ -24,12 +24,12 @@ class KeypointRCNN(FasterRCNN):
The behavior of the model changes depending if it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets dictionary,
During training, the model expects both the input tensors, as well as targets (a list of dictionaries),
containing:
- boxes (Tensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with values
- boxes (FloatTensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with values
between 0 and H and 0 and W
- labels (Tensor[N]): the class label for each ground-truth box
- keypoints (Tensor[N, K, 3]): the K keypoints location for each of the N instances, in the
- labels (Int64Tensor[N]): the class label for each ground-truth box
- keypoints (FloatTensor[N, K, 3]): the K keypoints location for each of the N instances, in the
format [x, y, visibility], where visibility=0 means that the keypoint is not visible.
The model returns a Dict[Tensor] during training, containing the classification and regression
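To make the keypoint target layout above concrete, a minimal sketch of one target dictionary (all numbers are made up):

```python
import torch

N, K = 2, 17  # e.g. 17 COCO-style keypoints per instance (illustrative)

target = {
    "boxes": torch.tensor([[10., 10., 50., 80.],
                           [60., 20., 90., 70.]]),            # FloatTensor[N, 4]
    "labels": torch.ones(N, dtype=torch.int64),               # Int64Tensor[N]
    # FloatTensor[N, K, 3] in [x, y, visibility]; visibility=0 marks a keypoint
    # that is not visible / not annotated.
    "keypoints": torch.zeros(N, K, 3, dtype=torch.float32),
}
target["keypoints"][0, 0] = torch.tensor([12.0, 15.0, 1.0])   # one visible keypoint
```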
@@ -38,11 +38,11 @@ class KeypointRCNN(FasterRCNN):
During inference, the model requires only the input tensors, and returns the post-processed
predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as
follows:
- boxes (Tensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with values between
- boxes (FloatTensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with values between
0 and H and 0 and W
- labels (Tensor[N]): the predicted labels for each image
- labels (Int64Tensor[N]): the predicted labels for each image
- scores (Tensor[N]): the scores of each prediction
- keypoints (Tensor[N, K, 3]): the locations of the predicted keypoints, in [x, y, v] format.
- keypoints (FloatTensor[N, K, 3]): the locations of the predicted keypoints, in [x, y, v] format.
Arguments:
backbone (nn.Module): the network used to compute the features for the model.
@@ -274,12 +274,12 @@ def keypointrcnn_resnet50_fpn(pretrained=False, progress=True,
The behavior of the model changes depending if it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets dictionary,
During training, the model expects both the input tensors, as well as targets (a list of dictionaries),
containing:
- boxes (``Tensor[N, 4]``): the ground-truth boxes in ``[x0, y0, x1, y1]`` format, with values
- boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x0, y0, x1, y1]`` format, with values
between ``0`` and ``H`` and ``0`` and ``W``
- labels (``Tensor[N]``): the class label for each ground-truth box
- keypoints (``Tensor[N, K, 3]``): the ``K`` keypoints location for each of the ``N`` instances, in the
- labels (``Int64Tensor[N]``): the class label for each ground-truth box
- keypoints (``FloatTensor[N, K, 3]``): the ``K`` keypoints location for each of the ``N`` instances, in the
format ``[x, y, visibility]``, where ``visibility=0`` means that the keypoint is not visible.
The model returns a ``Dict[Tensor]`` during training, containing the classification and regression
@@ -288,11 +288,11 @@ def keypointrcnn_resnet50_fpn(pretrained=False, progress=True,
During inference, the model requires only the input tensors, and returns the post-processed
predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as
follows:
- boxes (``Tensor[N, 4]``): the predicted boxes in ``[x0, y0, x1, y1]`` format, with values between
- boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x0, y0, x1, y1]`` format, with values between
``0`` and ``H`` and ``0`` and ``W``
- labels (``Tensor[N]``): the predicted labels for each image
- labels (``Int64Tensor[N]``): the predicted labels for each image
- scores (``Tensor[N]``): the scores of each prediction
- keypoints (``Tensor[N, K, 3]``): the locations of the predicted keypoints, in ``[x, y, v]`` format.
- keypoints (``FloatTensor[N, K, 3]``): the locations of the predicted keypoints, in ``[x, y, v]`` format.
Example::
......
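This Example:: block is also collapsed; as a rough sketch of inference usage and of reading the documented keypoints field (assumes pretrained weights can be downloaded):

```python
import torch
import torchvision

model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=True)
model.eval()

pred = model([torch.rand(3, 300, 400)])[0]   # one Dict per input image

kpts = pred["keypoints"]                     # FloatTensor[N, K, 3], [x, y, v]
visible = kpts[..., 2] > 0                   # per-keypoint visibility flags
xy = kpts[..., :2]                           # keypoint coordinates in pixels
print(kpts.shape, pred["scores"].shape, visible.sum().item())
```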
@@ -26,12 +26,12 @@ class MaskRCNN(FasterRCNN):
The behavior of the model changes depending if it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets dictionary,
During training, the model expects both the input tensors, as well as targets (a list of dictionaries),
containing:
- boxes (Tensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with values
- boxes (FloatTensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with values
between 0 and H and 0 and W
- labels (Tensor[N]): the class label for each ground-truth box
- masks (Tensor[N, 1, H, W]): the segmentation binary masks for each instance
- labels (Int64Tensor[N]): the class label for each ground-truth box
- masks (UInt8Tensor[N, 1, H, W]): the segmentation binary masks for each instance
The model returns a Dict[Tensor] during training, containing the classification and regression
losses for both the RPN and the R-CNN, and the mask loss.
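To make the mask target layout concrete, a minimal sketch of one Mask R-CNN training target following the dtypes documented above; shapes and coordinates are illustrative, and the [N, 1, H, W] mask layout follows this class docstring:

```python
import torch

N, H, W = 2, 300, 400

# UInt8Tensor[N, 1, H, W]: one binary mask per instance, per the docstring above
masks = torch.zeros(N, 1, H, W, dtype=torch.uint8)
masks[0, 0, 50:120, 60:200] = 1        # instance 0 painted as a filled rectangle
masks[1, 0, 150:250, 100:180] = 1

target = {
    "boxes": torch.tensor([[60., 50., 200., 120.],      # FloatTensor[N, 4]
                           [100., 150., 180., 250.]]),
    "labels": torch.tensor([1, 2], dtype=torch.int64),  # Int64Tensor[N]
    "masks": masks,
}
```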
@@ -39,11 +39,11 @@ class MaskRCNN(FasterRCNN):
During inference, the model requires only the input tensors, and returns the post-processed
predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as
follows:
- boxes (Tensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with values between
- boxes (FloatTensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with values between
0 and H and 0 and W
- labels (Tensor[N]): the predicted labels for each image
- labels (Int64Tensor[N]): the predicted labels for each image
- scores (Tensor[N]): the scores of each prediction
- masks (Tensor[N, 1, H, W]): the predicted masks for each instance, in 0-1 range. In order to
- masks (UInt8Tensor[N, 1, H, W]): the predicted masks for each instance, in 0-1 range. In order to
obtain the final segmentation masks, the soft masks can be thresholded, generally
with a value of 0.5 (mask >= 0.5)
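For the inference side, a small sketch of the thresholding step suggested above; the soft-mask tensor is fabricated here just to show the shapes:

```python
import torch

# pred_masks stands in for model(x)[0]["masks"]: soft masks of shape [N, 1, H, W]
pred_masks = torch.rand(3, 1, 300, 400)          # values in the 0-1 range

binary_masks = (pred_masks >= 0.5).squeeze(1)    # bool masks, [N, H, W]
areas = binary_masks.sum(dim=(1, 2))             # pixels covered by each instance
print(binary_masks.dtype, areas)
```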
@@ -273,12 +273,12 @@ def maskrcnn_resnet50_fpn(pretrained=False, progress=True,
The behavior of the model changes depending if it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets dictionary,
During training, the model expects both the input tensors, as well as targets (a list of dictionaries),
containing:
- boxes (``Tensor[N, 4]``): the ground-truth boxes in ``[x0, y0, x1, y1]`` format, with values
- boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x0, y0, x1, y1]`` format, with values
between ``0`` and ``H`` and ``0`` and ``W``
- labels (``Tensor[N]``): the class label for each ground-truth box
- masks (``Tensor[N, H, W]``): the segmentation binary masks for each instance
- labels (``Int64Tensor[N]``): the class label for each ground-truth box
- masks (``UInt8Tensor[N, 1, H, W]``): the segmentation binary masks for each instance
The model returns a ``Dict[Tensor]`` during training, containing the classification and regression
losses for both the RPN and the R-CNN, and the mask loss.
@@ -286,11 +286,11 @@ def maskrcnn_resnet50_fpn(pretrained=False, progress=True,
During inference, the model requires only the input tensors, and returns the post-processed
predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as
follows:
- boxes (``Tensor[N, 4]``): the predicted boxes in ``[x0, y0, x1, y1]`` format, with values between
- boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x0, y0, x1, y1]`` format, with values between
``0`` and ``H`` and ``0`` and ``W``
- labels (``Tensor[N]``): the predicted labels for each image
- labels (``Int64Tensor[N]``): the predicted labels for each image
- scores (``Tensor[N]``): the scores of each prediction
- masks (``Tensor[N, H, W]``): the predicted masks for each instance, in ``0-1`` range. In order to
- masks (``UInt8Tensor[N, 1, H, W]``): the predicted masks for each instance, in ``0-1`` range. In order to
obtain the final segmentation masks, the soft masks can be thresholded, generally
with a value of 0.5 (``mask >= 0.5``)
......
@@ -17,6 +17,8 @@ def fastrcnn_loss(class_logits, box_regression, labels, regression_targets):
Arguments:
class_logits (Tensor)
box_regression (Tensor)
labels (list[BoxList])
regression_targets (Tensor)
Returns:
classification_loss (Tensor)
@@ -55,7 +57,7 @@ def maskrcnn_inference(x, labels):
Arguments:
x (Tensor): the mask logits
boxes (list[BoxList]): bounding boxes that are used as
labels (list[BoxList]): bounding boxes that are used as
reference, one for each image
Returns:
@@ -250,7 +252,7 @@ def keypointrcnn_inference(x, boxes):
# the next two functions should be merged inside Masker
# but are kept here for the moment while we need them
# temporarily gor paste_mask_in_image
# temporarily for paste_mask_in_image
def expand_boxes(boxes, scale):
w_half = (boxes[:, 2] - boxes[:, 0]) * .5
h_half = (boxes[:, 3] - boxes[:, 1]) * .5
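Only the first lines of expand_boxes appear in this hunk; as a sketch of what a helper with this signature plausibly does (grow each [x0, y0, x1, y1] box about its own center by scale), not necessarily the exact upstream implementation:

```python
import torch

def expand_boxes_sketch(boxes, scale):
    # half-extents and centers of each [x0, y0, x1, y1] box
    w_half = (boxes[:, 2] - boxes[:, 0]) * .5
    h_half = (boxes[:, 3] - boxes[:, 1]) * .5
    x_c = (boxes[:, 2] + boxes[:, 0]) * .5
    y_c = (boxes[:, 3] + boxes[:, 1]) * .5

    # grow the half-extents by the requested scale factor
    w_half *= scale
    h_half *= scale

    boxes_exp = torch.zeros_like(boxes)
    boxes_exp[:, 0] = x_c - w_half
    boxes_exp[:, 2] = x_c + w_half
    boxes_exp[:, 1] = y_c - h_half
    boxes_exp[:, 3] = y_c + h_half
    return boxes_exp
```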
@@ -525,6 +527,13 @@ class RoIHeads(torch.nn.Module):
image_shapes (List[Tuple[H, W]])
targets (List[Dict])
"""
# validate target dtypes before they are used
if targets is not None:
    for t in targets:
        assert t["boxes"].dtype.is_floating_point, 'target boxes must be of float type'
        assert t["labels"].dtype == torch.int64, 'target labels must be of int64 type'
        if self.has_keypoint:
            assert t["keypoints"].dtype == torch.float32, 'target keypoints must be of float type'
if self.training:
    proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
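Finally, a sketch of what the new checks catch in practice. The model construction and the failing target are illustrative, and exactly where the error surfaces can depend on the torchvision version:

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False, pretrained_backbone=False, num_classes=2)
model.train()

bad_targets = [{
    "boxes": torch.tensor([[10., 10., 50., 50.]]),   # float boxes: fine
    "labels": torch.tensor([1.0]),                   # float labels: should trip the check
}]

try:
    model([torch.rand(3, 100, 100)], bad_targets)
except AssertionError as err:
    print(err)   # e.g. "target labels must be of int64 type"
```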
......