Commit 6693b2c6 authored by ekka's avatar ekka Committed by Francisco Massa
Browse files

Add checks to roi_heads in detection module (#1091)

* add float32 to keypoint_rcnn docs

* add float32 to faster_rcnn docs

* add float32 to mask_rcnn

* Update faster_rcnn.py

* Update keypoint_rcnn.py

* Update mask_rcnn.py

* Update faster_rcnn.py

* make keypoints float

* make masks uint8

* Update keypoint_rcnn.py

* make labels Int64

* make labels Int64

* make labels Int64

* Add checks for boxes, labels, masks, keypoints

* update mask dim

* remove dtype

* check only if targets is not None

* account for targets being a list

* update target to be list of dict

* Update faster_rcnn.py

* Update keypoint_rcnn.py

* allow boxes to be of float16 type as well

* remove checks on mask
parent 8e60cf40
...@@ -30,11 +30,11 @@ class FasterRCNN(GeneralizedRCNN): ...@@ -30,11 +30,11 @@ class FasterRCNN(GeneralizedRCNN):
The behavior of the model changes depending if it is in training or evaluation mode. The behavior of the model changes depending if it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets dictionary, During training, the model expects both the input tensors, as well as a targets (list of dictionary),
containing: containing:
- boxes (Tensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with values - boxes (FloatTensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with values
between 0 and H and 0 and W between 0 and H and 0 and W
- labels (Tensor[N]): the class label for each ground-truth box - labels (Int64Tensor[N]): the class label for each ground-truth box
The model returns a Dict[Tensor] during training, containing the classification and regression The model returns a Dict[Tensor] during training, containing the classification and regression
losses for both the RPN and the R-CNN. losses for both the RPN and the R-CNN.
...@@ -42,9 +42,9 @@ class FasterRCNN(GeneralizedRCNN): ...@@ -42,9 +42,9 @@ class FasterRCNN(GeneralizedRCNN):
During inference, the model requires only the input tensors, and returns the post-processed During inference, the model requires only the input tensors, and returns the post-processed
predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as
follows: follows:
- boxes (Tensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with values between - boxes (FloatTensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with values between
0 and H and 0 and W 0 and H and 0 and W
- labels (Tensor[N]): the predicted labels for each image - labels (Int64Tensor[N]): the predicted labels for each image
- scores (Tensor[N]): the scores or each prediction - scores (Tensor[N]): the scores or each prediction
Arguments: Arguments:
...@@ -298,11 +298,11 @@ def fasterrcnn_resnet50_fpn(pretrained=False, progress=True, ...@@ -298,11 +298,11 @@ def fasterrcnn_resnet50_fpn(pretrained=False, progress=True,
The behavior of the model changes depending if it is in training or evaluation mode. The behavior of the model changes depending if it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets dictionary, During training, the model expects both the input tensors, as well as a targets (list of dictionary),
containing: containing:
- boxes (``Tensor[N, 4]``): the ground-truth boxes in ``[x0, y0, x1, y1]`` format, with values - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x0, y0, x1, y1]`` format, with values
between ``0`` and ``H`` and ``0`` and ``W`` between ``0`` and ``H`` and ``0`` and ``W``
- labels (``Tensor[N]``): the class label for each ground-truth box - labels (``Int64Tensor[N]``): the class label for each ground-truth box
The model returns a ``Dict[Tensor]`` during training, containing the classification and regression The model returns a ``Dict[Tensor]`` during training, containing the classification and regression
losses for both the RPN and the R-CNN. losses for both the RPN and the R-CNN.
...@@ -310,9 +310,9 @@ def fasterrcnn_resnet50_fpn(pretrained=False, progress=True, ...@@ -310,9 +310,9 @@ def fasterrcnn_resnet50_fpn(pretrained=False, progress=True,
During inference, the model requires only the input tensors, and returns the post-processed During inference, the model requires only the input tensors, and returns the post-processed
predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as
follows: follows:
- boxes (``Tensor[N, 4]``): the predicted boxes in ``[x0, y0, x1, y1]`` format, with values between - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x0, y0, x1, y1]`` format, with values between
``0`` and ``H`` and ``0`` and ``W`` ``0`` and ``H`` and ``0`` and ``W``
- labels (``Tensor[N]``): the predicted labels for each image - labels (``Int64Tensor[N]``): the predicted labels for each image
- scores (``Tensor[N]``): the scores or each prediction - scores (``Tensor[N]``): the scores or each prediction
Example:: Example::
......
...@@ -24,12 +24,12 @@ class KeypointRCNN(FasterRCNN): ...@@ -24,12 +24,12 @@ class KeypointRCNN(FasterRCNN):
The behavior of the model changes depending if it is in training or evaluation mode. The behavior of the model changes depending if it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets dictionary, During training, the model expects both the input tensors, as well as a targets (list of dictionary),
containing: containing:
- boxes (Tensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with values - boxes (FloatTensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with values
between 0 and H and 0 and W between 0 and H and 0 and W
- labels (Tensor[N]): the class label for each ground-truth box - labels (Int64Tensor[N]): the class label for each ground-truth box
- keypoints (Tensor[N, K, 3]): the K keypoints location for each of the N instances, in the - keypoints (FloatTensor[N, K, 3]): the K keypoints location for each of the N instances, in the
format [x, y, visibility], where visibility=0 means that the keypoint is not visible. format [x, y, visibility], where visibility=0 means that the keypoint is not visible.
The model returns a Dict[Tensor] during training, containing the classification and regression The model returns a Dict[Tensor] during training, containing the classification and regression
...@@ -38,11 +38,11 @@ class KeypointRCNN(FasterRCNN): ...@@ -38,11 +38,11 @@ class KeypointRCNN(FasterRCNN):
During inference, the model requires only the input tensors, and returns the post-processed During inference, the model requires only the input tensors, and returns the post-processed
predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as
follows: follows:
- boxes (Tensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with values between - boxes (FloatTensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with values between
0 and H and 0 and W 0 and H and 0 and W
- labels (Tensor[N]): the predicted labels for each image - labels (Int64Tensor[N]): the predicted labels for each image
- scores (Tensor[N]): the scores or each prediction - scores (Tensor[N]): the scores or each prediction
- keypoints (Tensor[N, K, 3]): the locations of the predicted keypoints, in [x, y, v] format. - keypoints (FloatTensor[N, K, 3]): the locations of the predicted keypoints, in [x, y, v] format.
Arguments: Arguments:
backbone (nn.Module): the network used to compute the features for the model. backbone (nn.Module): the network used to compute the features for the model.
...@@ -274,12 +274,12 @@ def keypointrcnn_resnet50_fpn(pretrained=False, progress=True, ...@@ -274,12 +274,12 @@ def keypointrcnn_resnet50_fpn(pretrained=False, progress=True,
The behavior of the model changes depending if it is in training or evaluation mode. The behavior of the model changes depending if it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets dictionary, During training, the model expects both the input tensors, as well as a targets (list of dictionary),
containing: containing:
- boxes (``Tensor[N, 4]``): the ground-truth boxes in ``[x0, y0, x1, y1]`` format, with values - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x0, y0, x1, y1]`` format, with values
between ``0`` and ``H`` and ``0`` and ``W`` between ``0`` and ``H`` and ``0`` and ``W``
- labels (``Tensor[N]``): the class label for each ground-truth box - labels (``Int64Tensor[N]``): the class label for each ground-truth box
- keypoints (``Tensor[N, K, 3]``): the ``K`` keypoints location for each of the ``N`` instances, in the - keypoints (``FloatTensor[N, K, 3]``): the ``K`` keypoints location for each of the ``N`` instances, in the
format ``[x, y, visibility]``, where ``visibility=0`` means that the keypoint is not visible. format ``[x, y, visibility]``, where ``visibility=0`` means that the keypoint is not visible.
The model returns a ``Dict[Tensor]`` during training, containing the classification and regression The model returns a ``Dict[Tensor]`` during training, containing the classification and regression
...@@ -288,11 +288,11 @@ def keypointrcnn_resnet50_fpn(pretrained=False, progress=True, ...@@ -288,11 +288,11 @@ def keypointrcnn_resnet50_fpn(pretrained=False, progress=True,
During inference, the model requires only the input tensors, and returns the post-processed During inference, the model requires only the input tensors, and returns the post-processed
predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as
follows: follows:
- boxes (``Tensor[N, 4]``): the predicted boxes in ``[x0, y0, x1, y1]`` format, with values between - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x0, y0, x1, y1]`` format, with values between
``0`` and ``H`` and ``0`` and ``W`` ``0`` and ``H`` and ``0`` and ``W``
- labels (``Tensor[N]``): the predicted labels for each image - labels (``Int64Tensor[N]``): the predicted labels for each image
- scores (``Tensor[N]``): the scores or each prediction - scores (``Tensor[N]``): the scores or each prediction
- keypoints (``Tensor[N, K, 3]``): the locations of the predicted keypoints, in ``[x, y, v]`` format. - keypoints (``FloatTensor[N, K, 3]``): the locations of the predicted keypoints, in ``[x, y, v]`` format.
Example:: Example::
......
...@@ -26,12 +26,12 @@ class MaskRCNN(FasterRCNN): ...@@ -26,12 +26,12 @@ class MaskRCNN(FasterRCNN):
The behavior of the model changes depending if it is in training or evaluation mode. The behavior of the model changes depending if it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets dictionary, During training, the model expects both the input tensors, as well as a targets (list of dictionary),
containing: containing:
- boxes (Tensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with values - boxes (FloatTensor[N, 4]): the ground-truth boxes in [x0, y0, x1, y1] format, with values
between 0 and H and 0 and W between 0 and H and 0 and W
- labels (Tensor[N]): the class label for each ground-truth box - labels (Int64Tensor[N]): the class label for each ground-truth box
- masks (Tensor[N, 1, H, W]): the segmentation binary masks for each instance - masks (UInt8Tensor[N, 1, H, W]): the segmentation binary masks for each instance
The model returns a Dict[Tensor] during training, containing the classification and regression The model returns a Dict[Tensor] during training, containing the classification and regression
losses for both the RPN and the R-CNN, and the mask loss. losses for both the RPN and the R-CNN, and the mask loss.
...@@ -39,11 +39,11 @@ class MaskRCNN(FasterRCNN): ...@@ -39,11 +39,11 @@ class MaskRCNN(FasterRCNN):
During inference, the model requires only the input tensors, and returns the post-processed During inference, the model requires only the input tensors, and returns the post-processed
predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as
follows: follows:
- boxes (Tensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with values between - boxes (FloatTensor[N, 4]): the predicted boxes in [x0, y0, x1, y1] format, with values between
0 and H and 0 and W 0 and H and 0 and W
- labels (Tensor[N]): the predicted labels for each image - labels (Int64Tensor[N]): the predicted labels for each image
- scores (Tensor[N]): the scores or each prediction - scores (Tensor[N]): the scores or each prediction
- masks (Tensor[N, 1, H, W]): the predicted masks for each instance, in 0-1 range. In order to - masks (UInt8Tensor[N, 1, H, W]): the predicted masks for each instance, in 0-1 range. In order to
obtain the final segmentation masks, the soft masks can be thresholded, generally obtain the final segmentation masks, the soft masks can be thresholded, generally
with a value of 0.5 (mask >= 0.5) with a value of 0.5 (mask >= 0.5)
...@@ -273,12 +273,12 @@ def maskrcnn_resnet50_fpn(pretrained=False, progress=True, ...@@ -273,12 +273,12 @@ def maskrcnn_resnet50_fpn(pretrained=False, progress=True,
The behavior of the model changes depending if it is in training or evaluation mode. The behavior of the model changes depending if it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets dictionary, During training, the model expects both the input tensors, as well as a targets (list of dictionary),
containing: containing:
- boxes (``Tensor[N, 4]``): the ground-truth boxes in ``[x0, y0, x1, y1]`` format, with values - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x0, y0, x1, y1]`` format, with values
between ``0`` and ``H`` and ``0`` and ``W`` between ``0`` and ``H`` and ``0`` and ``W``
- labels (``Tensor[N]``): the class label for each ground-truth box - labels (``Int64Tensor[N]``): the class label for each ground-truth box
- masks (``Tensor[N, H, W]``): the segmentation binary masks for each instance - masks (``UInt8Tensor[N, 1, H, W]``): the segmentation binary masks for each instance
The model returns a ``Dict[Tensor]`` during training, containing the classification and regression The model returns a ``Dict[Tensor]`` during training, containing the classification and regression
losses for both the RPN and the R-CNN, and the mask loss. losses for both the RPN and the R-CNN, and the mask loss.
...@@ -286,11 +286,11 @@ def maskrcnn_resnet50_fpn(pretrained=False, progress=True, ...@@ -286,11 +286,11 @@ def maskrcnn_resnet50_fpn(pretrained=False, progress=True,
During inference, the model requires only the input tensors, and returns the post-processed During inference, the model requires only the input tensors, and returns the post-processed
predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as
follows: follows:
- boxes (``Tensor[N, 4]``): the predicted boxes in ``[x0, y0, x1, y1]`` format, with values between - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x0, y0, x1, y1]`` format, with values between
``0`` and ``H`` and ``0`` and ``W`` ``0`` and ``H`` and ``0`` and ``W``
- labels (``Tensor[N]``): the predicted labels for each image - labels (``Int64Tensor[N]``): the predicted labels for each image
- scores (``Tensor[N]``): the scores or each prediction - scores (``Tensor[N]``): the scores or each prediction
- masks (``Tensor[N, H, W]``): the predicted masks for each instance, in ``0-1`` range. In order to - masks (``UInt8Tensor[N, 1, H, W]``): the predicted masks for each instance, in ``0-1`` range. In order to
obtain the final segmentation masks, the soft masks can be thresholded, generally obtain the final segmentation masks, the soft masks can be thresholded, generally
with a value of 0.5 (``mask >= 0.5``) with a value of 0.5 (``mask >= 0.5``)
......
...@@ -17,6 +17,8 @@ def fastrcnn_loss(class_logits, box_regression, labels, regression_targets): ...@@ -17,6 +17,8 @@ def fastrcnn_loss(class_logits, box_regression, labels, regression_targets):
Arguments: Arguments:
class_logits (Tensor) class_logits (Tensor)
box_regression (Tensor) box_regression (Tensor)
labels (list[BoxList])
regression_targets (Tensor)
Returns: Returns:
classification_loss (Tensor) classification_loss (Tensor)
...@@ -55,7 +57,7 @@ def maskrcnn_inference(x, labels): ...@@ -55,7 +57,7 @@ def maskrcnn_inference(x, labels):
Arguments: Arguments:
x (Tensor): the mask logits x (Tensor): the mask logits
boxes (list[BoxList]): bounding boxes that are used as labels (list[BoxList]): bounding boxes that are used as
reference, one for ech image reference, one for ech image
Returns: Returns:
...@@ -250,7 +252,7 @@ def keypointrcnn_inference(x, boxes): ...@@ -250,7 +252,7 @@ def keypointrcnn_inference(x, boxes):
# the next two functions should be merged inside Masker # the next two functions should be merged inside Masker
# but are kept here for the moment while we need them # but are kept here for the moment while we need them
# temporarily gor paste_mask_in_image # temporarily for paste_mask_in_image
def expand_boxes(boxes, scale): def expand_boxes(boxes, scale):
w_half = (boxes[:, 2] - boxes[:, 0]) * .5 w_half = (boxes[:, 2] - boxes[:, 0]) * .5
h_half = (boxes[:, 3] - boxes[:, 1]) * .5 h_half = (boxes[:, 3] - boxes[:, 1]) * .5
...@@ -525,6 +527,13 @@ class RoIHeads(torch.nn.Module): ...@@ -525,6 +527,13 @@ class RoIHeads(torch.nn.Module):
image_shapes (List[Tuple[H, W]]) image_shapes (List[Tuple[H, W]])
targets (List[Dict]) targets (List[Dict])
""" """
if targets is not None:
for t in targets:
assert t["boxes"].dtype.is_floating_point, 'target boxes must of float type'
assert t["labels"].dtype == torch.int64, 'target labels must of int64 type'
if self.has_keypoint:
assert t["keypoints"].dtype == torch.float32, 'target keypoints must of float type'
if self.training: if self.training:
proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets) proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment