Unverified Commit 59f7e80a authored by pkulzc's avatar pkulzc Committed by GitHub

Update object detection post-processing and fix box padding/clipping issues. (#5026)

* Merged commit includes the following changes:
207771702  by Zhichao Lu:

    Refactoring evaluation utilities so that it is easier to introduce new DetectionEvaluators with eval_metric_ops.

--
207758641  by Zhichao Lu:

    Require tensorflow version 1.9+ for running object detection API.

--
207641470  by Zhichao Lu:

    Clip `num_groundtruth_boxes` in pad_input_data_to_static_shapes() to `max_num_boxes`. This prevents a scenario where tensors are sliced to an invalid range in model_lib.unstack_batch().
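    A minimal NumPy sketch of the clamp-then-pad behavior described above (the
    function and names are illustrative, not the actual
    pad_input_data_to_static_shapes implementation):

    ```python
    import numpy as np

    def clip_and_pad_boxes(boxes, num_groundtruth_boxes, max_num_boxes):
        # Clamp the stored count first; without this clamp, a count larger
        # than max_num_boxes would later drive slicing over an invalid range.
        num_valid = min(num_groundtruth_boxes, max_num_boxes)
        padded = np.zeros((max_num_boxes, 4), dtype=boxes.dtype)
        padded[:num_valid] = boxes[:num_valid]
        return padded, num_valid

    boxes = np.random.uniform(size=(120, 4)).astype(np.float32)
    padded, num_valid = clip_and_pad_boxes(boxes, num_groundtruth_boxes=120,
                                           max_num_boxes=100)
    ```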

--
207621728  by Zhichao Lu:

    This CL adds a FreezableBatchNorm that inherits from the Keras BatchNormalization layer, but supports freezing the `training` parameter at construction time instead of having to do it in the `call` method.

    It also adds a method to the `KerasLayerHyperparams` class that will build an appropriate FreezableBatchNorm layer according to the hyperparameter configuration. If batch_norm is disabled, this method returns an Identity layer.

    These will be used to simplify the conversion to Keras APIs.
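    The construction-time freezing pattern can be sketched in plain Python (a
    hypothetical stand-in, not the actual FreezableBatchNorm class):

    ```python
    class FreezableLayerSketch:
        """Illustrative stand-in for the freezing pattern described above."""

        def __init__(self, training=None):
            # None means "defer to the call-time argument", as in Keras layers.
            self._frozen_training = training

        def call(self, inputs, training=True):
            # A value frozen at construction overrides the caller's choice.
            if self._frozen_training is not None:
                training = self._frozen_training
            # A real layer would apply batch norm here; we only report the mode.
            return 'train' if training else 'inference'
    ```

    A layer built with `training=False` thus stays in inference mode even when
    the surrounding model calls it with `training=True`.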

--
207610524  by Zhichao Lu:

    Update anchor generators and box predictors for python3 compatibility.

--
207585122  by Zhichao Lu:

    Refactoring convolutional box predictor into separate prediction heads.

--
207549305  by Zhichao Lu:

    Pass all ones as batch weights if none are specified in the groundtruth.

--
207336575  by Zhichao Lu:

    Move the new argument 'target_assigner_instance' to the end of the list of arguments to the ssd_meta_arch constructor for backwards compatibility.

--
207327862  by Zhichao Lu:

    Enable support for float output in quantized custom op for postprocessing in SSD Mobilenet model.

--
207323154  by Zhichao Lu:

    Bug fix: change dict.iteritems() to dict.items()

--
207301109  by Zhichao Lu:

    Integrating expected_classification_loss_under_sampling op as an option in the ssd_meta_arch

--
207286221  by Zhichao Lu:

    Adding an option to weight regression loss with foreground scores from the ground truth labels.
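    The weighting amounts to an elementwise multiply of per-anchor losses by
    matched groundtruth scores; a tiny NumPy illustration (all values made up):

    ```python
    import numpy as np

    # Per-anchor regression losses and the foreground score of the groundtruth
    # box each anchor matched to.
    reg_losses = np.array([0.5, 1.0, 0.25])
    matched_gt_scores = np.array([1.0, 0.3, 0.0])

    # Anchors matched to score-zero groundtruth contribute nothing.
    weighted_loss = float((reg_losses * matched_gt_scores).sum())
    ```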

--
207231739  by Zhichao Lu:

    Explicitly mentioning the argument names when calling the batch target assigner.

--
207206356  by Zhichao Lu:

    Add include_trainable_variables field to train config to better handle trainable variables.
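    As the train.proto hunk later in this diff shows, the field landed as
    `update_trainable_variables`; a hypothetical config fragment (variable
    patterns are illustrative):

    ```proto
    train_config {
      # Only variables matching this pattern are updated during training...
      update_trainable_variables: "FeatureExtractor"
      # ...except those that also match a freeze pattern.
      freeze_variables: "FeatureExtractor/Conv2d_0"
    }
    ```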

--
207135930  by Zhichao Lu:

    Internal change.

--
206862541  by Zhichao Lu:

    Do not unpad the outputs from batch_non_max_suppression before sampling.

    Since BalancedPositiveNegativeSampler takes an indicator for valid positions to sample from we can pass the output from NMS directly into Sampler.
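    The idea of sampling directly from padded NMS output via a validity
    indicator can be sketched in NumPy (names are illustrative, not the actual
    BalancedPositiveNegativeSampler API):

    ```python
    import numpy as np

    def sample_with_indicator(valid_indicator, sample_size, rng):
        # Only positions flagged valid are candidates, so padded NMS slots
        # (indicator False) can be passed through without unpadding first.
        valid_idx = np.flatnonzero(valid_indicator)
        take = min(sample_size, valid_idx.size)
        chosen = rng.choice(valid_idx, size=take, replace=False)
        mask = np.zeros(valid_indicator.shape, dtype=bool)
        mask[chosen] = True
        return mask

    rng = np.random.default_rng(0)
    # Two real detections followed by two padded slots from NMS.
    indicator = np.array([True, True, False, False])
    sample_mask = sample_with_indicator(indicator, sample_size=3, rng=rng)
    ```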

--

PiperOrigin-RevId: 207771702

* Remove unused doc.
parent fb6bc29b
@@ -13,32 +13,47 @@
# limitations under the License.
# ==============================================================================
"""Base Mask RCNN head class."""
"""Base head class.
All the different kinds of prediction heads in different models will inherit
from this class. What all head classes have in common is a `predict` function
that receives `features` as its first argument.
How to add a new prediction head to an existing meta architecture?
For example, how can we add a `3d shape` prediction head to Mask RCNN?
We have to take the following steps to add a new prediction head to an
existing meta arch:
(a) Add a class for predicting the head. This class should inherit from the
`Head` class below and have a `predict` function that receives the features
and predicts the output. The output is always a tf.float32 tensor.
(b) Add the head to the meta architecture. For example in case of Mask RCNN,
go to box_predictor_builder and put in the logic for adding the new head to the
Mask RCNN box predictor.
(c) Add the logic for computing the loss for the new head.
(d) Add the necessary metrics for the new head.
(e) (optional) Add visualization for the new head.
"""
from abc import abstractmethod
class MaskRCNNHead(object):
class Head(object):
"""Mask RCNN head base class."""
def __init__(self):
"""Constructor."""
pass
def predict(self, roi_pooled_features):
@abstractmethod
def predict(self, features, num_predictions_per_location):
"""Returns the head's predictions.
Args:
roi_pooled_features: A float tensor of shape
[batch_size, height, width, channels] containing ROI pooled features
from a batch of boxes.
"""
return self._predict(roi_pooled_features)
features: A float tensor of features.
num_predictions_per_location: Int containing number of predictions per
location.
@abstractmethod
def _predict(self, roi_pooled_features):
"""The abstract internal prediction function that needs to be overloaded.
Args:
roi_pooled_features: A float tensor of shape
[batch_size, height, width, channels] containing ROI pooled features
from a batch of boxes.
Returns:
A tf.float32 tensor.
"""
pass
@@ -13,15 +13,27 @@
# limitations under the License.
# ==============================================================================
"""Mask R-CNN Keypoint Head."""
"""Keypoint Head.
Contains Keypoint prediction head classes for different meta architectures.
All the keypoint prediction heads have a predict function that receives the
`features` as the first argument and returns `keypoint_predictions`.
Keypoints can represent human body joint locations, as in the Mask RCNN
paper, or the locations of other object parts.
"""
import tensorflow as tf
from object_detection.predictors.mask_rcnn_heads import mask_rcnn_head
from object_detection.predictors.heads import head
slim = tf.contrib.slim
class KeypointHead(mask_rcnn_head.MaskRCNNHead):
"""Mask RCNN keypoint prediction head."""
class MaskRCNNKeypointHead(head.Head):
"""Mask RCNN keypoint prediction head.
Please refer to Mask RCNN paper:
https://arxiv.org/abs/1703.06870
"""
def __init__(self,
num_keypoints=17,
@@ -48,7 +60,7 @@ class KeypointHead(mask_rcnn_head.MaskRCNNHead):
based on the number of object classes and the number of channels in the
image features.
"""
super(KeypointHead, self).__init__()
super(MaskRCNNKeypointHead, self).__init__()
self._num_keypoints = num_keypoints
self._conv_hyperparams_fn = conv_hyperparams_fn
self._keypoint_heatmap_height = keypoint_heatmap_height
@@ -57,20 +69,27 @@ class KeypointHead(mask_rcnn_head.MaskRCNNHead):
keypoint_prediction_num_conv_layers)
self._keypoint_prediction_conv_depth = keypoint_prediction_conv_depth
def _predict(self, roi_pooled_features):
def predict(self, features, num_predictions_per_location=1):
"""Performs keypoint prediction.
Args:
roi_pooled_features: A float tensor of shape [batch_size, height, width,
features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
num_predictions_per_location: Int containing number of predictions per
location.
Returns:
instance_masks: A float tensor of shape
[batch_size, 1, num_keypoints, heatmap_height, heatmap_width].
Raises:
ValueError: If num_predictions_per_location is not 1.
"""
if num_predictions_per_location != 1:
raise ValueError('Only num_predictions_per_location=1 is supported')
with slim.arg_scope(self._conv_hyperparams_fn()):
net = slim.conv2d(
roi_pooled_features,
features,
self._keypoint_prediction_conv_depth, [3, 3],
scope='conv_1')
for i in range(1, self._keypoint_prediction_num_conv_layers):
......
@@ -13,17 +13,17 @@
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.predictors.mask_rcnn_heads.keypoint_head."""
"""Tests for object_detection.predictors.heads.keypoint_head."""
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.predictors.mask_rcnn_heads import keypoint_head
from object_detection.predictors.heads import keypoint_head
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
class KeypointHeadTest(test_case.TestCase):
class MaskRCNNKeypointHeadTest(test_case.TestCase):
def _build_arg_scope_with_hyperparams(self,
op_type=hyperparams_pb2.Hyperparams.FC):
@@ -44,13 +44,12 @@ class KeypointHeadTest(test_case.TestCase):
return hyperparams_builder.build(hyperparams, is_training=True)
def test_prediction_size(self):
keypoint_prediction_head = keypoint_head.KeypointHead(
keypoint_prediction_head = keypoint_head.MaskRCNNKeypointHead(
conv_hyperparams_fn=self._build_arg_scope_with_hyperparams())
roi_pooled_features = tf.random_uniform(
[64, 14, 14, 1024], minval=-2.0, maxval=2.0, dtype=tf.float32)
prediction = keypoint_prediction_head.predict(
roi_pooled_features=roi_pooled_features)
tf.logging.info(prediction.shape)
features=roi_pooled_features, num_predictions_per_location=1)
self.assertAllEqual([64, 1, 17, 56, 56], prediction.get_shape().as_list())
......
@@ -13,17 +13,26 @@
# limitations under the License.
# ==============================================================================
"""Mask R-CNN Mask Head."""
"""Mask Head.
Contains Mask prediction head classes for different meta architectures.
All the mask prediction heads have a predict function that receives the
`features` as the first argument and returns `mask_predictions`.
"""
import math
import tensorflow as tf
from object_detection.predictors.mask_rcnn_heads import mask_rcnn_head
from object_detection.predictors.heads import head
slim = tf.contrib.slim
class MaskHead(mask_rcnn_head.MaskRCNNHead):
"""Mask RCNN mask prediction head."""
class MaskRCNNMaskHead(head.Head):
"""Mask RCNN mask prediction head.
Please refer to Mask RCNN paper:
https://arxiv.org/abs/1703.06870
"""
def __init__(self,
num_classes,
@@ -57,7 +66,7 @@ class MaskHead(mask_rcnn_head.MaskRCNNHead):
Raises:
ValueError: conv_hyperparams_fn is None.
"""
super(MaskHead, self).__init__()
super(MaskRCNNMaskHead, self).__init__()
self._num_classes = num_classes
self._conv_hyperparams_fn = conv_hyperparams_fn
self._mask_height = mask_height
@@ -102,25 +111,32 @@ class MaskHead(mask_rcnn_head.MaskRCNNHead):
total_weight)
return int(math.pow(2.0, num_conv_channels_log))
def _predict(self, roi_pooled_features):
def predict(self, features, num_predictions_per_location=1):
"""Performs mask prediction.
Args:
roi_pooled_features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
features: A float tensor of shape [batch_size, height, width, channels]
containing features for a batch of images.
num_predictions_per_location: Int containing number of predictions per
location.
Returns:
instance_masks: A float tensor of shape
[batch_size, 1, num_classes, mask_height, mask_width].
Raises:
ValueError: If num_predictions_per_location is not 1.
"""
if num_predictions_per_location != 1:
raise ValueError('Only num_predictions_per_location=1 is supported')
num_conv_channels = self._mask_prediction_conv_depth
if num_conv_channels == 0:
num_feature_channels = roi_pooled_features.get_shape().as_list()[3]
num_feature_channels = features.get_shape().as_list()[3]
num_conv_channels = self._get_mask_predictor_conv_depth(
num_feature_channels, self._num_classes)
with slim.arg_scope(self._conv_hyperparams_fn()):
upsampled_features = tf.image.resize_bilinear(
roi_pooled_features, [self._mask_height, self._mask_width],
features, [self._mask_height, self._mask_width],
align_corners=True)
for _ in range(self._mask_prediction_num_conv_layers - 1):
upsampled_features = slim.conv2d(
@@ -137,3 +153,182 @@ class MaskHead(mask_rcnn_head.MaskRCNNHead):
tf.transpose(mask_predictions, perm=[0, 3, 1, 2]),
axis=1,
name='MaskPredictor')
class ConvolutionalMaskHead(head.Head):
"""Convolutional class prediction head."""
def __init__(self,
is_training,
num_classes,
use_dropout,
dropout_keep_prob,
kernel_size,
use_depthwise=False,
mask_height=7,
mask_width=7,
masks_are_class_agnostic=False):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: Number of classes.
use_dropout: Option to use dropout or not. Note that a single dropout
op is applied here prior to the mask predictions.
dropout_keep_prob: Keep probability for dropout.
This is only used if use_dropout is True.
kernel_size: Size of final convolution kernel. If the
spatial resolution of the feature map is smaller than the kernel size,
then the kernel size is automatically set to be
min(feature_width, feature_height).
use_depthwise: Whether to use depthwise convolutions for prediction
steps. Default is False.
mask_height: Desired output mask height. The default value is 7.
mask_width: Desired output mask width. The default value is 7.
masks_are_class_agnostic: Boolean determining if the mask-head is
class-agnostic or not.
Raises:
ValueError: if min_depth > max_depth.
"""
super(ConvolutionalMaskHead, self).__init__()
self._is_training = is_training
self._num_classes = num_classes
self._use_dropout = use_dropout
self._dropout_keep_prob = dropout_keep_prob
self._kernel_size = kernel_size
self._use_depthwise = use_depthwise
self._mask_height = mask_height
self._mask_width = mask_width
self._masks_are_class_agnostic = masks_are_class_agnostic
def predict(self, features, num_predictions_per_location):
"""Predicts boxes.
Args:
features: A float tensor of shape [batch_size, height, width, channels]
containing image features.
num_predictions_per_location: Number of box predictions to be made per
spatial location.
Returns:
mask_predictions: A float tensor of shape
[batch_size, num_anchors, num_masks, mask_height, mask_width]
representing the mask predictions for the proposals.
"""
image_feature = features
# Add a slot for the background class.
if self._masks_are_class_agnostic:
num_masks = 1
else:
num_masks = self._num_classes
num_mask_channels = num_masks * self._mask_height * self._mask_width
net = image_feature
if self._use_dropout:
net = slim.dropout(net, keep_prob=self._dropout_keep_prob)
if self._use_depthwise:
mask_predictions = slim.separable_conv2d(
net, None, [self._kernel_size, self._kernel_size],
padding='SAME', depth_multiplier=1, stride=1,
rate=1, scope='MaskPredictor_depthwise')
mask_predictions = slim.conv2d(
mask_predictions,
num_predictions_per_location * num_mask_channels,
[1, 1],
activation_fn=None,
normalizer_fn=None,
normalizer_params=None,
scope='MaskPredictor')
else:
mask_predictions = slim.conv2d(
net,
num_predictions_per_location * num_mask_channels,
[self._kernel_size, self._kernel_size],
activation_fn=None,
normalizer_fn=None,
normalizer_params=None,
scope='MaskPredictor')
batch_size = features.get_shape().as_list()[0]
if batch_size is None:
batch_size = tf.shape(features)[0]
mask_predictions = tf.reshape(
mask_predictions,
[batch_size, -1, num_masks, self._mask_height, self._mask_width])
return mask_predictions
# TODO(alirezafathi): See if possible to unify Weight Shared with regular
# convolutional mask head.
class WeightSharedConvolutionalMaskHead(head.Head):
"""Weight shared convolutional mask prediction head."""
def __init__(self,
num_classes,
kernel_size=3,
use_dropout=False,
dropout_keep_prob=0.8,
mask_height=7,
mask_width=7,
masks_are_class_agnostic=False):
"""Constructor.
Args:
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
kernel_size: Size of final convolution kernel.
use_dropout: Whether to apply dropout to the mask prediction head.
dropout_keep_prob: Probability of keeping activations.
mask_height: Desired output mask height. The default value is 7.
mask_width: Desired output mask width. The default value is 7.
masks_are_class_agnostic: Boolean determining if the mask-head is
class-agnostic or not.
"""
super(WeightSharedConvolutionalMaskHead, self).__init__()
self._num_classes = num_classes
self._kernel_size = kernel_size
self._use_dropout = use_dropout
self._dropout_keep_prob = dropout_keep_prob
self._mask_height = mask_height
self._mask_width = mask_width
self._masks_are_class_agnostic = masks_are_class_agnostic
def predict(self, features, num_predictions_per_location):
"""Predicts boxes.
Args:
features: A float tensor of shape [batch_size, height, width, channels]
containing image features.
num_predictions_per_location: Number of box predictions to be made per
spatial location.
Returns:
mask_predictions: A tensor of shape
[batch_size, num_anchors, num_classes, mask_height, mask_width]
representing the mask predictions for the proposals.
"""
mask_predictions_net = features
if self._masks_are_class_agnostic:
num_masks = 1
else:
num_masks = self._num_classes
num_mask_channels = num_masks * self._mask_height * self._mask_width
if self._use_dropout:
mask_predictions_net = slim.dropout(
mask_predictions_net, keep_prob=self._dropout_keep_prob)
mask_predictions = slim.conv2d(
mask_predictions_net,
num_predictions_per_location * num_mask_channels,
[self._kernel_size, self._kernel_size],
activation_fn=None, stride=1, padding='SAME',
normalizer_fn=None,
scope='MaskPredictor')
batch_size = features.get_shape().as_list()[0]
if batch_size is None:
batch_size = tf.shape(features)[0]
mask_predictions = tf.reshape(
mask_predictions,
[batch_size, -1, num_masks, self._mask_height, self._mask_width])
return mask_predictions
@@ -13,17 +13,17 @@
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.predictors.mask_rcnn_heads.mask_head."""
"""Tests for object_detection.predictors.heads.mask_head."""
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.predictors.mask_rcnn_heads import mask_head
from object_detection.predictors.heads import mask_head
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
class MaskHeadTest(test_case.TestCase):
class MaskRCNNMaskHeadTest(test_case.TestCase):
def _build_arg_scope_with_hyperparams(self,
op_type=hyperparams_pb2.Hyperparams.FC):
@@ -44,7 +44,7 @@ class MaskHeadTest(test_case.TestCase):
return hyperparams_builder.build(hyperparams, is_training=True)
def test_prediction_size(self):
mask_prediction_head = mask_head.MaskHead(
mask_prediction_head = mask_head.MaskRCNNMaskHead(
num_classes=20,
conv_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
mask_height=14,
@@ -55,10 +55,115 @@ class MaskHeadTest(test_case.TestCase):
roi_pooled_features = tf.random_uniform(
[64, 7, 7, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
prediction = mask_prediction_head.predict(
roi_pooled_features=roi_pooled_features)
tf.logging.info(prediction.shape)
features=roi_pooled_features, num_predictions_per_location=1)
self.assertAllEqual([64, 1, 20, 14, 14], prediction.get_shape().as_list())
class ConvolutionalMaskPredictorTest(test_case.TestCase):
def _build_arg_scope_with_hyperparams(
self, op_type=hyperparams_pb2.Hyperparams.CONV):
hyperparams = hyperparams_pb2.Hyperparams()
hyperparams_text_proto = """
activation: NONE
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(hyperparams_text_proto, hyperparams)
hyperparams.op = op_type
return hyperparams_builder.build(hyperparams, is_training=True)
def test_prediction_size(self):
mask_prediction_head = mask_head.ConvolutionalMaskHead(
is_training=True,
num_classes=20,
use_dropout=True,
dropout_keep_prob=0.5,
kernel_size=3,
mask_height=7,
mask_width=7)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
mask_predictions = mask_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
self.assertAllEqual([64, 323, 20, 7, 7],
mask_predictions.get_shape().as_list())
def test_class_agnostic_prediction_size(self):
mask_prediction_head = mask_head.ConvolutionalMaskHead(
is_training=True,
num_classes=20,
use_dropout=True,
dropout_keep_prob=0.5,
kernel_size=3,
mask_height=7,
mask_width=7,
masks_are_class_agnostic=True)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
mask_predictions = mask_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
self.assertAllEqual([64, 323, 1, 7, 7],
mask_predictions.get_shape().as_list())
class WeightSharedConvolutionalMaskPredictorTest(test_case.TestCase):
def _build_arg_scope_with_hyperparams(
self, op_type=hyperparams_pb2.Hyperparams.CONV):
hyperparams = hyperparams_pb2.Hyperparams()
hyperparams_text_proto = """
activation: NONE
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(hyperparams_text_proto, hyperparams)
hyperparams.op = op_type
return hyperparams_builder.build(hyperparams, is_training=True)
def test_prediction_size(self):
mask_prediction_head = (
mask_head.WeightSharedConvolutionalMaskHead(
num_classes=20,
mask_height=7,
mask_width=7))
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
mask_predictions = mask_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
self.assertAllEqual([64, 323, 20, 7, 7],
mask_predictions.get_shape().as_list())
def test_class_agnostic_prediction_size(self):
mask_prediction_head = (
mask_head.WeightSharedConvolutionalMaskHead(
num_classes=20,
mask_height=7,
mask_width=7,
masks_are_class_agnostic=True))
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
mask_predictions = mask_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
self.assertAllEqual([64, 323, 1, 7, 7],
mask_predictions.get_shape().as_list())
if __name__ == '__main__':
tf.test.main()
@@ -126,15 +126,18 @@ class MaskRCNNBoxPredictor(box_predictor.BoxPredictor):
if prediction_stage == 2:
predictions_dict[BOX_ENCODINGS] = self._box_prediction_head.predict(
roi_pooled_features=image_feature)
features=image_feature,
num_predictions_per_location=num_predictions_per_location[0])
predictions_dict[CLASS_PREDICTIONS_WITH_BACKGROUND] = (
self._class_prediction_head.predict(roi_pooled_features=image_feature)
)
self._class_prediction_head.predict(
features=image_feature,
num_predictions_per_location=num_predictions_per_location[0]))
elif prediction_stage == 3:
for prediction_head in self.get_third_stage_prediction_heads():
head_object = self._third_stage_heads[prediction_head]
predictions_dict[prediction_head] = head_object.predict(
roi_pooled_features=image_feature)
features=image_feature,
num_predictions_per_location=num_predictions_per_location[0])
else:
raise ValueError('prediction_stage should be either 2 or 3.')
......
@@ -18,11 +18,9 @@ import numpy as np
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import box_predictor_builder
from object_detection.builders import hyperparams_builder
from object_detection.predictors import mask_rcnn_box_predictor as box_predictor
from object_detection.predictors.mask_rcnn_heads import box_head
from object_detection.predictors.mask_rcnn_heads import class_head
from object_detection.predictors.mask_rcnn_heads import mask_head
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
@@ -47,45 +45,9 @@ class MaskRCNNBoxPredictorTest(test_case.TestCase):
hyperparams.op = op_type
return hyperparams_builder.build(hyperparams, is_training=True)
def _box_predictor_builder(self,
is_training,
num_classes,
fc_hyperparams_fn,
use_dropout,
dropout_keep_prob,
box_code_size,
share_box_across_classes=False,
conv_hyperparams_fn=None,
predict_instance_masks=False):
box_prediction_head = box_head.BoxHead(
is_training=is_training,
num_classes=num_classes,
fc_hyperparams_fn=fc_hyperparams_fn,
use_dropout=use_dropout,
dropout_keep_prob=dropout_keep_prob,
box_code_size=box_code_size,
share_box_across_classes=share_box_across_classes)
class_prediction_head = class_head.ClassHead(
is_training=is_training,
num_classes=num_classes,
fc_hyperparams_fn=fc_hyperparams_fn,
use_dropout=use_dropout,
dropout_keep_prob=dropout_keep_prob)
third_stage_heads = {}
if predict_instance_masks:
third_stage_heads[box_predictor.MASK_PREDICTIONS] = mask_head.MaskHead(
num_classes=num_classes,
conv_hyperparams_fn=conv_hyperparams_fn)
return box_predictor.MaskRCNNBoxPredictor(
is_training=is_training,
num_classes=num_classes,
box_prediction_head=box_prediction_head,
class_prediction_head=class_prediction_head,
third_stage_heads=third_stage_heads)
def test_get_boxes_with_five_classes(self):
def graph_fn(image_features):
mask_box_predictor = self._box_predictor_builder(
mask_box_predictor = box_predictor_builder.build_mask_rcnn_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
@@ -109,7 +71,7 @@ class MaskRCNNBoxPredictorTest(test_case.TestCase):
def test_get_boxes_with_five_classes_share_box_across_classes(self):
def graph_fn(image_features):
mask_box_predictor = self._box_predictor_builder(
mask_box_predictor = box_predictor_builder.build_mask_rcnn_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
@@ -134,7 +96,7 @@ class MaskRCNNBoxPredictorTest(test_case.TestCase):
def test_value_error_on_predict_instance_masks_with_no_conv_hyperparms(self):
with self.assertRaises(ValueError):
self._box_predictor_builder(
box_predictor_builder.build_mask_rcnn_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
@@ -145,7 +107,7 @@ class MaskRCNNBoxPredictorTest(test_case.TestCase):
def test_get_instance_masks(self):
def graph_fn(image_features):
mask_box_predictor = self._box_predictor_builder(
mask_box_predictor = box_predictor_builder.build_mask_rcnn_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
@@ -167,7 +129,7 @@ class MaskRCNNBoxPredictorTest(test_case.TestCase):
def test_do_not_return_instance_masks_without_request(self):
image_features = tf.random_uniform([2, 7, 7, 3], dtype=tf.float32)
mask_box_predictor = self._box_predictor_builder(
mask_box_predictor = box_predictor_builder.build_mask_rcnn_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
......
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Mask R-CNN Class Head."""
import tensorflow as tf
from object_detection.predictors.mask_rcnn_heads import mask_rcnn_head
slim = tf.contrib.slim
class ClassHead(mask_rcnn_head.MaskRCNNHead):
"""Mask RCNN class prediction head."""
def __init__(self, is_training, num_classes, fc_hyperparams_fn,
use_dropout, dropout_keep_prob):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
fc_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for fully connected ops.
use_dropout: Option to use dropout or not. Note that a single dropout
op is applied here prior to both box and class predictions, which stands
in contrast to the ConvolutionalBoxPredictor below.
dropout_keep_prob: Keep probability for dropout.
This is only used if use_dropout is True.
"""
super(ClassHead, self).__init__()
self._is_training = is_training
self._num_classes = num_classes
self._fc_hyperparams_fn = fc_hyperparams_fn
self._use_dropout = use_dropout
self._dropout_keep_prob = dropout_keep_prob
def _predict(self, roi_pooled_features):
"""Predicts boxes and class scores.
Args:
roi_pooled_features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
Returns:
class_predictions_with_background: A float tensor of shape
[batch_size, 1, num_classes + 1] representing the class predictions for
the proposals.
"""
spatial_averaged_roi_pooled_features = tf.reduce_mean(
roi_pooled_features, [1, 2], keep_dims=True, name='AvgPool')
flattened_roi_pooled_features = slim.flatten(
spatial_averaged_roi_pooled_features)
if self._use_dropout:
flattened_roi_pooled_features = slim.dropout(
flattened_roi_pooled_features,
keep_prob=self._dropout_keep_prob,
is_training=self._is_training)
with slim.arg_scope(self._fc_hyperparams_fn()):
class_predictions_with_background = slim.fully_connected(
flattened_roi_pooled_features,
self._num_classes + 1,
activation_fn=None,
scope='ClassPredictor')
class_predictions_with_background = tf.reshape(
class_predictions_with_background, [-1, 1, self._num_classes + 1])
return class_predictions_with_background
@@ -79,10 +79,9 @@ message InputReader {
// Number of groundtruth keypoints per object.
optional uint32 num_keypoints = 16 [default = 0];
// Maximum number of boxes to pad to during training.
// Set this to at least the maximum amount of boxes in the input data.
// Otherwise, it may cause "Data loss: Attempted to pad to a smaller size
// than the input element" errors.
// Maximum number of boxes to pad to during training / evaluation.
// Set this to at least the maximum number of boxes in the input data,
// otherwise some groundtruth boxes may be clipped.
optional int32 max_number_of_boxes = 21 [default=100];
// Whether to load groundtruth instance masks.
......
@@ -12,6 +12,7 @@ import "object_detection/protos/post_processing.proto";
import "object_detection/protos/region_similarity_calculator.proto";
// Configuration for Single Shot Detection (SSD) models.
// Next id: 21
message Ssd {
// Number of classes to predict.
@@ -80,6 +81,22 @@ message Ssd {
// a control dependency on tf.GraphKeys.UPDATE_OPS for train/loss op in order
// to update the batch norm moving average parameters.
optional bool inplace_batchnorm_update = 15 [default = false];
// Whether to weight the regression loss by the score of the ground truth box
// the anchor matches to.
optional bool weight_regression_loss_by_score = 17 [default=false];
// Whether to compute expected loss with respect to balanced positive/negative
// sampling scheme. If false, use explicit sampling.
optional bool use_expected_classification_loss_under_sampling = 18 [default=false];
// Minimum number of effective negative samples.
// Only applies if use_expected_classification_loss_under_sampling is true.
optional float minimum_negative_sampling = 19 [default=0];
// Desired number of effective negative samples per positive sample.
// Only applies if use_expected_classification_loss_under_sampling is true.
optional float desired_negative_sampling_ratio = 20 [default=3];
}
@@ -147,3 +164,4 @@ message FeaturePyramidNetworks {
// maximum level in feature pyramid
optional int32 max_level = 2 [default = 7];
}
@@ -6,7 +6,7 @@ import "object_detection/protos/optimizer.proto";
import "object_detection/protos/preprocessor.proto";
// Message for configuring DetectionModel training jobs (train.py).
// Next id: 26
message TrainConfig {
// Effective batch size to use for training.
// For TPU (or sync SGD jobs), the batch size per core (or GPU) is going to be
@@ -61,7 +61,13 @@ message TrainConfig {
// amount.
optional float bias_grad_multiplier = 11 [default=0];
// Variables that should be updated during training. Note that variables which
// also match the patterns in freeze_variables will be excluded.
repeated string update_trainable_variables = 25;

// Variables that should not be updated during training. If
// update_trainable_variables is not empty, the freeze_variables patterns are
// applied only to the variables it selects.
repeated string freeze_variables = 12;
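How the two pattern lists could compose can be sketched as a plain-Python filter. The helper name and the use of `re.search` here are assumptions for illustration, not the library's actual variable-matching code:

```python
import re

# Hypothetical sketch: first keep variables matching any include pattern
# (when update_trainable_variables is non-empty), then drop any that match
# a freeze_variables pattern.

def trainable_subset(var_names, update_patterns, freeze_patterns):
    if update_patterns:
        var_names = [v for v in var_names
                     if any(re.search(p, v) for p in update_patterns)]
    return [v for v in var_names
            if not any(re.search(p, v) for p in freeze_patterns)]

names = ['FeatureExtractor/conv1/weights',
         'BoxPredictor/conv/weights',
         'FeatureExtractor/conv1/BatchNorm/gamma']
print(trainable_subset(names, ['FeatureExtractor'], ['BatchNorm']))
# -> ['FeatureExtractor/conv1/weights']
```

This matches the comments above: freeze patterns always win, and the include list (when present) narrows the candidate set before freezing is applied.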
// Number of replicas to aggregate before making parameter updates.
......
@@ -91,6 +91,23 @@ class DetectionEvaluator(object):
"""
pass
  def get_estimator_eval_metric_ops(self, eval_dict):
    """Returns a dict of metrics to use with `tf.estimator.EstimatorSpec`.

    Note that this must only be implemented if performing evaluation with a
    `tf.estimator.Estimator`.

    Args:
      eval_dict: A dictionary that holds tensors for evaluating an object
        detection model, returned from
        eval_util.result_dict_for_single_example().

    Returns:
      A dictionary of metric names to tuples of (value_op, update_op) that can
      be used as eval metric ops in `tf.estimator.EstimatorSpec`.
    """
    pass
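The `(value_op, update_op)` contract described in the docstring can be illustrated without TensorFlow; the closures below merely stand in for the ops an `Estimator` would run, so this is an illustrative sketch rather than API code:

```python
# Pure-Python illustration of the (value_op, update_op) pair that
# get_estimator_eval_metric_ops returns per metric name. In TensorFlow both
# entries are ops; here closures over shared state play their roles.

def make_running_mean_metric():
    state = {'total': 0.0, 'count': 0}

    def update_op(value):           # run once per evaluation batch
        state['total'] += value
        state['count'] += 1

    def value_op():                 # read the aggregated metric value
        return state['total'] / max(state['count'], 1)

    return value_op, update_op

value_op, update_op = make_running_mean_metric()
for batch_loss in [0.5, 0.3, 0.4]:
    update_op(batch_loss)
print(value_op())  # mean of the three batch values
```

The Estimator evaluation loop works the same way: it runs every `update_op` on each batch, then fetches each `value_op` once at the end.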
@abstractmethod
def evaluate(self):
"""Evaluates detections and returns a dictionary of metrics."""
......
@@ -1008,15 +1008,15 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None):
def expected_classification_loss_under_sampling(batch_cls_targets, cls_losses,
desired_negative_sampling_ratio,
minimum_negative_sampling):
"""Computes classification loss by background/foreground weighting.
The weighting is such that the effective background/foreground weight ratio
is the desired_negative_sampling_ratio. If p_i is the foreground probability
of anchor a_i, L(a_i) is the anchor's loss, N is the number of anchors, and M
is the sum of foreground probabilities across anchors, then the total loss L
is calculated as:
beta = K*M/(N-M)
L = sum_{i=1}^N [p_i + beta * (1 - p_i)] * (L(a_i))
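The formula above can be checked with a small pure-Python example (the numbers are illustrative): foreground anchors keep weight 1, while background anchors are weighted by beta.

```python
# Worked example of the expected-sampling weighting:
#   beta = K * M / (N - M)
#   L = sum_i [p_i + beta * (1 - p_i)] * L(a_i)
# where K is the desired negative-to-positive ratio, N the number of
# anchors, and M the sum of foreground probabilities. Illustrative numbers.

K = 3.0                        # desired_negative_sampling_ratio
p = [1.0, 1.0, 0.0, 0.0]       # foreground probability per anchor
losses = [2.0, 2.0, 1.0, 1.0]  # per-anchor classification loss
N = len(p)
M = sum(p)                     # = 2.0
beta = K * M / (N - M)         # = 3.0

total = sum((p_i + beta * (1 - p_i)) * l for p_i, l in zip(p, losses))
# foreground anchors keep weight 1, background anchors get weight beta = 3,
# so total = 2*2 + 3*1 + 3*1 = 10
print(total)
```

Note how beta scales with M: the more foreground mass an image has, the more weight each background anchor receives, keeping the effective background/foreground ratio at K.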
@@ -1027,14 +1027,14 @@ def expected_classification_loss_under_sampling(batch_cls_targets, cls_losses,
the class distribution for the target assigned to a given anchor.
cls_losses: Float tensor of shape [batch_size, num_anchors]
representing anchorwise classification losses.
desired_negative_sampling_ratio: The desired background/foreground weight
ratio.
minimum_negative_sampling: Minimum number of effective negative samples.
Used only when there are no positive examples.
Returns:
The classification loss.
"""
num_anchors = tf.cast(tf.shape(batch_cls_targets)[1], tf.float32)
# find the p_i
@@ -1042,7 +1042,7 @@ def expected_classification_loss_under_sampling(batch_cls_targets, cls_losses,
foreground_probabilities_from_targets(batch_cls_targets))
foreground_sum = tf.reduce_sum(foreground_probabilities, axis=-1)
k = desired_negative_sampling_ratio
# compute beta
denominators = (num_anchors - foreground_sum)
@@ -1053,7 +1053,8 @@ def expected_classification_loss_under_sampling(batch_cls_targets, cls_losses,
# where the foreground sum is zero, use a minimum negative weight.
min_negative_weight = 1.0 * minimum_negative_sampling / num_anchors
beta = tf.where(
tf.equal(foreground_sum, 0), min_negative_weight * tf.ones_like(beta),
beta)
beta = tf.reshape(beta, [-1, 1])
cls_loss_weights = foreground_probabilities + (
......
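The fix in this hunk changes the `tf.where` condition from `beta == 0` to `foreground_sum == 0`, so that images with no foreground anchors fall back to the minimum negative weight. A plain-Python sketch of the intended guard (hypothetical helper, illustrative values):

```python
# When an image has no foreground (foreground_sum == 0), beta would be 0 and
# background anchors would receive zero weight; the fix substitutes a
# minimum negative weight instead. Hypothetical scalar version of the guard.

def effective_beta(foreground_sum, num_anchors, k, minimum_negative_sampling):
    min_negative_weight = 1.0 * minimum_negative_sampling / num_anchors
    if foreground_sum == 0:        # the corrected condition
        return min_negative_weight
    return k * foreground_sum / (num_anchors - foreground_sum)

print(effective_beta(0.0, 100, 3.0, 5.0))   # no positives -> 5/100 = 0.05
print(effective_beta(10.0, 100, 3.0, 5.0))  # normal case -> 3*10/90
```

Checking `foreground_sum` rather than `beta` matters because `beta` can also be zero for other reasons (e.g. `k == 0`), in which case the minimum-weight fallback should not apply.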