Unverified Commit 59f7e80a authored by pkulzc's avatar pkulzc Committed by GitHub

Update object detection post-processing and fix box padding/clipping issues. (#5026)

* Merged commit includes the following changes:
207771702  by Zhichao Lu:

    Refactoring evaluation utilities so that it is easier to introduce new DetectionEvaluators with eval_metric_ops.

--
207758641  by Zhichao Lu:

    Require tensorflow version 1.9+ for running object detection API.

--
207641470  by Zhichao Lu:

    Clip `num_groundtruth_boxes` in pad_input_data_to_static_shapes() to `max_num_boxes`. This prevents a scenario where tensors are sliced to an invalid range in model_lib.unstack_batch().
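    A minimal NumPy sketch of the clamp-then-pad behavior described above (the
    function and names are illustrative, not the actual
    pad_input_data_to_static_shapes implementation):

    ```python
    import numpy as np

    def clip_and_pad_boxes(boxes, num_groundtruth_boxes, max_num_boxes):
        # Clamp the stored count first; without this clamp, a count larger
        # than max_num_boxes would later drive slicing over an invalid range.
        num_valid = min(num_groundtruth_boxes, max_num_boxes)
        padded = np.zeros((max_num_boxes, 4), dtype=boxes.dtype)
        padded[:num_valid] = boxes[:num_valid]
        return padded, num_valid

    boxes = np.random.uniform(size=(120, 4)).astype(np.float32)
    padded, num_valid = clip_and_pad_boxes(boxes, num_groundtruth_boxes=120,
                                           max_num_boxes=100)
    ```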

--
207621728  by Zhichao Lu:

    This CL adds a FreezableBatchNorm that inherits from the Keras BatchNormalization layer, but supports freezing the `training` parameter at construction time instead of having to do it in the `call` method.

    It also adds a method to the `KerasLayerHyperparams` class that will build an appropriate FreezableBatchNorm layer according to the hyperparameter configuration. If batch_norm is disabled, this method returns an Identity layer.

    These will be used to simplify the conversion to Keras APIs.
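    The construction-time freezing pattern can be sketched in plain Python (a
    hypothetical stand-in, not the actual FreezableBatchNorm class):

    ```python
    class FreezableLayerSketch:
        """Illustrative stand-in for the freezing pattern described above."""

        def __init__(self, training=None):
            # None means "defer to the call-time argument", as in Keras layers.
            self._frozen_training = training

        def call(self, inputs, training=True):
            # A value frozen at construction overrides the caller's choice.
            if self._frozen_training is not None:
                training = self._frozen_training
            # A real layer would apply batch norm here; we only report the mode.
            return 'train' if training else 'inference'
    ```

    A layer built with `training=False` thus stays in inference mode even when
    the surrounding model calls it with `training=True`.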

--
207610524  by Zhichao Lu:

    Update anchor generators and box predictors for python3 compatibility.

--
207585122  by Zhichao Lu:

    Refactoring convolutional box predictor into separate prediction heads.

--
207549305  by Zhichao Lu:

    Pass all ones as batch weights if none are specified in the groundtruth.

--
207336575  by Zhichao Lu:

    Move the new argument 'target_assigner_instance' to the end of the list of arguments to the ssd_meta_arch constructor for backwards compatibility.

--
207327862  by Zhichao Lu:

    Enable support for float output in quantized custom op for postprocessing in SSD Mobilenet model.

--
207323154  by Zhichao Lu:

    Bug fix: change dict.iteritems() to dict.items()

--
207301109  by Zhichao Lu:

    Integrating expected_classification_loss_under_sampling op as an option in the ssd_meta_arch

--
207286221  by Zhichao Lu:

    Adding an option to weight regression loss with foreground scores from the ground truth labels.
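    The weighting amounts to an elementwise multiply of per-anchor losses by
    matched groundtruth scores; a tiny NumPy illustration (all values made up):

    ```python
    import numpy as np

    # Per-anchor regression losses and the foreground score of the groundtruth
    # box each anchor matched to.
    reg_losses = np.array([0.5, 1.0, 0.25])
    matched_gt_scores = np.array([1.0, 0.3, 0.0])

    # Anchors matched to score-zero groundtruth contribute nothing.
    weighted_loss = float((reg_losses * matched_gt_scores).sum())
    ```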

--
207231739  by Zhichao Lu:

    Explicitly mentioning the argument names when calling the batch target assigner.

--
207206356  by Zhichao Lu:

    Add include_trainable_variables field to train config to better handle trainable variables.
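    As the train.proto hunk later in this diff shows, the field landed as
    `update_trainable_variables`; a hypothetical config fragment (variable
    patterns are illustrative):

    ```proto
    train_config {
      # Only variables matching this pattern are updated during training...
      update_trainable_variables: "FeatureExtractor"
      # ...except those that also match a freeze pattern.
      freeze_variables: "FeatureExtractor/Conv2d_0"
    }
    ```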

--
207135930  by Zhichao Lu:

    Internal change.

--
206862541  by Zhichao Lu:

    Do not unpad the outputs from batch_non_max_suppression before sampling.

    Since BalancedPositiveNegativeSampler takes an indicator for valid positions to sample from we can pass the output from NMS directly into Sampler.
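    The idea of sampling directly from padded NMS output via a validity
    indicator can be sketched in NumPy (names are illustrative, not the actual
    BalancedPositiveNegativeSampler API):

    ```python
    import numpy as np

    def sample_with_indicator(valid_indicator, sample_size, rng):
        # Only positions flagged valid are candidates, so padded NMS slots
        # (indicator False) can be passed through without unpadding first.
        valid_idx = np.flatnonzero(valid_indicator)
        take = min(sample_size, valid_idx.size)
        chosen = rng.choice(valid_idx, size=take, replace=False)
        mask = np.zeros(valid_indicator.shape, dtype=bool)
        mask[chosen] = True
        return mask

    rng = np.random.default_rng(0)
    # Two real detections followed by two padded slots from NMS.
    indicator = np.array([True, True, False, False])
    sample_mask = sample_with_indicator(indicator, sample_size=3, rng=rng)
    ```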

--

PiperOrigin-RevId: 207771702

* Remove unused doc.
parent fb6bc29b
@@ -13,32 +13,47 @@
# limitations under the License.
# ==============================================================================
"""Base Mask RCNN head class."""
"""Base head class.
All the different kinds of prediction heads in different models will inherit
from this class. What all head classes have in common is a `predict` function
that receives `features` as its first argument.
How to add a new prediction head to an existing meta architecture?
For example, how can we add a `3d shape` prediction head to Mask RCNN?
We have to take the following steps to add a new prediction head to an
existing meta arch:
(a) Add a class for predicting the head. This class should inherit from the
`Head` class below and have a `predict` function that receives the features
and predicts the output. The output is always a tf.float32 tensor.
(b) Add the head to the meta architecture. For example in case of Mask RCNN,
go to box_predictor_builder and put in the logic for adding the new head to the
Mask RCNN box predictor.
(c) Add the logic for computing the loss for the new head.
(d) Add the necessary metrics for the new head.
(e) (optional) Add visualization for the new head.
"""
from abc import abstractmethod
class MaskRCNNHead(object):
class Head(object):
"""Mask RCNN head base class."""
def __init__(self):
"""Constructor."""
pass
def predict(self, roi_pooled_features):
@abstractmethod
def predict(self, features, num_predictions_per_location):
"""Returns the head's predictions.
Args:
roi_pooled_features: A float tensor of shape
[batch_size, height, width, channels] containing ROI pooled features
from a batch of boxes.
"""
return self._predict(roi_pooled_features)
features: A float tensor of features.
num_predictions_per_location: Int containing number of predictions per
location.
@abstractmethod
def _predict(self, roi_pooled_features):
"""The abstract internal prediction function that needs to be overloaded.
Args:
roi_pooled_features: A float tensor of shape
[batch_size, height, width, channels] containing ROI pooled features
from a batch of boxes.
Returns:
A tf.float32 tensor.
"""
pass
@@ -13,15 +13,27 @@
# limitations under the License.
# ==============================================================================
"""Mask R-CNN Keypoint Head."""
"""Keypoint Head.
Contains Keypoint prediction head classes for different meta architectures.
All the keypoint prediction heads have a predict function that receives the
`features` as the first argument and returns `keypoint_predictions`.
Keypoints can represent human body joint locations, as in the Mask RCNN
paper, or the locations of other object parts.
"""
import tensorflow as tf
from object_detection.predictors.mask_rcnn_heads import mask_rcnn_head
from object_detection.predictors.heads import head
slim = tf.contrib.slim
class KeypointHead(mask_rcnn_head.MaskRCNNHead):
"""Mask RCNN keypoint prediction head."""
class MaskRCNNKeypointHead(head.Head):
"""Mask RCNN keypoint prediction head.
Please refer to Mask RCNN paper:
https://arxiv.org/abs/1703.06870
"""
def __init__(self,
num_keypoints=17,
@@ -48,7 +60,7 @@ class KeypointHead(mask_rcnn_head.MaskRCNNHead):
based on the number of object classes and the number of channels in the
image features.
"""
super(KeypointHead, self).__init__()
super(MaskRCNNKeypointHead, self).__init__()
self._num_keypoints = num_keypoints
self._conv_hyperparams_fn = conv_hyperparams_fn
self._keypoint_heatmap_height = keypoint_heatmap_height
@@ -57,20 +69,27 @@ class KeypointHead(mask_rcnn_head.MaskRCNNHead):
keypoint_prediction_num_conv_layers)
self._keypoint_prediction_conv_depth = keypoint_prediction_conv_depth
def _predict(self, roi_pooled_features):
def predict(self, features, num_predictions_per_location=1):
"""Performs keypoint prediction.
Args:
roi_pooled_features: A float tensor of shape [batch_size, height, width,
features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
num_predictions_per_location: Int containing number of predictions per
location.
Returns:
instance_masks: A float tensor of shape
[batch_size, 1, num_keypoints, heatmap_height, heatmap_width].
Raises:
ValueError: If num_predictions_per_location is not 1.
"""
if num_predictions_per_location != 1:
raise ValueError('Only num_predictions_per_location=1 is supported')
with slim.arg_scope(self._conv_hyperparams_fn()):
net = slim.conv2d(
roi_pooled_features,
features,
self._keypoint_prediction_conv_depth, [3, 3],
scope='conv_1')
for i in range(1, self._keypoint_prediction_num_conv_layers):
......
@@ -13,17 +13,17 @@
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.predictors.mask_rcnn_heads.keypoint_head."""
"""Tests for object_detection.predictors.heads.keypoint_head."""
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.predictors.mask_rcnn_heads import keypoint_head
from object_detection.predictors.heads import keypoint_head
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
class KeypointHeadTest(test_case.TestCase):
class MaskRCNNKeypointHeadTest(test_case.TestCase):
def _build_arg_scope_with_hyperparams(self,
op_type=hyperparams_pb2.Hyperparams.FC):
@@ -44,13 +44,12 @@ class KeypointHeadTest(test_case.TestCase):
return hyperparams_builder.build(hyperparams, is_training=True)
def test_prediction_size(self):
keypoint_prediction_head = keypoint_head.KeypointHead(
keypoint_prediction_head = keypoint_head.MaskRCNNKeypointHead(
conv_hyperparams_fn=self._build_arg_scope_with_hyperparams())
roi_pooled_features = tf.random_uniform(
[64, 14, 14, 1024], minval=-2.0, maxval=2.0, dtype=tf.float32)
prediction = keypoint_prediction_head.predict(
roi_pooled_features=roi_pooled_features)
tf.logging.info(prediction.shape)
features=roi_pooled_features, num_predictions_per_location=1)
self.assertAllEqual([64, 1, 17, 56, 56], prediction.get_shape().as_list())
......
@@ -13,17 +13,26 @@
# limitations under the License.
# ==============================================================================
"""Mask R-CNN Mask Head."""
"""Mask Head.
Contains Mask prediction head classes for different meta architectures.
All the mask prediction heads have a predict function that receives the
`features` as the first argument and returns `mask_predictions`.
"""
import math
import tensorflow as tf
from object_detection.predictors.mask_rcnn_heads import mask_rcnn_head
from object_detection.predictors.heads import head
slim = tf.contrib.slim
class MaskHead(mask_rcnn_head.MaskRCNNHead):
"""Mask RCNN mask prediction head."""
class MaskRCNNMaskHead(head.Head):
"""Mask RCNN mask prediction head.
Please refer to Mask RCNN paper:
https://arxiv.org/abs/1703.06870
"""
def __init__(self,
num_classes,
@@ -57,7 +66,7 @@ class MaskHead(mask_rcnn_head.MaskRCNNHead):
Raises:
ValueError: conv_hyperparams_fn is None.
"""
super(MaskHead, self).__init__()
super(MaskRCNNMaskHead, self).__init__()
self._num_classes = num_classes
self._conv_hyperparams_fn = conv_hyperparams_fn
self._mask_height = mask_height
@@ -102,25 +111,32 @@ class MaskHead(mask_rcnn_head.MaskRCNNHead):
total_weight)
return int(math.pow(2.0, num_conv_channels_log))
def _predict(self, roi_pooled_features):
def predict(self, features, num_predictions_per_location=1):
"""Performs mask prediction.
Args:
roi_pooled_features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
features: A float tensor of shape [batch_size, height, width, channels]
containing features for a batch of images.
num_predictions_per_location: Int containing number of predictions per
location.
Returns:
instance_masks: A float tensor of shape
[batch_size, 1, num_classes, mask_height, mask_width].
Raises:
ValueError: If num_predictions_per_location is not 1.
"""
if num_predictions_per_location != 1:
raise ValueError('Only num_predictions_per_location=1 is supported')
num_conv_channels = self._mask_prediction_conv_depth
if num_conv_channels == 0:
num_feature_channels = roi_pooled_features.get_shape().as_list()[3]
num_feature_channels = features.get_shape().as_list()[3]
num_conv_channels = self._get_mask_predictor_conv_depth(
num_feature_channels, self._num_classes)
with slim.arg_scope(self._conv_hyperparams_fn()):
upsampled_features = tf.image.resize_bilinear(
roi_pooled_features, [self._mask_height, self._mask_width],
features, [self._mask_height, self._mask_width],
align_corners=True)
for _ in range(self._mask_prediction_num_conv_layers - 1):
upsampled_features = slim.conv2d(
@@ -137,3 +153,182 @@ class MaskHead(mask_rcnn_head.MaskRCNNHead):
tf.transpose(mask_predictions, perm=[0, 3, 1, 2]),
axis=1,
name='MaskPredictor')
class ConvolutionalMaskHead(head.Head):
"""Convolutional class prediction head."""
def __init__(self,
is_training,
num_classes,
use_dropout,
dropout_keep_prob,
kernel_size,
use_depthwise=False,
mask_height=7,
mask_width=7,
masks_are_class_agnostic=False):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: Number of classes.
use_dropout: Option to use dropout or not. Note that a single dropout
op is applied here prior to the mask predictions.
dropout_keep_prob: Keep probability for dropout.
This is only used if use_dropout is True.
kernel_size: Size of final convolution kernel. If the
spatial resolution of the feature map is smaller than the kernel size,
then the kernel size is automatically set to be
min(feature_width, feature_height).
use_depthwise: Whether to use depthwise convolutions for prediction
steps. Default is False.
mask_height: Desired output mask height. The default value is 7.
mask_width: Desired output mask width. The default value is 7.
masks_are_class_agnostic: Boolean determining if the mask-head is
class-agnostic or not.
Raises:
ValueError: if min_depth > max_depth.
"""
super(ConvolutionalMaskHead, self).__init__()
self._is_training = is_training
self._num_classes = num_classes
self._use_dropout = use_dropout
self._dropout_keep_prob = dropout_keep_prob
self._kernel_size = kernel_size
self._use_depthwise = use_depthwise
self._mask_height = mask_height
self._mask_width = mask_width
self._masks_are_class_agnostic = masks_are_class_agnostic
def predict(self, features, num_predictions_per_location):
"""Predicts boxes.
Args:
features: A float tensor of shape [batch_size, height, width, channels]
containing image features.
num_predictions_per_location: Number of box predictions to be made per
spatial location.
Returns:
mask_predictions: A float tensor of shape
[batch_size, num_anchors, num_masks, mask_height, mask_width]
representing the mask predictions for the proposals.
"""
image_feature = features
# Add a slot for the background class.
if self._masks_are_class_agnostic:
num_masks = 1
else:
num_masks = self._num_classes
num_mask_channels = num_masks * self._mask_height * self._mask_width
net = image_feature
if self._use_dropout:
net = slim.dropout(net, keep_prob=self._dropout_keep_prob)
if self._use_depthwise:
mask_predictions = slim.separable_conv2d(
net, None, [self._kernel_size, self._kernel_size],
padding='SAME', depth_multiplier=1, stride=1,
rate=1, scope='MaskPredictor_depthwise')
mask_predictions = slim.conv2d(
mask_predictions,
num_predictions_per_location * num_mask_channels,
[1, 1],
activation_fn=None,
normalizer_fn=None,
normalizer_params=None,
scope='MaskPredictor')
else:
mask_predictions = slim.conv2d(
net,
num_predictions_per_location * num_mask_channels,
[self._kernel_size, self._kernel_size],
activation_fn=None,
normalizer_fn=None,
normalizer_params=None,
scope='MaskPredictor')
batch_size = features.get_shape().as_list()[0]
if batch_size is None:
batch_size = tf.shape(features)[0]
mask_predictions = tf.reshape(
mask_predictions,
[batch_size, -1, num_masks, self._mask_height, self._mask_width])
return mask_predictions
# TODO(alirezafathi): See if possible to unify Weight Shared with regular
# convolutional mask head.
class WeightSharedConvolutionalMaskHead(head.Head):
"""Weight shared convolutional mask prediction head."""
def __init__(self,
num_classes,
kernel_size=3,
use_dropout=False,
dropout_keep_prob=0.8,
mask_height=7,
mask_width=7,
masks_are_class_agnostic=False):
"""Constructor.
Args:
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
kernel_size: Size of final convolution kernel.
use_dropout: Whether to apply dropout to the mask prediction head.
dropout_keep_prob: Probability of keeping activations.
mask_height: Desired output mask height. The default value is 7.
mask_width: Desired output mask width. The default value is 7.
masks_are_class_agnostic: Boolean determining if the mask-head is
class-agnostic or not.
"""
super(WeightSharedConvolutionalMaskHead, self).__init__()
self._num_classes = num_classes
self._kernel_size = kernel_size
self._use_dropout = use_dropout
self._dropout_keep_prob = dropout_keep_prob
self._mask_height = mask_height
self._mask_width = mask_width
self._masks_are_class_agnostic = masks_are_class_agnostic
def predict(self, features, num_predictions_per_location):
"""Predicts boxes.
Args:
features: A float tensor of shape [batch_size, height, width, channels]
containing image features.
num_predictions_per_location: Number of box predictions to be made per
spatial location.
Returns:
mask_predictions: A tensor of shape
[batch_size, num_anchors, num_classes, mask_height, mask_width]
representing the mask predictions for the proposals.
"""
mask_predictions_net = features
if self._masks_are_class_agnostic:
num_masks = 1
else:
num_masks = self._num_classes
num_mask_channels = num_masks * self._mask_height * self._mask_width
if self._use_dropout:
mask_predictions_net = slim.dropout(
mask_predictions_net, keep_prob=self._dropout_keep_prob)
mask_predictions = slim.conv2d(
mask_predictions_net,
num_predictions_per_location * num_mask_channels,
[self._kernel_size, self._kernel_size],
activation_fn=None, stride=1, padding='SAME',
normalizer_fn=None,
scope='MaskPredictor')
batch_size = features.get_shape().as_list()[0]
if batch_size is None:
batch_size = tf.shape(features)[0]
mask_predictions = tf.reshape(
mask_predictions,
[batch_size, -1, num_masks, self._mask_height, self._mask_width])
return mask_predictions
@@ -13,17 +13,17 @@
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.predictors.mask_rcnn_heads.mask_head."""
"""Tests for object_detection.predictors.heads.mask_head."""
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.predictors.mask_rcnn_heads import mask_head
from object_detection.predictors.heads import mask_head
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
class MaskHeadTest(test_case.TestCase):
class MaskRCNNMaskHeadTest(test_case.TestCase):
def _build_arg_scope_with_hyperparams(self,
op_type=hyperparams_pb2.Hyperparams.FC):
@@ -44,7 +44,7 @@ class MaskHeadTest(test_case.TestCase):
return hyperparams_builder.build(hyperparams, is_training=True)
def test_prediction_size(self):
mask_prediction_head = mask_head.MaskHead(
mask_prediction_head = mask_head.MaskRCNNMaskHead(
num_classes=20,
conv_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
mask_height=14,
@@ -55,10 +55,115 @@ class MaskHeadTest(test_case.TestCase):
roi_pooled_features = tf.random_uniform(
[64, 7, 7, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
prediction = mask_prediction_head.predict(
roi_pooled_features=roi_pooled_features)
tf.logging.info(prediction.shape)
features=roi_pooled_features, num_predictions_per_location=1)
self.assertAllEqual([64, 1, 20, 14, 14], prediction.get_shape().as_list())
class ConvolutionalMaskPredictorTest(test_case.TestCase):
def _build_arg_scope_with_hyperparams(
self, op_type=hyperparams_pb2.Hyperparams.CONV):
hyperparams = hyperparams_pb2.Hyperparams()
hyperparams_text_proto = """
activation: NONE
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(hyperparams_text_proto, hyperparams)
hyperparams.op = op_type
return hyperparams_builder.build(hyperparams, is_training=True)
def test_prediction_size(self):
mask_prediction_head = mask_head.ConvolutionalMaskHead(
is_training=True,
num_classes=20,
use_dropout=True,
dropout_keep_prob=0.5,
kernel_size=3,
mask_height=7,
mask_width=7)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
mask_predictions = mask_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
self.assertAllEqual([64, 323, 20, 7, 7],
mask_predictions.get_shape().as_list())
def test_class_agnostic_prediction_size(self):
mask_prediction_head = mask_head.ConvolutionalMaskHead(
is_training=True,
num_classes=20,
use_dropout=True,
dropout_keep_prob=0.5,
kernel_size=3,
mask_height=7,
mask_width=7,
masks_are_class_agnostic=True)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
mask_predictions = mask_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
self.assertAllEqual([64, 323, 1, 7, 7],
mask_predictions.get_shape().as_list())
class WeightSharedConvolutionalMaskPredictorTest(test_case.TestCase):
def _build_arg_scope_with_hyperparams(
self, op_type=hyperparams_pb2.Hyperparams.CONV):
hyperparams = hyperparams_pb2.Hyperparams()
hyperparams_text_proto = """
activation: NONE
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(hyperparams_text_proto, hyperparams)
hyperparams.op = op_type
return hyperparams_builder.build(hyperparams, is_training=True)
def test_prediction_size(self):
mask_prediction_head = (
mask_head.WeightSharedConvolutionalMaskHead(
num_classes=20,
mask_height=7,
mask_width=7))
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
mask_predictions = mask_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
self.assertAllEqual([64, 323, 20, 7, 7],
mask_predictions.get_shape().as_list())
def test_class_agnostic_prediction_size(self):
mask_prediction_head = (
mask_head.WeightSharedConvolutionalMaskHead(
num_classes=20,
mask_height=7,
mask_width=7,
masks_are_class_agnostic=True))
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
mask_predictions = mask_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
self.assertAllEqual([64, 323, 1, 7, 7],
mask_predictions.get_shape().as_list())
if __name__ == '__main__':
tf.test.main()
@@ -126,15 +126,18 @@ class MaskRCNNBoxPredictor(box_predictor.BoxPredictor):
if prediction_stage == 2:
predictions_dict[BOX_ENCODINGS] = self._box_prediction_head.predict(
roi_pooled_features=image_feature)
features=image_feature,
num_predictions_per_location=num_predictions_per_location[0])
predictions_dict[CLASS_PREDICTIONS_WITH_BACKGROUND] = (
self._class_prediction_head.predict(roi_pooled_features=image_feature)
)
self._class_prediction_head.predict(
features=image_feature,
num_predictions_per_location=num_predictions_per_location[0]))
elif prediction_stage == 3:
for prediction_head in self.get_third_stage_prediction_heads():
head_object = self._third_stage_heads[prediction_head]
predictions_dict[prediction_head] = head_object.predict(
roi_pooled_features=image_feature)
features=image_feature,
num_predictions_per_location=num_predictions_per_location[0])
else:
raise ValueError('prediction_stage should be either 2 or 3.')
......
@@ -18,11 +18,9 @@ import numpy as np
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import box_predictor_builder
from object_detection.builders import hyperparams_builder
from object_detection.predictors import mask_rcnn_box_predictor as box_predictor
from object_detection.predictors.mask_rcnn_heads import box_head
from object_detection.predictors.mask_rcnn_heads import class_head
from object_detection.predictors.mask_rcnn_heads import mask_head
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
@@ -47,45 +45,9 @@ class MaskRCNNBoxPredictorTest(test_case.TestCase):
hyperparams.op = op_type
return hyperparams_builder.build(hyperparams, is_training=True)
def _box_predictor_builder(self,
is_training,
num_classes,
fc_hyperparams_fn,
use_dropout,
dropout_keep_prob,
box_code_size,
share_box_across_classes=False,
conv_hyperparams_fn=None,
predict_instance_masks=False):
box_prediction_head = box_head.BoxHead(
is_training=is_training,
num_classes=num_classes,
fc_hyperparams_fn=fc_hyperparams_fn,
use_dropout=use_dropout,
dropout_keep_prob=dropout_keep_prob,
box_code_size=box_code_size,
share_box_across_classes=share_box_across_classes)
class_prediction_head = class_head.ClassHead(
is_training=is_training,
num_classes=num_classes,
fc_hyperparams_fn=fc_hyperparams_fn,
use_dropout=use_dropout,
dropout_keep_prob=dropout_keep_prob)
third_stage_heads = {}
if predict_instance_masks:
third_stage_heads[box_predictor.MASK_PREDICTIONS] = mask_head.MaskHead(
num_classes=num_classes,
conv_hyperparams_fn=conv_hyperparams_fn)
return box_predictor.MaskRCNNBoxPredictor(
is_training=is_training,
num_classes=num_classes,
box_prediction_head=box_prediction_head,
class_prediction_head=class_prediction_head,
third_stage_heads=third_stage_heads)
def test_get_boxes_with_five_classes(self):
def graph_fn(image_features):
mask_box_predictor = self._box_predictor_builder(
mask_box_predictor = box_predictor_builder.build_mask_rcnn_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
@@ -109,7 +71,7 @@ class MaskRCNNBoxPredictorTest(test_case.TestCase):
def test_get_boxes_with_five_classes_share_box_across_classes(self):
def graph_fn(image_features):
mask_box_predictor = self._box_predictor_builder(
mask_box_predictor = box_predictor_builder.build_mask_rcnn_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
@@ -134,7 +96,7 @@ class MaskRCNNBoxPredictorTest(test_case.TestCase):
def test_value_error_on_predict_instance_masks_with_no_conv_hyperparms(self):
with self.assertRaises(ValueError):
self._box_predictor_builder(
box_predictor_builder.build_mask_rcnn_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
@@ -145,7 +107,7 @@ class MaskRCNNBoxPredictorTest(test_case.TestCase):
def test_get_instance_masks(self):
def graph_fn(image_features):
mask_box_predictor = self._box_predictor_builder(
mask_box_predictor = box_predictor_builder.build_mask_rcnn_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
@@ -167,7 +129,7 @@ class MaskRCNNBoxPredictorTest(test_case.TestCase):
def test_do_not_return_instance_masks_without_request(self):
image_features = tf.random_uniform([2, 7, 7, 3], dtype=tf.float32)
mask_box_predictor = self._box_predictor_builder(
mask_box_predictor = box_predictor_builder.build_mask_rcnn_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
......
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Mask R-CNN Class Head."""
import tensorflow as tf
from object_detection.predictors.mask_rcnn_heads import mask_rcnn_head
slim = tf.contrib.slim
class ClassHead(mask_rcnn_head.MaskRCNNHead):
"""Mask RCNN class prediction head."""
def __init__(self, is_training, num_classes, fc_hyperparams_fn,
use_dropout, dropout_keep_prob):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
fc_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for fully connected ops.
use_dropout: Option to use dropout or not. Note that a single dropout
op is applied here prior to both box and class predictions, which stands
in contrast to the ConvolutionalBoxPredictor below.
dropout_keep_prob: Keep probability for dropout.
This is only used if use_dropout is True.
"""
super(ClassHead, self).__init__()
self._is_training = is_training
self._num_classes = num_classes
self._fc_hyperparams_fn = fc_hyperparams_fn
self._use_dropout = use_dropout
self._dropout_keep_prob = dropout_keep_prob
def _predict(self, roi_pooled_features):
"""Predicts boxes and class scores.
Args:
roi_pooled_features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
Returns:
class_predictions_with_background: A float tensor of shape
[batch_size, 1, num_classes + 1] representing the class predictions for
the proposals.
"""
spatial_averaged_roi_pooled_features = tf.reduce_mean(
roi_pooled_features, [1, 2], keep_dims=True, name='AvgPool')
flattened_roi_pooled_features = slim.flatten(
spatial_averaged_roi_pooled_features)
if self._use_dropout:
flattened_roi_pooled_features = slim.dropout(
flattened_roi_pooled_features,
keep_prob=self._dropout_keep_prob,
is_training=self._is_training)
with slim.arg_scope(self._fc_hyperparams_fn()):
class_predictions_with_background = slim.fully_connected(
flattened_roi_pooled_features,
self._num_classes + 1,
activation_fn=None,
scope='ClassPredictor')
class_predictions_with_background = tf.reshape(
class_predictions_with_background, [-1, 1, self._num_classes + 1])
return class_predictions_with_background
@@ -79,10 +79,9 @@ message InputReader {
// Number of groundtruth keypoints per object.
optional uint32 num_keypoints = 16 [default = 0];
// Maximum number of boxes to pad to during training.
// Set this to at least the maximum amount of boxes in the input data.
// Otherwise, it may cause "Data loss: Attempted to pad to a smaller size
// than the input element" errors.
// Maximum number of boxes to pad to during training / evaluation.
// Set this to at least the maximum number of boxes in the input data,
// otherwise some groundtruth boxes may be clipped.
optional int32 max_number_of_boxes = 21 [default=100];
// Whether to load groundtruth instance masks.
......
@@ -12,6 +12,7 @@ import "object_detection/protos/post_processing.proto";
import "object_detection/protos/region_similarity_calculator.proto";
// Configuration for Single Shot Detection (SSD) models.
// Next id: 21
message Ssd {
// Number of classes to predict.
@@ -80,6 +81,22 @@ message Ssd {
// a control dependency on tf.GraphKeys.UPDATE_OPS for train/loss op in order
// to update the batch norm moving average parameters.
optional bool inplace_batchnorm_update = 15 [default = false];
// Whether to weight the regression loss by the score of the ground truth box
// the anchor matches to.
optional bool weight_regression_loss_by_score = 17 [default=false];
// Whether to compute expected loss with respect to balanced positive/negative
// sampling scheme. If false, use explicit sampling.
optional bool use_expected_classification_loss_under_sampling = 18 [default=false];
// Minimum number of effective negative samples.
// Only applies if use_expected_classification_loss_under_sampling is true.
optional float minimum_negative_sampling = 19 [default=0];
// Desired number of effective negative samples per positive sample.
// Only applies if use_expected_classification_loss_under_sampling is true.
optional float desired_negative_sampling_ratio = 20 [default=3];
}
@@ -147,3 +164,4 @@ message FeaturePyramidNetworks {
// maximum level in feature pyramid
optional int32 max_level = 2 [default = 7];
}
@@ -6,7 +6,7 @@ import "object_detection/protos/optimizer.proto";
import "object_detection/protos/preprocessor.proto";
// Message for configuring DetectionModel training jobs (train.py).
// Next id: 26
message TrainConfig {
// Effective batch size to use for training.
// For TPU (or sync SGD jobs), the batch size per core (or GPU) is going to be
@@ -61,7 +61,13 @@ message TrainConfig {
// amount.
optional float bias_grad_multiplier = 11 [default=0];
// Variables that should be updated during training. Note that variables which
// also match the patterns in freeze_variables will be excluded.
repeated string update_trainable_variables = 25;

// Variables that should not be updated during training. If
// update_trainable_variables is not empty, the freeze_variables patterns are
// applied only to the variables it selects.
repeated string freeze_variables = 12;
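How the two pattern lists could compose can be sketched as a plain-Python filter. The helper name and the use of `re.search` here are assumptions for illustration, not the library's actual variable-matching code:

```python
import re

# Hypothetical sketch: first keep variables matching any include pattern
# (when update_trainable_variables is non-empty), then drop any that match
# a freeze_variables pattern.

def trainable_subset(var_names, update_patterns, freeze_patterns):
    if update_patterns:
        var_names = [v for v in var_names
                     if any(re.search(p, v) for p in update_patterns)]
    return [v for v in var_names
            if not any(re.search(p, v) for p in freeze_patterns)]

names = ['FeatureExtractor/conv1/weights',
         'BoxPredictor/conv/weights',
         'FeatureExtractor/conv1/BatchNorm/gamma']
print(trainable_subset(names, ['FeatureExtractor'], ['BatchNorm']))
# -> ['FeatureExtractor/conv1/weights']
```

This matches the comments above: freeze patterns always win, and the include list (when present) narrows the candidate set before freezing is applied.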
// Number of replicas to aggregate before making parameter updates.
......
@@ -91,6 +91,23 @@ class DetectionEvaluator(object):
"""
pass
  def get_estimator_eval_metric_ops(self, eval_dict):
    """Returns a dict of metrics to use with `tf.estimator.EstimatorSpec`.

    Note that this must only be implemented if performing evaluation with a
    `tf.estimator.Estimator`.

    Args:
      eval_dict: A dictionary that holds tensors for evaluating an object
        detection model, returned from
        eval_util.result_dict_for_single_example().

    Returns:
      A dictionary of metric names to tuples of (value_op, update_op) that can
      be used as eval metric ops in `tf.estimator.EstimatorSpec`.
    """
    pass
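The `(value_op, update_op)` contract described in the docstring can be illustrated without TensorFlow; the closures below merely stand in for the ops an `Estimator` would run, so this is an illustrative sketch rather than API code:

```python
# Pure-Python illustration of the (value_op, update_op) pair that
# get_estimator_eval_metric_ops returns per metric name. In TensorFlow both
# entries are ops; here closures over shared state play their roles.

def make_running_mean_metric():
    state = {'total': 0.0, 'count': 0}

    def update_op(value):           # run once per evaluation batch
        state['total'] += value
        state['count'] += 1

    def value_op():                 # read the aggregated metric value
        return state['total'] / max(state['count'], 1)

    return value_op, update_op

value_op, update_op = make_running_mean_metric()
for batch_loss in [0.5, 0.3, 0.4]:
    update_op(batch_loss)
print(value_op())  # mean of the three batch values
```

The Estimator evaluation loop works the same way: it runs every `update_op` on each batch, then fetches each `value_op` once at the end.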
@abstractmethod
def evaluate(self):
"""Evaluates detections and returns a dictionary of metrics."""
......
@@ -1008,15 +1008,15 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None):
def expected_classification_loss_under_sampling(batch_cls_targets, cls_losses,
desired_negative_sampling_ratio,
minimum_negative_sampling):
"""Computes classification loss by background/foreground weighting.
The weighting is such that the effective background/foreground weight ratio
is the desired_negative_sampling_ratio. If p_i is the foreground probability
of anchor a_i, L(a_i) is the anchor's loss, N is the number of anchors, and M
is the sum of foreground probabilities across anchors, then the total loss L
is calculated as:
beta = K*M/(N-M)
L = sum_{i=1}^N [p_i + beta * (1 - p_i)] * (L(a_i))
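The formula above can be checked with a small pure-Python example (the numbers are illustrative): foreground anchors keep weight 1, while background anchors are weighted by beta.

```python
# Worked example of the expected-sampling weighting:
#   beta = K * M / (N - M)
#   L = sum_i [p_i + beta * (1 - p_i)] * L(a_i)
# where K is the desired negative-to-positive ratio, N the number of
# anchors, and M the sum of foreground probabilities. Illustrative numbers.

K = 3.0                        # desired_negative_sampling_ratio
p = [1.0, 1.0, 0.0, 0.0]       # foreground probability per anchor
losses = [2.0, 2.0, 1.0, 1.0]  # per-anchor classification loss
N = len(p)
M = sum(p)                     # = 2.0
beta = K * M / (N - M)         # = 3.0

total = sum((p_i + beta * (1 - p_i)) * l for p_i, l in zip(p, losses))
# foreground anchors keep weight 1, background anchors get weight beta = 3,
# so total = 2*2 + 3*1 + 3*1 = 10
print(total)
```

Note how beta scales with M: the more foreground mass an image has, the more weight each background anchor receives, keeping the effective background/foreground ratio at K.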
@@ -1027,14 +1027,14 @@ def expected_classification_loss_under_sampling(batch_cls_targets, cls_losses,
the class distribution for the target assigned to a given anchor.
cls_losses: Float tensor of shape [batch_size, num_anchors]
representing anchorwise classification losses.
desired_negative_sampling_ratio: The desired background/foreground weight
ratio.
minimum_negative_sampling: Minimum number of effective negative samples.
Used only when there are no positive examples.
Returns:
The classification loss.
"""
num_anchors = tf.cast(tf.shape(batch_cls_targets)[1], tf.float32)
# find the p_i
@@ -1042,7 +1042,7 @@ def expected_classification_loss_under_sampling(batch_cls_targets, cls_losses,
foreground_probabilities_from_targets(batch_cls_targets))
foreground_sum = tf.reduce_sum(foreground_probabilities, axis=-1)
k = desired_negative_sampling_ratio
# compute beta
denominators = (num_anchors - foreground_sum)
@@ -1053,7 +1053,8 @@ def expected_classification_loss_under_sampling(batch_cls_targets, cls_losses,
# where the foreground sum is zero, use a minimum negative weight.
min_negative_weight = 1.0 * minimum_negative_sampling / num_anchors
beta = tf.where(
tf.equal(foreground_sum, 0), min_negative_weight * tf.ones_like(beta),
beta)
beta = tf.reshape(beta, [-1, 1])
cls_loss_weights = foreground_probabilities + (
......
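The fix in this hunk changes the `tf.where` condition from `beta == 0` to `foreground_sum == 0`, so that images with no foreground anchors fall back to the minimum negative weight. A plain-Python sketch of the intended guard (hypothetical helper, illustrative values):

```python
# When an image has no foreground (foreground_sum == 0), beta would be 0 and
# background anchors would receive zero weight; the fix substitutes a
# minimum negative weight instead. Hypothetical scalar version of the guard.

def effective_beta(foreground_sum, num_anchors, k, minimum_negative_sampling):
    min_negative_weight = 1.0 * minimum_negative_sampling / num_anchors
    if foreground_sum == 0:        # the corrected condition
        return min_negative_weight
    return k * foreground_sum / (num_anchors - foreground_sum)

print(effective_beta(0.0, 100, 3.0, 5.0))   # no positives -> 5/100 = 0.05
print(effective_beta(10.0, 100, 3.0, 5.0))  # normal case -> 3*10/90
```

Checking `foreground_sum` rather than `beta` matters because `beta` can also be zero for other reasons (e.g. `k == 0`), in which case the minimum-weight fallback should not apply.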