Unverified commit 02a9969e authored by pkulzc, committed by GitHub

Refactor object detection box predictors and fix some issues with model_main. (#4965)

* Merged commit includes the following changes:
206852642  by Zhichao Lu:

    Builds the balanced_positive_negative_sampler in the model builder for Faster R-CNN, and adds an option to use the static implementation of the sampler.

--
206803260  by Zhichao Lu:

    Fixes a misplaced argument in resnet fpn feature extractor.

--
206682736  by Zhichao Lu:

    This CL modifies the SSD meta architecture to support both Slim-based and Keras-based box predictors, and begins preparation for Keras box predictor support in the other meta architectures.

    Concretely, this CL adds a new `KerasBoxPredictor` base class and makes the meta architectures appropriately call whichever box predictors they are using.

    We can switch the non-ssd meta architectures to fully support Keras box predictors once the Keras Convolutional Box Predictor CL is submitted.
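
    To make the dual support concrete, here is a minimal sketch (not the actual meta-architecture code) of how a model can dispatch to whichever flavor of box predictor it was built with, assuming `KerasBoxPredictor` sits next to `BoxPredictor` in object_detection/core/box_predictor.py:

        from object_detection.core.box_predictor import KerasBoxPredictor

        def run_box_predictor(predictor, image_features):
          # Keras box predictors capture their hyperparameters at construction
          # time and are invoked like layers; Slim box predictors take a
          # predict() call with extra arguments instead.
          if isinstance(predictor, KerasBoxPredictor):
            return predictor(image_features)
          return predictor.predict(image_features,
                                   num_predictions_per_location=[1])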

--
206669634  by Zhichao Lu:

    Adds an alternate implementation of the balanced positive/negative sampler that uses static shapes.
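
    As an illustration of the static-shape trick (a minimal sketch under assumed shapes, not the repository's implementation): instead of tf.where, whose output size is data dependent, rank every entry by a random score, push ineligible entries to the bottom, and take a fixed-size top_k:

        import tensorflow as tf

        def static_sample(indicator, sample_size):
          # indicator: bool [N]; True marks entries eligible for sampling.
          scores = tf.random_uniform(tf.shape(indicator), maxval=1.0)
          # Eligible entries always outrank ineligible ones.
          scores += 2.0 * tf.cast(indicator, tf.float32)
          _, selected = tf.nn.top_k(scores, k=sample_size)
          mask = tf.one_hot(selected, depth=tf.shape(indicator)[0])
          return tf.cast(tf.reduce_max(mask, axis=0), tf.bool)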

--
206643278  by Zhichao Lu:

    This CL adds a Keras layer hyperparameter configuration object to the hyperparams_builder.

    It automatically converts from Slim layer hyperparameter configs to Keras layer hyperparameters. Namely, it:
    - Builds Keras initializers/regularizers instead of Slim ones
    - sets weights_regularizer/initializer to kernel_regularizer/initializer
    - converts batchnorm decay to momentum
    - converts Slim l2 regularizer weights to the equivalent Keras l2 weights

    This will be used in the conversion of the object detection feature extractors & box predictors to newer TensorFlow APIs.
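
    For example, the mapping looks roughly like this (values are illustrative; the factor-of-2 rule matches the regularizer conversion in the diff below):

        import tensorflow as tf

        # Slim-style hyperparameters...
        slim_params = {
            'weights_regularizer': tf.contrib.layers.l2_regularizer(scale=4e-4),
            'weights_initializer': tf.truncated_normal_initializer(stddev=0.03),
        }
        # ...and their Keras equivalents. weights_* becomes kernel_*, and the
        # Keras l2 weight is half the Slim scale, because Slim computes
        # scale * sum(w**2) / 2 while Keras computes weight * sum(w**2).
        keras_params = {
            'kernel_regularizer': tf.keras.regularizers.l2(0.5 * 4e-4),
            'kernel_initializer': tf.keras.initializers.TruncatedNormal(
                stddev=0.03),
        }
        # Batch norm: Slim's `decay` plays the role of Keras's `momentum`.
        slim_batch_norm = {'decay': 0.997, 'epsilon': 0.001}
        keras_batch_norm = {'momentum': 0.997, 'epsilon': 0.001}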

--
206611681  by Zhichao Lu:

    Internal changes.

--
206591619  by Zhichao Lu:

    Clip to the expected shape when the input tensors are larger than the expected padded static shape.
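
    A minimal sketch of the clipping step (hypothetical helper name; assumes every input dimension is at least as large as the target):

        import tensorflow as tf

        def clip_to_expected_shape(tensor, expected_shape):
          # Keep only the leading `expected_shape` entries of each dimension
          # so downstream ops always see the padded static shape.
          begin = [0] * len(expected_shape)
          return tf.slice(tensor, begin, expected_shape)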

--
206517644  by Zhichao Lu:

    Make MultiscaleGridAnchorGenerator more consistent with MultipleGridAnchorGenerator.

--
206415624  by Zhichao Lu:

    Make the hardcoded feature pyramid network (FPN) levels configurable for both SSD
    Resnet and SSD Mobilenet.
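
    In the pipeline config this surfaces as explicit FPN level fields on the feature extractor; the field names below follow the object detection proto conventions and may differ slightly between releases:

        feature_extractor {
          type: 'ssd_resnet50_v1_fpn'
          fpn {
            min_level: 3
            max_level: 7
          }
        }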

--
206398204  by Zhichao Lu:

    This CL modifies the SSD meta architecture to support both Slim-based and Keras-based feature extractors.

    This allows us to begin the conversion of object detection to newer Tensorflow APIs.

--
206213448  by Zhichao Lu:

    Adds a method to compute the expected classification loss under background/foreground weighting.
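
    A rough sketch of the idea (hypothetical function, simplified from the actual method): scale each anchor's classification loss by separate foreground/background weights before summing, so the expected contribution of the much more numerous background anchors is kept in check:

        import tensorflow as tf

        def weighted_classification_loss(per_anchor_loss, is_foreground,
                                         fg_weight=1.0, bg_weight=0.1):
          # per_anchor_loss: float [N]; is_foreground: bool [N].
          weights = tf.where(
              is_foreground,
              fg_weight * tf.ones_like(per_anchor_loss),
              bg_weight * tf.ones_like(per_anchor_loss))
          return tf.reduce_sum(weights * per_anchor_loss)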

--
206204232  by Zhichao Lu:

    Adding the keypoint head to the Mask RCNN pipeline.

--
206200352  by Zhichao Lu:

    - Create the Faster R-CNN target assigner in the model builder. This allows configuring matchers in the target assigner to use TPU-compatible ops (tf.gather in this case) without any change in the meta architecture.
    - As a positive side effect of the refactoring, we can now re-use a single target assigner for all of the second-stage heads in Faster R-CNN.

--
206178206  by Zhichao Lu:

    Force the SSD feature extractor builder to use keyword arguments so values won't be passed to the wrong arguments.

--
206168297  by Zhichao Lu:

    Updating exporter to use freeze_graph.freeze_graph_with_def_protos rather than a homegrown version.

--
206080748  by Zhichao Lu:

    Merge external contributions.

--
206074460  by Zhichao Lu:

    Updates the preprocessor to apply temperature scaling and a softmax to the multiclass scores on read.
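
    The transformation itself is a one-liner; a sketch with assumed values:

        import tensorflow as tf

        multiclass_scores = tf.constant([[2.0, 1.0, 0.5]])
        temperature = 2.0  # illustrative value
        calibrated_scores = tf.nn.softmax(multiclass_scores / temperature)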

--
205960802  by Zhichao Lu:

    Fixes a bug in the hierarchical label expansion script.

--
205944686  by Zhichao Lu:

    Updates the exporter to support exporting quantized models.
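
    A hedged sketch of the mechanism (the exporter presumably wires this up from the config): before freezing, the eval graph is rewritten with fake-quantization ops via the TF 1.x contrib rewriter:

        import tensorflow as tf

        g = tf.Graph()
        with g.as_default():
          # ... build the detection model's inference graph here ...
          # Rewrites the default graph in place, inserting fake-quant ops.
          tf.contrib.quantize.create_eval_graph()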

--
205912529  by Zhichao Lu:

    Add a two-stage matcher that thresholds on one criterion and then takes the argmax over another.
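
    A minimal sketch of the matching rule (hypothetical helper; rows are anchors, columns are groundtruth boxes):

        import tensorflow as tf

        def two_criteria_match(threshold_scores, argmax_scores, threshold):
          # An anchor must pass `threshold` on the first criterion; among
          # survivors the match is picked by argmax over the second one.
          passes = tf.reduce_max(threshold_scores, axis=1) >= threshold
          matches = tf.argmax(argmax_scores, axis=1, output_type=tf.int32)
          return tf.where(passes, matches, -1 * tf.ones_like(matches))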

--
205909017  by Zhichao Lu:

    Add test for grayscale image_resizer

--
205892801  by Zhichao Lu:

    Add flag to decide whether to apply batch norm to conv layers of weight shared box predictor.

--
205824449  by Zhichao Lu:

    Make sure that, by default, the Mask R-CNN box predictor predicts 2 stages.

--
205730139  by Zhichao Lu:

    Updating warning message to be more explicit about variable size mismatch.

--
205696992  by Zhichao Lu:

    Remove utils/ops.py's dependency on core/box_list_ops.py. This will allow re-using TPU compatible ops from utils/ops.py in core/box_list_ops.py.

--
205696867  by Zhichao Lu:

    Refactors the Mask R-CNN predictor so that each head lives in a separate file.
    This CL lets us add new heads to Mask R-CNN more easily in the future.

--
205492073  by Zhichao Lu:

    Refactor R-FCN box predictor to be TPU compliant.

    - Change utils/ops.py:position_sensitive_crop_regions to operate on a single image and a set of boxes, without `box_ind`.
    - Add a batch version that operates on batches of images and batches of boxes (see the sketch below).
    - Refactor the R-FCN box predictor to use the batched version of position-sensitive crop regions.
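
    A minimal sketch of the batched wrapper (assuming the per-image op from utils/ops.py; exact signatures may differ):

        import tensorflow as tf
        from object_detection.utils import ops

        def batch_position_sensitive_crop_regions(
            images, boxes, crop_size, num_spatial_bins, global_pool):
          # Map the single-image op over the batch dimension instead of
          # flattening boxes with a box_ind tensor; shapes stay static.
          def _single_image(args):
            image, image_boxes = args
            return ops.position_sensitive_crop_regions(
                image, image_boxes, crop_size, num_spatial_bins, global_pool)
          return tf.map_fn(_single_image, elems=(images, boxes),
                           dtype=tf.float32)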

--
205453567  by Zhichao Lu:

    Fix a bug that prevented exporting an inference graph when the write_inference_graph flag is True.

--
205316039  by Zhichao Lu:

    Changing input tensor name.

--
205256307  by Zhichao Lu:

    Fix model zoo links for quantized models.

--
205164432  by Zhichao Lu:

    Fixes eval error when label map contains non-ascii characters.

--
205129842  by Zhichao Lu:

    Adds an option to clip the anchors to the window size without filtering the overlapping boxes in Faster R-CNN.

--
205094863  by Zhichao Lu:

    Updates label map util to optionally add a background class and fill in gaps in the label map. This is useful when using multiclass scores, which require a complete label map with an explicit background label.
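
    For example (illustrative names and ids): a label map {1: 'cat', 3: 'dog'} would be completed to {0: 'background', 1: 'cat', 2: 'class_2', 3: 'dog'}, so that a multiclass score vector indexed by class id lines up with the label map.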

--
204989032  by Zhichao Lu:

    Add tf.prof support to exporter.

--
204825267  by Zhichao Lu:

    Modify mask rcnn box predictor tests for TPU compatibility.

--
204778749  by Zhichao Lu:

    Remove score filtering from postprocessing.py and rely on the filtering logic in tf.image.non_max_suppression.
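
    In sufficiently recent TF releases the score filter lives inside the op itself, e.g. (illustrative values):

        import tensorflow as tf

        boxes = tf.constant([[0., 0., 1., 1.], [0., 0., .9, .9]])
        scores = tf.constant([0.9, 0.05])
        selected_indices = tf.image.non_max_suppression(
            boxes, scores, max_output_size=100,
            iou_threshold=0.6, score_threshold=0.1)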

--
204775818  by Zhichao Lu:

    Python3 fixes for object_detection.

--
204745920  by Zhichao Lu:

    Object Detection Dataset visualization tool (documentation).

--
204686993  by Zhichao Lu:

    Internal changes.

--
204559667  by Zhichao Lu:

    Refactor box_predictor.py into multiple files.
    The abstract base class remains in object_detection/core; the other classes each move to a separate file in object_detection/predictors.

--
204552847  by Zhichao Lu:

    Update blog post link.

--
204508028  by Zhichao Lu:

    Bumps the batch size down to 1024 to be a bit more tolerant of OOMs, and doubles the number of iterations. The job still converges to 20.5 mAP in 3 hours.

--

PiperOrigin-RevId: 206852642

* Add original post-processing back.
parent d135ed9c
...
@@ -79,7 +79,7 @@ Extras:
     Run the evaluation for the Open Images Challenge 2018</a><br>
 * <a href='g3doc/tpu_compatibility.md'>
     TPU compatible detection pipelines</a><br>
 * <a href='g3doc/running_on_mobile_tensorflowlite.md'>
     Running object detection on mobile devices with TensorFlow Lite</a><br>

 ## Getting Help
...
...
@@ -157,12 +157,10 @@ class MultipleGridAnchorGenerator(anchor_generator.AnchorGenerator):
         correspond to an 8x8 layer followed by a 7x7 layer.
       im_height: the height of the image to generate the grid for. If both
         im_height and im_width are 1, the generated anchors default to
-        normalized coordinates, otherwise absolute coordinates are used for the
-        grid.
+        absolute coordinates, otherwise normalized coordinates are produced.
       im_width: the width of the image to generate the grid for. If both
         im_height and im_width are 1, the generated anchors default to
-        normalized coordinates, otherwise absolute coordinates are used for the
-        grid.
+        absolute coordinates, otherwise normalized coordinates are produced.

     Returns:
       boxes_list: a list of BoxLists each holding anchor boxes corresponding to
...
...
@@ -57,14 +57,12 @@ class MultiscaleGridAnchorGenerator(anchor_generator.AnchorGenerator):
     self._scales_per_octave = scales_per_octave
     self._normalize_coordinates = normalize_coordinates
+    scales = [2**(float(scale) / scales_per_octave)
+              for scale in xrange(scales_per_octave)]
+    aspects = list(aspect_ratios)
     for level in range(min_level, max_level + 1):
       anchor_stride = [2**level, 2**level]
-      scales = []
-      aspects = []
-      for scale in range(scales_per_octave):
-        scales.append(2**(float(scale) / scales_per_octave))
-      for aspect_ratio in aspect_ratios:
-        aspects.append(aspect_ratio)
       base_anchor_size = [2**level * anchor_scale, 2**level * anchor_scale]
       self._anchor_grid_info.append({
           'level': level,
...
@@ -84,7 +82,7 @@ class MultiscaleGridAnchorGenerator(anchor_generator.AnchorGenerator):
     return len(self._anchor_grid_info) * [
         len(self._aspect_ratios) * self._scales_per_octave]

-  def _generate(self, feature_map_shape_list, im_height, im_width):
+  def _generate(self, feature_map_shape_list, im_height=1, im_width=1):
     """Generates a collection of bounding boxes to be used as anchors.

     Currently we require the input image shape to be statically defined. That
...
@@ -95,14 +93,20 @@ class MultiscaleGridAnchorGenerator(anchor_generator.AnchorGenerator):
         format [(height_0, width_0), (height_1, width_1), ...]. For example,
         setting feature_map_shape_list=[(8, 8), (7, 7)] asks for anchors that
         correspond to an 8x8 layer followed by a 7x7 layer.
-      im_height: the height of the image to generate the grid for.
-      im_width: the width of the image to generate the grid for.
+      im_height: the height of the image to generate the grid for. If both
+        im_height and im_width are 1, anchors can only be generated in
+        absolute coordinates.
+      im_width: the width of the image to generate the grid for. If both
+        im_height and im_width are 1, anchors can only be generated in
+        absolute coordinates.

     Returns:
       boxes_list: a list of BoxLists each holding anchor boxes corresponding to
         the input feature map shapes.

     Raises:
       ValueError: if im_height and im_width are not integers.
+      ValueError: if im_height and im_width are 1, but normalized coordinates
+        were requested.
     """
     if not isinstance(im_height, int) or not isinstance(im_width, int):
       raise ValueError('MultiscaleGridAnchorGenerator currently requires '
...
@@ -118,9 +122,9 @@ class MultiscaleGridAnchorGenerator(anchor_generator.AnchorGenerator):
       feat_h = feat_shape[0]
       feat_w = feat_shape[1]
       anchor_offset = [0, 0]
-      if im_height % 2.0**level == 0:
+      if im_height % 2.0**level == 0 or im_height == 1:
         anchor_offset[0] = stride / 2.0
-      if im_width % 2.0**level == 0:
+      if im_width % 2.0**level == 0 or im_width == 1:
         anchor_offset[1] = stride / 2.0
       ag = grid_anchor_generator.GridAnchorGenerator(
           scales,
...
@@ -131,6 +135,11 @@ class MultiscaleGridAnchorGenerator(anchor_generator.AnchorGenerator):
       (anchor_grid,) = ag.generate(feature_map_shape_list=[(feat_h, feat_w)])

       if self._normalize_coordinates:
+        if im_height == 1 or im_width == 1:
+          raise ValueError(
+              'Normalized coordinates were requested upon construction of the '
+              'MultiscaleGridAnchorGenerator, but a subsequent call to '
+              'generate did not supply dimension information.')
         anchor_grid = box_list_ops.to_normalized_coordinates(
             anchor_grid, im_height, im_width, check_range=False)
       anchor_grid_list.append(anchor_grid)
...
...
@@ -47,6 +47,40 @@ class MultiscaleGridAnchorGeneratorTest(test_case.TestCase):
       anchor_corners_out = anchor_corners.eval()
       self.assertAllClose(anchor_corners_out, exp_anchor_corners)

+  def test_construct_single_anchor_unit_dimensions(self):
+    min_level = 5
+    max_level = 5
+    anchor_scale = 1.0
+    aspect_ratios = [1.0]
+    scales_per_octave = 1
+    im_height = 1
+    im_width = 1
+    feature_map_shape_list = [(2, 2)]
+    # Positive offsets are produced.
+    exp_anchor_corners = [[0, 0, 32, 32],
+                          [0, 32, 32, 64],
+                          [32, 0, 64, 32],
+                          [32, 32, 64, 64]]
+
+    anchor_generator = mg.MultiscaleGridAnchorGenerator(
+        min_level, max_level, anchor_scale, aspect_ratios, scales_per_octave,
+        normalize_coordinates=False)
+    anchors_list = anchor_generator.generate(
+        feature_map_shape_list, im_height=im_height, im_width=im_width)
+    anchor_corners = anchors_list[0].get()
+
+    with self.test_session():
+      anchor_corners_out = anchor_corners.eval()
+      self.assertAllClose(anchor_corners_out, exp_anchor_corners)
+
+  def test_construct_normalized_anchors_fails_with_unit_dimensions(self):
+    anchor_generator = mg.MultiscaleGridAnchorGenerator(
+        min_level=5, max_level=5, anchor_scale=1.0, aspect_ratios=[1.0],
+        scales_per_octave=1, normalize_coordinates=True)
+    with self.assertRaisesRegexp(ValueError, 'Normalized coordinates'):
+      anchor_generator.generate(
+          feature_map_shape_list=[(2, 2)], im_height=1, im_width=1)
+
   def test_construct_single_anchor_in_normalized_coordinates(self):
     min_level = 5
     max_level = 5
...
@@ -94,7 +128,7 @@ class MultiscaleGridAnchorGeneratorTest(test_case.TestCase):
     anchor_generator = mg.MultiscaleGridAnchorGenerator(
         min_level, max_level, anchor_scale, aspect_ratios, scales_per_octave,
         normalize_coordinates=False)
-    with self.assertRaises(ValueError):
+    with self.assertRaisesRegexp(ValueError, 'statically defined'):
       anchor_generator.generate(
           feature_map_shape_list, im_height=im_height, im_width=im_width)
...
...
@@ -15,7 +15,12 @@

 """Function to build box predictor from configuration."""

-from object_detection.core import box_predictor
+from object_detection.predictors import convolutional_box_predictor
+from object_detection.predictors import mask_rcnn_box_predictor
+from object_detection.predictors import rfcn_box_predictor
+from object_detection.predictors.mask_rcnn_heads import box_head
+from object_detection.predictors.mask_rcnn_heads import class_head
+from object_detection.predictors.mask_rcnn_heads import mask_head
 from object_detection.protos import box_predictor_pb2
...
@@ -48,92 +53,112 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
   box_predictor_oneof = box_predictor_config.WhichOneof('box_predictor_oneof')
   if box_predictor_oneof == 'convolutional_box_predictor':
-    conv_box_predictor = box_predictor_config.convolutional_box_predictor
-    conv_hyperparams_fn = argscope_fn(conv_box_predictor.conv_hyperparams,
+    config_box_predictor = box_predictor_config.convolutional_box_predictor
+    conv_hyperparams_fn = argscope_fn(config_box_predictor.conv_hyperparams,
                                       is_training)
-    box_predictor_object = box_predictor.ConvolutionalBoxPredictor(
-        is_training=is_training,
-        num_classes=num_classes,
-        conv_hyperparams_fn=conv_hyperparams_fn,
-        min_depth=conv_box_predictor.min_depth,
-        max_depth=conv_box_predictor.max_depth,
-        num_layers_before_predictor=(conv_box_predictor.
-                                     num_layers_before_predictor),
-        use_dropout=conv_box_predictor.use_dropout,
-        dropout_keep_prob=conv_box_predictor.dropout_keep_probability,
-        kernel_size=conv_box_predictor.kernel_size,
-        box_code_size=conv_box_predictor.box_code_size,
-        apply_sigmoid_to_scores=conv_box_predictor.apply_sigmoid_to_scores,
-        class_prediction_bias_init=(conv_box_predictor.
-                                    class_prediction_bias_init),
-        use_depthwise=conv_box_predictor.use_depthwise
-    )
+    box_predictor_object = (
+        convolutional_box_predictor.ConvolutionalBoxPredictor(
+            is_training=is_training,
+            num_classes=num_classes,
+            conv_hyperparams_fn=conv_hyperparams_fn,
+            min_depth=config_box_predictor.min_depth,
+            max_depth=config_box_predictor.max_depth,
+            num_layers_before_predictor=(
+                config_box_predictor.num_layers_before_predictor),
+            use_dropout=config_box_predictor.use_dropout,
+            dropout_keep_prob=config_box_predictor.dropout_keep_probability,
+            kernel_size=config_box_predictor.kernel_size,
+            box_code_size=config_box_predictor.box_code_size,
+            apply_sigmoid_to_scores=config_box_predictor.
+            apply_sigmoid_to_scores,
+            class_prediction_bias_init=(
+                config_box_predictor.class_prediction_bias_init),
+            use_depthwise=config_box_predictor.use_depthwise))
     return box_predictor_object
   if box_predictor_oneof == 'weight_shared_convolutional_box_predictor':
-    conv_box_predictor = (box_predictor_config.
-                          weight_shared_convolutional_box_predictor)
-    conv_hyperparams_fn = argscope_fn(conv_box_predictor.conv_hyperparams,
+    config_box_predictor = (
+        box_predictor_config.weight_shared_convolutional_box_predictor)
+    conv_hyperparams_fn = argscope_fn(config_box_predictor.conv_hyperparams,
                                       is_training)
-    box_predictor_object = box_predictor.WeightSharedConvolutionalBoxPredictor(
-        is_training=is_training,
-        num_classes=num_classes,
-        conv_hyperparams_fn=conv_hyperparams_fn,
-        depth=conv_box_predictor.depth,
-        num_layers_before_predictor=(
-            conv_box_predictor.num_layers_before_predictor),
-        kernel_size=conv_box_predictor.kernel_size,
-        box_code_size=conv_box_predictor.box_code_size,
-        class_prediction_bias_init=conv_box_predictor.
-        class_prediction_bias_init,
-        use_dropout=conv_box_predictor.use_dropout,
-        dropout_keep_prob=conv_box_predictor.dropout_keep_probability,
-        share_prediction_tower=conv_box_predictor.share_prediction_tower)
+    apply_batch_norm = config_box_predictor.conv_hyperparams.HasField(
+        'batch_norm')
+    box_predictor_object = (
+        convolutional_box_predictor.WeightSharedConvolutionalBoxPredictor(
+            is_training=is_training,
+            num_classes=num_classes,
+            conv_hyperparams_fn=conv_hyperparams_fn,
+            depth=config_box_predictor.depth,
+            num_layers_before_predictor=(
+                config_box_predictor.num_layers_before_predictor),
+            kernel_size=config_box_predictor.kernel_size,
+            box_code_size=config_box_predictor.box_code_size,
+            class_prediction_bias_init=config_box_predictor.
            class_prediction_bias_init,
+            use_dropout=config_box_predictor.use_dropout,
+            dropout_keep_prob=config_box_predictor.dropout_keep_probability,
+            share_prediction_tower=config_box_predictor.share_prediction_tower,
+            apply_batch_norm=apply_batch_norm))
     return box_predictor_object
   if box_predictor_oneof == 'mask_rcnn_box_predictor':
-    mask_rcnn_box_predictor = box_predictor_config.mask_rcnn_box_predictor
-    fc_hyperparams_fn = argscope_fn(mask_rcnn_box_predictor.fc_hyperparams,
+    config_box_predictor = box_predictor_config.mask_rcnn_box_predictor
+    fc_hyperparams_fn = argscope_fn(config_box_predictor.fc_hyperparams,
                                     is_training)
     conv_hyperparams_fn = None
-    if mask_rcnn_box_predictor.HasField('conv_hyperparams'):
+    if config_box_predictor.HasField('conv_hyperparams'):
       conv_hyperparams_fn = argscope_fn(
-          mask_rcnn_box_predictor.conv_hyperparams, is_training)
-    box_predictor_object = box_predictor.MaskRCNNBoxPredictor(
+          config_box_predictor.conv_hyperparams, is_training)
+    box_prediction_head = box_head.BoxHead(
         is_training=is_training,
         num_classes=num_classes,
         fc_hyperparams_fn=fc_hyperparams_fn,
-        use_dropout=mask_rcnn_box_predictor.use_dropout,
-        dropout_keep_prob=mask_rcnn_box_predictor.dropout_keep_probability,
-        box_code_size=mask_rcnn_box_predictor.box_code_size,
-        conv_hyperparams_fn=conv_hyperparams_fn,
-        predict_instance_masks=mask_rcnn_box_predictor.predict_instance_masks,
-        mask_height=mask_rcnn_box_predictor.mask_height,
-        mask_width=mask_rcnn_box_predictor.mask_width,
-        mask_prediction_num_conv_layers=(
-            mask_rcnn_box_predictor.mask_prediction_num_conv_layers),
-        mask_prediction_conv_depth=(
-            mask_rcnn_box_predictor.mask_prediction_conv_depth),
-        masks_are_class_agnostic=(
-            mask_rcnn_box_predictor.masks_are_class_agnostic),
-        predict_keypoints=mask_rcnn_box_predictor.predict_keypoints,
+        use_dropout=config_box_predictor.use_dropout,
+        dropout_keep_prob=config_box_predictor.dropout_keep_probability,
+        box_code_size=config_box_predictor.box_code_size,
         share_box_across_classes=(
-            mask_rcnn_box_predictor.share_box_across_classes))
+            config_box_predictor.share_box_across_classes))
+    class_prediction_head = class_head.ClassHead(
+        is_training=is_training,
+        num_classes=num_classes,
+        fc_hyperparams_fn=fc_hyperparams_fn,
+        use_dropout=config_box_predictor.use_dropout,
+        dropout_keep_prob=config_box_predictor.dropout_keep_probability)
+    third_stage_heads = {}
+    if config_box_predictor.predict_instance_masks:
+      third_stage_heads[
+          mask_rcnn_box_predictor.MASK_PREDICTIONS] = mask_head.MaskHead(
+              num_classes=num_classes,
+              conv_hyperparams_fn=conv_hyperparams_fn,
+              mask_height=config_box_predictor.mask_height,
+              mask_width=config_box_predictor.mask_width,
+              mask_prediction_num_conv_layers=(
+                  config_box_predictor.mask_prediction_num_conv_layers),
+              mask_prediction_conv_depth=(
+                  config_box_predictor.mask_prediction_conv_depth),
+              masks_are_class_agnostic=(
+                  config_box_predictor.masks_are_class_agnostic))
+    box_predictor_object = mask_rcnn_box_predictor.MaskRCNNBoxPredictor(
+        is_training=is_training,
+        num_classes=num_classes,
+        box_prediction_head=box_prediction_head,
+        class_prediction_head=class_prediction_head,
+        third_stage_heads=third_stage_heads)
     return box_predictor_object
   if box_predictor_oneof == 'rfcn_box_predictor':
-    rfcn_box_predictor = box_predictor_config.rfcn_box_predictor
-    conv_hyperparams_fn = argscope_fn(rfcn_box_predictor.conv_hyperparams,
+    config_box_predictor = box_predictor_config.rfcn_box_predictor
+    conv_hyperparams_fn = argscope_fn(config_box_predictor.conv_hyperparams,
                                       is_training)
-    box_predictor_object = box_predictor.RfcnBoxPredictor(
+    box_predictor_object = rfcn_box_predictor.RfcnBoxPredictor(
         is_training=is_training,
         num_classes=num_classes,
         conv_hyperparams_fn=conv_hyperparams_fn,
-        crop_size=[rfcn_box_predictor.crop_height,
-                   rfcn_box_predictor.crop_width],
-        num_spatial_bins=[rfcn_box_predictor.num_spatial_bins_height,
-                          rfcn_box_predictor.num_spatial_bins_width],
-        depth=rfcn_box_predictor.depth,
-        box_code_size=rfcn_box_predictor.box_code_size)
+        crop_size=[config_box_predictor.crop_height,
+                   config_box_predictor.crop_width],
+        num_spatial_bins=[config_box_predictor.num_spatial_bins_height,
+                          config_box_predictor.num_spatial_bins_width],
+        depth=config_box_predictor.depth,
+        box_code_size=config_box_predictor.box_code_size)
     return box_predictor_object
   raise ValueError('Unknown box predictor: {}'.format(box_predictor_oneof))
...
@@ -20,6 +20,7 @@ import tensorflow as tf

 from google.protobuf import text_format
 from object_detection.builders import box_predictor_builder
 from object_detection.builders import hyperparams_builder
+from object_detection.predictors import mask_rcnn_box_predictor
 from object_detection.protos import box_predictor_pb2
 from object_detection.protos import hyperparams_pb2
...
@@ -239,6 +240,7 @@ class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
     self.assertAlmostEqual(box_predictor._class_prediction_bias_init, 4.0)
     self.assertEqual(box_predictor.num_classes, 10)
     self.assertFalse(box_predictor._is_training)
+    self.assertEqual(box_predictor._apply_batch_norm, False)

   def test_construct_default_conv_box_predictor(self):
     box_predictor_text_proto = """
...
@@ -265,6 +267,37 @@ class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
     self.assertEqual(box_predictor._num_layers_before_predictor, 0)
     self.assertEqual(box_predictor.num_classes, 90)
     self.assertTrue(box_predictor._is_training)
+    self.assertEqual(box_predictor._apply_batch_norm, False)
+
+  def test_construct_default_conv_box_predictor_with_batch_norm(self):
+    box_predictor_text_proto = """
+      weight_shared_convolutional_box_predictor {
+        conv_hyperparams {
+          regularizer {
+            l1_regularizer {
+            }
+          }
+          batch_norm {
+            train: true
+          }
+          initializer {
+            truncated_normal_initializer {
+            }
+          }
+        }
+      }"""
+    box_predictor_proto = box_predictor_pb2.BoxPredictor()
+    text_format.Merge(box_predictor_text_proto, box_predictor_proto)
+    box_predictor = box_predictor_builder.build(
+        argscope_fn=hyperparams_builder.build,
+        box_predictor_config=box_predictor_proto,
+        is_training=True,
+        num_classes=90)
+    self.assertEqual(box_predictor._depth, 0)
+    self.assertEqual(box_predictor._num_layers_before_predictor, 0)
+    self.assertEqual(box_predictor.num_classes, 90)
+    self.assertTrue(box_predictor._is_training)
+    self.assertEqual(box_predictor._apply_batch_norm, True)


 class MaskRCNNBoxPredictorBuilderTest(tf.test.TestCase):
...
@@ -297,7 +330,10 @@ class MaskRCNNBoxPredictorBuilderTest(tf.test.TestCase):
         is_training=False,
         num_classes=10)
     mock_argscope_fn.assert_called_with(hyperparams_proto, False)
-    self.assertEqual(box_predictor._fc_hyperparams_fn, 'arg_scope')
+    self.assertEqual(box_predictor._box_prediction_head._fc_hyperparams_fn,
+                     'arg_scope')
+    self.assertEqual(box_predictor._class_prediction_head._fc_hyperparams_fn,
+                     'arg_scope')

   def test_non_default_mask_rcnn_box_predictor(self):
     fc_hyperparams_text_proto = """
...
@@ -334,12 +370,16 @@ class MaskRCNNBoxPredictorBuilderTest(tf.test.TestCase):
         box_predictor_config=box_predictor_proto,
         is_training=True,
         num_classes=90)
-    self.assertTrue(box_predictor._use_dropout)
-    self.assertAlmostEqual(box_predictor._dropout_keep_prob, 0.8)
+    box_head = box_predictor._box_prediction_head
+    class_head = box_predictor._class_prediction_head
+    self.assertTrue(box_head._use_dropout)
+    self.assertTrue(class_head._use_dropout)
+    self.assertAlmostEqual(box_head._dropout_keep_prob, 0.8)
+    self.assertAlmostEqual(class_head._dropout_keep_prob, 0.8)
     self.assertEqual(box_predictor.num_classes, 90)
     self.assertTrue(box_predictor._is_training)
-    self.assertEqual(box_predictor._box_code_size, 3)
-    self.assertEqual(box_predictor._share_box_across_classes, True)
+    self.assertEqual(box_head._box_code_size, 3)
+    self.assertEqual(box_head._share_box_across_classes, True)

   def test_build_default_mask_rcnn_box_predictor(self):
     box_predictor_proto = box_predictor_pb2.BoxPredictor()
...
@@ -350,13 +390,15 @@ class MaskRCNNBoxPredictorBuilderTest(tf.test.TestCase):
         box_predictor_config=box_predictor_proto,
         is_training=True,
         num_classes=90)
-    self.assertFalse(box_predictor._use_dropout)
-    self.assertAlmostEqual(box_predictor._dropout_keep_prob, 0.5)
+    box_head = box_predictor._box_prediction_head
+    class_head = box_predictor._class_prediction_head
+    self.assertFalse(box_head._use_dropout)
+    self.assertFalse(class_head._use_dropout)
+    self.assertAlmostEqual(box_head._dropout_keep_prob, 0.5)
     self.assertEqual(box_predictor.num_classes, 90)
     self.assertTrue(box_predictor._is_training)
-    self.assertEqual(box_predictor._box_code_size, 4)
-    self.assertFalse(box_predictor._predict_instance_masks)
-    self.assertFalse(box_predictor._predict_keypoints)
+    self.assertEqual(box_head._box_code_size, 4)
+    self.assertEqual(len(box_predictor._third_stage_heads.keys()), 0)

   def test_build_box_predictor_with_mask_branch(self):
     box_predictor_proto = box_predictor_pb2.BoxPredictor()
...
@@ -379,14 +421,21 @@ class MaskRCNNBoxPredictorBuilderTest(tf.test.TestCase):
                   True),
         mock.call(box_predictor_proto.mask_rcnn_box_predictor.conv_hyperparams,
                   True)], any_order=True)
-    self.assertFalse(box_predictor._use_dropout)
-    self.assertAlmostEqual(box_predictor._dropout_keep_prob, 0.5)
+    box_head = box_predictor._box_prediction_head
+    class_head = box_predictor._class_prediction_head
+    third_stage_heads = box_predictor._third_stage_heads
+    self.assertFalse(box_head._use_dropout)
+    self.assertFalse(class_head._use_dropout)
+    self.assertAlmostEqual(box_head._dropout_keep_prob, 0.5)
+    self.assertAlmostEqual(class_head._dropout_keep_prob, 0.5)
     self.assertEqual(box_predictor.num_classes, 90)
     self.assertTrue(box_predictor._is_training)
-    self.assertEqual(box_predictor._box_code_size, 4)
-    self.assertTrue(box_predictor._predict_instance_masks)
-    self.assertEqual(box_predictor._mask_prediction_conv_depth, 512)
-    self.assertFalse(box_predictor._predict_keypoints)
+    self.assertEqual(box_head._box_code_size, 4)
+    self.assertTrue(
+        mask_rcnn_box_predictor.MASK_PREDICTIONS in third_stage_heads)
+    self.assertEqual(
+        third_stage_heads[mask_rcnn_box_predictor.MASK_PREDICTIONS]
+        ._mask_prediction_conv_depth, 512)


 class RfcnBoxPredictorBuilderTest(tf.test.TestCase):
...
...
@@ -22,6 +22,95 @@ from object_detection.utils import context_manager

 slim = tf.contrib.slim


+class KerasLayerHyperparams(object):
+  """
+  A hyperparameter configuration object for Keras layers used in
+  Object Detection models.
+  """
+
+  def __init__(self, hyperparams_config):
+    """Builds keras hyperparameter config for layers based on the proto config.
+
+    It automatically converts from Slim layer hyperparameter configs to
+    Keras layer hyperparameters. Namely, it:
+    - Builds Keras initializers/regularizers instead of Slim ones
+    - sets weights_regularizer/initializer to kernel_regularizer/initializer
+    - converts batchnorm decay to momentum
+    - converts Slim l2 regularizer weights to the equivalent Keras l2 weights
+
+    Contains a hyperparameter configuration for ops that specifies kernel
+    initializer, kernel regularizer, activation. Also contains parameters for
+    batch norm operators based on the configuration.
+
+    Note that if the batch_norm parameters are not specified in the config
+    (i.e. left to default) then batch norm is excluded from the config.
+
+    Args:
+      hyperparams_config: hyperparams.proto object containing
+        hyperparameters.
+
+    Raises:
+      ValueError: if hyperparams_config is not of type hyperparams.Hyperparams.
+    """
+    if not isinstance(hyperparams_config,
+                      hyperparams_pb2.Hyperparams):
+      raise ValueError('hyperparams_config not of type '
+                       'hyperparams_pb.Hyperparams.')
+
+    self._batch_norm_params = None
+    if hyperparams_config.HasField('batch_norm'):
+      self._batch_norm_params = _build_keras_batch_norm_params(
+          hyperparams_config.batch_norm)
+
+    self._op_params = {
+        'kernel_regularizer': _build_keras_regularizer(
+            hyperparams_config.regularizer),
+        'kernel_initializer': _build_initializer(
+            hyperparams_config.initializer, build_for_keras=True),
+        'activation': _build_activation_fn(hyperparams_config.activation)
+    }
+
+  def use_batch_norm(self):
+    return self._batch_norm_params is not None
+
+  def batch_norm_params(self, **overrides):
+    """Returns a dict containing batchnorm layer construction hyperparameters.
+
+    Optionally overrides values in the batchnorm hyperparam dict. Overrides
+    only apply to individual calls of this method, and do not affect
+    future calls.
+
+    Args:
+      **overrides: keyword arguments to override in the hyperparams dictionary
+
+    Returns: dict containing the layer construction keyword arguments, with
+      values overridden by the `overrides` keyword arguments.
+    """
+    if self._batch_norm_params is None:
+      new_batch_norm_params = dict()
+    else:
+      new_batch_norm_params = self._batch_norm_params.copy()
+    new_batch_norm_params.update(overrides)
+    return new_batch_norm_params
+
+  def params(self, **overrides):
+    """Returns a dict containing the layer construction hyperparameters to use.
+
+    Optionally overrides values in the returned dict. Overrides
+    only apply to individual calls of this method, and do not affect
+    future calls.
+
+    Args:
+      **overrides: keyword arguments to override in the hyperparams dictionary.
+
+    Returns: dict containing the layer construction keyword arguments, with
+      values overridden by the `overrides` keyword arguments.
+    """
+    new_params = self._op_params.copy()
+    new_params.update(**overrides)
+    return new_params
+
+
 def build(hyperparams_config, is_training):
   """Builds tf-slim arg_scope for convolution ops based on the config.
...
@@ -72,7 +161,7 @@ def build(hyperparams_config, is_training):
                         context_manager.IdentityContextManager()):
     with slim.arg_scope(
         affected_ops,
-        weights_regularizer=_build_regularizer(
+        weights_regularizer=_build_slim_regularizer(
            hyperparams_config.regularizer),
         weights_initializer=_build_initializer(
            hyperparams_config.initializer),
...
@@ -104,7 +193,7 @@ def _build_activation_fn(activation_fn):
   raise ValueError('Unknown activation function: {}'.format(activation_fn))


-def _build_regularizer(regularizer):
+def _build_slim_regularizer(regularizer):
   """Builds a tf-slim regularizer from config.

   Args:
...
@@ -124,11 +213,36 @@ def _build_regularizer(regularizer):
   raise ValueError('Unknown regularizer function: {}'.format(regularizer_oneof))


-def _build_initializer(initializer):
+def _build_keras_regularizer(regularizer):
+  """Builds a keras regularizer from config.
+
+  Args:
+    regularizer: hyperparams_pb2.Hyperparams.regularizer proto.
+
+  Returns:
+    Keras regularizer.
+
+  Raises:
+    ValueError: On unknown regularizer.
+  """
+  regularizer_oneof = regularizer.WhichOneof('regularizer_oneof')
+  if regularizer_oneof == 'l1_regularizer':
+    return tf.keras.regularizers.l1(float(regularizer.l1_regularizer.weight))
+  if regularizer_oneof == 'l2_regularizer':
+    # The Keras L2 regularizer weight differs from the Slim L2 regularizer
+    # weight by a factor of 2
+    return tf.keras.regularizers.l2(
+        float(regularizer.l2_regularizer.weight * 0.5))
+  raise ValueError('Unknown regularizer function: {}'.format(regularizer_oneof))
+
+
+def _build_initializer(initializer, build_for_keras=False):
   """Build a tf initializer from config.

   Args:
     initializer: hyperparams_pb2.Hyperparams.regularizer proto.
+    build_for_keras: Whether the initializers should be built for Keras
+      operators. If false builds for Slim.

   Returns:
     tf initializer.
...
@@ -151,10 +265,42 @@ def _build_initializer(initializer):
     mode = enum_descriptor.values_by_number[initializer.
                                             variance_scaling_initializer.
                                             mode].name
-    return slim.variance_scaling_initializer(
-        factor=initializer.variance_scaling_initializer.factor,
-        mode=mode,
-        uniform=initializer.variance_scaling_initializer.uniform)
+    if build_for_keras:
+      if initializer.variance_scaling_initializer.uniform:
+        return tf.variance_scaling_initializer(
+            scale=initializer.variance_scaling_initializer.factor,
+            mode=mode.lower(),
+            distribution='uniform')
+      else:
+        # In TF 1.9 release and earlier, the truncated_normal distribution was
+        # not supported correctly. So, in these earlier versions of tensorflow,
+        # the ValueError will be raised, and we manually truncate the
+        # distribution scale.
+        #
+        # It is insufficient to just set distribution to `normal` from the
+        # start, because the `normal` distribution in newer Tensorflow versions
+        # creates a truncated distribution, whereas it created untruncated
+        # distributions in older versions.
+        try:
+          return tf.variance_scaling_initializer(
+              scale=initializer.variance_scaling_initializer.factor,
+              mode=mode.lower(),
+              distribution='truncated_normal')
+        except ValueError:
+          truncate_constant = 0.87962566103423978
+          truncated_scale = initializer.variance_scaling_initializer.factor / (
+              truncate_constant * truncate_constant
+          )
+          return tf.variance_scaling_initializer(
+              scale=truncated_scale,
+              mode=mode.lower(),
+              distribution='normal')
+    else:
+      return slim.variance_scaling_initializer(
+          factor=initializer.variance_scaling_initializer.factor,
+          mode=mode,
+          uniform=initializer.variance_scaling_initializer.uniform)
   raise ValueError('Unknown initializer function: {}'.format(
       initializer_oneof))
...
@@ -180,3 +326,25 @@ def _build_batch_norm_params(batch_norm, is_training):
       'is_training': is_training and batch_norm.train,
   }
   return batch_norm_params


+def _build_keras_batch_norm_params(batch_norm):
+  """Build a dictionary of Keras BatchNormalization params from config.
+
+  Args:
+    batch_norm: hyperparams_pb2.ConvHyperparams.batch_norm proto.
+
+  Returns:
+    A dictionary containing Keras BatchNormalization parameters.
+  """
+  # Note: Although decay is defined to be 1 - momentum in batch_norm,
+  # decay in the slim batch_norm layers was erroneously defined and is
+  # actually the same as momentum in the Keras batch_norm layers.
+  # For context, see: github.com/keras-team/keras/issues/6839
+  batch_norm_params = {
+      'momentum': batch_norm.decay,
+      'center': batch_norm.center,
+      'scale': batch_norm.scale,
+      'epsilon': batch_norm.epsilon,
+  }
+  return batch_norm_params
...@@ -149,6 +149,29 @@ class HyperparamsBuilderTest(tf.test.TestCase): ...@@ -149,6 +149,29 @@ class HyperparamsBuilderTest(tf.test.TestCase):
result = sess.run(regularizer(tf.constant(weights))) result = sess.run(regularizer(tf.constant(weights)))
self.assertAllClose(np.abs(weights).sum() * 0.5, result) self.assertAllClose(np.abs(weights).sum() * 0.5, result)
def test_return_l1_regularized_weights_keras(self):
conv_hyperparams_text_proto = """
regularizer {
l1_regularizer {
weight: 0.5
}
}
initializer {
truncated_normal_initializer {
}
}
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
regularizer = keras_config.params()['kernel_regularizer']
weights = np.array([1., -1, 4., 2.])
with self.test_session() as sess:
result = sess.run(regularizer(tf.constant(weights)))
self.assertAllClose(np.abs(weights).sum() * 0.5, result)
def test_return_l2_regularizer_weights(self): def test_return_l2_regularizer_weights(self):
conv_hyperparams_text_proto = """ conv_hyperparams_text_proto = """
regularizer { regularizer {
...@@ -174,6 +197,29 @@ class HyperparamsBuilderTest(tf.test.TestCase): ...@@ -174,6 +197,29 @@ class HyperparamsBuilderTest(tf.test.TestCase):
result = sess.run(regularizer(tf.constant(weights))) result = sess.run(regularizer(tf.constant(weights)))
self.assertAllClose(np.power(weights, 2).sum() / 2.0 * 0.42, result) self.assertAllClose(np.power(weights, 2).sum() / 2.0 * 0.42, result)
def test_return_l2_regularizer_weights_keras(self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
weight: 0.42
}
}
initializer {
truncated_normal_initializer {
}
}
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
regularizer = keras_config.params()['kernel_regularizer']
weights = np.array([1., -1, 4., 2.])
with self.test_session() as sess:
result = sess.run(regularizer(tf.constant(weights)))
self.assertAllClose(np.power(weights, 2).sum() / 2.0 * 0.42, result)
def test_return_non_default_batch_norm_params_with_train_during_train(self): def test_return_non_default_batch_norm_params_with_train_during_train(self):
conv_hyperparams_text_proto = """ conv_hyperparams_text_proto = """
regularizer { regularizer {
...@@ -206,6 +252,66 @@ class HyperparamsBuilderTest(tf.test.TestCase): ...@@ -206,6 +252,66 @@ class HyperparamsBuilderTest(tf.test.TestCase):
self.assertTrue(batch_norm_params['scale']) self.assertTrue(batch_norm_params['scale'])
self.assertTrue(batch_norm_params['is_training']) self.assertTrue(batch_norm_params['is_training'])
def test_return_non_default_batch_norm_params_keras(
self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
batch_norm {
decay: 0.7
center: false
scale: true
epsilon: 0.03
}
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
self.assertTrue(keras_config.use_batch_norm())
batch_norm_params = keras_config.batch_norm_params()
self.assertAlmostEqual(batch_norm_params['momentum'], 0.7)
self.assertAlmostEqual(batch_norm_params['epsilon'], 0.03)
self.assertFalse(batch_norm_params['center'])
self.assertTrue(batch_norm_params['scale'])
def test_return_non_default_batch_norm_params_keras_override(
self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
batch_norm {
decay: 0.7
center: false
scale: true
epsilon: 0.03
}
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
self.assertTrue(keras_config.use_batch_norm())
batch_norm_params = keras_config.batch_norm_params(momentum=0.4)
self.assertAlmostEqual(batch_norm_params['momentum'], 0.4)
self.assertAlmostEqual(batch_norm_params['epsilon'], 0.03)
self.assertFalse(batch_norm_params['center'])
self.assertTrue(batch_norm_params['scale'])
def test_return_batch_norm_params_with_notrain_during_eval(self): def test_return_batch_norm_params_with_notrain_during_eval(self):
conv_hyperparams_text_proto = """ conv_hyperparams_text_proto = """
regularizer { regularizer {
...@@ -289,6 +395,24 @@ class HyperparamsBuilderTest(tf.test.TestCase): ...@@ -289,6 +395,24 @@ class HyperparamsBuilderTest(tf.test.TestCase):
conv_scope_arguments = scope[_get_scope_key(slim.conv2d)] conv_scope_arguments = scope[_get_scope_key(slim.conv2d)]
self.assertEqual(conv_scope_arguments['normalizer_fn'], None) self.assertEqual(conv_scope_arguments['normalizer_fn'], None)
def test_do_not_use_batch_norm_if_default_keras(self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
self.assertFalse(keras_config.use_batch_norm())
self.assertEqual(keras_config.batch_norm_params(), {})
def test_use_none_activation(self): def test_use_none_activation(self):
conv_hyperparams_text_proto = """ conv_hyperparams_text_proto = """
regularizer { regularizer {
...@@ -309,6 +433,24 @@ class HyperparamsBuilderTest(tf.test.TestCase): ...@@ -309,6 +433,24 @@ class HyperparamsBuilderTest(tf.test.TestCase):
conv_scope_arguments = scope[_get_scope_key(slim.conv2d)] conv_scope_arguments = scope[_get_scope_key(slim.conv2d)]
self.assertEqual(conv_scope_arguments['activation_fn'], None) self.assertEqual(conv_scope_arguments['activation_fn'], None)
def test_use_none_activation_keras(self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
activation: NONE
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
self.assertEqual(keras_config.params()['activation'], None)
def test_use_relu_activation(self): def test_use_relu_activation(self):
conv_hyperparams_text_proto = """ conv_hyperparams_text_proto = """
regularizer { regularizer {
...@@ -329,6 +471,24 @@ class HyperparamsBuilderTest(tf.test.TestCase): ...@@ -329,6 +471,24 @@ class HyperparamsBuilderTest(tf.test.TestCase):
conv_scope_arguments = scope[_get_scope_key(slim.conv2d)] conv_scope_arguments = scope[_get_scope_key(slim.conv2d)]
self.assertEqual(conv_scope_arguments['activation_fn'], tf.nn.relu) self.assertEqual(conv_scope_arguments['activation_fn'], tf.nn.relu)
def test_use_relu_activation_keras(self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
activation: RELU
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
self.assertEqual(keras_config.params()['activation'], tf.nn.relu)
def test_use_relu_6_activation(self): def test_use_relu_6_activation(self):
conv_hyperparams_text_proto = """ conv_hyperparams_text_proto = """
regularizer { regularizer {
...@@ -349,6 +509,43 @@ class HyperparamsBuilderTest(tf.test.TestCase): ...@@ -349,6 +509,43 @@ class HyperparamsBuilderTest(tf.test.TestCase):
conv_scope_arguments = scope[_get_scope_key(slim.conv2d)] conv_scope_arguments = scope[_get_scope_key(slim.conv2d)]
self.assertEqual(conv_scope_arguments['activation_fn'], tf.nn.relu6) self.assertEqual(conv_scope_arguments['activation_fn'], tf.nn.relu6)
def test_use_relu_6_activation_keras(self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
activation: RELU_6
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
self.assertEqual(keras_config.params()['activation'], tf.nn.relu6)
def test_override_activation_keras(self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
activation: RELU_6
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
new_params = keras_config.params(activation=tf.nn.relu)
self.assertEqual(new_params['activation'], tf.nn.relu)
def _assert_variance_in_range(self, initializer, shape, variance, def _assert_variance_in_range(self, initializer, shape, variance,
tol=1e-2): tol=1e-2):
with tf.Graph().as_default() as g: with tf.Graph().as_default() as g:
...@@ -386,6 +583,29 @@ class HyperparamsBuilderTest(tf.test.TestCase): ...@@ -386,6 +583,29 @@ class HyperparamsBuilderTest(tf.test.TestCase):
self._assert_variance_in_range(initializer, shape=[100, 40], self._assert_variance_in_range(initializer, shape=[100, 40],
variance=2. / 100.) variance=2. / 100.)
def test_variance_in_range_with_variance_scaling_initializer_fan_in_keras(
self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
variance_scaling_initializer {
factor: 2.0
mode: FAN_IN
uniform: false
}
}
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
initializer = keras_config.params()['kernel_initializer']
self._assert_variance_in_range(initializer, shape=[100, 40],
variance=2. / 100.)
def test_variance_in_range_with_variance_scaling_initializer_fan_out(self): def test_variance_in_range_with_variance_scaling_initializer_fan_out(self):
conv_hyperparams_text_proto = """ conv_hyperparams_text_proto = """
regularizer { regularizer {
...@@ -410,6 +630,29 @@ class HyperparamsBuilderTest(tf.test.TestCase): ...@@ -410,6 +630,29 @@ class HyperparamsBuilderTest(tf.test.TestCase):
self._assert_variance_in_range(initializer, shape=[100, 40], self._assert_variance_in_range(initializer, shape=[100, 40],
variance=2. / 40.) variance=2. / 40.)
def test_variance_in_range_with_variance_scaling_initializer_fan_out_keras(
self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
variance_scaling_initializer {
factor: 2.0
mode: FAN_OUT
uniform: false
}
}
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
initializer = keras_config.params()['kernel_initializer']
self._assert_variance_in_range(initializer, shape=[100, 40],
variance=2. / 40.)
def test_variance_in_range_with_variance_scaling_initializer_fan_avg(self): def test_variance_in_range_with_variance_scaling_initializer_fan_avg(self):
conv_hyperparams_text_proto = """ conv_hyperparams_text_proto = """
regularizer { regularizer {
...@@ -434,6 +677,29 @@ class HyperparamsBuilderTest(tf.test.TestCase): ...@@ -434,6 +677,29 @@ class HyperparamsBuilderTest(tf.test.TestCase):
self._assert_variance_in_range(initializer, shape=[100, 40], self._assert_variance_in_range(initializer, shape=[100, 40],
variance=4. / (100. + 40.)) variance=4. / (100. + 40.))
def test_variance_in_range_with_variance_scaling_initializer_fan_avg_keras(
self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
variance_scaling_initializer {
factor: 2.0
mode: FAN_AVG
uniform: false
}
}
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
initializer = keras_config.params()['kernel_initializer']
self._assert_variance_in_range(initializer, shape=[100, 40],
variance=4. / (100. + 40.))
def test_variance_in_range_with_variance_scaling_initializer_uniform(self): def test_variance_in_range_with_variance_scaling_initializer_uniform(self):
conv_hyperparams_text_proto = """ conv_hyperparams_text_proto = """
regularizer { regularizer {
...@@ -458,6 +724,29 @@ class HyperparamsBuilderTest(tf.test.TestCase): ...@@ -458,6 +724,29 @@ class HyperparamsBuilderTest(tf.test.TestCase):
self._assert_variance_in_range(initializer, shape=[100, 40], self._assert_variance_in_range(initializer, shape=[100, 40],
variance=2. / 100.) variance=2. / 100.)
def test_variance_in_range_with_variance_scaling_initializer_uniform_keras(
self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
variance_scaling_initializer {
factor: 2.0
mode: FAN_IN
uniform: true
}
}
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
initializer = keras_config.params()['kernel_initializer']
self._assert_variance_in_range(initializer, shape=[100, 40],
variance=2. / 100.)
def test_variance_in_range_with_truncated_normal_initializer(self): def test_variance_in_range_with_truncated_normal_initializer(self):
conv_hyperparams_text_proto = """ conv_hyperparams_text_proto = """
regularizer { regularizer {
...@@ -481,6 +770,27 @@ class HyperparamsBuilderTest(tf.test.TestCase): ...@@ -481,6 +770,27 @@ class HyperparamsBuilderTest(tf.test.TestCase):
self._assert_variance_in_range(initializer, shape=[100, 40], self._assert_variance_in_range(initializer, shape=[100, 40],
variance=0.49, tol=1e-1) variance=0.49, tol=1e-1)
def test_variance_in_range_with_truncated_normal_initializer_keras(self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
mean: 0.0
stddev: 0.8
}
}
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
initializer = keras_config.params()['kernel_initializer']
self._assert_variance_in_range(initializer, shape=[100, 40],
variance=0.49, tol=1e-1)
def test_variance_in_range_with_random_normal_initializer(self):
conv_hyperparams_text_proto = """
regularizer {
@@ -504,6 +814,27 @@ class HyperparamsBuilderTest(tf.test.TestCase):
self._assert_variance_in_range(initializer, shape=[100, 40],
variance=0.64, tol=1e-1)
def test_variance_in_range_with_random_normal_initializer_keras(self):
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
random_normal_initializer {
mean: 0.0
stddev: 0.8
}
}
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(
conv_hyperparams_proto)
initializer = keras_config.params()['kernel_initializer']
self._assert_variance_in_range(initializer, shape=[100, 40],
variance=0.64, tol=1e-1)
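The Keras variants above all follow the same pattern: parse a Hyperparams proto, wrap it in KerasLayerHyperparams, and check the initializer that comes back from params(). A minimal usage sketch (illustrative, not part of this diff; it assumes params() yields standard Keras layer kwargs such as kernel_initializer and kernel_regularizer, which is what these tests exercise):

import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.protos import hyperparams_pb2

proto = hyperparams_pb2.Hyperparams()
text_format.Merge("""
  regularizer { l2_regularizer { weight: 0.0004 } }
  initializer { truncated_normal_initializer { mean: 0.0 stddev: 0.03 } }
""", proto)
keras_config = hyperparams_builder.KerasLayerHyperparams(proto)
# params() can be splatted straight into a Keras layer; the builder has
# already translated the Slim-style fields into their Keras equivalents.
conv = tf.keras.layers.Conv2D(32, 3, **keras_config.params())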
if __name__ == '__main__':
tf.test.main()
@@ -46,6 +46,20 @@ class ImageResizerBuilderTest(tf.test.TestCase):
input_shape, image_resizer_text_proto)
self.assertEqual(output_shape, expected_output_shape)
def test_build_keep_aspect_ratio_resizer_grayscale(self):
image_resizer_text_proto = """
keep_aspect_ratio_resizer {
min_dimension: 10
max_dimension: 20
convert_to_grayscale: true
}
"""
input_shape = (50, 25, 3)
expected_output_shape = (20, 10, 1)
output_shape = self._shape_of_resized_random_image_given_text_proto(
input_shape, image_resizer_text_proto)
self.assertEqual(output_shape, expected_output_shape)
def test_build_keep_aspect_ratio_resizer_with_padding(self):
image_resizer_text_proto = """
keep_aspect_ratio_resizer {
@@ -76,6 +90,20 @@ class ImageResizerBuilderTest(tf.test.TestCase):
input_shape, image_resizer_text_proto)
self.assertEqual(output_shape, expected_output_shape)
def test_build_fixed_shape_resizer_grayscale(self):
image_resizer_text_proto = """
fixed_shape_resizer {
height: 10
width: 20
convert_to_grayscale: true
}
"""
input_shape = (50, 25, 3)
expected_output_shape = (10, 20, 1)
output_shape = self._shape_of_resized_random_image_given_text_proto(
input_shape, image_resizer_text_proto)
self.assertEqual(output_shape, expected_output_shape)
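For reference, both grayscale expectations can be reasoned out by hand: the keep-aspect-ratio case scales (50, 25) by 0.4 so the larger side hits max_dimension=20, giving (20, 10), while the fixed-shape case forces (10, 20) outright; convert_to_grayscale then collapses the channel dimension from 3 to 1. A sketch of driving the builder directly (illustrative values):

from google.protobuf import text_format
from object_detection.builders import image_resizer_builder
from object_detection.protos import image_resizer_pb2

resizer_proto = image_resizer_pb2.ImageResizer()
text_format.Merge("""
  fixed_shape_resizer {
    height: 10
    width: 20
    convert_to_grayscale: true
  }
""", resizer_proto)
# build() returns a callable that maps an image tensor to its resized
# (and, here, grayscale) version.
image_resizer_fn = image_resizer_builder.build(resizer_proto)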
def test_raises_error_on_invalid_input(self):
invalid_input = 'invalid_input'
with self.assertRaises(ValueError):
@@ -23,7 +23,8 @@ from object_detection.builders import losses_builder
from object_detection.builders import matcher_builder
from object_detection.builders import post_processing_builder
from object_detection.builders import region_similarity_calculator_builder as sim_calc
from object_detection.core import balanced_positive_negative_sampler as sampler
from object_detection.core import target_assigner
from object_detection.meta_architectures import faster_rcnn_meta_arch
from object_detection.meta_architectures import rfcn_meta_arch
from object_detection.meta_architectures import ssd_meta_arch
@@ -41,6 +42,7 @@ from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobile
from object_detection.models.ssd_mobilenet_v1_fpn_feature_extractor import SSDMobileNetV1FpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v1_ppn_feature_extractor import SSDMobileNetV1PpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.predictors import rfcn_box_predictor
from object_detection.protos import model_pb2
# A map of names to SSD feature extractors.
@@ -142,10 +144,34 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
raise ValueError('Unknown ssd feature_extractor: {}'.format(feature_type))
feature_extractor_class = SSD_FEATURE_EXTRACTOR_CLASS_MAP[feature_type]
kwargs = {
'is_training':
is_training,
'depth_multiplier':
depth_multiplier,
'min_depth':
min_depth,
'pad_to_multiple':
pad_to_multiple,
'conv_hyperparams_fn':
conv_hyperparams,
'reuse_weights':
reuse_weights,
'use_explicit_padding':
use_explicit_padding,
'use_depthwise':
use_depthwise,
'override_base_feature_extractor_hyperparams':
override_base_feature_extractor_hyperparams
}
if feature_extractor_config.HasField('fpn'):
kwargs.update({
'fpn_min_level': feature_extractor_config.fpn.min_level,
'fpn_max_level': feature_extractor_config.fpn.max_level,
})
return feature_extractor_class(**kwargs)
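# Note: the fpn kwargs above are only added when the new `fpn` field is set
# on the feature extractor config, so non-FPN extractors are unaffected. An
# illustrative config fragment (level values are examples only):
#
#   feature_extractor {
#     type: 'ssd_resnet50_v1_fpn'
#     fpn {
#       min_level: 3
#       max_level: 7
#     }
#   }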
def _build_ssd_model(ssd_config, is_training, add_summaries,
@@ -291,6 +317,10 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
first_stage_anchor_generator = anchor_generator_builder.build(
frcnn_config.first_stage_anchor_generator)
first_stage_target_assigner = target_assigner.create_target_assigner(
'FasterRCNN',
'proposal',
use_matmul_gather=frcnn_config.use_matmul_gather_in_matcher)
first_stage_atrous_rate = frcnn_config.first_stage_atrous_rate
first_stage_box_predictor_arg_scope_fn = hyperparams_builder.build(
frcnn_config.first_stage_box_predictor_conv_hyperparams, is_training)
@@ -298,8 +328,9 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
frcnn_config.first_stage_box_predictor_kernel_size)
first_stage_box_predictor_depth = frcnn_config.first_stage_box_predictor_depth
first_stage_minibatch_size = frcnn_config.first_stage_minibatch_size
first_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=frcnn_config.first_stage_positive_balance_fraction,
is_static=frcnn_config.use_static_balanced_label_sampler)
first_stage_nms_score_threshold = frcnn_config.first_stage_nms_score_threshold
first_stage_nms_iou_threshold = frcnn_config.first_stage_nms_iou_threshold
first_stage_max_proposals = frcnn_config.first_stage_max_proposals
@@ -311,13 +342,19 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
maxpool_kernel_size = frcnn_config.maxpool_kernel_size
maxpool_stride = frcnn_config.maxpool_stride
second_stage_target_assigner = target_assigner.create_target_assigner(
'FasterRCNN',
'detection',
use_matmul_gather=frcnn_config.use_matmul_gather_in_matcher)
second_stage_box_predictor = box_predictor_builder.build(
hyperparams_builder.build,
frcnn_config.second_stage_box_predictor,
is_training=is_training,
num_classes=num_classes)
second_stage_batch_size = frcnn_config.second_stage_batch_size
second_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=frcnn_config.second_stage_balance_fraction,
is_static=frcnn_config.use_static_balanced_label_sampler)
(second_stage_non_max_suppression_fn, second_stage_score_conversion_fn
) = post_processing_builder.build(frcnn_config.second_stage_post_processing)
second_stage_localization_loss_weight = (
@@ -338,6 +375,8 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
second_stage_localization_loss_weight)
use_matmul_crop_and_resize = (frcnn_config.use_matmul_crop_and_resize)
clip_anchors_to_image = (
frcnn_config.clip_anchors_to_image)
common_kwargs = {
'is_training': is_training,
@@ -346,6 +385,7 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
'feature_extractor': feature_extractor,
'number_of_stages': number_of_stages,
'first_stage_anchor_generator': first_stage_anchor_generator,
'first_stage_target_assigner': first_stage_target_assigner,
'first_stage_atrous_rate': first_stage_atrous_rate,
'first_stage_box_predictor_arg_scope_fn':
first_stage_box_predictor_arg_scope_fn,
@@ -353,15 +393,15 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
first_stage_box_predictor_kernel_size,
'first_stage_box_predictor_depth': first_stage_box_predictor_depth,
'first_stage_minibatch_size': first_stage_minibatch_size,
'first_stage_sampler': first_stage_sampler,
'first_stage_nms_score_threshold': first_stage_nms_score_threshold,
'first_stage_nms_iou_threshold': first_stage_nms_iou_threshold,
'first_stage_max_proposals': first_stage_max_proposals,
'first_stage_localization_loss_weight': first_stage_loc_loss_weight,
'first_stage_objectness_loss_weight': first_stage_obj_loss_weight,
'second_stage_target_assigner': second_stage_target_assigner,
'second_stage_batch_size': second_stage_batch_size,
'second_stage_sampler': second_stage_sampler,
'second_stage_non_max_suppression_fn':
second_stage_non_max_suppression_fn,
'second_stage_score_conversion_fn': second_stage_score_conversion_fn,
@@ -373,10 +413,12 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
second_stage_classification_loss_weight,
'hard_example_miner': hard_example_miner,
'add_summaries': add_summaries,
'use_matmul_crop_and_resize': use_matmul_crop_and_resize,
'clip_anchors_to_image': clip_anchors_to_image
}
if isinstance(second_stage_box_predictor,
rfcn_box_predictor.RfcnBoxPredictor):
return rfcn_meta_arch.RFCNMetaArch(
second_stage_rfcn_box_predictor=second_stage_box_predictor,
**common_kwargs)
@@ -54,12 +54,6 @@ SSD_RESNET_V1_FPN_FEAT_MAPS = {
ssd_resnet_v1_fpn.SSDResnet101V1FpnFeatureExtractor,
'ssd_resnet152_v1_fpn':
ssd_resnet_v1_fpn.SSDResnet152V1FpnFeatureExtractor,
}
SSD_RESNET_V1_PPN_FEAT_MAPS = {
@@ -235,6 +229,10 @@ class ModelBuilderTest(tf.test.TestCase):
ssd {
feature_extractor {
type: 'ssd_resnet50_v1_fpn'
fpn {
min_level: 3
max_level: 7
}
conv_hyperparams {
regularizer {
l2_regularizer {
@@ -479,6 +477,10 @@ class ModelBuilderTest(tf.test.TestCase):
inplace_batchnorm_update: true
feature_extractor {
type: 'ssd_mobilenet_v1_fpn'
fpn {
min_level: 3
max_level: 7
}
conv_hyperparams {
regularizer {
l2_regularizer {
@@ -71,22 +71,38 @@ def _get_dict_from_proto(config):
# function that should be used. The PreprocessingStep proto should be parsable
# with _get_dict_from_proto.
PREPROCESSING_FUNCTION_MAP = {
'normalize_image':
preprocessor.normalize_image,
'random_pixel_value_scale':
preprocessor.random_pixel_value_scale,
'random_image_scale':
preprocessor.random_image_scale,
'random_rgb_to_gray':
preprocessor.random_rgb_to_gray,
'random_adjust_brightness':
preprocessor.random_adjust_brightness,
'random_adjust_contrast':
preprocessor.random_adjust_contrast,
'random_adjust_hue':
preprocessor.random_adjust_hue,
'random_adjust_saturation':
preprocessor.random_adjust_saturation,
'random_distort_color':
preprocessor.random_distort_color,
'random_jitter_boxes':
preprocessor.random_jitter_boxes,
'random_crop_to_aspect_ratio':
preprocessor.random_crop_to_aspect_ratio,
'random_black_patches':
preprocessor.random_black_patches,
'rgb_to_gray':
preprocessor.rgb_to_gray,
'scale_boxes_to_pixel_coordinates': (
preprocessor.scale_boxes_to_pixel_coordinates),
'subtract_channel_mean':
preprocessor.subtract_channel_mean,
'convert_class_logits_to_softmax':
preprocessor.convert_class_logits_to_softmax,
}
@@ -561,6 +561,18 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'min_padded_size_ratio': (1.0, 1.0),
'max_padded_size_ratio': (2.0, 2.0)})
def test_build_normalize_image_convert_class_logits_to_softmax(self):
preprocessor_text_proto = """
convert_class_logits_to_softmax {
temperature: 2
}
"""
preprocessor_proto = preprocessor_pb2.PreprocessingStep()
text_format.Merge(preprocessor_text_proto, preprocessor_proto)
function, args = preprocessor_builder.build(preprocessor_proto)
self.assertEqual(function, preprocessor.convert_class_logits_to_softmax)
self.assertEqual(args, {'temperature': 2})
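For intuition: convert_class_logits_to_softmax applies a temperature-scaled softmax to the multiclass scores, so (assuming the usual softmax(scores / temperature) formulation) temperature=2 flattens the distribution: softmax([2., 0.] / 2.) is roughly [0.73, 0.27], versus roughly [0.88, 0.12] at temperature 1.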
if __name__ == '__main__':
tf.test.main()
@@ -51,6 +51,9 @@ def build(region_similarity_calculator_config):
return region_similarity_calculator.IoaSimilarity()
if similarity_calculator == 'neg_sq_dist_similarity':
return region_similarity_calculator.NegSqDistSimilarity()
if similarity_calculator == 'thresholded_iou_similarity':
return region_similarity_calculator.ThresholdedIouSimilarity(
region_similarity_calculator_config.thresholded_iou_similarity.threshold
)
raise ValueError('Unknown region similarity calculator.')
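# An illustrative config fragment selecting the new calculator (the
# threshold value here is an example only):
#
#   region_similarity_calculator {
#     thresholded_iou_similarity {
#       threshold: 0.5
#     }
#   }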
@@ -29,17 +29,19 @@ the minibatch_sampler base class.
import tensorflow as tf
from object_detection.core import minibatch_sampler
from object_detection.utils import ops
class BalancedPositiveNegativeSampler(minibatch_sampler.MinibatchSampler):
"""Subsamples minibatches to a desired balance of positives and negatives.""" """Subsamples minibatches to a desired balance of positives and negatives."""
def __init__(self, positive_fraction=0.5): def __init__(self, positive_fraction=0.5, is_static=False):
"""Constructs a minibatch sampler. """Constructs a minibatch sampler.
Args: Args:
positive_fraction: desired fraction of positive examples (scalar in [0,1]) positive_fraction: desired fraction of positive examples (scalar in [0,1])
in the batch. in the batch.
is_static: If True, uses an implementation with static shape guarantees.
Raises:
ValueError: if positive_fraction < 0, or positive_fraction > 1
@@ -48,21 +50,159 @@ class BalancedPositiveNegativeSampler(minibatch_sampler.MinibatchSampler):
raise ValueError('positive_fraction should be in range [0,1]. '
'Received: %s.' % positive_fraction)
self._positive_fraction = positive_fraction
self._is_static = is_static
def _get_num_pos_neg_samples(self, sorted_indices_tensor, sample_size):
"""Counts the number of positives and negatives numbers to be sampled.
Args:
sorted_indices_tensor: A sorted int32 tensor of shape [N] which contains
the signed indices of the examples where the sign is based on the label
value. The examples that cannot be sampled are set to 0. It samples
atmost sample_size*positive_fraction positive examples and remaining
from negative examples.
sample_size: Size of subsamples.
Returns:
A tuple containing the number of positive and negative labels in the
subsample.
"""
input_length = tf.shape(sorted_indices_tensor)[0]
valid_positive_index = tf.greater(sorted_indices_tensor,
tf.zeros(input_length, tf.int32))
num_sampled_pos = tf.reduce_sum(tf.cast(valid_positive_index, tf.int32))
max_num_positive_samples = tf.constant(
int(sample_size * self._positive_fraction), tf.int32)
num_positive_samples = tf.minimum(max_num_positive_samples, num_sampled_pos)
num_negative_samples = tf.constant(sample_size,
tf.int32) - num_positive_samples
return num_positive_samples, num_negative_samples
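# Worked example (illustrative, not part of this change): with sample_size=4
# and positive_fraction=0.5, a sorted input of [3, 1, -2, -5, 0, 0] contains
# two valid positives (3 and 1), so num_positive_samples =
# min(int(4 * 0.5), 2) = 2 and num_negative_samples = 4 - 2 = 2.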
def _get_values_from_start_and_end(self, input_tensor, num_start_samples,
num_end_samples, total_num_samples):
"""slices num_start_samples and last num_end_samples from input_tensor.
Args:
input_tensor: An int32 tensor of shape [N] to be sliced.
num_start_samples: Number of examples to be sliced from the beginning
of the input tensor.
num_end_samples: Number of examples to be sliced from the end of the
input tensor.
total_num_samples: Sum of num_start_samples and num_end_samples. This
should be a scalar.
Returns:
A tensor containing the first num_start_samples and last num_end_samples
from input_tensor.
"""
input_length = tf.shape(input_tensor)[0]
start_positions = tf.less(tf.range(input_length), num_start_samples)
end_positions = tf.greater_equal(
tf.range(input_length), input_length - num_end_samples)
selected_positions = tf.logical_or(start_positions, end_positions)
selected_positions = tf.cast(selected_positions, tf.int32)
indexed_positions = tf.multiply(tf.cumsum(selected_positions),
selected_positions)
one_hot_selector = tf.one_hot(indexed_positions - 1,
total_num_samples,
dtype=tf.int32)
return tf.tensordot(input_tensor, one_hot_selector, axes=[0, 0])
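# Worked example (illustrative): for input_tensor=[9, 8, 7, 6, 5],
# num_start_samples=2, num_end_samples=1 and total_num_samples=3, the
# selected positions are [T, T, F, F, T], and the cumsum/one-hot trick
# gathers [9, 8, 5] using only statically-shaped ops.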
def _static_subsample(self, indicator, batch_size, labels):
"""Returns subsampled minibatch.
Args:
indicator: boolean tensor of shape [N] whose True entries can be sampled.
N should be a compile-time constant.
batch_size: desired batch size. This scalar cannot be None.
labels: boolean tensor of shape [N] denoting positive(=True) and negative
(=False) examples. N should be a compile-time constant.
Returns:
sampled_idx_indicator: boolean tensor of shape [N], True for entries which
are sampled.
Raises:
ValueError: if labels and indicator are not 1D boolean tensors.
"""
# Check if indicator and labels have a static size.
if not indicator.shape.is_fully_defined():
raise ValueError('indicator must be static in shape when is_static is '
'True')
if not labels.shape.is_fully_defined():
raise ValueError('labels must be static in shape when is_static is '
'True')
if not isinstance(batch_size, int):
raise ValueError('batch_size has to be an integer when is_static is '
'True.')
input_length = tf.shape(indicator)[0]
# Shuffle indicator and label. Need to store the permutation to restore the
# order post sampling.
permutation = tf.random_shuffle(tf.range(input_length))
indicator = ops.matmul_gather_on_zeroth_axis(
tf.cast(indicator, tf.float32), permutation)
labels = ops.matmul_gather_on_zeroth_axis(
tf.cast(labels, tf.float32), permutation)
# index (starting from 1) when indicator is True, 0 when False
indicator_idx = tf.where(
tf.cast(indicator, tf.bool), tf.range(1, input_length + 1),
tf.zeros(input_length, tf.int32))
# Replace -1 for negative, +1 for positive labels
signed_label = tf.where(
tf.cast(labels, tf.bool), tf.ones(input_length, tf.int32),
tf.scalar_mul(-1, tf.ones(input_length, tf.int32)))
# negative of index for negative label, positive index for positive label,
# 0 when indicator is False.
signed_indicator_idx = tf.multiply(indicator_idx, signed_label)
sorted_signed_indicator_idx = tf.nn.top_k(
signed_indicator_idx, input_length, sorted=True).values
[num_positive_samples,
num_negative_samples] = self._get_num_pos_neg_samples(
sorted_signed_indicator_idx, batch_size)
sampled_idx = self._get_values_from_start_and_end(
sorted_signed_indicator_idx, num_positive_samples,
num_negative_samples, batch_size)
# Shift the indices to start from 0 and remove any samples that are set as
# False.
sampled_idx = tf.abs(sampled_idx) - tf.ones(batch_size, tf.int32)
sampled_idx = tf.multiply(
tf.cast(tf.greater_equal(sampled_idx, tf.constant(0)), tf.int32),
sampled_idx)
sampled_idx_indicator = tf.cast(tf.reduce_sum(
tf.one_hot(sampled_idx, depth=input_length),
axis=0), tf.bool)
# project back the order based on stored permutations
reprojections = tf.one_hot(permutation, depth=input_length, dtype=tf.int32)
return tf.cast(tf.tensordot(
tf.cast(sampled_idx_indicator, tf.int32),
reprojections, axes=[0, 0]), tf.bool)
def subsample(self, indicator, batch_size, labels, scope=None):
"""Returns subsampled minibatch. """Returns subsampled minibatch.
Args: Args:
indicator: boolean tensor of shape [N] whose True entries can be sampled. indicator: boolean tensor of shape [N] whose True entries can be sampled.
batch_size: desired batch size. If None, keeps all positive samples and batch_size: desired batch size. If None, keeps all positive samples and
randomly selects negative samples so that the positive sample fraction randomly selects negative samples so that the positive sample fraction
matches self._positive_fraction. matches self._positive_fraction. It cannot be None is is_static is True.
labels: boolean tensor of shape [N] denoting positive(=True) and negative labels: boolean tensor of shape [N] denoting positive(=True) and negative
(=False) examples. (=False) examples.
scope: name scope.
Returns: Returns:
is_sampled: boolean tensor of shape [N], True for entries which are sampled_idx_indicator: boolean tensor of shape [N], True for entries which
sampled. are sampled.
Raises: Raises:
ValueError: if labels and indicator are not 1D boolean tensors. ValueError: if labels and indicator are not 1D boolean tensors.
@@ -79,27 +219,30 @@ class BalancedPositiveNegativeSampler(minibatch_sampler.MinibatchSampler):
if indicator.dtype != tf.bool:
raise ValueError('indicator should be of type bool. Received: %s' %
indicator.dtype)
with tf.name_scope(scope, 'BalancedPositiveNegativeSampler'):
if self._is_static:
return self._static_subsample(indicator, batch_size, labels)
else:
# Only sample from indicated samples
negative_idx = tf.logical_not(labels)
positive_idx = tf.logical_and(labels, indicator)
negative_idx = tf.logical_and(negative_idx, indicator)
# Sample positive and negative samples separately
if batch_size is None:
max_num_pos = tf.reduce_sum(tf.to_int32(positive_idx))
else:
max_num_pos = int(self._positive_fraction * batch_size)
sampled_pos_idx = self.subsample_indicator(positive_idx, max_num_pos)
num_sampled_pos = tf.reduce_sum(tf.cast(sampled_pos_idx, tf.int32))
if batch_size is None:
negative_positive_ratio = (
1 - self._positive_fraction) / self._positive_fraction
max_num_neg = tf.to_int32(
negative_positive_ratio * tf.to_float(num_sampled_pos))
else:
max_num_neg = batch_size - num_sampled_pos
sampled_neg_idx = self.subsample_indicator(negative_idx, max_num_neg)
return tf.logical_or(sampled_pos_idx, sampled_neg_idx)
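A minimal usage sketch of the new static path (illustrative values, not part of the diff): with is_static=True the inputs must be statically shaped and batch_size must be a Python int, mirroring the checks in _static_subsample above.

import tensorflow as tf
from object_detection.core import balanced_positive_negative_sampler

sampler = balanced_positive_negative_sampler.BalancedPositiveNegativeSampler(
    positive_fraction=0.5, is_static=True)
indicator = tf.constant([True] * 100)             # statically shaped [100]
labels = tf.constant([True] * 20 + [False] * 80)  # 20 positive examples
# At most int(64 * 0.5) = 32 positives are kept; here all 20 positives plus
# 44 negatives are selected, returned as a boolean tensor of shape [100].
is_sampled = sampler.subsample(indicator, 64, labels)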
@@ -24,15 +24,16 @@ from object_detection.utils import test_case
class BalancedPositiveNegativeSamplerTest(test_case.TestCase):
def _test_subsample_all_examples(self, is_static=False):
numpy_labels = np.random.permutation(300)
indicator = tf.constant(np.ones(300) == 1)
numpy_labels = (numpy_labels - 200) > 0
labels = tf.constant(numpy_labels)
sampler = (
balanced_positive_negative_sampler.BalancedPositiveNegativeSampler(
is_static=is_static))
is_sampled = sampler.subsample(indicator, 64, labels)
with self.test_session() as sess:
is_sampled = sess.run(is_sampled)
@@ -41,7 +42,13 @@ class BalancedPositiveNegativeSamplerTest(test_case.TestCase):
self.assertTrue(sum(np.logical_and(
np.logical_not(numpy_labels), is_sampled)) == 32)
def test_subsample_all_examples_dynamic(self):
self._test_subsample_all_examples()
def test_subsample_all_examples_static(self):
self._test_subsample_all_examples(is_static=True)
def _test_subsample_selection(self, is_static=False):
# Test random sampling when only some examples can be sampled:
# 100 samples, 20 positives, 10 positives cannot be sampled
numpy_labels = np.arange(100)
@@ -51,8 +58,9 @@ class BalancedPositiveNegativeSamplerTest(test_case.TestCase):
labels = tf.constant(numpy_labels)
sampler = (
balanced_positive_negative_sampler.BalancedPositiveNegativeSampler(
is_static=is_static))
is_sampled = sampler.subsample(indicator, 64, labels)
with self.test_session() as sess:
is_sampled = sess.run(is_sampled)
@@ -63,6 +71,42 @@ class BalancedPositiveNegativeSamplerTest(test_case.TestCase):
self.assertAllEqual(is_sampled, np.logical_and(is_sampled,
numpy_indicator))
def test_subsample_selection_dynamic(self):
self._test_subsample_selection()
def test_subsample_selection_static(self):
self._test_subsample_selection(is_static=True)
def _test_subsample_selection_larger_batch_size(self, is_static=False):
# Test random sampling when the total number of examples that can be
# sampled is less than the batch size:
# 100 samples, 50 positives, 40 positives cannot be sampled, batch size 64.
numpy_labels = np.arange(100)
numpy_indicator = numpy_labels < 60
indicator = tf.constant(numpy_indicator)
numpy_labels = (numpy_labels - 50) >= 0
labels = tf.constant(numpy_labels)
sampler = (
balanced_positive_negative_sampler.BalancedPositiveNegativeSampler(
is_static=is_static))
is_sampled = sampler.subsample(indicator, 64, labels)
with self.test_session() as sess:
is_sampled = sess.run(is_sampled)
self.assertTrue(sum(is_sampled) == 60)
self.assertTrue(sum(np.logical_and(numpy_labels, is_sampled)) == 10)
self.assertTrue(
sum(np.logical_and(np.logical_not(numpy_labels), is_sampled)) == 50)
self.assertAllEqual(is_sampled, np.logical_and(is_sampled,
numpy_indicator))
def test_subsample_selection_larger_batch_size_dynamic(self):
self._test_subsample_selection_larger_batch_size()
def test_subsample_selection_larger_batch_size_static(self):
self._test_subsample_selection_larger_batch_size(is_static=True)
def test_subsample_selection_no_batch_size(self):
# Test random sampling when only some examples can be sampled:
# 1000 samples, 6 positives (5 can be sampled).
@@ -85,6 +129,14 @@ class BalancedPositiveNegativeSamplerTest(test_case.TestCase):
self.assertAllEqual(is_sampled, np.logical_and(is_sampled,
numpy_indicator))
def test_subsample_selection_no_batch_size_static(self):
labels = tf.constant([[True, False, False]])
indicator = tf.constant([True, False, True])
sampler = (
balanced_positive_negative_sampler.BalancedPositiveNegativeSampler())
with self.assertRaises(ValueError):
sampler.subsample(indicator, None, labels)
def test_raises_error_with_incorrect_label_shape(self):
labels = tf.constant([[True, False, False]])
indicator = tf.constant([True, False, True])
@@ -101,6 +153,5 @@ class BalancedPositiveNegativeSamplerTest(test_case.TestCase):
with self.assertRaises(ValueError):
sampler.subsample(indicator, 64, labels)
if __name__ == '__main__':
tf.test.main()
@@ -27,13 +27,7 @@ These modules are separated from the main model since the same
few box predictor architectures are shared across many models.
"""
from abc import abstractmethod
import tensorflow as tf
BOX_ENCODINGS = 'box_encodings'
CLASS_PREDICTIONS_WITH_BACKGROUND = 'class_predictions_with_background'
@@ -56,6 +50,10 @@ class BoxPredictor(object):
self._is_training = is_training
self._num_classes = num_classes
@property
def is_keras_model(self):
return False
@property
def num_classes(self):
return self._num_classes
@@ -136,26 +134,11 @@
pass
class KerasBoxPredictor(tf.keras.Model):
"""Keras-based BoxPredictor."""
def __init__(self, is_training, num_classes, freeze_batchnorm,
inplace_batchnorm_update):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
freeze_batchnorm: Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
inplace_batchnorm_update: Whether to update batch norm moving average
values inplace. When this is false train op must add a control
dependency on tf.graphkeys.UPDATE_OPS collection in order to update
batch norm statistics.
"""
super(KerasBoxPredictor, self).__init__()
self._is_training = is_training
self._num_classes = num_classes
self._freeze_batchnorm = freeze_batchnorm
self._inplace_batchnorm_update = inplace_batchnorm_update
@property
def is_keras_model(self):
return True
@property
def num_classes(self):
return self._num_classes
def call(self, image_features, scope=None, **kwargs):
"""Computes encoded object locations and corresponding confidences.
Takes a list of high level image feature maps as input and produces a list
of box encodings and a list of class scores where each element in the output
lists corresponds to the feature maps in the input list.
Args:
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
scope: Variable and Op scope name.
**kwargs: Additional keyword arguments for specific implementations of
BoxPredictor.
Returns:
A dictionary containing at least the following tensors.
box_encodings: A list of float tensors. Each entry in the list
corresponds to a feature map in the input `image_features` list. All
tensors in the list have one of the two following shapes:
a. [batch_size, num_anchors_i, q, code_size] representing the location
of the objects, where q is 1 or the number of classes.
b. [batch_size, num_anchors_i, code_size].
class_predictions_with_background: A list of float tensors of shape
[batch_size, num_anchors_i, num_classes + 1] representing the class
predictions for the proposals. Each entry in the list corresponds to a
feature map in the input `image_features` list.
"""
return self._predict(image_features, **kwargs)
@abstractmethod
def _predict(self, image_features, **kwargs):
"""Implementations must override this method."""
@@ -164,835 +147,80 @@ class RfcnBoxPredictor(BoxPredictor):
class RfcnBoxPredictor(BoxPredictor):
"""RFCN Box Predictor.
Applies a position sensitive ROI pooling on position sensitive feature maps to
predict classes and refined locations. See https://arxiv.org/abs/1605.06409
for details.
This is used for the second stage of the RFCN meta architecture. Notice that
locations are *not* shared across classes, thus for each anchor, a separate
prediction is made for each class.
"""
def __init__(self,
is_training,
num_classes,
conv_hyperparams_fn,
num_spatial_bins,
depth,
crop_size,
box_code_size):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
conv_hyperparams_fn: A function to construct tf-slim arg_scope with
hyperparameters for convolutional layers.
num_spatial_bins: A list of two integers `[spatial_bins_y,
spatial_bins_x]`.
depth: Target depth to reduce the input feature maps to.
crop_size: A list of two integers `[crop_height, crop_width]`.
box_code_size: Size of encoding for each box.
"""
super(RfcnBoxPredictor, self).__init__(is_training, num_classes)
self._conv_hyperparams_fn = conv_hyperparams_fn
self._num_spatial_bins = num_spatial_bins
self._depth = depth
self._crop_size = crop_size
self._box_code_size = box_code_size
@property
def num_classes(self):
return self._num_classes
def _predict(self, image_features, num_predictions_per_location,
proposal_boxes):
"""Computes encoded object locations and corresponding confidences.
Args:
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
num_predictions_per_location: A list of integers representing the number
of box predictions to be made per spatial location for each feature map.
Currently, this must be set to [1], or an error will be raised.
proposal_boxes: A float tensor of shape [batch_size, num_proposals,
box_code_size].
Returns:
box_encodings: A list of float tensors of shape
[batch_size, num_anchors_i, q, code_size] representing the location of
the objects, where q is 1 or the number of classes. Each entry in the
list corresponds to a feature map in the input `image_features` list.
class_predictions_with_background: A list of float tensors of shape
[batch_size, num_anchors_i, num_classes + 1] representing the class
predictions for the proposals. Each entry in the list corresponds to a
feature map in the input `image_features` list.
Raises:
ValueError: if num_predictions_per_location is not 1 or if
len(image_features) is not 1.
"""
if (len(num_predictions_per_location) != 1 or
num_predictions_per_location[0] != 1):
raise ValueError('Currently RfcnBoxPredictor only supports '
'predicting a single box per class per location.')
if len(image_features) != 1:
raise ValueError('length of `image_features` must be 1. Found {}'.
format(len(image_features)))
image_feature = image_features[0]
num_predictions_per_location = num_predictions_per_location[0]
batch_size = tf.shape(proposal_boxes)[0]
num_boxes = tf.shape(proposal_boxes)[1]
def get_box_indices(proposals):
proposals_shape = proposals.get_shape().as_list()
if any(dim is None for dim in proposals_shape):
proposals_shape = tf.shape(proposals)
ones_mat = tf.ones(proposals_shape[:2], dtype=tf.int32)
multiplier = tf.expand_dims(
tf.range(start=0, limit=proposals_shape[0]), 1)
return tf.reshape(ones_mat * multiplier, [-1])
net = image_feature
with slim.arg_scope(self._conv_hyperparams_fn()):
net = slim.conv2d(net, self._depth, [1, 1], scope='reduce_depth')
# Location predictions.
location_feature_map_depth = (self._num_spatial_bins[0] *
self._num_spatial_bins[1] *
self.num_classes *
self._box_code_size)
location_feature_map = slim.conv2d(net, location_feature_map_depth,
[1, 1], activation_fn=None,
scope='refined_locations')
box_encodings = ops.position_sensitive_crop_regions(
location_feature_map,
boxes=tf.reshape(proposal_boxes, [-1, self._box_code_size]),
box_ind=get_box_indices(proposal_boxes),
crop_size=self._crop_size,
num_spatial_bins=self._num_spatial_bins,
global_pool=True)
box_encodings = tf.squeeze(box_encodings, squeeze_dims=[1, 2])
box_encodings = tf.reshape(box_encodings,
[batch_size * num_boxes, 1, self.num_classes,
self._box_code_size])
# Class predictions.
total_classes = self.num_classes + 1 # Account for background class.
class_feature_map_depth = (self._num_spatial_bins[0] *
self._num_spatial_bins[1] *
total_classes)
class_feature_map = slim.conv2d(net, class_feature_map_depth, [1, 1],
activation_fn=None,
scope='class_predictions')
class_predictions_with_background = ops.position_sensitive_crop_regions(
class_feature_map,
boxes=tf.reshape(proposal_boxes, [-1, self._box_code_size]),
box_ind=get_box_indices(proposal_boxes),
crop_size=self._crop_size,
num_spatial_bins=self._num_spatial_bins,
global_pool=True)
class_predictions_with_background = tf.squeeze(
class_predictions_with_background, squeeze_dims=[1, 2])
class_predictions_with_background = tf.reshape(
class_predictions_with_background,
[batch_size * num_boxes, 1, total_classes])
return {BOX_ENCODINGS: [box_encodings],
CLASS_PREDICTIONS_WITH_BACKGROUND:
[class_predictions_with_background]}
# TODO(rathodv): Change the implementation to return lists of predictions.
class MaskRCNNBoxPredictor(BoxPredictor):
"""Mask R-CNN Box Predictor.
See Mask R-CNN: He, K., Gkioxari, G., Dollar, P., & Girshick, R. (2017).
Mask R-CNN. arXiv preprint arXiv:1703.06870.
This is used for the second stage of the Mask R-CNN detector where proposals
cropped from an image are arranged along the batch dimension of the input
image_features tensor. Notice that locations are *not* shared across classes,
thus for each anchor, a separate prediction is made for each class.
In addition to predicting boxes and classes, optionally this class allows
predicting masks and/or keypoints inside detection boxes.
Currently this box predictor makes per-class predictions; that is, each
anchor makes a separate box prediction for each class.
"""
def __init__(self,
is_training,
num_classes,
fc_hyperparams_fn,
use_dropout,
dropout_keep_prob,
box_code_size,
conv_hyperparams_fn=None,
predict_instance_masks=False,
mask_height=14,
mask_width=14,
mask_prediction_num_conv_layers=2,
mask_prediction_conv_depth=256,
masks_are_class_agnostic=False,
predict_keypoints=False,
share_box_across_classes=False):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
fc_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for fully connected ops.
use_dropout: Option to use dropout or not. Note that a single dropout
op is applied here prior to both box and class predictions, which stands
in contrast to the ConvolutionalBoxPredictor below.
dropout_keep_prob: Keep probability for dropout.
This is only used if use_dropout is True.
box_code_size: Size of encoding for each box.
conv_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for convolution ops.
predict_instance_masks: Whether to predict object masks inside detection
boxes.
mask_height: Desired output mask height. The default value is 14.
mask_width: Desired output mask width. The default value is 14.
mask_prediction_num_conv_layers: Number of convolution layers applied to
the image_features in mask prediction branch.
mask_prediction_conv_depth: The depth for the first conv2d_transpose op
applied to the image_features in the mask prediction branch. If set
to 0, the depth of the convolution layers will be automatically chosen
based on the number of object classes and the number of channels in the
image features.
masks_are_class_agnostic: Boolean determining if the mask-head is
class-agnostic or not.
predict_keypoints: Whether to predict keypoints insde detection boxes.
share_box_across_classes: Whether to share boxes across classes rather
than use a different box for each class.
Raises:
ValueError: If predict_instance_masks is true but conv_hyperparams is not
set.
ValueError: If predict_keypoints is true since it is not implemented yet.
ValueError: If mask_prediction_num_conv_layers is smaller than two.
"""
super(MaskRCNNBoxPredictor, self).__init__(is_training, num_classes)
self._fc_hyperparams_fn = fc_hyperparams_fn
self._use_dropout = use_dropout
self._box_code_size = box_code_size
self._dropout_keep_prob = dropout_keep_prob
self._conv_hyperparams_fn = conv_hyperparams_fn
self._predict_instance_masks = predict_instance_masks
self._mask_height = mask_height
self._mask_width = mask_width
self._mask_prediction_num_conv_layers = mask_prediction_num_conv_layers
self._mask_prediction_conv_depth = mask_prediction_conv_depth
self._masks_are_class_agnostic = masks_are_class_agnostic
self._predict_keypoints = predict_keypoints
self._share_box_across_classes = share_box_across_classes
if self._predict_keypoints:
raise ValueError('Keypoint prediction is unimplemented.')
if ((self._predict_instance_masks or self._predict_keypoints) and
self._conv_hyperparams_fn is None):
raise ValueError('`conv_hyperparams` must be provided when predicting '
'masks.')
if self._mask_prediction_num_conv_layers < 2:
raise ValueError(
'Mask prediction should consist of at least 2 conv layers')
@property
def num_classes(self):
return self._num_classes
@property
def predicts_instance_masks(self):
return self._predict_instance_masks
def _predict_boxes_and_classes(self, image_features):
"""Predicts boxes and class scores.
Args:
image_features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
Returns:
box_encodings: A float tensor of shape
[batch_size, 1, num_classes, code_size] representing the location of the
objects.
class_predictions_with_background: A float tensor of shape
[batch_size, 1, num_classes + 1] representing the class predictions for
the proposals.
"""
spatial_averaged_image_features = tf.reduce_mean(image_features, [1, 2],
keep_dims=True,
name='AvgPool')
flattened_image_features = slim.flatten(spatial_averaged_image_features)
if self._use_dropout:
flattened_image_features = slim.dropout(flattened_image_features,
keep_prob=self._dropout_keep_prob,
is_training=self._is_training)
number_of_boxes = 1
if not self._share_box_across_classes:
number_of_boxes = self._num_classes
with slim.arg_scope(self._fc_hyperparams_fn()):
box_encodings = slim.fully_connected(
flattened_image_features,
number_of_boxes * self._box_code_size,
activation_fn=None,
scope='BoxEncodingPredictor')
class_predictions_with_background = slim.fully_connected(
flattened_image_features,
self._num_classes + 1,
activation_fn=None,
scope='ClassPredictor')
box_encodings = tf.reshape(
box_encodings, [-1, 1, number_of_boxes, self._box_code_size])
class_predictions_with_background = tf.reshape(
class_predictions_with_background, [-1, 1, self._num_classes + 1])
return box_encodings, class_predictions_with_background
def _get_mask_predictor_conv_depth(self, num_feature_channels, num_classes,
class_weight=3.0, feature_weight=2.0):
"""Computes the depth of the mask predictor convolutions.
Computes the depth of the mask predictor convolutions given feature channels
and number of classes by performing a weighted average of the two in
log space to compute the number of convolution channels. The weights that
are used for computing the weighted average do not need to sum to 1.
Args:
num_feature_channels: An integer containing the number of feature
channels.
num_classes: An integer containing the number of classes.
class_weight: Class weight used in computing the weighted average.
feature_weight: Feature weight used in computing the weighted average.
Returns:
An integer containing the number of convolution channels used by mask
predictor.
"""
num_feature_channels_log = math.log(float(num_feature_channels), 2.0)
num_classes_log = math.log(float(num_classes), 2.0)
weighted_num_feature_channels_log = (
num_feature_channels_log * feature_weight)
weighted_num_classes_log = num_classes_log * class_weight
total_weight = feature_weight + class_weight
num_conv_channels_log = round(
(weighted_num_feature_channels_log + weighted_num_classes_log) /
total_weight)
return int(math.pow(2.0, num_conv_channels_log))
def _predict_masks(self, image_features):
"""Performs mask prediction.
Args:
image_features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
Returns:
instance_masks: A float tensor of shape
[batch_size, 1, num_classes, image_height, image_width].
"""
num_conv_channels = self._mask_prediction_conv_depth
if num_conv_channels == 0:
num_feature_channels = image_features.get_shape().as_list()[3]
num_conv_channels = self._get_mask_predictor_conv_depth(
num_feature_channels, self.num_classes)
with slim.arg_scope(self._conv_hyperparams_fn()):
upsampled_features = tf.image.resize_bilinear(
image_features,
[self._mask_height, self._mask_width],
align_corners=True)
for _ in range(self._mask_prediction_num_conv_layers - 1):
upsampled_features = slim.conv2d(
upsampled_features,
num_outputs=num_conv_channels,
kernel_size=[3, 3])
num_masks = 1 if self._masks_are_class_agnostic else self.num_classes
mask_predictions = slim.conv2d(upsampled_features,
num_outputs=num_masks,
activation_fn=None,
kernel_size=[3, 3])
return tf.expand_dims(
tf.transpose(mask_predictions, perm=[0, 3, 1, 2]),
axis=1,
name='MaskPredictor')
def _predict(self, image_features, num_predictions_per_location,
predict_boxes_and_classes=True, predict_auxiliary_outputs=False):
"""Optionally computes encoded object locations, confidences, and masks.
Flattens image_features and applies fully connected ops (with no
non-linearity) to predict box encodings and class predictions. In this
setting, anchors are not spatially arranged in any way and are assumed to
have been folded into the batch dimension. Thus we output 1 for the
anchors dimension.
Also optionally predicts instance masks.
The mask prediction head is based on the Mask RCNN paper with the following
modifications: We replace the deconvolution layer with a bilinear resize
and a convolution.
Args:
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
num_predictions_per_location: A list of integers representing the number
of box predictions to be made per spatial location for each feature map.
Currently, this must be set to [1], or an error will be raised.
predict_boxes_and_classes: If true, the function will perform box
refinement and classification.
predict_auxiliary_outputs: If true, the function will perform other
predictions such as mask, keypoint, boundaries, etc. if any.
Returns:
A dictionary containing the following tensors.
box_encodings: A float tensor of shape
[batch_size, 1, num_classes, code_size] representing the
location of the objects.
class_predictions_with_background: A float tensor of shape
[batch_size, 1, num_classes + 1] representing the class
predictions for the proposals.
If predict_masks is True the dictionary also contains:
instance_masks: A float tensor of shape
[batch_size, 1, num_classes, image_height, image_width]
If predict_keypoints is True the dictionary also contains:
keypoints: [batch_size, 1, num_keypoints, 2]
Raises:
ValueError: If num_predictions_per_location is not 1 or if both
predict_boxes_and_classes and predict_auxiliary_outputs are false or if
len(image_features) is not 1.
"""
if (len(num_predictions_per_location) != 1 or
num_predictions_per_location[0] != 1):
raise ValueError('Currently FullyConnectedBoxPredictor only supports '
'predicting a single box per class per location.')
if not predict_boxes_and_classes and not predict_auxiliary_outputs:
raise ValueError('Should perform at least one prediction.')
if len(image_features) != 1:
raise ValueError('length of `image_features` must be 1. Found {}'.
format(len(image_features)))
image_feature = image_features[0]
num_predictions_per_location = num_predictions_per_location[0]
predictions_dict = {}
if predict_boxes_and_classes:
(box_encodings, class_predictions_with_background
) = self._predict_boxes_and_classes(image_feature)
predictions_dict[BOX_ENCODINGS] = box_encodings
predictions_dict[
CLASS_PREDICTIONS_WITH_BACKGROUND] = class_predictions_with_background
if self._predict_instance_masks and predict_auxiliary_outputs:
predictions_dict[MASK_PREDICTIONS] = self._predict_masks(image_feature)
return predictions_dict
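# Illustrative sketch (not part of the original module): per the docstring
# above, the mask head replaces Mask RCNN's deconvolution layer with a
# bilinear resize followed by 3x3 convolutions. The helper below is a minimal
# standalone version of that upsampling pattern; all names are hypothetical.
def _example_upsample_and_predict_masks(roi_features, mask_height, mask_width,
                                        num_conv_channels, num_masks):
  # Upsample to the target mask resolution with a bilinear resize instead of
  # a deconvolution.
  upsampled = tf.image.resize_bilinear(
      roi_features, [mask_height, mask_width], align_corners=True)
  upsampled = slim.conv2d(upsampled, num_conv_channels, [3, 3])
  # The final 3x3 conv emits one channel per mask with no activation (logits).
  mask_logits = slim.conv2d(upsampled, num_masks, [3, 3], activation_fn=None)
  # [batch, H, W, num_masks] -> [batch, 1, num_masks, H, W], matching the
  # 'MaskPredictor' output above.
  return tf.expand_dims(
      tf.transpose(mask_logits, perm=[0, 3, 1, 2]), axis=1)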
class _NoopVariableScope(object):
"""A dummy class that does not push any scope."""
def __enter__(self):
return None
def __exit__(self, exc_type, exc_value, traceback):
return False
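# Illustrative sketch (not part of the original module): _NoopVariableScope
# lets the prediction loop below treat one and many feature maps uniformly.
# With a single map no scope is pushed, keeping variable names compatible
# with existing checkpoints; with several maps each tower gets a
# 'BoxPredictor_{i}' prefix.
def _example_scoped_towers(num_feature_maps):
  scopes = [_NoopVariableScope()]
  if num_feature_maps > 1:
    scopes = [tf.variable_scope('BoxPredictor_{}'.format(i))
              for i in range(num_feature_maps)]
  for scope in scopes:
    with scope:  # No-op in the single-map case, a real scope otherwise.
      tf.get_variable('example_weights', shape=[1])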
class ConvolutionalBoxPredictor(BoxPredictor):
  """Convolutional Box Predictor.
Optionally add an intermediate 1x1 convolutional layer after features and
predict in parallel branches box_encodings and
class_predictions_with_background.
Currently this box predictor assumes that predictions are "shared" across
classes --- that is each anchor makes box predictions which do not depend
on class.
"""
def __init__(self,
is_training,
num_classes,
conv_hyperparams_fn,
min_depth,
max_depth,
num_layers_before_predictor,
use_dropout,
dropout_keep_prob,
kernel_size,
box_code_size,
apply_sigmoid_to_scores=False,
class_prediction_bias_init=0.0,
use_depthwise=False):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
conv_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for convolution ops.
min_depth: Minimum feature depth prior to predicting box encodings
and class predictions.
max_depth: Maximum feature depth prior to predicting box encodings
and class predictions. If max_depth is set to 0, no additional
feature map will be inserted before location and class predictions.
num_layers_before_predictor: Number of the additional conv layers before
the predictor.
use_dropout: Option to use dropout for class prediction or not.
dropout_keep_prob: Keep probability for dropout.
This is only used if use_dropout is True.
kernel_size: Size of final convolution kernel. If the
spatial resolution of the feature map is smaller than the kernel size,
then the kernel size is automatically set to be
min(feature_width, feature_height).
box_code_size: Size of encoding for each box.
apply_sigmoid_to_scores: if True, apply the sigmoid on the output
class_predictions.
class_prediction_bias_init: constant value to initialize bias of the last
conv2d layer before class prediction.
use_depthwise: Whether to use depthwise convolutions for prediction
steps. Default is False.
Raises:
ValueError: if min_depth > max_depth.
"""
super(ConvolutionalBoxPredictor, self).__init__(is_training, num_classes)
if min_depth > max_depth:
raise ValueError('min_depth should be less than or equal to max_depth')
self._conv_hyperparams_fn = conv_hyperparams_fn
self._min_depth = min_depth
self._max_depth = max_depth
self._num_layers_before_predictor = num_layers_before_predictor
self._use_dropout = use_dropout
self._kernel_size = kernel_size
self._box_code_size = box_code_size
self._dropout_keep_prob = dropout_keep_prob
self._apply_sigmoid_to_scores = apply_sigmoid_to_scores
self._class_prediction_bias_init = class_prediction_bias_init
self._use_depthwise = use_depthwise
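  # Illustrative note (not part of the original module): in _predict below,
  # the intermediate conv depth is the incoming feature depth clipped to
  # [min_depth, max_depth], e.g.:
  #   max(min(1024, max_depth=512), min_depth=0) == 512
  #   max(min(256, max_depth=512), min_depth=0) == 256
  # With min_depth == max_depth == 0 the additional layers are skipped.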
def _predict(self, image_features, num_predictions_per_location_list):
"""Computes encoded object locations and corresponding confidences.
    Args:
      image_features: A list of float tensors of shape [batch_size, height_i,
        width_i, channels_i] containing features for a batch of images.
      num_predictions_per_location_list: A list of integers representing the
        number of box predictions to be made per spatial location for each
        feature map.
Returns:
box_encodings: A list of float tensors of shape
[batch_size, num_anchors_i, q, code_size] representing the location of
the objects, where q is 1 or the number of classes. Each entry in the
list corresponds to a feature map in the input `image_features` list.
class_predictions_with_background: A list of float tensors of shape
[batch_size, num_anchors_i, num_classes + 1] representing the class
predictions for the proposals. Each entry in the list corresponds to a
feature map in the input `image_features` list.
"""
box_encodings_list = []
class_predictions_list = []
# TODO(rathodv): Come up with a better way to generate scope names
# in box predictor once we have time to retrain all models in the zoo.
# The following lines create scope names to be backwards compatible with the
# existing checkpoints.
box_predictor_scopes = [_NoopVariableScope()]
if len(image_features) > 1:
box_predictor_scopes = [
tf.variable_scope('BoxPredictor_{}'.format(i))
for i in range(len(image_features))
]
for (image_feature,
num_predictions_per_location, box_predictor_scope) in zip(
image_features, num_predictions_per_location_list,
box_predictor_scopes):
with box_predictor_scope:
# Add a slot for the background class.
num_class_slots = self.num_classes + 1
net = image_feature
with slim.arg_scope(self._conv_hyperparams_fn()), \
slim.arg_scope([slim.dropout], is_training=self._is_training):
# Add additional conv layers before the class predictor.
features_depth = static_shape.get_depth(image_feature.get_shape())
depth = max(min(features_depth, self._max_depth), self._min_depth)
tf.logging.info('depth of additional conv before box predictor: {}'.
format(depth))
if depth > 0 and self._num_layers_before_predictor > 0:
for i in range(self._num_layers_before_predictor):
net = slim.conv2d(
net, depth, [1, 1], scope='Conv2d_%d_1x1_%d' % (i, depth))
with slim.arg_scope([slim.conv2d], activation_fn=None,
normalizer_fn=None, normalizer_params=None):
if self._use_depthwise:
box_encodings = slim.separable_conv2d(
net, None, [self._kernel_size, self._kernel_size],
padding='SAME', depth_multiplier=1, stride=1,
rate=1, scope='BoxEncodingPredictor_depthwise')
box_encodings = slim.conv2d(
box_encodings,
num_predictions_per_location * self._box_code_size, [1, 1],
scope='BoxEncodingPredictor')
else:
box_encodings = slim.conv2d(
net, num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
scope='BoxEncodingPredictor')
if self._use_dropout:
net = slim.dropout(net, keep_prob=self._dropout_keep_prob)
if self._use_depthwise:
class_predictions_with_background = slim.separable_conv2d(
net, None, [self._kernel_size, self._kernel_size],
padding='SAME', depth_multiplier=1, stride=1,
rate=1, scope='ClassPredictor_depthwise')
class_predictions_with_background = slim.conv2d(
class_predictions_with_background,
num_predictions_per_location * num_class_slots,
[1, 1], scope='ClassPredictor')
else:
class_predictions_with_background = slim.conv2d(
net, num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size],
scope='ClassPredictor',
biases_initializer=tf.constant_initializer(
self._class_prediction_bias_init))
if self._apply_sigmoid_to_scores:
class_predictions_with_background = tf.sigmoid(
class_predictions_with_background)
combined_feature_map_shape = (shape_utils.
combined_static_and_dynamic_shape(
image_feature))
box_encodings = tf.reshape(
box_encodings, tf.stack([combined_feature_map_shape[0],
combined_feature_map_shape[1] *
combined_feature_map_shape[2] *
num_predictions_per_location,
1, self._box_code_size]))
box_encodings_list.append(box_encodings)
class_predictions_with_background = tf.reshape(
class_predictions_with_background,
tf.stack([combined_feature_map_shape[0],
combined_feature_map_shape[1] *
combined_feature_map_shape[2] *
num_predictions_per_location,
num_class_slots]))
class_predictions_list.append(class_predictions_with_background)
return {
BOX_ENCODINGS: box_encodings_list,
CLASS_PREDICTIONS_WITH_BACKGROUND: class_predictions_list
}
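# Illustrative sketch (not part of the original module): wiring the predictor
# above to two feature maps. The hyperparams function is a stand-in for the
# arg_scope normally produced by hyperparams_builder, and we call the internal
# _predict directly for illustration since the public predict signature
# changes in this refactor. Shapes in the trailing comments follow the
# reshape logic above.
def _example_conv_hyperparams_fn():
  with slim.arg_scope([slim.conv2d], activation_fn=tf.nn.relu6) as sc:
    return sc

def _example_convolutional_prediction():
  predictor = ConvolutionalBoxPredictor(
      is_training=True, num_classes=90,
      conv_hyperparams_fn=_example_conv_hyperparams_fn,
      min_depth=0, max_depth=0, num_layers_before_predictor=0,
      use_dropout=False, dropout_keep_prob=0.8, kernel_size=3,
      box_code_size=4)
  features = [tf.zeros([8, 19, 19, 512]), tf.zeros([8, 10, 10, 256])]
  predictions = predictor._predict(
      features, num_predictions_per_location_list=[6, 6])
  # predictions[BOX_ENCODINGS][0] has shape [8, 19*19*6, 1, 4];
  # predictions[CLASS_PREDICTIONS_WITH_BACKGROUND][0] has shape
  # [8, 19*19*6, 91] (num_classes + 1 background slot).
  return predictions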
# TODO(rathodv): Replace with slim.arg_scope_func_key once it's available
# externally.
def _arg_scope_func_key(op):
"""Returns a key that can be used to index arg_scope dictionary."""
return getattr(op, '_key_op', str(op))
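# Illustrative note (not part of the original module): this key mirrors how
# slim's arg_scope indexes ops, which is what lets the weight-shared
# predictor below detect whether batch norm was configured:
#   with slim.arg_scope([slim.batch_norm], decay=0.997) as sc:
#     _arg_scope_func_key(slim.batch_norm) in sc  # True
#   # An arg_scope that never mentions slim.batch_norm yields False.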
# TODO(rathodv): Merge the implementation with ConvolutionalBoxPredictor above
# since they are very similar.
class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
"""Convolutional Box Predictor with weight sharing.
Defines the box predictor as defined in
https://arxiv.org/abs/1708.02002. This class differs from
ConvolutionalBoxPredictor in that it shares weights and biases while
predicting from different feature maps. However, batch_norm parameters are not
shared because the statistics of the activations vary among the different
feature maps.
Also note that separate multi-layer towers are constructed for the box
encoding and class predictors respectively.
"""
def __init__(self,
is_training,
num_classes,
conv_hyperparams_fn,
depth,
num_layers_before_predictor,
box_code_size,
kernel_size=3,
class_prediction_bias_init=0.0,
use_dropout=False,
dropout_keep_prob=0.8,
share_prediction_tower=False):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
conv_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for convolution ops.
depth: depth of conv layers.
num_layers_before_predictor: Number of the additional conv layers before
the predictor.
box_code_size: Size of encoding for each box.
kernel_size: Size of final convolution kernel.
class_prediction_bias_init: constant value to initialize bias of the last
conv2d layer before class prediction.
use_dropout: Whether to apply dropout to class prediction head.
      dropout_keep_prob: Probability of keeping activations.
share_prediction_tower: Whether to share the multi-layer tower between box
prediction and class prediction heads.
"""
super(WeightSharedConvolutionalBoxPredictor, self).__init__(is_training,
num_classes)
self._conv_hyperparams_fn = conv_hyperparams_fn
self._depth = depth
self._num_layers_before_predictor = num_layers_before_predictor
self._box_code_size = box_code_size
self._kernel_size = kernel_size
self._class_prediction_bias_init = class_prediction_bias_init
self._use_dropout = use_dropout
self._dropout_keep_prob = dropout_keep_prob
self._share_prediction_tower = share_prediction_tower
def _predict(self, image_features, num_predictions_per_location_list):
"""Computes encoded object locations and corresponding confidences.
Args:
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels] containing features for a batch of images. Note that
when not all tensors in the list have the same number of channels, an
        additional projection layer will be added on top of the tensor to
        generate a feature map with a number of channels consistent with the
        majority.
num_predictions_per_location_list: A list of integers representing the
number of box predictions to be made per spatial location for each
feature map. Note that all values must be the same since the weights are
shared.
    Returns:
      box_encodings: A list of float tensors of shape
        [batch_size, num_anchors_i, code_size] representing the location of
        the objects. Each entry in the list corresponds to a feature map in
        the input `image_features` list.
      class_predictions_with_background: A list of float tensors of shape
        [batch_size, num_anchors_i, num_classes + 1] representing the class
        predictions for the proposals. Each entry in the list corresponds to a
        feature map in the input `image_features` list.

    Raises:
      ValueError: If the image feature maps do not have the same number of
        channels or if the num predictions per location differs between the
        feature maps.
    """
    if len(set(num_predictions_per_location_list)) > 1:
      raise ValueError('num predictions per location must be same for all '
                       'feature maps, found: {}'.format(
                           num_predictions_per_location_list))
feature_channels = [
image_feature.shape[3].value for image_feature in image_features
]
has_different_feature_channels = len(set(feature_channels)) > 1
if has_different_feature_channels:
inserted_layer_counter = 0
target_channel = max(set(feature_channels), key=feature_channels.count)
      tf.logging.info('Not all feature maps have the same number of '
                      'channels; found {}. Adding projection layers '
                      'to bring all feature maps to a uniform channel '
                      'count of {}.'.format(feature_channels, target_channel))
box_encodings_list = []
class_predictions_list = []
num_class_slots = self.num_classes + 1
for feature_index, (image_feature,
num_predictions_per_location) in enumerate(
zip(image_features,
num_predictions_per_location_list)):
# Add a slot for the background class.
with tf.variable_scope('WeightSharedConvolutionalBoxPredictor',
reuse=tf.AUTO_REUSE):
with slim.arg_scope(self._conv_hyperparams_fn()) as sc:
apply_batch_norm = _arg_scope_func_key(slim.batch_norm) in sc
# Insert an additional projection layer if necessary.
if (has_different_feature_channels and
image_feature.shape[3].value != target_channel):
image_feature = slim.conv2d(
image_feature,
target_channel, [1, 1],
stride=1,
padding='SAME',
activation_fn=None,
normalizer_fn=(tf.identity if apply_batch_norm else None),
scope='ProjectionLayer/conv2d_{}'.format(
inserted_layer_counter))
if apply_batch_norm:
image_feature = slim.batch_norm(
image_feature,
scope='ProjectionLayer/conv2d_{}/BatchNorm'.format(
inserted_layer_counter))
inserted_layer_counter += 1
box_encodings_net = image_feature
class_predictions_net = image_feature
for i in range(self._num_layers_before_predictor):
box_prediction_tower_prefix = (
'PredictionTower' if self._share_prediction_tower
else 'BoxPredictionTower')
box_encodings_net = slim.conv2d(
box_encodings_net,
self._depth,
[self._kernel_size, self._kernel_size],
stride=1,
padding='SAME',
activation_fn=None,
normalizer_fn=(tf.identity if apply_batch_norm else None),
scope='{}/conv2d_{}'.format(box_prediction_tower_prefix, i))
if apply_batch_norm:
box_encodings_net = slim.batch_norm(
box_encodings_net,
scope='{}/conv2d_{}/BatchNorm/feature_{}'.
format(box_prediction_tower_prefix, i, feature_index))
box_encodings_net = tf.nn.relu6(box_encodings_net)
box_encodings = slim.conv2d(
box_encodings_net,
num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
activation_fn=None, stride=1, padding='SAME',
normalizer_fn=None,
scope='BoxPredictor')
if self._share_prediction_tower:
class_predictions_net = box_encodings_net
else:
for i in range(self._num_layers_before_predictor):
class_predictions_net = slim.conv2d(
class_predictions_net,
self._depth,
[self._kernel_size, self._kernel_size],
stride=1,
padding='SAME',
activation_fn=None,
normalizer_fn=(tf.identity if apply_batch_norm else None),
scope='ClassPredictionTower/conv2d_{}'.format(i))
if apply_batch_norm:
class_predictions_net = slim.batch_norm(
class_predictions_net,
scope='ClassPredictionTower/conv2d_{}/BatchNorm/feature_{}'
.format(i, feature_index))
class_predictions_net = tf.nn.relu6(class_predictions_net)
if self._use_dropout:
class_predictions_net = slim.dropout(
class_predictions_net, keep_prob=self._dropout_keep_prob)
class_predictions_with_background = slim.conv2d(
class_predictions_net,
num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size],
activation_fn=None, stride=1, padding='SAME',
normalizer_fn=None,
biases_initializer=tf.constant_initializer(
self._class_prediction_bias_init),
scope='ClassPredictor')
combined_feature_map_shape = (shape_utils.
combined_static_and_dynamic_shape(
image_feature))
box_encodings = tf.reshape(
box_encodings, tf.stack([combined_feature_map_shape[0],
combined_feature_map_shape[1] *
combined_feature_map_shape[2] *
num_predictions_per_location,
self._box_code_size]))
box_encodings_list.append(box_encodings)
class_predictions_with_background = tf.reshape(
class_predictions_with_background,
tf.stack([combined_feature_map_shape[0],
combined_feature_map_shape[1] *
combined_feature_map_shape[2] *
num_predictions_per_location,
num_class_slots]))
class_predictions_list.append(class_predictions_with_background)
return {
BOX_ENCODINGS: box_encodings_list,
CLASS_PREDICTIONS_WITH_BACKGROUND: class_predictions_list
}
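# Illustrative sketch (not part of the original module): the reuse pattern
# above in miniature. Entering the same variable scope with tf.AUTO_REUSE
# shares one set of conv kernels across all feature maps, while the
# per-feature-map batch norm scopes ('.../BatchNorm/feature_{i}') keep
# normalization statistics separate, as the class docstring notes.
def _example_shared_tower(image_feature, feature_index):
  with tf.variable_scope('SharedTowerExample', reuse=tf.AUTO_REUSE):
    net = slim.conv2d(image_feature, 64, [3, 3], activation_fn=None,
                      normalizer_fn=None, scope='conv2d_0')
    net = slim.batch_norm(
        net, scope='conv2d_0/BatchNorm/feature_{}'.format(feature_index))
    return tf.nn.relu6(net)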
@@ -2925,6 +2925,29 @@ def ssd_random_crop_pad_fixed_aspect_ratio(
  return result
def convert_class_logits_to_softmax(multiclass_scores, temperature=1.0):
"""Converts multiclass logits to softmax scores after applying temperature.
Args:
multiclass_scores: float32 tensor of shape
[num_instances, num_classes] representing the score for each box for each
class.
    temperature: Scale factor to use prior to applying softmax. Larger
      temperatures give more uniform distributions after softmax.
Returns:
multiclass_scores: float32 tensor of shape
[num_instances, num_classes] with scaling and softmax applied.
"""
# Multiclass scores must be stored as logits. Apply temp and softmax.
multiclass_scores_scaled = tf.divide(
multiclass_scores, temperature, name='scale_logits')
multiclass_scores = tf.nn.softmax(multiclass_scores_scaled, name='softmax')
return multiclass_scores
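# Illustrative usage (not part of the original module): the temperature
# divides the logits before the softmax, so larger values flatten the
# distribution. For logits [2.0, 0.0]:
#   temperature=1.0 -> softmax([2.0, 0.0]) ~= [0.881, 0.119]
#   temperature=2.0 -> softmax([1.0, 0.0]) ~= [0.731, 0.269]
# e.g.:
#   scores = convert_class_logits_to_softmax(
#       tf.constant([[2.0, 0.0]]), temperature=2.0)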
def get_default_func_arg_map(include_label_scores=False,
                             include_multiclass_scores=False,
                             include_instance_masks=False,
@@ -3003,8 +3026,7 @@ def get_default_func_arg_map(include_label_scores=False,
      random_crop_pad_image: (fields.InputDataFields.image,
                              fields.InputDataFields.groundtruth_boxes,
                              fields.InputDataFields.groundtruth_classes,
                              groundtruth_label_scores, multiclass_scores),
      random_crop_to_aspect_ratio: (
          fields.InputDataFields.image,
          fields.InputDataFields.groundtruth_boxes,
@@ -3051,20 +3073,15 @@ def get_default_func_arg_map(include_label_scores=False,
      subtract_channel_mean: (fields.InputDataFields.image,),
      one_hot_encoding: (fields.InputDataFields.groundtruth_image_classes,),
      rgb_to_gray: (fields.InputDataFields.image,),
      ssd_random_crop: (fields.InputDataFields.image,
                        fields.InputDataFields.groundtruth_boxes,
                        fields.InputDataFields.groundtruth_classes,
                        groundtruth_label_scores, multiclass_scores,
                        groundtruth_instance_masks, groundtruth_keypoints),
      ssd_random_crop_pad: (fields.InputDataFields.image,
                            fields.InputDataFields.groundtruth_boxes,
                            fields.InputDataFields.groundtruth_classes,
                            groundtruth_label_scores, multiclass_scores),
      ssd_random_crop_fixed_aspect_ratio: (
          fields.InputDataFields.image,
          fields.InputDataFields.groundtruth_boxes,
@@ -3079,6 +3096,7 @@ def get_default_func_arg_map(include_label_scores=False,
          groundtruth_instance_masks,
          groundtruth_keypoints,
      ),
      convert_class_logits_to_softmax: (multiclass_scores,),
  }
return prep_func_arg_map return prep_func_arg_map
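# Illustrative usage (not part of the original module): the map above ties
# each preprocessing function to the tensor_dict fields it consumes and is
# threaded through the preprocessing pipeline. A hedged sketch, assuming the
# preprocess(tensor_dict, preprocess_options, func_arg_map) entry point in
# this module:
#   arg_map = get_default_func_arg_map(include_multiclass_scores=True)
#   options = [(convert_class_logits_to_softmax, {'temperature': 2.0})]
#   tensor_dict = {
#       fields.InputDataFields.image: image,
#       fields.InputDataFields.multiclass_scores: logits,
#   }
#   tensor_dict = preprocess(tensor_dict, options, func_arg_map=arg_map)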
@@ -2844,5 +2844,24 @@ class PreprocessorTest(tf.test.TestCase):
        include_instance_masks=True,
        include_keypoints=True)
def testConvertClassLogitsToSoftmax(self):
multiclass_scores = tf.constant(
[[1.0, 0.0], [0.5, 0.5], [1000, 1]], dtype=tf.float32)
temperature = 2.0
converted_multiclass_scores = (
preprocessor.convert_class_logits_to_softmax(
multiclass_scores=multiclass_scores, temperature=temperature))
expected_converted_multiclass_scores = [[[0.62245935, 0.37754068],
[0.5, 0.5], [1, 0]]]
with self.test_session() as sess:
(converted_multiclass_scores_) = sess.run([converted_multiclass_scores])
self.assertAllClose(converted_multiclass_scores_,
expected_converted_multiclass_scores)
if __name__ == '__main__':
  tf.test.main()
@@ -24,6 +24,7 @@ from abc import abstractmethod
import tensorflow as tf

from object_detection.core import box_list_ops
from object_detection.core import standard_fields as fields


class RegionSimilarityCalculator(object):
@@ -33,7 +34,7 @@ class RegionSimilarityCalculator(object):
  def compare(self, boxlist1, boxlist2, scope=None):
    """Computes matrix of pairwise similarity between BoxLists.

    This op (to be overridden) computes a measure of pairwise similarity between
    the boxes in the given BoxLists. Higher values indicate more similarity.

    Note that this method simply measures similarity and does not explicitly
@@ -112,3 +113,42 @@ class IoaSimilarity(RegionSimilarityCalculator):
      A tensor with shape [N, M] representing pairwise IOA scores.
    """
    return box_list_ops.ioa(boxlist1, boxlist2)
class ThresholdedIouSimilarity(RegionSimilarityCalculator):
"""Class to compute similarity based on thresholded IOU and score.
This class computes pairwise similarity between two BoxLists based on IOU and
a 'score' present in boxlist1. If IOU > threshold, then the entry in the
output pairwise tensor will contain `score`, otherwise 0.
"""
def __init__(self, iou_threshold=0):
"""Initialize the ThresholdedIouSimilarity.
Args:
iou_threshold: For a given pair of boxes, if the IOU is > iou_threshold,
then the comparison result will be the foreground probability of
the first box, otherwise it will be zero.
"""
self._iou_threshold = iou_threshold
def _compare(self, boxlist1, boxlist2):
"""Compute pairwise IOU similarity between the two BoxLists and score.
Args:
boxlist1: BoxList holding N boxes. Must have a score field.
boxlist2: BoxList holding M boxes.
Returns:
      A tensor with shape [N, M] representing scores thresholded by pairwise
      IOU scores.
"""
ious = box_list_ops.iou(boxlist1, boxlist2)
scores = boxlist1.get_field(fields.BoxListFields.scores)
scores = tf.expand_dims(scores, axis=1)
row_replicated_scores = tf.tile(scores, [1, tf.shape(ious)[-1]])
thresholded_ious = tf.where(ious > self._iou_threshold,
row_replicated_scores, tf.zeros_like(ious))
return thresholded_ious
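# Illustrative usage (not part of the original module), using the repo's
# box_list API. With one anchor box carrying score 0.7 and two groundtruth
# boxes at IOU 1.0 and 0.01, a threshold of 0.5 yields [[0.7, 0.0]]:
#   from object_detection.core import box_list
#   anchors = box_list.BoxList(tf.constant([[0.0, 0.0, 1.0, 1.0]]))
#   anchors.add_field(fields.BoxListFields.scores, tf.constant([0.7]))
#   groundtruth = box_list.BoxList(
#       tf.constant([[0.0, 0.0, 1.0, 1.0], [0.9, 0.9, 1.0, 1.0]]))
#   similarity_matrix = ThresholdedIouSimilarity(
#       iou_threshold=0.5).compare(anchors, groundtruth)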