Unverified commit 99256cf4, authored by pkulzc, committed by GitHub


Release iNaturalist Species-trained models, refactor of evaluation, box predictor for object detection. (#5289)

* Merged commit includes the following changes:
212389173  by Zhichao Lu:

    1. Replace tf.boolean_mask with tf.where
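    The log does not say why, but a likely motivation is shape staticness: `tf.boolean_mask` produces a dynamically shaped tensor, while a `tf.where` select preserves the input's static shape. A minimal sketch of the pattern (my reading, not code from this commit):

```python
import tensorflow as tf

scores = tf.constant([0.9, 0.2, 0.7])
keep = tf.constant([True, False, True])

# tf.boolean_mask drops elements, so the result has a dynamic shape.
dynamic = tf.boolean_mask(scores, keep)                  # shape: [?]

# A tf.where select keeps the original static shape by writing zeros
# into masked positions, which is friendlier to TPU compilation.
static = tf.where(keep, scores, tf.zeros_like(scores))   # shape: [3]
```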

--
212282646  by Zhichao Lu:

    1. Fix a typo in model_builder.py and add a test to cover it.

--
212142989  by Zhichao Lu:

    Only resize masks in the meta architecture if they have not already been resized in the input pipeline.

--
212136935  by Zhichao Lu:

    Choose matmul or native crop_and_resize in the model builder instead of in the Faster R-CNN meta architecture.

--
211907984  by Zhichao Lu:

    Make the eval input reader a repeated field and update the config util to handle it.

--
211858098  by Zhichao Lu:

    Change the implementation of merge_boxes_with_multiple_labels.

--
211843915  by Zhichao Lu:

    Add Mobilenet v2 + FPN support.

--
211655076  by Zhichao Lu:

    Bug fix for generic keys in config overrides

    In generic configuration overrides, we had a duplicate entry for train_input_config and we were missing the eval_input_config and eval_config.

    This change also introduces testing for all config overrides.

--
211157501  by Zhichao Lu:

    Make the locally-modified conv defs a copy, so that they don't modify
    MobileNet conv defs globally for other code that transitively imports
    this package.

--
211112813  by Zhichao Lu:

    Refactoring visualization tools for Estimator's eval_metric_ops. This will make it easier for future models to take advantage of a single interface and mechanics.

--
211109571  by Zhichao Lu:

    A test decorator.

--
210747685  by Zhichao Lu:

    For FPN, when use_depthwise is set to true, use a slightly modified MobileNet v1 config.

--
210723882  by Zhichao Lu:

    Integrating the losses mask into the meta architectures. When providing groundtruth, one can optionally specify annotation information (i.e. which images are labeled vs. unlabeled). For any image that is unlabeled, there is no loss accumulation.

--
210673675  by Zhichao Lu:

    Internal change.

--
210546590  by Zhichao Lu:

    Internal change.

--
210529752  by Zhichao Lu:

    Support batched inputs with ops.matmul_crop_and_resize.

    With this change the new inputs are images of shape [batch, height, width, depth] and boxes of shape [batch, num_boxes, 4]. The output tensor has shape [batch, num_boxes, crop_height, crop_width, depth].
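    A shape-checking sketch of the batched call (the `crop_size` keyword is my assumption about the signature):

```python
import tensorflow as tf
from object_detection.utils import ops

images = tf.zeros([2, 40, 40, 1024])   # [batch, height, width, depth]
boxes = tf.zeros([2, 300, 4])          # [batch, num_boxes, 4], normalized

# Output shape: [batch, num_boxes, crop_height, crop_width, depth],
# i.e. [2, 300, 14, 14, 1024] here.
crops = ops.matmul_crop_and_resize(images, boxes, crop_size=[14, 14])
```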

--
210485912  by Zhichao Lu:

    Fix TensorFlow version check in object_detection_tutorial.ipynb

--
210484076  by Zhichao Lu:

    Reduce TPU memory required for single image matmul_crop_and_resize.

    Using tf.einsum eliminates intermediate tensors, tiling, and expansion. For an image of size [40, 40, 1024] and boxes of shape [300, 4], HBM usage goes down from 3.52G to 1.67G.
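    The gist, as I read it, is that bilinear crop-and-resize is separable, so it can be written as two small einsum contractions instead of materializing tiled intermediates. A rough single-image sketch (kernel construction omitted; this is not the library's actual code):

```python
import tensorflow as tf

def separable_bilinear_resample(image, kernel_y, kernel_x):
  """Resamples one image with precomputed 1-D interpolation kernels.

  image:    [H, W, C] feature map.
  kernel_y: [crop_h, H] bilinear weights along the height axis.
  kernel_x: [crop_w, W] bilinear weights along the width axis.
  """
  # Interpolate rows, then columns. einsum contracts directly over H and
  # W without the tiling/expansion an explicit matmul formulation needs.
  rows = tf.einsum('iH,HWc->iWc', kernel_y, image)   # [crop_h, W, C]
  return tf.einsum('jW,iWc->ijc', kernel_x, rows)    # [crop_h, crop_w, C]
```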

--
210468361  by Zhichao Lu:

    Remove PositiveAnchorLossCDF/NegativeAnchorLossCDF to resolve the "Main thread is not in main loop" error in local training.

--
210100253  by Zhichao Lu:

    Pooling pyramid feature maps: add option to replace max pool with convolution layers.
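    A sketch of the two downsampling choices, assuming slim layers as used elsewhere in this codebase (not the actual builder code):

```python
import tensorflow as tf
slim = tf.contrib.slim

def downsample(feature_map, use_conv=False, depth=256):
  # Default: parameter-free max pooling with stride 2.
  # Option: a stride-2 convolution that learns its downsampling weights.
  if use_conv:
    return slim.conv2d(feature_map, depth, [3, 3], stride=2, padding='SAME')
  return slim.max_pool2d(feature_map, [2, 2], stride=2, padding='SAME')
```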

--
209995842  by Zhichao Lu:

    Fix a bug which prevents variable sharing in Faster RCNN.

--
209965526  by Zhichao Lu:

    Add support for enabling export_to_tpu through the estimator.

--
209946440  by Zhichao Lu:

    Replace deprecated tf.train.Supervisor with tf.train.MonitoredSession. MonitoredSession also takes away the hassle of starting queue runners.
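    A minimal sketch of the replacement pattern (the stand-in train op is mine, not this repo's training loop):

```python
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
train_op = tf.assign_add(global_step, 1)   # stand-in for a real train op

# MonitoredTrainingSession initializes variables, restores checkpoints and
# starts queue runners automatically, all of which tf.train.Supervisor
# required manual wiring for.
hooks = [tf.train.StopAtStepHook(last_step=10)]
with tf.train.MonitoredTrainingSession(hooks=hooks) as sess:
  while not sess.should_stop():
    sess.run(train_op)
```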

--
209888003  by Zhichao Lu:

    Implement function to handle data where source_id is not set.

    If the field source_id is found to be the empty string for any image during runtime, it will be replaced with a random string. This avoids hash collisions on datasets where many examples do not have source_id set. Those hash collisions have unintended side effects and may lead to bugs in the detection pipeline.
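    A sketch of the fallback (the helper name is mine; only the described behavior is taken from the log):

```python
import tensorflow as tf

def replace_empty_source_id(source_id):
  """Hypothetical helper mirroring the described behavior."""
  def make_random_id():
    # A random integer rendered as a string; avoids hash collisions when
    # many examples share the empty-string source_id.
    return tf.as_string(
        tf.random_uniform([], maxval=2 ** 31 - 1, dtype=tf.int64))
  return tf.cond(tf.equal(source_id, ''), make_random_id, lambda: source_id)
```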

--
209842134  by Zhichao Lu:

    Converting the loss mask into a multiplier, rather than using it as a boolean mask (which changes the tensor shape). This is necessary because other utilities (e.g. the hard example miner) require a loss matrix with the same dimensions as the original prediction tensor.
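    A minimal sketch of the multiplier pattern described above:

```python
import tensorflow as tf

per_anchor_loss = tf.ones([2, 100])        # [batch_size, num_anchors]
is_annotated = tf.constant([True, False])  # one flag per image

# Multiplying by a {0, 1} mask zeroes the loss for unlabeled images while
# keeping the [batch_size, num_anchors] shape that downstream utilities
# (e.g. the hard example miner) expect; tf.boolean_mask would not.
multiplier = tf.expand_dims(tf.cast(is_annotated, tf.float32), axis=1)
masked_loss = per_anchor_loss * multiplier
```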

--
209768066  by Zhichao Lu:

    Adding ability to remove loss computation from specific images in a batch, via an optional boolean mask.

--
209722556  by Zhichao Lu:

    Remove dead code.

    (_USE_C_API was flipped to True by default in TensorFlow 1.8)

--
209701861  by Zhichao Lu:

    This CL cleans up some tf.Example creation snippets by reusing the convenient tf.train.Feature building functions in dataset_util.

--
209697893  by Zhichao Lu:

    Do not overwrite num_epoch for the eval input, as doing so leads to errors in some cases.

--
209694652  by Zhichao Lu:

    Sample boxes by jittering around the currently given boxes.
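    A sketch of box jittering; the exact sampling scheme is not spelled out in the log, and the `ratio` knob is an assumption:

```python
import tensorflow as tf

def jitter_boxes(boxes, ratio=0.05):
  """Perturbs [N, 4] boxes by uniform noise scaled to each box's size."""
  ymin, xmin, ymax, xmax = tf.unstack(boxes, axis=1)
  height, width = ymax - ymin, xmax - xmin
  scale = tf.stack([height, width, height, width], axis=1)   # [N, 4]
  noise = tf.random_uniform(tf.shape(boxes), -ratio, ratio)
  return boxes + noise * scale
```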

--
209550300  by Zhichao Lu:

    `create_category_index_from_labelmap()` function now accepts a `use_display_name` parameter.
    Also added a `create_categories_from_labelmap` function for convenience.

--
209490273  by Zhichao Lu:

    Check result_dict type before accessing image_id via key.

--
209442529  by Zhichao Lu:

    Introducing the capability to sample examples for evaluation. This makes it easy to specify one full epoch of evaluation, or a subset (e.g. sample 1 of every N examples).
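    With tf.data, the 1-of-every-N behavior could look like the sketch below (the record path is a placeholder; the actual plumbing lives in the input reader config):

```python
import tensorflow as tf

sample_1_of_n = 4
dataset = tf.data.TFRecordDataset('eval.record')  # placeholder path
# Keep one of every N examples for a cheaper, approximate evaluation pass.
dataset = dataset.shard(num_shards=sample_1_of_n, index=0)
```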

--
208941150  by Zhichao Lu:

    Adding the capability to export the results in JSON format.

--
208888798  by Zhichao Lu:

    Fixes wrong dictionary key for num_det_boxes_per_image.

--
208873549  by Zhichao Lu:

    Reduce the number of HLO ops created by matmul_crop_and_resize.

    Do not unroll along the channels dimension. Instead, transpose the input image dimensions, apply tf.matmul and transpose back.

    The number of HLO instructions for 1024 channels drops from 12368 to 110.

--
208844315  by Zhichao Lu:

    Add an option to use tf.image.non_max_suppression_padded in SSD post-processing

--
208731380  by Zhichao Lu:

    Add field in box_predictor config to enable mask prediction and update builders accordingly.

--
208699405  by Zhichao Lu:

    This CL creates a keras-based multi-resolution feature map extractor.

--
208557208  by Zhichao Lu:

    Add TPU tests for Faster R-CNN Meta arch.

    * Verifies that the two_stage_predict and total_loss tests run successfully on TPU.
    * Small mods to multiclass_non_max_suppression to preserve static shapes.

--
208499278  by Zhichao Lu:

    This CL makes sure the Keras convolutional box predictor & head layers apply activation layers *after* normalization (as opposed to before).

--
208391694  by Zhichao Lu:

    Updating visualization tool to produce multiple evaluation images.

--
208275961  by Zhichao Lu:

    This CL adds a Keras version of the Convolutional Box Predictor, as well as more general infrastructure for making Keras Prediction heads & Keras box predictors.

--
208275585  by Zhichao Lu:

    This CL enables the Keras layer hyperparameter object to build a dedicated activation layer, and to disable activation by default in the op layer construction kwargs.

    This is necessary because in most cases the normalization layer must be applied before the activation layer. So, in Keras models we must set the convolution activation in a dedicated layer after normalization is applied, rather than setting it in the convolution layer construction args.
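    A minimal Keras sketch of the ordering this enables (assumed layer choices, not the repo's builder code):

```python
import tensorflow as tf

# The convolution is built with activation=None so that batch norm sees
# pre-activation values; the nonlinearity runs as its own layer afterwards.
conv = tf.keras.layers.Conv2D(64, 3, padding='same', activation=None)
norm = tf.keras.layers.BatchNormalization()
act = tf.keras.layers.Lambda(tf.nn.relu6)

def conv_block(inputs, training=False):
  return act(norm(conv(inputs), training=training))
```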

--
208263792  by Zhichao Lu:

    Add a new SSD mask meta arch that can predict masks for SSD models.
    Changes include:
     - overwriting the loss function to add mask loss computation.
     - updating ssd_meta_arch to handle masks, if predicted, in predict and postprocessing.

--
208000218  by Zhichao Lu:

    Make FasterRCNN choose static shape operations only in training mode.

--
207997797  by Zhichao Lu:

    Add static boolean_mask op to box_list_ops.py and use that in faster_rcnn_meta_arch.py to support use_static_shapes option.

--
207993460  by Zhichao Lu:

    Include FGVC detection models in model zoo.

--
207971213  by Zhichao Lu:

    Remove the restriction that the tf.nn.top_k op run on CPU.

--
207961187  by Zhichao Lu:

    Build the first stage NMS function in the model builder and pass it to FasterRCNN meta arch.

--
207960608  by Zhichao Lu:

    Internal Change.

--
207927015  by Zhichao Lu:

    Add an option to use the TPU-compatible NMS op (cl/206673787) in the batch_multiclass_non_max_suppression function. When pad_to_max_output_size is set to true, the output NMSed boxes are padded to length max_size_per_class.

    This can be used in the first-stage Region Proposal Network of the Faster R-CNN model by setting the first_stage_nms_pad_to_max_proposals field to true in the config proto.
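    The CL is referenced only by number; the public op with this behavior is tf.image.non_max_suppression_padded. A usage sketch:

```python
import tensorflow as tf

boxes = tf.random_uniform([300, 4])
scores = tf.random_uniform([300])

# With pad_to_max_output_size=True the op returns a fixed-length index
# tensor plus a count of valid entries, so output shapes stay static.
selected, num_valid = tf.image.non_max_suppression_padded(
    boxes, scores, max_output_size=100, iou_threshold=0.6,
    pad_to_max_output_size=True)
```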

--
207809668  by Zhichao Lu:

    Add option to use depthwise separable conv instead of conv2d in FPN and WeightSharedBoxPredictor. More specifically, there are two related configs:
    - SsdFeatureExtractor.use_depthwise
    - WeightSharedConvolutionalBoxPredictor.use_depthwise
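    A sketch of the two convolution types the flags switch between (illustrative slim calls, not the builder code):

```python
import tensorflow as tf
slim = tf.contrib.slim

features = tf.zeros([1, 32, 32, 256])

# Standard 3x3 convolution.
dense = slim.conv2d(features, 256, [3, 3])

# Depthwise separable replacement: a per-channel 3x3 spatial filter
# (depth_multiplier=1) followed by a 1x1 pointwise projection, which
# substantially cuts parameters and FLOPs.
separable = slim.separable_conv2d(features, 256, [3, 3], depth_multiplier=1)
```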

--
207808651  by Zhichao Lu:

    Fix the static balanced positive negative sampler's TPU tests

--
207798658  by Zhichao Lu:

    Fixes a post-refactoring bug where the pre-prediction convolution layers in the convolutional box predictor are ignored.

--
207796470  by Zhichao Lu:

    Make slim endpoints visible in FasterRCNNMetaArch.

--
207787053  by Zhichao Lu:

    Refactor ssd_meta_arch so that the target assigner instance is passed into the SSDMetaArch constructor rather than constructed inside.

--

PiperOrigin-RevId: 212389173

* Fix detection model zoo typo.

* Modify the tf.Example decoder to handle label maps with either `display_name` or `name` fields seamlessly.

Currently, the tf.Example decoder uses only the `name` field to look up ids for the class text field present in the data. This change uses both the `display_name` and `name` fields in the label map to fetch ids for class text.
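A plain-Python sketch of the dual lookup (the real decoder builds a TF lookup table rather than a dict; `label_map` is assumed to be a parsed StringIntLabelMap proto):

```python
def build_class_text_to_id(label_map):
  """Indexes class ids by both `name` and `display_name`."""
  lookup = {}
  for item in label_map.item:
    lookup[item.name] = item.id
    if item.HasField('display_name'):
      lookup[item.display_name] = item.id
  return lookup
```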

PiperOrigin-RevId: 212672223

* Modify create_coco_tf_record tool to write out class text instead of class labels.

PiperOrigin-RevId: 212679112

* Fix detection model zoo typo.

PiperOrigin-RevId: 212715692

* Adding the following two optional flags to WeightSharedConvolutionalBoxHead:
1) In the box head, apply clipping to box encodings.
2) In the class head, apply sigmoid to class predictions at inference time.

PiperOrigin-RevId: 212723242

* Support class confidences in merge boxes with multiple labels.

PiperOrigin-RevId: 212884998

* Creates multiple eval specs for object detection.

PiperOrigin-RevId: 212894556

* Set batch_norm on last layer in Mask Head to None.

PiperOrigin-RevId: 213030087

* Enable bfloat16 training for object detection models.

PiperOrigin-RevId: 213053547

* Skip padding op when unnecessary.

PiperOrigin-RevId: 213065869

* Modify `Matchers` to use groundtruth weights before performing matching.

The groundtruth weights tensor is used to indicate padding in the groundtruth box tensor. It is handled in `TargetAssigner` by creating appropriate classification and regression target weights based on the groundtruth box each anchor matches to. However, options such as `force_match_all_rows` in `ArgmaxMatcher` force certain anchors to match groundtruth boxes that are just padding, thereby reducing the number of anchors that could otherwise match real groundtruth boxes.

For single-stage models like SSD the effect is negligible, as there are two orders of magnitude more anchors than padded groundtruth boxes. But for Faster R-CNN and Mask R-CNN, where there are only 300 anchors in the second stage, a significant number of these match groundtruth padding, reducing the number of anchors that regress to real groundtruth boxes and severely degrading performance.

Therefore, this change introduces an additional boolean argument `valid_rows` to the `Matcher.match` methods, and the implementations now ignore such padded groundtruth boxes during matching.
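A plausible way to derive the new argument from the existing groundtruth weights (zeros mark padding), matching the `valid_rows` keyword used in the diffs below:

```python
import tensorflow as tf

groundtruth_weights = tf.constant([1., 1., 0., 0.])  # zeros are padding
valid_rows = tf.greater(groundtruth_weights, 0)

# match = matcher.match(similarity_matrix, valid_rows=valid_rows)
```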

PiperOrigin-RevId: 213345395

* Add release note for iNaturalist Species trained models.

PiperOrigin-RevId: 213347179

* Fix a bug where the gt_is_crowd_list variable was uninitialized.

PiperOrigin-RevId: 213364858

* ...text exposed to open source public git repo...

PiperOrigin-RevId: 213554260
parent 256b8ae6
......@@ -273,6 +273,7 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
master=eval_config.eval_master,
save_graph=eval_config.save_graph,
save_graph_dir=(eval_dir if eval_config.save_graph else ''),
losses_dict=losses_dict)
losses_dict=losses_dict,
eval_export_path=eval_config.export_path)
return metrics
......@@ -99,17 +99,19 @@ class ArgMaxMatcher(matcher.Matcher):
if self._unmatched_threshold == self._matched_threshold:
raise ValueError('When negatives are in between matched and '
'unmatched thresholds, these cannot be of equal '
'value. matched: %s, unmatched: %s',
self._matched_threshold, self._unmatched_threshold)
'value. matched: {}, unmatched: {}'.format(
self._matched_threshold,
self._unmatched_threshold))
self._force_match_for_each_row = force_match_for_each_row
self._negatives_lower_than_unmatched = negatives_lower_than_unmatched
def _match(self, similarity_matrix):
def _match(self, similarity_matrix, valid_rows):
"""Tries to match each column of the similarity matrix to a row.
Args:
similarity_matrix: tensor of shape [N, M] representing any similarity
metric.
valid_rows: a boolean tensor of shape [N] indicating valid rows.
Returns:
Match object with corresponding matches for each of M columns.
......@@ -167,8 +169,10 @@ class ArgMaxMatcher(matcher.Matcher):
similarity_matrix)
force_match_column_ids = tf.argmax(similarity_matrix, 1,
output_type=tf.int32)
force_match_column_indicators = tf.one_hot(
force_match_column_ids, depth=similarity_matrix_shape[1])
force_match_column_indicators = (
tf.one_hot(
force_match_column_ids, depth=similarity_matrix_shape[1]) *
tf.cast(tf.expand_dims(valid_rows, axis=-1), dtype=tf.float32))
force_match_row_ids = tf.argmax(force_match_column_indicators, 0,
output_type=tf.int32)
force_match_column_mask = tf.cast(
......
......@@ -182,6 +182,34 @@ class ArgMaxMatcherTest(test_case.TestCase):
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_return_correct_matches_using_force_match_padded_groundtruth(self):
def graph_fn(similarity, valid_rows):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3.,
unmatched_threshold=2.,
force_match_for_each_row=True)
match = matcher.match(similarity, valid_rows)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1, 1, 1, 3, 1],
[-1, 0, -2, -2, -1],
[0, 0, 0, 0, 0],
[3, 0, -1, 2, 0],
[0, 0, 0, 0, 0]], dtype=np.float32)
valid_rows = np.array([True, True, False, True, False])
expected_matched_cols = np.array([0, 1, 3])
expected_matched_rows = np.array([3, 1, 0])
expected_unmatched_cols = np.array([2, 4]) # col 2 has too high max val
(res_matched_cols, res_unmatched_cols,
match_results) = self.execute(graph_fn, [similarity, valid_rows])
self.assertAllEqual(match_results[res_matched_cols], expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], expected_matched_cols)
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_valid_arguments_corner_case(self):
argmax_matcher.ArgMaxMatcher(matched_threshold=1,
unmatched_threshold=1)
......
......@@ -35,7 +35,7 @@ class GreedyBipartiteMatcher(matcher.Matcher):
super(GreedyBipartiteMatcher, self).__init__(
use_matmul_gather=use_matmul_gather)
def _match(self, similarity_matrix, num_valid_rows=-1):
def _match(self, similarity_matrix, valid_rows):
"""Bipartite matches a collection rows and columns. A greedy bi-partite.
TODO(rathodv): Add num_valid_columns options to match only that many columns
......@@ -44,21 +44,27 @@ class GreedyBipartiteMatcher(matcher.Matcher):
Args:
similarity_matrix: Float tensor of shape [N, M] with pairwise similarity
where higher values mean more similar.
num_valid_rows: A scalar or a 1-D tensor with one element describing the
number of valid rows of similarity_matrix to consider for the bipartite
matching. If set to be negative, then all rows from similarity_matrix
are used.
valid_rows: A boolean tensor of shape [N] indicating the rows that are
valid.
Returns:
match_results: int32 tensor of shape [M] with match_results[i]=-1
meaning that column i is not matched and otherwise that it is matched to
row match_results[i].
"""
valid_row_sim_matrix = tf.gather(similarity_matrix,
tf.squeeze(tf.where(valid_rows), axis=-1))
invalid_row_sim_matrix = tf.gather(
similarity_matrix,
tf.squeeze(tf.where(tf.logical_not(valid_rows)), axis=-1))
similarity_matrix = tf.concat(
[valid_row_sim_matrix, invalid_row_sim_matrix], axis=0)
# Convert similarity matrix to distance matrix, as image_ops.bipartite_match tries
# to find minimum distance matches.
distance_matrix = -1 * similarity_matrix
num_valid_rows = tf.reduce_sum(tf.to_float(valid_rows))
_, match_results = image_ops.bipartite_match(
distance_matrix, num_valid_rows)
distance_matrix, num_valid_rows=num_valid_rows)
match_results = tf.reshape(match_results, [-1])
match_results = tf.cast(match_results, tf.int32)
return match_results
......@@ -24,44 +24,54 @@ class GreedyBipartiteMatcherTest(tf.test.TestCase):
def test_get_expected_matches_when_all_rows_are_valid(self):
similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]])
num_valid_rows = 2
valid_rows = tf.ones([2], dtype=tf.bool)
expected_match_results = [-1, 1, 0]
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, num_valid_rows=num_valid_rows)
match = matcher.match(similarity_matrix, valid_rows=valid_rows)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
self.assertAllEqual(match_results_out, expected_match_results)
def test_get_expected_matches_with_valid_rows_set_to_minus_one(self):
def test_get_expected_matches_with_all_rows_be_default(self):
similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]])
num_valid_rows = -1
expected_match_results = [-1, 1, 0]
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, num_valid_rows=num_valid_rows)
match = matcher.match(similarity_matrix)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
self.assertAllEqual(match_results_out, expected_match_results)
def test_get_no_matches_with_zero_valid_rows(self):
similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]])
num_valid_rows = 0
valid_rows = tf.zeros([2], dtype=tf.bool)
expected_match_results = [-1, -1, -1]
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, num_valid_rows=num_valid_rows)
match = matcher.match(similarity_matrix, valid_rows)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
self.assertAllEqual(match_results_out, expected_match_results)
def test_get_expected_matches_with_only_one_valid_row(self):
similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]])
num_valid_rows = 1
valid_rows = tf.constant([True, False], dtype=tf.bool)
expected_match_results = [-1, -1, 0]
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, num_valid_rows=num_valid_rows)
match = matcher.match(similarity_matrix, valid_rows)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
self.assertAllEqual(match_results_out, expected_match_results)
def test_get_expected_matches_with_only_one_valid_row_at_bottom(self):
similarity_matrix = tf.constant([[0.15, 0.2, 0.3], [0.50, 0.1, 0.8]])
valid_rows = tf.constant([False, True], dtype=tf.bool)
expected_match_results = [-1, -1, 0]
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, valid_rows)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
self.assertAllEqual(match_results_out, expected_match_results)
......
......@@ -103,7 +103,6 @@ from object_detection.core import box_list_ops
from object_detection.core import box_predictor
from object_detection.core import losses
from object_detection.core import model
from object_detection.core import post_processing
from object_detection.core import standard_fields as fields
from object_detection.core import target_assigner
from object_detection.utils import ops
......@@ -234,11 +233,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
first_stage_box_predictor_depth,
first_stage_minibatch_size,
first_stage_sampler,
first_stage_nms_score_threshold,
first_stage_nms_iou_threshold,
first_stage_non_max_suppression_fn,
first_stage_max_proposals,
first_stage_localization_loss_weight,
first_stage_objectness_loss_weight,
crop_and_resize_fn,
initial_crop_size,
maxpool_kernel_size,
maxpool_stride,
......@@ -255,8 +254,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
hard_example_miner=None,
parallel_iterations=16,
add_summaries=True,
use_matmul_crop_and_resize=False,
clip_anchors_to_image=False):
clip_anchors_to_image=False,
use_static_shapes=False,
resize_masks=True):
"""FasterRCNNMetaArch Constructor.
Args:
......@@ -309,18 +309,22 @@ class FasterRCNNMetaArch(model.DetectionModel):
to the loss function for any given image within the image batch and is
only called "batch_size" due to terminology from the Faster R-CNN paper.
first_stage_sampler: Sampler to use for first stage loss (RPN loss).
first_stage_nms_score_threshold: Score threshold for non max suppression
for the Region Proposal Network (RPN). This value is expected to be in
[0, 1] as it is applied directly after a softmax transformation. The
recommended value for Faster R-CNN is 0.
first_stage_nms_iou_threshold: The Intersection Over Union (IOU) threshold
for performing Non-Max Suppression (NMS) on the boxes predicted by the
Region Proposal Network (RPN).
first_stage_non_max_suppression_fn: batch_multiclass_non_max_suppression
callable that takes `boxes`, `scores` and optional `clip_window`(with
all other inputs already set) and returns a dictionary containing
tensors with keys: `detection_boxes`, `detection_scores`,
`detection_classes`, `num_detections`. This is used to perform non max
suppression on the boxes predicted by the Region Proposal Network
(RPN).
See `post_processing.batch_multiclass_non_max_suppression` for the type
and shape of these tensors.
first_stage_max_proposals: Maximum number of boxes to retain after
performing Non-Max Suppression (NMS) on the boxes predicted by the
Region Proposal Network (RPN).
first_stage_localization_loss_weight: A float
first_stage_objectness_loss_weight: A float
crop_and_resize_fn: A differentiable resampler to use for cropping RPN
proposal features.
initial_crop_size: A single integer indicating the output size
(width and height are set to be the same) of the initial bilinear
interpolation based cropping during ROI pooling.
......@@ -367,12 +371,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
in parallel for calls to tf.map_fn.
add_summaries: boolean (default: True) controlling whether summary ops
should be added to tensorflow graph.
use_matmul_crop_and_resize: Force the use of matrix multiplication based
crop and resize instead of standard tf.image.crop_and_resize while
computing second stage input feature maps.
clip_anchors_to_image: Normally, anchors generated for a given image size
are pruned during training if they lie outside the image window. This
option clips the anchors to be within the image instead of pruning.
use_static_shapes: If True, uses implementation of ops with static shape
guarantees.
resize_masks: Indicates whether the masks present in the groundtruth
should be resized in the model with `image_resizer_fn`.
Raises:
ValueError: If `second_stage_batch_size` > `first_stage_max_proposals` at
......@@ -384,9 +389,6 @@ class FasterRCNNMetaArch(model.DetectionModel):
# in the future.
super(FasterRCNNMetaArch, self).__init__(num_classes=num_classes)
if is_training and second_stage_batch_size > first_stage_max_proposals:
raise ValueError('second_stage_batch_size should be no greater than '
'first_stage_max_proposals.')
if not isinstance(first_stage_anchor_generator,
grid_anchor_generator.GridAnchorGenerator):
raise ValueError('first_stage_anchor_generator must be of type '
......@@ -394,6 +396,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
self._is_training = is_training
self._image_resizer_fn = image_resizer_fn
self._resize_masks = resize_masks
self._feature_extractor = feature_extractor
self._number_of_stages = number_of_stages
......@@ -425,9 +428,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
min_depth=0,
max_depth=0))
self._first_stage_nms_score_threshold = first_stage_nms_score_threshold
self._first_stage_nms_iou_threshold = first_stage_nms_iou_threshold
self._first_stage_nms_fn = first_stage_non_max_suppression_fn
self._first_stage_max_proposals = first_stage_max_proposals
self._use_static_shapes = use_static_shapes
self._first_stage_localization_loss = (
losses.WeightedSmoothL1LocalizationLoss())
......@@ -437,6 +440,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
self._first_stage_obj_loss_weight = first_stage_objectness_loss_weight
# Per-region cropping parameters
self._crop_and_resize_fn = crop_and_resize_fn
self._initial_crop_size = initial_crop_size
self._maxpool_kernel_size = maxpool_kernel_size
self._maxpool_stride = maxpool_stride
......@@ -458,7 +462,6 @@ class FasterRCNNMetaArch(model.DetectionModel):
self._second_stage_cls_loss_weight = second_stage_classification_loss_weight
self._second_stage_mask_loss_weight = (
second_stage_mask_prediction_loss_weight)
self._use_matmul_crop_and_resize = use_matmul_crop_and_resize
self._hard_example_miner = hard_example_miner
self._parallel_iterations = parallel_iterations
......@@ -673,9 +676,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
}
if self._number_of_stages >= 2:
# If mixed-precision training on TPU is enabled, rpn_box_encodings and
# rpn_objectness_predictions_with_background are bfloat16 tensors.
# Since they are prediction results, they need to be cast to float32
# tensors for correct postprocess_rpn computation in predict_second_stage.
prediction_dict.update(self._predict_second_stage(
rpn_box_encodings,
rpn_objectness_predictions_with_background,
tf.to_float(rpn_box_encodings),
tf.to_float(rpn_objectness_predictions_with_background),
rpn_features_to_crop,
self._anchors.get(), image_shape, true_image_shapes))
......@@ -719,7 +726,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
[batch_size, num_valid_anchors, 2] containing class
predictions (logits) for each of the anchors. Note that this
tensor *includes* background class predictions (at class index 0).
rpn_features_to_crop: A 4-D float32 tensor with shape
rpn_features_to_crop: A 4-D float32 or bfloat16 tensor with shape
[batch_size, height, width, depth] representing image features to crop
using the proposal boxes predicted by the RPN.
anchors: 2-D float tensor of shape
......@@ -758,17 +765,22 @@ class FasterRCNNMetaArch(model.DetectionModel):
boxes proposed by the RPN, thus enabling one to extract features and
get box classification and prediction for externally selected areas
of the image.
6) box_classifier_features: a 4-D float32 tensor representing the
features for each proposal.
6) box_classifier_features: a 4-D float32 or bfloat16 tensor
representing the features for each proposal.
"""
image_shape_2d = self._image_batch_shape_2d(image_shape)
proposal_boxes_normalized, _, num_proposals = self._postprocess_rpn(
rpn_box_encodings, rpn_objectness_predictions_with_background,
anchors, image_shape_2d, true_image_shapes)
# If mixed-precision training on TPU is enabled, the dtype of
# rpn_features_to_crop is bfloat16, otherwise it is float32. tf.cast is
# used to match the dtype of proposal_boxes_normalized to that of
# rpn_features_to_crop for further computation.
flattened_proposal_feature_maps = (
self._compute_second_stage_input_feature_maps(
rpn_features_to_crop, proposal_boxes_normalized))
rpn_features_to_crop,
tf.cast(proposal_boxes_normalized, rpn_features_to_crop.dtype)))
box_classifier_features = (
self._feature_extractor.extract_box_classifier_features(
......@@ -956,8 +968,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
image_shape: A 1-D tensor representing the input image shape.
"""
image_shape = tf.shape(preprocessed_inputs)
rpn_features_to_crop, _ = self._feature_extractor.extract_proposal_features(
preprocessed_inputs, scope=self.first_stage_feature_extractor_scope)
rpn_features_to_crop, self.endpoints = (
self._feature_extractor.extract_proposal_features(
preprocessed_inputs,
scope=self.first_stage_feature_extractor_scope))
feature_map_shape = tf.shape(rpn_features_to_crop)
anchors = box_list_ops.concatenate(
......@@ -965,12 +980,15 @@ class FasterRCNNMetaArch(model.DetectionModel):
feature_map_shape[2])]))
with slim.arg_scope(self._first_stage_box_predictor_arg_scope_fn()):
kernel_size = self._first_stage_box_predictor_kernel_size
reuse = tf.get_variable_scope().reuse
rpn_box_predictor_features = slim.conv2d(
rpn_features_to_crop,
self._first_stage_box_predictor_depth,
kernel_size=[kernel_size, kernel_size],
rate=self._first_stage_atrous_rate,
activation_fn=tf.nn.relu6)
activation_fn=tf.nn.relu6,
scope='Conv',
reuse=reuse)
return (rpn_box_predictor_features, rpn_features_to_crop,
anchors, image_shape)
......@@ -1223,14 +1241,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
rpn_objectness_predictions_with_background_batch)[:, :, 1]
clip_window = self._compute_clip_window(image_shapes)
(proposal_boxes, proposal_scores, _, _, _,
num_proposals) = post_processing.batch_multiclass_non_max_suppression(
num_proposals) = self._first_stage_nms_fn(
tf.expand_dims(proposal_boxes, axis=2),
tf.expand_dims(rpn_objectness_softmax_without_background,
axis=2),
self._first_stage_nms_score_threshold,
self._first_stage_nms_iou_threshold,
self._first_stage_max_proposals,
self._first_stage_max_proposals,
tf.expand_dims(rpn_objectness_softmax_without_background, axis=2),
clip_window=clip_window)
if self._is_training:
proposal_boxes = tf.stop_gradient(proposal_boxes)
......@@ -1377,16 +1390,19 @@ class FasterRCNNMetaArch(model.DetectionModel):
groundtruth_masks_list = self._groundtruth_lists.get(
fields.BoxListFields.masks)
if groundtruth_masks_list is not None:
# TODO(rathodv): Remove mask resizing once the legacy pipeline is deleted.
if groundtruth_masks_list is not None and self._resize_masks:
resized_masks_list = []
for mask in groundtruth_masks_list:
_, resized_mask, _ = self._image_resizer_fn(
# Reuse the given `image_resizer_fn` to resize groundtruth masks.
# `mask` tensor for an image is of the shape [num_masks,
# image_height, image_width]. Below we create a dummy image of the
# shape [image_height, image_width, 1] to use with
# `image_resizer_fn`.
image=tf.zeros(tf.stack([tf.shape(mask)[1], tf.shape(mask)[2], 1])),
image=tf.zeros(tf.stack([tf.shape(mask)[1],
tf.shape(mask)[2], 1])),
masks=mask)
resized_masks_list.append(resized_mask)
......@@ -1443,11 +1459,16 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.range(proposal_boxlist.num_boxes()) < num_valid_proposals,
cls_weights > 0
)
sampled_indices = self._second_stage_sampler.subsample(
selected_positions = self._second_stage_sampler.subsample(
valid_indicator,
self._second_stage_batch_size,
positive_indicator)
return box_list_ops.boolean_mask(proposal_boxlist, sampled_indices)
return box_list_ops.boolean_mask(
proposal_boxlist,
selected_positions,
use_static_shapes=self._use_static_shapes,
indicator_sum=(self._second_stage_batch_size
if self._use_static_shapes else None))
def _compute_second_stage_input_feature_maps(self, features_to_crop,
proposal_boxes_normalized):
......@@ -1467,35 +1488,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
Returns:
A float32 tensor with shape [K, new_height, new_width, depth].
"""
def get_box_inds(proposals):
proposals_shape = proposals.get_shape().as_list()
if any(dim is None for dim in proposals_shape):
proposals_shape = tf.shape(proposals)
ones_mat = tf.ones(proposals_shape[:2], dtype=tf.int32)
multiplier = tf.expand_dims(
tf.range(start=0, limit=proposals_shape[0]), 1)
return tf.reshape(ones_mat * multiplier, [-1])
if self._use_matmul_crop_and_resize:
def _single_image_crop_and_resize(inputs):
single_image_features_to_crop, proposal_boxes_normalized = inputs
return ops.matmul_crop_and_resize(
tf.expand_dims(single_image_features_to_crop, 0),
proposal_boxes_normalized,
[self._initial_crop_size, self._initial_crop_size])
cropped_regions = self._flatten_first_two_dimensions(
shape_utils.static_or_dynamic_map_fn(
_single_image_crop_and_resize,
elems=[features_to_crop, proposal_boxes_normalized],
dtype=tf.float32,
parallel_iterations=self._parallel_iterations))
else:
cropped_regions = tf.image.crop_and_resize(
features_to_crop,
self._flatten_first_two_dimensions(proposal_boxes_normalized),
get_box_inds(proposal_boxes_normalized),
(self._initial_crop_size, self._initial_crop_size))
cropped_regions = self._flatten_first_two_dimensions(
self._crop_and_resize_fn(
features_to_crop, proposal_boxes_normalized,
[self._initial_crop_size, self._initial_crop_size]))
return slim.max_pool2d(
cropped_regions,
[self._maxpool_kernel_size, self._maxpool_kernel_size],
......@@ -1738,11 +1734,17 @@ class FasterRCNNMetaArch(model.DetectionModel):
sampled_reg_indices = tf.multiply(batch_sampled_indices,
batch_reg_weights)
losses_mask = None
if self.groundtruth_has_field(fields.InputDataFields.is_annotated):
losses_mask = tf.stack(self.groundtruth_lists(
fields.InputDataFields.is_annotated))
localization_losses = self._first_stage_localization_loss(
rpn_box_encodings, batch_reg_targets, weights=sampled_reg_indices)
rpn_box_encodings, batch_reg_targets, weights=sampled_reg_indices,
losses_mask=losses_mask)
objectness_losses = self._first_stage_objectness_loss(
rpn_objectness_predictions_with_background,
batch_one_hot_targets, weights=batch_sampled_indices)
batch_one_hot_targets, weights=batch_sampled_indices,
losses_mask=losses_mask)
localization_loss = tf.reduce_mean(
tf.reduce_sum(localization_losses, axis=1) / normalizer)
objectness_loss = tf.reduce_mean(
......@@ -1866,32 +1868,32 @@ class FasterRCNNMetaArch(model.DetectionModel):
# for just one class to avoid over-counting for regression loss and
# (optionally) mask loss.
else:
# We only predict refined location encodings for the non background
# classes, but we now pad it to make it compatible with the class
# predictions
refined_box_encodings_with_background = tf.pad(
refined_box_encodings, [[0, 0], [1, 0], [0, 0]])
refined_box_encodings_masked_by_class_targets = tf.boolean_mask(
refined_box_encodings_with_background,
tf.greater(one_hot_flat_cls_targets_with_background, 0))
reshaped_refined_box_encodings = tf.reshape(
refined_box_encodings_masked_by_class_targets,
[batch_size, self.max_num_proposals, self._box_coder.code_size])
reshaped_refined_box_encodings = (
self._get_refined_encodings_for_postitive_class(
refined_box_encodings,
one_hot_flat_cls_targets_with_background, batch_size))
losses_mask = None
if self.groundtruth_has_field(fields.InputDataFields.is_annotated):
losses_mask = tf.stack(self.groundtruth_lists(
fields.InputDataFields.is_annotated))
second_stage_loc_losses = self._second_stage_localization_loss(
reshaped_refined_box_encodings,
batch_reg_targets, weights=batch_reg_weights) / normalizer
batch_reg_targets,
weights=batch_reg_weights,
losses_mask=losses_mask) / normalizer
second_stage_cls_losses = ops.reduce_sum_trailing_dimensions(
self._second_stage_classification_loss(
class_predictions_with_background,
batch_cls_targets_with_background,
weights=batch_cls_weights),
weights=batch_cls_weights,
losses_mask=losses_mask),
ndims=2) / normalizer
second_stage_loc_loss = tf.reduce_sum(
tf.boolean_mask(second_stage_loc_losses, paddings_indicator))
second_stage_loc_losses * tf.to_float(paddings_indicator))
second_stage_cls_loss = tf.reduce_sum(
tf.boolean_mask(second_stage_cls_losses, paddings_indicator))
second_stage_cls_losses * tf.to_float(paddings_indicator))
if self._hard_example_miner:
(second_stage_loc_loss, second_stage_cls_loss
......@@ -1954,10 +1956,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
box_list.BoxList(tf.reshape(proposal_boxes, [-1, 4])),
image_shape[1], image_shape[2]).get()
flat_cropped_gt_mask = tf.image.crop_and_resize(
flat_cropped_gt_mask = self._crop_and_resize_fn(
tf.expand_dims(flat_gt_masks, -1),
flat_normalized_proposals,
tf.range(flat_normalized_proposals.shape[0].value),
tf.expand_dims(flat_normalized_proposals, axis=1),
[mask_height, mask_width])
batch_cropped_gt_mask = tf.reshape(
......@@ -1968,14 +1969,16 @@ class FasterRCNNMetaArch(model.DetectionModel):
self._second_stage_mask_loss(
reshaped_prediction_masks,
batch_cropped_gt_mask,
weights=batch_mask_target_weights),
weights=batch_mask_target_weights,
losses_mask=losses_mask),
ndims=2) / (
mask_height * mask_width * tf.maximum(
tf.reduce_sum(
batch_mask_target_weights, axis=1, keep_dims=True
), tf.ones((batch_size, 1))))
second_stage_mask_loss = tf.reduce_sum(
tf.boolean_mask(second_stage_mask_losses, paddings_indicator))
tf.where(paddings_indicator, second_stage_mask_losses,
tf.zeros_like(second_stage_mask_losses)))
if second_stage_mask_loss is not None:
mask_loss = tf.multiply(self._second_stage_mask_loss_weight,
......@@ -1983,6 +1986,29 @@ class FasterRCNNMetaArch(model.DetectionModel):
loss_dict[mask_loss.op.name] = mask_loss
return loss_dict
def _get_refined_encodings_for_postitive_class(
self, refined_box_encodings, flat_cls_targets_with_background,
batch_size):
# We only predict refined location encodings for the non background
# classes, but we now pad it to make it compatible with the class
# predictions
refined_box_encodings_with_background = tf.pad(refined_box_encodings,
[[0, 0], [1, 0], [0, 0]])
refined_box_encodings_masked_by_class_targets = (
box_list_ops.boolean_mask(
box_list.BoxList(
tf.reshape(refined_box_encodings_with_background,
[-1, self._box_coder.code_size])),
tf.reshape(tf.greater(flat_cls_targets_with_background, 0), [-1]),
use_static_shapes=self._use_static_shapes,
indicator_sum=batch_size * self.max_num_proposals
if self._use_static_shapes else None).get())
return tf.reshape(
refined_box_encodings_masked_by_class_targets, [
batch_size, self.max_num_proposals,
self._box_coder.code_size
])
def _padded_batched_proposals_indicator(self,
num_proposals,
max_num_proposals):
......
......@@ -14,8 +14,12 @@
# ==============================================================================
"""Tests for object_detection.meta_architectures.faster_rcnn_meta_arch."""
import functools
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from google.protobuf import text_format
from object_detection.anchor_generators import grid_anchor_generator
from object_detection.builders import box_predictor_builder
......@@ -23,11 +27,14 @@ from object_detection.builders import hyperparams_builder
from object_detection.builders import post_processing_builder
from object_detection.core import balanced_positive_negative_sampler as sampler
from object_detection.core import losses
from object_detection.core import post_processing
from object_detection.core import target_assigner
from object_detection.meta_architectures import faster_rcnn_meta_arch
from object_detection.protos import box_predictor_pb2
from object_detection.protos import hyperparams_pb2
from object_detection.protos import post_processing_pb2
from object_detection.utils import ops
from object_detection.utils import test_case
from object_detection.utils import test_utils
slim = tf.contrib.slim
......@@ -60,7 +67,7 @@ class FakeFasterRCNNFeatureExtractor(
num_outputs=3, kernel_size=1, scope='layer2')
class FasterRCNNMetaArchTestBase(tf.test.TestCase):
class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
"""Base class to test Faster R-CNN and R-FCN meta architectures."""
def _build_arg_scope_with_hyperparams(self,
......@@ -157,7 +164,8 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
masks_are_class_agnostic=False,
use_matmul_crop_and_resize=False,
clip_anchors_to_image=False,
use_matmul_gather_in_matcher=False):
use_matmul_gather_in_matcher=False,
use_static_shapes=False):
def image_resizer_fn(image, masks=None):
"""Fake image resizer function."""
......@@ -220,11 +228,18 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
first_stage_box_predictor_depth = 512
first_stage_minibatch_size = 3
first_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=0.5, is_static=False)
positive_fraction=0.5, is_static=use_static_shapes)
first_stage_nms_score_threshold = -1.0
first_stage_nms_iou_threshold = 1.0
first_stage_max_proposals = first_stage_max_proposals
first_stage_non_max_suppression_fn = functools.partial(
post_processing.batch_multiclass_non_max_suppression,
score_thresh=first_stage_nms_score_threshold,
iou_thresh=first_stage_nms_iou_threshold,
max_size_per_class=first_stage_max_proposals,
max_total_size=first_stage_max_proposals,
use_static_shapes=use_static_shapes)
first_stage_localization_loss_weight = 1.0
first_stage_objectness_loss_weight = 1.0
......@@ -246,7 +261,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
second_stage_non_max_suppression_fn, _ = post_processing_builder.build(
post_processing_config)
second_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=1.0, is_static=False)
positive_fraction=1.0, is_static=use_static_shapes)
second_stage_score_conversion_fn = tf.identity
second_stage_localization_loss_weight = 1.0
......@@ -268,6 +283,9 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
loc_loss_weight=second_stage_localization_loss_weight,
max_negatives_per_positive=None)
crop_and_resize_fn = (
ops.matmul_crop_and_resize
if use_matmul_crop_and_resize else ops.native_crop_and_resize)
common_kwargs = {
'is_training': is_training,
'num_classes': num_classes,
......@@ -284,8 +302,8 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
'first_stage_box_predictor_depth': first_stage_box_predictor_depth,
'first_stage_minibatch_size': first_stage_minibatch_size,
'first_stage_sampler': first_stage_sampler,
'first_stage_nms_score_threshold': first_stage_nms_score_threshold,
'first_stage_nms_iou_threshold': first_stage_nms_iou_threshold,
'first_stage_non_max_suppression_fn':
first_stage_non_max_suppression_fn,
'first_stage_max_proposals': first_stage_max_proposals,
'first_stage_localization_loss_weight':
first_stage_localization_loss_weight,
......@@ -304,8 +322,10 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
'second_stage_classification_loss':
second_stage_classification_loss,
'hard_example_miner': hard_example_miner,
'use_matmul_crop_and_resize': use_matmul_crop_and_resize,
'clip_anchors_to_image': clip_anchors_to_image
'crop_and_resize_fn': crop_and_resize_fn,
'clip_anchors_to_image': clip_anchors_to_image,
'use_static_shapes': use_static_shapes,
'resize_masks': True,
}
return self._get_model(
......@@ -412,7 +432,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
anchors = prediction_out['anchors']
self.assertTrue(len(anchors.shape) == 2 and anchors.shape[1] == 4)
num_anchors_out = anchors.shape[0]
self.assertTrue(num_anchors_out < num_anchors_strict_upper_bound)
self.assertLess(num_anchors_out, num_anchors_strict_upper_bound)
self.assertTrue(np.all(np.greater_equal(anchors, 0)))
self.assertTrue(np.all(np.less_equal(anchors[:, 0], height)))
......@@ -484,94 +504,104 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
for key in expected_shapes:
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
def _test_predict_gives_correct_shapes_in_train_mode_both_stages(
self, use_matmul_crop_and_resize=False,
clip_anchors_to_image=False):
test_graph = tf.Graph()
with test_graph.as_default():
# BEGIN GOOGLE-INTERNAL
# TODO(bhattad): Remove conditional after CMLE moves to TF 1.11
@parameterized.parameters(
{'use_static_shapes': False},
{'use_static_shapes': True}
)
# END GOOGLE-INTERNAL
def test_predict_gives_correct_shapes_in_train_mode_both_stages(
self,
use_static_shapes=False):
batch_size = 2
image_size = 10
max_num_proposals = 7
initial_crop_size = 3
maxpool_stride = 1
def graph_fn(images, gt_boxes, gt_classes, gt_weights):
"""Function to construct tf graph for the test."""
model = self._build_model(
is_training=True,
number_of_stages=2,
second_stage_batch_size=7,
predict_masks=False,
use_matmul_crop_and_resize=use_matmul_crop_and_resize,
clip_anchors_to_image=clip_anchors_to_image)
use_matmul_crop_and_resize=use_static_shapes,
clip_anchors_to_image=use_static_shapes,
use_static_shapes=use_static_shapes)
batch_size = 2
image_size = 10
max_num_proposals = 7
initial_crop_size = 3
maxpool_stride = 1
image_shape = (batch_size, image_size, image_size, 3)
preprocessed_inputs = tf.zeros(image_shape, dtype=tf.float32)
groundtruth_boxes_list = [
tf.constant([[0, 0, .5, .5], [.5, .5, 1, 1]], dtype=tf.float32),
tf.constant([[0, .5, .5, 1], [.5, 0, 1, .5]], dtype=tf.float32)]
groundtruth_classes_list = [
tf.constant([[1, 0], [0, 1]], dtype=tf.float32),
tf.constant([[1, 0], [1, 0]], dtype=tf.float32)]
groundtruth_weights_list = [
tf.constant([1, 1], dtype=tf.float32),
tf.constant([1, 1], dtype=tf.float32)]
_, true_image_shapes = model.preprocess(tf.zeros(image_shape))
preprocessed_inputs, true_image_shapes = model.preprocess(images)
model.provide_groundtruth(
groundtruth_boxes_list,
groundtruth_classes_list,
groundtruth_weights_list=groundtruth_weights_list)
groundtruth_boxes_list=tf.unstack(gt_boxes),
groundtruth_classes_list=tf.unstack(gt_classes),
groundtruth_weights_list=tf.unstack(gt_weights))
result_tensor_dict = model.predict(preprocessed_inputs, true_image_shapes)
expected_shapes = {
'rpn_box_predictor_features':
(2, image_size, image_size, 512),
'rpn_features_to_crop': (2, image_size, image_size, 3),
'image_shape': (4,),
'refined_box_encodings': (2 * max_num_proposals, 2, 4),
'class_predictions_with_background': (2 * max_num_proposals, 2 + 1),
'num_proposals': (2,),
'proposal_boxes': (2, max_num_proposals, 4),
'proposal_boxes_normalized': (2, max_num_proposals, 4),
'box_classifier_features':
self._get_box_classifier_features_shape(image_size,
batch_size,
max_num_proposals,
initial_crop_size,
maxpool_stride,
3)
}
init_op = tf.global_variables_initializer()
with self.test_session(graph=test_graph) as sess:
sess.run(init_op)
tensor_dict_out = sess.run(result_tensor_dict)
self.assertEqual(set(tensor_dict_out.keys()),
set(expected_shapes.keys()).union(set([
'rpn_box_encodings',
'rpn_objectness_predictions_with_background',
'anchors'])))
for key in expected_shapes:
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
anchors_shape_out = tensor_dict_out['anchors'].shape
self.assertEqual(2, len(anchors_shape_out))
self.assertEqual(4, anchors_shape_out[1])
num_anchors_out = anchors_shape_out[0]
self.assertAllEqual(tensor_dict_out['rpn_box_encodings'].shape,
(2, num_anchors_out, 4))
self.assertAllEqual(
tensor_dict_out['rpn_objectness_predictions_with_background'].shape,
(2, num_anchors_out, 2))
def test_predict_gives_correct_shapes_in_train_mode_both_stages(self):
self._test_predict_gives_correct_shapes_in_train_mode_both_stages()
def test_predict_gives_correct_shapes_in_train_mode_matmul_crop_resize(self):
self._test_predict_gives_correct_shapes_in_train_mode_both_stages(
use_matmul_crop_and_resize=True)
return (result_tensor_dict['refined_box_encodings'],
result_tensor_dict['class_predictions_with_background'],
result_tensor_dict['proposal_boxes'],
result_tensor_dict['proposal_boxes_normalized'],
result_tensor_dict['anchors'],
result_tensor_dict['rpn_box_encodings'],
result_tensor_dict['rpn_objectness_predictions_with_background'],
result_tensor_dict['rpn_features_to_crop'],
result_tensor_dict['rpn_box_predictor_features'],
)
image_shape = (batch_size, image_size, image_size, 3)
images = np.zeros(image_shape, dtype=np.float32)
gt_boxes = np.stack([
np.array([[0, 0, .5, .5], [.5, .5, 1, 1]], dtype=np.float32),
np.array([[0, .5, .5, 1], [.5, 0, 1, .5]], dtype=np.float32)
])
gt_classes = np.stack([
np.array([[1, 0], [0, 1]], dtype=np.float32),
np.array([[1, 0], [1, 0]], dtype=np.float32)
])
gt_weights = np.stack([
np.array([1, 1], dtype=np.float32),
np.array([1, 1], dtype=np.float32)
])
if use_static_shapes:
results = self.execute(graph_fn,
[images, gt_boxes, gt_classes, gt_weights])
else:
results = self.execute_cpu(graph_fn,
[images, gt_boxes, gt_classes, gt_weights])
def test_predict_gives_correct_shapes_in_train_mode_clip_anchors(self):
self._test_predict_gives_correct_shapes_in_train_mode_both_stages(
clip_anchors_to_image=True)
expected_shapes = {
'rpn_box_predictor_features': (2, image_size, image_size, 512),
'rpn_features_to_crop': (2, image_size, image_size, 3),
'refined_box_encodings': (2 * max_num_proposals, 2, 4),
'class_predictions_with_background': (2 * max_num_proposals, 2 + 1),
'proposal_boxes': (2, max_num_proposals, 4),
'rpn_box_encodings': (2, image_size * image_size * 9, 4),
'proposal_boxes_normalized': (2, max_num_proposals, 4),
'box_classifier_features':
self._get_box_classifier_features_shape(
image_size, batch_size, max_num_proposals, initial_crop_size,
maxpool_stride, 3),
'rpn_objectness_predictions_with_background':
(2, image_size * image_size * 9, 2)
}
# TODO(rathodv): Possibly change utils/test_case.py to accept dictionaries
# and return dictionaries so we don't have to rely on the order of tensors.
self.assertAllEqual(results[0].shape,
expected_shapes['refined_box_encodings'])
self.assertAllEqual(results[1].shape,
expected_shapes['class_predictions_with_background'])
self.assertAllEqual(results[2].shape, expected_shapes['proposal_boxes'])
self.assertAllEqual(results[3].shape,
expected_shapes['proposal_boxes_normalized'])
anchors_shape = results[4].shape
self.assertAllEqual(results[5].shape,
[batch_size, anchors_shape[0], 4])
self.assertAllEqual(results[6].shape,
[batch_size, anchors_shape[0], 2])
self.assertAllEqual(results[7].shape,
expected_shapes['rpn_features_to_crop'])
self.assertAllEqual(results[8].shape,
expected_shapes['rpn_box_predictor_features'])
def _test_postprocess_first_stage_only_inference_mode(
self, pad_to_max_dimension=None):
......@@ -848,10 +878,10 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertTrue('Loss/BoxClassifierLoss/localization_loss'
not in loss_dict_out)
self.assertTrue('Loss/BoxClassifierLoss/classification_loss'
not in loss_dict_out)
self.assertNotIn('Loss/BoxClassifierLoss/localization_loss',
loss_dict_out)
self.assertNotIn('Loss/BoxClassifierLoss/classification_loss',
loss_dict_out)
# TODO(rathodv): Split test into two - with and without masks.
def test_loss_full(self):
......@@ -1157,22 +1187,68 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_zero_padded_proposals_nonzero_loss_with_two_images(self):
model = self._build_model(
is_training=True, number_of_stages=2, second_stage_batch_size=6)
# BEGIN GOOGLE-INTERNAL
# TODO(bhattad): Remove conditional after CMLE moves to TF 1.11
@parameterized.parameters(
{'use_static_shapes': False, 'shared_boxes': False},
{'use_static_shapes': False, 'shared_boxes': True},
{'use_static_shapes': True, 'shared_boxes': False},
{'use_static_shapes': True, 'shared_boxes': True},
)
# END GOOGLE-INTERNAL
def test_loss_full_zero_padded_proposals_nonzero_loss_with_two_images(
self, use_static_shapes=False, shared_boxes=False):
batch_size = 2
anchors = tf.constant(
first_stage_max_proposals = 8
second_stage_batch_size = 6
num_classes = 2
def graph_fn(anchors, rpn_box_encodings,
rpn_objectness_predictions_with_background, images,
num_proposals, proposal_boxes, refined_box_encodings,
class_predictions_with_background, groundtruth_boxes,
groundtruth_classes):
"""Function to construct tf graph for the test."""
model = self._build_model(
is_training=True, number_of_stages=2,
second_stage_batch_size=second_stage_batch_size,
first_stage_max_proposals=first_stage_max_proposals,
num_classes=num_classes,
use_matmul_crop_and_resize=use_static_shapes,
clip_anchors_to_image=use_static_shapes,
use_static_shapes=use_static_shapes)
prediction_dict = {
'rpn_box_encodings': rpn_box_encodings,
'rpn_objectness_predictions_with_background':
rpn_objectness_predictions_with_background,
'image_shape': tf.shape(images),
'anchors': anchors,
'refined_box_encodings': refined_box_encodings,
'class_predictions_with_background':
class_predictions_with_background,
'proposal_boxes': proposal_boxes,
'num_proposals': num_proposals
}
_, true_image_shapes = model.preprocess(images)
model.provide_groundtruth(tf.unstack(groundtruth_boxes),
tf.unstack(groundtruth_classes))
loss_dict = model.loss(prediction_dict, true_image_shapes)
return (loss_dict['Loss/RPNLoss/localization_loss'],
loss_dict['Loss/RPNLoss/objectness_loss'],
loss_dict['Loss/BoxClassifierLoss/localization_loss'],
loss_dict['Loss/BoxClassifierLoss/classification_loss'])
anchors = np.array(
[[0, 0, 16, 16],
[0, 16, 16, 32],
[16, 0, 32, 16],
[16, 16, 32, 32]], dtype=tf.float32)
rpn_box_encodings = tf.zeros(
[batch_size,
anchors.get_shape().as_list()[0],
BOX_CODE_SIZE], dtype=tf.float32)
[16, 16, 32, 32]], dtype=np.float32)
rpn_box_encodings = np.zeros(
[batch_size, anchors.shape[1], BOX_CODE_SIZE], dtype=np.float32)
# use different numbers for the objectness category to break ties in
# order of boxes returned by NMS
rpn_objectness_predictions_with_background = tf.constant(
rpn_objectness_predictions_with_background = np.array(
[[[-10, 13],
[10, -10],
[10, -11],
......@@ -1180,13 +1256,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
[[-10, 13],
[10, -10],
[10, -11],
[10, -12]]], dtype=tf.float32)
image_shape = tf.constant([batch_size, 32, 32, 3], dtype=tf.int32)
[10, -12]]], dtype=np.float32)
images = np.zeros([batch_size, 32, 32, 3], dtype=np.float32)
# box_classifier_batch_size is 6, but here we assume that the number of
# actual proposals (not counting zero paddings) is fewer.
num_proposals = tf.constant([3, 2], dtype=tf.int32)
proposal_boxes = tf.constant(
num_proposals = np.array([3, 2], dtype=np.int32)
proposal_boxes = np.array(
[[[0, 0, 16, 16],
[0, 16, 16, 32],
[16, 0, 32, 16],
......@@ -1198,13 +1274,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
[0, 0, 0, 0], # begin paddings
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]]], dtype=tf.float32)
[0, 0, 0, 0]]], dtype=np.float32)
refined_box_encodings = tf.zeros(
(batch_size * model.max_num_proposals,
model.num_classes,
BOX_CODE_SIZE), dtype=tf.float32)
class_predictions_with_background = tf.constant(
refined_box_encodings = np.zeros(
(batch_size * second_stage_batch_size, 1
if shared_boxes else num_classes, BOX_CODE_SIZE),
dtype=np.float32)
class_predictions_with_background = np.array(
[[-10, 10, -10], # first image
[10, -10, -10],
[10, -10, -10],
......@@ -1216,7 +1292,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
[0, 0, 0], # begin paddings
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],], dtype=tf.float32)
[0, 0, 0],], dtype=np.float32)
# The first groundtruth box is 4/5 of the anchor size in both directions
# experiencing a loss of:
......@@ -1225,38 +1301,29 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
# The second groundtruth box is identical to the prediction and thus
# experiences zero loss.
# Total average loss is (abs(5 * log(1/2)) - .5) / 3.
groundtruth_boxes_list = [
tf.constant([[0.05, 0.05, 0.45, 0.45]], dtype=tf.float32),
tf.constant([[0.0, 0.0, 0.5, 0.5]], dtype=tf.float32)]
groundtruth_classes_list = [tf.constant([[1, 0]], dtype=tf.float32),
tf.constant([[0, 1]], dtype=tf.float32)]
exp_loc_loss = (-5 * np.log(.8) - 0.5) / 3.0
groundtruth_boxes = np.stack([
np.array([[0.05, 0.05, 0.45, 0.45]], dtype=np.float32),
np.array([[0.0, 0.0, 0.5, 0.5]], dtype=np.float32)])
groundtruth_classes = np.stack([np.array([[1, 0]], dtype=np.float32),
np.array([[0, 1]], dtype=np.float32)])
execute_fn = self.execute_cpu
if use_static_shapes:
execute_fn = self.execute
results = execute_fn(graph_fn, [
anchors, rpn_box_encodings, rpn_objectness_predictions_with_background,
images, num_proposals, proposal_boxes, refined_box_encodings,
class_predictions_with_background, groundtruth_boxes,
groundtruth_classes
])
self.assertAllClose(results[0], exp_loc_loss, rtol=1e-4, atol=1e-4)
self.assertAllClose(results[1], 0.0)
self.assertAllClose(results[2], exp_loc_loss, rtol=1e-4, atol=1e-4)
self.assertAllClose(results[3], 0.0)
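For reference, the asserted localization loss can be reproduced outside the test with a few lines of NumPy. This is a standalone sketch; the smooth_l1 helper below is illustrative and not part of the test suite:

import numpy as np

def smooth_l1(x):
  # Huber loss with delta = 1, matching WeightedSmoothL1LocalizationLoss.
  return 0.5 * x**2 if abs(x) < 1 else abs(x) - 0.5

# The matched box is 4/5 of its anchor in height and width, so both scale
# encodings are 5 * log(4/5); normalizing by the 3 proposals per image over
# the batch of 2:
print(2 * smooth_l1(5 * np.log(4.0 / 5.0)) / 6)  # ~0.2052, the exp_loc_loss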
def test_loss_with_hard_mining(self):
model = self._build_model(is_training=True,
......@@ -1346,10 +1413,14 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
def test_loss_with_hard_mining_and_losses_mask(self):
model = self._build_model(is_training=True,
number_of_stages=2,
second_stage_batch_size=None,
first_stage_max_proposals=6,
hard_mining=True)
batch_size = 2
number_of_proposals = 3
anchors = tf.constant(
[[0, 0, 16, 16],
[0, 16, 16, 32],
......@@ -1361,63 +1432,77 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
BOX_CODE_SIZE], dtype=tf.float32)
# use different numbers for the objectness category to break ties in
# order of boxes returned by NMS
rpn_objectness_predictions_with_background = tf.constant(
[[[-10, 13],
[-10, 12],
[10, -11],
[10, -12]],
[[-10, 13],
[-10, 12],
[10, -11],
[10, -12]]], dtype=tf.float32)
image_shape = tf.constant([batch_size, 32, 32, 3], dtype=tf.int32)
# box_classifier_batch_size is 6, but here we assume that the number of
# actual proposals (not counting zero paddings) is fewer (3).
num_proposals = tf.constant([number_of_proposals, number_of_proposals],
dtype=tf.int32)
proposal_boxes = tf.constant(
[[[0, 0, 16, 16], # first image
[0, 16, 16, 32],
[16, 0, 32, 16],
[0, 0, 0, 0], # begin paddings
[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 0, 16, 16], # second image
[0, 16, 16, 32],
[16, 0, 32, 16],
[0, 0, 0, 0], # begin paddings
[0, 0, 0, 0],
[0, 0, 0, 0]]], dtype=tf.float32)
refined_box_encodings = tf.zeros(
(batch_size * model.max_num_proposals,
model.num_classes,
BOX_CODE_SIZE), dtype=tf.float32)
class_predictions_with_background = tf.constant(
[[-10, 10, -10], # first image
[-10, -10, 10],
[10, -10, -10],
[0, 0, 0], # begin paddings
[0, 0, 0],
[0, 0, 0],
[-10, 10, -10], # second image
[-10, -10, 10],
[10, -10, -10],
[0, 0, 0], # begin paddings
[0, 0, 0],
[0, 0, 0]], dtype=tf.float32)
# The first groundtruth box is 4/5 of the anchor size in both directions
# experiencing a loss of:
# 2 * SmoothL1(5 * log(4/5)) / (num_proposals * batch_size)
# = 2 * (abs(5 * log(4/5)) - .5) / 6
# The second groundtruth box is 46/50 of the anchor size in both directions
# experiencing a loss of:
# 2 * SmoothL1(5 * log(46/50)) / (num_proposals * batch_size)
# = 2 * (.5 * (5 * log(.92))^2) / 6.
# Since the first groundtruth box experiences greater loss, and we have
# set num_hard_examples=1 in the HardMiner, the final localization loss
# corresponds to that of the first groundtruth box.
groundtruth_boxes_list = [
tf.constant([[0.05, 0.05, 0.45, 0.45],
[0.02, 0.52, 0.48, 0.98]], dtype=tf.float32),
tf.constant([[0.05, 0.05, 0.45, 0.45],
[0.02, 0.52, 0.48, 0.98]], dtype=tf.float32)]
groundtruth_classes_list = [
tf.constant([[1, 0], [0, 1]], dtype=tf.float32),
tf.constant([[1, 0], [0, 1]], dtype=tf.float32)]
is_annotated_list = [tf.constant(True, dtype=tf.bool),
tf.constant(False, dtype=tf.bool)]
exp_loc_loss = (2 * (-5 * np.log(.8) - 0.5) /
(number_of_proposals * batch_size))
prediction_dict = {
'rpn_box_encodings': rpn_box_encodings,
'rpn_objectness_predictions_with_background':
......@@ -1427,24 +1512,20 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
'refined_box_encodings': refined_box_encodings,
'class_predictions_with_background': class_predictions_with_background,
'proposal_boxes': proposal_boxes,
'num_proposals': num_proposals
}
_, true_image_shapes = model.preprocess(tf.zeros(image_shape))
model.provide_groundtruth(groundtruth_boxes_list,
groundtruth_classes_list,
is_annotated_list=is_annotated_list)
loss_dict = model.loss(prediction_dict, true_image_shapes)
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], exp_loc_loss)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
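Likewise, the hard-mining expectation can be sanity-checked outside the test. A sketch with an illustrative smooth_l1 helper: with num_hard_examples=1 the miner keeps only the proposal with the larger per-box loss.

import numpy as np

def smooth_l1(x):
  return 0.5 * x**2 if abs(x) < 1 else abs(x) - 0.5

loss_box1 = 2 * smooth_l1(5 * np.log(4.0 / 5.0))    # ~1.231, linear branch
loss_box2 = 2 * smooth_l1(5 * np.log(46.0 / 50.0))  # ~0.174, quadratic branch
assert loss_box1 > loss_box2  # the first box's loss survives hard mining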
def test_restore_map_for_classification_ckpt(self):
# Define mock tensorflow classification graph and save variables.
......
......@@ -62,11 +62,11 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
first_stage_box_predictor_depth,
first_stage_minibatch_size,
first_stage_sampler,
first_stage_non_max_suppression_fn,
first_stage_max_proposals,
first_stage_localization_loss_weight,
first_stage_objectness_loss_weight,
crop_and_resize_fn,
second_stage_target_assigner,
second_stage_rfcn_box_predictor,
second_stage_batch_size,
......@@ -79,8 +79,9 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
hard_example_miner,
parallel_iterations=16,
add_summaries=True,
clip_anchors_to_image=False,
use_static_shapes=False,
resize_masks=False):
"""RFCNMetaArch Constructor.
Args:
......@@ -123,18 +124,22 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
only called "batch_size" due to terminology from the Faster R-CNN paper.
first_stage_sampler: The sampler for the boxes used to calculate the RPN
loss after the first stage.
first_stage_non_max_suppression_fn: batch_multiclass_non_max_suppression
callable that takes `boxes`, `scores` and optional `clip_window` (with
all other inputs already set) and returns a dictionary containing
tensors with keys: `detection_boxes`, `detection_scores`,
`detection_classes`, `num_detections`. This is used to perform non max
suppression on the boxes predicted by the Region Proposal Network
(RPN).
See `post_processing.batch_multiclass_non_max_suppression` for the type
and shape of these tensors.
first_stage_max_proposals: Maximum number of boxes to retain after
performing Non-Max Suppression (NMS) on the boxes predicted by the
Region Proposal Network (RPN).
first_stage_localization_loss_weight: A float
first_stage_objectness_loss_weight: A float
crop_and_resize_fn: A differentiable resampler to use for cropping RPN
proposal features.
second_stage_target_assigner: Target assigner to use for second stage of
R-FCN. If the model is configured with multiple prediction heads, this
target assigner is used to generate targets for all heads (with the
......@@ -168,12 +173,13 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
in parallel for calls to tf.map_fn.
add_summaries: boolean (default: True) controlling whether summary ops
should be added to tensorflow graph.
clip_anchors_to_image: The anchors generated are clipped to the
window size without filtering the non-overlapping anchors. This generates
a static number of anchors. This argument is unused.
use_static_shapes: If True, uses implementation of ops with static shape
guarantees.
resize_masks: Indicates whether the masks present in the groundtruth
should be resized in the model with `image_resizer_fn`.
Raises:
ValueError: If `second_stage_batch_size` > `first_stage_max_proposals`
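Callers that previously passed the two scalar thresholds now build the callable themselves, typically by partially applying post_processing.batch_multiclass_non_max_suppression, the same pattern the SSD test helper uses later in this diff. A minimal sketch; the threshold values here are illustrative only:

import functools
from object_detection.core import post_processing

first_stage_non_max_suppression_fn = functools.partial(
    post_processing.batch_multiclass_non_max_suppression,
    score_thresh=0.0,        # illustrative; the RPN typically uses 0
    iou_thresh=0.7,          # illustrative IOU threshold
    max_size_per_class=300,
    max_total_size=300)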
......@@ -196,11 +202,11 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
first_stage_box_predictor_depth,
first_stage_minibatch_size,
first_stage_sampler,
first_stage_non_max_suppression_fn,
first_stage_max_proposals,
first_stage_localization_loss_weight,
first_stage_objectness_loss_weight,
crop_and_resize_fn,
None, # initial_crop_size is not used in R-FCN
None, # maxpool_kernel_size is not used in R-FCN
None, # maxpool_stride is not used in R-FCN
......@@ -215,7 +221,11 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
second_stage_classification_loss,
1.0, # second stage mask prediction loss weight isn't used in R-FCN.
hard_example_miner,
parallel_iterations,
add_summaries,
clip_anchors_to_image,
use_static_shapes,
resize_masks)
self._rfcn_box_predictor = second_stage_rfcn_box_predictor
......
......@@ -225,10 +225,7 @@ class SSDMetaArch(model.DetectionModel):
box_predictor,
box_coder,
feature_extractor,
encode_background_as_zeros,
image_resizer_fn,
non_max_suppression_fn,
score_conversion_fn,
......@@ -238,14 +235,14 @@ class SSDMetaArch(model.DetectionModel):
localization_loss_weight,
normalize_loss_by_num_matches,
hard_example_miner,
target_assigner_instance,
add_summaries=True,
normalize_loc_loss_by_codesize=False,
freeze_batchnorm=False,
inplace_batchnorm_update=False,
add_background_class=True,
random_example_sampler=None,
expected_classification_loss_under_sampling=None):
"""SSDMetaArch Constructor.
TODO(rathodv,jonathanhuang): group NMS parameters + score converter into
......@@ -259,13 +256,9 @@ class SSDMetaArch(model.DetectionModel):
box_predictor: a box_predictor.BoxPredictor object.
box_coder: a box_coder.BoxCoder object.
feature_extractor: a SSDFeatureExtractor object.
encode_background_as_zeros: boolean determining whether background
targets are to be encoded as an all zeros vector or a one-hot
vector (where background is the 0th class).
image_resizer_fn: a callable for image resizing. This callable always
takes a rank-3 image tensor (corresponding to a single image) and
returns a rank-3 image tensor, possibly with new spatial dimensions and
......@@ -288,6 +281,7 @@ class SSDMetaArch(model.DetectionModel):
localization_loss_weight: float
normalize_loss_by_num_matches: boolean
hard_example_miner: a losses.HardExampleMiner object (can be None)
target_assigner_instance: target_assigner.TargetAssigner instance to use.
add_summaries: boolean (default: True) controlling whether summary ops
should be added to tensorflow graph.
normalize_loc_loss_by_codesize: whether to normalize localization loss
......@@ -312,7 +306,6 @@ class SSDMetaArch(model.DetectionModel):
the random sampled examples.
expected_classification_loss_under_sampling: If not None, used
to calculate classification loss by background/foreground weighting.
"""
super(SSDMetaArch, self).__init__(num_classes=box_predictor.num_classes)
self._is_training = is_training
......@@ -324,8 +317,6 @@ class SSDMetaArch(model.DetectionModel):
self._box_coder = box_coder
self._feature_extractor = feature_extractor
self._add_background_class = add_background_class
# Needed for fine-tuning from classification checkpoints whose
......@@ -347,14 +338,7 @@ class SSDMetaArch(model.DetectionModel):
self._unmatched_class_label = tf.constant((self.num_classes + 1) * [0],
tf.float32)
self._target_assigner = target_assigner_instance
self._classification_loss = classification_loss
self._localization_loss = localization_loss
......@@ -523,28 +507,25 @@ class SSDMetaArch(model.DetectionModel):
im_height=image_shape[1],
im_width=image_shape[2]))
if self._box_predictor.is_keras_model:
predictor_results_dict = self._box_predictor(feature_maps)
else:
with slim.arg_scope([slim.batch_norm],
is_training=(self._is_training and
not self._freeze_batchnorm),
updates_collections=batchnorm_updates_collections):
predictor_results_dict = self._box_predictor.predict(
feature_maps, self._anchor_generator.num_anchors_per_location())
predictions_dict = {
'preprocessed_inputs': preprocessed_inputs,
'feature_maps': feature_maps,
'anchors': self._anchors.get()
}
for prediction_key, prediction_list in iter(predictor_results_dict.items()):
prediction = tf.concat(prediction_list, axis=1)
if (prediction_key == 'box_encodings' and prediction.shape.ndims == 4 and
prediction.shape[2] == 1):
prediction = tf.squeeze(prediction, axis=2)
predictions_dict[prediction_key] = prediction
self._batched_prediction_tensor_names = [x for x in predictions_dict
if x != 'anchors']
return predictions_dict
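The loop above generalizes the old hand-written concat of box_encodings and class_predictions_with_background to every head the box predictor returns. A toy illustration of the shape handling, as a standalone sketch:

import tensorflow as tf

# Two feature maps with 4 and 2 anchor locations, one shared box each (q=1):
box_encodings_list = [tf.zeros([8, 4, 1, 4]), tf.zeros([8, 2, 1, 4])]
prediction = tf.concat(box_encodings_list, axis=1)  # shape [8, 6, 1, 4]
if prediction.shape.ndims == 4 and prediction.shape[2] == 1:
  prediction = tf.squeeze(prediction, axis=2)       # shape [8, 6, 4]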
......@@ -587,6 +568,10 @@ class SSDMetaArch(model.DetectionModel):
[batch_size, num_anchors, num_classes+1] containing class predictions
(logits) for each of the anchors. Note that this tensor *includes*
background class predictions.
4) mask_predictions: (optional) a 5-D float tensor of shape
[batch_size, num_anchors, q, mask_height, mask_width]. `q` can be
either number of classes or 1 depending on whether a separate mask is
predicted per class.
true_image_shapes: int32 tensor of shape [batch, 3] where each row is
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
......@@ -599,6 +584,8 @@ class SSDMetaArch(model.DetectionModel):
detection_classes: [batch, max_detections]
detection_keypoints: [batch, max_detections, num_keypoints, 2] (if
encoded in the prediction_dict 'box_encodings')
detection_masks: [batch_size, max_detections, mask_height, mask_width]
(optional)
num_detections: [batch]
Raises:
ValueError: if prediction_dict does not contain `box_encodings` or
......@@ -627,13 +614,14 @@ class SSDMetaArch(model.DetectionModel):
if detection_keypoints is not None:
additional_fields = {
fields.BoxListFields.keypoints: detection_keypoints}
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
nmsed_additional_fields, num_detections) = self._non_max_suppression_fn(
detection_boxes,
detection_scores,
clip_window=self._compute_clip_window(preprocessed_images,
true_image_shapes),
additional_fields=additional_fields,
masks=prediction_dict.get('mask_predictions'))
detection_dict = {
fields.DetectionResultFields.detection_boxes: nmsed_boxes,
fields.DetectionResultFields.detection_scores: nmsed_scores,
......@@ -645,6 +633,9 @@ class SSDMetaArch(model.DetectionModel):
fields.BoxListFields.keypoints in nmsed_additional_fields):
detection_dict[fields.DetectionResultFields.detection_keypoints] = (
nmsed_additional_fields[fields.BoxListFields.keypoints])
if nmsed_masks is not None:
detection_dict[
fields.DetectionResultFields.detection_masks] = nmsed_masks
return detection_dict
def loss(self, prediction_dict, true_image_shapes, scope=None):
......@@ -701,16 +692,22 @@ class SSDMetaArch(model.DetectionModel):
batch_cls_weights = tf.multiply(batch_sampled_indicator,
batch_cls_weights)
losses_mask = None
if self.groundtruth_has_field(fields.InputDataFields.is_annotated):
losses_mask = tf.stack(self.groundtruth_lists(
fields.InputDataFields.is_annotated))
location_losses = self._localization_loss(
prediction_dict['box_encodings'],
batch_reg_targets,
ignore_nan_targets=True,
weights=batch_reg_weights,
losses_mask=losses_mask)
cls_losses = self._classification_loss(
prediction_dict['class_predictions_with_background'],
batch_cls_targets,
weights=batch_cls_weights,
losses_mask=losses_mask)
if self._expected_classification_loss_under_sampling:
if cls_losses.get_shape().ndims == 3:
......@@ -734,12 +731,6 @@ class SSDMetaArch(model.DetectionModel):
self._hard_example_miner.summarize()
else:
cls_losses = ops.reduce_sum_trailing_dimensions(cls_losses, ndims=2)
localization_loss = tf.reduce_sum(location_losses)
classification_loss = tf.reduce_sum(cls_losses)
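The losses mask gates loss accumulation per image: any image whose is_annotated flag is False contributes nothing. A minimal NumPy sketch of the intended behavior, with illustrative names:

import numpy as np

per_image_losses = np.array([0.7, 1.2, 0.4])   # one loss per image in batch
is_annotated = np.array([True, True, False])   # third image is unlabeled
masked_losses = np.where(is_annotated, per_image_losses, 0.0)
print(masked_losses.sum())  # 1.9 -- the unlabeled image adds no loss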
......
......@@ -14,105 +14,26 @@
# ==============================================================================
"""Tests for object_detection.meta_architectures.ssd_meta_arch."""
import functools
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.meta_architectures import ssd_meta_arch_test_lib
from object_detection.utils import test_utils
slim = tf.contrib.slim
keras = tf.keras.layers
@parameterized.parameters(
{'use_keras': False},
{'use_keras': True},
)
class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
parameterized.TestCase):
def _create_model(self,
apply_hard_mining=True,
......@@ -123,96 +44,25 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
use_expected_classification_loss_under_sampling=False,
minimum_negative_sampling=1,
desired_negative_sampling_ratio=3,
use_keras=False,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5):
return super(SsdMetaArchTest, self)._create_model(
model_fn=ssd_meta_arch.SSDMetaArch,
apply_hard_mining=apply_hard_mining,
normalize_loc_loss_by_codesize=normalize_loc_loss_by_codesize,
add_background_class=add_background_class,
random_example_sampling=random_example_sampling,
weight_regression_loss_by_score=weight_regression_loss_by_score,
use_expected_classification_loss_under_sampling=
use_expected_classification_loss_under_sampling,
minimum_negative_sampling=minimum_negative_sampling,
desired_negative_sampling_ratio=desired_negative_sampling_ratio,
use_keras=use_keras,
predict_mask=predict_mask,
use_static_shapes=use_static_shapes,
nms_max_size_per_class=nms_max_size_per_class)
def test_preprocess_preserves_shapes_with_dynamic_input_image(
self, use_keras):
......@@ -360,6 +210,61 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
self.assertAllClose(detections_out['num_detections'],
expected_num_detections)
# BEGIN GOOGLE-INTERNAL
# TODO(b/112621326): Remove conditional after CMLE moves to TF 1.11
def test_postprocess_results_are_correct_static(self, use_keras):
with tf.Graph().as_default():
_, _, _, _ = self._create_model(use_keras=use_keras)
def graph_fn(input_image):
model, _, _, _ = self._create_model(use_static_shapes=True,
nms_max_size_per_class=4)
preprocessed_inputs, true_image_shapes = model.preprocess(input_image)
prediction_dict = model.predict(preprocessed_inputs,
true_image_shapes)
detections = model.postprocess(prediction_dict, true_image_shapes)
return (detections['detection_boxes'], detections['detection_scores'],
detections['detection_classes'], detections['num_detections'])
batch_size = 2
image_size = 2
channels = 3
input_image = np.random.rand(batch_size, image_size, image_size,
channels).astype(np.float32)
expected_boxes = [
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[0, 0, 0, 0] # padding
],
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[0, 0, 0, 0] # padding
]
]
expected_scores = [[0, 0, 0, 0], [0, 0, 0, 0]]
expected_classes = [[0, 0, 0, 0], [0, 0, 0, 0]]
expected_num_detections = np.array([3, 3])
(detection_boxes, detection_scores, detection_classes,
num_detections) = self.execute(graph_fn, [input_image])
for image_idx in range(batch_size):
self.assertTrue(test_utils.first_rows_close_as_set(
detection_boxes[image_idx][
0:expected_num_detections[image_idx]].tolist(),
expected_boxes[image_idx][0:expected_num_detections[image_idx]]))
self.assertAllClose(
detection_scores[image_idx][0:expected_num_detections[image_idx]],
expected_scores[image_idx][0:expected_num_detections[image_idx]])
self.assertAllClose(
detection_classes[image_idx][0:expected_num_detections[image_idx]],
expected_classes[image_idx][0:expected_num_detections[image_idx]])
self.assertAllClose(num_detections,
expected_num_detections)
# END GOOGLE-INTERNAL
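The loss tests below all follow the same test_case pattern: a graph_fn builds the model and returns loss tensors, and self.execute (or self.execute_cpu) runs it on NumPy inputs. Schematically, as a sketch of the shape of these tests rather than a new test:

def graph_fn(preprocessed_tensor):
  model, _, _, _ = self._create_model(apply_hard_mining=False)
  prediction_dict = model.predict(preprocessed_tensor, true_image_shapes=None)
  loss_dict = model.loss(prediction_dict, true_image_shapes=None)
  return (self._get_value_for_matching_key(loss_dict,
                                           'Loss/localization_loss'),)

# localization_loss, = self.execute(graph_fn, [preprocessed_input])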
def test_loss_results_are_correct(self, use_keras):
with tf.Graph().as_default():
......@@ -374,9 +279,10 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (self._get_value_for_matching_key(loss_dict,
'Loss/localization_loss'),
self._get_value_for_matching_key(loss_dict,
'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -413,7 +319,8 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (self._get_value_for_matching_key(loss_dict,
'Loss/localization_loss'),)
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -443,9 +350,10 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (self._get_value_for_matching_key(loss_dict,
'Loss/localization_loss'),
self._get_value_for_matching_key(loss_dict,
'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -591,6 +499,55 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
self.assertAllClose(localization_loss, expected_localization_loss)
self.assertAllClose(classification_loss, expected_classification_loss)
def test_loss_results_are_correct_with_losses_mask(self, use_keras):
with tf.Graph().as_default():
_, num_classes, num_anchors, _ = self._create_model(use_keras=use_keras)
def graph_fn(preprocessed_tensor, groundtruth_boxes1, groundtruth_boxes2,
groundtruth_boxes3, groundtruth_classes1, groundtruth_classes2,
groundtruth_classes3):
groundtruth_boxes_list = [groundtruth_boxes1, groundtruth_boxes2,
groundtruth_boxes3]
groundtruth_classes_list = [groundtruth_classes1, groundtruth_classes2,
groundtruth_classes3]
is_annotated_list = [tf.constant(True), tf.constant(True),
tf.constant(False)]
model, _, _, _ = self._create_model(apply_hard_mining=False)
model.provide_groundtruth(groundtruth_boxes_list,
groundtruth_classes_list,
is_annotated_list=is_annotated_list)
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (self._get_value_for_matching_key(loss_dict,
'Loss/localization_loss'),
self._get_value_for_matching_key(loss_dict,
'Loss/classification_loss'))
batch_size = 3
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
groundtruth_boxes1 = np.array([[0, 0, .5, .5]], dtype=np.float32)
groundtruth_boxes2 = np.array([[0, 0, .5, .5]], dtype=np.float32)
groundtruth_boxes3 = np.array([[0, 0, .5, .5]], dtype=np.float32)
groundtruth_classes1 = np.array([[1]], dtype=np.float32)
groundtruth_classes2 = np.array([[1]], dtype=np.float32)
groundtruth_classes3 = np.array([[1]], dtype=np.float32)
expected_localization_loss = 0.0
# Note that we are subtracting 1 from batch_size, since the final image is
# not annotated.
expected_classification_loss = ((batch_size - 1) * num_anchors
* (num_classes+1) * np.log(2.0))
(localization_loss,
classification_loss) = self.execute(graph_fn, [preprocessed_input,
groundtruth_boxes1,
groundtruth_boxes2,
groundtruth_boxes3,
groundtruth_classes1,
groundtruth_classes2,
groundtruth_classes3])
self.assertAllClose(localization_loss, expected_localization_loss)
self.assertAllClose(classification_loss, expected_classification_loss)
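The expected value asserted above has a closed form: the mock predictor emits all-zero logits, so every anchor pays sigmoid cross entropy of log(2) for each of its num_classes + 1 outputs, and only the two annotated images count. A standalone numeric check:

import numpy as np

batch_size, num_anchors, num_classes = 3, 4, 1
annotated = batch_size - 1                      # third image is unlabeled
print(annotated * num_anchors * (num_classes + 1) * np.log(2.0))  # ~11.09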
def test_restore_map_for_detection_ckpt(self, use_keras):
model, _, _, _ = self._create_model(use_keras=use_keras)
model.predict(tf.constant(np.array([[[[0, 0], [1, 1]], [[1, 0], [0, 1]]]],
......@@ -678,10 +635,8 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
use_keras):
with tf.Graph().as_default():
_, num_classes, _, _ = self._create_model(
random_example_sampling=True, use_keras=use_keras)
def graph_fn(preprocessed_tensor, groundtruth_boxes1, groundtruth_boxes2,
groundtruth_classes1, groundtruth_classes2):
......@@ -694,9 +649,10 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
prediction_dict = model.predict(
preprocessed_tensor, true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (self._get_value_for_matching_key(loss_dict,
'Loss/localization_loss'),
self._get_value_for_matching_key(loss_dict,
'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Helper functions for SSD models meta architecture tests."""
import functools
import tensorflow as tf
from object_detection.core import anchor_generator
from object_detection.core import balanced_positive_negative_sampler as sampler
from object_detection.core import box_list
from object_detection.core import losses
from object_detection.core import post_processing
from object_detection.core import region_similarity_calculator as sim_calc
from object_detection.core import target_assigner
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.utils import ops
from object_detection.utils import test_case
from object_detection.utils import test_utils
slim = tf.contrib.slim
keras = tf.keras.layers
class FakeSSDFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
"""Fake ssd feature extracture for ssd meta arch tests."""
def __init__(self):
super(FakeSSDFeatureExtractor, self).__init__(
is_training=True,
depth_multiplier=0,
min_depth=0,
pad_to_multiple=1,
conv_hyperparams_fn=None)
def preprocess(self, resized_inputs):
return tf.identity(resized_inputs)
def extract_features(self, preprocessed_inputs):
with tf.variable_scope('mock_model'):
features = slim.conv2d(
inputs=preprocessed_inputs,
num_outputs=32,
kernel_size=1,
scope='layer1')
return [features]
class FakeSSDKerasFeatureExtractor(ssd_meta_arch.SSDKerasFeatureExtractor):
"""Fake keras based ssd feature extracture for ssd meta arch tests."""
def __init__(self):
with tf.name_scope('mock_model'):
super(FakeSSDKerasFeatureExtractor, self).__init__(
is_training=True,
depth_multiplier=0,
min_depth=0,
pad_to_multiple=1,
conv_hyperparams_config=None,
freeze_batchnorm=False,
inplace_batchnorm_update=False,
)
self._conv = keras.Conv2D(filters=32, kernel_size=1, name='layer1')
def preprocess(self, resized_inputs):
return tf.identity(resized_inputs)
def _extract_features(self, preprocessed_inputs, **kwargs):
with tf.name_scope('mock_model'):
return [self._conv(preprocessed_inputs)]
class MockAnchorGenerator2x2(anchor_generator.AnchorGenerator):
"""A simple 2x2 anchor grid on the unit square used for test only."""
def name_scope(self):
return 'MockAnchorGenerator'
def num_anchors_per_location(self):
return [1]
def _generate(self, feature_map_shape_list, im_height, im_width):
return [
box_list.BoxList(
tf.constant(
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[1., 1., 1.5, 1.5] # Anchor that is outside clip_window.
],
tf.float32))
]
def num_anchors(self):
return 4
class SSDMetaArchTestBase(test_case.TestCase):
"""Base class to test SSD based meta architectures."""
def _create_model(self,
model_fn=ssd_meta_arch.SSDMetaArch,
apply_hard_mining=True,
normalize_loc_loss_by_codesize=False,
add_background_class=True,
random_example_sampling=False,
weight_regression_loss_by_score=False,
use_expected_classification_loss_under_sampling=False,
minimum_negative_sampling=1,
desired_negative_sampling_ratio=3,
use_keras=False,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5):
is_training = False
num_classes = 1
mock_anchor_generator = MockAnchorGenerator2x2()
if use_keras:
mock_box_predictor = test_utils.MockKerasBoxPredictor(
is_training, num_classes, predict_mask=predict_mask)
else:
mock_box_predictor = test_utils.MockBoxPredictor(
is_training, num_classes, predict_mask=predict_mask)
mock_box_coder = test_utils.MockBoxCoder()
if use_keras:
fake_feature_extractor = FakeSSDKerasFeatureExtractor()
else:
fake_feature_extractor = FakeSSDFeatureExtractor()
mock_matcher = test_utils.MockMatcher()
region_similarity_calculator = sim_calc.IouSimilarity()
encode_background_as_zeros = False
def image_resizer_fn(image):
return [tf.identity(image), tf.shape(image)]
classification_loss = losses.WeightedSigmoidClassificationLoss()
localization_loss = losses.WeightedSmoothL1LocalizationLoss()
non_max_suppression_fn = functools.partial(
post_processing.batch_multiclass_non_max_suppression,
score_thresh=-20.0,
iou_thresh=1.0,
max_size_per_class=nms_max_size_per_class,
max_total_size=nms_max_size_per_class,
use_static_shapes=use_static_shapes)
classification_loss_weight = 1.0
localization_loss_weight = 1.0
negative_class_weight = 1.0
normalize_loss_by_num_matches = False
hard_example_miner = None
if apply_hard_mining:
# This hard example miner is expected to be a no-op.
hard_example_miner = losses.HardExampleMiner(
num_hard_examples=None, iou_threshold=1.0)
random_example_sampler = None
if random_example_sampling:
random_example_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=0.5)
target_assigner_instance = target_assigner.TargetAssigner(
region_similarity_calculator,
mock_matcher,
mock_box_coder,
negative_class_weight=negative_class_weight,
weight_regression_loss_by_score=weight_regression_loss_by_score)
expected_classification_loss_under_sampling = None
if use_expected_classification_loss_under_sampling:
expected_classification_loss_under_sampling = functools.partial(
ops.expected_classification_loss_under_sampling,
minimum_negative_sampling=minimum_negative_sampling,
desired_negative_sampling_ratio=desired_negative_sampling_ratio)
code_size = 4
model = model_fn(
is_training=is_training,
anchor_generator=mock_anchor_generator,
box_predictor=mock_box_predictor,
box_coder=mock_box_coder,
feature_extractor=fake_feature_extractor,
encode_background_as_zeros=encode_background_as_zeros,
image_resizer_fn=image_resizer_fn,
non_max_suppression_fn=non_max_suppression_fn,
score_conversion_fn=tf.identity,
classification_loss=classification_loss,
localization_loss=localization_loss,
classification_loss_weight=classification_loss_weight,
localization_loss_weight=localization_loss_weight,
normalize_loss_by_num_matches=normalize_loss_by_num_matches,
hard_example_miner=hard_example_miner,
target_assigner_instance=target_assigner_instance,
add_summaries=False,
normalize_loc_loss_by_codesize=normalize_loc_loss_by_codesize,
freeze_batchnorm=False,
inplace_batchnorm_update=False,
add_background_class=add_background_class,
random_example_sampler=random_example_sampler,
expected_classification_loss_under_sampling=
expected_classification_loss_under_sampling)
return model, num_classes, mock_anchor_generator.num_anchors(), code_size
def _get_value_for_matching_key(self, dictionary, suffix):
for key in dictionary.keys():
if key.endswith(suffix):
return dictionary[key]
raise ValueError('key not found {}'.format(suffix))
if __name__ == '__main__':
tf.test.main()
......@@ -18,6 +18,7 @@ import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.metrics import coco_tools
from object_detection.utils import json_utils
from object_detection.utils import object_detection_evaluation
......@@ -148,6 +149,19 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
detection_classes]))
self._image_ids[image_id] = True
def dump_detections_to_json_file(self, json_output_path):
"""Saves the detections into json_output_path in the format used by MS COCO.
Args:
json_output_path: String containing the output file's path. It can also
be None; in that case nothing will be written to the output file.
"""
if json_output_path:
with tf.gfile.GFile(json_output_path, 'w') as fid:
tf.logging.info('Dumping detections to output json file.')
json_utils.Dump(
obj=self._detection_boxes_list, fid=fid, float_digits=4, indent=2)
def evaluate(self):
"""Evaluates the detection boxes and returns a dictionary of coco metrics.
......@@ -245,10 +259,11 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
detection_boxes_batched, detection_scores_batched,
detection_classes_batched, num_det_boxes_per_image):
self.add_single_ground_truth_image_info(
image_id, {
'groundtruth_boxes': gt_box[:num_gt_box],
'groundtruth_classes': gt_class[:num_gt_box],
'groundtruth_is_crowd': gt_is_crowd[:num_gt_box]
})
self.add_single_detected_image_info(
image_id,
{'detection_boxes': det_box[:num_det_box],
......@@ -268,8 +283,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
detection_classes = eval_dict[detection_fields.detection_classes]
num_gt_boxes_per_image = eval_dict.get(
'num_groundtruth_boxes_per_image', None)
num_det_boxes_per_image = eval_dict.get('num_det_boxes_per_image', None)
if groundtruth_is_crowd is None:
groundtruth_is_crowd = tf.zeros_like(groundtruth_classes, dtype=tf.bool)
......@@ -491,6 +505,19 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
detection_classes]))
self._image_ids_with_detections.update([image_id])
def dump_detections_to_json_file(self, json_output_path):
"""Saves the detections into json_output_path in the format used by MS COCO.
Args:
json_output_path: String containing the output file's path. It can also
be None; in that case nothing will be written to the output file.
"""
if json_output_path:
tf.logging.info('Dumping detections to output json file.')
with tf.gfile.GFile(json_output_path, 'w') as fid:
json_utils.Dump(
obj=self._detection_masks_list, fid=fid, float_digits=4, indent=2)
def evaluate(self):
"""Evaluates the detection masks and returns a dictionary of coco metrics.
......
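A typical call sequence for the new dump hooks, sketched with an illustrative output path:

from object_detection.metrics import coco_evaluation

evaluator = coco_evaluation.CocoDetectionEvaluator(categories)
# ... per image: add_single_ground_truth_image_info(...) and
#     add_single_detected_image_info(...) ...
metrics = evaluator.evaluate()
evaluator.dump_detections_to_json_file('/tmp/detections.json')  # None skips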
......@@ -24,14 +24,25 @@ from object_detection.core import standard_fields
from object_detection.metrics import coco_evaluation
def _get_categories_list():
return [{
'id': 1,
'name': 'person'
}, {
'id': 2,
'name': 'dog'
}, {
'id': 3,
'name': 'cat'
}]
class CocoDetectionEvaluationTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetections(self):
"""Tests that mAP is calculated correctly on GT and Detections."""
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
coco_evaluator.add_single_ground_truth_image_info(
image_id='image1',
groundtruth_dict={
......@@ -88,17 +99,8 @@ class CocoDetectionEvaluationTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetectionsSkipCrowd(self):
"""Tests computing mAP with is_crowd GT boxes skipped."""
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
coco_evaluator.add_single_ground_truth_image_info(
image_id='image1',
groundtruth_dict={
......@@ -124,17 +126,8 @@ class CocoDetectionEvaluationTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetectionsEmptyCrowd(self):
"""Tests computing mAP with empty is_crowd array passed in."""
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
coco_evaluator.add_single_ground_truth_image_info(
image_id='image1',
groundtruth_dict={
......@@ -160,11 +153,9 @@ class CocoDetectionEvaluationTest(tf.test.TestCase):
def testRejectionOnDuplicateGroundtruth(self):
"""Tests that groundtruth cannot be added more than once for an image."""
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
# Add groundtruth
image_key1 = 'img1'
groundtruth_boxes1 = np.array([[0, 0, 1, 1], [0, 0, 2, 2], [0, 0, 3, 3]],
dtype=float)
......@@ -189,11 +180,9 @@ class CocoDetectionEvaluationTest(tf.test.TestCase):
def testRejectionOnDuplicateDetections(self):
"""Tests that detections cannot be added more than once for an image."""
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
# Add groundtruth
coco_evaluator.add_single_ground_truth_image_info(
image_id='image1',
groundtruth_dict={
......@@ -227,10 +216,8 @@ class CocoDetectionEvaluationTest(tf.test.TestCase):
def testExceptionRaisedWithMissingGroundtruth(self):
"""Tests that exception is raised for detection with missing groundtruth."""
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
with self.assertRaises(ValueError):
coco_evaluator.add_single_detected_image_info(
image_id='image1',
......@@ -247,10 +234,8 @@ class CocoDetectionEvaluationTest(tf.test.TestCase):
class CocoEvaluationPyFuncTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetections(self):
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
image_id = tf.placeholder(tf.string, shape=())
groundtruth_boxes = tf.placeholder(tf.float32, shape=(None, 4))
groundtruth_classes = tf.placeholder(tf.float32, shape=(None))
......@@ -310,31 +295,22 @@ class CocoEvaluationPyFuncTest(tf.test.TestCase):
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP@.75IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (small)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@1'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@10'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (small)'], 1.0)
self.assertFalse(coco_evaluator._groundtruth_list)
self.assertFalse(coco_evaluator._detection_boxes_list)
self.assertFalse(coco_evaluator._image_ids)
def testGetOneMAPWithMatchingGroundtruthAndDetectionsPadded(self):
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
image_id = tf.placeholder(tf.string, shape=())
groundtruth_boxes = tf.placeholder(tf.float32, shape=(None, 4))
groundtruth_classes = tf.placeholder(tf.float32, shape=(None))
......@@ -415,24 +391,22 @@ class CocoEvaluationPyFuncTest(tf.test.TestCase):
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP@.75IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (small)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@1'], 0.83333331)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@10'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (small)'], 1.0)
self.assertFalse(coco_evaluator._groundtruth_list)
self.assertFalse(coco_evaluator._detection_boxes_list)
self.assertFalse(coco_evaluator._image_ids)
def testGetOneMAPWithMatchingGroundtruthAndDetectionsBatched(self):
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
batch_size = 3
image_id = tf.placeholder(tf.string, shape=(batch_size))
groundtruth_boxes = tf.placeholder(tf.float32, shape=(batch_size, None, 4))
......@@ -479,24 +453,22 @@ class CocoEvaluationPyFuncTest(tf.test.TestCase):
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP@.75IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (small)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@1'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@10'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (small)'], 1.0)
self.assertFalse(coco_evaluator._groundtruth_list)
self.assertFalse(coco_evaluator._detection_boxes_list)
self.assertFalse(coco_evaluator._image_ids)
def testGetOneMAPWithMatchingGroundtruthAndDetectionsPaddedBatches(self):
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
batch_size = 3
image_id = tf.placeholder(tf.string, shape=(batch_size))
groundtruth_boxes = tf.placeholder(tf.float32, shape=(batch_size, None, 4))
......@@ -525,27 +497,40 @@ class CocoEvaluationPyFuncTest(tf.test.TestCase):
_, update_op = eval_metric_ops['DetectionBoxes_Precision/mAP']
with self.test_session() as sess:
sess.run(
update_op,
feed_dict={
image_id: ['image1', 'image2', 'image3'],
groundtruth_boxes:
np.array([[[100., 100., 200., 200.], [-1, -1, -1, -1]],
[[50., 50., 100., 100.], [-1, -1, -1, -1]],
[[25., 25., 50., 50.], [10., 10., 15., 15.]]]),
groundtruth_classes:
np.array([[1, -1], [3, -1], [2, 2]]),
num_gt_boxes_per_image:
np.array([1, 1, 2]),
detection_boxes:
np.array([[[100., 100., 200., 200.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[50., 50., 100., 100.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[25., 25., 50., 50.],
[10., 10., 15., 15.],
[10., 10., 15., 15.]]]),
detection_scores:
np.array([[.8, 0., 0.], [.7, 0., 0.], [.95, .9, 0.9]]),
detection_classes:
np.array([[1, -1, -1], [3, -1, -1], [2, 2, 2]]),
num_det_boxes_per_image:
np.array([1, 1, 3]),
})
# Check the number of bounding boxes added.
self.assertEqual(len(coco_evaluator._groundtruth_list), 4)
self.assertEqual(len(coco_evaluator._detection_boxes_list), 5)
metrics = {}
for key, (value_op, _) in eval_metric_ops.iteritems():
metrics[key] = value_op
......@@ -555,14 +540,14 @@ class CocoEvaluationPyFuncTest(tf.test.TestCase):
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP@.75IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (small)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@1'], 0.83333331)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@10'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (small)'], 1.0)
self.assertFalse(coco_evaluator._groundtruth_list)
self.assertFalse(coco_evaluator._detection_boxes_list)
......@@ -572,10 +557,7 @@ class CocoEvaluationPyFuncTest(tf.test.TestCase):
class CocoMaskEvaluationTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetections(self):
coco_evaluator = coco_evaluation.CocoMaskEvaluator(_get_categories_list())
coco_evaluator.add_single_ground_truth_image_info(
image_id='image1',
groundtruth_dict={
......@@ -657,10 +639,7 @@ class CocoMaskEvaluationTest(tf.test.TestCase):
class CocoMaskEvaluationPyFuncTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetections(self):
category_list = [{'id': 0, 'name': 'person'},
{'id': 1, 'name': 'cat'},
{'id': 2, 'name': 'dog'}]
coco_evaluator = coco_evaluation.CocoMaskEvaluator(category_list)
coco_evaluator = coco_evaluation.CocoMaskEvaluator(_get_categories_list())
image_id = tf.placeholder(tf.string, shape=())
groundtruth_boxes = tf.placeholder(tf.float32, shape=(None, 4))
groundtruth_classes = tf.placeholder(tf.float32, shape=(None))
@@ -756,5 +735,6 @@ class CocoMaskEvaluationPyFuncTest(tf.test.TestCase):
self.assertFalse(coco_evaluator._image_id_to_mask_shape_map)
self.assertFalse(coco_evaluator._detection_masks_list)
if __name__ == '__main__':
tf.test.main()
@@ -91,10 +91,8 @@ def read_data_and_evaluate(input_config, eval_config):
if input_config.WhichOneof('input_reader') == 'tf_record_input_reader':
input_paths = input_config.tf_record_input_reader.input_path
label_map = label_map_util.load_labelmap(input_config.label_map_path)
max_num_classes = max([item.id for item in label_map.item])
categories = label_map_util.convert_label_map_to_categories(
label_map, max_num_classes)
categories = label_map_util.create_categories_from_labelmap(
input_config.label_map_path)
object_detection_evaluators = evaluator.get_evaluators(
eval_config, categories)
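For reference, create_categories_from_labelmap folds the old two-step
pattern into a single call; a sketch of the equivalence (the
`label_map_path` variable is a placeholder):

from object_detection.utils import label_map_util

# Old pattern: load the label map, then convert it to a categories list.
label_map = label_map_util.load_labelmap(label_map_path)
max_num_classes = max(item.id for item in label_map.item)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes)

# New pattern: a single helper that wraps the two steps above.
categories = label_map_util.create_categories_from_labelmap(label_map_path)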
@@ -18,6 +18,7 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import functools
import os
@@ -43,9 +44,12 @@ MODEL_BUILD_UTIL_MAP = {
config_util.create_pipeline_proto_from_configs,
'merge_external_params_with_configs':
config_util.merge_external_params_with_configs,
'create_train_input_fn': inputs.create_train_input_fn,
'create_eval_input_fn': inputs.create_eval_input_fn,
'create_predict_input_fn': inputs.create_predict_input_fn,
'create_train_input_fn':
inputs.create_train_input_fn,
'create_eval_input_fn':
inputs.create_eval_input_fn,
'create_predict_input_fn':
inputs.create_predict_input_fn,
}
@@ -126,8 +130,9 @@ def unstack_batch(tensor_dict, unpad_groundtruth_tensors=True):
ValueError: If unpad_tensors is True and `tensor_dict` does not contain
`num_groundtruth_boxes` tensor.
"""
unbatched_tensor_dict = {key: tf.unstack(tensor)
for key, tensor in tensor_dict.items()}
unbatched_tensor_dict = {
key: tf.unstack(tensor) for key, tensor in tensor_dict.items()
}
if unpad_groundtruth_tensors:
if (fields.InputDataFields.num_groundtruth_boxes not in
unbatched_tensor_dict):
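The comprehension above turns each batched tensor into a per-image list;
when `unpad_groundtruth_tensors` is set, the padded rows are then sliced
off using `num_groundtruth_boxes`. A minimal sketch of that pattern (shapes
and names are illustrative):

import tensorflow as tf

boxes = tf.placeholder(tf.float32, shape=(2, 3, 4))  # [batch, max_boxes, 4]
num_boxes = tf.placeholder(tf.int32, shape=(2,))     # valid boxes per image
unstacked_boxes = tf.unstack(boxes)                  # two [3, 4] tensors
unstacked_nums = tf.unstack(num_boxes)               # two scalar tensors
# Drop the padded rows of each per-image tensor.
unpadded = [b[:n] for b, n in zip(unstacked_boxes, unstacked_nums)]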
@@ -206,8 +211,8 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
# Make sure to set the Keras learning phase. True during training,
# False for inference.
tf.keras.backend.set_learning_phase(is_training)
detection_model = detection_model_fn(is_training=is_training,
add_summaries=(not use_tpu))
detection_model = detection_model_fn(
is_training=is_training, add_summaries=(not use_tpu))
scaffold_fn = None
if mode == tf.estimator.ModeKeys.TRAIN:
@@ -237,6 +242,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
gt_weights_list = None
if fields.InputDataFields.groundtruth_weights in labels:
gt_weights_list = labels[fields.InputDataFields.groundtruth_weights]
gt_is_crowd_list = None
if fields.InputDataFields.groundtruth_is_crowd in labels:
gt_is_crowd_list = labels[fields.InputDataFields.groundtruth_is_crowd]
detection_model.provide_groundtruth(
@@ -248,8 +254,18 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
groundtruth_is_crowd_list=gt_is_crowd_list)
preprocessed_images = features[fields.InputDataFields.image]
prediction_dict = detection_model.predict(
preprocessed_images, features[fields.InputDataFields.true_image_shape])
if use_tpu and train_config.use_bfloat16:
with tf.contrib.tpu.bfloat16_scope():
prediction_dict = detection_model.predict(
preprocessed_images,
features[fields.InputDataFields.true_image_shape])
for k, v in prediction_dict.items():
if v.dtype == tf.bfloat16:
prediction_dict[k] = tf.cast(v, tf.float32)
else:
prediction_dict = detection_model.predict(
preprocessed_images,
features[fields.InputDataFields.true_image_shape])
if mode in (tf.estimator.ModeKeys.EVAL, tf.estimator.ModeKeys.PREDICT):
detections = detection_model.postprocess(
prediction_dict, features[fields.InputDataFields.true_image_shape])
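The bfloat16 branch above halves activation memory on TPU while keeping the
loss computation in float32; a condensed sketch of the same pattern (the
`model` object and its `predict` signature mirror the code above):

import tensorflow as tf

def predict_in_bfloat16(model, images, true_image_shapes):
  with tf.contrib.tpu.bfloat16_scope():
    prediction_dict = model.predict(images, true_image_shapes)
  # Cast bfloat16 outputs back so the float32 loss computation type-checks.
  return {k: tf.cast(v, tf.float32) if v.dtype == tf.bfloat16 else v
          for k, v in prediction_dict.items()}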
@@ -270,13 +286,16 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
train_config.load_all_detection_checkpoint_vars))
available_var_map = (
variables_helper.get_variables_available_in_checkpoint(
asg_map, train_config.fine_tune_checkpoint,
asg_map,
train_config.fine_tune_checkpoint,
include_global_step=False))
if use_tpu:
def tpu_scaffold():
tf.train.init_from_checkpoint(train_config.fine_tune_checkpoint,
available_var_map)
return tf.train.Scaffold()
scaffold_fn = tpu_scaffold
else:
tf.train.init_from_checkpoint(train_config.fine_tune_checkpoint,
@@ -290,8 +309,8 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
regularization_losses = tf.get_collection(
tf.GraphKeys.REGULARIZATION_LOSSES)
if regularization_losses:
regularization_loss = tf.add_n(regularization_losses,
name='regularization_loss')
regularization_loss = tf.add_n(
regularization_losses, name='regularization_loss')
losses.append(regularization_loss)
losses_dict['Loss/regularization_loss'] = regularization_loss
total_loss = tf.add_n(losses, name='total_loss')
@@ -353,14 +372,14 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
eval_metric_ops = None
scaffold = None
if mode == tf.estimator.ModeKeys.EVAL:
class_agnostic = (fields.DetectionResultFields.detection_classes
not in detections)
groundtruth = _prepare_groundtruth_for_eval(
detection_model, class_agnostic)
class_agnostic = (
fields.DetectionResultFields.detection_classes not in detections)
groundtruth = _prepare_groundtruth_for_eval(detection_model,
class_agnostic)
use_original_images = fields.InputDataFields.original_image in features
eval_images = (
features[fields.InputDataFields.original_image] if use_original_images
else features[fields.InputDataFields.image])
features[fields.InputDataFields.original_image]
if use_original_images else features[fields.InputDataFields.image])
eval_dict = eval_util.result_dict_for_single_example(
eval_images[0:1],
features[inputs.HASH_KEY][0],
@@ -374,28 +393,26 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
else:
category_index = label_map_util.create_category_index_from_labelmap(
eval_input_config.label_map_path)
img_summary = None
vis_metric_ops = None
if not use_tpu and use_original_images:
detection_and_groundtruth = (
vis_utils.draw_side_by_side_evaluation_image(
eval_dict,
category_index,
max_boxes_to_draw=eval_config.max_num_boxes_to_visualize,
min_score_thresh=eval_config.min_score_threshold,
use_normalized_coordinates=False))
img_summary = tf.summary.image('Detections_Left_Groundtruth_Right',
detection_and_groundtruth)
eval_metric_op_vis = vis_utils.VisualizeSingleFrameDetections(
category_index,
max_examples_to_draw=eval_config.num_visualizations,
max_boxes_to_draw=eval_config.max_num_boxes_to_visualize,
min_score_thresh=eval_config.min_score_threshold,
use_normalized_coordinates=False)
vis_metric_ops = eval_metric_op_vis.get_estimator_eval_metric_ops(
eval_dict)
# Eval metrics on a single example.
eval_metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
eval_config,
category_index.values(),
eval_dict)
eval_config, category_index.values(), eval_dict)
for loss_key, loss_tensor in iter(losses_dict.items()):
eval_metric_ops[loss_key] = tf.metrics.mean(loss_tensor)
for var in optimizer_summary_vars:
eval_metric_ops[var.op.name] = (var, tf.no_op())
if img_summary is not None:
eval_metric_ops['Detections_Left_Groundtruth_Right'] = (
img_summary, tf.no_op())
if vis_metric_ops is not None:
eval_metric_ops.update(vis_metric_ops)
eval_metric_ops = {str(k): v for k, v in eval_metric_ops.items()}
if eval_config.use_moving_averages:
@@ -435,12 +452,14 @@ def create_estimator_and_inputs(run_config,
hparams,
pipeline_config_path,
train_steps=None,
eval_steps=None,
sample_1_of_n_eval_examples=1,
sample_1_of_n_eval_on_train_examples=1,
model_fn_creator=create_model_fn,
use_tpu_estimator=False,
use_tpu=False,
num_shards=1,
params=None,
override_eval_num_epochs=True,
**kwargs):
"""Creates `Estimator`, input functions, and steps.
@@ -450,8 +469,11 @@ def create_estimator_and_inputs(run_config,
pipeline_config_path: A path to a pipeline config file.
train_steps: Number of training steps. If None, the number of training steps
is set from the `TrainConfig` proto.
eval_steps: Number of evaluation steps per evaluation cycle. If None, the
number of evaluation steps is set from the `EvalConfig` proto.
sample_1_of_n_eval_examples: Integer representing how often an eval example
should be sampled. If 1, will sample all examples.
sample_1_of_n_eval_on_train_examples: Similar to
`sample_1_of_n_eval_examples`, except controls the sampling of training
data for evaluation.
model_fn_creator: A function that creates a `model_fn` for `Estimator`.
Follows the signature:
@@ -470,19 +492,20 @@ def create_estimator_and_inputs(run_config,
is True.
params: Parameter dictionary passed from the estimator. Only used if
`use_tpu_estimator` is True.
override_eval_num_epochs: Whether to overwrite the number of epochs to
1 for eval_input.
**kwargs: Additional keyword arguments for configuration override.
Returns:
A dictionary with the following fields:
'estimator': An `Estimator` or `TPUEstimator`.
'train_input_fn': A training input function.
'eval_input_fn': An evaluation input function.
'eval_input_fns': A list of all evaluation input functions.
'eval_input_names': A list of names for each evaluation input.
'eval_on_train_input_fn': An evaluation-on-train input function.
'predict_input_fn': A prediction input function.
'train_steps': Number of training steps. Either directly from input or from
configuration.
'eval_steps': Number of evaluation steps. Either directly from input or from
configuration.
"""
get_configs_from_pipeline_file = MODEL_BUILD_UTIL_MAP[
'get_configs_from_pipeline_file']
@@ -495,27 +518,37 @@ def create_estimator_and_inputs(run_config,
create_predict_input_fn = MODEL_BUILD_UTIL_MAP['create_predict_input_fn']
configs = get_configs_from_pipeline_file(pipeline_config_path)
kwargs.update({
'train_steps': train_steps,
'sample_1_of_n_eval_examples': sample_1_of_n_eval_examples,
'retain_original_images_in_eval': False if use_tpu else True,
})
if override_eval_num_epochs:
kwargs.update({'eval_num_epochs': 1})
tf.logging.warning(
'Forced number of epochs for all eval validations to be 1.')
configs = merge_external_params_with_configs(
configs,
hparams,
train_steps=train_steps,
eval_steps=eval_steps,
retain_original_images_in_eval=False if use_tpu else True,
**kwargs)
configs, hparams, kwargs_dict=kwargs)
model_config = configs['model']
train_config = configs['train_config']
train_input_config = configs['train_input_config']
eval_config = configs['eval_config']
eval_input_config = configs['eval_input_config']
eval_input_configs = configs['eval_input_configs']
eval_on_train_input_config = copy.deepcopy(train_input_config)
eval_on_train_input_config.sample_1_of_n_examples = (
sample_1_of_n_eval_on_train_examples)
if override_eval_num_epochs and eval_on_train_input_config.num_epochs != 1:
tf.logging.warning('Expected number of evaluation epochs is 1, but '
'instead encountered `eval_on_train_input_config'
'.num_epochs` = '
'{}. Overwriting `num_epochs` to 1.'.format(
eval_on_train_input_config.num_epochs))
eval_on_train_input_config.num_epochs = 1
# update train_steps from config but only when non-zero value is provided
if train_steps is None and train_config.num_steps != 0:
train_steps = train_config.num_steps
# update eval_steps from config but only when non-zero value is provided
if eval_steps is None and eval_config.num_examples != 0:
eval_steps = eval_config.num_examples
detection_model_fn = functools.partial(
model_builder.build, model_config=model_config)
@@ -524,18 +557,25 @@ def create_estimator_and_inputs(run_config,
train_config=train_config,
train_input_config=train_input_config,
model_config=model_config)
eval_input_fn = create_eval_input_fn(
eval_config=eval_config,
eval_input_config=eval_input_config,
model_config=model_config)
eval_input_fns = [
create_eval_input_fn(
eval_config=eval_config,
eval_input_config=eval_input_config,
model_config=model_config) for eval_input_config in eval_input_configs
]
eval_input_names = [
eval_input_config.name for eval_input_config in eval_input_configs
]
eval_on_train_input_fn = create_eval_input_fn(
eval_config=eval_config,
eval_input_config=train_input_config,
eval_input_config=eval_on_train_input_config,
model_config=model_config)
predict_input_fn = create_predict_input_fn(
model_config=model_config, predict_input_config=eval_input_config)
model_config=model_config, predict_input_config=eval_input_configs[0])
tf.logging.info('create_estimator_and_inputs: use_tpu %s', use_tpu)
export_to_tpu = hparams.get('export_to_tpu', False)
tf.logging.info('create_estimator_and_inputs: use_tpu %s, export_to_tpu %s',
use_tpu, export_to_tpu)
model_fn = model_fn_creator(detection_model_fn, configs, hparams, use_tpu)
if use_tpu_estimator:
estimator = tf.contrib.tpu.TPUEstimator(
@@ -546,95 +586,95 @@ def create_estimator_and_inputs(run_config,
use_tpu=use_tpu,
config=run_config,
# TODO(lzc): Remove conditional after CMLE moves to TF 1.9
# BEGIN GOOGLE-INTERNAL
export_to_tpu=export_to_tpu,
eval_on_tpu=False, # Eval runs on CPU, so disable eval on TPU
# END GOOGLE-INTERNAL
params=params if params else {})
else:
estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
# Write the as-run pipeline config to disk.
if run_config.is_chief:
pipeline_config_final = create_pipeline_proto_from_configs(
configs)
pipeline_config_final = create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_config_final, estimator.model_dir)
return dict(
estimator=estimator,
train_input_fn=train_input_fn,
eval_input_fn=eval_input_fn,
eval_input_fns=eval_input_fns,
eval_input_names=eval_input_names,
eval_on_train_input_fn=eval_on_train_input_fn,
predict_input_fn=predict_input_fn,
train_steps=train_steps,
eval_steps=eval_steps)
train_steps=train_steps)
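Putting the refactored pieces together, a caller would consume the returned
dictionary roughly as follows (the RunConfig, `hparams`, and config path are
placeholders):

import tensorflow as tf

train_and_eval_dict = create_estimator_and_inputs(
    run_config=tf.estimator.RunConfig(model_dir='/tmp/model_dir'),
    hparams=hparams,
    pipeline_config_path='pipeline.config')
train_spec, eval_specs = create_train_and_eval_specs(
    train_and_eval_dict['train_input_fn'],
    train_and_eval_dict['eval_input_fns'],
    train_and_eval_dict['eval_on_train_input_fn'],
    train_and_eval_dict['predict_input_fn'],
    train_and_eval_dict['train_steps'],
    eval_on_train_data=False)
# Only the first EvalSpec is consumed by train_and_evaluate today.
tf.estimator.train_and_evaluate(
    train_and_eval_dict['estimator'], train_spec, eval_specs[0])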
def create_train_and_eval_specs(train_input_fn,
eval_input_fn,
eval_input_fns,
eval_on_train_input_fn,
predict_input_fn,
train_steps,
eval_steps,
eval_on_train_data=False,
eval_on_train_steps=None,
final_exporter_name='Servo',
eval_spec_name='eval'):
eval_spec_names=None):
"""Creates a `TrainSpec` and `EvalSpec`s.
Args:
train_input_fn: Function that produces features and labels on train data.
eval_input_fn: Function that produces features and labels on eval data.
eval_input_fns: A list of functions that produce features and labels on eval
data.
eval_on_train_input_fn: Function that produces features and labels for
evaluation on train data.
predict_input_fn: Function that produces features for inference.
train_steps: Number of training steps.
eval_steps: Number of eval steps.
eval_on_train_data: Whether to evaluate model on training data. Default is
False.
eval_on_train_steps: Number of eval steps for training data. If not given,
uses eval_steps.
final_exporter_name: String name given to `FinalExporter`.
eval_spec_name: String name given to main `EvalSpec`.
eval_spec_names: A list of string names for each `EvalSpec`.
Returns:
Tuple of `TrainSpec` and list of `EvalSpecs`. The first `EvalSpec` is for
evaluation data. If `eval_on_train_data` is True, the second `EvalSpec` in
the list will correspond to training data.
Tuple of `TrainSpec` and list of `EvalSpec`s. If `eval_on_train_data` is
True, the last `EvalSpec` in the list will correspond to training data. The
remaining `EvalSpec`s correspond to the evaluation inputs.
"""
exporter = tf.estimator.FinalExporter(
name=final_exporter_name, serving_input_receiver_fn=predict_input_fn)
train_spec = tf.estimator.TrainSpec(
input_fn=train_input_fn, max_steps=train_steps)
eval_specs = [
tf.estimator.EvalSpec(
name=eval_spec_name,
input_fn=eval_input_fn,
steps=eval_steps,
exporters=exporter)
]
if eval_spec_names is None:
eval_spec_names = range(len(eval_input_fns))
eval_specs = []
for eval_spec_name, eval_input_fn in zip(eval_spec_names, eval_input_fns):
exporter_name = '{}_{}'.format(final_exporter_name, eval_spec_name)
exporter = tf.estimator.FinalExporter(
name=exporter_name, serving_input_receiver_fn=predict_input_fn)
eval_specs.append(
tf.estimator.EvalSpec(
name=eval_spec_name,
input_fn=eval_input_fn,
steps=None,
exporters=exporter))
if eval_on_train_data:
eval_specs.append(
tf.estimator.EvalSpec(
name='eval_on_train', input_fn=eval_on_train_input_fn,
steps=eval_on_train_steps or eval_steps))
name='eval_on_train', input_fn=eval_on_train_input_fn, steps=None))
return train_spec, eval_specs
def continuous_eval(estimator, model_dir, input_fn, eval_steps, train_steps,
name):
def continuous_eval(estimator, model_dir, input_fn, train_steps, name):
"""Perform continuous evaluation on checkpoints written to a model directory.
Args:
estimator: Estimator object to use for evaluation.
model_dir: Model directory to read checkpoints for continuous evaluation.
input_fn: Input function to use for evaluation.
eval_steps: Number of steps to run during each evaluation.
train_steps: Number of training steps. This is used to infer the last
checkpoint and stop evaluation loop.
name: Namescope for eval summary.
"""
def terminate_eval():
tf.logging.info('Terminating eval after 180 seconds of no checkpoints')
return True
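The loop below presumably builds on tf.contrib.training.checkpoints_iterator,
which blocks until a new checkpoint appears and invokes `timeout_fn` once the
timeout elapses; a sketch of that shape using the names in this function
(the exact arguments are assumptions):

for ckpt in tf.contrib.training.checkpoints_iterator(
    model_dir, timeout=180, timeout_fn=terminate_eval):
  eval_results = estimator.evaluate(
      input_fn=input_fn, steps=None, checkpoint_path=ckpt, name=name)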
@@ -646,10 +686,7 @@ def continuous_eval(estimator, model_dir, input_fn, eval_steps, train_steps,
tf.logging.info('Starting Evaluation.')
try:
eval_results = estimator.evaluate(
input_fn=input_fn,
steps=eval_steps,
checkpoint_path=ckpt,
name=name)
input_fn=input_fn, steps=None, checkpoint_path=ckpt, name=name)
tf.logging.info('Eval results: %s' % eval_results)
# Terminate eval job when final checkpoint is reached
@@ -713,10 +750,9 @@ def populate_experiment(run_config,
**kwargs)
estimator = train_and_eval_dict['estimator']
train_input_fn = train_and_eval_dict['train_input_fn']
eval_input_fn = train_and_eval_dict['eval_input_fn']
eval_input_fns = train_and_eval_dict['eval_input_fns']
predict_input_fn = train_and_eval_dict['predict_input_fn']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
export_strategies = [
tf.contrib.learn.utils.saved_model_export_utils.make_export_strategy(
@@ -726,8 +762,9 @@ def populate_experiment(run_config,
return tf.contrib.learn.Experiment(
estimator=estimator,
train_input_fn=train_input_fn,
eval_input_fn=eval_input_fn,
eval_input_fn=eval_input_fns[0],
train_steps=train_steps,
eval_steps=eval_steps,
eval_steps=None,
export_strategies=export_strategies,
eval_delay_secs=120,)
eval_delay_secs=120,
)
@@ -64,11 +64,13 @@ def _get_configs_for_model(model_name):
data_path = _get_data_path()
label_map_path = _get_labelmap_path()
configs = config_util.get_configs_from_pipeline_file(filename)
override_dict = {
'train_input_path': data_path,
'eval_input_path': data_path,
'label_map_path': label_map_path
}
configs = config_util.merge_external_params_with_configs(
configs,
train_input_path=data_path,
eval_input_path=data_path,
label_map_path=label_map_path)
configs, kwargs_dict=override_dict)
return configs
@@ -145,6 +147,9 @@ class ModelLibTest(tf.test.TestCase):
self.assertEqual(batch_size, detection_scores.shape.as_list()[0])
self.assertEqual(tf.float32, detection_scores.dtype)
self.assertEqual(tf.float32, num_detections.dtype)
if mode == 'eval':
self.assertIn('Detections_Left_Groundtruth_Right/0',
estimator_spec.eval_metric_ops)
if model_mode == tf.estimator.ModeKeys.TRAIN:
self.assertIsNotNone(estimator_spec.train_op)
return estimator_spec
@@ -225,21 +230,17 @@ class ModelLibTest(tf.test.TestCase):
hparams_overrides='load_pretrained=false')
pipeline_config_path = get_pipeline_config_path(MODEL_NAME_FOR_TEST)
train_steps = 20
eval_steps = 10
train_and_eval_dict = model_lib.create_estimator_and_inputs(
run_config,
hparams,
pipeline_config_path,
train_steps=train_steps,
eval_steps=eval_steps)
train_steps=train_steps)
estimator = train_and_eval_dict['estimator']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
self.assertIsInstance(estimator, tf.estimator.Estimator)
self.assertEqual(20, train_steps)
self.assertEqual(10, eval_steps)
self.assertIn('train_input_fn', train_and_eval_dict)
self.assertIn('eval_input_fn', train_and_eval_dict)
self.assertIn('eval_input_fns', train_and_eval_dict)
self.assertIn('eval_on_train_input_fn', train_and_eval_dict)
def test_create_estimator_with_default_train_eval_steps(self):
@@ -250,16 +251,13 @@ class ModelLibTest(tf.test.TestCase):
pipeline_config_path = get_pipeline_config_path(MODEL_NAME_FOR_TEST)
configs = config_util.get_configs_from_pipeline_file(pipeline_config_path)
config_train_steps = configs['train_config'].num_steps
config_eval_steps = configs['eval_config'].num_examples
train_and_eval_dict = model_lib.create_estimator_and_inputs(
run_config, hparams, pipeline_config_path)
estimator = train_and_eval_dict['estimator']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
self.assertIsInstance(estimator, tf.estimator.Estimator)
self.assertEqual(config_train_steps, train_steps)
self.assertEqual(config_eval_steps, eval_steps)
def test_create_tpu_estimator_and_inputs(self):
"""Tests that number of train/eval defaults to config values."""
@@ -269,21 +267,17 @@ class ModelLibTest(tf.test.TestCase):
hparams_overrides='load_pretrained=false')
pipeline_config_path = get_pipeline_config_path(MODEL_NAME_FOR_TEST)
train_steps = 20
eval_steps = 10
train_and_eval_dict = model_lib.create_estimator_and_inputs(
run_config,
hparams,
pipeline_config_path,
train_steps=train_steps,
eval_steps=eval_steps,
use_tpu_estimator=True)
estimator = train_and_eval_dict['estimator']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
self.assertIsInstance(estimator, tpu_estimator.TPUEstimator)
self.assertEqual(20, train_steps)
self.assertEqual(10, eval_steps)
def test_create_train_and_eval_specs(self):
"""Tests that `TrainSpec` and `EvalSpec` is created correctly."""
@@ -292,38 +286,32 @@ class ModelLibTest(tf.test.TestCase):
hparams_overrides='load_pretrained=false')
pipeline_config_path = get_pipeline_config_path(MODEL_NAME_FOR_TEST)
train_steps = 20
eval_steps = 10
eval_on_train_steps = 15
train_and_eval_dict = model_lib.create_estimator_and_inputs(
run_config,
hparams,
pipeline_config_path,
train_steps=train_steps,
eval_steps=eval_steps)
train_steps=train_steps)
train_input_fn = train_and_eval_dict['train_input_fn']
eval_input_fn = train_and_eval_dict['eval_input_fn']
eval_input_fns = train_and_eval_dict['eval_input_fns']
eval_on_train_input_fn = train_and_eval_dict['eval_on_train_input_fn']
predict_input_fn = train_and_eval_dict['predict_input_fn']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
train_spec, eval_specs = model_lib.create_train_and_eval_specs(
train_input_fn,
eval_input_fn,
eval_input_fns,
eval_on_train_input_fn,
predict_input_fn,
train_steps,
eval_steps,
eval_on_train_data=True,
eval_on_train_steps=eval_on_train_steps,
final_exporter_name='exporter',
eval_spec_name='holdout')
eval_spec_names=['holdout'])
self.assertEqual(train_steps, train_spec.max_steps)
self.assertEqual(2, len(eval_specs))
self.assertEqual(eval_steps, eval_specs[0].steps)
self.assertEqual(None, eval_specs[0].steps)
self.assertEqual('holdout', eval_specs[0].name)
self.assertEqual('exporter', eval_specs[0].exporters[0].name)
self.assertEqual(eval_on_train_steps, eval_specs[1].steps)
self.assertEqual('exporter_holdout', eval_specs[0].exporters[0].name)
self.assertEqual(None, eval_specs[1].steps)
self.assertEqual('eval_on_train', eval_specs[1].name)
def test_experiment(self):
@@ -339,7 +327,7 @@ class ModelLibTest(tf.test.TestCase):
train_steps=10,
eval_steps=20)
self.assertEqual(10, experiment.train_steps)
self.assertEqual(20, experiment.eval_steps)
self.assertEqual(None, experiment.eval_steps)
class UnbatchTensorsTest(tf.test.TestCase):
@@ -31,7 +31,16 @@ flags.DEFINE_string(
flags.DEFINE_string('pipeline_config_path', None, 'Path to pipeline config '
'file.')
flags.DEFINE_integer('num_train_steps', None, 'Number of train steps.')
flags.DEFINE_integer('num_eval_steps', None, 'Number of train steps.')
flags.DEFINE_boolean('eval_training_data', False,
'If training data should be evaluated for this job. Note '
'that one can only use this in eval-only mode, and '
'`checkpoint_dir` must be supplied.')
flags.DEFINE_integer('sample_1_of_n_eval_examples', 1, 'Will sample one of '
'every n eval input examples, where n is provided.')
flags.DEFINE_integer('sample_1_of_n_eval_on_train_examples', 5, 'Will sample '
'one of every n train input examples for evaluation, '
'where n is provided. This is only used if '
'`eval_training_data` is True.')
flags.DEFINE_string(
'hparams_overrides', None, 'Hyperparameter overrides, '
'represented as a string containing comma-separated '
@@ -44,8 +53,6 @@ flags.DEFINE_boolean(
'run_once', False, 'If running in eval-only mode, whether to run just '
'one round of eval vs running continuously (default).'
)
flags.DEFINE_boolean('eval_training_data', False,
'If training data should be evaluated for this job.')
FLAGS = flags.FLAGS
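Conceptually, sampling one of every n examples is the same as keeping shard 0
of n shards; a toy tf.data illustration (the real input pipeline may
implement this differently):

import tensorflow as tf

dataset = tf.data.Dataset.range(10)
sampled = dataset.shard(num_shards=5, index=0)  # yields 0 and 5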
@@ -59,14 +66,15 @@ def main(unused_argv):
hparams=model_hparams.create_hparams(FLAGS.hparams_overrides),
pipeline_config_path=FLAGS.pipeline_config_path,
train_steps=FLAGS.num_train_steps,
eval_steps=FLAGS.num_eval_steps)
sample_1_of_n_eval_examples=FLAGS.sample_1_of_n_eval_examples,
sample_1_of_n_eval_on_train_examples=(
FLAGS.sample_1_of_n_eval_on_train_examples))
estimator = train_and_eval_dict['estimator']
train_input_fn = train_and_eval_dict['train_input_fn']
eval_input_fn = train_and_eval_dict['eval_input_fn']
eval_input_fns = train_and_eval_dict['eval_input_fns']
eval_on_train_input_fn = train_and_eval_dict['eval_on_train_input_fn']
predict_input_fn = train_and_eval_dict['predict_input_fn']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
if FLAGS.checkpoint_dir:
if FLAGS.eval_training_data:
@@ -74,23 +82,23 @@ def main(unused_argv):
input_fn = eval_on_train_input_fn
else:
name = 'validation_data'
input_fn = eval_input_fn
# The first eval input will be evaluated.
input_fn = eval_input_fns[0]
if FLAGS.run_once:
estimator.evaluate(input_fn,
eval_steps,
steps=None,
checkpoint_path=tf.train.latest_checkpoint(
FLAGS.checkpoint_dir))
else:
model_lib.continuous_eval(estimator, FLAGS.model_dir, input_fn,
eval_steps, train_steps, name)
model_lib.continuous_eval(estimator, FLAGS.checkpoint_dir, input_fn,
train_steps, name)
else:
train_spec, eval_specs = model_lib.create_train_and_eval_specs(
train_input_fn,
eval_input_fn,
eval_input_fns,
eval_on_train_input_fn,
predict_input_fn,
train_steps,
eval_steps,
eval_on_train_data=False)
# Currently only a single Eval Spec is allowed.
@@ -62,15 +62,20 @@ flags.DEFINE_integer('train_batch_size', None, 'Batch size for training. If '
flags.DEFINE_string(
'hparams_overrides', None, 'Comma-separated list of '
'hyperparameters to override defaults.')
flags.DEFINE_integer('num_train_steps', None, 'Number of train steps.')
flags.DEFINE_boolean('eval_training_data', False,
'If training data should be evaluated for this job.')
flags.DEFINE_integer('sample_1_of_n_eval_examples', 1, 'Will sample one of '
'every n eval input examples, where n is provided.')
flags.DEFINE_integer('sample_1_of_n_eval_on_train_examples', 5, 'Will sample '
'one of every n train input examples for evaluation, '
'where n is provided. This is only used if '
'`eval_training_data` is True.')
flags.DEFINE_string(
'model_dir', None, 'Path to output model directory '
'where event and checkpoint files will be written.')
flags.DEFINE_string('pipeline_config_path', None, 'Path to pipeline config '
'file.')
flags.DEFINE_integer('num_train_steps', None, 'Number of train steps.')
flags.DEFINE_integer('num_eval_steps', None, 'Number of train steps.')
FLAGS = tf.flags.FLAGS
@@ -103,17 +108,18 @@ def main(unused_argv):
hparams=model_hparams.create_hparams(FLAGS.hparams_overrides),
pipeline_config_path=FLAGS.pipeline_config_path,
train_steps=FLAGS.num_train_steps,
eval_steps=FLAGS.num_eval_steps,
sample_1_of_n_eval_examples=FLAGS.sample_1_of_n_eval_examples,
sample_1_of_n_eval_on_train_examples=(
FLAGS.sample_1_of_n_eval_on_train_examples),
use_tpu_estimator=True,
use_tpu=FLAGS.use_tpu,
num_shards=FLAGS.num_shards,
**kwargs)
estimator = train_and_eval_dict['estimator']
train_input_fn = train_and_eval_dict['train_input_fn']
eval_input_fn = train_and_eval_dict['eval_input_fn']
eval_input_fns = train_and_eval_dict['eval_input_fns']
eval_on_train_input_fn = train_and_eval_dict['eval_on_train_input_fn']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
if FLAGS.mode == 'train':
estimator.train(input_fn=train_input_fn, max_steps=train_steps)
@@ -125,9 +131,10 @@ def main(unused_argv):
input_fn = eval_on_train_input_fn
else:
name = 'validation_data'
input_fn = eval_input_fn
model_lib.continuous_eval(estimator, FLAGS.model_dir, input_fn, eval_steps,
train_steps, name)
# Currently only a single eval input is allowed.
input_fn = eval_input_fns[0]
model_lib.continuous_eval(estimator, FLAGS.model_dir, input_fn, train_steps,
name)
if __name__ == '__main__':
@@ -24,6 +24,7 @@ Feature map generators build on the base feature extractors and produce a list
of final feature maps.
"""
import collections
import functools
import tensorflow as tf
from object_detection.utils import ops
slim = tf.contrib.slim
@@ -45,6 +46,220 @@ def get_depth_fn(depth_multiplier, min_depth):
return multiply_depth
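get_depth_fn scales a requested depth by depth_multiplier and clamps it from
below at min_depth; a worked sketch consistent with the GetDepthFunctionTest
expectation further down (the exact rounding is an assumption):

def depth_fn(depth, depth_multiplier=0.5, min_depth=16):
  # Scale, then clamp from below at min_depth.
  return max(int(depth * depth_multiplier), min_depth)

depth_fn(64)   # -> 32
depth_fn(128)  # -> 64
depth_fn(16)   # -> 16 (8 would fall below min_depth)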
class KerasMultiResolutionFeatureMaps(tf.keras.Model):
"""Generates multi resolution feature maps from input image features.
A Keras model that generates multi-scale feature maps for detection as in the
SSD papers by Liu et al: https://arxiv.org/pdf/1512.02325v2.pdf, see Sec 2.1.
More specifically, when called on inputs it performs the following two tasks:
1) If a layer name is provided in the configuration, returns that layer as a
feature map.
2) If a layer name is left as an empty string, constructs a new feature map
based on the spatial shape and depth configuration. Note that the current
implementation only supports generating new layers using convolution of
stride 2 resulting in a spatial resolution reduction by a factor of 2.
By default the convolution kernel size is set to 3, and it can be customized
by the caller.
An example of the configuration for Inception V3:
{
'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
'layer_depth': [-1, -1, -1, 512, 256, 128]
}
When this feature generator object is called on input image_features:
Args:
image_features: A dictionary of handles to activation tensors from the
base feature extractor.
Returns:
feature_maps: an OrderedDict mapping keys (feature map names) to
tensors where each tensor has shape [batch, height_i, width_i, depth_i].
"""
def __init__(self,
feature_map_layout,
depth_multiplier,
min_depth,
insert_1x1_conv,
is_training,
conv_hyperparams,
freeze_batchnorm,
name=None):
"""Constructor.
Args:
feature_map_layout: Dictionary of specifications for the feature map
layouts in the following format (Inception V2/V3 respectively):
{
'from_layer': ['Mixed_3c', 'Mixed_4c', 'Mixed_5c', '', '', ''],
'layer_depth': [-1, -1, -1, 512, 256, 128]
}
or
{
'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
'layer_depth': [-1, -1, -1, 512, 256, 128]
}
If 'from_layer' is specified, the specified feature map is directly used
as a box predictor layer, and the layer_depth is directly inferred from
the feature map (instead of using the provided 'layer_depth' parameter).
In this case, our convention is to set 'layer_depth' to -1 for clarity.
Otherwise, if 'from_layer' is an empty string, then the box predictor
layer will be built from the previous layer using convolution
operations. Note that the current implementation only supports
generating new layers using convolutions of stride 2 (resulting in a
spatial resolution reduction by a factor of 2), and will be extended to
a more flexible design. Convolution kernel size is set to 3 by default,
and can be customized by the 'conv_kernel_size' parameter (similarly,
'conv_kernel_size' should be set to -1 if 'from_layer' is specified).
The created convolution operation will be a normal 2D convolution by
default, and a depthwise convolution followed by 1x1 convolution if
'use_depthwise' is set to True.
depth_multiplier: Depth multiplier for convolutional layers.
min_depth: Minimum depth for convolutional layers.
insert_1x1_conv: A boolean indicating whether an additional 1x1
convolution should be inserted before shrinking the feature map.
is_training: Indicates whether the feature generator is in training mode.
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops.
freeze_batchnorm: Bool. Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
name: A string name scope to assign to the model. If 'None', Keras
will auto-generate one from the class name.
"""
super(KerasMultiResolutionFeatureMaps, self).__init__(name=name)
self.feature_map_layout = feature_map_layout
self.convolutions = []
depth_fn = get_depth_fn(depth_multiplier, min_depth)
base_from_layer = ''
use_explicit_padding = False
if 'use_explicit_padding' in feature_map_layout:
use_explicit_padding = feature_map_layout['use_explicit_padding']
use_depthwise = False
if 'use_depthwise' in feature_map_layout:
use_depthwise = feature_map_layout['use_depthwise']
for index, from_layer in enumerate(feature_map_layout['from_layer']):
net = tf.keras.Sequential(name='output_%d' % index)
self.convolutions.append(net)
layer_depth = feature_map_layout['layer_depth'][index]
conv_kernel_size = 3
if 'conv_kernel_size' in feature_map_layout:
conv_kernel_size = feature_map_layout['conv_kernel_size'][index]
if from_layer:
base_from_layer = from_layer
else:
if insert_1x1_conv:
layer_name = '{}_1_Conv2d_{}_1x1_{}'.format(
base_from_layer, index, depth_fn(layer_depth / 2))
net.add(tf.keras.layers.Conv2D(depth_fn(layer_depth / 2),
[1, 1],
padding='SAME',
strides=1,
name=layer_name + '_conv',
**conv_hyperparams.params()))
net.add(
conv_hyperparams.build_batch_norm(
training=(is_training and not freeze_batchnorm),
name=layer_name + '_batchnorm'))
net.add(
conv_hyperparams.build_activation_layer(
name=layer_name))
layer_name = '{}_2_Conv2d_{}_{}x{}_s2_{}'.format(
base_from_layer, index, conv_kernel_size, conv_kernel_size,
depth_fn(layer_depth))
stride = 2
padding = 'SAME'
if use_explicit_padding:
padding = 'VALID'
# We define this function here while capturing the value of
# conv_kernel_size, to avoid holding a reference to the loop variable
# conv_kernel_size inside of a lambda function
def fixed_padding(features, kernel_size=conv_kernel_size):
  return ops.fixed_padding(features, kernel_size)
net.add(tf.keras.layers.Lambda(fixed_padding))
# TODO(rathodv): Add some utilities to simplify the creation of
# Depthwise & non-depthwise convolutions w/ normalization & activations
if use_depthwise:
net.add(tf.keras.layers.DepthwiseConv2D(
[conv_kernel_size, conv_kernel_size],
depth_multiplier=1,
padding=padding,
strides=stride,
name=layer_name + '_depthwise_conv',
**conv_hyperparams.params()))
net.add(
conv_hyperparams.build_batch_norm(
training=(is_training and not freeze_batchnorm),
name=layer_name + '_depthwise_batchnorm'))
net.add(
conv_hyperparams.build_activation_layer(
name=layer_name + '_depthwise'))
net.add(tf.keras.layers.Conv2D(depth_fn(layer_depth), [1, 1],
padding='SAME',
strides=1,
name=layer_name + '_conv',
**conv_hyperparams.params()))
net.add(
conv_hyperparams.build_batch_norm(
training=(is_training and not freeze_batchnorm),
name=layer_name + '_batchnorm'))
net.add(
conv_hyperparams.build_activation_layer(
name=layer_name))
else:
net.add(tf.keras.layers.Conv2D(depth_fn(layer_depth),
[conv_kernel_size, conv_kernel_size],
padding=padding,
strides=stride,
name=layer_name + '_conv',
**conv_hyperparams.params()))
net.add(
conv_hyperparams.build_batch_norm(
training=(is_training and not freeze_batchnorm),
name=layer_name + '_batchnorm'))
net.add(
conv_hyperparams.build_activation_layer(
name=layer_name))
def call(self, image_features):
"""Generate the multi-resolution feature maps.
Executed when calling the `.__call__` method on input.
Args:
image_features: A dictionary of handles to activation tensors from the
base feature extractor.
Returns:
feature_maps: an OrderedDict mapping keys (feature map names) to
tensors where each tensor has shape [batch, height_i, width_i, depth_i].
"""
feature_maps = []
feature_map_keys = []
for index, from_layer in enumerate(self.feature_map_layout['from_layer']):
if from_layer:
feature_map = image_features[from_layer]
feature_map_keys.append(from_layer)
else:
feature_map = feature_maps[-1]
feature_map = self.convolutions[index](feature_map)
layer_name = self.convolutions[index].layers[-1].name
feature_map_keys.append(layer_name)
feature_maps.append(feature_map)
return collections.OrderedDict(
[(x, y) for (x, y) in zip(feature_map_keys, feature_maps)])
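A minimal usage sketch of the Keras generator (the `conv_hyperparams` object
is a placeholder built via hyperparams_builder.KerasLayerHyperparams, as in
the tests below):

import tensorflow as tf

feature_map_generator = KerasMultiResolutionFeatureMaps(
    feature_map_layout={
        'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
        'layer_depth': [-1, -1, -1, 512, 256, 128],
    },
    depth_multiplier=1,
    min_depth=32,
    insert_1x1_conv=True,
    is_training=True,
    conv_hyperparams=conv_hyperparams,
    freeze_batchnorm=False,
    name='FeatureMaps')
feature_maps = feature_map_generator({
    'Mixed_5d': tf.random_uniform([4, 35, 35, 256]),
    'Mixed_6e': tf.random_uniform([4, 17, 17, 576]),
    'Mixed_7c': tf.random_uniform([4, 8, 8, 1024]),
})  # OrderedDict: three passthrough maps plus new 512/256/128-depth maps.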
def multi_resolution_feature_maps(feature_map_layout, depth_multiplier,
min_depth, insert_1x1_conv, image_features):
"""Generates multi resolution feature maps from input image features.
@@ -77,7 +292,7 @@ def multi_resolution_feature_maps(feature_map_layout, depth_multiplier,
}
or
{
'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', '', ''],
'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
'layer_depth': [-1, -1, -1, 512, 256, 128]
}
If 'from_layer' is specified, the specified feature map is directly used
@@ -179,7 +394,10 @@ def multi_resolution_feature_maps(feature_map_layout, depth_multiplier,
[(x, y) for (x, y) in zip(feature_map_keys, feature_maps)])
def fpn_top_down_feature_maps(image_features, depth, scope=None):
def fpn_top_down_feature_maps(image_features,
depth,
use_depthwise=False,
scope=None):
"""Generates `top-down` feature maps for Feature Pyramid Networks.
See https://arxiv.org/abs/1612.03144 for details.
@@ -189,6 +407,7 @@ def fpn_top_down_feature_maps(image_features, depth, scope=None):
Spatial resolutions of successive tensors must reduce exactly by a factor
of 2.
depth: depth of output feature maps.
use_depthwise: use depthwise separable conv instead of regular conv.
scope: A scope name to wrap this op under.
Returns:
@@ -200,7 +419,7 @@ def fpn_top_down_feature_maps(image_features, depth, scope=None):
output_feature_maps_list = []
output_feature_map_keys = []
with slim.arg_scope(
[slim.conv2d], padding='SAME', stride=1):
[slim.conv2d, slim.separable_conv2d], padding='SAME', stride=1):
top_down = slim.conv2d(
image_features[-1][1],
depth, [1, 1], activation_fn=None, normalizer_fn=None,
@@ -216,7 +435,11 @@ def fpn_top_down_feature_maps(image_features, depth, scope=None):
activation_fn=None, normalizer_fn=None,
scope='projection_%d' % (level + 1))
top_down += residual
output_feature_maps_list.append(slim.conv2d(
if use_depthwise:
conv_op = functools.partial(slim.separable_conv2d, depth_multiplier=1)
else:
conv_op = slim.conv2d
output_feature_maps_list.append(conv_op(
top_down,
depth, [3, 3],
scope='smoothing_%d' % (level + 1)))
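Each top-down step therefore projects the lateral feature map to `depth`
with a 1x1 convolution, adds it to the coarser map, and smooths the sum with
a 3x3 convolution; a simplified sketch of one step (the upsampling call is
an assumption, since the hunk above does not show how the coarser map is
resized):

import tensorflow as tf
slim = tf.contrib.slim

def fpn_merge_step(top_down, lateral, depth, level):
  # 1x1 projection of the lateral feature to the common FPN depth.
  residual = slim.conv2d(lateral, depth, [1, 1], activation_fn=None,
                         normalizer_fn=None, scope='projection_%d' % level)
  # Upsample the coarser map to the lateral resolution, then add.
  top_down = tf.image.resize_nearest_neighbor(
      top_down, tf.shape(residual)[1:3]) + residual
  # 3x3 smoothing convolution on the merged map.
  return slim.conv2d(top_down, depth, [3, 3],
                     scope='smoothing_%d' % level)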
@@ -226,7 +449,7 @@ def fpn_top_down_feature_maps(image_features, depth, scope=None):
def pooling_pyramid_feature_maps(base_feature_map_depth, num_layers,
image_features):
image_features, replace_pool_with_conv=False):
"""Generates pooling pyramid feature maps.
The pooling pyramid feature maps is motivated by
@@ -250,6 +473,8 @@ def pooling_pyramid_feature_maps(base_feature_map_depth, num_layers,
from the base feature.
image_features: A dictionary of handles to activation tensors from the
feature extractor.
replace_pool_with_conv: Whether or not to replace pooling operations with
convolutions in the PPN. Default is False.
Returns:
feature_maps: an OrderedDict mapping keys (feature map names) to
@@ -279,12 +504,22 @@ def pooling_pyramid_feature_maps(base_feature_map_depth, num_layers,
feature_map_keys.append(feature_map_key)
feature_maps.append(image_features)
feature_map = image_features
with slim.arg_scope([slim.max_pool2d], padding='SAME', stride=2):
for i in range(num_layers - 1):
feature_map_key = 'MaxPool2d_%d_2x2' % i
feature_map = slim.max_pool2d(
feature_map, [2, 2], padding='SAME', scope=feature_map_key)
feature_map_keys.append(feature_map_key)
feature_maps.append(feature_map)
if replace_pool_with_conv:
with slim.arg_scope([slim.conv2d], padding='SAME', stride=2):
for i in range(num_layers - 1):
feature_map_key = 'Conv2d_{}_3x3_s2_{}'.format(i,
base_feature_map_depth)
feature_map = slim.conv2d(
feature_map, base_feature_map_depth, [3, 3], scope=feature_map_key)
feature_map_keys.append(feature_map_key)
feature_maps.append(feature_map)
else:
with slim.arg_scope([slim.max_pool2d], padding='SAME', stride=2):
for i in range(num_layers - 1):
feature_map_key = 'MaxPool2d_%d_2x2' % i
feature_map = slim.max_pool2d(
feature_map, [2, 2], padding='SAME', scope=feature_map_key)
feature_map_keys.append(feature_map_key)
feature_maps.append(feature_map)
return collections.OrderedDict(
[(x, y) for (x, y) in zip(feature_map_keys, feature_maps)])
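With 'SAME' padding and stride 2, each pyramid level (max pool or conv
alike) produces a spatial size of ceil(size / 2), which is where the
19 -> 10 -> 5 -> 3 -> 2 -> 1 progression in the tests further down comes
from:

import math

size = 19
for _ in range(5):
  size = int(math.ceil(size / 2.0))
  print(size)  # 10, 5, 3, 2, 1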
@@ -15,9 +15,15 @@
"""Tests for feature map generators."""
from absl.testing import parameterized
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.models import feature_map_generators
from object_detection.protos import hyperparams_pb2
INCEPTION_V2_LAYOUT = {
'from_layer': ['Mixed_3c', 'Mixed_4c', 'Mixed_5c', '', '', ''],
@@ -40,21 +46,60 @@ EMBEDDED_SSD_MOBILENET_V1_LAYOUT = {
}
# TODO(rathodv): add tests with different anchor strides.
@parameterized.parameters(
{'use_keras': False},
{'use_keras': True},
)
class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
def test_get_expected_feature_map_shapes_with_inception_v2(self):
def _build_conv_hyperparams(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.KerasLayerHyperparams(conv_hyperparams)
def _build_feature_map_generator(self, feature_map_layout, use_keras):
if use_keras:
return feature_map_generators.KerasMultiResolutionFeatureMaps(
feature_map_layout=feature_map_layout,
depth_multiplier=1,
min_depth=32,
insert_1x1_conv=True,
freeze_batchnorm=False,
is_training=True,
conv_hyperparams=self._build_conv_hyperparams(),
name='FeatureMaps'
)
else:
def feature_map_generator(image_features):
return feature_map_generators.multi_resolution_feature_maps(
feature_map_layout=feature_map_layout,
depth_multiplier=1,
min_depth=32,
insert_1x1_conv=True,
image_features=image_features)
return feature_map_generator
def test_get_expected_feature_map_shapes_with_inception_v2(self, use_keras):
image_features = {
'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32),
'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32),
'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32)
}
feature_maps = feature_map_generators.multi_resolution_feature_maps(
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=INCEPTION_V2_LAYOUT,
depth_multiplier=1,
min_depth=32,
insert_1x1_conv=True,
image_features=image_features)
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_feature_map_shapes = {
'Mixed_3c': (4, 28, 28, 256),
@@ -70,21 +115,54 @@ class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = dict(
(key, value.shape) for key, value in out_feature_maps.items())
self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes)
self.assertDictEqual(expected_feature_map_shapes, out_feature_map_shapes)
# TODO(kaftan): Remove conditional after CMLE moves to TF 1.10
# BEGIN GOOGLE-INTERNAL
def test_get_expected_feature_map_shapes_with_inception_v2_use_depthwise(
self, use_keras):
image_features = {
'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32),
'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32),
'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32)
}
layout_copy = INCEPTION_V2_LAYOUT.copy()
layout_copy['use_depthwise'] = True
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=layout_copy,
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_feature_map_shapes = {
'Mixed_3c': (4, 28, 28, 256),
'Mixed_4c': (4, 14, 14, 576),
'Mixed_5c': (4, 7, 7, 1024),
'Mixed_5c_2_Conv2d_3_3x3_s2_512': (4, 4, 4, 512),
'Mixed_5c_2_Conv2d_4_3x3_s2_256': (4, 2, 2, 256),
'Mixed_5c_2_Conv2d_5_3x3_s2_256': (4, 1, 1, 256)}
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = dict(
(key, value.shape) for key, value in out_feature_maps.items())
self.assertDictEqual(expected_feature_map_shapes, out_feature_map_shapes)
# END GOOGLE-INTERNAL
def test_get_expected_feature_map_shapes_with_inception_v3(self):
def test_get_expected_feature_map_shapes_with_inception_v3(self, use_keras):
image_features = {
'Mixed_5d': tf.random_uniform([4, 35, 35, 256], dtype=tf.float32),
'Mixed_6e': tf.random_uniform([4, 17, 17, 576], dtype=tf.float32),
'Mixed_7c': tf.random_uniform([4, 8, 8, 1024], dtype=tf.float32)
}
feature_maps = feature_map_generators.multi_resolution_feature_maps(
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=INCEPTION_V3_LAYOUT,
depth_multiplier=1,
min_depth=32,
insert_1x1_conv=True,
image_features=image_features)
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_feature_map_shapes = {
'Mixed_5d': (4, 35, 35, 256),
@@ -100,10 +178,10 @@ class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = dict(
(key, value.shape) for key, value in out_feature_maps.items())
self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes)
self.assertDictEqual(expected_feature_map_shapes, out_feature_map_shapes)
def test_get_expected_feature_map_shapes_with_embedded_ssd_mobilenet_v1(
self):
self, use_keras):
image_features = {
'Conv2d_11_pointwise': tf.random_uniform([4, 16, 16, 512],
dtype=tf.float32),
@@ -111,12 +189,11 @@ class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
dtype=tf.float32),
}
feature_maps = feature_map_generators.multi_resolution_feature_maps(
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=EMBEDDED_SSD_MOBILENET_V1_LAYOUT,
depth_multiplier=1,
min_depth=32,
insert_1x1_conv=True,
image_features=image_features)
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_feature_map_shapes = {
'Conv2d_11_pointwise': (4, 16, 16, 512),
@@ -131,7 +208,138 @@ class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = dict(
(key, value.shape) for key, value in out_feature_maps.items())
self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes)
self.assertDictEqual(expected_feature_map_shapes, out_feature_map_shapes)
def test_get_expected_variable_names_with_inception_v2(self, use_keras):
image_features = {
'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32),
'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32),
'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32)
}
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=INCEPTION_V2_LAYOUT,
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_slim_variables = set([
'Mixed_5c_1_Conv2d_3_1x1_256/weights',
'Mixed_5c_1_Conv2d_3_1x1_256/biases',
'Mixed_5c_2_Conv2d_3_3x3_s2_512/weights',
'Mixed_5c_2_Conv2d_3_3x3_s2_512/biases',
'Mixed_5c_1_Conv2d_4_1x1_128/weights',
'Mixed_5c_1_Conv2d_4_1x1_128/biases',
'Mixed_5c_2_Conv2d_4_3x3_s2_256/weights',
'Mixed_5c_2_Conv2d_4_3x3_s2_256/biases',
'Mixed_5c_1_Conv2d_5_1x1_128/weights',
'Mixed_5c_1_Conv2d_5_1x1_128/biases',
'Mixed_5c_2_Conv2d_5_3x3_s2_256/weights',
'Mixed_5c_2_Conv2d_5_3x3_s2_256/biases',
])
expected_keras_variables = set([
'FeatureMaps/output_3/Mixed_5c_1_Conv2d_3_1x1_256_conv/kernel',
'FeatureMaps/output_3/Mixed_5c_1_Conv2d_3_1x1_256_conv/bias',
'FeatureMaps/output_3/Mixed_5c_2_Conv2d_3_3x3_s2_512_conv/kernel',
'FeatureMaps/output_3/Mixed_5c_2_Conv2d_3_3x3_s2_512_conv/bias',
'FeatureMaps/output_4/Mixed_5c_1_Conv2d_4_1x1_128_conv/kernel',
'FeatureMaps/output_4/Mixed_5c_1_Conv2d_4_1x1_128_conv/bias',
'FeatureMaps/output_4/Mixed_5c_2_Conv2d_4_3x3_s2_256_conv/kernel',
'FeatureMaps/output_4/Mixed_5c_2_Conv2d_4_3x3_s2_256_conv/bias',
'FeatureMaps/output_5/Mixed_5c_1_Conv2d_5_1x1_128_conv/kernel',
'FeatureMaps/output_5/Mixed_5c_1_Conv2d_5_1x1_128_conv/bias',
'FeatureMaps/output_5/Mixed_5c_2_Conv2d_5_3x3_s2_256_conv/kernel',
'FeatureMaps/output_5/Mixed_5c_2_Conv2d_5_3x3_s2_256_conv/bias',
])
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
sess.run(feature_maps)
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
if use_keras:
self.assertSetEqual(expected_keras_variables, actual_variable_set)
else:
self.assertSetEqual(expected_slim_variables, actual_variable_set)
# TODO(kaftan): Remove conditional after CMLE moves to TF 1.10
# BEGIN GOOGLE-INTERNAL
def test_get_expected_variable_names_with_inception_v2_use_depthwise(
self,
use_keras):
image_features = {
'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32),
'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32),
'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32)
}
layout_copy = INCEPTION_V2_LAYOUT.copy()
layout_copy['use_depthwise'] = True
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=layout_copy,
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_slim_variables = set([
'Mixed_5c_1_Conv2d_3_1x1_256/weights',
'Mixed_5c_1_Conv2d_3_1x1_256/biases',
'Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise/depthwise_weights',
'Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise/biases',
'Mixed_5c_2_Conv2d_3_3x3_s2_512/weights',
'Mixed_5c_2_Conv2d_3_3x3_s2_512/biases',
'Mixed_5c_1_Conv2d_4_1x1_128/weights',
'Mixed_5c_1_Conv2d_4_1x1_128/biases',
'Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise/depthwise_weights',
'Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise/biases',
'Mixed_5c_2_Conv2d_4_3x3_s2_256/weights',
'Mixed_5c_2_Conv2d_4_3x3_s2_256/biases',
'Mixed_5c_1_Conv2d_5_1x1_128/weights',
'Mixed_5c_1_Conv2d_5_1x1_128/biases',
'Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise/depthwise_weights',
'Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise/biases',
'Mixed_5c_2_Conv2d_5_3x3_s2_256/weights',
'Mixed_5c_2_Conv2d_5_3x3_s2_256/biases',
])
expected_keras_variables = set([
'FeatureMaps/output_3/Mixed_5c_1_Conv2d_3_1x1_256_conv/kernel',
'FeatureMaps/output_3/Mixed_5c_1_Conv2d_3_1x1_256_conv/bias',
('FeatureMaps/output_3/Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise_conv/'
'depthwise_kernel'),
('FeatureMaps/output_3/Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise_conv/'
'bias'),
'FeatureMaps/output_3/Mixed_5c_2_Conv2d_3_3x3_s2_512_conv/kernel',
'FeatureMaps/output_3/Mixed_5c_2_Conv2d_3_3x3_s2_512_conv/bias',
'FeatureMaps/output_4/Mixed_5c_1_Conv2d_4_1x1_128_conv/kernel',
'FeatureMaps/output_4/Mixed_5c_1_Conv2d_4_1x1_128_conv/bias',
('FeatureMaps/output_4/Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise_conv/'
'depthwise_kernel'),
('FeatureMaps/output_4/Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise_conv/'
'bias'),
'FeatureMaps/output_4/Mixed_5c_2_Conv2d_4_3x3_s2_256_conv/kernel',
'FeatureMaps/output_4/Mixed_5c_2_Conv2d_4_3x3_s2_256_conv/bias',
'FeatureMaps/output_5/Mixed_5c_1_Conv2d_5_1x1_128_conv/kernel',
'FeatureMaps/output_5/Mixed_5c_1_Conv2d_5_1x1_128_conv/bias',
('FeatureMaps/output_5/Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise_conv/'
'depthwise_kernel'),
('FeatureMaps/output_5/Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise_conv/'
'bias'),
'FeatureMaps/output_5/Mixed_5c_2_Conv2d_5_3x3_s2_256_conv/kernel',
'FeatureMaps/output_5/Mixed_5c_2_Conv2d_5_3x3_s2_256_conv/bias',
])
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
sess.run(feature_maps)
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
if use_keras:
self.assertSetEqual(expected_keras_variables, actual_variable_set)
else:
self.assertSetEqual(expected_slim_variables, actual_variable_set)
# END GOOGLE-INTERNAL
class FPNFeatureMapGeneratorTest(tf.test.TestCase):
@@ -161,6 +369,31 @@ class FPNFeatureMapGeneratorTest(tf.test.TestCase):
for key, value in out_feature_maps.items()}
self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes)
def test_get_expected_feature_map_shapes_with_depthwise(self):
image_features = [
('block2', tf.random_uniform([4, 8, 8, 256], dtype=tf.float32)),
('block3', tf.random_uniform([4, 4, 4, 256], dtype=tf.float32)),
('block4', tf.random_uniform([4, 2, 2, 256], dtype=tf.float32)),
('block5', tf.random_uniform([4, 1, 1, 256], dtype=tf.float32))
]
feature_maps = feature_map_generators.fpn_top_down_feature_maps(
image_features=image_features, depth=128, use_depthwise=True)
expected_feature_map_shapes = {
'top_down_block2': (4, 8, 8, 128),
'top_down_block3': (4, 4, 4, 128),
'top_down_block4': (4, 2, 2, 128),
'top_down_block5': (4, 1, 1, 128)
}
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = {key: value.shape
for key, value in out_feature_maps.items()}
self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes)
class GetDepthFunctionTest(tf.test.TestCase):
@@ -175,5 +408,94 @@ class GetDepthFunctionTest(tf.test.TestCase):
self.assertEqual(depth_fn(64), 32)
@parameterized.parameters(
{'replace_pool_with_conv': False},
{'replace_pool_with_conv': True},
)
class PoolingPyramidFeatureMapGeneratorTest(tf.test.TestCase):
def test_get_expected_feature_map_shapes(self, replace_pool_with_conv):
image_features = {
'image_features': tf.random_uniform([4, 19, 19, 1024])
}
feature_maps = feature_map_generators.pooling_pyramid_feature_maps(
base_feature_map_depth=1024,
num_layers=6,
image_features=image_features,
replace_pool_with_conv=replace_pool_with_conv)
expected_pool_feature_map_shapes = {
'Base_Conv2d_1x1_1024': (4, 19, 19, 1024),
'MaxPool2d_0_2x2': (4, 10, 10, 1024),
'MaxPool2d_1_2x2': (4, 5, 5, 1024),
'MaxPool2d_2_2x2': (4, 3, 3, 1024),
'MaxPool2d_3_2x2': (4, 2, 2, 1024),
'MaxPool2d_4_2x2': (4, 1, 1, 1024),
}
expected_conv_feature_map_shapes = {
'Base_Conv2d_1x1_1024': (4, 19, 19, 1024),
'Conv2d_0_3x3_s2_1024': (4, 10, 10, 1024),
'Conv2d_1_3x3_s2_1024': (4, 5, 5, 1024),
'Conv2d_2_3x3_s2_1024': (4, 3, 3, 1024),
'Conv2d_3_3x3_s2_1024': (4, 2, 2, 1024),
'Conv2d_4_3x3_s2_1024': (4, 1, 1, 1024),
}
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = {key: value.shape
for key, value in out_feature_maps.items()}
if replace_pool_with_conv:
self.assertDictEqual(expected_conv_feature_map_shapes,
out_feature_map_shapes)
else:
self.assertDictEqual(expected_pool_feature_map_shapes,
out_feature_map_shapes)
def test_get_expected_variable_names(self, replace_pool_with_conv):
image_features = {
'image_features': tf.random_uniform([4, 19, 19, 1024])
}
feature_maps = feature_map_generators.pooling_pyramid_feature_maps(
base_feature_map_depth=1024,
num_layers=6,
image_features=image_features,
replace_pool_with_conv=replace_pool_with_conv)
expected_pool_variables = set([
'Base_Conv2d_1x1_1024/weights',
'Base_Conv2d_1x1_1024/biases',
])
expected_conv_variables = set([
'Base_Conv2d_1x1_1024/weights',
'Base_Conv2d_1x1_1024/biases',
'Conv2d_0_3x3_s2_1024/weights',
'Conv2d_0_3x3_s2_1024/biases',
'Conv2d_1_3x3_s2_1024/weights',
'Conv2d_1_3x3_s2_1024/biases',
'Conv2d_2_3x3_s2_1024/weights',
'Conv2d_2_3x3_s2_1024/biases',
'Conv2d_3_3x3_s2_1024/weights',
'Conv2d_3_3x3_s2_1024/biases',
'Conv2d_4_3x3_s2_1024/weights',
'Conv2d_4_3x3_s2_1024/biases',
])
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
sess.run(feature_maps)
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
if replace_pool_with_conv:
self.assertSetEqual(expected_conv_variables, actual_variable_set)
else:
self.assertSetEqual(expected_pool_variables, actual_variable_set)
if __name__ == '__main__':
tf.test.main()