Commit 05584085 authored by pkulzc, committed by Jonathan Huang

Merged commit includes the following changes: (#6315)

236813471  by lzc:

    Internal change.

--
236507310  by lzc:

    Fix preprocess.random_resize_method config type issue. The target height and width are passed as "size" to tf.image.resize_images, which only accepts integers.
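
    A minimal sketch of the constraint described above, not the actual fix; the helper name is hypothetical:

      import tensorflow as tf

      def resize_to_target(image, target_height, target_width):
        # Cast the configured dimensions to int32 so tf.image.resize_images
        # receives an integer `size`, as it requires.
        size = tf.cast(tf.stack([target_height, target_width]), tf.int32)
        return tf.image.resize_images(image, size)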

--
236409989  by Zhichao Lu:

    Configure export_to_tpu from a function parameter instead of HParams for TPU inference.

--
236403186  by Zhichao Lu:

    Make graph file names optional arguments.

--
236237072  by Zhichao Lu:

    Minor bugfix for keyword args.

--
236209602  by Zhichao Lu:

    Add support for PartitionedVariable to get_variables_available_in_checkpoint.

--
235828658  by Zhichao Lu:

    Automatically stop evaluation jobs when training is finished.

--
235817964  by Zhichao Lu:

    Add an optional process_metrics_fn callback to eval_util; it gets called
    with the evaluation results once each evaluation is complete.
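
    A hedged sketch of such a callback; the exact signature eval_util uses is an assumption here (the change only states that the callback receives the evaluation results after each evaluation):

      def log_metrics(metrics):
        # Hypothetical process_metrics_fn: `metrics` is assumed to be a dict
        # mapping metric names to scalar values from one evaluation run.
        for name, value in sorted(metrics.items()):
          print('%s = %.4f' % (name, value))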

--
235788721  by lzc:

    Fix yml file tf runtime version.

--
235262897  by Zhichao Lu:

    Add keypoint support to the random_pad_image preprocessor method.

--
235257380  by Zhichao Lu:

    Support InputDataFields.groundtruth_confidences in retain_groundtruth(), retain_groundtruth_with_positive_classes(), filter_groundtruth_with_crowd_boxes(), filter_groundtruth_with_nan_box_coordinates(), filter_unrecognized_classes().

--
235109188  by Zhichao Lu:

    Fix bug in pad_input_data_to_static_shapes for num_additional_channels > 0; make color-specific data augmentation only touch RGB channels.
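
    A minimal sketch of the second half of this change, assuming the extra channels are split off, left untouched by the color augmentation, and concatenated back; the helper is illustrative, not the actual preprocessor code:

      import tensorflow as tf

      def apply_color_aug_to_rgb_only(image, color_aug_fn):
        # Assumes channel order [R, G, B, additional...]; only the first three
        # channels go through the color augmentation.
        rgb, extra = image[..., :3], image[..., 3:]
        return tf.concat([color_aug_fn(rgb), extra], axis=-1)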

--
235045010  by Zhichao Lu:

    Don't slice class_predictions_with_background when add_background_class is false.

--
235026189  by lzc:

    Fix import in g3doc.

--
234863426  by Zhichao Lu:

    Added fixes in exporter to allow writing a checkpoint to a specified temporary directory.

--
234671886  by lzc:

    Internal Change.

--
234630803  by rathodv:

    Internal Change.

--
233985896  by Zhichao Lu:

    Add Neumann optimizer to object detection.

--
233560911  by Zhichao Lu:

    Add NAS-FPN object detection with Resnet and Mobilenet v2.

--
233513536  by Zhichao Lu:

    Export TPU-compatible object detection model.

--
233495772  by lzc:

    Internal change.

--
233453557  by Zhichao Lu:

    Create Keras-based SSD+MobilenetV1 for object detection.

--
233220074  by lzc:

    Update release notes date.

--
233165761  by Zhichao Lu:

    Support depth_multiplier and min_depth in _SSDResnetV1FpnFeatureExtractor.

--
233160046  by lzc:

    Internal change.

--
232926599  by Zhichao Lu:

    [tf.data] Switching tf.data functions to use `defun`, providing an escape hatch to continue using the legacy `Defun`.

    There are subtle differences between the implementations of `defun` and `Defun` (such as resource handling or control flow), and input pipelines that use control flow or resources in their functions might be affected by this change. To migrate the majority of existing pipelines to the recommended way of creating functions in the TF 2.0 world, while allowing a small number of existing pipelines to continue relying on the deprecated behavior, this CL provides an escape hatch.

    If your input pipeline is affected by this CL, it should apply the escape hatch by replacing `foo.map(...)` with `foo.map_with_legacy_function(...)`.
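
    For reference, the escape hatch looks like this (a sketch against the TF 1.x tf.data API):

      import tensorflow as tf

      dataset = tf.data.Dataset.range(10)
      # Default behavior after this change: `map` traces its function with `defun`.
      mapped = dataset.map(lambda x: x * 2)
      # Escape hatch for pipelines that depend on the legacy `Defun` semantics.
      legacy_mapped = dataset.map_with_legacy_function(lambda x: x * 2)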

--
232891621  by Zhichao Lu:

    Modify faster_rcnn meta architecture to normalize raw detections.

--
232875817  by Zhichao Lu:

    Make calibration a post-processing step.

    Specifically:
    - Move the calibration config from pipeline.proto --> post_processing.proto
    - Edit post_processing_builder.py to return a calibration function; if no calibration config is provided, it returns None (see the sketch below).
    - Edit SSD and FasterRCNN meta architectures to optionally call the calibration function on detection scores after score conversion and before NMS.
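
    A hedged sketch of the new layout, mirroring the test configs later in this diff; the constant 0.5 mapping is purely illustrative:

      from google.protobuf import text_format
      from object_detection.builders import post_processing_builder
      from object_detection.protos import post_processing_pb2

      post_processing_text_proto = """
        score_converter: IDENTITY
        batch_non_max_suppression {
          score_threshold: -20.0
          iou_threshold: 1.0
        }
        calibration_config {
          function_approximation {
            x_y_pairs {
              x_y_pair { x: 0.0 y: 0.5 }
              x_y_pair { x: 1.0 y: 0.5 }
            }
          }
        }
      """
      config = post_processing_pb2.PostProcessing()
      text_format.Merge(post_processing_text_proto, config)
      # The second return value is the (calibrated) score conversion function.
      non_max_suppression_fn, score_conversion_fn = post_processing_builder.build(config)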

--
232704481  by Zhichao Lu:

    Edit calibration builder to build a function that will be used within a detection model's `postprocess` method, after score conversion and before non-maxima suppression.

    Specific Edits:
    - The returned function now accepts class_predictions_with_background as its argument instead of detection_scores and detection_classes.
    - Class-specific calibration was temporarily removed, as it requires more significant refactoring. Will be added later.

--
232615379  by Zhichao Lu:

    Internal change

--
232483345  by ronnyvotel:

    Making the use of bfloat16 restricted to TPUs.

--
232399572  by Zhichao Lu:

    Edit calibration builder and proto to support class-agnostic calibration.

    Specifically:
    - Edit calibration protos to include the path to the relevant label map if required for class-specific calibration. Previously, label maps were inferred from other parts of the pipeline proto; this lets all information required by the builder stay within the calibration proto and avoids passing extraneous information for class-agnostic calibration.
    - Add class-agnostic protos to the calibration config.

    Note that the proto supports sigmoid and linear interpolation parameters, but the builder currently only supports linear interpolation.

--
231613048  by Zhichao Lu:

    Add calibration builder for applying calibration transformations from output of object detection models.

    Specifically:
    - Add calibration proto to support sigmoid and isotonic regression (stepwise function) calibration.
    - Add a builder to support calibration from isotonic regression outputs.

--
231519786  by lzc:

    model_builder test refactor.
    - removed proto text boilerplate in each test case and let them call a create_default_proto function instead.
    - consolidated all separate ssd model creation tests into one.
    - consolidated all separate faster rcnn model creation tests into one.
    - used parameterized test for testing mask rcnn models and use_matmul_crop_and_resize
    - added all failures test.

--
231448169  by Zhichao Lu:

    Return static shape as a constant tensor.

--
231423126  by lzc:

    Add a release note for OID v4 models.

--
231401941  by Zhichao Lu:

    Add the correct label map for models trained on Open Images V4 (*oid_v4 config suffix).

--
231320357  by Zhichao Lu:

    Add scope to Nearest Neighbor Resize op so that it stays in the same name scope as the original resize ops.

--
231257699  by Zhichao Lu:

    Switch to using preserve_aspect_ratio in tf.image.resize_images rather than using a custom implementation.
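
    For illustration, the built-in argument that replaces the custom implementation (TF 1.12-era API); shapes here are arbitrary:

      import tensorflow as tf

      images = tf.zeros([1, 480, 320, 3])
      # Let TF handle the aspect-ratio-preserving resize instead of a custom op.
      resized = tf.image.resize_images(
          images, size=[640, 640], preserve_aspect_ratio=True)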

--
231247368  by rathodv:

    Internal change.

--
231004874  by lzc:

    Update documentation to use TF 1.12 for the object detection API.

--
230999911  by rathodv:

    Use tf.batch_gather instead of ops.batch_gather
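
    A small usage example of the core op that replaces the custom ops.batch_gather helper:

      import tensorflow as tf

      params = tf.constant([[10., 11., 12.],
                            [20., 21., 22.]])
      indices = tf.constant([[2, 0],
                             [1, 1]])
      # Gathers per batch row: the result is [[12., 10.], [21., 21.]].
      gathered = tf.batch_gather(params, indices)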

--
230999720  by huizhongc:

    Fix weight equalization test in ops_test.

--
230984728  by rathodv:

    Internal update.

--
230929019  by lzc:

    Add an option to replace preprocess operation with placeholder for ssd feature extractor.

--
230845266  by lzc:

    Require TensorFlow 1.12 for the object detection API and rename keras_applications to keras_models.

--
230392064  by lzc:

    Add RetinaNet 101 checkpoint trained on OID v4 to detection model zoo.

--
230014128  by derekjchow:

    This file was relocated under tensorflow/lite/g3doc/convert.

--
229941449  by lzc:

    Update SSD mobilenet v2 quantized model download path.

--
229843662  by lzc:

    Add an option to use native resize tf op in fpn top-down feature map generation.

--
229636034  by rathodv:

    Add deprecation notice to a few old parameters in train.proto

--
228959078  by derekjchow:

    Remove duplicate elif case in _check_and_convert_legacy_input_config_key

--
228749719  by rathodv:

    Minor refactoring to make exporter's `build_detection_graph` method public.

--
228573828  by rathodv:

    Modify model.postprocess to return raw detections and raw scores.

    Modify the postprocess methods in core/model.py and the meta architectures to export raw detections (without any non-max suppression) and raw multiclass score logits for those detections.

--
228420670  by Zhichao Lu:

    Add shims for custom architectures for object detection models.

--
228241692  by Zhichao Lu:

    Fix the comment on "losses_mask" in "Loss" class.

--
228223810  by Zhichao Lu:

    Support other_heads' predictions in WeightSharedConvolutionalBoxPredictor. Also remove a few unused parameters and fix a couple of comments in convolutional_box_predictor.py.

--
228200588  by Zhichao Lu:

    Add Expected Calibration Error and an evaluator that calculates the metric for object detections.
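
    For reference, a standalone NumPy sketch of the standard binned Expected Calibration Error definition; the streaming TF implementation in calibration_metrics may differ in detail:

      import numpy as np

      def expected_calibration_error(confidences, is_correct, num_bins=10):
        # ECE = sum_m (|B_m| / n) * |accuracy(B_m) - mean confidence(B_m)|
        confidences = np.asarray(confidences, dtype=np.float64)
        is_correct = np.asarray(is_correct, dtype=np.float64)
        bin_ids = np.minimum((confidences * num_bins).astype(int), num_bins - 1)
        n = float(confidences.size)
        ece = 0.0
        for b in range(num_bins):
          in_bin = bin_ids == b
          if in_bin.any():
            accuracy = is_correct[in_bin].mean()
            avg_confidence = confidences[in_bin].mean()
            ece += (in_bin.sum() / n) * abs(accuracy - avg_confidence)
        return ece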

--
228167740  by lzc:

    Add option to use bounded activations in FPN top-down feature map generation.

--
227767700  by rathodv:

    Internal.

--
226295236  by Zhichao Lu:

    Add Open Images V4 Resnet101-FPN training config to third_party

--
226254842  by Zhichao Lu:

    Fix typo in documentation.

--
225833971  by Zhichao Lu:

    Option to have no resizer in object detection model.

--
225824890  by lzc:

    Fix Python 3 compatibility for model_lib.py.

--
225760897  by menglong:

    Ensure the loss normalizer is at least 1.

--
225559842  by menglong:

    Add extra logic filtering unrecognized classes.

--
225379421  by lzc:

    Add faster_rcnn_inception_resnet_v2_atrous_oid_v4 config to third_party

--
225368337  by Zhichao Lu:

    Add extra logic filtering unrecognized classes.

--
225341095  by Zhichao Lu:

    Add Open Images V4 models to the OD API model zoo and the corresponding configs to the configs directory.

--
225218450  by menglong:

    Add extra logic filtering unrecognized classes.

--
225057591  by Zhichao Lu:

    Internal change.

--
224895417  by rathodv:

    Internal change.

--
224209282  by Zhichao Lu:

    Add two data augmentations to object detection: (1) Self-concat (2) Absolute pads.

--
224073762  by Zhichao Lu:

    Do not create tf.constant until _generate() is actually called in the object detector.

--

PiperOrigin-RevId: 236813471
parent a5db4420
......@@ -92,8 +92,8 @@ configured in the meta architecture:
non-max suppression and normalize them. In this case, the `postprocess` method
skips both `_postprocess_rpn` and `_postprocess_box_classifier`.
"""
from abc import abstractmethod
from functools import partial
import abc
import functools
import tensorflow as tf
from object_detection.anchor_generators import grid_anchor_generator
......@@ -138,7 +138,7 @@ class FasterRCNNFeatureExtractor(object):
self._reuse_weights = reuse_weights
self._weight_decay = weight_decay
@abstractmethod
@abc.abstractmethod
def preprocess(self, resized_inputs):
"""Feature-extractor specific preprocessing (minus image resizing)."""
pass
......@@ -162,7 +162,7 @@ class FasterRCNNFeatureExtractor(object):
with tf.variable_scope(scope, values=[preprocessed_inputs]):
return self._extract_proposal_features(preprocessed_inputs, scope)
@abstractmethod
@abc.abstractmethod
def _extract_proposal_features(self, preprocessed_inputs, scope):
"""Extracts first stage RPN features, to be overridden."""
pass
......@@ -185,7 +185,7 @@ class FasterRCNNFeatureExtractor(object):
scope, values=[proposal_feature_maps], reuse=tf.AUTO_REUSE):
return self._extract_box_classifier_features(proposal_feature_maps, scope)
@abstractmethod
@abc.abstractmethod
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features, to be overridden."""
pass
......@@ -770,7 +770,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
representing the features for each proposal.
"""
image_shape_2d = self._image_batch_shape_2d(image_shape)
proposal_boxes_normalized, _, num_proposals = self._postprocess_rpn(
proposal_boxes_normalized, _, num_proposals, _, _ = self._postprocess_rpn(
rpn_box_encodings, rpn_objectness_predictions_with_background,
anchors, image_shape_2d, true_image_shapes)
......@@ -1080,7 +1080,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
anchors_boxlist, clip_window)
def _batch_gather_kept_indices(predictions_tensor):
return shape_utils.static_or_dynamic_map_fn(
partial(tf.gather, indices=keep_indices),
functools.partial(tf.gather, indices=keep_indices),
elems=predictions_tensor,
dtype=tf.float32,
parallel_iterations=self._parallel_iterations,
......@@ -1148,17 +1148,22 @@ class FasterRCNNMetaArch(model.DetectionModel):
with tf.name_scope('FirstStagePostprocessor'):
if self._number_of_stages == 1:
proposal_boxes, proposal_scores, num_proposals = self._postprocess_rpn(
prediction_dict['rpn_box_encodings'],
prediction_dict['rpn_objectness_predictions_with_background'],
prediction_dict['anchors'],
true_image_shapes,
true_image_shapes)
(proposal_boxes, proposal_scores, num_proposals, raw_proposal_boxes,
raw_proposal_scores) = self._postprocess_rpn(
prediction_dict['rpn_box_encodings'],
prediction_dict['rpn_objectness_predictions_with_background'],
prediction_dict['anchors'], true_image_shapes, true_image_shapes)
return {
fields.DetectionResultFields.detection_boxes: proposal_boxes,
fields.DetectionResultFields.detection_scores: proposal_scores,
fields.DetectionResultFields.detection_boxes:
proposal_boxes,
fields.DetectionResultFields.detection_scores:
proposal_scores,
fields.DetectionResultFields.num_detections:
tf.to_float(num_proposals),
fields.DetectionResultFields.raw_detection_boxes:
raw_proposal_boxes,
fields.DetectionResultFields.raw_detection_scores:
raw_proposal_scores
}
# TODO(jrru): Remove mask_predictions from _post_process_box_classifier.
......@@ -1266,6 +1271,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
num_proposals: A Tensor of type `int32`. A 1-D tensor of shape [batch]
representing the number of proposals predicted for each image in
the batch.
raw_detection_boxes: [batch, total_detections, 4] tensor with decoded
proposal boxes before Non-Max Suppression.
raw_detection_scores: [batch, total_detections,
num_classes_with_background] tensor of class score logits for
raw proposal boxes.
"""
rpn_box_encodings_batch = tf.expand_dims(rpn_box_encodings_batch, axis=2)
rpn_encodings_shape = shape_utils.combined_static_and_dynamic_shape(
......@@ -1274,13 +1284,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.expand_dims(anchors, 0), [rpn_encodings_shape[0], 1, 1])
proposal_boxes = self._batch_decode_boxes(rpn_box_encodings_batch,
tiled_anchor_boxes)
proposal_boxes = tf.squeeze(proposal_boxes, axis=2)
raw_proposal_boxes = tf.squeeze(proposal_boxes, axis=2)
rpn_objectness_softmax_without_background = tf.nn.softmax(
rpn_objectness_predictions_with_background_batch)[:, :, 1]
clip_window = self._compute_clip_window(image_shapes)
(proposal_boxes, proposal_scores, _, _, _,
num_proposals) = self._first_stage_nms_fn(
tf.expand_dims(proposal_boxes, axis=2),
tf.expand_dims(raw_proposal_boxes, axis=2),
tf.expand_dims(rpn_objectness_softmax_without_background, axis=2),
clip_window=clip_window)
if self._is_training:
......@@ -1304,7 +1314,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
return normalized_boxes_per_image
normalized_proposal_boxes = shape_utils.static_or_dynamic_map_fn(
normalize_boxes, elems=[proposal_boxes, image_shapes], dtype=tf.float32)
return normalized_proposal_boxes, proposal_scores, num_proposals
raw_normalized_proposal_boxes = shape_utils.static_or_dynamic_map_fn(
normalize_boxes,
elems=[raw_proposal_boxes, image_shapes],
dtype=tf.float32)
return (normalized_proposal_boxes, proposal_scores, num_proposals,
raw_normalized_proposal_boxes,
rpn_objectness_predictions_with_background_batch)
def _sample_box_classifier_batch(
self,
......@@ -1576,6 +1592,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
(optional) [batch, max_detections, mask_height, mask_width]. Note
that a pixel-wise sigmoid score converter is applied to the detection
masks.
`raw_detection_boxes`: [batch, total_detections, 4] tensor with decoded
detection boxes before Non-Max Suppression.
`raw_detection_scores`: [batch, total_detections,
num_classes_with_background] tensor of multi-class score logits for
raw detection boxes.
"""
refined_box_encodings_batch = tf.reshape(
refined_box_encodings,
......@@ -1589,11 +1610,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
)
refined_decoded_boxes_batch = self._batch_decode_boxes(
refined_box_encodings_batch, proposal_boxes)
class_predictions_with_background_batch = (
class_predictions_with_background_batch_normalized = (
self._second_stage_score_conversion_fn(
class_predictions_with_background_batch))
class_predictions_batch = tf.reshape(
tf.slice(class_predictions_with_background_batch,
tf.slice(class_predictions_with_background_batch_normalized,
[0, 0, 1], [-1, -1, -1]),
[-1, self.max_num_proposals, self.num_classes])
clip_window = self._compute_clip_window(image_shapes)
......@@ -1614,11 +1635,51 @@ class FasterRCNNMetaArch(model.DetectionModel):
change_coordinate_frame=True,
num_valid_boxes=num_proposals,
masks=mask_predictions_batch)
if refined_decoded_boxes_batch.shape[2] > 1:
class_ids = tf.expand_dims(
tf.argmax(class_predictions_with_background_batch[:, :, 1:], axis=2,
output_type=tf.int32),
axis=-1)
raw_detection_boxes = tf.squeeze(
tf.batch_gather(refined_decoded_boxes_batch, class_ids), axis=2)
else:
raw_detection_boxes = tf.squeeze(refined_decoded_boxes_batch, axis=2)
def normalize_and_clip_boxes(args):
"""Normalize and clip boxes."""
boxes_per_image = args[0]
image_shape = args[1]
normalized_boxes_per_image = box_list_ops.to_normalized_coordinates(
box_list.BoxList(boxes_per_image),
image_shape[0],
image_shape[1],
check_range=False).get()
normalized_boxes_per_image = box_list_ops.clip_to_window(
box_list.BoxList(normalized_boxes_per_image),
tf.constant([0.0, 0.0, 1.0, 1.0], tf.float32),
filter_nonoverlapping=False).get()
return normalized_boxes_per_image
raw_normalized_detection_boxes = shape_utils.static_or_dynamic_map_fn(
normalize_and_clip_boxes,
elems=[raw_detection_boxes, image_shapes],
dtype=tf.float32)
detections = {
fields.DetectionResultFields.detection_boxes: nmsed_boxes,
fields.DetectionResultFields.detection_scores: nmsed_scores,
fields.DetectionResultFields.detection_classes: nmsed_classes,
fields.DetectionResultFields.num_detections: tf.to_float(num_detections)
fields.DetectionResultFields.detection_boxes:
nmsed_boxes,
fields.DetectionResultFields.detection_scores:
nmsed_scores,
fields.DetectionResultFields.detection_classes:
nmsed_classes,
fields.DetectionResultFields.num_detections:
tf.to_float(num_detections),
fields.DetectionResultFields.raw_detection_boxes:
raw_normalized_detection_boxes,
fields.DetectionResultFields.raw_detection_scores:
class_predictions_with_background_batch
}
if nmsed_masks is not None:
detections[fields.DetectionResultFields.detection_masks] = nmsed_masks
......@@ -1769,7 +1830,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
back_prop=True))
# Normalize by number of examples in sampled minibatch
normalizer = tf.reduce_sum(batch_sampled_indices, axis=1)
normalizer = tf.maximum(
tf.reduce_sum(batch_sampled_indices, axis=1), 1.0)
batch_one_hot_targets = tf.one_hot(
tf.to_int32(batch_cls_targets), depth=2)
sampled_reg_indices = tf.multiply(batch_sampled_indices,
......
......@@ -85,6 +85,68 @@ class FasterRCNNMetaArchTest(
self.assertTrue(np.amax(detections_out['detection_masks'] <= 1.0))
self.assertTrue(np.amin(detections_out['detection_masks'] >= 0.0))
def test_postprocess_second_stage_only_inference_mode_with_calibration(self):
model = self._build_model(
is_training=False, number_of_stages=2, second_stage_batch_size=6,
calibration_mapping_value=0.5)
batch_size = 2
total_num_padded_proposals = batch_size * model.max_num_proposals
proposal_boxes = tf.constant(
[[[1, 1, 2, 3],
[0, 0, 1, 1],
[.5, .5, .6, .6],
4*[0], 4*[0], 4*[0], 4*[0], 4*[0]],
[[2, 3, 6, 8],
[1, 2, 5, 3],
4*[0], 4*[0], 4*[0], 4*[0], 4*[0], 4*[0]]], dtype=tf.float32)
num_proposals = tf.constant([3, 2], dtype=tf.int32)
refined_box_encodings = tf.zeros(
[total_num_padded_proposals, model.num_classes, 4], dtype=tf.float32)
class_predictions_with_background = tf.ones(
[total_num_padded_proposals, model.num_classes+1], dtype=tf.float32)
image_shape = tf.constant([batch_size, 36, 48, 3], dtype=tf.int32)
mask_height = 2
mask_width = 2
mask_predictions = 30. * tf.ones(
[total_num_padded_proposals, model.num_classes,
mask_height, mask_width], dtype=tf.float32)
exp_detection_masks = np.array([[[[1, 1], [1, 1]],
[[1, 1], [1, 1]],
[[1, 1], [1, 1]],
[[1, 1], [1, 1]],
[[1, 1], [1, 1]]],
[[[1, 1], [1, 1]],
[[1, 1], [1, 1]],
[[1, 1], [1, 1]],
[[1, 1], [1, 1]],
[[0, 0], [0, 0]]]])
_, true_image_shapes = model.preprocess(tf.zeros(image_shape))
detections = model.postprocess({
'refined_box_encodings': refined_box_encodings,
'class_predictions_with_background': class_predictions_with_background,
'num_proposals': num_proposals,
'proposal_boxes': proposal_boxes,
'image_shape': image_shape,
'mask_predictions': mask_predictions
}, true_image_shapes)
with self.test_session() as sess:
detections_out = sess.run(detections)
self.assertAllEqual(detections_out['detection_boxes'].shape, [2, 5, 4])
# All scores map to 0.5, except for the final one, which is pruned.
self.assertAllClose(detections_out['detection_scores'],
[[0.5, 0.5, 0.5, 0.5, 0.5],
[0.5, 0.5, 0.5, 0.5, 0.0]])
self.assertAllClose(detections_out['detection_classes'],
[[0, 0, 0, 1, 1], [0, 0, 1, 1, 0]])
self.assertAllClose(detections_out['num_detections'], [5, 4])
self.assertAllClose(detections_out['detection_masks'],
exp_detection_masks)
self.assertTrue(np.amax(detections_out['detection_masks'] <= 1.0))
self.assertTrue(np.amin(detections_out['detection_masks'] >= 0.0))
def test_postprocess_second_stage_only_inference_mode_with_shared_boxes(self):
model = self._build_model(
is_training=False, number_of_stages=2, second_stage_batch_size=6)
......@@ -190,6 +252,7 @@ class FasterRCNNMetaArchTest(
set([
'detection_boxes', 'detection_scores', 'detection_classes',
'detection_masks', 'num_detections', 'mask_predictions',
'raw_detection_boxes', 'raw_detection_scores'
])))
for key in expected_shapes:
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
......@@ -276,7 +339,7 @@ class FasterRCNNMetaArchTest(
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
anchors_shape_out = tensor_dict_out['anchors'].shape
self.assertEqual(2, len(anchors_shape_out))
self.assertLen(anchors_shape_out, 2)
self.assertEqual(4, anchors_shape_out[1])
num_anchors_out = anchors_shape_out[0]
self.assertAllEqual(tensor_dict_out['rpn_box_encodings'].shape,
......
......@@ -165,7 +165,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
use_matmul_crop_and_resize=False,
clip_anchors_to_image=False,
use_matmul_gather_in_matcher=False,
use_static_shapes=False):
use_static_shapes=False,
calibration_mapping_value=None):
def image_resizer_fn(image, masks=None):
"""Fake image resizer function."""
......@@ -244,7 +245,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
first_stage_localization_loss_weight = 1.0
first_stage_objectness_loss_weight = 1.0
post_processing_config = post_processing_pb2.PostProcessing()
post_processing_text_proto = """
score_converter: IDENTITY
batch_non_max_suppression {
score_threshold: -20.0
iou_threshold: 1.0
......@@ -253,18 +256,31 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
use_static_shapes: """ +'{}'.format(use_static_shapes) + """
}
"""
post_processing_config = post_processing_pb2.PostProcessing()
if calibration_mapping_value:
calibration_text_proto = """
calibration_config {
function_approximation {
x_y_pairs {
x_y_pair {
x: 0.0
y: %f
}
x_y_pair {
x: 1.0
y: %f
}}}}""" % (calibration_mapping_value, calibration_mapping_value)
post_processing_text_proto = (post_processing_text_proto
+ ' ' + calibration_text_proto)
text_format.Merge(post_processing_text_proto, post_processing_config)
second_stage_non_max_suppression_fn, second_stage_score_conversion_fn = (
post_processing_builder.build(post_processing_config))
second_stage_target_assigner = target_assigner.create_target_assigner(
'FasterRCNN', 'detection',
use_matmul_gather=use_matmul_gather_in_matcher)
second_stage_non_max_suppression_fn, _ = post_processing_builder.build(
post_processing_config)
second_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=1.0, is_static=use_static_shapes)
second_stage_score_conversion_fn = tf.identity
second_stage_localization_loss_weight = 1.0
second_stage_classification_loss_weight = 1.0
if softmax_second_stage_classification_loss:
......@@ -336,6 +352,10 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
predict_masks=predict_masks,
masks_are_class_agnostic=masks_are_class_agnostic), **common_kwargs)
@parameterized.parameters(
{'use_static_shapes': False},
{'use_static_shapes': True}
)
def test_predict_gives_correct_shapes_in_inference_mode_first_stage_only(
self, use_static_shapes=False):
batch_size = 2
......@@ -457,6 +477,10 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
prediction_out['rpn_objectness_predictions_with_background'].shape,
(batch_size, num_anchors_out, 2))
@parameterized.parameters(
{'use_static_shapes': False},
{'use_static_shapes': True}
)
def test_predict_correct_shapes_in_inference_mode_two_stages(
self, use_static_shapes=False):
......@@ -578,6 +602,10 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
for key in expected_shapes:
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
@parameterized.parameters(
{'use_static_shapes': False},
{'use_static_shapes': True}
)
def test_predict_gives_correct_shapes_in_train_mode_both_stages(
self,
use_static_shapes=False):
......@@ -670,6 +698,12 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
self.assertAllEqual(results[8].shape,
expected_shapes['rpn_box_predictor_features'])
@parameterized.parameters(
{'use_static_shapes': False, 'pad_to_max_dimension': None},
{'use_static_shapes': True, 'pad_to_max_dimension': None},
{'use_static_shapes': False, 'pad_to_max_dimension': 56},
{'use_static_shapes': True, 'pad_to_max_dimension': 56}
)
def test_postprocess_first_stage_only_inference_mode(
self, use_static_shapes=False, pad_to_max_dimension=None):
batch_size = 2
......@@ -696,9 +730,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
rpn_objectness_predictions_with_background,
'rpn_features_to_crop': rpn_features_to_crop,
'anchors': anchors}, true_image_shapes)
return (proposals['num_detections'],
proposals['detection_boxes'],
proposals['detection_scores'])
return (proposals['num_detections'], proposals['detection_boxes'],
proposals['detection_scores'], proposals['raw_detection_boxes'],
proposals['raw_detection_scores'])
anchors = np.array(
[[0, 0, 16, 16],
......@@ -741,6 +775,12 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_proposal_scores = [[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0]]
expected_num_proposals = [4, 4]
expected_raw_proposal_boxes = [[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [0.5, 0.5, 1., 1.]],
[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [0.5, 0.5, 1., 1.]]]
expected_raw_scores = [[[-10., 13.], [10., -10.], [10., -11.], [-10., 12.]],
[[10., -10.], [-10., 13.], [-10., 12.], [10., -11.]]]
self.assertAllClose(results[0], expected_num_proposals)
for indx, num_proposals in enumerate(expected_num_proposals):
......@@ -748,6 +788,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_proposal_boxes[indx][0:num_proposals])
self.assertAllClose(results[2][indx][0:num_proposals],
expected_proposal_scores[indx][0:num_proposals])
self.assertAllClose(results[3], expected_raw_proposal_boxes)
self.assertAllClose(results[4], expected_raw_scores)
def _test_postprocess_first_stage_only_train_mode(self,
pad_to_max_dimension=None):
......@@ -801,9 +843,17 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_proposal_scores = [[1, 1],
[1, 1]]
expected_num_proposals = [2, 2]
expected_output_keys = set(['detection_boxes', 'detection_scores',
'num_detections'])
expected_raw_proposal_boxes = [[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [0.5, 0.5, 1., 1.]],
[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [0.5, 0.5, 1., 1.]]]
expected_raw_scores = [[[-10., 13.], [-10., 12.], [-10., 11.], [-10., 10.]],
[[-10., 13.], [-10., 12.], [-10., 11.], [-10., 10.]]]
expected_output_keys = set([
'detection_boxes', 'detection_scores', 'num_detections',
'raw_detection_boxes', 'raw_detection_scores'
])
self.assertEqual(set(proposals.keys()), expected_output_keys)
with self.test_session() as sess:
......@@ -817,6 +867,10 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_proposal_scores)
self.assertAllEqual(proposals_out['num_detections'],
expected_num_proposals)
self.assertAllClose(proposals_out['raw_detection_boxes'],
expected_raw_proposal_boxes)
self.assertAllClose(proposals_out['raw_detection_scores'],
expected_raw_scores)
def test_postprocess_first_stage_only_train_mode(self):
self._test_postprocess_first_stage_only_train_mode()
......@@ -824,6 +878,12 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
def test_postprocess_first_stage_only_train_mode_padded_image(self):
self._test_postprocess_first_stage_only_train_mode(pad_to_max_dimension=56)
@parameterized.parameters(
{'use_static_shapes': False, 'pad_to_max_dimension': None},
{'use_static_shapes': True, 'pad_to_max_dimension': None},
{'use_static_shapes': False, 'pad_to_max_dimension': 56},
{'use_static_shapes': True, 'pad_to_max_dimension': 56}
)
def test_postprocess_second_stage_only_inference_mode(
self, use_static_shapes=False, pad_to_max_dimension=None):
batch_size = 2
......@@ -854,10 +914,10 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
'num_proposals': num_proposals,
'proposal_boxes': proposal_boxes,
}, true_image_shapes)
return (detections['num_detections'],
detections['detection_boxes'],
detections['detection_scores'],
detections['detection_classes'])
return (detections['num_detections'], detections['detection_boxes'],
detections['detection_scores'], detections['detection_classes'],
detections['raw_detection_boxes'],
detections['raw_detection_scores'])
proposal_boxes = np.array(
[[[1, 1, 2, 3],
......@@ -867,6 +927,7 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
[[2, 3, 6, 8],
[1, 2, 5, 3],
4*[0], 4*[0], 4*[0], 4*[0], 4*[0], 4*[0]]], dtype=np.float32)
num_proposals = np.array([3, 2], dtype=np.int32)
refined_box_encodings = np.zeros(
[total_num_padded_proposals, num_classes, 4], dtype=np.float32)
......@@ -887,6 +948,15 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_num_detections = [5, 4]
expected_detection_classes = [[0, 0, 0, 1, 1], [0, 0, 1, 1, 0]]
expected_detection_scores = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]
h = float(image_shape[1])
w = float(image_shape[2])
expected_raw_detection_boxes = np.array(
[[[1 / h, 1 / w, 2 / h, 3 / w], [0, 0, 1 / h, 1 / w],
[.5 / h, .5 / w, .6 / h, .6 / w], 4 * [0], 4 * [0], 4 * [0], 4 * [0],
4 * [0]],
[[2 / h, 3 / w, 6 / h, 8 / w], [1 / h, 2 / w, 5 / h, 3 / w], 4 * [0],
4 * [0], 4 * [0], 4 * [0], 4 * [0], 4 * [0]]],
dtype=np.float32)
self.assertAllClose(results[0], expected_num_detections)
......@@ -896,6 +966,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
self.assertAllClose(results[3][indx][0:num_proposals],
expected_detection_classes[indx][0:num_proposals])
self.assertAllClose(results[4], expected_raw_detection_boxes)
self.assertAllClose(results[5],
class_predictions_with_background.reshape([-1, 8, 3]))
if not use_static_shapes:
self.assertAllEqual(results[1].shape, [2, 5, 4])
......@@ -1268,6 +1341,13 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
@parameterized.parameters(
{'use_static_shapes': False, 'shared_boxes': False},
{'use_static_shapes': False, 'shared_boxes': True},
{'use_static_shapes': True, 'shared_boxes': False},
{'use_static_shapes': True, 'shared_boxes': True},
)
def test_loss_full_zero_padded_proposals_nonzero_loss_with_two_images(
self, use_static_shapes=False, shared_boxes=False):
batch_size = 2
......
......@@ -288,7 +288,7 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
"""
image_shape_2d = tf.tile(tf.expand_dims(image_shape[1:], 0),
[image_shape[0], 1])
proposal_boxes_normalized, _, num_proposals = self._postprocess_rpn(
proposal_boxes_normalized, _, num_proposals, _, _ = self._postprocess_rpn(
rpn_box_encodings, rpn_objectness_predictions_with_background,
anchors, image_shape_2d, true_image_shapes)
......
......@@ -17,8 +17,7 @@
General tensorflow implementation of convolutional Multibox/SSD detection
models.
"""
from abc import abstractmethod
import abc
import tensorflow as tf
from object_detection.core import box_list
......@@ -80,7 +79,7 @@ class SSDFeatureExtractor(object):
def is_keras_model(self):
return False
@abstractmethod
@abc.abstractmethod
def preprocess(self, resized_inputs):
"""Preprocesses images for feature extraction (minus image resizing).
......@@ -98,7 +97,7 @@ class SSDFeatureExtractor(object):
"""
pass
@abstractmethod
@abc.abstractmethod
def extract_features(self, preprocessed_inputs):
"""Extracts features from preprocessed inputs.
......@@ -196,7 +195,7 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
def is_keras_model(self):
return True
@abstractmethod
@abc.abstractmethod
def preprocess(self, resized_inputs):
"""Preprocesses images for feature extraction (minus image resizing).
......@@ -214,7 +213,7 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
"""
raise NotImplementedError
@abstractmethod
@abc.abstractmethod
def _extract_features(self, preprocessed_inputs):
"""Extracts features from preprocessed inputs.
......@@ -552,8 +551,10 @@ class SSDMetaArch(model.DetectionModel):
5) anchors: 2-D float tensor of shape [num_anchors, 4] containing
the generated anchors in normalized coordinates.
"""
batchnorm_updates_collections = (None if self._inplace_batchnorm_update
else tf.GraphKeys.UPDATE_OPS)
if self._inplace_batchnorm_update:
batchnorm_updates_collections = None
else:
batchnorm_updates_collections = tf.GraphKeys.UPDATE_OPS
if self._feature_extractor.is_keras_model:
feature_maps = self._feature_extractor(preprocessed_inputs)
else:
......@@ -648,14 +649,22 @@ class SSDMetaArch(model.DetectionModel):
Returns:
detections: a dictionary containing the following fields
detection_boxes: [batch, max_detections, 4]
detection_scores: [batch, max_detections]
detection_classes: [batch, max_detections]
detection_boxes: [batch, max_detections, 4] tensor with post-processed
detection boxes.
detection_scores: [batch, max_detections] tensor with scalar scores for
post-processed detection boxes.
detection_classes: [batch, max_detections] tensor with classes for
post-processed detection classes.
detection_keypoints: [batch, max_detections, num_keypoints, 2] (if
encoded in the prediction_dict 'box_encodings')
detection_masks: [batch_size, max_detections, mask_height, mask_width]
(optional)
num_detections: [batch]
raw_detection_boxes: [batch, total_detections, 4] tensor with decoded
detection boxes before Non-Max Suppression.
raw_detection_scores: [batch, total_detections,
num_classes_with_background] tensor of multi-class score logits for
raw detection boxes.
Raises:
ValueError: if prediction_dict does not contain `box_encodings` or
`class_predictions_with_background` fields.
......@@ -700,11 +709,18 @@ class SSDMetaArch(model.DetectionModel):
additional_fields=additional_fields,
masks=prediction_dict.get('mask_predictions'))
detection_dict = {
fields.DetectionResultFields.detection_boxes: nmsed_boxes,
fields.DetectionResultFields.detection_scores: nmsed_scores,
fields.DetectionResultFields.detection_classes: nmsed_classes,
fields.DetectionResultFields.detection_boxes:
nmsed_boxes,
fields.DetectionResultFields.detection_scores:
nmsed_scores,
fields.DetectionResultFields.detection_classes:
nmsed_classes,
fields.DetectionResultFields.num_detections:
tf.to_float(num_detections)
tf.to_float(num_detections),
fields.DetectionResultFields.raw_detection_boxes:
tf.squeeze(detection_boxes, axis=2),
fields.DetectionResultFields.raw_detection_scores:
class_predictions
}
if (nmsed_additional_fields is not None and
fields.BoxListFields.keypoints in nmsed_additional_fields):
......@@ -1049,9 +1065,9 @@ class SSDMetaArch(model.DetectionModel):
mined_cls_loss: a float scalar with sum of classification losses from
selected hard examples.
"""
class_predictions = tf.slice(
prediction_dict['class_predictions_with_background'], [0, 0,
1], [-1, -1, -1])
class_predictions = prediction_dict['class_predictions_with_background']
if self._add_background_class:
class_predictions = tf.slice(class_predictions, [0, 0, 1], [-1, -1, -1])
decoded_boxes, _ = self._batch_decode(prediction_dict['box_encodings'])
decoded_box_tensors_list = tf.unstack(decoded_boxes)
......
......@@ -48,7 +48,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
use_keras=False,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5):
nms_max_size_per_class=5,
calibration_mapping_value=None):
return super(SsdMetaArchTest, self)._create_model(
model_fn=ssd_meta_arch.SSDMetaArch,
apply_hard_mining=apply_hard_mining,
......@@ -61,7 +62,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
use_keras=use_keras,
predict_mask=predict_mask,
use_static_shapes=use_static_shapes,
nms_max_size_per_class=nms_max_size_per_class)
nms_max_size_per_class=nms_max_size_per_class,
calibration_mapping_value=calibration_mapping_value)
def test_preprocess_preserves_shapes_with_dynamic_input_image(
self, use_keras):
......@@ -177,6 +179,13 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
expected_classes = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
expected_num_detections = np.array([3, 3])
raw_detection_boxes = [[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [1., 1., 1.5, 1.5]],
[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [1., 1., 1.5, 1.5]]]
raw_detection_scores = [[[0, 0], [0, 0], [0, 0], [0, 0]],
[[0, 0], [0, 0], [0, 0], [0, 0]]]
for input_shape in input_shapes:
tf_graph = tf.Graph()
with tf_graph.as_default():
......@@ -191,6 +200,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
self.assertIn('detection_scores', detections)
self.assertIn('detection_classes', detections)
self.assertIn('num_detections', detections)
self.assertIn('raw_detection_boxes', detections)
self.assertIn('raw_detection_scores', detections)
init_op = tf.global_variables_initializer()
with self.test_session(graph=tf_graph) as sess:
sess.run(init_op)
......@@ -208,7 +219,139 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
self.assertAllClose(detections_out['detection_classes'], expected_classes)
self.assertAllClose(detections_out['num_detections'],
expected_num_detections)
self.assertAllEqual(detections_out['raw_detection_boxes'],
raw_detection_boxes)
self.assertAllEqual(detections_out['raw_detection_scores'],
raw_detection_scores)
def test_postprocess_results_are_correct_static(self, use_keras):
with tf.Graph().as_default():
_, _, _, _ = self._create_model(use_keras=use_keras)
def graph_fn(input_image):
model, _, _, _ = self._create_model(use_static_shapes=True,
nms_max_size_per_class=4)
preprocessed_inputs, true_image_shapes = model.preprocess(input_image)
prediction_dict = model.predict(preprocessed_inputs,
true_image_shapes)
detections = model.postprocess(prediction_dict, true_image_shapes)
return (detections['detection_boxes'], detections['detection_scores'],
detections['detection_classes'], detections['num_detections'])
batch_size = 2
image_size = 2
channels = 3
input_image = np.random.rand(batch_size, image_size, image_size,
channels).astype(np.float32)
expected_boxes = [
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[0, 0, 0, 0]
], # padding
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[0, 0, 0, 0]
]
] # padding
expected_scores = [[0, 0, 0, 0], [0, 0, 0, 0]]
expected_classes = [[0, 0, 0, 0], [0, 0, 0, 0]]
expected_num_detections = np.array([3, 3])
(detection_boxes, detection_scores, detection_classes,
num_detections) = self.execute(graph_fn, [input_image])
for image_idx in range(batch_size):
self.assertTrue(test_utils.first_rows_close_as_set(
detection_boxes[image_idx][
0:expected_num_detections[image_idx]].tolist(),
expected_boxes[image_idx][0:expected_num_detections[image_idx]]))
self.assertAllClose(
detection_scores[image_idx][0:expected_num_detections[image_idx]],
expected_scores[image_idx][0:expected_num_detections[image_idx]])
self.assertAllClose(
detection_classes[image_idx][0:expected_num_detections[image_idx]],
expected_classes[image_idx][0:expected_num_detections[image_idx]])
self.assertAllClose(num_detections,
expected_num_detections)
def test_postprocess_results_are_correct_with_calibration(self, use_keras):
batch_size = 2
image_size = 2
input_shapes = [(batch_size, image_size, image_size, 3),
(None, image_size, image_size, 3),
(batch_size, None, None, 3),
(None, None, None, 3)]
expected_boxes = [
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[0, 0, 0, 0], # pruned prediction
[0, 0, 0, 0]
], # padding
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[0, 0, 0, 0], # pruned prediction
[0, 0, 0, 0]
]
] # padding
# Calibration mapping value below is set to map all scores to 0.5, except
# for the last two detections in each batch (see expected number of
# detections below).
expected_scores = [[0.5, 0.5, 0.5, 0., 0.], [0.5, 0.5, 0.5, 0., 0.]]
expected_classes = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
expected_num_detections = np.array([3, 3])
raw_detection_boxes = [[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [1., 1., 1.5, 1.5]],
[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [1., 1., 1.5, 1.5]]]
raw_detection_scores = [[[0, 0], [0, 0], [0, 0], [0, 0]],
[[0, 0], [0, 0], [0, 0], [0, 0]]]
for input_shape in input_shapes:
tf_graph = tf.Graph()
with tf_graph.as_default():
model, _, _, _ = self._create_model(use_keras=use_keras,
calibration_mapping_value=0.5)
input_placeholder = tf.placeholder(tf.float32, shape=input_shape)
preprocessed_inputs, true_image_shapes = model.preprocess(
input_placeholder)
prediction_dict = model.predict(preprocessed_inputs,
true_image_shapes)
detections = model.postprocess(prediction_dict, true_image_shapes)
self.assertIn('detection_boxes', detections)
self.assertIn('detection_scores', detections)
self.assertIn('detection_classes', detections)
self.assertIn('num_detections', detections)
self.assertIn('raw_detection_boxes', detections)
self.assertIn('raw_detection_scores', detections)
init_op = tf.global_variables_initializer()
with self.test_session(graph=tf_graph) as sess:
sess.run(init_op)
detections_out = sess.run(detections,
feed_dict={
input_placeholder:
np.random.uniform(
size=(batch_size, 2, 2, 3))})
for image_idx in range(batch_size):
self.assertTrue(
test_utils.first_rows_close_as_set(
detections_out['detection_boxes'][image_idx].tolist(),
expected_boxes[image_idx]))
self.assertAllClose(detections_out['detection_scores'], expected_scores)
self.assertAllClose(detections_out['detection_classes'], expected_classes)
self.assertAllClose(detections_out['num_detections'],
expected_num_detections)
self.assertAllEqual(detections_out['raw_detection_boxes'],
raw_detection_boxes)
self.assertAllEqual(detections_out['raw_detection_scores'],
raw_detection_scores)
def test_loss_results_are_correct(self, use_keras):
......
......@@ -16,7 +16,9 @@
import functools
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import post_processing_builder
from object_detection.core import anchor_generator
from object_detection.core import balanced_positive_negative_sampler as sampler
from object_detection.core import box_list
......@@ -25,6 +27,7 @@ from object_detection.core import post_processing
from object_detection.core import region_similarity_calculator as sim_calc
from object_detection.core import target_assigner
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.protos import calibration_pb2
from object_detection.protos import model_pb2
from object_detection.utils import ops
from object_detection.utils import test_case
......@@ -125,7 +128,8 @@ class SSDMetaArchTestBase(test_case.TestCase):
use_keras=False,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5):
nms_max_size_per_class=5,
calibration_mapping_value=None):
is_training = False
num_classes = 1
mock_anchor_generator = MockAnchorGenerator2x2()
......@@ -156,6 +160,24 @@ class SSDMetaArchTestBase(test_case.TestCase):
max_size_per_class=nms_max_size_per_class,
max_total_size=nms_max_size_per_class,
use_static_shapes=use_static_shapes)
score_conversion_fn = tf.identity
calibration_config = calibration_pb2.CalibrationConfig()
if calibration_mapping_value:
calibration_text_proto = """
function_approximation {
x_y_pairs {
x_y_pair {
x: 0.0
y: %f
}
x_y_pair {
x: 1.0
y: %f
}}}""" % (calibration_mapping_value, calibration_mapping_value)
text_format.Merge(calibration_text_proto, calibration_config)
score_conversion_fn = (
post_processing_builder._build_calibrated_score_converter( # pylint: disable=protected-access
tf.identity, calibration_config))
classification_loss_weight = 1.0
localization_loss_weight = 1.0
negative_class_weight = 1.0
......@@ -201,7 +223,7 @@ class SSDMetaArchTestBase(test_case.TestCase):
encode_background_as_zeros=encode_background_as_zeros,
image_resizer_fn=image_resizer_fn,
non_max_suppression_fn=non_max_suppression_fn,
score_conversion_fn=tf.identity,
score_conversion_fn=score_conversion_fn,
classification_loss=classification_loss,
localization_loss=localization_loss,
classification_loss_weight=classification_loss_weight,
......
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Class for evaluating object detections with calibration metrics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from object_detection.box_coders import mean_stddev_box_coder
from object_detection.core import box_list
from object_detection.core import region_similarity_calculator
from object_detection.core import standard_fields
from object_detection.core import target_assigner
from object_detection.matchers import argmax_matcher
from object_detection.metrics import calibration_metrics
from object_detection.utils import object_detection_evaluation
# TODO(zbeaver): Implement metrics per category.
class CalibrationDetectionEvaluator(
object_detection_evaluation.DetectionEvaluator):
"""Class to evaluate calibration detection metrics."""
def __init__(self,
categories,
iou_threshold=0.5):
"""Constructor.
Args:
categories: A list of dicts, each of which has the following keys -
'id': (required) an integer id uniquely identifying this category.
'name': (required) string representing category name e.g., 'cat', 'dog'.
iou_threshold: Threshold above which to consider a box as matched during
evaluation.
"""
super(CalibrationDetectionEvaluator, self).__init__(categories)
# Constructing target_assigner to match detections to groundtruth.
similarity_calc = region_similarity_calculator.IouSimilarity()
matcher = argmax_matcher.ArgMaxMatcher(
matched_threshold=iou_threshold, unmatched_threshold=iou_threshold)
box_coder = mean_stddev_box_coder.MeanStddevBoxCoder(stddev=0.1)
self._target_assigner = target_assigner.TargetAssigner(
similarity_calc, matcher, box_coder)
def match_single_image_info(self, image_info):
"""Match detections to groundtruth for a single image.
Detections are matched to available groundtruth in the image based on the
IOU threshold from the constructor. The classes of the detections and
groundtruth matches are then compared. Detections that do not have IOU above
the required threshold or have different classes from their match are
considered negative matches. All inputs in `image_info` originate or are
inferred from the eval_dict passed to class method
`get_estimator_eval_metric_ops`.
Args:
image_info: a tuple or list containing the following (in order):
- gt_boxes: tf.float32 tensor of groundtruth boxes.
- gt_classes: tf.int64 tensor of groundtruth classes associated with
groundtruth boxes.
- num_gt_box: scalar indicating the number of groundtruth boxes per
image.
- det_boxes: tf.float32 tensor of detection boxes.
- det_classes: tf.int64 tensor of detection classes associated with
detection boxes.
- num_det_box: scalar indicating the number of detection boxes per
image.
Returns:
is_class_matched: tf.int64 tensor identical in shape to det_boxes,
indicating whether detection boxes matched with and had the same
class as groundtruth annotations.
"""
(gt_boxes, gt_classes, num_gt_box, det_boxes, det_classes,
num_det_box) = image_info
detection_boxes = det_boxes[:num_det_box]
detection_classes = det_classes[:num_det_box]
groundtruth_boxes = gt_boxes[:num_gt_box]
groundtruth_classes = gt_classes[:num_gt_box]
det_boxlist = box_list.BoxList(detection_boxes)
gt_boxlist = box_list.BoxList(groundtruth_boxes)
# Target assigner requires classes in one-hot format. An additional
# dimension is required since gt_classes are 1-indexed; the zero index is
# provided to all non-matches.
one_hot_depth = tf.cast(tf.add(tf.reduce_max(groundtruth_classes), 1),
dtype=tf.int32)
gt_classes_one_hot = tf.one_hot(
groundtruth_classes, one_hot_depth, dtype=tf.float32)
one_hot_cls_targets, _, _, _, _ = self._target_assigner.assign(
det_boxlist,
gt_boxlist,
gt_classes_one_hot,
unmatched_class_label=tf.zeros(shape=one_hot_depth, dtype=tf.float32))
# Transform from one-hot back to indexes.
cls_targets = tf.argmax(one_hot_cls_targets, axis=1)
is_class_matched = tf.cast(
tf.equal(tf.cast(cls_targets, tf.int64), detection_classes),
dtype=tf.int64)
return is_class_matched
def get_estimator_eval_metric_ops(self, eval_dict):
"""Returns a dictionary of eval metric ops.
Note that once value_op is called, the detections and groundtruth added via
update_op are cleared.
This function can take in groundtruth and detections for a batch of images,
or for a single image. For the latter case, the batch dimension for input
tensors need not be present.
Args:
eval_dict: A dictionary that holds tensors for evaluating object detection
performance. For single-image evaluation, this dictionary may be
produced from eval_util.result_dict_for_single_example(). If multi-image
evaluation, `eval_dict` should contain the fields
'num_groundtruth_boxes_per_image' and 'num_det_boxes_per_image' to
properly unpad the tensors from the batch.
Returns:
a dictionary of metric names to tuple of value_op and update_op that can
be used as eval metric ops in tf.estimator.EstimatorSpec. Note that all
update ops must be run together and similarly all value ops must be run
together to guarantee correct behaviour.
"""
# Unpack items from the evaluation dictionary.
input_data_fields = standard_fields.InputDataFields
detection_fields = standard_fields.DetectionResultFields
image_id = eval_dict[input_data_fields.key]
groundtruth_boxes = eval_dict[input_data_fields.groundtruth_boxes]
groundtruth_classes = eval_dict[input_data_fields.groundtruth_classes]
detection_boxes = eval_dict[detection_fields.detection_boxes]
detection_scores = eval_dict[detection_fields.detection_scores]
detection_classes = eval_dict[detection_fields.detection_classes]
num_gt_boxes_per_image = eval_dict.get(
'num_groundtruth_boxes_per_image', None)
num_det_boxes_per_image = eval_dict.get('num_det_boxes_per_image', None)
is_annotated_batched = eval_dict.get('is_annotated', None)
if not image_id.shape.as_list():
# Apply a batch dimension to all tensors.
image_id = tf.expand_dims(image_id, 0)
groundtruth_boxes = tf.expand_dims(groundtruth_boxes, 0)
groundtruth_classes = tf.expand_dims(groundtruth_classes, 0)
detection_boxes = tf.expand_dims(detection_boxes, 0)
detection_scores = tf.expand_dims(detection_scores, 0)
detection_classes = tf.expand_dims(detection_classes, 0)
if num_gt_boxes_per_image is None:
num_gt_boxes_per_image = tf.shape(groundtruth_boxes)[1:2]
else:
num_gt_boxes_per_image = tf.expand_dims(num_gt_boxes_per_image, 0)
if num_det_boxes_per_image is None:
num_det_boxes_per_image = tf.shape(detection_boxes)[1:2]
else:
num_det_boxes_per_image = tf.expand_dims(num_det_boxes_per_image, 0)
if is_annotated_batched is None:
is_annotated_batched = tf.constant([True])
else:
is_annotated_batched = tf.expand_dims(is_annotated_batched, 0)
else:
if num_gt_boxes_per_image is None:
num_gt_boxes_per_image = tf.tile(
tf.shape(groundtruth_boxes)[1:2],
multiples=tf.shape(groundtruth_boxes)[0:1])
if num_det_boxes_per_image is None:
num_det_boxes_per_image = tf.tile(
tf.shape(detection_boxes)[1:2],
multiples=tf.shape(detection_boxes)[0:1])
if is_annotated_batched is None:
is_annotated_batched = tf.ones_like(image_id, dtype=tf.bool)
# Filter images based on is_annotated_batched and match detections.
image_info = [tf.boolean_mask(tensor, is_annotated_batched) for tensor in
[groundtruth_boxes, groundtruth_classes,
num_gt_boxes_per_image, detection_boxes, detection_classes,
num_det_boxes_per_image]]
is_class_matched = tf.map_fn(
self.match_single_image_info, image_info, dtype=tf.int64)
y_true = tf.squeeze(is_class_matched)
y_pred = tf.squeeze(tf.boolean_mask(detection_scores, is_annotated_batched))
ece, update_op = calibration_metrics.expected_calibration_error(
y_true, y_pred)
return {'CalibrationError/ExpectedCalibrationError': (ece, update_op)}
def add_single_ground_truth_image_info(self, image_id, groundtruth_dict):
"""Adds groundtruth for a single image to be used for evaluation.
Args:
image_id: A unique string/integer identifier for the image.
groundtruth_dict: A dictionary of groundtruth numpy arrays required
for evaluations.
"""
raise NotImplementedError
def add_single_detected_image_info(self, image_id, detections_dict):
"""Adds detections for a single image to be used for evaluation.
Args:
image_id: A unique string/integer identifier for the image.
detections_dict: A dictionary of detection numpy arrays required for
evaluation.
"""
raise NotImplementedError
def evaluate(self):
"""Evaluates detections and returns a dictionary of metrics."""
raise NotImplementedError
def clear(self):
"""Clears the state to prepare for a fresh evaluation."""
raise NotImplementedError
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for tensorflow_models.object_detection.metrics.calibration_evaluation.""" # pylint: disable=line-too-long
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.metrics import calibration_evaluation
def _get_categories_list():
return [{
'id': 1,
'name': 'person'
}, {
'id': 2,
'name': 'dog'
}, {
'id': 3,
'name': 'cat'
}]
class CalibrationDetectionEvaluationTest(tf.test.TestCase):
def _get_ece(self, ece_op, update_op):
"""Return scalar expected calibration error."""
with self.test_session() as sess:
metrics_vars = tf.get_collection(tf.GraphKeys.METRIC_VARIABLES)
sess.run(tf.variables_initializer(var_list=metrics_vars))
_ = sess.run(update_op)
return sess.run(ece_op)
def testGetECEWithMatchingGroundtruthAndDetections(self):
"""Tests that ECE is calculated correctly when box matches exist."""
calibration_evaluator = calibration_evaluation.CalibrationDetectionEvaluator(
_get_categories_list(), iou_threshold=0.5)
input_data_fields = standard_fields.InputDataFields
detection_fields = standard_fields.DetectionResultFields
# All gt and detection boxes match.
base_eval_dict = {
input_data_fields.key:
tf.constant(['image_1', 'image_2', 'image_3']),
input_data_fields.groundtruth_boxes:
tf.constant([[[100., 100., 200., 200.]],
[[50., 50., 100., 100.]],
[[25., 25., 50., 50.]]],
dtype=tf.float32),
detection_fields.detection_boxes:
tf.constant([[[100., 100., 200., 200.]],
[[50., 50., 100., 100.]],
[[25., 25., 50., 50.]]],
dtype=tf.float32),
input_data_fields.groundtruth_classes:
tf.constant([[1], [2], [3]], dtype=tf.int64),
# Note that, in the zero ECE case, the detection class for image_2
# should NOT match groundtruth, since the detection score is zero.
detection_fields.detection_scores:
tf.constant([[1.0], [0.0], [1.0]], dtype=tf.float32)
}
# Zero ECE (perfectly calibrated).
zero_ece_eval_dict = base_eval_dict.copy()
zero_ece_eval_dict[detection_fields.detection_classes] = tf.constant(
[[1], [1], [3]], dtype=tf.int64)
zero_ece_op, zero_ece_update_op = (
calibration_evaluator.get_estimator_eval_metric_ops(zero_ece_eval_dict)
['CalibrationError/ExpectedCalibrationError'])
zero_ece = self._get_ece(zero_ece_op, zero_ece_update_op)
self.assertAlmostEqual(zero_ece, 0.0)
# ECE of 1 (poorest calibration).
one_ece_eval_dict = base_eval_dict.copy()
one_ece_eval_dict[detection_fields.detection_classes] = tf.constant(
[[3], [2], [1]], dtype=tf.int64)
one_ece_op, one_ece_update_op = (
calibration_evaluator.get_estimator_eval_metric_ops(one_ece_eval_dict)
['CalibrationError/ExpectedCalibrationError'])
one_ece = self._get_ece(one_ece_op, one_ece_update_op)
self.assertAlmostEqual(one_ece, 1.0)
def testGetECEWithUnmatchedGroundtruthAndDetections(self):
"""Tests that ECE is correctly calculated when boxes are unmatched."""
calibration_evaluator = calibration_evaluation.CalibrationDetectionEvaluator(
_get_categories_list(), iou_threshold=0.5)
input_data_fields = standard_fields.InputDataFields
detection_fields = standard_fields.DetectionResultFields
# No gt and detection boxes match.
eval_dict = {
input_data_fields.key:
tf.constant(['image_1', 'image_2', 'image_3']),
input_data_fields.groundtruth_boxes:
tf.constant([[[100., 100., 200., 200.]],
[[50., 50., 100., 100.]],
[[25., 25., 50., 50.]]],
dtype=tf.float32),
detection_fields.detection_boxes:
tf.constant([[[50., 50., 100., 100.]],
[[25., 25., 50., 50.]],
[[100., 100., 200., 200.]]],
dtype=tf.float32),
input_data_fields.groundtruth_classes:
tf.constant([[1], [2], [3]], dtype=tf.int64),
detection_fields.detection_classes:
tf.constant([[1], [1], [3]], dtype=tf.int64),
# Detection scores of zero when boxes are unmatched = ECE of zero.
detection_fields.detection_scores:
tf.constant([[0.0], [0.0], [0.0]], dtype=tf.float32)
}
ece_op, update_op = calibration_evaluator.get_estimator_eval_metric_ops(
eval_dict)['CalibrationError/ExpectedCalibrationError']
ece = self._get_ece(ece_op, update_op)
self.assertAlmostEqual(ece, 0.0)
def testGetECEWithBatchedDetections(self):
"""Tests that ECE is correct with multiple detections per image."""
calibration_evaluator = calibration_evaluation.CalibrationDetectionEvaluator(
_get_categories_list(), iou_threshold=0.5)
input_data_fields = standard_fields.InputDataFields
detection_fields = standard_fields.DetectionResultFields
# Note that image_2 has mismatched classes and detection scores but should
# still produce ECE of 0 because detection scores are also 0.
eval_dict = {
input_data_fields.key:
tf.constant(['image_1', 'image_2', 'image_3']),
input_data_fields.groundtruth_boxes:
tf.constant([[[100., 100., 200., 200.], [50., 50., 100., 100.]],
[[50., 50., 100., 100.], [100., 100., 200., 200.]],
[[25., 25., 50., 50.], [100., 100., 200., 200.]]],
dtype=tf.float32),
detection_fields.detection_boxes:
tf.constant([[[100., 100., 200., 200.], [50., 50., 100., 100.]],
[[50., 50., 100., 100.], [25., 25., 50., 50.]],
[[25., 25., 50., 50.], [100., 100., 200., 200.]]],
dtype=tf.float32),
input_data_fields.groundtruth_classes:
tf.constant([[1, 2], [2, 3], [3, 1]], dtype=tf.int64),
detection_fields.detection_classes:
tf.constant([[1, 2], [1, 1], [3, 1]], dtype=tf.int64),
detection_fields.detection_scores:
tf.constant([[1.0, 1.0], [0.0, 0.0], [1.0, 1.0]], dtype=tf.float32)
}
ece_op, update_op = calibration_evaluator.get_estimator_eval_metric_ops(
eval_dict)['CalibrationError/ExpectedCalibrationError']
ece = self._get_ece(ece_op, update_op)
self.assertAlmostEqual(ece, 0.0)
def testGetECEWhenImagesFilteredByIsAnnotated(self):
"""Tests that ECE is correct when detections filtered by is_annotated."""
calibration_evaluator = calibration_evaluation.CalibrationDetectionEvaluator(
_get_categories_list(), iou_threshold=0.5)
input_data_fields = standard_fields.InputDataFields
detection_fields = standard_fields.DetectionResultFields
# ECE will be 0 only if the third image is filtered by is_annotated.
eval_dict = {
input_data_fields.key:
tf.constant(['image_1', 'image_2', 'image_3']),
input_data_fields.groundtruth_boxes:
tf.constant([[[100., 100., 200., 200.]],
[[50., 50., 100., 100.]],
[[25., 25., 50., 50.]]],
dtype=tf.float32),
detection_fields.detection_boxes:
tf.constant([[[100., 100., 200., 200.]],
[[50., 50., 100., 100.]],
[[25., 25., 50., 50.]]],
dtype=tf.float32),
input_data_fields.groundtruth_classes:
tf.constant([[1], [2], [1]], dtype=tf.int64),
detection_fields.detection_classes:
tf.constant([[1], [1], [3]], dtype=tf.int64),
detection_fields.detection_scores:
tf.constant([[1.0], [0.0], [1.0]], dtype=tf.float32),
'is_annotated': tf.constant([True, True, False], dtype=tf.bool)
}
ece_op, update_op = calibration_evaluator.get_estimator_eval_metric_ops(
eval_dict)['CalibrationError/ExpectedCalibrationError']
ece = self._get_ece(ece_op, update_op)
self.assertAlmostEqual(ece, 0.0)
if __name__ == '__main__':
tf.test.main()
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Object detection calibration metrics.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow.python.ops import metrics_impl
def _safe_div(numerator, denominator):
"""Divides two tensors element-wise, returning 0 if the denominator is <= 0.
Args:
numerator: A real `Tensor`.
denominator: A real `Tensor`, with dtype matching `numerator`.
Returns:
0 if `denominator` <= 0, else `numerator` / `denominator`
"""
t = tf.truediv(numerator, denominator)
zero = tf.zeros_like(t, dtype=denominator.dtype)
condition = tf.greater(denominator, zero)
zero = tf.cast(zero, t.dtype)
return tf.where(condition, t, zero)
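# For example (illustrative values), _safe_div(tf.constant([1., 2.]),
# tf.constant([2., 0.])) evaluates to [0.5, 0.]: the zero denominator maps to
# zero instead of producing an inf/NaN.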
def _ece_from_bins(bin_counts, bin_true_sum, bin_preds_sum, name):
"""Calculates Expected Calibration Error from accumulated statistics."""
bin_accuracies = _safe_div(bin_true_sum, bin_counts)
bin_confidences = _safe_div(bin_preds_sum, bin_counts)
abs_bin_errors = tf.abs(bin_accuracies - bin_confidences)
bin_weights = _safe_div(bin_counts, tf.reduce_sum(bin_counts))
return tf.reduce_sum(abs_bin_errors * bin_weights, name=name)
def expected_calibration_error(y_true, y_pred, nbins=20):
"""Calculates Expected Calibration Error (ECE).
ECE is a scalar summary statistic of calibration error. It is the
sample-weighted average of the difference between the predicted and true
probabilities of a positive detection across uniformly-spaced model
confidences [0, 1]. See referenced paper for a thorough explanation.
Reference:
Guo, et. al, "On Calibration of Modern Neural Networks"
Page 2, Expected Calibration Error (ECE).
https://arxiv.org/pdf/1706.04599.pdf
This function creates three local variables, `bin_counts`, `bin_true_sum`, and
`bin_preds_sum` that are used to compute ECE. For estimation of the metric
over a stream of data, the function creates an `update_op` operation that
updates these variables and returns the ECE.
Args:
y_true: 1-D tf.int64 Tensor of binarized ground truth, corresponding to each
prediction in y_pred.
y_pred: 1-D tf.float32 tensor of model confidence scores in range
[0.0, 1.0].
nbins: int specifying the number of uniformly-spaced bins into which y_pred
will be bucketed.
Returns:
value_op: A value metric op that returns ece.
update_op: An operation that increments the `bin_counts`, `bin_true_sum`,
and `bin_preds_sum` variables appropriately and whose value matches `ece`.
Raises:
InvalidArgumentError: if y_pred is not in [0.0, 1.0].
"""
bin_counts = metrics_impl.metric_variable(
[nbins], tf.float32, name='bin_counts')
bin_true_sum = metrics_impl.metric_variable(
[nbins], tf.float32, name='true_sum')
bin_preds_sum = metrics_impl.metric_variable(
[nbins], tf.float32, name='preds_sum')
with tf.control_dependencies([
tf.assert_greater_equal(y_pred, 0.0),
tf.assert_less_equal(y_pred, 1.0),
]):
bin_ids = tf.histogram_fixed_width_bins(y_pred, [0.0, 1.0], nbins=nbins)
with tf.control_dependencies([bin_ids]):
update_bin_counts_op = tf.assign_add(
bin_counts, tf.to_float(tf.bincount(bin_ids, minlength=nbins)))
update_bin_true_sum_op = tf.assign_add(
bin_true_sum,
tf.to_float(tf.bincount(bin_ids, weights=y_true, minlength=nbins)))
update_bin_preds_sum_op = tf.assign_add(
bin_preds_sum,
tf.to_float(tf.bincount(bin_ids, weights=y_pred, minlength=nbins)))
ece_update_op = _ece_from_bins(
update_bin_counts_op,
update_bin_true_sum_op,
update_bin_preds_sum_op,
name='update_op')
ece = _ece_from_bins(bin_counts, bin_true_sum, bin_preds_sum, name='value')
return ece, ece_update_op
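# Illustrative NumPy cross-check (not used by the TF code above) of the same
# binned computation; it reproduces the 0.18 value exercised in the unit tests
# for y_pred = [0., 0.2, 0.4, 0.5, 1.0], y_true = [0, 0, 1, 0, 1] with nbins=2.
def _expected_calibration_error_numpy(y_true, y_pred, nbins=20):
  import numpy as np  # Local import; only needed for this reference sketch.
  y_true = np.asarray(y_true, dtype=np.float64)
  y_pred = np.asarray(y_pred, dtype=np.float64)
  # Same bin assignment as tf.histogram_fixed_width_bins over [0.0, 1.0].
  bin_ids = np.minimum((y_pred * nbins).astype(np.int64), nbins - 1)
  counts = np.bincount(bin_ids, minlength=nbins).astype(np.float64)
  true_sum = np.bincount(bin_ids, weights=y_true, minlength=nbins)
  preds_sum = np.bincount(bin_ids, weights=y_pred, minlength=nbins)
  nonzero = counts > 0
  abs_errors = np.zeros(nbins)
  abs_errors[nonzero] = np.abs(true_sum[nonzero] / counts[nonzero] -
                               preds_sum[nonzero] / counts[nonzero])
  return np.sum(abs_errors * counts / counts.sum())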
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for calibration_metrics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import tensorflow as tf
from object_detection.metrics import calibration_metrics
class CalibrationLibTest(tf.test.TestCase):
@staticmethod
def _get_calibration_placeholders():
"""Returns TF placeholders for y_true and y_pred."""
return (tf.placeholder(tf.int64, shape=(None)),
tf.placeholder(tf.float32, shape=(None)))
def test_expected_calibration_error_all_bins_filled(self):
"""Test expected calibration error when all bins contain predictions."""
y_true, y_pred = self._get_calibration_placeholders()
expected_ece_op, update_op = calibration_metrics.expected_calibration_error(
y_true, y_pred, nbins=2)
with self.test_session() as sess:
metrics_vars = tf.get_collection(tf.GraphKeys.METRIC_VARIABLES)
sess.run(tf.variables_initializer(var_list=metrics_vars))
# Bin calibration errors (|confidence - accuracy| * bin_weight):
# - [0,0.5): |0.2 - 0.333| * (3/5) = 0.08
# - [0.5, 1]: |0.75 - 0.5| * (2/5) = 0.1
sess.run(
update_op,
feed_dict={
y_pred: np.array([0., 0.2, 0.4, 0.5, 1.0]),
y_true: np.array([0, 0, 1, 0, 1])
})
actual_ece = 0.08 + 0.1
expected_ece = sess.run(expected_ece_op)
self.assertAlmostEqual(actual_ece, expected_ece)
def test_expected_calibration_error_all_bins_not_filled(self):
"""Test expected calibration error when no predictions for one bin."""
y_true, y_pred = self._get_calibration_placeholders()
expected_ece_op, update_op = calibration_metrics.expected_calibration_error(
y_true, y_pred, nbins=2)
with self.test_session() as sess:
metrics_vars = tf.get_collection(tf.GraphKeys.METRIC_VARIABLES)
sess.run(tf.variables_initializer(var_list=metrics_vars))
      # Bin calibration errors (|confidence - accuracy| * bin_weight):
      # - [0, 0.5): |0.2 - 0.333| * (3/3) = 0.133
      # - [0.5, 1]: no predictions fall in this bin, so it contributes nothing.
sess.run(
update_op,
feed_dict={
y_pred: np.array([0., 0.2, 0.4]),
y_true: np.array([0, 0, 1])
})
actual_ece = np.abs(0.2 - (1 / 3.))
expected_ece = sess.run(expected_ece_op)
self.assertAlmostEqual(actual_ece, expected_ece)
def test_expected_calibration_error_with_multiple_data_streams(self):
"""Test expected calibration error when multiple data batches provided."""
y_true, y_pred = self._get_calibration_placeholders()
expected_ece_op, update_op = calibration_metrics.expected_calibration_error(
y_true, y_pred, nbins=2)
with self.test_session() as sess:
metrics_vars = tf.get_collection(tf.GraphKeys.METRIC_VARIABLES)
sess.run(tf.variables_initializer(var_list=metrics_vars))
# Identical data to test_expected_calibration_error_all_bins_filled,
# except split over three batches.
sess.run(
update_op,
feed_dict={
y_pred: np.array([0., 0.2]),
y_true: np.array([0, 0])
})
sess.run(
update_op,
feed_dict={
y_pred: np.array([0.4, 0.5]),
y_true: np.array([1, 0])
})
sess.run(
update_op, feed_dict={
y_pred: np.array([1.0]),
y_true: np.array([1])
})
actual_ece = 0.08 + 0.1
expected_ece = sess.run(expected_ece_op)
self.assertAlmostEqual(actual_ece, expected_ece)
if __name__ == '__main__':
tf.test.main()
......@@ -51,6 +51,7 @@ MODEL_BUILD_UTIL_MAP = {
inputs.create_eval_input_fn,
'create_predict_input_fn':
inputs.create_predict_input_fn,
'detection_model_fn_base': model_builder.build,
}
......@@ -184,7 +185,8 @@ def unstack_batch(tensor_dict, unpad_groundtruth_tensors=True):
return unbatched_tensor_dict
def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
postprocess_on_cpu=False):
"""Creates a model function for `Estimator`.
Args:
......@@ -193,6 +195,8 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
hparams: `HParams` object.
use_tpu: Boolean indicating whether model should be constructed for
use on TPU.
    postprocess_on_cpu: When use_tpu and postprocess_on_cpu are true,
      postprocessing is scheduled on the host CPU.
Returns:
`model_fn` for `Estimator`.
......@@ -282,9 +286,20 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
prediction_dict = detection_model.predict(
preprocessed_images,
features[fields.InputDataFields.true_image_shape])
def postprocess_wrapper(args):
return detection_model.postprocess(args[0], args[1])
if mode in (tf.estimator.ModeKeys.EVAL, tf.estimator.ModeKeys.PREDICT):
detections = detection_model.postprocess(
prediction_dict, features[fields.InputDataFields.true_image_shape])
if use_tpu and postprocess_on_cpu:
detections = tf.contrib.tpu.outside_compilation(
postprocess_wrapper,
(prediction_dict,
features[fields.InputDataFields.true_image_shape]))
else:
detections = postprocess_wrapper((
prediction_dict,
features[fields.InputDataFields.true_image_shape]))
if mode == tf.estimator.ModeKeys.TRAIN:
if train_config.fine_tune_checkpoint and hparams.load_pretrained:
......@@ -501,6 +516,8 @@ def create_estimator_and_inputs(run_config,
params=None,
override_eval_num_epochs=True,
save_final_config=False,
postprocess_on_cpu=False,
export_to_tpu=None,
**kwargs):
"""Creates `Estimator`, input functions, and steps.
......@@ -535,10 +552,15 @@ def create_estimator_and_inputs(run_config,
is True.
params: Parameter dictionary passed from the estimator. Only used if
`use_tpu_estimator` is True.
override_eval_num_epochs: Whether to overwrite the number of epochs to
1 for eval_input.
override_eval_num_epochs: Whether to overwrite the number of epochs to 1 for
eval_input.
save_final_config: Whether to save final config (obtained after applying
overrides) to `estimator.model_dir`.
    postprocess_on_cpu: When use_tpu and postprocess_on_cpu are true,
      postprocessing is scheduled on the host CPU.
    export_to_tpu: When use_tpu and export_to_tpu are true,
      `export_savedmodel()` exports a metagraph for serving on TPU in addition
      to the one on CPU.
**kwargs: Additional keyword arguments for configuration override.
Returns:
......@@ -561,12 +583,14 @@ def create_estimator_and_inputs(run_config,
create_train_input_fn = MODEL_BUILD_UTIL_MAP['create_train_input_fn']
create_eval_input_fn = MODEL_BUILD_UTIL_MAP['create_eval_input_fn']
create_predict_input_fn = MODEL_BUILD_UTIL_MAP['create_predict_input_fn']
detection_model_fn_base = MODEL_BUILD_UTIL_MAP['detection_model_fn_base']
configs = get_configs_from_pipeline_file(pipeline_config_path,
config_override=config_override)
configs = get_configs_from_pipeline_file(
pipeline_config_path, config_override=config_override)
kwargs.update({
'train_steps': train_steps,
'sample_1_of_n_eval_examples': sample_1_of_n_eval_examples
'sample_1_of_n_eval_examples': sample_1_of_n_eval_examples,
'use_bfloat16': configs['train_config'].use_bfloat16 and use_tpu
})
if override_eval_num_epochs:
kwargs.update({'eval_num_epochs': 1})
......@@ -595,7 +619,7 @@ def create_estimator_and_inputs(run_config,
train_steps = train_config.num_steps
detection_model_fn = functools.partial(
model_builder.build, model_config=model_config)
detection_model_fn_base, model_config=model_config)
# Create the input functions for TRAIN/EVAL/PREDICT.
train_input_fn = create_train_input_fn(
......@@ -618,10 +642,13 @@ def create_estimator_and_inputs(run_config,
predict_input_fn = create_predict_input_fn(
model_config=model_config, predict_input_config=eval_input_configs[0])
export_to_tpu = hparams.get('export_to_tpu', False)
# Read export_to_tpu from hparams if not passed.
if export_to_tpu is None:
export_to_tpu = hparams.get('export_to_tpu', False)
tf.logging.info('create_estimator_and_inputs: use_tpu %s, export_to_tpu %s',
use_tpu, export_to_tpu)
model_fn = model_fn_creator(detection_model_fn, configs, hparams, use_tpu)
model_fn = model_fn_creator(detection_model_fn, configs, hparams, use_tpu,
postprocess_on_cpu)
if use_tpu_estimator:
estimator = tf.contrib.tpu.TPUEstimator(
model_fn=model_fn,
......@@ -630,7 +657,8 @@ def create_estimator_and_inputs(run_config,
eval_batch_size=num_shards * 1 if use_tpu else 1,
use_tpu=use_tpu,
config=run_config,
# TODO(lzc): Remove conditional after CMLE moves to TF 1.9
export_to_tpu=export_to_tpu,
eval_on_tpu=False, # Eval runs on CPU, so disable eval on TPU
params=params if params else {})
else:
estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
......
......@@ -29,6 +29,11 @@ import tensorflow as tf
from object_detection.utils import ops
slim = tf.contrib.slim
# Activation bound used for TPU v1. Activations will be clipped to
# [-ACTIVATION_BOUND, ACTIVATION_BOUND] when training with
# use_bounded_activations enabled.
ACTIVATION_BOUND = 6.0
def get_depth_fn(depth_multiplier, min_depth):
"""Builds a callable to compute depth (output channels) of conv filters.
......@@ -418,7 +423,9 @@ def fpn_top_down_feature_maps(image_features,
depth,
use_depthwise=False,
use_explicit_padding=False,
scope=None):
use_bounded_activations=False,
scope=None,
use_native_resize_op=False):
"""Generates `top-down` feature maps for Feature Pyramid Networks.
See https://arxiv.org/abs/1612.03144 for details.
......@@ -431,7 +438,12 @@ def fpn_top_down_feature_maps(image_features,
use_depthwise: whether to use depthwise separable conv instead of regular
conv.
use_explicit_padding: whether to use explicit padding.
use_bounded_activations: Whether or not to clip activations to range
[-ACTIVATION_BOUND, ACTIVATION_BOUND]. Bounded activations better lend
themselves to quantized inference.
scope: A scope name to wrap this op under.
    use_native_resize_op: If True, uses the tf.image.resize_nearest_neighbor op
      for upsampling instead of the reshape-and-broadcast implementation.
Returns:
feature_maps: an OrderedDict mapping keys (feature map names) to
......@@ -449,21 +461,36 @@ def fpn_top_down_feature_maps(image_features,
image_features[-1][1],
depth, [1, 1], activation_fn=None, normalizer_fn=None,
scope='projection_%d' % num_levels)
if use_bounded_activations:
top_down = tf.clip_by_value(top_down, -ACTIVATION_BOUND,
ACTIVATION_BOUND)
output_feature_maps_list.append(top_down)
output_feature_map_keys.append(
'top_down_%s' % image_features[-1][0])
for level in reversed(range(num_levels - 1)):
top_down = ops.nearest_neighbor_upsampling(top_down, 2)
if use_native_resize_op:
with tf.name_scope('nearest_neighbor_upsampling'):
top_down_shape = top_down.shape.as_list()
top_down = tf.image.resize_nearest_neighbor(
top_down, [top_down_shape[1] * 2, top_down_shape[2] * 2])
else:
top_down = ops.nearest_neighbor_upsampling(top_down, scale=2)
residual = slim.conv2d(
image_features[level][1], depth, [1, 1],
activation_fn=None, normalizer_fn=None,
scope='projection_%d' % (level + 1))
if use_bounded_activations:
residual = tf.clip_by_value(residual, -ACTIVATION_BOUND,
ACTIVATION_BOUND)
if use_explicit_padding:
# slice top_down to the same shape as residual
residual_shape = tf.shape(residual)
top_down = top_down[:, :residual_shape[1], :residual_shape[2], :]
top_down += residual
if use_bounded_activations:
top_down = tf.clip_by_value(top_down, -ACTIVATION_BOUND,
ACTIVATION_BOUND)
if use_depthwise:
conv_op = functools.partial(slim.separable_conv2d, depth_multiplier=1)
else:
......
......@@ -17,6 +17,7 @@
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from google.protobuf import text_format
......@@ -124,7 +125,36 @@ class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
(key, value.shape) for key, value in out_feature_maps.items())
self.assertDictEqual(expected_feature_map_shapes, out_feature_map_shapes)
# TODO(kaftan): Remove conditional after CMLE moves to TF 1.10
def test_get_expected_feature_map_shapes_with_inception_v2_use_depthwise(
self, use_keras):
image_features = {
'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32),
'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32),
'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32)
}
layout_copy = INCEPTION_V2_LAYOUT.copy()
layout_copy['use_depthwise'] = True
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=layout_copy,
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_feature_map_shapes = {
'Mixed_3c': (4, 28, 28, 256),
'Mixed_4c': (4, 14, 14, 576),
'Mixed_5c': (4, 7, 7, 1024),
'Mixed_5c_2_Conv2d_3_3x3_s2_512': (4, 4, 4, 512),
'Mixed_5c_2_Conv2d_4_3x3_s2_256': (4, 2, 2, 256),
'Mixed_5c_2_Conv2d_5_3x3_s2_256': (4, 1, 1, 256)}
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = dict(
(key, value.shape) for key, value in out_feature_maps.items())
self.assertDictEqual(expected_feature_map_shapes, out_feature_map_shapes)
def test_get_expected_feature_map_shapes_use_explicit_padding(
self, use_keras):
......@@ -297,12 +327,87 @@ class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
else:
self.assertSetEqual(expected_slim_variables, actual_variable_set)
# TODO(kaftan): Remove conditional after CMLE moves to TF 1.10
def test_get_expected_variable_names_with_inception_v2_use_depthwise(
self,
use_keras):
image_features = {
'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32),
'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32),
'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32)
}
layout_copy = INCEPTION_V2_LAYOUT.copy()
layout_copy['use_depthwise'] = True
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=layout_copy,
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_slim_variables = set([
'Mixed_5c_1_Conv2d_3_1x1_256/weights',
'Mixed_5c_1_Conv2d_3_1x1_256/biases',
'Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise/depthwise_weights',
'Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise/biases',
'Mixed_5c_2_Conv2d_3_3x3_s2_512/weights',
'Mixed_5c_2_Conv2d_3_3x3_s2_512/biases',
'Mixed_5c_1_Conv2d_4_1x1_128/weights',
'Mixed_5c_1_Conv2d_4_1x1_128/biases',
'Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise/depthwise_weights',
'Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise/biases',
'Mixed_5c_2_Conv2d_4_3x3_s2_256/weights',
'Mixed_5c_2_Conv2d_4_3x3_s2_256/biases',
'Mixed_5c_1_Conv2d_5_1x1_128/weights',
'Mixed_5c_1_Conv2d_5_1x1_128/biases',
'Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise/depthwise_weights',
'Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise/biases',
'Mixed_5c_2_Conv2d_5_3x3_s2_256/weights',
'Mixed_5c_2_Conv2d_5_3x3_s2_256/biases',
])
expected_keras_variables = set([
'FeatureMaps/Mixed_5c_1_Conv2d_3_1x1_256_conv/kernel',
'FeatureMaps/Mixed_5c_1_Conv2d_3_1x1_256_conv/bias',
('FeatureMaps/Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise_conv/'
'depthwise_kernel'),
('FeatureMaps/Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise_conv/'
'bias'),
'FeatureMaps/Mixed_5c_2_Conv2d_3_3x3_s2_512_conv/kernel',
'FeatureMaps/Mixed_5c_2_Conv2d_3_3x3_s2_512_conv/bias',
'FeatureMaps/Mixed_5c_1_Conv2d_4_1x1_128_conv/kernel',
'FeatureMaps/Mixed_5c_1_Conv2d_4_1x1_128_conv/bias',
('FeatureMaps/Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise_conv/'
'depthwise_kernel'),
('FeatureMaps/Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise_conv/'
'bias'),
'FeatureMaps/Mixed_5c_2_Conv2d_4_3x3_s2_256_conv/kernel',
'FeatureMaps/Mixed_5c_2_Conv2d_4_3x3_s2_256_conv/bias',
'FeatureMaps/Mixed_5c_1_Conv2d_5_1x1_128_conv/kernel',
'FeatureMaps/Mixed_5c_1_Conv2d_5_1x1_128_conv/bias',
('FeatureMaps/Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise_conv/'
'depthwise_kernel'),
('FeatureMaps/Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise_conv/'
'bias'),
'FeatureMaps/Mixed_5c_2_Conv2d_5_3x3_s2_256_conv/kernel',
'FeatureMaps/Mixed_5c_2_Conv2d_5_3x3_s2_256_conv/bias',
])
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
sess.run(feature_maps)
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
if use_keras:
self.assertSetEqual(expected_keras_variables, actual_variable_set)
else:
self.assertSetEqual(expected_slim_variables, actual_variable_set)
class FPNFeatureMapGeneratorTest(tf.test.TestCase):
@parameterized.parameters({'use_native_resize_op': True},
{'use_native_resize_op': False})
class FPNFeatureMapGeneratorTest(tf.test.TestCase, parameterized.TestCase):
def test_get_expected_feature_map_shapes(self):
def test_get_expected_feature_map_shapes(self, use_native_resize_op):
image_features = [
('block2', tf.random_uniform([4, 8, 8, 256], dtype=tf.float32)),
('block3', tf.random_uniform([4, 4, 4, 256], dtype=tf.float32)),
......@@ -310,7 +415,9 @@ class FPNFeatureMapGeneratorTest(tf.test.TestCase):
('block5', tf.random_uniform([4, 1, 1, 256], dtype=tf.float32))
]
feature_maps = feature_map_generators.fpn_top_down_feature_maps(
image_features=image_features, depth=128)
image_features=image_features,
depth=128,
use_native_resize_op=use_native_resize_op)
expected_feature_map_shapes = {
'top_down_block2': (4, 8, 8, 128),
......@@ -327,7 +434,95 @@ class FPNFeatureMapGeneratorTest(tf.test.TestCase):
for key, value in out_feature_maps.items()}
self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes)
def test_get_expected_feature_map_shapes_with_depthwise(self):
def test_use_bounded_activations_add_operations(self, use_native_resize_op):
tf_graph = tf.Graph()
with tf_graph.as_default():
image_features = [('block2',
tf.random_uniform([4, 8, 8, 256], dtype=tf.float32)),
('block3',
tf.random_uniform([4, 4, 4, 256], dtype=tf.float32)),
('block4',
tf.random_uniform([4, 2, 2, 256], dtype=tf.float32)),
('block5',
tf.random_uniform([4, 1, 1, 256], dtype=tf.float32))]
feature_map_generators.fpn_top_down_feature_maps(
image_features=image_features,
depth=128,
use_bounded_activations=True,
use_native_resize_op=use_native_resize_op)
expected_added_operations = dict.fromkeys([
'top_down/clip_by_value', 'top_down/clip_by_value_1',
'top_down/clip_by_value_2', 'top_down/clip_by_value_3',
'top_down/clip_by_value_4', 'top_down/clip_by_value_5',
'top_down/clip_by_value_6'
])
op_names = {op.name: None for op in tf_graph.get_operations()}
self.assertDictContainsSubset(expected_added_operations, op_names)
def test_use_bounded_activations_clip_value(self, use_native_resize_op):
tf_graph = tf.Graph()
with tf_graph.as_default():
image_features = [
('block2', 255 * tf.ones([4, 8, 8, 256], dtype=tf.float32)),
('block3', 255 * tf.ones([4, 4, 4, 256], dtype=tf.float32)),
('block4', 255 * tf.ones([4, 2, 2, 256], dtype=tf.float32)),
('block5', 255 * tf.ones([4, 1, 1, 256], dtype=tf.float32))
]
feature_map_generators.fpn_top_down_feature_maps(
image_features=image_features,
depth=128,
use_bounded_activations=True,
use_native_resize_op=use_native_resize_op)
expected_clip_by_value_ops = [
'top_down/clip_by_value', 'top_down/clip_by_value_1',
'top_down/clip_by_value_2', 'top_down/clip_by_value_3',
'top_down/clip_by_value_4', 'top_down/clip_by_value_5',
'top_down/clip_by_value_6'
]
# Gathers activation tensors before and after clip_by_value operations.
activations = {}
for clip_by_value_op in expected_clip_by_value_ops:
clip_input_tensor = tf_graph.get_operation_by_name(
'{}/Minimum'.format(clip_by_value_op)).inputs[0]
clip_output_tensor = tf_graph.get_tensor_by_name(
'{}:0'.format(clip_by_value_op))
activations.update({
'before_{}'.format(clip_by_value_op): clip_input_tensor,
'after_{}'.format(clip_by_value_op): clip_output_tensor,
})
expected_lower_bound = -feature_map_generators.ACTIVATION_BOUND
expected_upper_bound = feature_map_generators.ACTIVATION_BOUND
init_op = tf.global_variables_initializer()
with self.test_session() as session:
session.run(init_op)
activations_output = session.run(activations)
for clip_by_value_op in expected_clip_by_value_ops:
        # Before clipping, activations are beyond the expected bound because
        # of large input image_features values.
activations_before_clipping = (
activations_output['before_{}'.format(clip_by_value_op)])
before_clipping_lower_bound = np.amin(activations_before_clipping)
before_clipping_upper_bound = np.amax(activations_before_clipping)
self.assertLessEqual(before_clipping_lower_bound,
expected_lower_bound)
self.assertGreaterEqual(before_clipping_upper_bound,
expected_upper_bound)
        # After clipping, activations are bounded as expected.
activations_after_clipping = (
activations_output['after_{}'.format(clip_by_value_op)])
after_clipping_lower_bound = np.amin(activations_after_clipping)
after_clipping_upper_bound = np.amax(activations_after_clipping)
self.assertGreaterEqual(after_clipping_lower_bound,
expected_lower_bound)
self.assertLessEqual(after_clipping_upper_bound, expected_upper_bound)
def test_get_expected_feature_map_shapes_with_depthwise(
self, use_native_resize_op):
image_features = [
('block2', tf.random_uniform([4, 8, 8, 256], dtype=tf.float32)),
('block3', tf.random_uniform([4, 4, 4, 256], dtype=tf.float32)),
......@@ -335,7 +530,10 @@ class FPNFeatureMapGeneratorTest(tf.test.TestCase):
('block5', tf.random_uniform([4, 1, 1, 256], dtype=tf.float32))
]
feature_maps = feature_map_generators.fpn_top_down_feature_maps(
image_features=image_features, depth=128, use_depthwise=True)
image_features=image_features,
depth=128,
use_depthwise=True,
use_native_resize_op=use_native_resize_op)
expected_feature_map_shapes = {
'top_down_block2': (4, 8, 8, 128),
......
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A wrapper around the Keras MobilenetV1 models for object detection."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from object_detection.core import freezable_batch_norm
def _fixed_padding(inputs, kernel_size, rate=1): # pylint: disable=invalid-name
"""Pads the input along the spatial dimensions independently of input size.
Pads the input such that if it was used in a convolution with 'VALID' padding,
the output would have the same dimensions as if the unpadded input was used
in a convolution with 'SAME' padding.
Args:
inputs: A tensor of size [batch, height_in, width_in, channels].
kernel_size: The kernel to be used in the conv2d or max_pool2d operation.
rate: An integer, rate for atrous convolution.
Returns:
output: A tensor of size [batch, height_out, width_out, channels] with the
input, either intact (if kernel_size == 1) or padded (if kernel_size > 1).
"""
  kernel_size_effective = [kernel_size[0] + (kernel_size[0] - 1) * (rate - 1),
                           kernel_size[1] + (kernel_size[1] - 1) * (rate - 1)]
pad_total = [kernel_size_effective[0] - 1, kernel_size_effective[1] - 1]
pad_beg = [pad_total[0] // 2, pad_total[1] // 2]
pad_end = [pad_total[0] - pad_beg[0], pad_total[1] - pad_beg[1]]
padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg[0], pad_end[0]],
[pad_beg[1], pad_end[1]], [0, 0]])
return padded_inputs
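# Worked example for _fixed_padding (illustrative numbers): with
# kernel_size=(3, 3) and rate=1 the effective kernel is 3x3, pad_total is
# (2, 2), and one row/column of zeros is added on each spatial side. A
# [1, 10, 10, 3] input therefore becomes [1, 12, 12, 3], and a stride-1 'VALID'
# 3x3 convolution over it produces a 10x10 output, matching what 'SAME'
# padding would give on the unpadded input.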
class _LayersOverride(object):
"""Alternative Keras layers interface for the Keras MobileNetV1."""
def __init__(self,
batchnorm_training,
default_batchnorm_momentum=0.999,
conv_hyperparams=None,
use_explicit_padding=False,
alpha=1.0,
min_depth=None):
"""Alternative tf.keras.layers interface, for use by the Keras MobileNetV1.
It is used by the Keras applications kwargs injection API to
modify the MobilenetV1 Keras application with changes required by
the Object Detection API.
These injected interfaces make the following changes to the network:
- Applies the Object Detection hyperparameter configuration
- Supports FreezableBatchNorms
- Adds support for a min number of filters for each layer
- Makes the `alpha` parameter affect the final convolution block even if it
is less than 1.0
- Adds support for explicit padding of convolutions
Args:
batchnorm_training: Bool. Assigned to Batch norm layer `training` param
when constructing `freezable_batch_norm.FreezableBatchNorm` layers.
default_batchnorm_momentum: Float. When 'conv_hyperparams' is None,
batch norm layers will be constructed using this value as the momentum.
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops. Optionally set to `None`
to use default mobilenet_v1 layer builders.
      use_explicit_padding: If True, uses 'valid' padding for convolutions,
        but explicitly pre-pads inputs so that the output dimensions are the
        same as if 'same' padding were used. Off by default.
alpha: The width multiplier referenced in the MobileNetV1 paper. It
modifies the number of filters in each convolutional layer. It's called
depth multiplier in Keras application MobilenetV1.
min_depth: Minimum number of filters in the convolutional layers.
"""
self._alpha = alpha
self._batchnorm_training = batchnorm_training
self._default_batchnorm_momentum = default_batchnorm_momentum
self._conv_hyperparams = conv_hyperparams
self._use_explicit_padding = use_explicit_padding
self._min_depth = min_depth
self.regularizer = tf.keras.regularizers.l2(0.00004 * 0.5)
self.initializer = tf.truncated_normal_initializer(stddev=0.09)
def _FixedPaddingLayer(self, kernel_size, rate=1):
return tf.keras.layers.Lambda(
lambda x: _fixed_padding(x, kernel_size, rate))
def Conv2D(self, filters, kernel_size, **kwargs):
"""Builds a Conv2D layer according to the current Object Detection config.
Overrides the Keras MobileNetV1 application's convolutions with ones that
follow the spec specified by the Object Detection hyperparameters.
Args:
filters: The number of filters to use for the convolution.
kernel_size: The kernel size to specify the height and width of the 2D
convolution window.
**kwargs: Keyword args specified by the Keras application for
constructing the convolution.
Returns:
A one-arg callable that will either directly apply a Keras Conv2D layer to
the input argument, or that will first pad the input then apply a Conv2D
layer.
"""
# Apply the width multiplier and the minimum depth to the convolution layers
filters = int(filters * self._alpha)
if self._min_depth and filters < self._min_depth:
filters = self._min_depth
if self._conv_hyperparams:
kwargs = self._conv_hyperparams.params(**kwargs)
else:
kwargs['kernel_regularizer'] = self.regularizer
kwargs['kernel_initializer'] = self.initializer
kwargs['padding'] = 'same'
if self._use_explicit_padding and kernel_size > 1:
kwargs['padding'] = 'valid'
def padded_conv(features): # pylint: disable=invalid-name
padded_features = self._FixedPaddingLayer(kernel_size)(features)
return tf.keras.layers.Conv2D(
filters, kernel_size, **kwargs)(padded_features)
return padded_conv
else:
return tf.keras.layers.Conv2D(filters, kernel_size, **kwargs)
def DepthwiseConv2D(self, kernel_size, **kwargs):
"""Builds a DepthwiseConv2D according to the Object Detection config.
    Overrides the Keras MobileNetV1 application's convolutions with ones that
    follow the spec specified by the Object Detection hyperparameters.
Args:
kernel_size: The kernel size to specify the height and width of the 2D
convolution window.
**kwargs: Keyword args specified by the Keras application for
constructing the convolution.
Returns:
A one-arg callable that will either directly apply a Keras DepthwiseConv2D
layer to the input argument, or that will first pad the input then apply
the depthwise convolution.
"""
if self._conv_hyperparams:
kwargs = self._conv_hyperparams.params(**kwargs)
else:
kwargs['depthwise_initializer'] = self.initializer
kwargs['padding'] = 'same'
if self._use_explicit_padding:
kwargs['padding'] = 'valid'
def padded_depthwise_conv(features): # pylint: disable=invalid-name
padded_features = self._FixedPaddingLayer(kernel_size)(features)
return tf.keras.layers.DepthwiseConv2D(
kernel_size, **kwargs)(padded_features)
return padded_depthwise_conv
else:
return tf.keras.layers.DepthwiseConv2D(kernel_size, **kwargs)
def BatchNormalization(self, **kwargs):
"""Builds a normalization layer.
Overrides the Keras application batch norm with the norm specified by the
Object Detection configuration.
Args:
**kwargs: Only the name is used, all other params ignored.
Required for matching `layers.BatchNormalization` calls in the Keras
application.
Returns:
A normalization layer specified by the Object Detection hyperparameter
configurations.
"""
name = kwargs.get('name')
if self._conv_hyperparams:
return self._conv_hyperparams.build_batch_norm(
training=self._batchnorm_training,
name=name)
else:
return freezable_batch_norm.FreezableBatchNorm(
training=self._batchnorm_training,
epsilon=1e-3,
momentum=self._default_batchnorm_momentum,
name=name)
def Input(self, shape):
"""Builds an Input layer.
Overrides the Keras application Input layer with one that uses a
tf.placeholder_with_default instead of a tf.placeholder. This is necessary
to ensure the application works when run on a TPU.
Args:
shape: The shape for the input layer to use. (Does not include a dimension
for the batch size).
Returns:
An input layer for the specified shape that internally uses a
placeholder_with_default.
"""
default_size = 224
default_batch_size = 1
shape = list(shape)
default_shape = [default_size if dim is None else dim for dim in shape]
input_tensor = tf.constant(0.0, shape=[default_batch_size] + default_shape)
placeholder_with_default = tf.placeholder_with_default(
input=input_tensor, shape=[None] + shape)
return tf.keras.layers.Input(tensor=placeholder_with_default)
# pylint: disable=unused-argument
def ReLU(self, *args, **kwargs):
"""Builds an activation layer.
Overrides the Keras application ReLU with the activation specified by the
Object Detection configuration.
Args:
*args: Ignored, required to match the `tf.keras.ReLU` interface
**kwargs: Only the name is used,
required to match `tf.keras.ReLU` interface
Returns:
An activation layer specified by the Object Detection hyperparameter
configurations.
"""
name = kwargs.get('name')
if self._conv_hyperparams:
return self._conv_hyperparams.build_activation_layer(name=name)
else:
return tf.keras.layers.Lambda(tf.nn.relu6, name=name)
# pylint: enable=unused-argument
# pylint: disable=unused-argument
def ZeroPadding2D(self, padding, **kwargs):
"""Replaces explicit padding in the Keras application with a no-op.
Args:
padding: The padding values for image height and width.
**kwargs: Ignored, required to match the Keras applications usage.
Returns:
A no-op identity lambda.
"""
return lambda x: x
# pylint: enable=unused-argument
# Forward all non-overridden methods to the keras layers
def __getattr__(self, item):
return getattr(tf.keras.layers, item)
# pylint: disable=invalid-name
def mobilenet_v1(batchnorm_training,
default_batchnorm_momentum=0.9997,
conv_hyperparams=None,
use_explicit_padding=False,
alpha=1.0,
min_depth=None,
**kwargs):
"""Instantiates the MobileNetV1 architecture, modified for object detection.
This wraps the MobileNetV1 tensorflow Keras application, but uses the
Keras application's kwargs-based monkey-patching API to override the Keras
architecture with the following changes:
- Changes the default batchnorm momentum to 0.9997
- Applies the Object Detection hyperparameter configuration
- Supports FreezableBatchNorms
- Adds support for a min number of filters for each layer
- Makes the `alpha` parameter affect the final convolution block even if it
is less than 1.0
- Adds support for explicit padding of convolutions
- Makes the Input layer use a tf.placeholder_with_default instead of a
tf.placeholder, to work on TPUs.
Args:
batchnorm_training: Bool. Assigned to Batch norm layer `training` param
when constructing `freezable_batch_norm.FreezableBatchNorm` layers.
default_batchnorm_momentum: Float. When 'conv_hyperparams' is None,
batch norm layers will be constructed using this value as the momentum.
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops. Optionally set to `None`
to use default mobilenet_v1 layer builders.
    use_explicit_padding: If True, uses 'valid' padding for convolutions,
      but explicitly pre-pads inputs so that the output dimensions are the
      same as if 'same' padding were used. Off by default.
alpha: The width multiplier referenced in the MobileNetV1 paper. It
modifies the number of filters in each convolutional layer.
min_depth: Minimum number of filters in the convolutional layers.
**kwargs: Keyword arguments forwarded directly to the
      `tf.keras.applications.MobileNet` method that constructs the Keras
model.
Returns:
A Keras model instance.
"""
layers_override = _LayersOverride(
batchnorm_training,
default_batchnorm_momentum=default_batchnorm_momentum,
conv_hyperparams=conv_hyperparams,
use_explicit_padding=use_explicit_padding,
min_depth=min_depth,
alpha=alpha)
return tf.keras.applications.MobileNet(
alpha=alpha, layers=layers_override, **kwargs)
# pylint: enable=invalid-name
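# Usage sketch (hypothetical; the layer name follows standard Keras MobileNet
# naming, also used in the unit tests): build the detection-flavored
# MobileNetV1 defined above and expose one intermediate feature map.
def _example_backbone():
  full_model = mobilenet_v1(
      batchnorm_training=False,
      alpha=1.0,
      min_depth=8,
      weights=None,       # do not load ImageNet weights
      include_top=False)  # drop the classification head
  conv_pw_13 = full_model.get_layer(name='conv_pw_13_relu').output
  return tf.keras.Model(inputs=full_model.inputs, outputs=conv_pw_13)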
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for mobilenet_v1.py.
This test mainly focuses on comparing slim MobilenetV1 and Keras MobilenetV1 for
object detection. To verify the consistency of the two models, we compare:
1. Output shape of each layer given different inputs
2. Number of global variables
We also visualize the model structure via Tensorboard, and compare the model
layout and the parameters of each Op to make sure the two implementations are
consistent.
"""
import itertools
import numpy as np
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.models.keras_models import mobilenet_v1
from object_detection.models.keras_models import test_utils
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
_KERAS_LAYERS_TO_CHECK = [
'conv1_relu',
'conv_dw_1_relu', 'conv_pw_1_relu',
'conv_dw_2_relu', 'conv_pw_2_relu',
'conv_dw_3_relu', 'conv_pw_3_relu',
'conv_dw_4_relu', 'conv_pw_4_relu',
'conv_dw_5_relu', 'conv_pw_5_relu',
'conv_dw_6_relu', 'conv_pw_6_relu',
'conv_dw_7_relu', 'conv_pw_7_relu',
'conv_dw_8_relu', 'conv_pw_8_relu',
'conv_dw_9_relu', 'conv_pw_9_relu',
'conv_dw_10_relu', 'conv_pw_10_relu',
'conv_dw_11_relu', 'conv_pw_11_relu',
'conv_dw_12_relu', 'conv_pw_12_relu',
'conv_dw_13_relu', 'conv_pw_13_relu',
]
_NUM_CHANNELS = 3
_BATCH_SIZE = 2
class MobilenetV1Test(test_case.TestCase):
def _build_conv_hyperparams(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
activation: RELU_6
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
batch_norm {
train: true,
scale: false,
center: true,
decay: 0.2,
epsilon: 0.1,
}
"""
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.KerasLayerHyperparams(conv_hyperparams)
def _create_application_with_layer_outputs(
self, layer_names, batchnorm_training,
conv_hyperparams=None,
use_explicit_padding=False,
alpha=1.0,
min_depth=None):
"""Constructs Keras MobilenetV1 that extracts intermediate layer outputs."""
if not layer_names:
layer_names = _KERAS_LAYERS_TO_CHECK
full_model = mobilenet_v1.mobilenet_v1(
batchnorm_training=batchnorm_training,
conv_hyperparams=conv_hyperparams,
weights=None,
use_explicit_padding=use_explicit_padding,
alpha=alpha,
min_depth=min_depth,
include_top=False)
layer_outputs = [full_model.get_layer(name=layer).output
for layer in layer_names]
return tf.keras.Model(
inputs=full_model.inputs,
outputs=layer_outputs)
def _check_returns_correct_shape(
self, image_height, image_width, depth_multiplier,
expected_feature_map_shape, use_explicit_padding=False, min_depth=8,
layer_names=None):
def graph_fn(image_tensor):
model = self._create_application_with_layer_outputs(
layer_names=layer_names,
batchnorm_training=False,
use_explicit_padding=use_explicit_padding,
min_depth=min_depth,
alpha=depth_multiplier)
return model(image_tensor)
image_tensor = np.random.rand(_BATCH_SIZE, image_height, image_width,
_NUM_CHANNELS).astype(np.float32)
feature_maps = self.execute(graph_fn, [image_tensor])
for feature_map, expected_shape in itertools.izip(
feature_maps, expected_feature_map_shape):
self.assertAllEqual(feature_map.shape, expected_shape)
def _check_returns_correct_shapes_with_dynamic_inputs(
self, image_height, image_width, depth_multiplier,
expected_feature_map_shape, use_explicit_padding=False, min_depth=8,
layer_names=None):
def graph_fn(image_height, image_width):
image_tensor = tf.random_uniform([_BATCH_SIZE, image_height, image_width,
_NUM_CHANNELS], dtype=tf.float32)
model = self._create_application_with_layer_outputs(
layer_names=layer_names,
batchnorm_training=False,
use_explicit_padding=use_explicit_padding,
alpha=depth_multiplier)
return model(image_tensor)
feature_maps = self.execute_cpu(graph_fn, [
np.array(image_height, dtype=np.int32),
np.array(image_width, dtype=np.int32)
])
for feature_map, expected_shape in itertools.izip(
feature_maps, expected_feature_map_shape):
self.assertAllEqual(feature_map.shape, expected_shape)
def _get_variables(self, depth_multiplier, layer_names=None):
g = tf.Graph()
with g.as_default():
preprocessed_inputs = tf.placeholder(
tf.float32, (4, None, None, _NUM_CHANNELS))
model = self._create_application_with_layer_outputs(
layer_names=layer_names,
batchnorm_training=False, use_explicit_padding=False,
alpha=depth_multiplier)
model(preprocessed_inputs)
return g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
def test_returns_correct_shapes_128(self):
image_height = 128
image_width = 128
depth_multiplier = 1.0
expected_feature_map_shape = (
test_utils.moblenet_v1_expected_feature_map_shape_128)
self._check_returns_correct_shape(
image_height, image_width, depth_multiplier, expected_feature_map_shape)
def test_returns_correct_shapes_128_explicit_padding(
self):
image_height = 128
image_width = 128
depth_multiplier = 1.0
expected_feature_map_shape = (
test_utils.moblenet_v1_expected_feature_map_shape_128_explicit_padding)
self._check_returns_correct_shape(
image_height, image_width, depth_multiplier, expected_feature_map_shape,
use_explicit_padding=True)
def test_returns_correct_shapes_with_dynamic_inputs(
self):
image_height = 128
image_width = 128
depth_multiplier = 1.0
expected_feature_map_shape = (
test_utils.mobilenet_v1_expected_feature_map_shape_with_dynamic_inputs)
self._check_returns_correct_shapes_with_dynamic_inputs(
image_height, image_width, depth_multiplier, expected_feature_map_shape)
def test_returns_correct_shapes_299(self):
image_height = 299
image_width = 299
depth_multiplier = 1.0
expected_feature_map_shape = (
test_utils.moblenet_v1_expected_feature_map_shape_299)
self._check_returns_correct_shape(
image_height, image_width, depth_multiplier, expected_feature_map_shape)
def test_returns_correct_shapes_enforcing_min_depth(
self):
image_height = 299
image_width = 299
depth_multiplier = 0.5**12
expected_feature_map_shape = (
test_utils.moblenet_v1_expected_feature_map_shape_enforcing_min_depth)
self._check_returns_correct_shape(
image_height, image_width, depth_multiplier, expected_feature_map_shape)
def test_hyperparam_override(self):
hyperparams = self._build_conv_hyperparams()
model = mobilenet_v1.mobilenet_v1(
batchnorm_training=True,
conv_hyperparams=hyperparams,
weights=None,
use_explicit_padding=False,
alpha=1.0,
min_depth=32,
include_top=False)
hyperparams.params()
bn_layer = model.get_layer(name='conv_pw_5_bn')
self.assertAllClose(bn_layer.momentum, 0.2)
self.assertAllClose(bn_layer.epsilon, 0.1)
def test_variable_count(self):
depth_multiplier = 1
variables = self._get_variables(depth_multiplier)
# 135 is the number of variables from slim MobilenetV1 model.
self.assertEqual(len(variables), 135)
if __name__ == '__main__':
tf.test.main()
......@@ -21,7 +21,8 @@ import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.models.keras_applications import mobilenet_v2
from object_detection.models.keras_models import mobilenet_v2
from object_detection.models.keras_models import test_utils
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
......@@ -151,56 +152,8 @@ class MobilenetV2Test(test_case.TestCase):
image_height = 128
image_width = 128
depth_multiplier = 1.0
expected_feature_map_shape = [(2, 64, 64, 32),
(2, 64, 64, 96),
(2, 32, 32, 96),
(2, 32, 32, 24),
(2, 32, 32, 144),
(2, 32, 32, 144),
(2, 32, 32, 24),
(2, 32, 32, 144),
(2, 16, 16, 144),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 16, 16, 192),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 16, 16, 192),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 8, 8, 192),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 8, 8, 576),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 8, 8, 576),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 4, 4, 576),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 320),
(2, 4, 4, 1280)]
expected_feature_map_shape = (
test_utils.moblenet_v2_expected_feature_map_shape_128)
self._check_returns_correct_shape(
2, image_height, image_width, depth_multiplier,
......@@ -211,56 +164,8 @@ class MobilenetV2Test(test_case.TestCase):
image_height = 128
image_width = 128
depth_multiplier = 1.0
expected_feature_map_shape = [(2, 64, 64, 32),
(2, 64, 64, 96),
(2, 32, 32, 96),
(2, 32, 32, 24),
(2, 32, 32, 144),
(2, 32, 32, 144),
(2, 32, 32, 24),
(2, 32, 32, 144),
(2, 16, 16, 144),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 16, 16, 192),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 16, 16, 192),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 8, 8, 192),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 8, 8, 576),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 8, 8, 576),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 4, 4, 576),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 320),
(2, 4, 4, 1280)]
expected_feature_map_shape = (
test_utils.moblenet_v2_expected_feature_map_shape_128_explicit_padding)
self._check_returns_correct_shape(
2, image_height, image_width, depth_multiplier,
expected_feature_map_shape, use_explicit_padding=True)
......@@ -270,56 +175,8 @@ class MobilenetV2Test(test_case.TestCase):
image_height = 128
image_width = 128
depth_multiplier = 1.0
expected_feature_map_shape = [(2, 64, 64, 32),
(2, 64, 64, 96),
(2, 32, 32, 96),
(2, 32, 32, 24),
(2, 32, 32, 144),
(2, 32, 32, 144),
(2, 32, 32, 24),
(2, 32, 32, 144),
(2, 16, 16, 144),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 16, 16, 192),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 16, 16, 192),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 8, 8, 192),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 8, 8, 576),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 8, 8, 576),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 4, 4, 576),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 320),
(2, 4, 4, 1280)]
expected_feature_map_shape = (
test_utils.mobilenet_v2_expected_feature_map_shape_with_dynamic_inputs)
self._check_returns_correct_shapes_with_dynamic_inputs(
2, image_height, image_width, depth_multiplier,
expected_feature_map_shape)
......@@ -328,57 +185,8 @@ class MobilenetV2Test(test_case.TestCase):
image_height = 299
image_width = 299
depth_multiplier = 1.0
expected_feature_map_shape = [(2, 150, 150, 32),
(2, 150, 150, 96),
(2, 75, 75, 96),
(2, 75, 75, 24),
(2, 75, 75, 144),
(2, 75, 75, 144),
(2, 75, 75, 24),
(2, 75, 75, 144),
(2, 38, 38, 144),
(2, 38, 38, 32),
(2, 38, 38, 192),
(2, 38, 38, 192),
(2, 38, 38, 32),
(2, 38, 38, 192),
(2, 38, 38, 192),
(2, 38, 38, 32),
(2, 38, 38, 192),
(2, 19, 19, 192),
(2, 19, 19, 64),
(2, 19, 19, 384),
(2, 19, 19, 384),
(2, 19, 19, 64),
(2, 19, 19, 384),
(2, 19, 19, 384),
(2, 19, 19, 64),
(2, 19, 19, 384),
(2, 19, 19, 384),
(2, 19, 19, 64),
(2, 19, 19, 384),
(2, 19, 19, 384),
(2, 19, 19, 96),
(2, 19, 19, 576),
(2, 19, 19, 576),
(2, 19, 19, 96),
(2, 19, 19, 576),
(2, 19, 19, 576),
(2, 19, 19, 96),
(2, 19, 19, 576),
(2, 10, 10, 576),
(2, 10, 10, 160),
(2, 10, 10, 960),
(2, 10, 10, 960),
(2, 10, 10, 160),
(2, 10, 10, 960),
(2, 10, 10, 960),
(2, 10, 10, 160),
(2, 10, 10, 960),
(2, 10, 10, 960),
(2, 10, 10, 320),
(2, 10, 10, 1280)]
expected_feature_map_shape = (
test_utils.moblenet_v2_expected_feature_map_shape_299)
self._check_returns_correct_shape(
2, image_height, image_width, depth_multiplier,
expected_feature_map_shape)
......@@ -388,56 +196,8 @@ class MobilenetV2Test(test_case.TestCase):
image_height = 299
image_width = 299
depth_multiplier = 0.5**12
expected_feature_map_shape = [(2, 150, 150, 32),
(2, 150, 150, 192),
(2, 75, 75, 192),
(2, 75, 75, 32),
(2, 75, 75, 192),
(2, 75, 75, 192),
(2, 75, 75, 32),
(2, 75, 75, 192),
(2, 38, 38, 192),
(2, 38, 38, 32),
(2, 38, 38, 192),
(2, 38, 38, 192),
(2, 38, 38, 32),
(2, 38, 38, 192),
(2, 38, 38, 192),
(2, 38, 38, 32),
(2, 38, 38, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 10, 10, 192),
(2, 10, 10, 32),
(2, 10, 10, 192),
(2, 10, 10, 192),
(2, 10, 10, 32),
(2, 10, 10, 192),
(2, 10, 10, 192),
(2, 10, 10, 32),
(2, 10, 10, 192),
(2, 10, 10, 192),
(2, 10, 10, 32),
(2, 10, 10, 32)]
expected_feature_map_shape = (
test_utils.moblenet_v2_expected_feature_map_shape_enforcing_min_depth)
self._check_returns_correct_shape(
2, image_height, image_width, depth_multiplier,
expected_feature_map_shape, min_depth=32)
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test utils for other test files."""
# import tensorflow as tf
#
# from nets import mobilenet_v1
#
# slim = tf.contrib.slim
#
# # Slim endpoint names used to map to the corresponding Keras layer names in
# # MobilenetV1
# _MOBLIENET_V1_SLIM_ENDPOINTS = [
# 'Conv2d_0',
# 'Conv2d_1_depthwise', 'Conv2d_1_pointwise',
# 'Conv2d_2_depthwise', 'Conv2d_2_pointwise',
# 'Conv2d_3_depthwise', 'Conv2d_3_pointwise',
# 'Conv2d_4_depthwise', 'Conv2d_4_pointwise',
# 'Conv2d_5_depthwise', 'Conv2d_5_pointwise',
# 'Conv2d_6_depthwise', 'Conv2d_6_pointwise',
# 'Conv2d_7_depthwise', 'Conv2d_7_pointwise',
# 'Conv2d_8_depthwise', 'Conv2d_8_pointwise',
# 'Conv2d_9_depthwise', 'Conv2d_9_pointwise',
# 'Conv2d_10_depthwise', 'Conv2d_10_pointwise',
# 'Conv2d_11_depthwise', 'Conv2d_11_pointwise',
# 'Conv2d_12_depthwise', 'Conv2d_12_pointwise',
# 'Conv2d_13_depthwise', 'Conv2d_13_pointwise'
# ]
#
#
# # Helper that returns the output shape of each Slim endpoint. It was used to
# # generate the expected_feature_map_shape constants for MobilenetV1 below.
# # The same approach also works for MobilenetV2; a hypothetical sketch of a
# # V2 variant follows this block.
# def _get_slim_endpoint_shapes(inputs, depth_multiplier=1.0, min_depth=8,
# use_explicit_padding=False):
# with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
# normalizer_fn=slim.batch_norm):
# _, end_points = mobilenet_v1.mobilenet_v1_base(
# inputs, final_endpoint='Conv2d_13_pointwise',
# depth_multiplier=depth_multiplier, min_depth=min_depth,
# use_explicit_padding=use_explicit_padding)
# return [end_points[endpoint_name].get_shape()
# for endpoint_name in _MOBLIENET_V1_SLIM_ENDPOINTS]
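#
#
# # Hypothetical sketch (not part of the helper above): assuming slim's
# # nets.mobilenet.mobilenet_v2 module with mobilenet_base(), training_scope()
# # and 'layer_%d' endpoint names, a MobilenetV2 variant of the shape dump
# # could look roughly like this. Note that the V2 constants below also track
# # finer-grained per-op outputs (expansion/projection), so this only
# # illustrates the idea with the top-level endpoints.
# def _get_slim_v2_endpoint_shapes(inputs, depth_multiplier=1.0, min_depth=8):
#   from nets.mobilenet import mobilenet_v2
#   with slim.arg_scope(mobilenet_v2.training_scope(is_training=False)):
#     _, end_points = mobilenet_v2.mobilenet_base(
#         inputs, final_endpoint='layer_19',
#         depth_multiplier=depth_multiplier, min_depth=min_depth)
#   # 'layer_1' ... 'layer_19' cover the stem conv, the 17 inverted residual
#   # blocks and the final 1x1 conv in slim's MobilenetV2.
#   return [end_points['layer_%d' % i].get_shape() for i in range(1, 20)]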
# For Mobilenet V1
moblenet_v1_expected_feature_map_shape_128 = [
(2, 64, 64, 32), (2, 64, 64, 32), (2, 64, 64, 64), (2, 32, 32, 64),
(2, 32, 32, 128), (2, 32, 32, 128), (2, 32, 32, 128), (2, 16, 16, 128),
(2, 16, 16, 256), (2, 16, 16, 256), (2, 16, 16, 256), (2, 8, 8, 256),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 4, 4, 512),
(2, 4, 4, 1024), (2, 4, 4, 1024), (2, 4, 4, 1024),
]
moblenet_v1_expected_feature_map_shape_128_explicit_padding = [
(2, 64, 64, 32), (2, 64, 64, 32), (2, 64, 64, 64), (2, 32, 32, 64),
(2, 32, 32, 128), (2, 32, 32, 128), (2, 32, 32, 128), (2, 16, 16, 128),
(2, 16, 16, 256), (2, 16, 16, 256), (2, 16, 16, 256), (2, 8, 8, 256),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 4, 4, 512),
(2, 4, 4, 1024), (2, 4, 4, 1024), (2, 4, 4, 1024),
]
mobilenet_v1_expected_feature_map_shape_with_dynamic_inputs = [
(2, 64, 64, 32), (2, 64, 64, 32), (2, 64, 64, 64), (2, 32, 32, 64),
(2, 32, 32, 128), (2, 32, 32, 128), (2, 32, 32, 128), (2, 16, 16, 128),
(2, 16, 16, 256), (2, 16, 16, 256), (2, 16, 16, 256), (2, 8, 8, 256),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 4, 4, 512),
(2, 4, 4, 1024), (2, 4, 4, 1024), (2, 4, 4, 1024),
]
moblenet_v1_expected_feature_map_shape_299 = [
(2, 150, 150, 32), (2, 150, 150, 32), (2, 150, 150, 64), (2, 75, 75, 64),
(2, 75, 75, 128), (2, 75, 75, 128), (2, 75, 75, 128), (2, 38, 38, 128),
(2, 38, 38, 256), (2, 38, 38, 256), (2, 38, 38, 256), (2, 19, 19, 256),
(2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512),
(2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512),
(2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512), (2, 10, 10, 512),
(2, 10, 10, 1024), (2, 10, 10, 1024), (2, 10, 10, 1024),
]
moblenet_v1_expected_feature_map_shape_enforcing_min_depth = [
(2, 150, 150, 8), (2, 150, 150, 8), (2, 150, 150, 8), (2, 75, 75, 8),
(2, 75, 75, 8), (2, 75, 75, 8), (2, 75, 75, 8), (2, 38, 38, 8),
(2, 38, 38, 8), (2, 38, 38, 8), (2, 38, 38, 8), (2, 19, 19, 8),
(2, 19, 19, 8), (2, 19, 19, 8), (2, 19, 19, 8), (2, 19, 19, 8),
(2, 19, 19, 8), (2, 19, 19, 8), (2, 19, 19, 8), (2, 19, 19, 8),
(2, 19, 19, 8), (2, 19, 19, 8), (2, 19, 19, 8), (2, 10, 10, 8),
(2, 10, 10, 8), (2, 10, 10, 8), (2, 10, 10, 8),
]
# For Mobilenet V2
moblenet_v2_expected_feature_map_shape_128 = [
(2, 64, 64, 32), (2, 64, 64, 96), (2, 32, 32, 96), (2, 32, 32, 24),
(2, 32, 32, 144), (2, 32, 32, 144), (2, 32, 32, 24), (2, 32, 32, 144),
(2, 16, 16, 144), (2, 16, 16, 32), (2, 16, 16, 192), (2, 16, 16, 192),
(2, 16, 16, 32), (2, 16, 16, 192), (2, 16, 16, 192), (2, 16, 16, 32),
(2, 16, 16, 192), (2, 8, 8, 192), (2, 8, 8, 64), (2, 8, 8, 384),
(2, 8, 8, 384), (2, 8, 8, 64), (2, 8, 8, 384), (2, 8, 8, 384),
(2, 8, 8, 64), (2, 8, 8, 384), (2, 8, 8, 384), (2, 8, 8, 64),
(2, 8, 8, 384), (2, 8, 8, 384), (2, 8, 8, 96), (2, 8, 8, 576),
(2, 8, 8, 576), (2, 8, 8, 96), (2, 8, 8, 576), (2, 8, 8, 576),
(2, 8, 8, 96), (2, 8, 8, 576), (2, 4, 4, 576), (2, 4, 4, 160),
(2, 4, 4, 960), (2, 4, 4, 960), (2, 4, 4, 160), (2, 4, 4, 960),
(2, 4, 4, 960), (2, 4, 4, 160), (2, 4, 4, 960), (2, 4, 4, 960),
(2, 4, 4, 320), (2, 4, 4, 1280)
]
moblenet_v2_expected_feature_map_shape_128_explicit_padding = [
(2, 64, 64, 32), (2, 64, 64, 96), (2, 32, 32, 96), (2, 32, 32, 24),
(2, 32, 32, 144), (2, 32, 32, 144), (2, 32, 32, 24), (2, 32, 32, 144),
(2, 16, 16, 144), (2, 16, 16, 32), (2, 16, 16, 192), (2, 16, 16, 192),
(2, 16, 16, 32), (2, 16, 16, 192), (2, 16, 16, 192), (2, 16, 16, 32),
(2, 16, 16, 192), (2, 8, 8, 192), (2, 8, 8, 64), (2, 8, 8, 384),
(2, 8, 8, 384), (2, 8, 8, 64), (2, 8, 8, 384), (2, 8, 8, 384),
(2, 8, 8, 64), (2, 8, 8, 384), (2, 8, 8, 384), (2, 8, 8, 64),
(2, 8, 8, 384), (2, 8, 8, 384), (2, 8, 8, 96), (2, 8, 8, 576),
(2, 8, 8, 576), (2, 8, 8, 96), (2, 8, 8, 576), (2, 8, 8, 576),
(2, 8, 8, 96), (2, 8, 8, 576), (2, 4, 4, 576), (2, 4, 4, 160),
(2, 4, 4, 960), (2, 4, 4, 960), (2, 4, 4, 160), (2, 4, 4, 960),
(2, 4, 4, 960), (2, 4, 4, 160), (2, 4, 4, 960), (2, 4, 4, 960),
(2, 4, 4, 320), (2, 4, 4, 1280)
]
mobilenet_v2_expected_feature_map_shape_with_dynamic_inputs = [
(2, 64, 64, 32), (2, 64, 64, 96), (2, 32, 32, 96), (2, 32, 32, 24),
(2, 32, 32, 144), (2, 32, 32, 144), (2, 32, 32, 24), (2, 32, 32, 144),
(2, 16, 16, 144), (2, 16, 16, 32), (2, 16, 16, 192), (2, 16, 16, 192),
(2, 16, 16, 32), (2, 16, 16, 192), (2, 16, 16, 192), (2, 16, 16, 32),
(2, 16, 16, 192), (2, 8, 8, 192), (2, 8, 8, 64), (2, 8, 8, 384),
(2, 8, 8, 384), (2, 8, 8, 64), (2, 8, 8, 384), (2, 8, 8, 384),
(2, 8, 8, 64), (2, 8, 8, 384), (2, 8, 8, 384), (2, 8, 8, 64),
(2, 8, 8, 384), (2, 8, 8, 384), (2, 8, 8, 96), (2, 8, 8, 576),
(2, 8, 8, 576), (2, 8, 8, 96), (2, 8, 8, 576), (2, 8, 8, 576),
(2, 8, 8, 96), (2, 8, 8, 576), (2, 4, 4, 576), (2, 4, 4, 160),
(2, 4, 4, 960), (2, 4, 4, 960), (2, 4, 4, 160), (2, 4, 4, 960),
(2, 4, 4, 960), (2, 4, 4, 160), (2, 4, 4, 960), (2, 4, 4, 960),
(2, 4, 4, 320), (2, 4, 4, 1280)
]
moblenet_v2_expected_feature_map_shape_299 = [
(2, 150, 150, 32), (2, 150, 150, 96), (2, 75, 75, 96), (2, 75, 75, 24),
(2, 75, 75, 144), (2, 75, 75, 144), (2, 75, 75, 24), (2, 75, 75, 144),
(2, 38, 38, 144), (2, 38, 38, 32), (2, 38, 38, 192), (2, 38, 38, 192),
(2, 38, 38, 32), (2, 38, 38, 192), (2, 38, 38, 192), (2, 38, 38, 32),
(2, 38, 38, 192), (2, 19, 19, 192), (2, 19, 19, 64), (2, 19, 19, 384),
(2, 19, 19, 384), (2, 19, 19, 64), (2, 19, 19, 384), (2, 19, 19, 384),
(2, 19, 19, 64), (2, 19, 19, 384), (2, 19, 19, 384), (2, 19, 19, 64),
(2, 19, 19, 384), (2, 19, 19, 384), (2, 19, 19, 96), (2, 19, 19, 576),
(2, 19, 19, 576), (2, 19, 19, 96), (2, 19, 19, 576), (2, 19, 19, 576),
(2, 19, 19, 96), (2, 19, 19, 576), (2, 10, 10, 576), (2, 10, 10, 160),
(2, 10, 10, 960), (2, 10, 10, 960), (2, 10, 10, 160), (2, 10, 10, 960),
(2, 10, 10, 960), (2, 10, 10, 160), (2, 10, 10, 960), (2, 10, 10, 960),
(2, 10, 10, 320), (2, 10, 10, 1280)
]
moblenet_v2_expected_feature_map_shape_enforcing_min_depth = [
(2, 150, 150, 32), (2, 150, 150, 192), (2, 75, 75, 192), (2, 75, 75, 32),
(2, 75, 75, 192), (2, 75, 75, 192), (2, 75, 75, 32), (2, 75, 75, 192),
(2, 38, 38, 192), (2, 38, 38, 32), (2, 38, 38, 192), (2, 38, 38, 192),
(2, 38, 38, 32), (2, 38, 38, 192), (2, 38, 38, 192), (2, 38, 38, 32),
(2, 38, 38, 192), (2, 19, 19, 192), (2, 19, 19, 32), (2, 19, 19, 192),
(2, 19, 19, 192), (2, 19, 19, 32), (2, 19, 19, 192), (2, 19, 19, 192),
(2, 19, 19, 32), (2, 19, 19, 192), (2, 19, 19, 192), (2, 19, 19, 32),
(2, 19, 19, 192), (2, 19, 19, 192), (2, 19, 19, 32), (2, 19, 19, 192),
(2, 19, 19, 192), (2, 19, 19, 32), (2, 19, 19, 192), (2, 19, 19, 192),
(2, 19, 19, 32), (2, 19, 19, 192), (2, 10, 10, 192), (2, 10, 10, 32),
(2, 10, 10, 192), (2, 10, 10, 192), (2, 10, 10, 32), (2, 10, 10, 192),
(2, 10, 10, 192), (2, 10, 10, 32), (2, 10, 10, 192), (2, 10, 10, 192),
(2, 10, 10, 32), (2, 10, 10, 32)
]
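#
# # Usage pattern (mirroring the keras_models feature extractor tests above):
# # a test selects the constant matching its input size and padding mode and
# # passes it to its shape-checking helper, e.g.
# #   expected_feature_map_shape = (
# #       test_utils.moblenet_v2_expected_feature_map_shape_128_explicit_padding)
# #   self._check_returns_correct_shape(
# #       2, image_height, image_width, depth_multiplier,
# #       expected_feature_map_shape, use_explicit_padding=True)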