Commit 0ba83cf0 authored by pkulzc, committed by Sergio Guadarrama

Release MobileNet V3 models and SSDLite models with MobileNet V3 backbone. (#7678)

* Merged commit includes the following changes:
275131829  by Sergio Guadarrama:

    Updates mobilenet/README.md to be GitHub compatible, adds a V2+ reference to the mobilenet_v1.md file, and fixes invalid markdown.

--
274908068  by Sergio Guadarrama:

    Opensource MobilenetV3 detection models.

--
274697808  by Sergio Guadarrama:

    Fixed cases where tf.TensorShape was constructed with float dimensions

    This is a prerequisite for making TensorShape and Dimension more strict
    about the types of their arguments.
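To illustrate the kind of fix involved (values hypothetical), dimensions computed with float arithmetic now need an explicit cast before building a shape:

```python
import tensorflow as tf

# A depth multiplier yields a float (48.0); TensorShape wants integer dims.
depth = int(32 * 1.5)
shape = tf.TensorShape([None, depth, depth, 3])
```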

--
273577462  by Sergio Guadarrama:

    Fixing `conv_defs['defaults']` override issue.

--
272801298  by Sergio Guadarrama:

    Adds links to trained models for MobileNet V3 and adds a minimalistic variant of MobileNet V3 to the definitions.

--
268928503  by Sergio Guadarrama:

    Mobilenet v2 with group normalization.

--
263492735  by Sergio Guadarrama:

    Internal change

260037126  by Sergio Guadarrama:

    Adds an option of using a custom depthwise operation in `expanded_conv`.

--
259997001  by Sergio Guadarrama:

    Explicitly mark Python binaries/tests with python_version = "PY2".

--
252697685  by Sergio Guadarrama:

    Internal change

251918746  by Sergio Guadarrama:

    Internal change

251909704  by Sergio Guadarrama:

    Mobilenet V3 backbone implementation.

--
247510236  by Sergio Guadarrama:

    Internal change

246196802  by Sergio Guadarrama:

    Internal change

246014539  by Sergio Guadarrama:

    Internal change

245891435  by Sergio Guadarrama:

    Internal change

245834925  by Sergio Guadarrama:

    n/a

--

PiperOrigin-RevId: 275131829

* Merged commit includes the following changes:
274959989  by Zhichao Lu:

    Update detection model zoo with MobilenetV3 SSD candidates.

--
274908068  by Zhichao Lu:

    Opensource MobilenetV3 detection models.

--
274695889  by richardmunoz:

    RandomPatchGaussian preprocessing step

    This step can be used during model training to randomly apply gaussian noise to a random image patch. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_patch_gaussian {
          random_coef: 0.5
          min_patch_size: 1
          max_patch_size: 250
          min_gaussian_stddev: 0.0
          max_gaussian_stddev: 1.0
        }
      }
      ...
    }

--
274257872  by lzc:

    Internal change.

--
274114689  by Zhichao Lu:

    Pass native_resize flag to other FPN variants.

--
274112308  by lzc:

    Internal change.

--
274090763  by richardmunoz:

    Util function for getting a patch mask on an image for use with the Object Detection API
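A minimal sketch of such a util, assuming a square patch centered at (y, x); the shipped function may differ in signature and edge handling:

```python
import tensorflow as tf

def get_patch_mask(y, x, patch_size, image_height, image_width):
  """Returns a boolean [height, width] mask, True inside a square patch."""
  ymin = tf.maximum(y - patch_size // 2, 0)
  xmin = tf.maximum(x - patch_size // 2, 0)
  ymax = tf.minimum(ymin + patch_size, image_height)
  xmax = tf.minimum(xmin + patch_size, image_width)
  rows = tf.range(image_height)[:, tf.newaxis]
  cols = tf.range(image_width)[tf.newaxis, :]
  return ((rows >= ymin) & (rows < ymax)) & ((cols >= xmin) & (cols < xmax))
```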

--
274069806  by Zhichao Lu:

    Adding functions which will help compute predictions and losses for CenterNet.

--
273860828  by lzc:

    Internal change.

--
273380069  by richardmunoz:

    RandomImageDownscaleToTargetPixels preprocessing step

    This step can be used during model training to randomly downscale an image to a random target number of pixels. If the image does not contain more than the target number of pixels, then downscaling is skipped. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_downscale_to_target_pixels {
          random_coef: 0.5
          min_target_pixels: 300000
          max_target_pixels: 500000
        }
      }
      ...
    }

--
272987602  by Zhichao Lu:

    Avoid -inf when empty box list is passed.
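The underlying issue, roughly: reductions such as tf.reduce_max return -inf on an empty box list, so the empty case needs a guard. A sketch with a hypothetical helper name:

```python
import tensorflow as tf

def safe_reduce_max(scores, default=0.0):
  """tf.reduce_max that returns `default` instead of -inf for empty input."""
  return tf.cond(
      tf.size(scores) > 0,
      lambda: tf.reduce_max(scores),
      lambda: tf.constant(default, dtype=scores.dtype))
```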

--
272525836  by Zhichao Lu:

    Cleanup repeated resizing code in meta archs.

--
272458667  by richardmunoz:

    RandomJpegQuality preprocessing step

    This step can be used during model training to randomly encode the image into a jpeg with a random quality level. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_jpeg_quality {
          random_coef: 0.5
          min_jpeg_quality: 80
          max_jpeg_quality: 100
        }
      }
      ...
    }

--
271412717  by Zhichao Lu:

    Enables TPU training with the V2 eager + tf.function Object Detection training loops.

--
270744153  by Zhichao Lu:

    Adding the offset and size target assigners for CenterNet.

--
269916081  by Zhichao Lu:

    Include basic installation in Object Detection API tutorial.
    Also:
     - Use TF2.0
     - Use saved_model

--
269376056  by Zhichao Lu:

    Fix variable loading in RetinaNet with custom loops (makes the code rely a little less on the exact name scopes that are generated).

--
269256251  by lzc:

    Add use_partitioned_nms field to the config and update post_processing_builder to honor that flag when building the NMS function.

--
268865295  by Zhichao Lu:

    Adding functionality for importing and merging back internal state of the metric.

--
268640984  by Zhichao Lu:

    Fix computation of gaussian sigma value to create CenterNet heatmap target.
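For context, CenterNet renders classification targets by splatting a gaussian exp(-(dx^2 + dy^2) / (2 * sigma^2)) around each object center, with sigma derived from the box size. A numpy sketch of the splat (the floored center reflects change 265860884 below):

```python
import numpy as np

def splat_center(heatmap, center_x, center_y, sigma):
  """Draws a gaussian peak onto a [height, width] heatmap in place."""
  height, width = heatmap.shape
  # Floor so the peak lands exactly on a pixel.
  cx, cy = int(np.floor(center_x)), int(np.floor(center_y))
  ys, xs = np.ogrid[:height, :width]
  gaussian = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
  # Take the elementwise max so nearby objects keep their own peaks.
  np.maximum(heatmap, gaussian, out=heatmap)
  return heatmap
```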

--
267475576  by Zhichao Lu:

    Fix for exporter trying to export non-existent exponential moving averages.

--
267286768  by Zhichao Lu:

    Update mixed-precision policy.

--
266166879  by Zhichao Lu:

    Internal change

265860884  by Zhichao Lu:

    Apply floor function to center coordinates when creating heatmap for CenterNet target.

--
265702749  by Zhichao Lu:

    Internal change

--
264241949  by ronnyvotel:

    Updating Faster R-CNN 'final_anchors' to be in normalized coordinates.

--
264175192  by lzc:

    Update model_fn to only read hparams if it is not None.

--
264159328  by Zhichao Lu:

    Modify nearest neighbor upsampling to eliminate a multiply operation. For quantized models, the multiply operation gets unnecessarily quantized and reduces accuracy; simple stacking, which doesn't require quantization, works in place of the broadcast op. Also removes an unnecessary reshape op.
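A sketch of the stacking idea, with hypothetical names; the shipped op may arrange the axes differently:

```python
import tensorflow as tf

def nearest_neighbor_upsampling(x, scale=2):
  """[b, h, w, c] -> [b, h*scale, w*scale, c] without any multiply op."""
  shape = tf.shape(x)
  b, h, w, c = shape[0], shape[1], shape[2], shape[3]
  # Stack copies along new row and column axes, then merge them away.
  x = tf.stack([x] * scale, axis=2)  # [b, h, scale, w, c]
  x = tf.stack([x] * scale, axis=4)  # [b, h, scale, w, scale, c]
  return tf.reshape(x, [b, h * scale, w * scale, c])
```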

--
263668306  by Zhichao Lu:

    Add the option to use dynamic map_fn for batch NMS

--
263031163  by Zhichao Lu:

    Mark outside compilation for NMS as optional.

--
263024916  by Zhichao Lu:

    Add an ExperimentalModel meta arch for experimenting with new model types.

--
262655894  by Zhichao Lu:

    Add the center heatmap target assigner for CenterNet

--
262431036  by Zhichao Lu:

    Adding add_eval_dict to allow for evaluation on model_v2

--
262035351  by ronnyvotel:

    Removing any non-Tensor predictions from the third stage of Mask R-CNN.

--
261953416  by Zhichao Lu:

    Internal change.

--
261834966  by Zhichao Lu:

    Fix the NMS OOM issue on TPU by forcing NMS to run outside of TPU.
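A sketch of the pattern, assuming the TF 1.x TPU API; the exact outside-compilation entry point varies by version:

```python
import tensorflow as tf

def host_nms(boxes, scores, max_output_size=100, iou_threshold=0.6):
  """Runs non-max suppression on the host instead of the TPU core."""
  def _nms(boxes, scores):
    selected = tf.image.non_max_suppression(
        boxes, scores, max_output_size, iou_threshold=iou_threshold)
    return tf.gather(boxes, selected)
  return tf.tpu.outside_compilation(_nms, boxes, scores)
```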

--
261775941  by Zhichao Lu:

    Make Keras InputLayer compatible with both TF 1.x and TF 2.0.

--
261775633  by Zhichao Lu:

    Visualize additional channels with ground-truth bounding boxes.

--
261768117  by lzc:

    Internal change.

--
261766773  by ronnyvotel:

    Exposing `return_raw_detections_during_predict` in Faster R-CNN Proto.

--
260975089  by ronnyvotel:

    Moving calculation of batched prediction tensor names after all tensors in prediction dictionary are created.

--
259816913  by ronnyvotel:

    Adding raw detection boxes and feature map indices to SSD

--
259791955  by Zhichao Lu:

    Added a flag to control the use of partitioned_non_max_suppression.

--
259580475  by Zhichao Lu:

    Tweak quantization-aware training re-writer to support NasFpn model architecture.

--
259579943  by rathodv:

    Add a meta target assigner proto and builders in OD API.

--
259577741  by Zhichao Lu:

    Internal change.

--
259366315  by lzc:

    Internal change.

--
259344310  by ronnyvotel:

    Updating faster rcnn so that raw_detection_boxes from predict() are in normalized coordinates.

--
259338670  by Zhichao Lu:

    Add support for use_native_resize_op to more feature extractors. Use dynamic shapes when static shapes are not available.
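The static-else-dynamic pattern, in the spirit of the API's shape_utils (a sketch, not the exact shipped code):

```python
import tensorflow as tf

def combined_static_and_dynamic_shape(tensor):
  """Per dimension, the static size if known, else the runtime tf.shape."""
  static_shape = tensor.shape.as_list()
  dynamic_shape = tf.shape(tensor)
  return [dim if dim is not None else dynamic_shape[i]
          for i, dim in enumerate(static_shape)]
```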

--
259083543  by ronnyvotel:

    Updating/fixing documentation.

--
259078937  by rathodv:

    Add prediction fields for tensors returned from detection_model.predict.

--
259044601  by Zhichao Lu:

    Add protocol buffer and builders for temperature scaling calibration.
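Temperature scaling itself is a single learned scalar: logits are divided by T > 0 before score conversion, which recalibrates confidences without changing the ranking. A minimal sketch with a hypothetical fitted T:

```python
import tensorflow as tf

def temperature_scaled_scores(logits, temperature=1.5):
  """Calibrates confidences by dividing logits by a fitted scalar T > 0."""
  return tf.nn.softmax(logits / temperature)
```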

--
259036770  by lzc:

    Internal changes.

--
259006223  by ronnyvotel:

    Adding detection anchor indices to Faster R-CNN Config. This is useful when one wishes to associate final detections with the anchors (or pre-NMS boxes) from which they originated.

--
258872501  by Zhichao Lu:

    Run the training pipeline of ssd + resnet_v1_50 + fpn with a checkpoint.

--
258840686  by ronnyvotel:

    Adding standard outputs to DetectionModel.predict(). This CL only updates Faster R-CNN. Other meta architectures will be updated in future CLs.

--
258672969  by lzc:

    Internal change.

--
258649494  by lzc:

    Internal changes.

--
258630321  by ronnyvotel:

    Fixing documentation in shape_utils.flatten_dimensions().

--
258468145  by Zhichao Lu:

    Add additional output tensors parameter to Postprocess op.

--
258099219  by Zhichao Lu:

    Internal changes

--

PiperOrigin-RevId: 274959989
parent 9aed0ffb
......@@ -19,9 +19,9 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import numpy as np
import six
from six.moves import range
from six.moves import zip
import tensorflow as tf
......@@ -36,7 +36,7 @@ else:
from unittest import mock # pylint: disable=g-import-not-at-top
class PreprocessorTest(tf.test.TestCase):
class PreprocessorTest(tf.test.TestCase, parameterized.TestCase):
def createColorfulTestImage(self):
ch255 = tf.fill([1, 100, 200, 1], tf.constant(255, dtype=tf.uint8))
......@@ -2478,6 +2478,233 @@ class PreprocessorTest(tf.test.TestCase):
[images_shape, blacked_images_shape])
self.assertAllEqual(images_shape_, blacked_images_shape_)
def testRandomJpegQuality(self):
preprocessing_options = [(preprocessor.random_jpeg_quality, {
'min_jpeg_quality': 0,
'max_jpeg_quality': 100
})]
images = self.createTestImages()
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
encoded_images = processed_tensor_dict[fields.InputDataFields.image]
images_shape = tf.shape(images)
encoded_images_shape = tf.shape(encoded_images)
with self.test_session() as sess:
images_shape_out, encoded_images_shape_out = sess.run(
[images_shape, encoded_images_shape])
self.assertAllEqual(images_shape_out, encoded_images_shape_out)
def testRandomJpegQualityKeepsStaticChannelShape(self):
    # Set to at least three weeks past the forward compatibility horizon of
    # tf 1.14 (2019/11/01).
# https://github.com/tensorflow/tensorflow/blob/v1.14.0/tensorflow/python/compat/compat.py#L30
if not tf.compat.forward_compatible(year=2019, month=12, day=1):
self.skipTest('Skipping test for future functionality.')
preprocessing_options = [(preprocessor.random_jpeg_quality, {
'min_jpeg_quality': 0,
'max_jpeg_quality': 100
})]
images = self.createTestImages()
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
encoded_images = processed_tensor_dict[fields.InputDataFields.image]
images_static_channels = images.shape[-1]
encoded_images_static_channels = encoded_images.shape[-1]
self.assertEqual(images_static_channels, encoded_images_static_channels)
def testRandomJpegQualityWithCache(self):
preprocessing_options = [(preprocessor.random_jpeg_quality, {
'min_jpeg_quality': 0,
'max_jpeg_quality': 100
})]
self._testPreprocessorCache(preprocessing_options)
def testRandomJpegQualityWithRandomCoefOne(self):
preprocessing_options = [(preprocessor.random_jpeg_quality, {
'random_coef': 1.0
})]
images = self.createTestImages()
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
encoded_images = processed_tensor_dict[fields.InputDataFields.image]
images_shape = tf.shape(images)
encoded_images_shape = tf.shape(encoded_images)
with self.test_session() as sess:
(images_out, encoded_images_out, images_shape_out,
encoded_images_shape_out) = sess.run(
[images, encoded_images, images_shape, encoded_images_shape])
self.assertAllEqual(images_shape_out, encoded_images_shape_out)
self.assertAllEqual(images_out, encoded_images_out)
def testRandomDownscaleToTargetPixels(self):
preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
'min_target_pixels': 100,
'max_target_pixels': 101
})]
images = tf.random_uniform([1, 25, 100, 3])
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
downscaled_images = processed_tensor_dict[fields.InputDataFields.image]
downscaled_shape = tf.shape(downscaled_images)
expected_shape = [1, 5, 20, 3]
with self.test_session() as sess:
downscaled_shape_out = sess.run(downscaled_shape)
self.assertAllEqual(downscaled_shape_out, expected_shape)
def testRandomDownscaleToTargetPixelsWithMasks(self):
preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
'min_target_pixels': 100,
'max_target_pixels': 101
})]
images = tf.random_uniform([1, 25, 100, 3])
masks = tf.random_uniform([10, 25, 100])
tensor_dict = {
fields.InputDataFields.image: images,
fields.InputDataFields.groundtruth_instance_masks: masks
}
preprocessor_arg_map = preprocessor.get_default_func_arg_map(
include_instance_masks=True)
processed_tensor_dict = preprocessor.preprocess(
tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map)
downscaled_images = processed_tensor_dict[fields.InputDataFields.image]
downscaled_masks = processed_tensor_dict[
fields.InputDataFields.groundtruth_instance_masks]
downscaled_images_shape = tf.shape(downscaled_images)
downscaled_masks_shape = tf.shape(downscaled_masks)
expected_images_shape = [1, 5, 20, 3]
expected_masks_shape = [10, 5, 20]
with self.test_session() as sess:
downscaled_images_shape_out, downscaled_masks_shape_out = sess.run(
[downscaled_images_shape, downscaled_masks_shape])
self.assertAllEqual(downscaled_images_shape_out, expected_images_shape)
self.assertAllEqual(downscaled_masks_shape_out, expected_masks_shape)
@parameterized.parameters(
{'test_masks': False},
{'test_masks': True}
)
def testRandomDownscaleToTargetPixelsWithCache(self, test_masks):
preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
'min_target_pixels': 100,
'max_target_pixels': 999
})]
self._testPreprocessorCache(preprocessing_options, test_masks=test_masks)
def testRandomDownscaleToTargetPixelsWithRandomCoefOne(self):
preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
'random_coef': 1.0,
'min_target_pixels': 10,
'max_target_pixels': 20,
})]
images = tf.random_uniform([1, 25, 100, 3])
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
downscaled_images = processed_tensor_dict[fields.InputDataFields.image]
images_shape = tf.shape(images)
downscaled_images_shape = tf.shape(downscaled_images)
with self.test_session() as sess:
(images_out, downscaled_images_out, images_shape_out,
downscaled_images_shape_out) = sess.run(
[images, downscaled_images, images_shape, downscaled_images_shape])
self.assertAllEqual(images_shape_out, downscaled_images_shape_out)
self.assertAllEqual(images_out, downscaled_images_out)
def testRandomDownscaleToTargetPixelsIgnoresSmallImages(self):
preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
'min_target_pixels': 1000,
'max_target_pixels': 1001
})]
images = tf.random_uniform([1, 10, 10, 3])
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
downscaled_images = processed_tensor_dict[fields.InputDataFields.image]
images_shape = tf.shape(images)
downscaled_images_shape = tf.shape(downscaled_images)
with self.test_session() as sess:
(images_out, downscaled_images_out, images_shape_out,
downscaled_images_shape_out) = sess.run(
[images, downscaled_images, images_shape, downscaled_images_shape])
self.assertAllEqual(images_shape_out, downscaled_images_shape_out)
self.assertAllEqual(images_out, downscaled_images_out)
def testRandomPatchGaussianShape(self):
preprocessing_options = [(preprocessor.random_patch_gaussian, {
'min_patch_size': 1,
'max_patch_size': 200,
'min_gaussian_stddev': 0.0,
'max_gaussian_stddev': 2.0
})]
images = self.createTestImages()
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
patched_images = processed_tensor_dict[fields.InputDataFields.image]
images_shape = tf.shape(images)
patched_images_shape = tf.shape(patched_images)
self.assertAllEqual(images_shape, patched_images_shape)
def testRandomPatchGaussianClippedToLowerBound(self):
preprocessing_options = [(preprocessor.random_patch_gaussian, {
'min_patch_size': 20,
'max_patch_size': 40,
'min_gaussian_stddev': 50,
'max_gaussian_stddev': 100
})]
images = tf.zeros([1, 5, 4, 3])
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
patched_images = processed_tensor_dict[fields.InputDataFields.image]
self.assertAllGreaterEqual(patched_images, 0.0)
def testRandomPatchGaussianClippedToUpperBound(self):
preprocessing_options = [(preprocessor.random_patch_gaussian, {
'min_patch_size': 20,
'max_patch_size': 40,
'min_gaussian_stddev': 50,
'max_gaussian_stddev': 100
})]
images = tf.constant(255.0, shape=[1, 5, 4, 3])
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
patched_images = processed_tensor_dict[fields.InputDataFields.image]
self.assertAllLessEqual(patched_images, 255.0)
def testRandomPatchGaussianWithCache(self):
preprocessing_options = [(preprocessor.random_patch_gaussian, {
'min_patch_size': 1,
'max_patch_size': 200,
'min_gaussian_stddev': 0.0,
'max_gaussian_stddev': 2.0
})]
self._testPreprocessorCache(preprocessing_options)
def testRandomPatchGaussianWithRandomCoefOne(self):
preprocessing_options = [(preprocessor.random_patch_gaussian, {
'random_coef': 1.0
})]
images = self.createTestImages()
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
patched_images = processed_tensor_dict[fields.InputDataFields.image]
images_shape = tf.shape(images)
patched_images_shape = tf.shape(patched_images)
self.assertAllEqual(images_shape, patched_images_shape)
self.assertAllEqual(images, patched_images)
def testAutoAugmentImage(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.autoaugment_image, {
......
......@@ -168,6 +168,22 @@ class BoxListFields(object):
is_crowd = 'is_crowd'
class PredictionFields(object):
"""Naming conventions for standardized prediction outputs.
Attributes:
feature_maps: List of feature maps for prediction.
anchors: Generated anchors.
raw_detection_boxes: Decoded detection boxes without NMS.
raw_detection_feature_map_indices: Feature map indices from which each raw
detection box was produced.
"""
feature_maps = 'feature_maps'
anchors = 'anchors'
raw_detection_boxes = 'raw_detection_boxes'
raw_detection_feature_map_indices = 'raw_detection_feature_map_indices'
class TfExampleFields(object):
"""TF-example proto feature names for object detection.
......
......@@ -41,8 +41,9 @@ import tensorflow as tf
from object_detection.box_coders import faster_rcnn_box_coder
from object_detection.box_coders import mean_stddev_box_coder
from object_detection.core import box_coder as bcoder
from object_detection.core import box_coder
from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import matcher as mat
from object_detection.core import region_similarity_calculator as sim_calc
from object_detection.core import standard_fields as fields
......@@ -57,7 +58,7 @@ class TargetAssigner(object):
def __init__(self,
similarity_calc,
matcher,
box_coder,
box_coder_instance,
negative_class_weight=1.0):
"""Construct Object Detection Target Assigner.
......@@ -65,8 +66,8 @@ class TargetAssigner(object):
similarity_calc: a RegionSimilarityCalculator
matcher: an object_detection.core.Matcher used to match groundtruth to
anchors.
box_coder: an object_detection.core.BoxCoder used to encode matching
groundtruth boxes with respect to anchors.
box_coder_instance: an object_detection.core.BoxCoder used to encode
matching groundtruth boxes with respect to anchors.
negative_class_weight: classification weight to be associated to negative
anchors (default: 1.0). The weight must be in [0., 1.].
......@@ -78,11 +79,11 @@ class TargetAssigner(object):
raise ValueError('similarity_calc must be a RegionSimilarityCalculator')
if not isinstance(matcher, mat.Matcher):
raise ValueError('matcher must be a Matcher')
if not isinstance(box_coder, bcoder.BoxCoder):
if not isinstance(box_coder_instance, box_coder.BoxCoder):
raise ValueError('box_coder must be a BoxCoder')
self._similarity_calc = similarity_calc
self._matcher = matcher
self._box_coder = box_coder
self._box_coder = box_coder_instance
self._negative_class_weight = negative_class_weight
@property
......@@ -391,7 +392,7 @@ def create_target_assigner(reference, stage=None,
if reference == 'Multibox' and stage == 'proposal':
similarity_calc = sim_calc.NegSqDistSimilarity()
matcher = bipartite_matcher.GreedyBipartiteMatcher()
box_coder = mean_stddev_box_coder.MeanStddevBoxCoder()
box_coder_instance = mean_stddev_box_coder.MeanStddevBoxCoder()
elif reference == 'FasterRCNN' and stage == 'proposal':
similarity_calc = sim_calc.IouSimilarity()
......@@ -399,7 +400,7 @@ def create_target_assigner(reference, stage=None,
unmatched_threshold=0.3,
force_match_for_each_row=True,
use_matmul_gather=use_matmul_gather)
box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder(
box_coder_instance = faster_rcnn_box_coder.FasterRcnnBoxCoder(
scale_factors=[10.0, 10.0, 5.0, 5.0])
elif reference == 'FasterRCNN' and stage == 'detection':
......@@ -408,7 +409,7 @@ def create_target_assigner(reference, stage=None,
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=0.5,
negatives_lower_than_unmatched=True,
use_matmul_gather=use_matmul_gather)
box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder(
box_coder_instance = faster_rcnn_box_coder.FasterRcnnBoxCoder(
scale_factors=[10.0, 10.0, 5.0, 5.0])
elif reference == 'FastRCNN':
......@@ -418,12 +419,12 @@ def create_target_assigner(reference, stage=None,
force_match_for_each_row=False,
negatives_lower_than_unmatched=False,
use_matmul_gather=use_matmul_gather)
box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder()
box_coder_instance = faster_rcnn_box_coder.FasterRcnnBoxCoder()
else:
raise ValueError('No valid combination of reference and stage.')
return TargetAssigner(similarity_calc, matcher, box_coder,
return TargetAssigner(similarity_calc, matcher, box_coder_instance,
negative_class_weight=negative_class_weight)
......@@ -702,3 +703,5 @@ def batch_assign_confidences(target_assigner,
batch_match = tf.stack(match_list)
return (batch_cls_targets, batch_cls_weights, batch_reg_targets,
batch_reg_weights, batch_match)
......@@ -67,7 +67,8 @@ def append_postprocessing_op(frozen_graph_def,
num_classes,
scale_values,
detections_per_class=100,
use_regular_nms=False):
use_regular_nms=False,
additional_output_tensors=()):
"""Appends postprocessing custom op.
Args:
......@@ -82,11 +83,13 @@ def append_postprocessing_op(frozen_graph_def,
num_classes: number of classes in SSD detector
scale_values: scale values is a dict with following key-value pairs
{y_scale: 10, x_scale: 10, h_scale: 5, w_scale: 5} that are used in decode
centersize boxes
centersize boxes
detections_per_class: In regular NonMaxSuppression, number of anchors used
for NonMaxSuppression per class
use_regular_nms: Flag to set postprocessing op to use Regular NMS instead
of Fast NMS.
for NonMaxSuppression per class
use_regular_nms: Flag to set postprocessing op to use Regular NMS instead of
Fast NMS.
additional_output_tensors: Array of additional tensor names to output.
Tensors are appended after postprocessing output.
Returns:
transformed_graph_def: Frozen GraphDef with postprocessing custom op
......@@ -140,7 +143,8 @@ def append_postprocessing_op(frozen_graph_def,
['raw_outputs/box_encodings', 'raw_outputs/class_predictions', 'anchors'])
# Transform the graph to append new postprocessing op
input_names = []
output_names = ['TFLite_Detection_PostProcess']
output_names = ['TFLite_Detection_PostProcess'
] + list(additional_output_tensors)
transforms = ['strip_unused_nodes']
transformed_graph_def = TransformGraph(frozen_graph_def, input_names,
output_names, transforms)
......@@ -156,7 +160,8 @@ def export_tflite_graph(pipeline_config,
detections_per_class=100,
use_regular_nms=False,
binary_graph_name='tflite_graph.pb',
txt_graph_name='tflite_graph.pbtxt'):
txt_graph_name='tflite_graph.pbtxt',
additional_output_tensors=()):
"""Exports a tflite compatible graph and anchors for ssd detection model.
Anchors are written to a tensor and tflite compatible graph
......@@ -173,11 +178,13 @@ def export_tflite_graph(pipeline_config,
max_detections: Maximum number of detections (boxes) to show
max_classes_per_detection: Number of classes to display per detection
detections_per_class: In regular NonMaxSuppression, number of anchors used
for NonMaxSuppression per class
use_regular_nms: Flag to set postprocessing op to use Regular NMS instead
of Fast NMS.
for NonMaxSuppression per class
use_regular_nms: Flag to set postprocessing op to use Regular NMS instead of
Fast NMS.
binary_graph_name: Name of the exported graph file in binary format.
txt_graph_name: Name of the exported graph file in text format.
additional_output_tensors: Array of additional tensor names to output.
Additional tensors are appended to the end of output tensor list.
Raises:
ValueError: if the pipeline config contains models other than ssd or uses an
......@@ -191,12 +198,12 @@ def export_tflite_graph(pipeline_config,
num_classes = pipeline_config.model.ssd.num_classes
nms_score_threshold = {
pipeline_config.model.ssd.post_processing.batch_non_max_suppression.
score_threshold
pipeline_config.model.ssd.post_processing.batch_non_max_suppression
.score_threshold
}
nms_iou_threshold = {
pipeline_config.model.ssd.post_processing.batch_non_max_suppression.
iou_threshold
pipeline_config.model.ssd.post_processing.batch_non_max_suppression
.iou_threshold
}
scale_values = {}
scale_values['y_scale'] = {
......@@ -291,7 +298,7 @@ def export_tflite_graph(pipeline_config,
output_node_names=','.join([
'raw_outputs/box_encodings', 'raw_outputs/class_predictions',
'anchors'
]),
] + list(additional_output_tensors)),
restore_op_name='save/restore_all',
filename_tensor_name='save/Const:0',
clear_devices=True,
......@@ -301,9 +308,16 @@ def export_tflite_graph(pipeline_config,
# Add new operation to do post processing in a custom op (TF Lite only)
if add_postprocessing_op:
transformed_graph_def = append_postprocessing_op(
frozen_graph_def, max_detections, max_classes_per_detection,
nms_score_threshold, nms_iou_threshold, num_classes, scale_values,
detections_per_class, use_regular_nms)
frozen_graph_def,
max_detections,
max_classes_per_detection,
nms_score_threshold,
nms_iou_threshold,
num_classes,
scale_values,
detections_per_class,
use_regular_nms,
additional_output_tensors=additional_output_tensors)
else:
# Return frozen without adding post-processing custom op
transformed_graph_def = frozen_graph_def
......
......@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.export_tflite_ssd_graph."""
from __future__ import absolute_import
from __future__ import division
......@@ -31,7 +30,6 @@ from object_detection.protos import graph_rewriter_pb2
from object_detection.protos import pipeline_pb2
from object_detection.protos import post_processing_pb2
if six.PY2:
import mock # pylint: disable=g-import-not-at-top
else:
......@@ -130,7 +128,10 @@ class ExportTfliteGraphTest(tf.test.TestCase):
feed_dict={input_tensor: np.random.rand(1, 10, 10, num_channels)})
return box_encodings_np, class_predictions_np
def _export_graph(self, pipeline_config, num_channels=3):
def _export_graph(self,
pipeline_config,
num_channels=3,
additional_output_tensors=()):
"""Exports a tflite graph."""
output_dir = self.get_temp_dir()
trained_checkpoint_prefix = os.path.join(output_dir, 'model.ckpt')
......@@ -147,18 +148,22 @@ class ExportTfliteGraphTest(tf.test.TestCase):
mock_builder.return_value = FakeModel()
with tf.Graph().as_default():
tf.identity(
tf.constant([[1, 2], [3, 4]], tf.uint8), name='UnattachedTensor')
export_tflite_ssd_graph_lib.export_tflite_graph(
pipeline_config=pipeline_config,
trained_checkpoint_prefix=trained_checkpoint_prefix,
output_dir=output_dir,
add_postprocessing_op=False,
max_detections=10,
max_classes_per_detection=1)
max_classes_per_detection=1,
additional_output_tensors=additional_output_tensors)
return tflite_graph_file
def _export_graph_with_postprocessing_op(self,
pipeline_config,
num_channels=3):
num_channels=3,
additional_output_tensors=()):
"""Exports a tflite graph with custom postprocessing op."""
output_dir = self.get_temp_dir()
trained_checkpoint_prefix = os.path.join(output_dir, 'model.ckpt')
......@@ -175,13 +180,16 @@ class ExportTfliteGraphTest(tf.test.TestCase):
mock_builder.return_value = FakeModel()
with tf.Graph().as_default():
tf.identity(
tf.constant([[1, 2], [3, 4]], tf.uint8), name='UnattachedTensor')
export_tflite_ssd_graph_lib.export_tflite_graph(
pipeline_config=pipeline_config,
trained_checkpoint_prefix=trained_checkpoint_prefix,
output_dir=output_dir,
add_postprocessing_op=True,
max_detections=10,
max_classes_per_detection=1)
max_classes_per_detection=1,
additional_output_tensors=additional_output_tensors)
return tflite_graph_file
def test_export_tflite_graph_with_moving_averages(self):
......@@ -325,7 +333,8 @@ class ExportTfliteGraphTest(tf.test.TestCase):
with tf.gfile.Open(tflite_graph_file) as f:
graph_def.ParseFromString(f.read())
all_op_names = [node.name for node in graph_def.node]
self.assertTrue('TFLite_Detection_PostProcess' in all_op_names)
self.assertIn('TFLite_Detection_PostProcess', all_op_names)
self.assertNotIn('UnattachedTensor', all_op_names)
for node in graph_def.node:
if node.name == 'TFLite_Detection_PostProcess':
self.assertTrue(node.attr['_output_quantized'].b is True)
......@@ -342,6 +351,42 @@ class ExportTfliteGraphTest(tf.test.TestCase):
for t in node.attr['_output_types'].list.type
]))
def test_export_tflite_graph_with_additional_tensors(self):
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
pipeline_config.eval_config.use_moving_averages = False
pipeline_config.model.ssd.image_resizer.fixed_shape_resizer.height = 10
pipeline_config.model.ssd.image_resizer.fixed_shape_resizer.width = 10
tflite_graph_file = self._export_graph(
pipeline_config, additional_output_tensors=['UnattachedTensor'])
self.assertTrue(os.path.exists(tflite_graph_file))
graph = tf.Graph()
with graph.as_default():
graph_def = tf.GraphDef()
with tf.gfile.Open(tflite_graph_file) as f:
graph_def.ParseFromString(f.read())
all_op_names = [node.name for node in graph_def.node]
self.assertIn('UnattachedTensor', all_op_names)
def test_export_tflite_graph_with_postprocess_op_and_additional_tensors(self):
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
pipeline_config.eval_config.use_moving_averages = False
pipeline_config.model.ssd.post_processing.score_converter = (
post_processing_pb2.PostProcessing.SIGMOID)
pipeline_config.model.ssd.image_resizer.fixed_shape_resizer.height = 10
pipeline_config.model.ssd.image_resizer.fixed_shape_resizer.width = 10
pipeline_config.model.ssd.num_classes = 2
tflite_graph_file = self._export_graph_with_postprocessing_op(
pipeline_config, additional_output_tensors=['UnattachedTensor'])
self.assertTrue(os.path.exists(tflite_graph_file))
graph = tf.Graph()
with graph.as_default():
graph_def = tf.GraphDef()
with tf.gfile.Open(tflite_graph_file) as f:
graph_def.ParseFromString(f.read())
all_op_names = [node.name for node in graph_def.node]
self.assertIn('TFLite_Detection_PostProcess', all_op_names)
self.assertIn('UnattachedTensor', all_op_names)
@mock.patch.object(exporter, 'rewrite_nn_resize_op')
def test_export_with_nn_resize_op_not_called_without_fpn(self, mock_get):
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
......
......@@ -40,50 +40,54 @@ def rewrite_nn_resize_op(is_quantized=False):
Args:
is_quantized: True if the default graph is quantized.
"""
input_pattern = graph_matcher.OpTypePattern(
'FakeQuantWithMinMaxVars' if is_quantized else '*')
reshape_1_pattern = graph_matcher.OpTypePattern(
'Reshape', inputs=[input_pattern, 'Const'], ordered_inputs=False)
mul_pattern = graph_matcher.OpTypePattern(
'Mul', inputs=[reshape_1_pattern, 'Const'], ordered_inputs=False)
# The quantization script may or may not insert a fake quant op after the
# Mul. In either case, these min/max vars are not needed once replaced with
# the TF version of NN resize.
fake_quant_pattern = graph_matcher.OpTypePattern(
'FakeQuantWithMinMaxVars',
inputs=[mul_pattern, 'Identity', 'Identity'],
ordered_inputs=False)
reshape_2_pattern = graph_matcher.OpTypePattern(
'Reshape',
inputs=[graph_matcher.OneofPattern([fake_quant_pattern, mul_pattern]),
'Const'],
ordered_inputs=False)
add_type_name = 'Add'
if tf.compat.forward_compatible(2019, 6, 26):
add_type_name = 'AddV2'
add_pattern = graph_matcher.OpTypePattern(
add_type_name, inputs=[reshape_2_pattern, '*'], ordered_inputs=False)
matcher = graph_matcher.GraphMatcher(add_pattern)
for match in matcher.match_graph(tf.get_default_graph()):
projection_op = match.get_op(input_pattern)
reshape_2_op = match.get_op(reshape_2_pattern)
add_op = match.get_op(add_pattern)
nn_resize = tf.image.resize_nearest_neighbor(
projection_op.outputs[0],
add_op.outputs[0].shape.dims[1:3],
align_corners=False,
name=os.path.split(reshape_2_op.name)[0] + '/resize_nearest_neighbor')
for index, op_input in enumerate(add_op.inputs):
if op_input == reshape_2_op.outputs[0]:
add_op._update_input(index, nn_resize) # pylint: disable=protected-access
break
def remove_nn():
"""Remove nearest neighbor upsampling structure and replace with TF op."""
input_pattern = graph_matcher.OpTypePattern(
'FakeQuantWithMinMaxVars' if is_quantized else '*')
stack_1_pattern = graph_matcher.OpTypePattern(
'Pack', inputs=[input_pattern, input_pattern], ordered_inputs=False)
stack_2_pattern = graph_matcher.OpTypePattern(
'Pack', inputs=[stack_1_pattern, stack_1_pattern], ordered_inputs=False)
reshape_pattern = graph_matcher.OpTypePattern(
'Reshape', inputs=[stack_2_pattern, 'Const'], ordered_inputs=False)
consumer_pattern = graph_matcher.OpTypePattern(
'Add|AddV2|Max|Mul', inputs=[reshape_pattern, '*'],
ordered_inputs=False)
match_counter = 0
matcher = graph_matcher.GraphMatcher(consumer_pattern)
for match in matcher.match_graph(tf.get_default_graph()):
match_counter += 1
projection_op = match.get_op(input_pattern)
reshape_op = match.get_op(reshape_pattern)
consumer_op = match.get_op(consumer_pattern)
nn_resize = tf.image.resize_nearest_neighbor(
projection_op.outputs[0],
reshape_op.outputs[0].shape.dims[1:3],
align_corners=False,
name=os.path.split(reshape_op.name)[0] + '/resize_nearest_neighbor')
for index, op_input in enumerate(consumer_op.inputs):
if op_input == reshape_op.outputs[0]:
consumer_op._update_input(index, nn_resize) # pylint: disable=protected-access
break
tf.logging.info('Found and fixed {} matches'.format(match_counter))
return match_counter
# Applying twice because both inputs to Add could be NN pattern
total_removals = 0
while remove_nn():
total_removals += 1
# This number is chosen based on the nas-fpn architecture.
if total_removals > 4:
raise ValueError('Graph removal encountered an infinite loop.')
def replace_variable_values_with_moving_averages(graph,
current_checkpoint_file,
new_checkpoint_file):
new_checkpoint_file,
no_ema_collection=None):
"""Replaces variable values in the checkpoint with their moving averages.
If the current checkpoint has shadow variables maintaining moving averages of
......@@ -95,10 +99,14 @@ def replace_variable_values_with_moving_averages(graph,
current_checkpoint_file: a checkpoint containing both original variables and
their moving averages.
new_checkpoint_file: file path to write a new checkpoint.
no_ema_collection: A list of namescope substrings used to match the
variables to exclude from EMA replacement.
"""
with graph.as_default():
variable_averages = tf.train.ExponentialMovingAverage(0.0)
ema_variables_to_restore = variable_averages.variables_to_restore()
ema_variables_to_restore = config_util.remove_unecessary_ema(
ema_variables_to_restore, no_ema_collection)
with tf.Session() as sess:
read_saver = tf.train.Saver(ema_variables_to_restore)
read_saver.restore(sess, current_checkpoint_file)
......
......@@ -21,6 +21,7 @@ import tensorflow as tf
from google.protobuf import text_format
from tensorflow.python.framework import dtypes
from tensorflow.python.ops import array_ops
from tensorflow.python.tools import strip_unused_lib
from object_detection import exporter
from object_detection.builders import graph_rewriter_builder
from object_detection.builders import model_builder
......@@ -1056,6 +1057,42 @@ class ExportInferenceGraphTest(tf.test.TestCase):
self.assertTrue(resize_op_found)
def test_rewrite_nn_resize_op_multiple_path(self):
g = tf.Graph()
with g.as_default():
with tf.name_scope('nearest_upsampling'):
x = array_ops.placeholder(dtypes.float32, shape=(8, 10, 10, 8))
x_stack = tf.stack([tf.stack([x] * 2, axis=3)] * 2, axis=2)
x_reshape = tf.reshape(x_stack, [8, 20, 20, 8])
with tf.name_scope('nearest_upsampling'):
x_2 = array_ops.placeholder(dtypes.float32, shape=(8, 10, 10, 8))
x_stack_2 = tf.stack([tf.stack([x_2] * 2, axis=3)] * 2, axis=2)
x_reshape_2 = tf.reshape(x_stack_2, [8, 20, 20, 8])
t = x_reshape + x_reshape_2
exporter.rewrite_nn_resize_op()
graph_def = g.as_graph_def()
graph_def = strip_unused_lib.strip_unused(
graph_def,
input_node_names=[
'nearest_upsampling/Placeholder', 'nearest_upsampling_1/Placeholder'
],
output_node_names=['add'],
placeholder_type_enum=dtypes.float32.as_datatype_enum)
counter_resize_op = 0
t_input_ops = [op.name for op in t.op.inputs]
for node in graph_def.node:
# Make sure Stacks are replaced.
self.assertNotEqual(node.op, 'Pack')
if node.op == 'ResizeNearestNeighbor':
counter_resize_op += 1
self.assertIn(node.name + ':0', t_input_ops)
self.assertEqual(counter_resize_op, 2)
if __name__ == '__main__':
tf.test.main()
......@@ -66,6 +66,9 @@ python models/research/object_detection/metrics/oid_challenge_evaluation.py \
--output_metrics=${OUTPUT_METRICS} \
```
Note that the predictions file must contain the following keys:
ImageID,LabelName,Score,XMin,XMax,YMin,YMax
For the Object Detection Track, the participants will be ranked on:
- "OpenImagesDetectionChallenge_Precision/mAP@0.5IOU"
......@@ -94,10 +97,11 @@ evaluation metric implementation is available in the class
masks.
Those should be transformed into a single CSV file in the format:
ImageID,LabelName,ImageWidth,ImageHeight,XMin,YMin,XMax,YMax,GroupOf,Mask
where Mask is MS COCO RLE encoding of a binary mask stored in .png file.
NOTE: the util to make the transformation will be released soon.
ImageID,LabelName,ImageWidth,ImageHeight,XMin,YMin,XMax,YMax,IsGroupOf,Mask
where Mask is the MS COCO RLE encoding of the binary mask stored in the .png
file, compressed with zip and re-coded with base64. See an example
implementation of the encoding function
[here](https://gist.github.com/pculliton/209398a2a52867580c6103e25e55d93c).
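For reference, a minimal sketch of that encoding, assuming pycocotools is available; the linked gist is the authoritative implementation:

```python
import base64
import zlib

import numpy as np
from pycocotools import mask as coco_mask

def encode_binary_mask(mask):
  """Binary [h, w] mask -> zip-compressed, base64-encoded COCO RLE string."""
  rle = coco_mask.encode(np.asfortranarray(mask.astype(np.uint8)))
  return base64.b64encode(zlib.compress(rle['counts'])).decode('ascii')
```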
1. Run the following command to create hierarchical expansion of the instance
segmentation, bounding boxes and image-level label annotations: {value=4}
......@@ -142,6 +146,11 @@ python models/research/object_detection/metrics/oid_challenge_evaluation.py \
--output_metrics=${OUTPUT_METRICS} \
```
Note that the predictions file must contain the following keys:
ImageID,ImageWidth,ImageHeight,LabelName,Score,Mask
Mask must be encoded the same way as groundtruth masks.
For the Instance Segmentation Track, the participants will be ranked on:
- "OpenImagesInstanceSegmentationChallenge_Precision/mAP@0.5IOU"
......@@ -196,6 +205,9 @@ python object_detection/metrics/oid_vrd_challenge_evaluation.py \
--output_metrics=${OUTPUT_METRICS}
```
Note that the predictions file must contain the following keys:
ImageID,LabelName1,LabelName2,RelationshipLabel,Score,XMin1,XMax1,YMin1,YMax1,XMin2,XMax2,YMin2,YMax2
The participants of the challenge will be evaluated by a weighted average of the following three metrics:
- "VRDMetric_Relationships_mAP@0.5IOU"
......
......@@ -35,17 +35,20 @@ tar -xzvf ssd_mobilenet_v1_coco.tar.gz
Inside the un-tar'ed directory, you will find:
* a graph proto (`graph.pbtxt`)
* a checkpoint
(`model.ckpt.data-00000-of-00001`, `model.ckpt.index`, `model.ckpt.meta`)
* a frozen graph proto with weights baked into the graph as constants
(`frozen_inference_graph.pb`) to be used for out of the box inference
(try this out in the Jupyter notebook!)
* a config file (`pipeline.config`) which was used to generate the graph. These
directly correspond to a config file in the
[samples/configs](https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs) directory but often with a modified score threshold. In the case
of the heavier Faster R-CNN models, we also provide a version of the model
that uses a highly reduced number of proposals for speed.
* a graph proto (`graph.pbtxt`)
* a checkpoint (`model.ckpt.data-00000-of-00001`, `model.ckpt.index`,
`model.ckpt.meta`)
* a frozen graph proto with weights baked into the graph as constants
(`frozen_inference_graph.pb`) to be used for out of the box inference (try
this out in the Jupyter notebook!)
* a config file (`pipeline.config`) which was used to generate the graph.
These directly correspond to a config file in the
[samples/configs](https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs)
directory but often with a modified score threshold. In the case of the
heavier Faster R-CNN models, we also provide a version of the model that
uses a highly reduced number of proposals for speed.
* Mobile model only: a TfLite file (`model.tflite`) that can be deployed on
mobile devices.
Some remarks on frozen inference graphs:
......@@ -100,6 +103,13 @@ Note: The asterisk (☆) at the end of model name indicates that this model supp
Note: If you download the tar.gz file of quantized models and un-tar it, you will get a different set of files - a checkpoint, a config file, and tflite frozen graphs (txt/binary).
### Mobile models
Model name | Pixel 1 Latency (ms) | COCO mAP | Outputs
----------------------------------------------------------------------------------------------------------------------------------- | :------------------: | :------: | :-----:
[ssd_mobilenet_v3_large_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v3_large_coco_2019_08_14.tar.gz) | 119 | 22.3 | Boxes
[ssd_mobilenet_v3_small_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v3_small_coco_2019_08_14.tar.gz) | 43 | 15.6 | Boxes
## Kitti-trained models
Model name | Speed (ms) | Pascal mAP@0.5 | Outputs
......
......@@ -71,7 +71,8 @@ def transform_input_data(tensor_dict,
merge_multiple_boxes=False,
retain_original_image=False,
use_multiclass_scores=False,
use_bfloat16=False):
use_bfloat16=False,
retain_original_image_additional_channels=False):
"""A single function that is responsible for all input data transformations.
Data transformation functions are applied in the following order.
......@@ -110,6 +111,8 @@ def transform_input_data(tensor_dict,
this is True and multiclass_scores is empty, one-hot encoding of
`groundtruth_classes` is used as a fallback.
use_bfloat16: (optional) a bool, whether to use bfloat16 in training.
retain_original_image_additional_channels: (optional) Whether to retain
original image additional channels in the output dictionary.
Returns:
A dictionary keyed by fields.InputDataFields containing the tensors obtained
......@@ -139,6 +142,10 @@ def transform_input_data(tensor_dict,
channels = out_tensor_dict[fields.InputDataFields.image_additional_channels]
out_tensor_dict[fields.InputDataFields.image] = tf.concat(
[out_tensor_dict[fields.InputDataFields.image], channels], axis=2)
if retain_original_image_additional_channels:
out_tensor_dict[
fields.InputDataFields.image_additional_channels] = tf.cast(
image_resizer_fn(channels, None)[0], tf.uint8)
# Apply data augmentation ops.
if data_augmentation_fn is not None:
......@@ -445,6 +452,9 @@ def _get_features_dict(input_dict):
if fields.InputDataFields.original_image in input_dict:
features[fields.InputDataFields.original_image] = input_dict[
fields.InputDataFields.original_image]
if fields.InputDataFields.image_additional_channels in input_dict:
features[fields.InputDataFields.image_additional_channels] = input_dict[
fields.InputDataFields.image_additional_channels]
return features
......@@ -663,7 +673,9 @@ def eval_input(eval_config, eval_input_config, model_config,
image_resizer_fn=image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=None,
retain_original_image=eval_config.retain_original_images)
retain_original_image=eval_config.retain_original_images,
retain_original_image_additional_channels=
eval_config.retain_original_image_additional_channels)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=eval_input_config.max_number_of_boxes,
......
......@@ -301,6 +301,70 @@ class InputsTest(test_case.TestCase, parameterized.TestCase):
self.assertEqual(
tf.int32, labels[fields.InputDataFields.groundtruth_difficult].dtype)
def test_ssd_inceptionV2_eval_input_with_additional_channels(
self, eval_batch_size=1):
"""Tests the eval input function for SSDInceptionV2 with additional channels.
Args:
eval_batch_size: Batch size for eval set.
"""
configs = _get_configs_for_model('ssd_inception_v2_pets')
model_config = configs['model']
model_config.ssd.num_classes = 37
configs['eval_input_configs'][0].num_additional_channels = 1
eval_config = configs['eval_config']
eval_config.batch_size = eval_batch_size
eval_config.retain_original_image_additional_channels = True
eval_input_fn = inputs.create_eval_input_fn(
eval_config, configs['eval_input_configs'][0], model_config)
features, labels = _make_initializable_iterator(eval_input_fn()).get_next()
self.assertAllEqual([eval_batch_size, 300, 300, 4],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual(
[eval_batch_size, 300, 300, 3],
features[fields.InputDataFields.original_image].shape.as_list())
self.assertEqual(tf.uint8,
features[fields.InputDataFields.original_image].dtype)
self.assertAllEqual([eval_batch_size, 300, 300, 1], features[
fields.InputDataFields.image_additional_channels].shape.as_list())
self.assertEqual(
tf.uint8,
features[fields.InputDataFields.image_additional_channels].dtype)
self.assertAllEqual([eval_batch_size],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[eval_batch_size, 100, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[eval_batch_size, 100, model_config.ssd.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_area].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_area].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_is_crowd].shape.as_list())
self.assertEqual(tf.bool,
labels[fields.InputDataFields.groundtruth_is_crowd].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_difficult].shape.as_list())
self.assertEqual(tf.int32,
labels[fields.InputDataFields.groundtruth_difficult].dtype)
def test_predict_input(self):
"""Tests the predict input function."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
......
......@@ -326,7 +326,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
clip_anchors_to_image=False,
use_static_shapes=False,
resize_masks=True,
freeze_batchnorm=False):
freeze_batchnorm=False,
return_raw_detections_during_predict=False):
"""FasterRCNNMetaArch Constructor.
Args:
......@@ -455,7 +456,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
stage box predictor during training or not. When training with a small
batch size (e.g. 1), it is desirable to freeze batch norm update and
use pretrained batch norm params.
return_raw_detections_during_predict: Whether to return raw detection
boxes in the predict() method. These are decoded boxes that have not
been through postprocessing (i.e. NMS). Default False.
Raises:
ValueError: If `second_stage_batch_size` > `first_stage_max_proposals` at
training time.
......@@ -623,6 +626,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
if self._number_of_stages <= 0 or self._number_of_stages > 3:
raise ValueError('Number of stages should be a value in {1, 2, 3}.')
self._batched_prediction_tensor_names = []
self._return_raw_detections_during_predict = (
return_raw_detections_during_predict)
@property
def first_stage_feature_extractor_scope(self):
......@@ -694,16 +699,12 @@ class FasterRCNNMetaArch(model.DetectionModel):
Raises:
ValueError: if inputs tensor does not have type tf.float32
"""
if inputs.dtype is not tf.float32:
raise ValueError('`preprocess` expects a tf.float32 tensor')
with tf.name_scope('Preprocessor'):
outputs = shape_utils.static_or_dynamic_map_fn(
self._image_resizer_fn,
elems=inputs,
dtype=[tf.float32, tf.int32],
parallel_iterations=self._parallel_iterations)
resized_inputs = outputs[0]
true_image_shapes = outputs[1]
(resized_inputs,
true_image_shapes) = shape_utils.resize_images_and_return_shapes(
inputs, self._image_resizer_fn)
return (self._feature_extractor.preprocess(resized_inputs),
true_image_shapes)
......@@ -790,31 +791,42 @@ class FasterRCNNMetaArch(model.DetectionModel):
for the first stage RPN (in absolute coordinates). Note that
`num_anchors` can differ depending on whether the model is created in
training or inference mode.
7) feature_maps: A single element list containing a 4-D float32 tensor
with shape batch_size, height, width, depth] representing the RPN
features to crop.
(and if number_of_stages > 1):
7) refined_box_encodings: a 3-D tensor with shape
8) refined_box_encodings: a 3-D tensor with shape
[total_num_proposals, num_classes, self._box_coder.code_size]
representing predicted (final) refined box encodings, where
total_num_proposals=batch_size*self._max_num_proposals. If using
a shared box across classes the shape will instead be
[total_num_proposals, 1, self._box_coder.code_size].
8) class_predictions_with_background: a 3-D tensor with shape
9) class_predictions_with_background: a 3-D tensor with shape
[total_num_proposals, num_classes + 1] containing class
predictions (logits) for each of the anchors, where
total_num_proposals=batch_size*self._max_num_proposals.
Note that this tensor *includes* background class predictions
(at class index 0).
9) num_proposals: An int32 tensor of shape [batch_size] representing the
number of proposals generated by the RPN. `num_proposals` allows us
to keep track of which entries are to be treated as zero paddings and
which are not since we always pad the number of proposals to be
10) num_proposals: An int32 tensor of shape [batch_size] representing
the number of proposals generated by the RPN. `num_proposals` allows
us to keep track of which entries are to be treated as zero paddings
and which are not since we always pad the number of proposals to be
`self.max_num_proposals` for each image.
10) proposal_boxes: A float32 tensor of shape
11) proposal_boxes: A float32 tensor of shape
[batch_size, self.max_num_proposals, 4] representing
decoded proposal bounding boxes in absolute coordinates.
11) mask_predictions: (optional) a 4-D tensor with shape
12) mask_predictions: (optional) a 4-D tensor with shape
[total_num_padded_proposals, num_classes, mask_height, mask_width]
containing instance mask predictions.
13) raw_detection_boxes: (optional) a
[batch_size, self.max_num_proposals, num_classes, 4] float32 tensor
with detections prior to NMS in normalized coordinates.
14) raw_detection_feature_map_indices: (optional) a
[batch_size, self.max_num_proposals, num_classes] int32 tensor with
indices indicating which feature map each raw detection box was
produced from. The indices correspond to the elements in the
'feature_maps' field.
Raises:
ValueError: If `predict` is called before `preprocess`.
......@@ -868,6 +880,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
for the first stage RPN (in absolute coordinates). Note that
`num_anchors` can differ depending on whether the model is created in
training or inference mode.
7) feature_maps: A single element list containing a 4-D float32 tensor
with shape [batch_size, height, width, depth] representing the RPN
features to crop.
"""
(rpn_box_predictor_features, rpn_features_to_crop, anchors_boxlist,
image_shape) = self._extract_rpn_feature_maps(preprocessed_inputs)
......@@ -907,6 +922,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
dtype=tf.float32),
'anchors':
anchors_boxlist.data['boxes'],
fields.PredictionFields.feature_maps: [rpn_features_to_crop]
}
return prediction_dict
......@@ -985,18 +1001,25 @@ class FasterRCNNMetaArch(model.DetectionModel):
of the image.
6) box_classifier_features: a 4-D float32/bfloat16 tensor
representing the features for each proposal.
If self._return_raw_detections_during_predict is True, the dictionary
will also contain:
7) raw_detection_boxes: a 4-D float32 tensor with shape
[batch_size, self.max_num_proposals, num_classes, 4] in normalized
coordinates.
8) raw_detection_feature_map_indices: a 3-D int32 tensor with shape
[batch_size, self.max_num_proposals, num_classes].
"""
proposal_boxes_normalized, num_proposals = self._proposal_postprocess(
rpn_box_encodings, rpn_objectness_predictions_with_background, anchors,
image_shape, true_image_shapes)
prediction_dict = self._box_prediction(rpn_features_to_crop,
proposal_boxes_normalized,
image_shape)
image_shape, true_image_shapes)
prediction_dict['num_proposals'] = num_proposals
return prediction_dict
def _box_prediction(self, rpn_features_to_crop, proposal_boxes_normalized,
image_shape):
image_shape, true_image_shapes):
"""Predicts the output tensors from second stage of Faster R-CNN.
Args:
......@@ -1008,6 +1031,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
proposal boxes for all images in the batch. These boxes are represented
as normalized coordinates.
image_shape: A 1D int32 tensors of size [4] containing the image shape.
true_image_shapes: int32 tensor of shape [batch, 3] where each row is
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
Returns:
prediction_dict: a dictionary holding "raw" prediction tensors:
......@@ -1034,6 +1061,16 @@ class FasterRCNNMetaArch(model.DetectionModel):
of the image.
5) box_classifier_features: a 4-D float32/bfloat16 tensor
representing the features for each proposal.
If self._return_raw_detections_during_predict is True, the dictionary
will also contain:
6) raw_detection_boxes: a 4-D float32 tensor with shape
[batch_size, self.max_num_proposals, num_classes, 4] in normalized
coordinates.
7) raw_detection_feature_map_indices: a 3-D int32 tensor with shape
[batch_size, self.max_num_proposals, num_classes].
8) final_anchors: a 3-D float tensor of shape [batch_size,
self.max_num_proposals, 4] containing the reference anchors for raw
detection boxes in normalized coordinates.
"""
flattened_proposal_feature_maps = (
self._compute_second_stage_input_feature_maps(
......@@ -1071,10 +1108,54 @@ class FasterRCNNMetaArch(model.DetectionModel):
'proposal_boxes': absolute_proposal_boxes,
'box_classifier_features': box_classifier_features,
'proposal_boxes_normalized': proposal_boxes_normalized,
'final_anchors': proposal_boxes_normalized
}
if self._return_raw_detections_during_predict:
prediction_dict.update(self._raw_detections_and_feature_map_inds(
refined_box_encodings, absolute_proposal_boxes, true_image_shapes))
return prediction_dict
def _raw_detections_and_feature_map_inds(
self, refined_box_encodings, absolute_proposal_boxes, true_image_shapes):
"""Returns raw detections and feat map inds from where they originated.
Args:
refined_box_encodings: [total_num_proposals, num_classes,
self._box_coder.code_size] float32 tensor.
absolute_proposal_boxes: [batch_size, self.max_num_proposals, 4] float32
tensor representing decoded proposal bounding boxes in absolute
coordinates.
true_image_shapes: [batch, 3] int32 tensor where each row is
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
Returns:
A dictionary with raw detection boxes, and the feature map indices from
which they originated.
"""
box_encodings_batch = tf.reshape(
refined_box_encodings,
[-1, self.max_num_proposals, refined_box_encodings.shape[1],
self._box_coder.code_size])
raw_detection_boxes_absolute = self._batch_decode_boxes(
box_encodings_batch, absolute_proposal_boxes)
raw_detection_boxes_normalized = shape_utils.static_or_dynamic_map_fn(
self._normalize_and_clip_boxes,
elems=[raw_detection_boxes_absolute, true_image_shapes],
dtype=tf.float32)
detection_feature_map_indices = tf.zeros_like(
raw_detection_boxes_normalized[:, :, :, 0], dtype=tf.int32)
return {
fields.PredictionFields.raw_detection_boxes:
raw_detection_boxes_normalized,
fields.PredictionFields.raw_detection_feature_map_indices:
detection_feature_map_indices
}
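For readers tracking the shape bookkeeping above, a minimal self-contained sketch of the reshape and the all-zeros feature map indices (sizes are hypothetical; this is illustrative, not the library's code):

import tensorflow as tf

batch_size, max_num_proposals, num_classes, code_size = 2, 8, 3, 4
refined_box_encodings = tf.zeros(
    [batch_size * max_num_proposals, num_classes, code_size])
# Recover the per-image batch dimension before decoding against proposals.
box_encodings_batch = tf.reshape(
    refined_box_encodings, [-1, max_num_proposals, num_classes, code_size])
# Faster R-CNN crops every proposal from a single feature map, so each raw
# detection is assigned feature map index 0.
feature_map_indices = tf.zeros(
    [batch_size, max_num_proposals, num_classes], dtype=tf.int32)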
def _extract_box_classifier_features(self, flattened_feature_maps):
if self._feature_extractor_for_box_classifier_features == (
_UNINITIALIZED_FEATURE_EXTRACTOR):
......@@ -1416,11 +1497,12 @@ class FasterRCNNMetaArch(model.DetectionModel):
detection_boxes: [batch, max_detection, 4]
detection_scores: [batch, max_detections]
detection_multiclass_scores: [batch, max_detections, 2]
detection_anchor_indices: [batch, max_detections]
detection_classes: [batch, max_detections]
(this entry is only created if rpn_mode=False)
num_detections: [batch]
raw_detection_boxes: [batch, max_detections, 4]
raw_detection_scores: [batch, max_detections, num_classes + 1]
raw_detection_boxes: [batch, total_detections, 4]
raw_detection_scores: [batch, total_detections, num_classes + 1]
Raises:
ValueError: If `predict` is called before `preprocess`.
......@@ -1473,6 +1555,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
if self._number_of_stages == 3:
# Post processing is already performed in 3rd stage. We need to transfer
# postprocessed tensors from `prediction_dict` to `detections_dict`.
# Remove any items from the prediction dictionary if they are not pure
# Tensors.
non_tensor_predictions = [
k for k, v in prediction_dict.items() if not isinstance(v, tf.Tensor)]
for k in non_tensor_predictions:
tf.logging.info('Removing {0} from prediction_dict'.format(k))
prediction_dict.pop(k)
return prediction_dict
def _add_detection_features_output_node(self, detection_boxes,
......@@ -1621,8 +1710,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
normalize_boxes,
elems=[raw_proposal_boxes, image_shapes],
dtype=tf.float32)
proposal_multiclass_scores = nmsed_additional_fields.get(
'multiclass_scores') if nmsed_additional_fields else None,
proposal_multiclass_scores = (
nmsed_additional_fields.get('multiclass_scores')
if nmsed_additional_fields else None)
return (normalized_proposal_boxes, proposal_scores,
proposal_multiclass_scores, num_proposals,
raw_normalized_proposal_boxes, rpn_objectness_softmax)
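The rewrite above also fixes a subtle bug: the old assignment ended with a stray comma, so `proposal_multiclass_scores` was always a 1-tuple instead of a tensor or None. A tiny plain-Python demonstration of the pitfall:

fields = {'multiclass_scores': [0.9, 0.1]}

# Pre-fix form: the trailing comma wraps the result in a 1-tuple.
scores = fields.get('multiclass_scores') if fields else None,
assert scores == ([0.9, 0.1],)

# Post-fix form: parenthesize the conditional expression instead.
scores = (fields.get('multiclass_scores') if fields else None)
assert scores == [0.9, 0.1]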
......@@ -1899,9 +1989,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
A dictionary containing:
`detection_boxes`: [batch, max_detection, 4] in normalized co-ordinates.
`detection_scores`: [batch, max_detections]
detection_multiclass_scores: [batch, max_detections,
`detection_multiclass_scores`: [batch, max_detections,
num_classes_with_background] tensor with class score distribution for
post-processed detection boxes including background class if any.
`detection_anchor_indices`: [batch, max_detections] with anchor
indices.
`detection_classes`: [batch, max_detections]
`num_detections`: [batch]
`detection_masks`:
......@@ -1909,10 +2001,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
that a pixel-wise sigmoid score converter is applied to the detection
masks.
`raw_detection_boxes`: [batch, total_detections, 4] tensor with decoded
detection boxes before Non-Max Suppression.
detection boxes in normalized coordinates, before Non-Max Suppression.
The value total_detections is the number of second stage anchors
(i.e. the total number of boxes before NMS).
`raw_detection_scores`: [batch, total_detections,
num_classes_with_background] tensor of multi-class scores for
raw detection boxes.
raw detection boxes. The value total_detections is the number of
second stage anchors (i.e. the total number of boxes before NMS).
"""
refined_box_encodings_batch = tf.reshape(
refined_box_encodings,
......@@ -1943,8 +2038,14 @@ class FasterRCNNMetaArch(model.DetectionModel):
mask_predictions, [-1, self.max_num_proposals,
self.num_classes, mask_height, mask_width])
batch_size = shape_utils.combined_static_and_dynamic_shape(
refined_box_encodings_batch)[0]
batch_anchor_indices = tf.tile(
tf.expand_dims(tf.range(self.max_num_proposals), 0),
multiples=[batch_size, 1])
additional_fields = {
'multiclass_scores': class_predictions_with_background_batch_normalized
'multiclass_scores': class_predictions_with_background_batch_normalized,
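# All additional fields passed through NMS must be float (see the SSD
# postprocessing path), so the int32 anchor indices are cast to float
# here and back to int32 after NMS.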
'anchor_indices': tf.cast(batch_anchor_indices, tf.float32)
}
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
nmsed_additional_fields, num_detections) = self._second_stage_nms_fn(
......@@ -1965,25 +2066,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
else:
raw_detection_boxes = tf.squeeze(refined_decoded_boxes_batch, axis=2)
def normalize_and_clip_boxes(args):
"""Normalize and clip boxes."""
boxes_per_image = args[0]
image_shape = args[1]
normalized_boxes_per_image = box_list_ops.to_normalized_coordinates(
box_list.BoxList(boxes_per_image),
image_shape[0],
image_shape[1],
check_range=False).get()
normalized_boxes_per_image = box_list_ops.clip_to_window(
box_list.BoxList(normalized_boxes_per_image),
tf.constant([0.0, 0.0, 1.0, 1.0], tf.float32),
filter_nonoverlapping=False).get()
return normalized_boxes_per_image
raw_normalized_detection_boxes = shape_utils.static_or_dynamic_map_fn(
normalize_and_clip_boxes,
self._normalize_and_clip_boxes,
elems=[raw_detection_boxes, image_shapes],
dtype=tf.float32)
......@@ -1996,6 +2080,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
nmsed_classes,
fields.DetectionResultFields.detection_multiclass_scores:
nmsed_additional_fields['multiclass_scores'],
fields.DetectionResultFields.detection_anchor_indices:
tf.cast(nmsed_additional_fields['anchor_indices'], tf.int32),
fields.DetectionResultFields.num_detections:
tf.cast(num_detections, dtype=tf.float32),
fields.DetectionResultFields.raw_detection_boxes:
......@@ -2041,6 +2127,35 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.stack([combined_shape[0], combined_shape[1],
num_classes, 4]))
def _normalize_and_clip_boxes(self, boxes_and_image_shape):
"""Normalize and clip boxes."""
boxes_per_image = boxes_and_image_shape[0]
image_shape = boxes_and_image_shape[1]
boxes_contains_classes_dim = boxes_per_image.shape.ndims == 3
if boxes_contains_classes_dim:
boxes_per_image = shape_utils.flatten_first_n_dimensions(
boxes_per_image, 2)
normalized_boxes_per_image = box_list_ops.to_normalized_coordinates(
box_list.BoxList(boxes_per_image),
image_shape[0],
image_shape[1],
check_range=False).get()
normalized_boxes_per_image = box_list_ops.clip_to_window(
box_list.BoxList(normalized_boxes_per_image),
tf.constant([0.0, 0.0, 1.0, 1.0], tf.float32),
filter_nonoverlapping=False).get()
if boxes_contains_classes_dim:
max_num_proposals, num_classes, _ = (
shape_utils.combined_static_and_dynamic_shape(
boxes_and_image_shape[0]))
normalized_boxes_per_image = shape_utils.expand_first_dimension(
normalized_boxes_per_image, [max_num_proposals, num_classes])
return normalized_boxes_per_image
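Functionally, the helper reduces to the following minimal sketch (plain TF ops standing in for box_list_ops, with the optional classes dimension omitted):

import tensorflow as tf

def normalize_and_clip(boxes, height, width):
  # Scale absolute [ymin, xmin, ymax, xmax] boxes into [0, 1] coordinates,
  # then clip anything that falls outside the image window.
  scale = tf.cast(tf.stack([height, width, height, width]), tf.float32)
  return tf.clip_by_value(boxes / scale, 0.0, 1.0)

# E.g. a box [5, -2, 12, 9] in a 10x10 image becomes [0.5, 0.0, 1.0, 0.9].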
def loss(self, prediction_dict, true_image_shapes, scope=None):
"""Compute scalar loss tensors given prediction tensors.
......
......@@ -244,7 +244,8 @@ class FasterRCNNMetaArchTest(
max_num_proposals,
initial_crop_size,
maxpool_stride,
3)
3),
'feature_maps': [(2, image_size, image_size, 512)]
}
for input_shape in input_shapes:
......@@ -274,9 +275,12 @@ class FasterRCNNMetaArchTest(
'detection_boxes', 'detection_scores',
'detection_multiclass_scores', 'detection_classes',
'detection_masks', 'num_detections', 'mask_predictions',
'raw_detection_boxes', 'raw_detection_scores'
'raw_detection_boxes', 'raw_detection_scores',
'detection_anchor_indices', 'final_anchors',
])))
for key in expected_shapes:
if isinstance(tensor_dict_out[key], list):
continue
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
self.assertAllEqual(tensor_dict_out['detection_boxes'].shape, [2, 5, 4])
self.assertAllEqual(tensor_dict_out['detection_masks'].shape,
......@@ -288,6 +292,101 @@ class FasterRCNNMetaArchTest(
self.assertAllEqual(tensor_dict_out['mask_predictions'].shape,
[10, num_classes, 14, 14])
@parameterized.parameters(
{'use_keras': True},
{'use_keras': False},
)
def test_raw_detection_boxes_and_anchor_indices_correct(self, use_keras):
batch_size = 2
image_size = 10
max_num_proposals = 8
initial_crop_size = 3
maxpool_stride = 1
input_shapes = [(batch_size, image_size, image_size, 3),
(None, image_size, image_size, 3),
(batch_size, None, None, 3),
(None, None, None, 3)]
expected_num_anchors = image_size * image_size * 3 * 3
expected_shapes = {
'rpn_box_predictor_features':
(batch_size, image_size, image_size, 512),
'rpn_features_to_crop': (batch_size, image_size, image_size, 3),
'image_shape': (4,),
'rpn_box_encodings': (batch_size, expected_num_anchors, 4),
'rpn_objectness_predictions_with_background':
(batch_size, expected_num_anchors, 2),
'anchors': (expected_num_anchors, 4),
'refined_box_encodings': (batch_size * max_num_proposals, 1, 4),
'class_predictions_with_background':
(batch_size * max_num_proposals, 2 + 1),
'num_proposals': (batch_size,),
'proposal_boxes': (batch_size, max_num_proposals, 4),
'proposal_boxes_normalized': (batch_size, max_num_proposals, 4),
'box_classifier_features':
self._get_box_classifier_features_shape(image_size,
batch_size,
max_num_proposals,
initial_crop_size,
maxpool_stride,
3),
'feature_maps': [(batch_size, image_size, image_size, 3)],
'raw_detection_feature_map_indices': (batch_size, max_num_proposals, 1),
'raw_detection_boxes': (batch_size, max_num_proposals, 1, 4),
'final_anchors': (batch_size, max_num_proposals, 4)
}
for input_shape in input_shapes:
test_graph = tf.Graph()
with test_graph.as_default():
model = self._build_model(
is_training=False,
use_keras=use_keras,
number_of_stages=2,
second_stage_batch_size=2,
share_box_across_classes=True,
return_raw_detections_during_predict=True)
preprocessed_inputs = tf.placeholder(tf.float32, shape=input_shape)
_, true_image_shapes = model.preprocess(preprocessed_inputs)
predict_tensor_dict = model.predict(preprocessed_inputs,
true_image_shapes)
postprocess_tensor_dict = model.postprocess(predict_tensor_dict,
true_image_shapes)
init_op = tf.global_variables_initializer()
with self.test_session(graph=test_graph) as sess:
sess.run(init_op)
[predict_dict_out, postprocess_dict_out] = sess.run(
[predict_tensor_dict, postprocess_tensor_dict], feed_dict={
preprocessed_inputs:
np.zeros((batch_size, image_size, image_size, 3))})
self.assertEqual(
set(predict_dict_out.keys()),
set(expected_shapes.keys()))
for key in expected_shapes:
if isinstance(predict_dict_out[key], list):
continue
self.assertAllEqual(predict_dict_out[key].shape, expected_shapes[key])
# Verify that the raw detections from predict and postprocess are the
# same.
self.assertAllClose(
np.squeeze(predict_dict_out['raw_detection_boxes']),
postprocess_dict_out['raw_detection_boxes'])
# Verify that the raw detection boxes at detection anchor indices are the
# same as the postprocessed detections.
for i in range(batch_size):
num_detections_per_image = int(
postprocess_dict_out['num_detections'][i])
detection_boxes_per_image = postprocess_dict_out[
'detection_boxes'][i][:num_detections_per_image]
detection_anchor_indices_per_image = postprocess_dict_out[
'detection_anchor_indices'][i][:num_detections_per_image]
raw_detections_per_image = np.squeeze(predict_dict_out[
'raw_detection_boxes'][i])
raw_detections_at_anchor_indices = raw_detections_per_image[
detection_anchor_indices_per_image]
self.assertAllClose(detection_boxes_per_image,
raw_detections_at_anchor_indices)
@parameterized.parameters(
{'masks_are_class_agnostic': False, 'use_keras': True},
{'masks_are_class_agnostic': True, 'use_keras': True},
......@@ -345,7 +444,8 @@ class FasterRCNNMetaArchTest(
self._get_box_classifier_features_shape(
image_size, batch_size, max_num_proposals, initial_crop_size,
maxpool_stride, 3),
'mask_predictions': (2 * max_num_proposals, mask_shape_1, 14, 14)
'mask_predictions': (2 * max_num_proposals, mask_shape_1, 14, 14),
'feature_maps': [(2, image_size, image_size, 512)]
}
init_op = tf.global_variables_initializer()
......@@ -359,8 +459,11 @@ class FasterRCNNMetaArchTest(
'rpn_box_encodings',
'rpn_objectness_predictions_with_background',
'anchors',
'final_anchors',
])))
for key in expected_shapes:
if isinstance(tensor_dict_out[key], list):
continue
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
anchors_shape_out = tensor_dict_out['anchors'].shape
......
......@@ -118,27 +118,30 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
text_format.Merge(hyperparams_text_proto, hyperparams)
return hyperparams_builder.KerasLayerHyperparams(hyperparams)
def _get_second_stage_box_predictor_text_proto(self):
def _get_second_stage_box_predictor_text_proto(
self, share_box_across_classes=False):
share_box_field = 'true' if share_box_across_classes else 'false'
box_predictor_text_proto = """
mask_rcnn_box_predictor {
fc_hyperparams {
mask_rcnn_box_predictor {{
fc_hyperparams {{
op: FC
activation: NONE
regularizer {
l2_regularizer {
regularizer {{
l2_regularizer {{
weight: 0.0005
}
}
initializer {
variance_scaling_initializer {
}}
}}
initializer {{
variance_scaling_initializer {{
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
"""
}}
}}
}}
share_box_across_classes: {share_box_across_classes}
}}
""".format(share_box_across_classes=share_box_field)
return box_predictor_text_proto
def _add_mask_to_second_stage_box_predictor_text_proto(
......@@ -169,10 +172,11 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
def _get_second_stage_box_predictor(self, num_classes, is_training,
predict_masks, masks_are_class_agnostic,
share_box_across_classes=False,
use_keras=False):
box_predictor_proto = box_predictor_pb2.BoxPredictor()
text_format.Merge(self._get_second_stage_box_predictor_text_proto(),
box_predictor_proto)
text_format.Merge(self._get_second_stage_box_predictor_text_proto(
share_box_across_classes), box_predictor_proto)
if predict_masks:
text_format.Merge(
self._add_mask_to_second_stage_box_predictor_text_proto(
......@@ -219,7 +223,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
clip_anchors_to_image=False,
use_matmul_gather_in_matcher=False,
use_static_shapes=False,
calibration_mapping_value=None):
calibration_mapping_value=None,
share_box_across_classes=False,
return_raw_detections_during_predict=False):
def image_resizer_fn(image, masks=None):
"""Fake image resizer function."""
......@@ -404,6 +410,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
'clip_anchors_to_image': clip_anchors_to_image,
'use_static_shapes': use_static_shapes,
'resize_masks': True,
'return_raw_detections_during_predict':
return_raw_detections_during_predict
}
return self._get_model(
......@@ -412,7 +420,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
is_training=is_training,
use_keras=use_keras,
predict_masks=predict_masks,
masks_are_class_agnostic=masks_are_class_agnostic), **common_kwargs)
masks_are_class_agnostic=masks_are_class_agnostic,
share_box_across_classes=share_box_across_classes), **common_kwargs)
@parameterized.parameters(
{'use_static_shapes': False, 'use_keras': True},
......@@ -538,7 +547,7 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_output_keys = set([
'rpn_box_predictor_features', 'rpn_features_to_crop', 'image_shape',
'rpn_box_encodings', 'rpn_objectness_predictions_with_background',
'anchors'])
'anchors', 'feature_maps'])
# At training time, anchors that exceed image bounds are pruned. Thus
# the `expected_num_anchors` in the above inference mode test is now
# a strict upper bound on the number of anchors.
......@@ -612,7 +621,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_output_shapes['proposal_boxes_normalized'])
self.assertAllEqual(results[11].shape,
expected_output_shapes['box_classifier_features'])
self.assertAllEqual(results[12].shape,
expected_output_shapes['final_anchors'])
batch_size = 2
image_size = 10
max_num_proposals = 8
......@@ -648,7 +658,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
prediction_dict['num_proposals'],
prediction_dict['proposal_boxes'],
prediction_dict['proposal_boxes_normalized'],
prediction_dict['box_classifier_features'])
prediction_dict['box_classifier_features'],
prediction_dict['final_anchors'])
expected_num_anchors = image_size * image_size * 3 * 3
expected_shapes = {
......@@ -671,7 +682,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
max_num_proposals,
initial_crop_size,
maxpool_stride,
3)
3),
'feature_maps': [(2, image_size, image_size, 512)],
'final_anchors': (2, max_num_proposals, 4)
}
if use_static_shapes:
......@@ -702,6 +715,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
self.assertEqual(set(tensor_dict_out.keys()),
set(expected_shapes.keys()))
for key in expected_shapes:
if isinstance(tensor_dict_out[key], list):
continue
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
@parameterized.parameters(
......@@ -748,7 +763,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
result_tensor_dict['rpn_objectness_predictions_with_background'],
result_tensor_dict['rpn_features_to_crop'],
result_tensor_dict['rpn_box_predictor_features'],
updates
updates,
result_tensor_dict['final_anchors'],
)
image_shape = (batch_size, image_size, image_size, 3)
......@@ -785,7 +801,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
image_size, batch_size, max_num_proposals, initial_crop_size,
maxpool_stride, 3),
'rpn_objectness_predictions_with_background':
(2, image_size * image_size * 9, 2)
(2, image_size * image_size * 9, 2),
'final_anchors': (2, max_num_proposals, 4)
}
# TODO(rathodv): Possibly change utils/test_case.py to accept dictionaries
# and return dictionaries so we don't have to rely on the order of tensors.
......@@ -805,6 +822,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_shapes['rpn_features_to_crop'])
self.assertAllEqual(results[8].shape,
expected_shapes['rpn_box_predictor_features'])
self.assertAllEqual(results[10].shape,
expected_shapes['final_anchors'])
@parameterized.parameters(
{'use_static_shapes': False, 'pad_to_max_dimension': None,
......@@ -1082,7 +1101,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
detections['detection_scores'], detections['detection_classes'],
detections['raw_detection_boxes'],
detections['raw_detection_scores'],
detections['detection_multiclass_scores'])
detections['detection_multiclass_scores'],
detections['detection_anchor_indices'])
proposal_boxes = np.array(
[[[1, 1, 2, 3],
......@@ -1110,6 +1130,7 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
[images, refined_box_encodings,
class_predictions_with_background,
num_proposals, proposal_boxes])
# Note that max_total_detections=5 in the NMS config.
expected_num_detections = [5, 4]
expected_detection_classes = [[0, 0, 0, 1, 1], [0, 0, 1, 1, 0]]
expected_detection_scores = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]
......@@ -1123,6 +1144,10 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]]
# Note that a single anchor can be used for multiple detections (predictions
# are made independently per class).
expected_anchor_indices = [[0, 1, 2, 0, 1],
[0, 1, 0, 1]]
h = float(image_shape[1])
w = float(image_shape[2])
......@@ -1143,6 +1168,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_detection_classes[indx][0:num_proposals])
self.assertAllClose(results[6][indx][0:num_proposals],
expected_multiclass_scores[indx][0:num_proposals])
self.assertAllClose(results[7][indx][0:num_proposals],
expected_anchor_indices[indx][0:num_proposals])
self.assertAllClose(results[4], expected_raw_detection_boxes)
self.assertAllClose(results[5],
......
......@@ -82,7 +82,8 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
clip_anchors_to_image=False,
use_static_shapes=False,
resize_masks=False,
freeze_batchnorm=False):
freeze_batchnorm=False,
return_raw_detections_during_predict=False):
"""RFCNMetaArch Constructor.
Args:
......@@ -188,6 +189,9 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
return_raw_detections_during_predict: Whether to return raw detection
boxes in the predict() method. These are decoded boxes that have not
been through postprocessing (i.e. NMS). Default False.
Raises:
ValueError: If `second_stage_batch_size` > `first_stage_max_proposals`
......@@ -234,7 +238,9 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
clip_anchors_to_image,
use_static_shapes,
resize_masks,
freeze_batchnorm=freeze_batchnorm)
freeze_batchnorm=freeze_batchnorm,
return_raw_detections_during_predict=(
return_raw_detections_during_predict))
self._rfcn_box_predictor = second_stage_rfcn_box_predictor
......@@ -335,7 +341,11 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
'proposal_boxes': absolute_proposal_boxes,
'box_classifier_features': box_classifier_features,
'proposal_boxes_normalized': proposal_boxes_normalized,
'final_anchors': absolute_proposal_boxes
}
if self._return_raw_detections_during_predict:
prediction_dict.update(self._raw_detections_and_feature_map_inds(
refined_box_encodings, absolute_proposal_boxes, true_image_shapes))
return prediction_dict
def regularization_losses(self):
......
......@@ -24,7 +24,9 @@ from object_detection.meta_architectures import rfcn_meta_arch
class RFCNMetaArchTest(
faster_rcnn_meta_arch_test_lib.FasterRCNNMetaArchTestBase):
def _get_second_stage_box_predictor_text_proto(self):
def _get_second_stage_box_predictor_text_proto(
self, share_box_across_classes=False):
del share_box_across_classes
box_predictor_text_proto = """
rfcn_box_predictor {
conv_hyperparams {
......
......@@ -254,13 +254,21 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
the model graph.
"""
variables_to_restore = {}
for variable in self.variables:
# variable.name includes ":0" at the end, but the names in the checkpoint
# do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
if var_name.startswith(feature_extractor_scope + '/'):
var_name = var_name.replace(feature_extractor_scope + '/', '')
variables_to_restore[var_name] = variable
if tf.executing_eagerly():
for variable in self.variables:
# variable.name includes ":0" at the end, but the names in the
# checkpoint do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
if var_name.startswith(feature_extractor_scope + '/'):
var_name = var_name.replace(feature_extractor_scope + '/', '')
variables_to_restore[var_name] = variable
else:
# b/137854499: use global_variables.
for variable in variables_helper.get_global_variables_safely():
var_name = variable.op.name
if var_name.startswith(feature_extractor_scope + '/'):
var_name = var_name.replace(feature_extractor_scope + '/', '')
variables_to_restore[var_name] = variable
return variables_to_restore
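The two branches differ only in how a checkpoint key is derived from a variable; a short sketch of the name mapping (names are hypothetical):

feature_extractor_scope = 'FeatureExtractor'  # hypothetical scope name
# Eager branch: Keras variable names carry a ':0' suffix that checkpoint
# keys lack, so it is stripped before removing the scope prefix.
var_name = 'FeatureExtractor/conv1/kernel:0'[:-2]
var_name = var_name.replace(feature_extractor_scope + '/', '')
assert var_name == 'conv1/kernel'
# Graph branch: `variable.op.name` already omits the ':0' suffix.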
......@@ -295,7 +303,9 @@ class SSDMetaArch(model.DetectionModel):
expected_loss_weights_fn=None,
use_confidences_as_targets=False,
implicit_example_weight=0.5,
equalization_loss_config=None):
equalization_loss_config=None,
return_raw_detections_during_predict=False,
nms_on_host=True):
"""SSDMetaArch Constructor.
TODO(rathodv,jonathanhuang): group NMS parameters + score converter into
......@@ -371,6 +381,11 @@ class SSDMetaArch(model.DetectionModel):
for the implicit negative examples.
equalization_loss_config: a namedtuple that specifies configs for
computing equalization loss.
return_raw_detections_during_predict: Whether to return raw detection
boxes in the predict() method. These are decoded boxes that have not
been through postprocessing (i.e. NMS). Default False.
nms_on_host: boolean (default: True) controlling whether NMS should be
carried out on the host (outside of TPU).
"""
super(SSDMetaArch, self).__init__(num_classes=box_predictor.num_classes)
self._is_training = is_training
......@@ -438,6 +453,10 @@ class SSDMetaArch(model.DetectionModel):
self._equalization_loss_config = equalization_loss_config
self._return_raw_detections_during_predict = (
return_raw_detections_during_predict)
self._nms_on_host = nms_on_host
@property
def anchors(self):
if not self._anchors:
......@@ -475,17 +494,10 @@ class SSDMetaArch(model.DetectionModel):
Raises:
ValueError: if inputs tensor does not have type tf.float32
"""
if inputs.dtype is not tf.float32:
raise ValueError('`preprocess` expects a tf.float32 tensor')
with tf.name_scope('Preprocessor'):
# TODO(jonathanhuang): revisit whether to always use batch size as
# the number of parallel iterations vs allow for dynamic batching.
outputs = shape_utils.static_or_dynamic_map_fn(
self._image_resizer_fn,
elems=inputs,
dtype=[tf.float32, tf.int32])
resized_inputs = outputs[0]
true_image_shapes = outputs[1]
(resized_inputs,
true_image_shapes) = shape_utils.resize_images_and_return_shapes(
inputs, self._image_resizer_fn)
return (self._feature_extractor.preprocess(resized_inputs),
true_image_shapes)
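A sketch of what the new `shape_utils.resize_images_and_return_shapes` helper plausibly wraps, reconstructed from the removed lines above (not the helper's verbatim source):

def resize_images_and_return_shapes(inputs, image_resizer_fn):
  # Apply the per-image resizer across the batch and split its two outputs.
  outputs = shape_utils.static_or_dynamic_map_fn(
      image_resizer_fn,
      elems=inputs,
      dtype=[tf.float32, tf.int32])
  return outputs[0], outputs[1]  # (resized_inputs, true_image_shapes)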
......@@ -560,6 +572,14 @@ class SSDMetaArch(model.DetectionModel):
[batch, height_i, width_i, depth_i].
5) anchors: 2-D float tensor of shape [num_anchors, 4] containing
the generated anchors in normalized coordinates.
6) final_anchors: 3-D float tensor of shape [batch_size, num_anchors, 4]
containing the generated anchors in normalized coordinates.
If self._return_raw_detections_during_predict is True, the dictionary
will also contain:
7) raw_detection_boxes: a 4-D float32 tensor with shape
[batch_size, self.max_num_proposals, 4] in normalized coordinates.
8) raw_detection_feature_map_indices: a 3-D int32 tensor with shape
[batch_size, self.max_num_proposals].
"""
if self._inplace_batchnorm_update:
batchnorm_updates_collections = None
......@@ -581,11 +601,11 @@ class SSDMetaArch(model.DetectionModel):
feature_maps)
image_shape = shape_utils.combined_static_and_dynamic_shape(
preprocessed_inputs)
self._anchors = box_list_ops.concatenate(
self._anchor_generator.generate(
feature_map_spatial_dims,
im_height=image_shape[1],
im_width=image_shape[2]))
boxlist_list = self._anchor_generator.generate(
feature_map_spatial_dims,
im_height=image_shape[1],
im_width=image_shape[2])
self._anchors = box_list_ops.concatenate(boxlist_list)
if self._box_predictor.is_keras_model:
predictor_results_dict = self._box_predictor(feature_maps)
else:
......@@ -596,9 +616,15 @@ class SSDMetaArch(model.DetectionModel):
predictor_results_dict = self._box_predictor.predict(
feature_maps, self._anchor_generator.num_anchors_per_location())
predictions_dict = {
'preprocessed_inputs': preprocessed_inputs,
'feature_maps': feature_maps,
'anchors': self._anchors.get()
'preprocessed_inputs':
preprocessed_inputs,
'feature_maps':
feature_maps,
'anchors':
self._anchors.get(),
'final_anchors':
tf.tile(
tf.expand_dims(self._anchors.get(), 0), [image_shape[0], 1, 1])
}
for prediction_key, prediction_list in iter(predictor_results_dict.items()):
prediction = tf.concat(prediction_list, axis=1)
......@@ -606,10 +632,29 @@ class SSDMetaArch(model.DetectionModel):
prediction.shape[2] == 1):
prediction = tf.squeeze(prediction, axis=2)
predictions_dict[prediction_key] = prediction
if self._return_raw_detections_during_predict:
predictions_dict.update(self._raw_detections_and_feature_map_inds(
predictions_dict['box_encodings'], boxlist_list))
self._batched_prediction_tensor_names = [x for x in predictions_dict
if x != 'anchors']
return predictions_dict
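The new `final_anchors` entry simply broadcasts the shared SSD anchors across the batch; a shape sketch with hypothetical sizes:

anchors = tf.zeros([6, 4])                   # [num_anchors, 4], shared
batched_anchors = tf.tile(
    tf.expand_dims(anchors, 0), [3, 1, 1])   # [batch_size=3, num_anchors, 4]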
def _raw_detections_and_feature_map_inds(self, box_encodings, boxlist_list):
anchors = self._anchors.get()
raw_detection_boxes, _ = self._batch_decode(box_encodings, anchors)
batch_size, _, _ = shape_utils.combined_static_and_dynamic_shape(
raw_detection_boxes)
feature_map_indices = (
self._anchor_generator.anchor_index_to_feature_map_index(boxlist_list))
feature_map_indices_batched = tf.tile(
tf.expand_dims(feature_map_indices, 0),
multiples=[batch_size, 1])
return {
fields.PredictionFields.raw_detection_boxes: raw_detection_boxes,
fields.PredictionFields.raw_detection_feature_map_indices:
feature_map_indices_batched
}
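Conceptually, `anchor_index_to_feature_map_index` labels each anchor with the index of the feature map that generated it; a sketch of the idea (assumed behavior, not the anchor generator's actual implementation):

# Each feature map i contributes boxlist_list[i].num_boxes() anchors, so the
# per-anchor index vector is a concatenation of constant runs. For a single
# feature map this is all zeros, which is what the unit test asserts.
feature_map_indices = tf.concat(
    [tf.fill([boxlist.num_boxes()], i)
     for i, boxlist in enumerate(boxlist_list)], axis=0)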
def _get_feature_map_spatial_dims(self, feature_maps):
"""Return list of spatial dimensions for each feature map in a list.
......@@ -719,7 +764,9 @@ class SSDMetaArch(model.DetectionModel):
'multiclass_scores': detection_scores_with_background
}
if self._anchors is not None:
anchor_indices = tf.range(self._anchors.num_boxes_static())
num_boxes = (self._anchors.num_boxes_static() or
self._anchors.num_boxes())
anchor_indices = tf.range(num_boxes)
batch_anchor_indices = tf.tile(
tf.expand_dims(anchor_indices, 0), [batch_size, 1])
# All additional fields need to be float.
......@@ -730,14 +777,30 @@ class SSDMetaArch(model.DetectionModel):
detection_keypoints = tf.identity(
detection_keypoints, 'raw_keypoint_locations')
additional_fields[fields.BoxListFields.keypoints] = detection_keypoints
def _non_max_suppression_wrapper(kwargs):
if self._nms_on_host:
# Note: NMS is not memory efficient on TPU. This forces NMS to run
# outside of TPU.
return tf.contrib.tpu.outside_compilation(
lambda x: self._non_max_suppression_fn(**x), kwargs)
else:
return self._non_max_suppression_fn(**kwargs)
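# `outside_compilation` executes the wrapped function on the host CPU
# instead of the TPU; packing the kwargs into a single dict lets the
# one-argument lambda forward them with **x.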
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
 nmsed_additional_fields, num_detections) = self._non_max_suppression_fn(
     detection_boxes,
     detection_scores,
     clip_window=self._compute_clip_window(preprocessed_images,
                                           true_image_shapes),
     additional_fields=additional_fields,
     masks=prediction_dict.get('mask_predictions'))
 nmsed_additional_fields,
 num_detections) = _non_max_suppression_wrapper({
     'boxes': detection_boxes,
     'scores': detection_scores,
     'clip_window': self._compute_clip_window(preprocessed_images,
                                              true_image_shapes),
     'additional_fields': additional_fields,
     'masks': prediction_dict.get('mask_predictions')
 })
detection_dict = {
fields.DetectionResultFields.detection_boxes:
nmsed_boxes,
......@@ -1058,6 +1121,15 @@ class SSDMetaArch(model.DetectionModel):
with rows of the Match objects corresponding to groundtruth boxes
and columns corresponding to anchors.
"""
# TODO(rathodv): Add a test for these summaries.
try:
# TODO(kaftan): Integrate these summaries into the v2 style loops
with tf.compat.v2.init_scope():
if tf.compat.v2.executing_eagerly():
return
except AttributeError:
pass
avg_num_gt_boxes = tf.reduce_mean(
tf.cast(
tf.stack([tf.shape(x)[0] for x in groundtruth_boxes_list]),
......@@ -1078,14 +1150,6 @@ class SSDMetaArch(model.DetectionModel):
tf.cast(
tf.stack([match.num_ignored_columns() for match in match_list]),
dtype=tf.float32))
# TODO(rathodv): Add a test for these summaries.
try:
# TODO(kaftan): Integrate these summaries into the v2 style loops
with tf.compat.v2.init_scope():
if tf.compat.v2.executing_eagerly():
return
except AttributeError:
pass
tf.summary.scalar('AvgNumGroundtruthBoxesPerImage',
avg_num_gt_boxes,
......@@ -1232,26 +1296,27 @@ class SSDMetaArch(model.DetectionModel):
ValueError: if fine_tune_checkpoint_type is neither `classification`
nor `detection`.
"""
if fine_tune_checkpoint_type not in ['detection', 'classification']:
raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
fine_tune_checkpoint_type))
if fine_tune_checkpoint_type == 'classification':
return self._feature_extractor.restore_from_classification_checkpoint_fn(
self._extract_features_scope)
if fine_tune_checkpoint_type == 'detection':
elif fine_tune_checkpoint_type == 'detection':
variables_to_restore = {}
if tf.executing_eagerly():
for variable in self.variables:
# variable.name includes ":0" at the end, but the names in the
# checkpoint do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
if load_all_detection_checkpoint_vars:
if load_all_detection_checkpoint_vars:
# Grab all detection vars by name
for variable in self.variables:
# variable.name includes ":0" at the end, but the names in the
# checkpoint do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
variables_to_restore[var_name] = variable
else:
# Grab just the feature extractor vars by name
for variable in self._feature_extractor.variables:
# variable.name includes ":0" at the end, but the names in the
# checkpoint do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
variables_to_restore[var_name] = variable
else:
if var_name.startswith(self._extract_features_scope):
variables_to_restore[var_name] = variable
else:
for variable in variables_helper.get_global_variables_safely():
var_name = variable.op.name
......@@ -1261,7 +1326,11 @@ class SSDMetaArch(model.DetectionModel):
if var_name.startswith(self._extract_features_scope):
variables_to_restore[var_name] = variable
return variables_to_restore
return variables_to_restore
else:
raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
fine_tune_checkpoint_type))
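A hedged usage sketch of the restore path (the `model` instance and checkpoint path are hypothetical):

variables_to_restore = model.restore_map(
    fine_tune_checkpoint_type='detection',
    load_all_detection_checkpoint_vars=True)
init_saver = tf.train.Saver(variables_to_restore)
# Later, inside a session: init_saver.restore(sess, '/path/to/model.ckpt')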
def updates(self):
"""Returns a list of update operators for this model.
......
......@@ -49,7 +49,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5,
calibration_mapping_value=None):
calibration_mapping_value=None,
return_raw_detections_during_predict=False):
return super(SsdMetaArchTest, self)._create_model(
model_fn=ssd_meta_arch.SSDMetaArch,
apply_hard_mining=apply_hard_mining,
......@@ -63,7 +64,9 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
predict_mask=predict_mask,
use_static_shapes=use_static_shapes,
nms_max_size_per_class=nms_max_size_per_class,
calibration_mapping_value=calibration_mapping_value)
calibration_mapping_value=calibration_mapping_value,
return_raw_detections_during_predict=(
return_raw_detections_during_predict))
def test_preprocess_preserves_shapes_with_dynamic_input_image(
self, use_keras):
......@@ -105,6 +108,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
self.assertIn('class_predictions_with_background', prediction_dict)
self.assertIn('feature_maps', prediction_dict)
self.assertIn('anchors', prediction_dict)
self.assertIn('final_anchors', prediction_dict)
init_op = tf.global_variables_initializer()
with self.test_session(graph=tf_graph) as sess:
......@@ -121,6 +125,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
self.assertAllEqual(prediction_out['box_encodings'].shape,
expected_box_encodings_shape_out)
self.assertAllEqual(prediction_out['final_anchors'].shape,
(batch_size, num_anchors, 4))
self.assertAllEqual(
prediction_out['class_predictions_with_background'].shape,
expected_class_predictions_with_background_shape_out)
......@@ -137,7 +143,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
return (predictions['box_encodings'],
predictions['class_predictions_with_background'],
predictions['feature_maps'],
predictions['anchors'])
predictions['anchors'], predictions['final_anchors'])
batch_size = 3
image_size = 2
channels = 3
......@@ -145,11 +151,83 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
channels).astype(np.float32)
expected_box_encodings_shape = (batch_size, num_anchors, code_size)
expected_class_predictions_shape = (batch_size, num_anchors, num_classes+1)
(box_encodings, class_predictions, _, _) = self.execute(graph_fn,
[input_image])
final_anchors_shape = (batch_size, num_anchors, 4)
(box_encodings, class_predictions, _, _, final_anchors) = self.execute(
graph_fn, [input_image])
self.assertAllEqual(box_encodings.shape, expected_box_encodings_shape)
self.assertAllEqual(class_predictions.shape,
expected_class_predictions_shape)
self.assertAllEqual(final_anchors.shape, final_anchors_shape)
def test_predict_with_raw_output_fields(self, use_keras):
with tf.Graph().as_default():
_, num_classes, num_anchors, code_size = self._create_model(
use_keras=use_keras)
def graph_fn(input_image):
model, _, _, _ = self._create_model(
return_raw_detections_during_predict=True)
predictions = model.predict(input_image, true_image_shapes=None)
return (predictions['box_encodings'],
predictions['class_predictions_with_background'],
predictions['feature_maps'],
predictions['anchors'], predictions['final_anchors'],
predictions['raw_detection_boxes'],
predictions['raw_detection_feature_map_indices'])
batch_size = 3
image_size = 2
channels = 3
input_image = np.random.rand(batch_size, image_size, image_size,
channels).astype(np.float32)
expected_box_encodings_shape = (batch_size, num_anchors, code_size)
expected_class_predictions_shape = (batch_size, num_anchors, num_classes+1)
final_anchors_shape = (batch_size, num_anchors, 4)
expected_raw_detection_boxes_shape = (batch_size, num_anchors, 4)
(box_encodings, class_predictions, _, _, final_anchors, raw_detection_boxes,
raw_detection_feature_map_indices) = self.execute(
graph_fn, [input_image])
self.assertAllEqual(box_encodings.shape, expected_box_encodings_shape)
self.assertAllEqual(class_predictions.shape,
expected_class_predictions_shape)
self.assertAllEqual(final_anchors.shape, final_anchors_shape)
self.assertAllEqual(raw_detection_boxes.shape,
expected_raw_detection_boxes_shape)
self.assertAllEqual(raw_detection_feature_map_indices,
np.zeros((batch_size, num_anchors)))
def test_raw_detection_boxes_agree_predict_postprocess(self, use_keras):
batch_size = 2
image_size = 2
input_shapes = [(batch_size, image_size, image_size, 3),
(None, image_size, image_size, 3),
(batch_size, None, None, 3),
(None, None, None, 3)]
for input_shape in input_shapes:
tf_graph = tf.Graph()
with tf_graph.as_default():
model, _, _, _ = self._create_model(
use_keras=use_keras, return_raw_detections_during_predict=True)
input_placeholder = tf.placeholder(tf.float32, shape=input_shape)
preprocessed_inputs, true_image_shapes = model.preprocess(
input_placeholder)
prediction_dict = model.predict(preprocessed_inputs,
true_image_shapes)
raw_detection_boxes_predict = prediction_dict['raw_detection_boxes']
detections = model.postprocess(prediction_dict, true_image_shapes)
raw_detection_boxes_postprocess = detections['raw_detection_boxes']
init_op = tf.global_variables_initializer()
with self.test_session(graph=tf_graph) as sess:
sess.run(init_op)
raw_detection_boxes_predict_out, raw_detection_boxes_postprocess_out = (
sess.run(
[raw_detection_boxes_predict, raw_detection_boxes_postprocess],
feed_dict={
input_placeholder:
np.random.uniform(size=(batch_size, 2, 2, 3))}))
self.assertAllEqual(raw_detection_boxes_predict_out,
raw_detection_boxes_postprocess_out)
def test_postprocess_results_are_correct(self, use_keras):
batch_size = 2
......@@ -188,7 +266,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
[0.5, 0., 1., 0.5], [1., 1., 1.5, 1.5]]]
raw_detection_scores = [[[0, 0], [0, 0], [0, 0], [0, 0]],
[[0, 0], [0, 0], [0, 0], [0, 0]]]
detection_anchor_indices = [[0, 2, 1, 0, 0], [0, 2, 1, 0, 0]]
detection_anchor_indices_sets = [[0, 1, 2], [0, 1, 2]]
for input_shape in input_shapes:
tf_graph = tf.Graph()
......@@ -230,8 +308,9 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
raw_detection_boxes)
self.assertAllEqual(detections_out['raw_detection_scores'],
raw_detection_scores)
self.assertAllEqual(detections_out['detection_anchor_indices'],
detection_anchor_indices)
for idx in range(batch_size):
self.assertSameElements(detections_out['detection_anchor_indices'][idx],
detection_anchor_indices_sets[idx])
def test_postprocess_results_are_correct_static(self, use_keras):
with tf.Graph().as_default():
......
......@@ -129,7 +129,8 @@ class SSDMetaArchTestBase(test_case.TestCase):
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5,
calibration_mapping_value=None):
calibration_mapping_value=None,
return_raw_detections_during_predict=False):
is_training = False
num_classes = 1
mock_anchor_generator = MockAnchorGenerator2x2()
......@@ -238,6 +239,8 @@ class SSDMetaArchTestBase(test_case.TestCase):
add_background_class=add_background_class,
random_example_sampler=random_example_sampler,
expected_loss_weights_fn=expected_loss_weights_fn,
return_raw_detections_during_predict=(
return_raw_detections_during_predict),
**kwargs)
return model, num_classes, mock_anchor_generator.num_anchors(), code_size
......
......@@ -267,6 +267,13 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
# Make sure to set the Keras learning phase. True during training,
# False for inference.
tf.keras.backend.set_learning_phase(is_training)
# Set policy for mixed-precision training with Keras-based models.
if use_tpu and train_config.use_bfloat16:
from tensorflow.python.keras.engine import base_layer_utils # pylint: disable=g-import-not-at-top
# Enable v2 behavior, as `mixed_bfloat16` is only supported in TF 2.0.
base_layer_utils.enable_v2_dtype_behavior()
tf.compat.v2.keras.mixed_precision.experimental.set_policy(
'mixed_bfloat16')
detection_model = detection_model_fn(
is_training=is_training, add_summaries=(not use_tpu))
scaffold_fn = None
......@@ -315,7 +322,8 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
features[fields.InputDataFields.true_image_shape]))
if mode == tf.estimator.ModeKeys.TRAIN:
if train_config.fine_tune_checkpoint and hparams.load_pretrained:
load_pretrained = hparams.load_pretrained if hparams else False
if train_config.fine_tune_checkpoint and load_pretrained:
if not train_config.fine_tune_checkpoint_type:
# train_config.from_detection_checkpoint field is deprecated. For
# backward compatibility, set train_config.fine_tune_checkpoint_type
......@@ -449,6 +457,10 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
original_image_spatial_shapes=original_image_spatial_shapes,
true_image_shapes=true_image_shapes)
if fields.InputDataFields.image_additional_channels in features:
eval_dict[fields.InputDataFields.image_additional_channels] = features[
fields.InputDataFields.image_additional_channels]
if class_agnostic:
category_index = label_map_util.create_class_agnostic_category_index()
else:
......