Unverified commit 9bbf8015 authored by pkulzc, committed by GitHub

Merged commit includes the following changes: (#6932)

250447559  by Zhichao Lu:

    Update the expected file format for the Instance Segmentation challenge:
    - add fields ImageWidth and ImageHeight and store the values per prediction
    - for masks, store only the encoded image and assume its size is ImageWidth x ImageHeight
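    For illustration, a single prediction under the new format might look like
    the following (field names other than ImageWidth/ImageHeight are
    hypothetical, not taken from this change):

        # One prediction per entry; only the encoded mask image is stored, and
        # its size is assumed to be ImageWidth x ImageHeight.
        prediction = {
            'ImageID': '0001eeaf4aed83f9',   # hypothetical field name
            'ImageWidth': 1024,
            'ImageHeight': 768,
            'LabelName': '/m/0cmf2',         # hypothetical field name
            'Score': 0.87,
            'Mask': b'<encoded PNG bytes>',  # no per-mask size stored
        }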

--
250402780  by rathodv:

    Fix failing Mask R-CNN TPU convergence test.

    Cast second stage prediction tensors from bfloat16 to float32 to prevent errors in the third target assignment (Mask Prediction): concatenating tensors of different types (bfloat16 and float32) isn't allowed.
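    A minimal sketch of the failure mode (tensor names are illustrative, not
    the model's actual variables):

        import tensorflow as tf

        preds_bf16 = tf.ones([4, 8], dtype=tf.bfloat16)  # second stage predictions
        targets_f32 = tf.ones([4, 8], dtype=tf.float32)
        # tf.concat([preds_bf16, targets_f32], axis=1)   # fails: mismatched dtypes
        merged = tf.concat(
            [tf.cast(preds_bf16, tf.float32), targets_f32], axis=1)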

--
250300240  by Zhichao Lu:

    Add Open Images Challenge 2019 object detection and instance segmentation
    support to the Estimator framework.

--
249944839  by rathodv:

    Modify exporter.py to add multiclass score nodes in exported inference graphs.

--
249935201  by rathodv:

    Modify postprocess methods to preserve multiclass scores after non max suppression.
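    The core idea, sketched with hypothetical tensors (not the library's exact
    code): gather the full per-class score rows for the indices NMS keeps, so
    the class distribution survives alongside the boxes and top scores.

        import tensorflow as tf

        multiclass_scores = tf.constant([[0.1, 0.7, 0.2],
                                         [0.6, 0.3, 0.1],
                                         [0.2, 0.2, 0.6]])  # [num_boxes, num_classes]
        selected_indices = tf.constant([1, 2])  # hypothetical NMS output
        kept_multiclass_scores = tf.gather(multiclass_scores, selected_indices)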

--
249878079  by Zhichao Lu:

    This CL slightly refactors some Object Detection helper functions for data creation, evaluation, and groundtruth providing.

    This will allow the eager+function custom loops to share code with the existing estimator training loops.

    Concretely, we make the following changes:
    1. In input creation, we separate dataset creation into top-level helpers, and allow it to optionally accept a pre-constructed model directly instead of always creating a model from the config just for feature preprocessing.

    2. In COCO evaluation, we split the update_op creation into its own function, which the custom loops will call directly.

    3. In model_lib, we move groundtruth providing / data-structure munging into a helper function.

    4. For now, we put an escape hatch in `_summarize_target_assignment` when executing with TF 2.0 behavior, because the summary APIs used only work with TF 1.x (a sketch follows below).
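    A minimal sketch of such an escape hatch, assuming a TF 1.x runtime in
    which v2 behavior may be enabled (the function body is illustrative):

        import tensorflow as tf

        def _summarize_target_assignment_sketch(batch_cls_weights):
          if tf.executing_eagerly():
            # Escape hatch: the v1 summary ops below don't work eagerly.
            return
          tf.summary.scalar('avg_cls_weight', tf.reduce_mean(batch_cls_weights))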

--
249673507  by rathodv:

    Use explicit casts instead of tf.to_float and tf.to_int32 to avoid warnings.
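    For example (deprecated forms shown commented out):

        import tensorflow as tf

        x = tf.constant([1, 2, 3])
        # y = tf.to_float(x)                       # deprecated, emits a warning
        y = tf.cast(x, dtype=tf.float32)           # explicit equivalent
        # i = tf.to_int32(tf.constant([1.5]))      # deprecated, emits a warning
        i = tf.cast(tf.constant([1.5]), dtype=tf.int32)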

--
249656006  by Zhichao Lu:

    Add named "raw_keypoint_locations" node that corresponds with the "raw_box_locations" node.

--
249651674  by rathodv:

    Keep proposal boxes in float format. MatMulCropAndResize can handle the type even when the features themselves are bfloat16.

--
249568633  by rathodv:

    Support q > 1 in class-agnostic NMS.
    Break post_processing_test.py into 3 separate files to avoid linter errors.

--
249535530  by rathodv:

    Update some deprecated arguments to tf ops.

--
249368223  by rathodv:

    Modify MatMulCropAndResize to use the MultiLevelRoIAlign method and move the tests to the spatial_transform_ops.py module.

    This CL establishes that CropAndResize and RoIAlign are equivalent and differ only in the sampling point grid within the boxes. CropAndResize uses a uniform size x size point grid whose corner points exactly overlap the box corners, while RoIAlign divides boxes into size x size cells and uses their centers as sampling points. In this CL, we switch MatMulCropAndResize to the MultiLevelRoIAlign implementation with the `align_corner` option, as the MultiLevelRoIAlign implementation is more memory efficient on TPU than the original MatMulCropAndResize.
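    A worked 1-D illustration of the two sampling grids (numbers are for a box
    spanning [0.0, 1.0] with size=4; this is an explanatory sketch, not the
    op's code):

        import numpy as np

        start, end, size = 0.0, 1.0, 4
        # CropAndResize: uniform grid whose end points coincide with box corners.
        crop_and_resize_pts = start + np.arange(size) * (end - start) / (size - 1)
        # -> [0.0, 0.333, 0.667, 1.0]
        # RoIAlign: split the box into `size` cells and sample each cell center.
        roi_align_pts = start + (np.arange(size) + 0.5) * (end - start) / size
        # -> [0.125, 0.375, 0.625, 0.875]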

--
249337338  by chowdhery:

    Add class-agnostic non-max-suppression in post_processing

--
249139196  by Zhichao Lu:

    Fix positional argument bug in export_tflite_ssd_graph

--
249120219  by Zhichao Lu:

    Add evaluator for computing precision limited to a given recall range.
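    The core computation, sketched with numpy (variable names are assumptions,
    not the evaluator's API):

        import numpy as np

        recalls = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
        precisions = np.array([1.0, 0.9, 0.8, 0.6, 0.4])
        recall_lower_bound, recall_upper_bound = 0.3, 0.8
        in_range = (recalls >= recall_lower_bound) & (recalls <= recall_upper_bound)
        precision_in_range = precisions[in_range].mean()  # ~0.767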

--
249030593  by Zhichao Lu:

    Evaluation util to run segmentation and detection challenge evaluation.

--
248554358  by Zhichao Lu:

    This change contains the auxiliary changes required for TF 2.0 style training with eager + functions + distribution strategy loops, but not the loops themselves.

    It includes:
    - Updates to shape usage to support both TensorShape v1 and TensorShape v2
    - A fix to FreezableBatchNorm so it does not override the `training` arg in `call` when `None` was passed to the constructor (not an issue in the estimator loops, but it was in the custom loops)
    - Puts some constants in init_scope so they work in eager + functions
    - Makes learning rate schedules return a callable in eager mode (required so they update when the global_step changes; see the sketch after this list)
    - Makes DetectionModel a tf.Module so it tracks variables (e.g. ones nested in layers)
    - Removes some references to `op.name` for some losses and replaces them with explicit names
    - A small part of the change to allow the COCO evaluation metrics to work in eager mode
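    A sketch of the learning-rate point, assuming nothing about the actual
    builders: in eager mode a plain tensor is evaluated once at creation time,
    so the schedule is returned as a callable that recomputes from global_step.

        import tensorflow as tf

        def make_exponential_decay(initial_lr, decay_steps, decay_rate,
                                   global_step):
          def learning_rate():  # hypothetical helper, re-evaluated on each call
            step = tf.cast(global_step, tf.float32)
            return initial_lr * decay_rate ** (step / decay_steps)
          return learning_rate

        global_step = tf.Variable(0, trainable=False)
        lr_fn = make_exponential_decay(0.1, 1000, 0.5, global_step)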

--
248271226  by rathodv:

    Add MultiLevel RoIAlign op.

--
248229103  by rathodv:

    Add functions to (1) pad feature maps and (2) ravel 5-D indices.
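    The raveling step amounts to the usual mixed-radix index computation; a
    small self-contained sketch (function name and dims are illustrative):

        def ravel_5d_indices(b, level, y, x, c, dims):
          """Flatten [b, level, y, x, c] given per-dimension sizes."""
          num_levels, height, width, channels = dims[1:]
          return ((((b * num_levels + level) * height + y)
                   * width + x) * channels + c)

        flat = ravel_5d_indices(1, 0, 2, 3, 0, dims=(2, 4, 8, 8, 16))  # 4400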

--
248206769  by rathodv:

    Add utilities needed to introduce RoI Align op.

--
248177733  by pengchong:

    Internal changes

--
247742582  by Zhichao Lu:

    Open Images Challenge 2019 instance segmentation metric: part 2

--
247525401  by Zhichao Lu:

    Update comments on max_class_per_detection.

--
247520753  by rathodv:

    Add multilevel crop and resize operation that builds on top of matmul_crop_and_resize.

--
247391600  by Zhichao Lu:

    Open Images Challenge 2019 instance segmentation metric

--
247325813  by chowdhery:

    Quantized MobileNet v2 SSD FPNLite config with depth multiplier 0.75

--

PiperOrigin-RevId: 250447559
parent f42fddee
......@@ -55,12 +55,24 @@ a handful of auxiliary annotations associated with each bounding box, namely,
instance masks and keypoints.
"""
import abc
import tensorflow as tf
from object_detection.core import standard_fields as fields
class DetectionModel(object):
"""Abstract base class for detection models."""
# If using a new enough version of TensorFlow, detection models should be a
# tf module or keras model for tracking.
try:
_BaseClass = tf.Module
except AttributeError:
_BaseClass = object
class DetectionModel(_BaseClass):
"""Abstract base class for detection models.
Extends tf.Module to guarantee variable tracking.
"""
__metaclass__ = abc.ABCMeta
def __init__(self, num_classes):
......
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for tensorflow_models.object_detection.core.post_processing."""
import numpy as np
import tensorflow as tf
from object_detection.core import post_processing
from object_detection.core import standard_fields as fields
from object_detection.utils import test_case
class MulticlassNonMaxSuppressionTest(test_case.TestCase):
def test_multiclass_nms_select_with_shared_boxes(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_output_size)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_select_with_shared_boxes_pad_to_max_output_size(self):
boxes = np.array([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], np.float32)
scores = np.array([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]], np.float32)
score_thresh = 0.1
iou_thresh = .5
max_size_per_class = 4
max_output_size = 5
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
def graph_fn(boxes, scores):
nms, num_valid_nms_boxes = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_size_per_class,
max_total_size=max_output_size,
pad_to_max_output_size=True)
return [nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes), num_valid_nms_boxes]
[nms_corners_output, nms_scores_output, nms_classes_output,
num_valid_nms_boxes] = self.execute(graph_fn, [boxes, scores])
self.assertEqual(num_valid_nms_boxes, 4)
self.assertAllClose(nms_corners_output[0:num_valid_nms_boxes],
exp_nms_corners)
self.assertAllClose(nms_scores_output[0:num_valid_nms_boxes],
exp_nms_scores)
self.assertAllClose(nms_classes_output[0:num_valid_nms_boxes],
exp_nms_classes)
def test_multiclass_nms_select_with_shared_boxes_given_keypoints(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
num_keypoints = 6
keypoints = tf.tile(
tf.reshape(tf.range(8), [8, 1, 1]),
[1, num_keypoints, 2])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
exp_nms_keypoints_tensor = tf.tile(
tf.reshape(tf.constant([3, 0, 6, 5], dtype=tf.float32), [4, 1, 1]),
[1, num_keypoints, 2])
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
additional_fields={fields.BoxListFields.keypoints: keypoints})
with self.test_session() as sess:
(nms_corners_output,
nms_scores_output,
nms_classes_output,
nms_keypoints,
exp_nms_keypoints) = sess.run([
nms.get(),
nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes),
nms.get_field(fields.BoxListFields.keypoints),
exp_nms_keypoints_tensor
])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
self.assertAllEqual(nms_keypoints, exp_nms_keypoints)
def test_multiclass_nms_with_shared_boxes_given_keypoint_heatmaps(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
num_boxes = tf.shape(boxes)[0]
heatmap_height = 5
heatmap_width = 5
num_keypoints = 17
keypoint_heatmaps = tf.ones(
[num_boxes, heatmap_height, heatmap_width, num_keypoints],
dtype=tf.float32)
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
exp_nms_keypoint_heatmaps = np.ones(
(4, heatmap_height, heatmap_width, num_keypoints), dtype=np.float32)
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
additional_fields={
fields.BoxListFields.keypoint_heatmaps: keypoint_heatmaps
})
with self.test_session() as sess:
(nms_corners_output,
nms_scores_output,
nms_classes_output,
nms_keypoint_heatmaps) = sess.run(
[nms.get(),
nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes),
nms.get_field(fields.BoxListFields.keypoint_heatmaps)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
self.assertAllEqual(nms_keypoint_heatmaps, exp_nms_keypoint_heatmaps)
def test_multiclass_nms_with_additional_fields(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
coarse_boxes_key = 'coarse_boxes'
coarse_boxes = tf.constant([[0.1, 0.1, 1.1, 1.1],
[0.1, 0.2, 1.1, 1.2],
[0.1, -0.2, 1.1, 1.0],
[0.1, 10.1, 1.1, 11.1],
[0.1, 10.2, 1.1, 11.2],
[0.1, 100.1, 1.1, 101.1],
[0.1, 1000.1, 1.1, 1002.1],
[0.1, 1000.1, 1.1, 1002.2]], tf.float32)
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = np.array([[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]], dtype=np.float32)
exp_nms_coarse_corners = np.array([[0.1, 10.1, 1.1, 11.1],
[0.1, 0.1, 1.1, 1.1],
[0.1, 1000.1, 1.1, 1002.1],
[0.1, 100.1, 1.1, 101.1]],
dtype=np.float32)
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
additional_fields={coarse_boxes_key: coarse_boxes})
with self.test_session() as sess:
(nms_corners_output,
nms_scores_output,
nms_classes_output,
nms_coarse_corners) = sess.run(
[nms.get(),
nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes),
nms.get_field(coarse_boxes_key)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
self.assertAllEqual(nms_coarse_corners, exp_nms_coarse_corners)
def test_multiclass_nms_select_with_shared_boxes_given_masks(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
num_classes = 2
mask_height = 3
mask_width = 3
masks = tf.tile(
tf.reshape(tf.range(8), [8, 1, 1, 1]),
[1, num_classes, mask_height, mask_width])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
exp_nms_masks_tensor = tf.tile(
tf.reshape(tf.constant([3, 0, 6, 5], dtype=tf.float32), [4, 1, 1]),
[1, mask_height, mask_width])
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_output_size, masks=masks)
with self.test_session() as sess:
(nms_corners_output,
nms_scores_output,
nms_classes_output,
nms_masks,
exp_nms_masks) = sess.run([nms.get(),
nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes),
nms.get_field(fields.BoxListFields.masks),
exp_nms_masks_tensor])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
self.assertAllEqual(nms_masks, exp_nms_masks)
def test_multiclass_nms_select_with_clip_window(self):
boxes = tf.constant([[[0, 0, 10, 10]],
[[1, 1, 11, 11]]], tf.float32)
scores = tf.constant([[.9], [.75]])
clip_window = tf.constant([5, 4, 8, 7], tf.float32)
score_thresh = 0.0
iou_thresh = 0.5
max_output_size = 100
exp_nms_corners = [[5, 4, 8, 7]]
exp_nms_scores = [.9]
exp_nms_classes = [0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
clip_window=clip_window)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_select_with_clip_window_change_coordinate_frame(self):
boxes = tf.constant([[[0, 0, 10, 10]],
[[1, 1, 11, 11]]], tf.float32)
scores = tf.constant([[.9], [.75]])
clip_window = tf.constant([5, 4, 8, 7], tf.float32)
score_thresh = 0.0
iou_thresh = 0.5
max_output_size = 100
exp_nms_corners = [[0, 0, 1, 1]]
exp_nms_scores = [.9]
exp_nms_classes = [0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
clip_window=clip_window,
change_coordinate_frame=True)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_select_with_per_class_cap(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
score_thresh = 0.1
iou_thresh = .5
max_size_per_class = 2
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002]]
exp_nms_scores = [.95, .9, .85]
exp_nms_classes = [0, 0, 1]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_size_per_class)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_select_with_total_cap(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
score_thresh = 0.1
iou_thresh = .5
max_size_per_class = 4
max_total_size = 2
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1]]
exp_nms_scores = [.95, .9]
exp_nms_classes = [0, 0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_size_per_class,
max_total_size)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_threshold_then_select_with_shared_boxes(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9], [.75], [.6], [.95], [.5], [.3], [.01], [.01]])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 3
exp_nms = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 100, 1, 101]]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_output_size)
with self.test_session() as sess:
nms_output = sess.run(nms.get())
self.assertAllClose(nms_output, exp_nms)
def test_multiclass_nms_select_with_separate_boxes(self):
boxes = tf.constant([[[0, 0, 1, 1], [0, 0, 4, 5]],
[[0, 0.1, 1, 1.1], [0, 0.1, 2, 1.1]],
[[0, -0.1, 1, 0.9], [0, -0.1, 1, 0.9]],
[[0, 10, 1, 11], [0, 10, 1, 11]],
[[0, 10.1, 1, 11.1], [0, 10.1, 1, 11.1]],
[[0, 100, 1, 101], [0, 100, 1, 101]],
[[0, 1000, 1, 1002], [0, 999, 2, 1004]],
[[0, 1000, 1, 1002.1], [0, 999, 2, 1002.7]]],
tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 999, 2, 1004],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_output_size)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
if __name__ == '__main__':
tf.test.main()
......@@ -24,6 +24,94 @@ from object_detection.core import standard_fields as fields
from object_detection.utils import shape_utils
def _validate_boxes_scores_iou_thresh(boxes, scores, iou_thresh,
change_coordinate_frame, clip_window):
"""Validates boxes, scores and iou_thresh.
This function validates the boxes, scores, iou_thresh
and if change_coordinate_frame is True, clip_window must be specified.
Args:
boxes: A [k, q, 4] float32 tensor containing k detections. `q` can be either
number of classes or 1 depending on whether a separate box is predicted
per class.
scores: A [k, num_classes] float32 tensor containing the scores for each of
the k detections. The scores have to be non-negative when
pad_to_max_output_size is True.
iou_thresh: scalar threshold for IOU (new boxes that have high IOU overlap
with previously selected boxes are removed).
change_coordinate_frame: Whether to normalize coordinates after clipping
relative to clip_window (this can only be set to True if a clip_window is
provided)
clip_window: A float32 tensor of the form [y_min, x_min, y_max, x_max]
representing the window to clip and normalize boxes to before performing
non-max suppression.
Raises:
ValueError: if iou_thresh is not in [0, 1] or if input boxlist does not
have a valid scores field.
"""
if not 0 <= iou_thresh <= 1.0:
raise ValueError('iou_thresh must be between 0 and 1')
if scores.shape.ndims != 2:
raise ValueError('scores field must be of rank 2')
if scores.shape[1].value is None:
raise ValueError('scores must have statically defined second ' 'dimension')
if boxes.shape.ndims != 3:
raise ValueError('boxes must be of rank 3.')
if not (shape_utils.get_dim_as_int(
boxes.shape[1]) == shape_utils.get_dim_as_int(scores.shape[1]) or
shape_utils.get_dim_as_int(boxes.shape[1]) == 1):
raise ValueError('second dimension of boxes must be either 1 or equal '
'to the second dimension of scores')
if boxes.shape[2].value != 4:
raise ValueError('last dimension of boxes must be of size 4.')
if change_coordinate_frame and clip_window is None:
raise ValueError('if change_coordinate_frame is True, then a clip_window '
'must be specified.')
def _clip_window_prune_boxes(sorted_boxes, clip_window, pad_to_max_output_size,
change_coordinate_frame):
"""Prune boxes with zero area.
Args:
sorted_boxes: A BoxList containing k detections.
clip_window: A float32 tensor of the form [y_min, x_min, y_max, x_max]
representing the window to clip and normalize boxes to before performing
non-max suppression.
pad_to_max_output_size: flag indicating whether to pad to max output size or
not.
change_coordinate_frame: Whether to normalize coordinates after clipping
relative to clip_window (this can only be set to True if a clip_window is
provided).
Returns:
sorted_boxes: A BoxList containing k detections after pruning.
num_valid_nms_boxes_cumulative: Number of valid NMS boxes
"""
sorted_boxes = box_list_ops.clip_to_window(
sorted_boxes,
clip_window,
filter_nonoverlapping=not pad_to_max_output_size)
# Set the scores of boxes with zero area to -1 to keep the default
# behaviour of pruning out zero area boxes.
sorted_boxes_size = tf.shape(sorted_boxes.get())[0]
non_zero_box_area = tf.cast(box_list_ops.area(sorted_boxes), tf.bool)
sorted_boxes_scores = tf.where(
non_zero_box_area, sorted_boxes.get_field(fields.BoxListFields.scores),
-1 * tf.ones(sorted_boxes_size))
sorted_boxes.add_field(fields.BoxListFields.scores, sorted_boxes_scores)
num_valid_nms_boxes_cumulative = tf.reduce_sum(
tf.cast(tf.greater_equal(sorted_boxes_scores, 0), tf.int32))
sorted_boxes = box_list_ops.sort_by_field(sorted_boxes,
fields.BoxListFields.scores)
if change_coordinate_frame:
sorted_boxes = box_list_ops.change_coordinate_frame(sorted_boxes,
clip_window)
return sorted_boxes, num_valid_nms_boxes_cumulative
def multiclass_non_max_suppression(boxes,
scores,
score_thresh,
......@@ -97,28 +185,12 @@ def multiclass_non_max_suppression(boxes,
ValueError: if iou_thresh is not in [0, 1] or if input boxlist does not have
a valid scores field.
"""
if not 0 <= iou_thresh <= 1.0:
raise ValueError('iou_thresh must be between 0 and 1')
if scores.shape.ndims != 2:
raise ValueError('scores field must be of rank 2')
if scores.shape[1].value is None:
raise ValueError('scores must have statically defined second '
'dimension')
if boxes.shape.ndims != 3:
raise ValueError('boxes must be of rank 3.')
if not (boxes.shape[1].value == scores.shape[1].value or
boxes.shape[1].value == 1):
raise ValueError('second dimension of boxes must be either 1 or equal '
'to the second dimension of scores')
if boxes.shape[2].value != 4:
raise ValueError('last dimension of boxes must be of size 4.')
if change_coordinate_frame and clip_window is None:
raise ValueError('if change_coordinate_frame is True, then a clip_window'
'must be specified.')
_validate_boxes_scores_iou_thresh(boxes, scores, iou_thresh,
change_coordinate_frame, clip_window)
with tf.name_scope(scope, 'MultiClassNonMaxSuppression'):
num_scores = tf.shape(scores)[0]
num_classes = scores.get_shape()[1]
num_classes = shape_utils.get_dim_as_int(scores.get_shape()[1])
selected_boxes_list = []
num_valid_nms_boxes_cumulative = tf.constant(0)
......@@ -128,7 +200,7 @@ def multiclass_non_max_suppression(boxes,
if boundaries is not None:
per_class_boundaries_list = tf.unstack(boundaries, axis=1)
boxes_ids = (range(num_classes) if len(per_class_boxes_list) > 1
else [0] * num_classes.value)
else [0] * num_classes)
for class_idx, boxes_idx in zip(range(num_classes), boxes_ids):
per_class_boxes = per_class_boxes_list[boxes_idx]
boxlist_and_class_scores = box_list.BoxList(per_class_boxes)
......@@ -193,32 +265,13 @@ def multiclass_non_max_suppression(boxes,
if clip_window is not None:
# When pad_to_max_output_size is False, it prunes the boxes with zero
# area.
sorted_boxes = box_list_ops.clip_to_window(
sorted_boxes,
clip_window,
filter_nonoverlapping=not pad_to_max_output_size)
# Set the scores of boxes with zero area to -1 to keep the default
# behaviour of pruning out zero area boxes.
sorted_boxes_size = tf.shape(sorted_boxes.get())[0]
non_zero_box_area = tf.cast(box_list_ops.area(sorted_boxes), tf.bool)
sorted_boxes_scores = tf.where(
non_zero_box_area,
sorted_boxes.get_field(fields.BoxListFields.scores),
-1*tf.ones(sorted_boxes_size))
sorted_boxes.add_field(fields.BoxListFields.scores, sorted_boxes_scores)
num_valid_nms_boxes_cumulative = tf.reduce_sum(
tf.cast(tf.greater_equal(sorted_boxes_scores, 0), tf.int32))
sorted_boxes = box_list_ops.sort_by_field(sorted_boxes,
fields.BoxListFields.scores)
if change_coordinate_frame:
sorted_boxes = box_list_ops.change_coordinate_frame(
sorted_boxes, clip_window)
sorted_boxes, num_valid_nms_boxes_cumulative = _clip_window_prune_boxes(
sorted_boxes, clip_window, pad_to_max_output_size,
change_coordinate_frame)
if max_total_size:
max_total_size = tf.minimum(max_total_size,
sorted_boxes.num_boxes())
sorted_boxes = box_list_ops.gather(sorted_boxes,
tf.range(max_total_size))
max_total_size = tf.minimum(max_total_size, sorted_boxes.num_boxes())
sorted_boxes = box_list_ops.gather(sorted_boxes, tf.range(max_total_size))
num_valid_nms_boxes_cumulative = tf.where(
max_total_size > num_valid_nms_boxes_cumulative,
num_valid_nms_boxes_cumulative, max_total_size)
......@@ -230,6 +283,175 @@ def multiclass_non_max_suppression(boxes,
return sorted_boxes, num_valid_nms_boxes_cumulative
def class_agnostic_non_max_suppression(boxes,
scores,
score_thresh,
iou_thresh,
max_classes_per_detection=1,
max_total_size=0,
clip_window=None,
change_coordinate_frame=False,
masks=None,
boundaries=None,
pad_to_max_output_size=False,
additional_fields=None,
scope=None):
"""Class-agnostic version of non maximum suppression.
This op greedily selects a subset of detection bounding boxes, pruning
away boxes that have high IOU (intersection over union) overlap (> thresh)
with already selected boxes. It operates on all the boxes using
max scores across all classes for which scores are provided (via the scores
field of the input box_list), pruning boxes with score less than a provided
threshold prior to applying NMS.
Please note that this operation is performed in a class-agnostic way,
therefore any background classes should be removed prior to calling this
function.
Selected boxes are guaranteed to be sorted in decreasing order by score (but
the sort is not guaranteed to be stable).
Args:
boxes: A [k, q, 4] float32 tensor containing k detections. `q` can be either
number of classes or 1 depending on whether a separate box is predicted
per class.
scores: A [k, num_classes] float32 tensor containing the scores for each of
the k detections. The scores have to be non-negative when
pad_to_max_output_size is True.
score_thresh: scalar threshold for score (low scoring boxes are removed).
iou_thresh: scalar threshold for IOU (new boxes that have high IOU overlap
with previously selected boxes are removed).
max_classes_per_detection: maximum number of retained classes per detection
box in class-agnostic NMS.
max_total_size: maximum number of boxes retained over all classes. By
default returns all boxes retained after capping boxes per class.
clip_window: A float32 tensor of the form [y_min, x_min, y_max, x_max]
representing the window to clip and normalize boxes to before performing
non-max suppression.
change_coordinate_frame: Whether to normalize coordinates after clipping
relative to clip_window (this can only be set to True if a clip_window is
provided)
masks: (optional) a [k, q, mask_height, mask_width] float32 tensor
containing box masks. `q` can be either number of classes or 1 depending
on whether a separate mask is predicted per class.
boundaries: (optional) a [k, q, boundary_height, boundary_width] float32
tensor containing box boundaries. `q` can be either number of classes or 1
depending on whether a separate boundary is predicted per class.
pad_to_max_output_size: If true, the output nmsed boxes are padded to be of
length `max_total_size`. Defaults to false.
additional_fields: (optional) If not None, a dictionary that maps keys to
tensors whose first dimensions are all of size `k`. After non-maximum
suppression, all tensors corresponding to the selected boxes will be added
to resulting BoxList.
scope: name scope.
Returns:
A tuple of sorted_boxes and num_valid_nms_boxes. The sorted_boxes is a
BoxList holding M boxes, with a rank-1 scores field representing
corresponding scores for each box (sorted in decreasing order) and a
rank-1 classes field representing a class label for each box. The
num_valid_nms_boxes is a 0-D integer tensor representing the number of
valid elements in `BoxList`, with the valid elements appearing first.
Raises:
ValueError: if iou_thresh is not in [0, 1] or if input boxlist does not have
a valid scores field.
"""
_validate_boxes_scores_iou_thresh(boxes, scores, iou_thresh,
change_coordinate_frame, clip_window)
if max_classes_per_detection > 1:
raise ValueError('Max classes per detection box >1 not supported.')
q = boxes.shape[1].value
if q > 1:
class_ids = tf.expand_dims(
tf.argmax(scores, axis=1, output_type=tf.int32), axis=1)
boxes = tf.batch_gather(boxes, class_ids)
if masks is not None:
masks = tf.batch_gather(masks, class_ids)
if boundaries is not None:
boundaries = tf.batch_gather(boundaries, class_ids)
boxes = tf.squeeze(boxes, axis=[1])
if masks is not None:
masks = tf.squeeze(masks, axis=[1])
if boundaries is not None:
boundaries = tf.squeeze(boundaries, axis=[1])
with tf.name_scope(scope, 'ClassAgnosticNonMaxSuppression'):
boxlist_and_class_scores = box_list.BoxList(boxes)
max_scores = tf.reduce_max(scores, axis=-1)
classes_with_max_scores = tf.argmax(scores, axis=-1)
boxlist_and_class_scores.add_field(fields.BoxListFields.scores, max_scores)
if masks is not None:
boxlist_and_class_scores.add_field(fields.BoxListFields.masks, masks)
if boundaries is not None:
boxlist_and_class_scores.add_field(fields.BoxListFields.boundaries,
boundaries)
if additional_fields is not None:
for key, tensor in additional_fields.items():
boxlist_and_class_scores.add_field(key, tensor)
if pad_to_max_output_size:
max_selection_size = max_total_size
selected_indices, num_valid_nms_boxes = (
tf.image.non_max_suppression_padded(
boxlist_and_class_scores.get(),
boxlist_and_class_scores.get_field(fields.BoxListFields.scores),
max_selection_size,
iou_threshold=iou_thresh,
score_threshold=score_thresh,
pad_to_max_output_size=True))
else:
max_selection_size = tf.minimum(max_total_size,
boxlist_and_class_scores.num_boxes())
selected_indices = tf.image.non_max_suppression(
boxlist_and_class_scores.get(),
boxlist_and_class_scores.get_field(fields.BoxListFields.scores),
max_selection_size,
iou_threshold=iou_thresh,
score_threshold=score_thresh)
num_valid_nms_boxes = tf.shape(selected_indices)[0]
selected_indices = tf.concat([
selected_indices,
tf.zeros(max_selection_size - num_valid_nms_boxes, tf.int32)
], 0)
nms_result = box_list_ops.gather(boxlist_and_class_scores, selected_indices)
valid_nms_boxes_indx = tf.less(
tf.range(max_selection_size), num_valid_nms_boxes)
nms_scores = nms_result.get_field(fields.BoxListFields.scores)
nms_result.add_field(
fields.BoxListFields.scores,
tf.where(valid_nms_boxes_indx, nms_scores,
-1 * tf.ones(max_selection_size)))
selected_classes = tf.gather(classes_with_max_scores, selected_indices)
nms_result.add_field(fields.BoxListFields.classes, selected_classes)
selected_boxes = nms_result
sorted_boxes = box_list_ops.sort_by_field(selected_boxes,
fields.BoxListFields.scores)
if clip_window is not None:
# When pad_to_max_output_size is False, it prunes the boxes with zero
# area.
sorted_boxes, num_valid_nms_boxes = _clip_window_prune_boxes(
sorted_boxes, clip_window, pad_to_max_output_size,
change_coordinate_frame)
if max_total_size:
max_total_size = tf.minimum(max_total_size, sorted_boxes.num_boxes())
sorted_boxes = box_list_ops.gather(sorted_boxes, tf.range(max_total_size))
num_valid_nms_boxes = tf.where(max_total_size > num_valid_nms_boxes,
num_valid_nms_boxes, max_total_size)
# Select only the valid boxes if pad_to_max_output_size is False.
if not pad_to_max_output_size:
sorted_boxes = box_list_ops.gather(sorted_boxes,
tf.range(num_valid_nms_boxes))
return sorted_boxes, num_valid_nms_boxes
def batch_multiclass_non_max_suppression(boxes,
scores,
score_thresh,
......@@ -243,7 +465,9 @@ def batch_multiclass_non_max_suppression(boxes,
additional_fields=None,
scope=None,
use_static_shapes=False,
parallel_iterations=32):
parallel_iterations=32,
use_class_agnostic_nms=False,
max_classes_per_detection=1):
"""Multi-class version of non maximum suppression that operates on a batch.
This op is similar to `multiclass_non_max_suppression` but operates on a batch
......@@ -253,8 +477,8 @@ def batch_multiclass_non_max_suppression(boxes,
Args:
boxes: A [batch_size, num_anchors, q, 4] float32 tensor containing
detections. If `q` is 1 then same boxes are used for all classes
otherwise, if `q` is equal to number of classes, class-specific boxes
are used.
otherwise, if `q` is equal to number of classes, class-specific boxes are
used.
scores: A [batch_size, num_anchors, num_classes] float32 tensor containing
the scores for each of the `num_anchors` detections. The scores have to be
non-negative when use_static_shapes is set True.
......@@ -274,8 +498,8 @@ def batch_multiclass_non_max_suppression(boxes,
relative to clip_window (this can only be set to True if a clip_window is
provided)
num_valid_boxes: (optional) a Tensor of type `int32`. A 1-D tensor of shape
[batch_size] representing the number of valid boxes to be considered
for each image in the batch. This parameter allows for ignoring zero
[batch_size] representing the number of valid boxes to be considered for
each image in the batch. This parameter allows for ignoring zero
paddings.
masks: (optional) a [batch_size, num_anchors, q, mask_height, mask_width]
float32 tensor containing box masks. `q` can be either number of classes
......@@ -288,6 +512,10 @@ def batch_multiclass_non_max_suppression(boxes,
Defaults to false.
parallel_iterations: (optional) number of batch items to process in
parallel.
use_class_agnostic_nms: If true, this uses class-agnostic non max
suppression.
max_classes_per_detection: Maximum number of retained classes per detection
box in class-agnostic NMS.
Returns:
'nmsed_boxes': A [batch_size, max_detections, 4] float32 tensor
......@@ -313,8 +541,8 @@ def batch_multiclass_non_max_suppression(boxes,
ValueError: if `q` in boxes.shape is not 1 or not equal to number of
classes as inferred from scores.shape.
"""
q = boxes.shape[2].value
num_classes = scores.shape[2].value
q = shape_utils.get_dim_as_int(boxes.shape[2])
num_classes = shape_utils.get_dim_as_int(scores.shape[2])
if q != 1 and q != num_classes:
raise ValueError('third dimension of boxes must be either 1 or equal '
'to the third dimension of scores')
......@@ -335,8 +563,8 @@ def batch_multiclass_non_max_suppression(boxes,
del additional_fields
with tf.name_scope(scope, 'BatchMultiClassNonMaxSuppression'):
boxes_shape = boxes.shape
batch_size = boxes_shape[0].value
num_anchors = boxes_shape[1].value
batch_size = shape_utils.get_dim_as_int(boxes_shape[0])
num_anchors = shape_utils.get_dim_as_int(boxes_shape[1])
if batch_size is None:
batch_size = tf.shape(boxes)[0]
......@@ -434,31 +662,47 @@ def batch_multiclass_non_max_suppression(boxes,
per_image_masks = tf.reshape(
tf.slice(per_image_masks, 4 * [0],
tf.stack([per_image_num_valid_boxes, -1, -1, -1])),
[-1, q, per_image_masks.shape[2].value,
per_image_masks.shape[3].value])
[-1, q, shape_utils.get_dim_as_int(per_image_masks.shape[2]),
shape_utils.get_dim_as_int(per_image_masks.shape[3])])
if per_image_additional_fields is not None:
for key, tensor in per_image_additional_fields.items():
additional_field_shape = tensor.get_shape()
additional_field_dim = len(additional_field_shape)
per_image_additional_fields[key] = tf.reshape(
tf.slice(per_image_additional_fields[key],
additional_field_dim * [0],
tf.stack([per_image_num_valid_boxes] +
(additional_field_dim - 1) * [-1])),
[-1] + [dim.value for dim in additional_field_shape[1:]])
nmsed_boxlist, num_valid_nms_boxes = multiclass_non_max_suppression(
per_image_boxes,
per_image_scores,
score_thresh,
iou_thresh,
max_size_per_class,
max_total_size,
clip_window=per_image_clip_window,
change_coordinate_frame=change_coordinate_frame,
masks=per_image_masks,
pad_to_max_output_size=use_static_shapes,
additional_fields=per_image_additional_fields)
tf.slice(
per_image_additional_fields[key],
additional_field_dim * [0],
tf.stack([per_image_num_valid_boxes] +
(additional_field_dim - 1) * [-1])), [-1] + [
shape_utils.get_dim_as_int(dim)
for dim in additional_field_shape[1:]
])
if use_class_agnostic_nms:
nmsed_boxlist, num_valid_nms_boxes = class_agnostic_non_max_suppression(
per_image_boxes,
per_image_scores,
score_thresh,
iou_thresh,
max_classes_per_detection,
max_total_size,
clip_window=per_image_clip_window,
change_coordinate_frame=change_coordinate_frame,
masks=per_image_masks,
pad_to_max_output_size=use_static_shapes,
additional_fields=per_image_additional_fields)
else:
nmsed_boxlist, num_valid_nms_boxes = multiclass_non_max_suppression(
per_image_boxes,
per_image_scores,
score_thresh,
iou_thresh,
max_size_per_class,
max_total_size,
clip_window=per_image_clip_window,
change_coordinate_frame=change_coordinate_frame,
masks=per_image_masks,
pad_to_max_output_size=use_static_shapes,
additional_fields=per_image_additional_fields)
if not use_static_shapes:
nmsed_boxlist = box_list_ops.pad_or_clip_box_list(
......@@ -499,7 +743,7 @@ def batch_multiclass_non_max_suppression(boxes,
if num_additional_fields > 0:
# Sort the keys to ensure arranging elements in same order as
# in _single_image_nms_fn.
batch_nmsed_keys = ordered_additional_fields.keys()
batch_nmsed_keys = list(ordered_additional_fields.keys())
for i in range(len(batch_nmsed_keys)):
batch_nmsed_additional_fields[
batch_nmsed_keys[i]] = batch_nmsed_values[i]
......
......@@ -55,7 +55,7 @@ def prefetch(tensor_dict, capacity):
enqueue_op = prefetch_queue.enqueue(tensor_dict)
tf.train.queue_runner.add_queue_runner(tf.train.queue_runner.QueueRunner(
prefetch_queue, [enqueue_op]))
tf.summary.scalar('queue/%s/fraction_of_%d_full' % (prefetch_queue.name,
capacity),
tf.to_float(prefetch_queue.size()) * (1. / capacity))
tf.summary.scalar(
'queue/%s/fraction_of_%d_full' % (prefetch_queue.name, capacity),
tf.cast(prefetch_queue.size(), dtype=tf.float32) * (1. / capacity))
return prefetch_queue
......@@ -261,7 +261,7 @@ def normalize_image(image, original_minval, original_maxval, target_minval,
original_maxval = float(original_maxval)
target_minval = float(target_minval)
target_maxval = float(target_maxval)
image = tf.to_float(image)
image = tf.cast(image, dtype=tf.float32)
image = tf.subtract(image, original_minval)
image = tf.multiply(image, (target_maxval - target_minval) /
(original_maxval - original_minval))
......@@ -810,10 +810,12 @@ def random_image_scale(image,
generator_func, preprocessor_cache.PreprocessorCache.IMAGE_SCALE,
preprocess_vars_cache)
image_newysize = tf.to_int32(
tf.multiply(tf.to_float(image_height), size_coef))
image_newxsize = tf.to_int32(
tf.multiply(tf.to_float(image_width), size_coef))
image_newysize = tf.cast(
tf.multiply(tf.cast(image_height, dtype=tf.float32), size_coef),
dtype=tf.int32)
image_newxsize = tf.cast(
tf.multiply(tf.cast(image_width, dtype=tf.float32), size_coef),
dtype=tf.int32)
image = tf.image.resize_images(
image, [image_newysize, image_newxsize], align_corners=True)
result.append(image)
......@@ -1237,7 +1239,7 @@ def _strict_random_crop_image(image,
new_image.set_shape([None, None, image.get_shape()[2]])
# [1, 4]
im_box_rank2 = tf.squeeze(im_box, squeeze_dims=[0])
im_box_rank2 = tf.squeeze(im_box, axis=[0])
# [4]
im_box_rank1 = tf.squeeze(im_box)
......@@ -1555,13 +1557,15 @@ def random_pad_image(image,
new_image += image_color_padded
# setting boxes
new_window = tf.to_float(
new_window = tf.cast(
tf.stack([
-offset_height, -offset_width, target_height - offset_height,
target_width - offset_width
]))
new_window /= tf.to_float(
tf.stack([image_height, image_width, image_height, image_width]))
]),
dtype=tf.float32)
new_window /= tf.cast(
tf.stack([image_height, image_width, image_height, image_width]),
dtype=tf.float32)
boxlist = box_list.BoxList(boxes)
new_boxlist = box_list_ops.change_coordinate_frame(boxlist, new_window)
new_boxes = new_boxlist.get()
......@@ -1616,8 +1620,8 @@ def random_absolute_pad_image(image,
form.
"""
min_image_size = tf.shape(image)[:2]
max_image_size = min_image_size + tf.to_int32(
[max_height_padding, max_width_padding])
max_image_size = min_image_size + tf.cast(
[max_height_padding, max_width_padding], dtype=tf.int32)
return random_pad_image(image, boxes, min_image_size=min_image_size,
max_image_size=max_image_size, pad_color=pad_color,
seed=seed,
......@@ -1723,12 +1727,14 @@ def random_crop_pad_image(image,
cropped_image, cropped_boxes, cropped_labels = result[:3]
min_image_size = tf.to_int32(
tf.to_float(tf.stack([image_height, image_width])) *
min_padded_size_ratio)
max_image_size = tf.to_int32(
tf.to_float(tf.stack([image_height, image_width])) *
max_padded_size_ratio)
min_image_size = tf.cast(
tf.cast(tf.stack([image_height, image_width]), dtype=tf.float32) *
min_padded_size_ratio,
dtype=tf.int32)
max_image_size = tf.cast(
tf.cast(tf.stack([image_height, image_width]), dtype=tf.float32) *
max_padded_size_ratio,
dtype=tf.int32)
padded_image, padded_boxes = random_pad_image(
cropped_image,
......@@ -1840,16 +1846,23 @@ def random_crop_to_aspect_ratio(image,
image_shape = tf.shape(image)
orig_height = image_shape[0]
orig_width = image_shape[1]
orig_aspect_ratio = tf.to_float(orig_width) / tf.to_float(orig_height)
orig_aspect_ratio = tf.cast(
orig_width, dtype=tf.float32) / tf.cast(
orig_height, dtype=tf.float32)
new_aspect_ratio = tf.constant(aspect_ratio, dtype=tf.float32)
def target_height_fn():
return tf.to_int32(tf.round(tf.to_float(orig_width) / new_aspect_ratio))
return tf.cast(
tf.round(tf.cast(orig_width, dtype=tf.float32) / new_aspect_ratio),
dtype=tf.int32)
target_height = tf.cond(orig_aspect_ratio >= new_aspect_ratio,
lambda: orig_height, target_height_fn)
def target_width_fn():
return tf.to_int32(tf.round(tf.to_float(orig_height) * new_aspect_ratio))
return tf.cast(
tf.round(tf.cast(orig_height, dtype=tf.float32) * new_aspect_ratio),
dtype=tf.int32)
target_width = tf.cond(orig_aspect_ratio <= new_aspect_ratio,
lambda: orig_width, target_width_fn)
......@@ -1870,10 +1883,14 @@ def random_crop_to_aspect_ratio(image,
image, offset_height, offset_width, target_height, target_width)
im_box = tf.stack([
tf.to_float(offset_height) / tf.to_float(orig_height),
tf.to_float(offset_width) / tf.to_float(orig_width),
tf.to_float(offset_height + target_height) / tf.to_float(orig_height),
tf.to_float(offset_width + target_width) / tf.to_float(orig_width)
tf.cast(offset_height, dtype=tf.float32) /
tf.cast(orig_height, dtype=tf.float32),
tf.cast(offset_width, dtype=tf.float32) /
tf.cast(orig_width, dtype=tf.float32),
tf.cast(offset_height + target_height, dtype=tf.float32) /
tf.cast(orig_height, dtype=tf.float32),
tf.cast(offset_width + target_width, dtype=tf.float32) /
tf.cast(orig_width, dtype=tf.float32)
])
boxlist = box_list.BoxList(boxes)
......@@ -1996,8 +2013,8 @@ def random_pad_to_aspect_ratio(image,
with tf.name_scope('RandomPadToAspectRatio', values=[image]):
image_shape = tf.shape(image)
image_height = tf.to_float(image_shape[0])
image_width = tf.to_float(image_shape[1])
image_height = tf.cast(image_shape[0], dtype=tf.float32)
image_width = tf.cast(image_shape[1], dtype=tf.float32)
image_aspect_ratio = image_width / image_height
new_aspect_ratio = tf.constant(aspect_ratio, dtype=tf.float32)
target_height = tf.cond(
......@@ -2034,7 +2051,8 @@ def random_pad_to_aspect_ratio(image,
target_width = tf.round(scale * target_width)
new_image = tf.image.pad_to_bounding_box(
image, 0, 0, tf.to_int32(target_height), tf.to_int32(target_width))
image, 0, 0, tf.cast(target_height, dtype=tf.int32),
tf.cast(target_width, dtype=tf.int32))
im_box = tf.stack([
0.0,
......@@ -2050,9 +2068,9 @@ def random_pad_to_aspect_ratio(image,
if masks is not None:
new_masks = tf.expand_dims(masks, -1)
new_masks = tf.image.pad_to_bounding_box(new_masks, 0, 0,
tf.to_int32(target_height),
tf.to_int32(target_width))
new_masks = tf.image.pad_to_bounding_box(
new_masks, 0, 0, tf.cast(target_height, dtype=tf.int32),
tf.cast(target_width, dtype=tf.int32))
new_masks = tf.squeeze(new_masks, [-1])
result.append(new_masks)
......@@ -2106,10 +2124,12 @@ def random_black_patches(image,
image_shape = tf.shape(image)
image_height = image_shape[0]
image_width = image_shape[1]
box_size = tf.to_int32(
box_size = tf.cast(
tf.multiply(
tf.minimum(tf.to_float(image_height), tf.to_float(image_width)),
size_to_image_ratio))
tf.minimum(
tf.cast(image_height, dtype=tf.float32),
tf.cast(image_width, dtype=tf.float32)), size_to_image_ratio),
dtype=tf.int32)
generator_func = functools.partial(tf.random_uniform, [], minval=0.0,
maxval=(1.0 - size_to_image_ratio),
......@@ -2123,8 +2143,12 @@ def random_black_patches(image,
preprocessor_cache.PreprocessorCache.ADD_BLACK_PATCH,
preprocess_vars_cache, key=str(idx) + 'x')
y_min = tf.to_int32(normalized_y_min * tf.to_float(image_height))
x_min = tf.to_int32(normalized_x_min * tf.to_float(image_width))
y_min = tf.cast(
normalized_y_min * tf.cast(image_height, dtype=tf.float32),
dtype=tf.int32)
x_min = tf.cast(
normalized_x_min * tf.cast(image_width, dtype=tf.float32),
dtype=tf.int32)
black_box = tf.ones([box_size, box_size, 3], dtype=tf.float32)
mask = 1.0 - tf.image.pad_to_bounding_box(black_box, y_min, x_min,
image_height, image_width)
......@@ -2156,7 +2180,7 @@ def image_to_float(image):
image: image in tf.float32 format.
"""
with tf.name_scope('ImageToFloat', values=[image]):
image = tf.to_float(image)
image = tf.cast(image, dtype=tf.float32)
return image
......@@ -2342,10 +2366,12 @@ def resize_to_min_dimension(image, masks=None, min_dimension=600,
(image_height, image_width, num_channels) = _get_image_info(image)
min_image_dimension = tf.minimum(image_height, image_width)
min_target_dimension = tf.maximum(min_image_dimension, min_dimension)
target_ratio = tf.to_float(min_target_dimension) / tf.to_float(
min_image_dimension)
target_height = tf.to_int32(tf.to_float(image_height) * target_ratio)
target_width = tf.to_int32(tf.to_float(image_width) * target_ratio)
target_ratio = tf.cast(min_target_dimension, dtype=tf.float32) / tf.cast(
min_image_dimension, dtype=tf.float32)
target_height = tf.cast(
tf.cast(image_height, dtype=tf.float32) * target_ratio, dtype=tf.int32)
target_width = tf.cast(
tf.cast(image_width, dtype=tf.float32) * target_ratio, dtype=tf.int32)
image = tf.image.resize_images(
tf.expand_dims(image, axis=0), size=[target_height, target_width],
method=method,
......@@ -2398,10 +2424,12 @@ def resize_to_max_dimension(image, masks=None, max_dimension=600,
(image_height, image_width, num_channels) = _get_image_info(image)
max_image_dimension = tf.maximum(image_height, image_width)
max_target_dimension = tf.minimum(max_image_dimension, max_dimension)
target_ratio = tf.to_float(max_target_dimension) / tf.to_float(
max_image_dimension)
target_height = tf.to_int32(tf.to_float(image_height) * target_ratio)
target_width = tf.to_int32(tf.to_float(image_width) * target_ratio)
target_ratio = tf.cast(max_target_dimension, dtype=tf.float32) / tf.cast(
max_image_dimension, dtype=tf.float32)
target_height = tf.cast(
tf.cast(image_height, dtype=tf.float32) * target_ratio, dtype=tf.int32)
target_width = tf.cast(
tf.cast(image_width, dtype=tf.float32) * target_ratio, dtype=tf.int32)
image = tf.image.resize_images(
tf.expand_dims(image, axis=0), size=[target_height, target_width],
method=method,
......@@ -2639,11 +2667,11 @@ def random_self_concat_image(
if axis == 0:
# Concat vertically, so need to reduce the y coordinates.
old_scaling = tf.to_float([0.5, 1.0, 0.5, 1.0])
new_translation = tf.to_float([0.5, 0.0, 0.5, 0.0])
old_scaling = tf.constant([0.5, 1.0, 0.5, 1.0])
new_translation = tf.constant([0.5, 0.0, 0.5, 0.0])
elif axis == 1:
old_scaling = tf.to_float([1.0, 0.5, 1.0, 0.5])
new_translation = tf.to_float([0.0, 0.5, 0.0, 0.5])
old_scaling = tf.constant([1.0, 0.5, 1.0, 0.5])
new_translation = tf.constant([0.0, 0.5, 0.0, 0.5])
old_boxes = old_scaling * boxes
new_boxes = old_boxes + new_translation
......
......@@ -795,8 +795,8 @@ class PreprocessorTest(tf.test.TestCase):
images = self.createTestImages()
tensor_dict = {fields.InputDataFields.image: images}
tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options)
images_min = tf.to_float(images) * 0.9 / 255.0
images_max = tf.to_float(images) * 1.1 / 255.0
images_min = tf.cast(images, dtype=tf.float32) * 0.9 / 255.0
images_max = tf.cast(images, dtype=tf.float32) * 1.1 / 255.0
images = tensor_dict[fields.InputDataFields.image]
values_greater = tf.greater_equal(images, images_min)
values_less = tf.less_equal(images, images_max)
......@@ -858,20 +858,26 @@ class PreprocessorTest(tf.test.TestCase):
value=images_gray, num_or_size_splits=3, axis=3)
images_r, images_g, images_b = tf.split(
value=images_original, num_or_size_splits=3, axis=3)
images_r_diff1 = tf.squared_difference(tf.to_float(images_r),
tf.to_float(images_gray_r))
images_r_diff2 = tf.squared_difference(tf.to_float(images_gray_r),
tf.to_float(images_gray_g))
images_r_diff1 = tf.squared_difference(
tf.cast(images_r, dtype=tf.float32),
tf.cast(images_gray_r, dtype=tf.float32))
images_r_diff2 = tf.squared_difference(
tf.cast(images_gray_r, dtype=tf.float32),
tf.cast(images_gray_g, dtype=tf.float32))
images_r_diff = tf.multiply(images_r_diff1, images_r_diff2)
images_g_diff1 = tf.squared_difference(tf.to_float(images_g),
tf.to_float(images_gray_g))
images_g_diff2 = tf.squared_difference(tf.to_float(images_gray_g),
tf.to_float(images_gray_b))
images_g_diff1 = tf.squared_difference(
tf.cast(images_g, dtype=tf.float32),
tf.cast(images_gray_g, dtype=tf.float32))
images_g_diff2 = tf.squared_difference(
tf.cast(images_gray_g, dtype=tf.float32),
tf.cast(images_gray_b, dtype=tf.float32))
images_g_diff = tf.multiply(images_g_diff1, images_g_diff2)
images_b_diff1 = tf.squared_difference(tf.to_float(images_b),
tf.to_float(images_gray_b))
images_b_diff2 = tf.squared_difference(tf.to_float(images_gray_b),
tf.to_float(images_gray_r))
images_b_diff1 = tf.squared_difference(
tf.cast(images_b, dtype=tf.float32),
tf.cast(images_gray_b, dtype=tf.float32))
images_b_diff2 = tf.squared_difference(
tf.cast(images_gray_b, dtype=tf.float32),
tf.cast(images_gray_r, dtype=tf.float32))
images_b_diff = tf.multiply(images_b_diff1, images_b_diff2)
image_zero1 = tf.constant(0, dtype=tf.float32, shape=[1, 4, 4, 1])
with self.test_session() as sess:
......@@ -2135,7 +2141,7 @@ class PreprocessorTest(tf.test.TestCase):
boxes = self.createTestBoxes()
labels = self.createTestLabels()
tensor_dict = {
fields.InputDataFields.image: tf.to_float(images),
fields.InputDataFields.image: tf.cast(images, dtype=tf.float32),
fields.InputDataFields.groundtruth_boxes: boxes,
fields.InputDataFields.groundtruth_classes: labels,
}
......@@ -2856,7 +2862,7 @@ class PreprocessorTest(tf.test.TestCase):
scores = self.createTestMultiClassScores()
tensor_dict = {
fields.InputDataFields.image: tf.to_float(images),
fields.InputDataFields.image: tf.cast(images, dtype=tf.float32),
fields.InputDataFields.groundtruth_boxes: boxes,
fields.InputDataFields.groundtruth_classes: labels,
fields.InputDataFields.groundtruth_weights: weights,
......
......@@ -109,6 +109,8 @@ class DetectionResultFields(object):
key: unique key corresponding to image.
detection_boxes: coordinates of the detection boxes in the image.
detection_scores: detection scores for the detection boxes in the image.
detection_multiclass_scores: class score distribution (including background)
for each detection box in the image.
detection_classes: detection-level class labels.
detection_masks: contains a segmentation mask for each detection box.
detection_boundaries: contains an object boundary for each detection box.
......@@ -123,6 +125,7 @@ class DetectionResultFields(object):
key = 'key'
detection_boxes = 'detection_boxes'
detection_scores = 'detection_scores'
detection_multiclass_scores = 'detection_multiclass_scores'
detection_classes = 'detection_classes'
detection_masks = 'detection_masks'
detection_boundaries = 'detection_boundaries'
......
......@@ -660,16 +660,16 @@ def batch_assign_confidences(target_assigner,
explicit_example_mask = tf.logical_or(positive_mask, negative_mask)
positive_anchors = tf.reduce_any(positive_mask, axis=-1)
regression_weights = tf.to_float(positive_anchors)
regression_weights = tf.cast(positive_anchors, dtype=tf.float32)
regression_targets = (
reg_targets * tf.expand_dims(regression_weights, axis=-1))
regression_weights_expanded = tf.expand_dims(regression_weights, axis=-1)
cls_targets_without_background = (
cls_targets_without_background * (1 - tf.to_float(negative_mask)))
cls_weights_without_background = (
(1 - implicit_class_weight) * tf.to_float(explicit_example_mask)
+ implicit_class_weight)
cls_targets_without_background *
(1 - tf.cast(negative_mask, dtype=tf.float32)))
cls_weights_without_background = ((1 - implicit_class_weight) * tf.cast(
explicit_example_mask, dtype=tf.float32) + implicit_class_weight)
if include_background_class:
cls_weights_background = (
......
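The rewritten weighting is easiest to read as a blend: anchors with an explicit positive or negative label get weight 1.0, and all others fall back to implicit_class_weight. A small numeric sketch, assuming a single row of three anchors:

import tensorflow as tf

implicit_class_weight = 0.5
explicit_example_mask = tf.constant([True, False, True])
cls_weights = ((1 - implicit_class_weight) *
               tf.cast(explicit_example_mask, dtype=tf.float32) +
               implicit_class_weight)
# cls_weights == [1.0, 0.5, 1.0]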
......@@ -59,8 +59,15 @@ class _ClassTensorHandler(slim_example_decoder.Tensor):
label_map_proto_file, use_display_name=False)
# We use a default_value of -1, but we expect all labels to be contained
# in the label map.
name_to_id_table = tf.contrib.lookup.HashTable(
initializer=tf.contrib.lookup.KeyValueTensorInitializer(
try:
# Dynamically try to load the tf v2 lookup, falling back to contrib
lookup = tf.compat.v2.lookup
hash_table_class = tf.compat.v2.lookup.StaticHashTable
except AttributeError:
lookup = tf.contrib.lookup
hash_table_class = tf.contrib.lookup.HashTable
name_to_id_table = hash_table_class(
initializer=lookup.KeyValueTensorInitializer(
keys=tf.constant(list(name_to_id.keys())),
values=tf.constant(list(name_to_id.values()), dtype=tf.int64)),
default_value=-1)
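The try/except is the usual shim for running one decoder under both API generations: newer TF 1.x releases expose the 2.x lookup API under tf.compat.v2, while older ones only have tf.contrib.lookup. A self-contained sketch with a toy label map:

import tensorflow as tf

name_to_id = {'person': 1, 'dog': 2}
try:
  lookup = tf.compat.v2.lookup
  hash_table_class = tf.compat.v2.lookup.StaticHashTable
except AttributeError:
  lookup = tf.contrib.lookup
  hash_table_class = tf.contrib.lookup.HashTable
table = hash_table_class(
    initializer=lookup.KeyValueTensorInitializer(
        keys=tf.constant(list(name_to_id.keys())),
        values=tf.constant(list(name_to_id.values()), dtype=tf.int64)),
    default_value=-1)
ids = table.lookup(tf.constant(['dog', 'cat']))  # -> [2, -1]
# In graph mode, run tf.tables_initializer() before evaluating `ids`.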
......@@ -68,8 +75,8 @@ class _ClassTensorHandler(slim_example_decoder.Tensor):
label_map_proto_file, use_display_name=True)
# We use a default_value of -1, but we expect all labels to be contained
# in the label map.
display_name_to_id_table = tf.contrib.lookup.HashTable(
initializer=tf.contrib.lookup.KeyValueTensorInitializer(
display_name_to_id_table = hash_table_class(
initializer=lookup.KeyValueTensorInitializer(
keys=tf.constant(list(display_name_to_id.keys())),
values=tf.constant(
list(display_name_to_id.values()), dtype=tf.int64)),
......@@ -444,7 +451,8 @@ class TfExampleDecoder(data_decoder.DataDecoder):
masks = keys_to_tensors['image/object/mask']
if isinstance(masks, tf.SparseTensor):
masks = tf.sparse_tensor_to_dense(masks)
masks = tf.reshape(tf.to_float(tf.greater(masks, 0.0)), to_shape)
masks = tf.reshape(
tf.cast(tf.greater(masks, 0.0), dtype=tf.float32), to_shape)
return tf.cast(masks, tf.float32)
def _decode_png_instance_masks(self, keys_to_tensors):
......@@ -465,7 +473,7 @@ class TfExampleDecoder(data_decoder.DataDecoder):
image = tf.squeeze(
tf.image.decode_image(image_buffer, channels=1), axis=2)
image.set_shape([None, None])
image = tf.to_float(tf.greater(image, 0))
image = tf.cast(tf.greater(image, 0), dtype=tf.float32)
return image
png_masks = keys_to_tensors['image/object/mask']
......@@ -476,4 +484,4 @@ class TfExampleDecoder(data_decoder.DataDecoder):
return tf.cond(
tf.greater(tf.size(png_masks), 0),
lambda: tf.map_fn(decode_png_mask, png_masks, dtype=tf.float32),
lambda: tf.zeros(tf.to_int32(tf.stack([0, height, width]))))
lambda: tf.zeros(tf.cast(tf.stack([0, height, width]), dtype=tf.int32)))
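Both mask decode paths reduce to the same binarization idiom, now spelled with an explicit cast. A toy sketch:

import tensorflow as tf

masks = tf.constant([[0.0, 0.3], [0.9, 0.0]])
binary = tf.cast(tf.greater(masks, 0.0), dtype=tf.float32)
# binary == [[0., 1.], [1., 0.]]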
......@@ -44,10 +44,15 @@ EVAL_METRICS_CLASS_DICT = {
coco_evaluation.CocoMaskEvaluator,
'oid_challenge_detection_metrics':
object_detection_evaluation.OpenImagesDetectionChallengeEvaluator,
'oid_challenge_segmentation_metrics':
object_detection_evaluation
.OpenImagesInstanceSegmentationChallengeEvaluator,
'pascal_voc_detection_metrics':
object_detection_evaluation.PascalDetectionEvaluator,
'weighted_pascal_voc_detection_metrics':
object_detection_evaluation.WeightedPascalDetectionEvaluator,
'precision_at_recall_detection_metrics':
object_detection_evaluation.PrecisionAtRecallDetectionEvaluator,
'pascal_voc_instance_segmentation_metrics':
object_detection_evaluation.PascalInstanceSegmentationEvaluator,
'weighted_pascal_voc_instance_segmentation_metrics':
......@@ -776,7 +781,8 @@ def result_dict_for_batched_example(images,
detection_fields = fields.DetectionResultFields
detection_boxes = detections[detection_fields.detection_boxes]
detection_scores = detections[detection_fields.detection_scores]
num_detections = tf.to_int32(detections[detection_fields.num_detections])
num_detections = tf.cast(detections[detection_fields.num_detections],
dtype=tf.int32)
if class_agnostic:
detection_classes = tf.ones_like(detection_scores, dtype=tf.int64)
......@@ -939,4 +945,9 @@ def evaluator_options_from_eval_config(eval_config):
'include_metrics_per_category': (
eval_config.include_metrics_per_category)
}
elif eval_metric_fn_key == 'precision_at_recall_detection_metrics':
evaluator_options[eval_metric_fn_key] = {
'recall_lower_bound': (eval_config.recall_lower_bound),
'recall_upper_bound': (eval_config.recall_upper_bound)
}
return evaluator_options
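A hedged usage sketch for the new recall-bounded evaluator options, mirroring the test below (proto fields as shown in this diff):

from object_detection import eval_util
from object_detection.protos import eval_pb2

eval_config = eval_pb2.EvalConfig()
eval_config.metrics_set.append('precision_at_recall_detection_metrics')
eval_config.recall_lower_bound = 0.2
eval_config.recall_upper_bound = 0.6
options = eval_util.evaluator_options_from_eval_config(eval_config)
# options['precision_at_recall_detection_metrics'] ==
#     {'recall_lower_bound': 0.2, 'recall_upper_bound': 0.6}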
......@@ -31,9 +31,9 @@ from object_detection.utils import test_case
class EvalUtilTest(test_case.TestCase, parameterized.TestCase):
def _get_categories_list(self):
return [{'id': 0, 'name': 'person'},
{'id': 1, 'name': 'dog'},
{'id': 2, 'name': 'cat'}]
return [{'id': 1, 'name': 'person'},
{'id': 2, 'name': 'dog'},
{'id': 3, 'name': 'cat'}]
def _make_evaluation_dict(self,
resized_groundtruth_masks=False,
......@@ -192,43 +192,66 @@ class EvalUtilTest(test_case.TestCase, parameterized.TestCase):
def test_get_eval_metric_ops_for_evaluators(self):
eval_config = eval_pb2.EvalConfig()
eval_config.metrics_set.extend(
['coco_detection_metrics', 'coco_mask_metrics'])
eval_config.metrics_set.extend([
'coco_detection_metrics', 'coco_mask_metrics',
'precision_at_recall_detection_metrics'
])
eval_config.include_metrics_per_category = True
eval_config.recall_lower_bound = 0.2
eval_config.recall_upper_bound = 0.6
evaluator_options = eval_util.evaluator_options_from_eval_config(
eval_config)
self.assertTrue(evaluator_options['coco_detection_metrics'][
'include_metrics_per_category'])
self.assertTrue(evaluator_options['coco_mask_metrics'][
'include_metrics_per_category'])
self.assertTrue(evaluator_options['coco_detection_metrics']
['include_metrics_per_category'])
self.assertTrue(
evaluator_options['coco_mask_metrics']['include_metrics_per_category'])
self.assertAlmostEqual(
evaluator_options['precision_at_recall_detection_metrics']
['recall_lower_bound'], eval_config.recall_lower_bound)
self.assertAlmostEqual(
evaluator_options['precision_at_recall_detection_metrics']
['recall_upper_bound'], eval_config.recall_upper_bound)
def test_get_evaluator_with_evaluator_options(self):
eval_config = eval_pb2.EvalConfig()
eval_config.metrics_set.extend(['coco_detection_metrics'])
eval_config.metrics_set.extend(
['coco_detection_metrics', 'precision_at_recall_detection_metrics'])
eval_config.include_metrics_per_category = True
eval_config.recall_lower_bound = 0.2
eval_config.recall_upper_bound = 0.6
categories = self._get_categories_list()
evaluator_options = eval_util.evaluator_options_from_eval_config(
eval_config)
evaluator = eval_util.get_evaluators(
eval_config, categories, evaluator_options)
evaluator = eval_util.get_evaluators(eval_config, categories,
evaluator_options)
self.assertTrue(evaluator[0]._include_metrics_per_category)
self.assertAlmostEqual(evaluator[1]._recall_lower_bound,
eval_config.recall_lower_bound)
self.assertAlmostEqual(evaluator[1]._recall_upper_bound,
eval_config.recall_upper_bound)
def test_get_evaluator_with_no_evaluator_options(self):
eval_config = eval_pb2.EvalConfig()
eval_config.metrics_set.extend(['coco_detection_metrics'])
eval_config.metrics_set.extend(
['coco_detection_metrics', 'precision_at_recall_detection_metrics'])
eval_config.include_metrics_per_category = True
eval_config.recall_lower_bound = 0.2
eval_config.recall_upper_bound = 0.6
categories = self._get_categories_list()
evaluator = eval_util.get_evaluators(
eval_config, categories, evaluator_options=None)
# Even though we are setting eval_config.include_metrics_per_category = True
# this option is never passed into the DetectionEvaluator constructor (via
# `evaluator_options`).
# and bounds on recall, these options are never passed into the
# DetectionEvaluator constructor (via `evaluator_options`).
self.assertFalse(evaluator[0]._include_metrics_per_category)
self.assertAlmostEqual(evaluator[1]._recall_lower_bound, 0.0)
self.assertAlmostEqual(evaluator[1]._recall_upper_bound, 1.0)
if __name__ == '__main__':
tf.test.main()
......@@ -106,7 +106,7 @@ flags.DEFINE_string('trained_checkpoint_prefix', None, 'Checkpoint prefix.')
flags.DEFINE_integer('max_detections', 10,
'Maximum number of detections (boxes) to show.')
flags.DEFINE_integer('max_classes_per_detection', 1,
'Number of classes to display per detection box.')
'Maximum number of classes to output per detection box.')
flags.DEFINE_integer(
'detections_per_class', 100,
'Number of anchors used per class in Regular Non-Max-Suppression.')
......@@ -136,7 +136,7 @@ def main(argv):
export_tflite_ssd_graph_lib.export_tflite_graph(
pipeline_config, FLAGS.trained_checkpoint_prefix, FLAGS.output_directory,
FLAGS.add_postprocessing_op, FLAGS.max_detections,
FLAGS.max_classes_per_detection, FLAGS.use_regular_nms)
FLAGS.max_classes_per_detection, use_regular_nms=FLAGS.use_regular_nms)
if __name__ == '__main__':
......
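Passing use_regular_nms by keyword guards against positional drift: with several integer parameters sitting between max_classes_per_detection and use_regular_nms, a positional boolean can silently bind to the wrong slot. A hypothetical illustration (signature simplified, not the real export function):

def export_graph(config, max_detections=10, max_classes_per_detection=1,
                 detections_per_class=100, use_regular_nms=False):
  return locals()

# Positional call: True lands on detections_per_class, not use_regular_nms.
export_graph('cfg', 10, 1, True)
# Keyword call pins the value to the intended parameter:
export_graph('cfg', 10, 1, use_regular_nms=True)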
......@@ -176,6 +176,9 @@ def add_output_tensor_nodes(postprocessed_tensors,
containing detected boxes.
* detection_scores: float32 tensor of shape [batch_size, num_boxes]
containing scores for the detected boxes.
* detection_multiclass_scores: (Optional) float32 tensor of shape
[batch_size, num_boxes, num_classes_with_background] containing the class
score distribution for detected boxes, including the background class if any.
* detection_classes: float32 tensor of shape [batch_size, num_boxes]
containing class predictions for the detected boxes.
* detection_keypoints: (Optional) float32 tensor of shape
......@@ -189,6 +192,8 @@ def add_output_tensor_nodes(postprocessed_tensors,
postprocessed_tensors: a dictionary containing the following fields
'detection_boxes': [batch, max_detections, 4]
'detection_scores': [batch, max_detections]
'detection_multiclass_scores': [batch, max_detections,
num_classes_with_background]
'detection_classes': [batch, max_detections]
'detection_masks': [batch, max_detections, mask_height, mask_width]
(optional).
......@@ -204,6 +209,8 @@ def add_output_tensor_nodes(postprocessed_tensors,
label_id_offset = 1
boxes = postprocessed_tensors.get(detection_fields.detection_boxes)
scores = postprocessed_tensors.get(detection_fields.detection_scores)
multiclass_scores = postprocessed_tensors.get(
detection_fields.detection_multiclass_scores)
raw_boxes = postprocessed_tensors.get(detection_fields.raw_detection_boxes)
raw_scores = postprocessed_tensors.get(detection_fields.raw_detection_scores)
classes = postprocessed_tensors.get(
......@@ -216,6 +223,9 @@ def add_output_tensor_nodes(postprocessed_tensors,
boxes, name=detection_fields.detection_boxes)
outputs[detection_fields.detection_scores] = tf.identity(
scores, name=detection_fields.detection_scores)
if multiclass_scores is not None:
outputs[detection_fields.detection_multiclass_scores] = tf.identity(
multiclass_scores, name=detection_fields.detection_multiclass_scores)
outputs[detection_fields.detection_classes] = tf.identity(
classes, name=detection_fields.detection_classes)
outputs[detection_fields.num_detections] = tf.identity(
......@@ -306,7 +316,7 @@ def write_graph_and_checkpoint(inference_graph_def,
def _get_outputs_from_inputs(input_tensors, detection_model,
output_collection_name):
inputs = tf.to_float(input_tensors)
inputs = tf.cast(input_tensors, dtype=tf.float32)
preprocessed_inputs, true_image_shapes = detection_model.preprocess(inputs)
output_tensors = detection_model.predict(
preprocessed_inputs, true_image_shapes)
......
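Because add_output_tensor_nodes names each output tensor after its field, a graph exported with these changes exposes the new scores under the usual ':0' convention. A hedged sketch, assuming inference_graph is an already-loaded frozen tf.Graph and image_batch is a prepared input array:

import tensorflow as tf

with tf.Session(graph=inference_graph) as sess:
  multiclass_scores = inference_graph.get_tensor_by_name(
      'detection_multiclass_scores:0')
  scores_np = sess.run(multiclass_scores,
                       feed_dict={'image_tensor:0': image_batch})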
......@@ -59,6 +59,9 @@ class FakeModel(model.DetectionModel):
[0.0, 0.0, 0.0, 0.0]]], tf.float32),
'detection_scores': tf.constant([[0.7, 0.6],
[0.9, 0.0]], tf.float32),
'detection_multiclass_scores': tf.constant([[[0.3, 0.7], [0.4, 0.6]],
[[0.1, 0.9], [0.0, 0.0]]],
tf.float32),
'detection_classes': tf.constant([[0, 1],
[1, 0]], tf.float32),
'num_detections': tf.constant([2, 1], tf.float32),
......@@ -371,6 +374,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
inference_graph.get_tensor_by_name('image_tensor:0')
inference_graph.get_tensor_by_name('detection_boxes:0')
inference_graph.get_tensor_by_name('detection_scores:0')
inference_graph.get_tensor_by_name('detection_multiclass_scores:0')
inference_graph.get_tensor_by_name('detection_classes:0')
inference_graph.get_tensor_by_name('detection_keypoints:0')
inference_graph.get_tensor_by_name('detection_masks:0')
......@@ -398,6 +402,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
inference_graph.get_tensor_by_name('image_tensor:0')
inference_graph.get_tensor_by_name('detection_boxes:0')
inference_graph.get_tensor_by_name('detection_scores:0')
inference_graph.get_tensor_by_name('detection_multiclass_scores:0')
inference_graph.get_tensor_by_name('detection_classes:0')
inference_graph.get_tensor_by_name('num_detections:0')
with self.assertRaises(KeyError):
......@@ -491,15 +496,20 @@ class ExportInferenceGraphTest(tf.test.TestCase):
'encoded_image_string_tensor:0')
boxes = inference_graph.get_tensor_by_name('detection_boxes:0')
scores = inference_graph.get_tensor_by_name('detection_scores:0')
multiclass_scores = inference_graph.get_tensor_by_name(
'detection_multiclass_scores:0')
classes = inference_graph.get_tensor_by_name('detection_classes:0')
keypoints = inference_graph.get_tensor_by_name('detection_keypoints:0')
masks = inference_graph.get_tensor_by_name('detection_masks:0')
num_detections = inference_graph.get_tensor_by_name('num_detections:0')
for image_str in [jpg_image_str, png_image_str]:
image_str_batch_np = np.hstack([image_str] * 2)
(boxes_np, scores_np, classes_np, keypoints_np, masks_np,
num_detections_np) = sess.run(
[boxes, scores, classes, keypoints, masks, num_detections],
(boxes_np, scores_np, multiclass_scores_np, classes_np, keypoints_np,
masks_np, num_detections_np) = sess.run(
[
boxes, scores, multiclass_scores, classes, keypoints, masks,
num_detections
],
feed_dict={image_str_tensor: image_str_batch_np})
self.assertAllClose(boxes_np, [[[0.0, 0.0, 0.5, 0.5],
[0.5, 0.5, 0.8, 0.8]],
......@@ -507,6 +517,8 @@ class ExportInferenceGraphTest(tf.test.TestCase):
[0.0, 0.0, 0.0, 0.0]]])
self.assertAllClose(scores_np, [[0.7, 0.6],
[0.9, 0.0]])
self.assertAllClose(multiclass_scores_np, [[[0.3, 0.7], [0.4, 0.6]],
[[0.1, 0.9], [0.0, 0.0]]])
self.assertAllClose(classes_np, [[1, 2],
[2, 1]])
self.assertAllClose(keypoints_np, np.arange(48).reshape([2, 2, 6, 2]))
......
......@@ -127,7 +127,7 @@ def transform_input_data(tensor_dict,
# Apply model preprocessing ops and resize instance masks.
image = tensor_dict[fields.InputDataFields.image]
preprocessed_resized_image, true_image_shape = model_preprocess_fn(
tf.expand_dims(tf.to_float(image), axis=0))
tf.expand_dims(tf.cast(image, dtype=tf.float32), axis=0))
if use_bfloat16:
preprocessed_resized_image = tf.cast(
preprocessed_resized_image, tf.bfloat16)
......@@ -219,14 +219,15 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
num_additional_channels = 0
if fields.InputDataFields.image_additional_channels in tensor_dict:
num_additional_channels = tensor_dict[
fields.InputDataFields.image_additional_channels].shape[2].value
num_additional_channels = shape_utils.get_dim_as_int(tensor_dict[
fields.InputDataFields.image_additional_channels].shape[2])
# We assume that if num_additional_channels > 0, then it has already been
# concatenated to the base image (but not the ground truth).
num_channels = 3
if fields.InputDataFields.image in tensor_dict:
num_channels = tensor_dict[fields.InputDataFields.image].shape[2].value
num_channels = shape_utils.get_dim_as_int(
tensor_dict[fields.InputDataFields.image].shape[2])
if num_additional_channels:
if num_additional_channels >= num_channels:
......@@ -234,7 +235,8 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
'Image must be already concatenated with additional channels.')
if (fields.InputDataFields.original_image in tensor_dict and
tensor_dict[fields.InputDataFields.original_image].shape[2].value ==
shape_utils.get_dim_as_int(
tensor_dict[fields.InputDataFields.original_image].shape[2]) ==
num_channels):
raise ValueError(
'Image must be already concatenated with additional channels.')
......@@ -273,19 +275,21 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
if fields.InputDataFields.original_image in tensor_dict:
padding_shapes[fields.InputDataFields.original_image] = [
height, width, tensor_dict[fields.InputDataFields.
original_image].shape[2].value
height, width,
shape_utils.get_dim_as_int(tensor_dict[fields.InputDataFields.
original_image].shape[2])
]
if fields.InputDataFields.groundtruth_keypoints in tensor_dict:
tensor_shape = (
tensor_dict[fields.InputDataFields.groundtruth_keypoints].shape)
padding_shape = [max_num_boxes, tensor_shape[1].value,
tensor_shape[2].value]
padding_shape = [max_num_boxes,
shape_utils.get_dim_as_int(tensor_shape[1]),
shape_utils.get_dim_as_int(tensor_shape[2])]
padding_shapes[fields.InputDataFields.groundtruth_keypoints] = padding_shape
if fields.InputDataFields.groundtruth_keypoint_visibilities in tensor_dict:
tensor_shape = tensor_dict[fields.InputDataFields.
groundtruth_keypoint_visibilities].shape
padding_shape = [max_num_boxes, tensor_shape[1].value]
padding_shape = [max_num_boxes, shape_utils.get_dim_as_int(tensor_shape[1])]
padding_shapes[fields.InputDataFields.
groundtruth_keypoint_visibilities] = padding_shape
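The .value accessor only exists on TF 1.x tf.Dimension objects; in TF 2.x a TensorShape holds plain ints (or None), so these call sites go through shape_utils.get_dim_as_int instead. A sketch of the shim's presumed behavior:

def get_dim_as_int(dim):
  """Returns a shape dimension as an int, or None if unknown."""
  try:
    return dim.value  # TF 1.x: tf.Dimension
  except AttributeError:
    return dim        # TF 2.x: already an int (or None)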
......@@ -318,7 +322,7 @@ def augment_input_data(tensor_dict, data_augmentation_options):
input tensor dictionary.
"""
tensor_dict[fields.InputDataFields.image] = tf.expand_dims(
tf.to_float(tensor_dict[fields.InputDataFields.image]), 0)
tf.cast(tensor_dict[fields.InputDataFields.image], dtype=tf.float32), 0)
include_instance_masks = (fields.InputDataFields.groundtruth_instance_masks
in tensor_dict)
......@@ -438,97 +442,112 @@ def create_train_input_fn(train_config, train_input_config,
"""
def _train_input_fn(params=None):
"""Returns `features` and `labels` tensor dictionaries for training.
return train_input(train_config, train_input_config, model_config,
params=params)
Args:
params: Parameter dictionary passed from the estimator.
return _train_input_fn
Returns:
A tf.data.Dataset that holds (features, labels) tuple.
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [batch_size, H, W, C]
float32 tensor with preprocessed images.
features[HASH_KEY] is a [batch_size] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [batch_size, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] (optional) is a
[batch_size, H, W, C] float32 tensor with original images.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.num_groundtruth_boxes] is a [batch_size]
int32 tensor indicating the number of groundtruth boxes.
labels[fields.InputDataFields.groundtruth_boxes] is a
[batch_size, num_boxes, 4] float32 tensor containing the corners of
the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
[batch_size, num_boxes, num_classes] float32 one-hot tensor of
classes.
labels[fields.InputDataFields.groundtruth_weights] is a
[batch_size, num_boxes] float32 tensor containing groundtruth weights
for the boxes.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[batch_size, num_boxes, H, W] float32 tensor containing only binary
values, which represent instance masks for objects.
labels[fields.InputDataFields.groundtruth_keypoints] is a
[batch_size, num_boxes, num_keypoints, 2] float32 tensor containing
keypoints for each box.
Raises:
TypeError: if the `train_config`, `train_input_config` or `model_config`
are not of the correct type.
"""
if not isinstance(train_config, train_pb2.TrainConfig):
raise TypeError('For training mode, the `train_config` must be a '
'train_pb2.TrainConfig.')
if not isinstance(train_input_config, input_reader_pb2.InputReader):
raise TypeError('The `train_input_config` must be a '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
def transform_and_pad_input_data_fn(tensor_dict):
"""Combines transform and pad operation."""
data_augmentation_options = [
preprocessor_builder.build(step)
for step in train_config.data_augmentation_options
]
data_augmentation_fn = functools.partial(
augment_input_data,
data_augmentation_options=data_augmentation_options)
model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
model_config, is_training=True).preprocess
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model_preprocess_fn,
image_resizer_fn=image_resizer_fn,
num_classes=config_util.get_number_of_classes(model_config),
data_augmentation_fn=data_augmentation_fn,
merge_multiple_boxes=train_config.merge_multiple_label_boxes,
retain_original_image=train_config.retain_original_images,
use_multiclass_scores=train_config.use_multiclass_scores,
use_bfloat16=train_config.use_bfloat16)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=train_input_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
train_input_config,
transform_input_data_fn=transform_and_pad_input_data_fn,
batch_size=params['batch_size'] if params else train_config.batch_size)
return dataset
return _train_input_fn
def train_input(train_config, train_input_config,
model_config, model=None, params=None):
"""Returns `features` and `labels` tensor dictionaries for training.
Args:
train_config: A train_pb2.TrainConfig.
train_input_config: An input_reader_pb2.InputReader.
model_config: A model_pb2.DetectionModel.
model: A pre-constructed Detection Model.
If None, one will be created from the config.
params: Parameter dictionary passed from the estimator.
Returns:
A tf.data.Dataset that holds (features, labels) tuple.
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [batch_size, H, W, C]
float32 tensor with preprocessed images.
features[HASH_KEY] is a [batch_size] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [batch_size, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] (optional) is a
[batch_size, H, W, C] float32 tensor with original images.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.num_groundtruth_boxes] is a [batch_size]
int32 tensor indicating the number of groundtruth boxes.
labels[fields.InputDataFields.groundtruth_boxes] is a
[batch_size, num_boxes, 4] float32 tensor containing the corners of
the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
[batch_size, num_boxes, num_classes] float32 one-hot tensor of
classes.
labels[fields.InputDataFields.groundtruth_weights] is a
[batch_size, num_boxes] float32 tensor containing groundtruth weights
for the boxes.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[batch_size, num_boxes, H, W] float32 tensor containing only binary
values, which represent instance masks for objects.
labels[fields.InputDataFields.groundtruth_keypoints] is a
[batch_size, num_boxes, num_keypoints, 2] float32 tensor containing
keypoints for each box.
Raises:
TypeError: if the `train_config`, `train_input_config` or `model_config`
are not of the correct type.
"""
if not isinstance(train_config, train_pb2.TrainConfig):
raise TypeError('For training mode, the `train_config` must be a '
'train_pb2.TrainConfig.')
if not isinstance(train_input_config, input_reader_pb2.InputReader):
raise TypeError('The `train_input_config` must be a '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
if model is None:
model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
model_config, is_training=True).preprocess
else:
model_preprocess_fn = model.preprocess
def transform_and_pad_input_data_fn(tensor_dict):
"""Combines transform and pad operation."""
data_augmentation_options = [
preprocessor_builder.build(step)
for step in train_config.data_augmentation_options
]
data_augmentation_fn = functools.partial(
augment_input_data,
data_augmentation_options=data_augmentation_options)
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model_preprocess_fn,
image_resizer_fn=image_resizer_fn,
num_classes=config_util.get_number_of_classes(model_config),
data_augmentation_fn=data_augmentation_fn,
merge_multiple_boxes=train_config.merge_multiple_label_boxes,
retain_original_image=train_config.retain_original_images,
use_multiclass_scores=train_config.use_multiclass_scores,
use_bfloat16=train_config.use_bfloat16)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=train_input_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
train_input_config,
transform_input_data_fn=transform_and_pad_input_data_fn,
batch_size=params['batch_size'] if params else train_config.batch_size)
return dataset
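The refactor leaves _train_input_fn a thin wrapper, so eager/custom training loops can call train_input directly and hand it an already-constructed model whose preprocess function is reused instead of building a throwaway model from the config. A hedged sketch (configs assumed already parsed; eval_input follows the same pattern with is_training=False):

from object_detection import inputs
from object_detection.builders import model_builder

detection_model = model_builder.build(model_config, is_training=True)
dataset = inputs.train_input(train_config, train_input_config, model_config,
                             model=detection_model)
for features, labels in dataset:  # tf.data.Dataset of (features, labels)
  pass  # feed the batch to a custom training step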
def create_eval_input_fn(eval_config, eval_input_config, model_config):
......@@ -544,84 +563,99 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
"""
def _eval_input_fn(params=None):
"""Returns `features` and `labels` tensor dictionaries for evaluation.
return eval_input(eval_config, eval_input_config, model_config,
params=params)
Args:
params: Parameter dictionary passed from the estimator.
return _eval_input_fn
Returns:
A tf.data.Dataset that holds (features, labels) tuple.
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [1, H, W, C] float32 tensor
with preprocessed images.
features[HASH_KEY] is a [1] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [1, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] is a [1, H', W', C]
float32 tensor with the original image.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.groundtruth_boxes] is a [1, num_boxes, 4]
float32 tensor containing the corners of the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
[num_boxes, num_classes] float32 one-hot tensor of classes.
labels[fields.InputDataFields.groundtruth_area] is a [1, num_boxes]
float32 tensor containing object areas.
labels[fields.InputDataFields.groundtruth_is_crowd] is a [1, num_boxes]
bool tensor indicating if the boxes enclose a crowd.
labels[fields.InputDataFields.groundtruth_difficult] is a [1, num_boxes]
int32 tensor indicating if the boxes represent difficult instances.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[1, num_boxes, H, W] float32 tensor containing only binary values,
which represent instance masks for objects.
Raises:
TypeError: if the `eval_config`, `eval_input_config` or `model_config`
are not of the correct type.
"""
params = params or {}
if not isinstance(eval_config, eval_pb2.EvalConfig):
raise TypeError('For eval mode, the `eval_config` must be a '
'train_pb2.EvalConfig.')
if not isinstance(eval_input_config, input_reader_pb2.InputReader):
raise TypeError('The `eval_input_config` must be a '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
def transform_and_pad_input_data_fn(tensor_dict):
"""Combines transform and pad operation."""
num_classes = config_util.get_number_of_classes(model_config)
model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
model_config, is_training=False).preprocess
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model_preprocess_fn,
image_resizer_fn=image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=None,
retain_original_image=eval_config.retain_original_images)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=eval_input_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
eval_input_config,
batch_size=params['batch_size'] if params else eval_config.batch_size,
transform_input_data_fn=transform_and_pad_input_data_fn)
return dataset
return _eval_input_fn
def eval_input(eval_config, eval_input_config, model_config,
model=None, params=None):
"""Returns `features` and `labels` tensor dictionaries for evaluation.
Args:
eval_config: An eval_pb2.EvalConfig.
eval_input_config: An input_reader_pb2.InputReader.
model_config: A model_pb2.DetectionModel.
model: A pre-constructed Detection Model.
If None, one will be created from the config.
params: Parameter dictionary passed from the estimator.
Returns:
A tf.data.Dataset that holds (features, labels) tuple.
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [1, H, W, C] float32 tensor
with preprocessed images.
features[HASH_KEY] is a [1] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [1, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] is a [1, H', W', C]
float32 tensor with the original image.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.groundtruth_boxes] is a [1, num_boxes, 4]
float32 tensor containing the corners of the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
[num_boxes, num_classes] float32 one-hot tensor of classes.
labels[fields.InputDataFields.groundtruth_area] is a [1, num_boxes]
float32 tensor containing object areas.
labels[fields.InputDataFields.groundtruth_is_crowd] is a [1, num_boxes]
bool tensor indicating if the boxes enclose a crowd.
labels[fields.InputDataFields.groundtruth_difficult] is a [1, num_boxes]
int32 tensor indicating if the boxes represent difficult instances.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[1, num_boxes, H, W] float32 tensor containing only binary values,
which represent instance masks for objects.
Raises:
TypeError: if the `eval_config`, `eval_input_config` or `model_config`
are not of the correct type.
"""
params = params or {}
if not isinstance(eval_config, eval_pb2.EvalConfig):
raise TypeError('For eval mode, the `eval_config` must be a '
'train_pb2.EvalConfig.')
if not isinstance(eval_input_config, input_reader_pb2.InputReader):
raise TypeError('The `eval_input_config` must be a '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
if model is None:
model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
model_config, is_training=False).preprocess
else:
model_preprocess_fn = model.preprocess
def transform_and_pad_input_data_fn(tensor_dict):
"""Combines transform and pad operation."""
num_classes = config_util.get_number_of_classes(model_config)
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model_preprocess_fn,
image_resizer_fn=image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=None,
retain_original_image=eval_config.retain_original_images)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=eval_input_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
eval_input_config,
batch_size=params['batch_size'] if params else eval_config.batch_size,
transform_input_data_fn=transform_and_pad_input_data_fn)
return dataset
def create_predict_input_fn(model_config, predict_input_config):
......@@ -664,7 +698,7 @@ def create_predict_input_fn(model_config, predict_input_config):
load_instance_masks=False,
num_additional_channels=predict_input_config.num_additional_channels)
input_dict = transform_fn(decoder.decode(example))
images = tf.to_float(input_dict[fields.InputDataFields.image])
images = tf.cast(input_dict[fields.InputDataFields.image], dtype=tf.float32)
images = tf.expand_dims(images, axis=0)
true_image_shape = tf.expand_dims(
input_dict[fields.InputDataFields.true_image_shape], axis=0)
......
......@@ -53,6 +53,9 @@ EVAL_METRICS_CLASS_DICT = {
# DEPRECATED: please use oid_challenge_detection_metrics instead
'oid_challenge_object_detection_metrics':
object_detection_evaluation.OpenImagesDetectionChallengeEvaluator,
'oid_challenge_segmentation_metrics':
object_detection_evaluation
.OpenImagesInstanceSegmentationChallengeEvaluator,
}
EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'
......@@ -80,7 +83,7 @@ def _extract_predictions_and_losses(model,
input_dict = prefetch_queue.dequeue()
original_image = tf.expand_dims(input_dict[fields.InputDataFields.image], 0)
preprocessed_image, true_image_shapes = model.preprocess(
tf.to_float(original_image))
tf.cast(original_image, dtype=tf.float32))
prediction_dict = model.predict(preprocessed_image, true_image_shapes)
detections = model.postprocess(prediction_dict, true_image_shapes)
......
......@@ -62,7 +62,7 @@ def create_input_queue(batch_size_per_clone, create_tensor_dict_fn,
tensor_dict[fields.InputDataFields.image], 0)
images = tensor_dict[fields.InputDataFields.image]
float_images = tf.to_float(images)
float_images = tf.cast(images, dtype=tf.float32)
tensor_dict[fields.InputDataFields.image] = float_images
include_instance_masks = (fields.InputDataFields.groundtruth_instance_masks
......
......@@ -184,7 +184,7 @@ class ArgMaxMatcher(matcher.Matcher):
return matches
if similarity_matrix.shape.is_fully_defined():
if similarity_matrix.shape[0].value == 0:
if shape_utils.get_dim_as_int(similarity_matrix.shape[0]) == 0:
return _match_when_rows_are_empty()
else:
return _match_when_rows_are_non_empty()
......
......@@ -62,7 +62,7 @@ class GreedyBipartiteMatcher(matcher.Matcher):
# Convert similarity matrix to distance matrix as tf.image.bipartite tries
# to find minimum distance matches.
distance_matrix = -1 * similarity_matrix
num_valid_rows = tf.reduce_sum(tf.to_float(valid_rows))
num_valid_rows = tf.reduce_sum(tf.cast(valid_rows, dtype=tf.float32))
_, match_results = image_ops.bipartite_match(
distance_matrix, num_valid_rows=num_valid_rows)
match_results = tf.reshape(match_results, [-1])
......
......@@ -722,9 +722,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
"""
clip_heights = image_shapes[:, 0]
clip_widths = image_shapes[:, 1]
clip_window = tf.to_float(tf.stack([tf.zeros_like(clip_heights),
tf.zeros_like(clip_heights),
clip_heights, clip_widths], axis=1))
clip_window = tf.cast(
tf.stack([
tf.zeros_like(clip_heights),
tf.zeros_like(clip_heights), clip_heights, clip_widths
],
axis=1),
dtype=tf.float32)
return clip_window
def _proposal_postprocess(self, rpn_box_encodings,
......@@ -732,7 +736,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
image_shape, true_image_shapes):
"""Wraps over FasterRCNNMetaArch._postprocess_rpn()."""
image_shape_2d = self._image_batch_shape_2d(image_shape)
proposal_boxes_normalized, _, num_proposals, _, _ = \
proposal_boxes_normalized, _, _, num_proposals, _, _ = \
self._postprocess_rpn(
rpn_box_encodings, rpn_objectness_predictions_with_background,
anchors, image_shape_2d, true_image_shapes)
......@@ -817,17 +821,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
prediction_dict = self._predict_first_stage(preprocessed_inputs)
if self._number_of_stages >= 2:
# If mixed-precision training on TPU is enabled, rpn_box_encodings and
# rpn_objectness_predictions_with_background are bfloat16 tensors.
# Since they are treated as prediction results, they need to be cast to
# float32 tensors for correct postprocess_rpn computation in
# predict_second_stage.
prediction_dict.update(
self._predict_second_stage(
tf.to_float(prediction_dict['rpn_box_encodings']),
tf.to_float(
prediction_dict['rpn_objectness_predictions_with_background']
), prediction_dict['rpn_features_to_crop'],
prediction_dict['anchors'], prediction_dict['image_shape'],
prediction_dict['rpn_box_encodings'],
prediction_dict['rpn_objectness_predictions_with_background'],
prediction_dict['rpn_features_to_crop'],
prediction_dict['anchors'],
prediction_dict['image_shape'],
true_image_shapes))
if self._number_of_stages == 3:
......@@ -848,21 +848,21 @@ class FasterRCNNMetaArch(model.DetectionModel):
Returns:
prediction_dict: a dictionary holding "raw" prediction tensors:
1) rpn_box_predictor_features: A 4-D float32 tensor with shape
1) rpn_box_predictor_features: A 4-D float32/bfloat16 tensor with shape
[batch_size, height, width, depth] to be used for predicting proposal
boxes and corresponding objectness scores.
2) rpn_features_to_crop: A 4-D float32 tensor with shape
2) rpn_features_to_crop: A 4-D float32/bfloat16 tensor with shape
[batch_size, height, width, depth] representing image features to crop
using the proposal boxes predicted by the RPN.
3) image_shape: a 1-D tensor of shape [4] representing the input
image shape.
4) rpn_box_encodings: 3-D float tensor of shape
4) rpn_box_encodings: 3-D float32 tensor of shape
[batch_size, num_anchors, self._box_coder.code_size] containing
predicted boxes.
5) rpn_objectness_predictions_with_background: 3-D float tensor of shape
[batch_size, num_anchors, 2] containing class
predictions (logits) for each of the anchors. Note that this
tensor *includes* background class predictions (at class index 0).
5) rpn_objectness_predictions_with_background: 3-D float32 tensor of
shape [batch_size, num_anchors, 2] containing class predictions
(logits) for each of the anchors. Note that this tensor *includes*
background class predictions (at class index 0).
6) anchors: A 2-D tensor of shape [num_anchors, 4] representing anchors
for the first stage RPN (in absolute coordinates). Note that
`num_anchors` can differ depending on whether the model is created in
......@@ -875,7 +875,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
# The Faster R-CNN paper recommends pruning anchors that venture outside
# the image window at training time and clipping at inference time.
clip_window = tf.to_float(tf.stack([0, 0, image_shape[1], image_shape[2]]))
clip_window = tf.cast(tf.stack([0, 0, image_shape[1], image_shape[2]]),
dtype=tf.float32)
if self._is_training:
if self.clip_anchors_to_image:
anchors_boxlist = box_list_ops.clip_to_window(
......@@ -899,9 +900,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
'image_shape':
image_shape,
'rpn_box_encodings':
rpn_box_encodings,
tf.cast(rpn_box_encodings, dtype=tf.float32),
'rpn_objectness_predictions_with_background':
rpn_objectness_predictions_with_background,
tf.cast(rpn_objectness_predictions_with_background,
dtype=tf.float32),
'anchors':
anchors_boxlist.data['boxes'],
}
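Casting the RPN outputs to float32 at this boundary matters downstream: TensorFlow will not concatenate mixed bfloat16/float32 tensors, which is what later target assignment would otherwise attempt under mixed-precision TPU training. A toy reproduction:

import tensorflow as tf

a = tf.cast(tf.ones([2, 3]), tf.bfloat16)
b = tf.ones([2, 3], dtype=tf.float32)
# tf.concat([a, b], axis=0)  # raises: tensors must have the same dtype
c = tf.concat([tf.cast(a, tf.float32), b], axis=0)  # cast, then concat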
......@@ -954,13 +956,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
Returns:
prediction_dict: a dictionary holding "raw" prediction tensors:
1) refined_box_encodings: a 3-D tensor with shape
1) refined_box_encodings: a 3-D float32 tensor with shape
[total_num_proposals, num_classes, self._box_coder.code_size]
representing predicted (final) refined box encodings, where
total_num_proposals=batch_size*self._max_num_proposals. If using a
shared box across classes the shape will instead be
[total_num_proposals, 1, self._box_coder.code_size].
2) class_predictions_with_background: a 3-D tensor with shape
2) class_predictions_with_background: a 3-D float32 tensor with shape
[total_num_proposals, num_classes + 1] containing class
predictions (logits) for each of the anchors, where
total_num_proposals=batch_size*self._max_num_proposals.
......@@ -980,7 +982,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
boxes proposed by the RPN, thus enabling one to extract features and
get box classification and prediction for externally selected areas
of the image.
6) box_classifier_features: a 4-D float32 or bfloat16 tensor
6) box_classifier_features: a 4-D float32/bfloat16 tensor
representing the features for each proposal.
"""
proposal_boxes_normalized, num_proposals = self._proposal_postprocess(
......@@ -1008,13 +1010,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
Returns:
prediction_dict: a dictionary holding "raw" prediction tensors:
1) refined_box_encodings: a 3-D tensor with shape
1) refined_box_encodings: a 3-D float32 tensor with shape
[total_num_proposals, num_classes, self._box_coder.code_size]
representing predicted (final) refined box encodings, where
total_num_proposals=batch_size*self._max_num_proposals. If using a
shared box across classes the shape will instead be
[total_num_proposals, 1, self._box_coder.code_size].
2) class_predictions_with_background: a 3-D tensor with shape
2) class_predictions_with_background: a 3-D float32 tensor with shape
[total_num_proposals, num_classes + 1] containing class
predictions (logits) for each of the anchors, where
total_num_proposals=batch_size*self._max_num_proposals.
......@@ -1029,17 +1031,12 @@ class FasterRCNNMetaArch(model.DetectionModel):
boxes proposed by the RPN, thus enabling one to extract features and
get box classification and prediction for externally selected areas
of the image.
5) box_classifier_features: a 4-D float32 or bfloat16 tensor
5) box_classifier_features: a 4-D float32/bfloat16 tensor
representing the features for each proposal.
"""
# If mixed-precision training on TPU is enabled, the dtype of
# rpn_features_to_crop is bfloat16, otherwise it is float32. tf.cast is
# used to match the dtype of proposal_boxes_normalized to that of
# rpn_features_to_crop for further computation.
flattened_proposal_feature_maps = (
self._compute_second_stage_input_feature_maps(
rpn_features_to_crop,
tf.cast(proposal_boxes_normalized, rpn_features_to_crop.dtype)))
rpn_features_to_crop, proposal_boxes_normalized))
box_classifier_features = self._extract_box_classifier_features(
flattened_proposal_feature_maps)
......@@ -1066,9 +1063,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
proposal_boxes_normalized, image_shape, self._parallel_iterations)
prediction_dict = {
'refined_box_encodings': refined_box_encodings,
'refined_box_encodings': tf.cast(refined_box_encodings,
dtype=tf.float32),
'class_predictions_with_background':
class_predictions_with_background,
tf.cast(class_predictions_with_background, dtype=tf.float32),
'proposal_boxes': absolute_proposal_boxes,
'box_classifier_features': box_classifier_features,
'proposal_boxes_normalized': proposal_boxes_normalized,
......@@ -1215,7 +1213,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
_, num_classes, height, width = instance_masks.get_shape().as_list()
k = tf.shape(instance_masks)[0]
instance_masks = tf.reshape(instance_masks, [-1, height, width])
classes = tf.to_int32(tf.reshape(classes, [-1]))
classes = tf.cast(tf.reshape(classes, [-1]), dtype=tf.int32)
gather_idx = tf.range(k) * num_classes + classes
return tf.gather(instance_masks, gather_idx)
......@@ -1415,6 +1413,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
detections: a dictionary containing the following fields
detection_boxes: [batch, max_detections, 4]
detection_scores: [batch, max_detections]
detection_multiclass_scores: [batch, max_detections, 2]
detection_classes: [batch, max_detections]
(this entry is only created if rpn_mode=False)
num_detections: [batch]
......@@ -1427,7 +1426,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
with tf.name_scope('FirstStagePostprocessor'):
if self._number_of_stages == 1:
(proposal_boxes, proposal_scores, num_proposals, raw_proposal_boxes,
(proposal_boxes, proposal_scores, proposal_multiclass_scores,
num_proposals, raw_proposal_boxes,
raw_proposal_scores) = self._postprocess_rpn(
prediction_dict['rpn_box_encodings'],
prediction_dict['rpn_objectness_predictions_with_background'],
......@@ -1437,8 +1437,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
proposal_boxes,
fields.DetectionResultFields.detection_scores:
proposal_scores,
fields.DetectionResultFields.detection_multiclass_scores:
proposal_multiclass_scores,
fields.DetectionResultFields.num_detections:
tf.to_float(num_proposals),
tf.cast(num_proposals, dtype=tf.float32),
fields.DetectionResultFields.raw_detection_boxes:
raw_proposal_boxes,
fields.DetectionResultFields.raw_detection_scores:
......@@ -1545,6 +1547,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
proposal_scores: A float tensor with shape
[batch_size, max_num_proposals] representing the (potentially zero
padded) proposal objectness scores for all images in the batch.
proposal_multiclass_scores: A float tensor with shape
[batch_size, max_num_proposals, 2] representing the (potentially zero
padded) proposal multiclass scores for all images in the batch.
num_proposals: A Tensor of type `int32`. A 1-D tensor of shape [batch]
representing the number of proposals predicted for each image in
the batch.
......@@ -1566,10 +1571,12 @@ class FasterRCNNMetaArch(model.DetectionModel):
rpn_objectness_predictions_with_background_batch)
rpn_objectness_softmax_without_background = rpn_objectness_softmax[:, :, 1]
clip_window = self._compute_clip_window(image_shapes)
(proposal_boxes, proposal_scores, _, _, _,
additional_fields = {'multiclass_scores': rpn_objectness_softmax}
(proposal_boxes, proposal_scores, _, _, nmsed_additional_fields,
num_proposals) = self._first_stage_nms_fn(
tf.expand_dims(raw_proposal_boxes, axis=2),
tf.expand_dims(rpn_objectness_softmax_without_background, axis=2),
additional_fields=additional_fields,
clip_window=clip_window)
if self._is_training:
proposal_boxes = tf.stop_gradient(proposal_boxes)
......@@ -1596,7 +1603,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
normalize_boxes,
elems=[raw_proposal_boxes, image_shapes],
dtype=tf.float32)
return (normalized_proposal_boxes, proposal_scores, num_proposals,
proposal_multiclass_scores = nmsed_additional_fields['multiclass_scores']
return (normalized_proposal_boxes, proposal_scores,
proposal_multiclass_scores, num_proposals,
raw_normalized_proposal_boxes, rpn_objectness_softmax)
def _sample_box_classifier_batch(
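Threading the objectness softmax through NMS as an additional field amounts to gathering the side tensor with the same indices that select the surviving boxes, keeping it box-aligned. A simplified single-image sketch using the core TF op rather than the library's batched NMS:

import tensorflow as tf

boxes = tf.random.uniform([10, 4])
scores = tf.random.uniform([10])
multiclass_scores = tf.random.uniform([10, 2])  # e.g. [background, object]
keep = tf.image.non_max_suppression(boxes, scores, max_output_size=5)
kept_boxes = tf.gather(boxes, keep)
kept_multiclass = tf.gather(multiclass_scores, keep)  # stays box-aligned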
......@@ -1713,11 +1722,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
for i, boxes in enumerate(
self.groundtruth_lists(fields.BoxListFields.boxes))
]
groundtruth_classes_with_background_list = [
tf.to_float(
tf.pad(one_hot_encoding, [[0, 0], [1, 0]], mode='CONSTANT'))
for one_hot_encoding in self.groundtruth_lists(
fields.BoxListFields.classes)]
groundtruth_classes_with_background_list = []
for one_hot_encoding in self.groundtruth_lists(
fields.BoxListFields.classes):
groundtruth_classes_with_background_list.append(
tf.cast(
tf.pad(one_hot_encoding, [[0, 0], [1, 0]], mode='CONSTANT'),
dtype=tf.float32))
groundtruth_masks_list = self._groundtruth_lists.get(
fields.BoxListFields.masks)
......@@ -1860,6 +1871,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
A dictionary containing:
`detection_boxes`: [batch, max_detections, 4] in normalized coordinates.
`detection_scores`: [batch, max_detections]
`detection_multiclass_scores`: [batch, max_detections,
num_classes_with_background] tensor with the class score distribution for
post-processed detection boxes, including the background class if any.
`detection_classes`: [batch, max_detections]
`num_detections`: [batch]
`detection_masks`:
......@@ -1894,20 +1908,24 @@ class FasterRCNNMetaArch(model.DetectionModel):
clip_window = self._compute_clip_window(image_shapes)
mask_predictions_batch = None
if mask_predictions is not None:
mask_height = mask_predictions.shape[2].value
mask_width = mask_predictions.shape[3].value
mask_height = shape_utils.get_dim_as_int(mask_predictions.shape[2])
mask_width = shape_utils.get_dim_as_int(mask_predictions.shape[3])
mask_predictions = tf.sigmoid(mask_predictions)
mask_predictions_batch = tf.reshape(
mask_predictions, [-1, self.max_num_proposals,
self.num_classes, mask_height, mask_width])
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks, _,
num_detections) = self._second_stage_nms_fn(
additional_fields = {
'multiclass_scores': class_predictions_with_background_batch_normalized
}
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
nmsed_additional_fields, num_detections) = self._second_stage_nms_fn(
refined_decoded_boxes_batch,
class_predictions_batch,
clip_window=clip_window,
change_coordinate_frame=True,
num_valid_boxes=num_proposals,
additional_fields=additional_fields,
masks=mask_predictions_batch)
if refined_decoded_boxes_batch.shape[2] > 1:
class_ids = tf.expand_dims(
......@@ -1948,8 +1966,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
nmsed_scores,
fields.DetectionResultFields.detection_classes:
nmsed_classes,
fields.DetectionResultFields.detection_multiclass_scores:
nmsed_additional_fields['multiclass_scores'],
fields.DetectionResultFields.num_detections:
tf.to_float(num_detections),
tf.cast(num_detections, dtype=tf.float32),
fields.DetectionResultFields.raw_detection_boxes:
raw_normalized_detection_boxes,
fields.DetectionResultFields.raw_detection_scores:
......@@ -2096,18 +2116,18 @@ class FasterRCNNMetaArch(model.DetectionModel):
return self._first_stage_sampler.subsample(
tf.cast(cls_weights, tf.bool),
self._first_stage_minibatch_size, tf.cast(cls_targets, tf.bool))
batch_sampled_indices = tf.to_float(shape_utils.static_or_dynamic_map_fn(
batch_sampled_indices = tf.cast(shape_utils.static_or_dynamic_map_fn(
_minibatch_subsample_fn,
[batch_cls_targets, batch_cls_weights],
dtype=tf.bool,
parallel_iterations=self._parallel_iterations,
back_prop=True))
back_prop=True), dtype=tf.float32)
# Normalize by number of examples in sampled minibatch
normalizer = tf.maximum(
tf.reduce_sum(batch_sampled_indices, axis=1), 1.0)
batch_one_hot_targets = tf.one_hot(
tf.to_int32(batch_cls_targets), depth=2)
tf.cast(batch_cls_targets, dtype=tf.int32), depth=2)
sampled_reg_indices = tf.multiply(batch_sampled_indices,
batch_reg_weights)
......@@ -2133,8 +2153,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
name='localization_loss')
objectness_loss = tf.multiply(self._first_stage_obj_loss_weight,
objectness_loss, name='objectness_loss')
loss_dict = {localization_loss.op.name: localization_loss,
objectness_loss.op.name: objectness_loss}
loss_dict = {'Loss/RPNLoss/localization_loss': localization_loss,
'Loss/RPNLoss/objectness_loss': objectness_loss}
return loss_dict
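Hard-coding the loss keys (rather than reading op.name) presumably keeps metric names stable: in graph mode TensorFlow uniquifies repeated op names, so a rebuilt or re-entered graph can yield keys with trailing suffixes. A toy demonstration:

import tensorflow as tf

loss = tf.multiply(1.0, 2.0, name='objectness_loss')
loss_again = tf.multiply(1.0, 2.0, name='objectness_loss')
print(loss.op.name, loss_again.op.name)
# -> objectness_loss objectness_loss_1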
def _loss_box_classifier(self,
......@@ -2216,8 +2236,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
for proposal_boxes_single_image in tf.unstack(proposal_boxes)]
batch_size = len(proposal_boxlists)
num_proposals_or_one = tf.to_float(tf.expand_dims(
tf.maximum(num_proposals, tf.ones_like(num_proposals)), 1))
num_proposals_or_one = tf.cast(tf.expand_dims(
tf.maximum(num_proposals, tf.ones_like(num_proposals)), 1),
dtype=tf.float32)
normalizer = tf.tile(num_proposals_or_one,
[1, self.max_num_proposals]) * batch_size
......@@ -2276,9 +2297,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
ndims=2) / normalizer
second_stage_loc_loss = tf.reduce_sum(
second_stage_loc_losses * tf.to_float(paddings_indicator))
second_stage_loc_losses * tf.cast(paddings_indicator,
dtype=tf.float32))
second_stage_cls_loss = tf.reduce_sum(
second_stage_cls_losses * tf.to_float(paddings_indicator))
second_stage_cls_losses * tf.cast(paddings_indicator,
dtype=tf.float32))
if self._hard_example_miner:
(second_stage_loc_loss, second_stage_cls_loss
......@@ -2293,8 +2316,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
second_stage_cls_loss,
name='classification_loss')
loss_dict = {localization_loss.op.name: localization_loss,
classification_loss.op.name: classification_loss}
loss_dict = {'Loss/BoxClassifierLoss/localization_loss':
localization_loss,
'Loss/BoxClassifierLoss/classification_loss':
classification_loss}
second_stage_mask_loss = None
if prediction_masks is not None:
if groundtruth_masks_list is None:
......@@ -2332,8 +2357,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
prediction_masks_with_background,
tf.greater(one_hot_flat_cls_targets_with_background, 0))
mask_height = prediction_masks.shape[2].value
mask_width = prediction_masks.shape[3].value
mask_height = shape_utils.get_dim_as_int(prediction_masks.shape[2])
mask_width = shape_utils.get_dim_as_int(prediction_masks.shape[3])
reshaped_prediction_masks = tf.reshape(
prediction_masks_masked_by_class_targets,
[batch_size, -1, mask_height * mask_width])
......@@ -2364,7 +2389,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
[batch_size, -1, mask_height * mask_width])
mask_losses_weights = (
batch_mask_target_weights * tf.to_float(paddings_indicator))
batch_mask_target_weights * tf.cast(paddings_indicator,
dtype=tf.float32))
mask_losses = self._second_stage_mask_loss(
reshaped_prediction_masks,
batch_cropped_gt_mask,
......@@ -2419,7 +2445,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
for detection_boxes_single_image in tf.unstack(proposal_boxes)
]
paddings_indicator = self._padded_batched_proposals_indicator(
tf.to_int32(num_detections), detection_boxes.shape[1])
tf.cast(num_detections, dtype=tf.int32), detection_boxes.shape[1])
(batch_cls_targets_with_background, _, _, _,
_) = target_assigner.batch_assign_targets(
target_assigner=self._detector_target_assigner,
......