Commit 80444539 authored by Zhuoran Liu, committed by pkulzc

Add TPU SavedModel exporter and refactor OD code (#6737)

247226201  by ronnyvotel:

    Updating the visualization tools to accept unique_ids for color coding.

--
247067830  by Zhichao Lu:

    Add box_encodings_clip_range options for the convolutional box predictor (for TPU compatibility).

--
246888475  by Zhichao Lu:

    Remove unused _update_eval_steps function.

--
246163259  by lzc:

    Add a gather op that can handle ignore indices (which are "-1"s in this case).
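A minimal NumPy sketch of the described behavior (illustrative only; the helper name and the zero-fill convention are assumptions, not the actual op added in this change):

```python
import numpy as np

def gather_with_ignore(params, indices, ignore_value=-1):
  """Gathers rows of `params`; rows whose index equals -1 are zeroed."""
  indices = np.asarray(indices)
  ignored = indices == ignore_value
  safe = np.where(ignored, 0, indices)  # avoid out-of-bounds lookups
  out = params[safe]                    # fancy indexing returns a copy
  out[ignored] = 0                      # zero out the ignored rows
  return out
```

For example, `gather_with_ignore(np.array([[1., 2.], [3., 4.]]), [1, -1, 0])` yields the rows `[3, 4]`, `[0, 0]`, `[1, 2]`.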

--
246084944  by Zhichao Lu:

    Keras based implementation for SSD + MobilenetV2 + FPN.

--
245544227  by rathodv:

    Add batch_get_targets method to target assigner module to gather any groundtruth tensors based on the results of target assigner.

--
245540854  by rathodv:

    Update target assigner to return match tensor instead of a match object.

--
245434441  by Zhichao Lu:

    Add README for tpu_exporters package.

--
245381834  by lzc:

    Internal change.

--
245298983  by Zhichao Lu:

    Add conditional_shape_resizer to config_util

--
245134666  by Zhichao Lu:

    Adds ConditionalShapeResizer to the ImageResizer proto, which enables resizing only if the input image height or width is greater or smaller than a specified size. Also enables specification of the resize method in the resize_to_{max, min}_dimension methods.

--
245093975  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (faster-rcnn)

--
245072421  by Zhichao Lu:

    Adds a new image resizing method "resize_to_max_dimension" which resizes images only if a dimension is greater than the maximum desired value while maintaining aspect ratio.
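The rule can be sketched in plain Python (an illustrative helper with assumed rounding behavior, not the library implementation):

```python
def resize_to_max_dimension(height, width, max_dimension):
  """Returns (height, width): downscaled only if needed, aspect ratio kept."""
  largest = max(height, width)
  if largest <= max_dimension:
    return height, width  # already within bounds; no resize
  scale = max_dimension / float(largest)
  return int(round(height * scale)), int(round(width * scale))
```

For instance, an 800x1200 image with max_dimension=600 becomes 400x600, while a 300x400 image is left untouched.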

--
244946998  by lzc:

    Internal Changes.

--
244943693  by Zhichao Lu:

    Add a custom config to mobilenet v2 that makes it more detection friendly.

--
244754158  by derekjchow:

    Internal change.

--
244699875  by Zhichao Lu:

    Add check_range=False to box_list_ops.to_normalized_coordinates when training
    for instance segmentation.  This is consistent with other calls when training
    for object detection.  There could be wrongly annotated boxes in the dataset.

--
244507425  by rathodv:

    Support bfloat16 for ssd models.

--
244399982  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd)

--
244209387  by Zhichao Lu:

    Internal change.

--
243922296  by rathodv:

    Change `raw_detection_scores` to contain softmax/sigmoid scores (not logits) for `raw_detection_boxes`.

--
243883978  by Zhichao Lu:

    Add a sample fully conv config.

--
243369455  by Zhichao Lu:

    Fix regularization loss gap in Keras and Slim.

--
243292002  by lzc:

    Internal changes.

--
243097958  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd model)

--
243007177  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd model)

--
242776550  by Zhichao Lu:

    Make object detection pre-processing run on GPU.  tf.map_fn() uses
    TensorArrayV3 ops, which have no int32 GPU implementation.  Cast to int64,
    then cast back to int32.

--
242723128  by Zhichao Lu:

    Using sorted dictionaries for additional heads in non_max_suppression to ensure tensor order
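The fix boils down to iterating the additional heads in sorted key order so the packed tensor order is stable across runs; a minimal sketch (hypothetical helper, not the actual NMS code):

```python
def pack_heads(additional_fields):
  """Returns (keys, values) in a deterministic, sorted key order."""
  keys = sorted(additional_fields)
  return keys, [additional_fields[k] for k in keys]
```

Because plain dict iteration order is not guaranteed across Python versions, sorting the keys before packing guarantees the same tensor ordering on every run.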

--
242495311  by Zhichao Lu:

    Update documentation to reflect new TFLite examples repo location

--
242230527  by Zhichao Lu:

    Fix Dropout bugs for WeightSharedConvolutionalBoxPred.

--
242226573  by Zhichao Lu:

    Create Keras-based WeightSharedConvolutionalBoxPredictor.

--
241806074  by Zhichao Lu:

    Add inference in unit tests of TFX OD template.

--
241641498  by lzc:

    Internal change.

--
241637481  by Zhichao Lu:

    matmul_crop_and_resize(): Switch to dynamic shaping, so that not all dimensions are required to be known.

--
241429980  by Zhichao Lu:

    Internal change

--
241167237  by Zhichao Lu:

    Adds a faster_rcnn_inception_resnet_v2 Keras feature extractor, and updates the model builder to construct it.

--
241088616  by Zhichao Lu:

    Make it compatible with different dtype, e.g. float32, bfloat16, etc.

--
240897364  by lzc:

    Use image_np_expanded in object_detection_tutorial notebook.

--
240890393  by Zhichao Lu:

    Disable multicore inference for the OD template as it's not yet compatible.

--
240352168  by Zhichao Lu:

    Make SSDResnetV1FpnFeatureExtractor not protected to allow inheritance.

--
240351470  by lzc:

    Internal change.

--
239878928  by Zhichao Lu:

    Defines Keras box predictors for Faster RCNN and RFCN

--
239872103  by Zhichao Lu:

    Delete duplicated inputs in test.

--
239714273  by Zhichao Lu:

    Adding scope variable to all class heads

--
239698643  by Zhichao Lu:

    Create FPN feature extractor for object detection.

--
239696657  by Zhichao Lu:

    Internal Change.

--
239299404  by Zhichao Lu:

    Allows the faster rcnn meta-architecture to support Keras subcomponents

--
238502595  by Zhichao Lu:

    Lay the groundwork for symmetric quantization.

--
238496885  by Zhichao Lu:

    Add flexible_grid_anchor_generator

--
238138727  by lzc:

    Remove dead code.

    _USE_C_SHAPES has been forced True in TensorFlow releases since
    TensorFlow 1.9
    (https://github.com/tensorflow/tensorflow/commit/1d74a69443f741e69f9f52cb6bc2940b4d4ae3b7)

--
238123936  by rathodv:

    Add num_matched_groundtruth summary to target assigner in SSD.

--
238103345  by ronnyvotel:

    Raising error if input file pattern does not match any files.
    Also printing the number of evaluation images for coco metrics.

--
238044081  by Zhichao Lu:

    Fix docstring to state the correct dimensionality of `class_predictions_with_background`.

--
237920279  by Zhichao Lu:

    [XLA] Rework debug flags for dumping HLO.

    The following flags (usually passed via the XLA_FLAGS envvar) are removed:

      xla_dump_computations_to
      xla_dump_executions_to
      xla_dump_ir_to
      xla_dump_optimized_hlo_proto_to
      xla_dump_per_pass_hlo_proto_to
      xla_dump_unoptimized_hlo_proto_to
      xla_generate_hlo_graph
      xla_generate_hlo_text_to
      xla_hlo_dump_as_html
      xla_hlo_graph_path
      xla_log_hlo_text

    The following new flags are added:

      xla_dump_to
      xla_dump_hlo_module_re
      xla_dump_hlo_pass_re
      xla_dump_hlo_as_text
      xla_dump_hlo_as_proto
      xla_dump_hlo_as_dot
      xla_dump_hlo_as_url
      xla_dump_hlo_as_html
      xla_dump_ir
      xla_dump_hlo_snapshots

    The default is not to dump anything at all, but as soon as some dumping flag is
    specified, we enable the following defaults (most of which can be overridden).

     * dump to stdout (overridden by --xla_dump_to)
     * dump HLO modules at the very beginning and end of the optimization pipeline
     * don't dump between any HLO passes (overridden by --xla_dump_hlo_pass_re)
     * dump all HLO modules (overridden by --xla_dump_hlo_module_re)
     * dump in textual format (overridden by
       --xla_dump_hlo_as_{text,proto,dot,url,html}).

    For example, to dump optimized and unoptimized HLO text and protos to /tmp/foo,
    pass

      --xla_dump_to=/tmp/foo --xla_dump_hlo_as_text --xla_dump_hlo_as_proto

    For details on these flags' meanings, see xla.proto.

    The intent of this change is to make dumping both simpler to use and more
    powerful.

    For example:

     * Previously there was no way to dump the HLO module during the pass pipeline
       in HLO text format; the only option was --xla_dump_per_pass_hlo_proto_to,
       which dumped in proto format.

       Now this is --xla_dump_hlo_pass_re=.* --xla_dump_hlo_as_text.  (In fact, the
       second flag is not necessary in this case, as dumping as text is the
       default.)

     * Previously there was no way to dump HLO as a graph before and after
       compilation; the only option was --xla_generate_hlo_graph, which would dump
       before/after every pass.

       Now this is --xla_dump_hlo_as_{dot,url,html} (depending on what format you
       want the graph in).

     * Previously, there was no coordination between the filenames written by the
       various flags, so info about one module might be dumped with various
       filename prefixes.  Now the filenames are consistent and all dumps from a
       particular module are next to each other.

    If you only specify some of these flags, we try to figure out what you wanted.
    For example:

     * --xla_dump_to implies --xla_dump_hlo_as_text unless you specify some
       other --xla_dump_hlo_as_* flag.

     * --xla_dump_hlo_as_text or --xla_dump_ir implies dumping to stdout unless you
       specify a different --xla_dump_to directory.  You can explicitly dump to
       stdout with --xla_dump_to=-.

    As part of this change, I simplified the debugging code in the HLO passes for
    dumping HLO modules.  Previously, many tests explicitly VLOG'ed the HLO module
    before, after, and sometimes during the pass.  I removed these VLOGs.  If you
    want dumps before/during/after an HLO pass, use --xla_dump_hlo_pass_re=<pass_name>.

--
237510043  by lzc:

    Internal Change.

--
237469515  by Zhichao Lu:

    Parameterize model_builder.build in inputs.py.

--
237293511  by rathodv:

    Remove multiclass_scores from tensor_dict in transform_data_fn always.

--
237260333  by ronnyvotel:

    Updating faster_rcnn_meta_arch to define prediction dictionary fields that are batched.

--

PiperOrigin-RevId: 247226201
parent c4f34e58
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test for object detection's TPU exporter."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from object_detection.tpu_exporters import export_saved_model_tpu_lib
flags = tf.app.flags
FLAGS = flags.FLAGS
def get_path(path_suffix):
return os.path.join(tf.resource_loader.get_data_files_path(), 'testdata',
path_suffix)
class ExportSavedModelTPUTest(tf.test.TestCase, parameterized.TestCase):
@parameterized.named_parameters(
('ssd', get_path('ssd/ssd_pipeline.config'), 'image_tensor', True, 20),
('faster_rcnn',
get_path('faster_rcnn/faster_rcnn_resnet101_atrous_coco.config'),
'image_tensor', True, 20))
def testExportAndLoad(self,
pipeline_config_file,
input_type='image_tensor',
use_bfloat16=False,
repeat=1):
input_placeholder_name = 'placeholder_tensor'
export_dir = os.path.join(FLAGS.test_tmpdir, 'tpu_saved_model')
if tf.gfile.Exists(export_dir):
tf.gfile.DeleteRecursively(export_dir)
ckpt_path = None
export_saved_model_tpu_lib.export(pipeline_config_file, ckpt_path,
export_dir, input_placeholder_name,
input_type, use_bfloat16)
inputs = np.random.rand(256, 256, 3)
tensor_dict_out = export_saved_model_tpu_lib.run_inference_from_saved_model(
inputs, export_dir, input_placeholder_name, repeat)
for k, v in tensor_dict_out.items():
tf.logging.info('{}: {}'.format(k, v))
if __name__ == '__main__':
tf.test.main()
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Python library for faster_rcnn model, tailored for TPU inference."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# pylint: disable=protected-access
import tensorflow as tf
# pylint: disable=g-import-not-at-top
# Checking TF version, because this module relies on TPUPartitionedCall
# in tensorflow.python.tpu, which is not available until TF r1.14.
major, minor, _ = tf.__version__.split('.') # pylint: disable=protected-access
if int(major) < 1 or (int(major == 1) and int(minor) < 14):
raise RuntimeError(
'TensorFlow version >= 1.14 is required. Found ({}).'.format(
tf.__version__))
from tensorflow.python.framework import function
from tensorflow.python.tpu import functional as tpu_functional
from tensorflow.python.tpu.ops import tpu_ops
from object_detection import exporter
from object_detection.builders import model_builder
from object_detection.tpu_exporters import utils

ANCHORS = 'anchors'
BOX_CLASSIFIER_FEATURES = 'box_classifier_features'
BOX_ENCODINGS = 'box_encodings'
CLASS_PREDICTIONS_WITH_BACKGROUND = 'class_predictions_with_background'
IMAGE_SHAPE = 'image_shape'
NUM_PROPOSALS = 'num_proposals'
PROPOSAL_BOXES = 'proposal_boxes'
PROPOSAL_BOXES_NORMALIZED = 'proposal_boxes_normalized'
REFINED_BOX_ENCODINGS = 'refined_box_encodings'
RPN_BOX_ENCODINGS = 'rpn_box_encodings'
RPN_BOX_PREDICTOR_FEATURES = 'rpn_box_predictor_features'
RPN_FEATURES_TO_CROP = 'rpn_features_to_crop'
RPN_OBJECTNESS_PREDICTIONS_WITH_BACKGROUND = (
    'rpn_objectness_predictions_with_background')


def modify_config(pipeline_config):
  """Modifies pipeline config to build the correct graph for TPU."""
  # faster_rcnn.use_static_shapes and faster_rcnn.use_static_shapes_for_eval
  # are set to True in order for detection_model.use_static_shapes to be True.
  # We need to set this so that clip_to_window in _predict_first_stage
  # can work on TPU. However, as a side effect, the flag forces the use of
  # the padded version of NMS.
  pipeline_config.model.faster_rcnn.use_static_shapes = True
  pipeline_config.model.faster_rcnn.use_static_shapes_for_eval = True
  pipeline_config.model.faster_rcnn.use_matmul_crop_and_resize = True
  pipeline_config.model.faster_rcnn.clip_anchors_to_image = True
  return pipeline_config


def get_prediction_tensor_shapes(pipeline_config):
  """Gets static shapes of tensors by building the graph on CPU.

  This function builds the graph on CPU and obtains static shapes of output
  tensors from TPUPartitionedCall. The shape information is later used to set
  the shapes of tensors when the TPU graph is built; this is necessary because
  tensors coming out of TPUPartitionedCall lose their shape information, which
  many downstream CPU operations need.

  Args:
    pipeline_config: A TrainEvalPipelineConfig proto.

  Returns:
    A python dict of tensors' names and their shapes.
  """
  pipeline_config = modify_config(pipeline_config)
  detection_model = model_builder.build(
      pipeline_config.model, is_training=False)
  _, input_tensors = exporter.input_placeholder_fn_map['image_tensor']()
  inputs = tf.to_float(input_tensors)
  preprocessed_inputs, true_image_shapes = detection_model.preprocess(inputs)
  prediction_dict = detection_model.predict(preprocessed_inputs,
                                            true_image_shapes)
  shapes_info = {k: v.shape.as_list() for k, v in prediction_dict.items()}
  return shapes_info


def build_graph(pipeline_config,
                shapes_info,
                input_type='encoded_image_string_tensor',
                use_bfloat16=True):
  """Builds serving graph of faster_rcnn to be exported.

  Args:
    pipeline_config: A TrainEvalPipelineConfig proto.
    shapes_info: A python dict of tensors' names and their shapes, returned by
      `get_prediction_tensor_shapes()`.
    input_type: One of
      'encoded_image_string_tensor': a 1d tensor with dtype=tf.string
      'image_tensor': a 4d tensor with dtype=tf.uint8
      'tf_example': a 1d tensor with dtype=tf.string
    use_bfloat16: If true, use tf.bfloat16 on TPU.

  Returns:
    placeholder_tensor: A placeholder tensor, type determined by `input_type`.
    result_tensor_dict: A python dict of tensors' names and tensors.
  """
  pipeline_config = modify_config(pipeline_config)
  detection_model = model_builder.build(
      pipeline_config.model, is_training=False)
  placeholder_tensor, input_tensors = \
      exporter.input_placeholder_fn_map[input_type]()

  # CPU pre-processing
  inputs = tf.to_float(input_tensors)
  preprocessed_inputs, true_image_shapes = detection_model.preprocess(inputs)
  # Dimshuffle: [b, h, w, c] -> [b, c, h, w]
  preprocessed_inputs = tf.transpose(preprocessed_inputs, perm=[0, 3, 1, 2])
  if use_bfloat16:
    preprocessed_inputs = tf.cast(preprocessed_inputs, dtype=tf.bfloat16)

  # TPU feature extraction
  def tpu_subgraph_first_stage_fn(preprocessed_inputs):
    """Defines the first part of graph on TPU."""
    # [b, c, h, w] -> [b, h, w, c]
    preprocessed_inputs = tf.transpose(preprocessed_inputs, perm=[0, 2, 3, 1])
    prediction_dict = detection_model._predict_first_stage(preprocessed_inputs)
    # [b, h, w, c] -> [b, c, h, w]
    rpn_box_predictor_features = tf.transpose(
        prediction_dict[RPN_BOX_PREDICTOR_FEATURES], perm=[0, 3, 1, 2])
    # [b, h, w, c] -> [b, c, h, w]
    rpn_features_to_crop = tf.transpose(
        prediction_dict[RPN_FEATURES_TO_CROP], perm=[0, 3, 1, 2])
    # [batch, anchor, depth] -> [depth, batch, anchor]
    rpn_box_encodings = tf.transpose(
        prediction_dict[RPN_BOX_ENCODINGS], perm=[2, 0, 1])
    # [batch, anchor, depth] -> [depth, batch, anchor]
    rpn_objectness_predictions_with_background = tf.transpose(
        prediction_dict[RPN_OBJECTNESS_PREDICTIONS_WITH_BACKGROUND],
        perm=[2, 0, 1])
    # [anchors, depth]
    anchors = tf.transpose(prediction_dict[ANCHORS], perm=[1, 0])
    return (rpn_box_predictor_features, rpn_features_to_crop,
            prediction_dict['image_shape'], rpn_box_encodings,
            rpn_objectness_predictions_with_background, anchors)

  @function.Defun(capture_resource_var_by_value=False)
  def tpu_subgraph_first_stage():
    if use_bfloat16:
      with tf.contrib.tpu.bfloat16_scope():
        return tf.contrib.tpu.rewrite(tpu_subgraph_first_stage_fn,
                                      [preprocessed_inputs])
    else:
      return tf.contrib.tpu.rewrite(tpu_subgraph_first_stage_fn,
                                    [preprocessed_inputs])

  (rpn_box_predictor_features, rpn_features_to_crop, image_shape,
   rpn_box_encodings, rpn_objectness_predictions_with_background,
   anchors) = tpu_functional.TPUPartitionedCall(
       args=tpu_subgraph_first_stage.captured_inputs,
       device_ordinal=tpu_ops.tpu_ordinal_selector(),
       Tout=[
           o.type
           for o in tpu_subgraph_first_stage.definition.signature.output_arg
       ],
       f=tpu_subgraph_first_stage)

  prediction_dict = {
      RPN_BOX_PREDICTOR_FEATURES:
          tf.transpose(rpn_box_predictor_features, perm=[0, 2, 3, 1]),
      RPN_FEATURES_TO_CROP:
          tf.transpose(rpn_features_to_crop, perm=[0, 2, 3, 1]),
      IMAGE_SHAPE:
          image_shape,
      RPN_BOX_ENCODINGS:
          tf.transpose(rpn_box_encodings, perm=[1, 2, 0]),
      RPN_OBJECTNESS_PREDICTIONS_WITH_BACKGROUND:
          tf.transpose(
              rpn_objectness_predictions_with_background, perm=[1, 2, 0]),
      ANCHORS:
          tf.transpose(anchors, perm=[1, 0]),
  }
  for k in prediction_dict:
    prediction_dict[k].set_shape(shapes_info[k])

  if use_bfloat16:
    prediction_dict = utils.bfloat16_to_float32_nested(prediction_dict)

  # CPU region proposal (NMS)
  proposal_boxes_normalized, num_proposals = \
      detection_model._proposal_postprocess(
          tf.cast(prediction_dict[RPN_BOX_ENCODINGS], dtype=tf.float32),
          tf.cast(
              prediction_dict[RPN_OBJECTNESS_PREDICTIONS_WITH_BACKGROUND],
              dtype=tf.float32), prediction_dict[ANCHORS],
          prediction_dict[IMAGE_SHAPE], true_image_shapes)
  prediction_dict[NUM_PROPOSALS] = num_proposals

  # [b, h, w, c] -> [b, c, h, w]
  prediction_dict[RPN_FEATURES_TO_CROP] = tf.transpose(
      prediction_dict[RPN_FEATURES_TO_CROP], perm=[0, 3, 1, 2])
  if use_bfloat16:
    prediction_dict[RPN_FEATURES_TO_CROP] = tf.cast(
        prediction_dict[RPN_FEATURES_TO_CROP], dtype=tf.bfloat16)
    proposal_boxes_normalized = tf.cast(
        proposal_boxes_normalized, dtype=tf.bfloat16)

  # TPU box prediction
  def tpu_subgraph_second_stage_fn(rpn_features_to_crop,
                                   proposal_boxes_normalized, image_shape):
    """Defines the second part of graph on TPU."""
    rpn_features_to_crop = tf.transpose(rpn_features_to_crop, perm=[0, 2, 3, 1])
    output_dict = detection_model._box_prediction(
        rpn_features_to_crop, proposal_boxes_normalized, image_shape)
    return [
        output_dict[REFINED_BOX_ENCODINGS],
        output_dict[CLASS_PREDICTIONS_WITH_BACKGROUND],
        output_dict[PROPOSAL_BOXES], output_dict[BOX_CLASSIFIER_FEATURES]
    ]

  @function.Defun(capture_resource_var_by_value=False)
  def tpu_subgraph_second_stage():
    """TPU subgraph 2 wrapper."""
    if use_bfloat16:
      with tf.contrib.tpu.bfloat16_scope():
        return tf.contrib.tpu.rewrite(tpu_subgraph_second_stage_fn, [
            prediction_dict[RPN_FEATURES_TO_CROP],
            proposal_boxes_normalized,
            prediction_dict[IMAGE_SHAPE],
        ])
    else:
      return tf.contrib.tpu.rewrite(tpu_subgraph_second_stage_fn, [
          prediction_dict[RPN_FEATURES_TO_CROP],
          proposal_boxes_normalized,
          prediction_dict[IMAGE_SHAPE],
      ])

  (refined_box_encodings, class_predictions_with_background, proposal_boxes,
   box_classifier_features) = tpu_functional.TPUPartitionedCall(
       args=tpu_subgraph_second_stage.captured_inputs,
       device_ordinal=tpu_ops.tpu_ordinal_selector(),
       Tout=[
           o.type
           for o in tpu_subgraph_second_stage.definition.signature.output_arg
       ],
       f=tpu_subgraph_second_stage)

  prediction_dict[RPN_FEATURES_TO_CROP] = tf.transpose(
      prediction_dict[RPN_FEATURES_TO_CROP], perm=[0, 2, 3, 1])

  prediction_dict_updater = {
      REFINED_BOX_ENCODINGS: refined_box_encodings,
      CLASS_PREDICTIONS_WITH_BACKGROUND: class_predictions_with_background,
      PROPOSAL_BOXES: proposal_boxes,
      BOX_CLASSIFIER_FEATURES: box_classifier_features,
      PROPOSAL_BOXES_NORMALIZED: proposal_boxes_normalized,
  }
  for k in prediction_dict_updater:
    prediction_dict_updater[k].set_shape(shapes_info[k])
  prediction_dict.update(prediction_dict_updater)

  if use_bfloat16:
    prediction_dict = utils.bfloat16_to_float32_nested(prediction_dict)

  # CPU post-processing (NMS)
  postprocessed_tensors = detection_model.postprocess(prediction_dict,
                                                      true_image_shapes)
  result_tensor_dict = exporter.add_output_tensor_nodes(postprocessed_tensors,
                                                        'inference_op')
  return placeholder_tensor, result_tensor_dict
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Python library for ssd model, tailored for TPU inference."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
# pylint: disable=g-import-not-at-top
# Checking TF version, because this module relies on TPUPartitionedCall
# in tensorflow.python.tpu, which is not available until TF r1.14.
major, minor, _ = tf.__version__.split('.') # pylint: disable=protected-access
if int(major) < 1 or (int(major == 1) and int(minor) < 14):
raise RuntimeError(
'TensorFlow version >= 1.14 is required. Found ({}).'.format(
tf.__version__)) # pylint: disable=protected-access
from tensorflow.python.framework import function
from tensorflow.python.tpu import functional as tpu_functional
from tensorflow.python.tpu.ops import tpu_ops
from object_detection import exporter
from object_detection.builders import model_builder
from object_detection.tpu_exporters import utils
ANCHORS = 'anchors'
BOX_ENCODINGS = 'box_encodings'
CLASS_PREDICTIONS_WITH_BACKGROUND = 'class_predictions_with_background'


def get_prediction_tensor_shapes(pipeline_config):
  """Gets static shapes of tensors by building the graph on CPU.

  This function builds the graph on CPU and obtains static shapes of output
  tensors from TPUPartitionedCall. The shape information is later used to set
  the shapes of tensors when the TPU graph is built; this is necessary because
  tensors coming out of TPUPartitionedCall lose their shape information, which
  many downstream CPU operations need.

  Args:
    pipeline_config: A TrainEvalPipelineConfig proto.

  Returns:
    A python dict of tensors' names and their shapes.
  """
  detection_model = model_builder.build(
      pipeline_config.model, is_training=False)
  _, input_tensors = exporter.input_placeholder_fn_map['image_tensor']()
  inputs = tf.to_float(input_tensors)
  preprocessed_inputs, true_image_shapes = detection_model.preprocess(inputs)
  prediction_dict = detection_model.predict(preprocessed_inputs,
                                            true_image_shapes)
  return {
      BOX_ENCODINGS:
          prediction_dict[BOX_ENCODINGS].shape.as_list(),
      CLASS_PREDICTIONS_WITH_BACKGROUND:
          prediction_dict[CLASS_PREDICTIONS_WITH_BACKGROUND].shape.as_list(),
      ANCHORS:
          prediction_dict[ANCHORS].shape.as_list(),
  }


def recover_shape(preprocessed_inputs, prediction_outputs, shapes_info):
  """Recovers shape from TPUPartitionedCall.

  Args:
    preprocessed_inputs: 4D tensor, shaped (batch, channels, height, width)
    prediction_outputs: Python list of tensors, in the following order -
      box_encodings - 3D tensor, shaped (code_size, batch, num_anchors);
      class_predictions_with_background - 3D tensor, shaped (num_classes + 1,
      batch, num_anchors); anchors - 2D tensor, shaped (4, num_anchors)
    shapes_info: Python dict of tensor shapes as lists.

  Returns:
    preprocessed_inputs: 4D tensor, shaped (batch, height, width, channels)
    box_encodings: 3D tensor, shaped (batch, num_anchors, code_size)
    class_predictions_with_background: 3D tensor,
      shaped (batch, num_anchors, num_classes + 1)
    anchors: 2D tensor, shaped (num_anchors, 4)
  """
  # Dimshuffle: (b, c, h, w) -> (b, h, w, c)
  preprocessed_inputs = tf.transpose(preprocessed_inputs, perm=[0, 2, 3, 1])

  box_encodings = tf.transpose(prediction_outputs[0], perm=[1, 2, 0])
  # [None, None, detection_model._box_coder.code_size]
  box_encodings.set_shape(shapes_info[BOX_ENCODINGS])

  class_predictions_with_background = tf.transpose(
      prediction_outputs[1], perm=[1, 2, 0])
  # [None, None, num_classes + 1]
  class_predictions_with_background.set_shape(
      shapes_info[CLASS_PREDICTIONS_WITH_BACKGROUND])

  anchors = tf.transpose(prediction_outputs[2], perm=[1, 0])
  # [None, 4]
  anchors.set_shape(shapes_info[ANCHORS])

  return (preprocessed_inputs, box_encodings, class_predictions_with_background,
          anchors)


def build_graph(pipeline_config,
                shapes_info,
                input_type='encoded_image_string_tensor',
                use_bfloat16=False):
  """Builds TPU serving graph of ssd to be exported.

  Args:
    pipeline_config: A TrainEvalPipelineConfig proto.
    shapes_info: A python dict of tensors' names and their shapes, returned by
      `get_prediction_tensor_shapes()`.
    input_type: One of
      'encoded_image_string_tensor': a 1d tensor with dtype=tf.string
      'image_tensor': a 4d tensor with dtype=tf.uint8
      'tf_example': a 1d tensor with dtype=tf.string
    use_bfloat16: If true, use tf.bfloat16 on TPU.

  Returns:
    placeholder_tensor: A placeholder tensor, type determined by `input_type`.
    result_tensor_dict: A python dict of tensors' names and tensors.
  """
  detection_model = model_builder.build(
      pipeline_config.model, is_training=False)
  placeholder_tensor, input_tensors = \
      exporter.input_placeholder_fn_map[input_type]()
  inputs = tf.to_float(input_tensors)
  preprocessed_inputs, true_image_shapes = detection_model.preprocess(inputs)

  # Dimshuffle: (b, h, w, c) -> (b, c, h, w)
  # This is to avoid extra padding due to TPU memory layout:
  # We swap larger dimensions in and smaller dimensions out, so that small
  # dimensions don't get padded tens / hundreds of times their own size.
  # This trick is applied to other similar tensors below.
  preprocessed_inputs = tf.transpose(preprocessed_inputs, perm=[0, 3, 1, 2])
  if use_bfloat16:
    preprocessed_inputs = tf.cast(preprocessed_inputs, dtype=tf.bfloat16)

  def predict_tpu_subgraph(preprocessed_inputs, true_image_shapes):
    """Wraps over the CPU version of `predict()`.

    This builds the same graph as the original `predict()`, manipulates
    result tensors' dimensions to be memory efficient on TPU, and
    returns them as a list of tensors.

    Args:
      preprocessed_inputs: A 4D tensor of shape (batch, channels, height, width)
      true_image_shapes: True image shapes tensor.

    Returns:
      A Python list of tensors:
        box_encodings: 3D tensor of shape (code_size, batch_size, num_anchors)
        class_predictions_with_background: 3D tensor,
          shape (num_classes + 1, batch_size, num_anchors)
        anchors: 2D tensor of shape (4, num_anchors)
    """
    # Dimshuffle: (b, c, h, w) -> (b, h, w, c)
    preprocessed_inputs = tf.transpose(preprocessed_inputs, perm=[0, 2, 3, 1])
    if use_bfloat16:
      with tf.contrib.tpu.bfloat16_scope():
        prediction_dict = detection_model.predict(preprocessed_inputs,
                                                  true_image_shapes)
    else:
      prediction_dict = detection_model.predict(preprocessed_inputs,
                                                true_image_shapes)
    # Dimshuffle: (batch, anchors, depth) -> (depth, batch, anchors)
    return [
        tf.transpose(prediction_dict[BOX_ENCODINGS], perm=[2, 0, 1]),
        tf.transpose(
            prediction_dict[CLASS_PREDICTIONS_WITH_BACKGROUND], perm=[2, 0, 1]),
        tf.transpose(prediction_dict[ANCHORS], perm=[1, 0]),
    ]

  @function.Defun(capture_resource_var_by_value=False)
  def predict_tpu():
    return tf.contrib.tpu.rewrite(predict_tpu_subgraph,
                                  [preprocessed_inputs, true_image_shapes])

  prediction_outputs = tpu_functional.TPUPartitionedCall(
      args=predict_tpu.captured_inputs,
      device_ordinal=tpu_ops.tpu_ordinal_selector(),
      Tout=[o.type for o in predict_tpu.definition.signature.output_arg],
      f=predict_tpu)

  (preprocessed_inputs, box_encodings, class_predictions_with_background,
   anchors) = recover_shape(preprocessed_inputs, prediction_outputs,
                            shapes_info)

  output_tensors = {
      'preprocessed_inputs': preprocessed_inputs,
      BOX_ENCODINGS: box_encodings,
      CLASS_PREDICTIONS_WITH_BACKGROUND: class_predictions_with_background,
      ANCHORS: anchors,
  }
  if use_bfloat16:
    output_tensors = utils.bfloat16_to_float32_nested(output_tensors)
  postprocessed_tensors = detection_model.postprocess(output_tensors,
                                                      true_image_shapes)
  result_tensor_dict = exporter.add_output_tensor_nodes(postprocessed_tensors,
                                                        'inference_op')
  return placeholder_tensor, result_tensor_dict
# Faster R-CNN with Resnet-101 (v1), Atrous version
# Trained on COCO, initialized from Imagenet classification checkpoint
model {
  faster_rcnn {
    num_classes: 90
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}
model {
  ssd {
    num_classes: 2
    image_resizer {
      fixed_shape_resizer {
        height: 1280
        width: 1280
      }
    }
    feature_extractor {
      type: "ssd_resnet50_v1_fpn"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 0.000399999989895
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.0299999993294
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.996999979019
          center: true
          scale: true
          epsilon: 0.0010000000475
          train: true
        }
      }
      override_base_feature_extractor_hyperparams: true
      fpn {
        min_level: 2
      }
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 0.000399999989895
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.00999999977648
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.996999979019
            scale: true
            epsilon: 0.0010000000475
          }
        }
        depth: 256
        num_layers_before_predictor: 4
        kernel_size: 3
        class_prediction_bias_init: -4.59999990463
      }
    }
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 2
        max_level: 7
        anchor_scale: 3.0
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        scales_per_octave: 2
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 9.99999993923e-09
        iou_threshold: 0.600000023842
        max_detections_per_class: 300
        max_total_detections: 600
        use_static_shapes: true
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.25
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Utilities for TPU inference."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf


def bfloat16_to_float32(tensor):
  """Converts a tensor to tf.float32 only if it is tf.bfloat16."""
  if tensor.dtype == tf.bfloat16:
    return tf.cast(tensor, dtype=tf.float32)
  else:
    return tensor


def bfloat16_to_float32_nested(bfloat16_tensor_dict):
  """Converts bfloat16 tensors in a nested structure to float32.

  Other tensors not of dtype bfloat16 will be left as is.

  Args:
    bfloat16_tensor_dict: A Python dict, values being Tensor or Python
      list/tuple of Tensor.

  Returns:
    A Python dict with the same structure as `bfloat16_tensor_dict`,
    with all bfloat16 tensors converted to float32.
  """
  float32_tensor_dict = {}
  for k, v in bfloat16_tensor_dict.items():
    if isinstance(v, tf.Tensor):
      float32_tensor_dict[k] = bfloat16_to_float32(v)
    elif isinstance(v, (list, tuple)):
      float32_tensor_dict[k] = [bfloat16_to_float32(t) for t in v]
  return float32_tensor_dict
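The same dtype-upcasting walk is easy to exercise outside TensorFlow. Below is a minimal NumPy sketch of the idea, using `float16` as a stand-in for bfloat16 (plain NumPy has no native bfloat16 dtype); the function name is hypothetical, not part of the package:

```python
import numpy as np


def low_precision_to_float32_nested(nested):
  """Upcasts float16 arrays (standing in for bfloat16) to float32.

  Handles a dict whose values are arrays or lists/tuples of arrays,
  mirroring the structure handled by bfloat16_to_float32_nested.
  Other dtypes are left untouched.
  """
  def _convert(arr):
    return arr.astype(np.float32) if arr.dtype == np.float16 else arr

  result = {}
  for key, value in nested.items():
    if isinstance(value, np.ndarray):
      result[key] = _convert(value)
    elif isinstance(value, (list, tuple)):
      result[key] = [_convert(v) for v in value]
  return result


tensors = {
    'a': np.ones([2, 3], dtype=np.float16),
    'b': [np.ones([1], dtype=np.float16), np.ones([1], dtype=np.int32)],
}
converted = low_precision_to_float32_nested(tensors)
```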
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test for Utility functions."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from object_detection.tpu_exporters import utils


class UtilsTest(tf.test.TestCase):

  def testBfloat16ToFloat32(self):
    bfloat16_tensor = tf.random.uniform([2, 3], dtype=tf.bfloat16)
    float32_tensor = utils.bfloat16_to_float32(bfloat16_tensor)
    self.assertEqual(float32_tensor.dtype, tf.float32)

  def testOtherDtypesNotConverted(self):
    int32_tensor = tf.ones([2, 3], dtype=tf.int32)
    converted_tensor = utils.bfloat16_to_float32(int32_tensor)
    self.assertEqual(converted_tensor.dtype, tf.int32)

  def testBfloat16ToFloat32Nested(self):
    tensor_dict = {
        'key1': tf.random.uniform([2, 3], dtype=tf.bfloat16),
        'key2': [
            tf.random.uniform([1, 2], dtype=tf.bfloat16) for _ in range(3)
        ],
        'key3': tf.ones([2, 3], dtype=tf.int32),
    }
    tensor_dict = utils.bfloat16_to_float32_nested(tensor_dict)

    self.assertEqual(tensor_dict['key1'].dtype, tf.float32)
    for t in tensor_dict['key2']:
      self.assertEqual(t.dtype, tf.float32)
    self.assertEqual(tensor_dict['key3'].dtype, tf.int32)


if __name__ == '__main__':
  tf.test.main()
@@ -73,7 +73,9 @@ def get_spatial_image_size(image_resizer_config):
       return [image_resizer_config.keep_aspect_ratio_resizer.max_dimension] * 2
     else:
       return [-1, -1]
-  if image_resizer_config.HasField("identity_resizer"):
+  if image_resizer_config.HasField(
+      "identity_resizer") or image_resizer_config.HasField(
+          "conditional_shape_resizer"):
     return [-1, -1]
   raise ValueError("Unknown image resizer type.")
@@ -856,11 +858,6 @@ def _update_train_steps(configs, train_steps):
   configs["train_config"].num_steps = int(train_steps)


-def _update_eval_steps(configs, eval_steps):
-  """Updates `configs` to reflect new number of eval steps per evaluation."""
-  configs["eval_config"].num_examples = int(eval_steps)
-
-
 def _update_all_eval_input_configs(configs, field, value):
   """Updates the content of `field` with `value` for all eval input configs."""
   for eval_input_config in configs["eval_input_configs"]:
@@ -612,6 +612,12 @@ class ConfigUtilTest(tf.test.TestCase):
     image_shape = config_util.get_spatial_image_size(image_resizer_config)
     self.assertAllEqual(image_shape, [-1, -1])

+  def testGetSpatialImageSizeFromConditionalShapeResizer(self):
+    image_resizer_config = image_resizer_pb2.ImageResizer()
+    image_resizer_config.conditional_shape_resizer.size_threshold = 100
+    image_shape = config_util.get_spatial_image_size(image_resizer_config)
+    self.assertAllEqual(image_shape, [-1, -1])
+
   def testEvalShuffle(self):
     """Tests that `eval_shuffle` keyword arguments are applied correctly."""
     original_shuffle = True
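The resize-only-when-needed behavior described in the commit log (rescale an image only when a dimension exceeds the maximum, preserving aspect ratio) boils down to simple shape arithmetic. A sketch with a hypothetical helper, `conditional_resize_shape`, which is not part of the API:

```python
def conditional_resize_shape(height, width, max_dimension):
  """Returns the (height, width) a resize-to-max-dimension style resizer
  would produce. Hypothetical helper mirroring the described semantics."""
  largest_side = max(height, width)
  # Leave the image alone when it already fits within the threshold.
  if largest_side <= max_dimension:
    return height, width
  # Otherwise scale the larger side down to max_dimension, keeping the
  # aspect ratio.
  scale = max_dimension / largest_side
  return int(round(height * scale)), int(round(width * scale))


small = conditional_resize_shape(600, 800, max_dimension=1024)
large = conditional_resize_shape(1200, 1600, max_dimension=1024)
```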
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Utility functions for manipulating Keras models."""

import tensorflow as tf


def extract_submodel(model, inputs, outputs, name=None):
  """Extracts a section of a Keras model into a new model.

  This method walks an existing model from the specified outputs back to the
  specified inputs in order to construct a new model containing only a portion
  of the old model, while sharing the layers and weights with the original
  model.

  WARNING: This method does not work for submodels containing layers that have
  been used multiple times in the original model, or in other models beyond
  the original model. (E.g. does not work for submodels that contain layers that
  use shared weights). This also means that multiple overlapping submodels
  cannot be extracted from the same model.

  It also relies on recursion and will hit python's recursion limit for large
  submodels.

  Args:
    model: The existing Keras model this method extracts a submodel from.
    inputs: The layer inputs in the existing model that start the submodel
    outputs: The layer outputs in the existing model that should be output by
      the submodel
    name: The name for the extracted model

  Returns:
    The extracted submodel specified by the given inputs and outputs
  """
  output_to_layer = {}
  output_to_layer_input = {}
  for layer in model.layers:
    layer_output = layer.output
    layer_inputs = layer.input
    output_to_layer[layer_output] = layer
    output_to_layer_input[layer_output] = layer_inputs

  model_inputs_dict = {}
  memoized_results = {}

  # Relies on recursion, very low limit in python
  def _recurse_in_model(tensor):
    """Walk the existing model recursively to copy a submodel."""
    if tensor in memoized_results:
      return memoized_results[tensor]
    if (tensor == inputs) or (isinstance(inputs, list) and tensor in inputs):
      if tensor not in model_inputs_dict:
        model_inputs_dict[tensor] = tf.keras.layers.Input(tensor=tensor)
      out = model_inputs_dict[tensor]
    else:
      cur_inputs = output_to_layer_input[tensor]
      cur_layer = output_to_layer[tensor]
      if isinstance(cur_inputs, list):
        out = cur_layer([_recurse_in_model(inp) for inp in cur_inputs])
      else:
        out = cur_layer(_recurse_in_model(cur_inputs))
    memoized_results[tensor] = out
    return out

  if isinstance(outputs, list):
    model_outputs = [_recurse_in_model(tensor) for tensor in outputs]
  else:
    model_outputs = _recurse_in_model(outputs)

  if isinstance(inputs, list):
    model_inputs = [model_inputs_dict[tensor] for tensor in inputs]
  else:
    model_inputs = model_inputs_dict[inputs]

  return tf.keras.Model(inputs=model_inputs, outputs=model_outputs, name=name)
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test utility functions for manipulating Keras models."""

import tensorflow as tf

from object_detection.utils import model_util


class ExtractSubmodelUtilTest(tf.test.TestCase):

  def test_simple_model(self):
    inputs = tf.keras.Input(shape=(256,))  # Returns a placeholder tensor

    # A layer instance is callable on a tensor, and returns a tensor.
    x = tf.keras.layers.Dense(128, activation='relu', name='a')(inputs)
    x = tf.keras.layers.Dense(64, activation='relu', name='b')(x)
    x = tf.keras.layers.Dense(32, activation='relu', name='c')(x)
    x = tf.keras.layers.Dense(16, activation='relu', name='d')(x)
    x = tf.keras.layers.Dense(8, activation='relu', name='e')(x)
    predictions = tf.keras.layers.Dense(10, activation='softmax')(x)
    model = tf.keras.Model(inputs=inputs, outputs=predictions)

    new_in = model.get_layer(name='b').input
    new_out = model.get_layer(name='d').output

    new_model = model_util.extract_submodel(
        model=model,
        inputs=new_in,
        outputs=new_out)

    batch_size = 3
    ones = tf.ones((batch_size, 128))
    final_out = new_model(ones)
    self.assertAllEqual(final_out.shape, (batch_size, 16))


if __name__ == '__main__':
  tf.test.main()
@@ -16,7 +16,6 @@
 """A module for helper tensorflow ops."""
 import collections
 import math

-import numpy as np
 import six
 import tensorflow as tf
@@ -53,17 +52,19 @@ def normalized_to_image_coordinates(normalized_boxes, image_shape,
   """Converts a batch of boxes from normal to image coordinates.

   Args:
-    normalized_boxes: a float32 tensor of shape [None, num_boxes, 4] in
-      normalized coordinates.
-    image_shape: a float32 tensor of shape [4] containing the image shape.
+    normalized_boxes: a tensor of shape [None, num_boxes, 4] in normalized
+      coordinates. The dtype of this tensor must support tf.mul.
+    image_shape: a tensor of shape [4] containing the image shape, with same
+      dtype as `normalized_boxes`.
     parallel_iterations: parallelism for the map_fn op.

   Returns:
-    absolute_boxes: a float32 tensor of shape [None, num_boxes, 4] containing
-      the boxes in image coordinates.
+    absolute_boxes: a tensor of shape [None, num_boxes, 4] containing
+      the boxes in image coordinates, with same dtype as `normalized_boxes`.
   """
-  x_scale = tf.cast(image_shape[2], tf.float32)
-  y_scale = tf.cast(image_shape[1], tf.float32)
+  x_scale = tf.cast(image_shape[2], normalized_boxes.dtype)
+  y_scale = tf.cast(image_shape[1], normalized_boxes.dtype)
   def _to_absolute_coordinates(normalized_boxes):
     y_min, x_min, y_max, x_max = tf.split(
         value=normalized_boxes, num_or_size_splits=4, axis=1)
@@ -77,7 +78,7 @@ def normalized_to_image_coordinates(normalized_boxes, image_shape,
   absolute_boxes = shape_utils.static_or_dynamic_map_fn(
       _to_absolute_coordinates,
       elems=(normalized_boxes),
-      dtype=tf.float32,
+      dtype=normalized_boxes.dtype,
       parallel_iterations=parallel_iterations,
       back_prop=True)
   return absolute_boxes
@@ -881,27 +882,33 @@ def merge_boxes_with_multiple_labels(boxes,
   merged_box_indices = tf.unsorted_segment_min(
       tf.range(num_boxes), unique_indices, num_unique_boxes)
   merged_boxes = tf.gather(boxes, merged_box_indices)
+  unique_indices = tf.to_int64(unique_indices)
+  classes = tf.to_int64(classes)

   def map_box_encodings(i):
     """Produces box K-hot and score encodings for each class index."""
     box_mask = tf.equal(
-        unique_indices, i * tf.ones(num_boxes, dtype=tf.int32))
+        unique_indices, i * tf.ones(num_boxes, dtype=tf.int64))
     box_mask = tf.reshape(box_mask, [-1])
     box_indices = tf.boolean_mask(classes, box_mask)
     box_confidences = tf.boolean_mask(confidences, box_mask)
     box_class_encodings = tf.sparse_to_dense(
-        box_indices, [num_classes], 1, validate_indices=False)
+        box_indices, [num_classes], tf.constant(1, dtype=tf.int64),
+        validate_indices=False)
     box_confidence_encodings = tf.sparse_to_dense(
         box_indices, [num_classes], box_confidences, validate_indices=False)
     return box_class_encodings, box_confidence_encodings

+  # Important to avoid int32 here since there is no GPU kernel for int32.
+  # int64 and float32 are fine.
   class_encodings, confidence_encodings = tf.map_fn(
       map_box_encodings,
-      tf.range(num_unique_boxes),
+      tf.range(tf.to_int64(num_unique_boxes)),
       back_prop=False,
-      dtype=(tf.int32, tf.float32))
+      dtype=(tf.int64, tf.float32))

   merged_boxes = tf.reshape(merged_boxes, [-1, 4])
+  class_encodings = tf.to_int32(class_encodings)
   class_encodings = tf.reshape(class_encodings, [-1, num_classes])
   confidence_encodings = tf.reshape(confidence_encodings, [-1, num_classes])
   merged_box_indices = tf.reshape(merged_box_indices, [-1])
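The merging logic these dtype changes feed into is easier to see outside the graph: duplicate boxes collapse into a single entry whose classes form a K-hot vector with per-class confidences. A simplified NumPy sketch (not the TF implementation; it relies on Python 3.7+ insertion-ordered dicts):

```python
import numpy as np


def merge_boxes_np(boxes, classes, confidences, num_classes):
  """Collapses duplicate boxes into K-hot class and confidence encodings."""
  merged = {}  # insertion-ordered: keyed by box coordinates
  for box, cls, conf in zip(boxes, classes, confidences):
    key = tuple(box)
    if key not in merged:
      merged[key] = (np.zeros(num_classes, dtype=np.int64),
                     np.zeros(num_classes, dtype=np.float32))
    k_hot, confs = merged[key]
    k_hot[cls] = 1      # mark this class as present for the merged box
    confs[cls] = conf   # record its confidence in the matching slot
  merged_boxes = np.array([list(k) for k in merged], dtype=np.float32)
  k_hots = np.stack([v[0] for v in merged.values()])
  confs_out = np.stack([v[1] for v in merged.values()])
  return merged_boxes, k_hots, confs_out


boxes = [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75],
         [0.25, 0.25, 0.75, 0.75]]
merged_boxes, k_hot, confs = merge_boxes_np(
    boxes, classes=[0, 4, 2], confidences=[0.8, 0.2, 0.1], num_classes=5)
```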
@@ -1003,8 +1010,8 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None):
   2) Only XLA supported operations are used (e.g., matrix multiplication).
   3) There is no `box_indices` argument --- to run this op on multiple images,
      one must currently call this op independently on each image.
-  4) All shapes and the `crop_size` parameter are assumed to be statically
-     defined. Moreover, the number of boxes must be strictly nonzero.
+  4) The `crop_size` parameter is assumed to be statically defined.
+     Moreover, the number of boxes must be strictly nonzero.

   Args:
     image: A `Tensor`. Must be one of the following types: `uint8`, `int8`,
@@ -1029,41 +1036,20 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None):

   Returns:
     A 5-D tensor of shape `[batch, num_boxes, crop_height, crop_width, depth]`
-
-  Raises:
-    ValueError: if image tensor does not have shape
-      `[batch, image_height, image_width, depth]` and all dimensions statically
-      defined.
-    ValueError: if boxes tensor does not have shape `[batch, num_boxes, 4]`
-      where num_boxes > 0.
-    ValueError: if crop_size is not a list of two positive integers
   """
-  img_shape = image.shape.as_list()
-  boxes_shape = boxes.shape.as_list()
-  _, img_height, img_width, _ = img_shape
-  if not isinstance(crop_size, list) or len(crop_size) != 2:
-    raise ValueError('`crop_size` must be a list of length 2')
-  dimensions = img_shape + crop_size + boxes_shape
-  if not all([isinstance(dim, int) for dim in dimensions]):
-    raise ValueError('all input shapes must be statically defined')
-  if len(boxes_shape) != 3 or boxes_shape[2] != 4:
-    raise ValueError('`boxes` should have shape `[batch, num_boxes, 4]`')
-  if len(img_shape) != 4:
-    raise ValueError('image should have shape '
-                     '`[batch, image_height, image_width, depth]`')
-  num_crops = boxes_shape[0]
-  if not num_crops > 0:
-    raise ValueError('number of boxes must be > 0')
-  if not (crop_size[0] > 0 and crop_size[1] > 0):
-    raise ValueError('`crop_size` must be a list of two positive integers.')
+  img_shape = tf.shape(image)
+  img_height = img_shape[1]
+  img_width = img_shape[2]

   def _lin_space_weights(num, img_size):
     if num > 1:
-      start_weights = tf.linspace(img_size - 1.0, 0.0, num)
-      stop_weights = img_size - 1 - start_weights
+      start_weights = tf.linspace(tf.to_float(img_size) - 1.0, 0.0, num)
+      stop_weights = tf.to_float(img_size) - 1.0 - start_weights
     else:
-      start_weights = tf.constant(num * [.5 * (img_size - 1)], dtype=tf.float32)
-      stop_weights = tf.constant(num * [.5 * (img_size - 1)], dtype=tf.float32)
+      start_weights = tf.ones([num], dtype=tf.float32) * \
+          .5 * (tf.to_float(img_size) - 1.0)
+      stop_weights = tf.ones([num], dtype=tf.float32) * \
+          .5 * (tf.to_float(img_size) - 1.0)
     return (start_weights, stop_weights)

   with tf.name_scope(scope, 'MatMulCropAndResize'):
@@ -1076,19 +1062,19 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None):
     [y1, x1, y2, x2] = tf.unstack(boxes, axis=2)

     # Pixel centers of input image and grid points along height and width
-    image_idx_h = tf.constant(
-        np.reshape(np.arange(img_height), (1, 1, 1, img_height)),
-        dtype=boxes.dtype)
-    image_idx_w = tf.constant(
-        np.reshape(np.arange(img_width), (1, 1, 1, img_width)),
-        dtype=boxes.dtype)
+    image_idx_h = tf.cast(
+        tf.reshape(tf.range(img_height), (1, 1, 1, img_height)),
+        dtype=boxes.dtype)
+    image_idx_w = tf.cast(
+        tf.reshape(tf.range(img_width), (1, 1, 1, img_width)),
+        dtype=boxes.dtype)
     grid_pos_h = tf.expand_dims(
-        tf.einsum('ab,c->abc', y1, y1_weights) + tf.einsum(
-            'ab,c->abc', y2, y2_weights),
+        tf.einsum('ab,c->abc', y1, y1_weights) +
+        tf.einsum('ab,c->abc', y2, y2_weights),
         axis=3)
     grid_pos_w = tf.expand_dims(
-        tf.einsum('ab,c->abc', x1, x1_weights) + tf.einsum(
-            'ab,c->abc', x2, x2_weights),
+        tf.einsum('ab,c->abc', x1, x1_weights) +
+        tf.einsum('ab,c->abc', x2, x2_weights),
         axis=3)
@@ -1096,7 +1082,8 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None):
     kernel_h = tf.nn.relu(1 - tf.abs(image_idx_h - grid_pos_h))
     kernel_w = tf.nn.relu(1 - tf.abs(image_idx_w - grid_pos_w))

-    # Compute matrix multiplication between the spatial dimensions of the image
+    # Compute matrix multiplication between
+    # the spatial dimensions of the image
     # and height-wise kernel using einsum.
     intermediate_image = tf.einsum('abci,aiop->abcop', kernel_h, image)
     # Compute matrix multiplication between the spatial dimensions of the
@@ -1124,6 +1111,58 @@ def native_crop_and_resize(image, boxes, crop_size, scope=None):
   return tf.reshape(cropped_regions, final_shape)
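The kernel trick behind `matmul_crop_and_resize` is easiest to see in one dimension: a tent kernel `relu(1 - |pixel - grid|)` turns bilinear sampling into a matrix product. A simplified 1-D NumPy sketch (not the TF implementation, which builds the grid from start/stop weights):

```python
import numpy as np


def matmul_crop_and_resize_1d(signal, start, stop, crop_size):
  """Bilinearly samples crop_size points in [start, stop] (pixel coords)
  from a 1-D signal using only a matrix multiply."""
  grid = np.linspace(start, stop, crop_size)  # sample positions
  idx = np.arange(len(signal))                # pixel centers
  # Tent kernel: each row holds the bilinear weights for one sample point.
  kernel = np.maximum(0.0, 1.0 - np.abs(idx[None, :] - grid[:, None]))
  return kernel @ signal                      # shape [crop_size]


signal = np.array([0.0, 1.0, 2.0, 3.0])
resampled = matmul_crop_and_resize_1d(signal, start=0.0, stop=3.0, crop_size=3)
# The midpoint 1.5 interpolates between pixels 1 and 2, giving [0, 1.5, 3].
```

The 2-D version in the diff applies the same kernels along height and width via two `einsum` contractions, which keeps the op XLA-compatible.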
def bfloat16_to_float32_nested(tensor_nested):
  """Converts bfloat16 tensors in a nested structure to float32.

  Args:
    tensor_nested: A Tensor, or a Python dict/list/tuple of Tensor.

  Returns:
    A structure with the same layout as `tensor_nested`, with all bfloat16
    tensors converted to float32.
  """
  if isinstance(tensor_nested, tf.Tensor):
    if tensor_nested.dtype == tf.bfloat16:
      return tf.cast(tensor_nested, dtype=tf.float32)
    else:
      return tensor_nested
  elif isinstance(tensor_nested, (list, tuple)):
    out_tensor_dict = [bfloat16_to_float32_nested(t) for t in tensor_nested]
  elif isinstance(tensor_nested, dict):
    out_tensor_dict = {
        k: bfloat16_to_float32_nested(v) for k, v in tensor_nested.items()
    }
  else:
    # Leave any other value (e.g. None, Python scalars) unchanged.
    return tensor_nested
  return out_tensor_dict


def gather_with_padding_values(input_tensor, indices, padding_value):
  """Gathers elements from tensor and pads `padding_value` for ignore indices.

  Gathers elements from `input_tensor` based on `indices`. If there are ignore
  indices (which are "-1"s) in `indices`, `padding_value` will be gathered for
  those positions.

  Args:
    input_tensor: A N-D tensor of shape [M, d_1, d_2 .. d_(N-1)] to gather
      values from.
    indices: A 1-D tensor in which each element is either an index in the
      first dimension of input_tensor or -1.
    padding_value: A (N-1)-D tensor of shape [d_1, d_2 .. d_(N-1)] which will be
      used as gathered value for each ignore index in `indices`.

  Returns:
    gathered_tensor: A tensor of shape [L, d_1, d_2 .. d_(N-1)] containing
      values gathered from input_tensor. The first dimension L is equal to the
      length of `indices`.
  """
  padding_value = tf.expand_dims(padding_value, axis=0)
  input_tensor = tf.concat([padding_value, input_tensor], axis=0)
  gather_indices = indices + 1
  gathered_tensor = tf.gather(input_tensor, gather_indices)
  return gathered_tensor
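The pad-and-shift trick used by `gather_with_padding_values` is compact: prepend the padding row at index 0 and shift every index up by one, so that `-1` lands exactly on the padding row. A NumPy sketch of the same idea:

```python
import numpy as np


def gather_with_padding_values_np(input_array, indices, padding_value):
  """Gathers rows; any index of -1 yields padding_value instead."""
  # Prepend the padding row so it occupies index 0 of the padded array.
  padded = np.concatenate([padding_value[None, :], input_array], axis=0)
  # Shifting indices by +1 maps -1 onto the padding row.
  return padded[np.asarray(indices) + 1]


values = np.array([[0.1, 0.1], [0.2, 0.2]], dtype=np.float32)
gathered = gather_with_padding_values_np(
    values, indices=[1, -1, 0], padding_value=np.zeros(2, dtype=np.float32))
```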
@@ -1223,6 +1223,35 @@ class MergeBoxesWithMultipleLabelsTest(tf.test.TestCase):
     self.assertAllEqual(np_merged_confidences.shape, [0, 5])
     self.assertAllEqual(np_merged_box_indices.shape, [0])

+  def testMergeBoxesWithMultipleLabelsUsesInt64(self):
+    boxes = tf.constant(
+        [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75],
+         [0.25, 0.25, 0.75, 0.75]],
+        dtype=tf.float32)
+    class_indices = tf.constant([0, 4, 2], dtype=tf.int32)
+    class_confidences = tf.constant([0.8, 0.2, 0.1], dtype=tf.float32)
+    num_classes = 5
+    ops.merge_boxes_with_multiple_labels(
+        boxes, class_indices, class_confidences, num_classes)
+    graph = tf.get_default_graph()
+
+    def assert_dtype_is_int64(op_name):
+      op = graph.get_operation_by_name(op_name)
+      self.assertEqual(op.get_attr('dtype'), tf.int64)
+
+    def assert_t_is_int64(op_name):
+      op = graph.get_operation_by_name(op_name)
+      self.assertEqual(op.get_attr('T'), tf.int64)
+
+    assert_dtype_is_int64('map/TensorArray')
+    assert_dtype_is_int64('map/TensorArray_1')
+    assert_dtype_is_int64('map/while/TensorArrayReadV3')
+    assert_t_is_int64('map/while/TensorArrayWrite/TensorArrayWriteV3')
+    assert_t_is_int64(
+        'map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3')
+    assert_dtype_is_int64('map/TensorArrayStack/TensorArrayGatherV3')
+
 class NearestNeighborUpsamplingTest(test_case.TestCase):
@@ -1470,6 +1499,56 @@ class OpsTestCropAndResize(test_case.TestCase):
     self.assertAllClose(crop_output, expected_output)


class TestBfloat16ToFloat32(test_case.TestCase):

  def test_convert_list(self):
    var_list = [
        tf.constant([1.], dtype=tf.bfloat16),
        tf.constant([2], dtype=tf.int32)
    ]
    casted_var_list = ops.bfloat16_to_float32_nested(var_list)
    self.assertEqual(casted_var_list[0].dtype, tf.float32)
    self.assertEqual(casted_var_list[1].dtype, tf.int32)

  def test_convert_tensor_dict(self):
    tensor_dict = {
        'key1': tf.constant([1.], dtype=tf.bfloat16),
        'key2': [
            tf.constant([0.5], dtype=tf.bfloat16),
            tf.constant([7], dtype=tf.int32),
        ],
        'key3': tf.constant([2], dtype=tf.uint8),
    }
    tensor_dict = ops.bfloat16_to_float32_nested(tensor_dict)
    self.assertEqual(tensor_dict['key1'].dtype, tf.float32)
    self.assertEqual(tensor_dict['key2'][0].dtype, tf.float32)
    self.assertEqual(tensor_dict['key2'][1].dtype, tf.int32)
    self.assertEqual(tensor_dict['key3'].dtype, tf.uint8)


class TestGatherWithPaddingValues(test_case.TestCase):

  def test_gather_with_padding_values(self):
    indices = tf.constant([1, -1, 0, -1])
    input_tensor = tf.constant([[0, 0, 0.1, 0.1], [0, 0, 0.2, 0.2]],
                               dtype=tf.float32)
    expected_gathered_tensor = [
        [0, 0, 0.2, 0.2],
        [0, 0, 0, 0],
        [0, 0, 0.1, 0.1],
        [0, 0, 0, 0],
    ]
    gathered_tensor = ops.gather_with_padding_values(
        input_tensor,
        indices=indices,
        padding_value=tf.zeros_like(input_tensor[0]))
    self.assertEqual(gathered_tensor.dtype, tf.float32)
    with self.test_session():
      gathered_tensor_np = gathered_tensor.eval()
    self.assertAllClose(expected_gathered_tensor, gathered_tensor_np)
@@ -21,7 +21,6 @@ The functions do not return a value, instead they modify the image itself.
 """
 import abc
 import collections
-import functools
 # Set headless-friendly backend.
 import matplotlib; matplotlib.use('Agg')  # pylint: disable=multiple-statements
 import matplotlib.pyplot as plt  # pylint: disable=g-import-not-at-top
@@ -65,6 +64,34 @@ STANDARD_COLORS = [
 ]
def _get_multiplier_for_color_randomness():
  """Returns a multiplier to get semi-random colors from successive indices.

  This function computes a prime number, p, in the range [2, 17] that:
  - is closest to len(STANDARD_COLORS) / 10
  - does not divide len(STANDARD_COLORS)

  If no prime numbers in that range satisfy the constraints, p is returned as 1.

  Once p is established, it can be used as a multiplier to select
  non-consecutive colors from STANDARD_COLORS:
  colors = [(p * i) % len(STANDARD_COLORS) for i in range(20)]
  """
  num_colors = len(STANDARD_COLORS)
  prime_candidates = [5, 7, 11, 13, 17]

  # Remove all prime candidates that divide the number of colors.
  prime_candidates = [p for p in prime_candidates if num_colors % p]
  if not prime_candidates:
    return 1

  # Return the closest prime number to num_colors / 10.
  abs_distance = [np.abs(num_colors / 10. - p) for p in prime_candidates]
  num_candidates = len(abs_distance)
  inds = [i for _, i in sorted(zip(abs_distance, range(num_candidates)))]
  return prime_candidates[inds[0]]
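The selection rule can be exercised standalone. The sketch below mirrors the same prime-multiplier logic (using `min` with a key in place of the sort, which agrees except possibly on exact ties) and shows the de-correlated palette indices it produces; the 148-entry palette size is a hypothetical example, not the actual length of STANDARD_COLORS:

```python
def color_multiplier(num_colors):
  """Picks a prime p from {5, 7, 11, 13, 17} that does not divide
  num_colors and is closest to num_colors / 10; returns 1 if none
  qualifies."""
  candidates = [p for p in [5, 7, 11, 13, 17] if num_colors % p]
  if not candidates:
    return 1
  return min(candidates, key=lambda p: abs(num_colors / 10.0 - p))


# With a hypothetical 148-color palette, successive ids map to
# well-separated palette slots instead of adjacent (similar) colors.
p = color_multiplier(148)
indices = [(p * i) % 148 for i in range(5)]
```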
def save_image_array_as_png(image, output_path):
  """Saves an image (represented as a numpy array) to PNG.
@@ -266,46 +293,98 @@ def draw_bounding_boxes_on_image(image,
                                   boxes[i, 3], color, thickness, display_str_list)
def _visualize_boxes(image, boxes, classes, scores, category_index, **kwargs): def create_visualization_fn(category_index, include_masks=False,
return visualize_boxes_and_labels_on_image_array( include_keypoints=False, include_track_ids=False,
image, boxes, classes, scores, category_index=category_index, **kwargs) **kwargs):
"""Constructs a visualization function that can be wrapped in a py_func.
py_funcs only accept positional arguments. This function returns a suitable
function with the correct positional argument mapping. The positional
arguments in order are:
0: image
1: boxes
2: classes
3: scores
[4-6]: masks (optional)
[4-6]: keypoints (optional)
[4-6]: track_ids (optional)
-- Example 1 --
vis_only_masks_fn = create_visualization_fn(category_index,
include_masks=True, include_keypoints=False, include_track_ids=False,
**kwargs)
image = tf.py_func(vis_only_masks_fn,
inp=[image, boxes, classes, scores, masks],
Tout=tf.uint8)
-- Example 2 --
vis_masks_and_track_ids_fn = create_visualization_fn(category_index,
include_masks=True, include_keypoints=False, include_track_ids=True,
**kwargs)
image = tf.py_func(vis_masks_and_track_ids_fn,
inp=[image, boxes, classes, scores, masks, track_ids],
Tout=tf.uint8)
Args:
category_index: a dict that maps integer ids to category dicts. e.g.
{1: {'id': 1, 'name': 'dog'}, 2: {'id': 2, 'name': 'cat'}, ...}
include_masks: Whether masks should be expected as a positional argument in
the returned function.
include_keypoints: Whether keypoints should be expected as a positional
argument in the returned function.
include_track_ids: Whether track ids should be expected as a positional
argument in the returned function.
**kwargs: Additional kwargs that will be passed to
visualize_boxes_and_labels_on_image_array.
Returns:
Returns a function that only takes tensors as positional arguments.
"""
def visualization_py_func_fn(*args):
"""Visualization function that can be wrapped in a tf.py_func.
Args:
*args: First 4 positional arguments must be:
image - uint8 numpy array with shape (img_height, img_width, 3).
boxes - a numpy array of shape [N, 4].
classes - a numpy array of shape [N].
scores - a numpy array of shape [N] or None.
-- Optional positional arguments --
instance_masks - a numpy array of shape [N, image_height, image_width].
keypoints - a numpy array of shape [N, num_keypoints, 2].
track_ids - a numpy array of shape [N] with unique track ids.
Returns:
uint8 numpy array with shape (img_height, img_width, 3) with overlaid
boxes.
"""
image = args[0]
boxes = args[1]
classes = args[2]
scores = args[3]
masks = keypoints = track_ids = None
pos_arg_ptr = 4  # Positional argument for first optional tensor (masks).
if include_masks:
masks = args[pos_arg_ptr]
pos_arg_ptr += 1
if include_keypoints:
keypoints = args[pos_arg_ptr]
pos_arg_ptr += 1
if include_track_ids:
track_ids = args[pos_arg_ptr]
return visualize_boxes_and_labels_on_image_array(
image,
boxes,
classes,
scores,
category_index=category_index,
instance_masks=masks,
keypoints=keypoints,
track_ids=track_ids,
**kwargs)
return visualization_py_func_fn
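Because `tf.py_func` only accepts positional arguments, the closure above walks a pointer through the optional trailing tensors. A minimal TF-free sketch of the same pattern (the name `make_positional_fn` and the returned dict are hypothetical, for illustration only):

```python
def make_positional_fn(include_masks=False, include_keypoints=False,
                       include_track_ids=False):
    """Return a closure that unpacks optional trailing positional args."""
    def fn(*args):
        image, boxes, classes, scores = args[:4]
        masks = keypoints = track_ids = None
        ptr = 4  # index of the first optional positional argument
        if include_masks:
            masks = args[ptr]
            ptr += 1
        if include_keypoints:
            keypoints = args[ptr]
            ptr += 1
        if include_track_ids:
            track_ids = args[ptr]
        return {'masks': masks, 'keypoints': keypoints, 'track_ids': track_ids}
    return fn

# Masks and track ids enabled, keypoints skipped: track ids land at index 5.
vis = make_positional_fn(include_masks=True, include_track_ids=True)
out = vis('img', 'boxes', 'classes', 'scores', 'MASKS', 'TRACKS')
print(out['masks'], out['track_ids'])  # MASKS TRACKS
```

The pointer walk is what lets one factory serve every combination of optional tensors, replacing the four hard-coded `_visualize_boxes*` helpers that the diff removes.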
def _resize_original_image(image, image_shape):
@@ -327,6 +406,7 @@ def draw_bounding_boxes_on_image_tensors(images,
true_image_shape=None,
instance_masks=None,
keypoints=None,
track_ids=None,
max_boxes_to_draw=20,
min_score_thresh=0.2,
use_normalized_coordinates=True):
@@ -350,6 +430,9 @@ def draw_bounding_boxes_on_image_tensors(images,
instance masks.
keypoints: A 4D float32 tensor of shape [N, max_detection, num_keypoints, 2]
with keypoints.
track_ids: [N, max_detections] int32 tensor of unique track ids (i.e.
instance ids for each object). If provided, the color-coding of boxes is
dictated by these ids, and not classes.
max_boxes_to_draw: Maximum number of boxes to draw on an image. Default 20.
min_score_thresh: Minimum score threshold for visualization. Default 0.2.
use_normalized_coordinates: Whether to assume boxes and keypoints are in
@@ -380,40 +463,20 @@ def draw_bounding_boxes_on_image_tensors(images,
else:
original_shapes = original_image_spatial_shape
visualize_boxes_fn = create_visualization_fn(
category_index,
include_masks=instance_masks is not None,
include_keypoints=keypoints is not None,
include_track_ids=track_ids is not None,
**visualization_keyword_args)
elems = [true_shapes, original_shapes, images, boxes, classes, scores]
if instance_masks is not None:
elems.append(instance_masks)
if keypoints is not None:
elems.append(keypoints)
if track_ids is not None:
elems.append(track_ids)
def draw_boxes(image_and_detections):
"""Draws boxes on image."""
@@ -627,6 +690,7 @@ def visualize_boxes_and_labels_on_image_array(
instance_masks=None,
instance_boundaries=None,
keypoints=None,
track_ids=None,
use_normalized_coordinates=False,
max_boxes_to_draw=20,
min_score_thresh=.5,
@@ -634,7 +698,8 @@ def visualize_boxes_and_labels_on_image_array(
line_thickness=4,
groundtruth_box_visualization_color='black',
skip_scores=False,
skip_labels=False,
skip_track_ids=False):
"""Overlay labeled boxes on an image with formatted scores and label names. """Overlay labeled boxes on an image with formatted scores and label names.
This function groups boxes that correspond to the same location This function groups boxes that correspond to the same location
...@@ -658,6 +723,9 @@ def visualize_boxes_and_labels_on_image_array( ...@@ -658,6 +723,9 @@ def visualize_boxes_and_labels_on_image_array(
with values ranging between 0 and 1, can be None. with values ranging between 0 and 1, can be None.
keypoints: a numpy array of shape [N, num_keypoints, 2], can keypoints: a numpy array of shape [N, num_keypoints, 2], can
be None be None
track_ids: a numpy array of shape [N] with unique track ids. If provided,
color-coding of boxes will be determined by these ids, and not the class
indices.
use_normalized_coordinates: whether boxes is to be interpreted as
normalized coordinates or not.
max_boxes_to_draw: maximum number of boxes to visualize. If None, draw
@@ -671,6 +739,7 @@ def visualize_boxes_and_labels_on_image_array(
boxes
skip_scores: whether to skip score when drawing a single detection
skip_labels: whether to skip label when drawing a single detection
skip_track_ids: whether to skip track id when drawing a single detection
Returns:
uint8 numpy array with shape (img_height, img_width, 3) with overlaid boxes.
@@ -682,6 +751,7 @@ def visualize_boxes_and_labels_on_image_array(
box_to_instance_masks_map = {}
box_to_instance_boundaries_map = {}
box_to_keypoints_map = collections.defaultdict(list)
box_to_track_ids_map = {}
if not max_boxes_to_draw:
max_boxes_to_draw = boxes.shape[0]
for i in range(min(max_boxes_to_draw, boxes.shape[0])):
@@ -693,6 +763,8 @@ def visualize_boxes_and_labels_on_image_array(
box_to_instance_boundaries_map[box] = instance_boundaries[i]
if keypoints is not None:
box_to_keypoints_map[box].extend(keypoints[i])
if track_ids is not None:
box_to_track_ids_map[box] = track_ids[i]
if scores is None:
box_to_color_map[box] = groundtruth_box_visualization_color
else:
@@ -709,9 +781,18 @@ def visualize_boxes_and_labels_on_image_array(
display_str = '{}%'.format(int(100*scores[i]))
else:
display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
if not skip_track_ids and track_ids is not None:
if not display_str:
display_str = 'ID {}'.format(track_ids[i])
else:
display_str = '{}: ID {}'.format(display_str, track_ids[i])
box_to_display_str_map[box].append(display_str)
if agnostic_mode:
box_to_color_map[box] = 'DarkOrange'
elif track_ids is not None:
prime_multiplier = _get_multiplier_for_color_randomness()
box_to_color_map[box] = STANDARD_COLORS[
(prime_multiplier * track_ids[i]) % len(STANDARD_COLORS)]
else:
box_to_color_map[box] = STANDARD_COLORS[
classes[i] % len(STANDARD_COLORS)]
...
@@ -29,6 +29,30 @@ _TESTDATA_PATH = 'object_detection/test_images'
class VisualizationUtilsTest(tf.test.TestCase):
def test_get_prime_multiplier_for_color_randomness(self):
# Show that the default multiplier is not 1 and does not divide the total
# number of standard colors.
multiplier = visualization_utils._get_multiplier_for_color_randomness()
self.assertNotEqual(
0, multiplier % len(visualization_utils.STANDARD_COLORS))
self.assertNotEqual(1, multiplier)
# Show that with 34 colors, the closest prime number to 34/10 that
# satisfies the constraints is 5.
visualization_utils.STANDARD_COLORS = [
'color_{}'.format(str(i)) for i in range(34)
]
multiplier = visualization_utils._get_multiplier_for_color_randomness()
self.assertEqual(5, multiplier)
# Show that with 110 colors, the closest prime number to 110/10 that
# satisfies the constraints is 13 (since 11 equally divides 110).
visualization_utils.STANDARD_COLORS = [
'color_{}'.format(str(i)) for i in range(110)
]
multiplier = visualization_utils._get_multiplier_for_color_randomness()
self.assertEqual(13, multiplier)
def create_colorful_test_image(self):
"""This function creates an image that can be used to test vis functions.
@@ -158,6 +182,55 @@ class VisualizationUtilsTest(tf.test.TestCase):
image_pil = Image.fromarray(images_with_boxes_np[i, ...])
image_pil.save(output_file)
def test_draw_bounding_boxes_on_image_tensors_with_track_ids(self):
"""Tests that bounding box utility produces reasonable results."""
category_index = {1: {'id': 1, 'name': 'dog'}, 2: {'id': 2, 'name': 'cat'}}
fname = os.path.join(_TESTDATA_PATH, 'image1.jpg')
image_np = np.array(Image.open(fname))
images_np = np.stack((image_np, image_np), axis=0)
original_image_shape = [[636, 512], [636, 512]]
with tf.Graph().as_default():
images_tensor = tf.constant(value=images_np, dtype=tf.uint8)
image_shape = tf.constant(original_image_shape, dtype=tf.int32)
boxes = tf.constant([[[0.4, 0.25, 0.75, 0.75],
[0.5, 0.3, 0.7, 0.9],
[0.7, 0.5, 0.8, 0.9]],
[[0.41, 0.25, 0.75, 0.75],
[0.51, 0.3, 0.7, 0.9],
[0.75, 0.5, 0.8, 0.9]]])
classes = tf.constant([[1, 1, 2], [1, 1, 2]], dtype=tf.int64)
scores = tf.constant([[0.8, 0.5, 0.7], [0.6, 0.5, 0.8]])
track_ids = tf.constant([[3, 9, 7], [3, 9, 144]], dtype=tf.int32)
images_with_boxes = (
visualization_utils.draw_bounding_boxes_on_image_tensors(
images_tensor,
boxes,
classes,
scores,
category_index,
original_image_spatial_shape=image_shape,
true_image_shape=image_shape,
track_ids=track_ids,
min_score_thresh=0.2))
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
# Write output images for visualization.
images_with_boxes_np = sess.run(images_with_boxes)
self.assertEqual(images_np.shape[0], images_with_boxes_np.shape[0])
self.assertEqual(images_np.shape[3], images_with_boxes_np.shape[3])
self.assertEqual(
tuple(original_image_shape[0]), images_with_boxes_np.shape[1:3])
for i in range(images_with_boxes_np.shape[0]):
img_name = 'image_with_track_ids_' + str(i) + '.png'
output_file = os.path.join(self.get_temp_dir(), img_name)
logging.info('Writing output image %d to %s', i, output_file)
image_pil = Image.fromarray(images_with_boxes_np[i, ...])
image_pil.save(output_file)
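The point of track-id color coding, which this test exercises with ids 3, 9, 7, and 144, is stability: the same id maps to the same palette entry in every frame, while the prime multiplier spreads consecutive ids across non-adjacent colors. A toy illustration (the 34-entry palette, multiplier 5, and `color_for_track` are all hypothetical stand-ins for the library's logic):

```python
STANDARD_COLORS = ['color_{}'.format(i) for i in range(34)]  # toy palette
PRIME = 5  # multiplier so successive ids land on non-adjacent colors

def color_for_track(track_id):
    """Map a track id to a stable, well-spread palette entry."""
    return STANDARD_COLORS[(PRIME * track_id) % len(STANDARD_COLORS)]

# Same id -> same color in every frame; nearby ids -> distinct colors.
print(color_for_track(3), color_for_track(3))    # color_15 color_15
print(color_for_track(9), color_for_track(144))  # color_11 color_6
```

Class-based coloring, by contrast, would give every 'dog' box the same color, making individual tracks indistinguishable.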
def test_draw_bounding_boxes_on_image_tensors_with_additional_channels(self):
"""Tests the case where input image tensor has more than 3 channels."""
category_index = {1: {'id': 1, 'name': 'dog'}}
...