Commit 1efe98bb authored by Zhichao Lu, committed by lzc5123016

Merged commit includes the following changes:

185215255  by Zhichao Lu:

    Stop populating image/object/class/text field when generating COCO tf record.

--
185213306  by Zhichao Lu:

    Use the params batch size and not the one from train_config in input_fn

--
185209081  by Zhichao Lu:

    Handle the case when there are no ground-truth masks for an image.

--
185195531  by Zhichao Lu:

    Remove unstack and stack operations on features from third_party/object_detection/model.py.

--
185195017  by Zhichao Lu:

    Matrix multiplication based gather op implementation.
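    A minimal sketch of the general technique the title refers to (one-hot/matmul gather, not necessarily this commit's exact code); such formulations tend to be friendlier to accelerators like TPUs than a native gather:

    ```python
    import tensorflow as tf

    def matmul_gather(params, indices):
      """Gathers rows of a 2-D `params` tensor by matrix multiplication."""
      # One-hot encode the indices against the number of rows, then multiply.
      one_hot = tf.one_hot(indices, depth=tf.shape(params)[0], dtype=params.dtype)
      return tf.matmul(one_hot, params)  # Shape: [num_indices, num_cols].
    ```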

--
185187744  by Zhichao Lu:

    Fix a minor issue in eval_util.

--
185098733  by Zhichao Lu:

    Internal change

--
185076656  by Zhichao Lu:

    Increase the number of boxes for coco17.

--
185074199  by Zhichao Lu:

    Add config for SSD Resnet50 v1 with FPN.

--
185060199  by Zhichao Lu:

    Fix a bug in clear_detections.
    This method set detection_keys to an empty dictionary instead of an empty set. I've refactored so that this method and the constructor use the same code path.

--
185031359  by Zhichao Lu:

    Eval TPU trained models continuously.
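    One way to do this with stock TensorFlow 1.x (a sketch assuming an already-built `Estimator` and eval `input_fn`, not this change's actual code):

    ```python
    import tensorflow as tf

    def eval_continuously(estimator, eval_input_fn, model_dir):
      """Evaluates every new checkpoint that appears under `model_dir`."""
      for checkpoint_path in tf.contrib.training.checkpoints_iterator(model_dir):
        metrics = estimator.evaluate(
            input_fn=eval_input_fn, checkpoint_path=checkpoint_path)
        tf.logging.info('Eval results: %s', metrics)
    ```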

--
185016591  by Zhichao Lu:

    Use TPUEstimatorSpec for TPU

--
185013651  by Zhichao Lu:

    Add PreprocessorCache to record and duplicate augmentations.

--
184921763  by Zhichao Lu:

    Minor fixes for object detection.

--
184920610  by Zhichao Lu:

    Adds a model builder test for "embedded_ssd_mobilenet_v1" feature extractor.

--
184919284  by Zhichao Lu:

    Added unit tests for TPU, with optional training / eval.

--
184915910  by Zhichao Lu:

    Update third_party g3 doc with Mask RCNN detection models.

--
184914085  by Zhichao Lu:

    Slight change to the WeightSharedConvolutionalBoxPredictor implementation to match RetinaNet more closely. Specifically, we now construct the box encoding and class predictor towers separately rather than having them share weights until the penultimate layer.
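    Roughly, the change amounts to the following (an illustrative sketch, not the actual box predictor code):

    ```python
    import tensorflow as tf

    def _conv_tower(features, name, depth=256, num_layers=4):
      """A small stack of 3x3 convolutions, one stack per prediction head."""
      net = features
      for i in range(num_layers):
        net = tf.layers.conv2d(net, depth, 3, padding='same',
                               activation=tf.nn.relu, name='%s_%d' % (name, i))
      return net

    def predict(features, num_anchors, num_classes):
      # Separate towers per head; previously the layers up to the penultimate
      # one were shared between the box and class heads.
      box_net = _conv_tower(features, 'BoxEncodingTower')
      class_net = _conv_tower(features, 'ClassPredictionTower')
      box_encodings = tf.layers.conv2d(
          box_net, num_anchors * 4, 3, padding='same', name='BoxPredictor')
      class_logits = tf.layers.conv2d(
          class_net, num_anchors * num_classes, 3, padding='same',
          name='ClassPredictor')
      return box_encodings, class_logits
    ```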

--
184913786  by Zhichao Lu:

    Plumbs SSD Resnet V1 with FPN models into model builder.

--
184910030  by Zhichao Lu:

    Add coco metrics to evaluator.

--
184897758  by Zhichao Lu:

    Merge changes from github.

--
184888736  by Zhichao Lu:

    Ensure groundtruth_weights are always 1-D.

--
184887256  by Zhichao Lu:

    Introduce an option controlling whether the model adds summaries, so summaries can be turned off when necessary.

--
184865559  by Zhichao Lu:

    Updating inputs so that a dictionary of tensors is returned from input_fn. Moving unbatch/unpad to model.py.
    Also removing the source_id key from the features dictionary and replacing it with an integer hash.
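    The replacement hash is visible in the inputs.py diff further down; in isolation it looks like this:

    ```python
    import tensorflow as tf

    HASH_BINS = 1 << 31  # Same constant as in the inputs.py diff below.
    source_id = tf.constant(['image_0042'])  # Illustrative source_id value.
    hashed_id = tf.cast(
        tf.string_to_hash_bucket_fast(source_id, HASH_BINS), tf.int32)
    ```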

--
184859205  by Zhichao Lu:

    This CL hides those differences by making the default settings work with the public code.

--
184769779  by Zhichao Lu:

    Pass groundtruth weights into ssd meta architecture all the way to target assigner.

    This will allow training ssd models with padded groundtruth tensors.
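    An illustrative example of the padding/weights interplay (not this change's code): with groundtruth padded to a fixed box count, a weights vector can zero out the padded rows so the target assigner ignores them.

    ```python
    import tensorflow as tf

    max_num_boxes, num_real_boxes = 8, 3
    groundtruth_weights = tf.concat(
        [tf.ones([num_real_boxes]),                    # real boxes
         tf.zeros([max_num_boxes - num_real_boxes])],  # padded boxes
        axis=0)  # -> [1, 1, 1, 0, 0, 0, 0, 0]
    ```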

--
184767117  by Zhichao Lu:

    * Add `params` arg to make all input fns work with TPUEstimator
    * Add --master
    * Output eval results

--
184766244  by Zhichao Lu:

    Update create_coco_tf_record to include category indices

--
184752937  by Zhichao Lu:

    Create a third_party version of TPU compatible mobilenet_v2_focal_loss coco config.

--
184750174  by Zhichao Lu:

    A few small fixes for multiscale anchor generator and a test.

--
184746581  by Zhichao Lu:

    Update the Jupyter notebook to show masks if the model provides them.

--
184728646  by Zhichao Lu:

    Adding a few more tests to make sure decoding with/without label maps performs as expected.

--
184624154  by Zhichao Lu:

    Add an object detection binary for TPU.

--
184622118  by Zhichao Lu:

    Batch, transform, and unbatch in the tflearn interface.

--
184595064  by Zhichao Lu:

    Add support for training grayscale models.

--
184532026  by Zhichao Lu:

    Change dataset_builder.build to perform optional batching using the tf.data.Dataset API.

--
184330239  by Zhichao Lu:

    Add augment_input_data and transform_input_data helper functions to third_party/tensorflow_models/object_detection/inputs.py

--
184328681  by Zhichao Lu:

    Use an internal RGB-to-grayscale method that can be quantized.

--
184327909  by Zhichao Lu:

    Add a helper function that returns padding shapes for use with Dataset.padded_batch.
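    A hedged sketch of the idea (the key names and helper are illustrative, not the library's exact API): derive a static shape per key, then hand the result to `Dataset.padded_batch`.

    ```python
    def get_padding_shapes(height, width, max_num_boxes, num_classes):
      """Returns per-key shapes for use with tf.data.Dataset.padded_batch."""
      return {
          'image': [height, width, 3],
          'groundtruth_boxes': [max_num_boxes, 4],
          'groundtruth_classes': [max_num_boxes, num_classes],
      }

    # Illustrative usage with a tf.data.Dataset of feature dictionaries:
    # dataset = dataset.padded_batch(
    #     24, padded_shapes=get_padding_shapes(300, 300, 50, 37))
    ```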

--
184326291  by Zhichao Lu:

    Added decode_func for specialized decoding.

--
184314676  by Zhichao Lu:

    Add unstack_batch method to inputs.py.

    This will enable us to convert batched tensors to lists of tensors. This is compatible with OD API that consumes groundtruth batch as a list of tensors.

--
184281269  by Zhichao Lu:

    Internal test target changes.

--
184192851  by Zhichao Lu:

    Adding `Estimator` interface for object detection.

--
184187885  by Zhichao Lu:

    Add config_util functions to help with input pipeline.

    1. function to return expected shapes from the resizer config
    2. function to extract image_resizer_config from model_config.

--
184139892  by Zhichao Lu:

    Adding support for depthwise SSD (ssd-lite) and depthwise box predictions.

--
184089891  by Zhichao Lu:

    Fix third_party faster rcnn resnet101 coco config.

--
184083378  by Zhichao Lu:

    When there is no object/weights field in the tf.Example proto, return a default weight of 1.0 for all boxes.
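    Sketched in isolation (illustrative, not the decoder's actual code):

    ```python
    import tensorflow as tf

    def groundtruth_weights_or_default(boxes, weights=None):
      """Returns per-box weights, defaulting to 1.0 when none were decoded."""
      if weights is not None:
        return weights
      return tf.ones([tf.shape(boxes)[0]], dtype=tf.float32)
    ```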

--

PiperOrigin-RevId: 185215255
parent fbc5ba06
@@ -19,6 +19,7 @@ import os
 import tempfile
 import tensorflow as tf
 from google.protobuf import text_format
+from tensorflow.core.protobuf import saver_pb2
 from tensorflow.python import pywrap_tensorflow
 from tensorflow.python.client import session
 from tensorflow.python.framework import graph_util
@@ -354,16 +355,22 @@ def _export_inference_graph(input_type,
   if graph_hook_fn: graph_hook_fn()

+  saver_kwargs = {}
   if use_moving_averages:
-    temp_checkpoint_file = tempfile.NamedTemporaryFile()
+    # This check is to be compatible with both version of SaverDef.
+    if os.path.isfile(trained_checkpoint_prefix):
+      saver_kwargs['write_version'] = saver_pb2.SaverDef.V1
+      temp_checkpoint_prefix = tempfile.NamedTemporaryFile().name
+    else:
+      temp_checkpoint_prefix = tempfile.mkdtemp()
     replace_variable_values_with_moving_averages(
         tf.get_default_graph(), trained_checkpoint_prefix,
-        temp_checkpoint_file.name)
-    checkpoint_to_use = temp_checkpoint_file.name
+        temp_checkpoint_prefix)
+    checkpoint_to_use = temp_checkpoint_prefix
   else:
     checkpoint_to_use = trained_checkpoint_prefix

-  saver = tf.train.Saver()
+  saver = tf.train.Saver(**saver_kwargs)
   input_saver_def = saver.as_saver_def()

   _write_graph_and_checkpoint(
...
@@ -23,7 +23,7 @@ In the table below, we list each such pre-trained model including:
 * detector performance on subset of the COCO validation set or Open Images test split as measured by the dataset-specific mAP measure.
   Here, higher is better, and we only report bounding box mAP rounded to the
   nearest integer.
-* Output types (currently only `Boxes`)
+* Output types (`Boxes`, and `Masks` if applicable)

 You can un-tar each tar.gz file via, e.g.,:
@@ -55,7 +55,7 @@ Some remarks on frozen inference graphs:
   a detector (and discarding the part past that point), which negatively impacts
   standard mAP metrics.
 * Our frozen inference graphs are generated using the
-  [v1.4.0](https://github.com/tensorflow/tensorflow/tree/v1.4.0)
+  [v1.5.0](https://github.com/tensorflow/tensorflow/tree/v1.5.0)
   release version of Tensorflow and we do not guarantee that these will work
   with other versions; this being said, each frozen inference graph can be
   regenerated using your current version of Tensorflow by re-running the
@@ -69,16 +69,20 @@ Some remarks on frozen inference graphs:
 | ------------ | :--------------: | :--------------: | :-------------: |
 | [ssd_mobilenet_v1_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2017_11_17.tar.gz) | 30 | 21 | Boxes |
 | [ssd_inception_v2_coco](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2017_11_17.tar.gz) | 42 | 24 | Boxes |
-| [faster_rcnn_inception_v2_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2017_11_08.tar.gz) | 58 | 28 | Boxes |
-| [faster_rcnn_resnet50_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_coco_2017_11_08.tar.gz) | 89 | 30 | Boxes |
-| [faster_rcnn_resnet50_lowproposals_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_lowproposals_coco_2017_11_08.tar.gz) | 64 |  | Boxes |
-| [rfcn_resnet101_coco](http://download.tensorflow.org/models/object_detection/rfcn_resnet101_coco_2017_11_08.tar.gz) | 92 | 30 | Boxes |
-| [faster_rcnn_resnet101_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_2017_11_08.tar.gz) | 106 | 32 | Boxes |
-| [faster_rcnn_resnet101_lowproposals_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_lowproposals_coco_2017_11_08.tar.gz) | 82 |  | Boxes |
-| [faster_rcnn_inception_resnet_v2_atrous_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_coco_2017_11_08.tar.gz) | 620 | 37 | Boxes |
-| [faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco_2017_11_08.tar.gz) | 241 |  | Boxes |
-| [faster_rcnn_nas](http://download.tensorflow.org/models/object_detection/faster_rcnn_nas_coco_2017_11_08.tar.gz) | 1833 | 43 | Boxes |
-| [faster_rcnn_nas_lowproposals_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_nas_lowproposals_coco_2017_11_08.tar.gz) | 540 |  | Boxes |
+| [faster_rcnn_inception_v2_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz) | 58 | 28 | Boxes |
+| [faster_rcnn_resnet50_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_coco_2018_01_28.tar.gz) | 89 | 30 | Boxes |
+| [faster_rcnn_resnet50_lowproposals_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_lowproposals_coco_2018_01_28.tar.gz) | 64 |  | Boxes |
+| [rfcn_resnet101_coco](http://download.tensorflow.org/models/object_detection/rfcn_resnet101_coco_2018_01_28.tar.gz) | 92 | 30 | Boxes |
+| [faster_rcnn_resnet101_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_2018_01_28.tar.gz) | 106 | 32 | Boxes |
+| [faster_rcnn_resnet101_lowproposals_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_lowproposals_coco_2018_01_28.tar.gz) | 82 |  | Boxes |
+| [faster_rcnn_inception_resnet_v2_atrous_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz) | 620 | 37 | Boxes |
+| [faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco_2018_01_28.tar.gz) | 241 |  | Boxes |
+| [faster_rcnn_nas](http://download.tensorflow.org/models/object_detection/faster_rcnn_nas_coco_2018_01_28.tar.gz) | 1833 | 43 | Boxes |
+| [faster_rcnn_nas_lowproposals_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_nas_lowproposals_coco_2018_01_28.tar.gz) | 540 |  | Boxes |
+| [mask_rcnn_inception_resnet_v2_atrous_coco](http://download.tensorflow.org/models/object_detection/mask_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz) | 771 | 36 | Masks |
+| [mask_rcnn_inception_v2_coco](http://download.tensorflow.org/models/object_detection/mask_rcnn_inception_v2_coco_2018_01_28.tar.gz) | 79 | 25 | Masks |
+| [mask_rcnn_resnet101_atrous_coco](http://download.tensorflow.org/models/object_detection/mask_rcnn_resnet101_atrous_coco_2018_01_28.tar.gz) | 470 | 33 | Masks |
+| [mask_rcnn_resnet50_atrous_coco](http://download.tensorflow.org/models/object_detection/mask_rcnn_resnet50_atrous_coco_2018_01_28.tar.gz) | 343 | 29 | Masks |
@@ -86,14 +90,14 @@ Some remarks on frozen inference graphs:

 Model name | Speed (ms) | Pascal mAP@0.5 | Outputs
 ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---: | :-------------: | :-----:
-[faster_rcnn_resnet101_kitti](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_kitti_2017_11_08.tar.gz) | 79 | 87 | Boxes
+[faster_rcnn_resnet101_kitti](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_kitti_2018_01_28.tar.gz) | 79 | 87 | Boxes

 ## Open Images-trained models {#open-images-models}

 Model name | Speed (ms) | Open Images mAP@0.5[^2] | Outputs
 ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---: | :-------------: | :-----:
-[faster_rcnn_inception_resnet_v2_atrous_oid](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_oid_2017_11_08.tar.gz) | 727 | 37 | Boxes
+[faster_rcnn_inception_resnet_v2_atrous_oid](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_oid_2018_01_28.tar.gz) | 727 | 37 | Boxes
-[faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2017_11_08.tar.gz) | 347 |  | Boxes
+[faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2018_01_28.tar.gz) | 347 |  | Boxes

 [^1]: See [MSCOCO evaluation protocol](http://cocodataset.org/#detections-eval).
...
## Run an Instance Segmentation Model
For some applications, localizing an object with a simple bounding box is not
enough. For instance, you might want to segment the object region once it is
detected. This class of problems is called **instance segmentation**.
<p align="center">
<img src="img/kites_with_segment_overlay.png" width=676 height=450>
</p>
### Materializing data for instance segmentation {#materializing-instance-seg}
Instance segmentation is an extension of object detection, where a binary mask
(i.e. object vs. background) is associated with every bounding box. This allows
for more fine-grained information about the extent of the object within the box.
To train an instance segmentation model, a groundtruth mask must be supplied for
every groundtruth bounding box. In addition to the proto fields listed in the
section titled [Using your own dataset](using_your_own_dataset.md), one must
also supply `image/object/mask`, which can either be a repeated list of
single-channel encoded PNG strings, or a single dense 3D binary tensor where
masks corresponding to each object are stacked along the first dimension. Each
is described in more detail below.
#### PNG Instance Segmentation Masks
Instance segmentation masks can be supplied as serialized PNG images.
```shell
image/object/mask = ["\x89PNG\r\n\x1A\n\x00\x00\x00\rIHDR\...", ...]
```
These masks are whole-image masks, one for each object instance. The spatial
dimensions of each mask must agree with the image. Each mask has only a single
channel, and the pixel values are either 0 (background) or 1 (object mask).
**PNG masks are the preferred parameterization since they offer considerable
space savings compared to dense numerical masks.**
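For concreteness, the PNG strings above can be produced from binary masks along these lines (a sketch assuming PIL is available; the `masks` array is an illustrative placeholder):
```python
import io

import numpy as np
import PIL.Image

def encode_mask_png(mask):
  """Serializes a single [H, W] uint8 binary mask as a PNG string."""
  output = io.BytesIO()
  PIL.Image.fromarray(mask, mode='L').save(output, format='PNG')
  return output.getvalue()

masks = np.zeros([2, 480, 640], dtype=np.uint8)  # Illustrative placeholder.
png_masks = [encode_mask_png(m) for m in masks]  # Value for image/object/mask.
```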
#### Dense Numerical Instance Segmentation Masks
Masks can also be specified via a dense numerical tensor.
```shell
image/object/mask = [0.0, 0.0, 1.0, 1.0, 0.0, ...]
```
For an image with dimensions `H` x `W` and `num_boxes` groundtruth boxes, the
mask corresponds to a [`num_boxes`, `H`, `W`] float32 tensor, flattened into a
single vector of shape `num_boxes` * `H` * `W`. In TensorFlow, examples are read
in row-major format, so the elements are organized as:
```shell
... mask 0 row 0 ... mask 0 row 1 ... // ... mask 0 row H-1 ... mask 1 row 0 ...
```
where each row has W contiguous binary values.
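For example, flattening an illustrative `[num_boxes, H, W]` array with NumPy (whose default reshape order is row-major, matching the layout above):
```python
import numpy as np

masks = np.zeros([3, 480, 640], dtype=np.float32)  # Illustrative placeholder.
flat_masks = masks.reshape(-1).tolist()  # num_boxes * H * W float values.
```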
For example tf-records with mask labels, see the examples under the
[Preparing Inputs](preparing_inputs.md) section.
### Pre-existing config files
We provide four instance segmentation config files that you can use to train
your own models:
1. <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_inception_resnet_v2_atrous_coco.config" target=_blank>mask_rcnn_inception_resnet_v2_atrous_coco</a>
1. <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_resnet101_atrous_coco.config" target=_blank>mask_rcnn_resnet101_atrous_coco</a>
1. <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_resnet50_atrous_coco.config" target=_blank>mask_rcnn_resnet50_atrous_coco</a>
1. <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_inception_v2_coco.config" target=_blank>mask_rcnn_inception_v2_coco</a>
For more details see the [detection model zoo](detection_model_zoo.md).
### Updating a Faster R-CNN config file
Currently, the only supported instance segmentation model is [Mask
R-CNN](https://arxiv.org/abs/1703.06870), which requires Faster R-CNN as the
backbone object detector.
Once you have a baseline Faster R-CNN pipeline configuration, you can make the
following modifications in order to convert it into a Mask R-CNN model.
1. Within `train_input_reader` and `eval_input_reader`, set
`load_instance_masks` to `True`. If using PNG masks, set `mask_type` to
   `PNG_MASKS`; otherwise you can leave it as the default `NUMERICAL_MASKS`.
1. Within the `faster_rcnn` config, use a `MaskRCNNBoxPredictor` as the
`second_stage_box_predictor`.
1. Within the `MaskRCNNBoxPredictor` message, set `predict_instance_masks` to
`True`. You must also define `conv_hyperparams`.
1. Within the `faster_rcnn` message, set `number_of_stages` to `3`.
1. Add instance segmentation metrics to the set of metrics:
`'coco_mask_metrics'`.
1. Update the `input_path`s to point at your data.
Please refer to the section on [Running the pets dataset](running_pets.md) for
additional details; a sketch of applying these modifications programmatically
appears after the note below.
> Note: The mask prediction branch consists of a sequence of convolution layers.
> You can set the number of convolution layers and their depth as follows:
>
> 1. Within the `MaskRCNNBoxPredictor` message, set the
> `mask_prediction_conv_depth` to your value of interest. The default value
> is 256. If you set it to `0` (recommended), the depth is computed
> automatically based on the number of classes in the dataset.
> 1. Within the `MaskRCNNBoxPredictor` message, set the
> `mask_prediction_num_conv_layers` to your value of interest. The default
> value is 2.
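As referenced above, here is a hedged sketch of making these modifications programmatically instead of hand-editing the config file. The proto field names follow the steps above but should be verified against your version of the protos (in particular, `eval_input_reader` may be singular or repeated depending on the release); the config path is illustrative.
```python
from google.protobuf import text_format
from object_detection.protos import input_reader_pb2
from object_detection.protos import pipeline_pb2

pipeline = pipeline_pb2.TrainEvalPipelineConfig()
with open('faster_rcnn_baseline.config') as f:  # Illustrative path.
  text_format.Merge(f.read(), pipeline)

# Step 1: load instance masks (PNG-encoded here) in both input readers.
for reader in (pipeline.train_input_reader, pipeline.eval_input_reader):
  reader.load_instance_masks = True
  reader.mask_type = input_reader_pb2.PNG_MASKS

# Steps 2-3: use a MaskRCNNBoxPredictor that predicts instance masks.
# (Remember that `conv_hyperparams` must also be defined on this predictor.)
predictor = pipeline.model.faster_rcnn.second_stage_box_predictor
predictor.mask_rcnn_box_predictor.predict_instance_masks = True

# Step 4: run the third (mask) stage.
pipeline.model.faster_rcnn.number_of_stages = 3

# Step 5: add instance segmentation metrics.
pipeline.eval_config.metrics_set.append('coco_mask_metrics')
```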
@@ -319,6 +319,9 @@ instance segmentation pipeline. Everything above that was mentioned about object
 detection holds true for instance segmentation. Instance segmentation consists
 of an object detection model with an additional head that predicts the object
 mask inside each predicted box once we remove the training and other details.
+Please refer to the section on [Running an Instance Segmentation
+Model](instance_segmentation.md) for instructions on how to configure a model
+that predicts masks in addition to object bounding boxes.

 ## What's Next
...
@@ -103,7 +103,7 @@ FLAGS = flags.FLAGS

 def create_tf_example(example):
-  # TODO(user): Populate the following variables from your example.
+  # TODO: Populate the following variables from your example.
   height = None # Image height
   width = None # Image width
   filename = None # Filename of the image. Empty if image is not from file
@@ -139,7 +139,7 @@ def create_tf_example(example):

 def main(_):
   writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
-  # TODO(user): Write code to read in your dataset to examples variable
+  # TODO: Write code to read in your dataset to examples variable
   for example in examples:
     tf_example = create_tf_example(example)
@@ -155,3 +155,7 @@ if __name__ == '__main__':

 Note: You may notice additional fields in some other datasets. They are
 currently unused by the API and are optional.
+
+Note: Please refer to the section on [Running an Instance Segmentation
+Model](instance_segmentation.md) for instructions on how to configure a model
+that predicts masks in addition to object bounding boxes.
@@ -21,56 +21,183 @@ from __future__ import print_function
 import functools
 import tensorflow as tf

-from object_detection import trainer
 from object_detection.builders import dataset_builder
+from object_detection.builders import image_resizer_builder
+from object_detection.builders import model_builder
 from object_detection.builders import preprocessor_builder
-from object_detection.core import prefetcher
+from object_detection.core import preprocessor
 from object_detection.core import standard_fields as fields
 from object_detection.data_decoders import tf_example_decoder
 from object_detection.protos import eval_pb2
 from object_detection.protos import input_reader_pb2
+from object_detection.protos import model_pb2
 from object_detection.protos import train_pb2
+from object_detection.utils import config_util
 from object_detection.utils import dataset_util
 from object_detection.utils import ops as util_ops

-FEATURES_IMAGE = 'images'
-FEATURES_KEY = 'key'
+HASH_KEY = 'hash'
+HASH_BINS = 1 << 31
 SERVING_FED_EXAMPLE_KEY = 'serialized_example'


-def create_train_input_fn(num_classes, train_config, train_input_config):
+def transform_input_data(tensor_dict,
+                         model_preprocess_fn,
+                         image_resizer_fn,
+                         num_classes,
+                         data_augmentation_fn=None,
+                         merge_multiple_boxes=False,
+                         retain_original_image=False):
+  """A single function that is responsible for all input data transformations.
+
+  Data transformation functions are applied in the following order.
+  1. data_augmentation_fn (optional): applied on tensor_dict.
+  2. model_preprocess_fn: applied only on image tensor in tensor_dict.
+  3. image_resizer_fn: applied only on instance mask tensor in tensor_dict.
+  4. one_hot_encoding: applied to classes tensor in tensor_dict.
+  5. merge_multiple_boxes (optional): when groundtruth boxes are exactly the
+     same they can be merged into a single box with an associated k-hot class
+     label.
+
+  Args:
+    tensor_dict: dictionary containing input tensors keyed by
+      fields.InputDataFields.
+    model_preprocess_fn: model's preprocess function to apply on image tensor.
+      This function must take in a 4-D float tensor and return a 4-D preprocess
+      float tensor and a tensor containing the true image shape.
+    image_resizer_fn: image resizer function to apply on groundtruth instance
+      masks. This function must take a 4-D float tensor of image and a 4-D
+      tensor of instances masks and return resized version of these along with
+      the true shapes.
+    num_classes: number of max classes to one-hot (or k-hot) encode the class
+      labels.
+    data_augmentation_fn: (optional) data augmentation function to apply on
+      input `tensor_dict`.
+    merge_multiple_boxes: (optional) whether to merge multiple groundtruth boxes
+      and classes for a given image if the boxes are exactly the same.
+    retain_original_image: (optional) whether to retain original image in the
+      output dictionary.
+
+  Returns:
+    A dictionary keyed by fields.InputDataFields containing the tensors obtained
+    after applying all the transformations.
+  """
+  if retain_original_image:
+    tensor_dict[fields.InputDataFields.
+                original_image] = tensor_dict[fields.InputDataFields.image]
+
+  # Apply data augmentation ops.
+  if data_augmentation_fn is not None:
+    tensor_dict = data_augmentation_fn(tensor_dict)
+
+  # Apply model preprocessing ops and resize instance masks.
+  image = tf.expand_dims(
+      tf.to_float(tensor_dict[fields.InputDataFields.image]), axis=0)
+  preprocessed_resized_image, true_image_shape = model_preprocess_fn(image)
+  tensor_dict[fields.InputDataFields.image] = tf.squeeze(
+      preprocessed_resized_image, axis=0)
+  tensor_dict[fields.InputDataFields.true_image_shape] = tf.squeeze(
+      true_image_shape, axis=0)
+  if fields.InputDataFields.groundtruth_instance_masks in tensor_dict:
+    masks = tensor_dict[fields.InputDataFields.groundtruth_instance_masks]
+    _, resized_masks, _ = image_resizer_fn(image, masks)
+    tensor_dict[fields.InputDataFields.
+                groundtruth_instance_masks] = resized_masks
+
+  # Transform groundtruth classes to one hot encodings.
+  label_offset = 1
+  zero_indexed_groundtruth_classes = tensor_dict[
+      fields.InputDataFields.groundtruth_classes] - label_offset
+  tensor_dict[fields.InputDataFields.groundtruth_classes] = tf.one_hot(
+      zero_indexed_groundtruth_classes, num_classes)
+
+  if merge_multiple_boxes:
+    merged_boxes, merged_classes, _ = util_ops.merge_boxes_with_multiple_labels(
+        tensor_dict[fields.InputDataFields.groundtruth_boxes],
+        zero_indexed_groundtruth_classes, num_classes)
+    tensor_dict[fields.InputDataFields.groundtruth_boxes] = merged_boxes
+    tensor_dict[fields.InputDataFields.groundtruth_classes] = merged_classes
+
+  return tensor_dict
+
+
+def augment_input_data(tensor_dict, data_augmentation_options):
+  """Applies data augmentation ops to input tensors.
+
+  Args:
+    tensor_dict: A dictionary of input tensors keyed by fields.InputDataFields.
+    data_augmentation_options: A list of tuples, where each tuple contains a
+      function and a dictionary that contains arguments and their values.
+      Usually, this is the output of core/preprocessor.build.
+
+  Returns:
+    A dictionary of tensors obtained by applying data augmentation ops to the
+    input tensor dictionary.
+  """
+  tensor_dict[fields.InputDataFields.image] = tf.expand_dims(
+      tf.to_float(tensor_dict[fields.InputDataFields.image]), 0)
+
+  include_instance_masks = (fields.InputDataFields.groundtruth_instance_masks
+                            in tensor_dict)
+  include_keypoints = (fields.InputDataFields.groundtruth_keypoints
+                       in tensor_dict)
+  tensor_dict = preprocessor.preprocess(
+      tensor_dict, data_augmentation_options,
+      func_arg_map=preprocessor.get_default_func_arg_map(
+          include_instance_masks=include_instance_masks,
+          include_keypoints=include_keypoints))
+  tensor_dict[fields.InputDataFields.image] = tf.squeeze(
+      tensor_dict[fields.InputDataFields.image], axis=0)
+  return tensor_dict
+
+
+def create_train_input_fn(train_config, train_input_config,
+                          model_config):
   """Creates a train `input` function for `Estimator`.

   Args:
-    num_classes: Number of classes, which does not include a background
-      category.
     train_config: A train_pb2.TrainConfig.
     train_input_config: An input_reader_pb2.InputReader.
+    model_config: A model_pb2.DetectionModel.

   Returns:
     `input_fn` for `Estimator` in TRAIN mode.
   """

-  def _train_input_fn():
+  def _train_input_fn(params=None):
     """Returns `features` and `labels` tensor dictionaries for training.

+    Args:
+      params: Parameter dictionary passed from the estimator.
+
     Returns:
       features: Dictionary of feature tensors.
-        features['images'] is a list of N [1, H, W, C] float32 tensors,
-          where N is the number of images in a batch.
-        features['key'] is a list of N string tensors, each representing a
-          unique identifier for the image.
+        features[fields.InputDataFields.image] is a [batch_size, H, W, C]
+          float32 tensor with preprocessed images.
+        features[HASH_KEY] is a [batch_size] int32 tensor representing unique
+          identifiers for the images.
+        features[fields.InputDataFields.true_image_shape] is a [batch_size, 3]
+          int32 tensor representing the true image shapes, as preprocessed
+          images could be padded.
       labels: Dictionary of groundtruth tensors.
-        labels['locations_list'] is a list of N [num_boxes, 4] float32 tensors
-          containing the corners of the groundtruth boxes.
-        labels['classes_list'] is a list of N [num_boxes, num_classes] float32
-          padded one-hot tensors of classes.
-        labels['masks_list'] is a list of N [num_boxes, H, W] float32 tensors
-          containing only binary values, which represent instance masks for
-          objects if present in the dataset. Else returns None.
-        labels[fields.InputDataFields.groundtruth_weights] is a list of N
-          [num_boxes] float32 tensors containing groundtruth weights for the
-          boxes.
+        labels[fields.InputDataFields.num_groundtruth_boxes] is a [batch_size]
+          int32 tensor indicating the number of groundtruth boxes.
+        labels[fields.InputDataFields.groundtruth_boxes] is a
+          [batch_size, num_boxes, 4] float32 tensor containing the corners of
+          the groundtruth boxes.
+        labels[fields.InputDataFields.groundtruth_classes] is a
+          [batch_size, num_boxes, num_classes] float32 one-hot tensor of
+          classes.
+        labels[fields.InputDataFields.groundtruth_weights] is a
+          [batch_size, num_boxes] float32 tensor containing groundtruth weights
+          for the boxes.
+        -- Optional --
+        labels[fields.InputDataFields.groundtruth_instance_masks] is a
+          [batch_size, num_boxes, H, W] float32 tensor containing only binary
+          values, which represent instance masks for objects.
+        labels[fields.InputDataFields.groundtruth_keypoints] is a
+          [batch_size, num_boxes, num_keypoints, 2] float32 tensor containing
+          keypoints for each box.

     Raises:
       TypeError: if the `train_config` or `train_input_config` are not of the
@@ -82,164 +209,226 @@ def create_train_input_fn(num_classes, train_config, train_input_config):
     if not isinstance(train_input_config, input_reader_pb2.InputReader):
       raise TypeError('The `train_input_config` must be a '
                       'input_reader_pb2.InputReader.')
+    if not isinstance(model_config, model_pb2.DetectionModel):
+      raise TypeError('The `model_config` must be a '
+                      'model_pb2.DetectionModel.')

-    def get_next(config):
-      return dataset_util.make_initializable_iterator(
-          dataset_builder.build(config)).get_next()
-
-    create_tensor_dict_fn = functools.partial(get_next, train_input_config)
-
     data_augmentation_options = [
         preprocessor_builder.build(step)
         for step in train_config.data_augmentation_options
     ]
+    data_augmentation_fn = functools.partial(
+        augment_input_data, data_augmentation_options=data_augmentation_options)

-    input_queue = trainer.create_input_queue(
-        batch_size_per_clone=train_config.batch_size,
-        create_tensor_dict_fn=create_tensor_dict_fn,
-        batch_queue_capacity=train_config.batch_queue_capacity,
-        num_batch_queue_threads=train_config.num_batch_queue_threads,
-        prefetch_queue_capacity=train_config.prefetch_queue_capacity,
-        data_augmentation_options=data_augmentation_options)
-
-    (images_tuple, image_keys, locations_tuple, classes_tuple, masks_tuple,
-     keypoints_tuple, weights_tuple) = (trainer.get_inputs(
-         input_queue=input_queue, num_classes=num_classes))
+    model = model_builder.build(model_config, is_training=True)
+    image_resizer_config = config_util.get_image_resizer_config(model_config)
+    image_resizer_fn = image_resizer_builder.build(image_resizer_config)
+
+    transform_data_fn = functools.partial(
+        transform_input_data, model_preprocess_fn=model.preprocess,
+        image_resizer_fn=image_resizer_fn,
+        num_classes=config_util.get_number_of_classes(model_config),
+        data_augmentation_fn=data_augmentation_fn)
+    dataset = dataset_builder.build(
+        train_input_config,
+        transform_input_data_fn=transform_data_fn,
+        batch_size=params['batch_size'] if params else train_config.batch_size,
+        max_num_boxes=train_config.max_number_of_boxes,
+        num_classes=config_util.get_number_of_classes(model_config),
+        spatial_image_shape=config_util.get_spatial_image_size(
+            image_resizer_config))
+    tensor_dict = dataset_util.make_initializable_iterator(dataset).get_next()

+    hash_from_source_id = tf.string_to_hash_bucket_fast(
+        tensor_dict[fields.InputDataFields.source_id], HASH_BINS)
     features = {
-        FEATURES_IMAGE: list(images_tuple),
-        FEATURES_KEY: list(image_keys)
+        fields.InputDataFields.image: tensor_dict[fields.InputDataFields.image],
+        HASH_KEY: tf.cast(hash_from_source_id, tf.int32),
+        fields.InputDataFields.true_image_shape: tensor_dict[
+            fields.InputDataFields.true_image_shape]
     }

     labels = {
-        'locations_list': list(locations_tuple),
-        'classes_list': list(classes_tuple)
+        fields.InputDataFields.num_groundtruth_boxes: tensor_dict[
+            fields.InputDataFields.num_groundtruth_boxes],
+        fields.InputDataFields.groundtruth_boxes: tensor_dict[
+            fields.InputDataFields.groundtruth_boxes],
+        fields.InputDataFields.groundtruth_classes: tensor_dict[
+            fields.InputDataFields.groundtruth_classes],
+        fields.InputDataFields.groundtruth_weights: tensor_dict[
+            fields.InputDataFields.groundtruth_weights]
     }
-
-    # Make sure that there are no tuple elements with None.
-    if all(masks is not None for masks in masks_tuple):
-      labels['masks_list'] = list(masks_tuple)
-    if all(keypoints is not None for keypoints in keypoints_tuple):
-      labels['keypoints_list'] = list(keypoints_tuple)
-    if all((elem is not None for elem in weights_tuple)):
-      labels[fields.InputDataFields.groundtruth_weights] = list(weights_tuple)
+    if fields.InputDataFields.groundtruth_keypoints in tensor_dict:
+      labels[fields.InputDataFields.groundtruth_keypoints] = tensor_dict[
+          fields.InputDataFields.groundtruth_keypoints]
+    if fields.InputDataFields.groundtruth_instance_masks in tensor_dict:
+      labels[fields.InputDataFields.groundtruth_instance_masks] = tensor_dict[
+          fields.InputDataFields.groundtruth_instance_masks]

     return features, labels

   return _train_input_fn


-def create_eval_input_fn(num_classes, eval_config, eval_input_config):
+def create_eval_input_fn(eval_config, eval_input_config, model_config):
   """Creates an eval `input` function for `Estimator`.

   Args:
-    num_classes: Number of classes, which does not include a background
-      category.
     eval_config: An eval_pb2.EvalConfig.
     eval_input_config: An input_reader_pb2.InputReader.
+    model_config: A model_pb2.DetectionModel.

   Returns:
     `input_fn` for `Estimator` in EVAL mode.
   """

-  def _eval_input_fn():
+  def _eval_input_fn(params=None):
     """Returns `features` and `labels` tensor dictionaries for evaluation.

+    Args:
+      params: Parameter dictionary passed from the estimator.
+
     Returns:
       features: Dictionary of feature tensors.
-        features['images'] is a [1, H, W, C] float32 tensor.
-        features['key'] is a string tensor representing a unique identifier for
-          the image.
+        features[fields.InputDataFields.image] is a [1, H, W, C] float32 tensor
+          with preprocessed images.
+        features[HASH_KEY] is a [1] int32 tensor representing unique
+          identifiers for the images.
+        features[fields.InputDataFields.true_image_shape] is a [1, 3]
+          int32 tensor representing the true image shapes, as preprocessed
+          images could be padded.
+        features[fields.InputDataFields.original_image] is a [1, H', W', C]
+          float32 tensor with the original image.
       labels: Dictionary of groundtruth tensors.
-        labels['locations_list'] is a list of 1 [num_boxes, 4] float32 tensors
-          containing the corners of the groundtruth boxes.
-        labels['classes_list'] is a list of 1 [num_boxes, num_classes] float32
-          padded one-hot tensors of classes.
-        labels['masks_list'] is an (optional) list of 1 [num_boxes, H, W]
-          float32 tensors containing only binary values, which represent
-          instance masks for objects if present in the dataset. Else returns
-          None.
-        labels['image_id_list'] is a list of 1 string tensors containing the
-          original image id.
-        labels['area_list'] is a list of 1 [num_boxes] float32 tensors
-          containing object mask area in pixels squared.
-        labels['is_crowd_list'] is a list of 1 [num_boxes] bool tensors
-          indicating if the boxes enclose a crowd.
-        labels['difficult_list'] is a list of 1 [num_boxes] bool tensors
-          indicating if the boxes represent `difficult` instances.
+        labels[fields.InputDataFields.groundtruth_boxes] is a [1, num_boxes, 4]
+          float32 tensor containing the corners of the groundtruth boxes.
+        labels[fields.InputDataFields.groundtruth_classes] is a
+          [num_boxes, num_classes] float32 one-hot tensor of classes.
+        labels[fields.InputDataFields.groundtruth_area] is a [1, num_boxes]
+          float32 tensor containing object areas.
+        labels[fields.InputDataFields.groundtruth_is_crowd] is a [1, num_boxes]
+          bool tensor indicating if the boxes enclose a crowd.
+        labels[fields.InputDataFields.groundtruth_difficult] is a [1, num_boxes]
+          int32 tensor indicating if the boxes represent difficult instances.
+        -- Optional --
+        labels[fields.InputDataFields.groundtruth_instance_masks] is a
+          [1, num_boxes, H, W] float32 tensor containing only binary values,
+          which represent instance masks for objects.

     Raises:
       TypeError: if the `eval_config` or `eval_input_config` are not of the
         correct type.
     """
+    del params
     if not isinstance(eval_config, eval_pb2.EvalConfig):
       raise TypeError('For eval mode, the `eval_config` must be a '
-                      'eval_pb2.EvalConfig.')
+                      'train_pb2.EvalConfig.')
     if not isinstance(eval_input_config, input_reader_pb2.InputReader):
       raise TypeError('The `eval_input_config` must be a '
                       'input_reader_pb2.InputReader.')
+    if not isinstance(model_config, model_pb2.DetectionModel):
+      raise TypeError('The `model_config` must be a '
+                      'model_pb2.DetectionModel.')

+    num_classes = config_util.get_number_of_classes(model_config)
+    model = model_builder.build(model_config, is_training=False)
+    image_resizer_config = config_util.get_image_resizer_config(model_config)
+    image_resizer_fn = image_resizer_builder.build(image_resizer_config)
+
+    transform_data_fn = functools.partial(
+        transform_input_data, model_preprocess_fn=model.preprocess,
+        image_resizer_fn=image_resizer_fn,
+        num_classes=num_classes,
+        data_augmentation_fn=None,
+        retain_original_image=True)
+    dataset = dataset_builder.build(eval_input_config,
+                                    transform_input_data_fn=transform_data_fn)
+    input_dict = dataset_util.make_initializable_iterator(dataset).get_next()

-    input_dict = dataset_util.make_initializable_iterator(
-        dataset_builder.build(eval_input_config)).get_next()
-    prefetch_queue = prefetcher.prefetch(input_dict, capacity=500)
-    input_dict = prefetch_queue.dequeue()
-    original_image = tf.to_float(
-        tf.expand_dims(input_dict[fields.InputDataFields.image], 0))
-    features = {}
-    features[FEATURES_IMAGE] = original_image
-    features[FEATURES_KEY] = input_dict[fields.InputDataFields.source_id]
-
-    labels = {}
-    labels['locations_list'] = [
-        input_dict[fields.InputDataFields.groundtruth_boxes]
-    ]
-    classes_gt = tf.cast(input_dict[fields.InputDataFields.groundtruth_classes],
-                         tf.int32)
-    classes_gt -= 1  # Remove the label id offset.
-    labels['classes_list'] = [
-        util_ops.padded_one_hot_encoding(
-            indices=classes_gt, depth=num_classes, left_pad=0)
-    ]
-    labels['image_id_list'] = [input_dict[fields.InputDataFields.source_id]]
-    labels['area_list'] = [input_dict[fields.InputDataFields.groundtruth_area]]
-    labels['is_crowd_list'] = [
-        input_dict[fields.InputDataFields.groundtruth_is_crowd]
-    ]
-    labels['difficult_list'] = [
-        input_dict[fields.InputDataFields.groundtruth_difficult]
-    ]
+    hash_from_source_id = tf.string_to_hash_bucket_fast(
+        input_dict[fields.InputDataFields.source_id], HASH_BINS)
+    features = {
+        fields.InputDataFields.image:
+            input_dict[fields.InputDataFields.image],
+        fields.InputDataFields.original_image:
+            input_dict[fields.InputDataFields.original_image],
+        HASH_KEY: tf.cast(hash_from_source_id, tf.int32),
+        fields.InputDataFields.true_image_shape:
+            input_dict[fields.InputDataFields.true_image_shape]
+    }
+
+    labels = {
+        fields.InputDataFields.groundtruth_boxes:
+            input_dict[fields.InputDataFields.groundtruth_boxes],
+        fields.InputDataFields.groundtruth_classes:
+            input_dict[fields.InputDataFields.groundtruth_classes],
+        fields.InputDataFields.groundtruth_area:
+            input_dict[fields.InputDataFields.groundtruth_area],
+        fields.InputDataFields.groundtruth_is_crowd:
+            input_dict[fields.InputDataFields.groundtruth_is_crowd],
+        fields.InputDataFields.groundtruth_difficult:
+            tf.cast(input_dict[fields.InputDataFields.groundtruth_difficult],
                    tf.int32)
+    }

     if fields.InputDataFields.groundtruth_instance_masks in input_dict:
-      labels['masks_list'] = [
-          input_dict[fields.InputDataFields.groundtruth_instance_masks]
-      ]
+      labels[fields.InputDataFields.groundtruth_instance_masks] = input_dict[
+          fields.InputDataFields.groundtruth_instance_masks]
+
+    # Add a batch dimension to the tensors.
+    features = {
+        key: tf.expand_dims(features[key], axis=0)
+        for key, feature in features.items()
+    }
+    labels = {
+        key: tf.expand_dims(labels[key], axis=0)
+        for key, label in labels.items()
+    }

     return features, labels

   return _eval_input_fn


-def create_predict_input_fn():
+def create_predict_input_fn(model_config):
   """Creates a predict `input` function for `Estimator`.

+  Args:
+    model_config: A model_pb2.DetectionModel.
+
   Returns:
     `input_fn` for `Estimator` in PREDICT mode.
   """

-  def _predict_input_fn():
+  def _predict_input_fn(params=None):
     """Decodes serialized tf.Examples and returns `ServingInputReceiver`.

+    Args:
+      params: Parameter dictionary passed from the estimator.
+
     Returns:
       `ServingInputReceiver`.
     """
+    del params
     example = tf.placeholder(dtype=tf.string, shape=[], name='input_feature')
-    decoder = tf_example_decoder.TfExampleDecoder(load_instance_masks=False)
-
-    input_dict = decoder.decode(example)
+    num_classes = config_util.get_number_of_classes(model_config)
+    model = model_builder.build(model_config, is_training=False)
+    image_resizer_config = config_util.get_image_resizer_config(model_config)
+    image_resizer_fn = image_resizer_builder.build(image_resizer_config)
+
+    transform_fn = functools.partial(
+        transform_input_data, model_preprocess_fn=model.preprocess,
+        image_resizer_fn=image_resizer_fn,
+        num_classes=num_classes,
+        data_augmentation_fn=None)
+
+    decoder = tf_example_decoder.TfExampleDecoder(load_instance_masks=False)
+    input_dict = transform_fn(decoder.decode(example))
     images = tf.to_float(input_dict[fields.InputDataFields.image])
     images = tf.expand_dims(images, axis=0)

     return tf.estimator.export.ServingInputReceiver(
-        features={FEATURES_IMAGE: images},
+        features={fields.InputDataFields.image: images},
         receiver_tensors={SERVING_FED_EXAMPLE_KEY: example})

   return _predict_input_fn
...@@ -18,11 +18,14 @@ from __future__ import absolute_import ...@@ -18,11 +18,14 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import functools
import os import os
import numpy as np
import tensorflow as tf import tensorflow as tf
from object_detection import inputs from object_detection import inputs
from object_detection.core import preprocessor
from object_detection.core import standard_fields as fields from object_detection.core import standard_fields as fields
from object_detection.utils import config_util from object_detection.utils import config_util
...@@ -52,148 +55,516 @@ def _get_configs_for_model(model_name): ...@@ -52,148 +55,516 @@ def _get_configs_for_model(model_name):
class InputsTest(tf.test.TestCase): class InputsTest(tf.test.TestCase):
def _assert_training_inputs(self, features, labels, num_classes, batch_size):
self.assertEqual(batch_size, len(features['images']))
self.assertEqual(batch_size, len(features['key']))
self.assertEqual(batch_size, len(labels['locations_list']))
self.assertEqual(batch_size, len(labels['classes_list']))
for i in range(batch_size):
image = features['images'][i]
key = features['key'][i]
locations_list = labels['locations_list'][i]
classes_list = labels['classes_list'][i]
weights_list = labels[fields.InputDataFields.groundtruth_weights][i]
self.assertEqual([1, None, None, 3], image.shape.as_list())
self.assertEqual(tf.float32, image.dtype)
self.assertEqual(tf.string, key.dtype)
self.assertEqual([None, 4], locations_list.shape.as_list())
self.assertEqual(tf.float32, locations_list.dtype)
self.assertEqual([None, num_classes], classes_list.shape.as_list())
self.assertEqual(tf.float32, classes_list.dtype)
self.assertEqual([None], weights_list.shape.as_list())
self.assertEqual(tf.float32, weights_list.dtype)
def _assert_eval_inputs(self, features, labels, num_classes):
self.assertEqual(1, len(labels['locations_list']))
self.assertEqual(1, len(labels['classes_list']))
self.assertEqual(1, len(labels['image_id_list']))
self.assertEqual(1, len(labels['area_list']))
self.assertEqual(1, len(labels['is_crowd_list']))
self.assertEqual(1, len(labels['difficult_list']))
image = features['images']
key = features['key']
locations_list = labels['locations_list'][0]
classes_list = labels['classes_list'][0]
image_id_list = labels['image_id_list'][0]
area_list = labels['area_list'][0]
is_crowd_list = labels['is_crowd_list'][0]
difficult_list = labels['difficult_list'][0]
self.assertEqual([1, None, None, 3], image.shape.as_list())
self.assertEqual(tf.float32, image.dtype)
self.assertEqual(tf.string, key.dtype)
self.assertEqual([None, 4], locations_list.shape.as_list())
self.assertEqual(tf.float32, locations_list.dtype)
self.assertEqual([None, num_classes], classes_list.shape.as_list())
self.assertEqual(tf.float32, classes_list.dtype)
self.assertEqual(tf.string, image_id_list.dtype)
self.assertEqual(tf.float32, area_list.dtype)
self.assertEqual(tf.bool, is_crowd_list.dtype)
self.assertEqual(tf.int64, difficult_list.dtype)
def test_faster_rcnn_resnet50_train_input(self): def test_faster_rcnn_resnet50_train_input(self):
"""Tests the training input function for FasterRcnnResnet50.""" """Tests the training input function for FasterRcnnResnet50."""
configs = _get_configs_for_model('faster_rcnn_resnet50_pets') configs = _get_configs_for_model('faster_rcnn_resnet50_pets')
classes = 37 configs['train_config'].unpad_groundtruth_tensors = True
batch_size = configs['train_config'].batch_size model_config = configs['model']
model_config.faster_rcnn.num_classes = 37
train_input_fn = inputs.create_train_input_fn( train_input_fn = inputs.create_train_input_fn(
classes, configs['train_config'], configs['train_input_config']) configs['train_config'], configs['train_input_config'], model_config)
features, labels = train_input_fn() features, labels = train_input_fn()
self._assert_training_inputs(features, labels, classes, batch_size)
self.assertAllEqual([None, None, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual([],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[None, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[None, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[None],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
def test_faster_rcnn_resnet50_eval_input(self): def test_faster_rcnn_resnet50_eval_input(self):
"""Tests the eval input function for FasterRcnnResnet50.""" """Tests the eval input function for FasterRcnnResnet50."""
configs = _get_configs_for_model('faster_rcnn_resnet50_pets') configs = _get_configs_for_model('faster_rcnn_resnet50_pets')
classes = 37 model_config = configs['model']
eval_input_fn = inputs.create_eval_input_fn(classes, configs['eval_config'], model_config.faster_rcnn.num_classes = 37
configs['eval_input_config']) eval_input_fn = inputs.create_eval_input_fn(
configs['eval_config'], configs['eval_input_config'], model_config)
features, labels = eval_input_fn() features, labels = eval_input_fn()
self._assert_eval_inputs(features, labels, classes)
self.assertAllEqual([1, None, None, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual(
[1, None, None, 3],
features[fields.InputDataFields.original_image].shape.as_list())
self.assertEqual(tf.uint8,
features[fields.InputDataFields.original_image].dtype)
self.assertAllEqual([1], features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[1, None, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[1, None, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_area].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_area].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_is_crowd].shape.as_list())
self.assertEqual(
tf.bool, labels[fields.InputDataFields.groundtruth_is_crowd].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_difficult].shape.as_list())
self.assertEqual(
tf.int32, labels[fields.InputDataFields.groundtruth_difficult].dtype)
def test_ssd_inceptionV2_train_input(self): def test_ssd_inceptionV2_train_input(self):
"""Tests the training input function for SSDInceptionV2.""" """Tests the training input function for SSDInceptionV2."""
configs = _get_configs_for_model('ssd_inception_v2_pets') configs = _get_configs_for_model('ssd_inception_v2_pets')
classes = 37 model_config = configs['model']
model_config.ssd.num_classes = 37
batch_size = configs['train_config'].batch_size batch_size = configs['train_config'].batch_size
train_input_fn = inputs.create_train_input_fn( train_input_fn = inputs.create_train_input_fn(
classes, configs['train_config'], configs['train_input_config']) configs['train_config'], configs['train_input_config'], model_config)
features, labels = train_input_fn() features, labels = train_input_fn()
self._assert_training_inputs(features, labels, classes, batch_size)
self.assertAllEqual([batch_size, 300, 300, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual([batch_size],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[batch_size],
labels[fields.InputDataFields.num_groundtruth_boxes].shape.as_list())
self.assertEqual(tf.int32,
labels[fields.InputDataFields.num_groundtruth_boxes].dtype)
self.assertAllEqual(
[batch_size, 50, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[batch_size, 50, model_config.ssd.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[batch_size, 50],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
  def test_ssd_inceptionV2_eval_input(self):
    """Tests the eval input function for SSDInceptionV2."""
    configs = _get_configs_for_model('ssd_inception_v2_pets')
    model_config = configs['model']
    model_config.ssd.num_classes = 37
    eval_input_fn = inputs.create_eval_input_fn(
        configs['eval_config'], configs['eval_input_config'], model_config)
    features, labels = eval_input_fn()
self.assertAllEqual([1, 300, 300, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual(
[1, None, None, 3],
features[fields.InputDataFields.original_image].shape.as_list())
self.assertEqual(tf.uint8,
features[fields.InputDataFields.original_image].dtype)
self.assertAllEqual([1], features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[1, None, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[1, None, model_config.ssd.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_area].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_area].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_is_crowd].shape.as_list())
self.assertEqual(
tf.bool, labels[fields.InputDataFields.groundtruth_is_crowd].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_difficult].shape.as_list())
self.assertEqual(
tf.int32, labels[fields.InputDataFields.groundtruth_difficult].dtype)
  def test_predict_input(self):
    """Tests the predict input function."""
    configs = _get_configs_for_model('ssd_inception_v2_pets')
    predict_input_fn = inputs.create_predict_input_fn(
        model_config=configs['model'])
    serving_input_receiver = predict_input_fn()
    image = serving_input_receiver.features[fields.InputDataFields.image]
    receiver_tensors = serving_input_receiver.receiver_tensors[
        inputs.SERVING_FED_EXAMPLE_KEY]
    self.assertEqual([1, 300, 300, 3], image.shape.as_list())
    self.assertEqual(tf.float32, image.dtype)
    self.assertEqual(tf.string, receiver_tensors.dtype)
  def test_error_with_bad_train_config(self):
    """Tests that a TypeError is raised with improper train config."""
    configs = _get_configs_for_model('ssd_inception_v2_pets')
    configs['model'].ssd.num_classes = 37
    train_input_fn = inputs.create_train_input_fn(
        train_config=configs['eval_config'],  # Expecting `TrainConfig`.
        train_input_config=configs['train_input_config'],
        model_config=configs['model'])
    with self.assertRaises(TypeError):
      train_input_fn()
  def test_error_with_bad_train_input_config(self):
    """Tests that a TypeError is raised with improper train input config."""
    configs = _get_configs_for_model('ssd_inception_v2_pets')
    configs['model'].ssd.num_classes = 37
    train_input_fn = inputs.create_train_input_fn(
        train_config=configs['train_config'],
        train_input_config=configs['model'],  # Expecting `InputReader`.
        model_config=configs['model'])
    with self.assertRaises(TypeError):
      train_input_fn()

  def test_error_with_bad_train_model_config(self):
    """Tests that a TypeError is raised with improper train model config."""
    configs = _get_configs_for_model('ssd_inception_v2_pets')
    configs['model'].ssd.num_classes = 37
    train_input_fn = inputs.create_train_input_fn(
        train_config=configs['train_config'],
        train_input_config=configs['train_input_config'],
        model_config=configs['train_config'])  # Expecting `DetectionModel`.
    with self.assertRaises(TypeError):
      train_input_fn()
  def test_error_with_bad_eval_config(self):
    """Tests that a TypeError is raised with improper eval config."""
    configs = _get_configs_for_model('ssd_inception_v2_pets')
    configs['model'].ssd.num_classes = 37
    eval_input_fn = inputs.create_eval_input_fn(
        eval_config=configs['train_config'],  # Expecting `EvalConfig`.
        eval_input_config=configs['eval_input_config'],
        model_config=configs['model'])
    with self.assertRaises(TypeError):
      eval_input_fn()

  def test_error_with_bad_eval_input_config(self):
    """Tests that a TypeError is raised with improper eval input config."""
    configs = _get_configs_for_model('ssd_inception_v2_pets')
    configs['model'].ssd.num_classes = 37
    eval_input_fn = inputs.create_eval_input_fn(
        eval_config=configs['eval_config'],
        eval_input_config=configs['model'],  # Expecting `InputReader`.
        model_config=configs['model'])
    with self.assertRaises(TypeError):
      eval_input_fn()
def test_error_with_bad_eval_model_config(self):
"""Tests that a TypeError is raised with improper eval model config."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
configs['model'].ssd.num_classes = 37
eval_input_fn = inputs.create_eval_input_fn(
eval_config=configs['eval_config'],
eval_input_config=configs['eval_input_config'],
model_config=configs['eval_config']) # Expecting `DetectionModel`.
with self.assertRaises(TypeError):
eval_input_fn()
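# The input-fn tests above expect features[inputs.HASH_KEY] to be an int32
# hash that replaces the string source_id in the features dictionary. A
# minimal sketch of such a hashing step (the bucket size is illustrative, not
# necessarily the constant inputs.py uses):
def _hash_source_id_sketch(source_id):
  # Map an arbitrary string id tensor to a stable int32 bucket.
  return tf.cast(
      tf.string_to_hash_bucket_fast(source_id, 2 ** 31 - 1), tf.int32)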
class DataAugmentationFnTest(tf.test.TestCase):
def test_apply_image_and_box_augmentation(self):
data_augmentation_options = [
(preprocessor.resize_image, {
'new_height': 20,
'new_width': 20,
'method': tf.image.ResizeMethod.NEAREST_NEIGHBOR
}),
(preprocessor.scale_boxes_to_pixel_coordinates, {}),
]
data_augmentation_fn = functools.partial(
inputs.augment_input_data,
data_augmentation_options=data_augmentation_options)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(10, 10, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1., 1.]], np.float32))
}
augmented_tensor_dict = data_augmentation_fn(tensor_dict=tensor_dict)
with self.test_session() as sess:
augmented_tensor_dict_out = sess.run(augmented_tensor_dict)
self.assertAllEqual(
augmented_tensor_dict_out[fields.InputDataFields.image].shape,
[20, 20, 3]
)
self.assertAllClose(
augmented_tensor_dict_out[fields.InputDataFields.groundtruth_boxes],
[[10, 10, 20, 20]]
)
def test_include_masks_in_data_augmentation(self):
data_augmentation_options = [
(preprocessor.resize_image, {
'new_height': 20,
'new_width': 20,
'method': tf.image.ResizeMethod.NEAREST_NEIGHBOR
})
]
data_augmentation_fn = functools.partial(
inputs.augment_input_data,
data_augmentation_options=data_augmentation_options)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(10, 10, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_instance_masks:
tf.constant(np.zeros([2, 10, 10], np.uint8))
}
augmented_tensor_dict = data_augmentation_fn(tensor_dict=tensor_dict)
with self.test_session() as sess:
augmented_tensor_dict_out = sess.run(augmented_tensor_dict)
self.assertAllEqual(
augmented_tensor_dict_out[fields.InputDataFields.image].shape,
[20, 20, 3])
self.assertAllEqual(augmented_tensor_dict_out[
fields.InputDataFields.groundtruth_instance_masks].shape, [2, 20, 20])
def test_include_keypoints_in_data_augmentation(self):
data_augmentation_options = [
(preprocessor.resize_image, {
'new_height': 20,
'new_width': 20,
'method': tf.image.ResizeMethod.NEAREST_NEIGHBOR
}),
(preprocessor.scale_boxes_to_pixel_coordinates, {}),
]
data_augmentation_fn = functools.partial(
inputs.augment_input_data,
data_augmentation_options=data_augmentation_options)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(10, 10, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1., 1.]], np.float32)),
fields.InputDataFields.groundtruth_keypoints:
tf.constant(np.array([[[0.5, 1.0], [0.5, 0.5]]], np.float32))
}
augmented_tensor_dict = data_augmentation_fn(tensor_dict=tensor_dict)
with self.test_session() as sess:
augmented_tensor_dict_out = sess.run(augmented_tensor_dict)
self.assertAllEqual(
augmented_tensor_dict_out[fields.InputDataFields.image].shape,
[20, 20, 3]
)
self.assertAllClose(
augmented_tensor_dict_out[fields.InputDataFields.groundtruth_boxes],
[[10, 10, 20, 20]]
)
self.assertAllClose(
augmented_tensor_dict_out[fields.InputDataFields.groundtruth_keypoints],
[[[10, 20], [10, 10]]]
)
def _fake_model_preprocessor_fn(image):
return (image, tf.expand_dims(tf.shape(image)[1:], axis=0))
def _fake_image_resizer_fn(image, mask):
return (image, mask, tf.shape(image))
class DataTransformationFnTest(tf.test.TestCase):
def test_returns_correct_class_label_encodings(self):
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(4, 4, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[0, 0, 1, 1], [.5, .5, 1, 1]], np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_classes],
[[0, 0, 1], [1, 0, 0]])
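  # For reference: transform_input_data one-hot encodes the 1-indexed class
  # labels, i.e. tf.one_hot(labels - 1, depth=num_classes), so [3, 1] with
  # num_classes=3 yields [[0, 0, 1], [1, 0, 0]] as asserted above (a sketch of
  # the expected behavior, not necessarily the library's exact implementation).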
def test_returns_correct_merged_boxes(self):
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(4, 4, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1, 1], [.5, .5, 1, 1]], np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes,
merge_multiple_boxes=True)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_boxes],
[[.5, .5, 1., 1.]])
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_classes],
[[1, 0, 1]])
def test_returns_resized_masks(self):
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(4, 4, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_instance_masks:
tf.constant(np.random.rand(2, 4, 4).astype(np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
def fake_image_resizer_fn(image, masks):
resized_image = tf.image.resize_images(image, [8, 8])
resized_masks = tf.transpose(
tf.image.resize_images(tf.transpose(masks, [1, 2, 0]), [8, 8]),
[2, 0, 1])
return resized_image, resized_masks, tf.shape(resized_image)
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=fake_image_resizer_fn,
num_classes=num_classes)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllEqual(transformed_inputs[
fields.InputDataFields.groundtruth_instance_masks].shape, [2, 8, 8])
def test_applies_model_preprocess_fn_to_image_tensor(self):
np_image = np.random.randint(256, size=(4, 4, 3))
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np_image),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
def fake_model_preprocessor_fn(image):
return (image / 255., tf.expand_dims(tf.shape(image)[1:], axis=0))
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(transformed_inputs[fields.InputDataFields.image],
np_image / 255.)
self.assertAllClose(transformed_inputs[fields.InputDataFields.
true_image_shape],
[4, 4, 3])
def test_applies_data_augmentation_fn_to_tensor_dict(self):
np_image = np.random.randint(256, size=(4, 4, 3))
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np_image),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
def add_one_data_augmentation_fn(tensor_dict):
return {key: value + 1 for key, value in tensor_dict.items()}
num_classes = 4
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=add_one_data_augmentation_fn)
with self.test_session() as sess:
augmented_tensor_dict = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllEqual(augmented_tensor_dict[fields.InputDataFields.image],
np_image + 1)
self.assertAllEqual(
augmented_tensor_dict[fields.InputDataFields.groundtruth_classes],
[[0, 0, 0, 1], [0, 1, 0, 0]])
def test_applies_data_augmentation_fn_before_model_preprocess_fn(self):
np_image = np.random.randint(256, size=(4, 4, 3))
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np_image),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
def mul_two_model_preprocessor_fn(image):
return (image * 2, tf.expand_dims(tf.shape(image)[1:], axis=0))
def add_five_to_image_data_augmentation_fn(tensor_dict):
tensor_dict[fields.InputDataFields.image] += 5
return tensor_dict
num_classes = 4
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=mul_two_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=add_five_to_image_data_augmentation_fn)
with self.test_session() as sess:
augmented_tensor_dict = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllEqual(augmented_tensor_dict[fields.InputDataFields.image],
(np_image + 5) * 2)
if __name__ == '__main__':
  tf.test.main()
...@@ -55,7 +55,8 @@ class ArgMaxMatcher(matcher.Matcher):
               matched_threshold,
               unmatched_threshold=None,
               negatives_lower_than_unmatched=True,
               force_match_for_each_row=False,
               use_matmul_gather=False):
    """Construct ArgMaxMatcher.

    Args:
...@@ -74,11 +75,15 @@ class ArgMaxMatcher(matcher.Matcher):
        at least one column (which is not guaranteed otherwise if the
        matched_threshold is high). Defaults to False. See
        argmax_matcher_test.testMatcherForceMatch() for an example.
      use_matmul_gather: Force constructed match objects to use matrix
        multiplication based gather instead of standard tf.gather.
        (Default: False).

    Raises:
      ValueError: if unmatched_threshold is set but matched_threshold is not set
        or if unmatched_threshold > matched_threshold.
    """
    super(ArgMaxMatcher, self).__init__(use_matmul_gather=use_matmul_gather)
    if (matched_threshold is None) and (unmatched_threshold is not None):
      raise ValueError('Need to also define matched_threshold when '
                       'unmatched_threshold is defined')
...
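# A minimal sketch of the matmul-based gather enabled above (illustrative
# only; the library's actual helper may differ in name and generality).
# Assumes `import tensorflow as tf` and a rank-2 `params` tensor:
def _matmul_gather_sketch(params, indices):
  # Selecting rows of `params` equals multiplying by one-hot index rows,
  # which avoids tf.gather where it is poorly supported (e.g. on TPU).
  one_hot = tf.one_hot(indices, tf.shape(params)[0], dtype=params.dtype)
  return tf.matmul(one_hot, params)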
...@@ -24,6 +24,17 @@ from object_detection.core import matcher
class GreedyBipartiteMatcher(matcher.Matcher):
  """Wraps a Tensorflow greedy bipartite matcher."""

  def __init__(self, use_matmul_gather=False):
    """Constructs a Matcher.

    Args:
      use_matmul_gather: Force constructed match objects to use matrix
        multiplication based gather instead of standard tf.gather.
        (Default: False).
    """
    super(GreedyBipartiteMatcher, self).__init__(
        use_matmul_gather=use_matmul_gather)

  def _match(self, similarity_matrix, num_valid_rows=-1):
    """Bipartite matches a collection of rows and columns. A greedy bi-partite.
...
...@@ -251,7 +251,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
               second_stage_classification_loss,
               second_stage_mask_prediction_loss_weight=1.0,
               hard_example_miner=None,
               parallel_iterations=16,
               add_summaries=True):
    """FasterRCNNMetaArch Constructor.

    Args:
...@@ -355,12 +356,17 @@ class FasterRCNNMetaArch(model.DetectionModel):
      hard_example_miner: A losses.HardExampleMiner object (can be None).
      parallel_iterations: (Optional) The number of iterations allowed to run
        in parallel for calls to tf.map_fn.
      add_summaries: boolean (default: True) controlling whether summary ops
        should be added to tensorflow graph.

    Raises:
      ValueError: If `second_stage_batch_size` > `first_stage_max_proposals` at
        training time.
      ValueError: If first_stage_anchor_generator is not of type
        grid_anchor_generator.GridAnchorGenerator.
    """
    # TODO: add_summaries is currently unused. Respect that directive
    # in the future.
    super(FasterRCNNMetaArch, self).__init__(num_classes=num_classes)
    if is_training and second_stage_batch_size > first_stage_max_proposals:
...
...@@ -75,7 +75,8 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
               second_stage_classification_loss_weight,
               second_stage_classification_loss,
               hard_example_miner,
               parallel_iterations=16,
               add_summaries=True):
    """RFCNMetaArch Constructor.

    Args:
...@@ -155,11 +156,16 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
      hard_example_miner: A losses.HardExampleMiner object (can be None).
      parallel_iterations: (Optional) The number of iterations allowed to run
        in parallel for calls to tf.map_fn.
      add_summaries: boolean (default: True) controlling whether summary ops
        should be added to tensorflow graph.

    Raises:
      ValueError: If `second_stage_batch_size` > `first_stage_max_proposals`
      ValueError: If first_stage_anchor_generator is not of type
        grid_anchor_generator.GridAnchorGenerator.
    """
    # TODO: add_summaries is currently unused. Respect that directive
    # in the future.
    super(RFCNMetaArch, self).__init__(
        is_training,
        num_classes,
...
...@@ -44,7 +44,8 @@ class SSDFeatureExtractor(object):
               conv_hyperparams,
               batch_norm_trainable=True,
               reuse_weights=None,
               use_explicit_padding=False,
               use_depthwise=False):
    """Constructor.

    Args:
...@@ -61,6 +62,7 @@ class SSDFeatureExtractor(object):
      reuse_weights: whether to reuse variables. Default is None.
      use_explicit_padding: Whether to use explicit padding when extracting
        features. Default is False.
      use_depthwise: Whether to use depthwise convolutions. Default is False.
    """
    self._is_training = is_training
    self._depth_multiplier = depth_multiplier
...@@ -70,6 +72,7 @@ class SSDFeatureExtractor(object):
    self._batch_norm_trainable = batch_norm_trainable
    self._reuse_weights = reuse_weights
    self._use_explicit_padding = use_explicit_padding
    self._use_depthwise = use_depthwise

  @abstractmethod
  def preprocess(self, resized_inputs):
...@@ -130,7 +133,7 @@ class SSDMetaArch(model.DetectionModel):
               add_summaries=True):
    """SSDMetaArch Constructor.

    TODO(rathodv,jonathanhuang): group NMS parameters + score converter into
    a class and loss parameters into a class and write config protos for
    postprocessing and losses.
...@@ -330,7 +333,8 @@ class SSDMetaArch(model.DetectionModel):
    feature_maps = self._feature_extractor.extract_features(
        preprocessed_inputs)
    feature_map_spatial_dims = self._get_feature_map_spatial_dims(feature_maps)
    image_shape = shape_utils.combined_static_and_dynamic_shape(
        preprocessed_inputs)
    self._anchors = self._anchor_generator.generate(
        feature_map_spatial_dims,
        im_height=image_shape[1],
...@@ -472,11 +476,14 @@ class SSDMetaArch(model.DetectionModel):
    keypoints = None
    if self.groundtruth_has_field(fields.BoxListFields.keypoints):
      keypoints = self.groundtruth_lists(fields.BoxListFields.keypoints)
    weights = None
    if self.groundtruth_has_field(fields.BoxListFields.weights):
      weights = self.groundtruth_lists(fields.BoxListFields.weights)
    (batch_cls_targets, batch_cls_weights, batch_reg_targets,
     batch_reg_weights, match_list) = self._assign_targets(
         self.groundtruth_lists(fields.BoxListFields.boxes),
         self.groundtruth_lists(fields.BoxListFields.classes),
         keypoints, weights)
    if self._add_summaries:
      self._summarize_input(
          self.groundtruth_lists(fields.BoxListFields.boxes), match_list)
...@@ -539,7 +546,8 @@ class SSDMetaArch(model.DetectionModel):
                        'NegativeAnchorLossCDF')

  def _assign_targets(self, groundtruth_boxes_list, groundtruth_classes_list,
                      groundtruth_keypoints_list=None,
                      groundtruth_weights_list=None):
    """Assign groundtruth targets.

    Adds a background class to each one-hot encoding of groundtruth classes
...@@ -556,6 +564,8 @@ class SSDMetaArch(model.DetectionModel):
        index assumed to map to the first non-background class.
      groundtruth_keypoints_list: (optional) a list of 3-D tensors of shape
        [num_boxes, num_keypoints, 2]
      groundtruth_weights_list: A list of 1-D tf.float32 tensors of shape
        [num_boxes] containing weights for groundtruth boxes.

    Returns:
      batch_cls_targets: a tensor with shape [batch_size, num_anchors,
...@@ -582,7 +592,7 @@ class SSDMetaArch(model.DetectionModel):
        boxlist.add_field(fields.BoxListFields.keypoints, keypoints)
    return target_assigner.batch_assign_targets(
        self._target_assigner, self.anchors, groundtruth_boxlists,
        groundtruth_classes_with_background_list, groundtruth_weights_list)

  def _summarize_input(self, groundtruth_boxes_list, match_list):
    """Creates tensorflow summaries for the input boxes and anchors.
...
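# For context on the image_shape change above, a sketch of what
# shape_utils.combined_static_and_dynamic_shape provides (assumed behavior,
# not necessarily the exact library code): static dimensions come back as
# Python ints, and unknown ones fall back to dynamic tf.shape() slices.
def _combined_shape_sketch(tensor):
  static_shape = tensor.shape.as_list()
  dynamic_shape = tf.shape(tensor)
  return [dynamic_shape[i] if dim is None else dim
          for i, dim in enumerate(static_shape)]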
...@@ -24,13 +24,17 @@ from object_detection.utils import object_detection_evaluation
class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
  """Class to evaluate COCO detection metrics."""

  def __init__(self,
               categories,
               include_metrics_per_category=False,
               all_metrics_per_category=False):
    """Constructor.

    Args:
      categories: A list of dicts, each of which has the following keys -
        'id': (required) an integer id uniquely identifying this category.
        'name': (required) string representing category name e.g., 'cat', 'dog'.
      include_metrics_per_category: If True, include metrics for each category.
      all_metrics_per_category: Whether to include all the summary metrics for
        each category in per_category_ap. Be careful with setting it to true if
        you have more than a handful of categories, because it will pollute
...@@ -45,6 +49,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
    self._category_id_set = set([cat['id'] for cat in self._categories])
    self._annotation_id = 1
    self._metrics = None
    self._include_metrics_per_category = include_metrics_per_category
    self._all_metrics_per_category = all_metrics_per_category

  def clear(self):
...@@ -166,7 +171,8 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
      'DetectionBoxes_Recall/AR@100 (large)': average recall for large objects
        with 100 detections.

      2. per_category_ap: if include_metrics_per_category is True, category
      specific results with keys of the form:
      'Precision mAP ByCategory/category' (without the supercategory part if
      no supercategories exist). For backward compatibility
      'PerformanceByCategory' is included in the output regardless of
...@@ -183,6 +189,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
    box_evaluator = coco_tools.COCOEvalWrapper(
        coco_wrapped_groundtruth, coco_wrapped_detections, agnostic_mode=False)
    box_metrics, box_per_category_ap = box_evaluator.ComputeMetrics(
        include_metrics_per_category=self._include_metrics_per_category,
        all_metrics_per_category=self._all_metrics_per_category)
    box_metrics.update(box_per_category_ap)
    box_metrics = {'DetectionBoxes_' + key: value
...@@ -253,9 +260,10 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
        'DetectionBoxes_Recall/AR@100 (large)',
        'DetectionBoxes_Recall/AR@100 (medium)',
        'DetectionBoxes_Recall/AR@100 (small)']
    if self._include_metrics_per_category:
      for category_dict in self._categories:
        metric_names.append('DetectionBoxes_PerformanceByCategory/mAP/' +
                            category_dict['name'])

    def first_value_func():
      self._metrics = self.evaluate()
...@@ -289,13 +297,14 @@ def _check_mask_type_and_value(array_name, masks):
class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
  """Class to evaluate COCO mask metrics."""

  def __init__(self, categories, include_metrics_per_category=False):
    """Constructor.

    Args:
      categories: A list of dicts, each of which has the following keys -
        'id': (required) an integer id uniquely identifying this category.
        'name': (required) string representing category name e.g., 'cat', 'dog'.
      include_metrics_per_category: If True, include metrics for each category.
    """
    super(CocoMaskEvaluator, self).__init__(categories)
    self._image_id_to_mask_shape_map = {}
...@@ -304,6 +313,7 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
    self._detection_masks_list = []
    self._category_id_set = set([cat['id'] for cat in self._categories])
    self._annotation_id = 1
    self._include_metrics_per_category = include_metrics_per_category

  def clear(self):
    """Clears the state to prepare for a fresh evaluation."""
...@@ -438,7 +448,8 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
      'Recall/AR@100 (large)': average recall for large objects with 100
        detections

      2. per_category_ap: if include_metrics_per_category is True, category
      specific results with keys of the form:
      'Precision mAP ByCategory/category' (without the supercategory part if
      no supercategories exist). For backward compatibility
      'PerformanceByCategory' is included in the output regardless of
...@@ -458,7 +469,8 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
    mask_evaluator = coco_tools.COCOEvalWrapper(
        coco_wrapped_groundtruth, coco_wrapped_detection_masks,
        agnostic_mode=False, iou_type='segm')
    mask_metrics, mask_per_category_ap = mask_evaluator.ComputeMetrics(
        include_metrics_per_category=self._include_metrics_per_category)
    mask_metrics.update(mask_per_category_ap)
    mask_metrics = {'DetectionMasks_' + key: value
                    for key, value in mask_metrics.iteritems()}
...
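# Hypothetical usage of the include_metrics_per_category flag wired in above
# (assumes `numpy as np` and `standard_fields` are imported; the add_single_*
# calls mirror the evaluator API exercised elsewhere in this commit):
def _per_category_map_sketch(categories):
  evaluator = CocoDetectionEvaluator(
      categories, include_metrics_per_category=True)
  evaluator.add_single_ground_truth_image_info(
      image_id='image1',
      groundtruth_dict={
          standard_fields.InputDataFields.groundtruth_boxes:
              np.array([[100., 100., 200., 200.]]),
          standard_fields.InputDataFields.groundtruth_classes: np.array([1])
      })
  evaluator.add_single_detected_image_info(
      image_id='image1',
      detections_dict={
          standard_fields.DetectionResultFields.detection_boxes:
              np.array([[100., 100., 200., 200.]]),
          standard_fields.DetectionResultFields.detection_scores:
              np.array([.8]),
          standard_fields.DetectionResultFields.detection_classes:
              np.array([1])
      })
  # The result now also contains keys like
  # 'DetectionBoxes_PerformanceByCategory/mAP/<category name>'.
  return evaluator.evaluate()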
...@@ -12,13 +12,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for tensorflow_models.object_detection.metrics.coco_evaluation."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf

from object_detection.core import standard_fields
...@@ -87,43 +86,6 @@ class CocoDetectionEvaluationTest(tf.test.TestCase):
    metrics = coco_evaluator.evaluate()
    self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP'], 1.0)

  def testRejectionOnDuplicateGroundtruth(self):
    """Tests that groundtruth cannot be added more than once for an image."""
    categories = [{'id': 1, 'name': 'cat'},
...@@ -279,12 +241,6 @@ class CocoEvaluationPyFuncTest(tf.test.TestCase):
    self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (medium)'],
                           -1.0)
    self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (small)'], 1.0)
    self.assertFalse(coco_evaluator._groundtruth_list)
    self.assertFalse(coco_evaluator._detection_boxes_list)
    self.assertFalse(coco_evaluator._image_ids)
...
...@@ -189,14 +189,18 @@ class COCOEvalWrapper(cocoeval.COCOeval):
    """Returns list of valid category ids."""
    return self.params.catIds

  def ComputeMetrics(self,
                     include_metrics_per_category=False,
                     all_metrics_per_category=False):
    """Computes detection metrics.

    Args:
      include_metrics_per_category: If True, will include metrics per category.
      all_metrics_per_category: If true, include all the summary metrics for
        each category in per_category_ap. Be careful with setting it to true if
        you have more than a handful of categories, because it will pollute
        your mldash.

    Returns:
      1. summary_metrics: a dictionary holding:
        'Precision/mAP': mean average precision over classes averaged over IOU
...@@ -225,6 +229,9 @@ class COCOEvalWrapper(cocoeval.COCOeval):
        output regardless of all_metrics_per_category.
        If evaluating class-agnostic mode, per_category_ap is an empty
        dictionary.

    Raises:
      ValueError: If category_stats does not exist.
    """
    self.evaluate()
    self.accumulate()
...@@ -244,6 +251,10 @@ class COCOEvalWrapper(cocoeval.COCOeval):
        ('Recall/AR@100 (medium)', self.stats[10]),
        ('Recall/AR@100 (large)', self.stats[11])
    ])
    if not include_metrics_per_category:
      return summary_metrics, {}
    if not hasattr(self, 'category_stats'):
      raise ValueError('Category stats do not exist')
    per_category_ap = OrderedDict([])
    if self.GetAgnosticMode():
      return summary_metrics, per_category_ap
...
...@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for tensorflow_models.object_detection.metrics.coco_tools."""
import json
import os
import re
...
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Creates and runs `Experiment` for object detection model.
This uses the TF.learn framework to define and run an object detection model
wrapped in an `Estimator`.
Note that this module is only compatible with SSD Meta architecture at the
moment.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import os
import tensorflow as tf
from google.protobuf import text_format
from tensorflow.contrib.learn.python.learn import learn_runner
from tensorflow.contrib.tpu.python.tpu import tpu_optimizer
from object_detection import eval_util
from object_detection import inputs
from object_detection import model_hparams
from object_detection.builders import model_builder
from object_detection.builders import optimizer_builder
from object_detection.core import standard_fields as fields
from object_detection.metrics import coco_evaluation
from object_detection.utils import config_util
from object_detection.utils import label_map_util
from object_detection.utils import shape_utils
from object_detection.utils import variables_helper
from object_detection.utils import visualization_utils as vis_utils
tf.flags.DEFINE_string('model_dir', None, 'Path to output model directory '
'where event and checkpoint files will be written.')
tf.flags.DEFINE_string('pipeline_config_path', None, 'Path to pipeline config '
'file.')
tf.flags.DEFINE_integer('num_train_steps', 500000, 'Number of train steps.')
tf.flags.DEFINE_integer('num_eval_steps', 10000, 'Number of eval steps.')
FLAGS = tf.flags.FLAGS
def _get_groundtruth_data(detection_model, class_agnostic):
"""Extracts groundtruth data from detection_model.
Args:
detection_model: A `DetectionModel` object.
class_agnostic: Whether the detections are class_agnostic.
  Returns:
    groundtruth: Dictionary with the following fields:
      'groundtruth_boxes': [num_boxes, 4] float32 tensor of boxes, in
        normalized coordinates.
      'groundtruth_classes': [num_boxes] int64 tensor of 1-indexed classes.
      'groundtruth_masks': 3D float32 tensor of instance masks (if provided in
        groundtruth)
"""
input_data_fields = fields.InputDataFields()
groundtruth_boxes = detection_model.groundtruth_lists(
fields.BoxListFields.boxes)[0]
# For class-agnostic models, groundtruth one-hot encodings collapse to all
# ones.
if class_agnostic:
groundtruth_boxes_shape = tf.shape(groundtruth_boxes)
groundtruth_classes_one_hot = tf.ones([groundtruth_boxes_shape[0], 1])
else:
groundtruth_classes_one_hot = detection_model.groundtruth_lists(
fields.BoxListFields.classes)[0]
label_id_offset = 1 # Applying label id offset (b/63711816)
groundtruth_classes = (
tf.argmax(groundtruth_classes_one_hot, axis=1) + label_id_offset)
groundtruth = {
input_data_fields.groundtruth_boxes: groundtruth_boxes,
input_data_fields.groundtruth_classes: groundtruth_classes
}
if detection_model.groundtruth_has_field(fields.BoxListFields.masks):
groundtruth[input_data_fields.groundtruth_instance_masks] = (
detection_model.groundtruth_lists(fields.BoxListFields.masks)[0])
return groundtruth
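# For intuition on the argmax + label offset above: a one-hot row such as
# [0., 1., 0.] maps to the 1-indexed class id 2. A standalone sketch:
def _one_hot_to_class_ids_sketch(one_hot_rows):
  label_id_offset = 1  # Class ids are 1-indexed; one-hot indices are 0-based.
  return tf.argmax(one_hot_rows, axis=1) + label_id_offset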
def unstack_batch(tensor_dict, unpad_groundtruth_tensors=True):
"""Unstacks all tensors in `tensor_dict` along 0th dimension.
Unstacks tensor from the tensor dict along 0th dimension and returns a
tensor_dict containing values that are lists of unstacked tensors.
Tensors in the `tensor_dict` are expected to be of one of the three shapes:
1. [batch_size]
2. [batch_size, height, width, channels]
3. [batch_size, num_boxes, d1, d2, ... dn]
  When unpad_groundtruth_tensors is set to True, unstacked tensors of form 3
  above are sliced along the `num_boxes` dimension using the value in tensor
  fields.InputDataFields.num_groundtruth_boxes.
Note that this function has a static list of input data fields and has to be
kept in sync with the InputDataFields defined in core/standard_fields.py
Args:
tensor_dict: A dictionary of batched groundtruth tensors.
unpad_groundtruth_tensors: Whether to remove padding along `num_boxes`
dimension of the groundtruth tensors.
Returns:
A dictionary where the keys are from fields.InputDataFields and values are
a list of unstacked (optionally unpadded) tensors.
Raises:
    ValueError: If unpad_groundtruth_tensors is True and `tensor_dict` does not contain
`num_groundtruth_boxes` tensor.
"""
unbatched_tensor_dict = {key: tf.unstack(tensor)
for key, tensor in tensor_dict.items()}
if unpad_groundtruth_tensors:
if (fields.InputDataFields.num_groundtruth_boxes not in
unbatched_tensor_dict):
raise ValueError('`num_groundtruth_boxes` not found in tensor_dict. '
'Keys available: {}'.format(
unbatched_tensor_dict.keys()))
unbatched_unpadded_tensor_dict = {}
unpad_keys = set([
# List of input data fields that are padded along the num_boxes
# dimension. This list has to be kept in sync with InputDataFields in
# standard_fields.py.
fields.InputDataFields.groundtruth_instance_masks,
fields.InputDataFields.groundtruth_classes,
fields.InputDataFields.groundtruth_boxes,
fields.InputDataFields.groundtruth_keypoints,
fields.InputDataFields.groundtruth_group_of,
fields.InputDataFields.groundtruth_difficult,
fields.InputDataFields.groundtruth_is_crowd,
fields.InputDataFields.groundtruth_area,
fields.InputDataFields.groundtruth_weights
]).intersection(set(unbatched_tensor_dict.keys()))
for key in unpad_keys:
unpadded_tensor_list = []
for num_gt, padded_tensor in zip(
unbatched_tensor_dict[fields.InputDataFields.num_groundtruth_boxes],
unbatched_tensor_dict[key]):
tensor_shape = shape_utils.combined_static_and_dynamic_shape(
padded_tensor)
slice_begin = tf.zeros([len(tensor_shape)], dtype=tf.int32)
slice_size = tf.stack(
[num_gt] + [-1 if dim is None else dim for dim in tensor_shape[1:]])
unpadded_tensor = tf.slice(padded_tensor, slice_begin, slice_size)
unpadded_tensor_list.append(unpadded_tensor)
unbatched_unpadded_tensor_dict[key] = unpadded_tensor_list
unbatched_tensor_dict.update(unbatched_unpadded_tensor_dict)
return unbatched_tensor_dict
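# A hypothetical call illustrating unstack_batch on a padded two-image batch
# (tensor values invented for this example):
def _unstack_batch_example():
  tensor_dict = {
      fields.InputDataFields.num_groundtruth_boxes: tf.constant([1, 2]),
      fields.InputDataFields.groundtruth_boxes: tf.zeros([2, 3, 4]),
  }
  unbatched = unstack_batch(tensor_dict)
  # unbatched[fields.InputDataFields.groundtruth_boxes] is a list of two
  # tensors with shapes [1, 4] and [2, 4]: per-image padding removed using
  # each image's num_groundtruth_boxes value.
  return unbatched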
def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
"""Creates a model function for `Estimator`.
Args:
detection_model_fn: Function that returns a `DetectionModel` instance.
configs: Dictionary of pipeline config objects.
hparams: `HParams` object.
use_tpu: Boolean indicating whether model should be constructed for
use on TPU.
Returns:
`model_fn` for `Estimator`.
"""
train_config = configs['train_config']
eval_input_config = configs['eval_input_config']
def model_fn(features, labels, mode, params=None):
"""Constructs the object detection model.
Args:
features: Dictionary of feature tensors, returned from `input_fn`.
labels: Dictionary of groundtruth tensors if mode is TRAIN or EVAL,
otherwise None.
mode: Mode key from tf.estimator.ModeKeys.
params: Parameter dictionary passed from the estimator.
Returns:
An `EstimatorSpec` that encapsulates the model and its serving
configurations.
"""
params = params or {}
total_loss, train_op, detections, export_outputs = None, None, None, None
is_training = mode == tf.estimator.ModeKeys.TRAIN
detection_model = detection_model_fn(is_training=is_training,
add_summaries=(not use_tpu))
scaffold_fn = None
if mode == tf.estimator.ModeKeys.TRAIN:
labels = unstack_batch(
labels,
unpad_groundtruth_tensors=train_config.unpad_groundtruth_tensors)
elif mode == tf.estimator.ModeKeys.EVAL:
labels = unstack_batch(labels, unpad_groundtruth_tensors=False)
if mode in (tf.estimator.ModeKeys.TRAIN, tf.estimator.ModeKeys.EVAL):
gt_boxes_list = labels[fields.InputDataFields.groundtruth_boxes]
gt_classes_list = labels[fields.InputDataFields.groundtruth_classes]
gt_masks_list = None
if fields.InputDataFields.groundtruth_instance_masks in labels:
gt_masks_list = labels[
fields.InputDataFields.groundtruth_instance_masks]
gt_keypoints_list = None
if fields.InputDataFields.groundtruth_keypoints in labels:
gt_keypoints_list = labels[fields.InputDataFields.groundtruth_keypoints]
detection_model.provide_groundtruth(
groundtruth_boxes_list=gt_boxes_list,
groundtruth_classes_list=gt_classes_list,
groundtruth_masks_list=gt_masks_list,
groundtruth_keypoints_list=gt_keypoints_list)
preprocessed_images = features[fields.InputDataFields.image]
prediction_dict = detection_model.predict(
preprocessed_images, features[fields.InputDataFields.true_image_shape])
detections = detection_model.postprocess(
prediction_dict, features[fields.InputDataFields.true_image_shape])
if mode == tf.estimator.ModeKeys.TRAIN:
if train_config.fine_tune_checkpoint and hparams.load_pretrained:
asg_map = detection_model.restore_map(
from_detection_checkpoint=train_config.from_detection_checkpoint,
load_all_detection_checkpoint_vars=(
train_config.load_all_detection_checkpoint_vars))
available_var_map = (
variables_helper.get_variables_available_in_checkpoint(
asg_map, train_config.fine_tune_checkpoint,
include_global_step=False))
if use_tpu:
def tpu_scaffold():
tf.train.init_from_checkpoint(train_config.fine_tune_checkpoint,
available_var_map)
return tf.train.Scaffold()
scaffold_fn = tpu_scaffold
else:
tf.train.init_from_checkpoint(train_config.fine_tune_checkpoint,
available_var_map)
if mode in (tf.estimator.ModeKeys.TRAIN, tf.estimator.ModeKeys.EVAL):
losses_dict = detection_model.loss(
prediction_dict, features[fields.InputDataFields.true_image_shape])
losses = [loss_tensor for loss_tensor in losses_dict.itervalues()]
total_loss = tf.add_n(losses, name='total_loss')
if mode == tf.estimator.ModeKeys.TRAIN:
global_step = tf.train.get_or_create_global_step()
training_optimizer, optimizer_summary_vars = optimizer_builder.build(
train_config.optimizer)
if use_tpu:
training_optimizer = tpu_optimizer.CrossShardOptimizer(
training_optimizer)
# Optionally freeze some layers by setting their gradients to be zero.
trainable_variables = None
if train_config.freeze_variables:
trainable_variables = tf.contrib.framework.filter_variables(
tf.trainable_variables(),
exclude_patterns=train_config.freeze_variables)
clip_gradients_value = None
if train_config.gradient_clipping_by_norm > 0:
clip_gradients_value = train_config.gradient_clipping_by_norm
if not use_tpu:
for var in optimizer_summary_vars:
tf.summary.scalar(var.op.name, var)
summaries = [] if use_tpu else None
train_op = tf.contrib.layers.optimize_loss(
loss=total_loss,
global_step=global_step,
learning_rate=None,
clip_gradients=clip_gradients_value,
optimizer=training_optimizer,
variables=trainable_variables,
summaries=summaries,
name='') # Preventing scope prefix on all variables.
if mode == tf.estimator.ModeKeys.PREDICT:
export_outputs = {
tf.saved_model.signature_constants.PREDICT_METHOD_NAME:
tf.estimator.export.PredictOutput(detections)
}
eval_metric_ops = None
if mode == tf.estimator.ModeKeys.EVAL:
# Detection summaries during eval.
class_agnostic = (fields.DetectionResultFields.detection_classes
not in detections)
groundtruth = _get_groundtruth_data(detection_model, class_agnostic)
eval_dict = eval_util.result_dict_for_single_example(
tf.expand_dims(features[fields.InputDataFields.original_image][0], 0),
features[inputs.HASH_KEY][0],
detections,
groundtruth,
class_agnostic=class_agnostic,
scale_to_absolute=False)
if class_agnostic:
category_index = label_map_util.create_class_agnostic_category_index()
else:
category_index = label_map_util.create_category_index_from_labelmap(
eval_input_config.label_map_path)
detection_and_groundtruth = vis_utils.draw_side_by_side_evaluation_image(
eval_dict, category_index, max_boxes_to_draw=20, min_score_thresh=0.2)
if not use_tpu:
tf.summary.image('Detections_Left_Groundtruth_Right',
detection_and_groundtruth)
# Eval metrics on a single image.
detection_fields = fields.DetectionResultFields()
input_data_fields = fields.InputDataFields()
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
list(category_index.values()))
eval_metric_ops = coco_evaluator.get_estimator_eval_metric_ops(
image_id=eval_dict[input_data_fields.key],
groundtruth_boxes=eval_dict[input_data_fields.groundtruth_boxes],
groundtruth_classes=eval_dict[input_data_fields.groundtruth_classes],
detection_boxes=eval_dict[detection_fields.detection_boxes],
detection_scores=eval_dict[detection_fields.detection_scores],
detection_classes=eval_dict[detection_fields.detection_classes])
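# Note: the two spec classes name their metrics argument differently:
# `TPUEstimatorSpec` takes `eval_metrics` (as a `(metric_fn, tensors)` pair),
# while `EstimatorSpec` takes an `eval_metric_ops` dictionary.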
if use_tpu:
return tf.contrib.tpu.TPUEstimatorSpec(
mode=mode,
scaffold_fn=scaffold_fn,
predictions=detections,
loss=total_loss,
train_op=train_op,
eval_metrics=eval_metric_ops,
export_outputs=export_outputs)
else:
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=detections,
loss=total_loss,
train_op=train_op,
eval_metric_ops=eval_metric_ops,
export_outputs=export_outputs)
return model_fn
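# Illustrative sketch (the helper and its argument names are placeholders,
# not part of the original API): the `model_fn` built above is meant to be
# handed straight to an Estimator, exactly as `populate_experiment` does
# below.
def _example_estimator_from_model_fn(detection_model_fn, configs, hparams,
                                     run_config):
  """Wires the `model_fn` from `create_model_fn` into an Estimator."""
  model_fn = create_model_fn(detection_model_fn, configs, hparams)
  return tf.estimator.Estimator(model_fn=model_fn, config=run_config)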
def _build_experiment_fn(train_steps, eval_steps):
"""Returns a function that creates an `Experiment`."""
def build_experiment(run_config, hparams):
"""Builds an `Experiment` from configuration and hyperparameters.
Args:
run_config: A `RunConfig`.
hparams: A `HParams`.
Returns:
An `Experiment` object.
"""
return populate_experiment(run_config, hparams, FLAGS.pipeline_config_path,
train_steps, eval_steps)
return build_experiment
def populate_experiment(run_config,
hparams,
pipeline_config_path,
train_steps=None,
eval_steps=None,
model_fn_creator=create_model_fn,
**kwargs):
"""Populates an `Experiment` object.
Args:
run_config: A `RunConfig`.
hparams: A `HParams`.
pipeline_config_path: A path to a pipeline config file.
train_steps: Number of training steps. If None, the number of training steps
is set from the `TrainConfig` proto.
eval_steps: Number of evaluation steps per evaluation cycle. If None, the
number of evaluation steps is set from the `EvalConfig` proto.
model_fn_creator: A function that creates a `model_fn` for `Estimator`.
Follows the signature:
* Args:
* `detection_model_fn`: Function that returns `DetectionModel` instance.
* `configs`: Dictionary of pipeline config objects.
* `hparams`: `HParams` object.
* Returns:
`model_fn` for `Estimator`.
**kwargs: Additional keyword arguments for configuration override.
Returns:
An `Experiment` that defines all aspects of training, evaluation, and
export.
"""
configs = config_util.get_configs_from_pipeline_file(pipeline_config_path)
configs = config_util.merge_external_params_with_configs(
configs,
hparams,
train_steps=train_steps,
eval_steps=eval_steps,
**kwargs)
model_config = configs['model']
train_config = configs['train_config']
train_input_config = configs['train_input_config']
eval_config = configs['eval_config']
eval_input_config = configs['eval_input_config']
if train_steps is None:
train_steps = train_config.num_steps if train_config.num_steps else None
if eval_steps is None:
eval_steps = eval_config.num_examples if eval_config.num_examples else None
detection_model_fn = functools.partial(
model_builder.build, model_config=model_config)
# Create the input functions for TRAIN/EVAL.
train_input_fn = inputs.create_train_input_fn(
train_config=train_config,
train_input_config=train_input_config,
model_config=model_config)
eval_input_fn = inputs.create_eval_input_fn(
eval_config=eval_config,
eval_input_config=eval_input_config,
model_config=model_config)
export_strategies = [
tf.contrib.learn.utils.saved_model_export_utils.make_export_strategy(
serving_input_fn=inputs.create_predict_input_fn(
model_config=model_config))
]
estimator = tf.estimator.Estimator(
model_fn=model_fn_creator(detection_model_fn, configs, hparams),
config=run_config)
if run_config.is_chief:
# Store the final pipeline config for traceability.
pipeline_config_final = config_util.create_pipeline_proto_from_configs(
configs)
pipeline_config_final_path = os.path.join(estimator.model_dir,
'pipeline.config')
config_text = text_format.MessageToString(pipeline_config_final)
with tf.gfile.Open(pipeline_config_final_path, 'wb') as f:
tf.logging.info('Writing as-run pipeline config file to %s',
pipeline_config_final_path)
f.write(config_text)
return tf.contrib.learn.Experiment(
estimator=estimator,
train_input_fn=train_input_fn,
eval_input_fn=eval_input_fn,
train_steps=train_steps,
eval_steps=eval_steps,
export_strategies=export_strategies,
eval_delay_secs=120)
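# Illustrative usage (a sketch only; the paths and step counts below are
# placeholders, not values from this change):
#
#   experiment = populate_experiment(
#       run_config=tf.contrib.learn.RunConfig(model_dir='/tmp/od_model'),
#       hparams=model_hparams.create_hparams(),
#       pipeline_config_path='path/to/pipeline.config',
#       train_steps=10000,
#       eval_steps=2000)
#   experiment.train_and_evaluate()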
def main(unused_argv):
tf.flags.mark_flag_as_required('model_dir')
tf.flags.mark_flag_as_required('pipeline_config_path')
config = tf.contrib.learn.RunConfig(model_dir=FLAGS.model_dir)
learn_runner.run(
experiment_fn=_build_experiment_fn(FLAGS.num_train_steps,
FLAGS.num_eval_steps),
run_config=config,
hparams=model_hparams.create_hparams())
if __name__ == '__main__':
tf.app.run()
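# Example invocation (paths and step counts are placeholders; the flag names
# match the ones marked as required in `main` above):
#
#   python model.py --model_dir=/tmp/od_model \
#     --pipeline_config_path=path/to/pipeline.config \
#     --num_train_steps=10000 --num_eval_steps=2000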
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Hyperparameters for the object detection model in TF.learn.
This file consolidates and documents the hyperparameters used by the model.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
def create_hparams(hparams_overrides=None):
"""Returns hyperparameters, including any flag value overrides.
Args:
hparams_overrides: Optional hparams overrides, represented as a
string containing comma-separated hparam_name=value pairs.
Returns:
The hyperparameters as a `tf.contrib.training.HParams` object.
"""
hparams = tf.contrib.training.HParams(
# Whether a fine tuning checkpoint (provided in the pipeline config)
# should be loaded for training.
load_pretrained=True)
# Override any of the preceding hyperparameter values.
if hparams_overrides:
hparams = hparams.parse(hparams_overrides)
return hparams
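# Example (a minimal sketch; the helper name below is illustrative, not part
# of the original file): hyperparameters can be overridden from a
# comma-separated string, mirroring the usage in the tests in this change.
def _example_disable_pretrained():
  """Returns hparams with pretrained-checkpoint loading turned off."""
  return create_hparams(hparams_overrides='load_pretrained=false')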
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object detection model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import os
import numpy as np
import tensorflow as tf
from object_detection import inputs
from object_detection import model
from object_detection import model_hparams
from object_detection import model_test_util
from object_detection.builders import model_builder
from object_detection.core import standard_fields as fields
from object_detection.utils import config_util
FLAGS = tf.flags.FLAGS
MODEL_NAME_FOR_TEST = model_test_util.SSD_INCEPTION_MODEL_NAME
def _get_data_path():
"""Returns an absolute path to TFRecord file."""
return os.path.join(FLAGS.test_srcdir, model_test_util.PATH_BASE, 'test_data',
'pets_examples.record')
def _get_labelmap_path():
"""Returns an absolute path to label map file."""
return os.path.join(FLAGS.test_srcdir, model_test_util.PATH_BASE, 'data',
'pet_label_map.pbtxt')
def _get_configs_for_model(model_name):
"""Returns configurations for model."""
filename = model_test_util.GetPipelineConfigPath(model_name)
data_path = _get_data_path()
label_map_path = _get_labelmap_path()
configs = config_util.get_configs_from_pipeline_file(filename)
configs = config_util.merge_external_params_with_configs(
configs,
train_input_path=data_path,
eval_input_path=data_path,
label_map_path=label_map_path)
return configs
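# The returned dictionary uses the same keys consumed by the model code above:
# 'model', 'train_config', 'train_input_config', 'eval_config' and
# 'eval_input_config'.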
def setUpModule():
model_test_util.InitializeFlags(MODEL_NAME_FOR_TEST)
class ModelTflearnTest(tf.test.TestCase):
@classmethod
def setUpClass(cls):
tf.reset_default_graph()
def _assert_outputs_for_train_eval(self, configs, mode, class_agnostic=False):
model_config = configs['model']
train_config = configs['train_config']
with tf.Graph().as_default():
if mode == tf.estimator.ModeKeys.TRAIN:
features, labels = inputs.create_train_input_fn(
configs['train_config'],
configs['train_input_config'],
configs['model'])()
batch_size = train_config.batch_size
else:
features, labels = inputs.create_eval_input_fn(
configs['eval_config'],
configs['eval_input_config'],
configs['model'])()
batch_size = 1
detection_model_fn = functools.partial(
model_builder.build, model_config=model_config, is_training=True)
hparams = model_hparams.create_hparams(
hparams_overrides='load_pretrained=false')
model_fn = model.create_model_fn(detection_model_fn, configs, hparams)
estimator_spec = model_fn(features, labels, mode)
self.assertIsNotNone(estimator_spec.loss)
self.assertIsNotNone(estimator_spec.predictions)
if class_agnostic:
self.assertNotIn('detection_classes', estimator_spec.predictions)
else:
detection_classes = estimator_spec.predictions['detection_classes']
self.assertEqual(batch_size, detection_classes.shape.as_list()[0])
self.assertEqual(tf.float32, detection_classes.dtype)
detection_boxes = estimator_spec.predictions['detection_boxes']
detection_scores = estimator_spec.predictions['detection_scores']
num_detections = estimator_spec.predictions['num_detections']
self.assertEqual(batch_size, detection_boxes.shape.as_list()[0])
self.assertEqual(tf.float32, detection_boxes.dtype)
self.assertEqual(batch_size, detection_scores.shape.as_list()[0])
self.assertEqual(tf.float32, detection_scores.dtype)
self.assertEqual(tf.float32, num_detections.dtype)
if mode == tf.estimator.ModeKeys.TRAIN:
self.assertIsNotNone(estimator_spec.train_op)
return estimator_spec
def _assert_outputs_for_predict(self, configs):
model_config = configs['model']
with tf.Graph().as_default():
features, _ = inputs.create_eval_input_fn(
configs['eval_config'],
configs['eval_input_config'],
configs['model'])()
detection_model_fn = functools.partial(
model_builder.build, model_config=model_config, is_training=False)
hparams = model_hparams.create_hparams(
hparams_overrides='load_pretrained=false')
model_fn = model.create_model_fn(detection_model_fn, configs, hparams)
estimator_spec = model_fn(features, None, tf.estimator.ModeKeys.PREDICT)
self.assertIsNone(estimator_spec.loss)
self.assertIsNone(estimator_spec.train_op)
self.assertIsNotNone(estimator_spec.predictions)
self.assertIsNotNone(estimator_spec.export_outputs)
self.assertIn(tf.saved_model.signature_constants.PREDICT_METHOD_NAME,
estimator_spec.export_outputs)
def testModelFnInTrainMode(self):
"""Tests the model function in TRAIN mode."""
configs = _get_configs_for_model(MODEL_NAME_FOR_TEST)
self._assert_outputs_for_train_eval(configs, tf.estimator.ModeKeys.TRAIN)
def testModelFnInEvalMode(self):
"""Tests the model function in EVAL mode."""
configs = _get_configs_for_model(MODEL_NAME_FOR_TEST)
self._assert_outputs_for_train_eval(configs, tf.estimator.ModeKeys.EVAL)
def testModelFnInPredictMode(self):
"""Tests the model function in PREDICT mode."""
configs = _get_configs_for_model(MODEL_NAME_FOR_TEST)
self._assert_outputs_for_predict(configs)
def testExperiment(self):
"""Tests that the `Experiment` object is constructed correctly."""
experiment = model_test_util.BuildExperiment()
model_dir = experiment.estimator.model_dir
pipeline_config_path = os.path.join(model_dir, 'pipeline.config')
self.assertTrue(tf.gfile.Exists(pipeline_config_path))
class UnbatchTensorsTest(tf.test.TestCase):
def test_unbatch_without_unpadding(self):
image_placeholder = tf.placeholder(tf.float32, [2, None, None, None])
groundtruth_boxes_placeholder = tf.placeholder(tf.float32, [2, None, None])
groundtruth_classes_placeholder = tf.placeholder(tf.float32,
[2, None, None])
groundtruth_weights_placeholder = tf.placeholder(tf.float32, [2, None])
tensor_dict = {
fields.InputDataFields.image:
image_placeholder,
fields.InputDataFields.groundtruth_boxes:
groundtruth_boxes_placeholder,
fields.InputDataFields.groundtruth_classes:
groundtruth_classes_placeholder,
fields.InputDataFields.groundtruth_weights:
groundtruth_weights_placeholder
}
unbatched_tensor_dict = model.unstack_batch(
tensor_dict, unpad_groundtruth_tensors=False)
with self.test_session() as sess:
unbatched_tensor_dict_out = sess.run(
unbatched_tensor_dict,
feed_dict={
image_placeholder:
np.random.rand(2, 4, 4, 3).astype(np.float32),
groundtruth_boxes_placeholder:
np.random.rand(2, 5, 4).astype(np.float32),
groundtruth_classes_placeholder:
np.random.rand(2, 5, 6).astype(np.float32),
groundtruth_weights_placeholder:
np.random.rand(2, 5).astype(np.float32)
})
for image_out in unbatched_tensor_dict_out[fields.InputDataFields.image]:
self.assertAllEqual(image_out.shape, [4, 4, 3])
for groundtruth_boxes_out in unbatched_tensor_dict_out[
fields.InputDataFields.groundtruth_boxes]:
self.assertAllEqual(groundtruth_boxes_out.shape, [5, 4])
for groundtruth_classes_out in unbatched_tensor_dict_out[
fields.InputDataFields.groundtruth_classes]:
self.assertAllEqual(groundtruth_classes_out.shape, [5, 6])
for groundtruth_weights_out in unbatched_tensor_dict_out[
fields.InputDataFields.groundtruth_weights]:
self.assertAllEqual(groundtruth_weights_out.shape, [5])
def test_unbatch_and_unpad_groundtruth_tensors(self):
image_placeholder = tf.placeholder(tf.float32, [2, None, None, None])
groundtruth_boxes_placeholder = tf.placeholder(tf.float32, [2, 5, None])
groundtruth_classes_placeholder = tf.placeholder(tf.float32, [2, 5, None])
groundtruth_weights_placeholder = tf.placeholder(tf.float32, [2, 5])
num_groundtruth_placeholder = tf.placeholder(tf.int32, [2])
tensor_dict = {
fields.InputDataFields.image:
image_placeholder,
fields.InputDataFields.groundtruth_boxes:
groundtruth_boxes_placeholder,
fields.InputDataFields.groundtruth_classes:
groundtruth_classes_placeholder,
fields.InputDataFields.groundtruth_weights:
groundtruth_weights_placeholder,
fields.InputDataFields.num_groundtruth_boxes:
num_groundtruth_placeholder
}
unbatched_tensor_dict = model.unstack_batch(
tensor_dict, unpad_groundtruth_tensors=True)
with self.test_session() as sess:
unbatched_tensor_dict_out = sess.run(
unbatched_tensor_dict,
feed_dict={
image_placeholder:
np.random.rand(2, 4, 4, 3).astype(np.float32),
groundtruth_boxes_placeholder:
np.random.rand(2, 5, 4).astype(np.float32),
groundtruth_classes_placeholder:
np.random.rand(2, 5, 6).astype(np.float32),
groundtruth_weights_placeholder:
np.random.rand(2, 5).astype(np.float32),
num_groundtruth_placeholder:
np.array([3, 3], np.int32)
})
for image_out in unbatched_tensor_dict_out[fields.InputDataFields.image]:
self.assertAllEqual(image_out.shape, [4, 4, 3])
for groundtruth_boxes_out in unbatched_tensor_dict_out[
fields.InputDataFields.groundtruth_boxes]:
self.assertAllEqual(groundtruth_boxes_out.shape, [3, 4])
for groundtruth_classes_out in unbatched_tensor_dict_out[
fields.InputDataFields.groundtruth_classes]:
self.assertAllEqual(groundtruth_classes_out.shape, [3, 6])
for groundtruth_weights_out in unbatched_tensor_dict_out[
fields.InputDataFields.groundtruth_weights]:
self.assertAllEqual(groundtruth_weights_out.shape, [3])
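# Summary of the behavior exercised above: with num_groundtruth_boxes set to
# [3, 3], `unstack_batch` first slices each padded groundtruth tensor down to
# its valid rows, so a [2, 5, 4] boxes tensor becomes a list of two [3, 4]
# tensors. Without unpadding (the previous test), the padded [5, ...] shapes
# are preserved per example.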
if __name__ == '__main__':
tf.test.main()