"build_tools/packaging/README.md" did not exist on "289f08af6af01e7d89644ed334435333fdd769b3"
Commit c308c03c authored by Mehdi Sharifzadeh's avatar Mehdi Sharifzadeh Committed by Taylor Robie
Browse files

Mask R-CNN model added to models/research/mlperf_object_detection/Mask_RCNN (#4678)

* Create README.md

* readme changed

* readme changed

* ResNet backbone completed.

* FPN added

* Create README.md

* initial commit

* files removed

* initial commit

* protobuf file removed
parent 32e7d660
# Installation
## Dependencies
The Tensorflow Object Detection API depends on the following libraries:
* Protobuf 3+
* Python-tk
* Pillow 1.0
* lxml
* tf Slim (which is included in the "tensorflow/models/research/" checkout)
* Jupyter notebook
* Matplotlib
* Tensorflow
* Cython
* cocoapi
For detailed steps to install Tensorflow, follow the [Tensorflow installation
instructions](https://www.tensorflow.org/install/). A typical user can install
Tensorflow using one of the following commands:
``` bash
# For CPU
pip install tensorflow
# For GPU
pip install tensorflow-gpu
```
The remaining libraries can be installed on Ubuntu 16.04 via apt-get:
``` bash
sudo apt-get install protobuf-compiler python-pil python-lxml python-tk
sudo pip install Cython
sudo pip install jupyter
sudo pip install matplotlib
```
Alternatively, users can install dependencies using pip:
``` bash
sudo pip install Cython
sudo pip install pillow
sudo pip install lxml
sudo pip install jupyter
sudo pip install matplotlib
```
## COCO API installation
Download the
<a href="https://github.com/cocodataset/cocoapi" target=_blank>cocoapi</a> and
copy the pycocotools subfolder to the tensorflow/models/research directory if
you are interested in using COCO evaluation metrics. The default metrics are
based on those used in Pascal VOC evaluation. To use the COCO object detection
metrics add `metrics_set: "coco_detection_metrics"` to the `eval_config` message
in the config file. To use the COCO instance segmentation metrics add
`metrics_set: "coco_mask_metrics"` to the `eval_config` message in the config
file.
```bash
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make
cp -r pycocotools <path_to_tensorflow>/models/research/
```
## Protobuf Compilation
The Tensorflow Object Detection API uses Protobufs to configure model and
training parameters. Before the framework can be used, the Protobuf libraries
must be compiled. This should be done by running the following command from
the tensorflow/models/research/ directory:
``` bash
# From tensorflow/models/research/
protoc object_detection/protos/*.proto --python_out=.
```
## Add Libraries to PYTHONPATH
When running locally, the tensorflow/models/research/ and slim directories
should be appended to PYTHONPATH. This can be done by running the following from
tensorflow/models/research/:
``` bash
# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
```
Note: This command needs to be run in every new terminal you start. If you wish
to avoid running this manually, you can add it as a new line to the end of your
~/.bashrc file, replacing \`pwd\` with the absolute path of
tensorflow/models/research on your system.
# Testing the Installation
You can test that you have correctly installed the Tensorflow Object Detection
API by running the following command:
```bash
python object_detection/builders/model_builder_test.py
```
## Run an Instance Segmentation Model
For some applications it is not enough to localize an object with a
simple bounding box. For instance, you might want to segment an object region
once it is detected. This class of problems is called **instance segmentation**.
<p align="center">
<img src="img/kites_with_segment_overlay.png" width=676 height=450>
</p>
### Materializing data for instance segmentation {#materializing-instance-seg}
Instance segmentation is an extension of object detection, where a binary mask
(i.e. object vs. background) is associated with every bounding box. This allows
for more fine-grained information about the extent of the object within the box.
To train an instance segmentation model, a groundtruth mask must be supplied for
every groundtruth bounding box. In addition to the proto fields listed in the
section titled [Using your own dataset](using_your_own_dataset.md), one must
also supply `image/object/mask`, which can either be a repeated list of
single-channel encoded PNG strings, or a single dense 3D binary tensor where
masks corresponding to each object are stacked along the first dimension. Each
is described in more detail below.
#### PNG Instance Segmentation Masks
Instance segmentation masks can be supplied as serialized PNG images.
```shell
image/object/mask = ["\x89PNG\r\n\x1A\n\x00\x00\x00\rIHDR\...", ...]
```
These masks are whole-image masks, one for each object instance. The spatial
dimensions of each mask must agree with the image. Each mask has only a single
channel, and the pixel values are either 0 (background) or 1 (object mask).
**PNG masks are the preferred parameterization since they offer considerable
space savings compared to dense numerical masks.**
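As an illustration only, the sketch below shows one way such PNG strings could be
produced from binary numpy masks. `encode_png_masks` is a hypothetical helper (it
is not part of the API); it reuses `dataset_util.bytes_list_feature` from
`object_detection/utils` and the Pillow dependency listed above.
```python
import io

from PIL import Image

from object_detection.utils import dataset_util


def encode_png_masks(masks):
  """Encodes binary instance masks as single-channel PNG strings.

  Args:
    masks: uint8 numpy array of shape [num_instances, height, width] with
      values in {0, 1}, one whole-image mask per groundtruth box.

  Returns:
    A bytes_list Feature suitable for the 'image/object/mask' key.
  """
  encoded = []
  for mask in masks:
    output = io.BytesIO()
    # Save each mask as a single-channel (mode 'L') PNG image.
    Image.fromarray(mask, mode='L').save(output, format='PNG')
    encoded.append(output.getvalue())
  return dataset_util.bytes_list_feature(encoded)
```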
#### Dense Numerical Instance Segmentation Masks
Masks can also be specified via a dense numerical tensor.
```shell
image/object/mask = [0.0, 0.0, 1.0, 1.0, 0.0, ...]
```
For an image with dimensions `H` x `W` and `num_boxes` groundtruth boxes, the
mask corresponds to a [`num_boxes`, `H`, `W`] float32 tensor, flattened into a
single vector of shape `num_boxes` * `H` * `W`. In TensorFlow, examples are read
in row-major format, so the elements are organized as:
```shell
... mask 0 row 0 ... mask 0 row 1 ... // ... mask 0 row H-1 ... mask 1 row 0 ...
```
where each row has W contiguous binary values.
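As a small sketch (with made-up shapes), the ordering above is simply a row-major
flattening of the `[num_boxes, H, W]` mask tensor:
```python
import numpy as np

from object_detection.utils import dataset_util

num_boxes, height, width = 2, 4, 5
masks = np.zeros((num_boxes, height, width), dtype=np.float32)
masks[0, 1:3, 2:4] = 1.0  # a small rectangular mask for the first box

# Row-major flattening: mask 0 row 0, mask 0 row 1, ..., mask 1 row 0, ...
flat_masks = masks.flatten()
assert flat_masks.shape[0] == num_boxes * height * width
mask_feature = dataset_util.float_list_feature(flat_masks.tolist())
```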
For an example of tf-records with mask labels, see the examples under the
[Preparing Inputs](preparing_inputs.md) section.
### Pre-existing config files
We provide four instance segmentation config files that you can use to train
your own models:
1. <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_inception_resnet_v2_atrous_coco.config" target=_blank>mask_rcnn_inception_resnet_v2_atrous_coco</a>
1. <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_resnet101_atrous_coco.config" target=_blank>mask_rcnn_resnet101_atrous_coco</a>
1. <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_resnet50_atrous_coco.config" target=_blank>mask_rcnn_resnet50_atrous_coco</a>
1. <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_inception_v2_coco.config" target=_blank>mask_rcnn_inception_v2_coco</a>
For more details see the [detection model zoo](detection_model_zoo.md).
### Updating a Faster R-CNN config file
Currently, the only supported instance segmentation model is [Mask
R-CNN](https://arxiv.org/abs/1703.06870), which requires Faster R-CNN as the
backbone object detector.
Once you have a baseline Faster R-CNN pipeline configuration, you can make the
following modifications in order to convert it into a Mask R-CNN model.
1. Within `train_input_reader` and `eval_input_reader`, set
`load_instance_masks` to `True`. If using PNG masks, set `mask_type` to
`PNG_MASKS`; otherwise you can leave it as the default `NUMERICAL_MASKS`.
1. Within the `faster_rcnn` config, use a `MaskRCNNBoxPredictor` as the
`second_stage_box_predictor`.
1. Within the `MaskRCNNBoxPredictor` message, set `predict_instance_masks` to
`True`. You must also define `conv_hyperparams`.
1. Within the `faster_rcnn` message, set `number_of_stages` to `3`.
1. Add instance segmentation metrics to the set of metrics:
`'coco_mask_metrics'`.
1. Update the `input_path`s to point at your data.
Please refer to the section on [Running the pets dataset](running_pets.md) for
additional details.
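As a rough sketch only, and assuming the proto field names match the sample
configs linked above (e.g. `number_of_stages`, `mask_rcnn_box_predictor`), the
edits listed above could also be scripted against the pipeline proto; the config
file names here are placeholders:
```python
from google.protobuf import text_format

from object_detection.protos import input_reader_pb2
from object_detection.protos import pipeline_pb2

pipeline = pipeline_pb2.TrainEvalPipelineConfig()
with open('faster_rcnn_baseline.config') as f:  # placeholder baseline config
  text_format.Merge(f.read(), pipeline)

for reader in (pipeline.train_input_reader, pipeline.eval_input_reader):
  reader.load_instance_masks = True
  reader.mask_type = input_reader_pb2.PNG_MASKS  # or NUMERICAL_MASKS

faster_rcnn = pipeline.model.faster_rcnn
faster_rcnn.number_of_stages = 3
mask_predictor = faster_rcnn.second_stage_box_predictor.mask_rcnn_box_predictor
mask_predictor.predict_instance_masks = True
# conv_hyperparams must also be defined on the MaskRCNNBoxPredictor; see the
# sample mask_rcnn_*.config files for typical values.

pipeline.eval_config.metrics_set.append('coco_mask_metrics')

with open('mask_rcnn_pipeline.config', 'w') as f:  # placeholder output path
  f.write(text_format.MessageToString(pipeline))
```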
> Note: The mask prediction branch consists of a sequence of convolution layers.
> You can set the number of convolution layers and their depth as follows:
>
> 1. Within the `MaskRCNNBoxPredictor` message, set the
> `mask_prediction_conv_depth` to your value of interest. The default value
> is 256. If you set it to `0` (recommended), the depth is computed
> automatically based on the number of classes in the dataset.
> 1. Within the `MaskRCNNBoxPredictor` message, set the
> `mask_prediction_num_conv_layers` to your value of interest. The default
> value is 2.
# Inference and evaluation on the Open Images dataset
This page presents a tutorial for running object detector inference and
evaluation measure computations on the [Open Images
dataset](https://github.com/openimages/dataset), using tools from the
[TensorFlow Object Detection
API](https://github.com/tensorflow/models/tree/master/research/object_detection).
It shows how to download the images and annotations for the validation and test
sets of Open Images; how to package the downloaded data in a format understood
by the Object Detection API; where to find a trained object detector model for
Open Images; how to run inference; and how to compute evaluation measures on the
inferred detections.
Inferred detections will look like the following:
![](img/oid_bus_72e19c28aac34ed8.jpg){height="300"}
![](img/oid_monkey_3b4168c89cecbc5b.jpg){height="300"}
On the validation set of Open Images, this tutorial requires 27GB of free disk
space and the inference step takes approximately 9 hours on a single NVIDIA
Tesla P100 GPU. On the test set -- 75GB and 27 hours respectively. All other
steps require less than two hours in total on both sets.
## Installing TensorFlow, the Object Detection API, and Google Cloud SDK
Please run through the [installation instructions](installation.md) to install
TensorFlow and all its dependencies. Ensure the Protobuf libraries are compiled
and the library directories are added to `PYTHONPATH`. You will also need to
`pip` install `pandas` and `contextlib2`.
Some of the data used in this tutorial lives in Google Cloud buckets. To access
it, you will have to [install the Google Cloud
SDK](https://cloud.google.com/sdk/downloads) on your workstation or laptop.
## Preparing the Open Images validation and test sets
In order to run inference and subsequent evaluation measure computations, we
require a dataset of images and ground truth boxes, packaged as TFRecords of
TFExamples. To create such a dataset for Open Images, you will need to first
download ground truth boxes from the [Open Images
website](https://github.com/openimages/dataset):
```bash
# From tensorflow/models/research
mkdir oid
cd oid
wget https://storage.googleapis.com/openimages/2017_07/annotations_human_bbox_2017_07.tar.gz
tar -xvf annotations_human_bbox_2017_07.tar.gz
```
Next, download the images. In this tutorial, we will use lower resolution images
provided by [CVDF](http://www.cvdfoundation.org). Please follow the instructions
on [CVDF's Open Images repository
page](https://github.com/cvdfoundation/open-images-dataset) in order to gain
access to the cloud bucket with the images. Then run:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # Set SPLIT to "test" to download the images in the test set
mkdir raw_images_${SPLIT}
gsutil -m rsync -r gs://open-images-dataset/$SPLIT raw_images_${SPLIT}
```
Another option for downloading the images is to follow the URLs contained in the
[image URLs and metadata CSV
files](https://storage.googleapis.com/openimages/2017_07/images_2017_07.tar.gz)
on the Open Images website.
At this point, your `tensorflow/models/research/oid` directory should appear as
follows:
```lang-none
|-- 2017_07
| |-- test
| | `-- annotations-human-bbox.csv
| |-- train
| | `-- annotations-human-bbox.csv
| `-- validation
| `-- annotations-human-bbox.csv
|-- raw_images_validation (if you downloaded the validation split)
| `-- ... (41,620 files matching regex "[0-9a-f]{16}.jpg")
|-- raw_images_test (if you downloaded the test split)
| `-- ... (125,436 files matching regex "[0-9a-f]{16}.jpg")
`-- annotations_human_bbox_2017_07.tar.gz
```
Next, package the data into TFRecords of TFExamples by running:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # Set SPLIT to "test" to create TFRecords for the test split
mkdir ${SPLIT}_tfrecords
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.dataset_tools.create_oid_tf_record \
--input_box_annotations_csv 2017_07/$SPLIT/annotations-human-bbox.csv \
--input_images_directory raw_images_${SPLIT} \
--input_label_map ../object_detection/data/oid_bbox_trainable_label_map.pbtxt \
--output_tf_record_path_prefix ${SPLIT}_tfrecords/$SPLIT.tfrecord \
--num_shards=100
```
This results in 100 TFRecord files (shards), written to
`oid/${SPLIT}_tfrecords`, with filenames matching
`${SPLIT}.tfrecord-000[0-9][0-9]-of-00100`. Each shard contains approximately
the same number of images and is de facto a representative random sample of the
input data. [This enables](#accelerating_inference) a straightforward work
division scheme for distributing inference and also approximate measure
computations on subsets of the validation and test sets.
## Inferring detections
Inference requires a trained object detection model. In this tutorial we will
use a model from the [detection model zoo](detection_model_zoo.md), which can
be downloaded and unpacked by running the commands below. More information about
the model, such as its architecture and how it was trained, is available in the
[model zoo page](detection_model_zoo.md).
```bash
# From tensorflow/models/research/oid
wget http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_oid_14_10_2017.tar.gz
tar -zxvf faster_rcnn_inception_resnet_v2_atrous_oid_14_10_2017.tar.gz
```
At this point, data is packed into TFRecords and we have an object detector
model. We can run inference using:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
TF_RECORD_FILES=$(ls -1 ${SPLIT}_tfrecords/* | tr '\n' ',')
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.inference.infer_detections \
--input_tfrecord_paths=$TF_RECORD_FILES \
--output_tfrecord_path=${SPLIT}_detections.tfrecord-00000-of-00001 \
--inference_graph=faster_rcnn_inception_resnet_v2_atrous_oid/frozen_inference_graph.pb \
--discard_image_pixels
```
Inference preserves all fields of the input TFExamples, and adds new fields to
store the inferred detections. This allows [computing evaluation
measures](#compute_evaluation_measures) on the output TFRecord alone, as ground
truth boxes are preserved as well. Since measure computations don't require
access to the images, `infer_detections` can optionally discard them with the
`--discard_image_pixels` flag. Discarding the images drastically reduces the
size of the output TFRecord.
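For example, a quick way to inspect the added fields is to iterate over the
output TFRecord. The sketch below assumes the single-shard output file produced
above and uses the detection field names written by `infer_detections`:
```python
import tensorflow as tf

path = 'validation_detections.tfrecord-00000-of-00001'
for record in tf.python_io.tf_record_iterator(path):
  example = tf.train.Example.FromString(record)
  feature = example.features.feature
  # Box coordinates live under image/detection/bbox/{ymin,xmin,ymax,xmax}.
  scores = feature['image/detection/score'].float_list.value
  labels = feature['image/detection/label'].int64_list.value
  print('%d detections; first label %s; top score %s'
        % (len(scores), labels[0] if labels else None,
           max(scores) if scores else None))
  break  # inspect only the first example
```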
### Accelerating inference {#accelerating_inference}
Running inference on the whole validation or test set can take a long time to
complete due to the large number of images present in these sets (41,620 and
125,436 respectively). For quick but approximate evaluation, inference and the
subsequent measure computations can be run on a small number of shards. To run,
for example, on 2% of all the data, it is enough to set `TF_RECORD_FILES` as
shown below before running `infer_detections`:
```bash
TF_RECORD_FILES=$(ls ${SPLIT}_tfrecords/${SPLIT}.tfrecord-0000[0-1]-of-00100 | tr '\n' ',')
```
Please note that computing evaluation measures on a small subset of the data
introduces variance and bias, since some classes of objects won't be seen during
evaluation. In the example above, this leads to a mAP on the first two shards of
the validation set that is 13.2 points higher than the mAP for the full set ([see mAP
results](#expected-maps)).
Another way to accelerate inference is to run it in parallel on multiple
TensorFlow devices on possibly multiple machines. The script below uses
[tmux](https://github.com/tmux/tmux/wiki) to run a separate `infer_detections`
process for each GPU, each on a different partition of the input data.
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
NUM_GPUS=4
NUM_SHARDS=100
tmux new-session -d -s "inference"
function tmux_start { tmux new-window -d -n "inference:GPU$1" "${*:2}; exec bash"; }
for gpu_index in $(seq 0 $(($NUM_GPUS-1))); do
start_shard=$(( $gpu_index * $NUM_SHARDS / $NUM_GPUS ))
end_shard=$(( ($gpu_index + 1) * $NUM_SHARDS / $NUM_GPUS - 1))
TF_RECORD_FILES=$(seq -s, -f "${SPLIT}_tfrecords/${SPLIT}.tfrecord-%05.0f-of-$(printf '%05d' $NUM_SHARDS)" $start_shard $end_shard)
tmux_start ${gpu_index} \
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) CUDA_VISIBLE_DEVICES=$gpu_index \
python -m object_detection.inference.infer_detections \
--input_tfrecord_paths=$TF_RECORD_FILES \
--output_tfrecord_path=${SPLIT}_detections.tfrecord-$(printf "%05d" $gpu_index)-of-$(printf "%05d" $NUM_GPUS) \
--inference_graph=faster_rcnn_inception_resnet_v2_atrous_oid/frozen_inference_graph.pb \
--discard_image_pixels
done
```
After all `infer_detections` processes finish, `tensorflow/models/research/oid`
will contain one output TFRecord from each process, with names matching
`validation_detections.tfrecord-0000[0-3]-of-00004`.
## Computing evaluation measures {#compute_evaluation_measures}
To compute evaluation measures on the inferred detections you first need to
create the appropriate configuration files:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
NUM_SHARDS=1 # Set to NUM_GPUS if using the parallel inference script above
mkdir -p ${SPLIT}_eval_metrics
echo "
label_map_path: '../object_detection/data/oid_bbox_trainable_label_map.pbtxt'
tf_record_input_reader: { input_path: '${SPLIT}_detections.tfrecord@${NUM_SHARDS}' }
" > ${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt
echo "
metrics_set: 'open_images_V2_detection_metrics'
" > ${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt
```
And then run:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.metrics.offline_eval_map_corloc \
--eval_dir=${SPLIT}_eval_metrics \
--eval_config_path=${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt \
--input_config_path=${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt
```
The first configuration file contains an `object_detection.protos.InputReader`
message that describes the location of the necessary input files. The second
file contains an `object_detection.protos.EvalConfig` message that describes the
evaluation metric. For more information about these protos see the corresponding
source files.
### Expected mAPs {#expected-maps}
The result of running `offline_eval_map_corloc` is a CSV file located at
`${SPLIT}_eval_metrics/metrics.csv`. With the above configuration, the file will
contain average precision at IoU≥0.5 for each of the classes present in the
dataset. It will also contain the mAP@IoU≥0.5. Both the per-class average
precisions and the mAP are computed according to the [Open Images evaluation
protocol](evaluation_protocols.md). The expected mAPs for the validation and
test sets of Open Images in this case are:
Set | Fraction of data | Images | mAP@IoU≥0.5
---------: | :--------------: | :-----: | -----------
validation | everything | 41,620 | 39.2%
validation | first 2 shards | 884 | 52.4%
test | everything | 125,436 | 37.7%
test | first 2 shards | 2,476 | 50.8%
# Preparing Inputs
The Tensorflow Object Detection API reads data using the TFRecord file format. Two
sample scripts (`create_pascal_tf_record.py` and `create_pet_tf_record.py`) are
provided to convert from the PASCAL VOC dataset and Oxford-IIIT Pet dataset to
TFRecords.
## Generating the PASCAL VOC TFRecord files.
The raw 2012 PASCAL VOC data set is located
[here](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar).
To download, extract, and convert it to TFRecords, run the following commands:
```bash
# From tensorflow/models/research/
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar
python object_detection/dataset_tools/create_pascal_tf_record.py \
--label_map_path=object_detection/data/pascal_label_map.pbtxt \
--data_dir=VOCdevkit --year=VOC2012 --set=train \
--output_path=pascal_train.record
python object_detection/dataset_tools/create_pascal_tf_record.py \
--label_map_path=object_detection/data/pascal_label_map.pbtxt \
--data_dir=VOCdevkit --year=VOC2012 --set=val \
--output_path=pascal_val.record
```
You should end up with two TFRecord files named `pascal_train.record` and
`pascal_val.record` in the `tensorflow/models/research/` directory.
The label map for the PASCAL VOC data set can be found at
`object_detection/data/pascal_label_map.pbtxt`.
## Generating the Oxford-IIIT Pet TFRecord files.
The Oxford-IIIT Pet data set is located
[here](http://www.robots.ox.ac.uk/~vgg/data/pets/). To download, extract and
convert it to TFRecords, run the following commands:
```bash
# From tensorflow/models/research/
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
tar -xvf annotations.tar.gz
tar -xvf images.tar.gz
python object_detection/dataset_tools/create_pet_tf_record.py \
--label_map_path=object_detection/data/pet_label_map.pbtxt \
--data_dir=`pwd` \
--output_dir=`pwd`
```
You should end up with two TFRecord files named `pet_train.record` and
`pet_val.record` in the `tensorflow/models/research/` directory.
The label map for the Pet dataset can be found at
`object_detection/data/pet_label_map.pbtxt`.
# Running Locally
This page walks through the steps required to train an object detection model
on a local machine. It assumes the reader has completed the
following prerequisites:
1. The Tensorflow Object Detection API has been installed as documented in the
[installation instructions](installation.md). This includes installing library
dependencies, compiling the configuration protobufs and setting up the Python
environment.
2. A valid data set has been created. See [this page](preparing_inputs.md) for
instructions on how to generate a dataset for the PASCAL VOC challenge or the
Oxford-IIIT Pet dataset.
3. An Object Detection pipeline configuration has been written. See
[this page](configuring_jobs.md) for details on how to write a pipeline configuration.
## Recommended Directory Structure for Training and Evaluation
```
+data
-label_map file
-train TFRecord file
-eval TFRecord file
+models
+ model
-pipeline config file
+train
+eval
```
## Running the Training Job
A local training job can be run with the following command:
```bash
# From the tensorflow/models/research/ directory
python object_detection/train.py \
--logtostderr \
--pipeline_config_path=${PATH_TO_YOUR_PIPELINE_CONFIG} \
--train_dir=${PATH_TO_TRAIN_DIR}
```
where `${PATH_TO_YOUR_PIPELINE_CONFIG}` points to the pipeline config and
`${PATH_TO_TRAIN_DIR}` points to the directory to which training checkpoints
and events will be written. By default, the training job will
run indefinitely until the user kills it.
## Running the Evaluation Job
Evaluation is run as a separate job. The eval job will periodically poll the
train directory for new checkpoints and evaluate them on a test dataset. The
job can be run using the following command:
```bash
# From the tensorflow/models/research/ directory
python object_detection/eval.py \
--logtostderr \
--pipeline_config_path=${PATH_TO_YOUR_PIPELINE_CONFIG} \
--checkpoint_dir=${PATH_TO_TRAIN_DIR} \
--eval_dir=${PATH_TO_EVAL_DIR}
```
where `${PATH_TO_YOUR_PIPELINE_CONFIG}` points to the pipeline config,
`${PATH_TO_TRAIN_DIR}` points to the directory in which training checkpoints
were saved (same as the training job) and `${PATH_TO_EVAL_DIR}` points to the
directory in which evaluation events will be saved. As with the training job,
the eval job runs until terminated by default.
## Running Tensorboard
Progress for training and eval jobs can be inspected using Tensorboard. If
using the recommended directory structure, Tensorboard can be run using the
following command:
```bash
tensorboard --logdir=${PATH_TO_MODEL_DIRECTORY}
```
where `${PATH_TO_MODEL_DIRECTORY}` points to the directory that contains the
train and eval directories. Please note it may take Tensorboard a couple of minutes
to populate with data.
# Quick Start: Jupyter notebook for off-the-shelf inference
If you'd like to hit the ground running and run detection on a few example
images right out of the box, we recommend trying out the Jupyter notebook demo.
To run the Jupyter notebook, run the following command from
`tensorflow/models/research/object_detection`:
```
# From tensorflow/models/research/object_detection
jupyter notebook
```
The notebook should open in your favorite web browser. Click the
[`object_detection_tutorial.ipynb`](../object_detection_tutorial.ipynb) link to
open the demo.
# Running on Google Cloud Platform
The Tensorflow Object Detection API supports distributed training on Google
Cloud ML Engine. This section documents instructions on how to train and
evaluate your model using Cloud ML. The reader should complete the following
prerequisites:
1. The reader has created and configured a project on Google Cloud Platform.
See [the Cloud ML quick start guide](https://cloud.google.com/ml-engine/docs/quickstarts/command-line).
2. The reader has installed the Tensorflow Object Detection API as documented
in the [installation instructions](installation.md).
3. The reader has a valid data set and stored it in a Google Cloud Storage
bucket. See [this page](preparing_inputs.md) for instructions on how to generate
a dataset for the PASCAL VOC challenge or the Oxford-IIIT Pet dataset.
4. The reader has configured a valid Object Detection pipeline, and stored it
in a Google Cloud Storage bucket. See [this page](configuring_jobs.md) for
details on how to write a pipeline configuration.
Additionally, it is recommended users test their job by running training and
evaluation jobs for a few iterations
[locally on their own machines](running_locally.md).
## Packaging
In order to run the Tensorflow Object Detection API on Cloud ML, it must be
packaged (along with its TF-Slim dependency). The required packages can be
created with the following commands:
``` bash
# From tensorflow/models/research/
python setup.py sdist
(cd slim && python setup.py sdist)
```
This will create Python packages at `dist/object_detection-0.1.tar.gz` and
`slim/dist/slim-0.1.tar.gz`.
## Running a Multiworker Training Job
Google Cloud ML requires a YAML configuration file for a multiworker training
job using GPUs. A sample YAML file is given below:
```
trainingInput:
runtimeVersion: "1.2"
scaleTier: CUSTOM
masterType: standard_gpu
workerCount: 9
workerType: standard_gpu
parameterServerCount: 3
parameterServerType: standard
```
Please keep the following guidelines in mind when writing the YAML
configuration:
* A job with n workers will have n + 1 training machines (n workers + 1 master).
* The number of parameter servers used should be an odd number to prevent
a parameter server from storing only weight variables or only bias variables
(due to round robin parameter scheduling).
* The learning rate in the training config should be decreased when using a
larger number of workers. Some experimentation is required to find the
optimal learning rate.
The YAML file should be saved on the local machine (not on GCP). Once it has
been written, a user can start a training job on Cloud ML Engine using the
following command:
``` bash
# From tensorflow/models/research/
gcloud ml-engine jobs submit training object_detection_`date +%s` \
--runtime-version 1.2 \
--job-dir=gs://${TRAIN_DIR} \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train \
--region us-central1 \
--config ${PATH_TO_LOCAL_YAML_FILE} \
-- \
--train_dir=gs://${TRAIN_DIR} \
--pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
```
Where `${PATH_TO_LOCAL_YAML_FILE}` is the local path to the YAML configuration,
`gs://${TRAIN_DIR}` specifies the directory on Google Cloud Storage where the
training checkpoints and events will be written to and
`gs://${PIPELINE_CONFIG_PATH}` points to the pipeline configuration stored on
Google Cloud Storage.
Users can monitor the progress of their training job on the [ML Engine
Dashboard](https://console.cloud.google.com/mlengine/jobs).
Note: This sample is supported for use with the 1.2 runtime version.
## Running an Evaluation Job on Cloud
Evaluation jobs run on a single machine, so it is not necessary to write a YAML
configuration for evaluation. Run the following command to start the evaluation
job:
``` bash
gcloud ml-engine jobs submit training object_detection_eval_`date +%s` \
--runtime-version 1.2 \
--job-dir=gs://${TRAIN_DIR} \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.eval \
--region us-central1 \
--scale-tier BASIC_GPU \
-- \
--checkpoint_dir=gs://${TRAIN_DIR} \
--eval_dir=gs://${EVAL_DIR} \
--pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
```
Where `gs://${TRAIN_DIR}` points to the directory on Google Cloud Storage where
training checkpoints are saved (same as the training job), `gs://${EVAL_DIR}`
points to where evaluation events will be saved on Google Cloud Storage and
`gs://${PIPELINE_CONFIG_PATH}` points to where the pipeline configuration is
stored on Google Cloud Storage.
## Running Tensorboard
You can run Tensorboard locally on your own machine to view progress of your
training and eval jobs on Google Cloud ML. Run the following command to start
Tensorboard:
``` bash
tensorboard --logdir=gs://${YOUR_CLOUD_BUCKET}
```
Note it may take Tensorboard a few minutes to populate with results.
# Quick Start: Distributed Training on the Oxford-IIIT Pets Dataset on Google Cloud
This page is a walkthrough for training an object detector using the Tensorflow
Object Detection API. In this tutorial, we'll be training on the Oxford-IIIT Pets
dataset to build a system to detect various breeds of cats and dogs. The output
of the detector will look like the following:
![](img/oxford_pet.png)
## Setting up a Project on Google Cloud
To accelerate the process, we'll run training and evaluation on [Google Cloud
ML Engine](https://cloud.google.com/ml-engine/) to leverage multiple GPUs. To
begin, you will have to set up Google Cloud via the following steps (if you have
already done this, feel free to skip to the next section):
1. [Create a GCP project](https://cloud.google.com/resource-manager/docs/creating-managing-projects).
2. [Install the Google Cloud SDK](https://cloud.google.com/sdk/downloads) on
your workstation or laptop.
This will provide the tools you need to upload files to Google Cloud Storage and
start ML training jobs.
3. [Enable the ML Engine
APIs](https://console.cloud.google.com/flows/enableapi?apiid=ml.googleapis.com,compute_component&_ga=1.73374291.1570145678.1496689256).
By default, a new GCP project does not enable APIs to start ML Engine training
jobs. Use the above link to explicitly enable them.
4. [Set up a Google Cloud Storage (GCS)
bucket](https://cloud.google.com/storage/docs/creating-buckets). ML Engine
training jobs can only access files on a Google Cloud Storage bucket. In this
tutorial, we'll be required to upload our dataset and configuration to GCS.
Please remember the name of your GCS bucket, as we will reference it multiple
times in this document. Substitute `${YOUR_GCS_BUCKET}` with the name of
your bucket in this document. For your convenience, you should define the
environment variable below:
``` bash
export YOUR_GCS_BUCKET=${YOUR_GCS_BUCKET}
```
It is also possible to run locally by following
[the running locally instructions](running_locally.md).
## Installing Tensorflow and the Tensorflow Object Detection API
Please run through the [installation instructions](installation.md) to install
Tensorflow and all its dependencies. Ensure the Protobuf libraries are
compiled and the library directories are added to `PYTHONPATH`.
## Getting the Oxford-IIIT Pets Dataset and Uploading it to Google Cloud Storage
In order to train a detector, we require a dataset of images, bounding boxes and
classifications. For this demo, we'll use the Oxford-IIIT Pets dataset. The raw
dataset for Oxford-IIIT Pets lives
[here](http://www.robots.ox.ac.uk/~vgg/data/pets/). You will need to download
both the image dataset [`images.tar.gz`](http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz)
and the groundtruth data [`annotations.tar.gz`](http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz)
to the `tensorflow/models/research/` directory and unzip them. This may take
some time.
``` bash
# From tensorflow/models/research/
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
tar -xvf images.tar.gz
tar -xvf annotations.tar.gz
```
After downloading the tarballs, your `tensorflow/models/research/` directory
should appear as follows:
```lang-none
- images.tar.gz
- annotations.tar.gz
+ images/
+ annotations/
+ object_detection/
... other files and directories
```
The Tensorflow Object Detection API expects data to be in the TFRecord format,
so we'll now run the `create_pet_tf_record` script to convert from the raw
Oxford-IIIT Pet dataset into TFRecords. Run the following commands from the
`tensorflow/models/research/` directory:
``` bash
# From tensorflow/models/research/
python object_detection/dataset_tools/create_pet_tf_record.py \
--label_map_path=object_detection/data/pet_label_map.pbtxt \
--data_dir=`pwd` \
--output_dir=`pwd`
```
Note: It is normal to see some warnings when running this script. You may ignore
them.
Two TFRecord files named `pet_train.record` and `pet_val.record` should be
generated in the `tensorflow/models/research/` directory.
Now that the data has been generated, we'll need to upload it to Google Cloud
Storage so the data can be accessed by ML Engine. Run the following command to
copy the files into your GCS bucket (substituting `${YOUR_GCS_BUCKET}`):
``` bash
# From tensorflow/models/research/
gsutil cp pet_train.record gs://${YOUR_GCS_BUCKET}/data/pet_train.record
gsutil cp pet_val.record gs://${YOUR_GCS_BUCKET}/data/pet_val.record
gsutil cp object_detection/data/pet_label_map.pbtxt gs://${YOUR_GCS_BUCKET}/data/pet_label_map.pbtxt
```
Please remember the path where you upload the data to, as we will need this
information when configuring the pipeline in a following step.
## Downloading a COCO-pretrained Model for Transfer Learning
Training a state-of-the-art object detector from scratch can take days, even
when using multiple GPUs! In order to speed up training, we'll take an object
detector trained on a different dataset (COCO), and reuse some of its
parameters to initialize our new model.
Download our [COCO-pretrained Faster R-CNN with Resnet-101
model](http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz).
Extract the archive and copy the `model.ckpt*` files into your GCS
bucket.
``` bash
wget http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz
tar -xvf faster_rcnn_resnet101_coco_11_06_2017.tar.gz
gsutil cp faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.* gs://${YOUR_GCS_BUCKET}/data/
```
Remember the path where you uploaded the model checkpoint to, as we will need it
in the following step.
## Configuring the Object Detection Pipeline
In the Tensorflow Object Detection API, the model parameters, training
parameters and eval parameters are all defined by a config file. More details
can be found [here](configuring_jobs.md). For this tutorial, we will use some
predefined templates provided with the source code. In the
`object_detection/samples/configs` folder, there are skeleton object_detection
configuration files. We will use `faster_rcnn_resnet101_pets.config` as a
starting point for configuring the pipeline. Open the file with your favourite
text editor.
We'll need to configure some paths in order for the template to work. Search the
file for instances of `PATH_TO_BE_CONFIGURED` and replace them with the
appropriate value (typically `gs://${YOUR_GCS_BUCKET}/data/`). Afterwards,
upload your edited file to GCS, making note of the path it was uploaded to
(we'll need it when starting the training/eval jobs).
``` bash
# From tensorflow/models/research/
# Edit the faster_rcnn_resnet101_pets.config template. Please note that there
# are multiple places where PATH_TO_BE_CONFIGURED needs to be set.
sed -i "s|PATH_TO_BE_CONFIGURED|"gs://${YOUR_GCS_BUCKET}"/data|g" \
object_detection/samples/configs/faster_rcnn_resnet101_pets.config
# Copy edited template to cloud.
gsutil cp object_detection/samples/configs/faster_rcnn_resnet101_pets.config \
gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
```
## Checking Your Google Cloud Storage Bucket
At this point in the tutorial, you should have uploaded the training/validation
datasets (including the label map), the COCO-pretrained Faster R-CNN fine-tune
checkpoint, and your job configuration to your Google Cloud Storage bucket. Your
bucket should look like
the following:
```lang-none
+ ${YOUR_GCS_BUCKET}/
+ data/
- faster_rcnn_resnet101_pets.config
- model.ckpt.index
- model.ckpt.meta
- model.ckpt.data-00000-of-00001
- pet_label_map.pbtxt
- pet_train.record
- pet_val.record
```
You can inspect your bucket using the [Google Cloud Storage
browser](https://console.cloud.google.com/storage/browser).
## Starting Training and Evaluation Jobs on Google Cloud ML Engine
Before we can start a job on Google Cloud ML Engine, we must:
1. Package the Tensorflow Object Detection code.
2. Write a cluster configuration for our Google Cloud ML job.
To package the Tensorflow Object Detection code, run the following commands from
the `tensorflow/models/research/` directory:
``` bash
# From tensorflow/models/research/
python setup.py sdist
(cd slim && python setup.py sdist)
```
You should see two tar.gz files created at `dist/object_detection-0.1.tar.gz`
and `slim/dist/slim-0.1.tar.gz`.
For running the training Cloud ML job, we'll configure the cluster to use 10
training machines (1 master + 9 workers) and three parameter servers. The
configuration file can be found at `object_detection/samples/cloud/cloud.yml`.
Note: This sample is supported for use with the 1.2 runtime version.
To start training, execute the following command from the
`tensorflow/models/research/` directory:
``` bash
# From tensorflow/models/research/
gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` \
--runtime-version 1.2 \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train \
--region us-central1 \
--config object_detection/samples/cloud/cloud.yml \
-- \
--train_dir=gs://${YOUR_GCS_BUCKET}/train \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
```
Once training has started, we can run an evaluation concurrently:
``` bash
# From tensorflow/models/research/
gcloud ml-engine jobs submit training `whoami`_object_detection_eval_`date +%s` \
--runtime-version 1.2 \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.eval \
--region us-central1 \
--scale-tier BASIC_GPU \
-- \
--checkpoint_dir=gs://${YOUR_GCS_BUCKET}/train \
--eval_dir=gs://${YOUR_GCS_BUCKET}/eval \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
```
Note: Even though we're running an evaluation job, the `gcloud ml-engine jobs
submit training` command is correct. ML Engine does not distinguish between
training and evaluation jobs.
Users can monitor and stop training and evaluation jobs on the [ML Engine
Dashboard](https://console.cloud.google.com/mlengine/jobs).
## Monitoring Progress with Tensorboard
You can monitor progress of the training and eval jobs by running Tensorboard on
your local machine:
``` bash
# This command needs to be run once to allow your local machine to access your
# GCS bucket.
gcloud auth application-default login
tensorboard --logdir=gs://${YOUR_GCS_BUCKET}
```
Once Tensorboard is running, navigate to `localhost:6006` from your favourite
web browser. You should see something similar to the following:
![](img/tensorboard.png)
You will also want to click on the images tab to see example detections made by
the model while it trains. After about an hour and a half of training, you can
expect to see something like this:
![](img/tensorboard2.png)
Note: It takes roughly 10 minutes for a job to get started on ML Engine, and
roughly an hour for the system to evaluate the validation dataset. It may take
some time to populate the dashboards. If you do not see any entries after half
an hour, check the logs from the [ML Engine
Dashboard](https://console.cloud.google.com/mlengine/jobs). Note that by default
the training jobs are configured to go for much longer than is necessary for
convergence. To save money, we recommend killing your jobs once you've seen
that they've converged.
## Exporting the Tensorflow Graph
After your model has been trained, you should export it to a Tensorflow
graph proto. First, you need to identify a candidate checkpoint to export. You
can search your bucket using the [Google Cloud Storage
Browser](https://console.cloud.google.com/storage/browser). The file should be
stored under `${YOUR_GCS_BUCKET}/train`. The checkpoint will typically consist of
three files:
* `model.ckpt-${CHECKPOINT_NUMBER}.data-00000-of-00001`
* `model.ckpt-${CHECKPOINT_NUMBER}.index`
* `model.ckpt-${CHECKPOINT_NUMBER}.meta`
After you've identified a candidate checkpoint to export, run the following
command from `tensorflow/models/research/`:
``` bash
# From tensorflow/models/research/
gsutil cp gs://${YOUR_GCS_BUCKET}/train/model.ckpt-${CHECKPOINT_NUMBER}.* .
python object_detection/export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path object_detection/samples/configs/faster_rcnn_resnet101_pets.config \
--trained_checkpoint_prefix model.ckpt-${CHECKPOINT_NUMBER} \
--output_directory exported_graphs
```
Afterwards, you should see a directory named `exported_graphs` containing the
SavedModel and frozen graph.
## Configuring the Instance Segmentation Pipeline
Mask prediction can be turned on for an object detection config by adding
`predict_instance_masks: true` within the `MaskRCNNBoxPredictor`. Other
parameters such as mask size, number of convolutions in the mask layer, and the
convolution hyper parameters can be defined. We will use
`mask_rcnn_resnet101_pets.config` as a starting point for configuring the
instance segmentation pipeline. Everything above that was mentioned about object
detection holds true for instance segmentation. Instance segmentation consists
of an object detection model with an additional head that predicts the object
mask inside each predicted box once we remove the training and other details.
Please refer to the section on [Running an Instance Segmentation
Model](instance_segmentation.md) for instructions on how to configure a model
that predicts masks in addition to object bounding boxes.
## What's Next
Congratulations, you have now trained an object detector for various cats and
dogs! There are several things you can do now:
1. [Test your exported model using the provided Jupyter notebook.](running_notebook.md)
2. [Experiment with different model configurations.](configuring_jobs.md)
3. Train an object detector using your own data.
# Preparing Inputs
To use your own dataset with the Tensorflow Object Detection API, you must convert it
into the [TFRecord file format](https://www.tensorflow.org/api_guides/python/python_io#tfrecords_format_details).
This document outlines how to write a script to generate the TFRecord file.
## Label Maps
Each dataset is required to have a label map associated with it. This label map
defines a mapping from string class names to integer class IDs. The label map
should be a `StringIntLabelMap` text protobuf. Sample label maps can be found in
`object_detection/data`. Label map IDs should always start from 1.
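For reference, a minimal sketch of loading a label map in Python, using the same
utility the dataset creation scripts use; the class name below is taken from the
sample pet label map:
```python
from object_detection.utils import label_map_util

# Maps class names to their integer ids (starting from 1).
label_map_dict = label_map_util.get_label_map_dict(
    'object_detection/data/pet_label_map.pbtxt')
print(label_map_dict['Abyssinian'])
```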
## Dataset Requirements
For every example in your dataset, you should have the following information:
1. An RGB image for the dataset, encoded as JPEG or PNG.
2. A list of bounding boxes for the image. Each bounding box should contain:
1. Bounding box coordinates (with the origin in the top left corner), defined by 4
floating point numbers [ymin, xmin, ymax, xmax]. Note that we store the
_normalized_ coordinates (x / width, y / height) in the TFRecord dataset.
2. The class of the object in the bounding box.
## Example Image
Consider the following image:
![Example Image](img/example_cat.jpg "Example Image")
with the following label map:
```
item {
id: 1
name: 'Cat'
}
item {
id: 2
name: 'Dog'
}
```
We can generate a tf.Example proto for this image using the following code:
```python
def create_cat_tf_example(encoded_cat_image_data):
"""Creates a tf.Example proto from sample cat image.
Args:
encoded_cat_image_data: The jpg encoded data of the cat image.
Returns:
example: The created tf.Example.
"""
height = 1032
width = 1200
filename = 'example_cat.jpg'
image_format = b'jpg'
xmins = [322.0 / 1200.0]
xmaxs = [1062.0 / 1200.0]
ymins = [174.0 / 1032.0]
ymaxs = [761.0 / 1032.0]
classes_text = ['Cat']
classes = [1]
tf_example = tf.train.Example(features=tf.train.Features(feature={
'image/height': dataset_util.int64_feature(height),
'image/width': dataset_util.int64_feature(width),
'image/filename': dataset_util.bytes_feature(filename),
'image/source_id': dataset_util.bytes_feature(filename),
'image/encoded': dataset_util.bytes_feature(encoded_cat_image_data),
'image/format': dataset_util.bytes_feature(image_format),
'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
return tf_example
```
## Conversion Script Outline
A typical conversion script will look like the following:
```python
import tensorflow as tf
from object_detection.utils import dataset_util
flags = tf.app.flags
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS
def create_tf_example(example):
# TODO(user): Populate the following variables from your example.
height = None # Image height
width = None # Image width
filename = None # Filename of the image. Empty if image is not from file
encoded_image_data = None # Encoded image bytes
image_format = None # b'jpeg' or b'png'
xmins = [] # List of normalized left x coordinates in bounding box (1 per box)
xmaxs = [] # List of normalized right x coordinates in bounding box
# (1 per box)
ymins = [] # List of normalized top y coordinates in bounding box (1 per box)
ymaxs = [] # List of normalized bottom y coordinates in bounding box
# (1 per box)
classes_text = [] # List of string class name of bounding box (1 per box)
classes = [] # List of integer class id of bounding box (1 per box)
tf_example = tf.train.Example(features=tf.train.Features(feature={
'image/height': dataset_util.int64_feature(height),
'image/width': dataset_util.int64_feature(width),
'image/filename': dataset_util.bytes_feature(filename),
'image/source_id': dataset_util.bytes_feature(filename),
'image/encoded': dataset_util.bytes_feature(encoded_image_data),
'image/format': dataset_util.bytes_feature(image_format),
'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
return tf_example
def main(_):
writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
# TODO(user): Write code to read in your dataset to examples variable
for example in examples:
tf_example = create_tf_example(example)
writer.write(tf_example.SerializeToString())
writer.close()
if __name__ == '__main__':
tf.app.run()
```
Note: You may notice additional fields in some other datasets. They are
currently unused by the API and are optional.
Note: Please refer to the section on [Running an Instance Segmentation
Model](instance_segmentation.md) for instructions on how to configure a model
that predicts masks in addition to object bounding boxes.
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Utility functions for detection inference."""
from __future__ import division
import tensorflow as tf
from object_detection.core import standard_fields
def build_input(tfrecord_paths):
"""Builds the graph's input.
Args:
tfrecord_paths: List of paths to the input TFRecords
Returns:
serialized_example_tensor: The next serialized example. String scalar Tensor
image_tensor: The decoded image of the example. Uint8 tensor,
shape=[1, None, None,3]
"""
filename_queue = tf.train.string_input_producer(
tfrecord_paths, shuffle=False, num_epochs=1)
tf_record_reader = tf.TFRecordReader()
_, serialized_example_tensor = tf_record_reader.read(filename_queue)
features = tf.parse_single_example(
serialized_example_tensor,
features={
standard_fields.TfExampleFields.image_encoded:
tf.FixedLenFeature([], tf.string),
})
encoded_image = features[standard_fields.TfExampleFields.image_encoded]
image_tensor = tf.image.decode_image(encoded_image, channels=3)
image_tensor.set_shape([None, None, 3])
image_tensor = tf.expand_dims(image_tensor, 0)
return serialized_example_tensor, image_tensor
def build_inference_graph(image_tensor, inference_graph_path):
"""Loads the inference graph and connects it to the input image.
Args:
image_tensor: The input image. uint8 tensor, shape=[1, None, None, 3]
inference_graph_path: Path to the inference graph with embedded weights
Returns:
detected_boxes_tensor: Detected boxes. Float tensor,
shape=[num_detections, 4]
detected_scores_tensor: Detected scores. Float tensor,
shape=[num_detections]
detected_labels_tensor: Detected labels. Int64 tensor,
shape=[num_detections]
"""
with tf.gfile.Open(inference_graph_path, 'r') as graph_def_file:
graph_content = graph_def_file.read()
graph_def = tf.GraphDef()
graph_def.MergeFromString(graph_content)
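# Connect the frozen graph's 'image_tensor' placeholder to the decoded input image.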
tf.import_graph_def(
graph_def, name='', input_map={'image_tensor': image_tensor})
g = tf.get_default_graph()
num_detections_tensor = tf.squeeze(
g.get_tensor_by_name('num_detections:0'), 0)
num_detections_tensor = tf.cast(num_detections_tensor, tf.int32)
detected_boxes_tensor = tf.squeeze(
g.get_tensor_by_name('detection_boxes:0'), 0)
detected_boxes_tensor = detected_boxes_tensor[:num_detections_tensor]
detected_scores_tensor = tf.squeeze(
g.get_tensor_by_name('detection_scores:0'), 0)
detected_scores_tensor = detected_scores_tensor[:num_detections_tensor]
detected_labels_tensor = tf.squeeze(
g.get_tensor_by_name('detection_classes:0'), 0)
detected_labels_tensor = tf.cast(detected_labels_tensor, tf.int64)
detected_labels_tensor = detected_labels_tensor[:num_detections_tensor]
return detected_boxes_tensor, detected_scores_tensor, detected_labels_tensor
def infer_detections_and_add_to_example(
serialized_example_tensor, detected_boxes_tensor, detected_scores_tensor,
detected_labels_tensor, discard_image_pixels):
"""Runs the supplied tensors and adds the inferred detections to the example.
Args:
serialized_example_tensor: Serialized TF example. Scalar string tensor
detected_boxes_tensor: Detected boxes. Float tensor,
shape=[num_detections, 4]
detected_scores_tensor: Detected scores. Float tensor,
shape=[num_detections]
detected_labels_tensor: Detected labels. Int64 tensor,
shape=[num_detections]
discard_image_pixels: If true, discards the image from the result
Returns:
The de-serialized TF example augmented with the inferred detections.
"""
tf_example = tf.train.Example()
(serialized_example, detected_boxes, detected_scores,
detected_classes) = tf.get_default_session().run([
serialized_example_tensor, detected_boxes_tensor, detected_scores_tensor,
detected_labels_tensor
])
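# Transpose to [4, num_detections] so each row holds one coordinate
# (ymin, xmin, ymax, xmax) across all detections.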
detected_boxes = detected_boxes.T
tf_example.ParseFromString(serialized_example)
feature = tf_example.features.feature
feature[standard_fields.TfExampleFields.
detection_score].float_list.value[:] = detected_scores
feature[standard_fields.TfExampleFields.
detection_bbox_ymin].float_list.value[:] = detected_boxes[0]
feature[standard_fields.TfExampleFields.
detection_bbox_xmin].float_list.value[:] = detected_boxes[1]
feature[standard_fields.TfExampleFields.
detection_bbox_ymax].float_list.value[:] = detected_boxes[2]
feature[standard_fields.TfExampleFields.
detection_bbox_xmax].float_list.value[:] = detected_boxes[3]
feature[standard_fields.TfExampleFields.
detection_class_label].int64_list.value[:] = detected_classes
if discard_image_pixels:
del feature[standard_fields.TfExampleFields.image_encoded]
return tf_example
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Tests for detection_inference.py."""
import os
import StringIO
import numpy as np
from PIL import Image
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.inference import detection_inference
from object_detection.utils import dataset_util
def get_mock_tfrecord_path():
return os.path.join(tf.test.get_temp_dir(), 'mock.tfrec')
def create_mock_tfrecord():
pil_image = Image.fromarray(np.array([[[123, 0, 0]]], dtype=np.uint8), 'RGB')
image_output_stream = StringIO.StringIO()
pil_image.save(image_output_stream, format='png')
encoded_image = image_output_stream.getvalue()
feature_map = {
'test_field':
dataset_util.float_list_feature([1, 2, 3, 4]),
standard_fields.TfExampleFields.image_encoded:
dataset_util.bytes_feature(encoded_image),
}
tf_example = tf.train.Example(features=tf.train.Features(feature=feature_map))
with tf.python_io.TFRecordWriter(get_mock_tfrecord_path()) as writer:
writer.write(tf_example.SerializeToString())
def get_mock_graph_path():
return os.path.join(tf.test.get_temp_dir(), 'mock_graph.pb')
def create_mock_graph():
g = tf.Graph()
with g.as_default():
in_image_tensor = tf.placeholder(
tf.uint8, shape=[1, None, None, 3], name='image_tensor')
tf.constant([2.0], name='num_detections')
tf.constant(
[[[0, 0.8, 0.7, 1], [0.1, 0.2, 0.8, 0.9], [0.2, 0.3, 0.4, 0.5]]],
name='detection_boxes')
tf.constant([[0.1, 0.2, 0.3]], name='detection_scores')
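# detection_classes scale with the sum of the input image (123 for the mock
# 1x1 image), yielding [123, 246, 369]; only the first two are kept because
# num_detections is 2.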
tf.identity(
tf.constant([[1.0, 2.0, 3.0]]) *
tf.reduce_sum(tf.cast(in_image_tensor, dtype=tf.float32)),
name='detection_classes')
graph_def = g.as_graph_def()
with tf.gfile.Open(get_mock_graph_path(), 'w') as fl:
fl.write(graph_def.SerializeToString())
class InferDetectionsTests(tf.test.TestCase):
def test_simple(self):
create_mock_graph()
create_mock_tfrecord()
serialized_example_tensor, image_tensor = detection_inference.build_input(
[get_mock_tfrecord_path()])
self.assertAllEqual(image_tensor.get_shape().as_list(), [1, None, None, 3])
(detected_boxes_tensor, detected_scores_tensor,
detected_labels_tensor) = detection_inference.build_inference_graph(
image_tensor, get_mock_graph_path())
with self.test_session(use_gpu=False) as sess:
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
tf.train.start_queue_runners()
tf_example = detection_inference.infer_detections_and_add_to_example(
serialized_example_tensor, detected_boxes_tensor,
detected_scores_tensor, detected_labels_tensor, False)
self.assertProtoEquals(r"""
features {
feature {
key: "image/detection/bbox/ymin"
value { float_list { value: [0.0, 0.1] } } }
feature {
key: "image/detection/bbox/xmin"
value { float_list { value: [0.8, 0.2] } } }
feature {
key: "image/detection/bbox/ymax"
value { float_list { value: [0.7, 0.8] } } }
feature {
key: "image/detection/bbox/xmax"
value { float_list { value: [1.0, 0.9] } } }
feature {
key: "image/detection/label"
value { int64_list { value: [123, 246] } } }
feature {
key: "image/detection/score"
value { float_list { value: [0.1, 0.2] } } }
feature {
key: "image/encoded"
value { bytes_list { value:
"\211PNG\r\n\032\n\000\000\000\rIHDR\000\000\000\001\000\000"
"\000\001\010\002\000\000\000\220wS\336\000\000\000\022IDATx"
"\234b\250f`\000\000\000\000\377\377\003\000\001u\000|gO\242"
"\213\000\000\000\000IEND\256B`\202" } } }
feature {
key: "test_field"
value { float_list { value: [1.0, 2.0, 3.0, 4.0] } } } }
""", tf_example)
def test_discard_image(self):
create_mock_graph()
create_mock_tfrecord()
serialized_example_tensor, image_tensor = detection_inference.build_input(
[get_mock_tfrecord_path()])
(detected_boxes_tensor, detected_scores_tensor,
detected_labels_tensor) = detection_inference.build_inference_graph(
image_tensor, get_mock_graph_path())
with self.test_session(use_gpu=False) as sess:
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
tf.train.start_queue_runners()
tf_example = detection_inference.infer_detections_and_add_to_example(
serialized_example_tensor, detected_boxes_tensor,
detected_scores_tensor, detected_labels_tensor, True)
self.assertProtoEquals(r"""
features {
feature {
key: "image/detection/bbox/ymin"
value { float_list { value: [0.0, 0.1] } } }
feature {
key: "image/detection/bbox/xmin"
value { float_list { value: [0.8, 0.2] } } }
feature {
key: "image/detection/bbox/ymax"
value { float_list { value: [0.7, 0.8] } } }
feature {
key: "image/detection/bbox/xmax"
value { float_list { value: [1.0, 0.9] } } }
feature {
key: "image/detection/label"
value { int64_list { value: [123, 246] } } }
feature {
key: "image/detection/score"
value { float_list { value: [0.1, 0.2] } } }
feature {
key: "test_field"
value { float_list { value: [1.0, 2.0, 3.0, 4.0] } } } }
""", tf_example)
if __name__ == '__main__':
tf.test.main()
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Infers detections on a TFRecord of TFExamples given an inference graph.
Example usage:
./infer_detections \
--input_tfrecord_paths=/path/to/input/tfrecord1,/path/to/input/tfrecord2 \
    --output_tfrecord_path=/path/to/output/detections.tfrecord \
--inference_graph=/path/to/frozen_weights_inference_graph.pb
The output is a TFRecord of TFExamples. Each TFExample from the input is first
augmented with detections from the inference graph and then copied to the
output.
The input and output nodes of the inference graph are expected to have the same
types, shapes, and semantics as the input and output nodes of graphs produced
by export_inference_graph.py, when run with --input_type=image_tensor.
The script can also discard the image pixels in the output. This greatly
reduces the output size and can potentially accelerate reading data in
subsequent processing steps that don't require the images (e.g. computing
metrics).
"""
import itertools
import tensorflow as tf
from object_detection.inference import detection_inference
tf.flags.DEFINE_string('input_tfrecord_paths', None,
'A comma separated list of paths to input TFRecords.')
tf.flags.DEFINE_string('output_tfrecord_path', None,
'Path to the output TFRecord.')
tf.flags.DEFINE_string('inference_graph', None,
'Path to the inference graph with embedded weights.')
tf.flags.DEFINE_boolean('discard_image_pixels', False,
'Discards the images in the output TFExamples. This'
' significantly reduces the output size and is useful'
' if the subsequent tools don\'t need access to the'
' images (e.g. when computing evaluation measures).')
FLAGS = tf.flags.FLAGS
def main(_):
tf.logging.set_verbosity(tf.logging.INFO)
required_flags = ['input_tfrecord_paths', 'output_tfrecord_path',
'inference_graph']
for flag_name in required_flags:
if not getattr(FLAGS, flag_name):
raise ValueError('Flag --{} is required'.format(flag_name))
with tf.Session() as sess:
input_tfrecord_paths = [
v for v in FLAGS.input_tfrecord_paths.split(',') if v]
tf.logging.info('Reading input from %d files', len(input_tfrecord_paths))
serialized_example_tensor, image_tensor = detection_inference.build_input(
input_tfrecord_paths)
tf.logging.info('Reading graph and building model...')
(detected_boxes_tensor, detected_scores_tensor,
detected_labels_tensor) = detection_inference.build_inference_graph(
image_tensor, FLAGS.inference_graph)
tf.logging.info('Running inference and writing output to {}'.format(
FLAGS.output_tfrecord_path))
sess.run(tf.local_variables_initializer())
tf.train.start_queue_runners()
with tf.python_io.TFRecordWriter(
FLAGS.output_tfrecord_path) as tf_record_writer:
try:
for counter in itertools.count():
tf.logging.log_every_n(tf.logging.INFO, 'Processed %d images...', 10,
counter)
tf_example = detection_inference.infer_detections_and_add_to_example(
serialized_example_tensor, detected_boxes_tensor,
detected_scores_tensor, detected_labels_tensor,
FLAGS.discard_image_pixels)
tf_record_writer.write(tf_example.SerializeToString())
except tf.errors.OutOfRangeError:
tf.logging.info('Finished processing records')
if __name__ == '__main__':
tf.app.run()
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Model input function for tf-learn object detection model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import tensorflow as tf
from object_detection.builders import dataset_builder
from object_detection.builders import image_resizer_builder
from object_detection.builders import model_builder
from object_detection.builders import preprocessor_builder
from object_detection.core import preprocessor
from object_detection.core import standard_fields as fields
from object_detection.data_decoders import tf_example_decoder
from object_detection.protos import eval_pb2
from object_detection.protos import input_reader_pb2
from object_detection.protos import model_pb2
from object_detection.protos import train_pb2
from object_detection.utils import config_util
from object_detection.utils import dataset_util
from object_detection.utils import ops as util_ops
HASH_KEY = 'hash'
HASH_BINS = 1 << 31
SERVING_FED_EXAMPLE_KEY = 'serialized_example'
# A map of names to methods that help build the input pipeline.
INPUT_BUILDER_UTIL_MAP = {
'dataset_build': dataset_builder.build,
}
def transform_input_data(tensor_dict,
model_preprocess_fn,
image_resizer_fn,
num_classes,
data_augmentation_fn=None,
merge_multiple_boxes=False,
retain_original_image=False):
"""A single function that is responsible for all input data transformations.
Data transformation functions are applied in the following order.
1. If key fields.InputDataFields.image_additional_channels is present in
tensor_dict, the additional channels will be merged into
fields.InputDataFields.image.
2. data_augmentation_fn (optional): applied on tensor_dict.
3. model_preprocess_fn: applied only on image tensor in tensor_dict.
4. image_resizer_fn: applied on original image and instance mask tensor in
tensor_dict.
5. one_hot_encoding: applied to classes tensor in tensor_dict.
6. merge_multiple_boxes (optional): when groundtruth boxes are exactly the
same they can be merged into a single box with an associated k-hot class
label.
Args:
tensor_dict: dictionary containing input tensors keyed by
fields.InputDataFields.
model_preprocess_fn: model's preprocess function to apply on image tensor.
      This function must take in a 4-D float tensor and return a 4-D preprocessed
float tensor and a tensor containing the true image shape.
image_resizer_fn: image resizer function to apply on groundtruth instance
      masks. This function must take a 3-D float tensor of an image and a 3-D
tensor of instance masks and return a resized version of these along with
the true shapes.
    num_classes: maximum number of classes to one-hot (or k-hot) encode the class
labels.
data_augmentation_fn: (optional) data augmentation function to apply on
input `tensor_dict`.
merge_multiple_boxes: (optional) whether to merge multiple groundtruth boxes
and classes for a given image if the boxes are exactly the same.
retain_original_image: (optional) whether to retain original image in the
output dictionary.
Returns:
A dictionary keyed by fields.InputDataFields containing the tensors obtained
after applying all the transformations.
"""
if fields.InputDataFields.image_additional_channels in tensor_dict:
channels = tensor_dict[fields.InputDataFields.image_additional_channels]
tensor_dict[fields.InputDataFields.image] = tf.concat(
[tensor_dict[fields.InputDataFields.image], channels], axis=2)
if retain_original_image:
tensor_dict[fields.InputDataFields.original_image] = tf.cast(
tensor_dict[fields.InputDataFields.image], tf.uint8)
# Apply data augmentation ops.
if data_augmentation_fn is not None:
tensor_dict = data_augmentation_fn(tensor_dict)
# Apply model preprocessing ops and resize instance masks.
image = tensor_dict[fields.InputDataFields.image]
preprocessed_resized_image, true_image_shape = model_preprocess_fn(
tf.expand_dims(tf.to_float(image), axis=0))
tensor_dict[fields.InputDataFields.image] = tf.squeeze(
preprocessed_resized_image, axis=0)
tensor_dict[fields.InputDataFields.true_image_shape] = tf.squeeze(
true_image_shape, axis=0)
if fields.InputDataFields.groundtruth_instance_masks in tensor_dict:
masks = tensor_dict[fields.InputDataFields.groundtruth_instance_masks]
_, resized_masks, _ = image_resizer_fn(image, masks)
tensor_dict[fields.InputDataFields.
groundtruth_instance_masks] = resized_masks
# Transform groundtruth classes to one hot encodings.
label_offset = 1
zero_indexed_groundtruth_classes = tensor_dict[
fields.InputDataFields.groundtruth_classes] - label_offset
tensor_dict[fields.InputDataFields.groundtruth_classes] = tf.one_hot(
zero_indexed_groundtruth_classes, num_classes)
if merge_multiple_boxes:
merged_boxes, merged_classes, _ = util_ops.merge_boxes_with_multiple_labels(
tensor_dict[fields.InputDataFields.groundtruth_boxes],
zero_indexed_groundtruth_classes, num_classes)
tensor_dict[fields.InputDataFields.groundtruth_boxes] = merged_boxes
tensor_dict[fields.InputDataFields.groundtruth_classes] = merged_classes
return tensor_dict
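# A minimal sketch of how transform_input_data is typically composed: the
# model's preprocess function and an image resizer are bound with
# functools.partial, as done in create_train_input_fn below. The fake functions
# and toy tensors here are illustrative assumptions only.
def _example_transform_input_data():
  """Returns a transformed tensor dict built from a toy input (sketch)."""
  def fake_preprocess_fn(image):
    # Identity "preprocessing" that also reports the true image shape.
    return image, tf.expand_dims(tf.shape(image)[1:], axis=0)
  def fake_resizer_fn(image, masks):
    # Identity resizer returning the (unchanged) image, masks and shape.
    return image, masks, tf.shape(image)
  tensor_dict = {
      fields.InputDataFields.image:
          tf.zeros([4, 4, 3], dtype=tf.float32),
      fields.InputDataFields.groundtruth_classes:
          tf.constant([1, 2], dtype=tf.int32),
  }
  transform_fn = functools.partial(
      transform_input_data,
      model_preprocess_fn=fake_preprocess_fn,
      image_resizer_fn=fake_resizer_fn,
      num_classes=3)
  return transform_fn(tensor_dict)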
def augment_input_data(tensor_dict, data_augmentation_options):
"""Applies data augmentation ops to input tensors.
Args:
tensor_dict: A dictionary of input tensors keyed by fields.InputDataFields.
data_augmentation_options: A list of tuples, where each tuple contains a
function and a dictionary that contains arguments and their values.
Usually, this is the output of core/preprocessor.build.
Returns:
A dictionary of tensors obtained by applying data augmentation ops to the
input tensor dictionary.
"""
tensor_dict[fields.InputDataFields.image] = tf.expand_dims(
tf.to_float(tensor_dict[fields.InputDataFields.image]), 0)
include_instance_masks = (fields.InputDataFields.groundtruth_instance_masks
in tensor_dict)
include_keypoints = (fields.InputDataFields.groundtruth_keypoints
in tensor_dict)
tensor_dict = preprocessor.preprocess(
tensor_dict, data_augmentation_options,
func_arg_map=preprocessor.get_default_func_arg_map(
include_instance_masks=include_instance_masks,
include_keypoints=include_keypoints))
tensor_dict[fields.InputDataFields.image] = tf.squeeze(
tensor_dict[fields.InputDataFields.image], axis=0)
return tensor_dict
def _get_labels_dict(input_dict):
"""Extracts labels dict from input dict."""
required_label_keys = [
fields.InputDataFields.num_groundtruth_boxes,
fields.InputDataFields.groundtruth_boxes,
fields.InputDataFields.groundtruth_classes,
fields.InputDataFields.groundtruth_weights
]
labels_dict = {}
for key in required_label_keys:
labels_dict[key] = input_dict[key]
optional_label_keys = [
fields.InputDataFields.groundtruth_keypoints,
fields.InputDataFields.groundtruth_instance_masks,
fields.InputDataFields.groundtruth_area,
fields.InputDataFields.groundtruth_is_crowd,
fields.InputDataFields.groundtruth_difficult
]
for key in optional_label_keys:
if key in input_dict:
labels_dict[key] = input_dict[key]
if fields.InputDataFields.groundtruth_difficult in labels_dict:
labels_dict[fields.InputDataFields.groundtruth_difficult] = tf.cast(
labels_dict[fields.InputDataFields.groundtruth_difficult], tf.int32)
return labels_dict
def _get_features_dict(input_dict):
"""Extracts features dict from input dict."""
hash_from_source_id = tf.string_to_hash_bucket_fast(
input_dict[fields.InputDataFields.source_id], HASH_BINS)
features = {
fields.InputDataFields.image:
input_dict[fields.InputDataFields.image],
HASH_KEY: tf.cast(hash_from_source_id, tf.int32),
fields.InputDataFields.true_image_shape:
input_dict[fields.InputDataFields.true_image_shape]
}
if fields.InputDataFields.original_image in input_dict:
features[fields.InputDataFields.original_image] = input_dict[
fields.InputDataFields.original_image]
return features
def create_train_input_fn(train_config, train_input_config,
model_config):
"""Creates a train `input` function for `Estimator`.
Args:
train_config: A train_pb2.TrainConfig.
train_input_config: An input_reader_pb2.InputReader.
model_config: A model_pb2.DetectionModel.
Returns:
`input_fn` for `Estimator` in TRAIN mode.
"""
def _train_input_fn(params=None):
"""Returns `features` and `labels` tensor dictionaries for training.
Args:
params: Parameter dictionary passed from the estimator.
Returns:
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [batch_size, H, W, C]
float32 tensor with preprocessed images.
features[HASH_KEY] is a [batch_size] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [batch_size, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] (optional) is a
          [batch_size, H, W, C] uint8 tensor with original images.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.num_groundtruth_boxes] is a [batch_size]
int32 tensor indicating the number of groundtruth boxes.
labels[fields.InputDataFields.groundtruth_boxes] is a
[batch_size, num_boxes, 4] float32 tensor containing the corners of
the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
[batch_size, num_boxes, num_classes] float32 one-hot tensor of
classes.
labels[fields.InputDataFields.groundtruth_weights] is a
[batch_size, num_boxes] float32 tensor containing groundtruth weights
for the boxes.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[batch_size, num_boxes, H, W] float32 tensor containing only binary
values, which represent instance masks for objects.
labels[fields.InputDataFields.groundtruth_keypoints] is a
[batch_size, num_boxes, num_keypoints, 2] float32 tensor containing
keypoints for each box.
Raises:
TypeError: if the `train_config`, `train_input_config` or `model_config`
are not of the correct type.
"""
if not isinstance(train_config, train_pb2.TrainConfig):
raise TypeError('For training mode, the `train_config` must be a '
'train_pb2.TrainConfig.')
if not isinstance(train_input_config, input_reader_pb2.InputReader):
      raise TypeError('The `train_input_config` must be an '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
data_augmentation_options = [
preprocessor_builder.build(step)
for step in train_config.data_augmentation_options
]
data_augmentation_fn = functools.partial(
augment_input_data, data_augmentation_options=data_augmentation_options)
model = model_builder.build(model_config, is_training=True)
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model.preprocess,
image_resizer_fn=image_resizer_fn,
num_classes=config_util.get_number_of_classes(model_config),
data_augmentation_fn=data_augmentation_fn,
retain_original_image=train_config.retain_original_images)
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
train_input_config,
transform_input_data_fn=transform_data_fn,
batch_size=params['batch_size'] if params else train_config.batch_size,
max_num_boxes=train_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
input_dict = dataset_util.make_initializable_iterator(dataset).get_next()
return (_get_features_dict(input_dict), _get_labels_dict(input_dict))
return _train_input_fn
def create_eval_input_fn(eval_config, eval_input_config, model_config):
"""Creates an eval `input` function for `Estimator`.
Args:
eval_config: An eval_pb2.EvalConfig.
eval_input_config: An input_reader_pb2.InputReader.
model_config: A model_pb2.DetectionModel.
Returns:
`input_fn` for `Estimator` in EVAL mode.
"""
def _eval_input_fn(params=None):
"""Returns `features` and `labels` tensor dictionaries for evaluation.
Args:
params: Parameter dictionary passed from the estimator.
Returns:
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [1, H, W, C] float32 tensor
with preprocessed images.
features[HASH_KEY] is a [1] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [1, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] is a [1, H', W', C]
          uint8 tensor with the original image.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.groundtruth_boxes] is a [1, num_boxes, 4]
float32 tensor containing the corners of the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
          [1, num_boxes, num_classes] float32 one-hot tensor of classes.
labels[fields.InputDataFields.groundtruth_area] is a [1, num_boxes]
float32 tensor containing object areas.
labels[fields.InputDataFields.groundtruth_is_crowd] is a [1, num_boxes]
bool tensor indicating if the boxes enclose a crowd.
labels[fields.InputDataFields.groundtruth_difficult] is a [1, num_boxes]
int32 tensor indicating if the boxes represent difficult instances.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[1, num_boxes, H, W] float32 tensor containing only binary values,
which represent instance masks for objects.
Raises:
TypeError: if the `eval_config`, `eval_input_config` or `model_config`
are not of the correct type.
"""
params = params or {}
if not isinstance(eval_config, eval_pb2.EvalConfig):
      raise TypeError('For eval mode, the `eval_config` must be an '
                      'eval_pb2.EvalConfig.')
if not isinstance(eval_input_config, input_reader_pb2.InputReader):
      raise TypeError('The `eval_input_config` must be an '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
num_classes = config_util.get_number_of_classes(model_config)
model = model_builder.build(model_config, is_training=False)
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model.preprocess,
image_resizer_fn=image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=None,
retain_original_image=eval_config.retain_original_images)
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
eval_input_config,
transform_input_data_fn=transform_data_fn,
batch_size=params.get('batch_size', 1),
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
input_dict = dataset_util.make_initializable_iterator(dataset).get_next()
return (_get_features_dict(input_dict), _get_labels_dict(input_dict))
return _eval_input_fn
def create_predict_input_fn(model_config):
"""Creates a predict `input` function for `Estimator`.
Args:
model_config: A model_pb2.DetectionModel.
Returns:
`input_fn` for `Estimator` in PREDICT mode.
"""
def _predict_input_fn(params=None):
"""Decodes serialized tf.Examples and returns `ServingInputReceiver`.
Args:
params: Parameter dictionary passed from the estimator.
Returns:
`ServingInputReceiver`.
"""
del params
example = tf.placeholder(dtype=tf.string, shape=[], name='input_feature')
num_classes = config_util.get_number_of_classes(model_config)
model = model_builder.build(model_config, is_training=False)
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_fn = functools.partial(
transform_input_data, model_preprocess_fn=model.preprocess,
image_resizer_fn=image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=None)
decoder = tf_example_decoder.TfExampleDecoder(load_instance_masks=False)
input_dict = transform_fn(decoder.decode(example))
images = tf.to_float(input_dict[fields.InputDataFields.image])
images = tf.expand_dims(images, axis=0)
true_image_shape = tf.expand_dims(
input_dict[fields.InputDataFields.true_image_shape], axis=0)
return tf.estimator.export.ServingInputReceiver(
features={
fields.InputDataFields.image: images,
fields.InputDataFields.true_image_shape: true_image_shape},
receiver_tensors={SERVING_FED_EXAMPLE_KEY: example})
return _predict_input_fn
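# A minimal wiring sketch: the input-fn factories above plug straight into
# tf.estimator. The pipeline_config_path and model_fn arguments are
# hypothetical placeholders; configs are loaded with
# config_util.get_configs_from_pipeline_file, as in the accompanying tests.
def _example_estimator_wiring(pipeline_config_path, model_fn):
  """Returns (estimator, train_spec, eval_spec) built from a pipeline config."""
  configs = config_util.get_configs_from_pipeline_file(pipeline_config_path)
  train_input_fn = create_train_input_fn(
      configs['train_config'], configs['train_input_config'], configs['model'])
  eval_input_fn = create_eval_input_fn(
      configs['eval_config'], configs['eval_input_config'], configs['model'])
  estimator = tf.estimator.Estimator(model_fn=model_fn)
  train_spec = tf.estimator.TrainSpec(
      input_fn=train_input_fn, max_steps=configs['train_config'].num_steps)
  eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn)
  return estimator, train_spec, eval_spec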
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.tflearn.inputs."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import os
import numpy as np
import tensorflow as tf
from object_detection import inputs
from object_detection.core import preprocessor
from object_detection.core import standard_fields as fields
from object_detection.utils import config_util
FLAGS = tf.flags.FLAGS
def _get_configs_for_model(model_name):
"""Returns configurations for model."""
fname = os.path.join(tf.resource_loader.get_data_files_path(),
'samples/configs/' + model_name + '.config')
label_map_path = os.path.join(tf.resource_loader.get_data_files_path(),
'data/pet_label_map.pbtxt')
data_path = os.path.join(tf.resource_loader.get_data_files_path(),
'test_data/pets_examples.record')
configs = config_util.get_configs_from_pipeline_file(fname)
return config_util.merge_external_params_with_configs(
configs,
train_input_path=data_path,
eval_input_path=data_path,
label_map_path=label_map_path)
class InputsTest(tf.test.TestCase):
def test_faster_rcnn_resnet50_train_input(self):
"""Tests the training input function for FasterRcnnResnet50."""
configs = _get_configs_for_model('faster_rcnn_resnet50_pets')
configs['train_config'].unpad_groundtruth_tensors = True
model_config = configs['model']
model_config.faster_rcnn.num_classes = 37
train_input_fn = inputs.create_train_input_fn(
configs['train_config'], configs['train_input_config'], model_config)
features, labels = train_input_fn()
self.assertAllEqual([1, None, None, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual([1],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[1, 50, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[1, 50, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[1, 50],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
def test_faster_rcnn_resnet50_eval_input(self):
"""Tests the eval input function for FasterRcnnResnet50."""
configs = _get_configs_for_model('faster_rcnn_resnet50_pets')
model_config = configs['model']
model_config.faster_rcnn.num_classes = 37
eval_input_fn = inputs.create_eval_input_fn(
configs['eval_config'], configs['eval_input_config'], model_config)
features, labels = eval_input_fn()
self.assertAllEqual([1, None, None, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual(
[1, None, None, 3],
features[fields.InputDataFields.original_image].shape.as_list())
self.assertEqual(tf.uint8,
features[fields.InputDataFields.original_image].dtype)
self.assertAllEqual([1], features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[1, None, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[1, None, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_area].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_area].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_is_crowd].shape.as_list())
self.assertEqual(
tf.bool, labels[fields.InputDataFields.groundtruth_is_crowd].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_difficult].shape.as_list())
self.assertEqual(
tf.int32, labels[fields.InputDataFields.groundtruth_difficult].dtype)
def test_ssd_inceptionV2_train_input(self):
"""Tests the training input function for SSDInceptionV2."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
model_config = configs['model']
model_config.ssd.num_classes = 37
batch_size = configs['train_config'].batch_size
train_input_fn = inputs.create_train_input_fn(
configs['train_config'], configs['train_input_config'], model_config)
features, labels = train_input_fn()
self.assertAllEqual([batch_size, 300, 300, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual([batch_size],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[batch_size],
labels[fields.InputDataFields.num_groundtruth_boxes].shape.as_list())
self.assertEqual(tf.int32,
labels[fields.InputDataFields.num_groundtruth_boxes].dtype)
self.assertAllEqual(
[batch_size, 50, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[batch_size, 50, model_config.ssd.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[batch_size, 50],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
def test_ssd_inceptionV2_eval_input(self):
"""Tests the eval input function for SSDInceptionV2."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
model_config = configs['model']
model_config.ssd.num_classes = 37
eval_input_fn = inputs.create_eval_input_fn(
configs['eval_config'], configs['eval_input_config'], model_config)
features, labels = eval_input_fn()
self.assertAllEqual([1, 300, 300, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual(
[1, None, None, 3],
features[fields.InputDataFields.original_image].shape.as_list())
self.assertEqual(tf.uint8,
features[fields.InputDataFields.original_image].dtype)
self.assertAllEqual([1], features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[1, None, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[1, None, model_config.ssd.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_area].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_area].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_is_crowd].shape.as_list())
self.assertEqual(
tf.bool, labels[fields.InputDataFields.groundtruth_is_crowd].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_difficult].shape.as_list())
self.assertEqual(
tf.int32, labels[fields.InputDataFields.groundtruth_difficult].dtype)
def test_predict_input(self):
"""Tests the predict input function."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
predict_input_fn = inputs.create_predict_input_fn(
model_config=configs['model'])
serving_input_receiver = predict_input_fn()
image = serving_input_receiver.features[fields.InputDataFields.image]
receiver_tensors = serving_input_receiver.receiver_tensors[
inputs.SERVING_FED_EXAMPLE_KEY]
self.assertEqual([1, 300, 300, 3], image.shape.as_list())
self.assertEqual(tf.float32, image.dtype)
self.assertEqual(tf.string, receiver_tensors.dtype)
def test_error_with_bad_train_config(self):
"""Tests that a TypeError is raised with improper train config."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
configs['model'].ssd.num_classes = 37
train_input_fn = inputs.create_train_input_fn(
train_config=configs['eval_config'], # Expecting `TrainConfig`.
train_input_config=configs['train_input_config'],
model_config=configs['model'])
with self.assertRaises(TypeError):
train_input_fn()
def test_error_with_bad_train_input_config(self):
"""Tests that a TypeError is raised with improper train input config."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
configs['model'].ssd.num_classes = 37
train_input_fn = inputs.create_train_input_fn(
train_config=configs['train_config'],
train_input_config=configs['model'], # Expecting `InputReader`.
model_config=configs['model'])
with self.assertRaises(TypeError):
train_input_fn()
def test_error_with_bad_train_model_config(self):
"""Tests that a TypeError is raised with improper train model config."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
configs['model'].ssd.num_classes = 37
train_input_fn = inputs.create_train_input_fn(
train_config=configs['train_config'],
train_input_config=configs['train_input_config'],
model_config=configs['train_config']) # Expecting `DetectionModel`.
with self.assertRaises(TypeError):
train_input_fn()
def test_error_with_bad_eval_config(self):
"""Tests that a TypeError is raised with improper eval config."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
configs['model'].ssd.num_classes = 37
eval_input_fn = inputs.create_eval_input_fn(
eval_config=configs['train_config'], # Expecting `EvalConfig`.
eval_input_config=configs['eval_input_config'],
model_config=configs['model'])
with self.assertRaises(TypeError):
eval_input_fn()
def test_error_with_bad_eval_input_config(self):
"""Tests that a TypeError is raised with improper eval input config."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
configs['model'].ssd.num_classes = 37
eval_input_fn = inputs.create_eval_input_fn(
eval_config=configs['eval_config'],
eval_input_config=configs['model'], # Expecting `InputReader`.
model_config=configs['model'])
with self.assertRaises(TypeError):
eval_input_fn()
def test_error_with_bad_eval_model_config(self):
"""Tests that a TypeError is raised with improper eval model config."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
configs['model'].ssd.num_classes = 37
eval_input_fn = inputs.create_eval_input_fn(
eval_config=configs['eval_config'],
eval_input_config=configs['eval_input_config'],
model_config=configs['eval_config']) # Expecting `DetectionModel`.
with self.assertRaises(TypeError):
eval_input_fn()
class DataAugmentationFnTest(tf.test.TestCase):
def test_apply_image_and_box_augmentation(self):
data_augmentation_options = [
(preprocessor.resize_image, {
'new_height': 20,
'new_width': 20,
'method': tf.image.ResizeMethod.NEAREST_NEIGHBOR
}),
(preprocessor.scale_boxes_to_pixel_coordinates, {}),
]
data_augmentation_fn = functools.partial(
inputs.augment_input_data,
data_augmentation_options=data_augmentation_options)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(10, 10, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1., 1.]], np.float32))
}
augmented_tensor_dict = data_augmentation_fn(tensor_dict=tensor_dict)
with self.test_session() as sess:
augmented_tensor_dict_out = sess.run(augmented_tensor_dict)
self.assertAllEqual(
augmented_tensor_dict_out[fields.InputDataFields.image].shape,
[20, 20, 3]
)
self.assertAllClose(
augmented_tensor_dict_out[fields.InputDataFields.groundtruth_boxes],
[[10, 10, 20, 20]]
)
def test_include_masks_in_data_augmentation(self):
data_augmentation_options = [
(preprocessor.resize_image, {
'new_height': 20,
'new_width': 20,
'method': tf.image.ResizeMethod.NEAREST_NEIGHBOR
})
]
data_augmentation_fn = functools.partial(
inputs.augment_input_data,
data_augmentation_options=data_augmentation_options)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(10, 10, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_instance_masks:
tf.constant(np.zeros([2, 10, 10], np.uint8))
}
augmented_tensor_dict = data_augmentation_fn(tensor_dict=tensor_dict)
with self.test_session() as sess:
augmented_tensor_dict_out = sess.run(augmented_tensor_dict)
self.assertAllEqual(
augmented_tensor_dict_out[fields.InputDataFields.image].shape,
[20, 20, 3])
self.assertAllEqual(augmented_tensor_dict_out[
fields.InputDataFields.groundtruth_instance_masks].shape, [2, 20, 20])
def test_include_keypoints_in_data_augmentation(self):
data_augmentation_options = [
(preprocessor.resize_image, {
'new_height': 20,
'new_width': 20,
'method': tf.image.ResizeMethod.NEAREST_NEIGHBOR
}),
(preprocessor.scale_boxes_to_pixel_coordinates, {}),
]
data_augmentation_fn = functools.partial(
inputs.augment_input_data,
data_augmentation_options=data_augmentation_options)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(10, 10, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1., 1.]], np.float32)),
fields.InputDataFields.groundtruth_keypoints:
tf.constant(np.array([[[0.5, 1.0], [0.5, 0.5]]], np.float32))
}
augmented_tensor_dict = data_augmentation_fn(tensor_dict=tensor_dict)
with self.test_session() as sess:
augmented_tensor_dict_out = sess.run(augmented_tensor_dict)
self.assertAllEqual(
augmented_tensor_dict_out[fields.InputDataFields.image].shape,
[20, 20, 3]
)
self.assertAllClose(
augmented_tensor_dict_out[fields.InputDataFields.groundtruth_boxes],
[[10, 10, 20, 20]]
)
self.assertAllClose(
augmented_tensor_dict_out[fields.InputDataFields.groundtruth_keypoints],
[[[10, 20], [10, 10]]]
)
def _fake_model_preprocessor_fn(image):
return (image, tf.expand_dims(tf.shape(image)[1:], axis=0))
def _fake_image_resizer_fn(image, mask):
return (image, mask, tf.shape(image))
class DataTransformationFnTest(tf.test.TestCase):
def test_combine_additional_channels_if_present(self):
image = np.random.rand(4, 4, 3).astype(np.float32)
additional_channels = np.random.rand(4, 4, 2).astype(np.float32)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(image),
fields.InputDataFields.image_additional_channels:
tf.constant(additional_channels),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([1, 1], np.int32))
}
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=1)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllEqual(transformed_inputs[fields.InputDataFields.image].dtype,
tf.float32)
self.assertAllEqual(transformed_inputs[fields.InputDataFields.image].shape,
[4, 4, 5])
self.assertAllClose(transformed_inputs[fields.InputDataFields.image],
np.concatenate((image, additional_channels), axis=2))
def test_returns_correct_class_label_encodings(self):
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(4, 4, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[0, 0, 1, 1], [.5, .5, 1, 1]], np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_classes],
[[0, 0, 1], [1, 0, 0]])
def test_returns_correct_merged_boxes(self):
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(4, 4, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1, 1], [.5, .5, 1, 1]], np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes,
merge_multiple_boxes=True)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_boxes],
[[.5, .5, 1., 1.]])
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_classes],
[[1, 0, 1]])
def test_returns_resized_masks(self):
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(4, 4, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_instance_masks:
tf.constant(np.random.rand(2, 4, 4).astype(np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
def fake_image_resizer_fn(image, masks=None):
resized_image = tf.image.resize_images(image, [8, 8])
results = [resized_image]
if masks is not None:
resized_masks = tf.transpose(
tf.image.resize_images(tf.transpose(masks, [1, 2, 0]), [8, 8]),
[2, 0, 1])
results.append(resized_masks)
results.append(tf.shape(resized_image))
return results
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=fake_image_resizer_fn,
num_classes=num_classes,
retain_original_image=True)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllEqual(transformed_inputs[
fields.InputDataFields.original_image].dtype, tf.uint8)
self.assertAllEqual(transformed_inputs[
fields.InputDataFields.original_image].shape, [4, 4, 3])
self.assertAllEqual(transformed_inputs[
fields.InputDataFields.groundtruth_instance_masks].shape, [2, 8, 8])
def test_applies_model_preprocess_fn_to_image_tensor(self):
np_image = np.random.randint(256, size=(4, 4, 3))
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np_image),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
def fake_model_preprocessor_fn(image):
return (image / 255., tf.expand_dims(tf.shape(image)[1:], axis=0))
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(transformed_inputs[fields.InputDataFields.image],
np_image / 255.)
self.assertAllClose(transformed_inputs[fields.InputDataFields.
true_image_shape],
[4, 4, 3])
def test_applies_data_augmentation_fn_to_tensor_dict(self):
np_image = np.random.randint(256, size=(4, 4, 3))
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np_image),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
def add_one_data_augmentation_fn(tensor_dict):
return {key: value + 1 for key, value in tensor_dict.items()}
num_classes = 4
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=add_one_data_augmentation_fn)
with self.test_session() as sess:
augmented_tensor_dict = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllEqual(augmented_tensor_dict[fields.InputDataFields.image],
np_image + 1)
self.assertAllEqual(
augmented_tensor_dict[fields.InputDataFields.groundtruth_classes],
[[0, 0, 0, 1], [0, 1, 0, 0]])
def test_applies_data_augmentation_fn_before_model_preprocess_fn(self):
np_image = np.random.randint(256, size=(4, 4, 3))
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np_image),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
def mul_two_model_preprocessor_fn(image):
return (image * 2, tf.expand_dims(tf.shape(image)[1:], axis=0))
def add_five_to_image_data_augmentation_fn(tensor_dict):
tensor_dict[fields.InputDataFields.image] += 5
return tensor_dict
num_classes = 4
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=mul_two_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=add_five_to_image_data_augmentation_fn)
with self.test_session() as sess:
augmented_tensor_dict = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllEqual(augmented_tensor_dict[fields.InputDataFields.image],
(np_image + 5) * 2)
if __name__ == '__main__':
tf.test.main()
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Argmax matcher implementation.
This class takes a similarity matrix and matches columns to rows based on the
maximum value per column. One can specify a matched_threshold to prevent
columns from matching to rows (generally resulting in a negative training
example) and an unmatched_threshold to ignore the match (generally resulting
in neither a positive nor a negative training example).
This matcher is used in Fast(er)-RCNN.
Note: matchers are used in TargetAssigners. There is a create_target_assigner
factory function for popular implementations.
"""
import tensorflow as tf
from object_detection.core import matcher
from object_detection.utils import shape_utils
class ArgMaxMatcher(matcher.Matcher):
"""Matcher based on highest value.
This class computes matches from a similarity matrix. Each column is matched
to a single row.
To support object detection target assignment this class enables setting both
  matched_threshold (upper threshold) and unmatched_threshold (lower threshold),
  defining three categories of similarity that determine whether examples are
positive, negative, or ignored:
(1) similarity >= matched_threshold: Highest similarity. Matched/Positive!
(2) matched_threshold > similarity >= unmatched_threshold: Medium similarity.
Depending on negatives_lower_than_unmatched, this is either
Unmatched/Negative OR Ignore.
(3) unmatched_threshold > similarity: Lowest similarity. Depending on flag
negatives_lower_than_unmatched, either Unmatched/Negative OR Ignore.
For ignored matches this class sets the values in the Match object to -2.
"""
def __init__(self,
matched_threshold,
unmatched_threshold=None,
negatives_lower_than_unmatched=True,
force_match_for_each_row=False,
use_matmul_gather=False):
"""Construct ArgMaxMatcher.
Args:
matched_threshold: Threshold for positive matches. Positive if
sim >= matched_threshold, where sim is the maximum value of the
similarity matrix for a given column. Set to None for no threshold.
unmatched_threshold: Threshold for negative matches. Negative if
sim < unmatched_threshold. Defaults to matched_threshold
when set to None.
negatives_lower_than_unmatched: Boolean which defaults to True. If True
then negative matches are the ones below the unmatched_threshold,
        whereas ignored matches are in between the matched and unmatched
threshold. If False, then negative matches are in between the matched
and unmatched threshold, and everything lower than unmatched is ignored.
force_match_for_each_row: If True, ensures that each row is matched to
at least one column (which is not guaranteed otherwise if the
matched_threshold is high). Defaults to False. See
argmax_matcher_test.testMatcherForceMatch() for an example.
use_matmul_gather: Force constructed match objects to use matrix
multiplication based gather instead of standard tf.gather.
(Default: False).
Raises:
ValueError: if unmatched_threshold is set but matched_threshold is not set
or if unmatched_threshold > matched_threshold.
"""
super(ArgMaxMatcher, self).__init__(use_matmul_gather=use_matmul_gather)
if (matched_threshold is None) and (unmatched_threshold is not None):
      raise ValueError('Need to also define matched_threshold when '
                       'unmatched_threshold is defined')
self._matched_threshold = matched_threshold
if unmatched_threshold is None:
self._unmatched_threshold = matched_threshold
else:
if unmatched_threshold > matched_threshold:
        raise ValueError('unmatched_threshold needs to be smaller or equal '
                         'to matched_threshold')
self._unmatched_threshold = unmatched_threshold
if not negatives_lower_than_unmatched:
if self._unmatched_threshold == self._matched_threshold:
        raise ValueError('When negatives are in between matched and '
                         'unmatched thresholds, these cannot be of equal '
                         'value. matched: %s, unmatched: %s' %
                         (self._matched_threshold, self._unmatched_threshold))
self._force_match_for_each_row = force_match_for_each_row
self._negatives_lower_than_unmatched = negatives_lower_than_unmatched
def _match(self, similarity_matrix):
"""Tries to match each column of the similarity matrix to a row.
Args:
similarity_matrix: tensor of shape [N, M] representing any similarity
metric.
Returns:
Match object with corresponding matches for each of M columns.
"""
def _match_when_rows_are_empty():
"""Performs matching when the rows of similarity matrix are empty.
When the rows are empty, all detections are false positives. So we return
a tensor of -1's to indicate that the columns do not match to any rows.
Returns:
matches: int32 tensor indicating the row each column matches to.
"""
similarity_matrix_shape = shape_utils.combined_static_and_dynamic_shape(
similarity_matrix)
return -1 * tf.ones([similarity_matrix_shape[1]], dtype=tf.int32)
def _match_when_rows_are_non_empty():
"""Performs matching when the rows of similarity matrix are non empty.
Returns:
matches: int32 tensor indicating the row each column matches to.
"""
# Matches for each column
matches = tf.argmax(similarity_matrix, 0, output_type=tf.int32)
# Deal with matched and unmatched threshold
if self._matched_threshold is not None:
# Get logical indices of ignored and unmatched columns as tf.int64
matched_vals = tf.reduce_max(similarity_matrix, 0)
below_unmatched_threshold = tf.greater(self._unmatched_threshold,
matched_vals)
between_thresholds = tf.logical_and(
tf.greater_equal(matched_vals, self._unmatched_threshold),
tf.greater(self._matched_threshold, matched_vals))
if self._negatives_lower_than_unmatched:
matches = self._set_values_using_indicator(matches,
below_unmatched_threshold,
-1)
matches = self._set_values_using_indicator(matches,
between_thresholds,
-2)
else:
matches = self._set_values_using_indicator(matches,
below_unmatched_threshold,
-2)
matches = self._set_values_using_indicator(matches,
between_thresholds,
-1)
if self._force_match_for_each_row:
similarity_matrix_shape = shape_utils.combined_static_and_dynamic_shape(
similarity_matrix)
force_match_column_ids = tf.argmax(similarity_matrix, 1,
output_type=tf.int32)
force_match_column_indicators = tf.one_hot(
force_match_column_ids, depth=similarity_matrix_shape[1])
force_match_row_ids = tf.argmax(force_match_column_indicators, 0,
output_type=tf.int32)
force_match_column_mask = tf.cast(
tf.reduce_max(force_match_column_indicators, 0), tf.bool)
final_matches = tf.where(force_match_column_mask,
force_match_row_ids, matches)
return final_matches
else:
return matches
if similarity_matrix.shape.is_fully_defined():
if similarity_matrix.shape[0].value == 0:
return _match_when_rows_are_empty()
else:
return _match_when_rows_are_non_empty()
else:
return tf.cond(
tf.greater(tf.shape(similarity_matrix)[0], 0),
_match_when_rows_are_non_empty, _match_when_rows_are_empty)
def _set_values_using_indicator(self, x, indicator, val):
"""Set the indicated fields of x to val.
Args:
x: tensor.
indicator: boolean with same shape as x.
val: scalar with value to set.
Returns:
modified tensor.
"""
indicator = tf.cast(indicator, x.dtype)
return tf.add(tf.multiply(x, 1 - indicator), val * indicator)
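# A small illustrative sketch of the three similarity bands described in the
# class docstring. The similarity values are arbitrary (they mirror the unit
# tests); with matched_threshold=3. and unmatched_threshold=2., a column whose
# best similarity is >= 3 matches a row, < 2 yields -1 (negative), and values
# in between yield -2 (ignored). The function name is a placeholder.
def _example_argmax_matching():
  """Returns the raw match_results tensor for a toy similarity matrix."""
  similarity = tf.constant([[1., 1., 1., 3., 1.],
                            [2., -1., 2., 0., 4.],
                            [3., 0., -1., 0., 0.]])
  arg_matcher = ArgMaxMatcher(matched_threshold=3., unmatched_threshold=2.)
  match = arg_matcher.match(similarity)
  # Per the accompanying tests: columns 0, 3, 4 match rows 2, 0, 1; column 1 is
  # negative (-1) and column 2 is ignored (-2).
  return match.match_results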
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.matchers.argmax_matcher."""
import numpy as np
import tensorflow as tf
from object_detection.matchers import argmax_matcher
from object_detection.utils import test_case
class ArgMaxMatcherTest(test_case.TestCase):
def test_return_correct_matches_with_default_thresholds(self):
def graph_fn(similarity_matrix):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=None)
match = matcher.match(similarity_matrix)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1., 1, 1, 3, 1],
[2, -1, 2, 0, 4],
[3, 0, -1, 0, 0]], dtype=np.float32)
expected_matched_rows = np.array([2, 0, 1, 0, 1])
(res_matched_cols, res_unmatched_cols,
res_match_results) = self.execute(graph_fn, [similarity])
self.assertAllEqual(res_match_results[res_matched_cols],
expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], [0, 1, 2, 3, 4])
self.assertFalse(np.all(res_unmatched_cols))
def test_return_correct_matches_with_empty_rows(self):
def graph_fn(similarity_matrix):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=None)
match = matcher.match(similarity_matrix)
return match.unmatched_column_indicator()
similarity = 0.2 * np.ones([0, 5], dtype=np.float32)
res_unmatched_cols = self.execute(graph_fn, [similarity])
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0], np.arange(5))
def test_return_correct_matches_with_matched_threshold(self):
def graph_fn(similarity):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3.)
match = matcher.match(similarity)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1, 1, 1, 3, 1],
[2, -1, 2, 0, 4],
[3, 0, -1, 0, 0]], dtype=np.float32)
expected_matched_cols = np.array([0, 3, 4])
expected_matched_rows = np.array([2, 0, 1])
expected_unmatched_cols = np.array([1, 2])
(res_matched_cols, res_unmatched_cols,
match_results) = self.execute(graph_fn, [similarity])
self.assertAllEqual(match_results[res_matched_cols], expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], expected_matched_cols)
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_return_correct_matches_with_matched_and_unmatched_threshold(self):
def graph_fn(similarity):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3.,
unmatched_threshold=2.)
match = matcher.match(similarity)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1, 1, 1, 3, 1],
[2, -1, 2, 0, 4],
[3, 0, -1, 0, 0]], dtype=np.float32)
expected_matched_cols = np.array([0, 3, 4])
expected_matched_rows = np.array([2, 0, 1])
expected_unmatched_cols = np.array([1]) # col 2 has too high maximum val
(res_matched_cols, res_unmatched_cols,
match_results) = self.execute(graph_fn, [similarity])
self.assertAllEqual(match_results[res_matched_cols], expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], expected_matched_cols)
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_return_correct_matches_negatives_lower_than_unmatched_false(self):
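    """Tests matching with negatives_lower_than_unmatched=False.

    In this mode columns between the thresholds are unmatched, while columns
    below unmatched_threshold are ignored.
    """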
def graph_fn(similarity):
matcher = argmax_matcher.ArgMaxMatcher(
matched_threshold=3.,
unmatched_threshold=2.,
negatives_lower_than_unmatched=False)
match = matcher.match(similarity)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1, 1, 1, 3, 1],
[2, -1, 2, 0, 4],
[3, 0, -1, 0, 0]], dtype=np.float32)
expected_matched_cols = np.array([0, 3, 4])
expected_matched_rows = np.array([2, 0, 1])
expected_unmatched_cols = np.array([2]) # col 1 has too low maximum val
(res_matched_cols, res_unmatched_cols,
match_results) = self.execute(graph_fn, [similarity])
self.assertAllEqual(match_results[res_matched_cols], expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], expected_matched_cols)
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_return_correct_matches_unmatched_row_not_using_force_match(self):
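    """Tests that a row with no similarity above threshold matches no column."""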
def graph_fn(similarity):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3.,
unmatched_threshold=2.)
match = matcher.match(similarity)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1, 1, 1, 3, 1],
[-1, 0, -2, -2, -1],
[3, 0, -1, 2, 0]], dtype=np.float32)
expected_matched_cols = np.array([0, 3])
expected_matched_rows = np.array([2, 0])
expected_unmatched_cols = np.array([1, 2, 4])
(res_matched_cols, res_unmatched_cols,
match_results) = self.execute(graph_fn, [similarity])
self.assertAllEqual(match_results[res_matched_cols], expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], expected_matched_cols)
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_return_correct_matches_unmatched_row_while_using_force_match(self):
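    """Tests that force_match_for_each_row matches each row to its best column."""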
def graph_fn(similarity):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3.,
unmatched_threshold=2.,
force_match_for_each_row=True)
match = matcher.match(similarity)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1, 1, 1, 3, 1],
[-1, 0, -2, -2, -1],
[3, 0, -1, 2, 0]], dtype=np.float32)
expected_matched_cols = np.array([0, 1, 3])
expected_matched_rows = np.array([2, 1, 0])
    expected_unmatched_cols = np.array([2, 4])  # cols 2, 4 have max vals below unmatched_threshold
(res_matched_cols, res_unmatched_cols,
match_results) = self.execute(graph_fn, [similarity])
self.assertAllEqual(match_results[res_matched_cols], expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], expected_matched_cols)
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_valid_arguments_corner_case(self):
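    """Tests that equal matched and unmatched thresholds are accepted."""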
argmax_matcher.ArgMaxMatcher(matched_threshold=1,
unmatched_threshold=1)
def test_invalid_arguments_corner_case_negatives_lower_than_thres_false(self):
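    """Tests that equal thresholds are rejected.

    When negatives_lower_than_unmatched is False, matched_threshold must be
    strictly greater than unmatched_threshold.
    """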
with self.assertRaises(ValueError):
argmax_matcher.ArgMaxMatcher(matched_threshold=1,
unmatched_threshold=1,
negatives_lower_than_unmatched=False)
def test_invalid_arguments_no_matched_threshold(self):
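    """Tests that unmatched_threshold alone (no matched_threshold) is rejected."""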
with self.assertRaises(ValueError):
argmax_matcher.ArgMaxMatcher(matched_threshold=None,
unmatched_threshold=4)
def test_invalid_arguments_unmatched_thres_larger_than_matched_thres(self):
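    """Tests that an unmatched_threshold above matched_threshold is rejected."""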
with self.assertRaises(ValueError):
argmax_matcher.ArgMaxMatcher(matched_threshold=1,
unmatched_threshold=2)


if __name__ == '__main__':
tf.test.main()
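
This matcher test can also be run on its own. The command below is a minimal sketch: it assumes the file lives at object_detection/matchers/argmax_matcher_test.py (consistent with its imports) and that PYTHONPATH has been set up as described above.

```bash
# From tensorflow/models/research/
python object_detection/matchers/argmax_matcher_test.py
```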