Unverified commit 32e7d660, authored by pkulzc and committed by GitHub

Open Images Challenge 2018 tools, minor fixes and refactors. (#4661)

* Merged commit includes the following changes:
202804536  by Zhichao Lu:

    Return tf.data.Dataset from input_fn that goes into the estimator and use PER_HOST_V2 option for tpu input pipeline config.

    This change shaves off 100ms per step resulting in 25 minutes of total reduced training time for ssd mobilenet v1 (15k steps to convergence).
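    A minimal sketch of the pattern (a standalone, hypothetical example; `parse_fn` and the file path are placeholders, and the API names follow the TF 1.x `tf.contrib.tpu` Estimator):

    ```python
    import tensorflow as tf

    def parse_fn(serialized):  # placeholder parser
      return tf.parse_single_example(
          serialized, {'image': tf.FixedLenFeature([], tf.string)})

    def input_fn(params):
      # TPUEstimator passes the per-host batch size through params.
      dataset = tf.data.TFRecordDataset('/tmp/train.tfrecord')
      dataset = dataset.map(parse_fn).repeat()
      dataset = dataset.batch(params['batch_size'], drop_remainder=True)
      return dataset  # return the Dataset itself; no iterator tensors

    run_config = tf.contrib.tpu.RunConfig(
        tpu_config=tf.contrib.tpu.TPUConfig(
            iterations_per_loop=100,
            per_host_input_for_training=(
                tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2)))
    ```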

--
202769340  by Zhichao Lu:

    Adding as_matrix() transformation for image-level labels.

--
202768721  by Zhichao Lu:

    Challenge evaluation protocol modification: adding labelmaps creation.

--
202750966  by Zhichao Lu:

    Add the explicit names to two output nodes.

--
202732783  by Zhichao Lu:

    Enforcing that batch size is 1 for evaluation, and no original images are retained during evaluation when use_tpu=False (to avoid dynamic shapes).

--
202425430  by Zhichao Lu:

    Refactor input pipeline to improve performance.

--
202406389  by Zhichao Lu:

    Only check the validity of `warmup_learning_rate` if it will be used.
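    A hedged sketch of the guarded check (hypothetical names; the real validation lives in the learning-schedule builder):

    ```python
    def check_warmup(learning_rate_base, warmup_learning_rate, warmup_steps):
      # Only validate the warmup rate when a warmup phase will actually run.
      if warmup_steps and learning_rate_base < warmup_learning_rate:
        raise ValueError('learning_rate_base must be larger or equal to '
                         'warmup_learning_rate.')
    ```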

--
202330450  by Zhichao Lu:

    Adding a description of the input_image_label_annotations_csv flag, which
    adds image-level labels to tf.Example.

--
202029012  by Zhichao Lu:

    Enabling display of the relationship name in the final metrics output.

--
202024010  by Zhichao Lu:

    Update to the public README.

--
201999677  by Zhichao Lu:

    Fixing the way negative labels are handled in VRD evaluation.

--
201962313  by Zhichao Lu:

    Fix a bug in resize_to_range.

--
201808488  by Zhichao Lu:

    Update ssd_inception_v2_pets.config to use the right filenames of the pets dataset tf records.

--
201779225  by Zhichao Lu:

    Update object detection API installation doc

--
201766518  by Zhichao Lu:

    Add shell script to create pycocotools package for CMLE.

--
201722377  by Zhichao Lu:

    Removes verified_labels field and uses groundtruth_image_classes field instead.

--
201616819  by Zhichao Lu:

    Disable eval_on_tpu since eval_metrics is not set up to execute on TPU.
    Do not use run_config.task_type to switch TPU mode for EVAL,
    since that won't work in unit tests.
    Expand the unit test to verify that the same instantiation of the Estimator can independently disable eval on TPU while training remains enabled on TPU.
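    For reference, `TPUEstimator` exposes this switch directly; a sketch with placeholder sizes (`model_fn` and `run_config` are assumed to be defined elsewhere):

    ```python
    estimator = tf.contrib.tpu.TPUEstimator(
        model_fn=model_fn,
        config=run_config,
        use_tpu=True,          # training runs on TPU
        eval_on_tpu=False,     # eval_metrics is not set up for TPU execution
        train_batch_size=64,
        eval_batch_size=8)
    ```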

--
201524716  by Zhichao Lu:

    Disable model export to TPU; inference is not compatible with TPU.
    Add GOOGLE_INTERNAL support in the object detection copy.bara.sky.

--
201453347  by Zhichao Lu:

    Fixing bug when evaluating the quantized model.

--
200795826  by Zhichao Lu:

    Fixing a parsing bug: image-level labels were parsed as tuples instead of a
    numpy array.

--
200746134  by Zhichao Lu:

    Adding image_class_text and image_class_label fields into tf_example_decoder.py

--
200743003  by Zhichao Lu:

    Changes to model_main.py and model_tpu_main.py to enable training and continuous eval.

--
200736324  by Zhichao Lu:

    Replace deprecated squeeze_dims argument.
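    The rename in question is in core TensorFlow: `tf.squeeze`'s deprecated `squeeze_dims` keyword became `axis`. For example:

    ```python
    import tensorflow as tf

    x = tf.zeros([4, 1, 3, 1])
    y = tf.squeeze(x, axis=[1, 3])  # shape [4, 3]; formerly squeeze_dims=[1, 3]
    ```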

--
200730072  by Zhichao Lu:

    Make detections only in predict and eval modes when creating the model function.

--
200729699  by Zhichao Lu:

    Minor correction to internal documentation (definition of Huber loss)

--
200727142  by Zhichao Lu:

    Add command-line parsing as a set of flags using argparse, and add a header
    to the resulting file.

--
200726169  by Zhichao Lu:

    A tutorial on running evaluation for the Open Images Challenge 2018.

--
200665093  by Zhichao Lu:

    Cleanup on variables_helper_test.py.

--
200652145  by Zhichao Lu:

    Add an option to write (non-frozen) graph when exporting inference graph.

--
200573810  by Zhichao Lu:

    Update ssd_mobilenet_v1_coco and ssd_inception_v2_coco download links to point to a newer version.

--
200498014  by Zhichao Lu:

    Add test for groundtruth mask resizing.

--
200453245  by Zhichao Lu:

    Cleaning up exporting_models.md along with exporting scripts

--
200311747  by Zhichao Lu:

    Resize groundtruth mask to match the size of the original image.

--
200287269  by Zhichao Lu:

    Adding an option to use a custom MatMul-based crop_and_resize op as an alternative to the TF op in Faster R-CNN.

--
200127859  by Zhichao Lu:

    Updating the instructions to run locally with new binary. Also updating pets configs since file path naming has changed.

--
200127044  by Zhichao Lu:

    A simpler evaluation util to compute the Open Images Challenge
    2018 metric (object detection track).

--
200124019  by Zhichao Lu:

    Freshening up configuring_jobs.md

--
200086825  by Zhichao Lu:

    Make merge_multiple_label_boxes work for ssd model.

--
199843258  by Zhichao Lu:

    Allows inconsistent feature channels to be compatible with WeightSharedConvolutionalBoxPredictor.

--
199676082  by Zhichao Lu:

    Enable an override for `InputReader.shuffle` for object detection pipelines.

--
199599212  by Zhichao Lu:

    Markdown fixes.

--
199535432  by Zhichao Lu:

    Pass num_additional_channels to the tf.Example decoder in predict_input_fn.

--
199399439  by Zhichao Lu:

    Adding `num_additional_channels` field to specify how many additional channels to use in the model.
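    As a hedged illustration, the field lives on `InputReader`, so a pipeline config could set it roughly like this (values illustrative):

    ```
    eval_input_reader: {
      num_additional_channels: 2
      tf_record_input_reader {
        input_path: "/path/to/eval.record"
      }
    }
    ```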

--

PiperOrigin-RevId: 202804536

* Add original model builder and docs back.
parent 86ac7a47
@@ -13,7 +13,7 @@ file is split into 5 parts:
    model parameters (ie. SGD parameters, input preprocessing and feature extractor
    initialization values).
 3. The `eval_config`, which determines what set of metrics will be reported for
-   evaluation (currently we only support the PASCAL VOC metrics).
+   evaluation.
 4. The `train_input_config`, which defines what dataset the model should be
    trained on.
 5. The `eval_input_config`, which defines what dataset the model will be
@@ -118,6 +118,7 @@ optimizer {
 }
 fine_tune_checkpoint: "/usr/home/username/tmp/model.ckpt-#####"
 from_detection_checkpoint: true
+load_all_detection_checkpoint_vars: true
 gradient_clipping_by_norm: 10.0
 data_augmentation_options {
   random_horizontal_flip {
@@ -130,8 +131,8 @@ data_augmentation_options {
 While optional, it is highly recommended that users utilize other object
 detection checkpoints. Training an object detector from scratch can take days.
 To speed up the training process, it is recommended that users re-use the
-feature extractor parameters from a pre-existing object classification or
-detection checkpoint. `train_config` provides two fields to specify
+feature extractor parameters from a pre-existing image classification or
+object detection checkpoint. `train_config` provides two fields to specify
 pre-existing checkpoints: `fine_tune_checkpoint` and
 `from_detection_checkpoint`. `fine_tune_checkpoint` should provide a path to
 the pre-existing checkpoint
@@ -157,6 +158,8 @@ number of workers, gpu type).
 ## Configuring the Evaluator
-Currently evaluation is fixed to generating metrics as defined by the PASCAL VOC
-challenge. The parameters for `eval_config` are set to reasonable defaults and
-typically do not need to be configured.
+The main components to set in `eval_config` are `num_examples` and
+`metrics_set`. The parameter `num_examples` indicates the number of batches (
+currently of batch size 1) used for an evaluation cycle, and often is the total
+size of the evaluation dataset. The parameter `metrics_set` indicates which
+metrics to run during evaluation (i.e. `"coco_detection_metrics"`).
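For reference, a minimal `eval_config` along these lines might look like the following (the values are illustrative, not defaults):

```
eval_config: {
  # One evaluation cycle over 8000 batch-size-1 examples.
  num_examples: 8000
  metrics_set: "coco_detection_metrics"
}
```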
@@ -69,10 +69,10 @@ Some remarks on frozen inference graphs:
 | Model name | Speed (ms) | COCO mAP[^1] | Outputs |
 | ------------ | :--------------: | :--------------: | :-------------: |
-| [ssd_mobilenet_v1_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2017_11_17.tar.gz) | 30 | 21 | Boxes |
+| [ssd_mobilenet_v1_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz) | 30 | 21 | Boxes |
 | [ssd_mobilenet_v2_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz) | 31 | 22 | Boxes |
 | [ssdlite_mobilenet_v2_coco](http://download.tensorflow.org/models/object_detection/ssdlite_mobilenet_v2_coco_2018_05_09.tar.gz) | 27 | 22 | Boxes |
-| [ssd_inception_v2_coco](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2017_11_17.tar.gz) | 42 | 24 | Boxes |
+| [ssd_inception_v2_coco](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2018_01_28.tar.gz) | 42 | 24 | Boxes |
 | [faster_rcnn_inception_v2_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz) | 58 | 28 | Boxes |
 | [faster_rcnn_resnet50_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_coco_2018_01_28.tar.gz) | 89 | 30 | Boxes |
 | [faster_rcnn_resnet50_lowproposals_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_lowproposals_coco_2018_01_28.tar.gz) | 64 | | Boxes |
@@ -12,16 +12,25 @@ command from tensorflow/models/research:
 ``` bash
 # From tensorflow/models/research/
+INPUT_TYPE=image_tensor
+PIPELINE_CONFIG_PATH={path to pipeline config file}
+TRAINED_CKPT_PREFIX={path to model.ckpt}
+EXPORT_DIR={path to folder that will be used for export}
 python object_detection/export_inference_graph.py \
-    --input_type image_tensor \
-    --pipeline_config_path ${PIPELINE_CONFIG_PATH} \
-    --trained_checkpoint_prefix ${TRAIN_PATH} \
-    --output_directory ${EXPORT_DIR}
+    --input_type=${INPUT_TYPE} \
+    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
+    --trained_checkpoint_prefix=${TRAINED_CKPT_PREFIX} \
+    --output_directory=${EXPORT_DIR}
 ```
-Afterwards, you should see the directory ${EXPORT_DIR} containing the following:
+NOTE: We are configuring our exported model to ingest 4-D image tensors. We can
+also configure the exported model to take encoded images or serialized
+`tf.Example`s.
+
+After export, you should see the directory ${EXPORT_DIR} containing the following:
-* output_inference_graph.pb, the frozen graph format of the exported model
 * saved_model/, a directory containing the saved model format of the exported model
+* frozen_inference_graph.pb, the frozen graph format of the exported model
 * model.ckpt.*, the model checkpoints used for exporting
 * checkpoint, a file specifying to restore included checkpoint files
+* pipeline.config, pipeline config file for the exported model
@@ -4,7 +4,7 @@
 Tensorflow Object Detection API depends on the following libraries:
-* Protobuf 3+
+* Protobuf 3.0.0
 * Python-tk
 * Pillow 1.0
 * lxml
@@ -13,6 +13,7 @@ Tensorflow Object Detection API depends on the following libraries:
 * Matplotlib
 * Tensorflow
 * Cython
+* contextlib2
 * cocoapi
 For detailed steps to install Tensorflow, follow the [Tensorflow installation
@@ -30,21 +31,28 @@ The remaining libraries can be installed on Ubuntu 16.04 using via apt-get:
 ``` bash
 sudo apt-get install protobuf-compiler python-pil python-lxml python-tk
-sudo pip install Cython
-sudo pip install jupyter
-sudo pip install matplotlib
+pip install --user Cython
+pip install --user contextlib2
+pip install --user jupyter
+pip install --user matplotlib
 ```
 Alternatively, users can install dependencies using pip:
 ``` bash
-sudo pip install Cython
-sudo pip install pillow
-sudo pip install lxml
-sudo pip install jupyter
-sudo pip install matplotlib
+pip install --user Cython
+pip install --user contextlib2
+pip install --user pillow
+pip install --user lxml
+pip install --user jupyter
+pip install --user matplotlib
 ```
+Note that sometimes "sudo apt-get install protobuf-compiler" will install
+Protobuf 3+ versions for you and some users have issues when using 3.5.
+If that is your case, you're suggested to download and install Protobuf 3.0.0
+(available [here](https://github.com/google/protobuf/releases/tag/v3.0.0)).
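A minimal sketch of that manual install on a Linux x86_64 host (the release asset name is taken from the v3.0.0 release page; the install location is just one choice):

``` bash
# From tensorflow/models/research/
wget https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protoc-3.0.0-linux-x86_64.zip -d protoc-3.0.0
# Compile the object_detection protos with the pinned binary.
./protoc-3.0.0/bin/protoc object_detection/protos/*.proto --python_out=.
```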
 ## COCO API installation

 Download the
@@ -13,8 +13,8 @@ inferred detections.
 Inferred detections will look like the following:
-![](img/oid_bus_72e19c28aac34ed8.jpg){height="300"}
-![](img/oid_monkey_3b4168c89cecbc5b.jpg){height="300"}
+![](img/oid_bus_72e19c28aac34ed8.jpg)
+![](img/oid_monkey_3b4168c89cecbc5b.jpg)
 On the validation set of Open Images, this tutorial requires 27GB of free disk
 space and the inference step takes approximately 9 hours on a single NVIDIA
@@ -100,6 +100,8 @@ python -m object_detection/dataset_tools/create_oid_tf_record \
   --num_shards=100
 ```
+To add image-level labels, use the `--input_image_label_annotations_csv` flag.
 This results in 100 TFRecord files (shards), written to
 `oid/${SPLIT}_tfrecords`, with filenames matching
 `${SPLIT}.tfrecord-000[0-9][0-9]-of-00100`. Each shard contains approximately
@@ -146,7 +148,7 @@ access to the images, `infer_detections` can optionally discard them with the
 `--discard_image_pixels` flag. Discarding the images drastically reduces the
 size of the output TFRecord.
-### Accelerating inference {#accelerating_inference}
+### Accelerating inference
 Running inference on the whole validation or test set can take a long time to
 complete due to the large number of images present in these sets (41,620 and
@@ -196,7 +198,7 @@ After all `infer_detections` processes finish, `tensorflow/models/research/oid`
 will contain one output TFRecord from each process, with name matching
 `validation_detections.tfrecord-0000[0-3]-of-00004`.
-## Computing evaluation measures {#compute_evaluation_measures}
+## Computing evaluation measures
 To compute evaluation measures on the inferred detections you first need to
 create the appropriate configuration files:
@@ -237,7 +239,7 @@ file contains an `object_detection.protos.EvalConfig` message that describes the
 evaluation metric. For more information about these protos see the corresponding
 source files.
-### Expected mAPs {#expected-maps}
+### Expected mAPs
 The result of running `offline_eval_map_corloc` is a CSV file located at
 `${SPLIT}_eval_metrics/metrics.csv`. With the above configuration, the file will
@@ -33,8 +33,8 @@ from object_detection.protos import input_reader_pb2
 from object_detection.protos import model_pb2
 from object_detection.protos import train_pb2
 from object_detection.utils import config_util
-from object_detection.utils import dataset_util
 from object_detection.utils import ops as util_ops
+from object_detection.utils import shape_utils

 HASH_KEY = 'hash'
 HASH_BINS = 1 << 31
@@ -91,6 +91,9 @@ def transform_input_data(tensor_dict,
     A dictionary keyed by fields.InputDataFields containing the tensors obtained
     after applying all the transformations.
   """
+  if fields.InputDataFields.groundtruth_boxes in tensor_dict:
+    tensor_dict = util_ops.filter_groundtruth_with_nan_box_coordinates(
+        tensor_dict)
   if fields.InputDataFields.image_additional_channels in tensor_dict:
     channels = tensor_dict[fields.InputDataFields.image_additional_channels]
     tensor_dict[fields.InputDataFields.image] = tf.concat(
@@ -135,6 +138,103 @@ def transform_input_data(tensor_dict,
   return tensor_dict
+def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
+                                    spatial_image_shape=None):
+  """Pads input tensors to static shapes.
+
+  Args:
+    tensor_dict: Tensor dictionary of input data
+    max_num_boxes: Max number of groundtruth boxes needed to compute shapes for
+      padding.
+    num_classes: Number of classes in the dataset needed to compute shapes for
+      padding.
+    spatial_image_shape: A list of two integers of the form [height, width]
+      containing expected spatial shape of the image.
+
+  Returns:
+    A dictionary keyed by fields.InputDataFields containing padding shapes for
+    tensors in the dataset.
+
+  Raises:
+    ValueError: If groundtruth classes is neither rank 1 nor rank 2.
+  """
+  if not spatial_image_shape or spatial_image_shape == [-1, -1]:
+    height, width = None, None
+  else:
+    height, width = spatial_image_shape  # pylint: disable=unpacking-non-sequence
+
+  num_additional_channels = 0
+  if fields.InputDataFields.image_additional_channels in tensor_dict:
+    num_additional_channels = tensor_dict[
+        fields.InputDataFields.image_additional_channels].shape[2].value
+  padding_shapes = {
+      # Additional channels are merged before batching.
+      fields.InputDataFields.image: [
+          height, width, 3 + num_additional_channels
+      ],
+      fields.InputDataFields.image_additional_channels: [
+          height, width, num_additional_channels
+      ],
+      fields.InputDataFields.source_id: [],
+      fields.InputDataFields.filename: [],
+      fields.InputDataFields.key: [],
+      fields.InputDataFields.groundtruth_difficult: [max_num_boxes],
+      fields.InputDataFields.groundtruth_boxes: [max_num_boxes, 4],
+      fields.InputDataFields.groundtruth_classes: [max_num_boxes, num_classes],
+      fields.InputDataFields.groundtruth_instance_masks: [
+          max_num_boxes, height, width
+      ],
+      fields.InputDataFields.groundtruth_is_crowd: [max_num_boxes],
+      fields.InputDataFields.groundtruth_group_of: [max_num_boxes],
+      fields.InputDataFields.groundtruth_area: [max_num_boxes],
+      fields.InputDataFields.groundtruth_weights: [max_num_boxes],
+      fields.InputDataFields.num_groundtruth_boxes: [],
+      fields.InputDataFields.groundtruth_label_types: [max_num_boxes],
+      fields.InputDataFields.groundtruth_label_scores: [max_num_boxes],
+      fields.InputDataFields.true_image_shape: [3],
+      fields.InputDataFields.multiclass_scores: [
+          max_num_boxes, num_classes + 1 if num_classes is not None else None
+      ],
+      fields.InputDataFields.groundtruth_image_classes: [num_classes],
+  }
+
+  if fields.InputDataFields.original_image in tensor_dict:
+    padding_shapes[fields.InputDataFields.original_image] = [
+        None, None, 3 + num_additional_channels
+    ]
+  if fields.InputDataFields.groundtruth_keypoints in tensor_dict:
+    tensor_shape = (
+        tensor_dict[fields.InputDataFields.groundtruth_keypoints].shape)
+    padding_shape = [max_num_boxes, tensor_shape[1].value,
+                     tensor_shape[2].value]
+    padding_shapes[fields.InputDataFields.groundtruth_keypoints] = padding_shape
+  if fields.InputDataFields.groundtruth_keypoint_visibilities in tensor_dict:
+    tensor_shape = tensor_dict[fields.InputDataFields.
+                               groundtruth_keypoint_visibilities].shape
+    padding_shape = [max_num_boxes, tensor_shape[1].value]
+    padding_shapes[fields.InputDataFields.
+                   groundtruth_keypoint_visibilities] = padding_shape
+
+  padded_tensor_dict = {}
+  for tensor_name in tensor_dict:
+    expected_shape = padding_shapes[tensor_name]
+    current_shape = shape_utils.combined_static_and_dynamic_shape(
+        tensor_dict[tensor_name])
+    trailing_paddings = [
+        expected_shape_dim - current_shape_dim if expected_shape_dim else 0
+        for expected_shape_dim, current_shape_dim in zip(
+            expected_shape, current_shape)
+    ]
+    paddings = tf.stack([tf.zeros(len(trailing_paddings), dtype=tf.int32),
+                         trailing_paddings],
+                        axis=1)
+    padded_tensor_dict[tensor_name] = tf.pad(
+        tensor_dict[tensor_name], paddings=paddings)
+    padded_tensor_dict[tensor_name].set_shape(expected_shape)
+  return padded_tensor_dict
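An aside (not part of the diff): the trailing-padding scheme above appends zeros only at the end of each dimension. A tiny standalone illustration:

```python
import tensorflow as tf

# Pad a [2, 4] boxes tensor up to max_num_boxes=5: append 3 zero rows, keep
# all 4 columns. This mirrors the tf.pad call in pad_input_data_to_static_shapes.
boxes = tf.constant([[0.1, 0.1, 0.5, 0.5],
                     [0.2, 0.2, 0.6, 0.6]])
padded = tf.pad(boxes, paddings=[[0, 5 - 2], [0, 0]])  # static shape [5, 4]
```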
 def augment_input_data(tensor_dict, data_augmentation_options):
   """Applies data augmentation ops to input tensors.
@@ -231,6 +331,8 @@ def create_train_input_fn(train_config, train_input_config,
     params: Parameter dictionary passed from the estimator.

   Returns:
+    A tf.data.Dataset that holds (features, labels) tuple.
     features: Dictionary of feature tensors.
       features[fields.InputDataFields.image] is a [batch_size, H, W, C]
         float32 tensor with preprocessed images.
@@ -275,33 +377,39 @@ def create_train_input_fn(train_config, train_input_config,
       raise TypeError('The `model_config` must be a '
                       'model_pb2.DetectionModel.')

-    data_augmentation_options = [
-        preprocessor_builder.build(step)
-        for step in train_config.data_augmentation_options
-    ]
-    data_augmentation_fn = functools.partial(
-        augment_input_data, data_augmentation_options=data_augmentation_options)
-
-    model = model_builder.build(model_config, is_training=True)
-    image_resizer_config = config_util.get_image_resizer_config(model_config)
-    image_resizer_fn = image_resizer_builder.build(image_resizer_config)
-
-    transform_data_fn = functools.partial(
-        transform_input_data, model_preprocess_fn=model.preprocess,
-        image_resizer_fn=image_resizer_fn,
-        num_classes=config_util.get_number_of_classes(model_config),
-        data_augmentation_fn=data_augmentation_fn,
-        retain_original_image=train_config.retain_original_images)
+    def transform_and_pad_input_data_fn(tensor_dict):
+      """Combines transform and pad operation."""
+      data_augmentation_options = [
+          preprocessor_builder.build(step)
+          for step in train_config.data_augmentation_options
+      ]
+      data_augmentation_fn = functools.partial(
+          augment_input_data,
+          data_augmentation_options=data_augmentation_options)
+
+      model = model_builder.build(model_config, is_training=True)
+      image_resizer_config = config_util.get_image_resizer_config(model_config)
+      image_resizer_fn = image_resizer_builder.build(image_resizer_config)
+
+      transform_data_fn = functools.partial(
+          transform_input_data, model_preprocess_fn=model.preprocess,
+          image_resizer_fn=image_resizer_fn,
+          num_classes=config_util.get_number_of_classes(model_config),
+          data_augmentation_fn=data_augmentation_fn,
+          merge_multiple_boxes=train_config.merge_multiple_label_boxes,
+          retain_original_image=train_config.retain_original_images)
+      tensor_dict = pad_input_data_to_static_shapes(
+          tensor_dict=transform_data_fn(tensor_dict),
+          max_num_boxes=train_input_config.max_number_of_boxes,
+          num_classes=config_util.get_number_of_classes(model_config),
+          spatial_image_shape=config_util.get_spatial_image_size(
+              image_resizer_config))
+      return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))

     dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
         train_input_config,
-        transform_input_data_fn=transform_data_fn,
-        batch_size=params['batch_size'] if params else train_config.batch_size,
-        max_num_boxes=train_config.max_number_of_boxes,
-        num_classes=config_util.get_number_of_classes(model_config),
-        spatial_image_shape=config_util.get_spatial_image_size(
-            image_resizer_config))
-    input_dict = dataset_util.make_initializable_iterator(dataset).get_next()
-    return (_get_features_dict(input_dict), _get_labels_dict(input_dict))
+        transform_input_data_fn=transform_and_pad_input_data_fn,
+        batch_size=params['batch_size'] if params else train_config.batch_size)
+    return dataset

   return _train_input_fn
@@ -309,6 +417,8 @@ def create_train_input_fn(train_config, train_input_config,
 def create_eval_input_fn(eval_config, eval_input_config, model_config):
   """Creates an eval `input` function for `Estimator`.

+  # TODO(ronnyvotel,rathodv): Allow batch sizes of more than 1 for eval.
   Args:
     eval_config: An eval_pb2.EvalConfig.
     eval_input_config: An input_reader_pb2.InputReader.
@@ -325,6 +435,8 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
     params: Parameter dictionary passed from the estimator.

   Returns:
+    A tf.data.Dataset that holds (features, labels) tuple.
     features: Dictionary of feature tensors.
       features[fields.InputDataFields.image] is a [1, H, W, C] float32 tensor
         with preprocessed images.
@@ -366,36 +478,41 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
       raise TypeError('The `model_config` must be a '
                       'model_pb2.DetectionModel.')

-    num_classes = config_util.get_number_of_classes(model_config)
-    model = model_builder.build(model_config, is_training=False)
-    image_resizer_config = config_util.get_image_resizer_config(model_config)
-    image_resizer_fn = image_resizer_builder.build(image_resizer_config)
-
-    transform_data_fn = functools.partial(
-        transform_input_data, model_preprocess_fn=model.preprocess,
-        image_resizer_fn=image_resizer_fn,
-        num_classes=num_classes,
-        data_augmentation_fn=None,
-        retain_original_image=eval_config.retain_original_images)
+    def transform_and_pad_input_data_fn(tensor_dict):
+      """Combines transform and pad operation."""
+      num_classes = config_util.get_number_of_classes(model_config)
+      model = model_builder.build(model_config, is_training=False)
+      image_resizer_config = config_util.get_image_resizer_config(model_config)
+      image_resizer_fn = image_resizer_builder.build(image_resizer_config)
+
+      transform_data_fn = functools.partial(
+          transform_input_data, model_preprocess_fn=model.preprocess,
+          image_resizer_fn=image_resizer_fn,
+          num_classes=num_classes,
+          data_augmentation_fn=None,
+          retain_original_image=eval_config.retain_original_images)
+      tensor_dict = pad_input_data_to_static_shapes(
+          tensor_dict=transform_data_fn(tensor_dict),
+          max_num_boxes=eval_input_config.max_number_of_boxes,
+          num_classes=config_util.get_number_of_classes(model_config),
+          spatial_image_shape=config_util.get_spatial_image_size(
+              image_resizer_config))
+      return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))

     dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
         eval_input_config,
-        transform_input_data_fn=transform_data_fn,
-        batch_size=params.get('batch_size', 1),
-        num_classes=config_util.get_number_of_classes(model_config),
-        spatial_image_shape=config_util.get_spatial_image_size(
-            image_resizer_config))
-    input_dict = dataset_util.make_initializable_iterator(dataset).get_next()
-    return (_get_features_dict(input_dict), _get_labels_dict(input_dict))
+        batch_size=1,  # Currently only support batch size of 1 for eval.
+        transform_input_data_fn=transform_and_pad_input_data_fn)
+    return dataset

   return _eval_input_fn
-def create_predict_input_fn(model_config):
+def create_predict_input_fn(model_config, predict_input_config):
   """Creates a predict `input` function for `Estimator`.

   Args:
     model_config: A model_pb2.DetectionModel.
+    predict_input_config: An input_reader_pb2.InputReader.

   Returns:
     `input_fn` for `Estimator` in PREDICT mode.
@@ -424,7 +541,9 @@ def create_predict_input_fn(model_config):
         num_classes=num_classes,
         data_augmentation_fn=None)

-    decoder = tf_example_decoder.TfExampleDecoder(load_instance_masks=False)
+    decoder = tf_example_decoder.TfExampleDecoder(
+        load_instance_masks=False,
+        num_additional_channels=predict_input_config.num_additional_channels)
     input_dict = transform_fn(decoder.decode(example))
     images = tf.to_float(input_dict[fields.InputDataFields.image])
     images = tf.expand_dims(images, axis=0)
@@ -48,17 +48,30 @@ def _get_configs_for_model(model_name):
       label_map_path=label_map_path)

+def _make_initializable_iterator(dataset):
+  """Creates an iterator, and initializes tables.
+
+  Args:
+    dataset: A `tf.data.Dataset` object.
+
+  Returns:
+    A `tf.data.Iterator`.
+  """
+  iterator = dataset.make_initializable_iterator()
+  tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, iterator.initializer)
+  return iterator
+
 class InputsTest(tf.test.TestCase):

   def test_faster_rcnn_resnet50_train_input(self):
     """Tests the training input function for FasterRcnnResnet50."""
     configs = _get_configs_for_model('faster_rcnn_resnet50_pets')
+    configs['train_config'].unpad_groundtruth_tensors = True
     model_config = configs['model']
     model_config.faster_rcnn.num_classes = 37
     train_input_fn = inputs.create_train_input_fn(
         configs['train_config'], configs['train_input_config'], model_config)
-    features, labels = train_input_fn()
+    features, labels = _make_initializable_iterator(
+        train_input_fn()).get_next()
     self.assertAllEqual([1, None, None, 3],
                         features[fields.InputDataFields.image].shape.as_list())
@@ -67,17 +80,17 @@ class InputsTest(tf.test.TestCase):
                         features[inputs.HASH_KEY].shape.as_list())
     self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
     self.assertAllEqual(
-        [1, 50, 4],
+        [1, 100, 4],
         labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_boxes].dtype)
     self.assertAllEqual(
-        [1, 50, model_config.faster_rcnn.num_classes],
+        [1, 100, model_config.faster_rcnn.num_classes],
         labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_classes].dtype)
     self.assertAllEqual(
-        [1, 50],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_weights].dtype)
@@ -89,8 +102,7 @@ class InputsTest(tf.test.TestCase):
     model_config.faster_rcnn.num_classes = 37
     eval_input_fn = inputs.create_eval_input_fn(
         configs['eval_config'], configs['eval_input_config'], model_config)
-    features, labels = eval_input_fn()
+    features, labels = _make_initializable_iterator(eval_input_fn()).get_next()
     self.assertAllEqual([1, None, None, 3],
                         features[fields.InputDataFields.image].shape.as_list())
     self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
@@ -102,27 +114,27 @@ class InputsTest(tf.test.TestCase):
     self.assertAllEqual([1], features[inputs.HASH_KEY].shape.as_list())
     self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
     self.assertAllEqual(
-        [1, None, 4],
+        [1, 100, 4],
         labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_boxes].dtype)
     self.assertAllEqual(
-        [1, None, model_config.faster_rcnn.num_classes],
+        [1, 100, model_config.faster_rcnn.num_classes],
         labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_classes].dtype)
     self.assertAllEqual(
-        [1, None],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_area].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_area].dtype)
     self.assertAllEqual(
-        [1, None],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_is_crowd].shape.as_list())
     self.assertEqual(
         tf.bool, labels[fields.InputDataFields.groundtruth_is_crowd].dtype)
     self.assertAllEqual(
-        [1, None],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_difficult].shape.as_list())
     self.assertEqual(
         tf.int32, labels[fields.InputDataFields.groundtruth_difficult].dtype)
@@ -135,7 +147,7 @@ class InputsTest(tf.test.TestCase):
     batch_size = configs['train_config'].batch_size
     train_input_fn = inputs.create_train_input_fn(
         configs['train_config'], configs['train_input_config'], model_config)
-    features, labels = train_input_fn()
+    features, labels = _make_initializable_iterator(
+        train_input_fn()).get_next()
     self.assertAllEqual([batch_size, 300, 300, 3],
                         features[fields.InputDataFields.image].shape.as_list())
@@ -149,17 +161,17 @@ class InputsTest(tf.test.TestCase):
     self.assertEqual(tf.int32,
                      labels[fields.InputDataFields.num_groundtruth_boxes].dtype)
     self.assertAllEqual(
-        [batch_size, 50, 4],
+        [batch_size, 100, 4],
         labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_boxes].dtype)
     self.assertAllEqual(
-        [batch_size, 50, model_config.ssd.num_classes],
+        [batch_size, 100, model_config.ssd.num_classes],
         labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_classes].dtype)
     self.assertAllEqual(
-        [batch_size, 50],
+        [batch_size, 100],
         labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_weights].dtype)
@@ -171,8 +183,7 @@ class InputsTest(tf.test.TestCase):
     model_config.ssd.num_classes = 37
     eval_input_fn = inputs.create_eval_input_fn(
         configs['eval_config'], configs['eval_input_config'], model_config)
-    features, labels = eval_input_fn()
+    features, labels = _make_initializable_iterator(eval_input_fn()).get_next()
     self.assertAllEqual([1, 300, 300, 3],
                         features[fields.InputDataFields.image].shape.as_list())
     self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
@@ -184,27 +195,27 @@ class InputsTest(tf.test.TestCase):
     self.assertAllEqual([1], features[inputs.HASH_KEY].shape.as_list())
     self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
     self.assertAllEqual(
-        [1, None, 4],
+        [1, 100, 4],
         labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_boxes].dtype)
     self.assertAllEqual(
-        [1, None, model_config.ssd.num_classes],
+        [1, 100, model_config.ssd.num_classes],
         labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_classes].dtype)
     self.assertAllEqual(
-        [1, None],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_area].shape.as_list())
     self.assertEqual(tf.float32,
                      labels[fields.InputDataFields.groundtruth_area].dtype)
     self.assertAllEqual(
-        [1, None],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_is_crowd].shape.as_list())
     self.assertEqual(
         tf.bool, labels[fields.InputDataFields.groundtruth_is_crowd].dtype)
     self.assertAllEqual(
-        [1, None],
+        [1, 100],
         labels[fields.InputDataFields.groundtruth_difficult].shape.as_list())
     self.assertEqual(
         tf.int32, labels[fields.InputDataFields.groundtruth_difficult].dtype)
@@ -213,7 +224,8 @@ class InputsTest(tf.test.TestCase):
     """Tests the predict input function."""
     configs = _get_configs_for_model('ssd_inception_v2_pets')
     predict_input_fn = inputs.create_predict_input_fn(
-        model_config=configs['model'])
+        model_config=configs['model'],
+        predict_input_config=configs['eval_input_config'])
     serving_input_receiver = predict_input_fn()

     image = serving_input_receiver.features[fields.InputDataFields.image]
@@ -223,6 +235,23 @@ class InputsTest(tf.test.TestCase):
     self.assertEqual(tf.float32, image.dtype)
     self.assertEqual(tf.string, receiver_tensors.dtype)
+  def test_predict_input_with_additional_channels(self):
+    """Tests the predict input function with additional channels."""
+    configs = _get_configs_for_model('ssd_inception_v2_pets')
+    configs['eval_input_config'].num_additional_channels = 2
+    predict_input_fn = inputs.create_predict_input_fn(
+        model_config=configs['model'],
+        predict_input_config=configs['eval_input_config'])
+    serving_input_receiver = predict_input_fn()
+
+    image = serving_input_receiver.features[fields.InputDataFields.image]
+    receiver_tensors = serving_input_receiver.receiver_tensors[
+        inputs.SERVING_FED_EXAMPLE_KEY]
+    # RGB + 2 additional channels = 5 channels.
+    self.assertEqual([1, 300, 300, 5], image.shape.as_list())
+    self.assertEqual(tf.float32, image.dtype)
+    self.assertEqual(tf.string, receiver_tensors.dtype)
   def test_error_with_bad_train_config(self):
     """Tests that a TypeError is raised with improper train config."""
     configs = _get_configs_for_model('ssd_inception_v2_pets')
@@ -597,5 +626,93 @@ class DataTransformationFnTest(tf.test.TestCase):
         (np_image + 5) * 2)
+class PadInputDataToStaticShapesFnTest(tf.test.TestCase):
+
+  def test_pad_images_boxes_and_classes(self):
+    input_tensor_dict = {
+        fields.InputDataFields.image:
+            tf.placeholder(tf.float32, [None, None, 3]),
+        fields.InputDataFields.groundtruth_boxes:
+            tf.placeholder(tf.float32, [None, 4]),
+        fields.InputDataFields.groundtruth_classes:
+            tf.placeholder(tf.int32, [None, 3]),
+        fields.InputDataFields.true_image_shape:
+            tf.placeholder(tf.int32, [3]),
+    }
+    padded_tensor_dict = inputs.pad_input_data_to_static_shapes(
+        tensor_dict=input_tensor_dict,
+        max_num_boxes=3,
+        num_classes=3,
+        spatial_image_shape=[5, 6])
+
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.image].shape.as_list(),
+        [5, 6, 3])
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.true_image_shape]
+        .shape.as_list(), [3])
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.groundtruth_boxes]
+        .shape.as_list(), [3, 4])
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.groundtruth_classes]
+        .shape.as_list(), [3, 3])
+
+  def test_do_not_pad_dynamic_images(self):
+    input_tensor_dict = {
+        fields.InputDataFields.image:
+            tf.placeholder(tf.float32, [None, None, 3]),
+    }
+    padded_tensor_dict = inputs.pad_input_data_to_static_shapes(
+        tensor_dict=input_tensor_dict,
+        max_num_boxes=3,
+        num_classes=3,
+        spatial_image_shape=[None, None])
+
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.image].shape.as_list(),
+        [None, None, 3])
+
+  def test_images_and_additional_channels(self):
+    input_tensor_dict = {
+        fields.InputDataFields.image:
+            tf.placeholder(tf.float32, [None, None, 3]),
+        fields.InputDataFields.image_additional_channels:
+            tf.placeholder(tf.float32, [None, None, 2]),
+    }
+    padded_tensor_dict = inputs.pad_input_data_to_static_shapes(
+        tensor_dict=input_tensor_dict,
+        max_num_boxes=3,
+        num_classes=3,
+        spatial_image_shape=[5, 6])
+
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.image].shape.as_list(),
+        [5, 6, 5])
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.image_additional_channels]
+        .shape.as_list(), [5, 6, 2])
+
+  def test_keypoints(self):
+    input_tensor_dict = {
+        fields.InputDataFields.groundtruth_keypoints:
+            tf.placeholder(tf.float32, [None, 16, 4]),
+        fields.InputDataFields.groundtruth_keypoint_visibilities:
+            tf.placeholder(tf.bool, [None, 16]),
+    }
+    padded_tensor_dict = inputs.pad_input_data_to_static_shapes(
+        tensor_dict=input_tensor_dict,
+        max_num_boxes=3,
+        num_classes=3,
+        spatial_image_shape=[5, 6])
+
+    self.assertAllEqual(
+        padded_tensor_dict[fields.InputDataFields.groundtruth_keypoints]
+        .shape.as_list(), [3, 16, 4])
+    self.assertAllEqual(
+        padded_tensor_dict[
+            fields.InputDataFields.groundtruth_keypoint_visibilities]
+        .shape.as_list(), [3, 16])
 if __name__ == '__main__':
   tf.test.main()
@@ -253,7 +253,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
                second_stage_mask_prediction_loss_weight=1.0,
                hard_example_miner=None,
                parallel_iterations=16,
-               add_summaries=True):
+               add_summaries=True,
+               use_matmul_crop_and_resize=False):
     """FasterRCNNMetaArch Constructor.

     Args:
@@ -360,6 +361,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
         in parallel for calls to tf.map_fn.
       add_summaries: boolean (default: True) controlling whether summary ops
         should be added to tensorflow graph.
+      use_matmul_crop_and_resize: Force the use of matrix multiplication based
+        crop and resize instead of standard tf.image.crop_and_resize while
+        computing second stage input feature maps.

     Raises:
       ValueError: If `second_stage_batch_size` > `first_stage_max_proposals` at
@@ -446,6 +450,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
     self._second_stage_cls_loss_weight = second_stage_classification_loss_weight
     self._second_stage_mask_loss_weight = (
         second_stage_mask_prediction_loss_weight)
+    self._use_matmul_crop_and_resize = use_matmul_crop_and_resize
     self._hard_example_miner = hard_example_miner
     self._parallel_iterations = parallel_iterations
@@ -1429,11 +1434,26 @@ class FasterRCNNMetaArch(model.DetectionModel):
           tf.range(start=0, limit=proposals_shape[0]), 1)
       return tf.reshape(ones_mat * multiplier, [-1])

-    cropped_regions = tf.image.crop_and_resize(
-        features_to_crop,
-        self._flatten_first_two_dimensions(proposal_boxes_normalized),
-        get_box_inds(proposal_boxes_normalized),
-        (self._initial_crop_size, self._initial_crop_size))
+    if self._use_matmul_crop_and_resize:
+      def _single_image_crop_and_resize(inputs):
+        single_image_features_to_crop, proposal_boxes_normalized = inputs
+        return ops.matmul_crop_and_resize(
+            tf.expand_dims(single_image_features_to_crop, 0),
+            proposal_boxes_normalized,
+            [self._initial_crop_size, self._initial_crop_size])
+
+      cropped_regions = self._flatten_first_two_dimensions(
+          shape_utils.static_or_dynamic_map_fn(
+              _single_image_crop_and_resize,
+              elems=[features_to_crop, proposal_boxes_normalized],
+              dtype=tf.float32,
+              parallel_iterations=self._parallel_iterations))
+    else:
+      cropped_regions = tf.image.crop_and_resize(
+          features_to_crop,
+          self._flatten_first_two_dimensions(proposal_boxes_normalized),
+          get_box_inds(proposal_boxes_normalized),
+          (self._initial_crop_size, self._initial_crop_size))
     return slim.max_pool2d(
         cropped_regions,
         [self._maxpool_kernel_size, self._maxpool_kernel_size],
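An aside on the MatMul variant: bilinear crop-and-resize can be expressed as two matrix multiplications (one interpolation matrix per spatial axis), which avoids the gather-heavy `tf.image.crop_and_resize` kernel that TPUs handle poorly. For comparison, a standalone sketch of the standard op being replaced (shapes are illustrative only):

```python
import tensorflow as tf

features = tf.random_normal([1, 32, 32, 64])   # [batch, height, width, depth]
boxes = tf.constant([[0.0, 0.0, 0.5, 0.5]])    # normalized [ymin, xmin, ymax, xmax]
crops = tf.image.crop_and_resize(
    features, boxes,
    box_ind=tf.zeros([1], dtype=tf.int32),     # batch index for each box
    crop_size=[14, 14])                        # analogous to _initial_crop_size
```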
@@ -152,7 +152,8 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
                    softmax_second_stage_classification_loss=True,
                    predict_masks=False,
                    pad_to_max_dimension=None,
-                   masks_are_class_agnostic=False):
+                   masks_are_class_agnostic=False,
+                   use_matmul_crop_and_resize=False):

     def image_resizer_fn(image, masks=None):
       """Fake image resizer function."""
@@ -287,7 +288,9 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
             second_stage_classification_loss_weight,
         'second_stage_classification_loss':
            second_stage_classification_loss,
-        'hard_example_miner': hard_example_miner}
+        'hard_example_miner': hard_example_miner,
+        'use_matmul_crop_and_resize': use_matmul_crop_and_resize
+    }

     return self._get_model(
         self._get_second_stage_box_predictor(
@@ -465,14 +468,16 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
       for key in expected_shapes:
         self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])

-  def test_predict_gives_correct_shapes_in_train_mode_both_stages(self):
+  def _test_predict_gives_correct_shapes_in_train_mode_both_stages(
+      self, use_matmul_crop_and_resize=False):
     test_graph = tf.Graph()
     with test_graph.as_default():
       model = self._build_model(
           is_training=True,
           number_of_stages=2,
           second_stage_batch_size=7,
-          predict_masks=False)
+          predict_masks=False,
+          use_matmul_crop_and_resize=use_matmul_crop_and_resize)
       batch_size = 2
       image_size = 10
@@ -535,6 +540,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
           tensor_dict_out['rpn_objectness_predictions_with_background'].shape,
           (2, num_anchors_out, 2))

+  def test_predict_gives_correct_shapes_in_train_mode_both_stages(self):
+    self._test_predict_gives_correct_shapes_in_train_mode_both_stages()
+
+  def test_predict_gives_correct_shapes_in_train_mode_matmul_crop_resize(self):
+    self._test_predict_gives_correct_shapes_in_train_mode_both_stages(
+        use_matmul_crop_and_resize=True)
+
   def _test_postprocess_first_stage_only_inference_mode(
       self, pad_to_max_dimension=None):
     model = self._build_model(
...@@ -76,7 +76,8 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch): ...@@ -76,7 +76,8 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
               second_stage_classification_loss,
               hard_example_miner,
               parallel_iterations=16,
               add_summaries=True,
               use_matmul_crop_and_resize=False):
"""RFCNMetaArch Constructor. """RFCNMetaArch Constructor.
Args: Args:
...@@ -159,14 +160,17 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
        in parallel for calls to tf.map_fn.
      add_summaries: boolean (default: True) controlling whether summary ops
        should be added to the tensorflow graph.
      use_matmul_crop_and_resize: whether to force matrix-multiplication-based
        crop and resize instead of standard tf.image.crop_and_resize while
        computing second stage input feature maps.
    Raises:
      ValueError: If `second_stage_batch_size` > `first_stage_max_proposals`
      ValueError: If first_stage_anchor_generator is not of type
        grid_anchor_generator.GridAnchorGenerator.
    """
    # TODO(rathodv): add_summaries and crop_and_resize_fn are currently
    # unused. Respect that directive in the future.
    super(RFCNMetaArch, self).__init__(
        is_training,
        num_classes,
......
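For intuition on the new use_matmul_crop_and_resize option: bilinear crop-and-resize of a 2-D feature map can be expressed as two matrix multiplications, one interpolating rows and one interpolating columns, which avoids the gather-based tf.image.crop_and_resize kernel. The following is a minimal NumPy sketch of that idea only; the actual implementation in the object detection codebase is batched, multi-channel and differentiable, and the helper names below are made up.

import numpy as np

def interpolation_matrix(start, end, in_size, out_size):
  """Returns [out_size, in_size] bilinear interpolation weights."""
  # Sample positions in pixel coordinates, endpoints included (the same
  # sampling semantics as tf.image.crop_and_resize).
  positions = (start + np.linspace(0.0, 1.0, out_size) *
               (end - start)) * (in_size - 1)
  lower = np.clip(np.floor(positions).astype(int), 0, in_size - 1)
  upper = np.clip(lower + 1, 0, in_size - 1)
  frac = positions - lower
  weights = np.zeros((out_size, in_size))
  weights[np.arange(out_size), lower] += 1.0 - frac
  weights[np.arange(out_size), upper] += frac
  return weights

def matmul_crop_and_resize(feature_map, box, crop_size):
  """Crops `box` ([ymin, xmin, ymax, xmax], normalized) via two matmuls."""
  ymin, xmin, ymax, xmax = box
  rows = interpolation_matrix(ymin, ymax, feature_map.shape[0], crop_size[0])
  cols = interpolation_matrix(xmin, xmax, feature_map.shape[1], crop_size[1])
  return rows.dot(feature_map).dot(cols.T)

feature_map = np.arange(64, dtype=np.float32).reshape(8, 8)
print(matmul_crop_and_resize(feature_map, [0.0, 0.0, 0.5, 0.5], (4, 4)))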
...@@ -480,12 +480,16 @@ class SSDMetaArch(model.DetectionModel):
    with tf.name_scope('Postprocessor'):
      preprocessed_images = prediction_dict['preprocessed_inputs']
      box_encodings = prediction_dict['box_encodings']
      box_encodings = tf.identity(box_encodings, 'raw_box_encodings')
      class_predictions = prediction_dict['class_predictions_with_background']
      detection_boxes, detection_keypoints = self._batch_decode(box_encodings)
      detection_boxes = tf.identity(detection_boxes, 'raw_box_locations')
      detection_boxes = tf.expand_dims(detection_boxes, axis=2)
      detection_scores_with_background = self._score_conversion_fn(
          class_predictions)
      detection_scores_with_background = tf.identity(
          detection_scores_with_background, 'raw_box_scores')
      detection_scores = tf.slice(detection_scores_with_background, [0, 0, 1],
                                  [-1, -1, -1])
      additional_fields = None
......
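The tf.identity calls added above exist to pin stable, explicit names ('raw_box_encodings', 'raw_box_locations', 'raw_box_scores') on the raw output tensors so they can be fetched by name from an exported graph. A minimal TF1-style sketch of the pattern, using a toy tensor:

import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
  with tf.name_scope('Postprocessor'):
    scores = tf.constant([[0.9, 0.1]])
    # The named identity makes the tensor addressable as
    # 'Postprocessor/raw_box_scores:0' without knowing internal op names.
    scores = tf.identity(scores, name='raw_box_scores')

with tf.Session(graph=graph) as sess:
  print(sess.run(graph.get_tensor_by_name('Postprocessor/raw_box_scores:0')))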
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Common IO utils used in offline metric computation.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import csv
def write_csv(fid, metrics):
  """Writes metrics key-value pairs to CSV file.

  Args:
    fid: File identifier of an opened file.
    metrics: A dictionary with metrics to be written.
  """
  metrics_writer = csv.writer(fid, delimiter=',')
  for metric_name, metric_value in metrics.items():
    metrics_writer.writerow([metric_name, str(metric_value)])
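A minimal usage sketch for write_csv; the metric name below is made up, real keys come from the evaluator's evaluate() call:

metrics = {'OpenImagesChallenge2018_Precision/mAP': 0.42}
with open('/tmp/metrics.csv', 'w') as fid:
  write_csv(fid, metrics)
# /tmp/metrics.csv now contains one row:
# OpenImagesChallenge2018_Precision/mAP,0.42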
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Runs evaluation using OpenImages groundtruth and predictions.
Example usage:
python models/research/object_detection/metrics/oid_od_challenge_evaluation.py \
    --input_annotations_boxes=/path/to/input/annotations-human-bbox.csv \
    --input_annotations_labels=/path/to/input/annotations-label.csv \
    --input_class_labelmap=/path/to/input/class_labelmap.pbtxt \
    --input_predictions=/path/to/input/predictions.csv \
    --output_metrics=/path/to/output/metric.csv
CSVs with bounding box annotations and image labels (including the image URLs)
can be downloaded from the Open Images Challenge website:
https://storage.googleapis.com/openimages/web/challenge.html
The format of the input CSVs and the metrics themselves are described on the
challenge website.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import pandas as pd
from google.protobuf import text_format
from object_detection.metrics import io_utils
from object_detection.metrics import oid_od_challenge_evaluation_utils as utils
from object_detection.protos import string_int_label_map_pb2
from object_detection.utils import object_detection_evaluation
def _load_labelmap(labelmap_path):
  """Loads labelmap from the labelmap path.

  Args:
    labelmap_path: Path to the labelmap.

  Returns:
    A dictionary mapping class name to class numerical id.
    A list with dictionaries, one dictionary per category.
  """
  label_map = string_int_label_map_pb2.StringIntLabelMap()
  with open(labelmap_path, 'r') as fid:
    label_map_string = fid.read()
    text_format.Merge(label_map_string, label_map)
  labelmap_dict = {}
  categories = []
  for item in label_map.item:
    labelmap_dict[item.name] = item.id
    categories.append({'id': item.id, 'name': item.name})
  return labelmap_dict, categories
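As an illustration, a hypothetical two-class labelmap in StringIntLabelMap text format and the result of loading it (the MIDs are examples only):

labelmap_text = """
item { name: "/m/04bcr3" id: 1 }
item { name: "/m/083vt" id: 2 }
"""
with open('/tmp/class_labelmap.pbtxt', 'w') as fid:
  fid.write(labelmap_text)
print(_load_labelmap('/tmp/class_labelmap.pbtxt'))
# ({'/m/04bcr3': 1, '/m/083vt': 2},
#  [{'id': 1, 'name': '/m/04bcr3'}, {'id': 2, 'name': '/m/083vt'}])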
def main(parsed_args):
  all_box_annotations = pd.read_csv(parsed_args.input_annotations_boxes)
  all_label_annotations = pd.read_csv(parsed_args.input_annotations_labels)
  all_label_annotations.rename(
      columns={'Confidence': 'ConfidenceImageLabel'}, inplace=True)
  all_annotations = pd.concat([all_box_annotations, all_label_annotations])

  class_label_map, categories = _load_labelmap(parsed_args.input_class_labelmap)
  challenge_evaluator = (
      object_detection_evaluation.OpenImagesDetectionChallengeEvaluator(
          categories))

  for image_id, image_groundtruth in all_annotations.groupby('ImageID'):
    groundtruth_dictionary = utils.build_groundtruth_boxes_dictionary(
        image_groundtruth, class_label_map)
    challenge_evaluator.add_single_ground_truth_image_info(
        image_id, groundtruth_dictionary)

  all_predictions = pd.read_csv(parsed_args.input_predictions)
  for image_id, image_predictions in all_predictions.groupby('ImageID'):
    prediction_dictionary = utils.build_predictions_dictionary(
        image_predictions, class_label_map)
    challenge_evaluator.add_single_detected_image_info(image_id,
                                                       prediction_dictionary)

  metrics = challenge_evaluator.evaluate()

  with open(parsed_args.output_metrics, 'w') as fid:
    io_utils.write_csv(fid, metrics)
if __name__ == '__main__':
  parser = argparse.ArgumentParser(
      description='Evaluate Open Images Object Detection Challenge predictions.'
  )
  parser.add_argument(
      '--input_annotations_boxes',
      required=True,
      help='File with groundtruth box annotations.')
  parser.add_argument(
      '--input_annotations_labels',
      required=True,
      help='File with groundtruth image-level label annotations.')
  parser.add_argument(
      '--input_predictions',
      required=True,
      help="""File with detection predictions; NOTE: no postprocessing is
      applied in the evaluation script.""")
  parser.add_argument(
      '--input_class_labelmap',
      required=True,
      help='Open Images Challenge labelmap.')
  parser.add_argument(
      '--output_metrics', required=True, help='Output file with CSV metrics.')

  args = parser.parse_args()
  main(args)
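For reference, a hypothetical predictions CSV accepted by this script; the evaluation utilities read the ImageID, LabelName, Score, XMin, XMax, YMin and YMax columns, and the values below are made up:

ImageID,LabelName,Score,XMin,XMax,YMin,YMax
fe58ec1b06db2bb7,/m/04bcr3,0.8,0.1,0.4,0.2,0.5
fe58ec1b06db2bb7,/m/02gy9n,0.3,0.5,0.9,0.3,0.6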
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Converts data from CSV to the OpenImagesDetectionChallengeEvaluator format.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from object_detection.core import standard_fields
def build_groundtruth_boxes_dictionary(data, class_label_map):
  """Builds a groundtruth dictionary from groundtruth data in CSV file.

  Args:
    data: Pandas DataFrame with the groundtruth data for a single image.
    class_label_map: Class labelmap from string label name to an integer.

  Returns:
    A dictionary with keys suitable for passing to
    OpenImagesDetectionChallengeEvaluator.add_single_ground_truth_image_info:
      standard_fields.InputDataFields.groundtruth_boxes: float32 numpy array
        of shape [num_boxes, 4] containing `num_boxes` groundtruth boxes of
        the format [ymin, xmin, ymax, xmax] in absolute image coordinates.
      standard_fields.InputDataFields.groundtruth_classes: integer numpy array
        of shape [num_boxes] containing 1-indexed groundtruth classes for the
        boxes.
      standard_fields.InputDataFields.groundtruth_image_classes: integer 1D
        numpy array containing all classes for which labels are verified.
      standard_fields.InputDataFields.groundtruth_group_of: Optional length
        M numpy boolean array denoting whether a groundtruth box contains a
        group of instances.
  """
  data_boxes = data[data.ConfidenceImageLabel.isnull()]
  data_labels = data[data.XMin.isnull()]

  return {
      standard_fields.InputDataFields.groundtruth_boxes:
          data_boxes[['YMin', 'XMin', 'YMax', 'XMax']].as_matrix(),
      standard_fields.InputDataFields.groundtruth_classes:
          data_boxes['LabelName'].map(lambda x: class_label_map[x]).as_matrix(),
      standard_fields.InputDataFields.groundtruth_group_of:
          data_boxes['IsGroupOf'].as_matrix().astype(int),
      standard_fields.InputDataFields.groundtruth_image_classes:
          data_labels['LabelName'].map(lambda x: class_label_map[x])
          .as_matrix(),
  }
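The isnull() split above works because main() concatenates the box CSV with the image-label CSV: box rows carry no ConfidenceImageLabel value and label-only rows carry no coordinates, so pandas fills the missing columns with NaN. A toy illustration with made-up values:

import pandas as pd

boxes = pd.DataFrame({'ImageID': ['a'], 'LabelName': ['/m/04bcr3'],
                      'XMin': [0.1], 'XMax': [0.4],
                      'YMin': [0.2], 'YMax': [0.5], 'IsGroupOf': [0]})
image_labels = pd.DataFrame({'ImageID': ['a'], 'LabelName': ['/m/083vt'],
                             'ConfidenceImageLabel': [1]})
merged = pd.concat([boxes, image_labels])
print(merged[merged.ConfidenceImageLabel.isnull()])  # only the box row
print(merged[merged.XMin.isnull()])                  # only the label row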
def build_predictions_dictionary(data, class_label_map):
  """Builds a predictions dictionary from predictions data in CSV file.

  Args:
    data: Pandas DataFrame with the predictions data for a single image.
    class_label_map: Class labelmap from string label name to an integer.

  Returns:
    Dictionary with keys suitable for passing to
    OpenImagesDetectionChallengeEvaluator.add_single_detected_image_info:
      standard_fields.DetectionResultFields.detection_boxes: float32 numpy
        array of shape [num_boxes, 4] containing `num_boxes` detection boxes
        of the format [ymin, xmin, ymax, xmax] in absolute image coordinates.
      standard_fields.DetectionResultFields.detection_scores: float32 numpy
        array of shape [num_boxes] containing detection scores for the boxes.
      standard_fields.DetectionResultFields.detection_classes: integer numpy
        array of shape [num_boxes] containing 1-indexed detection classes for
        the boxes.
  """
  return {
      standard_fields.DetectionResultFields.detection_boxes:
          data[['YMin', 'XMin', 'YMax', 'XMax']].as_matrix(),
      standard_fields.DetectionResultFields.detection_classes:
          data['LabelName'].map(lambda x: class_label_map[x]).as_matrix(),
      standard_fields.DetectionResultFields.detection_scores:
          data['Score'].as_matrix()
  }
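Note that pandas deprecated DataFrame.as_matrix() in 0.23 and removed it in 1.0, so on newer pandas the drop-in replacement for the calls above is .values (or .to_numpy()), for example:

boxes = data[['YMin', 'XMin', 'YMax', 'XMax']].values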
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for oid_od_challenge_evaluation_util."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import pandas as pd
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.metrics import oid_od_challenge_evaluation_utils as utils
class OidOdChallengeEvaluationUtilTest(tf.test.TestCase):

  def testBuildGroundtruthDictionary(self):
    np_data = pd.DataFrame(
        [['fe58ec1b06db2bb7', '/m/04bcr3', 0.0, 0.3, 0.5, 0.6, 1, None],
         ['fe58ec1b06db2bb7', '/m/02gy9n', 0.1, 0.2, 0.3, 0.4, 0, None],
         ['fe58ec1b06db2bb7', '/m/04bcr3', None, None, None, None, None, 1],
         ['fe58ec1b06db2bb7', '/m/083vt', None, None, None, None, None, 0],
         ['fe58ec1b06db2bb7', '/m/02gy9n', None, None, None, None, None, 1]],
        columns=[
            'ImageID', 'LabelName', 'XMin', 'XMax', 'YMin', 'YMax',
            'IsGroupOf', 'ConfidenceImageLabel'
        ])
    class_label_map = {'/m/04bcr3': 1, '/m/083vt': 2, '/m/02gy9n': 3}
    groundtruth_dictionary = utils.build_groundtruth_boxes_dictionary(
        np_data, class_label_map)

    self.assertTrue(standard_fields.InputDataFields.groundtruth_boxes in
                    groundtruth_dictionary)
    self.assertTrue(standard_fields.InputDataFields.groundtruth_classes in
                    groundtruth_dictionary)
    self.assertTrue(standard_fields.InputDataFields.groundtruth_group_of in
                    groundtruth_dictionary)
    self.assertTrue(standard_fields.InputDataFields.groundtruth_image_classes in
                    groundtruth_dictionary)

    self.assertAllEqual(
        np.array([1, 3]), groundtruth_dictionary[
            standard_fields.InputDataFields.groundtruth_classes])
    self.assertAllEqual(
        np.array([1, 0]), groundtruth_dictionary[
            standard_fields.InputDataFields.groundtruth_group_of])

    expected_boxes_data = np.array([[0.5, 0.0, 0.6, 0.3], [0.3, 0.1, 0.4, 0.2]])
    self.assertNDArrayNear(
        expected_boxes_data, groundtruth_dictionary[
            standard_fields.InputDataFields.groundtruth_boxes], 1e-5)
    self.assertAllEqual(
        np.array([1, 2, 3]), groundtruth_dictionary[
            standard_fields.InputDataFields.groundtruth_image_classes])

  def testBuildPredictionDictionary(self):
    np_data = pd.DataFrame(
        [['fe58ec1b06db2bb7', '/m/04bcr3', 0.0, 0.3, 0.5, 0.6, 0.1],
         ['fe58ec1b06db2bb7', '/m/02gy9n', 0.1, 0.2, 0.3, 0.4, 0.2],
         ['fe58ec1b06db2bb7', '/m/04bcr3', 0.0, 0.1, 0.2, 0.3, 0.3]],
        columns=[
            'ImageID', 'LabelName', 'XMin', 'XMax', 'YMin', 'YMax', 'Score'
        ])
    class_label_map = {'/m/04bcr3': 1, '/m/083vt': 2, '/m/02gy9n': 3}
    prediction_dictionary = utils.build_predictions_dictionary(
        np_data, class_label_map)

    self.assertTrue(standard_fields.DetectionResultFields.detection_boxes in
                    prediction_dictionary)
    self.assertTrue(standard_fields.DetectionResultFields.detection_classes in
                    prediction_dictionary)
    self.assertTrue(standard_fields.DetectionResultFields.detection_scores in
                    prediction_dictionary)

    self.assertAllEqual(
        np.array([1, 3, 1]), prediction_dictionary[
            standard_fields.DetectionResultFields.detection_classes])
    expected_boxes_data = np.array([[0.5, 0.0, 0.6, 0.3], [0.3, 0.1, 0.4, 0.2],
                                    [0.2, 0.0, 0.3, 0.1]])
    self.assertNDArrayNear(
        expected_boxes_data, prediction_dictionary[
            standard_fields.DetectionResultFields.detection_boxes], 1e-5)
    self.assertNDArrayNear(
        np.array([0.1, 0.2, 0.3]), prediction_dictionary[
            standard_fields.DetectionResultFields.detection_scores], 1e-5)


if __name__ == '__main__':
  tf.test.main()
...@@ -15,8 +15,8 @@
r"""Runs evaluation using OpenImages groundtruth and predictions.

Example usage:
python \
models/research/object_detection/metrics/oid_vrd_challenge_evaluation.py \
    --input_annotations_boxes=/path/to/input/annotations-human-bbox.csv \
    --input_annotations_labels=/path/to/input/annotations-label.csv \
    --input_class_labelmap=/path/to/input/class_labelmap.pbtxt \
...@@ -39,6 +39,7 @@ import argparse

import pandas as pd
from google.protobuf import text_format

from object_detection.metrics import io_utils
from object_detection.metrics import oid_vrd_challenge_evaluation_utils as utils
from object_detection.protos import string_int_label_map_pb2
from object_detection.utils import vrd_evaluation
...@@ -109,12 +110,14 @@ def main(parsed_args):
    phrase_evaluator.add_single_detected_image_info(image_id,
                                                    prediction_dictionary)

  relation_metrics = relation_evaluator.evaluate(
      relationships=_swap_labelmap_dict(relationship_label_map))
  phrase_metrics = phrase_evaluator.evaluate(
      relationships=_swap_labelmap_dict(relationship_label_map))
  with open(parsed_args.output_metrics, 'w') as fid:
    io_utils.write_csv(fid, relation_metrics)
    io_utils.write_csv(fid, phrase_metrics)


if __name__ == '__main__':
......
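_swap_labelmap_dict is defined earlier in this script, outside the hunks shown here. A minimal sketch consistent with its use above, inverting a name-to-id labelmap so the final metrics can display human-readable relationship names:

def _swap_labelmap_dict(labelmap_dict):
  """Swaps keys and values in a labelmap, e.g. {'at': 1} -> {1: 'at'}."""
  return dict((v, k) for k, v in labelmap_dict.items())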
...@@ -18,7 +18,6 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

from object_detection.core import standard_fields
from object_detection.utils import vrd_evaluation
...@@ -58,18 +57,21 @@ def build_groundtruth_vrd_dictionary(data, class_label_map,
  boxes['object'] = data_boxes[['YMin2', 'XMin2', 'YMax2', 'XMax2']].as_matrix()

  labels = np.zeros(data_boxes.shape[0], dtype=vrd_evaluation.label_data_type)
  labels['subject'] = data_boxes['LabelName1'].map(
      lambda x: class_label_map[x]).as_matrix()
  labels['object'] = data_boxes['LabelName2'].map(
      lambda x: class_label_map[x]).as_matrix()
  labels['relation'] = data_boxes['RelationshipLabel'].map(
      lambda x: relationship_label_map[x]).as_matrix()

  return {
      standard_fields.InputDataFields.groundtruth_boxes:
          boxes,
      standard_fields.InputDataFields.groundtruth_classes:
          labels,
      standard_fields.InputDataFields.groundtruth_image_classes:
          data_labels['LabelName'].map(lambda x: class_label_map[x])
          .as_matrix(),
  }
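The labels array built above is a NumPy structured array with one record per (subject, object, relation) triplet; vrd_evaluation.label_data_type is, roughly, a three-field integer dtype (the exact definition lives in object_detection/utils/vrd_evaluation.py). A sketch of the idea:

import numpy as np

# Approximation of vrd_evaluation.label_data_type.
label_data_type = np.dtype([('subject', 'i4'), ('object', 'i4'),
                            ('relation', 'i4')])
labels = np.zeros(2, dtype=label_data_type)
labels['subject'] = [1, 2]  # fields are filled column-wise, as above
labels['relation'] = [1, 1]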
...@@ -106,10 +108,12 @@ def build_predictions_vrd_dictionary(data, class_label_map,
  boxes['object'] = data_boxes[['YMin2', 'XMin2', 'YMax2', 'XMax2']].as_matrix()

  labels = np.zeros(data_boxes.shape[0], dtype=vrd_evaluation.label_data_type)
  labels['subject'] = data_boxes['LabelName1'].map(
      lambda x: class_label_map[x]).as_matrix()
  labels['object'] = data_boxes['LabelName2'].map(
      lambda x: class_label_map[x]).as_matrix()
  labels['relation'] = data_boxes['RelationshipLabel'].map(
      lambda x: relationship_label_map[x]).as_matrix()

  return {
      standard_fields.DetectionResultFields.detection_boxes:
...@@ -119,15 +123,3 @@ def build_predictions_vrd_dictionary(data, class_label_map,
      standard_fields.DetectionResultFields.detection_scores:
          data_boxes['Score'].as_matrix()
  }
...@@ -66,7 +66,7 @@ class OidVrdChallengeEvaluationUtilsTest(tf.test.TestCase):
                    groundtruth_dictionary)
    self.assertTrue(standard_fields.InputDataFields.groundtruth_classes in
                    groundtruth_dictionary)
    self.assertTrue(standard_fields.InputDataFields.groundtruth_image_classes in
                    groundtruth_dictionary)
    self.assertAllEqual(
...@@ -87,8 +87,8 @@ class OidVrdChallengeEvaluationUtilsTest(tf.test.TestCase):
        expected_vrd_data[field], groundtruth_dictionary[
            standard_fields.InputDataFields.groundtruth_boxes][field], 1e-5)
    self.assertAllEqual(
        np.array([1, 2, 3]), groundtruth_dictionary[
            standard_fields.InputDataFields.groundtruth_image_classes])
  def testBuildPredictionDictionary(self):
    np_data = pd.DataFrame(
......
...@@ -114,7 +114,7 @@ class TfExampleDetectionAndGTParser(data_parser.DataToNumpyParser):
            Int64Parser(fields.TfExampleFields.object_difficult),
        fields.InputDataFields.groundtruth_group_of:
            Int64Parser(fields.TfExampleFields.object_group_of),
        fields.InputDataFields.groundtruth_image_classes:
            Int64Parser(fields.TfExampleFields.image_class_label),
    }
...@@ -136,6 +136,8 @@ class TfExampleDetectionAndGTParser(data_parser.DataToNumpyParser):
          groundtruth group of flag (optional, None if not specified).
        fields.InputDataFields.groundtruth_difficult - a numpy array containing
          groundtruth difficult flag (optional, None if not specified).
        fields.InputDataFields.groundtruth_image_classes - a numpy array
          containing groundtruth image-level labels.
        fields.DetectionResultFields.detection_boxes - a numpy array containing
          detection boxes.
        fields.DetectionResultFields.detection_classes - a numpy array containing
......
...@@ -125,7 +125,8 @@ class TfExampleDecoderTest(tf.test.TestCase):
    results_dict = parser.parse(example)
    self.assertIsNotNone(results_dict)
    np_testing.assert_equal(
        verified_labels,
        results_dict[fields.InputDataFields.groundtruth_image_classes])

  def testParseString(self):
    string_val = 'abc'
......