Commit 9fce9c64 authored by Zhichao Lu's avatar Zhichao Lu Committed by pkulzc

Merged commit includes the following changes:

199348852  by Zhichao Lu:

    Small typo fixes in VRD evaluation.

--
199315191  by Zhichao Lu:

    Change padding shapes when additional channels are available.

--
199309180  by Zhichao Lu:

    Adds minor fixes to the Object Detection API implementation.

--
199298605  by Zhichao Lu:

    Force num_readers to be 1 when the only input file is not sharded.

--
199292952  by Zhichao Lu:

    Adds image-level labels parsing into TfExampleDetectionAndGTParser.

--
199259866  by Zhichao Lu:

    Visual Relationships Evaluation executable.

--
199208330  by Zhichao Lu:

    Infer train_config.batch_size as the effective batch size. Therefore we need to divide the effective batch size in the trainer by train_config.replicas_to_aggregate to get the per-worker batch size.

--
199207842  by Zhichao Lu:

    Internal change.

--
199204222  by Zhichao Lu:

    In case the image has more than three channels, we only take the first three channels for visualization.

--
199194388  by Zhichao Lu:

    Correcting protocols description: VOC 2007 -> VOC 2012.

--
199188290  by Zhichao Lu:

    Adds per-relationship APs and mAP computation to VRD evaluation.

--
199158801  by Zhichao Lu:

    If available, additional channels are merged with the input image.

--
199099637  by Zhichao Lu:

    OpenImages Challenge metric support:
    - adding verified labels standard field for TFExample;
    - adding tfrecord creation functionality.

--
198957391  by Zhichao Lu:

    Allow TFRecord sharding when creating the pets dataset.

--
198925184  by Zhichao Lu:

    Introduce moving average support for evaluation. Also adding the ability to override this configuration via config_util.

--
198918186  by Zhichao Lu:

    Handles the case where there are 0 box masks.

--
198809009  by Zhichao Lu:

    Plumb groundtruth weights into target assigner for Faster RCNN.

--
198759987  by Zhichao Lu:

    Fix object detection test broken by shape inference.

--
198668602  by Zhichao Lu:

    Adding a new input field in data_decoders/tf_example_decoder.py for storing additional channels.

--
198530013  by Zhichao Lu:

    A util for hierarchical expansion of boxes and labels of the OID dataset.

--
198503124  by Zhichao Lu:

    Fix dimension mismatch error introduced by
    https://github.com/tensorflow/tensorflow/pull/18251, or cl/194031845.
    After the above change, conv2d strictly checks for conv_dims + 2 == input_rank.

--
198445807  by Zhichao Lu:

    Enabling the Object Detection Challenge 2018 metric in the evaluator.py framework for
    running eval jobs.
    Renaming the old OpenImages V2 metric.

--
198413950  by Zhichao Lu:

    Support generic configuration overrides using namespaced keys.

    Useful for adding custom hyper-parameter tuning fields without having to add custom override methods to config_utils.py.

--
198106437  by Zhichao Lu:

    Enable fused batchnorm now that quantization is supported.

--
198048364  by Zhichao Lu:

    Add support for keypoints in tf sequence examples and some util ops.

--
198004736  by Zhichao Lu:

    Relax postprocessing unit tests that are based on the assumption that tf.image.non_max_suppression is stable with respect to its input.

--
197997513  by Zhichao Lu:

    More lenient validation for normalized box boundaries.

--
197940068  by Zhichao Lu:

    A couple of minor updates/fixes:
    - Updating input reader proto with an option to use display_name when decoding data.
    - Updating the visualization tool to specify whether absolute or normalized box coordinates are used. Appropriate boxes will now appear in TensorBoard when using model_main.py.

--
197920152  by Zhichao Lu:

    Add quantized training support in the new OD binaries and a config for SSD Mobilenet v1 quantized training that is TPU compatible.

--
197213563  by Zhichao Lu:

    Do not share batch_norm for classification and regression tower in weight shared box predictor.

--
197196757  by Zhichao Lu:

    Relax the box_predictor API to return box_prediction of shape [batch_size, num_anchors, code_size] in addition to [batch_size, num_anchors, (1|q), code_size].

--
196898361  by Zhichao Lu:

    Allow a per-channel scalar value to pad the input image with when using the keep-aspect-ratio resizer (when pad_to_max_dimension=True).

    In the Object Detection pipeline, we pad the image before normalization, which skews batch_norm statistics during training. The option to set a per-channel pad value lets us truly pad with zeros.

--
196592101  by Zhichao Lu:

    Fix a bug regarding TFRecord shuffling in object_detection.

--
196320138  by Zhichao Lu:

    Fix typo in exporting_models.md

--

PiperOrigin-RevId: 199348852
parent ed901b73
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Functions to generate a list of feature maps based on image features.
Provides several feature map generators that can be used to build object
detection feature extractors.
Object detection feature extractors usually are built by stacking two components
- A base feature extractor such as Inception V3 and a feature map generator.
Feature map generators build on the base feature extractors and produce a list
of final feature maps.
"""
import collections
import tensorflow as tf
from object_detection.utils import ops
slim = tf.contrib.slim


def get_depth_fn(depth_multiplier, min_depth):
  """Builds a callable to compute depth (output channels) of conv filters.

  Args:
    depth_multiplier: a multiplier for the nominal depth.
    min_depth: a lower bound on the depth of filters.

  Returns:
    A callable that takes in a nominal depth and returns the depth to use.
  """
  def multiply_depth(depth):
    new_depth = int(depth * depth_multiplier)
    return max(new_depth, min_depth)
  return multiply_depth


def multi_resolution_feature_maps(feature_map_layout, depth_multiplier,
                                  min_depth, insert_1x1_conv, image_features):
  """Generates multi resolution feature maps from input image features.

  Generates multi-scale feature maps for detection as in the SSD papers by
  Liu et al: https://arxiv.org/pdf/1512.02325v2.pdf (see Sec 2.1).

  More specifically, it performs the following two tasks:
  1) If a layer name is provided in the configuration, returns that layer as a
     feature map.
  2) If a layer name is left as an empty string, constructs a new feature map
     based on the spatial shape and depth configuration. Note that the current
     implementation only supports generating new layers using convolution of
     stride 2 resulting in a spatial resolution reduction by a factor of 2.
     By default convolution kernel size is set to 3, and it can be customized
     by the caller.

  An example of the configuration for Inception V3:
  {
    'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
    'layer_depth': [-1, -1, -1, 512, 256, 128]
  }

  Args:
    feature_map_layout: Dictionary of specifications for the feature map
      layouts in the following format (Inception V2/V3 respectively):
      {
        'from_layer': ['Mixed_3c', 'Mixed_4c', 'Mixed_5c', '', '', ''],
        'layer_depth': [-1, -1, -1, 512, 256, 128]
      }
      or
      {
        'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
        'layer_depth': [-1, -1, -1, 512, 256, 128]
      }
      If 'from_layer' is specified, the specified feature map is directly used
      as a box predictor layer, and the layer_depth is directly inferred from
      the feature map (instead of using the provided 'layer_depth' parameter).
      In this case, our convention is to set 'layer_depth' to -1 for clarity.
      Otherwise, if 'from_layer' is an empty string, then the box predictor
      layer will be built from the previous layer using convolution operations.
      Note that the current implementation only supports generating new layers
      using convolutions of stride 2 (resulting in a spatial resolution
      reduction by a factor of 2), and will be extended to a more flexible
      design. Convolution kernel size is set to 3 by default, and can be
      customized by the 'conv_kernel_size' parameter (similarly,
      'conv_kernel_size' should be set to -1 if 'from_layer' is specified).
      The created convolution operation will be a normal 2D convolution by
      default, and a depthwise convolution followed by 1x1 convolution if
      'use_depthwise' is set to True.
    depth_multiplier: Depth multiplier for convolutional layers.
    min_depth: Minimum depth for convolutional layers.
    insert_1x1_conv: A boolean indicating whether an additional 1x1 convolution
      should be inserted before shrinking the feature map.
    image_features: A dictionary of handles to activation tensors from the
      base feature extractor.

  Returns:
    feature_maps: an OrderedDict mapping keys (feature map names) to
      tensors where each tensor has shape [batch, height_i, width_i, depth_i].

  Raises:
    ValueError: if the number of entries in 'from_layer' and
      'layer_depth' do not match.
    ValueError: if the generated layer does not have the same resolution
      as specified.
  """
  depth_fn = get_depth_fn(depth_multiplier, min_depth)

  feature_map_keys = []
  feature_maps = []
  base_from_layer = ''
  use_explicit_padding = False
  if 'use_explicit_padding' in feature_map_layout:
    use_explicit_padding = feature_map_layout['use_explicit_padding']
  use_depthwise = False
  if 'use_depthwise' in feature_map_layout:
    use_depthwise = feature_map_layout['use_depthwise']
  for index, from_layer in enumerate(feature_map_layout['from_layer']):
    layer_depth = feature_map_layout['layer_depth'][index]
    conv_kernel_size = 3
    if 'conv_kernel_size' in feature_map_layout:
      conv_kernel_size = feature_map_layout['conv_kernel_size'][index]
    if from_layer:
      feature_map = image_features[from_layer]
      base_from_layer = from_layer
      feature_map_keys.append(from_layer)
    else:
      pre_layer = feature_maps[-1]
      intermediate_layer = pre_layer
      if insert_1x1_conv:
        layer_name = '{}_1_Conv2d_{}_1x1_{}'.format(
            base_from_layer, index, depth_fn(layer_depth / 2))
        intermediate_layer = slim.conv2d(
            pre_layer,
            depth_fn(layer_depth / 2), [1, 1],
            padding='SAME',
            stride=1,
            scope=layer_name)
      layer_name = '{}_2_Conv2d_{}_{}x{}_s2_{}'.format(
          base_from_layer, index, conv_kernel_size, conv_kernel_size,
          depth_fn(layer_depth))
      stride = 2
      padding = 'SAME'
      if use_explicit_padding:
        padding = 'VALID'
        intermediate_layer = ops.fixed_padding(
            intermediate_layer, conv_kernel_size)
      if use_depthwise:
        feature_map = slim.separable_conv2d(
            intermediate_layer,
            None, [conv_kernel_size, conv_kernel_size],
            depth_multiplier=1,
            padding=padding,
            stride=stride,
            scope=layer_name + '_depthwise')
        feature_map = slim.conv2d(
            feature_map,
            depth_fn(layer_depth), [1, 1],
            padding='SAME',
            stride=1,
            scope=layer_name)
      else:
        feature_map = slim.conv2d(
            intermediate_layer,
            depth_fn(layer_depth), [conv_kernel_size, conv_kernel_size],
            padding=padding,
            stride=stride,
            scope=layer_name)
      feature_map_keys.append(layer_name)
    feature_maps.append(feature_map)
  return collections.OrderedDict(
      [(x, y) for (x, y) in zip(feature_map_keys, feature_maps)])


def fpn_top_down_feature_maps(image_features, depth, scope=None):
  """Generates `top-down` feature maps for Feature Pyramid Networks.

  See https://arxiv.org/abs/1612.03144 for details.

  Args:
    image_features: list of tuples of (tensor_name, image_feature_tensor).
      Spatial resolutions of successive tensors must reduce exactly by a factor
      of 2.
    depth: depth of output feature maps.
    scope: A scope name to wrap this op under.

  Returns:
    feature_maps: an OrderedDict mapping keys (feature map names) to
      tensors where each tensor has shape [batch, height_i, width_i, depth_i].
  """
  with tf.name_scope(scope, 'top_down'):
    num_levels = len(image_features)
    output_feature_maps_list = []
    output_feature_map_keys = []
    with slim.arg_scope(
        [slim.conv2d], padding='SAME', stride=1):
      top_down = slim.conv2d(
          image_features[-1][1],
          depth, [1, 1], activation_fn=None, normalizer_fn=None,
          scope='projection_%d' % num_levels)
      output_feature_maps_list.append(top_down)
      output_feature_map_keys.append(
          'top_down_%s' % image_features[-1][0])

      for level in reversed(range(num_levels - 1)):
        top_down = ops.nearest_neighbor_upsampling(top_down, 2)
        residual = slim.conv2d(
            image_features[level][1], depth, [1, 1],
            activation_fn=None, normalizer_fn=None,
            scope='projection_%d' % (level + 1))
        top_down += residual
        output_feature_maps_list.append(slim.conv2d(
            top_down,
            depth, [3, 3],
            scope='smoothing_%d' % (level + 1)))
        output_feature_map_keys.append('top_down_%s' %
                                       image_features[level][0])
      return collections.OrderedDict(
          reversed(list(zip(output_feature_map_keys,
                            output_feature_maps_list))))
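
As a usage illustration (not part of this commit; the endpoint shapes are assumptions), a minimal sketch of driving multi_resolution_feature_maps with the Inception V3 layout from the docstring:

import tensorflow as tf

from object_detection.models import feature_map_generators

# Hypothetical Inception V3 endpoints; shapes are illustrative assumptions.
image_features = {
    'Mixed_5d': tf.placeholder(tf.float32, [1, 35, 35, 288]),
    'Mixed_6e': tf.placeholder(tf.float32, [1, 17, 17, 768]),
    'Mixed_7c': tf.placeholder(tf.float32, [1, 8, 8, 2048]),
}
feature_map_layout = {
    'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
    'layer_depth': [-1, -1, -1, 512, 256, 128],
}
# Returns an OrderedDict: the three reused backbone maps, then three new
# stride-2 maps of depth 512, 256 and 128.
feature_maps = feature_map_generators.multi_resolution_feature_maps(
    feature_map_layout=feature_map_layout,
    depth_multiplier=1.0,
    min_depth=16,
    insert_1x1_conv=True,
    image_features=image_features)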
......@@ -110,23 +110,19 @@ class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
     with (slim.arg_scope(self._conv_hyperparams_fn())
           if self._override_base_feature_extractor_hyperparams
           else context_manager.IdentityContextManager()):
-      # TODO(skligys): Enable fused batch norm once quantization supports it.
-      with slim.arg_scope([slim.batch_norm], fused=False):
-        _, image_features = mobilenet_v1.mobilenet_v1_base(
-            ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
-            final_endpoint='Conv2d_13_pointwise',
-            min_depth=self._min_depth,
-            depth_multiplier=self._depth_multiplier,
-            use_explicit_padding=self._use_explicit_padding,
-            scope=scope)
+      _, image_features = mobilenet_v1.mobilenet_v1_base(
+          ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
+          final_endpoint='Conv2d_13_pointwise',
+          min_depth=self._min_depth,
+          depth_multiplier=self._depth_multiplier,
+          use_explicit_padding=self._use_explicit_padding,
+          scope=scope)
     with slim.arg_scope(self._conv_hyperparams_fn()):
-      # TODO(skligys): Enable fused batch norm once quantization supports it.
-      with slim.arg_scope([slim.batch_norm], fused=False):
-        feature_maps = feature_map_generators.multi_resolution_feature_maps(
-            feature_map_layout=feature_map_layout,
-            depth_multiplier=self._depth_multiplier,
-            min_depth=self._min_depth,
-            insert_1x1_conv=True,
-            image_features=image_features)
+      feature_maps = feature_map_generators.multi_resolution_feature_maps(
+          feature_map_layout=feature_map_layout,
+          depth_multiplier=self._depth_multiplier,
+          min_depth=self._min_depth,
+          insert_1x1_conv=True,
+          image_features=image_features)
     return feature_maps.values()
......@@ -148,7 +148,7 @@ class SsdMobilenetV1FeatureExtractorTest(
     self.check_feature_extractor_variables_under_scope(
         depth_multiplier, pad_to_multiple, scope_name)

-  def test_nofused_batchnorm(self):
+  def test_has_fused_batchnorm(self):
     image_height = 40
     image_width = 40
     depth_multiplier = 1
......@@ -159,8 +159,8 @@ class SsdMobilenetV1FeatureExtractorTest(
                                          pad_to_multiple)
     preprocessed_image = feature_extractor.preprocess(image_placeholder)
     _ = feature_extractor.extract_features(preprocessed_image)
-    self.assertFalse(any(op.type == 'FusedBatchNorm'
-                         for op in tf.get_default_graph().get_operations()))
+    self.assertTrue(any(op.type == 'FusedBatchNorm'
+                        for op in tf.get_default_graph().get_operations()))


if __name__ == '__main__':
  tf.test.main()
......@@ -112,24 +112,18 @@ class SSDMobileNetV2FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
     with (slim.arg_scope(self._conv_hyperparams_fn())
           if self._override_base_feature_extractor_hyperparams else
           context_manager.IdentityContextManager()):
-      # TODO(b/68150321): Enable fused batch norm once quantization
-      # supports it.
-      with slim.arg_scope([slim.batch_norm], fused=False):
-        _, image_features = mobilenet_v2.mobilenet_base(
-            ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
-            final_endpoint='layer_19',
-            depth_multiplier=self._depth_multiplier,
-            use_explicit_padding=self._use_explicit_padding,
-            scope=scope)
+      _, image_features = mobilenet_v2.mobilenet_base(
+          ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
+          final_endpoint='layer_19',
+          depth_multiplier=self._depth_multiplier,
+          use_explicit_padding=self._use_explicit_padding,
+          scope=scope)
     with slim.arg_scope(self._conv_hyperparams_fn()):
-      # TODO(b/68150321): Enable fused batch norm once quantization
-      # supports it.
-      with slim.arg_scope([slim.batch_norm], fused=False):
-        feature_maps = feature_map_generators.multi_resolution_feature_maps(
-            feature_map_layout=feature_map_layout,
-            depth_multiplier=self._depth_multiplier,
-            min_depth=self._min_depth,
-            insert_1x1_conv=True,
-            image_features=image_features)
+      feature_maps = feature_map_generators.multi_resolution_feature_maps(
+          feature_map_layout=feature_map_layout,
+          depth_multiplier=self._depth_multiplier,
+          min_depth=self._min_depth,
+          insert_1x1_conv=True,
+          image_features=image_features)
     return feature_maps.values()
......@@ -135,7 +135,7 @@ class SsdMobilenetV2FeatureExtractorTest(
     self.check_feature_extractor_variables_under_scope(
         depth_multiplier, pad_to_multiple, scope_name)

-  def test_nofused_batchnorm(self):
+  def test_has_fused_batchnorm(self):
     image_height = 40
     image_width = 40
     depth_multiplier = 1
......@@ -146,8 +146,8 @@ class SsdMobilenetV2FeatureExtractorTest(
                                          pad_to_multiple)
     preprocessed_image = feature_extractor.preprocess(image_placeholder)
     _ = feature_extractor.extract_features(preprocessed_image)
-    self.assertFalse(any(op.type == 'FusedBatchNorm'
-                         for op in tf.get_default_graph().get_operations()))
+    self.assertTrue(any(op.type == 'FusedBatchNorm'
+                        for op in tf.get_default_graph().get_operations()))
if __name__ == '__main__':
......
......@@ -37,6 +37,10 @@ message KeepAspectRatioResizer {
   // Whether to also resize the image channels from 3 to 1 (RGB to grayscale).
   optional bool convert_to_grayscale = 5 [default = false];

+  // Per-channel pad value. This is only used when pad_to_max_dimension is True.
+  // If unspecified, a default pad value of 0 is applied to all channels.
+  repeated float per_channel_pad_value = 6;
 }

 // Configuration proto for image resizer that resizes to a fixed shape.
......
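
A hedged sketch (not from the commit) of setting the new field from Python; the particular pad values are illustrative per-channel means, not prescribed defaults:

from object_detection.protos import image_resizer_pb2

resizer_config = image_resizer_pb2.KeepAspectRatioResizer()
resizer_config.pad_to_max_dimension = True
# One value per input channel; these numbers are just an example.
resizer_config.per_channel_pad_value.extend([123.7, 116.8, 103.9])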
......@@ -69,6 +69,10 @@ message InputReader {
   // Type of instance mask.
   optional InstanceMaskType mask_type = 10 [default = NUMERICAL_MASKS];

+  // Whether to use the display name when decoding examples. This is only used
+  // when mapping class text strings to integers.
+  optional bool use_display_name = 17 [default = false];

   oneof input_reader {
     TFRecordInputReader tf_record_input_reader = 8;
     ExternalInputReader external_input_reader = 9;
......
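
Similarly, a minimal sketch (assumed usage, not from the commit) of flipping the new flag on an InputReader proto:

from object_detection.protos import input_reader_pb2

reader_config = input_reader_pb2.InputReader()
# Map class text strings to integers via the label map's display_name fields.
reader_config.use_display_name = True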
......@@ -235,6 +235,9 @@ def train(create_tensor_dict_fn,
     built (before optimization). This is helpful to perform additional changes
     to the training graph such as adding FakeQuant ops. The function should
     modify the default graph.
+
+  Raises:
+    ValueError: If both num_clones > 1 and train_config.sync_replicas is true.
   """
   detection_model = create_model_fn()
......@@ -256,9 +259,16 @@ def train(create_tensor_dict_fn,
   with tf.device(deploy_config.variables_device()):
     global_step = slim.create_global_step()

+  if num_clones != 1 and train_config.sync_replicas:
+    raise ValueError('In Synchronous SGD mode num_clones must '
+                     'be 1. Found num_clones: {}'.format(num_clones))
+
+  batch_size = train_config.batch_size // num_clones
+  if train_config.sync_replicas:
+    batch_size //= train_config.replicas_to_aggregate
+
   with tf.device(deploy_config.inputs_device()):
     input_queue = create_input_queue(
-        train_config.batch_size // num_clones, create_tensor_dict_fn,
+        batch_size, create_tensor_dict_fn,
         train_config.batch_queue_capacity,
         train_config.num_batch_queue_threads,
         train_config.prefetch_queue_capacity, data_augmentation_options)
......@@ -377,7 +387,8 @@ def train(create_tensor_dict_fn,
               train_config.load_all_detection_checkpoint_vars))
       available_var_map = (variables_helper.
                            get_variables_available_in_checkpoint(
-                               var_map, train_config.fine_tune_checkpoint))
+                               var_map, train_config.fine_tune_checkpoint,
+                               include_global_step=False))
       init_saver = tf.train.Saver(available_var_map)

       def initializer_fn(sess):
         init_saver.restore(sess, train_config.fine_tune_checkpoint)
......
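
A worked example (assumed numbers) of the effective-batch-size arithmetic introduced above:

effective_batch_size = 64   # train_config.batch_size, now the effective size
num_clones = 1              # must be 1 when sync_replicas is enabled
replicas_to_aggregate = 8   # train_config.replicas_to_aggregate

batch_size = effective_batch_size // num_clones  # 64
batch_size //= replicas_to_aggregate             # 8 examples per worker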
......@@ -278,6 +278,19 @@ def get_learning_rate_type(optimizer_config):
   return optimizer_config.learning_rate.WhichOneof("learning_rate")


+def _is_generic_key(key):
+  """Determines whether the key starts with a generic config dictionary key."""
+  for prefix in [
+      "graph_rewriter_config",
+      "model",
+      "train_input_config",
+      "train_config"]:
+    if key.startswith(prefix + "."):
+      return True
+  return False
+
+
 def merge_external_params_with_configs(configs, hparams=None, **kwargs):
   """Updates `configs` dictionary based on supplied parameters.
......@@ -287,6 +300,16 @@ def merge_external_params_with_configs(configs, hparams=None, **kwargs):
   experiment, one can use a single base config file, and update particular
   values.

+  There are two types of field overrides:
+  1. Strategy-based overrides, which update multiple relevant configuration
+     options. For example, updating `learning_rate` will update both the warmup
+     and final learning rates.
+  2. Generic key/value, which updates a specific parameter based on namespaced
+     configuration keys. For example,
+     `model.ssd.loss.hard_example_miner.max_negatives_per_positive` will update
+     the hard example miner configuration for an SSD model config. Generic
+     overrides are automatically detected based on the namespaced keys.
+
   Args:
     configs: Dictionary of configuration objects. See outputs from
       get_configs_from_pipeline_file() or get_configs_from_multiple_files().
......@@ -302,44 +325,42 @@ def merge_external_params_with_configs(configs, hparams=None, **kwargs):
   if hparams:
     kwargs.update(hparams.values())
   for key, value in kwargs.items():
     tf.logging.info("Maybe overwriting %s: %s", key, value)
     # pylint: disable=g-explicit-bool-comparison
     if value == "" or value is None:
       continue
     # pylint: enable=g-explicit-bool-comparison
     if key == "learning_rate":
       _update_initial_learning_rate(configs, value)
-      tf.logging.info("Overwriting learning rate: %f", value)
-    if key == "batch_size":
+    elif key == "batch_size":
       _update_batch_size(configs, value)
-      tf.logging.info("Overwriting batch size: %d", value)
-    if key == "momentum_optimizer_value":
+    elif key == "momentum_optimizer_value":
       _update_momentum_optimizer_value(configs, value)
-      tf.logging.info("Overwriting momentum optimizer value: %f", value)
-    if key == "classification_localization_weight_ratio":
+    elif key == "classification_localization_weight_ratio":
       # Localization weight is fixed to 1.0.
       _update_classification_localization_weight_ratio(configs, value)
-    if key == "focal_loss_gamma":
+    elif key == "focal_loss_gamma":
       _update_focal_loss_gamma(configs, value)
-    if key == "focal_loss_alpha":
+    elif key == "focal_loss_alpha":
       _update_focal_loss_alpha(configs, value)
-    if key == "train_steps":
+    elif key == "train_steps":
       _update_train_steps(configs, value)
-      tf.logging.info("Overwriting train steps: %d", value)
-    if key == "eval_steps":
+    elif key == "eval_steps":
       _update_eval_steps(configs, value)
-      tf.logging.info("Overwriting eval steps: %d", value)
-    if key == "train_input_path":
+    elif key == "train_input_path":
       _update_input_path(configs["train_input_config"], value)
-      tf.logging.info("Overwriting train input path: %s", value)
-    if key == "eval_input_path":
+    elif key == "eval_input_path":
       _update_input_path(configs["eval_input_config"], value)
-      tf.logging.info("Overwriting eval input path: %s", value)
-    if key == "label_map_path":
+    elif key == "label_map_path":
       _update_label_map_path(configs, value)
-      tf.logging.info("Overwriting label map path: %s", value)
-    if key == "mask_type":
+    elif key == "mask_type":
       _update_mask_type(configs, value)
-      tf.logging.info("Overwritten mask type: %s", value)
+    elif key == "eval_with_moving_averages":
+      _update_use_moving_averages(configs, value)
+    elif _is_generic_key(key):
+      _update_generic(configs, key, value)
+    else:
+      tf.logging.info("Ignoring config override key: %s", key)
   return configs
......@@ -411,6 +432,38 @@ def _update_batch_size(configs, batch_size):
configs["train_config"].batch_size = max(1, int(round(batch_size)))
def _validate_message_has_field(message, field):
if not message.HasField(field):
raise ValueError("Expecting message to have field %s" % field)
def _update_generic(configs, key, value):
"""Update a pipeline configuration parameter based on a generic key/value.
Args:
configs: Dictionary of pipeline configuration protos.
key: A string key, dot-delimited to represent the argument key.
e.g. "model.ssd.train_config.batch_size"
value: A value to set the argument to. The type of the value must match the
type for the protocol buffer. Note that setting the wrong type will
result in a TypeError.
e.g. 42
Raises:
ValueError if the message key does not match the existing proto fields.
TypeError the value type doesn't match the protobuf field type.
"""
fields = key.split(".")
first_field = fields.pop(0)
last_field = fields.pop()
message = configs[first_field]
for field in fields:
_validate_message_has_field(message, field)
message = getattr(message, field)
_validate_message_has_field(message, last_field)
setattr(message, last_field, value)
def _update_momentum_optimizer_value(configs, momentum):
"""Updates `configs` to reflect the new momentum value.
......@@ -587,3 +640,17 @@ def _update_mask_type(configs, mask_type):
"""
configs["train_input_config"].mask_type = mask_type
configs["eval_input_config"].mask_type = mask_type
def _update_use_moving_averages(configs, use_moving_averages):
"""Updates the eval config option to use or not use moving averages.
The configs dictionary is updated in place, and hence not returned.
Args:
configs: Dictionary of configuration objects. See outputs from
get_configs_from_pipeline_file() or get_configs_from_multiple_files().
use_moving_averages: Boolean indicating whether moving average variables
should be loaded during evaluation.
"""
configs["eval_config"].use_moving_averages = use_moving_averages
......@@ -69,6 +69,11 @@ def _update_optimizer_with_cosine_decay_learning_rate(
 class ConfigUtilTest(tf.test.TestCase):

+  def _create_and_load_test_configs(self, pipeline_config):
+    pipeline_config_path = os.path.join(self.get_temp_dir(), "pipeline.config")
+    _write_config(pipeline_config, pipeline_config_path)
+    return config_util.get_configs_from_pipeline_file(pipeline_config_path)
+
   def test_get_configs_from_pipeline_file(self):
     """Test that proto configs can be read from pipeline config file."""
     pipeline_config_path = os.path.join(self.get_temp_dir(), "pipeline.config")
......@@ -307,6 +312,34 @@ class ConfigUtilTest(tf.test.TestCase):
     new_batch_size = configs["train_config"].batch_size
     self.assertEqual(1, new_batch_size)  # Clipped to 1.0.

+  def testOverwriteBatchSizeWithKeyValue(self):
+    """Tests that batch size is overwritten based on key/value."""
+    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
+    pipeline_config.train_config.batch_size = 2
+    configs = self._create_and_load_test_configs(pipeline_config)
+    hparams = tf.contrib.training.HParams(**{"train_config.batch_size": 10})
+    configs = config_util.merge_external_params_with_configs(configs, hparams)
+    new_batch_size = configs["train_config"].batch_size
+    self.assertEqual(10, new_batch_size)
+
+  def testKeyValueOverrideBadKey(self):
+    """Tests that overwriting with a bad key causes an exception."""
+    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
+    configs = self._create_and_load_test_configs(pipeline_config)
+    hparams = tf.contrib.training.HParams(**{"train_config.no_such_field": 10})
+    with self.assertRaises(ValueError):
+      config_util.merge_external_params_with_configs(configs, hparams)
+
+  def testOverwriteBatchSizeWithBadValueType(self):
+    """Tests that overwriting with a bad value type causes an exception."""
+    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
+    pipeline_config.train_config.batch_size = 2
+    configs = self._create_and_load_test_configs(pipeline_config)
+    # Type should be an integer, but we're passing a string "10".
+    hparams = tf.contrib.training.HParams(**{"train_config.batch_size": "10"})
+    with self.assertRaises(TypeError):
+      config_util.merge_external_params_with_configs(configs, hparams)
+
   def testNewMomentumOptimizerValue(self):
     """Tests that new momentum value is updated appropriately."""
     original_momentum_value = 0.4
......@@ -501,6 +534,19 @@ class ConfigUtilTest(tf.test.TestCase):
     self.assertEqual(new_mask_type, configs["train_input_config"].mask_type)
     self.assertEqual(new_mask_type, configs["eval_input_config"].mask_type)

+  def testUseMovingAverageForEval(self):
+    use_moving_averages_orig = False
+    pipeline_config_path = os.path.join(self.get_temp_dir(), "pipeline.config")
+    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
+    pipeline_config.eval_config.use_moving_averages = use_moving_averages_orig
+    _write_config(pipeline_config, pipeline_config_path)
+
+    configs = config_util.get_configs_from_pipeline_file(pipeline_config_path)
+    configs = config_util.merge_external_params_with_configs(
+        configs, eval_with_moving_averages=True)
+    self.assertEqual(True, configs["eval_config"].use_moving_averages)
+
   def test_get_image_resizer_config(self):
     """Tests that number of classes can be retrieved."""
     model_config = model_pb2.DetectionModel()
......
......@@ -117,13 +117,17 @@ def read_dataset(file_read_func, decode_func, input_files, config):
     A tf.data.Dataset based on config.
   """
   # Shard, shuffle, and read files.
-  filenames = tf.concat([tf.matching_files(pattern) for pattern in input_files],
-                        0)
-  filename_dataset = tf.data.Dataset.from_tensor_slices(filenames)
+  filenames = tf.gfile.Glob(input_files)
+  num_readers = config.num_readers
+  if num_readers > len(filenames):
+    num_readers = len(filenames)
+    tf.logging.warning('num_readers has been reduced to %d to match input file '
+                       'shards.' % num_readers)
+  filename_dataset = tf.data.Dataset.from_tensor_slices(tf.unstack(filenames))
   if config.shuffle:
     filename_dataset = filename_dataset.shuffle(
         config.filenames_shuffle_buffer_size)
-  elif config.num_readers > 1:
+  elif num_readers > 1:
     tf.logging.warning('`shuffle` is false, but the input data stream is '
                        'still slightly shuffled since `num_readers` > 1.')
......@@ -131,8 +135,10 @@ def read_dataset(file_read_func, decode_func, input_files, config):
   records_dataset = filename_dataset.apply(
       tf.contrib.data.parallel_interleave(
-          file_read_func, cycle_length=config.num_readers,
-          block_length=config.read_block_length, sloppy=config.shuffle))
+          file_read_func,
+          cycle_length=num_readers,
+          block_length=config.read_block_length,
+          sloppy=config.shuffle))
   if config.shuffle:
     records_dataset = records_dataset.shuffle(config.shuffle_buffer_size)
   tensor_dataset = records_dataset.map(
......
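
The reader-capping logic in isolation (a sketch under TF 1.x; the file pattern and reader count are assumptions):

import tensorflow as tf

filenames = tf.gfile.Glob('/tmp/data/train-*.tfrecord')
num_readers = min(4, len(filenames))  # never run more readers than shards
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.apply(
    tf.contrib.data.parallel_interleave(
        tf.data.TFRecordDataset, cycle_length=num_readers, sloppy=False))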
......@@ -16,6 +16,7 @@
"""Tests for object_detection.utils.dataset_util."""
import os
import numpy as np
import tensorflow as tf
from object_detection.protos import input_reader_pb2
......@@ -32,6 +33,13 @@ class DatasetUtilTest(tf.test.TestCase):
       with tf.gfile.Open(path, 'wb') as f:
         f.write('\n'.join([str(i + 1), str((i + 1) * 10)]))

+    self._shuffle_path_template = os.path.join(self.get_temp_dir(),
+                                               'shuffle_%s.txt')
+    for i in range(2):
+      path = self._shuffle_path_template % i
+      with tf.gfile.Open(path, 'wb') as f:
+        f.write('\n'.join([str(i)] * 5))
+
   def _get_dataset_next(self, files, config, batch_size):

     def decode_func(value):
       return [tf.string_to_number(value, out_type=tf.int32)]
......@@ -78,6 +86,43 @@ class DatasetUtilTest(tf.test.TestCase):
                         [[1, 10, 2, 20, 3, 30, 4, 40, 5, 50, 1, 10, 2, 20, 3,
                           30, 4, 40, 5, 50]])

+  def test_reduce_num_reader(self):
+    config = input_reader_pb2.InputReader()
+    config.num_readers = 10
+    config.shuffle = False
+
+    data = self._get_dataset_next([self._path_template % '*'], config,
+                                  batch_size=20)
+    with self.test_session() as sess:
+      self.assertAllEqual(sess.run(data),
+                          [[1, 10, 2, 20, 3, 30, 4, 40, 5, 50, 1, 10, 2, 20, 3,
+                            30, 4, 40, 5, 50]])
+
+  def test_enable_shuffle(self):
+    config = input_reader_pb2.InputReader()
+    config.num_readers = 1
+    config.shuffle = True
+
+    data = self._get_dataset_next(
+        [self._shuffle_path_template % '*'], config, batch_size=10)
+    expected_non_shuffle_output = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
+    with self.test_session() as sess:
+      self.assertTrue(
+          np.any(np.not_equal(sess.run(data), expected_non_shuffle_output)))
+
+  def test_disable_shuffle_(self):
+    config = input_reader_pb2.InputReader()
+    config.num_readers = 1
+    config.shuffle = False
+
+    data = self._get_dataset_next(
+        [self._shuffle_path_template % '*'], config, batch_size=10)
+    expected_non_shuffle_output = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
+    with self.test_session() as sess:
+      self.assertAllEqual(sess.run(data), [expected_non_shuffle_output])
+
   def test_read_dataset_single_epoch(self):
     config = input_reader_pb2.InputReader()
     config.num_epochs = 1
......
......@@ -318,8 +318,9 @@ def retain_groundtruth(tensor_dict, valid_indices):
   Args:
     tensor_dict: a dictionary of following groundtruth tensors -
       fields.InputDataFields.groundtruth_boxes
-      fields.InputDataFields.groundtruth_instance_masks
       fields.InputDataFields.groundtruth_classes
+      fields.InputDataFields.groundtruth_keypoints
+      fields.InputDataFields.groundtruth_instance_masks
       fields.InputDataFields.groundtruth_is_crowd
       fields.InputDataFields.groundtruth_area
       fields.InputDataFields.groundtruth_label_types
......@@ -347,6 +348,7 @@ def retain_groundtruth(tensor_dict, valid_indices):
   for key in tensor_dict:
     if key in [fields.InputDataFields.groundtruth_boxes,
                fields.InputDataFields.groundtruth_classes,
+               fields.InputDataFields.groundtruth_keypoints,
                fields.InputDataFields.groundtruth_instance_masks]:
       valid_dict[key] = tf.gather(tensor_dict[key], valid_indices)
   # Input decoder returns empty tensor when these fields are not provided.
......@@ -374,6 +376,8 @@ def retain_groundtruth_with_positive_classes(tensor_dict):
     tensor_dict: a dictionary of following groundtruth tensors -
       fields.InputDataFields.groundtruth_boxes
       fields.InputDataFields.groundtruth_classes
+      fields.InputDataFields.groundtruth_keypoints
+      fields.InputDataFields.groundtruth_instance_masks
       fields.InputDataFields.groundtruth_is_crowd
       fields.InputDataFields.groundtruth_area
       fields.InputDataFields.groundtruth_label_types
......@@ -413,6 +417,8 @@ def filter_groundtruth_with_crowd_boxes(tensor_dict):
     tensor_dict: a dictionary of following groundtruth tensors -
       fields.InputDataFields.groundtruth_boxes
       fields.InputDataFields.groundtruth_classes
+      fields.InputDataFields.groundtruth_keypoints
+      fields.InputDataFields.groundtruth_instance_masks
       fields.InputDataFields.groundtruth_is_crowd
       fields.InputDataFields.groundtruth_area
       fields.InputDataFields.groundtruth_label_types
......@@ -435,8 +441,9 @@ def filter_groundtruth_with_nan_box_coordinates(tensor_dict):
   Args:
     tensor_dict: a dictionary of following groundtruth tensors -
       fields.InputDataFields.groundtruth_boxes
-      fields.InputDataFields.groundtruth_instance_masks
       fields.InputDataFields.groundtruth_classes
+      fields.InputDataFields.groundtruth_keypoints
+      fields.InputDataFields.groundtruth_instance_masks
       fields.InputDataFields.groundtruth_is_crowd
       fields.InputDataFields.groundtruth_area
       fields.InputDataFields.groundtruth_label_types
......@@ -703,23 +710,30 @@ def reframe_box_masks_to_image_masks(box_masks, boxes, image_height,
     A tf.float32 tensor of size [num_masks, image_height, image_width].
   """
   # TODO(rathodv): Make this a public function.
-  def transform_boxes_relative_to_boxes(boxes, reference_boxes):
-    boxes = tf.reshape(boxes, [-1, 2, 2])
-    min_corner = tf.expand_dims(reference_boxes[:, 0:2], 1)
-    max_corner = tf.expand_dims(reference_boxes[:, 2:4], 1)
-    transformed_boxes = (boxes - min_corner) / (max_corner - min_corner)
-    return tf.reshape(transformed_boxes, [-1, 4])
-
-  box_masks = tf.expand_dims(box_masks, axis=3)
-  num_boxes = tf.shape(box_masks)[0]
-  unit_boxes = tf.concat(
-      [tf.zeros([num_boxes, 2]), tf.ones([num_boxes, 2])], axis=1)
-  reverse_boxes = transform_boxes_relative_to_boxes(unit_boxes, boxes)
-  image_masks = tf.image.crop_and_resize(image=box_masks,
-                                         boxes=reverse_boxes,
-                                         box_ind=tf.range(num_boxes),
-                                         crop_size=[image_height, image_width],
-                                         extrapolation_value=0.0)
+  def reframe_box_masks_to_image_masks_default():
+    """The default function when there are more than 0 box masks."""
+    def transform_boxes_relative_to_boxes(boxes, reference_boxes):
+      boxes = tf.reshape(boxes, [-1, 2, 2])
+      min_corner = tf.expand_dims(reference_boxes[:, 0:2], 1)
+      max_corner = tf.expand_dims(reference_boxes[:, 2:4], 1)
+      transformed_boxes = (boxes - min_corner) / (max_corner - min_corner)
+      return tf.reshape(transformed_boxes, [-1, 4])
+
+    box_masks_expanded = tf.expand_dims(box_masks, axis=3)
+    num_boxes = tf.shape(box_masks_expanded)[0]
+    unit_boxes = tf.concat(
+        [tf.zeros([num_boxes, 2]), tf.ones([num_boxes, 2])], axis=1)
+    reverse_boxes = transform_boxes_relative_to_boxes(unit_boxes, boxes)
+    return tf.image.crop_and_resize(
+        image=box_masks_expanded,
+        boxes=reverse_boxes,
+        box_ind=tf.range(num_boxes),
+        crop_size=[image_height, image_width],
+        extrapolation_value=0.0)
+
+  image_masks = tf.cond(
+      tf.shape(box_masks)[0] > 0,
+      reframe_box_masks_to_image_masks_default,
+      lambda: tf.zeros([0, image_height, image_width, 1], dtype=tf.float32))
   return tf.squeeze(image_masks, axis=3)
......
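
The same guard pattern in isolation (a sketch with assumed shapes): tf.cond dispatches to the real computation only when the batch is non-empty, and otherwise returns an empty tensor of the matching shape and dtype:

import tensorflow as tf

masks = tf.placeholder(tf.float32, [None, 33, 33])
resized = tf.cond(
    tf.shape(masks)[0] > 0,
    lambda: tf.image.resize_bilinear(tf.expand_dims(masks, 3), [65, 65]),
    lambda: tf.zeros([0, 65, 65, 1], dtype=tf.float32))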
......@@ -1100,6 +1100,16 @@ class ReframeBoxMasksToImageMasksTest(tf.test.TestCase):
       np_image_masks = sess.run(image_masks)
       self.assertAllClose(np_image_masks, np_expected_image_masks)

+  def testZeroBoxMasks(self):
+    box_masks = tf.zeros([0, 3, 3], dtype=tf.float32)
+    boxes = tf.zeros([0, 4], dtype=tf.float32)
+    image_masks = ops.reframe_box_masks_to_image_masks(box_masks, boxes,
+                                                       image_height=4,
+                                                       image_width=4)
+    with self.test_session() as sess:
+      np_image_masks = sess.run(image_masks)
+      self.assertAllEqual(np_image_masks.shape, np.array([0, 4, 4]))
+
   def testMaskIsCenteredInImageWhenBoxIsCentered(self):
     box_masks = tf.constant([[[1, 1],
                               [1, 1]]], dtype=tf.float32)
......
......@@ -67,16 +67,18 @@ class PerImageVRDEvaluation(object):
       tp_fp_labels: A single boolean numpy array of shape [N,], representing N
         True/False positive label, one label per tuple. The labels are sorted
         so that the order of the labels matches the order of the scores.
+      result_mapping: A numpy array with shape [N,] with the original index of
+        each entry.
     """
-    scores, tp_fp_labels = self._compute_tp_fp(
+    scores, tp_fp_labels, result_mapping = self._compute_tp_fp(
         detected_box_tuples=detected_box_tuples,
         detected_scores=detected_scores,
         detected_class_tuples=detected_class_tuples,
         groundtruth_box_tuples=groundtruth_box_tuples,
         groundtruth_class_tuples=groundtruth_class_tuples)

-    return scores, tp_fp_labels
+    return scores, tp_fp_labels, result_mapping

   def _compute_tp_fp(self, detected_box_tuples, detected_scores,
                      detected_class_tuples, groundtruth_box_tuples,
......@@ -107,33 +109,46 @@ class PerImageVRDEvaluation(object):
       tp_fp_labels: A single boolean numpy array of shape [N,], representing N
         True/False positive label, one label per tuple. The labels are sorted
         so that the order of the labels matches the order of the scores.
+      result_mapping: A numpy array with shape [N,] with the original index of
+        each entry.
     """
     unique_gt_tuples = np.unique(
         np.concatenate((groundtruth_class_tuples, detected_class_tuples)))
     result_scores = []
     result_tp_fp_labels = []
+    result_mapping = []

     for unique_tuple in unique_gt_tuples:
       detections_selector = (detected_class_tuples == unique_tuple)
       gt_selector = (groundtruth_class_tuples == unique_tuple)

-      scores, tp_fp_labels = self._compute_tp_fp_for_single_class(
-          detected_box_tuples=detected_box_tuples[detections_selector],
-          detected_scores=detected_scores[detections_selector],
+      selector_mapping = np.where(detections_selector)[0]
+
+      detection_scores_per_tuple = detected_scores[detections_selector]
+      detection_box_per_tuple = detected_box_tuples[detections_selector]
+
+      sorted_indices = np.argsort(detection_scores_per_tuple)
+      sorted_indices = sorted_indices[::-1]
+
+      tp_fp_labels = self._compute_tp_fp_for_single_class(
+          detected_box_tuples=detection_box_per_tuple[sorted_indices],
           groundtruth_box_tuples=groundtruth_box_tuples[gt_selector])
-      result_scores.append(scores)
+
+      result_scores.append(detection_scores_per_tuple[sorted_indices])
       result_tp_fp_labels.append(tp_fp_labels)
+      result_mapping.append(selector_mapping[sorted_indices])

     result_scores = np.concatenate(result_scores)
     result_tp_fp_labels = np.concatenate(result_tp_fp_labels)
+    result_mapping = np.concatenate(result_mapping)
+
     sorted_indices = np.argsort(result_scores)
     sorted_indices = sorted_indices[::-1]

-    return result_scores[sorted_indices], result_tp_fp_labels[sorted_indices]
+    return result_scores[sorted_indices], result_tp_fp_labels[
+        sorted_indices], result_mapping[sorted_indices]

-  def _get_overlaps_and_scores_relation_tuples(
-      self, detected_box_tuples, detected_scores, groundtruth_box_tuples):
+  def _get_overlaps_and_scores_relation_tuples(self, detected_box_tuples,
+                                               groundtruth_box_tuples):
     """Computes overlaps and scores between detected and groundtruth tuples.

     Both detections and groundtruth boxes have the same class tuples.
......@@ -143,8 +158,6 @@ class PerImageVRDEvaluation(object):
         representing N tuples, each tuple containing the same number of named
         bounding boxes.
         Each box is of the format [y_min, x_min, y_max, x_max]
-      detected_scores: A float numpy array of shape [N,], representing
-        the confidence scores of the detected N object instances.
       groundtruth_box_tuples: A float numpy array of structures with the shape
         [M,], representing M tuples, each tuple containing the same number
         of named bounding boxes.
......@@ -153,7 +166,6 @@ class PerImageVRDEvaluation(object):
     Returns:
       result_iou: A float numpy array of size
         [num_detected_tuples, num_gt_box_tuples].
-      scores: The score of the detected boxlist.
     """
     result_iou = np.ones(
......@@ -161,46 +173,35 @@ class PerImageVRDEvaluation(object):
         dtype=float)
     for field in detected_box_tuples.dtype.fields:
       detected_boxlist_field = np_box_list.BoxList(detected_box_tuples[field])
-      detected_boxlist_field.add_field('scores', detected_scores)
-      detected_boxlist_field = np_box_list_ops.sort_by_field(
-          detected_boxlist_field, 'scores')
       gt_boxlist_field = np_box_list.BoxList(groundtruth_box_tuples[field])
       iou_field = np_box_list_ops.iou(detected_boxlist_field, gt_boxlist_field)
       result_iou = np.minimum(iou_field, result_iou)
-    scores = detected_boxlist_field.get_field('scores')
-    return result_iou, scores
+    return result_iou

   def _compute_tp_fp_for_single_class(self, detected_box_tuples,
-                                      detected_scores, groundtruth_box_tuples):
+                                      groundtruth_box_tuples):
     """Labels boxes detected with the same class from the same image as tp/fp.

+    Detection boxes are expected to be already sorted by score.
+
     Args:
       detected_box_tuples: A numpy array of structures with shape [N,],
         representing N tuples, each tuple containing the same number of named
         bounding boxes.
         Each box is of the format [y_min, x_min, y_max, x_max]
-      detected_scores: A float numpy array of shape [N,], representing
-        the confidence scores of the detected N object instances.
       groundtruth_box_tuples: A float numpy array of structures with the shape
         [M,], representing M tuples, each tuple containing the same number
         of named bounding boxes.
         Each box is of the format [y_min, x_min, y_max, x_max]

     Returns:
-      Two arrays of the same size, containing true/false for N boxes that were
-      evaluated as being true positives or false positives;
-      scores: A numpy array representing the detection scores.
+      tp_fp_labels: a boolean numpy array indicating whether a detection is a
+        true positive.
     """
     if detected_box_tuples.size == 0:
-      return np.array([], dtype=float), np.array([], dtype=bool)
+      return np.array([], dtype=bool)

-    min_iou, scores = self._get_overlaps_and_scores_relation_tuples(
-        detected_box_tuples=detected_box_tuples,
-        detected_scores=detected_scores,
-        groundtruth_box_tuples=groundtruth_box_tuples)
+    min_iou = self._get_overlaps_and_scores_relation_tuples(
+        detected_box_tuples, groundtruth_box_tuples)

     num_detected_tuples = detected_box_tuples.shape[0]
     tp_fp_labels = np.zeros(num_detected_tuples, dtype=bool)
......@@ -215,4 +216,4 @@ class PerImageVRDEvaluation(object):
           tp_fp_labels[i] = True
           is_gt_tuple_detected[gt_id] = True

-    return scores, tp_fp_labels
+    return tp_fp_labels
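
The score-sorting and index bookkeeping above, shown on made-up numpy values:

import numpy as np

scores = np.array([0.2, 0.8, 0.1])
sorted_indices = np.argsort(scores)[::-1]  # descending by score -> [1, 0, 2]
sorted_scores = scores[sorted_indices]     # [0.8, 0.2, 0.1]
result_mapping = sorted_indices            # original index of each sorted entry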
......@@ -28,31 +28,25 @@ class SingleClassPerImageVrdEvaluationTest(tf.test.TestCase):
     box_data_type = np.dtype([('subject', 'f4', (4,)), ('object', 'f4', (4,))])
     self.detected_box_tuples = np.array(
-        [([0, 0, 1, 1], [1, 1, 2, 2]), ([0, 0, 1.1, 1], [1, 1, 2, 2]),
+        [([0, 0, 1.1, 1], [1, 1, 2, 2]), ([0, 0, 1, 1], [1, 1, 2, 2]),
          ([1, 1, 2, 2], [0, 0, 1.1, 1])],
         dtype=box_data_type)
-    self.detected_scores = np.array([0.2, 0.8, 0.1], dtype=float)
+    self.detected_scores = np.array([0.8, 0.2, 0.1], dtype=float)
     self.groundtruth_box_tuples = np.array(
         [([0, 0, 1, 1], [1, 1, 2, 2])], dtype=box_data_type)

   def test_tp_fp_eval(self):
-    scores, tp_fp_labels = self.eval._compute_tp_fp_for_single_class(
-        self.detected_box_tuples, self.detected_scores,
-        self.groundtruth_box_tuples)
-    expected_scores = np.array([0.8, 0.2, 0.1], dtype=float)
+    tp_fp_labels = self.eval._compute_tp_fp_for_single_class(
+        self.detected_box_tuples, self.groundtruth_box_tuples)
     expected_tp_fp_labels = np.array([True, False, False], dtype=bool)
-    self.assertTrue(np.allclose(expected_scores, scores))
     self.assertTrue(np.allclose(expected_tp_fp_labels, tp_fp_labels))

   def test_tp_fp_eval_empty_gt(self):
     box_data_type = np.dtype([('subject', 'f4', (4,)), ('object', 'f4', (4,))])

-    scores, tp_fp_labels = self.eval._compute_tp_fp_for_single_class(
-        self.detected_box_tuples, self.detected_scores,
-        np.array([], dtype=box_data_type))
-    expected_scores = np.array([0.8, 0.2, 0.1], dtype=float)
+    tp_fp_labels = self.eval._compute_tp_fp_for_single_class(
+        self.detected_box_tuples, np.array([], dtype=box_data_type))
     expected_tp_fp_labels = np.array([False, False, False], dtype=bool)
-    self.assertTrue(np.allclose(expected_scores, scores))
     self.assertTrue(np.allclose(expected_tp_fp_labels, tp_fp_labels))
......@@ -82,16 +76,18 @@ class MultiClassPerImageVrdEvaluationTest(tf.test.TestCase):
         [(1, 2, 3), (1, 7, 3), (1, 4, 5)], dtype=label_data_type)

   def test_tp_fp_eval(self):
-    scores, tp_fp_labels = self.eval.compute_detection_tp_fp(
+    scores, tp_fp_labels, mapping = self.eval.compute_detection_tp_fp(
         self.detected_box_tuples, self.detected_scores,
         self.detected_class_tuples, self.groundtruth_box_tuples,
         self.groundtruth_class_tuples)

     expected_scores = np.array([0.8, 0.5, 0.2, 0.1], dtype=float)
     expected_tp_fp_labels = np.array([True, True, False, False], dtype=bool)
+    expected_mapping = np.array([1, 3, 0, 2])

     self.assertTrue(np.allclose(expected_scores, scores))
     self.assertTrue(np.allclose(expected_tp_fp_labels, tp_fp_labels))
+    self.assertTrue(np.allclose(expected_mapping, mapping))


 if __name__ == '__main__':
......
......@@ -138,3 +138,36 @@ def create_random_boxes(num_boxes, max_height, max_width):
   boxes[:, 3] = np.maximum(x_1, x_2)

   return boxes.astype(np.float32)
+
+
+def first_rows_close_as_set(a, b, k=None, rtol=1e-6, atol=1e-6):
+  """Checks if first K entries of two lists are close, up to permutation.
+
+  Inputs to this assert are lists of items which can be compared via
+  numpy.allclose(...) and can be sorted.
+
+  Args:
+    a: list of items which can be compared via numpy.allclose(...) and are
+      sortable.
+    b: list of items which can be compared via numpy.allclose(...) and are
+      sortable.
+    k: a non-negative integer. If not provided, k is set to be len(a).
+    rtol: relative tolerance.
+    atol: absolute tolerance.
+
+  Returns:
+    boolean, True if input lists a and b have the same length and
+    the first k entries of the inputs satisfy numpy.allclose() after
+    sorting entries.
+  """
+  if not isinstance(a, list) or not isinstance(b, list) or len(a) != len(b):
+    return False
+  if not k:
+    k = len(a)
+  k = min(k, len(a))
+  a_sorted = sorted(a[:k])
+  b_sorted = sorted(b[:k])
+  return all([
+      np.allclose(entry_a, entry_b, rtol, atol)
+      for (entry_a, entry_b) in zip(a_sorted, b_sorted)
+  ])
......@@ -68,6 +68,22 @@ class TestUtilsTest(tf.test.TestCase):
     self.assertTrue(boxes[:, 2].max() <= max_height)
     self.assertTrue(boxes[:, 3].max() <= max_width)

+  def test_first_rows_close_as_set(self):
+    a = [1, 2, 3, 0, 0]
+    b = [3, 2, 1, 0, 0]
+    k = 3
+    self.assertTrue(test_utils.first_rows_close_as_set(a, b, k))
+
+    a = [[1, 2], [1, 4], [0, 0]]
+    b = [[1, 4 + 1e-9], [1, 2], [0, 0]]
+    k = 2
+    self.assertTrue(test_utils.first_rows_close_as_set(a, b, k))
+
+    a = [[1, 2], [1, 4], [0, 0]]
+    b = [[1, 4 + 1e-9], [2, 2], [0, 0]]
+    k = 2
+    self.assertFalse(test_utils.first_rows_close_as_set(a, b, k))
+

if __name__ == '__main__':
  tf.test.main()
......@@ -315,11 +315,13 @@ def draw_bounding_boxes_on_image_tensors(images,
                                            instance_masks=None,
                                            keypoints=None,
                                            max_boxes_to_draw=20,
-                                           min_score_thresh=0.2):
+                                           min_score_thresh=0.2,
+                                           use_normalized_coordinates=True):
   """Draws bounding boxes, masks, and keypoints on batch of image tensors.

   Args:
-    images: A 4D uint8 image tensor of shape [N, H, W, C].
+    images: A 4D uint8 image tensor of shape [N, H, W, C]. If C > 3, additional
+      channels will be ignored.
     boxes: [N, max_detections, 4] float32 tensor of detection boxes.
     classes: [N, max_detections] int tensor of detection classes. Note that
       classes are 1-indexed.
......@@ -332,12 +334,17 @@ def draw_bounding_boxes_on_image_tensors(images,
       with keypoints.
     max_boxes_to_draw: Maximum number of boxes to draw on an image. Default 20.
     min_score_thresh: Minimum score threshold for visualization. Default 0.2.
+    use_normalized_coordinates: Whether to assume boxes and keypoints are in
+      normalized coordinates (as opposed to absolute coordinates).
+      Default is True.

   Returns:
     4D image tensor of type uint8, with boxes drawn on top.
   """
+  # Additional channels are being ignored.
+  images = images[:, :, :, 0:3]
   visualization_keyword_args = {
-      'use_normalized_coordinates': True,
+      'use_normalized_coordinates': use_normalized_coordinates,
       'max_boxes_to_draw': max_boxes_to_draw,
       'min_score_thresh': min_score_thresh,
       'agnostic_mode': False,
......@@ -382,7 +389,8 @@ def draw_bounding_boxes_on_image_tensors(images,
 def draw_side_by_side_evaluation_image(eval_dict,
                                        category_index,
                                        max_boxes_to_draw=20,
-                                       min_score_thresh=0.2):
+                                       min_score_thresh=0.2,
+                                       use_normalized_coordinates=True):
   """Creates a side-by-side image with detections and groundtruth.

   Bounding boxes (and instance masks, if available) are visualized on both
......@@ -394,6 +402,9 @@ def draw_side_by_side_evaluation_image(eval_dict,
     category_index: A category index (dictionary) produced from a labelmap.
     max_boxes_to_draw: The maximum number of boxes to draw for detections.
     min_score_thresh: The minimum score threshold for showing detections.
+    use_normalized_coordinates: Whether to assume boxes and keypoints are in
+      normalized coordinates (as opposed to absolute coordinates).
+      Default is True.

   Returns:
     A [1, H, 2 * W, C] uint8 tensor. The subimage on the left corresponds to
......@@ -425,7 +436,8 @@ def draw_side_by_side_evaluation_image(eval_dict,
         instance_masks=instance_masks,
         keypoints=keypoints,
         max_boxes_to_draw=max_boxes_to_draw,
-        min_score_thresh=min_score_thresh)
+        min_score_thresh=min_score_thresh,
+        use_normalized_coordinates=use_normalized_coordinates)
     images_with_groundtruth = draw_bounding_boxes_on_image_tensors(
         eval_dict[input_data_fields.original_image],
         tf.expand_dims(eval_dict[input_data_fields.groundtruth_boxes], axis=0),
......@@ -439,7 +451,8 @@ def draw_side_by_side_evaluation_image(eval_dict,
         instance_masks=groundtruth_instance_masks,
         keypoints=None,
         max_boxes_to_draw=None,
-        min_score_thresh=0.0)
+        min_score_thresh=0.0,
+        use_normalized_coordinates=use_normalized_coordinates)
     return tf.concat([images_with_detections, images_with_groundtruth], axis=2)
......
......@@ -48,6 +48,9 @@ class VisualizationUtilsTest(tf.test.TestCase):
     image = np.concatenate((imu, imd), axis=0)
     return image

+  def create_test_image_with_five_channels(self):
+    return np.full([100, 200, 5], 255, dtype=np.uint8)
+
   def test_draw_bounding_box_on_image(self):
     test_image = self.create_colorful_test_image()
     test_image = Image.fromarray(test_image)
......@@ -144,6 +147,32 @@ class VisualizationUtilsTest(tf.test.TestCase):
       image_pil = Image.fromarray(images_with_boxes_np[i, ...])
       image_pil.save(output_file)

+  def test_draw_bounding_boxes_on_image_tensors_with_additional_channels(self):
+    """Tests the case where the input image tensor has more than 3 channels."""
+    category_index = {1: {'id': 1, 'name': 'dog'}}
+    image_np = self.create_test_image_with_five_channels()
+    images_np = np.stack((image_np, image_np), axis=0)
+
+    with tf.Graph().as_default():
+      images_tensor = tf.constant(value=images_np, dtype=tf.uint8)
+      boxes = tf.constant(0, dtype=tf.float32, shape=[2, 0, 4])
+      classes = tf.constant(0, dtype=tf.int64, shape=[2, 0])
+      scores = tf.constant(0, dtype=tf.float32, shape=[2, 0])
+      images_with_boxes = (
+          visualization_utils.draw_bounding_boxes_on_image_tensors(
+              images_tensor,
+              boxes,
+              classes,
+              scores,
+              category_index,
+              min_score_thresh=0.2))
+
+      with self.test_session() as sess:
+        sess.run(tf.global_variables_initializer())
+
+        final_images_np = sess.run(images_with_boxes)
+        self.assertEqual((2, 100, 200, 3), final_images_np.shape)
+
   def test_draw_keypoints_on_image(self):
     test_image = self.create_colorful_test_image()
     test_image = Image.fromarray(test_image)
......