Commit 9fce9c64 authored by Zhichao Lu, committed by pkulzc

Merged commit includes the following changes:

199348852  by Zhichao Lu:

    Small typo fixes in VRD evaluation.

--
199315191  by Zhichao Lu:

    Change padding shapes when additional channels are available.

--
199309180  by Zhichao Lu:

    Adds minor fixes to the Object Detection API implementation.

--
199298605  by Zhichao Lu:

    Force num_readers to be 1 when the only input file is not sharded.

--
199292952  by Zhichao Lu:

    Adds image-level label parsing to TfExampleDetectionAndGTParser.

--
199259866  by Zhichao Lu:

    Visual Relationships Evaluation executable.

--
199208330  by Zhichao Lu:

    Interpret train_config.batch_size as the effective batch size. The trainer therefore divides the effective batch size by train_config.replicas_to_aggregate to get the per-worker batch size.
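
    A minimal sketch of the resulting arithmetic, with illustrative values (the variable names below are not from the trainer itself):

    effective_batch_size = 64     # train_config.batch_size, across all replicas
    replicas_to_aggregate = 8     # train_config.replicas_to_aggregate
    # The trainer splits the effective batch size evenly across replicas.
    if effective_batch_size % replicas_to_aggregate != 0:
      raise ValueError('batch_size must be divisible by replicas_to_aggregate.')
    per_worker_batch_size = effective_batch_size // replicas_to_aggregate  # -> 8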

--
199207842  by Zhichao Lu:

    Internal change.

--
199204222  by Zhichao Lu:

    In case the image has more than three channels, we only take the first three channels for visualization.

--
199194388  by Zhichao Lu:

    Correcting protocol description: VOC 2007 -> VOC 2012.

--
199188290  by Zhichao Lu:

    Adds per-relationship APs and mAP computation to VRD evaluation.

--
199158801  by Zhichao Lu:

    If available, additional channels are merged with input image.

--
199099637  by Zhichao Lu:

    OpenImages Challenge metric support:
    - adding a verified-labels standard field for TFExample;
    - adding tfrecord creation functionality.

--
198957391  by Zhichao Lu:

    Allow tf record sharding when creating pets dataset.

--
198925184  by Zhichao Lu:

    Introduce moving-average support for evaluation. Also add the ability to override this configuration via config_util.
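
    A hedged sketch of how an evaluator can restore moving-average values in place of the raw weights; the decay value and checkpoint path are placeholders, not taken from this change:

    import tensorflow as tf

    ema = tf.train.ExponentialMovingAverage(decay=0.9999)
    # Maps each variable to its moving-average shadow name for restoring.
    variables_to_restore = ema.variables_to_restore()
    saver = tf.train.Saver(variables_to_restore)
    with tf.Session() as sess:
      saver.restore(sess, '/path/to/model.ckpt')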

--
198918186  by Zhichao Lu:

    Handles the case where there are 0 box masks.

--
198809009  by Zhichao Lu:

    Plumb groundtruth weights into target assigner for Faster RCNN.

--
198759987  by Zhichao Lu:

    Fix object detection test broken by shape inference.

--
198668602  by Zhichao Lu:

    Adding a new input field in data_decoders/tf_example_decoder.py for storing additional channels.

--
198530013  by Zhichao Lu:

    A utility for hierarchical expansion of boxes and labels of the OID dataset.

--
198503124  by Zhichao Lu:

    Fix dimension mismatch error introduced by
    https://github.com/tensorflow/tensorflow/pull/18251, or cl/194031845.
    After the above change, conv2d strictly checks that conv_dims + 2 == input_rank.
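
    To illustrate the stricter check (this snippet is illustrative, not the fix itself): a 2-D convolution now requires a rank-4 [batch, height, width, channels] input, so an unbatched image must gain a batch dimension first:

    import tensorflow as tf
    slim = tf.contrib.slim

    image = tf.placeholder(tf.float32, [None, None, 3])  # rank 3: [H, W, C]
    batched = tf.expand_dims(image, axis=0)              # rank 4: [1, H, W, C]
    # conv2d now enforces conv_dims (2) + 2 == input rank (4).
    features = slim.conv2d(batched, 8, [3, 3])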

--
198445807  by Zhichao Lu:

    Enabling the Object Detection Challenge 2018 metric in the evaluator.py framework for
    running eval jobs.
    Renaming the old OpenImages V2 metric.

--
198413950  by Zhichao Lu:

    Support generic configuration override using namespaced keys

    Useful for adding custom hyper-parameter tuning fields without having to add custom override methods to config_util.py.
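
    A hypothetical sketch of the idea; the actual helper in config_util.py may differ in name and signature:

    def override_config(config, namespaced_key, value):
      """Walks a dotted key such as 'model.ssd.num_classes' and sets the leaf."""
      fields = namespaced_key.split('.')
      for field in fields[:-1]:
        config = getattr(config, field)
      setattr(config, fields[-1], value)

    # e.g. override_config(pipeline_config, 'train_config.batch_size', 32)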

--
198106437  by Zhichao Lu:

    Enable fused batchnorm now that quantization is supported.

--
198048364  by Zhichao Lu:

    Add support for keypoints in tf sequence examples and some util ops.

--
198004736  by Zhichao Lu:

    Relax postprocessing unit tests that are based on the assumption that tf.image.non_max_suppression is stable with respect to its input.

--
197997513  by Zhichao Lu:

    More lenient validation for normalized box boundaries.

--
197940068  by Zhichao Lu:

    A couple of minor updates/fixes:
    - Updating the input reader proto with an option to use display_name when decoding data.
    - Updating the visualization tool to specify whether boxes use absolute or normalized coordinates. Appropriate boxes will now appear in TensorBoard when using model_main.py.

--
197920152  by Zhichao Lu:

    Add quantized training support in the new OD binaries and a config for SSD Mobilenet v1 quantized training that is TPU compatible.

--
197213563  by Zhichao Lu:

    Do not share batch_norm for the classification and regression towers in the weight-shared box predictor.

--
197196757  by Zhichao Lu:

    Relax the box_predictor API to return box_predictions of shape [batch_size, num_anchors, code_size] in addition to [batch_size, num_anchors, (1|q), code_size].

--
196898361  by Zhichao Lu:

    Allow a per-channel scalar value to pad the input image with when using the keep-aspect-ratio resizer (when pad_to_max_dimension=True).

    In the Object Detection pipeline, we pad the image before normalization, and this skews batch_norm statistics during training. The option to set a per-channel pad value lets us truly pad with zeros.
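
    For example, a keep-aspect-ratio resizer config can pad with the per-channel means so the padding becomes zero after mean subtraction; the pad values below are the common ImageNet channel means and the dimensions are illustrative:

    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
        pad_to_max_dimension: true
        per_channel_pad_value: 123.68
        per_channel_pad_value: 116.779
        per_channel_pad_value: 103.939
      }
    }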

--
196592101  by Zhichao Lu:

    Fix a bug regarding tfrecord shuffling in object_detection.

--
196320138  by Zhichao Lu:

    Fix typo in exporting_models.md

--

PiperOrigin-RevId: 199348852
parent ed901b73
......@@ -56,15 +56,26 @@ def _get_padding_shapes(dataset, max_num_boxes=None, num_classes=None,
else:
height, width = spatial_image_shape # pylint: disable=unpacking-non-sequence
num_additional_channels = 0
if fields.InputDataFields.image_additional_channels in dataset.output_shapes:
num_additional_channels = dataset.output_shapes[
fields.InputDataFields.image_additional_channels].dims[2].value
padding_shapes = {
fields.InputDataFields.image: [height, width, 3],
# Additional channels are merged before batching.
fields.InputDataFields.image: [
height, width, 3 + num_additional_channels
],
fields.InputDataFields.image_additional_channels: [
height, width, num_additional_channels
],
fields.InputDataFields.source_id: [],
fields.InputDataFields.filename: [],
fields.InputDataFields.key: [],
fields.InputDataFields.groundtruth_difficult: [max_num_boxes],
fields.InputDataFields.groundtruth_boxes: [max_num_boxes, 4],
fields.InputDataFields.groundtruth_instance_masks: [max_num_boxes, height,
width],
fields.InputDataFields.groundtruth_instance_masks: [
max_num_boxes, height, width
],
fields.InputDataFields.groundtruth_is_crowd: [max_num_boxes],
fields.InputDataFields.groundtruth_group_of: [max_num_boxes],
fields.InputDataFields.groundtruth_area: [max_num_boxes],
......@@ -74,7 +85,8 @@ def _get_padding_shapes(dataset, max_num_boxes=None, num_classes=None,
fields.InputDataFields.groundtruth_label_scores: [max_num_boxes],
fields.InputDataFields.true_image_shape: [3],
fields.InputDataFields.multiclass_scores: [
max_num_boxes, num_classes + 1 if num_classes is not None else None],
max_num_boxes, num_classes + 1 if num_classes is not None else None
],
}
# Determine whether groundtruth_classes are integers or one-hot encodings, and
# apply batching appropriately.
......@@ -90,7 +102,9 @@ def _get_padding_shapes(dataset, max_num_boxes=None, num_classes=None,
'rank 2 tensor (one-hot encodings)')
if fields.InputDataFields.original_image in dataset.output_shapes:
padding_shapes[fields.InputDataFields.original_image] = [None, None, 3]
padding_shapes[fields.InputDataFields.original_image] = [
None, None, 3 + num_additional_channels
]
if fields.InputDataFields.groundtruth_keypoints in dataset.output_shapes:
tensor_shape = dataset.output_shapes[fields.InputDataFields.
groundtruth_keypoints]
......@@ -108,9 +122,13 @@ def _get_padding_shapes(dataset, max_num_boxes=None, num_classes=None,
for tensor_key, _ in dataset.output_shapes.items()}
def build(input_reader_config, transform_input_data_fn=None,
batch_size=None, max_num_boxes=None, num_classes=None,
spatial_image_shape=None):
def build(input_reader_config,
transform_input_data_fn=None,
batch_size=None,
max_num_boxes=None,
num_classes=None,
spatial_image_shape=None,
num_additional_channels=0):
"""Builds a tf.data.Dataset.
Builds a tf.data.Dataset by applying the `transform_input_data_fn` on all
......@@ -128,6 +146,7 @@ def build(input_reader_config, transform_input_data_fn=None,
spatial_image_shape: A list of two integers of the form [height, width]
containing expected spatial shape of the image after applying
transform_input_data_fn. If None, will use dynamic shapes.
num_additional_channels: Number of additional channels to use in the input.
Returns:
A tf.data.Dataset based on the input_reader_config.
......@@ -152,7 +171,9 @@ def build(input_reader_config, transform_input_data_fn=None,
decoder = tf_example_decoder.TfExampleDecoder(
load_instance_masks=input_reader_config.load_instance_masks,
instance_mask_type=input_reader_config.mask_type,
label_map_proto_file=label_map_proto_file)
label_map_proto_file=label_map_proto_file,
use_display_name=input_reader_config.use_display_name,
num_additional_channels=num_additional_channels)
def process_fn(value):
processed = decoder.decode(value)
......
......@@ -30,49 +30,50 @@ from object_detection.utils import dataset_util
class DatasetBuilderTest(tf.test.TestCase):
def create_tf_record(self):
def create_tf_record(self, has_additional_channels=False):
path = os.path.join(self.get_temp_dir(), 'tfrecord')
writer = tf.python_io.TFRecordWriter(path)
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
additional_channels_tensor = np.random.randint(
255, size=(4, 5, 1)).astype(np.uint8)
flat_mask = (4 * 5) * [1.0]
with self.test_session():
encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).eval()
encoded_additional_channels_jpeg = tf.image.encode_jpeg(
tf.constant(additional_channels_tensor)).eval()
features = {
'image/encoded':
feature_pb2.Feature(
bytes_list=feature_pb2.BytesList(value=[encoded_jpeg])),
'image/format':
feature_pb2.Feature(
bytes_list=feature_pb2.BytesList(value=['jpeg'.encode('utf-8')])
),
'image/height':
feature_pb2.Feature(int64_list=feature_pb2.Int64List(value=[4])),
'image/width':
feature_pb2.Feature(int64_list=feature_pb2.Int64List(value=[5])),
'image/object/bbox/xmin':
feature_pb2.Feature(float_list=feature_pb2.FloatList(value=[0.0])),
'image/object/bbox/xmax':
feature_pb2.Feature(float_list=feature_pb2.FloatList(value=[1.0])),
'image/object/bbox/ymin':
feature_pb2.Feature(float_list=feature_pb2.FloatList(value=[0.0])),
'image/object/bbox/ymax':
feature_pb2.Feature(float_list=feature_pb2.FloatList(value=[1.0])),
'image/object/class/label':
feature_pb2.Feature(int64_list=feature_pb2.Int64List(value=[2])),
'image/object/mask':
feature_pb2.Feature(
float_list=feature_pb2.FloatList(value=flat_mask)),
}
if has_additional_channels:
features['image/additional_channels/encoded'] = feature_pb2.Feature(
bytes_list=feature_pb2.BytesList(
value=[encoded_additional_channels_jpeg] * 2))
example = example_pb2.Example(
features=feature_pb2.Features(
feature={
'image/encoded':
feature_pb2.Feature(
bytes_list=feature_pb2.BytesList(value=[encoded_jpeg])),
'image/format':
feature_pb2.Feature(
bytes_list=feature_pb2.BytesList(
value=['jpeg'.encode('utf-8')])),
'image/height':
feature_pb2.Feature(
int64_list=feature_pb2.Int64List(value=[4])),
'image/width':
feature_pb2.Feature(
int64_list=feature_pb2.Int64List(value=[5])),
'image/object/bbox/xmin':
feature_pb2.Feature(
float_list=feature_pb2.FloatList(value=[0.0])),
'image/object/bbox/xmax':
feature_pb2.Feature(
float_list=feature_pb2.FloatList(value=[1.0])),
'image/object/bbox/ymin':
feature_pb2.Feature(
float_list=feature_pb2.FloatList(value=[0.0])),
'image/object/bbox/ymax':
feature_pb2.Feature(
float_list=feature_pb2.FloatList(value=[1.0])),
'image/object/class/label':
feature_pb2.Feature(
int64_list=feature_pb2.Int64List(value=[2])),
'image/object/mask':
feature_pb2.Feature(
float_list=feature_pb2.FloatList(value=flat_mask)),
}))
features=feature_pb2.Features(feature=features))
writer.write(example.SerializeToString())
writer.close()
......@@ -218,6 +219,31 @@ class DatasetBuilderTest(tf.test.TestCase):
[2, 2, 4, 5],
output_dict[fields.InputDataFields.groundtruth_instance_masks].shape)
def test_build_tf_record_input_reader_with_additional_channels(self):
tf_record_path = self.create_tf_record(has_additional_channels=True)
input_reader_text_proto = """
shuffle: false
num_readers: 1
tf_record_input_reader {{
input_path: '{0}'
}}
""".format(tf_record_path)
input_reader_proto = input_reader_pb2.InputReader()
text_format.Merge(input_reader_text_proto, input_reader_proto)
tensor_dict = dataset_util.make_initializable_iterator(
dataset_builder.build(
input_reader_proto, batch_size=2,
num_additional_channels=2)).get_next()
sv = tf.train.Supervisor(logdir=self.get_temp_dir())
with sv.prepare_or_wait_for_session() as sess:
sv.start_queue_runners(sess)
output_dict = sess.run(tensor_dict)
self.assertEquals((2, 4, 5, 5),
output_dict[fields.InputDataFields.image].shape)
def test_raises_error_with_no_input_paths(self):
input_reader_text_proto = """
shuffle: false
......
......@@ -79,12 +79,17 @@ def build(image_resizer_config):
keep_aspect_ratio_config.max_dimension):
raise ValueError('min_dimension > max_dimension')
method = _tf_resize_method(keep_aspect_ratio_config.resize_method)
per_channel_pad_value = (0, 0, 0)
if keep_aspect_ratio_config.per_channel_pad_value:
per_channel_pad_value = tuple(keep_aspect_ratio_config.
per_channel_pad_value)
image_resizer_fn = functools.partial(
preprocessor.resize_to_range,
min_dimension=keep_aspect_ratio_config.min_dimension,
max_dimension=keep_aspect_ratio_config.max_dimension,
method=method,
pad_to_max_dimension=keep_aspect_ratio_config.pad_to_max_dimension)
pad_to_max_dimension=keep_aspect_ratio_config.pad_to_max_dimension,
per_channel_pad_value=per_channel_pad_value)
if not keep_aspect_ratio_config.convert_to_grayscale:
return image_resizer_fn
elif image_resizer_oneof == 'fixed_shape_resizer':
......
......@@ -52,6 +52,9 @@ class ImageResizerBuilderTest(tf.test.TestCase):
min_dimension: 10
max_dimension: 20
pad_to_max_dimension: true
per_channel_pad_value: 3
per_channel_pad_value: 4
per_channel_pad_value: 5
}
"""
input_shape = (50, 25, 3)
......
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A function to build a DetectionModel from configuration."""
from object_detection.builders import anchor_generator_builder
from object_detection.builders import box_coder_builder
from object_detection.builders import box_predictor_builder
from object_detection.builders import hyperparams_builder
from object_detection.builders import image_resizer_builder
from object_detection.builders import losses_builder
from object_detection.builders import matcher_builder
from object_detection.builders import post_processing_builder
from object_detection.builders import region_similarity_calculator_builder as sim_calc
from object_detection.core import box_predictor
from object_detection.meta_architectures import faster_rcnn_meta_arch
from object_detection.meta_architectures import rfcn_meta_arch
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.models import faster_rcnn_inception_resnet_v2_feature_extractor as frcnn_inc_res
from object_detection.models import faster_rcnn_inception_v2_feature_extractor as frcnn_inc_v2
from object_detection.models import faster_rcnn_nas_feature_extractor as frcnn_nas
from object_detection.models import faster_rcnn_pnas_feature_extractor as frcnn_pnas
from object_detection.models import faster_rcnn_resnet_v1_feature_extractor as frcnn_resnet_v1
from object_detection.models import ssd_resnet_v1_fpn_feature_extractor as ssd_resnet_v1_fpn
from object_detection.models.embedded_ssd_mobilenet_v1_feature_extractor import EmbeddedSSDMobileNetV1FeatureExtractor
from object_detection.models.ssd_inception_v2_feature_extractor import SSDInceptionV2FeatureExtractor
from object_detection.models.ssd_inception_v3_feature_extractor import SSDInceptionV3FeatureExtractor
from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobileNetV1FeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.protos import model_pb2
# A map of names to SSD feature extractors.
SSD_FEATURE_EXTRACTOR_CLASS_MAP = {
'ssd_inception_v2': SSDInceptionV2FeatureExtractor,
'ssd_inception_v3': SSDInceptionV3FeatureExtractor,
'ssd_mobilenet_v1': SSDMobileNetV1FeatureExtractor,
'ssd_mobilenet_v2': SSDMobileNetV2FeatureExtractor,
'ssd_resnet50_v1_fpn': ssd_resnet_v1_fpn.SSDResnet50V1FpnFeatureExtractor,
'ssd_resnet101_v1_fpn': ssd_resnet_v1_fpn.SSDResnet101V1FpnFeatureExtractor,
'ssd_resnet152_v1_fpn': ssd_resnet_v1_fpn.SSDResnet152V1FpnFeatureExtractor,
'embedded_ssd_mobilenet_v1': EmbeddedSSDMobileNetV1FeatureExtractor,
}
# A map of names to Faster R-CNN feature extractors.
FASTER_RCNN_FEATURE_EXTRACTOR_CLASS_MAP = {
'faster_rcnn_nas':
frcnn_nas.FasterRCNNNASFeatureExtractor,
'faster_rcnn_pnas':
frcnn_pnas.FasterRCNNPNASFeatureExtractor,
'faster_rcnn_inception_resnet_v2':
frcnn_inc_res.FasterRCNNInceptionResnetV2FeatureExtractor,
'faster_rcnn_inception_v2':
frcnn_inc_v2.FasterRCNNInceptionV2FeatureExtractor,
'faster_rcnn_resnet50':
frcnn_resnet_v1.FasterRCNNResnet50FeatureExtractor,
'faster_rcnn_resnet101':
frcnn_resnet_v1.FasterRCNNResnet101FeatureExtractor,
'faster_rcnn_resnet152':
frcnn_resnet_v1.FasterRCNNResnet152FeatureExtractor,
}
def build(model_config, is_training, add_summaries=True,
add_background_class=True):
"""Builds a DetectionModel based on the model config.
Args:
model_config: A model.proto object containing the config for the desired
DetectionModel.
is_training: True if this model is being built for training purposes.
add_summaries: Whether to add tensorflow summaries in the model graph.
add_background_class: Whether to add an implicit background class to one-hot
encodings of groundtruth labels. Set to false if using groundtruth labels
with an explicit background class or using multiclass scores instead of
truth in the case of distillation. Ignored in the case of faster_rcnn.
Returns:
DetectionModel based on the config.
Raises:
ValueError: On invalid meta architecture or model.
"""
if not isinstance(model_config, model_pb2.DetectionModel):
raise ValueError('model_config not of type model_pb2.DetectionModel.')
meta_architecture = model_config.WhichOneof('model')
if meta_architecture == 'ssd':
return _build_ssd_model(model_config.ssd, is_training, add_summaries,
add_background_class)
if meta_architecture == 'faster_rcnn':
return _build_faster_rcnn_model(model_config.faster_rcnn, is_training,
add_summaries)
raise ValueError('Unknown meta architecture: {}'.format(meta_architecture))
def _build_ssd_feature_extractor(feature_extractor_config, is_training,
reuse_weights=None):
"""Builds a ssd_meta_arch.SSDFeatureExtractor based on config.
Args:
feature_extractor_config: A SSDFeatureExtractor proto config from ssd.proto.
is_training: True if this feature extractor is being built for training.
reuse_weights: if the feature extractor should reuse weights.
Returns:
ssd_meta_arch.SSDFeatureExtractor based on config.
Raises:
ValueError: On invalid feature extractor type.
"""
feature_type = feature_extractor_config.type
depth_multiplier = feature_extractor_config.depth_multiplier
min_depth = feature_extractor_config.min_depth
pad_to_multiple = feature_extractor_config.pad_to_multiple
use_explicit_padding = feature_extractor_config.use_explicit_padding
use_depthwise = feature_extractor_config.use_depthwise
conv_hyperparams = hyperparams_builder.build(
feature_extractor_config.conv_hyperparams, is_training)
override_base_feature_extractor_hyperparams = (
feature_extractor_config.override_base_feature_extractor_hyperparams)
if feature_type not in SSD_FEATURE_EXTRACTOR_CLASS_MAP:
raise ValueError('Unknown ssd feature_extractor: {}'.format(feature_type))
feature_extractor_class = SSD_FEATURE_EXTRACTOR_CLASS_MAP[feature_type]
return feature_extractor_class(
is_training, depth_multiplier, min_depth, pad_to_multiple,
conv_hyperparams, reuse_weights, use_explicit_padding, use_depthwise,
override_base_feature_extractor_hyperparams)
def _build_ssd_model(ssd_config, is_training, add_summaries,
add_background_class=True):
"""Builds an SSD detection model based on the model config.
Args:
ssd_config: A ssd.proto object containing the config for the desired
SSDMetaArch.
is_training: True if this model is being built for training purposes.
add_summaries: Whether to add tf summaries in the model.
add_background_class: Whether to add an implicit background class to one-hot
encodings of groundtruth labels. Set to false if using groundtruth labels
with an explicit background class or using multiclass scores instead of
truth in the case of distillation.
Returns:
SSDMetaArch based on the config.
Raises:
ValueError: If ssd_config.type is not recognized (i.e. not registered in
model_class_map).
"""
num_classes = ssd_config.num_classes
# Feature extractor
feature_extractor = _build_ssd_feature_extractor(
feature_extractor_config=ssd_config.feature_extractor,
is_training=is_training)
box_coder = box_coder_builder.build(ssd_config.box_coder)
matcher = matcher_builder.build(ssd_config.matcher)
region_similarity_calculator = sim_calc.build(
ssd_config.similarity_calculator)
encode_background_as_zeros = ssd_config.encode_background_as_zeros
negative_class_weight = ssd_config.negative_class_weight
ssd_box_predictor = box_predictor_builder.build(hyperparams_builder.build,
ssd_config.box_predictor,
is_training, num_classes)
anchor_generator = anchor_generator_builder.build(
ssd_config.anchor_generator)
image_resizer_fn = image_resizer_builder.build(ssd_config.image_resizer)
non_max_suppression_fn, score_conversion_fn = post_processing_builder.build(
ssd_config.post_processing)
(classification_loss, localization_loss, classification_weight,
localization_weight, hard_example_miner,
random_example_sampler) = losses_builder.build(ssd_config.loss)
normalize_loss_by_num_matches = ssd_config.normalize_loss_by_num_matches
normalize_loc_loss_by_codesize = ssd_config.normalize_loc_loss_by_codesize
return ssd_meta_arch.SSDMetaArch(
is_training,
anchor_generator,
ssd_box_predictor,
box_coder,
feature_extractor,
matcher,
region_similarity_calculator,
encode_background_as_zeros,
negative_class_weight,
image_resizer_fn,
non_max_suppression_fn,
score_conversion_fn,
classification_loss,
localization_loss,
classification_weight,
localization_weight,
normalize_loss_by_num_matches,
hard_example_miner,
add_summaries=add_summaries,
normalize_loc_loss_by_codesize=normalize_loc_loss_by_codesize,
freeze_batchnorm=ssd_config.freeze_batchnorm,
inplace_batchnorm_update=ssd_config.inplace_batchnorm_update,
add_background_class=add_background_class,
random_example_sampler=random_example_sampler)
def _build_faster_rcnn_feature_extractor(
feature_extractor_config, is_training, reuse_weights=None,
inplace_batchnorm_update=False):
"""Builds a faster_rcnn_meta_arch.FasterRCNNFeatureExtractor based on config.
Args:
feature_extractor_config: A FasterRcnnFeatureExtractor proto config from
faster_rcnn.proto.
is_training: True if this feature extractor is being built for training.
reuse_weights: if the feature extractor should reuse weights.
inplace_batchnorm_update: Whether to update batch_norm inplace during
training. This is required for batch norm to work correctly on TPUs. When
this is false, user must add a control dependency on
tf.GraphKeys.UPDATE_OPS for train/loss op in order to update the batch
norm moving average parameters.
Returns:
faster_rcnn_meta_arch.FasterRCNNFeatureExtractor based on config.
Raises:
ValueError: On invalid feature extractor type.
"""
if inplace_batchnorm_update:
raise ValueError('inplace batchnorm updates not supported.')
feature_type = feature_extractor_config.type
first_stage_features_stride = (
feature_extractor_config.first_stage_features_stride)
batch_norm_trainable = feature_extractor_config.batch_norm_trainable
if feature_type not in FASTER_RCNN_FEATURE_EXTRACTOR_CLASS_MAP:
raise ValueError('Unknown Faster R-CNN feature_extractor: {}'.format(
feature_type))
feature_extractor_class = FASTER_RCNN_FEATURE_EXTRACTOR_CLASS_MAP[
feature_type]
return feature_extractor_class(
is_training, first_stage_features_stride,
batch_norm_trainable, reuse_weights)
def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
"""Builds a Faster R-CNN or R-FCN detection model based on the model config.
Builds R-FCN model if the second_stage_box_predictor in the config is of type
`rfcn_box_predictor` else builds a Faster R-CNN model.
Args:
frcnn_config: A faster_rcnn.proto object containing the config for the
desired FasterRCNNMetaArch or RFCNMetaArch.
is_training: True if this model is being built for training purposes.
add_summaries: Whether to add tf summaries in the model.
Returns:
FasterRCNNMetaArch based on the config.
Raises:
ValueError: If frcnn_config.type is not recognized (i.e. not registered in
model_class_map).
"""
num_classes = frcnn_config.num_classes
image_resizer_fn = image_resizer_builder.build(frcnn_config.image_resizer)
feature_extractor = _build_faster_rcnn_feature_extractor(
frcnn_config.feature_extractor, is_training,
frcnn_config.inplace_batchnorm_update)
number_of_stages = frcnn_config.number_of_stages
first_stage_anchor_generator = anchor_generator_builder.build(
frcnn_config.first_stage_anchor_generator)
first_stage_atrous_rate = frcnn_config.first_stage_atrous_rate
first_stage_box_predictor_arg_scope_fn = hyperparams_builder.build(
frcnn_config.first_stage_box_predictor_conv_hyperparams, is_training)
first_stage_box_predictor_kernel_size = (
frcnn_config.first_stage_box_predictor_kernel_size)
first_stage_box_predictor_depth = frcnn_config.first_stage_box_predictor_depth
first_stage_minibatch_size = frcnn_config.first_stage_minibatch_size
first_stage_positive_balance_fraction = (
frcnn_config.first_stage_positive_balance_fraction)
first_stage_nms_score_threshold = frcnn_config.first_stage_nms_score_threshold
first_stage_nms_iou_threshold = frcnn_config.first_stage_nms_iou_threshold
first_stage_max_proposals = frcnn_config.first_stage_max_proposals
first_stage_loc_loss_weight = (
frcnn_config.first_stage_localization_loss_weight)
first_stage_obj_loss_weight = frcnn_config.first_stage_objectness_loss_weight
initial_crop_size = frcnn_config.initial_crop_size
maxpool_kernel_size = frcnn_config.maxpool_kernel_size
maxpool_stride = frcnn_config.maxpool_stride
second_stage_box_predictor = box_predictor_builder.build(
hyperparams_builder.build,
frcnn_config.second_stage_box_predictor,
is_training=is_training,
num_classes=num_classes)
second_stage_batch_size = frcnn_config.second_stage_batch_size
second_stage_balance_fraction = frcnn_config.second_stage_balance_fraction
(second_stage_non_max_suppression_fn, second_stage_score_conversion_fn
) = post_processing_builder.build(frcnn_config.second_stage_post_processing)
second_stage_localization_loss_weight = (
frcnn_config.second_stage_localization_loss_weight)
second_stage_classification_loss = (
losses_builder.build_faster_rcnn_classification_loss(
frcnn_config.second_stage_classification_loss))
second_stage_classification_loss_weight = (
frcnn_config.second_stage_classification_loss_weight)
second_stage_mask_prediction_loss_weight = (
frcnn_config.second_stage_mask_prediction_loss_weight)
hard_example_miner = None
if frcnn_config.HasField('hard_example_miner'):
hard_example_miner = losses_builder.build_hard_example_miner(
frcnn_config.hard_example_miner,
second_stage_classification_loss_weight,
second_stage_localization_loss_weight)
common_kwargs = {
'is_training': is_training,
'num_classes': num_classes,
'image_resizer_fn': image_resizer_fn,
'feature_extractor': feature_extractor,
'number_of_stages': number_of_stages,
'first_stage_anchor_generator': first_stage_anchor_generator,
'first_stage_atrous_rate': first_stage_atrous_rate,
'first_stage_box_predictor_arg_scope_fn':
first_stage_box_predictor_arg_scope_fn,
'first_stage_box_predictor_kernel_size':
first_stage_box_predictor_kernel_size,
'first_stage_box_predictor_depth': first_stage_box_predictor_depth,
'first_stage_minibatch_size': first_stage_minibatch_size,
'first_stage_positive_balance_fraction':
first_stage_positive_balance_fraction,
'first_stage_nms_score_threshold': first_stage_nms_score_threshold,
'first_stage_nms_iou_threshold': first_stage_nms_iou_threshold,
'first_stage_max_proposals': first_stage_max_proposals,
'first_stage_localization_loss_weight': first_stage_loc_loss_weight,
'first_stage_objectness_loss_weight': first_stage_obj_loss_weight,
'second_stage_batch_size': second_stage_batch_size,
'second_stage_balance_fraction': second_stage_balance_fraction,
'second_stage_non_max_suppression_fn':
second_stage_non_max_suppression_fn,
'second_stage_score_conversion_fn': second_stage_score_conversion_fn,
'second_stage_localization_loss_weight':
second_stage_localization_loss_weight,
'second_stage_classification_loss':
second_stage_classification_loss,
'second_stage_classification_loss_weight':
second_stage_classification_loss_weight,
'hard_example_miner': hard_example_miner,
'add_summaries': add_summaries}
if isinstance(second_stage_box_predictor, box_predictor.RfcnBoxPredictor):
return rfcn_meta_arch.RFCNMetaArch(
second_stage_rfcn_box_predictor=second_stage_box_predictor,
**common_kwargs)
else:
return faster_rcnn_meta_arch.FasterRCNNMetaArch(
initial_crop_size=initial_crop_size,
maxpool_kernel_size=maxpool_kernel_size,
maxpool_stride=maxpool_stride,
second_stage_mask_rcnn_box_predictor=second_stage_box_predictor,
second_stage_mask_prediction_loss_weight=(
second_stage_mask_prediction_loss_weight),
**common_kwargs)
......@@ -778,7 +778,7 @@ def to_absolute_coordinates(boxlist,
height,
width,
check_range=True,
maximum_normalized_coordinate=1.01,
maximum_normalized_coordinate=1.1,
scope=None):
"""Converts normalized box coordinates to absolute pixel coordinates.
......@@ -792,7 +792,7 @@ def to_absolute_coordinates(boxlist,
width: Maximum value for width of absolute box coordinates.
check_range: If True, checks if the coordinates are normalized or not.
maximum_normalized_coordinate: Maximum coordinate value to be considered
as normalized, default to 1.01.
as normalized, default to 1.1.
scope: name scope.
Returns:
......
......@@ -931,6 +931,21 @@ class CoordinatesConversionTest(tf.test.TestCase):
out = sess.run(boxlist.get())
self.assertAllClose(out, coordinates)
def test_to_absolute_coordinates_maximum_coordinate_check(self):
coordinates = tf.constant([[0, 0, 1.2, 1.2],
[0.25, 0.25, 0.75, 0.75]], tf.float32)
img = tf.ones((128, 100, 100, 3))
boxlist = box_list.BoxList(coordinates)
absolute_boxlist = box_list_ops.to_absolute_coordinates(
boxlist,
tf.shape(img)[1],
tf.shape(img)[2],
maximum_normalized_coordinate=1.1)
with self.test_session() as sess:
with self.assertRaisesOpError('assertion failed'):
sess.run(absolute_boxlist.get())
class BoxRefinementTest(tf.test.TestCase):
......
......@@ -79,10 +79,12 @@ class BoxPredictor(object):
Returns:
A dictionary containing at least the following tensors.
box_encodings: A list of float tensors of shape
[batch_size, num_anchors_i, q, code_size] representing the location of
the objects, where q is 1 or the number of classes. Each entry in the
list corresponds to a feature map in the input `image_features` list.
box_encodings: A list of float tensors. Each entry in the list
corresponds to a feature map in the input `image_features` list. All
tensors in the list have one of the two following shapes:
a. [batch_size, num_anchors_i, q, code_size] representing the location
of the objects, where q is 1 or the number of classes.
b. [batch_size, num_anchors_i, code_size].
class_predictions_with_background: A list of float tensors of shape
[batch_size, num_anchors_i, num_classes + 1] representing the class
predictions for the proposals. Each entry in the list corresponds to a
......@@ -120,10 +122,12 @@ class BoxPredictor(object):
Returns:
A dictionary containing at least the following tensors.
box_encodings: A list of float tensors of shape
[batch_size, num_anchors_i, q, code_size] representing the location of
the objects, where q is 1 or the number of classes. Each entry in the
list corresponds to a feature map in the input `image_features` list.
box_encodings: A list of float tensors. Each entry in the list
corresponds to a feature map in the input `image_features` list. All
tensors in the list have one of the two following shapes:
a. [batch_size, num_anchors_i, q, code_size] representing the location
of the objects, where q is 1 or the number of classes.
b. [batch_size, num_anchors_i, code_size].
class_predictions_with_background: A list of float tensors of shape
[batch_size, num_anchors_i, num_classes + 1] representing the class
predictions for the proposals. Each entry in the list corresponds to a
......@@ -765,6 +769,13 @@ class ConvolutionalBoxPredictor(BoxPredictor):
}
# TODO(rathodv): Replace with slim.arg_scope_func_key once its available
# externally.
def _arg_scope_func_key(op):
"""Returns a key that can be used to index arg_scope dictionary."""
return getattr(op, '_key_op', str(op))
# TODO(rathodv): Merge the implementation with ConvolutionalBoxPredictor above
# since they are very similar.
class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
......@@ -773,8 +784,12 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
Defines the box predictor as defined in
https://arxiv.org/abs/1708.02002. This class differs from
ConvolutionalBoxPredictor in that it shares weights and biases while
predicting from different feature maps. Separate multi-layer towers are
constructed for the box encoding and class predictors respectively.
predicting from different feature maps. However, batch_norm parameters are not
shared because the statistics of the activations vary among the different
feature maps.
Also note that separate multi-layer towers are constructed for the box
encoding and class predictors respectively.
"""
def __init__(self,
......@@ -833,14 +848,15 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
Returns:
box_encodings: A list of float tensors of shape
[batch_size, num_anchors_i, q, code_size] representing the location of
the objects, where q is 1 or the number of classes. Each entry in the
list corresponds to a feature map in the input `image_features` list.
[batch_size, num_anchors_i, code_size] representing the location of
the objects. Each entry in the list corresponds to a feature map in the
input `image_features` list.
class_predictions_with_background: A list of float tensors of shape
[batch_size, num_anchors_i, num_classes + 1] representing the class
predictions for the proposals. Each entry in the list corresponds to a
feature map in the input `image_features` list.
Raises:
ValueError: If the image feature maps do not have the same number of
channels or if the num predictions per locations is differs between the
......@@ -858,15 +874,18 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
'channels, found: {}'.format(feature_channels))
box_encodings_list = []
class_predictions_list = []
for (image_feature, num_predictions_per_location) in zip(
image_features, num_predictions_per_location_list):
for feature_index, (image_feature,
num_predictions_per_location) in enumerate(
zip(image_features,
num_predictions_per_location_list)):
# Add a slot for the background class.
with tf.variable_scope('WeightSharedConvolutionalBoxPredictor',
reuse=tf.AUTO_REUSE):
num_class_slots = self.num_classes + 1
box_encodings_net = image_feature
class_predictions_net = image_feature
with slim.arg_scope(self._conv_hyperparams_fn()):
with slim.arg_scope(self._conv_hyperparams_fn()) as sc:
apply_batch_norm = _arg_scope_func_key(slim.batch_norm) in sc
for i in range(self._num_layers_before_predictor):
box_encodings_net = slim.conv2d(
box_encodings_net,
......@@ -874,14 +893,22 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
[self._kernel_size, self._kernel_size],
stride=1,
padding='SAME',
scope='BoxEncodingPredictionTower/conv2d_{}'.format(i))
activation_fn=None,
normalizer_fn=(tf.identity if apply_batch_norm else None),
scope='BoxPredictionTower/conv2d_{}'.format(i))
if apply_batch_norm:
box_encodings_net = slim.batch_norm(
box_encodings_net,
scope='BoxPredictionTower/conv2d_{}/BatchNorm/feature_{}'.
format(i, feature_index))
box_encodings_net = tf.nn.relu6(box_encodings_net)
box_encodings = slim.conv2d(
box_encodings_net,
num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
activation_fn=None, stride=1, padding='SAME',
normalizer_fn=None,
scope='BoxEncodingPredictor')
scope='BoxPredictor')
for i in range(self._num_layers_before_predictor):
class_predictions_net = slim.conv2d(
......@@ -890,7 +917,15 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
[self._kernel_size, self._kernel_size],
stride=1,
padding='SAME',
activation_fn=None,
normalizer_fn=(tf.identity if apply_batch_norm else None),
scope='ClassPredictionTower/conv2d_{}'.format(i))
if apply_batch_norm:
class_predictions_net = slim.batch_norm(
class_predictions_net,
scope='ClassPredictionTower/conv2d_{}/BatchNorm/feature_{}'
.format(i, feature_index))
class_predictions_net = tf.nn.relu6(class_predictions_net)
if self._use_dropout:
class_predictions_net = slim.dropout(
class_predictions_net, keep_prob=self._dropout_keep_prob)
......@@ -912,7 +947,7 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
combined_feature_map_shape[1] *
combined_feature_map_shape[2] *
num_predictions_per_location,
1, self._box_code_size]))
self._box_code_size]))
box_encodings_list.append(box_encodings)
class_predictions_with_background = tf.reshape(
class_predictions_with_background,
......
......@@ -442,6 +442,24 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.build(conv_hyperparams, is_training=True)
def _build_conv_arg_scope_no_batch_norm(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
activation: RELU_6
regularizer {
l2_regularizer {
}
}
initializer {
random_normal_initializer {
stddev: 0.01
mean: 0.0
}
}
"""
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.build(conv_hyperparams, is_training=True)
def test_get_boxes_for_five_aspect_ratios_per_location(self):
def graph_fn(image_features):
......@@ -463,7 +481,7 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, objectness_predictions) = self.execute(
graph_fn, [image_features])
self.assertAllEqual(box_encodings.shape, [4, 320, 1, 4])
self.assertAllEqual(box_encodings.shape, [4, 320, 4])
self.assertAllEqual(objectness_predictions.shape, [4, 320, 1])
def test_bias_predictions_to_background_with_sigmoid_score_conversion(self):
......@@ -512,7 +530,7 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features])
self.assertAllEqual(box_encodings.shape, [4, 320, 1, 4])
self.assertAllEqual(box_encodings.shape, [4, 320, 4])
self.assertAllEqual(class_predictions_with_background.shape,
[4, 320, num_classes_without_background+1])
......@@ -543,11 +561,12 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
image_features2 = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features1, image_features2])
self.assertAllEqual(box_encodings.shape, [4, 640, 1, 4])
self.assertAllEqual(box_encodings.shape, [4, 640, 4])
self.assertAllEqual(class_predictions_with_background.shape,
[4, 640, num_classes_without_background+1])
def test_predictions_from_multiple_feature_maps_share_weights(self):
def test_predictions_from_multiple_feature_maps_share_weights_not_batchnorm(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = box_predictor.WeightSharedConvolutionalBoxPredictor(
......@@ -574,26 +593,95 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Box prediction tower
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/BatchNorm/feature_0/beta'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_0/weights'),
'BoxPredictionTower/conv2d_0/BatchNorm/feature_1/beta'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_0/BatchNorm/beta'),
'BoxPredictionTower/conv2d_1/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_1/weights'),
'BoxPredictionTower/conv2d_1/BatchNorm/feature_0/beta'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_1/BatchNorm/beta'),
'BoxPredictionTower/conv2d_1/BatchNorm/feature_1/beta'),
# Box prediction head
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxPredictor/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxPredictor/biases'),
# Class prediction tower
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/BatchNorm/beta'),
'ClassPredictionTower/conv2d_0/BatchNorm/feature_0/beta'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/BatchNorm/feature_1/beta'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/BatchNorm/beta'),
'ClassPredictionTower/conv2d_1/BatchNorm/feature_0/beta'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/BatchNorm/feature_1/beta'),
# Class prediction head
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictor/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictor/biases')])
self.assertEqual(expected_variable_set, actual_variable_set)
def test_no_batchnorm_params_when_batchnorm_is_not_configured(self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = box_predictor.WeightSharedConvolutionalBoxPredictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams_fn=self._build_conv_arg_scope_no_batch_norm(),
depth=32,
num_layers_before_predictor=2,
box_code_size=4)
box_predictions = conv_box_predictor.predict(
[image_features1, image_features2],
num_predictions_per_location=[5, 5],
scope='BoxPredictor')
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Box prediction tower
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictor/weights'),
'BoxPredictionTower/conv2d_1/biases'),
# Box prediction head
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictor/biases'),
'BoxPredictor/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxPredictor/biases'),
# Class prediction tower
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/biases'),
# Class prediction head
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictor/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
......@@ -628,7 +716,7 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
[tf.shape(box_encodings), tf.shape(objectness_predictions)],
feed_dict={image_features:
np.random.rand(4, resolution, resolution, 64)})
self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 1, 4])
self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 4])
self.assertAllEqual(objectness_predictions_shape,
[4, expected_num_anchors, 1])
......
......@@ -2128,7 +2128,8 @@ def resize_to_range(image,
max_dimension=None,
method=tf.image.ResizeMethod.BILINEAR,
align_corners=False,
pad_to_max_dimension=False):
pad_to_max_dimension=False,
per_channel_pad_value=(0, 0, 0)):
"""Resizes an image so its dimensions are within the provided value.
The output size can be described by two cases:
......@@ -2153,6 +2154,8 @@ def resize_to_range(image,
so the resulting image is of the spatial size
[max_dimension, max_dimension]. If masks are included they are padded
similarly.
per_channel_pad_value: A tuple of per-channel scalar values to use for
padding. By default pads with zeros.
Returns:
Note that the position of the resized_image_shape changes based on whether
......@@ -2181,8 +2184,20 @@ def resize_to_range(image,
image, new_size[:-1], method=method, align_corners=align_corners)
if pad_to_max_dimension:
new_image = tf.image.pad_to_bounding_box(
new_image, 0, 0, max_dimension, max_dimension)
channels = tf.unstack(new_image, axis=2)
if len(channels) != len(per_channel_pad_value):
raise ValueError('Number of channels must be equal to the length of '
'per-channel pad value.')
new_image = tf.stack(
[
tf.pad(
channels[i], [[0, max_dimension - new_size[0]],
[0, max_dimension - new_size[1]]],
constant_values=per_channel_pad_value[i])
for i in range(len(channels))
],
axis=2)
new_image.set_shape([max_dimension, max_dimension, 3])
result = [new_image]
if masks is not None:
......
......@@ -2316,6 +2316,46 @@ class PreprocessorTest(tf.test.TestCase):
np.random.randn(*in_shape)})
self.assertAllEqual(out_image_shape, expected_shape)
def testResizeToRangeWithPadToMaxDimensionReturnsCorrectShapes(self):
in_shape_list = [[60, 40, 3], [15, 30, 3], [15, 50, 3]]
min_dim = 50
max_dim = 100
expected_shape_list = [[100, 100, 3], [100, 100, 3], [100, 100, 3]]
for in_shape, expected_shape in zip(in_shape_list, expected_shape_list):
in_image = tf.placeholder(tf.float32, shape=(None, None, 3))
out_image, _ = preprocessor.resize_to_range(
in_image,
min_dimension=min_dim,
max_dimension=max_dim,
pad_to_max_dimension=True)
self.assertAllEqual(out_image.shape.as_list(), expected_shape)
out_image_shape = tf.shape(out_image)
with self.test_session() as sess:
out_image_shape = sess.run(
out_image_shape, feed_dict={in_image: np.random.randn(*in_shape)})
self.assertAllEqual(out_image_shape, expected_shape)
def testResizeToRangeWithPadToMaxDimensionReturnsCorrectTensor(self):
in_image_np = np.array([[[0, 1, 2]]], np.float32)
ex_image_np = np.array(
[[[0, 1, 2], [123.68, 116.779, 103.939]],
[[123.68, 116.779, 103.939], [123.68, 116.779, 103.939]]], np.float32)
min_dim = 1
max_dim = 2
in_image = tf.placeholder(tf.float32, shape=(None, None, 3))
out_image, _ = preprocessor.resize_to_range(
in_image,
min_dimension=min_dim,
max_dimension=max_dim,
pad_to_max_dimension=True,
per_channel_pad_value=(123.68, 116.779, 103.939))
with self.test_session() as sess:
out_image_np = sess.run(out_image, feed_dict={in_image: in_image_np})
self.assertAllClose(ex_image_np, out_image_np)
def testResizeToRangeWithMasksPreservesStaticSpatialShape(self):
"""Tests image resizing, checking output sizes."""
in_image_shape_list = [[60, 40, 3], [15, 30, 3]]
......
......@@ -34,6 +34,7 @@ class InputDataFields(object):
Attributes:
image: image.
image_additional_channels: additional channels.
original_image: image in the original input size.
key: unique key corresponding to image.
source_id: source of the original image.
......@@ -66,6 +67,7 @@ class InputDataFields(object):
multiclass_scores: the label score per class for each box.
"""
image = 'image'
image_additional_channels = 'image_additional_channels'
original_image = 'original_image'
key = 'key'
source_id = 'source_id'
......@@ -161,6 +163,8 @@ class TfExampleFields(object):
height: height of image in pixels, e.g. 462
width: width of image in pixels, e.g. 581
source_id: original source of the image
image_class_text: image-level label in text format
image_class_label: image-level label in numerical format
object_class_text: labels in text format, e.g. ["person", "cat"]
object_class_label: labels in numbers, e.g. [16, 8]
object_bbox_xmin: xmin coordinates of groundtruth box, e.g. 10, 30
......@@ -195,6 +199,8 @@ class TfExampleFields(object):
height = 'image/height'
width = 'image/width'
source_id = 'image/source_id'
image_class_text = 'image/class/text'
image_class_label = 'image/class/label'
object_class_text = 'image/object/class/text'
object_class_label = 'image/object/class/label'
object_bbox_ymin = 'image/object/bbox/ymin'
......
......@@ -112,7 +112,8 @@ class TfExampleDecoder(data_decoder.DataDecoder):
label_map_proto_file=None,
use_display_name=False,
dct_method='',
num_keypoints=0):
num_keypoints=0,
num_additional_channels=0):
"""Constructor sets keys_to_features and items_to_handlers.
Args:
......@@ -133,6 +134,7 @@ class TfExampleDecoder(data_decoder.DataDecoder):
are ['INTEGER_FAST', 'INTEGER_ACCURATE']. The hint may be ignored, for
example, the jpeg library does not have that specific option.
num_keypoints: the number of keypoints per object.
num_additional_channels: how many additional channels to use.
Raises:
ValueError: If `instance_mask_type` option is not one of
......@@ -178,15 +180,28 @@ class TfExampleDecoder(data_decoder.DataDecoder):
'image/object/weight':
tf.VarLenFeature(tf.float32),
}
# We are checking `dct_method` instead of passing it directly in order to
# ensure TF version 1.6 compatibility.
if dct_method:
image = slim_example_decoder.Image(
image_key='image/encoded',
format_key='image/format',
channels=3,
dct_method=dct_method)
additional_channel_image = slim_example_decoder.Image(
image_key='image/additional_channels/encoded',
format_key='image/format',
channels=1,
repeated=True,
dct_method=dct_method)
else:
image = slim_example_decoder.Image(
image_key='image/encoded', format_key='image/format', channels=3)
additional_channel_image = slim_example_decoder.Image(
image_key='image/additional_channels/encoded',
format_key='image/format',
channels=1,
repeated=True)
self.items_to_handlers = {
fields.InputDataFields.image:
image,
......@@ -211,6 +226,13 @@ class TfExampleDecoder(data_decoder.DataDecoder):
fields.InputDataFields.groundtruth_weights: (
slim_example_decoder.Tensor('image/object/weight')),
}
if num_additional_channels > 0:
self.keys_to_features[
'image/additional_channels/encoded'] = tf.FixedLenFeature(
(num_additional_channels,), tf.string)
self.items_to_handlers[
fields.InputDataFields.
image_additional_channels] = additional_channel_image
self._num_keypoints = num_keypoints
if num_keypoints > 0:
self.keys_to_features['image/object/keypoint/x'] = (
......@@ -294,6 +316,9 @@ class TfExampleDecoder(data_decoder.DataDecoder):
[None] indicating if the boxes enclose a crowd.
Optional:
fields.InputDataFields.image_additional_channels - 3D uint8 tensor of
shape [None, None, num_additional_channels]. 1st dim is height; 2nd dim
is width; 3rd dim is the number of additional channels.
fields.InputDataFields.groundtruth_difficult - 1D bool tensor of shape
[None] indicating if the boxes represent `difficult` instances.
fields.InputDataFields.groundtruth_group_of - 1D bool tensor of shape
......@@ -316,6 +341,12 @@ class TfExampleDecoder(data_decoder.DataDecoder):
tensor_dict[fields.InputDataFields.num_groundtruth_boxes] = tf.shape(
tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]
if fields.InputDataFields.image_additional_channels in tensor_dict:
channels = tensor_dict[fields.InputDataFields.image_additional_channels]
channels = tf.squeeze(channels, axis=3)
channels = tf.transpose(channels, perm=[1, 2, 0])
tensor_dict[fields.InputDataFields.image_additional_channels] = channels
def default_groundtruth_weights():
return tf.ones(
[tf.shape(tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]],
......
......@@ -23,6 +23,7 @@ from tensorflow.core.example import example_pb2
from tensorflow.core.example import feature_pb2
from tensorflow.python.framework import constant_op
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import test_util
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import lookup_ops
from tensorflow.python.ops import parsing_ops
......@@ -72,10 +73,41 @@ class TfExampleDecoderTest(tf.test.TestCase):
def _BytesFeatureFromList(self, ndarray):
values = ndarray.flatten().tolist()
for i in range(len(values)):
values[i] = values[i].encode('utf-8')
return feature_pb2.Feature(bytes_list=feature_pb2.BytesList(value=values))
def testDecodeAdditionalChannels(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
additional_channel_tensor = np.random.randint(
256, size=(4, 5, 1)).astype(np.uint8)
encoded_additional_channel = self._EncodeImage(additional_channel_tensor)
decoded_additional_channel = self._DecodeImage(encoded_additional_channel)
example = tf.train.Example(
features=tf.train.Features(
feature={
'image/encoded':
self._BytesFeature(encoded_jpeg),
'image/additional_channels/encoded':
self._BytesFeatureFromList(
np.array([encoded_additional_channel] * 2)),
'image/format':
self._BytesFeature('jpeg'),
'image/source_id':
self._BytesFeature('image_id'),
})).SerializeToString()
example_decoder = tf_example_decoder.TfExampleDecoder(
num_additional_channels=2)
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
self.assertAllEqual(
np.concatenate([decoded_additional_channel] * 2, axis=2),
tensor_dict[fields.InputDataFields.image_additional_channels])
def testDecodeExampleWithBranchedBackupHandler(self):
example1 = example_pb2.Example(
features=feature_pb2.Features(
......@@ -304,6 +336,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllEqual(
2, tensor_dict[fields.InputDataFields.num_groundtruth_boxes])
@test_util.enable_c_shapes
def testDecodeKeypoint(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......@@ -331,7 +364,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
get_shape().as_list()), [None, 4])
self.assertAllEqual((tensor_dict[fields.InputDataFields.
groundtruth_keypoints].
get_shape().as_list()), [None, 3, 2])
get_shape().as_list()), [2, 3, 2])
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
......@@ -376,6 +409,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllClose(tensor_dict[fields.InputDataFields.groundtruth_weights],
np.ones(2, dtype=np.float32))
@test_util.enable_c_shapes
def testDecodeObjectLabel(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......@@ -391,7 +425,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllEqual((tensor_dict[
fields.InputDataFields.groundtruth_classes].get_shape().as_list()),
[None])
[2])
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
......@@ -522,6 +556,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllEqual([3, 1],
tensor_dict[fields.InputDataFields.groundtruth_classes])
@test_util.enable_c_shapes
def testDecodeObjectArea(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......@@ -536,13 +571,14 @@ class TfExampleDecoderTest(tf.test.TestCase):
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
self.assertAllEqual((tensor_dict[fields.InputDataFields.groundtruth_area].
get_shape().as_list()), [None])
get_shape().as_list()), [2])
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
self.assertAllEqual(object_area,
tensor_dict[fields.InputDataFields.groundtruth_area])
@test_util.enable_c_shapes
def testDecodeObjectIsCrowd(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......@@ -558,7 +594,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllEqual((tensor_dict[
fields.InputDataFields.groundtruth_is_crowd].get_shape().as_list()),
[None])
[2])
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
......@@ -566,6 +602,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
tensor_dict[
fields.InputDataFields.groundtruth_is_crowd])
@test_util.enable_c_shapes
def testDecodeObjectDifficult(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......@@ -581,7 +618,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllEqual((tensor_dict[
fields.InputDataFields.groundtruth_difficult].get_shape().as_list()),
[None])
[2])
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
......@@ -589,6 +626,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
tensor_dict[
fields.InputDataFields.groundtruth_difficult])
@test_util.enable_c_shapes
def testDecodeObjectGroupOf(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......@@ -605,7 +643,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllEqual((tensor_dict[
fields.InputDataFields.groundtruth_group_of].get_shape().as_list()),
[None])
[2])
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
......@@ -637,6 +675,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
object_weights,
tensor_dict[fields.InputDataFields.groundtruth_weights])
@test_util.enable_c_shapes
def testDecodeInstanceSegmentation(self):
num_instances = 4
image_height = 5
......@@ -673,11 +712,11 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllEqual((
tensor_dict[fields.InputDataFields.groundtruth_instance_masks].
get_shape().as_list()), [None, None, None])
get_shape().as_list()), [4, 5, 3])
self.assertAllEqual((
tensor_dict[fields.InputDataFields.groundtruth_classes].
get_shape().as_list()), [None])
get_shape().as_list()), [4])
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
......
......@@ -16,7 +16,8 @@ r"""Creates TFRecords of Open Images dataset for object detection.
Example usage:
python object_detection/dataset_tools/create_oid_tf_record.py \
--input_annotations_csv=/path/to/input/annotations-human-bbox.csv \
--input_box_annotations_csv=/path/to/input/annotations-human-bbox.csv \
--input_image_label_annotations_csv=/path/to/input/annotations-label.csv \
--input_images_directory=/path/to/input/image_pixels_directory \
--input_label_map=/path/to/input/labels_bbox_545.labelmap \
--output_tf_record_path_prefix=/path/to/output/prefix.tfrecord
......@@ -27,7 +28,9 @@ https://github.com/openimages/dataset
This script will include every image found in the input_images_directory in the
output TFRecord, even if the image has no corresponding bounding box annotations
in the input_annotations_csv.
in the input_annotations_csv. If input_image_label_annotations_csv is specified,
it will add image-level labels as well. Note that the information of whether a
label is positively or negatively verified is NOT added to the tfrecord.
"""
from __future__ import absolute_import
from __future__ import division
......@@ -40,13 +43,16 @@ import pandas as pd
import tensorflow as tf
from object_detection.dataset_tools import oid_tfrecord_creation
from object_detection.dataset_tools import tf_record_creation_util
from object_detection.utils import label_map_util
tf.flags.DEFINE_string('input_annotations_csv', None,
tf.flags.DEFINE_string('input_box_annotations_csv', None,
'Path to CSV containing image bounding box annotations')
tf.flags.DEFINE_string('input_images_directory', None,
'Directory containing the image pixels '
'downloaded from the OpenImages GitHub repository.')
tf.flags.DEFINE_string('input_image_label_annotations_csv', None,
'Path to CSV containing image-level labels annotations')
tf.flags.DEFINE_string('input_label_map', None, 'Path to the label map proto')
tf.flags.DEFINE_string(
'output_tf_record_path_prefix', None,
......@@ -61,7 +67,7 @@ def main(_):
tf.logging.set_verbosity(tf.logging.INFO)
required_flags = [
'input_annotations_csv', 'input_images_directory', 'input_label_map',
'input_box_annotations_csv', 'input_images_directory', 'input_label_map',
'output_tf_record_path_prefix'
]
for flag_name in required_flags:
......@@ -69,17 +75,24 @@ def main(_):
raise ValueError('Flag --{} is required'.format(flag_name))
label_map = label_map_util.get_label_map_dict(FLAGS.input_label_map)
all_annotations = pd.read_csv(FLAGS.input_annotations_csv)
all_box_annotations = pd.read_csv(FLAGS.input_box_annotations_csv)
if FLAGS.input_image_label_annotations_csv:
all_label_annotations = pd.read_csv(FLAGS.input_image_label_annotations_csv)
all_label_annotations.rename(
columns={'Confidence': 'ConfidenceImageLabel'}, inplace=True)
else:
all_label_annotations = None
all_images = tf.gfile.Glob(
os.path.join(FLAGS.input_images_directory, '*.jpg'))
all_image_ids = [os.path.splitext(os.path.basename(v))[0] for v in all_images]
all_image_ids = pd.DataFrame({'ImageID': all_image_ids})
all_annotations = pd.concat([all_annotations, all_image_ids])
all_annotations = pd.concat(
[all_box_annotations, all_image_ids, all_label_annotations])
tf.logging.log(tf.logging.INFO, 'Found %d images...', len(all_image_ids))
with contextlib2.ExitStack() as tf_record_close_stack:
output_tfrecords = oid_tfrecord_creation.open_sharded_output_tfrecords(
output_tfrecords = tf_record_creation_util.open_sharded_output_tfrecords(
tf_record_close_stack, FLAGS.output_tf_record_path_prefix,
FLAGS.num_shards)
......