Commit 001a2a61 authored by pkulzc, committed by Sergio Guadarrama

Internal changes for object detection. (#3656)

* Force cast of num_classes to integer

PiperOrigin-RevId: 188335318

* Updating config util to allow overwriting of cosine decay learning rates.

PiperOrigin-RevId: 188338852

* Make box_list_ops.py and box_list_ops_test.py work with C API enabled.

The C API has improved shape inference over the original Python
code. This causes some previously-working conds to fail. Switching to smart_cond fixes this.

Another effect of the improved shape inference is that one of the tested
failure cases is caught earlier, so I modified the test to reflect this.

PiperOrigin-RevId: 188409792
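
For context: `tf.cond` always builds both branches of a conditional, so a branch that fails the stricter shape inference now breaks graph construction even when it would never be taken, while `smart_cond` picks a branch at graph-construction time if the predicate is statically known. A minimal sketch, assuming TF 1.x (`clip_or_pad` is a hypothetical helper, not code from this change):

```python
import tensorflow as tf

def clip_or_pad(boxes, should_clip):
  # With tf.cond, both lambdas are traced even when `should_clip` is a
  # plain Python bool, so shape errors in the untaken branch still fire.
  # smart_cond short-circuits static predicates and builds only the
  # branch that is actually taken.
  return tf.contrib.framework.smart_cond(
      should_clip,
      lambda: tf.clip_by_value(boxes, 0.0, 1.0),
      lambda: tf.pad(boxes, [[0, 1], [0, 0]]))
```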

* Fix parallel event file writing issue.

Without this change, the event files might get corrupted when multiple evaluations are run in parallel.

PiperOrigin-RevId: 188502560

* Deprecating the boolean flag from_detection_checkpoint.

Replaced it with a string field, fine_tune_checkpoint_type, in train_config to provide extensibility. fine_tune_checkpoint_type can currently take the value `detection`, `classification`, or other values when restore_map is overridden.

PiperOrigin-RevId: 188518685
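
A sketch of the new config surface, built the same way the tests in this repo build protos (the checkpoint path is a placeholder):

```python
from google.protobuf import text_format
from object_detection.protos import train_pb2

train_config_text = """
  fine_tune_checkpoint: '/path/to/model.ckpt'
  # Replaces the deprecated boolean `from_detection_checkpoint: true`.
  fine_tune_checkpoint_type: 'detection'
"""
train_config = train_pb2.TrainConfig()
text_format.Merge(train_config_text, train_config)
```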

* Automated g4 rollback of changelist 188502560

PiperOrigin-RevId: 188519969

* Introducing eval metrics specs for Coco Mask metrics. This allows metrics to be computed in tensorflow using the tf.learn Estimator.

PiperOrigin-RevId: 188528485
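
A hedged sketch of how these specs plug into an Estimator model_fn, assuming `eval_dict` was produced by `eval_util.result_dict_for_single_example()` (`make_eval_spec` is a hypothetical helper):

```python
import tensorflow as tf
from object_detection import eval_util

def make_eval_spec(total_loss, eval_dict, categories):
  # Each entry is a (value_op, update_op) pair, which is exactly the
  # eval_metric_ops contract of tf.estimator.EstimatorSpec.
  eval_metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
      ['coco_detection_metrics', 'coco_mask_metrics'], categories, eval_dict)
  return tf.estimator.EstimatorSpec(
      mode=tf.estimator.ModeKeys.EVAL,
      loss=total_loss,
      eval_metric_ops=eval_metric_ops)
```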

* Minor fix to make object_detection/metrics/coco_evaluation.py python3 compatible.

PiperOrigin-RevId: 188550683

* Updating eval_util to handle eval_metric_ops from multiple `DetectionEvaluator`s.

PiperOrigin-RevId: 188560474

* Allow tensor inputs for new_height and new_width in resize_image.

PiperOrigin-RevId: 188561908
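
A minimal sketch of what this enables under TF 1.x: the target size may now be computed in-graph rather than passed as static Python ints.

```python
import tensorflow as tf
from object_detection.core import preprocessor

image = tf.random_uniform([60, 40, 3])
new_height = tf.constant(50, dtype=tf.int32)
new_width = 2 * new_height  # derived in-graph, not a Python int
resized_image = preprocessor.resize_image(
    image, new_height=new_height, new_width=new_width)[0]
```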

* Fix typo in fine_tune_checkpoint_type name in trainer.

PiperOrigin-RevId: 188799033

* Adding mobilenet feature extractor to object detection.

PiperOrigin-RevId: 188916897

* Allow label maps to optionally contain an explicit background class with id zero.

PiperOrigin-RevId: 188951089
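
For illustration, a label map that opts into the explicit background entry (the class names here are placeholders):

```python
from google.protobuf import text_format
from object_detection.protos import string_int_label_map_pb2

label_map_text = """
  item { id: 0 name: 'background' }  # id 0 may now be an explicit background
  item { id: 1 name: 'dog' }
  item { id: 2 name: 'cat' }
"""
label_map = string_int_label_map_pb2.StringIntLabelMap()
text_format.Merge(label_map_text, label_map)
```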

* Fix boundary conditions in random_pad_to_aspect_ratio to ensure that min_scale is always less than max_scale.

PiperOrigin-RevId: 189026868

* Fall back on the from_detection_checkpoint option if fine_tune_checkpoint_type isn't set.

PiperOrigin-RevId: 189052833

* Add proper names for learning rate schedules so we don't see cryptic names on tensorboard.

PiperOrigin-RevId: 189069837

* Enforcing that all datasets are batched (and then unbatched in the model) with batch_size >= 1.

PiperOrigin-RevId: 189117178

* Adding regularization to total loss returned from DetectionModel.loss().

PiperOrigin-RevId: 189189123
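
In effect (a sketch of the intended semantics, not the literal implementation):

```python
import tensorflow as tf

def total_loss_with_regularization(losses_dict):
  """Sums the detection losses and folds in collected regularization."""
  total_loss = tf.add_n(list(losses_dict.values()))
  regularization_losses = tf.get_collection(
      tf.GraphKeys.REGULARIZATION_LOSSES)
  if regularization_losses:
    total_loss += tf.add_n(regularization_losses, name='regularization_loss')
  return total_loss
```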

* Standardize the names of loss scalars (for SSD, Faster R-CNN and R-FCN) in both training and eval so they can be compared on tensorboard.

Log localization and classification losses in evaluation.

PiperOrigin-RevId: 189189940

* Remove negative test from box list ops test.

PiperOrigin-RevId: 189229327

* Add an option to warmup learning rate in manual stepping schedule.

PiperOrigin-RevId: 189361039
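
A numpy sketch of the assumed schedule shape (the real implementation lives in `object_detection.utils.learning_schedules.manual_stepping`; see the `warmup: true` test change below):

```python
import numpy as np

def manual_stepping_with_warmup(step, boundaries, rates, warmup=True):
  # With warmup, ramp linearly from rates[0] at step 0 up to rates[1]
  # at boundaries[0]; afterwards follow the normal step schedule.
  if warmup and step < boundaries[0]:
    return rates[0] + (rates[1] - rates[0]) * step / float(boundaries[0])
  return rates[np.searchsorted(boundaries, step, side='right')]
```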

* Replace tf.contrib.slim.tfexample_decoder.LookupTensor with object_detection.data_decoders.tf_example_decoder.LookupTensor.

PiperOrigin-RevId: 189388556

* Force regularization summary variables under specific family names.

PiperOrigin-RevId: 189393190

* Automated g4 rollback of changelist 188619139

PiperOrigin-RevId: 189396001

* Remove step 0 schedule since we do a hard check for it after cl/189361039

PiperOrigin-RevId: 189396697

* PiperOrigin-RevId: 189040463

* PiperOrigin-RevId: 189059229

* PiperOrigin-RevId: 189214402

* Make slim python3 compatible.

* Minor fixes.

* Add TargetAssignment summaries in a separate family.

PiperOrigin-RevId: 189407487

* 1. Setting the `family` keyword arg prepends the same family name to the summary name twice. Adding the family prefix directly to the name gets rid of this problem.
2. Make sure the eval losses have the same names as the training losses.

PiperOrigin-RevId: 189434618
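
A hypothetical sketch of the workaround:

```python
import tensorflow as tf

loss = tf.constant(1.0)
# With the family kwarg, tags can come out double-prefixed, e.g.
# 'Losses/Losses/localization_loss'.
tf.summary.scalar('localization_loss', loss, family='Losses')
# Folding the family into the name keeps the tag as written.
tf.summary.scalar('Losses/localization_loss', loss)
```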

* Minor fixes to make object detection tf 1.4 compatible.

PiperOrigin-RevId: 189437519

* Call the base of the mobilenet_v1 feature extractor under the right arg scope, and set batch-norm is_training based on the value passed to the constructor.

PiperOrigin-RevId: 189460890

* Automated g4 rollback of changelist 188409792

PiperOrigin-RevId: 189463882

* Update object detection syncing.

PiperOrigin-RevId: 189601955

* Add an option to warmup learning rate, hold it constant for a certain number of steps and cosine decay it.

PiperOrigin-RevId: 189606169
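
A numpy sketch of the assumed schedule shape (the real implementation is `learning_schedules.cosine_decay_with_warmup`):

```python
import numpy as np

def cosine_decay_with_warmup(step, learning_rate_base, total_steps,
                             warmup_learning_rate, warmup_steps,
                             hold_base_rate_steps):
  if step < warmup_steps:
    # Linear warmup from warmup_learning_rate up to the base rate.
    slope = (learning_rate_base - warmup_learning_rate) / float(warmup_steps)
    return warmup_learning_rate + slope * step
  if step < warmup_steps + hold_base_rate_steps:
    # Hold the base rate constant for hold_base_rate_steps.
    return learning_rate_base
  # Cosine-decay from the base rate to zero over the remaining steps.
  progress = ((step - warmup_steps - hold_base_rate_steps) /
              float(total_steps - warmup_steps - hold_base_rate_steps))
  return 0.5 * learning_rate_base * (1.0 + np.cos(np.pi * progress))
```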

* Let the proposal feature extractor function in faster_rcnn meta architectures return the activations (end_points).

PiperOrigin-RevId: 189619301

* Fixed a bug which caused masks to be mostly zeros (caused by detection_boxes being in absolute coordinates when scale_to_absolute=True).

PiperOrigin-RevId: 189641294

* Open-sourcing MobileNetV2 + SSDLite.

PiperOrigin-RevId: 189654520

* Remove unused files.
parent 2913cb24
@@ -30,8 +30,8 @@ from object_detection.protos import input_reader_pb2
from object_detection.utils import dataset_util
def _get_padding_shapes(dataset, max_num_boxes, num_classes,
spatial_image_shape):
def _get_padding_shapes(dataset, max_num_boxes=None, num_classes=None,
spatial_image_shape=None):
"""Returns shapes to pad dataset tensors to before batching.
Args:
@@ -41,13 +41,21 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
num_classes: Number of classes in the dataset needed to compute shapes for
padding.
spatial_image_shape: A list of two integers of the form [height, width]
containing expected spatial shape of the imaage.
containing expected spatial shape of the image.
Returns:
A dictionary keyed by fields.InputDataFields containing padding shapes for
tensors in the dataset.
Raises:
ValueError: If groundtruth classes is neither rank 1 nor rank 2.
"""
height, width = spatial_image_shape
if not spatial_image_shape or spatial_image_shape == [-1, -1]:
height, width = None, None
else:
height, width = spatial_image_shape # pylint: disable=unpacking-non-sequence
padding_shapes = {
fields.InputDataFields.image: [height, width, 3],
fields.InputDataFields.source_id: [],
@@ -55,9 +63,6 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
fields.InputDataFields.key: [],
fields.InputDataFields.groundtruth_difficult: [max_num_boxes],
fields.InputDataFields.groundtruth_boxes: [max_num_boxes, 4],
fields.InputDataFields.groundtruth_classes: [
max_num_boxes, num_classes
],
fields.InputDataFields.groundtruth_instance_masks: [max_num_boxes, height,
width],
fields.InputDataFields.groundtruth_is_crowd: [max_num_boxes],
@@ -69,6 +74,21 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
fields.InputDataFields.groundtruth_label_scores: [max_num_boxes],
fields.InputDataFields.true_image_shape: [3]
}
# Determine whether groundtruth_classes are integers or one-hot encodings, and
# apply batching appropriately.
classes_shape = dataset.output_shapes[
fields.InputDataFields.groundtruth_classes]
if len(classes_shape) == 1: # Class integers.
padding_shapes[fields.InputDataFields.groundtruth_classes] = [max_num_boxes]
elif len(classes_shape) == 2: # One-hot or k-hot encoding.
padding_shapes[fields.InputDataFields.groundtruth_classes] = [
max_num_boxes, num_classes]
else:
raise ValueError('Groundtruth classes must be a rank 1 tensor (classes) or '
'rank 2 tensor (one-hot encodings)')
if fields.InputDataFields.original_image in dataset.output_shapes:
padding_shapes[fields.InputDataFields.original_image] = [None, None, 3]
if fields.InputDataFields.groundtruth_keypoints in dataset.output_shapes:
tensor_shape = dataset.output_shapes[fields.InputDataFields.
groundtruth_keypoints]
@@ -87,28 +107,25 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
def build(input_reader_config, transform_input_data_fn=None,
batch_size=1, max_num_boxes=None, num_classes=None,
batch_size=None, max_num_boxes=None, num_classes=None,
spatial_image_shape=None):
"""Builds a tf.data.Dataset.
Builds a tf.data.Dataset by applying the `transform_input_data_fn` on all
records. Optionally, if `batch_size` > 1 and `max_num_boxes`, `num_classes`
and `spatial_image_shape` are not None, returns a padded batched
tf.data.Dataset.
records. Applies a padded batch to the resulting dataset.
Args:
input_reader_config: A input_reader_pb2.InputReader object.
transform_input_data_fn: Function to apply to all records, or None if
no extra decoding is required.
batch_size: Batch size. If not None, returns a padded batch dataset.
max_num_boxes: Max number of groundtruth boxes needed to computes shapes for
padding. This is only used if batch_size is greater than 1.
batch_size: Batch size. If None, batching is not performed.
max_num_boxes: Max number of groundtruth boxes needed to compute shapes for
padding. If None, will use a dynamic shape.
num_classes: Number of classes in the dataset needed to compute shapes for
padding. This is only used if batch_size is greater than 1.
spatial_image_shape: a list of two integers of the form [height, width]
padding. If None, will use a dynamic shape.
spatial_image_shape: A list of two integers of the form [height, width]
containing expected spatial shape of the image after applying
transform_input_data_fn. This is needed to compute shapes for padding and
only used if batch_size is greater than 1.
transform_input_data_fn. If None, will use dynamic shapes.
Returns:
A tf.data.Dataset based on the input_reader_config.
@@ -116,8 +133,6 @@ def build(input_reader_config, transform_input_data_fn=None,
Raises:
ValueError: On invalid input reader proto.
ValueError: If no input paths are specified.
ValueError: If batch_size > 1 and any of (max_num_boxes, num_classes,
spatial_image_shape) is None.
"""
if not isinstance(input_reader_config, input_reader_pb2.InputReader):
raise ValueError('input_reader_config not of type '
@@ -147,14 +162,7 @@
functools.partial(tf.data.TFRecordDataset, buffer_size=8 * 1000 * 1000),
process_fn, config.input_path[:], input_reader_config)
if batch_size > 1:
if num_classes is None:
raise ValueError('`num_classes` must be set when batch_size > 1.')
if max_num_boxes is None:
raise ValueError('`max_num_boxes` must be set when batch_size > 1.')
if spatial_image_shape is None:
raise ValueError('`spatial_image_shape` must be set when batch_size > '
'1 .')
if batch_size:
padding_shapes = _get_padding_shapes(dataset, max_num_boxes, num_classes,
spatial_image_shape)
dataset = dataset.apply(
@@ -91,7 +91,7 @@ class DatasetBuilderTest(tf.test.TestCase):
input_reader_proto = input_reader_pb2.InputReader()
text_format.Merge(input_reader_text_proto, input_reader_proto)
tensor_dict = dataset_util.make_initializable_iterator(
dataset_builder.build(input_reader_proto)).get_next()
dataset_builder.build(input_reader_proto, batch_size=1)).get_next()
sv = tf.train.Supervisor(logdir=self.get_temp_dir())
with sv.prepare_or_wait_for_session() as sess:
@@ -100,15 +100,15 @@ class DatasetBuilderTest(tf.test.TestCase):
self.assertTrue(
fields.InputDataFields.groundtruth_instance_masks not in output_dict)
self.assertEquals((4, 5, 3),
self.assertEquals((1, 4, 5, 3),
output_dict[fields.InputDataFields.image].shape)
self.assertEquals([2],
self.assertAllEqual([[2]],
output_dict[fields.InputDataFields.groundtruth_classes])
self.assertEquals(
(1, 4), output_dict[fields.InputDataFields.groundtruth_boxes].shape)
(1, 1, 4), output_dict[fields.InputDataFields.groundtruth_boxes].shape)
self.assertAllEqual(
[0.0, 0.0, 1.0, 1.0],
output_dict[fields.InputDataFields.groundtruth_boxes][0])
output_dict[fields.InputDataFields.groundtruth_boxes][0][0])
def test_build_tf_record_input_reader_and_load_instance_masks(self):
tf_record_path = self.create_tf_record()
@@ -124,14 +124,14 @@ class DatasetBuilderTest(tf.test.TestCase):
input_reader_proto = input_reader_pb2.InputReader()
text_format.Merge(input_reader_text_proto, input_reader_proto)
tensor_dict = dataset_util.make_initializable_iterator(
dataset_builder.build(input_reader_proto)).get_next()
dataset_builder.build(input_reader_proto, batch_size=1)).get_next()
sv = tf.train.Supervisor(logdir=self.get_temp_dir())
with sv.prepare_or_wait_for_session() as sess:
sv.start_queue_runners(sess)
output_dict = sess.run(tensor_dict)
self.assertAllEqual(
(1, 4, 5),
(1, 1, 4, 5),
output_dict[fields.InputDataFields.groundtruth_instance_masks].shape)
def test_build_tf_record_input_reader_with_batch_size_two(self):
@@ -36,6 +36,7 @@ from object_detection.models.embedded_ssd_mobilenet_v1_feature_extractor import
from object_detection.models.ssd_inception_v2_feature_extractor import SSDInceptionV2FeatureExtractor
from object_detection.models.ssd_inception_v3_feature_extractor import SSDInceptionV3FeatureExtractor
from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobileNetV1FeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.protos import model_pb2
# A map of names to SSD feature extractors.
@@ -43,6 +44,7 @@ SSD_FEATURE_EXTRACTOR_CLASS_MAP = {
'ssd_inception_v2': SSDInceptionV2FeatureExtractor,
'ssd_inception_v3': SSDInceptionV3FeatureExtractor,
'ssd_mobilenet_v1': SSDMobileNetV1FeatureExtractor,
'ssd_mobilenet_v2': SSDMobileNetV2FeatureExtractor,
'ssd_resnet50_v1_fpn': ssd_resnet_v1_fpn.SSDResnet50V1FpnFeatureExtractor,
'ssd_resnet101_v1_fpn': ssd_resnet_v1_fpn.SSDResnet101V1FpnFeatureExtractor,
'ssd_resnet152_v1_fpn': ssd_resnet_v1_fpn.SSDResnet152V1FpnFeatureExtractor,
@@ -31,6 +31,7 @@ from object_detection.models.embedded_ssd_mobilenet_v1_feature_extractor import
from object_detection.models.ssd_inception_v2_feature_extractor import SSDInceptionV2FeatureExtractor
from object_detection.models.ssd_inception_v3_feature_extractor import SSDInceptionV3FeatureExtractor
from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobileNetV1FeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.protos import model_pb2
FRCNN_RESNET_FEAT_MAPS = {
@@ -368,6 +369,81 @@ class ModelBuilderTest(tf.test.TestCase):
self.assertTrue(model._feature_extractor._batch_norm_trainable)
self.assertTrue(model._normalize_loc_loss_by_codesize)
def test_create_ssd_mobilenet_v2_model_from_config(self):
model_text_proto = """
ssd {
feature_extractor {
type: 'ssd_mobilenet_v2'
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
batch_norm_trainable: true
}
box_coder {
faster_rcnn_box_coder {
}
}
matcher {
argmax_matcher {
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
aspect_ratios: 1.0
}
}
image_resizer {
fixed_shape_resizer {
height: 320
width: 320
}
}
box_predictor {
convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}
}
normalize_loc_loss_by_codesize: true
loss {
classification_loss {
weighted_softmax {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
}
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
model = self.create_model(model_proto)
self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch)
self.assertIsInstance(model._feature_extractor,
SSDMobileNetV2FeatureExtractor)
self.assertTrue(model._feature_extractor._batch_norm_trainable)
self.assertTrue(model._normalize_loc_loss_by_codesize)
def test_create_embedded_ssd_mobilenet_v1_model_from_config(self):
model_text_proto = """
ssd {
@@ -85,7 +85,8 @@ def _create_learning_rate(learning_rate_config):
learning_rate_type = learning_rate_config.WhichOneof('learning_rate')
if learning_rate_type == 'constant_learning_rate':
config = learning_rate_config.constant_learning_rate
learning_rate = tf.constant(config.learning_rate, dtype=tf.float32)
learning_rate = tf.constant(config.learning_rate, dtype=tf.float32,
name='learning_rate')
if learning_rate_type == 'exponential_decay_learning_rate':
config = learning_rate_config.exponential_decay_learning_rate
@@ -94,7 +95,7 @@ def _create_learning_rate(learning_rate_config):
tf.train.get_or_create_global_step(),
config.decay_steps,
config.decay_factor,
staircase=config.staircase)
staircase=config.staircase, name='learning_rate')
if learning_rate_type == 'manual_step_learning_rate':
config = learning_rate_config.manual_step_learning_rate
@@ -105,7 +106,7 @@ def _create_learning_rate(learning_rate_config):
learning_rate_sequence += [x.learning_rate for x in config.schedule]
learning_rate = learning_schedules.manual_stepping(
tf.train.get_or_create_global_step(), learning_rate_step_boundaries,
learning_rate_sequence)
learning_rate_sequence, config.warmup)
if learning_rate_type == 'cosine_decay_learning_rate':
config = learning_rate_config.cosine_decay_learning_rate
@@ -114,7 +115,8 @@ def _create_learning_rate(learning_rate_config):
config.learning_rate_base,
config.total_steps,
config.warmup_learning_rate,
config.warmup_steps)
config.warmup_steps,
config.hold_base_rate_steps)
if learning_rate is None:
raise ValueError('Learning_rate %s not supported.' % learning_rate_type)
@@ -35,6 +35,7 @@ class LearningRateBuilderTest(tf.test.TestCase):
text_format.Merge(learning_rate_text_proto, learning_rate_proto)
learning_rate = optimizer_builder._create_learning_rate(
learning_rate_proto)
self.assertTrue(learning_rate.op.name.endswith('learning_rate'))
with self.test_session():
learning_rate_out = learning_rate.eval()
self.assertAlmostEqual(learning_rate_out, 0.004)
@@ -52,19 +53,22 @@ class LearningRateBuilderTest(tf.test.TestCase):
text_format.Merge(learning_rate_text_proto, learning_rate_proto)
learning_rate = optimizer_builder._create_learning_rate(
learning_rate_proto)
self.assertTrue(learning_rate.op.name.endswith('learning_rate'))
self.assertTrue(isinstance(learning_rate, tf.Tensor))
def testBuildManualStepLearningRate(self):
learning_rate_text_proto = """
manual_step_learning_rate {
initial_learning_rate: 0.002
schedule {
step: 0
step: 100
learning_rate: 0.006
}
schedule {
step: 90000
learning_rate: 0.00006
}
warmup: true
}
"""
learning_rate_proto = optimizer_pb2.LearningRate()
@@ -80,6 +84,7 @@ class LearningRateBuilderTest(tf.test.TestCase):
total_steps: 20000
warmup_learning_rate: 0.0001
warmup_steps: 1000
hold_base_rate_steps: 20000
}
"""
learning_rate_proto = optimizer_pb2.LearningRate()
@@ -727,21 +727,6 @@ class ConcatenateTest(tf.test.TestCase):
class NonMaxSuppressionTest(tf.test.TestCase):
def test_with_invalid_scores_field(self):
corners = tf.constant([[0, 0, 1, 1],
[0, 0.1, 1, 1.1],
[0, -0.1, 1, 0.9],
[0, 10, 1, 11],
[0, 10.1, 1, 11.1],
[0, 100, 1, 101]], tf.float32)
boxes = box_list.BoxList(corners)
boxes.add_field('scores', tf.constant([.9, .75, .6, .95, .5]))
iou_thresh = .5
max_output_size = 3
with self.assertRaisesWithPredicateMatch(ValueError,
'Dimensions must be equal'):
box_list_ops.non_max_suppression(boxes, iou_thresh, max_output_size)
def test_select_from_three_clusters(self):
corners = tf.constant([[0, 0, 1, 1],
[0, 0.1, 1, 1.1],
@@ -275,7 +275,7 @@ class DetectionModel(object):
fields.BoxListFields.keypoints] = groundtruth_keypoints_list
@abstractmethod
def restore_map(self, from_detection_checkpoint=True):
def restore_map(self, fine_tune_checkpoint_type='detection'):
"""Returns a map of variables to load from a foreign checkpoint.
Returns a map of variable names to load from a checkpoint to variables in
@@ -287,9 +287,10 @@
the num_classes parameter.
Args:
from_detection_checkpoint: whether to restore from a full detection
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
Returns:
A dict mapping variable names (to load from a checkpoint) to variables in
@@ -122,7 +122,7 @@ def multiclass_non_max_suppression(boxes,
if boundaries is not None:
per_class_boundaries_list = tf.unstack(boundaries, axis=1)
boxes_ids = (range(num_classes) if len(per_class_boxes_list) > 1
else [0] * num_classes)
else [0] * num_classes.value)
for class_idx, boxes_idx in zip(range(num_classes), boxes_ids):
per_class_boxes = per_class_boxes_list[boxes_idx]
boxlist_and_class_scores = box_list.BoxList(per_class_boxes)
@@ -233,7 +233,7 @@ def _rgb_to_grayscale(images, name=None):
rgb_weights = [0.2989, 0.5870, 0.1140]
rank_1 = tf.expand_dims(tf.rank(images) - 1, 0)
gray_float = tf.reduce_sum(
flt_image * rgb_weights, rank_1, keepdims=True)
flt_image * rgb_weights, rank_1, keep_dims=True)
gray_float.set_shape(images.get_shape()[:-1].concatenate([1]))
return tf.image.convert_image_dtype(gray_float, orig_dtype, name=name)
@@ -1821,8 +1821,10 @@ def random_pad_to_aspect_ratio(image,
max_width = tf.maximum(
max_padded_size_ratio[1] * image_width, target_width)
min_scale = tf.maximum(min_height / target_height, min_width / target_width)
max_scale = tf.minimum(max_height / target_height, max_width / target_width)
min_scale = tf.minimum(
max_scale,
tf.maximum(min_height / target_height, min_width / target_width))
generator_func = functools.partial(tf.random_uniform, [],
min_scale, max_scale, seed=seed)
@@ -1831,8 +1833,8 @@
preprocessor_cache.PreprocessorCache.PAD_TO_ASPECT_RATIO,
preprocess_vars_cache)
target_height = scale * target_height
target_width = scale * target_width
target_height = tf.round(scale * target_height)
target_width = tf.round(scale * target_width)
new_image = tf.image.pad_to_bounding_box(
image, 0, 0, tf.to_int32(target_height), tf.to_int32(target_width))
@@ -2261,14 +2263,14 @@ def resize_image(image,
'ResizeImage',
values=[image, new_height, new_width, method, align_corners]):
new_image = tf.image.resize_images(
image, [new_height, new_width],
image, tf.stack([new_height, new_width]),
method=method,
align_corners=align_corners)
image_shape = shape_utils.combined_static_and_dynamic_shape(image)
result = [new_image]
if masks is not None:
num_instances = tf.shape(masks)[0]
new_size = tf.constant([new_height, new_width], dtype=tf.int32)
new_size = tf.stack([new_height, new_width])
def resize_masks_branch():
new_masks = tf.expand_dims(masks, 3)
new_masks = tf.image.resize_nearest_neighbor(
@@ -1736,6 +1736,41 @@ class PreprocessorTest(tf.test.TestCase):
test_masks=True,
test_keypoints=True)
def testRunRandomPadToAspectRatioWithMinMaxPaddedSizeRatios(self):
image = self.createColorfulTestImage()
boxes = self.createTestBoxes()
labels = self.createTestLabels()
tensor_dict = {
fields.InputDataFields.image: image,
fields.InputDataFields.groundtruth_boxes: boxes,
fields.InputDataFields.groundtruth_classes: labels
}
preprocessor_arg_map = preprocessor.get_default_func_arg_map()
preprocessing_options = [(preprocessor.random_pad_to_aspect_ratio,
{'min_padded_size_ratio': (4.0, 4.0),
'max_padded_size_ratio': (4.0, 4.0)})]
distorted_tensor_dict = preprocessor.preprocess(
tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map)
distorted_image = distorted_tensor_dict[fields.InputDataFields.image]
distorted_boxes = distorted_tensor_dict[
fields.InputDataFields.groundtruth_boxes]
distorted_labels = distorted_tensor_dict[
fields.InputDataFields.groundtruth_classes]
with self.test_session() as sess:
distorted_image_, distorted_boxes_, distorted_labels_ = sess.run([
distorted_image, distorted_boxes, distorted_labels])
expected_boxes = np.array(
[[0.0, 0.125, 0.1875, 0.5], [0.0625, 0.25, 0.1875, 0.5]],
dtype=np.float32)
self.assertAllEqual(distorted_image_.shape, [1, 800, 800, 3])
self.assertAllEqual(distorted_labels_, [1, 2])
self.assertAllClose(distorted_boxes_.flatten(),
expected_boxes.flatten())
def testRunRandomPadToAspectRatioWithMasks(self):
image = self.createColorfulTestImage()
boxes = self.createTestBoxes()
@@ -2118,6 +2153,33 @@
self.assertAllEqual(out_image_shape, expected_image_shape)
self.assertAllEqual(out_masks_shape, expected_mask_shape)
def testResizeImageWithMasksTensorInputHeightAndWidth(self):
"""Tests image resizing, checking output sizes."""
in_image_shape_list = [[60, 40, 3], [15, 30, 3]]
in_masks_shape_list = [[15, 60, 40], [10, 15, 30]]
height = tf.constant(50, dtype=tf.int32)
width = tf.constant(100, dtype=tf.int32)
expected_image_shape_list = [[50, 100, 3], [50, 100, 3]]
expected_masks_shape_list = [[15, 50, 100], [10, 50, 100]]
for (in_image_shape, expected_image_shape, in_masks_shape,
expected_mask_shape) in zip(in_image_shape_list,
expected_image_shape_list,
in_masks_shape_list,
expected_masks_shape_list):
in_image = tf.random_uniform(in_image_shape)
in_masks = tf.random_uniform(in_masks_shape)
out_image, out_masks, _ = preprocessor.resize_image(
in_image, in_masks, new_height=height, new_width=width)
out_image_shape = tf.shape(out_image)
out_masks_shape = tf.shape(out_masks)
with self.test_session() as sess:
out_image_shape, out_masks_shape = sess.run(
[out_image_shape, out_masks_shape])
self.assertAllEqual(out_image_shape, expected_image_shape)
self.assertAllEqual(out_masks_shape, expected_mask_shape)
def testResizeImageWithNoInstanceMask(self):
"""Tests image resizing, checking output sizes."""
in_image_shape_list = [[60, 40, 3], [15, 30, 3]]
@@ -31,6 +31,44 @@ from object_detection.utils import label_map_util
slim_example_decoder = tf.contrib.slim.tfexample_decoder
# TODO(lzc): keep LookupTensor and BackupHandler in sync with
# tf.contrib.slim.tfexample_decoder version.
class LookupTensor(slim_example_decoder.Tensor):
"""An ItemHandler that returns a parsed Tensor, the result of a lookup."""
def __init__(self,
tensor_key,
table,
shape_keys=None,
shape=None,
default_value=''):
"""Initializes the LookupTensor handler.
Simply calls a vocabulary (most often, a label mapping) lookup.
Args:
tensor_key: the name of the `TFExample` feature to read the tensor from.
table: A tf.lookup table.
shape_keys: Optional name or list of names of the TF-Example feature in
which the tensor shape is stored. If a list, then each corresponds to
one dimension of the shape.
shape: Optional output shape of the `Tensor`. If provided, the `Tensor` is
reshaped accordingly.
default_value: The value used when the `tensor_key` is not found in a
particular `TFExample`.
Raises:
ValueError: if both `shape_keys` and `shape` are specified.
"""
self._table = table
super(LookupTensor, self).__init__(tensor_key, shape_keys, shape,
default_value)
def tensors_to_item(self, keys_to_tensors):
unmapped_tensor = super(LookupTensor, self).tensors_to_item(keys_to_tensors)
return self._table.lookup(unmapped_tensor)
class BackupHandler(slim_example_decoder.ItemHandler):
"""An ItemHandler that tries two ItemHandlers in order."""
@@ -207,8 +245,7 @@ class TfExampleDecoder(data_decoder.DataDecoder):
# switch back to slim_example_decoder.BackupHandler once tf 1.5 becomes
# more popular.
label_handler = BackupHandler(
slim_example_decoder.LookupTensor(
'image/object/class/text', table, default_value=''),
LookupTensor('image/object/class/text', table, default_value=''),
slim_example_decoder.Tensor('image/object/class/label'))
else:
label_handler = slim_example_decoder.Tensor('image/object/class/label')
@@ -108,7 +108,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
}
backup_handler = tf_example_decoder.BackupHandler(
handler=slim_example_decoder.Tensor('image/object/class/label'),
backup=slim_example_decoder.LookupTensor('image/object/class/text',
backup=tf_example_decoder.LookupTensor('image/object/class/text',
table))
items_to_handlers = {
'labels': backup_handler,
@@ -128,6 +128,37 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllClose([2, 0, 1], obtained_class_ids_each_example[1])
self.assertAllClose([42, 10, 901], obtained_class_ids_each_example[2])
def testDecodeExampleWithBranchedLookup(self):
example = example_pb2.Example(features=feature_pb2.Features(feature={
'image/object/class/text': self._BytesFeatureFromList(
np.array(['cat', 'dog', 'guinea pig'])),
}))
serialized_example = example.SerializeToString()
# 'dog' -> 0, 'guinea pig' -> 1, 'cat' -> 2
table = lookup_ops.index_table_from_tensor(
constant_op.constant(['dog', 'guinea pig', 'cat']))
with self.test_session() as sess:
sess.run(lookup_ops.tables_initializer())
serialized_example = array_ops.reshape(serialized_example, shape=[])
keys_to_features = {
'image/object/class/text': parsing_ops.VarLenFeature(dtypes.string),
}
items_to_handlers = {
'labels':
tf_example_decoder.LookupTensor('image/object/class/text', table),
}
decoder = slim_example_decoder.TFExampleDecoder(keys_to_features,
items_to_handlers)
obtained_class_ids = decoder.decode(serialized_example)[0].eval()
self.assertAllClose([2, 0, 1], obtained_class_ids)
def testDecodeJpegImage(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
@@ -58,10 +58,10 @@ tf.app.flags.DEFINE_string('output_path', '', 'Path to which TFRecord files'
'will be located at: <output_path>_train.tfrecord.'
'And the TFRecord with the validation set will be'
'located at: <output_path>_val.tfrecord')
tf.app.flags.DEFINE_list('classes_to_use', ['car', 'pedestrian', 'dontcare'],
'Which classes of bounding boxes to use. Adding the'
'dontcare class will remove all bboxs in the dontcare'
'regions.')
tf.app.flags.DEFINE_string('classes_to_use', 'car,pedestrian,dontcare',
'Comma separated list of class names that will be'
'used. Adding the dontcare class will remove all'
'bboxs in the dontcare regions.')
tf.app.flags.DEFINE_string('label_map_path', 'data/kitti_label_map.pbtxt',
'Path to label map proto.')
tf.app.flags.DEFINE_integer('validation_set_size', '500', 'Number of images to'
@@ -302,7 +302,7 @@ def main(_):
convert_kitti_to_tfrecords(
data_dir=FLAGS.data_dir,
output_path=FLAGS.output_path,
classes_to_use=FLAGS.classes_to_use,
classes_to_use=FLAGS.classes_to_use.split(','),
label_map_path=FLAGS.label_map_path,
validation_set_size=FLAGS.validation_set_size)
@@ -12,7 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Common functions for repeatedly evaluating a checkpoint."""
"""Common utility functions for evaluation."""
import collections
import logging
import os
import time
@@ -24,6 +25,7 @@ from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import keypoint_ops
from object_detection.core import standard_fields as fields
from object_detection.metrics import coco_evaluation
from object_detection.utils import label_map_util
from object_detection.utils import ops
from object_detection.utils import visualization_utils as vis_utils
@@ -201,8 +203,9 @@
num_batches=1,
master='',
save_graph=False,
save_graph_dir=''):
"""Evaluates metrics defined in evaluators.
save_graph_dir='',
losses_dict=None):
"""Evaluates metrics defined in evaluators and returns summaries.
This function loads the latest checkpoint in checkpoint_dirs and evaluates
all metrics defined in evaluators. The metrics are processed in batch by the
@@ -240,6 +243,7 @@
save_graph: whether or not the Tensorflow graph is stored as a pbtxt file.
save_graph_dir: where to store the Tensorflow graph on disk. If save_graph
is True this must be non-empty.
losses_dict: optional dictionary of scalar detection losses.
Returns:
global_step: the count of global steps.
@@ -269,6 +273,7 @@
tf.train.write_graph(sess.graph_def, save_graph_dir, 'eval.pbtxt')
counters = {'skipped': 0, 'success': 0}
aggregate_result_losses_dict = collections.defaultdict(list)
with tf.contrib.slim.queues.QueueRunners(sess):
try:
for batch in range(int(num_batches)):
@@ -276,16 +281,22 @@
logging.info('Running eval ops batch %d/%d', batch + 1, num_batches)
if not batch_processor:
try:
result_dict = sess.run(tensor_dict)
if not losses_dict:
losses_dict = {}
result_dict, result_losses_dict = sess.run([tensor_dict,
losses_dict])
counters['success'] += 1
except tf.errors.InvalidArgumentError:
logging.info('Skipping image')
counters['skipped'] += 1
result_dict = {}
else:
result_dict = batch_processor(tensor_dict, sess, batch, counters)
result_dict, result_losses_dict = batch_processor(
tensor_dict, sess, batch, counters, losses_dict=losses_dict)
if not result_dict:
continue
for key, value in iter(result_losses_dict.items()):
aggregate_result_losses_dict[key].append(value)
for evaluator in evaluators:
# TODO(b/65130867): Use image_id tensor once we fix the input data
# decoders to return correct image_id.
@@ -310,6 +321,9 @@
raise ValueError('Metric names between evaluators must not collide.')
all_evaluator_metrics.update(metrics)
global_step = tf.train.global_step(sess, tf.train.get_global_step())
for key, value in iter(aggregate_result_losses_dict.items()):
all_evaluator_metrics['Losses/' + key] = np.mean(value)
sess.close()
return (global_step, all_evaluator_metrics)
@@ -327,7 +341,8 @@
max_number_of_evaluations=None,
master='',
save_graph=False,
save_graph_dir=''):
save_graph_dir='',
losses_dict=None):
"""Periodically evaluates desired tensors using checkpoint_dirs or restore_fn.
This function repeatedly loads a checkpoint and evaluates a desired
@@ -367,6 +382,7 @@
save_graph: whether or not the Tensorflow graph is saved as a pbtxt file.
save_graph_dir: where to save on disk the Tensorflow graph. If store_graph
is True this must be non-empty.
losses_dict: optional dictionary of scalar detection losses.
Returns:
metrics: A dictionary containing metric names and values in the latest
@@ -404,7 +420,8 @@
variables_to_restore,
restore_fn, num_batches,
master, save_graph,
save_graph_dir)
save_graph_dir,
losses_dict=losses_dict)
write_metrics(metrics, global_step, summary_dir)
number_of_evaluations += 1
@@ -432,7 +449,7 @@ def result_dict_for_single_example(image,
have label 1.
Args:
image: A single 4D image tensor of shape [1, H, W, C].
image: A single 4D uint8 image tensor of shape [1, H, W, C].
key: A single string tensor identifying the image.
detections: A dictionary of detections, returned from
DetectionModel.postprocess().
@@ -479,7 +496,7 @@
"""
label_id_offset = 1 # Applying label id offset (b/63711816)
input_data_fields = fields.InputDataFields()
input_data_fields = fields.InputDataFields
output_dict = {
input_data_fields.original_image: image,
input_data_fields.key: key,
@@ -488,10 +505,6 @@
detection_fields = fields.DetectionResultFields
detection_boxes = detections[detection_fields.detection_boxes][0]
image_shape = tf.shape(image)
if scale_to_absolute:
absolute_detection_boxlist = box_list_ops.to_absolute_coordinates(
box_list.BoxList(detection_boxes), image_shape[1], image_shape[2])
detection_boxes = absolute_detection_boxlist.get()
detection_scores = detections[detection_fields.detection_scores][0]
if class_agnostic:
@@ -508,6 +521,13 @@
detection_classes, begin=[0], size=[num_detections])
detection_scores = tf.slice(
detection_scores, begin=[0], size=[num_detections])
if scale_to_absolute:
absolute_detection_boxlist = box_list_ops.to_absolute_coordinates(
box_list.BoxList(detection_boxes), image_shape[1], image_shape[2])
output_dict[detection_fields.detection_boxes] = (
absolute_detection_boxlist.get())
else:
output_dict[detection_fields.detection_boxes] = detection_boxes
output_dict[detection_fields.detection_classes] = detection_classes
output_dict[detection_fields.detection_scores] = detection_scores
@@ -550,3 +570,69 @@
output_dict[input_data_fields.groundtruth_classes] = groundtruth_classes
return output_dict
def get_eval_metric_ops_for_evaluators(evaluation_metrics,
categories,
eval_dict,
include_metrics_per_category=False):
"""Returns a dictionary of eval metric ops to use with `tf.EstimatorSpec`.
Args:
evaluation_metrics: List of evaluation metric names. Current options are
'coco_detection_metrics' and 'coco_mask_metrics'.
categories: A list of dicts, each of which has the following keys -
'id': (required) an integer id uniquely identifying this category.
'name': (required) string representing category name e.g., 'cat', 'dog'.
eval_dict: An evaluation dictionary, returned from
result_dict_for_single_example().
include_metrics_per_category: If True, include metrics for each category.
Returns:
A dictionary of metric names to tuple of value_op and update_op that can be
used as eval metric ops in tf.EstimatorSpec.
Raises:
ValueError: If any of the metrics in `evaluation_metric` is not
'coco_detection_metrics' or 'coco_mask_metrics'.
"""
evaluation_metrics = list(set(evaluation_metrics))
input_data_fields = fields.InputDataFields
detection_fields = fields.DetectionResultFields
eval_metric_ops = {}
for metric in evaluation_metrics:
if metric == 'coco_detection_metrics':
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
categories, include_metrics_per_category=include_metrics_per_category)
eval_metric_ops.update(
coco_evaluator.get_estimator_eval_metric_ops(
image_id=eval_dict[input_data_fields.key],
groundtruth_boxes=eval_dict[input_data_fields.groundtruth_boxes],
groundtruth_classes=eval_dict[
input_data_fields.groundtruth_classes],
detection_boxes=eval_dict[detection_fields.detection_boxes],
detection_scores=eval_dict[detection_fields.detection_scores],
detection_classes=eval_dict[detection_fields.detection_classes]))
elif metric == 'coco_mask_metrics':
coco_mask_evaluator = coco_evaluation.CocoMaskEvaluator(
categories, include_metrics_per_category=include_metrics_per_category)
eval_metric_ops.update(
coco_mask_evaluator.get_estimator_eval_metric_ops(
image_id=eval_dict[input_data_fields.key],
groundtruth_boxes=eval_dict[input_data_fields.groundtruth_boxes],
groundtruth_classes=eval_dict[
input_data_fields.groundtruth_classes],
groundtruth_instance_masks=eval_dict[
input_data_fields.groundtruth_instance_masks],
detection_scores=eval_dict[detection_fields.detection_scores],
detection_classes=eval_dict[detection_fields.detection_classes],
detection_masks=eval_dict[detection_fields.detection_masks]))
else:
raise ValueError('The only evaluation metrics supported are '
'"coco_detection_metrics" and "coco_mask_metrics". '
'Found {} in the evaluation metrics'.format(metric))
return eval_metric_ops
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for eval_util."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from object_detection import eval_util
from object_detection.core import standard_fields as fields
class EvalUtilTest(tf.test.TestCase):
def _get_categories_list(self):
return [{'id': 0, 'name': 'person'},
{'id': 1, 'name': 'dog'},
{'id': 2, 'name': 'cat'}]
def _make_evaluation_dict(self):
input_data_fields = fields.InputDataFields
detection_fields = fields.DetectionResultFields
image = tf.zeros(shape=[1, 20, 20, 3], dtype=tf.uint8)
key = tf.constant('image1')
detection_boxes = tf.constant([[[0., 0., 1., 1.]]])
detection_scores = tf.constant([[0.8]])
detection_classes = tf.constant([[0]])
detection_masks = tf.ones(shape=[1, 1, 20, 20], dtype=tf.float32)
num_detections = tf.constant([1])
groundtruth_boxes = tf.constant([[0., 0., 1., 1.]])
groundtruth_classes = tf.constant([1])
groundtruth_instance_masks = tf.ones(shape=[1, 20, 20], dtype=tf.uint8)
detections = {
detection_fields.detection_boxes: detection_boxes,
detection_fields.detection_scores: detection_scores,
detection_fields.detection_classes: detection_classes,
detection_fields.detection_masks: detection_masks,
detection_fields.num_detections: num_detections
}
groundtruth = {
input_data_fields.groundtruth_boxes: groundtruth_boxes,
input_data_fields.groundtruth_classes: groundtruth_classes,
input_data_fields.groundtruth_instance_masks: groundtruth_instance_masks
}
return eval_util.result_dict_for_single_example(image, key, detections,
groundtruth)
def test_get_eval_metric_ops_for_coco_detections(self):
evaluation_metrics = ['coco_detection_metrics']
categories = self._get_categories_list()
eval_dict = self._make_evaluation_dict()
metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
evaluation_metrics, categories, eval_dict)
_, update_op = metric_ops['DetectionBoxes_Precision/mAP']
with self.test_session() as sess:
metrics = {}
for key, (value_op, _) in metric_ops.iteritems():
metrics[key] = value_op
sess.run(update_op)
metrics = sess.run(metrics)
print(metrics)
self.assertAlmostEqual(1.0, metrics['DetectionBoxes_Precision/mAP'])
self.assertNotIn('DetectionMasks_Precision/mAP', metrics)
def test_get_eval_metric_ops_for_coco_detections_and_masks(self):
evaluation_metrics = ['coco_detection_metrics',
'coco_mask_metrics']
categories = self._get_categories_list()
eval_dict = self._make_evaluation_dict()
metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
evaluation_metrics, categories, eval_dict)
_, update_op_boxes = metric_ops['DetectionBoxes_Precision/mAP']
_, update_op_masks = metric_ops['DetectionMasks_Precision/mAP']
with self.test_session() as sess:
metrics = {}
for key, (value_op, _) in metric_ops.iteritems():
metrics[key] = value_op
sess.run(update_op_boxes)
sess.run(update_op_masks)
metrics = sess.run(metrics)
self.assertAlmostEqual(1.0, metrics['DetectionBoxes_Precision/mAP'])
self.assertAlmostEqual(1.0, metrics['DetectionMasks_Precision/mAP'])
def test_get_eval_metric_ops_raises_error_with_unsupported_metric(self):
evaluation_metrics = ['unsupported_metrics']
categories = self._get_categories_list()
eval_dict = self._make_evaluation_dict()
with self.assertRaises(ValueError):
eval_util.get_eval_metric_ops_for_evaluators(
evaluation_metrics, categories, eval_dict)
if __name__ == '__main__':
tf.test.main()
@@ -50,10 +50,10 @@ EVAL_METRICS_CLASS_DICT = {
EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'
def _extract_prediction_tensors(model,
def _extract_predictions_and_losses(model,
create_input_dict_fn,
ignore_groundtruth=False):
"""Restores the model in a tensorflow session.
"""Constructs tensorflow detection graph and returns output tensors.
Args:
model: model to perform predictions with.
@@ -61,7 +61,11 @@
ignore_groundtruth: whether groundtruth should be ignored.
Returns:
tensor_dict: A tensor dictionary with evaluations.
prediction_groundtruth_dict: A dictionary with postprocessed tensors (keyed
by standard_fields.DetectionResultFields) and optional groundtruth
tensors (keyed by standard_fields.InputDataFields).
losses_dict: A dictionary containing detection losses. This is empty when
ignore_groundtruth is true.
"""
input_dict = create_input_dict_fn()
prefetch_queue = prefetcher.prefetch(input_dict, capacity=500)
@@ -73,6 +77,7 @@
detections = model.postprocess(prediction_dict, true_image_shapes)
groundtruth = None
losses_dict = {}
if not ignore_groundtruth:
groundtruth = {
fields.InputDataFields.groundtruth_boxes:
@@ -92,8 +97,14 @@
if fields.DetectionResultFields.detection_masks in detections:
groundtruth[fields.InputDataFields.groundtruth_instance_masks] = (
input_dict[fields.InputDataFields.groundtruth_instance_masks])
return eval_util.result_dict_for_single_example(
label_id_offset = 1
model.provide_groundtruth(
[input_dict[fields.InputDataFields.groundtruth_boxes]],
[tf.one_hot(input_dict[fields.InputDataFields.groundtruth_classes]
- label_id_offset, depth=model.num_classes)])
losses_dict.update(model.loss(prediction_dict, true_image_shapes))
result_dict = eval_util.result_dict_for_single_example(
original_image,
input_dict[fields.InputDataFields.source_id],
detections,
@@ -101,6 +112,7 @@
class_agnostic=(
fields.DetectionResultFields.detection_classes not in detections),
scale_to_absolute=True)
return result_dict, losses_dict
def get_evaluators(eval_config, categories):
@@ -157,13 +169,14 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
logging.fatal('If ignore_groundtruth=True then an export_path is '
'required. Aborting!!!')
tensor_dict = _extract_prediction_tensors(
tensor_dict, losses_dict = _extract_predictions_and_losses(
model=model,
create_input_dict_fn=create_input_dict_fn,
ignore_groundtruth=eval_config.ignore_groundtruth)
def _process_batch(tensor_dict, sess, batch_index, counters):
"""Evaluates tensors in tensor_dict, visualizing the first K examples.
def _process_batch(tensor_dict, sess, batch_index, counters,
losses_dict=None):
"""Evaluates tensors in tensor_dict, losses_dict and visualizes examples.
This function calls sess.run on tensor_dict, evaluating the original_image
tensor only on the first K examples and visualizing detections overlaid
@@ -177,12 +190,17 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
be updated to keep track of number of successful and failed runs,
respectively. If these fields are not updated, then the success/skipped
counter values shown at the end of evaluation will be incorrect.
losses_dict: Optional dictionary of scalar loss tensors.
Returns:
result_dict: a dictionary of numpy arrays
result_losses_dict: a dictionary of scalar losses. This is empty if input
losses_dict is None.
"""
try:
result_dict = sess.run(tensor_dict)
if not losses_dict:
losses_dict = {}
result_dict, result_losses_dict = sess.run([tensor_dict, losses_dict])
counters['success'] += 1
except tf.errors.InvalidArgumentError:
logging.info('Skipping image')
@@ -207,7 +225,7 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
skip_labels=eval_config.skip_labels,
keep_image_id_for_visualization_export=eval_config.
keep_image_id_for_visualization_export)
return result_dict
return result_dict, result_losses_dict
variables_to_restore = tf.global_variables()
global_step = tf.train.get_or_create_global_step()
@@ -242,6 +260,7 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
if eval_config.max_evals else None),
master=eval_config.eval_master,
save_graph=eval_config.save_graph,
save_graph_dir=(eval_dir if eval_config.save_graph else ''))
save_graph_dir=(eval_dir if eval_config.save_graph else ''),
losses_dict=losses_dict)
return metrics
@@ -62,7 +62,7 @@ class FakeModel(model.DetectionModel):
np.arange(64).reshape([2, 2, 4, 4]), tf.float32)
return postprocessed_tensors
def restore_map(self, checkpoint_path, from_detection_checkpoint):
def restore_map(self, checkpoint_path, fine_tune_checkpoint_type):
pass
def loss(self, prediction_dict, true_image_shapes):
@@ -6,10 +6,14 @@ introduced in tensorflow 1.5.0 so running with earlier versions may cause this
issue. It now has been replaced by
object_detection.data_decoders.tf_example_decoder.BackupHandler. Whoever sees
this issue should be able to resolve it by syncing their fork to HEAD.
Same for LookupTensor.
## Q: AttributeError: 'module' object has no attribute 'LookupTensor'
A: Similar to BackupHandler, syncing your fork to HEAD should make it work.
## Q: Why can't I get the inference time as reported in model zoo?
A: The inference time reported in model zoo is mean time of testing hundreds of
images with a internal machine. As mentioned in
images with an internal machine. As mentioned in
[Tensorflow detection model zoo](detection_model_zoo.md), this speed depends
highly on one's specific hardware configuration and should be treated more as
relative timing.
@@ -40,6 +40,11 @@ HASH_KEY = 'hash'
HASH_BINS = 1 << 31
SERVING_FED_EXAMPLE_KEY = 'serialized_example'
# A map of names to methods that help build the input pipeline.
INPUT_BUILDER_UTIL_MAP = {
'dataset_build': dataset_builder.build,
}
def transform_input_data(tensor_dict,
model_preprocess_fn,
@@ -229,7 +234,7 @@
image_resizer_fn=image_resizer_fn,
num_classes=config_util.get_number_of_classes(model_config),
data_augmentation_fn=data_augmentation_fn)
dataset = dataset_builder.build(
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
train_input_config,
transform_input_data_fn=transform_data_fn,
batch_size=params['batch_size'] if params else train_config.batch_size,
@@ -341,8 +346,13 @@
num_classes=num_classes,
data_augmentation_fn=None,
retain_original_image=True)
dataset = dataset_builder.build(eval_input_config,
transform_input_data_fn=transform_data_fn)
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
eval_input_config,
transform_input_data_fn=transform_data_fn,
batch_size=1,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
input_dict = dataset_util.make_initializable_iterator(dataset).get_next()
hash_from_source_id = tf.string_to_hash_bucket_fast(
@@ -374,16 +384,6 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
labels[fields.InputDataFields.groundtruth_instance_masks] = input_dict[
fields.InputDataFields.groundtruth_instance_masks]
# Add a batch dimension to the tensors.
features = {
key: tf.expand_dims(features[key], axis=0)
for key, feature in features.items()
}
labels = {
key: tf.expand_dims(labels[key], axis=0)
for key, label in labels.items()
}
return features, labels
return _eval_input_fn
@@ -426,9 +426,13 @@ def create_predict_input_fn(model_config):
input_dict = transform_fn(decoder.decode(example))
images = tf.to_float(input_dict[fields.InputDataFields.image])
images = tf.expand_dims(images, axis=0)
true_image_shape = tf.expand_dims(
input_dict[fields.InputDataFields.true_image_shape], axis=0)
return tf.estimator.export.ServingInputReceiver(
features={fields.InputDataFields.image: images},
features={
fields.InputDataFields.image: images,
fields.InputDataFields.true_image_shape: true_image_shape},
receiver_tensors={SERVING_FED_EXAMPLE_KEY: example})
return _predict_input_fn