Commit 001a2a61 authored by pkulzc, committed by Sergio Guadarrama

Internal changes for object detection. (#3656)

* Force cast of num_classes to integer

PiperOrigin-RevId: 188335318

* Updating config util to allow overwriting of cosine decay learning rates.

PiperOrigin-RevId: 188338852

* Make box_list_ops.py and box_list_ops_test.py work with C API enabled.

The C API has improved shape inference over the original Python
code. This causes some previously-working conds to fail. Switching to smart_cond fixes this.

Another effect of the improved shape inference is that one of the
failures tested gets caught earlier, so I modified the test to reflect
this.

PiperOrigin-RevId: 188409792

* Fix parallel event file writing issue.

Without this change, the event files might get corrupted when multiple evaluations are run in parallel.

PiperOrigin-RevId: 188502560

* Deprecating the boolean flag from_detection_checkpoint.

It is replaced by a string field, fine_tune_checkpoint_type, in train_config to provide extensibility. fine_tune_checkpoint_type can currently take the value `detection`, `classification`, or other values when restore_map is overridden.

PiperOrigin-RevId: 188518685
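
A minimal train_config sketch (field names per object_detection/protos/train.proto as of this change; the checkpoint path is illustrative):

  train_config {
    fine_tune_checkpoint: "/path/to/model.ckpt"
    fine_tune_checkpoint_type: "detection"  # or "classification"
  }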

* Automated g4 rollback of changelist 188502560

PiperOrigin-RevId: 188519969

* Introducing eval metrics specs for Coco Mask metrics. This allows metrics to be computed in tensorflow using the tf.learn Estimator.

PiperOrigin-RevId: 188528485
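
A minimal usage sketch; `coco_evaluator`, the input tensors and `total_loss` are assumed to be built elsewhere (see the new CocoMaskEvaluationPyFuncTest below for concrete shapes/dtypes), and the method signature matches the one added to coco_evaluation.py:

  eval_metric_ops = coco_evaluator.get_estimator_eval_metric_ops(
      image_id, groundtruth_boxes, groundtruth_classes,
      groundtruth_instance_masks, detection_scores, detection_classes,
      detection_masks)
  # All update ops must be run together, and likewise all value ops.
  spec = tf.estimator.EstimatorSpec(
      mode=tf.estimator.ModeKeys.EVAL,
      loss=total_loss,
      eval_metric_ops=eval_metric_ops)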

* Minor fix to make object_detection/metrics/coco_evaluation.py python3 compatible.

PiperOrigin-RevId: 188550683

* Updating eval_util to handle eval_metric_ops from multiple `DetectionEvaluator`s.

PiperOrigin-RevId: 188560474

* Allow tensor input for new_height and new_width for resize_image.

PiperOrigin-RevId: 188561908

* Fix typo in fine_tune_checkpoint_type name in trainer.

PiperOrigin-RevId: 188799033

* Adding mobilenet feature extractor to object detection.

PiperOrigin-RevId: 188916897

* Allow label maps to optionally contain an explicit background class with id zero.

PiperOrigin-RevId: 188951089
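
An illustrative label map with the optional explicit background entry (pbtxt format):

  item {
    name: "background"
    id: 0
  }
  item {
    name: "dog"
    id: 1
  }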

* Fix boundary conditions in random_pad_to_aspect_ratio to ensure that min_scale is always less than max_scale.

PiperOrigin-RevId: 189026868

* Fall back on the from_detection_checkpoint option if fine_tune_checkpoint_type isn't set.

PiperOrigin-RevId: 189052833

* Add proper names for learning rate schedules so we don't see cryptic names on tensorboard.

PiperOrigin-RevId: 189069837

* Enforcing that all datasets are batched (and then unbatched in the model) with batch_size >= 1.

PiperOrigin-RevId: 189117178

* Adding regularization to total loss returned from DetectionModel.loss().

PiperOrigin-RevId: 189189123
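
In the model.py change below, the regularization term is pulled from the standard collection and added alongside the losses returned by the model; condensed from that diff:

  losses = [loss_tensor for loss_tensor in losses_dict.itervalues()]
  if train_config.add_regularization_loss:
    regularization_losses = tf.get_collection(
        tf.GraphKeys.REGULARIZATION_LOSSES)
    if regularization_losses:
      losses.append(tf.add_n(regularization_losses,
                             name='regularization_loss'))
  total_loss = tf.add_n(losses, name='total_loss')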

* Standardize the names of loss scalars (for SSD, Faster R-CNN and R-FCN) in both training and eval so they can be compared on tensorboard.

Log localization and classification losses in evaluation.

PiperOrigin-RevId: 189189940
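
With this change the loss scalars appear under stable names such as Loss/RPNLoss/localization_loss, Loss/RPNLoss/objectness_loss, Loss/BoxClassifierLoss/localization_loss, Loss/BoxClassifierLoss/classification_loss and Loss/BoxClassifierLoss/mask_loss for Faster R-CNN/R-FCN, and Loss/localization_loss and Loss/classification_loss for SSD (see the updated meta-architecture tests below).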

* Remove negative test from box list ops test.

PiperOrigin-RevId: 189229327

* Add an option to warmup learning rate in manual stepping schedule.

PiperOrigin-RevId: 189361039
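
A hedged config sketch of the new option (the warmup flag follows optimizer.proto as of this change; step values are illustrative):

  learning_rate {
    manual_step_learning_rate {
      initial_learning_rate: 0.0003
      schedule {
        step: 900000
        learning_rate: 0.00003
      }
      warmup: true
    }
  }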

* Replace tf.contrib.slim.tfexample_decoder.LookupTensor with object_detection.data_decoders.tf_example_decoder.LookupTensor.

PiperOrigin-RevId: 189388556

* Force regularization summary variables under specific family names.

PiperOrigin-RevId: 189393190

* Automated g4 rollback of changelist 188619139

PiperOrigin-RevId: 189396001

* Remove step 0 schedule since we do a hard check for it after cl/189361039

PiperOrigin-RevId: 189396697

* PiperOrigin-RevId: 189040463

* PiperOrigin-RevId: 189059229

* PiperOrigin-RevId: 189214402


* Make slim python3 compatible.

* Minor fixes.

* Add TargetAssignment summaries in a separate family.

PiperOrigin-RevId: 189407487

* 1. Setting the `family` keyword arg prepends the same name to the summary names twice. Directly adding the family suffix to the name avoids this problem.
2. Make sure the eval losses have the same names.

PiperOrigin-RevId: 189434618

* Minor fixes to make object detection tf 1.4 compatible.

PiperOrigin-RevId: 189437519

* Call the base of the mobilenet_v1 feature extractor under the correct arg scope, and set batch norm is_training based on the value passed to the constructor.

PiperOrigin-RevId: 189460890

* Automated g4 rollback of changelist 188409792

PiperOrigin-RevId: 189463882

* Update object detection syncing.

PiperOrigin-RevId: 189601955

* Add an option to warmup learning rate, hold it constant for a certain number of steps and cosine decay it.

PiperOrigin-RevId: 189606169
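
A hedged config sketch of the extended cosine schedule (field names follow optimizer.proto; values are illustrative):

  learning_rate {
    cosine_decay_learning_rate {
      learning_rate_base: 0.04
      total_steps: 100000
      warmup_learning_rate: 0.013
      warmup_steps: 2000
      hold_base_rate_steps: 1000
    }
  }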

* Let the proposal feature extractor function in faster_rcnn meta architectures return the activations (end_points).

PiperOrigin-RevId: 189619301

* Fixed a bug which caused masks to be mostly zeros (detection_boxes were in absolute coordinates when scale_to_absolute=True).

PiperOrigin-RevId: 189641294

* Open sourcing Mobilenetv2 + SSDLite.

PiperOrigin-RevId: 189654520

* Remove unused files.
parent 2913cb24
......@@ -64,24 +64,24 @@ class InputsTest(tf.test.TestCase):
configs['train_config'], configs['train_input_config'], model_config)
features, labels = train_input_fn()
self.assertAllEqual([None, None, 3],
self.assertAllEqual([1, None, None, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual([],
self.assertAllEqual([1],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[None, 4],
[1, 50, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[None, model_config.faster_rcnn.num_classes],
[1, 50, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[None],
[1, 50],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
......
......@@ -159,6 +159,7 @@ class FasterRCNNFeatureExtractor(object):
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping activation tensor names to tensors.
"""
with tf.variable_scope(scope, values=[preprocessed_inputs]):
return self._extract_proposal_features(preprocessed_inputs, scope)
......@@ -906,7 +907,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
image_shape: A 1-D tensor representing the input image shape.
"""
image_shape = tf.shape(preprocessed_inputs)
rpn_features_to_crop = self._feature_extractor.extract_proposal_features(
rpn_features_to_crop, _ = self._feature_extractor.extract_proposal_features(
preprocessed_inputs, scope=self.first_stage_feature_extractor_scope)
feature_map_shape = tf.shape(rpn_features_to_crop)
......@@ -1649,14 +1650,14 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.reduce_sum(localization_losses, axis=1) / normalizer)
objectness_loss = tf.reduce_mean(
tf.reduce_sum(objectness_losses, axis=1) / normalizer)
loss_dict = {}
with tf.name_scope('localization_loss'):
loss_dict['first_stage_localization_loss'] = (
self._first_stage_loc_loss_weight * localization_loss)
with tf.name_scope('objectness_loss'):
loss_dict['first_stage_objectness_loss'] = (
self._first_stage_obj_loss_weight * objectness_loss)
localization_loss = tf.multiply(self._first_stage_loc_loss_weight,
localization_loss,
name='localization_loss')
objectness_loss = tf.multiply(self._first_stage_obj_loss_weight,
objectness_loss, name='objectness_loss')
loss_dict = {localization_loss.op.name: localization_loss,
objectness_loss.op.name: objectness_loss}
return loss_dict
def _loss_box_classifier(self,
......@@ -1782,15 +1783,16 @@ class FasterRCNNMetaArch(model.DetectionModel):
) = self._unpad_proposals_and_apply_hard_mining(
proposal_boxlists, second_stage_loc_losses,
second_stage_cls_losses, num_proposals)
loss_dict = {}
with tf.name_scope('localization_loss'):
loss_dict['second_stage_localization_loss'] = (
self._second_stage_loc_loss_weight * second_stage_loc_loss)
localization_loss = tf.multiply(self._second_stage_loc_loss_weight,
second_stage_loc_loss,
name='localization_loss')
with tf.name_scope('classification_loss'):
loss_dict['second_stage_classification_loss'] = (
self._second_stage_cls_loss_weight * second_stage_cls_loss)
classification_loss = tf.multiply(self._second_stage_cls_loss_weight,
second_stage_cls_loss,
name='classification_loss')
loss_dict = {localization_loss.op.name: localization_loss,
classification_loss.op.name: classification_loss}
second_stage_mask_loss = None
if prediction_masks is not None:
if groundtruth_masks_list is None:
......@@ -1857,9 +1859,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.boolean_mask(second_stage_mask_losses, paddings_indicator))
if second_stage_mask_loss is not None:
with tf.name_scope('mask_loss'):
loss_dict['second_stage_mask_loss'] = (
self._second_stage_mask_loss_weight * second_stage_mask_loss)
mask_loss = tf.multiply(self._second_stage_mask_loss_weight,
second_stage_mask_loss, name='mask_loss')
loss_dict[mask_loss.op.name] = mask_loss
return loss_dict
def _padded_batched_proposals_indicator(self,
......@@ -1927,26 +1929,32 @@ class FasterRCNNMetaArch(model.DetectionModel):
decoded_boxlist_list=[proposal_boxlist])
def restore_map(self,
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False):
"""Returns a map of variables to load from a foreign checkpoint.
See parent class for details.
Args:
from_detection_checkpoint: whether to restore from a full detection
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training. Default
True.
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
load_all_detection_checkpoint_vars: whether to load all variables (when
`from_detection_checkpoint` is True). If False, only variables within
the feature extractor scopes are included. Default False.
`fine_tune_checkpoint_type` is `detection`). If False, only variables
within the feature extractor scopes are included. Default False.
Returns:
A dict mapping variable names (to load from a checkpoint) to variables in
the model graph.
Raises:
ValueError: if fine_tune_checkpoint_type is neither `classification`
nor `detection`.
"""
if not from_detection_checkpoint:
if fine_tune_checkpoint_type not in ['detection', 'classification']:
raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
fine_tune_checkpoint_type))
if fine_tune_checkpoint_type == 'classification':
return self._feature_extractor.restore_from_classification_checkpoint_fn(
self.first_stage_feature_extractor_scope,
self.second_stage_feature_extractor_scope)
......
......@@ -47,8 +47,9 @@ class FakeFasterRCNNFeatureExtractor(
def _extract_proposal_features(self, preprocessed_inputs, scope):
with tf.variable_scope('mock_model'):
return 0 * slim.conv2d(preprocessed_inputs,
num_outputs=3, kernel_size=1, scope='layer1')
proposal_features = 0 * slim.conv2d(
preprocessed_inputs, num_outputs=3, kernel_size=1, scope='layer1')
return proposal_features, {}
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
with tf.variable_scope('mock_model'):
......@@ -792,10 +793,12 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
loss_dict = model.loss(prediction_dict, true_image_shapes)
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertTrue('second_stage_localization_loss' not in loss_dict_out)
self.assertTrue('second_stage_classification_loss' not in loss_dict_out)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertTrue('Loss/BoxClassifierLoss/localization_loss'
not in loss_dict_out)
self.assertTrue('Loss/BoxClassifierLoss/classification_loss'
not in loss_dict_out)
# TODO(rathodv): Split test into two - with and without masks.
def test_loss_full(self):
......@@ -890,11 +893,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_mask_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_zero_padded_proposals(self):
model = self._build_model(
......@@ -978,11 +983,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_mask_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_multiple_label_groundtruth(self):
model = self._build_model(
......@@ -1074,11 +1081,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_mask_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_zero_padded_proposals_nonzero_loss_with_two_images(self):
model = self._build_model(
......@@ -1173,12 +1182,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['first_stage_localization_loss'],
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'],
exp_loc_loss)
self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'],
exp_loc_loss)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], exp_loc_loss)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
def test_loss_with_hard_mining(self):
model = self._build_model(is_training=True,
......@@ -1263,9 +1273,10 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['second_stage_localization_loss'],
exp_loc_loss)
self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], exp_loc_loss)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
def test_restore_map_for_classification_ckpt(self):
# Define mock tensorflow classification graph and save variables.
......@@ -1296,7 +1307,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
preprocessed_inputs, true_image_shapes = model.preprocess(inputs)
prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
model.postprocess(prediction_dict, true_image_shapes)
var_map = model.restore_map(from_detection_checkpoint=False)
var_map = model.restore_map(fine_tune_checkpoint_type='classification')
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
with self.test_session(graph=test_graph_classification) as sess:
......@@ -1338,7 +1349,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
prediction_dict2 = model2.predict(preprocessed_inputs2, true_image_shapes)
model2.postprocess(prediction_dict2, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model2.restore_map(from_detection_checkpoint=True)
var_map = model2.restore_map(fine_tune_checkpoint_type='detection')
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
with self.test_session(graph=test_graph_detection2) as sess:
......@@ -1366,7 +1377,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
model.postprocess(prediction_dict, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model.restore_map(
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=True)
self.assertIsInstance(var_map, dict)
self.assertIn('another_variable', var_map)
......
......@@ -503,7 +503,7 @@ class SSDMetaArch(model.DetectionModel):
self.groundtruth_lists(fields.BoxListFields.classes),
keypoints, weights)
if self._add_summaries:
self._summarize_input(
self._summarize_target_assignment(
self.groundtruth_lists(fields.BoxListFields.boxes), match_list)
location_losses = self._localization_loss(
prediction_dict['box_encodings'],
......@@ -538,19 +538,20 @@ class SSDMetaArch(model.DetectionModel):
normalizer = tf.maximum(tf.to_float(tf.reduce_sum(batch_reg_weights)),
1.0)
with tf.name_scope('localization_loss'):
localization_loss_normalizer = normalizer
if self._normalize_loc_loss_by_codesize:
localization_loss_normalizer *= self._box_coder.code_size
localization_loss = ((self._localization_loss_weight / (
localization_loss_normalizer)) * localization_loss)
with tf.name_scope('classification_loss'):
classification_loss = ((self._classification_loss_weight / normalizer) *
classification_loss)
localization_loss_normalizer = normalizer
if self._normalize_loc_loss_by_codesize:
localization_loss_normalizer *= self._box_coder.code_size
localization_loss = tf.multiply((self._localization_loss_weight /
localization_loss_normalizer),
localization_loss,
name='localization_loss')
classification_loss = tf.multiply((self._classification_loss_weight /
normalizer), classification_loss,
name='classification_loss')
loss_dict = {
'localization_loss': localization_loss,
'classification_loss': classification_loss
localization_loss.op.name: localization_loss,
classification_loss.op.name: classification_loss
}
return loss_dict
......@@ -615,7 +616,7 @@ class SSDMetaArch(model.DetectionModel):
self._target_assigner, self.anchors, groundtruth_boxlists,
groundtruth_classes_with_background_list, groundtruth_weights_list)
def _summarize_input(self, groundtruth_boxes_list, match_list):
def _summarize_target_assignment(self, groundtruth_boxes_list, match_list):
"""Creates tensorflow summaries for the input boxes and anchors.
This function creates four summaries corresponding to the average
......@@ -639,14 +640,18 @@ class SSDMetaArch(model.DetectionModel):
[match.num_unmatched_columns() for match in match_list])
ignored_anchors_per_image = tf.stack(
[match.num_ignored_columns() for match in match_list])
tf.summary.scalar('Input/AvgNumGroundtruthBoxesPerImage',
tf.reduce_mean(tf.to_float(num_boxes_per_image)))
tf.summary.scalar('Input/AvgNumPositiveAnchorsPerImage',
tf.reduce_mean(tf.to_float(pos_anchors_per_image)))
tf.summary.scalar('Input/AvgNumNegativeAnchorsPerImage',
tf.reduce_mean(tf.to_float(neg_anchors_per_image)))
tf.summary.scalar('Input/AvgNumIgnoredAnchorsPerImage',
tf.reduce_mean(tf.to_float(ignored_anchors_per_image)))
tf.summary.scalar('AvgNumGroundtruthBoxesPerImage',
tf.reduce_mean(tf.to_float(num_boxes_per_image)),
family='TargetAssignment')
tf.summary.scalar('AvgNumPositiveAnchorsPerImage',
tf.reduce_mean(tf.to_float(pos_anchors_per_image)),
family='TargetAssignment')
tf.summary.scalar('AvgNumNegativeAnchorsPerImage',
tf.reduce_mean(tf.to_float(neg_anchors_per_image)),
family='TargetAssignment')
tf.summary.scalar('AvgNumIgnoredAnchorsPerImage',
tf.reduce_mean(tf.to_float(ignored_anchors_per_image)),
family='TargetAssignment')
def _apply_hard_mining(self, location_losses, cls_losses, prediction_dict,
match_list):
......@@ -731,16 +736,17 @@ class SSDMetaArch(model.DetectionModel):
return decoded_boxes, decoded_keypoints
def restore_map(self,
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False):
"""Returns a map of variables to load from a foreign checkpoint.
See parent class for details.
Args:
from_detection_checkpoint: whether to restore from a full detection
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
load_all_detection_checkpoint_vars: whether to load all variables (when
`from_detection_checkpoint` is True). If False, only variables within
the appropriate scopes are included. Default False.
......@@ -748,15 +754,22 @@ class SSDMetaArch(model.DetectionModel):
Returns:
A dict mapping variable names (to load from a checkpoint) to variables in
the model graph.
Raises:
ValueError: if fine_tune_checkpoint_type is neither `classification`
nor `detection`.
"""
if fine_tune_checkpoint_type not in ['detection', 'classification']:
raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
fine_tune_checkpoint_type))
variables_to_restore = {}
for variable in tf.global_variables():
var_name = variable.op.name
if from_detection_checkpoint and load_all_detection_checkpoint_vars:
if (fine_tune_checkpoint_type == 'detection' and
load_all_detection_checkpoint_vars):
variables_to_restore[var_name] = variable
else:
if var_name.startswith(self._extract_features_scope):
if not from_detection_checkpoint:
if fine_tune_checkpoint_type == 'classification':
var_name = (
re.split('^' + self._extract_features_scope + '/',
var_name)[-1])
......
......@@ -72,6 +72,13 @@ class MockAnchorGenerator2x2(anchor_generator.AnchorGenerator):
return 4
def _get_value_for_matching_key(dictionary, suffix):
for key in dictionary.keys():
if key.endswith(suffix):
return dictionary[key]
raise ValueError('key not found {}'.format(suffix))
class SsdMetaArchTest(test_case.TestCase):
def _create_model(self, apply_hard_mining=True,
......@@ -270,7 +277,9 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['localization_loss'], loss_dict['classification_loss'])
return (
_get_value_for_matching_key(loss_dict, 'Loss/localization_loss'),
_get_value_for_matching_key(loss_dict, 'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -305,7 +314,7 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['localization_loss'],)
return (_get_value_for_matching_key(loss_dict, 'Loss/localization_loss'),)
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -335,7 +344,9 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (loss_dict['localization_loss'], loss_dict['classification_loss'])
return (
_get_value_for_matching_key(loss_dict, 'Loss/localization_loss'),
_get_value_for_matching_key(loss_dict, 'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -366,7 +377,7 @@ class SsdMetaArchTest(test_case.TestCase):
sess.run(init_op)
saved_model_path = saver.save(sess, save_path)
var_map = model.restore_map(
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False)
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
......@@ -402,7 +413,7 @@ class SsdMetaArchTest(test_case.TestCase):
prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
model.postprocess(prediction_dict, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model.restore_map(from_detection_checkpoint=False)
var_map = model.restore_map(fine_tune_checkpoint_type='classification')
self.assertNotIn('another_variable', var_map)
self.assertIsInstance(var_map, dict)
saver = tf.train.Saver(var_map)
......@@ -423,7 +434,7 @@ class SsdMetaArchTest(test_case.TestCase):
model.postprocess(prediction_dict, true_image_shapes)
another_variable = tf.Variable([17.0], name='another_variable') # pylint: disable=unused-variable
var_map = model.restore_map(
from_detection_checkpoint=True,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=True)
self.assertIsInstance(var_map, dict)
self.assertIn('another_variable', var_map)
......
......@@ -100,6 +100,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
groundtruth_is_crowd=groundtruth_is_crowd))
self._annotation_id += groundtruth_dict[standard_fields.InputDataFields.
groundtruth_boxes].shape[0]
# Boolean to indicate whether a detection has been added for this image.
self._image_ids[image_id] = False
def add_single_detected_image_info(self,
......@@ -120,9 +121,6 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
[num_boxes] containing detection scores for the boxes.
DetectionResultFields.detection_classes: integer numpy array of shape
[num_boxes] containing 1-indexed detection classes for the boxes.
DetectionResultFields.detection_masks: optional uint8 numpy array of
shape [num_boxes, image_height, image_width] containing instance
masks for the boxes.
Raises:
ValueError: If groundtruth for the image_id is not available.
......@@ -200,7 +198,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
all_metrics_per_category=self._all_metrics_per_category)
box_metrics.update(box_per_category_ap)
box_metrics = {'DetectionBoxes_'+ key: value
for key, value in box_metrics.iteritems()}
for key, value in iter(box_metrics.items())}
return box_metrics
def get_estimator_eval_metric_ops(self, image_id, groundtruth_boxes,
......@@ -282,6 +280,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
return np.float32(self._metrics[metric_name])
return value_func
# Ensure that the metrics are only evaluated once.
first_value_op = tf.py_func(first_value_func, [], tf.float32)
eval_metric_ops = {metric_names[0]: (first_value_op, update_op)}
with tf.control_dependencies([first_value_op]):
......@@ -292,7 +291,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
def _check_mask_type_and_value(array_name, masks):
"""Checks whether mask dtype is uint8 anf the values are either 0 or 1."""
"""Checks whether mask dtype is uint8 and the values are either 0 or 1."""
if masks.dtype != np.uint8:
raise ValueError('{} must be of type np.uint8. Found {}.'.format(
array_name, masks.dtype))
......@@ -334,6 +333,9 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
groundtruth_dict):
"""Adds groundtruth for a single image to be used for evaluation.
If the image has already been added, a warning is logged, and groundtruth is
ignored.
Args:
image_id: A unique string/integer identifier for the image.
groundtruth_dict: A dictionary containing -
......@@ -379,6 +381,9 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
detections_dict):
"""Adds detections for a single image to be used for evaluation.
If a detection has already been added for this image id, a warning is
logged, and the detection is skipped.
Args:
image_id: A unique string/integer identifier for the image.
detections_dict: A dictionary containing -
......@@ -435,25 +440,25 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
A dictionary holding -
1. summary_metrics:
'Precision/mAP': mean average precision over classes averaged over IOU
thresholds ranging from .5 to .95 with .05 increments
'Precision/mAP@.50IOU': mean average precision at 50% IOU
'Precision/mAP@.75IOU': mean average precision at 75% IOU
'Precision/mAP (small)': mean average precision for small objects
(area < 32^2 pixels)
'Precision/mAP (medium)': mean average precision for medium sized
objects (32^2 pixels < area < 96^2 pixels)
'Precision/mAP (large)': mean average precision for large objects
(96^2 pixels < area < 10000^2 pixels)
'Recall/AR@1': average recall with 1 detection
'Recall/AR@10': average recall with 10 detections
'Recall/AR@100': average recall with 100 detections
'Recall/AR@100 (small)': average recall for small objects with 100
detections
'Recall/AR@100 (medium)': average recall for medium objects with 100
detections
'Recall/AR@100 (large)': average recall for large objects with 100
detections
'DetectionMasks_Precision/mAP': mean average precision over classes
averaged over IOU thresholds ranging from .5 to .95 with .05 increments.
'DetectionMasks_Precision/mAP@.50IOU': mean average precision at 50% IOU.
'DetectionMasks_Precision/mAP@.75IOU': mean average precision at 75% IOU.
'DetectionMasks_Precision/mAP (small)': mean average precision for small
objects (area < 32^2 pixels).
'DetectionMasks_Precision/mAP (medium)': mean average precision for medium
sized objects (32^2 pixels < area < 96^2 pixels).
'DetectionMasks_Precision/mAP (large)': mean average precision for large
objects (96^2 pixels < area < 10000^2 pixels).
'DetectionMasks_Recall/AR@1': average recall with 1 detection.
'DetectionMasks_Recall/AR@10': average recall with 10 detections.
'DetectionMasks_Recall/AR@100': average recall with 100 detections.
'DetectionMasks_Recall/AR@100 (small)': average recall for small objects
with 100 detections.
'DetectionMasks_Recall/AR@100 (medium)': average recall for medium objects
with 100 detections.
'DetectionMasks_Recall/AR@100 (large)': average recall for large objects
with 100 detections.
2. per_category_ap: if include_metrics_per_category is True, category
specific results with keys of the form:
......@@ -482,3 +487,101 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
mask_metrics = {'DetectionMasks_'+ key: value
for key, value in mask_metrics.iteritems()}
return mask_metrics
def get_estimator_eval_metric_ops(self, image_id, groundtruth_boxes,
groundtruth_classes,
groundtruth_instance_masks,
detection_scores, detection_classes,
detection_masks):
"""Returns a dictionary of eval metric ops to use with `tf.EstimatorSpec`.
Note that once value_op is called, the detections and groundtruth added via
update_op are cleared.
Args:
image_id: Unique string/integer identifier for the image.
groundtruth_boxes: float32 tensor of shape [num_boxes, 4] containing
`num_boxes` groundtruth boxes of the format
[ymin, xmin, ymax, xmax] in absolute image coordinates.
groundtruth_classes: int32 tensor of shape [num_boxes] containing
1-indexed groundtruth classes for the boxes.
groundtruth_instance_masks: uint8 tensor array of shape
[num_boxes, image_height, image_width] containing groundtruth masks
corresponding to the boxes. The elements of the array must be in {0, 1}.
detection_scores: float32 tensor of shape [num_boxes] containing
detection scores for the boxes.
detection_classes: int32 tensor of shape [num_boxes] containing
1-indexed detection classes for the boxes.
detection_masks: uint8 tensor array of shape
[num_boxes, image_height, image_width] containing instance masks
corresponding to the boxes. The elements of the array must be in {0, 1}.
Returns:
a dictionary of metric names to tuple of value_op and update_op that can
be used as eval metric ops in tf.EstimatorSpec. Note that all update ops
must be run together and similarly all value ops must be run together to
guarantee correct behaviour.
"""
def update_op(
image_id,
groundtruth_boxes,
groundtruth_classes,
groundtruth_instance_masks,
detection_scores,
detection_classes,
detection_masks):
self.add_single_ground_truth_image_info(
image_id,
{'groundtruth_boxes': groundtruth_boxes,
'groundtruth_classes': groundtruth_classes,
'groundtruth_instance_masks': groundtruth_instance_masks})
self.add_single_detected_image_info(
image_id,
{'detection_scores': detection_scores,
'detection_classes': detection_classes,
'detection_masks': detection_masks})
update_op = tf.py_func(update_op, [image_id,
groundtruth_boxes,
groundtruth_classes,
groundtruth_instance_masks,
detection_scores,
detection_classes,
detection_masks], [])
metric_names = ['DetectionMasks_Precision/mAP',
'DetectionMasks_Precision/mAP@.50IOU',
'DetectionMasks_Precision/mAP@.75IOU',
'DetectionMasks_Precision/mAP (large)',
'DetectionMasks_Precision/mAP (medium)',
'DetectionMasks_Precision/mAP (small)',
'DetectionMasks_Recall/AR@1',
'DetectionMasks_Recall/AR@10',
'DetectionMasks_Recall/AR@100',
'DetectionMasks_Recall/AR@100 (large)',
'DetectionMasks_Recall/AR@100 (medium)',
'DetectionMasks_Recall/AR@100 (small)']
if self._include_metrics_per_category:
for category_dict in self._categories:
metric_names.append('DetectionMasks_PerformanceByCategory/mAP/' +
category_dict['name'])
def first_value_func():
self._metrics = self.evaluate()
self.clear()
return np.float32(self._metrics[metric_names[0]])
def value_func_factory(metric_name):
def value_func():
return np.float32(self._metrics[metric_name])
return value_func
# Ensure that the metrics are only evaluated once.
first_value_op = tf.py_func(first_value_func, [], tf.float32)
eval_metric_ops = {metric_names[0]: (first_value_op, update_op)}
with tf.control_dependencies([first_value_op]):
for metric_name in metric_names[1:]:
eval_metric_ops[metric_name] = (tf.py_func(
value_func_factory(metric_name), [], np.float32), update_op)
return eval_metric_ops
......@@ -403,5 +403,101 @@ class CocoMaskEvaluationTest(tf.test.TestCase):
self.assertFalse(coco_evaluator._detection_masks_list)
class CocoMaskEvaluationPyFuncTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetections(self):
category_list = [{'id': 0, 'name': 'person'},
{'id': 1, 'name': 'cat'},
{'id': 2, 'name': 'dog'}]
coco_evaluator = coco_evaluation.CocoMaskEvaluator(category_list)
image_id = tf.placeholder(tf.string, shape=())
groundtruth_boxes = tf.placeholder(tf.float32, shape=(None, 4))
groundtruth_classes = tf.placeholder(tf.float32, shape=(None))
groundtruth_masks = tf.placeholder(tf.uint8, shape=(None, None, None))
detection_scores = tf.placeholder(tf.float32, shape=(None))
detection_classes = tf.placeholder(tf.float32, shape=(None))
detection_masks = tf.placeholder(tf.uint8, shape=(None, None, None))
eval_metric_ops = coco_evaluator.get_estimator_eval_metric_ops(
image_id, groundtruth_boxes,
groundtruth_classes,
groundtruth_masks,
detection_scores,
detection_classes,
detection_masks)
_, update_op = eval_metric_ops['DetectionMasks_Precision/mAP']
with self.test_session() as sess:
sess.run(update_op,
feed_dict={
image_id: 'image1',
groundtruth_boxes: np.array([[100., 100., 200., 200.]]),
groundtruth_classes: np.array([1]),
groundtruth_masks: np.pad(np.ones([1, 100, 100],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant'),
detection_scores: np.array([.8]),
detection_classes: np.array([1]),
detection_masks: np.pad(np.ones([1, 100, 100],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant')
})
sess.run(update_op,
feed_dict={
image_id: 'image2',
groundtruth_boxes: np.array([[50., 50., 100., 100.]]),
groundtruth_classes: np.array([1]),
groundtruth_masks: np.pad(np.ones([1, 50, 50],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant'),
detection_scores: np.array([.8]),
detection_classes: np.array([1]),
detection_masks: np.pad(np.ones([1, 50, 50], dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant')
})
sess.run(update_op,
feed_dict={
image_id: 'image3',
groundtruth_boxes: np.array([[25., 25., 50., 50.]]),
groundtruth_classes: np.array([1]),
groundtruth_masks: np.pad(np.ones([1, 25, 25],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant'),
detection_scores: np.array([.8]),
detection_classes: np.array([1]),
detection_masks: np.pad(np.ones([1, 25, 25],
dtype=np.uint8),
((0, 0), (10, 10), (10, 10)),
mode='constant')
})
metrics = {}
for key, (value_op, _) in eval_metric_ops.iteritems():
metrics[key] = value_op
metrics = sess.run(metrics)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP@.50IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP@.75IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Precision/mAP (small)'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@1'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@10'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100 (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100 (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionMasks_Recall/AR@100 (small)'], 1.0)
self.assertFalse(coco_evaluator._groundtruth_list)
self.assertFalse(coco_evaluator._image_ids_with_detections)
self.assertFalse(coco_evaluator._image_id_to_mask_shape_map)
self.assertFalse(coco_evaluator._detection_masks_list)
if __name__ == '__main__':
tf.test.main()
......@@ -39,7 +39,6 @@ from object_detection import model_hparams
from object_detection.builders import model_builder
from object_detection.builders import optimizer_builder
from object_detection.core import standard_fields as fields
from object_detection.metrics import coco_evaluation
from object_detection.utils import config_util
from object_detection.utils import label_map_util
from object_detection.utils import shape_utils
......@@ -121,8 +120,8 @@ def unstack_batch(tensor_dict, unpad_groundtruth_tensors=True):
2. [batch_size, height, width, channels]
3. [batch_size, num_boxes, d1, d2, ... dn]
When unpad_tensors is set to true, unstacked tensors of form 3 above are
sliced along the `num_boxes` dimension using the value in tensor
When unpad_groundtruth_tensors is set to true, unstacked tensors of form 3
above are sliced along the `num_boxes` dimension using the value in tensor
field.InputDataFields.num_groundtruth_boxes.
Note that this function has a static list of input data fields and has to be
......@@ -198,6 +197,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
"""
train_config = configs['train_config']
eval_input_config = configs['eval_input_config']
eval_config = configs['eval_config']
def model_fn(features, labels, mode, params=None):
"""Constructs the object detection model.
......@@ -250,9 +250,25 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
prediction_dict, features[fields.InputDataFields.true_image_shape])
if mode == tf.estimator.ModeKeys.TRAIN:
if not train_config.fine_tune_checkpoint_type:
# train_config.from_detection_checkpoint field is deprecated. For
# backward compatibility, sets finetune_checkpoint_type based on
# from_detection_checkpoint.
if train_config.from_detection_checkpoint:
train_config.fine_tune_checkpoint_type = 'detection'
else:
train_config.fine_tune_checkpoint_type = 'classification'
if train_config.fine_tune_checkpoint and hparams.load_pretrained:
if not train_config.fine_tune_checkpoint_type:
# train_config.from_detection_checkpoint field is deprecated. For
# backward compatibility, set train_config.fine_tune_checkpoint_type
# based on train_config.from_detection_checkpoint.
if train_config.from_detection_checkpoint:
train_config.fine_tune_checkpoint_type = 'detection'
else:
train_config.fine_tune_checkpoint_type = 'classification'
asg_map = detection_model.restore_map(
from_detection_checkpoint=train_config.from_detection_checkpoint,
fine_tune_checkpoint_type=train_config.fine_tune_checkpoint_type,
load_all_detection_checkpoint_vars=(
train_config.load_all_detection_checkpoint_vars))
available_var_map = (
......@@ -273,6 +289,15 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
losses_dict = detection_model.loss(
prediction_dict, features[fields.InputDataFields.true_image_shape])
losses = [loss_tensor for loss_tensor in losses_dict.itervalues()]
if train_config.add_regularization_loss:
regularization_losses = tf.get_collection(
tf.GraphKeys.REGULARIZATION_LOSSES)
if regularization_losses:
regularization_loss = tf.add_n(regularization_losses,
name='regularization_loss')
losses.append(regularization_loss)
if not use_tpu:
tf.summary.scalar('regularization_loss', regularization_loss)
total_loss = tf.add_n(losses, name='total_loss')
if mode == tf.estimator.ModeKeys.TRAIN:
......@@ -321,8 +346,12 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
class_agnostic = (fields.DetectionResultFields.detection_classes
not in detections)
groundtruth = _get_groundtruth_data(detection_model, class_agnostic)
use_original_images = fields.InputDataFields.original_image in features
eval_images = (
features[fields.InputDataFields.original_image] if use_original_images
else features[fields.InputDataFields.image])
eval_dict = eval_util.result_dict_for_single_example(
tf.expand_dims(features[fields.InputDataFields.original_image][0], 0),
eval_images[0:1],
features[inputs.HASH_KEY][0],
detections,
groundtruth,
......@@ -334,7 +363,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
else:
category_index = label_map_util.create_category_index_from_labelmap(
eval_input_config.label_map_path)
if not use_tpu:
if not use_tpu and use_original_images:
detection_and_groundtruth = (
vis_utils.draw_side_by_side_evaluation_image(
eval_dict, category_index, max_boxes_to_draw=20,
......@@ -343,17 +372,12 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
detection_and_groundtruth)
# Eval metrics on a single image.
detection_fields = fields.DetectionResultFields()
input_data_fields = fields.InputDataFields()
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
category_index.values())
eval_metric_ops = coco_evaluator.get_estimator_eval_metric_ops(
image_id=eval_dict[input_data_fields.key],
groundtruth_boxes=eval_dict[input_data_fields.groundtruth_boxes],
groundtruth_classes=eval_dict[input_data_fields.groundtruth_classes],
detection_boxes=eval_dict[detection_fields.detection_boxes],
detection_scores=eval_dict[detection_fields.detection_scores],
detection_classes=eval_dict[detection_fields.detection_classes])
eval_metrics = eval_config.metrics_set
if not eval_metrics:
eval_metrics = ['coco_detection_metrics']
eval_metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
eval_metrics, category_index.values(), eval_dict,
include_metrics_per_category=False)
if use_tpu:
return tf.contrib.tpu.TPUEstimatorSpec(
......@@ -376,7 +400,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
return model_fn
def _build_experiment_fn(train_steps, eval_steps):
def build_experiment_fn(train_steps, eval_steps):
"""Returns a function that creates an `Experiment`."""
def build_experiment(run_config, hparams):
......@@ -509,8 +533,8 @@ def main(unused_argv):
tf.flags.mark_flag_as_required('pipeline_config_path')
config = tf.contrib.learn.RunConfig(model_dir=FLAGS.model_dir)
learn_runner.run(
experiment_fn=_build_experiment_fn(FLAGS.num_train_steps,
FLAGS.num_eval_steps),
experiment_fn=build_experiment_fn(FLAGS.num_train_steps,
FLAGS.num_eval_steps),
run_config=config,
hparams=model_hparams.create_hparams())
......
......@@ -49,6 +49,6 @@ def BuildExperiment():
hparams_overrides='load_pretrained=false')
# pylint: disable=protected-access
experiment_fn = model._build_experiment_fn(10, 10)
experiment_fn = model.build_experiment_fn(10, 10)
# pylint: enable=protected-access
return experiment_fn(run_config, hparams)
......@@ -105,12 +105,10 @@ class FasterRCNNInceptionResnetV2FeatureExtractor(
is_training=self._train_batch_norm):
with tf.variable_scope('InceptionResnetV2',
reuse=self._reuse_weights) as scope:
rpn_feature_map, _ = (
inception_resnet_v2.inception_resnet_v2_base(
preprocessed_inputs, final_endpoint='PreAuxLogits',
scope=scope, output_stride=self._first_stage_features_stride,
align_feature_maps=True))
return rpn_feature_map
return inception_resnet_v2.inception_resnet_v2_base(
preprocessed_inputs, final_endpoint='PreAuxLogits',
scope=scope, output_stride=self._first_stage_features_stride,
align_feature_maps=True)
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......
......@@ -35,7 +35,7 @@ class FasterRcnnInceptionResnetV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 299, 299, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -50,7 +50,7 @@ class FasterRcnnInceptionResnetV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[1, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -65,7 +65,7 @@ class FasterRcnnInceptionResnetV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
......@@ -109,6 +109,9 @@ class FasterRCNNInceptionV2FeatureExtractor(
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping feature extractor tensor names to
tensors
Raises:
InvalidArgumentError: If the spatial size of `preprocessed_inputs`
(height or width) is less than 33.
......@@ -134,7 +137,7 @@ class FasterRCNNInceptionV2FeatureExtractor(
depth_multiplier=self._depth_multiplier,
scope=scope)
return activations['Mixed_4e']
return activations['Mixed_4e'], activations
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......
......@@ -36,7 +36,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -51,7 +51,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -66,7 +66,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......@@ -84,7 +84,7 @@ class FasterRcnnInceptionV2FeatureExtractorTest(tf.test.TestCase):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Mobilenet v1 Faster R-CNN implementation."""
import tensorflow as tf
from object_detection.meta_architectures import faster_rcnn_meta_arch
from nets import mobilenet_v1
slim = tf.contrib.slim
def _batch_norm_arg_scope(list_ops,
use_batch_norm=True,
batch_norm_decay=0.9997,
batch_norm_epsilon=0.001,
batch_norm_scale=False,
train_batch_norm=False):
"""Slim arg scope for Mobilenet V1 batch norm."""
if use_batch_norm:
batch_norm_params = {
'is_training': train_batch_norm,
'scale': batch_norm_scale,
'decay': batch_norm_decay,
'epsilon': batch_norm_epsilon
}
normalizer_fn = slim.batch_norm
else:
normalizer_fn = None
batch_norm_params = None
return slim.arg_scope(list_ops,
normalizer_fn=normalizer_fn,
normalizer_params=batch_norm_params)
class FasterRCNNMobilenetV1FeatureExtractor(
faster_rcnn_meta_arch.FasterRCNNFeatureExtractor):
"""Faster R-CNN Mobilenet V1 feature extractor implementation."""
def __init__(self,
is_training,
first_stage_features_stride,
batch_norm_trainable=False,
reuse_weights=None,
weight_decay=0.0,
depth_multiplier=1.0,
min_depth=16):
"""Constructor.
Args:
is_training: See base class.
first_stage_features_stride: See base class.
batch_norm_trainable: See base class.
reuse_weights: See base class.
weight_decay: See base class.
depth_multiplier: float depth multiplier for feature extractor.
min_depth: minimum feature extractor depth.
Raises:
ValueError: If `first_stage_features_stride` is not 8 or 16.
"""
if first_stage_features_stride != 8 and first_stage_features_stride != 16:
raise ValueError('`first_stage_features_stride` must be 8 or 16.')
self._depth_multiplier = depth_multiplier
self._min_depth = min_depth
super(FasterRCNNMobilenetV1FeatureExtractor, self).__init__(
is_training, first_stage_features_stride, batch_norm_trainable,
reuse_weights, weight_decay)
def preprocess(self, resized_inputs):
"""Faster R-CNN Mobilenet V1 preprocessing.
Maps pixel values to the range [-1, 1].
Args:
resized_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
"""
return (2.0 / 255.0) * resized_inputs - 1.0
def _extract_proposal_features(self, preprocessed_inputs, scope):
"""Extracts first stage RPN features.
Args:
preprocessed_inputs: A [batch, height, width, channels] float32 tensor
representing a batch of images.
scope: A scope name.
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping feature extractor tensor names to
tensors
Raises:
InvalidArgumentError: If the spatial size of `preprocessed_inputs`
(height or width) is less than 33.
ValueError: If the created network is missing the required activation.
"""
preprocessed_inputs.get_shape().assert_has_rank(4)
shape_assert = tf.Assert(
tf.logical_and(tf.greater_equal(tf.shape(preprocessed_inputs)[1], 33),
tf.greater_equal(tf.shape(preprocessed_inputs)[2], 33)),
['image size must at least be 33 in both height and width.'])
with tf.control_dependencies([shape_assert]):
with tf.variable_scope('MobilenetV1',
reuse=self._reuse_weights) as scope:
with _batch_norm_arg_scope([slim.conv2d, slim.separable_conv2d],
batch_norm_scale=True,
train_batch_norm=self._train_batch_norm):
_, activations = mobilenet_v1.mobilenet_v1_base(
preprocessed_inputs,
final_endpoint='Conv2d_13_pointwise',
min_depth=self._min_depth,
depth_multiplier=self._depth_multiplier,
scope=scope)
return activations['Conv2d_13_pointwise'], activations
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
Args:
proposal_feature_maps: A 4-D float tensor with shape
[batch_size * self.max_num_proposals, crop_height, crop_width, depth]
representing the feature map cropped to each proposal.
scope: A scope name (unused).
Returns:
proposal_classifier_features: A 4-D float tensor with shape
[batch_size * self.max_num_proposals, height, width, depth]
representing box classifier features for each proposal.
"""
net = proposal_feature_maps
depth = lambda d: max(int(d * 1.0), 16)
with tf.variable_scope('MobilenetV1', reuse=self._reuse_weights):
with _batch_norm_arg_scope([slim.conv2d, slim.separable_conv2d],
batch_norm_scale=True,
train_batch_norm=self._train_batch_norm):
with slim.arg_scope(
[slim.conv2d, slim.separable_conv2d], padding='SAME'):
net = slim.separable_conv2d(
net,
depth(1024), [3, 3],
depth_multiplier=1,
stride=2,
scope='Conv2d_12_pointwise')
return slim.separable_conv2d(
net,
depth(1024), [3, 3],
depth_multiplier=1,
stride=1,
scope='Conv2d_13_pointwise')
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for faster_rcnn_mobilenet_v1_feature_extractor."""
import numpy as np
import tensorflow as tf
from object_detection.models import faster_rcnn_mobilenet_v1_feature_extractor as faster_rcnn_mobilenet_v1
class FasterRcnnMobilenetV1FeatureExtractorTest(tf.test.TestCase):
def _build_feature_extractor(self, first_stage_features_stride):
return faster_rcnn_mobilenet_v1.FasterRCNNMobilenetV1FeatureExtractor(
is_training=False,
first_stage_features_stride=first_stage_features_stride,
batch_norm_trainable=False,
reuse_weights=None,
weight_decay=0.0)
def test_extract_proposal_features_returns_expected_size(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [4, 7, 7, 1024])
def test_extract_proposal_features_stride_eight(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [4, 7, 7, 1024])
def test_extract_proposal_features_half_size_input(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [1, 4, 4, 1024])
def test_extract_proposal_features_dies_on_invalid_stride(self):
with self.assertRaises(ValueError):
self._build_feature_extractor(first_stage_features_stride=99)
def test_extract_proposal_features_dies_on_very_small_images(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
with self.assertRaises(tf.errors.InvalidArgumentError):
sess.run(
features_shape,
feed_dict={preprocessed_inputs: np.random.rand(4, 32, 32, 3)})
def test_extract_proposal_features_dies_with_incorrect_rank_inputs(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[224, 224, 3], maxval=255, dtype=tf.float32)
with self.assertRaises(ValueError):
feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
def test_extract_box_classifier_features_returns_expected_size(self):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
proposal_feature_maps = tf.random_uniform(
[3, 14, 14, 576], maxval=255, dtype=tf.float32)
proposal_classifier_features = (
feature_extractor.extract_box_classifier_features(
proposal_feature_maps, scope='TestScope'))
features_shape = tf.shape(proposal_classifier_features)
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
features_shape_out = sess.run(features_shape)
self.assertAllEqual(features_shape_out, [3, 7, 7, 1024])
if __name__ == '__main__':
tf.test.main()
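Because the file ends with tf.test.main(), these tests run as a standalone script (e.g. python object_detection/models/faster_rcnn_mobilenet_v1_feature_extractor_test.py, assuming the models/research layout implied by the import above).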
@@ -171,6 +171,8 @@ class FasterRCNNNASFeatureExtractor(
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
end_points: A dictionary mapping feature extractor tensor names to tensors
Raises:
ValueError: If the created network is missing the required activation.
"""
@@ -202,7 +204,7 @@ class FasterRCNNNASFeatureExtractor(
rpn_feature_map_shape = [batch] + shape_without_batch
rpn_feature_map.set_shape(rpn_feature_map_shape)
return rpn_feature_map
return rpn_feature_map, end_points
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
@@ -231,9 +233,11 @@ class FasterRCNNNASFeatureExtractor(
# Note that what follows is largely a copy of build_nasnet_large() within
# nasnet.py. We are copying to minimize code pollution in slim.
# pylint: disable=protected-access
hparams = nasnet._large_imagenet_config(is_training=self._is_training)
# pylint: enable=protected-access
# TODO(shlens,skornblith): Determine the appropriate drop path schedule.
# For now the schedule is the default (1.0->0.7 over 250,000 train steps).
hparams = nasnet.large_imagenet_config()
if not self._is_training:
hparams.set_hparam('drop_path_keep_prob', 1.0)
# Calculate the total number of cells in the network
# -- Add 2 for the reduction cells.
......
@@ -35,7 +35,7 @@ class FasterRcnnNASFeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 299, 299, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
@@ -50,7 +50,7 @@ class FasterRcnnNASFeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
@@ -65,7 +65,7 @@ class FasterRcnnNASFeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
@@ -95,6 +95,9 @@ class FasterRCNNResnetV1FeatureExtractor(
Returns:
rpn_feature_map: A tensor with shape [batch, height, width, depth]
activations: A dictionary mapping feature extractor tensor names to
tensors
Raises:
InvalidArgumentError: If the spatial size of `preprocessed_inputs`
(height or width) is less than 33.
@@ -130,7 +133,7 @@ class FasterRCNNResnetV1FeatureExtractor(
scope=var_scope)
handle = scope + '/%s/block3' % self._architecture
return activations[handle]
return activations[handle], activations
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features.
......
@@ -47,7 +47,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16, architecture=architecture)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
@@ -62,7 +62,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=8)
preprocessed_inputs = tf.random_uniform(
[4, 224, 224, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
@@ -77,7 +77,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
first_stage_features_stride=16)
preprocessed_inputs = tf.random_uniform(
[1, 112, 112, 3], maxval=255, dtype=tf.float32)
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
@@ -95,7 +95,7 @@ class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase):
feature_extractor = self._build_feature_extractor(
first_stage_features_stride=16)
preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
rpn_feature_map = feature_extractor.extract_proposal_features(
rpn_feature_map, _ = feature_extractor.extract_proposal_features(
preprocessed_inputs, scope='TestScope')
features_shape = tf.shape(rpn_feature_map)
......
@@ -100,11 +100,13 @@ class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
'use_depthwise': self._use_depthwise,
}
with slim.arg_scope(self._conv_hyperparams):
# TODO(skligys): Enable fused batch norm once quantization supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
with tf.variable_scope('MobilenetV1',
reuse=self._reuse_weights) as scope:
with tf.variable_scope('MobilenetV1',
reuse=self._reuse_weights) as scope:
with slim.arg_scope(
mobilenet_v1.mobilenet_v1_arg_scope(
is_training=(self._batch_norm_trainable and self._is_training))):
# TODO(skligys): Enable fused batch norm once quantization supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
_, image_features = mobilenet_v1.mobilenet_v1_base(
ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
final_endpoint='Conv2d_13_pointwise',
@@ -112,6 +114,9 @@ class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
depth_multiplier=self._depth_multiplier,
use_explicit_padding=self._use_explicit_padding,
scope=scope)
with slim.arg_scope(self._conv_hyperparams):
# TODO(skligys): Enable fused batch norm once quantization supports it.
with slim.arg_scope([slim.batch_norm], fused=False):
feature_maps = feature_map_generators.multi_resolution_feature_maps(
feature_map_layout=feature_map_layout,
depth_multiplier=self._depth_multiplier,
......
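In the SSD MobileNet V1 hunk above, the scopes stack as follows; this is a sketch of the structure visible in the added lines (indentation and the elided call arguments are not shown in the hunk itself):

# Backbone features:
#   tf.variable_scope('MobilenetV1', reuse=...)
#     slim.arg_scope(mobilenet_v1.mobilenet_v1_arg_scope(is_training=...))
#       slim.arg_scope([slim.batch_norm], fused=False)
#         mobilenet_v1.mobilenet_v1_base(...)
#
# Extra SSD feature maps:
#   slim.arg_scope(self._conv_hyperparams)
#     slim.arg_scope([slim.batch_norm], fused=False)
#       feature_map_generators.multi_resolution_feature_maps(...)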