Commit 1efe98bb authored by Zhichao Lu, committed by lzc5123016

Merged commit includes the following changes:

185215255  by Zhichao Lu:

    Stop populating image/object/class/text field when generating COCO tf record.

--
185213306  by Zhichao Lu:

    Use the params batch size and not the one from train_config in input_fn

--
185209081  by Zhichao Lu:

    Handle the case when there are no ground-truth masks for an image.

--
185195531  by Zhichao Lu:

    Remove unstack and stack operations on features from third_party/object_detection/model.py.

--
185195017  by Zhichao Lu:

    Matrix multiplication based gather op implementation.
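
    Gathering rows can be expressed as a matrix product with a one-hot
    indicator matrix, which avoids gather ops that some accelerator
    backends handle poorly. A minimal sketch of the idea behind the new
    ops.matmul_gather_on_zeroth_axis helper used by Match below:

        import tensorflow as tf

        def matmul_gather_on_zeroth_axis(params, indices):
          # Flatten params to 2-D, select rows via a one-hot matmul, then
          # restore trailing dims; equivalent to tf.gather(params, indices).
          params_shape = tf.shape(params)
          params2d = tf.reshape(params, [params_shape[0], -1])
          indicator = tf.one_hot(indices, depth=params_shape[0],
                                 dtype=params2d.dtype)
          gathered = tf.matmul(indicator, params2d)
          out_shape = tf.concat([tf.shape(indices), params_shape[1:]], axis=0)
          return tf.reshape(gathered, out_shape)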

--
185187744  by Zhichao Lu:

    Fix a minor issue in eval_util.

--
185098733  by Zhichao Lu:

    Internal change

--
185076656  by Zhichao Lu:

    Increase the number of boxes for coco17.

--
185074199  by Zhichao Lu:

    Add config for SSD Resnet50 v1 with FPN.

--
185060199  by Zhichao Lu:

    Fix a bug in clear_detections.
    This method was setting detection_keys to an empty dictionary instead of an empty set. I've refactored so that this method and the constructor use the same code path.
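
    A minimal sketch of the refactor (class and method names illustrative):

        class ObjectDetectionEvaluation(object):

          def __init__(self):
            self._initialize_detections()

          def _initialize_detections(self):
            # Single code path shared by the constructor and clear_detections.
            self.detection_keys = set()

          def clear_detections(self):
            # Previously assigned {} here, i.e. an empty dict, not a set.
            self._initialize_detections()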

--
185031359  by Zhichao Lu:

    Eval TPU trained models continuously.
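
    A hedged sketch of the loop; estimator, eval_input_fn and train_dir
    are assumed to be defined elsewhere:

        import tensorflow as tf

        # Yields each new checkpoint path as training writes them; gives up
        # after an hour without a new checkpoint.
        for ckpt_path in tf.contrib.training.checkpoints_iterator(
            train_dir, timeout=3600):
          metrics = estimator.evaluate(
              input_fn=eval_input_fn, checkpoint_path=ckpt_path)
          tf.logging.info('Eval results at %s: %s', ckpt_path, metrics)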

--
185016591  by Zhichao Lu:

    Use TPUEstimatorSpec for TPU
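
    A minimal sketch of the switch (toy model; the real model_fn is much
    larger):

        import tensorflow as tf
        from tensorflow.contrib import tpu

        def model_fn(features, labels, mode, params):
          # Toy linear model so the sketch is self-contained.
          logits = tf.layers.dense(features['x'], 1)
          loss = tf.losses.mean_squared_error(labels, logits)
          optimizer = tf.train.GradientDescentOptimizer(0.01)
          if params.get('use_tpu'):
            # Aggregate gradients across TPU shards.
            optimizer = tpu.CrossShardOptimizer(optimizer)
          train_op = optimizer.minimize(
              loss, global_step=tf.train.get_global_step())
          if params.get('use_tpu'):
            # TPUEstimator expects a TPUEstimatorSpec, not an EstimatorSpec.
            return tpu.TPUEstimatorSpec(mode=mode, loss=loss,
                                        train_op=train_op)
          return tf.estimator.EstimatorSpec(mode=mode, loss=loss,
                                            train_op=train_op)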

--
185013651  by Zhichao Lu:

    Add PreprocessorCache to record and duplicate augmentations.
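
    Usage sketch, mirroring the new preprocessor test below: running
    preprocess() twice with the same cache replays identical random
    augmentations.

        import tensorflow as tf
        from object_detection.core import preprocessor
        from object_detection.core import preprocessor_cache
        from object_detection.core import standard_fields as fields

        cache = preprocessor_cache.PreprocessorCache()
        options = [(preprocessor.random_horizontal_flip, {})]
        arg_map = preprocessor.get_default_func_arg_map()
        tensor_dict = {
            fields.InputDataFields.image: tf.random_uniform([1, 32, 32, 3])}
        out1 = preprocessor.preprocess(tensor_dict, options, arg_map, cache)
        out2 = preprocessor.preprocess(tensor_dict, options, arg_map, cache)
        # Both outputs apply the same random flip decision.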

--
184921763  by Zhichao Lu:

    Minor fixes for object detection.

--
184920610  by Zhichao Lu:

    Adds a model builder test for "embedded_ssd_mobilenet_v1" feature extractor.

--
184919284  by Zhichao Lu:

    Added unit tests for TPU, with optional training / eval.

--
184915910  by Zhichao Lu:

    Update third_party g3 doc with Mask RCNN detection models.

--
184914085  by Zhichao Lu:

    Slight change to the WeightSharedConvolutionalBoxPredictor implementation to match RetinaNet more closely. Specifically, we now construct the box encoding and class predictor towers separately rather than having them share weights until the penultimate layer.

--
184913786  by Zhichao Lu:

    Plumbs SSD Resnet V1 with FPN models into model builder.

--
184910030  by Zhichao Lu:

    Add coco metrics to evaluator.

--
184897758  by Zhichao Lu:

    Merge changes from github.

--
184888736  by Zhichao Lu:

    Ensure groundtruth_weights are always 1-D.

--
184887256  by Zhichao Lu:

    Introduce an option to control adding summaries in the model, so summaries can be turned off when necessary.

--
184865559  by Zhichao Lu:

    Updating inputs so that a dictionary of tensors is returned from input_fn, and moving unbatch/unpad to model.py.
    Also removing the source_id key from the features dictionary and replacing it with an integer hash.
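
    A hedged sketch of the replacement (dict names and bucket count are
    illustrative):

        import tensorflow as tf

        HASH_BINS = 1 << 31  # illustrative number of hash buckets

        # Replace the string source_id with a deterministic integer hash.
        features['hash'] = tf.string_to_hash_bucket_fast(
            tensor_dict['source_id'], HASH_BINS)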

--
184859205  by Zhichao Lu:

    This CL tries to hide those differences by making the default settings work with the public code.

--
184769779  by Zhichao Lu:

    Pass groundtruth weights into ssd meta architecture all the way to target assigner.

    This will allow training ssd models with padded groundtruth tensors.

--
184767117  by Zhichao Lu:

    * Add `params` arg to make all input fns work with TPUEstimator
    * Add --master
    * Output eval results

--
184766244  by Zhichao Lu:

    Update create_coco_tf_record to include category indices

--
184752937  by Zhichao Lu:

    Create a third_party version of TPU compatible mobilenet_v2_focal_loss coco config.

--
184750174  by Zhichao Lu:

    A few small fixes for multiscale anchor generator and a test.

--
184746581  by Zhichao Lu:

    Update the Jupyter notebook to show masks if the model provides them.

--
184728646  by Zhichao Lu:

    Adding a few more tests to make sure decoding with/without label maps performs as expected.

--
184624154  by Zhichao Lu:

    Add an object detection binary for TPU.

--
184622118  by Zhichao Lu:

    Batch, transform, and unbatch in the tflearn interface.

--
184595064  by Zhichao Lu:

    Add support for training grayscale models.

--
184532026  by Zhichao Lu:

    Change dataset_builder.build to perform optional batching using tf.data.Dataset API
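
    A minimal sketch of the idea (the function shape is illustrative, not
    the exact builder signature):

        import tensorflow as tf

        def build(file_pattern, decode_fn, batch_size=None):
          dataset = tf.data.TFRecordDataset(tf.gfile.Glob(file_pattern))
          dataset = dataset.map(decode_fn)
          if batch_size is not None:
            # Pad variable-length tensors (e.g. boxes) so examples stack.
            dataset = dataset.padded_batch(
                batch_size, padded_shapes=dataset.output_shapes)
          return dataset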

--
184330239  by Zhichao Lu:

    Add augment_input_data and transform_input_data helper functions to third_party/tensorflow_models/object_detection/inputs.py

--
184328681  by Zhichao Lu:

    Use an internal rgb to gray method that can be quantized.
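
    A sketch of the idea: build the conversion from plain multiply/add ops
    (which quantize cleanly) instead of calling tf.image.rgb_to_grayscale,
    using the same ITU-R luma weights:

        import tensorflow as tf

        def _rgb_to_grayscale(images):
          # Weighted sum over the channel axis, using only quantizable ops.
          rgb_weights = [0.2989, 0.5870, 0.1140]
          return tf.reduce_sum(images * rgb_weights, axis=-1, keep_dims=True)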

--
184327909  by Zhichao Lu:

    Helper function to return padding shapes to use with Dataset.padded_batch.
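
    A hedged sketch of such a helper (key names and sizes illustrative):

        def get_padding_shapes(height, width, max_num_boxes):
          # Static shapes for Dataset.padded_batch, keyed like the decoded
          # feature dictionary.
          return {
              'image': [height, width, 3],
              'groundtruth_boxes': [max_num_boxes, 4],
              'groundtruth_classes': [max_num_boxes],
              'num_groundtruth_boxes': [],  # scalar, no padding needed
          }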

--
184326291  by Zhichao Lu:

    Added decode_func for specialized decoding.

--
184314676  by Zhichao Lu:

    Add unstack_batch method to inputs.py.

    This will enable us to convert batched tensors to lists of tensors, which matches the OD API convention of consuming the groundtruth batch as a list of tensors.
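
    A minimal sketch (key names follow the fields added in this commit;
    the padded-box slicing is illustrative):

        import tensorflow as tf

        def unstack_batch(tensor_dict):
          # Convert each [batch_size, ...] tensor into a list of per-image
          # tensors, as the OD API consumes groundtruth as lists.
          unbatched = {key: tf.unstack(tensor)
                       for key, tensor in tensor_dict.items()}
          # Strip zero padding using the recorded true box counts.
          num_boxes = unbatched.pop('num_groundtruth_boxes')
          unbatched['groundtruth_boxes'] = [
              boxes[:num] for boxes, num in zip(
                  unbatched['groundtruth_boxes'], num_boxes)]
          return unbatched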

--
184281269  by Zhichao Lu:

    Internal test target changes.

--
184192851  by Zhichao Lu:

    Adding `Estimator` interface for object detection.

--
184187885  by Zhichao Lu:

    Add config_util functions to help with input pipeline.

    1. function to return expected shapes from the resizer config
    2. function to extract image_resizer_config from model_config.

--
184139892  by Zhichao Lu:

    Adding support for depthwise SSD (ssd-lite) and depthwise box predictions.

--
184089891  by Zhichao Lu:

    Fix third_party faster rcnn resnet101 coco config.

--
184083378  by Zhichao Lu:

    When there is no object/weights field in the tf.Example proto, return a default weight of 1.0 for all boxes.

--

PiperOrigin-RevId: 185215255
parent fbc5ba06
......@@ -123,6 +123,7 @@ py_library(
"matcher.py",
],
deps = [
"//tensorflow/models/research/object_detection/utils:ops",
],
)
......@@ -160,12 +161,20 @@ py_library(
":box_list",
":box_list_ops",
":keypoint_ops",
":preprocessor_cache",
":standard_fields",
"//tensorflow",
"//tensorflow/models/research/object_detection/utils:shape_utils",
],
)
py_library(
name = "preprocessor_cache",
srcs = [
"preprocessor_cache.py",
],
)
py_test(
name = "preprocessor_test",
srcs = [
......@@ -173,6 +182,7 @@ py_test(
],
deps = [
":preprocessor",
":preprocessor_cache",
"//tensorflow",
],
)
......
......@@ -582,7 +582,8 @@ class ConvolutionalBoxPredictor(BoxPredictor):
kernel_size,
box_code_size,
apply_sigmoid_to_scores=False,
class_prediction_bias_init=0.0):
class_prediction_bias_init=0.0,
use_depthwise=False):
"""Constructor.
Args:
......@@ -611,6 +612,8 @@ class ConvolutionalBoxPredictor(BoxPredictor):
class_predictions.
class_prediction_bias_init: constant value to initialize bias of the last
conv2d layer before class prediction.
use_depthwise: Whether to use depthwise convolutions for prediction
steps. Default is False.
Raises:
ValueError: if min_depth > max_depth.
......@@ -628,6 +631,7 @@ class ConvolutionalBoxPredictor(BoxPredictor):
self._dropout_keep_prob = dropout_keep_prob
self._apply_sigmoid_to_scores = apply_sigmoid_to_scores
self._class_prediction_bias_init = class_prediction_bias_init
self._use_depthwise = use_depthwise
def _predict(self, image_features, num_predictions_per_location_list):
"""Computes encoded object locations and corresponding confidences.
......@@ -683,15 +687,36 @@ class ConvolutionalBoxPredictor(BoxPredictor):
net, depth, [1, 1], scope='Conv2d_%d_1x1_%d' % (i, depth))
with slim.arg_scope([slim.conv2d], activation_fn=None,
normalizer_fn=None, normalizer_params=None):
if self._use_depthwise:
box_encodings = slim.separable_conv2d(
net, None, [self._kernel_size, self._kernel_size],
padding='SAME', depth_multiplier=1, stride=1,
rate=1, scope='BoxEncodingPredictor_depthwise')
box_encodings = slim.conv2d(
box_encodings,
num_predictions_per_location * self._box_code_size, [1, 1],
scope='BoxEncodingPredictor')
else:
box_encodings = slim.conv2d(
net, num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
scope='BoxEncodingPredictor')
if self._use_dropout:
net = slim.dropout(net, keep_prob=self._dropout_keep_prob)
if self._use_depthwise:
class_predictions_with_background = slim.separable_conv2d(
net, None, [self._kernel_size, self._kernel_size],
padding='SAME', depth_multiplier=1, stride=1,
rate=1, scope='ClassPredictor_depthwise')
class_predictions_with_background = slim.conv2d(
class_predictions_with_background,
num_predictions_per_location * num_class_slots,
[1, 1], scope='ClassPredictor')
else:
class_predictions_with_background = slim.conv2d(
net, num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size], scope='ClassPredictor',
[self._kernel_size, self._kernel_size],
scope='ClassPredictor',
biases_initializer=tf.constant_initializer(
self._class_prediction_bias_init))
if self._apply_sigmoid_to_scores:
......@@ -729,7 +754,8 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
Defines the box predictor as defined in
https://arxiv.org/abs/1708.02002. This class differs from
ConvolutionalBoxPredictor in that it shares weights and biases while
predicting from different feature maps.
predicting from different feature maps. Separate multi-layer towers are
constructed for the box encoding and class predictors respectively.
"""
def __init__(self,
......@@ -811,22 +837,35 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
with tf.variable_scope('WeightSharedConvolutionalBoxPredictor',
reuse=tf.AUTO_REUSE):
num_class_slots = self.num_classes + 1
net = image_feature
box_encodings_net = image_feature
class_predictions_net = image_feature
with slim.arg_scope(self._conv_hyperparams):
for i in range(self._num_layers_before_predictor):
net = slim.conv2d(net,
box_encodings_net = slim.conv2d(
box_encodings_net,
self._depth,
[self._kernel_size, self._kernel_size],
stride=1,
padding='SAME',
scope='conv2d_{}'.format(i))
scope='BoxEncodingPredictionTower/conv2d_{}'.format(i))
box_encodings = slim.conv2d(
net, num_predictions_per_location * self._box_code_size,
box_encodings_net,
num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
activation_fn=None, stride=1, padding='SAME',
scope='BoxEncodingPredictor')
for i in range(self._num_layers_before_predictor):
class_predictions_net = slim.conv2d(
class_predictions_net,
self._depth,
[self._kernel_size, self._kernel_size],
stride=1,
padding='SAME',
scope='ClassPredictionTower/conv2d_{}'.format(i))
class_predictions_with_background = slim.conv2d(
net, num_predictions_per_location * num_class_slots,
class_predictions_net,
num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size],
activation_fn=None, stride=1, padding='SAME',
biases_initializer=tf.constant_initializer(
......
......@@ -316,9 +316,69 @@ class ConvolutionalBoxPredictorTest(test_case.TestCase):
[tf.shape(box_encodings), tf.shape(objectness_predictions)],
feed_dict={image_features:
np.random.rand(4, resolution, resolution, 64)})
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 1, 4])
self.assertAllEqual(objectness_predictions_shape,
[4, expected_num_anchors, 1])
expected_variable_set = set([
'BoxPredictor/Conv2d_0_1x1_32/biases',
'BoxPredictor/Conv2d_0_1x1_32/weights',
'BoxPredictor/BoxEncodingPredictor/biases',
'BoxPredictor/BoxEncodingPredictor/weights',
'BoxPredictor/ClassPredictor/biases',
'BoxPredictor/ClassPredictor/weights'])
self.assertEqual(expected_variable_set, actual_variable_set)
def test_use_depthwise_convolution(self):
image_features = tf.placeholder(dtype=tf.float32, shape=[4, None, None, 64])
conv_box_predictor = box_predictor.ConvolutionalBoxPredictor(
is_training=False,
num_classes=0,
conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
min_depth=0,
max_depth=32,
num_layers_before_predictor=1,
dropout_keep_prob=0.8,
kernel_size=1,
box_code_size=4,
use_dropout=True,
use_depthwise=True
)
box_predictions = conv_box_predictor.predict(
[image_features], num_predictions_per_location=[5],
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
objectness_predictions = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
init_op = tf.global_variables_initializer()
resolution = 32
expected_num_anchors = resolution*resolution*5
with self.test_session() as sess:
sess.run(init_op)
(box_encodings_shape,
objectness_predictions_shape) = sess.run(
[tf.shape(box_encodings), tf.shape(objectness_predictions)],
feed_dict={image_features:
np.random.rand(4, resolution, resolution, 64)})
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 1, 4])
self.assertAllEqual(objectness_predictions_shape,
[4, expected_num_anchors, 1])
expected_variable_set = set([
'BoxPredictor/Conv2d_0_1x1_32/biases',
'BoxPredictor/Conv2d_0_1x1_32/weights',
'BoxPredictor/BoxEncodingPredictor_depthwise/biases',
'BoxPredictor/BoxEncodingPredictor_depthwise/depthwise_weights',
'BoxPredictor/BoxEncodingPredictor/biases',
'BoxPredictor/BoxEncodingPredictor/weights',
'BoxPredictor/ClassPredictor_depthwise/biases',
'BoxPredictor/ClassPredictor_depthwise/depthwise_weights',
'BoxPredictor/ClassPredictor/biases',
'BoxPredictor/ClassPredictor/weights'])
self.assertEqual(expected_variable_set, actual_variable_set)
class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
......@@ -440,14 +500,26 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 32, 32, 3], dtype=tf.float32))
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_0/weights',
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_0/biases',
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_1/weights',
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_1/biases',
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_0/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_0/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_1/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_1/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictor/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
......@@ -489,6 +561,5 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
self.assertAllEqual(objectness_predictions_shape,
[4, expected_num_anchors, 1])
if __name__ == '__main__':
tf.test.main()
......@@ -36,6 +36,8 @@ from abc import abstractmethod
import tensorflow as tf
from object_detection.utils import ops
class Match(object):
"""Class to store results from the matcher.
......@@ -44,7 +46,7 @@ class Match(object):
convenient methods to query the matching results.
"""
def __init__(self, match_results):
def __init__(self, match_results, use_matmul_gather=False):
"""Constructs a Match object.
Args:
......@@ -52,6 +54,8 @@ class Match(object):
meaning that column i is matched with row match_results[i].
(2) match_results[i]=-1, meaning that column i is not matched.
(3) match_results[i]=-2, meaning that column i is ignored.
use_matmul_gather: Use matrix multiplication based gather instead of
standard tf.gather. (Default: False).
Raises:
ValueError: if match_results does not have rank 1 or is not an
......@@ -63,6 +67,9 @@ class Match(object):
raise ValueError('match_results should be an int32 or int64 scalar '
'tensor')
self._match_results = match_results
self._gather_op = tf.gather
if use_matmul_gather:
self._gather_op = ops.matmul_gather_on_zeroth_axis
@property
def match_results(self):
......@@ -163,7 +170,7 @@ class Match(object):
row_indices: int32 tensor of shape [K] with row indices.
"""
return self._reshape_and_cast(
tf.gather(self._match_results, self.matched_column_indices()))
self._gather_op(self._match_results, self.matched_column_indices()))
def _reshape_and_cast(self, t):
return tf.cast(tf.reshape(t, [-1]), tf.int32)
......@@ -193,7 +200,7 @@ class Match(object):
input_tensor = tf.concat([tf.stack([ignored_value, unmatched_value]),
input_tensor], axis=0)
gather_indices = tf.maximum(self.match_results + 2, 0)
gathered_tensor = tf.gather(input_tensor, gather_indices)
gathered_tensor = self._gather_op(input_tensor, gather_indices)
return gathered_tensor
......@@ -202,6 +209,16 @@ class Matcher(object):
"""
__metaclass__ = ABCMeta
def __init__(self, use_matmul_gather=False):
"""Constructs a Matcher.
Args:
use_matmul_gather: Force constructed match objects to use matrix
multiplication based gather instead of standard tf.gather.
(Default: False).
"""
self._use_matmul_gather = use_matmul_gather
def match(self, similarity_matrix, scope=None, **params):
"""Computes matches among row and column indices and returns the result.
......@@ -219,7 +236,8 @@ class Matcher(object):
A Match object with the results of matching.
"""
with tf.name_scope(scope, 'Match', [similarity_matrix, params]) as scope:
return Match(self._match(similarity_matrix, **params))
return Match(self._match(similarity_matrix, **params),
self._use_matmul_gather)
@abstractmethod
def _match(self, similarity_matrix, **params):
......
......@@ -172,5 +172,21 @@ class MatchTest(tf.test.TestCase):
gathered_tensor_out = gathered_tensor.eval()
self.assertAllEqual(expected_gathered_tensor, gathered_tensor_out)
def test_multidimensional_gather_based_on_match_with_matmul_gather_op(self):
match_results = tf.constant([1, -1, -2])
input_tensor = tf.constant([[0, 0.5, 0, 0.5], [0, 0, 0.5, 0.5]],
dtype=tf.float32)
expected_gathered_tensor = [[0, 0, 0.5, 0.5], [0, 0, 0, 0], [0, 0, 0, 0]]
match = matcher.Match(match_results, use_matmul_gather=True)
gathered_tensor = match.gather_based_on_match(input_tensor,
unmatched_value=tf.zeros(4),
ignored_value=tf.zeros(4))
self.assertEqual(gathered_tensor.dtype, tf.float32)
with self.test_session() as sess:
self.assertTrue(
all(op.name != 'Gather' for op in sess.graph.get_operations()))
gathered_tensor_out = gathered_tensor.eval()
self.assertAllEqual(expected_gathered_tensor, gathered_tensor_out)
if __name__ == '__main__':
tf.test.main()
......@@ -236,7 +236,8 @@ class DetectionModel(object):
groundtruth_boxes_list,
groundtruth_classes_list,
groundtruth_masks_list=None,
groundtruth_keypoints_list=None):
groundtruth_keypoints_list=None,
groundtruth_weights_list=None):
"""Provide groundtruth tensors.
Args:
......@@ -257,10 +258,15 @@ class DetectionModel(object):
shape [num_boxes, num_keypoints, 2] containing keypoints.
Keypoints are assumed to be provided in normalized coordinates and
missing keypoints should be encoded as NaN.
groundtruth_weights_list: A list of 1-D tf.float32 tensors of shape
[num_boxes] containing weights for groundtruth boxes.
"""
self._groundtruth_lists[fields.BoxListFields.boxes] = groundtruth_boxes_list
self._groundtruth_lists[
fields.BoxListFields.classes] = groundtruth_classes_list
if groundtruth_weights_list:
self._groundtruth_lists[fields.BoxListFields.
weights] = groundtruth_weights_list
if groundtruth_masks_list:
self._groundtruth_lists[
fields.BoxListFields.masks] = groundtruth_masks_list
......
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Records previous preprocessing operations and allows them to be repeated.
Used with object_detection.core.preprocessor. Passing a PreprocessorCache
into individual data augmentation functions or the general preprocess() function
will store all randomly generated variables in the PreprocessorCache. When
a preprocessor function is called multiple times with the same
PreprocessorCache object, that function will perform the same augmentation
on all calls.
"""
from collections import defaultdict
class PreprocessorCache(object):
"""Dictionary wrapper storing random variables generated during preprocessing.
"""
# Constant keys representing different preprocessing functions
ROTATION90 = 'rotation90'
HORIZONTAL_FLIP = 'horizontal_flip'
VERTICAL_FLIP = 'vertical_flip'
PIXEL_VALUE_SCALE = 'pixel_value_scale'
IMAGE_SCALE = 'image_scale'
RGB_TO_GRAY = 'rgb_to_gray'
ADJUST_BRIGHTNESS = 'adjust_brightness'
ADJUST_CONTRAST = 'adjust_contrast'
ADJUST_HUE = 'adjust_hue'
ADJUST_SATURATION = 'adjust_saturation'
DISTORT_COLOR = 'distort_color'
STRICT_CROP_IMAGE = 'strict_crop_image'
CROP_IMAGE = 'crop_image'
PAD_IMAGE = 'pad_image'
CROP_TO_ASPECT_RATIO = 'crop_to_aspect_ratio'
RESIZE_METHOD = 'resize_method'
PAD_TO_ASPECT_RATIO = 'pad_to_aspect_ratio'
BLACK_PATCHES = 'black_patches'
ADD_BLACK_PATCH = 'add_black_patch'
SELECTOR = 'selector'
SELECTOR_TUPLES = 'selector_tuples'
SSD_CROP_SELECTOR_ID = 'ssd_crop_selector_id'
SSD_CROP_PAD_SELECTOR_ID = 'ssd_crop_pad_selector_id'
# 23 permitted function ids
_VALID_FNS = [ROTATION90, HORIZONTAL_FLIP, VERTICAL_FLIP, PIXEL_VALUE_SCALE,
IMAGE_SCALE, RGB_TO_GRAY, ADJUST_BRIGHTNESS, ADJUST_CONTRAST,
ADJUST_HUE, ADJUST_SATURATION, DISTORT_COLOR, STRICT_CROP_IMAGE,
CROP_IMAGE, PAD_IMAGE, CROP_TO_ASPECT_RATIO, RESIZE_METHOD,
PAD_TO_ASPECT_RATIO, BLACK_PATCHES, ADD_BLACK_PATCH, SELECTOR,
SELECTOR_TUPLES, SSD_CROP_SELECTOR_ID, SSD_CROP_PAD_SELECTOR_ID]
def __init__(self):
self._history = defaultdict(dict)
def clear(self):
"""Resets cache."""
self._history = defaultdict(dict)  # keep get/update working after clear
def get(self, function_id, key):
"""Gets stored value given a function id and key.
Args:
function_id: identifier for the preprocessing function used.
key: identifier for the variable stored.
Returns:
value: the corresponding value, expected to be a tensor or
nested structure of tensors.
Raises:
ValueError: if function_id is not one of the 23 valid function ids.
"""
if function_id not in self._VALID_FNS:
raise ValueError('Function id not recognized: %s.' % str(function_id))
return self._history[function_id].get(key)
def update(self, function_id, key, value):
"""Adds a value to the dictionary.
Args:
function_id: identifier for the preprocessing function used.
key: identifier for the variable stored.
value: the value to store, expected to be a tensor or nested structure
of tensors.
Raises:
ValueError: if function_id is not one of the 23 valid function ids.
"""
if function_id not in self._VALID_FNS:
raise ValueError('Function id not recognized: %s.' % str(function_id))
self._history[function_id][key] = value
......@@ -21,6 +21,7 @@ import six
import tensorflow as tf
from object_detection.core import preprocessor
from object_detection.core import preprocessor_cache
from object_detection.core import standard_fields as fields
if six.PY2:
......@@ -290,6 +291,15 @@ class PreprocessorTest(tf.test.TestCase):
def expectedLabelsAfterThresholdingWithMissingScore(self):
return tf.constant([2], dtype=tf.float32)
def testRgbToGrayscale(self):
images = self.createTestImages()
grayscale_images = preprocessor._rgb_to_grayscale(images)
expected_images = tf.image.rgb_to_grayscale(images)
with self.test_session() as sess:
(grayscale_images, expected_images) = sess.run(
[grayscale_images, expected_images])
self.assertAllEqual(expected_images, grayscale_images)
def testNormalizeImage(self):
preprocess_options = [(preprocessor.normalize_image, {
'original_minval': 0,
......@@ -435,6 +445,55 @@ class PreprocessorTest(tf.test.TestCase):
rotated_mask, expected_mask = sess.run([rotated_mask, expected_mask])
self.assertAllEqual(rotated_mask.flatten(), expected_mask.flatten())
def _testPreprocessorCache(self,
preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False,
num_runs=4):
cache = preprocessor_cache.PreprocessorCache()
images = self.createTestImages()
boxes = self.createTestBoxes()
classes = self.createTestLabels()
masks = self.createTestMasks()
keypoints = self.createTestKeypoints()
preprocessor_arg_map = preprocessor.get_default_func_arg_map(
include_instance_masks=test_masks, include_keypoints=test_keypoints)
out = []
for i in range(num_runs):
tensor_dict = {
fields.InputDataFields.image: images,
}
num_outputs = 1
if test_boxes:
tensor_dict[fields.InputDataFields.groundtruth_boxes] = boxes
tensor_dict[fields.InputDataFields.groundtruth_classes] = classes
num_outputs += 1
if test_masks:
tensor_dict[fields.InputDataFields.groundtruth_instance_masks] = masks
num_outputs += 1
if test_keypoints:
tensor_dict[fields.InputDataFields.groundtruth_keypoints] = keypoints
num_outputs += 1
out.append(preprocessor.preprocess(
tensor_dict, preprocess_options, preprocessor_arg_map, cache))
with self.test_session() as sess:
to_run = []
for i in range(num_runs):
to_run.append(out[i][fields.InputDataFields.image])
if test_boxes:
to_run.append(out[i][fields.InputDataFields.groundtruth_boxes])
if test_masks:
to_run.append(
out[i][fields.InputDataFields.groundtruth_instance_masks])
if test_keypoints:
to_run.append(out[i][fields.InputDataFields.groundtruth_keypoints])
out_array = sess.run(to_run)
for i in range(num_outputs, len(out_array)):
self.assertAllClose(out_array[i], out_array[i - num_outputs])
def testRandomHorizontalFlip(self):
preprocess_options = [(preprocessor.random_horizontal_flip, {})]
images = self.expectedImagesAfterNormalization()
......@@ -491,6 +550,16 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(boxes_, boxes_expected_)
self.assertAllClose(images_diff_, images_diff_expected_)
def testRandomHorizontalFlipWithCache(self):
keypoint_flip_permutation = self.createKeypointFlipPermutation()
preprocess_options = [
(preprocessor.random_horizontal_flip,
{'keypoint_flip_permutation': keypoint_flip_permutation})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRunRandomHorizontalFlipWithMaskAndKeypoints(self):
preprocess_options = [(preprocessor.random_horizontal_flip, {})]
image_height = 3
......@@ -578,6 +647,16 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(boxes_, boxes_expected_)
self.assertAllClose(images_diff_, images_diff_expected_)
def testRandomVerticalFlipWithCache(self):
keypoint_flip_permutation = self.createKeypointFlipPermutation()
preprocess_options = [
(preprocessor.random_vertical_flip,
{'keypoint_flip_permutation': keypoint_flip_permutation})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRunRandomVerticalFlipWithMaskAndKeypoints(self):
preprocess_options = [(preprocessor.random_vertical_flip, {})]
image_height = 3
......@@ -665,6 +744,13 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(boxes_, boxes_expected_)
self.assertAllClose(images_diff_, images_diff_expected_)
def testRandomRotation90WithCache(self):
preprocess_options = [(preprocessor.random_rotation90, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRunRandomRotation90WithMaskAndKeypoints(self):
preprocess_options = [(preprocessor.random_rotation90, {})]
image_height = 3
......@@ -716,6 +802,20 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(values_greater_, values_true_)
self.assertAllClose(values_less_, values_true_)
def testRandomPixelValueScaleWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_pixel_value_scale, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def testRandomImageScale(self):
preprocess_options = [(preprocessor.random_image_scale, {})]
images_original = self.createTestImages()
......@@ -736,6 +836,13 @@ class PreprocessorTest(tf.test.TestCase):
self.assertTrue(
images_original_shape_[2] * 2.0 >= images_scaled_shape_[2])
def testRandomImageScaleWithCache(self):
preprocess_options = [(preprocessor.random_image_scale, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomRGBtoGray(self):
preprocess_options = [(preprocessor.random_rgb_to_gray, {})]
images_original = self.createTestImages()
......@@ -769,6 +876,14 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(images_g_diff_, image_zero1_)
self.assertAllClose(images_b_diff_, image_zero1_)
def testRandomRGBtoGrayWithCache(self):
preprocess_options = [(
preprocessor.random_rgb_to_gray, {'probability': 0.5})]
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomAdjustBrightness(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -789,6 +904,20 @@ class PreprocessorTest(tf.test.TestCase):
[image_original_shape, image_bright_shape])
self.assertAllEqual(image_original_shape_, image_bright_shape_)
def testRandomAdjustBrightnessWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_adjust_brightness, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomAdjustContrast(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -809,6 +938,20 @@ class PreprocessorTest(tf.test.TestCase):
[image_original_shape, image_contrast_shape])
self.assertAllEqual(image_original_shape_, image_contrast_shape_)
def testRandomAdjustContrastWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_adjust_contrast, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomAdjustHue(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -829,6 +972,20 @@ class PreprocessorTest(tf.test.TestCase):
[image_original_shape, image_hue_shape])
self.assertAllEqual(image_original_shape_, image_hue_shape_)
def testRandomAdjustHueWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_adjust_hue, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomDistortColor(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -849,6 +1006,20 @@ class PreprocessorTest(tf.test.TestCase):
[images_original_shape, images_distorted_color_shape])
self.assertAllEqual(images_original_shape_, images_distorted_color_shape_)
def testRandomDistortColorWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_distort_color, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomJitterBoxes(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.random_jitter_boxes, {}))
......@@ -900,6 +1071,21 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllEqual(boxes_rank_, distorted_boxes_rank_)
self.assertAllEqual(images_rank_, distorted_images_rank_)
def testRandomCropImageWithCache(self):
preprocess_options = [(preprocessor.random_rgb_to_gray,
{'probability': 0.5}),
(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1,
}),
(preprocessor.random_crop_image, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def testRandomCropImageGrayscale(self):
preprocessing_options = [(preprocessor.rgb_to_gray, {}),
(preprocessor.normalize_image, {
......@@ -1446,6 +1632,13 @@ class PreprocessorTest(tf.test.TestCase):
self.expectedKeypointsAfterThresholding()])
self.assertAllClose(retained_keypoints_, expected_keypoints_)
def testRandomCropToAspectRatioWithCache(self):
preprocess_options = [(preprocessor.random_crop_to_aspect_ratio, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def testRunRandomCropToAspectRatioWithMasks(self):
image = self.createColorfulTestImage()
boxes = self.createTestBoxes()
......@@ -1536,6 +1729,13 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(distorted_keypoints_.flatten(),
expected_keypoints.flatten())
def testRandomPadToAspectRatioWithCache(self):
preprocess_options = [(preprocessor.random_pad_to_aspect_ratio, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRunRandomPadToAspectRatioWithMasks(self):
image = self.createColorfulTestImage()
boxes = self.createTestBoxes()
......@@ -1624,6 +1824,17 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(distorted_keypoints_.flatten(),
expected_keypoints.flatten())
def testRandomPadImageWithCache(self):
preprocess_options = [(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1,}), (preprocessor.random_pad_image, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRandomPadImage(self):
preprocessing_options = [(preprocessor.normalize_image, {
'original_minval': 0,
......@@ -1670,6 +1881,17 @@ class PreprocessorTest(tf.test.TestCase):
self.assertTrue(np.all((boxes_[:, 3] - boxes_[:, 1]) >= (
padded_boxes_[:, 3] - padded_boxes_[:, 1])))
def testRandomCropPadImageWithCache(self):
preprocess_options = [(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1,}), (preprocessor.random_crop_pad_image, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRandomCropPadImageWithRandomCoefOne(self):
preprocessing_options = [(preprocessor.normalize_image, {
'original_minval': 0,
......@@ -1788,6 +2010,22 @@ class PreprocessorTest(tf.test.TestCase):
self.assertEqual(images_shape_[1], padded_images_shape_[1])
self.assertEqual(2 * images_shape_[2], padded_images_shape_[2])
def testRandomBlackPatchesWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_black_patches, {
'size_to_image_ratio': 0.5
}))
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRandomBlackPatches(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -1812,6 +2050,22 @@ class PreprocessorTest(tf.test.TestCase):
[images_shape, blacked_images_shape])
self.assertAllEqual(images_shape_, blacked_images_shape_)
def testRandomResizeMethodWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_resize_method, {
'target_size': (75, 150)
}))
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRandomResizeMethod(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -2144,6 +2398,20 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllEqual([0, 1, 1, 0, 1], one_hot)
def testSSDRandomCropWithCache(self):
preprocess_options = [
(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}),
(preprocessor.ssd_random_crop, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def testSSDRandomCrop(self):
preprocessing_options = [
(preprocessor.normalize_image, {
......@@ -2216,6 +2484,20 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllEqual(boxes_rank_, distorted_boxes_rank_)
self.assertAllEqual(images_rank_, distorted_images_rank_)
def testSSDRandomCropFixedAspectRatioWithCache(self):
preprocess_options = [
(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}),
(preprocessor.ssd_random_crop_fixed_aspect_ratio, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def _testSSDRandomCropFixedAspectRatio(self,
include_label_scores,
include_instance_masks,
......
......@@ -58,6 +58,9 @@ class InputDataFields(object):
groundtruth_keypoint_visibilities: ground truth keypoint visibilities.
groundtruth_label_scores: groundtruth label scores.
groundtruth_weights: groundtruth weight factor for bounding boxes.
num_groundtruth_boxes: number of groundtruth boxes.
true_image_shape: true shape of the image within the resized image, as
resized images can be padded with zeros.
"""
image = 'image'
original_image = 'original_image'
......@@ -81,6 +84,8 @@ class InputDataFields(object):
groundtruth_keypoint_visibilities = 'groundtruth_keypoint_visibilities'
groundtruth_label_scores = 'groundtruth_label_scores'
groundtruth_weights = 'groundtruth_weights'
num_groundtruth_boxes = 'num_groundtruth_boxes'
true_image_shape = 'true_image_shape'
class DetectionResultFields(object):
......
......@@ -389,7 +389,8 @@ def create_target_assigner(reference, stage=None,
def batch_assign_targets(target_assigner,
anchors_batch,
gt_box_batch,
gt_class_targets_batch):
gt_class_targets_batch,
gt_weights_batch=None):
"""Batched assignment of classification and regression targets.
Args:
......@@ -402,6 +403,8 @@ def batch_assign_targets(target_assigner,
each tensor has shape [num_gt_boxes_i, classification_target_size] and
num_gt_boxes_i is the number of boxes in the ith boxlist of
gt_box_batch.
gt_weights_batch: A list of 1-D tf.float32 tensors of shape
[num_boxes] containing weights for groundtruth boxes.
Returns:
batch_cls_targets: a tensor with shape [batch_size, num_anchors,
......@@ -435,11 +438,13 @@ def batch_assign_targets(target_assigner,
reg_targets_list = []
reg_weights_list = []
match_list = []
for anchors, gt_boxes, gt_class_targets in zip(
anchors_batch, gt_box_batch, gt_class_targets_batch):
if gt_weights_batch is None:
gt_weights_batch = [None] * len(gt_class_targets_batch)
for anchors, gt_boxes, gt_class_targets, gt_weights in zip(
anchors_batch, gt_box_batch, gt_class_targets_batch, gt_weights_batch):
(cls_targets, cls_weights, reg_targets,
reg_weights, match) = target_assigner.assign(
anchors, gt_boxes, gt_class_targets)
anchors, gt_boxes, gt_class_targets, gt_weights)
cls_targets_list.append(cls_targets)
cls_weights_list.append(cls_weights)
reg_targets_list.append(reg_targets)
......
......@@ -632,6 +632,81 @@ class BatchTargetAssignerTest(test_case.TestCase):
self.assertAllClose(reg_targets_out, exp_reg_targets)
self.assertAllClose(reg_weights_out, exp_reg_weights)
def test_batch_assign_multiclass_targets_with_padded_groundtruth(self):
def graph_fn(anchor_means, anchor_stddevs, groundtruth_boxlist1,
groundtruth_boxlist2, class_targets1, class_targets2,
groundtruth_weights1, groundtruth_weights2):
box_list1 = box_list.BoxList(groundtruth_boxlist1)
box_list2 = box_list.BoxList(groundtruth_boxlist2)
gt_box_batch = [box_list1, box_list2]
gt_class_targets = [class_targets1, class_targets2]
gt_weights = [groundtruth_weights1, groundtruth_weights2]
anchors_boxlist = box_list.BoxList(anchor_means)
anchors_boxlist.add_field('stddev', anchor_stddevs)
multiclass_target_assigner = self._get_multi_class_target_assigner(
num_classes=3)
(cls_targets, cls_weights, reg_targets, reg_weights,
_) = targetassigner.batch_assign_targets(
multiclass_target_assigner, anchors_boxlist, gt_box_batch,
gt_class_targets, gt_weights)
return (cls_targets, cls_weights, reg_targets, reg_weights)
groundtruth_boxlist1 = np.array([[0., 0., 0.2, 0.2],
[0., 0., 0., 0.]], dtype=np.float32)
groundtruth_weights1 = np.array([1, 0], dtype=np.float32)
groundtruth_boxlist2 = np.array([[0, 0.25123152, 1, 1],
[0.015789, 0.0985, 0.55789, 0.3842],
[0, 0, 0, 0]],
dtype=np.float32)
groundtruth_weights2 = np.array([1, 1, 0], dtype=np.float32)
class_targets1 = np.array([[0, 1, 0, 0], [0, 0, 0, 0]], dtype=np.float32)
class_targets2 = np.array([[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 0, 0, 0]], dtype=np.float32)
anchor_means = np.array([[0, 0, .25, .25],
[0, .25, 1, 1],
[0, .1, .5, .5],
[.75, .75, 1, 1]], dtype=np.float32)
anchor_stddevs = np.array([[.1, .1, .1, .1],
[.1, .1, .1, .1],
[.1, .1, .1, .1],
[.1, .1, .1, .1]], dtype=np.float32)
exp_reg_targets = [[[0, 0, -0.5, -0.5],
[0, 0, 0, 0],
[0, 0, 0, 0,],
[0, 0, 0, 0,],],
[[0, 0, 0, 0,],
[0, 0.01231521, 0, 0],
[0.15789001, -0.01500003, 0.57889998, -1.15799987],
[0, 0, 0, 0]]]
exp_cls_weights = [[1, 1, 1, 1],
[1, 1, 1, 1]]
exp_cls_targets = [[[0, 1, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0]],
[[1, 0, 0, 0],
[0, 0, 0, 1],
[0, 0, 1, 0],
[1, 0, 0, 0]]]
exp_reg_weights = [[1, 0, 0, 0],
[0, 1, 1, 0]]
(cls_targets_out, cls_weights_out, reg_targets_out,
reg_weights_out) = self.execute(graph_fn, [anchor_means, anchor_stddevs,
groundtruth_boxlist1,
groundtruth_boxlist2,
class_targets1,
class_targets2,
groundtruth_weights1,
groundtruth_weights2])
self.assertAllClose(cls_targets_out, exp_cls_targets)
self.assertAllClose(cls_weights_out, exp_cls_weights)
self.assertAllClose(reg_targets_out, exp_reg_targets)
self.assertAllClose(reg_weights_out, exp_reg_weights)
def test_batch_assign_multidimensional_targets(self):
def graph_fn(anchor_means, anchor_stddevs, groundtruth_boxlist1,
groundtruth_boxlist2, class_targets1, class_targets2):
......
......@@ -134,7 +134,8 @@ class TfExampleDecoder(data_decoder.DataDecoder):
self.items_to_handlers[
fields.InputDataFields.groundtruth_instance_masks] = (
slim_example_decoder.ItemHandlerCallback(
['image/object/mask'], self._decode_png_instance_masks))
['image/object/mask', 'image/height', 'image/width'],
self._decode_png_instance_masks))
else:
raise ValueError('Did not recognize the `instance_mask_type` option.')
if label_map_proto_file:
......@@ -178,10 +179,15 @@ class TfExampleDecoder(data_decoder.DataDecoder):
[None, 4] containing box corners.
fields.InputDataFields.groundtruth_classes - 1D int64 tensor of shape
[None] containing classes for the boxes.
fields.InputDataFields.groundtruth_weights - 1D float32 tensor of
shape [None] indicating the weights of groundtruth boxes.
fields.InputDataFields.num_groundtruth_boxes - int32 scalar indicating
the number of groundtruth_boxes.
fields.InputDataFields.groundtruth_area - 1D float32 tensor of shape
[None] containing object mask area in pixels squared.
fields.InputDataFields.groundtruth_is_crowd - 1D bool tensor of shape
[None] indicating if the boxes enclose a crowd.
Optional:
fields.InputDataFields.groundtruth_difficult - 1D bool tensor of shape
[None] indicating if the boxes represent `difficult` instances.
......@@ -189,8 +195,6 @@ class TfExampleDecoder(data_decoder.DataDecoder):
[None] indicating if the boxes represent `group_of` instances.
fields.InputDataFields.groundtruth_instance_masks - 3D float32 tensor of
shape [None, None, None] containing instance masks.
fields.InputDataFields.groundtruth_weights - 1D float32 tensor of
shape [None] indicating the weights of groundtruth boxes.
"""
serialized_example = tf.reshape(tf_example_string_tensor, shape=[])
decoder = slim_example_decoder.TFExampleDecoder(self.keys_to_features,
......@@ -201,6 +205,20 @@ class TfExampleDecoder(data_decoder.DataDecoder):
is_crowd = fields.InputDataFields.groundtruth_is_crowd
tensor_dict[is_crowd] = tf.cast(tensor_dict[is_crowd], dtype=tf.bool)
tensor_dict[fields.InputDataFields.image].set_shape([None, None, 3])
tensor_dict[fields.InputDataFields.num_groundtruth_boxes] = tf.shape(
tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]
def default_groundtruth_weights():
return tf.ones(
[tf.shape(tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]],
dtype=tf.float32)
tensor_dict[fields.InputDataFields.groundtruth_weights] = tf.cond(
tf.greater(
tf.shape(
tensor_dict[fields.InputDataFields.groundtruth_weights])[0],
0), lambda: tensor_dict[fields.InputDataFields.groundtruth_weights],
default_groundtruth_weights)
return tensor_dict
def _reshape_instance_masks(self, keys_to_tensors):
......@@ -247,6 +265,11 @@ class TfExampleDecoder(data_decoder.DataDecoder):
return image
png_masks = keys_to_tensors['image/object/mask']
height = keys_to_tensors['image/height']
width = keys_to_tensors['image/width']
if isinstance(png_masks, tf.SparseTensor):
png_masks = tf.sparse_tensor_to_dense(png_masks, default_value='')
return tf.map_fn(decode_png_mask, png_masks, dtype=tf.float32)
return tf.cond(
tf.greater(tf.size(png_masks), 0),
lambda: tf.map_fn(decode_png_mask, png_masks, dtype=tf.float32),
lambda: tf.zeros(tf.to_int32(tf.stack([0, height, width]))))
......@@ -58,7 +58,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def testDecodeJpegImage(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
decoded_jpeg = self._DecodeImage(encoded_jpeg)
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -79,7 +79,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertEqual('image_id', tensor_dict[fields.InputDataFields.source_id])
def testDecodeImageKeyAndFilename(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
example = tf.train.Example(features=tf.train.Features(feature={
'image/encoded': self._BytesFeature(encoded_jpeg),
......@@ -97,7 +97,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertEqual('filename', tensor_dict[fields.InputDataFields.filename])
def testDecodePngImage(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_png = self._EncodeImage(image_tensor, encoding_type='png')
decoded_png = self._DecodeImage(encoded_png, encoding_type='png')
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -147,8 +147,32 @@ class TfExampleDecoderTest(tf.test.TestCase):
decoded_masks,
tensor_dict[fields.InputDataFields.groundtruth_instance_masks])
def testDecodeEmptyPngInstanceMasks(self):
image_tensor = np.random.randint(256, size=(10, 10, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
encoded_masks = []
example = tf.train.Example(
features=tf.train.Features(
feature={
'image/encoded': self._BytesFeature(encoded_jpeg),
'image/format': self._BytesFeature('jpeg'),
'image/object/mask': self._BytesFeature(encoded_masks),
'image/height': self._Int64Feature([10]),
'image/width': self._Int64Feature([10]),
})).SerializeToString()
example_decoder = tf_example_decoder.TfExampleDecoder(
load_instance_masks=True, instance_mask_type=input_reader_pb2.PNG_MASKS)
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
self.assertAllEqual(
tensor_dict[fields.InputDataFields.groundtruth_instance_masks].shape,
[0, 10, 10])
def testDecodeBoundingBox(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_ymins = [0.0, 4.0]
bbox_xmins = [1.0, 5.0]
......@@ -175,9 +199,39 @@ class TfExampleDecoderTest(tf.test.TestCase):
bbox_ymaxs, bbox_xmaxs]).transpose()
self.assertAllEqual(expected_boxes,
tensor_dict[fields.InputDataFields.groundtruth_boxes])
self.assertAllEqual(
2, tensor_dict[fields.InputDataFields.num_groundtruth_boxes])
def testDecodeDefaultGroundtruthWeights(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_ymins = [0.0, 4.0]
bbox_xmins = [1.0, 5.0]
bbox_ymaxs = [2.0, 6.0]
bbox_xmaxs = [3.0, 7.0]
example = tf.train.Example(features=tf.train.Features(feature={
'image/encoded': self._BytesFeature(encoded_jpeg),
'image/format': self._BytesFeature('jpeg'),
'image/object/bbox/ymin': self._FloatFeature(bbox_ymins),
'image/object/bbox/xmin': self._FloatFeature(bbox_xmins),
'image/object/bbox/ymax': self._FloatFeature(bbox_ymaxs),
'image/object/bbox/xmax': self._FloatFeature(bbox_xmaxs),
})).SerializeToString()
example_decoder = tf_example_decoder.TfExampleDecoder()
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
self.assertAllEqual((tensor_dict[fields.InputDataFields.groundtruth_boxes].
get_shape().as_list()), [None, 4])
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
self.assertAllClose(tensor_dict[fields.InputDataFields.groundtruth_weights],
np.ones(2, dtype=np.float32))
def testDecodeObjectLabel(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_classes = [0, 1]
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -199,8 +253,89 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllEqual(bbox_classes,
tensor_dict[fields.InputDataFields.groundtruth_classes])
def testDecodeObjectLabelNoText(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_classes = [1, 2]
example = tf.train.Example(features=tf.train.Features(feature={
'image/encoded': self._BytesFeature(encoded_jpeg),
'image/format': self._BytesFeature('jpeg'),
'image/object/class/label': self._Int64Feature(bbox_classes),
})).SerializeToString()
label_map_string = """
item {
id:1
name:'cat'
}
item {
id:2
name:'dog'
}
"""
label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt')
with tf.gfile.Open(label_map_path, 'wb') as f:
f.write(label_map_string)
example_decoder = tf_example_decoder.TfExampleDecoder(
label_map_proto_file=label_map_path)
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
self.assertAllEqual((tensor_dict[
fields.InputDataFields.groundtruth_classes].get_shape().as_list()),
[None])
init = tf.tables_initializer()
with self.test_session() as sess:
sess.run(init)
tensor_dict = sess.run(tensor_dict)
self.assertAllEqual(bbox_classes,
tensor_dict[fields.InputDataFields.groundtruth_classes])
def testDecodeObjectLabelUnrecognizedName(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_classes_text = ['cat', 'cheetah']
example = tf.train.Example(
features=tf.train.Features(
feature={
'image/encoded':
self._BytesFeature(encoded_jpeg),
'image/format':
self._BytesFeature('jpeg'),
'image/object/class/text':
self._BytesFeature(bbox_classes_text),
})).SerializeToString()
label_map_string = """
item {
id:2
name:'cat'
}
item {
id:1
name:'dog'
}
"""
label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt')
with tf.gfile.Open(label_map_path, 'wb') as f:
f.write(label_map_string)
example_decoder = tf_example_decoder.TfExampleDecoder(
label_map_proto_file=label_map_path)
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
self.assertAllEqual((tensor_dict[fields.InputDataFields.groundtruth_classes]
.get_shape().as_list()), [None])
with self.test_session() as sess:
sess.run(tf.tables_initializer())
tensor_dict = sess.run(tensor_dict)
self.assertAllEqual([2, -1],
tensor_dict[fields.InputDataFields.groundtruth_classes])
def testDecodeObjectLabelWithMapping(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_classes_text = ['cat', 'dog']
example = tf.train.Example(
......@@ -242,7 +377,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
tensor_dict[fields.InputDataFields.groundtruth_classes])
def testDecodeObjectArea(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_area = [100., 174.]
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -263,7 +398,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
tensor_dict[fields.InputDataFields.groundtruth_area])
def testDecodeObjectIsCrowd(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_is_crowd = [0, 1]
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -286,7 +421,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
fields.InputDataFields.groundtruth_is_crowd])
def testDecodeObjectDifficult(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_difficult = [0, 1]
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -309,7 +444,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
fields.InputDataFields.groundtruth_difficult])
def testDecodeObjectGroupOf(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_group_of = [0, 1]
example = tf.train.Example(features=tf.train.Features(
......@@ -333,7 +468,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
tensor_dict[fields.InputDataFields.groundtruth_group_of])
def testDecodeObjectWeight(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_weights = [0.75, 1.0]
example = tf.train.Example(features=tf.train.Features(
......@@ -362,7 +497,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
image_width = 3
# Randomly generate image.
image_tensor = np.random.randint(255, size=(image_height,
image_tensor = np.random.randint(256, size=(image_height,
image_width,
3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......@@ -413,7 +548,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
image_height = 5
image_width = 3
# Randomly generate image.
image_tensor = np.random.randint(255, size=(image_height,
image_tensor = np.random.randint(256, size=(image_height,
image_width,
3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......
......@@ -87,13 +87,12 @@ def create_tf_example(image,
to the format expected by the Tensorflow Object Detection API (which is
[ymin, xmin, ymax, xmax] with coordinates normalized relative to image
size).
image_dir: Directory containing the image files.
image_dir: directory containing the image files.
category_index: a dict containing COCO category information keyed
by the 'id' field of each category. See the
label_map_util.create_category_index function.
include_masks: Whether to include instance segmentation masks
(PNG encoded) in the result. default: False.
Returns:
example: The converted tf.Example
num_annotations_skipped: Number of (invalid) annotations that were ignored.
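The docstring describes the conversion performed in the function body: COCO annotations store boxes as [x, y, width, height] in absolute pixels, and they are rewritten as [ymin, xmin, ymax, xmax] normalized by image size. A standalone sketch of that arithmetic (the helper name is illustrative, not part of the module):

def coco_bbox_to_normalized(bbox, image_height, image_width):
  # COCO bbox: [x, y, width, height] in absolute pixel coordinates.
  x, y, width, height = bbox
  # Object Detection API order: [ymin, xmin, ymax, xmax], normalized.
  return (float(y) / image_height, float(x) / image_width,
          float(y + height) / image_height, float(x + width) / image_width)

# A 128x128 box at (64, 64) in a 256x256 image, as in the test below.
assert coco_bbox_to_normalized([64, 64, 128, 128], 256, 256) == (
    0.25, 0.25, 0.75, 0.75)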
@@ -104,6 +103,7 @@ def create_tf_example(image,
image_height = image['height']
image_width = image['width']
filename = image['file_name']
+image_id = image['id']
full_path = os.path.join(image_dir, filename)
with tf.gfile.GFile(full_path, 'rb') as fid:
@@ -118,6 +118,7 @@ def create_tf_example(image,
ymax = []
is_crowd = []
category_names = []
+category_ids = []
area = []
encoded_mask_png = []
num_annotations_skipped = 0
@@ -135,12 +136,13 @@ def create_tf_example(image,
ymax.append(float(y + height) / image_height)
is_crowd.append(object_annotations['iscrowd'])
category_id = int(object_annotations['category_id'])
+category_ids.append(category_id)
category_names.append(category_index[category_id]['name'].encode('utf8'))
area.append(object_annotations['area'])
if include_masks:
-run_len_encoding = mask.frPyObjects(
-    object_annotations['segmentation'], image_height, image_width)
+run_len_encoding = mask.frPyObjects(object_annotations['segmentation'],
+                                    image_height, image_width)
binary_mask = mask.decode(run_len_encoding)
if not object_annotations['iscrowd']:
binary_mask = np.amax(binary_mask, axis=2)
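For the include_masks branch above: pycocotools' mask.frPyObjects turns a polygon list into one run-length encoding per polygon, mask.decode then yields a [height, width, num_polygons] binary array, and np.amax over the last axis merges the parts into a single instance mask. Crowd annotations are already stored as a single RLE, which is why the merge is skipped when iscrowd is set. A short sketch, assuming pycocotools is installed:

import numpy as np
from pycocotools import mask

image_height, image_width = 8, 8
# Two triangular polygon parts, each a flat [x1, y1, x2, y2, ...] list,
# matching the segmentation used in the test further below.
segmentation = [[4, 0, 0, 0, 0, 4], [8, 4, 4, 8, 8, 8]]

rles = mask.frPyObjects(segmentation, image_height, image_width)
decoded = mask.decode(rles)             # shape [8, 8, 2], one slice per part
binary_mask = np.amax(decoded, axis=2)  # single merged [8, 8] instance mask
assert binary_mask.shape == (image_height, image_width)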
@@ -148,31 +150,41 @@ def create_tf_example(image,
output_io = io.BytesIO()
pil_image.save(output_io, format='PNG')
encoded_mask_png.append(output_io.getvalue())
feature_dict = {
-'image/height': dataset_util.int64_feature(image_height),
-'image/width': dataset_util.int64_feature(image_width),
-'image/filename': dataset_util.bytes_feature(
-    filename.encode('utf8')),
-'image/source_id': dataset_util.bytes_feature(
-    filename.encode('utf8')),
-'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
-'image/encoded': dataset_util.bytes_feature(encoded_jpg),
-'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
-'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
-'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
-'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
-'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
-'image/object/class/text': dataset_util.bytes_list_feature(
-    category_names),
-'image/object/is_crowd': dataset_util.int64_list_feature(is_crowd),
-'image/object/area': dataset_util.float_list_feature(area),
+'image/height':
+    dataset_util.int64_feature(image_height),
+'image/width':
+    dataset_util.int64_feature(image_width),
+'image/filename':
+    dataset_util.bytes_feature(filename.encode('utf8')),
+'image/source_id':
+    dataset_util.bytes_feature(str(image_id).encode('utf8')),
+'image/key/sha256':
+    dataset_util.bytes_feature(key.encode('utf8')),
+'image/encoded':
+    dataset_util.bytes_feature(encoded_jpg),
+'image/format':
+    dataset_util.bytes_feature('jpeg'.encode('utf8')),
+'image/object/bbox/xmin':
+    dataset_util.float_list_feature(xmin),
+'image/object/bbox/xmax':
+    dataset_util.float_list_feature(xmax),
+'image/object/bbox/ymin':
+    dataset_util.float_list_feature(ymin),
+'image/object/bbox/ymax':
+    dataset_util.float_list_feature(ymax),
+'image/object/class/label':
+    dataset_util.int64_list_feature(category_ids),
+'image/object/is_crowd':
+    dataset_util.int64_list_feature(is_crowd),
+'image/object/area':
+    dataset_util.float_list_feature(area),
}
if include_masks:
feature_dict['image/object/mask'] = (
dataset_util.bytes_list_feature(encoded_mask_png))
example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
-return example, num_annotations_skipped
+return key, example, num_annotations_skipped
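Beyond the pure reformatting, this hunk lands three substantive changes: image/source_id now carries the stringified COCO image id instead of the filename, a numeric image/object/class/label feature is written from category_ids while image/object/class/text is dropped (matching the commit description), and the function now also returns the image's sha256 key. A hedged sketch of how a consumer could parse the updated fields back out, assuming the TF 1.x parsing API:

import tensorflow as tf

# Only the fields touched by this change; the remaining bbox, is_crowd
# and area features are elided for brevity.
feature_spec = {
    'image/source_id': tf.FixedLenFeature((), tf.string),
    'image/object/class/label': tf.VarLenFeature(tf.int64),
}

def parse_fn(serialized_example):
  parsed = tf.parse_single_example(serialized_example, feature_spec)
  # source_id is now a stringified integer id such as b'11', not a filename.
  labels = tf.sparse_tensor_to_dense(parsed['image/object/class/label'])
  return parsed['image/source_id'], labels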
def _create_tf_record_from_coco_annotations(
@@ -217,7 +229,7 @@ def _create_tf_record_from_coco_annotations(
if idx % 100 == 0:
tf.logging.info('On image %d of %d', idx, len(images))
annotations_list = annotations_index[image['id']]
-tf_example, num_annotations_skipped = create_tf_example(
+_, tf_example, num_annotations_skipped = create_tf_example(
image, annotations_list, image_dir, category_index, include_masks)
total_num_annotations_skipped += num_annotations_skipped
writer.write(tf_example.SerializeToString())
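The caller above discards the new first return value (the image's sha256 key) with _. A quick way to sanity-check the records this loop writes, assuming a hypothetical output path of /tmp/coco.record and the TF 1.x tf.python_io reader:

import tensorflow as tf

for serialized in tf.python_io.tf_record_iterator('/tmp/coco.record'):
  example = tf.train.Example.FromString(serialized)
  # Should print the stringified COCO image id, not a filename.
  print(example.features.feature['image/source_id'].bytes_list.value)
  break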
@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test for create_coco_tf_record.py."""
import io
@@ -52,25 +51,33 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
'id': 11,
}
-annotations_list = [
-    {
+annotations_list = [{
'area': .5,
'iscrowd': False,
'image_id': 11,
'bbox': [64, 64, 128, 128],
'category_id': 2,
'id': 1000,
-    }
-]
+}]
image_dir = tmp_dir
category_index = {
-1: {'name': 'dog', 'id': 1},
-2: {'name': 'cat', 'id': 2},
-3: {'name': 'human', 'id': 3}
+1: {
+    'name': 'dog',
+    'id': 1
+},
+2: {
+    'name': 'cat',
+    'id': 2
+},
+3: {
+    'name': 'human',
+    'id': 3
+}
}
-example, num_annotations_skipped = create_coco_tf_record.create_tf_example(
+(_, example,
+ num_annotations_skipped) = create_coco_tf_record.create_tf_example(
image, annotations_list, image_dir, category_index)
self.assertEqual(num_annotations_skipped, 0)
@@ -83,7 +90,7 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
[image_file_name])
self._assertProtoEqual(
example.features.feature['image/source_id'].bytes_list.value,
-[image_file_name])
+[str(image['id'])])
self._assertProtoEqual(
example.features.feature['image/format'].bytes_list.value, ['jpeg'])
self._assertProtoEqual(
@@ -98,9 +105,6 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
self._assertProtoEqual(
example.features.feature['image/object/bbox/ymax'].float_list.value,
[0.75])
-self._assertProtoEqual(
-    example.features.feature['image/object/class/text'].bytes_list.value,
-    ['cat'])
def test_create_tf_example_with_instance_masks(self):
image_file_name = 'tmp_image.jpg'
@@ -117,25 +121,26 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
'id': 11,
}
-annotations_list = [
-    {
+annotations_list = [{
'area': .5,
'iscrowd': False,
'image_id': 11,
'bbox': [0, 0, 8, 8],
-'segmentation': [[4, 0, 0, 0, 0, 4],
-                 [8, 4, 4, 8, 8, 8]],
+'segmentation': [[4, 0, 0, 0, 0, 4], [8, 4, 4, 8, 8, 8]],
'category_id': 1,
'id': 1000,
-    }
-]
+}]
image_dir = tmp_dir
category_index = {
-1: {'name': 'dog', 'id': 1},
+1: {
+    'name': 'dog',
+    'id': 1
+},
}
-example, num_annotations_skipped = create_coco_tf_record.create_tf_example(
+(_, example,
+ num_annotations_skipped) = create_coco_tf_record.create_tf_example(
image, annotations_list, image_dir, category_index, include_masks=True)
self.assertEqual(num_annotations_skipped, 0)
@@ -148,7 +153,7 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
[image_file_name])
self._assertProtoEqual(
example.features.feature['image/source_id'].bytes_list.value,
-[image_file_name])
+[str(image['id'])])
self._assertProtoEqual(
example.features.feature['image/format'].bytes_list.value, ['jpeg'])
self._assertProtoEqual(
@@ -163,24 +168,20 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
self._assertProtoEqual(
example.features.feature['image/object/bbox/ymax'].float_list.value,
[1])
-self._assertProtoEqual(
-    example.features.feature['image/object/class/text'].bytes_list.value,
-    ['dog'])
-encoded_mask_pngs = [io.BytesIO(encoded_masks)
-                     for encoded_masks in example.features.feature[
-                         'image/object/mask'].bytes_list.value]
-pil_masks = [np.array(PIL.Image.open(encoded_mask_png))
-             for encoded_mask_png in encoded_mask_pngs]
+encoded_mask_pngs = [
+    io.BytesIO(encoded_masks) for encoded_masks in example.features.feature[
+        'image/object/mask'].bytes_list.value
+]
+pil_masks = [
+    np.array(PIL.Image.open(encoded_mask_png))
+    for encoded_mask_png in encoded_mask_pngs
+]
self.assertTrue(len(pil_masks) == 1)
self.assertAllEqual(pil_masks[0],
-[[1, 1, 1, 0, 0, 0, 0, 0],
- [1, 1, 0, 0, 0, 0, 0, 0],
- [1, 0, 0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 0, 0, 0, 1],
- [0, 0, 0, 0, 0, 0, 1, 1],
- [0, 0, 0, 0, 0, 1, 1, 1],
- [0, 0, 0, 0, 1, 1, 1, 1]])
+[[1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0],
+ [1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0],
+ [0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 1, 1],
+ [0, 0, 0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1, 1]])
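For reference, the expected 8x8 mask above is the union of the two polygon parts from the annotation: [4, 0, 0, 0, 0, 4] traces the triangle in the top-left corner (the 1s in the first three rows) and [8, 4, 4, 8, 8, 8] the triangle in the bottom-right corner.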
if __name__ == '__main__':
@@ -509,6 +509,11 @@ def result_dict_for_single_example(image,
detection_masks = detections[detection_fields.detection_masks][0]
# TODO: This should be done in model's postprocess
# function ideally.
+num_detections = tf.to_int32(detections[detection_fields.num_detections][0])
+detection_boxes = tf.slice(
+    detection_boxes, begin=[0, 0], size=[num_detections, -1])
+detection_masks = tf.slice(
+    detection_masks, begin=[0, 0, 0], size=[num_detections, -1, -1])
detection_masks_reframed = ops.reframe_box_masks_to_image_masks(
detection_masks, detection_boxes, image_shape[1], image_shape[2])
detection_masks_reframed = tf.cast(
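The five added lines handle padded detections: postprocessed outputs are padded up to a fixed maximum, so only the first num_detections rows are valid and must be sliced off before the masks are reframed to image coordinates. The same tf.slice pattern in isolation, with dummy shapes standing in for real model output:

import tensorflow as tf

max_detections, mask_size = 100, 33
detection_boxes = tf.zeros([max_detections, 4])
detection_masks = tf.zeros([max_detections, mask_size, mask_size])
num_detections = tf.constant(3, dtype=tf.int32)

# size=-1 keeps a dimension's full extent; only the leading detection
# dimension is truncated to the valid rows.
valid_boxes = tf.slice(
    detection_boxes, begin=[0, 0], size=[num_detections, -1])
valid_masks = tf.slice(
    detection_masks, begin=[0, 0, 0], size=[num_detections, -1, -1])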
@@ -24,6 +24,7 @@ import tensorflow as tf
from object_detection import eval_util
from object_detection.core import prefetcher
from object_detection.core import standard_fields as fields
+from object_detection.metrics import coco_evaluation
from object_detection.utils import object_detection_evaluation
# A dictionary of metric names to classes that implement the metric. The classes
@@ -39,7 +40,11 @@ EVAL_METRICS_CLASS_DICT = {
'weighted_pascal_voc_instance_segmentation_metrics':
object_detection_evaluation.WeightedPascalInstanceSegmentationEvaluator,
'open_images_detection_metrics':
-object_detection_evaluation.OpenImagesDetectionEvaluator
+object_detection_evaluation.OpenImagesDetectionEvaluator,
+'coco_detection_metrics':
+    coco_evaluation.CocoDetectionEvaluator,
+'coco_mask_metrics':
+    coco_evaluation.CocoMaskEvaluator,
}
EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'
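With the two COCO evaluators registered in EVAL_METRICS_CLASS_DICT, they can be selected by key ('coco_detection_metrics' or 'coco_mask_metrics') from the eval configuration like the existing metrics, or constructed directly. A minimal sketch of direct construction, assuming the evaluator takes the usual categories list:

from object_detection.metrics import coco_evaluation

# Categories follow the label-map convention used across the API:
# dicts with an integer 'id' and a string 'name'.
categories = [{'id': 1, 'name': 'dog'}, {'id': 2, 'name': 'cat'}]
evaluator = coco_evaluation.CocoDetectionEvaluator(categories)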