Merged commit includes the following changes:

185215255 by Zhichao Lu: Stop populating image/object/class/text field when generating COCO tf record.
185213306 by Zhichao Lu: Use the params batch size and not the one from train_config in input_fn
185209081 by Zhichao Lu: Handle the case when there are no ground-truth masks for an image.
185195531 by Zhichao Lu: Remove unstack and stack operations on features from third_party/object_detection/model.py.
185195017 by Zhichao Lu: Matrix multiplication based gather op implementation.
185187744 by Zhichao Lu: Fix eval_util minor issue.
185098733 by Zhichao Lu: Internal change
185076656 by Zhichao Lu: Increment the amount of boxes for coco17.
185074199 by Zhichao Lu: Add config for SSD Resnet50 v1 with FPN.
185060199 by Zhichao Lu: Fix a bug in clear_detections. This method set detection_keys to an empty dictionary instead of an empty set. I've refactored so that this method and the constructor use the same code path.
185031359 by Zhichao Lu: Eval TPU trained models continuously.
185016591 by Zhichao Lu: Use TPUEstimatorSpec for TPU
185013651 by Zhichao Lu: Add PreprocessorCache to record and duplicate augmentations.
184921763 by Zhichao Lu: Minor fixes for object detection.
184920610 by Zhichao Lu: Adds a model builder test for "embedded_ssd_mobilenet_v1" feature extractor.
184919284 by Zhichao Lu: Added unit tests for TPU, with optional training / eval.
184915910 by Zhichao Lu: Update third_party g3 doc with Mask RCNN detection models.
184914085 by Zhichao Lu: Slight change to WeightSharedConvolutionalBoxPredictor implementation to make things match more closely with RetinaNet. Specifically we now construct the box encoding and class predictor towers separately rather than having them share weights until penultimate layer.
184913786 by Zhichao Lu: Plumbs SSD Resnet V1 with FPN models into model builder.
184910030 by Zhichao Lu: Add coco metrics to evaluator.
184897758 by Zhichao Lu: Merge changes from github.
184888736 by Zhichao Lu: Ensure groundtruth_weights are always 1-D.
184887256 by Zhichao Lu: Introduce an option to add summaries in the model so it can be turned off when necessary.
184865559 by Zhichao Lu: Updating inputs so that a dictionary of tensors is returned from input_fn. Moving unbatch/unpad to model.py. Also removing source_id key from features dictionary, and replacing with an integer hash.
184859205 by Zhichao Lu: This CL is trying to hide those differences by making the default settings work with the public code.
184769779 by Zhichao Lu: Pass groundtruth weights into ssd meta architecture all the way to target assigner. This will allow training ssd models with padded groundtruth tensors.
184767117 by Zhichao Lu:
    * Add `params` arg to make all input fns work with TPUEstimator
    * Add --master
    * Output eval results
184766244 by Zhichao Lu: Update create_coco_tf_record to include category indices
184752937 by Zhichao Lu: Create a third_party version of TPU compatible mobilenet_v2_focal_loss coco config.
184750174 by Zhichao Lu: A few small fixes for multiscale anchor generator and a test.
184746581 by Zhichao Lu: Update jupyter notebook to show mask if provided by model.
184728646 by Zhichao Lu: Adding a few more tests to make sure decoding with/without label maps performs as expected.
184624154 by Zhichao Lu: Add an object detection binary for TPU.
184622118 by Zhichao Lu: Batch, transform, and unbatch in the tflearn interface.
184595064 by Zhichao Lu: Add support for training grayscale models.
184532026 by Zhichao Lu: Change dataset_builder.build to perform optional batching using tf.data.Dataset API
184330239 by Zhichao Lu: Add augment_input_data and transform_input_data helper functions to third_party/tensorflow_models/object_detection/inputs.py
184328681 by Zhichao Lu: Use an internal rgb to gray method that can be quantized.
184327909 by Zhichao Lu: Helper function to return padding shapes to use with Dataset.padded_batch.
184326291 by Zhichao Lu: Added decode_func for specialized decoding.
184314676 by Zhichao Lu: Add unstack_batch method to inputs.py. This will enable us to convert batched tensors to lists of tensors. This is compatible with OD API that consumes groundtruth batch as a list of tensors.
184281269 by Zhichao Lu: Internal test target changes.
184192851 by Zhichao Lu: Adding `Estimator` interface for object detection.
184187885 by Zhichao Lu: Add config_util functions to help with input pipeline.
    1. function to return expected shapes from the resizer config
    2. function to extract image_resizer_config from model_config.
184139892 by Zhichao Lu: Adding support for depthwise SSD (ssd-lite) and depthwise box predictions.
184089891 by Zhichao Lu: Fix third_party faster rcnn resnet101 coco config.
184083378 by Zhichao Lu: In the case when there is no object/weights field in tf.Example proto, return a default weight of 1.0 for all boxes.

PiperOrigin-RevId: 185215255

Commit 1efe98bb authored Feb 09, 2018 by Zhichao Lu Committed by lzc5123016 Feb 09, 2018
20 changed files
--- a/research/object_detection/core/BUILD
+++ b/research/object_detection/core/BUILD
@@ -123,6 +123,7 @@ py_library(
        "matcher.py",
    ],
    deps = [
+        "//tensorflow/models/research/object_detection/utils:ops",
    ],
 )

@@ -160,12 +161,20 @@ py_library(
        ":box_list",
        ":box_list_ops",
        ":keypoint_ops",
+        ":preprocessor_cache",
        ":standard_fields",
        "//tensorflow",
        "//tensorflow/models/research/object_detection/utils:shape_utils",
    ],
 )

+py_library(
+    name = "preprocessor_cache",
+    srcs = [
+        "preprocessor_cache.py",
+    ],
+)
+
 py_test(
    name = "preprocessor_test",
    srcs = [
@@ -173,6 +182,7 @@ py_test(
    ],
    deps = [
        ":preprocessor",
+        ":preprocessor_cache",
        "//tensorflow",
    ],
 )

--- a/research/object_detection/core/__init__.py
+++ b/research/object_detection/core/__init__.py
+
--- a/research/object_detection/core/box_predictor.py
+++ b/research/object_detection/core/box_predictor.py
@@ -102,7 +102,7 @@ class BoxPredictor(object):
        return self._predict(image_features, num_predictions_per_location,
                             **params)
    return self._predict(image_features, num_predictions_per_location,
-                           **params)
+                         **params)

  # TODO: num_predictions_per_location could be moved to constructor.
  # This is currently only used by ConvolutionalBoxPredictor.
@@ -582,7 +582,8 @@ class ConvolutionalBoxPredictor(BoxPredictor):
               kernel_size,
               box_code_size,
               apply_sigmoid_to_scores=False,
-               class_prediction_bias_init=0.0):
+               class_prediction_bias_init=0.0,
+               use_depthwise=False):
    """Constructor.

    Args:
@@ -611,6 +612,8 @@ class ConvolutionalBoxPredictor(BoxPredictor):
        class_predictions.
      class_prediction_bias_init: constant value to initialize bias of the last
        conv2d layer before class prediction.
+      use_depthwise: Whether to use depthwise convolutions for prediction
+        steps. Default is False.

    Raises:
      ValueError: if min_depth > max_depth.
@@ -628,6 +631,7 @@ class ConvolutionalBoxPredictor(BoxPredictor):
    self._dropout_keep_prob = dropout_keep_prob
    self._apply_sigmoid_to_scores = apply_sigmoid_to_scores
    self._class_prediction_bias_init = class_prediction_bias_init
+    self._use_depthwise = use_depthwise

  def _predict(self, image_features, num_predictions_per_location_list):
    """Computes encoded object locations and corresponding confidences.
@@ -683,17 +687,38 @@ class ConvolutionalBoxPredictor(BoxPredictor):
                  net, depth, [1, 1], scope='Conv2d_%d_1x1_%d' % (i, depth))
          with slim.arg_scope([slim.conv2d], activation_fn=None,
                              normalizer_fn=None, normalizer_params=None):
-            box_encodings = slim.conv2d(
-                net, num_predictions_per_location * self._box_code_size,
-                [self._kernel_size, self._kernel_size],
-                scope='BoxEncodingPredictor')
+            if self._use_depthwise:
+              box_encodings = slim.separable_conv2d(
+                  net, None, [self._kernel_size, self._kernel_size],
+                  padding='SAME', depth_multiplier=1, stride=1,
+                  rate=1, scope='BoxEncodingPredictor_depthwise')
+              box_encodings = slim.conv2d(
+                  box_encodings,
+                  num_predictions_per_location * self._box_code_size, [1, 1],
+                  scope='BoxEncodingPredictor')
+            else:
+              box_encodings = slim.conv2d(
+                  net, num_predictions_per_location * self._box_code_size,
+                  [self._kernel_size, self._kernel_size],
+                  scope='BoxEncodingPredictor')
            if self._use_dropout:
              net = slim.dropout(net, keep_prob=self._dropout_keep_prob)
-            class_predictions_with_background = slim.conv2d(
-                net, num_predictions_per_location * num_class_slots,
-                [self._kernel_size, self._kernel_size], scope='ClassPredictor',
-                biases_initializer=tf.constant_initializer(
-                    self._class_prediction_bias_init))
+            if self._use_depthwise:
+              class_predictions_with_background = slim.separable_conv2d(
+                  net, None, [self._kernel_size, self._kernel_size],
+                  padding='SAME', depth_multiplier=1, stride=1,
+                  rate=1, scope='ClassPredictor_depthwise')
+              class_predictions_with_background = slim.conv2d(
+                  class_predictions_with_background,
+                  num_predictions_per_location * num_class_slots,
+                  [1, 1], scope='ClassPredictor')
+            else:
+              class_predictions_with_background = slim.conv2d(
+                  net, num_predictions_per_location * num_class_slots,
+                  [self._kernel_size, self._kernel_size],
+                  scope='ClassPredictor',
+                  biases_initializer=tf.constant_initializer(
+                      self._class_prediction_bias_init))
            if self._apply_sigmoid_to_scores:
              class_predictions_with_background = tf.sigmoid(
                  class_predictions_with_background)
@@ -729,7 +754,8 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
  Defines the box predictor as defined in
  https://arxiv.org/abs/1708.02002. This class differs from
  ConvolutionalBoxPredictor in that it shares weights and biases while
-  predicting from different feature maps.
+  predicting from different feature maps.  Separate multi-layer towers are
+  constructed for the box encoding and class predictors respectively.
  """

  def __init__(self,
@@ -811,22 +837,35 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
      with tf.variable_scope('WeightSharedConvolutionalBoxPredictor',
                             reuse=tf.AUTO_REUSE):
        num_class_slots = self.num_classes + 1
-        net = image_feature
+        box_encodings_net = image_feature
+        class_predictions_net = image_feature
        with slim.arg_scope(self._conv_hyperparams):
          for i in range(self._num_layers_before_predictor):
-            net = slim.conv2d(net,
-                              self._depth,
-                              [self._kernel_size, self._kernel_size],
-                              stride=1,
-                              padding='SAME',
-                              scope='conv2d_{}'.format(i))
+            box_encodings_net = slim.conv2d(
+                box_encodings_net,
+                self._depth,
+                [self._kernel_size, self._kernel_size],
+                stride=1,
+                padding='SAME',
+                scope='BoxEncodingPredictionTower/conv2d_{}'.format(i))
          box_encodings = slim.conv2d(
-              net, num_predictions_per_location * self._box_code_size,
+              box_encodings_net,
+              num_predictions_per_location * self._box_code_size,
              [self._kernel_size, self._kernel_size],
              activation_fn=None, stride=1, padding='SAME',
              scope='BoxEncodingPredictor')
+
+          for i in range(self._num_layers_before_predictor):
+            class_predictions_net = slim.conv2d(
+                class_predictions_net,
+                self._depth,
+                [self._kernel_size, self._kernel_size],
+                stride=1,
+                padding='SAME',
+                scope='ClassPredictionTower/conv2d_{}'.format(i))
          class_predictions_with_background = slim.conv2d(
-              net, num_predictions_per_location * num_class_slots,
+              class_predictions_net,
+              num_predictions_per_location * num_class_slots,
              [self._kernel_size, self._kernel_size],
              activation_fn=None, stride=1, padding='SAME',
              biases_initializer=tf.constant_initializer(

--- a/research/object_detection/core/box_predictor_test.py
+++ b/research/object_detection/core/box_predictor_test.py
@@ -316,9 +316,69 @@ class ConvolutionalBoxPredictorTest(test_case.TestCase):
           [tf.shape(box_encodings), tf.shape(objectness_predictions)],
           feed_dict={image_features:
                      np.random.rand(4, resolution, resolution, 64)})
+      actual_variable_set = set(
+          [var.op.name for var in tf.trainable_variables()])
      self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 1, 4])
      self.assertAllEqual(objectness_predictions_shape,
                          [4, expected_num_anchors, 1])
+    expected_variable_set = set([
+        'BoxPredictor/Conv2d_0_1x1_32/biases',
+        'BoxPredictor/Conv2d_0_1x1_32/weights',
+        'BoxPredictor/BoxEncodingPredictor/biases',
+        'BoxPredictor/BoxEncodingPredictor/weights',
+        'BoxPredictor/ClassPredictor/biases',
+        'BoxPredictor/ClassPredictor/weights'])
+    self.assertEqual(expected_variable_set, actual_variable_set)
+
+  def test_use_depthwise_convolution(self):
+    image_features = tf.placeholder(dtype=tf.float32, shape=[4, None, None, 64])
+    conv_box_predictor = box_predictor.ConvolutionalBoxPredictor(
+        is_training=False,
+        num_classes=0,
+        conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
+        min_depth=0,
+        max_depth=32,
+        num_layers_before_predictor=1,
+        dropout_keep_prob=0.8,
+        kernel_size=1,
+        box_code_size=4,
+        use_dropout=True,
+        use_depthwise=True
+    )
+    box_predictions = conv_box_predictor.predict(
+        [image_features], num_predictions_per_location=[5],
+        scope='BoxPredictor')
+    box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
+    objectness_predictions = box_predictions[
+        box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
+    init_op = tf.global_variables_initializer()
+
+    resolution = 32
+    expected_num_anchors = resolution*resolution*5
+    with self.test_session() as sess:
+      sess.run(init_op)
+      (box_encodings_shape,
+       objectness_predictions_shape) = sess.run(
+           [tf.shape(box_encodings), tf.shape(objectness_predictions)],
+           feed_dict={image_features:
+                      np.random.rand(4, resolution, resolution, 64)})
+      actual_variable_set = set(
+          [var.op.name for var in tf.trainable_variables()])
+    self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 1, 4])
+    self.assertAllEqual(objectness_predictions_shape,
+                        [4, expected_num_anchors, 1])
+    expected_variable_set = set([
+        'BoxPredictor/Conv2d_0_1x1_32/biases',
+        'BoxPredictor/Conv2d_0_1x1_32/weights',
+        'BoxPredictor/BoxEncodingPredictor_depthwise/biases',
+        'BoxPredictor/BoxEncodingPredictor_depthwise/depthwise_weights',
+        'BoxPredictor/BoxEncodingPredictor/biases',
+        'BoxPredictor/BoxEncodingPredictor/weights',
+        'BoxPredictor/ClassPredictor_depthwise/biases',
+        'BoxPredictor/ClassPredictor_depthwise/depthwise_weights',
+        'BoxPredictor/ClassPredictor/biases',
+        'BoxPredictor/ClassPredictor/weights'])
+    self.assertEqual(expected_variable_set, actual_variable_set)


 class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
@@ -440,14 +500,26 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):

    with self.test_session(graph=tf.Graph()):
      graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
-               tf.random_uniform([4, 32, 32, 3], dtype=tf.float32))
+               tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
      actual_variable_set = set(
          [var.op.name for var in tf.trainable_variables()])
    expected_variable_set = set([
-        'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_0/weights',
-        'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_0/biases',
-        'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_1/weights',
-        'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_1/biases',
+        ('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
+         'BoxEncodingPredictionTower/conv2d_0/weights'),
+        ('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
+         'BoxEncodingPredictionTower/conv2d_0/biases'),
+        ('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
+         'BoxEncodingPredictionTower/conv2d_1/weights'),
+        ('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
+         'BoxEncodingPredictionTower/conv2d_1/biases'),
+        ('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
+         'ClassPredictionTower/conv2d_0/weights'),
+        ('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
+         'ClassPredictionTower/conv2d_0/biases'),
+        ('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
+         'ClassPredictionTower/conv2d_1/weights'),
+        ('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
+         'ClassPredictionTower/conv2d_1/biases'),
        ('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
         'BoxEncodingPredictor/weights'),
        ('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
@@ -489,6 +561,5 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
      self.assertAllEqual(objectness_predictions_shape,
                          [4, expected_num_anchors, 1])

-
 if __name__ == '__main__':
  tf.test.main()
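
Note on the use_depthwise path in ConvolutionalBoxPredictor: the diff replaces one k x k convolution per head with a k x k depthwise convolution followed by a 1x1 convolution. The sketch below only counts trainable parameters for the two layouts to show why this is the "lite" variant. The helper names and the example sizes (3x3 kernel, 256 input channels, 6 anchors) are illustrative assumptions, not values taken from this commit; the bias terms follow the variable names asserted in test_use_depthwise_convolution.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of one standard k x k convolution."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + out_channels


def depthwise_then_pointwise_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of a k x k depthwise conv followed by a 1x1 conv."""
    # depth_multiplier=1: one k x k filter and one bias per input channel.
    depthwise = kernel_size * kernel_size * in_channels + in_channels
    # 1x1 projection to the requested number of outputs, plus biases.
    pointwise = in_channels * out_channels + out_channels
    return depthwise + pointwise


# Hypothetical box head: 3x3 kernel, 256 input channels,
# 6 anchors per location * box_code_size 4 = 24 outputs.
standard = conv_params(3, 256, 24)
separable = depthwise_then_pointwise_params(3, 256, 24)
print(standard, separable)  # prints 55320 8728
```

With kernel_size=1, as in the new unit test, the depthwise stage adds parameters instead of saving them, so any saving only shows up for larger kernels.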
--- a/research/object_detection/core/matcher.py
+++ b/research/object_detection/core/matcher.py
@@ -36,6 +36,8 @@ from abc import abstractmethod

 import tensorflow as tf

+from object_detection.utils import ops
+

 class Match(object):
  """Class to store results from the matcher.
@@ -44,7 +46,7 @@ class Match(object):
  convenient methods to query the matching results.
  """

-  def __init__(self, match_results):
+  def __init__(self, match_results, use_matmul_gather=False):
    """Constructs a Match object.

    Args:
@@ -52,6 +54,8 @@ class Match(object):
        meaning that column i is matched with row match_results[i].
        (2) match_results[i]=-1, meaning that column i is not matched.
        (3) match_results[i]=-2, meaning that column i is ignored.
+      use_matmul_gather: Use matrix multiplication based gather instead of
+        standard tf.gather. (Default: False).

    Raises:
      ValueError: if match_results does not have rank 1 or is not an
@@ -63,6 +67,9 @@ class Match(object):
      raise ValueError('match_results should be an int32 or int64 scalar '
                       'tensor')
    self._match_results = match_results
+    self._gather_op = tf.gather
+    if use_matmul_gather:
+      self._gather_op = ops.matmul_gather_on_zeroth_axis

  @property
  def match_results(self):
@@ -163,7 +170,7 @@ class Match(object):
      row_indices: int32 tensor of shape [K] with row indices.
    """
    return self._reshape_and_cast(
-        tf.gather(self._match_results, self.matched_column_indices()))
+        self._gather_op(self._match_results, self.matched_column_indices()))

  def _reshape_and_cast(self, t):
    return tf.cast(tf.reshape(t, [-1]), tf.int32)
@@ -193,7 +200,7 @@ class Match(object):
    input_tensor = tf.concat([tf.stack([ignored_value, unmatched_value]),
                              input_tensor], axis=0)
    gather_indices = tf.maximum(self.match_results + 2, 0)
-    gathered_tensor = tf.gather(input_tensor, gather_indices)
+    gathered_tensor = self._gather_op(input_tensor, gather_indices)
    return gathered_tensor


@@ -202,6 +209,16 @@ class Matcher(object):
  """
  __metaclass__ = ABCMeta

+  def __init__(self, use_matmul_gather=False):
+    """Constructs a Matcher.
+
+    Args:
+      use_matmul_gather: Force constructed match objects to use matrix
+        multiplication based gather instead of standard tf.gather.
+        (Default: False).
+    """
+    self._use_matmul_gather = use_matmul_gather
+
  def match(self, similarity_matrix, scope=None, **params):
    """Computes matches among row and column indices and returns the result.

@@ -219,7 +236,8 @@ class Matcher(object):
      A Match object with the results of matching.
    """
    with tf.name_scope(scope, 'Match', [similarity_matrix, params]) as scope:
-      return Match(self._match(similarity_matrix, **params))
+      return Match(self._match(similarity_matrix, **params),
+                   self._use_matmul_gather)

  @abstractmethod
  def _match(self, similarity_matrix, **params):

--- a/research/object_detection/core/matcher_test.py
+++ b/research/object_detection/core/matcher_test.py
@@ -172,5 +172,21 @@ class MatchTest(tf.test.TestCase):
      gathered_tensor_out = gathered_tensor.eval()
    self.assertAllEqual(expected_gathered_tensor, gathered_tensor_out)

+  def test_multidimensional_gather_based_on_match_with_matmul_gather_op(self):
+    match_results = tf.constant([1, -1, -2])
+    input_tensor = tf.constant([[0, 0.5, 0, 0.5], [0, 0, 0.5, 0.5]],
+                               dtype=tf.float32)
+    expected_gathered_tensor = [[0, 0, 0.5, 0.5], [0, 0, 0, 0], [0, 0, 0, 0]]
+    match = matcher.Match(match_results, use_matmul_gather=True)
+    gathered_tensor = match.gather_based_on_match(input_tensor,
+                                                  unmatched_value=tf.zeros(4),
+                                                  ignored_value=tf.zeros(4))
+    self.assertEquals(gathered_tensor.dtype, tf.float32)
+    with self.test_session() as sess:
+      self.assertTrue(
+          all([op.name != 'Gather' for op in sess.graph.get_operations()]))
+      gathered_tensor_out = gathered_tensor.eval()
+    self.assertAllEqual(expected_gathered_tensor, gathered_tensor_out)
+
 if __name__ == '__main__':
  tf.test.main()
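
Note on use_matmul_gather: ops.matmul_gather_on_zeroth_axis itself is not part of the hunks shown here, so the following is only a NumPy sketch of the general idea (gathering rows by multiplying with a one-hot matrix), not the implementation from this commit. The function name matmul_gather_rows is hypothetical. The example reuses the inputs of the new matcher test, including the two prepended rows and the +2 index shift from gather_based_on_match.

```python
import numpy as np


def matmul_gather_rows(params, indices):
    """Gathers rows of a 2-D array via a one-hot matrix product."""
    params = np.asarray(params, dtype=np.float32)
    indices = np.asarray(indices)
    # one_hot[i, j] is 1.0 exactly when indices[i] == j.
    one_hot = indices[:, None] == np.arange(params.shape[0])[None, :]
    return one_hot.astype(params.dtype) @ params


match_results = np.array([1, -1, -2])
input_tensor = np.array([[0, 0.5, 0, 0.5], [0, 0, 0.5, 0.5]], dtype=np.float32)

# Row 0 holds the ignored value, row 1 the unmatched value (both zeros here),
# followed by the real rows, mirroring gather_based_on_match.
stacked = np.concatenate([np.zeros((2, 4), dtype=np.float32), input_tensor])
gather_indices = np.maximum(match_results + 2, 0)

gathered = matmul_gather_rows(stacked, gather_indices)
print(gathered)
```

For these inputs the result equals np.take(stacked, gather_indices, axis=0), which is the expected_gathered_tensor in the test above.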
--- a/research/object_detection/core/model.py
+++ b/research/object_detection/core/model.py
@@ -236,7 +236,8 @@ class DetectionModel(object):
                          groundtruth_boxes_list,
                          groundtruth_classes_list,
                          groundtruth_masks_list=None,
-                          groundtruth_keypoints_list=None):
+                          groundtruth_keypoints_list=None,
+                          groundtruth_weights_list=None):
    """Provide groundtruth tensors.

    Args:
@@ -257,10 +258,15 @@ class DetectionModel(object):
        shape [num_boxes, num_keypoints, 2] containing keypoints.
        Keypoints are assumed to be provided in normalized coordinates and
        missing keypoints should be encoded as NaN.
+      groundtruth_weights_list: A list of 1-D tf.float32 tensors of shape
+        [num_boxes] containing weights for groundtruth boxes.
    """
    self._groundtruth_lists[fields.BoxListFields.boxes] = groundtruth_boxes_list
    self._groundtruth_lists[
        fields.BoxListFields.classes] = groundtruth_classes_list
+    if groundtruth_weights_list:
+      self._groundtruth_lists[fields.BoxListFields.
+                              weights] = groundtruth_weights_list
    if groundtruth_masks_list:
      self._groundtruth_lists[
          fields.BoxListFields.masks] = groundtruth_masks_list
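
Note on the preprocess_vars_cache argument threaded through preprocessor.py in the next file: preprocessor_cache.py is not included in this excerpt, so the sketch below is an assumption about the get-or-create behaviour described in the docstrings (same cache and same key give the same random value), written in plain Python rather than TensorFlow. TinyCache and get_or_create are hypothetical names.

```python
import random


class TinyCache(object):
    """Hypothetical stand-in for a cache of random augmentation variables."""

    def __init__(self):
        self._values = {}

    def get(self, function_id, key):
        return self._values.get((function_id, key))

    def update(self, function_id, key, value):
        self._values[(function_id, key)] = value


def get_or_create(generator_func, function_id, cache, key=''):
    """Returns the cached value if present, else generates and stores one."""
    if cache is None:
        # No cache supplied: every call draws a fresh value.
        return generator_func()
    value = cache.get(function_id, key)
    if value is None:
        value = generator_func()
        cache.update(function_id, key, value)
    return value


cache = TinyCache()
first = get_or_create(random.random, 'selector', cache)
second = get_or_create(random.random, 'selector', cache)
print(first == second)  # prints True
```

Passing cache=None corresponds to case e in the module docstring: nothing is recorded, so no equality with other calls can be relied on.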

--- a/research/object_detection/core/preprocessor.py
+++ b/research/object_detection/core/preprocessor.py
@@ -35,6 +35,27 @@ in each row there is a box with [ymin xmin ymax xmax].
 Boxes are in normalized coordinates meaning
 their coordinate values range in [0, 1]

+To preprocess multiple images with the same operations in cases where
+nondeterministic operations are used, a preprocessor_cache.PreprocessorCache
+object can be passed into the preprocess function or individual operations.
+All nondeterministic operations except random_jitter_boxes support caching.
+E.g.
+Let tensor_dict{1,2,3,4,5} be copies of the same inputs.
+Let preprocess_options contain nondeterministic operation(s) excluding
+random_jitter_boxes.
+
+cache1 = preprocessor_cache.PreprocessorCache()
+cache2 = preprocessor_cache.PreprocessorCache()
+a = preprocess(tensor_dict1, preprocess_options, preprocess_vars_cache=cache1)
+b = preprocess(tensor_dict2, preprocess_options, preprocess_vars_cache=cache1)
+c = preprocess(tensor_dict3, preprocess_options, preprocess_vars_cache=cache2)
+d = preprocess(tensor_dict4, preprocess_options, preprocess_vars_cache=cache2)
+e = preprocess(tensor_dict5, preprocess_options)
+
+Then corresponding tensors of object pairs (a,b) and (c,d)
+are guaranteed to be equal element-wise, but the equality of any other object
+pair cannot be determined.
+
 Important Note: In tensor_dict, images is a rank 4 tensor, but preprocessing
 functions receive a rank 3 tensor for processing the image. Thus, inside the
 preprocess function we squeeze the image to become a rank 3 tensor and then
@@ -42,6 +63,8 @@ we pass it to the functions. At the end of the preprocess we expand the image
 back to rank 4.
 """

+import functools
+import inspect
 import sys
 import tensorflow as tf

@@ -50,45 +73,79 @@ from tensorflow.python.ops import control_flow_ops
 from object_detection.core import box_list
 from object_detection.core import box_list_ops
 from object_detection.core import keypoint_ops
+from object_detection.core import preprocessor_cache
 from object_detection.core import standard_fields as fields
 from object_detection.utils import shape_utils


-def _apply_with_random_selector(x, func, num_cases):
+def _apply_with_random_selector(x,
+                                func,
+                                num_cases,
+                                preprocess_vars_cache=None,
+                                key=''):
  """Computes func(x, sel), with sel sampled from [0...num_cases-1].

+  If both preprocess_vars_cache AND key are the same between two calls, sel will
+  be the same value in both calls.
+
  Args:
    x: input Tensor.
    func: Python function to apply.
    num_cases: Python int32, number of cases to sample sel from.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.
+    key: variable identifier for preprocess_vars_cache.

  Returns:
    The result of func(x, sel), where func receives the value of the
    selector as a python integer, but sel is sampled dynamically.
  """
-  rand_sel = tf.random_uniform([], maxval=num_cases, dtype=tf.int32)
+  generator_func = functools.partial(
+      tf.random_uniform, [], maxval=num_cases, dtype=tf.int32)
+  rand_sel = _get_or_create_preprocess_rand_vars(
+      generator_func, preprocessor_cache.PreprocessorCache.SELECTOR,
+      preprocess_vars_cache, key)
+
  # Pass the real x only to one of the func calls.
  return control_flow_ops.merge([func(
      control_flow_ops.switch(x, tf.equal(rand_sel, case))[1], case)
                                 for case in range(num_cases)])[0]


-def _apply_with_random_selector_tuples(x, func, num_cases):
+def _apply_with_random_selector_tuples(x,
+                                       func,
+                                       num_cases,
+                                       preprocess_vars_cache=None,
+                                       key=''):
  """Computes func(x, sel), with sel sampled from [0...num_cases-1].

+  If both preprocess_vars_cache AND key are the same between two calls, sel will
+  be the same value in both calls.
+
  Args:
    x: A tuple of input tensors.
    func: Python function to apply.
    num_cases: Python int32, number of cases to sample sel from.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.
+    key: variable identifier for preprocess_vars_cache.

  Returns:
    The result of func(x, sel), where func receives the value of the
    selector as a python integer, but sel is sampled dynamically.
  """
  num_inputs = len(x)
-  rand_sel = tf.random_uniform([], maxval=num_cases, dtype=tf.int32)
-  # Pass the real x only to one of the func calls.
+  generator_func = functools.partial(
+      tf.random_uniform, [], maxval=num_cases, dtype=tf.int32)
+  rand_sel = _get_or_create_preprocess_rand_vars(
+      generator_func, preprocessor_cache.PreprocessorCache.SELECTOR_TUPLES,
+      preprocess_vars_cache, key)

+  # Pass the real x only to one of the func calls.
  tuples = [list() for t in x]
  for case in range(num_cases):
    new_x = [control_flow_ops.switch(t, tf.equal(rand_sel, case))[1] for t in x]
@@ -101,6 +158,37 @@ def _apply_with_random_selector_tuples(x, func, num_cases):
  return tuple(tuples)


+def _get_or_create_preprocess_rand_vars(generator_func,
+                                        function_id,
+                                        preprocess_vars_cache,
+                                        key=''):
+  """Returns a tensor stored in preprocess_vars_cache or using generator_func.
+
+  If the tensor was previously generated and appears in the PreprocessorCache,
+  the previously generated tensor will be returned. Otherwise, a new tensor
+  is generated using generator_func and stored in the cache.
+
+  Args:
+    generator_func: A 0-argument function that generates a tensor.
+    function_id: identifier for the preprocessing function used.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.
+    key: identifier for the variable stored.
+  Returns:
+    The cached tensor if one exists, otherwise the newly generated tensor.
+  """
+  if preprocess_vars_cache is not None:
+    var = preprocess_vars_cache.get(function_id, key)
+    if var is None:
+      var = generator_func()
+      preprocess_vars_cache.update(function_id, key, var)
+  else:
+    var = generator_func()
+  return var
+
+
 def _random_integer(minval, maxval, seed):
  """Returns a random 0-D tensor between minval and maxval.

@@ -116,6 +204,40 @@ def _random_integer(minval, maxval, seed):
      [], minval=minval, maxval=maxval, dtype=tf.int32, seed=seed)


+# TODO: This method is needed because the current
+# tf.image.rgb_to_grayscale method does not support quantization. Replace with
+# tf.image.rgb_to_grayscale after quantization support is added.
+def _rgb_to_grayscale(images, name=None):
+  """Converts one or more images from RGB to Grayscale.
+
+  Outputs a tensor of the same `DType` and rank as `images`.  The size of the
+  last dimension of the output is 1, containing the Grayscale value of the
+  pixels.
+
+  Args:
+    images: The RGB tensor to convert. Last dimension must have size 3 and
+      should contain RGB values.
+    name: A name for the operation (optional).
+
+  Returns:
+    The converted grayscale image(s).
+  """
+  with tf.name_scope(name, 'rgb_to_grayscale', [images]) as name:
+    images = tf.convert_to_tensor(images, name='images')
+    # Remember original dtype so we can convert back if needed
+    orig_dtype = images.dtype
+    flt_image = tf.image.convert_image_dtype(images, tf.float32)
+
+    # Reference for converting between RGB and grayscale.
+    # https://en.wikipedia.org/wiki/Luma_%28video%29
+    rgb_weights = [0.2989, 0.5870, 0.1140]
+    rank_1 = tf.expand_dims(tf.rank(images) - 1, 0)
+    gray_float = tf.reduce_sum(
+        flt_image * rgb_weights, rank_1, keepdims=True)
+    gray_float.set_shape(images.get_shape()[:-1].concatenate([1]))
+    return tf.image.convert_image_dtype(gray_float, orig_dtype, name=name)
+
+
 def normalize_image(image, original_minval, original_maxval, target_minval,
                    target_maxval):
  """Normalizes pixel values in the image.
@@ -313,7 +435,8 @@ def random_horizontal_flip(image,
                           masks=None,
                           keypoints=None,
                           keypoint_flip_permutation=None,
-                           seed=None):
+                           seed=None,
+                           preprocess_vars_cache=None):
  """Randomly flips the image and detections horizontally.

  The probability of flipping the image is 50%.
@@ -334,6 +457,10 @@ def random_horizontal_flip(image,
    keypoint_flip_permutation: rank 1 int32 tensor containing the keypoint flip
                               permutation.
    seed: random seed
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same shape as input image.
@@ -365,7 +492,12 @@ def random_horizontal_flip(image,
  with tf.name_scope('RandomHorizontalFlip', values=[image, boxes]):
    result = []
    # random variable defining whether to do flip or not
-    do_a_flip_random = tf.greater(tf.random_uniform([], seed=seed), 0.5)
+    generator_func = functools.partial(tf.random_uniform, [], seed=seed)
+    do_a_flip_random = _get_or_create_preprocess_rand_vars(
+        generator_func,
+        preprocessor_cache.PreprocessorCache.HORIZONTAL_FLIP,
+        preprocess_vars_cache)
+    do_a_flip_random = tf.greater(do_a_flip_random, 0.5)

    # flip image
    image = tf.cond(do_a_flip_random, lambda: _flip_image(image), lambda: image)
@@ -400,7 +532,8 @@ def random_vertical_flip(image,
                         masks=None,
                         keypoints=None,
                         keypoint_flip_permutation=None,
-                         seed=None):
+                         seed=None,
+                         preprocess_vars_cache=None):
  """Randomly flips the image and detections vertically.

  The probability of flipping the image is 50%.
@@ -421,6 +554,10 @@ def random_vertical_flip(image,
    keypoint_flip_permutation: rank 1 int32 tensor containing the keypoint flip
                               permutation.
    seed: random seed
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same shape as input image.
@@ -452,7 +589,11 @@ def random_vertical_flip(image,
  with tf.name_scope('RandomVerticalFlip', values=[image, boxes]):
    result = []
    # random variable defining whether to do flip or not
-    do_a_flip_random = tf.greater(tf.random_uniform([], seed=seed), 0.5)
+    generator_func = functools.partial(tf.random_uniform, [], seed=seed)
+    do_a_flip_random = _get_or_create_preprocess_rand_vars(
+        generator_func, preprocessor_cache.PreprocessorCache.VERTICAL_FLIP,
+        preprocess_vars_cache)
+    do_a_flip_random = tf.greater(do_a_flip_random, 0.5)

    # flip image
    image = tf.cond(do_a_flip_random, lambda: _flip_image(image), lambda: image)
@@ -486,7 +627,8 @@ def random_rotation90(image,
                      boxes=None,
                      masks=None,
                      keypoints=None,
-                      seed=None):
+                      seed=None,
+                      preprocess_vars_cache=None):
  """Randomly rotates the image and detections 90 degrees counter-clockwise.

  The probability of rotating the image is 50%. This can be combined with
@@ -508,6 +650,10 @@ def random_rotation90(image,
               [num_instances, num_keypoints, 2]. The keypoints are in y-x
               normalized coordinates.
    seed: random seed
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same shape as input image.
@@ -533,7 +679,11 @@ def random_rotation90(image,
    result = []

    # random variable defining whether to rotate by 90 degrees or not
-    do_a_rot90_random = tf.greater(tf.random_uniform([], seed=seed), 0.5)
+    generator_func = functools.partial(tf.random_uniform, [], seed=seed)
+    do_a_rot90_random = _get_or_create_preprocess_rand_vars(
+        generator_func, preprocessor_cache.PreprocessorCache.ROTATION90,
+        preprocess_vars_cache)
+    do_a_rot90_random = tf.greater(do_a_rot90_random, 0.5)

    # flip image
    image = tf.cond(do_a_rot90_random, lambda: _rot90_image(image),
@@ -563,7 +713,11 @@ def random_rotation90(image,
    return tuple(result)


-def random_pixel_value_scale(image, minval=0.9, maxval=1.1, seed=None):
+def random_pixel_value_scale(image,
+                             minval=0.9,
+                             maxval=1.1,
+                             seed=None,
+                             preprocess_vars_cache=None):
  """Scales each value in the pixels of the image.

     This function scales each pixel independent of the other ones.
@@ -576,17 +730,24 @@ def random_pixel_value_scale(image, minval=0.9, maxval=1.1, seed=None):
    minval: lower ratio of scaling pixel values.
    maxval: upper ratio of scaling pixel values.
    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same shape as input image.
  """
  with tf.name_scope('RandomPixelValueScale', values=[image]):
-    color_coef = tf.random_uniform(
-        tf.shape(image),
-        minval=minval,
-        maxval=maxval,
-        dtype=tf.float32,
-        seed=seed)
+    generator_func = functools.partial(
+        tf.random_uniform, tf.shape(image),
+        minval=minval, maxval=maxval,
+        dtype=tf.float32, seed=seed)
+    color_coef = _get_or_create_preprocess_rand_vars(
+        generator_func,
+        preprocessor_cache.PreprocessorCache.PIXEL_VALUE_SCALE,
+        preprocess_vars_cache)
+
    image = tf.multiply(image, color_coef)
    image = tf.clip_by_value(image, 0.0, 1.0)

@@ -597,7 +758,8 @@ def random_image_scale(image,
                       masks=None,
                       min_scale_ratio=0.5,
                       max_scale_ratio=2.0,
-                       seed=None):
+                       seed=None,
+                       preprocess_vars_cache=None):
  """Scales the image size.

  Args:
@@ -608,6 +770,10 @@ def random_image_scale(image,
    min_scale_ratio: minimum scaling ratio.
    max_scale_ratio: maximum scaling ratio.
    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same rank as input image.
@@ -619,10 +785,14 @@ def random_image_scale(image,
    image_shape = tf.shape(image)
    image_height = image_shape[0]
    image_width = image_shape[1]
-    size_coef = tf.random_uniform([],
-                                  minval=min_scale_ratio,
-                                  maxval=max_scale_ratio,
-                                  dtype=tf.float32, seed=seed)
+    generator_func = functools.partial(
+        tf.random_uniform, [],
+        minval=min_scale_ratio, maxval=max_scale_ratio,
+        dtype=tf.float32, seed=seed)
+    size_coef = _get_or_create_preprocess_rand_vars(
+        generator_func, preprocessor_cache.PreprocessorCache.IMAGE_SCALE,
+        preprocess_vars_cache)
+
    image_newysize = tf.to_int32(
        tf.multiply(tf.to_float(image_height), size_coef))
    image_newxsize = tf.to_int32(
@@ -637,7 +807,10 @@ def random_image_scale(image,
    return tuple(result)


-def random_rgb_to_gray(image, probability=0.1, seed=None):
+def random_rgb_to_gray(image,
+                       probability=0.1,
+                       seed=None,
+                       preprocess_vars_cache=None):
  """Changes the image from RGB to Grayscale with the given probability.

  Args:
@@ -646,18 +819,25 @@ def random_rgb_to_gray(image, probability=0.1, seed=None):
    probability: the probability of returning a grayscale image.
            The probability should be a number between [0, 1].
    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same shape as input image.
  """
  def _image_to_gray(image):
-    image_gray1 = tf.image.rgb_to_grayscale(image)
+    image_gray1 = _rgb_to_grayscale(image)
    image_gray3 = tf.image.grayscale_to_rgb(image_gray1)
    return image_gray3

  with tf.name_scope('RandomRGBtoGray', values=[image]):
-    # random variable defining whether to do flip or not
-    do_gray_random = tf.random_uniform([], seed=seed)
+    # random variable defining whether to change to grayscale or not
+    generator_func = functools.partial(tf.random_uniform, [], seed=seed)
+    do_gray_random = _get_or_create_preprocess_rand_vars(
+        generator_func, preprocessor_cache.PreprocessorCache.RGB_TO_GRAY,
+        preprocess_vars_cache)

    image = tf.cond(
        tf.greater(do_gray_random, probability), lambda: image,
@@ -666,7 +846,10 @@ def random_rgb_to_gray(image, probability=0.1, seed=None):
  return image


-def random_adjust_brightness(image, max_delta=0.2):
+def random_adjust_brightness(image,
+                             max_delta=0.2,
+                             seed=None,
+                             preprocess_vars_cache=None):
  """Randomly adjusts brightness.

  Makes sure the output image is still between 0 and 1.
@@ -675,18 +858,34 @@ def random_adjust_brightness(image, max_delta=0.2):
    image: rank 3 float32 tensor contains 1 image -> [height, width, channels]
           with pixel values varying between [0, 1].
    max_delta: how much to change the brightness. A value between [0, 1).
+    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same shape as input image.
    boxes: boxes which is the same shape as input boxes.
  """
  with tf.name_scope('RandomAdjustBrightness', values=[image]):
-    image = tf.image.random_brightness(image, max_delta)
+    generator_func = functools.partial(tf.random_uniform, [],
+                                       -max_delta, max_delta, seed=seed)
+    delta = _get_or_create_preprocess_rand_vars(
+        generator_func,
+        preprocessor_cache.PreprocessorCache.ADJUST_BRIGHTNESS,
+        preprocess_vars_cache)
+
+    image = tf.image.adjust_brightness(image, delta)
    image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
    return image


-def random_adjust_contrast(image, min_delta=0.8, max_delta=1.25):
+def random_adjust_contrast(image,
+                           min_delta=0.8,
+                           max_delta=1.25,
+                           seed=None,
+                           preprocess_vars_cache=None):
  """Randomly adjusts contrast.

  Makes sure the output image is still between 0 and 1.
@@ -698,17 +897,31 @@ def random_adjust_contrast(image, min_delta=0.8, max_delta=1.25):
    max_delta: how much to change the contrast. Contrast will change with a
               value between min_delta and max_delta. This value will be
               multiplied to the current contrast of the image.
+    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same shape as input image.
  """
  with tf.name_scope('RandomAdjustContrast', values=[image]):
-    image = tf.image.random_contrast(image, min_delta, max_delta)
+    generator_func = functools.partial(tf.random_uniform, [],
+                                       min_delta, max_delta, seed=seed)
+    contrast_factor = _get_or_create_preprocess_rand_vars(
+        generator_func,
+        preprocessor_cache.PreprocessorCache.ADJUST_CONTRAST,
+        preprocess_vars_cache)
+    image = tf.image.adjust_contrast(image, contrast_factor)
    image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
    return image


-def random_adjust_hue(image, max_delta=0.02):
+def random_adjust_hue(image,
+                      max_delta=0.02,
+                      seed=None,
+                      preprocess_vars_cache=None):
  """Randomly adjusts hue.

  Makes sure the output image is still between 0 and 1.
@@ -717,17 +930,31 @@ def random_adjust_hue(image, max_delta=0.02):
    image: rank 3 float32 tensor contains 1 image -> [height, width, channels]
           with pixel values varying between [0, 1].
    max_delta: change hue randomly with a value between 0 and max_delta.
+    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same shape as input image.
  """
  with tf.name_scope('RandomAdjustHue', values=[image]):
-    image = tf.image.random_hue(image, max_delta)
+    generator_func = functools.partial(tf.random_uniform, [],
+                                       -max_delta, max_delta, seed=seed)
+    delta = _get_or_create_preprocess_rand_vars(
+        generator_func, preprocessor_cache.PreprocessorCache.ADJUST_HUE,
+        preprocess_vars_cache)
+    image = tf.image.adjust_hue(image, delta)
    image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
    return image


-def random_adjust_saturation(image, min_delta=0.8, max_delta=1.25):
+def random_adjust_saturation(image,
+                             min_delta=0.8,
+                             max_delta=1.25,
+                             seed=None,
+                             preprocess_vars_cache=None):
  """Randomly adjusts saturation.

  Makes sure the output image is still between 0 and 1.
@@ -739,17 +966,28 @@ def random_adjust_saturation(image, min_delta=0.8, max_delta=1.25):
    max_delta: how much to change the saturation. Saturation will change with a
               value between min_delta and max_delta. This value will be
               multiplied to the current saturation of the image.
+    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same shape as input image.
  """
  with tf.name_scope('RandomAdjustSaturation', values=[image]):
-    image = tf.image.random_saturation(image, min_delta, max_delta)
+    generator_func = functools.partial(tf.random_uniform, [],
+                                       min_delta, max_delta, seed=seed)
+    saturation_factor = _get_or_create_preprocess_rand_vars(
+        generator_func,
+        preprocessor_cache.PreprocessorCache.ADJUST_SATURATION,
+        preprocess_vars_cache)
+    image = tf.image.adjust_saturation(image, saturation_factor)
    image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
    return image


-def random_distort_color(image, color_ordering=0):
+def random_distort_color(image, color_ordering=0, preprocess_vars_cache=None):
  """Randomly distorts color.

  Randomly distorts color using a combination of brightness, hue, contrast
@@ -759,6 +997,10 @@ def random_distort_color(image, color_ordering=0):
    image: rank 3 float32 tensor contains 1 image -> [height, width, channels]
           with pixel values varying between [0, 1].
    color_ordering: Python int, a type of distortion (valid values: 0, 1).
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same shape as input image.
@@ -768,20 +1010,34 @@ def random_distort_color(image, color_ordering=0):
  """
  with tf.name_scope('RandomDistortColor', values=[image]):
    if color_ordering == 0:
-      image = tf.image.random_brightness(image, max_delta=32. / 255.)
-      image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
-      image = tf.image.random_hue(image, max_delta=0.2)
-      image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
+      image = random_adjust_brightness(
+          image, max_delta=32. / 255.,
+          preprocess_vars_cache=preprocess_vars_cache)
+      image = random_adjust_saturation(
+          image, min_delta=0.5, max_delta=1.5,
+          preprocess_vars_cache=preprocess_vars_cache)
+      image = random_adjust_hue(
+          image, max_delta=0.2,
+          preprocess_vars_cache=preprocess_vars_cache)
+      image = random_adjust_contrast(
+          image, min_delta=0.5, max_delta=1.5,
+          preprocess_vars_cache=preprocess_vars_cache)
+
    elif color_ordering == 1:
-      image = tf.image.random_brightness(image, max_delta=32. / 255.)
-      image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
-      image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
-      image = tf.image.random_hue(image, max_delta=0.2)
+      image = random_adjust_brightness(
+          image, max_delta=32. / 255.,
+          preprocess_vars_cache=preprocess_vars_cache)
+      image = random_adjust_contrast(
+          image, min_delta=0.5, max_delta=1.5,
+          preprocess_vars_cache=preprocess_vars_cache)
+      image = random_adjust_saturation(
+          image, min_delta=0.5, max_delta=1.5,
+          preprocess_vars_cache=preprocess_vars_cache)
+      image = random_adjust_hue(
+          image, max_delta=0.2,
+          preprocess_vars_cache=preprocess_vars_cache)
    else:
      raise ValueError('color_ordering must be in {0, 1}')
-
-    # The random_* ops do not necessarily clamp.
-    image = tf.clip_by_value(image, 0.0, 1.0)
    return image


@@ -846,7 +1102,8 @@ def _strict_random_crop_image(image,
                              min_object_covered=1.0,
                              aspect_ratio_range=(0.75, 1.33),
                              area_range=(0.1, 1.0),
-                              overlap_thresh=0.3):
+                              overlap_thresh=0.3,
+                              preprocess_vars_cache=None):
  """Performs random crop.

  Note: boxes will be clipped to the crop. Keypoint coordinates that are
@@ -879,6 +1136,10 @@ def _strict_random_crop_image(image,
                original image.
    overlap_thresh: minimum overlap thresh with new cropped
                    image to keep the box.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same rank as input image.
@@ -901,7 +1162,8 @@ def _strict_random_crop_image(image,
        tf.clip_by_value(
            boxes, clip_value_min=0.0, clip_value_max=1.0), 1)

-    sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box(
+    generator_func = functools.partial(
+        tf.image.sample_distorted_bounding_box,
        image_shape,
        bounding_boxes=boxes_expanded,
        min_object_covered=min_object_covered,
@@ -910,6 +1172,13 @@ def _strict_random_crop_image(image,
        max_attempts=100,
        use_image_if_no_bounding_boxes=True)

+    # for ssd cropping, each value of min_object_covered has its own
+    # cached random variable
+    sample_distorted_bounding_box = _get_or_create_preprocess_rand_vars(
+        generator_func,
+        preprocessor_cache.PreprocessorCache.STRICT_CROP_IMAGE,
+        preprocess_vars_cache, key=min_object_covered)
+
    im_box_begin, im_box_size, im_box = sample_distorted_bounding_box

    new_image = tf.slice(image, im_box_begin, im_box_size)
@@ -985,7 +1254,8 @@ def random_crop_image(image,
                      area_range=(0.1, 1.0),
                      overlap_thresh=0.3,
                      random_coef=0.0,
-                      seed=None):
+                      seed=None,
+                      preprocess_vars_cache=None):
  """Randomly crops the image.

  Given the input image and its bounding boxes, this op randomly
@@ -1030,6 +1300,10 @@ def random_crop_image(image,
                 cropped image, and if it is 1.0, we will always get the
                 original image.
    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: Image shape will be [new_height, new_width, channels].
@@ -1057,13 +1331,17 @@ def random_crop_image(image,
        min_object_covered=min_object_covered,
        aspect_ratio_range=aspect_ratio_range,
        area_range=area_range,
-        overlap_thresh=overlap_thresh)
+        overlap_thresh=overlap_thresh,
+        preprocess_vars_cache=preprocess_vars_cache)

  # avoids tf.cond to make faster RCNN training on borg. See b/140057645.
  if random_coef < sys.float_info.min:
    result = strict_random_crop_image_fn()
  else:
-    do_a_crop_random = tf.random_uniform([], seed=seed)
+    generator_func = functools.partial(tf.random_uniform, [], seed=seed)
+    do_a_crop_random = _get_or_create_preprocess_rand_vars(
+        generator_func, preprocessor_cache.PreprocessorCache.CROP_IMAGE,
+        preprocess_vars_cache)
    do_a_crop_random = tf.greater(do_a_crop_random, random_coef)

    outputs = [image, boxes, labels]
@@ -1085,7 +1363,8 @@ def random_pad_image(image,
                     min_image_size=None,
                     max_image_size=None,
                     pad_color=None,
-                     seed=None):
+                     seed=None,
+                     preprocess_vars_cache=None):
  """Randomly pads the image.

  This function randomly pads the image with zeros. The final size of the
@@ -1111,8 +1390,11 @@ def random_pad_image(image,
    pad_color: padding color. A rank 1 tensor of [3] with dtype=tf.float32.
               if set as None, it will be set to average color of the input
               image.
-
    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: Image shape will be [new_height, new_width, channels].
@@ -1156,6 +1438,12 @@ def random_pad_image(image,
      lambda: _random_integer(0, target_width - image_width, seed),
      lambda: tf.constant(0, dtype=tf.int32))

+  gen_func = lambda: (target_height, target_width, offset_height, offset_width)
+  params = _get_or_create_preprocess_rand_vars(
+      gen_func, preprocessor_cache.PreprocessorCache.PAD_IMAGE,
+      preprocess_vars_cache)
+  target_height, target_width, offset_height, offset_width = params
+
  new_image = tf.image.pad_to_bounding_box(
      image,
      offset_height=offset_height,
@@ -1201,7 +1489,8 @@ def random_crop_pad_image(image,
                          min_padded_size_ratio=(1.0, 1.0),
                          max_padded_size_ratio=(2.0, 2.0),
                          pad_color=None,
-                          seed=None):
+                          seed=None,
+                          preprocess_vars_cache=None):
  """Randomly crops and pads the image.

  Given an input image and its bounding boxes, this op first randomly crops
@@ -1242,6 +1531,10 @@ def random_crop_pad_image(image,
               if set as None, it will be set to average color of the randomly
               cropped image.
    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    padded_image: padded image.
@@ -1264,7 +1557,8 @@ def random_crop_pad_image(image,
      area_range=area_range,
      overlap_thresh=overlap_thresh,
      random_coef=random_coef,
-      seed=seed)
+      seed=seed,
+      preprocess_vars_cache=preprocess_vars_cache)

  cropped_image, cropped_boxes, cropped_labels = result[:3]

@@ -1281,7 +1575,8 @@ def random_crop_pad_image(image,
      min_image_size=min_image_size,
      max_image_size=max_image_size,
      pad_color=pad_color,
-      seed=seed)
+      seed=seed,
+      preprocess_vars_cache=preprocess_vars_cache)

  cropped_padded_output = (padded_image, padded_boxes, cropped_labels)

@@ -1300,7 +1595,8 @@ def random_crop_to_aspect_ratio(image,
                                keypoints=None,
                                aspect_ratio=1.0,
                                overlap_thresh=0.3,
-                                seed=None):
+                                seed=None,
+                                preprocess_vars_cache=None):
  """Randomly crops an image to the specified aspect ratio.

  Randomly crops the a portion of the image such that the crop is of the
@@ -1332,6 +1628,10 @@ def random_crop_to_aspect_ratio(image,
    overlap_thresh: minimum overlap thresh with new cropped
                    image to keep the box.
    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same rank as input image.
@@ -1375,6 +1675,13 @@ def random_crop_to_aspect_ratio(image,
    # offset_height is randomly chosen from [0, offset_height - target_height)
    offset_height = _random_integer(0, orig_height - target_height + 1, seed)
    offset_width = _random_integer(0, orig_width - target_width + 1, seed)
+
+    generator_func = lambda: (offset_height, offset_width)
+    offset_height, offset_width = _get_or_create_preprocess_rand_vars(
+        generator_func,
+        preprocessor_cache.PreprocessorCache.CROP_TO_ASPECT_RATIO,
+        preprocess_vars_cache)
+
    new_image = tf.image.crop_to_bounding_box(
        image, offset_height, offset_width, target_height, target_width)

@@ -1437,7 +1744,8 @@ def random_pad_to_aspect_ratio(image,
                               aspect_ratio=1.0,
                               min_padded_size_ratio=(1.0, 1.0),
                               max_padded_size_ratio=(2.0, 2.0),
-                               seed=None):
+                               seed=None,
+                               preprocess_vars_cache=None):
  """Randomly zero pads an image to the specified aspect ratio.

  Pads the image so that the resulting image will have the specified aspect
@@ -1465,6 +1773,10 @@ def random_pad_to_aspect_ratio(image,
    max_padded_size_ratio: max ratio of padded image height and width to the
                           input image's height and width.
    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same rank as input image.
@@ -1511,7 +1823,13 @@ def random_pad_to_aspect_ratio(image,

    min_scale = tf.maximum(min_height / target_height, min_width / target_width)
    max_scale = tf.minimum(max_height / target_height, max_width / target_width)
-    scale = tf.random_uniform([], min_scale, max_scale, seed=seed)
+
+    generator_func = functools.partial(tf.random_uniform, [],
+                                       min_scale, max_scale, seed=seed)
+    scale = _get_or_create_preprocess_rand_vars(
+        generator_func,
+        preprocessor_cache.PreprocessorCache.PAD_TO_ASPECT_RATIO,
+        preprocess_vars_cache)

    target_height = scale * target_height
    target_width = scale * target_width
@@ -1550,7 +1868,8 @@ def random_black_patches(image,
                         max_black_patches=10,
                         probability=0.5,
                         size_to_image_ratio=0.1,
-                         random_seed=None):
+                         random_seed=None,
+                         preprocess_vars_cache=None):
  """Randomly adds some black patches to the image.

  This op adds up to max_black_patches square black patches of a fixed size
@@ -1567,15 +1886,20 @@ def random_black_patches(image,
                         box_size = size_to_image_ratio *
                                    min(image_width, image_height)
    random_seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image
  """
-  def add_black_patch_to_image(image):
+  def add_black_patch_to_image(image, idx):
    """Function for adding one patch to the image.

    Args:
      image: image
+      idx: counter for number of patches that could have been added

    Returns:
      image with a randomly added black box
@@ -1587,10 +1911,19 @@ def random_black_patches(image,
        tf.multiply(
            tf.minimum(tf.to_float(image_height), tf.to_float(image_width)),
            size_to_image_ratio))
-    normalized_y_min = tf.random_uniform(
-        [], minval=0.0, maxval=(1.0 - size_to_image_ratio), seed=random_seed)
-    normalized_x_min = tf.random_uniform(
-        [], minval=0.0, maxval=(1.0 - size_to_image_ratio), seed=random_seed)
+
+    generator_func = functools.partial(tf.random_uniform, [], minval=0.0,
+                                       maxval=(1.0 - size_to_image_ratio),
+                                       seed=random_seed)
+    normalized_y_min = _get_or_create_preprocess_rand_vars(
+        generator_func,
+        preprocessor_cache.PreprocessorCache.ADD_BLACK_PATCH,
+        preprocess_vars_cache, key=str(idx) + 'y')
+    normalized_x_min = _get_or_create_preprocess_rand_vars(
+        generator_func,
+        preprocessor_cache.PreprocessorCache.ADD_BLACK_PATCH,
+        preprocess_vars_cache, key=str(idx) + 'x')
+
    y_min = tf.to_int32(normalized_y_min * tf.to_float(image_height))
    x_min = tf.to_int32(normalized_x_min * tf.to_float(image_width))
    black_box = tf.ones([box_size, box_size, 3], dtype=tf.float32)
@@ -1600,13 +1933,17 @@ def random_black_patches(image,
    return image

  with tf.name_scope('RandomBlackPatchInImage', values=[image]):
-    for _ in range(max_black_patches):
-      random_prob = tf.random_uniform(
-          [], minval=0.0, maxval=1.0, dtype=tf.float32, seed=random_seed)
+    for idx in range(max_black_patches):
+      generator_func = functools.partial(tf.random_uniform, [],
+                                         minval=0.0, maxval=1.0,
+                                         dtype=tf.float32, seed=random_seed)
+      random_prob = _get_or_create_preprocess_rand_vars(
+          generator_func,
+          preprocessor_cache.PreprocessorCache.BLACK_PATCHES,
+          preprocess_vars_cache, key=idx)
      image = tf.cond(
          tf.greater(random_prob, probability), lambda: image,
-          lambda: add_black_patch_to_image(image))
-
+          functools.partial(add_black_patch_to_image, image=image, idx=idx))
    return image


@@ -1624,12 +1961,16 @@ def image_to_float(image):
    return image


-def random_resize_method(image, target_size):
+def random_resize_method(image, target_size, preprocess_vars_cache=None):
  """Uses a random resize method to resize the image to target size.

  Args:
    image: a rank 3 tensor.
    target_size: a list of [target_height, target_width]
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    resized image.
@@ -1638,7 +1979,9 @@ def random_resize_method(image, target_size):
  resized_image = _apply_with_random_selector(
      image,
      lambda x, method: tf.image.resize_images(x, target_size, method),
-      num_cases=4)
+      num_cases=4,
+      preprocess_vars_cache=preprocess_vars_cache,
+      key=preprocessor_cache.PreprocessorCache.RESIZE_METHOD)

  return resized_image

@@ -2000,7 +2343,7 @@ def rgb_to_gray(image):
  Returns:
    image: A single channel grayscale image -> [image, height, 1].
  """
-  return tf.image.rgb_to_grayscale(image)
+  return _rgb_to_grayscale(image)


 def ssd_random_crop(image,
@@ -2014,7 +2357,8 @@ def ssd_random_crop(image,
                    area_range=((0.1, 1.0),) * 7,
                    overlap_thresh=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0),
                    random_coef=(0.15,) * 7,
-                    seed=None):
+                    seed=None,
+                    preprocess_vars_cache=None):
  """Random crop preprocessing with default parameters as in SSD paper.

  Liu et al., SSD: Single shot multibox detector.
@@ -2048,6 +2392,10 @@ def ssd_random_crop(image,
                 cropped image, and if it is 1.0, we will always get the
                 original image.
    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same rank as input image.
@@ -2100,14 +2448,17 @@ def ssd_random_crop(image,
        area_range=area_range[index],
        overlap_thresh=overlap_thresh[index],
        random_coef=random_coef[index],
-        seed=seed)
+        seed=seed,
+        preprocess_vars_cache=preprocess_vars_cache)

  result = _apply_with_random_selector_tuples(
      tuple(
          t for t in (image, boxes, labels, label_scores, masks, keypoints)
          if t is not None),
      random_crop_selector,
-      num_cases=len(min_object_covered))
+      num_cases=len(min_object_covered),
+      preprocess_vars_cache=preprocess_vars_cache,
+      key=preprocessor_cache.PreprocessorCache.SSD_CROP_SELECTOR_ID)
  return result


@@ -2123,7 +2474,8 @@ def ssd_random_crop_pad(image,
                        min_padded_size_ratio=((1.0, 1.0),) * 6,
                        max_padded_size_ratio=((2.0, 2.0),) * 6,
                        pad_color=(None,) * 6,
-                        seed=None):
+                        seed=None,
+                        preprocess_vars_cache=None):
  """Random crop preprocessing with default parameters as in SSD paper.

  Liu et al., SSD: Single shot multibox detector.
@@ -2159,6 +2511,10 @@ def ssd_random_crop_pad(image,
               if set as None, it will be set to average color of the randomly
               cropped image.
    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: Image shape will be [new_height, new_width, channels].
@@ -2188,12 +2544,15 @@ def ssd_random_crop_pad(image,
        min_padded_size_ratio=min_padded_size_ratio[index],
        max_padded_size_ratio=max_padded_size_ratio[index],
        pad_color=pad_color[index],
-        seed=seed)
+        seed=seed,
+        preprocess_vars_cache=preprocess_vars_cache)

  return _apply_with_random_selector_tuples(
      tuple(t for t in (image, boxes, labels, label_scores) if t is not None),
      random_crop_pad_selector,
-      num_cases=len(min_object_covered))
+      num_cases=len(min_object_covered),
+      preprocess_vars_cache=preprocess_vars_cache,
+      key=preprocessor_cache.PreprocessorCache.SSD_CROP_PAD_SELECTOR_ID)


 def ssd_random_crop_fixed_aspect_ratio(
@@ -2208,7 +2567,8 @@ def ssd_random_crop_fixed_aspect_ratio(
    area_range=((0.1, 1.0),) * 7,
    overlap_thresh=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0),
    random_coef=(0.15,) * 7,
-    seed=None):
+    seed=None,
+    preprocess_vars_cache=None):
  """Random crop preprocessing with default parameters as in SSD paper.

  Liu et al., SSD: Single shot multibox detector.
@@ -2245,6 +2605,10 @@ def ssd_random_crop_fixed_aspect_ratio(
                 cropped image, and if it is 1.0, we will always get the
                 original image.
    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same rank as input image.
@@ -2263,7 +2627,8 @@ def ssd_random_crop_fixed_aspect_ratio(

  crop_result = ssd_random_crop(
      image, boxes, labels, label_scores, masks, keypoints, min_object_covered,
-      aspect_ratio_range, area_range, overlap_thresh, random_coef, seed)
+      aspect_ratio_range, area_range, overlap_thresh, random_coef, seed,
+      preprocess_vars_cache)
  i = 3
  new_image, new_boxes, new_labels = crop_result[:i]
  new_label_scores = None
@@ -2285,7 +2650,8 @@ def ssd_random_crop_fixed_aspect_ratio(
      new_masks,
      new_keypoints,
      aspect_ratio=aspect_ratio,
-      seed=seed)
+      seed=seed,
+      preprocess_vars_cache=preprocess_vars_cache)

  return result

@@ -2305,7 +2671,8 @@ def ssd_random_crop_pad_fixed_aspect_ratio(
    random_coef=(0.15,) * 7,
    min_padded_size_ratio=(1.0, 1.0),
    max_padded_size_ratio=(2.0, 2.0),
-    seed=None):
+    seed=None,
+    preprocess_vars_cache=None):
  """Random crop and pad preprocessing with default parameters as in SSD paper.

  Liu et al., SSD: Single shot multibox detector.
@@ -2348,6 +2715,10 @@ def ssd_random_crop_pad_fixed_aspect_ratio(
    max_padded_size_ratio: max ratio of padded image height and width to the
                           input image's height and width.
    seed: random seed.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    image: image which is the same rank as input image.
@@ -2364,7 +2735,8 @@ def ssd_random_crop_pad_fixed_aspect_ratio(
  """
  crop_result = ssd_random_crop(
      image, boxes, labels, label_scores, masks, keypoints, min_object_covered,
-      aspect_ratio_range, area_range, overlap_thresh, random_coef, seed)
+      aspect_ratio_range, area_range, overlap_thresh, random_coef, seed,
+      preprocess_vars_cache)
  i = 3
  new_image, new_boxes, new_labels = crop_result[:i]
  new_label_scores = None
@@ -2386,7 +2758,8 @@ def ssd_random_crop_pad_fixed_aspect_ratio(
      aspect_ratio=aspect_ratio,
      min_padded_size_ratio=min_padded_size_ratio,
      max_padded_size_ratio=max_padded_size_ratio,
-      seed=seed)
+      seed=seed,
+      preprocess_vars_cache=preprocess_vars_cache)

  result = list(result)
  if new_label_scores is not None:
@@ -2534,7 +2907,10 @@ def get_default_func_arg_map(include_label_scores=False,
  return prep_func_arg_map


-def preprocess(tensor_dict, preprocess_options, func_arg_map=None):
+def preprocess(tensor_dict,
+               preprocess_options,
+               func_arg_map=None,
+               preprocess_vars_cache=None):
  """Preprocess images and bounding boxes.

  Various types of preprocessing (to be implemented) based on the
@@ -2559,6 +2935,10 @@ def preprocess(tensor_dict, preprocess_options, func_arg_map=None):
                        their values.
    func_arg_map: mapping from preprocessing functions to arguments that they
                  expect to receive and return.
+    preprocess_vars_cache: PreprocessorCache object that records previously
+                           performed augmentations. Updated in-place. If this
+                           function is called multiple times with the same
+                           non-null cache, it will perform deterministically.

  Returns:
    tensor_dict: which contains the preprocessed images, bounding boxes, etc.
@@ -2598,6 +2978,9 @@ def preprocess(tensor_dict, preprocess_options, func_arg_map=None):
      return tensor_dict[key] if key is not None else None

    args = [get_arg(a) for a in arg_names]
+    if (preprocess_vars_cache is not None and
+        'preprocess_vars_cache' in inspect.getargspec(func).args):
+      params['preprocess_vars_cache'] = preprocess_vars_cache
    results = func(*args, **params)
    if not isinstance(results, (list, tuple)):
      results = (results,)

--- a/research/object_detection/core/preprocessor_cache.py
+++ b/research/object_detection/core/preprocessor_cache.py
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Records previous preprocessing operations and allows them to be repeated.
+
+Used with object_detection.core.preprocessor. Passing a PreprocessorCache
+into individual data augmentation functions or the general preprocess() function
+will store all randomly generated variables in the PreprocessorCache. When
+a preprocessor function is called multiple times with the same
+PreprocessorCache object, that function will perform the same augmentation
+on all calls.
+"""
+
+from collections import defaultdict
+
+
+class PreprocessorCache(object):
+  """Dictionary wrapper storing random variables generated during preprocessing.
+  """
+
+  # Constant keys representing different preprocessing functions
+  ROTATION90 = 'rotation90'
+  HORIZONTAL_FLIP = 'horizontal_flip'
+  VERTICAL_FLIP = 'vertical_flip'
+  PIXEL_VALUE_SCALE = 'pixel_value_scale'
+  IMAGE_SCALE = 'image_scale'
+  RGB_TO_GRAY = 'rgb_to_gray'
+  ADJUST_BRIGHTNESS = 'adjust_brightness'
+  ADJUST_CONTRAST = 'adjust_contrast'
+  ADJUST_HUE = 'adjust_hue'
+  ADJUST_SATURATION = 'adjust_saturation'
+  DISTORT_COLOR = 'distort_color'
+  STRICT_CROP_IMAGE = 'strict_crop_image'
+  CROP_IMAGE = 'crop_image'
+  PAD_IMAGE = 'pad_image'
+  CROP_TO_ASPECT_RATIO = 'crop_to_aspect_ratio'
+  RESIZE_METHOD = 'resize_method'
+  PAD_TO_ASPECT_RATIO = 'pad_to_aspect_ratio'
+  BLACK_PATCHES = 'black_patches'
+  ADD_BLACK_PATCH = 'add_black_patch'
+  SELECTOR = 'selector'
+  SELECTOR_TUPLES = 'selector_tuples'
+  SSD_CROP_SELECTOR_ID = 'ssd_crop_selector_id'
+  SSD_CROP_PAD_SELECTOR_ID = 'ssd_crop_pad_selector_id'
+
+  # 23 permitted function ids
+  _VALID_FNS = [ROTATION90, HORIZONTAL_FLIP, VERTICAL_FLIP, PIXEL_VALUE_SCALE,
+                IMAGE_SCALE, RGB_TO_GRAY, ADJUST_BRIGHTNESS, ADJUST_CONTRAST,
+                ADJUST_HUE, ADJUST_SATURATION, DISTORT_COLOR, STRICT_CROP_IMAGE,
+                CROP_IMAGE, PAD_IMAGE, CROP_TO_ASPECT_RATIO, RESIZE_METHOD,
+                PAD_TO_ASPECT_RATIO, BLACK_PATCHES, ADD_BLACK_PATCH, SELECTOR,
+                SELECTOR_TUPLES, SSD_CROP_SELECTOR_ID, SSD_CROP_PAD_SELECTOR_ID]
+
+  def __init__(self):
+    self._history = defaultdict(dict)
+
+  def clear(self):
+    """Resets cache."""
+    self._history = defaultdict(dict)
+
+  def get(self, function_id, key):
+    """Gets stored value given a function id and key.
+
+    Args:
+      function_id: identifier for the preprocessing function used.
+      key: identifier for the variable stored.
+    Returns:
+      value: the corresponding value, expected to be a tensor or
+             nested structure of tensors.
+    Raises:
+      ValueError: if function_id is not one of the 23 valid function ids.
+    """
+    if function_id not in self._VALID_FNS:
+      raise ValueError('Function id not recognized: %s.' % str(function_id))
+    return self._history[function_id].get(key)
+
+  def update(self, function_id, key, value):
+    """Adds a value to the dictionary.
+
+    Args:
+      function_id: identifier for the preprocessing function used.
+      key: identifier for the variable stored.
+      value: the value to store, expected to be a tensor or nested structure
+             of tensors.
+    Raises:
+      ValueError: if function_id is not one of the 23 valid function ids.
+    """
+    if function_id not in self._VALID_FNS:
+      raise ValueError('Function id not recognized: %s.' % str(function_id))
+    self._history[function_id][key] = value
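
Review note: the call sites in preprocessor.py all route their random draws through `_get_or_create_preprocess_rand_vars`, whose definition is outside this excerpt. The sketch below is only a guess at that get-or-create pattern, inferred from how it is called (`generator_func, function_id, cache, key`). `MiniCache` and `get_or_create` are hypothetical names, plain Python floats stand in for tensors, and function-id validation is omitted.

```python
import random
from collections import defaultdict


class MiniCache(object):
  """Hypothetical stand-in for PreprocessorCache, without id validation."""

  def __init__(self):
    self._history = defaultdict(dict)

  def get(self, function_id, key):
    # Returns None when nothing has been stored for (function_id, key).
    return self._history[function_id].get(key)

  def update(self, function_id, key, value):
    self._history[function_id][key] = value


def get_or_create(generator_func, function_id, cache, key=''):
  """Returns the cached value for (function_id, key), creating it if absent."""
  if cache is None:
    # No cache supplied: draw a fresh value on every call.
    return generator_func()
  value = cache.get(function_id, key)
  if value is None:
    value = generator_func()
    cache.update(function_id, key, value)
  return value


cache = MiniCache()
draw = lambda: random.uniform(0.0, 1.0)

# Two calls with the same cache, function id and key reuse the first draw.
first = get_or_create(draw, 'pad_to_aspect_ratio', cache)
second = get_or_create(draw, 'pad_to_aspect_ratio', cache)

# A distinct key (as random_black_patches uses per patch) is stored separately.
patch_y = get_or_create(draw, 'add_black_patch', cache, key='0y')
```

Under these assumptions, repeated calls that share a cache replay the first draw, which is the behavior `_testPreprocessorCache` checks by comparing outputs across runs.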
+
--- a/research/object_detection/core/preprocessor_test.py
+++ b/research/object_detection/core/preprocessor_test.py
@@ -21,6 +21,7 @@ import six
 import tensorflow as tf

 from object_detection.core import preprocessor
+from object_detection.core import preprocessor_cache
 from object_detection.core import standard_fields as fields

 if six.PY2:
@@ -290,6 +291,15 @@ class PreprocessorTest(tf.test.TestCase):
  def expectedLabelsAfterThresholdingWithMissingScore(self):
    return tf.constant([2], dtype=tf.float32)

+  def testRgbToGrayscale(self):
+    images = self.createTestImages()
+    grayscale_images = preprocessor._rgb_to_grayscale(images)
+    expected_images = tf.image.rgb_to_grayscale(images)
+    with self.test_session() as sess:
+      (grayscale_images, expected_images) = sess.run(
+          [grayscale_images, expected_images])
+      self.assertAllEqual(expected_images, grayscale_images)
+
  def testNormalizeImage(self):
    preprocess_options = [(preprocessor.normalize_image, {
        'original_minval': 0,
@@ -435,6 +445,55 @@ class PreprocessorTest(tf.test.TestCase):
      rotated_mask, expected_mask = sess.run([rotated_mask, expected_mask])
      self.assertAllEqual(rotated_mask.flatten(), expected_mask.flatten())

+  def _testPreprocessorCache(self,
+                             preprocess_options,
+                             test_boxes=False,
+                             test_masks=False,
+                             test_keypoints=False,
+                             num_runs=4):
+    cache = preprocessor_cache.PreprocessorCache()
+    images = self.createTestImages()
+    boxes = self.createTestBoxes()
+    classes = self.createTestLabels()
+    masks = self.createTestMasks()
+    keypoints = self.createTestKeypoints()
+    preprocessor_arg_map = preprocessor.get_default_func_arg_map(
+        include_instance_masks=test_masks, include_keypoints=test_keypoints)
+    out = []
+    for i in range(num_runs):
+      tensor_dict = {
+          fields.InputDataFields.image: images,
+      }
+      num_outputs = 1
+      if test_boxes:
+        tensor_dict[fields.InputDataFields.groundtruth_boxes] = boxes
+        tensor_dict[fields.InputDataFields.groundtruth_classes] = classes
+        num_outputs += 1
+      if test_masks:
+        tensor_dict[fields.InputDataFields.groundtruth_instance_masks] = masks
+        num_outputs += 1
+      if test_keypoints:
+        tensor_dict[fields.InputDataFields.groundtruth_keypoints] = keypoints
+        num_outputs += 1
+      out.append(preprocessor.preprocess(
+          tensor_dict, preprocess_options, preprocessor_arg_map, cache))
+
+    with self.test_session() as sess:
+      to_run = []
+      for i in range(num_runs):
+        to_run.append(out[i][fields.InputDataFields.image])
+        if test_boxes:
+          to_run.append(out[i][fields.InputDataFields.groundtruth_boxes])
+        if test_masks:
+          to_run.append(
+              out[i][fields.InputDataFields.groundtruth_instance_masks])
+        if test_keypoints:
+          to_run.append(out[i][fields.InputDataFields.groundtruth_keypoints])
+
+      out_array = sess.run(to_run)
+      for i in range(num_outputs, len(out_array)):
+        self.assertAllClose(out_array[i], out_array[i - num_outputs])
+
  def testRandomHorizontalFlip(self):
    preprocess_options = [(preprocessor.random_horizontal_flip, {})]
    images = self.expectedImagesAfterNormalization()
@@ -491,6 +550,16 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertAllClose(boxes_, boxes_expected_)
      self.assertAllClose(images_diff_, images_diff_expected_)

+  def testRandomHorizontalFlipWithCache(self):
+    keypoint_flip_permutation = self.createKeypointFlipPermutation()
+    preprocess_options = [
+        (preprocessor.random_horizontal_flip,
+         {'keypoint_flip_permutation': keypoint_flip_permutation})]
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=True,
+                                test_masks=True,
+                                test_keypoints=True)
+
  def testRunRandomHorizontalFlipWithMaskAndKeypoints(self):
    preprocess_options = [(preprocessor.random_horizontal_flip, {})]
    image_height = 3
@@ -578,6 +647,16 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertAllClose(boxes_, boxes_expected_)
      self.assertAllClose(images_diff_, images_diff_expected_)

+  def testRandomVerticalFlipWithCache(self):
+    keypoint_flip_permutation = self.createKeypointFlipPermutation()
+    preprocess_options = [
+        (preprocessor.random_vertical_flip,
+         {'keypoint_flip_permutation': keypoint_flip_permutation})]
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=True,
+                                test_masks=True,
+                                test_keypoints=True)
+
  def testRunRandomVerticalFlipWithMaskAndKeypoints(self):
    preprocess_options = [(preprocessor.random_vertical_flip, {})]
    image_height = 3
@@ -665,6 +744,13 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertAllClose(boxes_, boxes_expected_)
      self.assertAllClose(images_diff_, images_diff_expected_)

+  def testRandomRotation90WithCache(self):
+    preprocess_options = [(preprocessor.random_rotation90, {})]
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=True,
+                                test_masks=True,
+                                test_keypoints=True)
+
  def testRunRandomRotation90WithMaskAndKeypoints(self):
    preprocess_options = [(preprocessor.random_rotation90, {})]
    image_height = 3
@@ -716,6 +802,20 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertAllClose(values_greater_, values_true_)
      self.assertAllClose(values_less_, values_true_)

+  def testRandomPixelValueScaleWithCache(self):
+    preprocess_options = []
+    preprocess_options.append((preprocessor.normalize_image, {
+        'original_minval': 0,
+        'original_maxval': 255,
+        'target_minval': 0,
+        'target_maxval': 1
+    }))
+    preprocess_options.append((preprocessor.random_pixel_value_scale, {}))
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=True,
+                                test_masks=False,
+                                test_keypoints=False)
+
  def testRandomImageScale(self):
    preprocess_options = [(preprocessor.random_image_scale, {})]
    images_original = self.createTestImages()
@@ -736,6 +836,13 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertTrue(
          images_original_shape_[2] * 2.0 >= images_scaled_shape_[2])

+  def testRandomImageScaleWithCache(self):
+    preprocess_options = [(preprocessor.random_image_scale, {})]
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=False,
+                                test_masks=False,
+                                test_keypoints=False)
+
  def testRandomRGBtoGray(self):
    preprocess_options = [(preprocessor.random_rgb_to_gray, {})]
    images_original = self.createTestImages()
@@ -769,6 +876,14 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertAllClose(images_g_diff_, image_zero1_)
      self.assertAllClose(images_b_diff_, image_zero1_)

+  def testRandomRGBtoGrayWithCache(self):
+    preprocess_options = [(
+        preprocessor.random_rgb_to_gray, {'probability': 0.5})]
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=False,
+                                test_masks=False,
+                                test_keypoints=False)
+
  def testRandomAdjustBrightness(self):
    preprocessing_options = []
    preprocessing_options.append((preprocessor.normalize_image, {
@@ -789,6 +904,20 @@ class PreprocessorTest(tf.test.TestCase):
          [image_original_shape, image_bright_shape])
      self.assertAllEqual(image_original_shape_, image_bright_shape_)

+  def testRandomAdjustBrightnessWithCache(self):
+    preprocess_options = []
+    preprocess_options.append((preprocessor.normalize_image, {
+        'original_minval': 0,
+        'original_maxval': 255,
+        'target_minval': 0,
+        'target_maxval': 1
+    }))
+    preprocess_options.append((preprocessor.random_adjust_brightness, {}))
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=False,
+                                test_masks=False,
+                                test_keypoints=False)
+
  def testRandomAdjustContrast(self):
    preprocessing_options = []
    preprocessing_options.append((preprocessor.normalize_image, {
@@ -809,6 +938,20 @@ class PreprocessorTest(tf.test.TestCase):
          [image_original_shape, image_contrast_shape])
      self.assertAllEqual(image_original_shape_, image_contrast_shape_)

+  def testRandomAdjustContrastWithCache(self):
+    preprocess_options = []
+    preprocess_options.append((preprocessor.normalize_image, {
+        'original_minval': 0,
+        'original_maxval': 255,
+        'target_minval': 0,
+        'target_maxval': 1
+    }))
+    preprocess_options.append((preprocessor.random_adjust_contrast, {}))
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=False,
+                                test_masks=False,
+                                test_keypoints=False)
+
  def testRandomAdjustHue(self):
    preprocessing_options = []
    preprocessing_options.append((preprocessor.normalize_image, {
@@ -829,6 +972,20 @@ class PreprocessorTest(tf.test.TestCase):
          [image_original_shape, image_hue_shape])
      self.assertAllEqual(image_original_shape_, image_hue_shape_)

+  def testRandomAdjustHueWithCache(self):
+    preprocess_options = []
+    preprocess_options.append((preprocessor.normalize_image, {
+        'original_minval': 0,
+        'original_maxval': 255,
+        'target_minval': 0,
+        'target_maxval': 1
+    }))
+    preprocess_options.append((preprocessor.random_adjust_hue, {}))
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=False,
+                                test_masks=False,
+                                test_keypoints=False)
+
  def testRandomDistortColor(self):
    preprocessing_options = []
    preprocessing_options.append((preprocessor.normalize_image, {
@@ -849,6 +1006,20 @@ class PreprocessorTest(tf.test.TestCase):
          [images_original_shape, images_distorted_color_shape])
      self.assertAllEqual(images_original_shape_, images_distorted_color_shape_)

+  def testRandomDistortColorWithCache(self):
+    preprocess_options = []
+    preprocess_options.append((preprocessor.normalize_image, {
+        'original_minval': 0,
+        'original_maxval': 255,
+        'target_minval': 0,
+        'target_maxval': 1
+    }))
+    preprocess_options.append((preprocessor.random_distort_color, {}))
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=False,
+                                test_masks=False,
+                                test_keypoints=False)
+
  def testRandomJitterBoxes(self):
    preprocessing_options = []
    preprocessing_options.append((preprocessor.random_jitter_boxes, {}))
@@ -900,6 +1071,21 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertAllEqual(boxes_rank_, distorted_boxes_rank_)
      self.assertAllEqual(images_rank_, distorted_images_rank_)

+  def testRandomCropImageWithCache(self):
+    preprocess_options = [(preprocessor.random_rgb_to_gray,
+                           {'probability': 0.5}),
+                          (preprocessor.normalize_image, {
+                              'original_minval': 0,
+                              'original_maxval': 255,
+                              'target_minval': 0,
+                              'target_maxval': 1,
+                          }),
+                          (preprocessor.random_crop_image, {})]
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=True,
+                                test_masks=False,
+                                test_keypoints=False)
+
  def testRandomCropImageGrayscale(self):
    preprocessing_options = [(preprocessor.rgb_to_gray, {}),
                             (preprocessor.normalize_image, {
@@ -1446,6 +1632,13 @@ class PreprocessorTest(tf.test.TestCase):
           self.expectedKeypointsAfterThresholding()])
      self.assertAllClose(retained_keypoints_, expected_keypoints_)

+  def testRandomCropToAspectRatioWithCache(self):
+    preprocess_options = [(preprocessor.random_crop_to_aspect_ratio, {})]
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=True,
+                                test_masks=False,
+                                test_keypoints=False)
+
  def testRunRandomCropToAspectRatioWithMasks(self):
    image = self.createColorfulTestImage()
    boxes = self.createTestBoxes()
@@ -1536,6 +1729,13 @@ class PreprocessorTest(tf.test.TestCase):
        self.assertAllClose(distorted_keypoints_.flatten(),
                            expected_keypoints.flatten())

+  def testRandomPadToAspectRatioWithCache(self):
+    preprocess_options = [(preprocessor.random_pad_to_aspect_ratio, {})]
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=True,
+                                test_masks=True,
+                                test_keypoints=True)
+
  def testRunRandomPadToAspectRatioWithMasks(self):
    image = self.createColorfulTestImage()
    boxes = self.createTestBoxes()
@@ -1624,6 +1824,17 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertAllClose(distorted_keypoints_.flatten(),
                          expected_keypoints.flatten())

+  def testRandomPadImageWithCache(self):
+    preprocess_options = [(preprocessor.normalize_image, {
+        'original_minval': 0,
+        'original_maxval': 255,
+        'target_minval': 0,
+        'target_maxval': 1,}), (preprocessor.random_pad_image, {})]
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=True,
+                                test_masks=True,
+                                test_keypoints=True)
+
  def testRandomPadImage(self):
    preprocessing_options = [(preprocessor.normalize_image, {
        'original_minval': 0,
@@ -1670,6 +1881,17 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertTrue(np.all((boxes_[:, 3] - boxes_[:, 1]) >= (
          padded_boxes_[:, 3] - padded_boxes_[:, 1])))

+  def testRandomCropPadImageWithCache(self):
+    preprocess_options = [(preprocessor.normalize_image, {
+        'original_minval': 0,
+        'original_maxval': 255,
+        'target_minval': 0,
+        'target_maxval': 1,}), (preprocessor.random_crop_pad_image, {})]
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=True,
+                                test_masks=True,
+                                test_keypoints=True)
+
  def testRandomCropPadImageWithRandomCoefOne(self):
    preprocessing_options = [(preprocessor.normalize_image, {
        'original_minval': 0,
@@ -1788,6 +2010,22 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertEqual(images_shape_[1], padded_images_shape_[1])
      self.assertEqual(2 * images_shape_[2], padded_images_shape_[2])

+  def testRandomBlackPatchesWithCache(self):
+    preprocess_options = []
+    preprocess_options.append((preprocessor.normalize_image, {
+        'original_minval': 0,
+        'original_maxval': 255,
+        'target_minval': 0,
+        'target_maxval': 1
+    }))
+    preprocess_options.append((preprocessor.random_black_patches, {
+        'size_to_image_ratio': 0.5
+    }))
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=True,
+                                test_masks=True,
+                                test_keypoints=True)
+
  def testRandomBlackPatches(self):
    preprocessing_options = []
    preprocessing_options.append((preprocessor.normalize_image, {
@@ -1812,6 +2050,22 @@ class PreprocessorTest(tf.test.TestCase):
          [images_shape, blacked_images_shape])
      self.assertAllEqual(images_shape_, blacked_images_shape_)

+  def testRandomResizeMethodWithCache(self):
+    preprocess_options = []
+    preprocess_options.append((preprocessor.normalize_image, {
+        'original_minval': 0,
+        'original_maxval': 255,
+        'target_minval': 0,
+        'target_maxval': 1
+    }))
+    preprocess_options.append((preprocessor.random_resize_method, {
+        'target_size': (75, 150)
+    }))
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=True,
+                                test_masks=True,
+                                test_keypoints=True)
+
  def testRandomResizeMethod(self):
    preprocessing_options = []
    preprocessing_options.append((preprocessor.normalize_image, {
@@ -2144,6 +2398,20 @@ class PreprocessorTest(tf.test.TestCase):

      self.assertAllEqual([0, 1, 1, 0, 1], one_hot)

+  def testSSDRandomCropWithCache(self):
+    preprocess_options = [
+        (preprocessor.normalize_image, {
+            'original_minval': 0,
+            'original_maxval': 255,
+            'target_minval': 0,
+            'target_maxval': 1
+        }),
+        (preprocessor.ssd_random_crop, {})]
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=True,
+                                test_masks=False,
+                                test_keypoints=False)
+
  def testSSDRandomCrop(self):
    preprocessing_options = [
        (preprocessor.normalize_image, {
@@ -2216,6 +2484,20 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertAllEqual(boxes_rank_, distorted_boxes_rank_)
      self.assertAllEqual(images_rank_, distorted_images_rank_)

+  def testSSDRandomCropFixedAspectRatioWithCache(self):
+    preprocess_options = [
+        (preprocessor.normalize_image, {
+            'original_minval': 0,
+            'original_maxval': 255,
+            'target_minval': 0,
+            'target_maxval': 1
+        }),
+        (preprocessor.ssd_random_crop_fixed_aspect_ratio, {})]
+    self._testPreprocessorCache(preprocess_options,
+                                test_boxes=True,
+                                test_masks=False,
+                                test_keypoints=False)
+
  def _testSSDRandomCropFixedAspectRatio(self,
                                         include_label_scores,
                                         include_instance_masks,
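
Note on the `...WithCache` tests above: they all delegate to `_testPreprocessorCache`, whose body is not part of this section. The commit message describes PreprocessorCache as a way "to record and duplicate augmentations". The following is only a minimal sketch of that idea, assuming the cache stores each random draw under a key and replays it on a later pass. `RandomDrawCache` and `random_brightness_delta` are hypothetical names for illustration and are not the object_detection API.

```python
import random


class RandomDrawCache(object):
  """Hypothetical stand-in: remembers random draws so they can be replayed."""

  def __init__(self):
    self._draws = {}

  def get_or_create(self, key, draw_fn):
    # The first call for a key records the draw; later calls replay it.
    if key not in self._draws:
      self._draws[key] = draw_fn()
    return self._draws[key]


def random_brightness_delta(max_delta, cache=None, key='brightness'):
  """Returns a random delta, replayed from `cache` when one is given."""
  draw = lambda: random.uniform(-max_delta, max_delta)
  if cache is None:
    return draw()
  return cache.get_or_create(key, draw)


cache = RandomDrawCache()
first = random_brightness_delta(0.2, cache)
second = random_brightness_delta(0.2, cache)
assert first == second  # The same augmentation is applied on both passes.
```

Without a cache each call draws independently, so two passes over the same image would generally be augmented differently.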

--- a/research/object_detection/core/standard_fields.py
+++ b/research/object_detection/core/standard_fields.py
@@ -58,6 +58,9 @@ class InputDataFields(object):
    groundtruth_keypoint_visibilities: ground truth keypoint visibilities.
    groundtruth_label_scores: groundtruth label scores.
    groundtruth_weights: groundtruth weight factor for bounding boxes.
+    num_groundtruth_boxes: number of groundtruth boxes.
+    true_image_shape: true shapes of images in the resized images, as resized
+      images can be padded with zeros.
  """
  image = 'image'
  original_image = 'original_image'
@@ -81,6 +84,8 @@ class InputDataFields(object):
  groundtruth_keypoint_visibilities = 'groundtruth_keypoint_visibilities'
  groundtruth_label_scores = 'groundtruth_label_scores'
  groundtruth_weights = 'groundtruth_weights'
+  num_groundtruth_boxes = 'num_groundtruth_boxes'
+  true_image_shape = 'true_image_shape'


 class DetectionResultFields(object):

--- a/research/object_detection/core/target_assigner.py
+++ b/research/object_detection/core/target_assigner.py
@@ -389,7 +389,8 @@ def create_target_assigner(reference, stage=None,
 def batch_assign_targets(target_assigner,
                         anchors_batch,
                         gt_box_batch,
-                         gt_class_targets_batch):
+                         gt_class_targets_batch,
+                         gt_weights_batch=None):
  """Batched assignment of classification and regression targets.

  Args:
@@ -402,6 +403,8 @@ def batch_assign_targets(target_assigner,
      each tensor has shape [num_gt_boxes_i, classification_target_size] and
      num_gt_boxes_i is the number of boxes in the ith boxlist of
      gt_box_batch.
+    gt_weights_batch: An optional list of 1-D tf.float32 tensors of shape
+      [num_gt_boxes_i] containing weights for groundtruth boxes.

  Returns:
    batch_cls_targets: a tensor with shape [batch_size, num_anchors,
@@ -435,11 +438,13 @@ def batch_assign_targets(target_assigner,
  reg_targets_list = []
  reg_weights_list = []
  match_list = []
-  for anchors, gt_boxes, gt_class_targets in zip(
-      anchors_batch, gt_box_batch, gt_class_targets_batch):
+  if gt_weights_batch is None:
+    gt_weights_batch = [None] * len(gt_class_targets_batch)
+  for anchors, gt_boxes, gt_class_targets, gt_weights in zip(
+      anchors_batch, gt_box_batch, gt_class_targets_batch, gt_weights_batch):
    (cls_targets, cls_weights, reg_targets,
     reg_weights, match) = target_assigner.assign(
-         anchors, gt_boxes, gt_class_targets)
+         anchors, gt_boxes, gt_class_targets, gt_weights)
    cls_targets_list.append(cls_targets)
    cls_weights_list.append(cls_weights)
    reg_targets_list.append(reg_targets)
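
Note on the loop change above: when no weights are supplied, a list of `None` values of matching length is substituted so that `zip` still yields one tuple per image. A plain-Python sketch of that pattern follows. The `assign` stub and the `batch_assign` wrapper are hypothetical stand-ins for illustration, not the real target assigner.

```python
def assign(anchors, gt_boxes, gt_class_targets, gt_weights=None):
  """Hypothetical stub standing in for target_assigner.assign."""
  return (anchors, gt_boxes, gt_class_targets, gt_weights)


def batch_assign(anchors_batch, gt_box_batch, gt_class_targets_batch,
                 gt_weights_batch=None):
  # zip() cannot iterate over None, so substitute one None per image.
  if gt_weights_batch is None:
    gt_weights_batch = [None] * len(gt_class_targets_batch)
  return [
      assign(anchors, gt_boxes, gt_class_targets, gt_weights)
      for anchors, gt_boxes, gt_class_targets, gt_weights in zip(
          anchors_batch, gt_box_batch, gt_class_targets_batch,
          gt_weights_batch)
  ]


# Existing callers that pass three arguments keep working unchanged.
results = batch_assign(['a0', 'a1'], ['b0', 'b1'], ['c0', 'c1'])
# New callers can pass one weight list per image.
weighted = batch_assign(['a0', 'a1'], ['b0', 'b1'], ['c0', 'c1'],
                        [[1.0], [1.0, 0.0]])
```

Defaulting the new argument to `None` keeps the signature change backward compatible.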

--- a/research/object_detection/core/target_assigner_test.py
+++ b/research/object_detection/core/target_assigner_test.py
@@ -632,6 +632,81 @@ class BatchTargetAssignerTest(test_case.TestCase):
    self.assertAllClose(reg_targets_out, exp_reg_targets)
    self.assertAllClose(reg_weights_out, exp_reg_weights)

+  def test_batch_assign_multiclass_targets_with_padded_groundtruth(self):
+    def graph_fn(anchor_means, anchor_stddevs, groundtruth_boxlist1,
+                 groundtruth_boxlist2, class_targets1, class_targets2,
+                 groundtruth_weights1, groundtruth_weights2):
+      box_list1 = box_list.BoxList(groundtruth_boxlist1)
+      box_list2 = box_list.BoxList(groundtruth_boxlist2)
+      gt_box_batch = [box_list1, box_list2]
+      gt_class_targets = [class_targets1, class_targets2]
+      gt_weights = [groundtruth_weights1, groundtruth_weights2]
+      anchors_boxlist = box_list.BoxList(anchor_means)
+      anchors_boxlist.add_field('stddev', anchor_stddevs)
+      multiclass_target_assigner = self._get_multi_class_target_assigner(
+          num_classes=3)
+      (cls_targets, cls_weights, reg_targets, reg_weights,
+       _) = targetassigner.batch_assign_targets(
+           multiclass_target_assigner, anchors_boxlist, gt_box_batch,
+           gt_class_targets, gt_weights)
+      return (cls_targets, cls_weights, reg_targets, reg_weights)
+
+    groundtruth_boxlist1 = np.array([[0., 0., 0.2, 0.2],
+                                     [0., 0., 0., 0.]], dtype=np.float32)
+    groundtruth_weights1 = np.array([1, 0], dtype=np.float32)
+    groundtruth_boxlist2 = np.array([[0, 0.25123152, 1, 1],
+                                     [0.015789, 0.0985, 0.55789, 0.3842],
+                                     [0, 0, 0, 0]],
+                                    dtype=np.float32)
+    groundtruth_weights2 = np.array([1, 1, 0], dtype=np.float32)
+    class_targets1 = np.array([[0, 1, 0, 0], [0, 0, 0, 0]], dtype=np.float32)
+    class_targets2 = np.array([[0, 0, 0, 1],
+                               [0, 0, 1, 0],
+                               [0, 0, 0, 0]], dtype=np.float32)
+
+    anchor_means = np.array([[0, 0, .25, .25],
+                             [0, .25, 1, 1],
+                             [0, .1, .5, .5],
+                             [.75, .75, 1, 1]], dtype=np.float32)
+    anchor_stddevs = np.array([[.1, .1, .1, .1],
+                               [.1, .1, .1, .1],
+                               [.1, .1, .1, .1],
+                               [.1, .1, .1, .1]], dtype=np.float32)
+
+    exp_reg_targets = [[[0, 0, -0.5, -0.5],
+                        [0, 0, 0, 0],
+                        [0, 0, 0, 0,],
+                        [0, 0, 0, 0,],],
+                       [[0, 0, 0, 0,],
+                        [0, 0.01231521, 0, 0],
+                        [0.15789001, -0.01500003, 0.57889998, -1.15799987],
+                        [0, 0, 0, 0]]]
+    exp_cls_weights = [[1, 1, 1, 1],
+                       [1, 1, 1, 1]]
+    exp_cls_targets = [[[0, 1, 0, 0],
+                        [1, 0, 0, 0],
+                        [1, 0, 0, 0],
+                        [1, 0, 0, 0]],
+                       [[1, 0, 0, 0],
+                        [0, 0, 0, 1],
+                        [0, 0, 1, 0],
+                        [1, 0, 0, 0]]]
+    exp_reg_weights = [[1, 0, 0, 0],
+                       [0, 1, 1, 0]]
+
+    (cls_targets_out, cls_weights_out, reg_targets_out,
+     reg_weights_out) = self.execute(graph_fn, [anchor_means, anchor_stddevs,
+                                                groundtruth_boxlist1,
+                                                groundtruth_boxlist2,
+                                                class_targets1,
+                                                class_targets2,
+                                                groundtruth_weights1,
+                                                groundtruth_weights2])
+    self.assertAllClose(cls_targets_out, exp_cls_targets)
+    self.assertAllClose(cls_weights_out, exp_cls_weights)
+    self.assertAllClose(reg_targets_out, exp_reg_targets)
+    self.assertAllClose(reg_weights_out, exp_reg_weights)
+
  def test_batch_assign_multidimensional_targets(self):
    def graph_fn(anchor_means, anchor_stddevs, groundtruth_boxlist1,
                 groundtruth_boxlist2, class_targets1, class_targets2):
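
Note on `test_batch_assign_multiclass_targets_with_padded_groundtruth` above: its inputs contain all-zero padding rows whose weight is 0, and the expected regression weights are nonzero only for anchors matched to real boxes. A sketch of how such padded inputs could be built with NumPy is shown below. `pad_groundtruth` is a hypothetical helper, and the sketch assumes the number of real boxes does not exceed `max_num_boxes`.

```python
import numpy as np


def pad_groundtruth(boxes, max_num_boxes):
  """Pads `boxes` ([N, 4]) with zero rows; weights mark the real rows."""
  boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
  num_boxes = boxes.shape[0]
  padded = np.zeros((max_num_boxes, 4), dtype=np.float32)
  padded[:num_boxes] = boxes
  # Real boxes get weight 1, padding rows get weight 0.
  weights = np.zeros(max_num_boxes, dtype=np.float32)
  weights[:num_boxes] = 1.0
  return padded, weights


padded_boxes, weights = pad_groundtruth([[0., 0., 0.2, 0.2]], max_num_boxes=2)
```

For this input the result has the same layout as `groundtruth_boxlist1` and `groundtruth_weights1` in the test: one real box followed by one zero row, with weights `[1, 0]`.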

--- a/research/object_detection/data_decoders/tf_example_decoder.py
+++ b/research/object_detection/data_decoders/tf_example_decoder.py
@@ -134,7 +134,8 @@ class TfExampleDecoder(data_decoder.DataDecoder):
        self.items_to_handlers[
            fields.InputDataFields.groundtruth_instance_masks] = (
                slim_example_decoder.ItemHandlerCallback(
-                    ['image/object/mask'], self._decode_png_instance_masks))
+                    ['image/object/mask', 'image/height', 'image/width'],
+                    self._decode_png_instance_masks))
      else:
        raise ValueError('Did not recognize the `instance_mask_type` option.')
    if label_map_proto_file:
@@ -178,10 +179,15 @@ class TfExampleDecoder(data_decoder.DataDecoder):
        [None, 4] containing box corners.
      fields.InputDataFields.groundtruth_classes - 1D int64 tensor of shape
        [None] containing classes for the boxes.
+      fields.InputDataFields.groundtruth_weights - 1D float32 tensor of
+        shape [None] indicating the weights of groundtruth boxes.
+      fields.InputDataFields.num_groundtruth_boxes - int32 scalar indicating
+        the number of groundtruth_boxes.
      fields.InputDataFields.groundtruth_area - 1D float32 tensor of shape
        [None] containing containing object mask area in pixel squared.
      fields.InputDataFields.groundtruth_is_crowd - 1D bool tensor of shape
        [None] indicating if the boxes enclose a crowd.
+
    Optional:
      fields.InputDataFields.groundtruth_difficult - 1D bool tensor of shape
        [None] indicating if the boxes represent `difficult` instances.
@@ -189,8 +195,6 @@ class TfExampleDecoder(data_decoder.DataDecoder):
        [None] indicating if the boxes represent `group_of` instances.
      fields.InputDataFields.groundtruth_instance_masks - 3D float32 tensor of
        shape [None, None, None] containing instance masks.
-      fields.InputDataFields.groundtruth_weights - 1D float32 tensor of
-        shape [None] indicating the weights of groundtruth boxes.
    """
    serialized_example = tf.reshape(tf_example_string_tensor, shape=[])
    decoder = slim_example_decoder.TFExampleDecoder(self.keys_to_features,
@@ -201,6 +205,20 @@ class TfExampleDecoder(data_decoder.DataDecoder):
    is_crowd = fields.InputDataFields.groundtruth_is_crowd
    tensor_dict[is_crowd] = tf.cast(tensor_dict[is_crowd], dtype=tf.bool)
    tensor_dict[fields.InputDataFields.image].set_shape([None, None, 3])
+    tensor_dict[fields.InputDataFields.num_groundtruth_boxes] = tf.shape(
+        tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]
+
+    def default_groundtruth_weights():
+      return tf.ones(
+          [tf.shape(tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]],
+          dtype=tf.float32)
+
+    tensor_dict[fields.InputDataFields.groundtruth_weights] = tf.cond(
+        tf.greater(
+            tf.shape(
+                tensor_dict[fields.InputDataFields.groundtruth_weights])[0],
+            0), lambda: tensor_dict[fields.InputDataFields.groundtruth_weights],
+        default_groundtruth_weights)
    return tensor_dict

  def _reshape_instance_masks(self, keys_to_tensors):
@@ -247,6 +265,11 @@ class TfExampleDecoder(data_decoder.DataDecoder):
      return image

    png_masks = keys_to_tensors['image/object/mask']
+    height = keys_to_tensors['image/height']
+    width = keys_to_tensors['image/width']
    if isinstance(png_masks, tf.SparseTensor):
      png_masks = tf.sparse_tensor_to_dense(png_masks, default_value='')
-    return tf.map_fn(decode_png_mask, png_masks, dtype=tf.float32)
+    return tf.cond(
+        tf.greater(tf.size(png_masks), 0),
+        lambda: tf.map_fn(decode_png_mask, png_masks, dtype=tf.float32),
+        lambda: tf.zeros(tf.to_int32(tf.stack([0, height, width]))))
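
Note on the mask decoding change above: when an example carries no PNG masks, the decoder now returns a zero-instance tensor of shape [0, height, width] instead of mapping over an empty input. The NumPy analogue below illustrates the same branching. It is a sketch for illustration only, not the TensorFlow code path, and `stack_masks` is a hypothetical helper.

```python
import numpy as np


def stack_masks(decoded_masks, height, width):
  """Stacks per-instance masks, or returns an empty [0, height, width] array."""
  if len(decoded_masks) > 0:
    return np.stack(decoded_masks).astype(np.float32)
  # np.stack([]) raises ValueError, so the empty case needs its own branch.
  return np.zeros((0, height, width), dtype=np.float32)


empty = stack_masks([], height=10, width=10)
two = stack_masks([np.ones((10, 10)), np.zeros((10, 10))], height=10, width=10)
```

The empty result keeps the spatial dimensions of the image, which is what `testDecodeEmptyPngInstanceMasks` checks with its expected shape of `[0, 10, 10]`.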
--- a/research/object_detection/data_decoders/tf_example_decoder_test.py
+++ b/research/object_detection/data_decoders/tf_example_decoder_test.py
@@ -58,7 +58,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

  def testDecodeJpegImage(self):
-    image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
    encoded_jpeg = self._EncodeImage(image_tensor)
    decoded_jpeg = self._DecodeImage(encoded_jpeg)
    example = tf.train.Example(features=tf.train.Features(feature={
@@ -79,7 +79,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
    self.assertEqual('image_id', tensor_dict[fields.InputDataFields.source_id])

  def testDecodeImageKeyAndFilename(self):
-    image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
    encoded_jpeg = self._EncodeImage(image_tensor)
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': self._BytesFeature(encoded_jpeg),
@@ -97,7 +97,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
    self.assertEqual('filename', tensor_dict[fields.InputDataFields.filename])

  def testDecodePngImage(self):
-    image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
    encoded_png = self._EncodeImage(image_tensor, encoding_type='png')
    decoded_png = self._DecodeImage(encoded_png, encoding_type='png')
    example = tf.train.Example(features=tf.train.Features(feature={
@@ -147,8 +147,32 @@ class TfExampleDecoderTest(tf.test.TestCase):
        decoded_masks,
        tensor_dict[fields.InputDataFields.groundtruth_instance_masks])

+  def testDecodeEmptyPngInstanceMasks(self):
+    image_tensor = np.random.randint(256, size=(10, 10, 3)).astype(np.uint8)
+    encoded_jpeg = self._EncodeImage(image_tensor)
+    encoded_masks = []
+    example = tf.train.Example(
+        features=tf.train.Features(
+            feature={
+                'image/encoded': self._BytesFeature(encoded_jpeg),
+                'image/format': self._BytesFeature('jpeg'),
+                'image/object/mask': self._BytesFeature(encoded_masks),
+                'image/height': self._Int64Feature([10]),
+                'image/width': self._Int64Feature([10]),
+            })).SerializeToString()
+
+    example_decoder = tf_example_decoder.TfExampleDecoder(
+        load_instance_masks=True, instance_mask_type=input_reader_pb2.PNG_MASKS)
+    tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
+
+    with self.test_session() as sess:
+      tensor_dict = sess.run(tensor_dict)
+      self.assertAllEqual(
+          tensor_dict[fields.InputDataFields.groundtruth_instance_masks].shape,
+          [0, 10, 10])
+
  def testDecodeBoundingBox(self):
-    image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
    encoded_jpeg = self._EncodeImage(image_tensor)
    bbox_ymins = [0.0, 4.0]
    bbox_xmins = [1.0, 5.0]
@@ -175,9 +199,39 @@ class TfExampleDecoderTest(tf.test.TestCase):
                                bbox_ymaxs, bbox_xmaxs]).transpose()
    self.assertAllEqual(expected_boxes,
                        tensor_dict[fields.InputDataFields.groundtruth_boxes])
+    self.assertAllEqual(
+        2, tensor_dict[fields.InputDataFields.num_groundtruth_boxes])
+
+  def testDecodeDefaultGroundtruthWeights(self):
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
+    encoded_jpeg = self._EncodeImage(image_tensor)
+    bbox_ymins = [0.0, 4.0]
+    bbox_xmins = [1.0, 5.0]
+    bbox_ymaxs = [2.0, 6.0]
+    bbox_xmaxs = [3.0, 7.0]
+    example = tf.train.Example(features=tf.train.Features(feature={
+        'image/encoded': self._BytesFeature(encoded_jpeg),
+        'image/format': self._BytesFeature('jpeg'),
+        'image/object/bbox/ymin': self._FloatFeature(bbox_ymins),
+        'image/object/bbox/xmin': self._FloatFeature(bbox_xmins),
+        'image/object/bbox/ymax': self._FloatFeature(bbox_ymaxs),
+        'image/object/bbox/xmax': self._FloatFeature(bbox_xmaxs),
+    })).SerializeToString()
+
+    example_decoder = tf_example_decoder.TfExampleDecoder()
+    tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
+
+    self.assertAllEqual((tensor_dict[fields.InputDataFields.groundtruth_boxes].
+                         get_shape().as_list()), [None, 4])
+
+    with self.test_session() as sess:
+      tensor_dict = sess.run(tensor_dict)
+
+    self.assertAllClose(tensor_dict[fields.InputDataFields.groundtruth_weights],
+                        np.ones(2, dtype=np.float32))

  def testDecodeObjectLabel(self):
-    image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
    encoded_jpeg = self._EncodeImage(image_tensor)
    bbox_classes = [0, 1]
    example = tf.train.Example(features=tf.train.Features(feature={
@@ -199,8 +253,89 @@ class TfExampleDecoderTest(tf.test.TestCase):
    self.assertAllEqual(bbox_classes,
                        tensor_dict[fields.InputDataFields.groundtruth_classes])

+  def testDecodeObjectLabelNoText(self):
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
+    encoded_jpeg = self._EncodeImage(image_tensor)
+    bbox_classes = [1, 2]
+    example = tf.train.Example(features=tf.train.Features(feature={
+        'image/encoded': self._BytesFeature(encoded_jpeg),
+        'image/format': self._BytesFeature('jpeg'),
+        'image/object/class/label': self._Int64Feature(bbox_classes),
+    })).SerializeToString()
+    label_map_string = """
+      item {
+        id:1
+        name:'cat'
+      }
+      item {
+        id:2
+        name:'dog'
+      }
+    """
+    label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt')
+    with tf.gfile.Open(label_map_path, 'wb') as f:
+      f.write(label_map_string)
+
+    example_decoder = tf_example_decoder.TfExampleDecoder(
+        label_map_proto_file=label_map_path)
+    tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
+
+    self.assertAllEqual((tensor_dict[
+        fields.InputDataFields.groundtruth_classes].get_shape().as_list()),
+                        [None])
+
+    init = tf.tables_initializer()
+    with self.test_session() as sess:
+      sess.run(init)
+      tensor_dict = sess.run(tensor_dict)
+
+    self.assertAllEqual(bbox_classes,
+                        tensor_dict[fields.InputDataFields.groundtruth_classes])
+
+  def testDecodeObjectLabelUnrecognizedName(self):
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
+    encoded_jpeg = self._EncodeImage(image_tensor)
+    bbox_classes_text = ['cat', 'cheetah']
+    example = tf.train.Example(
+        features=tf.train.Features(
+            feature={
+                'image/encoded':
+                    self._BytesFeature(encoded_jpeg),
+                'image/format':
+                    self._BytesFeature('jpeg'),
+                'image/object/class/text':
+                    self._BytesFeature(bbox_classes_text),
+            })).SerializeToString()
+
+    label_map_string = """
+      item {
+        id:2
+        name:'cat'
+      }
+      item {
+        id:1
+        name:'dog'
+      }
+    """
+    label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt')
+    with tf.gfile.Open(label_map_path, 'wb') as f:
+      f.write(label_map_string)
+    example_decoder = tf_example_decoder.TfExampleDecoder(
+        label_map_proto_file=label_map_path)
+    tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
+
+    self.assertAllEqual((tensor_dict[fields.InputDataFields.groundtruth_classes]
+                         .get_shape().as_list()), [None])
+
+    with self.test_session() as sess:
+      sess.run(tf.tables_initializer())
+      tensor_dict = sess.run(tensor_dict)
+
+    self.assertAllEqual([2, -1],
+                        tensor_dict[fields.InputDataFields.groundtruth_classes])
+
  def testDecodeObjectLabelWithMapping(self):
-    image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
    encoded_jpeg = self._EncodeImage(image_tensor)
    bbox_classes_text = ['cat', 'dog']
    example = tf.train.Example(
@@ -242,7 +377,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
                        tensor_dict[fields.InputDataFields.groundtruth_classes])

  def testDecodeObjectArea(self):
-    image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
    encoded_jpeg = self._EncodeImage(image_tensor)
    object_area = [100., 174.]
    example = tf.train.Example(features=tf.train.Features(feature={
@@ -263,7 +398,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
                        tensor_dict[fields.InputDataFields.groundtruth_area])

  def testDecodeObjectIsCrowd(self):
-    image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
    encoded_jpeg = self._EncodeImage(image_tensor)
    object_is_crowd = [0, 1]
    example = tf.train.Example(features=tf.train.Features(feature={
@@ -286,7 +421,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
                            fields.InputDataFields.groundtruth_is_crowd])

  def testDecodeObjectDifficult(self):
-    image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
    encoded_jpeg = self._EncodeImage(image_tensor)
    object_difficult = [0, 1]
    example = tf.train.Example(features=tf.train.Features(feature={
@@ -309,7 +444,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
                            fields.InputDataFields.groundtruth_difficult])

  def testDecodeObjectGroupOf(self):
-    image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
    encoded_jpeg = self._EncodeImage(image_tensor)
    object_group_of = [0, 1]
    example = tf.train.Example(features=tf.train.Features(
@@ -333,7 +468,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
        tensor_dict[fields.InputDataFields.groundtruth_group_of])

  def testDecodeObjectWeight(self):
-    image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
+    image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
    encoded_jpeg = self._EncodeImage(image_tensor)
    object_weights = [0.75, 1.0]
    example = tf.train.Example(features=tf.train.Features(
@@ -362,7 +497,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
    image_width = 3

    # Randomly generate image.
-    image_tensor = np.random.randint(255, size=(image_height,
+    image_tensor = np.random.randint(256, size=(image_height,
                                                image_width,
                                                3)).astype(np.uint8)
    encoded_jpeg = self._EncodeImage(image_tensor)
@@ -413,7 +548,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
    image_height = 5
    image_width = 3
    # Randomly generate image.
-    image_tensor = np.random.randint(255, size=(image_height,
+    image_tensor = np.random.randint(256, size=(image_height,
                                                image_width,
                                                3)).astype(np.uint8)
    encoded_jpeg = self._EncodeImage(image_tensor)
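
Note on the `np.random.randint(255, ...)` to `np.random.randint(256, ...)` changes above: the upper bound of `np.random.randint` is exclusive, so the old calls could never produce the pixel value 255. A short check, using a seeded `RandomState` chosen here only to make the sketch reproducible:

```python
import numpy as np

rng = np.random.RandomState(0)

# With high=255 the samples lie in [0, 254]; 255 is never drawn.
old_pixels = rng.randint(255, size=100000)

# With high=256 the samples cover the full uint8 range [0, 255].
new_pixels = rng.randint(256, size=100000)
```

Both arrays can be cast to `np.uint8` without overflow; only the second one can exercise the maximum pixel value.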

--- a/research/object_detection/dataset_tools/__init__.py
+++ b/research/object_detection/dataset_tools/__init__.py
-
--- a/research/object_detection/dataset_tools/create_coco_tf_record.py
+++ b/research/object_detection/dataset_tools/create_coco_tf_record.py
@@ -87,13 +87,12 @@ def create_tf_example(image,
      to the format expected by the Tensorflow Object Detection API (which is
      which is [ymin, xmin, ymax, xmax] with coordinates normalized relative
      to image size).
-    image_dir: Directory containing the image files.
+    image_dir: directory containing the image files.
    category_index: a dict containing COCO category information keyed
      by the 'id' field of each category.  See the
      label_map_util.create_category_index function.
    include_masks: Whether to include instance segmentations masks
      (PNG encoded) in the result. default: False.
-
  Returns:
    example: The converted tf.Example
    num_annotations_skipped: Number of (invalid) annotations that were ignored.
@@ -104,6 +103,7 @@ def create_tf_example(image,
  image_height = image['height']
  image_width = image['width']
  filename = image['file_name']
+  image_id = image['id']

  full_path = os.path.join(image_dir, filename)
  with tf.gfile.GFile(full_path, 'rb') as fid:
@@ -118,6 +118,7 @@ def create_tf_example(image,
  ymax = []
  is_crowd = []
  category_names = []
+  category_ids = []
  area = []
  encoded_mask_png = []
  num_annotations_skipped = 0
@@ -135,12 +136,13 @@ def create_tf_example(image,
    ymax.append(float(y + height) / image_height)
    is_crowd.append(object_annotations['iscrowd'])
    category_id = int(object_annotations['category_id'])
+    category_ids.append(category_id)
    category_names.append(category_index[category_id]['name'].encode('utf8'))
    area.append(object_annotations['area'])

    if include_masks:
-      run_len_encoding = mask.frPyObjects(
-          object_annotations['segmentation'], image_height, image_width)
+      run_len_encoding = mask.frPyObjects(object_annotations['segmentation'],
+                                          image_height, image_width)
      binary_mask = mask.decode(run_len_encoding)
      if not object_annotations['iscrowd']:
        binary_mask = np.amax(binary_mask, axis=2)
@@ -148,31 +150,41 @@ def create_tf_example(image,
      output_io = io.BytesIO()
      pil_image.save(output_io, format='PNG')
      encoded_mask_png.append(output_io.getvalue())
-
  feature_dict = {
-      'image/height': dataset_util.int64_feature(image_height),
-      'image/width': dataset_util.int64_feature(image_width),
-      'image/filename': dataset_util.bytes_feature(
-          filename.encode('utf8')),
-      'image/source_id': dataset_util.bytes_feature(
-          filename.encode('utf8')),
-      'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
-      'image/encoded': dataset_util.bytes_feature(encoded_jpg),
-      'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
-      'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
-      'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
-      'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
-      'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
-      'image/object/class/text': dataset_util.bytes_list_feature(
-          category_names),
-      'image/object/is_crowd': dataset_util.int64_list_feature(is_crowd),
-      'image/object/area': dataset_util.float_list_feature(area),
+      'image/height':
+          dataset_util.int64_feature(image_height),
+      'image/width':
+          dataset_util.int64_feature(image_width),
+      'image/filename':
+          dataset_util.bytes_feature(filename.encode('utf8')),
+      'image/source_id':
+          dataset_util.bytes_feature(str(image_id).encode('utf8')),
+      'image/key/sha256':
+          dataset_util.bytes_feature(key.encode('utf8')),
+      'image/encoded':
+          dataset_util.bytes_feature(encoded_jpg),
+      'image/format':
+          dataset_util.bytes_feature('jpeg'.encode('utf8')),
+      'image/object/bbox/xmin':
+          dataset_util.float_list_feature(xmin),
+      'image/object/bbox/xmax':
+          dataset_util.float_list_feature(xmax),
+      'image/object/bbox/ymin':
+          dataset_util.float_list_feature(ymin),
+      'image/object/bbox/ymax':
+          dataset_util.float_list_feature(ymax),
+      'image/object/class/label':
+          dataset_util.int64_list_feature(category_ids),
+      'image/object/is_crowd':
+          dataset_util.int64_list_feature(is_crowd),
+      'image/object/area':
+          dataset_util.float_list_feature(area),
  }
  if include_masks:
    feature_dict['image/object/mask'] = (
        dataset_util.bytes_list_feature(encoded_mask_png))
  example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
-  return example, num_annotations_skipped
+  return key, example, num_annotations_skipped


 def _create_tf_record_from_coco_annotations(
@@ -217,7 +229,7 @@ def _create_tf_record_from_coco_annotations(
      if idx % 100 == 0:
        tf.logging.info('On image %d of %d', idx, len(images))
      annotations_list = annotations_index[image['id']]
-      tf_example, num_annotations_skipped = create_tf_example(
+      _, tf_example, num_annotations_skipped = create_tf_example(
          image, annotations_list, image_dir, category_index, include_masks)
      total_num_annotations_skipped += num_annotations_skipped
      writer.write(tf_example.SerializeToString())
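
Note for callers outside this commit: create_tf_example now returns three values instead of two, and image/source_id carries the COCO image id rather than the file name. A minimal sketch of the new calling convention follows. It uses a hypothetical stand-in, `fake_create_tf_example`, because the real function needs TensorFlow and an image file on disk. The plain dict in place of a tf.train.Example and the SHA-256 derivation of `key` are assumptions made for illustration and are not taken from this diff.

```python
import hashlib


def fake_create_tf_example(image, encoded_jpg):
    """Hypothetical stand-in that mirrors only the new return shape."""
    # Assumption: the key is a SHA-256 hex digest of the encoded image bytes.
    key = hashlib.sha256(encoded_jpg).hexdigest()
    example = {
        # source_id is now the stringified image id, not the file name.
        'image/source_id': str(image['id']).encode('utf8'),
        'image/filename': image['file_name'].encode('utf8'),
        'image/key/sha256': key.encode('utf8'),
    }
    num_annotations_skipped = 0
    return key, example, num_annotations_skipped


image = {'id': 11, 'file_name': 'tmp_image.jpg'}

# Callers that do not need the key discard it, as the record writer does.
_, example, skipped = fake_create_tf_example(image, b'not-a-real-jpeg')
```

Any downstream code that still unpacks two values will fail with a ValueError, so each call site needs the extra placeholder.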

--- a/research/object_detection/dataset_tools/create_coco_tf_record_test.py
+++ b/research/object_detection/dataset_tools/create_coco_tf_record_test.py
@@ -12,7 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # ==============================================================================
-
 """Test for create_coco_tf_record.py."""

 import io
@@ -52,26 +51,34 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
        'id': 11,
    }

-    annotations_list = [
-        {
-            'area': .5,
-            'iscrowd': False,
-            'image_id': 11,
-            'bbox': [64, 64, 128, 128],
-            'category_id': 2,
-            'id': 1000,
-        }
-    ]
+    annotations_list = [{
+        'area': .5,
+        'iscrowd': False,
+        'image_id': 11,
+        'bbox': [64, 64, 128, 128],
+        'category_id': 2,
+        'id': 1000,
+    }]

    image_dir = tmp_dir
    category_index = {
-        1: {'name': 'dog', 'id': 1},
-        2: {'name': 'cat', 'id': 2},
-        3: {'name': 'human', 'id': 3}
+        1: {
+            'name': 'dog',
+            'id': 1
+        },
+        2: {
+            'name': 'cat',
+            'id': 2
+        },
+        3: {
+            'name': 'human',
+            'id': 3
+        }
    }

-    example, num_annotations_skipped = create_coco_tf_record.create_tf_example(
-        image, annotations_list, image_dir, category_index)
+    (_, example,
+     num_annotations_skipped) = create_coco_tf_record.create_tf_example(
+         image, annotations_list, image_dir, category_index)

    self.assertEqual(num_annotations_skipped, 0)
    self._assertProtoEqual(
@@ -83,7 +90,7 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
        [image_file_name])
    self._assertProtoEqual(
        example.features.feature['image/source_id'].bytes_list.value,
-        [image_file_name])
+        [str(image['id'])])
    self._assertProtoEqual(
        example.features.feature['image/format'].bytes_list.value, ['jpeg'])
    self._assertProtoEqual(
@@ -98,9 +105,6 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
    self._assertProtoEqual(
        example.features.feature['image/object/bbox/ymax'].float_list.value,
        [0.75])
-    self._assertProtoEqual(
-        example.features.feature['image/object/class/text'].bytes_list.value,
-        ['cat'])

  def test_create_tf_example_with_instance_masks(self):
    image_file_name = 'tmp_image.jpg'
@@ -117,26 +121,27 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
        'id': 11,
    }

-    annotations_list = [
-        {
-            'area': .5,
-            'iscrowd': False,
-            'image_id': 11,
-            'bbox': [0, 0, 8, 8],
-            'segmentation': [[4, 0, 0, 0, 0, 4],
-                             [8, 4, 4, 8, 8, 8]],
-            'category_id': 1,
-            'id': 1000,
-        }
-    ]
+    annotations_list = [{
+        'area': .5,
+        'iscrowd': False,
+        'image_id': 11,
+        'bbox': [0, 0, 8, 8],
+        'segmentation': [[4, 0, 0, 0, 0, 4], [8, 4, 4, 8, 8, 8]],
+        'category_id': 1,
+        'id': 1000,
+    }]

    image_dir = tmp_dir
    category_index = {
-        1: {'name': 'dog', 'id': 1},
+        1: {
+            'name': 'dog',
+            'id': 1
+        },
    }

-    example, num_annotations_skipped = create_coco_tf_record.create_tf_example(
-        image, annotations_list, image_dir, category_index, include_masks=True)
+    (_, example,
+     num_annotations_skipped) = create_coco_tf_record.create_tf_example(
+         image, annotations_list, image_dir, category_index, include_masks=True)

    self.assertEqual(num_annotations_skipped, 0)
    self._assertProtoEqual(
@@ -148,7 +153,7 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
        [image_file_name])
    self._assertProtoEqual(
        example.features.feature['image/source_id'].bytes_list.value,
-        [image_file_name])
+        [str(image['id'])])
    self._assertProtoEqual(
        example.features.feature['image/format'].bytes_list.value, ['jpeg'])
    self._assertProtoEqual(
@@ -163,24 +168,20 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
    self._assertProtoEqual(
        example.features.feature['image/object/bbox/ymax'].float_list.value,
        [1])
-    self._assertProtoEqual(
-        example.features.feature['image/object/class/text'].bytes_list.value,
-        ['dog'])
-    encoded_mask_pngs = [io.BytesIO(encoded_masks)
-                         for encoded_masks in example.features.feature[
-                             'image/object/mask'].bytes_list.value]
-    pil_masks = [np.array(PIL.Image.open(encoded_mask_png))
-                 for encoded_mask_png in encoded_mask_pngs]
+    encoded_mask_pngs = [
+        io.BytesIO(encoded_masks) for encoded_masks in example.features.feature[
+            'image/object/mask'].bytes_list.value
+    ]
+    pil_masks = [
+        np.array(PIL.Image.open(encoded_mask_png))
+        for encoded_mask_png in encoded_mask_pngs
+    ]
    self.assertTrue(len(pil_masks) == 1)
    self.assertAllEqual(pil_masks[0],
-                        [[1, 1, 1, 0, 0, 0, 0, 0],
-                         [1, 1, 0, 0, 0, 0, 0, 0],
-                         [1, 0, 0, 0, 0, 0, 0, 0],
-                         [0, 0, 0, 0, 0, 0, 0, 0],
-                         [0, 0, 0, 0, 0, 0, 0, 1],
-                         [0, 0, 0, 0, 0, 0, 1, 1],
-                         [0, 0, 0, 0, 0, 1, 1, 1],
-                         [0, 0, 0, 0, 1, 1, 1, 1]])
+                        [[1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0],
+                         [1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0],
+                         [0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 1, 1],
+                         [0, 0, 0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1, 1]])


 if __name__ == '__main__':

--- a/research/object_detection/eval_util.py
+++ b/research/object_detection/eval_util.py
@@ -509,6 +509,11 @@ def result_dict_for_single_example(image,
    detection_masks = detections[detection_fields.detection_masks][0]
    # TODO: This should be done in model's postprocess
    # function ideally.
+    num_detections = tf.to_int32(detections[detection_fields.num_detections][0])
+    detection_boxes = tf.slice(
+        detection_boxes, begin=[0, 0], size=[num_detections, -1])
+    detection_masks = tf.slice(
+        detection_masks, begin=[0, 0, 0], size=[num_detections, -1, -1])
    detection_masks_reframed = ops.reframe_box_masks_to_image_masks(
        detection_masks, detection_boxes, image_shape[1], image_shape[2])
    detection_masks_reframed = tf.cast(

--- a/research/object_detection/evaluator.py
+++ b/research/object_detection/evaluator.py
@@ -24,6 +24,7 @@ import tensorflow as tf
 from object_detection import eval_util
 from object_detection.core import prefetcher
 from object_detection.core import standard_fields as fields
+from object_detection.metrics import coco_evaluation
 from object_detection.utils import object_detection_evaluation

 # A dictionary of metric names to classes that implement the metric. The classes
@@ -39,7 +40,11 @@ EVAL_METRICS_CLASS_DICT = {
    'weighted_pascal_voc_instance_segmentation_metrics':
        object_detection_evaluation.WeightedPascalInstanceSegmentationEvaluator,
    'open_images_detection_metrics':
-        object_detection_evaluation.OpenImagesDetectionEvaluator
+        object_detection_evaluation.OpenImagesDetectionEvaluator,
+    'coco_detection_metrics':
+        coco_evaluation.CocoDetectionEvaluator,
+    'coco_mask_metrics':
+        coco_evaluation.CocoMaskEvaluator,
 }

 EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'