Minor fixes for object detection (#5613)

* Internal change. PiperOrigin-RevId: 213914693
* Add original_image_spatial_shape tensor in the input dictionary to store the shape of the original input image. PiperOrigin-RevId: 214018767
* Remove "groundtruth_confidences" from decoders; use "groundtruth_weights" to indicate label confidence. This also fixes a bug that only surfaced now: random crop routines in core/preprocessor.py did not correctly handle "groundtruth_weight" tensors returned by the decoders. PiperOrigin-RevId: 214091843
* Update CocoMaskEvaluator to accept a batch of image info rather than a single image. PiperOrigin-RevId: 214295305
* Add an option to summarize gradients. PiperOrigin-RevId: 214310875
* Add FasterRCNN inference on CPU:
  1. Add a flag use_static_shapes_for_eval to restrict the model to ops that guarantee static shapes.
  2. Do not filter overlapping anchors while clipping the anchors when use_static_shapes_for_eval is set to True.
  3. Add tests for faster_rcnn_meta_arch predict and postprocess in inference mode for the first and second stages.
  PiperOrigin-RevId: 214329565
* Fix model_lib eval_spec_names assignment (integer->string). PiperOrigin-RevId: 214335461
* Refactor the mask head to optionally upsample after applying convolutions on ROI crops. PiperOrigin-RevId: 214338440
* Use final_exporter_name as exporter_name for the first eval spec, for backward compatibility. PiperOrigin-RevId: 214522032
* Add a reshaped `mask_predictions` tensor to the prediction dictionary in the `_predict_third_stage` method to allow computing mask loss in the eval job. PiperOrigin-RevId: 214620716
* Add support for fully convolutional training to FPN. PiperOrigin-RevId: 214626274
* Fix the preprocess() function in ResNet v1 to work for any number of input channels. Note: if the number of channels != 3, mean subtraction in preprocess() is simply skipped. PiperOrigin-RevId: 214635428
* Wrap result_dict_for_single_example in eval_util to run for batched examples. PiperOrigin-RevId: 214678514
* Add a PNASNet-based (ImageNet model) feature extractor for SSD. PiperOrigin-RevId: 214988331
* Update documentation. PiperOrigin-RevId: 215243502
* Correct the index used to compute the number of groundtruth/detection boxes in COCOMaskEvaluator. Due to incorrect indexing in cl/214295305, only the first detection mask and the first groundtruth mask for a given image were fed to the COCO mask evaluation library. Since groundtruth masks are arranged in no particular order, the first and highest-scoring detection mask (detection masks are ordered by score) will not always match the first and only groundtruth mask retained. This is likely why mask evaluation metrics did not get better than ~11 mAP. Note that this code path is only active when using the model_main.py binary for evaluation. This change fixes the indices and modifies an existing test case to cover it. PiperOrigin-RevId: 215275936
* Fix grayscale_image_resizer to accept masks as input. PiperOrigin-RevId: 215345836
* Add an option not to clip groundtruth boxes during preprocessing. Clipping boxes adversely affects training for partially occluded or large objects, especially for fully convolutional models. Clipping already occurs during postprocessing and should not occur during training. PiperOrigin-RevId: 215613379
* Always return recalls and precisions with length equal to the number of classes. The previous behavior of ObjectDetectionEvaluation was error-prone: when no groundtruth boxes were present for a class, the lists of per-class precisions and recalls were simply truncated. Unless you were aware of this (and consulted the `num_gt_instances_per_class` vector), it was difficult to associate each metric with its class. PiperOrigin-RevId: 215633711
* Expose the box feature node in SSD. PiperOrigin-RevId: 215653316
* Fix ssd mobilenet v2 _CONV_DEFS overwriting issue. PiperOrigin-RevId: 215654160
* More documentation updates. PiperOrigin-RevId: 215656580
* Add a pooling + residual option in multi_resolution_feature_maps. It adds an average pooling and a residual layer between feature maps with matching depth. Designed to be used with WeightSharedBoxPredictor. PiperOrigin-RevId: 215665619
* Only call create_modified_mobilenet_config on init if use_depthwise is true. PiperOrigin-RevId: 215784290
* Only call create_modified_mobilenet_config on init if use_depthwise is true. PiperOrigin-RevId: 215837524
* Don't prune keypoints if clip_boxes is false. PiperOrigin-RevId: 216187642
* Make sure the "key" field exists in the result dictionary. PiperOrigin-RevId: 216456543
* Add an add_background_class parameter to allow disabling the inclusion of a background class. PiperOrigin-RevId: 216567612
* Update expected_classification_loss_under_sampling to better account for expected sampling. PiperOrigin-RevId: 216712287
* Let the evaluation receive an evaluation class in its constructor. PiperOrigin-RevId: 216769374
* Add model building and training support for end-to-end Keras-based SSD models. If a Keras feature extractor's name is specified in the model config (e.g. 'ssd_mobilenet_v2_keras'), the model will use that feature extractor and a corresponding Keras-based box predictor. This CL makes sure regularization losses and batch norm updates work correctly when training models that have Keras-based components. It also updates the default hyperparameter settings of the Keras-based MobileNetV2 (when not overriding hyperparams) to more closely match the legacy Slim training scope. PiperOrigin-RevId: 216938707
* Add the ability in the COCO evaluator to indicate whether an image has been annotated. For a non-annotated image, detections and groundtruth are not supplied. PiperOrigin-RevId: 217316342
* Release the 8k minival dataset ids for MSCOCO, used in Huang et al. "Speed/accuracy trade-offs for modern convolutional object detectors" (https://arxiv.org/abs/1611.10012). PiperOrigin-RevId: 217549353
* Expose weighted_sigmoid_focal loss for the Faster R-CNN classifier. PiperOrigin-RevId: 217601740
* Add detection_features to output nodes. The shape of the feature is [batch_size, max_detections, depth]. PiperOrigin-RevId: 217629905
* FPN uses a custom nearest-neighbor resize op for TPU compatibility. Replace this op with the TensorFlow version at export time for TFLite compatibility. PiperOrigin-RevId: 217721184
* Compute `num_groundtruth_boxes` in inputs.transform_input_data_fn after data augmentation instead of in the decoders. PiperOrigin-RevId: 217733432
* 1. Stop gradients from flowing into groundtruth masks with zero paddings.
  2. Normalize pixelwise cross-entropy loss across the whole batch.
  PiperOrigin-RevId: 217735114
* Optimize the input pipeline for Mask R-CNN on TPU with bfloat16: step time improves from 1663.6 ms to 1184.2 ms, about a 28.8% improvement. PiperOrigin-RevId: 217748833
* Fixes to export a TPU-compatible model: adds nodes for each of the output tensors. Also increments the value of class labels by 1. PiperOrigin-RevId: 217856760
* API changes:
  - Change the interface of the target assigner to return per-class weights.
  - Change the interface of the classification loss to take per-class weights.
  PiperOrigin-RevId: 217968393
* Add an option to override the pipeline config in export_saved_model using a command-line argument. PiperOrigin-RevId: 218429292
* Include quantized trained MobileNet V2 SSD and FaceSsd in the model zoo. PiperOrigin-RevId: 218530947
* Write the final config to disk in `train` mode only. PiperOrigin-RevId: 218735512
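The ObjectDetectionEvaluation change above is easiest to see in a small example. The sketch below is hypothetical (not the library code): it keeps one entry per class and marks classes with no groundtruth using a `NaN` placeholder, which is an assumption made for illustration, so index `i` always refers to class `i`. Under the old truncating behavior, the same input would have produced a two-element list with no indication of which class was dropped.

```python
import math


def per_class_metrics(num_classes, num_gt_instances_per_class, compute_fn):
  """Return one metric per class; NaN for classes with no groundtruth.

  Hypothetical helper for illustration only. compute_fn(class_index)
  computes the metric for a class that has groundtruth instances.
  """
  metrics = [math.nan] * num_classes
  for class_index in range(num_classes):
    if num_gt_instances_per_class[class_index] == 0:
      continue  # keep the NaN placeholder so indices stay aligned
    metrics[class_index] = compute_fn(class_index)
  return metrics


# Class 1 has no groundtruth boxes.
recalls = per_class_metrics(3, [5, 0, 2], lambda c: [0.8, None, 0.5][c])
print(recalls)       # [0.8, nan, 0.5]
print(len(recalls))  # 3, aligned with the class indices
```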

Commit 31ae57eb authored Nov 02, 2018 by pkulzc, committed by GitHub Nov 02, 2018
20 changed files
--- a/research/object_detection/anchor_generators/multiscale_grid_anchor_generator.py
+++ b/research/object_detection/anchor_generators/multiscale_grid_anchor_generator.py
@@ -108,9 +108,6 @@ class MultiscaleGridAnchorGenerator(anchor_generator.AnchorGenerator):
      ValueError: if im_height and im_width are 1, but normalized coordinates
        were requested.
    """
-    if not isinstance(im_height, int) or not isinstance(im_width, int):
-      raise ValueError('MultiscaleGridAnchorGenerator currently requires '
-                       'input image shape to be statically defined.')
    anchor_grid_list = []
    for feat_shape, grid_info in zip(feature_map_shape_list,
                                     self._anchor_grid_info):
@@ -122,10 +119,11 @@ class MultiscaleGridAnchorGenerator(anchor_generator.AnchorGenerator):
      feat_h = feat_shape[0]
      feat_w = feat_shape[1]
      anchor_offset = [0, 0]
-      if im_height % 2.0**level == 0 or im_height == 1:
-        anchor_offset[0] = stride / 2.0
-      if im_width % 2.0**level == 0 or im_width == 1:
-        anchor_offset[1] = stride / 2.0
+      if isinstance(im_height, int) and isinstance(im_width, int):
+        if im_height % 2.0**level == 0 or im_height == 1:
+          anchor_offset[0] = stride / 2.0
+        if im_width % 2.0**level == 0 or im_width == 1:
+          anchor_offset[1] = stride / 2.0
      ag = grid_anchor_generator.GridAnchorGenerator(
          scales,
          aspect_ratios,

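The hunk above makes the half-stride anchor offset conditional on both image dimensions being Python `int`s; for tensor-valued (dynamic) sizes the offset stays `[0, 0]` instead of raising `ValueError`. A minimal pure-Python sketch of that rule, using a hypothetical helper `compute_anchor_offset` (not part of the library) and assuming `stride = 2**level`, since the stride's definition is not shown in the hunk:

```python
def compute_anchor_offset(im_height, im_width, level):
  """Return [y_offset, x_offset] following the conditional in the hunk above.

  Hypothetical helper for illustration. Assumes stride == 2**level.
  Any non-int value (e.g. None) stands in for a dynamic tf.Tensor size.
  """
  stride = 2**level  # assumption: feature map stride at this pyramid level
  anchor_offset = [0, 0]
  if isinstance(im_height, int) and isinstance(im_width, int):
    if im_height % 2.0**level == 0 or im_height == 1:
      anchor_offset[0] = stride / 2.0
    if im_width % 2.0**level == 0 or im_width == 1:
      anchor_offset[1] = stride / 2.0
  return anchor_offset


# Static size divisible by the stride: half-stride offsets.
print(compute_anchor_offset(64, 64, level=5))      # [16.0, 16.0]
# Height not divisible by the stride: no offset on the y axis.
print(compute_anchor_offset(65, 64, level=5))      # [0, 16.0]
# Dynamic size: zero offsets, and no ValueError is raised.
print(compute_anchor_offset(None, None, level=5))  # [0, 0]
```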
--- a/research/object_detection/anchor_generators/multiscale_grid_anchor_generator_test.py
+++ b/research/object_detection/anchor_generators/multiscale_grid_anchor_generator_test.py
@@ -116,7 +116,7 @@ class MultiscaleGridAnchorGeneratorTest(test_case.TestCase):
        normalize_coordinates=False)
    self.assertEqual(anchor_generator.num_anchors_per_location(), [6, 6])

-  def test_construct_single_anchor_fails_with_tensor_image_size(self):
+  def test_construct_single_anchor_dynamic_size(self):
    min_level = 5
    max_level = 5
    anchor_scale = 4.0
@@ -125,12 +125,22 @@ class MultiscaleGridAnchorGeneratorTest(test_case.TestCase):
    im_height = tf.constant(64)
    im_width = tf.constant(64)
    feature_map_shape_list = [(2, 2)]
+    # Zero offsets are used.
+    exp_anchor_corners = [[-64, -64, 64, 64],
+                          [-64, -32, 64, 96],
+                          [-32, -64, 96, 64],
+                          [-32, -32, 96, 96]]
+
    anchor_generator = mg.MultiscaleGridAnchorGenerator(
        min_level, max_level, anchor_scale, aspect_ratios, scales_per_octave,
        normalize_coordinates=False)
-    with self.assertRaisesRegexp(ValueError, 'statically defined'):
-      anchor_generator.generate(
-          feature_map_shape_list, im_height=im_height, im_width=im_width)
+    anchors_list = anchor_generator.generate(
+        feature_map_shape_list, im_height=im_height, im_width=im_width)
+    anchor_corners = anchors_list[0].get()
+
+    with self.test_session():
+      anchor_corners_out = anchor_corners.eval()
+      self.assertAllClose(anchor_corners_out, exp_anchor_corners)

  def test_construct_single_anchor_with_odd_input_dimension(self):


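With zero offsets, the expected corners in `test_construct_single_anchor_dynamic_size` follow from simple geometry. The NumPy sketch below assumes a single aspect ratio of 1.0 and one scale per octave (those test parameters are not visible in the hunk), so each anchor is a square of side `anchor_scale * stride = 4.0 * 32 = 128` centered at `offset + index * stride`. The helper `grid_anchor_corners` is hypothetical and not the library's GridAnchorGenerator.

```python
import numpy as np


def grid_anchor_corners(feat_h, feat_w, stride, anchor_size, offset=(0.0, 0.0)):
  """Return [ymin, xmin, ymax, xmax] corners of square anchors, row-major.

  Illustrative sketch only; assumes one square anchor per grid location.
  """
  ys = offset[0] + stride * np.arange(feat_h)
  xs = offset[1] + stride * np.arange(feat_w)
  # Row-major grid of centers: y varies slowest, x varies fastest.
  cy, cx = np.meshgrid(ys, xs, indexing='ij')
  cy, cx = cy.ravel(), cx.ravel()
  half = anchor_size / 2.0
  return np.stack([cy - half, cx - half, cy + half, cx + half], axis=1)


level, anchor_scale = 5, 4.0
stride = 2**level
corners = grid_anchor_corners(2, 2, stride, anchor_scale * stride)
print(corners.tolist())
# [[-64.0, -64.0, 64.0, 64.0], [-64.0, -32.0, 64.0, 96.0],
#  [-32.0, -64.0, 96.0, 64.0], [-32.0, -32.0, 96.0, 96.0]]
```

These values match `exp_anchor_corners` in the test, which is consistent with the generator falling back to zero offsets for tensor-valued `im_height` and `im_width`.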
--- a/research/object_detection/builders/box_predictor_builder.py
+++ b/research/object_detection/builders/box_predictor_builder.py
@@ -42,6 +42,7 @@ def build_convolutional_box_predictor(is_training,
                                      kernel_size,
                                      box_code_size,
                                      apply_sigmoid_to_scores=False,
+                                      add_background_class=True,
                                      class_prediction_bias_init=0.0,
                                      use_depthwise=False,
                                      mask_head_config=None):
@@ -49,7 +50,10 @@ def build_convolutional_box_predictor(is_training,

  Args:
    is_training: Indicates whether the BoxPredictor is in training mode.
-    num_classes: Number of classes.
+    num_classes: number of classes.  Note that num_classes *does not*
+      include the background category, so if groundtruth labels take values
+      in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
+      assigned classification targets can range from {0,... K}).
    conv_hyperparams_fn: A function to generate tf-slim arg_scope with
      hyperparameters for convolution ops.
    min_depth: Minimum feature depth prior to predicting box encodings
@@ -71,6 +75,7 @@ def build_convolutional_box_predictor(is_training,
    box_code_size: Size of encoding for each box.
    apply_sigmoid_to_scores: If True, apply the sigmoid on the output
      class_predictions.
+    add_background_class: Whether to add an implicit background class.
    class_prediction_bias_init: Constant value to initialize bias of the last
      conv2d layer before class prediction.
    use_depthwise: Whether to use depthwise convolutions for prediction
@@ -88,7 +93,7 @@ def build_convolutional_box_predictor(is_training,
      use_depthwise=use_depthwise)
  class_prediction_head = class_head.ConvolutionalClassHead(
      is_training=is_training,
-      num_classes=num_classes,
+      num_class_slots=num_classes + 1 if add_background_class else num_classes,
      use_dropout=use_dropout,
      dropout_keep_prob=dropout_keep_prob,
      kernel_size=kernel_size,
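The builders now translate `num_classes` (which excludes background) into the number of output slots a class head predicts. A tiny sketch of that mapping; the body mirrors the conditional expression passed as `num_class_slots` in the hunk above, while the helper name itself is hypothetical:

```python
def num_class_slots(num_classes, add_background_class=True):
  """Number of class logits per anchor, mirroring the builder expression.

  num_classes excludes background: groundtruth labels in {0, ..., K-1}
  give num_classes=K. With add_background_class=True, one extra slot is
  reserved for the implicit background class.
  """
  return num_classes + 1 if add_background_class else num_classes


# A label map with K = 90 foreground classes:
print(num_class_slots(90))                              # 91
print(num_class_slots(90, add_background_class=False))  # 90
```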
@@ -136,15 +141,19 @@ def build_convolutional_keras_box_predictor(is_training,
                                            dropout_keep_prob,
                                            kernel_size,
                                            box_code_size,
+                                            add_background_class=True,
                                            class_prediction_bias_init=0.0,
                                            use_depthwise=False,
                                            mask_head_config=None,
                                            name='BoxPredictor'):
-  """Builds the ConvolutionalBoxPredictor from the arguments.
+  """Builds the Keras ConvolutionalBoxPredictor from the arguments.

  Args:
    is_training: Indicates whether the BoxPredictor is in training mode.
-    num_classes: Number of classes.
+    num_classes: number of classes.  Note that num_classes *does not*
+      include the background category, so if groundtruth labels take values
+      in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
+      assigned classification targets can range from {0,... K}).
    conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
      containing hyperparameters for convolution ops.
    freeze_batchnorm: Whether to freeze batch norm parameters during
@@ -175,6 +184,7 @@ def build_convolutional_keras_box_predictor(is_training,
      then the kernel size is automatically set to be
      min(feature_width, feature_height).
    box_code_size: Size of encoding for each box.
+    add_background_class: Whether to add an implicit background class.
    class_prediction_bias_init: constant value to initialize bias of the last
      conv2d layer before class prediction.
    use_depthwise: Whether to use depthwise convolutions for prediction
@@ -185,7 +195,7 @@ def build_convolutional_keras_box_predictor(is_training,
      will auto-generate one from the class name.

  Returns:
-    A ConvolutionalBoxPredictor class.
+    A Keras ConvolutionalBoxPredictor class.
  """
  box_prediction_heads = []
  class_prediction_heads = []
@@ -210,7 +220,8 @@ def build_convolutional_keras_box_predictor(is_training,
    class_prediction_heads.append(
        keras_class_head.ConvolutionalClassHead(
            is_training=is_training,
-            num_classes=num_classes,
+            num_class_slots=(
+                num_classes + 1 if add_background_class else num_classes),
            use_dropout=use_dropout,
            dropout_keep_prob=dropout_keep_prob,
            kernel_size=kernel_size,
@@ -264,6 +275,7 @@ def build_weight_shared_convolutional_box_predictor(
    num_layers_before_predictor,
    box_code_size,
    kernel_size=3,
+    add_background_class=True,
    class_prediction_bias_init=0.0,
    use_dropout=False,
    dropout_keep_prob=0.8,
@@ -288,6 +300,7 @@ def build_weight_shared_convolutional_box_predictor(
      the predictor.
    box_code_size: Size of encoding for each box.
    kernel_size: Size of final convolution kernel.
+    add_background_class: Whether to add an implicit background class.
    class_prediction_bias_init: constant value to initialize bias of the last
      conv2d layer before class prediction.
    use_dropout: Whether to apply dropout to class prediction head.
@@ -313,7 +326,8 @@ def build_weight_shared_convolutional_box_predictor(
      box_encodings_clip_range=box_encodings_clip_range)
  class_prediction_head = (
      class_head.WeightSharedConvolutionalClassHead(
-          num_classes=num_classes,
+          num_class_slots=(
+              num_classes + 1 if add_background_class else num_classes),
          kernel_size=kernel_size,
          class_prediction_bias_init=class_prediction_bias_init,
          use_dropout=use_dropout,
@@ -355,6 +369,7 @@ def build_mask_rcnn_box_predictor(is_training,
                                  use_dropout,
                                  dropout_keep_prob,
                                  box_code_size,
+                                  add_background_class=True,
                                  share_box_across_classes=False,
                                  predict_instance_masks=False,
                                  conv_hyperparams_fn=None,
@@ -362,40 +377,46 @@ def build_mask_rcnn_box_predictor(is_training,
                                  mask_width=14,
                                  mask_prediction_num_conv_layers=2,
                                  mask_prediction_conv_depth=256,
-                                  masks_are_class_agnostic=False):
+                                  masks_are_class_agnostic=False,
+                                  convolve_then_upsample_masks=False):
  """Builds and returns a MaskRCNNBoxPredictor class.

  Args:
-      is_training: Indicates whether the BoxPredictor is in training mode.
-      num_classes: number of classes.  Note that num_classes *does not*
-        include the background category, so if groundtruth labels take values
-        in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
-        assigned classification targets can range from {0,... K}).
-      fc_hyperparams_fn: A function to generate tf-slim arg_scope with
-        hyperparameters for fully connected ops.
-      use_dropout: Option to use dropout or not.  Note that a single dropout
-        op is applied here prior to both box and class predictions, which stands
-        in contrast to the ConvolutionalBoxPredictor below.
-      dropout_keep_prob: Keep probability for dropout.
-        This is only used if use_dropout is True.
-      box_code_size: Size of encoding for each box.
-      share_box_across_classes: Whether to share boxes across classes rather
-        than use a different box for each class.
-      predict_instance_masks: If True, will add a third stage mask prediction
-        to the returned class.
-      conv_hyperparams_fn: A function to generate tf-slim arg_scope with
-        hyperparameters for convolution ops.
-      mask_height: Desired output mask height. The default value is 14.
-      mask_width: Desired output mask width. The default value is 14.
-      mask_prediction_num_conv_layers: Number of convolution layers applied to
-        the image_features in mask prediction branch.
-      mask_prediction_conv_depth: The depth for the first conv2d_transpose op
-        applied to the image_features in the mask prediction branch. If set
-        to 0, the depth of the convolution layers will be automatically chosen
-        based on the number of object classes and the number of channels in the
-        image features.
-      masks_are_class_agnostic: Boolean determining if the mask-head is
-        class-agnostic or not.
+    is_training: Indicates whether the BoxPredictor is in training mode.
+    num_classes: number of classes.  Note that num_classes *does not*
+      include the background category, so if groundtruth labels take values
+      in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
+      assigned classification targets can range from {0,... K}).
+    fc_hyperparams_fn: A function to generate tf-slim arg_scope with
+      hyperparameters for fully connected ops.
+    use_dropout: Option to use dropout or not.  Note that a single dropout
+      op is applied here prior to both box and class predictions, which stands
+      in contrast to the ConvolutionalBoxPredictor below.
+    dropout_keep_prob: Keep probability for dropout.
+      This is only used if use_dropout is True.
+    box_code_size: Size of encoding for each box.
+    add_background_class: Whether to add an implicit background class.
+    share_box_across_classes: Whether to share boxes across classes rather
+      than use a different box for each class.
+    predict_instance_masks: If True, will add a third stage mask prediction
+      to the returned class.
+    conv_hyperparams_fn: A function to generate tf-slim arg_scope with
+      hyperparameters for convolution ops.
+    mask_height: Desired output mask height. The default value is 14.
+    mask_width: Desired output mask width. The default value is 14.
+    mask_prediction_num_conv_layers: Number of convolution layers applied to
+      the image_features in mask prediction branch.
+    mask_prediction_conv_depth: The depth for the first conv2d_transpose op
+      applied to the image_features in the mask prediction branch. If set
+      to 0, the depth of the convolution layers will be automatically chosen
+      based on the number of object classes and the number of channels in the
+      image features.
+    masks_are_class_agnostic: Boolean determining if the mask-head is
+      class-agnostic or not.
+    convolve_then_upsample_masks: Whether to apply convolutions on mask
+      features before upsampling using nearest neighbor resizing. Otherwise,
+      mask features are resized to [`mask_height`, `mask_width`] using
+      bilinear resizing before applying convolutions.

  Returns:
    A MaskRCNNBoxPredictor class.
@@ -410,7 +431,7 @@ def build_mask_rcnn_box_predictor(is_training,
      share_box_across_classes=share_box_across_classes)
  class_prediction_head = class_head.MaskRCNNClassHead(
      is_training=is_training,
-      num_classes=num_classes,
+      num_class_slots=num_classes + 1 if add_background_class else num_classes,
      fc_hyperparams_fn=fc_hyperparams_fn,
      use_dropout=use_dropout,
      dropout_keep_prob=dropout_keep_prob)
@@ -425,7 +446,8 @@ def build_mask_rcnn_box_predictor(is_training,
            mask_width=mask_width,
            mask_prediction_num_conv_layers=mask_prediction_num_conv_layers,
            mask_prediction_conv_depth=mask_prediction_conv_depth,
-            masks_are_class_agnostic=masks_are_class_agnostic)
+            masks_are_class_agnostic=masks_are_class_agnostic,
+            convolve_then_upsample=convolve_then_upsample_masks)
  return mask_rcnn_box_predictor.MaskRCNNBoxPredictor(
      is_training=is_training,
      num_classes=num_classes,
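`convolve_then_upsample_masks` reorders the mask branch: convolutions run on the small ROI feature crop and the result is enlarged with nearest-neighbor resizing, instead of bilinearly resizing to [`mask_height`, `mask_width`] first. When the target size is an integer multiple of the crop size, nearest-neighbor upsampling amounts to repeating each cell. A NumPy sketch of that operation (not the library's implementation; the integer-factor assumption is ours):

```python
import numpy as np


def nearest_neighbor_upsample(features, scale):
  """Upsample [batch, height, width, channels] features by an integer factor.

  Each spatial cell is copied into a scale x scale block, which is what
  nearest-neighbor resizing does when the output size is an exact multiple.
  """
  return np.repeat(np.repeat(features, scale, axis=1), scale, axis=2)


crop = np.arange(4, dtype=np.float32).reshape(1, 2, 2, 1)  # one 2x2 crop
up = nearest_neighbor_upsample(crop, 2)
print(up.shape)                 # (1, 4, 4, 1)
print(up[0, :, :, 0].tolist())
# [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0],
#  [2.0, 2.0, 3.0, 3.0], [2.0, 2.0, 3.0, 3.0]]
```

Running the convolutions before this step keeps them on the smaller spatial grid, which is cheaper; the trade-off is the blockier nearest-neighbor interpolation compared with bilinear resizing.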
@@ -464,7 +486,8 @@ BoxEncodingsClipRange = collections.namedtuple('BoxEncodingsClipRange',
                                               ['min', 'max'])


-def build(argscope_fn, box_predictor_config, is_training, num_classes):
+def build(argscope_fn, box_predictor_config, is_training, num_classes,
+          add_background_class=True):
  """Builds box predictor based on the configuration.

  Builds box predictor based on the configuration. See box_predictor.proto for
@@ -479,6 +502,7 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
      configuration.
    is_training: Whether the models is in training mode.
    num_classes: Number of classes to predict.
+    add_background_class: Whether to add an implicit background class.

  Returns:
    box_predictor: box_predictor.BoxPredictor object.
@@ -502,6 +526,7 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
    return build_convolutional_box_predictor(
        is_training=is_training,
        num_classes=num_classes,
+        add_background_class=add_background_class,
        conv_hyperparams_fn=conv_hyperparams_fn,
        use_dropout=config_box_predictor.use_dropout,
        dropout_keep_prob=config_box_predictor.dropout_keep_probability,
@@ -542,6 +567,7 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
    return build_weight_shared_convolutional_box_predictor(
        is_training=is_training,
        num_classes=num_classes,
+        add_background_class=add_background_class,
        conv_hyperparams_fn=conv_hyperparams_fn,
        depth=config_box_predictor.depth,
        num_layers_before_predictor=(
@@ -570,6 +596,7 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
    return build_mask_rcnn_box_predictor(
        is_training=is_training,
        num_classes=num_classes,
+        add_background_class=add_background_class,
        fc_hyperparams_fn=fc_hyperparams_fn,
        use_dropout=config_box_predictor.use_dropout,
        dropout_keep_prob=config_box_predictor.dropout_keep_probability,
@@ -585,7 +612,9 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
        mask_prediction_conv_depth=(
            config_box_predictor.mask_prediction_conv_depth),
        masks_are_class_agnostic=(
-            config_box_predictor.masks_are_class_agnostic))
+            config_box_predictor.masks_are_class_agnostic),
+        convolve_then_upsample_masks=(
+            config_box_predictor.convolve_then_upsample_masks))

  if box_predictor_oneof == 'rfcn_box_predictor':
    config_box_predictor = box_predictor_config.rfcn_box_predictor
@@ -603,3 +632,78 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
        box_code_size=config_box_predictor.box_code_size)
    return box_predictor_object
  raise ValueError('Unknown box predictor: {}'.format(box_predictor_oneof))
+
+
+def build_keras(conv_hyperparams_fn, freeze_batchnorm, inplace_batchnorm_update,
+                num_predictions_per_location_list, box_predictor_config,
+                is_training, num_classes, add_background_class=True):
+  """Builds a Keras-based box predictor based on the configuration.
+
+  Builds a Keras-based box predictor based on the configuration.
+  See box_predictor.proto for configurable options. Also, see box_predictor.py
+  for more details.
+
+  Args:
+    conv_hyperparams_fn: A function that takes a hyperparams_pb2.Hyperparams
+      proto and returns a `hyperparams_builder.KerasLayerHyperparams`
+      for Conv or FC hyperparameters.
+    freeze_batchnorm: Whether to freeze batch norm parameters during
+      training or not. When training with a small batch size (e.g. 1), it is
+      desirable to freeze batch norm update and use pretrained batch norm
+      params.
+    inplace_batchnorm_update: Whether to update batch norm moving average
+      values in place. When this is False, the train op must add a control
+      dependency on the tf.GraphKeys.UPDATE_OPS collection in order to update
+      batch norm statistics.
+    num_predictions_per_location_list: A list of integers representing the
+      number of box predictions to be made per spatial location for each
+      feature map.
+    box_predictor_config: box_predictor_pb2.BoxPredictor proto containing
+      configuration.
+    is_training: Whether the model is in training mode.
+    num_classes: Number of classes to predict.
+    add_background_class: Whether to add an implicit background class.
+
+  Returns:
+    box_predictor: box_predictor.KerasBoxPredictor object.
+
+  Raises:
+    ValueError: On unknown box predictor, or one with no Keras box predictor.
+  """
+  if not isinstance(box_predictor_config, box_predictor_pb2.BoxPredictor):
+    raise ValueError('box_predictor_config not of type '
+                     'box_predictor_pb2.BoxPredictor.')
+
+  box_predictor_oneof = box_predictor_config.WhichOneof('box_predictor_oneof')
+
+  if box_predictor_oneof == 'convolutional_box_predictor':
+    config_box_predictor = box_predictor_config.convolutional_box_predictor
+    conv_hyperparams = conv_hyperparams_fn(
+        config_box_predictor.conv_hyperparams)
+
+    mask_head_config = (
+        config_box_predictor.mask_head
+        if config_box_predictor.HasField('mask_head') else None)
+    return build_convolutional_keras_box_predictor(
+        is_training=is_training,
+        num_classes=num_classes,
+        add_background_class=add_background_class,
+        conv_hyperparams=conv_hyperparams,
+        freeze_batchnorm=freeze_batchnorm,
+        inplace_batchnorm_update=inplace_batchnorm_update,
+        num_predictions_per_location_list=num_predictions_per_location_list,
+        use_dropout=config_box_predictor.use_dropout,
+        dropout_keep_prob=config_box_predictor.dropout_keep_probability,
+        box_code_size=config_box_predictor.box_code_size,
+        kernel_size=config_box_predictor.kernel_size,
+        num_layers_before_predictor=(
+            config_box_predictor.num_layers_before_predictor),
+        min_depth=config_box_predictor.min_depth,
+        max_depth=config_box_predictor.max_depth,
+        class_prediction_bias_init=(
+            config_box_predictor.class_prediction_bias_init),
+        use_depthwise=config_box_predictor.use_depthwise,
+        mask_head_config=mask_head_config)
+
+  raise ValueError(
+      'Unknown box predictor for Keras: {}'.format(box_predictor_oneof))
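The new `add_background_class` argument is threaded from both `build` and `build_keras` into every predictor builder. The updated tests in box_predictor_builder_test.py pin down its effect on the class head: `_num_class_slots` is 10 for `num_classes=10, add_background_class=False`, and 91 for `num_classes=90` with the default. A minimal sketch of that rule follows; `num_class_slots` is a hypothetical helper written only to illustrate the arithmetic the tests assert, not a function in the API.

```python
def num_class_slots(num_classes, add_background_class=True):
    """Hypothetical helper: classification outputs per anchor.

    An implicit background class adds one extra slot in front of the
    num_classes foreground classes; without it, the slots equal num_classes.
    """
    if num_classes < 1:
        raise ValueError('num_classes must be positive, got %d' % num_classes)
    return num_classes + 1 if add_background_class else num_classes
```

With this rule, `num_class_slots(90)` gives 91 and `num_class_slots(10, add_background_class=False)` gives 10, matching the two assertions in the test file.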
--- a/research/object_detection/builders/box_predictor_builder_test.py
+++ b/research/object_detection/builders/box_predictor_builder_test.py
@@ -113,7 +113,8 @@ class ConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
        argscope_fn=mock_conv_argscope_builder,
        box_predictor_config=box_predictor_proto,
        is_training=False,
-        num_classes=10)
+        num_classes=10,
+        add_background_class=False)
    class_head = box_predictor._class_prediction_head
    self.assertEqual(box_predictor._min_depth, 2)
    self.assertEqual(box_predictor._max_depth, 16)
@@ -122,6 +123,7 @@ class ConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
    self.assertAlmostEqual(class_head._dropout_keep_prob, 0.4)
    self.assertTrue(class_head._apply_sigmoid_to_scores)
    self.assertAlmostEqual(class_head._class_prediction_bias_init, 4.0)
+    self.assertEqual(class_head._num_class_slots, 10)
    self.assertEqual(box_predictor.num_classes, 10)
    self.assertFalse(box_predictor._is_training)
    self.assertTrue(class_head._use_depthwise)
@@ -154,6 +156,7 @@ class ConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
    self.assertTrue(class_head._use_dropout)
    self.assertAlmostEqual(class_head._dropout_keep_prob, 0.8)
    self.assertFalse(class_head._apply_sigmoid_to_scores)
+    self.assertEqual(class_head._num_class_slots, 91)
    self.assertEqual(box_predictor.num_classes, 90)
    self.assertTrue(box_predictor._is_training)
    self.assertFalse(class_head._use_depthwise)
@@ -306,7 +309,8 @@ class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
        argscope_fn=mock_conv_argscope_builder,
        box_predictor_config=box_predictor_proto,
        is_training=False,
-        num_classes=10)
+        num_classes=10,
+        add_background_class=False)
    class_head = box_predictor._class_prediction_head
    self.assertEqual(box_predictor._depth, 2)
    self.assertEqual(box_predictor._num_layers_before_predictor, 2)
@@ -349,7 +353,8 @@ class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
        argscope_fn=mock_conv_argscope_builder,
        box_predictor_config=box_predictor_proto,
        is_training=False,
-        num_classes=10)
+        num_classes=10,
+        add_background_class=False)
    class_head = box_predictor._class_prediction_head
    self.assertEqual(box_predictor._depth, 2)
    self.assertEqual(box_predictor._num_layers_before_predictor, 2)
@@ -627,6 +632,48 @@ class MaskRCNNBoxPredictorBuilderTest(tf.test.TestCase):
        third_stage_heads[mask_rcnn_box_predictor.MASK_PREDICTIONS]
        ._mask_prediction_conv_depth, 512)

+  def test_build_box_predictor_with_convolve_then_upsample_masks(self):
+    box_predictor_proto = box_predictor_pb2.BoxPredictor()
+    box_predictor_proto.mask_rcnn_box_predictor.fc_hyperparams.op = (
+        hyperparams_pb2.Hyperparams.FC)
+    box_predictor_proto.mask_rcnn_box_predictor.conv_hyperparams.op = (
+        hyperparams_pb2.Hyperparams.CONV)
+    box_predictor_proto.mask_rcnn_box_predictor.predict_instance_masks = True
+    box_predictor_proto.mask_rcnn_box_predictor.mask_prediction_conv_depth = 512
+    box_predictor_proto.mask_rcnn_box_predictor.mask_height = 24
+    box_predictor_proto.mask_rcnn_box_predictor.mask_width = 24
+    box_predictor_proto.mask_rcnn_box_predictor.convolve_then_upsample_masks = (
+        True)
+
+    mock_argscope_fn = mock.Mock(return_value='arg_scope')
+    box_predictor = box_predictor_builder.build(
+        argscope_fn=mock_argscope_fn,
+        box_predictor_config=box_predictor_proto,
+        is_training=True,
+        num_classes=90)
+    mock_argscope_fn.assert_has_calls(
+        [mock.call(box_predictor_proto.mask_rcnn_box_predictor.fc_hyperparams,
+                   True),
+         mock.call(box_predictor_proto.mask_rcnn_box_predictor.conv_hyperparams,
+                   True)], any_order=True)
+    box_head = box_predictor._box_prediction_head
+    class_head = box_predictor._class_prediction_head
+    third_stage_heads = box_predictor._third_stage_heads
+    self.assertFalse(box_head._use_dropout)
+    self.assertFalse(class_head._use_dropout)
+    self.assertAlmostEqual(box_head._dropout_keep_prob, 0.5)
+    self.assertAlmostEqual(class_head._dropout_keep_prob, 0.5)
+    self.assertEqual(box_predictor.num_classes, 90)
+    self.assertTrue(box_predictor._is_training)
+    self.assertEqual(box_head._box_code_size, 4)
+    self.assertTrue(
+        mask_rcnn_box_predictor.MASK_PREDICTIONS in third_stage_heads)
+    self.assertEqual(
+        third_stage_heads[mask_rcnn_box_predictor.MASK_PREDICTIONS]
+        ._mask_prediction_conv_depth, 512)
+    self.assertTrue(third_stage_heads[mask_rcnn_box_predictor.MASK_PREDICTIONS]
+                    ._convolve_then_upsample)
+

 class RfcnBoxPredictorBuilderTest(tf.test.TestCase):


--- a/research/object_detection/builders/hyperparams_builder.py
+++ b/research/object_detection/builders/hyperparams_builder.py
@@ -64,6 +64,10 @@ class KerasLayerHyperparams(object):
          hyperparams_config.batch_norm)

    self._activation_fn = _build_activation_fn(hyperparams_config.activation)
+    # TODO(kaftan): Unclear if these kwargs apply to separable & depthwise conv
+    # (Those might use depthwise_* instead of kernel_*)
+    # We should probably switch to using build_conv2d_layer and
+    # build_depthwise_conv2d_layer methods instead.
    self._op_params = {
        'kernel_regularizer': _build_keras_regularizer(
            hyperparams_config.regularizer),
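The TODO above concerns keyword names: in `tf.keras.layers`, `DepthwiseConv2D` names its kernel arguments `depthwise_initializer`, `depthwise_regularizer` and `depthwise_constraint`, and `SeparableConv2D` takes both `depthwise_*` and `pointwise_*` variants, so `kernel_*` entries in `_op_params` may not reach those kernels. A minimal sketch of one possible remapping, assuming `_op_params` is a flat dict; `rename_kernel_kwargs` and `separable_kwargs` are hypothetical helpers, not part of hyperparams_builder.

```python
def rename_kernel_kwargs(op_params, prefix):
    """Hypothetical helper: copy op_params, renaming kernel_* keys to prefix*.

    Keys that do not start with 'kernel_' are copied unchanged.
    """
    renamed = {}
    for key, value in op_params.items():
        if key.startswith('kernel_'):
            renamed[prefix + key[len('kernel_'):]] = value
        else:
            renamed[key] = value
    return renamed


def separable_kwargs(op_params):
    """Hypothetical helper: apply kernel_* settings to both separable kernels."""
    merged = rename_kernel_kwargs(op_params, 'depthwise_')
    merged.update(rename_kernel_kwargs(op_params, 'pointwise_'))
    return merged
```

For a depthwise layer one would call `rename_kernel_kwargs(params, 'depthwise_')`; for a separable layer, `separable_kwargs(params)` applies the same regularizer and initializer to both the depthwise and pointwise kernels. Whether that sharing is desirable is a design decision the TODO leaves open.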

--- a/research/object_detection/builders/image_resizer_builder.py
+++ b/research/object_detection/builders/image_resizer_builder.py
@@ -106,10 +106,35 @@ def build(image_resizer_config):
    raise ValueError(
        'Invalid image resizer option: \'%s\'.' % image_resizer_oneof)

-  def grayscale_image_resizer(image):
-    [resized_image, resized_image_shape] = image_resizer_fn(image)
-    grayscale_image = preprocessor.rgb_to_gray(resized_image)
-    grayscale_image_shape = tf.concat([resized_image_shape[:-1], [1]], 0)
-    return [grayscale_image, grayscale_image_shape]
+  def grayscale_image_resizer(image, masks=None):
+    """Converts to grayscale after applying image_resizer_fn.
+
+    Args:
+      image: A 3D tensor of shape [height, width, 3].
+      masks: (optional) rank 3 float32 tensor with shape [num_instances, height,
+        width] containing instance masks.
+
+    Returns:
+      A list whose length depends on whether masks are present; the resized
+      image shape is always the last element.
+      resized_image: A 3D tensor of shape [new_height, new_width, 1],
+        where the image has been resized (with bilinear interpolation) and
+        converted to grayscale. For keep_aspect_ratio_resizer,
+        min(new_height, new_width) == min_dimension or
+        max(new_height, new_width) == max_dimension.
+      resized_masks: Only present if masks is not None. A 3D tensor of
+        shape [num_instances, new_height, new_width].
+      resized_image_shape: A 1D tensor of shape [3] containing the shape of
+        the resized grayscale image.
+    """
+    # image_resizer_fn returns [resized_image, resized_image_shape] if
+    # masks is None, otherwise it returns
+    # [resized_image, resized_masks, resized_image_shape]. In either case, we
+    # only modify the first and last elements of the returned list.
+    retval = image_resizer_fn(image, masks)
+    resized_image = retval[0]
+    resized_image_shape = retval[-1]
+    retval[0] = preprocessor.rgb_to_gray(resized_image)
+    retval[-1] = tf.concat([resized_image_shape[:-1], [1]], 0)
+    return retval

  return functools.partial(grayscale_image_resizer)
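The grayscale wrapper relies on a positional convention: the image is always first, the shape always last, and masks (when given) sit in between, so indexing with `[0]` and `[-1]` works for both return layouts. A minimal pure-Python sketch of that convention follows. `hypothetical_resizer` stands in for `image_resizer_fn` and does no resizing, and the 0.2989/0.587/0.114 weights are an assumption (common luma coefficients), not necessarily what `preprocessor.rgb_to_gray` uses.

```python
def hypothetical_resizer(image, masks=None):
    """Stand-in for image_resizer_fn; returns the inputs without resizing."""
    shape = [len(image), len(image[0]), len(image[0][0])]
    if masks is None:
        return [image, shape]
    return [image, masks, shape]


def rgb_to_gray(image):
    """Collapses each (r, g, b) pixel into a single-channel [luma] pixel."""
    return [[[0.2989 * r + 0.587 * g + 0.114 * b] for (r, g, b) in row]
            for row in image]


def make_grayscale_resizer(image_resizer_fn):
    """Wraps a resizer so that its image output becomes single-channel."""
    def grayscale_image_resizer(image, masks=None):
        # Copy into a list so item assignment also works if a resizer
        # returns a tuple.
        retval = list(image_resizer_fn(image, masks))
        # Only the first (image) and last (shape) entries change; masks, if
        # present, sit in the middle and pass through untouched.
        retval[0] = rgb_to_gray(retval[0])
        retval[-1] = list(retval[-1][:-1]) + [1]
        return retval
    return grayscale_image_resizer


resizer = make_grayscale_resizer(hypothetical_resizer)
image = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
         [(0.5, 0.5, 0.5), (1.0, 0.0, 0.0)]]
gray, shape = resizer(image)
gray_m, masks_out, shape_m = resizer(image, masks=[[[1, 0], [0, 1]]])
```

Unpacking into two names without masks and three names with masks mirrors how callers of the real resizer are expected to handle the variable-length result.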
--- a/research/object_detection/builders/losses_builder.py
+++ b/research/object_detection/builders/losses_builder.py
@@ -136,6 +136,14 @@ def build_faster_rcnn_classification_loss(loss_config):
    config = loss_config.weighted_logits_softmax
    return losses.WeightedSoftmaxClassificationAgainstLogitsLoss(
        logit_scale=config.logit_scale)
+  if loss_type == 'weighted_sigmoid_focal':
+    config = loss_config.weighted_sigmoid_focal
+    alpha = None
+    if config.HasField('alpha'):
+      alpha = config.alpha
+    return losses.SigmoidFocalClassificationLoss(
+        gamma=config.gamma,
+        alpha=alpha)

  # By default, Faster RCNN second stage classifier uses Softmax loss
  # with anchor-wise outputs.
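The new `weighted_sigmoid_focal` branch builds `losses.SigmoidFocalClassificationLoss(gamma=..., alpha=...)` and passes `alpha=None` when the proto leaves `alpha` unset. A minimal scalar sketch of a sigmoid focal loss for one logit and a 0/1 target, assuming the standard formulation: binary cross-entropy scaled by `(1 - p_t) ** gamma` and, only when `alpha` is given, by `alpha` for positives and `1 - alpha` for negatives. `sigmoid_focal_loss` is a hypothetical illustration, not the library's tensor implementation.

```python
import math


def sigmoid_focal_loss(logit, target, gamma=2.0, alpha=None):
    """Hypothetical scalar sigmoid focal loss for a 0/1 target."""
    # Numerically stable binary cross-entropy with logits: -log(p_t).
    cross_entropy = (max(logit, 0.0) - logit * target +
                     math.log1p(math.exp(-abs(logit))))
    # Recover p_t from the cross-entropy to avoid overflow in a sigmoid.
    p_t = math.exp(-cross_entropy)
    # Down-weights well-classified entries (p_t close to 1).
    modulating_factor = (1.0 - p_t) ** gamma
    alpha_weight = 1.0
    if alpha is not None:
        alpha_weight = alpha if target == 1 else 1.0 - alpha
    return alpha_weight * modulating_factor * cross_entropy
```

With `gamma=0.0` and `alpha=None` this reduces to plain sigmoid cross-entropy, which is a quick sanity check on the formulation.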

--- a/research/object_detection/builders/losses_builder_test.py
+++ b/research/object_detection/builders/losses_builder_test.py
@@ -280,7 +280,7 @@ class ClassificationLossBuilderTest(tf.test.TestCase):
                               losses.WeightedSigmoidClassificationLoss))
    predictions = tf.constant([[[0.0, 1.0, 0.0], [0.0, 0.5, 0.5]]])
    targets = tf.constant([[[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]])
-    weights = tf.constant([[1.0, 1.0]])
+    weights = tf.constant([[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]])
    loss = classification_loss(predictions, targets, weights=weights)
    self.assertEqual(loss.shape, [1, 2, 3])

@@ -473,6 +473,19 @@ class FasterRcnnClassificationLossBuilderTest(tf.test.TestCase):
        isinstance(classification_loss,
                   losses.WeightedSoftmaxClassificationAgainstLogitsLoss))

+  def test_build_sigmoid_focal_loss(self):
+    losses_text_proto = """
+      weighted_sigmoid_focal {
+      }
+    """
+    losses_proto = losses_pb2.ClassificationLoss()
+    text_format.Merge(losses_text_proto, losses_proto)
+    classification_loss = losses_builder.build_faster_rcnn_classification_loss(
+        losses_proto)
+    self.assertTrue(
+        isinstance(classification_loss,
+                   losses.SigmoidFocalClassificationLoss))
+
  def test_build_softmax_loss_by_default(self):
    losses_text_proto = """
    """

--- a/research/object_detection/builders/model_builder.py
+++ b/research/object_detection/builders/model_builder.py
@@ -47,6 +47,8 @@ from object_detection.models.ssd_mobilenet_v1_fpn_feature_extractor import SSDMo
 from object_detection.models.ssd_mobilenet_v1_ppn_feature_extractor import SSDMobileNetV1PpnFeatureExtractor
 from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
 from object_detection.models.ssd_mobilenet_v2_fpn_feature_extractor import SSDMobileNetV2FpnFeatureExtractor
+from object_detection.models.ssd_mobilenet_v2_keras_feature_extractor import SSDMobileNetV2KerasFeatureExtractor
+from object_detection.models.ssd_pnasnet_feature_extractor import SSDPNASNetFeatureExtractor
 from object_detection.predictors import rfcn_box_predictor
 from object_detection.protos import model_pb2
 from object_detection.utils import ops
@@ -69,6 +71,11 @@ SSD_FEATURE_EXTRACTOR_CLASS_MAP = {
    'ssd_resnet152_v1_ppn':
        ssd_resnet_v1_ppn.SSDResnet152V1PpnFeatureExtractor,
    'embedded_ssd_mobilenet_v1': EmbeddedSSDMobileNetV1FeatureExtractor,
+    'ssd_pnasnet': SSDPNASNetFeatureExtractor,
+}
+
+SSD_KERAS_FEATURE_EXTRACTOR_CLASS_MAP = {
+    'ssd_mobilenet_v2_keras': SSDMobileNetV2KerasFeatureExtractor
 }

 # A map of names to Faster R-CNN feature extractors.
@@ -90,8 +97,7 @@ FASTER_RCNN_FEATURE_EXTRACTOR_CLASS_MAP = {
 }


-def build(model_config, is_training, add_summaries=True,
-          add_background_class=True):
+def build(model_config, is_training, add_summaries=True):
  """Builds a DetectionModel based on the model config.

  Args:
@@ -99,10 +105,6 @@ def build(model_config, is_training, add_summaries=True,
      DetectionModel.
    is_training: True if this model is being built for training purposes.
    add_summaries: Whether to add tensorflow summaries in the model graph.
-    add_background_class: Whether to add an implicit background class to one-hot
-      encodings of groundtruth labels. Set to false if using groundtruth labels
-      with an explicit background class or using multiclass scores instead of
-      truth in the case of distillation. Ignored in the case of faster_rcnn.
  Returns:
    DetectionModel based on the config.

@@ -113,21 +115,26 @@ def build(model_config, is_training, add_summaries=True,
    raise ValueError('model_config not of type model_pb2.DetectionModel.')
  meta_architecture = model_config.WhichOneof('model')
  if meta_architecture == 'ssd':
-    return _build_ssd_model(model_config.ssd, is_training, add_summaries,
-                            add_background_class)
+    return _build_ssd_model(model_config.ssd, is_training, add_summaries)
  if meta_architecture == 'faster_rcnn':
    return _build_faster_rcnn_model(model_config.faster_rcnn, is_training,
                                    add_summaries)
  raise ValueError('Unknown meta architecture: {}'.format(meta_architecture))


-def _build_ssd_feature_extractor(feature_extractor_config, is_training,
+def _build_ssd_feature_extractor(feature_extractor_config,
+                                 is_training,
+                                 freeze_batchnorm,
                                 reuse_weights=None):
  """Builds a ssd_meta_arch.SSDFeatureExtractor based on config.

  Args:
    feature_extractor_config: A SSDFeatureExtractor proto config from ssd.proto.
    is_training: True if this feature extractor is being built for training.
+    freeze_batchnorm: Whether to freeze batch norm parameters during
+      training or not. When training with a small batch size (e.g. 1), it is
+      desirable to freeze batch norm update and use pretrained batch norm
+      params.
    reuse_weights: if the feature extractor should reuse weights.

  Returns:
@@ -137,20 +144,31 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
    ValueError: On invalid feature extractor type.
  """
  feature_type = feature_extractor_config.type
+  is_keras_extractor = feature_type in SSD_KERAS_FEATURE_EXTRACTOR_CLASS_MAP
  depth_multiplier = feature_extractor_config.depth_multiplier
  min_depth = feature_extractor_config.min_depth
  pad_to_multiple = feature_extractor_config.pad_to_multiple
  use_explicit_padding = feature_extractor_config.use_explicit_padding
  use_depthwise = feature_extractor_config.use_depthwise
-  conv_hyperparams = hyperparams_builder.build(
-      feature_extractor_config.conv_hyperparams, is_training)
+
+  if is_keras_extractor:
+    conv_hyperparams = hyperparams_builder.KerasLayerHyperparams(
+        feature_extractor_config.conv_hyperparams)
+  else:
+    conv_hyperparams = hyperparams_builder.build(
+        feature_extractor_config.conv_hyperparams, is_training)
  override_base_feature_extractor_hyperparams = (
      feature_extractor_config.override_base_feature_extractor_hyperparams)

-  if feature_type not in SSD_FEATURE_EXTRACTOR_CLASS_MAP:
+  if (feature_type not in SSD_FEATURE_EXTRACTOR_CLASS_MAP) and (
+      not is_keras_extractor):
    raise ValueError('Unknown ssd feature_extractor: {}'.format(feature_type))

-  feature_extractor_class = SSD_FEATURE_EXTRACTOR_CLASS_MAP[feature_type]
+  if is_keras_extractor:
+    feature_extractor_class = SSD_KERAS_FEATURE_EXTRACTOR_CLASS_MAP[
+        feature_type]
+  else:
+    feature_extractor_class = SSD_FEATURE_EXTRACTOR_CLASS_MAP[feature_type]
  kwargs = {
      'is_training':
          is_training,
@@ -160,10 +178,6 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
          min_depth,
      'pad_to_multiple':
          pad_to_multiple,
-      'conv_hyperparams_fn':
-          conv_hyperparams,
-      'reuse_weights':
-          reuse_weights,
      'use_explicit_padding':
          use_explicit_padding,
      'use_depthwise':
@@ -172,6 +186,18 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
          override_base_feature_extractor_hyperparams
  }

+  if is_keras_extractor:
+    kwargs.update({
+        'conv_hyperparams': conv_hyperparams,
+        'inplace_batchnorm_update': False,
+        'freeze_batchnorm': freeze_batchnorm
+    })
+  else:
+    kwargs.update({
+        'conv_hyperparams_fn': conv_hyperparams,
+        'reuse_weights': reuse_weights,
+    })
+
  if feature_extractor_config.HasField('fpn'):
    kwargs.update({
        'fpn_min_level':
@@ -185,8 +211,7 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
  return feature_extractor_class(**kwargs)


-def _build_ssd_model(ssd_config, is_training, add_summaries,
-                     add_background_class=True):
+def _build_ssd_model(ssd_config, is_training, add_summaries):
  """Builds an SSD detection model based on the model config.

  Args:
@@ -194,10 +219,6 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
      SSDMetaArch.
    is_training: True if this model is being built for training purposes.
    add_summaries: Whether to add tf summaries in the model.
-    add_background_class: Whether to add an implicit background class to one-hot
-      encodings of groundtruth labels. Set to false if using groundtruth labels
-      with an explicit background class or using multiclass scores instead of
-      truth in the case of distillation.
  Returns:
    SSDMetaArch based on the config.

@@ -210,6 +231,7 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
  # Feature extractor
  feature_extractor = _build_ssd_feature_extractor(
      feature_extractor_config=ssd_config.feature_extractor,
+      freeze_batchnorm=ssd_config.freeze_batchnorm,
      is_training=is_training)

  box_coder = box_coder_builder.build(ssd_config.box_coder)
@@ -218,11 +240,23 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
      ssd_config.similarity_calculator)
  encode_background_as_zeros = ssd_config.encode_background_as_zeros
  negative_class_weight = ssd_config.negative_class_weight
-  ssd_box_predictor = box_predictor_builder.build(hyperparams_builder.build,
-                                                  ssd_config.box_predictor,
-                                                  is_training, num_classes)
  anchor_generator = anchor_generator_builder.build(
      ssd_config.anchor_generator)
+  if feature_extractor.is_keras_model:
+    ssd_box_predictor = box_predictor_builder.build_keras(
+        conv_hyperparams_fn=hyperparams_builder.KerasLayerHyperparams,
+        freeze_batchnorm=ssd_config.freeze_batchnorm,
+        inplace_batchnorm_update=False,
+        num_predictions_per_location_list=anchor_generator
+        .num_anchors_per_location(),
+        box_predictor_config=ssd_config.box_predictor,
+        is_training=is_training,
+        num_classes=num_classes,
+        add_background_class=ssd_config.add_background_class)
+  else:
+    ssd_box_predictor = box_predictor_builder.build(
+        hyperparams_builder.build, ssd_config.box_predictor, is_training,
+        num_classes, ssd_config.add_background_class)
  image_resizer_fn = image_resizer_builder.build(ssd_config.image_resizer)
  non_max_suppression_fn, score_conversion_fn = post_processing_builder.build(
      ssd_config.post_processing)
@@ -244,7 +278,7 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
  if ssd_config.use_expected_classification_loss_under_sampling:
    expected_classification_loss_under_sampling = functools.partial(
        ops.expected_classification_loss_under_sampling,
-        minimum_negative_sampling=ssd_config.minimum_negative_sampling,
+        min_num_negative_samples=ssd_config.min_num_negative_samples,
        desired_negative_sampling_ratio=ssd_config.
        desired_negative_sampling_ratio)

@@ -271,7 +305,7 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
      normalize_loc_loss_by_codesize=normalize_loc_loss_by_codesize,
      freeze_batchnorm=ssd_config.freeze_batchnorm,
      inplace_batchnorm_update=ssd_config.inplace_batchnorm_update,
-      add_background_class=add_background_class,
+      add_background_class=ssd_config.add_background_class,
      random_example_sampler=random_example_sampler,
      expected_classification_loss_under_sampling=
      expected_classification_loss_under_sampling)
@@ -357,12 +391,11 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
      frcnn_config.first_stage_box_predictor_kernel_size)
  first_stage_box_predictor_depth = frcnn_config.first_stage_box_predictor_depth
  first_stage_minibatch_size = frcnn_config.first_stage_minibatch_size
-  # TODO(bhattad): When eval is supported using static shapes, add separate
-  # use_static_shapes_for_trainig and use_static_shapes_for_evaluation.
-  use_static_shapes = frcnn_config.use_static_shapes and is_training
+  use_static_shapes = frcnn_config.use_static_shapes
  first_stage_sampler = sampler.BalancedPositiveNegativeSampler(
      positive_fraction=frcnn_config.first_stage_positive_balance_fraction,
-      is_static=frcnn_config.use_static_balanced_label_sampler and is_training)
+      is_static=(frcnn_config.use_static_balanced_label_sampler and
+                 use_static_shapes))
  first_stage_max_proposals = frcnn_config.first_stage_max_proposals
  if (frcnn_config.first_stage_nms_iou_threshold < 0 or
      frcnn_config.first_stage_nms_iou_threshold > 1.0):
@@ -377,7 +410,7 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
      iou_thresh=frcnn_config.first_stage_nms_iou_threshold,
      max_size_per_class=frcnn_config.first_stage_max_proposals,
      max_total_size=frcnn_config.first_stage_max_proposals,
-      use_static_shapes=use_static_shapes and is_training)
+      use_static_shapes=use_static_shapes)
  first_stage_loc_loss_weight = (
      frcnn_config.first_stage_localization_loss_weight)
  first_stage_obj_loss_weight = frcnn_config.first_stage_objectness_loss_weight
@@ -398,7 +431,8 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
  second_stage_batch_size = frcnn_config.second_stage_batch_size
  second_stage_sampler = sampler.BalancedPositiveNegativeSampler(
      positive_fraction=frcnn_config.second_stage_balance_fraction,
-      is_static=frcnn_config.use_static_balanced_label_sampler and is_training)
+      is_static=(frcnn_config.use_static_balanced_label_sampler and
+                 use_static_shapes))
  (second_stage_non_max_suppression_fn, second_stage_score_conversion_fn
  ) = post_processing_builder.build(frcnn_config.second_stage_post_processing)
  second_stage_localization_loss_weight = (
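In this change, `_build_ssd_feature_extractor` looks up `SSD_KERAS_FEATURE_EXTRACTOR_CLASS_MAP` before `SSD_FEATURE_EXTRACTOR_CLASS_MAP`, passes `conv_hyperparams`, `inplace_batchnorm_update` and `freeze_batchnorm` to Keras extractors but `conv_hyperparams_fn` and `reuse_weights` to slim ones, and `_build_ssd_model` then checks `feature_extractor.is_keras_model` to choose between `box_predictor_builder.build_keras` and `box_predictor_builder.build`. A minimal sketch of that two-registry dispatch follows; the stub classes and `box_predictor_builder_name` are hypothetical, and the shared kwargs (`is_training`, `depth_multiplier`, and so on) are omitted for brevity.

```python
class HypotheticalSlimExtractor(object):
    is_keras_model = False

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class HypotheticalKerasExtractor(object):
    is_keras_model = True

    def __init__(self, **kwargs):
        self.kwargs = kwargs


SLIM_MAP = {'ssd_mobilenet_v2': HypotheticalSlimExtractor}
KERAS_MAP = {'ssd_mobilenet_v2_keras': HypotheticalKerasExtractor}


def build_feature_extractor(feature_type, conv_hyperparams, freeze_batchnorm,
                            reuse_weights=None):
    is_keras_extractor = feature_type in KERAS_MAP
    if not is_keras_extractor and feature_type not in SLIM_MAP:
        raise ValueError('Unknown ssd feature_extractor: {}'.format(feature_type))
    if is_keras_extractor:
        # Keras extractors take a hyperparams object plus batch norm flags.
        return KERAS_MAP[feature_type](
            conv_hyperparams=conv_hyperparams,
            inplace_batchnorm_update=False,
            freeze_batchnorm=freeze_batchnorm)
    # Slim extractors take an arg_scope-building function and reuse_weights.
    return SLIM_MAP[feature_type](
        conv_hyperparams_fn=conv_hyperparams,
        reuse_weights=reuse_weights)


def box_predictor_builder_name(feature_extractor):
    """Names the box_predictor_builder entry point the model builder calls."""
    return 'build_keras' if feature_extractor.is_keras_model else 'build'
```

Keeping two separate registries lets an unknown name fail with the same `ValueError` while still letting the constructor arguments differ per backend.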

--- a/research/object_detection/builders/model_builder_test.py
+++ b/research/object_detection/builders/model_builder_test.py
@@ -39,6 +39,9 @@ from object_detection.models.ssd_mobilenet_v1_fpn_feature_extractor import SSDMo
 from object_detection.models.ssd_mobilenet_v1_ppn_feature_extractor import SSDMobileNetV1PpnFeatureExtractor
 from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
 from object_detection.models.ssd_mobilenet_v2_fpn_feature_extractor import SSDMobileNetV2FpnFeatureExtractor
+from object_detection.models.ssd_mobilenet_v2_keras_feature_extractor import SSDMobileNetV2KerasFeatureExtractor
+from object_detection.predictors import convolutional_box_predictor
+from object_detection.predictors import convolutional_keras_box_predictor
 from object_detection.protos import model_pb2

 FRCNN_RESNET_FEAT_MAPS = {
@@ -148,7 +151,7 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
          }
        }
        use_expected_classification_loss_under_sampling: true
-        minimum_negative_sampling: 10
+        min_num_negative_samples: 10
        desired_negative_sampling_ratio: 2
      }"""
    model_proto = model_pb2.DetectionModel()
@@ -160,7 +163,7 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
    self.assertIsNotNone(model._expected_classification_loss_under_sampling)
    self.assertEqual(
        model._expected_classification_loss_under_sampling.keywords, {
-            'minimum_negative_sampling': 10,
+            'min_num_negative_samples': 10,
            'desired_negative_sampling_ratio': 2
        })

@@ -713,6 +716,86 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
    self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch)
    self.assertIsInstance(model._feature_extractor,
                          SSDMobileNetV2FeatureExtractor)
+    self.assertIsInstance(model._box_predictor,
+                          convolutional_box_predictor.ConvolutionalBoxPredictor)
+    self.assertTrue(model._normalize_loc_loss_by_codesize)
+    self.assertTrue(model._target_assigner._weight_regression_loss_by_score)
+
+  def test_create_ssd_mobilenet_v2_keras_model_from_config(self):
+    model_text_proto = """
+      ssd {
+        feature_extractor {
+          type: 'ssd_mobilenet_v2_keras'
+          conv_hyperparams {
+            regularizer {
+                l2_regularizer {
+                }
+              }
+              initializer {
+                truncated_normal_initializer {
+                }
+              }
+          }
+        }
+        box_coder {
+          faster_rcnn_box_coder {
+          }
+        }
+        matcher {
+          argmax_matcher {
+          }
+        }
+        similarity_calculator {
+          iou_similarity {
+          }
+        }
+        anchor_generator {
+          ssd_anchor_generator {
+            aspect_ratios: 1.0
+          }
+        }
+        image_resizer {
+          fixed_shape_resizer {
+            height: 320
+            width: 320
+          }
+        }
+        box_predictor {
+          convolutional_box_predictor {
+            conv_hyperparams {
+              regularizer {
+                l2_regularizer {
+                }
+              }
+              initializer {
+                truncated_normal_initializer {
+                }
+              }
+            }
+          }
+        }
+        normalize_loc_loss_by_codesize: true
+        loss {
+          classification_loss {
+            weighted_softmax {
+            }
+          }
+          localization_loss {
+            weighted_smooth_l1 {
+            }
+          }
+        }
+        weight_regression_loss_by_score: true
+      }"""
+    model_proto = model_pb2.DetectionModel()
+    text_format.Merge(model_text_proto, model_proto)
+    model = self.create_model(model_proto)
+    self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch)
+    self.assertIsInstance(model._feature_extractor,
+                          SSDMobileNetV2KerasFeatureExtractor)
+    self.assertIsInstance(
+        model._box_predictor,
+        convolutional_keras_box_predictor.ConvolutionalBoxPredictor)
    self.assertTrue(model._normalize_loc_loss_by_codesize)
    self.assertTrue(model._target_assigner._weight_regression_loss_by_score)


--- a/research/object_detection/builders/preprocessor_builder.py
+++ b/research/object_detection/builders/preprocessor_builder.py
@@ -167,6 +167,7 @@ def build(preprocessor_step_config):
                                       config.max_aspect_ratio),
                'area_range': (config.min_area, config.max_area),
                'overlap_thresh': config.overlap_thresh,
+                'clip_boxes': config.clip_boxes,
                'random_coef': config.random_coef,
            })

@@ -217,6 +218,7 @@ def build(preprocessor_step_config):
                               config.max_aspect_ratio),
        'area_range': (config.min_area, config.max_area),
        'overlap_thresh': config.overlap_thresh,
+        'clip_boxes': config.clip_boxes,
        'random_coef': config.random_coef,
    }
    if min_padded_size_ratio:
@@ -252,6 +254,7 @@ def build(preprocessor_step_config):
                            for op in config.operations]
      area_range = [(op.min_area, op.max_area) for op in config.operations]
      overlap_thresh = [op.overlap_thresh for op in config.operations]
+      clip_boxes = [op.clip_boxes for op in config.operations]
      random_coef = [op.random_coef for op in config.operations]
      return (preprocessor.ssd_random_crop,
              {
@@ -259,6 +262,7 @@ def build(preprocessor_step_config):
                  'aspect_ratio_range': aspect_ratio_range,
                  'area_range': area_range,
                  'overlap_thresh': overlap_thresh,
+                  'clip_boxes': clip_boxes,
                  'random_coef': random_coef,
              })
    return (preprocessor.ssd_random_crop, {})
@@ -271,6 +275,7 @@ def build(preprocessor_step_config):
                            for op in config.operations]
      area_range = [(op.min_area, op.max_area) for op in config.operations]
      overlap_thresh = [op.overlap_thresh for op in config.operations]
+      clip_boxes = [op.clip_boxes for op in config.operations]
      random_coef = [op.random_coef for op in config.operations]
      min_padded_size_ratio = [tuple(op.min_padded_size_ratio)
                               for op in config.operations]
@@ -284,6 +289,7 @@ def build(preprocessor_step_config):
                  'aspect_ratio_range': aspect_ratio_range,
                  'area_range': area_range,
                  'overlap_thresh': overlap_thresh,
+                  'clip_boxes': clip_boxes,
                  'random_coef': random_coef,
                  'min_padded_size_ratio': min_padded_size_ratio,
                  'max_padded_size_ratio': max_padded_size_ratio,
@@ -297,6 +303,7 @@ def build(preprocessor_step_config):
      min_object_covered = [op.min_object_covered for op in config.operations]
      area_range = [(op.min_area, op.max_area) for op in config.operations]
      overlap_thresh = [op.overlap_thresh for op in config.operations]
+      clip_boxes = [op.clip_boxes for op in config.operations]
      random_coef = [op.random_coef for op in config.operations]
      return (preprocessor.ssd_random_crop_fixed_aspect_ratio,
              {
@@ -304,6 +311,7 @@ def build(preprocessor_step_config):
                  'aspect_ratio': config.aspect_ratio,
                  'area_range': area_range,
                  'overlap_thresh': overlap_thresh,
+                  'clip_boxes': clip_boxes,
                  'random_coef': random_coef,
              })
    return (preprocessor.ssd_random_crop_fixed_aspect_ratio, {})
@@ -332,6 +340,7 @@ def build(preprocessor_step_config):
      kwargs['area_range'] = [(op.min_area, op.max_area)
                              for op in config.operations]
      kwargs['overlap_thresh'] = [op.overlap_thresh for op in config.operations]
+      kwargs['clip_boxes'] = [op.clip_boxes for op in config.operations]
      kwargs['random_coef'] = [op.random_coef for op in config.operations]
    return (preprocessor.ssd_random_crop_pad_fixed_aspect_ratio, kwargs)
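The builder changes above thread the new `clip_boxes` field through every random-crop variant: for the multi-operation SSD crops it is collected per operation into a list, alongside `overlap_thresh` and `random_coef`. A minimal pure-Python sketch of that pattern, assuming a hypothetical `CropOp` record in place of the real `preprocessor_pb2` operation messages (the actual builder reads these fields from the protobuf config):

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CropOp:
  """Hypothetical stand-in for one `operations { ... }` entry in the config."""
  min_area: float
  max_area: float
  overlap_thresh: float
  clip_boxes: bool
  random_coef: float


def collect_per_op_kwargs(operations: List[CropOp]) -> Dict[str, list]:
  """Gathers each field into a list holding one entry per operation."""
  return {
      'area_range': [(op.min_area, op.max_area) for op in operations],
      'overlap_thresh': [op.overlap_thresh for op in operations],
      # Added by this change: one clip_boxes flag per crop operation.
      'clip_boxes': [op.clip_boxes for op in operations],
      'random_coef': [op.random_coef for op in operations],
  }


# Values mirror the two operations used in preprocessor_builder_test.py.
ops = [CropOp(0.5, 1.0, 0.0, False, 0.375),
       CropOp(0.5, 1.0, 0.25, True, 0.375)]
kwargs = collect_per_op_kwargs(ops)
print(kwargs['clip_boxes'])  # [False, True]
```

Keeping the lists parallel means index `i` of every list describes the same crop operation, which is what the updated tests check with `'clip_boxes': [False, True]`.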


--- a/research/object_detection/builders/preprocessor_builder_test.py
+++ b/research/object_detection/builders/preprocessor_builder_test.py
@@ -222,6 +222,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
      min_area: 0.25
      max_area: 0.875
      overlap_thresh: 0.5
+      clip_boxes: False
      random_coef: 0.125
    }
    """
@@ -234,6 +235,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
        'aspect_ratio_range': (0.75, 1.5),
        'area_range': (0.25, 0.875),
        'overlap_thresh': 0.5,
+        'clip_boxes': False,
        'random_coef': 0.125,
    })

@@ -261,6 +263,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
      min_area: 0.25
      max_area: 0.875
      overlap_thresh: 0.5
+      clip_boxes: False
      random_coef: 0.125
    }
    """
@@ -273,6 +276,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
        'aspect_ratio_range': (0.75, 1.5),
        'area_range': (0.25, 0.875),
        'overlap_thresh': 0.5,
+        'clip_boxes': False,
        'random_coef': 0.125,
    })

@@ -285,6 +289,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
      min_area: 0.25
      max_area: 0.875
      overlap_thresh: 0.5
+      clip_boxes: False
      random_coef: 0.125
      min_padded_size_ratio: 0.5
      min_padded_size_ratio: 0.75
@@ -304,6 +309,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
        'aspect_ratio_range': (0.75, 1.5),
        'area_range': (0.25, 0.875),
        'overlap_thresh': 0.5,
+        'clip_boxes': False,
        'random_coef': 0.125,
        'min_padded_size_ratio': (0.5, 0.75),
        'max_padded_size_ratio': (0.5, 0.75),
@@ -315,6 +321,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
    random_crop_to_aspect_ratio {
      aspect_ratio: 0.85
      overlap_thresh: 0.35
+      clip_boxes: False
    }
    """
    preprocessor_proto = preprocessor_pb2.PreprocessingStep()
@@ -322,7 +329,8 @@ class PreprocessorBuilderTest(tf.test.TestCase):
    function, args = preprocessor_builder.build(preprocessor_proto)
    self.assertEqual(function, preprocessor.random_crop_to_aspect_ratio)
    self.assert_dictionary_close(args, {'aspect_ratio': 0.85,
-                                        'overlap_thresh': 0.35})
+                                        'overlap_thresh': 0.35,
+                                        'clip_boxes': False})

  def test_build_random_black_patches(self):
    preprocessor_text_proto = """
@@ -411,6 +419,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
        min_area: 0.5
        max_area: 1.0
        overlap_thresh: 0.0
+        clip_boxes: False
        random_coef: 0.375
      }
      operations {
@@ -420,6 +429,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
        min_area: 0.5
        max_area: 1.0
        overlap_thresh: 0.25
+        clip_boxes: True
        random_coef: 0.375
      }
    }
@@ -432,6 +442,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
                            'aspect_ratio_range': [(0.875, 1.125), (0.75, 1.5)],
                            'area_range': [(0.5, 1.0), (0.5, 1.0)],
                            'overlap_thresh': [0.0, 0.25],
+                            'clip_boxes': [False, True],
                            'random_coef': [0.375, 0.375]})

  def test_build_ssd_random_crop_empty_operations(self):
@@ -455,6 +466,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
        min_area: 0.5
        max_area: 1.0
        overlap_thresh: 0.0
+        clip_boxes: False
        random_coef: 0.375
        min_padded_size_ratio: [1.0, 1.0]
        max_padded_size_ratio: [2.0, 2.0]
@@ -469,6 +481,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
        min_area: 0.5
        max_area: 1.0
        overlap_thresh: 0.25
+        clip_boxes: True
        random_coef: 0.375
        min_padded_size_ratio: [1.0, 1.0]
        max_padded_size_ratio: [2.0, 2.0]
@@ -486,6 +499,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
                            'aspect_ratio_range': [(0.875, 1.125), (0.75, 1.5)],
                            'area_range': [(0.5, 1.0), (0.5, 1.0)],
                            'overlap_thresh': [0.0, 0.25],
+                            'clip_boxes': [False, True],
                            'random_coef': [0.375, 0.375],
                            'min_padded_size_ratio': [(1.0, 1.0), (1.0, 1.0)],
                            'max_padded_size_ratio': [(2.0, 2.0), (2.0, 2.0)],
@@ -499,6 +513,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
        min_area: 0.5
        max_area: 1.0
        overlap_thresh: 0.0
+        clip_boxes: False
        random_coef: 0.375
      }
      operations {
@@ -506,6 +521,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
        min_area: 0.5
        max_area: 1.0
        overlap_thresh: 0.25
+        clip_boxes: True
        random_coef: 0.375
      }
      aspect_ratio: 0.875
@@ -519,6 +535,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
                            'aspect_ratio': 0.875,
                            'area_range': [(0.5, 1.0), (0.5, 1.0)],
                            'overlap_thresh': [0.0, 0.25],
+                            'clip_boxes': [False, True],
                            'random_coef': [0.375, 0.375]})

  def test_build_ssd_random_crop_pad_fixed_aspect_ratio(self):
@@ -531,6 +548,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
        min_area: 0.5
        max_area: 1.0
        overlap_thresh: 0.0
+        clip_boxes: False
        random_coef: 0.375
      }
      operations {
@@ -540,6 +558,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
        min_area: 0.5
        max_area: 1.0
        overlap_thresh: 0.25
+        clip_boxes: True
        random_coef: 0.375
      }
      aspect_ratio: 0.875
@@ -557,6 +576,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
                            'aspect_ratio_range': [(0.875, 1.125), (0.75, 1.5)],
                            'area_range': [(0.5, 1.0), (0.5, 1.0)],
                            'overlap_thresh': [0.0, 0.25],
+                            'clip_boxes': [False, True],
                            'random_coef': [0.375, 0.375],
                            'min_padded_size_ratio': (1.0, 1.0),
                            'max_padded_size_ratio': (2.0, 2.0)})

--- a/research/object_detection/core/losses.py
+++ b/research/object_detection/core/losses.py
@@ -225,7 +225,9 @@ class WeightedSigmoidClassificationLoss(Loss):
        num_classes] representing the predicted logits for each class
      target_tensor: A float tensor of shape [batch_size, num_anchors,
        num_classes] representing one-hot encoded classification targets
-      weights: a float tensor of shape [batch_size, num_anchors]
+      weights: a float tensor of shape either [batch_size, num_anchors,
+        num_classes] or [batch_size, num_anchors, 1]. If the shape is
+        [batch_size, num_anchors, 1], all the classes are equally weighted.
      class_indices: (Optional) A 1-D integer tensor of class indices.
        If provided, computes loss only for the specified class indices.

@@ -233,7 +235,6 @@ class WeightedSigmoidClassificationLoss(Loss):
      loss: a float tensor of shape [batch_size, num_anchors, num_classes]
        representing the value of the loss function.
    """
-    weights = tf.expand_dims(weights, 2)
    if class_indices is not None:
      weights *= tf.reshape(
          ops.indices_to_dense_vector(class_indices,
@@ -273,7 +274,9 @@ class SigmoidFocalClassificationLoss(Loss):
        num_classes] representing the predicted logits for each class
      target_tensor: A float tensor of shape [batch_size, num_anchors,
        num_classes] representing one-hot encoded classification targets
-      weights: a float tensor of shape [batch_size, num_anchors]
+      weights: a float tensor of shape either [batch_size, num_anchors,
+        num_classes] or [batch_size, num_anchors, 1]. If the shape is
+        [batch_size, num_anchors, 1], all the classes are equally weighted.
      class_indices: (Optional) A 1-D integer tensor of class indices.
        If provided, computes loss only for the specified class indices.

@@ -281,7 +284,6 @@ class SigmoidFocalClassificationLoss(Loss):
      loss: a float tensor of shape [batch_size, num_anchors, num_classes]
        representing the value of the loss function.
    """
-    weights = tf.expand_dims(weights, 2)
    if class_indices is not None:
      weights *= tf.reshape(
          ops.indices_to_dense_vector(class_indices,
@@ -326,12 +328,15 @@ class WeightedSoftmaxClassificationLoss(Loss):
        num_classes] representing the predicted logits for each class
      target_tensor: A float tensor of shape [batch_size, num_anchors,
        num_classes] representing one-hot encoded classification targets
-      weights: a float tensor of shape [batch_size, num_anchors]
+      weights: a float tensor of shape either [batch_size, num_anchors,
+        num_classes] or [batch_size, num_anchors, 1]. If the shape is
+        [batch_size, num_anchors, 1], all the classes are equally weighted.

    Returns:
      loss: a float tensor of shape [batch_size, num_anchors]
        representing the value of the loss function.
    """
+    weights = tf.reduce_mean(weights, axis=2)
    num_classes = prediction_tensor.get_shape().as_list()[-1]
    prediction_tensor = tf.divide(
        prediction_tensor, self._logit_scale, name='scale_logit')
@@ -372,12 +377,15 @@ class WeightedSoftmaxClassificationAgainstLogitsLoss(Loss):
        num_classes] representing the predicted logits for each class
      target_tensor: A float tensor of shape [batch_size, num_anchors,
        num_classes] representing logit classification targets
-      weights: a float tensor of shape [batch_size, num_anchors]
+      weights: a float tensor of shape either [batch_size, num_anchors,
+        num_classes] or [batch_size, num_anchors, 1]. If the shape is
+        [batch_size, num_anchors, 1], all the classes are equally weighted.

    Returns:
      loss: a float tensor of shape [batch_size, num_anchors]
        representing the value of the loss function.
    """
+    weights = tf.reduce_mean(weights, axis=2)
    num_classes = prediction_tensor.get_shape().as_list()[-1]
    target_tensor = self._scale_and_softmax_logits(target_tensor)
    prediction_tensor = tf.divide(prediction_tensor, self._logit_scale,
@@ -431,7 +439,9 @@ class BootstrappedSigmoidClassificationLoss(Loss):
        num_classes] representing the predicted logits for each class
      target_tensor: A float tensor of shape [batch_size, num_anchors,
        num_classes] representing one-hot encoded classification targets
-      weights: a float tensor of shape [batch_size, num_anchors]
+      weights: a float tensor of shape either [batch_size, num_anchors,
+        num_classes] or [batch_size, num_anchors, 1]. If the shape is
+        [batch_size, num_anchors, 1], all the classes are equally weighted.

    Returns:
      loss: a float tensor of shape [batch_size, num_anchors, num_classes]
@@ -446,7 +456,7 @@ class BootstrappedSigmoidClassificationLoss(Loss):
              tf.sigmoid(prediction_tensor) > 0.5, tf.float32)
    per_entry_cross_ent = (tf.nn.sigmoid_cross_entropy_with_logits(
        labels=bootstrap_target_tensor, logits=prediction_tensor))
-    return per_entry_cross_ent * tf.expand_dims(weights, 2)
+    return per_entry_cross_ent * weights


 class HardExampleMiner(object):
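With the `tf.expand_dims(weights, 2)` calls removed, the sigmoid-style losses multiply the per-entry loss of shape [batch_size, num_anchors, num_classes] directly by `weights`, so a trailing dimension of 1 broadcasts one weight across all classes, while a full [batch_size, num_anchors, num_classes] tensor weights each class independently. A small sketch of that broadcasting with made-up values, using NumPy as a stand-in for TensorFlow (both follow the same elementwise broadcasting rules):

```python
import numpy as np

# Per-entry loss of shape [batch_size=1, num_anchors=2, num_classes=3].
per_entry_loss = np.array([[[1.0, 2.0, 3.0],
                            [4.0, 5.0, 6.0]]])

# Shape [1, 2, 1]: one weight per anchor, shared by every class.
anchor_weights = np.array([[[1.0], [0.0]]])

# Shape [1, 2, 3]: an independent weight for each (anchor, class) pair.
class_weights = np.array([[[1.0, 0.5, 0.0],
                           [1.0, 1.0, 1.0]]])

# The size-1 class axis broadcasts, zeroing the second anchor for all classes.
weighted_by_anchor = per_entry_loss * anchor_weights
# Each class entry is scaled by its own weight.
weighted_by_class = per_entry_loss * class_weights

print(weighted_by_anchor)
print(weighted_by_class)
```

Callers that still pass the old [batch_size, num_anchors] shape would now broadcast incorrectly or fail, which is why every test in `losses_test.py` was updated to a 3-D weights tensor.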

--- a/research/object_detection/core/losses_test.py
+++ b/research/object_detection/core/losses_test.py
@@ -209,8 +209,14 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
                                  [0, 1, 0],
                                  [1, 1, 1],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1],
-                           [1, 1, 1, 0]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [0, 0, 0]]], tf.float32)
    loss_op = losses.WeightedSigmoidClassificationLoss()
    loss = loss_op(prediction_tensor, target_tensor, weights=weights)
    loss = tf.reduce_sum(loss)
@@ -237,8 +243,14 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
                                  [0, 1, 0],
                                  [1, 1, 1],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1],
-                           [1, 1, 1, 0]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [0, 0, 0]]], tf.float32)
    loss_op = losses.WeightedSigmoidClassificationLoss()
    loss = loss_op(prediction_tensor, target_tensor, weights=weights)
    loss = tf.reduce_sum(loss, axis=2)
@@ -266,8 +278,14 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
                                  [0, 1, 0, 0],
                                  [1, 1, 1, 0],
                                  [1, 0, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1],
-                           [1, 1, 1, 0]], tf.float32)
+    weights = tf.constant([[[1, 1, 1, 1],
+                            [1, 1, 1, 1],
+                            [1, 1, 1, 1],
+                            [1, 1, 1, 1]],
+                           [[1, 1, 1, 1],
+                            [1, 1, 1, 1],
+                            [1, 1, 1, 1],
+                            [0, 0, 0, 0]]], tf.float32)
    # Ignores the last class.
    class_indices = tf.constant([0, 1, 2], tf.int32)
    loss_op = losses.WeightedSigmoidClassificationLoss()
@@ -306,9 +324,18 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
                                  [0, 0, 0],
                                  [0, 0, 0],
                                  [0, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1],
-                           [1, 1, 1, 0],
-                           [1, 1, 1, 1]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [0, 0, 0]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]]], tf.float32)
    losses_mask = tf.constant([True, True, False], tf.bool)

    loss_op = losses.WeightedSigmoidClassificationLoss()
@@ -345,7 +372,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
                                  [0],
                                  [0],
                                  [0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1, 1, 1]], tf.float32)
+    weights = tf.constant([[[1], [1], [1], [1], [1], [1]]], tf.float32)
    focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=None)
    sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
    focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
@@ -371,7 +398,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
                                  [1],
                                  [0],
                                  [0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
+    weights = tf.constant([[[1], [1], [1], [1], [1]]], tf.float32)
    focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=None)
    sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
    focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
@@ -397,7 +424,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
                                  [1],
                                  [0],
                                  [0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
+    weights = tf.constant([[[1], [1], [1], [1], [1]]], tf.float32)
    focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=None)
    sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
    focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
@@ -423,7 +450,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
                                  [1],
                                  [0],
                                  [0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
+    weights = tf.constant([[[1], [1], [1], [1], [1]]], tf.float32)
    focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=1.0)
    sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
    focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
@@ -451,7 +478,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
                                  [1],
                                  [0],
                                  [0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
+    weights = tf.constant([[[1], [1], [1], [1], [1]]], tf.float32)
    focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=0.0)
    sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
    focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
@@ -485,8 +512,14 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
                                  [0, 1, 0],
                                  [1, 1, 1],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1],
-                           [1, 1, 1, 0]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [0, 0, 0]]], tf.float32)
    focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=0.5, gamma=0.0)
    sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
    focal_loss = focal_loss_op(prediction_tensor, target_tensor,
@@ -515,8 +548,14 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
                                  [0, 1, 0],
                                  [1, 1, 1],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1],
-                           [1, 1, 1, 0]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [0, 0, 0]]], tf.float32)
    focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=None, gamma=0.0)
    sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
    focal_loss = focal_loss_op(prediction_tensor, target_tensor,
@@ -546,8 +585,14 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
                                  [0, 1, 0],
                                  [1, 0, 0],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1],
-                           [1, 1, 1, 1]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]]], tf.float32)
    focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=1.0, gamma=0.0)

    focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
@@ -578,8 +623,14 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
                                  [0, 1, 0],
                                  [1, 0, 0],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1],
-                           [1, 1, 1, 1]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]]], tf.float32)
    focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=0.75, gamma=0.0)

    focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
@@ -620,9 +671,18 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
                                  [1, 0, 0],
                                  [1, 0, 0],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1],
-                           [1, 1, 1, 1],
-                           [1, 1, 1, 1]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]]], tf.float32)
    losses_mask = tf.constant([True, True, False], tf.bool)
    focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=0.75, gamma=0.0)

@@ -659,8 +719,14 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
                                  [0, 1, 0],
                                  [0, 1, 0],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, .5, 1],
-                           [1, 1, 1, 0]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [0.5, 0.5, 0.5],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [0, 0, 0]]], tf.float32)
    loss_op = losses.WeightedSoftmaxClassificationLoss()
    loss = loss_op(prediction_tensor, target_tensor, weights=weights)
    loss = tf.reduce_sum(loss)
@@ -687,8 +753,14 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
                                  [0, 1, 0],
                                  [0, 1, 0],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, .5, 1],
-                           [1, 1, 1, 0]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [0.5, 0.5, 0.5],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [0, 0, 0]]], tf.float32)
    loss_op = losses.WeightedSoftmaxClassificationLoss()
    loss = loss_op(prediction_tensor, target_tensor, weights=weights)

@@ -718,8 +790,14 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
                                  [0, 1, 0],
                                  [0, 1, 0],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1],
-                           [1, 1, 1, 1]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]]], tf.float32)
    loss_op = losses.WeightedSoftmaxClassificationLoss(logit_scale=logit_scale)
    loss = loss_op(prediction_tensor, target_tensor, weights=weights)

@@ -755,9 +833,18 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
                                  [1, 0, 0],
                                  [1, 0, 0],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, .5, 1],
-                           [1, 1, 1, 0],
-                           [1, 1, 1, 1]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [0.5, 0.5, 0.5],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [0, 0, 0]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]]], tf.float32)
    losses_mask = tf.constant([True, True, False], tf.bool)
    loss_op = losses.WeightedSoftmaxClassificationLoss()
    loss = loss_op(prediction_tensor, target_tensor, weights=weights,
@@ -792,6 +879,11 @@ class WeightedSoftmaxClassificationAgainstLogitsLossTest(tf.test.TestCase):
                                  [100, -100, -100]]], tf.float32)
    weights = tf.constant([[1, 1, .5, 1],
                           [1, 1, 1, 1]], tf.float32)
+    weights_shape = tf.shape(weights)
+    weights_multiple = tf.concat(
+        [tf.ones_like(weights_shape), tf.constant([3])],
+        axis=0)
+    weights = tf.tile(tf.expand_dims(weights, 2), weights_multiple)
    loss_op = losses.WeightedSoftmaxClassificationAgainstLogitsLoss()
    loss = loss_op(prediction_tensor, target_tensor, weights=weights)
    loss = tf.reduce_sum(loss)
@@ -820,6 +912,11 @@ class WeightedSoftmaxClassificationAgainstLogitsLossTest(tf.test.TestCase):
                                  [100, -100, -100]]], tf.float32)
    weights = tf.constant([[1, 1, .5, 1],
                           [1, 1, 1, 0]], tf.float32)
+    weights_shape = tf.shape(weights)
+    weights_multiple = tf.concat(
+        [tf.ones_like(weights_shape), tf.constant([3])],
+        axis=0)
+    weights = tf.tile(tf.expand_dims(weights, 2), weights_multiple)
    loss_op = losses.WeightedSoftmaxClassificationAgainstLogitsLoss()
    loss = loss_op(prediction_tensor, target_tensor, weights=weights)

@@ -849,6 +946,11 @@ class WeightedSoftmaxClassificationAgainstLogitsLossTest(tf.test.TestCase):
                                  [100, -100, -100]]], tf.float32)
    weights = tf.constant([[1, 1, .5, 1],
                           [1, 1, 1, 0]], tf.float32)
+    weights_shape = tf.shape(weights)
+    weights_multiple = tf.concat(
+        [tf.ones_like(weights_shape), tf.constant([3])],
+        axis=0)
+    weights = tf.tile(tf.expand_dims(weights, 2), weights_multiple)
    loss_op = losses.WeightedSoftmaxClassificationAgainstLogitsLoss(
        logit_scale=logit_scale)
    loss = loss_op(prediction_tensor, target_tensor, weights=weights)
@@ -894,8 +996,14 @@ class BootstrappedSigmoidClassificationLossTest(tf.test.TestCase):
                                  [0, 1, 0],
                                  [1, 1, 1],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1],
-                           [1, 1, 1, 0]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [0, 0, 0]]], tf.float32)
    alpha = tf.constant(.5, tf.float32)
    loss_op = losses.BootstrappedSigmoidClassificationLoss(
        alpha, bootstrap_type='soft')
@@ -923,8 +1031,14 @@ class BootstrappedSigmoidClassificationLossTest(tf.test.TestCase):
                                  [0, 1, 0],
                                  [1, 1, 1],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1],
-                           [1, 1, 1, 0]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [0, 0, 0]]], tf.float32)
    alpha = tf.constant(.5, tf.float32)
    loss_op = losses.BootstrappedSigmoidClassificationLoss(
        alpha, bootstrap_type='hard')
@@ -952,8 +1066,14 @@ class BootstrappedSigmoidClassificationLossTest(tf.test.TestCase):
                                  [0, 1, 0],
                                  [1, 1, 1],
                                  [1, 0, 0]]], tf.float32)
-    weights = tf.constant([[1, 1, 1, 1],
-                           [1, 1, 1, 0]], tf.float32)
+    weights = tf.constant([[[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1]],
+                           [[1, 1, 1],
+                            [1, 1, 1],
+                            [1, 1, 1],
+                            [0, 0, 0]]], tf.float32)
    alpha = tf.constant(.5, tf.float32)
    loss_op = losses.BootstrappedSigmoidClassificationLoss(
        alpha, bootstrap_type='hard')
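
The updated tests appear to pass classification weights with a trailing class dimension, `[batch_size, num_anchors, num_classes]`, instead of per-anchor weights. The softmax tests build them by tiling, and the bootstrapped tests write them out explicitly. Below is a NumPy sketch of the same expand-then-tile pattern. It assumes NumPy's `np.tile`/`np.expand_dims` behave like their TF counterparts for this case; it is an illustration, not the test code itself.

```python
import numpy as np

# Per-anchor weights of shape [batch_size, num_anchors], as in the tests above.
weights = np.array([[1, 1, .5, 1],
                    [1, 1, 1, 0]], dtype=np.float32)
num_classes = 3

# Build tile multiples [1, ..., 1, num_classes]: one "1" per existing axis,
# then num_classes for the new trailing axis.
multiples = np.concatenate([np.ones_like(np.shape(weights)), [num_classes]])

# Expand to [batch, anchors, 1], then repeat along the last axis
# to get [batch, anchors, num_classes].
per_class_weights = np.tile(np.expand_dims(weights, 2), multiples)
print(per_class_weights.shape)
```

Every class of a given anchor inherits that anchor's weight, so an anchor weighted 0 still contributes nothing to the loss.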

--- a/research/object_detection/core/matcher.py
+++ b/research/object_detection/core/matcher.py
@@ -197,8 +197,10 @@ class Match(object):
        The shape of the gathered tensor is [match_results.shape[0]] +
        input_tensor.shape[1:].
    """
-    input_tensor = tf.concat([tf.stack([ignored_value, unmatched_value]),
-                              input_tensor], axis=0)
+    input_tensor = tf.concat(
+        [tf.stack([ignored_value, unmatched_value]),
+         tf.to_float(input_tensor)],
+        axis=0)
    gather_indices = tf.maximum(self.match_results + 2, 0)
    gathered_tensor = self._gather_op(input_tensor, gather_indices)
    return gathered_tensor
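
`Match.gather_based_on_match` prepends the ignored and unmatched values to the input, then shifts every match index by 2. An ignored column (`-2`) therefore reads row 0, an unmatched column (`-1`) reads row 1, and a match to row `k` reads row `k + 2`. The added `tf.to_float` presumably lets integer inputs be concatenated with float sentinel values. The NumPy sketch below mirrors that logic. The function name and argument order are chosen for illustration only.

```python
import numpy as np

def gather_based_on_match(match_results, input_tensor, unmatched_value,
                          ignored_value):
    """NumPy sketch of the patched gather (illustrative, not the library API).

    match_results: int array; >= 0 is a matched row index, -1 means unmatched,
      -2 means ignored.
    """
    # Prepend the two sentinel rows and cast everything to float32,
    # mirroring tf.concat([...stack..., tf.to_float(input_tensor)]).
    table = np.concatenate(
        [np.stack([ignored_value, unmatched_value]).astype(np.float32),
         np.asarray(input_tensor, dtype=np.float32)],
        axis=0)
    # -2 -> row 0 (ignored), -1 -> row 1 (unmatched), k -> row k + 2.
    gather_indices = np.maximum(match_results + 2, 0)
    return table[gather_indices]

result = gather_based_on_match(np.array([2, -1, 0, -2]),
                               np.array([10, 20, 30], dtype=np.int32),
                               unmatched_value=0.0, ignored_value=-1.0)
print(result)
```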

--- a/research/object_detection/core/model.py
+++ b/research/object_detection/core/model.py
@@ -289,6 +289,18 @@ class DetectionModel(object):
      self._groundtruth_lists[
          fields.InputDataFields.is_annotated] = is_annotated_list

+  @abstractmethod
+  def regularization_losses(self):
+    """Returns a list of regularization losses for this model.
+
+    Returns a list of regularization losses for this model that the estimator
+    needs to use during training/optimization.
+
+    Returns:
+      A list of regularization loss tensors.
+    """
+    pass
+
  @abstractmethod
  def restore_map(self, fine_tune_checkpoint_type='detection'):
    """Returns a map of variables to load from a foreign checkpoint.
@@ -312,3 +324,16 @@ class DetectionModel(object):
      the model graph.
    """
    pass
+
+  @abstractmethod
+  def updates(self):
+    """Returns a list of update operators for this model.
+
+    Returns a list of update operators for this model that must be executed at
+    each training step. The estimator's train op needs to have a control
+    dependency on these updates.
+
+    Returns:
+      A list of update operators.
+    """
+    pass
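
The two new abstract methods add to the model contract. `regularization_losses()` supplies extra loss terms for the estimator to add to the training loss, and `updates()` supplies ops that the train op must depend on. The framework-free sketch below illustrates that contract. `DetectionModelSketch`, `ToyModel`, and `train_step` are hypothetical stand-ins, and plain callables replace TF update ops.

```python
import abc

class DetectionModelSketch(abc.ABC):
    """Hypothetical stand-in for the two new abstract methods."""

    @abc.abstractmethod
    def regularization_losses(self):
        """Returns a list of regularization loss values."""

    @abc.abstractmethod
    def updates(self):
        """Returns a list of update callables to run every training step."""

class ToyModel(DetectionModelSketch):
    def __init__(self):
        self.step_count = 0

    def regularization_losses(self):
        return [0.25, 0.5]

    def updates(self):
        def _bump():
            # Stand-in for e.g. a moving-average update op.
            self.step_count += 1
        return [_bump]

def train_step(model, data_loss):
    # Run updates first, approximating the train op's control dependency.
    for update in model.updates():
        update()
    # Total loss = data loss + sum of the model's regularization losses.
    return data_loss + sum(model.regularization_losses())

model = ToyModel()
print(train_step(model, 1.0))
```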
--- a/research/object_detection/core/post_processing.py
+++ b/research/object_detection/core/post_processing.py
@@ -15,6 +15,7 @@

 """Post-processing operations on detected boxes."""

+import numpy as np
 import tensorflow as tf

 from object_detection.core import box_list
@@ -407,28 +408,36 @@ def batch_multiclass_non_max_suppression(boxes,
          for key, value in zip(additional_fields, args[4:-1])
      }
      per_image_num_valid_boxes = args[-1]
-      per_image_boxes = tf.reshape(
-          tf.slice(per_image_boxes, 3 * [0],
-                   tf.stack([per_image_num_valid_boxes, -1, -1])), [-1, q, 4])
-      per_image_scores = tf.reshape(
-          tf.slice(per_image_scores, [0, 0],
-                   tf.stack([per_image_num_valid_boxes, -1])),
-          [-1, num_classes])
-      per_image_masks = tf.reshape(
-          tf.slice(per_image_masks, 4 * [0],
-                   tf.stack([per_image_num_valid_boxes, -1, -1, -1])),
-          [-1, q, per_image_masks.shape[2].value,
-           per_image_masks.shape[3].value])
-      if per_image_additional_fields is not None:
-        for key, tensor in per_image_additional_fields.items():
-          additional_field_shape = tensor.get_shape()
-          additional_field_dim = len(additional_field_shape)
-          per_image_additional_fields[key] = tf.reshape(
-              tf.slice(per_image_additional_fields[key],
-                       additional_field_dim * [0],
-                       tf.stack([per_image_num_valid_boxes] +
-                                (additional_field_dim - 1) * [-1])),
-              [-1] + [dim.value for dim in additional_field_shape[1:]])
+      if use_static_shapes:
+        total_proposals = tf.shape(per_image_scores)
+        per_image_scores = tf.where(
+            tf.less(tf.range(total_proposals[0]), per_image_num_valid_boxes),
+            per_image_scores,
+            tf.fill(total_proposals, np.finfo('float32').min))
+      else:
+        per_image_boxes = tf.reshape(
+            tf.slice(per_image_boxes, 3 * [0],
+                     tf.stack([per_image_num_valid_boxes, -1, -1])), [-1, q, 4])
+        per_image_scores = tf.reshape(
+            tf.slice(per_image_scores, [0, 0],
+                     tf.stack([per_image_num_valid_boxes, -1])),
+            [-1, num_classes])
+        per_image_masks = tf.reshape(
+            tf.slice(per_image_masks, 4 * [0],
+                     tf.stack([per_image_num_valid_boxes, -1, -1, -1])),
+            [-1, q, per_image_masks.shape[2].value,
+             per_image_masks.shape[3].value])
+        if per_image_additional_fields is not None:
+          for key, tensor in per_image_additional_fields.items():
+            additional_field_shape = tensor.get_shape()
+            additional_field_dim = len(additional_field_shape)
+            per_image_additional_fields[key] = tf.reshape(
+                tf.slice(per_image_additional_fields[key],
+                         additional_field_dim * [0],
+                         tf.stack([per_image_num_valid_boxes] +
+                                  (additional_field_dim - 1) * [-1])),
+                [-1] + [dim.value for dim in additional_field_shape[1:]])
+
      nmsed_boxlist, num_valid_nms_boxes = multiclass_non_max_suppression(
          per_image_boxes,
          per_image_scores,

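
With `use_static_shapes`, the patch no longer slices each image's tensors down to `per_image_num_valid_boxes`, which would produce a dynamic shape. It keeps the full padded shape and overwrites the scores of padded rows with the lowest float32 value, so that, presumably, NMS never keeps them. The NumPy sketch below shows the masking step. `mask_invalid_scores` is a hypothetical helper name. In TF1, `tf.where` with a rank-1 condition selects whole rows, so the NumPy version broadcasts the row mask explicitly.

```python
import numpy as np

def mask_invalid_scores(scores, num_valid):
    """Hypothetical helper: keep shape static, push padded rows to float32 min.

    scores: [total_proposals, num_classes]; rows >= num_valid are padding.
    """
    total = scores.shape
    valid_rows = np.arange(total[0]) < num_valid           # [total_proposals]
    fill = np.full(total, np.finfo('float32').min, dtype=np.float32)
    # Broadcast the row mask across the class axis.
    return np.where(valid_rows[:, None], scores, fill)

scores = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.5, 0.5]], dtype=np.float32)
masked = mask_invalid_scores(scores, num_valid=2)
print(masked)
```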
--- a/research/object_detection/core/preprocessor.py
+++ b/research/object_detection/core/preprocessor.py
@@ -1108,7 +1108,7 @@ def random_jitter_boxes(boxes, ratio=0.05, seed=None):
 def _strict_random_crop_image(image,
                              boxes,
                              labels,
-                              label_scores=None,
+                              label_scores,
                              multiclass_scores=None,
                              masks=None,
                              keypoints=None,
@@ -1116,14 +1116,14 @@ def _strict_random_crop_image(image,
                              aspect_ratio_range=(0.75, 1.33),
                              area_range=(0.1, 1.0),
                              overlap_thresh=0.3,
+                              clip_boxes=True,
                              preprocess_vars_cache=None):
  """Performs random crop.

-  Note: boxes will be clipped to the crop. Keypoint coordinates that are
-  outside the crop will be set to NaN, which is consistent with the original
-  keypoint encoding for non-existing keypoints. This function always crops
-  the image and is supposed to be used by `random_crop_image` function which
-  sometimes returns image unchanged.
+  Note: Keypoint coordinates that are outside the crop will be set to NaN, which
+  is consistent with the original keypoint encoding for non-existing keypoints.
+  This function always crops the image and is supposed to be used by
+  `random_crop_image` function which sometimes returns the image unchanged.

  Args:
    image: rank 3 float32 tensor containing 1 image -> [height, width, channels]
@@ -1152,6 +1152,7 @@ def _strict_random_crop_image(image,
                original image.
    overlap_thresh: minimum overlap thresh with new cropped
                    image to keep the box.
+    clip_boxes: whether to clip the boxes to the cropped image.
    preprocess_vars_cache: PreprocessorCache object that records previously
                           performed augmentations. Updated in-place. If this
                           function is called multiple times with the same
@@ -1232,8 +1233,9 @@ def _strict_random_crop_image(image,
    new_boxlist = box_list_ops.change_coordinate_frame(overlapping_boxlist,
                                                       im_box_rank1)
    new_boxes = new_boxlist.get()
-    new_boxes = tf.clip_by_value(
-        new_boxes, clip_value_min=0.0, clip_value_max=1.0)
+    if clip_boxes:
+      new_boxes = tf.clip_by_value(
+          new_boxes, clip_value_min=0.0, clip_value_max=1.0)

    result = [new_image, new_boxes, new_labels]

@@ -1262,8 +1264,9 @@ def _strict_random_crop_image(image,
          keypoints_of_boxes_inside_window, keep_ids)
      new_keypoints = keypoint_ops.change_coordinate_frame(
          keypoints_of_boxes_completely_inside_window, im_box_rank1)
-      new_keypoints = keypoint_ops.prune_outside_window(new_keypoints,
-                                                        [0.0, 0.0, 1.0, 1.0])
+      if clip_boxes:
+        new_keypoints = keypoint_ops.prune_outside_window(new_keypoints,
+                                                          [0.0, 0.0, 1.0, 1.0])
      result.append(new_keypoints)

    return tuple(result)
@@ -1280,6 +1283,7 @@ def random_crop_image(image,
                      aspect_ratio_range=(0.75, 1.33),
                      area_range=(0.1, 1.0),
                      overlap_thresh=0.3,
+                      clip_boxes=True,
                      random_coef=0.0,
                      seed=None,
                      preprocess_vars_cache=None):
@@ -1294,9 +1298,8 @@ def random_crop_image(image,
  form (e.g., lie in the unit square [0, 1]).
  This function will return the original image with probability random_coef.

-  Note: boxes will be clipped to the crop. Keypoint coordinates that are
-  outside the crop will be set to NaN, which is consistent with the original
-  keypoint encoding for non-existing keypoints.
+  Note: Keypoint coordinates that are outside the crop will be set to NaN, which
+  is consistent with the original keypoint encoding for non-existing keypoints.

  Args:
    image: rank 3 float32 tensor contains 1 image -> [height, width, channels]
@@ -1325,6 +1328,7 @@ def random_crop_image(image,
                original image.
    overlap_thresh: minimum overlap thresh with new cropped
                    image to keep the box.
+    clip_boxes: whether to clip the boxes to the cropped image.
    random_coef: a random coefficient that defines the chance of getting the
                 original image. If random_coef is 0, we will always get the
                 cropped image, and if it is 1.0, we will always get the
@@ -1365,6 +1369,7 @@ def random_crop_image(image,
        aspect_ratio_range=aspect_ratio_range,
        area_range=area_range,
        overlap_thresh=overlap_thresh,
+        clip_boxes=clip_boxes,
        preprocess_vars_cache=preprocess_vars_cache)

  # avoids tf.cond to make faster RCNN training on borg. See b/140057645.
@@ -1515,12 +1520,13 @@ def random_pad_image(image,
 def random_crop_pad_image(image,
                          boxes,
                          labels,
-                          label_scores=None,
+                          label_scores,
                          multiclass_scores=None,
                          min_object_covered=1.0,
                          aspect_ratio_range=(0.75, 1.33),
                          area_range=(0.1, 1.0),
                          overlap_thresh=0.3,
+                          clip_boxes=True,
                          random_coef=0.0,
                          min_padded_size_ratio=(1.0, 1.0),
                          max_padded_size_ratio=(2.0, 2.0),
@@ -1558,6 +1564,7 @@ def random_crop_pad_image(image,
                original image.
    overlap_thresh: minimum overlap thresh with new cropped
                    image to keep the box.
+    clip_boxes: whether to clip the boxes to the cropped image.
    random_coef: a random coefficient that defines the chance of getting the
                 original image. If random_coef is 0, we will always get the
                 cropped image, and if it is 1.0, we will always get the
@@ -1599,6 +1606,7 @@ def random_crop_pad_image(image,
      aspect_ratio_range=aspect_ratio_range,
      area_range=area_range,
      overlap_thresh=overlap_thresh,
+      clip_boxes=clip_boxes,
      random_coef=random_coef,
      seed=seed,
      preprocess_vars_cache=preprocess_vars_cache)
@@ -1639,12 +1647,13 @@ def random_crop_pad_image(image,
 def random_crop_to_aspect_ratio(image,
                                boxes,
                                labels,
-                                label_scores=None,
+                                label_scores,
                                multiclass_scores=None,
                                masks=None,
                                keypoints=None,
                                aspect_ratio=1.0,
                                overlap_thresh=0.3,
+                                clip_boxes=True,
                                seed=None,
                                preprocess_vars_cache=None):
  """Randomly crops an image to the specified aspect ratio.
@@ -1680,6 +1689,7 @@ def random_crop_to_aspect_ratio(image,
    aspect_ratio: the aspect ratio of cropped image.
    overlap_thresh: minimum overlap thresh with new cropped
                    image to keep the box.
+    clip_boxes: whether to clip the boxes to the cropped image.
    seed: random seed.
    preprocess_vars_cache: PreprocessorCache object that records previously
                           performed augmentations. Updated in-place. If this
@@ -1767,9 +1777,9 @@ def random_crop_to_aspect_ratio(image,
    new_labels = overlapping_boxlist.get_field('labels')
    new_boxlist = box_list_ops.change_coordinate_frame(overlapping_boxlist,
                                                       im_box)
-    new_boxlist = box_list_ops.clip_to_window(new_boxlist,
-                                              tf.constant([0.0, 0.0, 1.0, 1.0],
-                                                          tf.float32))
+    if clip_boxes:
+      new_boxlist = box_list_ops.clip_to_window(
+          new_boxlist, tf.constant([0.0, 0.0, 1.0, 1.0], tf.float32))
    new_boxes = new_boxlist.get()

    result = [new_image, new_boxes, new_labels]
@@ -1793,8 +1803,9 @@ def random_crop_to_aspect_ratio(image,
      keypoints_inside_window = tf.gather(keypoints, keep_ids)
      new_keypoints = keypoint_ops.change_coordinate_frame(
          keypoints_inside_window, im_box)
-      new_keypoints = keypoint_ops.prune_outside_window(new_keypoints,
-                                                        [0.0, 0.0, 1.0, 1.0])
+      if clip_boxes:
+        new_keypoints = keypoint_ops.prune_outside_window(new_keypoints,
+                                                          [0.0, 0.0, 1.0, 1.0])
      result.append(new_keypoints)

    return tuple(result)
@@ -2432,7 +2443,7 @@ def rgb_to_gray(image):
 def ssd_random_crop(image,
                    boxes,
                    labels,
-                    label_scores=None,
+                    label_scores,
                    multiclass_scores=None,
                    masks=None,
                    keypoints=None,
@@ -2440,6 +2451,7 @@ def ssd_random_crop(image,
                    aspect_ratio_range=((0.5, 2.0),) * 7,
                    area_range=((0.1, 1.0),) * 7,
                    overlap_thresh=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0),
+                    clip_boxes=(True,) * 7,
                    random_coef=(0.15,) * 7,
                    seed=None,
                    preprocess_vars_cache=None):
@@ -2474,6 +2486,7 @@ def ssd_random_crop(image,
                original image.
    overlap_thresh: minimum overlap thresh with new cropped
                    image to keep the box.
+    clip_boxes: whether to clip the boxes to the cropped image.
    random_coef: a random coefficient that defines the chance of getting the
                 original image. If random_coef is 0, we will always get the
                 cropped image, and if it is 1.0, we will always get the
@@ -2543,6 +2556,7 @@ def ssd_random_crop(image,
        aspect_ratio_range=aspect_ratio_range[index],
        area_range=area_range[index],
        overlap_thresh=overlap_thresh[index],
+        clip_boxes=clip_boxes[index],
        random_coef=random_coef[index],
        seed=seed,
        preprocess_vars_cache=preprocess_vars_cache)
@@ -2561,12 +2575,13 @@ def ssd_random_crop(image,
 def ssd_random_crop_pad(image,
                        boxes,
                        labels,
-                        label_scores=None,
+                        label_scores,
                        multiclass_scores=None,
                        min_object_covered=(0.1, 0.3, 0.5, 0.7, 0.9, 1.0),
                        aspect_ratio_range=((0.5, 2.0),) * 6,
                        area_range=((0.1, 1.0),) * 6,
                        overlap_thresh=(0.1, 0.3, 0.5, 0.7, 0.9, 1.0),
+                        clip_boxes=(True,) * 6,
                        random_coef=(0.15,) * 6,
                        min_padded_size_ratio=((1.0, 1.0),) * 6,
                        max_padded_size_ratio=((2.0, 2.0),) * 6,
@@ -2599,6 +2614,7 @@ def ssd_random_crop_pad(image,
                original image.
    overlap_thresh: minimum overlap thresh with new cropped
                    image to keep the box.
+    clip_boxes: whether to clip the boxes to the cropped image.
    random_coef: a random coefficient that defines the chance of getting the
                 original image. If random_coef is 0, we will always get the
                 cropped image, and if it is 1.0, we will always get the
@@ -2646,6 +2662,7 @@ def ssd_random_crop_pad(image,
        aspect_ratio_range=aspect_ratio_range[index],
        area_range=area_range[index],
        overlap_thresh=overlap_thresh[index],
+        clip_boxes=clip_boxes[index],
        random_coef=random_coef[index],
        min_padded_size_ratio=min_padded_size_ratio[index],
        max_padded_size_ratio=max_padded_size_ratio[index],
@@ -2666,7 +2683,7 @@ def ssd_random_crop_fixed_aspect_ratio(
    image,
    boxes,
    labels,
-    label_scores=None,
+    label_scores,
    multiclass_scores=None,
    masks=None,
    keypoints=None,
@@ -2674,6 +2691,7 @@ def ssd_random_crop_fixed_aspect_ratio(
    aspect_ratio=1.0,
    area_range=((0.1, 1.0),) * 7,
    overlap_thresh=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0),
+    clip_boxes=(True,) * 7,
    random_coef=(0.15,) * 7,
    seed=None,
    preprocess_vars_cache=None):
@@ -2711,6 +2729,7 @@ def ssd_random_crop_fixed_aspect_ratio(
                original image.
    overlap_thresh: minimum overlap thresh with new cropped
                    image to keep the box.
+    clip_boxes: whether to clip the boxes to the cropped image.
    random_coef: a random coefficient that defines the chance of getting the
                 original image. If random_coef is 0, we will always get the
                 cropped image, and if it is 1.0, we will always get the
@@ -2751,6 +2770,7 @@ def ssd_random_crop_fixed_aspect_ratio(
      aspect_ratio_range=aspect_ratio_range,
      area_range=area_range,
      overlap_thresh=overlap_thresh,
+      clip_boxes=clip_boxes,
      random_coef=random_coef,
      seed=seed,
      preprocess_vars_cache=preprocess_vars_cache)
@@ -2781,6 +2801,7 @@ def ssd_random_crop_fixed_aspect_ratio(
      masks=new_masks,
      keypoints=new_keypoints,
      aspect_ratio=aspect_ratio,
+      clip_boxes=clip_boxes,
      seed=seed,
      preprocess_vars_cache=preprocess_vars_cache)

@@ -2791,7 +2812,7 @@ def ssd_random_crop_pad_fixed_aspect_ratio(
    image,
    boxes,
    labels,
-    label_scores=None,
+    label_scores,
    multiclass_scores=None,
    masks=None,
    keypoints=None,
@@ -2800,6 +2821,7 @@ def ssd_random_crop_pad_fixed_aspect_ratio(
    aspect_ratio_range=((0.5, 2.0),) * 7,
    area_range=((0.1, 1.0),) * 7,
    overlap_thresh=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0),
+    clip_boxes=(True,) * 7,
    random_coef=(0.15,) * 7,
    min_padded_size_ratio=(1.0, 1.0),
    max_padded_size_ratio=(2.0, 2.0),
@@ -2841,6 +2863,7 @@ def ssd_random_crop_pad_fixed_aspect_ratio(
                original image.
    overlap_thresh: minimum overlap thresh with new cropped
                    image to keep the box.
+    clip_boxes: whether to clip the boxes to the cropped image.
    random_coef: a random coefficient that defines the chance of getting the
                 original image. If random_coef is 0, we will always get the
                 cropped image, and if it is 1.0, we will always get the
@@ -2882,6 +2905,7 @@ def ssd_random_crop_pad_fixed_aspect_ratio(
      aspect_ratio_range=aspect_ratio_range,
      area_range=area_range,
      overlap_thresh=overlap_thresh,
+      clip_boxes=clip_boxes,
      random_coef=random_coef,
      seed=seed,
      preprocess_vars_cache=preprocess_vars_cache)
@@ -2950,7 +2974,7 @@ def convert_class_logits_to_softmax(multiclass_scores, temperature=1.0):
  return multiclass_scores


-def get_default_func_arg_map(include_label_scores=False,
+def get_default_func_arg_map(include_label_scores=True,
                             include_multiclass_scores=False,
                             include_instance_masks=False,
                             include_keypoints=False):
@@ -2972,7 +2996,7 @@ def get_default_func_arg_map(include_label_scores=False,
  groundtruth_label_scores = None
  if include_label_scores:
    groundtruth_label_scores = (
-        fields.InputDataFields.groundtruth_confidences)
+        fields.InputDataFields.groundtruth_weights)

  multiclass_scores = None
  if include_multiclass_scores:

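
The new `clip_boxes` flag makes the final clipping step optional in the crop functions. Boxes are first re-expressed relative to the crop window, as with `box_list_ops.change_coordinate_frame`, and then clamped to `[0, 1]` only when `clip_boxes=True`, which is the default. The NumPy sketch below shows both paths. `to_crop_frame` is a hypothetical helper, and the window is assumed to be normalized `[ymin, xmin, ymax, xmax]` in original-image coordinates.

```python
import numpy as np

def to_crop_frame(boxes, window, clip_boxes=True):
    """Hypothetical helper: map boxes into a crop window, optionally clipping.

    boxes: [N, 4] normalized [ymin, xmin, ymax, xmax] in the original image.
    window: normalized [ymin, xmin, ymax, xmax] of the crop.
    """
    ymin, xmin, ymax, xmax = window
    height, width = ymax - ymin, xmax - xmin
    offset = np.array([ymin, xmin, ymin, xmin], dtype=np.float32)
    scale = np.array([height, width, height, width], dtype=np.float32)
    # Translate to the window origin and rescale by the window size.
    new_boxes = (np.asarray(boxes, dtype=np.float32) - offset) / scale
    if clip_boxes:
        # Same effect as tf.clip_by_value(new_boxes, 0.0, 1.0).
        new_boxes = np.clip(new_boxes, 0.0, 1.0)
    return new_boxes

boxes = [[0.0, 0.25, 0.75, 1.0]]
window = [0.0, 0.0, 0.5, 0.5]
print(to_crop_frame(boxes, window))                    # clipped to the crop
print(to_crop_frame(boxes, window, clip_boxes=False))  # may extend past [0, 1]
```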
--- a/research/object_detection/core/preprocessor_test.py
+++ b/research/object_detection/core/preprocessor_test.py
@@ -70,12 +70,9 @@ class PreprocessorTest(tf.test.TestCase):
        [[0.0, 0.25, 0.75, 1.0], [0.25, 0.5, 0.75, 1.0]], dtype=tf.float32)
    return boxes

-  def createTestLabelScores(self):
+  def createTestGroundtruthWeights(self):
    return tf.constant([1.0, 0.5], dtype=tf.float32)

-  def createTestLabelScoresWithMissingScore(self):
-    return tf.constant([0.5, np.nan], dtype=tf.float32)
-
  def createTestMasks(self):
    mask = np.array([
        [[255.0, 0.0, 0.0],
@@ -332,15 +329,15 @@ class PreprocessorTest(tf.test.TestCase):
  def testRetainBoxesAboveThreshold(self):
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
-    label_scores = self.createTestLabelScores()
+    weights = self.createTestGroundtruthWeights()
    (retained_boxes, retained_labels,
-     retained_label_scores) = preprocessor.retain_boxes_above_threshold(
-         boxes, labels, label_scores, threshold=0.6)
+     retained_weights) = preprocessor.retain_boxes_above_threshold(
+         boxes, labels, weights, threshold=0.6)
    with self.test_session() as sess:
-      (retained_boxes_, retained_labels_, retained_label_scores_,
+      (retained_boxes_, retained_labels_, retained_weights_,
       expected_retained_boxes_, expected_retained_labels_,
-       expected_retained_label_scores_) = sess.run([
-           retained_boxes, retained_labels, retained_label_scores,
+       expected_retained_weights_) = sess.run([
+           retained_boxes, retained_labels, retained_weights,
           self.expectedBoxesAfterThresholding(),
           self.expectedLabelsAfterThresholding(),
           self.expectedLabelScoresAfterThresholding()])
@@ -349,18 +346,18 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertAllClose(
          retained_labels_, expected_retained_labels_)
      self.assertAllClose(
-          retained_label_scores_, expected_retained_label_scores_)
+          retained_weights_, expected_retained_weights_)

  def testRetainBoxesAboveThresholdWithMultiClassScores(self):
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
-    label_scores = self.createTestLabelScores()
+    weights = self.createTestGroundtruthWeights()
    multiclass_scores = self.createTestMultiClassScores()
    (_, _, _,
     retained_multiclass_scores) = preprocessor.retain_boxes_above_threshold(
         boxes,
         labels,
-         label_scores,
+         weights,
         multiclass_scores=multiclass_scores,
         threshold=0.6)
    with self.test_session() as sess:
@@ -376,10 +373,10 @@ class PreprocessorTest(tf.test.TestCase):
  def testRetainBoxesAboveThresholdWithMasks(self):
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
-    label_scores = self.createTestLabelScores()
+    weights = self.createTestGroundtruthWeights()
    masks = self.createTestMasks()
    _, _, _, retained_masks = preprocessor.retain_boxes_above_threshold(
-        boxes, labels, label_scores, masks, threshold=0.6)
+        boxes, labels, weights, masks, threshold=0.6)
    with self.test_session() as sess:
      retained_masks_, expected_retained_masks_ = sess.run([
          retained_masks, self.expectedMasksAfterThresholding()])
@@ -390,10 +387,10 @@ class PreprocessorTest(tf.test.TestCase):
  def testRetainBoxesAboveThresholdWithKeypoints(self):
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
-    label_scores = self.createTestLabelScores()
+    weights = self.createTestGroundtruthWeights()
    keypoints = self.createTestKeypoints()
    (_, _, _, retained_keypoints) = preprocessor.retain_boxes_above_threshold(
-        boxes, labels, label_scores, keypoints=keypoints, threshold=0.6)
+        boxes, labels, weights, keypoints=keypoints, threshold=0.6)
    with self.test_session() as sess:
      (retained_keypoints_,
       expected_retained_keypoints_) = sess.run([
@@ -403,28 +400,6 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertAllClose(
          retained_keypoints_, expected_retained_keypoints_)

-  def testRetainBoxesAboveThresholdWithMissingScore(self):
-    boxes = self.createTestBoxes()
-    labels = self.createTestLabels()
-    label_scores = self.createTestLabelScoresWithMissingScore()
-    (retained_boxes, retained_labels,
-     retained_label_scores) = preprocessor.retain_boxes_above_threshold(
-         boxes, labels, label_scores, threshold=0.6)
-    with self.test_session() as sess:
-      (retained_boxes_, retained_labels_, retained_label_scores_,
-       expected_retained_boxes_, expected_retained_labels_,
-       expected_retained_label_scores_) = sess.run([
-           retained_boxes, retained_labels, retained_label_scores,
-           self.expectedBoxesAfterThresholdingWithMissingScore(),
-           self.expectedLabelsAfterThresholdingWithMissingScore(),
-           self.expectedLabelScoresAfterThresholdingWithMissingScore()])
-      self.assertAllClose(
-          retained_boxes_, expected_retained_boxes_)
-      self.assertAllClose(
-          retained_labels_, expected_retained_labels_)
-      self.assertAllClose(
-          retained_label_scores_, expected_retained_label_scores_)
-
  def testFlipBoxesLeftRight(self):
    boxes = self.createTestBoxes()
    flipped_boxes = preprocessor._flip_boxes_left_right(boxes)
@@ -482,6 +457,7 @@ class PreprocessorTest(tf.test.TestCase):
    cache = preprocessor_cache.PreprocessorCache()
    images = self.createTestImages()
    boxes = self.createTestBoxes()
+    weights = self.createTestGroundtruthWeights()
    classes = self.createTestLabels()
    masks = self.createTestMasks()
    keypoints = self.createTestKeypoints()
@@ -491,6 +467,7 @@ class PreprocessorTest(tf.test.TestCase):
    for i in range(num_runs):
      tensor_dict = {
          fields.InputDataFields.image: images,
+          fields.InputDataFields.groundtruth_weights: weights
      }
      num_outputs = 1
      if test_boxes:
@@ -1075,10 +1052,12 @@ class PreprocessorTest(tf.test.TestCase):
    images = self.createTestImages()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    tensor_dict = {
        fields.InputDataFields.image: images,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
    }
    distorted_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
@@ -1126,10 +1105,12 @@ class PreprocessorTest(tf.test.TestCase):
    images = self.createTestImages()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    tensor_dict = {
        fields.InputDataFields.image: images,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
    }
    distorted_tensor_dict = preprocessor.preprocess(
        tensor_dict, preprocessing_options)
@@ -1163,10 +1144,12 @@ class PreprocessorTest(tf.test.TestCase):
    images = self.createTestImages()
    boxes = self.createTestBoxesOutOfImage()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    tensor_dict = {
        fields.InputDataFields.image: images,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
        }
    distorted_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
@@ -1197,12 +1180,12 @@ class PreprocessorTest(tf.test.TestCase):
    images = self.createTestImages()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
-    label_scores = self.createTestLabelScores()
+    weights = self.createTestGroundtruthWeights()
    tensor_dict = {
        fields.InputDataFields.image: images,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
-        fields.InputDataFields.groundtruth_label_scores: label_scores
+        fields.InputDataFields.groundtruth_weights: weights
    }
    tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options)
    images = tensor_dict[fields.InputDataFields.image]
@@ -1218,8 +1201,8 @@ class PreprocessorTest(tf.test.TestCase):
        fields.InputDataFields.groundtruth_boxes]
    distorted_labels = distorted_tensor_dict[
        fields.InputDataFields.groundtruth_classes]
-    distorted_label_scores = distorted_tensor_dict[
-        fields.InputDataFields.groundtruth_label_scores]
+    distorted_weights = distorted_tensor_dict[
+        fields.InputDataFields.groundtruth_weights]
    boxes_shape = tf.shape(boxes)
    distorted_boxes_shape = tf.shape(distorted_boxes)
    images_shape = tf.shape(images)
@@ -1229,17 +1212,17 @@ class PreprocessorTest(tf.test.TestCase):
      (boxes_shape_, distorted_boxes_shape_, images_shape_,
       distorted_images_shape_, images_, distorted_images_,
       boxes_, distorted_boxes_, labels_, distorted_labels_,
-       label_scores_, distorted_label_scores_) = sess.run(
+       weights_, distorted_weights_) = sess.run(
           [boxes_shape, distorted_boxes_shape, images_shape,
            distorted_images_shape, images, distorted_images,
            boxes, distorted_boxes, labels, distorted_labels,
-            label_scores, distorted_label_scores])
+            weights, distorted_weights])
      self.assertAllEqual(boxes_shape_, distorted_boxes_shape_)
      self.assertAllEqual(images_shape_, distorted_images_shape_)
      self.assertAllClose(images_, distorted_images_)
      self.assertAllClose(boxes_, distorted_boxes_)
      self.assertAllEqual(labels_, distorted_labels_)
-      self.assertAllEqual(label_scores_, distorted_label_scores_)
+      self.assertAllEqual(weights_, distorted_weights_)

  def testRandomCropWithMockSampleDistortedBoundingBox(self):
    preprocessing_options = [(preprocessor.normalize_image, {
@@ -1254,11 +1237,13 @@ class PreprocessorTest(tf.test.TestCase):
                         [0.2, 0.4, 0.75, 0.75],
                         [0.3, 0.1, 0.4, 0.7]], dtype=tf.float32)
    labels = tf.constant([1, 7, 11], dtype=tf.int32)
+    weights = tf.constant([1.0, 0.5, 0.6], dtype=tf.float32)

    tensor_dict = {
        fields.InputDataFields.image: images,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
    }
    tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options)
    images = tensor_dict[fields.InputDataFields.image]
@@ -1279,18 +1264,98 @@ class PreprocessorTest(tf.test.TestCase):
          fields.InputDataFields.groundtruth_boxes]
      distorted_labels = distorted_tensor_dict[
          fields.InputDataFields.groundtruth_classes]
+      distorted_weights = distorted_tensor_dict[
+          fields.InputDataFields.groundtruth_weights]
      expected_boxes = tf.constant([[0.178947, 0.07173, 0.75789469, 0.66244733],
                                    [0.28421, 0.0, 0.38947365, 0.57805908]],
                                   dtype=tf.float32)
      expected_labels = tf.constant([7, 11], dtype=tf.int32)
+      expected_weights = tf.constant([0.5, 0.6], dtype=tf.float32)

      with self.test_session() as sess:
-        (distorted_boxes_, distorted_labels_,
-         expected_boxes_, expected_labels_) = sess.run(
-             [distorted_boxes, distorted_labels,
-              expected_boxes, expected_labels])
+        (distorted_boxes_, distorted_labels_, distorted_weights_,
+         expected_boxes_, expected_labels_, expected_weights_) = sess.run(
+             [distorted_boxes, distorted_labels, distorted_weights,
+              expected_boxes, expected_labels, expected_weights])
        self.assertAllClose(distorted_boxes_, expected_boxes_)
        self.assertAllEqual(distorted_labels_, expected_labels_)
+        self.assertAllEqual(distorted_weights_, expected_weights_)
+
+  def testRandomCropWithoutClipBoxes(self):
+    preprocessing_options = [(preprocessor.normalize_image, {
+        'original_minval': 0,
+        'original_maxval': 255,
+        'target_minval': 0,
+        'target_maxval': 1
+    })]
+
+    images = self.createColorfulTestImage()
+    boxes = tf.constant([[0.1, 0.1, 0.8, 0.3],
+                         [0.2, 0.4, 0.75, 0.75],
+                         [0.3, 0.1, 0.4, 0.7]], dtype=tf.float32)
+    keypoints = tf.constant([
+        [[0.1, 0.1], [0.8, 0.3]],
+        [[0.2, 0.4], [0.75, 0.75]],
+        [[0.3, 0.1], [0.4, 0.7]],
+    ], dtype=tf.float32)
+    labels = tf.constant([1, 7, 11], dtype=tf.int32)
+    weights = tf.constant([1.0, 0.5, 0.6], dtype=tf.float32)
+
+    tensor_dict = {
+        fields.InputDataFields.image: images,
+        fields.InputDataFields.groundtruth_boxes: boxes,
+        fields.InputDataFields.groundtruth_keypoints: keypoints,
+        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
+    }
+    tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options)
+
+    preprocessing_options = [(preprocessor.random_crop_image, {
+        'clip_boxes': False,
+    })]
+    with mock.patch.object(
+        tf.image,
+        'sample_distorted_bounding_box') as mock_sample_distorted_bounding_box:
+      mock_sample_distorted_bounding_box.return_value = (
+          tf.constant([6, 143, 0], dtype=tf.int32),
+          tf.constant([190, 237, -1], dtype=tf.int32),
+          tf.constant([[[0.03, 0.3575, 0.98, 0.95]]], dtype=tf.float32))
+
+      preprocessor_arg_map = preprocessor.get_default_func_arg_map(
+          include_keypoints=True)
+      distorted_tensor_dict = preprocessor.preprocess(
+          tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map)
+
+      distorted_boxes = distorted_tensor_dict[
+          fields.InputDataFields.groundtruth_boxes]
+      distorted_keypoints = distorted_tensor_dict[
+          fields.InputDataFields.groundtruth_keypoints]
+      distorted_labels = distorted_tensor_dict[
+          fields.InputDataFields.groundtruth_classes]
+      distorted_weights = distorted_tensor_dict[
+          fields.InputDataFields.groundtruth_weights]
+      expected_boxes = tf.constant(
+          [[0.178947, 0.07173, 0.75789469, 0.66244733],
+           [0.28421, -0.434599, 0.38947365, 0.57805908]],
+          dtype=tf.float32)
+      expected_keypoints = tf.constant(
+          [[[0.178947, 0.07173], [0.75789469, 0.66244733]],
+           [[0.28421, -0.434599], [0.38947365, 0.57805908]]],
+          dtype=tf.float32)
+      expected_labels = tf.constant([7, 11], dtype=tf.int32)
+      expected_weights = tf.constant([0.5, 0.6], dtype=tf.float32)
+
+      with self.test_session() as sess:
+        (distorted_boxes_, distorted_keypoints_, distorted_labels_,
+         distorted_weights_, expected_boxes_, expected_keypoints_,
+         expected_labels_, expected_weights_) = sess.run(
+             [distorted_boxes, distorted_keypoints, distorted_labels,
+              distorted_weights, expected_boxes, expected_keypoints,
+              expected_labels, expected_weights])
+        self.assertAllClose(distorted_boxes_, expected_boxes_)
+        self.assertAllClose(distorted_keypoints_, expected_keypoints_)
+        self.assertAllEqual(distorted_labels_, expected_labels_)
+        self.assertAllEqual(distorted_weights_, expected_weights_)

  def testRandomCropImageWithMultiClassScores(self):
    preprocessing_options = []
@@ -1304,12 +1369,14 @@ class PreprocessorTest(tf.test.TestCase):
    images = self.createTestImages()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    multiclass_scores = self.createTestMultiClassScores()

    tensor_dict = {
        fields.InputDataFields.image: images,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
        fields.InputDataFields.multiclass_scores: multiclass_scores
    }
    distorted_tensor_dict = preprocessor.preprocess(tensor_dict,
@@ -1342,11 +1409,11 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertAllEqual(distorted_boxes_.shape[0],
                          distorted_multiclass_scores_.shape[0])

-  def testStrictRandomCropImageWithLabelScores(self):
+  def testStrictRandomCropImageWithGroundtruthWeights(self):
    image = self.createColorfulTestImage()[0]
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
-    label_scores = self.createTestLabelScores()
+    weights = self.createTestGroundtruthWeights()
    with mock.patch.object(
        tf.image,
        'sample_distorted_bounding_box'
@@ -1355,20 +1422,20 @@ class PreprocessorTest(tf.test.TestCase):
          tf.constant([6, 143, 0], dtype=tf.int32),
          tf.constant([190, 237, -1], dtype=tf.int32),
          tf.constant([[[0.03, 0.3575, 0.98, 0.95]]], dtype=tf.float32))
-      new_image, new_boxes, new_labels, new_label_scores = (
+      new_image, new_boxes, new_labels, new_groundtruth_weights = (
          preprocessor._strict_random_crop_image(
-              image, boxes, labels, label_scores))
+              image, boxes, labels, weights))
      with self.test_session() as sess:
-        new_image, new_boxes, new_labels, new_label_scores = (
+        new_image, new_boxes, new_labels, new_groundtruth_weights = (
            sess.run(
-                [new_image, new_boxes, new_labels, new_label_scores])
+                [new_image, new_boxes, new_labels, new_groundtruth_weights])
        )

        expected_boxes = np.array(
            [[0.0, 0.0, 0.75789469, 1.0],
             [0.23157893, 0.24050637, 0.75789469, 1.0]], dtype=np.float32)
        self.assertAllEqual(new_image.shape, [190, 237, 3])
-        self.assertAllEqual(new_label_scores, [1.0, 0.5])
+        self.assertAllEqual(new_groundtruth_weights, [1.0, 0.5])
        self.assertAllClose(
            new_boxes.flatten(), expected_boxes.flatten())

@@ -1376,6 +1443,7 @@ class PreprocessorTest(tf.test.TestCase):
    image = self.createColorfulTestImage()[0]
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    masks = tf.random_uniform([2, 200, 400], dtype=tf.float32)
    with mock.patch.object(
        tf.image,
@@ -1385,12 +1453,12 @@ class PreprocessorTest(tf.test.TestCase):
          tf.constant([6, 143, 0], dtype=tf.int32),
          tf.constant([190, 237, -1], dtype=tf.int32),
          tf.constant([[[0.03, 0.3575, 0.98, 0.95]]], dtype=tf.float32))
-      new_image, new_boxes, new_labels, new_masks = (
+      new_image, new_boxes, new_labels, new_weights, new_masks = (
          preprocessor._strict_random_crop_image(
-              image, boxes, labels, masks=masks))
+              image, boxes, labels, weights, masks=masks))
      with self.test_session() as sess:
-        new_image, new_boxes, new_labels, new_masks = sess.run(
-            [new_image, new_boxes, new_labels, new_masks])
+        new_image, new_boxes, new_labels, new_weights, new_masks = sess.run(
+            [new_image, new_boxes, new_labels, new_weights, new_masks])
        expected_boxes = np.array(
            [[0.0, 0.0, 0.75789469, 1.0],
             [0.23157893, 0.24050637, 0.75789469, 1.0]], dtype=np.float32)
@@ -1403,6 +1471,7 @@ class PreprocessorTest(tf.test.TestCase):
    image = self.createColorfulTestImage()[0]
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    keypoints = self.createTestKeypoints()
    with mock.patch.object(
        tf.image,
@@ -1412,12 +1481,12 @@ class PreprocessorTest(tf.test.TestCase):
          tf.constant([6, 143, 0], dtype=tf.int32),
          tf.constant([190, 237, -1], dtype=tf.int32),
          tf.constant([[[0.03, 0.3575, 0.98, 0.95]]], dtype=tf.float32))
-      new_image, new_boxes, new_labels, new_keypoints = (
+      new_image, new_boxes, new_labels, new_weights, new_keypoints = (
          preprocessor._strict_random_crop_image(
-              image, boxes, labels, keypoints=keypoints))
+              image, boxes, labels, weights, keypoints=keypoints))
      with self.test_session() as sess:
-        new_image, new_boxes, new_labels, new_keypoints = sess.run(
-            [new_image, new_boxes, new_labels, new_keypoints])
+        new_image, new_boxes, new_labels, new_weights, new_keypoints = sess.run(
+            [new_image, new_boxes, new_labels, new_weights, new_keypoints])

        expected_boxes = np.array([
            [0.0, 0.0, 0.75789469, 1.0],
@@ -1440,12 +1509,14 @@ class PreprocessorTest(tf.test.TestCase):
    image = self.createColorfulTestImage()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    masks = tf.random_uniform([2, 200, 400], dtype=tf.float32)

    tensor_dict = {
        fields.InputDataFields.image: image,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
        fields.InputDataFields.groundtruth_instance_masks: masks,
    }

@@ -1491,13 +1562,15 @@ class PreprocessorTest(tf.test.TestCase):
    image = self.createColorfulTestImage()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    keypoints = self.createTestKeypointsInsideCrop()

    tensor_dict = {
        fields.InputDataFields.image: image,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
-        fields.InputDataFields.groundtruth_keypoints: keypoints
+        fields.InputDataFields.groundtruth_keypoints: keypoints,
+        fields.InputDataFields.groundtruth_weights: weights
    }

    preprocessor_arg_map = preprocessor.get_default_func_arg_map(
@@ -1551,12 +1624,14 @@ class PreprocessorTest(tf.test.TestCase):
    image = self.createColorfulTestImage()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    keypoints = self.createTestKeypointsOutsideCrop()

    tensor_dict = {
        fields.InputDataFields.image: image,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
        fields.InputDataFields.groundtruth_keypoints: keypoints
    }

@@ -1610,33 +1685,32 @@ class PreprocessorTest(tf.test.TestCase):
  def testRunRetainBoxesAboveThreshold(self):
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
-    label_scores = self.createTestLabelScores()
+    weights = self.createTestGroundtruthWeights()

    tensor_dict = {
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
-        fields.InputDataFields.groundtruth_confidences: label_scores
+        fields.InputDataFields.groundtruth_weights: weights,
    }

    preprocessing_options = [
        (preprocessor.retain_boxes_above_threshold, {'threshold': 0.6})
    ]
-    preprocessor_arg_map = preprocessor.get_default_func_arg_map(
-        include_label_scores=True)
+    preprocessor_arg_map = preprocessor.get_default_func_arg_map()
    retained_tensor_dict = preprocessor.preprocess(
        tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map)
    retained_boxes = retained_tensor_dict[
        fields.InputDataFields.groundtruth_boxes]
    retained_labels = retained_tensor_dict[
        fields.InputDataFields.groundtruth_classes]
-    retained_label_scores = retained_tensor_dict[
-        fields.InputDataFields.groundtruth_confidences]
+    retained_weights = retained_tensor_dict[
+        fields.InputDataFields.groundtruth_weights]

    with self.test_session() as sess:
      (retained_boxes_, retained_labels_,
-       retained_label_scores_, expected_retained_boxes_,
-       expected_retained_labels_, expected_retained_label_scores_) = sess.run(
-           [retained_boxes, retained_labels, retained_label_scores,
+       retained_weights_, expected_retained_boxes_,
+       expected_retained_labels_, expected_retained_weights_) = sess.run(
+           [retained_boxes, retained_labels, retained_weights,
            self.expectedBoxesAfterThresholding(),
            self.expectedLabelsAfterThresholding(),
            self.expectedLabelScoresAfterThresholding()])
@@ -1644,18 +1718,18 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertAllClose(retained_boxes_, expected_retained_boxes_)
      self.assertAllClose(retained_labels_, expected_retained_labels_)
      self.assertAllClose(
-          retained_label_scores_, expected_retained_label_scores_)
+          retained_weights_, expected_retained_weights_)

  def testRunRetainBoxesAboveThresholdWithMasks(self):
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
-    label_scores = self.createTestLabelScores()
+    weights = self.createTestGroundtruthWeights()
    masks = self.createTestMasks()

    tensor_dict = {
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
-        fields.InputDataFields.groundtruth_confidences: label_scores,
+        fields.InputDataFields.groundtruth_weights: weights,
        fields.InputDataFields.groundtruth_instance_masks: masks
    }

@@ -1681,18 +1755,17 @@ class PreprocessorTest(tf.test.TestCase):
  def testRunRetainBoxesAboveThresholdWithKeypoints(self):
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
-    label_scores = self.createTestLabelScores()
+    weights = self.createTestGroundtruthWeights()
    keypoints = self.createTestKeypoints()

    tensor_dict = {
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
-        fields.InputDataFields.groundtruth_confidences: label_scores,
+        fields.InputDataFields.groundtruth_weights: weights,
        fields.InputDataFields.groundtruth_keypoints: keypoints
    }

    preprocessor_arg_map = preprocessor.get_default_func_arg_map(
-        include_label_scores=True,
        include_keypoints=True)

    preprocessing_options = [
@@ -1721,12 +1794,14 @@ class PreprocessorTest(tf.test.TestCase):
    image = self.createColorfulTestImage()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    masks = tf.random_uniform([2, 200, 400], dtype=tf.float32)

    tensor_dict = {
        fields.InputDataFields.image: image,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
        fields.InputDataFields.groundtruth_instance_masks: masks
    }

@@ -1764,12 +1839,14 @@ class PreprocessorTest(tf.test.TestCase):
    image = self.createColorfulTestImage()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    keypoints = self.createTestKeypoints()

    tensor_dict = {
        fields.InputDataFields.image: image,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
        fields.InputDataFields.groundtruth_keypoints: keypoints
    }

@@ -2016,10 +2093,12 @@ class PreprocessorTest(tf.test.TestCase):
    images = self.createTestImages()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    tensor_dict = {
        fields.InputDataFields.image: images,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
    }
    tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options)
    images = tensor_dict[fields.InputDataFields.image]
@@ -2057,10 +2136,12 @@ class PreprocessorTest(tf.test.TestCase):
    images = self.createTestImages()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    tensor_dict = {
        fields.InputDataFields.image: images,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
    }
    tensor_dict = preprocessor.preprocess(tensor_dict, [])
    images = tensor_dict[fields.InputDataFields.image]
@@ -2638,10 +2719,12 @@ class PreprocessorTest(tf.test.TestCase):
    images = self.createTestImages()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    tensor_dict = {
        fields.InputDataFields.image: images,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
    }
    distorted_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
@@ -2672,6 +2755,7 @@ class PreprocessorTest(tf.test.TestCase):
    images = self.createTestImages()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    multiclass_scores = self.createTestMultiClassScores()

    tensor_dict = {
@@ -2679,6 +2763,7 @@ class PreprocessorTest(tf.test.TestCase):
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
        fields.InputDataFields.multiclass_scores: multiclass_scores,
+        fields.InputDataFields.groundtruth_weights: weights,
    }
    preprocessor_arg_map = preprocessor.get_default_func_arg_map(
        include_multiclass_scores=True)
@@ -2717,6 +2802,7 @@ class PreprocessorTest(tf.test.TestCase):
    images = self.createTestImages()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    preprocessing_options = [
        (preprocessor.normalize_image, {
            'original_minval': 0,
@@ -2729,6 +2815,7 @@ class PreprocessorTest(tf.test.TestCase):
        fields.InputDataFields.image: images,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights,
    }
    distorted_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
@@ -2764,13 +2851,13 @@ class PreprocessorTest(tf.test.TestCase):
                                test_keypoints=False)

  def _testSSDRandomCropFixedAspectRatio(self,
-                                         include_label_scores,
                                         include_multiclass_scores,
                                         include_instance_masks,
                                         include_keypoints):
    images = self.createTestImages()
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
+    weights = self.createTestGroundtruthWeights()
    preprocessing_options = [(preprocessor.normalize_image, {
        'original_minval': 0,
        'original_maxval': 255,
@@ -2781,11 +2868,8 @@ class PreprocessorTest(tf.test.TestCase):
        fields.InputDataFields.image: images,
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
+        fields.InputDataFields.groundtruth_weights: weights
    }
-    if include_label_scores:
-      label_scores = self.createTestLabelScores()
-      tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
-          label_scores)
    if include_multiclass_scores:
      multiclass_scores = self.createTestMultiClassScores()
      tensor_dict[fields.InputDataFields.multiclass_scores] = (
@@ -2798,7 +2882,6 @@ class PreprocessorTest(tf.test.TestCase):
      tensor_dict[fields.InputDataFields.groundtruth_keypoints] = keypoints

    preprocessor_arg_map = preprocessor.get_default_func_arg_map(
-        include_label_scores=include_label_scores,
        include_multiclass_scores=include_multiclass_scores,
        include_instance_masks=include_instance_masks,
        include_keypoints=include_keypoints)
@@ -2821,26 +2904,22 @@ class PreprocessorTest(tf.test.TestCase):
      self.assertAllEqual(images_rank_, distorted_images_rank_)

  def testSSDRandomCropFixedAspectRatio(self):
-    self._testSSDRandomCropFixedAspectRatio(include_label_scores=False,
-                                            include_multiclass_scores=False,
+    self._testSSDRandomCropFixedAspectRatio(include_multiclass_scores=False,
                                            include_instance_masks=False,
                                            include_keypoints=False)

  def testSSDRandomCropFixedAspectRatioWithMultiClassScores(self):
-    self._testSSDRandomCropFixedAspectRatio(include_label_scores=False,
-                                            include_multiclass_scores=True,
+    self._testSSDRandomCropFixedAspectRatio(include_multiclass_scores=True,
                                            include_instance_masks=False,
                                            include_keypoints=False)

  def testSSDRandomCropFixedAspectRatioWithMasksAndKeypoints(self):
-    self._testSSDRandomCropFixedAspectRatio(include_label_scores=False,
-                                            include_multiclass_scores=False,
+    self._testSSDRandomCropFixedAspectRatio(include_multiclass_scores=False,
                                            include_instance_masks=True,
                                            include_keypoints=True)

  def testSSDRandomCropFixedAspectRatioWithLabelScoresMasksAndKeypoints(self):
-    self._testSSDRandomCropFixedAspectRatio(include_label_scores=True,
-                                            include_multiclass_scores=False,
+    self._testSSDRandomCropFixedAspectRatio(include_multiclass_scores=False,
                                            include_instance_masks=True,
                                            include_keypoints=True)


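The preprocessor_test.py hunks above all check one invariant: when a random crop drops a box, the matching entries in `groundtruth_classes` and `groundtruth_weights` must be dropped too, so that all three stay index-aligned. The pure-Python sketch below illustrates only that invariant. `crop_groundtruth` is a hypothetical helper and not the `preprocessor` API. It assumes normalized `(ymin, xmin, ymax, xmax)` boxes, drops only boxes lying entirely outside the window, and ignores any overlap threshold the real crop routines may apply.

```python
from typing import List, Sequence, Tuple

# Normalized box coordinates: (ymin, xmin, ymax, xmax).
Box = Tuple[float, float, float, float]


def crop_groundtruth(
    boxes: Sequence[Box],
    labels: Sequence[int],
    weights: Sequence[float],
    window: Box,
    clip_boxes: bool = True,
) -> Tuple[List[Box], List[int], List[float]]:
    """Hypothetical helper: crops per-box groundtruth to `window`.

    Boxes are re-expressed in the window's coordinate frame. A box lying
    entirely outside the window is dropped, and its label and weight are
    dropped with it, so the three outputs stay index-aligned.
    """
    wy0, wx0, wy1, wx1 = window
    height, width = wy1 - wy0, wx1 - wx0
    kept_boxes: List[Box] = []
    kept_labels: List[int] = []
    kept_weights: List[float] = []
    for box, label, weight in zip(boxes, labels, weights):
        # Change coordinate frame: the window corners map to 0.0 and 1.0.
        ymin = (box[0] - wy0) / height
        xmin = (box[1] - wx0) / width
        ymax = (box[2] - wy0) / height
        xmax = (box[3] - wx0) / width
        # No overlap with the window at all: drop box, label and weight.
        if ymax <= 0.0 or xmax <= 0.0 or ymin >= 1.0 or xmin >= 1.0:
            continue
        if clip_boxes:
            # Clamp the surviving box to the window.
            ymin, xmin = max(ymin, 0.0), max(xmin, 0.0)
            ymax, xmax = min(ymax, 1.0), min(xmax, 1.0)
        kept_boxes.append((ymin, xmin, ymax, xmax))
        kept_labels.append(label)
        kept_weights.append(weight)
    return kept_boxes, kept_labels, kept_weights


# Values taken from testRandomCropWithoutClipBoxes: a 200x400 image cropped
# at offset (6, 143) with size (190, 237), i.e. the normalized window below.
boxes = [(0.1, 0.1, 0.8, 0.3), (0.2, 0.4, 0.75, 0.75), (0.3, 0.1, 0.4, 0.7)]
window = (0.03, 0.3575, 0.98, 0.95)
print(crop_groundtruth(boxes, [1, 7, 11], [1.0, 0.5, 0.6], window,
                       clip_boxes=False))
```

With these inputs the first box falls outside the window, so the sketch keeps labels `[7, 11]` and weights `[0.5, 0.6]`. These match `expected_labels` and `expected_weights` in testRandomCropWithoutClipBoxes. With `clip_boxes=False` the third box keeps its negative `xmin` of about -0.434599, while the default clipping clamps it to 0.0.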
--- a/research/object_detection/core/standard_fields.py
+++ b/research/object_detection/core/standard_fields.py
@@ -44,7 +44,6 @@ class InputDataFields(object):
    groundtruth_image_confidences: image-level class confidences.
    groundtruth_boxes: coordinates of the ground truth boxes in the image.
    groundtruth_classes: box-level class labels.
-    groundtruth_confidences: box-level class confidences.
    groundtruth_label_types: box-level label types (e.g. explicit negative).
    groundtruth_is_crowd: [DEPRECATED, use groundtruth_group_of instead]
      is the groundtruth a single object or a crowd.