"tests/test_datasets/test_scannet_dataset.py" did not exist on "2b3998cf14f42f2308d9796887b154d199ec1831"
Unverified commit 02a9969e, authored by pkulzc and committed by GitHub

Refactor object detection box predictors and fix some issues with model_main. (#4965)

* Merged commit includes the following changes:
206852642  by Zhichao Lu:

    Build the balanced_positive_negative_sampler in the model builder for FasterRCNN. Also adds an option to use the static implementation of the sampler.

--
206803260  by Zhichao Lu:

    Fixes a misplaced argument in resnet fpn feature extractor.

--
206682736  by Zhichao Lu:

    This CL modifies the SSD meta architecture to support both Slim-based and Keras-based box predictors, and begins preparation for Keras box predictor support in the other meta architectures.

    Concretely, this CL adds a new `KerasBoxPredictor` base class and makes the meta architectures appropriately call whichever box predictors they are using.

    We can switch the non-ssd meta architectures to fully support Keras box predictors once the Keras Convolutional Box Predictor CL is submitted.

--
206669634  by Zhichao Lu:

    Adds an alternate method for the balanced positive negative sampler that uses static shapes.
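
    As an illustration of the static-shape idea (a hypothetical sketch, not
    necessarily the sampler's actual implementation): one standard trick is to
    add random noise to the candidate indicator and take a fixed-size top_k,
    so the output shape is known at graph-construction time.

        import tensorflow as tf

        def sample_fixed_size_static(indicator, num_samples):
          # Hypothetical helper: indicated entries score >= 1.0 while all
          # others score < 0.5, so top_k prefers real candidates; the result
          # always has exactly `num_samples` elements, keeping shapes static.
          scores = tf.cast(indicator, tf.float32) + tf.random_uniform(
              tf.shape(indicator), maxval=0.5)
          _, selected_indices = tf.nn.top_k(scores, k=num_samples)
          return selected_indices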

--
206643278  by Zhichao Lu:

    This CL adds a Keras layer hyperparameter configuration object to the hyperparams_builder.

    It automatically converts from Slim layer hyperparameter configs to Keras layer hyperparameters. Namely, it:
    - Builds Keras initializers/regularizers instead of Slim ones
    - sets weights_regularizer/initializer to kernel_regularizer/initializer
    - converts batchnorm decay to momentum
    - converts Slim l2 regularizer weights to the equivalent Keras l2 weights

    This will be used in the conversion of object detection feature extractors & box predictors to newer Tensorflow APIs.
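
    As a rough sketch of the conversions listed above (the helper name and
    return structure here are illustrative, not the builder's actual API; the
    factor-of-two l2 relationship follows from Slim's l2_regularizer(scale)
    computing scale * sum(w**2) / 2 while Keras' l2(l) computes l * sum(w**2)):

        import tensorflow as tf

        def slim_to_keras_hyperparams(slim_l2_weight, slim_batch_norm_decay):
          # Halve the Slim l2 weight to get the equivalent Keras l2 weight.
          kernel_regularizer = tf.keras.regularizers.l2(slim_l2_weight / 2.0)
          # Slim batch norm's `decay` is the quantity that Keras
          # BatchNormalization calls `momentum`.
          return {
              'kernel_regularizer': kernel_regularizer,
              'batch_norm_momentum': slim_batch_norm_decay,
          }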

--
206611681  by Zhichao Lu:

    Internal changes.

--
206591619  by Zhichao Lu:

    Clip to the target shape when the input tensors are larger than the expected padded static shape.

--
206517644  by Zhichao Lu:

    Make MultiscaleGridAnchorGenerator more consistent with MultipleGridAnchorGenerator.

--
206415624  by Zhichao Lu:

    Make the hardcoded feature pyramid network (FPN) levels configurable for both SSD
    Resnet and SSD Mobilenet.

--
206398204  by Zhichao Lu:

    This CL modifies the SSD meta architecture to support both Slim-based and Keras-based feature extractors.

    This allows us to begin the conversion of object detection to newer Tensorflow APIs.

--
206213448  by Zhichao Lu:

    Adding a method to compute the expected classification loss by background/foreground weighting.

--
206204232  by Zhichao Lu:

    Adding the keypoint head to the Mask RCNN pipeline.

--
206200352  by Zhichao Lu:

    - Create Faster R-CNN target assigner in the model builder. This allows configuring matchers in Target assigner to use TPU compatible ops (tf.gather in this case) without any change in meta architecture.
    - As a positive side effect of the refactoring, we can now re-use a single target assigner for all of the second stage heads in Faster R-CNN.

--
206178206  by Zhichao Lu:

    Force the ssd feature extractor builder to use keyword arguments so values won't be passed to the wrong arguments.

--
206168297  by Zhichao Lu:

    Updating exporter to use freeze_graph.freeze_graph_with_def_protos rather than a homegrown version.
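
    For reference, a minimal sketch of such a call (TF 1.x argument names;
    verify against the installed TensorFlow version; the inputs here stand in
    for whatever the exporter has already built):

        from tensorflow.python.tools import freeze_graph

        def freeze(graph_def, saver_def, checkpoint_path, output_node_names):
          # An empty output_graph means "return the frozen GraphDef rather
          # than writing it to disk".
          return freeze_graph.freeze_graph_with_def_protos(
              input_graph_def=graph_def,
              input_saver_def=saver_def,
              input_checkpoint=checkpoint_path,
              output_node_names=output_node_names,
              restore_op_name='save/restore_all',
              filename_tensor_name='save/Const:0',
              output_graph='',
              clear_devices=True,
              initializer_nodes='')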

--
206080748  by Zhichao Lu:

    Merge external contributions.

--
206074460  by Zhichao Lu:

    Update the preprocessor to optionally apply temperature scaling and a softmax to the multiclass scores on read.
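
    Conceptually (a minimal sketch; the exact op layout in preprocessor.py may
    differ), the step computes a temperature-scaled softmax:

        import tensorflow as tf

        def convert_class_logits_to_softmax(multiclass_scores, temperature=1.0):
          # The usual temperature-softmax convention: divide the logits by the
          # temperature before normalizing. Temperatures > 1 flatten the
          # resulting distribution; temperatures < 1 sharpen it.
          return tf.nn.softmax(multiclass_scores / temperature)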

--
205960802  by Zhichao Lu:

    Fixing a bug in hierarchical label expansion script.

--
205944686  by Zhichao Lu:

    Update exporter to support exporting quantized model.

--
205912529  by Zhichao Lu:

    Add a two stage matcher to allow for thresholding by one criterion and then argmaxing on the other.
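
    Conceptually (a hypothetical numpy sketch, not the actual matcher API):

        import numpy as np

        def threshold_then_argmax_match(threshold_sim, argmax_sim,
                                        iou_threshold=0.5):
          # threshold_sim, argmax_sim: [num_groundtruth, num_anchors]
          # similarity matrices. Anchors whose best similarity under the
          # thresholding criterion falls below iou_threshold stay unmatched
          # (-1); the rest are matched by argmaxing the second criterion.
          passes = np.max(threshold_sim, axis=0) >= iou_threshold
          best_match = np.argmax(argmax_sim, axis=0)
          return np.where(passes, best_match, -1)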

--
205909017  by Zhichao Lu:

    Add test for grayscale image_resizer

--
205892801  by Zhichao Lu:

    Add flag to decide whether to apply batch norm to conv layers of weight shared box predictor.

--
205824449  by Zhichao Lu:

    Make sure that, by default, the Mask R-CNN box predictor predicts 2 stages.

--
205730139  by Zhichao Lu:

    Updating warning message to be more explicit about variable size mismatch.

--
205696992  by Zhichao Lu:

    Remove utils/ops.py's dependency on core/box_list_ops.py. This will allow re-using TPU compatible ops from utils/ops.py in core/box_list_ops.py.

--
205696867  by Zhichao Lu:

    Refactoring the Mask R-CNN predictor so each head is in a separate file.
    This CL lets us add new heads to Mask R-CNN more easily in the future.

--
205492073  by Zhichao Lu:

    Refactor R-FCN box predictor to be TPU compliant.

    - Change utils/ops.py:position_sensitive_crop_regions to operate on single image and set of boxes without `box_ind`
    - Add a batch version that operates on batches of images and batches of boxes.
    - Refactor R-FCN box predictor to use the batched version of position sensitive crop regions.

--
205453567  by Zhichao Lu:

    Fix a bug where the inference graph could not be exported when the write_inference_graph flag is True.

--
205316039  by Zhichao Lu:

    Changing input tensor name.

--
205256307  by Zhichao Lu:

    Fix model zoo links for quantized model.

--
205164432  by Zhichao Lu:

    Fixes eval error when label map contains non-ascii characters.

--
205129842  by Zhichao Lu:

    Adds an option to clip the anchors to the window size without filtering the overlapping boxes in Faster R-CNN.

--
205094863  by Zhichao Lu:

    Update label map util to allow the option of adding a background class and filling in gaps in the label map. Useful for using multiclass scores, which require a complete label map with an explicit background label.

--
204989032  by Zhichao Lu:

    Add tf.prof support to exporter.

--
204825267  by Zhichao Lu:

    Modify mask rcnn box predictor tests for TPU compatibility.

--
204778749  by Zhichao Lu:

    Remove score filtering from postprocessing.py and rely on filtering logic in tf.image.non_max_suppression
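
    This works because tf.image.non_max_suppression in recent TensorFlow
    versions already exposes score filtering directly (a minimal sketch with
    made-up thresholds):

        import tensorflow as tf

        boxes = tf.random_uniform((100, 4))
        scores = tf.random_uniform((100,))
        # score_threshold drops low-scoring boxes inside the NMS op itself,
        # making a separate pre-filtering pass in postprocessing redundant.
        selected_indices = tf.image.non_max_suppression(
            boxes, scores, max_output_size=20,
            iou_threshold=0.6, score_threshold=0.01)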

--
204775818  by Zhichao Lu:

    Python3 fixes for object_detection.

--
204745920  by Zhichao Lu:

    Object Detection Dataset visualization tool (documentation).

--
204686993  by Zhichao Lu:

    Internal changes.

--
204559667  by Zhichao Lu:

    Refactor box_predictor.py into multiple files.
    The abstract base class remains in object_detection/core; the other classes have each moved to a separate file in object_detection/predictors.

--
204552847  by Zhichao Lu:

    Update blog post link.

--
204508028  by Zhichao Lu:

    Bump down the batch size to 1024 to be a bit more tolerant to OOM and double the number of iterations. This job still converges to 20.5 mAP in 3 hours.

--

PiperOrigin-RevId: 206852642

* Add original post-processing back.
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""RFCN Box Predictor."""
import tensorflow as tf
from object_detection.core import box_predictor
from object_detection.utils import ops
slim = tf.contrib.slim
BOX_ENCODINGS = box_predictor.BOX_ENCODINGS
CLASS_PREDICTIONS_WITH_BACKGROUND = (
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND)
MASK_PREDICTIONS = box_predictor.MASK_PREDICTIONS
class RfcnBoxPredictor(box_predictor.BoxPredictor):
"""RFCN Box Predictor.
Applies a position sensitive ROI pooling on position sensitive feature maps to
predict classes and refined locations. See https://arxiv.org/abs/1605.06409
for details.
This is used for the second stage of the RFCN meta architecture. Notice that
locations are *not* shared across classes, thus for each anchor, a separate
prediction is made for each class.
"""
def __init__(self,
is_training,
num_classes,
conv_hyperparams_fn,
num_spatial_bins,
depth,
crop_size,
box_code_size):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
conv_hyperparams_fn: A function to construct tf-slim arg_scope with
hyperparameters for convolutional layers.
num_spatial_bins: A list of two integers `[spatial_bins_y,
spatial_bins_x]`.
depth: Target depth to reduce the input feature maps to.
crop_size: A list of two integers `[crop_height, crop_width]`.
box_code_size: Size of encoding for each box.
"""
super(RfcnBoxPredictor, self).__init__(is_training, num_classes)
self._conv_hyperparams_fn = conv_hyperparams_fn
self._num_spatial_bins = num_spatial_bins
self._depth = depth
self._crop_size = crop_size
self._box_code_size = box_code_size
@property
def num_classes(self):
return self._num_classes
def _predict(self, image_features, num_predictions_per_location,
proposal_boxes):
"""Computes encoded object locations and corresponding confidences.
Args:
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
num_predictions_per_location: A list of integers representing the number
of box predictions to be made per spatial location for each feature map.
Currently, this must be set to [1], or an error will be raised.
proposal_boxes: A float tensor of shape [batch_size, num_proposals,
box_code_size].
Returns:
box_encodings: A list of float tensors of shape
[batch_size, num_anchors_i, q, code_size] representing the location of
the objects, where q is 1 or the number of classes. Each entry in the
list corresponds to a feature map in the input `image_features` list.
class_predictions_with_background: A list of float tensors of shape
[batch_size, num_anchors_i, num_classes + 1] representing the class
predictions for the proposals. Each entry in the list corresponds to a
feature map in the input `image_features` list.
Raises:
ValueError: if num_predictions_per_location is not 1 or if
len(image_features) is not 1.
"""
if (len(num_predictions_per_location) != 1 or
num_predictions_per_location[0] != 1):
raise ValueError('Currently RfcnBoxPredictor only supports '
'predicting a single box per class per location.')
if len(image_features) != 1:
raise ValueError('length of `image_features` must be 1. Found {}'.
format(len(image_features)))
image_feature = image_features[0]
num_predictions_per_location = num_predictions_per_location[0]
batch_size = tf.shape(proposal_boxes)[0]
num_boxes = tf.shape(proposal_boxes)[1]
net = image_feature
with slim.arg_scope(self._conv_hyperparams_fn()):
net = slim.conv2d(net, self._depth, [1, 1], scope='reduce_depth')
# Location predictions.
location_feature_map_depth = (self._num_spatial_bins[0] *
self._num_spatial_bins[1] *
self.num_classes *
self._box_code_size)
location_feature_map = slim.conv2d(net, location_feature_map_depth,
[1, 1], activation_fn=None,
scope='refined_locations')
box_encodings = ops.batch_position_sensitive_crop_regions(
location_feature_map,
boxes=proposal_boxes,
crop_size=self._crop_size,
num_spatial_bins=self._num_spatial_bins,
global_pool=True)
box_encodings = tf.squeeze(box_encodings, squeeze_dims=[2, 3])
box_encodings = tf.reshape(box_encodings,
[batch_size * num_boxes, 1, self.num_classes,
self._box_code_size])
# Class predictions.
total_classes = self.num_classes + 1 # Account for background class.
class_feature_map_depth = (self._num_spatial_bins[0] *
self._num_spatial_bins[1] *
total_classes)
class_feature_map = slim.conv2d(net, class_feature_map_depth, [1, 1],
activation_fn=None,
scope='class_predictions')
class_predictions_with_background = (
ops.batch_position_sensitive_crop_regions(
class_feature_map,
boxes=proposal_boxes,
crop_size=self._crop_size,
num_spatial_bins=self._num_spatial_bins,
global_pool=True))
class_predictions_with_background = tf.squeeze(
class_predictions_with_background, squeeze_dims=[2, 3])
class_predictions_with_background = tf.reshape(
class_predictions_with_background,
[batch_size * num_boxes, 1, total_classes])
return {BOX_ENCODINGS: [box_encodings],
CLASS_PREDICTIONS_WITH_BACKGROUND:
[class_predictions_with_background]}
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.predictors.rfcn_box_predictor."""
import numpy as np
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.predictors import rfcn_box_predictor as box_predictor
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
class RfcnBoxPredictorTest(test_case.TestCase):
def _build_arg_scope_with_conv_hyperparams(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.build(conv_hyperparams, is_training=True)
def test_get_correct_box_encoding_and_class_prediction_shapes(self):
def graph_fn(image_features, proposal_boxes):
rfcn_box_predictor = box_predictor.RfcnBoxPredictor(
is_training=False,
num_classes=2,
conv_hyperparams_fn=self._build_arg_scope_with_conv_hyperparams(),
num_spatial_bins=[3, 3],
depth=4,
crop_size=[12, 12],
box_code_size=4
)
box_predictions = rfcn_box_predictor.predict(
[image_features], num_predictions_per_location=[1],
scope='BoxPredictor',
proposal_boxes=proposal_boxes)
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
proposal_boxes = np.random.rand(4, 2, 4).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features, proposal_boxes])
self.assertAllEqual(box_encodings.shape, [8, 1, 2, 4])
self.assertAllEqual(class_predictions_with_background.shape, [8, 1, 3])
if __name__ == '__main__':
tf.test.main()
@@ -93,6 +93,8 @@ message WeightSharedConvolutionalBoxPredictor {
optional bool share_prediction_tower = 13 [default = false];
}
// TODO(alirezafathi): Refactor the proto file to be able to configure mask rcnn
// head easily.
message MaskRCNNBoxPredictor {
// Hyperparameters for fully connected ops used in the box predictor.
optional Hyperparams fc_hyperparams = 1;
......
@@ -142,6 +142,21 @@ message FasterRcnn {
// standard tf.image.crop_and_resize while computing second stage input
// feature maps.
optional bool use_matmul_crop_and_resize = 31 [default = false];
// Normally, anchors generated for a given image size are pruned during
// training if they lie outside the image window. Setting this option to true
// clips the anchors to be within the image instead of pruning them.
optional bool clip_anchors_to_image = 32 [default = false];
// After performing matching between anchors and targets, we perform a gather
// operation to pull out the targets for training the Faster R-CNN meta
// architecture. This option specifies whether to use an alternate
// implementation of tf.gather that is faster on TPUs.
optional bool use_matmul_gather_in_matcher = 33 [default = false];
// Whether to use the balanced positive negative sampler implementation with
// static shape guarantees.
optional bool use_static_balanced_label_sampler = 34 [default = false];
}
......
@@ -33,6 +33,7 @@ message PreprocessingStep {
RandomVerticalFlip random_vertical_flip = 25;
RandomRotation90 random_rotation90 = 26;
RGBtoGray rgb_to_gray = 27;
ConvertClassLogitsToSoftmax convert_class_logits_to_softmax = 28;
}
}
@@ -409,3 +410,11 @@ message SSDRandomCropPadFixedAspectRatio {
// width. Two entries per operation.
repeated float max_padded_size_ratio = 4;
}
// Converts class logits to a softmax distribution, optionally scaling the
// values by a temperature first.
message ConvertClassLogitsToSoftmax {
// Scale to use on logits before applying softmax.
optional float temperature = 1 [default=1.0];
}
@@ -9,6 +9,7 @@ message RegionSimilarityCalculator {
NegSqDistSimilarity neg_sq_dist_similarity = 1;
IouSimilarity iou_similarity = 2;
IoaSimilarity ioa_similarity = 3;
ThresholdedIouSimilarity thresholded_iou_similarity = 4;
}
}
@@ -23,3 +24,10 @@ message IouSimilarity {
// Configuration for intersection-over-area (IOA) similarity calculator.
message IoaSimilarity {
}
// Configuration for thresholded-intersection-over-union similarity calculator.
message ThresholdedIouSimilarity {
// IOU threshold used for filtering scores.
optional float iou_threshold = 1 [default = 0.5];
}
@@ -120,4 +120,30 @@ message SsdFeatureExtractor {
// Whether to use depthwise separable convolutions to extract the additional
// feature maps added by SSD.
optional bool use_depthwise = 8 [default=false];
// Feature Pyramid Networks config.
optional FeaturePyramidNetworks fpn = 10;
}
// Configuration for Feature Pyramid Networks.
message FeaturePyramidNetworks {
// We recommend using multi_resolution_feature_map_generator with FPN, and
// the levels there must match the levels defined below for best
// performance.
// Correspondence from FPN levels to Resnet/Mobilenet V1 feature maps:
// FPN Level Resnet Feature Map Mobilenet-V1 Feature Map
// 2 Block 1 Conv2d_3_pointwise
// 3 Block 2 Conv2d_5_pointwise
// 4 Block 3 Conv2d_11_pointwise
// 5 Block 4 Conv2d_13_pointwise
// 6 Bottomup_5 bottom_up_Conv2d_14
// 7 Bottomup_6 bottom_up_Conv2d_15
// 8 Bottomup_7 bottom_up_Conv2d_16
// 9 Bottomup_8 bottom_up_Conv2d_17
// minimum level in feature pyramid
optional int32 min_level = 1 [default = 3];
// maximum level in feature pyramid
optional int32 max_level = 2 [default = 7];
}
# SSD with Mobilenet v1 feature extractor and focal loss.
# Trained on COCO14, initialized from Imagenet classification checkpoint
-# Achieves 19.3 mAP on COCO14 minival dataset. Doubling the number of training
-# steps gets to 20.6 mAP.
+# Achieves 20.5 mAP on COCO14 minival dataset.
# This config is TPU compatible
@@ -144,11 +143,11 @@ model {
train_config: {
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
-  batch_size: 2048
+  batch_size: 1024
sync_replicas: true
startup_delay_steps: 0
replicas_to_aggregate: 8
-  num_steps: 10000
+  num_steps: 20000
data_augmentation_options {
random_horizontal_flip {
}
@@ -162,9 +161,9 @@ train_config: {
learning_rate: {
cosine_decay_learning_rate {
learning_rate_base: 0.9
-        total_steps: 10000
+        total_steps: 20000
warmup_learning_rate: 0.3
-        warmup_steps: 300
+        warmup_steps: 1000
}
}
momentum_optimizer_value: 0.9
......
@@ -79,6 +79,10 @@ model {
}
feature_extractor {
type: 'ssd_mobilenet_v1_fpn'
fpn {
min_level: 3
max_level: 7
}
min_depth: 16
depth_multiplier: 1.0
conv_hyperparams {
......
@@ -80,6 +80,10 @@ model {
}
feature_extractor {
type: 'ssd_resnet50_v1_fpn'
fpn {
min_level: 3
max_level: 7
}
min_depth: 16
depth_multiplier: 1.0
conv_hyperparams {
......
@@ -139,15 +139,26 @@ def load_labelmap(path):
return label_map
-def get_label_map_dict(label_map_path, use_display_name=False):
+def get_label_map_dict(label_map_path,
+                       use_display_name=False,
+                       fill_in_gaps_and_background=False):
"""Reads a label map and returns a dictionary of label names to id.
Args:
-    label_map_path: path to label_map.
+    label_map_path: path to StringIntLabelMap proto text file.
use_display_name: whether to use the label map items' display names as keys.
fill_in_gaps_and_background: whether to fill in gaps and background with
respect to the id field in the proto. The id: 0 is reserved for the
'background' class and will be added if it is missing. All other missing
ids in range(1, max(id)) will be added with a dummy class name
("class_<id>") if they are missing.
Returns:
A dictionary mapping label names to id.
Raises:
ValueError: if fill_in_gaps_and_background and label_map has non-integer or
negative values.
"""
label_map = load_labelmap(label_map_path)
label_map_dict = {}
@@ -156,6 +167,24 @@ def get_label_map_dict(label_map_path, use_display_name=False):
label_map_dict[item.display_name] = item.id
else:
label_map_dict[item.name] = item.id
if fill_in_gaps_and_background:
values = set(label_map_dict.values())
if 0 not in values:
label_map_dict['background'] = 0
if not all(isinstance(value, int) for value in values):
raise ValueError('The values in label map must be integers in order to '
'fill_in_gaps_and_background.')
if not all(value >= 0 for value in values):
raise ValueError('The values in the label map must be non-negative.')
if len(values) != max(values) + 1:
# there are gaps in the labels, fill in gaps.
for value in range(1, max(values)):
if value not in values:
label_map_dict['class_' + str(value)] = value
return label_map_dict
......
@@ -119,6 +119,30 @@ class LabelMapUtilTest(tf.test.TestCase):
self.assertEqual(label_map_dict['dog'], 1)
self.assertEqual(label_map_dict['cat'], 2)
def test_get_label_map_dict_with_fill_in_gaps_and_background(self):
label_map_string = """
item {
id:3
name:'cat'
}
item {
id:1
name:'dog'
}
"""
label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt')
with tf.gfile.Open(label_map_path, 'wb') as f:
f.write(label_map_string)
label_map_dict = label_map_util.get_label_map_dict(
label_map_path, fill_in_gaps_and_background=True)
self.assertEqual(label_map_dict['background'], 0)
self.assertEqual(label_map_dict['dog'], 1)
self.assertEqual(label_map_dict['class_2'], 2)
self.assertEqual(label_map_dict['cat'], 3)
self.assertEqual(len(label_map_dict), max(label_map_dict.values()) + 1)
def test_keep_categories_with_unique_id(self):
label_map_proto = string_int_label_map_pb2.StringIntLabelMap()
label_map_string = """
......
@@ -31,6 +31,7 @@ from abc import ABCMeta
from abc import abstractmethod
import collections
import logging
import unicodedata
import numpy as np
from object_detection.core import standard_fields
@@ -284,18 +285,23 @@ class ObjectDetectionEvaluator(DetectionEvaluator):
category_index = label_map_util.create_category_index(self._categories)
for idx in range(per_class_ap.size):
if idx + self._label_id_offset in category_index:
category_name = category_index[idx + self._label_id_offset]['name']
try:
category_name = unicode(category_name, 'utf-8')
except TypeError:
pass
category_name = unicodedata.normalize(
'NFKD', category_name).encode('ascii', 'ignore')
display_name = (
self._metric_prefix + 'PerformanceByCategory/AP@{}IOU/{}'.format(
-            self._matching_iou_threshold,
-            category_index[idx + self._label_id_offset]['name']))
+            self._matching_iou_threshold, category_name))
pascal_metrics[display_name] = per_class_ap[idx]
# Optionally add CorLoc metrics.
if self._evaluate_corlocs:
display_name = (
self._metric_prefix + 'PerformanceByCategory/CorLoc@{}IOU/{}'
-          .format(self._matching_iou_threshold,
-                  category_index[idx + self._label_id_offset]['name']))
+          .format(self._matching_iou_threshold, category_name))
pascal_metrics[display_name] = per_class_corloc[idx]
return pascal_metrics
@@ -839,9 +845,9 @@ class ObjectDetectionEvaluation(object):
if self.use_weighted_mean_ap:
all_scores = np.append(all_scores, scores)
all_tp_fp_labels = np.append(all_tp_fp_labels, tp_fp_labels)
-      print 'Scores and tpfp per class label: {}'.format(class_index)
-      print tp_fp_labels
-      print scores
+      logging.info('Scores and tpfp per class label: %d', class_index)
+      logging.info(tp_fp_labels)
+      logging.info(scores)
precision, recall = metrics.compute_precision_recall(
scores, tp_fp_labels, self.num_gt_instances_per_class[class_index])
self.precisions_per_class.append(precision)
......
@@ -20,8 +20,6 @@ import six
import tensorflow as tf
from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import standard_fields as fields
from object_detection.utils import shape_utils
from object_detection.utils import static_shape
@@ -60,13 +58,20 @@ def normalized_to_image_coordinates(normalized_boxes, image_shape,
parallel_iterations: parallelism for the map_fn op.
Returns:
-    absolute_boxes: a float32 tensor of shape [None, num_boxes, 4] containg the
-      boxes in image coordinates.
+    absolute_boxes: a float32 tensor of shape [None, num_boxes, 4] containing
+      the boxes in image coordinates.
"""
+  x_scale = tf.cast(image_shape[2], tf.float32)
+  y_scale = tf.cast(image_shape[1], tf.float32)
   def _to_absolute_coordinates(normalized_boxes):
-    return box_list_ops.to_absolute_coordinates(
-        box_list.BoxList(normalized_boxes),
-        image_shape[1], image_shape[2], check_range=False).get()
+    y_min, x_min, y_max, x_max = tf.split(
+        value=normalized_boxes, num_or_size_splits=4, axis=1)
+    y_min = y_scale * y_min
+    y_max = y_scale * y_max
+    x_min = x_scale * x_min
+    x_max = x_scale * x_max
+    scaled_boxes = tf.concat([y_min, x_min, y_max, x_max], 1)
+    return scaled_boxes
absolute_boxes = shape_utils.static_or_dynamic_map_fn(
_to_absolute_coordinates,
@@ -538,13 +543,59 @@ def normalize_to_target(inputs,
return tf.reshape(target_norm, mult_shape) * tf.truediv(inputs, lengths)
def batch_position_sensitive_crop_regions(images,
boxes,
crop_size,
num_spatial_bins,
global_pool,
parallel_iterations=64):
"""Position sensitive crop with batches of images and boxes.
This op is exactly like `position_sensitive_crop_regions` below but operates
on batches of images and boxes. See `position_sensitive_crop_regions` function
below for the operation applied per batch element.
Args:
images: A `Tensor`. Must be one of the following types: `uint8`, `int8`,
`int16`, `int32`, `int64`, `half`, `float32`, `float64`.
A 4-D tensor of shape `[batch, image_height, image_width, depth]`.
Both `image_height` and `image_width` need to be positive.
boxes: A `Tensor` of type `float32`.
A 3-D tensor of shape `[batch, num_boxes, 4]`. Each box is specified in
normalized coordinates `[y1, x1, y2, x2]`. A normalized coordinate value
of `y` is mapped to the image coordinate at `y * (image_height - 1)`, so
as the `[0, 1]` interval of normalized image height is mapped to
`[0, image_height - 1]` in image height coordinates. We do allow y1 > y2,
in which case the sampled crop is an up-down flipped version of the
original image. The width dimension is treated similarly.
crop_size: See `position_sensitive_crop_regions` below.
num_spatial_bins: See `position_sensitive_crop_regions` below.
global_pool: See `position_sensitive_crop_regions` below.
parallel_iterations: Number of batch items to process in parallel.
Returns:
position_sensitive_features: A 5-D tensor of shape
`[batch, num_boxes, K, K, crop_channels]`, stacking the per-image
outputs of `position_sensitive_crop_regions` below.
"""
def _position_sensitive_crop_fn(inputs):
images, boxes = inputs
return position_sensitive_crop_regions(
images,
boxes,
crop_size=crop_size,
num_spatial_bins=num_spatial_bins,
global_pool=global_pool)
return shape_utils.static_or_dynamic_map_fn(
_position_sensitive_crop_fn,
elems=[images, boxes],
dtype=tf.float32,
parallel_iterations=parallel_iterations)
def position_sensitive_crop_regions(image,
                                    boxes,
-                                    box_ind,
                                    crop_size,
                                    num_spatial_bins,
-                                    global_pool,
-                                    extrapolation_value=None):
+                                    global_pool):
"""Position-sensitive crop and pool rectangular regions from a feature grid.
The output crops are split into `spatial_bins_y` vertical bins
@@ -565,23 +616,16 @@ def position_sensitive_crop_regions(image,
Args:
image: A `Tensor`. Must be one of the following types: `uint8`, `int8`,
`int16`, `int32`, `int64`, `half`, `float32`, `float64`.
-      A 4-D tensor of shape `[batch, image_height, image_width, depth]`.
+      A 3-D tensor of shape `[image_height, image_width, depth]`.
Both `image_height` and `image_width` need to be positive.
    boxes: A `Tensor` of type `float32`.
-      A 2-D tensor of shape `[num_boxes, 4]`. The `i`-th row of the tensor
-      specifies the coordinates of a box in the `box_ind[i]` image and is
-      specified in normalized coordinates `[y1, x1, y2, x2]`. A normalized
-      coordinate value of `y` is mapped to the image coordinate at
-      `y * (image_height - 1)`, so as the `[0, 1]` interval of normalized image
-      height is mapped to `[0, image_height - 1] in image height coordinates.
-      We do allow y1 > y2, in which case the sampled crop is an up-down flipped
-      version of the original image. The width dimension is treated similarly.
-      Normalized coordinates outside the `[0, 1]` range are allowed, in which
-      case we use `extrapolation_value` to extrapolate the input image values.
-    box_ind: A `Tensor` of type `int32`.
-      A 1-D tensor of shape `[num_boxes]` with int32 values in `[0, batch)`.
-      The value of `box_ind[i]` specifies the image that the `i`-th box refers
-      to.
+      A 2-D tensor of shape `[num_boxes, 4]`. Each box is specified in
+      normalized coordinates `[y1, x1, y2, x2]`. A normalized coordinate value
+      of `y` is mapped to the image coordinate at `y * (image_height - 1)`, so
+      as the `[0, 1]` interval of normalized image height is mapped to
+      `[0, image_height - 1]` in image height coordinates. We do allow y1 > y2,
+      in which case the sampled crop is an up-down flipped version of the
+      original image. The width dimension is treated similarly.
crop_size: A list of two integers `[crop_height, crop_width]`. All
cropped image patches are resized to this size. The aspect ratio of the
image content is not preserved. Both `crop_height` and `crop_width` need
@@ -601,8 +645,7 @@ def position_sensitive_crop_regions(image,
Note that using global_pool=True is equivalent to but more efficient than
running the function with global_pool=False and then performing global
average pooling.
extrapolation_value: An optional `float`. Defaults to `0`.
Value used for extrapolation, when applicable.
Returns:
position_sensitive_features: A 4-D tensor of shape
`[num_boxes, K, K, crop_channels]`,
@@ -649,12 +692,17 @@ def position_sensitive_crop_regions(image,
]
position_sensitive_boxes.append(tf.stack(box_coordinates, axis=1))
-  image_splits = tf.split(value=image, num_or_size_splits=total_bins, axis=3)
+  image_splits = tf.split(value=image, num_or_size_splits=total_bins, axis=2)
image_crops = []
for (split, box) in zip(image_splits, position_sensitive_boxes):
-    crop = tf.image.crop_and_resize(split, box, box_ind, bin_crop_size,
-                                    extrapolation_value=extrapolation_value)
+    if split.shape.is_fully_defined() and box.shape.is_fully_defined():
+      crop = matmul_crop_and_resize(
+          tf.expand_dims(split, 0), box, bin_crop_size)
+    else:
+      crop = tf.image.crop_and_resize(
+          tf.expand_dims(split, 0), box,
+          tf.zeros(tf.shape(boxes)[0], dtype=tf.int32), bin_crop_size)
image_crops.append(crop)
if global_pool:
@@ -957,3 +1005,68 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None):
tf.matmul(kernel_h, tf.tile(channel, [num_crops, 1, 1])),
kernel_w, transpose_b=True))
return tf.stack(result_channels, axis=3)
def expected_classification_loss_under_sampling(batch_cls_targets, cls_losses,
negative_to_positive_ratio,
minimum_negative_sampling):
"""Computes classification loss by background/foreground weighting.
The weighting is such that the effective background/foreground weight ratio
is the negative_to_positive_ratio. If p_i is the foreground probability of
anchor a_i, L(a_i) is the anchor's loss, N is the number of anchors, K is
the negative_to_positive_ratio, and M is the sum of foreground probabilities
across anchors, then the total loss L is calculated as:
beta = K*M/(N-M)
L = sum_{i=1}^N [p_i + beta * (1 - p_i)] * L(a_i)
Args:
batch_cls_targets: A tensor with shape [batch_size, num_anchors,
num_classes + 1], where 0'th index is the background class, containing
the class distribution for the target assigned to a given anchor.
cls_losses: Float tensor of shape [batch_size, num_anchors]
representing anchorwise classification losses.
negative_to_positive_ratio: The desired background/foreground weight ratio.
minimum_negative_sampling: Minimum number of effective negative samples.
Used only when there are no positive examples.
Returns:
The classification loss.
"""
num_anchors = tf.cast(tf.shape(batch_cls_targets)[1], tf.float32)
# find the p_i
foreground_probabilities = (
foreground_probabilities_from_targets(batch_cls_targets))
foreground_sum = tf.reduce_sum(foreground_probabilities, axis=-1)
k = negative_to_positive_ratio
# compute beta
denominators = (num_anchors - foreground_sum)
beta = tf.where(
tf.equal(denominators, 0), tf.zeros_like(foreground_sum),
k * foreground_sum / denominators)
# where the foreground sum is zero, use a minimum negative weight.
min_negative_weight = 1.0 * minimum_negative_sampling / num_anchors
beta = tf.where(
tf.equal(beta, 0), min_negative_weight * tf.ones_like(beta), beta)
beta = tf.reshape(beta, [-1, 1])
cls_loss_weights = foreground_probabilities + (
1 - foreground_probabilities) * beta
weighted_losses = cls_loss_weights * cls_losses
cls_losses = tf.reduce_sum(weighted_losses, axis=-1)
return cls_losses
def foreground_probabilities_from_targets(batch_cls_targets):
foreground_probabilities = 1 - batch_cls_targets[:, :, 0]
return foreground_probabilities
@@ -812,13 +812,12 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
def test_position_sensitive(self):
num_spatial_bins = [3, 2]
-    image_shape = [1, 3, 2, 6]
+    image_shape = [3, 2, 6]
# First channel is 1's, second channel is 2's, etc.
image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32,
shape=image_shape)
boxes = tf.random_uniform((2, 4))
box_ind = tf.constant([0, 0], dtype=tf.int32)
# The result for both boxes should be [[1, 2], [3, 4], [5, 6]]
# before averaging.
@@ -827,7 +826,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
for crop_size_mult in range(1, 3):
crop_size = [3 * crop_size_mult, 2 * crop_size_mult]
ps_crop_and_pool = ops.position_sensitive_crop_regions(
-          image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
+          image, boxes, crop_size, num_spatial_bins, global_pool=True)
with self.test_session() as sess:
output = sess.run(ps_crop_and_pool)
@@ -835,24 +834,24 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
def test_position_sensitive_with_equal_channels(self):
num_spatial_bins = [2, 2]
-    image_shape = [1, 3, 3, 4]
+    image_shape = [3, 3, 4]
crop_size = [2, 2]
    image = tf.constant(range(1, 3 * 3 + 1), dtype=tf.float32,
-                        shape=[1, 3, 3, 1])
-    tiled_image = tf.tile(image, [1, 1, 1, image_shape[3]])
+                        shape=[3, 3, 1])
+    tiled_image = tf.tile(image, [1, 1, image_shape[2]])
boxes = tf.random_uniform((3, 4))
box_ind = tf.constant([0, 0, 0], dtype=tf.int32)
# All channels are equal so position-sensitive crop and resize should
# work as the usual crop and resize for just one channel.
-    crop = tf.image.crop_and_resize(image, boxes, box_ind, crop_size)
+    crop = tf.image.crop_and_resize(tf.expand_dims(image, axis=0), boxes,
+                                    box_ind, crop_size)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keep_dims=True)
ps_crop_and_pool = ops.position_sensitive_crop_regions(
tiled_image,
boxes,
box_ind,
crop_size,
num_spatial_bins,
global_pool=True)
@@ -861,78 +860,53 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
expected_output, output = sess.run((crop_and_pool, ps_crop_and_pool))
self.assertAllClose(output, expected_output)
def test_position_sensitive_with_single_bin(self):
num_spatial_bins = [1, 1]
image_shape = [2, 3, 3, 4]
crop_size = [2, 2]
image = tf.random_uniform(image_shape)
boxes = tf.random_uniform((6, 4))
box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32)
# When a single bin is used, position-sensitive crop and pool should be
# the same as non-position sensitive crop and pool.
crop = tf.image.crop_and_resize(image, boxes, box_ind, crop_size)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keep_dims=True)
ps_crop_and_pool = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
with self.test_session() as sess:
expected_output, output = sess.run((crop_and_pool, ps_crop_and_pool))
self.assertAllClose(output, expected_output)
def test_raise_value_error_on_num_bins_less_than_one(self):
num_spatial_bins = [1, -1]
-    image_shape = [1, 1, 1, 2]
+    image_shape = [1, 1, 2]
crop_size = [2, 2]
image = tf.constant(1, dtype=tf.float32, shape=image_shape)
boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32)
box_ind = tf.constant([0], dtype=tf.int32)
with self.assertRaisesRegexp(ValueError, 'num_spatial_bins should be >= 1'):
ops.position_sensitive_crop_regions(
-          image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
+          image, boxes, crop_size, num_spatial_bins, global_pool=True)
def test_raise_value_error_on_non_divisible_crop_size(self):
num_spatial_bins = [2, 3]
-    image_shape = [1, 1, 1, 6]
+    image_shape = [1, 1, 6]
crop_size = [3, 2]
image = tf.constant(1, dtype=tf.float32, shape=image_shape)
boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32)
box_ind = tf.constant([0], dtype=tf.int32)
with self.assertRaisesRegexp(
ValueError, 'crop_size should be divisible by num_spatial_bins'):
ops.position_sensitive_crop_regions(
-          image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
+          image, boxes, crop_size, num_spatial_bins, global_pool=True)
def test_raise_value_error_on_non_divisible_num_channels(self):
num_spatial_bins = [2, 2]
-    image_shape = [1, 1, 1, 5]
+    image_shape = [1, 1, 5]
crop_size = [2, 2]
image = tf.constant(1, dtype=tf.float32, shape=image_shape)
boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32)
box_ind = tf.constant([0], dtype=tf.int32)
with self.assertRaisesRegexp(
ValueError, 'Dimension size must be evenly divisible by 4 but is 5'):
ops.position_sensitive_crop_regions(
-          image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
+          image, boxes, crop_size, num_spatial_bins, global_pool=True)
def test_position_sensitive_with_global_pool_false(self):
num_spatial_bins = [3, 2]
-    image_shape = [1, 3, 2, 6]
+    image_shape = [3, 2, 6]
num_boxes = 2
# First channel is 1's, second channel is 2's, etc.
image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32,
shape=image_shape)
boxes = tf.random_uniform((num_boxes, 4))
box_ind = tf.constant([0, 0], dtype=tf.int32)
expected_output = []
@@ -956,79 +930,21 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
for crop_size_mult in range(1, 3):
crop_size = [3 * crop_size_mult, 2 * crop_size_mult]
ps_crop = ops.position_sensitive_crop_regions(
-          image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
+          image, boxes, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
output = sess.run(ps_crop)
self.assertAllEqual(output, expected_output[crop_size_mult - 1])
def test_position_sensitive_with_global_pool_false_and_known_boxes(self):
num_spatial_bins = [2, 2]
image_shape = [2, 2, 2, 4]
crop_size = [2, 2]
image = tf.constant(range(1, 2 * 2 * 4 + 1) * 2, dtype=tf.float32,
shape=image_shape)
# First box contains whole image, and second box contains only first row.
boxes = tf.constant(np.array([[0., 0., 1., 1.],
[0., 0., 0.5, 1.]]), dtype=tf.float32)
box_ind = tf.constant([0, 1], dtype=tf.int32)
expected_output = []
# Expected output, when the box containing whole image.
expected_output.append(
np.reshape(np.array([[4, 7],
[10, 13]]),
(1, 2, 2, 1))
)
# Expected output, when the box containing only first row.
expected_output.append(
np.reshape(np.array([[3, 6],
[7, 10]]),
(1, 2, 2, 1))
)
expected_output = np.concatenate(expected_output, axis=0)
ps_crop = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
output = sess.run(ps_crop)
self.assertAllEqual(output, expected_output)
def test_position_sensitive_with_global_pool_false_and_single_bin(self):
num_spatial_bins = [1, 1]
image_shape = [2, 3, 3, 4]
crop_size = [1, 1]
image = tf.random_uniform(image_shape)
boxes = tf.random_uniform((6, 4))
box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32)
# Since single_bin is used and crop_size = [1, 1] (i.e., no crop resize),
# the outputs are the same whatever the global_pool value is.
ps_crop_and_pool = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
ps_crop = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
pooled_output, unpooled_output = sess.run((ps_crop_and_pool, ps_crop))
self.assertAllClose(pooled_output, unpooled_output)
def test_position_sensitive_with_global_pool_false_and_do_global_pool(self):
num_spatial_bins = [3, 2]
-    image_shape = [1, 3, 2, 6]
+    image_shape = [3, 2, 6]
num_boxes = 2
# First channel is 1's, second channel is 2's, etc.
image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32,
shape=image_shape)
boxes = tf.random_uniform((num_boxes, 4))
box_ind = tf.constant([0, 0], dtype=tf.int32)
expected_output = []
@@ -1059,7 +975,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
# Perform global_pooling after running the function with
# global_pool=False.
ps_crop = ops.position_sensitive_crop_regions(
-        image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
+        image, boxes, crop_size, num_spatial_bins, global_pool=False)
ps_crop_and_pool = tf.reduce_mean(
ps_crop, reduction_indices=(1, 2), keep_dims=True)
@@ -1070,17 +986,99 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
def test_raise_value_error_on_non_square_block_size(self):
num_spatial_bins = [3, 2]
-    image_shape = [1, 3, 2, 6]
+    image_shape = [3, 2, 6]
crop_size = [6, 2]
image = tf.constant(1, dtype=tf.float32, shape=image_shape)
boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32)
box_ind = tf.constant([0], dtype=tf.int32)
with self.assertRaisesRegexp(
ValueError, 'Only support square bin crop size for now.'):
ops.position_sensitive_crop_regions(
-          image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
+          image, boxes, crop_size, num_spatial_bins, global_pool=False)
class OpsTestBatchPositionSensitiveCropRegions(tf.test.TestCase):
def test_position_sensitive_with_single_bin(self):
num_spatial_bins = [1, 1]
image_shape = [2, 3, 3, 4]
crop_size = [2, 2]
image = tf.random_uniform(image_shape)
boxes = tf.random_uniform((2, 3, 4))
box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32)
# When a single bin is used, position-sensitive crop and pool should be
# the same as non-position sensitive crop and pool.
crop = tf.image.crop_and_resize(image, tf.reshape(boxes, [-1, 4]), box_ind,
crop_size)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keepdims=True)
crop_and_pool = tf.reshape(crop_and_pool, [2, 3, 1, 1, 4])
ps_crop_and_pool = ops.batch_position_sensitive_crop_regions(
image, boxes, crop_size, num_spatial_bins, global_pool=True)
with self.test_session() as sess:
expected_output, output = sess.run((crop_and_pool, ps_crop_and_pool))
self.assertAllClose(output, expected_output)
def test_position_sensitive_with_global_pool_false_and_known_boxes(self):
num_spatial_bins = [2, 2]
image_shape = [2, 2, 2, 4]
crop_size = [2, 2]
images = tf.constant(range(1, 2 * 2 * 4 + 1) * 2, dtype=tf.float32,
shape=image_shape)
# First box contains whole image, and second box contains only first row.
boxes = tf.constant(np.array([[[0., 0., 1., 1.]],
[[0., 0., 0.5, 1.]]]), dtype=tf.float32)
# box_ind = tf.constant([0, 1], dtype=tf.int32)
expected_output = []
# Expected output, when the box containing whole image.
expected_output.append(
np.reshape(np.array([[4, 7],
[10, 13]]),
(1, 2, 2, 1))
)
# Expected output, when the box containing only first row.
expected_output.append(
np.reshape(np.array([[3, 6],
[7, 10]]),
(1, 2, 2, 1))
)
expected_output = np.stack(expected_output, axis=0)
ps_crop = ops.batch_position_sensitive_crop_regions(
images, boxes, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
output = sess.run(ps_crop)
self.assertAllEqual(output, expected_output)
def test_position_sensitive_with_global_pool_false_and_single_bin(self):
num_spatial_bins = [1, 1]
image_shape = [2, 3, 3, 4]
crop_size = [1, 1]
images = tf.random_uniform(image_shape)
boxes = tf.random_uniform((2, 3, 4))
# box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32)
# Since single_bin is used and crop_size = [1, 1] (i.e., no crop resize),
# the outputs are the same whatever the global_pool value is.
ps_crop_and_pool = ops.batch_position_sensitive_crop_regions(
images, boxes, crop_size, num_spatial_bins, global_pool=True)
ps_crop = ops.batch_position_sensitive_crop_regions(
images, boxes, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
pooled_output, unpooled_output = sess.run((ps_crop_and_pool, ps_crop))
self.assertAllClose(pooled_output, unpooled_output)
class ReframeBoxMasksToImageMasksTest(tf.test.TestCase):
@@ -1365,5 +1363,86 @@ class OpsTestMatMulCropAndResize(test_case.TestCase):
_ = ops.matmul_crop_and_resize(image, boxes, crop_size)
class OpsTestExpectedClassificationLoss(test_case.TestCase):
def testExpectedClassificationLossUnderSamplingWithHardLabels(self):
def graph_fn(batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling)
batch_cls_targets = np.array(
[[[1., 0, 0], [0, 1., 0]], [[1., 0, 0], [0, 1., 0]]], dtype=np.float32)
cls_losses = np.array([[1, 2], [3, 4]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
minimum_negative_sampling = np.array([1], dtype=np.float32)
classification_loss = self.execute(graph_fn, [
batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling
])
# expected_foreground_sum = [1,1]
# expected_beta = [2,2]
# expected_cls_loss_weights = [2,1],[2,1]
# expected_classification_loss_under_sampling = [2*1+1*2, 2*3+1*4]
expected_classification_loss_under_sampling = [2 + 2, 6 + 4]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
def testExpectedClassificationLossUnderSamplingWithAllNegative(self):
def graph_fn(batch_cls_targets, cls_losses):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling)
batch_cls_targets = np.array(
[[[1, 0, 0], [1, 0, 0]], [[1, 0, 0], [1, 0, 0]]], dtype=np.float32)
cls_losses = np.array([[1, 2], [3, 4]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
minimum_negative_sampling = np.array([1], dtype=np.float32)
classification_loss = self.execute(graph_fn,
[batch_cls_targets, cls_losses])
# expected_foreground_sum = [0,0]
# expected_beta = [0.5,0.5]
# expected_cls_loss_weights = [0.5,0.5],[0.5,0.5]
# expected_classification_loss_under_sampling = [.5*1+.5*2, .5*3+.5*4]
expected_classification_loss_under_sampling = [1.5, 3.5]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
def testExpectedClassificationLossUnderSamplingWithAllPositive(self):
def graph_fn(batch_cls_targets, cls_losses):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling)
batch_cls_targets = np.array(
[[[0, 1., 0], [0, 1., 0]], [[0, 1, 0], [0, 0, 1]]], dtype=np.float32)
cls_losses = np.array([[1, 2], [3, 4]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
minimum_negative_sampling = np.array([1], dtype=np.float32)
classification_loss = self.execute(graph_fn,
[batch_cls_targets, cls_losses])
# expected_foreground_sum = [2,2]
# expected_beta = [0,0]
# expected_cls_loss_weights = [1,1],[1,1]
# expected_classification_loss_under_sampling = [1*1+1*2, 1*3+1*4]
expected_classification_loss_under_sampling = [1 + 2, 3 + 4]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
if __name__ == '__main__':
tf.test.main()
@@ -106,13 +106,49 @@ def pad_or_clip_tensor(t, length):
length is an integer, the first dimension of the processed tensor is set
to length statically.
"""
-  processed_t = tf.cond(
-      tf.greater(tf.shape(t)[0], length),
-      lambda: clip_tensor(t, length),
-      lambda: pad_tensor(t, length))
-  if not _is_tensor(length):
-    processed_t = _set_dim_0(processed_t, length)
-  return processed_t
+  return pad_or_clip_nd(t, [length] + t.shape.as_list()[1:])
def pad_or_clip_nd(tensor, output_shape):
"""Pad or Clip given tensor to the output shape.
Args:
tensor: Input tensor to pad or clip.
output_shape: A list of integers / scalar tensors (or None for dynamic dim)
representing the size to pad or clip each dimension of the input tensor.
Returns:
Input tensor padded and clipped to the output shape.
"""
tensor_shape = tf.shape(tensor)
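# A size of -1 in tf.slice means "take everything remaining along that
# dimension", so dimensions that need no clipping are passed through intact.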
clip_size = [
tf.where(tensor_shape[i] - shape > 0, shape, -1)
if shape is not None else -1 for i, shape in enumerate(output_shape)
]
clipped_tensor = tf.slice(
tensor,
begin=tf.zeros(len(clip_size), dtype=tf.int32),
size=clip_size)
# Pad tensor if the shape of clipped tensor is smaller than the expected
# shape.
clipped_tensor_shape = tf.shape(clipped_tensor)
trailing_paddings = [
shape - clipped_tensor_shape[i] if shape is not None else 0
for i, shape in enumerate(output_shape)
]
paddings = tf.stack(
[
tf.zeros(len(trailing_paddings), dtype=tf.int32),
trailing_paddings
],
axis=1)
padded_tensor = tf.pad(clipped_tensor, paddings=paddings)
output_static_shape = [
dim if not isinstance(dim, tf.Tensor) else None for dim in output_shape
]
padded_tensor.set_shape(output_static_shape)
return padded_tensor
def combined_static_and_dynamic_shape(tensor):
@@ -306,4 +342,3 @@ def assert_shape_equal_along_first_dimension(shape_a, shape_b):
else: return tf.no_op()
else:
return tf.assert_equal(shape_a[0], shape_b[0])
@@ -123,6 +123,22 @@ class UtilTest(tf.test.TestCase):
self.assertTrue(tf.contrib.framework.is_tensor(combined_shape[0]))
self.assertListEqual(combined_shape[1:], [2, 3])
def test_pad_or_clip_nd_tensor(self):
tensor_placeholder = tf.placeholder(tf.float32, [None, 5, 4, 7])
output_tensor = shape_utils.pad_or_clip_nd(
tensor_placeholder, [None, 3, 5, tf.constant(6)])
self.assertAllEqual(output_tensor.shape.as_list(), [None, 3, 5, None])
with self.test_session() as sess:
output_tensor_np = sess.run(
output_tensor,
feed_dict={
tensor_placeholder: np.random.rand(2, 5, 4, 7),
})
self.assertAllEqual(output_tensor_np.shape, [2, 3, 5, 6])
class StaticOrDynamicMapFnTest(tf.test.TestCase):
......
@@ -47,9 +47,10 @@ class TestCase(tf.test.TestCase):
materialized_results = sess.run(tpu_computation,
feed_dict=dict(zip(placeholders, inputs)))
sess.run(tpu.shutdown_system())
-      if (len(materialized_results) == 1
-          and (isinstance(materialized_results, list)
-               or isinstance(materialized_results, tuple))):
+      if (hasattr(materialized_results, '__len__') and
+          len(materialized_results) == 1 and
+          (isinstance(materialized_results, list) or
+           isinstance(materialized_results, tuple))):
materialized_results = materialized_results[0]
return materialized_results
@@ -72,9 +73,11 @@ class TestCase(tf.test.TestCase):
tf.local_variables_initializer()])
materialized_results = sess.run(results, feed_dict=dict(zip(placeholders,
inputs)))
-      if (len(materialized_results) == 1
-          and (isinstance(materialized_results, list)
-               or isinstance(materialized_results, tuple))):
+      if (hasattr(materialized_results, '__len__') and
+          len(materialized_results) == 1 and
+          (isinstance(materialized_results, list) or
+           isinstance(materialized_results, tuple))):
materialized_results = materialized_results[0]
return materialized_results
......
@@ -62,6 +62,30 @@ class MockBoxPredictor(box_predictor.BoxPredictor):
class_predictions_with_background}
class MockKerasBoxPredictor(box_predictor.KerasBoxPredictor):
"""Simple box predictor that ignores inputs and outputs all zeros."""
def __init__(self, is_training, num_classes):
super(MockKerasBoxPredictor, self).__init__(
is_training, num_classes, False, False)
def _predict(self, image_features, **kwargs):
image_feature = image_features[0]
combined_feature_shape = shape_utils.combined_static_and_dynamic_shape(
image_feature)
batch_size = combined_feature_shape[0]
num_anchors = (combined_feature_shape[1] * combined_feature_shape[2])
code_size = 4
zero = tf.reduce_sum(0 * image_feature)
box_encodings = zero + tf.zeros(
(batch_size, num_anchors, 1, code_size), dtype=tf.float32)
class_predictions_with_background = zero + tf.zeros(
(batch_size, num_anchors, self.num_classes + 1), dtype=tf.float32)
return {box_predictor.BOX_ENCODINGS: box_encodings,
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND:
class_predictions_with_background}
class MockAnchorGenerator(anchor_generator.AnchorGenerator):
"""Mock anchor generator."""
......
@@ -134,8 +134,11 @@ def get_variables_available_in_checkpoint(variables,
vars_in_ckpt[variable_name] = variable
else:
logging.warning('Variable [%s] is available in checkpoint, but has an '
-                        'incompatible shape with model variable.',
-                        variable_name)
+                        'incompatible shape with model variable. Checkpoint '
+                        'shape: [%s], model variable shape: [%s]. This '
+                        'variable will not be initialized from the checkpoint.',
+                        variable_name, ckpt_vars_to_shape_map[variable_name],
+                        variable.shape.as_list())
else:
logging.warning('Variable [%s] is not available in checkpoint',
variable_name)
......