Unverified Commit fe748d4a authored by pkulzc, committed by GitHub

Object detection changes: (#7208)

257914648  by lzc:

    Internal changes

--
257525973  by Zhichao Lu:

    Fixes bug that silently prevents checkpoints from loading when training w/ eager + functions. Also sets up scripts to run training.

--
257296614  by Zhichao Lu:

    Adding detection_features to model outputs

--
257234565  by Zhichao Lu:

    Fix wrong order of `classes_with_max_scores` in class-agnostic NMS caused by
    sorting in partitioned-NMS.

--
257232002  by ronnyvotel:

    Supporting `filter_nonoverlapping` option in np_box_list_ops.clip_to_window().

--
257198282  by Zhichao Lu:

    Adding the focal loss and l1 loss from the Objects as Points paper.

--
257089535  by Zhichao Lu:

    Create Keras based ssd + resnetv1 + fpn.

--
257087407  by Zhichao Lu:

    Make object_detection/data_decoders Python3-compatible.

--
257004582  by Zhichao Lu:

    Updates _decode_raw_data_into_masks_and_boxes to the latest binary masks-to-string encoding fo...
parent 81123ebf
# Open Images Challenge Evaluation
The Object Detection API is currently supporting several evaluation metrics used in the [Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html).
In addition, several data processing tools are available. Detailed instructions on using the tools for each track are available below.
The Object Detection API is currently supporting several evaluation metrics used
in the
[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html)
and
[Open Images Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html).
In addition, several data processing tools are available. Detailed instructions
on using the tools for each track are available below.
**NOTE**: links to the external website in this tutorial may change after the Open Images Challenge 2018 is finished.
**NOTE:** all data links are updated to the Open Images Challenge 2019.
## Object Detection Track
The [Object Detection metric](https://storage.googleapis.com/openimages/web/object_detection_metric.html) protocol requires a pre-processing of the released data to ensure correct evaluation. The released data contains only leaf-most bounding box annotations and image-level labels.
The evaluation metric implementation is available in the class `OpenImagesDetectionChallengeEvaluator`.
1. Download class hierarchy of Open Images Challenge 2018 in JSON format from [here](https://storage.googleapis.com/openimages/challenge_2018/bbox_labels_500_hierarchy.json).
2. Download ground-truth [bounding boxes](https://storage.googleapis.com/openimages/challenge_2018/train/challenge-2018-train-annotations-bbox.csv) and [image-level labels](https://storage.googleapis.com/openimages/challenge_2018/train/challenge-2018-train-annotations-human-imagelabels.csv).
3. Filter the rows corresponding to the validation set images you want to use and store the results in the same CSV format.
4. Run the following command to create hierarchical expansion of the bounding boxes annotations:
The
[Object Detection metric](https://storage.googleapis.com/openimages/web/evaluation.html#object_detection_eval)
protocol requires a pre-processing of the released data to ensure correct
evaluation. The released data contains only leaf-most bounding box annotations
and image-level labels. The evaluation metric implementation is available in the
class `OpenImagesChallengeEvaluator`.
1. Download
[class hierarchy of Open Images Detection Challenge 2019](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-label500-hierarchy.json)
in JSON format.
2. Download
[ground-truth bounding boxes](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-validation-detection-bbox.csv)
and
[image-level labels](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-validation-detection-human-imagelabels.csv).
3. Run the following command to create hierarchical expansion of the bounding
boxes and image-level label annotations:
```
HIERARCHY_FILE=/path/to/bbox_labels_500_hierarchy.json
BOUNDING_BOXES=/path/to/challenge-2018-train-annotations-bbox
IMAGE_LABELS=/path/to/challenge-2018-train-annotations-human-imagelabels
HIERARCHY_FILE=/path/to/challenge-2019-label500-hierarchy.json
BOUNDING_BOXES=/path/to/challenge-2019-validation-detection-bbox
IMAGE_LABELS=/path/to/challenge-2019-validation-detection-human-imagelabels
python object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \
--json_hierarchy_file=${HIERARCHY_FILE} \
......@@ -33,13 +47,18 @@ python object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \
--annotation_type=2
```
After step 4 you will have produced the ground-truth files suitable for running 'OID Challenge Object Detection Metric 2018' evaluation.
4. If you are not using Tensorflow, you can run evaluation directly using your
algorithm's output and generated ground-truth files.
After step 3 you will have produced the ground-truth files suitable for running 'OID
Challenge Object Detection Metric 2019' evaluation. To run the evaluation, use
the following command:
```
INPUT_PREDICTIONS=/path/to/detection_predictions.csv
OUTPUT_METRICS=/path/to/output/metrics/file
python models/research/object_detection/metrics/oid_od_challenge_evaluation.py \
python models/research/object_detection/metrics/oid_challenge_evaluation.py \
--input_annotations_boxes=${BOUNDING_BOXES}_expanded.csv \
--input_annotations_labels=${IMAGE_LABELS}_expanded.csv \
--input_class_labelmap=object_detection/data/oid_object_detection_challenge_500_label_map.pbtxt \
......@@ -47,66 +66,99 @@ python models/research/object_detection/metrics/oid_od_challenge_evaluation.py \
--output_metrics=${OUTPUT_METRICS} \
```
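The script above is a thin wrapper around the `OpenImagesChallengeEvaluator` class. If you prefer to drive the evaluator from Python directly, a minimal sketch is shown below; it mirrors `metrics/oid_challenge_evaluation.py` further down in this diff, and the module paths, label map entries and file names are illustrative placeholders only.

```
import pandas as pd

from object_detection.metrics import oid_challenge_evaluation_utils as utils
from object_detection.utils import object_detection_evaluation

# Illustrative subset of the 500 challenge classes; in practice build these
# from oid_object_detection_challenge_500_label_map.pbtxt.
categories = [{'id': 1, 'name': '/m/04bcr3'}, {'id': 3, 'name': '/m/02gy9n'}]
class_label_map = {'/m/04bcr3': 1, '/m/02gy9n': 3}

evaluator = object_detection_evaluation.OpenImagesChallengeEvaluator(
    categories, evaluate_masks=False)

# Expanded ground truth produced in step 3; box and image-level label
# annotations are concatenated here (an assumption) so both are visible when
# grouping by image.
all_annotations = pd.concat([
    pd.read_csv('challenge-2019-validation-detection-bbox_expanded.csv'),
    pd.read_csv('challenge-2019-validation-detection-human-imagelabels_expanded.csv')
])
all_predictions = pd.read_csv('detection_predictions.csv')

for image_id, image_groundtruth in all_annotations.groupby('ImageID'):
  evaluator.add_single_ground_truth_image_info(
      image_id,
      utils.build_groundtruth_dictionary(image_groundtruth, class_label_map))

for image_id, image_predictions in all_predictions.groupby('ImageID'):
  evaluator.add_single_detected_image_info(
      image_id,
      utils.build_predictions_dictionary(image_predictions, class_label_map))

metrics = evaluator.evaluate()  # dict mapping metric names (e.g. mAP@0.5IOU) to values
```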
### Running evaluation on CSV files directly
For the Object Detection Track, the participants will be ranked on:
5. If you are not using Tensorflow, you can run evaluation directly using your algorithm's output and generated ground-truth files.
- "OpenImagesDetectionChallenge_Precision/mAP@0.5IOU"
To use evaluation within Tensorflow training, use metric name
`oid_challenge_detection_metrics` in the evaluation config.
### Running evaluation using TF Object Detection API
## Instance Segmentation Track
5. Produce tf.Example files suitable for running inference:
The
[Instance Segmentation metric](https://storage.googleapis.com/openimages/web/evaluation.html#instance_segmentation_eval)
can be directly evaluated using the ground-truth data and model predictions. The
evaluation metric implementation is available in the class
`OpenImagesChallengeEvaluator`.
```
RAW_IMAGES_DIR=/path/to/raw_images_location
OUTPUT_DIR=/path/to/output_tfrecords
python object_detection/dataset_tools/create_oid_tf_record.py \
--input_box_annotations_csv ${BOUNDING_BOXES}_expanded.csv \
--input_image_label_annotations_csv ${IMAGE_LABELS}_expanded.csv \
--input_images_directory ${RAW_IMAGES_DIR} \
--input_label_map object_detection/data/oid_object_detection_challenge_500_label_map.pbtxt \
--output_tf_record_path_prefix ${OUTPUT_DIR} \
--num_shards=100
```
1. Download
[class hierarchy of Open Images Instance Segmentation Challenge 2019](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-label300-segmentable-hierarchy.json)
in JSON format.
2. Download
[ground-truth bounding boxes](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-validation-segmentation-bbox.csv)
and
[image-level labels](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-validation-segmentation-labels.csv).
3. Download instance segmentation files for the validation set (see
[Open Images Challenge Downloads page](https://storage.googleapis.com/openimages/web/challenge2019_downloads.html)).
The download consists of a set of .zip archives containing binary .png
masks.
Those should be transformed into a single CSV file in the format:
ImageID,LabelName,ImageWidth,ImageHeight,XMin,YMin,XMax,YMax,GroupOf,Mask
where `Mask` is the base64-encoded, zlib-compressed MS COCO RLE encoding of the binary mask stored in the .png file.
6. Run inference of your model and fill corresponding fields in tf.Example: see [this tutorial](object_detection/g3doc/oid_inference_and_evaluation.md) on running the inference with Tensorflow Object Detection API models.
NOTE: the util to make the transformation will be released soon.
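In the meantime, the per-mask encoding can be sketched as follows, mirroring the `encode_mask` test helper and the `_decode_raw_data_into_masks_and_boxes` decoder further down in this diff (reading the .png with PIL is an assumption, not part of the released tooling):

```
import base64
import zlib

import numpy as np
from PIL import Image  # assumed here only for reading the challenge .png masks
from pycocotools import mask as coco_mask


def encode_binary_mask_png(png_path):
  """Encodes a binary .png mask into the text expected in the Mask column."""
  binary_mask = np.asarray(Image.open(png_path)) > 0
  # pycocotools expects a Fortran-ordered HxWx1 uint8 array.
  mask_to_encode = np.asfortranarray(
      binary_mask.astype(np.uint8)[:, :, np.newaxis])
  rle_counts = coco_mask.encode(mask_to_encode)[0]['counts']
  # zlib-compress and base64-encode so the mask fits in a CSV text field.
  return base64.b64encode(zlib.compress(rle_counts, zlib.Z_BEST_COMPRESSION))
```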
7. Finally, run the evaluation script to produce the final evaluation result.
4. Run the following command to create hierarchical expansion of the instance
segmentation, bounding boxes and image-level label annotations:
```
INPUT_TFRECORDS_WITH_DETECTIONS=/path/to/tf_records_with_detections
OUTPUT_CONFIG_DIR=/path/to/configs
HIERARCHY_FILE=/path/to/challenge-2019-label300-hierarchy.json
BOUNDING_BOXES=/path/to/challenge-2019-validation-detection-bbox
IMAGE_LABELS=/path/to/challenge-2019-validation-detection-human-imagelabels
echo "
label_map_path: 'object_detection/data/oid_object_detection_challenge_500_label_map.pbtxt'
tf_record_input_reader: { input_path: '${INPUT_TFRECORDS_WITH_DETECTIONS}' }
" > ${OUTPUT_CONFIG_DIR}/input_config.pbtxt
python object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \
--json_hierarchy_file=${HIERARCHY_FILE} \
--input_annotations=${BOUNDING_BOXES}.csv \
--output_annotations=${BOUNDING_BOXES}_expanded.csv \
--annotation_type=1
python object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \
--json_hierarchy_file=${HIERARCHY_FILE} \
--input_annotations=${IMAGE_LABELS}.csv \
--output_annotations=${IMAGE_LABELS}_expanded.csv \
--annotation_type=2
echo "
metrics_set: 'oid_challenge_detection_metrics'
" > ${OUTPUT_CONFIG_DIR}/eval_config.pbtxt
python object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \
--json_hierarchy_file=${HIERARCHY_FILE} \
--input_annotations=${INSTANCE_SEGMENTATIONS}.csv \
--output_annotations=${INSTANCE_SEGMENTATIONS}_expanded.csv \
--annotation_type=1
```
OUTPUT_METRICS_DIR=/path/to/metrics_csv
4. If you are not using Tensorflow, you can run evaluation directly using your
algorithm's output and generated ground-truth files.
python object_detection/metrics/offline_eval_map_corloc.py \
--eval_dir=${OUTPUT_METRICS_DIR} \
--eval_config_path=${OUTPUT_CONFIG_DIR}/eval_config.pbtxt \
--input_config_path=${OUTPUT_CONFIG_DIR}/input_config.pbtxt
```
INPUT_PREDICTIONS=/path/to/instance_segmentation_predictions.csv
OUTPUT_METRICS=/path/to/output/metrics/file
The result of the evaluation will be stored in `${OUTPUT_METRICS_DIR}/metrics.csv`
python models/research/object_detection/metrics/oid_challenge_evaluation.py \
--input_annotations_boxes=${BOUNDING_BOXES}_expanded.csv \
--input_annotations_labels=${IMAGE_LABELS}_expanded.csv \
--input_class_labelmap=object_detection/data/oid_object_detection_challenge_500_label_map.pbtxt \
--input_predictions=${INPUT_PREDICTIONS} \
--input_annotations_segm=${INSTANCE_SEGMENTATIONS}_expanded.csv
--output_metrics=${OUTPUT_METRICS} \
```
For the Object Detection Track, the participants will be ranked on:
For the Instance Segmentation Track, the participants will be ranked on:
- "OpenImagesChallenge2018_Precision/mAP@0.5IOU"
- "OpenImagesInstanceSegmentationChallenge_Precision/mAP@0.5IOU"
## Visual Relationships Detection Track
The [Visual Relationships Detection metrics](https://storage.googleapis.com/openimages/web/vrd_detection_metric.html) can be directly evaluated using the ground-truth data and model predictions. The evaluation metric implementation is available in the class `VRDRelationDetectionEvaluator`,`VRDPhraseDetectionEvaluator`.
1. Download the ground-truth [visual relationships annotations](https://storage.googleapis.com/openimages/challenge_2018/train/challenge-2018-train-vrd.csv) and [image-level labels](https://storage.googleapis.com/openimages/challenge_2018/train/challenge-2018-train-vrd-labels.csv).
2. Filter the rows corresponding to the validation set images you want to use and store the results in the same CSV format.
3. Run the following command to produce final metrics:
The
[Visual Relationships Detection metrics](https://storage.googleapis.com/openimages/web/evaluation.html#visual_relationships_eval)
can be directly evaluated using the ground-truth data and model predictions. The
evaluation metric implementation is available in the classes
`VRDRelationDetectionEvaluator` and `VRDPhraseDetectionEvaluator`.
1. Download the ground-truth
[visual relationships annotations](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-validation-vrd.csv)
and
[image-level labels](https://storage.googleapis.com/openimages/challenge_2019/challenge-2019-validation-vrd-labels.csv).
2. Run the following command to produce final metrics:
```
INPUT_ANNOTATIONS_BOXES=/path/to/challenge-2018-train-vrd.csv
......
......@@ -138,6 +138,8 @@ Model name
[^2]: This is PASCAL mAP with a slightly different way of true positives computation: see [Open Images evaluation protocols](evaluation_protocols.md), oid_V2_detection_metrics.
[^3]: Non-face boxes are dropped during training and non-face groundtruth boxes are ignored when evaluating.
[^4]: This is Open Images Challenge metric: see [Open Images evaluation protocols](evaluation_protocols.md), oid_challenge_detection_metrics.
......@@ -135,22 +135,29 @@ output bounding-boxes labelled in the same manner.
The old metric name is DEPRECATED.
`EvalConfig.metrics_set='open_images_V2_detection_metrics'`
## OID Challenge Object Detection Metric 2018
## OID Challenge Object Detection Metric
`EvalConfig.metrics_set='oid_challenge_detection_metrics'`
The metric for the OID Challenge Object Detection Metric 2018, Object Detection
track. The description is provided on the [Open Images Challenge
website](https://storage.googleapis.com/openimages/web/challenge.html).
The metric for the OID Challenge 2018/2019, Object Detection track. The
description is provided on the
[Open Images Challenge website](https://storage.googleapis.com/openimages/web/evaluation.html#object_detection_eval).
The old metric name is DEPRECATED.
`EvalConfig.metrics_set='oid_challenge_object_detection_metrics'`
## OID Challenge Visual Relationship Detection Metric 2018
## OID Challenge Visual Relationship Detection Metric
The metric for the OID Challenge Visual Relationship Detection Metric 2018, Visual
Relationship Detection track. The description is provided on the [Open Images
Challenge
website](https://storage.googleapis.com/openimages/web/challenge.html). Note:
this is currently a stand-alone metric, that can be used only through the
The metric for the OID Challenge 2018/2019, Visual Relationship Detection
track. The description is provided on the
[Open Images Challenge website](https://storage.googleapis.com/openimages/web/evaluation.html#visual_relationships_eval).
Note: this is currently a stand-alone metric that can be used only through the
`metrics/oid_vrd_challenge_evaluation.py` util.
## OID Challenge Instance Segmentation Metric
`EvalConfig.metrics_set='oid_challenge_segmentation_metrics'`
The metric for the OID Challenge Instance Segmentation Metric 2019, Instance
Segmentation track. The description is provided on the
[Open Images Challenge website](https://storage.googleapis.com/openimages/web/evaluation.html#instance_segmentation_eval).
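As a hedged illustration (assuming the `eval_pb2` module generated from `object_detection/protos/eval.proto`, in which `metrics_set` is a repeated string field), the same metric selection can be made from Python instead of a hand-written pbtxt:

```
from google.protobuf import text_format

from object_detection.protos import eval_pb2

# Equivalent to the eval_config.pbtxt snippets above; several metrics can be
# requested at once since metrics_set is repeated.
eval_config = text_format.Parse(
    "metrics_set: 'oid_challenge_detection_metrics'", eval_pb2.EvalConfig())
eval_config.metrics_set.append('oid_challenge_segmentation_metrics')
```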
......@@ -47,6 +47,22 @@ INPUT_BUILDER_UTIL_MAP = {
}
def _multiclass_scores_or_one_hot_labels(multiclass_scores,
groundtruth_boxes,
groundtruth_classes, num_classes):
"""Returns one-hot encoding of classes when multiclass_scores is empty."""
# Replace the groundtruth_classes tensor with the multiclass_scores tensor
# when it is non-empty. If multiclass_scores is empty, fall back on the
# groundtruth_classes tensor.
def true_fn():
return tf.reshape(multiclass_scores,
[tf.shape(groundtruth_boxes)[0], num_classes])
def false_fn():
return tf.one_hot(groundtruth_classes, num_classes)
return tf.cond(tf.size(multiclass_scores) > 0, true_fn, false_fn)
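# Illustration of the fallback (mirroring the tests added further down in this
# diff): with two groundtruth boxes and num_classes=3,
#   multiclass_scores = [0.2, 0.3, 0.5, 0.1, 0.6, 0.3]
#     -> reshaped to [[0.2, 0.3, 0.5], [0.1, 0.6, 0.3]];
#   multiclass_scores = []  (empty)
#     -> tf.one_hot(groundtruth_classes, num_classes) is used instead.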
def transform_input_data(tensor_dict,
model_preprocess_fn,
image_resizer_fn,
......@@ -89,102 +105,106 @@ def transform_input_data(tensor_dict,
and classes for a given image if the boxes are exactly the same.
retain_original_image: (optional) whether to retain original image in the
output dictionary.
use_multiclass_scores: whether to use multiclass scores as
class targets instead of one-hot encoding of `groundtruth_classes`.
use_multiclass_scores: whether to use multiclass scores as class targets
instead of one-hot encoding of `groundtruth_classes`. When
this is True and multiclass_scores is empty, one-hot encoding of
`groundtruth_classes` is used as a fallback.
use_bfloat16: (optional) a bool, whether to use bfloat16 in training.
Returns:
A dictionary keyed by fields.InputDataFields containing the tensors obtained
after applying all the transformations.
"""
# Reshape flattened multiclass scores tensor into a 2D tensor of shape
# [num_boxes, num_classes].
if fields.InputDataFields.multiclass_scores in tensor_dict:
tensor_dict[fields.InputDataFields.multiclass_scores] = tf.reshape(
tensor_dict[fields.InputDataFields.multiclass_scores], [
tf.shape(tensor_dict[fields.InputDataFields.groundtruth_boxes])[0],
num_classes
])
if fields.InputDataFields.groundtruth_boxes in tensor_dict:
tensor_dict = util_ops.filter_groundtruth_with_nan_box_coordinates(
tensor_dict)
tensor_dict = util_ops.filter_unrecognized_classes(tensor_dict)
out_tensor_dict = tensor_dict.copy()
if fields.InputDataFields.multiclass_scores in out_tensor_dict:
out_tensor_dict[
fields.InputDataFields
.multiclass_scores] = _multiclass_scores_or_one_hot_labels(
out_tensor_dict[fields.InputDataFields.multiclass_scores],
out_tensor_dict[fields.InputDataFields.groundtruth_boxes],
out_tensor_dict[fields.InputDataFields.groundtruth_classes],
num_classes)
if fields.InputDataFields.groundtruth_boxes in out_tensor_dict:
out_tensor_dict = util_ops.filter_groundtruth_with_nan_box_coordinates(
out_tensor_dict)
out_tensor_dict = util_ops.filter_unrecognized_classes(out_tensor_dict)
if retain_original_image:
tensor_dict[fields.InputDataFields.original_image] = tf.cast(
image_resizer_fn(tensor_dict[fields.InputDataFields.image], None)[0],
tf.uint8)
out_tensor_dict[fields.InputDataFields.original_image] = tf.cast(
image_resizer_fn(out_tensor_dict[fields.InputDataFields.image],
None)[0], tf.uint8)
if fields.InputDataFields.image_additional_channels in tensor_dict:
channels = tensor_dict[fields.InputDataFields.image_additional_channels]
tensor_dict[fields.InputDataFields.image] = tf.concat(
[tensor_dict[fields.InputDataFields.image], channels], axis=2)
if fields.InputDataFields.image_additional_channels in out_tensor_dict:
channels = out_tensor_dict[fields.InputDataFields.image_additional_channels]
out_tensor_dict[fields.InputDataFields.image] = tf.concat(
[out_tensor_dict[fields.InputDataFields.image], channels], axis=2)
# Apply data augmentation ops.
if data_augmentation_fn is not None:
tensor_dict = data_augmentation_fn(tensor_dict)
out_tensor_dict = data_augmentation_fn(out_tensor_dict)
# Apply model preprocessing ops and resize instance masks.
image = tensor_dict[fields.InputDataFields.image]
image = out_tensor_dict[fields.InputDataFields.image]
preprocessed_resized_image, true_image_shape = model_preprocess_fn(
tf.expand_dims(tf.cast(image, dtype=tf.float32), axis=0))
if use_bfloat16:
preprocessed_resized_image = tf.cast(
preprocessed_resized_image, tf.bfloat16)
tensor_dict[fields.InputDataFields.image] = tf.squeeze(
out_tensor_dict[fields.InputDataFields.image] = tf.squeeze(
preprocessed_resized_image, axis=0)
tensor_dict[fields.InputDataFields.true_image_shape] = tf.squeeze(
out_tensor_dict[fields.InputDataFields.true_image_shape] = tf.squeeze(
true_image_shape, axis=0)
if fields.InputDataFields.groundtruth_instance_masks in tensor_dict:
masks = tensor_dict[fields.InputDataFields.groundtruth_instance_masks]
if fields.InputDataFields.groundtruth_instance_masks in out_tensor_dict:
masks = out_tensor_dict[fields.InputDataFields.groundtruth_instance_masks]
_, resized_masks, _ = image_resizer_fn(image, masks)
if use_bfloat16:
resized_masks = tf.cast(resized_masks, tf.bfloat16)
tensor_dict[fields.InputDataFields.
groundtruth_instance_masks] = resized_masks
out_tensor_dict[
fields.InputDataFields.groundtruth_instance_masks] = resized_masks
# Transform groundtruth classes to one hot encodings.
label_offset = 1
zero_indexed_groundtruth_classes = tensor_dict[
zero_indexed_groundtruth_classes = out_tensor_dict[
fields.InputDataFields.groundtruth_classes] - label_offset
tensor_dict[fields.InputDataFields.groundtruth_classes] = tf.one_hot(
zero_indexed_groundtruth_classes, num_classes)
if use_multiclass_scores:
tensor_dict[fields.InputDataFields.groundtruth_classes] = tensor_dict[
fields.InputDataFields.multiclass_scores]
tensor_dict.pop(fields.InputDataFields.multiclass_scores, None)
out_tensor_dict[
fields.InputDataFields.groundtruth_classes] = out_tensor_dict[
fields.InputDataFields.multiclass_scores]
else:
out_tensor_dict[fields.InputDataFields.groundtruth_classes] = tf.one_hot(
zero_indexed_groundtruth_classes, num_classes)
out_tensor_dict.pop(fields.InputDataFields.multiclass_scores, None)
if fields.InputDataFields.groundtruth_confidences in tensor_dict:
groundtruth_confidences = tensor_dict[
if fields.InputDataFields.groundtruth_confidences in out_tensor_dict:
groundtruth_confidences = out_tensor_dict[
fields.InputDataFields.groundtruth_confidences]
# Map the confidences to the one-hot encoding of classes
tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
out_tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
tf.reshape(groundtruth_confidences, [-1, 1]) *
tensor_dict[fields.InputDataFields.groundtruth_classes])
out_tensor_dict[fields.InputDataFields.groundtruth_classes])
else:
groundtruth_confidences = tf.ones_like(
zero_indexed_groundtruth_classes, dtype=tf.float32)
tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
tensor_dict[fields.InputDataFields.groundtruth_classes])
out_tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
out_tensor_dict[fields.InputDataFields.groundtruth_classes])
if merge_multiple_boxes:
merged_boxes, merged_classes, merged_confidences, _ = (
util_ops.merge_boxes_with_multiple_labels(
tensor_dict[fields.InputDataFields.groundtruth_boxes],
out_tensor_dict[fields.InputDataFields.groundtruth_boxes],
zero_indexed_groundtruth_classes,
groundtruth_confidences,
num_classes))
merged_classes = tf.cast(merged_classes, tf.float32)
tensor_dict[fields.InputDataFields.groundtruth_boxes] = merged_boxes
tensor_dict[fields.InputDataFields.groundtruth_classes] = merged_classes
tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
out_tensor_dict[fields.InputDataFields.groundtruth_boxes] = merged_boxes
out_tensor_dict[fields.InputDataFields.groundtruth_classes] = merged_classes
out_tensor_dict[fields.InputDataFields.groundtruth_confidences] = (
merged_confidences)
if fields.InputDataFields.groundtruth_boxes in tensor_dict:
tensor_dict[fields.InputDataFields.num_groundtruth_boxes] = tf.shape(
tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]
if fields.InputDataFields.groundtruth_boxes in out_tensor_dict:
out_tensor_dict[fields.InputDataFields.num_groundtruth_boxes] = tf.shape(
out_tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]
return tensor_dict
return out_tensor_dict
def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
......
......@@ -611,6 +611,62 @@ class DataTransformationFnTest(test_case.TestCase):
self.assertAllClose(transformed_inputs[fields.InputDataFields.image],
np.concatenate((image, additional_channels), axis=2))
def test_use_multiclass_scores_when_present(self):
image = np.random.rand(4, 4, 3).astype(np.float32)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(image),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1, 1], [.5, .5, 1, 1]], np.float32)),
fields.InputDataFields.multiclass_scores:
tf.constant(np.array([0.2, 0.3, 0.5, 0.1, 0.6, 0.3], np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([1, 2], np.int32))
}
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=3, use_multiclass_scores=True)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(
np.array([[0.2, 0.3, 0.5], [0.1, 0.6, 0.3]], np.float32),
transformed_inputs[fields.InputDataFields.groundtruth_classes])
def test_use_multiclass_scores_when_not_present(self):
image = np.random.rand(4, 4, 3).astype(np.float32)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(image),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1, 1], [.5, .5, 1, 1]], np.float32)),
fields.InputDataFields.multiclass_scores:
tf.placeholder(tf.float32),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([1, 2], np.int32))
}
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=3, use_multiclass_scores=True)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict),
feed_dict={
tensor_dict[fields.InputDataFields.multiclass_scores]:
np.array([], dtype=np.float32)
})
self.assertAllClose(
np.array([[0, 1, 0], [0, 0, 1]], np.float32),
transformed_inputs[fields.InputDataFields.groundtruth_classes])
def test_returns_correct_class_label_encodings(self):
tensor_dict = {
fields.InputDataFields.image:
......
......@@ -108,6 +108,7 @@ from object_detection.core import standard_fields as fields
from object_detection.core import target_assigner
from object_detection.utils import ops
from object_detection.utils import shape_utils
from object_detection.utils import variables_helper
slim = tf.contrib.slim
......@@ -210,7 +211,7 @@ class FasterRCNNFeatureExtractor(object):
the model graph.
"""
variables_to_restore = {}
for variable in tf.global_variables():
for variable in variables_helper.get_global_variables_safely():
for scope_name in [first_stage_feature_extractor_scope,
second_stage_feature_extractor_scope]:
if variable.op.name.startswith(scope_name):
......@@ -275,7 +276,7 @@ class FasterRCNNKerasFeatureExtractor(object):
the model graph.
"""
variables_to_restore = {}
for variable in tf.global_variables():
for variable in variables_helper.get_global_variables_safely():
for scope_name in [first_stage_feature_extractor_scope,
second_stage_feature_extractor_scope]:
if variable.op.name.startswith(scope_name):
......@@ -1193,6 +1194,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
detection_masks = self._gather_instance_masks(
detection_masks, detection_classes)
detection_masks = tf.cast(detection_masks, tf.float32)
prediction_dict[fields.DetectionResultFields.detection_masks] = (
tf.reshape(tf.sigmoid(detection_masks),
[batch_size, max_detection, mask_height, mask_width]))
......@@ -1461,9 +1463,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
mask_predictions=mask_predictions)
if 'rpn_features_to_crop' in prediction_dict and self._initial_crop_size:
self._add_detection_features_output_node(
detections_dict[fields.DetectionResultFields.detection_boxes],
prediction_dict['rpn_features_to_crop'])
detections_dict[
'detection_features'] = self._add_detection_features_output_node(
detections_dict[fields.DetectionResultFields.detection_boxes],
prediction_dict['rpn_features_to_crop'])
return detections_dict
......@@ -1474,18 +1477,25 @@ class FasterRCNNMetaArch(model.DetectionModel):
def _add_detection_features_output_node(self, detection_boxes,
rpn_features_to_crop):
"""Add the detection features to the output node.
"""Add detection features to outputs.
The detection features are from cropping rpn_features with boxes.
Each bounding box has one feature vector of length depth, which comes from
mean_pooling of the cropped rpn_features.
This function extracts box features for each box in rpn_features_to_crop.
It returns the extracted box features, reshaped to
[batch size, max_detections, height, width, depth], average pools the
extracted features across the spatial dimensions, and adds a graph node for
the pooled features named 'pooled_detection_features'.
Args:
detection_boxes: a 3-D float32 tensor of shape
[batch_size, max_detection, 4] which represents the bounding boxes.
[batch_size, max_detections, 4] which represents the bounding boxes.
rpn_features_to_crop: A 4-D float32 tensor with shape
[batch, height, width, depth] representing image features to crop using
the proposals boxes.
Returns:
detection_features: a 5-D float32 tensor of shape
[batch size, max_detections, height, width, depth] representing
cropped image features
"""
with tf.name_scope('SecondStageDetectionFeaturesExtract'):
flattened_detected_feature_maps = (
......@@ -1495,15 +1505,23 @@ class FasterRCNNMetaArch(model.DetectionModel):
flattened_detected_feature_maps)
batch_size = tf.shape(detection_boxes)[0]
max_detection = tf.shape(detection_boxes)[1]
max_detections = tf.shape(detection_boxes)[1]
detection_features_pool = tf.reduce_mean(
detection_features_unpooled, axis=[1, 2])
detection_features = tf.reshape(
reshaped_detection_features_pool = tf.reshape(
detection_features_pool,
[batch_size, max_detection, tf.shape(detection_features_pool)[-1]])
[batch_size, max_detections, tf.shape(detection_features_pool)[-1]])
reshaped_detection_features_pool = tf.identity(
reshaped_detection_features_pool, 'pooled_detection_features')
detection_features = tf.identity(
detection_features, 'detection_features')
reshaped_detection_features = tf.reshape(
detection_features_unpooled,
[batch_size, max_detections,
tf.shape(detection_features_unpooled)[1],
tf.shape(detection_features_unpooled)[2],
tf.shape(detection_features_unpooled)[3]])
return reshaped_detection_features
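# Illustration (not part of the API surface): given the `detections` dict
# returned by postprocess(), 'detection_features' has shape
# [batch_size, max_detections, height, width, depth], so a per-box embedding
# of length depth (the same values exposed by the 'pooled_detection_features'
# node above) can be recovered with
#   tf.reduce_mean(detections['detection_features'], axis=[2, 3])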
def _postprocess_rpn(self,
rpn_box_encodings_batch,
......@@ -1749,6 +1767,15 @@ class FasterRCNNMetaArch(model.DetectionModel):
resized_masks_list.append(resized_mask)
groundtruth_masks_list = resized_masks_list
# Masks could be set to bfloat16 in the input pipeline for performance
# reasons. Convert masks back to floating point space here since the rest of
# this module assumes groundtruth to be of float32 type.
float_groundtruth_masks_list = []
if groundtruth_masks_list:
for mask in groundtruth_masks_list:
float_groundtruth_masks_list.append(tf.cast(mask, tf.float32))
groundtruth_masks_list = float_groundtruth_masks_list
if self.groundtruth_has_field(fields.BoxListFields.weights):
groundtruth_weights_list = self.groundtruth_lists(
fields.BoxListFields.weights)
......@@ -2619,7 +2646,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
self.first_stage_feature_extractor_scope,
self.second_stage_feature_extractor_scope)
variables_to_restore = tf.global_variables()
variables_to_restore = variables_helper.get_global_variables_safely()
variables_to_restore.append(slim.get_or_create_global_step())
# Only load feature extractor variables to be consistent with loading from
# a classification checkpoint.
......
......@@ -383,6 +383,11 @@ class FasterRCNNMetaArchTest(
class_predictions_with_background_shapes = [(16, 3), (None, 3)]
proposal_boxes_shapes = [(2, 8, 4), (None, 8, 4)]
batch_size = 2
initial_crop_size = 3
maxpool_stride = 1
height = initial_crop_size/maxpool_stride
width = initial_crop_size/maxpool_stride
depth = 3
image_shape = np.array((2, 36, 48, 3), dtype=np.int32)
for (num_proposals_shape, refined_box_encoding_shape,
class_predictions_with_background_shape,
......@@ -433,6 +438,7 @@ class FasterRCNNMetaArchTest(
'detection_scores': tf.zeros([2, 5]),
'detection_classes': tf.zeros([2, 5]),
'num_detections': tf.zeros([2]),
'detection_features': tf.zeros([2, 5, width, height, depth])
}, true_image_shapes)
with self.test_session(graph=tf_graph) as sess:
detections_out = sess.run(
......@@ -453,6 +459,9 @@ class FasterRCNNMetaArchTest(
self.assertAllClose(detections_out['num_detections'].shape, [2])
self.assertTrue(np.amax(detections_out['detection_masks'] <= 1.0))
self.assertTrue(np.amin(detections_out['detection_masks'] >= 0.0))
self.assertAllEqual(detections_out['detection_features'].shape,
[2, 5, width, height, depth])
self.assertGreaterEqual(np.amax(detections_out['detection_features']), 0)
def _get_box_classifier_features_shape(self,
image_size,
......
......@@ -28,6 +28,7 @@ from object_detection.core import standard_fields as fields
from object_detection.core import target_assigner
from object_detection.utils import ops
from object_detection.utils import shape_utils
from object_detection.utils import variables_helper
from object_detection.utils import visualization_utils
slim = tf.contrib.slim
......@@ -45,6 +46,7 @@ class SSDFeatureExtractor(object):
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
num_layers=6,
override_base_feature_extractor_hyperparams=False):
"""Constructor.
......@@ -61,6 +63,7 @@ class SSDFeatureExtractor(object):
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False.
use_depthwise: Whether to use depthwise convolutions. Default is False.
num_layers: Number of SSD layers.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams_fn`.
......@@ -73,6 +76,7 @@ class SSDFeatureExtractor(object):
self._reuse_weights = reuse_weights
self._use_explicit_padding = use_explicit_padding
self._use_depthwise = use_depthwise
self._num_layers = num_layers
self._override_base_feature_extractor_hyperparams = (
override_base_feature_extractor_hyperparams)
......@@ -126,7 +130,7 @@ class SSDFeatureExtractor(object):
the model graph.
"""
variables_to_restore = {}
for variable in tf.global_variables():
for variable in variables_helper.get_global_variables_safely():
var_name = variable.op.name
if var_name.startswith(feature_extractor_scope + '/'):
var_name = var_name.replace(feature_extractor_scope + '/', '')
......@@ -148,6 +152,7 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
inplace_batchnorm_update,
use_explicit_padding=False,
use_depthwise=False,
num_layers=6,
override_base_feature_extractor_hyperparams=False,
name=None):
"""Constructor.
......@@ -172,6 +177,7 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False.
use_depthwise: Whether to use depthwise convolutions. Default is False.
num_layers: Number of SSD layers.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams_config`.
......@@ -189,6 +195,7 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
self._inplace_batchnorm_update = inplace_batchnorm_update
self._use_explicit_padding = use_explicit_padding
self._use_depthwise = use_depthwise
self._num_layers = num_layers
self._override_base_feature_extractor_hyperparams = (
override_base_feature_extractor_hyperparams)
......@@ -247,11 +254,13 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
the model graph.
"""
variables_to_restore = {}
for variable in tf.global_variables():
var_name = variable.op.name
for variable in self.variables:
# variable.name includes ":0" at the end, but the names in the checkpoint
# do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
if var_name.startswith(feature_extractor_scope + '/'):
var_name = var_name.replace(feature_extractor_scope + '/', '')
variables_to_restore[var_name] = variable
variables_to_restore[var_name] = variable
return variables_to_restore
......@@ -709,6 +718,14 @@ class SSDMetaArch(model.DetectionModel):
additional_fields = {
'multiclass_scores': detection_scores_with_background
}
if self._anchors is not None:
anchor_indices = tf.range(self._anchors.num_boxes_static())
batch_anchor_indices = tf.tile(
tf.expand_dims(anchor_indices, 0), [batch_size, 1])
# All additional fields need to be float.
additional_fields.update({
'anchor_indices': tf.cast(batch_anchor_indices, tf.float32),
})
if detection_keypoints is not None:
detection_keypoints = tf.identity(
detection_keypoints, 'raw_keypoint_locations')
......@@ -737,6 +754,12 @@ class SSDMetaArch(model.DetectionModel):
fields.DetectionResultFields.raw_detection_scores:
detection_scores_with_background
}
if (nmsed_additional_fields is not None and
'anchor_indices' in nmsed_additional_fields):
detection_dict.update({
fields.DetectionResultFields.detection_anchor_indices:
tf.cast(nmsed_additional_fields['anchor_indices'], tf.int32),
})
if (nmsed_additional_fields is not None and
fields.BoxListFields.keypoints in nmsed_additional_fields):
detection_dict[fields.DetectionResultFields.detection_keypoints] = (
......@@ -1218,13 +1241,24 @@ class SSDMetaArch(model.DetectionModel):
if fine_tune_checkpoint_type == 'detection':
variables_to_restore = {}
for variable in tf.global_variables():
var_name = variable.op.name
if load_all_detection_checkpoint_vars:
variables_to_restore[var_name] = variable
else:
if var_name.startswith(self._extract_features_scope):
if tf.executing_eagerly():
for variable in self.variables:
# variable.name includes ":0" at the end, but the names in the
# checkpoint do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
if load_all_detection_checkpoint_vars:
variables_to_restore[var_name] = variable
else:
if var_name.startswith(self._extract_features_scope):
variables_to_restore[var_name] = variable
else:
for variable in variables_helper.get_global_variables_safely():
var_name = variable.op.name
if load_all_detection_checkpoint_vars:
variables_to_restore[var_name] = variable
else:
if var_name.startswith(self._extract_features_scope):
variables_to_restore[var_name] = variable
return variables_to_restore
......
......@@ -188,6 +188,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
[0.5, 0., 1., 0.5], [1., 1., 1.5, 1.5]]]
raw_detection_scores = [[[0, 0], [0, 0], [0, 0], [0, 0]],
[[0, 0], [0, 0], [0, 0], [0, 0]]]
detection_anchor_indices = [[0, 2, 1, 0, 0], [0, 2, 1, 0, 0]]
for input_shape in input_shapes:
tf_graph = tf.Graph()
......@@ -229,6 +230,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
raw_detection_boxes)
self.assertAllEqual(detections_out['raw_detection_scores'],
raw_detection_scores)
self.assertAllEqual(detections_out['detection_anchor_indices'],
detection_anchor_indices)
def test_postprocess_results_are_correct_static(self, use_keras):
with tf.Graph().as_default():
......
......@@ -13,7 +13,12 @@
# limitations under the License.
# ==============================================================================
"""Class for evaluating object detections with COCO metrics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from six.moves import zip
import tensorflow as tf
from object_detection.core import standard_fields
......
......@@ -39,6 +39,10 @@ then evaluation (in multi-class mode) can be invoked as follows:
metrics = evaluator.ComputeMetrics()
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import OrderedDict
import copy
import time
......@@ -48,6 +52,8 @@ from pycocotools import coco
from pycocotools import cocoeval
from pycocotools import mask
from six.moves import range
from six.moves import zip
import tensorflow as tf
from object_detection.utils import json_utils
......
......@@ -40,6 +40,8 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
from absl import app
from absl import flags
import pandas as pd
......@@ -120,20 +122,22 @@ def main(unused_argv):
object_detection_evaluation.OpenImagesChallengeEvaluator(
categories, evaluate_masks=is_instance_segmentation_eval))
all_predictions = pd.read_csv(FLAGS.input_predictions)
images_processed = 0
for _, groundtruth in enumerate(all_annotations.groupby('ImageID')):
logging.info('Processing image %d', images_processed)
image_id, image_groundtruth = groundtruth
groundtruth_dictionary = utils.build_groundtruth_dictionary(
image_groundtruth, class_label_map)
challenge_evaluator.add_single_ground_truth_image_info(
image_id, groundtruth_dictionary)
all_predictions = pd.read_csv(FLAGS.input_predictions)
for _, prediction_data in enumerate(all_predictions.groupby('ImageID')):
image_id, image_predictions = prediction_data
prediction_dictionary = utils.build_predictions_dictionary(
image_predictions, class_label_map)
all_predictions.loc[all_predictions['ImageID'] == image_id],
class_label_map)
challenge_evaluator.add_single_detected_image_info(image_id,
prediction_dictionary)
images_processed += 1
metrics = challenge_evaluator.evaluate()
......
......@@ -18,10 +18,13 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import base64
import zlib
import numpy as np
import pandas as pd
from pycocotools import mask as coco_mask
from pycocotools import mask
from object_detection.core import standard_fields
......@@ -53,33 +56,42 @@ def _decode_raw_data_into_masks_and_boxes(segments, image_widths,
"""Decods binary segmentation masks into np.arrays and boxes.
Args:
segments: pandas Series object containing either None entries or strings
with COCO-encoded binary masks. All masks are expected to be the same size.
segments: pandas Series object containing either
None entries, or strings with
base64, zlib compressed, COCO RLE-encoded binary masks.
All masks are expected to be the same size.
image_widths: pandas Series of mask widths.
image_heights: pandas Series of mask heights.
Returns:
a np.ndarray of the size NxWxH, where W and H are determined from the encoded
masks; for the None values, zero arrays of size WxH are created. if input
masks; for the None values, zero arrays of size WxH are created. If input
contains only None values, W=1, H=1.
"""
segment_masks = []
segment_boxes = []
ind = segments.first_valid_index()
if ind is not None:
size = [int(image_heights.iloc[ind]), int(image_widths[ind])]
size = [int(image_heights[ind]), int(image_widths[ind])]
else:
# It does not matter which size we pick since no masks will ever be
# evaluated.
size = [1, 1]
return np.zeros((segments.shape[0], 1, 1), dtype=np.uint8), np.zeros(
(segments.shape[0], 4), dtype=np.float32)
for segment, im_width, im_height in zip(segments, image_widths,
image_heights):
if pd.isnull(segment):
segment_masks.append(np.zeros([1, size[0], size[1]], dtype=np.uint8))
segment_boxes.append(np.expand_dims(np.array([0.0, 0.0, 0.0, 0.0]), 0))
else:
encoding_dict = {'size': [im_height, im_width], 'counts': segment}
mask_tensor = mask.decode(encoding_dict)
compressed_mask = base64.b64decode(segment)
rle_encoded_mask = zlib.decompress(compressed_mask)
decoding_dict = {
'size': [im_height, im_width],
'counts': rle_encoded_mask
}
mask_tensor = coco_mask.decode(decoding_dict)
segment_masks.append(np.expand_dims(mask_tensor, 0))
segment_boxes.append(np.expand_dims(_to_normalized_box(mask_tensor), 0))
......
......@@ -18,15 +18,43 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import base64
import zlib
import numpy as np
import pandas as pd
from pycocotools import mask
from pycocotools import mask as coco_mask
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.metrics import oid_challenge_evaluation_utils as utils
def encode_mask(mask_to_encode):
"""Encodes a binary mask into the Kaggle challenge text format.
The encoding is done in three stages:
- COCO RLE-encoding,
- zlib compression,
- base64 encoding (to use as entry in csv file).
Args:
mask_to_encode: binary np.ndarray of dtype bool and 2d shape.
Returns:
A (base64) text string of the encoded mask.
"""
mask_to_encode = np.squeeze(mask_to_encode)
mask_to_encode = mask_to_encode.reshape(mask_to_encode.shape[0],
mask_to_encode.shape[1], 1)
mask_to_encode = mask_to_encode.astype(np.uint8)
mask_to_encode = np.asfortranarray(mask_to_encode)
encoded_mask = coco_mask.encode(mask_to_encode)[0]['counts']
compressed_mask = zlib.compress(encoded_mask, zlib.Z_BEST_COMPRESSION)
base64_mask = base64.b64encode(compressed_mask)
return base64_mask
class OidUtilTest(tf.test.TestCase):
def testMaskToNormalizedBox(self):
......@@ -44,10 +72,10 @@ class OidUtilTest(tf.test.TestCase):
mask1 = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]], dtype=np.uint8)
mask2 = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]], dtype=np.uint8)
encoding1 = mask.encode(np.asfortranarray(mask1))
encoding2 = mask.encode(np.asfortranarray(mask2))
encoding1 = encode_mask(mask1)
encoding2 = encode_mask(mask2)
vals = pd.Series([encoding1['counts'], encoding2['counts']])
vals = pd.Series([encoding1, encoding2])
image_widths = pd.Series([mask1.shape[1], mask2.shape[1]])
image_heights = pd.Series([mask1.shape[0], mask2.shape[0]])
......@@ -60,6 +88,15 @@ class OidUtilTest(tf.test.TestCase):
self.assertAllEqual(expected_segm, segm)
self.assertAllEqual(expected_bbox, bbox)
def testDecodeToTensorsNoMasks(self):
vals = pd.Series([None, None])
image_widths = pd.Series([None, None])
image_heights = pd.Series([None, None])
segm, bbox = utils._decode_raw_data_into_masks_and_boxes(
vals, image_widths, image_heights)
self.assertAllEqual(np.zeros((2, 1, 1), dtype=np.uint8), segm)
self.assertAllEqual(np.zeros((2, 4), dtype=np.float32), bbox)
class OidChallengeEvaluationUtilTest(tf.test.TestCase):
......@@ -140,13 +177,13 @@ class OidChallengeEvaluationUtilTest(tf.test.TestCase):
mask2 = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
dtype=np.uint8)
encoding1 = mask.encode(np.asfortranarray(mask1))
encoding2 = mask.encode(np.asfortranarray(mask2))
encoding1 = encode_mask(mask1)
encoding2 = encode_mask(mask2)
np_data = pd.DataFrame(
[[
'fe58ec1b06db2bb7', mask1.shape[1], mask1.shape[0], '/m/04bcr3',
0.0, 0.3, 0.5, 0.6, 0, None, encoding1['counts']
0.0, 0.3, 0.5, 0.6, 0, None, encoding1
],
[
'fe58ec1b06db2bb7', None, None, '/m/02gy9n', 0.1, 0.2, 0.3, 0.4, 1,
......@@ -154,7 +191,7 @@ class OidChallengeEvaluationUtilTest(tf.test.TestCase):
],
[
'fe58ec1b06db2bb7', mask2.shape[1], mask2.shape[0], '/m/02gy9n',
0.5, 0.6, 0.8, 0.9, 0, None, encoding2['counts']
0.5, 0.6, 0.8, 0.9, 0, None, encoding2
],
[
'fe58ec1b06db2bb7', None, None, '/m/04bcr3', None, None, None,
......@@ -218,21 +255,21 @@ class OidChallengeEvaluationUtilTest(tf.test.TestCase):
mask2 = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
dtype=np.uint8)
encoding1 = mask.encode(np.asfortranarray(mask1))
encoding2 = mask.encode(np.asfortranarray(mask2))
encoding1 = encode_mask(mask1)
encoding2 = encode_mask(mask2)
np_data = pd.DataFrame(
[[
'fe58ec1b06db2bb7', mask1.shape[1], mask1.shape[0], '/m/04bcr3',
encoding1['counts'], 0.8
],
[
'fe58ec1b06db2bb7', mask2.shape[1], mask2.shape[0], '/m/02gy9n',
encoding2['counts'], 0.6
]],
columns=[
'ImageID', 'ImageWidth', 'ImageHeight', 'LabelName', 'Mask', 'Score'
])
np_data = pd.DataFrame([[
'fe58ec1b06db2bb7', mask1.shape[1], mask1.shape[0], '/m/04bcr3',
encoding1, 0.8
],
[
'fe58ec1b06db2bb7', mask2.shape[1],
mask2.shape[0], '/m/02gy9n', encoding2, 0.6
]],
columns=[
'ImageID', 'ImageWidth', 'ImageHeight',
'LabelName', 'Mask', 'Score'
])
class_label_map = {'/m/04bcr3': 1, '/m/02gy9n': 3}
prediction_dictionary = utils.build_predictions_dictionary(
np_data, class_label_map)
......
......@@ -24,7 +24,6 @@ import os
import tensorflow as tf
from tensorflow.python.util import function_utils
from object_detection import eval_util
from object_detection import exporter as exporter_lib
from object_detection import inputs
......@@ -187,7 +186,7 @@ def unstack_batch(tensor_dict, unpad_groundtruth_tensors=True):
return unbatched_tensor_dict
def _provide_groundtruth(model, labels):
def provide_groundtruth(model, labels):
"""Provides the labels to a model as groundtruth.
This helper function extracts the corresponding boxes, classes,
......@@ -287,7 +286,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
labels, unpad_groundtruth_tensors=unpad_groundtruth_tensors)
if mode in (tf.estimator.ModeKeys.TRAIN, tf.estimator.ModeKeys.EVAL):
_provide_groundtruth(detection_model, labels)
provide_groundtruth(detection_model, labels)
preprocessed_images = features[fields.InputDataFields.image]
if use_tpu and train_config.use_bfloat16:
......@@ -524,7 +523,7 @@ def create_estimator_and_inputs(run_config,
pipeline_config_path,
config_override=None,
train_steps=None,
sample_1_of_n_eval_examples=1,
sample_1_of_n_eval_examples=None,
sample_1_of_n_eval_on_train_examples=1,
model_fn_creator=create_model_fn,
use_tpu_estimator=False,
......@@ -606,9 +605,12 @@ def create_estimator_and_inputs(run_config,
pipeline_config_path, config_override=config_override)
kwargs.update({
'train_steps': train_steps,
'sample_1_of_n_eval_examples': sample_1_of_n_eval_examples,
'use_bfloat16': configs['train_config'].use_bfloat16 and use_tpu
})
if sample_1_of_n_eval_examples >= 1:
kwargs.update({
'sample_1_of_n_eval_examples': sample_1_of_n_eval_examples
})
if override_eval_num_epochs:
kwargs.update({'eval_num_epochs': 1})
tf.logging.warning(
......@@ -667,11 +669,6 @@ def create_estimator_and_inputs(run_config,
model_fn = model_fn_creator(detection_model_fn, configs, hparams, use_tpu,
postprocess_on_cpu)
if use_tpu_estimator:
# Multicore inference disabled due to b/129367127
tpu_estimator_args = function_utils.fn_args(tf.contrib.tpu.TPUEstimator)
kwargs = {}
if 'experimental_export_device_assignment' in tpu_estimator_args:
kwargs['experimental_export_device_assignment'] = True
estimator = tf.contrib.tpu.TPUEstimator(
model_fn=model_fn,
train_batch_size=train_config.batch_size,
......@@ -681,8 +678,7 @@ def create_estimator_and_inputs(run_config,
config=run_config,
export_to_tpu=export_to_tpu,
eval_on_tpu=False, # Eval runs on CPU, so disable eval on TPU
params=params if params else {},
**kwargs)
params=params if params else {})
else:
estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
......
This diff is collapsed.
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object detection model library."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import tensorflow as tf
from object_detection import model_hparams
from object_detection import model_lib_v2
from object_detection.utils import config_util
# Model for test. Current options are:
# 'ssd_mobilenet_v2_pets_keras'
MODEL_NAME_FOR_TEST = 'ssd_mobilenet_v2_pets_keras'
def _get_data_path():
"""Returns an absolute path to TFRecord file."""
return os.path.join(tf.resource_loader.get_data_files_path(), 'test_data',
'pets_examples.record')
def get_pipeline_config_path(model_name):
"""Returns path to the local pipeline config file."""
return os.path.join(tf.resource_loader.get_data_files_path(), 'samples',
'configs', model_name + '.config')
def _get_labelmap_path():
"""Returns an absolute path to label map file."""
return os.path.join(tf.resource_loader.get_data_files_path(), 'data',
'pet_label_map.pbtxt')
def _get_config_kwarg_overrides():
"""Returns overrides to the configs that insert the correct local paths."""
data_path = _get_data_path()
label_map_path = _get_labelmap_path()
return {
'train_input_path': data_path,
'eval_input_path': data_path,
'label_map_path': label_map_path
}
def _get_configs_for_model(model_name):
"""Returns configurations for model."""
filename = get_pipeline_config_path(model_name)
configs = config_util.get_configs_from_pipeline_file(filename)
configs = config_util.merge_external_params_with_configs(
configs, kwargs_dict=_get_config_kwarg_overrides())
return configs
class ModelLibTest(tf.test.TestCase):
@classmethod
def setUpClass(cls):
tf.keras.backend.clear_session()
def test_train_loop_then_eval_loop(self):
"""Tests that Estimator and input function are constructed correctly."""
hparams = model_hparams.create_hparams(
hparams_overrides='load_pretrained=false')
pipeline_config_path = get_pipeline_config_path(MODEL_NAME_FOR_TEST)
config_kwarg_overrides = _get_config_kwarg_overrides()
model_dir = tf.test.get_temp_dir()
train_steps = 2
model_lib_v2.train_loop(
hparams,
pipeline_config_path,
model_dir=model_dir,
train_steps=train_steps,
checkpoint_every_n=1,
**config_kwarg_overrides)
model_lib_v2.eval_continuously(
hparams,
pipeline_config_path,
model_dir=model_dir,
checkpoint_dir=model_dir,
train_steps=train_steps,
wait_interval=10,
**config_kwarg_overrides)
......@@ -25,6 +25,7 @@ Huang et al. (https://arxiv.org/abs/1611.10012)
import tensorflow as tf
from object_detection.meta_architectures import faster_rcnn_meta_arch
from object_detection.utils import variables_helper
from nets import inception_resnet_v2
slim = tf.contrib.slim
......@@ -195,7 +196,7 @@ class FasterRCNNInceptionResnetV2FeatureExtractor(
"""
variables_to_restore = {}
for variable in tf.global_variables():
for variable in variables_helper.get_global_variables_safely():
if variable.op.name.startswith(
first_stage_feature_extractor_scope):
var_name = variable.op.name.replace(
......
......@@ -30,6 +30,7 @@ import tensorflow as tf
from object_detection.meta_architectures import faster_rcnn_meta_arch
from object_detection.models.keras_models import inception_resnet_v2
from object_detection.utils import model_util
from object_detection.utils import variables_helper
class FasterRCNNInceptionResnetV2KerasFeatureExtractor(
......@@ -1070,7 +1071,7 @@ class FasterRCNNInceptionResnetV2KerasFeatureExtractor(
}
variables_to_restore = {}
for variable in tf.global_variables():
for variable in variables_helper.get_global_variables_safely():
var_name = keras_to_slim_name_mapping.get(variable.op.name)
if var_name:
variables_to_restore[var_name] = variable
......
......@@ -23,6 +23,7 @@ https://arxiv.org/abs/1707.07012
import tensorflow as tf
from object_detection.meta_architectures import faster_rcnn_meta_arch
from object_detection.utils import variables_helper
from nets.nasnet import nasnet
from nets.nasnet import nasnet_utils
......@@ -307,7 +308,7 @@ class FasterRCNNNASFeatureExtractor(
# Note that the NAS checkpoint only contains the moving average version of
# the Variables so we need to generate an appropriate dictionary mapping.
variables_to_restore = {}
for variable in tf.global_variables():
for variable in variables_helper.get_global_variables_safely():
if variable.op.name.startswith(
first_stage_feature_extractor_scope):
var_name = variable.op.name.replace(
......