Unverified Commit 9bbf8015 authored by pkulzc's avatar pkulzc Committed by GitHub

Merged commit includes the following changes: (#6932)

250447559  by Zhichao Lu:

    Update the expected file format for the Instance Segmentation challenge:
    - add ImageWidth and ImageHeight fields and store their values per prediction
    - for masks, store only the encoded image and assume its size is ImageWidth x ImageHeight (see the sketch below)
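
    For illustration, a minimal sketch of producing such an entry, assuming
    pycocotools and a binary numpy mask; the `prediction` dict and its values
    are illustrative stand-ins for the CSV columns:

        import numpy as np
        from pycocotools import mask as coco_mask

        binary_mask = np.zeros((480, 640), dtype=np.uint8)  # ImageHeight x ImageWidth
        binary_mask[100:200, 150:300] = 1
        # COCO RLE encoding; only the 'counts' string is stored per prediction.
        rle = coco_mask.encode(np.asfortranarray(binary_mask))
        prediction = {'ImageWidth': 640, 'ImageHeight': 480,
                      'Mask': rle['counts']}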

--
250402780  by rathodv:

    Fix failing Mask R-CNN TPU convergence test.

    Cast second stage prediction tensors from bfloat16 to float32 to prevent errors in the third target assignment (mask prediction): concatenating tensors with different types (bfloat16 and float32) isn't allowed.
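
    A minimal TF 1.x sketch of the failure mode and the fix; shapes and names
    are illustrative, not the ones from the meta architecture:

        import tensorflow as tf

        box_features = tf.zeros([8, 7, 7, 256], dtype=tf.bfloat16)
        extra_features = tf.zeros([8, 7, 7, 4], dtype=tf.float32)
        # tf.concat requires matching dtypes, so concatenating the two tensors
        # directly would raise an error. Casting first makes it valid:
        box_features = tf.cast(box_features, dtype=tf.float32)
        merged = tf.concat([box_features, extra_features], axis=3)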

--
250300240  by Zhichao Lu:

    Add Open Images Challenge 2019 object detection and instance segmentation
    support to the Estimator framework.

--
249944839  by rathodv:

    Modify exporter.py to add multiclass score nodes in exported inference graphs.

--
249935201  by rathodv:

    Modify postprocess methods to preserve multiclass scores after non max suppression.

--
249878079  by Zhichao Lu:

    This CL slightly refactors some Object Detection helper functions for data creation, evaluation, and groundtruth provision.

    This will allow the eager+function custom loops to share code with the existing estimator training loops.

    Concretely we make the following changes:
    1. In input creation we separate dataset creation into top-level helpers, and allow it to optionally accept a pre-constructed model directly instead of always creating a model from the config just for feature preprocessing.

    2. In coco evaluation we split the update_op creation into its own function, which the custom loops will call directly.

    3. In model_lib we move groundtruth provision and data structure munging into a helper function.

    4. For now we put an escape hatch in `_summarize_target_assignment` when executing with tf v2.0 behavior, because the summary APIs used only work with tf 1.x.

--
249673507  by rathodv:

    Use explicit casts instead of tf.to_float and tf.to_int32 to avoid warnings.

--
249656006  by Zhichao Lu:

    Add a named "raw_keypoint_locations" node that corresponds to the "raw_box_locations" node.

--
249651674  by rathodv:

    Keep proposal boxes in float format. MatMulCropAndResize can handle the type even when the features themselves are bfloat16.

--
249568633  by rathodv:

    Support q > 1 in class agnostic NMS.
    Break post_processing_test.py into 3 separate files to avoid linter errors.

--
249535530  by rathodv:

    Update some deprecated arguments to tf ops.

--
249368223  by rathodv:

    Modify MatMulCropAndResize to use MultiLevelRoIAlign method and move the tests to spatial_transform_ops.py module.

    This CL establishes that CropAndResize and RoIAlign are equivalent and differ only in the sampling point grid within the boxes. CropAndResize uses a uniform size x size point grid whose corner points exactly overlap the box corners, while RoIAlign divides boxes into size x size cells and uses their centers as sampling points. In this CL, we switch MatMulCropAndResize to use the MultiLevelRoIAlign implementation with the `align_corner` option, as the MultiLevelRoIAlign implementation is more memory efficient on TPU than the original MatMulCropAndResize.
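
    To make the distinction concrete, a small numpy sketch (an illustration,
    not code from this CL) of the two 1-D sampling grids over an interval
    [a, b] with grid size > 1:

        import numpy as np

        def crop_and_resize_points(a, b, size):
          # Uniform grid whose end points coincide with the box corners
          # (the `align_corner` behaviour).
          return a + (b - a) * np.arange(size) / (size - 1.0)

        def roi_align_points(a, b, size):
          # Divide the box into `size` cells and sample at cell centers.
          cell = (b - a) / size
          return a + cell * (np.arange(size) + 0.5)

        print(crop_and_resize_points(0.0, 1.0, 4))  # [0. 0.333 0.667 1.]
        print(roi_align_points(0.0, 1.0, 4))        # [0.125 0.375 0.625 0.875]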

--
249337338  by chowdhery:

    Add class-agnostic non-max-suppression in post_processing
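
    A rough sketch of the idea in an unbatched setting (the implementation in
    post_processing is batched and more general): reduce the per-class scores
    to one score per box and suppress across all classes jointly.

        import tensorflow as tf

        def class_agnostic_nms(boxes, multiclass_scores, max_output_size=100,
                               iou_threshold=0.6):
          # Keep each box's best class and run a single NMS over all boxes.
          scores = tf.reduce_max(multiclass_scores, axis=-1)
          classes = tf.argmax(multiclass_scores, axis=-1)
          keep = tf.image.non_max_suppression(
              boxes, scores, max_output_size, iou_threshold=iou_threshold)
          return (tf.gather(boxes, keep), tf.gather(scores, keep),
                  tf.gather(classes, keep))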

--
249139196  by Zhichao Lu:

    Fix positional argument bug in export_tflite_ssd_graph

--
249120219  by Zhichao Lu:

    Add evaluator for computing precision limited to a given recall range.
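
    Conceptually, this restricts the precision/recall curve to a recall
    window before averaging; a hedged numpy sketch under that reading (the
    function name and arguments are illustrative):

        import numpy as np

        def mean_precision_in_recall_range(precisions, recalls,
                                           recall_lower_bound=0.0,
                                           recall_upper_bound=1.0):
          # precisions/recalls trace the PR curve at descending score
          # thresholds; keep only points whose recall lies in the window.
          precisions = np.asarray(precisions)
          recalls = np.asarray(recalls)
          in_range = ((recalls >= recall_lower_bound) &
                      (recalls <= recall_upper_bound))
          if not np.any(in_range):
            return 0.0
          return float(np.mean(precisions[in_range]))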

--
249030593  by Zhichao Lu:

    Evaluation util to run segmentation and detection challenge evaluation.

--
248554358  by Zhichao Lu:

    This change contains the auxiliary changes required for TF 2.0 style training with eager + functions + distribution strategy loops, but not the loops themselves.

    It includes:
    - Updates to shape usage to support both TensorShape v1 and TensorShape v2
    - A fix to FreezableBatchNorm to not override the `training` arg in call when `None` was passed to the constructor (not an issue in the estimator loops, but it was in the custom loops)
    - Puts some constants in init_scope so they work in eager + functions
    - Makes learning rate schedules return a callable in eager mode (required so they update when the global_step changes; see the sketch after this list)
    - Makes DetectionModel a tf.Module so it tracks variables (e.g. ones nested in layers)
    - Removes some references to `op.name` for some losses and replaces them with explicit names
    - A small part of the change to allow the coco evaluation metrics to work in eager mode
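
    For the learning rate point above, a minimal TF 2 style sketch of the
    callable pattern (the schedule and constants are illustrative, not the
    ones used in this codebase):

        import tensorflow as tf

        global_step = tf.Variable(0, dtype=tf.int64)

        def learning_rate():
          # Re-evaluated on every call, so it tracks the current global_step.
          return 0.001 * 0.95 ** (tf.cast(global_step, tf.float32) / 1000.0)

        optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)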

--
248271226  by rathodv:

    Add MultiLevel RoIAlign op.

--
248229103  by rathodv:

    Add functions to 1) pad feature maps and 2) ravel 5-D indices.
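
    For the second helper, raveling turns multi-dimensional gather indices
    into flat ones; a small numpy sketch of the idea (shapes are
    illustrative):

        import numpy as np

        shape = (2, 3, 4, 5, 6)  # e.g. [batch, level, box, y, x]
        indices = np.array([[0, 1, 2, 3, 4],
                            [1, 2, 3, 4, 5]])
        flat = np.ravel_multi_index(indices.T, shape)
        # Equivalent to indexing into the flattened array:
        x = np.arange(np.prod(shape)).reshape(shape)
        assert all(x.flat[f] == x[tuple(i)] for f, i in zip(flat, indices))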

--
248206769  by rathodv:

    Add utilities needed to introduce RoI Align op.

--
248177733  by pengchong:

    Internal changes

--
247742582  by Zhichao Lu:

    Open Images Challenge 2019 instance segmentation metric: part 2

--
247525401  by Zhichao Lu:

    Update comments on max_class_per_detection.

--
247520753  by rathodv:

    Add multilevel crop and resize operation that builds on top of matmul_crop_and_resize.

--
247391600  by Zhichao Lu:

    Open Images Challenge 2019 instance segmentation metric

--
247325813  by chowdhery:

    Quantized MobileNet v2 SSD FPNLite config with depth multiplier 0.75

--

PiperOrigin-RevId: 250447559
parent f42fddee
......@@ -271,7 +271,8 @@ class FasterRCNNMetaArchTest(
set(tensor_dict_out.keys()),
set(expected_shapes.keys()).union(
set([
'detection_boxes', 'detection_scores', 'detection_classes',
'detection_boxes', 'detection_scores',
'detection_multiclass_scores', 'detection_classes',
'detection_masks', 'num_detections', 'mask_predictions',
'raw_detection_boxes', 'raw_detection_scores'
])))
......
......@@ -967,7 +967,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
[[0, 0, .5, .5], [.5, .5, 1, 1]], [[0, .5, .5, 1], [.5, 0, 1, .5]]]
expected_proposal_scores = [[1, 1],
[1, 1]]
expected_num_proposals = [2, 2]
expected_proposal_multiclass_scores = [[[0., 1.], [0., 1.]],
[[0., 1.], [0., 1.]]]
expected_raw_proposal_boxes = [[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [0.5, 0.5, 1., 1.]],
[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
......@@ -975,31 +976,45 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_raw_scores = [[[0., 1.], [0., 1.], [0., 1.], [0., 1.]],
[[0., 1.], [0., 1.], [0., 1.], [0., 1.]]]
expected_output_keys = set([
'detection_boxes', 'detection_scores', 'num_detections',
'raw_detection_boxes', 'raw_detection_scores'
'detection_boxes', 'detection_scores', 'detection_multiclass_scores',
'num_detections', 'raw_detection_boxes', 'raw_detection_scores'
])
self.assertEqual(set(proposals.keys()), expected_output_keys)
with self.test_session() as sess:
proposals_out = sess.run(proposals)
for image_idx in range(batch_size):
num_detections = int(proposals_out['num_detections'][image_idx])
boxes = proposals_out['detection_boxes'][
image_idx][:num_detections, :].tolist()
scores = proposals_out['detection_scores'][
image_idx][:num_detections].tolist()
multiclass_scores = proposals_out['detection_multiclass_scores'][
image_idx][:num_detections, :].tolist()
expected_boxes = expected_proposal_boxes[image_idx]
expected_scores = expected_proposal_scores[image_idx]
expected_multiclass_scores = expected_proposal_multiclass_scores[
image_idx]
self.assertTrue(
test_utils.first_rows_close_as_set(
proposals_out['detection_boxes'][image_idx].tolist(),
expected_proposal_boxes[image_idx]))
self.assertAllClose(proposals_out['detection_scores'],
expected_proposal_scores)
self.assertAllEqual(proposals_out['num_detections'],
expected_num_proposals)
test_utils.first_rows_close_as_set(boxes, expected_boxes))
self.assertTrue(
test_utils.first_rows_close_as_set(scores, expected_scores))
self.assertTrue(
test_utils.first_rows_close_as_set(multiclass_scores,
expected_multiclass_scores))
self.assertAllClose(proposals_out['raw_detection_boxes'],
expected_raw_proposal_boxes)
self.assertAllClose(proposals_out['raw_detection_scores'],
expected_raw_scores)
@parameterized.parameters(
{'use_keras': True},
{'use_keras': False}
)
@parameterized.named_parameters({
'testcase_name': 'keras',
'use_keras': True
}, {
'testcase_name': 'slim',
'use_keras': False
})
def test_postprocess_first_stage_only_train_mode(self, use_keras=False):
self._test_postprocess_first_stage_only_train_mode(use_keras=use_keras)
......@@ -1066,7 +1081,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
return (detections['num_detections'], detections['detection_boxes'],
detections['detection_scores'], detections['detection_classes'],
detections['raw_detection_boxes'],
detections['raw_detection_scores'])
detections['raw_detection_scores'],
detections['detection_multiclass_scores'])
proposal_boxes = np.array(
[[[1, 1, 2, 3],
......@@ -1097,6 +1113,17 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_num_detections = [5, 4]
expected_detection_classes = [[0, 0, 0, 1, 1], [0, 0, 1, 1, 0]]
expected_detection_scores = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]
expected_multiclass_scores = [[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]]
h = float(image_shape[1])
w = float(image_shape[2])
expected_raw_detection_boxes = np.array(
......@@ -1114,6 +1141,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_detection_scores[indx][0:num_proposals])
self.assertAllClose(results[3][indx][0:num_proposals],
expected_detection_classes[indx][0:num_proposals])
self.assertAllClose(results[6][indx][0:num_proposals],
expected_multiclass_scores[indx][0:num_proposals])
self.assertAllClose(results[4], expected_raw_detection_boxes)
self.assertAllClose(results[5],
......@@ -1895,8 +1924,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
number_of_stages=2, second_stage_batch_size=6)
inputs_shape = (2, 20, 20, 3)
inputs = tf.to_float(tf.random_uniform(
inputs_shape, minval=0, maxval=255, dtype=tf.int32))
inputs = tf.cast(tf.random_uniform(
inputs_shape, minval=0, maxval=255, dtype=tf.int32), dtype=tf.float32)
preprocessed_inputs, true_image_shapes = model.preprocess(inputs)
prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
model.postprocess(prediction_dict, true_image_shapes)
......@@ -1921,8 +1950,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
is_training=False, use_keras=use_keras,
number_of_stages=2, second_stage_batch_size=6)
inputs_shape = (2, 20, 20, 3)
inputs = tf.to_float(tf.random_uniform(
inputs_shape, minval=0, maxval=255, dtype=tf.int32))
inputs = tf.cast(tf.random_uniform(
inputs_shape, minval=0, maxval=255, dtype=tf.int32), dtype=tf.float32)
preprocessed_inputs, true_image_shapes = model.preprocess(inputs)
prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
model.postprocess(prediction_dict, true_image_shapes)
......@@ -1942,8 +1971,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
second_stage_batch_size=6, num_classes=42)
inputs_shape2 = (2, 20, 20, 3)
inputs2 = tf.to_float(tf.random_uniform(
inputs_shape2, minval=0, maxval=255, dtype=tf.int32))
inputs2 = tf.cast(tf.random_uniform(
inputs_shape2, minval=0, maxval=255, dtype=tf.int32),
dtype=tf.float32)
preprocessed_inputs2, true_image_shapes = model2.preprocess(inputs2)
prediction_dict2 = model2.predict(preprocessed_inputs2, true_image_shapes)
model2.postprocess(prediction_dict2, true_image_shapes)
......@@ -1974,8 +2004,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
num_classes=42)
inputs_shape = (2, 20, 20, 3)
inputs = tf.to_float(
tf.random_uniform(inputs_shape, minval=0, maxval=255, dtype=tf.int32))
inputs = tf.cast(
tf.random_uniform(inputs_shape, minval=0, maxval=255, dtype=tf.int32),
dtype=tf.float32)
preprocessed_inputs, true_image_shapes = model.preprocess(inputs)
prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
model.postprocess(prediction_dict, true_image_shapes)
......
......@@ -297,9 +297,10 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
"""
image_shape_2d = tf.tile(tf.expand_dims(image_shape[1:], 0),
[image_shape[0], 1])
proposal_boxes_normalized, _, num_proposals, _, _ = self._postprocess_rpn(
rpn_box_encodings, rpn_objectness_predictions_with_background,
anchors, image_shape_2d, true_image_shapes)
(proposal_boxes_normalized, _, _, num_proposals, _,
_) = self._postprocess_rpn(rpn_box_encodings,
rpn_objectness_predictions_with_background,
anchors, image_shape_2d, true_image_shapes)
box_classifier_features = (
self._extract_box_classifier_features(rpn_features))
......
......@@ -509,9 +509,9 @@ class SSDMetaArch(model.DetectionModel):
resized_inputs_shape = shape_utils.combined_static_and_dynamic_shape(
preprocessed_images)
true_heights, true_widths, _ = tf.unstack(
tf.to_float(true_image_shapes), axis=1)
padded_height = tf.to_float(resized_inputs_shape[1])
padded_width = tf.to_float(resized_inputs_shape[2])
tf.cast(true_image_shapes, dtype=tf.float32), axis=1)
padded_height = tf.cast(resized_inputs_shape[1], dtype=tf.float32)
padded_width = tf.cast(resized_inputs_shape[2], dtype=tf.float32)
return tf.stack(
[
tf.zeros_like(true_heights),
......@@ -654,6 +654,9 @@ class SSDMetaArch(model.DetectionModel):
detection boxes.
detection_scores: [batch, max_detections] tensor with scalar scores for
post-processed detection boxes.
detection_multiclass_scores: [batch, max_detections,
num_classes_with_background] tensor with class score distribution for
post-processed detection boxes including background class if any.
detection_classes: [batch, max_detections] tensor with classes for
post-processed detection classes.
detection_keypoints: [batch, max_detections, num_keypoints, 2] (if
......@@ -703,10 +706,13 @@ class SSDMetaArch(model.DetectionModel):
feature_map_list.append(tf.reshape(feature_map, [batch_size, -1]))
box_features = tf.concat(feature_map_list, 1)
box_features = tf.identity(box_features, 'raw_box_features')
additional_fields = {
'multiclass_scores': detection_scores_with_background
}
if detection_keypoints is not None:
additional_fields = {
fields.BoxListFields.keypoints: detection_keypoints}
detection_keypoints = tf.identity(
detection_keypoints, 'raw_keypoint_locations')
additional_fields[fields.BoxListFields.keypoints] = detection_keypoints
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
nmsed_additional_fields, num_detections) = self._non_max_suppression_fn(
detection_boxes,
......@@ -722,8 +728,10 @@ class SSDMetaArch(model.DetectionModel):
nmsed_scores,
fields.DetectionResultFields.detection_classes:
nmsed_classes,
fields.DetectionResultFields.detection_multiclass_scores:
nmsed_additional_fields['multiclass_scores'],
fields.DetectionResultFields.num_detections:
tf.to_float(num_detections),
tf.cast(num_detections, dtype=tf.float32),
fields.DetectionResultFields.raw_detection_boxes:
tf.squeeze(detection_boxes, axis=2),
fields.DetectionResultFields.raw_detection_scores:
......@@ -786,13 +794,13 @@ class SSDMetaArch(model.DetectionModel):
if self._random_example_sampler:
batch_cls_per_anchor_weights = tf.reduce_mean(
batch_cls_weights, axis=-1)
batch_sampled_indicator = tf.to_float(
batch_sampled_indicator = tf.cast(
shape_utils.static_or_dynamic_map_fn(
self._minibatch_subsample_fn,
[batch_cls_targets, batch_cls_per_anchor_weights],
dtype=tf.bool,
parallel_iterations=self._parallel_iterations,
back_prop=True))
back_prop=True), dtype=tf.float32)
batch_reg_weights = tf.multiply(batch_sampled_indicator,
batch_reg_weights)
batch_cls_weights = tf.multiply(
......@@ -868,7 +876,8 @@ class SSDMetaArch(model.DetectionModel):
# Optionally normalize by number of positive matches
normalizer = tf.constant(1.0, dtype=tf.float32)
if self._normalize_loss_by_num_matches:
normalizer = tf.maximum(tf.to_float(tf.reduce_sum(batch_reg_weights)),
normalizer = tf.maximum(tf.cast(tf.reduce_sum(batch_reg_weights),
dtype=tf.float32),
1.0)
localization_loss_normalizer = normalizer
......@@ -883,8 +892,8 @@ class SSDMetaArch(model.DetectionModel):
name='classification_loss')
loss_dict = {
str(localization_loss.op.name): localization_loss,
str(classification_loss.op.name): classification_loss
'Loss/localization_loss': localization_loss,
'Loss/classification_loss': classification_loss
}
......@@ -1025,17 +1034,35 @@ class SSDMetaArch(model.DetectionModel):
with rows of the Match objects corresponding to groundtruth boxes
and columns corresponding to anchors.
"""
avg_num_gt_boxes = tf.reduce_mean(tf.to_float(tf.stack(
[tf.shape(x)[0] for x in groundtruth_boxes_list])))
avg_num_matched_gt_boxes = tf.reduce_mean(tf.to_float(tf.stack(
[match.num_matched_rows() for match in match_list])))
avg_pos_anchors = tf.reduce_mean(tf.to_float(tf.stack(
[match.num_matched_columns() for match in match_list])))
avg_neg_anchors = tf.reduce_mean(tf.to_float(tf.stack(
[match.num_unmatched_columns() for match in match_list])))
avg_ignored_anchors = tf.reduce_mean(tf.to_float(tf.stack(
[match.num_ignored_columns() for match in match_list])))
avg_num_gt_boxes = tf.reduce_mean(
tf.cast(
tf.stack([tf.shape(x)[0] for x in groundtruth_boxes_list]),
dtype=tf.float32))
avg_num_matched_gt_boxes = tf.reduce_mean(
tf.cast(
tf.stack([match.num_matched_rows() for match in match_list]),
dtype=tf.float32))
avg_pos_anchors = tf.reduce_mean(
tf.cast(
tf.stack([match.num_matched_columns() for match in match_list]),
dtype=tf.float32))
avg_neg_anchors = tf.reduce_mean(
tf.cast(
tf.stack([match.num_unmatched_columns() for match in match_list]),
dtype=tf.float32))
avg_ignored_anchors = tf.reduce_mean(
tf.cast(
tf.stack([match.num_ignored_columns() for match in match_list]),
dtype=tf.float32))
# TODO(rathodv): Add a test for these summaries.
try:
# TODO(kaftan): Integrate these summaries into the v2 style loops
with tf.compat.v2.init_scope():
if tf.compat.v2.executing_eagerly():
return
except AttributeError:
pass
tf.summary.scalar('AvgNumGroundtruthBoxesPerImage',
avg_num_gt_boxes,
family='TargetAssignment')
......
......@@ -176,6 +176,9 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
]
] # padding
expected_scores = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
expected_multiclass_scores = [[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]]
expected_classes = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
expected_num_detections = np.array([3, 3])
......@@ -198,6 +201,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
detections = model.postprocess(prediction_dict, true_image_shapes)
self.assertIn('detection_boxes', detections)
self.assertIn('detection_scores', detections)
self.assertIn('detection_multiclass_scores', detections)
self.assertIn('detection_classes', detections)
self.assertIn('num_detections', detections)
self.assertIn('raw_detection_boxes', detections)
......@@ -217,6 +221,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
expected_boxes[image_idx]))
self.assertAllClose(detections_out['detection_scores'], expected_scores)
self.assertAllClose(detections_out['detection_classes'], expected_classes)
self.assertAllClose(detections_out['detection_multiclass_scores'],
expected_multiclass_scores)
self.assertAllClose(detections_out['num_detections'],
expected_num_detections)
self.assertAllEqual(detections_out['raw_detection_boxes'],
......@@ -235,7 +241,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
true_image_shapes)
detections = model.postprocess(prediction_dict, true_image_shapes)
return (detections['detection_boxes'], detections['detection_scores'],
detections['detection_classes'], detections['num_detections'])
detections['detection_classes'], detections['num_detections'],
detections['detection_multiclass_scores'])
batch_size = 2
image_size = 2
......@@ -257,11 +264,14 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
]
] # padding
expected_scores = [[0, 0, 0, 0], [0, 0, 0, 0]]
expected_multiclass_scores = [[[0, 0], [0, 0], [0, 0], [0, 0]],
[[0, 0], [0, 0], [0, 0], [0, 0]]]
expected_classes = [[0, 0, 0, 0], [0, 0, 0, 0]]
expected_num_detections = np.array([3, 3])
(detection_boxes, detection_scores, detection_classes,
num_detections) = self.execute(graph_fn, [input_image])
num_detections, detection_multiclass_scores) = self.execute(graph_fn,
[input_image])
for image_idx in range(batch_size):
self.assertTrue(test_utils.first_rows_close_as_set(
detection_boxes[image_idx][
......@@ -270,6 +280,11 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
self.assertAllClose(
detection_scores[image_idx][0:expected_num_detections[image_idx]],
expected_scores[image_idx][0:expected_num_detections[image_idx]])
self.assertAllClose(
detection_multiclass_scores[image_idx]
[0:expected_num_detections[image_idx]],
expected_multiclass_scores[image_idx]
[0:expected_num_detections[image_idx]])
self.assertAllClose(
detection_classes[image_idx][0:expected_num_detections[image_idx]],
expected_classes[image_idx][0:expected_num_detections[image_idx]])
......@@ -600,8 +615,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
with test_graph_detection.as_default():
model, _, _, _ = self._create_model(use_keras=use_keras)
inputs_shape = [2, 2, 2, 3]
inputs = tf.to_float(tf.random_uniform(
inputs_shape, minval=0, maxval=255, dtype=tf.int32))
inputs = tf.cast(tf.random_uniform(
inputs_shape, minval=0, maxval=255, dtype=tf.int32), dtype=tf.float32)
preprocessed_inputs, true_image_shapes = model.preprocess(inputs)
prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
model.postprocess(prediction_dict, true_image_shapes)
......@@ -620,8 +635,9 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
with test_graph_detection.as_default():
model, _, _, _ = self._create_model(use_keras=use_keras)
inputs_shape = [2, 2, 2, 3]
inputs = tf.to_float(
tf.random_uniform(inputs_shape, minval=0, maxval=255, dtype=tf.int32))
inputs = tf.cast(
tf.random_uniform(inputs_shape, minval=0, maxval=255, dtype=tf.int32),
dtype=tf.float32)
preprocessed_inputs, true_image_shapes = model.preprocess(inputs)
prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
model.postprocess(prediction_dict, true_image_shapes)
......
......@@ -98,13 +98,16 @@ def expected_calibration_error(y_true, y_pred, nbins=20):
with tf.control_dependencies([bin_ids]):
update_bin_counts_op = tf.assign_add(
bin_counts, tf.to_float(tf.bincount(bin_ids, minlength=nbins)))
bin_counts, tf.cast(tf.bincount(bin_ids, minlength=nbins),
dtype=tf.float32))
update_bin_true_sum_op = tf.assign_add(
bin_true_sum,
tf.to_float(tf.bincount(bin_ids, weights=y_true, minlength=nbins)))
tf.cast(tf.bincount(bin_ids, weights=y_true, minlength=nbins),
dtype=tf.float32))
update_bin_preds_sum_op = tf.assign_add(
bin_preds_sum,
tf.to_float(tf.bincount(bin_ids, weights=y_pred, minlength=nbins)))
tf.cast(tf.bincount(bin_ids, weights=y_pred, minlength=nbins),
dtype=tf.float32))
ece_update_op = _ece_from_bins(
update_bin_counts_op,
......
......@@ -216,29 +216,23 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
for key, value in iter(box_metrics.items())}
return box_metrics
def get_estimator_eval_metric_ops(self, eval_dict):
"""Returns a dictionary of eval metric ops.
def add_eval_dict(self, eval_dict):
"""Observes an evaluation result dict for a single example.
Note that once value_op is called, the detections and groundtruth added via
update_op are cleared.
    When executing eagerly, once all examples have been observed by this
    method you can use `.evaluate()` to get the final metrics.
This function can take in groundtruth and detections for a batch of images,
or for a single image. For the latter case, the batch dimension for input
tensors need not be present.
When using `tf.estimator.Estimator` for evaluation this function is used by
`get_estimator_eval_metric_ops()` to construct the metric update op.
Args:
eval_dict: A dictionary that holds tensors for evaluating object detection
performance. For single-image evaluation, this dictionary may be
produced from eval_util.result_dict_for_single_example(). If multi-image
evaluation, `eval_dict` should contain the fields
'num_groundtruth_boxes_per_image' and 'num_det_boxes_per_image' to
properly unpad the tensors from the batch.
eval_dict: A dictionary that holds tensors for evaluating an object
detection model, returned from
eval_util.result_dict_for_single_example().
Returns:
a dictionary of metric names to tuple of value_op and update_op that can
be used as eval metric ops in tf.estimator.EstimatorSpec. Note that all
update ops must be run together and similarly all value ops must be run
together to guarantee correct behaviour.
None when executing eagerly, or an update_op that can be used to update
the eval metrics in `tf.estimator.EstimatorSpec`.
"""
def update_op(
image_id_batched,
......@@ -328,16 +322,42 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
if is_annotated is None:
is_annotated = tf.ones_like(image_id, dtype=tf.bool)
update_op = tf.py_func(update_op, [image_id,
groundtruth_boxes,
groundtruth_classes,
groundtruth_is_crowd,
num_gt_boxes_per_image,
detection_boxes,
detection_scores,
detection_classes,
num_det_boxes_per_image,
is_annotated], [])
return tf.py_func(update_op, [image_id,
groundtruth_boxes,
groundtruth_classes,
groundtruth_is_crowd,
num_gt_boxes_per_image,
detection_boxes,
detection_scores,
detection_classes,
num_det_boxes_per_image,
is_annotated], [])
def get_estimator_eval_metric_ops(self, eval_dict):
"""Returns a dictionary of eval metric ops.
Note that once value_op is called, the detections and groundtruth added via
update_op are cleared.
This function can take in groundtruth and detections for a batch of images,
or for a single image. For the latter case, the batch dimension for input
tensors need not be present.
Args:
eval_dict: A dictionary that holds tensors for evaluating object detection
performance. For single-image evaluation, this dictionary may be
produced from eval_util.result_dict_for_single_example(). If multi-image
evaluation, `eval_dict` should contain the fields
'num_groundtruth_boxes_per_image' and 'num_det_boxes_per_image' to
properly unpad the tensors from the batch.
Returns:
a dictionary of metric names to tuple of value_op and update_op that can
be used as eval metric ops in tf.estimator.EstimatorSpec. Note that all
update ops must be run together and similarly all value ops must be run
together to guarantee correct behaviour.
"""
update_op = self.add_eval_dict(eval_dict)
metric_names = ['DetectionBoxes_Precision/mAP',
'DetectionBoxes_Precision/mAP@.50IOU',
'DetectionBoxes_Precision/mAP@.75IOU',
......
......@@ -14,6 +14,8 @@
# ==============================================================================
r"""Runs evaluation using OpenImages groundtruth and predictions.
Uses Open Images Challenge 2018 and 2019 metrics.
Example usage:
python models/research/object_detection/metrics/oid_od_challenge_evaluation.py \
--input_annotations_boxes=/path/to/input/annotations-human-bbox.csv \
......@@ -21,27 +23,50 @@ python models/research/object_detection/metrics/oid_od_challenge_evaluation.py \
--input_class_labelmap=/path/to/input/class_labelmap.pbtxt \
--input_predictions=/path/to/input/predictions.csv \
--output_metrics=/path/to/output/metric.csv \
--input_annotations_segm=[/path/to/input/annotations-human-mask.csv] \
If the optional flag input_annotations_segm is given, a Mask column is also expected in the predictions CSV.
CSVs with bounding box annotations and image label (including the image URLs)
CSVs with bounding box annotations, instance segmentations and image labels
can be downloaded from the Open Images Challenge website:
https://storage.googleapis.com/openimages/web/challenge.html
The format of the input CSVs and the metrics themselves are described on the
challenge website.
challenge website as well.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
from absl import app
from absl import flags
import pandas as pd
from google.protobuf import text_format
from object_detection.metrics import io_utils
from object_detection.metrics import oid_od_challenge_evaluation_utils as utils
from object_detection.metrics import oid_challenge_evaluation_utils as utils
from object_detection.protos import string_int_label_map_pb2
from object_detection.utils import object_detection_evaluation
flags.DEFINE_string('input_annotations_boxes', None,
'File with groundtruth boxes annotations.')
flags.DEFINE_string('input_annotations_labels', None,
'File with groundtruth labels annotations.')
flags.DEFINE_string(
'input_predictions', None,
"""File with detection predictions; NOTE: no postprocessing is applied in the evaluation script."""
)
flags.DEFINE_string('input_class_labelmap', None,
'Open Images Challenge labelmap.')
flags.DEFINE_string('output_metrics', None, 'Output file with csv metrics.')
flags.DEFINE_string(
'input_annotations_segm', None,
'File with groundtruth instance segmentation annotations [OPTIONAL].')
FLAGS = flags.FLAGS
def _load_labelmap(labelmap_path):
"""Loads labelmap from the labelmap path.
......@@ -66,26 +91,43 @@ def _load_labelmap(labelmap_path):
return labelmap_dict, categories
def main(parsed_args):
all_box_annotations = pd.read_csv(parsed_args.input_annotations_boxes)
all_label_annotations = pd.read_csv(parsed_args.input_annotations_labels)
def main(unused_argv):
flags.mark_flag_as_required('input_annotations_boxes')
flags.mark_flag_as_required('input_annotations_labels')
flags.mark_flag_as_required('input_predictions')
flags.mark_flag_as_required('input_class_labelmap')
flags.mark_flag_as_required('output_metrics')
all_location_annotations = pd.read_csv(FLAGS.input_annotations_boxes)
all_label_annotations = pd.read_csv(FLAGS.input_annotations_labels)
all_label_annotations.rename(
columns={'Confidence': 'ConfidenceImageLabel'}, inplace=True)
all_annotations = pd.concat([all_box_annotations, all_label_annotations])
class_label_map, categories = _load_labelmap(parsed_args.input_class_labelmap)
is_instance_segmentation_eval = False
if FLAGS.input_annotations_segm:
is_instance_segmentation_eval = True
all_segm_annotations = pd.read_csv(FLAGS.input_annotations_segm)
    # Note: this step is fragile, as it requires the floating point numbers in
    # both csvs to be exactly the same;
    # it will be replaced by a more stable solution: merge on LabelName and
    # ImageID and filter down by IoU.
all_location_annotations = utils.merge_boxes_and_masks(
all_location_annotations, all_segm_annotations)
all_annotations = pd.concat([all_location_annotations, all_label_annotations])
class_label_map, categories = _load_labelmap(FLAGS.input_class_labelmap)
challenge_evaluator = (
object_detection_evaluation.OpenImagesDetectionChallengeEvaluator(
categories))
object_detection_evaluation.OpenImagesChallengeEvaluator(
categories, evaluate_masks=is_instance_segmentation_eval))
for _, groundtruth in enumerate(all_annotations.groupby('ImageID')):
image_id, image_groundtruth = groundtruth
groundtruth_dictionary = utils.build_groundtruth_boxes_dictionary(
groundtruth_dictionary = utils.build_groundtruth_dictionary(
image_groundtruth, class_label_map)
challenge_evaluator.add_single_ground_truth_image_info(
image_id, groundtruth_dictionary)
all_predictions = pd.read_csv(parsed_args.input_predictions)
all_predictions = pd.read_csv(FLAGS.input_predictions)
for _, prediction_data in enumerate(all_predictions.groupby('ImageID')):
image_id, image_predictions = prediction_data
prediction_dictionary = utils.build_predictions_dictionary(
......@@ -95,34 +137,9 @@ def main(parsed_args):
metrics = challenge_evaluator.evaluate()
with open(parsed_args.output_metrics, 'w') as fid:
with open(FLAGS.output_metrics, 'w') as fid:
io_utils.write_csv(fid, metrics)
if __name__ == '__main__':
parser = argparse.ArgumentParser(
description='Evaluate Open Images Object Detection Challenge predictions.'
)
parser.add_argument(
'--input_annotations_boxes',
required=True,
help='File with groundtruth boxes annotations.')
parser.add_argument(
'--input_annotations_labels',
required=True,
help='File with groundtruth labels annotations')
parser.add_argument(
'--input_predictions',
required=True,
help="""File with detection predictions; NOTE: no postprocessing is
applied in the evaluation script.""")
parser.add_argument(
'--input_class_labelmap',
required=True,
help='Open Images Challenge labelmap.')
parser.add_argument(
'--output_metrics', required=True, help='Output file with csv metrics')
args = parser.parse_args()
main(args)
app.run(main)
......@@ -12,17 +12,92 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Converts data from CSV to the OpenImagesDetectionChallengeEvaluator format.
"""
r"""Converts data from CSV to the OpenImagesDetectionChallengeEvaluator format."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import pandas as pd
from pycocotools import mask
from object_detection.core import standard_fields
def build_groundtruth_boxes_dictionary(data, class_label_map):
def _to_normalized_box(mask_np):
"""Decodes binary segmentation masks into np.arrays and boxes.
Args:
mask_np: np.ndarray of size NxWxH.
Returns:
a np.ndarray of the size Nx4, each row containing normalized coordinates
[YMin, XMin, YMax, XMax] of a box computed of axis parallel enclosing box of
a mask.
"""
coord1, coord2 = np.nonzero(mask_np)
if coord1.size > 0:
ymin = float(min(coord1)) / mask_np.shape[0]
ymax = float(max(coord1) + 1) / mask_np.shape[0]
xmin = float(min(coord2)) / mask_np.shape[1]
xmax = float((max(coord2) + 1)) / mask_np.shape[1]
return np.array([ymin, xmin, ymax, xmax])
else:
return np.array([0.0, 0.0, 0.0, 0.0])
def _decode_raw_data_into_masks_and_boxes(segments, image_widths,
image_heights):
"""Decods binary segmentation masks into np.arrays and boxes.
Args:
segments: pandas Series object containing either None entries or strings
with COCO-encoded binary masks. All masks are expected to be the same size.
image_widths: pandas Series of mask widths.
image_heights: pandas Series of mask heights.
Returns:
a np.ndarray of the size NxWxH, where W and H is determined from the encoded
masks; for the None values, zero arrays of size WxH are created. if input
contains only None values, W=1, H=1.
"""
segment_masks = []
segment_boxes = []
ind = segments.first_valid_index()
if ind is not None:
    size = [int(image_heights[ind]), int(image_widths[ind])]
else:
# It does not matter which size we pick since no masks will ever be
# evaluated.
size = [1, 1]
for segment, im_width, im_height in zip(segments, image_widths,
image_heights):
if pd.isnull(segment):
segment_masks.append(np.zeros([1, size[0], size[1]], dtype=np.uint8))
segment_boxes.append(np.expand_dims(np.array([0.0, 0.0, 0.0, 0.0]), 0))
else:
encoding_dict = {'size': [im_height, im_width], 'counts': segment}
mask_tensor = mask.decode(encoding_dict)
segment_masks.append(np.expand_dims(mask_tensor, 0))
segment_boxes.append(np.expand_dims(_to_normalized_box(mask_tensor), 0))
return np.concatenate(
segment_masks, axis=0), np.concatenate(
segment_boxes, axis=0)
def merge_boxes_and_masks(box_data, mask_data):
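  """Merges bounding box and mask annotation DataFrames.
  Joins on the box coordinate columns, so a box row picks up the Mask,
  ImageWidth and ImageHeight columns of the mask row with identical
  coordinates.
  Args:
    box_data: pandas DataFrame with bounding box annotations.
    mask_data: pandas DataFrame with instance mask annotations.
  Returns:
    a pandas DataFrame with merged box and mask annotations.
  """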
return pd.merge(
box_data,
mask_data,
how='outer',
on=['LabelName', 'ImageID', 'XMin', 'XMax', 'YMin', 'YMax', 'IsGroupOf'])
def build_groundtruth_dictionary(data, class_label_map):
"""Builds a groundtruth dictionary from groundtruth data in CSV file.
Args:
......@@ -44,21 +119,31 @@ def build_groundtruth_boxes_dictionary(data, class_label_map):
M numpy boolean array denoting whether a groundtruth box contains a
group of instances.
"""
data_boxes = data[data.ConfidenceImageLabel.isnull()]
data_labels = data[data.XMin.isnull()]
data_location = data[data.XMin.notnull()]
data_labels = data[data.ConfidenceImageLabel.notnull()]
return {
dictionary = {
standard_fields.InputDataFields.groundtruth_boxes:
data_boxes[['YMin', 'XMin', 'YMax', 'XMax']].as_matrix(),
data_location[['YMin', 'XMin', 'YMax', 'XMax']].as_matrix(),
standard_fields.InputDataFields.groundtruth_classes:
data_boxes['LabelName'].map(lambda x: class_label_map[x]).as_matrix(),
data_location['LabelName'].map(lambda x: class_label_map[x]
).as_matrix(),
standard_fields.InputDataFields.groundtruth_group_of:
data_boxes['IsGroupOf'].as_matrix().astype(int),
data_location['IsGroupOf'].as_matrix().astype(int),
standard_fields.InputDataFields.groundtruth_image_classes:
data_labels['LabelName'].map(lambda x: class_label_map[x])
.as_matrix(),
data_labels['LabelName'].map(lambda x: class_label_map[x]
).as_matrix(),
}
if 'Mask' in data_location:
segments, _ = _decode_raw_data_into_masks_and_boxes(
data_location['Mask'], data_location['ImageWidth'],
data_location['ImageHeight'])
dictionary[
standard_fields.InputDataFields.groundtruth_instance_masks] = segments
return dictionary
def build_predictions_dictionary(data, class_label_map):
"""Builds a predictions dictionary from predictions data in CSV file.
......@@ -80,11 +165,21 @@ def build_predictions_dictionary(data, class_label_map):
the boxes.
"""
return {
standard_fields.DetectionResultFields.detection_boxes:
data[['YMin', 'XMin', 'YMax', 'XMax']].as_matrix(),
dictionary = {
standard_fields.DetectionResultFields.detection_classes:
data['LabelName'].map(lambda x: class_label_map[x]).as_matrix(),
standard_fields.DetectionResultFields.detection_scores:
data['Score'].as_matrix()
}
if 'Mask' in data:
segments, boxes = _decode_raw_data_into_masks_and_boxes(
data['Mask'], data['ImageWidth'], data['ImageHeight'])
dictionary[standard_fields.DetectionResultFields.detection_masks] = segments
dictionary[standard_fields.DetectionResultFields.detection_boxes] = boxes
else:
dictionary[standard_fields.DetectionResultFields.detection_boxes] = data[[
'YMin', 'XMin', 'YMax', 'XMax'
]].as_matrix()
return dictionary
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for oid_od_challenge_evaluation_util."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import pandas as pd
from pycocotools import mask
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.metrics import oid_challenge_evaluation_utils as utils
class OidUtilTest(tf.test.TestCase):
def testMaskToNormalizedBox(self):
mask_np = np.array([[0, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]])
box = utils._to_normalized_box(mask_np)
self.assertAllEqual(np.array([0.25, 0.25, 0.75, 0.5]), box)
mask_np = np.array([[0, 0, 0, 0], [0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 1, 1]])
box = utils._to_normalized_box(mask_np)
self.assertAllEqual(np.array([0.25, 0.25, 1.0, 1.0]), box)
mask_np = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
box = utils._to_normalized_box(mask_np)
self.assertAllEqual(np.array([0.0, 0.0, 0.0, 0.0]), box)
def testDecodeToTensors(self):
mask1 = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]], dtype=np.uint8)
mask2 = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]], dtype=np.uint8)
encoding1 = mask.encode(np.asfortranarray(mask1))
encoding2 = mask.encode(np.asfortranarray(mask2))
vals = pd.Series([encoding1['counts'], encoding2['counts']])
image_widths = pd.Series([mask1.shape[1], mask2.shape[1]])
image_heights = pd.Series([mask1.shape[0], mask2.shape[0]])
segm, bbox = utils._decode_raw_data_into_masks_and_boxes(
vals, image_widths, image_heights)
expected_segm = np.concatenate(
[np.expand_dims(mask1, 0),
np.expand_dims(mask2, 0)], axis=0)
expected_bbox = np.array([[0.0, 0.5, 2.0 / 3.0, 1.0], [0, 0, 0, 0]])
self.assertAllEqual(expected_segm, segm)
self.assertAllEqual(expected_bbox, bbox)
class OidChallengeEvaluationUtilTest(tf.test.TestCase):
def testBuildGroundtruthDictionaryBoxes(self):
np_data = pd.DataFrame(
[['fe58ec1b06db2bb7', '/m/04bcr3', 0.0, 0.3, 0.5, 0.6, 1, None],
['fe58ec1b06db2bb7', '/m/02gy9n', 0.1, 0.2, 0.3, 0.4, 0, None],
['fe58ec1b06db2bb7', '/m/04bcr3', None, None, None, None, None, 1],
['fe58ec1b06db2bb7', '/m/083vt', None, None, None, None, None, 0],
['fe58ec1b06db2bb7', '/m/02gy9n', None, None, None, None, None, 1]],
columns=[
'ImageID', 'LabelName', 'XMin', 'XMax', 'YMin', 'YMax', 'IsGroupOf',
'ConfidenceImageLabel'
])
class_label_map = {'/m/04bcr3': 1, '/m/083vt': 2, '/m/02gy9n': 3}
groundtruth_dictionary = utils.build_groundtruth_dictionary(
np_data, class_label_map)
self.assertIn(standard_fields.InputDataFields.groundtruth_boxes,
groundtruth_dictionary)
self.assertIn(standard_fields.InputDataFields.groundtruth_classes,
groundtruth_dictionary)
self.assertIn(standard_fields.InputDataFields.groundtruth_group_of,
groundtruth_dictionary)
self.assertIn(standard_fields.InputDataFields.groundtruth_image_classes,
groundtruth_dictionary)
self.assertAllEqual(
np.array([1, 3]), groundtruth_dictionary[
standard_fields.InputDataFields.groundtruth_classes])
self.assertAllEqual(
np.array([1, 0]), groundtruth_dictionary[
standard_fields.InputDataFields.groundtruth_group_of])
expected_boxes_data = np.array([[0.5, 0.0, 0.6, 0.3], [0.3, 0.1, 0.4, 0.2]])
self.assertNDArrayNear(
expected_boxes_data, groundtruth_dictionary[
standard_fields.InputDataFields.groundtruth_boxes], 1e-5)
self.assertAllEqual(
np.array([1, 2, 3]), groundtruth_dictionary[
standard_fields.InputDataFields.groundtruth_image_classes])
def testBuildPredictionDictionaryBoxes(self):
np_data = pd.DataFrame(
[['fe58ec1b06db2bb7', '/m/04bcr3', 0.0, 0.3, 0.5, 0.6, 0.1],
['fe58ec1b06db2bb7', '/m/02gy9n', 0.1, 0.2, 0.3, 0.4, 0.2],
['fe58ec1b06db2bb7', '/m/04bcr3', 0.0, 0.1, 0.2, 0.3, 0.3]],
columns=[
'ImageID', 'LabelName', 'XMin', 'XMax', 'YMin', 'YMax', 'Score'
])
class_label_map = {'/m/04bcr3': 1, '/m/083vt': 2, '/m/02gy9n': 3}
prediction_dictionary = utils.build_predictions_dictionary(
np_data, class_label_map)
self.assertIn(standard_fields.DetectionResultFields.detection_boxes,
prediction_dictionary)
self.assertIn(standard_fields.DetectionResultFields.detection_classes,
prediction_dictionary)
self.assertIn(standard_fields.DetectionResultFields.detection_scores,
prediction_dictionary)
self.assertAllEqual(
np.array([1, 3, 1]), prediction_dictionary[
standard_fields.DetectionResultFields.detection_classes])
expected_boxes_data = np.array([[0.5, 0.0, 0.6, 0.3], [0.3, 0.1, 0.4, 0.2],
[0.2, 0.0, 0.3, 0.1]])
self.assertNDArrayNear(
expected_boxes_data, prediction_dictionary[
standard_fields.DetectionResultFields.detection_boxes], 1e-5)
self.assertNDArrayNear(
np.array([0.1, 0.2, 0.3]), prediction_dictionary[
standard_fields.DetectionResultFields.detection_scores], 1e-5)
def testBuildGroundtruthDictionaryMasks(self):
mask1 = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
dtype=np.uint8)
mask2 = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
dtype=np.uint8)
encoding1 = mask.encode(np.asfortranarray(mask1))
encoding2 = mask.encode(np.asfortranarray(mask2))
np_data = pd.DataFrame(
[[
'fe58ec1b06db2bb7', mask1.shape[1], mask1.shape[0], '/m/04bcr3',
0.0, 0.3, 0.5, 0.6, 0, None, encoding1['counts']
],
[
'fe58ec1b06db2bb7', None, None, '/m/02gy9n', 0.1, 0.2, 0.3, 0.4, 1,
None, None
],
[
'fe58ec1b06db2bb7', mask2.shape[1], mask2.shape[0], '/m/02gy9n',
0.5, 0.6, 0.8, 0.9, 0, None, encoding2['counts']
],
[
'fe58ec1b06db2bb7', None, None, '/m/04bcr3', None, None, None,
None, None, 1, None
],
[
'fe58ec1b06db2bb7', None, None, '/m/083vt', None, None, None, None,
None, 0, None
],
[
'fe58ec1b06db2bb7', None, None, '/m/02gy9n', None, None, None,
None, None, 1, None
]],
columns=[
'ImageID', 'ImageWidth', 'ImageHeight', 'LabelName', 'XMin', 'XMax',
'YMin', 'YMax', 'IsGroupOf', 'ConfidenceImageLabel', 'Mask'
])
class_label_map = {'/m/04bcr3': 1, '/m/083vt': 2, '/m/02gy9n': 3}
groundtruth_dictionary = utils.build_groundtruth_dictionary(
np_data, class_label_map)
self.assertIn(standard_fields.InputDataFields.groundtruth_boxes,
groundtruth_dictionary)
self.assertIn(standard_fields.InputDataFields.groundtruth_classes,
groundtruth_dictionary)
self.assertIn(standard_fields.InputDataFields.groundtruth_group_of,
groundtruth_dictionary)
self.assertIn(standard_fields.InputDataFields.groundtruth_image_classes,
groundtruth_dictionary)
self.assertIn(standard_fields.InputDataFields.groundtruth_instance_masks,
groundtruth_dictionary)
self.assertAllEqual(
np.array([1, 3, 3]), groundtruth_dictionary[
standard_fields.InputDataFields.groundtruth_classes])
self.assertAllEqual(
np.array([0, 1, 0]), groundtruth_dictionary[
standard_fields.InputDataFields.groundtruth_group_of])
expected_boxes_data = np.array([[0.5, 0.0, 0.6, 0.3], [0.3, 0.1, 0.4, 0.2],
[0.8, 0.5, 0.9, 0.6]])
self.assertNDArrayNear(
expected_boxes_data, groundtruth_dictionary[
standard_fields.InputDataFields.groundtruth_boxes], 1e-5)
self.assertAllEqual(
np.array([1, 2, 3]), groundtruth_dictionary[
standard_fields.InputDataFields.groundtruth_image_classes])
expected_segm = np.concatenate([
np.expand_dims(mask1, 0),
np.zeros((1, 4, 4), dtype=np.uint8),
np.expand_dims(mask2, 0)
],
axis=0)
self.assertAllEqual(
expected_segm, groundtruth_dictionary[
standard_fields.InputDataFields.groundtruth_instance_masks])
def testBuildPredictionDictionaryMasks(self):
mask1 = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
dtype=np.uint8)
mask2 = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
dtype=np.uint8)
encoding1 = mask.encode(np.asfortranarray(mask1))
encoding2 = mask.encode(np.asfortranarray(mask2))
np_data = pd.DataFrame(
[[
'fe58ec1b06db2bb7', mask1.shape[1], mask1.shape[0], '/m/04bcr3',
encoding1['counts'], 0.8
],
[
'fe58ec1b06db2bb7', mask2.shape[1], mask2.shape[0], '/m/02gy9n',
encoding2['counts'], 0.6
]],
columns=[
'ImageID', 'ImageWidth', 'ImageHeight', 'LabelName', 'Mask', 'Score'
])
class_label_map = {'/m/04bcr3': 1, '/m/02gy9n': 3}
prediction_dictionary = utils.build_predictions_dictionary(
np_data, class_label_map)
self.assertIn(standard_fields.DetectionResultFields.detection_boxes,
prediction_dictionary)
self.assertIn(standard_fields.DetectionResultFields.detection_classes,
prediction_dictionary)
self.assertIn(standard_fields.DetectionResultFields.detection_scores,
prediction_dictionary)
self.assertIn(standard_fields.DetectionResultFields.detection_masks,
prediction_dictionary)
self.assertAllEqual(
np.array([1, 3]), prediction_dictionary[
standard_fields.DetectionResultFields.detection_classes])
expected_boxes_data = np.array([[0.0, 0.5, 0.5, 1.0], [0, 0, 0, 0]])
self.assertNDArrayNear(
expected_boxes_data, prediction_dictionary[
standard_fields.DetectionResultFields.detection_boxes], 1e-5)
self.assertNDArrayNear(
np.array([0.8, 0.6]), prediction_dictionary[
standard_fields.DetectionResultFields.detection_scores], 1e-5)
expected_segm = np.concatenate(
[np.expand_dims(mask1, 0),
np.expand_dims(mask2, 0)], axis=0)
self.assertAllEqual(
expected_segm, prediction_dictionary[
standard_fields.DetectionResultFields.detection_masks])
if __name__ == '__main__':
tf.test.main()
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for oid_od_challenge_evaluation_util."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import pandas as pd
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.metrics import oid_od_challenge_evaluation_utils as utils
class OidOdChallengeEvaluationUtilTest(tf.test.TestCase):
def testBuildGroundtruthDictionary(self):
np_data = pd.DataFrame(
[['fe58ec1b06db2bb7', '/m/04bcr3', 0.0, 0.3, 0.5, 0.6, 1, None], [
'fe58ec1b06db2bb7', '/m/02gy9n', 0.1, 0.2, 0.3, 0.4, 0, None
], ['fe58ec1b06db2bb7', '/m/04bcr3', None, None, None, None, None, 1], [
'fe58ec1b06db2bb7', '/m/083vt', None, None, None, None, None, 0
], ['fe58ec1b06db2bb7', '/m/02gy9n', None, None, None, None, None, 1]],
columns=[
'ImageID', 'LabelName', 'XMin', 'XMax', 'YMin', 'YMax', 'IsGroupOf',
'ConfidenceImageLabel'
])
class_label_map = {'/m/04bcr3': 1, '/m/083vt': 2, '/m/02gy9n': 3}
groundtruth_dictionary = utils.build_groundtruth_boxes_dictionary(
np_data, class_label_map)
self.assertTrue(standard_fields.InputDataFields.groundtruth_boxes in
groundtruth_dictionary)
self.assertTrue(standard_fields.InputDataFields.groundtruth_classes in
groundtruth_dictionary)
self.assertTrue(standard_fields.InputDataFields.groundtruth_group_of in
groundtruth_dictionary)
self.assertTrue(standard_fields.InputDataFields.groundtruth_image_classes in
groundtruth_dictionary)
self.assertAllEqual(
np.array([1, 3]), groundtruth_dictionary[
standard_fields.InputDataFields.groundtruth_classes])
self.assertAllEqual(
np.array([1, 0]), groundtruth_dictionary[
standard_fields.InputDataFields.groundtruth_group_of])
expected_boxes_data = np.array([[0.5, 0.0, 0.6, 0.3], [0.3, 0.1, 0.4, 0.2]])
self.assertNDArrayNear(
expected_boxes_data, groundtruth_dictionary[
standard_fields.InputDataFields.groundtruth_boxes], 1e-5)
self.assertAllEqual(
np.array([1, 2, 3]), groundtruth_dictionary[
standard_fields.InputDataFields.groundtruth_image_classes])
def testBuildPredictionDictionary(self):
np_data = pd.DataFrame(
[['fe58ec1b06db2bb7', '/m/04bcr3', 0.0, 0.3, 0.5, 0.6, 0.1], [
'fe58ec1b06db2bb7', '/m/02gy9n', 0.1, 0.2, 0.3, 0.4, 0.2
], ['fe58ec1b06db2bb7', '/m/04bcr3', 0.0, 0.1, 0.2, 0.3, 0.3]],
columns=[
'ImageID', 'LabelName', 'XMin', 'XMax', 'YMin', 'YMax', 'Score'
])
class_label_map = {'/m/04bcr3': 1, '/m/083vt': 2, '/m/02gy9n': 3}
prediction_dictionary = utils.build_predictions_dictionary(
np_data, class_label_map)
self.assertTrue(standard_fields.DetectionResultFields.detection_boxes in
prediction_dictionary)
self.assertTrue(standard_fields.DetectionResultFields.detection_classes in
prediction_dictionary)
self.assertTrue(standard_fields.DetectionResultFields.detection_scores in
prediction_dictionary)
self.assertAllEqual(
np.array([1, 3, 1]), prediction_dictionary[
standard_fields.DetectionResultFields.detection_classes])
expected_boxes_data = np.array([[0.5, 0.0, 0.6, 0.3], [0.3, 0.1, 0.4, 0.2],
[0.2, 0.0, 0.3, 0.1]])
self.assertNDArrayNear(
expected_boxes_data, prediction_dictionary[
standard_fields.DetectionResultFields.detection_boxes], 1e-5)
self.assertNDArrayNear(
np.array([0.1, 0.2, 0.3]), prediction_dictionary[
standard_fields.DetectionResultFields.detection_scores], 1e-5)
if __name__ == '__main__':
tf.test.main()
......@@ -17,7 +17,7 @@ r"""Runs evaluation using OpenImages groundtruth and predictions.
Example usage:
python \
models/research/object_detection/metrics/oid_vrd_challenge_evaluation.py \
--input_annotations_boxes=/path/to/input/annotations-human-bbox.csv \
--input_annotations_vrd=/path/to/input/annotations-human-bbox.csv \
--input_annotations_labels=/path/to/input/annotations-label.csv \
--input_class_labelmap=/path/to/input/class_labelmap.pbtxt \
--input_relationship_labelmap=/path/to/input/relationship_labelmap.pbtxt \
......@@ -126,7 +126,7 @@ if __name__ == '__main__':
description=
'Evaluate Open Images Visual Relationship Detection predictions.')
parser.add_argument(
'--input_annotations_boxes',
'--input_annotations_vrd',
required=True,
help='File with groundtruth vrd annotations.')
parser.add_argument(
......