"test/srt/rl/test_verl_engine_4_gpu.py" did not exist on "9179ea15957bad2af5f678cf8bbdb5ddc8d2ab41"
Commit 9fce9c64 authored by Zhichao Lu, committed by pkulzc

Merged commit includes the following changes:

199348852  by Zhichao Lu:

    Small typo fixes in VRD evaluation.

--
199315191  by Zhichao Lu:

    Change padding shapes when additional channels are available.

--
199309180  by Zhichao Lu:

    Adds minor fixes to the Object Detection API implementation.

--
199298605  by Zhichao Lu:

    Force num_readers to be 1 when the only input file is not sharded.

--
199292952  by Zhichao Lu:

    Adds image-level labels parsing into TfExampleDetectionAndGTParser.

--
199259866  by Zhichao Lu:

    Visual Relationships Evaluation executable.

--
199208330  by Zhichao Lu:

    Infer train_config.batch_size as the effective batch size; the trainer therefore divides it by train_config.replica_to_aggregate to get the per-worker batch size.

--
199207842  by Zhichao Lu:

    Internal change.

--
199204222  by Zhichao Lu:

    In case the image has more than three channels, we only take the first three channels for visualization.

--
199194388  by Zhichao Lu:

    Correcting protocols description: VOC 2007 -> VOC 2012.

--
199188290  by Zhichao Lu:

    Adds per-relationship APs and mAP computation to VRD evaluation.

--
199158801  by Zhichao Lu:

    If available, additional channels are merged with input image.

--
199099637  by Zhichao Lu:

    OpenImages Challenge metric support:
    - adding verified labels standard field for TFExample;
    - adding tfrecord creation functionality.

--
198957391  by Zhichao Lu:

    Allow tf record sharding when creating pets dataset.

--
198925184  by Zhichao Lu:

    Introduce moving average support for evaluation. Also adding the ability to override this configuration via config_util.

--
198918186  by Zhichao Lu:

    Handles the case where there are 0 box masks.

--
198809009  by Zhichao Lu:

    Plumb groundtruth weights into target assigner for Faster RCNN.

--
198759987  by Zhichao Lu:

    Fix object detection test broken by shape inference.

--
198668602  by Zhichao Lu:

    Adding a new input field in data_decoders/tf_example_decoder.py for storing additional channels.

--
198530013  by Zhichao Lu:

    A util for hierarchical expansion of boxes and labels of the OID dataset.

--
198503124  by Zhichao Lu:

    Fix dimension mismatch error introduced by
    https://github.com/tensorflow/tensorflow/pull/18251, or cl/194031845.
    After above change, conv2d strictly checks for conv_dims + 2 == input_rank.

--
198445807  by Zhichao Lu:

    Enabling Object Detection Challenge 2018 metric in evaluator.py framework for
    running eval job.
    Renaming old OpenImages V2 metric.

--
198413950  by Zhichao Lu:

    Support generic configuration override using namespaced keys

    Useful for adding custom hyper-parameter tuning fields without having to add custom override methods to config_utils.py.

--
198106437  by Zhichao Lu:

    Enable fused batchnorm now that quantization is supported.

--
198048364  by Zhichao Lu:

    Add support for keypoints in tf sequence examples and some util ops.

--
198004736  by Zhichao Lu:

    Relax postprocessing unit tests that are based on the assumption that tf.image.non_max_suppression is stable with respect to its input.

--
197997513  by Zhichao Lu:

    More lenient validation for normalized box boundaries.

--
197940068  by Zhichao Lu:

    A couple of minor updates/fixes:
    - Updating input reader proto with option to use display_name when decoding data.
    - Updating visualization tool to specify whether using absolute or normalized box coordinates. Appropriate boxes will now appear in TB when using model_main.py

--
197920152  by Zhichao Lu:

    Add quantized training support in the new OD binaries and a config for SSD Mobilenet v1 quantized training that is TPU compatible.

--
197213563  by Zhichao Lu:

    Do not share batch_norm for classification and regression tower in weight shared box predictor.

--
197196757  by Zhichao Lu:

    Relax the box_predictor api to return box_prediction of shape [batch_size, num_anchors, code_size] in addition to [batch_size, num_anchors, (1|q), code_size].

--
196898361  by Zhichao Lu:

    Allow per-channel scalar value to pad input image with when using keep aspect ratio resizer (when pad_to_max_dimension=True).

    In Object Detection Pipeline, we pad image before normalization and this skews batch_norm statistics during training. The option to set per channel pad value lets us truly pad with zeros.

--
196592101  by Zhichao Lu:

    Fix bug regarding tfrecord shuffling in object_detection

--
196320138  by Zhichao Lu:

    Fix typo in exporting_models.md

--

PiperOrigin-RevId: 199348852
parent ed901b73
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Evaluator class for Visual Relations Detection.
VRDDetectionEvaluator is a class which manages ground truth information of a
visual relations detection (vrd) dataset, and computes frequently used detection
metrics such as Precision, Recall, Recall@k, of the provided vrd detection
results.
It supports the following operations:
1) Adding ground truth information of images sequentially.
2) Adding detection results of images sequentially.
3) Evaluating detection metrics on already inserted detection results.
Note 1: groundtruth should be inserted before evaluation.
Note 2: this module operates on numpy boxes and box lists.
"""
from abc import abstractmethod
import collections
import logging
import numpy as np
from object_detection.core import standard_fields
from object_detection.utils import metrics
from object_detection.utils import object_detection_evaluation
from object_detection.utils import per_image_vrd_evaluation
# The standard input numpy datatypes are defined below:
# box_data_type - datatype of the groundtruth visual relations box annotations;
# this datatype consists of two named boxes: subject bounding box and object
# bounding box. Each box is of the format [y_min, x_min, y_max, x_max], each
# coordinate being of type float32.
# label_data_type - corresponding datatype of the visual relations label
# annotations; it consists of three numerical class labels: subject class
# label, object class label and relation class label, each class label being
# of type int32.
vrd_box_data_type = np.dtype([('subject', 'f4', (4,)), ('object', 'f4', (4,))])
single_box_data_type = np.dtype([('box', 'f4', (4,))])
label_data_type = np.dtype([('subject', 'i4'), ('object', 'i4'), ('relation',
'i4')])
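As a standalone illustration of the structured datatypes defined above, the following sketch uses plain numpy only (no object_detection imports) and re-declares the same dtypes locally:

```python
import numpy as np

# Mirrors the module-level dtypes defined above.
vrd_box_data_type = np.dtype([('subject', 'f4', (4,)), ('object', 'f4', (4,))])
label_data_type = np.dtype([('subject', 'i4'), ('object', 'i4'),
                            ('relation', 'i4')])

# Two relation instances; each row pairs a subject box with an object box,
# both in [y_min, x_min, y_max, x_max] format.
boxes = np.array([([0, 0, 1, 1], [1, 1, 2, 2]),
                  ([0, 0, 1, 1], [1, 2, 2, 3])], dtype=vrd_box_data_type)
labels = np.array([(1, 2, 3), (1, 4, 3)], dtype=label_data_type)

# Named fields are selected like dict keys and slice like ordinary arrays.
print(boxes['subject'].shape)  # (2, 4)
print(labels['relation'])      # [3 3]
```

This is the shape convention the evaluator relies on throughout: selecting a named field from an [M]-shaped structured array yields an [M, 4] float array of boxes.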
class VRDDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
"""A class to evaluate VRD detections.
This class serves as a base class for VRD evaluation in two settings:
- phrase detection
- relation detection.
"""
def __init__(self, matching_iou_threshold=0.5, metric_prefix=None):
"""Constructor.
Args:
matching_iou_threshold: IOU threshold to use for matching groundtruth
boxes to detection boxes.
metric_prefix: (optional) string prefix for metric name; if None, no
prefix is used.
"""
super(VRDDetectionEvaluator, self).__init__([])
self._matching_iou_threshold = matching_iou_threshold
self._evaluation = _VRDDetectionEvaluation(
matching_iou_threshold=self._matching_iou_threshold)
self._image_ids = set([])
self._metric_prefix = (metric_prefix + '_') if metric_prefix else ''
self._evaluatable_labels = {}
self._negative_labels = {}
@abstractmethod
def _process_groundtruth_boxes(self, groundtruth_box_tuples):
"""Pre-processes boxes before adding them to the VRDDetectionEvaluation.
Phrase detection and Relation detection subclasses re-implement this method
depending on the task.
Args:
groundtruth_box_tuples: A numpy array of structures with the shape
[M, 1], each structure containing the same number of named bounding
boxes. Each box is of the format [y_min, x_min, y_max, x_max] (see
datatype vrd_box_data_type, single_box_data_type above).
"""
    raise NotImplementedError(
        '_process_groundtruth_boxes method should be implemented in '
        'subclasses of VRDDetectionEvaluator.')
@abstractmethod
def _process_detection_boxes(self, detections_box_tuples):
"""Pre-processes boxes before adding them to the VRDDetectionEvaluation.
Phrase detection and Relation detection subclasses re-implement this method
depending on the task.
Args:
detections_box_tuples: A numpy array of structures with the shape
[M, 1], each structure containing the same number of named bounding
boxes. Each box is of the format [y_min, x_min, y_max, x_max] (see
datatype vrd_box_data_type, single_box_data_type above).
"""
    raise NotImplementedError(
        '_process_detection_boxes method should be implemented in '
        'subclasses of VRDDetectionEvaluator.')
def add_single_ground_truth_image_info(self, image_id, groundtruth_dict):
"""Adds groundtruth for a single image to be used for evaluation.
Args:
image_id: A unique string/integer identifier for the image.
groundtruth_dict: A dictionary containing -
standard_fields.InputDataFields.groundtruth_boxes: A numpy array
of structures with the shape [M, 1], representing M tuples, each tuple
containing the same number of named bounding boxes.
Each box is of the format [y_min, x_min, y_max, x_max] (see
datatype vrd_box_data_type, single_box_data_type above).
standard_fields.InputDataFields.groundtruth_classes: A numpy array of
structures shape [M, 1], representing the class labels of the
corresponding bounding boxes and possibly additional classes (see
datatype label_data_type above).
standard_fields.InputDataFields.verified_labels: numpy array
of shape [K] containing verified labels.
Raises:
ValueError: On adding groundtruth for an image more than once.
"""
if image_id in self._image_ids:
raise ValueError('Image with id {} already added.'.format(image_id))
groundtruth_class_tuples = (
groundtruth_dict[standard_fields.InputDataFields.groundtruth_classes])
groundtruth_box_tuples = (
groundtruth_dict[standard_fields.InputDataFields.groundtruth_boxes])
self._evaluation.add_single_ground_truth_image_info(
image_key=image_id,
groundtruth_box_tuples=self._process_groundtruth_boxes(
groundtruth_box_tuples),
groundtruth_class_tuples=groundtruth_class_tuples)
self._image_ids.update([image_id])
    all_classes = []
    for field in groundtruth_box_tuples.dtype.fields:
      all_classes.append(groundtruth_class_tuples[field])
    groundtruth_positive_classes = np.unique(np.concatenate(all_classes))
    verified_labels = groundtruth_dict.get(
        standard_fields.InputDataFields.verified_labels,
        np.array([], dtype=int))
    self._evaluatable_labels[image_id] = np.unique(
        np.concatenate((verified_labels, groundtruth_positive_classes)))
    self._negative_labels[image_id] = np.setdiff1d(
        verified_labels, groundtruth_positive_classes)
def add_single_detected_image_info(self, image_id, detections_dict):
"""Adds detections for a single image to be used for evaluation.
Args:
image_id: A unique string/integer identifier for the image.
detections_dict: A dictionary containing -
standard_fields.DetectionResultFields.detection_boxes: A numpy array of
structures with shape [N, 1], representing N tuples, each tuple
containing the same number of named bounding boxes.
Each box is of the format [y_min, x_min, y_max, x_max] (as an example
see datatype vrd_box_data_type, single_box_data_type above).
standard_fields.DetectionResultFields.detection_scores: float32 numpy
array of shape [N] containing detection scores for the boxes.
standard_fields.DetectionResultFields.detection_classes: A numpy array
of structures shape [N, 1], representing the class labels of the
corresponding bounding boxes and possibly additional classes (see
datatype label_data_type above).
"""
num_detections = detections_dict[
standard_fields.DetectionResultFields.detection_boxes].shape[0]
detection_class_tuples = detections_dict[
standard_fields.DetectionResultFields.detection_classes]
detection_box_tuples = detections_dict[
standard_fields.DetectionResultFields.detection_boxes]
    selector = np.ones(num_detections, dtype=bool)
    # Only check boxable labels. A detection is kept only if all of its labels
    # are evaluatable, i.e. either verified (including negatively verified
    # labels, which are sure false positives) or present in the groundtruth.
    # Note: starting from all-True and OR-ing conditions in would make the
    # selector a no-op, so the per-field checks must be AND-ed.
    for field in detection_box_tuples.dtype.fields:
      selector &= np.isin(detection_class_tuples[field],
                          self._evaluatable_labels[image_id])
self._evaluation.add_single_detected_image_info(
image_key=image_id,
detected_box_tuples=self._process_detection_boxes(
detection_box_tuples[selector]),
detected_scores=detections_dict[
standard_fields.DetectionResultFields.detection_scores][selector],
detected_class_tuples=detection_class_tuples[selector])
def evaluate(self, relationships=None):
"""Compute evaluation result.
Args:
relationships: A dictionary of numerical label-text label mapping; if
specified, returns per-relationship AP.
Returns:
A dictionary of metrics with the following fields -
summary_metrics:
'weightedAP@<matching_iou_threshold>IOU' : weighted average precision
at the specified IOU threshold.
'AP@<matching_iou_threshold>IOU/<relationship>' : AP per relationship.
'mAP@<matching_iou_threshold>IOU': mean average precision at the
specified IOU threshold.
'Recall@50@<matching_iou_threshold>IOU': recall@50 at the specified IOU
threshold.
'Recall@100@<matching_iou_threshold>IOU': recall@100 at the specified
IOU threshold.
if relationships is specified, returns <relationship> in AP metrics as
readable names, otherwise the names correspond to class numbers.
"""
(weighted_average_precision, mean_average_precision, average_precisions, _,
_, recall_50, recall_100, _, _) = (
self._evaluation.evaluate())
vrd_metrics = {
(self._metric_prefix + 'weightedAP@{}IOU'.format(
self._matching_iou_threshold)):
weighted_average_precision,
self._metric_prefix + 'mAP@{}IOU'.format(self._matching_iou_threshold):
mean_average_precision,
self._metric_prefix + 'Recall@50@{}IOU'.format(
self._matching_iou_threshold):
recall_50,
self._metric_prefix + 'Recall@100@{}IOU'.format(
self._matching_iou_threshold):
recall_100,
}
    if relationships:
      for key, average_precision in average_precisions.items():
        vrd_metrics[self._metric_prefix + 'AP@{}IOU/{}'.format(
            self._matching_iou_threshold,
            relationships[key])] = average_precision
    else:
      for key, average_precision in average_precisions.items():
        vrd_metrics[self._metric_prefix + 'AP@{}IOU/{}'.format(
            self._matching_iou_threshold, key)] = average_precision
return vrd_metrics
def clear(self):
"""Clears the state to prepare for a fresh evaluation."""
self._evaluation = _VRDDetectionEvaluation(
matching_iou_threshold=self._matching_iou_threshold)
self._image_ids.clear()
self._negative_labels.clear()
self._evaluatable_labels.clear()
class VRDRelationDetectionEvaluator(VRDDetectionEvaluator):
"""A class to evaluate VRD detections in relations setting.
  Expected groundtruth box datatype is vrd_box_data_type, expected groundtruth
  labels datatype is label_data_type.
  Expected detection box datatype is vrd_box_data_type, expected detection
  labels datatype is label_data_type.
"""
def __init__(self, matching_iou_threshold=0.5):
super(VRDRelationDetectionEvaluator, self).__init__(
matching_iou_threshold=matching_iou_threshold,
metric_prefix='VRDMetric_Relationships')
def _process_groundtruth_boxes(self, groundtruth_box_tuples):
"""Pre-processes boxes before adding them to the VRDDetectionEvaluation.
Args:
groundtruth_box_tuples: A numpy array of structures with the shape
[M, 1], each structure containing the same number of named bounding
boxes. Each box is of the format [y_min, x_min, y_max, x_max].
Returns:
Unchanged input.
"""
return groundtruth_box_tuples
def _process_detection_boxes(self, detections_box_tuples):
"""Pre-processes boxes before adding them to the VRDDetectionEvaluation.
Phrase detection and Relation detection subclasses re-implement this method
depending on the task.
Args:
detections_box_tuples: A numpy array of structures with the shape
[M, 1], each structure containing the same number of named bounding
boxes. Each box is of the format [y_min, x_min, y_max, x_max] (see
datatype vrd_box_data_type, single_box_data_type above).
Returns:
Unchanged input.
"""
return detections_box_tuples
class VRDPhraseDetectionEvaluator(VRDDetectionEvaluator):
"""A class to evaluate VRD detections in phrase setting.
  Expected groundtruth box datatype is vrd_box_data_type, expected groundtruth
labels datatype is label_data_type.
Expected detection box datatype is single_box_data_type, expected detection
labels datatype is label_data_type.
"""
def __init__(self, matching_iou_threshold=0.5):
super(VRDPhraseDetectionEvaluator, self).__init__(
matching_iou_threshold=matching_iou_threshold,
metric_prefix='VRDMetric_Phrases')
def _process_groundtruth_boxes(self, groundtruth_box_tuples):
"""Pre-processes boxes before adding them to the VRDDetectionEvaluation.
In case of phrase evaluation task, evaluation expects exactly one bounding
box containing all objects in the phrase. This bounding box is computed
as an enclosing box of all groundtruth boxes of a phrase.
Args:
groundtruth_box_tuples: A numpy array of structures with the shape
[M, 1], each structure containing the same number of named bounding
boxes. Each box is of the format [y_min, x_min, y_max, x_max]. See
vrd_box_data_type for an example of structure.
Returns:
result: A numpy array of structures with the shape [M, 1], each
structure containing exactly one named bounding box. i-th output
structure corresponds to the result of processing i-th input structure,
where the named bounding box is computed as an enclosing bounding box
of all bounding boxes of the i-th input structure.
"""
    first_box_key = next(iter(groundtruth_box_tuples.dtype.fields))
miny = groundtruth_box_tuples[first_box_key][:, 0]
minx = groundtruth_box_tuples[first_box_key][:, 1]
maxy = groundtruth_box_tuples[first_box_key][:, 2]
maxx = groundtruth_box_tuples[first_box_key][:, 3]
for fields in groundtruth_box_tuples.dtype.fields:
miny = np.minimum(groundtruth_box_tuples[fields][:, 0], miny)
minx = np.minimum(groundtruth_box_tuples[fields][:, 1], minx)
maxy = np.maximum(groundtruth_box_tuples[fields][:, 2], maxy)
maxx = np.maximum(groundtruth_box_tuples[fields][:, 3], maxx)
data_result = []
for i in range(groundtruth_box_tuples.shape[0]):
data_result.append(([miny[i], minx[i], maxy[i], maxx[i]],))
result = np.array(data_result, dtype=[('box', 'f4', (4,))])
return result
def _process_detection_boxes(self, detections_box_tuples):
"""Pre-processes boxes before adding them to the VRDDetectionEvaluation.
In case of phrase evaluation task, evaluation expects exactly one bounding
box containing all objects in the phrase. This bounding box is computed
as an enclosing box of all groundtruth boxes of a phrase.
Args:
detections_box_tuples: A numpy array of structures with the shape
[M, 1], each structure containing the same number of named bounding
boxes. Each box is of the format [y_min, x_min, y_max, x_max]. See
vrd_box_data_type for an example of this structure.
Returns:
result: A numpy array of structures with the shape [M, 1], each
structure containing exactly one named bounding box. i-th output
structure corresponds to the result of processing i-th input structure,
where the named bounding box is computed as an enclosing bounding box
of all bounding boxes of the i-th input structure.
"""
    first_box_key = next(iter(detections_box_tuples.dtype.fields))
miny = detections_box_tuples[first_box_key][:, 0]
minx = detections_box_tuples[first_box_key][:, 1]
maxy = detections_box_tuples[first_box_key][:, 2]
maxx = detections_box_tuples[first_box_key][:, 3]
for fields in detections_box_tuples.dtype.fields:
miny = np.minimum(detections_box_tuples[fields][:, 0], miny)
minx = np.minimum(detections_box_tuples[fields][:, 1], minx)
maxy = np.maximum(detections_box_tuples[fields][:, 2], maxy)
maxx = np.maximum(detections_box_tuples[fields][:, 3], maxx)
data_result = []
for i in range(detections_box_tuples.shape[0]):
data_result.append(([miny[i], minx[i], maxy[i], maxx[i]],))
result = np.array(data_result, dtype=[('box', 'f4', (4,))])
return result
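The enclosing-box reduction performed by the two `_process_*_boxes` methods above can be sketched in isolation. `enclosing_boxes` below is a hypothetical standalone helper mirroring that logic, not part of the object_detection API:

```python
import numpy as np

vrd_box_data_type = np.dtype([('subject', 'f4', (4,)), ('object', 'f4', (4,))])

def enclosing_boxes(box_tuples):
  """Collapses each tuple of named boxes into one box enclosing all of them."""
  fields = list(box_tuples.dtype.fields)
  # Reduce each coordinate across the named boxes of every tuple.
  miny = np.min([box_tuples[f][:, 0] for f in fields], axis=0)
  minx = np.min([box_tuples[f][:, 1] for f in fields], axis=0)
  maxy = np.max([box_tuples[f][:, 2] for f in fields], axis=0)
  maxx = np.max([box_tuples[f][:, 3] for f in fields], axis=0)
  rows = [([miny[i], minx[i], maxy[i], maxx[i]],)
          for i in range(box_tuples.shape[0])]
  return np.array(rows, dtype=[('box', 'f4', (4,))])

# Subject [0, 0, 1, 1] and object [1, 1, 2, 2] collapse to [0, 0, 2, 2].
tuples = np.array([([0, 0, 1, 1], [1, 1, 2, 2])], dtype=vrd_box_data_type)
print(enclosing_boxes(tuples)['box'][0])  # [0. 0. 2. 2.]
```

Vectorizing the min/max over fields avoids the per-field running accumulators used in the class methods, but produces the same single-named-box output structure.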
VRDDetectionEvalMetrics = collections.namedtuple('VRDDetectionEvalMetrics', [
'weighted_average_precision', 'mean_average_precision',
'average_precisions', 'precisions', 'recalls', 'recall_50', 'recall_100',
'median_rank_50', 'median_rank_100'
])
class _VRDDetectionEvaluation(object):
"""Performs metric computation for the VRD task. This class is internal.
"""
def __init__(self, matching_iou_threshold=0.5):
"""Constructor.
Args:
matching_iou_threshold: IOU threshold to use for matching groundtruth
boxes to detection boxes.
"""
self._per_image_eval = per_image_vrd_evaluation.PerImageVRDEvaluation(
matching_iou_threshold=matching_iou_threshold)
self._groundtruth_box_tuples = {}
self._groundtruth_class_tuples = {}
self._num_gt_instances = 0
self._num_gt_imgs = 0
self._num_gt_instances_per_relationship = {}
self.clear_detections()
def clear_detections(self):
"""Clears detections."""
self._detection_keys = set()
self._scores = []
self._relation_field_values = []
self._tp_fp_labels = []
self._average_precisions = {}
self._precisions = []
self._recalls = []
def add_single_ground_truth_image_info(
self, image_key, groundtruth_box_tuples, groundtruth_class_tuples):
"""Adds groundtruth for a single image to be used for evaluation.
Args:
image_key: A unique string/integer identifier for the image.
groundtruth_box_tuples: A numpy array of structures with the shape
[M, 1], representing M tuples, each tuple containing the same number
of named bounding boxes.
Each box is of the format [y_min, x_min, y_max, x_max].
groundtruth_class_tuples: A numpy array of structures shape [M, 1],
representing the class labels of the corresponding bounding boxes and
possibly additional classes.
"""
    if image_key in self._groundtruth_box_tuples:
      logging.warning(
          'image %s has already been added to the ground truth database.',
          image_key)
return
self._groundtruth_box_tuples[image_key] = groundtruth_box_tuples
self._groundtruth_class_tuples[image_key] = groundtruth_class_tuples
self._update_groundtruth_statistics(groundtruth_class_tuples)
def add_single_detected_image_info(self, image_key, detected_box_tuples,
detected_scores, detected_class_tuples):
"""Adds detections for a single image to be used for evaluation.
Args:
image_key: A unique string/integer identifier for the image.
detected_box_tuples: A numpy array of structures with shape [N, 1],
representing N tuples, each tuple containing the same number of named
bounding boxes.
Each box is of the format [y_min, x_min, y_max, x_max].
      detected_scores: A float numpy array of shape [N], representing
        the confidence scores of the N detected object instances.
detected_class_tuples: A numpy array of structures shape [N, 1],
representing the class labels of the corresponding bounding boxes and
possibly additional classes.
"""
self._detection_keys.add(image_key)
if image_key in self._groundtruth_box_tuples:
groundtruth_box_tuples = self._groundtruth_box_tuples[image_key]
groundtruth_class_tuples = self._groundtruth_class_tuples[image_key]
else:
groundtruth_box_tuples = np.empty(shape=[0, 4], dtype=float)
groundtruth_class_tuples = np.array([], dtype=int)
scores, tp_fp_labels, mapping = (
self._per_image_eval.compute_detection_tp_fp(
detected_box_tuples=detected_box_tuples,
detected_scores=detected_scores,
detected_class_tuples=detected_class_tuples,
groundtruth_box_tuples=groundtruth_box_tuples,
groundtruth_class_tuples=groundtruth_class_tuples))
self._scores += [scores]
self._tp_fp_labels += [tp_fp_labels]
self._relation_field_values += [detected_class_tuples[mapping]['relation']]
def _update_groundtruth_statistics(self, groundtruth_class_tuples):
"""Updates grouth truth statistics.
Args:
groundtruth_class_tuples: A numpy array of structures shape [M, 1],
representing the class labels of the corresponding bounding boxes and
possibly additional classes.
"""
self._num_gt_instances += groundtruth_class_tuples.shape[0]
self._num_gt_imgs += 1
for relation_field_value in np.unique(groundtruth_class_tuples['relation']):
if relation_field_value not in self._num_gt_instances_per_relationship:
self._num_gt_instances_per_relationship[relation_field_value] = 0
self._num_gt_instances_per_relationship[relation_field_value] += np.sum(
groundtruth_class_tuples['relation'] == relation_field_value)
  def evaluate(self):
    """Computes evaluation result.
    Returns:
      A named tuple with the following fields -
        weighted_average_precision: weighted average precision.
        mean_average_precision: mean average precision over relationships.
        average_precisions: dict of per-relationship average precisions.
        precisions: an array of precisions.
        recalls: an array of recalls.
        recall_50: recall computed on the 50 top-scoring samples.
        recall_100: recall computed on the 100 top-scoring samples.
        median_rank_50: median rank computed on the 50 top-scoring samples.
        median_rank_100: median rank computed on the 100 top-scoring samples.
    """
    if self._num_gt_instances == 0:
      logging.warning('No ground truth instances')
if not self._scores:
scores = np.array([], dtype=float)
tp_fp_labels = np.array([], dtype=bool)
else:
scores = np.concatenate(self._scores)
tp_fp_labels = np.concatenate(self._tp_fp_labels)
relation_field_values = np.concatenate(self._relation_field_values)
    for relation_field_value, _ in (
        self._num_gt_instances_per_relationship.items()):
precisions, recalls = metrics.compute_precision_recall(
scores[relation_field_values == relation_field_value],
tp_fp_labels[relation_field_values == relation_field_value],
self._num_gt_instances_per_relationship[relation_field_value])
self._average_precisions[
relation_field_value] = metrics.compute_average_precision(
precisions, recalls)
    self._mean_average_precision = np.mean(
        list(self._average_precisions.values()))
self._precisions, self._recalls = metrics.compute_precision_recall(
scores, tp_fp_labels, self._num_gt_instances)
self._weighted_average_precision = metrics.compute_average_precision(
self._precisions, self._recalls)
self._recall_50 = (
metrics.compute_recall_at_k(self._tp_fp_labels, self._num_gt_instances,
50))
self._median_rank_50 = (
metrics.compute_median_rank_at_k(self._tp_fp_labels, 50))
self._recall_100 = (
metrics.compute_recall_at_k(self._tp_fp_labels, self._num_gt_instances,
100))
self._median_rank_100 = (
metrics.compute_median_rank_at_k(self._tp_fp_labels, 100))
return VRDDetectionEvalMetrics(
self._weighted_average_precision, self._mean_average_precision,
self._average_precisions, self._precisions, self._recalls,
self._recall_50, self._recall_100, self._median_rank_50,
self._median_rank_100)
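The Recall@k quantity returned above can be sketched in isolation. `recall_at_k` below is a simplified standalone reimplementation for illustration (the real `object_detection.utils.metrics.compute_recall_at_k` operates on a list of per-image tp/fp arrays and has a different signature):

```python
import numpy as np

def recall_at_k(scores, tp_fp_labels, num_gt, k):
  """Fraction of ground truth matched by the k highest-scoring detections."""
  order = np.argsort(-np.asarray(scores, dtype=float))  # descending confidence
  top_k = np.asarray(tp_fp_labels, dtype=bool)[order][:k]
  return float(np.sum(top_k)) / num_gt

# Four detections, four ground truth instances; only one true positive lands
# in the top-2 by score, so Recall@2 is 1/4.
scores = np.array([0.9, 0.8, 0.3, 0.7])
tp_fp = np.array([True, False, True, True])
print(recall_at_k(scores, tp_fp, num_gt=4, k=2))  # 0.25
```

This matches the 0.25 Recall@50/Recall@100 values asserted in the unit tests below, where a single true-positive detection is evaluated against four ground truth tuples.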
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for tensorflow_models.object_detection.utils.vrd_evaluation."""
import numpy as np
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.utils import vrd_evaluation
class VRDRelationDetectionEvaluatorTest(tf.test.TestCase):
def test_vrdrelation_evaluator(self):
self.vrd_eval = vrd_evaluation.VRDRelationDetectionEvaluator()
image_key1 = 'img1'
groundtruth_box_tuples1 = np.array(
[([0, 0, 1, 1], [1, 1, 2, 2]), ([0, 0, 1, 1], [1, 2, 2, 3])],
dtype=vrd_evaluation.vrd_box_data_type)
groundtruth_class_tuples1 = np.array(
[(1, 2, 3), (1, 4, 3)], dtype=vrd_evaluation.label_data_type)
groundtruth_verified_labels1 = np.array([1, 2, 3, 4, 5], dtype=int)
self.vrd_eval.add_single_ground_truth_image_info(
image_key1, {
standard_fields.InputDataFields.groundtruth_boxes:
groundtruth_box_tuples1,
standard_fields.InputDataFields.groundtruth_classes:
groundtruth_class_tuples1,
standard_fields.InputDataFields.verified_labels:
groundtruth_verified_labels1
})
image_key2 = 'img2'
groundtruth_box_tuples2 = np.array(
[([0, 0, 1, 1], [1, 1, 2, 2])], dtype=vrd_evaluation.vrd_box_data_type)
groundtruth_class_tuples2 = np.array(
[(1, 4, 3)], dtype=vrd_evaluation.label_data_type)
self.vrd_eval.add_single_ground_truth_image_info(
image_key2, {
standard_fields.InputDataFields.groundtruth_boxes:
groundtruth_box_tuples2,
standard_fields.InputDataFields.groundtruth_classes:
groundtruth_class_tuples2,
})
image_key3 = 'img3'
groundtruth_box_tuples3 = np.array(
[([0, 0, 1, 1], [1, 1, 2, 2])], dtype=vrd_evaluation.vrd_box_data_type)
groundtruth_class_tuples3 = np.array(
[(1, 2, 4)], dtype=vrd_evaluation.label_data_type)
self.vrd_eval.add_single_ground_truth_image_info(
image_key3, {
standard_fields.InputDataFields.groundtruth_boxes:
groundtruth_box_tuples3,
standard_fields.InputDataFields.groundtruth_classes:
groundtruth_class_tuples3,
})
image_key = 'img1'
detected_box_tuples = np.array(
[([0, 0.3, 1, 1], [1.1, 1, 2, 2]), ([0, 0, 1, 1], [1, 1, 2, 2])],
dtype=vrd_evaluation.vrd_box_data_type)
detected_class_tuples = np.array(
[(1, 2, 5), (1, 2, 3)], dtype=vrd_evaluation.label_data_type)
detected_scores = np.array([0.7, 0.8], dtype=float)
self.vrd_eval.add_single_detected_image_info(
image_key, {
standard_fields.DetectionResultFields.detection_boxes:
detected_box_tuples,
standard_fields.DetectionResultFields.detection_scores:
detected_scores,
standard_fields.DetectionResultFields.detection_classes:
detected_class_tuples
})
metrics = self.vrd_eval.evaluate()
self.assertAlmostEqual(metrics['VRDMetric_Relationships_weightedAP@0.5IOU'],
0.25)
self.assertAlmostEqual(metrics['VRDMetric_Relationships_mAP@0.5IOU'],
0.1666666666666666)
self.assertAlmostEqual(metrics['VRDMetric_Relationships_AP@0.5IOU/3'],
0.3333333333333333)
self.assertAlmostEqual(metrics['VRDMetric_Relationships_AP@0.5IOU/4'], 0)
self.assertAlmostEqual(metrics['VRDMetric_Relationships_Recall@50@0.5IOU'],
0.25)
self.assertAlmostEqual(metrics['VRDMetric_Relationships_Recall@100@0.5IOU'],
0.25)
self.vrd_eval.clear()
self.assertFalse(self.vrd_eval._image_ids)

class VRDPhraseDetectionEvaluatorTest(tf.test.TestCase):

  def test_vrdphrase_evaluator(self):
    self.vrd_eval = vrd_evaluation.VRDPhraseDetectionEvaluator()

    image_key1 = 'img1'
    groundtruth_box_tuples1 = np.array(
        [([0, 0, 1, 1], [1, 1, 2, 2]), ([0, 0, 1, 1], [1, 2, 2, 3])],
        dtype=vrd_evaluation.vrd_box_data_type)
    groundtruth_class_tuples1 = np.array(
        [(1, 2, 3), (1, 4, 3)], dtype=vrd_evaluation.label_data_type)
    groundtruth_verified_labels1 = np.array([1, 2, 3, 4, 5], dtype=int)
    self.vrd_eval.add_single_ground_truth_image_info(
        image_key1, {
            standard_fields.InputDataFields.groundtruth_boxes:
                groundtruth_box_tuples1,
            standard_fields.InputDataFields.groundtruth_classes:
                groundtruth_class_tuples1,
            standard_fields.InputDataFields.verified_labels:
                groundtruth_verified_labels1
        })

    image_key2 = 'img2'
    groundtruth_box_tuples2 = np.array(
        [([0, 0, 1, 1], [1, 1, 2, 2])], dtype=vrd_evaluation.vrd_box_data_type)
    groundtruth_class_tuples2 = np.array(
        [(1, 4, 3)], dtype=vrd_evaluation.label_data_type)
    self.vrd_eval.add_single_ground_truth_image_info(
        image_key2, {
            standard_fields.InputDataFields.groundtruth_boxes:
                groundtruth_box_tuples2,
            standard_fields.InputDataFields.groundtruth_classes:
                groundtruth_class_tuples2,
        })

    image_key3 = 'img3'
    groundtruth_box_tuples3 = np.array(
        [([0, 0, 1, 1], [1, 1, 2, 2])], dtype=vrd_evaluation.vrd_box_data_type)
    groundtruth_class_tuples3 = np.array(
        [(1, 2, 4)], dtype=vrd_evaluation.label_data_type)
    self.vrd_eval.add_single_ground_truth_image_info(
        image_key3, {
            standard_fields.InputDataFields.groundtruth_boxes:
                groundtruth_box_tuples3,
            standard_fields.InputDataFields.groundtruth_classes:
                groundtruth_class_tuples3,
        })

    image_key = 'img1'
    detected_box_tuples = np.array(
        [([0, 0.3, 0.5, 0.5], [0.3, 0.3, 1.0, 1.0]),
         ([0, 0, 1.2, 1.2], [0.0, 0.0, 2.0, 2.0])],
        dtype=vrd_evaluation.vrd_box_data_type)
    detected_class_tuples = np.array(
        [(1, 2, 5), (1, 2, 3)], dtype=vrd_evaluation.label_data_type)
    detected_scores = np.array([0.7, 0.8], dtype=float)
    self.vrd_eval.add_single_detected_image_info(
        image_key, {
            standard_fields.DetectionResultFields.detection_boxes:
                detected_box_tuples,
            standard_fields.DetectionResultFields.detection_scores:
                detected_scores,
            standard_fields.DetectionResultFields.detection_classes:
                detected_class_tuples
        })

    metrics = self.vrd_eval.evaluate()
    self.assertAlmostEqual(metrics['VRDMetric_Phrases_weightedAP@0.5IOU'], 0.25)
    self.assertAlmostEqual(metrics['VRDMetric_Phrases_mAP@0.5IOU'],
                           0.1666666666666666)
    self.assertAlmostEqual(metrics['VRDMetric_Phrases_AP@0.5IOU/3'],
                           0.3333333333333333)
    self.assertAlmostEqual(metrics['VRDMetric_Phrases_AP@0.5IOU/4'], 0)
    self.assertAlmostEqual(metrics['VRDMetric_Phrases_Recall@50@0.5IOU'], 0.25)
    self.assertAlmostEqual(metrics['VRDMetric_Phrases_Recall@100@0.5IOU'], 0.25)
    self.vrd_eval.clear()
    self.assertFalse(self.vrd_eval._image_ids)

class VRDDetectionEvaluationTest(tf.test.TestCase):

  def setUp(self):
    self.vrd_eval = vrd_evaluation._VRDDetectionEvaluation(
        matching_iou_threshold=0.5)

    image_key1 = 'img1'
    groundtruth_box_tuples1 = np.array(
        [([0, 0, 1, 1], [1, 1, 2, 2]), ([0, 0, 1, 1], [1, 2, 2, 3])],
        dtype=vrd_evaluation.vrd_box_data_type)
    groundtruth_class_tuples1 = np.array(
        [(1, 2, 3), (1, 4, 3)], dtype=vrd_evaluation.label_data_type)
    self.vrd_eval.add_single_ground_truth_image_info(
        image_key1, groundtruth_box_tuples1, groundtruth_class_tuples1)

    image_key2 = 'img2'
    groundtruth_box_tuples2 = np.array(
        [([0, 0, 1, 1], [1, 1, 2, 2])], dtype=vrd_evaluation.vrd_box_data_type)
    groundtruth_class_tuples2 = np.array(
        [(1, 4, 3)], dtype=vrd_evaluation.label_data_type)
    self.vrd_eval.add_single_ground_truth_image_info(
        image_key2, groundtruth_box_tuples2, groundtruth_class_tuples2)

    image_key3 = 'img3'
    groundtruth_box_tuples3 = np.array(
        [([0, 0, 1, 1], [1, 1, 2, 2])], dtype=vrd_evaluation.vrd_box_data_type)
    groundtruth_class_tuples3 = np.array(
        [(1, 2, 4)], dtype=vrd_evaluation.label_data_type)
    self.vrd_eval.add_single_ground_truth_image_info(
        image_key3, groundtruth_box_tuples3, groundtruth_class_tuples3)

    image_key = 'img1'
    detected_box_tuples = np.array(
        [([0, 0.3, 1, 1], [1.1, 1, 2, 2]), ([0, 0, 1, 1], [1, 1, 2, 2])],
        dtype=vrd_evaluation.vrd_box_data_type)
    detected_class_tuples = np.array(
        [(1, 2, 3), (1, 2, 3)], dtype=vrd_evaluation.label_data_type)
    detected_scores = np.array([0.7, 0.8], dtype=float)
    self.vrd_eval.add_single_detected_image_info(
        image_key, detected_box_tuples, detected_scores, detected_class_tuples)

  def test_compute_metrics(self):
    metrics = self.vrd_eval.evaluate()

    expected_weighted_average_precision = 0.25
    expected_mean_average_precision = 0.16666666666666
    expected_precision = np.array([1., 0.5], dtype=float)
    expected_recall = np.array([0.25, 0.25], dtype=float)
    expected_recall_50 = 0.25
    expected_recall_100 = 0.25
    expected_median_rank_50 = 0
    expected_median_rank_100 = 0
    self.assertAlmostEqual(expected_weighted_average_precision,
                           metrics.weighted_average_precision)
    self.assertAlmostEqual(expected_mean_average_precision,
                           metrics.mean_average_precision)
    self.assertAllClose(expected_precision, metrics.precisions)
    self.assertAllClose(expected_recall, metrics.recalls)
    self.assertAlmostEqual(expected_recall_50, metrics.recall_50)
    self.assertAlmostEqual(expected_recall_100, metrics.recall_100)
    self.assertAlmostEqual(expected_median_rank_50, metrics.median_rank_50)
    self.assertAlmostEqual(expected_median_rank_100, metrics.median_rank_100)

if __name__ == '__main__':
  tf.test.main()