# Supported object detection evaluation protocols

The TensorFlow Object Detection API currently supports several evaluation protocols,
which can be configured in `EvalConfig` by setting `metrics_set` to the
corresponding value.
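
For example, a hypothetical `eval_config` fragment in a pipeline configuration
(protobuf text format) might select the PASCAL metric as follows (the
`num_examples` value is purely illustrative):

```
eval_config: {
  metrics_set: 'pascal_voc_detection_metrics'
  num_examples: 5000  # illustrative value
}
```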

## PASCAL VOC 2010 detection metric

`EvalConfig.metrics_set='pascal_voc_detection_metrics'`

The commonly used mAP metric for evaluating the quality of object detectors,
computed according to the protocol of the PASCAL VOC Challenge 2010-2012. The
protocol is available
[here](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/devkit_doc_08-May-2010.pdf).

## Weighted PASCAL VOC detection metric

`EvalConfig.metrics_set='weighted_pascal_voc_detection_metrics'`

The weighted PASCAL metric computes the mean average precision as the average
precision when treating all classes as a single class. In comparison, the
PASCAL metric computes the mean average precision as the mean of the
per-class average precisions.

For example, suppose the test set consists of two classes, "cat" and "dog", and
there are ten times more boxes of "cat" than of "dog". According to the PASCAL
VOC 2010 metric, performance on each of the two classes contributes equally
towards the final mAP value, while for the weighted PASCAL VOC metric the final
mAP value is influenced by the frequency of each class.
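
As a rough numerical sketch: the actual implementation pools all detections
into a single class, which is only approximated below by frequency-weighting
made-up per-class APs.

```python
# Made-up per-class average precisions and ground-truth box counts,
# for illustration only -- not from any real evaluation run.
ap = {"cat": 0.9, "dog": 0.5}
num_boxes = {"cat": 1000, "dog": 100}  # ten times more "cat" boxes

# PASCAL VOC 2010 mAP: unweighted mean of the per-class APs.
pascal_map = sum(ap.values()) / len(ap)

# Weighted PASCAL mAP: frequent classes contribute proportionally more.
total = sum(num_boxes.values())
weighted_map = sum(ap[c] * num_boxes[c] / total for c in ap)

print(pascal_map)    # approximately 0.7
print(weighted_map)  # approximately 0.86, dominated by the frequent "cat" class
```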

## PASCAL VOC 2010 instance segmentation metric

`EvalConfig.metrics_set='pascal_voc_instance_segmentation_metrics'`

Similar to the PASCAL VOC 2010 detection metric, but computes the intersection
over union based on the object masks instead of the object boxes.

## Weighted PASCAL VOC instance segmentation metric

`EvalConfig.metrics_set='weighted_pascal_voc_instance_segmentation_metrics'`

Similar to the weighted PASCAL VOC 2010 detection metric, but computes the
intersection over union based on the object masks instead of the object boxes.

## Open Images V2 detection metric

`EvalConfig.metrics_set='open_images_V2_detection_metrics'`

This metric was originally defined for evaluating detector performance on the
[Open Images V2 dataset](https://github.com/openimages/dataset) and is fairly
similar to the PASCAL VOC 2010 metric mentioned above. It computes interpolated
average precision (AP) for each class and averages it over all classes (mAP).

The difference from the PASCAL VOC 2010 metric is the following: Open Images
annotations contain `group-of` ground-truth boxes (see the [Open Images data
description](https://github.com/openimages/dataset#annotations-human-bboxcsv)),
which are treated differently for the purpose of deciding whether detections are
"true positives", "ignored", or "false positives". Here we define these three
cases:

A detection is a "true positive" if there is a non-group-of ground-truth box,
such that:

*   The detection box and the ground-truth box are of the same class, and
    intersection-over-union (IoU) between the detection box and the ground-truth
    box is greater than the IoU threshold (default value 0.5). \
    Illustration of handling non-group-of boxes: \
    ![alt groupof_case_eval](img/nongroupof_case_eval.png "illustration of handling non-group-of boxes: yellow box - ground truth bounding box; green box - true positive; red box - false positives.")

    *   yellow box - ground-truth box;
    *   green box - true positive;
    *   red boxes - false positives.

*   It is the highest-scoring detection for this ground-truth box among those
    satisfying the criteria above.
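
The IoU criterion above can be sketched as follows (the box format
`(ymin, xmin, ymax, xmax)` and the function name are illustrative, not the
API's actual helpers):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (ymin, xmin, ymax, xmax)."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection matches a non-group-of ground-truth box of the same class
# when IoU exceeds the threshold (default 0.5).
IOU_THRESHOLD = 0.5
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150, approximately 0.333
```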

A detection is "ignored" if it is not a true positive, and there is a `group-of`
ground-truth box such that:

*   The detection box and the ground-truth box are of the same class, and the
    area of intersection between the detection box and the ground-truth box
    divided by the area of the detection is greater than 0.5. This is intended
    to measure whether the detection box is approximately inside the group-of
    ground-truth box. \
    Illustration of handling `group-of` boxes: \
    ![alt groupof_case_eval](img/groupof_case_eval.png "illustration of handling group-of boxes: yellow box - ground truth bounding box; grey boxes - two detections of cars, which are ignored; red box - false positive.")

    *   yellow box - ground-truth box;
    *   grey boxes - two detections of cars, which are ignored;
    *   red box - false positive.
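
Note that this criterion divides by the detection's own area rather than by the
union. A minimal sketch (the function name and box format are illustrative, not
the API's actual helpers):

```python
def intersection_over_detection_area(det, groupof):
    """Intersection area divided by the detection's area (not IoU).

    Boxes are (ymin, xmin, ymax, xmax).
    """
    ymin = max(det[0], groupof[0])
    xmin = max(det[1], groupof[1])
    ymax = min(det[2], groupof[2])
    xmax = min(det[3], groupof[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    det_area = (det[2] - det[0]) * (det[3] - det[1])
    return inter / det_area

# A small detection fully inside a large group-of box scores 1.0 > 0.5,
# so it is ignored rather than counted as a false positive.
print(intersection_over_detection_area((2, 2, 4, 4), (0, 0, 10, 10)))  # 1.0
```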

A detection is a "false positive" if it is neither a "true positive" nor
"ignored".

Precision and recall are defined as:

* Precision = number-of-true-positives/(number-of-true-positives + number-of-false-positives)
* Recall = number-of-true-positives/number-of-non-group-of-boxes

Note that detections ignored as firing on a `group-of` ground-truth box do not
contribute to the number of true positives.
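
With made-up counts, the bookkeeping described above looks like:

```python
# Illustrative counts only -- not from any real evaluation run.
num_true_positives = 80
num_false_positives = 20
num_ignored = 10            # fired on group-of boxes; excluded entirely
num_non_groupof_boxes = 100

precision = num_true_positives / (num_true_positives + num_false_positives)
recall = num_true_positives / num_non_groupof_boxes
print(precision, recall)  # 0.8 0.8
```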

The labels in Open Images are organized in a
[hierarchy](https://storage.googleapis.com/openimages/2017_07/bbox_labels_vis/bbox_labels_vis.html).
Ground-truth bounding-boxes are annotated with the most specific class available
in the hierarchy. For example, "car" has two children "limousine" and "van". Any
other kind of car is annotated as "car" (for example, a sedan). Given this
convention, the evaluation software treats all classes independently, ignoring
the hierarchy. To achieve high performance values, object detectors should
output bounding-boxes labelled in the same manner.
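
In other words, class labels are compared by exact equality, with no credit for
hierarchy ancestors. A minimal sketch with a hypothetical helper:

```python
def same_class(detection_label, groundtruth_label):
    """Classes are treated independently: exact label match, hierarchy ignored."""
    return detection_label == groundtruth_label

# A "limousine" detection does not match a "car" ground-truth box, even
# though limousine is a child of car in the Open Images hierarchy.
print(same_class("limousine", "car"))  # False
print(same_class("car", "car"))        # True
```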

## OID Challenge Object Detection Metric 2018

`EvalConfig.metrics_set='oid_challenge_object_detection_metrics'`

The metric for the Object Detection track of the Open Images Challenge 2018.
The description is provided on the [Open Images Challenge
website](https://storage.googleapis.com/openimages/web/challenge.html).

## OID Challenge Visual Relationship Detection Metric 2018

The metric for the Visual Relationship Detection track of the Open Images
Challenge 2018. The description is provided on the [Open Images Challenge
website](https://storage.googleapis.com/openimages/web/challenge.html). Note:
this is currently a stand-alone metric that can only be used through the
`metrics/oid_vrd_challenge_evaluation.py` util.