"tests/vscode:/vscode.git/clone" did not exist on "a03100ead0b5f0d11877d9af4660615bb5f20814"
Unverified commit 8349eaf8, authored by Jonathan Huang, committed by GitHub

Merge pull request #2827 from tombstone/documentation

Add dataset tools, pretrained models and documentation for open imag…
parents 676a4f70 560fae89
@@ -21,6 +21,10 @@ Song Y, Guadarrama S, Murphy K, CVPR 2017
\[[link](https://arxiv.org/abs/1611.10012)\]\[[bibtex](
https://scholar.googleusercontent.com/scholar.bib?q=info:l291WsrB-hQJ:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAWUIIlnPZ_L9jxvPwcC49kDlELtaeIyU-&scisf=4&ct=citation&cd=-1&hl=en&scfhb=1)\]
<p align="center">
<img src="g3doc/img/tf-od-api-logo.png" width=140 height=195>
</p>
## Maintainers
* Jonathan Huang, github: [jch1](https://github.com/jch1)
@@ -59,6 +63,10 @@ Extras:
Defining your own model architecture</a><br>
* <a href='g3doc/using_your_own_dataset.md'>
Bringing in your own dataset</a><br>
* <a href='g3doc/oid_inference_and_evaluation.md'>
Inference and evaluation on the Open Images dataset</a><br>
* <a href='g3doc/evaluation_protocols.md'>
Supported object detection evaluation protocols</a><br>
## Getting Help
@@ -71,8 +79,21 @@ tensorflow/models Github
[issue tracker](https://github.com/tensorflow/models/issues), prefixing the
issue name with "object_detection".
## Release information
### November 17, 2017
As part of the Open Images V3 release, we have released:
* An implementation of the Open Images evaluation metric and the [protocol](g3doc/evaluation_protocols.md#open-images).
* Additional tools to run detection inference and evaluation as separate steps (see [this tutorial](g3doc/oid_inference_and_evaluation.md)).
* A new detection model trained on the Open Images V2 data release (see [Open Images model](g3doc/detection_model_zoo.md#open-images-models)).
See more information on the [Open Images website](https://github.com/openimages/dataset)!
<b>Thanks to contributors</b>: Stefan Popov, Alina Kuznetsova
### November 6, 2017
@@ -107,6 +128,7 @@ you to try out other detection models!
<b>Thanks to contributors</b>: Jonathan Huang, Andrew Harp
### June 15, 2017
In addition to our base Tensorflow detection model definitions, this
@@ -130,3 +152,4 @@ release includes:
<b>Thanks to contributors</b>: Jonathan Huang, Vivek Rathod, Derek Chow,
Chen Sun, Menglong Zhu, Matthew Tang, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Jasper Uijlings,
Viacheslav Kovalevskyi, Kevin Murphy
@@ -82,3 +82,26 @@ py_library(
"//tensorflow_models/object_detection/utils:dataset_util",
],
)
py_test(
name = "oid_tfrecord_creation_test",
srcs = ["oid_tfrecord_creation_test.py"],
deps = [
":oid_tfrecord_creation",
"//third_party/py/contextlib2",
"//third_party/py/pandas",
"//third_party/py/tensorflow",
],
)
py_binary(
name = "create_oid_tf_record",
srcs = ["create_oid_tf_record.py"],
deps = [
":oid_tfrecord_creation",
"//third_party/py/contextlib2",
"//third_party/py/pandas",
"//tensorflow",
"//tensorflow_models/object_detection/utils:label_map_util",
],
)
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Creates TFRecords of Open Images dataset for object detection.
Example usage:
./create_oid_tf_record \
--input_annotations_csv=/path/to/input/annotations-human-bbox.csv \
--input_images_directory=/path/to/input/image_pixels_directory \
--input_label_map=/path/to/input/labels_bbox_545.labelmap \
--output_tf_record_path_prefix=/path/to/output/prefix.tfrecord
CSVs with bounding box annotations and image metadata (including the image URLs)
can be downloaded from the Open Images GitHub repository:
https://github.com/openimages/dataset
This script will include every image found in the input_images_directory in the
output TFRecord, even if the image has no corresponding bounding box annotations
in the input_annotations_csv.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import contextlib2
import pandas as pd
import tensorflow as tf
from object_detection.dataset_tools import oid_tfrecord_creation
from object_detection.utils import label_map_util
tf.flags.DEFINE_string('input_annotations_csv', None,
'Path to CSV containing image bounding box annotations')
tf.flags.DEFINE_string('input_images_directory', None,
'Directory containing the image pixels '
'downloaded from the OpenImages GitHub repository.')
tf.flags.DEFINE_string('input_label_map', None, 'Path to the label map proto')
tf.flags.DEFINE_string(
'output_tf_record_path_prefix', None,
'Path to the output TFRecord. The shard index and the number of shards '
'will be appended for each output shard.')
tf.flags.DEFINE_integer('num_shards', 100, 'Number of TFRecord shards')
FLAGS = tf.flags.FLAGS
def main(_):
tf.logging.set_verbosity(tf.logging.INFO)
required_flags = [
'input_annotations_csv', 'input_images_directory', 'input_label_map',
'output_tf_record_path_prefix'
]
for flag_name in required_flags:
if not getattr(FLAGS, flag_name):
raise ValueError('Flag --{} is required'.format(flag_name))
label_map = label_map_util.get_label_map_dict(FLAGS.input_label_map)
all_annotations = pd.read_csv(FLAGS.input_annotations_csv)
all_images = tf.gfile.Glob(
os.path.join(FLAGS.input_images_directory, '*.jpg'))
all_image_ids = [os.path.splitext(os.path.basename(v))[0] for v in all_images]
all_image_ids = pd.DataFrame({'ImageID': all_image_ids})
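  # Appending the image IDs found on disk as extra rows (with no annotation
  # columns) guarantees that every image in input_images_directory appears in
  # the groupby below, even if it has no boxes in the annotations CSV.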
all_annotations = pd.concat([all_annotations, all_image_ids])
tf.logging.log(tf.logging.INFO, 'Found %d images...', len(all_image_ids))
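  # One TFRecordWriter per shard is opened below; registering the writers on
  # the ExitStack ensures they are all closed when the block exits.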
with contextlib2.ExitStack() as tf_record_close_stack:
output_tfrecords = oid_tfrecord_creation.open_sharded_output_tfrecords(
tf_record_close_stack, FLAGS.output_tf_record_path_prefix,
FLAGS.num_shards)
for counter, image_data in enumerate(all_annotations.groupby('ImageID')):
tf.logging.log_every_n(tf.logging.INFO, 'Processed %d images...', 1000,
counter)
image_id, image_annotations = image_data
# In OID image file names are formed by appending ".jpg" to the image ID.
image_path = os.path.join(FLAGS.input_images_directory, image_id + '.jpg')
      with tf.gfile.Open(image_path, 'rb') as image_file:
encoded_image = image_file.read()
tf_example = oid_tfrecord_creation.tf_example_from_annotations_data_frame(
image_annotations, label_map, encoded_image)
if tf_example:
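        # Image IDs are hexadecimal strings, so interpreting them in base 16
        # gives a deterministic, roughly uniform assignment of images to shards.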
        shard_idx = int(image_id, 16) % FLAGS.num_shards
output_tfrecords[shard_idx].write(tf_example.SerializeToString())
if __name__ == '__main__':
tf.app.run()
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Utilities for creating TFRecords of TF examples for the Open Images dataset.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.utils import dataset_util
def tf_example_from_annotations_data_frame(annotations_data_frame, label_map,
encoded_image):
"""Populates a TF Example message with image annotations from a data frame.
Args:
annotations_data_frame: Data frame containing the annotations for a single
image.
label_map: String to integer label map.
    encoded_image: The encoded image string.
  Returns:
    The populated TF Example. Annotations whose labels are not present in
    label_map are dropped from the object fields; the example is returned even
    if no annotation survives the filtering.
  """
filtered_data_frame = annotations_data_frame[
annotations_data_frame.LabelName.isin(label_map)]
image_id = annotations_data_frame.ImageID.iloc[0]
feature_map = {
standard_fields.TfExampleFields.object_bbox_ymin:
dataset_util.float_list_feature(filtered_data_frame.YMin.as_matrix()),
standard_fields.TfExampleFields.object_bbox_xmin:
dataset_util.float_list_feature(filtered_data_frame.XMin.as_matrix()),
standard_fields.TfExampleFields.object_bbox_ymax:
dataset_util.float_list_feature(filtered_data_frame.YMax.as_matrix()),
standard_fields.TfExampleFields.object_bbox_xmax:
dataset_util.float_list_feature(filtered_data_frame.XMax.as_matrix()),
standard_fields.TfExampleFields.object_class_text:
dataset_util.bytes_list_feature(
filtered_data_frame.LabelName.as_matrix()),
standard_fields.TfExampleFields.object_class_label:
dataset_util.int64_list_feature(
filtered_data_frame.LabelName.map(lambda x: label_map[x])
.as_matrix()),
standard_fields.TfExampleFields.filename:
dataset_util.bytes_feature('{}.jpg'.format(image_id)),
standard_fields.TfExampleFields.source_id:
dataset_util.bytes_feature(image_id),
standard_fields.TfExampleFields.image_encoded:
dataset_util.bytes_feature(encoded_image),
}
if 'IsGroupOf' in filtered_data_frame.columns:
feature_map[standard_fields.TfExampleFields.
object_group_of] = dataset_util.int64_list_feature(
filtered_data_frame.IsGroupOf.as_matrix().astype(int))
if 'IsOccluded' in filtered_data_frame.columns:
feature_map[standard_fields.TfExampleFields.
object_occluded] = dataset_util.int64_list_feature(
filtered_data_frame.IsOccluded.as_matrix().astype(int))
if 'IsTruncated' in filtered_data_frame.columns:
feature_map[standard_fields.TfExampleFields.
object_truncated] = dataset_util.int64_list_feature(
filtered_data_frame.IsTruncated.as_matrix().astype(int))
if 'IsDepiction' in filtered_data_frame.columns:
feature_map[standard_fields.TfExampleFields.
object_depiction] = dataset_util.int64_list_feature(
filtered_data_frame.IsDepiction.as_matrix().astype(int))
return tf.train.Example(features=tf.train.Features(feature=feature_map))
def open_sharded_output_tfrecords(exit_stack, base_path, num_shards):
"""Opens all TFRecord shards for writing and adds them to an exit stack.
Args:
    exit_stack: A contextlib2.ExitStack used to automatically close the
      TFRecords opened in this function.
    base_path: The base path for all shards.
    num_shards: The number of shards.
Returns:
The list of opened TFRecords. Position k in the list corresponds to shard k.
"""
tf_record_output_filenames = [
'{}-{:05d}-of-{:05d}'.format(base_path, idx, num_shards)
      for idx in range(num_shards)
]
tfrecords = [
exit_stack.enter_context(tf.python_io.TFRecordWriter(file_name))
for file_name in tf_record_output_filenames
]
return tfrecords
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for oid_tfrecord_creation.py."""
import os
import contextlib2
import pandas as pd
import tensorflow as tf
from object_detection.dataset_tools import oid_tfrecord_creation
def create_test_data():
data = {
'ImageID': ['i1', 'i1', 'i1', 'i1', 'i2', 'i2'],
'LabelName': ['a', 'a', 'b', 'b', 'b', 'c'],
'YMin': [0.3, 0.6, 0.8, 0.1, 0.0, 0.0],
'XMin': [0.1, 0.3, 0.7, 0.0, 0.1, 0.1],
'XMax': [0.2, 0.3, 0.8, 0.5, 0.9, 0.9],
'YMax': [0.3, 0.6, 1, 0.8, 0.8, 0.8],
'IsOccluded': [0, 1, 1, 0, 0, 0],
'IsTruncated': [0, 0, 0, 1, 0, 0],
'IsGroupOf': [0, 0, 0, 0, 0, 1],
'IsDepiction': [1, 0, 0, 0, 0, 0],
}
df = pd.DataFrame(data=data)
label_map = {'a': 0, 'b': 1, 'c': 2}
return label_map, df
class TfExampleFromAnnotationsDataFrameTests(tf.test.TestCase):
def test_simple(self):
label_map, df = create_test_data()
tf_example = oid_tfrecord_creation.tf_example_from_annotations_data_frame(
df[df.ImageID == 'i1'], label_map, 'encoded_image_test')
self.assertProtoEquals("""
features {
feature {
key: "image/encoded"
value { bytes_list { value: "encoded_image_test" } } }
feature {
key: "image/filename"
value { bytes_list { value: "i1.jpg" } } }
feature {
key: "image/object/bbox/ymin"
value { float_list { value: [0.3, 0.6, 0.8, 0.1] } } }
feature {
key: "image/object/bbox/xmin"
value { float_list { value: [0.1, 0.3, 0.7, 0.0] } } }
feature {
key: "image/object/bbox/ymax"
value { float_list { value: [0.3, 0.6, 1.0, 0.8] } } }
feature {
key: "image/object/bbox/xmax"
value { float_list { value: [0.2, 0.3, 0.8, 0.5] } } }
feature {
key: "image/object/class/label"
value { int64_list { value: [0, 0, 1, 1] } } }
feature {
key: "image/object/class/text"
value { bytes_list { value: ["a", "a", "b", "b"] } } }
feature {
key: "image/source_id"
value { bytes_list { value: "i1" } } }
feature {
key: "image/object/depiction"
value { int64_list { value: [1, 0, 0, 0] } } }
feature {
key: "image/object/group_of"
value { int64_list { value: [0, 0, 0, 0] } } }
feature {
key: "image/object/occluded"
value { int64_list { value: [0, 1, 1, 0] } } }
feature {
key: "image/object/truncated"
value { int64_list { value: [0, 0, 0, 1] } } } }
""", tf_example)
def test_no_attributes(self):
label_map, df = create_test_data()
del df['IsDepiction']
del df['IsGroupOf']
del df['IsOccluded']
del df['IsTruncated']
tf_example = oid_tfrecord_creation.tf_example_from_annotations_data_frame(
df[df.ImageID == 'i2'], label_map, 'encoded_image_test')
self.assertProtoEquals("""
features {
feature {
key: "image/encoded"
value { bytes_list { value: "encoded_image_test" } } }
feature {
key: "image/filename"
value { bytes_list { value: "i2.jpg" } } }
feature {
key: "image/object/bbox/ymin"
value { float_list { value: [0.0, 0.0] } } }
feature {
key: "image/object/bbox/xmin"
value { float_list { value: [0.1, 0.1] } } }
feature {
key: "image/object/bbox/ymax"
value { float_list { value: [0.8, 0.8] } } }
feature {
key: "image/object/bbox/xmax"
value { float_list { value: [0.9, 0.9] } } }
feature {
key: "image/object/class/label"
value { int64_list { value: [1, 2] } } }
feature {
key: "image/object/class/text"
value { bytes_list { value: ["b", "c"] } } }
feature {
key: "image/source_id"
value { bytes_list { value: "i2" } } } }
""", tf_example)
def test_label_filtering(self):
label_map, df = create_test_data()
label_map = {'a': 0}
tf_example = oid_tfrecord_creation.tf_example_from_annotations_data_frame(
df[df.ImageID == 'i1'], label_map, 'encoded_image_test')
self.assertProtoEquals("""
features {
feature {
key: "image/encoded"
value { bytes_list { value: "encoded_image_test" } } }
feature {
key: "image/filename"
value { bytes_list { value: "i1.jpg" } } }
feature {
key: "image/object/bbox/ymin"
value { float_list { value: [0.3, 0.6] } } }
feature {
key: "image/object/bbox/xmin"
value { float_list { value: [0.1, 0.3] } } }
feature {
key: "image/object/bbox/ymax"
value { float_list { value: [0.3, 0.6] } } }
feature {
key: "image/object/bbox/xmax"
value { float_list { value: [0.2, 0.3] } } }
feature {
key: "image/object/class/label"
value { int64_list { value: [0, 0] } } }
feature {
key: "image/object/class/text"
value { bytes_list { value: ["a", "a"] } } }
feature {
key: "image/source_id"
value { bytes_list { value: "i1" } } }
feature {
key: "image/object/depiction"
value { int64_list { value: [1, 0] } } }
feature {
key: "image/object/group_of"
value { int64_list { value: [0, 0] } } }
feature {
key: "image/object/occluded"
value { int64_list { value: [0, 1] } } }
feature {
key: "image/object/truncated"
value { int64_list { value: [0, 0] } } } }
""", tf_example)
class OpenOutputTfrecordsTests(tf.test.TestCase):
def test_sharded_tfrecord_writes(self):
with contextlib2.ExitStack() as tf_record_close_stack:
output_tfrecords = oid_tfrecord_creation.open_sharded_output_tfrecords(
tf_record_close_stack,
os.path.join(tf.test.get_temp_dir(), 'test.tfrec'), 10)
for idx in range(10):
output_tfrecords[idx].write('test_{}'.format(idx))
for idx in range(10):
tf_record_path = '{}-{:05d}-of-00010'.format(
os.path.join(tf.test.get_temp_dir(), 'test.tfrec'), idx)
records = list(tf.python_io.tf_record_iterator(tf_record_path))
self.assertAllEqual(records, ['test_{}'.format(idx)])
if __name__ == '__main__':
tf.test.main()
# Faster R-CNN with Inception Resnet v2, Atrous version;
# Configured for Open Images Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
faster_rcnn {
num_classes: 546
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
feature_extractor {
type: 'faster_rcnn_inception_resnet_v2'
first_stage_features_stride: 8
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 8
width_stride: 8
}
}
first_stage_atrous_rate: 2
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 17
maxpool_kernel_size: 1
maxpool_stride: 1
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.00006
schedule {
step: 0
learning_rate: .00006
}
schedule {
step: 6000000
learning_rate: .000006
}
schedule {
step: 7000000
learning_rate: .0000006
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  # Note: The below line limits the number of training steps, which we
  # empirically found to be sufficient to train on the Open Images dataset.
  # Remove the below line to train indefinitely.
num_steps: 8000000
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/oid_bbox_trainable_train.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/oid_bbox_trainable_label_map.pbtxt"
}
eval_config: {
metrics_set: "open_images_metrics"
num_examples: 8000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/oid_bbox_trainable_val.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/oid_bbox_trainable_label_map.pbtxt"
shuffle: false
num_readers: 1
}
# Tensorflow detection model zoo
We provide a collection of detection models pre-trained on the [COCO
dataset](http://mscoco.org), the [Kitti dataset](http://www.cvlibs.net/datasets/kitti/), and the
[Open Images dataset](https://github.com/openimages/dataset). These models can
be useful for out-of-the-box inference if you are interested in categories
already in COCO (e.g., humans, cars, etc.) or in Open Images (e.g., surfboard,
jacuzzi, etc.). They are also useful for initializing your models when
training on novel datasets.
In the table below, we list each such pre-trained model including:
@@ -18,7 +20,7 @@ In the table below, we list each such pre-trained model including:
configuration (these timings were performed using an Nvidia
GeForce GTX TITAN X card) and should be treated more as relative timings in
many cases.
* detector performance on a subset of the COCO validation set or the Open Images test split, as measured by the dataset-specific mAP measure.
Here, higher is better, and we only report bounding box mAP rounded to the
nearest integer.
* Output types (currently only `Boxes`)
@@ -86,5 +88,14 @@ Model name
----------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---: | :-------------: | :-----:
[faster_rcnn_resnet101_kitti](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_kitti_2017_11_08.tar.gz) | 79 | 87 | Boxes
## Open Images-trained models {#open-images-models}
Model name | Speed (ms) | Open Images mAP@0.5[^2] | Outputs
----------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---: | :-------------: | :-----:
[faster_rcnn_inception_resnet_v2_atrous_oid](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_oid_2017_11_08.tar.gz) | 727 | 37 | Boxes
[faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2017_11_08.tar.gz) | 347 | | Boxes
[^1]: See [MSCOCO evaluation protocol](http://cocodataset.org/#detections-eval).
[^2]: This is PASCAL mAP with a slightly different way of computing true positives; see the [Open Images evaluation protocol](evaluation_protocols.md#open-images).
# Supported object detection evaluation protocols
The Tensorflow Object Detection API currently supports three evaluation protocols,
which can be configured in `EvalConfig` by setting `metrics_set` to the
corresponding value.
## PASCAL VOC 2007 metric
`EvalConfig.metrics_set='pascal_voc_metrics'`
The commonly used mAP metric for evaluating the quality of object detectors, computed according to the protocol of the PASCAL VOC Challenge 2007.
The protocol is available [here](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/devkit_doc_07-Jun-2007.pdf).
## Weighted PASCAL VOC metric
`EvalConfig.metrics_set='weighted_pascal_voc_metrics'`
The weighted PASCAL metric computes the mean average precision as the average
precision when treating all classes as a single class. In comparison,
PASCAL metrics computes the mean average precision as the mean of the
per-class average precisions.
For example, suppose the test set consists of two classes, "cat" and "dog", and there are ten times more boxes of "cat" than of "dog".
According to the PASCAL VOC 2007 metric, performance on each of the two classes contributes equally towards the final mAP value,
while for the Weighted PASCAL VOC metric the final mAP value is influenced by the frequency of each class.
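As a rough numeric illustration (the AP values and box counts below are hypothetical; the actual weighted metric computes a single AP over the pooled detections of all classes, for which a frequency-weighted mean of per-class APs is only an approximation):

```python
# Hypothetical per-class average precisions and ground-truth box counts.
per_class_ap = {'cat': 0.8, 'dog': 0.4}
num_boxes = {'cat': 1000, 'dog': 100}

# PASCAL VOC 2007 style mAP: each class contributes equally.
pascal_map = sum(per_class_ap.values()) / len(per_class_ap)  # 0.6

# Weighted PASCAL style: all boxes are pooled, so the frequent "cat" class
# dominates the result; a frequency-weighted mean approximates this effect.
total_boxes = sum(num_boxes.values())
weighted_map = sum(per_class_ap[c] * num_boxes[c]
                   for c in per_class_ap) / total_boxes  # ~0.76
print(pascal_map, weighted_map)
```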
## Open Images metric {#open-images}
`EvalConfig.metrics_set='open_images_metrics'`
This metric is defined originally for evaluating detector performance on [Open Images V2 dataset](https://github.com/openimages/dataset)
and is fairly similar to the PASCAL VOC 2007 metric mentioned above.
It computes interpolated average precision (AP) for each class and averages it among all classes (mAP).
The difference from the PASCAL VOC 2007 metric is the following: Open Images
annotations contain `group-of` ground-truth boxes (see the [Open Images data
description](https://github.com/openimages/dataset#annotations-human-bboxcsv)),
which are treated differently for the purpose of deciding whether detections are
"true positives", "ignored", or "false positives". Here we define these three
cases:
A detection is a "true positive" if there is a non-group-of ground-truth box,
such that:
* The detection box and the ground-truth box are of the same class, and
intersection-over-union (IoU) between the detection box and the ground-truth
box is greater than the IoU threshold (default value 0.5). \
Illustration of handling non-group-of boxes: \
![alt nongroupof_case_eval](img/nongroupof_case_eval.png "illustration of handling non-group-of boxes: yellow box - ground truth bounding box; green box - true positive; red box - false positives."){width="500" height="270"}
* yellow box - ground-truth box;
* green box - true positive;
* red boxes - false positives.
* This is the highest scoring detection for this ground truth box that
satisfies the criteria above.
A detection is "ignored" if it is not a true positive, and there is a `group-of`
ground-truth box such that:
* The detection box and the ground-truth box are of the same class, and the
area of intersection between the detection box and the ground-truth box
divided by the area of the detection is greater than 0.5. This is intended
to measure whether the detection box is approximately inside the group-of
ground-truth box. \
Illustration of handling `group-of` boxes: \
![alt groupof_case_eval](img/groupof_case_eval.png "illustration of handling group-of boxes: yellow box - ground truth bounding box; grey boxes - two detections of cars, that are ignored; red box - false positive."){width="500" height="270"}
* yellow box - ground-truth box;
* grey boxes - two detections of cars that are ignored;
* red box - false positive.
A detection is a "false positive" if it is neither a "true positive" nor
"ignored".
Precision and recall are defined as:
* Precision = number-of-true-positives/(number-of-true-positives + number-of-false-positives)
* Recall = number-of-true-positives/number-of-non-group-of-boxes
Note that detections ignored as firing on a `group-of` ground-truth box do not
contribute to the number of true positives.
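To make the decision rules above concrete, here is a minimal sketch of classifying a single detection. This is not the evaluator shipped with the API; the box representation (`[ymin, xmin, ymax, xmax]`), the helper functions and the assumption that detections are processed in descending score order are illustrative choices only.

```python
def iou(box_a, box_b):
  """Intersection-over-union of two [ymin, xmin, ymax, xmax] boxes."""
  ymin, xmin = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
  ymax, xmax = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
  intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
  area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
  area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
  union = area_a + area_b - intersection
  return intersection / union if union > 0 else 0.0


def classify_detection(det_box, det_label, groundtruth, matched_gt,
                       iou_threshold=0.5, ioa_threshold=0.5):
  """Returns 'true positive', 'ignored' or 'false positive' for one detection.

  groundtruth is a list of (box, label, is_group_of) tuples; matched_gt is the
  set of indices of non-group-of boxes already claimed by a higher scoring
  detection (detections must be processed in descending score order).
  """
  # True positive: an unclaimed, same-class, non-group-of box with IoU above
  # the threshold.
  for idx, (gt_box, gt_label, is_group_of) in enumerate(groundtruth):
    if is_group_of or gt_label != det_label or idx in matched_gt:
      continue
    if iou(det_box, gt_box) > iou_threshold:
      matched_gt.add(idx)
      return 'true positive'
  # Ignored: the detection lies mostly inside a same-class group-of box
  # (intersection area divided by the detection area exceeds the threshold).
  det_area = (det_box[2] - det_box[0]) * (det_box[3] - det_box[1])
  for gt_box, gt_label, is_group_of in groundtruth:
    if not is_group_of or gt_label != det_label:
      continue
    ymin, xmin = max(det_box[0], gt_box[0]), max(det_box[1], gt_box[1])
    ymax, xmax = min(det_box[2], gt_box[2]), min(det_box[3], gt_box[3])
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    if det_area > 0 and intersection / det_area > ioa_threshold:
      return 'ignored'
  # Everything else counts against precision.
  return 'false positive'
```

Precision and recall then follow the formulas above, with ignored detections contributing to neither the true-positive nor the false-positive count.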
The labels in Open Images are organized in a
[hierarchy](https://storage.googleapis.com/openimages/2017_07/bbox_labels_vis/bbox_labels_vis.html).
Ground-truth bounding-boxes are annotated with the most specific class available
in the hierarchy. For example, "car" has two children "limousine" and "van". Any
other kind of car is annotated as "car" (for example, a sedan). Given this
convention, the evaluation software treats all classes independently, ignoring
the hierarchy. To achieve high performance values, object detectors should
output bounding-boxes labelled in the same manner.
@@ -14,7 +14,7 @@ Tensorflow Object Detection API depends on the following libraries:
For detailed steps to install Tensorflow, follow the
[Tensorflow installation instructions](https://www.tensorflow.org/install/).
A typical user can install Tensorflow using one of the following commands:
``` bash
# For CPU
......
# Inference and evaluation on the Open Images dataset
This page presents a tutorial for running object detector inference and
evaluation measure computations on the [Open Images
dataset](https://github.com/openimages/dataset), using tools from the
[TensorFlow Object Detection
API](https://github.com/tensorflow/models/tree/master/research/object_detection).
It shows how to download the images and annotations for the validation and test
sets of Open Images; how to package the downloaded data in a format understood
by the Object Detection API; where to find a trained object detector model for
Open Images; how to run inference; and how to compute evaluation measures on the
inferred detections.
Inferred detections will look like the following:
![](img/oid_bus_72e19c28aac34ed8.jpg){height="300"}
![](img/oid_monkey_3b4168c89cecbc5b.jpg){height="300"}
On the validation set of Open Images, this tutorial requires 27GB of free disk
space and the inference step takes approximately 9 hours on a single NVIDIA
Tesla P100 GPU. For the test set, the figures are 75GB and approximately 27 hours, respectively. All other
steps require less than two hours in total on both sets.
## Installing TensorFlow, the Object Detection API, and Google Cloud SDK
Please run through the [installation instructions](installation.md) to install
TensorFlow and all its dependencies. Ensure the Protobuf libraries are compiled
and the library directories are added to `PYTHONPATH`. You will also need to
`pip` install `pandas` and `contextlib2`.
Some of the data used in this tutorial lives in Google Cloud buckets. To access
it, you will have to [install the Google Cloud
SDK](https://cloud.google.com/sdk/downloads) on your workstation or laptop.
## Preparing the Open Images validation and test sets
In order to run inference and subsequent evaluation measure computations, we
require a dataset of images and ground truth boxes, packaged as TFRecords of
TFExamples. To create such a dataset for Open Images, you will need to first
download ground truth boxes from the [Open Images
website](https://github.com/openimages/dataset):
```bash
# From tensorflow/models/research
mkdir oid
cd oid
wget https://storage.googleapis.com/openimages/2017_07/annotations_human_bbox_2017_07.tar.gz
tar -xvf annotations_human_bbox_2017_07.tar.gz
```
Next, download the images. In this tutorial, we will use lower resolution images
provided by [CVDF](http://www.cvdfoundation.org). Please follow the instructions
on [CVDF's Open Images repository
page](https://github.com/cvdfoundation/open-images-dataset) in order to gain
access to the cloud bucket with the images. Then run:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # Set SPLIT to "test" to download the images in the test set
mkdir raw_images_${SPLIT}
gsutil -m rsync -r gs://open-images-dataset/$SPLIT raw_images_${SPLIT}
```
Another option for downloading the images is to follow the URLs contained in the
[image URLs and metadata CSV
files](https://storage.googleapis.com/openimages/2017_07/images_2017_07.tar.gz)
on the Open Images website.
At this point, your `tensorflow/models/research/oid` directory should appear as
follows:
```lang-none
|-- 2017_07
| |-- test
| | `-- annotations-human-bbox.csv
| |-- train
| | `-- annotations-human-bbox.csv
| `-- validation
| `-- annotations-human-bbox.csv
|-- raw_images_validation (if you downloaded the validation split)
| `-- ... (41,620 files matching regex "[0-9a-f]{16}.jpg")
|-- raw_images_test (if you downloaded the test split)
| `-- ... (125,436 files matching regex "[0-9a-f]{16}.jpg")
`-- annotations_human_bbox_2017_07.tar.gz
```
Next, package the data into TFRecords of TFExamples by running:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # Set SPLIT to "test" to create TFRecords for the test split
mkdir ${SPLIT}_tfrecords
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.dataset_tools.create_oid_tf_record \
--input_annotations_csv 2017_07/$SPLIT/annotations-human-bbox.csv \
--input_images_directory raw_images_${SPLIT} \
--input_label_map ../object_detection/data/oid_bbox_trainable_label_map.pbtxt \
--output_tf_record_path_prefix ${SPLIT}_tfrecords/$SPLIT.tfrecord \
--num_shards=100
```
This results in 100 TFRecord files (shards), written to
`oid/${SPLIT}_tfrecords`, with filenames matching
`${SPLIT}.tfrecord-000[0-9][0-9]-of-00100`. Each shard contains approximately
the same number of images and is de facto a representative random sample of the
input data. [This enables](#accelerating_inference) a straightforward work
division scheme for distributing inference and also approximate measure
computations on subsets of the validation and test sets.
## Inferring detections
Inference requires a trained object detection model. In this tutorial we will
use a model from the [detection model zoo](detection_model_zoo.md), which can
be downloaded and unpacked by running the commands below. More information about
the model, such as its architecture and how it was trained, is available in the
[model zoo page](detection_model_zoo.md).
```bash
# From tensorflow/models/research/oid
wget http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_oid_14_10_2017.tar.gz
tar -zxvf faster_rcnn_inception_resnet_v2_atrous_oid_14_10_2017.tar.gz
```
At this point, data is packed into TFRecords and we have an object detector
model. We can run inference using:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
TF_RECORD_FILES=$(ls -1 ${SPLIT}_tfrecords/* | tr '\n' ',')
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.inference.infer_detections \
--input_tfrecord_paths=$TF_RECORD_FILES \
--output_tfrecord_path=${SPLIT}_detections.tfrecord-00000-of-00001 \
--inference_graph=faster_rcnn_inception_resnet_v2_atrous_oid/frozen_inference_graph.pb \
--discard_image_pixels
```
Inference preserves all fields of the input TFExamples, and adds new fields to
store the inferred detections. This allows [computing evaluation
measures](#compute_evaluation_measures) on the output TFRecord alone, as ground
truth boxes are preserved as well. Since measure computations don't require
access to the images, `infer_detections` can optionally discard them with the
`--discard_image_pixels` flag. Discarding the images drastically reduces the
size of the output TFRecord.
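If you want to inspect the output before running evaluation, a short snippet along the following lines can help. This is only a sketch: it assumes the single-process output file name used above and just prints the feature keys of the first record, which is an easy way to see which detection fields `infer_detections` added.

```python
import tensorflow as tf

# Output of the single-process inference command above (validation split).
detections_path = 'validation_detections.tfrecord-00000-of-00001'

for serialized in tf.python_io.tf_record_iterator(detections_path):
  example = tf.train.Example()
  example.ParseFromString(serialized)
  # Lists both the preserved ground-truth fields and the detection fields
  # added during inference.
  for key in sorted(example.features.feature):
    print(key)
  break  # The first record is enough to see the schema.
```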
### Accelerating inference {#accelerating_inference}
Running inference on the whole validation or test set can take a long time to
complete due to the large number of images present in these sets (41,620 and
125,436 respectively). For quick but approximate evaluation, inference and the
subsequent measure computations can be run on a small number of shards. To run
for example on 2% of all the data, it is enough to set `TF_RECORD_FILES` as
shown below before running `infer_detections`:
```bash
TF_RECORD_FILES=$(ls ${SPLIT}_tfrecords/${SPLIT}.tfrecord-0000[0-1]-of-00100 | tr '\n' ',')
```
Please note that computing evaluation measures on a small subset of the data
introduces variance and bias, since some classes of objects won't be seen during
evaluation. In the example above, this leads to an mAP that is 13.2 points higher on the
first two shards of the validation set than on the full set ([see mAP
results](#expected-maps)).
Another way to accelerate inference is to run it in parallel on multiple
TensorFlow devices on possibly multiple machines. The script below uses
[tmux](https://github.com/tmux/tmux/wiki) to run a separate `infer_detections`
process for each GPU on a different partition of the input data.
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
NUM_GPUS=4
NUM_SHARDS=100
tmux new-session -d -s "inference"
function tmux_start { tmux new-window -d -n "inference:GPU$1" "${*:2}; exec bash"; }
for gpu_index in $(seq 0 $(($NUM_GPUS-1))); do
start_shard=$(( $gpu_index * $NUM_SHARDS / $NUM_GPUS ))
end_shard=$(( ($gpu_index + 1) * $NUM_SHARDS / $NUM_GPUS - 1))
TF_RECORD_FILES=$(seq -s, -f "${SPLIT}_tfrecords/${SPLIT}.tfrecord-%05.0f-of-$(printf '%05d' $NUM_SHARDS)" $start_shard $end_shard)
tmux_start ${gpu_index} \
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) CUDA_VISIBLE_DEVICES=$gpu_index \
python -m object_detection.inference.infer_detections \
--input_tfrecord_paths=$TF_RECORD_FILES \
--output_tfrecord_path=${SPLIT}_detections.tfrecord-$(printf "%05d" $gpu_index)-of-$(printf "%05d" $NUM_GPUS) \
--inference_graph=faster_rcnn_inception_resnet_v2_atrous_oid/frozen_inference_graph.pb \
--discard_image_pixels
done
```
After all `infer_detections` processes finish, `tensorflow/models/research/oid`
will contain one output TFRecord from each process, with name matching
`validation_detections.tfrecord-0000[0-3]-of-00004`.
## Computing evaluation measures {#compute_evaluation_measures}
To compute evaluation measures on the inferred detections you first need to
create the appropriate configuration files:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
NUM_SHARDS=1 # Set to NUM_GPUS if using the parallel evaluation script above
mkdir -p ${SPLIT}_eval_metrics
echo "
label_map_path: '../object_detection/data/oid_bbox_trainable_label_map.pbtxt'
tf_record_input_reader: { input_path: '${SPLIT}_detections.tfrecord@${NUM_SHARDS}' }
" > ${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt
echo "
metrics_set: 'open_images_metrics'
" > ${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt
```
And then run:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.metrics.offline_eval_map_corloc \
--eval_dir=${SPLIT}_eval_metrics \
--eval_config_path=${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt \
--input_config_path=${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt
```
The first configuration file contains an `object_detection.protos.InputReader`
message that describes the location of the necessary input files. The second
file contains an `object_detection.protos.EvalConfig` message that describes the
evaluation metric. For more information about these protos see the corresponding
source files.
### Expected mAPs {#expected-maps}
The result of running `offline_eval_map_corloc` is a CSV file located at
`${SPLIT}_eval_metrics/metrics.csv`. With the above configuration, the file will
contain average precision at IoU≥0.5 for each of the classes present in the
dataset. It will also contain the mAP@IoU≥0.5. Both the per-class average
precisions and the mAP are computed according to the [Open Images evaluation
protocol](evaluation_protocols.md). The expected mAPs for the validation and
test sets of Open Images in this case are:
Set | Fraction of data | Images | mAP@IoU≥0.5
---------: | :--------------: | :-----: | -----------
validation | everything | 41,620 | 39.2%
validation | first 2 shards | 884 | 52.4%
test | everything | 125,436 | 37.7%
test | first 2 shards | 2,476 | 50.8%