"tests/vscode:/vscode.git/clone" did not exist on "a03100ead0b5f0d11877d9af4660615bb5f20814"
Unverified commit 8349eaf8, authored by Jonathan Huang, committed by GitHub

Merge pull request #2827 from tombstone/documentation

Add dataset tools, pretrained models and documentation for open imag…
parents 676a4f70 560fae89
@@ -21,6 +21,10 @@ Song Y, Guadarrama S, Murphy K, CVPR 2017
\[[link](https://arxiv.org/abs/1611.10012)\]\[[bibtex](
https://scholar.googleusercontent.com/scholar.bib?q=info:l291WsrB-hQJ:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAWUIIlnPZ_L9jxvPwcC49kDlELtaeIyU-&scisf=4&ct=citation&cd=-1&hl=en&scfhb=1)\]
<p align="center">
<img src="g3doc/img/tf-od-api-logo.png" width=140 height=195>
</p>
## Maintainers
* Jonathan Huang, github: [jch1](https://github.com/jch1)
@@ -59,6 +63,10 @@ Extras:
Defining your own model architecture</a><br>
* <a href='g3doc/using_your_own_dataset.md'>
Bringing in your own dataset</a><br>
* <a href='g3doc/oid_inference_and_evaluation.md'>
Inference and evaluation on the Open Images dataset</a><br>
* <a href='g3doc/evaluation_protocols.md'>
Supported object detection evaluation protocols</a><br>
## Getting Help
@@ -71,8 +79,21 @@ tensorflow/models Github
[issue tracker](https://github.com/tensorflow/models/issues), prefixing the
issue name with "object_detection".
## Release information
### November 17, 2017
As part of the Open Images V3 release, we have released:
* An implementation of the Open Images evaluation metric and the [protocol](g3doc/evaluation_protocols.md#open-images).
* Additional tools to run detection inference and evaluation as separate steps (see [this tutorial](g3doc/oid_inference_and_evaluation.md)).
* A new detection model trained on the Open Images V2 data release (see [Open Images model](g3doc/detection_model_zoo.md#open-images-models)).
See more information on the [Open Images website](https://github.com/openimages/dataset)!
<b>Thanks to contributors</b>: Stefan Popov, Alina Kuznetsova
### November 6, 2017
@@ -107,6 +128,7 @@ you to try out other detection models!
<b>Thanks to contributors</b>: Jonathan Huang, Andrew Harp
### June 15, 2017
In addition to our base Tensorflow detection model definitions, this
@@ -130,3 +152,4 @@ release includes:
<b>Thanks to contributors</b>: Jonathan Huang, Vivek Rathod, Derek Chow,
Chen Sun, Menglong Zhu, Matthew Tang, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Jasper Uijlings,
Viacheslav Kovalevskyi, Kevin Murphy
@@ -82,3 +82,26 @@ py_library(
"//tensorflow_models/object_detection/utils:dataset_util",
],
)
py_test(
name = "oid_tfrecord_creation_test",
srcs = ["oid_tfrecord_creation_test.py"],
deps = [
":oid_tfrecord_creation",
"//third_party/py/contextlib2",
"//third_party/py/pandas",
"//third_party/py/tensorflow",
],
)
py_binary(
name = "create_oid_tf_record",
srcs = ["create_oid_tf_record.py"],
deps = [
":oid_tfrecord_creation",
"//third_party/py/contextlib2",
"//third_party/py/pandas",
"//tensorflow",
"//tensorflow_models/object_detection/utils:label_map_util",
],
)
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Creates TFRecords of Open Images dataset for object detection.
Example usage:
./create_oid_tf_record \
--input_annotations_csv=/path/to/input/annotations-human-bbox.csv \
--input_images_directory=/path/to/input/image_pixels_directory \
--input_label_map=/path/to/input/labels_bbox_545.labelmap \
--output_tf_record_path_prefix=/path/to/output/prefix.tfrecord
CSVs with bounding box annotations and image metadata (including the image URLs)
can be downloaded from the Open Images GitHub repository:
https://github.com/openimages/dataset
This script will include every image found in the input_images_directory in the
output TFRecord, even if the image has no corresponding bounding box annotations
in the input_annotations_csv.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import contextlib2
import pandas as pd
import tensorflow as tf
from object_detection.dataset_tools import oid_tfrecord_creation
from object_detection.utils import label_map_util
tf.flags.DEFINE_string('input_annotations_csv', None,
'Path to CSV containing image bounding box annotations')
tf.flags.DEFINE_string('input_images_directory', None,
'Directory containing the image pixels '
'downloaded from the OpenImages GitHub repository.')
tf.flags.DEFINE_string('input_label_map', None, 'Path to the label map proto')
tf.flags.DEFINE_string(
'output_tf_record_path_prefix', None,
'Path to the output TFRecord. The shard index and the number of shards '
'will be appended for each output shard.')
tf.flags.DEFINE_integer('num_shards', 100, 'Number of TFRecord shards')
FLAGS = tf.flags.FLAGS
def main(_):
tf.logging.set_verbosity(tf.logging.INFO)
required_flags = [
'input_annotations_csv', 'input_images_directory', 'input_label_map',
'output_tf_record_path_prefix'
]
for flag_name in required_flags:
if not getattr(FLAGS, flag_name):
raise ValueError('Flag --{} is required'.format(flag_name))
label_map = label_map_util.get_label_map_dict(FLAGS.input_label_map)
all_annotations = pd.read_csv(FLAGS.input_annotations_csv)
all_images = tf.gfile.Glob(
os.path.join(FLAGS.input_images_directory, '*.jpg'))
all_image_ids = [os.path.splitext(os.path.basename(v))[0] for v in all_images]
all_image_ids = pd.DataFrame({'ImageID': all_image_ids})
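  # Appending the image IDs found on disk as extra rows (with no annotation
  # columns) guarantees that every image in input_images_directory appears in
  # the groupby below, even if it has no boxes in the annotations CSV.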
all_annotations = pd.concat([all_annotations, all_image_ids])
tf.logging.log(tf.logging.INFO, 'Found %d images...', len(all_image_ids))
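  # One TFRecordWriter per shard is opened below; registering the writers on
  # the ExitStack ensures they are all closed when the block exits.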
with contextlib2.ExitStack() as tf_record_close_stack:
output_tfrecords = oid_tfrecord_creation.open_sharded_output_tfrecords(
tf_record_close_stack, FLAGS.output_tf_record_path_prefix,
FLAGS.num_shards)
for counter, image_data in enumerate(all_annotations.groupby('ImageID')):
tf.logging.log_every_n(tf.logging.INFO, 'Processed %d images...', 1000,
counter)
image_id, image_annotations = image_data
# In OID image file names are formed by appending ".jpg" to the image ID.
image_path = os.path.join(FLAGS.input_images_directory, image_id + '.jpg')
      with tf.gfile.Open(image_path, 'rb') as image_file:
encoded_image = image_file.read()
tf_example = oid_tfrecord_creation.tf_example_from_annotations_data_frame(
image_annotations, label_map, encoded_image)
if tf_example:
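        # Image IDs are hexadecimal strings, so interpreting them in base 16
        # gives a deterministic, roughly uniform assignment of images to shards.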
        shard_idx = int(image_id, 16) % FLAGS.num_shards
output_tfrecords[shard_idx].write(tf_example.SerializeToString())
if __name__ == '__main__':
tf.app.run()
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Utilities for creating TFRecords of TF examples for the Open Images dataset.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.utils import dataset_util
def tf_example_from_annotations_data_frame(annotations_data_frame, label_map,
encoded_image):
"""Populates a TF Example message with image annotations from a data frame.
Args:
annotations_data_frame: Data frame containing the annotations for a single
image.
label_map: String to integer label map.
    encoded_image: The encoded image string.
  Returns:
    The populated TF Example. Annotations whose labels are not present in
    label_map are dropped from the object fields; the example is returned even
    if no annotation survives the filtering.
  """
filtered_data_frame = annotations_data_frame[
annotations_data_frame.LabelName.isin(label_map)]
image_id = annotations_data_frame.ImageID.iloc[0]
feature_map = {
standard_fields.TfExampleFields.object_bbox_ymin:
dataset_util.float_list_feature(filtered_data_frame.YMin.as_matrix()),
standard_fields.TfExampleFields.object_bbox_xmin:
dataset_util.float_list_feature(filtered_data_frame.XMin.as_matrix()),
standard_fields.TfExampleFields.object_bbox_ymax:
dataset_util.float_list_feature(filtered_data_frame.YMax.as_matrix()),
standard_fields.TfExampleFields.object_bbox_xmax:
dataset_util.float_list_feature(filtered_data_frame.XMax.as_matrix()),
standard_fields.TfExampleFields.object_class_text:
dataset_util.bytes_list_feature(
filtered_data_frame.LabelName.as_matrix()),
standard_fields.TfExampleFields.object_class_label:
dataset_util.int64_list_feature(
filtered_data_frame.LabelName.map(lambda x: label_map[x])
.as_matrix()),
standard_fields.TfExampleFields.filename:
dataset_util.bytes_feature('{}.jpg'.format(image_id)),
standard_fields.TfExampleFields.source_id:
dataset_util.bytes_feature(image_id),
standard_fields.TfExampleFields.image_encoded:
dataset_util.bytes_feature(encoded_image),
}
if 'IsGroupOf' in filtered_data_frame.columns:
feature_map[standard_fields.TfExampleFields.
object_group_of] = dataset_util.int64_list_feature(
filtered_data_frame.IsGroupOf.as_matrix().astype(int))
if 'IsOccluded' in filtered_data_frame.columns:
feature_map[standard_fields.TfExampleFields.
object_occluded] = dataset_util.int64_list_feature(
filtered_data_frame.IsOccluded.as_matrix().astype(int))
if 'IsTruncated' in filtered_data_frame.columns:
feature_map[standard_fields.TfExampleFields.
object_truncated] = dataset_util.int64_list_feature(
filtered_data_frame.IsTruncated.as_matrix().astype(int))
if 'IsDepiction' in filtered_data_frame.columns:
feature_map[standard_fields.TfExampleFields.
object_depiction] = dataset_util.int64_list_feature(
filtered_data_frame.IsDepiction.as_matrix().astype(int))
return tf.train.Example(features=tf.train.Features(feature=feature_map))
def open_sharded_output_tfrecords(exit_stack, base_path, num_shards):
"""Opens all TFRecord shards for writing and adds them to an exit stack.
Args:
    exit_stack: A contextlib2.ExitStack used to automatically close the
      TFRecords opened in this function.
    base_path: The base path for all shards.
    num_shards: The number of shards.
Returns:
The list of opened TFRecords. Position k in the list corresponds to shard k.
"""
tf_record_output_filenames = [
'{}-{:05d}-of-{:05d}'.format(base_path, idx, num_shards)
      for idx in range(num_shards)
]
tfrecords = [
exit_stack.enter_context(tf.python_io.TFRecordWriter(file_name))
for file_name in tf_record_output_filenames
]
return tfrecords
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for oid_tfrecord_creation.py."""
import os
import contextlib2
import pandas as pd
import tensorflow as tf
from object_detection.dataset_tools import oid_tfrecord_creation
def create_test_data():
data = {
'ImageID': ['i1', 'i1', 'i1', 'i1', 'i2', 'i2'],
'LabelName': ['a', 'a', 'b', 'b', 'b', 'c'],
'YMin': [0.3, 0.6, 0.8, 0.1, 0.0, 0.0],
'XMin': [0.1, 0.3, 0.7, 0.0, 0.1, 0.1],
'XMax': [0.2, 0.3, 0.8, 0.5, 0.9, 0.9],
'YMax': [0.3, 0.6, 1, 0.8, 0.8, 0.8],
'IsOccluded': [0, 1, 1, 0, 0, 0],
'IsTruncated': [0, 0, 0, 1, 0, 0],
'IsGroupOf': [0, 0, 0, 0, 0, 1],
'IsDepiction': [1, 0, 0, 0, 0, 0],
}
df = pd.DataFrame(data=data)
label_map = {'a': 0, 'b': 1, 'c': 2}
return label_map, df
class TfExampleFromAnnotationsDataFrameTests(tf.test.TestCase):
def test_simple(self):
label_map, df = create_test_data()
tf_example = oid_tfrecord_creation.tf_example_from_annotations_data_frame(
df[df.ImageID == 'i1'], label_map, 'encoded_image_test')
self.assertProtoEquals("""
features {
feature {
key: "image/encoded"
value { bytes_list { value: "encoded_image_test" } } }
feature {
key: "image/filename"
value { bytes_list { value: "i1.jpg" } } }
feature {
key: "image/object/bbox/ymin"
value { float_list { value: [0.3, 0.6, 0.8, 0.1] } } }
feature {
key: "image/object/bbox/xmin"
value { float_list { value: [0.1, 0.3, 0.7, 0.0] } } }
feature {
key: "image/object/bbox/ymax"
value { float_list { value: [0.3, 0.6, 1.0, 0.8] } } }
feature {
key: "image/object/bbox/xmax"
value { float_list { value: [0.2, 0.3, 0.8, 0.5] } } }
feature {
key: "image/object/class/label"
value { int64_list { value: [0, 0, 1, 1] } } }
feature {
key: "image/object/class/text"
value { bytes_list { value: ["a", "a", "b", "b"] } } }
feature {
key: "image/source_id"
value { bytes_list { value: "i1" } } }
feature {
key: "image/object/depiction"
value { int64_list { value: [1, 0, 0, 0] } } }
feature {
key: "image/object/group_of"
value { int64_list { value: [0, 0, 0, 0] } } }
feature {
key: "image/object/occluded"
value { int64_list { value: [0, 1, 1, 0] } } }
feature {
key: "image/object/truncated"
value { int64_list { value: [0, 0, 0, 1] } } } }
""", tf_example)
def test_no_attributes(self):
label_map, df = create_test_data()
del df['IsDepiction']
del df['IsGroupOf']
del df['IsOccluded']
del df['IsTruncated']
tf_example = oid_tfrecord_creation.tf_example_from_annotations_data_frame(
df[df.ImageID == 'i2'], label_map, 'encoded_image_test')
self.assertProtoEquals("""
features {
feature {
key: "image/encoded"
value { bytes_list { value: "encoded_image_test" } } }
feature {
key: "image/filename"
value { bytes_list { value: "i2.jpg" } } }
feature {
key: "image/object/bbox/ymin"
value { float_list { value: [0.0, 0.0] } } }
feature {
key: "image/object/bbox/xmin"
value { float_list { value: [0.1, 0.1] } } }
feature {
key: "image/object/bbox/ymax"
value { float_list { value: [0.8, 0.8] } } }
feature {
key: "image/object/bbox/xmax"
value { float_list { value: [0.9, 0.9] } } }
feature {
key: "image/object/class/label"
value { int64_list { value: [1, 2] } } }
feature {
key: "image/object/class/text"
value { bytes_list { value: ["b", "c"] } } }
feature {
key: "image/source_id"
value { bytes_list { value: "i2" } } } }
""", tf_example)
def test_label_filtering(self):
label_map, df = create_test_data()
label_map = {'a': 0}
tf_example = oid_tfrecord_creation.tf_example_from_annotations_data_frame(
df[df.ImageID == 'i1'], label_map, 'encoded_image_test')
self.assertProtoEquals("""
features {
feature {
key: "image/encoded"
value { bytes_list { value: "encoded_image_test" } } }
feature {
key: "image/filename"
value { bytes_list { value: "i1.jpg" } } }
feature {
key: "image/object/bbox/ymin"
value { float_list { value: [0.3, 0.6] } } }
feature {
key: "image/object/bbox/xmin"
value { float_list { value: [0.1, 0.3] } } }
feature {
key: "image/object/bbox/ymax"
value { float_list { value: [0.3, 0.6] } } }
feature {
key: "image/object/bbox/xmax"
value { float_list { value: [0.2, 0.3] } } }
feature {
key: "image/object/class/label"
value { int64_list { value: [0, 0] } } }
feature {
key: "image/object/class/text"
value { bytes_list { value: ["a", "a"] } } }
feature {
key: "image/source_id"
value { bytes_list { value: "i1" } } }
feature {
key: "image/object/depiction"
value { int64_list { value: [1, 0] } } }
feature {
key: "image/object/group_of"
value { int64_list { value: [0, 0] } } }
feature {
key: "image/object/occluded"
value { int64_list { value: [0, 1] } } }
feature {
key: "image/object/truncated"
value { int64_list { value: [0, 0] } } } }
""", tf_example)
class OpenOutputTfrecordsTests(tf.test.TestCase):
def test_sharded_tfrecord_writes(self):
with contextlib2.ExitStack() as tf_record_close_stack:
output_tfrecords = oid_tfrecord_creation.open_sharded_output_tfrecords(
tf_record_close_stack,
os.path.join(tf.test.get_temp_dir(), 'test.tfrec'), 10)
for idx in range(10):
output_tfrecords[idx].write('test_{}'.format(idx))
for idx in range(10):
tf_record_path = '{}-{:05d}-of-00010'.format(
os.path.join(tf.test.get_temp_dir(), 'test.tfrec'), idx)
records = list(tf.python_io.tf_record_iterator(tf_record_path))
self.assertAllEqual(records, ['test_{}'.format(idx)])
if __name__ == '__main__':
tf.test.main()
# Faster R-CNN with Inception Resnet v2, Atrous version;
# Configured for Open Images Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
faster_rcnn {
num_classes: 546
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
feature_extractor {
type: 'faster_rcnn_inception_resnet_v2'
first_stage_features_stride: 8
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 8
width_stride: 8
}
}
first_stage_atrous_rate: 2
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 17
maxpool_kernel_size: 1
maxpool_stride: 1
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.00006
schedule {
step: 0
learning_rate: .00006
}
schedule {
step: 6000000
learning_rate: .000006
}
schedule {
step: 7000000
learning_rate: .0000006
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  # Note: The below line limits the number of training steps, which we
  # empirically found to be sufficient to train on the Open Images dataset.
  # Remove the below line to train indefinitely.
num_steps: 8000000
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/oid_bbox_trainable_train.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/oid_bbox_trainable_label_map.pbtxt"
}
eval_config: {
metrics_set: "open_images_metrics"
num_examples: 8000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/oid_bbox_trainable_val.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/oid_bbox_trainable_label_map.pbtxt"
shuffle: false
num_readers: 1
}
# Tensorflow detection model zoo
We provide a collection of detection models pre-trained on the [COCO
dataset](http://mscoco.org), the [Kitti dataset](http://www.cvlibs.net/datasets/kitti/), and the
[Open Images dataset](https://github.com/openimages/dataset). These models can
be useful for out-of-the-box inference if you are interested in categories
already in COCO (e.g., humans, cars, etc.) or in Open Images (e.g., surfboard,
jacuzzi, etc.). They are also useful for initializing your models when
training on novel datasets.
In the table below, we list each such pre-trained model including:
@@ -18,7 +20,7 @@ In the table below, we list each such pre-trained model including:
configuration (these timings were performed using an Nvidia
GeForce GTX TITAN X card) and should be treated more as relative timings in
many cases.
* detector performance on a subset of the COCO validation set or the Open Images test split, as measured by the dataset-specific mAP measure.
Here, higher is better, and we only report bounding box mAP rounded to the
nearest integer.
* Output types (currently only `Boxes`)
@@ -86,5 +88,14 @@ Model name
----------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---: | :-------------: | :-----:
[faster_rcnn_resnet101_kitti](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_kitti_2017_11_08.tar.gz) | 79 | 87 | Boxes
## Open Images-trained models {#open-images-models}
Model name | Speed (ms) | Open Images mAP@0.5[^2] | Outputs
----------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---: | :-------------: | :-----:
[faster_rcnn_inception_resnet_v2_atrous_oid](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_oid_2017_11_08.tar.gz) | 727 | 37 | Boxes
[faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2017_11_08.tar.gz) | 347 | | Boxes
[^1]: See [MSCOCO evaluation protocol](http://cocodataset.org/#detections-eval).
[^2]: This is PASCAL mAP with a slightly different way of computing true positives; see the [Open Images evaluation protocol](evaluation_protocols.md#open-images).
# Supported object detection evaluation protocols
The Tensorflow Object Detection API currently supports three evaluation protocols,
which can be configured in `EvalConfig` by setting `metrics_set` to the
corresponding value.
## PASCAL VOC 2007 metric
`EvalConfig.metrics_set='pascal_voc_metrics'`
The commonly used mAP metric for evaluating the quality of object detectors, computed according to the protocol of the PASCAL VOC Challenge 2007.
The protocol is available [here](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/devkit_doc_07-Jun-2007.pdf).
## Weighted PASCAL VOC metric
`EvalConfig.metrics_set='weighted_pascal_voc_metrics'`
The weighted PASCAL metric computes the mean average precision as the average
precision when treating all classes as a single class. In comparison,
PASCAL metrics computes the mean average precision as the mean of the
per-class average precisions.
For example, suppose the test set consists of two classes, "cat" and "dog", and there are ten times more boxes of "cat" than of "dog".
According to the PASCAL VOC 2007 metric, performance on each of the two classes contributes equally towards the final mAP value,
while for the Weighted PASCAL VOC metric the final mAP value is influenced by the frequency of each class.
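As a rough numeric illustration (the AP values and box counts below are hypothetical; the actual weighted metric computes a single AP over the pooled detections of all classes, for which a frequency-weighted mean of per-class APs is only an approximation):

```python
# Hypothetical per-class average precisions and ground-truth box counts.
per_class_ap = {'cat': 0.8, 'dog': 0.4}
num_boxes = {'cat': 1000, 'dog': 100}

# PASCAL VOC 2007 style mAP: each class contributes equally.
pascal_map = sum(per_class_ap.values()) / len(per_class_ap)  # 0.6

# Weighted PASCAL style: all boxes are pooled, so the frequent "cat" class
# dominates the result; a frequency-weighted mean approximates this effect.
total_boxes = sum(num_boxes.values())
weighted_map = sum(per_class_ap[c] * num_boxes[c]
                   for c in per_class_ap) / total_boxes  # ~0.76
print(pascal_map, weighted_map)
```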
## Open Images metric {#open-images}
`EvalConfig.metrics_set='open_images_metrics'`
This metric is defined originally for evaluating detector performance on [Open Images V2 dataset](https://github.com/openimages/dataset)
and is fairly similar to the PASCAL VOC 2007 metric mentioned above.
It computes interpolated average precision (AP) for each class and averages it among all classes (mAP).
The difference from the PASCAL VOC 2007 metric is the following: Open Images
annotations contain `group-of` ground-truth boxes (see the [Open Images data
description](https://github.com/openimages/dataset#annotations-human-bboxcsv)),
which are treated differently for the purpose of deciding whether detections are
"true positives", "ignored", or "false positives". Here we define these three
cases:
A detection is a "true positive" if there is a non-group-of ground-truth box,
such that:
* The detection box and the ground-truth box are of the same class, and
intersection-over-union (IoU) between the detection box and the ground-truth
box is greater than the IoU threshold (default value 0.5). \
Illustration of handling non-group-of boxes: \
![alt nongroupof_case_eval](img/nongroupof_case_eval.png "illustration of handling non-group-of boxes: yellow box - ground truth bounding box; green box - true positive; red box - false positives."){width="500" height="270"}
* yellow box - ground-truth box;
* green box - true positive;
* red boxes - false positives.
* This is the highest scoring detection for this ground truth box that
satisfies the criteria above.
A detection is "ignored" if it is not a true positive, and there is a `group-of`
ground-truth box such that:
* The detection box and the ground-truth box are of the same class, and the
area of intersection between the detection box and the ground-truth box
divided by the area of the detection is greater than 0.5. This is intended
to measure whether the detection box is approximately inside the group-of
ground-truth box. \
Illustration of handling `group-of` boxes: \
![alt groupof_case_eval](img/groupof_case_eval.png "illustration of handling group-of boxes: yellow box - ground truth bounding box; grey boxes - two detections of cars, that are ignored; red box - false positive."){width="500" height="270"}
* yellow box - ground-truth box;
* grey boxes - two detections of cars that are ignored;
* red box - false positive.
A detection is a "false positive" if it is neither a "true positive" nor
"ignored".
Precision and recall are defined as:
* Precision = number-of-true-positives/(number-of-true-positives + number-of-false-positives)
* Recall = number-of-true-positives/number-of-non-group-of-boxes
Note that detections ignored as firing on a `group-of` ground-truth box do not
contribute to the number of true positives.
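To make the decision rules above concrete, here is a minimal sketch of classifying a single detection. This is not the evaluator shipped with the API; the box representation (`[ymin, xmin, ymax, xmax]`), the helper functions and the assumption that detections are processed in descending score order are illustrative choices only.

```python
def iou(box_a, box_b):
  """Intersection-over-union of two [ymin, xmin, ymax, xmax] boxes."""
  ymin, xmin = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
  ymax, xmax = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
  intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
  area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
  area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
  union = area_a + area_b - intersection
  return intersection / union if union > 0 else 0.0


def classify_detection(det_box, det_label, groundtruth, matched_gt,
                       iou_threshold=0.5, ioa_threshold=0.5):
  """Returns 'true positive', 'ignored' or 'false positive' for one detection.

  groundtruth is a list of (box, label, is_group_of) tuples; matched_gt is the
  set of indices of non-group-of boxes already claimed by a higher scoring
  detection (detections must be processed in descending score order).
  """
  # True positive: an unclaimed, same-class, non-group-of box with IoU above
  # the threshold.
  for idx, (gt_box, gt_label, is_group_of) in enumerate(groundtruth):
    if is_group_of or gt_label != det_label or idx in matched_gt:
      continue
    if iou(det_box, gt_box) > iou_threshold:
      matched_gt.add(idx)
      return 'true positive'
  # Ignored: the detection lies mostly inside a same-class group-of box
  # (intersection area divided by the detection area exceeds the threshold).
  det_area = (det_box[2] - det_box[0]) * (det_box[3] - det_box[1])
  for gt_box, gt_label, is_group_of in groundtruth:
    if not is_group_of or gt_label != det_label:
      continue
    ymin, xmin = max(det_box[0], gt_box[0]), max(det_box[1], gt_box[1])
    ymax, xmax = min(det_box[2], gt_box[2]), min(det_box[3], gt_box[3])
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    if det_area > 0 and intersection / det_area > ioa_threshold:
      return 'ignored'
  # Everything else counts against precision.
  return 'false positive'
```

Precision and recall then follow the formulas above, with ignored detections contributing to neither the true-positive nor the false-positive count.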
The labels in Open Images are organized in a
[hierarchy](https://storage.googleapis.com/openimages/2017_07/bbox_labels_vis/bbox_labels_vis.html).
Ground-truth bounding-boxes are annotated with the most specific class available
in the hierarchy. For example, "car" has two children "limousine" and "van". Any
other kind of car is annotated as "car" (for example, a sedan). Given this
convention, the evaluation software treats all classes independently, ignoring
the hierarchy. To achieve high performance values, object detectors should
output bounding-boxes labelled in the same manner.
@@ -14,7 +14,7 @@ Tensorflow Object Detection API depends on the following libraries:
For detailed steps to install Tensorflow, follow the
[Tensorflow installation instructions](https://www.tensorflow.org/install/).
A typical user can install Tensorflow using one of the following commands:
``` bash
# For CPU
......
# Inference and evaluation on the Open Images dataset
This page presents a tutorial for running object detector inference and
evaluation measure computations on the [Open Images
dataset](https://github.com/openimages/dataset), using tools from the
[TensorFlow Object Detection
API](https://github.com/tensorflow/models/tree/master/research/object_detection).
It shows how to download the images and annotations for the validation and test
sets of Open Images; how to package the downloaded data in a format understood
by the Object Detection API; where to find a trained object detector model for
Open Images; how to run inference; and how to compute evaluation measures on the
inferred detections.
Inferred detections will look like the following:
![](img/oid_bus_72e19c28aac34ed8.jpg){height="300"}
![](img/oid_monkey_3b4168c89cecbc5b.jpg){height="300"}
On the validation set of Open Images, this tutorial requires 27GB of free disk
space and the inference step takes approximately 9 hours on a single NVIDIA
Tesla P100 GPU. For the test set, the figures are 75GB and approximately 27 hours, respectively. All other
steps require less than two hours in total on both sets.
## Installing TensorFlow, the Object Detection API, and Google Cloud SDK
Please run through the [installation instructions](installation.md) to install
TensorFlow and all its dependencies. Ensure the Protobuf libraries are compiled
and the library directories are added to `PYTHONPATH`. You will also need to
`pip` install `pandas` and `contextlib2`.
Some of the data used in this tutorial lives in Google Cloud buckets. To access
it, you will have to [install the Google Cloud
SDK](https://cloud.google.com/sdk/downloads) on your workstation or laptop.
## Preparing the Open Images validation and test sets
In order to run inference and subsequent evaluation measure computations, we
require a dataset of images and ground truth boxes, packaged as TFRecords of
TFExamples. To create such a dataset for Open Images, you will need to first
download ground truth boxes from the [Open Images
website](https://github.com/openimages/dataset):
```bash
# From tensorflow/models/research
mkdir oid
cd oid
wget https://storage.googleapis.com/openimages/2017_07/annotations_human_bbox_2017_07.tar.gz
tar -xvf annotations_human_bbox_2017_07.tar.gz
```
Next, download the images. In this tutorial, we will use lower resolution images
provided by [CVDF](http://www.cvdfoundation.org). Please follow the instructions
on [CVDF's Open Images repository
page](https://github.com/cvdfoundation/open-images-dataset) in order to gain
access to the cloud bucket with the images. Then run:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # Set SPLIT to "test" to download the images in the test set
mkdir raw_images_${SPLIT}
gsutil -m rsync -r gs://open-images-dataset/$SPLIT raw_images_${SPLIT}
```
Another option for downloading the images is to follow the URLs contained in the
[image URLs and metadata CSV
files](https://storage.googleapis.com/openimages/2017_07/images_2017_07.tar.gz)
on the Open Images website.
At this point, your `tensorflow/models/research/oid` directory should appear as
follows:
```lang-none
|-- 2017_07
| |-- test
| | `-- annotations-human-bbox.csv
| |-- train
| | `-- annotations-human-bbox.csv
| `-- validation
| `-- annotations-human-bbox.csv
|-- raw_images_validation (if you downloaded the validation split)
| `-- ... (41,620 files matching regex "[0-9a-f]{16}.jpg")
|-- raw_images_test (if you downloaded the test split)
| `-- ... (125,436 files matching regex "[0-9a-f]{16}.jpg")
`-- annotations_human_bbox_2017_07.tar.gz
```
Next, package the data into TFRecords of TFExamples by running:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # Set SPLIT to "test" to create TFRecords for the test split
mkdir ${SPLIT}_tfrecords
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.dataset_tools.create_oid_tf_record \
--input_annotations_csv 2017_07/$SPLIT/annotations-human-bbox.csv \
--input_images_directory raw_images_${SPLIT} \
--input_label_map ../object_detection/data/oid_bbox_trainable_label_map.pbtxt \
--output_tf_record_path_prefix ${SPLIT}_tfrecords/$SPLIT.tfrecord \
--num_shards=100
```
This results in 100 TFRecord files (shards), written to
`oid/${SPLIT}_tfrecords`, with filenames matching
`${SPLIT}.tfrecord-000[0-9][0-9]-of-00100`. Each shard contains approximately
the same number of images and is de facto a representative random sample of the
input data. [This enables](#accelerating_inference) a straightforward work
division scheme for distributing inference and also approximate measure
computations on subsets of the validation and test sets.
## Inferring detections
Inference requires a trained object detection model. In this tutorial we will
use a model from the [detection model zoo](detection_model_zoo.md), which can
be downloaded and unpacked by running the commands below. More information about
the model, such as its architecture and how it was trained, is available in the
[model zoo page](detection_model_zoo.md).
```bash
# From tensorflow/models/research/oid
wget http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_oid_14_10_2017.tar.gz
tar -zxvf faster_rcnn_inception_resnet_v2_atrous_oid_14_10_2017.tar.gz
```
At this point, data is packed into TFRecords and we have an object detector
model. We can run inference using:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
TF_RECORD_FILES=$(ls -1 ${SPLIT}_tfrecords/* | tr '\n' ',')
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.inference.infer_detections \
--input_tfrecord_paths=$TF_RECORD_FILES \
--output_tfrecord_path=${SPLIT}_detections.tfrecord-00000-of-00001 \
--inference_graph=faster_rcnn_inception_resnet_v2_atrous_oid/frozen_inference_graph.pb \
--discard_image_pixels
```
Inference preserves all fields of the input TFExamples, and adds new fields to
store the inferred detections. This allows [computing evaluation
measures](#compute_evaluation_measures) on the output TFRecord alone, as ground
truth boxes are preserved as well. Since measure computations don't require
access to the images, `infer_detections` can optionally discard them with the
`--discard_image_pixels` flag. Discarding the images drastically reduces the
size of the output TFRecord.
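If you want to inspect the output before running evaluation, a short snippet along the following lines can help. This is only a sketch: it assumes the single-process output file name used above and just prints the feature keys of the first record, which is an easy way to see which detection fields `infer_detections` added.

```python
import tensorflow as tf

# Output of the single-process inference command above (validation split).
detections_path = 'validation_detections.tfrecord-00000-of-00001'

for serialized in tf.python_io.tf_record_iterator(detections_path):
  example = tf.train.Example()
  example.ParseFromString(serialized)
  # Lists both the preserved ground-truth fields and the detection fields
  # added during inference.
  for key in sorted(example.features.feature):
    print(key)
  break  # The first record is enough to see the schema.
```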
### Accelerating inference {#accelerating_inference}
Running inference on the whole validation or test set can take a long time to
complete due to the large number of images present in these sets (41,620 and
125,436 respectively). For quick but approximate evaluation, inference and the
subsequent measure computations can be run on a small number of shards. To run
for example on 2% of all the data, it is enough to set `TF_RECORD_FILES` as
shown below before running `infer_detections`:
```bash
TF_RECORD_FILES=$(ls ${SPLIT}_tfrecords/${SPLIT}.tfrecord-0000[0-1]-of-00100 | tr '\n' ',')
```
Please note that computing evaluation measures on a small subset of the data
introduces variance and bias, since some classes of objects won't be seen during
evaluation. In the example above, this leads to an mAP that is 13.2 points higher on the
first two shards of the validation set than on the full set ([see mAP
results](#expected-maps)).
Another way to accelerate inference is to run it in parallel on multiple
TensorFlow devices on possibly multiple machines. The script below uses
[tmux](https://github.com/tmux/tmux/wiki) to run a separate `infer_detections`
process for each GPU on a different partition of the input data.
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
NUM_GPUS=4
NUM_SHARDS=100
tmux new-session -d -s "inference"
function tmux_start { tmux new-window -d -n "inference:GPU$1" "${*:2}; exec bash"; }
for gpu_index in $(seq 0 $(($NUM_GPUS-1))); do
start_shard=$(( $gpu_index * $NUM_SHARDS / $NUM_GPUS ))
end_shard=$(( ($gpu_index + 1) * $NUM_SHARDS / $NUM_GPUS - 1))
TF_RECORD_FILES=$(seq -s, -f "${SPLIT}_tfrecords/${SPLIT}.tfrecord-%05.0f-of-$(printf '%05d' $NUM_SHARDS)" $start_shard $end_shard)
tmux_start ${gpu_index} \
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) CUDA_VISIBLE_DEVICES=$gpu_index \
python -m object_detection.inference.infer_detections \
--input_tfrecord_paths=$TF_RECORD_FILES \
--output_tfrecord_path=${SPLIT}_detections.tfrecord-$(printf "%05d" $gpu_index)-of-$(printf "%05d" $NUM_GPUS) \
--inference_graph=faster_rcnn_inception_resnet_v2_atrous_oid/frozen_inference_graph.pb \
--discard_image_pixels
done
```
After all `infer_detections` processes finish, `tensorflow/models/research/oid`
will contain one output TFRecord from each process, with name matching
`validation_detections.tfrecord-0000[0-3]-of-00004`.
## Computing evaluation measures {#compute_evaluation_measures}
To compute evaluation measures on the inferred detections you first need to
create the appropriate configuration files:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
NUM_SHARDS=1 # Set to NUM_GPUS if using the parallel evaluation script above
mkdir -p ${SPLIT}_eval_metrics
echo "
label_map_path: '../object_detection/data/oid_bbox_trainable_label_map.pbtxt'
tf_record_input_reader: { input_path: '${SPLIT}_detections.tfrecord@${NUM_SHARDS}' }
" > ${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt
echo "
metrics_set: 'open_images_metrics'
" > ${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt
```
And then run:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.metrics.offline_eval_map_corloc \
--eval_dir=${SPLIT}_eval_metrics \
--eval_config_path=${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt \
--input_config_path=${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt
```
The first configuration file contains an `object_detection.protos.InputReader`
message that describes the location of the necessary input files. The second
file contains an `object_detection.protos.EvalConfig` message that describes the
evaluation metric. For more information about these protos see the corresponding
source files.
### Expected mAPs {#expected-maps}
The result of running `offline_eval_map_corloc` is a CSV file located at
`${SPLIT}_eval_metrics/metrics.csv`. With the above configuration, the file will
contain average precision at IoU≥0.5 for each of the classes present in the
dataset. It will also contain the mAP@IoU≥0.5. Both the per-class average
precisions and the mAP are computed according to the [Open Images evaluation
protocol](evaluation_protocols.md). The expected mAPs for the validation and
test sets of Open Images in this case are:
Set | Fraction of data | Images | mAP@IoU≥0.5
---------: | :--------------: | :-----: | -----------
validation | everything | 41,620 | 39.2%
validation | first 2 shards | 884 | 52.4%
test | everything | 125,436 | 37.7%
test | first 2 shards | 2,476 | 50.8%