![TensorFlow Requirement: 1.15](https://img.shields.io/badge/TensorFlow%20Requirement-1.15-brightgreen)
![TensorFlow 2 Not Supported](https://img.shields.io/badge/TensorFlow%202%20Not%20Supported-%E2%9C%95-red.svg)

# TensorFlow Object Detection API

Creating accurate machine learning models capable of localizing and identifying
multiple objects in a single image remains a core challenge in computer vision.
The TensorFlow Object Detection API is an open source framework built on top of
TensorFlow that makes it easy to construct, train and deploy object detection
models. At Google we’ve certainly found this codebase to be useful for our
computer vision needs, and we hope that you will as well.

<p align="center">
  <img src="g3doc/img/kites_detections_output.jpg" width=676 height=450>
</p>

Contributions to the codebase are welcome and we would love to hear back from
you if you find this API useful. Finally, if you use the TensorFlow Object
Detection API for a research publication, please consider citing:

```
"Speed/accuracy trade-offs for modern convolutional object detectors."
Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z,
Song Y, Guadarrama S, Murphy K, CVPR 2017
```

\[[link](https://arxiv.org/abs/1611.10012)\]\[[bibtex](https://scholar.googleusercontent.com/scholar.bib?q=info:l291WsrB-hQJ:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAWUIIlnPZ_L9jxvPwcC49kDlELtaeIyU-&scisf=4&ct=citation&cd=-1&hl=en&scfhb=1)\]

<p align="center">
  <img src="g3doc/img/tf-od-api-logo.png" width=140 height=195>
</p>

## Maintainers

Name           | GitHub
-------------- | ---------------------------------------------
Jonathan Huang | [jch1](https://github.com/jch1)
Vivek Rathod   | [tombstone](https://github.com/tombstone)
Ronny Votel    | [ronnyvotel](https://github.com/ronnyvotel)
Derek Chow     | [derekjchow](https://github.com/derekjchow)
Chen Sun       | [jesu9](https://github.com/jesu9)
Menglong Zhu   | [dreamdragon](https://github.com/dreamdragon)
Alireza Fathi  | [afathi3](https://github.com/afathi3)
Zhichao Lu     | [pkulzc](https://github.com/pkulzc)

## Table of contents

Setup:

*   <a href='g3doc/installation.md'>Installation</a><br>

Quick Start:

*   <a href='colab_tutorials/object_detection_tutorial.ipynb'>
      Quick Start: Jupyter notebook for off-the-shelf inference</a><br>
*   <a href="g3doc/running_pets.md">Quick Start: Training a pet detector</a><br>

Customizing a Pipeline:

*   <a href='g3doc/configuring_jobs.md'>
      Configuring an object detection pipeline</a><br>
*   <a href='g3doc/preparing_inputs.md'>Preparing inputs</a><br>

Running:

*   <a href='g3doc/running_locally.md'>Running locally</a><br>
*   <a href='g3doc/running_on_cloud.md'>Running on the cloud</a><br>

Extras:

*   <a href='g3doc/detection_model_zoo.md'>Tensorflow detection model zoo</a><br>
*   <a href='g3doc/exporting_models.md'>
      Exporting a trained model for inference</a><br>
*   <a href='g3doc/tpu_exporters.md'>
      Exporting a trained model for TPU inference</a><br>
*   <a href='g3doc/defining_your_own_model.md'>
      Defining your own model architecture</a><br>
*   <a href='g3doc/using_your_own_dataset.md'>
      Bringing in your own dataset</a><br>
*   <a href='g3doc/evaluation_protocols.md'>
      Supported object detection evaluation protocols</a><br>
*   <a href='g3doc/oid_inference_and_evaluation.md'>
      Inference and evaluation on the Open Images dataset</a><br>
*   <a href='g3doc/instance_segmentation.md'>
      Run an instance segmentation model</a><br>
*   <a href='g3doc/challenge_evaluation.md'>
      Run the evaluation for the Open Images Challenge 2018/2019</a><br>
*   <a href='g3doc/tpu_compatibility.md'>
      TPU compatible detection pipelines</a><br>
*   <a href='g3doc/running_on_mobile_tensorflowlite.md'>
      Running object detection on mobile devices with TensorFlow Lite</a><br>
*   <a href='g3doc/context_rcnn.md'>
      Context R-CNN documentation for data preparation, training, and export</a><br>

## Getting Help

To get help with issues you may encounter using the TensorFlow Object Detection
API, create a new question on [StackOverflow](https://stackoverflow.com/) with
the tags "tensorflow" and "object-detection".

Please report bugs (actually broken code, not usage questions) to the
tensorflow/models GitHub
[issue tracker](https://github.com/tensorflow/models/issues), prefixing the
issue name with "object_detection".

Please check the [FAQ](g3doc/faq.md) for frequently asked questions before
reporting an issue.

## Release information
### June 17th, 2020

We have released [Context R-CNN](https://arxiv.org/abs/1912.03538), a model that
uses attention to incorporate contextual information from images (e.g., from
temporally nearby frames taken by a static camera) in order to improve accuracy.
Importantly, these contextual images need not be labeled.

*   When applied to a challenging wildlife detection dataset ([Snapshot Serengeti](http://lila.science/datasets/snapshot-serengeti)),
    Context R-CNN with context from up to a month of images outperforms a
    single-frame baseline by 17.9% mAP, and outperforms S3D (a 3D-convolution-based
    baseline) by 11.2% mAP.
*   Context R-CNN leverages temporal context from the unlabeled frames of a
    novel camera deployment to improve performance at that camera, boosting
    model generalizability.

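The core mechanism can be sketched as attention over context features (a generic
illustration with made-up shapes and names, not the released implementation):

```python
import numpy as np

def attend_to_context(box_features, context_features):
    """Weights unlabeled context features by similarity to per-box features.

    box_features: [num_boxes, d] features from the current frame.
    context_features: [num_context, d] features from temporally nearby frames.
    """
    scores = box_features @ context_features.T           # [num_boxes, num_context]
    scores /= np.sqrt(box_features.shape[-1])            # scaled dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over context
    context = weights @ context_features                 # [num_boxes, d]
    return box_features + context                        # context-enriched features

enriched = attend_to_context(np.random.rand(4, 8), np.random.rand(100, 8))
```
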
Read about Context R-CNN on the Google AI blog [here](https://ai.googleblog.com/2020/06/leveraging-temporal-context-for-object.html).

We have provided code for generating data with associated context
[here](g3doc/context_rcnn.md), and a sample config for a Context R-CNN
model [here](samples/configs/context_rcnn_resnet101_snapshot_serengeti_sync.config).

Snapshot Serengeti-trained Faster R-CNN and Context R-CNN models can be found in
the [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#snapshot-serengeti-camera-trap-trained-models).

A colab demonstrating Context R-CNN is provided
[here](colab_tutorials/context_rcnn_tutorial.ipynb).

<b>Thanks to contributors</b>: Sara Beery, Jonathan Huang, Guanhang Wu, Vivek
Rathod, Ronny Votel, Zhichao Lu, David Ross, Pietro Perona, Tanya Birch, and
the Wildlife Insights AI Team.

### May 19th, 2020

We have released [MobileDets](https://arxiv.org/abs/2004.14525), a set of
high-performance models for mobile CPUs, DSPs and EdgeTPUs.

*   MobileDets outperform MobileNetV3+SSDLite by 1.7 mAP at comparable mobile
    CPU inference latencies. MobileDets also outperform MobileNetV2+SSDLite by
    1.9 mAP on mobile CPUs, 3.7 mAP on EdgeTPUs and 3.4 mAP on DSPs while
    running equally fast. MobileDets also offer up to 2x speedup over MnasFPN on
    EdgeTPUs and DSPs.

For each of the three hardware platforms we have released model definitions,
model checkpoints trained on the COCO14 dataset, and converted TFLite models in
fp32 and/or uint8.

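As a sketch of how one of the converted TFLite models can be run with the
TF 1.15 `tf.lite.Interpreter` (the model filename below is a placeholder, and the
output ordering follows the common detection postprocessing convention, which
can vary by model):

```python
import numpy as np
import tensorflow as tf  # TF 1.15

# Placeholder path to a converted MobileDet TFLite model.
interpreter = tf.lite.Interpreter(model_path="mobiledet_cpu_coco.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Build a dummy image with the shape and dtype the model expects.
_, height, width, _ = input_details[0]["shape"]
image = np.zeros((1, height, width, 3), dtype=input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], image)
interpreter.invoke()

# Detection models typically emit boxes, classes, scores, and a valid count.
boxes = interpreter.get_tensor(output_details[0]["index"])
```
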
<b>Thanks to contributors</b>: Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin
Akin, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen,
Quoc Le, Zhichao Lu.

### May 7th, 2020

We have released a mobile model with the
[MnasFPN head](https://arxiv.org/abs/1912.01106).

*   MnasFPN with MobileNet-V2 backbone is the most accurate (26.6 mAP at 183ms
    on Pixel 1) mobile detection model we have released to date. With
    depth-multiplier, MnasFPN with MobileNet-V2 backbone is 1.8 mAP higher than
    MobileNet-V3-Large with SSDLite (23.8 mAP vs 22.0 mAP) at similar latency
    (120ms) on Pixel 1.

We have released the model definition, model checkpoints trained on the COCO14
dataset, and a converted TFLite model.

<b>Thanks to contributors</b>: Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi
Lin, Dmitry Kalenichenko, Hartwig Adam, Quoc Le, Zhichao Lu, Jonathan Huang, Hao
Xu.

### Nov 13th, 2019

We have released the MobileNetEdgeTPU SSDLite model.

*   SSDLite with MobileNetEdgeTPU backbone, which achieves 10% higher mAP than
    MobileNetV2 SSDLite (24.3 mAP vs 22 mAP) on a Google Pixel 4 at comparable
    latency (6.6ms vs 6.8ms).

Along with the model definition, we are also releasing model checkpoints trained
on the COCO dataset.

<b>Thanks to contributors</b>: Yunyang Xiong, Bo Chen, Suyog Gupta, Hanxiao Liu,
Gabriel Bender, Mingxing Tan, Berkin Akin, Zhichao Lu, Quoc Le

### Oct 15th, 2019

We have released two MobileNet V3 SSDLite models (presented in
[Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)).

*   SSDLite with MobileNet-V3-Large backbone, which is 27% faster than MobileNet
    V2 SSDLite (119ms vs 162ms) on a Google Pixel phone CPU at the same mAP.
*   SSDLite with MobileNet-V3-Small backbone, which is 37% faster than MnasNet
    SSDLite reduced with depth-multiplier (43ms vs 68ms) at the same mAP.

Along with the model definition, we are also releasing model checkpoints trained
on the COCO dataset.

<b>Thanks to contributors</b>: Bo Chen, Zhichao Lu, Vivek Rathod, Jonathan Huang

### July 1st, 2019

We have released an updated set of utils and an updated
[tutorial](g3doc/challenge_evaluation.md) for all three tracks of the
[Open Images Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html)!

The Instance Segmentation metric for
[Open Images V5](https://storage.googleapis.com/openimages/web/index.html) and
[Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html)
is part of this release. Check out
[the metric description](https://storage.googleapis.com/openimages/web/evaluation.html#instance_segmentation_eval)
on the Open Images website.

<b>Thanks to contributors</b>: Alina Kuznetsova, Rodrigo Benenson

### Feb 11, 2019

We have released detection models trained on the Open Images Dataset V4 in our
detection model zoo, including

*   Faster R-CNN detector with Inception Resnet V2 feature extractor
*   SSD detector with MobileNet V2 feature extractor
*   SSD detector with ResNet 101 FPN feature extractor (aka RetinaNet-101)

<b>Thanks to contributors</b>: Alina Kuznetsova, Yinxiao Li

### Sep 17, 2018

We have released Faster R-CNN detectors with ResNet-50 / ResNet-101 feature
extractors trained on the
[iNaturalist Species Detection Dataset](https://github.com/visipedia/inat_comp/blob/master/2017/README.md#bounding-boxes).
The models are trained on the training split of the iNaturalist data for 4M
iterations and achieve 55% and 58% mean AP@0.5 over 2854 classes, respectively.
For more details please refer to this [paper](https://arxiv.org/abs/1707.06642).

<b>Thanks to contributors</b>: Chen Sun

### July 13, 2018

There are many new updates in this release, extending the functionality and
capability of the API:

*   Moving from slim-based training to
    [Estimator](https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator)-based
    training (a minimal sketch of the pattern follows this list).
*   Support for [RetinaNet](https://arxiv.org/abs/1708.02002), and a
    [MobileNet](https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html)
    adaptation of RetinaNet.
*   A novel SSD-based architecture called the
    [Pooling Pyramid Network](https://arxiv.org/abs/1807.03284) (PPN).
*   Releasing several [TPU](https://cloud.google.com/tpu/)-compatible models.
    These can be found in the `samples/configs/` directory with a comment in the
    pipeline configuration files indicating TPU compatibility.
*   Support for quantized training.
*   Updated documentation for new binaries, Cloud training, and
    [Tensorflow Lite](https://www.tensorflow.org/mobile/tflite/).

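A minimal sketch of the Estimator-based pattern (generic TF 1.x `tf.estimator`
usage; the `model_fn` and `input_fn` below are stand-ins for the ones the API
builds from a pipeline config, not the API's actual entry points):

```python
import tensorflow as tf  # TF 1.15

def input_fn():
    # Stand-in for the input pipeline the API builds from a pipeline config.
    features = {"image": tf.random.uniform([8, 300, 300, 3])}
    labels = tf.zeros([8], dtype=tf.int32)
    return tf.data.Dataset.from_tensors((features, labels)).repeat()

def model_fn(features, labels, mode):
    # Stand-in model; the real model_fn computes detection losses and metrics.
    logits = tf.layers.dense(tf.layers.flatten(features["image"]), 10)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=tf.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir="/tmp/od_model")
estimator.train(input_fn=input_fn, max_steps=10)
```
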
See also our
[expanded announcement blogpost](https://ai.googleblog.com/2018/07/accelerated-training-and-inference-with.html)
and accompanying tutorial at the
[TensorFlow blog](https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193).

<b>Thanks to contributors</b>: Sara Robinson, Aakanksha Chowdhery, Derek Chow,
Pengchong Jin, Jonathan Huang, Vivek Rathod, Zhichao Lu, Ronny Votel

### June 25, 2018

Additional evaluation tools for the
[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html)
are out. Check out our short tutorial on data preparation and running evaluation
[here](g3doc/challenge_evaluation.md)!

<b>Thanks to contributors</b>: Alina Kuznetsova

### June 5, 2018

We have released the implementation of evaluation metrics for both tracks of the
[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html)
as part of the Object Detection API; see the
[evaluation protocols](g3doc/evaluation_protocols.md) for more details.
Additionally, we have released a tool for hierarchical labels expansion for the
Open Images Challenge: check out
[oid_hierarchical_labels_expansion.py](dataset_tools/oid_hierarchical_labels_expansion.py).

<b>Thanks to contributors</b>: Alina Kuznetsova, Vittorio Ferrari, Jasper
Uijlings

### April 30, 2018

We have released a Faster R-CNN detector with ResNet-101 feature extractor
trained on [AVA](https://research.google.com/ava/) v2.1. Compared with other
commonly used object detectors, it changes the action classification loss
function to per-class Sigmoid loss to handle boxes with multiple labels. The
model is trained on the training split of AVA v2.1 for 1.5M iterations and
achieves a mean AP of 11.25% over 60 classes on the validation split of AVA
v2.1.
For more details please refer to this [paper](https://arxiv.org/abs/1705.08421).

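The loss-function change is easy to illustrate (an illustrative sketch only):
per-class sigmoid cross-entropy treats each action as an independent binary
decision, so one box can carry several positive labels, which a softmax loss
cannot express:

```python
import tensorflow as tf  # TF 1.15

logits = tf.constant([[2.0, -1.0, 0.5]])    # one box, three action classes
labels = tf.constant([[1.0, 0.0, 1.0]])     # the box carries two action labels

# Per-class sigmoid loss: an independent binary cross-entropy per class.
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)

with tf.Session() as sess:
    print(sess.run(loss))  # one loss term per class, e.g. [0.127 0.313 0.474]
```
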
<b>Thanks to contributors</b>: Chen Sun, David Ross

### April 2, 2018

Supercharge your mobile phones with the next generation mobile object detector!
We are adding support for MobileNet V2 with SSDLite presented in
[MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381).
This model is 35% faster than MobileNet V1 SSD on a Google Pixel phone CPU
(200ms vs. 270ms) at the same accuracy. Along with the model definition, we are
also releasing a model checkpoint trained on the COCO dataset.

<b>Thanks to contributors</b>: Menglong Zhu, Mark Sandler, Zhichao Lu, Vivek
Rathod, Jonathan Huang

### February 9, 2018

We now support instance segmentation! In this API update we support a number of
instance segmentation models similar to those discussed in the
[Mask R-CNN paper](https://arxiv.org/abs/1703.06870). For further details refer
to [our slides](http://presentations.cocodataset.org/Places17-GMRI.pdf) from the
2017 COCO + Places Workshop. Refer to the section on
[Running an Instance Segmentation Model](g3doc/instance_segmentation.md) for
instructions on how to configure a model that predicts masks in addition to
object bounding boxes.

<b>Thanks to contributors</b>: Alireza Fathi, Zhichao Lu, Vivek Rathod, Ronny
Votel, Jonathan Huang

### November 17, 2017

As part of the Open Images V3 release, we have released:

*   An implementation of the Open Images evaluation metric and the
    [protocol](g3doc/evaluation_protocols.md#open-images).
*   Additional tools to separate inference of detection and evaluation (see
    [this tutorial](g3doc/oid_inference_and_evaluation.md)).
*   A new detection model trained on the Open Images V2 data release (see
    [Open Images model](g3doc/detection_model_zoo.md#open-images-models)).

See more information on the
[Open Images website](https://github.com/openimages/dataset)!

<b>Thanks to contributors</b>: Stefan Popov, Alina Kuznetsova

### November 6, 2017

We have re-released faster versions of our (pre-trained) models in the
<a href='g3doc/detection_model_zoo.md'>model zoo</a>. In addition to what was
available before, we are also adding Faster R-CNN models trained on COCO with
Inception V2 and Resnet-50 feature extractors, as well as a Faster R-CNN with
Resnet-101 model trained on the KITTI dataset.

<b>Thanks to contributors</b>: Jonathan Huang, Vivek Rathod, Derek Chow, Tal
Remez, Chen Sun.

### October 31, 2017

We have released a new state-of-the-art model for object detection using
Faster R-CNN with the
[NASNet-A image featurization](https://arxiv.org/abs/1707.07012). This model
achieves an mAP of 43.1% on the COCO test-dev set, improving on the best
available model in the zoo by 6% absolute mAP.

<b>Thanks to contributors</b>: Barret Zoph, Vijay Vasudevan, Jonathon Shlens,
Quoc Le

### August 11, 2017

We have released an update to the
[Android Detect demo](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android)
which will now run models trained using the TensorFlow Object Detection API on
an Android device. By default, it currently runs a frozen SSD with MobileNet
detector trained on COCO, but we encourage you to try out other detection
models!

<b>Thanks to contributors</b>: Jonathan Huang, Andrew Harp

### June 15, 2017

In addition to our base TensorFlow detection model definitions, this release
includes:

*   A selection of trainable detection models, including:
    *   Single Shot Multibox Detector (SSD) with MobileNet,
    *   SSD with Inception V2,
    *   Region-Based Fully Convolutional Networks (R-FCN) with Resnet 101,
    *   Faster RCNN with Resnet 101,
    *   Faster RCNN with Inception Resnet v2
*   Frozen weights (trained on the COCO dataset) for each of the above models to
    be used for out-of-the-box inference purposes.
*   A [Jupyter notebook](colab_tutorials/object_detection_tutorial.ipynb) for
    performing out-of-the-box inference with one of our released models (a
    minimal inference sketch follows this list).
*   Convenient [local training](g3doc/running_locally.md) scripts as well as
    distributed training and evaluation pipelines via
    [Google Cloud](g3doc/running_on_cloud.md).

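A minimal sketch of out-of-the-box inference with one of the released frozen
graphs (the path below is a placeholder; the tensor names follow the convention
used by the released COCO models, and the notebook above remains the supported
reference):

```python
import numpy as np
import tensorflow as tf  # TF 1.15

# Placeholder path to a frozen graph downloaded from the model zoo.
PATH_TO_FROZEN_GRAPH = "ssd_mobilenet_v1_coco/frozen_inference_graph.pb"

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=graph) as sess:
    # The released graphs take a uint8 image batch named `image_tensor`.
    image = np.zeros((1, 300, 300, 3), dtype=np.uint8)  # dummy input
    boxes, scores, classes, num = sess.run(
        ["detection_boxes:0", "detection_scores:0",
         "detection_classes:0", "num_detections:0"],
        feed_dict={"image_tensor:0": image})
```
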
<b>Thanks to contributors</b>: Jonathan Huang, Vivek Rathod, Derek Chow, Chen
Sun, Menglong Zhu, Matthew Tang, Anoop Korattikara, Alireza Fathi, Ian Fischer,
Zbigniew Wojna, Yang Song, Sergio Guadarrama, Jasper Uijlings, Viacheslav
Kovalevskyi, Kevin Murphy