Unverified commit 79ae8004 authored by pkulzc, committed by GitHub

Release Context R-CNN code and pre-trained model

Context R-CNN: Long Term Temporal Context for Per Camera Object Detection

http://openaccess.thecvf.com/content_CVPR_2020/html/Beery_Context_R-CNN_Long_Term_Temporal_Context_for_Per-Camera_Object_Detection_CVPR_2020_paper.html
parent c87c3965
@@ -2,17 +2,16 @@
![TensorFlow 2 Not Supported](https://img.shields.io/badge/TensorFlow%202%20Not%20Supported-%E2%9C%95-red.svg)

# Tensorflow Object Detection API

Creating accurate machine learning models capable of localizing and identifying
multiple objects in a single image remains a core challenge in computer vision.
The TensorFlow Object Detection API is an open source framework built on top of
TensorFlow that makes it easy to construct, train and deploy object detection
models. At Google we’ve certainly found this codebase to be useful for our
computer vision needs, and we hope that you will as well.

<p align="center">
  <img src="g3doc/img/kites_detections_output.jpg" width=676 height=450>
</p>

Contributions to the codebase are welcome and we would love to hear back from
you if you find this API useful. Finally if you use the Tensorflow Object
Detection API for a research publication, please consider citing:

```
@@ -20,8 +19,8 @@
Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z,
Song Y, Guadarrama S, Murphy K, CVPR 2017
```
\[[link](https://arxiv.org/abs/1611.10012)\]\[[bibtex](https://scholar.googleusercontent.com/scholar.bib?q=info:l291WsrB-hQJ:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAWUIIlnPZ_L9jxvPwcC49kDlELtaeIyU-&scisf=4&ct=citation&cd=-1&hl=en&scfhb=1)\]

<p align="center">
  <img src="g3doc/img/tf-od-api-logo.png" width=140 height=195>
@@ -29,63 +28,65 @@
## Maintainers

| Name           | GitHub
| -------------- | ---------------------------------------------
| Jonathan Huang | [jch1](https://github.com/jch1)
| Vivek Rathod   | [tombstone](https://github.com/tombstone)
| Ronny Votel    | [ronnyvotel](https://github.com/ronnyvotel)
| Derek Chow     | [derekjchow](https://github.com/derekjchow)
| Chen Sun       | [jesu9](https://github.com/jesu9)
| Menglong Zhu   | [dreamdragon](https://github.com/dreamdragon)
| Alireza Fathi  | [afathi3](https://github.com/afathi3)
| Zhichao Lu     | [pkulzc](https://github.com/pkulzc)

## Table of contents

Setup:

* <a href='g3doc/installation.md'>Installation</a><br>

Quick Start:

* <a href='object_detection_tutorial.ipynb'>
  Quick Start: Jupyter notebook for off-the-shelf inference</a><br>
* <a href="g3doc/running_pets.md">Quick Start: Training a pet detector</a><br>

Customizing a Pipeline:

* <a href='g3doc/configuring_jobs.md'>
  Configuring an object detection pipeline</a><br>
* <a href='g3doc/preparing_inputs.md'>Preparing inputs</a><br>

Running:

* <a href='g3doc/running_locally.md'>Running locally</a><br>
* <a href='g3doc/running_on_cloud.md'>Running on the cloud</a><br>

Extras:

* <a href='g3doc/detection_model_zoo.md'>Tensorflow detection model zoo</a><br>
* <a href='g3doc/exporting_models.md'>
  Exporting a trained model for inference</a><br>
* <a href='g3doc/tpu_exporters.md'>
  Exporting a trained model for TPU inference</a><br>
* <a href='g3doc/defining_your_own_model.md'>
  Defining your own model architecture</a><br>
* <a href='g3doc/using_your_own_dataset.md'>
  Bringing in your own dataset</a><br>
* <a href='g3doc/evaluation_protocols.md'>
  Supported object detection evaluation protocols</a><br>
* <a href='g3doc/oid_inference_and_evaluation.md'>
  Inference and evaluation on the Open Images dataset</a><br>
* <a href='g3doc/instance_segmentation.md'>
  Run an instance segmentation model</a><br>
* <a href='g3doc/challenge_evaluation.md'>
  Run the evaluation for the Open Images Challenge 2018/2019</a><br>
* <a href='g3doc/tpu_compatibility.md'>
  TPU compatible detection pipelines</a><br>
* <a href='g3doc/running_on_mobile_tensorflowlite.md'>
  Running object detection on mobile devices with TensorFlow Lite</a><br>
* <a href='g3doc/context_rcnn.md'>
  Context R-CNN documentation for data preparation, training, and export</a><br>
## Getting Help
@@ -98,78 +99,105 @@ tensorflow/models GitHub
[issue tracker](https://github.com/tensorflow/models/issues), prefixing the
issue name with "object_detection".

Please check [FAQ](g3doc/faq.md) for frequently asked questions before reporting
an issue.

## Release information

### June 17th, 2020
We have released [Context R-CNN](https://arxiv.org/abs/1912.03538), a model
that uses attention to incorporate contextual information from other images
(e.g. from temporally nearby frames taken by a static camera) in order to
improve accuracy. Importantly, these contextual images need not be labeled.

* When applied to a challenging wildlife detection dataset
  ([Snapshot Serengeti](http://lila.science/datasets/snapshot-serengeti)),
  Context R-CNN with context from up to a month of images outperforms a
  single-frame baseline by 17.9% mAP, and outperforms S3D (a 3D-convolution
  based baseline) by 11.2% mAP.
* Context R-CNN leverages temporal context from the unlabeled frames of a
  novel camera deployment to improve performance at that camera, boosting
  model generalizability.

We have provided code for generating data with associated context
[here](g3doc/context_rcnn.md), and a sample config for a Context R-CNN model
[here](samples/configs/context_rcnn_resnet101_snapshot_serengeti_sync.config).
Snapshot Serengeti-trained Faster R-CNN and Context R-CNN models can be found
in the
[model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#snapshot-serengeti-camera-trap-trained-models).

A colab demonstrating Context R-CNN is provided
[here](colab_tutorials/context_rcnn_tutorial.ipynb).

<b>Thanks to contributors</b>: Sara Beery, Jonathan Huang, Guanhang Wu, Vivek
Rathod, Ronny Votel, Zhichao Lu, David Ross, Pietro Perona, Tanya Birch, and
the Wildlife Insights AI Team.
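For intuition, the following is a minimal, illustrative sketch of the attention step described above, written for this release note rather than taken from the released implementation (the real model adds learned projections and runs inside the Faster R-CNN second stage): a per-box feature attends into a memory bank of unlabeled context features and returns a weighted context vector.

```python
# Illustrative only: scaled dot-product attention from one box feature (the
# query) into a memory bank of context features. Function and variable names
# here are made up for the example.
import numpy as np

def attend_to_context(box_feature, context_bank):
  """box_feature: [d]; context_bank: [num_context, d] -> attended [d]."""
  d = box_feature.shape[-1]
  scores = context_bank @ box_feature / np.sqrt(d)   # [num_context]
  weights = np.exp(scores - scores.max())
  weights /= weights.sum()                           # softmax over the bank
  return weights @ context_bank                      # weighted context vector
```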
### May 19th, 2020

We have released [MobileDets](https://arxiv.org/abs/2004.14525), a set of
high-performance models for mobile CPUs, DSPs and EdgeTPUs.

* MobileDets outperform MobileNetV3+SSDLite by 1.7 mAP at comparable mobile
  CPU inference latencies. MobileDets also outperform MobileNetV2+SSDLite by
  1.9 mAP on mobile CPUs, 3.7 mAP on EdgeTPUs and 3.4 mAP on DSPs while
  running equally fast. MobileDets also offer up to 2x speedup over MnasFPN
  on EdgeTPUs and DSPs.

For each of the three hardware platforms we have released model definition,
model checkpoints trained on the COCO14 dataset and converted TFLite models in
fp32 and/or uint8.

<b>Thanks to contributors</b>: Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin
Akin, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen,
Quoc Le, Zhichao Lu.

### May 7th, 2020

We have released a mobile model with the
[MnasFPN head](https://arxiv.org/abs/1912.01106).

* MnasFPN with MobileNet-V2 backbone is the most accurate (26.6 mAP at 183ms
  on Pixel 1) mobile detection model we have released to date. With
  depth-multiplier, MnasFPN with MobileNet-V2 backbone is 1.8 mAP higher than
  MobileNet-V3-Large with SSDLite (23.8 mAP vs 22.0 mAP) at similar latency
  (120ms) on Pixel 1.

We have released model definition, model checkpoints trained on the COCO14
dataset and a converted TFLite model.

<b>Thanks to contributors</b>: Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi
Lin, Dmitry Kalenichenko, Hartwig Adam, Quoc Le, Zhichao Lu, Jonathan Huang,
Hao Xu.

### Nov 13th, 2019

We have released the MobileNetEdgeTPU SSDLite model.

* SSDLite with MobileNetEdgeTPU backbone, which achieves 10% mAP higher than
  MobileNetV2 SSDLite (24.3 mAP vs 22 mAP) on a Google Pixel4 at comparable
  latency (6.6ms vs 6.8ms).

Along with the model definition, we are also releasing model checkpoints
trained on the COCO dataset.

<b>Thanks to contributors</b>: Yunyang Xiong, Bo Chen, Suyog Gupta, Hanxiao Liu,
Gabriel Bender, Mingxing Tan, Berkin Akin, Zhichao Lu, Quoc Le

### Oct 15th, 2019

We have released two MobileNet V3 SSDLite models (presented in
[Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)).

* SSDLite with MobileNet-V3-Large backbone, which is 27% faster than Mobilenet
  V2 SSDLite (119ms vs 162ms) on a Google Pixel phone CPU at the same mAP.
* SSDLite with MobileNet-V3-Small backbone, which is 37% faster than MnasNet
  SSDLite reduced with depth-multiplier (43ms vs 68ms) at the same mAP.

Along with the model definition, we are also releasing model checkpoints
trained on the COCO dataset.

<b>Thanks to contributors</b>: Bo Chen, Zhichao Lu, Vivek Rathod, Jonathan Huang

### July 1st, 2019

We have released an updated set of utils and an updated
@@ -177,28 +205,30 @@
[Open Images Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html)!

The Instance Segmentation metric for
[Open Images V5](https://storage.googleapis.com/openimages/web/index.html) and
[Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html)
is part of this release. Check out
[the metric description](https://storage.googleapis.com/openimages/web/evaluation.html#instance_segmentation_eval)
on the Open Images website.

<b>Thanks to contributors</b>: Alina Kuznetsova, Rodrigo Benenson

### Feb 11, 2019

We have released detection models trained on the Open Images Dataset V4 in our
detection model zoo, including

* Faster R-CNN detector with Inception Resnet V2 feature extractor
* SSD detector with MobileNet V2 feature extractor
* SSD detector with ResNet 101 FPN feature extractor (aka RetinaNet-101)

<b>Thanks to contributors</b>: Alina Kuznetsova, Yinxiao Li

### Sep 17, 2018

We have released Faster R-CNN detectors with ResNet-50 / ResNet-101 feature
extractors trained on the
[iNaturalist Species Detection Dataset](https://github.com/visipedia/inat_comp/blob/master/2017/README.md#bounding-boxes).
The models are trained on the training split of the iNaturalist data for 4M
iterations, and they achieve 55% and 58% mean AP@.5 over 2854 classes
respectively. For more details please refer to this
[paper](https://arxiv.org/abs/1707.06642).
@@ -210,42 +240,59 @@
There are many new updates in this release, extending the functionality and
capability of the API:

* Moving from slim-based training to
  [Estimator](https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator)-based
  training.
* Support for [RetinaNet](https://arxiv.org/abs/1708.02002), and a
  [MobileNet](https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html)
  adaptation of RetinaNet.
* A novel SSD-based architecture called the
  [Pooling Pyramid Network](https://arxiv.org/abs/1807.03284) (PPN).
* Releasing several [TPU](https://cloud.google.com/tpu/)-compatible models.
  These can be found in the `samples/configs/` directory with a comment in the
  pipeline configuration files indicating TPU compatibility.
* Support for quantized training.
* Updated documentation for new binaries, Cloud training, and
  [Tensorflow Lite](https://www.tensorflow.org/mobile/tflite/).

See also our
[expanded announcement blogpost](https://ai.googleblog.com/2018/07/accelerated-training-and-inference-with.html)
and accompanying tutorial at the
[TensorFlow blog](https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193).

<b>Thanks to contributors</b>: Sara Robinson, Aakanksha Chowdhery, Derek Chow,
Pengchong Jin, Jonathan Huang, Vivek Rathod, Zhichao Lu, Ronny Votel

### June 25, 2018

Additional evaluation tools for the
[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html)
are out. Check out our short tutorial on data preparation and running evaluation
[here](g3doc/challenge_evaluation.md)!

<b>Thanks to contributors</b>: Alina Kuznetsova

### June 5, 2018

We have released the implementation of evaluation metrics for both tracks of the
[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html)
as a part of the Object Detection API - see the
[evaluation protocols](g3doc/evaluation_protocols.md) for more details.
Additionally, we have released a tool for hierarchical labels expansion for the
Open Images Challenge: check out
[oid_hierarchical_labels_expansion.py](dataset_tools/oid_hierarchical_labels_expansion.py).

<b>Thanks to contributors</b>: Alina Kuznetsova, Vittorio Ferrari, Jasper
Uijlings

### April 30, 2018

We have released a Faster R-CNN detector with ResNet-101 feature extractor
trained on [AVA](https://research.google.com/ava/) v2.1. Compared with other
commonly used object detectors, it changes the action classification loss
function to per-class Sigmoid loss to handle boxes with multiple labels. The
model is trained on the training split of AVA v2.1 for 1.5M iterations, and it
achieves a mean AP of 11.25% over 60 classes on the validation split of AVA
v2.1. For more details please refer to this
[paper](https://arxiv.org/abs/1705.08421).

<b>Thanks to contributors</b>: Chen Sun, David Ross
@@ -255,84 +302,94 @@
Supercharge your mobile phones with the next generation mobile object detector!
We are adding support for MobileNet V2 with SSDLite presented in
[MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381).
This model is 35% faster than Mobilenet V1 SSD on a Google Pixel phone CPU
(200ms vs. 270ms) at the same accuracy. Along with the model definition, we are
also releasing a model checkpoint trained on the COCO dataset.

<b>Thanks to contributors</b>: Menglong Zhu, Mark Sandler, Zhichao Lu, Vivek
Rathod, Jonathan Huang

### February 9, 2018

We now support instance segmentation!! In this API update we support a number
of instance segmentation models similar to those discussed in the
[Mask R-CNN paper](https://arxiv.org/abs/1703.06870). For further details refer
to [our slides](http://presentations.cocodataset.org/Places17-GMRI.pdf) from the
2017 Coco + Places Workshop. Refer to the section on
[Running an Instance Segmentation Model](g3doc/instance_segmentation.md) for
instructions on how to configure a model that predicts masks in addition to
object bounding boxes.

<b>Thanks to contributors</b>: Alireza Fathi, Zhichao Lu, Vivek Rathod, Ronny
Votel, Jonathan Huang

### November 17, 2017

As a part of the Open Images V3 release we have released:

* An implementation of the Open Images evaluation metric and the
  [protocol](g3doc/evaluation_protocols.md#open-images).
* Additional tools to separate inference of detection and evaluation (see
  [this tutorial](g3doc/oid_inference_and_evaluation.md)).
* A new detection model trained on the Open Images V2 data release (see
  [Open Images model](g3doc/detection_model_zoo.md#open-images-models)).

See more information on the
[Open Images website](https://github.com/openimages/dataset)!

<b>Thanks to contributors</b>: Stefan Popov, Alina Kuznetsova

### November 6, 2017

We have re-released faster versions of our (pre-trained) models in the
<a href='g3doc/detection_model_zoo.md'>model zoo</a>. In addition to what was
available before, we are also adding Faster R-CNN models trained on COCO with
Inception V2 and Resnet-50 feature extractors, as well as a Faster R-CNN with
Resnet-101 model trained on the KITTI dataset.

<b>Thanks to contributors</b>: Jonathan Huang, Vivek Rathod, Derek Chow, Tal
Remez, Chen Sun.

### October 31, 2017

We have released a new state-of-the-art model for object detection using the
Faster-RCNN with the
[NASNet-A image featurization](https://arxiv.org/abs/1707.07012). This model
achieves mAP of 43.1% on the test-dev validation dataset for COCO, improving on
the best available model in the zoo by 6% in terms of absolute mAP.

<b>Thanks to contributors</b>: Barret Zoph, Vijay Vasudevan, Jonathon Shlens,
Quoc Le

### August 11, 2017

We have released an update to the
[Android Detect demo](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android)
which will now run models trained using the Tensorflow Object Detection API on
an Android device. By default, it currently runs a frozen SSD w/Mobilenet
detector trained on COCO, but we encourage you to try out other detection
models!

<b>Thanks to contributors</b>: Jonathan Huang, Andrew Harp

### June 15, 2017

In addition to our base Tensorflow detection model definitions, this release
includes:

* A selection of trainable detection models, including:
  * Single Shot Multibox Detector (SSD) with MobileNet,
  * SSD with Inception V2,
  * Region-Based Fully Convolutional Networks (R-FCN) with Resnet 101,
  * Faster RCNN with Resnet 101,
  * Faster RCNN with Inception Resnet v2
* Frozen weights (trained on the COCO dataset) for each of the above models to
  be used for out-of-the-box inference purposes.
* A [Jupyter notebook](colab_tutorials/object_detection_tutorial.ipynb) for
  performing out-of-the-box inference with one of our released models
* Convenient [local training](g3doc/running_locally.md) scripts as well as
  distributed training and evaluation pipelines via
  [Google Cloud](g3doc/running_on_cloud.md).

<b>Thanks to contributors</b>: Jonathan Huang, Vivek Rathod, Derek Chow, Chen
Sun, Menglong Zhu, Matthew Tang, Anoop Korattikara, Alireza Fathi, Ian Fischer,
...
@@ -22,6 +22,18 @@ contextual features. We focus on building context from object-centric features
generated with a pre-trained Faster R-CNN model, but you can adapt the provided
code to use alternative feature extractors.
Each of these data processing scripts uses Apache Beam, which can be installed
using
```
pip install apache-beam
```
and can be run locally, or on a cluster for efficient processing of large
amounts of data. See the
[Apache Beam documentation](https://beam.apache.org/documentation/runners/dataflow/)
for more information.
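For reference, the snippet below is a generic sketch (not taken from this release) of how a Beam pipeline is pointed at a local or hosted runner; the `dataset_tools` scripts construct their own pipelines, so this only illustrates the runner choice mentioned above.

```python
# Generic illustration of Beam runner selection; run_example and the
# byte-counting pipeline are made up for this example.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run_example(input_pattern, output_path, runner='DirectRunner'):
  # 'DirectRunner' executes on the local machine; 'DataflowRunner' (plus the
  # usual --project/--region/--temp_location options) runs the same pipeline
  # on Google Cloud Dataflow for large datasets.
  options = PipelineOptions(['--runner=%s' % runner])
  with beam.Pipeline(options=options) as p:
    _ = (p
         | 'ReadRecords' >> beam.io.ReadFromTFRecord(input_pattern)
         | 'RecordSizes' >> beam.Map(len)
         | 'TotalBytes' >> beam.CombineGlobally(sum)
         | 'Write' >> beam.io.WriteToText(output_path))
```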
### Generating TfRecords from a set of images and a COCO-CameraTraps style JSON

If your data is already stored in TfRecords, you can skip this first step.
@@ -99,6 +111,10 @@ python object_detection/export_inference_graph.py \
  --additional_output_tensor_names detection_features
```
Make sure that you have set `output_final_box_features: true` within
your config file before exporting. This is needed to export the features as an
output, but it does not need to be set during training.
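If you prefer to flip this flag programmatically rather than editing the config file by hand, a sketch along the following lines should work; it assumes `output_final_box_features` is a boolean field on the `faster_rcnn` message of the pipeline proto (verify against your copy of the protos), and the helper name is made up for this example.

```python
# Hypothetical helper (not part of the release): enables box feature export in
# an existing pipeline config and writes the updated config to output_dir.
import os
from object_detection.utils import config_util

def enable_box_feature_export(pipeline_config_path, output_dir):
  configs = config_util.get_configs_from_pipeline_file(pipeline_config_path)
  # Assumes a Faster R-CNN based model, as used by Context R-CNN.
  configs['model'].faster_rcnn.output_final_box_features = True
  pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
  config_util.save_pipeline_config(pipeline_proto, output_dir)
  return os.path.join(output_dir, 'pipeline.config')
```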
To generate and save contextual features for your data, run

```
@@ -111,7 +127,8 @@ python object_detection/dataset_tools/context_rcnn/generate_embedding_data.py \
### Building up contextual memory banks and storing them for each context group

To build the context features you just added for each image into memory banks,
run
```
python object_detection/dataset_tools/context_rcnn/add_context_to_examples.py \
@@ -121,6 +138,9 @@
  --time_horizon month
```
where the `input_tfrecords` for `add_context_to_examples.py` are the
`output_tfrecords` from `generate_embedding_data.py`.
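To sanity-check the output of `add_context_to_examples.py`, you can read back a single record and list the keys it contains. This generic sketch assumes the default TfSequenceExample output described below and makes no assumption about specific feature names; the function name and shard pattern are placeholders.

```python
# Generic inspection utility for one output shard.
import tensorflow.compat.v1 as tf

def peek_at_sequence_example(tfrecord_pattern):
  path = tf.io.gfile.glob(tfrecord_pattern)[0]
  raw_record = next(tf.python_io.tf_record_iterator(path))
  example = tf.train.SequenceExample.FromString(raw_record)
  print('context keys:', sorted(example.context.feature.keys()))
  print('feature_list keys:', sorted(example.feature_lists.feature_list.keys()))

# peek_at_sequence_example('/tmp/output_tfrecords-?????-of-?????')
```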
For all options, see `add_context_to_examples.py`. By default, this code builds
TfSequenceExamples, which are more data-efficient (this allows you to store the
context features once for each context group, as opposed to once per image). If
...
@@ -23,9 +23,9 @@ import functools
import os

import tensorflow.compat.v1 as tf
import tensorflow.compat.v2 as tf2
import tf_slim as slim

from object_detection import eval_util
from object_detection import exporter as exporter_lib
from object_detection import inputs
@@ -349,7 +349,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
      from tensorflow.python.keras.engine import base_layer_utils  # pylint: disable=g-import-not-at-top
      # Enable v2 behavior, as `mixed_bfloat16` is only supported in TF 2.0.
      base_layer_utils.enable_v2_dtype_behavior()
      tf2.keras.mixed_precision.experimental.set_policy(
          'mixed_bfloat16')
    detection_model = detection_model_fn(
        is_training=is_training, add_summaries=(not use_tpu))
...
# Context R-CNN configuration for the Snapshot Serengeti Dataset, using
# sequence example input data that includes context_features.
# This model uses attention over contextual features within the Faster R-CNN
# object detection framework to improve object detection performance.
# See https://arxiv.org/abs/1912.03538 for more information.
# Search for "PATH_TO_BE_CONFIGURED" to find the fields that should be
# configured.
# This config is TPU compatible.
model {
faster_rcnn {
num_classes: 48
image_resizer {
fixed_shape_resizer {
height: 640
width: 640
}
}
feature_extractor {
type: "faster_rcnn_resnet101"
first_stage_features_stride: 16
batch_norm_trainable: true
}
first_stage_anchor_generator {
grid_anchor_generator {
height_stride: 16
width_stride: 16
scales: 0.25
scales: 0.5
scales: 1.0
scales: 2.0
aspect_ratios: 0.5
aspect_ratios: 1.0
aspect_ratios: 2.0
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.00999999977648
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.699999988079
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
use_dropout: false
dropout_keep_probability: 1.0
share_box_across_classes: true
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.600000023842
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
use_matmul_crop_and_resize: true
clip_anchors_to_image: true
use_matmul_gather_in_matcher: true
use_static_balanced_label_sampler: true
use_static_shapes: true
context_config {
max_num_context_features: 2000
context_feature_length: 2057
}
}
}
train_config {
batch_size: 64
data_augmentation_options {
random_horizontal_flip {
}
}
sync_replicas: true
optimizer {
momentum_optimizer {
learning_rate {
manual_step_learning_rate {
initial_learning_rate: 0.0
schedule {
step: 2000
learning_rate: 0.00200000009499
}
schedule {
step: 200000
learning_rate: 0.000199999994948
}
schedule {
step: 300000
learning_rate: 1.99999994948e-05
}
warmup: true
}
}
momentum_optimizer_value: 0.899999976158
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/faster_rcnn_resnet101_coco_2018_08_14/model.ckpt"
from_detection_checkpoint: true
num_steps: 500000
replicas_to_aggregate: 8
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
use_bfloat16: true
}
train_input_reader {
label_map_path: "PATH_TO_BE_CONFIGURED/ss_label_map.pbtxt"
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/snapshot_serengeti_train-?????-of-?????"
}
load_context_features: true
input_type: TF_SEQUENCE_EXAMPLE
}
eval_config {
max_evals: 50
metrics_set: "coco_detection_metrics"
use_moving_averages: false
batch_size: 4
}
eval_input_reader {
label_map_path: "PATH_TO_BE_CONFIGURED/ss_label_map.pbtxt"
shuffle: false
num_epochs: 1
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/snapshot_serengeti_val-?????-of-?????"
}
load_context_features: true
input_type: TF_SEQUENCE_EXAMPLE
}
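As a quick check that the config above still parses after you replace the PATH_TO_BE_CONFIGURED placeholders, something like the following sketch can print the Context R-CNN specific settings. It assumes the compiled object_detection protos are on your PYTHONPATH; the function name is made up for this example.

```python
# Illustrative config check; not part of the release.
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

def summarize_context_config(config_path):
  pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
  with open(config_path, 'r') as f:
    text_format.Merge(f.read(), pipeline_config)
  context = pipeline_config.model.faster_rcnn.context_config
  print('max_num_context_features:', context.max_num_context_features)
  print('context_feature_length:', context.context_feature_length)
  print('load_context_features (train):',
        pipeline_config.train_input_reader.load_context_features)
```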