Commit 5a2cf36f authored by Kaushik Shivakumar

Merge remote-tracking branch 'upstream/master' into newavarecords

parents 258ddfc3 a829e648
# Quick Start: Jupyter notebook for off-the-shelf inference
[![TensorFlow 2.2](https://img.shields.io/badge/TensorFlow-2.2-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0)
[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0)
If you'd like to hit the ground running and run detection on a few example
images right out of the box, we recommend trying out the Jupyter notebook demo.
To run the Jupyter notebook, run the following command from
......
# Running on mobile with TensorFlow Lite
[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0)
In this section, we will show you how to use [TensorFlow
Lite](https://www.tensorflow.org/mobile/tflite/) to get a smaller model and
allow you to take advantage of ops that have been optimized for mobile devices.
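Before diving into conversion and on-device deployment, here is a rough, hedged sketch of what running an already-converted detection model looks like from the TFLite Python interpreter (handy as a desktop smoke test). The `detect.tflite` filename, the input size, and the output ordering are assumptions that depend on how the model was converted:

```python
import numpy as np
import tensorflow as tf  # tf.lite.Interpreter is available in TF 1.15 and TF 2.x

# Hypothetical path to a converted detection model.
interpreter = tf.lite.Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# SSD-style TFLite detection models commonly expect a [1, 300, 300, 3] image.
_, height, width, _ = input_details[0]["shape"]
image = np.zeros((1, height, width, 3), dtype=input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], image)
interpreter.invoke()

# Models exported with the TFLite detection postprocessing op typically return
# boxes, classes, scores, and a detection count (order may vary by model).
for detail in output_details:
    print(detail["name"], interpreter.get_tensor(detail["index"]).shape)
```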
......
# Quick Start: Distributed Training on the Oxford-IIIT Pets Dataset on Google Cloud
[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0)
This page is a walkthrough for training an object detector using the TensorFlow
Object Detection API. In this tutorial, we'll be training on the Oxford-IIIT Pets
dataset to build a system to detect various breeds of cats and dogs. The output
of the detector will look like the following:
...@@ -40,10 +42,10 @@ export YOUR_GCS_BUCKET=${YOUR_GCS_BUCKET}
It is also possible to run locally by following
[the running locally instructions](running_locally.md).
## Installing TensorFlow and the TensorFlow Object Detection API
Please run through the [installation instructions](installation.md) to install
TensorFlow and all its dependencies. Ensure the Protobuf libraries are
compiled and the library directories are added to `PYTHONPATH`.
## Getting the Oxford-IIIT Pets Dataset and Uploading it to Google Cloud Storage
...@@ -77,7 +79,7 @@ should appear as follows:
... other files and directories
```
The TensorFlow Object Detection API expects data to be in the TFRecord format,
so we'll now run the `create_pet_tf_record` script to convert from the raw
Oxford-IIIT Pet dataset into TFRecords. Run the following commands from the
`tensorflow/models/research/` directory:
...@@ -134,7 +136,7 @@ in the following step.
## Configuring the Object Detection Pipeline
In the TensorFlow Object Detection API, the model parameters, training
parameters and eval parameters are all defined by a config file. More details
can be found [here](configuring_jobs.md). For this tutorial, we will use some
predefined templates provided with the source code. In the
...@@ -188,10 +190,10 @@ browser](https://console.cloud.google.com/storage/browser).
Before we can start a job on Google Cloud ML Engine, we must:
1. Package the TensorFlow Object Detection code.
2. Write a cluster configuration for our Google Cloud ML job.
To package the TensorFlow Object Detection code, run the following commands from
the `tensorflow/models/research/` directory:
```bash
...@@ -248,7 +250,7 @@ web browser. You should see something similar to the following:
![](img/tensorboard.png)
Make sure your Tensorboard version matches the minor version of your TensorFlow (1.x) installation.
You will also want to click on the images tab to see example detections made by
the model while it trains. After about an hour and a half of training, you can
...@@ -265,9 +267,9 @@ the training jobs are configured to go for much longer than is necessary for
convergence. To save money, we recommend killing your jobs once you've seen
that they've converged.
## Exporting the TensorFlow Graph
After your model has been trained, you should export it to a TensorFlow graph
proto. First, you need to identify a candidate checkpoint to export. You can
search your bucket using the [Google Cloud Storage
Browser](https://console.cloud.google.com/storage/browser). The file should be
......
# Object Detection API with TensorFlow 1
## Requirements
[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/)
[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0)
[![Protobuf Compiler >= 3.0](https://img.shields.io/badge/ProtoBuf%20Compiler-%3E3.0-brightgreen)](https://grpc.io/docs/protoc-installation/#install-using-a-package-manager)
## Installation
You can install the TensorFlow Object Detection API either with Python Package
Installer (pip) or Docker. For local runs we recommend using Docker and for
Google Cloud runs we recommend using pip.
Clone the TensorFlow Models repository and proceed to one of the installation
options.
```bash
git clone https://github.com/tensorflow/models.git
```
### Docker Installation
```bash
# From the root of the git repository
docker build -f research/object_detection/dockerfiles/tf1/Dockerfile -t od .
docker run -it od
```
### Python Package Installation
```bash
cd models/research
# Compile protos.
protoc object_detection/protos/*.proto --python_out=.
# Install TensorFlow Object Detection API.
cp object_detection/packages/tf1/setup.py .
python -m pip install .
```
```bash
# Test the installation.
python object_detection/builders/model_builder_tf1_test.py
```
## Quick Start
### Colabs
* [Jupyter notebook for off-the-shelf inference](../colab_tutorials/object_detection_tutorial.ipynb)
* [Training a pet detector](running_pets.md)
### Training and Evaluation
To train and evaluate your models either locally or on Google Cloud see
[instructions](tf1_training_and_evaluation.md).
## Model Zoo
We provide a large collection of models that are trained on several datasets in
the [Model Zoo](tf1_detection_zoo.md).
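Models downloaded from the zoo ship with a frozen inference graph that can be run directly. The sketch below is a minimal, hedged example of loading one in Python; the archive name is an assumption, and the tensor names follow the convention used by exported detection graphs:

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.15

# Hypothetical path inside an extracted model archive from the zoo.
PATH_TO_FROZEN_GRAPH = "ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb"

graph_def = tf.GraphDef()
with tf.io.gfile.GFile(PATH_TO_FROZEN_GRAPH, "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=graph) as sess:
    # Replace with a real [1, height, width, 3] uint8 image batch.
    image = np.zeros((1, 300, 300, 3), dtype=np.uint8)
    boxes, scores, classes, num = sess.run(
        ["detection_boxes:0", "detection_scores:0",
         "detection_classes:0", "num_detections:0"],
        feed_dict={"image_tensor:0": image})
    print(boxes.shape, scores[0, :5])
```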
## Guides
* <a href='configuring_jobs.md'>
Configuring an object detection pipeline</a><br>
* <a href='preparing_inputs.md'>Preparing inputs</a><br>
* <a href='defining_your_own_model.md'>
Defining your own model architecture</a><br>
* <a href='using_your_own_dataset.md'>
Bringing in your own dataset</a><br>
* <a href='evaluation_protocols.md'>
Supported object detection evaluation protocols</a><br>
* <a href='tpu_compatibility.md'>
TPU compatible detection pipelines</a><br>
* <a href='tf1_training_and_evaluation.md'>
Training and evaluation guide (CPU, GPU, or TPU)</a><br>
## Extras:
* <a href='exporting_models.md'>
Exporting a trained model for inference</a><br>
* <a href='tpu_exporters.md'>
Exporting a trained model for TPU inference</a><br>
* <a href='oid_inference_and_evaluation.md'>
Inference and evaluation on the Open Images dataset</a><br>
* <a href='instance_segmentation.md'>
Run an instance segmentation model</a><br>
* <a href='challenge_evaluation.md'>
Run the evaluation for the Open Images Challenge 2018/2019</a><br>
* <a href='running_on_mobile_tensorflowlite.md'>
Running object detection on mobile devices with TensorFlow Lite</a><br>
* <a href='context_rcnn.md'>
Context R-CNN documentation for data preparation, training, and export</a><br>
# TensorFlow 1 Detection Model Zoo
[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0)
[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/)
We provide a collection of detection models pre-trained on the
[COCO dataset](http://cocodataset.org), the
...@@ -64,9 +67,9 @@ Some remarks on frozen inference graphs:
metrics.
* Our frozen inference graphs are generated using the
[v1.12.0](https://github.com/tensorflow/tensorflow/tree/v1.12.0) release
version of TensorFlow and we do not guarantee that these will work with
other versions; this being said, each frozen inference graph can be
regenerated using your current version of TensorFlow by re-running the
[exporter](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md),
pointing it at the model directory as well as the corresponding config file
in
......
# Training and Evaluation with TensorFlow 1
[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/)
[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0)
This page walks through the steps required to train an object detection model.
It assumes the reader has completed the following prerequisites:
1. The TensorFlow Object Detection API has been installed as documented in the
[installation instructions](tf1.md#installation).
2. A valid data set has been created. See [this page](preparing_inputs.md) for
instructions on how to generate a dataset for the PASCAL VOC challenge or
the Oxford-IIIT Pet dataset.
## Recommended Directory Structure for Training and Evaluation
```bash
.
├── data/
│   ├── eval-00000-of-00001.tfrecord
│   ├── label_map.txt
│   ├── train-00000-of-00002.tfrecord
│   └── train-00001-of-00002.tfrecord
└── models/
    └── my_model_dir/
        ├── eval/                      # Created by evaluation job.
        ├── my_model.config
        └── train/                     #
            ├── model_ckpt-100-data@1  # Created by training job.
            ├── model_ckpt-100-index   #
            └── checkpoint             #
```
## Writing a model configuration
Please refer to sample [TF1 configs](../samples/configs) and
[configuring jobs](configuring_jobs.md) to create a model config.
### Model Parameter Initialization
While optional, it is highly recommended that users utilize classification or
object detection checkpoints. Training an object detector from scratch can take
days. To speed up the training process, it is recommended that users re-use the
feature extractor parameters from a pre-existing image classification or object
detection checkpoint. The `train_config` section in the config provides two
fields to specify pre-existing checkpoints:
* `fine_tune_checkpoint`: a path prefix to the pre-existing checkpoint
(ie:"/usr/home/username/checkpoint/model.ckpt-#####").
* `fine_tune_checkpoint_type`: with value `classification` or `detection`
depending on the type.
A list of detection checkpoints can be found [here](tf1_detection_zoo.md).
## Local
### Training
A local training job can be run with the following command:
```bash
# From the tensorflow/models/research/ directory
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
NUM_TRAIN_STEPS=50000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${MODEL_DIR} \
--num_train_steps=${NUM_TRAIN_STEPS} \
--sample_1_of_n_eval_examples=${SAMPLE_1_OF_N_EVAL_EXAMPLES} \
--alsologtostderr
```
where `${PIPELINE_CONFIG_PATH}` points to the pipeline config and `${MODEL_DIR}`
points to the directory in which training checkpoints and events will be
written. Note that this binary will interleave both training and evaluation.
## Google Cloud AI Platform
The TensorFlow Object Detection API supports training on Google Cloud AI
Platform. This section documents instructions on how to train and evaluate your
model using Cloud AI Platform. The reader should complete the following
prerequisites:
1. The reader has created and configured a project on Google Cloud AI Platform.
See the
[Using GPUs](https://cloud.google.com/ai-platform/training/docs/using-gpus)
and
[Using TPUs](https://cloud.google.com/ai-platform/training/docs/using-tpus)
guides.
2. The reader has a valid data set and stored it in a Google Cloud Storage
bucket. See [this page](preparing_inputs.md) for instructions on how to
generate a dataset for the PASCAL VOC challenge or the Oxford-IIIT Pet
dataset.
Additionally, it is recommended users test their job by running training and
evaluation jobs for a few iterations [locally on their own machines](#local).
### Training with multiple workers with single GPU
Google Cloud ML requires a YAML configuration file for a multiworker training
job using GPUs. A sample YAML file is given below:
```
trainingInput:
  runtimeVersion: "1.15"
  scaleTier: CUSTOM
  masterType: standard_gpu
  workerCount: 9
...@@ -52,30 +113,32 @@ trainingInput:
  parameterServerCount: 3
  parameterServerType: standard
```
Please keep the following guidelines in mind when writing the YAML
configuration:
* A job with n workers will have n + 1 training machines (n workers + 1
master).
* The number of parameter servers used should be an odd number to prevent a
parameter server from storing only weight variables or only bias variables
(due to round robin parameter scheduling).
* The learning rate in the training config should be decreased when using a
larger number of workers. Some experimentation is required to find the
optimal learning rate.
The YAML file should be saved on the local machine (not on GCP). Once it has
been written, a user can start a training job on Cloud ML Engine using the
following command:
```bash
# From the tensorflow/models/research/ directory
cp object_detection/packages/tf1/setup.py .
gcloud ml-engine jobs submit training object_detection_`date +%m_%d_%Y_%H_%M_%S` \
--runtime-version 1.15 \
--python-version 3.6 \
--job-dir=gs://${MODEL_DIR} \
--package-path ./object_detection \
--module-name object_detection.model_main \
--region us-central1 \
--config ${PATH_TO_LOCAL_YAML_FILE} \
...@@ -90,41 +153,42 @@ training checkpoints and events will be written to and
`gs://${PIPELINE_CONFIG_PATH}` points to the pipeline configuration stored on
Google Cloud Storage.
Users can monitor the progress of their training job on the
[ML Engine Dashboard](https://console.cloud.google.com/ai-platform/jobs).
## Training with TPU
Launching a training job with a TPU compatible pipeline config requires using a
similar command:
```bash
# From the tensorflow/models/research/ directory
cp object_detection/packages/tf1/setup.py .
gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%m_%d_%Y_%H_%M_%S` \
--job-dir=gs://${MODEL_DIR} \
--package-path ./object_detection \
--module-name object_detection.model_tpu_main \
--runtime-version 1.15 \
--python-version 3.6 \
--scale-tier BASIC_TPU \
--region us-central1 \
-- \
--tpu_zone us-central1 \
--model_dir=gs://${MODEL_DIR} \
--pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
```
In contrast with the GPU training command, there is no need to specify a YAML
file, and we point to the *object_detection.model_tpu_main* binary instead of
*object_detection.model_main*. We must also now set `scale-tier` to be
`BASIC_TPU` and provide a `tpu_zone`. Finally, as before, `pipeline_config_path`
points to the pipeline configuration stored on Google Cloud Storage
(but it must now be a TPU compatible model).
## Evaluation with GPU
Note: You only need to do this when using TPU for training, as TPU training
does not interleave evaluation the way multiworker GPU training does.
Evaluation jobs run on a single machine, so it is not necessary to write a YAML
...@@ -132,10 +196,13 @@ configuration for evaluation. Run the following command to start the evaluation
job:
```bash
# From the tensorflow/models/research/ directory
cp object_detection/packages/tf1/setup.py .
gcloud ml-engine jobs submit training object_detection_eval_`date +%m_%d_%Y_%H_%M_%S` \
--runtime-version 1.15 \
--python-version 3.6 \
--job-dir=gs://${MODEL_DIR} \
--package-path ./object_detection \
--module-name object_detection.model_main \
--region us-central1 \
--scale-tier BASIC_GPU \
...@@ -146,25 +213,25 @@ gcloud ml-engine jobs submit training object_detection_eval_`date +%m_%d_%Y_%H_%
```
Where `gs://${MODEL_DIR}` points to the directory on Google Cloud Storage where
training checkpoints are saved (same as the training job), as well as to where
evaluation events will be saved on Google Cloud Storage and
`gs://${PIPELINE_CONFIG_PATH}` points to where the pipeline configuration is
stored on Google Cloud Storage.
Typically one starts an evaluation job concurrently with the training job. Note
that we do not support running evaluation on TPU, so the above command line for
launching evaluation jobs is the same whether you are training on GPU or TPU.
## Running Tensorboard
Progress for training and eval jobs can be inspected using Tensorboard. If using
the recommended directory structure, Tensorboard can be run using the following
command:
```bash
tensorboard --logdir=${MODEL_DIR}
```
where `${MODEL_DIR}` points to the directory that contains the train and eval
directories. Please note it may take Tensorboard a couple minutes to populate
with data.
# Object Detection API with TensorFlow 2
## Requirements
[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/)
[![TensorFlow 2.2](https://img.shields.io/badge/TensorFlow-2.2-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0)
[![Protobuf Compiler >= 3.0](https://img.shields.io/badge/ProtoBuf%20Compiler-%3E3.0-brightgreen)](https://grpc.io/docs/protoc-installation/#install-using-a-package-manager)
## Installation
You can install the TensorFlow Object Detection API either with Python Package
Installer (pip) or Docker. For local runs we recommend using Docker and for
Google Cloud runs we recommend using pip.
Clone the TensorFlow Models repository and proceed to one of the installation
options.
```bash
git clone https://github.com/tensorflow/models.git
```
### Docker Installation
```bash
# From the root of the git repository
docker build -f research/object_detection/dockerfiles/tf2/Dockerfile -t od .
docker run -it od
```
### Python Package Installation
```bash
cd models/research
# Compile protos.
protoc object_detection/protos/*.proto --python_out=.
# Install TensorFlow Object Detection API.
cp object_detection/packages/tf2/setup.py .
python -m pip install .
```
```bash
# Test the installation.
python object_detection/builders/model_builder_tf2_test.py
```
## Quick Start
### Colabs
<!-- mdlint off(URL_BAD_G3DOC_PATH) -->
* Training -
[Fine-tune a pre-trained detector in eager mode on custom data](../colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb)
* Inference -
[Run inference with models from the zoo](../colab_tutorials/inference_tf2_colab.ipynb)
<!-- mdlint on -->
## Training and Evaluation
To train and evaluate your models either locally or on Google Cloud see
[instructions](tf2_training_and_evaluation.md).
## Model Zoo
We provide a large collection of models that are trained on COCO 2017 in the
[Model Zoo](tf2_detection_zoo.md).
## Guides
* <a href='configuring_jobs.md'>
Configuring an object detection pipeline</a><br>
* <a href='preparing_inputs.md'>Preparing inputs</a><br>
* <a href='defining_your_own_model.md'>
Defining your own model architecture</a><br>
* <a href='using_your_own_dataset.md'>
Bringing in your own dataset</a><br>
* <a href='evaluation_protocols.md'>
Supported object detection evaluation protocols</a><br>
* <a href='tpu_compatibility.md'>
TPU compatible detection pipelines</a><br>
* <a href='tf2_training_and_evaluation.md'>
Training and evaluation guide (CPU, GPU, or TPU)</a><br>
# TensorFlow 2 Classification Model Zoo
[![TensorFlow 2.2](https://img.shields.io/badge/TensorFlow-2.2-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0)
[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/)
We provide a collection of classification models pre-trained on the
[Imagenet](http://www.image-net.org). These can be used to initialize detection
model parameters.
Model name |
---------- |
[EfficientNet B0](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b0.tar.gz) |
[EfficientNet B1](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b1.tar.gz) |
[EfficientNet B2](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b2.tar.gz) |
[EfficientNet B3](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b3.tar.gz) |
[EfficientNet B4](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b4.tar.gz) |
[EfficientNet B5](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b5.tar.gz) |
[EfficientNet B6](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b6.tar.gz) |
[EfficientNet B7](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b7.tar.gz) |
[Resnet V1 50](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/resnet50_v1.tar.gz) |
[Resnet V1 101](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/resnet101_v1.tar.gz) |
[Resnet V1 152](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/resnet152_v1.tar.gz) |
[Inception Resnet V2](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/inception_resnet_v2.tar.gz) |
[MobileNet V1](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/mobilnet_v1.tar.gz) |
[MobileNet V2](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/mobilnet_v2.tar.gz) |
# TensorFlow 2 Detection Model Zoo
[![TensorFlow 2.2](https://img.shields.io/badge/TensorFlow-2.2-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0)
[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/)
<!-- mdlint off(URL_BAD_G3DOC_PATH) -->
We provide a collection of detection models pre-trained on the
[COCO 2017 dataset](http://cocodataset.org). These models can be useful for
out-of-the-box inference if you are interested in categories already in those
datasets. You can try it in our inference
[colab](../colab_tutorials/inference_tf2_colab.ipynb).
They are also useful for initializing your models when training on novel
datasets. You can try this out on our few-shot training
[colab](../colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb).
<!-- mdlint on -->
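Each archive in the table below typically unpacks to a directory containing a `pipeline.config`, a `checkpoint/` directory, and a `saved_model/` directory. As a minimal, hedged sketch of out-of-the-box inference outside a notebook (the model directory name below is just one example from the table), the saved model can be loaded and called directly:

```python
import numpy as np
import tensorflow as tf

# Example path from an extracted archive; substitute the model you downloaded.
MODEL_DIR = "ssd_mobilenet_v2_320x320_coco17_tpu-8/saved_model"

detect_fn = tf.saved_model.load(MODEL_DIR)

# Replace with a real [1, height, width, 3] uint8 image batch.
image = np.zeros((1, 320, 320, 3), dtype=np.uint8)
detections = detect_fn(tf.constant(image))

# Exported TF2 detection models typically return these keys, among others.
print(detections["detection_boxes"].shape)    # [1, max_detections, 4]
print(detections["detection_scores"][0, :5])  # highest scores first
print(int(detections["num_detections"][0]))
```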
Finally, if you would like to train these models from scratch, you can find the
model configs in this [directory](../configs/tf2) (also in the linked
`tar.gz`s).
Model name | Speed (ms) | COCO mAP | Outputs
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--------: | :----------: | :-----:
[CenterNet HourGlass104 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200711/centernet_hg104_512x512_coco17_tpu-8.tar.gz) | 70 | 41.6 | Boxes
[CenterNet HourGlass104 Keypoints 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200711/centernet_hg104_512x512_kpts_coco17_tpu-32.tar.gz) | 76 | 40.0/61.4 | Boxes/Keypoints
[CenterNet HourGlass104 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200711/centernet_hg104_1024x1024_coco17_tpu-32.tar.gz) | 197 | 43.5 | Boxes
[CenterNet HourGlass104 Keypoints 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200711/centernet_hg104_1024x1024_kpts_coco17_tpu-32.tar.gz) | 211 | 42.8/64.5 | Boxes/Keypoints
[CenterNet Resnet50 V1 FPN 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200711/centernet_resnet50_v1_fpn_512x512_coco17_tpu-8.tar.gz) | 27 | 31.2 | Boxes
[CenterNet Resnet50 V1 FPN Keypoints 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200711/centernet_resnet50_v1_fpn_512x512_kpts_coco17_tpu-8.tar.gz) | 30 | 29.3/50.7 | Boxes/Keypoints
[CenterNet Resnet101 V1 FPN 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200711/centernet_resnet101_v1_fpn_512x512_coco17_tpu-8.tar.gz) | 34 | 34.2 | Boxes
[CenterNet Resnet50 V2 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200711/centernet_resnet50_v2_512x512_coco17_tpu-8.tar.gz) | 27 | 29.5 | Boxes
[CenterNet Resnet50 V2 Keypoints 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200711/centernet_resnet50_v2_512x512_kpts_coco17_tpu-8.tar.gz) | 30 | 27.6/48.2 | Boxes/Keypoints
[EfficientDet D0 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d0_coco17_tpu-32.tar.gz) | 39 | 33.6 | Boxes
[EfficientDet D1 640x640](http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz) | 54 | 38.4 | Boxes
[EfficientDet D2 768x768](http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d2_coco17_tpu-32.tar.gz) | 67 | 41.8 | Boxes
[EfficientDet D3 896x896](http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d3_coco17_tpu-32.tar.gz) | 95 | 45.4 | Boxes
[EfficientDet D4 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d4_coco17_tpu-32.tar.gz) | 133 | 48.5 | Boxes
[EfficientDet D5 1280x1280](http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d5_coco17_tpu-32.tar.gz) | 222 | 49.7 | Boxes
[EfficientDet D6 1280x1280](http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d6_coco17_tpu-32.tar.gz) | 268 | 50.5 | Boxes
[EfficientDet D7 1536x1536](http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d7_coco17_tpu-32.tar.gz) | 325 | 51.2 | Boxes
[SSD MobileNet v2 320x320](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz) |19 | 20.2 | Boxes
[SSD MobileNet V1 FPN 640x640](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8.tar.gz) | 48 | 29.1 | Boxes
[SSD MobileNet V2 FPNLite 320x320](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz) | 22 | 22.2 | Boxes
[SSD MobileNet V2 FPNLite 640x640](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz) | 39 | 28.2 | Boxes
[SSD ResNet50 V1 FPN 640x640 (RetinaNet50)](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz) | 46 | 34.3 | Boxes
[SSD ResNet50 V1 FPN 1024x1024 (RetinaNet50)](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_1024x1024_coco17_tpu-8.tar.gz) | 87 | 38.3 | Boxes
[SSD ResNet101 V1 FPN 640x640 (RetinaNet101)](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.tar.gz) | 57 | 35.6 | Boxes
[SSD ResNet101 V1 FPN 1024x1024 (RetinaNet101)](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet101_v1_fpn_1024x1024_coco17_tpu-8.tar.gz) | 104 | 39.5 | Boxes
[SSD ResNet152 V1 FPN 640x640 (RetinaNet152)](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet152_v1_fpn_640x640_coco17_tpu-8.tar.gz) | 80 | 35.4 | Boxes
[SSD ResNet152 V1 FPN 1024x1024 (RetinaNet152)](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8.tar.gz) | 111 | 39.6 | Boxes
[Faster R-CNN ResNet50 V1 640x640](http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.tar.gz) | 53 | 29.3 | Boxes
[Faster R-CNN ResNet50 V1 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet50_v1_1024x1024_coco17_tpu-8.tar.gz) | 65 | 31.0 | Boxes
[Faster R-CNN ResNet50 V1 800x1333](http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet50_v1_800x1333_coco17_gpu-8.tar.gz) | 65 | 31.6 | Boxes
[Faster R-CNN ResNet101 V1 640x640](http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.tar.gz) | 55 | 31.8 | Boxes
[Faster R-CNN ResNet101 V1 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8.tar.gz) | 72 | 37.1 | Boxes
[Faster R-CNN ResNet101 V1 800x1333](http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet101_v1_800x1333_coco17_gpu-8.tar.gz) | 77 | 36.6 | Boxes
[Faster R-CNN ResNet152 V1 640x640](http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet152_v1_640x640_coco17_tpu-8.tar.gz) | 64 | 32.4 | Boxes
[Faster R-CNN ResNet152 V1 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet152_v1_1024x1024_coco17_tpu-8.tar.gz) | 85 | 37.6 | Boxes
[Faster R-CNN ResNet152 V1 800x1333](http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet152_v1_800x1333_coco17_gpu-8.tar.gz) | 101 | 37.4 | Boxes
[Faster R-CNN Inception ResNet V2 640x640](http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_inception_resnet_v2_640x640_coco17_tpu-8.tar.gz) | 206 | 37.7 | Boxes
[Faster R-CNN Inception ResNet V2 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_inception_resnet_v2_1024x1024_coco17_tpu-8.tar.gz) | 236 | 38.7 | Boxes
[Mask R-CNN Inception ResNet V2 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200711/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8.tar.gz) | 301 | 39.0/34.6 | Boxes/Masks
[ExtremeNet](http://download.tensorflow.org/models/object_detection/tf2/20200711/extremenet.tar.gz) | -- | -- | Boxes
# Training and Evaluation with TensorFlow 2
[![TensorFlow 2.2](https://img.shields.io/badge/TensorFlow-2.2-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0)
[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/)
This page walks through the steps required to train an object detection model.
It assumes the reader has completed the following prerequisites:
1. The TensorFlow Object Detection API has been installed as documented in the
[installation instructions](tf2.md#installation).
2. A valid data set has been created. See [this page](preparing_inputs.md) for
instructions on how to generate a dataset for the PASCAL VOC challenge or
the Oxford-IIIT Pet dataset.
## Recommended Directory Structure for Training and Evaluation
```bash
.
├── data/
│   ├── eval-00000-of-00001.tfrecord
│   ├── label_map.txt
│   ├── train-00000-of-00002.tfrecord
│   └── train-00001-of-00002.tfrecord
└── models/
    └── my_model_dir/
        ├── eval/                  # Created by evaluation job.
        ├── my_model.config
        ├── model_ckpt-100-data@1  #
        ├── model_ckpt-100-index   # Created by training job.
        └── checkpoint             #
```
## Writing a model configuration
Please refer to sample [TF2 configs](../configs/tf2) and
[configuring jobs](configuring_jobs.md) to create a model config.
### Model Parameter Initialization
While optional, it is highly recommended that users utilize classification or
object detection checkpoints. Training an object detector from scratch can take
days. To speed up the training process, it is recommended that users re-use the
feature extractor parameters from a pre-existing image classification or object
detection checkpoint. The `train_config` section in the config provides two
fields to specify pre-existing checkpoints:
* `fine_tune_checkpoint`: a path prefix to the pre-existing checkpoint
(ie:"/usr/home/username/checkpoint/model.ckpt-#####").
* `fine_tune_checkpoint_type`: with value `classification` or `detection`
depending on the type.
A list of classification checkpoints can be found
[here](tf2_classification_zoo.md).
A list of detection checkpoints can be found [here](tf2_detection_zoo.md).
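As an optional convenience, these fields can also be set programmatically instead of by hand-editing the config. The sketch below uses the `object_detection.utils.config_util` helpers; the checkpoint and directory paths are placeholders:

```python
from object_detection.utils import config_util

# Placeholder paths for illustration.
PIPELINE_CONFIG = "path/to/pipeline.config"
FINE_TUNE_CKPT = "path/to/extracted_zoo_model/checkpoint/ckpt-0"

configs = config_util.get_configs_from_pipeline_file(PIPELINE_CONFIG)
configs["train_config"].fine_tune_checkpoint = FINE_TUNE_CKPT
configs["train_config"].fine_tune_checkpoint_type = "detection"

# Write the updated pipeline.config into the model directory.
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, "path/to/model_dir")
```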
## Local
### Training
A local training job can be run with the following command:
```bash
# From the tensorflow/models/research/ directory
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
python object_detection/model_main_tf2.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${MODEL_DIR} \
--alsologtostderr
```
where `${PIPELINE_CONFIG_PATH}` points to the pipeline config and `${MODEL_DIR}`
points to the directory in which training checkpoints and events will be
written.
### Evaluation
A local evaluation job can be run with the following command:
```bash
# From the tensorflow/models/research/ directory
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
CHECKPOINT_DIR=${MODEL_DIR}
python object_detection/model_main_tf2.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${MODEL_DIR} \
--checkpoint_dir=${CHECKPOINT_DIR} \
--alsologtostderr
```
where `${CHECKPOINT_DIR}` points to the directory with checkpoints produced by
the training job. Evaluation events are written to `${MODEL_DIR}/eval`.
## Google Cloud VM
The TensorFlow Object Detection API supports training on Google Cloud with Deep
Learning GPU VMs and TPU VMs. This section documents instructions on how to
train and evaluate your model on them. The reader should complete the following
prerequisites:
1. The reader has created and configured a GPU VM or TPU VM on Google Cloud with
TensorFlow >= 2.2.0. See
[TPU quickstart](https://cloud.google.com/tpu/docs/quickstart) and
[GPU quickstart](https://cloud.google.com/ai-platform/deep-learning-vm/docs/tensorflow_start_instance#with-one-or-more-gpus).
2. The reader has installed the TensorFlow Object Detection API as documented
in the [installation instructions](tf2.md#installation) on the VM.
3. The reader has a valid data set and stored it in a Google Cloud Storage
bucket or locally on the VM. See [this page](preparing_inputs.md) for
instructions on how to generate a dataset for the PASCAL VOC challenge or
the Oxford-IIIT Pet dataset.
Additionally, it is recommended users test their job by running training and
evaluation jobs for a few iterations [locally on their own machines](#local).
### Training
Training on GPU or TPU VMs is similar to local training. It can be launched
using the following command.
```bash
# From the tensorflow/models/research/ directory
USE_TPU=true
TPU_NAME="MY_TPU_NAME"
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
# Note: --use_tpu and --tpu_name are only required for TPU training.
python object_detection/model_main_tf2.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${MODEL_DIR} \
--use_tpu=${USE_TPU} \
--tpu_name=${TPU_NAME} \
--alsologtostderr
```
where `${PIPELINE_CONFIG_PATH}` points to the pipeline config and `${MODEL_DIR}`
points to the root directory for the files produced. Training checkpoints and
events are written to `${MODEL_DIR}`. Note that the paths can be either local
or paths to a GCS bucket.
### Evaluation
Evaluation is only supported on GPU. Similar to local evaluation it can be
launched using the following command:
```bash
# From the tensorflow/models/research/ directory
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
CHECKPOINT_DIR=${MODEL_DIR}
python object_detection/model_main_tf2.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${MODEL_DIR} \
--checkpoint_dir=${CHECKPOINT_DIR} \
--alsologtostderr
```
where `${CHECKPOINT_DIR}` points to the directory with checkpoints produced by
the training job. Evaluation events are written to `${MODEL_DIR}/eval`. Note
that the paths can be either local or paths to a GCS bucket.
## Google Cloud AI Platform
The TensorFlow Object Detection API also supports training on Google
Cloud AI Platform. This section documents instructions on how to train and
evaluate your model using Cloud ML. The reader should complete the following
prerequisites:
1. The reader has created and configured a project on Google Cloud AI Platform.
See the
[Using GPUs](https://cloud.google.com/ai-platform/training/docs/using-gpus)
and
[Using TPUs](https://cloud.google.com/ai-platform/training/docs/using-tpus)
guides.
2. The reader has a valid data set and stored it in a Google Cloud Storage
bucket. See [this page](preparing_inputs.md) for instructions on how to
generate a dataset for the PASCAL VOC challenge or the Oxford-IIIT Pet
dataset.
Additionally, it is recommended users test their job by running training and
evaluation jobs for a few iterations [locally on their own machines](#local).
### Training with multiple GPUs
A user can start a training job on Cloud AI Platform using the following
command:
```bash
# From the tensorflow/models/research/ directory
cp object_detection/packages/tf2/setup.py .
gcloud ai-platform jobs submit training object_detection_`date +%m_%d_%Y_%H_%M_%S` \
--runtime-version 2.1 \
--python-version 3.6 \
--job-dir=gs://${MODEL_DIR} \
--package-path ./object_detection \
--module-name object_detection.model_main_tf2 \
--region us-central1 \
--master-machine-type n1-highcpu-16 \
--master-accelerator count=8,type=nvidia-tesla-v100 \
-- \
--model_dir=gs://${MODEL_DIR} \
--pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
```
Where `gs://${MODEL_DIR}` specifies the directory on Google Cloud Storage where
the training checkpoints and events will be written to and
`gs://${PIPELINE_CONFIG_PATH}` points to the pipeline configuration stored on
Google Cloud Storage.
Users can monitor the progress of their training job on the
[ML Engine Dashboard](https://console.cloud.google.com/ai-platform/jobs).
### Training with TPU
Launching a training job with a TPU compatible pipeline config requires using a
similar command:
```bash
# From the tensorflow/models/research/ directory
cp object_detection/packages/tf2/setup.py .
gcloud ai-platform jobs submit training `whoami`_object_detection_`date +%m_%d_%Y_%H_%M_%S` \
--job-dir=gs://${MODEL_DIR} \
--package-path ./object_detection \
--module-name object_detection.model_main_tf2 \
--runtime-version 2.1 \
--python-version 3.6 \
--scale-tier BASIC_TPU \
--region us-central1 \
-- \
--use_tpu true \
--model_dir=gs://${MODEL_DIR} \
--pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
```
As before, `pipeline_config_path` points to the pipeline configuration stored on
Google Cloud Storage (but it must now be a TPU compatible model).
### Evaluating with GPU
Evaluation jobs run on a single machine. Run the following command to start the
evaluation job:
```bash
# From the tensorflow/models/research/ directory
cp object_detection/packages/tf2/setup.py .
gcloud ai-platform jobs submit training object_detection_eval_`date +%m_%d_%Y_%H_%M_%S` \
--runtime-version 2.1 \
--python-version 3.6 \
--job-dir=gs://${MODEL_DIR} \
--package-path ./object_detection \
--module-name object_detection.model_main_tf2 \
--region us-central1 \
--scale-tier BASIC_GPU \
-- \
--model_dir=gs://${MODEL_DIR} \
--pipeline_config_path=gs://${PIPELINE_CONFIG_PATH} \
--checkpoint_dir=gs://${MODEL_DIR}
```
where `gs://${MODEL_DIR}` points to the directory on Google Cloud Storage where
training checkpoints are saved and `gs://${PIPELINE_CONFIG_PATH}` points to where
the model configuration file is stored on Google Cloud Storage. Evaluation events
are written to `gs://${MODEL_DIR}/eval`.
Typically one starts an evaluation job concurrently with the training job. Note
that we do not support running evaluation on TPU.
## Running Tensorboard
Progress for training and eval jobs can be inspected using Tensorboard. If using
the recommended directory structure, Tensorboard can be run using the following
command:
```bash
tensorboard --logdir=${MODEL_DIR}
```
where `${MODEL_DIR}` points to the directory that contains the train and eval
directories. Please note it may take Tensorboard a couple minutes to populate
with data.
...@@ -2,7 +2,7 @@
[TOC]
The TensorFlow Object Detection API supports TPU training for some models. To
make models TPU compatible you need to make a few tweaks to the model config as
mentioned below. We also provide several sample configs that you can use as a
template.
...@@ -11,7 +11,7 @@ template.
### Static shaped tensors
TPU training currently requires all tensors in the TensorFlow Graph to have
static shapes. However, most of the sample configs in Object Detection API have
a few different tensors that are dynamically shaped. Fortunately, we provide
simple alternatives in the model configuration that modify these tensors to
...@@ -62,7 +62,7 @@ have static shape:
### TPU friendly ops
Although TPU supports a vast number of TensorFlow ops, a few used in the
TensorFlow Object Detection API are unsupported. We list such ops below and
recommend compatible substitutes.
* **Anchor sampling** - Typically we use hard example mining in standard SSD
......
# Object Detection TPU Inference Exporter
[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0)
This package contains a SavedModel exporter for TPU inference of object detection
models.
......
...@@ -2,7 +2,7 @@
[TOC]
To use your own dataset in the TensorFlow Object Detection API, you must convert it
into the [TFRecord file format](https://www.tensorflow.org/api_guides/python/python_io#tfrecords_format_details).
This document outlines how to write a script to generate the TFRecord file.
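As a rough, hedged sketch of the kind of record such a script emits (the feature keys below follow the convention used by the API's dataset tools; the image file, class name, and box coordinates are placeholders):

```python
import tensorflow as tf

def _bytes_feature(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))

def _float_feature(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

def _int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

# Placeholder image and annotation for illustration.
with tf.io.gfile.GFile("image.jpg", "rb") as f:
    encoded_jpg = f.read()

example = tf.train.Example(features=tf.train.Features(feature={
    "image/height": _int64_feature([480]),
    "image/width": _int64_feature([640]),
    "image/filename": _bytes_feature([b"image.jpg"]),
    "image/source_id": _bytes_feature([b"image.jpg"]),
    "image/encoded": _bytes_feature([encoded_jpg]),
    "image/format": _bytes_feature([b"jpeg"]),
    # One entry per object; box coordinates are normalized to [0, 1].
    "image/object/bbox/xmin": _float_feature([0.1]),
    "image/object/bbox/xmax": _float_feature([0.9]),
    "image/object/bbox/ymin": _float_feature([0.2]),
    "image/object/bbox/ymax": _float_feature([0.8]),
    "image/object/class/text": _bytes_feature([b"dog"]),
    "image/object/class/label": _int64_feature([1]),
}))

with tf.io.TFRecordWriter("train.record") as writer:
    writer.write(example.SerializeToString())
```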
......
...@@ -1094,8 +1094,12 @@ def get_reduce_to_frame_fn(input_reader_config, is_training):
num_frames = tf.cast(
    tf.shape(tensor_dict[fields.InputDataFields.source_id])[0],
    dtype=tf.int32)
if input_reader_config.frame_index == -1:
  frame_index = tf.random.uniform((), minval=0, maxval=num_frames,
                                  dtype=tf.int32)
else:
  frame_index = tf.constant(input_reader_config.frame_index,
                            dtype=tf.int32)
out_tensor_dict = {}
for key in tensor_dict:
  if key in fields.SEQUENCE_FIELDS:
......
...@@ -61,7 +61,7 @@ def _get_configs_for_model(model_name):
configs, kwargs_dict=override_dict)
def _get_configs_for_model_sequence_example(model_name, frame_index=-1):
"""Returns configurations for model.""" """Returns configurations for model."""
fname = os.path.join(tf.resource_loader.get_data_files_path(), fname = os.path.join(tf.resource_loader.get_data_files_path(),
'test_data/' + model_name + '.config') 'test_data/' + model_name + '.config')
...@@ -74,7 +74,8 @@ def _get_configs_for_model_sequence_example(model_name):
override_dict = {
'train_input_path': data_path,
'eval_input_path': data_path,
'label_map_path': label_map_path,
'frame_index': frame_index
}
return config_util.merge_external_params_with_configs(
configs, kwargs_dict=override_dict)
...@@ -312,6 +313,46 @@ class InputFnTest(test_case.TestCase, parameterized.TestCase):
tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
def test_context_rcnn_resnet50_train_input_with_sequence_example_frame_index(
self, train_batch_size=8):
"""Tests the training input function for FasterRcnnResnet50."""
configs = _get_configs_for_model_sequence_example(
'context_rcnn_camera_trap', frame_index=2)
model_config = configs['model']
train_config = configs['train_config']
train_config.batch_size = train_batch_size
train_input_fn = inputs.create_train_input_fn(
train_config, configs['train_input_config'], model_config)
features, labels = _make_initializable_iterator(train_input_fn()).get_next()
self.assertAllEqual([train_batch_size, 640, 640, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual([train_batch_size],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[train_batch_size, 100, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[train_batch_size, 100, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[train_batch_size, 100],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
self.assertAllEqual(
[train_batch_size, 100, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_confidences].shape.as_list())
self.assertEqual(
tf.float32,
labels[fields.InputDataFields.groundtruth_confidences].dtype)
def test_ssd_inceptionV2_train_input(self): def test_ssd_inceptionV2_train_input(self):
"""Tests the training input function for SSDInceptionV2.""" """Tests the training input function for SSDInceptionV2."""
configs = _get_configs_for_model('ssd_inception_v2_pets') configs = _get_configs_for_model('ssd_inception_v2_pets')
......
...@@ -924,13 +924,16 @@ def convert_strided_predictions_to_normalized_keypoints( ...@@ -924,13 +924,16 @@ def convert_strided_predictions_to_normalized_keypoints(
def convert_strided_predictions_to_instance_masks( def convert_strided_predictions_to_instance_masks(
boxes, classes, masks, stride, mask_height, mask_width, boxes, classes, masks, true_image_shapes,
true_image_shapes, score_threshold=0.5): densepose_part_heatmap=None, densepose_surface_coords=None, stride=4,
mask_height=256, mask_width=256, score_threshold=0.5,
densepose_class_index=-1):
"""Converts predicted full-image masks into instance masks. """Converts predicted full-image masks into instance masks.
For each predicted detection box: For each predicted detection box:
* Crop and resize the predicted mask based on the detected bounding box * Crop and resize the predicted mask (and optionally DensePose coordinates)
coordinates and class prediction. Uses bilinear resampling. based on the detected bounding box coordinates and class prediction. Uses
bilinear resampling.
* Binarize the mask using the provided score threshold. * Binarize the mask using the provided score threshold.
Args: Args:
...@@ -940,57 +943,212 @@ def convert_strided_predictions_to_instance_masks( ...@@ -940,57 +943,212 @@ def convert_strided_predictions_to_instance_masks(
detected class for each box (0-indexed). detected class for each box (0-indexed).
masks: A [batch, output_height, output_width, num_classes] float32 masks: A [batch, output_height, output_width, num_classes] float32
tensor with class probabilities. tensor with class probabilities.
true_image_shapes: A tensor of shape [batch, 3] representing the true
shape of the inputs not considering padding.
densepose_part_heatmap: (Optional) A [batch, output_height, output_width,
num_parts] float32 tensor with part scores (i.e. logits).
densepose_surface_coords: (Optional) A [batch, output_height, output_width,
2 * num_parts] float32 tensor with predicted part coordinates (in
vu-format).
stride: The stride in the output space. stride: The stride in the output space.
mask_height: The desired resized height for instance masks. mask_height: The desired resized height for instance masks.
mask_width: The desired resized width for instance masks. mask_width: The desired resized width for instance masks.
true_image_shapes: A tensor of shape [batch, 3] representing the true
shape of the inputs not considering padding.
score_threshold: The threshold at which to convert predicted mask score_threshold: The threshold at which to convert predicted mask
into foreground pixels. into foreground pixels.
densepose_class_index: The class index (0-indexed) corresponding to the
class which has DensePose labels (e.g. person class).
Returns: Returns:
A [batch_size, max_detections, mask_height, mask_width] uint8 tensor with A tuple of masks and surface_coords.
predicted foreground mask for each instance. The masks take values in instance_masks: A [batch_size, max_detections, mask_height, mask_width]
{0, 1}. uint8 tensor with predicted foreground mask for each
instance. If DensePose tensors are provided, then each pixel value in the
mask encodes the 1-indexed part.
surface_coords: A [batch_size, max_detections, mask_height, mask_width, 2]
float32 tensor with (v, u) coordinates. Note that v, u coordinates are
only defined on instance masks, and the coordinates at each location of
the foreground mask correspond to coordinates on a local part coordinate
system (the specific part can be inferred from the `instance_masks`
output). If DensePose feature maps are not passed to this function, this
output will be None.
Raises:
ValueError: If one but not both of `densepose_part_heatmap` and
`densepose_surface_coords` is provided.
""" """
_, output_height, output_width, _ = ( batch_size, output_height, output_width, _ = (
shape_utils.combined_static_and_dynamic_shape(masks)) shape_utils.combined_static_and_dynamic_shape(masks))
input_height = stride * output_height input_height = stride * output_height
input_width = stride * output_width input_width = stride * output_width
true_heights, true_widths, _ = tf.unstack(true_image_shapes, axis=1)
# If necessary, create dummy DensePose tensors to simplify the map function.
densepose_present = True
if ((densepose_part_heatmap is not None) ^
(densepose_surface_coords is not None)):
raise ValueError('To use DensePose, both `densepose_part_heatmap` and '
'`densepose_surface_coords` must be provided')
if densepose_part_heatmap is None and densepose_surface_coords is None:
densepose_present = False
densepose_part_heatmap = tf.zeros(
(batch_size, output_height, output_width, 1), dtype=tf.float32)
densepose_surface_coords = tf.zeros(
(batch_size, output_height, output_width, 2), dtype=tf.float32)
crop_and_threshold_fn = functools.partial(
crop_and_threshold_masks, input_height=input_height,
input_width=input_width, mask_height=mask_height, mask_width=mask_width,
score_threshold=score_threshold,
densepose_class_index=densepose_class_index)
instance_masks, surface_coords = shape_utils.static_or_dynamic_map_fn(
crop_and_threshold_fn,
elems=[boxes, classes, masks, densepose_part_heatmap,
densepose_surface_coords, true_heights, true_widths],
dtype=[tf.uint8, tf.float32],
back_prop=False)
surface_coords = surface_coords if densepose_present else None
return instance_masks, surface_coords
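Because `convert_strided_predictions_to_instance_masks` now returns an `(instance_masks, surface_coords)` tuple and accepts the DensePose tensors as optional keyword arguments, call sites need a small update. A minimal sketch of the mask-only path, modeled on the unit test further down in this diff (the module path is assumed from that test's imports, and the shapes are illustrative):

```python
import numpy as np
import tensorflow as tf
from object_detection.meta_architectures import center_net_meta_arch as cnma

boxes = tf.constant([[[0.0, 0.0, 0.5, 0.5]]], tf.float32)    # [batch, max_det, 4]
classes = tf.constant([[0]], tf.int32)                        # [batch, max_det]
masks = tf.constant(np.random.rand(1, 4, 4, 2), tf.float32)   # [batch, h, w, classes]
true_image_shapes = tf.constant([[8, 8, 3]])

# Without DensePose inputs, the second return value is None.
instance_masks, surface_coords = (
    cnma.convert_strided_predictions_to_instance_masks(
        boxes, classes, masks, true_image_shapes,
        stride=2, mask_height=2, mask_width=2))
assert surface_coords is None
```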
def crop_and_threshold_masks(elems, input_height, input_width, mask_height=256,
mask_width=256, score_threshold=0.5,
densepose_class_index=-1):
"""Crops and thresholds masks based on detection boxes.
Args:
elems: A tuple of
boxes - float32 tensor of shape [max_detections, 4]
classes - int32 tensor of shape [max_detections] (0-indexed)
masks - float32 tensor of shape [output_height, output_width, num_classes]
part_heatmap - float32 tensor of shape [output_height, output_width,
num_parts]
surf_coords - float32 tensor of shape [output_height, output_width,
2 * num_parts]
true_height - scalar int tensor
true_width - scalar int tensor
input_height: Input height to network.
input_width: Input width to network.
mask_height: Height for resizing mask crops.
mask_width: Width for resizing mask crops.
score_threshold: The threshold at which to convert predicted mask
into foreground pixels.
densepose_class_index: scalar int tensor with the class index (0-indexed)
for DensePose.
Returns:
A tuple of
all_instances: A [max_detections, mask_height, mask_width] uint8 tensor
with a predicted foreground mask for each instance. Background is encoded
as 0, and foreground is encoded as a positive integer. Specific part
indices are encoded as 1-indexed parts (for classes that have part
information).
surface_coords: A [max_detections, mask_height, mask_width, 2]
float32 tensor with (v, u) coordinates for each part.
"""
(boxes, classes, masks, part_heatmap, surf_coords, true_height,
true_width) = elems
# Boxes are in normalized coordinates relative to true image shapes. Convert # Boxes are in normalized coordinates relative to true image shapes. Convert
# coordinates to be normalized relative to input image shapes (since masks # coordinates to be normalized relative to input image shapes (since masks
# may still have padding). # may still have padding).
# Then crop and resize each mask. boxlist = box_list.BoxList(boxes)
def crop_and_threshold_masks(args): y_scale = true_height / input_height
"""Crops masks based on detection boxes.""" x_scale = true_width / input_width
boxes, classes, masks, true_height, true_width = args boxlist = box_list_ops.scale(boxlist, y_scale, x_scale)
boxlist = box_list.BoxList(boxes) boxes = boxlist.get()
y_scale = true_height / input_height # Convert masks from [output_height, output_width, num_classes] to
x_scale = true_width / input_width # [num_classes, output_height, output_width, 1].
boxlist = box_list_ops.scale(boxlist, y_scale, x_scale) num_classes = tf.shape(masks)[-1]
boxes = boxlist.get() masks_4d = tf.transpose(masks, perm=[2, 0, 1])[:, :, :, tf.newaxis]
# Convert masks from [input_height, input_width, num_classes] to # Tile part and surface coordinate masks for all classes.
# [num_classes, input_height, input_width, 1]. part_heatmap_4d = tf.tile(part_heatmap[tf.newaxis, :, :, :],
masks_4d = tf.transpose(masks, perm=[2, 0, 1])[:, :, :, tf.newaxis] multiples=[num_classes, 1, 1, 1])
cropped_masks = tf2.image.crop_and_resize( surf_coords_4d = tf.tile(surf_coords[tf.newaxis, :, :, :],
masks_4d, multiples=[num_classes, 1, 1, 1])
boxes=boxes, feature_maps_concat = tf.concat([masks_4d, part_heatmap_4d, surf_coords_4d],
box_indices=classes, axis=-1)
crop_size=[mask_height, mask_width], # The following tensor has shape
method='bilinear') # [max_detections, mask_height, mask_width, 1 + 3 * num_parts].
masks_3d = tf.squeeze(cropped_masks, axis=3) cropped_masks = tf2.image.crop_and_resize(
masks_binarized = tf.math.greater_equal(masks_3d, score_threshold) feature_maps_concat,
return tf.cast(masks_binarized, tf.uint8) boxes=boxes,
box_indices=classes,
crop_size=[mask_height, mask_width],
method='bilinear')
# Split the cropped masks back into instance masks, part masks, and surface
# coordinates.
num_parts = tf.shape(part_heatmap)[-1]
instance_masks, part_heatmap_cropped, surface_coords_cropped = tf.split(
cropped_masks, [1, num_parts, 2 * num_parts], axis=-1)
# Threshold the instance masks. Resulting tensor has shape
# [max_detections, mask_height, mask_width, 1].
instance_masks_int = tf.cast(
tf.math.greater_equal(instance_masks, score_threshold), dtype=tf.int32)
# Produce a binary mask that is 1.0 only:
# - in the foreground region for an instance
# - in detections corresponding to the DensePose class
det_with_parts = tf.equal(classes, densepose_class_index)
det_with_parts = tf.cast(
tf.reshape(det_with_parts, [-1, 1, 1, 1]), dtype=tf.int32)
instance_masks_with_parts = tf.math.multiply(instance_masks_int,
det_with_parts)
# Similarly, produce a binary mask that holds the foreground masks only for
# instances without parts (i.e. non-DensePose classes).
det_without_parts = 1 - det_with_parts
instance_masks_without_parts = tf.math.multiply(instance_masks_int,
det_without_parts)
# Assemble a tensor that has standard instance segmentation masks for
# non-DensePose classes (with values in [0, 1]), and part segmentation masks
# for DensePose classes (with values in [0, 1, ..., num_parts]).
part_mask_int_zero_indexed = tf.math.argmax(
part_heatmap_cropped, axis=-1, output_type=tf.int32)[:, :, :, tf.newaxis]
part_mask_int_one_indexed = part_mask_int_zero_indexed + 1
all_instances = (instance_masks_without_parts +
instance_masks_with_parts * part_mask_int_one_indexed)
# Gather the surface coordinates for the parts.
surface_coords_cropped = tf.reshape(
surface_coords_cropped, [-1, mask_height, mask_width, num_parts, 2])
surface_coords = gather_surface_coords_for_parts(surface_coords_cropped,
part_mask_int_zero_indexed)
surface_coords = (
surface_coords * tf.cast(instance_masks_with_parts, tf.float32))
return [tf.squeeze(all_instances, axis=3), surface_coords]
def gather_surface_coords_for_parts(surface_coords_cropped,
highest_scoring_part):
"""Gathers the (v, u) coordinates for the highest scoring DensePose parts.
true_heights, true_widths, _ = tf.unstack(true_image_shapes, axis=1) Args:
masks_for_image = shape_utils.static_or_dynamic_map_fn( surface_coords_cropped: A [max_detections, height, width, num_parts, 2]
crop_and_threshold_masks, float32 tensor with (v, u) surface coordinates.
elems=[boxes, classes, masks, true_heights, true_widths], highest_scoring_part: A [max_detections, height, width] integer tensor with
dtype=tf.uint8, the highest scoring part (0-indexed) indices for each location.
back_prop=False)
masks = tf.stack(masks_for_image, axis=0) Returns:
return masks A [max_detections, height, width, 2] float32 tensor with the (v, u)
coordinates selected from the highest scoring parts.
"""
max_detections, height, width, num_parts, _ = (
shape_utils.combined_static_and_dynamic_shape(surface_coords_cropped))
flattened_surface_coords = tf.reshape(surface_coords_cropped, [-1, 2])
flattened_part_ids = tf.reshape(highest_scoring_part, [-1])
# Produce lookup indices that represent the locations of the highest scoring
# parts in the `flattened_surface_coords` tensor.
flattened_lookup_indices = (
num_parts * tf.range(max_detections * height * width) +
flattened_part_ids)
vu_coords_flattened = tf.gather(flattened_surface_coords,
flattened_lookup_indices, axis=0)
return tf.reshape(vu_coords_flattened, [max_detections, height, width, 2])
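The flattened gather above works because, after reshaping to `[-1, 2]`, spatial location `i` owns the `num_parts` consecutive rows starting at `i * num_parts`, so the row holding its highest-scoring part is `i * num_parts + part_id`. A toy check of that indexing (values and shapes are arbitrary):

```python
import numpy as np
import tensorflow as tf

max_detections, height, width, num_parts = 1, 2, 2, 3
coords = np.arange(max_detections * height * width * num_parts * 2,
                   dtype=np.float32).reshape(
                       max_detections, height, width, num_parts, 2)
part_ids = np.array([[[2, 0], [1, 2]]], dtype=np.int32)

flat_coords = tf.reshape(coords, [-1, 2])
flat_ids = tf.reshape(part_ids, [-1])
# Same index arithmetic as in gather_surface_coords_for_parts.
lookup = num_parts * tf.range(max_detections * height * width) + flat_ids
gathered = tf.reshape(tf.gather(flat_coords, lookup),
                      [max_detections, height, width, 2])
# gathered[0, 0, 0] == coords[0, 0, 0, 2]; gathered[0, 1, 1] == coords[0, 1, 1, 2]
```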
class ObjectDetectionParams( class ObjectDetectionParams(
...@@ -1235,6 +1393,64 @@ class MaskParams( ...@@ -1235,6 +1393,64 @@ class MaskParams(
score_threshold, heatmap_bias_init) score_threshold, heatmap_bias_init)
class DensePoseParams(
collections.namedtuple('DensePoseParams', [
'class_id', 'classification_loss', 'localization_loss',
'part_loss_weight', 'coordinate_loss_weight', 'num_parts',
'task_loss_weight', 'upsample_to_input_res', 'upsample_method',
'heatmap_bias_init'
])):
"""Namedtuple to store DensePose prediction related parameters."""
__slots__ = ()
def __new__(cls,
class_id,
classification_loss,
localization_loss,
part_loss_weight=1.0,
coordinate_loss_weight=1.0,
num_parts=24,
task_loss_weight=1.0,
upsample_to_input_res=True,
upsample_method='bilinear',
heatmap_bias_init=-2.19):
"""Constructor with default values for DensePoseParams.
Args:
class_id: the ID of the class that contains the DensePose groundtruth.
This should typically correspond to the "person" class. Note that the ID
is 0-based, meaning that class 0 corresponds to the first non-background
object class.
classification_loss: an object_detection.core.losses.Loss object to
compute the loss for the body part predictions in CenterNet.
localization_loss: an object_detection.core.losses.Loss object to compute
the loss for the surface coordinate regression in CenterNet.
part_loss_weight: The loss weight to apply to part prediction.
coordinate_loss_weight: The loss weight to apply to surface coordinate
prediction.
num_parts: The number of DensePose parts to predict.
task_loss_weight: float, the loss weight for the DensePose task.
upsample_to_input_res: Whether to upsample the DensePose feature maps to
the input resolution before applying loss. Note that the prediction
outputs are still at the standard CenterNet output stride.
upsample_method: Method for upsampling DensePose feature maps. Options are
either 'bilinear' or 'nearest'. This has no effect when
`upsample_to_input_res` is False.
heatmap_bias_init: float, the initial value of bias in the convolutional
kernel of the part prediction head. If set to None, the
bias is initialized with zeros.
Returns:
An initialized DensePoseParams namedtuple.
"""
return super(DensePoseParams,
cls).__new__(cls, class_id, classification_loss,
localization_loss, part_loss_weight,
coordinate_loss_weight, num_parts,
task_loss_weight, upsample_to_input_res,
upsample_method, heatmap_bias_init)
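For reference, constructing the namedtuple looks like the following; this mirrors the `get_fake_densepose_params()` helper in the tests later in this diff, and the specific `class_id` and loss choices are only illustrative.

```python
from object_detection.core import losses

# class_id 1 is an illustrative choice for the "person" class (0-indexed).
densepose_params = DensePoseParams(
    class_id=1,
    classification_loss=losses.WeightedSoftmaxClassificationLoss(),
    localization_loss=losses.L1LocalizationLoss(),
    part_loss_weight=1.0,
    coordinate_loss_weight=1.0,
    num_parts=24,
    task_loss_weight=1.0,
    upsample_to_input_res=True,
    upsample_method='bilinear')
```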
# The following constants are used to generate the keys of the # The following constants are used to generate the keys of the
# (prediction, loss, target assigner,...) dictionaries used in CenterNetMetaArch # (prediction, loss, target assigner,...) dictionaries used in CenterNetMetaArch
# class. # class.
...@@ -1247,6 +1463,9 @@ KEYPOINT_HEATMAP = 'keypoint/heatmap' ...@@ -1247,6 +1463,9 @@ KEYPOINT_HEATMAP = 'keypoint/heatmap'
KEYPOINT_OFFSET = 'keypoint/offset' KEYPOINT_OFFSET = 'keypoint/offset'
SEGMENTATION_TASK = 'segmentation_task' SEGMENTATION_TASK = 'segmentation_task'
SEGMENTATION_HEATMAP = 'segmentation/heatmap' SEGMENTATION_HEATMAP = 'segmentation/heatmap'
DENSEPOSE_TASK = 'densepose_task'
DENSEPOSE_HEATMAP = 'densepose/heatmap'
DENSEPOSE_REGRESSION = 'densepose/regression'
LOSS_KEY_PREFIX = 'Loss' LOSS_KEY_PREFIX = 'Loss'
...@@ -1290,7 +1509,8 @@ class CenterNetMetaArch(model.DetectionModel): ...@@ -1290,7 +1509,8 @@ class CenterNetMetaArch(model.DetectionModel):
object_center_params, object_center_params,
object_detection_params=None, object_detection_params=None,
keypoint_params_dict=None, keypoint_params_dict=None,
mask_params=None): mask_params=None,
densepose_params=None):
"""Initializes a CenterNet model. """Initializes a CenterNet model.
Args: Args:
...@@ -1318,6 +1538,10 @@ class CenterNetMetaArch(model.DetectionModel): ...@@ -1318,6 +1538,10 @@ class CenterNetMetaArch(model.DetectionModel):
mask_params: A MaskParams namedtuple. This object mask_params: A MaskParams namedtuple. This object
holds the hyper-parameters for segmentation. Please see the class holds the hyper-parameters for segmentation. Please see the class
definition for more details. definition for more details.
densepose_params: A DensePoseParams namedtuple. This object holds the
hyper-parameters for DensePose prediction. Please see the class
definition for more details. Note that if this is provided, it is
expected that `mask_params` is also provided.
""" """
assert object_detection_params or keypoint_params_dict assert object_detection_params or keypoint_params_dict
# Shorten the name for convenience and better formatting. # Shorten the name for convenience and better formatting.
...@@ -1333,6 +1557,10 @@ class CenterNetMetaArch(model.DetectionModel): ...@@ -1333,6 +1557,10 @@ class CenterNetMetaArch(model.DetectionModel):
self._od_params = object_detection_params self._od_params = object_detection_params
self._kp_params_dict = keypoint_params_dict self._kp_params_dict = keypoint_params_dict
self._mask_params = mask_params self._mask_params = mask_params
if densepose_params is not None and mask_params is None:
raise ValueError('To run DensePose prediction, `mask_params` must also '
'be supplied.')
self._densepose_params = densepose_params
# Construct the prediction head nets. # Construct the prediction head nets.
self._prediction_head_dict = self._construct_prediction_heads( self._prediction_head_dict = self._construct_prediction_heads(
...@@ -1413,8 +1641,18 @@ class CenterNetMetaArch(model.DetectionModel): ...@@ -1413,8 +1641,18 @@ class CenterNetMetaArch(model.DetectionModel):
if self._mask_params is not None: if self._mask_params is not None:
prediction_heads[SEGMENTATION_HEATMAP] = [ prediction_heads[SEGMENTATION_HEATMAP] = [
make_prediction_net(num_classes, make_prediction_net(num_classes,
bias_fill=class_prediction_bias_init) bias_fill=self._mask_params.heatmap_bias_init)
for _ in range(num_feature_outputs)]
if self._densepose_params is not None:
prediction_heads[DENSEPOSE_HEATMAP] = [
make_prediction_net( # pylint: disable=g-complex-comprehension
self._densepose_params.num_parts,
bias_fill=self._densepose_params.heatmap_bias_init)
for _ in range(num_feature_outputs)] for _ in range(num_feature_outputs)]
prediction_heads[DENSEPOSE_REGRESSION] = [
make_prediction_net(2 * self._densepose_params.num_parts)
for _ in range(num_feature_outputs)
]
return prediction_heads return prediction_heads
def _initialize_target_assigners(self, stride, min_box_overlap_iou): def _initialize_target_assigners(self, stride, min_box_overlap_iou):
...@@ -1449,6 +1687,10 @@ class CenterNetMetaArch(model.DetectionModel): ...@@ -1449,6 +1687,10 @@ class CenterNetMetaArch(model.DetectionModel):
if self._mask_params is not None: if self._mask_params is not None:
target_assigners[SEGMENTATION_TASK] = ( target_assigners[SEGMENTATION_TASK] = (
cn_assigner.CenterNetMaskTargetAssigner(stride)) cn_assigner.CenterNetMaskTargetAssigner(stride))
if self._densepose_params is not None:
dp_stride = 1 if self._densepose_params.upsample_to_input_res else stride
target_assigners[DENSEPOSE_TASK] = (
cn_assigner.CenterNetDensePoseTargetAssigner(dp_stride))
return target_assigners return target_assigners
...@@ -1860,6 +2102,113 @@ class CenterNetMetaArch(model.DetectionModel): ...@@ -1860,6 +2102,113 @@ class CenterNetMetaArch(model.DetectionModel):
float(len(segmentation_predictions)) * total_pixels_in_loss) float(len(segmentation_predictions)) * total_pixels_in_loss)
return total_loss return total_loss
def _compute_densepose_losses(self, input_height, input_width,
prediction_dict):
"""Computes the weighted DensePose losses.
Args:
input_height: An integer scalar tensor representing input image height.
input_width: An integer scalar tensor representing input image width.
prediction_dict: A dictionary holding predicted tensors output by the
"predict" function. See the "predict" function for more detailed
description.
Returns:
A dictionary of scalar float tensors representing the weighted losses for
the DensePose task:
DENSEPOSE_HEATMAP: the weighted part segmentation loss.
DENSEPOSE_REGRESSION: the weighted part surface coordinate loss.
"""
dp_heatmap_loss, dp_regression_loss = (
self._compute_densepose_part_and_coordinate_losses(
input_height=input_height,
input_width=input_width,
part_predictions=prediction_dict[DENSEPOSE_HEATMAP],
surface_coord_predictions=prediction_dict[DENSEPOSE_REGRESSION]))
loss_dict = {}
loss_dict[DENSEPOSE_HEATMAP] = (
self._densepose_params.part_loss_weight * dp_heatmap_loss)
loss_dict[DENSEPOSE_REGRESSION] = (
self._densepose_params.coordinate_loss_weight * dp_regression_loss)
return loss_dict
def _compute_densepose_part_and_coordinate_losses(
self, input_height, input_width, part_predictions,
surface_coord_predictions):
"""Computes the individual losses for the DensePose task.
Args:
input_height: An integer scalar tensor representing input image height.
input_width: An integer scalar tensor representing input image width.
part_predictions: A list of float tensors of shape [batch_size,
out_height, out_width, num_parts].
surface_coord_predictions: A list of float tensors of shape [batch_size,
out_height, out_width, 2 * num_parts].
Returns:
A tuple with two scalar loss tensors: part_prediction_loss and
surface_coord_loss.
"""
gt_dp_num_points_list = self.groundtruth_lists(
fields.BoxListFields.densepose_num_points)
gt_dp_part_ids_list = self.groundtruth_lists(
fields.BoxListFields.densepose_part_ids)
gt_dp_surface_coords_list = self.groundtruth_lists(
fields.BoxListFields.densepose_surface_coords)
gt_weights_list = self.groundtruth_lists(fields.BoxListFields.weights)
assigner = self._target_assigner_dict[DENSEPOSE_TASK]
batch_indices, batch_part_ids, batch_surface_coords, batch_weights = (
assigner.assign_part_and_coordinate_targets(
height=input_height,
width=input_width,
gt_dp_num_points_list=gt_dp_num_points_list,
gt_dp_part_ids_list=gt_dp_part_ids_list,
gt_dp_surface_coords_list=gt_dp_surface_coords_list,
gt_weights_list=gt_weights_list))
part_prediction_loss = 0
surface_coord_loss = 0
classification_loss_fn = self._densepose_params.classification_loss
localization_loss_fn = self._densepose_params.localization_loss
num_predictions = float(len(part_predictions))
num_valid_points = tf.math.count_nonzero(batch_weights)
num_valid_points = tf.cast(tf.math.maximum(num_valid_points, 1), tf.float32)
for part_pred, surface_coord_pred in zip(part_predictions,
surface_coord_predictions):
# Potentially upsample the feature maps, so that better quality (i.e.
# higher res) groundtruth can be applied.
if self._densepose_params.upsample_to_input_res:
part_pred = tf.keras.layers.UpSampling2D(
self._stride, interpolation=self._densepose_params.upsample_method)(
part_pred)
surface_coord_pred = tf.keras.layers.UpSampling2D(
self._stride, interpolation=self._densepose_params.upsample_method)(
surface_coord_pred)
# Compute the part prediction loss.
part_pred = cn_assigner.get_batch_predictions_from_indices(
part_pred, batch_indices[:, 0:3])
part_prediction_loss += classification_loss_fn(
part_pred[:, tf.newaxis, :],
batch_part_ids[:, tf.newaxis, :],
weights=batch_weights[:, tf.newaxis, tf.newaxis])
# Compute the surface coordinate loss.
batch_size, out_height, out_width, _ = _get_shape(
surface_coord_pred, 4)
surface_coord_pred = tf.reshape(
surface_coord_pred, [batch_size, out_height, out_width, -1, 2])
surface_coord_pred = cn_assigner.get_batch_predictions_from_indices(
surface_coord_pred, batch_indices)
surface_coord_loss += localization_loss_fn(
surface_coord_pred,
batch_surface_coords,
weights=batch_weights[:, tf.newaxis])
part_prediction_loss = tf.reduce_sum(part_prediction_loss) / (
num_predictions * num_valid_points)
surface_coord_loss = tf.reduce_sum(surface_coord_loss) / (
num_predictions * num_valid_points)
return part_prediction_loss, surface_coord_loss
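Both DensePose losses above are normalized by the number of prediction heads and by the number of valid (non-padded) DensePose points, with the point count clamped to at least one so that batches without DensePose annotations do not divide by zero. A compact, hypothetical restatement of that normalization:

```python
import tensorflow as tf


def normalize_densepose_loss(per_output_losses, batch_weights):
  """Sums per-head losses and averages over heads and valid points."""
  num_predictions = float(len(per_output_losses))
  # count_nonzero counts valid DensePose points; clamp to >= 1 to avoid 0/0.
  num_valid = tf.cast(
      tf.math.maximum(tf.math.count_nonzero(batch_weights), 1), tf.float32)
  return tf.add_n([tf.reduce_sum(l) for l in per_output_losses]) / (
      num_predictions * num_valid)
```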
def preprocess(self, inputs): def preprocess(self, inputs):
outputs = shape_utils.resize_images_and_return_shapes( outputs = shape_utils.resize_images_and_return_shapes(
inputs, self._image_resizer_fn) inputs, self._image_resizer_fn)
...@@ -1909,6 +2258,13 @@ class CenterNetMetaArch(model.DetectionModel): ...@@ -1909,6 +2258,13 @@ class CenterNetMetaArch(model.DetectionModel):
'segmentation/heatmap' - [optional] A list of size num_feature_outputs 'segmentation/heatmap' - [optional] A list of size num_feature_outputs
holding float tensors of size [batch_size, output_height, holding float tensors of size [batch_size, output_height,
output_width, num_classes] representing the mask logits. output_width, num_classes] representing the mask logits.
'densepose/heatmap' - [optional] A list of size num_feature_outputs
holding float tensors of size [batch_size, output_height,
output_width, num_parts] representing the mask logits for each part.
'densepose/regression' - [optional] A list of size num_feature_outputs
holding float tensors of size [batch_size, output_height,
output_width, 2 * num_parts] representing the DensePose surface
coordinate predictions.
Note the $TASK_NAME is provided by the KeypointEstimation namedtuple Note the $TASK_NAME is provided by the KeypointEstimation namedtuple
used to differentiate between different keypoint tasks. used to differentiate between different keypoint tasks.
""" """
...@@ -1938,10 +2294,16 @@ class CenterNetMetaArch(model.DetectionModel): ...@@ -1938,10 +2294,16 @@ class CenterNetMetaArch(model.DetectionModel):
scope: Optional scope name. scope: Optional scope name.
Returns: Returns:
A dictionary mapping the keys ['Loss/object_center', 'Loss/box/scale', A dictionary mapping the keys [
'Loss/box/offset', 'Loss/$TASK_NAME/keypoint/heatmap', 'Loss/object_center',
'Loss/$TASK_NAME/keypoint/offset', 'Loss/box/scale', (optional)
'Loss/$TASK_NAME/keypoint/regression', 'Loss/segmentation/heatmap'] to 'Loss/box/offset', (optional)
'Loss/$TASK_NAME/keypoint/heatmap', (optional)
'Loss/$TASK_NAME/keypoint/offset', (optional)
'Loss/$TASK_NAME/keypoint/regression', (optional)
'Loss/segmentation/heatmap', (optional)
'Loss/densepose/heatmap', (optional)
'Loss/densepose/regression'] (optional)
scalar tensors corresponding to the losses for different tasks. Note the scalar tensors corresponding to the losses for different tasks. Note the
$TASK_NAME is provided by the KeypointEstimation namedtuple used to $TASK_NAME is provided by the KeypointEstimation namedtuple used to
differentiate between different keypoint tasks. differentiate between different keypoint tasks.
...@@ -1999,6 +2361,16 @@ class CenterNetMetaArch(model.DetectionModel): ...@@ -1999,6 +2361,16 @@ class CenterNetMetaArch(model.DetectionModel):
seg_losses[key] = seg_losses[key] * self._mask_params.task_loss_weight seg_losses[key] = seg_losses[key] * self._mask_params.task_loss_weight
losses.update(seg_losses) losses.update(seg_losses)
if self._densepose_params is not None:
densepose_losses = self._compute_densepose_losses(
input_height=input_height,
input_width=input_width,
prediction_dict=prediction_dict)
for key in densepose_losses:
densepose_losses[key] = (
densepose_losses[key] * self._densepose_params.task_loss_weight)
losses.update(densepose_losses)
# Prepend the LOSS_KEY_PREFIX to the keys in the dictionary such that the # Prepend the LOSS_KEY_PREFIX to the keys in the dictionary such that the
# losses will be grouped together in Tensorboard. # losses will be grouped together in Tensorboard.
return dict([('%s/%s' % (LOSS_KEY_PREFIX, key), val) return dict([('%s/%s' % (LOSS_KEY_PREFIX, key), val)
...@@ -2033,9 +2405,14 @@ class CenterNetMetaArch(model.DetectionModel): ...@@ -2033,9 +2405,14 @@ class CenterNetMetaArch(model.DetectionModel):
invalid keypoints have their coordinates and scores set to 0.0. invalid keypoints have their coordinates and scores set to 0.0.
detection_keypoint_scores: (Optional) A float tensor of shape [batch, detection_keypoint_scores: (Optional) A float tensor of shape [batch,
max_detection, num_keypoints] with scores for each keypoint. max_detection, num_keypoints] with scores for each keypoint.
detection_masks: (Optional) An int tensor of shape [batch, detection_masks: (Optional) A uint8 tensor of shape [batch,
max_detections, mask_height, mask_width] with binarized masks for each max_detections, mask_height, mask_width] with masks for each
detection. detection. Background is specified with 0, and foreground is specified
with positive integers (1 for a standard instance segmentation mask, and
1-indexed part indices for the DensePose task).
detection_surface_coords: (Optional) A float32 tensor of shape [batch,
max_detection, mask_height, mask_width, 2] with DensePose surface
coordinates, in (v, u) format.
""" """
object_center_prob = tf.nn.sigmoid(prediction_dict[OBJECT_CENTER][-1]) object_center_prob = tf.nn.sigmoid(prediction_dict[OBJECT_CENTER][-1])
# Get x, y and channel indices corresponding to the top indices in the class # Get x, y and channel indices corresponding to the top indices in the class
...@@ -2076,14 +2453,27 @@ class CenterNetMetaArch(model.DetectionModel): ...@@ -2076,14 +2453,27 @@ class CenterNetMetaArch(model.DetectionModel):
if self._mask_params: if self._mask_params:
masks = tf.nn.sigmoid(prediction_dict[SEGMENTATION_HEATMAP][-1]) masks = tf.nn.sigmoid(prediction_dict[SEGMENTATION_HEATMAP][-1])
instance_masks = convert_strided_predictions_to_instance_masks( densepose_part_heatmap, densepose_surface_coords = None, None
boxes, classes, masks, self._stride, self._mask_params.mask_height, densepose_class_index = 0
self._mask_params.mask_width, true_image_shapes, if self._densepose_params:
self._mask_params.score_threshold) densepose_part_heatmap = prediction_dict[DENSEPOSE_HEATMAP][-1]
postprocess_dict.update({ densepose_surface_coords = prediction_dict[DENSEPOSE_REGRESSION][-1]
fields.DetectionResultFields.detection_masks: densepose_class_index = self._densepose_params.class_id
instance_masks instance_masks, surface_coords = (
}) convert_strided_predictions_to_instance_masks(
boxes, classes, masks, true_image_shapes,
densepose_part_heatmap, densepose_surface_coords,
stride=self._stride, mask_height=self._mask_params.mask_height,
mask_width=self._mask_params.mask_width,
score_threshold=self._mask_params.score_threshold,
densepose_class_index=densepose_class_index))
postprocess_dict[
fields.DetectionResultFields.detection_masks] = instance_masks
if self._densepose_params:
postprocess_dict[
fields.DetectionResultFields.detection_surface_coords] = (
surface_coords)
return postprocess_dict return postprocess_dict
def _postprocess_keypoints(self, prediction_dict, classes, y_indices, def _postprocess_keypoints(self, prediction_dict, classes, y_indices,
...@@ -2359,6 +2749,14 @@ class CenterNetMetaArch(model.DetectionModel): ...@@ -2359,6 +2749,14 @@ class CenterNetMetaArch(model.DetectionModel):
checkpoint (with compatible variable names) or to restore from a checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training. classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'. Valid values: `detection`, `classification`. Default 'detection'.
'detection': used when loading the Hourglass model pre-trained on
another detection task.
'classification': used when loading the ResNet model pre-trained on
an image classification task. Note that only the image feature encoding
part is loaded, not the upsampling layers.
'fine_tune': used when loading the entire CenterNet feature extractor
pre-trained on other tasks. The checkpoints saved during CenterNet
model training can be directly loaded using this mode.
Returns: Returns:
A dict mapping keys to Trackable objects (tf.Module or Checkpoint). A dict mapping keys to Trackable objects (tf.Module or Checkpoint).
...@@ -2367,9 +2765,14 @@ class CenterNetMetaArch(model.DetectionModel): ...@@ -2367,9 +2765,14 @@ class CenterNetMetaArch(model.DetectionModel):
if fine_tune_checkpoint_type == 'classification': if fine_tune_checkpoint_type == 'classification':
return {'feature_extractor': self._feature_extractor.get_base_model()} return {'feature_extractor': self._feature_extractor.get_base_model()}
if fine_tune_checkpoint_type == 'detection': elif fine_tune_checkpoint_type == 'detection':
return {'feature_extractor': self._feature_extractor.get_model()} return {'feature_extractor': self._feature_extractor.get_model()}
elif fine_tune_checkpoint_type == 'fine_tune':
feature_extractor_model = tf.train.Checkpoint(
_feature_extractor=self._feature_extractor)
return {'model': feature_extractor_model}
else: else:
raise ValueError('Not supported fine tune checkpoint type - {}'.format( raise ValueError('Not supported fine tune checkpoint type - {}'.format(
fine_tune_checkpoint_type)) fine_tune_checkpoint_type))
......
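Assuming the enclosing method is the meta architecture's `restore_map` (its name is not visible in this hunk), the new `fine_tune` mode can be used roughly as follows; `model` and `checkpoint_path` are placeholders.

```python
import tensorflow as tf

# Hedged sketch: warm-starting from a checkpoint written during a previous
# CenterNet training run. `model` is a CenterNetMetaArch instance and
# `checkpoint_path` points at that earlier checkpoint.
restore_map = model.restore_map(fine_tune_checkpoint_type='fine_tune')
ckpt = tf.train.Checkpoint(**restore_map)
ckpt.restore(checkpoint_path).expect_partial()
```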
...@@ -266,7 +266,7 @@ class CenterNetMetaArchHelpersTest(test_case.TestCase, parameterized.TestCase): ...@@ -266,7 +266,7 @@ class CenterNetMetaArchHelpersTest(test_case.TestCase, parameterized.TestCase):
masks_np[0, :, :3, 1] = 1 # Class 1. masks_np[0, :, :3, 1] = 1 # Class 1.
masks = tf.constant(masks_np) masks = tf.constant(masks_np)
true_image_shapes = tf.constant([[6, 8, 3]]) true_image_shapes = tf.constant([[6, 8, 3]])
instance_masks = cnma.convert_strided_predictions_to_instance_masks( instance_masks, _ = cnma.convert_strided_predictions_to_instance_masks(
boxes, classes, masks, stride=2, mask_height=2, mask_width=2, boxes, classes, masks, stride=2, mask_height=2, mask_width=2,
true_image_shapes=true_image_shapes) true_image_shapes=true_image_shapes)
return instance_masks return instance_masks
...@@ -289,6 +289,104 @@ class CenterNetMetaArchHelpersTest(test_case.TestCase, parameterized.TestCase): ...@@ -289,6 +289,104 @@ class CenterNetMetaArchHelpersTest(test_case.TestCase, parameterized.TestCase):
]) ])
np.testing.assert_array_equal(expected_instance_masks, instance_masks) np.testing.assert_array_equal(expected_instance_masks, instance_masks)
def test_convert_strided_predictions_raises_error_with_one_tensor(self):
def graph_fn():
boxes = tf.constant(
[
[[0.5, 0.5, 1.0, 1.0],
[0.0, 0.5, 0.5, 1.0],
[0.0, 0.0, 0.0, 0.0]],
], tf.float32)
classes = tf.constant(
[
[0, 1, 0],
], tf.int32)
masks_np = np.zeros((1, 4, 4, 2), dtype=np.float32)
masks_np[0, :, 2:, 0] = 1 # Class 0.
masks_np[0, :, :3, 1] = 1 # Class 1.
masks = tf.constant(masks_np)
true_image_shapes = tf.constant([[6, 8, 3]])
densepose_part_heatmap = tf.random.uniform(
[1, 4, 4, 24])
instance_masks, _ = cnma.convert_strided_predictions_to_instance_masks(
boxes, classes, masks, true_image_shapes,
densepose_part_heatmap=densepose_part_heatmap,
densepose_surface_coords=None)
return instance_masks
with self.assertRaises(ValueError):
self.execute_cpu(graph_fn, [])
def test_crop_and_threshold_masks(self):
boxes_np = np.array(
[[0., 0., 0.5, 0.5],
[0.25, 0.25, 1.0, 1.0]], dtype=np.float32)
classes_np = np.array([0, 2], dtype=np.int32)
masks_np = np.zeros((4, 4, _NUM_CLASSES), dtype=np.float32)
masks_np[0, 0, 0] = 0.8
masks_np[1, 1, 0] = 0.6
masks_np[3, 3, 2] = 0.7
part_heatmap_np = np.zeros((4, 4, _DENSEPOSE_NUM_PARTS), dtype=np.float32)
part_heatmap_np[0, 0, 4] = 1
part_heatmap_np[0, 0, 2] = 0.6 # Lower scoring.
part_heatmap_np[1, 1, 8] = 0.2
part_heatmap_np[3, 3, 4] = 0.5
surf_coords_np = np.zeros((4, 4, 2 * _DENSEPOSE_NUM_PARTS),
dtype=np.float32)
surf_coords_np[:, :, 8:10] = 0.2, 0.9
surf_coords_np[:, :, 16:18] = 0.3, 0.5
true_height, true_width = 10, 10
input_height, input_width = 10, 10
mask_height = 4
mask_width = 4
def graph_fn():
elems = [
tf.constant(boxes_np),
tf.constant(classes_np),
tf.constant(masks_np),
tf.constant(part_heatmap_np),
tf.constant(surf_coords_np),
tf.constant(true_height, dtype=tf.int32),
tf.constant(true_width, dtype=tf.int32)
]
part_masks, surface_coords = cnma.crop_and_threshold_masks(
elems, input_height, input_width, mask_height=mask_height,
mask_width=mask_width, densepose_class_index=0)
return part_masks, surface_coords
part_masks, surface_coords = self.execute_cpu(graph_fn, [])
expected_part_masks = np.zeros((2, 4, 4), dtype=np.uint8)
expected_part_masks[0, 0, 0] = 5 # Recall parts are 1-indexed in the output.
expected_part_masks[0, 2, 2] = 9 # Recall parts are 1-indexed in the output.
expected_part_masks[1, 3, 3] = 1 # Standard instance segmentation mask.
expected_surface_coords = np.zeros((2, 4, 4, 2), dtype=np.float32)
expected_surface_coords[0, 0, 0, :] = 0.2, 0.9
expected_surface_coords[0, 2, 2, :] = 0.3, 0.5
np.testing.assert_allclose(expected_part_masks, part_masks)
np.testing.assert_allclose(expected_surface_coords, surface_coords)
def test_gather_surface_coords_for_parts(self):
surface_coords_cropped_np = np.zeros((2, 5, 5, _DENSEPOSE_NUM_PARTS, 2),
dtype=np.float32)
surface_coords_cropped_np[0, 0, 0, 5] = 0.3, 0.4
surface_coords_cropped_np[0, 1, 0, 9] = 0.5, 0.6
highest_scoring_part_np = np.zeros((2, 5, 5), dtype=np.int32)
highest_scoring_part_np[0, 0, 0] = 5
highest_scoring_part_np[0, 1, 0] = 9
def graph_fn():
surface_coords_cropped = tf.constant(surface_coords_cropped_np,
tf.float32)
highest_scoring_part = tf.constant(highest_scoring_part_np, tf.int32)
surface_coords_gathered = cnma.gather_surface_coords_for_parts(
surface_coords_cropped, highest_scoring_part)
return surface_coords_gathered
surface_coords_gathered = self.execute_cpu(graph_fn, [])
np.testing.assert_allclose([0.3, 0.4], surface_coords_gathered[0, 0, 0])
np.testing.assert_allclose([0.5, 0.6], surface_coords_gathered[0, 1, 0])
def test_top_k_feature_map_locations(self): def test_top_k_feature_map_locations(self):
feature_map_np = np.zeros((2, 3, 3, 2), dtype=np.float32) feature_map_np = np.zeros((2, 3, 3, 2), dtype=np.float32)
feature_map_np[0, 2, 0, 1] = 1.0 feature_map_np[0, 2, 0, 1] = 1.0
...@@ -535,6 +633,8 @@ class CenterNetMetaArchHelpersTest(test_case.TestCase, parameterized.TestCase): ...@@ -535,6 +633,8 @@ class CenterNetMetaArchHelpersTest(test_case.TestCase, parameterized.TestCase):
keypoint_heatmap_np[1, 0, 1, 1] = 0.9 keypoint_heatmap_np[1, 0, 1, 1] = 0.9
keypoint_heatmap_np[1, 2, 0, 1] = 0.8 keypoint_heatmap_np[1, 2, 0, 1] = 0.8
# Note that the keypoint offsets are now per keypoint (as opposed to
# keypoint agnostic, in the test test_keypoint_candidate_prediction).
keypoint_heatmap_offsets_np = np.zeros((2, 3, 3, 4), dtype=np.float32) keypoint_heatmap_offsets_np = np.zeros((2, 3, 3, 4), dtype=np.float32)
keypoint_heatmap_offsets_np[0, 0, 0] = [0.5, 0.25, 0.0, 0.0] keypoint_heatmap_offsets_np[0, 0, 0] = [0.5, 0.25, 0.0, 0.0]
keypoint_heatmap_offsets_np[0, 2, 1] = [-0.25, 0.5, 0.0, 0.0] keypoint_heatmap_offsets_np[0, 2, 1] = [-0.25, 0.5, 0.0, 0.0]
...@@ -949,6 +1049,7 @@ class CenterNetMetaArchHelpersTest(test_case.TestCase, parameterized.TestCase): ...@@ -949,6 +1049,7 @@ class CenterNetMetaArchHelpersTest(test_case.TestCase, parameterized.TestCase):
_NUM_CLASSES = 10 _NUM_CLASSES = 10
_KEYPOINT_INDICES = [0, 1, 2, 3] _KEYPOINT_INDICES = [0, 1, 2, 3]
_NUM_KEYPOINTS = len(_KEYPOINT_INDICES) _NUM_KEYPOINTS = len(_KEYPOINT_INDICES)
_DENSEPOSE_NUM_PARTS = 24
_TASK_NAME = 'human_pose' _TASK_NAME = 'human_pose'
...@@ -991,6 +1092,20 @@ def get_fake_mask_params(): ...@@ -991,6 +1092,20 @@ def get_fake_mask_params():
mask_width=4) mask_width=4)
def get_fake_densepose_params():
"""Returns the fake DensePose estimation parameter namedtuple."""
return cnma.DensePoseParams(
class_id=1,
classification_loss=losses.WeightedSoftmaxClassificationLoss(),
localization_loss=losses.L1LocalizationLoss(),
part_loss_weight=1.0,
coordinate_loss_weight=1.0,
num_parts=_DENSEPOSE_NUM_PARTS,
task_loss_weight=1.0,
upsample_to_input_res=True,
upsample_method='nearest')
def build_center_net_meta_arch(build_resnet=False): def build_center_net_meta_arch(build_resnet=False):
"""Builds the CenterNet meta architecture.""" """Builds the CenterNet meta architecture."""
if build_resnet: if build_resnet:
...@@ -1018,7 +1133,8 @@ def build_center_net_meta_arch(build_resnet=False): ...@@ -1018,7 +1133,8 @@ def build_center_net_meta_arch(build_resnet=False):
object_center_params=get_fake_center_params(), object_center_params=get_fake_center_params(),
object_detection_params=get_fake_od_params(), object_detection_params=get_fake_od_params(),
keypoint_params_dict={_TASK_NAME: get_fake_kp_params()}, keypoint_params_dict={_TASK_NAME: get_fake_kp_params()},
mask_params=get_fake_mask_params()) mask_params=get_fake_mask_params(),
densepose_params=get_fake_densepose_params())
def _logit(p): def _logit(p):
...@@ -1102,6 +1218,16 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase): ...@@ -1102,6 +1218,16 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase):
fake_feature_map) fake_feature_map)
self.assertEqual((4, 128, 128, _NUM_CLASSES), output.shape) self.assertEqual((4, 128, 128, _NUM_CLASSES), output.shape)
# "densepose parts" head:
output = model._prediction_head_dict[cnma.DENSEPOSE_HEATMAP][-1](
fake_feature_map)
self.assertEqual((4, 128, 128, _DENSEPOSE_NUM_PARTS), output.shape)
# "densepose surface coordinates" head:
output = model._prediction_head_dict[cnma.DENSEPOSE_REGRESSION][-1](
fake_feature_map)
self.assertEqual((4, 128, 128, 2 * _DENSEPOSE_NUM_PARTS), output.shape)
def test_initialize_target_assigners(self): def test_initialize_target_assigners(self):
model = build_center_net_meta_arch() model = build_center_net_meta_arch()
assigner_dict = model._initialize_target_assigners( assigner_dict = model._initialize_target_assigners(
...@@ -1125,6 +1251,10 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase): ...@@ -1125,6 +1251,10 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase):
self.assertIsInstance(assigner_dict[cnma.SEGMENTATION_TASK], self.assertIsInstance(assigner_dict[cnma.SEGMENTATION_TASK],
cn_assigner.CenterNetMaskTargetAssigner) cn_assigner.CenterNetMaskTargetAssigner)
# DensePose estimation target assigner:
self.assertIsInstance(assigner_dict[cnma.DENSEPOSE_TASK],
cn_assigner.CenterNetDensePoseTargetAssigner)
def test_predict(self): def test_predict(self):
"""Test the predict function.""" """Test the predict function."""
...@@ -1145,6 +1275,10 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase): ...@@ -1145,6 +1275,10 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase):
(2, 32, 32, 2)) (2, 32, 32, 2))
self.assertEqual(prediction_dict[cnma.SEGMENTATION_HEATMAP][0].shape, self.assertEqual(prediction_dict[cnma.SEGMENTATION_HEATMAP][0].shape,
(2, 32, 32, _NUM_CLASSES)) (2, 32, 32, _NUM_CLASSES))
self.assertEqual(prediction_dict[cnma.DENSEPOSE_HEATMAP][0].shape,
(2, 32, 32, _DENSEPOSE_NUM_PARTS))
self.assertEqual(prediction_dict[cnma.DENSEPOSE_REGRESSION][0].shape,
(2, 32, 32, 2 * _DENSEPOSE_NUM_PARTS))
def test_loss(self): def test_loss(self):
"""Test the loss function.""" """Test the loss function."""
...@@ -1157,7 +1291,13 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase): ...@@ -1157,7 +1291,13 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase):
groundtruth_keypoints_list=groundtruth_dict[ groundtruth_keypoints_list=groundtruth_dict[
fields.BoxListFields.keypoints], fields.BoxListFields.keypoints],
groundtruth_masks_list=groundtruth_dict[ groundtruth_masks_list=groundtruth_dict[
fields.BoxListFields.masks]) fields.BoxListFields.masks],
groundtruth_dp_num_points_list=groundtruth_dict[
fields.BoxListFields.densepose_num_points],
groundtruth_dp_part_ids_list=groundtruth_dict[
fields.BoxListFields.densepose_part_ids],
groundtruth_dp_surface_coords_list=groundtruth_dict[
fields.BoxListFields.densepose_surface_coords])
prediction_dict = get_fake_prediction_dict( prediction_dict = get_fake_prediction_dict(
input_height=16, input_width=32, stride=4) input_height=16, input_width=32, stride=4)
...@@ -1193,6 +1333,12 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase): ...@@ -1193,6 +1333,12 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase):
self.assertGreater( self.assertGreater(
0.01, loss_dict['%s/%s' % (cnma.LOSS_KEY_PREFIX, 0.01, loss_dict['%s/%s' % (cnma.LOSS_KEY_PREFIX,
cnma.SEGMENTATION_HEATMAP)]) cnma.SEGMENTATION_HEATMAP)])
self.assertGreater(
0.01, loss_dict['%s/%s' % (cnma.LOSS_KEY_PREFIX,
cnma.DENSEPOSE_HEATMAP)])
self.assertGreater(
0.01, loss_dict['%s/%s' % (cnma.LOSS_KEY_PREFIX,
cnma.DENSEPOSE_REGRESSION)])
@parameterized.parameters( @parameterized.parameters(
{'target_class_id': 1}, {'target_class_id': 1},
...@@ -1230,6 +1376,14 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase): ...@@ -1230,6 +1376,14 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase):
segmentation_heatmap[:, 14:18, 14:18, target_class_id] = 1.0 segmentation_heatmap[:, 14:18, 14:18, target_class_id] = 1.0
segmentation_heatmap = _logit(segmentation_heatmap) segmentation_heatmap = _logit(segmentation_heatmap)
dp_part_ind = 4
dp_part_heatmap = np.zeros((1, 32, 32, _DENSEPOSE_NUM_PARTS),
dtype=np.float32)
dp_part_heatmap[0, 14:18, 14:18, dp_part_ind] = 1.0
dp_part_heatmap = _logit(dp_part_heatmap)
dp_surf_coords = np.random.randn(1, 32, 32, 2 * _DENSEPOSE_NUM_PARTS)
class_center = tf.constant(class_center) class_center = tf.constant(class_center)
height_width = tf.constant(height_width) height_width = tf.constant(height_width)
offset = tf.constant(offset) offset = tf.constant(offset)
...@@ -1237,6 +1391,8 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase): ...@@ -1237,6 +1391,8 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase):
keypoint_offsets = tf.constant(keypoint_offsets, dtype=tf.float32) keypoint_offsets = tf.constant(keypoint_offsets, dtype=tf.float32)
keypoint_regression = tf.constant(keypoint_regression, dtype=tf.float32) keypoint_regression = tf.constant(keypoint_regression, dtype=tf.float32)
segmentation_heatmap = tf.constant(segmentation_heatmap, dtype=tf.float32) segmentation_heatmap = tf.constant(segmentation_heatmap, dtype=tf.float32)
dp_part_heatmap = tf.constant(dp_part_heatmap, dtype=tf.float32)
dp_surf_coords = tf.constant(dp_surf_coords, dtype=tf.float32)
prediction_dict = { prediction_dict = {
cnma.OBJECT_CENTER: [class_center], cnma.OBJECT_CENTER: [class_center],
...@@ -1249,6 +1405,8 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase): ...@@ -1249,6 +1405,8 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase):
cnma.get_keypoint_name(_TASK_NAME, cnma.KEYPOINT_REGRESSION): cnma.get_keypoint_name(_TASK_NAME, cnma.KEYPOINT_REGRESSION):
[keypoint_regression], [keypoint_regression],
cnma.SEGMENTATION_HEATMAP: [segmentation_heatmap], cnma.SEGMENTATION_HEATMAP: [segmentation_heatmap],
cnma.DENSEPOSE_HEATMAP: [dp_part_heatmap],
cnma.DENSEPOSE_REGRESSION: [dp_surf_coords]
} }
def graph_fn(): def graph_fn():
...@@ -1271,12 +1429,13 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase): ...@@ -1271,12 +1429,13 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase):
self.assertAllEqual([1, max_detection, 4, 4], self.assertAllEqual([1, max_detection, 4, 4],
detections['detection_masks'].shape) detections['detection_masks'].shape)
# There should be some section of the first mask (correspond to the only # Masks should be empty for everything but the first detection.
# detection) with non-zero mask values.
self.assertGreater(np.sum(detections['detection_masks'][0, 0, :, :] > 0), 0)
self.assertAllEqual( self.assertAllEqual(
detections['detection_masks'][0, 1:, :, :], detections['detection_masks'][0, 1:, :, :],
np.zeros_like(detections['detection_masks'][0, 1:, :, :])) np.zeros_like(detections['detection_masks'][0, 1:, :, :]))
self.assertAllEqual(
detections['detection_surface_coords'][0, 1:, :, :],
np.zeros_like(detections['detection_surface_coords'][0, 1:, :, :]))
if target_class_id == 1: if target_class_id == 1:
expected_kpts_for_obj_0 = np.array( expected_kpts_for_obj_0 = np.array(
...@@ -1287,6 +1446,12 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase): ...@@ -1287,6 +1446,12 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase):
expected_kpts_for_obj_0, rtol=1e-6) expected_kpts_for_obj_0, rtol=1e-6)
np.testing.assert_allclose(detections['detection_keypoint_scores'][0][0], np.testing.assert_allclose(detections['detection_keypoint_scores'][0][0],
expected_kpt_scores_for_obj_0, rtol=1e-6) expected_kpt_scores_for_obj_0, rtol=1e-6)
# First detection has DensePose parts.
self.assertSameElements(
np.unique(detections['detection_masks'][0, 0, :, :]),
set([0, dp_part_ind + 1]))
self.assertGreater(np.sum(np.abs(detections['detection_surface_coords'])),
0.0)
else: else:
# All keypoint outputs should be zeros. # All keypoint outputs should be zeros.
np.testing.assert_allclose( np.testing.assert_allclose(
...@@ -1297,6 +1462,14 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase): ...@@ -1297,6 +1462,14 @@ class CenterNetMetaArchTest(test_case.TestCase, parameterized.TestCase):
detections['detection_keypoint_scores'][0][0], detections['detection_keypoint_scores'][0][0],
np.zeros([num_keypoints], np.float), np.zeros([num_keypoints], np.float),
rtol=1e-6) rtol=1e-6)
# Binary segmentation mask.
self.assertSameElements(
np.unique(detections['detection_masks'][0, 0, :, :]),
set([0, 1]))
# No DensePose surface coordinates.
np.testing.assert_allclose(
detections['detection_surface_coords'][0, 0, :, :],
np.zeros_like(detections['detection_surface_coords'][0, 0, :, :]))
def test_get_instance_indices(self): def test_get_instance_indices(self):
classes = tf.constant([[0, 1, 2, 0], [2, 1, 2, 2]], dtype=tf.int32) classes = tf.constant([[0, 1, 2, 0], [2, 1, 2, 2]], dtype=tf.int32)
...@@ -1353,6 +1526,17 @@ def get_fake_prediction_dict(input_height, input_width, stride): ...@@ -1353,6 +1526,17 @@ def get_fake_prediction_dict(input_height, input_width, stride):
mask_heatmap[0, 2, 4, 1] = 1.0 mask_heatmap[0, 2, 4, 1] = 1.0
mask_heatmap = _logit(mask_heatmap) mask_heatmap = _logit(mask_heatmap)
densepose_heatmap = np.zeros((2, output_height, output_width,
_DENSEPOSE_NUM_PARTS), dtype=np.float32)
densepose_heatmap[0, 2, 4, 5] = 1.0
densepose_heatmap = _logit(densepose_heatmap)
densepose_regression = np.zeros((2, output_height, output_width,
2 * _DENSEPOSE_NUM_PARTS), dtype=np.float32)
# The surface coordinate indices for part index 5 are:
# (5 * 2, 5 * 2 + 1), or (10, 11).
densepose_regression[0, 2, 4, 10:12] = 0.4, 0.7
prediction_dict = { prediction_dict = {
'preprocessed_inputs': 'preprocessed_inputs':
tf.zeros((2, input_height, input_width, 3)), tf.zeros((2, input_height, input_width, 3)),
...@@ -1383,6 +1567,14 @@ def get_fake_prediction_dict(input_height, input_width, stride): ...@@ -1383,6 +1567,14 @@ def get_fake_prediction_dict(input_height, input_width, stride):
cnma.SEGMENTATION_HEATMAP: [ cnma.SEGMENTATION_HEATMAP: [
tf.constant(mask_heatmap), tf.constant(mask_heatmap),
tf.constant(mask_heatmap) tf.constant(mask_heatmap)
],
cnma.DENSEPOSE_HEATMAP: [
tf.constant(densepose_heatmap),
tf.constant(densepose_heatmap),
],
cnma.DENSEPOSE_REGRESSION: [
tf.constant(densepose_regression),
tf.constant(densepose_regression),
] ]
} }
return prediction_dict return prediction_dict
...@@ -1427,12 +1619,30 @@ def get_fake_groundtruth_dict(input_height, input_width, stride): ...@@ -1427,12 +1619,30 @@ def get_fake_groundtruth_dict(input_height, input_width, stride):
tf.constant(mask), tf.constant(mask),
tf.zeros_like(mask), tf.zeros_like(mask),
] ]
densepose_num_points = [
tf.constant([1], dtype=tf.int32),
tf.constant([0], dtype=tf.int32),
]
densepose_part_ids = [
tf.constant([[5, 0, 0]], dtype=tf.int32),
tf.constant([[0, 0, 0]], dtype=tf.int32),
]
densepose_surface_coords_np = np.zeros((1, 3, 4), dtype=np.float32)
densepose_surface_coords_np[0, 0, :] = 0.55, 0.55, 0.4, 0.7
densepose_surface_coords = [
tf.constant(densepose_surface_coords_np),
tf.zeros_like(densepose_surface_coords_np)
]
groundtruth_dict = { groundtruth_dict = {
fields.BoxListFields.boxes: boxes, fields.BoxListFields.boxes: boxes,
fields.BoxListFields.weights: weights, fields.BoxListFields.weights: weights,
fields.BoxListFields.classes: classes, fields.BoxListFields.classes: classes,
fields.BoxListFields.keypoints: keypoints, fields.BoxListFields.keypoints: keypoints,
fields.BoxListFields.masks: masks, fields.BoxListFields.masks: masks,
fields.BoxListFields.densepose_num_points: densepose_num_points,
fields.BoxListFields.densepose_part_ids: densepose_part_ids,
fields.BoxListFields.densepose_surface_coords:
densepose_surface_coords,
fields.InputDataFields.groundtruth_labeled_classes: labeled_classes, fields.InputDataFields.groundtruth_labeled_classes: labeled_classes,
} }
return groundtruth_dict return groundtruth_dict
......
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Library functions for Context R-CNN."""
import tensorflow as tf
from object_detection.core import freezable_batch_norm
# The negative value used in padding the invalid weights.
_NEGATIVE_PADDING_VALUE = -100000
class ContextProjection(tf.keras.layers.Layer):
"""Custom layer to do batch normalization and projection."""
def __init__(self, projection_dimension, **kwargs):
self.batch_norm = freezable_batch_norm.FreezableBatchNorm(
epsilon=0.001,
center=True,
scale=True,
momentum=0.97,
trainable=True)
self.projection = tf.keras.layers.Dense(units=projection_dimension,
activation=tf.nn.relu6,
use_bias=True)
super(ContextProjection, self).__init__(**kwargs)
def build(self, input_shape):
self.batch_norm.build(input_shape)
self.projection.build(input_shape)
def call(self, input_features, is_training=False):
return self.projection(self.batch_norm(input_features, is_training))
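A small usage sketch of `ContextProjection` on its own: the layer batch-normalizes the incoming features and then projects them to the requested dimension through a ReLU-6 Dense layer. The shapes below are arbitrary.

```python
import tensorflow as tf

# Assumes ContextProjection is importable from the Context R-CNN library
# module shown in this diff; 128 and 2048 are illustrative dimensions.
proj = ContextProjection(projection_dimension=128)
context_features = tf.random.uniform([2, 10, 2048])     # [batch, context_size, depth]
projected = proj(context_features, is_training=False)   # -> [2, 10, 128]
```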
class AttentionBlock(tf.keras.layers.Layer):
"""Custom layer to perform all attention."""
def __init__(self, bottleneck_dimension, attention_temperature,
output_dimension=None, is_training=False,
name='AttentionBlock', **kwargs):
"""Constructs an attention block.
Args:
bottleneck_dimension: An int32 Tensor representing the bottleneck dimension
for intermediate projections.
attention_temperature: A float Tensor. It controls the temperature of the
softmax used for the attention weights, computed as:
weights = exp(weights / temperature) / sum(exp(weights / temperature))
output_dimension: An int32 Tensor representing the last dimension of the
output feature.
is_training: A boolean Tensor (affecting batch normalization).
name: A string describing what to name the variables in this block.
**kwargs: Additional keyword arguments.
"""
self._key_proj = ContextProjection(bottleneck_dimension)
self._val_proj = ContextProjection(bottleneck_dimension)
self._query_proj = ContextProjection(bottleneck_dimension)
self._feature_proj = None
self._attention_temperature = attention_temperature
self._bottleneck_dimension = bottleneck_dimension
self._is_training = is_training
self._output_dimension = output_dimension
if self._output_dimension:
self._feature_proj = ContextProjection(self._output_dimension)
super(AttentionBlock, self).__init__(name=name, **kwargs)
def build(self, input_shapes):
"""Finishes building the attention block.
Args:
input_shapes: the shape of the primary input box features.
"""
if not self._feature_proj:
self._output_dimension = input_shapes[-1]
self._feature_proj = ContextProjection(self._output_dimension)
def call(self, box_features, context_features, valid_context_size):
"""Handles a call by performing attention.
Args:
box_features: A float Tensor of shape [batch_size, input_size,
num_input_features].
context_features: A float Tensor of shape [batch_size, context_size,
num_context_features].
      valid_context_size: An int32 Tensor of shape [batch_size].
Returns:
A float Tensor with shape [batch_size, input_size, num_input_features]
containing output features after attention with context features.
"""
_, context_size, _ = context_features.shape
valid_mask = compute_valid_mask(valid_context_size, context_size)
# Average pools over height and width dimension so that the shape of
# box_features becomes [batch_size, max_num_proposals, channels].
box_features = tf.reduce_mean(box_features, [2, 3])
queries = project_features(
box_features, self._bottleneck_dimension, self._is_training,
self._query_proj, normalize=True)
keys = project_features(
context_features, self._bottleneck_dimension, self._is_training,
self._key_proj, normalize=True)
values = project_features(
context_features, self._bottleneck_dimension, self._is_training,
self._val_proj, normalize=True)
weights = tf.matmul(queries, keys, transpose_b=True)
weights, values = filter_weight_value(weights, values, valid_mask)
weights = tf.nn.softmax(weights / self._attention_temperature)
features = tf.matmul(weights, values)
output_features = project_features(
features, self._output_dimension, self._is_training,
self._feature_proj, normalize=False)
output_features = output_features[:, :, tf.newaxis, tf.newaxis, :]
return output_features
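# Example usage (a minimal sketch mirroring the shapes used in the unit
# tests; the constants are illustrative):
#
#   block = AttentionBlock(bottleneck_dimension=16, attention_temperature=0.2)
#   box_features = tf.ones([2, 8, 4, 4, 64], tf.float32)
#   context_features = tf.ones([2, 20, 10], tf.float32)
#   valid_context_size = tf.constant([20, 5], tf.int32)
#   output = block(box_features, context_features, valid_context_size)
#
# `output` has shape [2, 8, 1, 1, 64]: height and width are average-pooled
# away, and since output_dimension is not given it defaults to the channel
# depth of `box_features` (64).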
def filter_weight_value(weights, values, valid_mask):
"""Filters weights and values based on valid_mask.
  _NEGATIVE_PADDING_VALUE will be added to invalid elements in the weights so
  that they do not contribute to the softmax. Invalid elements in the values
  will be set to 0.
Args:
weights: A float Tensor of shape [batch_size, input_size, context_size].
values: A float Tensor of shape [batch_size, context_size,
projected_dimension].
valid_mask: A boolean Tensor of shape [batch_size, context_size]. True means
valid and False means invalid.
Returns:
weights: A float Tensor of shape [batch_size, input_size, context_size].
values: A float Tensor of shape [batch_size, context_size,
projected_dimension].
Raises:
    ValueError: If the shapes of the inputs don't match.
"""
w_batch_size, _, w_context_size = weights.shape
v_batch_size, v_context_size, _ = values.shape
m_batch_size, m_context_size = valid_mask.shape
if w_batch_size != v_batch_size or v_batch_size != m_batch_size:
    raise ValueError('Please make sure the first dimension of the input'
                     ' tensors is the same.')
if w_context_size != v_context_size:
raise ValueError('Please make sure the third dimension of weights matches'
' the second dimension of values.')
if w_context_size != m_context_size:
raise ValueError('Please make sure the third dimension of the weights'
' matches the second dimension of the valid_mask.')
valid_mask = valid_mask[..., tf.newaxis]
# Force the invalid weights to be very negative so it won't contribute to
# the softmax.
weights += tf.transpose(
tf.cast(tf.math.logical_not(valid_mask), weights.dtype) *
_NEGATIVE_PADDING_VALUE,
perm=[0, 2, 1])
# Force the invalid values to be 0.
values *= tf.cast(valid_mask, values.dtype)
return weights, values
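# Example (a small illustrative case, not taken from this module):
#
#   weights = tf.ones([1, 2, 3], tf.float32)  # [batch, input_size, context_size]
#   values = tf.ones([1, 3, 4], tf.float32)   # [batch, context_size, proj_dim]
#   valid_mask = tf.constant([[True, True, False]])
#   weights, values = filter_weight_value(weights, values, valid_mask)
#
# The last context column of `weights` is shifted by _NEGATIVE_PADDING_VALUE,
# so it contributes (almost) nothing after softmax, and the last row of
# `values` is zeroed out.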
def project_features(features, bottleneck_dimension, is_training,
layer, normalize=True):
"""Projects features to another feature space.
Args:
features: A float Tensor of shape [batch_size, features_size,
num_features].
    bottleneck_dimension: An int32 Tensor.
    is_training: A boolean Tensor (affecting batch normalization).
    layer: Contains a custom layer specific to the particular operation
      being performed (key, value, query, features).
normalize: A boolean Tensor. If true, the output features will be l2
normalized on the last dimension.
Returns:
    A float Tensor of shape [batch_size, features_size, bottleneck_dimension].
"""
shape_arr = features.shape
batch_size, _, num_features = shape_arr
features = tf.reshape(features, [-1, num_features])
projected_features = layer(features, is_training)
projected_features = tf.reshape(projected_features,
[batch_size, -1, bottleneck_dimension])
if normalize:
projected_features = tf.keras.backend.l2_normalize(projected_features,
axis=-1)
return projected_features
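# Example (a minimal sketch with illustrative shapes):
#
#   layer = ContextProjection(projection_dimension=8)
#   feats = tf.ones([2, 5, 32], tf.float32)  # [batch, features_size, num_features]
#   out = project_features(feats, 8, is_training=False, layer=layer,
#                          normalize=True)
#
# `out` has shape [2, 5, 8] and each projected feature vector is
# l2-normalized along the last dimension.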
def compute_valid_mask(num_valid_elements, num_elements):
"""Computes mask of valid entries within padded context feature.
Args:
    num_valid_elements: An int32 Tensor of shape [batch_size].
num_elements: An int32 Tensor.
Returns:
A boolean Tensor of the shape [batch_size, num_elements]. True means
valid and False means invalid.
"""
batch_size = num_valid_elements.shape[0]
element_idxs = tf.range(num_elements, dtype=tf.int32)
batch_element_idxs = tf.tile(element_idxs[tf.newaxis, ...], [batch_size, 1])
num_valid_elements = num_valid_elements[..., tf.newaxis]
valid_mask = tf.less(batch_element_idxs, num_valid_elements)
return valid_mask
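# Example: compute_valid_mask(tf.constant([1, 2], tf.int32), 3) returns
# [[True, False, False], [True, True, False]] -- only the first
# num_valid_elements[i] entries of row i are marked valid.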
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for context_rcnn_lib."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import unittest
from absl.testing import parameterized
import tensorflow.compat.v1 as tf
from object_detection.meta_architectures import context_rcnn_lib_tf2 as context_rcnn_lib
from object_detection.utils import test_case
from object_detection.utils import tf_version
_NEGATIVE_PADDING_VALUE = -100000
@unittest.skipIf(tf_version.is_tf1(), 'Skipping TF2.X only test.')
class ContextRcnnLibTest(parameterized.TestCase, test_case.TestCase):
"""Tests for the functions in context_rcnn_lib."""
def test_compute_valid_mask(self):
num_elements = tf.constant(3, tf.int32)
    num_valid_elements = tf.constant((1, 2), tf.int32)
    valid_mask = context_rcnn_lib.compute_valid_mask(num_valid_elements,
                                                     num_elements)
num_elements)
expected_valid_mask = tf.constant([[1, 0, 0], [1, 1, 0]], tf.float32)
self.assertAllEqual(valid_mask, expected_valid_mask)
def test_filter_weight_value(self):
weights = tf.ones((2, 3, 2), tf.float32) * 4
values = tf.ones((2, 2, 4), tf.float32)
valid_mask = tf.constant([[True, True], [True, False]], tf.bool)
filtered_weights, filtered_values = context_rcnn_lib.filter_weight_value(
weights, values, valid_mask)
expected_weights = tf.constant([[[4, 4], [4, 4], [4, 4]],
[[4, _NEGATIVE_PADDING_VALUE + 4],
[4, _NEGATIVE_PADDING_VALUE + 4],
[4, _NEGATIVE_PADDING_VALUE + 4]]])
expected_values = tf.constant([[[1, 1, 1, 1], [1, 1, 1, 1]],
[[1, 1, 1, 1], [0, 0, 0, 0]]])
self.assertAllEqual(filtered_weights, expected_weights)
self.assertAllEqual(filtered_values, expected_values)
# Changes the valid_mask so the results will be different.
valid_mask = tf.constant([[True, True], [False, False]], tf.bool)
filtered_weights, filtered_values = context_rcnn_lib.filter_weight_value(
weights, values, valid_mask)
expected_weights = tf.constant(
[[[4, 4], [4, 4], [4, 4]],
[[_NEGATIVE_PADDING_VALUE + 4, _NEGATIVE_PADDING_VALUE + 4],
[_NEGATIVE_PADDING_VALUE + 4, _NEGATIVE_PADDING_VALUE + 4],
[_NEGATIVE_PADDING_VALUE + 4, _NEGATIVE_PADDING_VALUE + 4]]])
expected_values = tf.constant([[[1, 1, 1, 1], [1, 1, 1, 1]],
[[0, 0, 0, 0], [0, 0, 0, 0]]])
self.assertAllEqual(filtered_weights, expected_weights)
self.assertAllEqual(filtered_values, expected_values)
@parameterized.parameters((2, True, True), (2, False, True),
(10, True, False), (10, False, False))
def test_project_features(self, projection_dimension, is_training, normalize):
features = tf.ones([2, 3, 4], tf.float32)
projected_features = context_rcnn_lib.project_features(
features,
projection_dimension,
is_training,
context_rcnn_lib.ContextProjection(projection_dimension),
normalize=normalize)
# Makes sure the shape is correct.
self.assertAllEqual(projected_features.shape, [2, 3, projection_dimension])
@parameterized.parameters(
(2, 10, 1),
(3, 10, 2),
(4, None, 3),
(5, 20, 4),
(7, None, 5),
)
def test_attention_block(self, bottleneck_dimension, output_dimension,
attention_temperature):
input_features = tf.ones([2, 8, 3, 3, 3], tf.float32)
context_features = tf.ones([2, 20, 10], tf.float32)
attention_block = context_rcnn_lib.AttentionBlock(
bottleneck_dimension,
attention_temperature,
output_dimension=output_dimension,
is_training=False)
valid_context_size = tf.random_uniform((2,),
minval=0,
maxval=10,
dtype=tf.int32)
output_features = attention_block(input_features, context_features,
valid_context_size)
# Makes sure the shape is correct.
self.assertAllEqual(output_features.shape,
[2, 8, 1, 1, (output_dimension or 3)])
if __name__ == '__main__':
tf.test.main()
...@@ -27,7 +29,9 @@ import functools
from object_detection.core import standard_fields as fields
from object_detection.meta_architectures import context_rcnn_lib
from object_detection.meta_architectures import context_rcnn_lib_tf2
from object_detection.meta_architectures import faster_rcnn_meta_arch
from object_detection.utils import tf_version
class ContextRCNNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
...@@ -264,11 +266,17 @@ class ContextRCNNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
            return_raw_detections_during_predict),
        output_final_box_features=output_final_box_features)
    if tf_version.is_tf1():
      self._context_feature_extract_fn = functools.partial(
          context_rcnn_lib.compute_box_context_attention,
          bottleneck_dimension=attention_bottleneck_dimension,
          attention_temperature=attention_temperature,
          is_training=is_training)
    else:
      self._context_feature_extract_fn = context_rcnn_lib_tf2.AttentionBlock(
          bottleneck_dimension=attention_bottleneck_dimension,
          attention_temperature=attention_temperature,
          is_training=is_training)
  @staticmethod
  def get_side_inputs(features):
...@@ -323,8 +331,9 @@ class ContextRCNNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
    Returns:
      A float32 Tensor with shape [K, new_height, new_width, depth].
    """
    box_features = self._crop_and_resize_fn(
        [features_to_crop], proposal_boxes_normalized, None,
        [self._initial_crop_size, self._initial_crop_size])
    attention_features = self._context_feature_extract_fn(
......