"build_tools/packaging/README.md" did not exist on "289f08af6af01e7d89644ed334435333fdd769b3"
Commit c308c03c authored by Mehdi Sharifzadeh's avatar Mehdi Sharifzadeh Committed by Taylor Robie
Browse files

Mask R-CNN model added to models/research/mlperf_object_detection/Mask_RCNN (#4678)

* Create README.md

* readme changed

* readme changed

* ResNet backbone completed.

* FPN added

* Create README.md

* initial commit

* files removed

* initial commit

* protobuf file removed
parent 32e7d660
# Installation
## Dependencies
The Tensorflow Object Detection API depends on the following libraries:
* Protobuf 3+
* Python-tk
* Pillow 1.0
* lxml
* tf Slim (which is included in the "tensorflow/models/research/" checkout)
* Jupyter notebook
* Matplotlib
* Tensorflow
* Cython
* cocoapi
For detailed steps to install Tensorflow, follow the [Tensorflow installation
instructions](https://www.tensorflow.org/install/). A typical user can install
Tensorflow using one of the following commands:
``` bash
# For CPU
pip install tensorflow
# For GPU
pip install tensorflow-gpu
```
The remaining libraries can be installed on Ubuntu 16.04 via apt-get:
``` bash
sudo apt-get install protobuf-compiler python-pil python-lxml python-tk
sudo pip install Cython
sudo pip install jupyter
sudo pip install matplotlib
```
Alternatively, users can install dependencies using pip:
``` bash
sudo pip install Cython
sudo pip install pillow
sudo pip install lxml
sudo pip install jupyter
sudo pip install matplotlib
```
## COCO API installation
Download the
<a href="https://github.com/cocodataset/cocoapi" target=_blank>cocoapi</a> and
copy the pycocotools subfolder to the tensorflow/models/research directory if
you are interested in using COCO evaluation metrics. The default metrics are
based on those used in Pascal VOC evaluation. To use the COCO object detection
metrics add `metrics_set: "coco_detection_metrics"` to the `eval_config` message
in the config file. To use the COCO instance segmentation metrics add
`metrics_set: "coco_mask_metrics"` to the `eval_config` message in the config
file.
```bash
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make
cp -r pycocotools <path_to_tensorflow>/models/research/
```
## Protobuf Compilation
The Tensorflow Object Detection API uses Protobufs to configure model and
training parameters. Before the framework can be used, the Protobuf libraries
must be compiled. This should be done by running the following command from
the tensorflow/models/research/ directory:
``` bash
# From tensorflow/models/research/
protoc object_detection/protos/*.proto --python_out=.
```
## Add Libraries to PYTHONPATH
When running locally, the tensorflow/models/research/ and slim directories
should be appended to PYTHONPATH. This can be done by running the following from
tensorflow/models/research/:
``` bash
# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
```
Note: This command needs to be run in every new terminal you start. If you wish
to avoid running this manually, you can add it as a new line to the end of your
~/.bashrc file, replacing \`pwd\` with the absolute path of
tensorflow/models/research on your system.
# Testing the Installation
You can test that you have correctly installed the Tensorflow Object Detection
API by running the following command:
```bash
python object_detection/builders/model_builder_test.py
```
## Run an Instance Segmentation Model
For some applications it is not enough to localize an object with a
simple bounding box. For instance, you might want to segment an object region
once it is detected. This class of problems is called **instance segmentation**.
<p align="center">
<img src="img/kites_with_segment_overlay.png" width=676 height=450>
</p>
### Materializing data for instance segmentation {#materializing-instance-seg}
Instance segmentation is an extension of object detection, where a binary mask
(i.e. object vs. background) is associated with every bounding box. This allows
for more fine-grained information about the extent of the object within the box.
To train an instance segmentation model, a groundtruth mask must be supplied for
every groundtruth bounding box. In addition to the proto fields listed in the
section titled [Using your own dataset](using_your_own_dataset.md), one must
also supply `image/object/mask`, which can either be a repeated list of
single-channel encoded PNG strings, or a single dense 3D binary tensor where
masks corresponding to each object are stacked along the first dimension. Each
is described in more detail below.
#### PNG Instance Segmentation Masks
Instance segmentation masks can be supplied as serialized PNG images.
```shell
image/object/mask = ["\x89PNG\r\n\x1A\n\x00\x00\x00\rIHDR\...", ...]
```
These masks are whole-image masks, one for each object instance. The spatial
dimensions of each mask must agree with the image. Each mask has only a single
channel, and the pixel values are either 0 (background) or 1 (object mask).
**PNG masks are the preferred parameterization since they offer considerable
space savings compared to dense numerical masks.**
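As an illustration only, the sketch below shows one way such PNG strings could be
produced from binary numpy masks. `encode_png_masks` is a hypothetical helper (it
is not part of the API); it reuses `dataset_util.bytes_list_feature` from
`object_detection/utils` and the Pillow dependency listed above.
```python
import io

from PIL import Image

from object_detection.utils import dataset_util


def encode_png_masks(masks):
  """Encodes binary instance masks as single-channel PNG strings.

  Args:
    masks: uint8 numpy array of shape [num_instances, height, width] with
      values in {0, 1}, one whole-image mask per groundtruth box.

  Returns:
    A bytes_list Feature suitable for the 'image/object/mask' key.
  """
  encoded = []
  for mask in masks:
    output = io.BytesIO()
    # Save each mask as a single-channel (mode 'L') PNG image.
    Image.fromarray(mask, mode='L').save(output, format='PNG')
    encoded.append(output.getvalue())
  return dataset_util.bytes_list_feature(encoded)
```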
#### Dense Numerical Instance Segmentation Masks
Masks can also be specified via a dense numerical tensor.
```shell
image/object/mask = [0.0, 0.0, 1.0, 1.0, 0.0, ...]
```
For an image with dimensions `H` x `W` and `num_boxes` groundtruth boxes, the
mask corresponds to a [`num_boxes`, `H`, `W`] float32 tensor, flattened into a
single vector of shape `num_boxes` * `H` * `W`. In TensorFlow, examples are read
in row-major format, so the elements are organized as:
```shell
... mask 0 row 0 ... mask 0 row 1 ... // ... mask 0 row H-1 ... mask 1 row 0 ...
```
where each row has W contiguous binary values.
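As a small sketch (with made-up shapes), the ordering above is simply a row-major
flattening of the `[num_boxes, H, W]` mask tensor:
```python
import numpy as np

from object_detection.utils import dataset_util

num_boxes, height, width = 2, 4, 5
masks = np.zeros((num_boxes, height, width), dtype=np.float32)
masks[0, 1:3, 2:4] = 1.0  # a small rectangular mask for the first box

# Row-major flattening: mask 0 row 0, mask 0 row 1, ..., mask 1 row 0, ...
flat_masks = masks.flatten()
assert flat_masks.shape[0] == num_boxes * height * width
mask_feature = dataset_util.float_list_feature(flat_masks.tolist())
```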
For an example of tf-records with mask labels, see the examples under the
[Preparing Inputs](preparing_inputs.md) section.
### Pre-existing config files
We provide four instance segmentation config files that you can use to train
your own models:
1. <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_inception_resnet_v2_atrous_coco.config" target=_blank>mask_rcnn_inception_resnet_v2_atrous_coco</a>
1. <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_resnet101_atrous_coco.config" target=_blank>mask_rcnn_resnet101_atrous_coco</a>
1. <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_resnet50_atrous_coco.config" target=_blank>mask_rcnn_resnet50_atrous_coco</a>
1. <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/mask_rcnn_inception_v2_coco.config" target=_blank>mask_rcnn_inception_v2_coco</a>
For more details see the [detection model zoo](detection_model_zoo.md).
### Updating a Faster R-CNN config file
Currently, the only supported instance segmentation model is [Mask
R-CNN](https://arxiv.org/abs/1703.06870), which requires Faster R-CNN as the
backbone object detector.
Once you have a baseline Faster R-CNN pipeline configuration, you can make the
following modifications in order to convert it into a Mask R-CNN model.
1. Within `train_input_reader` and `eval_input_reader`, set
`load_instance_masks` to `True`. If using PNG masks, set `mask_type` to
`PNG_MASKS`; otherwise you can leave it as the default `NUMERICAL_MASKS`.
1. Within the `faster_rcnn` config, use a `MaskRCNNBoxPredictor` as the
`second_stage_box_predictor`.
1. Within the `MaskRCNNBoxPredictor` message, set `predict_instance_masks` to
`True`. You must also define `conv_hyperparams`.
1. Within the `faster_rcnn` message, set `number_of_stages` to `3`.
1. Add instance segmentation metrics to the set of metrics:
`'coco_mask_metrics'`.
1. Update the `input_path`s to point at your data.
Please refer to the section on [Running the pets dataset](running_pets.md) for
additional details.
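As a rough sketch only, and assuming the proto field names match the sample
configs linked above (e.g. `number_of_stages`, `mask_rcnn_box_predictor`), the
edits listed above could also be scripted against the pipeline proto; the config
file names here are placeholders:
```python
from google.protobuf import text_format

from object_detection.protos import input_reader_pb2
from object_detection.protos import pipeline_pb2

pipeline = pipeline_pb2.TrainEvalPipelineConfig()
with open('faster_rcnn_baseline.config') as f:  # placeholder baseline config
  text_format.Merge(f.read(), pipeline)

for reader in (pipeline.train_input_reader, pipeline.eval_input_reader):
  reader.load_instance_masks = True
  reader.mask_type = input_reader_pb2.PNG_MASKS  # or NUMERICAL_MASKS

faster_rcnn = pipeline.model.faster_rcnn
faster_rcnn.number_of_stages = 3
mask_predictor = faster_rcnn.second_stage_box_predictor.mask_rcnn_box_predictor
mask_predictor.predict_instance_masks = True
# conv_hyperparams must also be defined on the MaskRCNNBoxPredictor; see the
# sample mask_rcnn_*.config files for typical values.

pipeline.eval_config.metrics_set.append('coco_mask_metrics')

with open('mask_rcnn_pipeline.config', 'w') as f:  # placeholder output path
  f.write(text_format.MessageToString(pipeline))
```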
> Note: The mask prediction branch consists of a sequence of convolution layers.
> You can set the number of convolution layers and their depth as follows:
>
> 1. Within the `MaskRCNNBoxPredictor` message, set the
> `mask_prediction_conv_depth` to your value of interest. The default value
> is 256. If you set it to `0` (recommended), the depth is computed
> automatically based on the number of classes in the dataset.
> 1. Within the `MaskRCNNBoxPredictor` message, set the
> `mask_prediction_num_conv_layers` to your value of interest. The default
> value is 2.
# Inference and evaluation on the Open Images dataset
This page presents a tutorial for running object detector inference and
evaluation measure computations on the [Open Images
dataset](https://github.com/openimages/dataset), using tools from the
[TensorFlow Object Detection
API](https://github.com/tensorflow/models/tree/master/research/object_detection).
It shows how to download the images and annotations for the validation and test
sets of Open Images; how to package the downloaded data in a format understood
by the Object Detection API; where to find a trained object detector model for
Open Images; how to run inference; and how to compute evaluation measures on the
inferred detections.
Inferred detections will look like the following:
![](img/oid_bus_72e19c28aac34ed8.jpg){height="300"}
![](img/oid_monkey_3b4168c89cecbc5b.jpg){height="300"}
On the validation set of Open Images, this tutorial requires 27GB of free disk
space and the inference step takes approximately 9 hours on a single NVIDIA
Tesla P100 GPU. On the test set -- 75GB and 27 hours respectively. All other
steps require less than two hours in total on both sets.
## Installing TensorFlow, the Object Detection API, and Google Cloud SDK
Please run through the [installation instructions](installation.md) to install
TensorFlow and all its dependencies. Ensure the Protobuf libraries are compiled
and the library directories are added to `PYTHONPATH`. You will also need to
`pip` install `pandas` and `contextlib2`.
Some of the data used in this tutorial lives in Google Cloud buckets. To access
it, you will have to [install the Google Cloud
SDK](https://cloud.google.com/sdk/downloads) on your workstation or laptop.
## Preparing the Open Images validation and test sets
In order to run inference and subsequent evaluation measure computations, we
require a dataset of images and ground truth boxes, packaged as TFRecords of
TFExamples. To create such a dataset for Open Images, you will need to first
download ground truth boxes from the [Open Images
website](https://github.com/openimages/dataset):
```bash
# From tensorflow/models/research
mkdir oid
cd oid
wget https://storage.googleapis.com/openimages/2017_07/annotations_human_bbox_2017_07.tar.gz
tar -xvf annotations_human_bbox_2017_07.tar.gz
```
Next, download the images. In this tutorial, we will use lower resolution images
provided by [CVDF](http://www.cvdfoundation.org). Please follow the instructions
on [CVDF's Open Images repository
page](https://github.com/cvdfoundation/open-images-dataset) in order to gain
access to the cloud bucket with the images. Then run:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # Set SPLIT to "test" to download the images in the test set
mkdir raw_images_${SPLIT}
gsutil -m rsync -r gs://open-images-dataset/$SPLIT raw_images_${SPLIT}
```
Another option for downloading the images is to follow the URLs contained in the
[image URLs and metadata CSV
files](https://storage.googleapis.com/openimages/2017_07/images_2017_07.tar.gz)
on the Open Images website.
At this point, your `tensorflow/models/research/oid` directory should appear as
follows:
```lang-none
|-- 2017_07
| |-- test
| | `-- annotations-human-bbox.csv
| |-- train
| | `-- annotations-human-bbox.csv
| `-- validation
| `-- annotations-human-bbox.csv
|-- raw_images_validation (if you downloaded the validation split)
| `-- ... (41,620 files matching regex "[0-9a-f]{16}.jpg")
|-- raw_images_test (if you downloaded the test split)
| `-- ... (125,436 files matching regex "[0-9a-f]{16}.jpg")
`-- annotations_human_bbox_2017_07.tar.gz
```
Next, package the data into TFRecords of TFExamples by running:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # Set SPLIT to "test" to create TFRecords for the test split
mkdir ${SPLIT}_tfrecords
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.dataset_tools.create_oid_tf_record \
--input_box_annotations_csv 2017_07/$SPLIT/annotations-human-bbox.csv \
--input_images_directory raw_images_${SPLIT} \
--input_label_map ../object_detection/data/oid_bbox_trainable_label_map.pbtxt \
--output_tf_record_path_prefix ${SPLIT}_tfrecords/$SPLIT.tfrecord \
--num_shards=100
```
This results in 100 TFRecord files (shards), written to
`oid/${SPLIT}_tfrecords`, with filenames matching
`${SPLIT}.tfrecord-000[0-9][0-9]-of-00100`. Each shard contains approximately
the same number of images and is de facto a representative random sample of the
input data. [This enables](#accelerating_inference) a straightforward work
division scheme for distributing inference and also approximate measure
computations on subsets of the validation and test sets.
## Inferring detections
Inference requires a trained object detection model. In this tutorial we will
use a model from the [detection model zoo](detection_model_zoo.md), which can
be downloaded and unpacked by running the commands below. More information about
the model, such as its architecture and how it was trained, is available in the
[model zoo page](detection_model_zoo.md).
```bash
# From tensorflow/models/research/oid
wget http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_oid_14_10_2017.tar.gz
tar -zxvf faster_rcnn_inception_resnet_v2_atrous_oid_14_10_2017.tar.gz
```
At this point, data is packed into TFRecords and we have an object detector
model. We can run inference using:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
TF_RECORD_FILES=$(ls -1 ${SPLIT}_tfrecords/* | tr '\n' ',')
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.inference.infer_detections \
--input_tfrecord_paths=$TF_RECORD_FILES \
--output_tfrecord_path=${SPLIT}_detections.tfrecord-00000-of-00001 \
--inference_graph=faster_rcnn_inception_resnet_v2_atrous_oid/frozen_inference_graph.pb \
--discard_image_pixels
```
Inference preserves all fields of the input TFExamples, and adds new fields to
store the inferred detections. This allows [computing evaluation
measures](#compute_evaluation_measures) on the output TFRecord alone, as ground
truth boxes are preserved as well. Since measure computations don't require
access to the images, `infer_detections` can optionally discard them with the
`--discard_image_pixels` flag. Discarding the images drastically reduces the
size of the output TFRecord.
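For example, a quick way to inspect the added fields is to iterate over the
output TFRecord. The sketch below assumes the single-shard output file produced
above and uses the detection field names written by `infer_detections`:
```python
import tensorflow as tf

path = 'validation_detections.tfrecord-00000-of-00001'
for record in tf.python_io.tf_record_iterator(path):
  example = tf.train.Example.FromString(record)
  feature = example.features.feature
  # Box coordinates live under image/detection/bbox/{ymin,xmin,ymax,xmax}.
  scores = feature['image/detection/score'].float_list.value
  labels = feature['image/detection/label'].int64_list.value
  print('%d detections; first label %s; top score %s'
        % (len(scores), labels[0] if labels else None,
           max(scores) if scores else None))
  break  # inspect only the first example
```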
### Accelerating inference {#accelerating_inference}
Running inference on the whole validation or test set can take a long time to
complete due to the large number of images present in these sets (41,620 and
125,436 respectively). For quick but approximate evaluation, inference and the
subsequent measure computations can be run on a small number of shards. To run,
for example, on 2% of all the data, it is enough to set `TF_RECORD_FILES` as
shown below before running `infer_detections`:
```bash
TF_RECORD_FILES=$(ls ${SPLIT}_tfrecords/${SPLIT}.tfrecord-0000[0-1]-of-00100 | tr '\n' ',')
```
Please note that computing evaluation measures on a small subset of the data
introduces variance and bias, since some classes of objects won't be seen during
evaluation. In the example above, this leads to a mAP on the first two shards of
the validation set that is 13.2 points higher than the mAP for the full set ([see mAP
results](#expected-maps)).
Another way to accelerate inference is to run it in parallel on multiple
TensorFlow devices on possibly multiple machines. The script below uses
[tmux](https://github.com/tmux/tmux/wiki) to run a separate `infer_detections`
process for each GPU, each on a different partition of the input data.
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
NUM_GPUS=4
NUM_SHARDS=100
tmux new-session -d -s "inference"
function tmux_start { tmux new-window -d -n "inference:GPU$1" "${*:2}; exec bash"; }
for gpu_index in $(seq 0 $(($NUM_GPUS-1))); do
start_shard=$(( $gpu_index * $NUM_SHARDS / $NUM_GPUS ))
end_shard=$(( ($gpu_index + 1) * $NUM_SHARDS / $NUM_GPUS - 1))
TF_RECORD_FILES=$(seq -s, -f "${SPLIT}_tfrecords/${SPLIT}.tfrecord-%05.0f-of-$(printf '%05d' $NUM_SHARDS)" $start_shard $end_shard)
tmux_start ${gpu_index} \
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) CUDA_VISIBLE_DEVICES=$gpu_index \
python -m object_detection.inference.infer_detections \
--input_tfrecord_paths=$TF_RECORD_FILES \
--output_tfrecord_path=${SPLIT}_detections.tfrecord-$(printf "%05d" $gpu_index)-of-$(printf "%05d" $NUM_GPUS) \
--inference_graph=faster_rcnn_inception_resnet_v2_atrous_oid/frozen_inference_graph.pb \
--discard_image_pixels
done
```
After all `infer_detections` processes finish, `tensorflow/models/research/oid`
will contain one output TFRecord from each process, with names matching
`validation_detections.tfrecord-0000[0-3]-of-00004`.
## Computing evaluation measures {#compute_evaluation_measures}
To compute evaluation measures on the inferred detections you first need to
create the appropriate configuration files:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
NUM_SHARDS=1 # Set to NUM_GPUS if using the parallel inference script above
mkdir -p ${SPLIT}_eval_metrics
echo "
label_map_path: '../object_detection/data/oid_bbox_trainable_label_map.pbtxt'
tf_record_input_reader: { input_path: '${SPLIT}_detections.tfrecord@${NUM_SHARDS}' }
" > ${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt
echo "
metrics_set: 'open_images_V2_detection_metrics'
" > ${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt
```
And then run:
```bash
# From tensorflow/models/research/oid
SPLIT=validation # or test
PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection.metrics.offline_eval_map_corloc \
--eval_dir=${SPLIT}_eval_metrics \
--eval_config_path=${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt \
--input_config_path=${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt
```
The first configuration file contains an `object_detection.protos.InputReader`
message that describes the location of the necessary input files. The second
file contains an `object_detection.protos.EvalConfig` message that describes the
evaluation metric. For more information about these protos see the corresponding
source files.
### Expected mAPs {#expected-maps}
The result of running `offline_eval_map_corloc` is a CSV file located at
`${SPLIT}_eval_metrics/metrics.csv`. With the above configuration, the file will
contain average precision at IoU≥0.5 for each of the classes present in the
dataset. It will also contain the mAP@IoU≥0.5. Both the per-class average
precisions and the mAP are computed according to the [Open Images evaluation
protocol](evaluation_protocols.md). The expected mAPs for the validation and
test sets of Open Images in this case are:
Set | Fraction of data | Images | mAP@IoU≥0.5
---------: | :--------------: | :-----: | -----------
validation | everything | 41,620 | 39.2%
validation | first 2 shards | 884 | 52.4%
test | everything | 125,436 | 37.7%
test | first 2 shards | 2,476 | 50.8%
# Preparing Inputs
The Tensorflow Object Detection API reads data using the TFRecord file format. Two
sample scripts (`create_pascal_tf_record.py` and `create_pet_tf_record.py`) are
provided to convert from the PASCAL VOC dataset and Oxford-IIIT Pet dataset to
TFRecords.
## Generating the PASCAL VOC TFRecord files.
The raw 2012 PASCAL VOC data set is located
[here](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar).
To download, extract, and convert it to TFRecords, run the following commands:
```bash
# From tensorflow/models/research/
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar
python object_detection/dataset_tools/create_pascal_tf_record.py \
--label_map_path=object_detection/data/pascal_label_map.pbtxt \
--data_dir=VOCdevkit --year=VOC2012 --set=train \
--output_path=pascal_train.record
python object_detection/dataset_tools/create_pascal_tf_record.py \
--label_map_path=object_detection/data/pascal_label_map.pbtxt \
--data_dir=VOCdevkit --year=VOC2012 --set=val \
--output_path=pascal_val.record
```
You should end up with two TFRecord files named `pascal_train.record` and
`pascal_val.record` in the `tensorflow/models/research/` directory.
The label map for the PASCAL VOC data set can be found at
`object_detection/data/pascal_label_map.pbtxt`.
## Generating the Oxford-IIIT Pet TFRecord files.
The Oxford-IIIT Pet data set is located
[here](http://www.robots.ox.ac.uk/~vgg/data/pets/). To download, extract and
convert it to TFRecords, run the following commands:
```bash
# From tensorflow/models/research/
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
tar -xvf annotations.tar.gz
tar -xvf images.tar.gz
python object_detection/dataset_tools/create_pet_tf_record.py \
--label_map_path=object_detection/data/pet_label_map.pbtxt \
--data_dir=`pwd` \
--output_dir=`pwd`
```
You should end up with two TFRecord files named `pet_train.record` and
`pet_val.record` in the `tensorflow/models/research/` directory.
The label map for the Pet dataset can be found at
`object_detection/data/pet_label_map.pbtxt`.
# Running Locally
This page walks through the steps required to train an object detection model
on a local machine. It assumes the reader has completed the
following prerequisites:
1. The Tensorflow Object Detection API has been installed as documented in the
[installation instructions](installation.md). This includes installing library
dependencies, compiling the configuration protobufs and setting up the Python
environment.
2. A valid data set has been created. See [this page](preparing_inputs.md) for
instructions on how to generate a dataset for the PASCAL VOC challenge or the
Oxford-IIIT Pet dataset.
3. An Object Detection pipeline configuration has been written. See
[this page](configuring_jobs.md) for details on how to write a pipeline configuration.
## Recommended Directory Structure for Training and Evaluation
```
+data
-label_map file
-train TFRecord file
-eval TFRecord file
+models
+ model
-pipeline config file
+train
+eval
```
## Running the Training Job
A local training job can be run with the following command:
```bash
# From the tensorflow/models/research/ directory
python object_detection/train.py \
--logtostderr \
--pipeline_config_path=${PATH_TO_YOUR_PIPELINE_CONFIG} \
--train_dir=${PATH_TO_TRAIN_DIR}
```
where `${PATH_TO_YOUR_PIPELINE_CONFIG}` points to the pipeline config and
`${PATH_TO_TRAIN_DIR}` points to the directory to which training checkpoints
and events will be written. By default, the training job will
run indefinitely until the user kills it.
## Running the Evaluation Job
Evaluation is run as a separate job. The eval job will periodically poll the
train directory for new checkpoints and evaluate them on a test dataset. The
job can be run using the following command:
```bash
# From the tensorflow/models/research/ directory
python object_detection/eval.py \
--logtostderr \
--pipeline_config_path=${PATH_TO_YOUR_PIPELINE_CONFIG} \
--checkpoint_dir=${PATH_TO_TRAIN_DIR} \
--eval_dir=${PATH_TO_EVAL_DIR}
```
where `${PATH_TO_YOUR_PIPELINE_CONFIG}` points to the pipeline config,
`${PATH_TO_TRAIN_DIR}` points to the directory in which training checkpoints
were saved (same as the training job) and `${PATH_TO_EVAL_DIR}` points to the
directory in which evaluation events will be saved. As with the training job,
the eval job runs until terminated by default.
## Running Tensorboard
Progress for training and eval jobs can be inspected using Tensorboard. If
using the recommended directory structure, Tensorboard can be run using the
following command:
```bash
tensorboard --logdir=${PATH_TO_MODEL_DIRECTORY}
```
where `${PATH_TO_MODEL_DIRECTORY}` points to the directory that contains the
train and eval directories. Please note it may take Tensorboard a couple of minutes
to populate with data.
# Quick Start: Jupyter notebook for off-the-shelf inference
If you'd like to hit the ground running and run detection on a few example
images right out of the box, we recommend trying out the Jupyter notebook demo.
To run the Jupyter notebook, run the following command from
`tensorflow/models/research/object_detection`:
```
# From tensorflow/models/research/object_detection
jupyter notebook
```
The notebook should open in your favorite web browser. Click the
[`object_detection_tutorial.ipynb`](../object_detection_tutorial.ipynb) link to
open the demo.
# Running on Google Cloud Platform
The Tensorflow Object Detection API supports distributed training on Google
Cloud ML Engine. This section documents instructions on how to train and
evaluate your model using Cloud ML. The reader should complete the following
prerequisites:
1. The reader has created and configured a project on Google Cloud Platform.
See [the Cloud ML quick start guide](https://cloud.google.com/ml-engine/docs/quickstarts/command-line).
2. The reader has installed the Tensorflow Object Detection API as documented
in the [installation instructions](installation.md).
3. The reader has a valid data set and stored it in a Google Cloud Storage
bucket. See [this page](preparing_inputs.md) for instructions on how to generate
a dataset for the PASCAL VOC challenge or the Oxford-IIIT Pet dataset.
4. The reader has configured a valid Object Detection pipeline, and stored it
in a Google Cloud Storage bucket. See [this page](configuring_jobs.md) for
details on how to write a pipeline configuration.
Additionally, it is recommended users test their job by running training and
evaluation jobs for a few iterations
[locally on their own machines](running_locally.md).
## Packaging
In order to run the Tensorflow Object Detection API on Cloud ML, it must be
packaged (along with its TF-Slim dependency). The required packages can be
created with the following commands:
``` bash
# From tensorflow/models/research/
python setup.py sdist
(cd slim && python setup.py sdist)
```
This will create Python packages at `dist/object_detection-0.1.tar.gz` and
`slim/dist/slim-0.1.tar.gz`.
## Running a Multiworker Training Job
Google Cloud ML requires a YAML configuration file for a multiworker training
job using GPUs. A sample YAML file is given below:
```
trainingInput:
runtimeVersion: "1.2"
scaleTier: CUSTOM
masterType: standard_gpu
workerCount: 9
workerType: standard_gpu
parameterServerCount: 3
parameterServerType: standard
```
Please keep the following guidelines in mind when writing the YAML
configuration:
* A job with n workers will have n + 1 training machines (n workers + 1 master).
* The number of parameter servers used should be an odd number to prevent
a parameter server from storing only weight variables or only bias variables
(due to round robin parameter scheduling).
* The learning rate in the training config should be decreased when using a
larger number of workers. Some experimentation is required to find the
optimal learning rate.
The YAML file should be saved on the local machine (not on GCP). Once it has
been written, a user can start a training job on Cloud ML Engine using the
following command:
``` bash
# From tensorflow/models/research/
gcloud ml-engine jobs submit training object_detection_`date +%s` \
--runtime-version 1.2 \
--job-dir=gs://${TRAIN_DIR} \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train \
--region us-central1 \
--config ${PATH_TO_LOCAL_YAML_FILE} \
-- \
--train_dir=gs://${TRAIN_DIR} \
--pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
```
Where `${PATH_TO_LOCAL_YAML_FILE}` is the local path to the YAML configuration,
`gs://${TRAIN_DIR}` specifies the directory on Google Cloud Storage where the
training checkpoints and events will be written to and
`gs://${PIPELINE_CONFIG_PATH}` points to the pipeline configuration stored on
Google Cloud Storage.
Users can monitor the progress of their training job on the [ML Engine
Dashboard](https://console.cloud.google.com/mlengine/jobs).
Note: This sample is supported for use with the 1.2 runtime version.
## Running an Evaluation Job on Cloud
Evaluation jobs run on a single machine, so it is not necessary to write a YAML
configuration for evaluation. Run the following command to start the evaluation
job:
``` bash
gcloud ml-engine jobs submit training object_detection_eval_`date +%s` \
--runtime-version 1.2 \
--job-dir=gs://${TRAIN_DIR} \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.eval \
--region us-central1 \
--scale-tier BASIC_GPU \
-- \
--checkpoint_dir=gs://${TRAIN_DIR} \
--eval_dir=gs://${EVAL_DIR} \
--pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
```
Where `gs://${TRAIN_DIR}` points to the directory on Google Cloud Storage where
training checkpoints are saved (same as the training job), `gs://${EVAL_DIR}`
points to where evaluation events will be saved on Google Cloud Storage and
`gs://${PIPELINE_CONFIG_PATH}` points to where the pipeline configuration is
stored on Google Cloud Storage.
## Running Tensorboard
You can run Tensorboard locally on your own machine to view progress of your
training and eval jobs on Google Cloud ML. Run the following command to start
Tensorboard:
``` bash
tensorboard --logdir=gs://${YOUR_CLOUD_BUCKET}
```
Note it may take Tensorboard a few minutes to populate with results.
# Quick Start: Distributed Training on the Oxford-IIIT Pets Dataset on Google Cloud
This page is a walkthrough for training an object detector using the Tensorflow
Object Detection API. In this tutorial, we'll be training on the Oxford-IIIT Pets
dataset to build a system to detect various breeds of cats and dogs. The output
of the detector will look like the following:
![](img/oxford_pet.png)
## Setting up a Project on Google Cloud
To accelerate the process, we'll run training and evaluation on [Google Cloud
ML Engine](https://cloud.google.com/ml-engine/) to leverage multiple GPUs. To
begin, you will have to set up Google Cloud via the following steps (if you have
already done this, feel free to skip to the next section):
1. [Create a GCP project](https://cloud.google.com/resource-manager/docs/creating-managing-projects).
2. [Install the Google Cloud SDK](https://cloud.google.com/sdk/downloads) on
your workstation or laptop.
This will provide the tools you need to upload files to Google Cloud Storage and
start ML training jobs.
3. [Enable the ML Engine
APIs](https://console.cloud.google.com/flows/enableapi?apiid=ml.googleapis.com,compute_component&_ga=1.73374291.1570145678.1496689256).
By default, a new GCP project does not enable APIs to start ML Engine training
jobs. Use the above link to explicitly enable them.
4. [Set up a Google Cloud Storage (GCS)
bucket](https://cloud.google.com/storage/docs/creating-buckets). ML Engine
training jobs can only access files on a Google Cloud Storage bucket. In this
tutorial, we'll be required to upload our dataset and configuration to GCS.
Please remember the name of your GCS bucket, as we will reference it multiple
times in this document. Substitute `${YOUR_GCS_BUCKET}` with the name of
your bucket in this document. For your convenience, you should define the
environment variable below:
``` bash
export YOUR_GCS_BUCKET=${YOUR_GCS_BUCKET}
```
It is also possible to run locally by following
[the running locally instructions](running_locally.md).
## Installing Tensorflow and the Tensorflow Object Detection API
Please run through the [installation instructions](installation.md) to install
Tensorflow and all its dependencies. Ensure the Protobuf libraries are
compiled and the library directories are added to `PYTHONPATH`.
## Getting the Oxford-IIIT Pets Dataset and Uploading it to Google Cloud Storage
In order to train a detector, we require a dataset of images, bounding boxes and
classifications. For this demo, we'll use the Oxford-IIIT Pets dataset. The raw
dataset for Oxford-IIIT Pets lives
[here](http://www.robots.ox.ac.uk/~vgg/data/pets/). You will need to download
both the image dataset [`images.tar.gz`](http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz)
and the groundtruth data [`annotations.tar.gz`](http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz)
to the `tensorflow/models/research/` directory and unzip them. This may take
some time.
``` bash
# From tensorflow/models/research/
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
tar -xvf images.tar.gz
tar -xvf annotations.tar.gz
```
After downloading the tarballs, your `tensorflow/models/research/` directory
should appear as follows:
```lang-none
- images.tar.gz
- annotations.tar.gz
+ images/
+ annotations/
+ object_detection/
... other files and directories
```
The Tensorflow Object Detection API expects data to be in the TFRecord format,
so we'll now run the `create_pet_tf_record` script to convert from the raw
Oxford-IIIT Pet dataset into TFRecords. Run the following commands from the
`tensorflow/models/research/` directory:
``` bash
# From tensorflow/models/research/
python object_detection/dataset_tools/create_pet_tf_record.py \
--label_map_path=object_detection/data/pet_label_map.pbtxt \
--data_dir=`pwd` \
--output_dir=`pwd`
```
Note: It is normal to see some warnings when running this script. You may ignore
them.
Two TFRecord files named `pet_train.record` and `pet_val.record` should be
generated in the `tensorflow/models/research/` directory.
Now that the data has been generated, we'll need to upload it to Google Cloud
Storage so the data can be accessed by ML Engine. Run the following command to
copy the files into your GCS bucket (substituting `${YOUR_GCS_BUCKET}`):
``` bash
# From tensorflow/models/research/
gsutil cp pet_train.record gs://${YOUR_GCS_BUCKET}/data/pet_train.record
gsutil cp pet_val.record gs://${YOUR_GCS_BUCKET}/data/pet_val.record
gsutil cp object_detection/data/pet_label_map.pbtxt gs://${YOUR_GCS_BUCKET}/data/pet_label_map.pbtxt
```
Please remember the path where you upload the data to, as we will need this
information when configuring the pipeline in a following step.
## Downloading a COCO-pretrained Model for Transfer Learning
Training a state-of-the-art object detector from scratch can take days, even
when using multiple GPUs! In order to speed up training, we'll take an object
detector trained on a different dataset (COCO), and reuse some of its
parameters to initialize our new model.
Download our [COCO-pretrained Faster R-CNN with Resnet-101
model](http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz).
Extract the archive and copy the `model.ckpt*` files into your GCS
bucket.
``` bash
wget http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz
tar -xvf faster_rcnn_resnet101_coco_11_06_2017.tar.gz
gsutil cp faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.* gs://${YOUR_GCS_BUCKET}/data/
```
Remember the path where you uploaded the model checkpoint to, as we will need it
in the following step.
## Configuring the Object Detection Pipeline
In the Tensorflow Object Detection API, the model parameters, training
parameters and eval parameters are all defined by a config file. More details
can be found [here](configuring_jobs.md). For this tutorial, we will use some
predefined templates provided with the source code. In the
`object_detection/samples/configs` folder, there are skeleton object_detection
configuration files. We will use `faster_rcnn_resnet101_pets.config` as a
starting point for configuring the pipeline. Open the file with your favourite
text editor.
We'll need to configure some paths in order for the template to work. Search the
file for instances of `PATH_TO_BE_CONFIGURED` and replace them with the
appropriate value (typically `gs://${YOUR_GCS_BUCKET}/data/`). Afterwards,
upload your edited file to GCS, making note of the path it was uploaded to
(we'll need it when starting the training/eval jobs).
``` bash
# From tensorflow/models/research/
# Edit the faster_rcnn_resnet101_pets.config template. Please note that there
# are multiple places where PATH_TO_BE_CONFIGURED needs to be set.
sed -i "s|PATH_TO_BE_CONFIGURED|"gs://${YOUR_GCS_BUCKET}"/data|g" \
object_detection/samples/configs/faster_rcnn_resnet101_pets.config
# Copy edited template to cloud.
gsutil cp object_detection/samples/configs/faster_rcnn_resnet101_pets.config \
gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
```
## Checking Your Google Cloud Storage Bucket
At this point in the tutorial, you should have uploaded the training/validation
datasets (including the label map), the COCO-pretrained Faster R-CNN fine-tune
checkpoint, and your job configuration to your Google Cloud Storage bucket. Your
bucket should look like
the following:
```lang-none
+ ${YOUR_GCS_BUCKET}/
+ data/
- faster_rcnn_resnet101_pets.config
- model.ckpt.index
- model.ckpt.meta
- model.ckpt.data-00000-of-00001
- pet_label_map.pbtxt
- pet_train.record
- pet_val.record
```
You can inspect your bucket using the [Google Cloud Storage
browser](https://console.cloud.google.com/storage/browser).
## Starting Training and Evaluation Jobs on Google Cloud ML Engine
Before we can start a job on Google Cloud ML Engine, we must:
1. Package the Tensorflow Object Detection code.
2. Write a cluster configuration for our Google Cloud ML job.
To package the Tensorflow Object Detection code, run the following commands from
the `tensorflow/models/research/` directory:
``` bash
# From tensorflow/models/research/
python setup.py sdist
(cd slim && python setup.py sdist)
```
You should see two tar.gz files created at `dist/object_detection-0.1.tar.gz`
and `slim/dist/slim-0.1.tar.gz`.
For running the training Cloud ML job, we'll configure the cluster to use 10
training machines (1 master + 9 workers) and three parameter servers. The
configuration file can be found at `object_detection/samples/cloud/cloud.yml`.
Note: This sample is supported for use with the 1.2 runtime version.
To start training, execute the following command from the
`tensorflow/models/research/` directory:
``` bash
# From tensorflow/models/research/
gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` \
--runtime-version 1.2 \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train \
--region us-central1 \
--config object_detection/samples/cloud/cloud.yml \
-- \
--train_dir=gs://${YOUR_GCS_BUCKET}/train \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
```
Once training has started, we can run an evaluation concurrently:
``` bash
# From tensorflow/models/research/
gcloud ml-engine jobs submit training `whoami`_object_detection_eval_`date +%s` \
--runtime-version 1.2 \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.eval \
--region us-central1 \
--scale-tier BASIC_GPU \
-- \
--checkpoint_dir=gs://${YOUR_GCS_BUCKET}/train \
--eval_dir=gs://${YOUR_GCS_BUCKET}/eval \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config
```
Note: Even though we're running an evaluation job, the `gcloud ml-engine jobs
submit training` command is correct. ML Engine does not distinguish between
training and evaluation jobs.
Users can monitor and stop training and evaluation jobs on the [ML Engine
Dashboard](https://console.cloud.google.com/mlengine/jobs).
## Monitoring Progress with Tensorboard
You can monitor progress of the training and eval jobs by running Tensorboard on
your local machine:
``` bash
# This command needs to be run once to allow your local machine to access your
# GCS bucket.
gcloud auth application-default login
tensorboard --logdir=gs://${YOUR_GCS_BUCKET}
```
Once Tensorboard is running, navigate to `localhost:6006` from your favourite
web browser. You should see something similar to the following:
![](img/tensorboard.png)
You will also want to click on the images tab to see example detections made by
the model while it trains. After about an hour and a half of training, you can
expect to see something like this:
![](img/tensorboard2.png)
Note: It takes roughly 10 minutes for a job to get started on ML Engine, and
roughly an hour for the system to evaluate the validation dataset. It may take
some time to populate the dashboards. If you do not see any entries after half
an hour, check the logs from the [ML Engine
Dashboard](https://console.cloud.google.com/mlengine/jobs). Note that by default
the training jobs are configured to go for much longer than is necessary for
convergence. To save money, we recommend killing your jobs once you've seen
that they've converged.
## Exporting the Tensorflow Graph
After your model has been trained, you should export it to a Tensorflow
graph proto. First, you need to identify a candidate checkpoint to export. You
can search your bucket using the [Google Cloud Storage
Browser](https://console.cloud.google.com/storage/browser). The file should be
stored under `${YOUR_GCS_BUCKET}/train`. The checkpoint will typically consist of
three files:
* `model.ckpt-${CHECKPOINT_NUMBER}.data-00000-of-00001`
* `model.ckpt-${CHECKPOINT_NUMBER}.index`
* `model.ckpt-${CHECKPOINT_NUMBER}.meta`
After you've identified a candidate checkpoint to export, run the following
command from `tensorflow/models/research/`:
``` bash
# From tensorflow/models/research/
gsutil cp gs://${YOUR_GCS_BUCKET}/train/model.ckpt-${CHECKPOINT_NUMBER}.* .
python object_detection/export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path object_detection/samples/configs/faster_rcnn_resnet101_pets.config \
--trained_checkpoint_prefix model.ckpt-${CHECKPOINT_NUMBER} \
--output_directory exported_graphs
```
Afterwards, you should see a directory named `exported_graphs` containing the
SavedModel and frozen graph.
## Configuring the Instance Segmentation Pipeline
Mask prediction can be turned on for an object detection config by adding
`predict_instance_masks: true` within the `MaskRCNNBoxPredictor`. Other
parameters such as mask size, number of convolutions in the mask layer, and the
convolution hyper parameters can be defined. We will use
`mask_rcnn_resnet101_pets.config` as a starting point for configuring the
instance segmentation pipeline. Everything above that was mentioned about object
detection holds true for instance segmentation. Instance segmentation consists
of an object detection model with an additional head that predicts the object
mask inside each predicted box once we remove the training and other details.
Please refer to the section on [Running an Instance Segmentation
Model](instance_segmentation.md) for instructions on how to configure a model
that predicts masks in addition to object bounding boxes.
## What's Next
Congratulations, you have now trained an object detector for various cats and
dogs! There are several things you can do now:
1. [Test your exported model using the provided Jupyter notebook.](running_notebook.md)
2. [Experiment with different model configurations.](configuring_jobs.md)
3. Train an object detector using your own data.
# Preparing Inputs
To use your own dataset with the Tensorflow Object Detection API, you must convert it
into the [TFRecord file format](https://www.tensorflow.org/api_guides/python/python_io#tfrecords_format_details).
This document outlines how to write a script to generate the TFRecord file.
## Label Maps
Each dataset is required to have a label map associated with it. This label map
defines a mapping from string class names to integer class IDs. The label map
should be a `StringIntLabelMap` text protobuf. Sample label maps can be found in
`object_detection/data`. Label map IDs should always start from 1.
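For reference, a minimal sketch of loading a label map in Python, using the same
utility the dataset creation scripts use; the class name below is taken from the
sample pet label map:
```python
from object_detection.utils import label_map_util

# Maps class names to their integer ids (starting from 1).
label_map_dict = label_map_util.get_label_map_dict(
    'object_detection/data/pet_label_map.pbtxt')
print(label_map_dict['Abyssinian'])
```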
## Dataset Requirements
For every example in your dataset, you should have the following information:
1. An RGB image for the dataset, encoded as JPEG or PNG.
2. A list of bounding boxes for the image. Each bounding box should contain:
1. Bounding box coordinates (with the origin in the top left corner), defined by 4
floating point numbers [ymin, xmin, ymax, xmax]. Note that we store the
_normalized_ coordinates (x / width, y / height) in the TFRecord dataset.
2. The class of the object in the bounding box.
## Example Image
Consider the following image:
![Example Image](img/example_cat.jpg "Example Image")
with the following label map:
```
item {
id: 1
name: 'Cat'
}
item {
id: 2
name: 'Dog'
}
```
We can generate a tf.Example proto for this image using the following code:
```python
def create_cat_tf_example(encoded_cat_image_data):
"""Creates a tf.Example proto from sample cat image.
Args:
encoded_cat_image_data: The jpg encoded data of the cat image.
Returns:
example: The created tf.Example.
"""
height = 1032
width = 1200
filename = 'example_cat.jpg'
image_format = b'jpg'
xmins = [322.0 / 1200.0]
xmaxs = [1062.0 / 1200.0]
ymins = [174.0 / 1032.0]
ymaxs = [761.0 / 1032.0]
classes_text = ['Cat']
classes = [1]
tf_example = tf.train.Example(features=tf.train.Features(feature={
'image/height': dataset_util.int64_feature(height),
'image/width': dataset_util.int64_feature(width),
'image/filename': dataset_util.bytes_feature(filename),
'image/source_id': dataset_util.bytes_feature(filename),
'image/encoded': dataset_util.bytes_feature(encoded_cat_image_data),
'image/format': dataset_util.bytes_feature(image_format),
'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
return tf_example
```
## Conversion Script Outline
A typical conversion script will look like the following:
```python
import tensorflow as tf
from object_detection.utils import dataset_util
flags = tf.app.flags
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS
def create_tf_example(example):
# TODO(user): Populate the following variables from your example.
height = None # Image height
width = None # Image width
filename = None # Filename of the image. Empty if image is not from file
encoded_image_data = None # Encoded image bytes
image_format = None # b'jpeg' or b'png'
xmins = [] # List of normalized left x coordinates in bounding box (1 per box)
xmaxs = [] # List of normalized right x coordinates in bounding box
# (1 per box)
ymins = [] # List of normalized top y coordinates in bounding box (1 per box)
ymaxs = [] # List of normalized bottom y coordinates in bounding box
# (1 per box)
classes_text = [] # List of string class name of bounding box (1 per box)
classes = [] # List of integer class id of bounding box (1 per box)
tf_example = tf.train.Example(features=tf.train.Features(feature={
'image/height': dataset_util.int64_feature(height),
'image/width': dataset_util.int64_feature(width),
'image/filename': dataset_util.bytes_feature(filename),
'image/source_id': dataset_util.bytes_feature(filename),
'image/encoded': dataset_util.bytes_feature(encoded_image_data),
'image/format': dataset_util.bytes_feature(image_format),
'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
return tf_example
def main(_):
writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
# TODO(user): Write code to read in your dataset to examples variable
for example in examples:
tf_example = create_tf_example(example)
writer.write(tf_example.SerializeToString())
writer.close()
if __name__ == '__main__':
tf.app.run()
```
Note: You may notice additional fields in some other datasets. They are
currently unused by the API and are optional.
Note: Please refer to the section on [Running an Instance Segmentation
Model](instance_segmentation.md) for instructions on how to configure a model
that predicts masks in addition to object bounding boxes.
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Utility functions for detection inference."""
from __future__ import division
import tensorflow as tf
from object_detection.core import standard_fields
def build_input(tfrecord_paths):
"""Builds the graph's input.
Args:
tfrecord_paths: List of paths to the input TFRecords
Returns:
serialized_example_tensor: The next serialized example. String scalar Tensor
image_tensor: The decoded image of the example. Uint8 tensor,
shape=[1, None, None,3]
"""
filename_queue = tf.train.string_input_producer(
tfrecord_paths, shuffle=False, num_epochs=1)
tf_record_reader = tf.TFRecordReader()
_, serialized_example_tensor = tf_record_reader.read(filename_queue)
features = tf.parse_single_example(
serialized_example_tensor,
features={
standard_fields.TfExampleFields.image_encoded:
tf.FixedLenFeature([], tf.string),
})
encoded_image = features[standard_fields.TfExampleFields.image_encoded]
image_tensor = tf.image.decode_image(encoded_image, channels=3)
image_tensor.set_shape([None, None, 3])
image_tensor = tf.expand_dims(image_tensor, 0)
return serialized_example_tensor, image_tensor
def build_inference_graph(image_tensor, inference_graph_path):
"""Loads the inference graph and connects it to the input image.
Args:
image_tensor: The input image. uint8 tensor, shape=[1, None, None, 3]
inference_graph_path: Path to the inference graph with embedded weights
Returns:
detected_boxes_tensor: Detected boxes. Float tensor,
shape=[num_detections, 4]
detected_scores_tensor: Detected scores. Float tensor,
shape=[num_detections]
detected_labels_tensor: Detected labels. Int64 tensor,
shape=[num_detections]
"""
with tf.gfile.Open(inference_graph_path, 'r') as graph_def_file:
graph_content = graph_def_file.read()
graph_def = tf.GraphDef()
graph_def.MergeFromString(graph_content)
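# Connect the frozen graph's 'image_tensor' placeholder to the decoded input image.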
tf.import_graph_def(
graph_def, name='', input_map={'image_tensor': image_tensor})
g = tf.get_default_graph()
num_detections_tensor = tf.squeeze(
g.get_tensor_by_name('num_detections:0'), 0)
num_detections_tensor = tf.cast(num_detections_tensor, tf.int32)
detected_boxes_tensor = tf.squeeze(
g.get_tensor_by_name('detection_boxes:0'), 0)
detected_boxes_tensor = detected_boxes_tensor[:num_detections_tensor]
detected_scores_tensor = tf.squeeze(
g.get_tensor_by_name('detection_scores:0'), 0)
detected_scores_tensor = detected_scores_tensor[:num_detections_tensor]
detected_labels_tensor = tf.squeeze(
g.get_tensor_by_name('detection_classes:0'), 0)
detected_labels_tensor = tf.cast(detected_labels_tensor, tf.int64)
detected_labels_tensor = detected_labels_tensor[:num_detections_tensor]
return detected_boxes_tensor, detected_scores_tensor, detected_labels_tensor
def infer_detections_and_add_to_example(
serialized_example_tensor, detected_boxes_tensor, detected_scores_tensor,
detected_labels_tensor, discard_image_pixels):
"""Runs the supplied tensors and adds the inferred detections to the example.
Args:
serialized_example_tensor: Serialized TF example. Scalar string tensor
detected_boxes_tensor: Detected boxes. Float tensor,
shape=[num_detections, 4]
detected_scores_tensor: Detected scores. Float tensor,
shape=[num_detections]
detected_labels_tensor: Detected labels. Int64 tensor,
shape=[num_detections]
discard_image_pixels: If true, discards the image from the result
Returns:
The de-serialized TF example augmented with the inferred detections.
"""
tf_example = tf.train.Example()
(serialized_example, detected_boxes, detected_scores,
detected_classes) = tf.get_default_session().run([
serialized_example_tensor, detected_boxes_tensor, detected_scores_tensor,
detected_labels_tensor
])
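# Transpose to [4, num_detections] so each row holds one coordinate
# (ymin, xmin, ymax, xmax) across all detections.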
detected_boxes = detected_boxes.T
tf_example.ParseFromString(serialized_example)
feature = tf_example.features.feature
feature[standard_fields.TfExampleFields.
detection_score].float_list.value[:] = detected_scores
feature[standard_fields.TfExampleFields.
detection_bbox_ymin].float_list.value[:] = detected_boxes[0]
feature[standard_fields.TfExampleFields.
detection_bbox_xmin].float_list.value[:] = detected_boxes[1]
feature[standard_fields.TfExampleFields.
detection_bbox_ymax].float_list.value[:] = detected_boxes[2]
feature[standard_fields.TfExampleFields.
detection_bbox_xmax].float_list.value[:] = detected_boxes[3]
feature[standard_fields.TfExampleFields.
detection_class_label].int64_list.value[:] = detected_classes
if discard_image_pixels:
del feature[standard_fields.TfExampleFields.image_encoded]
return tf_example
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Tests for detection_inference.py."""
import os
import StringIO
import numpy as np
from PIL import Image
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.inference import detection_inference
from object_detection.utils import dataset_util
def get_mock_tfrecord_path():
return os.path.join(tf.test.get_temp_dir(), 'mock.tfrec')
def create_mock_tfrecord():
pil_image = Image.fromarray(np.array([[[123, 0, 0]]], dtype=np.uint8), 'RGB')
image_output_stream = StringIO.StringIO()
pil_image.save(image_output_stream, format='png')
encoded_image = image_output_stream.getvalue()
feature_map = {
'test_field':
dataset_util.float_list_feature([1, 2, 3, 4]),
standard_fields.TfExampleFields.image_encoded:
dataset_util.bytes_feature(encoded_image),
}
tf_example = tf.train.Example(features=tf.train.Features(feature=feature_map))
with tf.python_io.TFRecordWriter(get_mock_tfrecord_path()) as writer:
writer.write(tf_example.SerializeToString())
def get_mock_graph_path():
return os.path.join(tf.test.get_temp_dir(), 'mock_graph.pb')
def create_mock_graph():
g = tf.Graph()
with g.as_default():
in_image_tensor = tf.placeholder(
tf.uint8, shape=[1, None, None, 3], name='image_tensor')
tf.constant([2.0], name='num_detections')
tf.constant(
[[[0, 0.8, 0.7, 1], [0.1, 0.2, 0.8, 0.9], [0.2, 0.3, 0.4, 0.5]]],
name='detection_boxes')
tf.constant([[0.1, 0.2, 0.3]], name='detection_scores')
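# detection_classes scale with the sum of the input image (123 for the mock
# 1x1 image), yielding [123, 246, 369]; only the first two are kept because
# num_detections is 2.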
tf.identity(
tf.constant([[1.0, 2.0, 3.0]]) *
tf.reduce_sum(tf.cast(in_image_tensor, dtype=tf.float32)),
name='detection_classes')
graph_def = g.as_graph_def()
with tf.gfile.Open(get_mock_graph_path(), 'w') as fl:
fl.write(graph_def.SerializeToString())
class InferDetectionsTests(tf.test.TestCase):
def test_simple(self):
create_mock_graph()
create_mock_tfrecord()
serialized_example_tensor, image_tensor = detection_inference.build_input(
[get_mock_tfrecord_path()])
self.assertAllEqual(image_tensor.get_shape().as_list(), [1, None, None, 3])
(detected_boxes_tensor, detected_scores_tensor,
detected_labels_tensor) = detection_inference.build_inference_graph(
image_tensor, get_mock_graph_path())
with self.test_session(use_gpu=False) as sess:
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
tf.train.start_queue_runners()
tf_example = detection_inference.infer_detections_and_add_to_example(
serialized_example_tensor, detected_boxes_tensor,
detected_scores_tensor, detected_labels_tensor, False)
self.assertProtoEquals(r"""
features {
feature {
key: "image/detection/bbox/ymin"
value { float_list { value: [0.0, 0.1] } } }
feature {
key: "image/detection/bbox/xmin"
value { float_list { value: [0.8, 0.2] } } }
feature {
key: "image/detection/bbox/ymax"
value { float_list { value: [0.7, 0.8] } } }
feature {
key: "image/detection/bbox/xmax"
value { float_list { value: [1.0, 0.9] } } }
feature {
key: "image/detection/label"
value { int64_list { value: [123, 246] } } }
feature {
key: "image/detection/score"
value { float_list { value: [0.1, 0.2] } } }
feature {
key: "image/encoded"
value { bytes_list { value:
"\211PNG\r\n\032\n\000\000\000\rIHDR\000\000\000\001\000\000"
"\000\001\010\002\000\000\000\220wS\336\000\000\000\022IDATx"
"\234b\250f`\000\000\000\000\377\377\003\000\001u\000|gO\242"
"\213\000\000\000\000IEND\256B`\202" } } }
feature {
key: "test_field"
value { float_list { value: [1.0, 2.0, 3.0, 4.0] } } } }
""", tf_example)
def test_discard_image(self):
create_mock_graph()
create_mock_tfrecord()
serialized_example_tensor, image_tensor = detection_inference.build_input(
[get_mock_tfrecord_path()])
(detected_boxes_tensor, detected_scores_tensor,
detected_labels_tensor) = detection_inference.build_inference_graph(
image_tensor, get_mock_graph_path())
with self.test_session(use_gpu=False) as sess:
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
tf.train.start_queue_runners()
tf_example = detection_inference.infer_detections_and_add_to_example(
serialized_example_tensor, detected_boxes_tensor,
detected_scores_tensor, detected_labels_tensor, True)
self.assertProtoEquals(r"""
features {
feature {
key: "image/detection/bbox/ymin"
value { float_list { value: [0.0, 0.1] } } }
feature {
key: "image/detection/bbox/xmin"
value { float_list { value: [0.8, 0.2] } } }
feature {
key: "image/detection/bbox/ymax"
value { float_list { value: [0.7, 0.8] } } }
feature {
key: "image/detection/bbox/xmax"
value { float_list { value: [1.0, 0.9] } } }
feature {
key: "image/detection/label"
value { int64_list { value: [123, 246] } } }
feature {
key: "image/detection/score"
value { float_list { value: [0.1, 0.2] } } }
feature {
key: "test_field"
value { float_list { value: [1.0, 2.0, 3.0, 4.0] } } } }
""", tf_example)
if __name__ == '__main__':
tf.test.main()
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Infers detections on a TFRecord of TFExamples given an inference graph.
Example usage:
./infer_detections \
--input_tfrecord_paths=/path/to/input/tfrecord1,/path/to/input/tfrecord2 \
    --output_tfrecord_path=/path/to/output/detections.tfrecord \
--inference_graph=/path/to/frozen_weights_inference_graph.pb
The output is a TFRecord of TFExamples. Each TFExample from the input is first
augmented with detections from the inference graph and then copied to the
output.
The input and output nodes of the inference graph are expected to have the same
types, shapes, and semantics as the input and output nodes of graphs produced
by export_inference_graph.py, when run with --input_type=image_tensor.
The script can also discard the image pixels in the output. This greatly
reduces the output size and can potentially accelerate reading data in
subsequent processing steps that don't require the images (e.g. computing
metrics).
"""
import itertools
import tensorflow as tf
from object_detection.inference import detection_inference
tf.flags.DEFINE_string('input_tfrecord_paths', None,
'A comma separated list of paths to input TFRecords.')
tf.flags.DEFINE_string('output_tfrecord_path', None,
'Path to the output TFRecord.')
tf.flags.DEFINE_string('inference_graph', None,
'Path to the inference graph with embedded weights.')
tf.flags.DEFINE_boolean('discard_image_pixels', False,
'Discards the images in the output TFExamples. This'
' significantly reduces the output size and is useful'
' if the subsequent tools don\'t need access to the'
' images (e.g. when computing evaluation measures).')
FLAGS = tf.flags.FLAGS
def main(_):
tf.logging.set_verbosity(tf.logging.INFO)
required_flags = ['input_tfrecord_paths', 'output_tfrecord_path',
'inference_graph']
for flag_name in required_flags:
if not getattr(FLAGS, flag_name):
raise ValueError('Flag --{} is required'.format(flag_name))
with tf.Session() as sess:
input_tfrecord_paths = [
v for v in FLAGS.input_tfrecord_paths.split(',') if v]
tf.logging.info('Reading input from %d files', len(input_tfrecord_paths))
serialized_example_tensor, image_tensor = detection_inference.build_input(
input_tfrecord_paths)
tf.logging.info('Reading graph and building model...')
(detected_boxes_tensor, detected_scores_tensor,
detected_labels_tensor) = detection_inference.build_inference_graph(
image_tensor, FLAGS.inference_graph)
tf.logging.info('Running inference and writing output to {}'.format(
FLAGS.output_tfrecord_path))
sess.run(tf.local_variables_initializer())
tf.train.start_queue_runners()
with tf.python_io.TFRecordWriter(
FLAGS.output_tfrecord_path) as tf_record_writer:
try:
for counter in itertools.count():
tf.logging.log_every_n(tf.logging.INFO, 'Processed %d images...', 10,
counter)
tf_example = detection_inference.infer_detections_and_add_to_example(
serialized_example_tensor, detected_boxes_tensor,
detected_scores_tensor, detected_labels_tensor,
FLAGS.discard_image_pixels)
tf_record_writer.write(tf_example.SerializeToString())
except tf.errors.OutOfRangeError:
tf.logging.info('Finished processing records')
if __name__ == '__main__':
tf.app.run()
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Model input function for tf-learn object detection model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import tensorflow as tf
from object_detection.builders import dataset_builder
from object_detection.builders import image_resizer_builder
from object_detection.builders import model_builder
from object_detection.builders import preprocessor_builder
from object_detection.core import preprocessor
from object_detection.core import standard_fields as fields
from object_detection.data_decoders import tf_example_decoder
from object_detection.protos import eval_pb2
from object_detection.protos import input_reader_pb2
from object_detection.protos import model_pb2
from object_detection.protos import train_pb2
from object_detection.utils import config_util
from object_detection.utils import dataset_util
from object_detection.utils import ops as util_ops
HASH_KEY = 'hash'
HASH_BINS = 1 << 31
SERVING_FED_EXAMPLE_KEY = 'serialized_example'
# A map of names to methods that help build the input pipeline.
INPUT_BUILDER_UTIL_MAP = {
'dataset_build': dataset_builder.build,
}
def transform_input_data(tensor_dict,
model_preprocess_fn,
image_resizer_fn,
num_classes,
data_augmentation_fn=None,
merge_multiple_boxes=False,
retain_original_image=False):
"""A single function that is responsible for all input data transformations.
Data transformation functions are applied in the following order.
1. If key fields.InputDataFields.image_additional_channels is present in
tensor_dict, the additional channels will be merged into
fields.InputDataFields.image.
2. data_augmentation_fn (optional): applied on tensor_dict.
3. model_preprocess_fn: applied only on image tensor in tensor_dict.
4. image_resizer_fn: applied on original image and instance mask tensor in
tensor_dict.
5. one_hot_encoding: applied to classes tensor in tensor_dict.
6. merge_multiple_boxes (optional): when groundtruth boxes are exactly the
same they can be merged into a single box with an associated k-hot class
label.
Args:
tensor_dict: dictionary containing input tensors keyed by
fields.InputDataFields.
model_preprocess_fn: model's preprocess function to apply on image tensor.
      This function must take in a 4-D float tensor and return a 4-D preprocessed
float tensor and a tensor containing the true image shape.
image_resizer_fn: image resizer function to apply on groundtruth instance
      masks. This function must take a 3-D float tensor of an image and a 3-D
tensor of instance masks and return a resized version of these along with
the true shapes.
    num_classes: maximum number of classes to one-hot (or k-hot) encode the class
labels.
data_augmentation_fn: (optional) data augmentation function to apply on
input `tensor_dict`.
merge_multiple_boxes: (optional) whether to merge multiple groundtruth boxes
and classes for a given image if the boxes are exactly the same.
retain_original_image: (optional) whether to retain original image in the
output dictionary.
Returns:
A dictionary keyed by fields.InputDataFields containing the tensors obtained
after applying all the transformations.
"""
if fields.InputDataFields.image_additional_channels in tensor_dict:
channels = tensor_dict[fields.InputDataFields.image_additional_channels]
tensor_dict[fields.InputDataFields.image] = tf.concat(
[tensor_dict[fields.InputDataFields.image], channels], axis=2)
if retain_original_image:
tensor_dict[fields.InputDataFields.original_image] = tf.cast(
tensor_dict[fields.InputDataFields.image], tf.uint8)
# Apply data augmentation ops.
if data_augmentation_fn is not None:
tensor_dict = data_augmentation_fn(tensor_dict)
# Apply model preprocessing ops and resize instance masks.
image = tensor_dict[fields.InputDataFields.image]
preprocessed_resized_image, true_image_shape = model_preprocess_fn(
tf.expand_dims(tf.to_float(image), axis=0))
tensor_dict[fields.InputDataFields.image] = tf.squeeze(
preprocessed_resized_image, axis=0)
tensor_dict[fields.InputDataFields.true_image_shape] = tf.squeeze(
true_image_shape, axis=0)
if fields.InputDataFields.groundtruth_instance_masks in tensor_dict:
masks = tensor_dict[fields.InputDataFields.groundtruth_instance_masks]
_, resized_masks, _ = image_resizer_fn(image, masks)
tensor_dict[fields.InputDataFields.
groundtruth_instance_masks] = resized_masks
# Transform groundtruth classes to one hot encodings.
label_offset = 1
zero_indexed_groundtruth_classes = tensor_dict[
fields.InputDataFields.groundtruth_classes] - label_offset
tensor_dict[fields.InputDataFields.groundtruth_classes] = tf.one_hot(
zero_indexed_groundtruth_classes, num_classes)
if merge_multiple_boxes:
merged_boxes, merged_classes, _ = util_ops.merge_boxes_with_multiple_labels(
tensor_dict[fields.InputDataFields.groundtruth_boxes],
zero_indexed_groundtruth_classes, num_classes)
tensor_dict[fields.InputDataFields.groundtruth_boxes] = merged_boxes
tensor_dict[fields.InputDataFields.groundtruth_classes] = merged_classes
return tensor_dict
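# A minimal sketch of how transform_input_data is typically composed: the
# model's preprocess function and an image resizer are bound with
# functools.partial, as done in create_train_input_fn below. The fake functions
# and toy tensors here are illustrative assumptions only.
def _example_transform_input_data():
  """Returns a transformed tensor dict built from a toy input (sketch)."""
  def fake_preprocess_fn(image):
    # Identity "preprocessing" that also reports the true image shape.
    return image, tf.expand_dims(tf.shape(image)[1:], axis=0)
  def fake_resizer_fn(image, masks):
    # Identity resizer returning the (unchanged) image, masks and shape.
    return image, masks, tf.shape(image)
  tensor_dict = {
      fields.InputDataFields.image:
          tf.zeros([4, 4, 3], dtype=tf.float32),
      fields.InputDataFields.groundtruth_classes:
          tf.constant([1, 2], dtype=tf.int32),
  }
  transform_fn = functools.partial(
      transform_input_data,
      model_preprocess_fn=fake_preprocess_fn,
      image_resizer_fn=fake_resizer_fn,
      num_classes=3)
  return transform_fn(tensor_dict)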
def augment_input_data(tensor_dict, data_augmentation_options):
"""Applies data augmentation ops to input tensors.
Args:
tensor_dict: A dictionary of input tensors keyed by fields.InputDataFields.
data_augmentation_options: A list of tuples, where each tuple contains a
function and a dictionary that contains arguments and their values.
Usually, this is the output of core/preprocessor.build.
Returns:
A dictionary of tensors obtained by applying data augmentation ops to the
input tensor dictionary.
"""
tensor_dict[fields.InputDataFields.image] = tf.expand_dims(
tf.to_float(tensor_dict[fields.InputDataFields.image]), 0)
include_instance_masks = (fields.InputDataFields.groundtruth_instance_masks
in tensor_dict)
include_keypoints = (fields.InputDataFields.groundtruth_keypoints
in tensor_dict)
tensor_dict = preprocessor.preprocess(
tensor_dict, data_augmentation_options,
func_arg_map=preprocessor.get_default_func_arg_map(
include_instance_masks=include_instance_masks,
include_keypoints=include_keypoints))
tensor_dict[fields.InputDataFields.image] = tf.squeeze(
tensor_dict[fields.InputDataFields.image], axis=0)
return tensor_dict
def _get_labels_dict(input_dict):
"""Extracts labels dict from input dict."""
required_label_keys = [
fields.InputDataFields.num_groundtruth_boxes,
fields.InputDataFields.groundtruth_boxes,
fields.InputDataFields.groundtruth_classes,
fields.InputDataFields.groundtruth_weights
]
labels_dict = {}
for key in required_label_keys:
labels_dict[key] = input_dict[key]
optional_label_keys = [
fields.InputDataFields.groundtruth_keypoints,
fields.InputDataFields.groundtruth_instance_masks,
fields.InputDataFields.groundtruth_area,
fields.InputDataFields.groundtruth_is_crowd,
fields.InputDataFields.groundtruth_difficult
]
for key in optional_label_keys:
if key in input_dict:
labels_dict[key] = input_dict[key]
if fields.InputDataFields.groundtruth_difficult in labels_dict:
labels_dict[fields.InputDataFields.groundtruth_difficult] = tf.cast(
labels_dict[fields.InputDataFields.groundtruth_difficult], tf.int32)
return labels_dict
def _get_features_dict(input_dict):
"""Extracts features dict from input dict."""
hash_from_source_id = tf.string_to_hash_bucket_fast(
input_dict[fields.InputDataFields.source_id], HASH_BINS)
features = {
fields.InputDataFields.image:
input_dict[fields.InputDataFields.image],
HASH_KEY: tf.cast(hash_from_source_id, tf.int32),
fields.InputDataFields.true_image_shape:
input_dict[fields.InputDataFields.true_image_shape]
}
if fields.InputDataFields.original_image in input_dict:
features[fields.InputDataFields.original_image] = input_dict[
fields.InputDataFields.original_image]
return features
def create_train_input_fn(train_config, train_input_config,
model_config):
"""Creates a train `input` function for `Estimator`.
Args:
train_config: A train_pb2.TrainConfig.
train_input_config: An input_reader_pb2.InputReader.
model_config: A model_pb2.DetectionModel.
Returns:
`input_fn` for `Estimator` in TRAIN mode.
"""
def _train_input_fn(params=None):
"""Returns `features` and `labels` tensor dictionaries for training.
Args:
params: Parameter dictionary passed from the estimator.
Returns:
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [batch_size, H, W, C]
float32 tensor with preprocessed images.
features[HASH_KEY] is a [batch_size] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [batch_size, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] (optional) is a
          [batch_size, H, W, C] uint8 tensor with original images.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.num_groundtruth_boxes] is a [batch_size]
int32 tensor indicating the number of groundtruth boxes.
labels[fields.InputDataFields.groundtruth_boxes] is a
[batch_size, num_boxes, 4] float32 tensor containing the corners of
the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
[batch_size, num_boxes, num_classes] float32 one-hot tensor of
classes.
labels[fields.InputDataFields.groundtruth_weights] is a
[batch_size, num_boxes] float32 tensor containing groundtruth weights
for the boxes.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[batch_size, num_boxes, H, W] float32 tensor containing only binary
values, which represent instance masks for objects.
labels[fields.InputDataFields.groundtruth_keypoints] is a
[batch_size, num_boxes, num_keypoints, 2] float32 tensor containing
keypoints for each box.
Raises:
TypeError: if the `train_config`, `train_input_config` or `model_config`
are not of the correct type.
"""
if not isinstance(train_config, train_pb2.TrainConfig):
raise TypeError('For training mode, the `train_config` must be a '
'train_pb2.TrainConfig.')
if not isinstance(train_input_config, input_reader_pb2.InputReader):
      raise TypeError('The `train_input_config` must be an '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
data_augmentation_options = [
preprocessor_builder.build(step)
for step in train_config.data_augmentation_options
]
data_augmentation_fn = functools.partial(
augment_input_data, data_augmentation_options=data_augmentation_options)
model = model_builder.build(model_config, is_training=True)
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model.preprocess,
image_resizer_fn=image_resizer_fn,
num_classes=config_util.get_number_of_classes(model_config),
data_augmentation_fn=data_augmentation_fn,
retain_original_image=train_config.retain_original_images)
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
train_input_config,
transform_input_data_fn=transform_data_fn,
batch_size=params['batch_size'] if params else train_config.batch_size,
max_num_boxes=train_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
input_dict = dataset_util.make_initializable_iterator(dataset).get_next()
return (_get_features_dict(input_dict), _get_labels_dict(input_dict))
return _train_input_fn
def create_eval_input_fn(eval_config, eval_input_config, model_config):
"""Creates an eval `input` function for `Estimator`.
Args:
eval_config: An eval_pb2.EvalConfig.
eval_input_config: An input_reader_pb2.InputReader.
model_config: A model_pb2.DetectionModel.
Returns:
`input_fn` for `Estimator` in EVAL mode.
"""
def _eval_input_fn(params=None):
"""Returns `features` and `labels` tensor dictionaries for evaluation.
Args:
params: Parameter dictionary passed from the estimator.
Returns:
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [1, H, W, C] float32 tensor
with preprocessed images.
features[HASH_KEY] is a [1] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [1, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] is a [1, H', W', C]
          uint8 tensor with the original image.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.groundtruth_boxes] is a [1, num_boxes, 4]
float32 tensor containing the corners of the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
          [1, num_boxes, num_classes] float32 one-hot tensor of classes.
labels[fields.InputDataFields.groundtruth_area] is a [1, num_boxes]
float32 tensor containing object areas.
labels[fields.InputDataFields.groundtruth_is_crowd] is a [1, num_boxes]
bool tensor indicating if the boxes enclose a crowd.
labels[fields.InputDataFields.groundtruth_difficult] is a [1, num_boxes]
int32 tensor indicating if the boxes represent difficult instances.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[1, num_boxes, H, W] float32 tensor containing only binary values,
which represent instance masks for objects.
Raises:
TypeError: if the `eval_config`, `eval_input_config` or `model_config`
are not of the correct type.
"""
params = params or {}
if not isinstance(eval_config, eval_pb2.EvalConfig):
      raise TypeError('For eval mode, the `eval_config` must be an '
                      'eval_pb2.EvalConfig.')
if not isinstance(eval_input_config, input_reader_pb2.InputReader):
      raise TypeError('The `eval_input_config` must be an '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
num_classes = config_util.get_number_of_classes(model_config)
model = model_builder.build(model_config, is_training=False)
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model.preprocess,
image_resizer_fn=image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=None,
retain_original_image=eval_config.retain_original_images)
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
eval_input_config,
transform_input_data_fn=transform_data_fn,
batch_size=params.get('batch_size', 1),
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
input_dict = dataset_util.make_initializable_iterator(dataset).get_next()
return (_get_features_dict(input_dict), _get_labels_dict(input_dict))
return _eval_input_fn
def create_predict_input_fn(model_config):
"""Creates a predict `input` function for `Estimator`.
Args:
model_config: A model_pb2.DetectionModel.
Returns:
`input_fn` for `Estimator` in PREDICT mode.
"""
def _predict_input_fn(params=None):
"""Decodes serialized tf.Examples and returns `ServingInputReceiver`.
Args:
params: Parameter dictionary passed from the estimator.
Returns:
`ServingInputReceiver`.
"""
del params
example = tf.placeholder(dtype=tf.string, shape=[], name='input_feature')
num_classes = config_util.get_number_of_classes(model_config)
model = model_builder.build(model_config, is_training=False)
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_fn = functools.partial(
transform_input_data, model_preprocess_fn=model.preprocess,
image_resizer_fn=image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=None)
decoder = tf_example_decoder.TfExampleDecoder(load_instance_masks=False)
input_dict = transform_fn(decoder.decode(example))
images = tf.to_float(input_dict[fields.InputDataFields.image])
images = tf.expand_dims(images, axis=0)
true_image_shape = tf.expand_dims(
input_dict[fields.InputDataFields.true_image_shape], axis=0)
return tf.estimator.export.ServingInputReceiver(
features={
fields.InputDataFields.image: images,
fields.InputDataFields.true_image_shape: true_image_shape},
receiver_tensors={SERVING_FED_EXAMPLE_KEY: example})
return _predict_input_fn
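# A minimal wiring sketch: the input-fn factories above plug straight into
# tf.estimator. The pipeline_config_path and model_fn arguments are
# hypothetical placeholders; configs are loaded with
# config_util.get_configs_from_pipeline_file, as in the accompanying tests.
def _example_estimator_wiring(pipeline_config_path, model_fn):
  """Returns (estimator, train_spec, eval_spec) built from a pipeline config."""
  configs = config_util.get_configs_from_pipeline_file(pipeline_config_path)
  train_input_fn = create_train_input_fn(
      configs['train_config'], configs['train_input_config'], configs['model'])
  eval_input_fn = create_eval_input_fn(
      configs['eval_config'], configs['eval_input_config'], configs['model'])
  estimator = tf.estimator.Estimator(model_fn=model_fn)
  train_spec = tf.estimator.TrainSpec(
      input_fn=train_input_fn, max_steps=configs['train_config'].num_steps)
  eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn)
  return estimator, train_spec, eval_spec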
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.tflearn.inputs."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import os
import numpy as np
import tensorflow as tf
from object_detection import inputs
from object_detection.core import preprocessor
from object_detection.core import standard_fields as fields
from object_detection.utils import config_util
FLAGS = tf.flags.FLAGS
def _get_configs_for_model(model_name):
"""Returns configurations for model."""
fname = os.path.join(tf.resource_loader.get_data_files_path(),
'samples/configs/' + model_name + '.config')
label_map_path = os.path.join(tf.resource_loader.get_data_files_path(),
'data/pet_label_map.pbtxt')
data_path = os.path.join(tf.resource_loader.get_data_files_path(),
'test_data/pets_examples.record')
configs = config_util.get_configs_from_pipeline_file(fname)
return config_util.merge_external_params_with_configs(
configs,
train_input_path=data_path,
eval_input_path=data_path,
label_map_path=label_map_path)
class InputsTest(tf.test.TestCase):
def test_faster_rcnn_resnet50_train_input(self):
"""Tests the training input function for FasterRcnnResnet50."""
configs = _get_configs_for_model('faster_rcnn_resnet50_pets')
configs['train_config'].unpad_groundtruth_tensors = True
model_config = configs['model']
model_config.faster_rcnn.num_classes = 37
train_input_fn = inputs.create_train_input_fn(
configs['train_config'], configs['train_input_config'], model_config)
features, labels = train_input_fn()
self.assertAllEqual([1, None, None, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual([1],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[1, 50, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[1, 50, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[1, 50],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
def test_faster_rcnn_resnet50_eval_input(self):
"""Tests the eval input function for FasterRcnnResnet50."""
configs = _get_configs_for_model('faster_rcnn_resnet50_pets')
model_config = configs['model']
model_config.faster_rcnn.num_classes = 37
eval_input_fn = inputs.create_eval_input_fn(
configs['eval_config'], configs['eval_input_config'], model_config)
features, labels = eval_input_fn()
self.assertAllEqual([1, None, None, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual(
[1, None, None, 3],
features[fields.InputDataFields.original_image].shape.as_list())
self.assertEqual(tf.uint8,
features[fields.InputDataFields.original_image].dtype)
self.assertAllEqual([1], features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[1, None, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[1, None, model_config.faster_rcnn.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_area].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_area].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_is_crowd].shape.as_list())
self.assertEqual(
tf.bool, labels[fields.InputDataFields.groundtruth_is_crowd].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_difficult].shape.as_list())
self.assertEqual(
tf.int32, labels[fields.InputDataFields.groundtruth_difficult].dtype)
def test_ssd_inceptionV2_train_input(self):
"""Tests the training input function for SSDInceptionV2."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
model_config = configs['model']
model_config.ssd.num_classes = 37
batch_size = configs['train_config'].batch_size
train_input_fn = inputs.create_train_input_fn(
configs['train_config'], configs['train_input_config'], model_config)
features, labels = train_input_fn()
self.assertAllEqual([batch_size, 300, 300, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual([batch_size],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[batch_size],
labels[fields.InputDataFields.num_groundtruth_boxes].shape.as_list())
self.assertEqual(tf.int32,
labels[fields.InputDataFields.num_groundtruth_boxes].dtype)
self.assertAllEqual(
[batch_size, 50, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[batch_size, 50, model_config.ssd.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[batch_size, 50],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
def test_ssd_inceptionV2_eval_input(self):
"""Tests the eval input function for SSDInceptionV2."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
model_config = configs['model']
model_config.ssd.num_classes = 37
eval_input_fn = inputs.create_eval_input_fn(
configs['eval_config'], configs['eval_input_config'], model_config)
features, labels = eval_input_fn()
self.assertAllEqual([1, 300, 300, 3],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual(
[1, None, None, 3],
features[fields.InputDataFields.original_image].shape.as_list())
self.assertEqual(tf.uint8,
features[fields.InputDataFields.original_image].dtype)
self.assertAllEqual([1], features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[1, None, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[1, None, model_config.ssd.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_area].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_area].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_is_crowd].shape.as_list())
self.assertEqual(
tf.bool, labels[fields.InputDataFields.groundtruth_is_crowd].dtype)
self.assertAllEqual(
[1, None],
labels[fields.InputDataFields.groundtruth_difficult].shape.as_list())
self.assertEqual(
tf.int32, labels[fields.InputDataFields.groundtruth_difficult].dtype)
def test_predict_input(self):
"""Tests the predict input function."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
predict_input_fn = inputs.create_predict_input_fn(
model_config=configs['model'])
serving_input_receiver = predict_input_fn()
image = serving_input_receiver.features[fields.InputDataFields.image]
receiver_tensors = serving_input_receiver.receiver_tensors[
inputs.SERVING_FED_EXAMPLE_KEY]
self.assertEqual([1, 300, 300, 3], image.shape.as_list())
self.assertEqual(tf.float32, image.dtype)
self.assertEqual(tf.string, receiver_tensors.dtype)
def test_error_with_bad_train_config(self):
"""Tests that a TypeError is raised with improper train config."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
configs['model'].ssd.num_classes = 37
train_input_fn = inputs.create_train_input_fn(
train_config=configs['eval_config'], # Expecting `TrainConfig`.
train_input_config=configs['train_input_config'],
model_config=configs['model'])
with self.assertRaises(TypeError):
train_input_fn()
def test_error_with_bad_train_input_config(self):
"""Tests that a TypeError is raised with improper train input config."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
configs['model'].ssd.num_classes = 37
train_input_fn = inputs.create_train_input_fn(
train_config=configs['train_config'],
train_input_config=configs['model'], # Expecting `InputReader`.
model_config=configs['model'])
with self.assertRaises(TypeError):
train_input_fn()
def test_error_with_bad_train_model_config(self):
"""Tests that a TypeError is raised with improper train model config."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
configs['model'].ssd.num_classes = 37
train_input_fn = inputs.create_train_input_fn(
train_config=configs['train_config'],
train_input_config=configs['train_input_config'],
model_config=configs['train_config']) # Expecting `DetectionModel`.
with self.assertRaises(TypeError):
train_input_fn()
def test_error_with_bad_eval_config(self):
"""Tests that a TypeError is raised with improper eval config."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
configs['model'].ssd.num_classes = 37
eval_input_fn = inputs.create_eval_input_fn(
eval_config=configs['train_config'], # Expecting `EvalConfig`.
eval_input_config=configs['eval_input_config'],
model_config=configs['model'])
with self.assertRaises(TypeError):
eval_input_fn()
def test_error_with_bad_eval_input_config(self):
"""Tests that a TypeError is raised with improper eval input config."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
configs['model'].ssd.num_classes = 37
eval_input_fn = inputs.create_eval_input_fn(
eval_config=configs['eval_config'],
eval_input_config=configs['model'], # Expecting `InputReader`.
model_config=configs['model'])
with self.assertRaises(TypeError):
eval_input_fn()
def test_error_with_bad_eval_model_config(self):
"""Tests that a TypeError is raised with improper eval model config."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
configs['model'].ssd.num_classes = 37
eval_input_fn = inputs.create_eval_input_fn(
eval_config=configs['eval_config'],
eval_input_config=configs['eval_input_config'],
model_config=configs['eval_config']) # Expecting `DetectionModel`.
with self.assertRaises(TypeError):
eval_input_fn()
class DataAugmentationFnTest(tf.test.TestCase):
def test_apply_image_and_box_augmentation(self):
data_augmentation_options = [
(preprocessor.resize_image, {
'new_height': 20,
'new_width': 20,
'method': tf.image.ResizeMethod.NEAREST_NEIGHBOR
}),
(preprocessor.scale_boxes_to_pixel_coordinates, {}),
]
data_augmentation_fn = functools.partial(
inputs.augment_input_data,
data_augmentation_options=data_augmentation_options)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(10, 10, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1., 1.]], np.float32))
}
augmented_tensor_dict = data_augmentation_fn(tensor_dict=tensor_dict)
with self.test_session() as sess:
augmented_tensor_dict_out = sess.run(augmented_tensor_dict)
self.assertAllEqual(
augmented_tensor_dict_out[fields.InputDataFields.image].shape,
[20, 20, 3]
)
self.assertAllClose(
augmented_tensor_dict_out[fields.InputDataFields.groundtruth_boxes],
[[10, 10, 20, 20]]
)
def test_include_masks_in_data_augmentation(self):
data_augmentation_options = [
(preprocessor.resize_image, {
'new_height': 20,
'new_width': 20,
'method': tf.image.ResizeMethod.NEAREST_NEIGHBOR
})
]
data_augmentation_fn = functools.partial(
inputs.augment_input_data,
data_augmentation_options=data_augmentation_options)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(10, 10, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_instance_masks:
tf.constant(np.zeros([2, 10, 10], np.uint8))
}
augmented_tensor_dict = data_augmentation_fn(tensor_dict=tensor_dict)
with self.test_session() as sess:
augmented_tensor_dict_out = sess.run(augmented_tensor_dict)
self.assertAllEqual(
augmented_tensor_dict_out[fields.InputDataFields.image].shape,
[20, 20, 3])
self.assertAllEqual(augmented_tensor_dict_out[
fields.InputDataFields.groundtruth_instance_masks].shape, [2, 20, 20])
def test_include_keypoints_in_data_augmentation(self):
data_augmentation_options = [
(preprocessor.resize_image, {
'new_height': 20,
'new_width': 20,
'method': tf.image.ResizeMethod.NEAREST_NEIGHBOR
}),
(preprocessor.scale_boxes_to_pixel_coordinates, {}),
]
data_augmentation_fn = functools.partial(
inputs.augment_input_data,
data_augmentation_options=data_augmentation_options)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(10, 10, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1., 1.]], np.float32)),
fields.InputDataFields.groundtruth_keypoints:
tf.constant(np.array([[[0.5, 1.0], [0.5, 0.5]]], np.float32))
}
augmented_tensor_dict = data_augmentation_fn(tensor_dict=tensor_dict)
with self.test_session() as sess:
augmented_tensor_dict_out = sess.run(augmented_tensor_dict)
self.assertAllEqual(
augmented_tensor_dict_out[fields.InputDataFields.image].shape,
[20, 20, 3]
)
self.assertAllClose(
augmented_tensor_dict_out[fields.InputDataFields.groundtruth_boxes],
[[10, 10, 20, 20]]
)
self.assertAllClose(
augmented_tensor_dict_out[fields.InputDataFields.groundtruth_keypoints],
[[[10, 20], [10, 10]]]
)
def _fake_model_preprocessor_fn(image):
return (image, tf.expand_dims(tf.shape(image)[1:], axis=0))
def _fake_image_resizer_fn(image, mask):
return (image, mask, tf.shape(image))
class DataTransformationFnTest(tf.test.TestCase):
def test_combine_additional_channels_if_present(self):
image = np.random.rand(4, 4, 3).astype(np.float32)
additional_channels = np.random.rand(4, 4, 2).astype(np.float32)
tensor_dict = {
fields.InputDataFields.image:
tf.constant(image),
fields.InputDataFields.image_additional_channels:
tf.constant(additional_channels),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([1, 1], np.int32))
}
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=1)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllEqual(transformed_inputs[fields.InputDataFields.image].dtype,
tf.float32)
self.assertAllEqual(transformed_inputs[fields.InputDataFields.image].shape,
[4, 4, 5])
self.assertAllClose(transformed_inputs[fields.InputDataFields.image],
np.concatenate((image, additional_channels), axis=2))
def test_returns_correct_class_label_encodings(self):
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(4, 4, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[0, 0, 1, 1], [.5, .5, 1, 1]], np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_classes],
[[0, 0, 1], [1, 0, 0]])
def test_returns_correct_merged_boxes(self):
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(4, 4, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1, 1], [.5, .5, 1, 1]], np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes,
merge_multiple_boxes=True)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_boxes],
[[.5, .5, 1., 1.]])
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_classes],
[[1, 0, 1]])
def test_returns_resized_masks(self):
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(4, 4, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_instance_masks:
tf.constant(np.random.rand(2, 4, 4).astype(np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
def fake_image_resizer_fn(image, masks=None):
resized_image = tf.image.resize_images(image, [8, 8])
results = [resized_image]
if masks is not None:
resized_masks = tf.transpose(
tf.image.resize_images(tf.transpose(masks, [1, 2, 0]), [8, 8]),
[2, 0, 1])
results.append(resized_masks)
results.append(tf.shape(resized_image))
return results
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=fake_image_resizer_fn,
num_classes=num_classes,
retain_original_image=True)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllEqual(transformed_inputs[
fields.InputDataFields.original_image].dtype, tf.uint8)
self.assertAllEqual(transformed_inputs[
fields.InputDataFields.original_image].shape, [4, 4, 3])
self.assertAllEqual(transformed_inputs[
fields.InputDataFields.groundtruth_instance_masks].shape, [2, 8, 8])
def test_applies_model_preprocess_fn_to_image_tensor(self):
np_image = np.random.randint(256, size=(4, 4, 3))
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np_image),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
def fake_model_preprocessor_fn(image):
return (image / 255., tf.expand_dims(tf.shape(image)[1:], axis=0))
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(transformed_inputs[fields.InputDataFields.image],
np_image / 255.)
self.assertAllClose(transformed_inputs[fields.InputDataFields.
true_image_shape],
[4, 4, 3])
def test_applies_data_augmentation_fn_to_tensor_dict(self):
np_image = np.random.randint(256, size=(4, 4, 3))
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np_image),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
def add_one_data_augmentation_fn(tensor_dict):
return {key: value + 1 for key, value in tensor_dict.items()}
num_classes = 4
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=add_one_data_augmentation_fn)
with self.test_session() as sess:
augmented_tensor_dict = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllEqual(augmented_tensor_dict[fields.InputDataFields.image],
np_image + 1)
self.assertAllEqual(
augmented_tensor_dict[fields.InputDataFields.groundtruth_classes],
[[0, 0, 0, 1], [0, 1, 0, 0]])
def test_applies_data_augmentation_fn_before_model_preprocess_fn(self):
np_image = np.random.randint(256, size=(4, 4, 3))
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np_image),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([3, 1], np.int32))
}
def mul_two_model_preprocessor_fn(image):
return (image * 2, tf.expand_dims(tf.shape(image)[1:], axis=0))
def add_five_to_image_data_augmentation_fn(tensor_dict):
tensor_dict[fields.InputDataFields.image] += 5
return tensor_dict
num_classes = 4
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=mul_two_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=add_five_to_image_data_augmentation_fn)
with self.test_session() as sess:
augmented_tensor_dict = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllEqual(augmented_tensor_dict[fields.InputDataFields.image],
(np_image + 5) * 2)
if __name__ == '__main__':
tf.test.main()
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Argmax matcher implementation.
This class takes a similarity matrix and matches columns to rows based on the
maximum value per column. One can specify a matched_threshold to prevent
columns from matching to rows (generally resulting in a negative training
example) and an unmatched_threshold to ignore the match (generally resulting
in neither a positive nor a negative training example).
This matcher is used in Fast(er)-RCNN.
Note: matchers are used in TargetAssigners. There is a create_target_assigner
factory function for popular implementations.
"""
import tensorflow as tf
from object_detection.core import matcher
from object_detection.utils import shape_utils
class ArgMaxMatcher(matcher.Matcher):
"""Matcher based on highest value.
This class computes matches from a similarity matrix. Each column is matched
to a single row.
To support object detection target assignment this class enables setting both
  matched_threshold (upper threshold) and unmatched_threshold (lower threshold),
  defining three categories of similarity that determine whether examples are
positive, negative, or ignored:
(1) similarity >= matched_threshold: Highest similarity. Matched/Positive!
(2) matched_threshold > similarity >= unmatched_threshold: Medium similarity.
Depending on negatives_lower_than_unmatched, this is either
Unmatched/Negative OR Ignore.
(3) unmatched_threshold > similarity: Lowest similarity. Depending on flag
negatives_lower_than_unmatched, either Unmatched/Negative OR Ignore.
For ignored matches this class sets the values in the Match object to -2.
"""
def __init__(self,
matched_threshold,
unmatched_threshold=None,
negatives_lower_than_unmatched=True,
force_match_for_each_row=False,
use_matmul_gather=False):
"""Construct ArgMaxMatcher.
Args:
matched_threshold: Threshold for positive matches. Positive if
sim >= matched_threshold, where sim is the maximum value of the
similarity matrix for a given column. Set to None for no threshold.
unmatched_threshold: Threshold for negative matches. Negative if
sim < unmatched_threshold. Defaults to matched_threshold
when set to None.
negatives_lower_than_unmatched: Boolean which defaults to True. If True
then negative matches are the ones below the unmatched_threshold,
        whereas ignored matches are in between the matched and unmatched
threshold. If False, then negative matches are in between the matched
and unmatched threshold, and everything lower than unmatched is ignored.
force_match_for_each_row: If True, ensures that each row is matched to
at least one column (which is not guaranteed otherwise if the
matched_threshold is high). Defaults to False. See
argmax_matcher_test.testMatcherForceMatch() for an example.
use_matmul_gather: Force constructed match objects to use matrix
multiplication based gather instead of standard tf.gather.
(Default: False).
Raises:
ValueError: if unmatched_threshold is set but matched_threshold is not set
or if unmatched_threshold > matched_threshold.
"""
super(ArgMaxMatcher, self).__init__(use_matmul_gather=use_matmul_gather)
if (matched_threshold is None) and (unmatched_threshold is not None):
      raise ValueError('Need to also define matched_threshold when '
                       'unmatched_threshold is defined')
self._matched_threshold = matched_threshold
if unmatched_threshold is None:
self._unmatched_threshold = matched_threshold
else:
if unmatched_threshold > matched_threshold:
        raise ValueError('unmatched_threshold needs to be smaller or equal '
                         'to matched_threshold')
self._unmatched_threshold = unmatched_threshold
if not negatives_lower_than_unmatched:
if self._unmatched_threshold == self._matched_threshold:
        raise ValueError('When negatives are in between matched and '
                         'unmatched thresholds, these cannot be of equal '
                         'value. matched: %s, unmatched: %s' %
                         (self._matched_threshold, self._unmatched_threshold))
self._force_match_for_each_row = force_match_for_each_row
self._negatives_lower_than_unmatched = negatives_lower_than_unmatched
def _match(self, similarity_matrix):
"""Tries to match each column of the similarity matrix to a row.
Args:
similarity_matrix: tensor of shape [N, M] representing any similarity
metric.
Returns:
Match object with corresponding matches for each of M columns.
"""
def _match_when_rows_are_empty():
"""Performs matching when the rows of similarity matrix are empty.
When the rows are empty, all detections are false positives. So we return
a tensor of -1's to indicate that the columns do not match to any rows.
Returns:
matches: int32 tensor indicating the row each column matches to.
"""
similarity_matrix_shape = shape_utils.combined_static_and_dynamic_shape(
similarity_matrix)
return -1 * tf.ones([similarity_matrix_shape[1]], dtype=tf.int32)
def _match_when_rows_are_non_empty():
"""Performs matching when the rows of similarity matrix are non empty.
Returns:
matches: int32 tensor indicating the row each column matches to.
"""
# Matches for each column
matches = tf.argmax(similarity_matrix, 0, output_type=tf.int32)
# Deal with matched and unmatched threshold
if self._matched_threshold is not None:
# Get logical indices of ignored and unmatched columns as tf.int64
matched_vals = tf.reduce_max(similarity_matrix, 0)
below_unmatched_threshold = tf.greater(self._unmatched_threshold,
matched_vals)
between_thresholds = tf.logical_and(
tf.greater_equal(matched_vals, self._unmatched_threshold),
tf.greater(self._matched_threshold, matched_vals))
if self._negatives_lower_than_unmatched:
matches = self._set_values_using_indicator(matches,
below_unmatched_threshold,
-1)
matches = self._set_values_using_indicator(matches,
between_thresholds,
-2)
else:
matches = self._set_values_using_indicator(matches,
below_unmatched_threshold,
-2)
matches = self._set_values_using_indicator(matches,
between_thresholds,
-1)
if self._force_match_for_each_row:
similarity_matrix_shape = shape_utils.combined_static_and_dynamic_shape(
similarity_matrix)
force_match_column_ids = tf.argmax(similarity_matrix, 1,
output_type=tf.int32)
force_match_column_indicators = tf.one_hot(
force_match_column_ids, depth=similarity_matrix_shape[1])
force_match_row_ids = tf.argmax(force_match_column_indicators, 0,
output_type=tf.int32)
force_match_column_mask = tf.cast(
tf.reduce_max(force_match_column_indicators, 0), tf.bool)
final_matches = tf.where(force_match_column_mask,
force_match_row_ids, matches)
return final_matches
else:
return matches
if similarity_matrix.shape.is_fully_defined():
if similarity_matrix.shape[0].value == 0:
return _match_when_rows_are_empty()
else:
return _match_when_rows_are_non_empty()
else:
return tf.cond(
tf.greater(tf.shape(similarity_matrix)[0], 0),
_match_when_rows_are_non_empty, _match_when_rows_are_empty)
def _set_values_using_indicator(self, x, indicator, val):
"""Set the indicated fields of x to val.
Args:
x: tensor.
indicator: boolean with same shape as x.
val: scalar with value to set.
Returns:
modified tensor.
"""
indicator = tf.cast(indicator, x.dtype)
return tf.add(tf.multiply(x, 1 - indicator), val * indicator)
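# A small illustrative sketch of the three similarity bands described in the
# class docstring. The similarity values are arbitrary (they mirror the unit
# tests); with matched_threshold=3. and unmatched_threshold=2., a column whose
# best similarity is >= 3 matches a row, < 2 yields -1 (negative), and values
# in between yield -2 (ignored). The function name is a placeholder.
def _example_argmax_matching():
  """Returns the raw match_results tensor for a toy similarity matrix."""
  similarity = tf.constant([[1., 1., 1., 3., 1.],
                            [2., -1., 2., 0., 4.],
                            [3., 0., -1., 0., 0.]])
  arg_matcher = ArgMaxMatcher(matched_threshold=3., unmatched_threshold=2.)
  match = arg_matcher.match(similarity)
  # Per the accompanying tests: columns 0, 3, 4 match rows 2, 0, 1; column 1 is
  # negative (-1) and column 2 is ignored (-2).
  return match.match_results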
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.matchers.argmax_matcher."""
import numpy as np
import tensorflow as tf
from object_detection.matchers import argmax_matcher
from object_detection.utils import test_case
class ArgMaxMatcherTest(test_case.TestCase):
def test_return_correct_matches_with_default_thresholds(self):
def graph_fn(similarity_matrix):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=None)
match = matcher.match(similarity_matrix)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1., 1, 1, 3, 1],
[2, -1, 2, 0, 4],
[3, 0, -1, 0, 0]], dtype=np.float32)
expected_matched_rows = np.array([2, 0, 1, 0, 1])
(res_matched_cols, res_unmatched_cols,
res_match_results) = self.execute(graph_fn, [similarity])
self.assertAllEqual(res_match_results[res_matched_cols],
expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], [0, 1, 2, 3, 4])
self.assertFalse(np.all(res_unmatched_cols))
def test_return_correct_matches_with_empty_rows(self):
def graph_fn(similarity_matrix):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=None)
match = matcher.match(similarity_matrix)
return match.unmatched_column_indicator()
similarity = 0.2 * np.ones([0, 5], dtype=np.float32)
res_unmatched_cols = self.execute(graph_fn, [similarity])
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0], np.arange(5))
def test_return_correct_matches_with_matched_threshold(self):
def graph_fn(similarity):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3.)
match = matcher.match(similarity)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1, 1, 1, 3, 1],
[2, -1, 2, 0, 4],
[3, 0, -1, 0, 0]], dtype=np.float32)
expected_matched_cols = np.array([0, 3, 4])
expected_matched_rows = np.array([2, 0, 1])
expected_unmatched_cols = np.array([1, 2])
(res_matched_cols, res_unmatched_cols,
match_results) = self.execute(graph_fn, [similarity])
self.assertAllEqual(match_results[res_matched_cols], expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], expected_matched_cols)
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_return_correct_matches_with_matched_and_unmatched_threshold(self):
def graph_fn(similarity):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3.,
unmatched_threshold=2.)
match = matcher.match(similarity)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1, 1, 1, 3, 1],
[2, -1, 2, 0, 4],
[3, 0, -1, 0, 0]], dtype=np.float32)
expected_matched_cols = np.array([0, 3, 4])
expected_matched_rows = np.array([2, 0, 1])
expected_unmatched_cols = np.array([1]) # col 2 has too high maximum val
(res_matched_cols, res_unmatched_cols,
match_results) = self.execute(graph_fn, [similarity])
self.assertAllEqual(match_results[res_matched_cols], expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], expected_matched_cols)
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_return_correct_matches_negatives_lower_than_unmatched_false(self):
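    """Tests matching with negatives_lower_than_unmatched=False.

    In this mode columns between the thresholds are unmatched, while columns
    below unmatched_threshold are ignored.
    """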
def graph_fn(similarity):
matcher = argmax_matcher.ArgMaxMatcher(
matched_threshold=3.,
unmatched_threshold=2.,
negatives_lower_than_unmatched=False)
match = matcher.match(similarity)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1, 1, 1, 3, 1],
[2, -1, 2, 0, 4],
[3, 0, -1, 0, 0]], dtype=np.float32)
expected_matched_cols = np.array([0, 3, 4])
expected_matched_rows = np.array([2, 0, 1])
expected_unmatched_cols = np.array([2]) # col 1 has too low maximum val
(res_matched_cols, res_unmatched_cols,
match_results) = self.execute(graph_fn, [similarity])
self.assertAllEqual(match_results[res_matched_cols], expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], expected_matched_cols)
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_return_correct_matches_unmatched_row_not_using_force_match(self):
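    """Tests that a row with no similarity above threshold matches no column."""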
def graph_fn(similarity):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3.,
unmatched_threshold=2.)
match = matcher.match(similarity)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1, 1, 1, 3, 1],
[-1, 0, -2, -2, -1],
[3, 0, -1, 2, 0]], dtype=np.float32)
expected_matched_cols = np.array([0, 3])
expected_matched_rows = np.array([2, 0])
expected_unmatched_cols = np.array([1, 2, 4])
(res_matched_cols, res_unmatched_cols,
match_results) = self.execute(graph_fn, [similarity])
self.assertAllEqual(match_results[res_matched_cols], expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], expected_matched_cols)
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_return_correct_matches_unmatched_row_while_using_force_match(self):
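    """Tests that force_match_for_each_row matches each row to its best column."""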
def graph_fn(similarity):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3.,
unmatched_threshold=2.,
force_match_for_each_row=True)
match = matcher.match(similarity)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1, 1, 1, 3, 1],
[-1, 0, -2, -2, -1],
[3, 0, -1, 2, 0]], dtype=np.float32)
expected_matched_cols = np.array([0, 1, 3])
expected_matched_rows = np.array([2, 1, 0])
    expected_unmatched_cols = np.array([2, 4])  # cols 2, 4 have max vals below unmatched_threshold
(res_matched_cols, res_unmatched_cols,
match_results) = self.execute(graph_fn, [similarity])
self.assertAllEqual(match_results[res_matched_cols], expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], expected_matched_cols)
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_valid_arguments_corner_case(self):
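    """Tests that equal matched and unmatched thresholds are accepted."""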
argmax_matcher.ArgMaxMatcher(matched_threshold=1,
unmatched_threshold=1)
def test_invalid_arguments_corner_case_negatives_lower_than_thres_false(self):
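    """Tests that equal thresholds are rejected.

    When negatives_lower_than_unmatched is False, matched_threshold must be
    strictly greater than unmatched_threshold.
    """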
with self.assertRaises(ValueError):
argmax_matcher.ArgMaxMatcher(matched_threshold=1,
unmatched_threshold=1,
negatives_lower_than_unmatched=False)
def test_invalid_arguments_no_matched_threshold(self):
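    """Tests that unmatched_threshold alone (no matched_threshold) is rejected."""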
with self.assertRaises(ValueError):
argmax_matcher.ArgMaxMatcher(matched_threshold=None,
unmatched_threshold=4)
def test_invalid_arguments_unmatched_thres_larger_than_matched_thres(self):
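    """Tests that an unmatched_threshold above matched_threshold is rejected."""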
with self.assertRaises(ValueError):
argmax_matcher.ArgMaxMatcher(matched_threshold=1,
unmatched_threshold=2)


if __name__ == '__main__':
tf.test.main()
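
This matcher test can also be run on its own. The command below is a minimal sketch: it assumes the file lives at object_detection/matchers/argmax_matcher_test.py (consistent with its imports) and that PYTHONPATH has been set up as described above.

```bash
# From tensorflow/models/research/
python object_detection/matchers/argmax_matcher_test.py
```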