Commit e71d67c3 authored by Scott Main, committed by TF Object Detection Team

Add explanation for anchor box configuration

PiperOrigin-RevId: 383651497
parent 178fdeb2
@@ -24,22 +24,23 @@ A skeleton configuration file is shown below:

```
model {
  (... Add model config here...)
}

train_config: {
  (... Add train_config here...)
}

train_input_reader: {
  (... Add train_input configuration here...)
}

eval_config: {
  (... Add eval_config here...)
}

eval_input_reader: {
  (... Add eval_input configuration here...)
}
```
@@ -58,6 +59,106 @@ configuration files can be pasted into the `model` field of the skeleton
configuration. Users should note that the `num_classes` field should be changed
to a value suited to the dataset the user is training on.
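For example, if your dataset contains three classes, the beginning of an SSD-based `model` config would look like the following sketch (written in the skeleton's placeholder convention; the value `3` is purely illustrative):

```
model {
  ssd {
    num_classes: 3
    (... rest of the pasted model config ...)
  }
}
```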
### Anchor box parameters
Many object detection models use an anchor generator as a region-sampling
strategy, which generates a large number of anchor boxes in a range of shapes
and sizes, at many locations in the image. The detection algorithm then
incrementally offsets the anchor box closest to the ground truth until it
(closely) matches. You can specify the variety and position of these anchor
boxes in the `anchor_generator` config.
Usually, the anchor configs provided with pre-trained checkpoints are designed
for large, versatile datasets (COCO, ImageNet), where the goal is to improve
accuracy for a wide range of object sizes and positions. But in most real-world
applications, objects are confined to a limited range of sizes, so adjusting
the anchors to be specific to your dataset and environment can both improve
model accuracy and reduce training time.
The format for these anchor box parameters differs depending on your model
architecture. For details about all fields, see the [`anchor_generator`
definition](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/anchor_generator.proto).
On this page, we'll focus on parameters
used in a traditional single shot detector (SSD) model and SSD models with a
feature pyramid network (FPN) head.
Regardless of the model architecture, you'll need to understand the following
anchor box concepts:
+ **Scale**: This defines the variety of anchor box sizes. Each box size is
defined as a proportion of the original image size (for SSD models) or as a
factor of the filter's stride length (for FPN). The number of different sizes
is defined using a range of "scales" (relative to image size) or "levels" (the
level on the feature pyramid). For example, to detect small objects with the
configurations below, `min_scale` and `min_level` are set to small values,
while `max_scale` and `max_level` determine the largest objects to detect.
+ **Aspect ratio**: This is the ratio between anchor box width and height. For
example, an `aspect_ratios` value of `1.0` creates a square, and `2.0` creates
a 1:2 (height:width) rectangle (landscape orientation). You can define as many
aspect ratios as you want, and each one is repeated at every anchor box scale.
Beware that increasing the total number of anchor boxes rapidly increases
computation costs, whereas generating fewer anchors that have a higher chance
of overlapping the ground truth both improves accuracy and reduces computation
costs.
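As a back-of-the-envelope check, you can estimate the anchor count by summing over the feature-map grids. Here is a rough Python sketch; the grid sizes below are assumptions for a 300x300 SSD input, not values taken from any config on this page:

```python
# Approximate total anchor count for an SSD-style detector.
# Grid sizes are illustrative for a 300x300 input; real values depend
# on the backbone and input resolution.
feature_maps = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
num_aspect_ratios = 3  # e.g. 1.0, 2.0, 0.5

total = sum(h * w * num_aspect_ratios for h, w in feature_maps)
print(total)  # 5820 boxes to classify and regress on every image
```

Every extra aspect ratio or scale multiplies this count, which is why trimming the anchor set to your data pays off.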
**Single Shot Detector (SSD) full model:**
Setting `num_layers` to 6 means the model generates each box aspect at 6
different sizes. The exact sizes are not specified, but they're evenly spaced
between the `min_scale` and `max_scale` values, which specify that the smallest
box size is 20% of the input image size and the largest is 95% of it.
```
model {
  ssd {
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
      }
    }
  }
}
```
For more details, see [`ssd_anchor_generator.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/ssd_anchor_generator.proto).
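Although the library computes these sizes internally, the even spacing is easy to verify by hand. A minimal Python sketch of the interpolation (an assumption based on the SSD paper's formula, not the library's exact code):

```python
# Evenly spaced scales between min_scale and max_scale (SSD-paper style).
num_layers, min_scale, max_scale = 6, 0.2, 0.95

scales = [min_scale + (max_scale - min_scale) * k / (num_layers - 1)
          for k in range(num_layers)]
print([round(s, 2) for s in scales])  # [0.2, 0.35, 0.5, 0.65, 0.8, 0.95]
```

Each of the three aspect ratios above is then generated at all six of these scales.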
**SSD with Feature Pyramid Network (FPN) head:**
When using an FPN head, you must specify the anchor box size relative to the
convolutional filter's stride length at a given pyramid level, using
`anchor_scale`. So in this example, the base box size is 4.0 multiplied by the
layer's stride length. The number of sizes you get for each aspect ratio simply
depends on how many levels there are between `min_level` and `max_level`.
```
model {
  ssd {
    anchor_generator {
      multiscale_anchor_generator {
        anchor_scale: 4.0
        min_level: 3
        max_level: 7
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
      }
    }
  }
}
```
For more details, see [`multiscale_anchor_generator.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/multiscale_anchor_generator.proto).
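To make the stride relationship concrete, here is a small Python sketch (it assumes the common FPN convention that the stride at level `l` is `2**l`):

```python
# Base anchor size per pyramid level: anchor_scale * stride.
anchor_scale, min_level, max_level = 4.0, 3, 7

for level in range(min_level, max_level + 1):
    stride = 2 ** level
    print(f"level {level}: stride {stride:3d} -> "
          f"base anchor size {anchor_scale * stride:.0f}")
# level 3: stride   8 -> base anchor size 32
# ...
# level 7: stride 128 -> base anchor size 512
```

So this configuration covers objects from roughly 32 to 512 pixels on a side, before the aspect ratios are applied.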
## Defining Inputs

The TensorFlow Object Detection API accepts inputs in the TFRecord file format.

@@ -66,20 +167,21 @@ Additionally, users should also specify a label map, which defines the mapping
between a class id and class name. The label map should be identical between
training and evaluation datasets.
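For reference, a label map is a small file in protobuf text format that lists one `item` per class. A minimal sketch with two illustrative class names (ids should start at 1, since 0 is reserved for the background class):

```
item {
  id: 1
  name: 'cat'
}
item {
  id: 2
  name: 'dog'
}
```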
An example training input configuration looks as follows:

```
train_input_reader: {
  tf_record_input_reader {
    input_path: "/usr/home/username/data/train.record-?????-of-00010"
  }
  label_map_path: "/usr/home/username/data/label_map.pbtxt"
}
```
The `eval_input_reader` follows the same format. Users should substitute the
`input_path` and `label_map_path` arguments. The `?????-of-00010` pattern in
`input_path` matches a dataset that was written as 10 sharded TFRecord files.
Note that the paths can also point to Google Cloud Storage buckets (e.g.,
"gs://project_bucket/train.record") to pull datasets hosted on Google Cloud.
## Configuring the Trainer

@@ -92,8 +194,9 @@ The `train_config` defines parts of the training process:

A sample `train_config` is below:

```
train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
@@ -115,14 +218,15 @@ optimizer {
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "/usr/home/username/tmp/model.ckpt-#####"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  gradient_clipping_by_norm: 10.0
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}
```