used in a traditional single shot detector (SSD) model and SSD models with a
feature pyramid network (FPN) head.
Regardless of the model architecture, you'll need to understand the following
anchor box concepts:
* **Scale**: This defines the variety of anchor box sizes. Each box size is
defined as a proportion of the original image size (for SSD models) or as a
factor of the convolutional filter's stride length (for FPN). The number of
distinct sizes is defined using a range of "scales" (relative to the image
size) or "levels" (levels on the feature pyramid). For example, in the
configurations below, `min_scale` and `min_level` are set to small values so
the model can detect small objects, while `max_scale` and `max_level` specify
the largest objects to detect.
* **Aspect ratio**: This is the width/height ratio for the anchor boxes. For
example, an `aspect_ratios` value of `1.0` creates a square, and `2.0` creates
a rectangle twice as wide as it is tall (landscape orientation). You can define
as many aspect ratios as you want, and each one is repeated at every anchor box
scale (see the sketch after this list).
Beware that increasing the total number of anchor boxes multiplies computation
costs, whereas generating fewer anchors that have a higher chance of
overlapping with the ground-truth boxes can both improve accuracy and reduce
computation costs.
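To make these two concepts concrete, here is a minimal Python sketch (the helper name is hypothetical, and holding box area constant across aspect ratios is an assumption based on common anchor-generator implementations) that enumerates the box shape produced by each scale/aspect-ratio pair:

```
import math

def anchor_box_shapes(scales, aspect_ratios, image_size):
    # Hypothetical helper: list (width, height) in pixels for every
    # scale/aspect-ratio pair, holding box area roughly constant across
    # aspect ratios (a common convention in anchor generators).
    shapes = []
    for scale in scales:
        base = scale * image_size          # box edge length for this scale
        for ar in aspect_ratios:           # ar = width / height
            width = base * math.sqrt(ar)   # wider for ar > 1.0
            height = base / math.sqrt(ar)  # shorter for ar > 1.0
            shapes.append((round(width), round(height)))
    return shapes

# 3 scales x 3 aspect ratios = 9 distinct shapes at every anchor location.
print(anchor_box_shapes([0.2, 0.575, 0.95], [1.0, 2.0, 0.5], image_size=300))
```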
**Single Shot Detector (SSD) full model:**
Setting `num_layers` to 6 means the model generates each box aspect ratio at 6
different sizes. The exact sizes are not specified explicitly; they're evenly
spaced between the `min_scale` and `max_scale` values, which specify that the
smallest box size is 20% of the input image size and the largest is 95% of
that size.
```
model {
  ssd {
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
      }
    }
  }
}
```
For more details, see [`ssd_anchor_generator.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/ssd_anchor_generator.proto).
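The even spacing means you can compute the intermediate sizes yourself. A quick Python sketch (simply reproducing the linear interpolation described above):

```
def ssd_scales(min_scale, max_scale, num_layers):
    # Evenly space `num_layers` scale values from min_scale to max_scale.
    step = (max_scale - min_scale) / (num_layers - 1)
    return [min_scale + step * i for i in range(num_layers)]

# With the config above: [0.2, 0.35, 0.5, 0.65, 0.8, 0.95]
print([round(s, 2) for s in ssd_scales(0.2, 0.95, 6)])
```

So with a 300x300 input, the smallest square anchor is about 60 pixels per side and the largest about 285.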
**SSD with Feature Pyramid Network (FPN) head:**
When using an FPN head, you must specify the anchor box size relative to the
convolutional filter's stride length at a given pyramid level, using
`anchor_scale`. So in this example, each box's size is 4.0 times the
corresponding layer's stride length. The number of sizes you get for each
aspect ratio simply depends on how many levels there are from `min_level` to
`max_level`, inclusive.
```
model {
  ssd {
    anchor_generator {
      multiscale_anchor_generator {
        anchor_scale: 4.0
        min_level: 3
        max_level: 7
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
      }
    }
  }
}
```
For more details, see [`multiscale_anchor_generator.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/multiscale_anchor_generator.proto).
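If you assume the common convention that the stride at pyramid level `l` is `2**l` (an assumption here, though it matches typical FPN implementations), you can sketch the resulting base anchor sizes in Python:

```
def fpn_anchor_sizes(min_level, max_level, anchor_scale):
    # Base anchor size in pixels at each level, assuming stride = 2**level.
    return {level: anchor_scale * 2 ** level
            for level in range(min_level, max_level + 1)}

# With the config above: {3: 32.0, 4: 64.0, 5: 128.0, 6: 256.0, 7: 512.0}
print(fpn_anchor_sizes(3, 7, anchor_scale=4.0))
```

That is, each aspect ratio is generated at five base sizes, one per pyramid level.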
## Defining Inputs

The TensorFlow Object Detection API accepts inputs in the TFRecord file format.

...

Additionally, users should also specify a label map, which defines the mapping
between a class id and class name. The label map should be identical between
training and evaluation datasets.
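For reference, a label map is a small file in protobuf text format. A minimal sketch for a hypothetical two-class dataset (the class names are illustrative) looks like this:

```
item {
  id: 1
  name: 'cat'
}
item {
  id: 2
  name: 'dog'
}
```

Note that ids conventionally start at 1, because 0 is reserved for the background class.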
An example training input configuration looks as follows: