# Configuring the Object Detection Training Pipeline

## Overview

The TensorFlow Object Detection API uses protobuf files to configure the
training and evaluation process. The schema for the training pipeline can be
found in object_detection/protos/pipeline.proto. At a high level, the config
file is split into 5 parts:

1. The `model` configuration. This defines what type of model will be trained
(ie. meta-architecture, feature extractor).
2. The `train_config`, which decides what settings should be used to train the
model's parameters (ie. SGD parameters, input preprocessing and feature
extractor initialization values).
3. The `eval_config`, which determines what set of metrics will be reported for
evaluation.
4. The `train_input_config`, which defines what dataset the model should be
trained on.
5. The `eval_input_config`, which defines what dataset the model will be
evaluated on. Typically this should be different from the training input
dataset.

A skeleton configuration file is shown below:

```
model {
  (... Add model config here...)
}

train_config : {
  (... Add train_config here...)
}

train_input_reader: {
  (... Add train_input configuration here...)
}

eval_config: {
  (... Add eval_config here...)
}

eval_input_reader: {
  (... Add eval_input configuration here...)
}
```

## Picking Model Parameters

There are a large number of model parameters to configure. The best settings
will depend on your given application. Faster R-CNN models are better suited to
cases where high accuracy is desired and latency is of lower priority.
Conversely, if processing time is the most important factor, SSD models are
recommended. Read [our paper](https://arxiv.org/abs/1611.10012) for a more
detailed discussion on the speed vs accuracy tradeoff.

To help new users get started, sample model configurations have been provided
in the object_detection/samples/configs folder. The contents of these
configuration files can be pasted into the `model` field of the skeleton
configuration. Users should note that the `num_classes` field should be changed
to a value suited for the dataset the user is training on.
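
For example, if the dataset being trained on contains three classes, the
pasted-in `model` config for an SSD-style model might begin as follows (a
minimal sketch; the `ssd` meta-architecture and the value `3` are illustrative):

```
model {
  ssd {
    num_classes: 3
    (... rest of the sample model config, unchanged ...)
  }
}
```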

### Anchor box parameters

Many object detection models use an anchor generator as a region-sampling
strategy, which generates a large number of anchor boxes in a range of shapes
and sizes, in many locations of the image. The detection algorithm then
incrementally offsets the anchor box closest to the ground truth until it
(closely) matches. You can specify the variety of and position of these anchor
boxes in the `anchor_generator` config.

Usually, the anchor configs provided with pre-trained checkpoints are
designed for large/versatile datasets (COCO, ImageNet), in which the goal is to
improve accuracy for a wide range of object sizes and positions. But in most
real-world applications, objects are confined to a limited number of sizes. So
adjusting the anchors to be specific to your dataset and environment
can both improve model accuracy and reduce training time.

The format for these anchor box parameters differs depending on your model
architecture. For details about all fields, see the [`anchor_generator`
definition](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/anchor_generator.proto).
On this page, we'll focus on parameters
used in a traditional single shot detector (SSD) model and SSD models with a
feature pyramid network (FPN) head.

Regardless of the model architecture, you'll need to understand the following
anchor box concepts:

  +  **Scale**: This defines the variety of anchor box sizes. Each box size is
  defined as a proportion of the original image size (for SSD models) or as a
  factor of the filter's stride length (for FPN). The number of different sizes
  is defined using a range of "scales" (relative to image size) or "levels" (the
  level on the feature pyramid). For example, to detect small objects with the
  configurations below, the `min_scale` and `min_level` are set to a small
  value, while `max_scale` and `max_level` specify the largest objects to
  detect.

  +  **Aspect ratio**: This is the width/height ratio for the anchor boxes. For
  example, an `aspect_ratios` value of `1.0` creates a square, and `2.0` creates
  a 2:1 rectangle (landscape orientation). You can define as many aspect ratios
  as you want, and each one is repeated at all anchor box scales.

Beware that increasing the total number of anchor boxes will sharply increase
computation costs, whereas generating fewer anchors that are more likely to
overlap with the ground truth boxes can both improve accuracy and reduce
computation costs.


**Single Shot Detector (SSD) full model:**

Setting `num_layers` to 6 means the model generates each box aspect at 6
different sizes. The exact sizes are not specified, but they're evenly spaced
out between the `min_scale` and `max_scale` values, which specify that the
smallest box size is 20% of the input image size and the largest is 95%.

```
model {
  ssd {
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
      }
    }
  }
}
```

For more details, see [`ssd_anchor_generator.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/ssd_anchor_generator.proto).

**SSD with Feature Pyramid Network (FPN) head:**

When using an FPN head, you must specify the anchor box size relative to the
convolutional filter's stride length at a given pyramid level, using
`anchor_scale`. So in this example, the box size is 4.0 multiplied by the
layer's stride length. The number of sizes you get for each aspect simply
depends on how many levels there are between the `min_level` and `max_level`.

```
model {
  ssd {
    anchor_generator {
      multiscale_anchor_generator {
        anchor_scale: 4.0
        min_level: 3
        max_level: 7
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
      }
    }
  }
}
```

For more details, see [`multiscale_anchor_generator.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/multiscale_anchor_generator.proto).


## Defining Inputs

The TensorFlow Object Detection API accepts inputs in the TFRecord file format.
Users must specify the locations of both the training and evaluation files.
Additionally, users should specify a label map, which defines the mapping
between a class id and class name. The label map should be identical between
training and evaluation datasets.
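
As an illustration, a label map is a text protobuf (commonly named
`label_map.pbtxt`) that lists one `item` entry per class; the class names below
are hypothetical:

```
item {
  id: 1
  name: 'cat'
}
item {
  id: 2
  name: 'dog'
}
```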

An example training input configuration looks as follows:

```
train_input_reader: {
  tf_record_input_reader {
    input_path: "/usr/home/username/data/train.record-?????-of-00010"
  }
  label_map_path: "/usr/home/username/data/label_map.pbtxt"
}
```

The `eval_input_reader` follows the same format. Users should substitute the
`input_path` and `label_map_path` arguments. Note that the paths can also point
to Google Cloud Storage buckets (ie. "gs://project_bucket/train.record") to
pull datasets hosted on Google Cloud.
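
For example, a corresponding evaluation input configuration might look as
follows (the paths are placeholders):

```
eval_input_reader: {
  tf_record_input_reader {
    input_path: "/usr/home/username/data/eval.record-?????-of-00010"
  }
  label_map_path: "/usr/home/username/data/label_map.pbtxt"
}
```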

## Configuring the Trainer

The `train_config` defines parts of the training process:

1. Model parameter initialization.
2. Input preprocessing.
3. SGD parameters.

A sample `train_config` is below:

```
train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 0
            learning_rate: .0002
          }
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "/usr/home/username/tmp/model.ckpt-#####"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  gradient_clipping_by_norm: 10.0
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}
```

### Input Preprocessing

The `data_augmentation_options` in `train_config` can be used to specify
how the training data should be modified. This field is optional.
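
For example, the sketch below combines a random horizontal flip with SSD-style
random cropping; both options are defined in
object_detection/protos/preprocessor.proto, and the particular combination
shown here is purely illustrative:

```
data_augmentation_options {
  random_horizontal_flip {
  }
}
data_augmentation_options {
  ssd_random_crop {
  }
}
```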

### SGD Parameters

The remaining parameters in `train_config` are hyperparameters for gradient
descent. Please note that the optimal learning rates provided in these
configuration files may depend on the specifics of the training setup (e.g.
number of workers, GPU type).
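
For instance, if you train with fewer workers or a smaller effective batch size
than a sample configuration assumes, you may want to scale the learning rate
schedule down; the values below are illustrative only, not tuned:

```
optimizer {
  momentum_optimizer: {
    learning_rate: {
      manual_step_learning_rate {
        initial_learning_rate: 0.0001
        schedule {
          step: 900000
          learning_rate: .00001
        }
      }
    }
    momentum_optimizer_value: 0.9
  }
  use_moving_average: false
}
```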

## Configuring the Evaluator

The main components to set in `eval_config` are `num_examples` and
`metrics_set`. The parameter `num_examples` indicates the number of batches
(currently of batch size 1) used for an evaluation cycle, and is often the
total size of the evaluation dataset. The parameter `metrics_set` indicates
which metrics to run during evaluation (e.g. `"coco_detection_metrics"`).
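
A minimal `eval_config` along these lines might look as follows, where the
value `8000` is illustrative and should typically match the size of your
evaluation set:

```
eval_config: {
  num_examples: 8000
  metrics_set: "coco_detection_metrics"
}
```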