"...examples/question-answering/trainer_qa.py" did not exist on "aebde649e30016aa33b2e1345cb22210a2e49b04"
configuring_jobs.md 9.32 KB
Newer Older
1
2
3
4
# Configuring the Object Detection Training Pipeline

## Overview

The TensorFlow Object Detection API uses protobuf files to configure the
training and evaluation process. The schema for the training pipeline can be
found in object_detection/protos/pipeline.proto. At a high level, the config
file is split into 5 parts:

1. The `model` configuration. This defines what type of model will be trained
(i.e. meta-architecture, feature extractor).
2. The `train_config`, which decides what parameters should be used to train
model parameters (i.e. SGD parameters, input preprocessing and feature extractor
initialization values).
3. The `eval_config`, which determines what set of metrics will be reported for
evaluation.
4. The `train_input_config`, which defines what dataset the model should be
trained on.
5. The `eval_input_config`, which defines what dataset the model will be
evaluated on. Typically this should be different than the training input
dataset.

A skeleton configuration file is shown below:

```
model {
  (... Add model config here...)
}

train_config : {
  (... Add train_config here...)
}

train_input_reader: {
  (... Add train_input configuration here...)
}

eval_config: {
  (... Add eval_config here...)
}

eval_input_reader: {
  (... Add eval_input configuration here...)
}
```

## Picking Model Parameters

There are a large number of model parameters to configure. The best settings
will depend on your given application. Faster R-CNN models are better suited to
cases where high accuracy is desired and latency is of lower priority.
Conversely, if processing time is the most important factor, SSD models are
recommended. Read [our paper](https://arxiv.org/abs/1611.10012) for a more
detailed discussion on the speed vs accuracy tradeoff.

To help new users get started, sample model configurations have been provided
in the object_detection/samples/configs folder. The contents of these
configuration files can be pasted into the `model` field of the skeleton
configuration. Users should note that the `num_classes` field should be changed
to a value suited for the dataset the user is training on.
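
For example, if your dataset has three classes, the start of a pasted SSD
`model` config might look like the sketch below; the class count and the elided
fields are placeholders, and the remainder of the block comes from the chosen
sample config:

```
model {
  ssd {
    num_classes: 3  # set this to the number of classes in your label map
    (... rest of the sample model config ...)
  }
}
```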

### Anchor box parameters

Many object detection models use an anchor generator as a region-sampling
strategy, which generates a large number of anchor boxes in a range of shapes
and sizes, in many locations of the image. The detection algorithm then
incrementally offsets the anchor box closest to the ground truth until it
(closely) matches. You can specify the variety and position of these anchor
boxes in the `anchor_generator` config.

Usually, the anchor configs provided with pre-trained checkpoints are
designed for large/versatile datasets (COCO, ImageNet), in which the goal is to
improve accuracy for a wide range of object sizes and positions. But in most
real-world applications, objects are confined to a limited number of sizes. So
adjusting the anchors to be specific to your dataset and environment
can both improve model accuracy and reduce training time.

The format for these anchor box parameters differs depending on your model
architecture. For details about all fields, see the [`anchor_generator`
definition](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/anchor_generator.proto).
On this page, we'll focus on parameters
used in a traditional single shot detector (SSD) model and SSD models with a
feature pyramid network (FPN) head.

Regardless of the model architecture, you'll need to understand the following
anchor box concepts:

  +  **Scale**: This defines the variety of anchor box sizes. Each box size is
  defined as a proportion of the original image size (for SSD models) or as a
  factor of the filter's stride length (for FPN). The number of different sizes
  is defined using a range of "scales" (relative to image size) or "levels" (the
  level on the feature pyramid). For example, to detect small objects with the
  configurations below, the `min_scale` and `min_level` are set to a small
  value, while `max_scale` and `max_level` specify the largest objects to
  detect.

  +  **Aspect ratio**: This is the width/height ratio for the anchor boxes. For
  example, an `aspect_ratios` value of `1.0` creates a square, and `2.0` creates
  a box twice as wide as it is tall (landscape orientation). You can define as
  many aspect ratios as you want and each one is repeated at all anchor box
  scales.

Beware that increasing the total number of anchor boxes will significantly
increase computation costs, whereas generating fewer anchors that have a higher
chance of overlapping with ground truth will both improve accuracy and reduce
computation costs.

Although you can manually select values for both scale and aspect ratios that
work well for your dataset, there are programmatic techniques you can use
instead. One such strategy to determine the ideal aspect ratios is to perform
k-means clustering of all the ground-truth bounding-box ratios, as shown in this
Colab notebook: [Generate SSD anchor box aspect ratios using k-means
clustering](https://colab.sandbox.google.com/github/tensorflow/models/blob/master/research/object_detection/colab_tutorials/generate_ssd_anchor_box_aspect_ratios_using_k_means_clustering.ipynb).

**Single Shot Detector (SSD) full model:**

Setting `num_layers` to 6 means the model generates each box aspect at 6
different sizes. The exact sizes are not specified but they're evenly spaced out
between the `min_scale` and `max_scale` values, which specify that the smallest
box size is 20% of the input image size and the largest is 95% of it.

```
model {
  ssd {
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
      }
    }
  }
}
```

For more details, see [`ssd_anchor_generator.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/ssd_anchor_generator.proto).

**SSD with Feature Pyramid Network (FPN) head:**

When using an FPN head, you must specify the anchor box size relative to the
convolutional filter's stride length at a given pyramid level, using
`anchor_scale`. So in this example, the box size is 4.0 multiplied by the
layer's stride length. The number of sizes you get for each aspect simply
depends on how many levels there are between the `min_level` and `max_level`.

```
model {
  ssd {
    anchor_generator {
      multiscale_anchor_generator {
        anchor_scale: 4.0
        min_level: 3
        max_level: 7
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
      }
    }
  }
}
```

For more details, see [`multiscale_anchor_generator.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/multiscale_anchor_generator.proto).


## Defining Inputs

The TensorFlow Object Detection API accepts inputs in the TFRecord file format.
Users must specify the locations of both the training and evaluation files.
Additionally, users should also specify a label map, which defines the mapping
between a class id and class name. The label map should be identical between
training and evaluation datasets.
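
For reference, a label map is a small text protobuf file. A minimal sketch with
two hypothetical classes looks like this (ids should start at 1, since 0 is
reserved for the background class):

```
item {
  id: 1
  name: 'cat'
}
item {
  id: 2
  name: 'dog'
}
```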

An example training input configuration looks as follows:

```
train_input_reader: {
  tf_record_input_reader {
    input_path: "/usr/home/username/data/train.record-?????-of-00010"
  }
  label_map_path: "/usr/home/username/data/label_map.pbtxt"
}
```

The `eval_input_reader` follows the same format. Users should substitute the
`input_path` and `label_map_path` arguments. Note that the paths can also point
to Google Cloud Storage buckets (i.e. "gs://project_bucket/train.record") to
pull datasets hosted on Google Cloud.
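
For instance, a matching evaluation input configuration might look like the
sketch below; the paths are placeholders, and `shuffle: false` with
`num_readers: 1` is the pattern commonly used for evaluation so that results
are deterministic:

```
eval_input_reader: {
  tf_record_input_reader {
    input_path: "/usr/home/username/data/eval.record-?????-of-00010"
  }
  label_map_path: "/usr/home/username/data/label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
```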

## Configuring the Trainer

The `train_config` defines parts of the training process:

1. Model parameter initialization.
2. Input preprocessing.
3. SGD parameters.

A sample `train_config` is below:

```
train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 0
            learning_rate: .0002
          }
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "/usr/home/username/tmp/model.ckpt-#####"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  gradient_clipping_by_norm: 10.0
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}
```

### Input Preprocessing

The `data_augmentation_options` in `train_config` can be used to specify
how training data can be modified. This field is optional.
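
For example, to add random crops on top of the horizontal flips used above, you
can repeat the `data_augmentation_options` field; both options below are
defined in `preprocessor.proto` (this is a sketch, and the right augmentations
depend on your model and data):

```
data_augmentation_options {
  random_horizontal_flip {
  }
}
data_augmentation_options {
  ssd_random_crop {
  }
}
```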

### SGD Parameters

The remaining parameters in `train_config` are hyperparameters for gradient
descent. Please note that the optimal learning rates provided in these
configuration files may depend on the specifics of the training setup (e.g.
number of workers, gpu type).
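
As an alternative to the manual step schedule shown above, some of the sample
configs use a cosine decay schedule with warmup. A sketch of such an
`optimizer` block follows; the numeric values are illustrative only and should
be tuned for your setup:

```
optimizer {
  momentum_optimizer: {
    learning_rate: {
      cosine_decay_learning_rate {
        learning_rate_base: 0.04
        total_steps: 25000
        warmup_learning_rate: 0.013333
        warmup_steps: 2000
      }
    }
    momentum_optimizer_value: 0.9
  }
  use_moving_average: false
}
```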

## Configuring the Evaluator

The main components to set in `eval_config` are `num_examples` and
`metrics_set`. The parameter `num_examples` indicates the number of batches
(currently of batch size 1) used for an evaluation cycle, and is often the
total size of the evaluation dataset. The parameter `metrics_set` indicates
which metrics to run during evaluation (i.e. `"coco_detection_metrics"`).
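
A minimal `eval_config`, assuming a hypothetical evaluation set of 8000 images,
might look like this:

```
eval_config: {
  num_examples: 8000
  metrics_set: "coco_detection_metrics"
}
```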