Unverified commit 70255908, authored by pkulzc and committed by GitHub

Object detection Internal Changes. (#4757)

* Merged commit includes the following changes:
204316992  by Zhichao Lu:

    Update docs to prepare inputs

--
204309254  by Zhichao Lu:

    Update running_pets.md to use new binaries and correct a few things in running_on_cloud.md

--
204306734  by Zhichao Lu:

    Move old binaries into legacy folder and add deprecation notice.

--
204267757  by Zhichao Lu:

    Fix a problem in VRD evaluation with missing groundtruth annotations for
    images that do not contain objects from the 62 groundtruth classes.

--
204167430  by Zhichao Lu:

    This fixes a flaky losses test failure.

--
203670721  by Zhichao Lu:

    Internal change.

--
203569388  by Zhichao Lu:

    Internal change

--
203546580  by Zhichao Lu:

    * Expand TPU compatibility g3doc with config snippets
    * Change mscoco dataset path in sample configs to the sharded versions

--
203325694  by Zhichao Lu:

    Make merge_multiple_label_boxes work for model_main code path.

--
203305655  by Zhichao Lu:

    Remove the 1x1 conv layer before pooling in MobileNet-v1-PPN feature extractor.

--
203139608  by Zhichao Lu:

    - Support exponential_decay with burnin learning rate schedule.
    - Add the minimum learning rate option.
    - Make the exponential decay start only after the burnin steps.
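
    For reference, a rough NumPy sketch of the resulting schedule (illustrative
    parameter values, not the actual TensorFlow implementation): the rate is held
    at the burn-in value for burnin_steps, decays exponentially from the end of
    burn-in rather than from step 0, and is clamped at the configured minimum.

    import numpy as np

    def decayed_learning_rate(step, base_lr=0.004, decay_steps=1000,
                              decay_factor=0.95, burnin_lr=0.001,
                              burnin_steps=2000, min_lr=1e-5, staircase=True):
      # During burn-in the learning rate is held constant.
      if step < burnin_steps:
        return burnin_lr
      # Decay is measured from the end of the burn-in period, not from step 0.
      exponent = (step - burnin_steps) / float(decay_steps)
      if staircase:
        exponent = np.floor(exponent)
      decayed = base_lr * decay_factor ** exponent
      # Never drop below the configured minimum learning rate.
      return max(decayed, min_lr)

    print([decayed_learning_rate(s) for s in (0, 1999, 2000, 12000, 500000)])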

--
203068703  by Zhichao Lu:

    Modify create_coco_tf_record.py to output sharded files.
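
    The sharded output files follow the usual name-00000-of-00100 naming scheme
    that the updated sample configs now point at. A minimal sketch of that
    sharding pattern (an assumption about the general approach, not the actual
    script, which may organize its writers differently):

    import tensorflow as tf

    def write_sharded_tfrecords(tf_examples, output_path, num_shards=100):
      # One TFRecordWriter per shard, named e.g. train.record-00000-of-00100.
      writers = [
          tf.python_io.TFRecordWriter(
              '%s-%05d-of-%05d' % (output_path, shard, num_shards))
          for shard in range(num_shards)
      ]
      for index, example in enumerate(tf_examples):
        # Spread tf.train.Example protos across shards round-robin.
        writers[index % num_shards].write(example.SerializeToString())
      for writer in writers:
        writer.close()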

--
203025308  by Zhichao Lu:

    Add an option to share the prediction tower in WeightSharedBoxPredictor.

--
203024942  by Zhichao Lu:

    Move ssd mobilenet v1 ppn configs to third party.

--
202901259  by Zhichao Lu:

    Delete obsolete ssd mobilenet v1 focal loss configs and update pets dataset path

--
202894154  by Zhichao Lu:

    Move all TPU compatible ssd mobilenet v1 coco14/pet configs to third party.

--
202861774  by Zhichao Lu:

    Move Retinanet (SSD + FPN + Shared box predictor) configs to third_party.

--

PiperOrigin-RevId: 204316992

* Add original files back.
parent ee6fdda1
# SSD with Mobilenet v1 with quantized training.
# Trained on COCO, initialized from Imagenet classification checkpoint
# Achieves 18.2 mAP on coco14 minival dataset.
# This config is TPU compatible
model {
ssd {
inplace_batchnorm_update: true
freeze_batchnorm: false
num_classes: 90
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
use_matmul_gather: true
}
}
similarity_calculator {
iou_similarity {
}
}
encode_background_as_zeros: true
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
}
}
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 1
box_code_size: 4
apply_sigmoid_to_scores: false
class_prediction_bias_init: -4.6
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
random_normal_initializer {
stddev: 0.01
mean: 0.0
}
}
batch_norm {
scale: true,
center: true,
decay: 0.97,
epsilon: 0.001,
}
}
}
}
feature_extractor {
type: 'ssd_mobilenet_v1'
min_depth: 16
depth_multiplier: 0.75
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
random_normal_initializer {
stddev: 0.01
mean: 0.0
}
}
batch_norm {
scale: true,
center: true,
decay: 0.97,
epsilon: 0.001,
}
}
override_base_feature_extractor_hyperparams: true
}
loss {
classification_loss {
weighted_sigmoid_focal {
alpha: 0.75,
gamma: 2.0
}
}
localization_loss {
weighted_smooth_l1 {
}
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
normalize_loc_loss_by_codesize: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
}
}
train_config: {
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
batch_size: 128
sync_replicas: true
startup_delay_steps: 0
replicas_to_aggregate: 8
num_steps: 50000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
optimizer {
momentum_optimizer: {
learning_rate: {
cosine_decay_learning_rate {
learning_rate_base: .2
total_steps: 50000
warmup_learning_rate: 0.06
warmup_steps: 2000
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-00000-of-00100"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
eval_config: {
metrics_set: "coco_detection_metrics"
use_moving_averages: false
num_examples: 8000
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-00000-of-00010"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
shuffle: false
num_readers: 1
}
graph_rewriter {
quantization {
delay: 48000
activation_bits: 8
weight_bits: 8
}
}
# SSD with Mobilenet v1 0.75 depth multiplied feature extractor, focal loss and
# quantized training.
# Trained on IIIT-Oxford pets, initialized from COCO detection checkpoint
# This config is TPU compatible
model {
ssd {
inplace_batchnorm_update: true
freeze_batchnorm: false
num_classes: 37
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
use_matmul_gather: true
}
}
similarity_calculator {
iou_similarity {
}
}
encode_background_as_zeros: true
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
}
}
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 1
box_code_size: 4
apply_sigmoid_to_scores: false
class_prediction_bias_init: -4.6
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
random_normal_initializer {
stddev: 0.01
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9,
epsilon: 0.001,
}
}
}
}
feature_extractor {
type: 'ssd_mobilenet_v1'
min_depth: 16
depth_multiplier: 0.75
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
scale: true,
center: true,
decay: 0.9,
epsilon: 0.001,
}
}
override_base_feature_extractor_hyperparams: true
}
loss {
classification_loss {
weighted_sigmoid_focal {
alpha: 0.75,
gamma: 2.0
}
}
localization_loss {
weighted_smooth_l1 {
delta: 1.0
}
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
normalize_loc_loss_by_codesize: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
}
}
train_config: {
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
fine_tune_checkpoint_type: "detection"
load_all_detection_checkpoint_vars: true
batch_size: 128
sync_replicas: true
startup_delay_steps: 0
replicas_to_aggregate: 8
num_steps: 2000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
optimizer {
momentum_optimizer: {
learning_rate: {
cosine_decay_learning_rate {
learning_rate_base: 0.2
total_steps: 2000
warmup_steps: 0
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/pet_faces_train.record-?????-of-00010"
}
label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt"
}
eval_config: {
metrics_set: "coco_detection_metrics"
use_moving_averages: false
num_examples: 1100
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/pet_faces_val.record-?????-of-00010"
}
label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt"
shuffle: false
num_readers: 1
}
graph_rewriter {
quantization {
delay: 1800
activation_bits: 8
weight_bits: 8
}
}
\ No newline at end of file
-# SSD with Mobilenet v1 configuration for MSCOCO Dataset.
-# Users should configure the fine_tune_checkpoint field in the train config as
-# well as the label_map_path and input_path fields in the train_input_reader and
-# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
-# should be configured.
+# SSD with Mobilenet v1 feature extractor and focal loss.
+# Trained on COCO14, initialized from Imagenet classification checkpoint
+# Achieves 19.3 mAP on COCO14 minival dataset. Doubling the number of training
+# steps gets to 20.6 mAP.
+# TPU-compatible
+# This config is TPU compatible
 model {
 ssd {
+inplace_batchnorm_update: true
+freeze_batchnorm: false
 num_classes: 90
 box_coder {
 faster_rcnn_box_coder {
@@ -23,12 +26,14 @@ model {
 ignore_thresholds: false
 negatives_lower_than_unmatched: true
 force_match_for_each_row: true
+use_matmul_gather: true
 }
 }
 similarity_calculator {
 iou_similarity {
 }
 }
+encode_background_as_zeros: true
 anchor_generator {
 ssd_anchor_generator {
 num_layers: 6
@@ -57,6 +62,7 @@ model {
 kernel_size: 1
 box_code_size: 4
 apply_sigmoid_to_scores: false
+class_prediction_bias_init: -4.6
 conv_hyperparams {
 activation: RELU_6,
 regularizer {
@@ -65,8 +71,8 @@ model {
 }
 }
 initializer {
-truncated_normal_initializer {
-stddev: 0.03
+random_normal_initializer {
+stddev: 0.01
 mean: 0.0
 }
 }
@@ -74,7 +80,7 @@ model {
 train: true,
 scale: true,
 center: true,
-decay: 0.9997,
+decay: 0.97,
 epsilon: 0.001,
 }
 }
@@ -101,26 +107,29 @@ model {
 train: true,
 scale: true,
 center: true,
-decay: 0.9997,
+decay: 0.97,
 epsilon: 0.001,
 }
 }
+override_base_feature_extractor_hyperparams: true
 }
 loss {
 classification_loss {
 weighted_sigmoid_focal {
-alpha: 0.75
+alpha: 0.75,
 gamma: 2.0
 }
 }
 localization_loss {
 weighted_smooth_l1 {
+delta: 1.0
 }
 }
 classification_weight: 1.0
 localization_weight: 1.0
 }
 normalize_loss_by_num_matches: true
+normalize_loc_loss_by_codesize: true
 post_processing {
 batch_non_max_suppression {
 score_threshold: 1e-8
@@ -134,28 +143,12 @@ model {
 }
 train_config: {
-batch_size: 24
-optimizer {
-rms_prop_optimizer: {
-learning_rate: {
-exponential_decay_learning_rate {
-initial_learning_rate: 0.004
-decay_steps: 800720
-decay_factor: 0.95
-}
-}
-momentum_optimizer_value: 0.9
-decay: 0.9
-epsilon: 1.0
-}
-}
 fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
-from_detection_checkpoint: true
-# Note: The below line limits the training process to 200K steps, which we
-# empirically found to be sufficient enough to train the pets dataset. This
-# effectively bypasses the learning rate schedule (the learning rate will
-# never decay). Remove the below line to train indefinitely.
-num_steps: 200000
+batch_size: 2048
+sync_replicas: true
+startup_delay_steps: 0
+replicas_to_aggregate: 8
+num_steps: 10000
 data_augmentation_options {
 random_horizontal_flip {
 }
@@ -164,29 +157,42 @@ train_config: {
 ssd_random_crop {
 }
 }
-max_number_of_boxes: 50
+optimizer {
+momentum_optimizer: {
+learning_rate: {
+cosine_decay_learning_rate {
+learning_rate_base: 0.9
+total_steps: 10000
+warmup_learning_rate: 0.3
+warmup_steps: 300
+}
+}
+momentum_optimizer_value: 0.9
+}
+use_moving_average: false
+}
+max_number_of_boxes: 100
 unpad_groundtruth_tensors: false
 }
 train_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
+input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-00000-of-00100"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
 }
 eval_config: {
+metrics_set: "coco_detection_metrics"
+use_moving_averages: false
 num_examples: 8000
-# Note: The below line limits the evaluation process to 10 evaluations.
-# Remove the below line to evaluate indefinitely.
-max_evals: 10
 }
 eval_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
+input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-00000-of-00010"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
 shuffle: false
 num_readers: 1
 }
\ No newline at end of file
@@ -172,7 +172,7 @@ train_config: {
 train_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
+input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-?????-of-00100"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
 }
@@ -186,7 +186,7 @@ eval_config: {
 eval_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
+input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-?????-of-00010"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
 shuffle: false
@@ -171,7 +171,7 @@ train_config: {
 train_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/pet_faces_train.record-?????-of-?????"
+input_path: "PATH_TO_BE_CONFIGURED/pet_faces_train.record-?????-of-00010"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt"
 }
@@ -183,7 +183,7 @@ eval_config: {
 eval_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/pet_faces_val.record-?????-of-?????"
+input_path: "PATH_TO_BE_CONFIGURED/pet_faces_val.record-?????-of-00010"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt"
 shuffle: false
# SSD with Mobilenet v1 FPN feature extractor, shared box predictor and focal
# loss (a.k.a Retinanet).
# See Lin et al, https://arxiv.org/abs/1708.02002
# Trained on COCO, initialized from Imagenet classification checkpoint
# Achieves 29.6 mAP on COCO14 minival dataset. Doubling the number of training
# steps to 25k gets 31.5 mAP
# This config is TPU compatible
model {
ssd {
inplace_batchnorm_update: true
freeze_batchnorm: false
num_classes: 90
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
use_matmul_gather: true
}
}
similarity_calculator {
iou_similarity {
}
}
encode_background_as_zeros: true
anchor_generator {
multiscale_anchor_generator {
min_level: 3
max_level: 7
anchor_scale: 4.0
aspect_ratios: [1.0, 2.0, 0.5]
scales_per_octave: 2
}
}
image_resizer {
fixed_shape_resizer {
height: 640
width: 640
}
}
box_predictor {
weight_shared_convolutional_box_predictor {
depth: 256
class_prediction_bias_init: -4.6
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
random_normal_initializer {
stddev: 0.01
mean: 0.0
}
}
batch_norm {
scale: true,
decay: 0.997,
epsilon: 0.001,
}
}
num_layers_before_predictor: 4
kernel_size: 3
}
}
feature_extractor {
type: 'ssd_mobilenet_v1_fpn'
min_depth: 16
depth_multiplier: 1.0
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
random_normal_initializer {
stddev: 0.01
mean: 0.0
}
}
batch_norm {
scale: true,
decay: 0.997,
epsilon: 0.001,
}
}
override_base_feature_extractor_hyperparams: true
}
loss {
classification_loss {
weighted_sigmoid_focal {
alpha: 0.25
gamma: 2.0
}
}
localization_loss {
weighted_smooth_l1 {
}
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
normalize_loc_loss_by_codesize: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
}
}
train_config: {
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
batch_size: 128
sync_replicas: true
startup_delay_steps: 0
replicas_to_aggregate: 8
num_steps: 12500
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
random_crop_image {
min_object_covered: 0.0
min_aspect_ratio: 0.75
max_aspect_ratio: 3.0
min_area: 0.75
max_area: 1.0
overlap_thresh: 0.0
}
}
optimizer {
momentum_optimizer: {
learning_rate: {
cosine_decay_learning_rate {
learning_rate_base: .08
total_steps: 12500
warmup_learning_rate: .026666
warmup_steps: 1000
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-00000-of-00100"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
eval_config: {
metrics_set: "coco_detection_metrics"
use_moving_averages: false
num_examples: 8000
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-00000-of-00010"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
shuffle: false
num_readers: 1
}
\ No newline at end of file
@@ -173,19 +173,19 @@ train_config: {
 train_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/pet_faces_train.record-?????-of-?????"
+input_path: "PATH_TO_BE_CONFIGURED/pet_faces_train.record-?????-of-00010"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt"
 }
 eval_config: {
 metrics_set: "coco_detection_metrics"
-num_examples: 1101
+num_examples: 1100
 }
 eval_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/pet_faces_val.record-?????-of-?????"
+input_path: "PATH_TO_BE_CONFIGURED/pet_faces_val.record-?????-of-00010"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt"
 shuffle: false
# SSD with Mobilenet v1 PPN feature extractor.
# Trained on COCO, initialized from Imagenet classification checkpoint
# Achieves 19.7 mAP on COCO14 minival dataset.
# This config is TPU compatible.
model {
ssd {
inplace_batchnorm_update: true
freeze_batchnorm: false
num_classes: 90
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
use_matmul_gather: true
}
}
similarity_calculator {
iou_similarity {
}
}
encode_background_as_zeros: true
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.15
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
reduce_boxes_in_lowest_layer: false
}
}
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
box_predictor {
weight_shared_convolutional_box_predictor {
depth: 512
class_prediction_bias_init: -4.6
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
random_normal_initializer {
stddev: 0.01
mean: 0.0
}
}
batch_norm {
scale: true
center: true
train: true
decay: 0.97
epsilon: 0.001
}
}
num_layers_before_predictor: 1
kernel_size: 1
share_prediction_tower: true
}
}
feature_extractor {
type: 'ssd_mobilenet_v1_ppn'
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
random_normal_initializer {
stddev: 0.01
mean: 0.0
}
}
batch_norm {
scale: true
center: true
decay: 0.97
epsilon: 0.001
}
}
override_base_feature_extractor_hyperparams: true
}
loss {
classification_loss {
weighted_sigmoid_focal {
alpha: 0.75
gamma: 2.0
}
}
localization_loss {
weighted_smooth_l1 {
}
}
classification_weight: 1.0
localization_weight: 1.5
}
normalize_loss_by_num_matches: true
normalize_loc_loss_by_codesize: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
}
}
train_config: {
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
batch_size: 512
sync_replicas: true
startup_delay_steps: 0
replicas_to_aggregate: 8
num_steps: 50000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
optimizer {
momentum_optimizer: {
learning_rate: {
cosine_decay_learning_rate {
learning_rate_base: 0.7
total_steps: 50000
warmup_learning_rate: 0.1333
warmup_steps: 2000
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-00000-of-00100"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
eval_config: {
metrics_set: "coco_detection_metrics"
use_moving_averages: false
num_examples: 8000
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-00000-of-00010"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
shuffle: false
num_readers: 1
}
# SSD with Mobilenet v1 with quantized training.
# Trained on COCO, initialized from Imagenet classification checkpoint
# Achieves 18.2 mAP on coco14 minival dataset.
# This config is TPU compatible
model {
ssd {
inplace_batchnorm_update: true
freeze_batchnorm: false
num_classes: 90
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
use_matmul_gather: true
}
}
similarity_calculator {
iou_similarity {
}
}
encode_background_as_zeros: true
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
}
}
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 1
box_code_size: 4
apply_sigmoid_to_scores: false
class_prediction_bias_init: -4.6
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
random_normal_initializer {
stddev: 0.01
mean: 0.0
}
}
batch_norm {
scale: true,
center: true,
decay: 0.97,
epsilon: 0.001,
}
}
}
}
feature_extractor {
type: 'ssd_mobilenet_v1'
min_depth: 16
depth_multiplier: 1.0
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
random_normal_initializer {
stddev: 0.01
mean: 0.0
}
}
batch_norm {
scale: true,
center: true,
decay: 0.97,
epsilon: 0.001,
}
}
override_base_feature_extractor_hyperparams: true
}
loss {
classification_loss {
weighted_sigmoid_focal {
alpha: 0.75,
gamma: 2.0
}
}
localization_loss {
weighted_smooth_l1 {
}
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
normalize_loc_loss_by_codesize: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
}
}
train_config: {
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
batch_size: 128
sync_replicas: true
startup_delay_steps: 0
replicas_to_aggregate: 8
num_steps: 50000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
optimizer {
momentum_optimizer: {
learning_rate: {
cosine_decay_learning_rate {
learning_rate_base: .2
total_steps: 50000
warmup_learning_rate: 0.06
warmup_steps: 2000
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-00000-of-00100"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
eval_config: {
metrics_set: "coco_detection_metrics"
use_moving_averages: false
num_examples: 8000
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-00000-of-00010"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
shuffle: false
num_readers: 1
}
graph_rewriter {
quantization {
delay: 48000
activation_bits: 8
weight_bits: 8
}
}
@@ -172,7 +172,7 @@ train_config: {
 train_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
+input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-?????-of-00100"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
 }
@@ -186,7 +186,7 @@ eval_config: {
 eval_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
+input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-?????-of-00010"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
 shuffle: false
# SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal
# loss (a.k.a Retinanet).
# See Lin et al, https://arxiv.org/abs/1708.02002
# Trained on COCO, initialized from Imagenet classification checkpoint
# Achieves 35.2 mAP on COCO14 minival dataset. Doubling the number of training
# steps to 50k gets 36.9 mAP
# This config is TPU compatible
model {
ssd {
inplace_batchnorm_update: true
freeze_batchnorm: false
num_classes: 90
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
use_matmul_gather: true
}
}
similarity_calculator {
iou_similarity {
}
}
encode_background_as_zeros: true
anchor_generator {
multiscale_anchor_generator {
min_level: 3
max_level: 7
anchor_scale: 4.0
aspect_ratios: [1.0, 2.0, 0.5]
scales_per_octave: 2
}
}
image_resizer {
fixed_shape_resizer {
height: 640
width: 640
}
}
box_predictor {
weight_shared_convolutional_box_predictor {
depth: 256
class_prediction_bias_init: -4.6
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.0004
}
}
initializer {
random_normal_initializer {
stddev: 0.01
mean: 0.0
}
}
batch_norm {
scale: true,
decay: 0.997,
epsilon: 0.001,
}
}
num_layers_before_predictor: 4
kernel_size: 3
}
}
feature_extractor {
type: 'ssd_resnet50_v1_fpn'
min_depth: 16
depth_multiplier: 1.0
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.0004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
scale: true,
decay: 0.997,
epsilon: 0.001,
}
}
override_base_feature_extractor_hyperparams: true
}
loss {
classification_loss {
weighted_sigmoid_focal {
alpha: 0.25
gamma: 2.0
}
}
localization_loss {
weighted_smooth_l1 {
}
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
normalize_loc_loss_by_codesize: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
}
}
train_config: {
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
batch_size: 64
sync_replicas: true
startup_delay_steps: 0
replicas_to_aggregate: 8
num_steps: 25000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
random_crop_image {
min_object_covered: 0.0
min_aspect_ratio: 0.75
max_aspect_ratio: 3.0
min_area: 0.75
max_area: 1.0
overlap_thresh: 0.0
}
}
optimizer {
momentum_optimizer: {
learning_rate: {
cosine_decay_learning_rate {
learning_rate_base: .04
total_steps: 25000
warmup_learning_rate: .013333
warmup_steps: 2000
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-00000-of-00100"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
eval_config: {
metrics_set: "coco_detection_metrics"
use_moving_averages: false
num_examples: 8000
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-00000-of-00010"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
shuffle: false
num_readers: 1
}
\ No newline at end of file
@@ -174,7 +174,7 @@ train_config: {
 train_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
+input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-?????-of-00100"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
 }
@@ -188,7 +188,7 @@ eval_config: {
 eval_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
+input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-?????-of-00010"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
 shuffle: false
@@ -174,7 +174,7 @@ train_config: {
 train_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
+input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-?????-of-00100"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
 }
@@ -188,7 +188,7 @@ eval_config: {
 eval_input_reader: {
 tf_record_input_reader {
-input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
+input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-?????-of-00010"
 }
 label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
 shuffle: false
@@ -23,7 +23,9 @@ def exponential_decay_with_burnin(global_step,
                                   learning_rate_decay_steps,
                                   learning_rate_decay_factor,
                                   burnin_learning_rate=0.0,
-                                  burnin_steps=0):
+                                  burnin_steps=0,
+                                  min_learning_rate=0.0,
+                                  staircase=True):
   """Exponential decay schedule with burn-in period.
 
   In this schedule, learning rate is fixed at burnin_learning_rate
@@ -41,6 +43,8 @@ def exponential_decay_with_burnin(global_step,
       0.0 (which is the default), then the burn-in learning rate is simply
       set to learning_rate_base.
     burnin_steps: number of steps to use burnin learning rate.
+    min_learning_rate: the minimum learning rate.
+    staircase: whether use staircase decay.
 
   Returns:
     a (scalar) float tensor representing learning rate
@@ -49,14 +53,14 @@ def exponential_decay_with_burnin(global_step,
     burnin_learning_rate = learning_rate_base
   post_burnin_learning_rate = tf.train.exponential_decay(
       learning_rate_base,
-      global_step,
+      global_step - burnin_steps,
       learning_rate_decay_steps,
       learning_rate_decay_factor,
-      staircase=True)
-  return tf.where(
+      staircase=staircase)
+  return tf.maximum(tf.where(
       tf.less(tf.cast(global_step, tf.int32), tf.constant(burnin_steps)),
       tf.constant(burnin_learning_rate),
-      post_burnin_learning_rate, name='learning_rate')
+      post_burnin_learning_rate), min_learning_rate, name='learning_rate')
 
 
 def cosine_decay_with_warmup(global_step,
@@ -30,17 +30,19 @@ class LearningSchedulesTest(test_case.TestCase):
       learning_rate_decay_factor = .1
       burnin_learning_rate = .5
       burnin_steps = 2
+      min_learning_rate = .05
       learning_rate = learning_schedules.exponential_decay_with_burnin(
           global_step, learning_rate_base, learning_rate_decay_steps,
-          learning_rate_decay_factor, burnin_learning_rate, burnin_steps)
+          learning_rate_decay_factor, burnin_learning_rate, burnin_steps,
+          min_learning_rate)
       assert learning_rate.op.name.endswith('learning_rate')
       return (learning_rate,)
 
     output_rates = [
-        self.execute(graph_fn, [np.array(i).astype(np.int64)]) for i in range(8)
+        self.execute(graph_fn, [np.array(i).astype(np.int64)]) for i in range(9)
     ]
 
-    exp_rates = [.5, .5, 1, .1, .1, .1, .01, .01]
+    exp_rates = [.5, .5, 1, 1, 1, .1, .1, .1, .05]
     self.assertAllClose(output_rates, exp_rates, rtol=1e-4)
 
   def testCosineDecayWithWarmup(self):
@@ -137,9 +137,14 @@ class PerImageVRDEvaluation(object):
       result_tp_fp_labels.append(tp_fp_labels)
       result_mapping.append(selector_mapping[sorted_indices])
 
-    result_scores = np.concatenate(result_scores)
-    result_tp_fp_labels = np.concatenate(result_tp_fp_labels)
-    result_mapping = np.concatenate(result_mapping)
+    if result_scores:
+      result_scores = np.concatenate(result_scores)
+      result_tp_fp_labels = np.concatenate(result_tp_fp_labels)
+      result_mapping = np.concatenate(result_mapping)
+    else:
+      result_scores = np.array([], dtype=float)
+      result_tp_fp_labels = np.array([], dtype=bool)
+      result_mapping = np.array([], dtype=int)
 
     sorted_indices = np.argsort(result_scores)
     sorted_indices = sorted_indices[::-1]
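
The guard above is needed because np.concatenate raises a ValueError when given an empty list, which is what happens for images whose detections produce no per-class results. A small standalone illustration of the failure mode and the fallback (variable names mirror the diff; this is not the evaluator itself):

import numpy as np

result_scores = []  # no per-class results were accumulated for this image

# np.concatenate([]) raises "ValueError: need at least one array to concatenate",
# so the evaluator now substitutes typed empty arrays instead.
if result_scores:
  scores = np.concatenate(result_scores)
else:
  scores = np.array([], dtype=float)

print(np.argsort(scores)[::-1])  # argsort on an empty array is fine: []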
@@ -178,6 +178,14 @@ class VRDDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
         corresponding bounding boxes and possibly additional classes (see
         datatype label_data_type above).
     """
+    if image_id not in self._image_ids:
+      logging.warn('No groundtruth for the image with id %s.', image_id)
+      # Since for the correct work of evaluator it is assumed that groundtruth
+      # is inserted first we make sure to break the code if is it not the case.
+      self._image_ids.update([image_id])
+      self._negative_labels[image_id] = np.array([])
+      self._evaluatable_labels[image_id] = np.array([])
+
     num_detections = detections_dict[
         standard_fields.DetectionResultFields.detection_boxes].shape[0]
     detection_class_tuples = detections_dict[
@@ -186,7 +194,6 @@ class VRDDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
         standard_fields.DetectionResultFields.detection_boxes]
     negative_selector = np.zeros(num_detections, dtype=bool)
     selector = np.ones(num_detections, dtype=bool)
-
     # Only check boxable labels
     for field in detection_box_tuples.dtype.fields:
       # Verify if one of the labels is negative (this is sure FP)
@@ -483,8 +490,9 @@ class _VRDDetectionEvaluation(object):
       groundtruth_box_tuples = self._groundtruth_box_tuples[image_key]
       groundtruth_class_tuples = self._groundtruth_class_tuples[image_key]
     else:
-      groundtruth_box_tuples = np.empty(shape=[0, 4], dtype=float)
-      groundtruth_class_tuples = np.array([], dtype=int)
+      groundtruth_box_tuples = np.empty(
+          shape=[0, 4], dtype=detected_box_tuples.dtype)
+      groundtruth_class_tuples = np.array([], dtype=detected_class_tuples.dtype)
 
     scores, tp_fp_labels, mapping = (
         self._per_image_eval.compute_detection_tp_fp(
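
The dtype change above matters because the VRD evaluator stores box and class tuples as NumPy structured arrays; an empty placeholder with a plain float or int dtype cannot be processed field-by-field alongside the detections. A hedged illustration (the field names below are only an example of such a structured dtype, not necessarily the exact one the evaluator uses):

import numpy as np

# Example structured dtype holding a subject box and an object box per tuple.
box_tuple_dtype = np.dtype([('subject', 'f4', (4,)), ('object', 'f4', (4,))])
detected_box_tuples = np.zeros(2, dtype=box_tuple_dtype)

# Building the empty groundtruth placeholder from the detections' own dtype
# keeps per-field operations (e.g. IoU computed per named box field) well defined.
empty_groundtruth = np.empty(shape=[0, 4], dtype=detected_box_tuples.dtype)
print(empty_groundtruth.dtype == detected_box_tuples.dtype)  # True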