Commit 001a2a61 authored by pkulzc, committed by Sergio Guadarrama

Internal changes for object detection. (#3656)

* Force cast of num_classes to integer

PiperOrigin-RevId: 188335318

* Updating config util to allow overwriting of cosine decay learning rates.

PiperOrigin-RevId: 188338852

* Make box_list_ops.py and box_list_ops_test.py work with C API enabled.

The C API has improved shape inference over the original Python
code. This causes some previously-working conds to fail. Switching to smart_cond fixes this.

Another effect of the improved shape inference is that one of the
failures tested gets caught earlier, so I modified the test to reflect
this.

PiperOrigin-RevId: 188409792
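
For context: smart_cond evaluates the predicate at graph-construction time when its value is statically known, building only the taken branch, and falls back to a regular tf.cond otherwise. A minimal sketch, assuming the tf.contrib.framework.smart_cond signature from TF 1.x:

import tensorflow as tf

pred = tf.constant(True)  # predicate with a statically known value
# Only the true branch is built, since the predicate is a graph constant.
result = tf.contrib.framework.smart_cond(
    pred,
    true_fn=lambda: tf.constant(1.0),
    false_fn=lambda: tf.constant(0.0))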

* Fix parallel event file writing issue.

Without this change, the event files might get corrupted when multiple evaluations are run in parallel.

PiperOrigin-RevId: 188502560

* Deprecating the boolean flag from_detection_checkpoint.

Replace it with a string field fine_tune_checkpoint_type in train_config to provide extensibility. The field can currently take the values `detection` or `classification`, or other values when restore_map is overridden.

PiperOrigin-RevId: 188518685
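
For illustration, a minimal train_config excerpt using the new field (the checkpoint path is a placeholder):

train_config: {
  fine_tune_checkpoint: "path/to/model.ckpt"
  fine_tune_checkpoint_type: "detection"
}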

* Automated g4 rollback of changelist 188502560

PiperOrigin-RevId: 188519969

* Introducing eval metrics specs for Coco Mask metrics. This allows metrics to be computed in TensorFlow using the tf.learn Estimator.

PiperOrigin-RevId: 188528485

* Minor fix to make object_detection/metrics/coco_evaluation.py python3 compatible.

PiperOrigin-RevId: 188550683

* Updating eval_util to handle eval_metric_ops from multiple `DetectionEvaluator`s.

PiperOrigin-RevId: 188560474

* Allow tensor input for new_height and new_width for resize_image.

PiperOrigin-RevId: 188561908

* Fix typo in fine_tune_checkpoint_type name in trainer.

PiperOrigin-RevId: 188799033

* Adding mobilenet feature extractor to object detection.

PiperOrigin-RevId: 188916897

* Allow label maps to optionally contain an explicit background class with id zero.

PiperOrigin-RevId: 188951089
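
For illustration, a label map that opts into an explicit background class starts its ids at zero (item names here are placeholders); label maps without an id-0 entry keep the previous behavior:

item {
  id: 0
  name: "background"
}
item {
  id: 1
  name: "dog"
}
item {
  id: 2
  name: "cat"
}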

* Fix boundary conditions in random_pad_to_aspect_ratio to ensure that min_scale is always less than max_scale.

PiperOrigin-RevId: 189026868

* Fallback on from_detection_checkpoint option if fine_tune_checkpoint_type isn't set.

PiperOrigin-RevId: 189052833

* Add proper names for learning rate schedules so we don't see cryptic names on TensorBoard.

PiperOrigin-RevId: 189069837

* Enforcing that all datasets are batched (and then unbatched in the model) with batch_size >= 1.

PiperOrigin-RevId: 189117178

* Adding regularization to total loss returned from DetectionModel.loss().

PiperOrigin-RevId: 189189123

* Standardize the names of loss scalars (for SSD, Faster R-CNN and R-FCN) in both training and eval so they can be compared on TensorBoard.

Log localization and classification losses in evaluation.

PiperOrigin-RevId: 189189940

* Remove negative test from box list ops test.

PiperOrigin-RevId: 189229327

* Add an option to warm up the learning rate in the manual stepping schedule.

PiperOrigin-RevId: 189361039
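
A plain-Python sketch of the warmup semantics (the function and argument names are illustrative, not the library API): for steps in [0, boundaries[0]) the rate is linearly interpolated from the initial rate to the first scheduled rate.

def manual_step_lr(step, initial_rate, boundaries, rates, warmup=False):
  """Piecewise-constant learning rate with optional linear warmup.

  boundaries: ascending step thresholds, e.g. [900000, 1200000].
  rates: the rate to use once each boundary is reached, e.g. [3e-5, 3e-6].
  """
  if warmup and boundaries and step < boundaries[0]:
    # Linear ramp from initial_rate at step 0 to rates[0] at boundaries[0].
    ramp = (rates[0] - initial_rate) * float(step) / boundaries[0]
    return initial_rate + ramp
  lr = initial_rate
  for boundary, rate in zip(boundaries, rates):
    if step >= boundary:
      lr = rate
  return lr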

* Replace tf.contrib.slim.tfexample_decoder.LookupTensor with object_detection.data_decoders.tf_example_decoder.LookupTensor.

PiperOrigin-RevId: 189388556

* Force regularization summary variables under specific family names.

PiperOrigin-RevId: 189393190

* Automated g4 rollback of changelist 188619139

PiperOrigin-RevId: 189396001

* Remove step 0 schedule since we do a hard check for it after cl/189361039

PiperOrigin-RevId: 189396697

* PiperOrigin-RevId: 189040463

* PiperOrigin-RevId: 189059229

* PiperOrigin-RevId: 189214402

* Make slim python3 compatible.

* Minor fixes.

* Add TargetAssignment summaries in a separate family.

PiperOrigin-RevId: 189407487

* 1. Setting the `family` keyword arg prepends the summary names twice with the same prefix; adding the family suffix directly to the name avoids this.
2. Make sure the eval losses have the same names.

PiperOrigin-RevId: 189434618

* Minor fixes to make object detection tf 1.4 compatible.

PiperOrigin-RevId: 189437519

* Call the base of the mobilenet_v1 feature extractor under the right arg scope, and set batch norm is_training based on the value passed to the constructor.

PiperOrigin-RevId: 189460890

* Automated g4 rollback of changelist 188409792

PiperOrigin-RevId: 189463882

* Update object detection syncing.

PiperOrigin-RevId: 189601955

* Add an option to warm up the learning rate, hold it constant for a number of steps, and then cosine-decay it.

PiperOrigin-RevId: 189606169
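
The schedule has three phases: linear warmup, a hold at the base rate, then cosine decay. A plain-Python sketch of the assumed semantics (parameter names mirror the CosineDecayLearningRate proto fields shown further below; this is not the library implementation):

import math

def cosine_decay_with_warmup_and_hold(step, learning_rate_base, total_steps,
                                      warmup_learning_rate, warmup_steps,
                                      hold_base_rate_steps):
  if step < warmup_steps:
    # Phase 1: linear warmup from warmup_learning_rate to learning_rate_base.
    slope = (learning_rate_base - warmup_learning_rate) / float(warmup_steps)
    return warmup_learning_rate + slope * step
  if step < warmup_steps + hold_base_rate_steps:
    # Phase 2: hold the base rate constant.
    return learning_rate_base
  # Phase 3: cosine-decay the base rate to zero over the remaining steps.
  progress = (step - warmup_steps - hold_base_rate_steps) / float(
      total_steps - warmup_steps - hold_base_rate_steps)
  return 0.5 * learning_rate_base * (1 + math.cos(math.pi * progress))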

* Let the proposal feature extractor function in faster_rcnn meta architectures return the activations (end_points).

PiperOrigin-RevId: 189619301

* Fixed a bug which caused masks to be mostly zeros (caused by detection_boxes being in absolute coordinates when scale_to_absolute=True).

PiperOrigin-RevId: 189641294

* Open-sourcing MobilenetV2 + SSDLite.

PiperOrigin-RevId: 189654520

* Remove unused files.
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""SSDFeatureExtractor for MobilenetV2 features."""
import tensorflow as tf
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.models import feature_map_generators
from object_detection.utils import ops
from object_detection.utils import shape_utils
from nets.mobilenet import mobilenet
from nets.mobilenet import mobilenet_v2
slim = tf.contrib.slim
class SSDMobileNetV2FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
  """SSD Feature Extractor using MobilenetV2 features."""

  def __init__(self,
               is_training,
               depth_multiplier,
               min_depth,
               pad_to_multiple,
               conv_hyperparams,
               batch_norm_trainable=True,
               reuse_weights=None,
               use_explicit_padding=False,
               use_depthwise=False):
    """MobileNetV2 Feature Extractor for SSD Models.

    Mobilenet v2 (experimental), designed by sandler@. More details can be
    found in //knowledge/cerebra/brain/compression/mobilenet/mobilenet_experimental.py

    Args:
      is_training: whether the network is in training mode.
      depth_multiplier: float depth multiplier for feature extractor.
      min_depth: minimum feature extractor depth.
      pad_to_multiple: the nearest multiple to zero pad the input height and
        width dimensions to.
      conv_hyperparams: tf slim arg_scope for conv2d and separable_conv2d ops.
      batch_norm_trainable: Whether to update batch norm parameters during
        training or not. When training with a small batch size (e.g. 1), it is
        desirable to disable batch norm update and use pretrained batch norm
        params.
      reuse_weights: Whether to reuse variables. Default is None.
      use_explicit_padding: Whether to use explicit padding when extracting
        features. Default is False.
      use_depthwise: Whether to use depthwise convolutions. Default is False.
    """
    super(SSDMobileNetV2FeatureExtractor, self).__init__(
        is_training, depth_multiplier, min_depth, pad_to_multiple,
        conv_hyperparams, batch_norm_trainable, reuse_weights,
        use_explicit_padding, use_depthwise)

  def preprocess(self, resized_inputs):
    """SSD preprocessing.

    Maps pixel values to the range [-1, 1].

    Args:
      resized_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.

    Returns:
      preprocessed_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.
    """
    return (2.0 / 255.0) * resized_inputs - 1.0

  def extract_features(self, preprocessed_inputs):
    """Extract features from preprocessed inputs.

    Args:
      preprocessed_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.

    Returns:
      feature_maps: a list of tensors where the ith tensor has shape
        [batch, height_i, width_i, depth_i]
    """
    preprocessed_inputs = shape_utils.check_min_image_dim(
        33, preprocessed_inputs)

    feature_map_layout = {
        'from_layer': ['layer_15/expansion_output', 'layer_19', '', '', '', ''],
        'layer_depth': [-1, -1, 512, 256, 256, 128],
        'use_depthwise': self._use_depthwise,
        'use_explicit_padding': self._use_explicit_padding,
    }

    with tf.variable_scope('MobilenetV2', reuse=self._reuse_weights) as scope:
      with slim.arg_scope(
          mobilenet_v2.training_scope(
              is_training=(self._is_training and self._batch_norm_trainable),
              bn_decay=0.9997)), \
          slim.arg_scope(
              [mobilenet.depth_multiplier], min_depth=self._min_depth):
        # TODO(b/68150321): Enable fused batch norm once quantization
        # supports it.
        with slim.arg_scope([slim.batch_norm], fused=False):
          _, image_features = mobilenet_v2.mobilenet_base(
              ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
              final_endpoint='layer_19',
              depth_multiplier=self._depth_multiplier,
              use_explicit_padding=self._use_explicit_padding,
              scope=scope)
        with slim.arg_scope(self._conv_hyperparams):
          # TODO(b/68150321): Enable fused batch norm once quantization
          # supports it.
          with slim.arg_scope([slim.batch_norm], fused=False):
            feature_maps = feature_map_generators.multi_resolution_feature_maps(
                feature_map_layout=feature_map_layout,
                depth_multiplier=self._depth_multiplier,
                min_depth=self._min_depth,
                insert_1x1_conv=True,
                image_features=image_features)

    return feature_maps.values()
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for ssd_mobilenet_v2_feature_extractor."""
import numpy as np
import tensorflow as tf
from object_detection.models import ssd_feature_extractor_test
from object_detection.models import ssd_mobilenet_v2_feature_extractor
slim = tf.contrib.slim
class SsdMobilenetV2FeatureExtractorTest(
    ssd_feature_extractor_test.SsdFeatureExtractorTestBase):

  def _create_feature_extractor(self, depth_multiplier, pad_to_multiple,
                                use_explicit_padding=False):
    """Constructs a new feature extractor.

    Args:
      depth_multiplier: float depth multiplier for feature extractor
      pad_to_multiple: the nearest multiple to zero pad the input height and
        width dimensions to.
      use_explicit_padding: use 'VALID' padding for convolutions, but prepad
        inputs so that the output dimensions are the same as if 'SAME' padding
        were used.

    Returns:
      an ssd_meta_arch.SSDFeatureExtractor object.
    """
    min_depth = 32
    with slim.arg_scope([slim.conv2d], normalizer_fn=slim.batch_norm) as sc:
      conv_hyperparams = sc
    return ssd_mobilenet_v2_feature_extractor.SSDMobileNetV2FeatureExtractor(
        False,
        depth_multiplier,
        min_depth,
        pad_to_multiple,
        conv_hyperparams,
        use_explicit_padding=use_explicit_padding)

  def test_extract_features_returns_correct_shapes_128(self):
    image_height = 128
    image_width = 128
    depth_multiplier = 1.0
    pad_to_multiple = 1
    expected_feature_map_shape = [(2, 8, 8, 576), (2, 4, 4, 1280),
                                  (2, 2, 2, 512), (2, 1, 1, 256),
                                  (2, 1, 1, 256), (2, 1, 1, 128)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape)

  def test_extract_features_returns_correct_shapes_with_dynamic_inputs(self):
    image_height = 128
    image_width = 128
    depth_multiplier = 1.0
    pad_to_multiple = 1
    expected_feature_map_shape = [(2, 8, 8, 576), (2, 4, 4, 1280),
                                  (2, 2, 2, 512), (2, 1, 1, 256),
                                  (2, 1, 1, 256), (2, 1, 1, 128)]
    self.check_extract_features_returns_correct_shapes_with_dynamic_inputs(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape)

  def test_extract_features_returns_correct_shapes_299(self):
    image_height = 299
    image_width = 299
    depth_multiplier = 1.0
    pad_to_multiple = 1
    expected_feature_map_shape = [(2, 19, 19, 576), (2, 10, 10, 1280),
                                  (2, 5, 5, 512), (2, 3, 3, 256),
                                  (2, 2, 2, 256), (2, 1, 1, 128)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape)

  def test_extract_features_returns_correct_shapes_enforcing_min_depth(self):
    image_height = 299
    image_width = 299
    depth_multiplier = 0.5**12
    pad_to_multiple = 1
    expected_feature_map_shape = [(2, 19, 19, 192), (2, 10, 10, 32),
                                  (2, 5, 5, 32), (2, 3, 3, 32),
                                  (2, 2, 2, 32), (2, 1, 1, 32)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape)

  def test_extract_features_returns_correct_shapes_with_pad_to_multiple(self):
    image_height = 299
    image_width = 299
    depth_multiplier = 1.0
    pad_to_multiple = 32
    expected_feature_map_shape = [(2, 20, 20, 576), (2, 10, 10, 1280),
                                  (2, 5, 5, 512), (2, 3, 3, 256),
                                  (2, 2, 2, 256), (2, 1, 1, 128)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape)

  def test_extract_features_raises_error_with_invalid_image_size(self):
    image_height = 32
    image_width = 32
    depth_multiplier = 1.0
    pad_to_multiple = 1
    self.check_extract_features_raises_error_with_invalid_image_size(
        image_height, image_width, depth_multiplier, pad_to_multiple)

  def test_preprocess_returns_correct_value_range(self):
    image_height = 128
    image_width = 128
    depth_multiplier = 1
    pad_to_multiple = 1
    test_image = np.random.rand(4, image_height, image_width, 3)
    feature_extractor = self._create_feature_extractor(depth_multiplier,
                                                       pad_to_multiple)
    preprocessed_image = feature_extractor.preprocess(test_image)
    self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0)))

  def test_variables_only_created_in_scope(self):
    depth_multiplier = 1
    pad_to_multiple = 1
    scope_name = 'MobilenetV2'
    self.check_feature_extractor_variables_under_scope(
        depth_multiplier, pad_to_multiple, scope_name)

  def test_nofused_batchnorm(self):
    image_height = 40
    image_width = 40
    depth_multiplier = 1
    pad_to_multiple = 1
    image_placeholder = tf.placeholder(tf.float32,
                                       [1, image_height, image_width, 3])
    feature_extractor = self._create_feature_extractor(depth_multiplier,
                                                       pad_to_multiple)
    preprocessed_image = feature_extractor.preprocess(image_placeholder)
    _ = feature_extractor.extract_features(preprocessed_image)
    self.assertFalse(any(op.type == 'FusedBatchNorm'
                         for op in tf.get_default_graph().get_operations()))


if __name__ == '__main__':
  tf.test.main()
@@ -71,6 +71,10 @@ message ManualStepLearningRate {
optional float learning_rate = 2 [default = 0.002];
}
repeated LearningRateSchedule schedule = 2;
// Whether to linearly interpolate learning rates for steps in
// [0, schedule[0].step].
optional bool warmup = 3 [default = false];
}
// Configuration message for a cosine decaying learning rate as defined in
@@ -80,4 +84,5 @@ message CosineDecayLearningRate {
optional uint32 total_steps = 2 [default = 4000000];
optional float warmup_learning_rate = 3 [default = 0.0002];
optional uint32 warmup_steps = 4 [default = 10000];
optional uint32 hold_base_rate_steps = 5 [default = 0];
}
@@ -72,7 +72,8 @@ message SsdFeatureExtractor {
// Minimum number of the channels in the feature extractor.
optional int32 min_depth = 3 [default=16];
// Hyperparameters for the feature extractor.
// Hyperparameters that affect the layers of feature extractor added on top
// of the base feature extractor.
optional Hyperparams conv_hyperparams = 4;
// The nearest multiple to zero-pad the input height and width dimensions to.
......
@@ -29,11 +29,17 @@ message TrainConfig {
// extractor variables trained outside of object detection.
optional string fine_tune_checkpoint = 7 [default=""];
// Type of checkpoint to restore variables from, e.g. 'classification' or
// 'detection'. Provides extensibility to from_detection_checkpoint.
// Typically used to load feature extractor variables from trained models.
optional string fine_tune_checkpoint_type = 22 [default=""];
// [Deprecated]: use fine_tune_checkpoint_type instead.
// Specifies if the finetune checkpoint is from an object detection model.
// If from an object detection model, the model being trained should have
// the same parameters with the exception of the num_classes parameter.
// If false, it assumes the checkpoint was an object classification model.
optional bool from_detection_checkpoint = 8 [default=false];
optional bool from_detection_checkpoint = 8 [default=false, deprecated=true];
// Whether to load all checkpoint vars that match model variable names and
// sizes. This option is only available if `from_detection_checkpoint` is
@@ -83,7 +89,7 @@ message TrainConfig {
// Set this to at least the maximum amount of boxes in the input data.
// Otherwise, it may cause "Data loss: Attempted to pad to a smaller size
// than the input element" errors.
optional int32 max_number_of_boxes = 20 [default=50];
optional int32 max_number_of_boxes = 20 [default=100];
// Whether to remove padding along `num_boxes` dimension of the groundtruth
// tensors.
......
@@ -90,10 +90,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
@@ -90,10 +90,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.00006
schedule {
step: 0
learning_rate: .00006
}
schedule {
step: 6000000
learning_rate: .000006
......
@@ -90,10 +90,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
@@ -89,10 +89,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 0
learning_rate: .0002
}
schedule {
step: 900000
learning_rate: .00002
......
@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 0
learning_rate: .0002
}
schedule {
step: 900000
learning_rate: .00002
......
@@ -91,10 +91,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
@@ -90,10 +90,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
@@ -93,10 +93,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0001
schedule {
step: 0
learning_rate: .0001
}
schedule {
step: 500000
learning_rate: .00001
......
@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0001
schedule {
step: 0
learning_rate: .0001
}
schedule {
step: 500000
learning_rate: .00001
......
@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
......
@@ -88,10 +88,6 @@ train_config: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
@@ -118,6 +114,7 @@ train_config: {
random_horizontal_flip {
}
}
max_number_of_boxes: 50
}
train_input_reader: {
......