Commit 0ba83cf0 authored by pkulzc, committed by Sergio Guadarrama

Release MobileNet V3 models and SSDLite models with MobileNet V3 backbone. (#7678)

* Merged commit includes the following changes:
275131829  by Sergio Guadarrama:

    Updates mobilenet/README.md to be GitHub compatible, adds a V2+ reference to the mobilenet_v1.md file, and fixes invalid markdown.

--
274908068  by Sergio Guadarrama:

    Opensource MobilenetV3 detection models.

--
274697808  by Sergio Guadarrama:

    Fixed cases where tf.TensorShape was constructed with float dimensions

    This is a prerequisite for making TensorShape and Dimension more strict
    about the types of their arguments.

--
273577462  by Sergio Guadarrama:

    Fixing `conv_defs['defaults']` override issue.

--
272801298  by Sergio Guadarrama:

    Adds links to trained models for Mobilenet V3, and adds a minimalistic version of Mobilenet V3 to the definitions.

--
268928503  by Sergio Guadarrama:

    Mobilenet v2 with group normalization.
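
    For reference, group normalization normalizes activations within channel
    groups instead of across the batch. A minimal sketch, assuming NHWC inputs
    and omitting the learned scale/offset (names are illustrative):

    import tensorflow as tf

    def group_norm(x, groups=32, eps=1e-5):
      # x: [batch, height, width, channels]; channels must divide by groups.
      _, h, w, c = x.shape.as_list()
      x = tf.reshape(x, [-1, h, w, groups, c // groups])
      mean, var = tf.nn.moments(x, axes=[1, 2, 4], keepdims=True)
      x = (x - mean) * tf.math.rsqrt(var + eps)
      return tf.reshape(x, [-1, h, w, c])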

--
263492735  by Sergio Guadarrama:

    Internal change

--

260037126  by Sergio Guadarrama:

    Adds an option of using a custom depthwise operation in `expanded_conv`.

--
259997001  by Sergio Guadarrama:

    Explicitly mark Python binaries/tests with python_version = "PY2".

--
252697685  by Sergio Guadarrama:

    Internal change

--

251918746  by Sergio Guadarrama:

    Internal change

--

251909704  by Sergio Guadarrama:

    Mobilenet V3 backbone implementation.

--
247510236  by Sergio Guadarrama:

    Internal change

--

246196802  by Sergio Guadarrama:

    Internal change

--

246014539  by Sergio Guadarrama:

    Internal change

--

245891435  by Sergio Guadarrama:

    Internal change

--

245834925  by Sergio Guadarrama:

    n/a

--

PiperOrigin-RevId: 275131829

* Merged commit includes the following changes:
274959989  by Zhichao Lu:

    Update detection model zoo with MobilenetV3 SSD candidates.

--
274908068  by Zhichao Lu:

    Opensource MobilenetV3 detection models.

--
274695889  by richardmunoz:

    RandomPatchGaussian preprocessing step

    This step can be used during model training to apply Gaussian noise to a random image patch. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_patch_gaussian {
          random_coef: 0.5
          min_patch_size: 1
          max_patch_size: 250
          min_gaussian_stddev: 0.0
          max_gaussian_stddev: 1.0
        }
      }
      ...
    }
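
    A rough sketch of the core operation, assuming float images and a
    hypothetical helper (the real step also samples the patch location, size,
    and stddev, and applies the random_coef gating):

    import tensorflow as tf

    def add_patch_gaussian(image, y, x, patch_size, stddev):
      # Adds Gaussian noise only inside the patch starting at (y, x).
      ys = tf.range(tf.shape(image)[0])[:, None]
      xs = tf.range(tf.shape(image)[1])[None, :]
      in_patch = ((ys >= y) & (ys < y + patch_size) &
                  (xs >= x) & (xs < x + patch_size))
      mask = tf.cast(in_patch, image.dtype)[..., None]
      noise = tf.random.normal(tf.shape(image), stddev=stddev)
      return image + noise * mask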

--
274257872  by lzc:

    Internal change.

--
274114689  by Zhichao Lu:

    Pass native_resize flag to other FPN variants.

--
274112308  by lzc:

    Internal change.

--
274090763  by richardmunoz:

    Util function for getting a patch mask on an image for use with the Object Detection API

--
274069806  by Zhichao Lu:

    Adding functions which will help compute predictions and losses for CenterNet.

--
273860828  by lzc:

    Internal change.

--
273380069  by richardmunoz:

    RandomImageDownscaleToTargetPixels preprocessing step

    This step can be used during model training to downscale an image to a randomly chosen target number of pixels. If the image does not contain more than the target number of pixels, downscaling is skipped. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_downscale_to_target_pixels {
          random_coef: 0.5
          min_target_pixels: 300000
          max_target_pixels: 500000
        }
      }
      ...
    }
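
    The scaling factor follows from the pixel ratio; a sketch using a
    hypothetical helper (the real step also samples the target uniformly from
    the configured range and rescales annotations consistently):

    import tensorflow as tf

    def downscale_to_target_pixels(image, target_pixels):
      # target_pixels: float scalar; images at or below it are left unchanged.
      shape = tf.shape(image)
      num_pixels = tf.cast(shape[0] * shape[1], tf.float32)
      factor = tf.minimum(1.0, tf.sqrt(target_pixels / num_pixels))
      new_height = tf.cast(tf.cast(shape[0], tf.float32) * factor, tf.int32)
      new_width = tf.cast(tf.cast(shape[1], tf.float32) * factor, tf.int32)
      return tf.image.resize_images(image, [new_height, new_width])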

--
272987602  by Zhichao Lu:

    Avoid -inf when empty box list is passed.

--
272525836  by Zhichao Lu:

    Cleanup repeated resizing code in meta archs.

--
272458667  by richardmunoz:

    RandomJpegQuality preprocessing step

    This step can be used during model training to re-encode the image as a JPEG at a random quality level. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_jpeg_quality {
          random_coef: 0.5
          min_jpeg_quality: 80
          max_jpeg_quality: 100
        }
      }
      ...
    }
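
    The core transform is a JPEG encode/decode round trip; a minimal sketch
    with a fixed Python-int quality (the real step samples the quality from
    the configured range and applies the random_coef gating):

    import tensorflow as tf

    def jpeg_degrade(image, quality=90):
      # image: uint8 tensor of shape [height, width, channels].
      encoded = tf.io.encode_jpeg(image, quality=quality)
      return tf.io.decode_jpeg(encoded)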

--
271412717  by Zhichao Lu:

    Enables TPU training with the V2 eager + tf.function Object Detection training loops.

--
270744153  by Zhichao Lu:

    Adding the offset and size target assigners for CenterNet.

--
269916081  by Zhichao Lu:

    Include basic installation in Object Detection API tutorial.
    Also:
     - Use TF2.0
     - Use saved_model

--
269376056  by Zhichao Lu:

    Fix variable loading in RetinaNet with custom loops (makes the code rely a little less on the exact name scopes that are generated).

--
269256251  by lzc:

    Add use_partitioned_nms field to config and update post_processing_builder to honor that flag when building the NMS function.

--
268865295  by Zhichao Lu:

    Adding functionality for importing and merging back internal state of the metric.

--
268640984  by Zhichao Lu:

    Fix computation of the Gaussian sigma value used to create the CenterNet heatmap target.

--
267475576  by Zhichao Lu:

    Fix for exporter trying to export non-existent exponential moving averages.

--
267286768  by Zhichao Lu:

    Update mixed-precision policy.

--
266166879  by Zhichao Lu:

    Internal change

--

265860884  by Zhichao Lu:

    Apply floor function to center coordinates when creating heatmap for CenterNet target.
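
    In the standard CenterNet formulation, the heatmap value at pixel
    (px, py) for a floored center (cx, cy) is
    exp(-((px - cx)^2 + (py - cy)^2) / (2 * sigma^2)); a NumPy sketch:

    import numpy as np

    def gaussian_heatmap(height, width, center_y, center_x, sigma):
      # Center coordinates are floored to integer pixels, per this change.
      cy, cx = np.floor(center_y), np.floor(center_x)
      ys = np.arange(height, dtype=np.float32)[:, None]
      xs = np.arange(width, dtype=np.float32)[None, :]
      return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))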

--
265702749  by Zhichao Lu:

    Internal change

--
264241949  by ronnyvotel:

    Updating Faster R-CNN 'final_anchors' to be in normalized coordinates.

--
264175192  by lzc:

    Update model_fn to only read hparams if it is not None.

--
264159328  by Zhichao Lu:

    Modify nearest neighbor upsampling to eliminate a multiply operation. For quantized models, the multiply operation gets unnecessarily quantized and reduces accuracy (simple stacking would work in place of the broadcast op which doesn't require quantization). Also removes an unnecessary reshape op.
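
    A sketch of the multiply-free variant (illustrative; not the exact
    ops.nearest_neighbor_upsampling code):

    import tensorflow as tf

    def nearest_neighbor_upsample(x, scale=2):
      # Duplicate each pixel by stacking copies along new height/width axes
      # and folding them back in; with no broadcast multiply, nothing extra
      # gets quantized in a quantization-aware model.
      shape = tf.shape(x)  # [batch, height, width, channels]
      x = tf.stack([x] * scale, axis=2)   # [b, h, scale, w, c]
      x = tf.stack([x] * scale, axis=4)   # [b, h, scale, w, scale, c]
      return tf.reshape(
          x, [shape[0], shape[1] * scale, shape[2] * scale, shape[3]])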

--
263668306  by Zhichao Lu:

    Add the option to use dynamic map_fn for batch NMS

--
263031163  by Zhichao Lu:

    Mark outside compilation for NMS as optional.

--
263024916  by Zhichao Lu:

    Add an ExperimentalModel meta arch for experimenting with new model types.

--
262655894  by Zhichao Lu:

    Add the center heatmap target assigner for CenterNet

--
262431036  by Zhichao Lu:

    Adding add_eval_dict to allow for evaluation on model_v2

--
262035351  by ronnyvotel:

    Removing any non-Tensor predictions from the third stage of Mask R-CNN.

--
261953416  by Zhichao Lu:

    Internal change.

--
261834966  by Zhichao Lu:

    Fix the NMS OOM issue on TPU by forcing NMS to run outside of TPU.

--
261775941  by Zhichao Lu:

    Make Keras InputLayer compatible with both TF 1.x and TF 2.0.

--
261775633  by Zhichao Lu:

    Visualize additional channels with ground-truth bounding boxes.

--
261768117  by lzc:

    Internal change.

--
261766773  by ronnyvotel:

    Exposing `return_raw_detections_during_predict` in Faster R-CNN Proto.

--
260975089  by ronnyvotel:

    Moving calculation of batched prediction tensor names after all tensors in prediction dictionary are created.

--
259816913  by ronnyvotel:

    Adding raw detection boxes and feature map indices to SSD

--
259791955  by Zhichao Lu:

    Added a flag to control the use of partitioned_non_max_suppression.

--
259580475  by Zhichao Lu:

    Tweak quantization-aware training re-writer to support NasFpn model architecture.

--
259579943  by rathodv:

    Add a meta target assigner proto and builders in OD API.

--
259577741  by Zhichao Lu:

    Internal change.

--
259366315  by lzc:

    Internal change.

--
259344310  by ronnyvotel:

    Updating faster rcnn so that raw_detection_boxes from predict() are in normalized coordinates.

--
259338670  by Zhichao Lu:

    Add support for use_native_resize_op to more feature extractors. Use dynamic shapes when static shapes are not available.
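
    The dynamic-shape fallback follows the usual pattern (this mirrors
    shape_utils.combined_static_and_dynamic_shape; sketch only):

    import tensorflow as tf

    def combined_static_and_dynamic_shape(tensor):
      # Prefer statically known dimensions; fall back to the runtime
      # tf.shape() value wherever a dimension is None.
      static = tensor.shape.as_list()
      dynamic = tf.shape(tensor)
      return [dim if dim is not None else dynamic[i]
              for i, dim in enumerate(static)]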

--
259083543  by ronnyvotel:

    Updating/fixing documentation.

--
259078937  by rathodv:

    Add prediction fields for tensors returned from detection_model.predict.

--
259044601  by Zhichao Lu:

    Add protocol buffer and builders for temperature scaling calibration.
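
    Temperature scaling divides logits by a single learned parameter before
    the probability mapping; a sketch assuming sigmoid scores, with `scaler`
    in the role of the proto's TemperatureScalingCalibration field:

    import tensorflow as tf

    def temperature_scale(logits, scaler):
      return tf.sigmoid(logits / scaler)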

--
259036770  by lzc:

    Internal changes.

--
259006223  by ronnyvotel:

    Adding detection anchor indices to Faster R-CNN Config. This is useful when one wishes to associate final detections and the anchors (or pre-nms boxes) from which they originated.

--
258872501  by Zhichao Lu:

    Run the training pipeline of ssd + resnet_v1_50 + fpn with a checkpoint.

--
258840686  by ronnyvotel:

    Adding standard outputs to DetectionModel.predict(). This CL only updates Faster R-CNN. Other meta architectures will be updated in future CLs.

--
258672969  by lzc:

    Internal change.

--
258649494  by lzc:

    Internal changes.

--
258630321  by ronnyvotel:

    Fixing documentation in shape_utils.flatten_dimensions().

--
258468145  by Zhichao Lu:

    Add additional output tensors parameter to Postprocess op.

--
258099219  by Zhichao Lu:

    Internal changes

--

PiperOrigin-RevId: 274959989
parent 9aed0ffb
......@@ -47,9 +47,7 @@ MODEL_BUILD_UTIL_MAP = model_lib.MODEL_BUILD_UTIL_MAP
def _compute_losses_and_predictions_dicts(
model, features, labels,
add_regularization_loss=True,
use_tpu=False,
use_bfloat16=False):
add_regularization_loss=True):
"""Computes the losses dict and predictions dict for a model on inputs.
Args:
......@@ -88,8 +86,6 @@ def _compute_losses_and_predictions_dicts(
float32 tensor containing keypoints for each box.
add_regularization_loss: Whether or not to include the model's
regularization loss in the losses dictionary.
use_tpu: Whether computation should happen on a TPU.
use_bfloat16: Whether computation on a TPU should use bfloat16.
Returns:
A tuple containing the losses dictionary (with the total loss under
......@@ -100,18 +96,10 @@ def _compute_losses_and_predictions_dicts(
model_lib.provide_groundtruth(model, labels)
preprocessed_images = features[fields.InputDataFields.image]
# TODO(kaftan): Check how we're supposed to do this mixed precision stuff
## in TF2 TPUStrategy + Keras
if use_tpu and use_bfloat16:
with tf.contrib.tpu.bfloat16_scope():
prediction_dict = model.predict(
preprocessed_images,
features[fields.InputDataFields.true_image_shape])
prediction_dict = ops.bfloat16_to_float32_nested(prediction_dict)
else:
prediction_dict = model.predict(
preprocessed_images,
features[fields.InputDataFields.true_image_shape])
prediction_dict = model.predict(
preprocessed_images,
features[fields.InputDataFields.true_image_shape])
prediction_dict = ops.bfloat16_to_float32_nested(prediction_dict)
losses_dict = model.loss(
prediction_dict, features[fields.InputDataFields.true_image_shape])
......@@ -122,6 +110,8 @@ def _compute_losses_and_predictions_dicts(
## as well.
regularization_losses = model.regularization_losses()
if regularization_losses:
regularization_losses = ops.bfloat16_to_float32_nested(
regularization_losses)
regularization_loss = tf.add_n(
regularization_losses, name='regularization_loss')
losses.append(regularization_loss)
......@@ -146,7 +136,6 @@ def eager_train_step(detection_model,
add_regularization_loss=True,
clip_gradients_value=None,
use_tpu=False,
use_bfloat16=False,
global_step=None,
num_replicas=1.0):
"""Process a single training batch.
......@@ -204,7 +193,6 @@ def eager_train_step(detection_model,
clip_gradients_value: If this is present, clip the gradients global norm
at this value using `tf.clip_by_global_norm`.
use_tpu: Whether computation should happen on a TPU.
use_bfloat16: Whether computation on a TPU should use bfloat16.
global_step: The current training step. Used for TensorBoard logging
purposes. This step is not updated by this function and must be
incremented separately.
......@@ -226,8 +214,7 @@ def eager_train_step(detection_model,
with tf.GradientTape() as tape:
losses_dict, _ = _compute_losses_and_predictions_dicts(
detection_model, features, labels, add_regularization_loss, use_tpu,
use_bfloat16)
detection_model, features, labels, add_regularization_loss)
total_loss = losses_dict['Loss/total_loss']
......@@ -236,9 +223,10 @@ def eager_train_step(detection_model,
tf.constant(num_replicas, dtype=tf.float32))
losses_dict['Loss/normalized_total_loss'] = total_loss
for loss_type in losses_dict:
tf.compat.v2.summary.scalar(
loss_type, losses_dict[loss_type], step=global_step)
if not use_tpu:
for loss_type in losses_dict:
tf.compat.v2.summary.scalar(
loss_type, losses_dict[loss_type], step=global_step)
trainable_variables = detection_model.trainable_variables
......@@ -258,7 +246,7 @@ def eager_train_step(detection_model,
def load_fine_tune_checkpoint(
model, checkpoint_path, checkpoint_type,
load_all_detection_checkpoint_vars, input_dataset,
unpad_groundtruth_tensors, use_tpu, use_bfloat16):
unpad_groundtruth_tensors):
"""Load a fine tuning classification or detection checkpoint.
To make sure the model variables are all built, this method first executes
......@@ -284,8 +272,6 @@ def load_fine_tune_checkpoint(
input_dataset: The tf.data Dataset the model is being trained on. Needed
to get the shapes for the dummy loss computation.
unpad_groundtruth_tensors: A parameter passed to unstack_batch.
use_tpu: Whether computation should happen on a TPU.
use_bfloat16: Whether computation on a TPU should use bfloat16.
"""
features, labels = iter(input_dataset).next()
......@@ -299,9 +285,7 @@ def load_fine_tune_checkpoint(
return _compute_losses_and_predictions_dicts(
model,
features,
labels,
use_tpu=use_tpu,
use_bfloat16=use_bfloat16)
labels)
strategy = tf.compat.v2.distribute.get_strategy()
strategy.experimental_run_v2(
......@@ -313,11 +297,10 @@ def load_fine_tune_checkpoint(
fine_tune_checkpoint_type=checkpoint_type,
load_all_detection_checkpoint_vars=(
load_all_detection_checkpoint_vars))
available_var_map = (
variables_helper.get_variables_available_in_checkpoint(
var_map,
checkpoint_path,
include_global_step=False))
available_var_map = variables_helper.get_variables_available_in_checkpoint(
var_map,
checkpoint_path,
include_global_step=False)
tf.train.init_from_checkpoint(checkpoint_path,
available_var_map)
......@@ -386,7 +369,6 @@ def train_loop(
train_input_config = configs['train_input_config']
unpad_groundtruth_tensors = train_config.unpad_groundtruth_tensors
use_bfloat16 = train_config.use_bfloat16
add_regularization_loss = train_config.add_regularization_loss
clip_gradients_value = None
if train_config.gradient_clipping_by_norm > 0:
......@@ -403,6 +385,9 @@ def train_loop(
'train_loop: use_tpu %s, export_to_tpu %s', use_tpu,
export_to_tpu)
if kwargs['use_bfloat16']:
tf.compat.v2.keras.mixed_precision.experimental.set_policy('mixed_bfloat16')
# Parse the checkpoint fine tuning configs
if hparams.load_pretrained:
fine_tune_checkpoint_path = train_config.fine_tune_checkpoint
......@@ -427,10 +412,8 @@ def train_loop(
pipeline_config_final = create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_config_final, model_dir)
# TODO(kaftan): Either make strategy a parameter of this method, or
## grab it w/ Distribution strategy's get_scope
# Build the model, optimizer, and training input
strategy = tf.compat.v2.distribute.MirroredStrategy()
strategy = tf.compat.v2.distribute.get_strategy()
with strategy.scope():
detection_model = model_builder.build(
model_config=model_config, is_training=True)
......@@ -446,7 +429,7 @@ def train_loop(
train_input.repeat())
global_step = tf.compat.v2.Variable(
0, trainable=False, dtype=tf.compat.v2.dtypes.int64)
0, trainable=False, dtype=tf.compat.v2.dtypes.int64, name='global_step')
optimizer, (learning_rate,) = optimizer_builder.build(
train_config.optimizer, global_step=global_step)
......@@ -465,8 +448,7 @@ def train_loop(
fine_tune_checkpoint_type,
load_all_detection_checkpoint_vars,
train_input,
unpad_groundtruth_tensors, use_tpu,
use_bfloat16)
unpad_groundtruth_tensors)
ckpt = tf.compat.v2.train.Checkpoint(
step=global_step, model=detection_model)
......@@ -483,7 +465,6 @@ def train_loop(
unpad_groundtruth_tensors,
optimizer,
learning_rate=learning_rate_fn(),
use_bfloat16=use_bfloat16,
add_regularization_loss=add_regularization_loss,
clip_gradients_value=clip_gradients_value,
use_tpu=use_tpu,
......@@ -512,11 +493,12 @@ def train_loop(
loss = _dist_train_step(train_input_iter)
global_step.assign_add(1)
end_time = time.time()
tf.compat.v2.summary.scalar(
'steps_per_sec', 1.0 / (end_time - start_time), step=global_step)
if not use_tpu:
tf.compat.v2.summary.scalar(
'steps_per_sec', 1.0 / (end_time - start_time), step=global_step)
# TODO(kaftan): Remove this print after it is no longer helpful for
## debugging.
tf.print('Finished step', global_step, end_time, loss)
print('Finished step', global_step, end_time, loss)
if int(global_step.value().numpy()) % checkpoint_every_n == 0:
manager.save()
......@@ -552,7 +534,6 @@ def eager_eval_loop(
train_config = configs['train_config']
eval_input_config = configs['eval_input_config']
eval_config = configs['eval_config']
use_bfloat16 = train_config.use_bfloat16
add_regularization_loss = train_config.add_regularization_loss
is_training = False
......@@ -594,8 +575,7 @@ def eager_eval_loop(
labels, unpad_groundtruth_tensors=unpad_groundtruth_tensors)
losses_dict, prediction_dict = _compute_losses_and_predictions_dicts(
detection_model, features, labels, add_regularization_loss, use_tpu,
use_bfloat16)
detection_model, features, labels, add_regularization_loss)
def postprocess_wrapper(args):
return detection_model.postprocess(args[0], args[1])
......@@ -762,6 +742,9 @@ def eval_continuously(
eval_on_train_input_config.num_epochs))
eval_on_train_input_config.num_epochs = 1
if kwargs['use_bfloat16']:
tf.compat.v2.keras.mixed_precision.experimental.set_policy('mixed_bfloat16')
detection_model = model_builder.build(
model_config=model_config, is_training=True)
......
......@@ -27,6 +27,7 @@ import collections
import functools
import tensorflow as tf
from object_detection.utils import ops
from object_detection.utils import shape_utils
slim = tf.contrib.slim
# Activation bound used for TPU v1. Activations will be clipped to
......@@ -568,7 +569,7 @@ class KerasFpnTopDownFeatureMaps(tf.keras.Model):
# TODO (b/128922690): clean-up of ops.nearest_neighbor_upsampling
if use_native_resize_op:
def resize_nearest_neighbor(image):
image_shape = image.shape.as_list()
image_shape = shape_utils.combined_static_and_dynamic_shape(image)
return tf.image.resize_nearest_neighbor(
image, [image_shape[1] * 2, image_shape[2] * 2])
top_down_net.append(tf.keras.layers.Lambda(
......@@ -704,7 +705,8 @@ def fpn_top_down_feature_maps(image_features,
for level in reversed(range(num_levels - 1)):
if use_native_resize_op:
with tf.name_scope('nearest_neighbor_upsampling'):
top_down_shape = top_down.shape.as_list()
top_down_shape = shape_utils.combined_static_and_dynamic_shape(
top_down)
top_down = tf.image.resize_nearest_neighbor(
top_down, [top_down_shape[1] * 2, top_down_shape[2] * 2])
else:
......
......@@ -242,7 +242,7 @@ class _LayersOverride(object):
placeholder_with_default = tf.placeholder_with_default(
input=input_tensor, shape=[None] + shape)
return tf.keras.layers.Input(tensor=placeholder_with_default)
return model_utils.input_layer(shape, placeholder_with_default)
# pylint: disable=unused-argument
def ReLU(self, *args, **kwargs):
......
......@@ -230,10 +230,7 @@ class _LayersOverride(object):
placeholder_with_default = tf.placeholder_with_default(
input=input_tensor, shape=[None] + shape)
if tf.executing_eagerly():
return tf.keras.layers.Input(shape=shape)
else:
return tf.keras.layers.Input(tensor=placeholder_with_default)
return model_utils.input_layer(shape, placeholder_with_default)
# pylint: disable=unused-argument
def ReLU(self, *args, **kwargs):
......
......@@ -20,6 +20,7 @@ from __future__ import division
from __future__ import print_function
import collections
import tensorflow as tf
# This is to specify the custom config of model structures. For example,
# ConvDefs(conv_name='conv_pw_12', filters=512) for Mobilenet V1 is to specify
......@@ -43,3 +44,10 @@ def get_conv_def(conv_defs, layer_name):
if layer_name == conv_def.conv_name:
return conv_def.filters
return None
def input_layer(shape, placeholder_with_default):
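# Placeholders are graph-mode only, so under eager execution the Input layer
# is built from the static shape instead of the placeholder tensor.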
if tf.executing_eagerly():
return tf.keras.layers.Input(shape=shape)
else:
return tf.keras.layers.Input(tensor=placeholder_with_default)
......@@ -22,6 +22,7 @@ from __future__ import print_function
import tensorflow as tf
from object_detection.core import freezable_batch_norm
from object_detection.models.keras_models import model_utils
def _fixed_padding(inputs, kernel_size, rate=1): # pylint: disable=invalid-name
......@@ -216,7 +217,7 @@ class _LayersOverride(object):
placeholder_with_default = tf.placeholder_with_default(
input=input_tensor, shape=[None] + shape)
return tf.keras.layers.Input(tensor=placeholder_with_default)
return model_utils.input_layer(shape, placeholder_with_default)
def MaxPooling2D(self, pool_size, **kwargs):
"""Builds a MaxPooling2D layer with default padding as 'SAME'.
......
......@@ -52,6 +52,7 @@ class SSDMobileNetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
use_native_resize_op=False,
override_base_feature_extractor_hyperparams=False):
"""SSD FPN feature extractor based on Mobilenet v1 architecture.
......@@ -79,6 +80,8 @@ class SSDMobileNetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False.
use_depthwise: Whether to use depthwise convolutions. Default is False.
use_native_resize_op: Whether to use tf.image.resize_nearest_neighbor
to do upsampling in FPN. Default is False.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams_fn`.
......@@ -100,6 +103,7 @@ class SSDMobileNetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
self._conv_defs = None
if self._use_depthwise:
self._conv_defs = _create_modified_mobilenet_config()
self._use_native_resize_op = use_native_resize_op
def preprocess(self, resized_inputs):
"""SSD preprocessing.
......@@ -162,7 +166,8 @@ class SSDMobileNetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
[(key, image_features[key]) for key in feature_block_list],
depth=depth_fn(self._additional_layer_depth),
use_depthwise=self._use_depthwise,
use_explicit_padding=self._use_explicit_padding)
use_explicit_padding=self._use_explicit_padding,
use_native_resize_op=self._use_native_resize_op)
feature_maps = []
for level in range(self._fpn_min_level, base_fpn_max_level + 1):
feature_maps.append(fpn_features['top_down_{}'.format(
......
......@@ -49,6 +49,7 @@ class SSDMobileNetV1FpnKerasFeatureExtractor(
additional_layer_depth=256,
use_explicit_padding=False,
use_depthwise=False,
use_native_resize_op=False,
override_base_feature_extractor_hyperparams=False,
name=None):
"""SSD Keras based FPN feature extractor Mobilenet v1 architecture.
......@@ -84,6 +85,8 @@ class SSDMobileNetV1FpnKerasFeatureExtractor(
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False.
use_depthwise: whether to use depthwise convolutions. Default is False.
use_native_resize_op: Whether to use tf.image.resize_nearest_neighbor
to do upsampling in FPN. Default is False.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams`.
......@@ -109,6 +112,7 @@ class SSDMobileNetV1FpnKerasFeatureExtractor(
self._conv_defs = None
if self._use_depthwise:
self._conv_defs = _create_modified_mobilenet_config()
self._use_native_resize_op = use_native_resize_op
self._feature_blocks = [
'Conv2d_3_pointwise', 'Conv2d_5_pointwise', 'Conv2d_11_pointwise',
'Conv2d_13_pointwise'
......@@ -153,6 +157,7 @@ class SSDMobileNetV1FpnKerasFeatureExtractor(
depth=self._depth_fn(self._additional_layer_depth),
use_depthwise=self._use_depthwise,
use_explicit_padding=self._use_explicit_padding,
use_native_resize_op=self._use_native_resize_op,
is_training=self._is_training,
conv_hyperparams=self._conv_hyperparams,
freeze_batchnorm=self._freeze_batchnorm,
......
......@@ -53,6 +53,7 @@ class SSDMobileNetV2FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
use_native_resize_op=False,
override_base_feature_extractor_hyperparams=False):
"""SSD FPN feature extractor based on Mobilenet v2 architecture.
......@@ -79,6 +80,8 @@ class SSDMobileNetV2FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False.
use_depthwise: Whether to use depthwise convolutions. Default is False.
use_native_resize_op: Whether to use tf.image.resize_nearest_neighbor
to do upsampling in FPN. Default is False.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams_fn`.
......@@ -100,6 +103,7 @@ class SSDMobileNetV2FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
self._conv_defs = None
if self._use_depthwise:
self._conv_defs = _create_modified_mobilenet_config()
self._use_native_resize_op = use_native_resize_op
def preprocess(self, resized_inputs):
"""SSD preprocessing.
......@@ -159,7 +163,8 @@ class SSDMobileNetV2FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
[(key, image_features[key]) for key in feature_block_list],
depth=depth_fn(self._additional_layer_depth),
use_depthwise=self._use_depthwise,
use_explicit_padding=self._use_explicit_padding)
use_explicit_padding=self._use_explicit_padding,
use_native_resize_op=self._use_native_resize_op)
feature_maps = []
for level in range(self._fpn_min_level, base_fpn_max_level + 1):
feature_maps.append(fpn_features['top_down_{}'.format(
......
......@@ -52,6 +52,7 @@ class SSDMobileNetV2FpnKerasFeatureExtractor(
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
use_native_resize_op=False,
override_base_feature_extractor_hyperparams=False,
name=None):
"""SSD Keras based FPN feature extractor Mobilenet v2 architecture.
......@@ -87,6 +88,8 @@ class SSDMobileNetV2FpnKerasFeatureExtractor(
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False.
use_depthwise: Whether to use depthwise convolutions. Default is False.
use_native_resize_op: Whether to use tf.image.resize_nearest_neighbor
to do upsampling in FPN. Default is False.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams`.
......@@ -112,6 +115,7 @@ class SSDMobileNetV2FpnKerasFeatureExtractor(
self._conv_defs = None
if self._use_depthwise:
self._conv_defs = _create_modified_mobilenet_config()
self._use_native_resize_op = use_native_resize_op
self._feature_blocks = ['layer_4', 'layer_7', 'layer_14', 'layer_19']
self._mobilenet_v2 = None
self._fpn_features_generator = None
......@@ -151,6 +155,7 @@ class SSDMobileNetV2FpnKerasFeatureExtractor(
depth=self._depth_fn(self._additional_layer_depth),
use_depthwise=self._use_depthwise,
use_explicit_padding=self._use_explicit_padding,
use_native_resize_op=self._use_native_resize_op,
is_training=self._is_training,
conv_hyperparams=self._conv_hyperparams,
freeze_batchnorm=self._freeze_batchnorm,
......
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""SSDFeatureExtractor for MobileNetV3 features."""
import tensorflow as tf
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.models import feature_map_generators
from object_detection.utils import context_manager
from object_detection.utils import ops
from object_detection.utils import shape_utils
from nets.mobilenet import mobilenet
from nets.mobilenet import mobilenet_v3
slim = tf.contrib.slim
class _SSDMobileNetV3FeatureExtractorBase(ssd_meta_arch.SSDFeatureExtractor):
"""Base class of SSD feature extractor using MobilenetV3 features."""
def __init__(self,
conv_defs,
from_layer,
is_training,
depth_multiplier,
min_depth,
pad_to_multiple,
conv_hyperparams_fn,
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
override_base_feature_extractor_hyperparams=False):
"""MobileNetV3 Feature Extractor for SSD Models.
MobileNet v3. Details found in:
https://arxiv.org/abs/1905.02244
Args:
conv_defs: MobileNetV3 conv defs for backbone.
from_layer: A list of two layer names (strings) to connect to the 1st and
2nd inputs of the SSD head.
is_training: whether the network is in training mode.
depth_multiplier: float depth multiplier for feature extractor.
min_depth: minimum feature extractor depth.
pad_to_multiple: the nearest multiple to zero pad the input height and
width dimensions to.
conv_hyperparams_fn: A function to construct tf slim arg_scope for conv2d
and separable_conv2d ops in the layers that are added on top of the base
feature extractor.
reuse_weights: Whether to reuse variables. Default is None.
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False.
use_depthwise: Whether to use depthwise convolutions. Default is False.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams_fn`.
"""
super(_SSDMobileNetV3FeatureExtractorBase, self).__init__(
is_training=is_training,
depth_multiplier=depth_multiplier,
min_depth=min_depth,
pad_to_multiple=pad_to_multiple,
conv_hyperparams_fn=conv_hyperparams_fn,
reuse_weights=reuse_weights,
use_explicit_padding=use_explicit_padding,
use_depthwise=use_depthwise,
override_base_feature_extractor_hyperparams=override_base_feature_extractor_hyperparams
)
self._conv_defs = conv_defs
self._from_layer = from_layer
def preprocess(self, resized_inputs):
"""SSD preprocessing.
Maps pixel values to the range [-1, 1].
Args:
resized_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
"""
return (2.0 / 255.0) * resized_inputs - 1.0
def extract_features(self, preprocessed_inputs):
"""Extract features from preprocessed inputs.
Args:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
feature_maps: a list of tensors where the ith tensor has shape
[batch, height_i, width_i, depth_i]
Raises:
ValueError if conv_defs is not provided or from_layer does not meet the
size requirement.
"""
if not self._conv_defs:
raise ValueError('Must provide backbone conv defs.')
if len(self._from_layer) != 2:
raise ValueError('SSD input feature names are not provided.')
preprocessed_inputs = shape_utils.check_min_image_dim(
33, preprocessed_inputs)
feature_map_layout = {
'from_layer': [
self._from_layer[0], self._from_layer[1], '', '', '', ''
],
'layer_depth': [-1, -1, 512, 256, 256, 128],
'use_depthwise': self._use_depthwise,
'use_explicit_padding': self._use_explicit_padding,
}
with tf.variable_scope('MobilenetV3', reuse=self._reuse_weights) as scope:
with slim.arg_scope(
mobilenet_v3.training_scope(is_training=None, bn_decay=0.9997)), \
slim.arg_scope(
[mobilenet.depth_multiplier], min_depth=self._min_depth):
with (slim.arg_scope(self._conv_hyperparams_fn())
if self._override_base_feature_extractor_hyperparams else
context_manager.IdentityContextManager()):
# TODO(bochen): switch to v3 modules once v3 is properly refactored.
_, image_features = mobilenet_v3.mobilenet_base(
ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
conv_defs=self._conv_defs,
final_endpoint=self._from_layer[1],
depth_multiplier=self._depth_multiplier,
use_explicit_padding=self._use_explicit_padding,
scope=scope)
with slim.arg_scope(self._conv_hyperparams_fn()):
feature_maps = feature_map_generators.multi_resolution_feature_maps(
feature_map_layout=feature_map_layout,
depth_multiplier=self._depth_multiplier,
min_depth=self._min_depth,
insert_1x1_conv=True,
image_features=image_features)
return feature_maps.values()
class SSDMobileNetV3LargeFeatureExtractor(_SSDMobileNetV3FeatureExtractorBase):
"""Mobilenet V3-Large feature extractor."""
def __init__(self,
is_training,
depth_multiplier,
min_depth,
pad_to_multiple,
conv_hyperparams_fn,
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
override_base_feature_extractor_hyperparams=False):
super(SSDMobileNetV3LargeFeatureExtractor, self).__init__(
conv_defs=mobilenet_v3.V3_LARGE_DETECTION,
from_layer=['layer_14/expansion_output', 'layer_17'],
is_training=is_training,
depth_multiplier=depth_multiplier,
min_depth=min_depth,
pad_to_multiple=pad_to_multiple,
conv_hyperparams_fn=conv_hyperparams_fn,
reuse_weights=reuse_weights,
use_explicit_padding=use_explicit_padding,
use_depthwise=use_depthwise,
override_base_feature_extractor_hyperparams=override_base_feature_extractor_hyperparams
)
class SSDMobileNetV3SmallFeatureExtractor(_SSDMobileNetV3FeatureExtractorBase):
"""Mobilenet V3-Small feature extractor."""
def __init__(self,
is_training,
depth_multiplier,
min_depth,
pad_to_multiple,
conv_hyperparams_fn,
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
override_base_feature_extractor_hyperparams=False):
super(SSDMobileNetV3SmallFeatureExtractor, self).__init__(
conv_defs=mobilenet_v3.V3_SMALL_DETECTION,
from_layer=['layer_10/expansion_output', 'layer_13'],
is_training=is_training,
depth_multiplier=depth_multiplier,
min_depth=min_depth,
pad_to_multiple=pad_to_multiple,
conv_hyperparams_fn=conv_hyperparams_fn,
reuse_weights=reuse_weights,
use_explicit_padding=use_explicit_padding,
use_depthwise=use_depthwise,
override_base_feature_extractor_hyperparams=override_base_feature_extractor_hyperparams
)
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for ssd_mobilenet_v3_feature_extractor."""
import tensorflow as tf
from object_detection.models import ssd_mobilenet_v3_feature_extractor
from object_detection.models import ssd_mobilenet_v3_feature_extractor_testbase
slim = tf.contrib.slim
class SsdMobilenetV3LargeFeatureExtractorTest(
ssd_mobilenet_v3_feature_extractor_testbase
._SsdMobilenetV3FeatureExtractorTestBase):
def _get_input_sizes(self):
"""Return first two input feature map sizes."""
return [672, 480]
def _create_feature_extractor(self,
depth_multiplier,
pad_to_multiple,
use_explicit_padding=False,
use_keras=False):
"""Constructs a new Mobilenet V3-Large feature extractor.
Args:
depth_multiplier: float depth multiplier for feature extractor
pad_to_multiple: the nearest multiple to zero pad the input height and
width dimensions to.
use_explicit_padding: use 'VALID' padding for convolutions, but prepad
inputs so that the output dimensions are the same as if 'SAME' padding
were used.
use_keras: if True builds a keras-based feature extractor, if False builds
a slim-based one.
Returns:
an ssd_meta_arch.SSDFeatureExtractor object.
"""
min_depth = 32
return (
ssd_mobilenet_v3_feature_extractor.SSDMobileNetV3LargeFeatureExtractor(
False,
depth_multiplier,
min_depth,
pad_to_multiple,
self.conv_hyperparams_fn,
use_explicit_padding=use_explicit_padding))
class SsdMobilenetV3SmallFeatureExtractorTest(
ssd_mobilenet_v3_feature_extractor_testbase
._SsdMobilenetV3FeatureExtractorTestBase):
def _get_input_sizes(self):
"""Return first two input feature map sizes."""
return [288, 288]
def _create_feature_extractor(self,
depth_multiplier,
pad_to_multiple,
use_explicit_padding=False,
use_keras=False):
"""Constructs a new Mobilenet V3-Small feature extractor.
Args:
depth_multiplier: float depth multiplier for feature extractor
pad_to_multiple: the nearest multiple to zero pad the input height and
width dimensions to.
use_explicit_padding: use 'VALID' padding for convolutions, but prepad
inputs so that the output dimensions are the same as if 'SAME' padding
were used.
use_keras: if True builds a keras-based feature extractor, if False builds
a slim-based one.
Returns:
an ssd_meta_arch.SSDFeatureExtractor object.
"""
min_depth = 32
return (
ssd_mobilenet_v3_feature_extractor.SSDMobileNetV3SmallFeatureExtractor(
False,
depth_multiplier,
min_depth,
pad_to_multiple,
self.conv_hyperparams_fn,
use_explicit_padding=use_explicit_padding))
if __name__ == '__main__':
tf.test.main()
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Base test class for ssd_mobilenet_v3_feature_extractor."""
import abc
import numpy as np
import tensorflow as tf
from object_detection.models import ssd_feature_extractor_test
slim = tf.contrib.slim
class _SsdMobilenetV3FeatureExtractorTestBase(
ssd_feature_extractor_test.SsdFeatureExtractorTestBase):
"""Base class for MobilenetV3 tests."""
@abc.abstractmethod
def _get_input_sizes(self):
"""Return feature map sizes for the two inputs to SSD head."""
pass
def test_extract_features_returns_correct_shapes_128(self):
image_height = 128
image_width = 128
depth_multiplier = 1.0
pad_to_multiple = 1
input_feature_sizes = self._get_input_sizes()
expected_feature_map_shape = [(2, 8, 8, input_feature_sizes[0]),
(2, 4, 4, input_feature_sizes[1]),
(2, 2, 2, 512), (2, 1, 1, 256), (2, 1, 1, 256),
(2, 1, 1, 128)]
self.check_extract_features_returns_correct_shape(
2,
image_height,
image_width,
depth_multiplier,
pad_to_multiple,
expected_feature_map_shape,
use_keras=False)
def test_extract_features_returns_correct_shapes_299(self):
image_height = 299
image_width = 299
depth_multiplier = 1.0
pad_to_multiple = 1
input_feature_sizes = self._get_input_sizes()
expected_feature_map_shape = [(2, 19, 19, input_feature_sizes[0]),
(2, 10, 10, input_feature_sizes[1]),
(2, 5, 5, 512), (2, 3, 3, 256), (2, 2, 2, 256),
(2, 1, 1, 128)]
self.check_extract_features_returns_correct_shape(
2,
image_height,
image_width,
depth_multiplier,
pad_to_multiple,
expected_feature_map_shape,
use_keras=False)
def test_extract_features_returns_correct_shapes_with_pad_to_multiple(self):
image_height = 299
image_width = 299
depth_multiplier = 1.0
pad_to_multiple = 32
input_feature_sizes = self._get_input_sizes()
expected_feature_map_shape = [(2, 20, 20, input_feature_sizes[0]),
(2, 10, 10, input_feature_sizes[1]),
(2, 5, 5, 512), (2, 3, 3, 256), (2, 2, 2, 256),
(2, 1, 1, 128)]
self.check_extract_features_returns_correct_shape(
2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape)
def test_preprocess_returns_correct_value_range(self):
image_height = 128
image_width = 128
depth_multiplier = 1
pad_to_multiple = 1
test_image = np.random.rand(4, image_height, image_width, 3)
feature_extractor = self._create_feature_extractor(
depth_multiplier, pad_to_multiple, use_keras=False)
preprocessed_image = feature_extractor.preprocess(test_image)
self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0)))
def test_has_fused_batchnorm(self):
image_height = 40
image_width = 40
depth_multiplier = 1
pad_to_multiple = 1
image_placeholder = tf.placeholder(tf.float32,
[1, image_height, image_width, 3])
feature_extractor = self._create_feature_extractor(
depth_multiplier, pad_to_multiple, use_keras=False)
preprocessed_image = feature_extractor.preprocess(image_placeholder)
_ = feature_extractor.extract_features(preprocessed_image)
self.assertTrue(any('FusedBatchNorm' in op.type
for op in tf.get_default_graph().get_operations()))
......@@ -47,6 +47,7 @@ class SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
use_native_resize_op=False,
override_base_feature_extractor_hyperparams=False):
"""SSD FPN feature extractor based on Resnet v1 architecture.
......@@ -77,6 +78,8 @@ class SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False. UNUSED currently.
use_depthwise: Whether to use depthwise convolutions. UNUSED currently.
use_native_resize_op: Whether to use tf.image.resize_nearest_neighbor
to do upsampling in FPN. Default is False.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams_fn`.
......@@ -103,6 +106,7 @@ class SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
self._fpn_min_level = fpn_min_level
self._fpn_max_level = fpn_max_level
self._additional_layer_depth = additional_layer_depth
self._use_native_resize_op = use_native_resize_op
def preprocess(self, resized_inputs):
"""SSD preprocessing.
......@@ -178,7 +182,8 @@ class SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
feature_block_list.append('block{}'.format(level - 1))
fpn_features = feature_map_generators.fpn_top_down_feature_maps(
[(key, image_features[key]) for key in feature_block_list],
depth=depth_fn(self._additional_layer_depth))
depth=depth_fn(self._additional_layer_depth),
use_native_resize_op=self._use_native_resize_op)
feature_maps = []
for level in range(self._fpn_min_level, base_fpn_max_level + 1):
feature_maps.append(
......@@ -213,6 +218,7 @@ class SSDResnet50V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
use_native_resize_op=False,
override_base_feature_extractor_hyperparams=False):
"""SSD Resnet50 V1 FPN feature extractor based on Resnet v1 architecture.
......@@ -232,6 +238,8 @@ class SSDResnet50V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False. UNUSED currently.
use_depthwise: Whether to use depthwise convolutions. UNUSED currently.
use_native_resize_op: Whether to use tf.image.resize_nearest_neighbor
to do upsampling in FPN. Default is False.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams_fn`.
......@@ -251,6 +259,7 @@ class SSDResnet50V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
reuse_weights=reuse_weights,
use_explicit_padding=use_explicit_padding,
use_depthwise=use_depthwise,
use_native_resize_op=use_native_resize_op,
override_base_feature_extractor_hyperparams=
override_base_feature_extractor_hyperparams)
......@@ -270,6 +279,7 @@ class SSDResnet101V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
use_native_resize_op=False,
override_base_feature_extractor_hyperparams=False):
"""SSD Resnet101 V1 FPN feature extractor based on Resnet v1 architecture.
......@@ -289,6 +299,8 @@ class SSDResnet101V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False. UNUSED currently.
use_depthwise: Whether to use depthwise convolutions. UNUSED currently.
use_native_resize_op: Whether to use tf.image.resize_nearest_neighbor
to do upsampling in FPN. Default is False.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams_fn`.
......@@ -308,6 +320,7 @@ class SSDResnet101V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
reuse_weights=reuse_weights,
use_explicit_padding=use_explicit_padding,
use_depthwise=use_depthwise,
use_native_resize_op=use_native_resize_op,
override_base_feature_extractor_hyperparams=
override_base_feature_extractor_hyperparams)
......@@ -327,6 +340,7 @@ class SSDResnet152V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
use_native_resize_op=False,
override_base_feature_extractor_hyperparams=False):
"""SSD Resnet152 V1 FPN feature extractor based on Resnet v1 architecture.
......@@ -346,6 +360,8 @@ class SSDResnet152V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False. UNUSED currently.
use_depthwise: Whether to use depthwise convolutions. UNUSED currently.
use_native_resize_op: Whether to use tf.image.resize_nearest_neighbor
to do upsampling in FPN. Default is False.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams_fn`.
......@@ -365,5 +381,6 @@ class SSDResnet152V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
reuse_weights=reuse_weights,
use_explicit_padding=use_explicit_padding,
use_depthwise=use_depthwise,
use_native_resize_op=use_native_resize_op,
override_base_feature_extractor_hyperparams=
override_base_feature_extractor_hyperparams)
......@@ -52,6 +52,7 @@ class SSDResNetV1FpnKerasFeatureExtractor(
additional_layer_depth=256,
reuse_weights=None,
use_explicit_padding=None,
use_depthwise=None,
override_base_feature_extractor_hyperparams=False,
name=None):
"""SSD Keras based FPN feature extractor Resnet v1 architecture.
......@@ -90,6 +91,7 @@ class SSDResNetV1FpnKerasFeatureExtractor(
use_explicit_padding: whether to use explicit padding when extracting
features. Default is None, as it's an invalid option and not implemented
in this feature extractor.
use_depthwise: Whether to use depthwise convolutions. UNUSED currently.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams`.
......@@ -105,11 +107,14 @@ class SSDResNetV1FpnKerasFeatureExtractor(
freeze_batchnorm=freeze_batchnorm,
inplace_batchnorm_update=inplace_batchnorm_update,
use_explicit_padding=None,
use_depthwise=None,
override_base_feature_extractor_hyperparams=
override_base_feature_extractor_hyperparams,
name=name)
if self._use_explicit_padding:
raise ValueError('Explicit padding is not a valid option.')
if self._use_depthwise:
raise ValueError('Depthwise is not a valid option.')
self._fpn_min_level = fpn_min_level
self._fpn_max_level = fpn_max_level
self._additional_layer_depth = additional_layer_depth
......@@ -251,6 +256,7 @@ class SSDResNet50V1FpnKerasFeatureExtractor(
additional_layer_depth=256,
reuse_weights=None,
use_explicit_padding=None,
use_depthwise=None,
override_base_feature_extractor_hyperparams=False,
name='ResNet50V1_FPN'):
"""SSD Keras based FPN feature extractor ResnetV1-50 architecture.
......@@ -278,7 +284,8 @@ class SSDResNet50V1FpnKerasFeatureExtractor(
reuse_weights: whether to reuse variables. Default is None.
use_explicit_padding: whether to use explicit padding when extracting
features. Default is None, as it's an invalid option and not implemented
in this feature extractor
in this feature extractor.
use_depthwise: Whether to use depthwise convolutions. UNUSED currently.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams`.
......@@ -296,6 +303,7 @@ class SSDResNet50V1FpnKerasFeatureExtractor(
resnet_v1_base_model=resnet_v1.resnet_v1_50,
resnet_v1_base_model_name='resnet_v1_50',
use_explicit_padding=use_explicit_padding,
use_depthwise=use_depthwise,
override_base_feature_extractor_hyperparams=
override_base_feature_extractor_hyperparams,
name=name)
......@@ -318,6 +326,7 @@ class SSDResNet101V1FpnKerasFeatureExtractor(
additional_layer_depth=256,
reuse_weights=None,
use_explicit_padding=None,
use_depthwise=None,
override_base_feature_extractor_hyperparams=False,
name='ResNet101V1_FPN'):
"""SSD Keras based FPN feature extractor ResnetV1-101 architecture.
......@@ -345,7 +354,8 @@ class SSDResNet101V1FpnKerasFeatureExtractor(
reuse_weights: whether to reuse variables. Default is None.
use_explicit_padding: whether to use explicit padding when extracting
features. Default is None, as it's an invalid option and not implemented
in this feature extractor
in this feature extractor.
use_depthwise: Whether to use depthwise convolutions. UNUSED currently.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams`.
......@@ -363,6 +373,7 @@ class SSDResNet101V1FpnKerasFeatureExtractor(
resnet_v1_base_model=resnet_v1.resnet_v1_101,
resnet_v1_base_model_name='resnet_v1_101',
use_explicit_padding=use_explicit_padding,
use_depthwise=use_depthwise,
override_base_feature_extractor_hyperparams=
override_base_feature_extractor_hyperparams,
name=name)
......@@ -385,6 +396,7 @@ class SSDResNet152V1FpnKerasFeatureExtractor(
additional_layer_depth=256,
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=None,
override_base_feature_extractor_hyperparams=False,
name='ResNet152V1_FPN'):
"""SSD Keras based FPN feature extractor ResnetV1-152 architecture.
......@@ -412,7 +424,8 @@ class SSDResNet152V1FpnKerasFeatureExtractor(
reuse_weights: whether to reuse variables. Default is None.
use_explicit_padding: whether to use explicit padding when extracting
features. Default is None, as it's an invalid option and not implemented
in this feature extractor
in this feature extractor.
use_depthwise: Whether to use depthwise convolutions. UNUSED currently.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams`.
......@@ -430,6 +443,7 @@ class SSDResNet152V1FpnKerasFeatureExtractor(
resnet_v1_base_model=resnet_v1.resnet_v1_152,
resnet_v1_base_model_name='resnet_v1_152',
use_explicit_padding=use_explicit_padding,
use_depthwise=use_depthwise,
override_base_feature_extractor_hyperparams=
override_base_feature_extractor_hyperparams,
name=name)
......@@ -66,7 +66,7 @@ message ConvolutionalBoxPredictor {
}
// Configuration proto for weight shared convolutional box predictor.
// Next id: 18
// Next id: 19
message WeightSharedConvolutionalBoxPredictor {
// Hyperparameters for convolution ops used in the box predictor.
optional Hyperparams conv_hyperparams = 1;
......@@ -122,8 +122,8 @@ message WeightSharedConvolutionalBoxPredictor {
optional float max = 2;
}
optional BoxEncodingsClipRange box_encodings_clip_range = 17;
}
}
// TODO(alirezafathi): Refactor the proto file to be able to configure mask rcnn
......@@ -197,3 +197,4 @@ message RfcnBoxPredictor {
optional int32 crop_width = 7 [default = 12];
}
......@@ -21,6 +21,9 @@ message CalibrationConfig {
// Per-class sigmoid calibration.
ClassIdSigmoidCalibrations class_id_sigmoid_calibrations = 4;
// Temperature scaling calibration.
TemperatureScalingCalibration temperature_scaling_calibration = 5;
}
}
......@@ -50,6 +53,11 @@ message ClassIdSigmoidCalibrations {
map<int32, SigmoidParameters> class_id_sigmoid_parameters_map = 1;
}
// Message for Temperature Scaling Calibration.
message TemperatureScalingCalibration {
optional float scaler = 1;
}
// Description of data used to fit the calibration model. CLASS_SPECIFIC
// indicates that the calibration parameters are derived from detections
// pertaining to a single class. ALL_CLASSES indicates that parameters were
......
......@@ -4,80 +4,85 @@ package object_detection.protos;
// Message for configuring DetectionModel evaluation jobs (eval.py).
message EvalConfig {
optional uint32 batch_size = 25 [default=1];
optional uint32 batch_size = 25 [default = 1];
// Number of visualization images to generate.
optional uint32 num_visualizations = 1 [default=10];
optional uint32 num_visualizations = 1 [default = 10];
// Number of examples to process of evaluation.
optional uint32 num_examples = 2 [default=5000, deprecated=true];
optional uint32 num_examples = 2 [default = 5000, deprecated = true];
// How often to run evaluation.
optional uint32 eval_interval_secs = 3 [default=300];
optional uint32 eval_interval_secs = 3 [default = 300];
// Maximum number of times to run evaluation. If set to 0, will run forever.
optional uint32 max_evals = 4 [default=0, deprecated=true];
optional uint32 max_evals = 4 [default = 0, deprecated = true];
// Whether the TensorFlow graph used for evaluation should be saved to disk.
optional bool save_graph = 5 [default=false];
optional bool save_graph = 5 [default = false];
// Path to directory to store visualizations in. If empty, visualization
// images are not exported (only shown on Tensorboard).
optional string visualization_export_dir = 6 [default=""];
optional string visualization_export_dir = 6 [default = ""];
// BNS name of the TensorFlow master.
optional string eval_master = 7 [default=""];
optional string eval_master = 7 [default = ""];
// Type of metrics to use for evaluation.
repeated string metrics_set = 8;
// Path to export detections to COCO compatible JSON format.
optional string export_path = 9 [default=''];
optional string export_path = 9 [default = ''];
// Option to not read groundtruth labels and only export detections to
// COCO-compatible JSON file.
optional bool ignore_groundtruth = 10 [default=false];
optional bool ignore_groundtruth = 10 [default = false];
// Use exponential moving averages of variables for evaluation.
// TODO(rathodv): When this is false make sure the model is constructed
// without moving averages in restore_fn.
optional bool use_moving_averages = 11 [default=false];
optional bool use_moving_averages = 11 [default = false];
// Whether to evaluate instance masks.
// Note that since there is no evaluation code currently for instance
// segmenation this option is unused.
optional bool eval_instance_masks = 12 [default=false];
optional bool eval_instance_masks = 12 [default = false];
// Minimum score threshold for a detected object box to be visualized
optional float min_score_threshold = 13 [default=0.5];
optional float min_score_threshold = 13 [default = 0.5];
// Maximum number of detections to visualize
optional int32 max_num_boxes_to_visualize = 14 [default=20];
optional int32 max_num_boxes_to_visualize = 14 [default = 20];
// When drawing a single detection, each label is by default visualized as
// <label name> : <label score>. One can skip the name or/and score using the
// following fields:
optional bool skip_scores = 15 [default=false];
optional bool skip_labels = 16 [default=false];
optional bool skip_scores = 15 [default = false];
optional bool skip_labels = 16 [default = false];
// Whether to show groundtruth boxes in addition to detected boxes in
// visualizations.
optional bool visualize_groundtruth_boxes = 17 [default=false];
optional bool visualize_groundtruth_boxes = 17 [default = false];
// Box color for visualizing groundtruth boxes.
optional string groundtruth_box_visualization_color = 18 [default="black"];
optional string groundtruth_box_visualization_color = 18 [default = "black"];
// Whether to keep image identifier in filename when exported to
// visualization_export_dir.
optional bool keep_image_id_for_visualization_export = 19 [default=false];
optional bool keep_image_id_for_visualization_export = 19 [default = false];
// Whether to retain original images (i.e. not pre-processed) in the tensor
// dictionary, so that they can be displayed in Tensorboard.
optional bool retain_original_images = 23 [default=true];
optional bool retain_original_images = 23 [default = true];
// If True, additionally include per-category metrics.
optional bool include_metrics_per_category = 24 [default=false];
optional bool include_metrics_per_category = 24 [default = false];
// Recall range within which precision should be computed.
optional float recall_lower_bound = 26 [default = 0.0];
optional float recall_upper_bound = 27 [default = 1.0];
// Whether to retain additional channels (i.e. not pre-processed) in the
// tensor dictionary, so that they can be displayed in Tensorboard.
optional bool retain_original_image_additional_channels = 28
[default = false];
}
......@@ -169,11 +169,17 @@ message FasterRcnn {
// running evaluation (specifically not is_training if False).
optional bool use_static_shapes_for_eval = 37 [default = false];
// If true, uses implementation of partitioned_non_max_suppression in first
// stage.
optional bool use_partitioned_nms_in_first_stage = 38 [default = true];
// Whether to return raw detections (pre NMS).
optional bool return_raw_detections_during_predict = 39 [default = false];
// Whether to use tf.image.combined_non_max_suppression.
optional bool use_combined_nms_in_first_stage = 38 [default=false];
optional bool use_combined_nms_in_first_stage = 40 [default = false];
}
message FasterRcnnFeatureExtractor {
// Type of Faster R-CNN model (e.g., 'faster_rcnn_resnet101';
// See builders/model_builder.py for expected types).
......