Commit 80444539 authored by Zhuoran Liu's avatar Zhuoran Liu Committed by pkulzc

Add TPU SavedModel exporter and refactor OD code (#6737)

247226201  by ronnyvotel:

    Updating the visualization tools to accept unique_ids for color coding.

--
247067830  by Zhichao Lu:

    Add box_encodings_clip_range options for the convolutional box predictor (for TPU compatibility).
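    A minimal plain-Python sketch of what such a clip range does (the function name and the sample range are illustrative, not the actual box predictor API): extreme box encodings are clamped into a fixed interval, avoiding overflow-prone values in TPU numerics.

    ```python
    def clip_encodings(encodings, clip_min, clip_max):
        # Clamp each box encoding into [clip_min, clip_max].
        return [min(max(v, clip_min), clip_max) for v in encodings]

    print(clip_encodings([-15.0, 0.3, 42.0], -10.0, 10.0))  # [-10.0, 0.3, 10.0]
    ```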

--
246888475  by Zhichao Lu:

    Remove unused _update_eval_steps function.

--
246163259  by lzc:

    Add a gather op that can handle ignore indices (which are "-1"s in this case).
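    The semantics can be sketched in plain Python (names and the placeholder value are illustrative; the real op works on tensors): an index of -1 marks an unmatched entry and selects a placeholder instead of raising an out-of-range error.

    ```python
    def gather_with_ignore(values, indices, ignore_value=0):
        # Gather values[i] for each index; map ignored (-1) indices to
        # a placeholder rather than failing.
        return [values[i] if i >= 0 else ignore_value for i in indices]

    # Rows matched to a groundtruth entry keep its value; unmatched rows
    # (index -1) receive the placeholder.
    print(gather_with_ignore([10, 20, 30], [2, -1, 0]))  # [30, 0, 10]
    ```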

--
246084944  by Zhichao Lu:

    Keras based implementation for SSD + MobilenetV2 + FPN.

--
245544227  by rathodv:

    Add batch_get_targets method to target assigner module to gather any groundtruth tensors based on the results of target assigner.

--
245540854  by rathodv:

    Update target assigner to return match tensor instead of a match object.

--
245434441  by Zhichao Lu:

    Add README for tpu_exporters package.

--
245381834  by lzc:

    Internal change.

--
245298983  by Zhichao Lu:

    Add conditional_shape_resizer to config_util

--
245134666  by Zhichao Lu:

    Adds ConditionalShapeResizer to the ImageResizer proto, which enables resizing only if the input image height or width is greater or smaller than a certain size. Also enables specification of the resize method in resize_to_{max, min}_dimension methods.
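    The shape arithmetic behind the "resize only if too large" condition can be sketched in plain Python (function name and rounding are illustrative; the real resizer operates on image tensors):

    ```python
    def resize_to_max_dimension(height, width, max_dim):
        # Downscale only when the larger side exceeds max_dim,
        # preserving aspect ratio; otherwise leave the shape unchanged.
        largest = max(height, width)
        if largest <= max_dim:
            return height, width  # condition not met: no resize
        scale = max_dim / largest
        return int(round(height * scale)), int(round(width * scale))

    print(resize_to_max_dimension(800, 1200, 600))  # (400, 600)
    print(resize_to_max_dimension(300, 400, 600))   # (300, 400)
    ```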

--
245093975  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (faster-rcnn)

--
245072421  by Zhichao Lu:

    Adds a new image resizing method "resize_to_max_dimension" which resizes images only if a dimension is greater than the maximum desired value while maintaining aspect ratio.

--
244946998  by lzc:

    Internal Changes.

--
244943693  by Zhichao Lu:

    Add a custom config to mobilenet v2 that makes it more detection friendly.

--
244754158  by derekjchow:

    Internal change.

--
244699875  by Zhichao Lu:

    Add check_range=False to box_list_ops.to_normalized_coordinates when training
    for instance segmentation.  This is consistent with other calls when training
    for object detection.  There could be wrongly annotated boxes in the dataset.
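    The normalization itself is simple (a plain-Python sketch with an illustrative name); with check_range=False, coordinates outside [0, 1] produced by wrongly annotated boxes pass through instead of tripping a range assertion.

    ```python
    def to_normalized(box, height, width):
        # Convert [ymin, xmin, ymax, xmax] absolute pixel coordinates
        # to the [0, 1] normalized range.
        ymin, xmin, ymax, xmax = box
        return [ymin / height, xmin / width, ymax / height, xmax / width]

    print(to_normalized([100, 200, 300, 400], 400, 800))  # [0.25, 0.25, 0.75, 0.5]
    ```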

--
244507425  by rathodv:

    Support bfloat16 for ssd models.

--
244399982  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd)

--
244209387  by Zhichao Lu:

    Internal change.

--
243922296  by rathodv:

    Change `raw_detection_scores` to contain softmax/sigmoid scores (not logits) for `raw_detection_boxes`.
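    For reference, the logit-to-score conversion for the softmax case looks like this (a plain-Python illustration, not the model code):

    ```python
    import math

    def softmax(logits):
        # Convert raw class logits to probabilities that sum to 1.
        # Subtracting the max is the usual numerical-stability trick.
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

    probs = softmax([2.0, 1.0, 0.0])
    print(round(sum(probs), 6))  # 1.0
    ```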

--
243883978  by Zhichao Lu:

    Add a sample fully conv config.

--
243369455  by Zhichao Lu:

    Fix regularization loss gap in Keras and Slim.

--
243292002  by lzc:

    Internal changes.

--
243097958  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd model)

--
243007177  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd model)

--
242776550  by Zhichao Lu:

    Make object detection pre-processing run on GPU.  tf.map_fn() uses
    TensorArrayV3 ops, which have no int32 GPU implementation.  Cast to int64,
    then cast back to int32.

--
242723128  by Zhichao Lu:

    Using sorted dictionaries for additional heads in non_max_suppression to ensure deterministic tensor ordering.
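    The idea is that plain dict iteration order is not guaranteed across runs (in the Python versions of the time), so iterating keys in sorted order pins the tensor order. A minimal sketch with illustrative names:

    ```python
    def ordered_head_outputs(additional_fields):
        # Iterate extra prediction heads in sorted key order so downstream
        # ops always see the tensors in the same order across runs.
        return [additional_fields[key] for key in sorted(additional_fields)]

    heads = {'masks': 'mask_tensor', 'keypoints': 'kp_tensor'}
    print(ordered_head_outputs(heads))  # ['kp_tensor', 'mask_tensor']
    ```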

--
242495311  by Zhichao Lu:

    Update documentation to reflect new TFLite examples repo location

--
242230527  by Zhichao Lu:

    Fix Dropout bugs for WeightSharedConvolutionalBoxPred.

--
242226573  by Zhichao Lu:

    Create Keras-based WeightSharedConvolutionalBoxPredictor.

--
241806074  by Zhichao Lu:

    Add inference in unit tests of TFX OD template.

--
241641498  by lzc:

    Internal change.

--
241637481  by Zhichao Lu:

    matmul_crop_and_resize(): Switch to dynamic shaping, so that not all dimensions are required to be known.

--
241429980  by Zhichao Lu:

    Internal change

--
241167237  by Zhichao Lu:

    Adds a faster_rcnn_inception_resnet_v2 Keras feature extractor, and updates the model builder to construct it.

--
241088616  by Zhichao Lu:

    Make it compatible with different dtype, e.g. float32, bfloat16, etc.

--
240897364  by lzc:

    Use image_np_expanded in object_detection_tutorial notebook.

--
240890393  by Zhichao Lu:

    Disable multicore inference for OD template as it's not yet compatible.

--
240352168  by Zhichao Lu:

    Make SSDResnetV1FpnFeatureExtractor not protected to allow inheritance.

--
240351470  by lzc:

    Internal change.

--
239878928  by Zhichao Lu:

    Defines Keras box predictors for Faster RCNN and RFCN

--
239872103  by Zhichao Lu:

    Delete duplicated inputs in test.

--
239714273  by Zhichao Lu:

    Adding scope variable to all class heads

--
239698643  by Zhichao Lu:

    Create FPN feature extractor for object detection.

--
239696657  by Zhichao Lu:

    Internal Change.

--
239299404  by Zhichao Lu:

    Allows the faster rcnn meta-architecture to support Keras subcomponents

--
238502595  by Zhichao Lu:

    Lay the groundwork for symmetric quantization.

--
238496885  by Zhichao Lu:

    Add flexible_grid_anchor_generator
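    As a rough illustration of grid anchor generation (simplified to one anchor center per cell; the real generator also emits per-anchor scales and aspect ratios, and the "flexible" variant allows these to differ per feature map):

    ```python
    def anchor_centers(grid_height, grid_width, stride, offset):
        # Centers of anchors laid out on a feature-map grid: one center
        # per cell, spaced by `stride`, shifted by `offset`.
        return [(offset + y * stride, offset + x * stride)
                for y in range(grid_height) for x in range(grid_width)]

    print(anchor_centers(2, 2, 16, 8))  # [(8, 8), (8, 24), (24, 8), (24, 24)]
    ```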

--
238138727  by lzc:

    Remove dead code.

    _USE_C_SHAPES has been forced True in TensorFlow releases since
    TensorFlow 1.9
    (https://github.com/tensorflow/tensorflow/commit/1d74a69443f741e69f9f52cb6bc2940b4d4ae3b7)

--
238123936  by rathodv:

    Add num_matched_groundtruth summary to target assigner in SSD.

--
238103345  by ronnyvotel:

    Raising error if input file pattern does not match any files.
    Also printing the number of evaluation images for coco metrics.

--
238044081  by Zhichao Lu:

    Fix docstring to state the correct dimensionality of `class_predictions_with_background`.

--
237920279  by Zhichao Lu:

    [XLA] Rework debug flags for dumping HLO.

    The following flags (usually passed via the XLA_FLAGS envvar) are removed:

      xla_dump_computations_to
      xla_dump_executions_to
      xla_dump_ir_to
      xla_dump_optimized_hlo_proto_to
      xla_dump_per_pass_hlo_proto_to
      xla_dump_unoptimized_hlo_proto_to
      xla_generate_hlo_graph
      xla_generate_hlo_text_to
      xla_hlo_dump_as_html
      xla_hlo_graph_path
      xla_log_hlo_text

    The following new flags are added:

      xla_dump_to
      xla_dump_hlo_module_re
      xla_dump_hlo_pass_re
      xla_dump_hlo_as_text
      xla_dump_hlo_as_proto
      xla_dump_hlo_as_dot
      xla_dump_hlo_as_url
      xla_dump_hlo_as_html
      xla_dump_ir
      xla_dump_hlo_snapshots

    The default is not to dump anything at all, but as soon as some dumping flag is
    specified, we enable the following defaults (most of which can be overridden).

     * dump to stdout (overridden by --xla_dump_to)
     * dump HLO modules at the very beginning and end of the optimization pipeline
     * don't dump between any HLO passes (overridden by --xla_dump_hlo_pass_re)
     * dump all HLO modules (overridden by --xla_dump_hlo_module_re)
     * dump in textual format (overridden by
       --xla_dump_hlo_as_{text,proto,dot,url,html}).

    For example, to dump optimized and unoptimized HLO text and protos to /tmp/foo,
    pass

      --xla_dump_to=/tmp/foo --xla_dump_hlo_as_text --xla_dump_hlo_as_proto

    For details on these flags' meanings, see xla.proto.

    The intent of this change is to make dumping both simpler to use and more
    powerful.

    For example:

     * Previously there was no way to dump the HLO module during the pass pipeline
       in HLO text format; the only option was --xla_dump_per_pass_hlo_proto_to, which
       dumped in proto format.

       Now this is --xla_dump_hlo_pass_re=.* --xla_dump_hlo_as_text.  (In fact, the
       second flag is not necessary in this case, as dumping as text is the
       default.)

     * Previously there was no way to dump HLO as a graph before and after
       compilation; the only option was --xla_generate_hlo_graph, which would dump
       before/after every pass.

       Now this is --xla_dump_hlo_as_{dot,url,html} (depending on what format you
       want the graph in).

     * Previously, there was no coordination between the filenames written by the
       various flags, so info about one module might be dumped with various
       filename prefixes.  Now the filenames are consistent and all dumps from a
       particular module are next to each other.

    If you only specify some of these flags, we try to figure out what you wanted.
    For example:

     * --xla_dump_to implies --xla_dump_hlo_as_text unless you specify some
       other --xla_dump_hlo_as_* flag.

     * --xla_dump_hlo_as_text or --xla_dump_ir implies dumping to stdout unless you
       specify a different --xla_dump_to directory.  You can explicitly dump to
       stdout with --xla_dump_to=-.

    As part of this change, I simplified the debugging code in the HLO passes for
    dumping HLO modules.  Previously, many tests explicitly VLOG'ed the HLO module
    before, after, and sometimes during the pass.  I removed these VLOGs.  If you
    want dumps before/during/after an HLO pass, use --xla_dump_hlo_pass_re=<pass_name>.

--
237510043  by lzc:

    Internal Change.

--
237469515  by Zhichao Lu:

    Parameterize model_builder.build in inputs.py.

--
237293511  by rathodv:

    Always remove multiclass_scores from tensor_dict in transform_data_fn.

--
237260333  by ronnyvotel:

    Updating faster_rcnn_meta_arch to define prediction dictionary fields that are batched.

--

PiperOrigin-RevId: 247226201
parent c4f34e58
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for inception_resnet_v2.py.
This test mainly focuses on comparing slim inception resnet v2 and Keras
inception resnet v2 for object detection. To verify the consistency of the two
models, we compare:
1. Output shape of each layer given different inputs
2. Number of global variables
We also visualize the model structure via Tensorboard, and compare the model
layout and the parameters of each Op to make sure the two implementations are
consistent.
"""
import itertools
import numpy as np
import tensorflow as tf
from object_detection.models.keras_models import inception_resnet_v2
from object_detection.utils import test_case
_KERAS_TO_SLIM_ENDPOINT_NAMES = {
    'activation': 'Conv2d_1a_3x3',
    'activation_1': 'Conv2d_2a_3x3',
    'activation_2': 'Conv2d_2b_3x3',
    'activation_3': 'Conv2d_3b_1x1',
    'activation_4': 'Conv2d_4a_3x3',
    'max_pooling2d': 'MaxPool_3a_3x3',
    'max_pooling2d_1': 'MaxPool_5a_3x3',
    'mixed_5b': 'Mixed_5b',
    'mixed_6a': 'Mixed_6a',
    'block17_20_ac': 'PreAuxLogits',
    'mixed_7a': 'Mixed_7a',
    'conv_7b_ac': 'Conv2d_7b_1x1',
}

_SLIM_ENDPOINT_SHAPES_128 = {
    'Conv2d_1a_3x3': (2, 64, 64, 32),
    'Conv2d_2a_3x3': (2, 64, 64, 32),
    'Conv2d_2b_3x3': (2, 64, 64, 64),
    'Conv2d_3b_1x1': (2, 32, 32, 80),
    'Conv2d_4a_3x3': (2, 32, 32, 192),
    'Conv2d_7b_1x1': (2, 4, 4, 1536),
    'MaxPool_3a_3x3': (2, 32, 32, 64),
    'MaxPool_5a_3x3': (2, 16, 16, 192),
    'Mixed_5b': (2, 16, 16, 320),
    'Mixed_6a': (2, 8, 8, 1088),
    'Mixed_7a': (2, 4, 4, 2080),
    'PreAuxLogits': (2, 8, 8, 1088)}

_SLIM_ENDPOINT_SHAPES_128_STRIDE_8 = {
    'Conv2d_1a_3x3': (2, 64, 64, 32),
    'Conv2d_2a_3x3': (2, 64, 64, 32),
    'Conv2d_2b_3x3': (2, 64, 64, 64),
    'Conv2d_3b_1x1': (2, 32, 32, 80),
    'Conv2d_4a_3x3': (2, 32, 32, 192),
    'MaxPool_3a_3x3': (2, 32, 32, 64),
    'MaxPool_5a_3x3': (2, 16, 16, 192),
    'Mixed_5b': (2, 16, 16, 320),
    'Mixed_6a': (2, 16, 16, 1088),
    'PreAuxLogits': (2, 16, 16, 1088)}

_SLIM_ENDPOINT_SHAPES_128_ALIGN_FEATURE_MAPS_FALSE = {
    'Conv2d_1a_3x3': (2, 63, 63, 32),
    'Conv2d_2a_3x3': (2, 61, 61, 32),
    'Conv2d_2b_3x3': (2, 61, 61, 64),
    'Conv2d_3b_1x1': (2, 30, 30, 80),
    'Conv2d_4a_3x3': (2, 28, 28, 192),
    'Conv2d_7b_1x1': (2, 2, 2, 1536),
    'MaxPool_3a_3x3': (2, 30, 30, 64),
    'MaxPool_5a_3x3': (2, 13, 13, 192),
    'Mixed_5b': (2, 13, 13, 320),
    'Mixed_6a': (2, 6, 6, 1088),
    'Mixed_7a': (2, 2, 2, 2080),
    'PreAuxLogits': (2, 6, 6, 1088)}

_SLIM_ENDPOINT_SHAPES_299 = {}
_SLIM_ENDPOINT_SHAPES_299_STRIDE_8 = {}
_SLIM_ENDPOINT_SHAPES_299_ALIGN_FEATURE_MAPS_FALSE = {}

_KERAS_LAYERS_TO_CHECK = list(_KERAS_TO_SLIM_ENDPOINT_NAMES.keys())

_NUM_CHANNELS = 3
_BATCH_SIZE = 2
class InceptionResnetV2Test(test_case.TestCase):

  def _create_application_with_layer_outputs(
      self, layer_names, batchnorm_training,
      output_stride=16,
      align_feature_maps=False,
      batchnorm_scale=False,
      weight_decay=0.00004,
      default_batchnorm_momentum=0.9997,
      default_batchnorm_epsilon=0.001,):
    """Constructs Keras inception_resnet_v2 that extracts layer outputs."""
    # Have to clear the Keras backend to ensure isolation in layer naming.
    tf.keras.backend.clear_session()
    if not layer_names:
      layer_names = _KERAS_LAYERS_TO_CHECK
    full_model = inception_resnet_v2.inception_resnet_v2(
        batchnorm_training=batchnorm_training,
        output_stride=output_stride,
        align_feature_maps=align_feature_maps,
        weights=None,
        batchnorm_scale=batchnorm_scale,
        weight_decay=weight_decay,
        default_batchnorm_momentum=default_batchnorm_momentum,
        default_batchnorm_epsilon=default_batchnorm_epsilon,
        include_top=False)
    layer_outputs = [full_model.get_layer(name=layer).output
                     for layer in layer_names]
    return tf.keras.Model(
        inputs=full_model.inputs,
        outputs=layer_outputs)

  def _check_returns_correct_shape(
      self, image_height, image_width,
      expected_feature_map_shape, layer_names=None, batchnorm_training=True,
      output_stride=16,
      align_feature_maps=False,
      batchnorm_scale=False,
      weight_decay=0.00004,
      default_batchnorm_momentum=0.9997,
      default_batchnorm_epsilon=0.001,):
    if not layer_names:
      layer_names = _KERAS_LAYERS_TO_CHECK
    model = self._create_application_with_layer_outputs(
        layer_names=layer_names,
        batchnorm_training=batchnorm_training,
        output_stride=output_stride,
        align_feature_maps=align_feature_maps,
        batchnorm_scale=batchnorm_scale,
        weight_decay=weight_decay,
        default_batchnorm_momentum=default_batchnorm_momentum,
        default_batchnorm_epsilon=default_batchnorm_epsilon)
    image_tensor = np.random.rand(_BATCH_SIZE, image_height, image_width,
                                  _NUM_CHANNELS).astype(np.float32)
    feature_maps = model(image_tensor)
    for feature_map, layer_name in zip(feature_maps, layer_names):
      endpoint_name = _KERAS_TO_SLIM_ENDPOINT_NAMES[layer_name]
      expected_shape = expected_feature_map_shape[endpoint_name]
      self.assertAllEqual(feature_map.shape, expected_shape)

  def _get_variables(self, layer_names=None):
    tf.keras.backend.clear_session()
    model = self._create_application_with_layer_outputs(
        layer_names=layer_names,
        batchnorm_training=False)
    preprocessed_inputs = tf.placeholder(
        tf.float32, (4, None, None, _NUM_CHANNELS))
    model(preprocessed_inputs)
    return model.variables

  def test_returns_correct_shapes_128(self):
    image_height = 128
    image_width = 128
    expected_feature_map_shape = _SLIM_ENDPOINT_SHAPES_128
    self._check_returns_correct_shape(
        image_height, image_width, expected_feature_map_shape,
        align_feature_maps=True)

  def test_returns_correct_shapes_128_output_stride_8(self):
    image_height = 128
    image_width = 128
    expected_feature_map_shape = _SLIM_ENDPOINT_SHAPES_128_STRIDE_8
    # Output stride of 8 is not defined beyond 'block17_20_ac', which is
    # PreAuxLogits in slim, so those layers are excluded from the Keras vs
    # slim comparison.
    excluded_layers = {'mixed_7a', 'conv_7b_ac'}
    layer_names = [l for l in _KERAS_LAYERS_TO_CHECK
                   if l not in excluded_layers]
    self._check_returns_correct_shape(
        image_height, image_width, expected_feature_map_shape,
        layer_names=layer_names, output_stride=8, align_feature_maps=True)

  def test_returns_correct_shapes_128_align_feature_maps_false(self):
    image_height = 128
    image_width = 128
    expected_feature_map_shape = (
        _SLIM_ENDPOINT_SHAPES_128_ALIGN_FEATURE_MAPS_FALSE)
    self._check_returns_correct_shape(
        image_height, image_width, expected_feature_map_shape,
        align_feature_maps=False)

  def test_hyperparam_override(self):
    model = inception_resnet_v2.inception_resnet_v2(
        batchnorm_training=True,
        default_batchnorm_momentum=0.2,
        default_batchnorm_epsilon=0.1,
        weights=None,
        include_top=False)
    bn_layer = model.get_layer(name='freezable_batch_norm')
    self.assertAllClose(bn_layer.momentum, 0.2)
    self.assertAllClose(bn_layer.epsilon, 0.1)

  def test_variable_count(self):
    variables = self._get_variables()
    # 896 is the number of variables in the slim inception resnet v2 model.
    self.assertEqual(len(variables), 896)


if __name__ == '__main__':
  tf.test.main()
@@ -22,6 +22,7 @@ from __future__ import print_function
 import tensorflow as tf
 from object_detection.core import freezable_batch_norm
+from object_detection.models.keras_models import model_utils
 def _fixed_padding(inputs, kernel_size, rate=1):  # pylint: disable=invalid-name
@@ -59,7 +60,8 @@ class _LayersOverride(object):
                conv_hyperparams=None,
                use_explicit_padding=False,
                alpha=1.0,
-               min_depth=None):
+               min_depth=None,
+               conv_defs=None):
     """Alternative tf.keras.layers interface, for use by the Keras MobileNetV1.
     It is used by the Keras applications kwargs injection API to
@@ -90,6 +92,8 @@ class _LayersOverride(object):
         modifies the number of filters in each convolutional layer. It's called
         depth multiplier in Keras application MobilenetV1.
       min_depth: Minimum number of filters in the convolutional layers.
+      conv_defs: Network layout to specify the mobilenet_v1 body. Default is
+        `None` to use the default mobilenet_v1 network layout.
     """
     self._alpha = alpha
     self._batchnorm_training = batchnorm_training
@@ -97,6 +101,7 @@ class _LayersOverride(object):
     self._conv_hyperparams = conv_hyperparams
     self._use_explicit_padding = use_explicit_padding
     self._min_depth = min_depth
+    self._conv_defs = conv_defs
     self.regularizer = tf.keras.regularizers.l2(0.00004 * 0.5)
     self.initializer = tf.truncated_normal_initializer(stddev=0.09)
@@ -122,6 +127,11 @@ class _LayersOverride(object):
       the input argument, or that will first pad the input then apply a Conv2D
       layer.
     """
+    layer_name = kwargs['name']
+    if self._conv_defs:
+      conv_filters = model_utils.get_conv_def(self._conv_defs, layer_name)
+      if conv_filters:
+        filters = conv_filters
     # Apply the width multiplier and the minimum depth to the convolution layers
     filters = int(filters * self._alpha)
     if self._min_depth and filters < self._min_depth:
@@ -163,7 +173,12 @@ class _LayersOverride(object):
     """
     if self._conv_hyperparams:
       kwargs = self._conv_hyperparams.params(**kwargs)
+      # Both the regularizer and the initializer also apply to the depthwise
+      # layer in MobilenetV1, so we remap kernel_* to depthwise_* here.
+      kwargs['depthwise_regularizer'] = kwargs['kernel_regularizer']
+      kwargs['depthwise_initializer'] = kwargs['kernel_initializer']
     else:
+      kwargs['depthwise_regularizer'] = self.regularizer
       kwargs['depthwise_initializer'] = self.initializer
     kwargs['padding'] = 'same'
@@ -278,6 +293,7 @@ def mobilenet_v1(batchnorm_training,
                  use_explicit_padding=False,
                  alpha=1.0,
                  min_depth=None,
+                 conv_defs=None,
                  **kwargs):
   """Instantiates the MobileNetV1 architecture, modified for object detection.
@@ -309,6 +325,8 @@ def mobilenet_v1(batchnorm_training,
     alpha: The width multiplier referenced in the MobileNetV1 paper. It
       modifies the number of filters in each convolutional layer.
     min_depth: Minimum number of filters in the convolutional layers.
+    conv_defs: Network layout to specify the mobilenet_v1 body. Default is
+      `None` to use the default mobilenet_v1 network layout.
     **kwargs: Keyword arguments forwarded directly to the
       `tf.keras.applications.Mobilenet` method that constructs the Keras
       model.
@@ -322,7 +340,8 @@ def mobilenet_v1(batchnorm_training,
       conv_hyperparams=conv_hyperparams,
       use_explicit_padding=use_explicit_padding,
       min_depth=min_depth,
-      alpha=alpha)
+      alpha=alpha,
+      conv_defs=conv_defs)
   return tf.keras.applications.MobileNet(
       alpha=alpha, layers=layers_override, **kwargs)
 # pylint: enable=invalid-name
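The diff above relies on model_utils.ConvDefs and model_utils.get_conv_def. A sketch of those helpers, with the behavior inferred from how they are used here (the real implementation lives in object_detection/models/keras_models/model_utils.py):

```python
import collections

# A ConvDefs entry overrides the filter count for one named conv layer.
ConvDefs = collections.namedtuple('ConvDefs', ['conv_name', 'filters'])


def get_conv_def(conv_defs, layer_name):
    """Return the filter override for layer_name, or None if absent."""
    for conv_def in conv_defs:
        if conv_def.conv_name == layer_name:
            return conv_def.filters
    return None


overrides = [ConvDefs(conv_name='conv_pw_12', filters=512),
             ConvDefs(conv_name='conv_pw_13', filters=256)]
print(get_conv_def(overrides, 'conv_pw_13'))  # 256
print(get_conv_def(overrides, 'conv_pw_1'))   # None
```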
@@ -33,6 +33,7 @@ from google.protobuf import text_format
 from object_detection.builders import hyperparams_builder
 from object_detection.models.keras_models import mobilenet_v1
+from object_detection.models.keras_models import model_utils
 from object_detection.models.keras_models import test_utils
 from object_detection.protos import hyperparams_pb2
 from object_detection.utils import test_case
@@ -88,7 +89,8 @@ class MobilenetV1Test(test_case.TestCase):
       conv_hyperparams=None,
       use_explicit_padding=False,
       alpha=1.0,
-      min_depth=None):
+      min_depth=None,
+      conv_defs=None):
     """Constructs Keras MobilenetV1 that extracts intermediate layer outputs."""
     if not layer_names:
       layer_names = _KERAS_LAYERS_TO_CHECK
@@ -99,6 +101,7 @@ class MobilenetV1Test(test_case.TestCase):
         use_explicit_padding=use_explicit_padding,
         alpha=alpha,
         min_depth=min_depth,
+        conv_defs=conv_defs,
         include_top=False)
     layer_outputs = [full_model.get_layer(name=layer).output
                      for layer in layer_names]
@@ -109,14 +112,15 @@ class MobilenetV1Test(test_case.TestCase):
   def _check_returns_correct_shape(
       self, image_height, image_width, depth_multiplier,
       expected_feature_map_shape, use_explicit_padding=False, min_depth=8,
-      layer_names=None):
+      layer_names=None, conv_defs=None):
     def graph_fn(image_tensor):
       model = self._create_application_with_layer_outputs(
           layer_names=layer_names,
           batchnorm_training=False,
           use_explicit_padding=use_explicit_padding,
           min_depth=min_depth,
-          alpha=depth_multiplier)
+          alpha=depth_multiplier,
+          conv_defs=conv_defs)
       return model(image_tensor)
     image_tensor = np.random.rand(_BATCH_SIZE, image_height, image_width,
@@ -211,6 +215,23 @@ class MobilenetV1Test(test_case.TestCase):
     self._check_returns_correct_shape(
         image_height, image_width, depth_multiplier, expected_feature_map_shape)
+  def test_returns_correct_shapes_with_conv_defs(self):
+    image_height = 299
+    image_width = 299
+    depth_multiplier = 1.0
+    conv_def_block_12 = model_utils.ConvDefs(
+        conv_name='conv_pw_12', filters=512)
+    conv_def_block_13 = model_utils.ConvDefs(
+        conv_name='conv_pw_13', filters=256)
+    conv_defs = [conv_def_block_12, conv_def_block_13]
+    expected_feature_map_shape = (
+        test_utils.moblenet_v1_expected_feature_map_shape_with_conv_defs)
+    self._check_returns_correct_shape(
+        image_height, image_width, depth_multiplier, expected_feature_map_shape,
+        conv_defs=conv_defs)
   def test_hyperparam_override(self):
     hyperparams = self._build_conv_hyperparams()
     model = mobilenet_v1.mobilenet_v1(
...
@@ -21,6 +21,7 @@ from __future__ import print_function
 import tensorflow as tf
 from object_detection.core import freezable_batch_norm
+from object_detection.models.keras_models import model_utils
 from object_detection.utils import ops
@@ -45,7 +46,8 @@ class _LayersOverride(object):
                conv_hyperparams=None,
                use_explicit_padding=False,
                alpha=1.0,
-               min_depth=None):
+               min_depth=None,
+               conv_defs=None):
     """Alternative tf.keras.layers interface, for use by the Keras MobileNetV2.
     It is used by the Keras applications kwargs injection API to
@@ -75,6 +77,8 @@ class _LayersOverride(object):
       alpha: The width multiplier referenced in the MobileNetV2 paper. It
         modifies the number of filters in each convolutional layer.
       min_depth: Minimum number of filters in the convolutional layers.
+      conv_defs: Network layout to specify the mobilenet_v2 body. Default is
+        `None` to use the default mobilenet_v2 network layout.
     """
     self._alpha = alpha
     self._batchnorm_training = batchnorm_training
@@ -82,6 +86,7 @@ class _LayersOverride(object):
     self._conv_hyperparams = conv_hyperparams
     self._use_explicit_padding = use_explicit_padding
     self._min_depth = min_depth
+    self._conv_defs = conv_defs
     self.regularizer = tf.keras.regularizers.l2(0.00004 * 0.5)
     self.initializer = tf.truncated_normal_initializer(stddev=0.09)
@@ -106,8 +111,14 @@ class _LayersOverride(object):
     """
     # Make sure 'alpha' is always applied to the last convolution block's size
     # (This overrides the Keras application's functionality)
-    if kwargs.get('name') == 'Conv_1' and self._alpha < 1.0:
-      filters = _make_divisible(1280 * self._alpha, 8)
+    layer_name = kwargs.get('name')
+    if layer_name == 'Conv_1':
+      if self._conv_defs:
+        filters = model_utils.get_conv_def(self._conv_defs, 'Conv_1')
+      else:
+        filters = 1280
+      if self._alpha < 1.0:
+        filters = _make_divisible(filters * self._alpha, 8)
     # Apply the minimum depth to the convolution layers
     if (self._min_depth and (filters < self._min_depth)
@@ -263,6 +274,7 @@ def mobilenet_v2(batchnorm_training,
                  use_explicit_padding=False,
                  alpha=1.0,
                  min_depth=None,
+                 conv_defs=None,
                  **kwargs):
   """Instantiates the MobileNetV2 architecture, modified for object detection.
@@ -294,6 +306,8 @@ def mobilenet_v2(batchnorm_training,
     alpha: The width multiplier referenced in the MobileNetV2 paper. It
       modifies the number of filters in each convolutional layer.
     min_depth: Minimum number of filters in the convolutional layers.
+    conv_defs: Network layout to specify the mobilenet_v2 body. Default is
+      `None` to use the default mobilenet_v2 network layout.
     **kwargs: Keyword arguments forwarded directly to the
       `tf.keras.applications.MobilenetV2` method that constructs the Keras
       model.
@@ -307,7 +321,8 @@ def mobilenet_v2(batchnorm_training,
       conv_hyperparams=conv_hyperparams,
       use_explicit_padding=use_explicit_padding,
       min_depth=min_depth,
-      alpha=alpha)
+      alpha=alpha,
+      conv_defs=conv_defs)
   return tf.keras.applications.MobileNetV2(alpha=alpha,
                                            layers=layers_override,
                                            **kwargs)
...
...@@ -22,6 +22,7 @@ from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.models.keras_models import mobilenet_v2
from object_detection.models.keras_models import model_utils
from object_detection.models.keras_models import test_utils
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
...@@ -77,7 +78,8 @@ class MobilenetV2Test(test_case.TestCase):
      conv_hyperparams=None,
      use_explicit_padding=False,
      alpha=1.0,
      min_depth=None,
      conv_defs=None):
    """Constructs Keras mobilenetv2 that extracts intermediate layer outputs."""
    if not layer_names:
      layer_names = _layers_to_check
...@@ -88,7 +90,8 @@ class MobilenetV2Test(test_case.TestCase):
        use_explicit_padding=use_explicit_padding,
        alpha=alpha,
        min_depth=min_depth,
        include_top=False,
        conv_defs=conv_defs)
    layer_outputs = [full_model.get_layer(name=layer).output
                     for layer in layer_names]
    return tf.keras.Model(
...@@ -98,13 +101,15 @@ class MobilenetV2Test(test_case.TestCase):
  def _check_returns_correct_shape(
      self, batch_size, image_height, image_width, depth_multiplier,
      expected_feature_map_shapes, use_explicit_padding=False, min_depth=None,
      layer_names=None, conv_defs=None):
    def graph_fn(image_tensor):
      model = self._create_application_with_layer_outputs(
          layer_names=layer_names,
          batchnorm_training=False,
          use_explicit_padding=use_explicit_padding,
          min_depth=min_depth,
          alpha=depth_multiplier,
          conv_defs=conv_defs)
      return model(image_tensor)
    image_tensor = np.random.rand(batch_size, image_height, image_width,
...@@ -202,6 +207,21 @@ class MobilenetV2Test(test_case.TestCase):
        2, image_height, image_width, depth_multiplier,
        expected_feature_map_shape, min_depth=32)

  def test_returns_correct_shapes_with_conv_defs(self):
    image_height = 299
    image_width = 299
    depth_multiplier = 1.0
    conv_1 = model_utils.ConvDefs(conv_name='Conv_1', filters=256)
    conv_defs = [conv_1]
    expected_feature_map_shape = (
        test_utils.moblenet_v2_expected_feature_map_shape_with_conv_defs)
    self._check_returns_correct_shape(
        2, image_height, image_width, depth_multiplier,
        expected_feature_map_shape, conv_defs=conv_defs)

  def test_hyperparam_override(self):
    hyperparams = self._build_conv_hyperparams()
    model = mobilenet_v2.mobilenet_v2(
...
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Utils for Keras models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
# ConvDefs specifies a custom config for part of the model structure. For
# example, ConvDefs(conv_name='conv_pw_12', filters=512) for MobileNet V1
# sets the number of filters of the conv layer named 'conv_pw_12' to 512.
ConvDefs = collections.namedtuple('ConvDefs', ['conv_name', 'filters'])
def get_conv_def(conv_defs, layer_name):
  """Gets the custom config for a layer of the model structure.

  Args:
    conv_defs: A list of `ConvDefs` named tuples specifying the custom config
      of the model network. See `ConvDefs` for details.
    layer_name: A string, the name of the layer to be customized.

  Returns:
    The number of filters for the layer, or `None` if there is no custom
    config for the requested layer.
  """
  for conv_def in conv_defs:
    if layer_name == conv_def.conv_name:
      return conv_def.filters
  return None
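A minimal usage sketch of `ConvDefs` and `get_conv_def`; the definitions are repeated here so the snippet runs standalone, and the layer name `'Conv_1'` (the final 1x1 conv of MobileNet V2) is taken from the test above:

```python
import collections

# Mirror of the module's definitions so the sketch is self-contained.
ConvDefs = collections.namedtuple('ConvDefs', ['conv_name', 'filters'])

def get_conv_def(conv_defs, layer_name):
  """Returns the custom filter count for layer_name, or None."""
  for conv_def in conv_defs:
    if layer_name == conv_def.conv_name:
      return conv_def.filters
  return None

# Override the final 1x1 conv of MobileNet V2 to 256 filters.
conv_defs = [ConvDefs(conv_name='Conv_1', filters=256)]
print(get_conv_def(conv_defs, 'Conv_1'))  # 256
print(get_conv_def(conv_defs, 'Conv1'))   # None (no custom config)
```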
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""MobileNet v2 models for Keras.
MobileNetV2 is a general architecture and can be used for multiple use cases.
Depending on the use case, it can use different input layer size and
different width factors. This allows different width models to reduce
the number of multiply-adds and thereby
reduce inference cost on mobile devices.
MobileNetV2 is very similar to the original MobileNet,
except that it uses inverted residual blocks with
bottlenecking features. It has a drastically lower
parameter count than the original MobileNet.
MobileNets support any input size greater
than 32 x 32, with larger image sizes
offering better performance.
The number of parameters and number of multiply-adds
can be modified by using the `alpha` parameter,
which increases/decreases the number of filters in each layer.
By altering the image size and `alpha` parameter,
all 22 models from the paper can be built, with ImageNet weights provided.
The paper demonstrates the performance of MobileNets using `alpha` values of
0.35, 0.5, 0.75, 1.0 (also called 100% MobileNet), 1.3, and 1.4.
For each of these `alpha` values, weights for 5 different input image sizes
are provided (224, 192, 160, 128, and 96).
The following table describes the performance of
MobileNet on various input sizes:
------------------------------------------------------------------------
MACs stands for Multiply Adds

| Classification Checkpoint | MACs (M) | Parameters (M) | Top 1 Accuracy | Top 5 Accuracy |
|---------------------------|----------|----------------|----------------|----------------|
| [mobilenet_v2_1.4_224] | 582 | 6.06 | 75.0 | 92.5 |
| [mobilenet_v2_1.3_224] | 509 | 5.34 | 74.4 | 92.1 |
| [mobilenet_v2_1.0_224] | 300 | 3.47 | 71.8 | 91.0 |
| [mobilenet_v2_1.0_192] | 221 | 3.47 | 70.7 | 90.1 |
| [mobilenet_v2_1.0_160] | 154 | 3.47 | 68.8 | 89.0 |
| [mobilenet_v2_1.0_128] | 99 | 3.47 | 65.3 | 86.9 |
| [mobilenet_v2_1.0_96] | 56 | 3.47 | 60.3 | 83.2 |
| [mobilenet_v2_0.75_224] | 209 | 2.61 | 69.8 | 89.6 |
| [mobilenet_v2_0.75_192] | 153 | 2.61 | 68.7 | 88.9 |
| [mobilenet_v2_0.75_160] | 107 | 2.61 | 66.4 | 87.3 |
| [mobilenet_v2_0.75_128] | 69 | 2.61 | 63.2 | 85.3 |
| [mobilenet_v2_0.75_96] | 39 | 2.61 | 58.8 | 81.6 |
| [mobilenet_v2_0.5_224] | 97 | 1.95 | 65.4 | 86.4 |
| [mobilenet_v2_0.5_192] | 71 | 1.95 | 63.9 | 85.4 |
| [mobilenet_v2_0.5_160] | 50 | 1.95 | 61.0 | 83.2 |
| [mobilenet_v2_0.5_128] | 32 | 1.95 | 57.7 | 80.8 |
| [mobilenet_v2_0.5_96] | 18 | 1.95 | 51.2 | 75.8 |
| [mobilenet_v2_0.35_224] | 59 | 1.66 | 60.3 | 82.9 |
| [mobilenet_v2_0.35_192] | 43 | 1.66 | 58.2 | 81.2 |
| [mobilenet_v2_0.35_160] | 30 | 1.66 | 55.7 | 79.1 |
| [mobilenet_v2_0.35_128] | 20 | 1.66 | 50.8 | 75.0 |
| [mobilenet_v2_0.35_96] | 11 | 1.66 | 45.5 | 70.4 |
The weights for all 16 models are obtained and translated from the
TensorFlow checkpoints found at
https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/README.md
# Reference
This file contains building code for MobileNetV2, based on
[MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381)
Tests comparing this model to the existing Tensorflow model can be
found at [mobilenet_v2_keras](https://github.com/JonathanCMitchell/mobilenet_v2_keras)
"""
from __future__ import print_function
from __future__ import absolute_import
from __future__ import division
import os
import warnings
import h5py
import numpy as np
from ..models import Model
from ..layers import Input
from ..layers import Activation
from ..layers import Dropout
from ..layers import Reshape
from ..layers import BatchNormalization
from ..layers import Conv2D
from ..layers import DepthwiseConv2D
from ..layers import GlobalAveragePooling2D
from ..layers import Add
from ..layers import Flatten
from ..layers import Dense
from .. import initializers
from .. import regularizers
from .. import constraints
from ..utils import conv_utils
from ..utils.data_utils import get_file
from ..engine import get_source_inputs
from ..engine.base_layer import InputSpec
from . import imagenet_utils
from .imagenet_utils import _obtain_input_shape
from .imagenet_utils import decode_predictions
from .. import backend as K
# TODO Change path to v1.1
BASE_WEIGHT_PATH = 'https://github.com/JonathanCMitchell/mobilenet_v2_keras/releases/download/v1.1/'
def relu6(x):
return K.relu(x, max_value=6)
def preprocess_input(x):
"""Preprocesses a numpy array encoding a batch of images.
This function applies the "Inception" preprocessing which converts
the RGB values from [0, 255] to [-1, 1]. Note that this preprocessing
function is different from `imagenet_utils.preprocess_input()`.
# Arguments
x: a 4D numpy array consists of RGB values within [0, 255].
# Returns
Preprocessed array.
"""
x /= 128.
x -= 1.
return x.astype(np.float32)
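The mapping can be checked directly; this is a standalone copy of the function above applied to one RGB pixel:

```python
import numpy as np

def preprocess_input(x):
  # "Inception" preprocessing: RGB [0, 255] -> [-1, 1).
  x /= 128.
  x -= 1.
  return x.astype(np.float32)

batch = np.array([[[[0., 128., 255.]]]])  # a 1x1 image with one RGB pixel
out = preprocess_input(batch)
# 0 -> -1.0, 128 -> 0.0, 255 -> ~0.992
```

Note that the division and subtraction mutate the input array in place when it is already a float array, so callers should pass a copy if the original pixel values are still needed.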
# This function is taken from the original tf repo.
# It ensures that all layers have a channel number that is divisible by 8
# It can be seen here:
# https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
def _make_divisible(v, divisor, min_value=None):
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
# Make sure that round down does not go down by more than 10%.
if new_v < 0.9 * v:
new_v += divisor
return new_v
def MobileNetV2(input_shape=None,
alpha=1.0,
depth_multiplier=1,
dropout=1e-3,
include_top=True,
weights='imagenet',
input_tensor=None,
classes=1000):
"""Instantiates the MobileNetV2 architecture.
To load a MobileNetV2 model via `load_model`, import the custom
object `relu6` and pass it to the `custom_objects` parameter.
E.g.
model = load_model('mobilenet.h5', custom_objects={
'relu6': mobilenet.relu6})
# Arguments
input_shape: optional shape tuple, to be specified if you would
like to use a model with an input img resolution that is not
(224, 224, 3).
It should have exactly 3 input channels, as in (224, 224, 3).
You can also omit this option if you would like
to infer input_shape from an input_tensor.
If you include both input_tensor and input_shape, input_shape
will be used if they match; if the shapes do not match, an
error will be raised.
E.g. `(160, 160, 3)` would be one valid value.
alpha: controls the width of the network. This is known as the
width multiplier in the MobileNetV2 paper.
- If `alpha` < 1.0, proportionally decreases the number
of filters in each layer.
- If `alpha` > 1.0, proportionally increases the number
of filters in each layer.
- If `alpha` = 1, default number of filters from the paper
are used at each layer.
depth_multiplier: depth multiplier for depthwise convolution
(also called the resolution multiplier)
dropout: dropout rate, dropout is currently not in use
include_top: whether to include the fully-connected
layer at the top of the network.
weights: one of `None` (random initialization),
'imagenet' (pre-training on ImageNet),
or the path to the weights file to be loaded.
input_tensor: optional Keras tensor (i.e. output of
`layers.Input()`)
to use as image input for the model.
classes: optional number of classes to classify images
into, only to be specified if `include_top` is True, and
if no `weights` argument is specified.
# Returns
A Keras model instance.
# Raises
ValueError: in case of invalid argument for `weights`,
or invalid input shape or invalid depth_multiplier, alpha,
rows when weights='imagenet'
"""
if not (weights in {'imagenet', None} or os.path.exists(weights)):
raise ValueError('The `weights` argument should be either '
'`None` (random initialization), `imagenet` '
'(pre-training on ImageNet), '
'or the path to the weights file to be loaded.')
if weights == 'imagenet' and include_top and classes != 1000:
raise ValueError('If using `weights` as ImageNet with `include_top` '
'as true, `classes` should be 1000')
# Determine proper input shape and default size.
# If both input_shape and input_tensor are used, they should match
if input_shape is not None and input_tensor is not None:
try:
is_input_t_tensor = K.is_keras_tensor(input_tensor)
except ValueError:
try:
is_input_t_tensor = K.is_keras_tensor(
get_source_inputs(input_tensor))
except ValueError:
raise ValueError('input_tensor: ', input_tensor,
'is not type input_tensor')
if is_input_t_tensor:
if K.image_data_format() == 'channels_first':
if input_tensor._keras_shape[1] != input_shape[1]:
raise ValueError('input_shape: ', input_shape,
'and input_tensor: ', input_tensor,
'do not meet the same shape requirements')
else:
if input_tensor._keras_shape[2] != input_shape[1]:
raise ValueError('input_shape: ', input_shape,
'and input_tensor: ', input_tensor,
'do not meet the same shape requirements')
else:
raise ValueError('input_tensor specified: ', input_tensor,
'is not a keras tensor')
# If input_shape is None, infer shape from input_tensor
if input_shape is None and input_tensor is not None:
try:
K.is_keras_tensor(input_tensor)
except ValueError:
raise ValueError('input_tensor: ', input_tensor,
'is type: ', type(input_tensor),
'which is not a valid type')
if input_shape is None and not K.is_keras_tensor(input_tensor):
default_size = 224
elif input_shape is None and K.is_keras_tensor(input_tensor):
if K.image_data_format() == 'channels_first':
rows = input_tensor._keras_shape[2]
cols = input_tensor._keras_shape[3]
else:
rows = input_tensor._keras_shape[1]
cols = input_tensor._keras_shape[2]
if rows == cols and rows in [96, 128, 160, 192, 224]:
default_size = rows
else:
default_size = 224
# If input_shape is None and no input_tensor
elif input_shape is None:
default_size = 224
# If input_shape is not None, assume default size
else:
if K.image_data_format() == 'channels_first':
rows = input_shape[1]
cols = input_shape[2]
else:
rows = input_shape[0]
cols = input_shape[1]
if rows == cols and rows in [96, 128, 160, 192, 224]:
default_size = rows
else:
default_size = 224
input_shape = _obtain_input_shape(input_shape,
default_size=default_size,
min_size=32,
data_format=K.image_data_format(),
require_flatten=include_top,
weights=weights)
if K.image_data_format() == 'channels_last':
row_axis, col_axis = (0, 1)
else:
row_axis, col_axis = (1, 2)
rows = input_shape[row_axis]
cols = input_shape[col_axis]
if weights == 'imagenet':
if depth_multiplier != 1:
raise ValueError('If imagenet weights are being loaded, '
'depth multiplier must be 1')
if alpha not in [0.35, 0.50, 0.75, 1.0, 1.3, 1.4]:
raise ValueError('If imagenet weights are being loaded, '
'alpha can be one of `0.35`, `0.50`, `0.75`, '
'`1.0`, `1.3` or `1.4` only.')
if rows != cols or rows not in [96, 128, 160, 192, 224]:
if rows is None:
rows = 224
warnings.warn('MobileNet shape is undefined.'
' Weights for input shape'
'(224, 224) will be loaded.')
else:
raise ValueError('If imagenet weights are being loaded, '
'input must have a static square shape'
'(one of (96, 96), (128, 128), (160, 160),'
'(192, 192), or (224, 224)).'
'Input shape provided = %s' % (input_shape,))
if K.image_data_format() != 'channels_last':
warnings.warn('The MobileNet family of models is only available '
'for the input data format "channels_last" '
'(width, height, channels). '
'However your settings specify the default '
'data format "channels_first" (channels, width, height).'
' You should set `image_data_format="channels_last"` '
'in your Keras config located at ~/.keras/keras.json. '
'The model being returned right now will expect inputs '
'to follow the "channels_last" data format.')
K.set_image_data_format('channels_last')
old_data_format = 'channels_first'
else:
old_data_format = None
if input_tensor is None:
img_input = Input(shape=input_shape)
else:
if not K.is_keras_tensor(input_tensor):
img_input = Input(tensor=input_tensor, shape=input_shape)
else:
img_input = input_tensor
first_block_filters = _make_divisible(32 * alpha, 8)
x = Conv2D(first_block_filters,
kernel_size=3,
strides=(2, 2), padding='same',
use_bias=False, name='Conv1')(img_input)
x = BatchNormalization(epsilon=1e-3, momentum=0.999, name='bn_Conv1')(x)
x = Activation(relu6, name='Conv1_relu')(x)
x = _first_inverted_res_block(x,
filters=16,
alpha=alpha,
stride=1,
expansion=1,
block_id=0)
x = _inverted_res_block(x, filters=24, alpha=alpha, stride=2,
expansion=6, block_id=1)
x = _inverted_res_block(x, filters=24, alpha=alpha, stride=1,
expansion=6, block_id=2)
x = _inverted_res_block(x, filters=32, alpha=alpha, stride=2,
expansion=6, block_id=3)
x = _inverted_res_block(x, filters=32, alpha=alpha, stride=1,
expansion=6, block_id=4)
x = _inverted_res_block(x, filters=32, alpha=alpha, stride=1,
expansion=6, block_id=5)
x = _inverted_res_block(x, filters=64, alpha=alpha, stride=2,
expansion=6, block_id=6)
x = _inverted_res_block(x, filters=64, alpha=alpha, stride=1,
expansion=6, block_id=7)
x = _inverted_res_block(x, filters=64, alpha=alpha, stride=1,
expansion=6, block_id=8)
x = _inverted_res_block(x, filters=64, alpha=alpha, stride=1,
expansion=6, block_id=9)
x = _inverted_res_block(x, filters=96, alpha=alpha, stride=1,
expansion=6, block_id=10)
x = _inverted_res_block(x, filters=96, alpha=alpha, stride=1,
expansion=6, block_id=11)
x = _inverted_res_block(x, filters=96, alpha=alpha, stride=1,
expansion=6, block_id=12)
x = _inverted_res_block(x, filters=160, alpha=alpha, stride=2,
expansion=6, block_id=13)
x = _inverted_res_block(x, filters=160, alpha=alpha, stride=1,
expansion=6, block_id=14)
x = _inverted_res_block(x, filters=160, alpha=alpha, stride=1,
expansion=6, block_id=15)
x = _inverted_res_block(x, filters=320, alpha=alpha, stride=1,
expansion=6, block_id=16)
# no alpha applied to last conv as stated in the paper:
# if the width multiplier is greater than 1 we
# increase the number of output channels
if alpha > 1.0:
last_block_filters = _make_divisible(1280 * alpha, 8)
else:
last_block_filters = 1280
x = Conv2D(last_block_filters,
kernel_size=1,
use_bias=False,
name='Conv_1')(x)
x = BatchNormalization(epsilon=1e-3, momentum=0.999, name='Conv_1_bn')(x)
x = Activation(relu6, name='out_relu')(x)
if include_top:
x = GlobalAveragePooling2D()(x)
x = Dense(classes, activation='softmax',
use_bias=True, name='Logits')(x)
# Ensure that the model takes into account
# any potential predecessors of `input_tensor`.
if input_tensor is not None:
inputs = get_source_inputs(input_tensor)
else:
inputs = img_input
# Create model.
model = Model(inputs, x, name='mobilenetv2_%0.2f_%s' % (alpha, rows))
# load weights
if weights == 'imagenet':
if K.image_data_format() == 'channels_first':
raise ValueError('Weights for "channels_first" format '
'are not available.')
if include_top:
model_name = 'mobilenet_v2_weights_tf_dim_ordering_tf_kernels_' + \
str(alpha) + '_' + str(rows) + '.h5'
weight_path = BASE_WEIGHT_PATH + model_name
weights_path = get_file(model_name, weight_path,
cache_subdir='models')
else:
model_name = 'mobilenet_v2_weights_tf_dim_ordering_tf_kernels_' + \
str(alpha) + '_' + str(rows) + '_no_top' + '.h5'
weight_path = BASE_WEIGHT_PATH + model_name
weights_path = get_file(model_name, weight_path,
cache_subdir='models')
model.load_weights(weights_path)
elif weights is not None:
model.load_weights(weights)
if old_data_format:
K.set_image_data_format(old_data_format)
return model
def _inverted_res_block(inputs, expansion, stride, alpha, filters, block_id):
in_channels = inputs._keras_shape[-1]
prefix = 'features.' + str(block_id) + '.conv.'
pointwise_conv_filters = int(filters * alpha)
pointwise_filters = _make_divisible(pointwise_conv_filters, 8)
# Expand
x = Conv2D(expansion * in_channels, kernel_size=1, padding='same',
use_bias=False, activation=None,
name='mobl%d_conv_expand' % block_id)(inputs)
x = BatchNormalization(epsilon=1e-3, momentum=0.999,
name='bn%d_conv_bn_expand' %
block_id)(x)
x = Activation(relu6, name='conv_%d_relu' % block_id)(x)
# Depthwise
x = DepthwiseConv2D(kernel_size=3, strides=stride, activation=None,
use_bias=False, padding='same',
name='mobl%d_conv_depthwise' % block_id)(x)
x = BatchNormalization(epsilon=1e-3, momentum=0.999,
name='bn%d_conv_depthwise' % block_id)(x)
x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x)
# Project
x = Conv2D(pointwise_filters,
kernel_size=1, padding='same', use_bias=False, activation=None,
name='mobl%d_conv_project' % block_id)(x)
x = BatchNormalization(epsilon=1e-3, momentum=0.999,
name='bn%d_conv_bn_project' % block_id)(x)
if in_channels == pointwise_filters and stride == 1:
return Add(name='res_connect_' + str(block_id))([inputs, x])
return x
def _first_inverted_res_block(inputs,
expansion, stride,
alpha, filters, block_id):
in_channels = inputs._keras_shape[-1]
prefix = 'features.' + str(block_id) + '.conv.'
pointwise_conv_filters = int(filters * alpha)
pointwise_filters = _make_divisible(pointwise_conv_filters, 8)
# Depthwise
x = DepthwiseConv2D(kernel_size=3,
strides=stride, activation=None,
use_bias=False, padding='same',
name='mobl%d_conv_depthwise' %
block_id)(inputs)
x = BatchNormalization(epsilon=1e-3, momentum=0.999,
name='bn%d_conv_depthwise' %
block_id)(x)
x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x)
# Project
x = Conv2D(pointwise_filters,
kernel_size=1,
padding='same',
use_bias=False,
activation=None,
name='mobl%d_conv_project' %
block_id)(x)
x = BatchNormalization(epsilon=1e-3, momentum=0.999,
name='bn%d_conv_project' %
block_id)(x)
if in_channels == pointwise_filters and stride == 1:
return Add(name='res_connect_' + str(block_id))([inputs, x])
return x
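As a sanity check on the block wiring above: in both block builders the residual `Add` is only inserted when the stride is 1 and the projected channel count equals the input channel count. A small standalone sketch (reusing a copy of `_make_divisible` from earlier in the file; `has_residual` is an illustrative helper, not part of the module):

```python
def _make_divisible(v, divisor, min_value=None):
  # Copy of the helper defined earlier in this file.
  if min_value is None:
    min_value = divisor
  new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
  if new_v < 0.9 * v:
    new_v += divisor
  return new_v

def has_residual(in_channels, filters, alpha, stride):
  """True when an inverted residual block keeps its skip connection."""
  pointwise_filters = _make_divisible(int(filters * alpha), 8)
  return stride == 1 and in_channels == pointwise_filters

print(has_residual(24, 24, alpha=1.0, stride=1))  # True: e.g. block 2 (24 -> 24)
print(has_residual(24, 32, alpha=1.0, stride=2))  # False: block 3 downsamples
```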
...@@ -106,6 +106,16 @@ moblenet_v1_expected_feature_map_shape_enforcing_min_depth = [
    (2, 10, 10, 8), (2, 10, 10, 8), (2, 10, 10, 8),
]
moblenet_v1_expected_feature_map_shape_with_conv_defs = [
(2, 150, 150, 32), (2, 150, 150, 32), (2, 150, 150, 64), (2, 75, 75, 64),
(2, 75, 75, 128), (2, 75, 75, 128), (2, 75, 75, 128), (2, 38, 38, 128),
(2, 38, 38, 256), (2, 38, 38, 256), (2, 38, 38, 256), (2, 19, 19, 256),
(2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512),
(2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512),
(2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512), (2, 10, 10, 512),
(2, 10, 10, 512), (2, 10, 10, 512), (2, 10, 10, 256),
]
# For Mobilenet V2
moblenet_v2_expected_feature_map_shape_128 = [
    (2, 64, 64, 32), (2, 64, 64, 96), (2, 32, 32, 96), (2, 32, 32, 24),
...@@ -187,3 +197,18 @@ moblenet_v2_expected_feature_map_shape_enforcing_min_depth = [
    (2, 10, 10, 32), (2, 10, 10, 32)
]
moblenet_v2_expected_feature_map_shape_with_conv_defs = [
(2, 150, 150, 32), (2, 150, 150, 96), (2, 75, 75, 96), (2, 75, 75, 24),
(2, 75, 75, 144), (2, 75, 75, 144), (2, 75, 75, 24), (2, 75, 75, 144),
(2, 38, 38, 144), (2, 38, 38, 32), (2, 38, 38, 192), (2, 38, 38, 192),
(2, 38, 38, 32), (2, 38, 38, 192), (2, 38, 38, 192), (2, 38, 38, 32),
(2, 38, 38, 192), (2, 19, 19, 192), (2, 19, 19, 64), (2, 19, 19, 384),
(2, 19, 19, 384), (2, 19, 19, 64), (2, 19, 19, 384), (2, 19, 19, 384),
(2, 19, 19, 64), (2, 19, 19, 384), (2, 19, 19, 384), (2, 19, 19, 64),
(2, 19, 19, 384), (2, 19, 19, 384), (2, 19, 19, 96), (2, 19, 19, 576),
(2, 19, 19, 576), (2, 19, 19, 96), (2, 19, 19, 576), (2, 19, 19, 576),
(2, 19, 19, 96), (2, 19, 19, 576), (2, 10, 10, 576), (2, 10, 10, 160),
(2, 10, 10, 960), (2, 10, 10, 960), (2, 10, 10, 160), (2, 10, 10, 960),
(2, 10, 10, 960), (2, 10, 10, 160), (2, 10, 10, 960), (2, 10, 10, 960),
(2, 10, 10, 320), (2, 10, 10, 256)
]
...@@ -13,21 +13,32 @@
# limitations under the License.
# ==============================================================================
"""Tests for ssd_mobilenet_v1_fpn_feature_extractor.

Using the parameterized test decorator, this test covers both the Slim-based
and the Keras-based MobileNet V1 FPN feature extractors in SSD.
"""
from absl.testing import parameterized
import numpy as np
import tensorflow as tf

from object_detection.models import ssd_feature_extractor_test
from object_detection.models import ssd_mobilenet_v1_fpn_feature_extractor
from object_detection.models import ssd_mobilenet_v1_fpn_keras_feature_extractor

slim = tf.contrib.slim
@parameterized.parameters(
    {'use_keras': False},
    {'use_keras': True},
)
class SsdMobilenetV1FpnFeatureExtractorTest(
    ssd_feature_extractor_test.SsdFeatureExtractorTestBase):

  def _create_feature_extractor(self, depth_multiplier, pad_to_multiple,
                                is_training=True, use_explicit_padding=False,
                                use_keras=False):
    """Constructs a new feature extractor.

    Args:
...@@ -38,10 +49,27 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
      use_explicit_padding: Use 'VALID' padding for convolutions, but prepad
        inputs so that the output dimensions are the same as if 'SAME' padding
        were used.
      use_keras: If True, builds a Keras-based feature extractor; if False,
        builds a Slim-based one.

    Returns:
      an ssd_meta_arch.SSDFeatureExtractor object.
    """
    min_depth = 32
    if use_keras:
      return (ssd_mobilenet_v1_fpn_keras_feature_extractor.
              SSDMobileNetV1FpnKerasFeatureExtractor(
                  is_training=is_training,
                  depth_multiplier=depth_multiplier,
                  min_depth=min_depth,
                  pad_to_multiple=pad_to_multiple,
                  conv_hyperparams=self._build_conv_hyperparams(
                      add_batch_norm=False),
                  freeze_batchnorm=False,
                  inplace_batchnorm_update=False,
                  use_explicit_padding=use_explicit_padding,
                  use_depthwise=True,
                  name='MobilenetV1_FPN'))
    else:
      return (ssd_mobilenet_v1_fpn_feature_extractor.
              SSDMobileNetV1FpnFeatureExtractor(
                  is_training,
...@@ -49,9 +77,10 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
                  min_depth,
                  pad_to_multiple,
                  self.conv_hyperparams_fn,
                  use_depthwise=True,
                  use_explicit_padding=use_explicit_padding))

  def test_extract_features_returns_correct_shapes_256(self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 1.0
...@@ -61,12 +90,14 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
        (2, 2, 2, 256)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_returns_correct_shapes_384(self, use_keras):
    image_height = 320
    image_width = 320
    depth_multiplier = 1.0
...@@ -76,12 +107,14 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
        (2, 3, 3, 256)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_with_dynamic_image_shape(self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 1.0
...@@ -91,12 +124,15 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
        (2, 2, 2, 256)]
    self.check_extract_features_returns_correct_shapes_with_dynamic_inputs(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shapes_with_dynamic_inputs(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_returns_correct_shapes_with_pad_to_multiple(
      self, use_keras):
    image_height = 299
    image_width = 299
    depth_multiplier = 1.0
...@@ -106,12 +142,15 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
        (2, 3, 3, 256)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_returns_correct_shapes_enforcing_min_depth(
self, use_keras):
image_height = 256 image_height = 256
image_width = 256 image_width = 256
depth_multiplier = 0.5**12 depth_multiplier = 0.5**12
...@@ -121,38 +160,50 @@ class SsdMobilenetV1FpnFeatureExtractorTest( ...@@ -121,38 +160,50 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
(2, 2, 2, 32)] (2, 2, 2, 32)]
self.check_extract_features_returns_correct_shape( self.check_extract_features_returns_correct_shape(
2, image_height, image_width, depth_multiplier, pad_to_multiple, 2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape, use_explicit_padding=False) expected_feature_map_shape, use_explicit_padding=False,
use_keras=use_keras)
self.check_extract_features_returns_correct_shape( self.check_extract_features_returns_correct_shape(
2, image_height, image_width, depth_multiplier, pad_to_multiple, 2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape, use_explicit_padding=True) expected_feature_map_shape, use_explicit_padding=True,
use_keras=use_keras)
def test_extract_features_raises_error_with_invalid_image_size(self): def test_extract_features_raises_error_with_invalid_image_size(
self, use_keras):
image_height = 32 image_height = 32
image_width = 32 image_width = 32
depth_multiplier = 1.0 depth_multiplier = 1.0
pad_to_multiple = 1 pad_to_multiple = 1
self.check_extract_features_raises_error_with_invalid_image_size( self.check_extract_features_raises_error_with_invalid_image_size(
image_height, image_width, depth_multiplier, pad_to_multiple) image_height, image_width, depth_multiplier, pad_to_multiple,
use_keras=use_keras)
def test_preprocess_returns_correct_value_range(self): def test_preprocess_returns_correct_value_range(self, use_keras):
image_height = 256 image_height = 256
image_width = 256 image_width = 256
depth_multiplier = 1 depth_multiplier = 1
pad_to_multiple = 1 pad_to_multiple = 1
test_image = np.random.rand(2, image_height, image_width, 3) test_image = np.random.rand(2, image_height, image_width, 3)
feature_extractor = self._create_feature_extractor(depth_multiplier, feature_extractor = self._create_feature_extractor(depth_multiplier,
pad_to_multiple) pad_to_multiple,
use_keras=use_keras)
preprocessed_image = feature_extractor.preprocess(test_image) preprocessed_image = feature_extractor.preprocess(test_image)
self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0))) self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0)))
def test_variables_only_created_in_scope(self): def test_variables_only_created_in_scope(self, use_keras):
depth_multiplier = 1 depth_multiplier = 1
pad_to_multiple = 1 pad_to_multiple = 1
scope_name = 'MobilenetV1' scope_name = 'MobilenetV1'
self.check_feature_extractor_variables_under_scope( self.check_feature_extractor_variables_under_scope(
depth_multiplier, pad_to_multiple, scope_name) depth_multiplier, pad_to_multiple, scope_name, use_keras=use_keras)
def test_fused_batchnorm(self): def test_variable_count(self, use_keras):
depth_multiplier = 1
pad_to_multiple = 1
variables = self.get_feature_extractor_variables(
depth_multiplier, pad_to_multiple, use_keras=use_keras)
self.assertEqual(len(variables), 153)
def test_fused_batchnorm(self, use_keras):
image_height = 256 image_height = 256
image_width = 256 image_width = 256
depth_multiplier = 1 depth_multiplier = 1
...@@ -160,9 +211,14 @@ class SsdMobilenetV1FpnFeatureExtractorTest( ...@@ -160,9 +211,14 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
image_placeholder = tf.placeholder(tf.float32, image_placeholder = tf.placeholder(tf.float32,
[1, image_height, image_width, 3]) [1, image_height, image_width, 3])
feature_extractor = self._create_feature_extractor(depth_multiplier, feature_extractor = self._create_feature_extractor(depth_multiplier,
pad_to_multiple) pad_to_multiple,
use_keras=use_keras)
preprocessed_image = feature_extractor.preprocess(image_placeholder) preprocessed_image = feature_extractor.preprocess(image_placeholder)
if use_keras:
_ = feature_extractor(preprocessed_image)
else:
_ = feature_extractor.extract_features(preprocessed_image) _ = feature_extractor.extract_features(preprocessed_image)
self.assertTrue( self.assertTrue(
any(op.type == 'FusedBatchNorm' any(op.type == 'FusedBatchNorm'
for op in tf.get_default_graph().get_operations())) for op in tf.get_default_graph().get_operations()))
......
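The test classes in this change are parameterized over `use_keras` with absl's `@parameterized.parameters`, so one test body exercises both the Slim and Keras code paths. As a minimal, stdlib-only sketch of that pattern (the decorator and class names here are hypothetical, not from the repo):

```python
import unittest


def parameterize(*param_dicts):
  """Minimal stand-in for absl's @parameterized.parameters decorator.

  Generates one test method per parameter dict so a single test body
  can cover both the Slim and Keras code paths.
  """
  def decorator(cls):
    for name in [n for n in list(cls.__dict__) if n.startswith('test')]:
      template = getattr(cls, name)
      delattr(cls, name)
      for params in param_dicts:
        suffix = '_'.join(
            '{}_{}'.format(k, v) for k, v in sorted(params.items()))
        # Factory function pins the template and kwargs for each variant.
        def make_case(fn, kwargs):
          return lambda self: fn(self, **kwargs)
        setattr(cls, '{}_{}'.format(name, suffix), make_case(template, params))
    return cls
  return decorator


@parameterize({'use_keras': False}, {'use_keras': True})
class DemoFeatureExtractorTest(unittest.TestCase):

  def test_flag_is_boolean(self, use_keras):
    # In the real tests this flag picks the Slim or Keras extractor.
    self.assertIn(use_keras, (False, True))
```

With absl installed, the real `@parameterized.parameters` behaves the same way: each dict becomes one generated test method, which is why every `test_*` in the diff gained a `use_keras` argument.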
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""SSD Keras-based MobilenetV1 FPN Feature Extractor."""

import tensorflow as tf

from object_detection.meta_architectures import ssd_meta_arch
from object_detection.models import feature_map_generators
from object_detection.models.keras_models import mobilenet_v1
from object_detection.models.keras_models import model_utils
from object_detection.utils import ops
from object_detection.utils import shape_utils


# A modified config of mobilenet v1 that makes it more detection friendly.
def _create_modified_mobilenet_config():
  conv_def_block_12 = model_utils.ConvDefs(conv_name='conv_pw_12', filters=512)
  conv_def_block_13 = model_utils.ConvDefs(conv_name='conv_pw_13', filters=256)
  return [conv_def_block_12, conv_def_block_13]


class SSDMobileNetV1FpnKerasFeatureExtractor(
    ssd_meta_arch.SSDKerasFeatureExtractor):
  """SSD Feature Extractor using Keras-based MobilenetV1 FPN features."""

  def __init__(self,
               is_training,
               depth_multiplier,
               min_depth,
               pad_to_multiple,
               conv_hyperparams,
               freeze_batchnorm,
               inplace_batchnorm_update,
               fpn_min_level=3,
               fpn_max_level=7,
               additional_layer_depth=256,
               use_explicit_padding=False,
               use_depthwise=False,
               override_base_feature_extractor_hyperparams=False,
               name=None):
    """SSD Keras based FPN feature extractor Mobilenet v1 architecture.

    Args:
      is_training: whether the network is in training mode.
      depth_multiplier: float depth multiplier for feature extractor.
      min_depth: minimum feature extractor depth.
      pad_to_multiple: the nearest multiple to zero pad the input height and
        width dimensions to.
      conv_hyperparams: a `hyperparams_builder.KerasLayerHyperparams` object
        containing convolution hyperparameters for the layers added on top of
        the base feature extractor.
      freeze_batchnorm: whether to freeze batch norm parameters during
        training or not. When training with a small batch size (e.g. 1), it is
        desirable to freeze batch norm update and use pretrained batch norm
        params.
      inplace_batchnorm_update: whether to update batch norm moving average
        values inplace. When this is false the train op must add a control
        dependency on the tf.GraphKeys.UPDATE_OPS collection in order to
        update batch norm statistics.
      fpn_min_level: the highest resolution feature map to use in FPN. The
        valid values are {2, 3, 4, 5}, which map to MobileNet v1 layers
        {Conv2d_3_pointwise, Conv2d_5_pointwise, Conv2d_11_pointwise,
        Conv2d_13_pointwise}, respectively.
      fpn_max_level: the smallest resolution feature map to construct or use
        in FPN. FPN construction uses feature maps starting from
        fpn_min_level up to fpn_max_level. If the backbone network does not
        provide enough feature maps, additional feature maps are created by
        applying stride 2 convolutions until the desired number of FPN levels
        is reached.
      additional_layer_depth: additional feature map layer channel depth.
      use_explicit_padding: whether to use explicit padding when extracting
        features. Default is False.
      use_depthwise: whether to use depthwise convolutions. Default is False.
      override_base_feature_extractor_hyperparams: whether to override
        hyperparameters of the base feature extractor with the ones from
        `conv_hyperparams`.
      name: a string name scope to assign to the model. If 'None', Keras
        will auto-generate one from the class name.
    """
    super(SSDMobileNetV1FpnKerasFeatureExtractor, self).__init__(
        is_training=is_training,
        depth_multiplier=depth_multiplier,
        min_depth=min_depth,
        pad_to_multiple=pad_to_multiple,
        conv_hyperparams=conv_hyperparams,
        freeze_batchnorm=freeze_batchnorm,
        inplace_batchnorm_update=inplace_batchnorm_update,
        use_explicit_padding=use_explicit_padding,
        use_depthwise=use_depthwise,
        override_base_feature_extractor_hyperparams=
        override_base_feature_extractor_hyperparams,
        name=name)
    self._fpn_min_level = fpn_min_level
    self._fpn_max_level = fpn_max_level
    self._additional_layer_depth = additional_layer_depth
    self._conv_defs = None
    if self._use_depthwise:
      self._conv_defs = _create_modified_mobilenet_config()
    self._feature_blocks = [
        'Conv2d_3_pointwise', 'Conv2d_5_pointwise', 'Conv2d_11_pointwise',
        'Conv2d_13_pointwise'
    ]
    self._mobilenet_v1 = None
    self._fpn_features_generator = None
    self._coarse_feature_layers = []

  def build(self, input_shape):
    full_mobilenet_v1 = mobilenet_v1.mobilenet_v1(
        batchnorm_training=(self._is_training and not self._freeze_batchnorm),
        conv_hyperparams=(self._conv_hyperparams
                          if self._override_base_feature_extractor_hyperparams
                          else None),
        weights=None,
        use_explicit_padding=self._use_explicit_padding,
        alpha=self._depth_multiplier,
        min_depth=self._min_depth,
        conv_defs=self._conv_defs,
        include_top=False)
    conv2d_3_pointwise = full_mobilenet_v1.get_layer(
        name='conv_pw_3_relu').output
    conv2d_5_pointwise = full_mobilenet_v1.get_layer(
        name='conv_pw_5_relu').output
    conv2d_11_pointwise = full_mobilenet_v1.get_layer(
        name='conv_pw_11_relu').output
    conv2d_13_pointwise = full_mobilenet_v1.get_layer(
        name='conv_pw_13_relu').output
    self._mobilenet_v1 = tf.keras.Model(
        inputs=full_mobilenet_v1.inputs,
        outputs=[conv2d_3_pointwise, conv2d_5_pointwise,
                 conv2d_11_pointwise, conv2d_13_pointwise]
    )
    # pylint:disable=g-long-lambda
    self._depth_fn = lambda d: max(
        int(d * self._depth_multiplier), self._min_depth)
    self._base_fpn_max_level = min(self._fpn_max_level, 5)
    self._num_levels = self._base_fpn_max_level + 1 - self._fpn_min_level
    self._fpn_features_generator = (
        feature_map_generators.KerasFpnTopDownFeatureMaps(
            num_levels=self._num_levels,
            depth=self._depth_fn(self._additional_layer_depth),
            use_depthwise=self._use_depthwise,
            use_explicit_padding=self._use_explicit_padding,
            is_training=self._is_training,
            conv_hyperparams=self._conv_hyperparams,
            freeze_batchnorm=self._freeze_batchnorm,
            name='FeatureMaps'))
    # Construct coarse feature layers
    padding = 'VALID' if self._use_explicit_padding else 'SAME'
    kernel_size = 3
    stride = 2
    for i in range(self._base_fpn_max_level + 1, self._fpn_max_level + 1):
      coarse_feature_layers = []
      if self._use_explicit_padding:
        def fixed_padding(features, kernel_size=kernel_size):
          return ops.fixed_padding(features, kernel_size)
        coarse_feature_layers.append(tf.keras.layers.Lambda(
            fixed_padding, name='fixed_padding'))
      layer_name = 'bottom_up_Conv2d_{}'.format(
          i - self._base_fpn_max_level + 13)
      conv_block = feature_map_generators.create_conv_block(
          self._use_depthwise, kernel_size, padding, stride, layer_name,
          self._conv_hyperparams, self._is_training, self._freeze_batchnorm,
          self._depth_fn(self._additional_layer_depth))
      coarse_feature_layers.extend(conv_block)
      self._coarse_feature_layers.append(coarse_feature_layers)
    self.built = True

  def preprocess(self, resized_inputs):
    """SSD preprocessing.

    Maps pixel values to the range [-1, 1].

    Args:
      resized_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.

    Returns:
      preprocessed_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.
    """
    return (2.0 / 255.0) * resized_inputs - 1.0

  def _extract_features(self, preprocessed_inputs):
    """Extract features from preprocessed inputs.

    Args:
      preprocessed_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.

    Returns:
      feature_maps: a list of tensors where the ith tensor has shape
        [batch, height_i, width_i, depth_i]
    """
    preprocessed_inputs = shape_utils.check_min_image_dim(
        33, preprocessed_inputs)
    image_features = self._mobilenet_v1(
        ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple))
    feature_block_list = []
    for level in range(self._fpn_min_level, self._base_fpn_max_level + 1):
      feature_block_list.append(self._feature_blocks[level - 2])
    feature_start_index = len(self._feature_blocks) - self._num_levels
    fpn_input_image_features = [
        (key, image_features[feature_start_index + index])
        for index, key in enumerate(feature_block_list)]
    fpn_features = self._fpn_features_generator(fpn_input_image_features)
    feature_maps = []
    for level in range(self._fpn_min_level, self._base_fpn_max_level + 1):
      feature_maps.append(fpn_features['top_down_{}'.format(
          self._feature_blocks[level - 2])])
    last_feature_map = fpn_features['top_down_{}'.format(
        self._feature_blocks[self._base_fpn_max_level - 2])]
    for coarse_feature_layers in self._coarse_feature_layers:
      for layer in coarse_feature_layers:
        last_feature_map = layer(last_feature_map)
      feature_maps.append(last_feature_map)
    return feature_maps
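The `build()` method above derives the number of FPN levels and per-layer channel depths from a few integer parameters. A minimal sketch of that arithmetic (plain functions mirroring `self._depth_fn` and the level bookkeeping; the function names here are illustrative, not part of the API):

```python
def depth_fn(d, depth_multiplier, min_depth):
  # Mirrors self._depth_fn: scale the nominal depth by the multiplier,
  # but never drop below the configured minimum depth.
  return max(int(d * depth_multiplier), min_depth)


def fpn_level_plan(fpn_min_level=3, fpn_max_level=7):
  # The MobileNet v1 backbone only provides feature maps down to level 5;
  # coarser levels are synthesized with stride-2 conv blocks.
  base_fpn_max_level = min(fpn_max_level, 5)
  num_levels = base_fpn_max_level + 1 - fpn_min_level
  num_coarse_blocks = fpn_max_level - base_fpn_max_level
  return base_fpn_max_level, num_levels, num_coarse_blocks
```

With the class defaults (`fpn_min_level=3`, `fpn_max_level=7`), the FPN generator covers three backbone levels and two additional stride-2 blocks are appended, which matches the loop bounds in `build()` and the five feature maps returned by `_extract_features()`.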
...@@ -13,21 +13,32 @@
# limitations under the License.
# ==============================================================================
"""Tests for ssd_mobilenet_v2_fpn_feature_extractor.

By using parameterized test decorator, this test serves for both Slim-based and
Keras-based Mobilenet V2 FPN feature extractors in SSD.
"""
from absl.testing import parameterized

import numpy as np
import tensorflow as tf

from object_detection.models import ssd_feature_extractor_test
from object_detection.models import ssd_mobilenet_v2_fpn_feature_extractor
from object_detection.models import ssd_mobilenet_v2_fpn_keras_feature_extractor

slim = tf.contrib.slim


@parameterized.parameters(
    {'use_keras': False},
    {'use_keras': True},
)
class SsdMobilenetV2FpnFeatureExtractorTest(
    ssd_feature_extractor_test.SsdFeatureExtractorTestBase):

  def _create_feature_extractor(self, depth_multiplier, pad_to_multiple,
                                is_training=True, use_explicit_padding=False,
                                use_keras=False):
    """Constructs a new feature extractor.

    Args:
...@@ -38,10 +49,26 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
      use_explicit_padding: Use 'VALID' padding for convolutions, but prepad
        inputs so that the output dimensions are the same as if 'SAME' padding
        were used.
      use_keras: if True builds a keras-based feature extractor, if False
        builds a slim-based one.

    Returns:
      an ssd_meta_arch.SSDFeatureExtractor object.
    """
    min_depth = 32
    if use_keras:
      return (ssd_mobilenet_v2_fpn_keras_feature_extractor.
              SSDMobileNetV2FpnKerasFeatureExtractor(
                  is_training=is_training,
                  depth_multiplier=depth_multiplier,
                  min_depth=min_depth,
                  pad_to_multiple=pad_to_multiple,
                  conv_hyperparams=self._build_conv_hyperparams(
                      add_batch_norm=False),
                  freeze_batchnorm=False,
                  inplace_batchnorm_update=False,
                  use_explicit_padding=use_explicit_padding,
                  name='MobilenetV2_FPN'))
    else:
      return (ssd_mobilenet_v2_fpn_feature_extractor.
              SSDMobileNetV2FpnFeatureExtractor(
                  is_training,
...@@ -51,7 +78,7 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
                  self.conv_hyperparams_fn,
                  use_explicit_padding=use_explicit_padding))

  def test_extract_features_returns_correct_shapes_256(self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 1.0
...@@ -61,12 +88,14 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
                                  (2, 2, 2, 256)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_returns_correct_shapes_384(self, use_keras):
    image_height = 320
    image_width = 320
    depth_multiplier = 1.0
...@@ -76,12 +105,14 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
                                  (2, 3, 3, 256)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_with_dynamic_image_shape(self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 1.0
...@@ -91,12 +122,15 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
                                  (2, 2, 2, 256)]
    self.check_extract_features_returns_correct_shapes_with_dynamic_inputs(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shapes_with_dynamic_inputs(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_returns_correct_shapes_with_pad_to_multiple(
      self, use_keras):
    image_height = 299
    image_width = 299
    depth_multiplier = 1.0
...@@ -106,12 +140,15 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
                                  (2, 3, 3, 256)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_returns_correct_shapes_enforcing_min_depth(
      self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 0.5**12
...@@ -121,38 +158,43 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
                                  (2, 2, 2, 32)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_raises_error_with_invalid_image_size(
      self, use_keras):
    image_height = 32
    image_width = 32
    depth_multiplier = 1.0
    pad_to_multiple = 1
    self.check_extract_features_raises_error_with_invalid_image_size(
        image_height, image_width, depth_multiplier, pad_to_multiple,
        use_keras=use_keras)

  def test_preprocess_returns_correct_value_range(self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 1
    pad_to_multiple = 1
    test_image = np.random.rand(2, image_height, image_width, 3)
    feature_extractor = self._create_feature_extractor(depth_multiplier,
                                                       pad_to_multiple,
                                                       use_keras=use_keras)
    preprocessed_image = feature_extractor.preprocess(test_image)
    self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0)))

  def test_variables_only_created_in_scope(self, use_keras):
    depth_multiplier = 1
    pad_to_multiple = 1
    scope_name = 'MobilenetV2'
    self.check_feature_extractor_variables_under_scope(
        depth_multiplier, pad_to_multiple, scope_name, use_keras=use_keras)

  def test_fused_batchnorm(self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 1
...@@ -160,19 +202,30 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
    image_placeholder = tf.placeholder(tf.float32,
                                       [1, image_height, image_width, 3])
    feature_extractor = self._create_feature_extractor(depth_multiplier,
                                                       pad_to_multiple,
                                                       use_keras=use_keras)
    preprocessed_image = feature_extractor.preprocess(image_placeholder)
    if use_keras:
      _ = feature_extractor(preprocessed_image)
    else:
      _ = feature_extractor.extract_features(preprocessed_image)
    self.assertTrue(
        any(op.type == 'FusedBatchNorm'
            for op in tf.get_default_graph().get_operations()))

  def test_variable_count(self, use_keras):
    depth_multiplier = 1
    pad_to_multiple = 1
    variables = self.get_feature_extractor_variables(
        depth_multiplier, pad_to_multiple, use_keras=use_keras)
    self.assertEqual(len(variables), 274)

  def test_get_expected_feature_map_variable_names(self, use_keras):
    depth_multiplier = 1.0
    pad_to_multiple = 1
    slim_expected_feature_maps_variables = set([
        # Slim Mobilenet V2 feature maps
        'MobilenetV2/expanded_conv_4/depthwise/depthwise_weights',
        'MobilenetV2/expanded_conv_7/depthwise/depthwise_weights',
        'MobilenetV2/expanded_conv_14/depthwise/depthwise_weights',
...@@ -186,13 +239,32 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
        'MobilenetV2/fpn/projection_2/weights',
        'MobilenetV2/fpn/projection_3/weights',
    ])
    keras_expected_feature_maps_variables = set([
        # Keras Mobilenet V2 feature maps
        'MobilenetV2_FPN/block_4_depthwise/depthwise_kernel',
        'MobilenetV2_FPN/block_7_depthwise/depthwise_kernel',
        'MobilenetV2_FPN/block_14_depthwise/depthwise_kernel',
        'MobilenetV2_FPN/Conv_1/kernel',
        # FPN layers
        'MobilenetV2_FPN/bottom_up_Conv2d_20_conv/kernel',
        'MobilenetV2_FPN/bottom_up_Conv2d_21_conv/kernel',
        'MobilenetV2_FPN/FeatureMaps/top_down/smoothing_1_conv/kernel',
        'MobilenetV2_FPN/FeatureMaps/top_down/smoothing_2_conv/kernel',
        'MobilenetV2_FPN/FeatureMaps/top_down/projection_1/kernel',
        'MobilenetV2_FPN/FeatureMaps/top_down/projection_2/kernel',
        'MobilenetV2_FPN/FeatureMaps/top_down/projection_3/kernel'
    ])

    g = tf.Graph()
    with g.as_default():
      preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
      feature_extractor = self._create_feature_extractor(
          depth_multiplier, pad_to_multiple, use_keras=use_keras)
      if use_keras:
        feature_extractor(preprocessed_inputs)
        expected_feature_maps_variables = keras_expected_feature_maps_variables
      else:
        feature_extractor.extract_features(preprocessed_inputs)
        expected_feature_maps_variables = slim_expected_feature_maps_variables
      actual_variable_set = set([
          var.op.name for var in g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
      ])
......
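Both the Slim and Keras extractors share the same `preprocess` transform, and `test_preprocess_returns_correct_value_range` asserts its output stays within [-1, 1]. A minimal sketch of that transform on plain Python floats (the function name is illustrative):

```python
def preprocess(pixels):
  # Maps pixel values in [0, 255] to [-1.0, 1.0], matching the
  # (2.0 / 255.0) * x - 1.0 transform used by the SSD feature extractors.
  return [(2.0 / 255.0) * p - 1.0 for p in pixels]


# Black (0), mid-gray (127.5), and white (255) land at -1, 0, and 1.
scaled = preprocess([0.0, 127.5, 255.0])
```

The test's assertion `np.all(np.less_equal(np.abs(preprocessed_image), 1.0))` is exactly the claim that this affine map never leaves [-1, 1] for valid pixel inputs.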
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""SSD Keras-based MobilenetV2 FPN Feature Extractor."""
import tensorflow as tf
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.models import feature_map_generators
from object_detection.models.keras_models import mobilenet_v2
from object_detection.utils import ops
from object_detection.utils import shape_utils
# Total number of blocks in Mobilenet_V2 base network.
NUM_LAYERS = 19
# A modified config of mobilenet v2 that makes it more detection friendly.
def _create_modified_mobilenet_config():
last_conv = mobilenet_v2.ConvDefs(conv_name='Conv_1', filters=256)
return [last_conv]
class SSDMobileNetV2FpnKerasFeatureExtractor(
ssd_meta_arch.SSDKerasFeatureExtractor):
"""SSD Feature Extractor using Keras-based MobilenetV2 FPN features."""
def __init__(self,
is_training,
depth_multiplier,
min_depth,
pad_to_multiple,
conv_hyperparams,
freeze_batchnorm,
inplace_batchnorm_update,
fpn_min_level=3,
fpn_max_level=7,
additional_layer_depth=256,
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
override_base_feature_extractor_hyperparams=False,
name=None):
"""SSD Keras based FPN feature extractor Mobilenet v2 architecture.
Args:
is_training: whether the network is in training mode.
depth_multiplier: float depth multiplier for feature extractor.
min_depth: minimum feature extractor depth.
pad_to_multiple: the nearest multiple to zero pad the input height and
width dimensions to.
conv_hyperparams: a `hyperparams_builder.KerasLayerHyperparams` object
containing convolution hyperparameters for the layers added on top of
the base feature extractor.
freeze_batchnorm: whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
inplace_batchnorm_update: whether to update batch norm moving average
values inplace. When this is false train op must add a control
dependency on tf.graphkeys.UPDATE_OPS collection in order to update
batch norm statistics.
fpn_min_level: the highest resolution feature map to use in FPN. The valid
values are {2, 3, 4, 5} which map to MobileNet v2 layers
{layer_4, layer_7, layer_14, layer_19}, respectively.
fpn_max_level: the smallest resolution feature map to construct or use in
FPN. FPN construction uses feature maps starting from fpn_min_level
up to fpn_max_level. In the case that there are not enough feature
maps in the backbone network, additional feature maps are created by
applying stride 2 convolutions until we get the desired number of fpn
levels.
additional_layer_depth: additional feature map layer channel depth.
reuse_weights: whether to reuse variables. Default is None.
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False.
use_depthwise: Whether to use depthwise convolutions. Default is False.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams`.
name: a string name scope to assign to the model. If 'None', Keras
will auto-generate one from the class name.
"""
super(SSDMobileNetV2FpnKerasFeatureExtractor, self).__init__(
is_training=is_training,
depth_multiplier=depth_multiplier,
min_depth=min_depth,
pad_to_multiple=pad_to_multiple,
conv_hyperparams=conv_hyperparams,
freeze_batchnorm=freeze_batchnorm,
inplace_batchnorm_update=inplace_batchnorm_update,
use_explicit_padding=use_explicit_padding,
use_depthwise=use_depthwise,
override_base_feature_extractor_hyperparams=
override_base_feature_extractor_hyperparams,
name=name)
self._fpn_min_level = fpn_min_level
self._fpn_max_level = fpn_max_level
self._additional_layer_depth = additional_layer_depth
self._conv_defs = None
if self._use_depthwise:
self._conv_defs = _create_modified_mobilenet_config()
self._feature_blocks = ['layer_4', 'layer_7', 'layer_14', 'layer_19']
self._mobilenet_v2 = None
self._fpn_features_generator = None
self._coarse_feature_layers = []
def build(self, input_shape):
full_mobilenet_v2 = mobilenet_v2.mobilenet_v2(
batchnorm_training=(self._is_training and not self._freeze_batchnorm),
conv_hyperparams=(self._conv_hyperparams
if self._override_base_feature_extractor_hyperparams
else None),
weights=None,
use_explicit_padding=self._use_explicit_padding,
alpha=self._depth_multiplier,
min_depth=self._min_depth,
include_top=False)
layer_names = [layer.name for layer in full_mobilenet_v2.layers]
outputs = []
for layer_idx in [4, 7, 14]:
add_name = 'block_{}_add'.format(layer_idx - 2)
project_name = 'block_{}_project_BN'.format(layer_idx - 2)
output_layer_name = add_name if add_name in layer_names else project_name
outputs.append(full_mobilenet_v2.get_layer(output_layer_name).output)
layer_19 = full_mobilenet_v2.get_layer(name='out_relu').output
outputs.append(layer_19)
self._mobilenet_v2 = tf.keras.Model(
inputs=full_mobilenet_v2.inputs,
outputs=outputs)
# pylint:disable=g-long-lambda
self._depth_fn = lambda d: max(
int(d * self._depth_multiplier), self._min_depth)
self._base_fpn_max_level = min(self._fpn_max_level, 5)
self._num_levels = self._base_fpn_max_level + 1 - self._fpn_min_level
self._fpn_features_generator = (
feature_map_generators.KerasFpnTopDownFeatureMaps(
num_levels=self._num_levels,
depth=self._depth_fn(self._additional_layer_depth),
use_depthwise=self._use_depthwise,
use_explicit_padding=self._use_explicit_padding,
is_training=self._is_training,
conv_hyperparams=self._conv_hyperparams,
freeze_batchnorm=self._freeze_batchnorm,
name='FeatureMaps'))
# Construct coarse feature layers
padding = 'VALID' if self._use_explicit_padding else 'SAME'
kernel_size = 3
stride = 2
for i in range(self._base_fpn_max_level + 1, self._fpn_max_level + 1):
coarse_feature_layers = []
if self._use_explicit_padding:
def fixed_padding(features, kernel_size=kernel_size):
return ops.fixed_padding(features, kernel_size)
coarse_feature_layers.append(tf.keras.layers.Lambda(
fixed_padding, name='fixed_padding'))
layer_name = 'bottom_up_Conv2d_{}'.format(
i - self._base_fpn_max_level + NUM_LAYERS)
conv_block = feature_map_generators.create_conv_block(
self._use_depthwise, kernel_size, padding, stride, layer_name,
self._conv_hyperparams, self._is_training, self._freeze_batchnorm,
self._depth_fn(self._additional_layer_depth))
coarse_feature_layers.extend(conv_block)
self._coarse_feature_layers.append(coarse_feature_layers)
self.built = True
def preprocess(self, resized_inputs):
"""SSD preprocessing.
Maps pixel values to the range [-1, 1].
Args:
resized_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
"""
return (2.0 / 255.0) * resized_inputs - 1.0
def _extract_features(self, preprocessed_inputs):
"""Extract features from preprocessed inputs.
Args:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
feature_maps: a list of tensors where the ith tensor has shape
[batch, height_i, width_i, depth_i]
"""
preprocessed_inputs = shape_utils.check_min_image_dim(
33, preprocessed_inputs)
image_features = self._mobilenet_v2(
ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple))
feature_block_list = []
for level in range(self._fpn_min_level, self._base_fpn_max_level + 1):
feature_block_list.append(self._feature_blocks[level - 2])
feature_start_index = len(self._feature_blocks) - self._num_levels
fpn_input_image_features = [
(key, image_features[feature_start_index + index])
for index, key in enumerate(feature_block_list)]
fpn_features = self._fpn_features_generator(fpn_input_image_features)
feature_maps = []
for level in range(self._fpn_min_level, self._base_fpn_max_level + 1):
feature_maps.append(fpn_features['top_down_{}'.format(
self._feature_blocks[level - 2])])
last_feature_map = fpn_features['top_down_{}'.format(
self._feature_blocks[self._base_fpn_max_level - 2])]
for coarse_feature_layers in self._coarse_feature_layers:
for layer in coarse_feature_layers:
last_feature_map = layer(last_feature_map)
feature_maps.append(last_feature_map)
return feature_maps
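The level bookkeeping in `build` and `_extract_features` above can be illustrated with a small pure-Python sketch (a simplification of the extractor's logic, assuming the default `fpn_min_level=3`, `fpn_max_level=7`):

```python
# Sketch of the FPN level bookkeeping used in the extractor above.
# Feature blocks exported by the trimmed MobilenetV2 backbone.
FEATURE_BLOCKS = ['layer_4', 'layer_7', 'layer_14', 'layer_19']
NUM_LAYERS = 19  # total blocks in the MobilenetV2 base network

def fpn_plan(fpn_min_level=3, fpn_max_level=7):
    # Levels above 5 have no backbone feature map; they are created by
    # extra stride-2 "coarse" convolutions on top of the last FPN level.
    base_fpn_max_level = min(fpn_max_level, 5)
    # FPN levels 2..5 map to layer_4, layer_7, layer_14, layer_19.
    blocks = [FEATURE_BLOCKS[level - 2]
              for level in range(fpn_min_level, base_fpn_max_level + 1)]
    coarse_names = [
        'bottom_up_Conv2d_{}'.format(i - base_fpn_max_level + NUM_LAYERS)
        for i in range(base_fpn_max_level + 1, fpn_max_level + 1)]
    return blocks, coarse_names

blocks, coarse = fpn_plan()
print(blocks)   # ['layer_7', 'layer_14', 'layer_19']
print(coarse)   # ['bottom_up_Conv2d_20', 'bottom_up_Conv2d_21']
```

With the defaults, three backbone blocks feed the top-down FPN and two extra stride-2 layers provide levels 6 and 7, matching the `bottom_up_Conv2d_20`/`21` names constructed in `build`.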
...@@ -29,7 +29,7 @@ from nets import resnet_v1
slim = tf.contrib.slim
class SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
"""SSD FPN feature extractor based on Resnet v1 architecture."""
def __init__(self,
...@@ -84,7 +84,7 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
Raises:
ValueError: On supplying invalid arguments for unused arguments.
"""
super(SSDResnetV1FpnFeatureExtractor, self).__init__(
is_training=is_training,
depth_multiplier=depth_multiplier,
min_depth=min_depth,
...@@ -198,7 +198,7 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
return feature_maps
class SSDResnet50V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
"""SSD Resnet50 V1 FPN feature extractor."""
def __init__(self,
...@@ -255,7 +255,7 @@ class SSDResnet50V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor):
override_base_feature_extractor_hyperparams)
class SSDResnet101V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
"""SSD Resnet101 V1 FPN feature extractor."""
def __init__(self,
...@@ -312,7 +312,7 @@ class SSDResnet101V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor):
override_base_feature_extractor_hyperparams)
class SSDResnet152V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
"""SSD Resnet152 V1 FPN feature extractor."""
def __init__(self,
......
...@@ -357,9 +357,9 @@ class WeightSharedConvolutionalBoxPredictor(box_predictor.BoxPredictor):
inserted_layer_counter = 0
target_channel = max(set(feature_channels), key=feature_channels.count)
tf.logging.info('Not all feature maps have the same number of '
'channels, found: {}, appending additional projection '
'layers to bring all feature maps to uniformly have {} '
'channels.'.format(feature_channels, target_channel))
else:
# Placeholder variables if has_different_feature_channels is False.
target_channel = -1
...@@ -377,6 +377,8 @@ class WeightSharedConvolutionalBoxPredictor(box_predictor.BoxPredictor):
with tf.variable_scope('WeightSharedConvolutionalBoxPredictor',
reuse=tf.AUTO_REUSE):
with slim.arg_scope(self._conv_hyperparams_fn()):
# TODO(wangjiang) Pass is_training to the head class directly.
with slim.arg_scope([slim.dropout], is_training=self._is_training):
(image_feature,
inserted_layer_counter) = self._insert_additional_projection_layer(
image_feature, inserted_layer_counter, target_channel)
......
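The projection logic in the hunk above keys off the most common channel depth among the incoming feature maps. A minimal pure-Python sketch of that selection (mirroring the `max(set(...), key=feature_channels.count)` line in the predictor):

```python
def pick_target_channel(feature_channels):
    """Returns (needs_projection, target_channel) as in the predictor above.

    When the feature maps disagree on depth, the most common depth wins
    and 1x1 projection layers bring the others to it; otherwise -1
    signals that no projection layers are inserted.
    """
    if len(set(feature_channels)) > 1:
        return True, max(set(feature_channels), key=feature_channels.count)
    return False, -1

# Three maps at 64 channels, one at 32: project everything to 64.
print(pick_target_channel([64, 64, 32, 64]))  # (True, 64)
print(pick_target_channel([256, 256]))        # (False, -1)
```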
...@@ -197,3 +197,272 @@ class ConvolutionalBoxPredictor(box_predictor.KerasBoxPredictor):
predictions[head_name].append(prediction)
return predictions
class WeightSharedConvolutionalBoxPredictor(box_predictor.KerasBoxPredictor):
"""Convolutional Box Predictor with weight sharing based on Keras.
Defines the box predictor as defined in
https://arxiv.org/abs/1708.02002. This class differs from
ConvolutionalBoxPredictor in that it shares weights and biases while
predicting from different feature maps. However, batch_norm parameters are not
shared because the statistics of the activations vary among the different
feature maps.
Also note that separate multi-layer towers are constructed for the box
encoding and class predictors respectively.
"""
def __init__(self,
is_training,
num_classes,
box_prediction_head,
class_prediction_head,
other_heads,
conv_hyperparams,
depth,
num_layers_before_predictor,
freeze_batchnorm,
inplace_batchnorm_update,
kernel_size=3,
apply_batch_norm=False,
share_prediction_tower=False,
use_depthwise=False,
name=None):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
box_prediction_head: The head that predicts the boxes.
class_prediction_head: The head that predicts the classes.
other_heads: A dictionary mapping head names to convolutional
head classes.
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops.
depth: depth of conv layers.
num_layers_before_predictor: Number of the additional conv layers before
the predictor.
freeze_batchnorm: Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
inplace_batchnorm_update: Whether to update batch norm moving average
values inplace. When this is false train op must add a control
dependency on tf.graphkeys.UPDATE_OPS collection in order to update
batch norm statistics.
kernel_size: Size of final convolution kernel.
apply_batch_norm: Whether to apply batch normalization to conv layers in
this predictor.
share_prediction_tower: Whether to share the multi-layer tower among box
prediction head, class prediction head and other heads.
use_depthwise: Whether to use depthwise separable conv2d instead of
regular conv2d.
name: A string name scope to assign to the model. If `None`, Keras
will auto-generate one from the class name.
"""
super(WeightSharedConvolutionalBoxPredictor, self).__init__(
is_training, num_classes, freeze_batchnorm=freeze_batchnorm,
inplace_batchnorm_update=inplace_batchnorm_update,
name=name)
self._box_prediction_head = box_prediction_head
self._prediction_heads = {
CLASS_PREDICTIONS_WITH_BACKGROUND: class_prediction_head,
}
if other_heads:
self._prediction_heads.update(other_heads)
# We generate a consistent ordering for the prediction head names,
# so that all workers build the model in the exact same order.
self._sorted_head_names = sorted(self._prediction_heads.keys())
self._conv_hyperparams = conv_hyperparams
self._depth = depth
self._num_layers_before_predictor = num_layers_before_predictor
self._kernel_size = kernel_size
self._apply_batch_norm = apply_batch_norm
self._share_prediction_tower = share_prediction_tower
self._use_depthwise = use_depthwise
# Additional projection layers to bring all feature maps to uniform
# channels.
self._additional_projection_layers = []
# The base tower layers for each head.
self._base_tower_layers_for_heads = {
BOX_ENCODINGS: [],
CLASS_PREDICTIONS_WITH_BACKGROUND: [],
}
for head_name in other_heads.keys():
self._base_tower_layers_for_heads[head_name] = []
# A dict maps the tower_name_scope of each head to the shared conv layers in
# the base tower for different feature map levels.
self._head_scope_conv_layers = {}
def _insert_additional_projection_layer(
self, inserted_layer_counter, target_channel):
projection_layers = []
if inserted_layer_counter >= 0:
use_bias = False if self._apply_batch_norm else True
projection_layers.append(keras.Conv2D(
target_channel, [1, 1], strides=1, padding='SAME',
name='ProjectionLayer/conv2d_{}'.format(inserted_layer_counter),
**self._conv_hyperparams.params(use_bias=use_bias)))
if self._apply_batch_norm:
projection_layers.append(self._conv_hyperparams.build_batch_norm(
training=(self._is_training and not self._freeze_batchnorm),
name='ProjectionLayer/conv2d_{}/BatchNorm'.format(
inserted_layer_counter)))
inserted_layer_counter += 1
return inserted_layer_counter, projection_layers
def _compute_base_tower(self, tower_name_scope, feature_index):
conv_layers = []
batch_norm_layers = []
activation_layers = []
use_bias = False if self._apply_batch_norm else True
for additional_conv_layer_idx in range(self._num_layers_before_predictor):
layer_name = '{}/conv2d_{}'.format(
tower_name_scope, additional_conv_layer_idx)
if tower_name_scope not in self._head_scope_conv_layers:
if self._use_depthwise:
conv_layers.append(
tf.keras.layers.SeparableConv2D(
self._depth,
[self._kernel_size, self._kernel_size],
padding='SAME',
name=layer_name,
**self._conv_hyperparams.params(use_bias=use_bias)))
else:
conv_layers.append(
tf.keras.layers.Conv2D(
self._depth,
[self._kernel_size, self._kernel_size],
padding='SAME',
name=layer_name,
**self._conv_hyperparams.params(use_bias=use_bias)))
# Each feature gets a separate batchnorm parameter even though they share
# the same convolution weights.
if self._apply_batch_norm:
batch_norm_layers.append(self._conv_hyperparams.build_batch_norm(
training=(self._is_training and not self._freeze_batchnorm),
name='{}/conv2d_{}/BatchNorm/feature_{}'.format(
tower_name_scope, additional_conv_layer_idx, feature_index)))
activation_layers.append(tf.keras.layers.Lambda(tf.nn.relu6))
# Set conv layers as the shared conv layers for different feature maps with
# the same tower_name_scope.
if tower_name_scope in self._head_scope_conv_layers:
conv_layers = self._head_scope_conv_layers[tower_name_scope]
# Stack the base_tower_layers in the order of conv_layer, batch_norm_layer
# and activation_layer
base_tower_layers = []
for i in range(self._num_layers_before_predictor):
base_tower_layers.extend([conv_layers[i]])
if self._apply_batch_norm:
base_tower_layers.extend([batch_norm_layers[i]])
base_tower_layers.extend([activation_layers[i]])
return conv_layers, base_tower_layers
def build(self, input_shapes):
"""Creates the variables of the layer."""
feature_channels = [
input_shape[3].value for input_shape in input_shapes
]
has_different_feature_channels = len(set(feature_channels)) > 1
if has_different_feature_channels:
inserted_layer_counter = 0
target_channel = max(set(feature_channels), key=feature_channels.count)
tf.logging.info('Not all feature maps have the same number of '
'channels, found: {}, appending additional projection '
'layers to bring all feature maps to uniformly have {} '
'channels.'.format(feature_channels, target_channel))
else:
# Placeholder variables if has_different_feature_channels is False.
target_channel = -1
inserted_layer_counter = -1
def _build_layers(tower_name_scope, feature_index):
conv_layers, base_tower_layers = self._compute_base_tower(
tower_name_scope=tower_name_scope, feature_index=feature_index)
if tower_name_scope not in self._head_scope_conv_layers:
self._head_scope_conv_layers[tower_name_scope] = conv_layers
return base_tower_layers
for feature_index, input_shape in enumerate(input_shapes):
# Additional projection layers should not be shared as input channels
# (and thus weight shapes) are different
inserted_layer_counter, projection_layers = (
self._insert_additional_projection_layer(
inserted_layer_counter, target_channel))
self._additional_projection_layers.append(projection_layers)
if self._share_prediction_tower:
box_tower_scope = 'PredictionTower'
else:
box_tower_scope = 'BoxPredictionTower'
# For box tower base
box_tower_layers = _build_layers(box_tower_scope, feature_index)
self._base_tower_layers_for_heads[BOX_ENCODINGS].append(box_tower_layers)
for head_name in self._sorted_head_names:
if head_name == CLASS_PREDICTIONS_WITH_BACKGROUND:
tower_name_scope = 'ClassPredictionTower'
else:
tower_name_scope = '{}PredictionTower'.format(head_name)
box_tower_layers = _build_layers(tower_name_scope, feature_index)
self._base_tower_layers_for_heads[head_name].append(box_tower_layers)
self.built = True
def _predict(self, image_features):
"""Computes encoded object locations and corresponding confidences.
Args:
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
Returns:
box_encodings: A list of float tensors of shape
[batch_size, num_anchors_i, q, code_size] representing the location of
the objects, where q is 1 or the number of classes. Each entry in the
list corresponds to a feature map in the input `image_features` list.
class_predictions_with_background: A list of float tensors of shape
[batch_size, num_anchors_i, num_classes + 1] representing the class
predictions for the proposals. Each entry in the list corresponds to a
feature map in the input `image_features` list.
"""
predictions = collections.defaultdict(list)
def _apply_layers(base_tower_layers, image_feature):
for layer in base_tower_layers:
image_feature = layer(image_feature)
return image_feature
for (index, image_feature) in enumerate(image_features):
# Apply additional projection layers to image features
for layer in self._additional_projection_layers[index]:
image_feature = layer(image_feature)
# Apply box tower layers.
box_tower_feature = _apply_layers(
self._base_tower_layers_for_heads[BOX_ENCODINGS][index],
image_feature)
box_encodings = self._box_prediction_head(box_tower_feature)
predictions[BOX_ENCODINGS].append(box_encodings)
for head_name in self._sorted_head_names:
head_obj = self._prediction_heads[head_name]
if self._share_prediction_tower:
head_tower_feature = box_tower_feature
else:
head_tower_feature = _apply_layers(
self._base_tower_layers_for_heads[head_name][index],
image_feature)
prediction = head_obj(head_tower_feature)
predictions[head_name].append(prediction)
return predictions
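The output shapes asserted in the tests below follow from simple arithmetic: each feature map contributes `height * width * num_predictions_per_location` anchors to the concatenated predictions. A small sketch of that count:

```python
def num_anchors(feature_map_shapes, num_predictions_per_location_list):
    """Total anchors across feature maps, as in the predictor tests.

    Each spatial position on a feature map gets a fixed number of
    predictions (anchors), so an [H, W] map with k predictions per
    location yields H * W * k anchors after flattening.
    """
    return sum(h * w * k for (h, w), k in
               zip(feature_map_shapes, num_predictions_per_location_list))

# One 8x8 map with 5 anchors per location -> 320 boxes, so the
# concatenated box_encodings has shape [batch, 320, code_size].
print(num_anchors([(8, 8)], [5]))             # 320
print(num_anchors([(8, 8), (8, 8)], [5, 5]))  # 640
```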
...@@ -21,6 +21,9 @@ from google.protobuf import text_format
from object_detection.builders import box_predictor_builder
from object_detection.builders import hyperparams_builder
from object_detection.predictors import convolutional_keras_box_predictor as box_predictor
from object_detection.predictors.heads import keras_box_head
from object_detection.predictors.heads import keras_class_head
from object_detection.predictors.heads import keras_mask_head
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
...@@ -255,5 +258,651 @@ class ConvolutionalKerasBoxPredictorTest(test_case.TestCase):
self.assertEqual(conv_box_predictor._sorted_head_names,
['box_encodings', 'class_predictions_with_background'])
class WeightSharedConvolutionalKerasBoxPredictorTest(test_case.TestCase):
def _build_conv_hyperparams(self, add_batch_norm=True):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
activation: RELU_6
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
mean: 0.0
}
}
"""
if add_batch_norm:
batch_norm_proto = """
batch_norm {
train: true,
}
"""
conv_hyperparams_text_proto += batch_norm_proto
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.KerasLayerHyperparams(conv_hyperparams)
# pylint: disable=line-too-long
def test_get_boxes_for_five_aspect_ratios_per_location(self):
def graph_fn(image_features):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=0,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5],
depth=32,
num_layers_before_predictor=1,
box_code_size=4))
box_predictions = conv_box_predictor([image_features])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
objectness_predictions = tf.concat(box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND], axis=1)
return (box_encodings, objectness_predictions)
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, objectness_predictions) = self.execute(
graph_fn, [image_features])
self.assertAllEqual(box_encodings.shape, [4, 320, 4])
self.assertAllEqual(objectness_predictions.shape, [4, 320, 1])
def test_bias_predictions_to_background_with_sigmoid_score_conversion(self):
def graph_fn(image_features):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=True,
num_classes=2,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5],
depth=32,
num_layers_before_predictor=1,
class_prediction_bias_init=-4.6,
box_code_size=4))
box_predictions = conv_box_predictor([image_features])
class_predictions = tf.concat(box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND], axis=1)
return (tf.nn.sigmoid(class_predictions),)
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
class_predictions = self.execute(graph_fn, [image_features])
self.assertAlmostEqual(np.mean(class_predictions), 0.01, places=3)
def test_get_multi_class_predictions_for_five_aspect_ratios_per_location(
self):
num_classes_without_background = 6
def graph_fn(image_features):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5],
depth=32,
num_layers_before_predictor=1,
box_code_size=4))
box_predictions = conv_box_predictor([image_features])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND], axis=1)
return (box_encodings, class_predictions_with_background)
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features])
self.assertAllEqual(box_encodings.shape, [4, 320, 4])
self.assertAllEqual(class_predictions_with_background.shape,
[4, 320, num_classes_without_background+1])
def test_get_multi_class_predictions_from_two_feature_maps(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=1,
box_code_size=4))
box_predictions = conv_box_predictor([image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
image_features1 = np.random.rand(4, 8, 8, 64).astype(np.float32)
image_features2 = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features1, image_features2])
self.assertAllEqual(box_encodings.shape, [4, 640, 4])
self.assertAllEqual(class_predictions_with_background.shape,
[4, 640, num_classes_without_background+1])
def test_get_multi_class_predictions_from_feature_maps_of_different_depth(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2, image_features3):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5, 5],
depth=32,
num_layers_before_predictor=1,
box_code_size=4))
box_predictions = conv_box_predictor(
[image_features1, image_features2, image_features3])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
image_features1 = np.random.rand(4, 8, 8, 64).astype(np.float32)
image_features2 = np.random.rand(4, 8, 8, 64).astype(np.float32)
image_features3 = np.random.rand(4, 8, 8, 32).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features1, image_features2, image_features3])
self.assertAllEqual(box_encodings.shape, [4, 960, 4])
self.assertAllEqual(class_predictions_with_background.shape,
[4, 960, num_classes_without_background+1])
def test_predictions_multiple_feature_maps_share_weights_separate_batchnorm(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=2,
box_code_size=4))
box_predictions = conv_box_predictor([image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Box prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/BatchNorm/feature_0/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/BatchNorm/feature_1/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/BatchNorm/feature_0/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/BatchNorm/feature_1/beta'),
# Box prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/bias'),
# Class prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/BatchNorm/feature_0/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/BatchNorm/feature_1/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/BatchNorm/feature_0/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/BatchNorm/feature_1/beta'),
# Class prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/bias')])
self.assertEqual(expected_variable_set, actual_variable_set)

def test_predictions_multiple_feature_maps_share_weights_without_batchnorm(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=2,
box_code_size=4,
apply_batch_norm=False))
box_predictions = conv_box_predictor([image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Box prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/bias'),
# Box prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/bias'),
# Class prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/bias'),
# Class prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/bias')])
self.assertEqual(expected_variable_set, actual_variable_set)

def test_predictions_multiple_feature_maps_share_weights_with_depthwise(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(
add_batch_norm=False),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=2,
box_code_size=4,
apply_batch_norm=False,
use_depthwise=True))
box_predictions = conv_box_predictor([image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Box prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/depthwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/pointwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/depthwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/pointwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/bias'),
# Box prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/depthwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/pointwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/bias'),
# Class prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/depthwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/pointwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/depthwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/pointwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/bias'),
# Class prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/depthwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/pointwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/bias')])
self.assertEqual(expected_variable_set, actual_variable_set)

def test_no_batchnorm_params_when_batchnorm_is_not_configured(self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(
add_batch_norm=False),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=2,
box_code_size=4,
apply_batch_norm=False))
box_predictions = conv_box_predictor(
[image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Box prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/bias'),
# Box prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/bias'),
# Class prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/bias'),
# Class prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/bias')])
self.assertEqual(expected_variable_set, actual_variable_set)

def test_predictions_share_weights_share_tower_separate_batchnorm(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=2,
box_code_size=4,
share_prediction_tower=True))
box_predictions = conv_box_predictor(
[image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Shared prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_0/BatchNorm/feature_0/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_0/BatchNorm/feature_1/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_1/BatchNorm/feature_0/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_1/BatchNorm/feature_1/beta'),
# Box prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/bias'),
# Class prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/bias')])
self.assertEqual(expected_variable_set, actual_variable_set)

def test_predictions_share_weights_share_tower_without_batchnorm(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(
add_batch_norm=False),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=2,
box_code_size=4,
share_prediction_tower=True,
apply_batch_norm=False))
box_predictions = conv_box_predictor(
[image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Shared prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_1/bias'),
# Box prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/bias'),
# Class prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/bias')])
self.assertEqual(expected_variable_set, actual_variable_set)

def test_get_predictions_with_feature_maps_of_dynamic_shape(
self):
image_features = tf.placeholder(dtype=tf.float32, shape=[4, None, None, 64])
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=0,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5],
depth=32,
num_layers_before_predictor=1,
box_code_size=4))
box_predictions = conv_box_predictor([image_features])
box_encodings = tf.concat(box_predictions[box_predictor.BOX_ENCODINGS],
axis=1)
objectness_predictions = tf.concat(box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND], axis=1)
init_op = tf.global_variables_initializer()
resolution = 32
expected_num_anchors = resolution*resolution*5
with self.test_session() as sess:
sess.run(init_op)
(box_encodings_shape,
objectness_predictions_shape) = sess.run(
[tf.shape(box_encodings), tf.shape(objectness_predictions)],
feed_dict={image_features:
np.random.rand(4, resolution, resolution, 64)})
self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 4])
self.assertAllEqual(objectness_predictions_shape,
[4, expected_num_anchors, 1])

def test_other_heads_predictions(self):
box_code_size = 4
num_classes_without_background = 3
other_head_name = 'Mask'
mask_height = 5
mask_width = 5
num_predictions_per_location = 5
def graph_fn(image_features):
box_prediction_head = keras_box_head.WeightSharedConvolutionalBoxHead(
box_code_size=box_code_size,
conv_hyperparams=self._build_conv_hyperparams(),
num_predictions_per_location=num_predictions_per_location)
class_prediction_head = keras_class_head.WeightSharedConvolutionalClassHead(
num_class_slots=num_classes_without_background + 1,
conv_hyperparams=self._build_conv_hyperparams(),
num_predictions_per_location=num_predictions_per_location)
other_heads = {
other_head_name:
keras_mask_head.WeightSharedConvolutionalMaskHead(
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
num_predictions_per_location=num_predictions_per_location,
mask_height=mask_height,
mask_width=mask_width)
}
conv_box_predictor = box_predictor.WeightSharedConvolutionalBoxPredictor(
is_training=False,
num_classes=num_classes_without_background,
box_prediction_head=box_prediction_head,
class_prediction_head=class_prediction_head,
other_heads=other_heads,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
depth=32,
num_layers_before_predictor=2)
box_predictions = conv_box_predictor([image_features])
for key, value in box_predictions.items():
box_predictions[key] = tf.concat(value, axis=1)
assert len(box_predictions) == 3
return (box_predictions[box_predictor.BOX_ENCODINGS],
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
box_predictions[other_head_name])
batch_size = 4
feature_ht = 8
feature_wt = 8
image_features = np.random.rand(batch_size, feature_ht, feature_wt,
64).astype(np.float32)
(box_encodings, class_predictions, other_head_predictions) = self.execute(
graph_fn, [image_features])
num_anchors = feature_ht * feature_wt * num_predictions_per_location
self.assertAllEqual(box_encodings.shape,
[batch_size, num_anchors, box_code_size])
self.assertAllEqual(
class_predictions.shape,
[batch_size, num_anchors, num_classes_without_background + 1])
self.assertAllEqual(other_head_predictions.shape, [
batch_size, num_anchors, num_classes_without_background, mask_height,
mask_width
])

if __name__ == '__main__':
  tf.test.main()
@@ -120,7 +120,8 @@ class ConvolutionalBoxHead(head.Head):
                is_training,
                box_code_size,
                kernel_size,
-               use_depthwise=False):
+               use_depthwise=False,
+               box_encodings_clip_range=None):
     """Constructor.
 
     Args:
@@ -132,6 +133,7 @@ class ConvolutionalBoxHead(head.Head):
         min(feature_width, feature_height).
       use_depthwise: Whether to use depthwise convolutions for prediction
         steps. Default is False.
+      box_encodings_clip_range: Min and max values for clipping box_encodings.
 
     Raises:
       ValueError: if min_depth > max_depth.
@@ -141,6 +143,7 @@ class ConvolutionalBoxHead(head.Head):
     self._box_code_size = box_code_size
     self._kernel_size = kernel_size
     self._use_depthwise = use_depthwise
+    self._box_encodings_clip_range = box_encodings_clip_range
 
   def predict(self, features, num_predictions_per_location):
     """Predicts boxes.
@@ -180,6 +183,11 @@ class ConvolutionalBoxHead(head.Head):
     batch_size = features.get_shape().as_list()[0]
     if batch_size is None:
       batch_size = tf.shape(features)[0]
+    # Clipping the box encodings to make the inference graph TPU friendly.
+    if self._box_encodings_clip_range is not None:
+      box_encodings = tf.clip_by_value(
+          box_encodings, self._box_encodings_clip_range.min,
+          self._box_encodings_clip_range.max)
     box_encodings = tf.reshape(box_encodings,
                                [batch_size, -1, 1, self._box_code_size])
     return box_encodings
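The clipping added in the hunk above bounds the regression outputs so the inference graph stays TPU friendly. A minimal NumPy sketch of the same idea — the clip-range values here are illustrative only and do not come from the real `box_encodings_clip_range` proto message:

```python
import numpy as np

# Hypothetical clip range standing in for box_encodings_clip_range
# (which carries min/max fields); the values are illustrative.
clip_min, clip_max = -10.0, 10.0

# Fake box encodings: batch of 1, two anchors, 4 coordinates each.
box_encodings = np.array([[[-50.0, 0.5, 2.0, 100.0],
                           [3.0, -1.0, 12.0, -0.25]]])

# Clipping keeps every encoding inside a bounded range, avoiding the
# extreme values that can overflow lower-precision TPU arithmetic.
clipped = np.clip(box_encodings, clip_min, clip_max)
print(clipped.tolist())
```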
@@ -198,7 +206,8 @@ class WeightSharedConvolutionalBoxHead(head.Head):
                box_code_size,
                kernel_size=3,
                use_depthwise=False,
-               box_encodings_clip_range=None):
+               box_encodings_clip_range=None,
+               return_flat_predictions=True):
     """Constructor.
 
     Args:
@@ -207,12 +216,18 @@ class WeightSharedConvolutionalBoxHead(head.Head):
       use_depthwise: Whether to use depthwise convolutions for prediction steps.
         Default is False.
       box_encodings_clip_range: Min and max values for clipping box_encodings.
+      return_flat_predictions: If true, returns flattened prediction tensor
+        of shape [batch, height * width * num_predictions_per_location,
+        box_code_size]. Otherwise returns the prediction tensor before
+        reshaping, whose shape is [batch, height, width,
+        num_predictions_per_location * box_code_size].
     """
     super(WeightSharedConvolutionalBoxHead, self).__init__()
     self._box_code_size = box_code_size
     self._kernel_size = kernel_size
     self._use_depthwise = use_depthwise
     self._box_encodings_clip_range = box_encodings_clip_range
+    self._return_flat_predictions = return_flat_predictions
 
   def predict(self, features, num_predictions_per_location):
     """Predicts boxes.
@@ -226,7 +241,9 @@ class WeightSharedConvolutionalBoxHead(head.Head):
     Returns:
       box_encodings: A float tensor of shape
         [batch_size, num_anchors, code_size] representing the location of
-        the objects.
+        the objects, or a float tensor of shape [batch, height, width,
+        num_predictions_per_location * box_code_size] representing grid box
+        location predictions if self._return_flat_predictions is False.
     """
     box_encodings_net = features
     if self._use_depthwise:
@@ -248,6 +265,7 @@ class WeightSharedConvolutionalBoxHead(head.Head):
       box_encodings = tf.clip_by_value(
          box_encodings, self._box_encodings_clip_range.min,
          self._box_encodings_clip_range.max)
-    box_encodings = tf.reshape(box_encodings,
-                               [batch_size, -1, self._box_code_size])
+    if self._return_flat_predictions:
+      box_encodings = tf.reshape(box_encodings,
+                                 [batch_size, -1, self._box_code_size])
     return box_encodings
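The `return_flat_predictions` flag introduced above controls whether the per-cell grid output is collapsed into a single anchor axis. A NumPy sketch of that reshape, with made-up feature-map sizes:

```python
import numpy as np

batch, height, width = 2, 4, 4
num_predictions_per_location, box_code_size = 5, 4

# Raw head output: per-cell predictions before any reshaping,
# shape [batch, height, width, num_predictions_per_location * box_code_size].
grid_preds = np.zeros(
    (batch, height, width, num_predictions_per_location * box_code_size))

# return_flat_predictions=True collapses the spatial grid into one anchor
# axis: [batch, height * width * num_predictions_per_location, box_code_size].
flat_preds = grid_preds.reshape(batch, -1, box_code_size)
print(flat_preds.shape)  # → (2, 80, 4)
```

With `return_flat_predictions=False` the caller instead receives `grid_preds` unchanged, which is what the TPU exporter consumes.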
@@ -39,7 +39,8 @@ class MaskRCNNClassHead(head.Head):
                num_class_slots,
                fc_hyperparams_fn,
                use_dropout,
-               dropout_keep_prob):
+               dropout_keep_prob,
+               scope='ClassPredictor'):
     """Constructor.
 
     Args:
@@ -53,6 +54,7 @@ class MaskRCNNClassHead(head.Head):
         in contrast to the ConvolutionalBoxPredictor below.
       dropout_keep_prob: Keep probability for dropout.
         This is only used if use_dropout is True.
+      scope: Scope name for the convolution operation.
     """
     super(MaskRCNNClassHead, self).__init__()
     self._is_training = is_training
@@ -60,6 +62,7 @@ class MaskRCNNClassHead(head.Head):
     self._fc_hyperparams_fn = fc_hyperparams_fn
     self._use_dropout = use_dropout
     self._dropout_keep_prob = dropout_keep_prob
+    self._scope = scope
 
   def predict(self, features, num_predictions_per_location=1):
     """Predicts boxes and class scores.
@@ -95,7 +98,7 @@ class MaskRCNNClassHead(head.Head):
         flattened_roi_pooled_features,
         self._num_class_slots,
         activation_fn=None,
-        scope='ClassPredictor')
+        scope=self._scope)
     class_predictions_with_background = tf.reshape(
         class_predictions_with_background,
         [-1, 1, self._num_class_slots])
@@ -113,7 +116,8 @@ class ConvolutionalClassHead(head.Head):
                kernel_size,
                apply_sigmoid_to_scores=False,
                class_prediction_bias_init=0.0,
-               use_depthwise=False):
+               use_depthwise=False,
+               scope='ClassPredictor'):
     """Constructor.
 
     Args:
@@ -135,6 +139,7 @@ class ConvolutionalClassHead(head.Head):
         conv2d layer before class prediction.
       use_depthwise: Whether to use depthwise convolutions for prediction
         steps. Default is False.
+      scope: Scope name for the convolution operation.
 
     Raises:
       ValueError: if min_depth > max_depth.
@@ -148,6 +153,7 @@ class ConvolutionalClassHead(head.Head):
     self._apply_sigmoid_to_scores = apply_sigmoid_to_scores
     self._class_prediction_bias_init = class_prediction_bias_init
     self._use_depthwise = use_depthwise
+    self._scope = scope
 
   def predict(self, features, num_predictions_per_location):
     """Predicts boxes.
@@ -167,17 +173,18 @@ class ConvolutionalClassHead(head.Head):
     if self._use_dropout:
       net = slim.dropout(net, keep_prob=self._dropout_keep_prob)
     if self._use_depthwise:
+      depthwise_scope = self._scope + '_depthwise'
       class_predictions_with_background = slim.separable_conv2d(
           net, None, [self._kernel_size, self._kernel_size],
           padding='SAME', depth_multiplier=1, stride=1,
-          rate=1, scope='ClassPredictor_depthwise')
+          rate=1, scope=depthwise_scope)
       class_predictions_with_background = slim.conv2d(
           class_predictions_with_background,
           num_predictions_per_location * self._num_class_slots, [1, 1],
           activation_fn=None,
           normalizer_fn=None,
           normalizer_params=None,
-          scope='ClassPredictor')
+          scope=self._scope)
     else:
       class_predictions_with_background = slim.conv2d(
           net,
@@ -186,7 +193,7 @@ class ConvolutionalClassHead(head.Head):
           activation_fn=None,
           normalizer_fn=None,
           normalizer_params=None,
-          scope='ClassPredictor',
+          scope=self._scope,
           biases_initializer=tf.constant_initializer(
               self._class_prediction_bias_init))
     if self._apply_sigmoid_to_scores:
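The new `scope` argument threads through to the slim layers, which name their variables `<scope>/weights` and `<scope>/biases` (the convention the `test_scope_name` tests below check for the default scope). A small hypothetical helper, not part of the library, showing the name mapping:

```python
# Hypothetical illustration of how a configurable scope argument maps to
# variable names under the slim conv2d/fully_connected naming convention.
def expected_variable_names(scope='ClassPredictor'):
    return {scope + '/weights', scope + '/biases'}

print(sorted(expected_variable_names()))
# A non-default scope lets two class heads coexist without name clashes:
print(sorted(expected_variable_names('AuxClassPredictor')))
```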
@@ -217,7 +224,9 @@ class WeightSharedConvolutionalClassHead(head.Head):
                use_dropout=False,
                dropout_keep_prob=0.8,
                use_depthwise=False,
-               score_converter_fn=tf.identity):
+               score_converter_fn=tf.identity,
+               return_flat_predictions=True,
+               scope='ClassPredictor'):
     """Constructor.
 
     Args:
@@ -232,6 +241,12 @@ class WeightSharedConvolutionalClassHead(head.Head):
         steps. Default is False.
       score_converter_fn: Callable elementwise nonlinearity (that takes tensors
         as inputs and returns tensors).
+      return_flat_predictions: If true, returns flattened prediction tensor
+        of shape [batch, height * width * num_predictions_per_location,
+        num_class_slots]. Otherwise returns the prediction tensor before
+        reshaping, whose shape is [batch, height, width,
+        num_predictions_per_location * num_class_slots].
+      scope: Scope name for the convolution operation.
     """
     super(WeightSharedConvolutionalClassHead, self).__init__()
     self._num_class_slots = num_class_slots
@@ -241,6 +256,8 @@ class WeightSharedConvolutionalClassHead(head.Head):
     self._dropout_keep_prob = dropout_keep_prob
     self._use_depthwise = use_depthwise
     self._score_converter_fn = score_converter_fn
+    self._return_flat_predictions = return_flat_predictions
+    self._scope = scope
 
   def predict(self, features, num_predictions_per_location):
     """Predicts boxes.
@@ -254,7 +271,10 @@ class WeightSharedConvolutionalClassHead(head.Head):
     Returns:
       class_predictions_with_background: A tensor of shape
         [batch_size, num_anchors, num_class_slots] representing the class
-        predictions for the proposals.
+        predictions for the proposals, or a tensor of shape [batch, height,
+        width, num_predictions_per_location * num_class_slots] representing
+        class predictions before reshaping if self._return_flat_predictions is
+        False.
     """
     class_predictions_net = features
     if self._use_dropout:
@@ -272,13 +292,15 @@ class WeightSharedConvolutionalClassHead(head.Head):
           normalizer_fn=None,
           biases_initializer=tf.constant_initializer(
              self._class_prediction_bias_init),
-          scope='ClassPredictor')
+          scope=self._scope)
     batch_size = features.get_shape().as_list()[0]
     if batch_size is None:
       batch_size = tf.shape(features)[0]
     class_predictions_with_background = self._score_converter_fn(
         class_predictions_with_background)
-    class_predictions_with_background = tf.reshape(
-        class_predictions_with_background,
-        [batch_size, -1, self._num_class_slots])
+    if self._return_flat_predictions:
+      class_predictions_with_background = tf.reshape(
+          class_predictions_with_background,
+          [batch_size, -1, self._num_class_slots])
     return class_predictions_with_background
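`score_converter_fn` defaults to `tf.identity`, so the head emits raw logits; passing an elementwise nonlinearity such as `tf.sigmoid` converts them to scores inside the graph. A NumPy sketch of what a sigmoid converter would do (illustrative values, not library code):

```python
import numpy as np

def sigmoid_score_converter(logits):
    # Elementwise nonlinearity, analogous to passing tf.sigmoid as
    # score_converter_fn in place of the tf.identity default.
    return 1.0 / (1.0 + np.exp(-logits))

# One anchor row of class logits: background, class A, class B.
logits = np.array([[0.0, 2.0, -2.0]])
scores = sigmoid_score_converter(logits)
print(np.round(scores, 3).tolist())  # → [[0.5, 0.881, 0.119]]
```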
...@@ -56,6 +56,30 @@ class MaskRCNNClassHeadTest(test_case.TestCase): ...@@ -56,6 +56,30 @@ class MaskRCNNClassHeadTest(test_case.TestCase):
features=roi_pooled_features, num_predictions_per_location=1) features=roi_pooled_features, num_predictions_per_location=1)
self.assertAllEqual([64, 1, 20], prediction.get_shape().as_list()) self.assertAllEqual([64, 1, 20], prediction.get_shape().as_list())
def test_scope_name(self):
expected_var_names = set([
"""ClassPredictor/weights""",
"""ClassPredictor/biases"""
])
g = tf.Graph()
with g.as_default():
class_prediction_head = class_head.MaskRCNNClassHead(
is_training=True,
num_class_slots=20,
fc_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
use_dropout=True,
dropout_keep_prob=0.5)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
class_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
actual_variable_set = set([
var.op.name for var in g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
])
self.assertSetEqual(expected_var_names, actual_variable_set)
class ConvolutionalClassPredictorTest(test_case.TestCase):
...@@ -92,6 +116,29 @@ class ConvolutionalClassPredictorTest(test_case.TestCase):
    self.assertAllEqual([64, 323, 20],
                        class_predictions.get_shape().as_list())
def test_scope_name(self):
expected_var_names = set([
"""ClassPredictor/weights""",
"""ClassPredictor/biases"""
])
g = tf.Graph()
with g.as_default():
class_prediction_head = class_head.ConvolutionalClassHead(
is_training=True,
num_class_slots=20,
use_dropout=True,
dropout_keep_prob=0.5,
kernel_size=3)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
class_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
actual_variable_set = set([
var.op.name for var in g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
])
self.assertSetEqual(expected_var_names, actual_variable_set)
class WeightSharedConvolutionalClassPredictorTest(test_case.TestCase):
...@@ -123,6 +170,25 @@ class WeightSharedConvolutionalClassPredictorTest(test_case.TestCase):
      num_predictions_per_location=1)
    self.assertAllEqual([64, 323, 20], class_predictions.get_shape().as_list())
def test_scope_name(self):
expected_var_names = set([
"""ClassPredictor/weights""",
"""ClassPredictor/biases"""
])
g = tf.Graph()
with g.as_default():
class_prediction_head = class_head.WeightSharedConvolutionalClassHead(
num_class_slots=20)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
class_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
actual_variable_set = set([
var.op.name for var in g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
])
self.assertSetEqual(expected_var_names, actual_variable_set)
if __name__ == '__main__':
  tf.test.main()
...@@ -34,7 +34,8 @@ class ConvolutionalBoxHead(head.KerasHead):
               num_predictions_per_location,
               conv_hyperparams,
               freeze_batchnorm,
               use_depthwise=False,
box_encodings_clip_range=None,
               name=None):
    """Constructor.
...@@ -55,6 +56,7 @@ class ConvolutionalBoxHead(head.KerasHead):
        params.
      use_depthwise: Whether to use depthwise convolutions for prediction
        steps. Default is False.
box_encodings_clip_range: Min and max values for clipping box_encodings.
      name: A string name scope to assign to the model. If `None`, Keras
        will auto-generate one from the class name.
...@@ -67,6 +69,7 @@ class ConvolutionalBoxHead(head.KerasHead):
    self._kernel_size = kernel_size
    self._num_predictions_per_location = num_predictions_per_location
    self._use_depthwise = use_depthwise
self._box_encodings_clip_range = box_encodings_clip_range
    self._box_encoder_layers = []
...@@ -119,6 +122,202 @@ class ConvolutionalBoxHead(head.KerasHead):
    batch_size = features.get_shape().as_list()[0]
    if batch_size is None:
      batch_size = tf.shape(features)[0]
# Clipping the box encodings to make the inference graph TPU friendly.
if self._box_encodings_clip_range is not None:
box_encodings = tf.clip_by_value(
box_encodings, self._box_encodings_clip_range.min,
self._box_encodings_clip_range.max)
    box_encodings = tf.reshape(box_encodings,
                               [batch_size, -1, 1, self._box_code_size])
    return box_encodings
class MaskRCNNBoxHead(head.KerasHead):
"""Box prediction head.
This is a piece of Mask RCNN which is responsible for predicting
just the box encodings.
Please refer to Mask RCNN paper:
https://arxiv.org/abs/1703.06870
"""
def __init__(self,
is_training,
num_classes,
fc_hyperparams,
freeze_batchnorm,
use_dropout,
dropout_keep_prob,
box_code_size,
share_box_across_classes=False,
name=None):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
fc_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for fully connected dense ops.
freeze_batchnorm: Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
use_dropout: Option to use dropout or not. Note that a single dropout
op is applied here prior to both box and class predictions, which stands
in contrast to the ConvolutionalBoxPredictor below.
dropout_keep_prob: Keep probability for dropout.
This is only used if use_dropout is True.
box_code_size: Size of encoding for each box.
share_box_across_classes: Whether to share boxes across classes rather
than use a different box for each class.
name: A string name scope to assign to the box head. If `None`, Keras
will auto-generate one from the class name.
"""
super(MaskRCNNBoxHead, self).__init__(name=name)
self._is_training = is_training
self._num_classes = num_classes
self._fc_hyperparams = fc_hyperparams
self._freeze_batchnorm = freeze_batchnorm
self._use_dropout = use_dropout
self._dropout_keep_prob = dropout_keep_prob
self._box_code_size = box_code_size
self._share_box_across_classes = share_box_across_classes
self._box_encoder_layers = [tf.keras.layers.Flatten()]
if self._use_dropout:
self._box_encoder_layers.append(
tf.keras.layers.Dropout(rate=1.0 - self._dropout_keep_prob))
self._number_of_boxes = 1
if not self._share_box_across_classes:
self._number_of_boxes = self._num_classes
self._box_encoder_layers.append(
tf.keras.layers.Dense(self._number_of_boxes * self._box_code_size,
name='BoxEncodingPredictor_dense'))
self._box_encoder_layers.append(
fc_hyperparams.build_batch_norm(training=(is_training and
not freeze_batchnorm),
name='BoxEncodingPredictor_batchnorm'))
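The constructor above sizes the final dense layer from `share_box_across_classes`: one box encoding per location when boxes are shared, otherwise one per class. A small sketch of the resulting output width (the class count and code size are illustrative):

```python
def box_predictor_units(num_classes, box_code_size, share_box_across_classes):
    """Output units of the final dense layer, mirroring the head's sizing logic."""
    number_of_boxes = 1 if share_box_across_classes else num_classes
    return number_of_boxes * box_code_size

# E.g. 90 classes with 4-value box codes (illustrative numbers).
print(box_predictor_units(90, 4, share_box_across_classes=False))  # -> 360
print(box_predictor_units(90, 4, share_box_across_classes=True))   # -> 4
```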
def _predict(self, features):
"""Predicts box encodings.
Args:
features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
Returns:
box_encodings: A float tensor of shape
[batch_size, 1, num_classes, code_size] representing the location of the
objects.
"""
spatial_averaged_roi_pooled_features = tf.reduce_mean(
features, [1, 2], keep_dims=True, name='AvgPool')
net = spatial_averaged_roi_pooled_features
for layer in self._box_encoder_layers:
net = layer(net)
box_encodings = tf.reshape(net,
[-1, 1,
self._number_of_boxes,
self._box_code_size])
return box_encodings
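`_predict` first averages the ROI-pooled features over the two spatial dimensions (`tf.reduce_mean` over axes 1 and 2) before the dense box encoder runs. A pure-Python sketch of that spatial averaging for a single feature map, with an illustrative 2x2, single-channel input:

```python
def spatial_average(feature_map):
    """Mean over height and width of a [height][width][channels] nested list."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    return [
        sum(feature_map[i][j][c] for i in range(height) for j in range(width))
        / (height * width)
        for c in range(channels)
    ]

# A 2x2 spatial grid with one channel collapses to one value per channel.
fmap = [[[1.0], [2.0]], [[3.0], [4.0]]]
print(spatial_average(fmap))  # -> [2.5]
```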
# TODO(b/128922690): Unify the implementations of ConvolutionalBoxHead
# and WeightSharedConvolutionalBoxHead
class WeightSharedConvolutionalBoxHead(head.KerasHead):
"""Weight shared convolutional box prediction head based on Keras.
This head allows sharing the same set of parameters (weights) when called more
than once on different feature maps.
"""
def __init__(self,
box_code_size,
num_predictions_per_location,
conv_hyperparams,
kernel_size=3,
use_depthwise=False,
box_encodings_clip_range=None,
return_flat_predictions=True,
name=None):
"""Constructor.
Args:
box_code_size: Size of encoding for each box.
num_predictions_per_location: Number of box predictions to be made per
spatial location. Int specifying number of boxes per location.
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops.
kernel_size: Size of final convolution kernel.
use_depthwise: Whether to use depthwise convolutions for prediction steps.
Default is False.
box_encodings_clip_range: Min and max values for clipping box_encodings.
      return_flat_predictions: If true, returns flattened prediction tensor
        of shape [batch, height * width * num_predictions_per_location,
        box_code_size]. Otherwise returns the prediction tensor before
        reshaping, whose shape is [batch, height, width,
        num_predictions_per_location * box_code_size].
name: A string name scope to assign to the model. If `None`, Keras
will auto-generate one from the class name.
"""
super(WeightSharedConvolutionalBoxHead, self).__init__(name=name)
self._box_code_size = box_code_size
self._kernel_size = kernel_size
self._num_predictions_per_location = num_predictions_per_location
self._use_depthwise = use_depthwise
self._box_encodings_clip_range = box_encodings_clip_range
self._return_flat_predictions = return_flat_predictions
self._box_encoder_layers = []
if self._use_depthwise:
self._box_encoder_layers.append(
tf.keras.layers.SeparableConv2D(
num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
padding='SAME',
name='BoxPredictor',
**conv_hyperparams.params(use_bias=True)))
else:
self._box_encoder_layers.append(
tf.keras.layers.Conv2D(
num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
padding='SAME',
name='BoxPredictor',
**conv_hyperparams.params(use_bias=True)))
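The `use_depthwise` flag swaps the final `Conv2D` for a `SeparableConv2D`, which factors the k x k convolution into a depthwise pass followed by a 1x1 pointwise pass and therefore needs far fewer weights. A rough weight count, ignoring biases, with illustrative channel sizes (256-channel features, 6 anchors x 4 box coordinates):

```python
def conv2d_weights(kernel, c_in, c_out):
    """Weight count of a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out

def separable_conv2d_weights(kernel, c_in, c_out, depth_multiplier=1):
    """Depthwise (k*k*c_in*mult) plus pointwise (c_in*mult*c_out) weights."""
    return (kernel * kernel * c_in * depth_multiplier
            + c_in * depth_multiplier * c_out)

print(conv2d_weights(3, 256, 24))            # -> 55296
print(separable_conv2d_weights(3, 256, 24))  # -> 8448
```

The roughly 6x reduction in weights is the usual motivation for depthwise heads on mobile and TPU-oriented models.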
def _predict(self, features):
"""Predicts boxes.
Args:
features: A float tensor of shape [batch_size, height, width, channels]
containing image features.
Returns:
      box_encodings: A float tensor of shape
        [batch_size, num_anchors, code_size] representing the location of the
        objects, or the tensor before reshaping, of shape [batch, height,
        width, num_predictions_per_location * box_code_size], if
        self._return_flat_predictions is False.
"""
box_encodings = features
for layer in self._box_encoder_layers:
box_encodings = layer(box_encodings)
batch_size = features.get_shape().as_list()[0]
if batch_size is None:
batch_size = tf.shape(features)[0]
# Clipping the box encodings to make the inference graph TPU friendly.
if self._box_encodings_clip_range is not None:
box_encodings = tf.clip_by_value(
box_encodings, self._box_encodings_clip_range.min,
self._box_encodings_clip_range.max)
if self._return_flat_predictions:
box_encodings = tf.reshape(box_encodings,
[batch_size, -1, self._box_code_size])
return box_encodings
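With `return_flat_predictions` enabled, the [batch, height, width, num_predictions_per_location * box_code_size] output above is flattened to [batch, num_anchors, box_code_size], where num_anchors = height * width * num_predictions_per_location. A quick element-count check of that reshape, with illustrative shapes for one feature-map level:

```python
# Illustrative shapes for one level of a weight-shared box head.
batch, height, width = 8, 19, 19
num_predictions_per_location, box_code_size = 6, 4

num_anchors = height * width * num_predictions_per_location
elements_before = batch * height * width * num_predictions_per_location * box_code_size
elements_after = batch * num_anchors * box_code_size
print(num_anchors, elements_before == elements_after)  # -> 2166 True
```

The reshape only regroups dimensions; the `-1` in the code lets TensorFlow infer `num_anchors` from the remaining element count.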