Commit 80444539 authored by Zhuoran Liu, committed by pkulzc

Add TPU SavedModel exporter and refactor OD code (#6737)

247226201  by ronnyvotel:

    Updating the visualization tools to accept unique_ids for color coding.

--
247067830  by Zhichao Lu:

    Add box_encodings_clip_range options for the convolutional box predictor (for TPU compatibility).
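    The effect of a clip range on box encodings can be sketched as follows. This is an illustrative NumPy stand-in, not the actual box predictor plumbing; the helper name and the default range are assumptions.

    ```python
    import numpy as np

    def clip_box_encodings(box_encodings, clip_min=-10.0, clip_max=10.0):
        # Hypothetical sketch: clamp every encoding coordinate into
        # [clip_min, clip_max] so downstream (e.g. TPU) ops see bounded
        # values; entries already in range pass through unchanged.
        return np.clip(box_encodings, clip_min, clip_max)

    encodings = np.array([[-50.0, 0.5, 3.0, 99.0]])
    clipped = clip_box_encodings(encodings)
    # -50.0 is clamped up to -10.0 and 99.0 down to 10.0.
    ```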

--
246888475  by Zhichao Lu:

    Remove unused _update_eval_steps function.

--
246163259  by lzc:

    Add a gather op that can handle ignore indices (which are "-1"s in this case).
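    The behavior can be sketched in NumPy as below. This is a hypothetical stand-in for the real op (which lives in the OD codebase): rows whose index is -1 are treated as "ignore" and yield zeros instead of participating in the gather.

    ```python
    import numpy as np

    def gather_with_ignored_indices(params, indices):
        # Sketch only: clamp -1 to a valid index so the gather itself never
        # goes out of range, then zero out the rows that came from ignored
        # positions.
        indices = np.asarray(indices)
        safe_indices = np.where(indices < 0, 0, indices)
        gathered = params[safe_indices]
        mask = (indices >= 0).astype(params.dtype)[:, None]
        return gathered * mask

    params = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    out = gather_with_ignored_indices(params, [2, -1, 0])
    # Row 1 of `out` is all zeros because its index was -1.
    ```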

--
246084944  by Zhichao Lu:

    Keras based implementation for SSD + MobilenetV2 + FPN.

--
245544227  by rathodv:

    Add batch_get_targets method to target assigner module to gather any groundtruth tensors based on the results of target assigner.

--
245540854  by rathodv:

    Update target assigner to return match tensor instead of a match object.

--
245434441  by Zhichao Lu:

    Add README for tpu_exporters package.

--
245381834  by lzc:

    Internal change.

--
245298983  by Zhichao Lu:

    Add conditional_shape_resizer to config_util

--
245134666  by Zhichao Lu:

    Adds ConditionalShapeResizer to the ImageResizer proto, which enables resizing only if the input image height or width is greater or smaller than a certain size. Also enables specification of the resize method in the resize_to_{max, min}_dimension methods.

--
245093975  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (faster-rcnn)

--
245072421  by Zhichao Lu:

    Adds a new image resizing method "resize_to_max_dimension" which resizes images only if a dimension is greater than the maximum desired value while maintaining aspect ratio.
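    The size arithmetic behind this rule can be sketched as below (pure shape computation, not the actual TF op; the function name here is illustrative).

    ```python
    def resize_to_max_dimension(height, width, max_dimension):
        # Sketch: resize only when the larger side exceeds max_dimension,
        # preserving the aspect ratio; otherwise return the size unchanged.
        largest_side = max(height, width)
        if largest_side <= max_dimension:
            return height, width
        scale = max_dimension / float(largest_side)
        return int(round(height * scale)), int(round(width * scale))

    # An 800x600 image is scaled down so its larger side becomes 400;
    # a 300x200 image is already within bounds and is left untouched.
    ```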

--
244946998  by lzc:

    Internal Changes.

--
244943693  by Zhichao Lu:

    Add a custom config to mobilenet v2 that makes it more detection friendly.

--
244754158  by derekjchow:

    Internal change.

--
244699875  by Zhichao Lu:

    Add check_range=False to box_list_ops.to_normalized_coordinates when training
    for instance segmentation.  This is consistent with other calls when training
    for object detection.  There could be wrongly annotated boxes in the dataset.

--
244507425  by rathodv:

    Support bfloat16 for ssd models.

--
244399982  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd)

--
244209387  by Zhichao Lu:

    Internal change.

--
243922296  by rathodv:

    Change `raw_detection_scores` to contain softmax/sigmoid scores (not logits) for `raw_detection_boxes`.

--
243883978  by Zhichao Lu:

    Add a sample fully conv config.

--
243369455  by Zhichao Lu:

    Fix regularization loss gap in Keras and Slim.

--
243292002  by lzc:

    Internal changes.

--
243097958  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd model)

--
243007177  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd model)

--
242776550  by Zhichao Lu:

    Make object detection pre-processing run on GPU.  tf.map_fn() uses
    TensorArrayV3 ops, which have no int32 GPU implementation.  Cast to int64,
    then cast back to int32.

--
242723128  by Zhichao Lu:

    Use sorted dictionaries for additional heads in non_max_suppression to ensure a deterministic tensor order.

--
242495311  by Zhichao Lu:

    Update documentation to reflect new TFLite examples repo location

--
242230527  by Zhichao Lu:

    Fix Dropout bugs for WeightSharedConvolutionalBoxPredictor.

--
242226573  by Zhichao Lu:

    Create Keras-based WeightSharedConvolutionalBoxPredictor.

--
241806074  by Zhichao Lu:

    Add inference in unit tests of TFX OD template.

--
241641498  by lzc:

    Internal change.

--
241637481  by Zhichao Lu:

    matmul_crop_and_resize(): Switch to dynamic shaping, so that not all dimensions are required to be known.

--
241429980  by Zhichao Lu:

    Internal change

--
241167237  by Zhichao Lu:

    Adds a faster_rcnn_inception_resnet_v2 Keras feature extractor, and updates the model builder to construct it.

--
241088616  by Zhichao Lu:

    Make it compatible with different dtype, e.g. float32, bfloat16, etc.

--
240897364  by lzc:

    Use image_np_expanded in object_detection_tutorial notebook.

--
240890393  by Zhichao Lu:

    Disable multicore inference for the OD template as it's not yet compatible.

--
240352168  by Zhichao Lu:

    Make SSDResnetV1FpnFeatureExtractor not protected to allow inheritance.

--
240351470  by lzc:

    Internal change.

--
239878928  by Zhichao Lu:

    Defines Keras box predictors for Faster RCNN and RFCN

--
239872103  by Zhichao Lu:

    Delete duplicated inputs in test.

--
239714273  by Zhichao Lu:

    Adding scope variable to all class heads

--
239698643  by Zhichao Lu:

    Create FPN feature extractor for object detection.

--
239696657  by Zhichao Lu:

    Internal Change.

--
239299404  by Zhichao Lu:

    Allows the faster rcnn meta-architecture to support Keras subcomponents

--
238502595  by Zhichao Lu:

    Lay the groundwork for symmetric quantization.

--
238496885  by Zhichao Lu:

    Add flexible_grid_anchor_generator

--
238138727  by lzc:

    Remove dead code.

    _USE_C_SHAPES has been forced True in TensorFlow releases since
    TensorFlow 1.9
    (https://github.com/tensorflow/tensorflow/commit/1d74a69443f741e69f9f52cb6bc2940b4d4ae3b7)

--
238123936  by rathodv:

    Add num_matched_groundtruth summary to target assigner in SSD.

--
238103345  by ronnyvotel:

    Raising error if input file pattern does not match any files.
    Also printing the number of evaluation images for coco metrics.

--
238044081  by Zhichao Lu:

    Fix docstring to state the correct dimensionality of `class_predictions_with_background`.

--
237920279  by Zhichao Lu:

    [XLA] Rework debug flags for dumping HLO.

    The following flags (usually passed via the XLA_FLAGS envvar) are removed:

      xla_dump_computations_to
      xla_dump_executions_to
      xla_dump_ir_to
      xla_dump_optimized_hlo_proto_to
      xla_dump_per_pass_hlo_proto_to
      xla_dump_unoptimized_hlo_proto_to
      xla_generate_hlo_graph
      xla_generate_hlo_text_to
      xla_hlo_dump_as_html
      xla_hlo_graph_path
      xla_log_hlo_text

    The following new flags are added:

      xla_dump_to
      xla_dump_hlo_module_re
      xla_dump_hlo_pass_re
      xla_dump_hlo_as_text
      xla_dump_hlo_as_proto
      xla_dump_hlo_as_dot
      xla_dump_hlo_as_url
      xla_dump_hlo_as_html
      xla_dump_ir
      xla_dump_hlo_snapshots

    The default is not to dump anything at all, but as soon as some dumping flag is
    specified, we enable the following defaults (most of which can be overridden).

     * dump to stdout (overridden by --xla_dump_to)
     * dump HLO modules at the very beginning and end of the optimization pipeline
     * don't dump between any HLO passes (overridden by --xla_dump_hlo_pass_re)
     * dump all HLO modules (overridden by --xla_dump_hlo_module_re)
     * dump in textual format (overridden by
       --xla_dump_hlo_as_{text,proto,dot,url,html}).

    For example, to dump optimized and unoptimized HLO text and protos to /tmp/foo,
    pass

      --xla_dump_to=/tmp/foo --xla_dump_hlo_as_text --xla_dump_hlo_as_proto

    For details on these flags' meanings, see xla.proto.

    The intent of this change is to make dumping both simpler to use and more
    powerful.

    For example:

     * Previously there was no way to dump the HLO module during the pass pipeline
       in HLO text format; the only option was --xla_dump_per_pass_hlo_proto_to,
       which dumped in proto format.

       Now this is --xla_dump_hlo_pass_re=.* --xla_dump_hlo_as_text.  (In fact, the
       second flag is not necessary in this case, as dumping as text is the
       default.)

     * Previously there was no way to dump HLO as a graph before and after
       compilation; the only option was --xla_generate_hlo_graph, which would dump
       before/after every pass.

       Now this is --xla_dump_hlo_as_{dot,url,html} (depending on what format you
       want the graph in).

     * Previously, there was no coordination between the filenames written by the
       various flags, so info about one module might be dumped with various
       filename prefixes.  Now the filenames are consistent and all dumps from a
       particular module are next to each other.

    If you only specify some of these flags, we try to figure out what you wanted.
    For example:

     * --xla_dump_to implies --xla_dump_hlo_as_text unless you specify some
       other --xla_dump_hlo_as_* flag.

     * --xla_dump_hlo_as_text or --xla_dump_ir implies dumping to stdout unless you
       specify a different --xla_dump_to directory.  You can explicitly dump to
       stdout with --xla_dump_to=-.

    As part of this change, I simplified the debugging code in the HLO passes for
    dumping HLO modules.  Previously, many tests explicitly VLOG'ed the HLO module
    before, after, and sometimes during the pass.  I removed these VLOGs.  If you
    want dumps before/during/after an HLO pass, use --xla_dump_hlo_pass_re=<pass_name>.

--
237510043  by lzc:

    Internal Change.

--
237469515  by Zhichao Lu:

    Parameterize model_builder.build in inputs.py.

--
237293511  by rathodv:

    Remove multiclass_scores from tensor_dict in transform_data_fn always.

--
237260333  by ronnyvotel:

    Updating faster_rcnn_meta_arch to define prediction dictionary fields that are batched.

--

PiperOrigin-RevId: 247226201
parent c4f34e58
@@ -71,5 +71,114 @@ class ConvolutionalKerasBoxHeadTest(test_case.TestCase):
box_encodings = box_prediction_head(image_feature)
self.assertAllEqual([64, 323, 1, 4], box_encodings.get_shape().as_list())
class MaskRCNNKerasBoxHeadTest(test_case.TestCase):
def _build_fc_hyperparams(
self, op_type=hyperparams_pb2.Hyperparams.FC):
hyperparams = hyperparams_pb2.Hyperparams()
hyperparams_text_proto = """
activation: NONE
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(hyperparams_text_proto, hyperparams)
hyperparams.op = op_type
return hyperparams_builder.KerasLayerHyperparams(hyperparams)
def test_prediction_size(self):
box_prediction_head = keras_box_head.MaskRCNNBoxHead(
is_training=False,
num_classes=20,
fc_hyperparams=self._build_fc_hyperparams(),
freeze_batchnorm=False,
use_dropout=True,
dropout_keep_prob=0.5,
box_code_size=4,
share_box_across_classes=False)
roi_pooled_features = tf.random_uniform(
[64, 7, 7, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
prediction = box_prediction_head(roi_pooled_features)
self.assertAllEqual([64, 1, 20, 4], prediction.get_shape().as_list())
class WeightSharedConvolutionalKerasBoxHead(test_case.TestCase):
def _build_conv_hyperparams(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
activation: NONE
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.KerasLayerHyperparams(conv_hyperparams)
def test_prediction_size_depthwise_false(self):
conv_hyperparams = self._build_conv_hyperparams()
box_prediction_head = keras_box_head.WeightSharedConvolutionalBoxHead(
box_code_size=4,
conv_hyperparams=conv_hyperparams,
num_predictions_per_location=1,
use_depthwise=False)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
box_encodings = box_prediction_head(image_feature)
self.assertAllEqual([64, 323, 4], box_encodings.get_shape().as_list())
def test_prediction_size_depthwise_true(self):
conv_hyperparams = self._build_conv_hyperparams()
box_prediction_head = keras_box_head.WeightSharedConvolutionalBoxHead(
box_code_size=4,
conv_hyperparams=conv_hyperparams,
num_predictions_per_location=1,
use_depthwise=True)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
box_encodings = box_prediction_head(image_feature)
self.assertAllEqual([64, 323, 4], box_encodings.get_shape().as_list())
def test_variable_count_depth_wise_true(self):
g = tf.Graph()
with g.as_default():
conv_hyperparams = self._build_conv_hyperparams()
box_prediction_head = keras_box_head.WeightSharedConvolutionalBoxHead(
box_code_size=4,
conv_hyperparams=conv_hyperparams,
num_predictions_per_location=1,
use_depthwise=True)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
_ = box_prediction_head(image_feature)
variables = g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
self.assertEqual(len(variables), 3)
def test_variable_count_depth_wise_False(self):
g = tf.Graph()
with g.as_default():
conv_hyperparams = self._build_conv_hyperparams()
box_prediction_head = keras_box_head.WeightSharedConvolutionalBoxHead(
box_code_size=4,
conv_hyperparams=conv_hyperparams,
num_predictions_per_location=1,
use_depthwise=False)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
_ = box_prediction_head(image_feature)
variables = g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
self.assertEqual(len(variables), 2)
if __name__ == '__main__':
tf.test.main()
@@ -134,7 +134,6 @@ class ConvolutionalClassHead(head.KerasHead):
[batch_size, num_anchors, num_class_slots] representing the class
predictions for the proposals.
"""
# Add a slot for the background class.
class_predictions_with_background = features
for layer in self._class_predictor_layers:
class_predictions_with_background = layer(
@@ -146,3 +145,197 @@ class ConvolutionalClassHead(head.KerasHead):
class_predictions_with_background,
[batch_size, -1, self._num_class_slots])
return class_predictions_with_background
class MaskRCNNClassHead(head.KerasHead):
"""Mask RCNN class prediction head.
This is a piece of Mask RCNN which is responsible for predicting
just the class scores of boxes.
Please refer to Mask RCNN paper:
https://arxiv.org/abs/1703.06870
"""
def __init__(self,
is_training,
num_class_slots,
fc_hyperparams,
freeze_batchnorm,
use_dropout,
dropout_keep_prob,
name=None):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_class_slots: number of class slots. Note that num_class_slots may or
may not include an implicit background category.
fc_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for fully connected dense ops.
freeze_batchnorm: Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
use_dropout: Option to use dropout or not. Note that a single dropout
op is applied here prior to both box and class predictions, which stands
in contrast to the ConvolutionalBoxPredictor below.
dropout_keep_prob: Keep probability for dropout.
This is only used if use_dropout is True.
name: A string name scope to assign to the class head. If `None`, Keras
will auto-generate one from the class name.
"""
super(MaskRCNNClassHead, self).__init__(name=name)
self._is_training = is_training
self._freeze_batchnorm = freeze_batchnorm
self._num_class_slots = num_class_slots
self._fc_hyperparams = fc_hyperparams
self._use_dropout = use_dropout
self._dropout_keep_prob = dropout_keep_prob
self._class_predictor_layers = [tf.keras.layers.Flatten()]
if self._use_dropout:
self._class_predictor_layers.append(
tf.keras.layers.Dropout(rate=1.0 - self._dropout_keep_prob))
self._class_predictor_layers.append(
tf.keras.layers.Dense(self._num_class_slots,
name='ClassPredictor_dense'))
self._class_predictor_layers.append(
fc_hyperparams.build_batch_norm(training=(is_training and
not freeze_batchnorm),
name='ClassPredictor_batchnorm'))
def _predict(self, features):
"""Predicts the class scores for boxes.
Args:
features: A float tensor of shape [batch_size, height, width, channels]
containing features for a batch of images.
Returns:
class_predictions_with_background: A float tensor of shape
[batch_size, 1, num_class_slots] representing the class predictions for
the proposals.
"""
spatial_averaged_roi_pooled_features = tf.reduce_mean(
features, [1, 2], keep_dims=True, name='AvgPool')
net = spatial_averaged_roi_pooled_features
for layer in self._class_predictor_layers:
net = layer(net)
class_predictions_with_background = tf.reshape(
net,
[-1, 1, self._num_class_slots])
return class_predictions_with_background
class WeightSharedConvolutionalClassHead(head.KerasHead):
"""Weight shared convolutional class prediction head.
This head allows sharing the same set of parameters (weights) when called more
than once on different feature maps.
"""
def __init__(self,
num_class_slots,
num_predictions_per_location,
conv_hyperparams,
kernel_size=3,
class_prediction_bias_init=0.0,
use_dropout=False,
dropout_keep_prob=0.8,
use_depthwise=False,
score_converter_fn=tf.identity,
return_flat_predictions=True,
name=None):
"""Constructor.
Args:
num_class_slots: number of class slots. Note that num_class_slots may or
may not include an implicit background category.
num_predictions_per_location: Number of box predictions to be made per
spatial location. Int specifying number of boxes per location.
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops.
kernel_size: Size of final convolution kernel.
class_prediction_bias_init: constant value to initialize bias of the last
conv2d layer before class prediction.
use_dropout: Whether to apply dropout to class prediction head.
dropout_keep_prob: Probability of keeping activations.
use_depthwise: Whether to use depthwise convolutions for prediction
steps. Default is False.
score_converter_fn: Callable elementwise nonlinearity (that takes tensors
as inputs and returns tensors).
return_flat_predictions: If true, returns flattened prediction tensor
of shape [batch, height * width * num_predictions_per_location,
num_class_slots]. Otherwise returns the prediction tensor before reshaping,
whose shape is [batch, height, width, num_predictions_per_location *
num_class_slots].
name: A string name scope to assign to the model. If `None`, Keras
will auto-generate one from the class name.
"""
super(WeightSharedConvolutionalClassHead, self).__init__(name=name)
self._num_class_slots = num_class_slots
self._kernel_size = kernel_size
self._class_prediction_bias_init = class_prediction_bias_init
self._use_dropout = use_dropout
self._dropout_keep_prob = dropout_keep_prob
self._use_depthwise = use_depthwise
self._score_converter_fn = score_converter_fn
self._return_flat_predictions = return_flat_predictions
self._class_predictor_layers = []
if self._use_dropout:
self._class_predictor_layers.append(
tf.keras.layers.Dropout(rate=1.0 - self._dropout_keep_prob))
if self._use_depthwise:
self._class_predictor_layers.append(
tf.keras.layers.SeparableConv2D(
num_predictions_per_location * self._num_class_slots,
[self._kernel_size, self._kernel_size],
padding='SAME',
depth_multiplier=1,
strides=1,
name='ClassPredictor',
bias_initializer=tf.constant_initializer(
self._class_prediction_bias_init),
**conv_hyperparams.params(use_bias=True)))
else:
self._class_predictor_layers.append(
tf.keras.layers.Conv2D(
num_predictions_per_location * self._num_class_slots,
[self._kernel_size, self._kernel_size],
padding='SAME',
name='ClassPredictor',
bias_initializer=tf.constant_initializer(
self._class_prediction_bias_init),
**conv_hyperparams.params(use_bias=True)))
def _predict(self, features):
"""Predicts boxes.
Args:
features: A float tensor of shape [batch_size, height, width, channels]
containing image features.
Returns:
class_predictions_with_background: A float tensor of shape
[batch_size, num_anchors, num_class_slots] representing the class
predictions for the proposals.
"""
class_predictions_with_background = features
for layer in self._class_predictor_layers:
class_predictions_with_background = layer(
class_predictions_with_background)
batch_size = features.get_shape().as_list()[0]
if batch_size is None:
batch_size = tf.shape(features)[0]
class_predictions_with_background = self._score_converter_fn(
class_predictions_with_background)
if self._return_flat_predictions:
class_predictions_with_background = tf.reshape(
class_predictions_with_background,
[batch_size, -1, self._num_class_slots])
return class_predictions_with_background
@@ -77,5 +77,115 @@ class ConvolutionalKerasClassPredictorTest(test_case.TestCase):
self.assertAllEqual([64, 323, 20],
class_predictions.get_shape().as_list())
class MaskRCNNClassHeadTest(test_case.TestCase):
def _build_fc_hyperparams(self,
op_type=hyperparams_pb2.Hyperparams.FC):
hyperparams = hyperparams_pb2.Hyperparams()
hyperparams_text_proto = """
activation: NONE
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(hyperparams_text_proto, hyperparams)
hyperparams.op = op_type
return hyperparams_builder.KerasLayerHyperparams(hyperparams)
def test_prediction_size(self):
class_prediction_head = keras_class_head.MaskRCNNClassHead(
is_training=False,
num_class_slots=20,
fc_hyperparams=self._build_fc_hyperparams(),
freeze_batchnorm=False,
use_dropout=True,
dropout_keep_prob=0.5)
roi_pooled_features = tf.random_uniform(
[64, 7, 7, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
prediction = class_prediction_head(roi_pooled_features)
self.assertAllEqual([64, 1, 20], prediction.get_shape().as_list())
class WeightSharedConvolutionalKerasClassPredictorTest(test_case.TestCase):
def _build_conv_hyperparams(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
activation: NONE
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.KerasLayerHyperparams(conv_hyperparams)
def test_prediction_size_depthwise_false(self):
conv_hyperparams = self._build_conv_hyperparams()
class_prediction_head = keras_class_head.WeightSharedConvolutionalClassHead(
num_class_slots=20,
conv_hyperparams=conv_hyperparams,
num_predictions_per_location=1,
use_depthwise=False)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
class_predictions = class_prediction_head(image_feature)
self.assertAllEqual([64, 323, 20], class_predictions.get_shape().as_list())
def test_prediction_size_depthwise_true(self):
conv_hyperparams = self._build_conv_hyperparams()
class_prediction_head = keras_class_head.WeightSharedConvolutionalClassHead(
num_class_slots=20,
conv_hyperparams=conv_hyperparams,
num_predictions_per_location=1,
use_depthwise=True)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
class_predictions = class_prediction_head(image_feature)
self.assertAllEqual([64, 323, 20], class_predictions.get_shape().as_list())
def test_variable_count_depth_wise_true(self):
g = tf.Graph()
with g.as_default():
conv_hyperparams = self._build_conv_hyperparams()
class_prediction_head = (
keras_class_head.WeightSharedConvolutionalClassHead(
num_class_slots=20,
conv_hyperparams=conv_hyperparams,
num_predictions_per_location=1,
use_depthwise=True))
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
_ = class_prediction_head(image_feature)
variables = g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
self.assertEqual(len(variables), 3)
def test_variable_count_depth_wise_False(self):
g = tf.Graph()
with g.as_default():
conv_hyperparams = self._build_conv_hyperparams()
class_prediction_head = (
keras_class_head.WeightSharedConvolutionalClassHead(
num_class_slots=20,
conv_hyperparams=conv_hyperparams,
num_predictions_per_location=1,
use_depthwise=False))
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
_ = class_prediction_head(image_feature)
variables = g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
self.assertEqual(len(variables), 2)
if __name__ == '__main__':
tf.test.main()
@@ -19,9 +19,11 @@ Contains Mask prediction head classes for different meta architectures.
All the mask prediction heads have a predict function that receives the
`features` as the first argument and returns `mask_predictions`.
"""
import math
import tensorflow as tf
from object_detection.predictors.heads import head
from object_detection.utils import ops
class ConvolutionalMaskHead(head.KerasHead):
@@ -156,3 +158,281 @@ class ConvolutionalMaskHead(head.KerasHead):
mask_predictions,
[batch_size, -1, self._num_masks, self._mask_height, self._mask_width])
return mask_predictions
class MaskRCNNMaskHead(head.KerasHead):
"""Mask RCNN mask prediction head.
This is a piece of Mask RCNN which is responsible for predicting
just the pixelwise foreground scores for regions within the boxes.
Please refer to Mask RCNN paper:
https://arxiv.org/abs/1703.06870
"""
def __init__(self,
is_training,
num_classes,
freeze_batchnorm,
conv_hyperparams,
mask_height=14,
mask_width=14,
mask_prediction_num_conv_layers=2,
mask_prediction_conv_depth=256,
masks_are_class_agnostic=False,
convolve_then_upsample=False,
name=None):
"""Constructor.
Args:
is_training: Indicates whether the Mask head is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
freeze_batchnorm: Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops.
mask_height: Desired output mask height. The default value is 14.
mask_width: Desired output mask width. The default value is 14.
mask_prediction_num_conv_layers: Number of convolution layers applied to
the image_features in mask prediction branch.
mask_prediction_conv_depth: The depth for the first conv2d_transpose op
applied to the image_features in the mask prediction branch. If set
to 0, the depth of the convolution layers will be automatically chosen
based on the number of object classes and the number of channels in the
image features.
masks_are_class_agnostic: Boolean determining if the mask-head is
class-agnostic or not.
convolve_then_upsample: Whether to apply convolutions on mask features
before upsampling using nearest neighbor resizing. Otherwise, mask
features are resized to [`mask_height`, `mask_width`] using bilinear
resizing before applying convolutions.
name: A string name scope to assign to the mask head. If `None`, Keras
will auto-generate one from the class name.
"""
super(MaskRCNNMaskHead, self).__init__(name=name)
self._is_training = is_training
self._freeze_batchnorm = freeze_batchnorm
self._num_classes = num_classes
self._conv_hyperparams = conv_hyperparams
self._mask_height = mask_height
self._mask_width = mask_width
self._mask_prediction_num_conv_layers = mask_prediction_num_conv_layers
self._mask_prediction_conv_depth = mask_prediction_conv_depth
self._masks_are_class_agnostic = masks_are_class_agnostic
self._convolve_then_upsample = convolve_then_upsample
self._mask_predictor_layers = []
def build(self, input_shapes):
num_conv_channels = self._mask_prediction_conv_depth
if num_conv_channels == 0:
num_feature_channels = input_shapes.as_list()[3]
num_conv_channels = self._get_mask_predictor_conv_depth(
num_feature_channels, self._num_classes)
for i in range(self._mask_prediction_num_conv_layers - 1):
self._mask_predictor_layers.append(
tf.keras.layers.Conv2D(
num_conv_channels,
[3, 3],
padding='SAME',
name='MaskPredictor_conv2d_{}'.format(i),
**self._conv_hyperparams.params()))
self._mask_predictor_layers.append(
self._conv_hyperparams.build_batch_norm(
training=(self._is_training and not self._freeze_batchnorm),
name='MaskPredictor_batchnorm_{}'.format(i)))
self._mask_predictor_layers.append(
self._conv_hyperparams.build_activation_layer(
name='MaskPredictor_activation_{}'.format(i)))
if self._convolve_then_upsample:
# Replace Transposed Convolution with a Nearest Neighbor upsampling step
# followed by 3x3 convolution.
height_scale = self._mask_height / input_shapes[1].value
width_scale = self._mask_width / input_shapes[2].value
# pylint: disable=g-long-lambda
self._mask_predictor_layers.append(tf.keras.layers.Lambda(
lambda features: ops.nearest_neighbor_upsampling(
features, height_scale=height_scale, width_scale=width_scale)
))
# pylint: enable=g-long-lambda
self._mask_predictor_layers.append(
tf.keras.layers.Conv2D(
num_conv_channels,
[3, 3],
padding='SAME',
name='MaskPredictor_upsample_conv2d',
**self._conv_hyperparams.params()))
self._mask_predictor_layers.append(
self._conv_hyperparams.build_batch_norm(
training=(self._is_training and not self._freeze_batchnorm),
name='MaskPredictor_upsample_batchnorm'))
self._mask_predictor_layers.append(
self._conv_hyperparams.build_activation_layer(
name='MaskPredictor_upsample_activation'))
num_masks = 1 if self._masks_are_class_agnostic else self._num_classes
self._mask_predictor_layers.append(
tf.keras.layers.Conv2D(
num_masks,
[3, 3],
padding='SAME',
name='MaskPredictor_last_conv2d',
**self._conv_hyperparams.params(use_bias=True)))
self.built = True
def _get_mask_predictor_conv_depth(self,
num_feature_channels,
num_classes,
class_weight=3.0,
feature_weight=2.0):
"""Computes the depth of the mask predictor convolutions.
Computes the depth of the mask predictor convolutions given feature channels
and number of classes by performing a weighted average of the two in
log space to compute the number of convolution channels. The weights that
are used for computing the weighted average do not need to sum to 1.
Args:
num_feature_channels: An integer containing the number of feature
channels.
num_classes: An integer containing the number of classes.
class_weight: Class weight used in computing the weighted average.
feature_weight: Feature weight used in computing the weighted average.
Returns:
An integer containing the number of convolution channels used by mask
predictor.
"""
num_feature_channels_log = math.log(float(num_feature_channels), 2.0)
num_classes_log = math.log(float(num_classes), 2.0)
weighted_num_feature_channels_log = (
num_feature_channels_log * feature_weight)
weighted_num_classes_log = num_classes_log * class_weight
total_weight = feature_weight + class_weight
num_conv_channels_log = round(
(weighted_num_feature_channels_log + weighted_num_classes_log) /
total_weight)
return int(math.pow(2.0, num_conv_channels_log))
def _predict(self, features):
"""Predicts pixelwise foreground scores for regions within the boxes.
Args:
features: A float tensor of shape [batch_size, height, width, channels]
containing features for a batch of images.
Returns:
instance_masks: A float tensor of shape
[batch_size, 1, num_classes, mask_height, mask_width].
"""
if not self._convolve_then_upsample:
features = tf.image.resize_bilinear(
features, [self._mask_height, self._mask_width],
align_corners=True)
mask_predictions = features
for layer in self._mask_predictor_layers:
mask_predictions = layer(mask_predictions)
return tf.expand_dims(
tf.transpose(mask_predictions, perm=[0, 3, 1, 2]),
axis=1,
name='MaskPredictor')
class WeightSharedConvolutionalMaskHead(head.KerasHead):
"""Weight shared convolutional mask prediction head based on Keras."""
def __init__(self,
num_classes,
num_predictions_per_location,
conv_hyperparams,
kernel_size=3,
use_dropout=False,
dropout_keep_prob=0.8,
mask_height=7,
mask_width=7,
masks_are_class_agnostic=False,
name=None):
"""Constructor.
Args:
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
num_predictions_per_location: Number of box predictions to be made per
spatial location. Int specifying number of boxes per location.
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops.
kernel_size: Size of final convolution kernel.
use_dropout: Whether to apply dropout to class prediction head.
dropout_keep_prob: Probability of keeping activations.
mask_height: Desired output mask height. The default value is 7.
mask_width: Desired output mask width. The default value is 7.
masks_are_class_agnostic: Boolean determining if the mask-head is
class-agnostic or not.
name: A string name scope to assign to the model. If `None`, Keras
will auto-generate one from the class name.
"""
super(WeightSharedConvolutionalMaskHead, self).__init__(name=name)
self._num_classes = num_classes
self._num_predictions_per_location = num_predictions_per_location
self._kernel_size = kernel_size
self._use_dropout = use_dropout
self._dropout_keep_prob = dropout_keep_prob
self._mask_height = mask_height
self._mask_width = mask_width
self._masks_are_class_agnostic = masks_are_class_agnostic
self._mask_predictor_layers = []
if self._masks_are_class_agnostic:
self._num_masks = 1
else:
self._num_masks = self._num_classes
num_mask_channels = self._num_masks * self._mask_height * self._mask_width
if self._use_dropout:
self._mask_predictor_layers.append(
tf.keras.layers.Dropout(rate=1.0 - self._dropout_keep_prob))
self._mask_predictor_layers.append(
tf.keras.layers.Conv2D(
num_predictions_per_location * num_mask_channels,
[self._kernel_size, self._kernel_size],
padding='SAME',
name='MaskPredictor',
**conv_hyperparams.params(use_bias=True)))
def _predict(self, features):
"""Predicts boxes.
Args:
features: A float tensor of shape [batch_size, height, width, channels]
containing image features.
Returns:
mask_predictions: A float tensor of shape
[batch_size, num_anchors, num_masks, mask_height, mask_width]
representing the mask predictions for the proposals, where num_masks
is 1 if masks_are_class_agnostic is true and num_classes otherwise.
"""
mask_predictions = features
for layer in self._mask_predictor_layers:
mask_predictions = layer(mask_predictions)
batch_size = features.get_shape().as_list()[0]
if batch_size is None:
batch_size = tf.shape(features)[0]
mask_predictions = tf.reshape(
mask_predictions,
[batch_size, -1, self._num_masks, self._mask_height, self._mask_width])
return mask_predictions
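The reshape at the end of `_predict` collapses the spatial grid and predictions-per-location into a single anchors dimension. A NumPy sketch with the sizes used in the tests below (hypothetical values):

```python
import numpy as np

# A 17x19 feature map with one prediction per location and 7x7 masks for
# 20 classes, mirroring the weight-shared mask head test.
batch, h, w = 64, 17, 19
preds_per_loc, num_masks, mask_h, mask_w = 1, 20, 7, 7

conv_out = np.zeros(
    (batch, h, w, preds_per_loc * num_masks * mask_h * mask_w),
    dtype=np.float32)

# The reshape collapses (h, w, predictions_per_location) into one anchors
# dimension: 17 * 19 * 1 = 323 anchors.
masks = conv_out.reshape(batch, -1, num_masks, mask_h, mask_w)
print(masks.shape)  # → (64, 323, 20, 7, 7)
```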
@@ -123,5 +123,107 @@ class ConvolutionalMaskPredictorTest(test_case.TestCase):
self.assertAllEqual([64, 323, 1, 7, 7],
mask_predictions.get_shape().as_list())
class MaskRCNNMaskHeadTest(test_case.TestCase):
def _build_conv_hyperparams(self,
op_type=hyperparams_pb2.Hyperparams.CONV):
hyperparams = hyperparams_pb2.Hyperparams()
hyperparams_text_proto = """
activation: NONE
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(hyperparams_text_proto, hyperparams)
hyperparams.op = op_type
return hyperparams_builder.KerasLayerHyperparams(hyperparams)
def test_prediction_size(self):
mask_prediction_head = keras_mask_head.MaskRCNNMaskHead(
is_training=True,
num_classes=20,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
mask_height=14,
mask_width=14,
mask_prediction_num_conv_layers=2,
mask_prediction_conv_depth=256,
masks_are_class_agnostic=False)
roi_pooled_features = tf.random_uniform(
[64, 7, 7, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
prediction = mask_prediction_head(roi_pooled_features)
self.assertAllEqual([64, 1, 20, 14, 14], prediction.get_shape().as_list())
def test_prediction_size_with_convolve_then_upsample(self):
mask_prediction_head = keras_mask_head.MaskRCNNMaskHead(
is_training=True,
num_classes=20,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
mask_height=28,
mask_width=28,
mask_prediction_num_conv_layers=2,
mask_prediction_conv_depth=256,
masks_are_class_agnostic=True,
convolve_then_upsample=True)
roi_pooled_features = tf.random_uniform(
[64, 14, 14, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
prediction = mask_prediction_head(roi_pooled_features)
self.assertAllEqual([64, 1, 1, 28, 28], prediction.get_shape().as_list())
class WeightSharedConvolutionalMaskPredictorTest(test_case.TestCase):
def _build_conv_hyperparams(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
activation: NONE
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.KerasLayerHyperparams(conv_hyperparams)
def test_prediction_size(self):
mask_prediction_head = (
keras_mask_head.WeightSharedConvolutionalMaskHead(
num_classes=20,
num_predictions_per_location=1,
conv_hyperparams=self._build_conv_hyperparams(),
mask_height=7,
mask_width=7))
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
mask_predictions = mask_prediction_head(image_feature)
self.assertAllEqual([64, 323, 20, 7, 7],
mask_predictions.get_shape().as_list())
def test_class_agnostic_prediction_size(self):
mask_prediction_head = (
keras_mask_head.WeightSharedConvolutionalMaskHead(
num_classes=20,
num_predictions_per_location=1,
conv_hyperparams=self._build_conv_hyperparams(),
mask_height=7,
mask_width=7,
masks_are_class_agnostic=True))
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
mask_predictions = mask_prediction_head(image_feature)
self.assertAllEqual([64, 323, 1, 7, 7],
mask_predictions.get_shape().as_list())
if __name__ == '__main__':
tf.test.main()
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Mask R-CNN Box Predictor."""
from object_detection.core import box_predictor
BOX_ENCODINGS = box_predictor.BOX_ENCODINGS
CLASS_PREDICTIONS_WITH_BACKGROUND = (
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND)
MASK_PREDICTIONS = box_predictor.MASK_PREDICTIONS
class MaskRCNNKerasBoxPredictor(box_predictor.KerasBoxPredictor):
"""Mask R-CNN Box Predictor.
See Mask R-CNN: He, K., Gkioxari, G., Dollar, P., & Girshick, R. (2017).
Mask R-CNN. arXiv preprint arXiv:1703.06870.
This is used for the second stage of the Mask R-CNN detector where proposals
cropped from an image are arranged along the batch dimension of the input
image_features tensor. Notice that locations are *not* shared across classes,
thus for each anchor, a separate prediction is made for each class.
In addition to predicting boxes and classes, optionally this class allows
predicting masks and/or keypoints inside detection boxes.
Currently this box predictor makes per-class predictions; that is, each
anchor makes a separate box prediction for each class.
"""
def __init__(self,
is_training,
num_classes,
freeze_batchnorm,
box_prediction_head,
class_prediction_head,
third_stage_heads,
name=None):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
freeze_batchnorm: Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
box_prediction_head: The head that predicts the boxes in second stage.
class_prediction_head: The head that predicts the classes in second stage.
third_stage_heads: A dictionary mapping head names to mask rcnn head
classes.
name: A string name scope to assign to the model. If `None`, Keras
will auto-generate one from the class name.
"""
super(MaskRCNNKerasBoxPredictor, self).__init__(
is_training, num_classes, freeze_batchnorm=freeze_batchnorm,
inplace_batchnorm_update=False, name=name)
self._box_prediction_head = box_prediction_head
self._class_prediction_head = class_prediction_head
self._third_stage_heads = third_stage_heads
@property
def num_classes(self):
return self._num_classes
def get_second_stage_prediction_heads(self):
return BOX_ENCODINGS, CLASS_PREDICTIONS_WITH_BACKGROUND
def get_third_stage_prediction_heads(self):
return sorted(self._third_stage_heads.keys())
def _predict(self,
image_features,
prediction_stage=2):
"""Optionally computes encoded object locations, confidences, and masks.
Predicts the heads belonging to the given prediction stage.
Args:
image_features: A list of float tensors of shape
[batch_size, height_i, width_i, channels_i] containing roi pooled
features for each image. The length of the list should be 1 otherwise
a ValueError will be raised.
prediction_stage: Prediction stage. Acceptable values are 2 and 3.
Returns:
A dictionary containing the predicted tensors that are listed in
self._prediction_heads. A subset of the following keys will exist in the
dictionary:
BOX_ENCODINGS: A float tensor of shape
[batch_size, 1, num_classes, code_size] representing the
location of the objects.
CLASS_PREDICTIONS_WITH_BACKGROUND: A float tensor of shape
[batch_size, 1, num_classes + 1] representing the class
predictions for the proposals.
MASK_PREDICTIONS: A float tensor of shape
[batch_size, 1, num_classes, mask_height, mask_width] representing
the mask predictions for the proposals.
Raises:
ValueError: If num_predictions_per_location is not 1 or if
len(image_features) is not 1.
ValueError: if prediction_stage is not 2 or 3.
"""
if len(image_features) != 1:
raise ValueError('length of `image_features` must be 1. Found {}'.format(
len(image_features)))
image_feature = image_features[0]
predictions_dict = {}
if prediction_stage == 2:
predictions_dict[BOX_ENCODINGS] = self._box_prediction_head(image_feature)
predictions_dict[CLASS_PREDICTIONS_WITH_BACKGROUND] = (
self._class_prediction_head(image_feature))
elif prediction_stage == 3:
for prediction_head in self.get_third_stage_prediction_heads():
head_object = self._third_stage_heads[prediction_head]
predictions_dict[prediction_head] = head_object(image_feature)
else:
raise ValueError('prediction_stage should be either 2 or 3.')
return predictions_dict
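The stage dispatch above can be exercised without TensorFlow. This is a minimal sketch with stand-in callables instead of real Keras heads (the head names and lambdas are hypothetical):

```python
BOX_ENCODINGS = 'box_encodings'
CLASS_PREDICTIONS_WITH_BACKGROUND = 'class_predictions_with_background'

def predict(image_features, box_head, class_head, third_stage_heads,
            prediction_stage=2):
    # Mirrors the predictor contract: exactly one feature map, and the
    # returned dict's keys depend on the requested prediction stage.
    if len(image_features) != 1:
        raise ValueError('length of `image_features` must be 1.')
    feature = image_features[0]
    predictions = {}
    if prediction_stage == 2:
        predictions[BOX_ENCODINGS] = box_head(feature)
        predictions[CLASS_PREDICTIONS_WITH_BACKGROUND] = class_head(feature)
    elif prediction_stage == 3:
        for name in sorted(third_stage_heads):
            predictions[name] = third_stage_heads[name](feature)
    else:
        raise ValueError('prediction_stage should be either 2 or 3.')
    return predictions

out = predict(['feat'], box_head=lambda f: 'boxes',
              class_head=lambda f: 'classes',
              third_stage_heads={'mask_predictions': lambda f: 'masks'},
              prediction_stage=3)
print(sorted(out))  # → ['mask_predictions']
```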
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.predictors.mask_rcnn_box_predictor."""
import numpy as np
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import box_predictor_builder
from object_detection.builders import hyperparams_builder
from object_detection.predictors import mask_rcnn_keras_box_predictor as box_predictor
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
class MaskRCNNKerasBoxPredictorTest(test_case.TestCase):
def _build_hyperparams(self,
op_type=hyperparams_pb2.Hyperparams.FC):
hyperparams = hyperparams_pb2.Hyperparams()
hyperparams_text_proto = """
activation: NONE
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(hyperparams_text_proto, hyperparams)
hyperparams.op = op_type
return hyperparams_builder.KerasLayerHyperparams(hyperparams)
def test_get_boxes_with_five_classes(self):
def graph_fn(image_features):
mask_box_predictor = (
box_predictor_builder.build_mask_rcnn_keras_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams=self._build_hyperparams(),
freeze_batchnorm=False,
use_dropout=False,
dropout_keep_prob=0.5,
box_code_size=4,
))
box_predictions = mask_box_predictor(
[image_features],
prediction_stage=2)
return (box_predictions[box_predictor.BOX_ENCODINGS],
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND])
image_features = np.random.rand(2, 7, 7, 3).astype(np.float32)
(box_encodings,
class_predictions_with_background) = self.execute(graph_fn,
[image_features])
self.assertAllEqual(box_encodings.shape, [2, 1, 5, 4])
self.assertAllEqual(class_predictions_with_background.shape, [2, 1, 6])
def test_get_boxes_with_five_classes_share_box_across_classes(self):
def graph_fn(image_features):
mask_box_predictor = (
box_predictor_builder.build_mask_rcnn_keras_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams=self._build_hyperparams(),
freeze_batchnorm=False,
use_dropout=False,
dropout_keep_prob=0.5,
box_code_size=4,
share_box_across_classes=True
))
box_predictions = mask_box_predictor(
[image_features],
prediction_stage=2)
return (box_predictions[box_predictor.BOX_ENCODINGS],
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND])
image_features = np.random.rand(2, 7, 7, 3).astype(np.float32)
(box_encodings,
class_predictions_with_background) = self.execute(graph_fn,
[image_features])
self.assertAllEqual(box_encodings.shape, [2, 1, 1, 4])
self.assertAllEqual(class_predictions_with_background.shape, [2, 1, 6])
def test_get_instance_masks(self):
def graph_fn(image_features):
mask_box_predictor = (
box_predictor_builder.build_mask_rcnn_keras_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams=self._build_hyperparams(),
freeze_batchnorm=False,
use_dropout=False,
dropout_keep_prob=0.5,
box_code_size=4,
conv_hyperparams=self._build_hyperparams(
op_type=hyperparams_pb2.Hyperparams.CONV),
predict_instance_masks=True))
box_predictions = mask_box_predictor(
[image_features],
prediction_stage=3)
return (box_predictions[box_predictor.MASK_PREDICTIONS],)
image_features = np.random.rand(2, 7, 7, 3).astype(np.float32)
mask_predictions = self.execute(graph_fn, [image_features])
self.assertAllEqual(mask_predictions.shape, [2, 1, 5, 14, 14])
def test_do_not_return_instance_masks_without_request(self):
image_features = tf.random_uniform([2, 7, 7, 3], dtype=tf.float32)
mask_box_predictor = (
box_predictor_builder.build_mask_rcnn_keras_box_predictor(
is_training=False,
num_classes=5,
fc_hyperparams=self._build_hyperparams(),
freeze_batchnorm=False,
use_dropout=False,
dropout_keep_prob=0.5,
box_code_size=4))
box_predictions = mask_box_predictor(
[image_features],
prediction_stage=2)
self.assertEqual(len(box_predictions), 2)
self.assertTrue(box_predictor.BOX_ENCODINGS in box_predictions)
self.assertTrue(box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND
in box_predictions)
if __name__ == '__main__':
tf.test.main()
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""RFCN Box Predictor."""
import tensorflow as tf
from object_detection.core import box_predictor
from object_detection.utils import ops
BOX_ENCODINGS = box_predictor.BOX_ENCODINGS
CLASS_PREDICTIONS_WITH_BACKGROUND = (
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND)
MASK_PREDICTIONS = box_predictor.MASK_PREDICTIONS
class RfcnKerasBoxPredictor(box_predictor.KerasBoxPredictor):
"""RFCN Box Predictor.
Applies a position sensitive ROI pooling on position sensitive feature maps to
predict classes and refined locations. See https://arxiv.org/abs/1605.06409
for details.
This is used for the second stage of the RFCN meta architecture. Notice that
locations are *not* shared across classes, thus for each anchor, a separate
prediction is made for each class.
"""
def __init__(self,
is_training,
num_classes,
conv_hyperparams,
freeze_batchnorm,
num_spatial_bins,
depth,
crop_size,
box_code_size,
name=None):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops.
freeze_batchnorm: Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
num_spatial_bins: A list of two integers `[spatial_bins_y,
spatial_bins_x]`.
depth: Target depth to reduce the input feature maps to.
crop_size: A list of two integers `[crop_height, crop_width]`.
box_code_size: Size of encoding for each box.
name: A string name scope to assign to the box predictor. If `None`, Keras
will auto-generate one from the class name.
"""
super(RfcnKerasBoxPredictor, self).__init__(
is_training, num_classes, freeze_batchnorm=freeze_batchnorm,
inplace_batchnorm_update=False, name=name)
self._freeze_batchnorm = freeze_batchnorm
self._conv_hyperparams = conv_hyperparams
self._num_spatial_bins = num_spatial_bins
self._depth = depth
self._crop_size = crop_size
self._box_code_size = box_code_size
# Build the shared layers used for both heads
self._shared_conv_layers = []
self._shared_conv_layers.append(
tf.keras.layers.Conv2D(
self._depth,
[1, 1],
padding='SAME',
name='reduce_depth_conv',
**self._conv_hyperparams.params()))
self._shared_conv_layers.append(
self._conv_hyperparams.build_batch_norm(
training=(self._is_training and not self._freeze_batchnorm),
name='reduce_depth_batchnorm'))
self._shared_conv_layers.append(
self._conv_hyperparams.build_activation_layer(
name='reduce_depth_activation'))
self._box_encoder_layers = []
location_feature_map_depth = (self._num_spatial_bins[0] *
self._num_spatial_bins[1] *
self.num_classes *
self._box_code_size)
self._box_encoder_layers.append(
tf.keras.layers.Conv2D(
location_feature_map_depth,
[1, 1],
padding='SAME',
name='refined_locations_conv',
**self._conv_hyperparams.params()))
self._box_encoder_layers.append(
self._conv_hyperparams.build_batch_norm(
training=(self._is_training and not self._freeze_batchnorm),
name='refined_locations_batchnorm'))
self._class_predictor_layers = []
self._total_classes = self.num_classes + 1 # Account for background class.
class_feature_map_depth = (self._num_spatial_bins[0] *
self._num_spatial_bins[1] *
self._total_classes)
self._class_predictor_layers.append(
tf.keras.layers.Conv2D(
class_feature_map_depth,
[1, 1],
padding='SAME',
name='class_predictions_conv',
**self._conv_hyperparams.params()))
self._class_predictor_layers.append(
self._conv_hyperparams.build_batch_norm(
training=(self._is_training and not self._freeze_batchnorm),
name='class_predictions_batchnorm'))
@property
def num_classes(self):
return self._num_classes
def _predict(self, image_features, proposal_boxes):
"""Computes encoded object locations and corresponding confidences.
Args:
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
proposal_boxes: A float tensor of shape [batch_size, num_proposals,
box_code_size].
Returns:
box_encodings: A list of float tensors of shape
[batch_size, num_anchors_i, q, code_size] representing the location of
the objects, where q is 1 or the number of classes. Each entry in the
list corresponds to a feature map in the input `image_features` list.
class_predictions_with_background: A list of float tensors of shape
[batch_size, num_anchors_i, num_classes + 1] representing the class
predictions for the proposals. Each entry in the list corresponds to a
feature map in the input `image_features` list.
Raises:
ValueError: if num_predictions_per_location is not 1 or if
len(image_features) is not 1.
"""
if len(image_features) != 1:
raise ValueError('length of `image_features` must be 1. Found {}'.
format(len(image_features)))
image_feature = image_features[0]
batch_size = tf.shape(proposal_boxes)[0]
num_boxes = tf.shape(proposal_boxes)[1]
net = image_feature
for layer in self._shared_conv_layers:
net = layer(net)
# Location predictions.
box_net = net
for layer in self._box_encoder_layers:
box_net = layer(box_net)
box_encodings = ops.batch_position_sensitive_crop_regions(
box_net,
boxes=proposal_boxes,
crop_size=self._crop_size,
num_spatial_bins=self._num_spatial_bins,
global_pool=True)
box_encodings = tf.squeeze(box_encodings, axis=[2, 3])
box_encodings = tf.reshape(box_encodings,
[batch_size * num_boxes, 1, self.num_classes,
self._box_code_size])
# Class predictions.
class_net = net
for layer in self._class_predictor_layers:
class_net = layer(class_net)
class_predictions_with_background = (
ops.batch_position_sensitive_crop_regions(
class_net,
boxes=proposal_boxes,
crop_size=self._crop_size,
num_spatial_bins=self._num_spatial_bins,
global_pool=True))
class_predictions_with_background = tf.squeeze(
class_predictions_with_background, axis=[2, 3])
class_predictions_with_background = tf.reshape(
class_predictions_with_background,
[batch_size * num_boxes, 1, self._total_classes])
return {BOX_ENCODINGS: [box_encodings],
CLASS_PREDICTIONS_WITH_BACKGROUND:
[class_predictions_with_background]}
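The 1x1 conv depths in the RFCN predictor follow directly from position-sensitive cropping: each spatial bin carries its own slice of the predictions. A sketch of the arithmetic, using the values from the test below (illustrative):

```python
# Position-sensitive feature map depths for RFCN: each of the
# spatial_bins_y * spatial_bins_x bins gets its own copy of the outputs.
num_spatial_bins = (3, 3)
num_classes = 2
box_code_size = 4

location_depth = (num_spatial_bins[0] * num_spatial_bins[1] *
                  num_classes * box_code_size)
class_depth = (num_spatial_bins[0] * num_spatial_bins[1] *
               (num_classes + 1))  # +1 accounts for the background class.

print(location_depth, class_depth)  # → 72 27
```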
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.predictors.rfcn_box_predictor."""
import numpy as np
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.predictors import rfcn_keras_box_predictor as box_predictor
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
class RfcnKerasBoxPredictorTest(test_case.TestCase):
def _build_conv_hyperparams(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.KerasLayerHyperparams(conv_hyperparams)
def test_get_correct_box_encoding_and_class_prediction_shapes(self):
def graph_fn(image_features, proposal_boxes):
rfcn_box_predictor = box_predictor.RfcnKerasBoxPredictor(
is_training=False,
num_classes=2,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
num_spatial_bins=[3, 3],
depth=4,
crop_size=[12, 12],
box_code_size=4
)
box_predictions = rfcn_box_predictor(
[image_features],
proposal_boxes=proposal_boxes)
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
proposal_boxes = np.random.rand(4, 2, 4).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features, proposal_boxes])
self.assertAllEqual(box_encodings.shape, [8, 1, 2, 4])
self.assertAllEqual(class_predictions_with_background.shape, [8, 1, 3])
if __name__ == '__main__':
tf.test.main()
@@ -2,9 +2,10 @@ syntax = "proto2";
package object_detection.protos;
import "object_detection/protos/flexible_grid_anchor_generator.proto";
import "object_detection/protos/grid_anchor_generator.proto";
import "object_detection/protos/ssd_anchor_generator.proto";
import "object_detection/protos/multiscale_anchor_generator.proto";
import "object_detection/protos/ssd_anchor_generator.proto";
// Configuration proto for the anchor generator to use in the object detection
// pipeline. See core/anchor_generator.py for details.
@@ -13,5 +14,6 @@ message AnchorGenerator {
GridAnchorGenerator grid_anchor_generator = 1;
SsdAnchorGenerator ssd_anchor_generator = 2;
MultiscaleAnchorGenerator multiscale_anchor_generator = 3;
FlexibleGridAnchorGenerator flexible_grid_anchor_generator = 4;
}
}
@@ -15,7 +15,6 @@ message BoxPredictor {
}
}
// Configuration proto for Convolutional box predictor.
// Next id: 13
message ConvolutionalBoxPredictor {
@@ -57,6 +56,13 @@ message ConvolutionalBoxPredictor {
// Whether to use depthwise separable convolution for box predictor layers.
optional bool use_depthwise = 11 [default = false];
// If specified, apply clipping to box encodings.
message BoxEncodingsClipRange {
optional float min = 1;
optional float max = 2;
}
optional BoxEncodingsClipRange box_encodings_clip_range = 12;
}
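A sketch of how a `BoxEncodingsClipRange { min: -10 max: 10 }` setting could be applied to raw box encodings; the clip bounds and values here are illustrative, not defaults from the proto:

```python
import numpy as np

# Clip raw box encodings into the configured [min, max] range, as the
# box_encodings_clip_range option enables for TPU compatibility.
encodings = np.array([-25.0, -3.2, 0.0, 7.9, 40.0])
clipped = np.clip(encodings, -10.0, 10.0)
print(clipped.tolist())  # → [-10.0, -3.2, 0.0, 7.9, 10.0]
```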
// Configuration proto for weight shared convolutional box predictor.
@@ -118,6 +124,8 @@ message WeightSharedConvolutionalBoxPredictor {
optional BoxEncodingsClipRange box_encodings_clip_range = 17;
}
// TODO(alirezafathi): Refactor the proto file to be able to configure mask rcnn
// head easily.
// Next id: 15
syntax = "proto2";
package object_detection.protos;
message FlexibleGridAnchorGenerator {
repeated AnchorGrid anchor_grid = 1;
// Whether to produce anchors in normalized coordinates.
optional bool normalize_coordinates = 2 [default = true];
}
message AnchorGrid {
// The base sizes in pixels for each anchor in this anchor layer.
repeated float base_sizes = 1;
// The aspect ratios for each anchor in this anchor layer.
repeated float aspect_ratios = 2;
// The anchor height stride in pixels.
optional uint32 height_stride = 3;
// The anchor width stride in pixels.
optional uint32 width_stride = 4;
// The anchor height offset in pixels.
optional uint32 height_offset = 5 [default = 0];
// The anchor width offset in pixels.
optional uint32 width_offset = 6 [default = 0];
}
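The stride and offset fields above determine where anchor centers fall on a feature map. A sketch with illustrative values (grid size and strides are hypothetical):

```python
# Anchor centers from an AnchorGrid's stride/offset fields: each grid cell
# (y, x) maps to a pixel-space center at offset + index * stride.
height_stride, width_stride = 16, 16
height_offset, width_offset = 0, 0
grid_height, grid_width = 3, 3

centers = [(height_offset + y * height_stride,
            width_offset + x * width_stride)
           for y in range(grid_height) for x in range(grid_width)]
print(centers[:3])  # → [(0, 0), (0, 16), (0, 32)]
```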
@@ -20,4 +20,7 @@ message Quantization {
// Number of bits to use for quantizing activations.
// Only 8 bit is supported for now.
optional int32 activation_bits = 3 [default = 8];
// Whether to use symmetric weight quantization.
optional bool symmetric = 4 [default = false];
}
@@ -9,6 +9,7 @@ message ImageResizer {
KeepAspectRatioResizer keep_aspect_ratio_resizer = 1;
FixedShapeResizer fixed_shape_resizer = 2;
IdentityResizer identity_resizer = 3;
ConditionalShapeResizer conditional_shape_resizer = 4;
}
}
@@ -61,3 +62,31 @@ message FixedShapeResizer {
// Whether to also resize the image channels from 3 to 1 (RGB to grayscale).
optional bool convert_to_grayscale = 4 [default = false];
}
// Configuration proto for image resizer that resizes only if input image height
// or width is greater or smaller than a certain size.
// Aspect ratio is maintained.
message ConditionalShapeResizer {
// Enumeration for the condition on which to resize an image.
enum ResizeCondition {
INVALID = 0; // Default value.
GREATER = 1; // Resizes image if a dimension is greater than specified size.
SMALLER = 2; // Resizes image if a dimension is smaller than specified size.
}
// Condition which must be true to resize the image.
optional ResizeCondition condition = 1 [default = GREATER];
// Threshold for the image size. If any image dimension is above or below this
// (as specified by condition) the image will be resized so that it meets the
// threshold.
optional int32 size_threshold = 2 [default = 300];
// Desired method when resizing image.
optional ResizeType resize_method = 3 [default = BILINEAR];
// Whether to also resize the image channels from 3 to 1 (RGB to grayscale).
optional bool convert_to_grayscale = 4 [default = false];
}
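The resizer's semantics can be sketched as follows: resize only when the relevant dimension crosses `size_threshold`, scaling both sides to preserve aspect ratio. This is an assumed reading of the proto (GREATER checks the largest dimension, SMALLER the smallest), not the library implementation:

```python
def conditional_resize(height, width, size_threshold=300, condition='GREATER'):
    """Returns the (height, width) a ConditionalShapeResizer would produce."""
    if condition == 'GREATER':
        largest = max(height, width)
        if largest <= size_threshold:
            return height, width      # Already within bounds; no resize.
        scale = size_threshold / largest
    elif condition == 'SMALLER':
        smallest = min(height, width)
        if smallest >= size_threshold:
            return height, width
        scale = size_threshold / smallest
    else:
        raise ValueError('condition must be GREATER or SMALLER')
    return round(height * scale), round(width * scale)

print(conditional_resize(600, 400))  # → (300, 200)
print(conditional_resize(200, 100))  # → (200, 100), unchanged
```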
syntax = "proto2";
package object_detection.protos;
import "object_detection/protos/anchor_generator.proto";
@@ -6,15 +7,14 @@ import "object_detection/protos/box_coder.proto";
import "object_detection/protos/box_predictor.proto";
import "object_detection/protos/hyperparams.proto";
import "object_detection/protos/image_resizer.proto";
import "object_detection/protos/matcher.proto";
import "object_detection/protos/losses.proto";
import "object_detection/protos/matcher.proto";
import "object_detection/protos/post_processing.proto";
import "object_detection/protos/region_similarity_calculator.proto";
// Configuration for Single Shot Detection (SSD) models.
// Next id: 26
message Ssd {
// Number of classes to predict.
optional int32 num_classes = 1;
@@ -114,8 +114,8 @@ message Ssd {
// features and the number of classes.
optional int32 mask_prediction_conv_depth = 4 [default = 256];
// The number of convolutions applied to image_features in the mask
// prediction branch.
optional int32 mask_prediction_num_conv_layers = 5 [default = 2];
// Whether to apply convolutions on mask features before upsampling using
@@ -125,10 +125,10 @@ message Ssd {
optional bool convolve_then_upsample_masks = 6 [default = false];
// Mask loss weight.
optional float mask_loss_weight = 7 [default = 5.0];
// Number of boxes to be generated at training time for computing mask loss.
optional int32 mask_loss_sample_size = 8 [default = 16];
// Hyperparameters for convolution ops used in the box predictor.
optional Hyperparams conv_hyperparams = 9;
# SSD with Mobilenet v1, configured for Oxford-IIIT Pets Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
# TPU-compatible for both training and inference
model {
ssd {
num_classes: 37
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
}
}
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 1
box_code_size: 4
apply_sigmoid_to_scores: false
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
}
}
feature_extractor {
type: 'ssd_mobilenet_v1'
min_depth: 16
depth_multiplier: 1.0
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
}
loss {
classification_loss {
weighted_sigmoid_focal {
alpha: 0.75
gamma: 2.0
}
}
localization_loss {
weighted_smooth_l1 {
}
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
use_static_shapes: true
}
score_converter: SIGMOID
}
}
}
train_config: {
batch_size: 24
optimizer {
rms_prop_optimizer: {
learning_rate: {
exponential_decay_learning_rate {
initial_learning_rate: 0.004
decay_steps: 800720
decay_factor: 0.95
}
}
momentum_optimizer_value: 0.9
decay: 0.9
epsilon: 1.0
}
}
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
from_detection_checkpoint: true
load_all_detection_checkpoint_vars: true
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
num_steps: 200000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
max_number_of_boxes: 50
unpad_groundtruth_tensors: false
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/pet_faces_train.record-?????-of-00010"
}
label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt"
}
eval_config: {
metrics_set: "coco_detection_metrics"
num_examples: 1101
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/pet_faces_val.record-?????-of-00010"
}
label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt"
shuffle: false
num_readers: 1
}
# SSD with Mobilenet v2 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
ssd {
num_classes: 90
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
height_stride: 16
height_stride: 32
height_stride: 64
height_stride: 128
height_stride: 256
height_stride: 512
width_stride: 16
width_stride: 32
width_stride: 64
width_stride: 128
width_stride: 256
width_stride: 512
}
}
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 320
max_dimension: 640
}
}
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 1
box_code_size: 4
apply_sigmoid_to_scores: false
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
}
}
feature_extractor {
type: 'ssd_mobilenet_v2'
min_depth: 16
depth_multiplier: 1.0
use_explicit_padding: true
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
}
loss {
classification_loss {
weighted_sigmoid {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
hard_example_miner {
num_hard_examples: 3000
iou_threshold: 0.99
loss_type: CLASSIFICATION
max_negatives_per_positive: 3
min_negatives_per_image: 3
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
}
}
train_config: {
batch_size: 24
optimizer {
rms_prop_optimizer: {
learning_rate: {
exponential_decay_learning_rate {
initial_learning_rate: 0.004
decay_steps: 800720
decay_factor: 0.95
}
}
momentum_optimizer_value: 0.9
decay: 0.9
epsilon: 1.0
}
}
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
fine_tune_checkpoint_type: "detection"
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient to train the COCO dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
num_steps: 200000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop_fixed_aspect_ratio {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-?????-of-00100"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
eval_config: {
num_examples: 8000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-?????-of-00010"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
shuffle: false
num_readers: 1
}
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Python binary for exporting SavedModel, tailored for TPU inference."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from object_detection.tpu_exporters import export_saved_model_tpu_lib
flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_string('pipeline_config_file', None,
'A pipeline_pb2.TrainEvalPipelineConfig config file.')
flags.DEFINE_string(
'ckpt_path', None, 'Path to trained checkpoint, typically of the form '
'path/to/model.ckpt')
flags.DEFINE_string('export_dir', None, 'Path to export SavedModel.')
flags.DEFINE_string('input_placeholder_name', 'placeholder_tensor',
'Name of input placeholder in model\'s signature_def_map.')
flags.DEFINE_string(
'input_type', 'tf_example', 'Type of input node. Can be '
'one of [`image_tensor`, `encoded_image_string_tensor`, '
'`tf_example`]')
flags.DEFINE_boolean('use_bfloat16', False, 'If true, use tf.bfloat16 on TPU.')
def main(argv):
if len(argv) > 1:
raise tf.app.UsageError('Too many command-line arguments.')
export_saved_model_tpu_lib.export(FLAGS.pipeline_config_file, FLAGS.ckpt_path,
FLAGS.export_dir,
FLAGS.input_placeholder_name,
FLAGS.input_type, FLAGS.use_bfloat16)
if __name__ == '__main__':
tf.app.flags.mark_flag_as_required('pipeline_config_file')
tf.app.flags.mark_flag_as_required('ckpt_path')
tf.app.flags.mark_flag_as_required('export_dir')
tf.app.run()
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Python library for exporting SavedModel, tailored for TPU inference."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from google.protobuf import text_format
# pylint: disable=g-direct-tensorflow-import
from tensorflow.python.saved_model import loader
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
# pylint: enable=g-direct-tensorflow-import
from object_detection.protos import pipeline_pb2
from object_detection.tpu_exporters import faster_rcnn
from object_detection.tpu_exporters import ssd
model_map = {
'faster_rcnn': faster_rcnn,
'ssd': ssd,
}
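# The model_map dict above dispatches each exporter call on the pipeline's
# meta-architecture name. A minimal sketch of that pattern, where
# FasterRcnnStub and SsdStub are hypothetical stand-ins for the real
# faster_rcnn and ssd exporter modules:

```python
class FasterRcnnStub(object):
  """Hypothetical stand-in for object_detection.tpu_exporters.faster_rcnn."""

  @staticmethod
  def get_prediction_tensor_shapes(pipeline_config):
    return {'meta_arch': 'faster_rcnn'}


class SsdStub(object):
  """Hypothetical stand-in for object_detection.tpu_exporters.ssd."""

  @staticmethod
  def get_prediction_tensor_shapes(pipeline_config):
    return {'meta_arch': 'ssd'}


stub_model_map = {
    'faster_rcnn': FasterRcnnStub,
    'ssd': SsdStub,
}


def shapes_for(meta_arch, pipeline_config=None):
  # A KeyError here signals an unsupported meta-architecture.
  return stub_model_map[meta_arch].get_prediction_tensor_shapes(pipeline_config)
```

# The same lookup-then-delegate shape is what export() and run_inference()
# below rely on when they call model_map[meta_arch].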
def parse_pipeline_config(pipeline_config_file):
"""Returns pipeline config and meta architecture name."""
with tf.gfile.GFile(pipeline_config_file, 'r') as config_file:
config_str = config_file.read()
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
text_format.Merge(config_str, pipeline_config)
meta_arch = pipeline_config.model.WhichOneof('model')
return pipeline_config, meta_arch
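# parse_pipeline_config relies on text_format.Merge plus WhichOneof to
# recover the meta-architecture name. Purely as an illustration of what
# that oneof lookup yields, here is a toy string-based extraction (not
# robust to comments or nesting; the real code uses the protobuf API):

```python
import re


def meta_arch_from_config_str(config_str):
  """Toy extraction of the `model` oneof name ('ssd' or 'faster_rcnn')."""
  m = re.search(r'model\s*\{\s*(\w+)\s*\{', config_str)
  return m.group(1) if m else None
```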
def export(pipeline_config_file,
ckpt_path,
export_dir,
input_placeholder_name='placeholder_tensor',
input_type='encoded_image_string_tensor',
use_bfloat16=False):
"""Exports as SavedModel.
Args:
pipeline_config_file: Pipeline config file name.
ckpt_path: Training checkpoint path.
export_dir: Directory to export SavedModel.
input_placeholder_name: input placeholder's name in SavedModel signature.
input_type: One of
'encoded_image_string_tensor': a 1d tensor with dtype=tf.string
'image_tensor': a 4d tensor with dtype=tf.uint8
'tf_example': a 1d tensor with dtype=tf.string
use_bfloat16: If true, use tf.bfloat16 on TPU.
"""
pipeline_config, meta_arch = parse_pipeline_config(pipeline_config_file)
shapes_info = model_map[meta_arch].get_prediction_tensor_shapes(
pipeline_config)
with tf.Graph().as_default(), tf.Session() as sess:
placeholder_tensor, result_tensor_dict = model_map[meta_arch].build_graph(
pipeline_config, shapes_info, input_type, use_bfloat16)
saver = tf.train.Saver()
init_op = tf.global_variables_initializer()
sess.run(init_op)
if ckpt_path is not None:
saver.restore(sess, ckpt_path)
# Export the SavedModel.
builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
tensor_info_inputs = {
input_placeholder_name:
tf.saved_model.utils.build_tensor_info(placeholder_tensor)
}
tensor_info_outputs = {
k: tf.saved_model.utils.build_tensor_info(v)
for k, v in result_tensor_dict.items()
}
detection_signature = (
tf.saved_model.signature_def_utils.build_signature_def(
inputs=tensor_info_inputs,
outputs=tensor_info_outputs,
method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))
tf.logging.info('Inputs:\n{}\nOutputs:{}\nPredict method name:{}'.format(
tensor_info_inputs, tensor_info_outputs,
tf.saved_model.signature_constants.PREDICT_METHOD_NAME))
# Graph for TPU.
builder.add_meta_graph_and_variables(
sess, [
tf.saved_model.tag_constants.SERVING,
tf.saved_model.tag_constants.TPU
],
signature_def_map={
tf.saved_model.signature_constants
.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
detection_signature,
},
strip_default_attrs=True)
# Graph for CPU, this is for passing infra validation.
builder.add_meta_graph(
[tf.saved_model.tag_constants.SERVING],
signature_def_map={
tf.saved_model.signature_constants
.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
detection_signature,
},
strip_default_attrs=True)
builder.save(as_text=False)
tf.logging.info('Model saved to {}'.format(export_dir))
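# export() above writes two meta graphs into one SavedModel: a TPU graph
# tagged [SERVING, TPU] and a CPU graph tagged [SERVING] for infra
# validation. A toy model of how a loader selects a meta graph by exact
# tag set, with plain dicts standing in for the SavedModel machinery:

```python
# Illustrative tag strings; the real values come from
# tf.saved_model.tag_constants.
SERVING, TPU = 'serve', 'tpu'

toy_meta_graphs = {
    frozenset([SERVING, TPU]): 'tpu_graph',
    frozenset([SERVING]): 'cpu_graph',
}


def load_meta_graph(tags):
  # loader.load() matches the requested tag set exactly, so asking for
  # [SERVING] does not return the [SERVING, TPU] graph.
  return toy_meta_graphs[frozenset(tags)]
```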
def run_inference(inputs,
pipeline_config_file,
ckpt_path,
input_type='encoded_image_string_tensor',
use_bfloat16=False,
repeat=1):
"""Runs inference on TPU.
Args:
inputs: Input image matching the specified `input_type`.
pipeline_config_file: Pipeline config file name.
ckpt_path: Training checkpoint path.
input_type: One of
'encoded_image_string_tensor': a 1d tensor with dtype=tf.string
'image_tensor': a 4d tensor with dtype=tf.uint8
'tf_example': a 1d tensor with dtype=tf.string
use_bfloat16: If true, use tf.bfloat16 on TPU.
repeat: Number of times to repeat running the provided input for profiling.
Returns:
A dict of resulting tensors.
"""
pipeline_config, meta_arch = parse_pipeline_config(pipeline_config_file)
shapes_info = model_map[meta_arch].get_prediction_tensor_shapes(
pipeline_config)
with tf.Graph().as_default(), tf.Session() as sess:
placeholder_tensor, result_tensor_dict = model_map[meta_arch].build_graph(
pipeline_config, shapes_info, input_type, use_bfloat16)
saver = tf.train.Saver()
init_op = tf.global_variables_initializer()
sess.run(tf.contrib.tpu.initialize_system())
sess.run(init_op)
if ckpt_path is not None:
saver.restore(sess, ckpt_path)
for _ in range(repeat):
tensor_dict_out = sess.run(
result_tensor_dict, feed_dict={placeholder_tensor: [inputs]})
sess.run(tf.contrib.tpu.shutdown_system())
return tensor_dict_out
def run_inference_from_saved_model(inputs,
saved_model_dir,
input_placeholder_name='placeholder_tensor',
repeat=1):
"""Loads saved model and run inference on TPU.
Args:
inputs: Input image matching the specified `input_type`.
saved_model_dir: Directory containing the exported SavedModel.
input_placeholder_name: input placeholder's name in SavedModel signature.
repeat: Number of times to repeat running the provided input for profiling.
Returns:
A dict of resulting tensors.
"""
with tf.Graph().as_default(), tf.Session() as sess:
meta_graph = loader.load(sess, [tag_constants.SERVING, tag_constants.TPU],
saved_model_dir)
sess.run(tf.contrib.tpu.initialize_system())
key_prediction = signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
tensor_name_input = (
meta_graph.signature_def[key_prediction].inputs[input_placeholder_name]
.name)
tensor_name_output = {
k: v.name
for k, v in (meta_graph.signature_def[key_prediction].outputs.items())
}
for _ in range(repeat):
tensor_dict_out = sess.run(
tensor_name_output, feed_dict={tensor_name_input: [inputs]})
sess.run(tf.contrib.tpu.shutdown_system())
return tensor_dict_out
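# run_inference_from_saved_model resolves graph tensor names through the
# signature def: inputs[input_placeholder_name].name for the feed and
# outputs[k].name for each fetch. A plain-dict sketch of that lookup
# (tensor names here are hypothetical, not the exporter's actual names):

```python
toy_signature_def = {
    'serving_default': {
        'inputs': {'placeholder_tensor': 'Placeholder:0'},
        'outputs': {
            'detection_boxes': 'Postprocessor/boxes:0',
            'detection_scores': 'Postprocessor/scores:0',
        },
    },
}


def resolve_tensor_names(signature_def, key, input_placeholder_name):
  """Returns (input tensor name, {output key: output tensor name})."""
  sig = signature_def[key]
  return sig['inputs'][input_placeholder_name], dict(sig['outputs'])
```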