Commit 0ba83cf0 authored by pkulzc's avatar pkulzc Committed by Sergio Guadarrama

Release MobileNet V3 models and SSDLite models with MobileNet V3 backbone. (#7678)

* Merged commit includes the following changes:
275131829  by Sergio Guadarrama:

    Updates mobilenet/README.md to be GitHub compatible, adds a V2+ reference to the mobilenet_v1.md file, and fixes invalid markdown.

--
274908068  by Sergio Guadarrama:

    Opensource MobilenetV3 detection models.

--
274697808  by Sergio Guadarrama:

    Fixed cases where tf.TensorShape was constructed with float dimensions

    This is a prerequisite for making TensorShape and Dimension more strict
    about the types of their arguments.

--
273577462  by Sergio Guadarrama:

    Fixing `conv_defs['defaults']` override issue.

--
272801298  by Sergio Guadarrama:

    Adds links to trained models for MobileNet V3, and adds a version of minimalistic MobileNet V3 to the definitions.

--
268928503  by Sergio Guadarrama:

    Mobilenet v2 with group normalization.

--
263492735  by Sergio Guadarrama:

    Internal change

260037126  by Sergio Guadarrama:

    Adds an option of using a custom depthwise operation in `expanded_conv`.

--
259997001  by Sergio Guadarrama:

    Explicitly mark Python binaries/tests with python_version = "PY2".

--
252697685  by Sergio Guadarrama:

    Internal change

251918746  by Sergio Guadarrama:

    Internal change

251909704  by Sergio Guadarrama:

    Mobilenet V3 backbone implementation.

--
247510236  by Sergio Guadarrama:

    Internal change

246196802  by Sergio Guadarrama:

    Internal change

246014539  by Sergio Guadarrama:

    Internal change

245891435  by Sergio Guadarrama:

    Internal change

245834925  by Sergio Guadarrama:

    n/a

--

PiperOrigin-RevId: 275131829

* Merged commit includes the following changes:
274959989  by Zhichao Lu:

    Update detection model zoo with MobilenetV3 SSD candidates.

--
274908068  by Zhichao Lu:

    Opensource MobilenetV3 detection models.

--
274695889  by richardmunoz:

    RandomPatchGaussian preprocessing step

    This step can be used during model training to randomly apply gaussian noise to a random image patch. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_patch_gaussian {
          random_coef: 0.5
          min_patch_size: 1
          max_patch_size: 250
          min_gaussian_stddev: 0.0
          max_gaussian_stddev: 1.0
        }
      }
      ...
    }

--
274257872  by lzc:

    Internal change.

--
274114689  by Zhichao Lu:

    Pass native_resize flag to other FPN variants.

--
274112308  by lzc:

    Internal change.

--
274090763  by richardmunoz:

    Util function for getting a patch mask on an image for use with the Object Detection API

--
274069806  by Zhichao Lu:

    Adding functions which will help compute predictions and losses for CenterNet.

--
273860828  by lzc:

    Internal change.

--
273380069  by richardmunoz:

    RandomImageDownscaleToTargetPixels preprocessing step

    This step can be used during model training to randomly downscale an image to a random target number of pixels. If the image does not contain more than the target number of pixels, then downscaling is skipped. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_downscale_to_target_pixels {
          random_coef: 0.5
          min_target_pixels: 300000
          max_target_pixels: 500000
        }
      }
      ...
    }

--
272987602  by Zhichao Lu:

    Avoid -inf when empty box list is passed.

--
272525836  by Zhichao Lu:

    Cleanup repeated resizing code in meta archs.

--
272458667  by richardmunoz:

    RandomJpegQuality preprocessing step

    This step can be used during model training to randomly encode the image into a jpeg with a random quality level. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_jpeg_quality {
          random_coef: 0.5
          min_jpeg_quality: 80
          max_jpeg_quality: 100
        }
      }
      ...
    }

--
271412717  by Zhichao Lu:

    Enables TPU training with the V2 eager + tf.function Object Detection training loops.

--
270744153  by Zhichao Lu:

    Adding the offset and size target assigners for CenterNet.

--
269916081  by Zhichao Lu:

    Include basic installation in Object Detection API tutorial.
    Also:
     - Use TF2.0
     - Use saved_model

--
269376056  by Zhichao Lu:

    Fix to variable loading in RetinaNet w/ custom loops (makes the code rely a little less on the exact name scopes that are generated).

--
269256251  by lzc:

    Add a use_partitioned_nms field to the config and update post_processing_builder to honor that flag when building the NMS function.
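
    For illustration, the flag would sit in the pipeline config roughly as
    below (the placement alongside the other NMS options is an assumption
    based on this description, not confirmed by this diff):

        model {
          ssd {
            post_processing {
              batch_non_max_suppression {
                ...
                use_partitioned_nms: true
              }
            }
          }
        }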

--
268865295  by Zhichao Lu:

    Adding functionality for importing and merging back internal state of the metric.

--
268640984  by Zhichao Lu:

    Fix computation of gaussian sigma value to create CenterNet heatmap target.

--
267475576  by Zhichao Lu:

    Fix for exporter trying to export non-existent exponential moving averages.

--
267286768  by Zhichao Lu:

    Update mixed-precision policy.

--
266166879  by Zhichao Lu:

    Internal change

265860884  by Zhichao Lu:

    Apply floor function to center coordinates when creating heatmap for CenterNet target.

--
265702749  by Zhichao Lu:

    Internal change

--
264241949  by ronnyvotel:

    Updating Faster R-CNN 'final_anchors' to be in normalized coordinates.

--
264175192  by lzc:

    Update model_fn to only read hparams if it is not None.

--
264159328  by Zhichao Lu:

    Modify nearest neighbor upsampling to eliminate a multiply operation. For quantized models, the multiply gets unnecessarily quantized and reduces accuracy; simple stacking works in place of the broadcast multiply and requires no quantization. Also removes an unnecessary reshape op.
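
    As a sketch, the stacking-based upsampling this refers to looks roughly
    like the following (a 2x factor and static shapes are assumed; it mirrors
    the stack + reshape pattern exercised in the exporter test later in this
    diff):

        import tensorflow as tf

        def nearest_neighbor_upsample_2x(x):
          """2x nearest-neighbor upsampling via stack + reshape, no multiply."""
          batch, height, width, channels = x.shape.as_list()
          # Duplicate each pixel along new height and width axes, then flatten.
          stacked = tf.stack([tf.stack([x] * 2, axis=3)] * 2, axis=2)
          return tf.reshape(stacked, [batch, 2 * height, 2 * width, channels])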

--
263668306  by Zhichao Lu:

    Add the option to use dynamic map_fn for batch NMS

--
263031163  by Zhichao Lu:

    Mark outside compilation for NMS as optional.

--
263024916  by Zhichao Lu:

    Add an ExperimentalModel meta arch for experimenting with new model types.

--
262655894  by Zhichao Lu:

    Add the center heatmap target assigner for CenterNet
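
    Roughly, a CenterNet-style heatmap target places a Gaussian at each
    (floored) object center and keeps the per-pixel maximum where objects
    overlap. A minimal numpy sketch (names and the exact sigma computation are
    illustrative, not the API added here):

        import numpy as np

        def splat_center(heatmap, center_x, center_y, sigma):
          # Centers are floored to integer pixels (see change 265860884 above).
          cx, cy = int(np.floor(center_x)), int(np.floor(center_y))
          ys, xs = np.mgrid[0:heatmap.shape[0], 0:heatmap.shape[1]]
          gauss = np.exp(-((xs - cx)**2 + (ys - cy)**2) / (2.0 * sigma**2))
          # Overlapping objects keep the element-wise maximum.
          np.maximum(heatmap, gauss, out=heatmap)
          return heatmap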

--
262431036  by Zhichao Lu:

    Adding add_eval_dict to allow for evaluation on model_v2

--
262035351  by ronnyvotel:

    Removing any non-Tensor predictions from the third stage of Mask R-CNN.

--
261953416  by Zhichao Lu:

    Internal change.

--
261834966  by Zhichao Lu:

    Fix the NMS OOM issue on TPU by forcing NMS to run outside of TPU.
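
    The mechanism, roughly: wrap the NMS subgraph in outside compilation so it
    runs on the host CPU rather than the TPU core. A hedged sketch (the helper
    name is illustrative and this assumes the TF 1.x tpu API):

        import tensorflow as tf

        def host_nms(boxes, scores, max_output_size=100):
          def _nms(boxes, scores):
            selected = tf.image.non_max_suppression(boxes, scores,
                                                    max_output_size)
            return tf.gather(boxes, selected)
          # outside_compilation ships this subgraph to the host, so the large
          # intermediate NMS tensors never occupy TPU memory.
          return tf.tpu.outside_compilation(_nms, boxes, scores)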

--
261775941  by Zhichao Lu:

    Make Keras InputLayer compatible with both TF 1.x and TF 2.0.

--
261775633  by Zhichao Lu:

    Visualize additional channels with ground-truth bounding boxes.

--
261768117  by lzc:

    Internal change.

--
261766773  by ronnyvotel:

    Exposing `return_raw_detections_during_predict` in Faster R-CNN Proto.

--
260975089  by ronnyvotel:

    Moving calculation of batched prediction tensor names after all tensors in prediction dictionary are created.

--
259816913  by ronnyvotel:

    Adding raw detection boxes and feature map indices to SSD

--
259791955  by Zhichao Lu:

    Added a flag to control the use of partitioned_non_max_suppression.

--
259580475  by Zhichao Lu:

    Tweak quantization-aware training re-writer to support NasFpn model architecture.

--
259579943  by rathodv:

    Add a meta target assigner proto and builders in OD API.

--
259577741  by Zhichao Lu:

    Internal change.

--
259366315  by lzc:

    Internal change.

--
259344310  by ronnyvotel:

    Updating faster rcnn so that raw_detection_boxes from predict() are in normalized coordinates.

--
259338670  by Zhichao Lu:

    Add support for use_native_resize_op to more feature extractors. Use dynamic shapes when static shapes are not available.
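
    The dynamic-shape fallback mentioned here follows a common pattern in this
    codebase; a sketch (assuming TF 1.x, helper name illustrative):

        import tensorflow as tf

        def combined_static_and_dynamic_shape(tensor):
          # Prefer statically known dimensions; fall back to runtime values.
          static = tensor.shape.as_list()
          dynamic = tf.shape(tensor)
          return [static[i] if static[i] is not None else dynamic[i]
                  for i in range(len(static))]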

--
259083543  by ronnyvotel:

    Updating/fixing documentation.

--
259078937  by rathodv:

    Add prediction fields for tensors returned from detection_model.predict.

--
259044601  by Zhichao Lu:

    Add protocol buffer and builders for temperature scaling calibration.
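
    Temperature scaling is a one-parameter calibration that divides logits by
    a scalar T fitted on held-out data; a minimal sketch (names illustrative):

        def temperature_scale(logits, temperature):
          # temperature > 1 softens the confidence distribution.
          return logits / temperature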

--
259036770  by lzc:

    Internal changes.

--
259006223  by ronnyvotel:

    Adding detection anchor indices to Faster R-CNN Config. This is useful when one wishes to associate final detections with the anchors (or pre-NMS boxes) from which they originated.

--
258872501  by Zhichao Lu:

    Run the training pipeline of ssd + resnet_v1_50 + fpn with a checkpoint.

--
258840686  by ronnyvotel:

    Adding standard outputs to DetectionModel.predict(). This CL only updates Faster R-CNN. Other meta architectures will be updated in future CLs.

--
258672969  by lzc:

    Internal change.

--
258649494  by lzc:

    Internal changes.

--
258630321  by ronnyvotel:

    Fixing documentation in shape_utils.flatten_dimensions().

--
258468145  by Zhichao Lu:

    Add additional output tensors parameter to Postprocess op.

--
258099219  by Zhichao Lu:

    Internal changes

--

PiperOrigin-RevId: 274959989
parent 9aed0ffb
@@ -19,9 +19,9 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from absl.testing import parameterized
import numpy as np
import six
from six.moves import range
from six.moves import zip
import tensorflow as tf
@@ -36,7 +36,7 @@ else:
  from unittest import mock  # pylint: disable=g-import-not-at-top


class PreprocessorTest(tf.test.TestCase, parameterized.TestCase):

  def createColorfulTestImage(self):
    ch255 = tf.fill([1, 100, 200, 1], tf.constant(255, dtype=tf.uint8))
@@ -2478,6 +2478,233 @@ class PreprocessorTest(tf.test.TestCase):
        [images_shape, blacked_images_shape])
    self.assertAllEqual(images_shape_, blacked_images_shape_)

  def testRandomJpegQuality(self):
    preprocessing_options = [(preprocessor.random_jpeg_quality, {
        'min_jpeg_quality': 0,
        'max_jpeg_quality': 100
    })]
    images = self.createTestImages()
    tensor_dict = {fields.InputDataFields.image: images}
    processed_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
    encoded_images = processed_tensor_dict[fields.InputDataFields.image]
    images_shape = tf.shape(images)
    encoded_images_shape = tf.shape(encoded_images)
    with self.test_session() as sess:
      images_shape_out, encoded_images_shape_out = sess.run(
          [images_shape, encoded_images_shape])
      self.assertAllEqual(images_shape_out, encoded_images_shape_out)

  def testRandomJpegQualityKeepsStaticChannelShape(self):
    # Set at least three weeks past the forward compatibility horizon for
    # tf 1.14 of 2019/11/01.
    # https://github.com/tensorflow/tensorflow/blob/v1.14.0/tensorflow/python/compat/compat.py#L30
    if not tf.compat.forward_compatible(year=2019, month=12, day=1):
      self.skipTest('Skipping test for future functionality.')
    preprocessing_options = [(preprocessor.random_jpeg_quality, {
        'min_jpeg_quality': 0,
        'max_jpeg_quality': 100
    })]
    images = self.createTestImages()
    tensor_dict = {fields.InputDataFields.image: images}
    processed_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
    encoded_images = processed_tensor_dict[fields.InputDataFields.image]
    images_static_channels = images.shape[-1]
    encoded_images_static_channels = encoded_images.shape[-1]
    self.assertEqual(images_static_channels, encoded_images_static_channels)

  def testRandomJpegQualityWithCache(self):
    preprocessing_options = [(preprocessor.random_jpeg_quality, {
        'min_jpeg_quality': 0,
        'max_jpeg_quality': 100
    })]
    self._testPreprocessorCache(preprocessing_options)

  def testRandomJpegQualityWithRandomCoefOne(self):
    preprocessing_options = [(preprocessor.random_jpeg_quality, {
        'random_coef': 1.0
    })]
    images = self.createTestImages()
    tensor_dict = {fields.InputDataFields.image: images}
    processed_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
    encoded_images = processed_tensor_dict[fields.InputDataFields.image]
    images_shape = tf.shape(images)
    encoded_images_shape = tf.shape(encoded_images)
    with self.test_session() as sess:
      (images_out, encoded_images_out, images_shape_out,
       encoded_images_shape_out) = sess.run(
           [images, encoded_images, images_shape, encoded_images_shape])
      self.assertAllEqual(images_shape_out, encoded_images_shape_out)
      self.assertAllEqual(images_out, encoded_images_out)

  def testRandomDownscaleToTargetPixels(self):
    preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
        'min_target_pixels': 100,
        'max_target_pixels': 101
    })]
    images = tf.random_uniform([1, 25, 100, 3])
    tensor_dict = {fields.InputDataFields.image: images}
    processed_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
    downscaled_images = processed_tensor_dict[fields.InputDataFields.image]
    downscaled_shape = tf.shape(downscaled_images)
    expected_shape = [1, 5, 20, 3]
    with self.test_session() as sess:
      downscaled_shape_out = sess.run(downscaled_shape)
      self.assertAllEqual(downscaled_shape_out, expected_shape)

  def testRandomDownscaleToTargetPixelsWithMasks(self):
    preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
        'min_target_pixels': 100,
        'max_target_pixels': 101
    })]
    images = tf.random_uniform([1, 25, 100, 3])
    masks = tf.random_uniform([10, 25, 100])
    tensor_dict = {
        fields.InputDataFields.image: images,
        fields.InputDataFields.groundtruth_instance_masks: masks
    }
    preprocessor_arg_map = preprocessor.get_default_func_arg_map(
        include_instance_masks=True)
    processed_tensor_dict = preprocessor.preprocess(
        tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map)
    downscaled_images = processed_tensor_dict[fields.InputDataFields.image]
    downscaled_masks = processed_tensor_dict[
        fields.InputDataFields.groundtruth_instance_masks]
    downscaled_images_shape = tf.shape(downscaled_images)
    downscaled_masks_shape = tf.shape(downscaled_masks)
    expected_images_shape = [1, 5, 20, 3]
    expected_masks_shape = [10, 5, 20]
    with self.test_session() as sess:
      downscaled_images_shape_out, downscaled_masks_shape_out = sess.run(
          [downscaled_images_shape, downscaled_masks_shape])
      self.assertAllEqual(downscaled_images_shape_out, expected_images_shape)
      self.assertAllEqual(downscaled_masks_shape_out, expected_masks_shape)

  @parameterized.parameters(
      {'test_masks': False},
      {'test_masks': True}
  )
  def testRandomDownscaleToTargetPixelsWithCache(self, test_masks):
    preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
        'min_target_pixels': 100,
        'max_target_pixels': 999
    })]
    self._testPreprocessorCache(preprocessing_options, test_masks=test_masks)

  def testRandomDownscaleToTargetPixelsWithRandomCoefOne(self):
    preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
        'random_coef': 1.0,
        'min_target_pixels': 10,
        'max_target_pixels': 20,
    })]
    images = tf.random_uniform([1, 25, 100, 3])
    tensor_dict = {fields.InputDataFields.image: images}
    processed_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
    downscaled_images = processed_tensor_dict[fields.InputDataFields.image]
    images_shape = tf.shape(images)
    downscaled_images_shape = tf.shape(downscaled_images)
    with self.test_session() as sess:
      (images_out, downscaled_images_out, images_shape_out,
       downscaled_images_shape_out) = sess.run(
           [images, downscaled_images, images_shape, downscaled_images_shape])
      self.assertAllEqual(images_shape_out, downscaled_images_shape_out)
      self.assertAllEqual(images_out, downscaled_images_out)

  def testRandomDownscaleToTargetPixelsIgnoresSmallImages(self):
    preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
        'min_target_pixels': 1000,
        'max_target_pixels': 1001
    })]
    images = tf.random_uniform([1, 10, 10, 3])
    tensor_dict = {fields.InputDataFields.image: images}
    processed_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
    downscaled_images = processed_tensor_dict[fields.InputDataFields.image]
    images_shape = tf.shape(images)
    downscaled_images_shape = tf.shape(downscaled_images)
    with self.test_session() as sess:
      (images_out, downscaled_images_out, images_shape_out,
       downscaled_images_shape_out) = sess.run(
           [images, downscaled_images, images_shape, downscaled_images_shape])
      self.assertAllEqual(images_shape_out, downscaled_images_shape_out)
      self.assertAllEqual(images_out, downscaled_images_out)

  def testRandomPatchGaussianShape(self):
    preprocessing_options = [(preprocessor.random_patch_gaussian, {
        'min_patch_size': 1,
        'max_patch_size': 200,
        'min_gaussian_stddev': 0.0,
        'max_gaussian_stddev': 2.0
    })]
    images = self.createTestImages()
    tensor_dict = {fields.InputDataFields.image: images}
    processed_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
    patched_images = processed_tensor_dict[fields.InputDataFields.image]
    images_shape = tf.shape(images)
    patched_images_shape = tf.shape(patched_images)
    self.assertAllEqual(images_shape, patched_images_shape)

  def testRandomPatchGaussianClippedToLowerBound(self):
    preprocessing_options = [(preprocessor.random_patch_gaussian, {
        'min_patch_size': 20,
        'max_patch_size': 40,
        'min_gaussian_stddev': 50,
        'max_gaussian_stddev': 100
    })]
    images = tf.zeros([1, 5, 4, 3])
    tensor_dict = {fields.InputDataFields.image: images}
    processed_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
    patched_images = processed_tensor_dict[fields.InputDataFields.image]
    self.assertAllGreaterEqual(patched_images, 0.0)

  def testRandomPatchGaussianClippedToUpperBound(self):
    preprocessing_options = [(preprocessor.random_patch_gaussian, {
        'min_patch_size': 20,
        'max_patch_size': 40,
        'min_gaussian_stddev': 50,
        'max_gaussian_stddev': 100
    })]
    images = tf.constant(255.0, shape=[1, 5, 4, 3])
    tensor_dict = {fields.InputDataFields.image: images}
    processed_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
    patched_images = processed_tensor_dict[fields.InputDataFields.image]
    self.assertAllLessEqual(patched_images, 255.0)

  def testRandomPatchGaussianWithCache(self):
    preprocessing_options = [(preprocessor.random_patch_gaussian, {
        'min_patch_size': 1,
        'max_patch_size': 200,
        'min_gaussian_stddev': 0.0,
        'max_gaussian_stddev': 2.0
    })]
    self._testPreprocessorCache(preprocessing_options)

  def testRandomPatchGaussianWithRandomCoefOne(self):
    preprocessing_options = [(preprocessor.random_patch_gaussian, {
        'random_coef': 1.0
    })]
    images = self.createTestImages()
    tensor_dict = {fields.InputDataFields.image: images}
    processed_tensor_dict = preprocessor.preprocess(tensor_dict,
                                                    preprocessing_options)
    patched_images = processed_tensor_dict[fields.InputDataFields.image]
    images_shape = tf.shape(images)
    patched_images_shape = tf.shape(patched_images)
    self.assertAllEqual(images_shape, patched_images_shape)
    self.assertAllEqual(images, patched_images)

  def testAutoAugmentImage(self):
    preprocessing_options = []
    preprocessing_options.append((preprocessor.autoaugment_image, {
......
@@ -168,6 +168,22 @@ class BoxListFields(object):
  is_crowd = 'is_crowd'


class PredictionFields(object):
  """Naming conventions for standardized prediction outputs.

  Attributes:
    feature_maps: List of feature maps for prediction.
    anchors: Generated anchors.
    raw_detection_boxes: Decoded detection boxes without NMS.
    raw_detection_feature_map_indices: Feature map indices from which each raw
      detection box was produced.
  """
  feature_maps = 'feature_maps'
  anchors = 'anchors'
  raw_detection_boxes = 'raw_detection_boxes'
  raw_detection_feature_map_indices = 'raw_detection_feature_map_indices'


class TfExampleFields(object):
  """TF-example proto feature names for object detection.
......
@@ -41,8 +41,9 @@ import tensorflow as tf
from object_detection.box_coders import faster_rcnn_box_coder
from object_detection.box_coders import mean_stddev_box_coder
from object_detection.core import box_coder
from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import matcher as mat
from object_detection.core import region_similarity_calculator as sim_calc
from object_detection.core import standard_fields as fields
@@ -57,7 +58,7 @@ class TargetAssigner(object):
  def __init__(self,
               similarity_calc,
               matcher,
               box_coder_instance,
               negative_class_weight=1.0):
    """Construct Object Detection Target Assigner.
@@ -65,8 +66,8 @@
      similarity_calc: a RegionSimilarityCalculator
      matcher: an object_detection.core.Matcher used to match groundtruth to
        anchors.
      box_coder_instance: an object_detection.core.BoxCoder used to encode
        matching groundtruth boxes with respect to anchors.
      negative_class_weight: classification weight to be associated to negative
        anchors (default: 1.0). The weight must be in [0., 1.].
@@ -78,11 +79,11 @@
      raise ValueError('similarity_calc must be a RegionSimilarityCalculator')
    if not isinstance(matcher, mat.Matcher):
      raise ValueError('matcher must be a Matcher')
    if not isinstance(box_coder_instance, box_coder.BoxCoder):
      raise ValueError('box_coder must be a BoxCoder')
    self._similarity_calc = similarity_calc
    self._matcher = matcher
    self._box_coder = box_coder_instance
    self._negative_class_weight = negative_class_weight

  @property
@@ -391,7 +392,7 @@ def create_target_assigner(reference, stage=None,
  if reference == 'Multibox' and stage == 'proposal':
    similarity_calc = sim_calc.NegSqDistSimilarity()
    matcher = bipartite_matcher.GreedyBipartiteMatcher()
    box_coder_instance = mean_stddev_box_coder.MeanStddevBoxCoder()

  elif reference == 'FasterRCNN' and stage == 'proposal':
    similarity_calc = sim_calc.IouSimilarity()
@@ -399,7 +400,7 @@
                                           unmatched_threshold=0.3,
                                           force_match_for_each_row=True,
                                           use_matmul_gather=use_matmul_gather)
    box_coder_instance = faster_rcnn_box_coder.FasterRcnnBoxCoder(
        scale_factors=[10.0, 10.0, 5.0, 5.0])

  elif reference == 'FasterRCNN' and stage == 'detection':
@@ -408,7 +409,7 @@
    matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=0.5,
                                           negatives_lower_than_unmatched=True,
                                           use_matmul_gather=use_matmul_gather)
    box_coder_instance = faster_rcnn_box_coder.FasterRcnnBoxCoder(
        scale_factors=[10.0, 10.0, 5.0, 5.0])

  elif reference == 'FastRCNN':
@@ -418,12 +419,12 @@
        force_match_for_each_row=False,
        negatives_lower_than_unmatched=False,
        use_matmul_gather=use_matmul_gather)
    box_coder_instance = faster_rcnn_box_coder.FasterRcnnBoxCoder()

  else:
    raise ValueError('No valid combination of reference and stage.')

  return TargetAssigner(similarity_calc, matcher, box_coder_instance,
                        negative_class_weight=negative_class_weight)
@@ -702,3 +703,5 @@ def batch_assign_confidences(target_assigner,
  batch_match = tf.stack(match_list)
  return (batch_cls_targets, batch_cls_weights, batch_reg_targets,
          batch_reg_weights, batch_match)
@@ -67,7 +67,8 @@ def append_postprocessing_op(frozen_graph_def,
                             num_classes,
                             scale_values,
                             detections_per_class=100,
                             use_regular_nms=False,
                             additional_output_tensors=()):
  """Appends postprocessing custom op.

  Args:
@@ -82,11 +83,13 @@
    num_classes: number of classes in SSD detector
    scale_values: scale values is a dict with following key-value pairs
      {y_scale: 10, x_scale: 10, h_scale: 5, w_scale: 5} that are used in decode
      centersize boxes
    detections_per_class: In regular NonMaxSuppression, number of anchors used
      for NonMaxSuppression per class
    use_regular_nms: Flag to set postprocessing op to use Regular NMS instead of
      Fast NMS.
    additional_output_tensors: Array of additional tensor names to output.
      Tensors are appended after postprocessing output.

  Returns:
    transformed_graph_def: Frozen GraphDef with postprocessing custom op
@@ -140,7 +143,8 @@
      ['raw_outputs/box_encodings', 'raw_outputs/class_predictions', 'anchors'])

  # Transform the graph to append new postprocessing op
  input_names = []
  output_names = ['TFLite_Detection_PostProcess'
                 ] + list(additional_output_tensors)
  transforms = ['strip_unused_nodes']
  transformed_graph_def = TransformGraph(frozen_graph_def, input_names,
                                         output_names, transforms)
@@ -156,7 +160,8 @@ def export_tflite_graph(pipeline_config,
                        detections_per_class=100,
                        use_regular_nms=False,
                        binary_graph_name='tflite_graph.pb',
                        txt_graph_name='tflite_graph.pbtxt',
                        additional_output_tensors=()):
  """Exports a tflite compatible graph and anchors for ssd detection model.

  Anchors are written to a tensor and tflite compatible graph
@@ -173,11 +178,13 @@
    max_detections: Maximum number of detections (boxes) to show
    max_classes_per_detection: Number of classes to display per detection
    detections_per_class: In regular NonMaxSuppression, number of anchors used
      for NonMaxSuppression per class
    use_regular_nms: Flag to set postprocessing op to use Regular NMS instead of
      Fast NMS.
    binary_graph_name: Name of the exported graph file in binary format.
    txt_graph_name: Name of the exported graph file in text format.
    additional_output_tensors: Array of additional tensor names to output.
      Additional tensors are appended to the end of output tensor list.

  Raises:
    ValueError: if the pipeline config contains models other than ssd or uses an
@@ -191,12 +198,12 @@
  num_classes = pipeline_config.model.ssd.num_classes
  nms_score_threshold = {
      pipeline_config.model.ssd.post_processing.batch_non_max_suppression
      .score_threshold
  }
  nms_iou_threshold = {
      pipeline_config.model.ssd.post_processing.batch_non_max_suppression
      .iou_threshold
  }
  scale_values = {}
  scale_values['y_scale'] = {
@@ -291,7 +298,7 @@
      output_node_names=','.join([
          'raw_outputs/box_encodings', 'raw_outputs/class_predictions',
          'anchors'
      ] + list(additional_output_tensors)),
      restore_op_name='save/restore_all',
      filename_tensor_name='save/Const:0',
      clear_devices=True,
@@ -301,9 +308,16 @@
  # Add new operation to do post processing in a custom op (TF Lite only)
  if add_postprocessing_op:
    transformed_graph_def = append_postprocessing_op(
        frozen_graph_def,
        max_detections,
        max_classes_per_detection,
        nms_score_threshold,
        nms_iou_threshold,
        num_classes,
        scale_values,
        detections_per_class,
        use_regular_nms,
        additional_output_tensors=additional_output_tensors)
  else:
    # Return frozen without adding post-processing custom op
    transformed_graph_def = frozen_graph_def
......
@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.export_tflite_ssd_graph."""
from __future__ import absolute_import
from __future__ import division
@@ -31,7 +30,6 @@ from object_detection.protos import graph_rewriter_pb2
from object_detection.protos import pipeline_pb2
from object_detection.protos import post_processing_pb2
if six.PY2:
  import mock  # pylint: disable=g-import-not-at-top
else:
@@ -130,7 +128,10 @@ class ExportTfliteGraphTest(tf.test.TestCase):
        feed_dict={input_tensor: np.random.rand(1, 10, 10, num_channels)})
    return box_encodings_np, class_predictions_np

  def _export_graph(self,
                    pipeline_config,
                    num_channels=3,
                    additional_output_tensors=()):
    """Exports a tflite graph."""
    output_dir = self.get_temp_dir()
    trained_checkpoint_prefix = os.path.join(output_dir, 'model.ckpt')
@@ -147,18 +148,22 @@
      mock_builder.return_value = FakeModel()
      with tf.Graph().as_default():
        tf.identity(
            tf.constant([[1, 2], [3, 4]], tf.uint8), name='UnattachedTensor')
        export_tflite_ssd_graph_lib.export_tflite_graph(
            pipeline_config=pipeline_config,
            trained_checkpoint_prefix=trained_checkpoint_prefix,
            output_dir=output_dir,
            add_postprocessing_op=False,
            max_detections=10,
            max_classes_per_detection=1,
            additional_output_tensors=additional_output_tensors)
    return tflite_graph_file

  def _export_graph_with_postprocessing_op(self,
                                           pipeline_config,
                                           num_channels=3,
                                           additional_output_tensors=()):
    """Exports a tflite graph with custom postprocessing op."""
    output_dir = self.get_temp_dir()
    trained_checkpoint_prefix = os.path.join(output_dir, 'model.ckpt')
@@ -175,13 +180,16 @@
      mock_builder.return_value = FakeModel()
      with tf.Graph().as_default():
        tf.identity(
            tf.constant([[1, 2], [3, 4]], tf.uint8), name='UnattachedTensor')
        export_tflite_ssd_graph_lib.export_tflite_graph(
            pipeline_config=pipeline_config,
            trained_checkpoint_prefix=trained_checkpoint_prefix,
            output_dir=output_dir,
            add_postprocessing_op=True,
            max_detections=10,
            max_classes_per_detection=1,
            additional_output_tensors=additional_output_tensors)
    return tflite_graph_file

  def test_export_tflite_graph_with_moving_averages(self):
@@ -325,7 +333,8 @@
    with tf.gfile.Open(tflite_graph_file) as f:
      graph_def.ParseFromString(f.read())
    all_op_names = [node.name for node in graph_def.node]
    self.assertIn('TFLite_Detection_PostProcess', all_op_names)
    self.assertNotIn('UnattachedTensor', all_op_names)
    for node in graph_def.node:
      if node.name == 'TFLite_Detection_PostProcess':
        self.assertTrue(node.attr['_output_quantized'].b is True)
@@ -342,6 +351,42 @@
            for t in node.attr['_output_types'].list.type
        ]))

  def test_export_tflite_graph_with_additional_tensors(self):
    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
    pipeline_config.eval_config.use_moving_averages = False
    pipeline_config.model.ssd.image_resizer.fixed_shape_resizer.height = 10
    pipeline_config.model.ssd.image_resizer.fixed_shape_resizer.width = 10
    tflite_graph_file = self._export_graph(
        pipeline_config, additional_output_tensors=['UnattachedTensor'])
    self.assertTrue(os.path.exists(tflite_graph_file))
    graph = tf.Graph()
    with graph.as_default():
      graph_def = tf.GraphDef()
      with tf.gfile.Open(tflite_graph_file) as f:
        graph_def.ParseFromString(f.read())
      all_op_names = [node.name for node in graph_def.node]
      self.assertIn('UnattachedTensor', all_op_names)

  def test_export_tflite_graph_with_postprocess_op_and_additional_tensors(self):
    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
    pipeline_config.eval_config.use_moving_averages = False
    pipeline_config.model.ssd.post_processing.score_converter = (
        post_processing_pb2.PostProcessing.SIGMOID)
    pipeline_config.model.ssd.image_resizer.fixed_shape_resizer.height = 10
    pipeline_config.model.ssd.image_resizer.fixed_shape_resizer.width = 10
    pipeline_config.model.ssd.num_classes = 2
    tflite_graph_file = self._export_graph_with_postprocessing_op(
        pipeline_config, additional_output_tensors=['UnattachedTensor'])
    self.assertTrue(os.path.exists(tflite_graph_file))
    graph = tf.Graph()
    with graph.as_default():
      graph_def = tf.GraphDef()
      with tf.gfile.Open(tflite_graph_file) as f:
        graph_def.ParseFromString(f.read())
      all_op_names = [node.name for node in graph_def.node]
      self.assertIn('TFLite_Detection_PostProcess', all_op_names)
      self.assertIn('UnattachedTensor', all_op_names)

  @mock.patch.object(exporter, 'rewrite_nn_resize_op')
  def test_export_with_nn_resize_op_not_called_without_fpn(self, mock_get):
    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
......
@@ -40,50 +40,54 @@ def rewrite_nn_resize_op(is_quantized=False):
  Args:
    is_quantized: True if the default graph is quantized.
  """
  def remove_nn():
    """Remove nearest neighbor upsampling structure and replace with TF op."""
    input_pattern = graph_matcher.OpTypePattern(
        'FakeQuantWithMinMaxVars' if is_quantized else '*')
    stack_1_pattern = graph_matcher.OpTypePattern(
        'Pack', inputs=[input_pattern, input_pattern], ordered_inputs=False)
    stack_2_pattern = graph_matcher.OpTypePattern(
        'Pack', inputs=[stack_1_pattern, stack_1_pattern], ordered_inputs=False)
    reshape_pattern = graph_matcher.OpTypePattern(
        'Reshape', inputs=[stack_2_pattern, 'Const'], ordered_inputs=False)
    consumer_pattern = graph_matcher.OpTypePattern(
        'Add|AddV2|Max|Mul', inputs=[reshape_pattern, '*'],
        ordered_inputs=False)

    match_counter = 0
    matcher = graph_matcher.GraphMatcher(consumer_pattern)
    for match in matcher.match_graph(tf.get_default_graph()):
      match_counter += 1
      projection_op = match.get_op(input_pattern)
      reshape_op = match.get_op(reshape_pattern)
      consumer_op = match.get_op(consumer_pattern)
      nn_resize = tf.image.resize_nearest_neighbor(
          projection_op.outputs[0],
          reshape_op.outputs[0].shape.dims[1:3],
          align_corners=False,
          name=os.path.split(reshape_op.name)[0] + '/resize_nearest_neighbor')

      for index, op_input in enumerate(consumer_op.inputs):
        if op_input == reshape_op.outputs[0]:
          consumer_op._update_input(index, nn_resize)  # pylint: disable=protected-access
          break

    tf.logging.info('Found and fixed {} matches'.format(match_counter))
    return match_counter

  # Applying twice because both inputs to Add could be NN pattern
  total_removals = 0
  while remove_nn():
    total_removals += 1
    # This number is chosen based on the nas-fpn architecture.
    if total_removals > 4:
      raise ValueError('Graph removal encountered a infinite loop.')


def replace_variable_values_with_moving_averages(graph,
                                                 current_checkpoint_file,
                                                 new_checkpoint_file,
                                                 no_ema_collection=None):
  """Replaces variable values in the checkpoint with their moving averages.

  If the current checkpoint has shadow variables maintaining moving averages of
@@ -95,10 +99,14 @@
    current_checkpoint_file: a checkpoint containing both original variables and
      their moving averages.
    new_checkpoint_file: file path to write a new checkpoint.
    no_ema_collection: A list of namescope substrings to match the variables
      to eliminate EMA.
  """
  with graph.as_default():
    variable_averages = tf.train.ExponentialMovingAverage(0.0)
    ema_variables_to_restore = variable_averages.variables_to_restore()
    ema_variables_to_restore = config_util.remove_unecessary_ema(
        ema_variables_to_restore, no_ema_collection)
    with tf.Session() as sess:
      read_saver = tf.train.Saver(ema_variables_to_restore)
      read_saver.restore(sess, current_checkpoint_file)
......
@@ -21,6 +21,7 @@ import tensorflow as tf
from google.protobuf import text_format
from tensorflow.python.framework import dtypes
from tensorflow.python.ops import array_ops
from tensorflow.python.tools import strip_unused_lib
from object_detection import exporter
from object_detection.builders import graph_rewriter_builder
from object_detection.builders import model_builder
@@ -1056,6 +1057,42 @@ class ExportInferenceGraphTest(tf.test.TestCase):
    self.assertTrue(resize_op_found)

  def test_rewrite_nn_resize_op_multiple_path(self):
    g = tf.Graph()
    with g.as_default():
      with tf.name_scope('nearest_upsampling'):
        x = array_ops.placeholder(dtypes.float32, shape=(8, 10, 10, 8))
        x_stack = tf.stack([tf.stack([x] * 2, axis=3)] * 2, axis=2)
        x_reshape = tf.reshape(x_stack, [8, 20, 20, 8])
      with tf.name_scope('nearest_upsampling'):
        x_2 = array_ops.placeholder(dtypes.float32, shape=(8, 10, 10, 8))
        x_stack_2 = tf.stack([tf.stack([x_2] * 2, axis=3)] * 2, axis=2)
        x_reshape_2 = tf.reshape(x_stack_2, [8, 20, 20, 8])
      t = x_reshape + x_reshape_2

      exporter.rewrite_nn_resize_op()

    graph_def = g.as_graph_def()
    graph_def = strip_unused_lib.strip_unused(
        graph_def,
        input_node_names=[
            'nearest_upsampling/Placeholder', 'nearest_upsampling_1/Placeholder'
        ],
        output_node_names=['add'],
        placeholder_type_enum=dtypes.float32.as_datatype_enum)

    counter_resize_op = 0
    t_input_ops = [op.name for op in t.op.inputs]
    for node in graph_def.node:
      # Make sure Stacks are replaced.
      self.assertNotEqual(node.op, 'Pack')
      if node.op == 'ResizeNearestNeighbor':
        counter_resize_op += 1
        self.assertIn(node.name + ':0', t_input_ops)
    self.assertEqual(counter_resize_op, 2)


if __name__ == '__main__':
  tf.test.main()
@@ -66,6 +66,9 @@ python models/research/object_detection/metrics/oid_challenge_evaluation.py \
    --output_metrics=${OUTPUT_METRICS} \
```

Note that the predictions file must contain the following keys:
ImageID,LabelName,Score,XMin,XMax,YMin,YMax

For the Object Detection Track, the participants will be ranked on:

- "OpenImagesDetectionChallenge_Precision/mAP@0.5IOU"

@@ -94,10 +97,11 @@ evaluation metric implementation is available in the class
masks.

Those should be transformed into a single CSV file in the format:

ImageID,LabelName,ImageWidth,ImageHeight,XMin,YMin,XMax,YMax,IsGroupOf,Mask

where Mask is the MS COCO RLE encoding of a binary mask stored in a .png
file, compressed with zip and re-coded with base64. See an example
implementation of the encoding function
[here](https://gist.github.com/pculliton/209398a2a52867580c6103e25e55d93c).
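
For reference, a sketch of that encoding chain (assuming pycocotools, numpy,
and Pillow; the linked gist is the authoritative version and may differ in
details):

    import base64
    import zlib

    import numpy as np
    from PIL import Image
    from pycocotools import mask as coco_mask

    def encode_binary_mask_png(png_path):
      """Binary mask .png -> COCO RLE, zlib-compressed, base64-encoded."""
      mask = (np.asarray(Image.open(png_path).convert('L')) > 0).astype(np.uint8)
      rle = coco_mask.encode(np.asfortranarray(mask))  # Fortran-order uint8.
      return base64.b64encode(zlib.compress(rle['counts'])).decode('ascii')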
1. Run the following command to create hierarchical expansion of the instance
   segmentation, bounding boxes and image-level label annotations: {value=4}

@@ -142,6 +146,11 @@ python models/research/object_detection/metrics/oid_challenge_evaluation.py \
    --output_metrics=${OUTPUT_METRICS} \
```

Note that the predictions file must contain the following keys:
ImageID,ImageWidth,ImageHeight,LabelName,Score,Mask

Mask must be encoded the same way as groundtruth masks.

For the Instance Segmentation Track, the participants will be ranked on:

- "OpenImagesInstanceSegmentationChallenge_Precision/mAP@0.5IOU"

@@ -196,6 +205,9 @@ python object_detection/metrics/oid_vrd_challenge_evaluation.py \
    --output_metrics=${OUTPUT_METRICS}
```

Note that the predictions file must contain the following keys:
ImageID,LabelName1,LabelName2,RelationshipLabel,Score,XMin1,XMax1,YMin1,YMax1,XMin2,XMax2,YMin2,YMax2

The participants of the challenge will be evaluated by weighted average of the following three metrics:

- "VRDMetric_Relationships_mAP@0.5IOU"
......
@@ -35,17 +35,20 @@ tar -xzvf ssd_mobilenet_v1_coco.tar.gz
Inside the un-tar'ed directory, you will find:

*   a graph proto (`graph.pbtxt`)
*   a checkpoint (`model.ckpt.data-00000-of-00001`, `model.ckpt.index`,
    `model.ckpt.meta`)
*   a frozen graph proto with weights baked into the graph as constants
    (`frozen_inference_graph.pb`) to be used for out of the box inference (try
    this out in the Jupyter notebook!)
*   a config file (`pipeline.config`) which was used to generate the graph.
    These directly correspond to a config file in the
    [samples/configs](https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs)
    directory but often with a modified score threshold. In the case of the
    heavier Faster R-CNN models, we also provide a version of the model that
    uses a highly reduced number of proposals for speed.
*   Mobile model only: a TfLite file (`model.tflite`) that can be deployed on
    mobile devices.

Some remarks on frozen inference graphs:
@@ -100,6 +103,13 @@ Note: The asterisk (☆) at the end of model name indicates that this model supp
Note: If you download the tar.gz file of quantized models and un-tar, you will get different set of files - a checkpoint, a config file and tflite frozen graphs (txt/binary).

### Mobile models

Model name                                                                                                                           | Pixel 1 Latency (ms) | COCO mAP | Outputs
------------------------------------------------------------------------------------------------------------------------------------ | :------------------: | :------: | :-----:
[ssd_mobilenet_v3_large_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v3_large_coco_2019_08_14.tar.gz)  | 119                  | 22.3     | Boxes
[ssd_mobilenet_v3_small_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v3_small_coco_2019_08_14.tar.gz)  | 43                   | 15.6     | Boxes

## Kitti-trained models

Model name | Speed (ms) | Pascal mAP@0.5 | Outputs
......
...@@ -71,7 +71,8 @@ def transform_input_data(tensor_dict, ...@@ -71,7 +71,8 @@ def transform_input_data(tensor_dict,
merge_multiple_boxes=False, merge_multiple_boxes=False,
retain_original_image=False, retain_original_image=False,
use_multiclass_scores=False, use_multiclass_scores=False,
use_bfloat16=False): use_bfloat16=False,
retain_original_image_additional_channels=False):
"""A single function that is responsible for all input data transformations. """A single function that is responsible for all input data transformations.
Data transformation functions are applied in the following order. Data transformation functions are applied in the following order.
...@@ -110,6 +111,8 @@ def transform_input_data(tensor_dict, ...@@ -110,6 +111,8 @@ def transform_input_data(tensor_dict,
this is True and multiclass_scores is empty, one-hot encoding of this is True and multiclass_scores is empty, one-hot encoding of
`groundtruth_classes` is used as a fallback. `groundtruth_classes` is used as a fallback.
use_bfloat16: (optional) a bool, whether to use bfloat16 in training. use_bfloat16: (optional) a bool, whether to use bfloat16 in training.
retain_original_image_additional_channels: (optional) Whether to retain
original image additional channels in the output dictionary.
Returns:
A dictionary keyed by fields.InputDataFields containing the tensors obtained
...@@ -139,6 +142,10 @@ def transform_input_data(tensor_dict,
channels = out_tensor_dict[fields.InputDataFields.image_additional_channels]
out_tensor_dict[fields.InputDataFields.image] = tf.concat(
[out_tensor_dict[fields.InputDataFields.image], channels], axis=2)
if retain_original_image_additional_channels:
out_tensor_dict[
fields.InputDataFields.image_additional_channels] = tf.cast(
image_resizer_fn(channels, None)[0], tf.uint8)
# Apply data augmentation ops.
if data_augmentation_fn is not None:
...@@ -445,6 +452,9 @@ def _get_features_dict(input_dict):
if fields.InputDataFields.original_image in input_dict:
features[fields.InputDataFields.original_image] = input_dict[
fields.InputDataFields.original_image]
if fields.InputDataFields.image_additional_channels in input_dict:
features[fields.InputDataFields.image_additional_channels] = input_dict[
fields.InputDataFields.image_additional_channels]
return features
...@@ -663,7 +673,9 @@ def eval_input(eval_config, eval_input_config, model_config,
image_resizer_fn=image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=None,
retain_original_image=eval_config.retain_original_images,
retain_original_image_additional_channels=
eval_config.retain_original_image_additional_channels)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=eval_input_config.max_number_of_boxes,
......
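For reference, the new `retain_original_image_additional_channels` flag is read from the eval config above, mirroring `retain_original_images`. A sketch of enabling it in a pipeline config (the exact field placement is an assumption to verify against eval.proto):

eval_config {
  retain_original_images: true
  retain_original_image_additional_channels: true
}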
...@@ -301,6 +301,70 @@ class InputsTest(test_case.TestCase, parameterized.TestCase):
self.assertEqual(
tf.int32, labels[fields.InputDataFields.groundtruth_difficult].dtype)
def test_ssd_inceptionV2_eval_input_with_additional_channels(
self, eval_batch_size=1):
"""Tests the eval input function for SSDInceptionV2 with additional channels.
Args:
eval_batch_size: Batch size for eval set.
"""
configs = _get_configs_for_model('ssd_inception_v2_pets')
model_config = configs['model']
model_config.ssd.num_classes = 37
configs['eval_input_configs'][0].num_additional_channels = 1
eval_config = configs['eval_config']
eval_config.batch_size = eval_batch_size
eval_config.retain_original_image_additional_channels = True
eval_input_fn = inputs.create_eval_input_fn(
eval_config, configs['eval_input_configs'][0], model_config)
features, labels = _make_initializable_iterator(eval_input_fn()).get_next()
self.assertAllEqual([eval_batch_size, 300, 300, 4],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual(
[eval_batch_size, 300, 300, 3],
features[fields.InputDataFields.original_image].shape.as_list())
self.assertEqual(tf.uint8,
features[fields.InputDataFields.original_image].dtype)
self.assertAllEqual([eval_batch_size, 300, 300, 1], features[
fields.InputDataFields.image_additional_channels].shape.as_list())
self.assertEqual(
tf.uint8,
features[fields.InputDataFields.image_additional_channels].dtype)
self.assertAllEqual([eval_batch_size],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[eval_batch_size, 100, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[eval_batch_size, 100, model_config.ssd.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_area].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_area].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_is_crowd].shape.as_list())
self.assertEqual(tf.bool,
labels[fields.InputDataFields.groundtruth_is_crowd].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_difficult].shape.as_list())
self.assertEqual(tf.int32,
labels[fields.InputDataFields.groundtruth_difficult].dtype)
def test_predict_input(self):
"""Tests the predict input function."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
......
...@@ -326,7 +326,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
clip_anchors_to_image=False,
use_static_shapes=False,
resize_masks=True,
freeze_batchnorm=False,
return_raw_detections_during_predict=False):
"""FasterRCNNMetaArch Constructor. """FasterRCNNMetaArch Constructor.
Args: Args:
...@@ -455,7 +456,9 @@ class FasterRCNNMetaArch(model.DetectionModel): ...@@ -455,7 +456,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
stage box predictor during training or not. When training with a small stage box predictor during training or not. When training with a small
batch size (e.g. 1), it is desirable to freeze batch norm update and batch size (e.g. 1), it is desirable to freeze batch norm update and
use pretrained batch norm params. use pretrained batch norm params.
return_raw_detections_during_predict: Whether to return raw detection
boxes in the predict() method. These are decoded boxes that have not
been through postprocessing (i.e. NMS). Default False.
Raises:
ValueError: If `second_stage_batch_size` > `first_stage_max_proposals` at
training time.
...@@ -623,6 +626,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
if self._number_of_stages <= 0 or self._number_of_stages > 3:
raise ValueError('Number of stages should be a value in {1, 2, 3}.')
self._batched_prediction_tensor_names = []
self._return_raw_detections_during_predict = (
return_raw_detections_during_predict)
@property
def first_stage_feature_extractor_scope(self):
...@@ -694,16 +699,12 @@ class FasterRCNNMetaArch(model.DetectionModel):
Raises:
ValueError: if inputs tensor does not have type tf.float32
"""
with tf.name_scope('Preprocessor'):
(resized_inputs,
true_image_shapes) = shape_utils.resize_images_and_return_shapes(
inputs, self._image_resizer_fn)
return (self._feature_extractor.preprocess(resized_inputs),
true_image_shapes)
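# `preprocess` now delegates batched resizing to a shared helper. A minimal
# sketch of shape_utils.resize_images_and_return_shapes, assuming it wraps
# the map_fn pattern (and the tf.float32 check) that previously lived inline
# here:
def resize_images_and_return_shapes(inputs, image_resizer_fn):
  """Maps image_resizer_fn over a batch and returns true image shapes."""
  if inputs.dtype is not tf.float32:
    raise ValueError('`preprocess` expects a tf.float32 tensor')
  outputs = shape_utils.static_or_dynamic_map_fn(
      image_resizer_fn,
      elems=inputs,
      dtype=[tf.float32, tf.int32])
  return outputs[0], outputs[1]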
...@@ -790,31 +791,42 @@ class FasterRCNNMetaArch(model.DetectionModel):
for the first stage RPN (in absolute coordinates). Note that
`num_anchors` can differ depending on whether the model is created in
training or inference mode.
7) feature_maps: A single element list containing a 4-D float32 tensor
with shape [batch_size, height, width, depth] representing the RPN
features to crop.
(and if number_of_stages > 1):
8) refined_box_encodings: a 3-D tensor with shape
[total_num_proposals, num_classes, self._box_coder.code_size]
representing predicted (final) refined box encodings, where
total_num_proposals=batch_size*self._max_num_proposals. If using
a shared box across classes the shape will instead be
[total_num_proposals, 1, self._box_coder.code_size].
9) class_predictions_with_background: a 3-D tensor with shape
[total_num_proposals, num_classes + 1] containing class
predictions (logits) for each of the anchors, where
total_num_proposals=batch_size*self._max_num_proposals.
Note that this tensor *includes* background class predictions
(at class index 0).
10) num_proposals: An int32 tensor of shape [batch_size] representing
the number of proposals generated by the RPN. `num_proposals` allows
us to keep track of which entries are to be treated as zero paddings
and which are not since we always pad the number of proposals to be
`self.max_num_proposals` for each image.
11) proposal_boxes: A float32 tensor of shape
[batch_size, self.max_num_proposals, 4] representing
decoded proposal bounding boxes in absolute coordinates.
12) mask_predictions: (optional) a 4-D tensor with shape
[total_num_padded_proposals, num_classes, mask_height, mask_width]
containing instance mask predictions.
13) raw_detection_boxes: (optional) a
[batch_size, self.max_num_proposals, num_classes, 4] float32 tensor
with detections prior to NMS in normalized coordinates.
14) raw_detection_feature_map_indices: (optional) a
[batch_size, self.max_num_proposals, num_classes] int32 tensor with
indices indicating which feature map each raw detection box was
produced from. The indices correspond to the elements in the
'feature_maps' field.
Raises:
ValueError: If `predict` is called before `preprocess`.
...@@ -868,6 +880,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
for the first stage RPN (in absolute coordinates). Note that
`num_anchors` can differ depending on whether the model is created in
training or inference mode.
7) feature_maps: A single element list containing a 4-D float32 tensor
with shape [batch_size, height, width, depth] representing the RPN
features to crop.
""" """
(rpn_box_predictor_features, rpn_features_to_crop, anchors_boxlist, (rpn_box_predictor_features, rpn_features_to_crop, anchors_boxlist,
image_shape) = self._extract_rpn_feature_maps(preprocessed_inputs) image_shape) = self._extract_rpn_feature_maps(preprocessed_inputs)
...@@ -907,6 +922,7 @@ class FasterRCNNMetaArch(model.DetectionModel): ...@@ -907,6 +922,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
dtype=tf.float32), dtype=tf.float32),
'anchors': 'anchors':
anchors_boxlist.data['boxes'], anchors_boxlist.data['boxes'],
fields.PredictionFields.feature_maps: [rpn_features_to_crop]
}
return prediction_dict
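# A minimal sketch of consuming the new raw-detection outputs, assuming
# `model` is a FasterRCNNMetaArch built with
# return_raw_detections_during_predict=True:
#
#   prediction_dict = model.predict(preprocessed_inputs, true_image_shapes)
#   raw_boxes = prediction_dict[fields.PredictionFields.raw_detection_boxes]
#   fmap_inds = prediction_dict[
#       fields.PredictionFields.raw_detection_feature_map_indices]
#
# raw_boxes is [batch_size, max_num_proposals, num_classes, 4] in normalized
# coordinates; fmap_inds indicates which entry of the 'feature_maps' list
# produced each box.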
...@@ -985,18 +1001,25 @@ class FasterRCNNMetaArch(model.DetectionModel):
of the image.
6) box_classifier_features: a 4-D float32/bfloat16 tensor
representing the features for each proposal.
If self._return_raw_detections_during_predict is True, the dictionary
will also contain:
7) raw_detection_boxes: a 4-D float32 tensor with shape
[batch_size, self.max_num_proposals, num_classes, 4] in normalized
coordinates.
8) raw_detection_feature_map_indices: a 3-D int32 tensor with shape
[batch_size, self.max_num_proposals, num_classes].
""" """
proposal_boxes_normalized, num_proposals = self._proposal_postprocess( proposal_boxes_normalized, num_proposals = self._proposal_postprocess(
rpn_box_encodings, rpn_objectness_predictions_with_background, anchors, rpn_box_encodings, rpn_objectness_predictions_with_background, anchors,
image_shape, true_image_shapes) image_shape, true_image_shapes)
prediction_dict = self._box_prediction(rpn_features_to_crop, prediction_dict = self._box_prediction(rpn_features_to_crop,
proposal_boxes_normalized, proposal_boxes_normalized,
image_shape) image_shape, true_image_shapes)
prediction_dict['num_proposals'] = num_proposals prediction_dict['num_proposals'] = num_proposals
return prediction_dict return prediction_dict
def _box_prediction(self, rpn_features_to_crop, proposal_boxes_normalized, def _box_prediction(self, rpn_features_to_crop, proposal_boxes_normalized,
image_shape): image_shape, true_image_shapes):
"""Predicts the output tensors from second stage of Faster R-CNN. """Predicts the output tensors from second stage of Faster R-CNN.
Args: Args:
...@@ -1008,6 +1031,10 @@ class FasterRCNNMetaArch(model.DetectionModel): ...@@ -1008,6 +1031,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
proposal boxes for all images in the batch. These boxes are represented proposal boxes for all images in the batch. These boxes are represented
as normalized coordinates. as normalized coordinates.
image_shape: A 1D int32 tensors of size [4] containing the image shape. image_shape: A 1D int32 tensors of size [4] containing the image shape.
true_image_shapes: int32 tensor of shape [batch, 3] where each row is
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
Returns:
prediction_dict: a dictionary holding "raw" prediction tensors:
...@@ -1034,6 +1061,16 @@ class FasterRCNNMetaArch(model.DetectionModel):
of the image.
5) box_classifier_features: a 4-D float32/bfloat16 tensor
representing the features for each proposal.
If self._return_raw_detections_during_predict is True, the dictionary
will also contain:
6) raw_detection_boxes: a 4-D float32 tensor with shape
[batch_size, self.max_num_proposals, num_classes, 4] in normalized
coordinates.
7) raw_detection_feature_map_indices: a 3-D int32 tensor with shape
[batch_size, self.max_num_proposals, num_classes].
8) final_anchors: a 3-D float tensor of shape [batch_size,
self.max_num_proposals, 4] containing the reference anchors for raw
detection boxes in normalized coordinates.
""" """
flattened_proposal_feature_maps = ( flattened_proposal_feature_maps = (
self._compute_second_stage_input_feature_maps( self._compute_second_stage_input_feature_maps(
...@@ -1071,10 +1108,54 @@ class FasterRCNNMetaArch(model.DetectionModel): ...@@ -1071,10 +1108,54 @@ class FasterRCNNMetaArch(model.DetectionModel):
'proposal_boxes': absolute_proposal_boxes, 'proposal_boxes': absolute_proposal_boxes,
'box_classifier_features': box_classifier_features, 'box_classifier_features': box_classifier_features,
'proposal_boxes_normalized': proposal_boxes_normalized, 'proposal_boxes_normalized': proposal_boxes_normalized,
'final_anchors': proposal_boxes_normalized
}
if self._return_raw_detections_during_predict:
prediction_dict.update(self._raw_detections_and_feature_map_inds(
refined_box_encodings, absolute_proposal_boxes, true_image_shapes))
return prediction_dict
def _raw_detections_and_feature_map_inds(
self, refined_box_encodings, absolute_proposal_boxes, true_image_shapes):
"""Returns raw detections and feat map inds from where they originated.
Args:
refined_box_encodings: [total_num_proposals, num_classes,
self._box_coder.code_size] float32 tensor.
absolute_proposal_boxes: [batch_size, self.max_num_proposals, 4] float32
tensor representing decoded proposal bounding boxes in absolute
coordinates.
true_image_shapes: [batch, 3] int32 tensor where each row is
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
Returns:
A dictionary with raw detection boxes, and the feature map indices from
which they originated.
"""
box_encodings_batch = tf.reshape(
refined_box_encodings,
[-1, self.max_num_proposals, refined_box_encodings.shape[1],
self._box_coder.code_size])
raw_detection_boxes_absolute = self._batch_decode_boxes(
box_encodings_batch, absolute_proposal_boxes)
raw_detection_boxes_normalized = shape_utils.static_or_dynamic_map_fn(
self._normalize_and_clip_boxes,
elems=[raw_detection_boxes_absolute, true_image_shapes],
dtype=tf.float32)
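# Faster R-CNN crops proposals from a single RPN feature map, so every raw
# detection box is attributed to index 0 of the 'feature_maps' list.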
detection_feature_map_indices = tf.zeros_like(
raw_detection_boxes_normalized[:, :, :, 0], dtype=tf.int32)
return {
fields.PredictionFields.raw_detection_boxes:
raw_detection_boxes_normalized,
fields.PredictionFields.raw_detection_feature_map_indices:
detection_feature_map_indices
}
def _extract_box_classifier_features(self, flattened_feature_maps):
if self._feature_extractor_for_box_classifier_features == (
_UNINITIALIZED_FEATURE_EXTRACTOR):
...@@ -1416,11 +1497,12 @@ class FasterRCNNMetaArch(model.DetectionModel):
detection_boxes: [batch, max_detection, 4]
detection_scores: [batch, max_detections]
detection_multiclass_scores: [batch, max_detections, 2]
detection_anchor_indices: [batch, max_detections]
detection_classes: [batch, max_detections]
(this entry is only created if rpn_mode=False)
num_detections: [batch]
raw_detection_boxes: [batch, total_detections, 4]
raw_detection_scores: [batch, total_detections, num_classes + 1]
Raises:
ValueError: If `predict` is called before `preprocess`.
...@@ -1473,6 +1555,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
if self._number_of_stages == 3:
# Post processing is already performed in 3rd stage. We need to transfer
# postprocessed tensors from `prediction_dict` to `detections_dict`.
# Remove any items from the prediction dictionary if they are not pure
# Tensors.
non_tensor_predictions = [
k for k, v in prediction_dict.items() if not isinstance(v, tf.Tensor)]
for k in non_tensor_predictions:
tf.logging.info('Removing {0} from prediction_dict'.format(k))
prediction_dict.pop(k)
return prediction_dict
def _add_detection_features_output_node(self, detection_boxes,
...@@ -1621,8 +1710,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
normalize_boxes,
elems=[raw_proposal_boxes, image_shapes],
dtype=tf.float32)
proposal_multiclass_scores = (
nmsed_additional_fields.get('multiclass_scores')
if nmsed_additional_fields else None)
return (normalized_proposal_boxes, proposal_scores,
proposal_multiclass_scores, num_proposals,
raw_normalized_proposal_boxes, rpn_objectness_softmax)
...@@ -1899,9 +1989,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
A dictionary containing:
`detection_boxes`: [batch, max_detection, 4] in normalized co-ordinates.
`detection_scores`: [batch, max_detections]
`detection_multiclass_scores`: [batch, max_detections,
num_classes_with_background] tensor with class score distribution for
post-processed detection boxes including background class if any.
`detection_anchor_indices`: [batch, max_detections] with anchor
indices.
`detection_classes`: [batch, max_detections]
`num_detections`: [batch]
`detection_masks`:
...@@ -1909,10 +2001,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
that a pixel-wise sigmoid score converter is applied to the detection
masks.
`raw_detection_boxes`: [batch, total_detections, 4] tensor with decoded
detection boxes in normalized coordinates, before Non-Max Suppression.
The value total_detections is the number of second stage anchors
(i.e. the total number of boxes before NMS).
`raw_detection_scores`: [batch, total_detections,
num_classes_with_background] tensor of multi-class scores for
raw detection boxes. The value total_detections is the number of
second stage anchors (i.e. the total number of boxes before NMS).
""" """
refined_box_encodings_batch = tf.reshape( refined_box_encodings_batch = tf.reshape(
refined_box_encodings, refined_box_encodings,
...@@ -1943,8 +2038,14 @@ class FasterRCNNMetaArch(model.DetectionModel): ...@@ -1943,8 +2038,14 @@ class FasterRCNNMetaArch(model.DetectionModel):
mask_predictions, [-1, self.max_num_proposals, mask_predictions, [-1, self.max_num_proposals,
self.num_classes, mask_height, mask_width]) self.num_classes, mask_height, mask_width])
batch_size = shape_utils.combined_static_and_dynamic_shape(
refined_box_encodings_batch)[0]
batch_anchor_indices = tf.tile(
tf.expand_dims(tf.range(self.max_num_proposals), 0),
multiples=[batch_size, 1])
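# Each row of batch_anchor_indices is [0, 1, ..., max_num_proposals - 1]; the
# indices are cast to float32 so they can ride through NMS as an additional
# field and are cast back to int32 when building detection_anchor_indices.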
additional_fields = {
'multiclass_scores': class_predictions_with_background_batch_normalized,
'anchor_indices': tf.cast(batch_anchor_indices, tf.float32)
}
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
nmsed_additional_fields, num_detections) = self._second_stage_nms_fn(
...@@ -1965,25 +2066,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
else:
raw_detection_boxes = tf.squeeze(refined_decoded_boxes_batch, axis=2)
raw_normalized_detection_boxes = shape_utils.static_or_dynamic_map_fn(
self._normalize_and_clip_boxes,
elems=[raw_detection_boxes, image_shapes],
dtype=tf.float32)
...@@ -1996,6 +2080,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
nmsed_classes,
fields.DetectionResultFields.detection_multiclass_scores:
nmsed_additional_fields['multiclass_scores'],
fields.DetectionResultFields.detection_anchor_indices:
tf.cast(nmsed_additional_fields['anchor_indices'], tf.int32),
fields.DetectionResultFields.num_detections:
tf.cast(num_detections, dtype=tf.float32),
fields.DetectionResultFields.raw_detection_boxes:
...@@ -2041,6 +2127,35 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.stack([combined_shape[0], combined_shape[1],
num_classes, 4]))
def _normalize_and_clip_boxes(self, boxes_and_image_shape):
"""Normalize and clip boxes."""
boxes_per_image = boxes_and_image_shape[0]
image_shape = boxes_and_image_shape[1]
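# Boxes may arrive with a per-class dimension ([max_num_proposals,
# num_classes, 4]); flatten the first two dimensions so the BoxList ops below
# see a rank-2 [N, 4] tensor, then restore the class dimension at the end.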
boxes_contains_classes_dim = boxes_per_image.shape.ndims == 3
if boxes_contains_classes_dim:
boxes_per_image = shape_utils.flatten_first_n_dimensions(
boxes_per_image, 2)
normalized_boxes_per_image = box_list_ops.to_normalized_coordinates(
box_list.BoxList(boxes_per_image),
image_shape[0],
image_shape[1],
check_range=False).get()
normalized_boxes_per_image = box_list_ops.clip_to_window(
box_list.BoxList(normalized_boxes_per_image),
tf.constant([0.0, 0.0, 1.0, 1.0], tf.float32),
filter_nonoverlapping=False).get()
if boxes_contains_classes_dim:
max_num_proposals, num_classes, _ = (
shape_utils.combined_static_and_dynamic_shape(
boxes_and_image_shape[0]))
normalized_boxes_per_image = shape_utils.expand_first_dimension(
normalized_boxes_per_image, [max_num_proposals, num_classes])
return normalized_boxes_per_image
def loss(self, prediction_dict, true_image_shapes, scope=None):
"""Compute scalar loss tensors given prediction tensors.
......
...@@ -244,7 +244,8 @@ class FasterRCNNMetaArchTest(
max_num_proposals,
initial_crop_size,
maxpool_stride,
3),
'feature_maps': [(2, image_size, image_size, 512)]
}
for input_shape in input_shapes:
...@@ -274,9 +275,12 @@ class FasterRCNNMetaArchTest(
'detection_boxes', 'detection_scores',
'detection_multiclass_scores', 'detection_classes',
'detection_masks', 'num_detections', 'mask_predictions',
'raw_detection_boxes', 'raw_detection_scores',
'detection_anchor_indices', 'final_anchors',
])))
for key in expected_shapes:
if isinstance(tensor_dict_out[key], list):
continue
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
self.assertAllEqual(tensor_dict_out['detection_boxes'].shape, [2, 5, 4])
self.assertAllEqual(tensor_dict_out['detection_masks'].shape,
...@@ -288,6 +292,101 @@ class FasterRCNNMetaArchTest(
self.assertAllEqual(tensor_dict_out['mask_predictions'].shape,
[10, num_classes, 14, 14])
@parameterized.parameters(
{'use_keras': True},
{'use_keras': False},
)
def test_raw_detection_boxes_and_anchor_indices_correct(self, use_keras):
batch_size = 2
image_size = 10
max_num_proposals = 8
initial_crop_size = 3
maxpool_stride = 1
input_shapes = [(batch_size, image_size, image_size, 3),
(None, image_size, image_size, 3),
(batch_size, None, None, 3),
(None, None, None, 3)]
expected_num_anchors = image_size * image_size * 3 * 3
expected_shapes = {
'rpn_box_predictor_features':
(batch_size, image_size, image_size, 512),
'rpn_features_to_crop': (batch_size, image_size, image_size, 3),
'image_shape': (4,),
'rpn_box_encodings': (batch_size, expected_num_anchors, 4),
'rpn_objectness_predictions_with_background':
(batch_size, expected_num_anchors, 2),
'anchors': (expected_num_anchors, 4),
'refined_box_encodings': (batch_size * max_num_proposals, 1, 4),
'class_predictions_with_background':
(batch_size * max_num_proposals, 2 + 1),
'num_proposals': (batch_size,),
'proposal_boxes': (batch_size, max_num_proposals, 4),
'proposal_boxes_normalized': (batch_size, max_num_proposals, 4),
'box_classifier_features':
self._get_box_classifier_features_shape(image_size,
batch_size,
max_num_proposals,
initial_crop_size,
maxpool_stride,
3),
'feature_maps': [(batch_size, image_size, image_size, 3)],
'raw_detection_feature_map_indices': (batch_size, max_num_proposals, 1),
'raw_detection_boxes': (batch_size, max_num_proposals, 1, 4),
'final_anchors': (batch_size, max_num_proposals, 4)
}
for input_shape in input_shapes:
test_graph = tf.Graph()
with test_graph.as_default():
model = self._build_model(
is_training=False,
use_keras=use_keras,
number_of_stages=2,
second_stage_batch_size=2,
share_box_across_classes=True,
return_raw_detections_during_predict=True)
preprocessed_inputs = tf.placeholder(tf.float32, shape=input_shape)
_, true_image_shapes = model.preprocess(preprocessed_inputs)
predict_tensor_dict = model.predict(preprocessed_inputs,
true_image_shapes)
postprocess_tensor_dict = model.postprocess(predict_tensor_dict,
true_image_shapes)
init_op = tf.global_variables_initializer()
with self.test_session(graph=test_graph) as sess:
sess.run(init_op)
[predict_dict_out, postprocess_dict_out] = sess.run(
[predict_tensor_dict, postprocess_tensor_dict], feed_dict={
preprocessed_inputs:
np.zeros((batch_size, image_size, image_size, 3))})
self.assertEqual(
set(predict_dict_out.keys()),
set(expected_shapes.keys()))
for key in expected_shapes:
if isinstance(predict_dict_out[key], list):
continue
self.assertAllEqual(predict_dict_out[key].shape, expected_shapes[key])
# Verify that the raw detections from predict and postprocess are the
# same.
self.assertAllClose(
np.squeeze(predict_dict_out['raw_detection_boxes']),
postprocess_dict_out['raw_detection_boxes'])
# Verify that the raw detection boxes at detection anchor indices are the
# same as the postprocessed detections.
for i in range(batch_size):
num_detections_per_image = int(
postprocess_dict_out['num_detections'][i])
detection_boxes_per_image = postprocess_dict_out[
'detection_boxes'][i][:num_detections_per_image]
detection_anchor_indices_per_image = postprocess_dict_out[
'detection_anchor_indices'][i][:num_detections_per_image]
raw_detections_per_image = np.squeeze(predict_dict_out[
'raw_detection_boxes'][i])
raw_detections_at_anchor_indices = raw_detections_per_image[
detection_anchor_indices_per_image]
self.assertAllClose(detection_boxes_per_image,
raw_detections_at_anchor_indices)
@parameterized.parameters(
{'masks_are_class_agnostic': False, 'use_keras': True},
{'masks_are_class_agnostic': True, 'use_keras': True},
...@@ -345,7 +444,8 @@ class FasterRCNNMetaArchTest(
self._get_box_classifier_features_shape(
image_size, batch_size, max_num_proposals, initial_crop_size,
maxpool_stride, 3),
'mask_predictions': (2 * max_num_proposals, mask_shape_1, 14, 14),
'feature_maps': [(2, image_size, image_size, 512)]
}
init_op = tf.global_variables_initializer()
...@@ -359,8 +459,11 @@ class FasterRCNNMetaArchTest(
'rpn_box_encodings',
'rpn_objectness_predictions_with_background',
'anchors',
'final_anchors',
])))
for key in expected_shapes:
if isinstance(tensor_dict_out[key], list):
continue
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
anchors_shape_out = tensor_dict_out['anchors'].shape
......
...@@ -118,27 +118,30 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
text_format.Merge(hyperparams_text_proto, hyperparams)
return hyperparams_builder.KerasLayerHyperparams(hyperparams)
def _get_second_stage_box_predictor_text_proto(
self, share_box_across_classes=False):
share_box_field = 'true' if share_box_across_classes else 'false'
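# The template below is rendered with str.format(), so literal proto braces
# are doubled ('{{' / '}}') to survive substitution of
# {share_box_across_classes}.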
box_predictor_text_proto = """ box_predictor_text_proto = """
mask_rcnn_box_predictor { mask_rcnn_box_predictor {{
fc_hyperparams { fc_hyperparams {{
op: FC op: FC
activation: NONE activation: NONE
regularizer { regularizer {{
l2_regularizer { l2_regularizer {{
weight: 0.0005 weight: 0.0005
} }}
} }}
initializer { initializer {{
variance_scaling_initializer { variance_scaling_initializer {{
factor: 1.0 factor: 1.0
uniform: true uniform: true
mode: FAN_AVG mode: FAN_AVG
} }}
} }}
} }}
} share_box_across_classes: {share_box_across_classes}
""" }}
""".format(share_box_across_classes=share_box_field)
return box_predictor_text_proto
def _add_mask_to_second_stage_box_predictor_text_proto(
...@@ -169,10 +172,11 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
def _get_second_stage_box_predictor(self, num_classes, is_training,
predict_masks, masks_are_class_agnostic,
share_box_across_classes=False,
use_keras=False):
box_predictor_proto = box_predictor_pb2.BoxPredictor()
text_format.Merge(self._get_second_stage_box_predictor_text_proto(
share_box_across_classes), box_predictor_proto)
if predict_masks:
text_format.Merge(
self._add_mask_to_second_stage_box_predictor_text_proto(
...@@ -219,7 +223,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
clip_anchors_to_image=False,
use_matmul_gather_in_matcher=False,
use_static_shapes=False,
calibration_mapping_value=None,
share_box_across_classes=False,
return_raw_detections_during_predict=False):
def image_resizer_fn(image, masks=None):
"""Fake image resizer function."""
...@@ -404,6 +410,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
'clip_anchors_to_image': clip_anchors_to_image,
'use_static_shapes': use_static_shapes,
'resize_masks': True,
'return_raw_detections_during_predict':
return_raw_detections_during_predict
}
return self._get_model(
...@@ -412,7 +420,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
is_training=is_training,
use_keras=use_keras,
predict_masks=predict_masks,
masks_are_class_agnostic=masks_are_class_agnostic,
share_box_across_classes=share_box_across_classes), **common_kwargs)
@parameterized.parameters(
{'use_static_shapes': False, 'use_keras': True},
...@@ -538,7 +547,7 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_output_keys = set([
'rpn_box_predictor_features', 'rpn_features_to_crop', 'image_shape',
'rpn_box_encodings', 'rpn_objectness_predictions_with_background',
'anchors', 'feature_maps'])
# At training time, anchors that exceed image bounds are pruned. Thus
# the `expected_num_anchors` in the above inference mode test is now
# a strict upper bound on the number of anchors.
...@@ -612,7 +621,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_output_shapes['proposal_boxes_normalized'])
self.assertAllEqual(results[11].shape,
expected_output_shapes['box_classifier_features'])
self.assertAllEqual(results[12].shape,
expected_output_shapes['final_anchors'])
batch_size = 2
image_size = 10
max_num_proposals = 8
...@@ -648,7 +658,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
prediction_dict['num_proposals'],
prediction_dict['proposal_boxes'],
prediction_dict['proposal_boxes_normalized'],
prediction_dict['box_classifier_features'],
prediction_dict['final_anchors'])
expected_num_anchors = image_size * image_size * 3 * 3
expected_shapes = {
...@@ -671,7 +682,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
max_num_proposals,
initial_crop_size,
maxpool_stride,
3),
'feature_maps': [(2, image_size, image_size, 512)],
'final_anchors': (2, max_num_proposals, 4)
}
if use_static_shapes:
...@@ -702,6 +715,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
self.assertEqual(set(tensor_dict_out.keys()),
set(expected_shapes.keys()))
for key in expected_shapes:
if isinstance(tensor_dict_out[key], list):
continue
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
@parameterized.parameters(
...@@ -748,7 +763,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
result_tensor_dict['rpn_objectness_predictions_with_background'],
result_tensor_dict['rpn_features_to_crop'],
result_tensor_dict['rpn_box_predictor_features'],
updates,
result_tensor_dict['final_anchors'],
)
image_shape = (batch_size, image_size, image_size, 3)
...@@ -785,7 +801,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
image_size, batch_size, max_num_proposals, initial_crop_size,
maxpool_stride, 3),
'rpn_objectness_predictions_with_background':
(2, image_size * image_size * 9, 2),
'final_anchors': (2, max_num_proposals, 4)
}
# TODO(rathodv): Possibly change utils/test_case.py to accept dictionaries
# and return dictionaries so we don't have to rely on the order of tensors.
...@@ -805,6 +822,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_shapes['rpn_features_to_crop'])
self.assertAllEqual(results[8].shape,
expected_shapes['rpn_box_predictor_features'])
self.assertAllEqual(results[10].shape,
expected_shapes['final_anchors'])
@parameterized.parameters(
{'use_static_shapes': False, 'pad_to_max_dimension': None,
...@@ -1082,7 +1101,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
detections['detection_scores'], detections['detection_classes'],
detections['raw_detection_boxes'],
detections['raw_detection_scores'],
detections['detection_multiclass_scores'],
detections['detection_anchor_indices'])
proposal_boxes = np.array(
[[[1, 1, 2, 3],
...@@ -1110,6 +1130,7 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
[images, refined_box_encodings,
class_predictions_with_background,
num_proposals, proposal_boxes])
# Note that max_total_detections=5 in the NMS config.
expected_num_detections = [5, 4]
expected_detection_classes = [[0, 0, 0, 1, 1], [0, 0, 1, 1, 0]]
expected_detection_scores = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]
...@@ -1123,6 +1144,10 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]]
# Note that a single anchor can be used for multiple detections (predictions
# are made independently per class).
expected_anchor_indices = [[0, 1, 2, 0, 1],
[0, 1, 0, 1]]
h = float(image_shape[1])
w = float(image_shape[2])
...@@ -1143,6 +1168,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_detection_classes[indx][0:num_proposals])
self.assertAllClose(results[6][indx][0:num_proposals],
expected_multiclass_scores[indx][0:num_proposals])
self.assertAllClose(results[7][indx][0:num_proposals],
expected_anchor_indices[indx][0:num_proposals])
self.assertAllClose(results[4], expected_raw_detection_boxes)
self.assertAllClose(results[5],
......
...@@ -82,7 +82,8 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
clip_anchors_to_image=False,
use_static_shapes=False,
resize_masks=False,
freeze_batchnorm=False,
return_raw_detections_during_predict=False):
"""RFCNMetaArch Constructor. """RFCNMetaArch Constructor.
Args: Args:
...@@ -188,6 +189,9 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch): ...@@ -188,6 +189,9 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
training or not. When training with a small batch size (e.g. 1), it is training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm desirable to freeze batch norm update and use pretrained batch norm
params. params.
return_raw_detections_during_predict: Whether to return raw detection
boxes in the predict() method. These are decoded boxes that have not
been through postprocessing (i.e. NMS). Default False.
Raises:
ValueError: If `second_stage_batch_size` > `first_stage_max_proposals`
...@@ -234,7 +238,9 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
clip_anchors_to_image,
use_static_shapes,
resize_masks,
freeze_batchnorm=freeze_batchnorm,
return_raw_detections_during_predict=(
return_raw_detections_during_predict))
self._rfcn_box_predictor = second_stage_rfcn_box_predictor
...@@ -335,7 +341,11 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
'proposal_boxes': absolute_proposal_boxes,
'box_classifier_features': box_classifier_features,
'proposal_boxes_normalized': proposal_boxes_normalized,
'final_anchors': absolute_proposal_boxes
}
if self._return_raw_detections_during_predict:
prediction_dict.update(self._raw_detections_and_feature_map_inds(
refined_box_encodings, absolute_proposal_boxes, true_image_shapes))
return prediction_dict
def regularization_losses(self):
......
...@@ -24,7 +24,9 @@ from object_detection.meta_architectures import rfcn_meta_arch
class RFCNMetaArchTest(
faster_rcnn_meta_arch_test_lib.FasterRCNNMetaArchTestBase):
def _get_second_stage_box_predictor_text_proto(
self, share_box_across_classes=False):
del share_box_across_classes
box_predictor_text_proto = """ box_predictor_text_proto = """
rfcn_box_predictor { rfcn_box_predictor {
conv_hyperparams { conv_hyperparams {
......
...@@ -254,13 +254,21 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
the model graph.
"""
variables_to_restore = {}
if tf.executing_eagerly():
for variable in self.variables:
# variable.name includes ":0" at the end, but the names in the
# checkpoint do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
if var_name.startswith(feature_extractor_scope + '/'):
var_name = var_name.replace(feature_extractor_scope + '/', '')
variables_to_restore[var_name] = variable
else:
# b/137854499: use global_variables.
for variable in variables_helper.get_global_variables_safely():
var_name = variable.op.name
if var_name.startswith(feature_extractor_scope + '/'):
var_name = var_name.replace(feature_extractor_scope + '/', '')
variables_to_restore[var_name] = variable
return variables_to_restore
...@@ -295,7 +303,9 @@ class SSDMetaArch(model.DetectionModel):
expected_loss_weights_fn=None,
use_confidences_as_targets=False,
implicit_example_weight=0.5,
equalization_loss_config=None,
return_raw_detections_during_predict=False,
nms_on_host=True):
"""SSDMetaArch Constructor. """SSDMetaArch Constructor.
TODO(rathodv,jonathanhuang): group NMS parameters + score converter into TODO(rathodv,jonathanhuang): group NMS parameters + score converter into
...@@ -371,6 +381,11 @@ class SSDMetaArch(model.DetectionModel): ...@@ -371,6 +381,11 @@ class SSDMetaArch(model.DetectionModel):
for the implicit negative examples. for the implicit negative examples.
equalization_loss_config: a namedtuple that specifies configs for equalization_loss_config: a namedtuple that specifies configs for
computing equalization loss. computing equalization loss.
return_raw_detections_during_predict: Whether to return raw detection
boxes in the predict() method. These are decoded boxes that have not
been through postprocessing (i.e. NMS). Default False.
nms_on_host: boolean (default: True) controlling whether NMS should be
carried out on the host (outside of TPU).
""" """
super(SSDMetaArch, self).__init__(num_classes=box_predictor.num_classes) super(SSDMetaArch, self).__init__(num_classes=box_predictor.num_classes)
self._is_training = is_training self._is_training = is_training
...@@ -438,6 +453,10 @@ class SSDMetaArch(model.DetectionModel): ...@@ -438,6 +453,10 @@ class SSDMetaArch(model.DetectionModel):
self._equalization_loss_config = equalization_loss_config self._equalization_loss_config = equalization_loss_config
self._return_raw_detections_during_predict = (
return_raw_detections_during_predict)
self._nms_on_host = nms_on_host
@property
def anchors(self):
if not self._anchors:
...@@ -475,17 +494,10 @@ class SSDMetaArch(model.DetectionModel):
Raises:
ValueError: if inputs tensor does not have type tf.float32
"""
with tf.name_scope('Preprocessor'):
(resized_inputs,
true_image_shapes) = shape_utils.resize_images_and_return_shapes(
inputs, self._image_resizer_fn)
return (self._feature_extractor.preprocess(resized_inputs),
true_image_shapes)
...@@ -560,6 +572,14 @@ class SSDMetaArch(model.DetectionModel):
[batch, height_i, width_i, depth_i].
5) anchors: 2-D float tensor of shape [num_anchors, 4] containing
the generated anchors in normalized coordinates.
6) final_anchors: 3-D float tensor of shape [batch_size, num_anchors, 4]
containing the generated anchors in normalized coordinates.
If self._return_raw_detections_during_predict is True, the dictionary
will also contain:
7) raw_detection_boxes: a 4-D float32 tensor with shape
[batch_size, self.max_num_proposals, 4] in normalized coordinates.
8) raw_detection_feature_map_indices: a 3-D int32 tensor with shape
[batch_size, self.max_num_proposals].
""" """
    if self._inplace_batchnorm_update:
      batchnorm_updates_collections = None
...@@ -581,11 +601,11 @@ class SSDMetaArch(model.DetectionModel):
        feature_maps)
    image_shape = shape_utils.combined_static_and_dynamic_shape(
        preprocessed_inputs)
    boxlist_list = self._anchor_generator.generate(
        feature_map_spatial_dims,
        im_height=image_shape[1],
        im_width=image_shape[2])
    self._anchors = box_list_ops.concatenate(boxlist_list)
    if self._box_predictor.is_keras_model:
      predictor_results_dict = self._box_predictor(feature_maps)
    else:
...@@ -596,9 +616,15 @@ class SSDMetaArch(model.DetectionModel):
        predictor_results_dict = self._box_predictor.predict(
            feature_maps, self._anchor_generator.num_anchors_per_location())
    predictions_dict = {
        'preprocessed_inputs':
            preprocessed_inputs,
        'feature_maps':
            feature_maps,
        'anchors':
            self._anchors.get(),
        'final_anchors':
            tf.tile(
                tf.expand_dims(self._anchors.get(), 0), [image_shape[0], 1, 1])
    }
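The new `final_anchors` entry broadcasts the shared anchor set into a
per-image tensor. The expand-and-tile pattern in isolation (values here are
arbitrary):

    import numpy as np
    import tensorflow.compat.v1 as tf

    anchors = tf.constant(np.random.rand(6, 4), dtype=tf.float32)
    batch_size = 3
    # [num_anchors, 4] -> [1, num_anchors, 4] -> [batch_size, num_anchors, 4]
    final_anchors = tf.tile(tf.expand_dims(anchors, 0), [batch_size, 1, 1])
    print(final_anchors.shape)  # (3, 6, 4)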
    for prediction_key, prediction_list in iter(predictor_results_dict.items()):
      prediction = tf.concat(prediction_list, axis=1)
...@@ -606,10 +632,29 @@ class SSDMetaArch(model.DetectionModel):
          prediction.shape[2] == 1):
        prediction = tf.squeeze(prediction, axis=2)
      predictions_dict[prediction_key] = prediction
    if self._return_raw_detections_during_predict:
      predictions_dict.update(self._raw_detections_and_feature_map_inds(
          predictions_dict['box_encodings'], boxlist_list))
    self._batched_prediction_tensor_names = [x for x in predictions_dict
                                             if x != 'anchors']
    return predictions_dict

  def _raw_detections_and_feature_map_inds(self, box_encodings, boxlist_list):
    anchors = self._anchors.get()
    raw_detection_boxes, _ = self._batch_decode(box_encodings, anchors)
    batch_size, _, _ = shape_utils.combined_static_and_dynamic_shape(
        raw_detection_boxes)
    feature_map_indices = (
        self._anchor_generator.anchor_index_to_feature_map_index(boxlist_list))
    feature_map_indices_batched = tf.tile(
        tf.expand_dims(feature_map_indices, 0),
        multiples=[batch_size, 1])
    return {
        fields.PredictionFields.raw_detection_boxes: raw_detection_boxes,
        fields.PredictionFields.raw_detection_feature_map_indices:
            feature_map_indices_batched
    }
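`anchor_index_to_feature_map_index` maps every anchor to the index of the
feature map that generated it; because anchors are generated level by level,
the result is a concatenation of constant runs. A hypothetical,
self-contained sketch with plain anchor counts standing in for
`boxlist_list`:

    import tensorflow.compat.v1 as tf

    def feature_map_indices_for(anchor_counts_per_level):
      indices = [
          tf.fill([count], level)
          for level, count in enumerate(anchor_counts_per_level)
      ]
      return tf.concat(indices, axis=0)

    # Three levels with 4, 2 and 1 anchors -> [0, 0, 0, 0, 1, 1, 2].
    feature_map_indices = feature_map_indices_for([4, 2, 1])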
  def _get_feature_map_spatial_dims(self, feature_maps):
    """Return list of spatial dimensions for each feature map in a list.
...@@ -719,7 +764,9 @@ class SSDMetaArch(model.DetectionModel):
          'multiclass_scores': detection_scores_with_background
      }
      if self._anchors is not None:
        num_boxes = (self._anchors.num_boxes_static() or
                     self._anchors.num_boxes())
        anchor_indices = tf.range(num_boxes)
        batch_anchor_indices = tf.tile(
            tf.expand_dims(anchor_indices, 0), [batch_size, 1])
        # All additional fields need to be float.
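The `num_boxes_static() or num_boxes()` change makes the anchor-index range
work when the anchor count is only known at run time: a Python int is used
when available, otherwise a scalar tensor (tf.range accepts both). The same
static-or-dynamic idiom on a bare tensor, TF 1.x assumed:

    import tensorflow as tf  # TF 1.x

    def leading_dim(t):
      # Python int when the first dimension is statically known,
      # otherwise a scalar int32 tensor.
      static = t.shape[0].value if t.shape.ndims else None
      return static if static is not None else tf.shape(t)[0]

    x = tf.placeholder(tf.float32, shape=[None, 4])
    y = tf.placeholder(tf.float32, shape=[8, 4])
    leading_dim(y)  # 8
    leading_dim(x)  # <tf.Tensor, shape=(), dtype=int32>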
...@@ -730,14 +777,30 @@ class SSDMetaArch(model.DetectionModel):
        detection_keypoints = tf.identity(
            detection_keypoints, 'raw_keypoint_locations')
        additional_fields[fields.BoxListFields.keypoints] = detection_keypoints

      def _non_max_suppression_wrapper(kwargs):
        if self._nms_on_host:
          # Note: NMS is not memory efficient on TPU. This forces NMS to run
          # outside of the TPU.
          return tf.contrib.tpu.outside_compilation(
              lambda x: self._non_max_suppression_fn(**x), kwargs)
        else:
          return self._non_max_suppression_fn(**kwargs)
      (nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
       nmsed_additional_fields,
       num_detections) = _non_max_suppression_wrapper({
           'boxes':
               detection_boxes,
           'scores':
               detection_scores,
           'clip_window':
               self._compute_clip_window(preprocessed_images, true_image_shapes),
           'additional_fields':
               additional_fields,
           'masks':
               prediction_dict.get('mask_predictions')
       })
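`tf.contrib.tpu.outside_compilation` is the TF 1.x escape hatch for running
TPU-unfriendly ops (here, memory-hungry NMS) on the CPU host from inside a
TPU program; the kwargs are bundled into a single dict because
outside_compilation forwards its arguments positionally. A hedged,
TPU-environment-only sketch of the pattern; `host_sort` is hypothetical:

    import tensorflow as tf  # TF 1.x with tf.contrib available

    def host_sort(scores):
      # Ops created here run on the host, not in the compiled TPU program.
      return tf.sort(scores, direction='DESCENDING')

    def tpu_step(scores):
      scores = scores * 2.0  # compiled for the TPU
      return tf.contrib.tpu.outside_compilation(host_sort, scores)

    tpu_program = tf.contrib.tpu.rewrite(tpu_step, [tf.random.uniform([8])])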
      detection_dict = {
          fields.DetectionResultFields.detection_boxes:
              nmsed_boxes,
...@@ -1058,6 +1121,15 @@ class SSDMetaArch(model.DetectionModel):
        with rows of the Match objects corresponding to groundtruth boxes
        and columns corresponding to anchors.
    """
    # TODO(rathodv): Add a test for these summaries.
    try:
      # TODO(kaftan): Integrate these summaries into the v2 style loops
      with tf.compat.v2.init_scope():
        if tf.compat.v2.executing_eagerly():
          return
    except AttributeError:
      pass
    avg_num_gt_boxes = tf.reduce_mean(
        tf.cast(
            tf.stack([tf.shape(x)[0] for x in groundtruth_boxes_list]),
...@@ -1078,14 +1150,6 @@ class SSDMetaArch(model.DetectionModel):
        tf.cast(
            tf.stack([match.num_ignored_columns() for match in match_list]),
            dtype=tf.float32))
    tf.summary.scalar('AvgNumGroundtruthBoxesPerImage',
                      avg_num_gt_boxes,
...@@ -1232,26 +1296,27 @@ class SSDMetaArch(model.DetectionModel):
      ValueError: if fine_tune_checkpoint_type is neither `classification`
        nor `detection`.
    """
    if fine_tune_checkpoint_type == 'classification':
      return self._feature_extractor.restore_from_classification_checkpoint_fn(
          self._extract_features_scope)
    elif fine_tune_checkpoint_type == 'detection':
      variables_to_restore = {}
      if tf.executing_eagerly():
        if load_all_detection_checkpoint_vars:
          # Grab all detection vars by name
          for variable in self.variables:
            # variable.name includes ":0" at the end, but the names in the
            # checkpoint do not have the suffix ":0". So, we strip it here.
            var_name = variable.name[:-2]
            variables_to_restore[var_name] = variable
        else:
          # Grab just the feature extractor vars by name
          for variable in self._feature_extractor.variables:
            # variable.name includes ":0" at the end, but the names in the
            # checkpoint do not have the suffix ":0". So, we strip it here.
            var_name = variable.name[:-2]
            variables_to_restore[var_name] = variable
      else:
        for variable in variables_helper.get_global_variables_safely():
          var_name = variable.op.name
...@@ -1261,7 +1326,11 @@ class SSDMetaArch(model.DetectionModel):
          if var_name.startswith(self._extract_features_scope):
            variables_to_restore[var_name] = variable
      return variables_to_restore
    else:
      raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
          fine_tune_checkpoint_type))
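The eager branches key the restore map by variable name minus the ':0'
suffix that Keras-style variable names carry, since checkpoint keys lack it.
The naming detail in isolation:

    import tensorflow.compat.v1 as tf

    layer = tf.keras.layers.Dense(4, name='dense')
    layer.build((None, 2))
    variables_to_restore = {v.name[:-2]: v for v in layer.variables}
    sorted(variables_to_restore)  # ['dense/bias', 'dense/kernel']
    # A map like this is typically handed to tf.train.Saver, which then
    # restores each variable under its checkpoint key.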

  def updates(self):
    """Returns a list of update operators for this model.
...
...@@ -49,7 +49,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
                    predict_mask=False,
                    use_static_shapes=False,
                    nms_max_size_per_class=5,
                    calibration_mapping_value=None,
                    return_raw_detections_during_predict=False):
    return super(SsdMetaArchTest, self)._create_model(
        model_fn=ssd_meta_arch.SSDMetaArch,
        apply_hard_mining=apply_hard_mining,
...@@ -63,7 +64,9 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
        predict_mask=predict_mask,
        use_static_shapes=use_static_shapes,
        nms_max_size_per_class=nms_max_size_per_class,
        calibration_mapping_value=calibration_mapping_value,
        return_raw_detections_during_predict=(
            return_raw_detections_during_predict))

  def test_preprocess_preserves_shapes_with_dynamic_input_image(
      self, use_keras):
...@@ -105,6 +108,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
    self.assertIn('class_predictions_with_background', prediction_dict)
    self.assertIn('feature_maps', prediction_dict)
    self.assertIn('anchors', prediction_dict)
    self.assertIn('final_anchors', prediction_dict)
    init_op = tf.global_variables_initializer()
    with self.test_session(graph=tf_graph) as sess:
...@@ -121,6 +125,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
      self.assertAllEqual(prediction_out['box_encodings'].shape,
                          expected_box_encodings_shape_out)
      self.assertAllEqual(prediction_out['final_anchors'].shape,
                          (batch_size, num_anchors, 4))
      self.assertAllEqual(
          prediction_out['class_predictions_with_background'].shape,
          expected_class_predictions_with_background_shape_out)
...@@ -137,7 +143,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
      return (predictions['box_encodings'],
              predictions['class_predictions_with_background'],
              predictions['feature_maps'],
              predictions['anchors'], predictions['final_anchors'])
    batch_size = 3
    image_size = 2
    channels = 3
...@@ -145,11 +151,83 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
                                 channels).astype(np.float32)
    expected_box_encodings_shape = (batch_size, num_anchors, code_size)
    expected_class_predictions_shape = (batch_size, num_anchors, num_classes+1)
    final_anchors_shape = (batch_size, num_anchors, 4)
    (box_encodings, class_predictions, _, _, final_anchors) = self.execute(
        graph_fn, [input_image])
    self.assertAllEqual(box_encodings.shape, expected_box_encodings_shape)
    self.assertAllEqual(class_predictions.shape,
                        expected_class_predictions_shape)
    self.assertAllEqual(final_anchors.shape, final_anchors_shape)

  def test_predict_with_raw_output_fields(self, use_keras):
    with tf.Graph().as_default():
      _, num_classes, num_anchors, code_size = self._create_model(
          use_keras=use_keras)

    def graph_fn(input_image):
      model, _, _, _ = self._create_model(
          return_raw_detections_during_predict=True)
      predictions = model.predict(input_image, true_image_shapes=None)
      return (predictions['box_encodings'],
              predictions['class_predictions_with_background'],
              predictions['feature_maps'],
              predictions['anchors'], predictions['final_anchors'],
              predictions['raw_detection_boxes'],
              predictions['raw_detection_feature_map_indices'])

    batch_size = 3
    image_size = 2
    channels = 3
    input_image = np.random.rand(batch_size, image_size, image_size,
                                 channels).astype(np.float32)
    expected_box_encodings_shape = (batch_size, num_anchors, code_size)
    expected_class_predictions_shape = (batch_size, num_anchors, num_classes+1)
    final_anchors_shape = (batch_size, num_anchors, 4)
    expected_raw_detection_boxes_shape = (batch_size, num_anchors, 4)
    (box_encodings, class_predictions, _, _, final_anchors, raw_detection_boxes,
     raw_detection_feature_map_indices) = self.execute(
         graph_fn, [input_image])
    self.assertAllEqual(box_encodings.shape, expected_box_encodings_shape)
    self.assertAllEqual(class_predictions.shape,
                        expected_class_predictions_shape)
    self.assertAllEqual(final_anchors.shape, final_anchors_shape)
    self.assertAllEqual(raw_detection_boxes.shape,
                        expected_raw_detection_boxes_shape)
    self.assertAllEqual(raw_detection_feature_map_indices,
                        np.zeros((batch_size, num_anchors)))

  def test_raw_detection_boxes_agree_predict_postprocess(self, use_keras):
    batch_size = 2
    image_size = 2
    input_shapes = [(batch_size, image_size, image_size, 3),
                    (None, image_size, image_size, 3),
                    (batch_size, None, None, 3),
                    (None, None, None, 3)]
    for input_shape in input_shapes:
      tf_graph = tf.Graph()
      with tf_graph.as_default():
        model, _, _, _ = self._create_model(
            use_keras=use_keras, return_raw_detections_during_predict=True)
        input_placeholder = tf.placeholder(tf.float32, shape=input_shape)
        preprocessed_inputs, true_image_shapes = model.preprocess(
            input_placeholder)
        prediction_dict = model.predict(preprocessed_inputs,
                                        true_image_shapes)
        raw_detection_boxes_predict = prediction_dict['raw_detection_boxes']
        detections = model.postprocess(prediction_dict, true_image_shapes)
        raw_detection_boxes_postprocess = detections['raw_detection_boxes']
        init_op = tf.global_variables_initializer()
      with self.test_session(graph=tf_graph) as sess:
        sess.run(init_op)
        raw_detection_boxes_predict_out, raw_detection_boxes_postprocess_out = (
            sess.run(
                [raw_detection_boxes_predict, raw_detection_boxes_postprocess],
                feed_dict={
                    input_placeholder:
                        np.random.uniform(size=(batch_size, 2, 2, 3))}))
        self.assertAllEqual(raw_detection_boxes_predict_out,
                            raw_detection_boxes_postprocess_out)

  def test_postprocess_results_are_correct(self, use_keras):
    batch_size = 2
...@@ -188,7 +266,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
                           [0.5, 0., 1., 0.5], [1., 1., 1.5, 1.5]]]
    raw_detection_scores = [[[0, 0], [0, 0], [0, 0], [0, 0]],
                            [[0, 0], [0, 0], [0, 0], [0, 0]]]
    detection_anchor_indices_sets = [[0, 1, 2], [0, 1, 2]]
    for input_shape in input_shapes:
      tf_graph = tf.Graph()
...@@ -230,8 +308,9 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
                            raw_detection_boxes)
        self.assertAllEqual(detections_out['raw_detection_scores'],
                            raw_detection_scores)
        for idx in range(batch_size):
          self.assertSameElements(detections_out['detection_anchor_indices'][idx],
                                  detection_anchor_indices_sets[idx])

  def test_postprocess_results_are_correct_static(self, use_keras):
    with tf.Graph().as_default():
...
...@@ -129,7 +129,8 @@ class SSDMetaArchTestBase(test_case.TestCase):
                    predict_mask=False,
                    use_static_shapes=False,
                    nms_max_size_per_class=5,
                    calibration_mapping_value=None,
                    return_raw_detections_during_predict=False):
    is_training = False
    num_classes = 1
    mock_anchor_generator = MockAnchorGenerator2x2()
...@@ -238,6 +239,8 @@ class SSDMetaArchTestBase(test_case.TestCase):
        add_background_class=add_background_class,
        random_example_sampler=random_example_sampler,
        expected_loss_weights_fn=expected_loss_weights_fn,
        return_raw_detections_during_predict=(
            return_raw_detections_during_predict),
        **kwargs)
    return model, num_classes, mock_anchor_generator.num_anchors(), code_size
...
...@@ -267,6 +267,13 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
    # Make sure to set the Keras learning phase. True during training,
    # False for inference.
    tf.keras.backend.set_learning_phase(is_training)
    # Set policy for mixed-precision training with Keras-based models.
    if use_tpu and train_config.use_bfloat16:
      from tensorflow.python.keras.engine import base_layer_utils  # pylint: disable=g-import-not-at-top
      # Enable v2 behavior, as `mixed_bfloat16` is only supported in TF 2.0.
      base_layer_utils.enable_v2_dtype_behavior()
      tf.compat.v2.keras.mixed_precision.experimental.set_policy(
          'mixed_bfloat16')
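TPU jobs opt into this bfloat16 Keras path through the existing
train_config switch (a minimal sketch; all other required pipeline config
fields omitted):

    train_config {
      ...
      use_bfloat16: true
    }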
    detection_model = detection_model_fn(
        is_training=is_training, add_summaries=(not use_tpu))
    scaffold_fn = None
...@@ -315,7 +322,8 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
            features[fields.InputDataFields.true_image_shape]))
    if mode == tf.estimator.ModeKeys.TRAIN:
      load_pretrained = hparams.load_pretrained if hparams else False
      if train_config.fine_tune_checkpoint and load_pretrained:
        if not train_config.fine_tune_checkpoint_type:
          # train_config.from_detection_checkpoint field is deprecated. For
          # backward compatibility, set train_config.fine_tune_checkpoint_type
...@@ -449,6 +457,10 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
          original_image_spatial_shapes=original_image_spatial_shapes,
          true_image_shapes=true_image_shapes)
      if fields.InputDataFields.image_additional_channels in features:
        eval_dict[fields.InputDataFields.image_additional_channels] = features[
            fields.InputDataFields.image_additional_channels]
      if class_agnostic:
        category_index = label_map_util.create_class_agnostic_category_index()
      else:
...