Commit 0ba83cf0 authored by pkulzc, committed by Sergio Guadarrama

Release MobileNet V3 models and SSDLite models with MobileNet V3 backbone. (#7678)

* Merged commit includes the following changes:
275131829  by Sergio Guadarrama:

    Updates mobilenet/README.md to be GitHub compatible, adds a V2+ reference to the mobilenet_v1.md file, and fixes invalid markdown.

--
274908068  by Sergio Guadarrama:

    Opensource MobilenetV3 detection models.

--
274697808  by Sergio Guadarrama:

    Fixed cases where tf.TensorShape was constructed with float dimensions

    This is a prerequisite for making TensorShape and Dimension more strict
    about the types of their arguments.
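To illustrate the kind of fix involved (values hypothetical), dimensions computed with float arithmetic now need an explicit cast before building a shape:

```python
import tensorflow as tf

# A depth multiplier yields a float (48.0); TensorShape wants integer dims.
depth = int(32 * 1.5)
shape = tf.TensorShape([None, depth, depth, 3])
```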

--
273577462  by Sergio Guadarrama:

    Fixing `conv_defs['defaults']` override issue.

--
272801298  by Sergio Guadarrama:

    Adds links to trained models for MobileNet V3 and adds a minimalistic variant of MobileNet V3 to the definitions.

--
268928503  by Sergio Guadarrama:

    Mobilenet v2 with group normalization.

--
263492735  by Sergio Guadarrama:

    Internal change

260037126  by Sergio Guadarrama:

    Adds an option of using a custom depthwise operation in `expanded_conv`.

--
259997001  by Sergio Guadarrama:

    Explicitly mark Python binaries/tests with python_version = "PY2".

--
252697685  by Sergio Guadarrama:

    Internal change

251918746  by Sergio Guadarrama:

    Internal change

251909704  by Sergio Guadarrama:

    Mobilenet V3 backbone implementation.

--
247510236  by Sergio Guadarrama:

    Internal change

246196802  by Sergio Guadarrama:

    Internal change

246014539  by Sergio Guadarrama:

    Internal change

245891435  by Sergio Guadarrama:

    Internal change

245834925  by Sergio Guadarrama:

    n/a

--

PiperOrigin-RevId: 275131829

* Merged commit includes the following changes:
274959989  by Zhichao Lu:

    Update detection model zoo with MobilenetV3 SSD candidates.

--
274908068  by Zhichao Lu:

    Opensource MobilenetV3 detection models.

--
274695889  by richardmunoz:

    RandomPatchGaussian preprocessing step

    This step can be used during model training to randomly apply gaussian noise to a random image patch. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_patch_gaussian {
          random_coef: 0.5
          min_patch_size: 1
          max_patch_size: 250
          min_gaussian_stddev: 0.0
          max_gaussian_stddev: 1.0
        }
      }
      ...
    }

--
274257872  by lzc:

    Internal change.

--
274114689  by Zhichao Lu:

    Pass native_resize flag to other FPN variants.

--
274112308  by lzc:

    Internal change.

--
274090763  by richardmunoz:

    Util function for getting a patch mask on an image for use with the Object Detection API
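A minimal sketch of such a util, assuming a square patch centered at (y, x); the shipped function may differ in signature and edge handling:

```python
import tensorflow as tf

def get_patch_mask(y, x, patch_size, image_height, image_width):
  """Returns a boolean [height, width] mask, True inside a square patch."""
  ymin = tf.maximum(y - patch_size // 2, 0)
  xmin = tf.maximum(x - patch_size // 2, 0)
  ymax = tf.minimum(ymin + patch_size, image_height)
  xmax = tf.minimum(xmin + patch_size, image_width)
  rows = tf.range(image_height)[:, tf.newaxis]
  cols = tf.range(image_width)[tf.newaxis, :]
  return ((rows >= ymin) & (rows < ymax)) & ((cols >= xmin) & (cols < xmax))
```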

--
274069806  by Zhichao Lu:

    Adding functions which will help compute predictions and losses for CenterNet.

--
273860828  by lzc:

    Internal change.

--
273380069  by richardmunoz:

    RandomImageDownscaleToTargetPixels preprocessing step

    This step can be used during model training to randomly downscale an image to a random target number of pixels. If the image does not contain more than the target number of pixels, then downscaling is skipped. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_downscale_to_target_pixels {
          random_coef: 0.5
          min_target_pixels: 300000
          max_target_pixels: 500000
        }
      }
      ...
    }

--
272987602  by Zhichao Lu:

    Avoid -inf when empty box list is passed.
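The underlying issue, roughly: reductions such as tf.reduce_max return -inf on an empty box list, so the empty case needs a guard. A sketch with a hypothetical helper name:

```python
import tensorflow as tf

def safe_reduce_max(scores, default=0.0):
  """tf.reduce_max that returns `default` instead of -inf for empty input."""
  return tf.cond(
      tf.size(scores) > 0,
      lambda: tf.reduce_max(scores),
      lambda: tf.constant(default, dtype=scores.dtype))
```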

--
272525836  by Zhichao Lu:

    Cleanup repeated resizing code in meta archs.

--
272458667  by richardmunoz:

    RandomJpegQuality preprocessing step

    This step can be used during model training to randomly encode the image into a jpeg with a random quality level. Example addition to an Object Detection API pipeline config:

    train_config {
      ...
      data_augmentation_options {
        random_jpeg_quality {
          random_coef: 0.5
          min_jpeg_quality: 80
          max_jpeg_quality: 100
        }
      }
      ...
    }

--
271412717  by Zhichao Lu:

    Enables TPU training with the V2 eager + tf.function Object Detection training loops.

--
270744153  by Zhichao Lu:

    Adding the offset and size target assigners for CenterNet.

--
269916081  by Zhichao Lu:

    Include basic installation in Object Detection API tutorial.
    Also:
     - Use TF2.0
     - Use saved_model

--
269376056  by Zhichao Lu:

    Fix variable loading in RetinaNet with custom loops (makes the code rely a little less on the exact name scopes that are generated).

--
269256251  by lzc:

    Add use_partitioned_nms field to the config and update post_processing_builder to honor that flag when building the NMS function.

--
268865295  by Zhichao Lu:

    Adding functionality for importing and merging back internal state of the metric.

--
268640984  by Zhichao Lu:

    Fix computation of gaussian sigma value to create CenterNet heatmap target.
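For context, CenterNet renders classification targets by splatting a gaussian exp(-(dx^2 + dy^2) / (2 * sigma^2)) around each object center, with sigma derived from the box size. A numpy sketch of the splat (the floored center reflects change 265860884 below):

```python
import numpy as np

def splat_center(heatmap, center_x, center_y, sigma):
  """Draws a gaussian peak onto a [height, width] heatmap in place."""
  height, width = heatmap.shape
  # Floor so the peak lands exactly on a pixel.
  cx, cy = int(np.floor(center_x)), int(np.floor(center_y))
  ys, xs = np.ogrid[:height, :width]
  gaussian = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
  # Take the elementwise max so nearby objects keep their own peaks.
  np.maximum(heatmap, gaussian, out=heatmap)
  return heatmap
```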

--
267475576  by Zhichao Lu:

    Fix for exporter trying to export non-existent exponential moving averages.

--
267286768  by Zhichao Lu:

    Update mixed-precision policy.

--
266166879  by Zhichao Lu:

    Internal change

265860884  by Zhichao Lu:

    Apply floor function to center coordinates when creating heatmap for CenterNet target.

--
265702749  by Zhichao Lu:

    Internal change

--
264241949  by ronnyvotel:

    Updating Faster R-CNN 'final_anchors' to be in normalized coordinates.

--
264175192  by lzc:

    Update model_fn to only read hparams if it is not None.

--
264159328  by Zhichao Lu:

    Modify nearest neighbor upsampling to eliminate a multiply operation. For quantized models, the multiply operation gets unnecessarily quantized and reduces accuracy; simple stacking, which doesn't require quantization, works in place of the broadcast op. Also removes an unnecessary reshape op.
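A sketch of the stacking idea, with hypothetical names; the shipped op may arrange the axes differently:

```python
import tensorflow as tf

def nearest_neighbor_upsampling(x, scale=2):
  """[b, h, w, c] -> [b, h*scale, w*scale, c] without any multiply op."""
  shape = tf.shape(x)
  b, h, w, c = shape[0], shape[1], shape[2], shape[3]
  # Stack copies along new row and column axes, then merge them away.
  x = tf.stack([x] * scale, axis=2)  # [b, h, scale, w, c]
  x = tf.stack([x] * scale, axis=4)  # [b, h, scale, w, scale, c]
  return tf.reshape(x, [b, h * scale, w * scale, c])
```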

--
263668306  by Zhichao Lu:

    Add the option to use dynamic map_fn for batch NMS

--
263031163  by Zhichao Lu:

    Mark outside compilation for NMS as optional.

--
263024916  by Zhichao Lu:

    Add an ExperimentalModel meta arch for experimenting with new model types.

--
262655894  by Zhichao Lu:

    Add the center heatmap target assigner for CenterNet

--
262431036  by Zhichao Lu:

    Adding add_eval_dict to allow for evaluation on model_v2

--
262035351  by ronnyvotel:

    Removing any non-Tensor predictions from the third stage of Mask R-CNN.

--
261953416  by Zhichao Lu:

    Internal change.

--
261834966  by Zhichao Lu:

    Fix the NMS OOM issue on TPU by forcing NMS to run outside of TPU.
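A sketch of the pattern, assuming the TF 1.x TPU API; the exact outside-compilation entry point varies by version:

```python
import tensorflow as tf

def host_nms(boxes, scores, max_output_size=100, iou_threshold=0.6):
  """Runs non-max suppression on the host instead of the TPU core."""
  def _nms(boxes, scores):
    selected = tf.image.non_max_suppression(
        boxes, scores, max_output_size, iou_threshold=iou_threshold)
    return tf.gather(boxes, selected)
  return tf.tpu.outside_compilation(_nms, boxes, scores)
```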

--
261775941  by Zhichao Lu:

    Make Keras InputLayer compatible with both TF 1.x and TF 2.0.

--
261775633  by Zhichao Lu:

    Visualize additional channels with ground-truth bounding boxes.

--
261768117  by lzc:

    Internal change.

--
261766773  by ronnyvotel:

    Exposing `return_raw_detections_during_predict` in Faster R-CNN Proto.

--
260975089  by ronnyvotel:

    Moving calculation of batched prediction tensor names after all tensors in prediction dictionary are created.

--
259816913  by ronnyvotel:

    Adding raw detection boxes and feature map indices to SSD

--
259791955  by Zhichao Lu:

    Added a flag to control the use of partitioned_non_max_suppression.

--
259580475  by Zhichao Lu:

    Tweak quantization-aware training re-writer to support NasFpn model architecture.

--
259579943  by rathodv:

    Add a meta target assigner proto and builders in OD API.

--
259577741  by Zhichao Lu:

    Internal change.

--
259366315  by lzc:

    Internal change.

--
259344310  by ronnyvotel:

    Updating faster rcnn so that raw_detection_boxes from predict() are in normalized coordinates.

--
259338670  by Zhichao Lu:

    Add support for use_native_resize_op to more feature extractors. Use dynamic shapes when static shapes are not available.
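The static-else-dynamic pattern, in the spirit of the API's shape_utils (a sketch, not the exact shipped code):

```python
import tensorflow as tf

def combined_static_and_dynamic_shape(tensor):
  """Per dimension, the static size if known, else the runtime tf.shape."""
  static_shape = tensor.shape.as_list()
  dynamic_shape = tf.shape(tensor)
  return [dim if dim is not None else dynamic_shape[i]
          for i, dim in enumerate(static_shape)]
```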

--
259083543  by ronnyvotel:

    Updating/fixing documentation.

--
259078937  by rathodv:

    Add prediction fields for tensors returned from detection_model.predict.

--
259044601  by Zhichao Lu:

    Add protocol buffer and builders for temperature scaling calibration.
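Temperature scaling itself is a single learned scalar: logits are divided by T > 0 before score conversion, which recalibrates confidences without changing the ranking. A minimal sketch with a hypothetical fitted T:

```python
import tensorflow as tf

def temperature_scaled_scores(logits, temperature=1.5):
  """Calibrates confidences by dividing logits by a fitted scalar T > 0."""
  return tf.nn.softmax(logits / temperature)
```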

--
259036770  by lzc:

    Internal changes.

--
259006223  by ronnyvotel:

    Adding detection anchor indices to Faster R-CNN Config. This is useful when one wishes to associate final detections with the anchors (or pre-NMS boxes) from which they originated.

--
258872501  by Zhichao Lu:

    Run the training pipeline of ssd + resnet_v1_50 + fpn with a checkpoint.

--
258840686  by ronnyvotel:

    Adding standard outputs to DetectionModel.predict(). This CL only updates Faster R-CNN. Other meta architectures will be updated in future CLs.

--
258672969  by lzc:

    Internal change.

--
258649494  by lzc:

    Internal changes.

--
258630321  by ronnyvotel:

    Fixing documentation in shape_utils.flatten_dimensions().

--
258468145  by Zhichao Lu:

    Add additional output tensors parameter to Postprocess op.

--
258099219  by Zhichao Lu:

    Internal changes

--

PiperOrigin-RevId: 274959989
parent 9aed0ffb
......@@ -19,9 +19,9 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import numpy as np
import six
from six.moves import range
from six.moves import zip
import tensorflow as tf
......@@ -36,7 +36,7 @@ else:
from unittest import mock # pylint: disable=g-import-not-at-top
class PreprocessorTest(tf.test.TestCase):
class PreprocessorTest(tf.test.TestCase, parameterized.TestCase):
def createColorfulTestImage(self):
ch255 = tf.fill([1, 100, 200, 1], tf.constant(255, dtype=tf.uint8))
......@@ -2478,6 +2478,233 @@ class PreprocessorTest(tf.test.TestCase):
[images_shape, blacked_images_shape])
self.assertAllEqual(images_shape_, blacked_images_shape_)
def testRandomJpegQuality(self):
preprocessing_options = [(preprocessor.random_jpeg_quality, {
'min_jpeg_quality': 0,
'max_jpeg_quality': 100
})]
images = self.createTestImages()
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
encoded_images = processed_tensor_dict[fields.InputDataFields.image]
images_shape = tf.shape(images)
encoded_images_shape = tf.shape(encoded_images)
with self.test_session() as sess:
images_shape_out, encoded_images_shape_out = sess.run(
[images_shape, encoded_images_shape])
self.assertAllEqual(images_shape_out, encoded_images_shape_out)
def testRandomJpegQualityKeepsStaticChannelShape(self):
    # Set to at least three weeks past the forward compatibility horizon of
    # tf 1.14 (2019/11/01).
# https://github.com/tensorflow/tensorflow/blob/v1.14.0/tensorflow/python/compat/compat.py#L30
if not tf.compat.forward_compatible(year=2019, month=12, day=1):
self.skipTest('Skipping test for future functionality.')
preprocessing_options = [(preprocessor.random_jpeg_quality, {
'min_jpeg_quality': 0,
'max_jpeg_quality': 100
})]
images = self.createTestImages()
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
encoded_images = processed_tensor_dict[fields.InputDataFields.image]
images_static_channels = images.shape[-1]
encoded_images_static_channels = encoded_images.shape[-1]
self.assertEqual(images_static_channels, encoded_images_static_channels)
def testRandomJpegQualityWithCache(self):
preprocessing_options = [(preprocessor.random_jpeg_quality, {
'min_jpeg_quality': 0,
'max_jpeg_quality': 100
})]
self._testPreprocessorCache(preprocessing_options)
def testRandomJpegQualityWithRandomCoefOne(self):
preprocessing_options = [(preprocessor.random_jpeg_quality, {
'random_coef': 1.0
})]
images = self.createTestImages()
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
encoded_images = processed_tensor_dict[fields.InputDataFields.image]
images_shape = tf.shape(images)
encoded_images_shape = tf.shape(encoded_images)
with self.test_session() as sess:
(images_out, encoded_images_out, images_shape_out,
encoded_images_shape_out) = sess.run(
[images, encoded_images, images_shape, encoded_images_shape])
self.assertAllEqual(images_shape_out, encoded_images_shape_out)
self.assertAllEqual(images_out, encoded_images_out)
def testRandomDownscaleToTargetPixels(self):
preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
'min_target_pixels': 100,
'max_target_pixels': 101
})]
images = tf.random_uniform([1, 25, 100, 3])
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
downscaled_images = processed_tensor_dict[fields.InputDataFields.image]
downscaled_shape = tf.shape(downscaled_images)
expected_shape = [1, 5, 20, 3]
with self.test_session() as sess:
downscaled_shape_out = sess.run(downscaled_shape)
self.assertAllEqual(downscaled_shape_out, expected_shape)
def testRandomDownscaleToTargetPixelsWithMasks(self):
preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
'min_target_pixels': 100,
'max_target_pixels': 101
})]
images = tf.random_uniform([1, 25, 100, 3])
masks = tf.random_uniform([10, 25, 100])
tensor_dict = {
fields.InputDataFields.image: images,
fields.InputDataFields.groundtruth_instance_masks: masks
}
preprocessor_arg_map = preprocessor.get_default_func_arg_map(
include_instance_masks=True)
processed_tensor_dict = preprocessor.preprocess(
tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map)
downscaled_images = processed_tensor_dict[fields.InputDataFields.image]
downscaled_masks = processed_tensor_dict[
fields.InputDataFields.groundtruth_instance_masks]
downscaled_images_shape = tf.shape(downscaled_images)
downscaled_masks_shape = tf.shape(downscaled_masks)
expected_images_shape = [1, 5, 20, 3]
expected_masks_shape = [10, 5, 20]
with self.test_session() as sess:
downscaled_images_shape_out, downscaled_masks_shape_out = sess.run(
[downscaled_images_shape, downscaled_masks_shape])
self.assertAllEqual(downscaled_images_shape_out, expected_images_shape)
self.assertAllEqual(downscaled_masks_shape_out, expected_masks_shape)
@parameterized.parameters(
{'test_masks': False},
{'test_masks': True}
)
def testRandomDownscaleToTargetPixelsWithCache(self, test_masks):
preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
'min_target_pixels': 100,
'max_target_pixels': 999
})]
self._testPreprocessorCache(preprocessing_options, test_masks=test_masks)
def testRandomDownscaleToTargetPixelsWithRandomCoefOne(self):
preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
'random_coef': 1.0,
'min_target_pixels': 10,
'max_target_pixels': 20,
})]
images = tf.random_uniform([1, 25, 100, 3])
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
downscaled_images = processed_tensor_dict[fields.InputDataFields.image]
images_shape = tf.shape(images)
downscaled_images_shape = tf.shape(downscaled_images)
with self.test_session() as sess:
(images_out, downscaled_images_out, images_shape_out,
downscaled_images_shape_out) = sess.run(
[images, downscaled_images, images_shape, downscaled_images_shape])
self.assertAllEqual(images_shape_out, downscaled_images_shape_out)
self.assertAllEqual(images_out, downscaled_images_out)
def testRandomDownscaleToTargetPixelsIgnoresSmallImages(self):
preprocessing_options = [(preprocessor.random_downscale_to_target_pixels, {
'min_target_pixels': 1000,
'max_target_pixels': 1001
})]
images = tf.random_uniform([1, 10, 10, 3])
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
downscaled_images = processed_tensor_dict[fields.InputDataFields.image]
images_shape = tf.shape(images)
downscaled_images_shape = tf.shape(downscaled_images)
with self.test_session() as sess:
(images_out, downscaled_images_out, images_shape_out,
downscaled_images_shape_out) = sess.run(
[images, downscaled_images, images_shape, downscaled_images_shape])
self.assertAllEqual(images_shape_out, downscaled_images_shape_out)
self.assertAllEqual(images_out, downscaled_images_out)
def testRandomPatchGaussianShape(self):
preprocessing_options = [(preprocessor.random_patch_gaussian, {
'min_patch_size': 1,
'max_patch_size': 200,
'min_gaussian_stddev': 0.0,
'max_gaussian_stddev': 2.0
})]
images = self.createTestImages()
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
patched_images = processed_tensor_dict[fields.InputDataFields.image]
images_shape = tf.shape(images)
patched_images_shape = tf.shape(patched_images)
self.assertAllEqual(images_shape, patched_images_shape)
def testRandomPatchGaussianClippedToLowerBound(self):
preprocessing_options = [(preprocessor.random_patch_gaussian, {
'min_patch_size': 20,
'max_patch_size': 40,
'min_gaussian_stddev': 50,
'max_gaussian_stddev': 100
})]
images = tf.zeros([1, 5, 4, 3])
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
patched_images = processed_tensor_dict[fields.InputDataFields.image]
self.assertAllGreaterEqual(patched_images, 0.0)
def testRandomPatchGaussianClippedToUpperBound(self):
preprocessing_options = [(preprocessor.random_patch_gaussian, {
'min_patch_size': 20,
'max_patch_size': 40,
'min_gaussian_stddev': 50,
'max_gaussian_stddev': 100
})]
images = tf.constant(255.0, shape=[1, 5, 4, 3])
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
patched_images = processed_tensor_dict[fields.InputDataFields.image]
self.assertAllLessEqual(patched_images, 255.0)
def testRandomPatchGaussianWithCache(self):
preprocessing_options = [(preprocessor.random_patch_gaussian, {
'min_patch_size': 1,
'max_patch_size': 200,
'min_gaussian_stddev': 0.0,
'max_gaussian_stddev': 2.0
})]
self._testPreprocessorCache(preprocessing_options)
def testRandomPatchGaussianWithRandomCoefOne(self):
preprocessing_options = [(preprocessor.random_patch_gaussian, {
'random_coef': 1.0
})]
images = self.createTestImages()
tensor_dict = {fields.InputDataFields.image: images}
processed_tensor_dict = preprocessor.preprocess(tensor_dict,
preprocessing_options)
patched_images = processed_tensor_dict[fields.InputDataFields.image]
images_shape = tf.shape(images)
patched_images_shape = tf.shape(patched_images)
self.assertAllEqual(images_shape, patched_images_shape)
self.assertAllEqual(images, patched_images)
def testAutoAugmentImage(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.autoaugment_image, {
......
......@@ -168,6 +168,22 @@ class BoxListFields(object):
is_crowd = 'is_crowd'
class PredictionFields(object):
"""Naming conventions for standardized prediction outputs.
Attributes:
feature_maps: List of feature maps for prediction.
anchors: Generated anchors.
raw_detection_boxes: Decoded detection boxes without NMS.
raw_detection_feature_map_indices: Feature map indices from which each raw
detection box was produced.
"""
feature_maps = 'feature_maps'
anchors = 'anchors'
raw_detection_boxes = 'raw_detection_boxes'
raw_detection_feature_map_indices = 'raw_detection_feature_map_indices'
class TfExampleFields(object):
"""TF-example proto feature names for object detection.
......
......@@ -41,8 +41,9 @@ import tensorflow as tf
from object_detection.box_coders import faster_rcnn_box_coder
from object_detection.box_coders import mean_stddev_box_coder
from object_detection.core import box_coder as bcoder
from object_detection.core import box_coder
from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import matcher as mat
from object_detection.core import region_similarity_calculator as sim_calc
from object_detection.core import standard_fields as fields
......@@ -57,7 +58,7 @@ class TargetAssigner(object):
def __init__(self,
similarity_calc,
matcher,
box_coder,
box_coder_instance,
negative_class_weight=1.0):
"""Construct Object Detection Target Assigner.
......@@ -65,8 +66,8 @@ class TargetAssigner(object):
similarity_calc: a RegionSimilarityCalculator
matcher: an object_detection.core.Matcher used to match groundtruth to
anchors.
box_coder: an object_detection.core.BoxCoder used to encode matching
groundtruth boxes with respect to anchors.
box_coder_instance: an object_detection.core.BoxCoder used to encode
matching groundtruth boxes with respect to anchors.
negative_class_weight: classification weight to be associated to negative
anchors (default: 1.0). The weight must be in [0., 1.].
......@@ -78,11 +79,11 @@ class TargetAssigner(object):
raise ValueError('similarity_calc must be a RegionSimilarityCalculator')
if not isinstance(matcher, mat.Matcher):
raise ValueError('matcher must be a Matcher')
if not isinstance(box_coder, bcoder.BoxCoder):
if not isinstance(box_coder_instance, box_coder.BoxCoder):
raise ValueError('box_coder must be a BoxCoder')
self._similarity_calc = similarity_calc
self._matcher = matcher
self._box_coder = box_coder
self._box_coder = box_coder_instance
self._negative_class_weight = negative_class_weight
@property
......@@ -391,7 +392,7 @@ def create_target_assigner(reference, stage=None,
if reference == 'Multibox' and stage == 'proposal':
similarity_calc = sim_calc.NegSqDistSimilarity()
matcher = bipartite_matcher.GreedyBipartiteMatcher()
box_coder = mean_stddev_box_coder.MeanStddevBoxCoder()
box_coder_instance = mean_stddev_box_coder.MeanStddevBoxCoder()
elif reference == 'FasterRCNN' and stage == 'proposal':
similarity_calc = sim_calc.IouSimilarity()
......@@ -399,7 +400,7 @@ def create_target_assigner(reference, stage=None,
unmatched_threshold=0.3,
force_match_for_each_row=True,
use_matmul_gather=use_matmul_gather)
box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder(
box_coder_instance = faster_rcnn_box_coder.FasterRcnnBoxCoder(
scale_factors=[10.0, 10.0, 5.0, 5.0])
elif reference == 'FasterRCNN' and stage == 'detection':
......@@ -408,7 +409,7 @@ def create_target_assigner(reference, stage=None,
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=0.5,
negatives_lower_than_unmatched=True,
use_matmul_gather=use_matmul_gather)
box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder(
box_coder_instance = faster_rcnn_box_coder.FasterRcnnBoxCoder(
scale_factors=[10.0, 10.0, 5.0, 5.0])
elif reference == 'FastRCNN':
......@@ -418,12 +419,12 @@ def create_target_assigner(reference, stage=None,
force_match_for_each_row=False,
negatives_lower_than_unmatched=False,
use_matmul_gather=use_matmul_gather)
box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder()
box_coder_instance = faster_rcnn_box_coder.FasterRcnnBoxCoder()
else:
raise ValueError('No valid combination of reference and stage.')
return TargetAssigner(similarity_calc, matcher, box_coder,
return TargetAssigner(similarity_calc, matcher, box_coder_instance,
negative_class_weight=negative_class_weight)
......@@ -702,3 +703,5 @@ def batch_assign_confidences(target_assigner,
batch_match = tf.stack(match_list)
return (batch_cls_targets, batch_cls_weights, batch_reg_targets,
batch_reg_weights, batch_match)
......@@ -67,7 +67,8 @@ def append_postprocessing_op(frozen_graph_def,
num_classes,
scale_values,
detections_per_class=100,
use_regular_nms=False):
use_regular_nms=False,
additional_output_tensors=()):
"""Appends postprocessing custom op.
Args:
......@@ -82,11 +83,13 @@ def append_postprocessing_op(frozen_graph_def,
num_classes: number of classes in SSD detector
scale_values: scale values is a dict with following key-value pairs
{y_scale: 10, x_scale: 10, h_scale: 5, w_scale: 5} that are used in decode
centersize boxes
centersize boxes
detections_per_class: In regular NonMaxSuppression, number of anchors used
for NonMaxSuppression per class
use_regular_nms: Flag to set postprocessing op to use Regular NMS instead
of Fast NMS.
for NonMaxSuppression per class
use_regular_nms: Flag to set postprocessing op to use Regular NMS instead of
Fast NMS.
additional_output_tensors: Array of additional tensor names to output.
Tensors are appended after postprocessing output.
Returns:
transformed_graph_def: Frozen GraphDef with postprocessing custom op
......@@ -140,7 +143,8 @@ def append_postprocessing_op(frozen_graph_def,
['raw_outputs/box_encodings', 'raw_outputs/class_predictions', 'anchors'])
# Transform the graph to append new postprocessing op
input_names = []
output_names = ['TFLite_Detection_PostProcess']
output_names = ['TFLite_Detection_PostProcess'
] + list(additional_output_tensors)
transforms = ['strip_unused_nodes']
transformed_graph_def = TransformGraph(frozen_graph_def, input_names,
output_names, transforms)
......@@ -156,7 +160,8 @@ def export_tflite_graph(pipeline_config,
detections_per_class=100,
use_regular_nms=False,
binary_graph_name='tflite_graph.pb',
txt_graph_name='tflite_graph.pbtxt'):
txt_graph_name='tflite_graph.pbtxt',
additional_output_tensors=()):
"""Exports a tflite compatible graph and anchors for ssd detection model.
Anchors are written to a tensor and tflite compatible graph
......@@ -173,11 +178,13 @@ def export_tflite_graph(pipeline_config,
max_detections: Maximum number of detections (boxes) to show
max_classes_per_detection: Number of classes to display per detection
detections_per_class: In regular NonMaxSuppression, number of anchors used
for NonMaxSuppression per class
use_regular_nms: Flag to set postprocessing op to use Regular NMS instead
of Fast NMS.
for NonMaxSuppression per class
use_regular_nms: Flag to set postprocessing op to use Regular NMS instead of
Fast NMS.
binary_graph_name: Name of the exported graph file in binary format.
txt_graph_name: Name of the exported graph file in text format.
additional_output_tensors: Array of additional tensor names to output.
Additional tensors are appended to the end of output tensor list.
Raises:
ValueError: if the pipeline config contains models other than ssd or uses an
......@@ -191,12 +198,12 @@ def export_tflite_graph(pipeline_config,
num_classes = pipeline_config.model.ssd.num_classes
nms_score_threshold = {
pipeline_config.model.ssd.post_processing.batch_non_max_suppression.
score_threshold
pipeline_config.model.ssd.post_processing.batch_non_max_suppression
.score_threshold
}
nms_iou_threshold = {
pipeline_config.model.ssd.post_processing.batch_non_max_suppression.
iou_threshold
pipeline_config.model.ssd.post_processing.batch_non_max_suppression
.iou_threshold
}
scale_values = {}
scale_values['y_scale'] = {
......@@ -291,7 +298,7 @@ def export_tflite_graph(pipeline_config,
output_node_names=','.join([
'raw_outputs/box_encodings', 'raw_outputs/class_predictions',
'anchors'
]),
] + list(additional_output_tensors)),
restore_op_name='save/restore_all',
filename_tensor_name='save/Const:0',
clear_devices=True,
......@@ -301,9 +308,16 @@ def export_tflite_graph(pipeline_config,
# Add new operation to do post processing in a custom op (TF Lite only)
if add_postprocessing_op:
transformed_graph_def = append_postprocessing_op(
frozen_graph_def, max_detections, max_classes_per_detection,
nms_score_threshold, nms_iou_threshold, num_classes, scale_values,
detections_per_class, use_regular_nms)
frozen_graph_def,
max_detections,
max_classes_per_detection,
nms_score_threshold,
nms_iou_threshold,
num_classes,
scale_values,
detections_per_class,
use_regular_nms,
additional_output_tensors=additional_output_tensors)
else:
# Return frozen without adding post-processing custom op
transformed_graph_def = frozen_graph_def
......
......@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.export_tflite_ssd_graph."""
from __future__ import absolute_import
from __future__ import division
......@@ -31,7 +30,6 @@ from object_detection.protos import graph_rewriter_pb2
from object_detection.protos import pipeline_pb2
from object_detection.protos import post_processing_pb2
if six.PY2:
import mock # pylint: disable=g-import-not-at-top
else:
......@@ -130,7 +128,10 @@ class ExportTfliteGraphTest(tf.test.TestCase):
feed_dict={input_tensor: np.random.rand(1, 10, 10, num_channels)})
return box_encodings_np, class_predictions_np
def _export_graph(self, pipeline_config, num_channels=3):
def _export_graph(self,
pipeline_config,
num_channels=3,
additional_output_tensors=()):
"""Exports a tflite graph."""
output_dir = self.get_temp_dir()
trained_checkpoint_prefix = os.path.join(output_dir, 'model.ckpt')
......@@ -147,18 +148,22 @@ class ExportTfliteGraphTest(tf.test.TestCase):
mock_builder.return_value = FakeModel()
with tf.Graph().as_default():
tf.identity(
tf.constant([[1, 2], [3, 4]], tf.uint8), name='UnattachedTensor')
export_tflite_ssd_graph_lib.export_tflite_graph(
pipeline_config=pipeline_config,
trained_checkpoint_prefix=trained_checkpoint_prefix,
output_dir=output_dir,
add_postprocessing_op=False,
max_detections=10,
max_classes_per_detection=1)
max_classes_per_detection=1,
additional_output_tensors=additional_output_tensors)
return tflite_graph_file
def _export_graph_with_postprocessing_op(self,
pipeline_config,
num_channels=3):
num_channels=3,
additional_output_tensors=()):
"""Exports a tflite graph with custom postprocessing op."""
output_dir = self.get_temp_dir()
trained_checkpoint_prefix = os.path.join(output_dir, 'model.ckpt')
......@@ -175,13 +180,16 @@ class ExportTfliteGraphTest(tf.test.TestCase):
mock_builder.return_value = FakeModel()
with tf.Graph().as_default():
tf.identity(
tf.constant([[1, 2], [3, 4]], tf.uint8), name='UnattachedTensor')
export_tflite_ssd_graph_lib.export_tflite_graph(
pipeline_config=pipeline_config,
trained_checkpoint_prefix=trained_checkpoint_prefix,
output_dir=output_dir,
add_postprocessing_op=True,
max_detections=10,
max_classes_per_detection=1)
max_classes_per_detection=1,
additional_output_tensors=additional_output_tensors)
return tflite_graph_file
def test_export_tflite_graph_with_moving_averages(self):
......@@ -325,7 +333,8 @@ class ExportTfliteGraphTest(tf.test.TestCase):
with tf.gfile.Open(tflite_graph_file) as f:
graph_def.ParseFromString(f.read())
all_op_names = [node.name for node in graph_def.node]
self.assertTrue('TFLite_Detection_PostProcess' in all_op_names)
self.assertIn('TFLite_Detection_PostProcess', all_op_names)
self.assertNotIn('UnattachedTensor', all_op_names)
for node in graph_def.node:
if node.name == 'TFLite_Detection_PostProcess':
self.assertTrue(node.attr['_output_quantized'].b is True)
......@@ -342,6 +351,42 @@ class ExportTfliteGraphTest(tf.test.TestCase):
for t in node.attr['_output_types'].list.type
]))
def test_export_tflite_graph_with_additional_tensors(self):
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
pipeline_config.eval_config.use_moving_averages = False
pipeline_config.model.ssd.image_resizer.fixed_shape_resizer.height = 10
pipeline_config.model.ssd.image_resizer.fixed_shape_resizer.width = 10
tflite_graph_file = self._export_graph(
pipeline_config, additional_output_tensors=['UnattachedTensor'])
self.assertTrue(os.path.exists(tflite_graph_file))
graph = tf.Graph()
with graph.as_default():
graph_def = tf.GraphDef()
with tf.gfile.Open(tflite_graph_file) as f:
graph_def.ParseFromString(f.read())
all_op_names = [node.name for node in graph_def.node]
self.assertIn('UnattachedTensor', all_op_names)
def test_export_tflite_graph_with_postprocess_op_and_additional_tensors(self):
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
pipeline_config.eval_config.use_moving_averages = False
pipeline_config.model.ssd.post_processing.score_converter = (
post_processing_pb2.PostProcessing.SIGMOID)
pipeline_config.model.ssd.image_resizer.fixed_shape_resizer.height = 10
pipeline_config.model.ssd.image_resizer.fixed_shape_resizer.width = 10
pipeline_config.model.ssd.num_classes = 2
tflite_graph_file = self._export_graph_with_postprocessing_op(
pipeline_config, additional_output_tensors=['UnattachedTensor'])
self.assertTrue(os.path.exists(tflite_graph_file))
graph = tf.Graph()
with graph.as_default():
graph_def = tf.GraphDef()
with tf.gfile.Open(tflite_graph_file) as f:
graph_def.ParseFromString(f.read())
all_op_names = [node.name for node in graph_def.node]
self.assertIn('TFLite_Detection_PostProcess', all_op_names)
self.assertIn('UnattachedTensor', all_op_names)
@mock.patch.object(exporter, 'rewrite_nn_resize_op')
def test_export_with_nn_resize_op_not_called_without_fpn(self, mock_get):
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
......
......@@ -40,50 +40,54 @@ def rewrite_nn_resize_op(is_quantized=False):
Args:
is_quantized: True if the default graph is quantized.
"""
input_pattern = graph_matcher.OpTypePattern(
'FakeQuantWithMinMaxVars' if is_quantized else '*')
reshape_1_pattern = graph_matcher.OpTypePattern(
'Reshape', inputs=[input_pattern, 'Const'], ordered_inputs=False)
mul_pattern = graph_matcher.OpTypePattern(
'Mul', inputs=[reshape_1_pattern, 'Const'], ordered_inputs=False)
# The quantization script may or may not insert a fake quant op after the
# Mul. In either case, these min/max vars are not needed once replaced with
# the TF version of NN resize.
fake_quant_pattern = graph_matcher.OpTypePattern(
'FakeQuantWithMinMaxVars',
inputs=[mul_pattern, 'Identity', 'Identity'],
ordered_inputs=False)
reshape_2_pattern = graph_matcher.OpTypePattern(
'Reshape',
inputs=[graph_matcher.OneofPattern([fake_quant_pattern, mul_pattern]),
'Const'],
ordered_inputs=False)
add_type_name = 'Add'
if tf.compat.forward_compatible(2019, 6, 26):
add_type_name = 'AddV2'
add_pattern = graph_matcher.OpTypePattern(
add_type_name, inputs=[reshape_2_pattern, '*'], ordered_inputs=False)
matcher = graph_matcher.GraphMatcher(add_pattern)
for match in matcher.match_graph(tf.get_default_graph()):
projection_op = match.get_op(input_pattern)
reshape_2_op = match.get_op(reshape_2_pattern)
add_op = match.get_op(add_pattern)
nn_resize = tf.image.resize_nearest_neighbor(
projection_op.outputs[0],
add_op.outputs[0].shape.dims[1:3],
align_corners=False,
name=os.path.split(reshape_2_op.name)[0] + '/resize_nearest_neighbor')
for index, op_input in enumerate(add_op.inputs):
if op_input == reshape_2_op.outputs[0]:
add_op._update_input(index, nn_resize) # pylint: disable=protected-access
break
def remove_nn():
"""Remove nearest neighbor upsampling structure and replace with TF op."""
input_pattern = graph_matcher.OpTypePattern(
'FakeQuantWithMinMaxVars' if is_quantized else '*')
stack_1_pattern = graph_matcher.OpTypePattern(
'Pack', inputs=[input_pattern, input_pattern], ordered_inputs=False)
stack_2_pattern = graph_matcher.OpTypePattern(
'Pack', inputs=[stack_1_pattern, stack_1_pattern], ordered_inputs=False)
reshape_pattern = graph_matcher.OpTypePattern(
'Reshape', inputs=[stack_2_pattern, 'Const'], ordered_inputs=False)
consumer_pattern = graph_matcher.OpTypePattern(
'Add|AddV2|Max|Mul', inputs=[reshape_pattern, '*'],
ordered_inputs=False)
match_counter = 0
matcher = graph_matcher.GraphMatcher(consumer_pattern)
for match in matcher.match_graph(tf.get_default_graph()):
match_counter += 1
projection_op = match.get_op(input_pattern)
reshape_op = match.get_op(reshape_pattern)
consumer_op = match.get_op(consumer_pattern)
nn_resize = tf.image.resize_nearest_neighbor(
projection_op.outputs[0],
reshape_op.outputs[0].shape.dims[1:3],
align_corners=False,
name=os.path.split(reshape_op.name)[0] + '/resize_nearest_neighbor')
for index, op_input in enumerate(consumer_op.inputs):
if op_input == reshape_op.outputs[0]:
consumer_op._update_input(index, nn_resize) # pylint: disable=protected-access
break
tf.logging.info('Found and fixed {} matches'.format(match_counter))
return match_counter
# Applying twice because both inputs to Add could be NN pattern
total_removals = 0
while remove_nn():
total_removals += 1
# This number is chosen based on the nas-fpn architecture.
if total_removals > 4:
raise ValueError('Graph removal encountered an infinite loop.')
def replace_variable_values_with_moving_averages(graph,
current_checkpoint_file,
new_checkpoint_file):
new_checkpoint_file,
no_ema_collection=None):
"""Replaces variable values in the checkpoint with their moving averages.
If the current checkpoint has shadow variables maintaining moving averages of
......@@ -95,10 +99,14 @@ def replace_variable_values_with_moving_averages(graph,
current_checkpoint_file: a checkpoint containing both original variables and
their moving averages.
new_checkpoint_file: file path to write a new checkpoint.
no_ema_collection: A list of namescope substrings used to match the
variables to exclude from EMA replacement.
"""
with graph.as_default():
variable_averages = tf.train.ExponentialMovingAverage(0.0)
ema_variables_to_restore = variable_averages.variables_to_restore()
ema_variables_to_restore = config_util.remove_unecessary_ema(
ema_variables_to_restore, no_ema_collection)
with tf.Session() as sess:
read_saver = tf.train.Saver(ema_variables_to_restore)
read_saver.restore(sess, current_checkpoint_file)
......
......@@ -21,6 +21,7 @@ import tensorflow as tf
from google.protobuf import text_format
from tensorflow.python.framework import dtypes
from tensorflow.python.ops import array_ops
from tensorflow.python.tools import strip_unused_lib
from object_detection import exporter
from object_detection.builders import graph_rewriter_builder
from object_detection.builders import model_builder
......@@ -1056,6 +1057,42 @@ class ExportInferenceGraphTest(tf.test.TestCase):
self.assertTrue(resize_op_found)
def test_rewrite_nn_resize_op_multiple_path(self):
g = tf.Graph()
with g.as_default():
with tf.name_scope('nearest_upsampling'):
x = array_ops.placeholder(dtypes.float32, shape=(8, 10, 10, 8))
x_stack = tf.stack([tf.stack([x] * 2, axis=3)] * 2, axis=2)
x_reshape = tf.reshape(x_stack, [8, 20, 20, 8])
with tf.name_scope('nearest_upsampling'):
x_2 = array_ops.placeholder(dtypes.float32, shape=(8, 10, 10, 8))
x_stack_2 = tf.stack([tf.stack([x_2] * 2, axis=3)] * 2, axis=2)
x_reshape_2 = tf.reshape(x_stack_2, [8, 20, 20, 8])
t = x_reshape + x_reshape_2
exporter.rewrite_nn_resize_op()
graph_def = g.as_graph_def()
graph_def = strip_unused_lib.strip_unused(
graph_def,
input_node_names=[
'nearest_upsampling/Placeholder', 'nearest_upsampling_1/Placeholder'
],
output_node_names=['add'],
placeholder_type_enum=dtypes.float32.as_datatype_enum)
counter_resize_op = 0
t_input_ops = [op.name for op in t.op.inputs]
for node in graph_def.node:
# Make sure Stacks are replaced.
self.assertNotEqual(node.op, 'Pack')
if node.op == 'ResizeNearestNeighbor':
counter_resize_op += 1
self.assertIn(node.name + ':0', t_input_ops)
self.assertEqual(counter_resize_op, 2)
if __name__ == '__main__':
tf.test.main()
......@@ -66,6 +66,9 @@ python models/research/object_detection/metrics/oid_challenge_evaluation.py \
--output_metrics=${OUTPUT_METRICS} \
```
Note that the predictions file must contain the following keys:
ImageID,LabelName,Score,XMin,XMax,YMin,YMax
For the Object Detection Track, the participants will be ranked on:
- "OpenImagesDetectionChallenge_Precision/mAP@0.5IOU"
......@@ -94,10 +97,11 @@ evaluation metric implementation is available in the class
masks.
Those should be transformed into a single CSV file in the format:
ImageID,LabelName,ImageWidth,ImageHeight,XMin,YMin,XMax,YMax,GroupOf,Mask
where Mask is MS COCO RLE encoding of a binary mask stored in .png file.
NOTE: the util to make the transformation will be released soon.
ImageID,LabelName,ImageWidth,ImageHeight,XMin,YMin,XMax,YMax,IsGroupOf,Mask
where Mask is the MS COCO RLE encoding of the binary mask stored in the .png
file, compressed with zip and re-coded with base64. See an example
implementation of the encoding function
[here](https://gist.github.com/pculliton/209398a2a52867580c6103e25e55d93c).
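For reference, a minimal sketch of that encoding, assuming pycocotools is available; the linked gist is the authoritative implementation:

```python
import base64
import zlib

import numpy as np
from pycocotools import mask as coco_mask

def encode_binary_mask(mask):
  """Binary [h, w] mask -> zip-compressed, base64-encoded COCO RLE string."""
  rle = coco_mask.encode(np.asfortranarray(mask.astype(np.uint8)))
  return base64.b64encode(zlib.compress(rle['counts'])).decode('ascii')
```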
1. Run the following command to create hierarchical expansion of the instance
segmentation, bounding boxes and image-level label annotations: {value=4}
......@@ -142,6 +146,11 @@ python models/research/object_detection/metrics/oid_challenge_evaluation.py \
--output_metrics=${OUTPUT_METRICS} \
```
Note that the predictions file must contain the following keys:
ImageID,ImageWidth,ImageHeight,LabelName,Score,Mask
Mask must be encoded the same way as groundtruth masks.
For the Instance Segmentation Track, the participants will be ranked on:
- "OpenImagesInstanceSegmentationChallenge_Precision/mAP@0.5IOU"
......@@ -196,6 +205,9 @@ python object_detection/metrics/oid_vrd_challenge_evaluation.py \
--output_metrics=${OUTPUT_METRICS}
```
Note that the predictions file must contain the following keys:
ImageID,LabelName1,LabelName2,RelationshipLabel,Score,XMin1,XMax1,YMin1,YMax1,XMin2,XMax2,YMin2,YMax2
The participants of the challenge will be evaluated by a weighted average of the following three metrics:
- "VRDMetric_Relationships_mAP@0.5IOU"
......
......@@ -35,17 +35,20 @@ tar -xzvf ssd_mobilenet_v1_coco.tar.gz
Inside the un-tar'ed directory, you will find:
* a graph proto (`graph.pbtxt`)
* a checkpoint
(`model.ckpt.data-00000-of-00001`, `model.ckpt.index`, `model.ckpt.meta`)
* a frozen graph proto with weights baked into the graph as constants
(`frozen_inference_graph.pb`) to be used for out of the box inference
(try this out in the Jupyter notebook!)
* a config file (`pipeline.config`) which was used to generate the graph. These
directly correspond to a config file in the
[samples/configs](https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs) directory but often with a modified score threshold. In the case
of the heavier Faster R-CNN models, we also provide a version of the model
that uses a highly reduced number of proposals for speed.
* a graph proto (`graph.pbtxt`)
* a checkpoint (`model.ckpt.data-00000-of-00001`, `model.ckpt.index`,
`model.ckpt.meta`)
* a frozen graph proto with weights baked into the graph as constants
(`frozen_inference_graph.pb`) to be used for out of the box inference (try
this out in the Jupyter notebook!)
* a config file (`pipeline.config`) which was used to generate the graph.
These directly correspond to a config file in the
[samples/configs](https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs)
directory but often with a modified score threshold. In the case of the
heavier Faster R-CNN models, we also provide a version of the model that
uses a highly reduced number of proposals for speed.
* Mobile model only: a TfLite file (`model.tflite`) that can be deployed on
mobile devices.
Some remarks on frozen inference graphs:
......@@ -100,6 +103,13 @@ Note: The asterisk (☆) at the end of model name indicates that this model supp
Note: If you download the tar.gz file of quantized models and un-tar it, you will get a different set of files - a checkpoint, a config file, and tflite frozen graphs (txt/binary).
### Mobile models
Model name | Pixel 1 Latency (ms) | COCO mAP | Outputs
----------------------------------------------------------------------------------------------------------------------------------- | :------------------: | :------: | :-----:
[ssd_mobilenet_v3_large_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v3_large_coco_2019_08_14.tar.gz) | 119 | 22.3 | Boxes
[ssd_mobilenet_v3_small_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v3_small_coco_2019_08_14.tar.gz) | 43 | 15.6 | Boxes
## Kitti-trained models
Model name | Speed (ms) | Pascal mAP@0.5 | Outputs
......
......@@ -71,7 +71,8 @@ def transform_input_data(tensor_dict,
merge_multiple_boxes=False,
retain_original_image=False,
use_multiclass_scores=False,
use_bfloat16=False):
use_bfloat16=False,
retain_original_image_additional_channels=False):
"""A single function that is responsible for all input data transformations.
Data transformation functions are applied in the following order.
......@@ -110,6 +111,8 @@ def transform_input_data(tensor_dict,
this is True and multiclass_scores is empty, one-hot encoding of
`groundtruth_classes` is used as a fallback.
use_bfloat16: (optional) a bool, whether to use bfloat16 in training.
retain_original_image_additional_channels: (optional) Whether to retain
original image additional channels in the output dictionary.
Returns:
A dictionary keyed by fields.InputDataFields containing the tensors obtained
......@@ -139,6 +142,10 @@ def transform_input_data(tensor_dict,
channels = out_tensor_dict[fields.InputDataFields.image_additional_channels]
out_tensor_dict[fields.InputDataFields.image] = tf.concat(
[out_tensor_dict[fields.InputDataFields.image], channels], axis=2)
if retain_original_image_additional_channels:
out_tensor_dict[
fields.InputDataFields.image_additional_channels] = tf.cast(
image_resizer_fn(channels, None)[0], tf.uint8)
# Apply data augmentation ops.
if data_augmentation_fn is not None:
......@@ -445,6 +452,9 @@ def _get_features_dict(input_dict):
if fields.InputDataFields.original_image in input_dict:
features[fields.InputDataFields.original_image] = input_dict[
fields.InputDataFields.original_image]
if fields.InputDataFields.image_additional_channels in input_dict:
features[fields.InputDataFields.image_additional_channels] = input_dict[
fields.InputDataFields.image_additional_channels]
return features
......@@ -663,7 +673,9 @@ def eval_input(eval_config, eval_input_config, model_config,
image_resizer_fn=image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=None,
retain_original_image=eval_config.retain_original_images)
retain_original_image=eval_config.retain_original_images,
retain_original_image_additional_channels=
eval_config.retain_original_image_additional_channels)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=eval_input_config.max_number_of_boxes,
......
......@@ -301,6 +301,70 @@ class InputsTest(test_case.TestCase, parameterized.TestCase):
self.assertEqual(
tf.int32, labels[fields.InputDataFields.groundtruth_difficult].dtype)
def test_ssd_inceptionV2_eval_input_with_additional_channels(
self, eval_batch_size=1):
"""Tests the eval input function for SSDInceptionV2 with additional channels.
Args:
eval_batch_size: Batch size for eval set.
"""
configs = _get_configs_for_model('ssd_inception_v2_pets')
model_config = configs['model']
model_config.ssd.num_classes = 37
configs['eval_input_configs'][0].num_additional_channels = 1
eval_config = configs['eval_config']
eval_config.batch_size = eval_batch_size
eval_config.retain_original_image_additional_channels = True
eval_input_fn = inputs.create_eval_input_fn(
eval_config, configs['eval_input_configs'][0], model_config)
features, labels = _make_initializable_iterator(eval_input_fn()).get_next()
self.assertAllEqual([eval_batch_size, 300, 300, 4],
features[fields.InputDataFields.image].shape.as_list())
self.assertEqual(tf.float32, features[fields.InputDataFields.image].dtype)
self.assertAllEqual(
[eval_batch_size, 300, 300, 3],
features[fields.InputDataFields.original_image].shape.as_list())
self.assertEqual(tf.uint8,
features[fields.InputDataFields.original_image].dtype)
self.assertAllEqual([eval_batch_size, 300, 300, 1], features[
fields.InputDataFields.image_additional_channels].shape.as_list())
self.assertEqual(
tf.uint8,
features[fields.InputDataFields.image_additional_channels].dtype)
self.assertAllEqual([eval_batch_size],
features[inputs.HASH_KEY].shape.as_list())
self.assertEqual(tf.int32, features[inputs.HASH_KEY].dtype)
self.assertAllEqual(
[eval_batch_size, 100, 4],
labels[fields.InputDataFields.groundtruth_boxes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_boxes].dtype)
self.assertAllEqual(
[eval_batch_size, 100, model_config.ssd.num_classes],
labels[fields.InputDataFields.groundtruth_classes].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_classes].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_weights].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_weights].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_area].shape.as_list())
self.assertEqual(tf.float32,
labels[fields.InputDataFields.groundtruth_area].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_is_crowd].shape.as_list())
self.assertEqual(tf.bool,
labels[fields.InputDataFields.groundtruth_is_crowd].dtype)
self.assertAllEqual(
[eval_batch_size, 100],
labels[fields.InputDataFields.groundtruth_difficult].shape.as_list())
self.assertEqual(tf.int32,
labels[fields.InputDataFields.groundtruth_difficult].dtype)
def test_predict_input(self):
"""Tests the predict input function."""
configs = _get_configs_for_model('ssd_inception_v2_pets')
......
......@@ -326,7 +326,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
clip_anchors_to_image=False,
use_static_shapes=False,
resize_masks=True,
freeze_batchnorm=False):
freeze_batchnorm=False,
return_raw_detections_during_predict=False):
"""FasterRCNNMetaArch Constructor.
Args:
......@@ -455,7 +456,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
stage box predictor during training or not. When training with a small
batch size (e.g. 1), it is desirable to freeze batch norm update and
use pretrained batch norm params.
return_raw_detections_during_predict: Whether to return raw detection
boxes in the predict() method. These are decoded boxes that have not
been through postprocessing (i.e. NMS). Default False.
Raises:
ValueError: If `second_stage_batch_size` > `first_stage_max_proposals` at
training time.
......@@ -623,6 +626,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
if self._number_of_stages <= 0 or self._number_of_stages > 3:
raise ValueError('Number of stages should be a value in {1, 2, 3}.')
self._batched_prediction_tensor_names = []
self._return_raw_detections_during_predict = (
return_raw_detections_during_predict)
@property
def first_stage_feature_extractor_scope(self):
......@@ -694,16 +699,12 @@ class FasterRCNNMetaArch(model.DetectionModel):
Raises:
ValueError: if inputs tensor does not have type tf.float32
"""
if inputs.dtype is not tf.float32:
raise ValueError('`preprocess` expects a tf.float32 tensor')
with tf.name_scope('Preprocessor'):
outputs = shape_utils.static_or_dynamic_map_fn(
self._image_resizer_fn,
elems=inputs,
dtype=[tf.float32, tf.int32],
parallel_iterations=self._parallel_iterations)
resized_inputs = outputs[0]
true_image_shapes = outputs[1]
(resized_inputs,
true_image_shapes) = shape_utils.resize_images_and_return_shapes(
inputs, self._image_resizer_fn)
return (self._feature_extractor.preprocess(resized_inputs),
true_image_shapes)
......@@ -790,31 +791,42 @@ class FasterRCNNMetaArch(model.DetectionModel):
for the first stage RPN (in absolute coordinates). Note that
`num_anchors` can differ depending on whether the model is created in
training or inference mode.
7) feature_maps: A single element list containing a 4-D float32 tensor
with shape batch_size, height, width, depth] representing the RPN
features to crop.
(and if number_of_stages > 1):
7) refined_box_encodings: a 3-D tensor with shape
8) refined_box_encodings: a 3-D tensor with shape
[total_num_proposals, num_classes, self._box_coder.code_size]
representing predicted (final) refined box encodings, where
total_num_proposals=batch_size*self._max_num_proposals. If using
a shared box across classes the shape will instead be
[total_num_proposals, 1, self._box_coder.code_size].
8) class_predictions_with_background: a 3-D tensor with shape
9) class_predictions_with_background: a 3-D tensor with shape
[total_num_proposals, num_classes + 1] containing class
predictions (logits) for each of the anchors, where
total_num_proposals=batch_size*self._max_num_proposals.
Note that this tensor *includes* background class predictions
(at class index 0).
9) num_proposals: An int32 tensor of shape [batch_size] representing the
number of proposals generated by the RPN. `num_proposals` allows us
to keep track of which entries are to be treated as zero paddings and
which are not since we always pad the number of proposals to be
10) num_proposals: An int32 tensor of shape [batch_size] representing
the number of proposals generated by the RPN. `num_proposals` allows
us to keep track of which entries are to be treated as zero paddings
and which are not since we always pad the number of proposals to be
`self.max_num_proposals` for each image.
10) proposal_boxes: A float32 tensor of shape
11) proposal_boxes: A float32 tensor of shape
[batch_size, self.max_num_proposals, 4] representing
decoded proposal bounding boxes in absolute coordinates.
11) mask_predictions: (optional) a 4-D tensor with shape
12) mask_predictions: (optional) a 4-D tensor with shape
[total_num_padded_proposals, num_classes, mask_height, mask_width]
containing instance mask predictions.
13) raw_detection_boxes: (optional) a
[batch_size, self.max_num_proposals, num_classes, 4] float32 tensor
with detections prior to NMS in normalized coordinates.
14) raw_detection_feature_map_indices: (optional) a
[batch_size, self.max_num_proposals, num_classes] int32 tensor with
indices indicating which feature map each raw detection box was
produced from. The indices correspond to the elements in the
'feature_maps' field.
Raises:
ValueError: If `predict` is called before `preprocess`.
......@@ -868,6 +880,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
for the first stage RPN (in absolute coordinates). Note that
`num_anchors` can differ depending on whether the model is created in
training or inference mode.
7) feature_maps: A single element list containing a 4-D float32 tensor
with shape [batch_size, height, width, depth] representing the RPN
features to crop.
"""
(rpn_box_predictor_features, rpn_features_to_crop, anchors_boxlist,
image_shape) = self._extract_rpn_feature_maps(preprocessed_inputs)
......@@ -907,6 +922,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
dtype=tf.float32),
'anchors':
anchors_boxlist.data['boxes'],
fields.PredictionFields.feature_maps: [rpn_features_to_crop]
}
return prediction_dict
......@@ -985,18 +1001,25 @@ class FasterRCNNMetaArch(model.DetectionModel):
of the image.
6) box_classifier_features: a 4-D float32/bfloat16 tensor
representing the features for each proposal.
If self._return_raw_detections_during_predict is True, the dictionary
will also contain:
7) raw_detection_boxes: a 4-D float32 tensor with shape
[batch_size, self.max_num_proposals, num_classes, 4] in normalized
coordinates.
8) raw_detection_feature_map_indices: a 3-D int32 tensor with shape
[batch_size, self.max_num_proposals, num_classes].
"""
proposal_boxes_normalized, num_proposals = self._proposal_postprocess(
rpn_box_encodings, rpn_objectness_predictions_with_background, anchors,
image_shape, true_image_shapes)
prediction_dict = self._box_prediction(rpn_features_to_crop,
proposal_boxes_normalized,
image_shape)
image_shape, true_image_shapes)
prediction_dict['num_proposals'] = num_proposals
return prediction_dict
def _box_prediction(self, rpn_features_to_crop, proposal_boxes_normalized,
image_shape):
image_shape, true_image_shapes):
"""Predicts the output tensors from second stage of Faster R-CNN.
Args:
......@@ -1008,6 +1031,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
proposal boxes for all images in the batch. These boxes are represented
as normalized coordinates.
image_shape: A 1D int32 tensors of size [4] containing the image shape.
true_image_shapes: int32 tensor of shape [batch, 3] where each row is
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
Returns:
prediction_dict: a dictionary holding "raw" prediction tensors:
......@@ -1034,6 +1061,16 @@ class FasterRCNNMetaArch(model.DetectionModel):
of the image.
5) box_classifier_features: a 4-D float32/bfloat16 tensor
representing the features for each proposal.
If self._return_raw_detections_during_predict is True, the dictionary
will also contain:
6) raw_detection_boxes: a 4-D float32 tensor with shape
[batch_size, self.max_num_proposals, num_classes, 4] in normalized
coordinates.
7) raw_detection_feature_map_indices: a 3-D int32 tensor with shape
[batch_size, self.max_num_proposals, num_classes].
8) final_anchors: a 3-D float tensor of shape [batch_size,
self.max_num_proposals, 4] containing the reference anchors for raw
detection boxes in normalized coordinates.
"""
flattened_proposal_feature_maps = (
self._compute_second_stage_input_feature_maps(
......@@ -1071,10 +1108,54 @@ class FasterRCNNMetaArch(model.DetectionModel):
'proposal_boxes': absolute_proposal_boxes,
'box_classifier_features': box_classifier_features,
'proposal_boxes_normalized': proposal_boxes_normalized,
'final_anchors': proposal_boxes_normalized
}
if self._return_raw_detections_during_predict:
prediction_dict.update(self._raw_detections_and_feature_map_inds(
refined_box_encodings, absolute_proposal_boxes, true_image_shapes))
return prediction_dict
def _raw_detections_and_feature_map_inds(
self, refined_box_encodings, absolute_proposal_boxes, true_image_shapes):
"""Returns raw detections and feat map inds from where they originated.
Args:
refined_box_encodings: [total_num_proposals, num_classes,
self._box_coder.code_size] float32 tensor.
absolute_proposal_boxes: [batch_size, self.max_num_proposals, 4] float32
tensor representing decoded proposal bounding boxes in absolute
coordinates.
true_image_shapes: [batch, 3] int32 tensor where each row is
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
Returns:
A dictionary with raw detection boxes, and the feature map indices from
which they originated.
"""
box_encodings_batch = tf.reshape(
refined_box_encodings,
[-1, self.max_num_proposals, refined_box_encodings.shape[1],
self._box_coder.code_size])
raw_detection_boxes_absolute = self._batch_decode_boxes(
box_encodings_batch, absolute_proposal_boxes)
raw_detection_boxes_normalized = shape_utils.static_or_dynamic_map_fn(
self._normalize_and_clip_boxes,
elems=[raw_detection_boxes_absolute, true_image_shapes],
dtype=tf.float32)
detection_feature_map_indices = tf.zeros_like(
raw_detection_boxes_normalized[:, :, :, 0], dtype=tf.int32)
return {
fields.PredictionFields.raw_detection_boxes:
raw_detection_boxes_normalized,
fields.PredictionFields.raw_detection_feature_map_indices:
detection_feature_map_indices
}
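For readers tracking the shape bookkeeping above, a minimal self-contained sketch of the reshape and the all-zeros feature map indices (sizes are hypothetical; this is illustrative, not the library's code):

import tensorflow as tf

batch_size, max_num_proposals, num_classes, code_size = 2, 8, 3, 4
refined_box_encodings = tf.zeros(
    [batch_size * max_num_proposals, num_classes, code_size])
# Recover the per-image batch dimension before decoding against proposals.
box_encodings_batch = tf.reshape(
    refined_box_encodings, [-1, max_num_proposals, num_classes, code_size])
# Faster R-CNN crops every proposal from a single feature map, so each raw
# detection is assigned feature map index 0.
feature_map_indices = tf.zeros(
    [batch_size, max_num_proposals, num_classes], dtype=tf.int32)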
def _extract_box_classifier_features(self, flattened_feature_maps):
if self._feature_extractor_for_box_classifier_features == (
_UNINITIALIZED_FEATURE_EXTRACTOR):
......@@ -1416,11 +1497,12 @@ class FasterRCNNMetaArch(model.DetectionModel):
detection_boxes: [batch, max_detection, 4]
detection_scores: [batch, max_detections]
detection_multiclass_scores: [batch, max_detections, 2]
detection_anchor_indices: [batch, max_detections]
detection_classes: [batch, max_detections]
(this entry is only created if rpn_mode=False)
num_detections: [batch]
raw_detection_boxes: [batch, max_detections, 4]
raw_detection_scores: [batch, max_detections, num_classes + 1]
raw_detection_boxes: [batch, total_detections, 4]
raw_detection_scores: [batch, total_detections, num_classes + 1]
Raises:
ValueError: If `predict` is called before `preprocess`.
......@@ -1473,6 +1555,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
if self._number_of_stages == 3:
# Post processing is already performed in 3rd stage. We need to transfer
# postprocessed tensors from `prediction_dict` to `detections_dict`.
# Remove any items from the prediction dictionary if they are not pure
# Tensors.
non_tensor_predictions = [
k for k, v in prediction_dict.items() if not isinstance(v, tf.Tensor)]
for k in non_tensor_predictions:
tf.logging.info('Removing {0} from prediction_dict'.format(k))
prediction_dict.pop(k)
return prediction_dict
def _add_detection_features_output_node(self, detection_boxes,
......@@ -1621,8 +1710,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
normalize_boxes,
elems=[raw_proposal_boxes, image_shapes],
dtype=tf.float32)
proposal_multiclass_scores = nmsed_additional_fields.get(
'multiclass_scores') if nmsed_additional_fields else None,
proposal_multiclass_scores = (
nmsed_additional_fields.get('multiclass_scores')
if nmsed_additional_fields else None)
return (normalized_proposal_boxes, proposal_scores,
proposal_multiclass_scores, num_proposals,
raw_normalized_proposal_boxes, rpn_objectness_softmax)
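The rewrite above also fixes a subtle bug: the old assignment ended with a stray comma, so `proposal_multiclass_scores` was always a 1-tuple instead of a tensor or None. A tiny plain-Python demonstration of the pitfall:

fields = {'multiclass_scores': [0.9, 0.1]}

# Pre-fix form: the trailing comma wraps the result in a 1-tuple.
scores = fields.get('multiclass_scores') if fields else None,
assert scores == ([0.9, 0.1],)

# Post-fix form: parenthesize the conditional expression instead.
scores = (fields.get('multiclass_scores') if fields else None)
assert scores == [0.9, 0.1]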
......@@ -1899,9 +1989,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
A dictionary containing:
`detection_boxes`: [batch, max_detection, 4] in normalized co-ordinates.
`detection_scores`: [batch, max_detections]
detection_multiclass_scores: [batch, max_detections,
`detection_multiclass_scores`: [batch, max_detections,
num_classes_with_background] tensor with class score distribution for
post-processed detection boxes including background class if any.
`detection_anchor_indices`: [batch, max_detections] with anchor
indices.
`detection_classes`: [batch, max_detections]
`num_detections`: [batch]
`detection_masks`:
......@@ -1909,10 +2001,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
that a pixel-wise sigmoid score converter is applied to the detection
masks.
`raw_detection_boxes`: [batch, total_detections, 4] tensor with decoded
detection boxes before Non-Max Suppression.
detection boxes in normalized coordinates, before Non-Max Suppression.
The value total_detections is the number of second stage anchors
(i.e. the total number of boxes before NMS).
`raw_detection_scores`: [batch, total_detections,
num_classes_with_background] tensor of multi-class scores for
raw detection boxes.
raw detection boxes. The value total_detections is the number of
second stage anchors (i.e. the total number of boxes before NMS).
"""
refined_box_encodings_batch = tf.reshape(
refined_box_encodings,
......@@ -1943,8 +2038,14 @@ class FasterRCNNMetaArch(model.DetectionModel):
mask_predictions, [-1, self.max_num_proposals,
self.num_classes, mask_height, mask_width])
batch_size = shape_utils.combined_static_and_dynamic_shape(
refined_box_encodings_batch)[0]
batch_anchor_indices = tf.tile(
tf.expand_dims(tf.range(self.max_num_proposals), 0),
multiples=[batch_size, 1])
additional_fields = {
'multiclass_scores': class_predictions_with_background_batch_normalized
'multiclass_scores': class_predictions_with_background_batch_normalized,
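# All additional fields passed through NMS must be float (see the SSD
# postprocessing path), so the int32 anchor indices are cast to float
# here and back to int32 after NMS.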
'anchor_indices': tf.cast(batch_anchor_indices, tf.float32)
}
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
nmsed_additional_fields, num_detections) = self._second_stage_nms_fn(
......@@ -1965,25 +2066,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
else:
raw_detection_boxes = tf.squeeze(refined_decoded_boxes_batch, axis=2)
def normalize_and_clip_boxes(args):
"""Normalize and clip boxes."""
boxes_per_image = args[0]
image_shape = args[1]
normalized_boxes_per_image = box_list_ops.to_normalized_coordinates(
box_list.BoxList(boxes_per_image),
image_shape[0],
image_shape[1],
check_range=False).get()
normalized_boxes_per_image = box_list_ops.clip_to_window(
box_list.BoxList(normalized_boxes_per_image),
tf.constant([0.0, 0.0, 1.0, 1.0], tf.float32),
filter_nonoverlapping=False).get()
return normalized_boxes_per_image
raw_normalized_detection_boxes = shape_utils.static_or_dynamic_map_fn(
normalize_and_clip_boxes,
self._normalize_and_clip_boxes,
elems=[raw_detection_boxes, image_shapes],
dtype=tf.float32)
......@@ -1996,6 +2080,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
nmsed_classes,
fields.DetectionResultFields.detection_multiclass_scores:
nmsed_additional_fields['multiclass_scores'],
fields.DetectionResultFields.detection_anchor_indices:
tf.cast(nmsed_additional_fields['anchor_indices'], tf.int32),
fields.DetectionResultFields.num_detections:
tf.cast(num_detections, dtype=tf.float32),
fields.DetectionResultFields.raw_detection_boxes:
......@@ -2041,6 +2127,35 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.stack([combined_shape[0], combined_shape[1],
num_classes, 4]))
def _normalize_and_clip_boxes(self, boxes_and_image_shape):
"""Normalize and clip boxes."""
boxes_per_image = boxes_and_image_shape[0]
image_shape = boxes_and_image_shape[1]
boxes_contains_classes_dim = boxes_per_image.shape.ndims == 3
if boxes_contains_classes_dim:
boxes_per_image = shape_utils.flatten_first_n_dimensions(
boxes_per_image, 2)
normalized_boxes_per_image = box_list_ops.to_normalized_coordinates(
box_list.BoxList(boxes_per_image),
image_shape[0],
image_shape[1],
check_range=False).get()
normalized_boxes_per_image = box_list_ops.clip_to_window(
box_list.BoxList(normalized_boxes_per_image),
tf.constant([0.0, 0.0, 1.0, 1.0], tf.float32),
filter_nonoverlapping=False).get()
if boxes_contains_classes_dim:
max_num_proposals, num_classes, _ = (
shape_utils.combined_static_and_dynamic_shape(
boxes_and_image_shape[0]))
normalized_boxes_per_image = shape_utils.expand_first_dimension(
normalized_boxes_per_image, [max_num_proposals, num_classes])
return normalized_boxes_per_image
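Functionally, the helper reduces to the following minimal sketch (plain TF ops standing in for box_list_ops, with the optional classes dimension omitted):

import tensorflow as tf

def normalize_and_clip(boxes, height, width):
  # Scale absolute [ymin, xmin, ymax, xmax] boxes into [0, 1] coordinates,
  # then clip anything that falls outside the image window.
  scale = tf.cast(tf.stack([height, width, height, width]), tf.float32)
  return tf.clip_by_value(boxes / scale, 0.0, 1.0)

# E.g. a box [5, -2, 12, 9] in a 10x10 image becomes [0.5, 0.0, 1.0, 0.9].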
def loss(self, prediction_dict, true_image_shapes, scope=None):
"""Compute scalar loss tensors given prediction tensors.
......
......@@ -244,7 +244,8 @@ class FasterRCNNMetaArchTest(
max_num_proposals,
initial_crop_size,
maxpool_stride,
3)
3),
'feature_maps': [(2, image_size, image_size, 512)]
}
for input_shape in input_shapes:
......@@ -274,9 +275,12 @@ class FasterRCNNMetaArchTest(
'detection_boxes', 'detection_scores',
'detection_multiclass_scores', 'detection_classes',
'detection_masks', 'num_detections', 'mask_predictions',
'raw_detection_boxes', 'raw_detection_scores'
'raw_detection_boxes', 'raw_detection_scores',
'detection_anchor_indices', 'final_anchors',
])))
for key in expected_shapes:
if isinstance(tensor_dict_out[key], list):
continue
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
self.assertAllEqual(tensor_dict_out['detection_boxes'].shape, [2, 5, 4])
self.assertAllEqual(tensor_dict_out['detection_masks'].shape,
......@@ -288,6 +292,101 @@ class FasterRCNNMetaArchTest(
self.assertAllEqual(tensor_dict_out['mask_predictions'].shape,
[10, num_classes, 14, 14])
@parameterized.parameters(
{'use_keras': True},
{'use_keras': False},
)
def test_raw_detection_boxes_and_anchor_indices_correct(self, use_keras):
batch_size = 2
image_size = 10
max_num_proposals = 8
initial_crop_size = 3
maxpool_stride = 1
input_shapes = [(batch_size, image_size, image_size, 3),
(None, image_size, image_size, 3),
(batch_size, None, None, 3),
(None, None, None, 3)]
expected_num_anchors = image_size * image_size * 3 * 3
expected_shapes = {
'rpn_box_predictor_features':
(batch_size, image_size, image_size, 512),
'rpn_features_to_crop': (batch_size, image_size, image_size, 3),
'image_shape': (4,),
'rpn_box_encodings': (batch_size, expected_num_anchors, 4),
'rpn_objectness_predictions_with_background':
(batch_size, expected_num_anchors, 2),
'anchors': (expected_num_anchors, 4),
'refined_box_encodings': (batch_size * max_num_proposals, 1, 4),
'class_predictions_with_background':
(batch_size * max_num_proposals, 2 + 1),
'num_proposals': (batch_size,),
'proposal_boxes': (batch_size, max_num_proposals, 4),
'proposal_boxes_normalized': (batch_size, max_num_proposals, 4),
'box_classifier_features':
self._get_box_classifier_features_shape(image_size,
batch_size,
max_num_proposals,
initial_crop_size,
maxpool_stride,
3),
'feature_maps': [(batch_size, image_size, image_size, 3)],
'raw_detection_feature_map_indices': (batch_size, max_num_proposals, 1),
'raw_detection_boxes': (batch_size, max_num_proposals, 1, 4),
'final_anchors': (batch_size, max_num_proposals, 4)
}
for input_shape in input_shapes:
test_graph = tf.Graph()
with test_graph.as_default():
model = self._build_model(
is_training=False,
use_keras=use_keras,
number_of_stages=2,
second_stage_batch_size=2,
share_box_across_classes=True,
return_raw_detections_during_predict=True)
preprocessed_inputs = tf.placeholder(tf.float32, shape=input_shape)
_, true_image_shapes = model.preprocess(preprocessed_inputs)
predict_tensor_dict = model.predict(preprocessed_inputs,
true_image_shapes)
postprocess_tensor_dict = model.postprocess(predict_tensor_dict,
true_image_shapes)
init_op = tf.global_variables_initializer()
with self.test_session(graph=test_graph) as sess:
sess.run(init_op)
[predict_dict_out, postprocess_dict_out] = sess.run(
[predict_tensor_dict, postprocess_tensor_dict], feed_dict={
preprocessed_inputs:
np.zeros((batch_size, image_size, image_size, 3))})
self.assertEqual(
set(predict_dict_out.keys()),
set(expected_shapes.keys()))
for key in expected_shapes:
if isinstance(predict_dict_out[key], list):
continue
self.assertAllEqual(predict_dict_out[key].shape, expected_shapes[key])
# Verify that the raw detections from predict and postprocess are the
# same.
self.assertAllClose(
np.squeeze(predict_dict_out['raw_detection_boxes']),
postprocess_dict_out['raw_detection_boxes'])
# Verify that the raw detection boxes at detection anchor indices are the
# same as the postprocessed detections.
for i in range(batch_size):
num_detections_per_image = int(
postprocess_dict_out['num_detections'][i])
detection_boxes_per_image = postprocess_dict_out[
'detection_boxes'][i][:num_detections_per_image]
detection_anchor_indices_per_image = postprocess_dict_out[
'detection_anchor_indices'][i][:num_detections_per_image]
raw_detections_per_image = np.squeeze(predict_dict_out[
'raw_detection_boxes'][i])
raw_detections_at_anchor_indices = raw_detections_per_image[
detection_anchor_indices_per_image]
self.assertAllClose(detection_boxes_per_image,
raw_detections_at_anchor_indices)
@parameterized.parameters(
{'masks_are_class_agnostic': False, 'use_keras': True},
{'masks_are_class_agnostic': True, 'use_keras': True},
......@@ -345,7 +444,8 @@ class FasterRCNNMetaArchTest(
self._get_box_classifier_features_shape(
image_size, batch_size, max_num_proposals, initial_crop_size,
maxpool_stride, 3),
'mask_predictions': (2 * max_num_proposals, mask_shape_1, 14, 14)
'mask_predictions': (2 * max_num_proposals, mask_shape_1, 14, 14),
'feature_maps': [(2, image_size, image_size, 512)]
}
init_op = tf.global_variables_initializer()
......@@ -359,8 +459,11 @@ class FasterRCNNMetaArchTest(
'rpn_box_encodings',
'rpn_objectness_predictions_with_background',
'anchors',
'final_anchors',
])))
for key in expected_shapes:
if isinstance(tensor_dict_out[key], list):
continue
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
anchors_shape_out = tensor_dict_out['anchors'].shape
......
......@@ -118,27 +118,30 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
text_format.Merge(hyperparams_text_proto, hyperparams)
return hyperparams_builder.KerasLayerHyperparams(hyperparams)
def _get_second_stage_box_predictor_text_proto(self):
def _get_second_stage_box_predictor_text_proto(
self, share_box_across_classes=False):
share_box_field = 'true' if share_box_across_classes else 'false'
box_predictor_text_proto = """
mask_rcnn_box_predictor {
fc_hyperparams {
mask_rcnn_box_predictor {{
fc_hyperparams {{
op: FC
activation: NONE
regularizer {
l2_regularizer {
regularizer {{
l2_regularizer {{
weight: 0.0005
}
}
initializer {
variance_scaling_initializer {
}}
}}
initializer {{
variance_scaling_initializer {{
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
"""
}}
}}
}}
share_box_across_classes: {share_box_across_classes}
}}
""".format(share_box_across_classes=share_box_field)
return box_predictor_text_proto
def _add_mask_to_second_stage_box_predictor_text_proto(
......@@ -169,10 +172,11 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
def _get_second_stage_box_predictor(self, num_classes, is_training,
predict_masks, masks_are_class_agnostic,
share_box_across_classes=False,
use_keras=False):
box_predictor_proto = box_predictor_pb2.BoxPredictor()
text_format.Merge(self._get_second_stage_box_predictor_text_proto(),
box_predictor_proto)
text_format.Merge(self._get_second_stage_box_predictor_text_proto(
share_box_across_classes), box_predictor_proto)
if predict_masks:
text_format.Merge(
self._add_mask_to_second_stage_box_predictor_text_proto(
......@@ -219,7 +223,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
clip_anchors_to_image=False,
use_matmul_gather_in_matcher=False,
use_static_shapes=False,
calibration_mapping_value=None):
calibration_mapping_value=None,
share_box_across_classes=False,
return_raw_detections_during_predict=False):
def image_resizer_fn(image, masks=None):
"""Fake image resizer function."""
......@@ -404,6 +410,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
'clip_anchors_to_image': clip_anchors_to_image,
'use_static_shapes': use_static_shapes,
'resize_masks': True,
'return_raw_detections_during_predict':
return_raw_detections_during_predict
}
return self._get_model(
......@@ -412,7 +420,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
is_training=is_training,
use_keras=use_keras,
predict_masks=predict_masks,
masks_are_class_agnostic=masks_are_class_agnostic), **common_kwargs)
masks_are_class_agnostic=masks_are_class_agnostic,
share_box_across_classes=share_box_across_classes), **common_kwargs)
@parameterized.parameters(
{'use_static_shapes': False, 'use_keras': True},
......@@ -538,7 +547,7 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_output_keys = set([
'rpn_box_predictor_features', 'rpn_features_to_crop', 'image_shape',
'rpn_box_encodings', 'rpn_objectness_predictions_with_background',
'anchors'])
'anchors', 'feature_maps'])
# At training time, anchors that exceed image bounds are pruned. Thus
# the `expected_num_anchors` in the above inference mode test is now
# a strict upper bound on the number of anchors.
......@@ -612,7 +621,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_output_shapes['proposal_boxes_normalized'])
self.assertAllEqual(results[11].shape,
expected_output_shapes['box_classifier_features'])
self.assertAllEqual(results[12].shape,
expected_output_shapes['final_anchors'])
batch_size = 2
image_size = 10
max_num_proposals = 8
......@@ -648,7 +658,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
prediction_dict['num_proposals'],
prediction_dict['proposal_boxes'],
prediction_dict['proposal_boxes_normalized'],
prediction_dict['box_classifier_features'])
prediction_dict['box_classifier_features'],
prediction_dict['final_anchors'])
expected_num_anchors = image_size * image_size * 3 * 3
expected_shapes = {
......@@ -671,7 +682,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
max_num_proposals,
initial_crop_size,
maxpool_stride,
3)
3),
'feature_maps': [(2, image_size, image_size, 512)],
'final_anchors': (2, max_num_proposals, 4)
}
if use_static_shapes:
......@@ -702,6 +715,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
self.assertEqual(set(tensor_dict_out.keys()),
set(expected_shapes.keys()))
for key in expected_shapes:
if isinstance(tensor_dict_out[key], list):
continue
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
@parameterized.parameters(
......@@ -748,7 +763,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
result_tensor_dict['rpn_objectness_predictions_with_background'],
result_tensor_dict['rpn_features_to_crop'],
result_tensor_dict['rpn_box_predictor_features'],
updates
updates,
result_tensor_dict['final_anchors'],
)
image_shape = (batch_size, image_size, image_size, 3)
......@@ -785,7 +801,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
image_size, batch_size, max_num_proposals, initial_crop_size,
maxpool_stride, 3),
'rpn_objectness_predictions_with_background':
(2, image_size * image_size * 9, 2)
(2, image_size * image_size * 9, 2),
'final_anchors': (2, max_num_proposals, 4)
}
# TODO(rathodv): Possibly change utils/test_case.py to accept dictionaries
# and return dictionaries so we don't have to rely on the order of tensors.
......@@ -805,6 +822,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_shapes['rpn_features_to_crop'])
self.assertAllEqual(results[8].shape,
expected_shapes['rpn_box_predictor_features'])
self.assertAllEqual(results[10].shape,
expected_shapes['final_anchors'])
@parameterized.parameters(
{'use_static_shapes': False, 'pad_to_max_dimension': None,
......@@ -1082,7 +1101,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
detections['detection_scores'], detections['detection_classes'],
detections['raw_detection_boxes'],
detections['raw_detection_scores'],
detections['detection_multiclass_scores'])
detections['detection_multiclass_scores'],
detections['detection_anchor_indices'])
proposal_boxes = np.array(
[[[1, 1, 2, 3],
......@@ -1110,6 +1130,7 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
[images, refined_box_encodings,
class_predictions_with_background,
num_proposals, proposal_boxes])
# Note that max_total_detections=5 in the NMS config.
expected_num_detections = [5, 4]
expected_detection_classes = [[0, 0, 0, 1, 1], [0, 0, 1, 1, 0]]
expected_detection_scores = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]
......@@ -1123,6 +1144,10 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]]
# Note that a single anchor can be used for multiple detections (predictions
# are made independently per class).
expected_anchor_indices = [[0, 1, 2, 0, 1],
[0, 1, 0, 1]]
h = float(image_shape[1])
w = float(image_shape[2])
......@@ -1143,6 +1168,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_detection_classes[indx][0:num_proposals])
self.assertAllClose(results[6][indx][0:num_proposals],
expected_multiclass_scores[indx][0:num_proposals])
self.assertAllClose(results[7][indx][0:num_proposals],
expected_anchor_indices[indx][0:num_proposals])
self.assertAllClose(results[4], expected_raw_detection_boxes)
self.assertAllClose(results[5],
......
......@@ -82,7 +82,8 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
clip_anchors_to_image=False,
use_static_shapes=False,
resize_masks=False,
freeze_batchnorm=False):
freeze_batchnorm=False,
return_raw_detections_during_predict=False):
"""RFCNMetaArch Constructor.
Args:
......@@ -188,6 +189,9 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
return_raw_detections_during_predict: Whether to return raw detection
boxes in the predict() method. These are decoded boxes that have not
been through postprocessing (i.e. NMS). Default False.
Raises:
ValueError: If `second_stage_batch_size` > `first_stage_max_proposals`
......@@ -234,7 +238,9 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
clip_anchors_to_image,
use_static_shapes,
resize_masks,
freeze_batchnorm=freeze_batchnorm)
freeze_batchnorm=freeze_batchnorm,
return_raw_detections_during_predict=(
return_raw_detections_during_predict))
self._rfcn_box_predictor = second_stage_rfcn_box_predictor
......@@ -335,7 +341,11 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
'proposal_boxes': absolute_proposal_boxes,
'box_classifier_features': box_classifier_features,
'proposal_boxes_normalized': proposal_boxes_normalized,
'final_anchors': absolute_proposal_boxes
}
if self._return_raw_detections_during_predict:
prediction_dict.update(self._raw_detections_and_feature_map_inds(
refined_box_encodings, absolute_proposal_boxes, true_image_shapes))
return prediction_dict
def regularization_losses(self):
......
......@@ -24,7 +24,9 @@ from object_detection.meta_architectures import rfcn_meta_arch
class RFCNMetaArchTest(
faster_rcnn_meta_arch_test_lib.FasterRCNNMetaArchTestBase):
def _get_second_stage_box_predictor_text_proto(self):
def _get_second_stage_box_predictor_text_proto(
self, share_box_across_classes=False):
del share_box_across_classes
box_predictor_text_proto = """
rfcn_box_predictor {
conv_hyperparams {
......
......@@ -254,13 +254,21 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
the model graph.
"""
variables_to_restore = {}
for variable in self.variables:
# variable.name includes ":0" at the end, but the names in the checkpoint
# do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
if var_name.startswith(feature_extractor_scope + '/'):
var_name = var_name.replace(feature_extractor_scope + '/', '')
variables_to_restore[var_name] = variable
if tf.executing_eagerly():
for variable in self.variables:
# variable.name includes ":0" at the end, but the names in the
# checkpoint do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
if var_name.startswith(feature_extractor_scope + '/'):
var_name = var_name.replace(feature_extractor_scope + '/', '')
variables_to_restore[var_name] = variable
else:
# b/137854499: use global_variables.
for variable in variables_helper.get_global_variables_safely():
var_name = variable.op.name
if var_name.startswith(feature_extractor_scope + '/'):
var_name = var_name.replace(feature_extractor_scope + '/', '')
variables_to_restore[var_name] = variable
return variables_to_restore
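The two branches differ only in how a checkpoint key is derived from a variable; a short sketch of the name mapping (names are hypothetical):

feature_extractor_scope = 'FeatureExtractor'  # hypothetical scope name
# Eager branch: Keras variable names carry a ':0' suffix that checkpoint
# keys lack, so it is stripped before removing the scope prefix.
var_name = 'FeatureExtractor/conv1/kernel:0'[:-2]
var_name = var_name.replace(feature_extractor_scope + '/', '')
assert var_name == 'conv1/kernel'
# Graph branch: `variable.op.name` already omits the ':0' suffix.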
......@@ -295,7 +303,9 @@ class SSDMetaArch(model.DetectionModel):
expected_loss_weights_fn=None,
use_confidences_as_targets=False,
implicit_example_weight=0.5,
equalization_loss_config=None):
equalization_loss_config=None,
return_raw_detections_during_predict=False,
nms_on_host=True):
"""SSDMetaArch Constructor.
TODO(rathodv,jonathanhuang): group NMS parameters + score converter into
......@@ -371,6 +381,11 @@ class SSDMetaArch(model.DetectionModel):
for the implicit negative examples.
equalization_loss_config: a namedtuple that specifies configs for
computing equalization loss.
return_raw_detections_during_predict: Whether to return raw detection
boxes in the predict() method. These are decoded boxes that have not
been through postprocessing (i.e. NMS). Default False.
nms_on_host: boolean (default: True) controlling whether NMS should be
carried out on the host (outside of TPU).
"""
super(SSDMetaArch, self).__init__(num_classes=box_predictor.num_classes)
self._is_training = is_training
......@@ -438,6 +453,10 @@ class SSDMetaArch(model.DetectionModel):
self._equalization_loss_config = equalization_loss_config
self._return_raw_detections_during_predict = (
return_raw_detections_during_predict)
self._nms_on_host = nms_on_host
@property
def anchors(self):
if not self._anchors:
......@@ -475,17 +494,10 @@ class SSDMetaArch(model.DetectionModel):
Raises:
ValueError: if inputs tensor does not have type tf.float32
"""
if inputs.dtype is not tf.float32:
raise ValueError('`preprocess` expects a tf.float32 tensor')
with tf.name_scope('Preprocessor'):
# TODO(jonathanhuang): revisit whether to always use batch size as
# the number of parallel iterations vs allow for dynamic batching.
outputs = shape_utils.static_or_dynamic_map_fn(
self._image_resizer_fn,
elems=inputs,
dtype=[tf.float32, tf.int32])
resized_inputs = outputs[0]
true_image_shapes = outputs[1]
(resized_inputs,
true_image_shapes) = shape_utils.resize_images_and_return_shapes(
inputs, self._image_resizer_fn)
return (self._feature_extractor.preprocess(resized_inputs),
true_image_shapes)
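A sketch of what the new `shape_utils.resize_images_and_return_shapes` helper plausibly wraps, reconstructed from the removed lines above (not the helper's verbatim source):

def resize_images_and_return_shapes(inputs, image_resizer_fn):
  # Apply the per-image resizer across the batch and split its two outputs.
  outputs = shape_utils.static_or_dynamic_map_fn(
      image_resizer_fn,
      elems=inputs,
      dtype=[tf.float32, tf.int32])
  return outputs[0], outputs[1]  # (resized_inputs, true_image_shapes)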
......@@ -560,6 +572,14 @@ class SSDMetaArch(model.DetectionModel):
[batch, height_i, width_i, depth_i].
5) anchors: 2-D float tensor of shape [num_anchors, 4] containing
the generated anchors in normalized coordinates.
6) final_anchors: 3-D float tensor of shape [batch_size, num_anchors, 4]
containing the generated anchors in normalized coordinates.
If self._return_raw_detections_during_predict is True, the dictionary
will also contain:
7) raw_detection_boxes: a 4-D float32 tensor with shape
[batch_size, self.max_num_proposals, 4] in normalized coordinates.
8) raw_detection_feature_map_indices: a 3-D int32 tensor with shape
[batch_size, self.max_num_proposals].
"""
if self._inplace_batchnorm_update:
batchnorm_updates_collections = None
......@@ -581,11 +601,11 @@ class SSDMetaArch(model.DetectionModel):
feature_maps)
image_shape = shape_utils.combined_static_and_dynamic_shape(
preprocessed_inputs)
self._anchors = box_list_ops.concatenate(
self._anchor_generator.generate(
feature_map_spatial_dims,
im_height=image_shape[1],
im_width=image_shape[2]))
boxlist_list = self._anchor_generator.generate(
feature_map_spatial_dims,
im_height=image_shape[1],
im_width=image_shape[2])
self._anchors = box_list_ops.concatenate(boxlist_list)
if self._box_predictor.is_keras_model:
predictor_results_dict = self._box_predictor(feature_maps)
else:
......@@ -596,9 +616,15 @@ class SSDMetaArch(model.DetectionModel):
predictor_results_dict = self._box_predictor.predict(
feature_maps, self._anchor_generator.num_anchors_per_location())
predictions_dict = {
'preprocessed_inputs': preprocessed_inputs,
'feature_maps': feature_maps,
'anchors': self._anchors.get()
'preprocessed_inputs':
preprocessed_inputs,
'feature_maps':
feature_maps,
'anchors':
self._anchors.get(),
'final_anchors':
tf.tile(
tf.expand_dims(self._anchors.get(), 0), [image_shape[0], 1, 1])
}
for prediction_key, prediction_list in iter(predictor_results_dict.items()):
prediction = tf.concat(prediction_list, axis=1)
......@@ -606,10 +632,29 @@ class SSDMetaArch(model.DetectionModel):
prediction.shape[2] == 1):
prediction = tf.squeeze(prediction, axis=2)
predictions_dict[prediction_key] = prediction
if self._return_raw_detections_during_predict:
predictions_dict.update(self._raw_detections_and_feature_map_inds(
predictions_dict['box_encodings'], boxlist_list))
self._batched_prediction_tensor_names = [x for x in predictions_dict
if x != 'anchors']
return predictions_dict
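The new `final_anchors` entry simply broadcasts the shared SSD anchors across the batch; a shape sketch with hypothetical sizes:

anchors = tf.zeros([6, 4])                   # [num_anchors, 4], shared
batched_anchors = tf.tile(
    tf.expand_dims(anchors, 0), [3, 1, 1])   # [batch_size=3, num_anchors, 4]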
def _raw_detections_and_feature_map_inds(self, box_encodings, boxlist_list):
anchors = self._anchors.get()
raw_detection_boxes, _ = self._batch_decode(box_encodings, anchors)
batch_size, _, _ = shape_utils.combined_static_and_dynamic_shape(
raw_detection_boxes)
feature_map_indices = (
self._anchor_generator.anchor_index_to_feature_map_index(boxlist_list))
feature_map_indices_batched = tf.tile(
tf.expand_dims(feature_map_indices, 0),
multiples=[batch_size, 1])
return {
fields.PredictionFields.raw_detection_boxes: raw_detection_boxes,
fields.PredictionFields.raw_detection_feature_map_indices:
feature_map_indices_batched
}
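Conceptually, `anchor_index_to_feature_map_index` labels each anchor with the index of the feature map that generated it; a sketch of the idea (assumed behavior, not the anchor generator's actual implementation):

# Each feature map i contributes boxlist_list[i].num_boxes() anchors, so the
# per-anchor index vector is a concatenation of constant runs. For a single
# feature map this is all zeros, which is what the unit test asserts.
feature_map_indices = tf.concat(
    [tf.fill([boxlist.num_boxes()], i)
     for i, boxlist in enumerate(boxlist_list)], axis=0)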
def _get_feature_map_spatial_dims(self, feature_maps):
"""Return list of spatial dimensions for each feature map in a list.
......@@ -719,7 +764,9 @@ class SSDMetaArch(model.DetectionModel):
'multiclass_scores': detection_scores_with_background
}
if self._anchors is not None:
anchor_indices = tf.range(self._anchors.num_boxes_static())
num_boxes = (self._anchors.num_boxes_static() or
self._anchors.num_boxes())
anchor_indices = tf.range(num_boxes)
batch_anchor_indices = tf.tile(
tf.expand_dims(anchor_indices, 0), [batch_size, 1])
# All additional fields need to be float.
......@@ -730,14 +777,30 @@ class SSDMetaArch(model.DetectionModel):
detection_keypoints = tf.identity(
detection_keypoints, 'raw_keypoint_locations')
additional_fields[fields.BoxListFields.keypoints] = detection_keypoints
def _non_max_suppression_wrapper(kwargs):
if self._nms_on_host:
# Note: NMS is not memory efficient on TPU. This forces NMS to run
# outside of TPU.
return tf.contrib.tpu.outside_compilation(
lambda x: self._non_max_suppression_fn(**x), kwargs)
else:
return self._non_max_suppression_fn(**kwargs)
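# `outside_compilation` executes the wrapped function on the host CPU
# instead of the TPU; packing the kwargs into a single dict lets the
# one-argument lambda forward them with **x.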
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
 nmsed_additional_fields, num_detections) = self._non_max_suppression_fn(
     detection_boxes,
     detection_scores,
     clip_window=self._compute_clip_window(preprocessed_images,
                                           true_image_shapes),
     additional_fields=additional_fields,
     masks=prediction_dict.get('mask_predictions'))
 nmsed_additional_fields,
 num_detections) = _non_max_suppression_wrapper({
     'boxes': detection_boxes,
     'scores': detection_scores,
     'clip_window': self._compute_clip_window(preprocessed_images,
                                              true_image_shapes),
     'additional_fields': additional_fields,
     'masks': prediction_dict.get('mask_predictions')
 })
detection_dict = {
fields.DetectionResultFields.detection_boxes:
nmsed_boxes,
......@@ -1058,6 +1121,15 @@ class SSDMetaArch(model.DetectionModel):
with rows of the Match objects corresponding to groundtruth boxes
and columns corresponding to anchors.
"""
# TODO(rathodv): Add a test for these summaries.
try:
# TODO(kaftan): Integrate these summaries into the v2 style loops
with tf.compat.v2.init_scope():
if tf.compat.v2.executing_eagerly():
return
except AttributeError:
pass
avg_num_gt_boxes = tf.reduce_mean(
tf.cast(
tf.stack([tf.shape(x)[0] for x in groundtruth_boxes_list]),
......@@ -1078,14 +1150,6 @@ class SSDMetaArch(model.DetectionModel):
tf.cast(
tf.stack([match.num_ignored_columns() for match in match_list]),
dtype=tf.float32))
# TODO(rathodv): Add a test for these summaries.
try:
# TODO(kaftan): Integrate these summaries into the v2 style loops
with tf.compat.v2.init_scope():
if tf.compat.v2.executing_eagerly():
return
except AttributeError:
pass
tf.summary.scalar('AvgNumGroundtruthBoxesPerImage',
avg_num_gt_boxes,
......@@ -1232,26 +1296,27 @@ class SSDMetaArch(model.DetectionModel):
ValueError: if fine_tune_checkpoint_type is neither `classification`
nor `detection`.
"""
if fine_tune_checkpoint_type not in ['detection', 'classification']:
raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
fine_tune_checkpoint_type))
if fine_tune_checkpoint_type == 'classification':
return self._feature_extractor.restore_from_classification_checkpoint_fn(
self._extract_features_scope)
if fine_tune_checkpoint_type == 'detection':
elif fine_tune_checkpoint_type == 'detection':
variables_to_restore = {}
if tf.executing_eagerly():
for variable in self.variables:
# variable.name includes ":0" at the end, but the names in the
# checkpoint do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
if load_all_detection_checkpoint_vars:
if load_all_detection_checkpoint_vars:
# Grab all detection vars by name
for variable in self.variables:
# variable.name includes ":0" at the end, but the names in the
# checkpoint do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
variables_to_restore[var_name] = variable
else:
# Grab just the feature extractor vars by name
for variable in self._feature_extractor.variables:
# variable.name includes ":0" at the end, but the names in the
# checkpoint do not have the suffix ":0". So, we strip it here.
var_name = variable.name[:-2]
variables_to_restore[var_name] = variable
else:
if var_name.startswith(self._extract_features_scope):
variables_to_restore[var_name] = variable
else:
for variable in variables_helper.get_global_variables_safely():
var_name = variable.op.name
......@@ -1261,7 +1326,11 @@ class SSDMetaArch(model.DetectionModel):
if var_name.startswith(self._extract_features_scope):
variables_to_restore[var_name] = variable
return variables_to_restore
return variables_to_restore
else:
raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
fine_tune_checkpoint_type))
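A hedged usage sketch of the restore path (the `model` instance and checkpoint path are hypothetical):

variables_to_restore = model.restore_map(
    fine_tune_checkpoint_type='detection',
    load_all_detection_checkpoint_vars=True)
init_saver = tf.train.Saver(variables_to_restore)
# Later, inside a session: init_saver.restore(sess, '/path/to/model.ckpt')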
def updates(self):
"""Returns a list of update operators for this model.
......
......@@ -49,7 +49,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5,
calibration_mapping_value=None):
calibration_mapping_value=None,
return_raw_detections_during_predict=False):
return super(SsdMetaArchTest, self)._create_model(
model_fn=ssd_meta_arch.SSDMetaArch,
apply_hard_mining=apply_hard_mining,
......@@ -63,7 +64,9 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
predict_mask=predict_mask,
use_static_shapes=use_static_shapes,
nms_max_size_per_class=nms_max_size_per_class,
calibration_mapping_value=calibration_mapping_value)
calibration_mapping_value=calibration_mapping_value,
return_raw_detections_during_predict=(
return_raw_detections_during_predict))
def test_preprocess_preserves_shapes_with_dynamic_input_image(
self, use_keras):
......@@ -105,6 +108,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
self.assertIn('class_predictions_with_background', prediction_dict)
self.assertIn('feature_maps', prediction_dict)
self.assertIn('anchors', prediction_dict)
self.assertIn('final_anchors', prediction_dict)
init_op = tf.global_variables_initializer()
with self.test_session(graph=tf_graph) as sess:
......@@ -121,6 +125,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
self.assertAllEqual(prediction_out['box_encodings'].shape,
expected_box_encodings_shape_out)
self.assertAllEqual(prediction_out['final_anchors'].shape,
(batch_size, num_anchors, 4))
self.assertAllEqual(
prediction_out['class_predictions_with_background'].shape,
expected_class_predictions_with_background_shape_out)
......@@ -137,7 +143,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
return (predictions['box_encodings'],
predictions['class_predictions_with_background'],
predictions['feature_maps'],
predictions['anchors'])
predictions['anchors'], predictions['final_anchors'])
batch_size = 3
image_size = 2
channels = 3
......@@ -145,11 +151,83 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
channels).astype(np.float32)
expected_box_encodings_shape = (batch_size, num_anchors, code_size)
expected_class_predictions_shape = (batch_size, num_anchors, num_classes+1)
(box_encodings, class_predictions, _, _) = self.execute(graph_fn,
[input_image])
final_anchors_shape = (batch_size, num_anchors, 4)
(box_encodings, class_predictions, _, _, final_anchors) = self.execute(
graph_fn, [input_image])
self.assertAllEqual(box_encodings.shape, expected_box_encodings_shape)
self.assertAllEqual(class_predictions.shape,
expected_class_predictions_shape)
self.assertAllEqual(final_anchors.shape, final_anchors_shape)
def test_predict_with_raw_output_fields(self, use_keras):
with tf.Graph().as_default():
_, num_classes, num_anchors, code_size = self._create_model(
use_keras=use_keras)
def graph_fn(input_image):
model, _, _, _ = self._create_model(
return_raw_detections_during_predict=True)
predictions = model.predict(input_image, true_image_shapes=None)
return (predictions['box_encodings'],
predictions['class_predictions_with_background'],
predictions['feature_maps'],
predictions['anchors'], predictions['final_anchors'],
predictions['raw_detection_boxes'],
predictions['raw_detection_feature_map_indices'])
batch_size = 3
image_size = 2
channels = 3
input_image = np.random.rand(batch_size, image_size, image_size,
channels).astype(np.float32)
expected_box_encodings_shape = (batch_size, num_anchors, code_size)
expected_class_predictions_shape = (batch_size, num_anchors, num_classes+1)
final_anchors_shape = (batch_size, num_anchors, 4)
expected_raw_detection_boxes_shape = (batch_size, num_anchors, 4)
(box_encodings, class_predictions, _, _, final_anchors, raw_detection_boxes,
raw_detection_feature_map_indices) = self.execute(
graph_fn, [input_image])
self.assertAllEqual(box_encodings.shape, expected_box_encodings_shape)
self.assertAllEqual(class_predictions.shape,
expected_class_predictions_shape)
self.assertAllEqual(final_anchors.shape, final_anchors_shape)
self.assertAllEqual(raw_detection_boxes.shape,
expected_raw_detection_boxes_shape)
self.assertAllEqual(raw_detection_feature_map_indices,
np.zeros((batch_size, num_anchors)))
def test_raw_detection_boxes_agree_predict_postprocess(self, use_keras):
batch_size = 2
image_size = 2
input_shapes = [(batch_size, image_size, image_size, 3),
(None, image_size, image_size, 3),
(batch_size, None, None, 3),
(None, None, None, 3)]
for input_shape in input_shapes:
tf_graph = tf.Graph()
with tf_graph.as_default():
model, _, _, _ = self._create_model(
use_keras=use_keras, return_raw_detections_during_predict=True)
input_placeholder = tf.placeholder(tf.float32, shape=input_shape)
preprocessed_inputs, true_image_shapes = model.preprocess(
input_placeholder)
prediction_dict = model.predict(preprocessed_inputs,
true_image_shapes)
raw_detection_boxes_predict = prediction_dict['raw_detection_boxes']
detections = model.postprocess(prediction_dict, true_image_shapes)
raw_detection_boxes_postprocess = detections['raw_detection_boxes']
init_op = tf.global_variables_initializer()
with self.test_session(graph=tf_graph) as sess:
sess.run(init_op)
raw_detection_boxes_predict_out, raw_detection_boxes_postprocess_out = (
sess.run(
[raw_detection_boxes_predict, raw_detection_boxes_postprocess],
feed_dict={
input_placeholder:
np.random.uniform(size=(batch_size, 2, 2, 3))}))
self.assertAllEqual(raw_detection_boxes_predict_out,
raw_detection_boxes_postprocess_out)
def test_postprocess_results_are_correct(self, use_keras):
batch_size = 2
......@@ -188,7 +266,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
[0.5, 0., 1., 0.5], [1., 1., 1.5, 1.5]]]
raw_detection_scores = [[[0, 0], [0, 0], [0, 0], [0, 0]],
[[0, 0], [0, 0], [0, 0], [0, 0]]]
detection_anchor_indices = [[0, 2, 1, 0, 0], [0, 2, 1, 0, 0]]
detection_anchor_indices_sets = [[0, 1, 2], [0, 1, 2]]
for input_shape in input_shapes:
tf_graph = tf.Graph()
......@@ -230,8 +308,9 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
raw_detection_boxes)
self.assertAllEqual(detections_out['raw_detection_scores'],
raw_detection_scores)
self.assertAllEqual(detections_out['detection_anchor_indices'],
detection_anchor_indices)
for idx in range(batch_size):
self.assertSameElements(detections_out['detection_anchor_indices'][idx],
detection_anchor_indices_sets[idx])
def test_postprocess_results_are_correct_static(self, use_keras):
with tf.Graph().as_default():
......
......@@ -129,7 +129,8 @@ class SSDMetaArchTestBase(test_case.TestCase):
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5,
calibration_mapping_value=None):
calibration_mapping_value=None,
return_raw_detections_during_predict=False):
is_training = False
num_classes = 1
mock_anchor_generator = MockAnchorGenerator2x2()
......@@ -238,6 +239,8 @@ class SSDMetaArchTestBase(test_case.TestCase):
add_background_class=add_background_class,
random_example_sampler=random_example_sampler,
expected_loss_weights_fn=expected_loss_weights_fn,
return_raw_detections_during_predict=(
return_raw_detections_during_predict),
**kwargs)
return model, num_classes, mock_anchor_generator.num_anchors(), code_size
......
......@@ -267,6 +267,13 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
# Make sure to set the Keras learning phase. True during training,
# False for inference.
tf.keras.backend.set_learning_phase(is_training)
# Set policy for mixed-precision training with Keras-based models.
if use_tpu and train_config.use_bfloat16:
from tensorflow.python.keras.engine import base_layer_utils # pylint: disable=g-import-not-at-top
# Enable v2 behavior, as `mixed_bfloat16` is only supported in TF 2.0.
base_layer_utils.enable_v2_dtype_behavior()
tf.compat.v2.keras.mixed_precision.experimental.set_policy(
'mixed_bfloat16')
detection_model = detection_model_fn(
is_training=is_training, add_summaries=(not use_tpu))
scaffold_fn = None
......@@ -315,7 +322,8 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
features[fields.InputDataFields.true_image_shape]))
if mode == tf.estimator.ModeKeys.TRAIN:
if train_config.fine_tune_checkpoint and hparams.load_pretrained:
load_pretrained = hparams.load_pretrained if hparams else False
if train_config.fine_tune_checkpoint and load_pretrained:
if not train_config.fine_tune_checkpoint_type:
# train_config.from_detection_checkpoint field is deprecated. For
# backward compatibility, set train_config.fine_tune_checkpoint_type
......@@ -449,6 +457,10 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
original_image_spatial_shapes=original_image_spatial_shapes,
true_image_shapes=true_image_shapes)
if fields.InputDataFields.image_additional_channels in features:
eval_dict[fields.InputDataFields.image_additional_channels] = features[
fields.InputDataFields.image_additional_channels]
if class_agnostic:
category_index = label_map_util.create_class_agnostic_category_index()
else:
......