Commit 05584085 authored by pkulzc, committed by Jonathan Huang

Merged commit includes the following changes: (#6315)

236813471  by lzc:

    Internal change.

--
236507310  by lzc:

    Fix preprocess.random_resize_method config type issue. The target height and width are passed as "size" to tf.image.resize_images, which only accepts integers.
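
    A minimal sketch of the constraint described above, not the actual fix; the helper name is hypothetical:

      import tensorflow as tf

      def resize_to_target(image, target_height, target_width):
        # Cast the configured dimensions to int32 so tf.image.resize_images
        # receives an integer `size`, as it requires.
        size = tf.cast(tf.stack([target_height, target_width]), tf.int32)
        return tf.image.resize_images(image, size)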

--
236409989  by Zhichao Lu:

    Configure export_to_tpu from a function parameter instead of HParams for TPU inference.

--
236403186  by Zhichao Lu:

    Make graph file names optional arguments.

--
236237072  by Zhichao Lu:

    Minor bugfix for keyword args.

--
236209602  by Zhichao Lu:

    Add support for PartitionedVariable to get_variables_available_in_checkpoint.

--
235828658  by Zhichao Lu:

    Automatically stop evaluation jobs when training is finished.

--
235817964  by Zhichao Lu:

    Add an optional process_metrics_fn callback to eval_util; it gets called
    with the evaluation results once each evaluation is complete.
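
    A hedged sketch of such a callback; the exact signature eval_util uses is an assumption here (the change only states that the callback receives the evaluation results after each evaluation):

      def log_metrics(metrics):
        # Hypothetical process_metrics_fn: `metrics` is assumed to be a dict
        # mapping metric names to scalar values from one evaluation run.
        for name, value in sorted(metrics.items()):
          print('%s = %.4f' % (name, value))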

--
235788721  by lzc:

    Fix yml file tf runtime version.

--
235262897  by Zhichao Lu:

    Add keypoint support to the random_pad_image preprocessor method.

--
235257380  by Zhichao Lu:

    Support InputDataFields.groundtruth_confidences in retain_groundtruth(), retain_groundtruth_with_positive_classes(), filter_groundtruth_with_crowd_boxes(), filter_groundtruth_with_nan_box_coordinates(), filter_unrecognized_classes().

--
235109188  by Zhichao Lu:

    Fix bug in pad_input_data_to_static_shapes for num_additional_channels > 0; make color-specific data augmentation only touch RGB channels.
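
    A minimal sketch of the second half of this change, assuming the extra channels are split off, left untouched by the color augmentation, and concatenated back; the helper is illustrative, not the actual preprocessor code:

      import tensorflow as tf

      def apply_color_aug_to_rgb_only(image, color_aug_fn):
        # Assumes channel order [R, G, B, additional...]; only the first three
        # channels go through the color augmentation.
        rgb, extra = image[..., :3], image[..., 3:]
        return tf.concat([color_aug_fn(rgb), extra], axis=-1)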

--
235045010  by Zhichao Lu:

    Don't slice class_predictions_with_background when add_background_class is false.

--
235026189  by lzc:

    Fix import in g3doc.

--
234863426  by Zhichao Lu:

    Added fixes in exporter to allow writing a checkpoint to a specified temporary directory.

--
234671886  by lzc:

    Internal Change.

--
234630803  by rathodv:

    Internal Change.

--
233985896  by Zhichao Lu:

    Add Neumann optimizer to object detection.

--
233560911  by Zhichao Lu:

    Add NAS-FPN object detection with Resnet and Mobilenet v2.

--
233513536  by Zhichao Lu:

    Export TPU-compatible object detection model.

--
233495772  by lzc:

    Internal change.

--
233453557  by Zhichao Lu:

    Create Keras-based SSD+MobilenetV1 for object detection.

--
233220074  by lzc:

    Update release notes date.

--
233165761  by Zhichao Lu:

    Support depth_multiplier and min_depth in _SSDResnetV1FpnFeatureExtractor.

--
233160046  by lzc:

    Internal change.

--
232926599  by Zhichao Lu:

    [tf.data] Switching tf.data functions to use `defun`, providing an escape hatch to continue using the legacy `Defun`.

    There are subtle differences between the implementations of `defun` and `Defun` (such as resource handling or control flow), and input pipelines that use control flow or resources in their functions might be affected by this change. To migrate the majority of existing pipelines to the recommended way of creating functions in the TF 2.0 world, while allowing a small number of existing pipelines to continue relying on the deprecated behavior, this CL provides an escape hatch.

    If your input pipeline is affected by this CL, it should apply the escape hatch by replacing `foo.map(...)` with `foo.map_with_legacy_function(...)`.
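
    For reference, the escape hatch looks like this (a sketch against the TF 1.x tf.data API):

      import tensorflow as tf

      dataset = tf.data.Dataset.range(10)
      # Default behavior after this change: `map` traces its function with `defun`.
      mapped = dataset.map(lambda x: x * 2)
      # Escape hatch for pipelines that depend on the legacy `Defun` semantics.
      legacy_mapped = dataset.map_with_legacy_function(lambda x: x * 2)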

--
232891621  by Zhichao Lu:

    Modify faster_rcnn meta architecture to normalize raw detections.

--
232875817  by Zhichao Lu:

    Make calibration a post-processing step.

    Specifically:
    - Move the calibration config from pipeline.proto --> post_processing.proto
    - Edit post_processing_builder.py to return a calibration function; if no calibration config is provided, it returns None (see the sketch below).
    - Edit SSD and FasterRCNN meta architectures to optionally call the calibration function on detection scores after score conversion and before NMS.
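
    A hedged sketch of the new layout, mirroring the test configs later in this diff; the constant 0.5 mapping is purely illustrative:

      from google.protobuf import text_format
      from object_detection.builders import post_processing_builder
      from object_detection.protos import post_processing_pb2

      post_processing_text_proto = """
        score_converter: IDENTITY
        batch_non_max_suppression {
          score_threshold: -20.0
          iou_threshold: 1.0
        }
        calibration_config {
          function_approximation {
            x_y_pairs {
              x_y_pair { x: 0.0 y: 0.5 }
              x_y_pair { x: 1.0 y: 0.5 }
            }
          }
        }
      """
      config = post_processing_pb2.PostProcessing()
      text_format.Merge(post_processing_text_proto, config)
      # The second return value is the (calibrated) score conversion function.
      non_max_suppression_fn, score_conversion_fn = post_processing_builder.build(config)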

--
232704481  by Zhichao Lu:

    Edit calibration builder to build a function that will be used within a detection model's `postprocess` method, after score conversion and before non-maxima suppression.

    Specific Edits:
    - The returned function now accepts class_predictions_with_background as its argument instead of detection_scores and detection_classes.
    - Class-specific calibration was temporarily removed, as it requires more significant refactoring. Will be added later.

--
232615379  by Zhichao Lu:

    Internal change

--
232483345  by ronnyvotel:

    Making the use of bfloat16 restricted to TPUs.

--
232399572  by Zhichao Lu:

    Edit calibration builder and proto to support class-agnostic calibration.

    Specifically:
    - Edit calibration protos to include the path to the relevant label map if required for class-specific calibration. Previously, label maps were inferred from other parts of the pipeline proto; this lets all information required by the builder stay within the calibration proto and avoids passing extraneous information for class-agnostic calibration.
    - Add class-agnostic protos to the calibration config.

    Note that the proto supports sigmoid and linear interpolation parameters, but the builder currently only supports linear interpolation.

--
231613048  by Zhichao Lu:

    Add calibration builder for applying calibration transformations from output of object detection models.

    Specifically:
    - Add calibration proto to support sigmoid and isotonic regression (stepwise function) calibration.
    - Add a builder to support calibration from isotonic regression outputs.

--
231519786  by lzc:

    model_builder test refactor.
    - removed proto text boilerplate in each test case and let them call a create_default_proto function instead.
    - consolidated all separate ssd model creation tests into one.
    - consolidated all separate faster rcnn model creation tests into one.
    - used parameterized test for testing mask rcnn models and use_matmul_crop_and_resize
    - added all failures test.

--
231448169  by Zhichao Lu:

    Return static shape as a constant tensor.

--
231423126  by lzc:

    Add a release note for OID v4 models.

--
231401941  by Zhichao Lu:

    Add the correct label map for models trained on Open Images V4 (*oid_v4 config suffix).

--
231320357  by Zhichao Lu:

    Add scope to Nearest Neighbor Resize op so that it stays in the same name scope as the original resize ops.

--
231257699  by Zhichao Lu:

    Switch to using preserve_aspect_ratio in tf.image.resize_images rather than using a custom implementation.
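
    For illustration, the built-in argument that replaces the custom implementation (TF 1.12-era API); shapes here are arbitrary:

      import tensorflow as tf

      images = tf.zeros([1, 480, 320, 3])
      # Let TF handle the aspect-ratio-preserving resize instead of a custom op.
      resized = tf.image.resize_images(
          images, size=[640, 640], preserve_aspect_ratio=True)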

--
231247368  by rathodv:

    Internal change.

--
231004874  by lzc:

    Update documentation to use TF 1.12 for the object detection API.

--
230999911  by rathodv:

    Use tf.batch_gather instead of ops.batch_gather
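
    A small usage example of the core op that replaces the custom ops.batch_gather helper:

      import tensorflow as tf

      params = tf.constant([[10., 11., 12.],
                            [20., 21., 22.]])
      indices = tf.constant([[2, 0],
                             [1, 1]])
      # Gathers per batch row: the result is [[12., 10.], [21., 21.]].
      gathered = tf.batch_gather(params, indices)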

--
230999720  by huizhongc:

    Fix weight equalization test in ops_test.

--
230984728  by rathodv:

    Internal update.

--
230929019  by lzc:

    Add an option to replace preprocess operation with placeholder for ssd feature extractor.

--
230845266  by lzc:

    Require TensorFlow 1.12 for the object detection API and rename keras_applications to keras_models.

--
230392064  by lzc:

    Add RetinaNet 101 checkpoint trained on OID v4 to detection model zoo.

--
230014128  by derekjchow:

    This file was relocated under tensorflow/lite/g3doc/convert.

--
229941449  by lzc:

    Update SSD mobilenet v2 quantized model download path.

--
229843662  by lzc:

    Add an option to use native resize tf op in fpn top-down feature map generation.

--
229636034  by rathodv:

    Add deprecation notice to a few old parameters in train.proto

--
228959078  by derekjchow:

    Remove duplicate elif case in _check_and_convert_legacy_input_config_key

--
228749719  by rathodv:

    Minor refactoring to make exporter's `build_detection_graph` method public.

--
228573828  by rathodv:

    Modify model.postprocess to return raw detections and raw scores.

    Modify the postprocess methods in core/model.py and the meta architectures to export raw detections (without any non-max suppression) and raw multiclass score logits for those detections.

--
228420670  by Zhichao Lu:

    Add shims for custom architectures for object detection models.

--
228241692  by Zhichao Lu:

    Fix the comment on "losses_mask" in "Loss" class.

--
228223810  by Zhichao Lu:

    Support other_heads' predictions in WeightSharedConvolutionalBoxPredictor. Also remove a few unused parameters and fix a couple of comments in convolutional_box_predictor.py.

--
228200588  by Zhichao Lu:

    Add Expected Calibration Error and an evaluator that calculates the metric for object detections.
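
    For reference, a standalone NumPy sketch of the standard binned Expected Calibration Error definition; the streaming TF implementation in calibration_metrics may differ in detail:

      import numpy as np

      def expected_calibration_error(confidences, is_correct, num_bins=10):
        # ECE = sum_m (|B_m| / n) * |accuracy(B_m) - mean confidence(B_m)|
        confidences = np.asarray(confidences, dtype=np.float64)
        is_correct = np.asarray(is_correct, dtype=np.float64)
        bin_ids = np.minimum((confidences * num_bins).astype(int), num_bins - 1)
        n = float(confidences.size)
        ece = 0.0
        for b in range(num_bins):
          in_bin = bin_ids == b
          if in_bin.any():
            accuracy = is_correct[in_bin].mean()
            avg_confidence = confidences[in_bin].mean()
            ece += (in_bin.sum() / n) * abs(accuracy - avg_confidence)
        return ece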

--
228167740  by lzc:

    Add option to use bounded activations in FPN top-down feature map generation.

--
227767700  by rathodv:

    Internal.

--
226295236  by Zhichao Lu:

    Add Open Images V4 Resnet101-FPN training config to third_party

--
226254842  by Zhichao Lu:

    Fix typo in documentation.

--
225833971  by Zhichao Lu:

    Option to have no resizer in object detection model.

--
225824890  by lzc:

    Fix Python 3 compatibility for model_lib.py.

--
225760897  by menglong:

    Ensure the loss normalizer is at least 1.

--
225559842  by menglong:

    Add extra logic filtering unrecognized classes.

--
225379421  by lzc:

    Add faster_rcnn_inception_resnet_v2_atrous_oid_v4 config to third_party

--
225368337  by Zhichao Lu:

    Add extra logic filtering unrecognized classes.

--
225341095  by Zhichao Lu:

    Add Open Images V4 models to the OD API model zoo and the corresponding configs to the configs directory.

--
225218450  by menglong:

    Add extra logic filtering unrecognized classes.

--
225057591  by Zhichao Lu:

    Internal change.

--
224895417  by rathodv:

    Internal change.

--
224209282  by Zhichao Lu:

    Add two data augmentations to object detection: (1) Self-concat (2) Absolute pads.

--
224073762  by Zhichao Lu:

    Do not create tf.constant until _generate() is actually called in the object detector.

--

PiperOrigin-RevId: 236813471
parent a5db4420
......@@ -92,8 +92,8 @@ configured in the meta architecture:
non-max suppression and normalize them. In this case, the `postprocess` method
skips both `_postprocess_rpn` and `_postprocess_box_classifier`.
"""
from abc import abstractmethod
from functools import partial
import abc
import functools
import tensorflow as tf
from object_detection.anchor_generators import grid_anchor_generator
......@@ -138,7 +138,7 @@ class FasterRCNNFeatureExtractor(object):
self._reuse_weights = reuse_weights
self._weight_decay = weight_decay
@abstractmethod
@abc.abstractmethod
def preprocess(self, resized_inputs):
"""Feature-extractor specific preprocessing (minus image resizing)."""
pass
......@@ -162,7 +162,7 @@ class FasterRCNNFeatureExtractor(object):
with tf.variable_scope(scope, values=[preprocessed_inputs]):
return self._extract_proposal_features(preprocessed_inputs, scope)
@abstractmethod
@abc.abstractmethod
def _extract_proposal_features(self, preprocessed_inputs, scope):
"""Extracts first stage RPN features, to be overridden."""
pass
......@@ -185,7 +185,7 @@ class FasterRCNNFeatureExtractor(object):
scope, values=[proposal_feature_maps], reuse=tf.AUTO_REUSE):
return self._extract_box_classifier_features(proposal_feature_maps, scope)
@abstractmethod
@abc.abstractmethod
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
"""Extracts second stage box classifier features, to be overridden."""
pass
......@@ -770,7 +770,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
representing the features for each proposal.
"""
image_shape_2d = self._image_batch_shape_2d(image_shape)
proposal_boxes_normalized, _, num_proposals = self._postprocess_rpn(
proposal_boxes_normalized, _, num_proposals, _, _ = self._postprocess_rpn(
rpn_box_encodings, rpn_objectness_predictions_with_background,
anchors, image_shape_2d, true_image_shapes)
......@@ -1080,7 +1080,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
anchors_boxlist, clip_window)
def _batch_gather_kept_indices(predictions_tensor):
return shape_utils.static_or_dynamic_map_fn(
partial(tf.gather, indices=keep_indices),
functools.partial(tf.gather, indices=keep_indices),
elems=predictions_tensor,
dtype=tf.float32,
parallel_iterations=self._parallel_iterations,
......@@ -1148,17 +1148,22 @@ class FasterRCNNMetaArch(model.DetectionModel):
with tf.name_scope('FirstStagePostprocessor'):
if self._number_of_stages == 1:
proposal_boxes, proposal_scores, num_proposals = self._postprocess_rpn(
prediction_dict['rpn_box_encodings'],
prediction_dict['rpn_objectness_predictions_with_background'],
prediction_dict['anchors'],
true_image_shapes,
true_image_shapes)
(proposal_boxes, proposal_scores, num_proposals, raw_proposal_boxes,
raw_proposal_scores) = self._postprocess_rpn(
prediction_dict['rpn_box_encodings'],
prediction_dict['rpn_objectness_predictions_with_background'],
prediction_dict['anchors'], true_image_shapes, true_image_shapes)
return {
fields.DetectionResultFields.detection_boxes: proposal_boxes,
fields.DetectionResultFields.detection_scores: proposal_scores,
fields.DetectionResultFields.detection_boxes:
proposal_boxes,
fields.DetectionResultFields.detection_scores:
proposal_scores,
fields.DetectionResultFields.num_detections:
tf.to_float(num_proposals),
fields.DetectionResultFields.raw_detection_boxes:
raw_proposal_boxes,
fields.DetectionResultFields.raw_detection_scores:
raw_proposal_scores
}
# TODO(jrru): Remove mask_predictions from _post_process_box_classifier.
......@@ -1266,6 +1271,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
num_proposals: A Tensor of type `int32`. A 1-D tensor of shape [batch]
representing the number of proposals predicted for each image in
the batch.
raw_detection_boxes: [batch, total_detections, 4] tensor with decoded
proposal boxes before Non-Max Suppression.
raw_detection_scores: [batch, total_detections,
num_classes_with_background] tensor of class score logits for
raw proposal boxes.
"""
rpn_box_encodings_batch = tf.expand_dims(rpn_box_encodings_batch, axis=2)
rpn_encodings_shape = shape_utils.combined_static_and_dynamic_shape(
......@@ -1274,13 +1284,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.expand_dims(anchors, 0), [rpn_encodings_shape[0], 1, 1])
proposal_boxes = self._batch_decode_boxes(rpn_box_encodings_batch,
tiled_anchor_boxes)
proposal_boxes = tf.squeeze(proposal_boxes, axis=2)
raw_proposal_boxes = tf.squeeze(proposal_boxes, axis=2)
rpn_objectness_softmax_without_background = tf.nn.softmax(
rpn_objectness_predictions_with_background_batch)[:, :, 1]
clip_window = self._compute_clip_window(image_shapes)
(proposal_boxes, proposal_scores, _, _, _,
num_proposals) = self._first_stage_nms_fn(
tf.expand_dims(proposal_boxes, axis=2),
tf.expand_dims(raw_proposal_boxes, axis=2),
tf.expand_dims(rpn_objectness_softmax_without_background, axis=2),
clip_window=clip_window)
if self._is_training:
......@@ -1304,7 +1314,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
return normalized_boxes_per_image
normalized_proposal_boxes = shape_utils.static_or_dynamic_map_fn(
normalize_boxes, elems=[proposal_boxes, image_shapes], dtype=tf.float32)
return normalized_proposal_boxes, proposal_scores, num_proposals
raw_normalized_proposal_boxes = shape_utils.static_or_dynamic_map_fn(
normalize_boxes,
elems=[raw_proposal_boxes, image_shapes],
dtype=tf.float32)
return (normalized_proposal_boxes, proposal_scores, num_proposals,
raw_normalized_proposal_boxes,
rpn_objectness_predictions_with_background_batch)
def _sample_box_classifier_batch(
self,
......@@ -1576,6 +1592,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
(optional) [batch, max_detections, mask_height, mask_width]. Note
that a pixel-wise sigmoid score converter is applied to the detection
masks.
`raw_detection_boxes`: [batch, total_detections, 4] tensor with decoded
detection boxes before Non-Max Suppression.
`raw_detection_scores`: [batch, total_detections,
num_classes_with_background] tensor of multi-class score logits for
raw detection boxes.
"""
refined_box_encodings_batch = tf.reshape(
refined_box_encodings,
......@@ -1589,11 +1610,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
)
refined_decoded_boxes_batch = self._batch_decode_boxes(
refined_box_encodings_batch, proposal_boxes)
class_predictions_with_background_batch = (
class_predictions_with_background_batch_normalized = (
self._second_stage_score_conversion_fn(
class_predictions_with_background_batch))
class_predictions_batch = tf.reshape(
tf.slice(class_predictions_with_background_batch,
tf.slice(class_predictions_with_background_batch_normalized,
[0, 0, 1], [-1, -1, -1]),
[-1, self.max_num_proposals, self.num_classes])
clip_window = self._compute_clip_window(image_shapes)
......@@ -1614,11 +1635,51 @@ class FasterRCNNMetaArch(model.DetectionModel):
change_coordinate_frame=True,
num_valid_boxes=num_proposals,
masks=mask_predictions_batch)
if refined_decoded_boxes_batch.shape[2] > 1:
class_ids = tf.expand_dims(
tf.argmax(class_predictions_with_background_batch[:, :, 1:], axis=2,
output_type=tf.int32),
axis=-1)
raw_detection_boxes = tf.squeeze(
tf.batch_gather(refined_decoded_boxes_batch, class_ids), axis=2)
else:
raw_detection_boxes = tf.squeeze(refined_decoded_boxes_batch, axis=2)
def normalize_and_clip_boxes(args):
"""Normalize and clip boxes."""
boxes_per_image = args[0]
image_shape = args[1]
normalized_boxes_per_image = box_list_ops.to_normalized_coordinates(
box_list.BoxList(boxes_per_image),
image_shape[0],
image_shape[1],
check_range=False).get()
normalized_boxes_per_image = box_list_ops.clip_to_window(
box_list.BoxList(normalized_boxes_per_image),
tf.constant([0.0, 0.0, 1.0, 1.0], tf.float32),
filter_nonoverlapping=False).get()
return normalized_boxes_per_image
raw_normalized_detection_boxes = shape_utils.static_or_dynamic_map_fn(
normalize_and_clip_boxes,
elems=[raw_detection_boxes, image_shapes],
dtype=tf.float32)
detections = {
fields.DetectionResultFields.detection_boxes: nmsed_boxes,
fields.DetectionResultFields.detection_scores: nmsed_scores,
fields.DetectionResultFields.detection_classes: nmsed_classes,
fields.DetectionResultFields.num_detections: tf.to_float(num_detections)
fields.DetectionResultFields.detection_boxes:
nmsed_boxes,
fields.DetectionResultFields.detection_scores:
nmsed_scores,
fields.DetectionResultFields.detection_classes:
nmsed_classes,
fields.DetectionResultFields.num_detections:
tf.to_float(num_detections),
fields.DetectionResultFields.raw_detection_boxes:
raw_normalized_detection_boxes,
fields.DetectionResultFields.raw_detection_scores:
class_predictions_with_background_batch
}
if nmsed_masks is not None:
detections[fields.DetectionResultFields.detection_masks] = nmsed_masks
......@@ -1769,7 +1830,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
back_prop=True))
# Normalize by number of examples in sampled minibatch
normalizer = tf.reduce_sum(batch_sampled_indices, axis=1)
normalizer = tf.maximum(
tf.reduce_sum(batch_sampled_indices, axis=1), 1.0)
batch_one_hot_targets = tf.one_hot(
tf.to_int32(batch_cls_targets), depth=2)
sampled_reg_indices = tf.multiply(batch_sampled_indices,
......
......@@ -85,6 +85,68 @@ class FasterRCNNMetaArchTest(
self.assertTrue(np.amax(detections_out['detection_masks'] <= 1.0))
self.assertTrue(np.amin(detections_out['detection_masks'] >= 0.0))
def test_postprocess_second_stage_only_inference_mode_with_calibration(self):
model = self._build_model(
is_training=False, number_of_stages=2, second_stage_batch_size=6,
calibration_mapping_value=0.5)
batch_size = 2
total_num_padded_proposals = batch_size * model.max_num_proposals
proposal_boxes = tf.constant(
[[[1, 1, 2, 3],
[0, 0, 1, 1],
[.5, .5, .6, .6],
4*[0], 4*[0], 4*[0], 4*[0], 4*[0]],
[[2, 3, 6, 8],
[1, 2, 5, 3],
4*[0], 4*[0], 4*[0], 4*[0], 4*[0], 4*[0]]], dtype=tf.float32)
num_proposals = tf.constant([3, 2], dtype=tf.int32)
refined_box_encodings = tf.zeros(
[total_num_padded_proposals, model.num_classes, 4], dtype=tf.float32)
class_predictions_with_background = tf.ones(
[total_num_padded_proposals, model.num_classes+1], dtype=tf.float32)
image_shape = tf.constant([batch_size, 36, 48, 3], dtype=tf.int32)
mask_height = 2
mask_width = 2
mask_predictions = 30. * tf.ones(
[total_num_padded_proposals, model.num_classes,
mask_height, mask_width], dtype=tf.float32)
exp_detection_masks = np.array([[[[1, 1], [1, 1]],
[[1, 1], [1, 1]],
[[1, 1], [1, 1]],
[[1, 1], [1, 1]],
[[1, 1], [1, 1]]],
[[[1, 1], [1, 1]],
[[1, 1], [1, 1]],
[[1, 1], [1, 1]],
[[1, 1], [1, 1]],
[[0, 0], [0, 0]]]])
_, true_image_shapes = model.preprocess(tf.zeros(image_shape))
detections = model.postprocess({
'refined_box_encodings': refined_box_encodings,
'class_predictions_with_background': class_predictions_with_background,
'num_proposals': num_proposals,
'proposal_boxes': proposal_boxes,
'image_shape': image_shape,
'mask_predictions': mask_predictions
}, true_image_shapes)
with self.test_session() as sess:
detections_out = sess.run(detections)
self.assertAllEqual(detections_out['detection_boxes'].shape, [2, 5, 4])
# All scores map to 0.5, except for the final one, which is pruned.
self.assertAllClose(detections_out['detection_scores'],
[[0.5, 0.5, 0.5, 0.5, 0.5],
[0.5, 0.5, 0.5, 0.5, 0.0]])
self.assertAllClose(detections_out['detection_classes'],
[[0, 0, 0, 1, 1], [0, 0, 1, 1, 0]])
self.assertAllClose(detections_out['num_detections'], [5, 4])
self.assertAllClose(detections_out['detection_masks'],
exp_detection_masks)
self.assertTrue(np.amax(detections_out['detection_masks'] <= 1.0))
self.assertTrue(np.amin(detections_out['detection_masks'] >= 0.0))
def test_postprocess_second_stage_only_inference_mode_with_shared_boxes(self):
model = self._build_model(
is_training=False, number_of_stages=2, second_stage_batch_size=6)
......@@ -190,6 +252,7 @@ class FasterRCNNMetaArchTest(
set([
'detection_boxes', 'detection_scores', 'detection_classes',
'detection_masks', 'num_detections', 'mask_predictions',
'raw_detection_boxes', 'raw_detection_scores'
])))
for key in expected_shapes:
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
......@@ -276,7 +339,7 @@ class FasterRCNNMetaArchTest(
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
anchors_shape_out = tensor_dict_out['anchors'].shape
self.assertEqual(2, len(anchors_shape_out))
self.assertLen(anchors_shape_out, 2)
self.assertEqual(4, anchors_shape_out[1])
num_anchors_out = anchors_shape_out[0]
self.assertAllEqual(tensor_dict_out['rpn_box_encodings'].shape,
......
......@@ -165,7 +165,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
use_matmul_crop_and_resize=False,
clip_anchors_to_image=False,
use_matmul_gather_in_matcher=False,
use_static_shapes=False):
use_static_shapes=False,
calibration_mapping_value=None):
def image_resizer_fn(image, masks=None):
"""Fake image resizer function."""
......@@ -244,7 +245,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
first_stage_localization_loss_weight = 1.0
first_stage_objectness_loss_weight = 1.0
post_processing_config = post_processing_pb2.PostProcessing()
post_processing_text_proto = """
score_converter: IDENTITY
batch_non_max_suppression {
score_threshold: -20.0
iou_threshold: 1.0
......@@ -253,18 +256,31 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
use_static_shapes: """ +'{}'.format(use_static_shapes) + """
}
"""
post_processing_config = post_processing_pb2.PostProcessing()
if calibration_mapping_value:
calibration_text_proto = """
calibration_config {
function_approximation {
x_y_pairs {
x_y_pair {
x: 0.0
y: %f
}
x_y_pair {
x: 1.0
y: %f
}}}}""" % (calibration_mapping_value, calibration_mapping_value)
post_processing_text_proto = (post_processing_text_proto
+ ' ' + calibration_text_proto)
text_format.Merge(post_processing_text_proto, post_processing_config)
second_stage_non_max_suppression_fn, second_stage_score_conversion_fn = (
post_processing_builder.build(post_processing_config))
second_stage_target_assigner = target_assigner.create_target_assigner(
'FasterRCNN', 'detection',
use_matmul_gather=use_matmul_gather_in_matcher)
second_stage_non_max_suppression_fn, _ = post_processing_builder.build(
post_processing_config)
second_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=1.0, is_static=use_static_shapes)
second_stage_score_conversion_fn = tf.identity
second_stage_localization_loss_weight = 1.0
second_stage_classification_loss_weight = 1.0
if softmax_second_stage_classification_loss:
......@@ -336,6 +352,10 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
predict_masks=predict_masks,
masks_are_class_agnostic=masks_are_class_agnostic), **common_kwargs)
@parameterized.parameters(
{'use_static_shapes': False},
{'use_static_shapes': True}
)
def test_predict_gives_correct_shapes_in_inference_mode_first_stage_only(
self, use_static_shapes=False):
batch_size = 2
......@@ -457,6 +477,10 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
prediction_out['rpn_objectness_predictions_with_background'].shape,
(batch_size, num_anchors_out, 2))
@parameterized.parameters(
{'use_static_shapes': False},
{'use_static_shapes': True}
)
def test_predict_correct_shapes_in_inference_mode_two_stages(
self, use_static_shapes=False):
......@@ -578,6 +602,10 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
for key in expected_shapes:
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
@parameterized.parameters(
{'use_static_shapes': False},
{'use_static_shapes': True}
)
def test_predict_gives_correct_shapes_in_train_mode_both_stages(
self,
use_static_shapes=False):
......@@ -670,6 +698,12 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
self.assertAllEqual(results[8].shape,
expected_shapes['rpn_box_predictor_features'])
@parameterized.parameters(
{'use_static_shapes': False, 'pad_to_max_dimension': None},
{'use_static_shapes': True, 'pad_to_max_dimension': None},
{'use_static_shapes': False, 'pad_to_max_dimension': 56},
{'use_static_shapes': True, 'pad_to_max_dimension': 56}
)
def test_postprocess_first_stage_only_inference_mode(
self, use_static_shapes=False, pad_to_max_dimension=None):
batch_size = 2
......@@ -696,9 +730,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
rpn_objectness_predictions_with_background,
'rpn_features_to_crop': rpn_features_to_crop,
'anchors': anchors}, true_image_shapes)
return (proposals['num_detections'],
proposals['detection_boxes'],
proposals['detection_scores'])
return (proposals['num_detections'], proposals['detection_boxes'],
proposals['detection_scores'], proposals['raw_detection_boxes'],
proposals['raw_detection_scores'])
anchors = np.array(
[[0, 0, 16, 16],
......@@ -741,6 +775,12 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_proposal_scores = [[1, 1, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0]]
expected_num_proposals = [4, 4]
expected_raw_proposal_boxes = [[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [0.5, 0.5, 1., 1.]],
[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [0.5, 0.5, 1., 1.]]]
expected_raw_scores = [[[-10., 13.], [10., -10.], [10., -11.], [-10., 12.]],
[[10., -10.], [-10., 13.], [-10., 12.], [10., -11.]]]
self.assertAllClose(results[0], expected_num_proposals)
for indx, num_proposals in enumerate(expected_num_proposals):
......@@ -748,6 +788,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_proposal_boxes[indx][0:num_proposals])
self.assertAllClose(results[2][indx][0:num_proposals],
expected_proposal_scores[indx][0:num_proposals])
self.assertAllClose(results[3], expected_raw_proposal_boxes)
self.assertAllClose(results[4], expected_raw_scores)
def _test_postprocess_first_stage_only_train_mode(self,
pad_to_max_dimension=None):
......@@ -801,9 +843,17 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_proposal_scores = [[1, 1],
[1, 1]]
expected_num_proposals = [2, 2]
expected_output_keys = set(['detection_boxes', 'detection_scores',
'num_detections'])
expected_raw_proposal_boxes = [[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [0.5, 0.5, 1., 1.]],
[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [0.5, 0.5, 1., 1.]]]
expected_raw_scores = [[[-10., 13.], [-10., 12.], [-10., 11.], [-10., 10.]],
[[-10., 13.], [-10., 12.], [-10., 11.], [-10., 10.]]]
expected_output_keys = set([
'detection_boxes', 'detection_scores', 'num_detections',
'raw_detection_boxes', 'raw_detection_scores'
])
self.assertEqual(set(proposals.keys()), expected_output_keys)
with self.test_session() as sess:
......@@ -817,6 +867,10 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_proposal_scores)
self.assertAllEqual(proposals_out['num_detections'],
expected_num_proposals)
self.assertAllClose(proposals_out['raw_detection_boxes'],
expected_raw_proposal_boxes)
self.assertAllClose(proposals_out['raw_detection_scores'],
expected_raw_scores)
def test_postprocess_first_stage_only_train_mode(self):
self._test_postprocess_first_stage_only_train_mode()
......@@ -824,6 +878,12 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
def test_postprocess_first_stage_only_train_mode_padded_image(self):
self._test_postprocess_first_stage_only_train_mode(pad_to_max_dimension=56)
@parameterized.parameters(
{'use_static_shapes': False, 'pad_to_max_dimension': None},
{'use_static_shapes': True, 'pad_to_max_dimension': None},
{'use_static_shapes': False, 'pad_to_max_dimension': 56},
{'use_static_shapes': True, 'pad_to_max_dimension': 56}
)
def test_postprocess_second_stage_only_inference_mode(
self, use_static_shapes=False, pad_to_max_dimension=None):
batch_size = 2
......@@ -854,10 +914,10 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
'num_proposals': num_proposals,
'proposal_boxes': proposal_boxes,
}, true_image_shapes)
return (detections['num_detections'],
detections['detection_boxes'],
detections['detection_scores'],
detections['detection_classes'])
return (detections['num_detections'], detections['detection_boxes'],
detections['detection_scores'], detections['detection_classes'],
detections['raw_detection_boxes'],
detections['raw_detection_scores'])
proposal_boxes = np.array(
[[[1, 1, 2, 3],
......@@ -867,6 +927,7 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
[[2, 3, 6, 8],
[1, 2, 5, 3],
4*[0], 4*[0], 4*[0], 4*[0], 4*[0], 4*[0]]], dtype=np.float32)
num_proposals = np.array([3, 2], dtype=np.int32)
refined_box_encodings = np.zeros(
[total_num_padded_proposals, num_classes, 4], dtype=np.float32)
......@@ -887,6 +948,15 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_num_detections = [5, 4]
expected_detection_classes = [[0, 0, 0, 1, 1], [0, 0, 1, 1, 0]]
expected_detection_scores = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]
h = float(image_shape[1])
w = float(image_shape[2])
expected_raw_detection_boxes = np.array(
[[[1 / h, 1 / w, 2 / h, 3 / w], [0, 0, 1 / h, 1 / w],
[.5 / h, .5 / w, .6 / h, .6 / w], 4 * [0], 4 * [0], 4 * [0], 4 * [0],
4 * [0]],
[[2 / h, 3 / w, 6 / h, 8 / w], [1 / h, 2 / w, 5 / h, 3 / w], 4 * [0],
4 * [0], 4 * [0], 4 * [0], 4 * [0], 4 * [0]]],
dtype=np.float32)
self.assertAllClose(results[0], expected_num_detections)
......@@ -896,6 +966,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
self.assertAllClose(results[3][indx][0:num_proposals],
expected_detection_classes[indx][0:num_proposals])
self.assertAllClose(results[4], expected_raw_detection_boxes)
self.assertAllClose(results[5],
class_predictions_with_background.reshape([-1, 8, 3]))
if not use_static_shapes:
self.assertAllEqual(results[1].shape, [2, 5, 4])
......@@ -1268,6 +1341,13 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
@parameterized.parameters(
{'use_static_shapes': False, 'shared_boxes': False},
{'use_static_shapes': False, 'shared_boxes': True},
{'use_static_shapes': True, 'shared_boxes': False},
{'use_static_shapes': True, 'shared_boxes': True},
)
def test_loss_full_zero_padded_proposals_nonzero_loss_with_two_images(
self, use_static_shapes=False, shared_boxes=False):
batch_size = 2
......
......@@ -288,7 +288,7 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
"""
image_shape_2d = tf.tile(tf.expand_dims(image_shape[1:], 0),
[image_shape[0], 1])
proposal_boxes_normalized, _, num_proposals = self._postprocess_rpn(
proposal_boxes_normalized, _, num_proposals, _, _ = self._postprocess_rpn(
rpn_box_encodings, rpn_objectness_predictions_with_background,
anchors, image_shape_2d, true_image_shapes)
......
......@@ -17,8 +17,7 @@
General tensorflow implementation of convolutional Multibox/SSD detection
models.
"""
from abc import abstractmethod
import abc
import tensorflow as tf
from object_detection.core import box_list
......@@ -80,7 +79,7 @@ class SSDFeatureExtractor(object):
def is_keras_model(self):
return False
@abstractmethod
@abc.abstractmethod
def preprocess(self, resized_inputs):
"""Preprocesses images for feature extraction (minus image resizing).
......@@ -98,7 +97,7 @@ class SSDFeatureExtractor(object):
"""
pass
@abstractmethod
@abc.abstractmethod
def extract_features(self, preprocessed_inputs):
"""Extracts features from preprocessed inputs.
......@@ -196,7 +195,7 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
def is_keras_model(self):
return True
@abstractmethod
@abc.abstractmethod
def preprocess(self, resized_inputs):
"""Preprocesses images for feature extraction (minus image resizing).
......@@ -214,7 +213,7 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
"""
raise NotImplementedError
@abstractmethod
@abc.abstractmethod
def _extract_features(self, preprocessed_inputs):
"""Extracts features from preprocessed inputs.
......@@ -552,8 +551,10 @@ class SSDMetaArch(model.DetectionModel):
5) anchors: 2-D float tensor of shape [num_anchors, 4] containing
the generated anchors in normalized coordinates.
"""
batchnorm_updates_collections = (None if self._inplace_batchnorm_update
else tf.GraphKeys.UPDATE_OPS)
if self._inplace_batchnorm_update:
batchnorm_updates_collections = None
else:
batchnorm_updates_collections = tf.GraphKeys.UPDATE_OPS
if self._feature_extractor.is_keras_model:
feature_maps = self._feature_extractor(preprocessed_inputs)
else:
......@@ -648,14 +649,22 @@ class SSDMetaArch(model.DetectionModel):
Returns:
detections: a dictionary containing the following fields
detection_boxes: [batch, max_detections, 4]
detection_scores: [batch, max_detections]
detection_classes: [batch, max_detections]
detection_boxes: [batch, max_detections, 4] tensor with post-processed
detection boxes.
detection_scores: [batch, max_detections] tensor with scalar scores for
post-processed detection boxes.
detection_classes: [batch, max_detections] tensor with classes for
post-processed detection classes.
detection_keypoints: [batch, max_detections, num_keypoints, 2] (if
encoded in the prediction_dict 'box_encodings')
detection_masks: [batch_size, max_detections, mask_height, mask_width]
(optional)
num_detections: [batch]
raw_detection_boxes: [batch, total_detections, 4] tensor with decoded
detection boxes before Non-Max Suppression.
raw_detection_scores: [batch, total_detections,
num_classes_with_background] tensor of multi-class score logits for
raw detection boxes.
Raises:
ValueError: if prediction_dict does not contain `box_encodings` or
`class_predictions_with_background` fields.
......@@ -700,11 +709,18 @@ class SSDMetaArch(model.DetectionModel):
additional_fields=additional_fields,
masks=prediction_dict.get('mask_predictions'))
detection_dict = {
fields.DetectionResultFields.detection_boxes: nmsed_boxes,
fields.DetectionResultFields.detection_scores: nmsed_scores,
fields.DetectionResultFields.detection_classes: nmsed_classes,
fields.DetectionResultFields.detection_boxes:
nmsed_boxes,
fields.DetectionResultFields.detection_scores:
nmsed_scores,
fields.DetectionResultFields.detection_classes:
nmsed_classes,
fields.DetectionResultFields.num_detections:
tf.to_float(num_detections)
tf.to_float(num_detections),
fields.DetectionResultFields.raw_detection_boxes:
tf.squeeze(detection_boxes, axis=2),
fields.DetectionResultFields.raw_detection_scores:
class_predictions
}
if (nmsed_additional_fields is not None and
fields.BoxListFields.keypoints in nmsed_additional_fields):
......@@ -1049,9 +1065,9 @@ class SSDMetaArch(model.DetectionModel):
mined_cls_loss: a float scalar with sum of classification losses from
selected hard examples.
"""
class_predictions = tf.slice(
prediction_dict['class_predictions_with_background'], [0, 0,
1], [-1, -1, -1])
class_predictions = prediction_dict['class_predictions_with_background']
if self._add_background_class:
class_predictions = tf.slice(class_predictions, [0, 0, 1], [-1, -1, -1])
decoded_boxes, _ = self._batch_decode(prediction_dict['box_encodings'])
decoded_box_tensors_list = tf.unstack(decoded_boxes)
......
......@@ -48,7 +48,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
use_keras=False,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5):
nms_max_size_per_class=5,
calibration_mapping_value=None):
return super(SsdMetaArchTest, self)._create_model(
model_fn=ssd_meta_arch.SSDMetaArch,
apply_hard_mining=apply_hard_mining,
......@@ -61,7 +62,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
use_keras=use_keras,
predict_mask=predict_mask,
use_static_shapes=use_static_shapes,
nms_max_size_per_class=nms_max_size_per_class)
nms_max_size_per_class=nms_max_size_per_class,
calibration_mapping_value=calibration_mapping_value)
def test_preprocess_preserves_shapes_with_dynamic_input_image(
self, use_keras):
......@@ -177,6 +179,13 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
expected_classes = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
expected_num_detections = np.array([3, 3])
raw_detection_boxes = [[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [1., 1., 1.5, 1.5]],
[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [1., 1., 1.5, 1.5]]]
raw_detection_scores = [[[0, 0], [0, 0], [0, 0], [0, 0]],
[[0, 0], [0, 0], [0, 0], [0, 0]]]
for input_shape in input_shapes:
tf_graph = tf.Graph()
with tf_graph.as_default():
......@@ -191,6 +200,8 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
self.assertIn('detection_scores', detections)
self.assertIn('detection_classes', detections)
self.assertIn('num_detections', detections)
self.assertIn('raw_detection_boxes', detections)
self.assertIn('raw_detection_scores', detections)
init_op = tf.global_variables_initializer()
with self.test_session(graph=tf_graph) as sess:
sess.run(init_op)
......@@ -208,7 +219,139 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
self.assertAllClose(detections_out['detection_classes'], expected_classes)
self.assertAllClose(detections_out['num_detections'],
expected_num_detections)
self.assertAllEqual(detections_out['raw_detection_boxes'],
raw_detection_boxes)
self.assertAllEqual(detections_out['raw_detection_scores'],
raw_detection_scores)
def test_postprocess_results_are_correct_static(self, use_keras):
with tf.Graph().as_default():
_, _, _, _ = self._create_model(use_keras=use_keras)
def graph_fn(input_image):
model, _, _, _ = self._create_model(use_static_shapes=True,
nms_max_size_per_class=4)
preprocessed_inputs, true_image_shapes = model.preprocess(input_image)
prediction_dict = model.predict(preprocessed_inputs,
true_image_shapes)
detections = model.postprocess(prediction_dict, true_image_shapes)
return (detections['detection_boxes'], detections['detection_scores'],
detections['detection_classes'], detections['num_detections'])
batch_size = 2
image_size = 2
channels = 3
input_image = np.random.rand(batch_size, image_size, image_size,
channels).astype(np.float32)
expected_boxes = [
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[0, 0, 0, 0]
], # padding
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[0, 0, 0, 0]
]
] # padding
expected_scores = [[0, 0, 0, 0], [0, 0, 0, 0]]
expected_classes = [[0, 0, 0, 0], [0, 0, 0, 0]]
expected_num_detections = np.array([3, 3])
(detection_boxes, detection_scores, detection_classes,
num_detections) = self.execute(graph_fn, [input_image])
for image_idx in range(batch_size):
self.assertTrue(test_utils.first_rows_close_as_set(
detection_boxes[image_idx][
0:expected_num_detections[image_idx]].tolist(),
expected_boxes[image_idx][0:expected_num_detections[image_idx]]))
self.assertAllClose(
detection_scores[image_idx][0:expected_num_detections[image_idx]],
expected_scores[image_idx][0:expected_num_detections[image_idx]])
self.assertAllClose(
detection_classes[image_idx][0:expected_num_detections[image_idx]],
expected_classes[image_idx][0:expected_num_detections[image_idx]])
self.assertAllClose(num_detections,
expected_num_detections)
def test_postprocess_results_are_correct_with_calibration(self, use_keras):
batch_size = 2
image_size = 2
input_shapes = [(batch_size, image_size, image_size, 3),
(None, image_size, image_size, 3),
(batch_size, None, None, 3),
(None, None, None, 3)]
expected_boxes = [
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[0, 0, 0, 0], # pruned prediction
[0, 0, 0, 0]
], # padding
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[0, 0, 0, 0], # pruned prediction
[0, 0, 0, 0]
]
] # padding
# Calibration mapping value below is set to map all scores to 0.5, except
# for the last two detections in each batch (see expected number of
# detections below).
expected_scores = [[0.5, 0.5, 0.5, 0., 0.], [0.5, 0.5, 0.5, 0., 0.]]
expected_classes = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
expected_num_detections = np.array([3, 3])
raw_detection_boxes = [[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [1., 1., 1.5, 1.5]],
[[0., 0., 0.5, 0.5], [0., 0.5, 0.5, 1.],
[0.5, 0., 1., 0.5], [1., 1., 1.5, 1.5]]]
raw_detection_scores = [[[0, 0], [0, 0], [0, 0], [0, 0]],
[[0, 0], [0, 0], [0, 0], [0, 0]]]
for input_shape in input_shapes:
tf_graph = tf.Graph()
with tf_graph.as_default():
model, _, _, _ = self._create_model(use_keras=use_keras,
calibration_mapping_value=0.5)
input_placeholder = tf.placeholder(tf.float32, shape=input_shape)
preprocessed_inputs, true_image_shapes = model.preprocess(
input_placeholder)
prediction_dict = model.predict(preprocessed_inputs,
true_image_shapes)
detections = model.postprocess(prediction_dict, true_image_shapes)
self.assertIn('detection_boxes', detections)
self.assertIn('detection_scores', detections)
self.assertIn('detection_classes', detections)
self.assertIn('num_detections', detections)
self.assertIn('raw_detection_boxes', detections)
self.assertIn('raw_detection_scores', detections)
init_op = tf.global_variables_initializer()
with self.test_session(graph=tf_graph) as sess:
sess.run(init_op)
detections_out = sess.run(detections,
feed_dict={
input_placeholder:
np.random.uniform(
size=(batch_size, 2, 2, 3))})
for image_idx in range(batch_size):
self.assertTrue(
test_utils.first_rows_close_as_set(
detections_out['detection_boxes'][image_idx].tolist(),
expected_boxes[image_idx]))
self.assertAllClose(detections_out['detection_scores'], expected_scores)
self.assertAllClose(detections_out['detection_classes'], expected_classes)
self.assertAllClose(detections_out['num_detections'],
expected_num_detections)
self.assertAllEqual(detections_out['raw_detection_boxes'],
raw_detection_boxes)
self.assertAllEqual(detections_out['raw_detection_scores'],
raw_detection_scores)
def test_loss_results_are_correct(self, use_keras):
......
......@@ -16,7 +16,9 @@
import functools
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import post_processing_builder
from object_detection.core import anchor_generator
from object_detection.core import balanced_positive_negative_sampler as sampler
from object_detection.core import box_list
......@@ -25,6 +27,7 @@ from object_detection.core import post_processing
from object_detection.core import region_similarity_calculator as sim_calc
from object_detection.core import target_assigner
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.protos import calibration_pb2
from object_detection.protos import model_pb2
from object_detection.utils import ops
from object_detection.utils import test_case
......@@ -125,7 +128,8 @@ class SSDMetaArchTestBase(test_case.TestCase):
use_keras=False,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5):
nms_max_size_per_class=5,
calibration_mapping_value=None):
is_training = False
num_classes = 1
mock_anchor_generator = MockAnchorGenerator2x2()
......@@ -156,6 +160,24 @@ class SSDMetaArchTestBase(test_case.TestCase):
max_size_per_class=nms_max_size_per_class,
max_total_size=nms_max_size_per_class,
use_static_shapes=use_static_shapes)
score_conversion_fn = tf.identity
calibration_config = calibration_pb2.CalibrationConfig()
if calibration_mapping_value:
calibration_text_proto = """
function_approximation {
x_y_pairs {
x_y_pair {
x: 0.0
y: %f
}
x_y_pair {
x: 1.0
y: %f
}}}""" % (calibration_mapping_value, calibration_mapping_value)
text_format.Merge(calibration_text_proto, calibration_config)
score_conversion_fn = (
post_processing_builder._build_calibrated_score_converter( # pylint: disable=protected-access
tf.identity, calibration_config))
classification_loss_weight = 1.0
localization_loss_weight = 1.0
negative_class_weight = 1.0
......@@ -201,7 +223,7 @@ class SSDMetaArchTestBase(test_case.TestCase):
encode_background_as_zeros=encode_background_as_zeros,
image_resizer_fn=image_resizer_fn,
non_max_suppression_fn=non_max_suppression_fn,
score_conversion_fn=tf.identity,
score_conversion_fn=score_conversion_fn,
classification_loss=classification_loss,
localization_loss=localization_loss,
classification_loss_weight=classification_loss_weight,
......
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Class for evaluating object detections with calibration metrics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from object_detection.box_coders import mean_stddev_box_coder
from object_detection.core import box_list
from object_detection.core import region_similarity_calculator
from object_detection.core import standard_fields
from object_detection.core import target_assigner
from object_detection.matchers import argmax_matcher
from object_detection.metrics import calibration_metrics
from object_detection.utils import object_detection_evaluation
# TODO(zbeaver): Implement metrics per category.
class CalibrationDetectionEvaluator(
object_detection_evaluation.DetectionEvaluator):
"""Class to evaluate calibration detection metrics."""
def __init__(self,
categories,
iou_threshold=0.5):
"""Constructor.
Args:
categories: A list of dicts, each of which has the following keys -
'id': (required) an integer id uniquely identifying this category.
'name': (required) string representing category name e.g., 'cat', 'dog'.
iou_threshold: Threshold above which to consider a box as matched during
evaluation.
"""
super(CalibrationDetectionEvaluator, self).__init__(categories)
# Constructing target_assigner to match detections to groundtruth.
similarity_calc = region_similarity_calculator.IouSimilarity()
matcher = argmax_matcher.ArgMaxMatcher(
matched_threshold=iou_threshold, unmatched_threshold=iou_threshold)
box_coder = mean_stddev_box_coder.MeanStddevBoxCoder(stddev=0.1)
self._target_assigner = target_assigner.TargetAssigner(
similarity_calc, matcher, box_coder)
def match_single_image_info(self, image_info):
"""Match detections to groundtruth for a single image.
Detections are matched to available groundtruth in the image based on the
IOU threshold from the constructor. The classes of the detections and
groundtruth matches are then compared. Detections that do not have IOU above
the required threshold or have different classes from their match are
considered negative matches. All inputs in `image_info` originate or are
inferred from the eval_dict passed to class method
`get_estimator_eval_metric_ops`.
Args:
image_info: a tuple or list containing the following (in order):
- gt_boxes: tf.float32 tensor of groundtruth boxes.
- gt_classes: tf.int64 tensor of groundtruth classes associated with
groundtruth boxes.
- num_gt_box: scalar indicating the number of groundtruth boxes per
image.
- det_boxes: tf.float32 tensor of detection boxes.
- det_classes: tf.int64 tensor of detection classes associated with
detection boxes.
- num_det_box: scalar indicating the number of detection boxes per
image.
Returns:
is_class_matched: tf.int64 tensor identical in shape to det_boxes,
indicating whether detection boxes matched with and had the same
class as groundtruth annotations.
"""
(gt_boxes, gt_classes, num_gt_box, det_boxes, det_classes,
num_det_box) = image_info
detection_boxes = det_boxes[:num_det_box]
detection_classes = det_classes[:num_det_box]
groundtruth_boxes = gt_boxes[:num_gt_box]
groundtruth_classes = gt_classes[:num_gt_box]
det_boxlist = box_list.BoxList(detection_boxes)
gt_boxlist = box_list.BoxList(groundtruth_boxes)
# Target assigner requires classes in one-hot format. An additional
# dimension is required since gt_classes are 1-indexed; the zero index is
# provided to all non-matches.
one_hot_depth = tf.cast(tf.add(tf.reduce_max(groundtruth_classes), 1),
dtype=tf.int32)
gt_classes_one_hot = tf.one_hot(
groundtruth_classes, one_hot_depth, dtype=tf.float32)
one_hot_cls_targets, _, _, _, _ = self._target_assigner.assign(
det_boxlist,
gt_boxlist,
gt_classes_one_hot,
unmatched_class_label=tf.zeros(shape=one_hot_depth, dtype=tf.float32))
# Transform from one-hot back to indexes.
cls_targets = tf.argmax(one_hot_cls_targets, axis=1)
is_class_matched = tf.cast(
tf.equal(tf.cast(cls_targets, tf.int64), detection_classes),
dtype=tf.int64)
return is_class_matched
def get_estimator_eval_metric_ops(self, eval_dict):
"""Returns a dictionary of eval metric ops.
Note that once value_op is called, the detections and groundtruth added via
update_op are cleared.
This function can take in groundtruth and detections for a batch of images,
or for a single image. For the latter case, the batch dimension for input
tensors need not be present.
Args:
eval_dict: A dictionary that holds tensors for evaluating object detection
performance. For single-image evaluation, this dictionary may be
produced from eval_util.result_dict_for_single_example(). If multi-image
evaluation, `eval_dict` should contain the fields
'num_groundtruth_boxes_per_image' and 'num_det_boxes_per_image' to
properly unpad the tensors from the batch.
Returns:
a dictionary of metric names to tuple of value_op and update_op that can
be used as eval metric ops in tf.estimator.EstimatorSpec. Note that all
update ops must be run together and similarly all value ops must be run
together to guarantee correct behaviour.
"""
# Unpack items from the evaluation dictionary.
input_data_fields = standard_fields.InputDataFields
detection_fields = standard_fields.DetectionResultFields
image_id = eval_dict[input_data_fields.key]
groundtruth_boxes = eval_dict[input_data_fields.groundtruth_boxes]
groundtruth_classes = eval_dict[input_data_fields.groundtruth_classes]
detection_boxes = eval_dict[detection_fields.detection_boxes]
detection_scores = eval_dict[detection_fields.detection_scores]
detection_classes = eval_dict[detection_fields.detection_classes]
num_gt_boxes_per_image = eval_dict.get(
'num_groundtruth_boxes_per_image', None)
num_det_boxes_per_image = eval_dict.get('num_det_boxes_per_image', None)
is_annotated_batched = eval_dict.get('is_annotated', None)
if not image_id.shape.as_list():
# Apply a batch dimension to all tensors.
image_id = tf.expand_dims(image_id, 0)
groundtruth_boxes = tf.expand_dims(groundtruth_boxes, 0)
groundtruth_classes = tf.expand_dims(groundtruth_classes, 0)
detection_boxes = tf.expand_dims(detection_boxes, 0)
detection_scores = tf.expand_dims(detection_scores, 0)
detection_classes = tf.expand_dims(detection_classes, 0)
if num_gt_boxes_per_image is None:
num_gt_boxes_per_image = tf.shape(groundtruth_boxes)[1:2]
else:
num_gt_boxes_per_image = tf.expand_dims(num_gt_boxes_per_image, 0)
if num_det_boxes_per_image is None:
num_det_boxes_per_image = tf.shape(detection_boxes)[1:2]
else:
num_det_boxes_per_image = tf.expand_dims(num_det_boxes_per_image, 0)
if is_annotated_batched is None:
is_annotated_batched = tf.constant([True])
else:
is_annotated_batched = tf.expand_dims(is_annotated_batched, 0)
else:
if num_gt_boxes_per_image is None:
num_gt_boxes_per_image = tf.tile(
tf.shape(groundtruth_boxes)[1:2],
multiples=tf.shape(groundtruth_boxes)[0:1])
if num_det_boxes_per_image is None:
num_det_boxes_per_image = tf.tile(
tf.shape(detection_boxes)[1:2],
multiples=tf.shape(detection_boxes)[0:1])
if is_annotated_batched is None:
is_annotated_batched = tf.ones_like(image_id, dtype=tf.bool)
# Filter images based on is_annotated_batched and match detections.
image_info = [tf.boolean_mask(tensor, is_annotated_batched) for tensor in
[groundtruth_boxes, groundtruth_classes,
num_gt_boxes_per_image, detection_boxes, detection_classes,
num_det_boxes_per_image]]
is_class_matched = tf.map_fn(
self.match_single_image_info, image_info, dtype=tf.int64)
y_true = tf.squeeze(is_class_matched)
y_pred = tf.squeeze(tf.boolean_mask(detection_scores, is_annotated_batched))
ece, update_op = calibration_metrics.expected_calibration_error(
y_true, y_pred)
return {'CalibrationError/ExpectedCalibrationError': (ece, update_op)}
def add_single_ground_truth_image_info(self, image_id, groundtruth_dict):
"""Adds groundtruth for a single image to be used for evaluation.
Args:
image_id: A unique string/integer identifier for the image.
groundtruth_dict: A dictionary of groundtruth numpy arrays required
for evaluations.
"""
raise NotImplementedError
def add_single_detected_image_info(self, image_id, detections_dict):
"""Adds detections for a single image to be used for evaluation.
Args:
image_id: A unique string/integer identifier for the image.
detections_dict: A dictionary of detection numpy arrays required for
evaluation.
"""
raise NotImplementedError
def evaluate(self):
"""Evaluates detections and returns a dictionary of metrics."""
raise NotImplementedError
def clear(self):
"""Clears the state to prepare for a fresh evaluation."""
raise NotImplementedError
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for tensorflow_models.object_detection.metrics.calibration_evaluation.""" # pylint: disable=line-too-long
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.metrics import calibration_evaluation
def _get_categories_list():
return [{
'id': 1,
'name': 'person'
}, {
'id': 2,
'name': 'dog'
}, {
'id': 3,
'name': 'cat'
}]
class CalibrationDetectionEvaluationTest(tf.test.TestCase):
def _get_ece(self, ece_op, update_op):
"""Return scalar expected calibration error."""
with self.test_session() as sess:
metrics_vars = tf.get_collection(tf.GraphKeys.METRIC_VARIABLES)
sess.run(tf.variables_initializer(var_list=metrics_vars))
_ = sess.run(update_op)
return sess.run(ece_op)
def testGetECEWithMatchingGroundtruthAndDetections(self):
"""Tests that ECE is calculated correctly when box matches exist."""
calibration_evaluator = calibration_evaluation.CalibrationDetectionEvaluator(
_get_categories_list(), iou_threshold=0.5)
input_data_fields = standard_fields.InputDataFields
detection_fields = standard_fields.DetectionResultFields
# All gt and detection boxes match.
base_eval_dict = {
input_data_fields.key:
tf.constant(['image_1', 'image_2', 'image_3']),
input_data_fields.groundtruth_boxes:
tf.constant([[[100., 100., 200., 200.]],
[[50., 50., 100., 100.]],
[[25., 25., 50., 50.]]],
dtype=tf.float32),
detection_fields.detection_boxes:
tf.constant([[[100., 100., 200., 200.]],
[[50., 50., 100., 100.]],
[[25., 25., 50., 50.]]],
dtype=tf.float32),
input_data_fields.groundtruth_classes:
tf.constant([[1], [2], [3]], dtype=tf.int64),
# Note that, in the zero ECE case, the detection class for image_2
# should NOT match groundtruth, since the detection score is zero.
detection_fields.detection_scores:
tf.constant([[1.0], [0.0], [1.0]], dtype=tf.float32)
}
# Zero ECE (perfectly calibrated).
zero_ece_eval_dict = base_eval_dict.copy()
zero_ece_eval_dict[detection_fields.detection_classes] = tf.constant(
[[1], [1], [3]], dtype=tf.int64)
zero_ece_op, zero_ece_update_op = (
calibration_evaluator.get_estimator_eval_metric_ops(zero_ece_eval_dict)
['CalibrationError/ExpectedCalibrationError'])
zero_ece = self._get_ece(zero_ece_op, zero_ece_update_op)
self.assertAlmostEqual(zero_ece, 0.0)
# ECE of 1 (poorest calibration).
one_ece_eval_dict = base_eval_dict.copy()
one_ece_eval_dict[detection_fields.detection_classes] = tf.constant(
[[3], [2], [1]], dtype=tf.int64)
one_ece_op, one_ece_update_op = (
calibration_evaluator.get_estimator_eval_metric_ops(one_ece_eval_dict)
['CalibrationError/ExpectedCalibrationError'])
one_ece = self._get_ece(one_ece_op, one_ece_update_op)
self.assertAlmostEqual(one_ece, 1.0)
def testGetECEWithUnmatchedGroundtruthAndDetections(self):
"""Tests that ECE is correctly calculated when boxes are unmatched."""
calibration_evaluator = calibration_evaluation.CalibrationDetectionEvaluator(
_get_categories_list(), iou_threshold=0.5)
input_data_fields = standard_fields.InputDataFields
detection_fields = standard_fields.DetectionResultFields
# No gt and detection boxes match.
eval_dict = {
input_data_fields.key:
tf.constant(['image_1', 'image_2', 'image_3']),
input_data_fields.groundtruth_boxes:
tf.constant([[[100., 100., 200., 200.]],
[[50., 50., 100., 100.]],
[[25., 25., 50., 50.]]],
dtype=tf.float32),
detection_fields.detection_boxes:
tf.constant([[[50., 50., 100., 100.]],
[[25., 25., 50., 50.]],
[[100., 100., 200., 200.]]],
dtype=tf.float32),
input_data_fields.groundtruth_classes:
tf.constant([[1], [2], [3]], dtype=tf.int64),
detection_fields.detection_classes:
tf.constant([[1], [1], [3]], dtype=tf.int64),
# Detection scores of zero when boxes are unmatched = ECE of zero.
detection_fields.detection_scores:
tf.constant([[0.0], [0.0], [0.0]], dtype=tf.float32)
}
ece_op, update_op = calibration_evaluator.get_estimator_eval_metric_ops(
eval_dict)['CalibrationError/ExpectedCalibrationError']
ece = self._get_ece(ece_op, update_op)
self.assertAlmostEqual(ece, 0.0)
def testGetECEWithBatchedDetections(self):
"""Tests that ECE is correct with multiple detections per image."""
calibration_evaluator = calibration_evaluation.CalibrationDetectionEvaluator(
_get_categories_list(), iou_threshold=0.5)
input_data_fields = standard_fields.InputDataFields
detection_fields = standard_fields.DetectionResultFields
# Note that image_2 has mismatched classes and detection scores but should
# still produce ECE of 0 because detection scores are also 0.
eval_dict = {
input_data_fields.key:
tf.constant(['image_1', 'image_2', 'image_3']),
input_data_fields.groundtruth_boxes:
tf.constant([[[100., 100., 200., 200.], [50., 50., 100., 100.]],
[[50., 50., 100., 100.], [100., 100., 200., 200.]],
[[25., 25., 50., 50.], [100., 100., 200., 200.]]],
dtype=tf.float32),
detection_fields.detection_boxes:
tf.constant([[[100., 100., 200., 200.], [50., 50., 100., 100.]],
[[50., 50., 100., 100.], [25., 25., 50., 50.]],
[[25., 25., 50., 50.], [100., 100., 200., 200.]]],
dtype=tf.float32),
input_data_fields.groundtruth_classes:
tf.constant([[1, 2], [2, 3], [3, 1]], dtype=tf.int64),
detection_fields.detection_classes:
tf.constant([[1, 2], [1, 1], [3, 1]], dtype=tf.int64),
detection_fields.detection_scores:
tf.constant([[1.0, 1.0], [0.0, 0.0], [1.0, 1.0]], dtype=tf.float32)
}
ece_op, update_op = calibration_evaluator.get_estimator_eval_metric_ops(
eval_dict)['CalibrationError/ExpectedCalibrationError']
ece = self._get_ece(ece_op, update_op)
self.assertAlmostEqual(ece, 0.0)
def testGetECEWhenImagesFilteredByIsAnnotated(self):
"""Tests that ECE is correct when detections filtered by is_annotated."""
calibration_evaluator = calibration_evaluation.CalibrationDetectionEvaluator(
_get_categories_list(), iou_threshold=0.5)
input_data_fields = standard_fields.InputDataFields
detection_fields = standard_fields.DetectionResultFields
# ECE will be 0 only if the third image is filtered by is_annotated.
eval_dict = {
input_data_fields.key:
tf.constant(['image_1', 'image_2', 'image_3']),
input_data_fields.groundtruth_boxes:
tf.constant([[[100., 100., 200., 200.]],
[[50., 50., 100., 100.]],
[[25., 25., 50., 50.]]],
dtype=tf.float32),
detection_fields.detection_boxes:
tf.constant([[[100., 100., 200., 200.]],
[[50., 50., 100., 100.]],
[[25., 25., 50., 50.]]],
dtype=tf.float32),
input_data_fields.groundtruth_classes:
tf.constant([[1], [2], [1]], dtype=tf.int64),
detection_fields.detection_classes:
tf.constant([[1], [1], [3]], dtype=tf.int64),
detection_fields.detection_scores:
tf.constant([[1.0], [0.0], [1.0]], dtype=tf.float32),
'is_annotated': tf.constant([True, True, False], dtype=tf.bool)
}
ece_op, update_op = calibration_evaluator.get_estimator_eval_metric_ops(
eval_dict)['CalibrationError/ExpectedCalibrationError']
ece = self._get_ece(ece_op, update_op)
self.assertAlmostEqual(ece, 0.0)
if __name__ == '__main__':
tf.test.main()
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Object detection calibration metrics.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from tensorflow.python.ops import metrics_impl
def _safe_div(numerator, denominator):
"""Divides two tensors element-wise, returning 0 if the denominator is <= 0.
Args:
numerator: A real `Tensor`.
denominator: A real `Tensor`, with dtype matching `numerator`.
Returns:
0 if `denominator` <= 0, else `numerator` / `denominator`
"""
t = tf.truediv(numerator, denominator)
zero = tf.zeros_like(t, dtype=denominator.dtype)
condition = tf.greater(denominator, zero)
zero = tf.cast(zero, t.dtype)
return tf.where(condition, t, zero)
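# For example (illustrative values), _safe_div(tf.constant([1., 2.]),
# tf.constant([2., 0.])) evaluates to [0.5, 0.]: the zero denominator maps to
# zero instead of producing an inf/NaN.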
def _ece_from_bins(bin_counts, bin_true_sum, bin_preds_sum, name):
"""Calculates Expected Calibration Error from accumulated statistics."""
bin_accuracies = _safe_div(bin_true_sum, bin_counts)
bin_confidences = _safe_div(bin_preds_sum, bin_counts)
abs_bin_errors = tf.abs(bin_accuracies - bin_confidences)
bin_weights = _safe_div(bin_counts, tf.reduce_sum(bin_counts))
return tf.reduce_sum(abs_bin_errors * bin_weights, name=name)
def expected_calibration_error(y_true, y_pred, nbins=20):
"""Calculates Expected Calibration Error (ECE).
ECE is a scalar summary statistic of calibration error. It is the
sample-weighted average of the difference between the predicted and true
probabilities of a positive detection across uniformly-spaced model
confidences [0, 1]. See referenced paper for a thorough explanation.
Reference:
Guo, et. al, "On Calibration of Modern Neural Networks"
Page 2, Expected Calibration Error (ECE).
https://arxiv.org/pdf/1706.04599.pdf
This function creates three local variables, `bin_counts`, `bin_true_sum`, and
`bin_preds_sum` that are used to compute ECE. For estimation of the metric
over a stream of data, the function creates an `update_op` operation that
updates these variables and returns the ECE.
Args:
y_true: 1-D tf.int64 Tensor of binarized ground truth, corresponding to each
prediction in y_pred.
y_pred: 1-D tf.float32 tensor of model confidence scores in range
[0.0, 1.0].
nbins: int specifying the number of uniformly-spaced bins into which y_pred
will be bucketed.
Returns:
value_op: A value metric op that returns ece.
update_op: An operation that increments the `bin_counts`, `bin_true_sum`,
and `bin_preds_sum` variables appropriately and whose value matches `ece`.
Raises:
InvalidArgumentError: if y_pred is not in [0.0, 1.0].
"""
bin_counts = metrics_impl.metric_variable(
[nbins], tf.float32, name='bin_counts')
bin_true_sum = metrics_impl.metric_variable(
[nbins], tf.float32, name='true_sum')
bin_preds_sum = metrics_impl.metric_variable(
[nbins], tf.float32, name='preds_sum')
with tf.control_dependencies([
tf.assert_greater_equal(y_pred, 0.0),
tf.assert_less_equal(y_pred, 1.0),
]):
bin_ids = tf.histogram_fixed_width_bins(y_pred, [0.0, 1.0], nbins=nbins)
with tf.control_dependencies([bin_ids]):
update_bin_counts_op = tf.assign_add(
bin_counts, tf.to_float(tf.bincount(bin_ids, minlength=nbins)))
update_bin_true_sum_op = tf.assign_add(
bin_true_sum,
tf.to_float(tf.bincount(bin_ids, weights=y_true, minlength=nbins)))
update_bin_preds_sum_op = tf.assign_add(
bin_preds_sum,
tf.to_float(tf.bincount(bin_ids, weights=y_pred, minlength=nbins)))
ece_update_op = _ece_from_bins(
update_bin_counts_op,
update_bin_true_sum_op,
update_bin_preds_sum_op,
name='update_op')
ece = _ece_from_bins(bin_counts, bin_true_sum, bin_preds_sum, name='value')
return ece, ece_update_op
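# Illustrative NumPy cross-check (not used by the TF code above) of the same
# binned computation; it reproduces the 0.18 value exercised in the unit tests
# for y_pred = [0., 0.2, 0.4, 0.5, 1.0], y_true = [0, 0, 1, 0, 1] with nbins=2.
def _expected_calibration_error_numpy(y_true, y_pred, nbins=20):
  import numpy as np  # Local import; only needed for this reference sketch.
  y_true = np.asarray(y_true, dtype=np.float64)
  y_pred = np.asarray(y_pred, dtype=np.float64)
  # Same bin assignment as tf.histogram_fixed_width_bins over [0.0, 1.0].
  bin_ids = np.minimum((y_pred * nbins).astype(np.int64), nbins - 1)
  counts = np.bincount(bin_ids, minlength=nbins).astype(np.float64)
  true_sum = np.bincount(bin_ids, weights=y_true, minlength=nbins)
  preds_sum = np.bincount(bin_ids, weights=y_pred, minlength=nbins)
  nonzero = counts > 0
  abs_errors = np.zeros(nbins)
  abs_errors[nonzero] = np.abs(true_sum[nonzero] / counts[nonzero] -
                               preds_sum[nonzero] / counts[nonzero])
  return np.sum(abs_errors * counts / counts.sum())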
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for calibration_metrics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import tensorflow as tf
from object_detection.metrics import calibration_metrics
class CalibrationLibTest(tf.test.TestCase):
@staticmethod
def _get_calibration_placeholders():
"""Returns TF placeholders for y_true and y_pred."""
return (tf.placeholder(tf.int64, shape=(None)),
tf.placeholder(tf.float32, shape=(None)))
def test_expected_calibration_error_all_bins_filled(self):
"""Test expected calibration error when all bins contain predictions."""
y_true, y_pred = self._get_calibration_placeholders()
expected_ece_op, update_op = calibration_metrics.expected_calibration_error(
y_true, y_pred, nbins=2)
with self.test_session() as sess:
metrics_vars = tf.get_collection(tf.GraphKeys.METRIC_VARIABLES)
sess.run(tf.variables_initializer(var_list=metrics_vars))
# Bin calibration errors (|confidence - accuracy| * bin_weight):
# - [0,0.5): |0.2 - 0.333| * (3/5) = 0.08
# - [0.5, 1]: |0.75 - 0.5| * (2/5) = 0.1
sess.run(
update_op,
feed_dict={
y_pred: np.array([0., 0.2, 0.4, 0.5, 1.0]),
y_true: np.array([0, 0, 1, 0, 1])
})
actual_ece = 0.08 + 0.1
expected_ece = sess.run(expected_ece_op)
self.assertAlmostEqual(actual_ece, expected_ece)
def test_expected_calibration_error_all_bins_not_filled(self):
"""Test expected calibration error when no predictions for one bin."""
y_true, y_pred = self._get_calibration_placeholders()
expected_ece_op, update_op = calibration_metrics.expected_calibration_error(
y_true, y_pred, nbins=2)
with self.test_session() as sess:
metrics_vars = tf.get_collection(tf.GraphKeys.METRIC_VARIABLES)
sess.run(tf.variables_initializer(var_list=metrics_vars))
      # Bin calibration errors (|confidence - accuracy| * bin_weight):
      # - [0, 0.5): |0.2 - 0.333| * (3/3) = 0.133
      # - [0.5, 1]: no predictions fall in this bin, so it contributes nothing.
sess.run(
update_op,
feed_dict={
y_pred: np.array([0., 0.2, 0.4]),
y_true: np.array([0, 0, 1])
})
actual_ece = np.abs(0.2 - (1 / 3.))
expected_ece = sess.run(expected_ece_op)
self.assertAlmostEqual(actual_ece, expected_ece)
def test_expected_calibration_error_with_multiple_data_streams(self):
"""Test expected calibration error when multiple data batches provided."""
y_true, y_pred = self._get_calibration_placeholders()
expected_ece_op, update_op = calibration_metrics.expected_calibration_error(
y_true, y_pred, nbins=2)
with self.test_session() as sess:
metrics_vars = tf.get_collection(tf.GraphKeys.METRIC_VARIABLES)
sess.run(tf.variables_initializer(var_list=metrics_vars))
# Identical data to test_expected_calibration_error_all_bins_filled,
# except split over three batches.
sess.run(
update_op,
feed_dict={
y_pred: np.array([0., 0.2]),
y_true: np.array([0, 0])
})
sess.run(
update_op,
feed_dict={
y_pred: np.array([0.4, 0.5]),
y_true: np.array([1, 0])
})
sess.run(
update_op, feed_dict={
y_pred: np.array([1.0]),
y_true: np.array([1])
})
actual_ece = 0.08 + 0.1
expected_ece = sess.run(expected_ece_op)
self.assertAlmostEqual(actual_ece, expected_ece)
if __name__ == '__main__':
tf.test.main()
......@@ -51,6 +51,7 @@ MODEL_BUILD_UTIL_MAP = {
inputs.create_eval_input_fn,
'create_predict_input_fn':
inputs.create_predict_input_fn,
'detection_model_fn_base': model_builder.build,
}
......@@ -184,7 +185,8 @@ def unstack_batch(tensor_dict, unpad_groundtruth_tensors=True):
return unbatched_tensor_dict
def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False,
postprocess_on_cpu=False):
"""Creates a model function for `Estimator`.
Args:
......@@ -193,6 +195,8 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
hparams: `HParams` object.
use_tpu: Boolean indicating whether model should be constructed for
use on TPU.
    postprocess_on_cpu: When use_tpu and postprocess_on_cpu are true,
      postprocessing is scheduled on the host CPU.
Returns:
`model_fn` for `Estimator`.
......@@ -282,9 +286,20 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
prediction_dict = detection_model.predict(
preprocessed_images,
features[fields.InputDataFields.true_image_shape])
def postprocess_wrapper(args):
return detection_model.postprocess(args[0], args[1])
if mode in (tf.estimator.ModeKeys.EVAL, tf.estimator.ModeKeys.PREDICT):
detections = detection_model.postprocess(
prediction_dict, features[fields.InputDataFields.true_image_shape])
if use_tpu and postprocess_on_cpu:
detections = tf.contrib.tpu.outside_compilation(
postprocess_wrapper,
(prediction_dict,
features[fields.InputDataFields.true_image_shape]))
else:
detections = postprocess_wrapper((
prediction_dict,
features[fields.InputDataFields.true_image_shape]))
if mode == tf.estimator.ModeKeys.TRAIN:
if train_config.fine_tune_checkpoint and hparams.load_pretrained:
......@@ -501,6 +516,8 @@ def create_estimator_and_inputs(run_config,
params=None,
override_eval_num_epochs=True,
save_final_config=False,
postprocess_on_cpu=False,
export_to_tpu=None,
**kwargs):
"""Creates `Estimator`, input functions, and steps.
......@@ -535,10 +552,15 @@ def create_estimator_and_inputs(run_config,
is True.
params: Parameter dictionary passed from the estimator. Only used if
`use_tpu_estimator` is True.
override_eval_num_epochs: Whether to overwrite the number of epochs to
1 for eval_input.
override_eval_num_epochs: Whether to overwrite the number of epochs to 1 for
eval_input.
save_final_config: Whether to save final config (obtained after applying
overrides) to `estimator.model_dir`.
    postprocess_on_cpu: When use_tpu and postprocess_on_cpu are true,
      postprocessing is scheduled on the host CPU.
    export_to_tpu: When use_tpu and export_to_tpu are true,
      `export_savedmodel()` exports a metagraph for serving on TPU in addition
      to the one on CPU.
**kwargs: Additional keyword arguments for configuration override.
Returns:
......@@ -561,12 +583,14 @@ def create_estimator_and_inputs(run_config,
create_train_input_fn = MODEL_BUILD_UTIL_MAP['create_train_input_fn']
create_eval_input_fn = MODEL_BUILD_UTIL_MAP['create_eval_input_fn']
create_predict_input_fn = MODEL_BUILD_UTIL_MAP['create_predict_input_fn']
detection_model_fn_base = MODEL_BUILD_UTIL_MAP['detection_model_fn_base']
configs = get_configs_from_pipeline_file(pipeline_config_path,
config_override=config_override)
configs = get_configs_from_pipeline_file(
pipeline_config_path, config_override=config_override)
kwargs.update({
'train_steps': train_steps,
'sample_1_of_n_eval_examples': sample_1_of_n_eval_examples
'sample_1_of_n_eval_examples': sample_1_of_n_eval_examples,
'use_bfloat16': configs['train_config'].use_bfloat16 and use_tpu
})
if override_eval_num_epochs:
kwargs.update({'eval_num_epochs': 1})
......@@ -595,7 +619,7 @@ def create_estimator_and_inputs(run_config,
train_steps = train_config.num_steps
detection_model_fn = functools.partial(
model_builder.build, model_config=model_config)
detection_model_fn_base, model_config=model_config)
# Create the input functions for TRAIN/EVAL/PREDICT.
train_input_fn = create_train_input_fn(
......@@ -618,10 +642,13 @@ def create_estimator_and_inputs(run_config,
predict_input_fn = create_predict_input_fn(
model_config=model_config, predict_input_config=eval_input_configs[0])
export_to_tpu = hparams.get('export_to_tpu', False)
# Read export_to_tpu from hparams if not passed.
if export_to_tpu is None:
export_to_tpu = hparams.get('export_to_tpu', False)
tf.logging.info('create_estimator_and_inputs: use_tpu %s, export_to_tpu %s',
use_tpu, export_to_tpu)
model_fn = model_fn_creator(detection_model_fn, configs, hparams, use_tpu)
model_fn = model_fn_creator(detection_model_fn, configs, hparams, use_tpu,
postprocess_on_cpu)
if use_tpu_estimator:
estimator = tf.contrib.tpu.TPUEstimator(
model_fn=model_fn,
......@@ -630,7 +657,8 @@ def create_estimator_and_inputs(run_config,
eval_batch_size=num_shards * 1 if use_tpu else 1,
use_tpu=use_tpu,
config=run_config,
# TODO(lzc): Remove conditional after CMLE moves to TF 1.9
export_to_tpu=export_to_tpu,
eval_on_tpu=False, # Eval runs on CPU, so disable eval on TPU
params=params if params else {})
else:
estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
......
......@@ -29,6 +29,11 @@ import tensorflow as tf
from object_detection.utils import ops
slim = tf.contrib.slim
# Activation bound used for TPU v1. Activations will be clipped to
# [-ACTIVATION_BOUND, ACTIVATION_BOUND] when training with
# use_bounded_activations enabled.
ACTIVATION_BOUND = 6.0
def get_depth_fn(depth_multiplier, min_depth):
"""Builds a callable to compute depth (output channels) of conv filters.
......@@ -418,7 +423,9 @@ def fpn_top_down_feature_maps(image_features,
depth,
use_depthwise=False,
use_explicit_padding=False,
scope=None):
use_bounded_activations=False,
scope=None,
use_native_resize_op=False):
"""Generates `top-down` feature maps for Feature Pyramid Networks.
See https://arxiv.org/abs/1612.03144 for details.
......@@ -431,7 +438,12 @@ def fpn_top_down_feature_maps(image_features,
use_depthwise: whether to use depthwise separable conv instead of regular
conv.
use_explicit_padding: whether to use explicit padding.
use_bounded_activations: Whether or not to clip activations to range
[-ACTIVATION_BOUND, ACTIVATION_BOUND]. Bounded activations better lend
themselves to quantized inference.
scope: A scope name to wrap this op under.
    use_native_resize_op: If True, uses the tf.image.resize_nearest_neighbor op
      for upsampling instead of the reshape-and-broadcast implementation.
Returns:
feature_maps: an OrderedDict mapping keys (feature map names) to
......@@ -449,21 +461,36 @@ def fpn_top_down_feature_maps(image_features,
image_features[-1][1],
depth, [1, 1], activation_fn=None, normalizer_fn=None,
scope='projection_%d' % num_levels)
if use_bounded_activations:
top_down = tf.clip_by_value(top_down, -ACTIVATION_BOUND,
ACTIVATION_BOUND)
output_feature_maps_list.append(top_down)
output_feature_map_keys.append(
'top_down_%s' % image_features[-1][0])
for level in reversed(range(num_levels - 1)):
top_down = ops.nearest_neighbor_upsampling(top_down, 2)
if use_native_resize_op:
with tf.name_scope('nearest_neighbor_upsampling'):
top_down_shape = top_down.shape.as_list()
top_down = tf.image.resize_nearest_neighbor(
top_down, [top_down_shape[1] * 2, top_down_shape[2] * 2])
else:
top_down = ops.nearest_neighbor_upsampling(top_down, scale=2)
residual = slim.conv2d(
image_features[level][1], depth, [1, 1],
activation_fn=None, normalizer_fn=None,
scope='projection_%d' % (level + 1))
if use_bounded_activations:
residual = tf.clip_by_value(residual, -ACTIVATION_BOUND,
ACTIVATION_BOUND)
if use_explicit_padding:
# slice top_down to the same shape as residual
residual_shape = tf.shape(residual)
top_down = top_down[:, :residual_shape[1], :residual_shape[2], :]
top_down += residual
if use_bounded_activations:
top_down = tf.clip_by_value(top_down, -ACTIVATION_BOUND,
ACTIVATION_BOUND)
if use_depthwise:
conv_op = functools.partial(slim.separable_conv2d, depth_multiplier=1)
else:
......
......@@ -17,6 +17,7 @@
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from google.protobuf import text_format
......@@ -124,7 +125,36 @@ class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
(key, value.shape) for key, value in out_feature_maps.items())
self.assertDictEqual(expected_feature_map_shapes, out_feature_map_shapes)
# TODO(kaftan): Remove conditional after CMLE moves to TF 1.10
def test_get_expected_feature_map_shapes_with_inception_v2_use_depthwise(
self, use_keras):
image_features = {
'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32),
'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32),
'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32)
}
layout_copy = INCEPTION_V2_LAYOUT.copy()
layout_copy['use_depthwise'] = True
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=layout_copy,
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_feature_map_shapes = {
'Mixed_3c': (4, 28, 28, 256),
'Mixed_4c': (4, 14, 14, 576),
'Mixed_5c': (4, 7, 7, 1024),
'Mixed_5c_2_Conv2d_3_3x3_s2_512': (4, 4, 4, 512),
'Mixed_5c_2_Conv2d_4_3x3_s2_256': (4, 2, 2, 256),
'Mixed_5c_2_Conv2d_5_3x3_s2_256': (4, 1, 1, 256)}
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = dict(
(key, value.shape) for key, value in out_feature_maps.items())
self.assertDictEqual(expected_feature_map_shapes, out_feature_map_shapes)
def test_get_expected_feature_map_shapes_use_explicit_padding(
self, use_keras):
......@@ -297,12 +327,87 @@ class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
else:
self.assertSetEqual(expected_slim_variables, actual_variable_set)
# TODO(kaftan): Remove conditional after CMLE moves to TF 1.10
def test_get_expected_variable_names_with_inception_v2_use_depthwise(
self,
use_keras):
image_features = {
'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32),
'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32),
'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32)
}
layout_copy = INCEPTION_V2_LAYOUT.copy()
layout_copy['use_depthwise'] = True
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=layout_copy,
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_slim_variables = set([
'Mixed_5c_1_Conv2d_3_1x1_256/weights',
'Mixed_5c_1_Conv2d_3_1x1_256/biases',
'Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise/depthwise_weights',
'Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise/biases',
'Mixed_5c_2_Conv2d_3_3x3_s2_512/weights',
'Mixed_5c_2_Conv2d_3_3x3_s2_512/biases',
'Mixed_5c_1_Conv2d_4_1x1_128/weights',
'Mixed_5c_1_Conv2d_4_1x1_128/biases',
'Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise/depthwise_weights',
'Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise/biases',
'Mixed_5c_2_Conv2d_4_3x3_s2_256/weights',
'Mixed_5c_2_Conv2d_4_3x3_s2_256/biases',
'Mixed_5c_1_Conv2d_5_1x1_128/weights',
'Mixed_5c_1_Conv2d_5_1x1_128/biases',
'Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise/depthwise_weights',
'Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise/biases',
'Mixed_5c_2_Conv2d_5_3x3_s2_256/weights',
'Mixed_5c_2_Conv2d_5_3x3_s2_256/biases',
])
expected_keras_variables = set([
'FeatureMaps/Mixed_5c_1_Conv2d_3_1x1_256_conv/kernel',
'FeatureMaps/Mixed_5c_1_Conv2d_3_1x1_256_conv/bias',
('FeatureMaps/Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise_conv/'
'depthwise_kernel'),
('FeatureMaps/Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise_conv/'
'bias'),
'FeatureMaps/Mixed_5c_2_Conv2d_3_3x3_s2_512_conv/kernel',
'FeatureMaps/Mixed_5c_2_Conv2d_3_3x3_s2_512_conv/bias',
'FeatureMaps/Mixed_5c_1_Conv2d_4_1x1_128_conv/kernel',
'FeatureMaps/Mixed_5c_1_Conv2d_4_1x1_128_conv/bias',
('FeatureMaps/Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise_conv/'
'depthwise_kernel'),
('FeatureMaps/Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise_conv/'
'bias'),
'FeatureMaps/Mixed_5c_2_Conv2d_4_3x3_s2_256_conv/kernel',
'FeatureMaps/Mixed_5c_2_Conv2d_4_3x3_s2_256_conv/bias',
'FeatureMaps/Mixed_5c_1_Conv2d_5_1x1_128_conv/kernel',
'FeatureMaps/Mixed_5c_1_Conv2d_5_1x1_128_conv/bias',
('FeatureMaps/Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise_conv/'
'depthwise_kernel'),
('FeatureMaps/Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise_conv/'
'bias'),
'FeatureMaps/Mixed_5c_2_Conv2d_5_3x3_s2_256_conv/kernel',
'FeatureMaps/Mixed_5c_2_Conv2d_5_3x3_s2_256_conv/bias',
])
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
sess.run(feature_maps)
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
if use_keras:
self.assertSetEqual(expected_keras_variables, actual_variable_set)
else:
self.assertSetEqual(expected_slim_variables, actual_variable_set)
class FPNFeatureMapGeneratorTest(tf.test.TestCase):
@parameterized.parameters({'use_native_resize_op': True},
{'use_native_resize_op': False})
class FPNFeatureMapGeneratorTest(tf.test.TestCase, parameterized.TestCase):
def test_get_expected_feature_map_shapes(self):
def test_get_expected_feature_map_shapes(self, use_native_resize_op):
image_features = [
('block2', tf.random_uniform([4, 8, 8, 256], dtype=tf.float32)),
('block3', tf.random_uniform([4, 4, 4, 256], dtype=tf.float32)),
......@@ -310,7 +415,9 @@ class FPNFeatureMapGeneratorTest(tf.test.TestCase):
('block5', tf.random_uniform([4, 1, 1, 256], dtype=tf.float32))
]
feature_maps = feature_map_generators.fpn_top_down_feature_maps(
image_features=image_features, depth=128)
image_features=image_features,
depth=128,
use_native_resize_op=use_native_resize_op)
expected_feature_map_shapes = {
'top_down_block2': (4, 8, 8, 128),
......@@ -327,7 +434,95 @@ class FPNFeatureMapGeneratorTest(tf.test.TestCase):
for key, value in out_feature_maps.items()}
self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes)
def test_get_expected_feature_map_shapes_with_depthwise(self):
def test_use_bounded_activations_add_operations(self, use_native_resize_op):
tf_graph = tf.Graph()
with tf_graph.as_default():
image_features = [('block2',
tf.random_uniform([4, 8, 8, 256], dtype=tf.float32)),
('block3',
tf.random_uniform([4, 4, 4, 256], dtype=tf.float32)),
('block4',
tf.random_uniform([4, 2, 2, 256], dtype=tf.float32)),
('block5',
tf.random_uniform([4, 1, 1, 256], dtype=tf.float32))]
feature_map_generators.fpn_top_down_feature_maps(
image_features=image_features,
depth=128,
use_bounded_activations=True,
use_native_resize_op=use_native_resize_op)
expected_added_operations = dict.fromkeys([
'top_down/clip_by_value', 'top_down/clip_by_value_1',
'top_down/clip_by_value_2', 'top_down/clip_by_value_3',
'top_down/clip_by_value_4', 'top_down/clip_by_value_5',
'top_down/clip_by_value_6'
])
op_names = {op.name: None for op in tf_graph.get_operations()}
self.assertDictContainsSubset(expected_added_operations, op_names)
def test_use_bounded_activations_clip_value(self, use_native_resize_op):
tf_graph = tf.Graph()
with tf_graph.as_default():
image_features = [
('block2', 255 * tf.ones([4, 8, 8, 256], dtype=tf.float32)),
('block3', 255 * tf.ones([4, 4, 4, 256], dtype=tf.float32)),
('block4', 255 * tf.ones([4, 2, 2, 256], dtype=tf.float32)),
('block5', 255 * tf.ones([4, 1, 1, 256], dtype=tf.float32))
]
feature_map_generators.fpn_top_down_feature_maps(
image_features=image_features,
depth=128,
use_bounded_activations=True,
use_native_resize_op=use_native_resize_op)
expected_clip_by_value_ops = [
'top_down/clip_by_value', 'top_down/clip_by_value_1',
'top_down/clip_by_value_2', 'top_down/clip_by_value_3',
'top_down/clip_by_value_4', 'top_down/clip_by_value_5',
'top_down/clip_by_value_6'
]
# Gathers activation tensors before and after clip_by_value operations.
activations = {}
for clip_by_value_op in expected_clip_by_value_ops:
clip_input_tensor = tf_graph.get_operation_by_name(
'{}/Minimum'.format(clip_by_value_op)).inputs[0]
clip_output_tensor = tf_graph.get_tensor_by_name(
'{}:0'.format(clip_by_value_op))
activations.update({
'before_{}'.format(clip_by_value_op): clip_input_tensor,
'after_{}'.format(clip_by_value_op): clip_output_tensor,
})
expected_lower_bound = -feature_map_generators.ACTIVATION_BOUND
expected_upper_bound = feature_map_generators.ACTIVATION_BOUND
init_op = tf.global_variables_initializer()
with self.test_session() as session:
session.run(init_op)
activations_output = session.run(activations)
for clip_by_value_op in expected_clip_by_value_ops:
        # Before clipping, activations are beyond the expected bound because
        # of large input image_features values.
activations_before_clipping = (
activations_output['before_{}'.format(clip_by_value_op)])
before_clipping_lower_bound = np.amin(activations_before_clipping)
before_clipping_upper_bound = np.amax(activations_before_clipping)
self.assertLessEqual(before_clipping_lower_bound,
expected_lower_bound)
self.assertGreaterEqual(before_clipping_upper_bound,
expected_upper_bound)
        # After clipping, activations are bounded as expected.
activations_after_clipping = (
activations_output['after_{}'.format(clip_by_value_op)])
after_clipping_lower_bound = np.amin(activations_after_clipping)
after_clipping_upper_bound = np.amax(activations_after_clipping)
self.assertGreaterEqual(after_clipping_lower_bound,
expected_lower_bound)
self.assertLessEqual(after_clipping_upper_bound, expected_upper_bound)
def test_get_expected_feature_map_shapes_with_depthwise(
self, use_native_resize_op):
image_features = [
('block2', tf.random_uniform([4, 8, 8, 256], dtype=tf.float32)),
('block3', tf.random_uniform([4, 4, 4, 256], dtype=tf.float32)),
......@@ -335,7 +530,10 @@ class FPNFeatureMapGeneratorTest(tf.test.TestCase):
('block5', tf.random_uniform([4, 1, 1, 256], dtype=tf.float32))
]
feature_maps = feature_map_generators.fpn_top_down_feature_maps(
image_features=image_features, depth=128, use_depthwise=True)
image_features=image_features,
depth=128,
use_depthwise=True,
use_native_resize_op=use_native_resize_op)
expected_feature_map_shapes = {
'top_down_block2': (4, 8, 8, 128),
......
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A wrapper around the Keras MobilenetV1 models for object detection."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from object_detection.core import freezable_batch_norm
def _fixed_padding(inputs, kernel_size, rate=1): # pylint: disable=invalid-name
"""Pads the input along the spatial dimensions independently of input size.
Pads the input such that if it was used in a convolution with 'VALID' padding,
the output would have the same dimensions as if the unpadded input was used
in a convolution with 'SAME' padding.
Args:
inputs: A tensor of size [batch, height_in, width_in, channels].
kernel_size: The kernel to be used in the conv2d or max_pool2d operation.
rate: An integer, rate for atrous convolution.
Returns:
output: A tensor of size [batch, height_out, width_out, channels] with the
input, either intact (if kernel_size == 1) or padded (if kernel_size > 1).
"""
  kernel_size_effective = [kernel_size[0] + (kernel_size[0] - 1) * (rate - 1),
                           kernel_size[1] + (kernel_size[1] - 1) * (rate - 1)]
pad_total = [kernel_size_effective[0] - 1, kernel_size_effective[1] - 1]
pad_beg = [pad_total[0] // 2, pad_total[1] // 2]
pad_end = [pad_total[0] - pad_beg[0], pad_total[1] - pad_beg[1]]
padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg[0], pad_end[0]],
[pad_beg[1], pad_end[1]], [0, 0]])
return padded_inputs
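# Worked example for _fixed_padding (illustrative numbers): with
# kernel_size=(3, 3) and rate=1 the effective kernel is 3x3, pad_total is
# (2, 2), and one row/column of zeros is added on each spatial side. A
# [1, 10, 10, 3] input therefore becomes [1, 12, 12, 3], and a stride-1 'VALID'
# 3x3 convolution over it produces a 10x10 output, matching what 'SAME'
# padding would give on the unpadded input.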
class _LayersOverride(object):
"""Alternative Keras layers interface for the Keras MobileNetV1."""
def __init__(self,
batchnorm_training,
default_batchnorm_momentum=0.999,
conv_hyperparams=None,
use_explicit_padding=False,
alpha=1.0,
min_depth=None):
"""Alternative tf.keras.layers interface, for use by the Keras MobileNetV1.
It is used by the Keras applications kwargs injection API to
modify the MobilenetV1 Keras application with changes required by
the Object Detection API.
These injected interfaces make the following changes to the network:
- Applies the Object Detection hyperparameter configuration
- Supports FreezableBatchNorms
- Adds support for a min number of filters for each layer
- Makes the `alpha` parameter affect the final convolution block even if it
is less than 1.0
- Adds support for explicit padding of convolutions
Args:
batchnorm_training: Bool. Assigned to Batch norm layer `training` param
when constructing `freezable_batch_norm.FreezableBatchNorm` layers.
default_batchnorm_momentum: Float. When 'conv_hyperparams' is None,
batch norm layers will be constructed using this value as the momentum.
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops. Optionally set to `None`
to use default mobilenet_v1 layer builders.
      use_explicit_padding: If True, uses 'valid' padding for convolutions,
        but explicitly pre-pads inputs so that the output dimensions are the
        same as if 'same' padding were used. Off by default.
alpha: The width multiplier referenced in the MobileNetV1 paper. It
modifies the number of filters in each convolutional layer. It's called
depth multiplier in Keras application MobilenetV1.
min_depth: Minimum number of filters in the convolutional layers.
"""
self._alpha = alpha
self._batchnorm_training = batchnorm_training
self._default_batchnorm_momentum = default_batchnorm_momentum
self._conv_hyperparams = conv_hyperparams
self._use_explicit_padding = use_explicit_padding
self._min_depth = min_depth
self.regularizer = tf.keras.regularizers.l2(0.00004 * 0.5)
self.initializer = tf.truncated_normal_initializer(stddev=0.09)
def _FixedPaddingLayer(self, kernel_size, rate=1):
return tf.keras.layers.Lambda(
lambda x: _fixed_padding(x, kernel_size, rate))
def Conv2D(self, filters, kernel_size, **kwargs):
"""Builds a Conv2D layer according to the current Object Detection config.
Overrides the Keras MobileNetV1 application's convolutions with ones that
follow the spec specified by the Object Detection hyperparameters.
Args:
filters: The number of filters to use for the convolution.
kernel_size: The kernel size to specify the height and width of the 2D
convolution window.
**kwargs: Keyword args specified by the Keras application for
constructing the convolution.
Returns:
A one-arg callable that will either directly apply a Keras Conv2D layer to
the input argument, or that will first pad the input then apply a Conv2D
layer.
"""
# Apply the width multiplier and the minimum depth to the convolution layers
filters = int(filters * self._alpha)
if self._min_depth and filters < self._min_depth:
filters = self._min_depth
if self._conv_hyperparams:
kwargs = self._conv_hyperparams.params(**kwargs)
else:
kwargs['kernel_regularizer'] = self.regularizer
kwargs['kernel_initializer'] = self.initializer
kwargs['padding'] = 'same'
if self._use_explicit_padding and kernel_size > 1:
kwargs['padding'] = 'valid'
def padded_conv(features): # pylint: disable=invalid-name
padded_features = self._FixedPaddingLayer(kernel_size)(features)
return tf.keras.layers.Conv2D(
filters, kernel_size, **kwargs)(padded_features)
return padded_conv
else:
return tf.keras.layers.Conv2D(filters, kernel_size, **kwargs)
def DepthwiseConv2D(self, kernel_size, **kwargs):
"""Builds a DepthwiseConv2D according to the Object Detection config.
    Overrides the Keras MobileNetV1 application's convolutions with ones that
    follow the spec specified by the Object Detection hyperparameters.
Args:
kernel_size: The kernel size to specify the height and width of the 2D
convolution window.
**kwargs: Keyword args specified by the Keras application for
constructing the convolution.
Returns:
A one-arg callable that will either directly apply a Keras DepthwiseConv2D
layer to the input argument, or that will first pad the input then apply
the depthwise convolution.
"""
if self._conv_hyperparams:
kwargs = self._conv_hyperparams.params(**kwargs)
else:
kwargs['depthwise_initializer'] = self.initializer
kwargs['padding'] = 'same'
if self._use_explicit_padding:
kwargs['padding'] = 'valid'
def padded_depthwise_conv(features): # pylint: disable=invalid-name
padded_features = self._FixedPaddingLayer(kernel_size)(features)
return tf.keras.layers.DepthwiseConv2D(
kernel_size, **kwargs)(padded_features)
return padded_depthwise_conv
else:
return tf.keras.layers.DepthwiseConv2D(kernel_size, **kwargs)
def BatchNormalization(self, **kwargs):
"""Builds a normalization layer.
Overrides the Keras application batch norm with the norm specified by the
Object Detection configuration.
Args:
**kwargs: Only the name is used, all other params ignored.
Required for matching `layers.BatchNormalization` calls in the Keras
application.
Returns:
A normalization layer specified by the Object Detection hyperparameter
configurations.
"""
name = kwargs.get('name')
if self._conv_hyperparams:
return self._conv_hyperparams.build_batch_norm(
training=self._batchnorm_training,
name=name)
else:
return freezable_batch_norm.FreezableBatchNorm(
training=self._batchnorm_training,
epsilon=1e-3,
momentum=self._default_batchnorm_momentum,
name=name)
def Input(self, shape):
"""Builds an Input layer.
Overrides the Keras application Input layer with one that uses a
tf.placeholder_with_default instead of a tf.placeholder. This is necessary
to ensure the application works when run on a TPU.
Args:
shape: The shape for the input layer to use. (Does not include a dimension
for the batch size).
Returns:
An input layer for the specified shape that internally uses a
placeholder_with_default.
"""
default_size = 224
default_batch_size = 1
shape = list(shape)
default_shape = [default_size if dim is None else dim for dim in shape]
input_tensor = tf.constant(0.0, shape=[default_batch_size] + default_shape)
placeholder_with_default = tf.placeholder_with_default(
input=input_tensor, shape=[None] + shape)
return tf.keras.layers.Input(tensor=placeholder_with_default)
# pylint: disable=unused-argument
def ReLU(self, *args, **kwargs):
"""Builds an activation layer.
Overrides the Keras application ReLU with the activation specified by the
Object Detection configuration.
Args:
*args: Ignored, required to match the `tf.keras.ReLU` interface
**kwargs: Only the name is used,
required to match `tf.keras.ReLU` interface
Returns:
An activation layer specified by the Object Detection hyperparameter
configurations.
"""
name = kwargs.get('name')
if self._conv_hyperparams:
return self._conv_hyperparams.build_activation_layer(name=name)
else:
return tf.keras.layers.Lambda(tf.nn.relu6, name=name)
# pylint: enable=unused-argument
# pylint: disable=unused-argument
def ZeroPadding2D(self, padding, **kwargs):
"""Replaces explicit padding in the Keras application with a no-op.
Args:
padding: The padding values for image height and width.
**kwargs: Ignored, required to match the Keras applications usage.
Returns:
A no-op identity lambda.
"""
return lambda x: x
# pylint: enable=unused-argument
# Forward all non-overridden methods to the keras layers
def __getattr__(self, item):
return getattr(tf.keras.layers, item)
# pylint: disable=invalid-name
def mobilenet_v1(batchnorm_training,
default_batchnorm_momentum=0.9997,
conv_hyperparams=None,
use_explicit_padding=False,
alpha=1.0,
min_depth=None,
**kwargs):
"""Instantiates the MobileNetV1 architecture, modified for object detection.
This wraps the MobileNetV1 tensorflow Keras application, but uses the
Keras application's kwargs-based monkey-patching API to override the Keras
architecture with the following changes:
- Changes the default batchnorm momentum to 0.9997
- Applies the Object Detection hyperparameter configuration
- Supports FreezableBatchNorms
- Adds support for a min number of filters for each layer
- Makes the `alpha` parameter affect the final convolution block even if it
is less than 1.0
- Adds support for explicit padding of convolutions
- Makes the Input layer use a tf.placeholder_with_default instead of a
tf.placeholder, to work on TPUs.
Args:
batchnorm_training: Bool. Assigned to Batch norm layer `training` param
when constructing `freezable_batch_norm.FreezableBatchNorm` layers.
default_batchnorm_momentum: Float. When 'conv_hyperparams' is None,
batch norm layers will be constructed using this value as the momentum.
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops. Optionally set to `None`
to use default mobilenet_v1 layer builders.
    use_explicit_padding: If True, uses 'valid' padding for convolutions,
      but explicitly pre-pads inputs so that the output dimensions are the
      same as if 'same' padding were used. Off by default.
alpha: The width multiplier referenced in the MobileNetV1 paper. It
modifies the number of filters in each convolutional layer.
min_depth: Minimum number of filters in the convolutional layers.
**kwargs: Keyword arguments forwarded directly to the
      `tf.keras.applications.MobileNet` method that constructs the Keras
model.
Returns:
A Keras model instance.
"""
layers_override = _LayersOverride(
batchnorm_training,
default_batchnorm_momentum=default_batchnorm_momentum,
conv_hyperparams=conv_hyperparams,
use_explicit_padding=use_explicit_padding,
min_depth=min_depth,
alpha=alpha)
return tf.keras.applications.MobileNet(
alpha=alpha, layers=layers_override, **kwargs)
# pylint: enable=invalid-name
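# Usage sketch (hypothetical; the layer name follows standard Keras MobileNet
# naming, also used in the unit tests): build the detection-flavored
# MobileNetV1 defined above and expose one intermediate feature map.
def _example_backbone():
  full_model = mobilenet_v1(
      batchnorm_training=False,
      alpha=1.0,
      min_depth=8,
      weights=None,       # do not load ImageNet weights
      include_top=False)  # drop the classification head
  conv_pw_13 = full_model.get_layer(name='conv_pw_13_relu').output
  return tf.keras.Model(inputs=full_model.inputs, outputs=conv_pw_13)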
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for mobilenet_v1.py.
This test mainly focuses on comparing slim MobilenetV1 and Keras MobilenetV1 for
object detection. To verify the consistency of the two models, we compare:
1. Output shape of each layer given different inputs
2. Number of global variables
We also visualize the model structure via Tensorboard, and compare the model
layout and the parameters of each Op to make sure the two implementations are
consistent.
"""
import itertools
import numpy as np
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.models.keras_models import mobilenet_v1
from object_detection.models.keras_models import test_utils
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
_KERAS_LAYERS_TO_CHECK = [
'conv1_relu',
'conv_dw_1_relu', 'conv_pw_1_relu',
'conv_dw_2_relu', 'conv_pw_2_relu',
'conv_dw_3_relu', 'conv_pw_3_relu',
'conv_dw_4_relu', 'conv_pw_4_relu',
'conv_dw_5_relu', 'conv_pw_5_relu',
'conv_dw_6_relu', 'conv_pw_6_relu',
'conv_dw_7_relu', 'conv_pw_7_relu',
'conv_dw_8_relu', 'conv_pw_8_relu',
'conv_dw_9_relu', 'conv_pw_9_relu',
'conv_dw_10_relu', 'conv_pw_10_relu',
'conv_dw_11_relu', 'conv_pw_11_relu',
'conv_dw_12_relu', 'conv_pw_12_relu',
'conv_dw_13_relu', 'conv_pw_13_relu',
]
_NUM_CHANNELS = 3
_BATCH_SIZE = 2
class MobilenetV1Test(test_case.TestCase):
def _build_conv_hyperparams(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
activation: RELU_6
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
batch_norm {
train: true,
scale: false,
center: true,
decay: 0.2,
epsilon: 0.1,
}
"""
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.KerasLayerHyperparams(conv_hyperparams)
def _create_application_with_layer_outputs(
self, layer_names, batchnorm_training,
conv_hyperparams=None,
use_explicit_padding=False,
alpha=1.0,
min_depth=None):
"""Constructs Keras MobilenetV1 that extracts intermediate layer outputs."""
if not layer_names:
layer_names = _KERAS_LAYERS_TO_CHECK
full_model = mobilenet_v1.mobilenet_v1(
batchnorm_training=batchnorm_training,
conv_hyperparams=conv_hyperparams,
weights=None,
use_explicit_padding=use_explicit_padding,
alpha=alpha,
min_depth=min_depth,
include_top=False)
layer_outputs = [full_model.get_layer(name=layer).output
for layer in layer_names]
return tf.keras.Model(
inputs=full_model.inputs,
outputs=layer_outputs)
def _check_returns_correct_shape(
self, image_height, image_width, depth_multiplier,
expected_feature_map_shape, use_explicit_padding=False, min_depth=8,
layer_names=None):
def graph_fn(image_tensor):
model = self._create_application_with_layer_outputs(
layer_names=layer_names,
batchnorm_training=False,
use_explicit_padding=use_explicit_padding,
min_depth=min_depth,
alpha=depth_multiplier)
return model(image_tensor)
image_tensor = np.random.rand(_BATCH_SIZE, image_height, image_width,
_NUM_CHANNELS).astype(np.float32)
feature_maps = self.execute(graph_fn, [image_tensor])
for feature_map, expected_shape in itertools.izip(
feature_maps, expected_feature_map_shape):
self.assertAllEqual(feature_map.shape, expected_shape)
def _check_returns_correct_shapes_with_dynamic_inputs(
self, image_height, image_width, depth_multiplier,
expected_feature_map_shape, use_explicit_padding=False, min_depth=8,
layer_names=None):
def graph_fn(image_height, image_width):
image_tensor = tf.random_uniform([_BATCH_SIZE, image_height, image_width,
_NUM_CHANNELS], dtype=tf.float32)
model = self._create_application_with_layer_outputs(
layer_names=layer_names,
batchnorm_training=False,
use_explicit_padding=use_explicit_padding,
alpha=depth_multiplier)
return model(image_tensor)
feature_maps = self.execute_cpu(graph_fn, [
np.array(image_height, dtype=np.int32),
np.array(image_width, dtype=np.int32)
])
for feature_map, expected_shape in itertools.izip(
feature_maps, expected_feature_map_shape):
self.assertAllEqual(feature_map.shape, expected_shape)
def _get_variables(self, depth_multiplier, layer_names=None):
g = tf.Graph()
with g.as_default():
preprocessed_inputs = tf.placeholder(
tf.float32, (4, None, None, _NUM_CHANNELS))
model = self._create_application_with_layer_outputs(
layer_names=layer_names,
batchnorm_training=False, use_explicit_padding=False,
alpha=depth_multiplier)
model(preprocessed_inputs)
return g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
def test_returns_correct_shapes_128(self):
image_height = 128
image_width = 128
depth_multiplier = 1.0
expected_feature_map_shape = (
test_utils.moblenet_v1_expected_feature_map_shape_128)
self._check_returns_correct_shape(
image_height, image_width, depth_multiplier, expected_feature_map_shape)
def test_returns_correct_shapes_128_explicit_padding(
self):
image_height = 128
image_width = 128
depth_multiplier = 1.0
expected_feature_map_shape = (
test_utils.moblenet_v1_expected_feature_map_shape_128_explicit_padding)
self._check_returns_correct_shape(
image_height, image_width, depth_multiplier, expected_feature_map_shape,
use_explicit_padding=True)
def test_returns_correct_shapes_with_dynamic_inputs(
self):
image_height = 128
image_width = 128
depth_multiplier = 1.0
expected_feature_map_shape = (
test_utils.mobilenet_v1_expected_feature_map_shape_with_dynamic_inputs)
self._check_returns_correct_shapes_with_dynamic_inputs(
image_height, image_width, depth_multiplier, expected_feature_map_shape)
def test_returns_correct_shapes_299(self):
image_height = 299
image_width = 299
depth_multiplier = 1.0
expected_feature_map_shape = (
test_utils.moblenet_v1_expected_feature_map_shape_299)
self._check_returns_correct_shape(
image_height, image_width, depth_multiplier, expected_feature_map_shape)
def test_returns_correct_shapes_enforcing_min_depth(
self):
image_height = 299
image_width = 299
depth_multiplier = 0.5**12
expected_feature_map_shape = (
test_utils.moblenet_v1_expected_feature_map_shape_enforcing_min_depth)
self._check_returns_correct_shape(
image_height, image_width, depth_multiplier, expected_feature_map_shape)
def test_hyperparam_override(self):
hyperparams = self._build_conv_hyperparams()
model = mobilenet_v1.mobilenet_v1(
batchnorm_training=True,
conv_hyperparams=hyperparams,
weights=None,
use_explicit_padding=False,
alpha=1.0,
min_depth=32,
include_top=False)
hyperparams.params()
bn_layer = model.get_layer(name='conv_pw_5_bn')
self.assertAllClose(bn_layer.momentum, 0.2)
self.assertAllClose(bn_layer.epsilon, 0.1)
def test_variable_count(self):
depth_multiplier = 1
variables = self._get_variables(depth_multiplier)
# 135 is the number of variables from slim MobilenetV1 model.
self.assertEqual(len(variables), 135)
if __name__ == '__main__':
tf.test.main()
......@@ -21,7 +21,8 @@ import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.models.keras_applications import mobilenet_v2
from object_detection.models.keras_models import mobilenet_v2
from object_detection.models.keras_models import test_utils
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
......@@ -151,56 +152,8 @@ class MobilenetV2Test(test_case.TestCase):
image_height = 128
image_width = 128
depth_multiplier = 1.0
expected_feature_map_shape = [(2, 64, 64, 32),
(2, 64, 64, 96),
(2, 32, 32, 96),
(2, 32, 32, 24),
(2, 32, 32, 144),
(2, 32, 32, 144),
(2, 32, 32, 24),
(2, 32, 32, 144),
(2, 16, 16, 144),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 16, 16, 192),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 16, 16, 192),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 8, 8, 192),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 8, 8, 576),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 8, 8, 576),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 4, 4, 576),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 320),
(2, 4, 4, 1280)]
expected_feature_map_shape = (
test_utils.moblenet_v2_expected_feature_map_shape_128)
self._check_returns_correct_shape(
2, image_height, image_width, depth_multiplier,
......@@ -211,56 +164,8 @@ class MobilenetV2Test(test_case.TestCase):
image_height = 128
image_width = 128
depth_multiplier = 1.0
expected_feature_map_shape = [(2, 64, 64, 32),
(2, 64, 64, 96),
(2, 32, 32, 96),
(2, 32, 32, 24),
(2, 32, 32, 144),
(2, 32, 32, 144),
(2, 32, 32, 24),
(2, 32, 32, 144),
(2, 16, 16, 144),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 16, 16, 192),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 16, 16, 192),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 8, 8, 192),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 8, 8, 576),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 8, 8, 576),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 4, 4, 576),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 320),
(2, 4, 4, 1280)]
expected_feature_map_shape = (
test_utils.moblenet_v2_expected_feature_map_shape_128_explicit_padding)
self._check_returns_correct_shape(
2, image_height, image_width, depth_multiplier,
expected_feature_map_shape, use_explicit_padding=True)
......@@ -270,56 +175,8 @@ class MobilenetV2Test(test_case.TestCase):
image_height = 128
image_width = 128
depth_multiplier = 1.0
expected_feature_map_shape = [(2, 64, 64, 32),
(2, 64, 64, 96),
(2, 32, 32, 96),
(2, 32, 32, 24),
(2, 32, 32, 144),
(2, 32, 32, 144),
(2, 32, 32, 24),
(2, 32, 32, 144),
(2, 16, 16, 144),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 16, 16, 192),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 16, 16, 192),
(2, 16, 16, 32),
(2, 16, 16, 192),
(2, 8, 8, 192),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 64),
(2, 8, 8, 384),
(2, 8, 8, 384),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 8, 8, 576),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 8, 8, 576),
(2, 8, 8, 96),
(2, 8, 8, 576),
(2, 4, 4, 576),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 160),
(2, 4, 4, 960),
(2, 4, 4, 960),
(2, 4, 4, 320),
(2, 4, 4, 1280)]
expected_feature_map_shape = (
test_utils.mobilenet_v2_expected_feature_map_shape_with_dynamic_inputs)
self._check_returns_correct_shapes_with_dynamic_inputs(
2, image_height, image_width, depth_multiplier,
expected_feature_map_shape)
......@@ -328,57 +185,8 @@ class MobilenetV2Test(test_case.TestCase):
image_height = 299
image_width = 299
depth_multiplier = 1.0
expected_feature_map_shape = [(2, 150, 150, 32),
(2, 150, 150, 96),
(2, 75, 75, 96),
(2, 75, 75, 24),
(2, 75, 75, 144),
(2, 75, 75, 144),
(2, 75, 75, 24),
(2, 75, 75, 144),
(2, 38, 38, 144),
(2, 38, 38, 32),
(2, 38, 38, 192),
(2, 38, 38, 192),
(2, 38, 38, 32),
(2, 38, 38, 192),
(2, 38, 38, 192),
(2, 38, 38, 32),
(2, 38, 38, 192),
(2, 19, 19, 192),
(2, 19, 19, 64),
(2, 19, 19, 384),
(2, 19, 19, 384),
(2, 19, 19, 64),
(2, 19, 19, 384),
(2, 19, 19, 384),
(2, 19, 19, 64),
(2, 19, 19, 384),
(2, 19, 19, 384),
(2, 19, 19, 64),
(2, 19, 19, 384),
(2, 19, 19, 384),
(2, 19, 19, 96),
(2, 19, 19, 576),
(2, 19, 19, 576),
(2, 19, 19, 96),
(2, 19, 19, 576),
(2, 19, 19, 576),
(2, 19, 19, 96),
(2, 19, 19, 576),
(2, 10, 10, 576),
(2, 10, 10, 160),
(2, 10, 10, 960),
(2, 10, 10, 960),
(2, 10, 10, 160),
(2, 10, 10, 960),
(2, 10, 10, 960),
(2, 10, 10, 160),
(2, 10, 10, 960),
(2, 10, 10, 960),
(2, 10, 10, 320),
(2, 10, 10, 1280)]
expected_feature_map_shape = (
test_utils.moblenet_v2_expected_feature_map_shape_299)
self._check_returns_correct_shape(
2, image_height, image_width, depth_multiplier,
expected_feature_map_shape)
......@@ -388,56 +196,8 @@ class MobilenetV2Test(test_case.TestCase):
image_height = 299
image_width = 299
depth_multiplier = 0.5**12
expected_feature_map_shape = [(2, 150, 150, 32),
(2, 150, 150, 192),
(2, 75, 75, 192),
(2, 75, 75, 32),
(2, 75, 75, 192),
(2, 75, 75, 192),
(2, 75, 75, 32),
(2, 75, 75, 192),
(2, 38, 38, 192),
(2, 38, 38, 32),
(2, 38, 38, 192),
(2, 38, 38, 192),
(2, 38, 38, 32),
(2, 38, 38, 192),
(2, 38, 38, 192),
(2, 38, 38, 32),
(2, 38, 38, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 19, 19, 192),
(2, 19, 19, 32),
(2, 19, 19, 192),
(2, 10, 10, 192),
(2, 10, 10, 32),
(2, 10, 10, 192),
(2, 10, 10, 192),
(2, 10, 10, 32),
(2, 10, 10, 192),
(2, 10, 10, 192),
(2, 10, 10, 32),
(2, 10, 10, 192),
(2, 10, 10, 192),
(2, 10, 10, 32),
(2, 10, 10, 32)]
expected_feature_map_shape = (
test_utils.moblenet_v2_expected_feature_map_shape_enforcing_min_depth)
self._check_returns_correct_shape(
2, image_height, image_width, depth_multiplier,
expected_feature_map_shape, min_depth=32)
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test utils for other test files."""
# import tensorflow as tf
#
# from nets import mobilenet_v1
#
# slim = tf.contrib.slim
#
# # Slim endpoint names used to map to the corresponding Keras layer names in
# # MobilenetV1
# _MOBLIENET_V1_SLIM_ENDPOINTS = [
# 'Conv2d_0',
# 'Conv2d_1_depthwise', 'Conv2d_1_pointwise',
# 'Conv2d_2_depthwise', 'Conv2d_2_pointwise',
# 'Conv2d_3_depthwise', 'Conv2d_3_pointwise',
# 'Conv2d_4_depthwise', 'Conv2d_4_pointwise',
# 'Conv2d_5_depthwise', 'Conv2d_5_pointwise',
# 'Conv2d_6_depthwise', 'Conv2d_6_pointwise',
# 'Conv2d_7_depthwise', 'Conv2d_7_pointwise',
# 'Conv2d_8_depthwise', 'Conv2d_8_pointwise',
# 'Conv2d_9_depthwise', 'Conv2d_9_pointwise',
# 'Conv2d_10_depthwise', 'Conv2d_10_pointwise',
# 'Conv2d_11_depthwise', 'Conv2d_11_pointwise',
# 'Conv2d_12_depthwise', 'Conv2d_12_pointwise',
# 'Conv2d_13_depthwise', 'Conv2d_13_pointwise'
# ]
#
#
# # Helper that returns the output shape of each Slim endpoint. It was used to
# # generate the expected_feature_map_shape constants for MobilenetV1 below.
# # The same approach also works for MobilenetV2; a hypothetical sketch of a
# # V2 variant follows this block.
# def _get_slim_endpoint_shapes(inputs, depth_multiplier=1.0, min_depth=8,
# use_explicit_padding=False):
# with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
# normalizer_fn=slim.batch_norm):
# _, end_points = mobilenet_v1.mobilenet_v1_base(
# inputs, final_endpoint='Conv2d_13_pointwise',
# depth_multiplier=depth_multiplier, min_depth=min_depth,
# use_explicit_padding=use_explicit_padding)
# return [end_points[endpoint_name].get_shape()
# for endpoint_name in _MOBLIENET_V1_SLIM_ENDPOINTS]
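#
#
# # Hypothetical sketch (not part of the helper above): assuming slim's
# # nets.mobilenet.mobilenet_v2 module with mobilenet_base(), training_scope()
# # and 'layer_%d' endpoint names, a MobilenetV2 variant of the shape dump
# # could look roughly like this. Note that the V2 constants below also track
# # finer-grained per-op outputs (expansion/projection), so this only
# # illustrates the idea with the top-level endpoints.
# def _get_slim_v2_endpoint_shapes(inputs, depth_multiplier=1.0, min_depth=8):
#   from nets.mobilenet import mobilenet_v2
#   with slim.arg_scope(mobilenet_v2.training_scope(is_training=False)):
#     _, end_points = mobilenet_v2.mobilenet_base(
#         inputs, final_endpoint='layer_19',
#         depth_multiplier=depth_multiplier, min_depth=min_depth)
#   # 'layer_1' ... 'layer_19' cover the stem conv, the 17 inverted residual
#   # blocks and the final 1x1 conv in slim's MobilenetV2.
#   return [end_points['layer_%d' % i].get_shape() for i in range(1, 20)]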
# For Mobilenet V1
moblenet_v1_expected_feature_map_shape_128 = [
(2, 64, 64, 32), (2, 64, 64, 32), (2, 64, 64, 64), (2, 32, 32, 64),
(2, 32, 32, 128), (2, 32, 32, 128), (2, 32, 32, 128), (2, 16, 16, 128),
(2, 16, 16, 256), (2, 16, 16, 256), (2, 16, 16, 256), (2, 8, 8, 256),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 4, 4, 512),
(2, 4, 4, 1024), (2, 4, 4, 1024), (2, 4, 4, 1024),
]
moblenet_v1_expected_feature_map_shape_128_explicit_padding = [
(2, 64, 64, 32), (2, 64, 64, 32), (2, 64, 64, 64), (2, 32, 32, 64),
(2, 32, 32, 128), (2, 32, 32, 128), (2, 32, 32, 128), (2, 16, 16, 128),
(2, 16, 16, 256), (2, 16, 16, 256), (2, 16, 16, 256), (2, 8, 8, 256),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 4, 4, 512),
(2, 4, 4, 1024), (2, 4, 4, 1024), (2, 4, 4, 1024),
]
mobilenet_v1_expected_feature_map_shape_with_dynamic_inputs = [
(2, 64, 64, 32), (2, 64, 64, 32), (2, 64, 64, 64), (2, 32, 32, 64),
(2, 32, 32, 128), (2, 32, 32, 128), (2, 32, 32, 128), (2, 16, 16, 128),
(2, 16, 16, 256), (2, 16, 16, 256), (2, 16, 16, 256), (2, 8, 8, 256),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512),
(2, 8, 8, 512), (2, 8, 8, 512), (2, 8, 8, 512), (2, 4, 4, 512),
(2, 4, 4, 1024), (2, 4, 4, 1024), (2, 4, 4, 1024),
]
moblenet_v1_expected_feature_map_shape_299 = [
(2, 150, 150, 32), (2, 150, 150, 32), (2, 150, 150, 64), (2, 75, 75, 64),
(2, 75, 75, 128), (2, 75, 75, 128), (2, 75, 75, 128), (2, 38, 38, 128),
(2, 38, 38, 256), (2, 38, 38, 256), (2, 38, 38, 256), (2, 19, 19, 256),
(2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512),
(2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512),
(2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512), (2, 10, 10, 512),
(2, 10, 10, 1024), (2, 10, 10, 1024), (2, 10, 10, 1024),
]
moblenet_v1_expected_feature_map_shape_enforcing_min_depth = [
(2, 150, 150, 8), (2, 150, 150, 8), (2, 150, 150, 8), (2, 75, 75, 8),
(2, 75, 75, 8), (2, 75, 75, 8), (2, 75, 75, 8), (2, 38, 38, 8),
(2, 38, 38, 8), (2, 38, 38, 8), (2, 38, 38, 8), (2, 19, 19, 8),
(2, 19, 19, 8), (2, 19, 19, 8), (2, 19, 19, 8), (2, 19, 19, 8),
(2, 19, 19, 8), (2, 19, 19, 8), (2, 19, 19, 8), (2, 19, 19, 8),
(2, 19, 19, 8), (2, 19, 19, 8), (2, 19, 19, 8), (2, 10, 10, 8),
(2, 10, 10, 8), (2, 10, 10, 8), (2, 10, 10, 8),
]
# For Mobilenet V2
moblenet_v2_expected_feature_map_shape_128 = [
(2, 64, 64, 32), (2, 64, 64, 96), (2, 32, 32, 96), (2, 32, 32, 24),
(2, 32, 32, 144), (2, 32, 32, 144), (2, 32, 32, 24), (2, 32, 32, 144),
(2, 16, 16, 144), (2, 16, 16, 32), (2, 16, 16, 192), (2, 16, 16, 192),
(2, 16, 16, 32), (2, 16, 16, 192), (2, 16, 16, 192), (2, 16, 16, 32),
(2, 16, 16, 192), (2, 8, 8, 192), (2, 8, 8, 64), (2, 8, 8, 384),
(2, 8, 8, 384), (2, 8, 8, 64), (2, 8, 8, 384), (2, 8, 8, 384),
(2, 8, 8, 64), (2, 8, 8, 384), (2, 8, 8, 384), (2, 8, 8, 64),
(2, 8, 8, 384), (2, 8, 8, 384), (2, 8, 8, 96), (2, 8, 8, 576),
(2, 8, 8, 576), (2, 8, 8, 96), (2, 8, 8, 576), (2, 8, 8, 576),
(2, 8, 8, 96), (2, 8, 8, 576), (2, 4, 4, 576), (2, 4, 4, 160),
(2, 4, 4, 960), (2, 4, 4, 960), (2, 4, 4, 160), (2, 4, 4, 960),
(2, 4, 4, 960), (2, 4, 4, 160), (2, 4, 4, 960), (2, 4, 4, 960),
(2, 4, 4, 320), (2, 4, 4, 1280)
]
moblenet_v2_expected_feature_map_shape_128_explicit_padding = [
(2, 64, 64, 32), (2, 64, 64, 96), (2, 32, 32, 96), (2, 32, 32, 24),
(2, 32, 32, 144), (2, 32, 32, 144), (2, 32, 32, 24), (2, 32, 32, 144),
(2, 16, 16, 144), (2, 16, 16, 32), (2, 16, 16, 192), (2, 16, 16, 192),
(2, 16, 16, 32), (2, 16, 16, 192), (2, 16, 16, 192), (2, 16, 16, 32),
(2, 16, 16, 192), (2, 8, 8, 192), (2, 8, 8, 64), (2, 8, 8, 384),
(2, 8, 8, 384), (2, 8, 8, 64), (2, 8, 8, 384), (2, 8, 8, 384),
(2, 8, 8, 64), (2, 8, 8, 384), (2, 8, 8, 384), (2, 8, 8, 64),
(2, 8, 8, 384), (2, 8, 8, 384), (2, 8, 8, 96), (2, 8, 8, 576),
(2, 8, 8, 576), (2, 8, 8, 96), (2, 8, 8, 576), (2, 8, 8, 576),
(2, 8, 8, 96), (2, 8, 8, 576), (2, 4, 4, 576), (2, 4, 4, 160),
(2, 4, 4, 960), (2, 4, 4, 960), (2, 4, 4, 160), (2, 4, 4, 960),
(2, 4, 4, 960), (2, 4, 4, 160), (2, 4, 4, 960), (2, 4, 4, 960),
(2, 4, 4, 320), (2, 4, 4, 1280)
]
mobilenet_v2_expected_feature_map_shape_with_dynamic_inputs = [
(2, 64, 64, 32), (2, 64, 64, 96), (2, 32, 32, 96), (2, 32, 32, 24),
(2, 32, 32, 144), (2, 32, 32, 144), (2, 32, 32, 24), (2, 32, 32, 144),
(2, 16, 16, 144), (2, 16, 16, 32), (2, 16, 16, 192), (2, 16, 16, 192),
(2, 16, 16, 32), (2, 16, 16, 192), (2, 16, 16, 192), (2, 16, 16, 32),
(2, 16, 16, 192), (2, 8, 8, 192), (2, 8, 8, 64), (2, 8, 8, 384),
(2, 8, 8, 384), (2, 8, 8, 64), (2, 8, 8, 384), (2, 8, 8, 384),
(2, 8, 8, 64), (2, 8, 8, 384), (2, 8, 8, 384), (2, 8, 8, 64),
(2, 8, 8, 384), (2, 8, 8, 384), (2, 8, 8, 96), (2, 8, 8, 576),
(2, 8, 8, 576), (2, 8, 8, 96), (2, 8, 8, 576), (2, 8, 8, 576),
(2, 8, 8, 96), (2, 8, 8, 576), (2, 4, 4, 576), (2, 4, 4, 160),
(2, 4, 4, 960), (2, 4, 4, 960), (2, 4, 4, 160), (2, 4, 4, 960),
(2, 4, 4, 960), (2, 4, 4, 160), (2, 4, 4, 960), (2, 4, 4, 960),
(2, 4, 4, 320), (2, 4, 4, 1280)
]
moblenet_v2_expected_feature_map_shape_299 = [
(2, 150, 150, 32), (2, 150, 150, 96), (2, 75, 75, 96), (2, 75, 75, 24),
(2, 75, 75, 144), (2, 75, 75, 144), (2, 75, 75, 24), (2, 75, 75, 144),
(2, 38, 38, 144), (2, 38, 38, 32), (2, 38, 38, 192), (2, 38, 38, 192),
(2, 38, 38, 32), (2, 38, 38, 192), (2, 38, 38, 192), (2, 38, 38, 32),
(2, 38, 38, 192), (2, 19, 19, 192), (2, 19, 19, 64), (2, 19, 19, 384),
(2, 19, 19, 384), (2, 19, 19, 64), (2, 19, 19, 384), (2, 19, 19, 384),
(2, 19, 19, 64), (2, 19, 19, 384), (2, 19, 19, 384), (2, 19, 19, 64),
(2, 19, 19, 384), (2, 19, 19, 384), (2, 19, 19, 96), (2, 19, 19, 576),
(2, 19, 19, 576), (2, 19, 19, 96), (2, 19, 19, 576), (2, 19, 19, 576),
(2, 19, 19, 96), (2, 19, 19, 576), (2, 10, 10, 576), (2, 10, 10, 160),
(2, 10, 10, 960), (2, 10, 10, 960), (2, 10, 10, 160), (2, 10, 10, 960),
(2, 10, 10, 960), (2, 10, 10, 160), (2, 10, 10, 960), (2, 10, 10, 960),
(2, 10, 10, 320), (2, 10, 10, 1280)
]
moblenet_v2_expected_feature_map_shape_enforcing_min_depth = [
(2, 150, 150, 32), (2, 150, 150, 192), (2, 75, 75, 192), (2, 75, 75, 32),
(2, 75, 75, 192), (2, 75, 75, 192), (2, 75, 75, 32), (2, 75, 75, 192),
(2, 38, 38, 192), (2, 38, 38, 32), (2, 38, 38, 192), (2, 38, 38, 192),
(2, 38, 38, 32), (2, 38, 38, 192), (2, 38, 38, 192), (2, 38, 38, 32),
(2, 38, 38, 192), (2, 19, 19, 192), (2, 19, 19, 32), (2, 19, 19, 192),
(2, 19, 19, 192), (2, 19, 19, 32), (2, 19, 19, 192), (2, 19, 19, 192),
(2, 19, 19, 32), (2, 19, 19, 192), (2, 19, 19, 192), (2, 19, 19, 32),
(2, 19, 19, 192), (2, 19, 19, 192), (2, 19, 19, 32), (2, 19, 19, 192),
(2, 19, 19, 192), (2, 19, 19, 32), (2, 19, 19, 192), (2, 19, 19, 192),
(2, 19, 19, 32), (2, 19, 19, 192), (2, 10, 10, 192), (2, 10, 10, 32),
(2, 10, 10, 192), (2, 10, 10, 192), (2, 10, 10, 32), (2, 10, 10, 192),
(2, 10, 10, 192), (2, 10, 10, 32), (2, 10, 10, 192), (2, 10, 10, 192),
(2, 10, 10, 32), (2, 10, 10, 32)
]
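#
# # Usage pattern (mirroring the keras_models feature extractor tests above):
# # a test selects the constant matching its input size and padding mode and
# # passes it to its shape-checking helper, e.g.
# #   expected_feature_map_shape = (
# #       test_utils.moblenet_v2_expected_feature_map_shape_128_explicit_padding)
# #   self._check_returns_correct_shape(
# #       2, image_height, image_width, depth_multiplier,
# #       expected_feature_map_shape, use_explicit_padding=True)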