Unverified commit 99256cf4, authored by pkulzc, committed by GitHub


Release iNaturalist Species-trained models, refactor of evaluation, box predictor for object detection. (#5289)

* Merged commit includes the following changes:
212389173  by Zhichao Lu:

    1. Replace tf.boolean_mask with tf.where
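    The log does not say why, but a likely motivation is shape staticness: `tf.boolean_mask` produces a dynamically shaped tensor, while a `tf.where` select preserves the input's static shape. A minimal sketch of the pattern (my reading, not code from this commit):

```python
import tensorflow as tf

scores = tf.constant([0.9, 0.2, 0.7])
keep = tf.constant([True, False, True])

# tf.boolean_mask drops elements, so the result has a dynamic shape.
dynamic = tf.boolean_mask(scores, keep)                  # shape: [?]

# A tf.where select keeps the original static shape by writing zeros
# into masked positions, which is friendlier to TPU compilation.
static = tf.where(keep, scores, tf.zeros_like(scores))   # shape: [3]
```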

--
212282646  by Zhichao Lu:

    1. Fix a typo in model_builder.py and add a test to cover it.

--
212142989  by Zhichao Lu:

    Only resize masks in the meta architecture if they have not already been resized in the input pipeline.

--
212136935  by Zhichao Lu:

    Choose matmul or native crop_and_resize in the model builder instead of in the Faster R-CNN meta architecture.

--
211907984  by Zhichao Lu:

    Make the eval input reader a repeated field and update the config util to handle it.

--
211858098  by Zhichao Lu:

    Change the implementation of merge_boxes_with_multiple_labels.

--
211843915  by Zhichao Lu:

    Add Mobilenet v2 + FPN support.

--
211655076  by Zhichao Lu:

    Bug fix for generic keys in config overrides

    In generic configuration overrides, we had a duplicate entry for train_input_config and we were missing the eval_input_config and eval_config.

    This change also introduces testing for all config overrides.

--
211157501  by Zhichao Lu:

    Make the locally-modified conv defs a copy, so that they don't modify
    MobileNet conv defs globally for other code that transitively imports
    this package.

--
211112813  by Zhichao Lu:

    Refactoring visualization tools for Estimator's eval_metric_ops. This will make it easier for future models to take advantage of a single interface and mechanics.

--
211109571  by Zhichao Lu:

    A test decorator.

--
210747685  by Zhichao Lu:

    For FPN, when use_depthwise is set to true, use a slightly modified MobileNet v1 config.

--
210723882  by Zhichao Lu:

    Integrating the losses mask into the meta architectures. When providing groundtruth, one can optionally specify annotation information (i.e. which images are labeled vs. unlabeled). For any image that is unlabeled, there is no loss accumulation.

--
210673675  by Zhichao Lu:

    Internal change.

--
210546590  by Zhichao Lu:

    Internal change.

--
210529752  by Zhichao Lu:

    Support batched inputs with ops.matmul_crop_and_resize.

    With this change the new inputs are images of shape [batch, height, width, depth] and boxes of shape [batch, num_boxes, 4]. The output tensor has shape [batch, num_boxes, crop_height, crop_width, depth].
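    A shape-checking sketch of the batched call (the `crop_size` keyword is my assumption about the signature):

```python
import tensorflow as tf
from object_detection.utils import ops

images = tf.zeros([2, 40, 40, 1024])   # [batch, height, width, depth]
boxes = tf.zeros([2, 300, 4])          # [batch, num_boxes, 4], normalized

# Output shape: [batch, num_boxes, crop_height, crop_width, depth],
# i.e. [2, 300, 14, 14, 1024] here.
crops = ops.matmul_crop_and_resize(images, boxes, crop_size=[14, 14])
```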

--
210485912  by Zhichao Lu:

    Fix TensorFlow version check in object_detection_tutorial.ipynb

--
210484076  by Zhichao Lu:

    Reduce TPU memory required for single image matmul_crop_and_resize.

    Using tf.einsum eliminates intermediate tensors, tiling, and expansion. For an image of size [40, 40, 1024] and boxes of shape [300, 4], HBM usage goes down from 3.52G to 1.67G.
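    The gist, as I read it, is that bilinear crop-and-resize is separable, so it can be written as two small einsum contractions instead of materializing tiled intermediates. A rough single-image sketch (kernel construction omitted; this is not the library's actual code):

```python
import tensorflow as tf

def separable_bilinear_resample(image, kernel_y, kernel_x):
  """Resamples one image with precomputed 1-D interpolation kernels.

  image:    [H, W, C] feature map.
  kernel_y: [crop_h, H] bilinear weights along the height axis.
  kernel_x: [crop_w, W] bilinear weights along the width axis.
  """
  # Interpolate rows, then columns. einsum contracts directly over H and
  # W without the tiling/expansion an explicit matmul formulation needs.
  rows = tf.einsum('iH,HWc->iWc', kernel_y, image)   # [crop_h, W, C]
  return tf.einsum('jW,iWc->ijc', kernel_x, rows)    # [crop_h, crop_w, C]
```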

--
210468361  by Zhichao Lu:

    Remove PositiveAnchorLossCDF/NegativeAnchorLossCDF to resolve the "Main thread is not in main loop" error in local training.

--
210100253  by Zhichao Lu:

    Pooling pyramid feature maps: add option to replace max pool with convolution layers.
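    A sketch of the two downsampling choices, assuming slim layers as used elsewhere in this codebase (not the actual builder code):

```python
import tensorflow as tf
slim = tf.contrib.slim

def downsample(feature_map, use_conv=False, depth=256):
  # Default: parameter-free max pooling with stride 2.
  # Option: a stride-2 convolution that learns its downsampling weights.
  if use_conv:
    return slim.conv2d(feature_map, depth, [3, 3], stride=2, padding='SAME')
  return slim.max_pool2d(feature_map, [2, 2], stride=2, padding='SAME')
```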

--
209995842  by Zhichao Lu:

    Fix a bug which prevents variable sharing in Faster RCNN.

--
209965526  by Zhichao Lu:

    Add support for enabling export_to_tpu through the estimator.

--
209946440  by Zhichao Lu:

    Replace deprecated tf.train.Supervisor with tf.train.MonitoredSession. MonitoredSession also takes away the hassle of starting queue runners.
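    A minimal sketch of the replacement pattern (the stand-in train op is mine, not this repo's training loop):

```python
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
train_op = tf.assign_add(global_step, 1)   # stand-in for a real train op

# MonitoredTrainingSession initializes variables, restores checkpoints and
# starts queue runners automatically, all of which tf.train.Supervisor
# required manual wiring for.
hooks = [tf.train.StopAtStepHook(last_step=10)]
with tf.train.MonitoredTrainingSession(hooks=hooks) as sess:
  while not sess.should_stop():
    sess.run(train_op)
```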

--
209888003  by Zhichao Lu:

    Implement function to handle data where source_id is not set.

    If the field source_id is found to be the empty string for any image during runtime, it will be replaced with a random string. This avoids hash collisions on datasets where many examples do not have source_id set. Those hash collisions have unintended side effects and may lead to bugs in the detection pipeline.
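    A sketch of the fallback (the helper name is mine; only the described behavior is taken from the log):

```python
import tensorflow as tf

def replace_empty_source_id(source_id):
  """Hypothetical helper mirroring the described behavior."""
  def make_random_id():
    # A random integer rendered as a string; avoids hash collisions when
    # many examples share the empty-string source_id.
    return tf.as_string(
        tf.random_uniform([], maxval=2 ** 31 - 1, dtype=tf.int64))
  return tf.cond(tf.equal(source_id, ''), make_random_id, lambda: source_id)
```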

--
209842134  by Zhichao Lu:

    Converting the loss mask into a multiplier, rather than using it as a boolean mask (which changes the tensor shape). This is necessary because other utilities (e.g. the hard example miner) require a loss matrix with the same dimensions as the original prediction tensor.
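    A minimal sketch of the multiplier pattern described above:

```python
import tensorflow as tf

per_anchor_loss = tf.ones([2, 100])        # [batch_size, num_anchors]
is_annotated = tf.constant([True, False])  # one flag per image

# Multiplying by a {0, 1} mask zeroes the loss for unlabeled images while
# keeping the [batch_size, num_anchors] shape that downstream utilities
# (e.g. the hard example miner) expect; tf.boolean_mask would not.
multiplier = tf.expand_dims(tf.cast(is_annotated, tf.float32), axis=1)
masked_loss = per_anchor_loss * multiplier
```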

--
209768066  by Zhichao Lu:

    Adding ability to remove loss computation from specific images in a batch, via an optional boolean mask.

--
209722556  by Zhichao Lu:

    Remove dead code.

    (_USE_C_API was flipped to True by default in TensorFlow 1.8)

--
209701861  by Zhichao Lu:

    This CL cleans up some tf.Example creation snippets by reusing the convenient tf.train.Feature building functions in dataset_util.

--
209697893  by Zhichao Lu:

    Do not overwrite num_epoch for the eval input, as doing so leads to errors in some cases.

--
209694652  by Zhichao Lu:

    Sample boxes by jittering around the currently given boxes.
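    A sketch of box jittering; the exact sampling scheme is not spelled out in the log, and the `ratio` knob is an assumption:

```python
import tensorflow as tf

def jitter_boxes(boxes, ratio=0.05):
  """Perturbs [N, 4] boxes by uniform noise scaled to each box's size."""
  ymin, xmin, ymax, xmax = tf.unstack(boxes, axis=1)
  height, width = ymax - ymin, xmax - xmin
  scale = tf.stack([height, width, height, width], axis=1)   # [N, 4]
  noise = tf.random_uniform(tf.shape(boxes), -ratio, ratio)
  return boxes + noise * scale
```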

--
209550300  by Zhichao Lu:

    `create_category_index_from_labelmap()` function now accepts a `use_display_name` parameter.
    Also added a `create_categories_from_labelmap` function for convenience.

--
209490273  by Zhichao Lu:

    Check result_dict type before accessing image_id via key.

--
209442529  by Zhichao Lu:

    Introducing the capability to sample examples for evaluation. This makes it easy to specify one full epoch of evaluation, or a subset (e.g. sample 1 of every N examples).
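    With tf.data, the 1-of-every-N behavior could look like the sketch below (the record path is a placeholder; the actual plumbing lives in the input reader config):

```python
import tensorflow as tf

sample_1_of_n = 4
dataset = tf.data.TFRecordDataset('eval.record')  # placeholder path
# Keep one of every N examples for a cheaper, approximate evaluation pass.
dataset = dataset.shard(num_shards=sample_1_of_n, index=0)
```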

--
208941150  by Zhichao Lu:

    Adding the capability to export the results in JSON format.

--
208888798  by Zhichao Lu:

    Fixes wrong dictionary key for num_det_boxes_per_image.

--
208873549  by Zhichao Lu:

    Reduce the number of HLO ops created by matmul_crop_and_resize.

    Do not unroll along the channels dimension. Instead, transpose the input image dimensions, apply tf.matmul and transpose back.

    The number of HLO instructions for 1024 channels drops from 12368 to 110.

--
208844315  by Zhichao Lu:

    Add an option to use tf.image.non_max_suppression_padded in SSD post-processing

--
208731380  by Zhichao Lu:

    Add field in box_predictor config to enable mask prediction and update builders accordingly.

--
208699405  by Zhichao Lu:

    This CL creates a keras-based multi-resolution feature map extractor.

--
208557208  by Zhichao Lu:

    Add TPU tests for Faster R-CNN Meta arch.

    * Verifies that the two_stage_predict and total_loss tests run successfully on TPU.
    * Small mods to multiclass_non_max_suppression to preserve static shapes.

--
208499278  by Zhichao Lu:

    This CL makes sure the Keras convolutional box predictor & head layers apply activation layers *after* normalization (as opposed to before).

--
208391694  by Zhichao Lu:

    Updating visualization tool to produce multiple evaluation images.

--
208275961  by Zhichao Lu:

    This CL adds a Keras version of the Convolutional Box Predictor, as well as more general infrastructure for making Keras Prediction heads & Keras box predictors.

--
208275585  by Zhichao Lu:

    This CL enables the Keras layer hyperparameter object to build a dedicated activation layer, and to disable activation by default in the op layer construction kwargs.

    This is necessary because in most cases the normalization layer must be applied before the activation layer. So, in Keras models we must set the convolution activation in a dedicated layer after normalization is applied, rather than setting it in the convolution layer construction args.
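    A minimal Keras sketch of the ordering this enables (assumed layer choices, not the repo's builder code):

```python
import tensorflow as tf

# The convolution is built with activation=None so that batch norm sees
# pre-activation values; the nonlinearity runs as its own layer afterwards.
conv = tf.keras.layers.Conv2D(64, 3, padding='same', activation=None)
norm = tf.keras.layers.BatchNormalization()
act = tf.keras.layers.Lambda(tf.nn.relu6)

def conv_block(inputs, training=False):
  return act(norm(conv(inputs), training=training))
```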

--
208263792  by Zhichao Lu:

    Add a new SSD mask meta arch that can predict masks for SSD models.
    Changes include:
     - overwriting the loss function to add mask loss computation.
     - updating ssd_meta_arch to handle masks, if predicted, in predict and postprocessing.

--
208000218  by Zhichao Lu:

    Make FasterRCNN choose static shape operations only in training mode.

--
207997797  by Zhichao Lu:

    Add static boolean_mask op to box_list_ops.py and use that in faster_rcnn_meta_arch.py to support use_static_shapes option.

--
207993460  by Zhichao Lu:

    Include FGVC detection models in model zoo.

--
207971213  by Zhichao Lu:

    Remove the restriction that the tf.nn.top_k op run on CPU.

--
207961187  by Zhichao Lu:

    Build the first stage NMS function in the model builder and pass it to FasterRCNN meta arch.

--
207960608  by Zhichao Lu:

    Internal Change.

--
207927015  by Zhichao Lu:

    Add an option to use the TPU-compatible NMS op (cl/206673787) in the batch_multiclass_non_max_suppression function. When pad_to_max_output_size is set to true, the output NMSed boxes are padded to length max_size_per_class.

    This can be used in the first-stage Region Proposal Network of the Faster R-CNN model by setting the first_stage_nms_pad_to_max_proposals field to true in the config proto.
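    The CL is referenced only by number; the public op with this behavior is tf.image.non_max_suppression_padded. A usage sketch:

```python
import tensorflow as tf

boxes = tf.random_uniform([300, 4])
scores = tf.random_uniform([300])

# With pad_to_max_output_size=True the op returns a fixed-length index
# tensor plus a count of valid entries, so output shapes stay static.
selected, num_valid = tf.image.non_max_suppression_padded(
    boxes, scores, max_output_size=100, iou_threshold=0.6,
    pad_to_max_output_size=True)
```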

--
207809668  by Zhichao Lu:

    Add option to use depthwise separable conv instead of conv2d in FPN and WeightSharedBoxPredictor. More specifically, there are two related configs:
    - SsdFeatureExtractor.use_depthwise
    - WeightSharedConvolutionalBoxPredictor.use_depthwise
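    A sketch of the two convolution types the flags switch between (illustrative slim calls, not the builder code):

```python
import tensorflow as tf
slim = tf.contrib.slim

features = tf.zeros([1, 32, 32, 256])

# Standard 3x3 convolution.
dense = slim.conv2d(features, 256, [3, 3])

# Depthwise separable replacement: a per-channel 3x3 spatial filter
# (depth_multiplier=1) followed by a 1x1 pointwise projection, which
# substantially cuts parameters and FLOPs.
separable = slim.separable_conv2d(features, 256, [3, 3], depth_multiplier=1)
```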

--
207808651  by Zhichao Lu:

    Fix the static balanced positive negative sampler's TPU tests

--
207798658  by Zhichao Lu:

    Fixes a post-refactoring bug where the pre-prediction convolution layers in the convolutional box predictor are ignored.

--
207796470  by Zhichao Lu:

    Make slim endpoints visible in FasterRCNNMetaArch.

--
207787053  by Zhichao Lu:

    Refactor ssd_meta_arch so that the target assigner instance is passed into the SSDMetaArch constructor rather than constructed inside.

--

PiperOrigin-RevId: 212389173

* Fix detection model zoo typo.

* Modify the tf.Example decoder to handle label maps with either `display_name` or `name` fields seamlessly.

Currently, the tf.Example decoder uses only the `name` field to look up ids for the class text field present in the data. This change uses both the `display_name` and `name` fields in the label map to fetch ids for class text.
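A plain-Python sketch of the dual lookup (the real decoder builds a TF lookup table rather than a dict; `label_map` is assumed to be a parsed StringIntLabelMap proto):

```python
def build_class_text_to_id(label_map):
  """Indexes class ids by both `name` and `display_name`."""
  lookup = {}
  for item in label_map.item:
    lookup[item.name] = item.id
    if item.HasField('display_name'):
      lookup[item.display_name] = item.id
  return lookup
```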

PiperOrigin-RevId: 212672223

* Modify create_coco_tf_record tool to write out class text instead of class labels.

PiperOrigin-RevId: 212679112

* Fix detection model zoo typo.

PiperOrigin-RevId: 212715692

* Adding the following two optional flags to WeightSharedConvolutionalBoxHead:
1) In the box head, apply clipping to box encodings.
2) In the class head, apply sigmoid to class predictions at inference time.

PiperOrigin-RevId: 212723242

* Support class confidences in merge boxes with multiple labels.

PiperOrigin-RevId: 212884998

* Creates multiple eval specs for object detection.

PiperOrigin-RevId: 212894556

* Set batch_norm on last layer in Mask Head to None.

PiperOrigin-RevId: 213030087

* Enable bfloat16 training for object detection models.

PiperOrigin-RevId: 213053547

* Skip padding op when unnecessary.

PiperOrigin-RevId: 213065869

* Modify `Matchers` to use groundtruth weights before performing matching.

The groundtruth weights tensor is used to indicate padding in the groundtruth box tensor. It is handled in `TargetAssigner` by creating appropriate classification and regression target weights based on the groundtruth box each anchor matches to. However, options such as `force_match_all_rows` in `ArgmaxMatcher` force certain anchors to match groundtruth boxes that are just padding, thereby reducing the number of anchors that could otherwise match real groundtruth boxes.

For single-stage models like SSD the effect is negligible, as there are two orders of magnitude more anchors than padded groundtruth boxes. But for Faster R-CNN and Mask R-CNN, where there are only 300 anchors in the second stage, a significant number of these match groundtruth padding, reducing the number of anchors that regress to real groundtruth boxes and severely degrading performance.

Therefore, this change introduces an additional boolean argument `valid_rows` to the `Matcher.match` methods, and the implementations now ignore such padded groundtruth boxes during matching.
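A plausible way to derive the new argument from the existing groundtruth weights (zeros mark padding), matching the `valid_rows` keyword used in the diffs below:

```python
import tensorflow as tf

groundtruth_weights = tf.constant([1., 1., 0., 0.])  # zeros are padding
valid_rows = tf.greater(groundtruth_weights, 0)

# match = matcher.match(similarity_matrix, valid_rows=valid_rows)
```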

PiperOrigin-RevId: 213345395

* Add release note for iNaturalist Species trained models.

PiperOrigin-RevId: 213347179

* Fix a bug where the gt_is_crowd_list variable was uninitialized.

PiperOrigin-RevId: 213364858

* ...text exposed to open source public git repo...

PiperOrigin-RevId: 213554260
parent 256b8ae6
......@@ -273,6 +273,7 @@ def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories,
master=eval_config.eval_master,
save_graph=eval_config.save_graph,
save_graph_dir=(eval_dir if eval_config.save_graph else ''),
losses_dict=losses_dict)
losses_dict=losses_dict,
eval_export_path=eval_config.export_path)
return metrics
......@@ -99,17 +99,19 @@ class ArgMaxMatcher(matcher.Matcher):
if self._unmatched_threshold == self._matched_threshold:
raise ValueError('When negatives are in between matched and '
'unmatched thresholds, these cannot be of equal '
'value. matched: %s, unmatched: %s',
self._matched_threshold, self._unmatched_threshold)
'value. matched: {}, unmatched: {}'.format(
self._matched_threshold,
self._unmatched_threshold))
self._force_match_for_each_row = force_match_for_each_row
self._negatives_lower_than_unmatched = negatives_lower_than_unmatched
def _match(self, similarity_matrix):
def _match(self, similarity_matrix, valid_rows):
"""Tries to match each column of the similarity matrix to a row.
Args:
similarity_matrix: tensor of shape [N, M] representing any similarity
metric.
valid_rows: a boolean tensor of shape [N] indicating valid rows.
Returns:
Match object with corresponding matches for each of M columns.
......@@ -167,8 +169,10 @@ class ArgMaxMatcher(matcher.Matcher):
similarity_matrix)
force_match_column_ids = tf.argmax(similarity_matrix, 1,
output_type=tf.int32)
force_match_column_indicators = tf.one_hot(
force_match_column_ids, depth=similarity_matrix_shape[1])
force_match_column_indicators = (
tf.one_hot(
force_match_column_ids, depth=similarity_matrix_shape[1]) *
tf.cast(tf.expand_dims(valid_rows, axis=-1), dtype=tf.float32))
force_match_row_ids = tf.argmax(force_match_column_indicators, 0,
output_type=tf.int32)
force_match_column_mask = tf.cast(
......
......@@ -182,6 +182,34 @@ class ArgMaxMatcherTest(test_case.TestCase):
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_return_correct_matches_using_force_match_padded_groundtruth(self):
def graph_fn(similarity, valid_rows):
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3.,
unmatched_threshold=2.,
force_match_for_each_row=True)
match = matcher.match(similarity, valid_rows)
matched_cols = match.matched_column_indicator()
unmatched_cols = match.unmatched_column_indicator()
match_results = match.match_results
return (matched_cols, unmatched_cols, match_results)
similarity = np.array([[1, 1, 1, 3, 1],
[-1, 0, -2, -2, -1],
[0, 0, 0, 0, 0],
[3, 0, -1, 2, 0],
[0, 0, 0, 0, 0]], dtype=np.float32)
valid_rows = np.array([True, True, False, True, False])
expected_matched_cols = np.array([0, 1, 3])
expected_matched_rows = np.array([3, 1, 0])
expected_unmatched_cols = np.array([2, 4]) # col 2 has too high max val
(res_matched_cols, res_unmatched_cols,
match_results) = self.execute(graph_fn, [similarity, valid_rows])
self.assertAllEqual(match_results[res_matched_cols], expected_matched_rows)
self.assertAllEqual(np.nonzero(res_matched_cols)[0], expected_matched_cols)
self.assertAllEqual(np.nonzero(res_unmatched_cols)[0],
expected_unmatched_cols)
def test_valid_arguments_corner_case(self):
argmax_matcher.ArgMaxMatcher(matched_threshold=1,
unmatched_threshold=1)
......
......@@ -35,7 +35,7 @@ class GreedyBipartiteMatcher(matcher.Matcher):
super(GreedyBipartiteMatcher, self).__init__(
use_matmul_gather=use_matmul_gather)
def _match(self, similarity_matrix, num_valid_rows=-1):
def _match(self, similarity_matrix, valid_rows):
"""Bipartite matches a collection rows and columns. A greedy bi-partite.
TODO(rathodv): Add num_valid_columns options to match only that many columns
......@@ -44,21 +44,27 @@ class GreedyBipartiteMatcher(matcher.Matcher):
Args:
similarity_matrix: Float tensor of shape [N, M] with pairwise similarity
where higher values mean more similar.
num_valid_rows: A scalar or a 1-D tensor with one element describing the
number of valid rows of similarity_matrix to consider for the bipartite
matching. If set to be negative, then all rows from similarity_matrix
are used.
valid_rows: A boolean tensor of shape [N] indicating the rows that are
valid.
Returns:
match_results: int32 tensor of shape [M] with match_results[i]=-1
meaning that column i is not matched and otherwise that it is matched to
row match_results[i].
"""
valid_row_sim_matrix = tf.gather(similarity_matrix,
tf.squeeze(tf.where(valid_rows), axis=-1))
invalid_row_sim_matrix = tf.gather(
similarity_matrix,
tf.squeeze(tf.where(tf.logical_not(valid_rows)), axis=-1))
similarity_matrix = tf.concat(
[valid_row_sim_matrix, invalid_row_sim_matrix], axis=0)
# Convert similarity matrix to distance matrix, as image_ops.bipartite_match tries
# to find minimum distance matches.
distance_matrix = -1 * similarity_matrix
num_valid_rows = tf.reduce_sum(tf.to_float(valid_rows))
_, match_results = image_ops.bipartite_match(
distance_matrix, num_valid_rows)
distance_matrix, num_valid_rows=num_valid_rows)
match_results = tf.reshape(match_results, [-1])
match_results = tf.cast(match_results, tf.int32)
return match_results
......@@ -24,44 +24,54 @@ class GreedyBipartiteMatcherTest(tf.test.TestCase):
def test_get_expected_matches_when_all_rows_are_valid(self):
similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]])
num_valid_rows = 2
valid_rows = tf.ones([2], dtype=tf.bool)
expected_match_results = [-1, 1, 0]
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, num_valid_rows=num_valid_rows)
match = matcher.match(similarity_matrix, valid_rows=valid_rows)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
self.assertAllEqual(match_results_out, expected_match_results)
def test_get_expected_matches_with_valid_rows_set_to_minus_one(self):
def test_get_expected_matches_with_all_rows_be_default(self):
similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]])
num_valid_rows = -1
expected_match_results = [-1, 1, 0]
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, num_valid_rows=num_valid_rows)
match = matcher.match(similarity_matrix)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
self.assertAllEqual(match_results_out, expected_match_results)
def test_get_no_matches_with_zero_valid_rows(self):
similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]])
num_valid_rows = 0
valid_rows = tf.zeros([2], dtype=tf.bool)
expected_match_results = [-1, -1, -1]
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, num_valid_rows=num_valid_rows)
match = matcher.match(similarity_matrix, valid_rows)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
self.assertAllEqual(match_results_out, expected_match_results)
def test_get_expected_matches_with_only_one_valid_row(self):
similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]])
num_valid_rows = 1
valid_rows = tf.constant([True, False], dtype=tf.bool)
expected_match_results = [-1, -1, 0]
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, num_valid_rows=num_valid_rows)
match = matcher.match(similarity_matrix, valid_rows)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
self.assertAllEqual(match_results_out, expected_match_results)
def test_get_expected_matches_with_only_one_valid_row_at_bottom(self):
similarity_matrix = tf.constant([[0.15, 0.2, 0.3], [0.50, 0.1, 0.8]])
valid_rows = tf.constant([False, True], dtype=tf.bool)
expected_match_results = [-1, -1, 0]
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, valid_rows)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
self.assertAllEqual(match_results_out, expected_match_results)
......
......@@ -103,7 +103,6 @@ from object_detection.core import box_list_ops
from object_detection.core import box_predictor
from object_detection.core import losses
from object_detection.core import model
from object_detection.core import post_processing
from object_detection.core import standard_fields as fields
from object_detection.core import target_assigner
from object_detection.utils import ops
......@@ -234,11 +233,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
first_stage_box_predictor_depth,
first_stage_minibatch_size,
first_stage_sampler,
first_stage_nms_score_threshold,
first_stage_nms_iou_threshold,
first_stage_non_max_suppression_fn,
first_stage_max_proposals,
first_stage_localization_loss_weight,
first_stage_objectness_loss_weight,
crop_and_resize_fn,
initial_crop_size,
maxpool_kernel_size,
maxpool_stride,
......@@ -255,8 +254,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
hard_example_miner=None,
parallel_iterations=16,
add_summaries=True,
use_matmul_crop_and_resize=False,
clip_anchors_to_image=False):
clip_anchors_to_image=False,
use_static_shapes=False,
resize_masks=True):
"""FasterRCNNMetaArch Constructor.
Args:
......@@ -309,18 +309,22 @@ class FasterRCNNMetaArch(model.DetectionModel):
to the loss function for any given image within the image batch and is
only called "batch_size" due to terminology from the Faster R-CNN paper.
first_stage_sampler: Sampler to use for first stage loss (RPN loss).
first_stage_nms_score_threshold: Score threshold for non max suppression
for the Region Proposal Network (RPN). This value is expected to be in
[0, 1] as it is applied directly after a softmax transformation. The
recommended value for Faster R-CNN is 0.
first_stage_nms_iou_threshold: The Intersection Over Union (IOU) threshold
for performing Non-Max Suppression (NMS) on the boxes predicted by the
Region Proposal Network (RPN).
first_stage_non_max_suppression_fn: batch_multiclass_non_max_suppression
callable that takes `boxes`, `scores` and optional `clip_window`(with
all other inputs already set) and returns a dictionary containing
tensors with keys: `detection_boxes`, `detection_scores`,
`detection_classes`, `num_detections`. This is used to perform non max
suppression on the boxes predicted by the Region Proposal Network
(RPN).
See `post_processing.batch_multiclass_non_max_suppression` for the type
and shape of these tensors.
first_stage_max_proposals: Maximum number of boxes to retain after
performing Non-Max Suppression (NMS) on the boxes predicted by the
Region Proposal Network (RPN).
first_stage_localization_loss_weight: A float
first_stage_objectness_loss_weight: A float
crop_and_resize_fn: A differentiable resampler to use for cropping RPN
proposal features.
initial_crop_size: A single integer indicating the output size
(width and height are set to be the same) of the initial bilinear
interpolation based cropping during ROI pooling.
......@@ -367,12 +371,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
in parallel for calls to tf.map_fn.
add_summaries: boolean (default: True) controlling whether summary ops
should be added to tensorflow graph.
use_matmul_crop_and_resize: Force the use of matrix multiplication based
crop and resize instead of standard tf.image.crop_and_resize while
computing second stage input feature maps.
clip_anchors_to_image: Normally, anchors generated for a given image size
are pruned during training if they lie outside the image window. This
option clips the anchors to be within the image instead of pruning.
use_static_shapes: If True, uses implementation of ops with static shape
guarantees.
resize_masks: Indicates whether the masks present in the groundtruth
should be resized in the model with `image_resizer_fn`.
Raises:
ValueError: If `second_stage_batch_size` > `first_stage_max_proposals` at
......@@ -384,9 +389,6 @@ class FasterRCNNMetaArch(model.DetectionModel):
# in the future.
super(FasterRCNNMetaArch, self).__init__(num_classes=num_classes)
if is_training and second_stage_batch_size > first_stage_max_proposals:
raise ValueError('second_stage_batch_size should be no greater than '
'first_stage_max_proposals.')
if not isinstance(first_stage_anchor_generator,
grid_anchor_generator.GridAnchorGenerator):
raise ValueError('first_stage_anchor_generator must be of type '
......@@ -394,6 +396,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
self._is_training = is_training
self._image_resizer_fn = image_resizer_fn
self._resize_masks = resize_masks
self._feature_extractor = feature_extractor
self._number_of_stages = number_of_stages
......@@ -425,9 +428,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
min_depth=0,
max_depth=0))
self._first_stage_nms_score_threshold = first_stage_nms_score_threshold
self._first_stage_nms_iou_threshold = first_stage_nms_iou_threshold
self._first_stage_nms_fn = first_stage_non_max_suppression_fn
self._first_stage_max_proposals = first_stage_max_proposals
self._use_static_shapes = use_static_shapes
self._first_stage_localization_loss = (
losses.WeightedSmoothL1LocalizationLoss())
......@@ -437,6 +440,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
self._first_stage_obj_loss_weight = first_stage_objectness_loss_weight
# Per-region cropping parameters
self._crop_and_resize_fn = crop_and_resize_fn
self._initial_crop_size = initial_crop_size
self._maxpool_kernel_size = maxpool_kernel_size
self._maxpool_stride = maxpool_stride
......@@ -458,7 +462,6 @@ class FasterRCNNMetaArch(model.DetectionModel):
self._second_stage_cls_loss_weight = second_stage_classification_loss_weight
self._second_stage_mask_loss_weight = (
second_stage_mask_prediction_loss_weight)
self._use_matmul_crop_and_resize = use_matmul_crop_and_resize
self._hard_example_miner = hard_example_miner
self._parallel_iterations = parallel_iterations
......@@ -673,9 +676,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
}
if self._number_of_stages >= 2:
# If mixed-precision training on TPU is enabled, rpn_box_encodings and
# rpn_objectness_predictions_with_background are bfloat16 tensors.
# Since they are prediction results, they need to be cast to float32
# tensors for correct postprocess_rpn computation in predict_second_stage.
prediction_dict.update(self._predict_second_stage(
rpn_box_encodings,
rpn_objectness_predictions_with_background,
tf.to_float(rpn_box_encodings),
tf.to_float(rpn_objectness_predictions_with_background),
rpn_features_to_crop,
self._anchors.get(), image_shape, true_image_shapes))
......@@ -719,7 +726,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
[batch_size, num_valid_anchors, 2] containing class
predictions (logits) for each of the anchors. Note that this
tensor *includes* background class predictions (at class index 0).
rpn_features_to_crop: A 4-D float32 tensor with shape
rpn_features_to_crop: A 4-D float32 or bfloat16 tensor with shape
[batch_size, height, width, depth] representing image features to crop
using the proposal boxes predicted by the RPN.
anchors: 2-D float tensor of shape
......@@ -758,17 +765,22 @@ class FasterRCNNMetaArch(model.DetectionModel):
boxes proposed by the RPN, thus enabling one to extract features and
get box classification and prediction for externally selected areas
of the image.
6) box_classifier_features: a 4-D float32 tensor representing the
features for each proposal.
6) box_classifier_features: a 4-D float32 or bfloat16 tensor
representing the features for each proposal.
"""
image_shape_2d = self._image_batch_shape_2d(image_shape)
proposal_boxes_normalized, _, num_proposals = self._postprocess_rpn(
rpn_box_encodings, rpn_objectness_predictions_with_background,
anchors, image_shape_2d, true_image_shapes)
# If mixed-precision training on TPU is enabled, the dtype of
# rpn_features_to_crop is bfloat16, otherwise it is float32. tf.cast is
# used to match the dtype of proposal_boxes_normalized to that of
# rpn_features_to_crop for further computation.
flattened_proposal_feature_maps = (
self._compute_second_stage_input_feature_maps(
rpn_features_to_crop, proposal_boxes_normalized))
rpn_features_to_crop,
tf.cast(proposal_boxes_normalized, rpn_features_to_crop.dtype)))
box_classifier_features = (
self._feature_extractor.extract_box_classifier_features(
......@@ -956,8 +968,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
image_shape: A 1-D tensor representing the input image shape.
"""
image_shape = tf.shape(preprocessed_inputs)
rpn_features_to_crop, _ = self._feature_extractor.extract_proposal_features(
preprocessed_inputs, scope=self.first_stage_feature_extractor_scope)
rpn_features_to_crop, self.endpoints = (
self._feature_extractor.extract_proposal_features(
preprocessed_inputs,
scope=self.first_stage_feature_extractor_scope))
feature_map_shape = tf.shape(rpn_features_to_crop)
anchors = box_list_ops.concatenate(
......@@ -965,12 +980,15 @@ class FasterRCNNMetaArch(model.DetectionModel):
feature_map_shape[2])]))
with slim.arg_scope(self._first_stage_box_predictor_arg_scope_fn()):
kernel_size = self._first_stage_box_predictor_kernel_size
reuse = tf.get_variable_scope().reuse
rpn_box_predictor_features = slim.conv2d(
rpn_features_to_crop,
self._first_stage_box_predictor_depth,
kernel_size=[kernel_size, kernel_size],
rate=self._first_stage_atrous_rate,
activation_fn=tf.nn.relu6)
activation_fn=tf.nn.relu6,
scope='Conv',
reuse=reuse)
return (rpn_box_predictor_features, rpn_features_to_crop,
anchors, image_shape)
......@@ -1223,14 +1241,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
rpn_objectness_predictions_with_background_batch)[:, :, 1]
clip_window = self._compute_clip_window(image_shapes)
(proposal_boxes, proposal_scores, _, _, _,
num_proposals) = post_processing.batch_multiclass_non_max_suppression(
num_proposals) = self._first_stage_nms_fn(
tf.expand_dims(proposal_boxes, axis=2),
tf.expand_dims(rpn_objectness_softmax_without_background,
axis=2),
self._first_stage_nms_score_threshold,
self._first_stage_nms_iou_threshold,
self._first_stage_max_proposals,
self._first_stage_max_proposals,
tf.expand_dims(rpn_objectness_softmax_without_background, axis=2),
clip_window=clip_window)
if self._is_training:
proposal_boxes = tf.stop_gradient(proposal_boxes)
......@@ -1377,16 +1390,19 @@ class FasterRCNNMetaArch(model.DetectionModel):
groundtruth_masks_list = self._groundtruth_lists.get(
fields.BoxListFields.masks)
if groundtruth_masks_list is not None:
# TODO(rathodv): Remove mask resizing once the legacy pipeline is deleted.
if groundtruth_masks_list is not None and self._resize_masks:
resized_masks_list = []
for mask in groundtruth_masks_list:
_, resized_mask, _ = self._image_resizer_fn(
# Reuse the given `image_resizer_fn` to resize groundtruth masks.
# `mask` tensor for an image is of the shape [num_masks,
# image_height, image_width]. Below we create a dummy image of the
# shape [image_height, image_width, 1] to use with
# `image_resizer_fn`.
image=tf.zeros(tf.stack([tf.shape(mask)[1], tf.shape(mask)[2], 1])),
image=tf.zeros(tf.stack([tf.shape(mask)[1],
tf.shape(mask)[2], 1])),
masks=mask)
resized_masks_list.append(resized_mask)
......@@ -1443,11 +1459,16 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.range(proposal_boxlist.num_boxes()) < num_valid_proposals,
cls_weights > 0
)
sampled_indices = self._second_stage_sampler.subsample(
selected_positions = self._second_stage_sampler.subsample(
valid_indicator,
self._second_stage_batch_size,
positive_indicator)
return box_list_ops.boolean_mask(proposal_boxlist, sampled_indices)
return box_list_ops.boolean_mask(
proposal_boxlist,
selected_positions,
use_static_shapes=self._use_static_shapes,
indicator_sum=(self._second_stage_batch_size
if self._use_static_shapes else None))
def _compute_second_stage_input_feature_maps(self, features_to_crop,
proposal_boxes_normalized):
......@@ -1467,35 +1488,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
Returns:
A float32 tensor with shape [K, new_height, new_width, depth].
"""
def get_box_inds(proposals):
proposals_shape = proposals.get_shape().as_list()
if any(dim is None for dim in proposals_shape):
proposals_shape = tf.shape(proposals)
ones_mat = tf.ones(proposals_shape[:2], dtype=tf.int32)
multiplier = tf.expand_dims(
tf.range(start=0, limit=proposals_shape[0]), 1)
return tf.reshape(ones_mat * multiplier, [-1])
if self._use_matmul_crop_and_resize:
def _single_image_crop_and_resize(inputs):
single_image_features_to_crop, proposal_boxes_normalized = inputs
return ops.matmul_crop_and_resize(
tf.expand_dims(single_image_features_to_crop, 0),
proposal_boxes_normalized,
[self._initial_crop_size, self._initial_crop_size])
cropped_regions = self._flatten_first_two_dimensions(
shape_utils.static_or_dynamic_map_fn(
_single_image_crop_and_resize,
elems=[features_to_crop, proposal_boxes_normalized],
dtype=tf.float32,
parallel_iterations=self._parallel_iterations))
else:
cropped_regions = tf.image.crop_and_resize(
features_to_crop,
self._flatten_first_two_dimensions(proposal_boxes_normalized),
get_box_inds(proposal_boxes_normalized),
(self._initial_crop_size, self._initial_crop_size))
cropped_regions = self._flatten_first_two_dimensions(
self._crop_and_resize_fn(
features_to_crop, proposal_boxes_normalized,
[self._initial_crop_size, self._initial_crop_size]))
return slim.max_pool2d(
cropped_regions,
[self._maxpool_kernel_size, self._maxpool_kernel_size],
......@@ -1738,11 +1734,17 @@ class FasterRCNNMetaArch(model.DetectionModel):
sampled_reg_indices = tf.multiply(batch_sampled_indices,
batch_reg_weights)
losses_mask = None
if self.groundtruth_has_field(fields.InputDataFields.is_annotated):
losses_mask = tf.stack(self.groundtruth_lists(
fields.InputDataFields.is_annotated))
localization_losses = self._first_stage_localization_loss(
rpn_box_encodings, batch_reg_targets, weights=sampled_reg_indices)
rpn_box_encodings, batch_reg_targets, weights=sampled_reg_indices,
losses_mask=losses_mask)
objectness_losses = self._first_stage_objectness_loss(
rpn_objectness_predictions_with_background,
batch_one_hot_targets, weights=batch_sampled_indices)
batch_one_hot_targets, weights=batch_sampled_indices,
losses_mask=losses_mask)
localization_loss = tf.reduce_mean(
tf.reduce_sum(localization_losses, axis=1) / normalizer)
objectness_loss = tf.reduce_mean(
......@@ -1866,32 +1868,32 @@ class FasterRCNNMetaArch(model.DetectionModel):
# for just one class to avoid over-counting for regression loss and
# (optionally) mask loss.
else:
# We only predict refined location encodings for the non background
# classes, but we now pad it to make it compatible with the class
# predictions
refined_box_encodings_with_background = tf.pad(
refined_box_encodings, [[0, 0], [1, 0], [0, 0]])
refined_box_encodings_masked_by_class_targets = tf.boolean_mask(
refined_box_encodings_with_background,
tf.greater(one_hot_flat_cls_targets_with_background, 0))
reshaped_refined_box_encodings = tf.reshape(
refined_box_encodings_masked_by_class_targets,
[batch_size, self.max_num_proposals, self._box_coder.code_size])
reshaped_refined_box_encodings = (
self._get_refined_encodings_for_postitive_class(
refined_box_encodings,
one_hot_flat_cls_targets_with_background, batch_size))
losses_mask = None
if self.groundtruth_has_field(fields.InputDataFields.is_annotated):
losses_mask = tf.stack(self.groundtruth_lists(
fields.InputDataFields.is_annotated))
second_stage_loc_losses = self._second_stage_localization_loss(
reshaped_refined_box_encodings,
batch_reg_targets, weights=batch_reg_weights) / normalizer
batch_reg_targets,
weights=batch_reg_weights,
losses_mask=losses_mask) / normalizer
second_stage_cls_losses = ops.reduce_sum_trailing_dimensions(
self._second_stage_classification_loss(
class_predictions_with_background,
batch_cls_targets_with_background,
weights=batch_cls_weights),
weights=batch_cls_weights,
losses_mask=losses_mask),
ndims=2) / normalizer
second_stage_loc_loss = tf.reduce_sum(
tf.boolean_mask(second_stage_loc_losses, paddings_indicator))
second_stage_loc_losses * tf.to_float(paddings_indicator))
second_stage_cls_loss = tf.reduce_sum(
tf.boolean_mask(second_stage_cls_losses, paddings_indicator))
second_stage_cls_losses * tf.to_float(paddings_indicator))
if self._hard_example_miner:
(second_stage_loc_loss, second_stage_cls_loss
......@@ -1954,10 +1956,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
box_list.BoxList(tf.reshape(proposal_boxes, [-1, 4])),
image_shape[1], image_shape[2]).get()
flat_cropped_gt_mask = tf.image.crop_and_resize(
flat_cropped_gt_mask = self._crop_and_resize_fn(
tf.expand_dims(flat_gt_masks, -1),
flat_normalized_proposals,
tf.range(flat_normalized_proposals.shape[0].value),
tf.expand_dims(flat_normalized_proposals, axis=1),
[mask_height, mask_width])
batch_cropped_gt_mask = tf.reshape(
......@@ -1968,14 +1969,16 @@ class FasterRCNNMetaArch(model.DetectionModel):
self._second_stage_mask_loss(
reshaped_prediction_masks,
batch_cropped_gt_mask,
weights=batch_mask_target_weights),
weights=batch_mask_target_weights,
losses_mask=losses_mask),
ndims=2) / (
mask_height * mask_width * tf.maximum(
tf.reduce_sum(
batch_mask_target_weights, axis=1, keep_dims=True
), tf.ones((batch_size, 1))))
second_stage_mask_loss = tf.reduce_sum(
tf.boolean_mask(second_stage_mask_losses, paddings_indicator))
tf.where(paddings_indicator, second_stage_mask_losses,
tf.zeros_like(second_stage_mask_losses)))
if second_stage_mask_loss is not None:
mask_loss = tf.multiply(self._second_stage_mask_loss_weight,
......@@ -1983,6 +1986,29 @@ class FasterRCNNMetaArch(model.DetectionModel):
loss_dict[mask_loss.op.name] = mask_loss
return loss_dict
def _get_refined_encodings_for_postitive_class(
self, refined_box_encodings, flat_cls_targets_with_background,
batch_size):
# We only predict refined location encodings for the non background
# classes, but we now pad it to make it compatible with the class
# predictions
refined_box_encodings_with_background = tf.pad(refined_box_encodings,
[[0, 0], [1, 0], [0, 0]])
refined_box_encodings_masked_by_class_targets = (
box_list_ops.boolean_mask(
box_list.BoxList(
tf.reshape(refined_box_encodings_with_background,
[-1, self._box_coder.code_size])),
tf.reshape(tf.greater(flat_cls_targets_with_background, 0), [-1]),
use_static_shapes=self._use_static_shapes,
indicator_sum=batch_size * self.max_num_proposals
if self._use_static_shapes else None).get())
return tf.reshape(
refined_box_encodings_masked_by_class_targets, [
batch_size, self.max_num_proposals,
self._box_coder.code_size
])
def _padded_batched_proposals_indicator(self,
num_proposals,
max_num_proposals):
......
......@@ -14,8 +14,12 @@
# ==============================================================================
"""Tests for object_detection.meta_architectures.faster_rcnn_meta_arch."""
import functools
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from google.protobuf import text_format
from object_detection.anchor_generators import grid_anchor_generator
from object_detection.builders import box_predictor_builder
......@@ -23,11 +27,14 @@ from object_detection.builders import hyperparams_builder
from object_detection.builders import post_processing_builder
from object_detection.core import balanced_positive_negative_sampler as sampler
from object_detection.core import losses
from object_detection.core import post_processing
from object_detection.core import target_assigner
from object_detection.meta_architectures import faster_rcnn_meta_arch
from object_detection.protos import box_predictor_pb2
from object_detection.protos import hyperparams_pb2
from object_detection.protos import post_processing_pb2
from object_detection.utils import ops
from object_detection.utils import test_case
from object_detection.utils import test_utils
slim = tf.contrib.slim
......@@ -60,7 +67,7 @@ class FakeFasterRCNNFeatureExtractor(
num_outputs=3, kernel_size=1, scope='layer2')
class FasterRCNNMetaArchTestBase(tf.test.TestCase):
class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
"""Base class to test Faster R-CNN and R-FCN meta architectures."""
def _build_arg_scope_with_hyperparams(self,
......@@ -157,7 +164,8 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
masks_are_class_agnostic=False,
use_matmul_crop_and_resize=False,
clip_anchors_to_image=False,
use_matmul_gather_in_matcher=False):
use_matmul_gather_in_matcher=False,
use_static_shapes=False):
def image_resizer_fn(image, masks=None):
"""Fake image resizer function."""
......@@ -220,11 +228,18 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
first_stage_box_predictor_depth = 512
first_stage_minibatch_size = 3
first_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=0.5, is_static=False)
positive_fraction=0.5, is_static=use_static_shapes)
first_stage_nms_score_threshold = -1.0
first_stage_nms_iou_threshold = 1.0
first_stage_max_proposals = first_stage_max_proposals
first_stage_non_max_suppression_fn = functools.partial(
post_processing.batch_multiclass_non_max_suppression,
score_thresh=first_stage_nms_score_threshold,
iou_thresh=first_stage_nms_iou_threshold,
max_size_per_class=first_stage_max_proposals,
max_total_size=first_stage_max_proposals,
use_static_shapes=use_static_shapes)
first_stage_localization_loss_weight = 1.0
first_stage_objectness_loss_weight = 1.0
......@@ -246,7 +261,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
second_stage_non_max_suppression_fn, _ = post_processing_builder.build(
post_processing_config)
second_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=1.0, is_static=False)
positive_fraction=1.0, is_static=use_static_shapes)
second_stage_score_conversion_fn = tf.identity
second_stage_localization_loss_weight = 1.0
......@@ -268,6 +283,9 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
loc_loss_weight=second_stage_localization_loss_weight,
max_negatives_per_positive=None)
crop_and_resize_fn = (
ops.matmul_crop_and_resize
if use_matmul_crop_and_resize else ops.native_crop_and_resize)
common_kwargs = {
'is_training': is_training,
'num_classes': num_classes,
......@@ -284,8 +302,8 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
'first_stage_box_predictor_depth': first_stage_box_predictor_depth,
'first_stage_minibatch_size': first_stage_minibatch_size,
'first_stage_sampler': first_stage_sampler,
'first_stage_nms_score_threshold': first_stage_nms_score_threshold,
'first_stage_nms_iou_threshold': first_stage_nms_iou_threshold,
'first_stage_non_max_suppression_fn':
first_stage_non_max_suppression_fn,
'first_stage_max_proposals': first_stage_max_proposals,
'first_stage_localization_loss_weight':
first_stage_localization_loss_weight,
......@@ -304,8 +322,10 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
'second_stage_classification_loss':
second_stage_classification_loss,
'hard_example_miner': hard_example_miner,
'use_matmul_crop_and_resize': use_matmul_crop_and_resize,
'clip_anchors_to_image': clip_anchors_to_image
'crop_and_resize_fn': crop_and_resize_fn,
'clip_anchors_to_image': clip_anchors_to_image,
'use_static_shapes': use_static_shapes,
'resize_masks': True,
}
return self._get_model(
......@@ -412,7 +432,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
anchors = prediction_out['anchors']
self.assertTrue(len(anchors.shape) == 2 and anchors.shape[1] == 4)
num_anchors_out = anchors.shape[0]
self.assertTrue(num_anchors_out < num_anchors_strict_upper_bound)
self.assertLess(num_anchors_out, num_anchors_strict_upper_bound)
self.assertTrue(np.all(np.greater_equal(anchors, 0)))
self.assertTrue(np.all(np.less_equal(anchors[:, 0], height)))
......@@ -484,94 +504,104 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
for key in expected_shapes:
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
def _test_predict_gives_correct_shapes_in_train_mode_both_stages(
self, use_matmul_crop_and_resize=False,
clip_anchors_to_image=False):
test_graph = tf.Graph()
with test_graph.as_default():
# BEGIN GOOGLE-INTERNAL
# TODO(bhattad): Remove conditional after CMLE moves to TF 1.11
@parameterized.parameters(
{'use_static_shapes': False},
{'use_static_shapes': True}
)
# END GOOGLE-INTERNAL
def test_predict_gives_correct_shapes_in_train_mode_both_stages(
self,
use_static_shapes=False):
batch_size = 2
image_size = 10
max_num_proposals = 7
initial_crop_size = 3
maxpool_stride = 1
def graph_fn(images, gt_boxes, gt_classes, gt_weights):
"""Function to construct tf graph for the test."""
model = self._build_model(
is_training=True,
number_of_stages=2,
second_stage_batch_size=7,
predict_masks=False,
use_matmul_crop_and_resize=use_matmul_crop_and_resize,
clip_anchors_to_image=clip_anchors_to_image)
use_matmul_crop_and_resize=use_static_shapes,
clip_anchors_to_image=use_static_shapes,
use_static_shapes=use_static_shapes)
batch_size = 2
image_size = 10
max_num_proposals = 7
initial_crop_size = 3
maxpool_stride = 1
image_shape = (batch_size, image_size, image_size, 3)
preprocessed_inputs = tf.zeros(image_shape, dtype=tf.float32)
groundtruth_boxes_list = [
tf.constant([[0, 0, .5, .5], [.5, .5, 1, 1]], dtype=tf.float32),
tf.constant([[0, .5, .5, 1], [.5, 0, 1, .5]], dtype=tf.float32)]
groundtruth_classes_list = [
tf.constant([[1, 0], [0, 1]], dtype=tf.float32),
tf.constant([[1, 0], [1, 0]], dtype=tf.float32)]
groundtruth_weights_list = [
tf.constant([1, 1], dtype=tf.float32),
tf.constant([1, 1], dtype=tf.float32)]
_, true_image_shapes = model.preprocess(tf.zeros(image_shape))
preprocessed_inputs, true_image_shapes = model.preprocess(images)
model.provide_groundtruth(
groundtruth_boxes_list,
groundtruth_classes_list,
groundtruth_weights_list=groundtruth_weights_list)
groundtruth_boxes_list=tf.unstack(gt_boxes),
groundtruth_classes_list=tf.unstack(gt_classes),
groundtruth_weights_list=tf.unstack(gt_weights))
result_tensor_dict = model.predict(preprocessed_inputs, true_image_shapes)
expected_shapes = {
'rpn_box_predictor_features':
(2, image_size, image_size, 512),
'rpn_features_to_crop': (2, image_size, image_size, 3),
'image_shape': (4,),
'refined_box_encodings': (2 * max_num_proposals, 2, 4),
'class_predictions_with_background': (2 * max_num_proposals, 2 + 1),
'num_proposals': (2,),
'proposal_boxes': (2, max_num_proposals, 4),
'proposal_boxes_normalized': (2, max_num_proposals, 4),
'box_classifier_features':
self._get_box_classifier_features_shape(image_size,
batch_size,
max_num_proposals,
initial_crop_size,
maxpool_stride,
3)
}
init_op = tf.global_variables_initializer()
with self.test_session(graph=test_graph) as sess:
sess.run(init_op)
tensor_dict_out = sess.run(result_tensor_dict)
self.assertEqual(set(tensor_dict_out.keys()),
set(expected_shapes.keys()).union(set([
'rpn_box_encodings',
'rpn_objectness_predictions_with_background',
'anchors'])))
for key in expected_shapes:
self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key])
anchors_shape_out = tensor_dict_out['anchors'].shape
self.assertEqual(2, len(anchors_shape_out))
self.assertEqual(4, anchors_shape_out[1])
num_anchors_out = anchors_shape_out[0]
self.assertAllEqual(tensor_dict_out['rpn_box_encodings'].shape,
(2, num_anchors_out, 4))
self.assertAllEqual(
tensor_dict_out['rpn_objectness_predictions_with_background'].shape,
(2, num_anchors_out, 2))
def test_predict_gives_correct_shapes_in_train_mode_both_stages(self):
self._test_predict_gives_correct_shapes_in_train_mode_both_stages()
def test_predict_gives_correct_shapes_in_train_mode_matmul_crop_resize(self):
self._test_predict_gives_correct_shapes_in_train_mode_both_stages(
use_matmul_crop_and_resize=True)
return (result_tensor_dict['refined_box_encodings'],
result_tensor_dict['class_predictions_with_background'],
result_tensor_dict['proposal_boxes'],
result_tensor_dict['proposal_boxes_normalized'],
result_tensor_dict['anchors'],
result_tensor_dict['rpn_box_encodings'],
result_tensor_dict['rpn_objectness_predictions_with_background'],
result_tensor_dict['rpn_features_to_crop'],
result_tensor_dict['rpn_box_predictor_features'],
)
image_shape = (batch_size, image_size, image_size, 3)
images = np.zeros(image_shape, dtype=np.float32)
gt_boxes = np.stack([
np.array([[0, 0, .5, .5], [.5, .5, 1, 1]], dtype=np.float32),
np.array([[0, .5, .5, 1], [.5, 0, 1, .5]], dtype=np.float32)
])
gt_classes = np.stack([
np.array([[1, 0], [0, 1]], dtype=np.float32),
np.array([[1, 0], [1, 0]], dtype=np.float32)
])
gt_weights = np.stack([
np.array([1, 1], dtype=np.float32),
np.array([1, 1], dtype=np.float32)
])
if use_static_shapes:
results = self.execute(graph_fn,
[images, gt_boxes, gt_classes, gt_weights])
else:
results = self.execute_cpu(graph_fn,
[images, gt_boxes, gt_classes, gt_weights])
def test_predict_gives_correct_shapes_in_train_mode_clip_anchors(self):
self._test_predict_gives_correct_shapes_in_train_mode_both_stages(
clip_anchors_to_image=True)
expected_shapes = {
'rpn_box_predictor_features': (2, image_size, image_size, 512),
'rpn_features_to_crop': (2, image_size, image_size, 3),
'refined_box_encodings': (2 * max_num_proposals, 2, 4),
'class_predictions_with_background': (2 * max_num_proposals, 2 + 1),
'proposal_boxes': (2, max_num_proposals, 4),
'rpn_box_encodings': (2, image_size * image_size * 9, 4),
'proposal_boxes_normalized': (2, max_num_proposals, 4),
'box_classifier_features':
self._get_box_classifier_features_shape(
image_size, batch_size, max_num_proposals, initial_crop_size,
maxpool_stride, 3),
'rpn_objectness_predictions_with_background':
(2, image_size * image_size * 9, 2)
}
# TODO(rathodv): Possibly change utils/test_case.py to accept dictionaries
# and return dictionaries so we don't have to rely on the order of tensors.
self.assertAllEqual(results[0].shape,
expected_shapes['refined_box_encodings'])
self.assertAllEqual(results[1].shape,
expected_shapes['class_predictions_with_background'])
self.assertAllEqual(results[2].shape, expected_shapes['proposal_boxes'])
self.assertAllEqual(results[3].shape,
expected_shapes['proposal_boxes_normalized'])
anchors_shape = results[4].shape
self.assertAllEqual(results[5].shape,
[batch_size, anchors_shape[0], 4])
self.assertAllEqual(results[6].shape,
[batch_size, anchors_shape[0], 2])
self.assertAllEqual(results[7].shape,
expected_shapes['rpn_features_to_crop'])
self.assertAllEqual(results[8].shape,
expected_shapes['rpn_box_predictor_features'])
def _test_postprocess_first_stage_only_inference_mode(
self, pad_to_max_dimension=None):
......@@ -848,10 +878,10 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertTrue('Loss/BoxClassifierLoss/localization_loss'
not in loss_dict_out)
self.assertTrue('Loss/BoxClassifierLoss/classification_loss'
not in loss_dict_out)
self.assertNotIn('Loss/BoxClassifierLoss/localization_loss',
loss_dict_out)
self.assertNotIn('Loss/BoxClassifierLoss/classification_loss',
loss_dict_out)
# TODO(rathodv): Split test into two - with and without masks.
def test_loss_full(self):
......@@ -1157,22 +1187,68 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
'Loss/BoxClassifierLoss/classification_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/BoxClassifierLoss/mask_loss'], 0)
def test_loss_full_zero_padded_proposals_nonzero_loss_with_two_images(self):
model = self._build_model(
is_training=True, number_of_stages=2, second_stage_batch_size=6)
# BEGIN GOOGLE-INTERNAL
# TODO(bhattad): Remove conditional after CMLE moves to TF 1.11
@parameterized.parameters(
{'use_static_shapes': False, 'shared_boxes': False},
{'use_static_shapes': False, 'shared_boxes': True},
{'use_static_shapes': True, 'shared_boxes': False},
{'use_static_shapes': True, 'shared_boxes': True},
)
# END GOOGLE-INTERNAL
def test_loss_full_zero_padded_proposals_nonzero_loss_with_two_images(
self, use_static_shapes=False, shared_boxes=False):
batch_size = 2
anchors = tf.constant(
first_stage_max_proposals = 8
second_stage_batch_size = 6
num_classes = 2
def graph_fn(anchors, rpn_box_encodings,
rpn_objectness_predictions_with_background, images,
num_proposals, proposal_boxes, refined_box_encodings,
class_predictions_with_background, groundtruth_boxes,
groundtruth_classes):
"""Function to construct tf graph for the test."""
model = self._build_model(
is_training=True, number_of_stages=2,
second_stage_batch_size=second_stage_batch_size,
first_stage_max_proposals=first_stage_max_proposals,
num_classes=num_classes,
use_matmul_crop_and_resize=use_static_shapes,
clip_anchors_to_image=use_static_shapes,
use_static_shapes=use_static_shapes)
prediction_dict = {
'rpn_box_encodings': rpn_box_encodings,
'rpn_objectness_predictions_with_background':
rpn_objectness_predictions_with_background,
'image_shape': tf.shape(images),
'anchors': anchors,
'refined_box_encodings': refined_box_encodings,
'class_predictions_with_background':
class_predictions_with_background,
'proposal_boxes': proposal_boxes,
'num_proposals': num_proposals
}
_, true_image_shapes = model.preprocess(images)
model.provide_groundtruth(tf.unstack(groundtruth_boxes),
tf.unstack(groundtruth_classes))
loss_dict = model.loss(prediction_dict, true_image_shapes)
return (loss_dict['Loss/RPNLoss/localization_loss'],
loss_dict['Loss/RPNLoss/objectness_loss'],
loss_dict['Loss/BoxClassifierLoss/localization_loss'],
loss_dict['Loss/BoxClassifierLoss/classification_loss'])
anchors = np.array(
[[0, 0, 16, 16],
[0, 16, 16, 32],
[16, 0, 32, 16],
[16, 16, 32, 32]], dtype=tf.float32)
rpn_box_encodings = tf.zeros(
[batch_size,
anchors.get_shape().as_list()[0],
BOX_CODE_SIZE], dtype=tf.float32)
[16, 16, 32, 32]], dtype=np.float32)
rpn_box_encodings = np.zeros(
[batch_size, anchors.shape[1], BOX_CODE_SIZE], dtype=np.float32)
# use different numbers for the objectness category to break ties in
# order of boxes returned by NMS
rpn_objectness_predictions_with_background = tf.constant(
rpn_objectness_predictions_with_background = np.array(
[[[-10, 13],
[10, -10],
[10, -11],
......@@ -1180,13 +1256,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
[[-10, 13],
[10, -10],
[10, -11],
[10, -12]]], dtype=tf.float32)
image_shape = tf.constant([batch_size, 32, 32, 3], dtype=tf.int32)
[10, -12]]], dtype=np.float32)
images = np.zeros([batch_size, 32, 32, 3], dtype=np.float32)
# box_classifier_batch_size is 6, but here we assume that the number of
# actual proposals (not counting zero paddings) is fewer.
num_proposals = tf.constant([3, 2], dtype=tf.int32)
proposal_boxes = tf.constant(
num_proposals = np.array([3, 2], dtype=np.int32)
proposal_boxes = np.array(
[[[0, 0, 16, 16],
[0, 16, 16, 32],
[16, 0, 32, 16],
......@@ -1198,13 +1274,13 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
[0, 0, 0, 0], # begin paddings
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]]], dtype=tf.float32)
[0, 0, 0, 0]]], dtype=np.float32)
refined_box_encodings = tf.zeros(
(batch_size * model.max_num_proposals,
model.num_classes,
BOX_CODE_SIZE), dtype=tf.float32)
class_predictions_with_background = tf.constant(
refined_box_encodings = np.zeros(
(batch_size * second_stage_batch_size, 1
if shared_boxes else num_classes, BOX_CODE_SIZE),
dtype=np.float32)
class_predictions_with_background = np.array(
[[-10, 10, -10], # first image
[10, -10, -10],
[10, -10, -10],
......@@ -1216,7 +1292,7 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
[0, 0, 0], # begin paddings
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],], dtype=tf.float32)
[0, 0, 0],], dtype=np.float32)
# The first groundtruth box is 4/5 of the anchor size in both directions
# experiencing a loss of:
......@@ -1225,38 +1301,29 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
# The second groundtruth box is identical to the prediction and thus
# experiences zero loss.
# Total average loss is (abs(5 * log(1/2)) - .5) / 3.
groundtruth_boxes_list = [
tf.constant([[0.05, 0.05, 0.45, 0.45]], dtype=tf.float32),
tf.constant([[0.0, 0.0, 0.5, 0.5]], dtype=tf.float32)]
groundtruth_classes_list = [tf.constant([[1, 0]], dtype=tf.float32),
tf.constant([[0, 1]], dtype=tf.float32)]
exp_loc_loss = (-5 * np.log(.8) - 0.5) / 3.0
groundtruth_boxes = np.stack([
np.array([[0.05, 0.05, 0.45, 0.45]], dtype=np.float32),
np.array([[0.0, 0.0, 0.5, 0.5]], dtype=np.float32)])
groundtruth_classes = np.stack([np.array([[1, 0]], dtype=np.float32),
np.array([[0, 1]], dtype=np.float32)])
execute_fn = self.execute_cpu
if use_static_shapes:
execute_fn = self.execute
results = execute_fn(graph_fn, [
anchors, rpn_box_encodings, rpn_objectness_predictions_with_background,
images, num_proposals, proposal_boxes, refined_box_encodings,
class_predictions_with_background, groundtruth_boxes,
groundtruth_classes
])
self.assertAllClose(results[0], exp_loc_loss, rtol=1e-4, atol=1e-4)
self.assertAllClose(results[1], 0.0)
self.assertAllClose(results[2], exp_loc_loss, rtol=1e-4, atol=1e-4)
self.assertAllClose(results[3], 0.0)
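For reference, the asserted localization loss can be reproduced outside the test with a few lines of NumPy. This is a standalone sketch; the smooth_l1 helper below is illustrative and not part of the test suite:

import numpy as np

def smooth_l1(x):
  # Huber loss with delta = 1, matching WeightedSmoothL1LocalizationLoss.
  return 0.5 * x**2 if abs(x) < 1 else abs(x) - 0.5

# The matched box is 4/5 of its anchor in height and width, so both scale
# encodings are 5 * log(4/5); normalizing by the 3 proposals per image over
# the batch of 2:
print(2 * smooth_l1(5 * np.log(4.0 / 5.0)) / 6)  # ~0.2052, the exp_loc_loss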
def test_loss_with_hard_mining(self):
model = self._build_model(is_training=True,
......@@ -1346,10 +1413,14 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
def test_loss_with_hard_mining_and_losses_mask(self):
model = self._build_model(is_training=True,
number_of_stages=2,
second_stage_batch_size=None,
first_stage_max_proposals=6,
hard_mining=True)
batch_size = 2
number_of_proposals = 3
anchors = tf.constant(
[[0, 0, 16, 16],
[0, 16, 16, 32],
......@@ -1361,63 +1432,77 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
BOX_CODE_SIZE], dtype=tf.float32)
# use different numbers for the objectness category to break ties in
# order of boxes returned by NMS
rpn_objectness_predictions_with_background = tf.constant(
[[[-10, 13],
[-10, 12],
[10, -11],
[10, -12]],
[[-10, 13],
[-10, 12],
[10, -11],
[10, -12]]], dtype=tf.float32)
image_shape = tf.constant([batch_size, 32, 32, 3], dtype=tf.int32)
# box_classifier_batch_size is 6, but here we assume that the number of
# actual proposals (not counting zero paddings) is fewer (3).
num_proposals = tf.constant([number_of_proposals, number_of_proposals],
dtype=tf.int32)
proposal_boxes = tf.constant(
[[[0, 0, 16, 16], # first image
[0, 16, 16, 32],
[16, 0, 32, 16],
[0, 0, 0, 0], # begin paddings
[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 0, 16, 16], # second image
[0, 16, 16, 32],
[16, 0, 32, 16],
[0, 0, 0, 0], # begin paddings
[0, 0, 0, 0],
[0, 0, 0, 0]]], dtype=tf.float32)
refined_box_encodings = tf.zeros(
(batch_size * model.max_num_proposals,
model.num_classes,
BOX_CODE_SIZE), dtype=tf.float32)
class_predictions_with_background = tf.constant(
[[-10, 10, -10], # first image
[-10, -10, 10],
[10, -10, -10],
[0, 0, 0], # begin paddings
[0, 0, 0],
[0, 0, 0],
[-10, 10, -10], # second image
[-10, -10, 10],
[10, -10, -10],
[0, 0, 0], # begin paddings
[0, 0, 0],
[0, 0, 0]], dtype=tf.float32)
# The first groundtruth box is 4/5 of the anchor size in both directions
# experiencing a loss of:
# 2 * SmoothL1(5 * log(4/5)) / (num_proposals * batch_size)
# = 2 * (abs(5 * log(4/5)) - .5) / 6
# The second groundtruth box is 46/50 of the anchor size in both directions
# experiencing a loss of:
# 2 * SmoothL1(5 * log(46/50)) / (num_proposals * batch_size)
# = 2 * (.5 * (5 * log(.92))^2) / 6.
# Since the first groundtruth box experiences greater loss, and we have
# set num_hard_examples=1 in the HardMiner, the final localization loss
# corresponds to that of the first groundtruth box.
groundtruth_boxes_list = [
tf.constant([[0.05, 0.05, 0.45, 0.45],
[0.02, 0.52, 0.48, 0.98]], dtype=tf.float32),
tf.constant([[0.05, 0.05, 0.45, 0.45],
[0.02, 0.52, 0.48, 0.98]], dtype=tf.float32)]
groundtruth_classes_list = [
tf.constant([[1, 0], [0, 1]], dtype=tf.float32),
tf.constant([[1, 0], [0, 1]], dtype=tf.float32)]
is_annotated_list = [tf.constant(True, dtype=tf.bool),
tf.constant(False, dtype=tf.bool)]
exp_loc_loss = (2 * (-5 * np.log(.8) - 0.5) /
(number_of_proposals * batch_size))
prediction_dict = {
'rpn_box_encodings': rpn_box_encodings,
'rpn_objectness_predictions_with_background':
......@@ -1427,24 +1512,20 @@ class FasterRCNNMetaArchTestBase(tf.test.TestCase):
'refined_box_encodings': refined_box_encodings,
'class_predictions_with_background': class_predictions_with_background,
'proposal_boxes': proposal_boxes,
'num_proposals': num_proposals
}
_, true_image_shapes = model.preprocess(tf.zeros(image_shape))
model.provide_groundtruth(groundtruth_boxes_list,
groundtruth_classes_list,
is_annotated_list=is_annotated_list)
loss_dict = model.loss(prediction_dict, true_image_shapes)
with self.test_session() as sess:
loss_dict_out = sess.run(loss_dict)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/localization_loss'], 0)
self.assertAllClose(loss_dict_out['Loss/RPNLoss/objectness_loss'], 0)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/localization_loss'], exp_loc_loss)
self.assertAllClose(loss_dict_out[
'Loss/BoxClassifierLoss/classification_loss'], 0)
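Likewise, the hard-mining expectation can be sanity-checked outside the test. A sketch with an illustrative smooth_l1 helper: with num_hard_examples=1 the miner keeps only the proposal with the larger per-box loss.

import numpy as np

def smooth_l1(x):
  return 0.5 * x**2 if abs(x) < 1 else abs(x) - 0.5

loss_box1 = 2 * smooth_l1(5 * np.log(4.0 / 5.0))    # ~1.231, linear branch
loss_box2 = 2 * smooth_l1(5 * np.log(46.0 / 50.0))  # ~0.174, quadratic branch
assert loss_box1 > loss_box2  # the first box's loss survives hard mining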
def test_restore_map_for_classification_ckpt(self):
# Define mock tensorflow classification graph and save variables.
......
......@@ -62,11 +62,11 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
first_stage_box_predictor_depth,
first_stage_minibatch_size,
first_stage_sampler,
first_stage_non_max_suppression_fn,
first_stage_max_proposals,
first_stage_localization_loss_weight,
first_stage_objectness_loss_weight,
crop_and_resize_fn,
second_stage_target_assigner,
second_stage_rfcn_box_predictor,
second_stage_batch_size,
......@@ -79,8 +79,9 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
hard_example_miner,
parallel_iterations=16,
add_summaries=True,
clip_anchors_to_image=False,
use_static_shapes=False,
resize_masks=False):
"""RFCNMetaArch Constructor.
Args:
......@@ -123,18 +124,22 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
only called "batch_size" due to terminology from the Faster R-CNN paper.
first_stage_sampler: The sampler for the boxes used to calculate the RPN
loss after the first stage.
first_stage_non_max_suppression_fn: batch_multiclass_non_max_suppression
callable that takes `boxes`, `scores` and optional `clip_window` (with
all other inputs already set) and returns a dictionary containing
tensors with keys: `detection_boxes`, `detection_scores`,
`detection_classes`, `num_detections`. This is used to perform non max
suppression on the boxes predicted by the Region Proposal Network
(RPN).
See `post_processing.batch_multiclass_non_max_suppression` for the type
and shape of these tensors.
first_stage_max_proposals: Maximum number of boxes to retain after
performing Non-Max Suppression (NMS) on the boxes predicted by the
Region Proposal Network (RPN).
first_stage_localization_loss_weight: A float
first_stage_objectness_loss_weight: A float
crop_and_resize_fn: A differentiable resampler to use for cropping RPN
proposal features.
second_stage_target_assigner: Target assigner to use for second stage of
R-FCN. If the model is configured with multiple prediction heads, this
target assigner is used to generate targets for all heads (with the
......@@ -168,12 +173,13 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
in parallel for calls to tf.map_fn.
add_summaries: boolean (default: True) controlling whether summary ops
should be added to tensorflow graph.
clip_anchors_to_image: The anchors generated are clipped to the
window size without filtering the non-overlapping anchors. This generates
a static number of anchors. This argument is unused.
use_static_shapes: If True, uses implementation of ops with static shape
guarantees.
resize_masks: Indicates whether the masks present in the groundtruth
should be resized in the model with `image_resizer_fn`.
Raises:
ValueError: If `second_stage_batch_size` > `first_stage_max_proposals`
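Callers that previously passed the two scalar thresholds now build the callable themselves, typically by partially applying post_processing.batch_multiclass_non_max_suppression, the same pattern the SSD test helper uses later in this diff. A minimal sketch; the threshold values here are illustrative only:

import functools
from object_detection.core import post_processing

first_stage_non_max_suppression_fn = functools.partial(
    post_processing.batch_multiclass_non_max_suppression,
    score_thresh=0.0,        # illustrative; the RPN typically uses 0
    iou_thresh=0.7,          # illustrative IOU threshold
    max_size_per_class=300,
    max_total_size=300)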
......@@ -196,11 +202,11 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
first_stage_box_predictor_depth,
first_stage_minibatch_size,
first_stage_sampler,
first_stage_non_max_suppression_fn,
first_stage_max_proposals,
first_stage_localization_loss_weight,
first_stage_objectness_loss_weight,
crop_and_resize_fn,
None, # initial_crop_size is not used in R-FCN
None, # maxpool_kernel_size is not used in R-FCN
None, # maxpool_stride is not used in R-FCN
......@@ -215,7 +221,11 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
second_stage_classification_loss,
1.0, # second stage mask prediction loss weight isn't used in R-FCN.
hard_example_miner,
parallel_iterations,
add_summaries,
clip_anchors_to_image,
use_static_shapes,
resize_masks)
self._rfcn_box_predictor = second_stage_rfcn_box_predictor
......
......@@ -225,10 +225,7 @@ class SSDMetaArch(model.DetectionModel):
box_predictor,
box_coder,
feature_extractor,
encode_background_as_zeros,
image_resizer_fn,
non_max_suppression_fn,
score_conversion_fn,
......@@ -238,14 +235,14 @@ class SSDMetaArch(model.DetectionModel):
localization_loss_weight,
normalize_loss_by_num_matches,
hard_example_miner,
target_assigner_instance,
add_summaries=True,
normalize_loc_loss_by_codesize=False,
freeze_batchnorm=False,
inplace_batchnorm_update=False,
add_background_class=True,
random_example_sampler=None,
expected_classification_loss_under_sampling=None):
"""SSDMetaArch Constructor.
TODO(rathodv,jonathanhuang): group NMS parameters + score converter into
......@@ -259,13 +256,9 @@ class SSDMetaArch(model.DetectionModel):
box_predictor: a box_predictor.BoxPredictor object.
box_coder: a box_coder.BoxCoder object.
feature_extractor: a SSDFeatureExtractor object.
encode_background_as_zeros: boolean determining whether background
targets are to be encoded as an all zeros vector or a one-hot
vector (where background is the 0th class).
image_resizer_fn: a callable for image resizing. This callable always
takes a rank-3 image tensor (corresponding to a single image) and
returns a rank-3 image tensor, possibly with new spatial dimensions and
......@@ -288,6 +281,7 @@ class SSDMetaArch(model.DetectionModel):
localization_loss_weight: float
normalize_loss_by_num_matches: boolean
hard_example_miner: a losses.HardExampleMiner object (can be None)
target_assigner_instance: target_assigner.TargetAssigner instance to use.
add_summaries: boolean (default: True) controlling whether summary ops
should be added to tensorflow graph.
normalize_loc_loss_by_codesize: whether to normalize localization loss
......@@ -312,7 +306,6 @@ class SSDMetaArch(model.DetectionModel):
the random sampled examples.
expected_classification_loss_under_sampling: If not None, used
to calculate classification loss by background/foreground weighting.
"""
super(SSDMetaArch, self).__init__(num_classes=box_predictor.num_classes)
self._is_training = is_training
......@@ -324,8 +317,6 @@ class SSDMetaArch(model.DetectionModel):
self._box_coder = box_coder
self._feature_extractor = feature_extractor
self._add_background_class = add_background_class
# Needed for fine-tuning from classification checkpoints whose
......@@ -347,14 +338,7 @@ class SSDMetaArch(model.DetectionModel):
self._unmatched_class_label = tf.constant((self.num_classes + 1) * [0],
tf.float32)
self._target_assigner = target_assigner_instance
self._classification_loss = classification_loss
self._localization_loss = localization_loss
......@@ -523,28 +507,25 @@ class SSDMetaArch(model.DetectionModel):
im_height=image_shape[1],
im_width=image_shape[2]))
if self._box_predictor.is_keras_model:
predictor_results_dict = self._box_predictor(feature_maps)
else:
with slim.arg_scope([slim.batch_norm],
is_training=(self._is_training and
not self._freeze_batchnorm),
updates_collections=batchnorm_updates_collections):
predictor_results_dict = self._box_predictor.predict(
feature_maps, self._anchor_generator.num_anchors_per_location())
predictions_dict = {
'preprocessed_inputs': preprocessed_inputs,
'feature_maps': feature_maps,
'anchors': self._anchors.get()
}
for prediction_key, prediction_list in iter(predictor_results_dict.items()):
prediction = tf.concat(prediction_list, axis=1)
if (prediction_key == 'box_encodings' and prediction.shape.ndims == 4 and
prediction.shape[2] == 1):
prediction = tf.squeeze(prediction, axis=2)
predictions_dict[prediction_key] = prediction
self._batched_prediction_tensor_names = [x for x in predictions_dict
if x != 'anchors']
return predictions_dict
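The loop above generalizes the old hand-written concat of box_encodings and class_predictions_with_background to every head the box predictor returns. A toy illustration of the shape handling, as a standalone sketch:

import tensorflow as tf

# Two feature maps with 4 and 2 anchor locations, one shared box each (q=1):
box_encodings_list = [tf.zeros([8, 4, 1, 4]), tf.zeros([8, 2, 1, 4])]
prediction = tf.concat(box_encodings_list, axis=1)  # shape [8, 6, 1, 4]
if prediction.shape.ndims == 4 and prediction.shape[2] == 1:
  prediction = tf.squeeze(prediction, axis=2)       # shape [8, 6, 4]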
......@@ -587,6 +568,10 @@ class SSDMetaArch(model.DetectionModel):
[batch_size, num_anchors, num_classes+1] containing class predictions
(logits) for each of the anchors. Note that this tensor *includes*
background class predictions.
4) mask_predictions: (optional) a 5-D float tensor of shape
[batch_size, num_anchors, q, mask_height, mask_width]. `q` can be
either number of classes or 1 depending on whether a separate mask is
predicted per class.
true_image_shapes: int32 tensor of shape [batch, 3] where each row is
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
......@@ -599,6 +584,8 @@ class SSDMetaArch(model.DetectionModel):
detection_classes: [batch, max_detections]
detection_keypoints: [batch, max_detections, num_keypoints, 2] (if
encoded in the prediction_dict 'box_encodings')
detection_masks: [batch_size, max_detections, mask_height, mask_width]
(optional)
num_detections: [batch]
Raises:
ValueError: if prediction_dict does not contain `box_encodings` or
......@@ -627,13 +614,14 @@ class SSDMetaArch(model.DetectionModel):
if detection_keypoints is not None:
additional_fields = {
fields.BoxListFields.keypoints: detection_keypoints}
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
nmsed_additional_fields, num_detections) = self._non_max_suppression_fn(
detection_boxes,
detection_scores,
clip_window=self._compute_clip_window(preprocessed_images,
true_image_shapes),
additional_fields=additional_fields,
masks=prediction_dict.get('mask_predictions'))
detection_dict = {
fields.DetectionResultFields.detection_boxes: nmsed_boxes,
fields.DetectionResultFields.detection_scores: nmsed_scores,
......@@ -645,6 +633,9 @@ class SSDMetaArch(model.DetectionModel):
fields.BoxListFields.keypoints in nmsed_additional_fields):
detection_dict[fields.DetectionResultFields.detection_keypoints] = (
nmsed_additional_fields[fields.BoxListFields.keypoints])
if nmsed_masks is not None:
detection_dict[
fields.DetectionResultFields.detection_masks] = nmsed_masks
return detection_dict
def loss(self, prediction_dict, true_image_shapes, scope=None):
......@@ -701,16 +692,22 @@ class SSDMetaArch(model.DetectionModel):
batch_cls_weights = tf.multiply(batch_sampled_indicator,
batch_cls_weights)
losses_mask = None
if self.groundtruth_has_field(fields.InputDataFields.is_annotated):
losses_mask = tf.stack(self.groundtruth_lists(
fields.InputDataFields.is_annotated))
location_losses = self._localization_loss(
prediction_dict['box_encodings'],
batch_reg_targets,
ignore_nan_targets=True,
weights=batch_reg_weights,
losses_mask=losses_mask)
cls_losses = self._classification_loss(
prediction_dict['class_predictions_with_background'],
batch_cls_targets,
weights=batch_cls_weights,
losses_mask=losses_mask)
if self._expected_classification_loss_under_sampling:
if cls_losses.get_shape().ndims == 3:
......@@ -734,12 +731,6 @@ class SSDMetaArch(model.DetectionModel):
self._hard_example_miner.summarize()
else:
cls_losses = ops.reduce_sum_trailing_dimensions(cls_losses, ndims=2)
localization_loss = tf.reduce_sum(location_losses)
classification_loss = tf.reduce_sum(cls_losses)
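The losses mask gates loss accumulation per image: any image whose is_annotated flag is False contributes nothing. A minimal NumPy sketch of the intended behavior, with illustrative names:

import numpy as np

per_image_losses = np.array([0.7, 1.2, 0.4])   # one loss per image in batch
is_annotated = np.array([True, True, False])   # third image is unlabeled
masked_losses = np.where(is_annotated, per_image_losses, 0.0)
print(masked_losses.sum())  # 1.9 -- the unlabeled image adds no loss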
......
......@@ -14,105 +14,26 @@
# ==============================================================================
"""Tests for object_detection.meta_architectures.ssd_meta_arch."""
import functools
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.meta_architectures import ssd_meta_arch_test_lib
from object_detection.utils import test_utils
slim = tf.contrib.slim
keras = tf.keras.layers
@parameterized.parameters(
{'use_keras': False},
{'use_keras': True},
)
class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
parameterized.TestCase):
def _create_model(self,
apply_hard_mining=True,
......@@ -123,96 +44,25 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
use_expected_classification_loss_under_sampling=False,
minimum_negative_sampling=1,
desired_negative_sampling_ratio=3,
use_keras=False,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5):
return super(SsdMetaArchTest, self)._create_model(
model_fn=ssd_meta_arch.SSDMetaArch,
apply_hard_mining=apply_hard_mining,
normalize_loc_loss_by_codesize=normalize_loc_loss_by_codesize,
add_background_class=add_background_class,
random_example_sampling=random_example_sampling,
weight_regression_loss_by_score=weight_regression_loss_by_score,
use_expected_classification_loss_under_sampling=
use_expected_classification_loss_under_sampling,
minimum_negative_sampling=minimum_negative_sampling,
desired_negative_sampling_ratio=desired_negative_sampling_ratio,
use_keras=use_keras,
predict_mask=predict_mask,
use_static_shapes=use_static_shapes,
nms_max_size_per_class=nms_max_size_per_class)
def test_preprocess_preserves_shapes_with_dynamic_input_image(
self, use_keras):
......@@ -360,6 +210,61 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
self.assertAllClose(detections_out['num_detections'],
expected_num_detections)
# BEGIN GOOGLE-INTERNAL
# TODO(b/112621326): Remove conditional after CMLE moves to TF 1.11
def test_postprocess_results_are_correct_static(self, use_keras):
with tf.Graph().as_default():
_, _, _, _ = self._create_model(use_keras=use_keras)
def graph_fn(input_image):
model, _, _, _ = self._create_model(use_static_shapes=True,
nms_max_size_per_class=4)
preprocessed_inputs, true_image_shapes = model.preprocess(input_image)
prediction_dict = model.predict(preprocessed_inputs,
true_image_shapes)
detections = model.postprocess(prediction_dict, true_image_shapes)
return (detections['detection_boxes'], detections['detection_scores'],
detections['detection_classes'], detections['num_detections'])
batch_size = 2
image_size = 2
channels = 3
input_image = np.random.rand(batch_size, image_size, image_size,
channels).astype(np.float32)
expected_boxes = [
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[0, 0, 0, 0] # padding
],
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[0, 0, 0, 0] # padding
]
]
expected_scores = [[0, 0, 0, 0], [0, 0, 0, 0]]
expected_classes = [[0, 0, 0, 0], [0, 0, 0, 0]]
expected_num_detections = np.array([3, 3])
(detection_boxes, detection_scores, detection_classes,
num_detections) = self.execute(graph_fn, [input_image])
for image_idx in range(batch_size):
self.assertTrue(test_utils.first_rows_close_as_set(
detection_boxes[image_idx][
0:expected_num_detections[image_idx]].tolist(),
expected_boxes[image_idx][0:expected_num_detections[image_idx]]))
self.assertAllClose(
detection_scores[image_idx][0:expected_num_detections[image_idx]],
expected_scores[image_idx][0:expected_num_detections[image_idx]])
self.assertAllClose(
detection_classes[image_idx][0:expected_num_detections[image_idx]],
expected_classes[image_idx][0:expected_num_detections[image_idx]])
self.assertAllClose(num_detections,
expected_num_detections)
# END GOOGLE-INTERNAL
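The loss tests below all follow the same test_case pattern: a graph_fn builds the model and returns loss tensors, and self.execute (or self.execute_cpu) runs it on NumPy inputs. Schematically, as a sketch of the shape of these tests rather than a new test:

def graph_fn(preprocessed_tensor):
  model, _, _, _ = self._create_model(apply_hard_mining=False)
  prediction_dict = model.predict(preprocessed_tensor, true_image_shapes=None)
  loss_dict = model.loss(prediction_dict, true_image_shapes=None)
  return (self._get_value_for_matching_key(loss_dict,
                                           'Loss/localization_loss'),)

# localization_loss, = self.execute(graph_fn, [preprocessed_input])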
def test_loss_results_are_correct(self, use_keras):
with tf.Graph().as_default():
......@@ -374,9 +279,10 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (self._get_value_for_matching_key(loss_dict,
'Loss/localization_loss'),
self._get_value_for_matching_key(loss_dict,
'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -413,7 +319,8 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (self._get_value_for_matching_key(loss_dict,
'Loss/localization_loss'),)
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -443,9 +350,10 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (self._get_value_for_matching_key(loss_dict,
'Loss/localization_loss'),
self._get_value_for_matching_key(loss_dict,
'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......@@ -591,6 +499,55 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
self.assertAllClose(localization_loss, expected_localization_loss)
self.assertAllClose(classification_loss, expected_classification_loss)
def test_loss_results_are_correct_with_losses_mask(self, use_keras):
with tf.Graph().as_default():
_, num_classes, num_anchors, _ = self._create_model(use_keras=use_keras)
def graph_fn(preprocessed_tensor, groundtruth_boxes1, groundtruth_boxes2,
groundtruth_boxes3, groundtruth_classes1, groundtruth_classes2,
groundtruth_classes3):
groundtruth_boxes_list = [groundtruth_boxes1, groundtruth_boxes2,
groundtruth_boxes3]
groundtruth_classes_list = [groundtruth_classes1, groundtruth_classes2,
groundtruth_classes3]
is_annotated_list = [tf.constant(True), tf.constant(True),
tf.constant(False)]
model, _, _, _ = self._create_model(apply_hard_mining=False)
model.provide_groundtruth(groundtruth_boxes_list,
groundtruth_classes_list,
is_annotated_list=is_annotated_list)
prediction_dict = model.predict(preprocessed_tensor,
true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (self._get_value_for_matching_key(loss_dict,
'Loss/localization_loss'),
self._get_value_for_matching_key(loss_dict,
'Loss/classification_loss'))
batch_size = 3
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
groundtruth_boxes1 = np.array([[0, 0, .5, .5]], dtype=np.float32)
groundtruth_boxes2 = np.array([[0, 0, .5, .5]], dtype=np.float32)
groundtruth_boxes3 = np.array([[0, 0, .5, .5]], dtype=np.float32)
groundtruth_classes1 = np.array([[1]], dtype=np.float32)
groundtruth_classes2 = np.array([[1]], dtype=np.float32)
groundtruth_classes3 = np.array([[1]], dtype=np.float32)
expected_localization_loss = 0.0
# Note that we are subtracting 1 from batch_size, since the final image is
# not annotated.
expected_classification_loss = ((batch_size - 1) * num_anchors
* (num_classes+1) * np.log(2.0))
(localization_loss,
classification_loss) = self.execute(graph_fn, [preprocessed_input,
groundtruth_boxes1,
groundtruth_boxes2,
groundtruth_boxes3,
groundtruth_classes1,
groundtruth_classes2,
groundtruth_classes3])
self.assertAllClose(localization_loss, expected_localization_loss)
self.assertAllClose(classification_loss, expected_classification_loss)
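The expected value asserted above has a closed form: the mock predictor emits all-zero logits, so every anchor pays sigmoid cross entropy of log(2) for each of its num_classes + 1 outputs, and only the two annotated images count. A standalone numeric check:

import numpy as np

batch_size, num_anchors, num_classes = 3, 4, 1
annotated = batch_size - 1                      # third image is unlabeled
print(annotated * num_anchors * (num_classes + 1) * np.log(2.0))  # ~11.09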
def test_restore_map_for_detection_ckpt(self, use_keras):
model, _, _, _ = self._create_model(use_keras=use_keras)
model.predict(tf.constant(np.array([[[[0, 0], [1, 1]], [[1, 0], [0, 1]]]],
......@@ -678,10 +635,8 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
use_keras):
with tf.Graph().as_default():
_, num_classes, _, _ = self._create_model(
random_example_sampling=True, use_keras=use_keras)
def graph_fn(preprocessed_tensor, groundtruth_boxes1, groundtruth_boxes2,
groundtruth_classes1, groundtruth_classes2):
......@@ -694,9 +649,10 @@ class SsdMetaArchTest(test_case.TestCase, parameterized.TestCase):
prediction_dict = model.predict(
preprocessed_tensor, true_image_shapes=None)
loss_dict = model.loss(prediction_dict, true_image_shapes=None)
return (self._get_value_for_matching_key(loss_dict,
'Loss/localization_loss'),
self._get_value_for_matching_key(loss_dict,
'Loss/classification_loss'))
batch_size = 2
preprocessed_input = np.random.rand(batch_size, 2, 2, 3).astype(np.float32)
......
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Helper functions for SSD models meta architecture tests."""
import functools
import tensorflow as tf
from object_detection.core import anchor_generator
from object_detection.core import balanced_positive_negative_sampler as sampler
from object_detection.core import box_list
from object_detection.core import losses
from object_detection.core import post_processing
from object_detection.core import region_similarity_calculator as sim_calc
from object_detection.core import target_assigner
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.utils import ops
from object_detection.utils import test_case
from object_detection.utils import test_utils
slim = tf.contrib.slim
keras = tf.keras.layers
class FakeSSDFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
"""Fake ssd feature extracture for ssd meta arch tests."""
def __init__(self):
super(FakeSSDFeatureExtractor, self).__init__(
is_training=True,
depth_multiplier=0,
min_depth=0,
pad_to_multiple=1,
conv_hyperparams_fn=None)
def preprocess(self, resized_inputs):
return tf.identity(resized_inputs)
def extract_features(self, preprocessed_inputs):
with tf.variable_scope('mock_model'):
features = slim.conv2d(
inputs=preprocessed_inputs,
num_outputs=32,
kernel_size=1,
scope='layer1')
return [features]
class FakeSSDKerasFeatureExtractor(ssd_meta_arch.SSDKerasFeatureExtractor):
"""Fake keras based ssd feature extracture for ssd meta arch tests."""
def __init__(self):
with tf.name_scope('mock_model'):
super(FakeSSDKerasFeatureExtractor, self).__init__(
is_training=True,
depth_multiplier=0,
min_depth=0,
pad_to_multiple=1,
conv_hyperparams_config=None,
freeze_batchnorm=False,
inplace_batchnorm_update=False,
)
self._conv = keras.Conv2D(filters=32, kernel_size=1, name='layer1')
def preprocess(self, resized_inputs):
return tf.identity(resized_inputs)
def _extract_features(self, preprocessed_inputs, **kwargs):
with tf.name_scope('mock_model'):
return [self._conv(preprocessed_inputs)]
class MockAnchorGenerator2x2(anchor_generator.AnchorGenerator):
"""A simple 2x2 anchor grid on the unit square used for test only."""
def name_scope(self):
return 'MockAnchorGenerator'
def num_anchors_per_location(self):
return [1]
def _generate(self, feature_map_shape_list, im_height, im_width):
return [
box_list.BoxList(
tf.constant(
[
[0, 0, .5, .5],
[0, .5, .5, 1],
[.5, 0, 1, .5],
[1., 1., 1.5, 1.5] # Anchor that is outside clip_window.
],
tf.float32))
]
def num_anchors(self):
return 4
class SSDMetaArchTestBase(test_case.TestCase):
"""Base class to test SSD based meta architectures."""
def _create_model(self,
model_fn=ssd_meta_arch.SSDMetaArch,
apply_hard_mining=True,
normalize_loc_loss_by_codesize=False,
add_background_class=True,
random_example_sampling=False,
weight_regression_loss_by_score=False,
use_expected_classification_loss_under_sampling=False,
minimum_negative_sampling=1,
desired_negative_sampling_ratio=3,
use_keras=False,
predict_mask=False,
use_static_shapes=False,
nms_max_size_per_class=5):
is_training = False
num_classes = 1
mock_anchor_generator = MockAnchorGenerator2x2()
if use_keras:
mock_box_predictor = test_utils.MockKerasBoxPredictor(
is_training, num_classes, predict_mask=predict_mask)
else:
mock_box_predictor = test_utils.MockBoxPredictor(
is_training, num_classes, predict_mask=predict_mask)
mock_box_coder = test_utils.MockBoxCoder()
if use_keras:
fake_feature_extractor = FakeSSDKerasFeatureExtractor()
else:
fake_feature_extractor = FakeSSDFeatureExtractor()
mock_matcher = test_utils.MockMatcher()
region_similarity_calculator = sim_calc.IouSimilarity()
encode_background_as_zeros = False
def image_resizer_fn(image):
return [tf.identity(image), tf.shape(image)]
classification_loss = losses.WeightedSigmoidClassificationLoss()
localization_loss = losses.WeightedSmoothL1LocalizationLoss()
non_max_suppression_fn = functools.partial(
post_processing.batch_multiclass_non_max_suppression,
score_thresh=-20.0,
iou_thresh=1.0,
max_size_per_class=nms_max_size_per_class,
max_total_size=nms_max_size_per_class,
use_static_shapes=use_static_shapes)
classification_loss_weight = 1.0
localization_loss_weight = 1.0
negative_class_weight = 1.0
normalize_loss_by_num_matches = False
hard_example_miner = None
if apply_hard_mining:
# This hard example miner is expected to be a no-op.
hard_example_miner = losses.HardExampleMiner(
num_hard_examples=None, iou_threshold=1.0)
random_example_sampler = None
if random_example_sampling:
random_example_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=0.5)
target_assigner_instance = target_assigner.TargetAssigner(
region_similarity_calculator,
mock_matcher,
mock_box_coder,
negative_class_weight=negative_class_weight,
weight_regression_loss_by_score=weight_regression_loss_by_score)
expected_classification_loss_under_sampling = None
if use_expected_classification_loss_under_sampling:
expected_classification_loss_under_sampling = functools.partial(
ops.expected_classification_loss_under_sampling,
minimum_negative_sampling=minimum_negative_sampling,
desired_negative_sampling_ratio=desired_negative_sampling_ratio)
code_size = 4
model = model_fn(
is_training=is_training,
anchor_generator=mock_anchor_generator,
box_predictor=mock_box_predictor,
box_coder=mock_box_coder,
feature_extractor=fake_feature_extractor,
encode_background_as_zeros=encode_background_as_zeros,
image_resizer_fn=image_resizer_fn,
non_max_suppression_fn=non_max_suppression_fn,
score_conversion_fn=tf.identity,
classification_loss=classification_loss,
localization_loss=localization_loss,
classification_loss_weight=classification_loss_weight,
localization_loss_weight=localization_loss_weight,
normalize_loss_by_num_matches=normalize_loss_by_num_matches,
hard_example_miner=hard_example_miner,
target_assigner_instance=target_assigner_instance,
add_summaries=False,
normalize_loc_loss_by_codesize=normalize_loc_loss_by_codesize,
freeze_batchnorm=False,
inplace_batchnorm_update=False,
add_background_class=add_background_class,
random_example_sampler=random_example_sampler,
expected_classification_loss_under_sampling=
expected_classification_loss_under_sampling)
return model, num_classes, mock_anchor_generator.num_anchors(), code_size
def _get_value_for_matching_key(self, dictionary, suffix):
for key in dictionary.keys():
if key.endswith(suffix):
return dictionary[key]
raise ValueError('key not found {}'.format(suffix))
if __name__ == '__main__':
tf.test.main()
......@@ -18,6 +18,7 @@ import tensorflow as tf
from object_detection.core import standard_fields
from object_detection.metrics import coco_tools
from object_detection.utils import json_utils
from object_detection.utils import object_detection_evaluation
......@@ -148,6 +149,19 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
detection_classes]))
self._image_ids[image_id] = True
def dump_detections_to_json_file(self, json_output_path):
"""Saves the detections into json_output_path in the format used by MS COCO.
Args:
json_output_path: String containing the output file's path. It can also
be None; in that case nothing will be written to the output file.
"""
if json_output_path:
with tf.gfile.GFile(json_output_path, 'w') as fid:
tf.logging.info('Dumping detections to output json file.')
json_utils.Dump(
obj=self._detection_boxes_list, fid=fid, float_digits=4, indent=2)
def evaluate(self):
"""Evaluates the detection boxes and returns a dictionary of coco metrics.
......@@ -245,10 +259,11 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
detection_boxes_batched, detection_scores_batched,
detection_classes_batched, num_det_boxes_per_image):
self.add_single_ground_truth_image_info(
image_id, {
'groundtruth_boxes': gt_box[:num_gt_box],
'groundtruth_classes': gt_class[:num_gt_box],
'groundtruth_is_crowd': gt_is_crowd[:num_gt_box]
})
self.add_single_detected_image_info(
image_id,
{'detection_boxes': det_box[:num_det_box],
......@@ -268,8 +283,7 @@ class CocoDetectionEvaluator(object_detection_evaluation.DetectionEvaluator):
detection_classes = eval_dict[detection_fields.detection_classes]
num_gt_boxes_per_image = eval_dict.get(
'num_groundtruth_boxes_per_image', None)
num_det_boxes_per_image = eval_dict.get('num_det_boxes_per_image', None)
if groundtruth_is_crowd is None:
groundtruth_is_crowd = tf.zeros_like(groundtruth_classes, dtype=tf.bool)
......@@ -491,6 +505,19 @@ class CocoMaskEvaluator(object_detection_evaluation.DetectionEvaluator):
detection_classes]))
self._image_ids_with_detections.update([image_id])
def dump_detections_to_json_file(self, json_output_path):
"""Saves the detections into json_output_path in the format used by MS COCO.
Args:
json_output_path: String containing the output file's path. It can also
be None; in that case nothing will be written to the output file.
"""
if json_output_path:
tf.logging.info('Dumping detections to output json file.')
with tf.gfile.GFile(json_output_path, 'w') as fid:
json_utils.Dump(
obj=self._detection_masks_list, fid=fid, float_digits=4, indent=2)
def evaluate(self):
"""Evaluates the detection masks and returns a dictionary of coco metrics.
......
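A typical call sequence for the new dump hooks, sketched with an illustrative output path:

from object_detection.metrics import coco_evaluation

evaluator = coco_evaluation.CocoDetectionEvaluator(categories)
# ... per image: add_single_ground_truth_image_info(...) and
#     add_single_detected_image_info(...) ...
metrics = evaluator.evaluate()
evaluator.dump_detections_to_json_file('/tmp/detections.json')  # None skips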
......@@ -24,14 +24,25 @@ from object_detection.core import standard_fields
from object_detection.metrics import coco_evaluation
def _get_categories_list():
return [{
'id': 1,
'name': 'person'
}, {
'id': 2,
'name': 'dog'
}, {
'id': 3,
'name': 'cat'
}]
class CocoDetectionEvaluationTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetections(self):
"""Tests that mAP is calculated correctly on GT and Detections."""
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
coco_evaluator.add_single_ground_truth_image_info(
image_id='image1',
groundtruth_dict={
......@@ -88,17 +99,8 @@ class CocoDetectionEvaluationTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetectionsSkipCrowd(self):
"""Tests computing mAP with is_crowd GT boxes skipped."""
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
coco_evaluator.add_single_ground_truth_image_info(
image_id='image1',
groundtruth_dict={
......@@ -124,17 +126,8 @@ class CocoDetectionEvaluationTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetectionsEmptyCrowd(self):
"""Tests computing mAP with empty is_crowd array passed in."""
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
coco_evaluator.add_single_ground_truth_image_info(
image_id='image1',
groundtruth_dict={
......@@ -160,11 +153,9 @@ class CocoDetectionEvaluationTest(tf.test.TestCase):
def testRejectionOnDuplicateGroundtruth(self):
"""Tests that groundtruth cannot be added more than once for an image."""
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
# Add groundtruth
image_key1 = 'img1'
groundtruth_boxes1 = np.array([[0, 0, 1, 1], [0, 0, 2, 2], [0, 0, 3, 3]],
dtype=float)
......@@ -189,11 +180,9 @@ class CocoDetectionEvaluationTest(tf.test.TestCase):
def testRejectionOnDuplicateDetections(self):
"""Tests that detections cannot be added more than once for an image."""
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
# Add groundtruth
coco_evaluator.add_single_ground_truth_image_info(
image_id='image1',
groundtruth_dict={
......@@ -227,10 +216,8 @@ class CocoDetectionEvaluationTest(tf.test.TestCase):
def testExceptionRaisedWithMissingGroundtruth(self):
"""Tests that exception is raised for detection with missing groundtruth."""
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
with self.assertRaises(ValueError):
coco_evaluator.add_single_detected_image_info(
image_id='image1',
......@@ -247,10 +234,8 @@ class CocoDetectionEvaluationTest(tf.test.TestCase):
class CocoEvaluationPyFuncTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetections(self):
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
image_id = tf.placeholder(tf.string, shape=())
groundtruth_boxes = tf.placeholder(tf.float32, shape=(None, 4))
groundtruth_classes = tf.placeholder(tf.float32, shape=(None))
......@@ -310,31 +295,22 @@ class CocoEvaluationPyFuncTest(tf.test.TestCase):
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP@.75IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (small)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@1'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@10'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (small)'], 1.0)
self.assertFalse(coco_evaluator._groundtruth_list)
self.assertFalse(coco_evaluator._detection_boxes_list)
self.assertFalse(coco_evaluator._image_ids)
def testGetOneMAPWithMatchingGroundtruthAndDetectionsPadded(self):
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
image_id = tf.placeholder(tf.string, shape=())
groundtruth_boxes = tf.placeholder(tf.float32, shape=(None, 4))
groundtruth_classes = tf.placeholder(tf.float32, shape=(None))
......@@ -415,24 +391,22 @@ class CocoEvaluationPyFuncTest(tf.test.TestCase):
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP@.75IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (small)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@1'], 0.83333331)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@10'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (small)'], 1.0)
self.assertFalse(coco_evaluator._groundtruth_list)
self.assertFalse(coco_evaluator._detection_boxes_list)
self.assertFalse(coco_evaluator._image_ids)
def testGetOneMAPWithMatchingGroundtruthAndDetectionsBatched(self):
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
batch_size = 3
image_id = tf.placeholder(tf.string, shape=(batch_size))
groundtruth_boxes = tf.placeholder(tf.float32, shape=(batch_size, None, 4))
......@@ -479,24 +453,22 @@ class CocoEvaluationPyFuncTest(tf.test.TestCase):
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP@.75IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (small)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@1'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@10'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (small)'], 1.0)
self.assertFalse(coco_evaluator._groundtruth_list)
self.assertFalse(coco_evaluator._detection_boxes_list)
self.assertFalse(coco_evaluator._image_ids)
def testGetOneMAPWithMatchingGroundtruthAndDetectionsPaddedBatches(self):
coco_evaluator = coco_evaluation.CocoDetectionEvaluator(
_get_categories_list())
batch_size = 3
image_id = tf.placeholder(tf.string, shape=(batch_size))
groundtruth_boxes = tf.placeholder(tf.float32, shape=(batch_size, None, 4))
......@@ -525,27 +497,40 @@ class CocoEvaluationPyFuncTest(tf.test.TestCase):
_, update_op = eval_metric_ops['DetectionBoxes_Precision/mAP']
with self.test_session() as sess:
sess.run(
update_op,
feed_dict={
image_id: ['image1', 'image2', 'image3'],
groundtruth_boxes:
np.array([[[100., 100., 200., 200.], [-1, -1, -1, -1]],
[[50., 50., 100., 100.], [-1, -1, -1, -1]],
[[25., 25., 50., 50.], [10., 10., 15., 15.]]]),
groundtruth_classes:
np.array([[1, -1], [3, -1], [2, 2]]),
num_gt_boxes_per_image:
np.array([1, 1, 2]),
detection_boxes:
np.array([[[100., 100., 200., 200.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[50., 50., 100., 100.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[25., 25., 50., 50.],
[10., 10., 15., 15.],
[10., 10., 15., 15.]]]),
detection_scores:
np.array([[.8, 0., 0.], [.7, 0., 0.], [.95, .9, 0.9]]),
detection_classes:
np.array([[1, -1, -1], [3, -1, -1], [2, 2, 2]]),
num_det_boxes_per_image:
np.array([1, 1, 3]),
})
# Check the number of bounding boxes added.
self.assertEqual(len(coco_evaluator._groundtruth_list), 4)
self.assertEqual(len(coco_evaluator._detection_boxes_list), 5)
metrics = {}
for key, (value_op, _) in eval_metric_ops.iteritems():
metrics[key] = value_op
......@@ -555,14 +540,14 @@ class CocoEvaluationPyFuncTest(tf.test.TestCase):
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP@.75IOU'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP (small)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@1'], 0.83333331)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@10'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (large)'], 1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (medium)'],
1.0)
self.assertAlmostEqual(metrics['DetectionBoxes_Recall/AR@100 (small)'], 1.0)
self.assertFalse(coco_evaluator._groundtruth_list)
self.assertFalse(coco_evaluator._detection_boxes_list)
......@@ -572,10 +557,7 @@ class CocoEvaluationPyFuncTest(tf.test.TestCase):
class CocoMaskEvaluationTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetections(self):
coco_evaluator = coco_evaluation.CocoMaskEvaluator(_get_categories_list())
coco_evaluator.add_single_ground_truth_image_info(
image_id='image1',
groundtruth_dict={
......@@ -657,10 +639,7 @@ class CocoMaskEvaluationTest(tf.test.TestCase):
class CocoMaskEvaluationPyFuncTest(tf.test.TestCase):
def testGetOneMAPWithMatchingGroundtruthAndDetections(self):
category_list = [{'id': 0, 'name': 'person'},
{'id': 1, 'name': 'cat'},
{'id': 2, 'name': 'dog'}]
coco_evaluator = coco_evaluation.CocoMaskEvaluator(category_list)
coco_evaluator = coco_evaluation.CocoMaskEvaluator(_get_categories_list())
image_id = tf.placeholder(tf.string, shape=())
groundtruth_boxes = tf.placeholder(tf.float32, shape=(None, 4))
groundtruth_classes = tf.placeholder(tf.float32, shape=(None))
@@ -756,5 +735,6 @@ class CocoMaskEvaluationPyFuncTest(tf.test.TestCase):
self.assertFalse(coco_evaluator._image_id_to_mask_shape_map)
self.assertFalse(coco_evaluator._detection_masks_list)
if __name__ == '__main__':
tf.test.main()
@@ -91,10 +91,8 @@ def read_data_and_evaluate(input_config, eval_config):
if input_config.WhichOneof('input_reader') == 'tf_record_input_reader':
input_paths = input_config.tf_record_input_reader.input_path
label_map = label_map_util.load_labelmap(input_config.label_map_path)
max_num_classes = max([item.id for item in label_map.item])
categories = label_map_util.convert_label_map_to_categories(
label_map, max_num_classes)
categories = label_map_util.create_categories_from_labelmap(
input_config.label_map_path)
object_detection_evaluators = evaluator.get_evaluators(
eval_config, categories)
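For reference, create_categories_from_labelmap folds the old two-step
pattern into a single call; a sketch of the equivalence (the
`label_map_path` variable is a placeholder):

from object_detection.utils import label_map_util

# Old pattern: load the label map, then convert it to a categories list.
label_map = label_map_util.load_labelmap(label_map_path)
max_num_classes = max(item.id for item in label_map.item)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes)

# New pattern: a single helper that wraps the two steps above.
categories = label_map_util.create_categories_from_labelmap(label_map_path)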
@@ -18,6 +18,7 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import functools
import os
@@ -43,9 +44,12 @@ MODEL_BUILD_UTIL_MAP = {
config_util.create_pipeline_proto_from_configs,
'merge_external_params_with_configs':
config_util.merge_external_params_with_configs,
'create_train_input_fn': inputs.create_train_input_fn,
'create_eval_input_fn': inputs.create_eval_input_fn,
'create_predict_input_fn': inputs.create_predict_input_fn,
'create_train_input_fn':
inputs.create_train_input_fn,
'create_eval_input_fn':
inputs.create_eval_input_fn,
'create_predict_input_fn':
inputs.create_predict_input_fn,
}
@@ -126,8 +130,9 @@ def unstack_batch(tensor_dict, unpad_groundtruth_tensors=True):
ValueError: If unpad_tensors is True and `tensor_dict` does not contain
`num_groundtruth_boxes` tensor.
"""
unbatched_tensor_dict = {key: tf.unstack(tensor)
for key, tensor in tensor_dict.items()}
unbatched_tensor_dict = {
key: tf.unstack(tensor) for key, tensor in tensor_dict.items()
}
if unpad_groundtruth_tensors:
if (fields.InputDataFields.num_groundtruth_boxes not in
unbatched_tensor_dict):
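The comprehension above turns each batched tensor into a per-image list;
when `unpad_groundtruth_tensors` is set, the padded rows are then sliced
off using `num_groundtruth_boxes`. A minimal sketch of that pattern (shapes
and names are illustrative):

import tensorflow as tf

boxes = tf.placeholder(tf.float32, shape=(2, 3, 4))  # [batch, max_boxes, 4]
num_boxes = tf.placeholder(tf.int32, shape=(2,))     # valid boxes per image
unstacked_boxes = tf.unstack(boxes)                  # two [3, 4] tensors
unstacked_nums = tf.unstack(num_boxes)               # two scalar tensors
# Drop the padded rows of each per-image tensor.
unpadded = [b[:n] for b, n in zip(unstacked_boxes, unstacked_nums)]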
@@ -206,8 +211,8 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
# Make sure to set the Keras learning phase. True during training,
# False for inference.
tf.keras.backend.set_learning_phase(is_training)
detection_model = detection_model_fn(is_training=is_training,
add_summaries=(not use_tpu))
detection_model = detection_model_fn(
is_training=is_training, add_summaries=(not use_tpu))
scaffold_fn = None
if mode == tf.estimator.ModeKeys.TRAIN:
@@ -237,6 +242,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
gt_weights_list = None
if fields.InputDataFields.groundtruth_weights in labels:
gt_weights_list = labels[fields.InputDataFields.groundtruth_weights]
gt_is_crowd_list = None
if fields.InputDataFields.groundtruth_is_crowd in labels:
gt_is_crowd_list = labels[fields.InputDataFields.groundtruth_is_crowd]
detection_model.provide_groundtruth(
@@ -248,8 +254,18 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
groundtruth_is_crowd_list=gt_is_crowd_list)
preprocessed_images = features[fields.InputDataFields.image]
prediction_dict = detection_model.predict(
preprocessed_images, features[fields.InputDataFields.true_image_shape])
if use_tpu and train_config.use_bfloat16:
with tf.contrib.tpu.bfloat16_scope():
prediction_dict = detection_model.predict(
preprocessed_images,
features[fields.InputDataFields.true_image_shape])
for k, v in prediction_dict.items():
if v.dtype == tf.bfloat16:
prediction_dict[k] = tf.cast(v, tf.float32)
else:
prediction_dict = detection_model.predict(
preprocessed_images,
features[fields.InputDataFields.true_image_shape])
if mode in (tf.estimator.ModeKeys.EVAL, tf.estimator.ModeKeys.PREDICT):
detections = detection_model.postprocess(
prediction_dict, features[fields.InputDataFields.true_image_shape])
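The bfloat16 branch above halves activation memory on TPU while keeping the
loss computation in float32; a condensed sketch of the same pattern (the
`model` object and its `predict` signature mirror the code above):

import tensorflow as tf

def predict_in_bfloat16(model, images, true_image_shapes):
  with tf.contrib.tpu.bfloat16_scope():
    prediction_dict = model.predict(images, true_image_shapes)
  # Cast bfloat16 outputs back so the float32 loss computation type-checks.
  return {k: tf.cast(v, tf.float32) if v.dtype == tf.bfloat16 else v
          for k, v in prediction_dict.items()}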
@@ -270,13 +286,16 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
train_config.load_all_detection_checkpoint_vars))
available_var_map = (
variables_helper.get_variables_available_in_checkpoint(
asg_map, train_config.fine_tune_checkpoint,
asg_map,
train_config.fine_tune_checkpoint,
include_global_step=False))
if use_tpu:
def tpu_scaffold():
tf.train.init_from_checkpoint(train_config.fine_tune_checkpoint,
available_var_map)
return tf.train.Scaffold()
scaffold_fn = tpu_scaffold
else:
tf.train.init_from_checkpoint(train_config.fine_tune_checkpoint,
@@ -290,8 +309,8 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
regularization_losses = tf.get_collection(
tf.GraphKeys.REGULARIZATION_LOSSES)
if regularization_losses:
regularization_loss = tf.add_n(regularization_losses,
name='regularization_loss')
regularization_loss = tf.add_n(
regularization_losses, name='regularization_loss')
losses.append(regularization_loss)
losses_dict['Loss/regularization_loss'] = regularization_loss
total_loss = tf.add_n(losses, name='total_loss')
@@ -353,14 +372,14 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
eval_metric_ops = None
scaffold = None
if mode == tf.estimator.ModeKeys.EVAL:
class_agnostic = (fields.DetectionResultFields.detection_classes
not in detections)
groundtruth = _prepare_groundtruth_for_eval(
detection_model, class_agnostic)
class_agnostic = (
fields.DetectionResultFields.detection_classes not in detections)
groundtruth = _prepare_groundtruth_for_eval(detection_model,
class_agnostic)
use_original_images = fields.InputDataFields.original_image in features
eval_images = (
features[fields.InputDataFields.original_image] if use_original_images
else features[fields.InputDataFields.image])
features[fields.InputDataFields.original_image]
if use_original_images else features[fields.InputDataFields.image])
eval_dict = eval_util.result_dict_for_single_example(
eval_images[0:1],
features[inputs.HASH_KEY][0],
@@ -374,28 +393,26 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
else:
category_index = label_map_util.create_category_index_from_labelmap(
eval_input_config.label_map_path)
img_summary = None
vis_metric_ops = None
if not use_tpu and use_original_images:
detection_and_groundtruth = (
vis_utils.draw_side_by_side_evaluation_image(
eval_dict,
category_index,
max_boxes_to_draw=eval_config.max_num_boxes_to_visualize,
min_score_thresh=eval_config.min_score_threshold,
use_normalized_coordinates=False))
img_summary = tf.summary.image('Detections_Left_Groundtruth_Right',
detection_and_groundtruth)
eval_metric_op_vis = vis_utils.VisualizeSingleFrameDetections(
category_index,
max_examples_to_draw=eval_config.num_visualizations,
max_boxes_to_draw=eval_config.max_num_boxes_to_visualize,
min_score_thresh=eval_config.min_score_threshold,
use_normalized_coordinates=False)
vis_metric_ops = eval_metric_op_vis.get_estimator_eval_metric_ops(
eval_dict)
# Eval metrics on a single example.
eval_metric_ops = eval_util.get_eval_metric_ops_for_evaluators(
eval_config,
category_index.values(),
eval_dict)
eval_config, category_index.values(), eval_dict)
for loss_key, loss_tensor in iter(losses_dict.items()):
eval_metric_ops[loss_key] = tf.metrics.mean(loss_tensor)
for var in optimizer_summary_vars:
eval_metric_ops[var.op.name] = (var, tf.no_op())
if img_summary is not None:
eval_metric_ops['Detections_Left_Groundtruth_Right'] = (
img_summary, tf.no_op())
if vis_metric_ops is not None:
eval_metric_ops.update(vis_metric_ops)
eval_metric_ops = {str(k): v for k, v in eval_metric_ops.items()}
if eval_config.use_moving_averages:
@@ -435,12 +452,14 @@ def create_estimator_and_inputs(run_config,
hparams,
pipeline_config_path,
train_steps=None,
eval_steps=None,
sample_1_of_n_eval_examples=1,
sample_1_of_n_eval_on_train_examples=1,
model_fn_creator=create_model_fn,
use_tpu_estimator=False,
use_tpu=False,
num_shards=1,
params=None,
override_eval_num_epochs=True,
**kwargs):
"""Creates `Estimator`, input functions, and steps.
@@ -450,8 +469,11 @@ def create_estimator_and_inputs(run_config,
pipeline_config_path: A path to a pipeline config file.
train_steps: Number of training steps. If None, the number of training steps
is set from the `TrainConfig` proto.
eval_steps: Number of evaluation steps per evaluation cycle. If None, the
number of evaluation steps is set from the `EvalConfig` proto.
sample_1_of_n_eval_examples: Integer representing how often an eval example
should be sampled. If 1, will sample all examples.
sample_1_of_n_eval_on_train_examples: Similar to
`sample_1_of_n_eval_examples`, except controls the sampling of training
data for evaluation.
model_fn_creator: A function that creates a `model_fn` for `Estimator`.
Follows the signature:
@@ -470,19 +492,20 @@ def create_estimator_and_inputs(run_config,
is True.
params: Parameter dictionary passed from the estimator. Only used if
`use_tpu_estimator` is True.
override_eval_num_epochs: Whether to overwrite the number of epochs to
1 for eval_input.
**kwargs: Additional keyword arguments for configuration override.
Returns:
A dictionary with the following fields:
'estimator': An `Estimator` or `TPUEstimator`.
'train_input_fn': A training input function.
'eval_input_fn': An evaluation input function.
'eval_input_fns': A list of all evaluation input functions.
'eval_input_names': A list of names for each evaluation input.
'eval_on_train_input_fn': An evaluation-on-train input function.
'predict_input_fn': A prediction input function.
'train_steps': Number of training steps. Either directly from input or from
configuration.
'eval_steps': Number of evaluation steps. Either directly from input or from
configuration.
"""
get_configs_from_pipeline_file = MODEL_BUILD_UTIL_MAP[
'get_configs_from_pipeline_file']
@@ -495,27 +518,37 @@ def create_estimator_and_inputs(run_config,
create_predict_input_fn = MODEL_BUILD_UTIL_MAP['create_predict_input_fn']
configs = get_configs_from_pipeline_file(pipeline_config_path)
kwargs.update({
'train_steps': train_steps,
'sample_1_of_n_eval_examples': sample_1_of_n_eval_examples,
'retain_original_images_in_eval': False if use_tpu else True,
})
if override_eval_num_epochs:
kwargs.update({'eval_num_epochs': 1})
tf.logging.warning(
'Forced number of epochs for all eval validations to be 1.')
configs = merge_external_params_with_configs(
configs,
hparams,
train_steps=train_steps,
eval_steps=eval_steps,
retain_original_images_in_eval=False if use_tpu else True,
**kwargs)
configs, hparams, kwargs_dict=kwargs)
model_config = configs['model']
train_config = configs['train_config']
train_input_config = configs['train_input_config']
eval_config = configs['eval_config']
eval_input_config = configs['eval_input_config']
eval_input_configs = configs['eval_input_configs']
eval_on_train_input_config = copy.deepcopy(train_input_config)
eval_on_train_input_config.sample_1_of_n_examples = (
sample_1_of_n_eval_on_train_examples)
if override_eval_num_epochs and eval_on_train_input_config.num_epochs != 1:
tf.logging.warning('Expected number of evaluation epochs is 1, but '
'instead encountered `eval_on_train_input_config'
'.num_epochs` = '
'{}. Overwriting `num_epochs` to 1.'.format(
eval_on_train_input_config.num_epochs))
eval_on_train_input_config.num_epochs = 1
# update train_steps from config but only when non-zero value is provided
if train_steps is None and train_config.num_steps != 0:
train_steps = train_config.num_steps
# update eval_steps from config but only when non-zero value is provided
if eval_steps is None and eval_config.num_examples != 0:
eval_steps = eval_config.num_examples
detection_model_fn = functools.partial(
model_builder.build, model_config=model_config)
@@ -524,18 +557,25 @@ def create_estimator_and_inputs(run_config,
train_config=train_config,
train_input_config=train_input_config,
model_config=model_config)
eval_input_fn = create_eval_input_fn(
eval_config=eval_config,
eval_input_config=eval_input_config,
model_config=model_config)
eval_input_fns = [
create_eval_input_fn(
eval_config=eval_config,
eval_input_config=eval_input_config,
model_config=model_config) for eval_input_config in eval_input_configs
]
eval_input_names = [
eval_input_config.name for eval_input_config in eval_input_configs
]
eval_on_train_input_fn = create_eval_input_fn(
eval_config=eval_config,
eval_input_config=train_input_config,
eval_input_config=eval_on_train_input_config,
model_config=model_config)
predict_input_fn = create_predict_input_fn(
model_config=model_config, predict_input_config=eval_input_config)
model_config=model_config, predict_input_config=eval_input_configs[0])
tf.logging.info('create_estimator_and_inputs: use_tpu %s', use_tpu)
export_to_tpu = hparams.get('export_to_tpu', False)
tf.logging.info('create_estimator_and_inputs: use_tpu %s, export_to_tpu %s',
use_tpu, export_to_tpu)
model_fn = model_fn_creator(detection_model_fn, configs, hparams, use_tpu)
if use_tpu_estimator:
estimator = tf.contrib.tpu.TPUEstimator(
@@ -546,95 +586,95 @@ def create_estimator_and_inputs(run_config,
use_tpu=use_tpu,
config=run_config,
# TODO(lzc): Remove conditional after CMLE moves to TF 1.9
# BEGIN GOOGLE-INTERNAL
export_to_tpu=export_to_tpu,
eval_on_tpu=False, # Eval runs on CPU, so disable eval on TPU
# END GOOGLE-INTERNAL
params=params if params else {})
else:
estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
# Write the as-run pipeline config to disk.
if run_config.is_chief:
pipeline_config_final = create_pipeline_proto_from_configs(
configs)
pipeline_config_final = create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_config_final, estimator.model_dir)
return dict(
estimator=estimator,
train_input_fn=train_input_fn,
eval_input_fn=eval_input_fn,
eval_input_fns=eval_input_fns,
eval_input_names=eval_input_names,
eval_on_train_input_fn=eval_on_train_input_fn,
predict_input_fn=predict_input_fn,
train_steps=train_steps,
eval_steps=eval_steps)
train_steps=train_steps)
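Putting the refactored pieces together, a caller would consume the returned
dictionary roughly as follows (the RunConfig, `hparams`, and config path are
placeholders):

import tensorflow as tf

train_and_eval_dict = create_estimator_and_inputs(
    run_config=tf.estimator.RunConfig(model_dir='/tmp/model_dir'),
    hparams=hparams,
    pipeline_config_path='pipeline.config')
train_spec, eval_specs = create_train_and_eval_specs(
    train_and_eval_dict['train_input_fn'],
    train_and_eval_dict['eval_input_fns'],
    train_and_eval_dict['eval_on_train_input_fn'],
    train_and_eval_dict['predict_input_fn'],
    train_and_eval_dict['train_steps'],
    eval_on_train_data=False)
# Only the first EvalSpec is consumed by train_and_evaluate today.
tf.estimator.train_and_evaluate(
    train_and_eval_dict['estimator'], train_spec, eval_specs[0])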
def create_train_and_eval_specs(train_input_fn,
eval_input_fn,
eval_input_fns,
eval_on_train_input_fn,
predict_input_fn,
train_steps,
eval_steps,
eval_on_train_data=False,
eval_on_train_steps=None,
final_exporter_name='Servo',
eval_spec_name='eval'):
eval_spec_names=None):
"""Creates a `TrainSpec` and `EvalSpec`s.
Args:
train_input_fn: Function that produces features and labels on train data.
eval_input_fn: Function that produces features and labels on eval data.
eval_input_fns: A list of functions that produce features and labels on eval
data.
eval_on_train_input_fn: Function that produces features and labels for
evaluation on train data.
predict_input_fn: Function that produces features for inference.
train_steps: Number of training steps.
eval_steps: Number of eval steps.
eval_on_train_data: Whether to evaluate model on training data. Default is
False.
eval_on_train_steps: Number of eval steps for training data. If not given,
uses eval_steps.
final_exporter_name: String name given to `FinalExporter`.
eval_spec_name: String name given to main `EvalSpec`.
eval_spec_names: A list of string names for each `EvalSpec`.
Returns:
Tuple of `TrainSpec` and list of `EvalSpecs`. The first `EvalSpec` is for
evaluation data. If `eval_on_train_data` is True, the second `EvalSpec` in
the list will correspond to training data.
Tuple of `TrainSpec` and list of `EvalSpec`s. If `eval_on_train_data` is
True, the last `EvalSpec` in the list will correspond to training data. The
remaining `EvalSpec`s correspond to the evaluation inputs.
"""
exporter = tf.estimator.FinalExporter(
name=final_exporter_name, serving_input_receiver_fn=predict_input_fn)
train_spec = tf.estimator.TrainSpec(
input_fn=train_input_fn, max_steps=train_steps)
eval_specs = [
tf.estimator.EvalSpec(
name=eval_spec_name,
input_fn=eval_input_fn,
steps=eval_steps,
exporters=exporter)
]
if eval_spec_names is None:
eval_spec_names = range(len(eval_input_fns))
eval_specs = []
for eval_spec_name, eval_input_fn in zip(eval_spec_names, eval_input_fns):
exporter_name = '{}_{}'.format(final_exporter_name, eval_spec_name)
exporter = tf.estimator.FinalExporter(
name=exporter_name, serving_input_receiver_fn=predict_input_fn)
eval_specs.append(
tf.estimator.EvalSpec(
name=eval_spec_name,
input_fn=eval_input_fn,
steps=None,
exporters=exporter))
if eval_on_train_data:
eval_specs.append(
tf.estimator.EvalSpec(
name='eval_on_train', input_fn=eval_on_train_input_fn,
steps=eval_on_train_steps or eval_steps))
name='eval_on_train', input_fn=eval_on_train_input_fn, steps=None))
return train_spec, eval_specs
def continuous_eval(estimator, model_dir, input_fn, eval_steps, train_steps,
name):
def continuous_eval(estimator, model_dir, input_fn, train_steps, name):
"""Perform continuous evaluation on checkpoints written to a model directory.
Args:
estimator: Estimator object to use for evaluation.
model_dir: Model directory to read checkpoints for continuous evaluation.
input_fn: Input function to use for evaluation.
eval_steps: Number of steps to run during each evaluation.
train_steps: Number of training steps. This is used to infer the last
checkpoint and stop evaluation loop.
name: Namescope for eval summary.
"""
def terminate_eval():
tf.logging.info('Terminating eval after 180 seconds of no checkpoints')
return True
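The loop below presumably builds on tf.contrib.training.checkpoints_iterator,
which blocks until a new checkpoint appears and invokes `timeout_fn` once the
timeout elapses; a sketch of that shape using the names in this function
(the exact arguments are assumptions):

for ckpt in tf.contrib.training.checkpoints_iterator(
    model_dir, timeout=180, timeout_fn=terminate_eval):
  eval_results = estimator.evaluate(
      input_fn=input_fn, steps=None, checkpoint_path=ckpt, name=name)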
@@ -646,10 +686,7 @@ def continuous_eval(estimator, model_dir, input_fn, eval_steps, train_steps,
tf.logging.info('Starting Evaluation.')
try:
eval_results = estimator.evaluate(
input_fn=input_fn,
steps=eval_steps,
checkpoint_path=ckpt,
name=name)
input_fn=input_fn, steps=None, checkpoint_path=ckpt, name=name)
tf.logging.info('Eval results: %s' % eval_results)
# Terminate eval job when final checkpoint is reached
@@ -713,10 +750,9 @@ def populate_experiment(run_config,
**kwargs)
estimator = train_and_eval_dict['estimator']
train_input_fn = train_and_eval_dict['train_input_fn']
eval_input_fn = train_and_eval_dict['eval_input_fn']
eval_input_fns = train_and_eval_dict['eval_input_fns']
predict_input_fn = train_and_eval_dict['predict_input_fn']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
export_strategies = [
tf.contrib.learn.utils.saved_model_export_utils.make_export_strategy(
@@ -726,8 +762,9 @@ def populate_experiment(run_config,
return tf.contrib.learn.Experiment(
estimator=estimator,
train_input_fn=train_input_fn,
eval_input_fn=eval_input_fn,
eval_input_fn=eval_input_fns[0],
train_steps=train_steps,
eval_steps=eval_steps,
eval_steps=None,
export_strategies=export_strategies,
eval_delay_secs=120,)
eval_delay_secs=120,
)
@@ -64,11 +64,13 @@ def _get_configs_for_model(model_name):
data_path = _get_data_path()
label_map_path = _get_labelmap_path()
configs = config_util.get_configs_from_pipeline_file(filename)
override_dict = {
'train_input_path': data_path,
'eval_input_path': data_path,
'label_map_path': label_map_path
}
configs = config_util.merge_external_params_with_configs(
configs,
train_input_path=data_path,
eval_input_path=data_path,
label_map_path=label_map_path)
configs, kwargs_dict=override_dict)
return configs
@@ -145,6 +147,9 @@ class ModelLibTest(tf.test.TestCase):
self.assertEqual(batch_size, detection_scores.shape.as_list()[0])
self.assertEqual(tf.float32, detection_scores.dtype)
self.assertEqual(tf.float32, num_detections.dtype)
if mode == 'eval':
self.assertIn('Detections_Left_Groundtruth_Right/0',
estimator_spec.eval_metric_ops)
if model_mode == tf.estimator.ModeKeys.TRAIN:
self.assertIsNotNone(estimator_spec.train_op)
return estimator_spec
@@ -225,21 +230,17 @@ class ModelLibTest(tf.test.TestCase):
hparams_overrides='load_pretrained=false')
pipeline_config_path = get_pipeline_config_path(MODEL_NAME_FOR_TEST)
train_steps = 20
eval_steps = 10
train_and_eval_dict = model_lib.create_estimator_and_inputs(
run_config,
hparams,
pipeline_config_path,
train_steps=train_steps,
eval_steps=eval_steps)
train_steps=train_steps)
estimator = train_and_eval_dict['estimator']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
self.assertIsInstance(estimator, tf.estimator.Estimator)
self.assertEqual(20, train_steps)
self.assertEqual(10, eval_steps)
self.assertIn('train_input_fn', train_and_eval_dict)
self.assertIn('eval_input_fn', train_and_eval_dict)
self.assertIn('eval_input_fns', train_and_eval_dict)
self.assertIn('eval_on_train_input_fn', train_and_eval_dict)
def test_create_estimator_with_default_train_eval_steps(self):
@@ -250,16 +251,13 @@ class ModelLibTest(tf.test.TestCase):
pipeline_config_path = get_pipeline_config_path(MODEL_NAME_FOR_TEST)
configs = config_util.get_configs_from_pipeline_file(pipeline_config_path)
config_train_steps = configs['train_config'].num_steps
config_eval_steps = configs['eval_config'].num_examples
train_and_eval_dict = model_lib.create_estimator_and_inputs(
run_config, hparams, pipeline_config_path)
estimator = train_and_eval_dict['estimator']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
self.assertIsInstance(estimator, tf.estimator.Estimator)
self.assertEqual(config_train_steps, train_steps)
self.assertEqual(config_eval_steps, eval_steps)
def test_create_tpu_estimator_and_inputs(self):
"""Tests that number of train/eval defaults to config values."""
@@ -269,21 +267,17 @@ class ModelLibTest(tf.test.TestCase):
hparams_overrides='load_pretrained=false')
pipeline_config_path = get_pipeline_config_path(MODEL_NAME_FOR_TEST)
train_steps = 20
eval_steps = 10
train_and_eval_dict = model_lib.create_estimator_and_inputs(
run_config,
hparams,
pipeline_config_path,
train_steps=train_steps,
eval_steps=eval_steps,
use_tpu_estimator=True)
estimator = train_and_eval_dict['estimator']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
self.assertIsInstance(estimator, tpu_estimator.TPUEstimator)
self.assertEqual(20, train_steps)
self.assertEqual(10, eval_steps)
def test_create_train_and_eval_specs(self):
"""Tests that `TrainSpec` and `EvalSpec` is created correctly."""
@@ -292,38 +286,32 @@ class ModelLibTest(tf.test.TestCase):
hparams_overrides='load_pretrained=false')
pipeline_config_path = get_pipeline_config_path(MODEL_NAME_FOR_TEST)
train_steps = 20
eval_steps = 10
eval_on_train_steps = 15
train_and_eval_dict = model_lib.create_estimator_and_inputs(
run_config,
hparams,
pipeline_config_path,
train_steps=train_steps,
eval_steps=eval_steps)
train_steps=train_steps)
train_input_fn = train_and_eval_dict['train_input_fn']
eval_input_fn = train_and_eval_dict['eval_input_fn']
eval_input_fns = train_and_eval_dict['eval_input_fns']
eval_on_train_input_fn = train_and_eval_dict['eval_on_train_input_fn']
predict_input_fn = train_and_eval_dict['predict_input_fn']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
train_spec, eval_specs = model_lib.create_train_and_eval_specs(
train_input_fn,
eval_input_fn,
eval_input_fns,
eval_on_train_input_fn,
predict_input_fn,
train_steps,
eval_steps,
eval_on_train_data=True,
eval_on_train_steps=eval_on_train_steps,
final_exporter_name='exporter',
eval_spec_name='holdout')
eval_spec_names=['holdout'])
self.assertEqual(train_steps, train_spec.max_steps)
self.assertEqual(2, len(eval_specs))
self.assertEqual(eval_steps, eval_specs[0].steps)
self.assertEqual(None, eval_specs[0].steps)
self.assertEqual('holdout', eval_specs[0].name)
self.assertEqual('exporter', eval_specs[0].exporters[0].name)
self.assertEqual(eval_on_train_steps, eval_specs[1].steps)
self.assertEqual('exporter_holdout', eval_specs[0].exporters[0].name)
self.assertEqual(None, eval_specs[1].steps)
self.assertEqual('eval_on_train', eval_specs[1].name)
def test_experiment(self):
@@ -339,7 +327,7 @@ class ModelLibTest(tf.test.TestCase):
train_steps=10,
eval_steps=20)
self.assertEqual(10, experiment.train_steps)
self.assertEqual(20, experiment.eval_steps)
self.assertEqual(None, experiment.eval_steps)
class UnbatchTensorsTest(tf.test.TestCase):
@@ -31,7 +31,16 @@ flags.DEFINE_string(
flags.DEFINE_string('pipeline_config_path', None, 'Path to pipeline config '
'file.')
flags.DEFINE_integer('num_train_steps', None, 'Number of train steps.')
flags.DEFINE_integer('num_eval_steps', None, 'Number of train steps.')
flags.DEFINE_boolean('eval_training_data', False,
'If training data should be evaluated for this job. Note '
'that one can only use this in eval-only mode, and '
'`checkpoint_dir` must be supplied.')
flags.DEFINE_integer('sample_1_of_n_eval_examples', 1, 'Will sample one of '
'every n eval input examples, where n is provided.')
flags.DEFINE_integer('sample_1_of_n_eval_on_train_examples', 5, 'Will sample '
'one of every n train input examples for evaluation, '
'where n is provided. This is only used if '
'`eval_training_data` is True.')
flags.DEFINE_string(
'hparams_overrides', None, 'Hyperparameter overrides, '
'represented as a string containing comma-separated '
@@ -44,8 +53,6 @@ flags.DEFINE_boolean(
'run_once', False, 'If running in eval-only mode, whether to run just '
'one round of eval vs running continuously (default).'
)
flags.DEFINE_boolean('eval_training_data', False,
'If training data should be evaluated for this job.')
FLAGS = flags.FLAGS
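Conceptually, sampling one of every n examples is the same as keeping shard 0
of n shards; a toy tf.data illustration (the real input pipeline may
implement this differently):

import tensorflow as tf

dataset = tf.data.Dataset.range(10)
sampled = dataset.shard(num_shards=5, index=0)  # yields 0 and 5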
@@ -59,14 +66,15 @@ def main(unused_argv):
hparams=model_hparams.create_hparams(FLAGS.hparams_overrides),
pipeline_config_path=FLAGS.pipeline_config_path,
train_steps=FLAGS.num_train_steps,
eval_steps=FLAGS.num_eval_steps)
sample_1_of_n_eval_examples=FLAGS.sample_1_of_n_eval_examples,
sample_1_of_n_eval_on_train_examples=(
FLAGS.sample_1_of_n_eval_on_train_examples))
estimator = train_and_eval_dict['estimator']
train_input_fn = train_and_eval_dict['train_input_fn']
eval_input_fn = train_and_eval_dict['eval_input_fn']
eval_input_fns = train_and_eval_dict['eval_input_fns']
eval_on_train_input_fn = train_and_eval_dict['eval_on_train_input_fn']
predict_input_fn = train_and_eval_dict['predict_input_fn']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
if FLAGS.checkpoint_dir:
if FLAGS.eval_training_data:
@@ -74,23 +82,23 @@ def main(unused_argv):
input_fn = eval_on_train_input_fn
else:
name = 'validation_data'
input_fn = eval_input_fn
# The first eval input will be evaluated.
input_fn = eval_input_fns[0]
if FLAGS.run_once:
estimator.evaluate(input_fn,
eval_steps,
steps=None,
checkpoint_path=tf.train.latest_checkpoint(
FLAGS.checkpoint_dir))
else:
model_lib.continuous_eval(estimator, FLAGS.model_dir, input_fn,
eval_steps, train_steps, name)
model_lib.continuous_eval(estimator, FLAGS.checkpoint_dir, input_fn,
train_steps, name)
else:
train_spec, eval_specs = model_lib.create_train_and_eval_specs(
train_input_fn,
eval_input_fn,
eval_input_fns,
eval_on_train_input_fn,
predict_input_fn,
train_steps,
eval_steps,
eval_on_train_data=False)
# Currently only a single Eval Spec is allowed.
@@ -62,15 +62,20 @@ flags.DEFINE_integer('train_batch_size', None, 'Batch size for training. If '
flags.DEFINE_string(
'hparams_overrides', None, 'Comma-separated list of '
'hyperparameters to override defaults.')
flags.DEFINE_integer('num_train_steps', None, 'Number of train steps.')
flags.DEFINE_boolean('eval_training_data', False,
'If training data should be evaluated for this job.')
flags.DEFINE_integer('sample_1_of_n_eval_examples', 1, 'Will sample one of '
'every n eval input examples, where n is provided.')
flags.DEFINE_integer('sample_1_of_n_eval_on_train_examples', 5, 'Will sample '
'one of every n train input examples for evaluation, '
'where n is provided. This is only used if '
'`eval_training_data` is True.')
flags.DEFINE_string(
'model_dir', None, 'Path to output model directory '
'where event and checkpoint files will be written.')
flags.DEFINE_string('pipeline_config_path', None, 'Path to pipeline config '
'file.')
flags.DEFINE_integer('num_train_steps', None, 'Number of train steps.')
flags.DEFINE_integer('num_eval_steps', None, 'Number of train steps.')
FLAGS = tf.flags.FLAGS
@@ -103,17 +108,18 @@ def main(unused_argv):
hparams=model_hparams.create_hparams(FLAGS.hparams_overrides),
pipeline_config_path=FLAGS.pipeline_config_path,
train_steps=FLAGS.num_train_steps,
eval_steps=FLAGS.num_eval_steps,
sample_1_of_n_eval_examples=FLAGS.sample_1_of_n_eval_examples,
sample_1_of_n_eval_on_train_examples=(
FLAGS.sample_1_of_n_eval_on_train_examples),
use_tpu_estimator=True,
use_tpu=FLAGS.use_tpu,
num_shards=FLAGS.num_shards,
**kwargs)
estimator = train_and_eval_dict['estimator']
train_input_fn = train_and_eval_dict['train_input_fn']
eval_input_fn = train_and_eval_dict['eval_input_fn']
eval_input_fns = train_and_eval_dict['eval_input_fns']
eval_on_train_input_fn = train_and_eval_dict['eval_on_train_input_fn']
train_steps = train_and_eval_dict['train_steps']
eval_steps = train_and_eval_dict['eval_steps']
if FLAGS.mode == 'train':
estimator.train(input_fn=train_input_fn, max_steps=train_steps)
@@ -125,9 +131,10 @@ def main(unused_argv):
input_fn = eval_on_train_input_fn
else:
name = 'validation_data'
input_fn = eval_input_fn
model_lib.continuous_eval(estimator, FLAGS.model_dir, input_fn, eval_steps,
train_steps, name)
# Currently only a single eval input is allowed.
input_fn = eval_input_fns[0]
model_lib.continuous_eval(estimator, FLAGS.model_dir, input_fn, train_steps,
name)
if __name__ == '__main__':
@@ -24,6 +24,7 @@ Feature map generators build on the base feature extractors and produce a list
of final feature maps.
"""
import collections
import functools
import tensorflow as tf
from object_detection.utils import ops
slim = tf.contrib.slim
@@ -45,6 +46,220 @@ def get_depth_fn(depth_multiplier, min_depth):
return multiply_depth
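get_depth_fn scales a requested depth by depth_multiplier and clamps it from
below at min_depth; a worked sketch consistent with the GetDepthFunctionTest
expectation further down (the exact rounding is an assumption):

def depth_fn(depth, depth_multiplier=0.5, min_depth=16):
  # Scale, then clamp from below at min_depth.
  return max(int(depth * depth_multiplier), min_depth)

depth_fn(64)   # -> 32
depth_fn(128)  # -> 64
depth_fn(16)   # -> 16 (8 would fall below min_depth)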
class KerasMultiResolutionFeatureMaps(tf.keras.Model):
"""Generates multi resolution feature maps from input image features.
A Keras model that generates multi-scale feature maps for detection as in the
SSD papers by Liu et al: https://arxiv.org/pdf/1512.02325v2.pdf, see Sec 2.1.
More specifically, when called on inputs it performs the following two tasks:
1) If a layer name is provided in the configuration, returns that layer as a
feature map.
2) If a layer name is left as an empty string, constructs a new feature map
based on the spatial shape and depth configuration. Note that the current
implementation only supports generating new layers using convolution of
stride 2 resulting in a spatial resolution reduction by a factor of 2.
By default the convolution kernel size is set to 3, and it can be customized
by the caller.
An example of the configuration for Inception V3:
{
'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
'layer_depth': [-1, -1, -1, 512, 256, 128]
}
When this feature generator object is called on input image_features:
Args:
image_features: A dictionary of handles to activation tensors from the
base feature extractor.
Returns:
feature_maps: an OrderedDict mapping keys (feature map names) to
tensors where each tensor has shape [batch, height_i, width_i, depth_i].
"""
def __init__(self,
feature_map_layout,
depth_multiplier,
min_depth,
insert_1x1_conv,
is_training,
conv_hyperparams,
freeze_batchnorm,
name=None):
"""Constructor.
Args:
feature_map_layout: Dictionary of specifications for the feature map
layouts in the following format (Inception V2/V3 respectively):
{
'from_layer': ['Mixed_3c', 'Mixed_4c', 'Mixed_5c', '', '', ''],
'layer_depth': [-1, -1, -1, 512, 256, 128]
}
or
{
'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
'layer_depth': [-1, -1, -1, 512, 256, 128]
}
If 'from_layer' is specified, the specified feature map is directly used
as a box predictor layer, and the layer_depth is directly inferred from
the feature map (instead of using the provided 'layer_depth' parameter).
In this case, our convention is to set 'layer_depth' to -1 for clarity.
Otherwise, if 'from_layer' is an empty string, then the box predictor
layer will be built from the previous layer using convolution
operations. Note that the current implementation only supports
generating new layers using convolutions of stride 2 (resulting in a
spatial resolution reduction by a factor of 2), and will be extended to
a more flexible design. Convolution kernel size is set to 3 by default,
and can be customized by the 'conv_kernel_size' parameter (similarly,
'conv_kernel_size' should be set to -1 if 'from_layer' is specified).
The created convolution operation will be a normal 2D convolution by
default, and a depthwise convolution followed by 1x1 convolution if
'use_depthwise' is set to True.
depth_multiplier: Depth multiplier for convolutional layers.
min_depth: Minimum depth for convolutional layers.
insert_1x1_conv: A boolean indicating whether an additional 1x1
convolution should be inserted before shrinking the feature map.
is_training: Indicates whether the feature generator is in training mode.
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops.
freeze_batchnorm: Bool. Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
name: A string name scope to assign to the model. If 'None', Keras
will auto-generate one from the class name.
"""
super(KerasMultiResolutionFeatureMaps, self).__init__(name=name)
self.feature_map_layout = feature_map_layout
self.convolutions = []
depth_fn = get_depth_fn(depth_multiplier, min_depth)
base_from_layer = ''
use_explicit_padding = False
if 'use_explicit_padding' in feature_map_layout:
use_explicit_padding = feature_map_layout['use_explicit_padding']
use_depthwise = False
if 'use_depthwise' in feature_map_layout:
use_depthwise = feature_map_layout['use_depthwise']
for index, from_layer in enumerate(feature_map_layout['from_layer']):
net = tf.keras.Sequential(name='output_%d' % index)
self.convolutions.append(net)
layer_depth = feature_map_layout['layer_depth'][index]
conv_kernel_size = 3
if 'conv_kernel_size' in feature_map_layout:
conv_kernel_size = feature_map_layout['conv_kernel_size'][index]
if from_layer:
base_from_layer = from_layer
else:
if insert_1x1_conv:
layer_name = '{}_1_Conv2d_{}_1x1_{}'.format(
base_from_layer, index, depth_fn(layer_depth / 2))
net.add(tf.keras.layers.Conv2D(depth_fn(layer_depth / 2),
[1, 1],
padding='SAME',
strides=1,
name=layer_name + '_conv',
**conv_hyperparams.params()))
net.add(
conv_hyperparams.build_batch_norm(
training=(is_training and not freeze_batchnorm),
name=layer_name + '_batchnorm'))
net.add(
conv_hyperparams.build_activation_layer(
name=layer_name))
layer_name = '{}_2_Conv2d_{}_{}x{}_s2_{}'.format(
base_from_layer, index, conv_kernel_size, conv_kernel_size,
depth_fn(layer_depth))
stride = 2
padding = 'SAME'
if use_explicit_padding:
padding = 'VALID'
# We define this function here while capturing the value of
# conv_kernel_size, to avoid holding a reference to the loop variable
# conv_kernel_size inside of a lambda function
def fixed_padding(features, kernel_size=conv_kernel_size):
  return ops.fixed_padding(features, kernel_size)
net.add(tf.keras.layers.Lambda(fixed_padding))
# TODO(rathodv): Add some utilities to simplify the creation of
# Depthwise & non-depthwise convolutions w/ normalization & activations
if use_depthwise:
net.add(tf.keras.layers.DepthwiseConv2D(
[conv_kernel_size, conv_kernel_size],
depth_multiplier=1,
padding=padding,
strides=stride,
name=layer_name + '_depthwise_conv',
**conv_hyperparams.params()))
net.add(
conv_hyperparams.build_batch_norm(
training=(is_training and not freeze_batchnorm),
name=layer_name + '_depthwise_batchnorm'))
net.add(
conv_hyperparams.build_activation_layer(
name=layer_name + '_depthwise'))
net.add(tf.keras.layers.Conv2D(depth_fn(layer_depth), [1, 1],
padding='SAME',
strides=1,
name=layer_name + '_conv',
**conv_hyperparams.params()))
net.add(
conv_hyperparams.build_batch_norm(
training=(is_training and not freeze_batchnorm),
name=layer_name + '_batchnorm'))
net.add(
conv_hyperparams.build_activation_layer(
name=layer_name))
else:
net.add(tf.keras.layers.Conv2D(depth_fn(layer_depth),
[conv_kernel_size, conv_kernel_size],
padding=padding,
strides=stride,
name=layer_name + '_conv',
**conv_hyperparams.params()))
net.add(
conv_hyperparams.build_batch_norm(
training=(is_training and not freeze_batchnorm),
name=layer_name + '_batchnorm'))
net.add(
conv_hyperparams.build_activation_layer(
name=layer_name))
def call(self, image_features):
"""Generate the multi-resolution feature maps.
Executed when calling the `.__call__` method on input.
Args:
image_features: A dictionary of handles to activation tensors from the
base feature extractor.
Returns:
feature_maps: an OrderedDict mapping keys (feature map names) to
tensors where each tensor has shape [batch, height_i, width_i, depth_i].
"""
feature_maps = []
feature_map_keys = []
for index, from_layer in enumerate(self.feature_map_layout['from_layer']):
if from_layer:
feature_map = image_features[from_layer]
feature_map_keys.append(from_layer)
else:
feature_map = feature_maps[-1]
feature_map = self.convolutions[index](feature_map)
layer_name = self.convolutions[index].layers[-1].name
feature_map_keys.append(layer_name)
feature_maps.append(feature_map)
return collections.OrderedDict(
[(x, y) for (x, y) in zip(feature_map_keys, feature_maps)])
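A minimal usage sketch of the Keras generator (the `conv_hyperparams` object
is a placeholder built via hyperparams_builder.KerasLayerHyperparams, as in
the tests below):

import tensorflow as tf

feature_map_generator = KerasMultiResolutionFeatureMaps(
    feature_map_layout={
        'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
        'layer_depth': [-1, -1, -1, 512, 256, 128],
    },
    depth_multiplier=1,
    min_depth=32,
    insert_1x1_conv=True,
    is_training=True,
    conv_hyperparams=conv_hyperparams,
    freeze_batchnorm=False,
    name='FeatureMaps')
feature_maps = feature_map_generator({
    'Mixed_5d': tf.random_uniform([4, 35, 35, 256]),
    'Mixed_6e': tf.random_uniform([4, 17, 17, 576]),
    'Mixed_7c': tf.random_uniform([4, 8, 8, 1024]),
})  # OrderedDict: three passthrough maps plus new 512/256/128-depth maps.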
def multi_resolution_feature_maps(feature_map_layout, depth_multiplier,
min_depth, insert_1x1_conv, image_features):
"""Generates multi resolution feature maps from input image features.
@@ -77,7 +292,7 @@ def multi_resolution_feature_maps(feature_map_layout, depth_multiplier,
}
or
{
'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', '', ''],
'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
'layer_depth': [-1, -1, -1, 512, 256, 128]
}
If 'from_layer' is specified, the specified feature map is directly used
@@ -179,7 +394,10 @@ def multi_resolution_feature_maps(feature_map_layout, depth_multiplier,
[(x, y) for (x, y) in zip(feature_map_keys, feature_maps)])
def fpn_top_down_feature_maps(image_features, depth, scope=None):
def fpn_top_down_feature_maps(image_features,
depth,
use_depthwise=False,
scope=None):
"""Generates `top-down` feature maps for Feature Pyramid Networks.
See https://arxiv.org/abs/1612.03144 for details.
@@ -189,6 +407,7 @@ def fpn_top_down_feature_maps(image_features, depth, scope=None):
Spatial resolutions of successive tensors must reduce exactly by a factor
of 2.
depth: depth of output feature maps.
use_depthwise: use depthwise separable conv instead of regular conv.
scope: A scope name to wrap this op under.
Returns:
@@ -200,7 +419,7 @@ def fpn_top_down_feature_maps(image_features, depth, scope=None):
output_feature_maps_list = []
output_feature_map_keys = []
with slim.arg_scope(
[slim.conv2d], padding='SAME', stride=1):
[slim.conv2d, slim.separable_conv2d], padding='SAME', stride=1):
top_down = slim.conv2d(
image_features[-1][1],
depth, [1, 1], activation_fn=None, normalizer_fn=None,
@@ -216,7 +435,11 @@ def fpn_top_down_feature_maps(image_features, depth, scope=None):
activation_fn=None, normalizer_fn=None,
scope='projection_%d' % (level + 1))
top_down += residual
output_feature_maps_list.append(slim.conv2d(
if use_depthwise:
conv_op = functools.partial(slim.separable_conv2d, depth_multiplier=1)
else:
conv_op = slim.conv2d
output_feature_maps_list.append(conv_op(
top_down,
depth, [3, 3],
scope='smoothing_%d' % (level + 1)))
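Each top-down step therefore projects the lateral feature map to `depth`
with a 1x1 convolution, adds it to the coarser map, and smooths the sum with
a 3x3 convolution; a simplified sketch of one step (the upsampling call is
an assumption, since the hunk above does not show how the coarser map is
resized):

import tensorflow as tf
slim = tf.contrib.slim

def fpn_merge_step(top_down, lateral, depth, level):
  # 1x1 projection of the lateral feature to the common FPN depth.
  residual = slim.conv2d(lateral, depth, [1, 1], activation_fn=None,
                         normalizer_fn=None, scope='projection_%d' % level)
  # Upsample the coarser map to the lateral resolution, then add.
  top_down = tf.image.resize_nearest_neighbor(
      top_down, tf.shape(residual)[1:3]) + residual
  # 3x3 smoothing convolution on the merged map.
  return slim.conv2d(top_down, depth, [3, 3],
                     scope='smoothing_%d' % level)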
@@ -226,7 +449,7 @@ def fpn_top_down_feature_maps(image_features, depth, scope=None):
def pooling_pyramid_feature_maps(base_feature_map_depth, num_layers,
image_features):
image_features, replace_pool_with_conv=False):
"""Generates pooling pyramid feature maps.
The pooling pyramid feature maps is motivated by
@@ -250,6 +473,8 @@ def pooling_pyramid_feature_maps(base_feature_map_depth, num_layers,
from the base feature.
image_features: A dictionary of handles to activation tensors from the
feature extractor.
replace_pool_with_conv: Whether or not to replace pooling operations with
convolutions in the PPN. Default is False.
Returns:
feature_maps: an OrderedDict mapping keys (feature map names) to
@@ -279,12 +504,22 @@ def pooling_pyramid_feature_maps(base_feature_map_depth, num_layers,
feature_map_keys.append(feature_map_key)
feature_maps.append(image_features)
feature_map = image_features
with slim.arg_scope([slim.max_pool2d], padding='SAME', stride=2):
for i in range(num_layers - 1):
feature_map_key = 'MaxPool2d_%d_2x2' % i
feature_map = slim.max_pool2d(
feature_map, [2, 2], padding='SAME', scope=feature_map_key)
feature_map_keys.append(feature_map_key)
feature_maps.append(feature_map)
if replace_pool_with_conv:
with slim.arg_scope([slim.conv2d], padding='SAME', stride=2):
for i in range(num_layers - 1):
feature_map_key = 'Conv2d_{}_3x3_s2_{}'.format(i,
base_feature_map_depth)
feature_map = slim.conv2d(
feature_map, base_feature_map_depth, [3, 3], scope=feature_map_key)
feature_map_keys.append(feature_map_key)
feature_maps.append(feature_map)
else:
with slim.arg_scope([slim.max_pool2d], padding='SAME', stride=2):
for i in range(num_layers - 1):
feature_map_key = 'MaxPool2d_%d_2x2' % i
feature_map = slim.max_pool2d(
feature_map, [2, 2], padding='SAME', scope=feature_map_key)
feature_map_keys.append(feature_map_key)
feature_maps.append(feature_map)
return collections.OrderedDict(
[(x, y) for (x, y) in zip(feature_map_keys, feature_maps)])
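With 'SAME' padding and stride 2, each pyramid level (max pool or conv
alike) produces a spatial size of ceil(size / 2), which is where the
19 -> 10 -> 5 -> 3 -> 2 -> 1 progression in the tests further down comes
from:

import math

size = 19
for _ in range(5):
  size = int(math.ceil(size / 2.0))
  print(size)  # 10, 5, 3, 2, 1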
@@ -15,9 +15,15 @@
"""Tests for feature map generators."""
from absl.testing import parameterized
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.models import feature_map_generators
from object_detection.protos import hyperparams_pb2
INCEPTION_V2_LAYOUT = {
'from_layer': ['Mixed_3c', 'Mixed_4c', 'Mixed_5c', '', '', ''],
@@ -40,21 +46,60 @@ EMBEDDED_SSD_MOBILENET_V1_LAYOUT = {
}
# TODO(rathodv): add tests with different anchor strides.
@parameterized.parameters(
{'use_keras': False},
{'use_keras': True},
)
class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
def test_get_expected_feature_map_shapes_with_inception_v2(self):
def _build_conv_hyperparams(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.KerasLayerHyperparams(conv_hyperparams)
def _build_feature_map_generator(self, feature_map_layout, use_keras):
if use_keras:
return feature_map_generators.KerasMultiResolutionFeatureMaps(
feature_map_layout=feature_map_layout,
depth_multiplier=1,
min_depth=32,
insert_1x1_conv=True,
freeze_batchnorm=False,
is_training=True,
conv_hyperparams=self._build_conv_hyperparams(),
name='FeatureMaps'
)
else:
def feature_map_generator(image_features):
return feature_map_generators.multi_resolution_feature_maps(
feature_map_layout=feature_map_layout,
depth_multiplier=1,
min_depth=32,
insert_1x1_conv=True,
image_features=image_features)
return feature_map_generator
def test_get_expected_feature_map_shapes_with_inception_v2(self, use_keras):
image_features = {
'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32),
'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32),
'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32)
}
feature_maps = feature_map_generators.multi_resolution_feature_maps(
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=INCEPTION_V2_LAYOUT,
depth_multiplier=1,
min_depth=32,
insert_1x1_conv=True,
image_features=image_features)
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_feature_map_shapes = {
'Mixed_3c': (4, 28, 28, 256),
@@ -70,21 +115,54 @@ class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = dict(
(key, value.shape) for key, value in out_feature_maps.items())
self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes)
self.assertDictEqual(expected_feature_map_shapes, out_feature_map_shapes)
# TODO(kaftan): Remove conditional after CMLE moves to TF 1.10
# BEGIN GOOGLE-INTERNAL
def test_get_expected_feature_map_shapes_with_inception_v2_use_depthwise(
self, use_keras):
image_features = {
'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32),
'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32),
'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32)
}
layout_copy = INCEPTION_V2_LAYOUT.copy()
layout_copy['use_depthwise'] = True
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=layout_copy,
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_feature_map_shapes = {
'Mixed_3c': (4, 28, 28, 256),
'Mixed_4c': (4, 14, 14, 576),
'Mixed_5c': (4, 7, 7, 1024),
'Mixed_5c_2_Conv2d_3_3x3_s2_512': (4, 4, 4, 512),
'Mixed_5c_2_Conv2d_4_3x3_s2_256': (4, 2, 2, 256),
'Mixed_5c_2_Conv2d_5_3x3_s2_256': (4, 1, 1, 256)}
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = dict(
(key, value.shape) for key, value in out_feature_maps.items())
self.assertDictEqual(expected_feature_map_shapes, out_feature_map_shapes)
# END GOOGLE-INTERNAL
def test_get_expected_feature_map_shapes_with_inception_v3(self):
def test_get_expected_feature_map_shapes_with_inception_v3(self, use_keras):
image_features = {
'Mixed_5d': tf.random_uniform([4, 35, 35, 256], dtype=tf.float32),
'Mixed_6e': tf.random_uniform([4, 17, 17, 576], dtype=tf.float32),
'Mixed_7c': tf.random_uniform([4, 8, 8, 1024], dtype=tf.float32)
}
feature_maps = feature_map_generators.multi_resolution_feature_maps(
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=INCEPTION_V3_LAYOUT,
depth_multiplier=1,
min_depth=32,
insert_1x1_conv=True,
image_features=image_features)
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_feature_map_shapes = {
'Mixed_5d': (4, 35, 35, 256),
@@ -100,10 +178,10 @@ class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = dict(
(key, value.shape) for key, value in out_feature_maps.items())
self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes)
self.assertDictEqual(expected_feature_map_shapes, out_feature_map_shapes)
def test_get_expected_feature_map_shapes_with_embedded_ssd_mobilenet_v1(
self):
self, use_keras):
image_features = {
'Conv2d_11_pointwise': tf.random_uniform([4, 16, 16, 512],
dtype=tf.float32),
@@ -111,12 +189,11 @@ class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
dtype=tf.float32),
}
feature_maps = feature_map_generators.multi_resolution_feature_maps(
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=EMBEDDED_SSD_MOBILENET_V1_LAYOUT,
depth_multiplier=1,
min_depth=32,
insert_1x1_conv=True,
image_features=image_features)
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_feature_map_shapes = {
'Conv2d_11_pointwise': (4, 16, 16, 512),
@@ -131,7 +208,138 @@ class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase):
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = dict(
(key, value.shape) for key, value in out_feature_maps.items())
self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes)
self.assertDictEqual(expected_feature_map_shapes, out_feature_map_shapes)
def test_get_expected_variable_names_with_inception_v2(self, use_keras):
image_features = {
'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32),
'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32),
'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32)
}
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=INCEPTION_V2_LAYOUT,
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_slim_variables = set([
'Mixed_5c_1_Conv2d_3_1x1_256/weights',
'Mixed_5c_1_Conv2d_3_1x1_256/biases',
'Mixed_5c_2_Conv2d_3_3x3_s2_512/weights',
'Mixed_5c_2_Conv2d_3_3x3_s2_512/biases',
'Mixed_5c_1_Conv2d_4_1x1_128/weights',
'Mixed_5c_1_Conv2d_4_1x1_128/biases',
'Mixed_5c_2_Conv2d_4_3x3_s2_256/weights',
'Mixed_5c_2_Conv2d_4_3x3_s2_256/biases',
'Mixed_5c_1_Conv2d_5_1x1_128/weights',
'Mixed_5c_1_Conv2d_5_1x1_128/biases',
'Mixed_5c_2_Conv2d_5_3x3_s2_256/weights',
'Mixed_5c_2_Conv2d_5_3x3_s2_256/biases',
])
expected_keras_variables = set([
'FeatureMaps/output_3/Mixed_5c_1_Conv2d_3_1x1_256_conv/kernel',
'FeatureMaps/output_3/Mixed_5c_1_Conv2d_3_1x1_256_conv/bias',
'FeatureMaps/output_3/Mixed_5c_2_Conv2d_3_3x3_s2_512_conv/kernel',
'FeatureMaps/output_3/Mixed_5c_2_Conv2d_3_3x3_s2_512_conv/bias',
'FeatureMaps/output_4/Mixed_5c_1_Conv2d_4_1x1_128_conv/kernel',
'FeatureMaps/output_4/Mixed_5c_1_Conv2d_4_1x1_128_conv/bias',
'FeatureMaps/output_4/Mixed_5c_2_Conv2d_4_3x3_s2_256_conv/kernel',
'FeatureMaps/output_4/Mixed_5c_2_Conv2d_4_3x3_s2_256_conv/bias',
'FeatureMaps/output_5/Mixed_5c_1_Conv2d_5_1x1_128_conv/kernel',
'FeatureMaps/output_5/Mixed_5c_1_Conv2d_5_1x1_128_conv/bias',
'FeatureMaps/output_5/Mixed_5c_2_Conv2d_5_3x3_s2_256_conv/kernel',
'FeatureMaps/output_5/Mixed_5c_2_Conv2d_5_3x3_s2_256_conv/bias',
])
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
sess.run(feature_maps)
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
if use_keras:
self.assertSetEqual(expected_keras_variables, actual_variable_set)
else:
self.assertSetEqual(expected_slim_variables, actual_variable_set)
# TODO(kaftan): Remove conditional after CMLE moves to TF 1.10
# BEGIN GOOGLE-INTERNAL
def test_get_expected_variable_names_with_inception_v2_use_depthwise(
self,
use_keras):
image_features = {
'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32),
'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32),
'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32)
}
layout_copy = INCEPTION_V2_LAYOUT.copy()
layout_copy['use_depthwise'] = True
feature_map_generator = self._build_feature_map_generator(
feature_map_layout=layout_copy,
use_keras=use_keras
)
feature_maps = feature_map_generator(image_features)
expected_slim_variables = set([
'Mixed_5c_1_Conv2d_3_1x1_256/weights',
'Mixed_5c_1_Conv2d_3_1x1_256/biases',
'Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise/depthwise_weights',
'Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise/biases',
'Mixed_5c_2_Conv2d_3_3x3_s2_512/weights',
'Mixed_5c_2_Conv2d_3_3x3_s2_512/biases',
'Mixed_5c_1_Conv2d_4_1x1_128/weights',
'Mixed_5c_1_Conv2d_4_1x1_128/biases',
'Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise/depthwise_weights',
'Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise/biases',
'Mixed_5c_2_Conv2d_4_3x3_s2_256/weights',
'Mixed_5c_2_Conv2d_4_3x3_s2_256/biases',
'Mixed_5c_1_Conv2d_5_1x1_128/weights',
'Mixed_5c_1_Conv2d_5_1x1_128/biases',
'Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise/depthwise_weights',
'Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise/biases',
'Mixed_5c_2_Conv2d_5_3x3_s2_256/weights',
'Mixed_5c_2_Conv2d_5_3x3_s2_256/biases',
])
expected_keras_variables = set([
'FeatureMaps/output_3/Mixed_5c_1_Conv2d_3_1x1_256_conv/kernel',
'FeatureMaps/output_3/Mixed_5c_1_Conv2d_3_1x1_256_conv/bias',
('FeatureMaps/output_3/Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise_conv/'
'depthwise_kernel'),
('FeatureMaps/output_3/Mixed_5c_2_Conv2d_3_3x3_s2_512_depthwise_conv/'
'bias'),
'FeatureMaps/output_3/Mixed_5c_2_Conv2d_3_3x3_s2_512_conv/kernel',
'FeatureMaps/output_3/Mixed_5c_2_Conv2d_3_3x3_s2_512_conv/bias',
'FeatureMaps/output_4/Mixed_5c_1_Conv2d_4_1x1_128_conv/kernel',
'FeatureMaps/output_4/Mixed_5c_1_Conv2d_4_1x1_128_conv/bias',
('FeatureMaps/output_4/Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise_conv/'
'depthwise_kernel'),
('FeatureMaps/output_4/Mixed_5c_2_Conv2d_4_3x3_s2_256_depthwise_conv/'
'bias'),
'FeatureMaps/output_4/Mixed_5c_2_Conv2d_4_3x3_s2_256_conv/kernel',
'FeatureMaps/output_4/Mixed_5c_2_Conv2d_4_3x3_s2_256_conv/bias',
'FeatureMaps/output_5/Mixed_5c_1_Conv2d_5_1x1_128_conv/kernel',
'FeatureMaps/output_5/Mixed_5c_1_Conv2d_5_1x1_128_conv/bias',
('FeatureMaps/output_5/Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise_conv/'
'depthwise_kernel'),
('FeatureMaps/output_5/Mixed_5c_2_Conv2d_5_3x3_s2_256_depthwise_conv/'
'bias'),
'FeatureMaps/output_5/Mixed_5c_2_Conv2d_5_3x3_s2_256_conv/kernel',
'FeatureMaps/output_5/Mixed_5c_2_Conv2d_5_3x3_s2_256_conv/bias',
])
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
sess.run(feature_maps)
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
if use_keras:
self.assertSetEqual(expected_keras_variables, actual_variable_set)
else:
self.assertSetEqual(expected_slim_variables, actual_variable_set)
# END GOOGLE-INTERNAL
class FPNFeatureMapGeneratorTest(tf.test.TestCase):
@@ -161,6 +369,31 @@ class FPNFeatureMapGeneratorTest(tf.test.TestCase):
for key, value in out_feature_maps.items()}
self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes)
def test_get_expected_feature_map_shapes_with_depthwise(self):
image_features = [
('block2', tf.random_uniform([4, 8, 8, 256], dtype=tf.float32)),
('block3', tf.random_uniform([4, 4, 4, 256], dtype=tf.float32)),
('block4', tf.random_uniform([4, 2, 2, 256], dtype=tf.float32)),
('block5', tf.random_uniform([4, 1, 1, 256], dtype=tf.float32))
]
feature_maps = feature_map_generators.fpn_top_down_feature_maps(
image_features=image_features, depth=128, use_depthwise=True)
expected_feature_map_shapes = {
'top_down_block2': (4, 8, 8, 128),
'top_down_block3': (4, 4, 4, 128),
'top_down_block4': (4, 2, 2, 128),
'top_down_block5': (4, 1, 1, 128)
}
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = {key: value.shape
for key, value in out_feature_maps.items()}
self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes)
class GetDepthFunctionTest(tf.test.TestCase):
@@ -175,5 +408,94 @@ class GetDepthFunctionTest(tf.test.TestCase):
self.assertEqual(depth_fn(64), 32)
@parameterized.parameters(
{'replace_pool_with_conv': False},
{'replace_pool_with_conv': True},
)
class PoolingPyramidFeatureMapGeneratorTest(tf.test.TestCase):
def test_get_expected_feature_map_shapes(self, replace_pool_with_conv):
image_features = {
'image_features': tf.random_uniform([4, 19, 19, 1024])
}
feature_maps = feature_map_generators.pooling_pyramid_feature_maps(
base_feature_map_depth=1024,
num_layers=6,
image_features=image_features,
replace_pool_with_conv=replace_pool_with_conv)
expected_pool_feature_map_shapes = {
'Base_Conv2d_1x1_1024': (4, 19, 19, 1024),
'MaxPool2d_0_2x2': (4, 10, 10, 1024),
'MaxPool2d_1_2x2': (4, 5, 5, 1024),
'MaxPool2d_2_2x2': (4, 3, 3, 1024),
'MaxPool2d_3_2x2': (4, 2, 2, 1024),
'MaxPool2d_4_2x2': (4, 1, 1, 1024),
}
expected_conv_feature_map_shapes = {
'Base_Conv2d_1x1_1024': (4, 19, 19, 1024),
'Conv2d_0_3x3_s2_1024': (4, 10, 10, 1024),
'Conv2d_1_3x3_s2_1024': (4, 5, 5, 1024),
'Conv2d_2_3x3_s2_1024': (4, 3, 3, 1024),
'Conv2d_3_3x3_s2_1024': (4, 2, 2, 1024),
'Conv2d_4_3x3_s2_1024': (4, 1, 1, 1024),
}
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
out_feature_maps = sess.run(feature_maps)
out_feature_map_shapes = {key: value.shape
for key, value in out_feature_maps.items()}
if replace_pool_with_conv:
self.assertDictEqual(expected_conv_feature_map_shapes,
out_feature_map_shapes)
else:
self.assertDictEqual(expected_pool_feature_map_shapes,
out_feature_map_shapes)
def test_get_expected_variable_names(self, replace_pool_with_conv):
image_features = {
'image_features': tf.random_uniform([4, 19, 19, 1024])
}
feature_maps = feature_map_generators.pooling_pyramid_feature_maps(
base_feature_map_depth=1024,
num_layers=6,
image_features=image_features,
replace_pool_with_conv=replace_pool_with_conv)
expected_pool_variables = set([
'Base_Conv2d_1x1_1024/weights',
'Base_Conv2d_1x1_1024/biases',
])
expected_conv_variables = set([
'Base_Conv2d_1x1_1024/weights',
'Base_Conv2d_1x1_1024/biases',
'Conv2d_0_3x3_s2_1024/weights',
'Conv2d_0_3x3_s2_1024/biases',
'Conv2d_1_3x3_s2_1024/weights',
'Conv2d_1_3x3_s2_1024/biases',
'Conv2d_2_3x3_s2_1024/weights',
'Conv2d_2_3x3_s2_1024/biases',
'Conv2d_3_3x3_s2_1024/weights',
'Conv2d_3_3x3_s2_1024/biases',
'Conv2d_4_3x3_s2_1024/weights',
'Conv2d_4_3x3_s2_1024/biases',
])
init_op = tf.global_variables_initializer()
with self.test_session() as sess:
sess.run(init_op)
sess.run(feature_maps)
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
if replace_pool_with_conv:
self.assertSetEqual(expected_conv_variables, actual_variable_set)
else:
self.assertSetEqual(expected_pool_variables, actual_variable_set)
if __name__ == '__main__':
tf.test.main()