Unverified commit 99256cf4, authored by pkulzc, committed by GitHub

Release iNaturalist Species-trained models, refactor of evaluation, box predictor for object detection. (#5289)

* Merged commit includes the following changes:
212389173  by Zhichao Lu:

    1. Replace tf.boolean_mask with tf.where

--
212282646  by Zhichao Lu:

    1. Fix a typo in model_builder.py and add a test to cover it.

--
212142989  by Zhichao Lu:

    Only resize masks in meta architecture if it has not already been resized in the input pipeline.

--
212136935  by Zhichao Lu:

    Choose matmul or native crop_and_resize in the model builder instead of faster r-cnn meta architecture.

--
211907984  by Zhichao Lu:

    Make eval input reader repeated field and update config util to handle this field.

--
211858098  by Zhichao Lu:

    Change the implementation of merge_boxes_with_multiple_labels.

--
211843915  by Zhichao Lu:

    Add Mobilenet v2 + FPN support.

--
211655076  by Zhichao Lu:

    Bug fix for generic keys in config overrides

    In generic configuration overrides, we had a duplicate entry for train_input_config and we were missing the eval_input_config and eval_config.

    This change also introduces testing for all config overrides.

--
211157501  by Zhichao Lu:

    Make the locally-modified conv defs a copy.

    So that it doesn't modify MobileNet conv defs globally for other code that
    transitively imports this package.

--
211112813  by Zhichao Lu:

    Refactoring visualization tools for Estimator's eval_metric_ops. This will make it easier for future models to take advantage of a single interface and mechanics.

--
211109571  by Zhichao Lu:

    A test decorator.

--
210747685  by Zhichao Lu:

    For FPN, when use_depthwise is set to true, use slightly modified mobilenet v1 config.

--
210723882  by Zhichao Lu:

    Integrating the losses mask into the meta architectures. When providing groundtruth, one can optionally specify annotation information (i.e. which images are labeled vs. unlabeled). For any image that is unlabeled, there is no loss accumulation.

--
210673675  by Zhichao Lu:

    Internal change.

--
210546590  by Zhichao Lu:

    Internal change.

--
210529752  by Zhichao Lu:

    Support batched inputs with ops.matmul_crop_and_resize.

    With this change the new inputs are images of shape [batch, height, width, depth] and boxes of shape [batch, num_boxes, 4]. The output tensor is of shape [batch, num_boxes, crop_height, crop_width, depth].

--
210485912  by Zhichao Lu:

    Fix TensorFlow version check in object_detection_tutorial.ipynb

--
210484076  by Zhichao Lu:

    Reduce TPU memory required for single image matmul_crop_and_resize.

    Using tf.einsum eliminates intermediate tensors, tiling and expansion. For an image of size [40, 40, 1024] and boxes of shape [300, 4], HBM memory usage goes down from 3.52G to 1.67G.
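The idea behind the matmul/einsum crop can be sketched in NumPy: bilinear sampling along each axis is a dense interpolation matrix, and one einsum contracts height and width without materializing tiled intermediates. This is a simplified single-image sketch; the function names are hypothetical and the real op handles batching and edge cases differently.

```python
import numpy as np

def interp_matrix(lo, hi, out_size, in_size):
    # Bilinear sampling weights as a dense matrix: row k holds the weights
    # over input pixels for output sample k, using the crop_and_resize
    # sampling grid pos = lo + k * (hi - lo) / (out_size - 1), in pixel units.
    pos = (lo + np.arange(out_size) * (hi - lo) / (out_size - 1)) * (in_size - 1)
    mat = np.zeros((out_size, in_size))
    i0 = np.clip(np.floor(pos).astype(int), 0, in_size - 1)
    i1 = np.clip(i0 + 1, 0, in_size - 1)
    frac = pos - np.floor(pos)
    mat[np.arange(out_size), i0] += 1.0 - frac
    mat[np.arange(out_size), i1] += frac
    return mat

def matmul_crop_and_resize(image, boxes, crop_size):
    # image: [H, W, C]; boxes: [N, 4] as normalized [y1, x1, y2, x2].
    # One einsum per box contracts H and W with no tiled intermediates.
    crop_h, crop_w = crop_size
    crops = []
    for y1, x1, y2, x2 in boxes:
        ky = interp_matrix(y1, y2, crop_h, image.shape[0])  # [crop_h, H]
        kx = interp_matrix(x1, x2, crop_w, image.shape[1])  # [crop_w, W]
        crops.append(np.einsum('ih,hwc,jw->ijc', ky, image, kx))
    return np.stack(crops)  # [N, crop_h, crop_w, C]
```

Cropping the full box [0, 0, 1, 1] at the image's own resolution reproduces the image, which is a quick sanity check on the sampling grid.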

--
210468361  by Zhichao Lu:

    Remove PositiveAnchorLossCDF/NegativeAnchorLossCDF to resolve "Main thread is not in main loop error" issue in local training.

--
210100253  by Zhichao Lu:

    Pooling pyramid feature maps: add option to replace max pool with convolution layers.

--
209995842  by Zhichao Lu:

    Fix a bug which prevents variable sharing in Faster RCNN.

--
209965526  by Zhichao Lu:

    Add support for enabling export_to_tpu through the estimator.

--
209946440  by Zhichao Lu:

    Replace deprecated tf.train.Supervisor with tf.train.MonitoredSession. MonitoredSession also takes away the hassle of starting queue runners.

--
209888003  by Zhichao Lu:

    Implement function to handle data where source_id is not set.

    If the field source_id is found to be the empty string for any image during runtime, it will be replaced with a random string. This avoids hash collisions on datasets where many examples do not have source_id set. Those hash collisions have unintended side effects and may lead to bugs in the detection pipeline.
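A minimal sketch of the fix, with a hypothetical helper name: without the substitution, every empty source_id hashes to the same bucket.

```python
import hashlib
import uuid

def hashed_source_id(source_id):
    # Hypothetical helper mirroring the fix: an empty source_id is replaced
    # with a random string before hashing, so examples without source_id no
    # longer all collide on one hash value.
    if not source_id:
        source_id = uuid.uuid4().hex
    return int(hashlib.sha256(source_id.encode('utf-8')).hexdigest(), 16) % (2 ** 31)
```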

--
209842134  by Zhichao Lu:

    Converting loss mask into multiplier, rather than using it as a boolean mask (which changes tensor shape). This is necessary, since other utilities (e.g. hard example miner) require a loss matrix with the same dimensions as the original prediction tensor.
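The shape-preserving multiplier can be illustrated in a few lines (a NumPy sketch with hypothetical names, not the actual meta-architecture code):

```python
import numpy as np

def masked_loss(per_anchor_loss, is_annotated):
    # per_anchor_loss: [batch, num_anchors]; is_annotated: [batch] booleans.
    # A broadcast 0/1 multiplier zeroes out unlabeled images while keeping
    # the tensor shape intact, unlike tf.boolean_mask which would shrink it.
    multiplier = np.asarray(is_annotated, dtype=per_anchor_loss.dtype)
    return per_anchor_loss * multiplier[:, np.newaxis]
```

Because the output has the same shape as the input, downstream utilities such as a hard example miner can still index it per anchor.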

--
209768066  by Zhichao Lu:

    Adding ability to remove loss computation from specific images in a batch, via an optional boolean mask.

--
209722556  by Zhichao Lu:

    Remove dead code.

    (_USE_C_API was flipped to True by default in TensorFlow 1.8)

--
209701861  by Zhichao Lu:

    This CL cleans-up some tf.Example creation snippets, by reusing the convenient tf.train.Feature building functions in dataset_util.

--
209697893  by Zhichao Lu:

    Do not overwrite num_epoch for eval input. This leads to errors in some cases.

--
209694652  by Zhichao Lu:

    Sample boxes by jittering around the currently given boxes.
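A sketch of box jittering, assuming noise proportional to box height/width; the real op's noise model and parameter names may differ:

```python
import numpy as np

def jitter_boxes(boxes, ratio=0.05, rng=None):
    # boxes: [N, 4] as normalized [y1, x1, y2, x2]. Each coordinate is
    # perturbed by uniform noise scaled by the box's height or width,
    # then clipped back into the image.
    rng = rng or np.random.default_rng(0)
    heights = boxes[:, 2] - boxes[:, 0]
    widths = boxes[:, 3] - boxes[:, 1]
    scale = np.stack([heights, widths, heights, widths], axis=1)
    noise = rng.uniform(-ratio, ratio, size=boxes.shape) * scale
    return np.clip(boxes + noise, 0.0, 1.0)
```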

--
209550300  by Zhichao Lu:

    `create_category_index_from_labelmap()` now accepts a `use_display_name` parameter.
    Also added a `create_categories_from_labelmap` function for convenience.

--
209490273  by Zhichao Lu:

    Check result_dict type before accessing image_id via key.

--
209442529  by Zhichao Lu:

    Introducing the capability to sample examples for evaluation. This makes it easy to specify one full epoch of evaluation, or a subset (e.g. sample 1 of every N examples).
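The sampling policy itself is simple; a sketch of "1 of every N" over an in-memory list (analogous to `tf.data`'s `Dataset.shard`):

```python
def sample_one_of_n(examples, n):
    # Keep every n-th example; n=1 keeps the full epoch.
    return examples[::n]
```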

--
208941150  by Zhichao Lu:

    Adding the capability of exporting the results in json format.

--
208888798  by Zhichao Lu:

    Fixes wrong dictionary key for num_det_boxes_per_image.

--
208873549  by Zhichao Lu:

    Reduce the number of HLO ops created by matmul_crop_and_resize.

    Do not unroll along the channels dimension. Instead, transpose the input image dimensions, apply tf.matmul and transpose back.

    The number of HLO instructions for 1024 channels drops from 12368 to 110.

--
208844315  by Zhichao Lu:

    Add an option to use tf.image.non_max_suppression_padded in SSD post-processing.

--
208731380  by Zhichao Lu:

    Add field in box_predictor config to enable mask prediction and update builders accordingly.

--
208699405  by Zhichao Lu:

    This CL creates a keras-based multi-resolution feature map extractor.

--
208557208  by Zhichao Lu:

    Add TPU tests for Faster R-CNN Meta arch.

    * Tests that two_stage_predict and total_loss tests run successfully on TPU.
    * Small mods to multiclass_non_max_suppression to preserve static shapes.

--
208499278  by Zhichao Lu:

    This CL makes sure the Keras convolutional box predictor & head layers apply activation layers *after* normalization (as opposed to before).

--
208391694  by Zhichao Lu:

    Updating visualization tool to produce multiple evaluation images.

--
208275961  by Zhichao Lu:

    This CL adds a Keras version of the Convolutional Box Predictor, as well as more general infrastructure for making Keras Prediction heads & Keras box predictors.

--
208275585  by Zhichao Lu:

    This CL enables the Keras layer hyperparameter object to build a dedicated activation layer, and to disable activation by default in the op layer construction kwargs.

    This is necessary because in most cases the normalization layer must be applied before the activation layer. So, in Keras models we must set the convolution activation in a dedicated layer after normalization is applied, rather than setting it in the convolution layer construction args.
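The ordering constraint can be shown with toy callables (hypothetical names; the real change wires Keras layers, not lambdas):

```python
def conv_bn_act(x, conv, bn, act):
    # The conv layer is built with activation disabled, and the activation
    # runs as its own layer *after* normalization.
    return act(bn(conv(x)))
```

With e.g. `conv = 2x`, `bn = x - 1`, and ReLU, `conv_bn_act(0.25)` gives `relu(0.5 - 1) = 0`, whereas activating before normalization would give `relu(0.5) - 1 = -0.5`: the two orderings are not interchangeable.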

--
208263792  by Zhichao Lu:

    Add a new SSD mask meta arch that can predict masks for SSD models.
    Changes include:
     - overwriting the loss function to add mask loss computation.
     - updating ssd_meta_arch to handle masks, if predicted, in predict and postprocess.

--
208000218  by Zhichao Lu:

    Make FasterRCNN choose static shape operations only in training mode.

--
207997797  by Zhichao Lu:

    Add static boolean_mask op to box_list_ops.py and use that in faster_rcnn_meta_arch.py to support use_static_shapes option.

--
207993460  by Zhichao Lu:

    Include FGVC detection models in model zoo.

--
207971213  by Zhichao Lu:

    Remove the restriction to run the tf.nn.top_k op on CPU.

--
207961187  by Zhichao Lu:

    Build the first stage NMS function in the model builder and pass it to FasterRCNN meta arch.

--
207960608  by Zhichao Lu:

    Internal Change.

--
207927015  by Zhichao Lu:

    Have an option to use the TPU-compatible NMS op (cl/206673787) in the batch_multiclass_non_max_suppression function. When pad_to_max_output_size is set to true, the output NMSed boxes are padded to length max_size_per_class.

    This can be used in the first stage Region Proposal Network of the FasterRCNN model by setting the first_stage_nms_pad_to_max_proposals field to true in the config proto.
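The padded-output property can be sketched with a plain greedy NMS (a NumPy illustration, not the TPU op): the returned index tensor always has length max_output_size, so downstream shapes stay static.

```python
import numpy as np

def iou(a, b):
    # Boxes are [y1, x1, y2, x2].
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms_padded(boxes, scores, iou_threshold, max_output_size):
    # Greedy NMS whose output always has length max_output_size; static
    # output shapes are what make the padded variant TPU friendly.
    # Returns (padded indices, number of valid entries).
    order = np.argsort(-np.asarray(scores))
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(int(i))
        if len(keep) == max_output_size:
            break
    num_valid = len(keep)
    keep.extend([0] * (max_output_size - num_valid))  # pad with index 0
    return keep, num_valid
```

Callers must read num_valid to know how many of the padded entries are real detections.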

--
207809668  by Zhichao Lu:

    Add option to use depthwise separable conv instead of conv2d in FPN and WeightSharedBoxPredictor. More specifically, there are two related configs:
    - SsdFeatureExtractor.use_depthwise
    - WeightSharedConvolutionalBoxPredictor.use_depthwise

--
207808651  by Zhichao Lu:

    Fix the static balanced positive negative sampler's TPU tests

--
207798658  by Zhichao Lu:

    Fixes a post-refactoring bug where the pre-prediction convolution layers in the convolutional box predictor are ignored.

--
207796470  by Zhichao Lu:

    Make slim endpoints visible in FasterRCNNMetaArch.

--
207787053  by Zhichao Lu:

    Refactor ssd_meta_arch so that the target assigner instance is passed into the SSDMetaArch constructor rather than constructed inside.

--

PiperOrigin-RevId: 212389173

* Fix detection model zoo typo.

* Modify tf example decoder to handle label maps with either `display_name` or `name` fields seamlessly.

Currently, tf example decoder uses only `name` field to look up ids for class text field present in the data. This change uses both `display_name` and `name` fields in the label map to fetch ids for class text.

PiperOrigin-RevId: 212672223

* Modify create_coco_tf_record tool to write out class text instead of class labels.

PiperOrigin-RevId: 212679112

* Fix detection model zoo typo.

PiperOrigin-RevId: 212715692

* Adding the following two optional flags to WeightSharedConvolutionalBoxHead:
1) In the box head, apply clipping to box encodings.
2) In the class head, apply sigmoid to class predictions at inference time.

PiperOrigin-RevId: 212723242

* Support class confidences in merge boxes with multiple labels.

PiperOrigin-RevId: 212884998

* Creates multiple eval specs for object detection.

PiperOrigin-RevId: 212894556

* Set batch_norm on last layer in Mask Head to None.

PiperOrigin-RevId: 213030087

* Enable bfloat16 training for object detection models.

PiperOrigin-RevId: 213053547

* Skip padding op when unnecessary.

PiperOrigin-RevId: 213065869

* Modify `Matchers` to use groundtruth weights before performing matching.

The groundtruth weights tensor is used to indicate padding in the groundtruth box tensor. `TargetAssigner` handles it by creating appropriate classification and regression target weights based on the groundtruth box each anchor matches to. However, options such as `force_match_all_rows` in `ArgmaxMatcher` force certain anchors to match groundtruth boxes that are just padding, thereby reducing the number of anchors that could otherwise match real groundtruth boxes.

For single stage models like SSD the effect of this is negligible, as there are two orders of magnitude more anchors than padded groundtruth boxes. But for Faster R-CNN and Mask R-CNN, where there are only 300 anchors in the second stage, a significant number of anchors match groundtruth paddings, reducing the number of anchors regressing to real groundtruth boxes and degrading performance severely.

Therefore, this change introduces an additional boolean argument `valid_rows` to `Matcher.match` methods, and the implementations now ignore such padded groundtruth boxes during matching.
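A minimal sketch of the `valid_rows` idea for an argmax matcher (NumPy illustration with hypothetical names, not the actual `ArgmaxMatcher` implementation):

```python
import numpy as np

def argmax_match(similarity, valid_rows):
    # similarity: [num_gt, num_anchors]; valid_rows: [num_gt] booleans
    # marking real (non-padding) groundtruth rows. Padding rows are masked
    # to -inf before the argmax, so anchors can only match real boxes; an
    # anchor with no valid candidate gets -1 (unmatched).
    masked = np.where(np.asarray(valid_rows)[:, None], similarity, -np.inf)
    best = masked.max(axis=0)
    matches = masked.argmax(axis=0)
    return np.where(np.isfinite(best), matches, -1)
```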

PiperOrigin-RevId: 213345395

* Add release note for iNaturalist Species trained models.

PiperOrigin-RevId: 213347179

* Fix a bug where the gt_is_crowd_list variable was left uninitialized.

PiperOrigin-RevId: 213364858

* ...text exposed to open source public git repo...

PiperOrigin-RevId: 213554260
parent 256b8ae6
...@@ -99,6 +99,16 @@ reporting an issue.
## Release information

### Sep 17, 2018

We have released Faster R-CNN detectors with ResNet-50 / ResNet-101 feature
extractors trained on the [iNaturalist Species Detection Dataset](https://github.com/visipedia/inat_comp/blob/master/2017/README.md#bounding-boxes).
The models are trained on the training split of the iNaturalist data for 4M
iterations and achieve 55% and 58% mean AP@.5 over 2854 classes, respectively.
For more details please refer to this [paper](https://arxiv.org/abs/1707.06642).

<b>Thanks to contributors</b>: Chen Sun

### July 13, 2018

There are many new updates in this release, extending the functionality and
......
...@@ -15,33 +15,36 @@
"""Function to build box predictor from configuration."""

import collections

from absl import logging
import tensorflow as tf

from object_detection.predictors import convolutional_box_predictor
from object_detection.predictors import convolutional_keras_box_predictor
from object_detection.predictors import mask_rcnn_box_predictor
from object_detection.predictors import rfcn_box_predictor
from object_detection.predictors.heads import box_head
from object_detection.predictors.heads import class_head
from object_detection.predictors.heads import keras_box_head
from object_detection.predictors.heads import keras_class_head
from object_detection.predictors.heads import keras_mask_head
from object_detection.predictors.heads import mask_head
from object_detection.protos import box_predictor_pb2
def build_convolutional_box_predictor(is_training,
                                      num_classes,
                                      conv_hyperparams_fn,
                                      min_depth,
                                      max_depth,
                                      num_layers_before_predictor,
                                      use_dropout,
                                      dropout_keep_prob,
                                      kernel_size,
                                      box_code_size,
                                      apply_sigmoid_to_scores=False,
                                      class_prediction_bias_init=0.0,
                                      use_depthwise=False,
                                      mask_head_config=None):
  """Builds the ConvolutionalBoxPredictor from the arguments.

  Args:
...@@ -66,18 +69,14 @@ def build_convolutional_box_predictor(
      then the kernel size is automatically set to be
      min(feature_width, feature_height).
    box_code_size: Size of encoding for each box.
    apply_sigmoid_to_scores: If True, apply the sigmoid on the output
      class_predictions.
    class_prediction_bias_init: Constant value to initialize bias of the last
      conv2d layer before class prediction.
    use_depthwise: Whether to use depthwise convolutions for prediction
      steps. Default is False.
    mask_head_config: An optional MaskHead object containing configs for mask
      head construction.

  Returns:
    A ConvolutionalBoxPredictor class.
...@@ -97,7 +96,10 @@ def build_convolutional_box_predictor(
      class_prediction_bias_init=class_prediction_bias_init,
      use_depthwise=use_depthwise)
  other_heads = {}
  if mask_head_config is not None:
    if not mask_head_config.masks_are_class_agnostic:
      logging.warning('Note that class specific mask prediction for SSD '
                      'models is memory consuming.')
    other_heads[convolutional_box_predictor.MASK_PREDICTIONS] = (
        mask_head.ConvolutionalMaskHead(
            is_training=is_training,
...@@ -106,9 +108,9 @@ def build_convolutional_box_predictor(
            dropout_keep_prob=dropout_keep_prob,
            kernel_size=kernel_size,
            use_depthwise=use_depthwise,
            mask_height=mask_head_config.mask_height,
            mask_width=mask_head_config.mask_width,
            masks_are_class_agnostic=mask_head_config.masks_are_class_agnostic))
  return convolutional_box_predictor.ConvolutionalBoxPredictor(
      is_training=is_training,
      num_classes=num_classes,
...@@ -121,6 +123,139 @@ def build_convolutional_box_predictor(
      max_depth=max_depth)
def build_convolutional_keras_box_predictor(is_training,
                                            num_classes,
                                            conv_hyperparams,
                                            freeze_batchnorm,
                                            inplace_batchnorm_update,
                                            num_predictions_per_location_list,
                                            min_depth,
                                            max_depth,
                                            num_layers_before_predictor,
                                            use_dropout,
                                            dropout_keep_prob,
                                            kernel_size,
                                            box_code_size,
                                            class_prediction_bias_init=0.0,
                                            use_depthwise=False,
                                            mask_head_config=None,
                                            name='BoxPredictor'):
  """Builds the ConvolutionalBoxPredictor from the arguments.

  Args:
    is_training: Indicates whether the BoxPredictor is in training mode.
    num_classes: Number of classes.
    conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
      containing hyperparameters for convolution ops.
    freeze_batchnorm: Whether to freeze batch norm parameters during
      training or not. When training with a small batch size (e.g. 1), it is
      desirable to freeze batch norm update and use pretrained batch norm
      params.
    inplace_batchnorm_update: Whether to update batch norm moving average
      values inplace. When this is false train op must add a control
      dependency on tf.graphkeys.UPDATE_OPS collection in order to update
      batch norm statistics.
    num_predictions_per_location_list: A list of integers representing the
      number of box predictions to be made per spatial location for each
      feature map.
    min_depth: Minimum feature depth prior to predicting box encodings
      and class predictions.
    max_depth: Maximum feature depth prior to predicting box encodings
      and class predictions. If max_depth is set to 0, no additional
      feature map will be inserted before location and class predictions.
    num_layers_before_predictor: Number of the additional conv layers before
      the predictor.
    use_dropout: Option to use dropout or not. Note that a single dropout
      op is applied here prior to both box and class predictions, which stands
      in contrast to the ConvolutionalBoxPredictor below.
    dropout_keep_prob: Keep probability for dropout.
      This is only used if use_dropout is True.
    kernel_size: Size of final convolution kernel. If the
      spatial resolution of the feature map is smaller than the kernel size,
      then the kernel size is automatically set to be
      min(feature_width, feature_height).
    box_code_size: Size of encoding for each box.
    class_prediction_bias_init: constant value to initialize bias of the last
      conv2d layer before class prediction.
    use_depthwise: Whether to use depthwise convolutions for prediction
      steps. Default is False.
    mask_head_config: An optional MaskHead object containing configs for mask
      head construction.
    name: A string name scope to assign to the box predictor. If `None`, Keras
      will auto-generate one from the class name.

  Returns:
    A ConvolutionalBoxPredictor class.
  """
  box_prediction_heads = []
  class_prediction_heads = []
  mask_prediction_heads = []
  other_heads = {}
  if mask_head_config is not None:
    other_heads[convolutional_box_predictor.MASK_PREDICTIONS] = \
        mask_prediction_heads

  for stack_index, num_predictions_per_location in enumerate(
      num_predictions_per_location_list):
    box_prediction_heads.append(
        keras_box_head.ConvolutionalBoxHead(
            is_training=is_training,
            box_code_size=box_code_size,
            kernel_size=kernel_size,
            conv_hyperparams=conv_hyperparams,
            freeze_batchnorm=freeze_batchnorm,
            num_predictions_per_location=num_predictions_per_location,
            use_depthwise=use_depthwise,
            name='ConvolutionalBoxHead_%d' % stack_index))
    class_prediction_heads.append(
        keras_class_head.ConvolutionalClassHead(
            is_training=is_training,
            num_classes=num_classes,
            use_dropout=use_dropout,
            dropout_keep_prob=dropout_keep_prob,
            kernel_size=kernel_size,
            conv_hyperparams=conv_hyperparams,
            freeze_batchnorm=freeze_batchnorm,
            num_predictions_per_location=num_predictions_per_location,
            class_prediction_bias_init=class_prediction_bias_init,
            use_depthwise=use_depthwise,
            name='ConvolutionalClassHead_%d' % stack_index))
    if mask_head_config is not None:
      if not mask_head_config.masks_are_class_agnostic:
        logging.warning('Note that class specific mask prediction for SSD '
                        'models is memory consuming.')
      mask_prediction_heads.append(
          keras_mask_head.ConvolutionalMaskHead(
              is_training=is_training,
              num_classes=num_classes,
              use_dropout=use_dropout,
              dropout_keep_prob=dropout_keep_prob,
              kernel_size=kernel_size,
              conv_hyperparams=conv_hyperparams,
              freeze_batchnorm=freeze_batchnorm,
              num_predictions_per_location=num_predictions_per_location,
              use_depthwise=use_depthwise,
              mask_height=mask_head_config.mask_height,
              mask_width=mask_head_config.mask_width,
              masks_are_class_agnostic=mask_head_config.
              masks_are_class_agnostic,
              name='ConvolutionalMaskHead_%d' % stack_index))

  return convolutional_keras_box_predictor.ConvolutionalBoxPredictor(
      is_training=is_training,
      num_classes=num_classes,
      box_prediction_heads=box_prediction_heads,
      class_prediction_heads=class_prediction_heads,
      other_heads=other_heads,
      conv_hyperparams=conv_hyperparams,
      num_layers_before_predictor=num_layers_before_predictor,
      min_depth=min_depth,
      max_depth=max_depth,
      freeze_batchnorm=freeze_batchnorm,
      inplace_batchnorm_update=inplace_batchnorm_update,
      name=name)
def build_weight_shared_convolutional_box_predictor(
    is_training,
    num_classes,
...@@ -134,10 +269,10 @@ def build_weight_shared_convolutional_box_predictor(
    dropout_keep_prob=0.8,
    share_prediction_tower=False,
    apply_batch_norm=True,
    use_depthwise=False,
    mask_head_config=None,
    score_converter_fn=tf.identity,
    box_encodings_clip_range=None):
  """Builds and returns a WeightSharedConvolutionalBoxPredictor class.

  Args:
...@@ -161,12 +296,12 @@ def build_weight_shared_convolutional_box_predictor(
      prediction and class prediction heads.
    apply_batch_norm: Whether to apply batch normalization to conv layers in
      this predictor.
    use_depthwise: Whether to use depthwise separable conv2d instead of conv2d.
    mask_head_config: An optional MaskHead object containing configs for mask
      head construction.
    score_converter_fn: Callable score converter to perform elementwise op on
      class scores.
    box_encodings_clip_range: Min and max values for clipping the box_encodings.

  Returns:
    A WeightSharedConvolutionalBoxPredictor class.
...@@ -174,25 +309,31 @@ def build_weight_shared_convolutional_box_predictor(
  box_prediction_head = box_head.WeightSharedConvolutionalBoxHead(
      box_code_size=box_code_size,
      kernel_size=kernel_size,
      use_depthwise=use_depthwise,
      box_encodings_clip_range=box_encodings_clip_range)
  class_prediction_head = (
      class_head.WeightSharedConvolutionalClassHead(
          num_classes=num_classes,
          kernel_size=kernel_size,
          class_prediction_bias_init=class_prediction_bias_init,
          use_dropout=use_dropout,
          dropout_keep_prob=dropout_keep_prob,
          use_depthwise=use_depthwise,
          score_converter_fn=score_converter_fn))
  other_heads = {}
  if mask_head_config is not None:
    if not mask_head_config.masks_are_class_agnostic:
      logging.warning('Note that class specific mask prediction for SSD '
                      'models is memory consuming.')
    other_heads[convolutional_box_predictor.MASK_PREDICTIONS] = (
        mask_head.WeightSharedConvolutionalMaskHead(
            num_classes=num_classes,
            kernel_size=kernel_size,
            use_dropout=use_dropout,
            dropout_keep_prob=dropout_keep_prob,
            mask_height=mask_head_config.mask_height,
            mask_width=mask_head_config.mask_width,
            masks_are_class_agnostic=mask_head_config.masks_are_class_agnostic))
  return convolutional_box_predictor.WeightSharedConvolutionalBoxPredictor(
      is_training=is_training,
      num_classes=num_classes,
...@@ -204,7 +345,8 @@ def build_weight_shared_convolutional_box_predictor(
      num_layers_before_predictor=num_layers_before_predictor,
      kernel_size=kernel_size,
      apply_batch_norm=apply_batch_norm,
      share_prediction_tower=share_prediction_tower,
      use_depthwise=use_depthwise)
def build_mask_rcnn_box_predictor(is_training,
...@@ -292,6 +434,36 @@ def build_mask_rcnn_box_predictor(is_training,
      third_stage_heads=third_stage_heads)
def build_score_converter(score_converter_config, is_training):
  """Builds score converter based on the config.

  Builds one of [tf.identity, tf.sigmoid] score converters based on the config
  and whether the BoxPredictor is for training or inference.

  Args:
    score_converter_config:
      box_predictor_pb2.WeightSharedConvolutionalBoxPredictor.score_converter.
    is_training: Indicates whether the BoxPredictor is in training mode.

  Returns:
    Callable score converter op.

  Raises:
    ValueError: On unknown score converter.
  """
  if score_converter_config == (
      box_predictor_pb2.WeightSharedConvolutionalBoxPredictor.IDENTITY):
    return tf.identity
  if score_converter_config == (
      box_predictor_pb2.WeightSharedConvolutionalBoxPredictor.SIGMOID):
    return tf.identity if is_training else tf.sigmoid
  raise ValueError('Unknown score converter.')


BoxEncodingsClipRange = collections.namedtuple('BoxEncodingsClipRange',
                                               ['min', 'max'])
def build(argscope_fn, box_predictor_config, is_training, num_classes): def build(argscope_fn, box_predictor_config, is_training, num_classes):
"""Builds box predictor based on the configuration. """Builds box predictor based on the configuration.
...@@ -324,6 +496,9 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes): ...@@ -324,6 +496,9 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
config_box_predictor = box_predictor_config.convolutional_box_predictor config_box_predictor = box_predictor_config.convolutional_box_predictor
conv_hyperparams_fn = argscope_fn(config_box_predictor.conv_hyperparams, conv_hyperparams_fn = argscope_fn(config_box_predictor.conv_hyperparams,
is_training) is_training)
mask_head_config = (
config_box_predictor.mask_head
if config_box_predictor.HasField('mask_head') else None)
return build_convolutional_box_predictor( return build_convolutional_box_predictor(
is_training=is_training, is_training=is_training,
num_classes=num_classes, num_classes=num_classes,
...@@ -339,7 +514,8 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes): ...@@ -339,7 +514,8 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
apply_sigmoid_to_scores=config_box_predictor.apply_sigmoid_to_scores, apply_sigmoid_to_scores=config_box_predictor.apply_sigmoid_to_scores,
class_prediction_bias_init=( class_prediction_bias_init=(
config_box_predictor.class_prediction_bias_init), config_box_predictor.class_prediction_bias_init),
use_depthwise=config_box_predictor.use_depthwise) use_depthwise=config_box_predictor.use_depthwise,
mask_head_config=mask_head_config)
  if box_predictor_oneof == 'weight_shared_convolutional_box_predictor':
    config_box_predictor = (
@@ -348,6 +524,21 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
        is_training)
    apply_batch_norm = config_box_predictor.conv_hyperparams.HasField(
        'batch_norm')
    mask_head_config = (
        config_box_predictor.mask_head
        if config_box_predictor.HasField('mask_head') else None)
    # During training, logits are used to compute the loss. Only apply
    # sigmoid at inference to keep the inference graph TPU friendly.
    score_converter_fn = build_score_converter(
        config_box_predictor.score_converter, is_training)
    # Optionally clip box encodings when box_encodings_clip_range is set.
    box_encodings_clip_range = (
        BoxEncodingsClipRange(
            min=config_box_predictor.box_encodings_clip_range.min,
            max=config_box_predictor.box_encodings_clip_range.max)
        if config_box_predictor.HasField('box_encodings_clip_range') else None)
    return build_weight_shared_convolutional_box_predictor(
        is_training=is_training,
        num_classes=num_classes,
@@ -362,7 +553,11 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
        use_dropout=config_box_predictor.use_dropout,
        dropout_keep_prob=config_box_predictor.dropout_keep_probability,
        share_prediction_tower=config_box_predictor.share_prediction_tower,
        apply_batch_norm=apply_batch_norm,
        use_depthwise=config_box_predictor.use_depthwise,
        mask_head_config=mask_head_config,
        score_converter_fn=score_converter_fn,
        box_encodings_clip_range=box_encodings_clip_range)
  if box_predictor_oneof == 'mask_rcnn_box_predictor':
    config_box_predictor = box_predictor_config.mask_rcnn_box_predictor
......
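The new `box_encodings_clip_range` field above clamps each predicted box encoding into a configured `[min, max]` interval before further processing. A minimal pure-Python sketch of the clamping semantics (the helper name is ours, for illustration only; the real predictor operates on tensors):

```python
# Hypothetical illustration of box_encodings_clip_range semantics: every
# encoding value is clamped into [clip_min, clip_max]. Not the library code.
def clip_box_encodings(encodings, clip_min, clip_max):
    return [min(max(v, clip_min), clip_max) for v in encodings]

print(clip_box_encodings([-12.0, 0.5, 3.2, 40.0], -10.0, 10.0))
# → [-10.0, 0.5, 3.2, 10.0]
```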
@@ -14,13 +14,16 @@
# ==============================================================================
"""Tests for box_predictor_builder."""
import mock
import tensorflow as tf

from google.protobuf import text_format
from object_detection.builders import box_predictor_builder
from object_detection.builders import hyperparams_builder
from object_detection.predictors import convolutional_box_predictor
from object_detection.predictors import mask_rcnn_box_predictor
from object_detection.predictors.heads import mask_head
from object_detection.protos import box_predictor_pb2
from object_detection.protos import hyperparams_pb2
@@ -155,6 +158,73 @@ class ConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
    self.assertTrue(box_predictor._is_training)
    self.assertFalse(class_head._use_depthwise)
  def test_construct_default_conv_box_predictor_with_default_mask_head(self):
    box_predictor_text_proto = """
      convolutional_box_predictor {
        mask_head {
        }
        conv_hyperparams {
          regularizer {
            l1_regularizer {
            }
          }
          initializer {
            truncated_normal_initializer {
            }
          }
        }
      }"""
    box_predictor_proto = box_predictor_pb2.BoxPredictor()
    text_format.Merge(box_predictor_text_proto, box_predictor_proto)
    box_predictor = box_predictor_builder.build(
        argscope_fn=hyperparams_builder.build,
        box_predictor_config=box_predictor_proto,
        is_training=True,
        num_classes=90)
    self.assertTrue(convolutional_box_predictor.MASK_PREDICTIONS in
                    box_predictor._other_heads)
    mask_prediction_head = (
        box_predictor._other_heads[convolutional_box_predictor.MASK_PREDICTIONS]
    )
    self.assertEqual(mask_prediction_head._mask_height, 15)
    self.assertEqual(mask_prediction_head._mask_width, 15)
    self.assertTrue(mask_prediction_head._masks_are_class_agnostic)

  def test_construct_default_conv_box_predictor_with_custom_mask_head(self):
    box_predictor_text_proto = """
      convolutional_box_predictor {
        mask_head {
          mask_height: 7
          mask_width: 7
          masks_are_class_agnostic: false
        }
        conv_hyperparams {
          regularizer {
            l1_regularizer {
            }
          }
          initializer {
            truncated_normal_initializer {
            }
          }
        }
      }"""
    box_predictor_proto = box_predictor_pb2.BoxPredictor()
    text_format.Merge(box_predictor_text_proto, box_predictor_proto)
    box_predictor = box_predictor_builder.build(
        argscope_fn=hyperparams_builder.build,
        box_predictor_config=box_predictor_proto,
        is_training=True,
        num_classes=90)
    self.assertTrue(convolutional_box_predictor.MASK_PREDICTIONS in
                    box_predictor._other_heads)
    mask_prediction_head = (
        box_predictor._other_heads[convolutional_box_predictor.MASK_PREDICTIONS]
    )
    self.assertEqual(mask_prediction_head._mask_height, 7)
    self.assertEqual(mask_prediction_head._mask_width, 7)
    self.assertFalse(mask_prediction_head._masks_are_class_agnostic)
class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
@@ -240,7 +310,51 @@ class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
    class_head = box_predictor._class_prediction_head
    self.assertEqual(box_predictor._depth, 2)
    self.assertEqual(box_predictor._num_layers_before_predictor, 2)
    self.assertAlmostEqual(class_head._class_prediction_bias_init, 4.0)
    self.assertEqual(box_predictor.num_classes, 10)
    self.assertFalse(box_predictor._is_training)
    self.assertEqual(box_predictor._apply_batch_norm, False)
  def test_construct_non_default_depthwise_conv_box_predictor(self):
    box_predictor_text_proto = """
      weight_shared_convolutional_box_predictor {
        depth: 2
        num_layers_before_predictor: 2
        kernel_size: 7
        box_code_size: 3
        class_prediction_bias_init: 4.0
        use_depthwise: true
      }
    """
    conv_hyperparams_text_proto = """
      regularizer {
        l1_regularizer {
        }
      }
      initializer {
        truncated_normal_initializer {
        }
      }
    """
    hyperparams_proto = hyperparams_pb2.Hyperparams()
    text_format.Merge(conv_hyperparams_text_proto, hyperparams_proto)

    def mock_conv_argscope_builder(conv_hyperparams_arg, is_training):
      return (conv_hyperparams_arg, is_training)

    box_predictor_proto = box_predictor_pb2.BoxPredictor()
    text_format.Merge(box_predictor_text_proto, box_predictor_proto)
    (box_predictor_proto.weight_shared_convolutional_box_predictor.
     conv_hyperparams.CopyFrom(hyperparams_proto))
    box_predictor = box_predictor_builder.build(
        argscope_fn=mock_conv_argscope_builder,
        box_predictor_config=box_predictor_proto,
        is_training=False,
        num_classes=10)
    class_head = box_predictor._class_prediction_head
    self.assertEqual(box_predictor._depth, 2)
    self.assertEqual(box_predictor._num_layers_before_predictor, 2)
    self.assertEqual(box_predictor._apply_batch_norm, False)
    self.assertEqual(box_predictor._use_depthwise, True)
    self.assertAlmostEqual(class_head._class_prediction_bias_init, 4.0)
    self.assertEqual(box_predictor.num_classes, 10)
    self.assertFalse(box_predictor._is_training)
@@ -302,6 +416,79 @@ class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
    self.assertTrue(box_predictor._is_training)
    self.assertEqual(box_predictor._apply_batch_norm, True)
  def test_construct_weight_shared_predictor_with_default_mask_head(self):
    box_predictor_text_proto = """
      weight_shared_convolutional_box_predictor {
        mask_head {
        }
        conv_hyperparams {
          regularizer {
            l1_regularizer {
            }
          }
          initializer {
            truncated_normal_initializer {
            }
          }
        }
      }"""
    box_predictor_proto = box_predictor_pb2.BoxPredictor()
    text_format.Merge(box_predictor_text_proto, box_predictor_proto)
    box_predictor = box_predictor_builder.build(
        argscope_fn=hyperparams_builder.build,
        box_predictor_config=box_predictor_proto,
        is_training=True,
        num_classes=90)
    self.assertTrue(convolutional_box_predictor.MASK_PREDICTIONS in
                    box_predictor._other_heads)
    weight_shared_convolutional_mask_head = (
        box_predictor._other_heads[convolutional_box_predictor.MASK_PREDICTIONS]
    )
    self.assertIsInstance(weight_shared_convolutional_mask_head,
                          mask_head.WeightSharedConvolutionalMaskHead)
    self.assertEqual(weight_shared_convolutional_mask_head._mask_height, 15)
    self.assertEqual(weight_shared_convolutional_mask_head._mask_width, 15)
    self.assertTrue(
        weight_shared_convolutional_mask_head._masks_are_class_agnostic)

  def test_construct_weight_shared_predictor_with_custom_mask_head(self):
    box_predictor_text_proto = """
      weight_shared_convolutional_box_predictor {
        mask_head {
          mask_height: 7
          mask_width: 7
          masks_are_class_agnostic: false
        }
        conv_hyperparams {
          regularizer {
            l1_regularizer {
            }
          }
          initializer {
            truncated_normal_initializer {
            }
          }
        }
      }"""
    box_predictor_proto = box_predictor_pb2.BoxPredictor()
    text_format.Merge(box_predictor_text_proto, box_predictor_proto)
    box_predictor = box_predictor_builder.build(
        argscope_fn=hyperparams_builder.build,
        box_predictor_config=box_predictor_proto,
        is_training=True,
        num_classes=90)
    self.assertTrue(convolutional_box_predictor.MASK_PREDICTIONS in
                    box_predictor._other_heads)
    weight_shared_convolutional_mask_head = (
        box_predictor._other_heads[convolutional_box_predictor.MASK_PREDICTIONS]
    )
    self.assertIsInstance(weight_shared_convolutional_mask_head,
                          mask_head.WeightSharedConvolutionalMaskHead)
    self.assertEqual(weight_shared_convolutional_mask_head._mask_height, 7)
    self.assertEqual(weight_shared_convolutional_mask_head._mask_width, 7)
    self.assertFalse(
        weight_shared_convolutional_mask_head._masks_are_class_agnostic)
class MaskRCNNBoxPredictorBuilderTest(tf.test.TestCase):
......
@@ -132,6 +132,8 @@ def build(input_reader_config, batch_size=None, transform_input_data_fn=None):
  dataset = read_dataset(
      functools.partial(tf.data.TFRecordDataset, buffer_size=8 * 1000 * 1000),
      config.input_path[:], input_reader_config)
  if input_reader_config.sample_1_of_n_examples > 1:
    dataset = dataset.shard(input_reader_config.sample_1_of_n_examples, 0)
  # TODO(rathodv): make batch size a required argument once the old binaries
  # are deleted.
  if batch_size:
......
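The `sample_1_of_n_examples` field in the hunk above shards the dataset and keeps shard 0, i.e. every n-th example starting from the first. The selection rule can be sketched without `tf.data` (the helper name is ours, for illustration only):

```python
def sample_1_of_n(examples, n):
    """Mimics dataset.shard(n, 0): keep every n-th example, starting at 0."""
    return examples[::n]

print(sample_1_of_n(['0', '1', '2', '3'], 2))  # → ['0', '2']
print(sample_1_of_n(['0', '1'], 1))            # → ['0', '1']
```

This matches the expectations in the new `test_sample_one_of_n_shards` test below, which reads source ids `'0'` and `'2'` from a four-example record with `sample_1_of_n_examples: 2`.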
@@ -20,16 +20,15 @@ import tensorflow as tf

from google.protobuf import text_format

from object_detection.builders import dataset_builder
from object_detection.core import standard_fields as fields
from object_detection.protos import input_reader_pb2
from object_detection.utils import dataset_util


class DatasetBuilderTest(tf.test.TestCase):

  def create_tf_record(self, has_additional_channels=False, num_examples=1):
    path = os.path.join(self.get_temp_dir(), 'tfrecord')
    writer = tf.python_io.TFRecordWriter(path)
@@ -41,40 +40,27 @@ class DatasetBuilderTest(tf.test.TestCase):
      encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).eval()
      encoded_additional_channels_jpeg = tf.image.encode_jpeg(
          tf.constant(additional_channels_tensor)).eval()
    for i in range(num_examples):
      features = {
          'image/source_id': dataset_util.bytes_feature(str(i)),
          'image/encoded': dataset_util.bytes_feature(encoded_jpeg),
          'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
          'image/height': dataset_util.int64_feature(4),
          'image/width': dataset_util.int64_feature(5),
          'image/object/bbox/xmin': dataset_util.float_list_feature([0.0]),
          'image/object/bbox/xmax': dataset_util.float_list_feature([1.0]),
          'image/object/bbox/ymin': dataset_util.float_list_feature([0.0]),
          'image/object/bbox/ymax': dataset_util.float_list_feature([1.0]),
          'image/object/class/label': dataset_util.int64_list_feature([2]),
          'image/object/mask': dataset_util.float_list_feature(flat_mask),
      }
      if has_additional_channels:
        additional_channels_key = 'image/additional_channels/encoded'
        features[additional_channels_key] = dataset_util.bytes_list_feature(
            [encoded_additional_channels_jpeg] * 2)
      example = tf.train.Example(features=tf.train.Features(feature=features))
      writer.write(example.SerializeToString())
    writer.close()
    return path
@@ -93,9 +79,7 @@ class DatasetBuilderTest(tf.test.TestCase):
    tensor_dict = dataset_builder.make_initializable_iterator(
        dataset_builder.build(input_reader_proto, batch_size=1)).get_next()

    with tf.train.MonitoredSession() as sess:
      output_dict = sess.run(tensor_dict)

    self.assertTrue(
@@ -126,9 +110,7 @@ class DatasetBuilderTest(tf.test.TestCase):
    tensor_dict = dataset_builder.make_initializable_iterator(
        dataset_builder.build(input_reader_proto, batch_size=1)).get_next()

    with tf.train.MonitoredSession() as sess:
      output_dict = sess.run(tensor_dict)

    self.assertAllEqual(
        (1, 1, 4, 5),
@@ -158,23 +140,18 @@ class DatasetBuilderTest(tf.test.TestCase):
            transform_input_data_fn=one_hot_class_encoding_fn,
            batch_size=2)).get_next()

    with tf.train.MonitoredSession() as sess:
      output_dict = sess.run(tensor_dict)

    self.assertAllEqual([2, 4, 5, 3],
                        output_dict[fields.InputDataFields.image].shape)
    self.assertAllEqual(
        [2, 1, 3],
        output_dict[fields.InputDataFields.groundtruth_classes].shape)
    self.assertAllEqual(
        [2, 1, 4], output_dict[fields.InputDataFields.groundtruth_boxes].shape)
    self.assertAllEqual([[[0.0, 0.0, 1.0, 1.0]], [[0.0, 0.0, 1.0, 1.0]]],
                        output_dict[fields.InputDataFields.groundtruth_boxes])
  def test_build_tf_record_input_reader_with_batch_size_two_and_masks(self):
    tf_record_path = self.create_tf_record()
@@ -201,9 +178,7 @@ class DatasetBuilderTest(tf.test.TestCase):
            transform_input_data_fn=one_hot_class_encoding_fn,
            batch_size=2)).get_next()

    with tf.train.MonitoredSession() as sess:
      output_dict = sess.run(tensor_dict)

    self.assertAllEqual(
@@ -221,6 +196,50 @@ class DatasetBuilderTest(tf.test.TestCase):
    with self.assertRaises(ValueError):
      dataset_builder.build(input_reader_proto, batch_size=1)
  def test_sample_all_data(self):
    tf_record_path = self.create_tf_record(num_examples=2)
    input_reader_text_proto = """
      shuffle: false
      num_readers: 1
      sample_1_of_n_examples: 1
      tf_record_input_reader {{
        input_path: '{0}'
      }}
    """.format(tf_record_path)
    input_reader_proto = input_reader_pb2.InputReader()
    text_format.Merge(input_reader_text_proto, input_reader_proto)
    tensor_dict = dataset_builder.make_initializable_iterator(
        dataset_builder.build(input_reader_proto, batch_size=1)).get_next()

    with tf.train.MonitoredSession() as sess:
      output_dict = sess.run(tensor_dict)
      self.assertAllEqual(['0'], output_dict[fields.InputDataFields.source_id])
      output_dict = sess.run(tensor_dict)
      self.assertEqual(['1'], output_dict[fields.InputDataFields.source_id])

  def test_sample_one_of_n_shards(self):
    tf_record_path = self.create_tf_record(num_examples=4)
    input_reader_text_proto = """
      shuffle: false
      num_readers: 1
      sample_1_of_n_examples: 2
      tf_record_input_reader {{
        input_path: '{0}'
      }}
    """.format(tf_record_path)
    input_reader_proto = input_reader_pb2.InputReader()
    text_format.Merge(input_reader_text_proto, input_reader_proto)
    tensor_dict = dataset_builder.make_initializable_iterator(
        dataset_builder.build(input_reader_proto, batch_size=1)).get_next()

    with tf.train.MonitoredSession() as sess:
      output_dict = sess.run(tensor_dict)
      self.assertAllEqual(['0'], output_dict[fields.InputDataFields.source_id])
      output_dict = sess.run(tensor_dict)
      self.assertEqual(['2'], output_dict[fields.InputDataFields.source_id])
class ReadDatasetTest(tf.test.TestCase):
@@ -240,11 +259,12 @@ class ReadDatasetTest(tf.test.TestCase):
        f.write('\n'.join([str(i)] * 5))

  def _get_dataset_next(self, files, config, batch_size):

    def decode_func(value):
      return [tf.string_to_number(value, out_type=tf.int32)]

    dataset = dataset_builder.read_dataset(tf.data.TextLineDataset, files,
                                           config)
    dataset = dataset.map(decode_func)
    dataset = dataset.batch(batch_size)
    return dataset.make_one_shot_iterator().get_next()
@@ -254,8 +274,7 @@ class ReadDatasetTest(tf.test.TestCase):
    dataset = tf.data.Dataset.from_tensor_slices([[1, 2, -1, 5]])
    table = tf.contrib.lookup.HashTable(
        initializer=tf.contrib.lookup.KeyValueTensorInitializer(
            keys=keys, values=list(reversed(keys))),
        default_value=100)
    dataset = dataset.map(table.lookup)
    data = dataset_builder.make_initializable_iterator(dataset).get_next()
@@ -270,24 +289,28 @@ class ReadDatasetTest(tf.test.TestCase):
    config.num_readers = 1
    config.shuffle = False

    data = self._get_dataset_next(
        [self._path_template % '*'], config, batch_size=20)
    with self.test_session() as sess:
      self.assertAllEqual(
          sess.run(data), [[
              1, 10, 2, 20, 3, 30, 4, 40, 5, 50, 1, 10, 2, 20, 3, 30, 4, 40, 5,
              50
          ]])

  def test_reduce_num_reader(self):
    config = input_reader_pb2.InputReader()
    config.num_readers = 10
    config.shuffle = False

    data = self._get_dataset_next(
        [self._path_template % '*'], config, batch_size=20)
    with self.test_session() as sess:
      self.assertAllEqual(
          sess.run(data), [[
              1, 10, 2, 20, 3, 30, 4, 40, 5, 50, 1, 10, 2, 20, 3, 30, 4, 40, 5,
              50
          ]])

  def test_enable_shuffle(self):
    config = input_reader_pb2.InputReader()
@@ -321,8 +344,8 @@ class ReadDatasetTest(tf.test.TestCase):
    config.num_readers = 1
    config.shuffle = False

    data = self._get_dataset_next(
        [self._path_template % '0'], config, batch_size=30)
    with self.test_session() as sess:
      # First batch will retrieve as much as it can, second batch will fail.
      self.assertAllEqual(sess.run(data), [[1, 10]])
......
@@ -63,6 +63,7 @@ class KerasLayerHyperparams(object):
      self._batch_norm_params = _build_keras_batch_norm_params(
          hyperparams_config.batch_norm)

    self._activation_fn = _build_activation_fn(hyperparams_config.activation)
    self._op_params = {
        'kernel_regularizer': _build_keras_regularizer(
            hyperparams_config.regularizer),
@@ -126,7 +127,21 @@ class KerasLayerHyperparams(object):
    else:
      return tf.keras.layers.Lambda(tf.identity)
  def build_activation_layer(self, name='activation'):
    """Returns a Keras layer that applies the desired activation function.

    Args:
      name: The name to assign the Keras layer.

    Returns:
      A Keras lambda layer that applies the activation function specified in
      the hyperparams config, or applies the identity if the activation
      function is None.
    """
    if self._activation_fn:
      return tf.keras.layers.Lambda(self._activation_fn, name=name)
    else:
      return tf.keras.layers.Lambda(tf.identity, name=name)

  def params(self, include_activation=False, **overrides):
    """Returns a dict containing the layer construction hyperparameters to use.

    Optionally overrides values in the returned dict. Overrides
@@ -134,12 +149,20 @@ class KerasLayerHyperparams(object):
    future calls.

    Args:
      include_activation: If False, activation in the returned dictionary will
        be set to `None`, and the activation must be applied via a separate
        layer created by `build_activation_layer`. If True, `activation` in
        the output param dictionary will be set to the activation function
        specified in the hyperparams config.
      **overrides: keyword arguments to override in the hyperparams dictionary.

    Returns: dict containing the layer construction keyword arguments, with
      values overridden by the `overrides` keyword arguments.
    """
    new_params = self._op_params.copy()
    new_params['activation'] = None
    if include_activation:
      new_params['activation'] = self._activation_fn
    new_params.update(**overrides)
    return new_params
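The `include_activation` behavior above can be summarized without TensorFlow: by default the returned params force `activation` to `None`, so callers apply the activation as its own layer (e.g. when batch norm must sit between the op and its nonlinearity). A hedged stand-alone sketch, with a hypothetical helper name:

```python
# Illustrative only; mirrors the shape of KerasLayerHyperparams.params(),
# not the library code itself.
def layer_params(op_params, activation_fn, include_activation=False,
                 **overrides):
    params = dict(op_params)  # copy so overrides never leak into later calls
    params['activation'] = activation_fn if include_activation else None
    params.update(overrides)
    return params

relu = lambda x: max(x, 0)
base = {'kernel_regularizer': None}
print(layer_params(base, relu)['activation'])                            # → None
print(layer_params(base, relu, include_activation=True)['activation'] is relu)  # → True
```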
@@ -243,6 +266,8 @@ def _build_slim_regularizer(regularizer):
    return slim.l1_regularizer(scale=float(regularizer.l1_regularizer.weight))
  if regularizer_oneof == 'l2_regularizer':
    return slim.l2_regularizer(scale=float(regularizer.l2_regularizer.weight))
  if regularizer_oneof is None:
    return None
  raise ValueError('Unknown regularizer function: {}'.format(regularizer_oneof))
......
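The regularizer fix above makes an unset `oneof` field return `None` instead of raising. The dispatch shape can be sketched in plain Python (tuples stand in for the slim regularizer objects, which are not constructed here):

```python
def build_regularizer(regularizer_oneof, weight=0.0):
    """Mimics the _build_slim_regularizer dispatch; unset oneof yields None."""
    if regularizer_oneof == 'l1_regularizer':
        return ('l1', float(weight))
    if regularizer_oneof == 'l2_regularizer':
        return ('l2', float(weight))
    if regularizer_oneof is None:
        return None  # no regularizer configured is now a valid state
    raise ValueError(
        'Unknown regularizer function: {}'.format(regularizer_oneof))
```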
@@ -460,6 +460,11 @@ class HyperparamsBuilderTest(tf.test.TestCase):
    keras_config = hyperparams_builder.KerasLayerHyperparams(
        conv_hyperparams_proto)
    self.assertEqual(keras_config.params()['activation'], None)
    self.assertEqual(
        keras_config.params(include_activation=True)['activation'], None)
    activation_layer = keras_config.build_activation_layer()
    self.assertTrue(isinstance(activation_layer, tf.keras.layers.Lambda))
    self.assertEqual(activation_layer.function, tf.identity)
  def test_use_relu_activation(self):
    conv_hyperparams_text_proto = """
@@ -497,7 +502,12 @@ class HyperparamsBuilderTest(tf.test.TestCase):
    text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
    keras_config = hyperparams_builder.KerasLayerHyperparams(
        conv_hyperparams_proto)
    self.assertEqual(keras_config.params()['activation'], None)
    self.assertEqual(
        keras_config.params(include_activation=True)['activation'], tf.nn.relu)
    activation_layer = keras_config.build_activation_layer()
    self.assertTrue(isinstance(activation_layer, tf.keras.layers.Lambda))
    self.assertEqual(activation_layer.function, tf.nn.relu)
  def test_use_relu_6_activation(self):
    conv_hyperparams_text_proto = """
@@ -535,7 +545,12 @@ class HyperparamsBuilderTest(tf.test.TestCase):
    text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)
    keras_config = hyperparams_builder.KerasLayerHyperparams(
        conv_hyperparams_proto)
    self.assertEqual(keras_config.params()['activation'], None)
    self.assertEqual(
        keras_config.params(include_activation=True)['activation'], tf.nn.relu6)
    activation_layer = keras_config.build_activation_layer()
    self.assertTrue(isinstance(activation_layer, tf.keras.layers.Lambda))
    self.assertEqual(activation_layer.function, tf.nn.relu6)

  def test_override_activation_keras(self):
    conv_hyperparams_text_proto = """
......
...@@ -21,11 +21,10 @@ import tensorflow as tf ...@@ -21,11 +21,10 @@ import tensorflow as tf
from google.protobuf import text_format from google.protobuf import text_format
from tensorflow.core.example import example_pb2
from tensorflow.core.example import feature_pb2
from object_detection.builders import input_reader_builder from object_detection.builders import input_reader_builder
from object_detection.core import standard_fields as fields from object_detection.core import standard_fields as fields
from object_detection.protos import input_reader_pb2 from object_detection.protos import input_reader_pb2
from object_detection.utils import dataset_util
class InputReaderBuilderTest(tf.test.TestCase): class InputReaderBuilderTest(tf.test.TestCase):
...@@ -38,27 +37,17 @@ class InputReaderBuilderTest(tf.test.TestCase): ...@@ -38,27 +37,17 @@ class InputReaderBuilderTest(tf.test.TestCase):
flat_mask = (4 * 5) * [1.0] flat_mask = (4 * 5) * [1.0]
with self.test_session(): with self.test_session():
encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).eval() encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).eval()
example = tf.train.Example(features=tf.train.Features(feature={
'image/encoded': dataset_util.bytes_feature(encoded_jpeg),
'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
'image/height': dataset_util.int64_feature(4),
'image/width': dataset_util.int64_feature(5),
'image/object/bbox/xmin': dataset_util.float_list_feature([0.0]),
'image/object/bbox/xmax': dataset_util.float_list_feature([1.0]),
'image/object/bbox/ymin': dataset_util.float_list_feature([0.0]),
'image/object/bbox/ymax': dataset_util.float_list_feature([1.0]),
'image/object/class/label': dataset_util.int64_list_feature([2]),
'image/object/mask': dataset_util.float_list_feature(flat_mask),
}))
writer.write(example.SerializeToString())
writer.close()
@@ -79,9 +68,7 @@ class InputReaderBuilderTest(tf.test.TestCase):
text_format.Merge(input_reader_text_proto, input_reader_proto)
tensor_dict = input_reader_builder.build(input_reader_proto)
with tf.train.MonitoredSession() as sess:
output_dict = sess.run(tensor_dict)
self.assertTrue(fields.InputDataFields.groundtruth_instance_masks
@@ -111,9 +98,7 @@ class InputReaderBuilderTest(tf.test.TestCase):
text_format.Merge(input_reader_text_proto, input_reader_proto)
tensor_dict = input_reader_builder.build(input_reader_proto)
with tf.train.MonitoredSession() as sess:
output_dict = sess.run(tensor_dict)
self.assertEquals(
@@ -14,7 +14,9 @@
# ==============================================================================
"""A function to build a DetectionModel from configuration."""
import functools
from object_detection.builders import anchor_generator_builder
from object_detection.builders import box_coder_builder
from object_detection.builders import box_predictor_builder
@@ -25,6 +27,7 @@ from object_detection.builders import matcher_builder
from object_detection.builders import post_processing_builder
from object_detection.builders import region_similarity_calculator_builder as sim_calc
from object_detection.core import balanced_positive_negative_sampler as sampler
from object_detection.core import post_processing
from object_detection.core import target_assigner
from object_detection.meta_architectures import faster_rcnn_meta_arch
from object_detection.meta_architectures import rfcn_meta_arch
@@ -43,10 +46,15 @@ from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobile
from object_detection.models.ssd_mobilenet_v1_fpn_feature_extractor import SSDMobileNetV1FpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v1_ppn_feature_extractor import SSDMobileNetV1PpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.models.ssd_mobilenet_v2_fpn_feature_extractor import SSDMobileNetV2FpnFeatureExtractor
from object_detection.predictors import rfcn_box_predictor
from object_detection.protos import model_pb2
from object_detection.utils import ops
# BEGIN GOOGLE-INTERNAL
# TODO(lzc): move ssd_mask_meta_arch to third party when it has decent
# performance relative to a comparable Mask R-CNN model (b/112561592).
from google3.image.understanding.object_detection.meta_architectures import ssd_mask_meta_arch
# END GOOGLE-INTERNAL
# A map of names to SSD feature extractors.
SSD_FEATURE_EXTRACTOR_CLASS_MAP = {
@@ -56,6 +64,7 @@ SSD_FEATURE_EXTRACTOR_CLASS_MAP = {
'ssd_mobilenet_v1_fpn': SSDMobileNetV1FpnFeatureExtractor,
'ssd_mobilenet_v1_ppn': SSDMobileNetV1PpnFeatureExtractor,
'ssd_mobilenet_v2': SSDMobileNetV2FeatureExtractor,
'ssd_mobilenet_v2_fpn': SSDMobileNetV2FpnFeatureExtractor,
'ssd_resnet50_v1_fpn': ssd_resnet_v1_fpn.SSDResnet50V1FpnFeatureExtractor,
'ssd_resnet101_v1_fpn': ssd_resnet_v1_fpn.SSDResnet101V1FpnFeatureExtractor,
'ssd_resnet152_v1_fpn': ssd_resnet_v1_fpn.SSDResnet152V1FpnFeatureExtractor,
@@ -170,8 +179,12 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
if feature_extractor_config.HasField('fpn'):
kwargs.update({
'fpn_min_level':
feature_extractor_config.fpn.min_level,
'fpn_max_level':
feature_extractor_config.fpn.max_level,
'additional_layer_depth':
feature_extractor_config.fpn.additional_layer_depth,
})
return feature_extractor_class(**kwargs)
@@ -240,25 +253,41 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
desired_negative_sampling_ratio=ssd_config.
desired_negative_sampling_ratio)
ssd_meta_arch_fn = ssd_meta_arch.SSDMetaArch
# BEGIN GOOGLE-INTERNAL
# TODO(lzc): move ssd_mask_meta_arch to third party when it has decent
# performance relative to a comparable Mask R-CNN model (b/112561592).
predictor_config = ssd_config.box_predictor
predict_instance_masks = False
if predictor_config.WhichOneof(
'box_predictor_oneof') == 'convolutional_box_predictor':
predict_instance_masks = (
predictor_config.convolutional_box_predictor.HasField('mask_head'))
elif predictor_config.WhichOneof(
'box_predictor_oneof') == 'weight_shared_convolutional_box_predictor':
predict_instance_masks = (
predictor_config.weight_shared_convolutional_box_predictor.HasField(
'mask_head'))
if predict_instance_masks:
ssd_meta_arch_fn = ssd_mask_meta_arch.SSDMaskMetaArch
# END GOOGLE-INTERNAL
return ssd_meta_arch_fn(
is_training=is_training,
anchor_generator=anchor_generator,
box_predictor=ssd_box_predictor,
box_coder=box_coder,
feature_extractor=feature_extractor,
encode_background_as_zeros=encode_background_as_zeros,
image_resizer_fn=image_resizer_fn,
non_max_suppression_fn=non_max_suppression_fn,
score_conversion_fn=score_conversion_fn,
classification_loss=classification_loss,
localization_loss=localization_loss,
classification_loss_weight=classification_weight,
localization_loss_weight=localization_weight,
normalize_loss_by_num_matches=normalize_loss_by_num_matches,
hard_example_miner=hard_example_miner,
target_assigner_instance=target_assigner_instance,
add_summaries=add_summaries,
normalize_loc_loss_by_codesize=normalize_loc_loss_by_codesize,
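The Google-internal block in this hunk swaps in `SSDMaskMetaArch` whenever the configured box predictor carries a `mask_head`. Below is a minimal pure-Python sketch of that dispatch, using a dict as a stand-in for the protobuf oneof (the real code calls `WhichOneof` and `HasField`; class names here are placeholders):

```python
# Pure-Python stand-in for the proto-based dispatch above.
class SSDMetaArch:
    pass

class SSDMaskMetaArch(SSDMetaArch):
    pass

def select_meta_arch(predictor_config):
    # 'box_predictor_oneof' mimics WhichOneof; nested-dict membership
    # mimics HasField('mask_head').
    oneof = predictor_config.get('box_predictor_oneof')
    if oneof in ('convolutional_box_predictor',
                 'weight_shared_convolutional_box_predictor'):
        if 'mask_head' in predictor_config.get(oneof, {}):
            return SSDMaskMetaArch
    return SSDMetaArch

cfg = {'box_predictor_oneof': 'convolutional_box_predictor',
       'convolutional_box_predictor': {'mask_head': {}}}
print(select_meta_arch(cfg).__name__)  # -> SSDMaskMetaArch
```

The point of the pattern: the builder resolves the class once, then issues a single keyword-argument call, so the mask-aware variant needs no changes to the call site.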
@@ -350,12 +379,27 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
frcnn_config.first_stage_box_predictor_kernel_size)
first_stage_box_predictor_depth = frcnn_config.first_stage_box_predictor_depth
first_stage_minibatch_size = frcnn_config.first_stage_minibatch_size
# TODO(bhattad): When eval is supported using static shapes, add separate
# use_static_shapes_for_training and use_static_shapes_for_evaluation.
use_static_shapes = frcnn_config.use_static_shapes and is_training
first_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=frcnn_config.first_stage_positive_balance_fraction,
is_static=frcnn_config.use_static_balanced_label_sampler and is_training)
first_stage_max_proposals = frcnn_config.first_stage_max_proposals
if (frcnn_config.first_stage_nms_iou_threshold < 0 or
frcnn_config.first_stage_nms_iou_threshold > 1.0):
raise ValueError('iou_threshold not in [0, 1.0].')
if (is_training and frcnn_config.second_stage_batch_size >
first_stage_max_proposals):
raise ValueError('second_stage_batch_size should be no greater than '
'first_stage_max_proposals.')
first_stage_non_max_suppression_fn = functools.partial(
post_processing.batch_multiclass_non_max_suppression,
score_thresh=frcnn_config.first_stage_nms_score_threshold,
iou_thresh=frcnn_config.first_stage_nms_iou_threshold,
max_size_per_class=frcnn_config.first_stage_max_proposals,
max_total_size=frcnn_config.first_stage_max_proposals,
use_static_shapes=use_static_shapes and is_training)
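The builder freezes all first-stage NMS hyperparameters into a single callable with `functools.partial`, so the meta-architecture never touches the raw config. A hedged sketch of the same pattern, with `toy_nms` as a hypothetical stand-in for `post_processing.batch_multiclass_non_max_suppression`:

```python
import functools

# `toy_nms` is an illustrative stand-in: it keeps indices of scores above
# the threshold, capped at max_total_size (no real IoU suppression).
def toy_nms(scores, score_thresh, iou_thresh, max_total_size):
    keep = [i for i, s in enumerate(scores) if s >= score_thresh]
    return keep[:max_total_size]

# All config-derived values are bound up front, exactly like
# first_stage_non_max_suppression_fn above.
first_stage_nms_fn = functools.partial(
    toy_nms, score_thresh=0.5, iou_thresh=0.7, max_total_size=2)

print(first_stage_nms_fn(scores=[0.9, 0.4, 0.8, 0.6]))  # -> [0, 2]
```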
first_stage_loc_loss_weight = (
frcnn_config.first_stage_localization_loss_weight)
first_stage_obj_loss_weight = frcnn_config.first_stage_objectness_loss_weight
@@ -376,7 +420,7 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
second_stage_batch_size = frcnn_config.second_stage_batch_size
second_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=frcnn_config.second_stage_balance_fraction,
is_static=frcnn_config.use_static_balanced_label_sampler and is_training)
(second_stage_non_max_suppression_fn, second_stage_score_conversion_fn
) = post_processing_builder.build(frcnn_config.second_stage_post_processing)
second_stage_localization_loss_weight = (
@@ -396,7 +440,9 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
second_stage_classification_loss_weight,
second_stage_localization_loss_weight)
crop_and_resize_fn = (
ops.matmul_crop_and_resize if frcnn_config.use_matmul_crop_and_resize
else ops.native_crop_and_resize)
clip_anchors_to_image = (
frcnn_config.clip_anchors_to_image)
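Note the design change in this hunk: instead of threading a `use_matmul_crop_and_resize` boolean all the way into the meta-architecture, the builder resolves the flag to a callable once. A small sketch of the pattern, where the two functions are hypothetical stand-ins for `ops.matmul_crop_and_resize` and `ops.native_crop_and_resize`:

```python
# Stand-ins: the real ops crop box regions out of an image tensor; here
# they just report which implementation was selected.
def matmul_crop_and_resize(image, boxes):
    return 'matmul'

def native_crop_and_resize(image, boxes):
    return 'native'

def pick_crop_and_resize_fn(use_matmul_crop_and_resize):
    # Resolve the config flag to a single callable at build time, so the
    # consumer only ever sees `crop_and_resize_fn`.
    return (matmul_crop_and_resize if use_matmul_crop_and_resize
            else native_crop_and_resize)

crop_fn = pick_crop_and_resize_fn(True)
print(crop_fn(None, None))  # -> matmul
```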
@@ -416,8 +462,7 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
'first_stage_box_predictor_depth': first_stage_box_predictor_depth,
'first_stage_minibatch_size': first_stage_minibatch_size,
'first_stage_sampler': first_stage_sampler,
'first_stage_non_max_suppression_fn': first_stage_non_max_suppression_fn,
'first_stage_max_proposals': first_stage_max_proposals,
'first_stage_localization_loss_weight': first_stage_loc_loss_weight,
'first_stage_objectness_loss_weight': first_stage_obj_loss_weight,
@@ -435,8 +480,10 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
second_stage_classification_loss_weight,
'hard_example_miner': hard_example_miner,
'add_summaries': add_summaries,
'crop_and_resize_fn': crop_and_resize_fn,
'clip_anchors_to_image': clip_anchors_to_image,
'use_static_shapes': use_static_shapes,
'resize_masks': frcnn_config.resize_masks
}
if isinstance(second_stage_box_predictor,
@@ -15,6 +15,8 @@
"""Tests for object_detection.models.model_builder."""
from absl.testing import parameterized
import tensorflow as tf
from google.protobuf import text_format
@@ -36,7 +38,13 @@ from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobile
from object_detection.models.ssd_mobilenet_v1_fpn_feature_extractor import SSDMobileNetV1FpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v1_ppn_feature_extractor import SSDMobileNetV1PpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.models.ssd_mobilenet_v2_fpn_feature_extractor import SSDMobileNetV2FpnFeatureExtractor
from object_detection.protos import model_pb2
# BEGIN GOOGLE-INTERNAL
# TODO(lzc): move ssd_mask_meta_arch to third party when it has decent
# performance relative to a comparable Mask R-CNN model (b/112561592).
from google3.image.understanding.object_detection.meta_architectures import ssd_mask_meta_arch
# END GOOGLE-INTERNAL
FRCNN_RESNET_FEAT_MAPS = {
'faster_rcnn_resnet50':
@@ -66,7 +74,7 @@ SSD_RESNET_V1_PPN_FEAT_MAPS = {
}
class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
def create_model(self, model_config):
"""Builds a DetectionModel based on the model config.
@@ -161,6 +169,162 @@ class ModelBuilderTest(tf.test.TestCase):
'desired_negative_sampling_ratio': 2
})
# BEGIN GOOGLE-INTERNAL
# TODO(lzc): move ssd_mask_meta_arch to third party when it has decent
# performance relative to a comparable Mask R-CNN model (b/112561592).
def test_create_ssd_conv_predictor_model_with_mask(self):
model_text_proto = """
ssd {
feature_extractor {
type: 'ssd_inception_v2'
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
override_base_feature_extractor_hyperparams: true
}
box_coder {
faster_rcnn_box_coder {
}
}
matcher {
argmax_matcher {
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
aspect_ratios: 1.0
}
}
image_resizer {
fixed_shape_resizer {
height: 320
width: 320
}
}
box_predictor {
convolutional_box_predictor {
mask_head {
}
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}
}
loss {
classification_loss {
weighted_softmax {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
}
use_expected_classification_loss_under_sampling: true
minimum_negative_sampling: 10
desired_negative_sampling_ratio: 2
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
model = self.create_model(model_proto)
self.assertIsInstance(model, ssd_mask_meta_arch.SSDMaskMetaArch)
def test_create_ssd_weight_shared_predictor_model_with_mask(self):
model_text_proto = """
ssd {
feature_extractor {
type: 'ssd_inception_v2'
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
override_base_feature_extractor_hyperparams: true
}
box_coder {
faster_rcnn_box_coder {
}
}
matcher {
argmax_matcher {
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
aspect_ratios: 1.0
}
}
image_resizer {
fixed_shape_resizer {
height: 320
width: 320
}
}
box_predictor {
weight_shared_convolutional_box_predictor {
mask_head {
}
depth: 32
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
random_normal_initializer {
}
}
}
num_layers_before_predictor: 1
}
}
loss {
classification_loss {
weighted_softmax {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
}
use_expected_classification_loss_under_sampling: true
minimum_negative_sampling: 10
desired_negative_sampling_ratio: 2
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
model = self.create_model(model_proto)
self.assertIsInstance(model, ssd_mask_meta_arch.SSDMaskMetaArch)
# END GOOGLE-INTERNAL
def test_create_ssd_inception_v3_model_from_config(self):
model_text_proto = """
ssd {
@@ -712,6 +876,170 @@ class ModelBuilderTest(tf.test.TestCase):
self.assertTrue(model._normalize_loc_loss_by_codesize)
self.assertTrue(model._target_assigner._weight_regression_loss_by_score)
def test_create_ssd_mobilenet_v2_fpn_model_from_config(self):
model_text_proto = """
ssd {
freeze_batchnorm: true
inplace_batchnorm_update: true
feature_extractor {
type: 'ssd_mobilenet_v2_fpn'
fpn {
min_level: 3
max_level: 7
}
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}
box_coder {
faster_rcnn_box_coder {
}
}
matcher {
argmax_matcher {
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
aspect_ratios: 1.0
}
}
image_resizer {
fixed_shape_resizer {
height: 320
width: 320
}
}
box_predictor {
convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}
}
normalize_loc_loss_by_codesize: true
loss {
classification_loss {
weighted_softmax {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
}
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
model = self.create_model(model_proto)
self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch)
self.assertIsInstance(model._feature_extractor,
SSDMobileNetV2FpnFeatureExtractor)
self.assertTrue(model._normalize_loc_loss_by_codesize)
self.assertTrue(model._freeze_batchnorm)
self.assertTrue(model._inplace_batchnorm_update)
def test_create_ssd_mobilenet_v2_fpnlite_model_from_config(self):
model_text_proto = """
ssd {
freeze_batchnorm: true
inplace_batchnorm_update: true
feature_extractor {
type: 'ssd_mobilenet_v2_fpn'
use_depthwise: true
fpn {
min_level: 3
max_level: 7
additional_layer_depth: 128
}
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}
box_coder {
faster_rcnn_box_coder {
}
}
matcher {
argmax_matcher {
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
aspect_ratios: 1.0
}
}
image_resizer {
fixed_shape_resizer {
height: 320
width: 320
}
}
box_predictor {
convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}
}
normalize_loc_loss_by_codesize: true
loss {
classification_loss {
weighted_softmax {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
}
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
model = self.create_model(model_proto)
self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch)
self.assertIsInstance(model._feature_extractor,
SSDMobileNetV2FpnFeatureExtractor)
self.assertTrue(model._normalize_loc_loss_by_codesize)
self.assertTrue(model._freeze_batchnorm)
self.assertTrue(model._inplace_batchnorm_update)
def test_create_embedded_ssd_mobilenet_v1_model_from_config(self):
model_text_proto = """
ssd {
@@ -845,13 +1173,19 @@ class ModelBuilderTest(tf.test.TestCase):
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
for extractor_type, extractor_class in FRCNN_RESNET_FEAT_MAPS.items():
model_proto.faster_rcnn.feature_extractor.type = extractor_type
model = model_builder.build(model_proto, is_training=True)
self.assertIsInstance(model, faster_rcnn_meta_arch.FasterRCNNMetaArch)
self.assertIsInstance(model._feature_extractor, extractor_class)
@parameterized.parameters(
{'use_matmul_crop_and_resize': False},
{'use_matmul_crop_and_resize': True},
)
def test_create_faster_rcnn_resnet101_with_mask_prediction_enabled(
self, use_matmul_crop_and_resize):
model_text_proto = """
faster_rcnn {
num_classes: 3
@@ -924,6 +1258,8 @@ class ModelBuilderTest(tf.test.TestCase):
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
model_proto.faster_rcnn.use_matmul_crop_and_resize = (
use_matmul_crop_and_resize)
model = model_builder.build(model_proto, is_training=True)
self.assertAlmostEqual(model._second_stage_mask_loss_weight, 3.0)
@@ -84,7 +84,8 @@ def _build_non_max_suppressor(nms_config):
score_thresh=nms_config.score_threshold,
iou_thresh=nms_config.iou_threshold,
max_size_per_class=nms_config.max_detections_per_class,
max_total_size=nms_config.max_total_detections,
use_static_shapes=nms_config.use_static_shapes)
return non_max_suppressor_fn
@@ -24,6 +24,10 @@ for obtaining the desired batch_size, it returns fewer examples.
The main function to call is Subsample(self, indicator, labels). For convenience
one can also call SubsampleWeights(self, weights, labels) which is defined in
the minibatch_sampler base class.
When is_static is True, it implements a method that guarantees static shapes.
It also ensures that the length of the subsampled output is always batch_size,
even when the number of examples set to True in indicator is less than
batch_size.
"""
import tensorflow as tf
@@ -102,13 +106,14 @@ class BalancedPositiveNegativeSampler(minibatch_sampler.MinibatchSampler):
end_positions = tf.greater_equal(
tf.range(input_length), input_length - num_end_samples)
selected_positions = tf.logical_or(start_positions, end_positions)
selected_positions = tf.cast(selected_positions, tf.float32)
indexed_positions = tf.multiply(tf.cumsum(selected_positions),
selected_positions)
one_hot_selector = tf.one_hot(tf.cast(indexed_positions, tf.int32) - 1,
total_num_samples,
dtype=tf.float32)
return tf.cast(tf.tensordot(tf.cast(input_tensor, tf.float32),
one_hot_selector, axes=[0, 0]), tf.int32)
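The one-hot/tensordot combination in this hunk is a shape-static substitute for boolean masking: instead of a data-dependent `tf.boolean_mask`, each selected input position is routed to a fixed output slot by a matmul (the float casts exist because the matmul path wants a floating dtype). A NumPy sketch of the same gather — illustrative only, not the TF code:

```python
import numpy as np

def gather_start_end(values, num_start, num_end):
    """Gathers the first num_start and last num_end entries of `values`
    using only elementwise ops and one matmul, so every intermediate
    shape is known statically (mirroring the TF code above)."""
    n = len(values)
    selected = ((np.arange(n) < num_start) |
                (np.arange(n) >= n - num_end)).astype(np.float32)
    # 1-based running index of each selected position, 0 elsewhere.
    indexed = np.cumsum(selected) * selected
    total = num_start + num_end
    # Row i of one_hot routes input position i to its output slot.
    one_hot = np.eye(total, dtype=np.float32)[
        np.maximum(indexed.astype(int) - 1, 0)]
    one_hot *= selected[:, None]  # zero rows for unselected positions
    return values.astype(np.float32) @ one_hot

print(gather_start_end(np.array([10, 20, 30, 40, 50]), 2, 1))  # -> [10. 20. 50.]
```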
def _static_subsample(self, indicator, batch_size, labels):
"""Returns subsampled minibatch.
@@ -122,7 +127,9 @@ class BalancedPositiveNegativeSampler(minibatch_sampler.MinibatchSampler):
Returns:
sampled_idx_indicator: boolean tensor of shape [N], True for entries which
are sampled. It ensures the length of output of the subsample is always
batch_size, even when number of examples set to True in indicator is
less than batch_size.
Raises:
ValueError: if labels and indicator are not 1D boolean tensors.
@@ -140,6 +147,14 @@ class BalancedPositiveNegativeSampler(minibatch_sampler.MinibatchSampler):
input_length = tf.shape(indicator)[0]
# Set the number of examples set True in indicator to be at least
# batch_size.
num_true_sampled = tf.reduce_sum(tf.cast(indicator, tf.float32))
additional_false_sample = tf.less_equal(
tf.cumsum(tf.cast(tf.logical_not(indicator), tf.float32)),
batch_size - num_true_sampled)
indicator = tf.logical_or(indicator, additional_false_sample)
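The cumsum trick in the lines above tops up `indicator` with just enough currently-False entries to guarantee `batch_size` True values before the static subsample runs. A NumPy analogue of the same logic — illustrative, not the TF code:

```python
import numpy as np

def pad_indicator(indicator, batch_size):
    """Sets extra (currently-False) entries of `indicator` to True so that
    at least batch_size entries are True, mirroring the TF cumsum trick."""
    indicator = np.asarray(indicator, dtype=bool)
    num_true = indicator.sum()
    # The running count of False entries selects the first
    # (batch_size - num_true) of them; OR-ing with the original
    # indicator leaves already-True entries untouched.
    additional_false_sample = np.cumsum(~indicator) <= (batch_size - num_true)
    return indicator | additional_false_sample

padded = pad_indicator([True, False, False, True, False], batch_size=4)
print(padded.sum())  # -> 4
```

If `indicator` already has at least `batch_size` True entries, the threshold is non-positive and nothing is added.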
# Shuffle indicator and label. Need to store the permutation to restore the
# order post sampling.
permutation = tf.random_shuffle(tf.range(input_length))
@@ -148,7 +163,7 @@ class BalancedPositiveNegativeSampler(minibatch_sampler.MinibatchSampler):
labels = ops.matmul_gather_on_zeroth_axis(
tf.cast(labels, tf.float32), permutation)
# index (starting from 1) when indicator is True, 0 when False
indicator_idx = tf.where( indicator_idx = tf.where(
tf.cast(indicator, tf.bool), tf.range(1, input_length + 1), tf.cast(indicator, tf.bool), tf.range(1, input_length + 1),
tf.zeros(input_length, tf.int32)) tf.zeros(input_length, tf.int32))
...@@ -183,9 +198,10 @@ class BalancedPositiveNegativeSampler(minibatch_sampler.MinibatchSampler):
        axis=0), tf.bool)
    # project back the order based on stored permutations
    reprojections = tf.one_hot(permutation, depth=input_length,
                               dtype=tf.float32)
    return tf.cast(tf.tensordot(
        tf.cast(sampled_idx_indicator, tf.float32),
        reprojections, axes=[0, 0]), tf.bool)
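The reprojection change above replaces a dynamic gather with a one-hot of the stored permutation plus a tensordot. A NumPy sketch of the same idea (`reproject` is an illustrative name, not library API):

```python
import numpy as np

def reproject(sampled_idx_indicator, permutation):
  # NumPy sketch of the reprojection above: `sampled_idx_indicator` holds
  # sampled flags in *shuffled* order; contracting it against the one-hot
  # permutation matrix restores the original ordering with static shapes.
  n = len(permutation)
  reprojections = np.eye(n, dtype=np.float32)[permutation]  # one_hot(permutation)
  restored = np.tensordot(
      np.asarray(sampled_idx_indicator, np.float32), reprojections,
      axes=([0], [0]))
  return restored.astype(bool)

# Shuffled positions 0 and 2 were sampled; they map back to original
# indices 2 and 1, so the restored mask is [False, True, True].
restored = reproject([True, False, True], [2, 0, 1])
```

Doing the contraction in float32 (as the change above switches to) keeps the tensordot on a dtype that TPUs and `tf.tensordot` handle uniformly.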
  def subsample(self, indicator, batch_size, labels, scope=None):
......
...@@ -24,7 +24,7 @@ from object_detection.utils import test_case
class BalancedPositiveNegativeSamplerTest(test_case.TestCase):

  def test_subsample_all_examples_dynamic(self):
    numpy_labels = np.random.permutation(300)
    indicator = tf.constant(np.ones(300) == 1)
    numpy_labels = (numpy_labels - 200) > 0
...@@ -32,8 +32,7 @@ class BalancedPositiveNegativeSamplerTest(test_case.TestCase):
    labels = tf.constant(numpy_labels)

    sampler = (
        balanced_positive_negative_sampler.BalancedPositiveNegativeSampler())
    is_sampled = sampler.subsample(indicator, 64, labels)
    with self.test_session() as sess:
      is_sampled = sess.run(is_sampled)
...@@ -42,13 +41,26 @@ class BalancedPositiveNegativeSamplerTest(test_case.TestCase):
      self.assertTrue(sum(np.logical_and(
          np.logical_not(numpy_labels), is_sampled)) == 32)
  def test_subsample_all_examples_static(self):
    numpy_labels = np.random.permutation(300)
    indicator = np.array(np.ones(300) == 1, np.bool)
    numpy_labels = (numpy_labels - 200) > 0

    labels = np.array(numpy_labels, np.bool)

    def graph_fn(indicator, labels):
      sampler = (
          balanced_positive_negative_sampler.BalancedPositiveNegativeSampler(
              is_static=True))
      return sampler.subsample(indicator, 64, labels)

    is_sampled = self.execute(graph_fn, [indicator, labels])
    self.assertTrue(sum(is_sampled) == 64)
    self.assertTrue(sum(np.logical_and(numpy_labels, is_sampled)) == 32)
    self.assertTrue(sum(np.logical_and(
        np.logical_not(numpy_labels), is_sampled)) == 32)
  def test_subsample_selection_dynamic(self):
    # Test random sampling when only some examples can be sampled:
    # 100 samples, 20 positives, 10 positives cannot be sampled
    numpy_labels = np.arange(100)
...@@ -59,8 +71,7 @@ class BalancedPositiveNegativeSamplerTest(test_case.TestCase):
    labels = tf.constant(numpy_labels)

    sampler = (
        balanced_positive_negative_sampler.BalancedPositiveNegativeSampler())
    is_sampled = sampler.subsample(indicator, 64, labels)
    with self.test_session() as sess:
      is_sampled = sess.run(is_sampled)
...@@ -71,13 +82,30 @@ class BalancedPositiveNegativeSamplerTest(test_case.TestCase):
      self.assertAllEqual(is_sampled, np.logical_and(is_sampled,
                                                     numpy_indicator))
  def test_subsample_selection_static(self):
    # Test random sampling when only some examples can be sampled:
    # 100 samples, 20 positives, 10 positives cannot be sampled.
    numpy_labels = np.arange(100)
    numpy_indicator = numpy_labels < 90
    indicator = np.array(numpy_indicator, np.bool)
    numpy_labels = (numpy_labels - 80) >= 0

    labels = np.array(numpy_labels, np.bool)

    def graph_fn(indicator, labels):
      sampler = (
          balanced_positive_negative_sampler.BalancedPositiveNegativeSampler(
              is_static=True))
      return sampler.subsample(indicator, 64, labels)

    is_sampled = self.execute(graph_fn, [indicator, labels])
    self.assertTrue(sum(is_sampled) == 64)
    self.assertTrue(sum(np.logical_and(numpy_labels, is_sampled)) == 10)
    self.assertTrue(sum(np.logical_and(
        np.logical_not(numpy_labels), is_sampled)) == 54)
    self.assertAllEqual(is_sampled, np.logical_and(is_sampled, numpy_indicator))
  def test_subsample_selection_larger_batch_size_dynamic(self):
    # Test random sampling when the total number of examples that can be
    # sampled is less than the batch size:
    # 100 samples, 50 positives, 40 positives cannot be sampled, batch size 64.
...@@ -89,8 +117,7 @@ class BalancedPositiveNegativeSamplerTest(test_case.TestCase):
    labels = tf.constant(numpy_labels)

    sampler = (
        balanced_positive_negative_sampler.BalancedPositiveNegativeSampler())
    is_sampled = sampler.subsample(indicator, 64, labels)
    with self.test_session() as sess:
      is_sampled = sess.run(is_sampled)
...@@ -101,11 +128,31 @@ class BalancedPositiveNegativeSamplerTest(test_case.TestCase):
      self.assertAllEqual(is_sampled, np.logical_and(is_sampled,
                                                     numpy_indicator))
  def test_subsample_selection_larger_batch_size_static(self):
    # Test random sampling when the total number of examples that can be
    # sampled is less than the batch size:
    # 100 samples, 50 positives, 40 positives cannot be sampled, batch size 64.
    # It should still return 64 samples, with 4 of them that couldn't have been
    # sampled.
    numpy_labels = np.arange(100)
    numpy_indicator = numpy_labels < 60
    indicator = np.array(numpy_indicator, np.bool)
    numpy_labels = (numpy_labels - 50) >= 0

    labels = np.array(numpy_labels, np.bool)

    def graph_fn(indicator, labels):
      sampler = (
          balanced_positive_negative_sampler.BalancedPositiveNegativeSampler(
              is_static=True))
      return sampler.subsample(indicator, 64, labels)

    is_sampled = self.execute(graph_fn, [indicator, labels])
    self.assertTrue(sum(is_sampled) == 64)
    self.assertTrue(sum(np.logical_and(numpy_labels, is_sampled)) >= 10)
    self.assertTrue(
        sum(np.logical_and(np.logical_not(numpy_labels), is_sampled)) >= 50)
    self.assertTrue(sum(np.logical_and(is_sampled, numpy_indicator)) == 60)
  def test_subsample_selection_no_batch_size(self):
    # Test random sampling when only some examples can be sampled:
......
...@@ -26,6 +26,7 @@ BoxList are retained unless documented otherwise.
import tensorflow as tf

from object_detection.core import box_list
from object_detection.utils import ops
from object_detection.utils import shape_utils
...@@ -420,7 +421,8 @@ def sq_dist(boxlist1, boxlist2, scope=None):
    return sqnorm1 + tf.transpose(sqnorm2) - 2.0 * innerprod


def boolean_mask(boxlist, indicator, fields=None, scope=None,
                 use_static_shapes=False, indicator_sum=None):
  """Select boxes from BoxList according to indicator and return new BoxList.

  `boolean_mask` returns the subset of boxes that are marked as "True" by the
...@@ -436,6 +438,10 @@ def boolean_mask(boxlist, indicator, fields=None, scope=None):
      all fields are gathered from. Pass an empty fields list to only gather
      the box coordinates.
    scope: name scope.
    use_static_shapes: Whether to use an implementation with static shape
      guarantees.
    indicator_sum: An integer containing the sum of `indicator` vector. Only
      required if `use_static_shapes` is True.

  Returns:
    subboxlist: a BoxList corresponding to the subset of the input BoxList
...@@ -448,18 +454,36 @@ def boolean_mask(boxlist, indicator, fields=None, scope=None):
      raise ValueError('indicator should have rank 1')
    if indicator.dtype != tf.bool:
      raise ValueError('indicator should be a boolean tensor')
    if use_static_shapes:
      if not (indicator_sum and isinstance(indicator_sum, int)):
        raise ValueError('`indicator_sum` must be of type int')
      selected_positions = tf.to_float(indicator)
      indexed_positions = tf.cast(
          tf.multiply(
              tf.cumsum(selected_positions), selected_positions),
          dtype=tf.int32)
      one_hot_selector = tf.one_hot(
          indexed_positions - 1, indicator_sum, dtype=tf.float32)
      sampled_indices = tf.cast(
          tf.tensordot(
              tf.to_float(tf.range(tf.shape(indicator)[0])),
              one_hot_selector,
              axes=[0, 0]),
          dtype=tf.int32)
      return gather(boxlist, sampled_indices, use_static_shapes=True)
    else:
      subboxlist = box_list.BoxList(tf.boolean_mask(boxlist.get(), indicator))
      if fields is None:
        fields = boxlist.get_extra_fields()
      for field in fields:
        if not boxlist.has_field(field):
          raise ValueError('boxlist must contain all specified fields')
        subfieldlist = tf.boolean_mask(boxlist.get_field(field), indicator)
        subboxlist.add_field(field, subfieldlist)
      return subboxlist
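The static-shape branch above converts a boolean mask with a statically known True-count into a fixed-length index vector via cumsum and one-hot. A NumPy sketch of that index computation (`static_mask_indices` is an illustrative name, not library API):

```python
import numpy as np

def static_mask_indices(indicator, indicator_sum):
  # NumPy sketch of the static-shape path above: turn a boolean mask whose
  # True-count is statically known (indicator_sum) into a fixed-length
  # vector of selected indices, so downstream shapes stay static.
  indicator = np.asarray(indicator, dtype=bool)
  n = len(indicator)
  selected = indicator.astype(np.float32)
  # 1-based rank among selected entries, 0 for unselected ones.
  indexed = (np.cumsum(selected) * selected).astype(np.int32)
  # tf.one_hot maps the -1 entries (unselected positions) to all-zero rows;
  # emulate that behavior here.
  one_hot = np.zeros((n, indicator_sum), np.float32)
  for i, j in enumerate(indexed - 1):
    if j >= 0:
      one_hot[i, j] = 1.0
  return np.tensordot(np.arange(n, dtype=np.float32), one_hot,
                      axes=([0], [0])).astype(np.int32)

indices = static_mask_indices([True, False, True, False, True], 3)
# indices == [0, 2, 4]; gather(boxlist, indices) then selects those rows.
```

This is why `indicator_sum` must be supplied: it fixes the output length at graph-construction time.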
def gather(boxlist, indices, fields=None, scope=None, use_static_shapes=False):
  """Gather boxes from BoxList according to indices and return new BoxList.

  By default, `gather` returns boxes corresponding to the input index list, as
...@@ -474,6 +498,8 @@ def gather(boxlist, indices, fields=None, scope=None):
      all fields are gathered from. Pass an empty fields list to only gather
      the box coordinates.
    scope: name scope.
    use_static_shapes: Whether to use an implementation with static shape
      guarantees.

  Returns:
    subboxlist: a BoxList corresponding to the subset of the input BoxList
...@@ -487,13 +513,17 @@ def gather(boxlist, indices, fields=None, scope=None):
      raise ValueError('indices should have rank 1')
    if indices.dtype != tf.int32 and indices.dtype != tf.int64:
      raise ValueError('indices should be an int32 / int64 tensor')
    gather_op = tf.gather
    if use_static_shapes:
      gather_op = ops.matmul_gather_on_zeroth_axis
    subboxlist = box_list.BoxList(gather_op(boxlist.get(), indices))
    if fields is None:
      fields = boxlist.get_extra_fields()
    fields += ['boxes']
    for field in fields:
      if not boxlist.has_field(field):
        raise ValueError('boxlist must contain all specified fields')
      subfieldlist = gather_op(boxlist.get_field(field), indices)
      subboxlist.add_field(field, subfieldlist)
    return subboxlist
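The `use_static_shapes` path above swaps `tf.gather` for `ops.matmul_gather_on_zeroth_axis`, i.e. a gather implemented as a one-hot matmul. A NumPy sketch of that idea (names here are illustrative):

```python
import numpy as np

def matmul_gather(params, indices):
  # NumPy sketch of gathering rows with a one-hot matmul: one_hot[k] is the
  # standard basis vector for indices[k], so one_hot @ params selects rows
  # while every intermediate shape stays static.
  one_hot = np.eye(params.shape[0], dtype=params.dtype)[indices]
  return one_hot @ params

boxes = np.array([[0., 0., 1., 1.],
                  [0., 0., 2., 2.],
                  [0., 0., 3., 3.]], dtype=np.float32)
gathered = matmul_gather(boxes, np.array([0, 2]))
# gathered == [[0., 0., 1., 1.], [0., 0., 3., 3.]]
```

Matmul-based gathers trade extra FLOPs for static shapes, which is what TPU-friendly graphs need.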
...@@ -585,10 +615,7 @@ def sort_by_field(boxlist, field, order=SortOrder.descend, scope=None):
          ['Incorrect field size: actual vs expected.', num_entries, num_boxes])

    with tf.control_dependencies([length_assert]):
      _, sorted_indices = tf.nn.top_k(field_to_sort, num_boxes, sorted=True)

    if order == SortOrder.ascend:
      sorted_indices = tf.reverse_v2(sorted_indices, [0])
...@@ -1059,3 +1086,51 @@ def get_minimal_coverage_box(boxlist,
        tf.greater_equal(num_boxes, 1),
        true_fn=lambda: coverage_box(boxlist.get()),
        false_fn=lambda: default_box)
def sample_boxes_by_jittering(boxlist,
                              num_boxes_to_sample,
                              stddev=0.1,
                              scope=None):
  """Samples num_boxes_to_sample boxes by jittering around boxlist boxes.

  It is possible that this function might generate boxes with size 0. The
  larger the stddev, the more probable this is. For a small stddev of 0.1 this
  probability is very small.

  Args:
    boxlist: A boxlist containing N boxes in normalized coordinates.
    num_boxes_to_sample: A positive integer containing the number of boxes to
      sample.
    stddev: Standard deviation. This is used to draw random offsets for the
      box corners from a normal distribution. The offset is multiplied by the
      box size so will be larger in terms of pixels for larger boxes.
    scope: Name scope.

  Returns:
    sampled_boxlist: A boxlist containing num_boxes_to_sample boxes in
      normalized coordinates.
  """
  with tf.name_scope(scope, 'SampleBoxesByJittering'):
    num_boxes = boxlist.num_boxes()
    box_indices = tf.random_uniform(
        [num_boxes_to_sample],
        minval=0,
        maxval=num_boxes,
        dtype=tf.int32)
    sampled_boxes = tf.gather(boxlist.get(), box_indices)
    sampled_boxes_height = sampled_boxes[:, 2] - sampled_boxes[:, 0]
    sampled_boxes_width = sampled_boxes[:, 3] - sampled_boxes[:, 1]
    rand_miny_gaussian = tf.random_normal([num_boxes_to_sample], stddev=stddev)
    rand_minx_gaussian = tf.random_normal([num_boxes_to_sample], stddev=stddev)
    rand_maxy_gaussian = tf.random_normal([num_boxes_to_sample], stddev=stddev)
    rand_maxx_gaussian = tf.random_normal([num_boxes_to_sample], stddev=stddev)
    miny = rand_miny_gaussian * sampled_boxes_height + sampled_boxes[:, 0]
    minx = rand_minx_gaussian * sampled_boxes_width + sampled_boxes[:, 1]
    maxy = rand_maxy_gaussian * sampled_boxes_height + sampled_boxes[:, 2]
    maxx = rand_maxx_gaussian * sampled_boxes_width + sampled_boxes[:, 3]
    maxy = tf.maximum(miny, maxy)
    maxx = tf.maximum(minx, maxx)
    sampled_boxes = tf.stack([miny, minx, maxy, maxx], axis=1)
    sampled_boxes = tf.maximum(tf.minimum(sampled_boxes, 1.0), 0.0)
    return box_list.BoxList(sampled_boxes)
...@@ -16,14 +16,13 @@
"""Tests for object_detection.core.box_list_ops."""

import numpy as np
import tensorflow as tf

from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.utils import test_case


class BoxListOpsTest(test_case.TestCase):
  """Tests for common bounding box operations."""

  def test_area(self):
...@@ -364,11 +363,35 @@ class BoxListOpsTest(tf.test.TestCase):
      subset_output = sess.run(subset.get())
      self.assertAllClose(subset_output, expected_subset)
  def test_static_boolean_mask_with_field(self):

    def graph_fn(corners, weights, indicator):
      boxes = box_list.BoxList(corners)
      boxes.add_field('weights', weights)
      subset = box_list_ops.boolean_mask(
          boxes,
          indicator, ['weights'],
          use_static_shapes=True,
          indicator_sum=3)
      return (subset.get_field('boxes'), subset.get_field('weights'))

    corners = np.array(
        [4 * [0.0], 4 * [1.0], 4 * [2.0], 4 * [3.0], 4 * [4.0]],
        dtype=np.float32)
    indicator = np.array([True, False, True, False, True], dtype=np.bool)
    weights = np.array([[.1], [.3], [.5], [.7], [.9]], dtype=np.float32)
    result_boxes, result_weights = self.execute(graph_fn,
                                                [corners, weights, indicator])
    expected_boxes = [4 * [0.0], 4 * [2.0], 4 * [4.0]]
    expected_weights = [[.1], [.5], [.9]]
    self.assertAllClose(result_boxes, expected_boxes)
    self.assertAllClose(result_weights, expected_weights)

  def test_dynamic_boolean_mask_with_field(self):
    corners = tf.placeholder(tf.float32, [None, 4])
    indicator = tf.placeholder(tf.bool, [None])
    weights = tf.placeholder(tf.float32, [None, 1])
    expected_subset = [4 * [0.0], 4 * [2.0], 4 * [4.0]]
    expected_weights = [[.1], [.5], [.9]]
...@@ -377,7 +400,16 @@ class BoxListOpsTest(tf.test.TestCase):
    subset = box_list_ops.boolean_mask(boxes, indicator, ['weights'])
    with self.test_session() as sess:
      subset_output, weights_output = sess.run(
          [subset.get(), subset.get_field('weights')],
          feed_dict={
              corners:
                  np.array(
                      [4 * [0.0], 4 * [1.0], 4 * [2.0], 4 * [3.0], 4 * [4.0]]),
              indicator:
                  np.array([True, False, True, False, True]).astype(np.bool),
              weights:
                  np.array([[.1], [.3], [.5], [.7], [.9]])
          })
      self.assertAllClose(subset_output, expected_subset)
      self.assertAllClose(weights_output, expected_weights)
...@@ -392,19 +424,50 @@ class BoxListOpsTest(tf.test.TestCase):
      subset_output = sess.run(subset.get())
      self.assertAllClose(subset_output, expected_subset)
  def test_static_gather_with_field(self):

    def graph_fn(corners, weights, indices):
      boxes = box_list.BoxList(corners)
      boxes.add_field('weights', weights)
      subset = box_list_ops.gather(
          boxes, indices, ['weights'], use_static_shapes=True)
      return (subset.get_field('boxes'), subset.get_field('weights'))

    corners = np.array([4 * [0.0], 4 * [1.0], 4 * [2.0], 4 * [3.0],
                        4 * [4.0]], dtype=np.float32)
    weights = np.array([[.1], [.3], [.5], [.7], [.9]], dtype=np.float32)
    indices = np.array([0, 2, 4], dtype=np.int32)

    result_boxes, result_weights = self.execute(graph_fn,
                                                [corners, weights, indices])
    expected_boxes = [4 * [0.0], 4 * [2.0], 4 * [4.0]]
    expected_weights = [[.1], [.5], [.9]]
    self.assertAllClose(result_boxes, expected_boxes)
    self.assertAllClose(result_weights, expected_weights)

  def test_dynamic_gather_with_field(self):
    corners = tf.placeholder(tf.float32, [None, 4])
    indices = tf.placeholder(tf.int32, [None])
    weights = tf.placeholder(tf.float32, [None, 1])
    expected_subset = [4 * [0.0], 4 * [2.0], 4 * [4.0]]
    expected_weights = [[.1], [.5], [.9]]
    boxes = box_list.BoxList(corners)
    boxes.add_field('weights', weights)
    subset = box_list_ops.gather(boxes, indices, ['weights'],
                                 use_static_shapes=True)
    with self.test_session() as sess:
      subset_output, weights_output = sess.run(
          [subset.get(), subset.get_field('weights')],
          feed_dict={
              corners:
                  np.array(
                      [4 * [0.0], 4 * [1.0], 4 * [2.0], 4 * [3.0], 4 * [4.0]]),
              indices:
                  np.array([0, 2, 4]).astype(np.int32),
              weights:
                  np.array([[.1], [.3], [.5], [.7], [.9]])
          })
      self.assertAllClose(subset_output, expected_subset)
      self.assertAllClose(weights_output, expected_weights)
...@@ -503,20 +566,14 @@ class BoxListOpsTest(tf.test.TestCase):
    boxes.add_field('misc', misc)
    boxes.add_field('weights', weights)

    with self.assertRaises(ValueError):
      box_list_ops.sort_by_field(boxes, 'area')

    with self.assertRaises(ValueError):
      box_list_ops.sort_by_field(boxes, 'misc')

    with self.assertRaises(ValueError):
      box_list_ops.sort_by_field(boxes, 'weights')
  def test_visualize_boxes_in_image(self):
    image = tf.zeros((6, 4, 3))
...@@ -1031,6 +1088,21 @@ class BoxRefinementTest(tf.test.TestCase):
    self.assertAllClose(expected_scores, scores_out)
    self.assertAllEqual(extra_field_out, [0, 1, 1])
  def test_sample_boxes_by_jittering(self):
    boxes = box_list.BoxList(
        tf.constant([[0.1, 0.1, 0.4, 0.4],
                     [0.1, 0.1, 0.5, 0.5],
                     [0.6, 0.6, 0.8, 0.8],
                     [0.2, 0.2, 0.3, 0.3]], tf.float32))
    sampled_boxes = box_list_ops.sample_boxes_by_jittering(
        boxlist=boxes, num_boxes_to_sample=10)
    iou = box_list_ops.iou(boxes, sampled_boxes)
    iou_max = tf.reduce_max(iou, axis=0)
    with self.test_session() as sess:
      (np_sampled_boxes, np_iou_max) = sess.run([sampled_boxes.get(), iou_max])
    self.assertAllEqual(np_sampled_boxes.shape, [10, 4])
    self.assertAllGreater(np_iou_max, 0.5)
if __name__ == '__main__':
  tf.test.main()
...@@ -138,7 +138,7 @@ class KerasBoxPredictor(tf.keras.Model):
  """Keras-based BoxPredictor."""

  def __init__(self, is_training, num_classes, freeze_batchnorm,
               inplace_batchnorm_update, name=None):
    """Constructor.

    Args:
...@@ -155,8 +155,10 @@ class KerasBoxPredictor(tf.keras.Model):
        values inplace. When this is false train op must add a control
        dependency on tf.graphkeys.UPDATE_OPS collection in order to update
        batch norm statistics.
      name: A string name scope to assign to the model. If `None`, Keras
        will auto-generate one from the class name.
    """
    super(KerasBoxPredictor, self).__init__(name=name)

    self._is_training = is_training
    self._num_classes = num_classes
...@@ -171,7 +173,7 @@ class KerasBoxPredictor(tf.keras.Model):
  def num_classes(self):
    return self._num_classes

  def call(self, image_features, **kwargs):
    """Computes encoded object locations and corresponding confidences.

    Takes a list of high level image feature maps as input and produces a list
...@@ -181,9 +183,8 @@ class KerasBoxPredictor(tf.keras.Model):
    Args:
      image_features: A list of float tensors of shape [batch_size, height_i,
        width_i, channels_i] containing features for a batch of images.
      **kwargs: Additional keyword arguments for specific implementations of
        BoxPredictor.

    Returns:
      A dictionary containing at least the following tensors.
......
...@@ -46,6 +46,7 @@ class Loss(object):
               prediction_tensor,
               target_tensor,
               ignore_nan_targets=False,
               losses_mask=None,
               scope=None,
               **params):
    """Call the loss function.
...@@ -58,6 +59,11 @@ class Loss(object):
      ignore_nan_targets: whether to ignore nan targets in the loss computation.
        E.g. can be used if the target tensor is missing groundtruth data that
        shouldn't be factored into the loss.
      losses_mask: A [batch] boolean tensor that indicates whether losses should
        be applied to individual images in the batch. For elements that are
        False, the corresponding prediction, target, and weight tensors do not
        contribute to loss computation. If None, no filtering takes place prior
        to loss computation.
      scope: Op scope name. Defaults to 'Loss' if None.
      **params: Additional keyword arguments for specific implementations of
        the Loss.
...@@ -71,8 +77,25 @@ class Loss(object):
        target_tensor = tf.where(tf.is_nan(target_tensor),
                                 prediction_tensor,
                                 target_tensor)
      if losses_mask is not None:
        tensor_multiplier = self._get_loss_multiplier_for_tensor(
            prediction_tensor,
            losses_mask)
        prediction_tensor *= tensor_multiplier
        target_tensor *= tensor_multiplier

        if 'weights' in params:
          params['weights'] = tf.convert_to_tensor(params['weights'])
          weights_multiplier = self._get_loss_multiplier_for_tensor(
              params['weights'],
              losses_mask)
          params['weights'] *= weights_multiplier
      return self._compute_loss(prediction_tensor, target_tensor, **params)

  def _get_loss_multiplier_for_tensor(self, tensor, losses_mask):
    loss_multiplier_shape = tf.stack([-1] + [1] * (len(tensor.shape) - 1))
    return tf.cast(tf.reshape(losses_mask, loss_multiplier_shape), tf.float32)
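The multiplier above reshapes a per-image `[batch]` mask to `[batch, 1, ..., 1]` so it broadcasts across whatever rank the loss tensors have. A NumPy sketch (function name illustrative, not library API):

```python
import numpy as np

def loss_multiplier_for_tensor(tensor, losses_mask):
  # NumPy sketch of _get_loss_multiplier_for_tensor: reshape the [batch]
  # boolean mask to [batch, 1, ..., 1] so it broadcasts over the loss
  # tensor, zeroing every term coming from unlabeled images.
  shape = [-1] + [1] * (tensor.ndim - 1)
  return losses_mask.astype(np.float32).reshape(shape)

losses_mask = np.array([True, False, True])
prediction = np.ones((3, 4, 2), np.float32)
masked = prediction * loss_multiplier_for_tensor(prediction, losses_mask)
# Image 1 contributes nothing: masked.sum() == 16.0 (2 images * 4 * 2).
```

Masking both predictions and targets (rather than the final loss) keeps every `_compute_loss` implementation unchanged while unlabeled images contribute exactly zero.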
  @abstractmethod
  def _compute_loss(self, prediction_tensor, target_tensor, **params):
    """Method to be overridden by implementations.
......
...@@ -79,6 +79,26 @@ class WeightedL2LocalizationLossTest(tf.test.TestCase):
      loss_output = sess.run(loss)
    self.assertAllClose(loss_output, expected_loss)
def testReturnsCorrectWeightedLossWithLossesMask(self):
batch_size = 4
num_anchors = 10
code_size = 4
prediction_tensor = tf.ones([batch_size, num_anchors, code_size])
target_tensor = tf.zeros([batch_size, num_anchors, code_size])
weights = tf.constant([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]], tf.float32)
losses_mask = tf.constant([True, False, True, True], tf.bool)
loss_op = losses.WeightedL2LocalizationLoss()
loss = tf.reduce_sum(loss_op(prediction_tensor, target_tensor,
weights=weights, losses_mask=losses_mask))
expected_loss = (3 * 5 * 4) / 2.0
with self.test_session() as sess:
loss_output = sess.run(loss)
self.assertAllClose(loss_output, expected_loss)
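The expected value in this test can be checked by hand with a numpy sketch of the weighted L2 loss, `0.5 * w * (p - t)^2`, under the same inputs (the mask drops the second image entirely):

```python
import numpy as np

pred = np.ones((4, 10, 4), dtype=np.float32)
target = np.zeros_like(pred)
weights = np.zeros((4, 10), dtype=np.float32)
weights[0, :5] = weights[2, :5] = weights[3, :5] = 1.0
weights[1, :8] = 1.0
losses_mask = np.array([True, False, True, True])

# Weighted L2 is 0.5 * w * (p - t)^2; masked (unlabeled) images contribute 0.
per_element = 0.5 * (pred - target) ** 2 * weights[..., None]
loss = (per_element * losses_mask[:, None, None]).sum()
# 3 labeled images * 5 weighted anchors * 4 code dims * 0.5 = 30.0
```

This matches the test's `expected_loss = (3 * 5 * 4) / 2.0`; the eight weighted anchors of image 1 never contribute because its mask entry is False.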
class WeightedSmoothL1LocalizationLossTest(tf.test.TestCase):
...@@ -104,6 +124,34 @@ class WeightedSmoothL1LocalizationLossTest(tf.test.TestCase):
loss_output = sess.run(loss)
self.assertAllClose(loss_output, exp_loss)
def testReturnsCorrectLossWithLossesMask(self):
batch_size = 3
num_anchors = 3
code_size = 4
prediction_tensor = tf.constant([[[2.5, 0, .4, 0],
[0, 0, 0, 0],
[0, 2.5, 0, .4]],
[[3.5, 0, 0, 0],
[0, .4, 0, .9],
[0, 0, 1.5, 0]],
[[3.5, 7., 0, 0],
[0, .4, 0, .9],
[2.2, 2.2, 1.5, 0]]], tf.float32)
target_tensor = tf.zeros([batch_size, num_anchors, code_size])
weights = tf.constant([[2, 1, 1],
[0, 3, 0],
[4, 3, 0]], tf.float32)
losses_mask = tf.constant([True, True, False], tf.bool)
loss_op = losses.WeightedSmoothL1LocalizationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights,
losses_mask=losses_mask)
loss = tf.reduce_sum(loss)
exp_loss = 7.695
with self.test_session() as sess:
loss_output = sess.run(loss)
self.assertAllClose(loss_output, exp_loss)
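The `exp_loss = 7.695` above can be reproduced with a numpy sketch of the smooth L1 (Huber, delta = 1) loss over just the two unmasked images; the third image is excluded by `losses_mask`:

```python
import numpy as np

def smooth_l1(x, delta=1.0):
    # Huber loss: 0.5 * x^2 when |x| < delta, |x| - 0.5 * delta otherwise.
    ax = np.abs(x)
    return np.where(ax < delta, 0.5 * ax ** 2, ax - 0.5 * delta)

# Only the two unmasked images from the test; targets are all zeros.
pred = np.array([[[2.5, 0, .4, 0], [0, 0, 0, 0], [0, 2.5, 0, .4]],
                 [[3.5, 0, 0, 0], [0, .4, 0, .9], [0, 0, 1.5, 0]]])
weights = np.array([[2, 1, 1], [0, 3, 0]], dtype=np.float32)

# Sum the Huber terms over the code dimension, then apply anchor weights.
loss = (smooth_l1(pred).sum(axis=-1) * weights).sum()
# 2 * 2.08 + 1 * 2.08 + 3 * 0.485 = 7.695
```

Per anchor: `huber(2.5) + huber(0.4) = 2.0 + 0.08 = 2.08` for the two weighted anchors of image 0, and `huber(0.4) + huber(0.9) = 0.485` for the one weighted anchor of image 1.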
class WeightedIOULocalizationLossTest(tf.test.TestCase):
...@@ -123,6 +171,24 @@ class WeightedIOULocalizationLossTest(tf.test.TestCase):
loss_output = sess.run(loss)
self.assertAllClose(loss_output, exp_loss)
def testReturnsCorrectLossWithNoLabels(self):
prediction_tensor = tf.constant([[[1.5, 0, 2.4, 1],
[0, 0, 1, 1],
[0, 0, .5, .25]]])
target_tensor = tf.constant([[[1.5, 0, 2.4, 1],
[0, 0, 1, 1],
[50, 50, 500.5, 100.25]]])
weights = [[1.0, .5, 2.0]]
losses_mask = tf.constant([False], tf.bool)
loss_op = losses.WeightedIOULocalizationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights,
losses_mask=losses_mask)
loss = tf.reduce_sum(loss)
exp_loss = 0.0
with self.test_session() as sess:
loss_output = sess.run(loss)
self.assertAllClose(loss_output, exp_loss)
class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
...@@ -215,6 +281,50 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
loss_output = sess.run(loss)
self.assertAllClose(loss_output, exp_loss)
def testReturnsCorrectLossWithLossesMask(self):
prediction_tensor = tf.constant([[[-100, 100, -100],
[100, -100, -100],
[100, 0, -100],
[-100, -100, 100]],
[[-100, 0, 100],
[-100, 100, -100],
[100, 100, 100],
[0, 0, -1]],
[[-100, 0, 100],
[-100, 100, -100],
[100, 100, 100],
[0, 0, -100]]], tf.float32)
target_tensor = tf.constant([[[0, 1, 0],
[1, 0, 0],
[1, 0, 0],
[0, 0, 1]],
[[0, 0, 1],
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]],
[[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0],
[1, 1, 1, 1]], tf.float32)
losses_mask = tf.constant([True, True, False], tf.bool)
loss_op = losses.WeightedSigmoidClassificationLoss()
loss_per_anchor = loss_op(prediction_tensor, target_tensor, weights=weights,
losses_mask=losses_mask)
loss = tf.reduce_sum(loss_per_anchor)
exp_loss = -2 * math.log(.5)
with self.test_session() as sess:
loss_output = sess.run(loss)
self.assertAllEqual(prediction_tensor.shape.as_list(),
loss_per_anchor.shape.as_list())
self.assertAllEqual(target_tensor.shape.as_list(),
loss_per_anchor.shape.as_list())
self.assertAllClose(loss_output, exp_loss)
def _logit(probability):
return math.log(probability / (1. - probability))
...@@ -484,6 +594,51 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
8 * 2))), # negatives from 8 anchors for two classes.
focal_loss)
def testExpectedLossWithLossesMask(self):
# All zeros correspond to 0.5 probability.
prediction_tensor = tf.constant([[[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]],
[[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]],
[[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]], tf.float32)
target_tensor = tf.constant([[[0, 1, 0],
[1, 0, 0],
[1, 0, 0],
[0, 0, 1]],
[[0, 0, 1],
[0, 1, 0],
[1, 0, 0],
[1, 0, 0]],
[[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]], tf.float32)
losses_mask = tf.constant([True, True, False], tf.bool)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=0.75, gamma=0.0)
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
weights=weights,
losses_mask=losses_mask))
with self.test_session() as sess:
focal_loss = sess.run(focal_loss)
self.assertAllClose(
(-math.log(.5) * # x-entropy per class per anchor.
((0.75 * # alpha for positives.
8) + # positives from 8 anchors.
(0.25 * # alpha for negatives.
8 * 2))), # negatives from 8 anchors for two classes.
focal_loss)
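The expected value in the assertion above follows from a short calculation: with `gamma=0` the focal loss reduces to alpha-weighted sigmoid cross-entropy, and an all-zero logit costs `-log(0.5)` per class per anchor. Only the two unmasked images (4 anchors × 3 classes each) count:

```python
import math

# Cross-entropy per class per anchor when the logit is 0 (probability 0.5).
xent = -math.log(0.5)

# Two unmasked images contribute 8 positive entries (alpha = 0.75) and
# 8 anchors * 2 classes = 16 negative entries (1 - alpha = 0.25); the third
# image is excluded by losses_mask.
expected = 0.75 * 8 * xent + 0.25 * 8 * 2 * xent  # = 10 * ln(2)
```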
class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
...@@ -575,6 +730,45 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
loss_output = sess.run(loss)
self.assertAllClose(loss_output, exp_loss)
def testReturnsCorrectLossWithLossesMask(self):
prediction_tensor = tf.constant([[[-100, 100, -100],
[100, -100, -100],
[0, 0, -100],
[-100, -100, 100]],
[[-100, 0, 0],
[-100, 100, -100],
[-100, 100, -100],
[100, -100, -100]],
[[-100, 0, 0],
[-100, 100, -100],
[-100, 100, -100],
[100, -100, -100]]], tf.float32)
target_tensor = tf.constant([[[0, 1, 0],
[1, 0, 0],
[1, 0, 0],
[0, 0, 1]],
[[0, 0, 1],
[0, 1, 0],
[0, 1, 0],
[1, 0, 0]],
[[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 0],
[1, 1, 1, 1]], tf.float32)
losses_mask = tf.constant([True, True, False], tf.bool)
loss_op = losses.WeightedSoftmaxClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights,
losses_mask=losses_mask)
loss = tf.reduce_sum(loss)
exp_loss = - 1.5 * math.log(.5)
with self.test_session() as sess:
loss_output = sess.run(loss)
self.assertAllClose(loss_output, exp_loss)
class WeightedSoftmaxClassificationAgainstLogitsLossTest(tf.test.TestCase):
......
...@@ -219,7 +219,7 @@ class Matcher(object):
"""
self._use_matmul_gather = use_matmul_gather
def match(self, similarity_matrix, valid_rows=None, scope=None):
"""Computes matches among row and column indices and returns the result.
Computes matches among the row and column indices based on the similarity
...@@ -228,27 +228,28 @@ class Matcher(object):
Args:
similarity_matrix: Float tensor of shape [N, M] with pairwise similarity
where higher value means more similar.
valid_rows: A boolean tensor of shape [N] indicating the rows that are
valid for matching.
scope: Op scope name. Defaults to 'Match' if None.
Returns:
A Match object with the results of matching.
"""
with tf.name_scope(scope, 'Match') as scope:
if valid_rows is None:
valid_rows = tf.ones(tf.shape(similarity_matrix)[0], dtype=tf.bool)
return Match(self._match(similarity_matrix, valid_rows),
self._use_matmul_gather)
@abstractmethod
def _match(self, similarity_matrix, valid_rows):
"""Method to be overridden by implementations.
Args:
similarity_matrix: Float tensor of shape [N, M] with pairwise similarity
where higher value means more similar.
valid_rows: A boolean tensor of shape [N] indicating the rows that are
valid for matching.
Returns:
match_results: Integer tensor of shape [M]: match_results[i]>=0 means
that column i is matched to row match_results[i], match_results[i]=-1
......
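The new `valid_rows` contract above can be illustrated with a hypothetical numpy sketch of a greedy argmax `_match` implementation (this is not the project's actual `argmax_matcher`, just a minimal model of the interface):

```python
import numpy as np

def match(similarity_matrix, valid_rows=None):
    """Greedy argmax matcher sketch that honors a per-row validity mask.

    Returns match_results of shape [M]: match_results[i] >= 0 means column i
    matched row match_results[i]; -1 means no valid row was available.
    """
    num_rows = similarity_matrix.shape[0]
    if valid_rows is None:
        # Mirror the default in Matcher.match: every row may participate.
        valid_rows = np.ones(num_rows, dtype=bool)
    if not valid_rows.any():
        return -np.ones(similarity_matrix.shape[1], dtype=np.int64)
    # Push invalid rows to -inf so argmax can never select them.
    masked = np.where(valid_rows[:, None], similarity_matrix, -np.inf)
    return masked.argmax(axis=0)

sim = np.array([[0.9, 0.1],
                [0.2, 0.8]])
match(sim)                                      # rows 0 and 1 both valid
match(sim, valid_rows=np.array([False, True]))  # row 0 excluded
```

Replacing the open-ended `**params` with an explicit `valid_rows` argument pins down the contract every `Matcher` subclass must satisfy, which is what lets unlabeled rows be excluded from matching uniformly.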
...@@ -84,7 +84,8 @@ class DetectionModel(object):
Args:
field: a string key, options are
fields.BoxListFields.{boxes,classes,masks,keypoints} or
fields.InputDataFields.is_annotated.
Returns:
a list of tensors holding groundtruth information (see also
...@@ -94,7 +95,8 @@ class DetectionModel(object):
RuntimeError: if the field has not been provided via provide_groundtruth.
"""
if field not in self._groundtruth_lists:
raise RuntimeError('Groundtruth tensor {} has not been provided'.format(
field))
return self._groundtruth_lists[field]
def groundtruth_has_field(self, field):
...@@ -102,7 +104,8 @@ class DetectionModel(object):
Args:
field: a string key, options are
fields.BoxListFields.{boxes,classes,masks,keypoints} or
fields.InputDataFields.is_annotated.
Returns:
True if the groundtruth includes the given field, False otherwise.
...@@ -238,7 +241,8 @@ class DetectionModel(object):
groundtruth_masks_list=None,
groundtruth_keypoints_list=None,
groundtruth_weights_list=None,
groundtruth_is_crowd_list=None,
is_annotated_list=None):
"""Provide groundtruth tensors. """Provide groundtruth tensors.
Args: Args:
...@@ -263,6 +267,8 @@ class DetectionModel(object): ...@@ -263,6 +267,8 @@ class DetectionModel(object):
[num_boxes] containing weights for groundtruth boxes. [num_boxes] containing weights for groundtruth boxes.
groundtruth_is_crowd_list: A list of 1-D tf.bool tensors of shape groundtruth_is_crowd_list: A list of 1-D tf.bool tensors of shape
[num_boxes] containing is_crowd annotations [num_boxes] containing is_crowd annotations
is_annotated_list: A list of scalar tf.bool tensors indicating whether
images have been labeled or not.
""" """
self._groundtruth_lists[fields.BoxListFields.boxes] = groundtruth_boxes_list self._groundtruth_lists[fields.BoxListFields.boxes] = groundtruth_boxes_list
self._groundtruth_lists[ self._groundtruth_lists[
...@@ -279,6 +285,9 @@ class DetectionModel(object): ...@@ -279,6 +285,9 @@ class DetectionModel(object):
if groundtruth_is_crowd_list: if groundtruth_is_crowd_list:
self._groundtruth_lists[ self._groundtruth_lists[
fields.BoxListFields.is_crowd] = groundtruth_is_crowd_list fields.BoxListFields.is_crowd] = groundtruth_is_crowd_list
if is_annotated_list:
self._groundtruth_lists[
fields.InputDataFields.is_annotated] = is_annotated_list
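The `is_annotated_list` stored above is what allows a meta architecture to build the per-image `losses_mask` consumed by the loss ops. A hedged numpy sketch of that wiring (the variable names here are illustrative):

```python
import numpy as np

# One scalar bool per image: True when the image carries groundtruth labels.
is_annotated_list = [True, False, True]

# Stack the scalars into the [batch_size] losses_mask a Loss op consumes,
# so unlabeled images accumulate no loss.
losses_mask = np.stack(is_annotated_list)

per_image_loss = np.array([1.2, 0.7, 0.4], dtype=np.float32)
total_loss = float((per_image_loss * losses_mask).sum())
```

This is the integration point named in the commit message: groundtruth can optionally mark which images are labeled, and unlabeled images contribute nothing to the training loss.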
@abstractmethod
def restore_map(self, fine_tune_checkpoint_type='detection'):
......