Unverified commit 02a9969e authored by pkulzc, committed by GitHub

Refactor object detection box predictors and fix some issues with model_main. (#4965)

* Merged commit includes the following changes:
206852642  by Zhichao Lu:

    Builds the balanced_positive_negative_sampler in the model builder for Faster R-CNN, and adds an option to use the static implementation of the sampler (see the config sketch below).
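
    A minimal config sketch (not part of the original commit message), assuming the new
    use_static_balanced_label_sampler field added to faster_rcnn.proto later in this PR:

      model {
        faster_rcnn {
          ...
          use_static_balanced_label_sampler: true
        }
      }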

--
206803260  by Zhichao Lu:

    Fixes a misplaced argument in resnet fpn feature extractor.

--
206682736  by Zhichao Lu:

    This CL modifies the SSD meta architecture to support both Slim-based and Keras-based box predictors, and begins preparation for Keras box predictor support in the other meta architectures.

    Concretely, this CL adds a new `KerasBoxPredictor` base class and makes the meta architectures appropriately call whichever box predictors they are using.

    We can switch the non-ssd meta architectures to fully support Keras box predictors once the Keras Convolutional Box Predictor CL is submitted.

--
206669634  by Zhichao Lu:

    Adds an alternate implementation of the balanced positive/negative sampler using static shapes.

--
206643278  by Zhichao Lu:

    This CL adds a Keras layer hyperparameter configuration object to the hyperparams_builder.

    It automatically converts from Slim layer hyperparameter configs to Keras layer hyperparameters. Namely, it:
    - Builds Keras initializers/regularizers instead of Slim ones
    - Sets weights_regularizer/initializer to kernel_regularizer/initializer
    - Converts batchnorm decay to momentum
    - Converts Slim l2 regularizer weights to the equivalent Keras l2 weights

    This will be used in the conversion of object detection feature extractors & box predictors to newer Tensorflow APIs.
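
    A rough sketch of that mapping (hypothetical helper, not the actual hyperparams_builder
    code). The 0.5 factor reflects that Slim's l2_regularizer includes a 1/2 term that the
    Keras regularizer does not, and Slim's batch norm `decay` plays the role of Keras
    `momentum`:

      import tensorflow as tf

      def slim_to_keras_hyperparams(weight_decay, initializer_stddev, batch_norm_decay):
        """Illustrative only: Slim-style scalars -> Keras layer hyperparameters."""
        return {
            # Slim: weight_decay * sum(w**2) / 2; Keras: weight * sum(w**2).
            'kernel_regularizer': tf.keras.regularizers.l2(0.5 * weight_decay),
            # weights_initializer -> kernel_initializer.
            'kernel_initializer': tf.keras.initializers.TruncatedNormal(
                stddev=initializer_stddev),
            # Batch norm decay -> Keras BatchNormalization momentum.
            'batch_norm_momentum': batch_norm_decay,
        }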

--
206611681  by Zhichao Lu:

    Internal changes.

--
206591619  by Zhichao Lu:

    Clip to the expected padded static shape when the input tensors are larger than it.

--
206517644  by Zhichao Lu:

    Make MultiscaleGridAnchorGenerator more consistent with MultipleGridAnchorGenerator.

--
206415624  by Zhichao Lu:

    Make the hardcoded feature pyramid network (FPN) levels configurable for both SSD
    Resnet and SSD Mobilenet.
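
    For reference, the configs touched later in this PR expose the new levels roughly as:

      feature_extractor {
        type: 'ssd_resnet50_v1_fpn'
        fpn {
          min_level: 3
          max_level: 7
        }
      }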

--
206398204  by Zhichao Lu:

    This CL modifies the SSD meta architecture to support both Slim-based and Keras-based feature extractors.

    This allows us to begin the conversion of object detection to newer Tensorflow APIs.

--
206213448  by Zhichao Lu:

    Adding a method to compute the expected classification loss by background/foreground weighting.
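
    The weighting (added to utils/ops.py further down in this diff) computes
    beta = K * M / (N - M) and weighs each anchor's loss by p_i + beta * (1 - p_i). A toy
    check of that arithmetic, mirroring the values in the new unit test:

      # Two anchors: one background (p=0), one foreground (p=1); K=2, N=2, M=1.
      N, M, K = 2.0, 1.0, 2.0
      beta = K * M / (N - M)                    # -> 2.0
      weights = [0.0 + beta * (1 - 0.0),        # background anchor weight -> 2.0
                 1.0 + beta * (1 - 1.0)]        # foreground anchor weight -> 1.0
      losses = [1.0, 2.0]
      total = sum(w * l for w, l in zip(weights, losses))  # -> 2*1 + 1*2 = 4.0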

--
206204232  by Zhichao Lu:

    Adding the keypoint head to the Mask RCNN pipeline.

--
206200352  by Zhichao Lu:

    - Create the Faster R-CNN target assigner in the model builder. This allows configuring matchers in the target assigner to use TPU-compatible ops (tf.gather in this case) without any change in the meta architecture (see the sketch below).
    - As a positive side effect of the refactoring, we can now re-use a single target assigner for all of the second stage heads in Faster R-CNN.
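
    The TPU-friendly gather mentioned above can be sketched as a one-hot matmul (illustrative
    only, not the code added by this change):

      import tensorflow as tf

      def matmul_gather_on_zeroth_axis(params, indices):
        # Select rows of `params` by multiplying with a one-hot indicator
        # matrix instead of calling tf.gather.
        one_hot = tf.one_hot(indices, depth=tf.shape(params)[0],
                             dtype=params.dtype)
        return tf.matmul(one_hot, params)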

--
206178206  by Zhichao Lu:

    Force the SSD feature extractor builder to use keyword arguments so that values won't be passed to the wrong arguments.

--
206168297  by Zhichao Lu:

    Updating exporter to use freeze_graph.freeze_graph_with_def_protos rather than a homegrown version.

--
206080748  by Zhichao Lu:

    Merge external contributions.

--
206074460  by Zhichao Lu:

    Update the preprocessor to apply temperature scaling and a softmax to the multiclass scores on read (sketched below).
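
    The new ConvertClassLogitsToSoftmax step (see the preprocessor.proto hunk below) carries a
    temperature field; a minimal sketch of the intended transform, assuming the logits are
    divided by the temperature before the softmax:

      import tensorflow as tf

      def convert_class_logits_to_softmax(multiclass_scores, temperature=1.0):
        # Sketch only: temperature-scale the logits, then normalize per box.
        return tf.nn.softmax(multiclass_scores / temperature)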

--
205960802  by Zhichao Lu:

    Fixing a bug in hierarchical label expansion script.

--
205944686  by Zhichao Lu:

    Update exporter to support exporting quantized model.

--
205912529  by Zhichao Lu:

    Add a two-stage matcher to allow thresholding by one criterion and then argmaxing on the other.

--
205909017  by Zhichao Lu:

    Add test for grayscale image_resizer

--
205892801  by Zhichao Lu:

    Add flag to decide whether to apply batch norm to conv layers of weight shared box predictor.

--
205824449  by Zhichao Lu:

    Make sure that by default the Mask R-CNN box predictor predicts 2 stages.

--
205730139  by Zhichao Lu:

    Updating warning message to be more explicit about variable size mismatch.

--
205696992  by Zhichao Lu:

    Remove utils/ops.py's dependency on core/box_list_ops.py. This will allow re-using TPU compatible ops from utils/ops.py in core/box_list_ops.py.

--
205696867  by Zhichao Lu:

    Refactoring the Mask R-CNN predictor so that each head lives in a separate file.
    This CL lets us add new heads to Mask R-CNN more easily in the future.

--
205492073  by Zhichao Lu:

    Refactor R-FCN box predictor to be TPU compliant.

    - Change utils/ops.py:position_sensitive_crop_regions to operate on a single image and a set of boxes, without `box_ind`.
    - Add a batch version that operates on batches of images and batches of boxes.
    - Refactor the R-FCN box predictor to use the batched version of position sensitive crop regions (see the call sketch below).
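
    A quick sketch of the two call shapes after this change (argument values are illustrative,
    taken from the R-FCN predictor and its test in this PR):

      # Per-image op: image is [H, W, C], boxes is [num_boxes, 4]; no box_ind.
      crops = ops.position_sensitive_crop_regions(
          image, boxes, crop_size=[12, 12], num_spatial_bins=[3, 3],
          global_pool=True)

      # Batched wrapper: images is [batch, H, W, C], boxes is [batch, num_boxes, 4].
      batch_crops = ops.batch_position_sensitive_crop_regions(
          images, boxes, crop_size=[12, 12], num_spatial_bins=[3, 3],
          global_pool=True)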

--
205453567  by Zhichao Lu:

    Fix a bug where the inference graph could not be exported when the write_inference_graph flag is True.

--
205316039  by Zhichao Lu:

    Changing input tensor name.

--
205256307  by Zhichao Lu:

    Fix model zoo links for quantized model.

--
205164432  by Zhichao Lu:

    Fixes eval error when label map contains non-ascii characters.

--
205129842  by Zhichao Lu:

    Adds an option to Faster R-CNN to clip the anchors to the image window instead of pruning anchors that fall outside it (config sketch below).
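
    In a pipeline config this maps to the new faster_rcnn field added below, roughly:

      faster_rcnn {
        ...
        clip_anchors_to_image: true
      }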

--
205094863  by Zhichao Lu:

    Update label map util to optionally add a background class and fill in gaps in the label map. Useful for multiclass scores, which require a complete label map with an explicit background label (usage sketched below).
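
    Usage mirrors the new unit test further down in this diff (the path is a placeholder):

      from object_detection.utils import label_map_util

      # id 0 becomes 'background' if missing, and gaps in 1..max(id) are filled
      # with dummy 'class_<id>' entries.
      label_map_dict = label_map_util.get_label_map_dict(
          '/path/to/label_map.pbtxt', fill_in_gaps_and_background=True)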

--
204989032  by Zhichao Lu:

    Add tf.prof support to exporter.

--
204825267  by Zhichao Lu:

    Modify mask rcnn box predictor tests for TPU compatibility.

--
204778749  by Zhichao Lu:

    Remove score filtering from postprocessing.py and rely on filtering logic in tf.image.non_max_suppression

--
204775818  by Zhichao Lu:

    Python3 fixes for object_detection.

--
204745920  by Zhichao Lu:

    Object Detection Dataset visualization tool (documentation).

--
204686993  by Zhichao Lu:

    Internal changes.

--
204559667  by Zhichao Lu:

    Refactor box_predictor.py into multiple files.
    The abstract base class remains in object_detection/core; the other classes have each moved to their own file in object_detection/predictors.

--
204552847  by Zhichao Lu:

    Update blog post link.

--
204508028  by Zhichao Lu:

    Bump the batch size down to 1024 to be a bit more tolerant of OOM, and double the number of iterations. This job still converges to 20.5 mAP in 3 hours.

--

PiperOrigin-RevId: 206852642

* Add original post-processing back.
parent d135ed9c
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""RFCN Box Predictor."""
import tensorflow as tf
from object_detection.core import box_predictor
from object_detection.utils import ops
slim = tf.contrib.slim
BOX_ENCODINGS = box_predictor.BOX_ENCODINGS
CLASS_PREDICTIONS_WITH_BACKGROUND = (
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND)
MASK_PREDICTIONS = box_predictor.MASK_PREDICTIONS
class RfcnBoxPredictor(box_predictor.BoxPredictor):
"""RFCN Box Predictor.
Applies a position sensitive ROI pooling on position sensitive feature maps to
predict classes and refined locations. See https://arxiv.org/abs/1605.06409
for details.
This is used for the second stage of the RFCN meta architecture. Notice that
locations are *not* shared across classes, thus for each anchor, a separate
prediction is made for each class.
"""
def __init__(self,
is_training,
num_classes,
conv_hyperparams_fn,
num_spatial_bins,
depth,
crop_size,
box_code_size):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
conv_hyperparams_fn: A function to construct tf-slim arg_scope with
hyperparameters for convolutional layers.
num_spatial_bins: A list of two integers `[spatial_bins_y,
spatial_bins_x]`.
depth: Target depth to reduce the input feature maps to.
crop_size: A list of two integers `[crop_height, crop_width]`.
box_code_size: Size of encoding for each box.
"""
super(RfcnBoxPredictor, self).__init__(is_training, num_classes)
self._conv_hyperparams_fn = conv_hyperparams_fn
self._num_spatial_bins = num_spatial_bins
self._depth = depth
self._crop_size = crop_size
self._box_code_size = box_code_size
@property
def num_classes(self):
return self._num_classes
def _predict(self, image_features, num_predictions_per_location,
proposal_boxes):
"""Computes encoded object locations and corresponding confidences.
Args:
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
num_predictions_per_location: A list of integers representing the number
of box predictions to be made per spatial location for each feature map.
Currently, this must be set to [1], or an error will be raised.
proposal_boxes: A float tensor of shape [batch_size, num_proposals,
box_code_size].
Returns:
box_encodings: A list of float tensors of shape
[batch_size, num_anchors_i, q, code_size] representing the location of
the objects, where q is 1 or the number of classes. Each entry in the
list corresponds to a feature map in the input `image_features` list.
class_predictions_with_background: A list of float tensors of shape
[batch_size, num_anchors_i, num_classes + 1] representing the class
predictions for the proposals. Each entry in the list corresponds to a
feature map in the input `image_features` list.
Raises:
ValueError: if num_predictions_per_location is not 1 or if
len(image_features) is not 1.
"""
if (len(num_predictions_per_location) != 1 or
num_predictions_per_location[0] != 1):
raise ValueError('Currently RfcnBoxPredictor only supports '
'predicting a single box per class per location.')
if len(image_features) != 1:
raise ValueError('length of `image_features` must be 1. Found {}'.
format(len(image_features)))
image_feature = image_features[0]
num_predictions_per_location = num_predictions_per_location[0]
batch_size = tf.shape(proposal_boxes)[0]
num_boxes = tf.shape(proposal_boxes)[1]
net = image_feature
with slim.arg_scope(self._conv_hyperparams_fn()):
net = slim.conv2d(net, self._depth, [1, 1], scope='reduce_depth')
# Location predictions.
location_feature_map_depth = (self._num_spatial_bins[0] *
self._num_spatial_bins[1] *
self.num_classes *
self._box_code_size)
location_feature_map = slim.conv2d(net, location_feature_map_depth,
[1, 1], activation_fn=None,
scope='refined_locations')
box_encodings = ops.batch_position_sensitive_crop_regions(
location_feature_map,
boxes=proposal_boxes,
crop_size=self._crop_size,
num_spatial_bins=self._num_spatial_bins,
global_pool=True)
box_encodings = tf.squeeze(box_encodings, squeeze_dims=[2, 3])
box_encodings = tf.reshape(box_encodings,
[batch_size * num_boxes, 1, self.num_classes,
self._box_code_size])
# Class predictions.
total_classes = self.num_classes + 1 # Account for background class.
class_feature_map_depth = (self._num_spatial_bins[0] *
self._num_spatial_bins[1] *
total_classes)
class_feature_map = slim.conv2d(net, class_feature_map_depth, [1, 1],
activation_fn=None,
scope='class_predictions')
class_predictions_with_background = (
ops.batch_position_sensitive_crop_regions(
class_feature_map,
boxes=proposal_boxes,
crop_size=self._crop_size,
num_spatial_bins=self._num_spatial_bins,
global_pool=True))
class_predictions_with_background = tf.squeeze(
class_predictions_with_background, squeeze_dims=[2, 3])
class_predictions_with_background = tf.reshape(
class_predictions_with_background,
[batch_size * num_boxes, 1, total_classes])
return {BOX_ENCODINGS: [box_encodings],
CLASS_PREDICTIONS_WITH_BACKGROUND:
[class_predictions_with_background]}
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.predictors.rfcn_box_predictor."""
import numpy as np
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.predictors import rfcn_box_predictor as box_predictor
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
class RfcnBoxPredictorTest(test_case.TestCase):
def _build_arg_scope_with_conv_hyperparams(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.build(conv_hyperparams, is_training=True)
def test_get_correct_box_encoding_and_class_prediction_shapes(self):
def graph_fn(image_features, proposal_boxes):
rfcn_box_predictor = box_predictor.RfcnBoxPredictor(
is_training=False,
num_classes=2,
conv_hyperparams_fn=self._build_arg_scope_with_conv_hyperparams(),
num_spatial_bins=[3, 3],
depth=4,
crop_size=[12, 12],
box_code_size=4
)
box_predictions = rfcn_box_predictor.predict(
[image_features], num_predictions_per_location=[1],
scope='BoxPredictor',
proposal_boxes=proposal_boxes)
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
proposal_boxes = np.random.rand(4, 2, 4).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features, proposal_boxes])
self.assertAllEqual(box_encodings.shape, [8, 1, 2, 4])
self.assertAllEqual(class_predictions_with_background.shape, [8, 1, 3])
if __name__ == '__main__':
tf.test.main()
@@ -93,6 +93,8 @@ message WeightSharedConvolutionalBoxPredictor {
   optional bool share_prediction_tower = 13 [default = false];
 }
 
+// TODO(alirezafathi): Refactor the proto file to be able to configure mask rcnn
+// head easily.
 message MaskRCNNBoxPredictor {
   // Hyperparameters for fully connected ops used in the box predictor.
   optional Hyperparams fc_hyperparams = 1;
...
@@ -142,6 +142,21 @@ message FasterRcnn {
   // standard tf.image.crop_and_resize while computing second stage input
   // feature maps.
   optional bool use_matmul_crop_and_resize = 31 [default = false];
+
+  // Normally, anchors generated for a given image size are pruned during
+  // training if they lie outside the image window. Setting this option to true,
+  // clips the anchors to be within the image instead of pruning.
+  optional bool clip_anchors_to_image = 32 [default = false];
+
+  // After peforming matching between anchors and targets, in order to pull out
+  // targets for training Faster R-CNN meta architecture we perform a gather
+  // operation. This options specifies whether to use an alternate
+  // implementation of tf.gather that is faster on TPUs.
+  optional bool use_matmul_gather_in_matcher = 33 [default = false];
+
+  // Whether to use the balanced positive negative sampler implementation with
+  // static shape guarantees.
+  optional bool use_static_balanced_label_sampler = 34 [default = false];
 }
...
@@ -33,6 +33,7 @@ message PreprocessingStep {
     RandomVerticalFlip random_vertical_flip = 25;
     RandomRotation90 random_rotation90 = 26;
     RGBtoGray rgb_to_gray = 27;
+    ConvertClassLogitsToSoftmax convert_class_logits_to_softmax = 28;
   }
 }
@@ -409,3 +410,11 @@ message SSDRandomCropPadFixedAspectRatio {
   // width. Two entries per operation.
   repeated float max_padded_size_ratio = 4;
 }
+
+// Converts class logits to softmax optionally scaling the values by temperature
+// first.
+message ConvertClassLogitsToSoftmax {
+  // Scale to use on logits before applying softmax.
+  optional float temperature = 1 [default=1.0];
+}
...
@@ -9,6 +9,7 @@ message RegionSimilarityCalculator {
     NegSqDistSimilarity neg_sq_dist_similarity = 1;
     IouSimilarity iou_similarity = 2;
     IoaSimilarity ioa_similarity = 3;
+    ThresholdedIouSimilarity thresholded_iou_similarity = 4;
   }
 }
@@ -23,3 +24,10 @@ message IouSimilarity {
 // Configuration for intersection-over-area (IOA) similarity calculator.
 message IoaSimilarity {
 }
+
+// Configuration for thresholded-intersection-over-union similarity calculator.
+message ThresholdedIouSimilarity {
+  // IOU threshold used for filtering scores.
+  optional float iou_threshold = 1 [default = 0.5];
+}
...
@@ -120,4 +120,30 @@ message SsdFeatureExtractor {
   // Whether to use depthwise separable convolutions for to extract additional
   // feature maps added by SSD.
   optional bool use_depthwise = 8 [default=false];
+
+  // Feature Pyramid Networks config.
+  optional FeaturePyramidNetworks fpn = 10;
+}
+
+// Configuration for Feature Pyramid Networks.
+message FeaturePyramidNetworks {
+  // We recommend to use multi_resolution_feature_map_generator with FPN, and
+  // the levels there must match the levels defined below for better
+  // performance.
+  // Correspondence from FPN levels to Resnet/Mobilenet V1 feature maps:
+  // FPN Level   Resnet Feature Map   Mobilenet-V1 Feature Map
+  // 2           Block 1              Conv2d_3_pointwise
+  // 3           Block 2              Conv2d_5_pointwise
+  // 4           Block 3              Conv2d_11_pointwise
+  // 5           Block 4              Conv2d_13_pointwise
+  // 6           Bottomup_5           bottom_up_Conv2d_14
+  // 7           Bottomup_6           bottom_up_Conv2d_15
+  // 8           Bottomup_7           bottom_up_Conv2d_16
+  // 9           Bottomup_8           bottom_up_Conv2d_17
+
+  // minimum level in feature pyramid
+  optional int32 min_level = 1 [default = 3];
+
+  // maximum level in feature pyramid
+  optional int32 max_level = 2 [default = 7];
 }
 # SSD with Mobilenet v1 feature extractor and focal loss.
 # Trained on COCO14, initialized from Imagenet classification checkpoint
-# Achieves 19.3 mAP on COCO14 minival dataset. Doubling the number of training
-# steps gets to 20.6 mAP.
+# Achieves 20.5 mAP on COCO14 minival dataset.
 # This config is TPU compatible
@@ -144,11 +143,11 @@ model {
 train_config: {
   fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
-  batch_size: 2048
+  batch_size: 1024
   sync_replicas: true
   startup_delay_steps: 0
   replicas_to_aggregate: 8
-  num_steps: 10000
+  num_steps: 20000
   data_augmentation_options {
     random_horizontal_flip {
     }
@@ -162,9 +161,9 @@ train_config: {
     learning_rate: {
       cosine_decay_learning_rate {
         learning_rate_base: 0.9
-        total_steps: 10000
+        total_steps: 20000
         warmup_learning_rate: 0.3
-        warmup_steps: 300
+        warmup_steps: 1000
       }
     }
     momentum_optimizer_value: 0.9
...
@@ -79,6 +79,10 @@ model {
     }
     feature_extractor {
       type: 'ssd_mobilenet_v1_fpn'
+      fpn {
+        min_level: 3
+        max_level: 7
+      }
       min_depth: 16
       depth_multiplier: 1.0
       conv_hyperparams {
...
@@ -80,6 +80,10 @@ model {
     }
     feature_extractor {
       type: 'ssd_resnet50_v1_fpn'
+      fpn {
+        min_level: 3
+        max_level: 7
+      }
       min_depth: 16
       depth_multiplier: 1.0
       conv_hyperparams {
...
@@ -139,15 +139,26 @@ def load_labelmap(path):
   return label_map


-def get_label_map_dict(label_map_path, use_display_name=False):
+def get_label_map_dict(label_map_path,
+                       use_display_name=False,
+                       fill_in_gaps_and_background=False):
   """Reads a label map and returns a dictionary of label names to id.

   Args:
-    label_map_path: path to label_map.
+    label_map_path: path to StringIntLabelMap proto text file.
     use_display_name: whether to use the label map items' display names as keys.
+    fill_in_gaps_and_background: whether to fill in gaps and background with
+      respect to the id field in the proto. The id: 0 is reserved for the
+      'background' class and will be added if it is missing. All other missing
+      ids in range(1, max(id)) will be added with a dummy class name
+      ("class_<id>") if they are missing.

   Returns:
     A dictionary mapping label names to id.
+
+  Raises:
+    ValueError: if fill_in_gaps_and_background and label_map has non-integer or
+      negative values.
   """
   label_map = load_labelmap(label_map_path)
   label_map_dict = {}
@@ -156,6 +167,24 @@ def get_label_map_dict(label_map_path, use_display_name=False):
       label_map_dict[item.display_name] = item.id
     else:
       label_map_dict[item.name] = item.id
+
+  if fill_in_gaps_and_background:
+    values = set(label_map_dict.values())
+
+    if 0 not in values:
+      label_map_dict['background'] = 0
+    if not all(isinstance(value, int) for value in values):
+      raise ValueError('The values in label map must be integers in order to'
+                       'fill_in_gaps_and_background.')
+    if not all(value >= 0 for value in values):
+      raise ValueError('The values in the label map must be positive.')
+
+    if len(values) != max(values) + 1:
+      # there are gaps in the labels, fill in gaps.
+      for value in range(1, max(values)):
+        if value not in values:
+          label_map_dict['class_' + str(value)] = value
+
   return label_map_dict
...
@@ -119,6 +119,30 @@ class LabelMapUtilTest(tf.test.TestCase):
     self.assertEqual(label_map_dict['dog'], 1)
     self.assertEqual(label_map_dict['cat'], 2)

+  def test_get_label_map_dict_with_fill_in_gaps_and_background(self):
+    label_map_string = """
+      item {
+        id:3
+        name:'cat'
+      }
+      item {
+        id:1
+        name:'dog'
+      }
+    """
+    label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt')
+    with tf.gfile.Open(label_map_path, 'wb') as f:
+      f.write(label_map_string)
+
+    label_map_dict = label_map_util.get_label_map_dict(
+        label_map_path, fill_in_gaps_and_background=True)
+
+    self.assertEqual(label_map_dict['background'], 0)
+    self.assertEqual(label_map_dict['dog'], 1)
+    self.assertEqual(label_map_dict['class_2'], 2)
+    self.assertEqual(label_map_dict['cat'], 3)
+    self.assertEqual(len(label_map_dict), max(label_map_dict.values()) + 1)
+
   def test_keep_categories_with_unique_id(self):
     label_map_proto = string_int_label_map_pb2.StringIntLabelMap()
     label_map_string = """
...
@@ -31,6 +31,7 @@ from abc import ABCMeta
 from abc import abstractmethod
 import collections
 import logging
+import unicodedata
 import numpy as np

 from object_detection.core import standard_fields
@@ -284,18 +285,23 @@ class ObjectDetectionEvaluator(DetectionEvaluator):
     category_index = label_map_util.create_category_index(self._categories)
     for idx in range(per_class_ap.size):
       if idx + self._label_id_offset in category_index:
+        category_name = category_index[idx + self._label_id_offset]['name']
+        try:
+          category_name = unicode(category_name, 'utf-8')
+        except TypeError:
+          pass
+        category_name = unicodedata.normalize(
+            'NFKD', category_name).encode('ascii', 'ignore')
         display_name = (
             self._metric_prefix + 'PerformanceByCategory/AP@{}IOU/{}'.format(
-                self._matching_iou_threshold,
-                category_index[idx + self._label_id_offset]['name']))
+                self._matching_iou_threshold, category_name))
         pascal_metrics[display_name] = per_class_ap[idx]

         # Optionally add CorLoc metrics.classes
         if self._evaluate_corlocs:
           display_name = (
               self._metric_prefix + 'PerformanceByCategory/CorLoc@{}IOU/{}'
-              .format(self._matching_iou_threshold,
-                      category_index[idx + self._label_id_offset]['name']))
+              .format(self._matching_iou_threshold, category_name))
           pascal_metrics[display_name] = per_class_corloc[idx]

     return pascal_metrics
@@ -839,9 +845,9 @@ class ObjectDetectionEvaluation(object):
       if self.use_weighted_mean_ap:
         all_scores = np.append(all_scores, scores)
         all_tp_fp_labels = np.append(all_tp_fp_labels, tp_fp_labels)
-      print 'Scores and tpfp per class label: {}'.format(class_index)
-      print tp_fp_labels
-      print scores
+      logging.info('Scores and tpfp per class label: %d', class_index)
+      logging.info(tp_fp_labels)
+      logging.info(scores)
       precision, recall = metrics.compute_precision_recall(
           scores, tp_fp_labels, self.num_gt_instances_per_class[class_index])
       self.precisions_per_class.append(precision)
...
...@@ -20,8 +20,6 @@ import six ...@@ -20,8 +20,6 @@ import six
import tensorflow as tf import tensorflow as tf
from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import standard_fields as fields from object_detection.core import standard_fields as fields
from object_detection.utils import shape_utils from object_detection.utils import shape_utils
from object_detection.utils import static_shape from object_detection.utils import static_shape
...@@ -60,13 +58,20 @@ def normalized_to_image_coordinates(normalized_boxes, image_shape, ...@@ -60,13 +58,20 @@ def normalized_to_image_coordinates(normalized_boxes, image_shape,
parallel_iterations: parallelism for the map_fn op. parallel_iterations: parallelism for the map_fn op.
Returns: Returns:
absolute_boxes: a float32 tensor of shape [None, num_boxes, 4] containg the absolute_boxes: a float32 tensor of shape [None, num_boxes, 4] containing
boxes in image coordinates. the boxes in image coordinates.
""" """
x_scale = tf.cast(image_shape[2], tf.float32)
y_scale = tf.cast(image_shape[1], tf.float32)
def _to_absolute_coordinates(normalized_boxes): def _to_absolute_coordinates(normalized_boxes):
return box_list_ops.to_absolute_coordinates( y_min, x_min, y_max, x_max = tf.split(
box_list.BoxList(normalized_boxes), value=normalized_boxes, num_or_size_splits=4, axis=1)
image_shape[1], image_shape[2], check_range=False).get() y_min = y_scale * y_min
y_max = y_scale * y_max
x_min = x_scale * x_min
x_max = x_scale * x_max
scaled_boxes = tf.concat([y_min, x_min, y_max, x_max], 1)
return scaled_boxes
absolute_boxes = shape_utils.static_or_dynamic_map_fn( absolute_boxes = shape_utils.static_or_dynamic_map_fn(
_to_absolute_coordinates, _to_absolute_coordinates,
...@@ -538,13 +543,59 @@ def normalize_to_target(inputs, ...@@ -538,13 +543,59 @@ def normalize_to_target(inputs,
return tf.reshape(target_norm, mult_shape) * tf.truediv(inputs, lengths) return tf.reshape(target_norm, mult_shape) * tf.truediv(inputs, lengths)
def batch_position_sensitive_crop_regions(images,
boxes,
crop_size,
num_spatial_bins,
global_pool,
parallel_iterations=64):
"""Position sensitive crop with batches of images and boxes.
This op is exactly like `position_sensitive_crop_regions` below but operates
on batches of images and boxes. See `position_sensitive_crop_regions` function
below for the operation applied per batch element.
Args:
images: A `Tensor`. Must be one of the following types: `uint8`, `int8`,
`int16`, `int32`, `int64`, `half`, `float32`, `float64`.
A 4-D tensor of shape `[batch, image_height, image_width, depth]`.
Both `image_height` and `image_width` need to be positive.
boxes: A `Tensor` of type `float32`.
A 3-D tensor of shape `[batch, num_boxes, 4]`. Each box is specified in
normalized coordinates `[y1, x1, y2, x2]`. A normalized coordinate value
of `y` is mapped to the image coordinate at `y * (image_height - 1)`, so
as the `[0, 1]` interval of normalized image height is mapped to
`[0, image_height - 1] in image height coordinates. We do allow y1 > y2,
in which case the sampled crop is an up-down flipped version of the
original image. The width dimension is treated similarly.
crop_size: See `position_sensitive_crop_regions` below.
num_spatial_bins: See `position_sensitive_crop_regions` below.
global_pool: See `position_sensitive_crop_regions` below.
parallel_iterations: Number of batch items to process in parallel.
Returns:
"""
def _position_sensitive_crop_fn(inputs):
images, boxes = inputs
return position_sensitive_crop_regions(
images,
boxes,
crop_size=crop_size,
num_spatial_bins=num_spatial_bins,
global_pool=global_pool)
return shape_utils.static_or_dynamic_map_fn(
_position_sensitive_crop_fn,
elems=[images, boxes],
dtype=tf.float32,
parallel_iterations=parallel_iterations)
def position_sensitive_crop_regions(image, def position_sensitive_crop_regions(image,
boxes, boxes,
box_ind,
crop_size, crop_size,
num_spatial_bins, num_spatial_bins,
global_pool, global_pool):
extrapolation_value=None):
"""Position-sensitive crop and pool rectangular regions from a feature grid. """Position-sensitive crop and pool rectangular regions from a feature grid.
The output crops are split into `spatial_bins_y` vertical bins The output crops are split into `spatial_bins_y` vertical bins
...@@ -565,23 +616,16 @@ def position_sensitive_crop_regions(image, ...@@ -565,23 +616,16 @@ def position_sensitive_crop_regions(image,
Args: Args:
image: A `Tensor`. Must be one of the following types: `uint8`, `int8`, image: A `Tensor`. Must be one of the following types: `uint8`, `int8`,
`int16`, `int32`, `int64`, `half`, `float32`, `float64`. `int16`, `int32`, `int64`, `half`, `float32`, `float64`.
A 4-D tensor of shape `[batch, image_height, image_width, depth]`. A 3-D tensor of shape `[image_height, image_width, depth]`.
Both `image_height` and `image_width` need to be positive. Both `image_height` and `image_width` need to be positive.
boxes: A `Tensor` of type `float32`. boxes: A `Tensor` of type `float32`.
A 2-D tensor of shape `[num_boxes, 4]`. The `i`-th row of the tensor A 2-D tensor of shape `[num_boxes, 4]`. Each box is specified in
specifies the coordinates of a box in the `box_ind[i]` image and is normalized coordinates `[y1, x1, y2, x2]`. A normalized coordinate value
specified in normalized coordinates `[y1, x1, y2, x2]`. A normalized of `y` is mapped to the image coordinate at `y * (image_height - 1)`, so
coordinate value of `y` is mapped to the image coordinate at as the `[0, 1]` interval of normalized image height is mapped to
`y * (image_height - 1)`, so as the `[0, 1]` interval of normalized image `[0, image_height - 1] in image height coordinates. We do allow y1 > y2,
height is mapped to `[0, image_height - 1] in image height coordinates. in which case the sampled crop is an up-down flipped version of the
We do allow y1 > y2, in which case the sampled crop is an up-down flipped original image. The width dimension is treated similarly.
version of the original image. The width dimension is treated similarly.
Normalized coordinates outside the `[0, 1]` range are allowed, in which
case we use `extrapolation_value` to extrapolate the input image values.
box_ind: A `Tensor` of type `int32`.
A 1-D tensor of shape `[num_boxes]` with int32 values in `[0, batch)`.
The value of `box_ind[i]` specifies the image that the `i`-th box refers
to.
crop_size: A list of two integers `[crop_height, crop_width]`. All crop_size: A list of two integers `[crop_height, crop_width]`. All
cropped image patches are resized to this size. The aspect ratio of the cropped image patches are resized to this size. The aspect ratio of the
image content is not preserved. Both `crop_height` and `crop_width` need image content is not preserved. Both `crop_height` and `crop_width` need
...@@ -601,8 +645,7 @@ def position_sensitive_crop_regions(image, ...@@ -601,8 +645,7 @@ def position_sensitive_crop_regions(image,
Note that using global_pool=True is equivalent to but more efficient than Note that using global_pool=True is equivalent to but more efficient than
running the function with global_pool=False and then performing global running the function with global_pool=False and then performing global
average pooling. average pooling.
extrapolation_value: An optional `float`. Defaults to `0`.
Value used for extrapolation, when applicable.
Returns: Returns:
position_sensitive_features: A 4-D tensor of shape position_sensitive_features: A 4-D tensor of shape
`[num_boxes, K, K, crop_channels]`, `[num_boxes, K, K, crop_channels]`,
...@@ -649,12 +692,17 @@ def position_sensitive_crop_regions(image, ...@@ -649,12 +692,17 @@ def position_sensitive_crop_regions(image,
] ]
position_sensitive_boxes.append(tf.stack(box_coordinates, axis=1)) position_sensitive_boxes.append(tf.stack(box_coordinates, axis=1))
image_splits = tf.split(value=image, num_or_size_splits=total_bins, axis=3) image_splits = tf.split(value=image, num_or_size_splits=total_bins, axis=2)
image_crops = [] image_crops = []
for (split, box) in zip(image_splits, position_sensitive_boxes): for (split, box) in zip(image_splits, position_sensitive_boxes):
crop = tf.image.crop_and_resize(split, box, box_ind, bin_crop_size, if split.shape.is_fully_defined() and box.shape.is_fully_defined():
extrapolation_value=extrapolation_value) crop = matmul_crop_and_resize(
tf.expand_dims(split, 0), box, bin_crop_size)
else:
crop = tf.image.crop_and_resize(
tf.expand_dims(split, 0), box,
tf.zeros(tf.shape(boxes)[0], dtype=tf.int32), bin_crop_size)
image_crops.append(crop) image_crops.append(crop)
if global_pool: if global_pool:
...@@ -957,3 +1005,68 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None): ...@@ -957,3 +1005,68 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None):
tf.matmul(kernel_h, tf.tile(channel, [num_crops, 1, 1])), tf.matmul(kernel_h, tf.tile(channel, [num_crops, 1, 1])),
kernel_w, transpose_b=True)) kernel_w, transpose_b=True))
return tf.stack(result_channels, axis=3) return tf.stack(result_channels, axis=3)
def expected_classification_loss_under_sampling(batch_cls_targets, cls_losses,
negative_to_positive_ratio,
minimum_negative_sampling):
"""Computes classification loss by background/foreground weighting.
The weighting is such that the effective background/foreground weight ratio
is the negative_to_positive_ratio. if p_i is the foreground probability of
anchor a_i, L(a_i) is the anchors loss, N is the number of anchors, and M is
the sum of foreground probabilities across anchors, then the total loss L is
calculated as:
beta = K*M/(N-M)
L = sum_{i=1}^N [p_i + beta * (1 - p_i)] * (L(a_i))
Args:
batch_cls_targets: A tensor with shape [batch_size, num_anchors,
num_classes + 1], where 0'th index is the background class, containing
the class distrubution for the target assigned to a given anchor.
cls_losses: Float tensor of shape [batch_size, num_anchors]
representing anchorwise classification losses.
negative_to_positive_ratio: The desired background/foreground weight ratio.
minimum_negative_sampling: Minimum number of effective negative samples.
Used only when there are no positive examples.
Returns:
The classification loss.
"""
num_anchors = tf.cast(tf.shape(batch_cls_targets)[1], tf.float32)
# find the p_i
foreground_probabilities = (
foreground_probabilities_from_targets(batch_cls_targets))
foreground_sum = tf.reduce_sum(foreground_probabilities, axis=-1)
k = negative_to_positive_ratio
# compute beta
denominators = (num_anchors - foreground_sum)
beta = tf.where(
tf.equal(denominators, 0), tf.zeros_like(foreground_sum),
k * foreground_sum / denominators)
# where the foreground sum is zero, use a minimum negative weight.
min_negative_weight = 1.0 * minimum_negative_sampling / num_anchors
beta = tf.where(
tf.equal(beta, 0), min_negative_weight * tf.ones_like(beta), beta)
beta = tf.reshape(beta, [-1, 1])
cls_loss_weights = foreground_probabilities + (
1 - foreground_probabilities) * beta
weighted_losses = cls_loss_weights * cls_losses
cls_losses = tf.reduce_sum(weighted_losses, axis=-1)
return cls_losses
def foreground_probabilities_from_targets(batch_cls_targets):
foreground_probabilities = 1 - batch_cls_targets[:, :, 0]
return foreground_probabilities
...@@ -812,13 +812,12 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase): ...@@ -812,13 +812,12 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
def test_position_sensitive(self): def test_position_sensitive(self):
num_spatial_bins = [3, 2] num_spatial_bins = [3, 2]
image_shape = [1, 3, 2, 6] image_shape = [3, 2, 6]
# First channel is 1's, second channel is 2's, etc. # First channel is 1's, second channel is 2's, etc.
image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32, image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32,
shape=image_shape) shape=image_shape)
boxes = tf.random_uniform((2, 4)) boxes = tf.random_uniform((2, 4))
box_ind = tf.constant([0, 0], dtype=tf.int32)
# The result for both boxes should be [[1, 2], [3, 4], [5, 6]] # The result for both boxes should be [[1, 2], [3, 4], [5, 6]]
# before averaging. # before averaging.
...@@ -827,7 +826,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase): ...@@ -827,7 +826,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
for crop_size_mult in range(1, 3): for crop_size_mult in range(1, 3):
crop_size = [3 * crop_size_mult, 2 * crop_size_mult] crop_size = [3 * crop_size_mult, 2 * crop_size_mult]
ps_crop_and_pool = ops.position_sensitive_crop_regions( ps_crop_and_pool = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True) image, boxes, crop_size, num_spatial_bins, global_pool=True)
with self.test_session() as sess: with self.test_session() as sess:
output = sess.run(ps_crop_and_pool) output = sess.run(ps_crop_and_pool)
...@@ -835,24 +834,24 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase): ...@@ -835,24 +834,24 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
def test_position_sensitive_with_equal_channels(self): def test_position_sensitive_with_equal_channels(self):
num_spatial_bins = [2, 2] num_spatial_bins = [2, 2]
image_shape = [1, 3, 3, 4] image_shape = [3, 3, 4]
crop_size = [2, 2] crop_size = [2, 2]
image = tf.constant(range(1, 3 * 3 + 1), dtype=tf.float32, image = tf.constant(range(1, 3 * 3 + 1), dtype=tf.float32,
shape=[1, 3, 3, 1]) shape=[3, 3, 1])
tiled_image = tf.tile(image, [1, 1, 1, image_shape[3]]) tiled_image = tf.tile(image, [1, 1, image_shape[2]])
boxes = tf.random_uniform((3, 4)) boxes = tf.random_uniform((3, 4))
box_ind = tf.constant([0, 0, 0], dtype=tf.int32) box_ind = tf.constant([0, 0, 0], dtype=tf.int32)
# All channels are equal so position-sensitive crop and resize should # All channels are equal so position-sensitive crop and resize should
# work as the usual crop and resize for just one channel. # work as the usual crop and resize for just one channel.
crop = tf.image.crop_and_resize(image, boxes, box_ind, crop_size) crop = tf.image.crop_and_resize(tf.expand_dims(image, axis=0), boxes,
box_ind, crop_size)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keep_dims=True) crop_and_pool = tf.reduce_mean(crop, [1, 2], keep_dims=True)
ps_crop_and_pool = ops.position_sensitive_crop_regions( ps_crop_and_pool = ops.position_sensitive_crop_regions(
tiled_image, tiled_image,
boxes, boxes,
box_ind,
crop_size, crop_size,
num_spatial_bins, num_spatial_bins,
global_pool=True) global_pool=True)
...@@ -861,78 +860,53 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase): ...@@ -861,78 +860,53 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
expected_output, output = sess.run((crop_and_pool, ps_crop_and_pool)) expected_output, output = sess.run((crop_and_pool, ps_crop_and_pool))
self.assertAllClose(output, expected_output) self.assertAllClose(output, expected_output)
def test_position_sensitive_with_single_bin(self):
num_spatial_bins = [1, 1]
image_shape = [2, 3, 3, 4]
crop_size = [2, 2]
image = tf.random_uniform(image_shape)
boxes = tf.random_uniform((6, 4))
box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32)
# When a single bin is used, position-sensitive crop and pool should be
# the same as non-position sensitive crop and pool.
crop = tf.image.crop_and_resize(image, boxes, box_ind, crop_size)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keep_dims=True)
ps_crop_and_pool = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
with self.test_session() as sess:
expected_output, output = sess.run((crop_and_pool, ps_crop_and_pool))
self.assertAllClose(output, expected_output)
def test_raise_value_error_on_num_bins_less_than_one(self): def test_raise_value_error_on_num_bins_less_than_one(self):
num_spatial_bins = [1, -1] num_spatial_bins = [1, -1]
image_shape = [1, 1, 1, 2] image_shape = [1, 1, 2]
crop_size = [2, 2] crop_size = [2, 2]
image = tf.constant(1, dtype=tf.float32, shape=image_shape) image = tf.constant(1, dtype=tf.float32, shape=image_shape)
boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32) boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32)
box_ind = tf.constant([0], dtype=tf.int32)
with self.assertRaisesRegexp(ValueError, 'num_spatial_bins should be >= 1'): with self.assertRaisesRegexp(ValueError, 'num_spatial_bins should be >= 1'):
ops.position_sensitive_crop_regions( ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True) image, boxes, crop_size, num_spatial_bins, global_pool=True)
def test_raise_value_error_on_non_divisible_crop_size(self): def test_raise_value_error_on_non_divisible_crop_size(self):
num_spatial_bins = [2, 3] num_spatial_bins = [2, 3]
image_shape = [1, 1, 1, 6] image_shape = [1, 1, 6]
crop_size = [3, 2] crop_size = [3, 2]
image = tf.constant(1, dtype=tf.float32, shape=image_shape) image = tf.constant(1, dtype=tf.float32, shape=image_shape)
boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32) boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32)
box_ind = tf.constant([0], dtype=tf.int32)
with self.assertRaisesRegexp( with self.assertRaisesRegexp(
ValueError, 'crop_size should be divisible by num_spatial_bins'): ValueError, 'crop_size should be divisible by num_spatial_bins'):
ops.position_sensitive_crop_regions( ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True) image, boxes, crop_size, num_spatial_bins, global_pool=True)
def test_raise_value_error_on_non_divisible_num_channels(self): def test_raise_value_error_on_non_divisible_num_channels(self):
num_spatial_bins = [2, 2] num_spatial_bins = [2, 2]
image_shape = [1, 1, 1, 5] image_shape = [1, 1, 5]
crop_size = [2, 2] crop_size = [2, 2]
image = tf.constant(1, dtype=tf.float32, shape=image_shape) image = tf.constant(1, dtype=tf.float32, shape=image_shape)
boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32) boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32)
box_ind = tf.constant([0], dtype=tf.int32)
with self.assertRaisesRegexp( with self.assertRaisesRegexp(
ValueError, 'Dimension size must be evenly divisible by 4 but is 5'): ValueError, 'Dimension size must be evenly divisible by 4 but is 5'):
ops.position_sensitive_crop_regions( ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True) image, boxes, crop_size, num_spatial_bins, global_pool=True)
def test_position_sensitive_with_global_pool_false(self): def test_position_sensitive_with_global_pool_false(self):
num_spatial_bins = [3, 2] num_spatial_bins = [3, 2]
image_shape = [1, 3, 2, 6] image_shape = [3, 2, 6]
num_boxes = 2 num_boxes = 2
# First channel is 1's, second channel is 2's, etc. # First channel is 1's, second channel is 2's, etc.
image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32, image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32,
shape=image_shape) shape=image_shape)
boxes = tf.random_uniform((num_boxes, 4)) boxes = tf.random_uniform((num_boxes, 4))
box_ind = tf.constant([0, 0], dtype=tf.int32)
expected_output = [] expected_output = []
...@@ -956,79 +930,21 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase): ...@@ -956,79 +930,21 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
for crop_size_mult in range(1, 3): for crop_size_mult in range(1, 3):
crop_size = [3 * crop_size_mult, 2 * crop_size_mult] crop_size = [3 * crop_size_mult, 2 * crop_size_mult]
ps_crop = ops.position_sensitive_crop_regions( ps_crop = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False) image, boxes, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess: with self.test_session() as sess:
output = sess.run(ps_crop) output = sess.run(ps_crop)
self.assertAllEqual(output, expected_output[crop_size_mult - 1]) self.assertAllEqual(output, expected_output[crop_size_mult - 1])
def test_position_sensitive_with_global_pool_false_and_known_boxes(self):
num_spatial_bins = [2, 2]
image_shape = [2, 2, 2, 4]
crop_size = [2, 2]
image = tf.constant(range(1, 2 * 2 * 4 + 1) * 2, dtype=tf.float32,
shape=image_shape)
# First box contains whole image, and second box contains only first row.
boxes = tf.constant(np.array([[0., 0., 1., 1.],
[0., 0., 0.5, 1.]]), dtype=tf.float32)
box_ind = tf.constant([0, 1], dtype=tf.int32)
expected_output = []
# Expected output, when the box containing whole image.
expected_output.append(
np.reshape(np.array([[4, 7],
[10, 13]]),
(1, 2, 2, 1))
)
# Expected output, when the box containing only first row.
expected_output.append(
np.reshape(np.array([[3, 6],
[7, 10]]),
(1, 2, 2, 1))
)
expected_output = np.concatenate(expected_output, axis=0)
ps_crop = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
output = sess.run(ps_crop)
self.assertAllEqual(output, expected_output)
def test_position_sensitive_with_global_pool_false_and_single_bin(self):
num_spatial_bins = [1, 1]
image_shape = [2, 3, 3, 4]
crop_size = [1, 1]
image = tf.random_uniform(image_shape)
boxes = tf.random_uniform((6, 4))
box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32)
# Since single_bin is used and crop_size = [1, 1] (i.e., no crop resize),
# the outputs are the same whatever the global_pool value is.
ps_crop_and_pool = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
ps_crop = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
pooled_output, unpooled_output = sess.run((ps_crop_and_pool, ps_crop))
self.assertAllClose(pooled_output, unpooled_output)
def test_position_sensitive_with_global_pool_false_and_do_global_pool(self): def test_position_sensitive_with_global_pool_false_and_do_global_pool(self):
num_spatial_bins = [3, 2] num_spatial_bins = [3, 2]
image_shape = [1, 3, 2, 6] image_shape = [3, 2, 6]
num_boxes = 2 num_boxes = 2
# First channel is 1's, second channel is 2's, etc. # First channel is 1's, second channel is 2's, etc.
image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32, image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32,
shape=image_shape) shape=image_shape)
boxes = tf.random_uniform((num_boxes, 4)) boxes = tf.random_uniform((num_boxes, 4))
box_ind = tf.constant([0, 0], dtype=tf.int32)
expected_output = [] expected_output = []
...@@ -1059,7 +975,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase): ...@@ -1059,7 +975,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
# Perform global_pooling after running the function with # Perform global_pooling after running the function with
# global_pool=False. # global_pool=False.
ps_crop = ops.position_sensitive_crop_regions( ps_crop = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False) image, boxes, crop_size, num_spatial_bins, global_pool=False)
ps_crop_and_pool = tf.reduce_mean( ps_crop_and_pool = tf.reduce_mean(
ps_crop, reduction_indices=(1, 2), keep_dims=True) ps_crop, reduction_indices=(1, 2), keep_dims=True)
...@@ -1070,17 +986,99 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase): ...@@ -1070,17 +986,99 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
def test_raise_value_error_on_non_square_block_size(self): def test_raise_value_error_on_non_square_block_size(self):
num_spatial_bins = [3, 2] num_spatial_bins = [3, 2]
image_shape = [1, 3, 2, 6] image_shape = [3, 2, 6]
crop_size = [6, 2] crop_size = [6, 2]
image = tf.constant(1, dtype=tf.float32, shape=image_shape) image = tf.constant(1, dtype=tf.float32, shape=image_shape)
boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32) boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32)
box_ind = tf.constant([0], dtype=tf.int32)
with self.assertRaisesRegexp( with self.assertRaisesRegexp(
ValueError, 'Only support square bin crop size for now.'): ValueError, 'Only support square bin crop size for now.'):
ops.position_sensitive_crop_regions( ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False) image, boxes, crop_size, num_spatial_bins, global_pool=False)
class OpsTestBatchPositionSensitiveCropRegions(tf.test.TestCase):
def test_position_sensitive_with_single_bin(self):
num_spatial_bins = [1, 1]
image_shape = [2, 3, 3, 4]
crop_size = [2, 2]
image = tf.random_uniform(image_shape)
boxes = tf.random_uniform((2, 3, 4))
box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32)
# When a single bin is used, position-sensitive crop and pool should be
# the same as non-position sensitive crop and pool.
crop = tf.image.crop_and_resize(image, tf.reshape(boxes, [-1, 4]), box_ind,
crop_size)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keepdims=True)
crop_and_pool = tf.reshape(crop_and_pool, [2, 3, 1, 1, 4])
ps_crop_and_pool = ops.batch_position_sensitive_crop_regions(
image, boxes, crop_size, num_spatial_bins, global_pool=True)
with self.test_session() as sess:
expected_output, output = sess.run((crop_and_pool, ps_crop_and_pool))
self.assertAllClose(output, expected_output)
def test_position_sensitive_with_global_pool_false_and_known_boxes(self):
num_spatial_bins = [2, 2]
image_shape = [2, 2, 2, 4]
crop_size = [2, 2]
images = tf.constant(range(1, 2 * 2 * 4 + 1) * 2, dtype=tf.float32,
shape=image_shape)
# First box contains whole image, and second box contains only first row.
boxes = tf.constant(np.array([[[0., 0., 1., 1.]],
[[0., 0., 0.5, 1.]]]), dtype=tf.float32)
# box_ind = tf.constant([0, 1], dtype=tf.int32)
expected_output = []
# Expected output, when the box containing whole image.
expected_output.append(
np.reshape(np.array([[4, 7],
[10, 13]]),
(1, 2, 2, 1))
)
# Expected output, when the box containing only first row.
expected_output.append(
np.reshape(np.array([[3, 6],
[7, 10]]),
(1, 2, 2, 1))
)
expected_output = np.stack(expected_output, axis=0)
ps_crop = ops.batch_position_sensitive_crop_regions(
images, boxes, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
output = sess.run(ps_crop)
self.assertAllEqual(output, expected_output)
def test_position_sensitive_with_global_pool_false_and_single_bin(self):
num_spatial_bins = [1, 1]
image_shape = [2, 3, 3, 4]
crop_size = [1, 1]
images = tf.random_uniform(image_shape)
boxes = tf.random_uniform((2, 3, 4))
# box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32)
# Since single_bin is used and crop_size = [1, 1] (i.e., no crop resize),
# the outputs are the same whatever the global_pool value is.
ps_crop_and_pool = ops.batch_position_sensitive_crop_regions(
images, boxes, crop_size, num_spatial_bins, global_pool=True)
ps_crop = ops.batch_position_sensitive_crop_regions(
images, boxes, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
pooled_output, unpooled_output = sess.run((ps_crop_and_pool, ps_crop))
self.assertAllClose(pooled_output, unpooled_output)
class ReframeBoxMasksToImageMasksTest(tf.test.TestCase): class ReframeBoxMasksToImageMasksTest(tf.test.TestCase):
...@@ -1365,5 +1363,86 @@ class OpsTestMatMulCropAndResize(test_case.TestCase): ...@@ -1365,5 +1363,86 @@ class OpsTestMatMulCropAndResize(test_case.TestCase):
_ = ops.matmul_crop_and_resize(image, boxes, crop_size) _ = ops.matmul_crop_and_resize(image, boxes, crop_size)
class OpsTestExpectedClassificationLoss(test_case.TestCase):
def testExpectedClassificationLossUnderSamplingWithHardLabels(self):
def graph_fn(batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling)
batch_cls_targets = np.array(
[[[1., 0, 0], [0, 1., 0]], [[1., 0, 0], [0, 1., 0]]], dtype=np.float32)
cls_losses = np.array([[1, 2], [3, 4]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
minimum_negative_sampling = np.array([1], dtype=np.float32)
classification_loss = self.execute(graph_fn, [
batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling
])
# expected_foreground_sum = [1, 1]
# expected_beta = [2, 2]
# expected_cls_loss_weights = [2, 1], [2, 1]
# expected_classification_loss_under_sampling = [2*1 + 1*2, 2*3 + 1*4]
# (this weighting is spelled out in the NumPy sketch after these tests)
expected_classification_loss_under_sampling = [2 + 2, 6 + 4]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
def testExpectedClassificationLossUnderSamplingWithAllNegative(self):
def graph_fn(batch_cls_targets, cls_losses):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling)
batch_cls_targets = np.array(
[[[1, 0, 0], [1, 0, 0]], [[1, 0, 0], [1, 0, 0]]], dtype=np.float32)
cls_losses = np.array([[1, 2], [3, 4]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
minimum_negative_sampling = np.array([1], dtype=np.float32)
classification_loss = self.execute(graph_fn,
[batch_cls_targets, cls_losses])
# expected_foreground_sum = [0, 0]
# expected_beta = [0.5,0.5]
# expected_cls_loss_weights = [0.5,0.5],[0.5,0.5]
# expected_classification_loss_under_sampling = [.5*1+.5*2, .5*3+.5*4]
expected_classification_loss_under_sampling = [1.5, 3.5]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
def testExpectedClassificationLossUnderSamplingWithAllPositive(self):
def graph_fn(batch_cls_targets, cls_losses):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling)
batch_cls_targets = np.array(
[[[0, 1., 0], [0, 1., 0]], [[0, 1, 0], [0, 0, 1]]], dtype=np.float32)
cls_losses = np.array([[1, 2], [3, 4]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
minimum_negative_sampling = np.array([1], dtype=np.float32)
classification_loss = self.execute(graph_fn,
[batch_cls_targets, cls_losses])
# expected_foreground_sum = [2, 2]
# expected_beta = [0,0]
# expected_cls_loss_weights = [1,1],[1,1]
# expected_classification_loss_under_sampling = [1*1+1*2, 1*3+1*4]
expected_classification_loss_under_sampling = [1 + 2, 3 + 4]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
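# A NumPy sketch (not part of the change set; the helper name is made up) of
# the weighting that the expected values in the three tests above encode. It
# assumes each background anchor (class column 0) is weighted by
# max(negative_to_positive_ratio * num_foreground, minimum_negative_sampling)
# / num_background, while foreground anchors keep weight 1; under that
# assumption it reproduces [4, 10], [1.5, 3.5] and [3, 7].
import numpy as np

def _sampling_loss_sketch(cls_targets, cls_losses, ratio=2.0, min_neg=1.0):
  # cls_targets: [batch, anchors, classes]; column 0 is the background class.
  is_background = cls_targets[:, :, 0] > 0
  num_foreground = np.sum(~is_background, axis=1).astype(np.float32)
  num_background = np.sum(is_background, axis=1).astype(np.float32)
  # Per-image background normalizer.
  beta = np.maximum(ratio * num_foreground, min_neg)
  background_weight = beta / np.maximum(num_background, 1.0)
  weights = np.where(is_background, background_weight[:, None], 1.0)
  return np.sum(weights * cls_losses, axis=1)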
if __name__ == '__main__':
tf.test.main()
@@ -106,13 +106,49 @@ def pad_or_clip_tensor(t, length):
length is an integer, the first dimension of the processed tensor is set
to length statically.
"""
processed_t = tf.cond(
tf.greater(tf.shape(t)[0], length),
lambda: clip_tensor(t, length),
lambda: pad_tensor(t, length))
if not _is_tensor(length):
processed_t = _set_dim_0(processed_t, length)
return processed_t
return pad_or_clip_nd(t, [length] + t.shape.as_list()[1:])
def pad_or_clip_nd(tensor, output_shape):
"""Pad or Clip given tensor to the output shape.
Args:
tensor: Input tensor to pad or clip.
output_shape: A list of integers / scalar tensors (or None for dynamic dim)
representing the size to pad or clip each dimension of the input tensor.
Returns:
Input tensor padded and clipped to the output shape.
"""
tensor_shape = tf.shape(tensor)
clip_size = [
tf.where(tensor_shape[i] - shape > 0, shape, -1)
if shape is not None else -1 for i, shape in enumerate(output_shape)
]
clipped_tensor = tf.slice(
tensor,
begin=tf.zeros(len(clip_size), dtype=tf.int32),
size=clip_size)
# Pad tensor if the shape of clipped tensor is smaller than the expected
# shape.
clipped_tensor_shape = tf.shape(clipped_tensor)
trailing_paddings = [
shape - clipped_tensor_shape[i] if shape is not None else 0
for i, shape in enumerate(output_shape)
]
paddings = tf.stack(
[
tf.zeros(len(trailing_paddings), dtype=tf.int32),
trailing_paddings
],
axis=1)
padded_tensor = tf.pad(clipped_tensor, paddings=paddings)
output_static_shape = [
dim if not isinstance(dim, tf.Tensor) else None for dim in output_shape
]
padded_tensor.set_shape(output_static_shape)
return padded_tensor
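# A minimal usage sketch (not part of the change set; the names below are
# illustrative and the TF 1.x placeholder/session API is assumed): dimensions
# larger than the requested size are clipped, smaller ones are zero-padded,
# and a None entry leaves that dimension dynamic.
import numpy as np
import tensorflow as tf

example_in = tf.placeholder(tf.float32, [None, 5, 4])
# Clip dim 1 (5 -> 3) and pad dim 2 (4 -> 6); dim 0 stays dynamic.
example_out = pad_or_clip_nd(example_in, [None, 3, 6])
with tf.Session() as sess:
  print(sess.run(example_out,
                 feed_dict={example_in: np.random.rand(2, 5, 4)}).shape)
  # Prints (2, 3, 6).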
def combined_static_and_dynamic_shape(tensor):
@@ -306,4 +342,3 @@ def assert_shape_equal_along_first_dimension(shape_a, shape_b):
else: return tf.no_op()
else:
return tf.assert_equal(shape_a[0], shape_b[0])
@@ -123,6 +123,22 @@ class UtilTest(tf.test.TestCase):
self.assertTrue(tf.contrib.framework.is_tensor(combined_shape[0]))
self.assertListEqual(combined_shape[1:], [2, 3])
def test_pad_or_clip_nd_tensor(self):
tensor_placeholder = tf.placeholder(tf.float32, [None, 5, 4, 7])
output_tensor = shape_utils.pad_or_clip_nd(
tensor_placeholder, [None, 3, 5, tf.constant(6)])
self.assertAllEqual(output_tensor.shape.as_list(), [None, 3, 5, None])
with self.test_session() as sess:
output_tensor_np = sess.run(
output_tensor,
feed_dict={
tensor_placeholder: np.random.rand(2, 5, 4, 7),
})
self.assertAllEqual(output_tensor_np.shape, [2, 3, 5, 6])
class StaticOrDynamicMapFnTest(tf.test.TestCase):
@@ -47,9 +47,10 @@ class TestCase(tf.test.TestCase):
materialized_results = sess.run(tpu_computation,
feed_dict=dict(zip(placeholders, inputs)))
sess.run(tpu.shutdown_system())
if (len(materialized_results) == 1
and (isinstance(materialized_results, list)
or isinstance(materialized_results, tuple))):
if (hasattr(materialized_results, '__len__') and
len(materialized_results) == 1 and
(isinstance(materialized_results, list) or
isinstance(materialized_results, tuple))):
materialized_results = materialized_results[0]
return materialized_results
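# Note (not part of the change set): the added hasattr check guards against
# results that do not support len(), such as a bare scalar or array returned
# for a single output, so only genuine one-element lists or tuples get
# unwrapped.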
@@ -72,9 +73,11 @@ class TestCase(tf.test.TestCase):
tf.local_variables_initializer()])
materialized_results = sess.run(results, feed_dict=dict(zip(placeholders,
inputs)))
if (len(materialized_results) == 1
and (isinstance(materialized_results, list)
or isinstance(materialized_results, tuple))):
if (hasattr(materialized_results, '__len__') and
len(materialized_results) == 1 and
(isinstance(materialized_results, list) or
isinstance(materialized_results, tuple))):
materialized_results = materialized_results[0]
return materialized_results
@@ -62,6 +62,30 @@ class MockBoxPredictor(box_predictor.BoxPredictor):
class_predictions_with_background}
class MockKerasBoxPredictor(box_predictor.KerasBoxPredictor):
"""Simple box predictor that ignores inputs and outputs all zeros."""
def __init__(self, is_training, num_classes):
super(MockKerasBoxPredictor, self).__init__(
is_training, num_classes, False, False)
def _predict(self, image_features, **kwargs):
image_feature = image_features[0]
combined_feature_shape = shape_utils.combined_static_and_dynamic_shape(
image_feature)
batch_size = combined_feature_shape[0]
num_anchors = (combined_feature_shape[1] * combined_feature_shape[2])
code_size = 4
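# Multiplying the feature map by zero below keeps the constant, all-zero
# predictions connected to the input features in the graph.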
zero = tf.reduce_sum(0 * image_feature)
box_encodings = zero + tf.zeros(
(batch_size, num_anchors, 1, code_size), dtype=tf.float32)
class_predictions_with_background = zero + tf.zeros(
(batch_size, num_anchors, self.num_classes + 1), dtype=tf.float32)
return {box_predictor.BOX_ENCODINGS: box_encodings,
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND:
class_predictions_with_background}
class MockAnchorGenerator(anchor_generator.AnchorGenerator):
"""Mock anchor generator."""
@@ -134,8 +134,11 @@ def get_variables_available_in_checkpoint(variables,
vars_in_ckpt[variable_name] = variable
else:
logging.warning('Variable [%s] is available in checkpoint, but has an '
'incompatible shape with model variable.',
variable_name)
logging.warning('Variable [%s] is available in checkpoint, but has an '
'incompatible shape with model variable. Checkpoint '
'shape: [%s], model variable shape: [%s]. This '
'variable will not be initialized from the checkpoint.',
variable_name, ckpt_vars_to_shape_map[variable_name],
variable.shape.as_list())
else:
logging.warning('Variable [%s] is not available in checkpoint',
variable_name)
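# A small debugging sketch (not part of the change set; the checkpoint path is
# hypothetical): list the variable shapes stored in a checkpoint so they can be
# compared against the model-variable shapes reported in the warning above.
import tensorflow as tf

reader = tf.train.NewCheckpointReader('/path/to/model.ckpt')
for var_name, var_shape in sorted(reader.get_variable_to_shape_map().items()):
  print(var_name, var_shape)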