"tests/test_datasets/test_scannet_dataset.py" did not exist on "2b3998cf14f42f2308d9796887b154d199ec1831"
Unverified commit 02a9969e, authored by pkulzc and committed by GitHub

Refactor object detection box predictors and fix some issues with model_main. (#4965)

* Merged commit includes the following changes:
206852642  by Zhichao Lu:

    Build the balanced_positive_negative_sampler in the model builder for FasterRCNN. Also adds an option to use the static implementation of the sampler.

--
206803260  by Zhichao Lu:

    Fixes a misplaced argument in resnet fpn feature extractor.

--
206682736  by Zhichao Lu:

    This CL modifies the SSD meta architecture to support both Slim-based and Keras-based box predictors, and begins preparation for Keras box predictor support in the other meta architectures.

    Concretely, this CL adds a new `KerasBoxPredictor` base class and makes the meta architectures appropriately call whichever box predictors they are using.

    We can switch the non-ssd meta architectures to fully support Keras box predictors once the Keras Convolutional Box Predictor CL is submitted.

--
206669634  by Zhichao Lu:

    Adds an alternate method for the balanced positive negative sampler that uses static shapes.
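
    As an illustration of the static-shape idea (a hypothetical sketch, not
    necessarily the sampler's actual implementation): one standard trick is to
    add random noise to the candidate indicator and take a fixed-size top_k,
    so the output shape is known at graph-construction time.

        import tensorflow as tf

        def sample_fixed_size_static(indicator, num_samples):
          # Hypothetical helper: indicated entries score >= 1.0 while all
          # others score < 0.5, so top_k prefers real candidates; the result
          # always has exactly `num_samples` elements, keeping shapes static.
          scores = tf.cast(indicator, tf.float32) + tf.random_uniform(
              tf.shape(indicator), maxval=0.5)
          _, selected_indices = tf.nn.top_k(scores, k=num_samples)
          return selected_indices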

--
206643278  by Zhichao Lu:

    This CL adds a Keras layer hyperparameter configuration object to the hyperparams_builder.

    It automatically converts from Slim layer hyperparameter configs to Keras layer hyperparameters. Namely, it:
    - Builds Keras initializers/regularizers instead of Slim ones
    - sets weights_regularizer/initializer to kernel_regularizer/initializer
    - converts batchnorm decay to momentum
    - converts Slim l2 regularizer weights to the equivalent Keras l2 weights

    This will be used in the conversion of object detection feature extractors & box predictors to newer Tensorflow APIs.
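
    As a rough sketch of the conversions listed above (the helper name and
    return structure here are illustrative, not the builder's actual API; the
    factor-of-two l2 relationship follows from Slim's l2_regularizer(scale)
    computing scale * sum(w**2) / 2 while Keras' l2(l) computes l * sum(w**2)):

        import tensorflow as tf

        def slim_to_keras_hyperparams(slim_l2_weight, slim_batch_norm_decay):
          # Halve the Slim l2 weight to get the equivalent Keras l2 weight.
          kernel_regularizer = tf.keras.regularizers.l2(slim_l2_weight / 2.0)
          # Slim batch norm's `decay` is the quantity that Keras
          # BatchNormalization calls `momentum`.
          return {
              'kernel_regularizer': kernel_regularizer,
              'batch_norm_momentum': slim_batch_norm_decay,
          }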

--
206611681  by Zhichao Lu:

    Internal changes.

--
206591619  by Zhichao Lu:

    Clip to the target shape when the input tensors are larger than the expected padded static shape.

--
206517644  by Zhichao Lu:

    Make MultiscaleGridAnchorGenerator more consistent with MultipleGridAnchorGenerator.

--
206415624  by Zhichao Lu:

    Make the hardcoded feature pyramid network (FPN) levels configurable for both SSD
    Resnet and SSD Mobilenet.

--
206398204  by Zhichao Lu:

    This CL modifies the SSD meta architecture to support both Slim-based and Keras-based feature extractors.

    This allows us to begin the conversion of object detection to newer Tensorflow APIs.

--
206213448  by Zhichao Lu:

    Adding a method to compute the expected classification loss by background/foreground weighting.

--
206204232  by Zhichao Lu:

    Adding the keypoint head to the Mask RCNN pipeline.

--
206200352  by Zhichao Lu:

    - Create Faster R-CNN target assigner in the model builder. This allows configuring matchers in Target assigner to use TPU compatible ops (tf.gather in this case) without any change in meta architecture.
    - As a positive side effect of the refactoring, we can now re-use a single target assigner for all of the second stage heads in Faster R-CNN.

--
206178206  by Zhichao Lu:

    Force the ssd feature extractor builder to use keyword arguments so values won't be passed to the wrong arguments.

--
206168297  by Zhichao Lu:

    Updating exporter to use freeze_graph.freeze_graph_with_def_protos rather than a homegrown version.
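
    For reference, a minimal sketch of such a call (TF 1.x argument names;
    verify against the installed TensorFlow version; the inputs here stand in
    for whatever the exporter has already built):

        from tensorflow.python.tools import freeze_graph

        def freeze(graph_def, saver_def, checkpoint_path, output_node_names):
          # An empty output_graph means "return the frozen GraphDef rather
          # than writing it to disk".
          return freeze_graph.freeze_graph_with_def_protos(
              input_graph_def=graph_def,
              input_saver_def=saver_def,
              input_checkpoint=checkpoint_path,
              output_node_names=output_node_names,
              restore_op_name='save/restore_all',
              filename_tensor_name='save/Const:0',
              output_graph='',
              clear_devices=True,
              initializer_nodes='')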

--
206080748  by Zhichao Lu:

    Merge external contributions.

--
206074460  by Zhichao Lu:

    Update the preprocessor to optionally apply temperature scaling and a softmax to the multiclass scores on read.
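
    Conceptually (a minimal sketch; the exact op layout in preprocessor.py may
    differ), the step computes a temperature-scaled softmax:

        import tensorflow as tf

        def convert_class_logits_to_softmax(multiclass_scores, temperature=1.0):
          # The usual temperature-softmax convention: divide the logits by the
          # temperature before normalizing. Temperatures > 1 flatten the
          # resulting distribution; temperatures < 1 sharpen it.
          return tf.nn.softmax(multiclass_scores / temperature)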

--
205960802  by Zhichao Lu:

    Fixing a bug in hierarchical label expansion script.

--
205944686  by Zhichao Lu:

    Update exporter to support exporting quantized model.

--
205912529  by Zhichao Lu:

    Add a two stage matcher to allow for thresholding by one criterion and then argmaxing on the other.
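
    Conceptually (a hypothetical numpy sketch, not the actual matcher API):

        import numpy as np

        def threshold_then_argmax_match(threshold_sim, argmax_sim,
                                        iou_threshold=0.5):
          # threshold_sim, argmax_sim: [num_groundtruth, num_anchors]
          # similarity matrices. Anchors whose best similarity under the
          # thresholding criterion falls below iou_threshold stay unmatched
          # (-1); the rest are matched by argmaxing the second criterion.
          passes = np.max(threshold_sim, axis=0) >= iou_threshold
          best_match = np.argmax(argmax_sim, axis=0)
          return np.where(passes, best_match, -1)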

--
205909017  by Zhichao Lu:

    Add test for grayscale image_resizer

--
205892801  by Zhichao Lu:

    Add flag to decide whether to apply batch norm to conv layers of weight shared box predictor.

--
205824449  by Zhichao Lu:

    Make sure that, by default, the Mask R-CNN box predictor predicts 2 stages.

--
205730139  by Zhichao Lu:

    Updating warning message to be more explicit about variable size mismatch.

--
205696992  by Zhichao Lu:

    Remove utils/ops.py's dependency on core/box_list_ops.py. This will allow re-using TPU compatible ops from utils/ops.py in core/box_list_ops.py.

--
205696867  by Zhichao Lu:

    Refactoring the Mask R-CNN predictor so each head is in a separate file.
    This CL lets us add new heads to Mask R-CNN more easily in the future.

--
205492073  by Zhichao Lu:

    Refactor R-FCN box predictor to be TPU compliant.

    - Change utils/ops.py:position_sensitive_crop_regions to operate on single image and set of boxes without `box_ind`
    - Add a batch version that operates on batches of images and batches of boxes.
    - Refactor R-FCN box predictor to use the batched version of position sensitive crop regions.

--
205453567  by Zhichao Lu:

    Fix a bug where the inference graph could not be exported when the write_inference_graph flag is True.

--
205316039  by Zhichao Lu:

    Changing input tensor name.

--
205256307  by Zhichao Lu:

    Fix model zoo links for quantized model.

--
205164432  by Zhichao Lu:

    Fixes eval error when label map contains non-ascii characters.

--
205129842  by Zhichao Lu:

    Adds an option to clip the anchors to the window size without filtering the overlapping boxes in Faster R-CNN.

--
205094863  by Zhichao Lu:

    Update label map util to allow the option of adding a background class and filling in gaps in the label map. Useful for using multiclass scores, which require a complete label map with an explicit background label.

--
204989032  by Zhichao Lu:

    Add tf.prof support to exporter.

--
204825267  by Zhichao Lu:

    Modify mask rcnn box predictor tests for TPU compatibility.

--
204778749  by Zhichao Lu:

    Remove score filtering from postprocessing.py and rely on filtering logic in tf.image.non_max_suppression
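
    This works because tf.image.non_max_suppression in recent TensorFlow
    versions already exposes score filtering directly (a minimal sketch with
    made-up thresholds):

        import tensorflow as tf

        boxes = tf.random_uniform((100, 4))
        scores = tf.random_uniform((100,))
        # score_threshold drops low-scoring boxes inside the NMS op itself,
        # making a separate pre-filtering pass in postprocessing redundant.
        selected_indices = tf.image.non_max_suppression(
            boxes, scores, max_output_size=20,
            iou_threshold=0.6, score_threshold=0.01)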

--
204775818  by Zhichao Lu:

    Python3 fixes for object_detection.

--
204745920  by Zhichao Lu:

    Object Detection Dataset visualization tool (documentation).

--
204686993  by Zhichao Lu:

    Internal changes.

--
204559667  by Zhichao Lu:

    Refactor box_predictor.py into multiple files.
    The abstract base class remains in object_detection/core; the other classes have each moved to a separate file in object_detection/predictors.

--
204552847  by Zhichao Lu:

    Update blog post link.

--
204508028  by Zhichao Lu:

    Bump down the batch size to 1024 to be a bit more tolerant to OOM and double the number of iterations. This job still converges to 20.5 mAP in 3 hours.

--

PiperOrigin-RevId: 206852642

* Add original post-processing back.
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""RFCN Box Predictor."""
import tensorflow as tf
from object_detection.core import box_predictor
from object_detection.utils import ops
slim = tf.contrib.slim
BOX_ENCODINGS = box_predictor.BOX_ENCODINGS
CLASS_PREDICTIONS_WITH_BACKGROUND = (
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND)
MASK_PREDICTIONS = box_predictor.MASK_PREDICTIONS
class RfcnBoxPredictor(box_predictor.BoxPredictor):
"""RFCN Box Predictor.
Applies a position sensitive ROI pooling on position sensitive feature maps to
predict classes and refined locations. See https://arxiv.org/abs/1605.06409
for details.
This is used for the second stage of the RFCN meta architecture. Notice that
locations are *not* shared across classes, thus for each anchor, a separate
prediction is made for each class.
"""
def __init__(self,
is_training,
num_classes,
conv_hyperparams_fn,
num_spatial_bins,
depth,
crop_size,
box_code_size):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
conv_hyperparams_fn: A function to construct tf-slim arg_scope with
hyperparameters for convolutional layers.
num_spatial_bins: A list of two integers `[spatial_bins_y,
spatial_bins_x]`.
depth: Target depth to reduce the input feature maps to.
crop_size: A list of two integers `[crop_height, crop_width]`.
box_code_size: Size of encoding for each box.
"""
super(RfcnBoxPredictor, self).__init__(is_training, num_classes)
self._conv_hyperparams_fn = conv_hyperparams_fn
self._num_spatial_bins = num_spatial_bins
self._depth = depth
self._crop_size = crop_size
self._box_code_size = box_code_size
@property
def num_classes(self):
return self._num_classes
def _predict(self, image_features, num_predictions_per_location,
proposal_boxes):
"""Computes encoded object locations and corresponding confidences.
Args:
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
num_predictions_per_location: A list of integers representing the number
of box predictions to be made per spatial location for each feature map.
Currently, this must be set to [1], or an error will be raised.
proposal_boxes: A float tensor of shape [batch_size, num_proposals,
box_code_size].
Returns:
box_encodings: A list of float tensors of shape
[batch_size, num_anchors_i, q, code_size] representing the location of
the objects, where q is 1 or the number of classes. Each entry in the
list corresponds to a feature map in the input `image_features` list.
class_predictions_with_background: A list of float tensors of shape
[batch_size, num_anchors_i, num_classes + 1] representing the class
predictions for the proposals. Each entry in the list corresponds to a
feature map in the input `image_features` list.
Raises:
ValueError: if num_predictions_per_location is not 1 or if
len(image_features) is not 1.
"""
if (len(num_predictions_per_location) != 1 or
num_predictions_per_location[0] != 1):
raise ValueError('Currently RfcnBoxPredictor only supports '
'predicting a single box per class per location.')
if len(image_features) != 1:
raise ValueError('length of `image_features` must be 1. Found {}'.
format(len(image_features)))
image_feature = image_features[0]
num_predictions_per_location = num_predictions_per_location[0]
batch_size = tf.shape(proposal_boxes)[0]
num_boxes = tf.shape(proposal_boxes)[1]
net = image_feature
with slim.arg_scope(self._conv_hyperparams_fn()):
net = slim.conv2d(net, self._depth, [1, 1], scope='reduce_depth')
# Location predictions.
location_feature_map_depth = (self._num_spatial_bins[0] *
self._num_spatial_bins[1] *
self.num_classes *
self._box_code_size)
location_feature_map = slim.conv2d(net, location_feature_map_depth,
[1, 1], activation_fn=None,
scope='refined_locations')
box_encodings = ops.batch_position_sensitive_crop_regions(
location_feature_map,
boxes=proposal_boxes,
crop_size=self._crop_size,
num_spatial_bins=self._num_spatial_bins,
global_pool=True)
box_encodings = tf.squeeze(box_encodings, squeeze_dims=[2, 3])
box_encodings = tf.reshape(box_encodings,
[batch_size * num_boxes, 1, self.num_classes,
self._box_code_size])
# Class predictions.
total_classes = self.num_classes + 1 # Account for background class.
class_feature_map_depth = (self._num_spatial_bins[0] *
self._num_spatial_bins[1] *
total_classes)
class_feature_map = slim.conv2d(net, class_feature_map_depth, [1, 1],
activation_fn=None,
scope='class_predictions')
class_predictions_with_background = (
ops.batch_position_sensitive_crop_regions(
class_feature_map,
boxes=proposal_boxes,
crop_size=self._crop_size,
num_spatial_bins=self._num_spatial_bins,
global_pool=True))
class_predictions_with_background = tf.squeeze(
class_predictions_with_background, squeeze_dims=[2, 3])
class_predictions_with_background = tf.reshape(
class_predictions_with_background,
[batch_size * num_boxes, 1, total_classes])
return {BOX_ENCODINGS: [box_encodings],
CLASS_PREDICTIONS_WITH_BACKGROUND:
[class_predictions_with_background]}
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.predictors.rfcn_box_predictor."""
import numpy as np
import tensorflow as tf
from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.predictors import rfcn_box_predictor as box_predictor
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
class RfcnBoxPredictorTest(test_case.TestCase):
def _build_arg_scope_with_conv_hyperparams(self):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
"""
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.build(conv_hyperparams, is_training=True)
def test_get_correct_box_encoding_and_class_prediction_shapes(self):
def graph_fn(image_features, proposal_boxes):
rfcn_box_predictor = box_predictor.RfcnBoxPredictor(
is_training=False,
num_classes=2,
conv_hyperparams_fn=self._build_arg_scope_with_conv_hyperparams(),
num_spatial_bins=[3, 3],
depth=4,
crop_size=[12, 12],
box_code_size=4
)
box_predictions = rfcn_box_predictor.predict(
[image_features], num_predictions_per_location=[1],
scope='BoxPredictor',
proposal_boxes=proposal_boxes)
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
proposal_boxes = np.random.rand(4, 2, 4).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features, proposal_boxes])
self.assertAllEqual(box_encodings.shape, [8, 1, 2, 4])
self.assertAllEqual(class_predictions_with_background.shape, [8, 1, 3])
if __name__ == '__main__':
tf.test.main()
@@ -93,6 +93,8 @@ message WeightSharedConvolutionalBoxPredictor {
optional bool share_prediction_tower = 13 [default = false];
}
// TODO(alirezafathi): Refactor the proto file to be able to configure mask rcnn
// head easily.
message MaskRCNNBoxPredictor {
// Hyperparameters for fully connected ops used in the box predictor.
optional Hyperparams fc_hyperparams = 1;
......
@@ -142,6 +142,21 @@ message FasterRcnn {
// standard tf.image.crop_and_resize while computing second stage input
// feature maps.
optional bool use_matmul_crop_and_resize = 31 [default = false];
// Normally, anchors generated for a given image size are pruned during
// training if they lie outside the image window. Setting this option to true
// clips the anchors to be within the image instead of pruning them.
optional bool clip_anchors_to_image = 32 [default = false];
// After performing matching between anchors and targets, we perform a gather
// operation to pull out the targets for training the Faster R-CNN meta
// architecture. This option specifies whether to use an alternate
// implementation of tf.gather that is faster on TPUs.
optional bool use_matmul_gather_in_matcher = 33 [default = false];
// Whether to use the balanced positive negative sampler implementation with
// static shape guarantees.
optional bool use_static_balanced_label_sampler = 34 [default = false];
}
......
@@ -33,6 +33,7 @@ message PreprocessingStep {
RandomVerticalFlip random_vertical_flip = 25;
RandomRotation90 random_rotation90 = 26;
RGBtoGray rgb_to_gray = 27;
ConvertClassLogitsToSoftmax convert_class_logits_to_softmax = 28;
}
}
@@ -409,3 +410,11 @@ message SSDRandomCropPadFixedAspectRatio {
// width. Two entries per operation.
repeated float max_padded_size_ratio = 4;
}
// Converts class logits to a softmax distribution, optionally scaling the
// values by a temperature first.
message ConvertClassLogitsToSoftmax {
// Scale to use on logits before applying softmax.
optional float temperature = 1 [default=1.0];
}
@@ -9,6 +9,7 @@ message RegionSimilarityCalculator {
NegSqDistSimilarity neg_sq_dist_similarity = 1;
IouSimilarity iou_similarity = 2;
IoaSimilarity ioa_similarity = 3;
ThresholdedIouSimilarity thresholded_iou_similarity = 4;
}
}
@@ -23,3 +24,10 @@ message IouSimilarity {
// Configuration for intersection-over-area (IOA) similarity calculator.
message IoaSimilarity {
}
// Configuration for thresholded-intersection-over-union similarity calculator.
message ThresholdedIouSimilarity {
// IOU threshold used for filtering scores.
optional float iou_threshold = 1 [default = 0.5];
}
@@ -120,4 +120,30 @@ message SsdFeatureExtractor {
// Whether to use depthwise separable convolutions to extract the additional
// feature maps added by SSD.
optional bool use_depthwise = 8 [default=false];
// Feature Pyramid Networks config.
optional FeaturePyramidNetworks fpn = 10;
}
// Configuration for Feature Pyramid Networks.
message FeaturePyramidNetworks {
// We recommend using multi_resolution_feature_map_generator with FPN, and
// the levels there must match the levels defined below for best
// performance.
// Correspondence from FPN levels to Resnet/Mobilenet V1 feature maps:
// FPN Level Resnet Feature Map Mobilenet-V1 Feature Map
// 2 Block 1 Conv2d_3_pointwise
// 3 Block 2 Conv2d_5_pointwise
// 4 Block 3 Conv2d_11_pointwise
// 5 Block 4 Conv2d_13_pointwise
// 6 Bottomup_5 bottom_up_Conv2d_14
// 7 Bottomup_6 bottom_up_Conv2d_15
// 8 Bottomup_7 bottom_up_Conv2d_16
// 9 Bottomup_8 bottom_up_Conv2d_17
// minimum level in feature pyramid
optional int32 min_level = 1 [default = 3];
// maximum level in feature pyramid
optional int32 max_level = 2 [default = 7];
}
# SSD with Mobilenet v1 feature extractor and focal loss.
# Trained on COCO14, initialized from Imagenet classification checkpoint
-# Achieves 19.3 mAP on COCO14 minival dataset. Doubling the number of training
-# steps gets to 20.6 mAP.
+# Achieves 20.5 mAP on COCO14 minival dataset.
# This config is TPU compatible
@@ -144,11 +143,11 @@ model {
train_config: {
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
-  batch_size: 2048
+  batch_size: 1024
sync_replicas: true
startup_delay_steps: 0
replicas_to_aggregate: 8
-  num_steps: 10000
+  num_steps: 20000
data_augmentation_options {
random_horizontal_flip {
}
@@ -162,9 +161,9 @@ train_config: {
learning_rate: {
cosine_decay_learning_rate {
learning_rate_base: 0.9
-        total_steps: 10000
+        total_steps: 20000
warmup_learning_rate: 0.3
-        warmup_steps: 300
+        warmup_steps: 1000
}
}
momentum_optimizer_value: 0.9
......
@@ -79,6 +79,10 @@ model {
}
feature_extractor {
type: 'ssd_mobilenet_v1_fpn'
fpn {
min_level: 3
max_level: 7
}
min_depth: 16
depth_multiplier: 1.0
conv_hyperparams {
......
@@ -80,6 +80,10 @@ model {
}
feature_extractor {
type: 'ssd_resnet50_v1_fpn'
fpn {
min_level: 3
max_level: 7
}
min_depth: 16
depth_multiplier: 1.0
conv_hyperparams {
......
@@ -139,15 +139,26 @@ def load_labelmap(path):
return label_map
-def get_label_map_dict(label_map_path, use_display_name=False):
+def get_label_map_dict(label_map_path,
+                       use_display_name=False,
+                       fill_in_gaps_and_background=False):
"""Reads a label map and returns a dictionary of label names to id.
Args:
-    label_map_path: path to label_map.
+    label_map_path: path to StringIntLabelMap proto text file.
use_display_name: whether to use the label map items' display names as keys.
fill_in_gaps_and_background: whether to fill in gaps and background with
respect to the id field in the proto. The id: 0 is reserved for the
'background' class and will be added if it is missing. All other missing
ids in range(1, max(id)) will be added with a dummy class name
("class_<id>") if they are missing.
Returns:
A dictionary mapping label names to id.
Raises:
ValueError: if fill_in_gaps_and_background and label_map has non-integer or
negative values.
"""
label_map = load_labelmap(label_map_path)
label_map_dict = {}
@@ -156,6 +167,24 @@ def get_label_map_dict(label_map_path, use_display_name=False):
label_map_dict[item.display_name] = item.id
else:
label_map_dict[item.name] = item.id
if fill_in_gaps_and_background:
values = set(label_map_dict.values())
if 0 not in values:
label_map_dict['background'] = 0
if not all(isinstance(value, int) for value in values):
raise ValueError('The values in label map must be integers in order to '
'fill_in_gaps_and_background.')
if not all(value >= 0 for value in values):
raise ValueError('The values in the label map must be non-negative.')
if len(values) != max(values) + 1:
# there are gaps in the labels, fill in gaps.
for value in range(1, max(values)):
if value not in values:
label_map_dict['class_' + str(value)] = value
return label_map_dict
......
@@ -119,6 +119,30 @@ class LabelMapUtilTest(tf.test.TestCase):
self.assertEqual(label_map_dict['dog'], 1)
self.assertEqual(label_map_dict['cat'], 2)
def test_get_label_map_dict_with_fill_in_gaps_and_background(self):
label_map_string = """
item {
id:3
name:'cat'
}
item {
id:1
name:'dog'
}
"""
label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt')
with tf.gfile.Open(label_map_path, 'wb') as f:
f.write(label_map_string)
label_map_dict = label_map_util.get_label_map_dict(
label_map_path, fill_in_gaps_and_background=True)
self.assertEqual(label_map_dict['background'], 0)
self.assertEqual(label_map_dict['dog'], 1)
self.assertEqual(label_map_dict['class_2'], 2)
self.assertEqual(label_map_dict['cat'], 3)
self.assertEqual(len(label_map_dict), max(label_map_dict.values()) + 1)
def test_keep_categories_with_unique_id(self):
label_map_proto = string_int_label_map_pb2.StringIntLabelMap()
label_map_string = """
......
@@ -31,6 +31,7 @@ from abc import ABCMeta
from abc import abstractmethod
import collections
import logging
import unicodedata
import numpy as np
from object_detection.core import standard_fields
@@ -284,18 +285,23 @@ class ObjectDetectionEvaluator(DetectionEvaluator):
category_index = label_map_util.create_category_index(self._categories)
for idx in range(per_class_ap.size):
if idx + self._label_id_offset in category_index:
category_name = category_index[idx + self._label_id_offset]['name']
try:
category_name = unicode(category_name, 'utf-8')
except TypeError:
pass
category_name = unicodedata.normalize(
'NFKD', category_name).encode('ascii', 'ignore')
display_name = (
self._metric_prefix + 'PerformanceByCategory/AP@{}IOU/{}'.format(
-            self._matching_iou_threshold,
-            category_index[idx + self._label_id_offset]['name']))
+            self._matching_iou_threshold, category_name))
pascal_metrics[display_name] = per_class_ap[idx]
# Optionally add CorLoc metrics.
if self._evaluate_corlocs:
display_name = (
self._metric_prefix + 'PerformanceByCategory/CorLoc@{}IOU/{}'
-          .format(self._matching_iou_threshold,
-                  category_index[idx + self._label_id_offset]['name']))
+          .format(self._matching_iou_threshold, category_name))
pascal_metrics[display_name] = per_class_corloc[idx]
return pascal_metrics
@@ -839,9 +845,9 @@ class ObjectDetectionEvaluation(object):
if self.use_weighted_mean_ap:
all_scores = np.append(all_scores, scores)
all_tp_fp_labels = np.append(all_tp_fp_labels, tp_fp_labels)
-      print 'Scores and tpfp per class label: {}'.format(class_index)
-      print tp_fp_labels
-      print scores
+      logging.info('Scores and tpfp per class label: %d', class_index)
+      logging.info(tp_fp_labels)
+      logging.info(scores)
precision, recall = metrics.compute_precision_recall(
scores, tp_fp_labels, self.num_gt_instances_per_class[class_index])
self.precisions_per_class.append(precision)
......
@@ -20,8 +20,6 @@ import six
import tensorflow as tf
from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import standard_fields as fields
from object_detection.utils import shape_utils
from object_detection.utils import static_shape
@@ -60,13 +58,20 @@ def normalized_to_image_coordinates(normalized_boxes, image_shape,
parallel_iterations: parallelism for the map_fn op.
Returns:
-    absolute_boxes: a float32 tensor of shape [None, num_boxes, 4] containg the
-      boxes in image coordinates.
+    absolute_boxes: a float32 tensor of shape [None, num_boxes, 4] containing
+      the boxes in image coordinates.
"""
+  x_scale = tf.cast(image_shape[2], tf.float32)
+  y_scale = tf.cast(image_shape[1], tf.float32)
   def _to_absolute_coordinates(normalized_boxes):
-    return box_list_ops.to_absolute_coordinates(
-        box_list.BoxList(normalized_boxes),
-        image_shape[1], image_shape[2], check_range=False).get()
+    y_min, x_min, y_max, x_max = tf.split(
+        value=normalized_boxes, num_or_size_splits=4, axis=1)
+    y_min = y_scale * y_min
+    y_max = y_scale * y_max
+    x_min = x_scale * x_min
+    x_max = x_scale * x_max
+    scaled_boxes = tf.concat([y_min, x_min, y_max, x_max], 1)
+    return scaled_boxes
absolute_boxes = shape_utils.static_or_dynamic_map_fn(
_to_absolute_coordinates,
@@ -538,13 +543,59 @@ def normalize_to_target(inputs,
return tf.reshape(target_norm, mult_shape) * tf.truediv(inputs, lengths)
def batch_position_sensitive_crop_regions(images,
boxes,
crop_size,
num_spatial_bins,
global_pool,
parallel_iterations=64):
"""Position sensitive crop with batches of images and boxes.
This op is exactly like `position_sensitive_crop_regions` below but operates
on batches of images and boxes. See `position_sensitive_crop_regions` function
below for the operation applied per batch element.
Args:
images: A `Tensor`. Must be one of the following types: `uint8`, `int8`,
`int16`, `int32`, `int64`, `half`, `float32`, `float64`.
A 4-D tensor of shape `[batch, image_height, image_width, depth]`.
Both `image_height` and `image_width` need to be positive.
boxes: A `Tensor` of type `float32`.
A 3-D tensor of shape `[batch, num_boxes, 4]`. Each box is specified in
normalized coordinates `[y1, x1, y2, x2]`. A normalized coordinate value
of `y` is mapped to the image coordinate at `y * (image_height - 1)`, so
as the `[0, 1]` interval of normalized image height is mapped to
`[0, image_height - 1]` in image height coordinates. We do allow y1 > y2,
in which case the sampled crop is an up-down flipped version of the
original image. The width dimension is treated similarly.
crop_size: See `position_sensitive_crop_regions` below.
num_spatial_bins: See `position_sensitive_crop_regions` below.
global_pool: See `position_sensitive_crop_regions` below.
parallel_iterations: Number of batch items to process in parallel.
Returns:
position_sensitive_features: A 5-D tensor of shape
`[batch, num_boxes, K, K, crop_channels]`, stacking the per-image
outputs of `position_sensitive_crop_regions` below.
"""
def _position_sensitive_crop_fn(inputs):
images, boxes = inputs
return position_sensitive_crop_regions(
images,
boxes,
crop_size=crop_size,
num_spatial_bins=num_spatial_bins,
global_pool=global_pool)
return shape_utils.static_or_dynamic_map_fn(
_position_sensitive_crop_fn,
elems=[images, boxes],
dtype=tf.float32,
parallel_iterations=parallel_iterations)
def position_sensitive_crop_regions(image,
                                    boxes,
-                                    box_ind,
                                    crop_size,
                                    num_spatial_bins,
-                                    global_pool,
-                                    extrapolation_value=None):
+                                    global_pool):
"""Position-sensitive crop and pool rectangular regions from a feature grid.
The output crops are split into `spatial_bins_y` vertical bins
@@ -565,23 +616,16 @@ def position_sensitive_crop_regions(image,
Args:
image: A `Tensor`. Must be one of the following types: `uint8`, `int8`,
`int16`, `int32`, `int64`, `half`, `float32`, `float64`.
-      A 4-D tensor of shape `[batch, image_height, image_width, depth]`.
+      A 3-D tensor of shape `[image_height, image_width, depth]`.
Both `image_height` and `image_width` need to be positive.
    boxes: A `Tensor` of type `float32`.
-      A 2-D tensor of shape `[num_boxes, 4]`. The `i`-th row of the tensor
-      specifies the coordinates of a box in the `box_ind[i]` image and is
-      specified in normalized coordinates `[y1, x1, y2, x2]`. A normalized
-      coordinate value of `y` is mapped to the image coordinate at
-      `y * (image_height - 1)`, so as the `[0, 1]` interval of normalized image
-      height is mapped to `[0, image_height - 1] in image height coordinates.
-      We do allow y1 > y2, in which case the sampled crop is an up-down flipped
-      version of the original image. The width dimension is treated similarly.
-      Normalized coordinates outside the `[0, 1]` range are allowed, in which
-      case we use `extrapolation_value` to extrapolate the input image values.
-    box_ind: A `Tensor` of type `int32`.
-      A 1-D tensor of shape `[num_boxes]` with int32 values in `[0, batch)`.
-      The value of `box_ind[i]` specifies the image that the `i`-th box refers
-      to.
+      A 2-D tensor of shape `[num_boxes, 4]`. Each box is specified in
+      normalized coordinates `[y1, x1, y2, x2]`. A normalized coordinate value
+      of `y` is mapped to the image coordinate at `y * (image_height - 1)`, so
+      as the `[0, 1]` interval of normalized image height is mapped to
+      `[0, image_height - 1]` in image height coordinates. We do allow y1 > y2,
+      in which case the sampled crop is an up-down flipped version of the
+      original image. The width dimension is treated similarly.
crop_size: A list of two integers `[crop_height, crop_width]`. All
cropped image patches are resized to this size. The aspect ratio of the
image content is not preserved. Both `crop_height` and `crop_width` need
@@ -601,8 +645,7 @@ def position_sensitive_crop_regions(image,
Note that using global_pool=True is equivalent to but more efficient than
running the function with global_pool=False and then performing global
average pooling.
extrapolation_value: An optional `float`. Defaults to `0`.
Value used for extrapolation, when applicable.
Returns:
position_sensitive_features: A 4-D tensor of shape
`[num_boxes, K, K, crop_channels]`,
@@ -649,12 +692,17 @@ def position_sensitive_crop_regions(image,
]
position_sensitive_boxes.append(tf.stack(box_coordinates, axis=1))
-  image_splits = tf.split(value=image, num_or_size_splits=total_bins, axis=3)
+  image_splits = tf.split(value=image, num_or_size_splits=total_bins, axis=2)
image_crops = []
for (split, box) in zip(image_splits, position_sensitive_boxes):
-    crop = tf.image.crop_and_resize(split, box, box_ind, bin_crop_size,
-                                    extrapolation_value=extrapolation_value)
+    if split.shape.is_fully_defined() and box.shape.is_fully_defined():
+      crop = matmul_crop_and_resize(
+          tf.expand_dims(split, 0), box, bin_crop_size)
+    else:
+      crop = tf.image.crop_and_resize(
+          tf.expand_dims(split, 0), box,
+          tf.zeros(tf.shape(boxes)[0], dtype=tf.int32), bin_crop_size)
image_crops.append(crop)
if global_pool:
@@ -957,3 +1005,68 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None):
tf.matmul(kernel_h, tf.tile(channel, [num_crops, 1, 1])),
kernel_w, transpose_b=True))
return tf.stack(result_channels, axis=3)
def expected_classification_loss_under_sampling(batch_cls_targets, cls_losses,
negative_to_positive_ratio,
minimum_negative_sampling):
"""Computes classification loss by background/foreground weighting.
The weighting is such that the effective background/foreground weight ratio
is the negative_to_positive_ratio. If p_i is the foreground probability of
anchor a_i, L(a_i) is the anchor's loss, N is the number of anchors, K is
the negative_to_positive_ratio, and M is the sum of foreground probabilities
across anchors, then the total loss L is calculated as:
beta = K*M/(N-M)
L = sum_{i=1}^N [p_i + beta * (1 - p_i)] * L(a_i)
Args:
batch_cls_targets: A tensor with shape [batch_size, num_anchors,
num_classes + 1], where 0'th index is the background class, containing
the class distribution for the target assigned to a given anchor.
cls_losses: Float tensor of shape [batch_size, num_anchors]
representing anchorwise classification losses.
negative_to_positive_ratio: The desired background/foreground weight ratio.
minimum_negative_sampling: Minimum number of effective negative samples.
Used only when there are no positive examples.
Returns:
The classification loss.
"""
num_anchors = tf.cast(tf.shape(batch_cls_targets)[1], tf.float32)
# find the p_i
foreground_probabilities = (
foreground_probabilities_from_targets(batch_cls_targets))
foreground_sum = tf.reduce_sum(foreground_probabilities, axis=-1)
k = negative_to_positive_ratio
# compute beta
denominators = (num_anchors - foreground_sum)
beta = tf.where(
tf.equal(denominators, 0), tf.zeros_like(foreground_sum),
k * foreground_sum / denominators)
# where the foreground sum is zero, use a minimum negative weight.
min_negative_weight = 1.0 * minimum_negative_sampling / num_anchors
beta = tf.where(
tf.equal(beta, 0), min_negative_weight * tf.ones_like(beta), beta)
beta = tf.reshape(beta, [-1, 1])
cls_loss_weights = foreground_probabilities + (
1 - foreground_probabilities) * beta
weighted_losses = cls_loss_weights * cls_losses
cls_losses = tf.reduce_sum(weighted_losses, axis=-1)
return cls_losses
def foreground_probabilities_from_targets(batch_cls_targets):
foreground_probabilities = 1 - batch_cls_targets[:, :, 0]
return foreground_probabilities
@@ -812,13 +812,12 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
def test_position_sensitive(self):
num_spatial_bins = [3, 2]
-    image_shape = [1, 3, 2, 6]
+    image_shape = [3, 2, 6]
# First channel is 1's, second channel is 2's, etc.
image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32,
shape=image_shape)
boxes = tf.random_uniform((2, 4))
box_ind = tf.constant([0, 0], dtype=tf.int32)
# The result for both boxes should be [[1, 2], [3, 4], [5, 6]]
# before averaging.
@@ -827,7 +826,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
for crop_size_mult in range(1, 3):
crop_size = [3 * crop_size_mult, 2 * crop_size_mult]
ps_crop_and_pool = ops.position_sensitive_crop_regions(
-          image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
+          image, boxes, crop_size, num_spatial_bins, global_pool=True)
with self.test_session() as sess:
output = sess.run(ps_crop_and_pool)
@@ -835,24 +834,24 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
def test_position_sensitive_with_equal_channels(self):
num_spatial_bins = [2, 2]
-    image_shape = [1, 3, 3, 4]
+    image_shape = [3, 3, 4]
crop_size = [2, 2]
    image = tf.constant(range(1, 3 * 3 + 1), dtype=tf.float32,
-                        shape=[1, 3, 3, 1])
-    tiled_image = tf.tile(image, [1, 1, 1, image_shape[3]])
+                        shape=[3, 3, 1])
+    tiled_image = tf.tile(image, [1, 1, image_shape[2]])
boxes = tf.random_uniform((3, 4))
box_ind = tf.constant([0, 0, 0], dtype=tf.int32)
# All channels are equal so position-sensitive crop and resize should
# work as the usual crop and resize for just one channel.
-    crop = tf.image.crop_and_resize(image, boxes, box_ind, crop_size)
+    crop = tf.image.crop_and_resize(tf.expand_dims(image, axis=0), boxes,
+                                    box_ind, crop_size)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keep_dims=True)
ps_crop_and_pool = ops.position_sensitive_crop_regions(
tiled_image,
boxes,
box_ind,
crop_size,
num_spatial_bins,
global_pool=True)
@@ -861,78 +860,53 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
expected_output, output = sess.run((crop_and_pool, ps_crop_and_pool))
self.assertAllClose(output, expected_output)
def test_position_sensitive_with_single_bin(self):
num_spatial_bins = [1, 1]
image_shape = [2, 3, 3, 4]
crop_size = [2, 2]
image = tf.random_uniform(image_shape)
boxes = tf.random_uniform((6, 4))
box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32)
# When a single bin is used, position-sensitive crop and pool should be
# the same as non-position sensitive crop and pool.
crop = tf.image.crop_and_resize(image, boxes, box_ind, crop_size)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keep_dims=True)
ps_crop_and_pool = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
with self.test_session() as sess:
expected_output, output = sess.run((crop_and_pool, ps_crop_and_pool))
self.assertAllClose(output, expected_output)
def test_raise_value_error_on_num_bins_less_than_one(self):
num_spatial_bins = [1, -1]
-    image_shape = [1, 1, 1, 2]
+    image_shape = [1, 1, 2]
crop_size = [2, 2]
image = tf.constant(1, dtype=tf.float32, shape=image_shape)
boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32)
box_ind = tf.constant([0], dtype=tf.int32)
with self.assertRaisesRegexp(ValueError, 'num_spatial_bins should be >= 1'):
ops.position_sensitive_crop_regions(
-          image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
+          image, boxes, crop_size, num_spatial_bins, global_pool=True)
def test_raise_value_error_on_non_divisible_crop_size(self):
num_spatial_bins = [2, 3]
-    image_shape = [1, 1, 1, 6]
+    image_shape = [1, 1, 6]
crop_size = [3, 2]
image = tf.constant(1, dtype=tf.float32, shape=image_shape)
boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32)
box_ind = tf.constant([0], dtype=tf.int32)
with self.assertRaisesRegexp(
ValueError, 'crop_size should be divisible by num_spatial_bins'):
ops.position_sensitive_crop_regions(
-          image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
+          image, boxes, crop_size, num_spatial_bins, global_pool=True)
def test_raise_value_error_on_non_divisible_num_channels(self):
num_spatial_bins = [2, 2]
-    image_shape = [1, 1, 1, 5]
+    image_shape = [1, 1, 5]
crop_size = [2, 2]
image = tf.constant(1, dtype=tf.float32, shape=image_shape)
boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32)
box_ind = tf.constant([0], dtype=tf.int32)
with self.assertRaisesRegexp(
ValueError, 'Dimension size must be evenly divisible by 4 but is 5'):
ops.position_sensitive_crop_regions(
-          image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
+          image, boxes, crop_size, num_spatial_bins, global_pool=True)
def test_position_sensitive_with_global_pool_false(self):
num_spatial_bins = [3, 2]
-    image_shape = [1, 3, 2, 6]
+    image_shape = [3, 2, 6]
num_boxes = 2
# First channel is 1's, second channel is 2's, etc.
image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32,
shape=image_shape)
boxes = tf.random_uniform((num_boxes, 4))
box_ind = tf.constant([0, 0], dtype=tf.int32)
expected_output = []
@@ -956,79 +930,21 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
for crop_size_mult in range(1, 3):
crop_size = [3 * crop_size_mult, 2 * crop_size_mult]
ps_crop = ops.position_sensitive_crop_regions(
-          image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
+          image, boxes, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
output = sess.run(ps_crop)
self.assertAllEqual(output, expected_output[crop_size_mult - 1])
def test_position_sensitive_with_global_pool_false_and_known_boxes(self):
num_spatial_bins = [2, 2]
image_shape = [2, 2, 2, 4]
crop_size = [2, 2]
image = tf.constant(range(1, 2 * 2 * 4 + 1) * 2, dtype=tf.float32,
shape=image_shape)
# First box contains whole image, and second box contains only first row.
boxes = tf.constant(np.array([[0., 0., 1., 1.],
[0., 0., 0.5, 1.]]), dtype=tf.float32)
box_ind = tf.constant([0, 1], dtype=tf.int32)
expected_output = []
# Expected output, when the box containing whole image.
expected_output.append(
np.reshape(np.array([[4, 7],
[10, 13]]),
(1, 2, 2, 1))
)
# Expected output, when the box containing only first row.
expected_output.append(
np.reshape(np.array([[3, 6],
[7, 10]]),
(1, 2, 2, 1))
)
expected_output = np.concatenate(expected_output, axis=0)
ps_crop = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
output = sess.run(ps_crop)
self.assertAllEqual(output, expected_output)
def test_position_sensitive_with_global_pool_false_and_single_bin(self):
num_spatial_bins = [1, 1]
image_shape = [2, 3, 3, 4]
crop_size = [1, 1]
image = tf.random_uniform(image_shape)
boxes = tf.random_uniform((6, 4))
box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32)
# Since single_bin is used and crop_size = [1, 1] (i.e., no crop resize),
# the outputs are the same whatever the global_pool value is.
ps_crop_and_pool = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True)
ps_crop = ops.position_sensitive_crop_regions(
image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
pooled_output, unpooled_output = sess.run((ps_crop_and_pool, ps_crop))
self.assertAllClose(pooled_output, unpooled_output)
def test_position_sensitive_with_global_pool_false_and_do_global_pool(self):
num_spatial_bins = [3, 2]
-    image_shape = [1, 3, 2, 6]
+    image_shape = [3, 2, 6]
num_boxes = 2
# First channel is 1's, second channel is 2's, etc.
image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32,
shape=image_shape)
boxes = tf.random_uniform((num_boxes, 4))
box_ind = tf.constant([0, 0], dtype=tf.int32)
expected_output = []
@@ -1059,7 +975,7 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
# Perform global_pooling after running the function with
# global_pool=False.
ps_crop = ops.position_sensitive_crop_regions(
-        image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
+        image, boxes, crop_size, num_spatial_bins, global_pool=False)
ps_crop_and_pool = tf.reduce_mean(
ps_crop, reduction_indices=(1, 2), keep_dims=True)
@@ -1070,17 +986,99 @@ class OpsTestPositionSensitiveCropRegions(tf.test.TestCase):
def test_raise_value_error_on_non_square_block_size(self):
num_spatial_bins = [3, 2]
-    image_shape = [1, 3, 2, 6]
+    image_shape = [3, 2, 6]
crop_size = [6, 2]
image = tf.constant(1, dtype=tf.float32, shape=image_shape)
boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32)
box_ind = tf.constant([0], dtype=tf.int32)
with self.assertRaisesRegexp(
ValueError, 'Only support square bin crop size for now.'):
ops.position_sensitive_crop_regions(
-          image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False)
+          image, boxes, crop_size, num_spatial_bins, global_pool=False)
class OpsTestBatchPositionSensitiveCropRegions(tf.test.TestCase):
def test_position_sensitive_with_single_bin(self):
num_spatial_bins = [1, 1]
image_shape = [2, 3, 3, 4]
crop_size = [2, 2]
image = tf.random_uniform(image_shape)
boxes = tf.random_uniform((2, 3, 4))
box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32)
# When a single bin is used, position-sensitive crop and pool should be
# the same as non-position sensitive crop and pool.
crop = tf.image.crop_and_resize(image, tf.reshape(boxes, [-1, 4]), box_ind,
crop_size)
crop_and_pool = tf.reduce_mean(crop, [1, 2], keepdims=True)
crop_and_pool = tf.reshape(crop_and_pool, [2, 3, 1, 1, 4])
ps_crop_and_pool = ops.batch_position_sensitive_crop_regions(
image, boxes, crop_size, num_spatial_bins, global_pool=True)
with self.test_session() as sess:
expected_output, output = sess.run((crop_and_pool, ps_crop_and_pool))
self.assertAllClose(output, expected_output)
def test_position_sensitive_with_global_pool_false_and_known_boxes(self):
num_spatial_bins = [2, 2]
image_shape = [2, 2, 2, 4]
crop_size = [2, 2]
images = tf.constant(range(1, 2 * 2 * 4 + 1) * 2, dtype=tf.float32,
shape=image_shape)
# First box contains whole image, and second box contains only first row.
boxes = tf.constant(np.array([[[0., 0., 1., 1.]],
[[0., 0., 0.5, 1.]]]), dtype=tf.float32)
# box_ind = tf.constant([0, 1], dtype=tf.int32)
expected_output = []
# Expected output, when the box containing whole image.
expected_output.append(
np.reshape(np.array([[4, 7],
[10, 13]]),
(1, 2, 2, 1))
)
# Expected output, when the box containing only first row.
expected_output.append(
np.reshape(np.array([[3, 6],
[7, 10]]),
(1, 2, 2, 1))
)
expected_output = np.stack(expected_output, axis=0)
ps_crop = ops.batch_position_sensitive_crop_regions(
images, boxes, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
output = sess.run(ps_crop)
self.assertAllEqual(output, expected_output)
def test_position_sensitive_with_global_pool_false_and_single_bin(self):
num_spatial_bins = [1, 1]
image_shape = [2, 3, 3, 4]
crop_size = [1, 1]
images = tf.random_uniform(image_shape)
boxes = tf.random_uniform((2, 3, 4))
# box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32)
# Since single_bin is used and crop_size = [1, 1] (i.e., no crop resize),
# the outputs are the same whatever the global_pool value is.
ps_crop_and_pool = ops.batch_position_sensitive_crop_regions(
images, boxes, crop_size, num_spatial_bins, global_pool=True)
ps_crop = ops.batch_position_sensitive_crop_regions(
images, boxes, crop_size, num_spatial_bins, global_pool=False)
with self.test_session() as sess:
pooled_output, unpooled_output = sess.run((ps_crop_and_pool, ps_crop))
self.assertAllClose(pooled_output, unpooled_output)
class ReframeBoxMasksToImageMasksTest(tf.test.TestCase):
@@ -1365,5 +1363,86 @@ class OpsTestMatMulCropAndResize(test_case.TestCase):
_ = ops.matmul_crop_and_resize(image, boxes, crop_size)
class OpsTestExpectedClassificationLoss(test_case.TestCase):
def testExpectedClassificationLossUnderSamplingWithHardLabels(self):
def graph_fn(batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling)
batch_cls_targets = np.array(
[[[1., 0, 0], [0, 1., 0]], [[1., 0, 0], [0, 1., 0]]], dtype=np.float32)
cls_losses = np.array([[1, 2], [3, 4]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
minimum_negative_sampling = np.array([1], dtype=np.float32)
classification_loss = self.execute(graph_fn, [
batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling
])
# expected_foreground_sum = [1,1]
# expected_beta = [2,2]
# expected_cls_loss_weights = [2,1],[2,1]
# expected_classification_loss_under_sampling = [2*1+1*2, 2*3+1*4]
expected_classification_loss_under_sampling = [2 + 2, 6 + 4]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
def testExpectedClassificationLossUnderSamplingWithAllNegative(self):
def graph_fn(batch_cls_targets, cls_losses):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling)
batch_cls_targets = np.array(
[[[1, 0, 0], [1, 0, 0]], [[1, 0, 0], [1, 0, 0]]], dtype=np.float32)
cls_losses = np.array([[1, 2], [3, 4]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
minimum_negative_sampling = np.array([1], dtype=np.float32)
classification_loss = self.execute(graph_fn,
[batch_cls_targets, cls_losses])
# expected_foreground_sum = [0,0]
# expected_beta = [0.5,0.5]
# expected_cls_loss_weights = [0.5,0.5],[0.5,0.5]
# expected_classification_loss_under_sampling = [.5*1+.5*2, .5*3+.5*4]
expected_classification_loss_under_sampling = [1.5, 3.5]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
def testExpectedClassificationLossUnderSamplingWithAllPositive(self):
def graph_fn(batch_cls_targets, cls_losses):
return ops.expected_classification_loss_under_sampling(
batch_cls_targets, cls_losses, negative_to_positive_ratio,
minimum_negative_sampling)
batch_cls_targets = np.array(
[[[0, 1., 0], [0, 1., 0]], [[0, 1, 0], [0, 0, 1]]], dtype=np.float32)
cls_losses = np.array([[1, 2], [3, 4]], dtype=np.float32)
negative_to_positive_ratio = np.array([2], dtype=np.float32)
minimum_negative_sampling = np.array([1], dtype=np.float32)
classification_loss = self.execute(graph_fn,
[batch_cls_targets, cls_losses])
# expected_foreground_sum = [2,2]
# expected_beta = [0,0]
# expected_cls_loss_weights = [1,1],[1,1]
# expected_classification_loss_under_sampling = [1*1+1*2, 1*3+1*4]
expected_classification_loss_under_sampling = [1 + 2, 3 + 4]
self.assertAllClose(expected_classification_loss_under_sampling,
classification_loss)
if __name__ == '__main__':
tf.test.main()
@@ -106,13 +106,49 @@ def pad_or_clip_tensor(t, length):
length is an integer, the first dimension of the processed tensor is set
to length statically.
"""
-  processed_t = tf.cond(
-      tf.greater(tf.shape(t)[0], length),
-      lambda: clip_tensor(t, length),
-      lambda: pad_tensor(t, length))
-  if not _is_tensor(length):
-    processed_t = _set_dim_0(processed_t, length)
-  return processed_t
+  return pad_or_clip_nd(t, [length] + t.shape.as_list()[1:])
def pad_or_clip_nd(tensor, output_shape):
"""Pad or Clip given tensor to the output shape.
Args:
tensor: Input tensor to pad or clip.
output_shape: A list of integers / scalar tensors (or None for dynamic dim)
representing the size to pad or clip each dimension of the input tensor.
Returns:
Input tensor padded and clipped to the output shape.
"""
tensor_shape = tf.shape(tensor)
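# A size of -1 in tf.slice means "take everything remaining along that
# dimension", so dimensions that need no clipping are passed through intact.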
clip_size = [
tf.where(tensor_shape[i] - shape > 0, shape, -1)
if shape is not None else -1 for i, shape in enumerate(output_shape)
]
clipped_tensor = tf.slice(
tensor,
begin=tf.zeros(len(clip_size), dtype=tf.int32),
size=clip_size)
# Pad tensor if the shape of clipped tensor is smaller than the expected
# shape.
clipped_tensor_shape = tf.shape(clipped_tensor)
trailing_paddings = [
shape - clipped_tensor_shape[i] if shape is not None else 0
for i, shape in enumerate(output_shape)
]
paddings = tf.stack(
[
tf.zeros(len(trailing_paddings), dtype=tf.int32),
trailing_paddings
],
axis=1)
padded_tensor = tf.pad(clipped_tensor, paddings=paddings)
output_static_shape = [
dim if not isinstance(dim, tf.Tensor) else None for dim in output_shape
]
padded_tensor.set_shape(output_static_shape)
return padded_tensor
def combined_static_and_dynamic_shape(tensor):
@@ -306,4 +342,3 @@ def assert_shape_equal_along_first_dimension(shape_a, shape_b):
else: return tf.no_op()
else:
return tf.assert_equal(shape_a[0], shape_b[0])
@@ -123,6 +123,22 @@ class UtilTest(tf.test.TestCase):
self.assertTrue(tf.contrib.framework.is_tensor(combined_shape[0]))
self.assertListEqual(combined_shape[1:], [2, 3])
def test_pad_or_clip_nd_tensor(self):
tensor_placeholder = tf.placeholder(tf.float32, [None, 5, 4, 7])
output_tensor = shape_utils.pad_or_clip_nd(
tensor_placeholder, [None, 3, 5, tf.constant(6)])
self.assertAllEqual(output_tensor.shape.as_list(), [None, 3, 5, None])
with self.test_session() as sess:
output_tensor_np = sess.run(
output_tensor,
feed_dict={
tensor_placeholder: np.random.rand(2, 5, 4, 7),
})
self.assertAllEqual(output_tensor_np.shape, [2, 3, 5, 6])
class StaticOrDynamicMapFnTest(tf.test.TestCase):
......
@@ -47,9 +47,10 @@ class TestCase(tf.test.TestCase):
materialized_results = sess.run(tpu_computation,
feed_dict=dict(zip(placeholders, inputs)))
sess.run(tpu.shutdown_system())
-      if (len(materialized_results) == 1
-          and (isinstance(materialized_results, list)
-               or isinstance(materialized_results, tuple))):
+      if (hasattr(materialized_results, '__len__') and
+          len(materialized_results) == 1 and
+          (isinstance(materialized_results, list) or
+           isinstance(materialized_results, tuple))):
materialized_results = materialized_results[0]
return materialized_results
@@ -72,9 +73,11 @@ class TestCase(tf.test.TestCase):
tf.local_variables_initializer()])
materialized_results = sess.run(results, feed_dict=dict(zip(placeholders,
inputs)))
-      if (len(materialized_results) == 1
-          and (isinstance(materialized_results, list)
-               or isinstance(materialized_results, tuple))):
+      if (hasattr(materialized_results, '__len__') and
+          len(materialized_results) == 1 and
+          (isinstance(materialized_results, list) or
+           isinstance(materialized_results, tuple))):
materialized_results = materialized_results[0]
return materialized_results
......
@@ -62,6 +62,30 @@ class MockBoxPredictor(box_predictor.BoxPredictor):
class_predictions_with_background}
class MockKerasBoxPredictor(box_predictor.KerasBoxPredictor):
"""Simple box predictor that ignores inputs and outputs all zeros."""
def __init__(self, is_training, num_classes):
super(MockKerasBoxPredictor, self).__init__(
is_training, num_classes, False, False)
def _predict(self, image_features, **kwargs):
image_feature = image_features[0]
combined_feature_shape = shape_utils.combined_static_and_dynamic_shape(
image_feature)
batch_size = combined_feature_shape[0]
num_anchors = (combined_feature_shape[1] * combined_feature_shape[2])
code_size = 4
zero = tf.reduce_sum(0 * image_feature)
box_encodings = zero + tf.zeros(
(batch_size, num_anchors, 1, code_size), dtype=tf.float32)
class_predictions_with_background = zero + tf.zeros(
(batch_size, num_anchors, self.num_classes + 1), dtype=tf.float32)
return {box_predictor.BOX_ENCODINGS: box_encodings,
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND:
class_predictions_with_background}
class MockAnchorGenerator(anchor_generator.AnchorGenerator):
"""Mock anchor generator."""
......
@@ -134,8 +134,11 @@ def get_variables_available_in_checkpoint(variables,
vars_in_ckpt[variable_name] = variable
else:
logging.warning('Variable [%s] is available in checkpoint, but has an '
-                        'incompatible shape with model variable.',
-                        variable_name)
+                        'incompatible shape with model variable. Checkpoint '
+                        'shape: [%s], model variable shape: [%s]. This '
+                        'variable will not be initialized from the checkpoint.',
+                        variable_name, ckpt_vars_to_shape_map[variable_name],
+                        variable.shape.as_list())
else:
logging.warning('Variable [%s] is not available in checkpoint',
variable_name)
......