Commit 9fce9c64 authored by Zhichao Lu's avatar Zhichao Lu Committed by pkulzc

Merged commit includes the following changes:

199348852  by Zhichao Lu:

    Small typo fixes in VRD evaluation.

--
199315191  by Zhichao Lu:

    Change padding shapes when additional channels are available.

--
199309180  by Zhichao Lu:

    Adds minor fixes to the Object Detection API implementation.

--
199298605  by Zhichao Lu:

    Force num_readers to be 1 when the only input file is not sharded.

--
199292952  by Zhichao Lu:

    Adds image-level labels parsing into TfExampleDetectionAndGTParser.

--
199259866  by Zhichao Lu:

    Visual Relationships Evaluation executable.

--
199208330  by Zhichao Lu:

    Infer train_config.batch_size as the effective batch size. Therefore we need to divide the effective batch size in the trainer by train_config.replicas_to_aggregate to get the per-worker batch size.

--
199207842  by Zhichao Lu:

    Internal change.

--
199204222  by Zhichao Lu:

    In case the image has more than three channels, we only take the first three channels for visualization.

--
199194388  by Zhichao Lu:

    Correcting protocols description: VOC 2007 -> VOC 2012.

--
199188290  by Zhichao Lu:

    Adds per-relationship APs and mAP computation to VRD evaluation.

--
199158801  by Zhichao Lu:

    If available, additional channels are merged with the input image.

--
199099637  by Zhichao Lu:

    OpenImages Challenge metric support:
    - adding verified labels standard field for TFExample;
    - adding tfrecord creation functionality.

--
198957391  by Zhichao Lu:

    Allow TFRecord sharding when creating the pets dataset.

--
198925184  by Zhichao Lu:

    Introduce moving average support for evaluation. Also adding the ability to override this configuration via config_util.

--
198918186  by Zhichao Lu:

    Handles the case where there are 0 box masks.

--
198809009  by Zhichao Lu:

    Plumb groundtruth weights into target assigner for Faster RCNN.

--
198759987  by Zhichao Lu:

    Fix object detection test broken by shape inference.

--
198668602  by Zhichao Lu:

    Adding a new input field in data_decoders/tf_example_decoder.py for storing additional channels.

--
198530013  by Zhichao Lu:

    A util for hierarchical expansion of boxes and labels of the OID dataset.

--
198503124  by Zhichao Lu:

    Fix dimension mismatch error introduced by
    https://github.com/tensorflow/tensorflow/pull/18251, or cl/194031845.
    After the above change, conv2d strictly checks for conv_dims + 2 == input_rank.

--
198445807  by Zhichao Lu:

    Enabling the Object Detection Challenge 2018 metric in the evaluator.py framework for
    running eval jobs.
    Renaming the old OpenImages V2 metric.

--
198413950  by Zhichao Lu:

    Support generic configuration overrides using namespaced keys.

    Useful for adding custom hyper-parameter tuning fields without having to add custom override methods to config_utils.py.

--
198106437  by Zhichao Lu:

    Enable fused batchnorm now that quantization is supported.

--
198048364  by Zhichao Lu:

    Add support for keypoints in tf sequence examples and some util ops.

--
198004736  by Zhichao Lu:

    Relax postprocessing unit tests that are based on the assumption that tf.image.non_max_suppression is stable with respect to its input.

--
197997513  by Zhichao Lu:

    More lenient validation for normalized box boundaries.

--
197940068  by Zhichao Lu:

    A couple of minor updates/fixes:
    - Updating input reader proto with an option to use display_name when decoding data.
    - Updating the visualization tool to specify whether absolute or normalized box coordinates are used. Appropriate boxes will now appear in TensorBoard when using model_main.py.

--
197920152  by Zhichao Lu:

    Add quantized training support in the new OD binaries and a config for SSD Mobilenet v1 quantized training that is TPU compatible.

--
197213563  by Zhichao Lu:

    Do not share batch_norm for classification and regression tower in weight shared box predictor.

--
197196757  by Zhichao Lu:

    Relax the box_predictor API to return box_prediction of shape [batch_size, num_anchors, code_size] in addition to [batch_size, num_anchors, (1|q), code_size].

--
196898361  by Zhichao Lu:

    Allow a per-channel scalar value to pad the input image with when using the keep-aspect-ratio resizer (when pad_to_max_dimension=True).

    In the Object Detection pipeline, we pad the image before normalization, which skews batch_norm statistics during training. The option to set a per-channel pad value lets us truly pad with zeros.

--
196592101  by Zhichao Lu:

    Fix a bug regarding TFRecord shuffling in object_detection.

--
196320138  by Zhichao Lu:

    Fix typo in exporting_models.md

--

PiperOrigin-RevId: 199348852
parent ed901b73
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Functions to generate a list of feature maps based on image features.
Provides several feature map generators that can be used to build object
detection feature extractors.
Object detection feature extractors usually are built by stacking two components
- A base feature extractor such as Inception V3 and a feature map generator.
Feature map generators build on the base feature extractors and produce a list
of final feature maps.
"""
import collections
import tensorflow as tf
from object_detection.utils import ops
slim = tf.contrib.slim


def get_depth_fn(depth_multiplier, min_depth):
  """Builds a callable to compute depth (output channels) of conv filters.

  Args:
    depth_multiplier: a multiplier for the nominal depth.
    min_depth: a lower bound on the depth of filters.

  Returns:
    A callable that takes in a nominal depth and returns the depth to use.
  """
  def multiply_depth(depth):
    new_depth = int(depth * depth_multiplier)
    return max(new_depth, min_depth)
  return multiply_depth


def multi_resolution_feature_maps(feature_map_layout, depth_multiplier,
                                  min_depth, insert_1x1_conv, image_features):
  """Generates multi resolution feature maps from input image features.

  Generates multi-scale feature maps for detection as in the SSD papers by
  Liu et al: https://arxiv.org/pdf/1512.02325v2.pdf (see Sec 2.1).

  More specifically, it performs the following two tasks:
  1) If a layer name is provided in the configuration, returns that layer as a
     feature map.
  2) If a layer name is left as an empty string, constructs a new feature map
     based on the spatial shape and depth configuration. Note that the current
     implementation only supports generating new layers using convolution of
     stride 2 resulting in a spatial resolution reduction by a factor of 2.
     By default convolution kernel size is set to 3, and it can be customized
     by the caller.

  An example of the configuration for Inception V3:
  {
    'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
    'layer_depth': [-1, -1, -1, 512, 256, 128]
  }

  Args:
    feature_map_layout: Dictionary of specifications for the feature map
      layouts in the following format (Inception V2/V3 respectively):
      {
        'from_layer': ['Mixed_3c', 'Mixed_4c', 'Mixed_5c', '', '', ''],
        'layer_depth': [-1, -1, -1, 512, 256, 128]
      }
      or
      {
        'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
        'layer_depth': [-1, -1, -1, 512, 256, 128]
      }
      If 'from_layer' is specified, the specified feature map is directly used
      as a box predictor layer, and the layer_depth is directly inferred from
      the feature map (instead of using the provided 'layer_depth' parameter).
      In this case, our convention is to set 'layer_depth' to -1 for clarity.
      Otherwise, if 'from_layer' is an empty string, then the box predictor
      layer will be built from the previous layer using convolution operations.
      Note that the current implementation only supports generating new layers
      using convolutions of stride 2 (resulting in a spatial resolution
      reduction by a factor of 2), and will be extended to a more flexible
      design. Convolution kernel size is set to 3 by default, and can be
      customized by the 'conv_kernel_size' parameter (similarly,
      'conv_kernel_size' should be set to -1 if 'from_layer' is specified).
      The created convolution operation will be a normal 2D convolution by
      default, and a depthwise convolution followed by 1x1 convolution if
      'use_depthwise' is set to True.
    depth_multiplier: Depth multiplier for convolutional layers.
    min_depth: Minimum depth for convolutional layers.
    insert_1x1_conv: A boolean indicating whether an additional 1x1 convolution
      should be inserted before shrinking the feature map.
    image_features: A dictionary of handles to activation tensors from the
      base feature extractor.

  Returns:
    feature_maps: an OrderedDict mapping keys (feature map names) to
      tensors where each tensor has shape [batch, height_i, width_i, depth_i].

  Raises:
    ValueError: if the number of entries in 'from_layer' and
      'layer_depth' do not match.
    ValueError: if the generated layer does not have the same resolution
      as specified.
  """
  depth_fn = get_depth_fn(depth_multiplier, min_depth)

  feature_map_keys = []
  feature_maps = []
  base_from_layer = ''
  use_explicit_padding = False
  if 'use_explicit_padding' in feature_map_layout:
    use_explicit_padding = feature_map_layout['use_explicit_padding']
  use_depthwise = False
  if 'use_depthwise' in feature_map_layout:
    use_depthwise = feature_map_layout['use_depthwise']
  for index, from_layer in enumerate(feature_map_layout['from_layer']):
    layer_depth = feature_map_layout['layer_depth'][index]
    conv_kernel_size = 3
    if 'conv_kernel_size' in feature_map_layout:
      conv_kernel_size = feature_map_layout['conv_kernel_size'][index]
    if from_layer:
      feature_map = image_features[from_layer]
      base_from_layer = from_layer
      feature_map_keys.append(from_layer)
    else:
      pre_layer = feature_maps[-1]
      intermediate_layer = pre_layer
      if insert_1x1_conv:
        layer_name = '{}_1_Conv2d_{}_1x1_{}'.format(
            base_from_layer, index, depth_fn(layer_depth / 2))
        intermediate_layer = slim.conv2d(
            pre_layer,
            depth_fn(layer_depth / 2), [1, 1],
            padding='SAME',
            stride=1,
            scope=layer_name)
      layer_name = '{}_2_Conv2d_{}_{}x{}_s2_{}'.format(
          base_from_layer, index, conv_kernel_size, conv_kernel_size,
          depth_fn(layer_depth))
      stride = 2
      padding = 'SAME'
      if use_explicit_padding:
        padding = 'VALID'
        intermediate_layer = ops.fixed_padding(
            intermediate_layer, conv_kernel_size)
      if use_depthwise:
        feature_map = slim.separable_conv2d(
            intermediate_layer,
            None, [conv_kernel_size, conv_kernel_size],
            depth_multiplier=1,
            padding=padding,
            stride=stride,
            scope=layer_name + '_depthwise')
        feature_map = slim.conv2d(
            feature_map,
            depth_fn(layer_depth), [1, 1],
            padding='SAME',
            stride=1,
            scope=layer_name)
      else:
        feature_map = slim.conv2d(
            intermediate_layer,
            depth_fn(layer_depth), [conv_kernel_size, conv_kernel_size],
            padding=padding,
            stride=stride,
            scope=layer_name)
      feature_map_keys.append(layer_name)
    feature_maps.append(feature_map)
  return collections.OrderedDict(
      [(x, y) for (x, y) in zip(feature_map_keys, feature_maps)])


def fpn_top_down_feature_maps(image_features, depth, scope=None):
  """Generates `top-down` feature maps for Feature Pyramid Networks.

  See https://arxiv.org/abs/1612.03144 for details.

  Args:
    image_features: list of tuples of (tensor_name, image_feature_tensor).
      Spatial resolutions of successive tensors must reduce exactly by a factor
      of 2.
    depth: depth of output feature maps.
    scope: A scope name to wrap this op under.

  Returns:
    feature_maps: an OrderedDict mapping keys (feature map names) to
      tensors where each tensor has shape [batch, height_i, width_i, depth_i].
  """
  with tf.name_scope(scope, 'top_down'):
    num_levels = len(image_features)
    output_feature_maps_list = []
    output_feature_map_keys = []
    with slim.arg_scope(
        [slim.conv2d], padding='SAME', stride=1):
      top_down = slim.conv2d(
          image_features[-1][1],
          depth, [1, 1], activation_fn=None, normalizer_fn=None,
          scope='projection_%d' % num_levels)
      output_feature_maps_list.append(top_down)
      output_feature_map_keys.append(
          'top_down_%s' % image_features[-1][0])

      for level in reversed(range(num_levels - 1)):
        top_down = ops.nearest_neighbor_upsampling(top_down, 2)
        residual = slim.conv2d(
            image_features[level][1], depth, [1, 1],
            activation_fn=None, normalizer_fn=None,
            scope='projection_%d' % (level + 1))
        top_down += residual
        output_feature_maps_list.append(slim.conv2d(
            top_down,
            depth, [3, 3],
            scope='smoothing_%d' % (level + 1)))
        output_feature_map_keys.append('top_down_%s' %
                                       image_features[level][0])
      return collections.OrderedDict(
          reversed(list(zip(output_feature_map_keys,
                            output_feature_maps_list))))
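
As a usage illustration (not part of this commit; the endpoint shapes are assumptions), a minimal sketch of driving multi_resolution_feature_maps with the Inception V3 layout from the docstring:

import tensorflow as tf

from object_detection.models import feature_map_generators

# Hypothetical Inception V3 endpoints; shapes are illustrative assumptions.
image_features = {
    'Mixed_5d': tf.placeholder(tf.float32, [1, 35, 35, 288]),
    'Mixed_6e': tf.placeholder(tf.float32, [1, 17, 17, 768]),
    'Mixed_7c': tf.placeholder(tf.float32, [1, 8, 8, 2048]),
}
feature_map_layout = {
    'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
    'layer_depth': [-1, -1, -1, 512, 256, 128],
}
# Returns an OrderedDict: the three reused backbone maps, then three new
# stride-2 maps of depth 512, 256 and 128.
feature_maps = feature_map_generators.multi_resolution_feature_maps(
    feature_map_layout=feature_map_layout,
    depth_multiplier=1.0,
    min_depth=16,
    insert_1x1_conv=True,
    image_features=image_features)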
......@@ -110,23 +110,19 @@ class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
     with (slim.arg_scope(self._conv_hyperparams_fn())
           if self._override_base_feature_extractor_hyperparams
           else context_manager.IdentityContextManager()):
-      # TODO(skligys): Enable fused batch norm once quantization supports it.
-      with slim.arg_scope([slim.batch_norm], fused=False):
-        _, image_features = mobilenet_v1.mobilenet_v1_base(
-            ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
-            final_endpoint='Conv2d_13_pointwise',
-            min_depth=self._min_depth,
-            depth_multiplier=self._depth_multiplier,
-            use_explicit_padding=self._use_explicit_padding,
-            scope=scope)
+      _, image_features = mobilenet_v1.mobilenet_v1_base(
+          ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
+          final_endpoint='Conv2d_13_pointwise',
+          min_depth=self._min_depth,
+          depth_multiplier=self._depth_multiplier,
+          use_explicit_padding=self._use_explicit_padding,
+          scope=scope)
     with slim.arg_scope(self._conv_hyperparams_fn()):
-      # TODO(skligys): Enable fused batch norm once quantization supports it.
-      with slim.arg_scope([slim.batch_norm], fused=False):
-        feature_maps = feature_map_generators.multi_resolution_feature_maps(
-            feature_map_layout=feature_map_layout,
-            depth_multiplier=self._depth_multiplier,
-            min_depth=self._min_depth,
-            insert_1x1_conv=True,
-            image_features=image_features)
+      feature_maps = feature_map_generators.multi_resolution_feature_maps(
+          feature_map_layout=feature_map_layout,
+          depth_multiplier=self._depth_multiplier,
+          min_depth=self._min_depth,
+          insert_1x1_conv=True,
+          image_features=image_features)
     return feature_maps.values()
......@@ -148,7 +148,7 @@ class SsdMobilenetV1FeatureExtractorTest(
     self.check_feature_extractor_variables_under_scope(
         depth_multiplier, pad_to_multiple, scope_name)

-  def test_nofused_batchnorm(self):
+  def test_has_fused_batchnorm(self):
     image_height = 40
     image_width = 40
     depth_multiplier = 1
......@@ -159,8 +159,8 @@ class SsdMobilenetV1FeatureExtractorTest(
                                          pad_to_multiple)
     preprocessed_image = feature_extractor.preprocess(image_placeholder)
     _ = feature_extractor.extract_features(preprocessed_image)
-    self.assertFalse(any(op.type == 'FusedBatchNorm'
-                         for op in tf.get_default_graph().get_operations()))
+    self.assertTrue(any(op.type == 'FusedBatchNorm'
+                        for op in tf.get_default_graph().get_operations()))


if __name__ == '__main__':
  tf.test.main()
......@@ -112,24 +112,18 @@ class SSDMobileNetV2FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
     with (slim.arg_scope(self._conv_hyperparams_fn())
           if self._override_base_feature_extractor_hyperparams else
           context_manager.IdentityContextManager()):
-      # TODO(b/68150321): Enable fused batch norm once quantization
-      # supports it.
-      with slim.arg_scope([slim.batch_norm], fused=False):
-        _, image_features = mobilenet_v2.mobilenet_base(
-            ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
-            final_endpoint='layer_19',
-            depth_multiplier=self._depth_multiplier,
-            use_explicit_padding=self._use_explicit_padding,
-            scope=scope)
+      _, image_features = mobilenet_v2.mobilenet_base(
+          ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
+          final_endpoint='layer_19',
+          depth_multiplier=self._depth_multiplier,
+          use_explicit_padding=self._use_explicit_padding,
+          scope=scope)
     with slim.arg_scope(self._conv_hyperparams_fn()):
-      # TODO(b/68150321): Enable fused batch norm once quantization
-      # supports it.
-      with slim.arg_scope([slim.batch_norm], fused=False):
-        feature_maps = feature_map_generators.multi_resolution_feature_maps(
-            feature_map_layout=feature_map_layout,
-            depth_multiplier=self._depth_multiplier,
-            min_depth=self._min_depth,
-            insert_1x1_conv=True,
-            image_features=image_features)
+      feature_maps = feature_map_generators.multi_resolution_feature_maps(
+          feature_map_layout=feature_map_layout,
+          depth_multiplier=self._depth_multiplier,
+          min_depth=self._min_depth,
+          insert_1x1_conv=True,
+          image_features=image_features)
     return feature_maps.values()
......@@ -135,7 +135,7 @@ class SsdMobilenetV2FeatureExtractorTest(
     self.check_feature_extractor_variables_under_scope(
         depth_multiplier, pad_to_multiple, scope_name)

-  def test_nofused_batchnorm(self):
+  def test_has_fused_batchnorm(self):
     image_height = 40
     image_width = 40
     depth_multiplier = 1
......@@ -146,8 +146,8 @@ class SsdMobilenetV2FeatureExtractorTest(
                                          pad_to_multiple)
     preprocessed_image = feature_extractor.preprocess(image_placeholder)
     _ = feature_extractor.extract_features(preprocessed_image)
-    self.assertFalse(any(op.type == 'FusedBatchNorm'
-                         for op in tf.get_default_graph().get_operations()))
+    self.assertTrue(any(op.type == 'FusedBatchNorm'
+                        for op in tf.get_default_graph().get_operations()))
if __name__ == '__main__':
......
......@@ -37,6 +37,10 @@ message KeepAspectRatioResizer {
   // Whether to also resize the image channels from 3 to 1 (RGB to grayscale).
   optional bool convert_to_grayscale = 5 [default = false];

+  // Per-channel pad value. This is only used when pad_to_max_dimension is True.
+  // If unspecified, a default pad value of 0 is applied to all channels.
+  repeated float per_channel_pad_value = 6;
 }

 // Configuration proto for image resizer that resizes to a fixed shape.
......
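
A hedged sketch (not from the commit) of setting the new field from Python; the particular pad values are illustrative per-channel means, not prescribed defaults:

from object_detection.protos import image_resizer_pb2

resizer_config = image_resizer_pb2.KeepAspectRatioResizer()
resizer_config.pad_to_max_dimension = True
# One value per input channel; these numbers are just an example.
resizer_config.per_channel_pad_value.extend([123.7, 116.8, 103.9])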
......@@ -69,6 +69,10 @@ message InputReader {
   // Type of instance mask.
   optional InstanceMaskType mask_type = 10 [default = NUMERICAL_MASKS];

+  // Whether to use the display name when decoding examples. This is only used
+  // when mapping class text strings to integers.
+  optional bool use_display_name = 17 [default = false];

   oneof input_reader {
     TFRecordInputReader tf_record_input_reader = 8;
     ExternalInputReader external_input_reader = 9;
......
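
Similarly, a minimal sketch (assumed usage, not from the commit) of flipping the new flag on an InputReader proto:

from object_detection.protos import input_reader_pb2

reader_config = input_reader_pb2.InputReader()
# Map class text strings to integers via the label map's display_name fields.
reader_config.use_display_name = True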
......@@ -235,6 +235,9 @@ def train(create_tensor_dict_fn,
     built (before optimization). This is helpful to perform additional changes
     to the training graph such as adding FakeQuant ops. The function should
     modify the default graph.
+
+  Raises:
+    ValueError: If both num_clones > 1 and train_config.sync_replicas is true.
   """
   detection_model = create_model_fn()
......@@ -256,9 +259,16 @@ def train(create_tensor_dict_fn,
   with tf.device(deploy_config.variables_device()):
     global_step = slim.create_global_step()

+  if num_clones != 1 and train_config.sync_replicas:
+    raise ValueError('In Synchronous SGD mode num_clones must '
+                     'be 1. Found num_clones: {}'.format(num_clones))
+
+  batch_size = train_config.batch_size // num_clones
+  if train_config.sync_replicas:
+    batch_size //= train_config.replicas_to_aggregate
+
   with tf.device(deploy_config.inputs_device()):
     input_queue = create_input_queue(
-        train_config.batch_size // num_clones, create_tensor_dict_fn,
+        batch_size, create_tensor_dict_fn,
         train_config.batch_queue_capacity,
         train_config.num_batch_queue_threads,
         train_config.prefetch_queue_capacity, data_augmentation_options)
......@@ -377,7 +387,8 @@ def train(create_tensor_dict_fn,
               train_config.load_all_detection_checkpoint_vars))
       available_var_map = (variables_helper.
                            get_variables_available_in_checkpoint(
-                               var_map, train_config.fine_tune_checkpoint))
+                               var_map, train_config.fine_tune_checkpoint,
+                               include_global_step=False))
       init_saver = tf.train.Saver(available_var_map)

       def initializer_fn(sess):
         init_saver.restore(sess, train_config.fine_tune_checkpoint)
......
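
A worked example (assumed numbers) of the effective-batch-size arithmetic introduced above:

effective_batch_size = 64   # train_config.batch_size, now the effective size
num_clones = 1              # must be 1 when sync_replicas is enabled
replicas_to_aggregate = 8   # train_config.replicas_to_aggregate

batch_size = effective_batch_size // num_clones  # 64
batch_size //= replicas_to_aggregate             # 8 examples per worker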
......@@ -278,6 +278,19 @@ def get_learning_rate_type(optimizer_config):
   return optimizer_config.learning_rate.WhichOneof("learning_rate")


+def _is_generic_key(key):
+  """Determines whether the key starts with a generic config dictionary key."""
+  for prefix in [
+      "graph_rewriter_config",
+      "model",
+      "train_input_config",
+      "train_config"]:
+    if key.startswith(prefix + "."):
+      return True
+  return False
+
+
 def merge_external_params_with_configs(configs, hparams=None, **kwargs):
   """Updates `configs` dictionary based on supplied parameters.
......@@ -287,6 +300,16 @@ def merge_external_params_with_configs(configs, hparams=None, **kwargs):
   experiment, one can use a single base config file, and update particular
   values.

+  There are two types of field overrides:
+  1. Strategy-based overrides, which update multiple relevant configuration
+     options. For example, updating `learning_rate` will update both the warmup
+     and final learning rates.
+  2. Generic key/value, which updates a specific parameter based on namespaced
+     configuration keys. For example,
+     `model.ssd.loss.hard_example_miner.max_negatives_per_positive` will update
+     the hard example miner configuration for an SSD model config. Generic
+     overrides are automatically detected based on the namespaced keys.
+
   Args:
     configs: Dictionary of configuration objects. See outputs from
       get_configs_from_pipeline_file() or get_configs_from_multiple_files().
......@@ -302,44 +325,42 @@ def merge_external_params_with_configs(configs, hparams=None, **kwargs):
   if hparams:
     kwargs.update(hparams.values())
   for key, value in kwargs.items():
     tf.logging.info("Maybe overwriting %s: %s", key, value)
     # pylint: disable=g-explicit-bool-comparison
     if value == "" or value is None:
       continue
     # pylint: enable=g-explicit-bool-comparison
     if key == "learning_rate":
       _update_initial_learning_rate(configs, value)
-      tf.logging.info("Overwriting learning rate: %f", value)
-    if key == "batch_size":
+    elif key == "batch_size":
       _update_batch_size(configs, value)
-      tf.logging.info("Overwriting batch size: %d", value)
-    if key == "momentum_optimizer_value":
+    elif key == "momentum_optimizer_value":
       _update_momentum_optimizer_value(configs, value)
-      tf.logging.info("Overwriting momentum optimizer value: %f", value)
-    if key == "classification_localization_weight_ratio":
+    elif key == "classification_localization_weight_ratio":
       # Localization weight is fixed to 1.0.
       _update_classification_localization_weight_ratio(configs, value)
-    if key == "focal_loss_gamma":
+    elif key == "focal_loss_gamma":
       _update_focal_loss_gamma(configs, value)
-    if key == "focal_loss_alpha":
+    elif key == "focal_loss_alpha":
       _update_focal_loss_alpha(configs, value)
-    if key == "train_steps":
+    elif key == "train_steps":
       _update_train_steps(configs, value)
-      tf.logging.info("Overwriting train steps: %d", value)
-    if key == "eval_steps":
+    elif key == "eval_steps":
       _update_eval_steps(configs, value)
-      tf.logging.info("Overwriting eval steps: %d", value)
-    if key == "train_input_path":
+    elif key == "train_input_path":
       _update_input_path(configs["train_input_config"], value)
-      tf.logging.info("Overwriting train input path: %s", value)
-    if key == "eval_input_path":
+    elif key == "eval_input_path":
       _update_input_path(configs["eval_input_config"], value)
-      tf.logging.info("Overwriting eval input path: %s", value)
-    if key == "label_map_path":
+    elif key == "label_map_path":
       _update_label_map_path(configs, value)
-      tf.logging.info("Overwriting label map path: %s", value)
-    if key == "mask_type":
+    elif key == "mask_type":
       _update_mask_type(configs, value)
-      tf.logging.info("Overwritten mask type: %s", value)
+    elif key == "eval_with_moving_averages":
+      _update_use_moving_averages(configs, value)
+    elif _is_generic_key(key):
+      _update_generic(configs, key, value)
+    else:
+      tf.logging.info("Ignoring config override key: %s", key)
   return configs
......@@ -411,6 +432,38 @@ def _update_batch_size(configs, batch_size):
configs["train_config"].batch_size = max(1, int(round(batch_size)))
def _validate_message_has_field(message, field):
if not message.HasField(field):
raise ValueError("Expecting message to have field %s" % field)
def _update_generic(configs, key, value):
"""Update a pipeline configuration parameter based on a generic key/value.
Args:
configs: Dictionary of pipeline configuration protos.
key: A string key, dot-delimited to represent the argument key.
e.g. "model.ssd.train_config.batch_size"
value: A value to set the argument to. The type of the value must match the
type for the protocol buffer. Note that setting the wrong type will
result in a TypeError.
e.g. 42
Raises:
ValueError if the message key does not match the existing proto fields.
TypeError the value type doesn't match the protobuf field type.
"""
fields = key.split(".")
first_field = fields.pop(0)
last_field = fields.pop()
message = configs[first_field]
for field in fields:
_validate_message_has_field(message, field)
message = getattr(message, field)
_validate_message_has_field(message, last_field)
setattr(message, last_field, value)
def _update_momentum_optimizer_value(configs, momentum):
"""Updates `configs` to reflect the new momentum value.
......@@ -587,3 +640,17 @@ def _update_mask_type(configs, mask_type):
"""
configs["train_input_config"].mask_type = mask_type
configs["eval_input_config"].mask_type = mask_type
def _update_use_moving_averages(configs, use_moving_averages):
"""Updates the eval config option to use or not use moving averages.
The configs dictionary is updated in place, and hence not returned.
Args:
configs: Dictionary of configuration objects. See outputs from
get_configs_from_pipeline_file() or get_configs_from_multiple_files().
use_moving_averages: Boolean indicating whether moving average variables
should be loaded during evaluation.
"""
configs["eval_config"].use_moving_averages = use_moving_averages
......@@ -69,6 +69,11 @@ def _update_optimizer_with_cosine_decay_learning_rate(
 class ConfigUtilTest(tf.test.TestCase):

+  def _create_and_load_test_configs(self, pipeline_config):
+    pipeline_config_path = os.path.join(self.get_temp_dir(), "pipeline.config")
+    _write_config(pipeline_config, pipeline_config_path)
+    return config_util.get_configs_from_pipeline_file(pipeline_config_path)
+
   def test_get_configs_from_pipeline_file(self):
     """Test that proto configs can be read from pipeline config file."""
     pipeline_config_path = os.path.join(self.get_temp_dir(), "pipeline.config")
......@@ -307,6 +312,34 @@ class ConfigUtilTest(tf.test.TestCase):
     new_batch_size = configs["train_config"].batch_size
     self.assertEqual(1, new_batch_size)  # Clipped to 1.0.

+  def testOverwriteBatchSizeWithKeyValue(self):
+    """Tests that batch size is overwritten based on key/value."""
+    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
+    pipeline_config.train_config.batch_size = 2
+    configs = self._create_and_load_test_configs(pipeline_config)
+    hparams = tf.contrib.training.HParams(**{"train_config.batch_size": 10})
+    configs = config_util.merge_external_params_with_configs(configs, hparams)
+    new_batch_size = configs["train_config"].batch_size
+    self.assertEqual(10, new_batch_size)
+
+  def testKeyValueOverrideBadKey(self):
+    """Tests that overwriting with a bad key causes an exception."""
+    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
+    configs = self._create_and_load_test_configs(pipeline_config)
+    hparams = tf.contrib.training.HParams(**{"train_config.no_such_field": 10})
+    with self.assertRaises(ValueError):
+      config_util.merge_external_params_with_configs(configs, hparams)
+
+  def testOverwriteBatchSizeWithBadValueType(self):
+    """Tests that overwriting with a bad value type causes an exception."""
+    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
+    pipeline_config.train_config.batch_size = 2
+    configs = self._create_and_load_test_configs(pipeline_config)
+    # Type should be an integer, but we're passing a string "10".
+    hparams = tf.contrib.training.HParams(**{"train_config.batch_size": "10"})
+    with self.assertRaises(TypeError):
+      config_util.merge_external_params_with_configs(configs, hparams)
+
   def testNewMomentumOptimizerValue(self):
     """Tests that new momentum value is updated appropriately."""
     original_momentum_value = 0.4
......@@ -501,6 +534,19 @@ class ConfigUtilTest(tf.test.TestCase):
     self.assertEqual(new_mask_type, configs["train_input_config"].mask_type)
     self.assertEqual(new_mask_type, configs["eval_input_config"].mask_type)

+  def testUseMovingAverageForEval(self):
+    use_moving_averages_orig = False
+    pipeline_config_path = os.path.join(self.get_temp_dir(), "pipeline.config")
+    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
+    pipeline_config.eval_config.use_moving_averages = use_moving_averages_orig
+    _write_config(pipeline_config, pipeline_config_path)
+
+    configs = config_util.get_configs_from_pipeline_file(pipeline_config_path)
+    configs = config_util.merge_external_params_with_configs(
+        configs, eval_with_moving_averages=True)
+    self.assertEqual(True, configs["eval_config"].use_moving_averages)
+
   def test_get_image_resizer_config(self):
     """Tests that number of classes can be retrieved."""
     model_config = model_pb2.DetectionModel()
......
......@@ -117,13 +117,17 @@ def read_dataset(file_read_func, decode_func, input_files, config):
     A tf.data.Dataset based on config.
   """
   # Shard, shuffle, and read files.
-  filenames = tf.concat([tf.matching_files(pattern) for pattern in input_files],
-                        0)
-  filename_dataset = tf.data.Dataset.from_tensor_slices(filenames)
+  filenames = tf.gfile.Glob(input_files)
+  num_readers = config.num_readers
+  if num_readers > len(filenames):
+    num_readers = len(filenames)
+    tf.logging.warning('num_readers has been reduced to %d to match input file '
+                       'shards.' % num_readers)
+  filename_dataset = tf.data.Dataset.from_tensor_slices(tf.unstack(filenames))
   if config.shuffle:
     filename_dataset = filename_dataset.shuffle(
         config.filenames_shuffle_buffer_size)
-  elif config.num_readers > 1:
+  elif num_readers > 1:
     tf.logging.warning('`shuffle` is false, but the input data stream is '
                        'still slightly shuffled since `num_readers` > 1.')
......@@ -131,8 +135,10 @@ def read_dataset(file_read_func, decode_func, input_files, config):
   records_dataset = filename_dataset.apply(
       tf.contrib.data.parallel_interleave(
-          file_read_func, cycle_length=config.num_readers,
-          block_length=config.read_block_length, sloppy=config.shuffle))
+          file_read_func,
+          cycle_length=num_readers,
+          block_length=config.read_block_length,
+          sloppy=config.shuffle))
   if config.shuffle:
     records_dataset = records_dataset.shuffle(config.shuffle_buffer_size)
   tensor_dataset = records_dataset.map(
......
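
The reader-capping logic in isolation (a sketch under TF 1.x; the file pattern and reader count are assumptions):

import tensorflow as tf

filenames = tf.gfile.Glob('/tmp/data/train-*.tfrecord')
num_readers = min(4, len(filenames))  # never run more readers than shards
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.apply(
    tf.contrib.data.parallel_interleave(
        tf.data.TFRecordDataset, cycle_length=num_readers, sloppy=False))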
......@@ -16,6 +16,7 @@
"""Tests for object_detection.utils.dataset_util."""
import os
import numpy as np
import tensorflow as tf
from object_detection.protos import input_reader_pb2
......@@ -32,6 +33,13 @@ class DatasetUtilTest(tf.test.TestCase):
       with tf.gfile.Open(path, 'wb') as f:
         f.write('\n'.join([str(i + 1), str((i + 1) * 10)]))

+    self._shuffle_path_template = os.path.join(self.get_temp_dir(),
+                                               'shuffle_%s.txt')
+    for i in range(2):
+      path = self._shuffle_path_template % i
+      with tf.gfile.Open(path, 'wb') as f:
+        f.write('\n'.join([str(i)] * 5))
+
   def _get_dataset_next(self, files, config, batch_size):

     def decode_func(value):
       return [tf.string_to_number(value, out_type=tf.int32)]
......@@ -78,6 +86,43 @@ class DatasetUtilTest(tf.test.TestCase):
                         [[1, 10, 2, 20, 3, 30, 4, 40, 5, 50, 1, 10, 2, 20, 3,
                           30, 4, 40, 5, 50]])

+  def test_reduce_num_reader(self):
+    config = input_reader_pb2.InputReader()
+    config.num_readers = 10
+    config.shuffle = False
+
+    data = self._get_dataset_next([self._path_template % '*'], config,
+                                  batch_size=20)
+    with self.test_session() as sess:
+      self.assertAllEqual(sess.run(data),
+                          [[1, 10, 2, 20, 3, 30, 4, 40, 5, 50, 1, 10, 2, 20, 3,
+                            30, 4, 40, 5, 50]])
+
+  def test_enable_shuffle(self):
+    config = input_reader_pb2.InputReader()
+    config.num_readers = 1
+    config.shuffle = True
+
+    data = self._get_dataset_next(
+        [self._shuffle_path_template % '*'], config, batch_size=10)
+    expected_non_shuffle_output = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
+    with self.test_session() as sess:
+      self.assertTrue(
+          np.any(np.not_equal(sess.run(data), expected_non_shuffle_output)))
+
+  def test_disable_shuffle_(self):
+    config = input_reader_pb2.InputReader()
+    config.num_readers = 1
+    config.shuffle = False
+
+    data = self._get_dataset_next(
+        [self._shuffle_path_template % '*'], config, batch_size=10)
+    expected_non_shuffle_output = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
+    with self.test_session() as sess:
+      self.assertAllEqual(sess.run(data), [expected_non_shuffle_output])
+
   def test_read_dataset_single_epoch(self):
     config = input_reader_pb2.InputReader()
     config.num_epochs = 1
......
......@@ -318,8 +318,9 @@ def retain_groundtruth(tensor_dict, valid_indices):
   Args:
     tensor_dict: a dictionary of following groundtruth tensors -
       fields.InputDataFields.groundtruth_boxes
-      fields.InputDataFields.groundtruth_instance_masks
       fields.InputDataFields.groundtruth_classes
+      fields.InputDataFields.groundtruth_keypoints
+      fields.InputDataFields.groundtruth_instance_masks
       fields.InputDataFields.groundtruth_is_crowd
       fields.InputDataFields.groundtruth_area
       fields.InputDataFields.groundtruth_label_types
......@@ -347,6 +348,7 @@ def retain_groundtruth(tensor_dict, valid_indices):
   for key in tensor_dict:
     if key in [fields.InputDataFields.groundtruth_boxes,
                fields.InputDataFields.groundtruth_classes,
+               fields.InputDataFields.groundtruth_keypoints,
                fields.InputDataFields.groundtruth_instance_masks]:
       valid_dict[key] = tf.gather(tensor_dict[key], valid_indices)
   # Input decoder returns empty tensor when these fields are not provided.
......@@ -374,6 +376,8 @@ def retain_groundtruth_with_positive_classes(tensor_dict):
     tensor_dict: a dictionary of following groundtruth tensors -
       fields.InputDataFields.groundtruth_boxes
       fields.InputDataFields.groundtruth_classes
+      fields.InputDataFields.groundtruth_keypoints
+      fields.InputDataFields.groundtruth_instance_masks
       fields.InputDataFields.groundtruth_is_crowd
       fields.InputDataFields.groundtruth_area
       fields.InputDataFields.groundtruth_label_types
......@@ -413,6 +417,8 @@ def filter_groundtruth_with_crowd_boxes(tensor_dict):
     tensor_dict: a dictionary of following groundtruth tensors -
       fields.InputDataFields.groundtruth_boxes
       fields.InputDataFields.groundtruth_classes
+      fields.InputDataFields.groundtruth_keypoints
+      fields.InputDataFields.groundtruth_instance_masks
       fields.InputDataFields.groundtruth_is_crowd
       fields.InputDataFields.groundtruth_area
       fields.InputDataFields.groundtruth_label_types
......@@ -435,8 +441,9 @@ def filter_groundtruth_with_nan_box_coordinates(tensor_dict):
   Args:
     tensor_dict: a dictionary of following groundtruth tensors -
       fields.InputDataFields.groundtruth_boxes
-      fields.InputDataFields.groundtruth_instance_masks
       fields.InputDataFields.groundtruth_classes
+      fields.InputDataFields.groundtruth_keypoints
+      fields.InputDataFields.groundtruth_instance_masks
       fields.InputDataFields.groundtruth_is_crowd
       fields.InputDataFields.groundtruth_area
       fields.InputDataFields.groundtruth_label_types
......@@ -703,23 +710,30 @@ def reframe_box_masks_to_image_masks(box_masks, boxes, image_height,
     A tf.float32 tensor of size [num_masks, image_height, image_width].
   """
   # TODO(rathodv): Make this a public function.
-  def transform_boxes_relative_to_boxes(boxes, reference_boxes):
-    boxes = tf.reshape(boxes, [-1, 2, 2])
-    min_corner = tf.expand_dims(reference_boxes[:, 0:2], 1)
-    max_corner = tf.expand_dims(reference_boxes[:, 2:4], 1)
-    transformed_boxes = (boxes - min_corner) / (max_corner - min_corner)
-    return tf.reshape(transformed_boxes, [-1, 4])
-
-  box_masks = tf.expand_dims(box_masks, axis=3)
-  num_boxes = tf.shape(box_masks)[0]
-  unit_boxes = tf.concat(
-      [tf.zeros([num_boxes, 2]), tf.ones([num_boxes, 2])], axis=1)
-  reverse_boxes = transform_boxes_relative_to_boxes(unit_boxes, boxes)
-  image_masks = tf.image.crop_and_resize(image=box_masks,
-                                         boxes=reverse_boxes,
-                                         box_ind=tf.range(num_boxes),
-                                         crop_size=[image_height, image_width],
-                                         extrapolation_value=0.0)
+  def reframe_box_masks_to_image_masks_default():
+    """The default function when there are more than 0 box masks."""
+    def transform_boxes_relative_to_boxes(boxes, reference_boxes):
+      boxes = tf.reshape(boxes, [-1, 2, 2])
+      min_corner = tf.expand_dims(reference_boxes[:, 0:2], 1)
+      max_corner = tf.expand_dims(reference_boxes[:, 2:4], 1)
+      transformed_boxes = (boxes - min_corner) / (max_corner - min_corner)
+      return tf.reshape(transformed_boxes, [-1, 4])
+
+    box_masks_expanded = tf.expand_dims(box_masks, axis=3)
+    num_boxes = tf.shape(box_masks_expanded)[0]
+    unit_boxes = tf.concat(
+        [tf.zeros([num_boxes, 2]), tf.ones([num_boxes, 2])], axis=1)
+    reverse_boxes = transform_boxes_relative_to_boxes(unit_boxes, boxes)
+    return tf.image.crop_and_resize(
+        image=box_masks_expanded,
+        boxes=reverse_boxes,
+        box_ind=tf.range(num_boxes),
+        crop_size=[image_height, image_width],
+        extrapolation_value=0.0)
+
+  image_masks = tf.cond(
+      tf.shape(box_masks)[0] > 0,
+      reframe_box_masks_to_image_masks_default,
+      lambda: tf.zeros([0, image_height, image_width, 1], dtype=tf.float32))
   return tf.squeeze(image_masks, axis=3)
......
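
The same guard pattern in isolation (a sketch with assumed shapes): tf.cond dispatches to the real computation only when the batch is non-empty, and otherwise returns an empty tensor of the matching shape and dtype:

import tensorflow as tf

masks = tf.placeholder(tf.float32, [None, 33, 33])
resized = tf.cond(
    tf.shape(masks)[0] > 0,
    lambda: tf.image.resize_bilinear(tf.expand_dims(masks, 3), [65, 65]),
    lambda: tf.zeros([0, 65, 65, 1], dtype=tf.float32))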
......@@ -1100,6 +1100,16 @@ class ReframeBoxMasksToImageMasksTest(tf.test.TestCase):
       np_image_masks = sess.run(image_masks)
       self.assertAllClose(np_image_masks, np_expected_image_masks)

+  def testZeroBoxMasks(self):
+    box_masks = tf.zeros([0, 3, 3], dtype=tf.float32)
+    boxes = tf.zeros([0, 4], dtype=tf.float32)
+    image_masks = ops.reframe_box_masks_to_image_masks(box_masks, boxes,
+                                                       image_height=4,
+                                                       image_width=4)
+    with self.test_session() as sess:
+      np_image_masks = sess.run(image_masks)
+      self.assertAllEqual(np_image_masks.shape, np.array([0, 4, 4]))
+
   def testMaskIsCenteredInImageWhenBoxIsCentered(self):
     box_masks = tf.constant([[[1, 1],
                               [1, 1]]], dtype=tf.float32)
......
......@@ -67,16 +67,18 @@ class PerImageVRDEvaluation(object):
       tp_fp_labels: A single boolean numpy array of shape [N,], representing N
         True/False positive label, one label per tuple. The labels are sorted
         so that the order of the labels matches the order of the scores.
+      result_mapping: A numpy array with shape [N,] with the original index of
+        each entry.
     """
-    scores, tp_fp_labels = self._compute_tp_fp(
+    scores, tp_fp_labels, result_mapping = self._compute_tp_fp(
         detected_box_tuples=detected_box_tuples,
         detected_scores=detected_scores,
         detected_class_tuples=detected_class_tuples,
         groundtruth_box_tuples=groundtruth_box_tuples,
         groundtruth_class_tuples=groundtruth_class_tuples)

-    return scores, tp_fp_labels
+    return scores, tp_fp_labels, result_mapping

   def _compute_tp_fp(self, detected_box_tuples, detected_scores,
                      detected_class_tuples, groundtruth_box_tuples,
......@@ -107,33 +109,46 @@ class PerImageVRDEvaluation(object):
       tp_fp_labels: A single boolean numpy array of shape [N,], representing N
         True/False positive label, one label per tuple. The labels are sorted
         so that the order of the labels matches the order of the scores.
+      result_mapping: A numpy array with shape [N,] with the original index of
+        each entry.
     """
     unique_gt_tuples = np.unique(
         np.concatenate((groundtruth_class_tuples, detected_class_tuples)))
     result_scores = []
     result_tp_fp_labels = []
+    result_mapping = []

     for unique_tuple in unique_gt_tuples:
       detections_selector = (detected_class_tuples == unique_tuple)
       gt_selector = (groundtruth_class_tuples == unique_tuple)

-      scores, tp_fp_labels = self._compute_tp_fp_for_single_class(
-          detected_box_tuples=detected_box_tuples[detections_selector],
-          detected_scores=detected_scores[detections_selector],
+      selector_mapping = np.where(detections_selector)[0]
+
+      detection_scores_per_tuple = detected_scores[detections_selector]
+      detection_box_per_tuple = detected_box_tuples[detections_selector]
+
+      sorted_indices = np.argsort(detection_scores_per_tuple)
+      sorted_indices = sorted_indices[::-1]
+
+      tp_fp_labels = self._compute_tp_fp_for_single_class(
+          detected_box_tuples=detection_box_per_tuple[sorted_indices],
           groundtruth_box_tuples=groundtruth_box_tuples[gt_selector])
-      result_scores.append(scores)
+
+      result_scores.append(detection_scores_per_tuple[sorted_indices])
       result_tp_fp_labels.append(tp_fp_labels)
+      result_mapping.append(selector_mapping[sorted_indices])

     result_scores = np.concatenate(result_scores)
     result_tp_fp_labels = np.concatenate(result_tp_fp_labels)
+    result_mapping = np.concatenate(result_mapping)
+
     sorted_indices = np.argsort(result_scores)
     sorted_indices = sorted_indices[::-1]

-    return result_scores[sorted_indices], result_tp_fp_labels[sorted_indices]
+    return result_scores[sorted_indices], result_tp_fp_labels[
+        sorted_indices], result_mapping[sorted_indices]

-  def _get_overlaps_and_scores_relation_tuples(
-      self, detected_box_tuples, detected_scores, groundtruth_box_tuples):
+  def _get_overlaps_and_scores_relation_tuples(self, detected_box_tuples,
+                                               groundtruth_box_tuples):
     """Computes overlaps and scores between detected and groundtruth tuples.

     Both detections and groundtruth boxes have the same class tuples.
......@@ -143,8 +158,6 @@ class PerImageVRDEvaluation(object):
         representing N tuples, each tuple containing the same number of named
         bounding boxes.
         Each box is of the format [y_min, x_min, y_max, x_max]
-      detected_scores: A float numpy array of shape [N,], representing
-        the confidence scores of the detected N object instances.
       groundtruth_box_tuples: A float numpy array of structures with the shape
         [M,], representing M tuples, each tuple containing the same number
         of named bounding boxes.
......@@ -153,7 +166,6 @@ class PerImageVRDEvaluation(object):
     Returns:
       result_iou: A float numpy array of size
         [num_detected_tuples, num_gt_box_tuples].
-      scores: The score of the detected boxlist.
     """
     result_iou = np.ones(
......@@ -161,46 +173,35 @@ class PerImageVRDEvaluation(object):
         dtype=float)
     for field in detected_box_tuples.dtype.fields:
       detected_boxlist_field = np_box_list.BoxList(detected_box_tuples[field])
-      detected_boxlist_field.add_field('scores', detected_scores)
-      detected_boxlist_field = np_box_list_ops.sort_by_field(
-          detected_boxlist_field, 'scores')
       gt_boxlist_field = np_box_list.BoxList(groundtruth_box_tuples[field])
       iou_field = np_box_list_ops.iou(detected_boxlist_field, gt_boxlist_field)
       result_iou = np.minimum(iou_field, result_iou)
-    scores = detected_boxlist_field.get_field('scores')
-    return result_iou, scores
+    return result_iou

   def _compute_tp_fp_for_single_class(self, detected_box_tuples,
-                                      detected_scores, groundtruth_box_tuples):
+                                      groundtruth_box_tuples):
     """Labels boxes detected with the same class from the same image as tp/fp.

+    Detection boxes are expected to be already sorted by score.
+
     Args:
       detected_box_tuples: A numpy array of structures with shape [N,],
         representing N tuples, each tuple containing the same number of named
         bounding boxes.
         Each box is of the format [y_min, x_min, y_max, x_max]
-      detected_scores: A float numpy array of shape [N,], representing
-        the confidence scores of the detected N object instances.
       groundtruth_box_tuples: A float numpy array of structures with the shape
         [M,], representing M tuples, each tuple containing the same number
         of named bounding boxes.
         Each box is of the format [y_min, x_min, y_max, x_max]

     Returns:
-      Two arrays of the same size, containing true/false for N boxes that were
-      evaluated as being true positives or false positives;
-      scores: A numpy array representing the detection scores.
+      tp_fp_labels: a boolean numpy array indicating whether a detection is a
+        true positive.
     """
     if detected_box_tuples.size == 0:
-      return np.array([], dtype=float), np.array([], dtype=bool)
+      return np.array([], dtype=bool)

-    min_iou, scores = self._get_overlaps_and_scores_relation_tuples(
-        detected_box_tuples=detected_box_tuples,
-        detected_scores=detected_scores,
-        groundtruth_box_tuples=groundtruth_box_tuples)
+    min_iou = self._get_overlaps_and_scores_relation_tuples(
+        detected_box_tuples, groundtruth_box_tuples)

     num_detected_tuples = detected_box_tuples.shape[0]
     tp_fp_labels = np.zeros(num_detected_tuples, dtype=bool)
......@@ -215,4 +216,4 @@ class PerImageVRDEvaluation(object):
           tp_fp_labels[i] = True
           is_gt_tuple_detected[gt_id] = True

-    return scores, tp_fp_labels
+    return tp_fp_labels
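
The score-sorting and index bookkeeping above, shown on made-up numpy values:

import numpy as np

scores = np.array([0.2, 0.8, 0.1])
sorted_indices = np.argsort(scores)[::-1]  # descending by score -> [1, 0, 2]
sorted_scores = scores[sorted_indices]     # [0.8, 0.2, 0.1]
result_mapping = sorted_indices            # original index of each sorted entry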
......@@ -28,31 +28,25 @@ class SingleClassPerImageVrdEvaluationTest(tf.test.TestCase):
     box_data_type = np.dtype([('subject', 'f4', (4,)), ('object', 'f4', (4,))])
     self.detected_box_tuples = np.array(
-        [([0, 0, 1, 1], [1, 1, 2, 2]), ([0, 0, 1.1, 1], [1, 1, 2, 2]),
+        [([0, 0, 1.1, 1], [1, 1, 2, 2]), ([0, 0, 1, 1], [1, 1, 2, 2]),
          ([1, 1, 2, 2], [0, 0, 1.1, 1])],
         dtype=box_data_type)
-    self.detected_scores = np.array([0.2, 0.8, 0.1], dtype=float)
+    self.detected_scores = np.array([0.8, 0.2, 0.1], dtype=float)
     self.groundtruth_box_tuples = np.array(
         [([0, 0, 1, 1], [1, 1, 2, 2])], dtype=box_data_type)

   def test_tp_fp_eval(self):
-    scores, tp_fp_labels = self.eval._compute_tp_fp_for_single_class(
-        self.detected_box_tuples, self.detected_scores,
-        self.groundtruth_box_tuples)
-    expected_scores = np.array([0.8, 0.2, 0.1], dtype=float)
+    tp_fp_labels = self.eval._compute_tp_fp_for_single_class(
+        self.detected_box_tuples, self.groundtruth_box_tuples)
     expected_tp_fp_labels = np.array([True, False, False], dtype=bool)
-    self.assertTrue(np.allclose(expected_scores, scores))
     self.assertTrue(np.allclose(expected_tp_fp_labels, tp_fp_labels))

   def test_tp_fp_eval_empty_gt(self):
     box_data_type = np.dtype([('subject', 'f4', (4,)), ('object', 'f4', (4,))])

-    scores, tp_fp_labels = self.eval._compute_tp_fp_for_single_class(
-        self.detected_box_tuples, self.detected_scores,
-        np.array([], dtype=box_data_type))
-    expected_scores = np.array([0.8, 0.2, 0.1], dtype=float)
+    tp_fp_labels = self.eval._compute_tp_fp_for_single_class(
+        self.detected_box_tuples, np.array([], dtype=box_data_type))
     expected_tp_fp_labels = np.array([False, False, False], dtype=bool)
-    self.assertTrue(np.allclose(expected_scores, scores))
     self.assertTrue(np.allclose(expected_tp_fp_labels, tp_fp_labels))
......@@ -82,16 +76,18 @@ class MultiClassPerImageVrdEvaluationTest(tf.test.TestCase):
         [(1, 2, 3), (1, 7, 3), (1, 4, 5)], dtype=label_data_type)

   def test_tp_fp_eval(self):
-    scores, tp_fp_labels = self.eval.compute_detection_tp_fp(
+    scores, tp_fp_labels, mapping = self.eval.compute_detection_tp_fp(
         self.detected_box_tuples, self.detected_scores,
         self.detected_class_tuples, self.groundtruth_box_tuples,
         self.groundtruth_class_tuples)

     expected_scores = np.array([0.8, 0.5, 0.2, 0.1], dtype=float)
     expected_tp_fp_labels = np.array([True, True, False, False], dtype=bool)
+    expected_mapping = np.array([1, 3, 0, 2])

     self.assertTrue(np.allclose(expected_scores, scores))
     self.assertTrue(np.allclose(expected_tp_fp_labels, tp_fp_labels))
+    self.assertTrue(np.allclose(expected_mapping, mapping))


 if __name__ == '__main__':
......
......@@ -138,3 +138,36 @@ def create_random_boxes(num_boxes, max_height, max_width):
   boxes[:, 3] = np.maximum(x_1, x_2)

   return boxes.astype(np.float32)
+
+
+def first_rows_close_as_set(a, b, k=None, rtol=1e-6, atol=1e-6):
+  """Checks if first K entries of two lists are close, up to permutation.
+
+  Inputs to this assert are lists of items which can be compared via
+  numpy.allclose(...) and can be sorted.
+
+  Args:
+    a: list of items which can be compared via numpy.allclose(...) and are
+      sortable.
+    b: list of items which can be compared via numpy.allclose(...) and are
+      sortable.
+    k: a non-negative integer. If not provided, k is set to be len(a).
+    rtol: relative tolerance.
+    atol: absolute tolerance.
+
+  Returns:
+    boolean, True if input lists a and b have the same length and
+    the first k entries of the inputs satisfy numpy.allclose() after
+    sorting entries.
+  """
+  if not isinstance(a, list) or not isinstance(b, list) or len(a) != len(b):
+    return False
+  if not k:
+    k = len(a)
+  k = min(k, len(a))
+  a_sorted = sorted(a[:k])
+  b_sorted = sorted(b[:k])
+  return all([
+      np.allclose(entry_a, entry_b, rtol, atol)
+      for (entry_a, entry_b) in zip(a_sorted, b_sorted)
+  ])
......@@ -68,6 +68,22 @@ class TestUtilsTest(tf.test.TestCase):
     self.assertTrue(boxes[:, 2].max() <= max_height)
     self.assertTrue(boxes[:, 3].max() <= max_width)

+  def test_first_rows_close_as_set(self):
+    a = [1, 2, 3, 0, 0]
+    b = [3, 2, 1, 0, 0]
+    k = 3
+    self.assertTrue(test_utils.first_rows_close_as_set(a, b, k))
+
+    a = [[1, 2], [1, 4], [0, 0]]
+    b = [[1, 4 + 1e-9], [1, 2], [0, 0]]
+    k = 2
+    self.assertTrue(test_utils.first_rows_close_as_set(a, b, k))
+
+    a = [[1, 2], [1, 4], [0, 0]]
+    b = [[1, 4 + 1e-9], [2, 2], [0, 0]]
+    k = 2
+    self.assertFalse(test_utils.first_rows_close_as_set(a, b, k))
+

if __name__ == '__main__':
  tf.test.main()
......@@ -315,11 +315,13 @@ def draw_bounding_boxes_on_image_tensors(images,
                                            instance_masks=None,
                                            keypoints=None,
                                            max_boxes_to_draw=20,
-                                           min_score_thresh=0.2):
+                                           min_score_thresh=0.2,
+                                           use_normalized_coordinates=True):
   """Draws bounding boxes, masks, and keypoints on batch of image tensors.

   Args:
-    images: A 4D uint8 image tensor of shape [N, H, W, C].
+    images: A 4D uint8 image tensor of shape [N, H, W, C]. If C > 3, additional
+      channels will be ignored.
     boxes: [N, max_detections, 4] float32 tensor of detection boxes.
     classes: [N, max_detections] int tensor of detection classes. Note that
       classes are 1-indexed.
......@@ -332,12 +334,17 @@ def draw_bounding_boxes_on_image_tensors(images,
       with keypoints.
     max_boxes_to_draw: Maximum number of boxes to draw on an image. Default 20.
     min_score_thresh: Minimum score threshold for visualization. Default 0.2.
+    use_normalized_coordinates: Whether to assume boxes and keypoints are in
+      normalized coordinates (as opposed to absolute coordinates).
+      Default is True.

   Returns:
     4D image tensor of type uint8, with boxes drawn on top.
   """
+  # Additional channels are being ignored.
+  images = images[:, :, :, 0:3]
   visualization_keyword_args = {
-      'use_normalized_coordinates': True,
+      'use_normalized_coordinates': use_normalized_coordinates,
       'max_boxes_to_draw': max_boxes_to_draw,
       'min_score_thresh': min_score_thresh,
       'agnostic_mode': False,
......@@ -382,7 +389,8 @@ def draw_bounding_boxes_on_image_tensors(images,
 def draw_side_by_side_evaluation_image(eval_dict,
                                        category_index,
                                        max_boxes_to_draw=20,
-                                       min_score_thresh=0.2):
+                                       min_score_thresh=0.2,
+                                       use_normalized_coordinates=True):
   """Creates a side-by-side image with detections and groundtruth.

   Bounding boxes (and instance masks, if available) are visualized on both
......@@ -394,6 +402,9 @@ def draw_side_by_side_evaluation_image(eval_dict,
     category_index: A category index (dictionary) produced from a labelmap.
     max_boxes_to_draw: The maximum number of boxes to draw for detections.
     min_score_thresh: The minimum score threshold for showing detections.
+    use_normalized_coordinates: Whether to assume boxes and keypoints are in
+      normalized coordinates (as opposed to absolute coordinates).
+      Default is True.

   Returns:
     A [1, H, 2 * W, C] uint8 tensor. The subimage on the left corresponds to
......@@ -425,7 +436,8 @@ def draw_side_by_side_evaluation_image(eval_dict,
         instance_masks=instance_masks,
         keypoints=keypoints,
         max_boxes_to_draw=max_boxes_to_draw,
-        min_score_thresh=min_score_thresh)
+        min_score_thresh=min_score_thresh,
+        use_normalized_coordinates=use_normalized_coordinates)
     images_with_groundtruth = draw_bounding_boxes_on_image_tensors(
         eval_dict[input_data_fields.original_image],
         tf.expand_dims(eval_dict[input_data_fields.groundtruth_boxes], axis=0),
......@@ -439,7 +451,8 @@ def draw_side_by_side_evaluation_image(eval_dict,
         instance_masks=groundtruth_instance_masks,
         keypoints=None,
         max_boxes_to_draw=None,
-        min_score_thresh=0.0)
+        min_score_thresh=0.0,
+        use_normalized_coordinates=use_normalized_coordinates)
     return tf.concat([images_with_detections, images_with_groundtruth], axis=2)
......
......@@ -48,6 +48,9 @@ class VisualizationUtilsTest(tf.test.TestCase):
     image = np.concatenate((imu, imd), axis=0)
     return image

+  def create_test_image_with_five_channels(self):
+    return np.full([100, 200, 5], 255, dtype=np.uint8)
+
   def test_draw_bounding_box_on_image(self):
     test_image = self.create_colorful_test_image()
     test_image = Image.fromarray(test_image)
......@@ -144,6 +147,32 @@ class VisualizationUtilsTest(tf.test.TestCase):
       image_pil = Image.fromarray(images_with_boxes_np[i, ...])
       image_pil.save(output_file)

+  def test_draw_bounding_boxes_on_image_tensors_with_additional_channels(self):
+    """Tests the case where the input image tensor has more than 3 channels."""
+    category_index = {1: {'id': 1, 'name': 'dog'}}
+    image_np = self.create_test_image_with_five_channels()
+    images_np = np.stack((image_np, image_np), axis=0)
+
+    with tf.Graph().as_default():
+      images_tensor = tf.constant(value=images_np, dtype=tf.uint8)
+      boxes = tf.constant(0, dtype=tf.float32, shape=[2, 0, 4])
+      classes = tf.constant(0, dtype=tf.int64, shape=[2, 0])
+      scores = tf.constant(0, dtype=tf.float32, shape=[2, 0])
+      images_with_boxes = (
+          visualization_utils.draw_bounding_boxes_on_image_tensors(
+              images_tensor,
+              boxes,
+              classes,
+              scores,
+              category_index,
+              min_score_thresh=0.2))
+
+      with self.test_session() as sess:
+        sess.run(tf.global_variables_initializer())
+
+        final_images_np = sess.run(images_with_boxes)
+        self.assertEqual((2, 100, 200, 3), final_images_np.shape)
+
   def test_draw_keypoints_on_image(self):
     test_image = self.create_colorful_test_image()
     test_image = Image.fromarray(test_image)
......