Commit 1efe98bb authored by Zhichao Lu, committed by lzc5123016

Merged commit includes the following changes:

185215255  by Zhichao Lu:

    Stop populating image/object/class/text field when generating COCO tf record.

--
185213306  by Zhichao Lu:

    Use the params batch size and not the one from train_config in input_fn

--
185209081  by Zhichao Lu:

    Handle the case when there are no ground-truth masks for an image.

--
185195531  by Zhichao Lu:

    Remove unstack and stack operations on features from third_party/object_detection/model.py.

--
185195017  by Zhichao Lu:

    Matrix multiplication based gather op implementation.
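
A gather along the zeroth axis can be expressed as a product with a one-hot selection matrix. The NumPy sketch below illustrates that idea only; the function name `matmul_gather_sketch` is hypothetical, and this is not the TensorFlow code of `ops.matmul_gather_on_zeroth_axis` referenced in the diff.

```python
import numpy as np


def matmul_gather_sketch(params, indices):
    """Gathers rows of `params` with a one-hot matrix product (illustrative)."""
    params = np.asarray(params, dtype=np.float32)
    indices = np.asarray(indices, dtype=np.int64)
    num_rows = params.shape[0]
    # Flatten trailing dimensions so that a single 2-D matmul is enough.
    flat_params = params.reshape(num_rows, -1)                 # [N, D]
    # Row m of the selection matrix has a single 1 at column indices[m].
    one_hot = (indices[:, None] == np.arange(num_rows)[None, :]).astype(
        np.float32)                                            # [M, N]
    gathered = one_hot @ flat_params                           # [M, D]
    return gathered.reshape((indices.shape[0],) + params.shape[1:])


example_params = np.array([[0, 0.5, 0, 0.5], [0, 0, 0.5, 0.5]],
                          dtype=np.float32)
example_indices = np.array([1, 0, 1])
example_result = matmul_gather_sketch(example_params, example_indices)
```

Because every row of the selection matrix contains exactly one 1, the product reproduces the selected rows; the motivation for preferring it over a native gather on particular hardware is not stated in this commit.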

--
185187744  by Zhichao Lu:

    Fix eval_util minor issue.

--
185098733  by Zhichao Lu:

    Internal change

--
185076656  by Zhichao Lu:

    Increase the number of boxes for coco17.

--
185074199  by Zhichao Lu:

    Add config for SSD Resnet50 v1 with FPN.

--
185060199  by Zhichao Lu:

    Fix a bug in clear_detections.
    This method set detection_keys to an empty dictionary instead of an empty set. I've refactored so that this method and the constructor use the same code path.
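
The refactoring described above (constructor and `clear_detections` sharing one code path) can be sketched as follows. The class name `DetectionTracker` and the helper `_reset` are hypothetical; the real evaluator class is not shown in this part of the diff.

```python
class DetectionTracker(object):
    """Hypothetical stand-in for an evaluator that tracks detection keys."""

    def __init__(self):
        self._reset()

    def _reset(self):
        # Single place where the state is (re)initialised, so the constructor
        # and clear_detections cannot drift apart (e.g. set() versus {}).
        self.detection_keys = set()

    def add_detection(self, image_id):
        self.detection_keys.add(image_id)

    def clear_detections(self):
        self._reset()


tracker = DetectionTracker()
tracker.add_detection('image_1')
tracker.clear_detections()
tracker.add_detection('image_2')  # would fail if detection_keys were a dict
```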

--
185031359  by Zhichao Lu:

    Eval TPU trained models continuously.

--
185016591  by Zhichao Lu:

    Use TPUEstimatorSpec for TPU

--
185013651  by Zhichao Lu:

    Add PreprocessorCache to record and duplicate augmentations.
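
A minimal sketch of the caching idea, assuming a cache keyed by a function identifier and a variable key, as suggested by the `get(function_id, key)` and `update(function_id, key, var)` calls in the diff below. The class `SimpleCache` and the helper `get_or_create` are illustrative names, not the contents of `preprocessor_cache.py`.

```python
import collections
import itertools


class SimpleCache(object):
    """Illustrative cache: maps (function_id, key) to a stored value."""

    def __init__(self):
        self._history = collections.defaultdict(dict)

    def get(self, function_id, key):
        # Returns None when nothing has been recorded yet.
        return self._history[function_id].get(key)

    def update(self, function_id, key, value):
        self._history[function_id][key] = value


def get_or_create(generator_func, function_id, cache, key=''):
    """Returns the cached value, or generates and records a new one."""
    if cache is None:
        return generator_func()
    value = cache.get(function_id, key)
    if value is None:
        value = generator_func()
        cache.update(function_id, key, value)
    return value


# A counter stands in for a random generator so that the behaviour is easy
# to follow: each fresh draw yields the next integer.
counter = itertools.count()
draw = lambda: next(counter)

cache = SimpleCache()
first = get_or_create(draw, 'horizontal_flip', cache)    # fresh draw
second = get_or_create(draw, 'horizontal_flip', cache)   # replayed from cache
uncached = get_or_create(draw, 'horizontal_flip', None)  # fresh draw
```

Passing the same cache replays the recorded value, which is what allows the same augmentation to be applied to several inputs; passing no cache keeps the original independent sampling.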

--
184921763  by Zhichao Lu:

    Minor fixes for object detection.

--
184920610  by Zhichao Lu:

    Adds a model builder test for "embedded_ssd_mobilenet_v1" feature extractor.

--
184919284  by Zhichao Lu:

    Added unit tests for TPU, with optional training / eval.

--
184915910  by Zhichao Lu:

    Update third_party g3 doc with Mask RCNN detection models.

--
184914085  by Zhichao Lu:

    Slight change to WeightSharedConvolutionalBoxPredictor implementation to make things match more closely with RetinaNet.  Specifically we now construct the box encoding and class predictor towers separately rather than having them share weights until penultimate layer.

--
184913786  by Zhichao Lu:

    Plumbs SSD Resnet V1 with FPN models into model builder.

--
184910030  by Zhichao Lu:

    Add coco metrics to evaluator.

--
184897758  by Zhichao Lu:

    Merge changes from github.

--
184888736  by Zhichao Lu:

    Ensure groundtruth_weights are always 1-D.

--
184887256  by Zhichao Lu:

    Introduce an option to add summaries in the model so it can be turned off when necessary.

--
184865559  by Zhichao Lu:

    Updating inputs so that a dictionary of tensors is returned from input_fn. Moving unbatch/unpad to model.py.
    Also removing source_id key from features dictionary, and replacing with an integer hash.

--
184859205  by Zhichao Lu:

    This CL is trying to hide those differences by making the default settings work with the public code.

--
184769779  by Zhichao Lu:

    Pass groundtruth weights into ssd meta architecture all the way to target assigner.

    This will allow training ssd models with padded groundtruth tensors.
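
To illustrate why per-box weights make padded groundtruth usable, here is a NumPy sketch under stated assumptions: padded rows receive weight 0 so that they contribute nothing to a weighted loss. The helpers `pad_groundtruth` and `weighted_loss` are hypothetical and do not mirror the target assigner's actual interface.

```python
import numpy as np


def pad_groundtruth(boxes, max_num_boxes):
    """Pads [num_boxes, 4] boxes to a fixed size and returns 0/1 weights."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    num_boxes = boxes.shape[0]
    if num_boxes > max_num_boxes:
        raise ValueError('More boxes than max_num_boxes.')
    padded = np.zeros((max_num_boxes, 4), dtype=np.float32)
    padded[:num_boxes] = boxes
    weights = np.zeros(max_num_boxes, dtype=np.float32)
    weights[:num_boxes] = 1.0  # real boxes count, padding does not
    return padded, weights


def weighted_loss(per_box_loss, weights):
    """Sums per-box losses, ignoring entries whose weight is zero."""
    return float(np.sum(np.asarray(per_box_loss) * weights))


padded_boxes, box_weights = pad_groundtruth([[0.0, 0.0, 1.0, 1.0]], 3)
total = weighted_loss([2.0, 5.0, 7.0], box_weights)
```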

--
184767117  by Zhichao Lu:

    * Add `params` arg to make all input fns work with TPUEstimator
    * Add --master
    * Output eval results

--
184766244  by Zhichao Lu:

    Update create_coco_tf_record to include category indices

--
184752937  by Zhichao Lu:

    Create a third_party version of TPU compatible mobilenet_v2_focal_loss coco config.

--
184750174  by Zhichao Lu:

    A few small fixes for multiscale anchor generator and a test.

--
184746581  by Zhichao Lu:

    Update jupyter notebook to show mask if provided by model.

--
184728646  by Zhichao Lu:

    Adding a few more tests to make sure decoding with/without label maps performs as expected.

--
184624154  by Zhichao Lu:

    Add an object detection binary for TPU.

--
184622118  by Zhichao Lu:

    Batch, transform, and unbatch in the tflearn interface.

--
184595064  by Zhichao Lu:

    Add support for training grayscale models.

--
184532026  by Zhichao Lu:

    Change dataset_builder.build to perform optional batching using tf.data.Dataset API

--
184330239  by Zhichao Lu:

    Add augment_input_data and transform_input_data helper functions to third_party/tensorflow_models/object_detection/inputs.py

--
184328681  by Zhichao Lu:

    Use an internal rgb to gray method that can be quantized.
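
The conversion added in this commit (`_rgb_to_grayscale` in the diff below) is a weighted sum over the channel axis using the luma weights 0.2989, 0.5870 and 0.1140. A NumPy sketch of the same arithmetic follows; it omits the dtype conversion that the TensorFlow version performs, and the function name is illustrative.

```python
import numpy as np

# Luma weights as they appear in the diff for _rgb_to_grayscale.
RGB_WEIGHTS = np.array([0.2989, 0.5870, 0.1140], dtype=np.float32)


def rgb_to_grayscale_sketch(images):
    """Converts [..., 3] float RGB images to [..., 1] grayscale."""
    images = np.asarray(images, dtype=np.float32)
    if images.shape[-1] != 3:
        raise ValueError('Last dimension must have size 3.')
    # Multiply each channel by its weight, then sum over the channel axis
    # while keeping that axis so the rank is unchanged.
    return np.sum(images * RGB_WEIGHTS, axis=-1, keepdims=True)


white = np.ones((2, 4, 4, 3), dtype=np.float32)
gray = rgb_to_grayscale_sketch(white)
```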

--
184327909  by Zhichao Lu:

    Helper function to return padding shapes to use with Dataset.padded_batch.

--
184326291  by Zhichao Lu:

    Added decode_func for specialized decoding.

--
184314676  by Zhichao Lu:

    Add unstack_batch method to inputs.py.

    This will enable us to convert batched tensors to lists of tensors. This is compatible with OD API that consumes groundtruth batch as a list of tensors.
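
As a rough illustration of the conversion described above, the following NumPy sketch splits every batched array in a dictionary into a list of per-example arrays. It assumes all values share the same leading batch dimension; the function name `unstack_batch_sketch` is hypothetical and the real `unstack_batch` in `inputs.py` operates on tensors.

```python
import numpy as np


def unstack_batch_sketch(tensor_dict):
    """Maps each [batch, ...] array to a list of `batch` arrays."""
    return {
        key: [example for example in np.asarray(value)]
        for key, value in tensor_dict.items()
    }


batched = {
    'groundtruth_boxes': np.zeros((2, 3, 4), dtype=np.float32),
    'groundtruth_classes': np.ones((2, 3), dtype=np.float32),
}
unstacked = unstack_batch_sketch(batched)
```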

--
184281269  by Zhichao Lu:

    Internal test target changes.

--
184192851  by Zhichao Lu:

    Adding `Estimator` interface for object detection.

--
184187885  by Zhichao Lu:

    Add config_util functions to help with input pipeline.

    1. function to return expected shapes from the resizer config
    2. function to extract image_resizer_config from model_config.

--
184139892  by Zhichao Lu:

    Adding support for depthwise SSD (ssd-lite) and depthwise box predictions.
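
In the diff below, the `use_depthwise` branch replaces a single k x k convolution with a depthwise k x k convolution (`depth_multiplier=1`) followed by a 1x1 convolution. The arithmetic sketch below compares weight counts for the two layouts; biases are ignored, and the channel sizes in the example are arbitrary assumptions.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Weight count of a standard k x k convolution (biases excluded)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_weights(kernel_size, in_channels, out_channels,
                                depth_multiplier=1):
    """Weight count of a depthwise k x k conv followed by a 1x1 conv."""
    depthwise = kernel_size * kernel_size * in_channels * depth_multiplier
    pointwise = in_channels * depth_multiplier * out_channels
    return depthwise + pointwise


# Example: 3x3 kernel, 64 input channels, 6 anchors * 4 box coordinates.
standard = conv_weights(3, 64, 24)
separable = depthwise_separable_weights(3, 64, 24)
```

For these example sizes the separable layout needs roughly a sixth or a seventh of the weights of the standard convolution.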

--
184089891  by Zhichao Lu:

    Fix third_party faster rcnn resnet101 coco config.

--
184083378  by Zhichao Lu:

    In the case when there is no object/weights field in tf.Example proto, return a default weight of 1.0 for all boxes.
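
The fallback described above can be sketched with a plain dictionary standing in for the decoded tf.Example. The key name `'object/weights'` is taken from the commit message, while the helper name and the dictionary representation are assumptions made for illustration.

```python
import numpy as np


def weights_or_default(example, num_boxes, key='object/weights'):
    """Returns 1-D per-box weights, defaulting to 1.0 for every box."""
    weights = example.get(key)
    if weights is None or len(weights) == 0:
        # Field absent or empty: every box gets the default weight of 1.0.
        return np.ones(num_boxes, dtype=np.float32)
    # Flatten so that the result is always 1-D.
    return np.asarray(weights, dtype=np.float32).reshape(-1)


default_weights = weights_or_default({}, num_boxes=3)
explicit_weights = weights_or_default({'object/weights': [[0.5], [2.0]]},
                                      num_boxes=2)
```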

--

PiperOrigin-RevId: 185215255
parent fbc5ba06
......@@ -123,6 +123,7 @@ py_library(
"matcher.py",
],
deps = [
"//tensorflow/models/research/object_detection/utils:ops",
],
)
......@@ -160,12 +161,20 @@ py_library(
":box_list",
":box_list_ops",
":keypoint_ops",
":preprocessor_cache",
":standard_fields",
"//tensorflow",
"//tensorflow/models/research/object_detection/utils:shape_utils",
],
)
py_library(
name = "preprocessor_cache",
srcs = [
"preprocessor_cache.py",
],
)
py_test(
name = "preprocessor_test",
srcs = [
......@@ -173,6 +182,7 @@ py_test(
],
deps = [
":preprocessor",
":preprocessor_cache",
"//tensorflow",
],
)
......
......@@ -102,7 +102,7 @@ class BoxPredictor(object):
return self._predict(image_features, num_predictions_per_location,
**params)
return self._predict(image_features, num_predictions_per_location,
**params)
**params)
# TODO: num_predictions_per_location could be moved to constructor.
# This is currently only used by ConvolutionalBoxPredictor.
......@@ -582,7 +582,8 @@ class ConvolutionalBoxPredictor(BoxPredictor):
kernel_size,
box_code_size,
apply_sigmoid_to_scores=False,
class_prediction_bias_init=0.0):
class_prediction_bias_init=0.0,
use_depthwise=False):
"""Constructor.
Args:
......@@ -611,6 +612,8 @@ class ConvolutionalBoxPredictor(BoxPredictor):
class_predictions.
class_prediction_bias_init: constant value to initialize bias of the last
conv2d layer before class prediction.
use_depthwise: Whether to use depthwise convolutions for prediction
steps. Default is False.
Raises:
ValueError: if min_depth > max_depth.
......@@ -628,6 +631,7 @@ class ConvolutionalBoxPredictor(BoxPredictor):
self._dropout_keep_prob = dropout_keep_prob
self._apply_sigmoid_to_scores = apply_sigmoid_to_scores
self._class_prediction_bias_init = class_prediction_bias_init
self._use_depthwise = use_depthwise
def _predict(self, image_features, num_predictions_per_location_list):
"""Computes encoded object locations and corresponding confidences.
......@@ -683,17 +687,38 @@ class ConvolutionalBoxPredictor(BoxPredictor):
net, depth, [1, 1], scope='Conv2d_%d_1x1_%d' % (i, depth))
with slim.arg_scope([slim.conv2d], activation_fn=None,
normalizer_fn=None, normalizer_params=None):
box_encodings = slim.conv2d(
net, num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
scope='BoxEncodingPredictor')
if self._use_depthwise:
box_encodings = slim.separable_conv2d(
net, None, [self._kernel_size, self._kernel_size],
padding='SAME', depth_multiplier=1, stride=1,
rate=1, scope='BoxEncodingPredictor_depthwise')
box_encodings = slim.conv2d(
box_encodings,
num_predictions_per_location * self._box_code_size, [1, 1],
scope='BoxEncodingPredictor')
else:
box_encodings = slim.conv2d(
net, num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
scope='BoxEncodingPredictor')
if self._use_dropout:
net = slim.dropout(net, keep_prob=self._dropout_keep_prob)
class_predictions_with_background = slim.conv2d(
net, num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size], scope='ClassPredictor',
biases_initializer=tf.constant_initializer(
self._class_prediction_bias_init))
if self._use_depthwise:
class_predictions_with_background = slim.separable_conv2d(
net, None, [self._kernel_size, self._kernel_size],
padding='SAME', depth_multiplier=1, stride=1,
rate=1, scope='ClassPredictor_depthwise')
class_predictions_with_background = slim.conv2d(
class_predictions_with_background,
num_predictions_per_location * num_class_slots,
[1, 1], scope='ClassPredictor')
else:
class_predictions_with_background = slim.conv2d(
net, num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size],
scope='ClassPredictor',
biases_initializer=tf.constant_initializer(
self._class_prediction_bias_init))
if self._apply_sigmoid_to_scores:
class_predictions_with_background = tf.sigmoid(
class_predictions_with_background)
......@@ -729,7 +754,8 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
Defines the box predictor as defined in
https://arxiv.org/abs/1708.02002. This class differs from
ConvolutionalBoxPredictor in that it shares weights and biases while
predicting from different feature maps.
predicting from different feature maps. Separate multi-layer towers are
constructed for the box encoding and class predictors respectively.
"""
def __init__(self,
......@@ -811,22 +837,35 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
with tf.variable_scope('WeightSharedConvolutionalBoxPredictor',
reuse=tf.AUTO_REUSE):
num_class_slots = self.num_classes + 1
net = image_feature
box_encodings_net = image_feature
class_predictions_net = image_feature
with slim.arg_scope(self._conv_hyperparams):
for i in range(self._num_layers_before_predictor):
net = slim.conv2d(net,
self._depth,
[self._kernel_size, self._kernel_size],
stride=1,
padding='SAME',
scope='conv2d_{}'.format(i))
box_encodings_net = slim.conv2d(
box_encodings_net,
self._depth,
[self._kernel_size, self._kernel_size],
stride=1,
padding='SAME',
scope='BoxEncodingPredictionTower/conv2d_{}'.format(i))
box_encodings = slim.conv2d(
net, num_predictions_per_location * self._box_code_size,
box_encodings_net,
num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
activation_fn=None, stride=1, padding='SAME',
scope='BoxEncodingPredictor')
for i in range(self._num_layers_before_predictor):
class_predictions_net = slim.conv2d(
class_predictions_net,
self._depth,
[self._kernel_size, self._kernel_size],
stride=1,
padding='SAME',
scope='ClassPredictionTower/conv2d_{}'.format(i))
class_predictions_with_background = slim.conv2d(
net, num_predictions_per_location * num_class_slots,
class_predictions_net,
num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size],
activation_fn=None, stride=1, padding='SAME',
biases_initializer=tf.constant_initializer(
......
......@@ -316,9 +316,69 @@ class ConvolutionalBoxPredictorTest(test_case.TestCase):
[tf.shape(box_encodings), tf.shape(objectness_predictions)],
feed_dict={image_features:
np.random.rand(4, resolution, resolution, 64)})
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 1, 4])
self.assertAllEqual(objectness_predictions_shape,
[4, expected_num_anchors, 1])
expected_variable_set = set([
'BoxPredictor/Conv2d_0_1x1_32/biases',
'BoxPredictor/Conv2d_0_1x1_32/weights',
'BoxPredictor/BoxEncodingPredictor/biases',
'BoxPredictor/BoxEncodingPredictor/weights',
'BoxPredictor/ClassPredictor/biases',
'BoxPredictor/ClassPredictor/weights'])
self.assertEqual(expected_variable_set, actual_variable_set)
def test_use_depthwise_convolution(self):
image_features = tf.placeholder(dtype=tf.float32, shape=[4, None, None, 64])
conv_box_predictor = box_predictor.ConvolutionalBoxPredictor(
is_training=False,
num_classes=0,
conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
min_depth=0,
max_depth=32,
num_layers_before_predictor=1,
dropout_keep_prob=0.8,
kernel_size=1,
box_code_size=4,
use_dropout=True,
use_depthwise=True
)
box_predictions = conv_box_predictor.predict(
[image_features], num_predictions_per_location=[5],
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
objectness_predictions = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
init_op = tf.global_variables_initializer()
resolution = 32
expected_num_anchors = resolution*resolution*5
with self.test_session() as sess:
sess.run(init_op)
(box_encodings_shape,
objectness_predictions_shape) = sess.run(
[tf.shape(box_encodings), tf.shape(objectness_predictions)],
feed_dict={image_features:
np.random.rand(4, resolution, resolution, 64)})
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 1, 4])
self.assertAllEqual(objectness_predictions_shape,
[4, expected_num_anchors, 1])
expected_variable_set = set([
'BoxPredictor/Conv2d_0_1x1_32/biases',
'BoxPredictor/Conv2d_0_1x1_32/weights',
'BoxPredictor/BoxEncodingPredictor_depthwise/biases',
'BoxPredictor/BoxEncodingPredictor_depthwise/depthwise_weights',
'BoxPredictor/BoxEncodingPredictor/biases',
'BoxPredictor/BoxEncodingPredictor/weights',
'BoxPredictor/ClassPredictor_depthwise/biases',
'BoxPredictor/ClassPredictor_depthwise/depthwise_weights',
'BoxPredictor/ClassPredictor/biases',
'BoxPredictor/ClassPredictor/weights'])
self.assertEqual(expected_variable_set, actual_variable_set)
class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
......@@ -440,14 +500,26 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 32, 32, 3], dtype=tf.float32))
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_0/weights',
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_0/biases',
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_1/weights',
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_1/biases',
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_0/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_0/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_1/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_1/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictor/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
......@@ -489,6 +561,5 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
self.assertAllEqual(objectness_predictions_shape,
[4, expected_num_anchors, 1])
if __name__ == '__main__':
tf.test.main()
......@@ -36,6 +36,8 @@ from abc import abstractmethod
import tensorflow as tf
from object_detection.utils import ops
class Match(object):
"""Class to store results from the matcher.
......@@ -44,7 +46,7 @@ class Match(object):
convenient methods to query the matching results.
"""
def __init__(self, match_results):
def __init__(self, match_results, use_matmul_gather=False):
"""Constructs a Match object.
Args:
......@@ -52,6 +54,8 @@ class Match(object):
meaning that column i is matched with row match_results[i].
(2) match_results[i]=-1, meaning that column i is not matched.
(3) match_results[i]=-2, meaning that column i is ignored.
use_matmul_gather: Use matrix multiplication based gather instead of
standard tf.gather. (Default: False).
Raises:
ValueError: if match_results does not have rank 1 or is not an
......@@ -63,6 +67,9 @@ class Match(object):
raise ValueError('match_results should be an int32 or int64 scalar '
'tensor')
self._match_results = match_results
self._gather_op = tf.gather
if use_matmul_gather:
self._gather_op = ops.matmul_gather_on_zeroth_axis
@property
def match_results(self):
......@@ -163,7 +170,7 @@ class Match(object):
row_indices: int32 tensor of shape [K] with row indices.
"""
return self._reshape_and_cast(
tf.gather(self._match_results, self.matched_column_indices()))
self._gather_op(self._match_results, self.matched_column_indices()))
def _reshape_and_cast(self, t):
return tf.cast(tf.reshape(t, [-1]), tf.int32)
......@@ -193,7 +200,7 @@ class Match(object):
input_tensor = tf.concat([tf.stack([ignored_value, unmatched_value]),
input_tensor], axis=0)
gather_indices = tf.maximum(self.match_results + 2, 0)
gathered_tensor = tf.gather(input_tensor, gather_indices)
gathered_tensor = self._gather_op(input_tensor, gather_indices)
return gathered_tensor
......@@ -202,6 +209,16 @@ class Matcher(object):
"""
__metaclass__ = ABCMeta
def __init__(self, use_matmul_gather=False):
"""Constructs a Matcher.
Args:
use_matmul_gather: Force constructed match objects to use matrix
multiplication based gather instead of standard tf.gather.
(Default: False).
"""
self._use_matmul_gather = use_matmul_gather
def match(self, similarity_matrix, scope=None, **params):
"""Computes matches among row and column indices and returns the result.
......@@ -219,7 +236,8 @@ class Matcher(object):
A Match object with the results of matching.
"""
with tf.name_scope(scope, 'Match', [similarity_matrix, params]) as scope:
return Match(self._match(similarity_matrix, **params))
return Match(self._match(similarity_matrix, **params),
self._use_matmul_gather)
@abstractmethod
def _match(self, similarity_matrix, **params):
......
......@@ -172,5 +172,21 @@ class MatchTest(tf.test.TestCase):
gathered_tensor_out = gathered_tensor.eval()
self.assertAllEqual(expected_gathered_tensor, gathered_tensor_out)
def test_multidimensional_gather_based_on_match_with_matmul_gather_op(self):
match_results = tf.constant([1, -1, -2])
input_tensor = tf.constant([[0, 0.5, 0, 0.5], [0, 0, 0.5, 0.5]],
dtype=tf.float32)
expected_gathered_tensor = [[0, 0, 0.5, 0.5], [0, 0, 0, 0], [0, 0, 0, 0]]
match = matcher.Match(match_results, use_matmul_gather=True)
gathered_tensor = match.gather_based_on_match(input_tensor,
unmatched_value=tf.zeros(4),
ignored_value=tf.zeros(4))
self.assertEqual(gathered_tensor.dtype, tf.float32)
with self.test_session() as sess:
self.assertTrue(
all([op.name != 'Gather' for op in sess.graph.get_operations()]))
gathered_tensor_out = gathered_tensor.eval()
self.assertAllEqual(expected_gathered_tensor, gathered_tensor_out)
if __name__ == '__main__':
tf.test.main()
......@@ -236,7 +236,8 @@ class DetectionModel(object):
groundtruth_boxes_list,
groundtruth_classes_list,
groundtruth_masks_list=None,
groundtruth_keypoints_list=None):
groundtruth_keypoints_list=None,
groundtruth_weights_list=None):
"""Provide groundtruth tensors.
Args:
......@@ -257,10 +258,15 @@ class DetectionModel(object):
shape [num_boxes, num_keypoints, 2] containing keypoints.
Keypoints are assumed to be provided in normalized coordinates and
missing keypoints should be encoded as NaN.
groundtruth_weights_list: A list of 1-D tf.float32 tensors of shape
[num_boxes] containing weights for groundtruth boxes.
"""
self._groundtruth_lists[fields.BoxListFields.boxes] = groundtruth_boxes_list
self._groundtruth_lists[
fields.BoxListFields.classes] = groundtruth_classes_list
if groundtruth_weights_list:
self._groundtruth_lists[fields.BoxListFields.
weights] = groundtruth_weights_list
if groundtruth_masks_list:
self._groundtruth_lists[
fields.BoxListFields.masks] = groundtruth_masks_list
......
......@@ -35,6 +35,27 @@ in each row there is a box with [ymin xmin ymax xmax].
Boxes are in normalized coordinates meaning
their coordinate values range in [0, 1]
To preprocess multiple images with the same operations in cases where
nondeterministic operations are used, a preprocessor_cache.PreprocessorCache
object can be passed into the preprocess function or individual operations.
All nondeterministic operations except random_jitter_boxes support caching.
E.g.
Let tensor_dict{1,2,3,4,5} be copies of the same inputs.
Let preprocess_options contain nondeterministic operation(s) excluding
random_jitter_boxes.
cache1 = preprocessor_cache.PreprocessorCache()
cache2 = preprocessor_cache.PreprocessorCache()
a = preprocess(tensor_dict1, preprocess_options, preprocess_vars_cache=cache1)
b = preprocess(tensor_dict2, preprocess_options, preprocess_vars_cache=cache1)
c = preprocess(tensor_dict3, preprocess_options, preprocess_vars_cache=cache2)
d = preprocess(tensor_dict4, preprocess_options, preprocess_vars_cache=cache2)
e = preprocess(tensor_dict5, preprocess_options)
Then corresponding tensors of object pairs (a,b) and (c,d)
are guaranteed to be equal element-wise, but the equality of any other object
pair cannot be determined.
Important Note: In tensor_dict, images is a rank 4 tensor, but preprocessing
functions receive a rank 3 tensor for processing the image. Thus, inside the
preprocess function we squeeze the image to become a rank 3 tensor and then
......@@ -42,6 +63,8 @@ we pass it to the functions. At the end of the preprocess we expand the image
back to rank 4.
"""
import functools
import inspect
import sys
import tensorflow as tf
......@@ -50,45 +73,79 @@ from tensorflow.python.ops import control_flow_ops
from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import keypoint_ops
from object_detection.core import preprocessor_cache
from object_detection.core import standard_fields as fields
from object_detection.utils import shape_utils
def _apply_with_random_selector(x, func, num_cases):
def _apply_with_random_selector(x,
func,
num_cases,
preprocess_vars_cache=None,
key=''):
"""Computes func(x, sel), with sel sampled from [0...num_cases-1].
If both preprocess_vars_cache AND key are the same between two calls, sel will
be the same value in both calls.
Args:
x: input Tensor.
func: Python function to apply.
num_cases: Python int32, number of cases to sample sel from.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
key: variable identifier for preprocess_vars_cache.
Returns:
The result of func(x, sel), where func receives the value of the
selector as a python integer, but sel is sampled dynamically.
"""
rand_sel = tf.random_uniform([], maxval=num_cases, dtype=tf.int32)
generator_func = functools.partial(
tf.random_uniform, [], maxval=num_cases, dtype=tf.int32)
rand_sel = _get_or_create_preprocess_rand_vars(
generator_func, preprocessor_cache.PreprocessorCache.SELECTOR,
preprocess_vars_cache, key)
# Pass the real x only to one of the func calls.
return control_flow_ops.merge([func(
control_flow_ops.switch(x, tf.equal(rand_sel, case))[1], case)
for case in range(num_cases)])[0]
def _apply_with_random_selector_tuples(x, func, num_cases):
def _apply_with_random_selector_tuples(x,
func,
num_cases,
preprocess_vars_cache=None,
key=''):
"""Computes func(x, sel), with sel sampled from [0...num_cases-1].
If both preprocess_vars_cache AND key are the same between two calls, sel will
be the same value in both calls.
Args:
x: A tuple of input tensors.
func: Python function to apply.
num_cases: Python int32, number of cases to sample sel from.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
key: variable identifier for preprocess_vars_cache.
Returns:
The result of func(x, sel), where func receives the value of the
selector as a python integer, but sel is sampled dynamically.
"""
num_inputs = len(x)
rand_sel = tf.random_uniform([], maxval=num_cases, dtype=tf.int32)
# Pass the real x only to one of the func calls.
generator_func = functools.partial(
tf.random_uniform, [], maxval=num_cases, dtype=tf.int32)
rand_sel = _get_or_create_preprocess_rand_vars(
generator_func, preprocessor_cache.PreprocessorCache.SELECTOR_TUPLES,
preprocess_vars_cache, key)
# Pass the real x only to one of the func calls.
tuples = [list() for t in x]
for case in range(num_cases):
new_x = [control_flow_ops.switch(t, tf.equal(rand_sel, case))[1] for t in x]
......@@ -101,6 +158,37 @@ def _apply_with_random_selector_tuples(x, func, num_cases):
return tuple(tuples)
def _get_or_create_preprocess_rand_vars(generator_func,
function_id,
preprocess_vars_cache,
key=''):
"""Returns a tensor stored in preprocess_vars_cache or using generator_func.
If the tensor was previously generated and appears in the PreprocessorCache,
the previously generated tensor will be returned. Otherwise, a new tensor
is generated using generator_func and stored in the cache.
Args:
generator_func: A 0-argument function that generates a tensor.
function_id: identifier for the preprocessing function used.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
key: identifier for the variable stored.
Returns:
The generated tensor.
"""
if preprocess_vars_cache is not None:
var = preprocess_vars_cache.get(function_id, key)
if var is None:
var = generator_func()
preprocess_vars_cache.update(function_id, key, var)
else:
var = generator_func()
return var
def _random_integer(minval, maxval, seed):
"""Returns a random 0-D tensor between minval and maxval.
......@@ -116,6 +204,40 @@ def _random_integer(minval, maxval, seed):
[], minval=minval, maxval=maxval, dtype=tf.int32, seed=seed)
# TODO: This method is needed because the current
# tf.image.rgb_to_grayscale method does not support quantization. Replace with
# tf.image.rgb_to_grayscale after quantization support is added.
def _rgb_to_grayscale(images, name=None):
"""Converts one or more images from RGB to Grayscale.
Outputs a tensor of the same `DType` and rank as `images`. The size of the
last dimension of the output is 1, containing the Grayscale value of the
pixels.
Args:
images: The RGB tensor to convert. Last dimension must have size 3 and
should contain RGB values.
name: A name for the operation (optional).
Returns:
The converted grayscale image(s).
"""
with tf.name_scope(name, 'rgb_to_grayscale', [images]) as name:
images = tf.convert_to_tensor(images, name='images')
# Remember original dtype so we can convert back if needed
orig_dtype = images.dtype
flt_image = tf.image.convert_image_dtype(images, tf.float32)
# Reference for converting between RGB and grayscale.
# https://en.wikipedia.org/wiki/Luma_%28video%29
rgb_weights = [0.2989, 0.5870, 0.1140]
rank_1 = tf.expand_dims(tf.rank(images) - 1, 0)
gray_float = tf.reduce_sum(
flt_image * rgb_weights, rank_1, keepdims=True)
gray_float.set_shape(images.get_shape()[:-1].concatenate([1]))
return tf.image.convert_image_dtype(gray_float, orig_dtype, name=name)
def normalize_image(image, original_minval, original_maxval, target_minval,
target_maxval):
"""Normalizes pixel values in the image.
......@@ -313,7 +435,8 @@ def random_horizontal_flip(image,
masks=None,
keypoints=None,
keypoint_flip_permutation=None,
seed=None):
seed=None,
preprocess_vars_cache=None):
"""Randomly flips the image and detections horizontally.
The probability of flipping the image is 50%.
......@@ -334,6 +457,10 @@ def random_horizontal_flip(image,
keypoint_flip_permutation: rank 1 int32 tensor containing the keypoint flip
permutation.
seed: random seed
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same shape as input image.
......@@ -365,7 +492,12 @@ def random_horizontal_flip(image,
with tf.name_scope('RandomHorizontalFlip', values=[image, boxes]):
result = []
# random variable defining whether to do flip or not
do_a_flip_random = tf.greater(tf.random_uniform([], seed=seed), 0.5)
generator_func = functools.partial(tf.random_uniform, [], seed=seed)
do_a_flip_random = _get_or_create_preprocess_rand_vars(
generator_func,
preprocessor_cache.PreprocessorCache.HORIZONTAL_FLIP,
preprocess_vars_cache)
do_a_flip_random = tf.greater(do_a_flip_random, 0.5)
# flip image
image = tf.cond(do_a_flip_random, lambda: _flip_image(image), lambda: image)
......@@ -400,7 +532,8 @@ def random_vertical_flip(image,
masks=None,
keypoints=None,
keypoint_flip_permutation=None,
seed=None):
seed=None,
preprocess_vars_cache=None):
"""Randomly flips the image and detections vertically.
The probability of flipping the image is 50%.
......@@ -421,6 +554,10 @@ def random_vertical_flip(image,
keypoint_flip_permutation: rank 1 int32 tensor containing the keypoint flip
permutation.
seed: random seed
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same shape as input image.
......@@ -452,7 +589,11 @@ def random_vertical_flip(image,
with tf.name_scope('RandomVerticalFlip', values=[image, boxes]):
result = []
# random variable defining whether to do flip or not
do_a_flip_random = tf.greater(tf.random_uniform([], seed=seed), 0.5)
generator_func = functools.partial(tf.random_uniform, [], seed=seed)
do_a_flip_random = _get_or_create_preprocess_rand_vars(
generator_func, preprocessor_cache.PreprocessorCache.VERTICAL_FLIP,
preprocess_vars_cache)
do_a_flip_random = tf.greater(do_a_flip_random, 0.5)
# flip image
image = tf.cond(do_a_flip_random, lambda: _flip_image(image), lambda: image)
......@@ -486,7 +627,8 @@ def random_rotation90(image,
boxes=None,
masks=None,
keypoints=None,
seed=None):
seed=None,
preprocess_vars_cache=None):
"""Randomly rotates the image and detections 90 degrees counter-clockwise.
The probability of rotating the image is 50%. This can be combined with
......@@ -508,6 +650,10 @@ def random_rotation90(image,
[num_instances, num_keypoints, 2]. The keypoints are in y-x
normalized coordinates.
seed: random seed
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same shape as input image.
......@@ -533,7 +679,11 @@ def random_rotation90(image,
result = []
# random variable defining whether to rotate by 90 degrees or not
do_a_rot90_random = tf.greater(tf.random_uniform([], seed=seed), 0.5)
generator_func = functools.partial(tf.random_uniform, [], seed=seed)
do_a_rot90_random = _get_or_create_preprocess_rand_vars(
generator_func, preprocessor_cache.PreprocessorCache.ROTATION90,
preprocess_vars_cache)
do_a_rot90_random = tf.greater(do_a_rot90_random, 0.5)
# rotate image
image = tf.cond(do_a_rot90_random, lambda: _rot90_image(image),
......@@ -563,7 +713,11 @@ def random_rotation90(image,
return tuple(result)
def random_pixel_value_scale(image, minval=0.9, maxval=1.1, seed=None):
def random_pixel_value_scale(image,
minval=0.9,
maxval=1.1,
seed=None,
preprocess_vars_cache=None):
"""Scales each value in the pixels of the image.
This function scales each pixel independent of the other ones.
......@@ -576,17 +730,24 @@ def random_pixel_value_scale(image, minval=0.9, maxval=1.1, seed=None):
minval: lower ratio of scaling pixel values.
maxval: upper ratio of scaling pixel values.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same shape as input image.
"""
with tf.name_scope('RandomPixelValueScale', values=[image]):
color_coef = tf.random_uniform(
tf.shape(image),
minval=minval,
maxval=maxval,
dtype=tf.float32,
seed=seed)
generator_func = functools.partial(
tf.random_uniform, tf.shape(image),
minval=minval, maxval=maxval,
dtype=tf.float32, seed=seed)
color_coef = _get_or_create_preprocess_rand_vars(
generator_func,
preprocessor_cache.PreprocessorCache.PIXEL_VALUE_SCALE,
preprocess_vars_cache)
image = tf.multiply(image, color_coef)
image = tf.clip_by_value(image, 0.0, 1.0)
......@@ -597,7 +758,8 @@ def random_image_scale(image,
masks=None,
min_scale_ratio=0.5,
max_scale_ratio=2.0,
seed=None):
seed=None,
preprocess_vars_cache=None):
"""Scales the image size.
Args:
......@@ -608,6 +770,10 @@ def random_image_scale(image,
min_scale_ratio: minimum scaling ratio.
max_scale_ratio: maximum scaling ratio.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same rank as input image.
......@@ -619,10 +785,14 @@ def random_image_scale(image,
image_shape = tf.shape(image)
image_height = image_shape[0]
image_width = image_shape[1]
size_coef = tf.random_uniform([],
minval=min_scale_ratio,
maxval=max_scale_ratio,
dtype=tf.float32, seed=seed)
generator_func = functools.partial(
tf.random_uniform, [],
minval=min_scale_ratio, maxval=max_scale_ratio,
dtype=tf.float32, seed=seed)
size_coef = _get_or_create_preprocess_rand_vars(
generator_func, preprocessor_cache.PreprocessorCache.IMAGE_SCALE,
preprocess_vars_cache)
image_newysize = tf.to_int32(
tf.multiply(tf.to_float(image_height), size_coef))
image_newxsize = tf.to_int32(
......@@ -637,7 +807,10 @@ def random_image_scale(image,
return tuple(result)
def random_rgb_to_gray(image, probability=0.1, seed=None):
def random_rgb_to_gray(image,
probability=0.1,
seed=None,
preprocess_vars_cache=None):
"""Changes the image from RGB to Grayscale with the given probability.
Args:
......@@ -646,18 +819,25 @@ def random_rgb_to_gray(image, probability=0.1, seed=None):
probability: the probability of returning a grayscale image.
The probability should be a number in the range [0, 1].
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same shape as input image.
"""
def _image_to_gray(image):
image_gray1 = tf.image.rgb_to_grayscale(image)
image_gray1 = _rgb_to_grayscale(image)
image_gray3 = tf.image.grayscale_to_rgb(image_gray1)
return image_gray3
with tf.name_scope('RandomRGBtoGray', values=[image]):
# random variable defining whether to do flip or not
do_gray_random = tf.random_uniform([], seed=seed)
# random variable defining whether to change to grayscale or not
generator_func = functools.partial(tf.random_uniform, [], seed=seed)
do_gray_random = _get_or_create_preprocess_rand_vars(
generator_func, preprocessor_cache.PreprocessorCache.RGB_TO_GRAY,
preprocess_vars_cache)
image = tf.cond(
tf.greater(do_gray_random, probability), lambda: image,
......@@ -666,7 +846,10 @@ def random_rgb_to_gray(image, probability=0.1, seed=None):
return image
def random_adjust_brightness(image, max_delta=0.2):
def random_adjust_brightness(image,
max_delta=0.2,
seed=None,
preprocess_vars_cache=None):
"""Randomly adjusts brightness.
Makes sure the output image is still between 0 and 1.
......@@ -675,18 +858,34 @@ def random_adjust_brightness(image, max_delta=0.2):
image: rank 3 float32 tensor containing 1 image -> [height, width, channels]
with pixel values varying between [0, 1].
max_delta: how much to change the brightness. A value between [0, 1).
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same shape as input image.
"""
with tf.name_scope('RandomAdjustBrightness', values=[image]):
image = tf.image.random_brightness(image, max_delta)
generator_func = functools.partial(tf.random_uniform, [],
-max_delta, max_delta, seed=seed)
delta = _get_or_create_preprocess_rand_vars(
generator_func,
preprocessor_cache.PreprocessorCache.ADJUST_BRIGHTNESS,
preprocess_vars_cache)
image = tf.image.adjust_brightness(image, delta)
image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
return image
def random_adjust_contrast(image, min_delta=0.8, max_delta=1.25):
def random_adjust_contrast(image,
min_delta=0.8,
max_delta=1.25,
seed=None,
preprocess_vars_cache=None):
"""Randomly adjusts contrast.
Makes sure the output image is still between 0 and 1.
......@@ -698,17 +897,31 @@ def random_adjust_contrast(image, min_delta=0.8, max_delta=1.25):
max_delta: how much to change the contrast. Contrast will change with a
value between min_delta and max_delta. This value will be
multiplied by the current contrast of the image.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same shape as input image.
"""
with tf.name_scope('RandomAdjustContrast', values=[image]):
image = tf.image.random_contrast(image, min_delta, max_delta)
generator_func = functools.partial(tf.random_uniform, [],
min_delta, max_delta, seed=seed)
contrast_factor = _get_or_create_preprocess_rand_vars(
generator_func,
preprocessor_cache.PreprocessorCache.ADJUST_CONTRAST,
preprocess_vars_cache)
image = tf.image.adjust_contrast(image, contrast_factor)
image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
return image
def random_adjust_hue(image, max_delta=0.02):
def random_adjust_hue(image,
max_delta=0.02,
seed=None,
preprocess_vars_cache=None):
"""Randomly adjusts hue.
Makes sure the output image is still between 0 and 1.
......@@ -717,17 +930,31 @@ def random_adjust_hue(image, max_delta=0.02):
image: rank 3 float32 tensor containing 1 image -> [height, width, channels]
with pixel values varying between [0, 1].
max_delta: change hue randomly with a value between -max_delta and max_delta.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same shape as input image.
"""
with tf.name_scope('RandomAdjustHue', values=[image]):
image = tf.image.random_hue(image, max_delta)
generator_func = functools.partial(tf.random_uniform, [],
-max_delta, max_delta, seed=seed)
delta = _get_or_create_preprocess_rand_vars(
generator_func, preprocessor_cache.PreprocessorCache.ADJUST_HUE,
preprocess_vars_cache)
image = tf.image.adjust_hue(image, delta)
image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
return image
def random_adjust_saturation(image, min_delta=0.8, max_delta=1.25):
def random_adjust_saturation(image,
min_delta=0.8,
max_delta=1.25,
seed=None,
preprocess_vars_cache=None):
"""Randomly adjusts saturation.
Makes sure the output image is still between 0 and 1.
......@@ -739,17 +966,28 @@ def random_adjust_saturation(image, min_delta=0.8, max_delta=1.25):
max_delta: how much to change the saturation. Saturation will change with a
value between min_delta and max_delta. This value will be
multiplied by the current saturation of the image.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same shape as input image.
"""
with tf.name_scope('RandomAdjustSaturation', values=[image]):
image = tf.image.random_saturation(image, min_delta, max_delta)
generator_func = functools.partial(tf.random_uniform, [],
min_delta, max_delta, seed=seed)
saturation_factor = _get_or_create_preprocess_rand_vars(
generator_func,
preprocessor_cache.PreprocessorCache.ADJUST_SATURATION,
preprocess_vars_cache)
image = tf.image.adjust_saturation(image, saturation_factor)
image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
return image
def random_distort_color(image, color_ordering=0):
def random_distort_color(image, color_ordering=0, preprocess_vars_cache=None):
"""Randomly distorts color.
Randomly distorts color using a combination of brightness, hue, contrast
......@@ -759,6 +997,10 @@ def random_distort_color(image, color_ordering=0):
image: rank 3 float32 tensor containing 1 image -> [height, width, channels]
with pixel values varying between [0, 1].
color_ordering: Python int, a type of distortion (valid values: 0, 1).
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same shape as input image.
......@@ -768,20 +1010,34 @@ def random_distort_color(image, color_ordering=0):
"""
with tf.name_scope('RandomDistortColor', values=[image]):
if color_ordering == 0:
image = tf.image.random_brightness(image, max_delta=32. / 255.)
image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
image = tf.image.random_hue(image, max_delta=0.2)
image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
image = random_adjust_brightness(
image, max_delta=32. / 255.,
preprocess_vars_cache=preprocess_vars_cache)
image = random_adjust_saturation(
image, min_delta=0.5, max_delta=1.5,
preprocess_vars_cache=preprocess_vars_cache)
image = random_adjust_hue(
image, max_delta=0.2,
preprocess_vars_cache=preprocess_vars_cache)
image = random_adjust_contrast(
image, min_delta=0.5, max_delta=1.5,
preprocess_vars_cache=preprocess_vars_cache)
elif color_ordering == 1:
image = tf.image.random_brightness(image, max_delta=32. / 255.)
image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
image = tf.image.random_hue(image, max_delta=0.2)
image = random_adjust_brightness(
image, max_delta=32. / 255.,
preprocess_vars_cache=preprocess_vars_cache)
image = random_adjust_contrast(
image, min_delta=0.5, max_delta=1.5,
preprocess_vars_cache=preprocess_vars_cache)
image = random_adjust_saturation(
image, min_delta=0.5, max_delta=1.5,
preprocess_vars_cache=preprocess_vars_cache)
image = random_adjust_hue(
image, max_delta=0.2,
preprocess_vars_cache=preprocess_vars_cache)
else:
raise ValueError('color_ordering must be in {0, 1}')
# The random_* ops do not necessarily clamp.
image = tf.clip_by_value(image, 0.0, 1.0)
return image
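The pattern repeated in the hunks above (wrap the random draw in `functools.partial`, then fetch it through `_get_or_create_preprocess_rand_vars`) can be sketched without TensorFlow. This is a minimal illustration under assumptions, not the helper's actual body, which this part of the diff does not show: `get_or_create`, the plain-dict cache, and `random.uniform` standing in for `tf.random_uniform` are all hypothetical.

```python
import functools
import random


def get_or_create(generator_func, function_id, cache, key=''):
  """Hypothetical stand-in for _get_or_create_preprocess_rand_vars."""
  if cache is None:
    # No cache supplied: draw a fresh value on every call.
    return generator_func()
  cache_key = (function_id, key)
  if cache_key not in cache:
    # First call with this cache: draw once and record the value.
    cache[cache_key] = generator_func()
  return cache[cache_key]


def sketch_adjust_brightness(pixel, max_delta=0.2, cache=None):
  """Adds a random delta to one pixel value and clips to [0, 1]."""
  # partial defers the draw so it only happens on a cache miss.
  generator_func = functools.partial(random.uniform, -max_delta, max_delta)
  delta = get_or_create(generator_func, 'adjust_brightness', cache)
  return min(max(pixel + delta, 0.0), 1.0)


shared_cache = {}
first = sketch_adjust_brightness(0.5, cache=shared_cache)
second = sketch_adjust_brightness(0.5, cache=shared_cache)
```

Because both calls share `shared_cache`, the second call reuses the recorded delta instead of drawing a new one, which is the determinism the docstrings describe. Passing `cache=None` falls back to a fresh draw per call.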
......@@ -846,7 +1102,8 @@ def _strict_random_crop_image(image,
min_object_covered=1.0,
aspect_ratio_range=(0.75, 1.33),
area_range=(0.1, 1.0),
overlap_thresh=0.3):
overlap_thresh=0.3,
preprocess_vars_cache=None):
"""Performs random crop.
Note: boxes will be clipped to the crop. Keypoint coordinates that are
......@@ -879,6 +1136,10 @@ def _strict_random_crop_image(image,
original image.
overlap_thresh: minimum overlap thresh with new cropped
image to keep the box.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same rank as input image.
......@@ -901,7 +1162,8 @@ def _strict_random_crop_image(image,
tf.clip_by_value(
boxes, clip_value_min=0.0, clip_value_max=1.0), 1)
sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box(
generator_func = functools.partial(
tf.image.sample_distorted_bounding_box,
image_shape,
bounding_boxes=boxes_expanded,
min_object_covered=min_object_covered,
......@@ -910,6 +1172,13 @@ def _strict_random_crop_image(image,
max_attempts=100,
use_image_if_no_bounding_boxes=True)
# for ssd cropping, each value of min_object_covered has its own
# cached random variable
sample_distorted_bounding_box = _get_or_create_preprocess_rand_vars(
generator_func,
preprocessor_cache.PreprocessorCache.STRICT_CROP_IMAGE,
preprocess_vars_cache, key=min_object_covered)
im_box_begin, im_box_size, im_box = sample_distorted_bounding_box
new_image = tf.slice(image, im_box_begin, im_box_size)
......@@ -985,7 +1254,8 @@ def random_crop_image(image,
area_range=(0.1, 1.0),
overlap_thresh=0.3,
random_coef=0.0,
seed=None):
seed=None,
preprocess_vars_cache=None):
"""Randomly crops the image.
Given the input image and its bounding boxes, this op randomly
......@@ -1030,6 +1300,10 @@ def random_crop_image(image,
cropped image, and if it is 1.0, we will always get the
original image.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: Image shape will be [new_height, new_width, channels].
......@@ -1057,13 +1331,17 @@ def random_crop_image(image,
min_object_covered=min_object_covered,
aspect_ratio_range=aspect_ratio_range,
area_range=area_range,
overlap_thresh=overlap_thresh)
overlap_thresh=overlap_thresh,
preprocess_vars_cache=preprocess_vars_cache)
# avoids tf.cond to make faster RCNN training on borg. See b/140057645.
if random_coef < sys.float_info.min:
result = strict_random_crop_image_fn()
else:
do_a_crop_random = tf.random_uniform([], seed=seed)
generator_func = functools.partial(tf.random_uniform, [], seed=seed)
do_a_crop_random = _get_or_create_preprocess_rand_vars(
generator_func, preprocessor_cache.PreprocessorCache.CROP_IMAGE,
preprocess_vars_cache)
do_a_crop_random = tf.greater(do_a_crop_random, random_coef)
outputs = [image, boxes, labels]
......@@ -1085,7 +1363,8 @@ def random_pad_image(image,
min_image_size=None,
max_image_size=None,
pad_color=None,
seed=None):
seed=None,
preprocess_vars_cache=None):
"""Randomly pads the image.
This function randomly pads the image with zeros. The final size of the
......@@ -1111,8 +1390,11 @@ def random_pad_image(image,
pad_color: padding color. A rank 1 tensor of [3] with dtype=tf.float32.
if set as None, it will be set to average color of the input
image.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: Image shape will be [new_height, new_width, channels].
......@@ -1156,6 +1438,12 @@ def random_pad_image(image,
lambda: _random_integer(0, target_width - image_width, seed),
lambda: tf.constant(0, dtype=tf.int32))
gen_func = lambda: (target_height, target_width, offset_height, offset_width)
params = _get_or_create_preprocess_rand_vars(
gen_func, preprocessor_cache.PreprocessorCache.PAD_IMAGE,
preprocess_vars_cache)
target_height, target_width, offset_height, offset_width = params
new_image = tf.image.pad_to_bounding_box(
image,
offset_height=offset_height,
......@@ -1201,7 +1489,8 @@ def random_crop_pad_image(image,
min_padded_size_ratio=(1.0, 1.0),
max_padded_size_ratio=(2.0, 2.0),
pad_color=None,
seed=None):
seed=None,
preprocess_vars_cache=None):
"""Randomly crops and pads the image.
Given an input image and its bounding boxes, this op first randomly crops
......@@ -1242,6 +1531,10 @@ def random_crop_pad_image(image,
if set as None, it will be set to average color of the randomly
cropped image.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
padded_image: padded image.
......@@ -1264,7 +1557,8 @@ def random_crop_pad_image(image,
area_range=area_range,
overlap_thresh=overlap_thresh,
random_coef=random_coef,
seed=seed)
seed=seed,
preprocess_vars_cache=preprocess_vars_cache)
cropped_image, cropped_boxes, cropped_labels = result[:3]
......@@ -1281,7 +1575,8 @@ def random_crop_pad_image(image,
min_image_size=min_image_size,
max_image_size=max_image_size,
pad_color=pad_color,
seed=seed)
seed=seed,
preprocess_vars_cache=preprocess_vars_cache)
cropped_padded_output = (padded_image, padded_boxes, cropped_labels)
......@@ -1300,7 +1595,8 @@ def random_crop_to_aspect_ratio(image,
keypoints=None,
aspect_ratio=1.0,
overlap_thresh=0.3,
seed=None):
seed=None,
preprocess_vars_cache=None):
"""Randomly crops an image to the specified aspect ratio.
Randomly crops a portion of the image such that the crop is of the
......@@ -1332,6 +1628,10 @@ def random_crop_to_aspect_ratio(image,
overlap_thresh: minimum overlap thresh with new cropped
image to keep the box.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same rank as input image.
......@@ -1375,6 +1675,13 @@ def random_crop_to_aspect_ratio(image,
# offset_height is randomly chosen from [0, orig_height - target_height]
offset_height = _random_integer(0, orig_height - target_height + 1, seed)
offset_width = _random_integer(0, orig_width - target_width + 1, seed)
generator_func = lambda: (offset_height, offset_width)
offset_height, offset_width = _get_or_create_preprocess_rand_vars(
generator_func,
preprocessor_cache.PreprocessorCache.CROP_TO_ASPECT_RATIO,
preprocess_vars_cache)
new_image = tf.image.crop_to_bounding_box(
image, offset_height, offset_width, target_height, target_width)
......@@ -1437,7 +1744,8 @@ def random_pad_to_aspect_ratio(image,
aspect_ratio=1.0,
min_padded_size_ratio=(1.0, 1.0),
max_padded_size_ratio=(2.0, 2.0),
seed=None):
seed=None,
preprocess_vars_cache=None):
"""Randomly zero pads an image to the specified aspect ratio.
Pads the image so that the resulting image will have the specified aspect
......@@ -1465,6 +1773,10 @@ def random_pad_to_aspect_ratio(image,
max_padded_size_ratio: max ratio of padded image height and width to the
input image's height and width.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same rank as input image.
......@@ -1511,7 +1823,13 @@ def random_pad_to_aspect_ratio(image,
min_scale = tf.maximum(min_height / target_height, min_width / target_width)
max_scale = tf.minimum(max_height / target_height, max_width / target_width)
scale = tf.random_uniform([], min_scale, max_scale, seed=seed)
generator_func = functools.partial(tf.random_uniform, [],
min_scale, max_scale, seed=seed)
scale = _get_or_create_preprocess_rand_vars(
generator_func,
preprocessor_cache.PreprocessorCache.PAD_TO_ASPECT_RATIO,
preprocess_vars_cache)
target_height = scale * target_height
target_width = scale * target_width
......@@ -1550,7 +1868,8 @@ def random_black_patches(image,
max_black_patches=10,
probability=0.5,
size_to_image_ratio=0.1,
random_seed=None):
random_seed=None,
preprocess_vars_cache=None):
"""Randomly adds some black patches to the image.
This op adds up to max_black_patches square black patches of a fixed size
......@@ -1567,15 +1886,20 @@ def random_black_patches(image,
box_size = size_to_image_ratio *
min(image_width, image_height)
random_seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image
"""
def add_black_patch_to_image(image):
def add_black_patch_to_image(image, idx):
"""Function for adding one patch to the image.
Args:
image: image
idx: counter for number of patches that could have been added
Returns:
image with a randomly added black box
......@@ -1587,10 +1911,19 @@ def random_black_patches(image,
tf.multiply(
tf.minimum(tf.to_float(image_height), tf.to_float(image_width)),
size_to_image_ratio))
normalized_y_min = tf.random_uniform(
[], minval=0.0, maxval=(1.0 - size_to_image_ratio), seed=random_seed)
normalized_x_min = tf.random_uniform(
[], minval=0.0, maxval=(1.0 - size_to_image_ratio), seed=random_seed)
generator_func = functools.partial(tf.random_uniform, [], minval=0.0,
maxval=(1.0 - size_to_image_ratio),
seed=random_seed)
normalized_y_min = _get_or_create_preprocess_rand_vars(
generator_func,
preprocessor_cache.PreprocessorCache.ADD_BLACK_PATCH,
preprocess_vars_cache, key=str(idx) + 'y')
normalized_x_min = _get_or_create_preprocess_rand_vars(
generator_func,
preprocessor_cache.PreprocessorCache.ADD_BLACK_PATCH,
preprocess_vars_cache, key=str(idx) + 'x')
y_min = tf.to_int32(normalized_y_min * tf.to_float(image_height))
x_min = tf.to_int32(normalized_x_min * tf.to_float(image_width))
black_box = tf.ones([box_size, box_size, 3], dtype=tf.float32)
......@@ -1600,13 +1933,17 @@ def random_black_patches(image,
return image
with tf.name_scope('RandomBlackPatchInImage', values=[image]):
for _ in range(max_black_patches):
random_prob = tf.random_uniform(
[], minval=0.0, maxval=1.0, dtype=tf.float32, seed=random_seed)
for idx in range(max_black_patches):
generator_func = functools.partial(tf.random_uniform, [],
minval=0.0, maxval=1.0,
dtype=tf.float32, seed=random_seed)
random_prob = _get_or_create_preprocess_rand_vars(
generator_func,
preprocessor_cache.PreprocessorCache.BLACK_PATCHES,
preprocess_vars_cache, key=idx)
image = tf.cond(
tf.greater(random_prob, probability), lambda: image,
lambda: add_black_patch_to_image(image))
functools.partial(add_black_patch_to_image, image=image, idx=idx))
return image
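The `key` arguments in `random_black_patches` give each patch index its own cached values, separate for the y and x offsets. The sketch below illustrates that keying in plain Python. It assumes the cache is addressed by a `(function_id, key)` pair; `get_or_create` and `sketch_patch_positions` are hypothetical names, `random.uniform` stands in for `tf.random_uniform`, and the per-patch probability gate is omitted for brevity.

```python
import functools
import random


def get_or_create(generator_func, function_id, cache, key=''):
  """Hypothetical helper: one cached value per (function_id, key) pair."""
  if cache is None:
    return generator_func()
  cache_key = (function_id, key)
  if cache_key not in cache:
    cache[cache_key] = generator_func()
  return cache[cache_key]


def sketch_patch_positions(max_patches, size_ratio, cache=None):
  """Returns one normalized (y_min, x_min) pair per patch index."""
  generator_func = functools.partial(random.uniform, 0.0, 1.0 - size_ratio)
  positions = []
  for idx in range(max_patches):
    # Distinct keys keep the y and x draws of each patch independent.
    y_min = get_or_create(
        generator_func, 'add_black_patch', cache, key=str(idx) + 'y')
    x_min = get_or_create(
        generator_func, 'add_black_patch', cache, key=str(idx) + 'x')
    positions.append((y_min, x_min))
  return positions


patch_cache = {}
run_a = sketch_patch_positions(3, 0.1, patch_cache)
run_b = sketch_patch_positions(3, 0.1, patch_cache)
```

Without the per-index key, every patch would share a single cached offset and all patches would land in the same place on a replay.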
......@@ -1624,12 +1961,16 @@ def image_to_float(image):
return image
def random_resize_method(image, target_size):
def random_resize_method(image, target_size, preprocess_vars_cache=None):
"""Uses a random resize method to resize the image to target size.
Args:
image: a rank 3 tensor.
target_size: a list of [target_height, target_width]
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
resized image.
......@@ -1638,7 +1979,9 @@ def random_resize_method(image, target_size):
resized_image = _apply_with_random_selector(
image,
lambda x, method: tf.image.resize_images(x, target_size, method),
num_cases=4)
num_cases=4,
preprocess_vars_cache=preprocess_vars_cache,
key=preprocessor_cache.PreprocessorCache.RESIZE_METHOD)
return resized_image
......@@ -2000,7 +2343,7 @@ def rgb_to_gray(image):
Returns:
image: A single channel grayscale image -> [height, width, 1].
"""
return tf.image.rgb_to_grayscale(image)
return _rgb_to_grayscale(image)
def ssd_random_crop(image,
......@@ -2014,7 +2357,8 @@ def ssd_random_crop(image,
area_range=((0.1, 1.0),) * 7,
overlap_thresh=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0),
random_coef=(0.15,) * 7,
seed=None):
seed=None,
preprocess_vars_cache=None):
"""Random crop preprocessing with default parameters as in SSD paper.
Liu et al., SSD: Single shot multibox detector.
......@@ -2048,6 +2392,10 @@ def ssd_random_crop(image,
cropped image, and if it is 1.0, we will always get the
original image.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same rank as input image.
......@@ -2100,14 +2448,17 @@ def ssd_random_crop(image,
area_range=area_range[index],
overlap_thresh=overlap_thresh[index],
random_coef=random_coef[index],
seed=seed)
seed=seed,
preprocess_vars_cache=preprocess_vars_cache)
result = _apply_with_random_selector_tuples(
tuple(
t for t in (image, boxes, labels, label_scores, masks, keypoints)
if t is not None),
random_crop_selector,
num_cases=len(min_object_covered))
num_cases=len(min_object_covered),
preprocess_vars_cache=preprocess_vars_cache,
key=preprocessor_cache.PreprocessorCache.SSD_CROP_SELECTOR_ID)
return result
......@@ -2123,7 +2474,8 @@ def ssd_random_crop_pad(image,
min_padded_size_ratio=((1.0, 1.0),) * 6,
max_padded_size_ratio=((2.0, 2.0),) * 6,
pad_color=(None,) * 6,
seed=None):
seed=None,
preprocess_vars_cache=None):
"""Random crop preprocessing with default parameters as in SSD paper.
Liu et al., SSD: Single shot multibox detector.
......@@ -2159,6 +2511,10 @@ def ssd_random_crop_pad(image,
if set as None, it will be set to average color of the randomly
cropped image.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: Image shape will be [new_height, new_width, channels].
......@@ -2188,12 +2544,15 @@ def ssd_random_crop_pad(image,
min_padded_size_ratio=min_padded_size_ratio[index],
max_padded_size_ratio=max_padded_size_ratio[index],
pad_color=pad_color[index],
seed=seed)
seed=seed,
preprocess_vars_cache=preprocess_vars_cache)
return _apply_with_random_selector_tuples(
tuple(t for t in (image, boxes, labels, label_scores) if t is not None),
random_crop_pad_selector,
num_cases=len(min_object_covered))
num_cases=len(min_object_covered),
preprocess_vars_cache=preprocess_vars_cache,
key=preprocessor_cache.PreprocessorCache.SSD_CROP_PAD_SELECTOR_ID)
def ssd_random_crop_fixed_aspect_ratio(
......@@ -2208,7 +2567,8 @@ def ssd_random_crop_fixed_aspect_ratio(
area_range=((0.1, 1.0),) * 7,
overlap_thresh=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0),
random_coef=(0.15,) * 7,
seed=None):
seed=None,
preprocess_vars_cache=None):
"""Random crop preprocessing with default parameters as in SSD paper.
Liu et al., SSD: Single shot multibox detector.
......@@ -2245,6 +2605,10 @@ def ssd_random_crop_fixed_aspect_ratio(
cropped image, and if it is 1.0, we will always get the
original image.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same rank as input image.
......@@ -2263,7 +2627,8 @@ def ssd_random_crop_fixed_aspect_ratio(
crop_result = ssd_random_crop(
image, boxes, labels, label_scores, masks, keypoints, min_object_covered,
aspect_ratio_range, area_range, overlap_thresh, random_coef, seed)
aspect_ratio_range, area_range, overlap_thresh, random_coef, seed,
preprocess_vars_cache)
i = 3
new_image, new_boxes, new_labels = crop_result[:i]
new_label_scores = None
......@@ -2285,7 +2650,8 @@ def ssd_random_crop_fixed_aspect_ratio(
new_masks,
new_keypoints,
aspect_ratio=aspect_ratio,
seed=seed)
seed=seed,
preprocess_vars_cache=preprocess_vars_cache)
return result
......@@ -2305,7 +2671,8 @@ def ssd_random_crop_pad_fixed_aspect_ratio(
random_coef=(0.15,) * 7,
min_padded_size_ratio=(1.0, 1.0),
max_padded_size_ratio=(2.0, 2.0),
seed=None):
seed=None,
preprocess_vars_cache=None):
"""Random crop and pad preprocessing with default parameters as in SSD paper.
Liu et al., SSD: Single shot multibox detector.
......@@ -2348,6 +2715,10 @@ def ssd_random_crop_pad_fixed_aspect_ratio(
max_padded_size_ratio: max ratio of padded image height and width to the
input image's height and width.
seed: random seed.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
image: image which is the same rank as input image.
......@@ -2364,7 +2735,8 @@ def ssd_random_crop_pad_fixed_aspect_ratio(
"""
crop_result = ssd_random_crop(
image, boxes, labels, label_scores, masks, keypoints, min_object_covered,
aspect_ratio_range, area_range, overlap_thresh, random_coef, seed)
aspect_ratio_range, area_range, overlap_thresh, random_coef, seed,
preprocess_vars_cache)
i = 3
new_image, new_boxes, new_labels = crop_result[:i]
new_label_scores = None
......@@ -2386,7 +2758,8 @@ def ssd_random_crop_pad_fixed_aspect_ratio(
aspect_ratio=aspect_ratio,
min_padded_size_ratio=min_padded_size_ratio,
max_padded_size_ratio=max_padded_size_ratio,
seed=seed)
seed=seed,
preprocess_vars_cache=preprocess_vars_cache)
result = list(result)
if new_label_scores is not None:
......@@ -2534,7 +2907,10 @@ def get_default_func_arg_map(include_label_scores=False,
return prep_func_arg_map
def preprocess(tensor_dict, preprocess_options, func_arg_map=None):
def preprocess(tensor_dict,
preprocess_options,
func_arg_map=None,
preprocess_vars_cache=None):
"""Preprocess images and bounding boxes.
Various types of preprocessing (to be implemented) based on the
......@@ -2559,6 +2935,10 @@ def preprocess(tensor_dict, preprocess_options, func_arg_map=None):
their values.
func_arg_map: mapping from preprocessing functions to arguments that they
expect to receive and return.
preprocess_vars_cache: PreprocessorCache object that records previously
performed augmentations. Updated in-place. If this
function is called multiple times with the same
non-null cache, it will perform deterministically.
Returns:
tensor_dict: which contains the preprocessed images, bounding boxes, etc.
......@@ -2598,6 +2978,9 @@ def preprocess(tensor_dict, preprocess_options, func_arg_map=None):
return tensor_dict[key] if key is not None else None
args = [get_arg(a) for a in arg_names]
if (preprocess_vars_cache is not None and
'preprocess_vars_cache' in inspect.getargspec(func).args):
params['preprocess_vars_cache'] = preprocess_vars_cache
results = func(*args, **params)
if not isinstance(results, (list, tuple)):
results = (results,)
......
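The last hunk of `preprocess` forwards the cache only to functions whose signature declares a `preprocess_vars_cache` argument. A minimal standalone sketch of that check follows. It uses `inspect.signature` in place of the diff's `inspect.getargspec` (which was removed in Python 3.11), and `accepts_cache`, `call_with_optional_cache`, and the two toy functions are hypothetical names introduced only for illustration.

```python
import inspect


def accepts_cache(func):
  """True if func declares a preprocess_vars_cache parameter."""
  return 'preprocess_vars_cache' in inspect.signature(func).parameters


def call_with_optional_cache(func, args, params, preprocess_vars_cache=None):
  """Calls func, forwarding the cache only when func can accept it."""
  params = dict(params)  # copy so the caller's dict is not mutated
  if preprocess_vars_cache is not None and accepts_cache(func):
    params['preprocess_vars_cache'] = preprocess_vars_cache
  return func(*args, **params)


def with_cache(image, preprocess_vars_cache=None):
  # Reports whether a cache object was actually received.
  return (image, preprocess_vars_cache is not None)


def without_cache(image):
  # Deterministic op: has no cache parameter at all.
  return (image, False)


cache = {}
result_with = call_with_optional_cache(with_cache, ['img'], {}, cache)
result_without = call_with_optional_cache(without_cache, ['img'], {}, cache)
```

Checking the signature first lets deterministic preprocessing functions stay unchanged, since they would otherwise raise a `TypeError` on the unexpected keyword argument.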
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Records previous preprocessing operations and allows them to be repeated.
Used with object_detection.core.preprocessor. Passing a PreprocessorCache
into individual data augmentation functions or the general preprocess() function
will store all randomly generated variables in the PreprocessorCache. When
a preprocessor function is called multiple times with the same
PreprocessorCache object, that function will perform the same augmentation
on all calls.
"""
from collections import defaultdict
class PreprocessorCache(object):
"""Dictionary wrapper storing random variables generated during preprocessing.
"""
# Constant keys representing different preprocessing functions
ROTATION90 = 'rotation90'
HORIZONTAL_FLIP = 'horizontal_flip'
VERTICAL_FLIP = 'vertical_flip'
PIXEL_VALUE_SCALE = 'pixel_value_scale'
IMAGE_SCALE = 'image_scale'
RGB_TO_GRAY = 'rgb_to_gray'
ADJUST_BRIGHTNESS = 'adjust_brightness'
ADJUST_CONTRAST = 'adjust_contrast'
ADJUST_HUE = 'adjust_hue'
ADJUST_SATURATION = 'adjust_saturation'
DISTORT_COLOR = 'distort_color'
STRICT_CROP_IMAGE = 'strict_crop_image'
CROP_IMAGE = 'crop_image'
PAD_IMAGE = 'pad_image'
CROP_TO_ASPECT_RATIO = 'crop_to_aspect_ratio'
RESIZE_METHOD = 'resize_method'
PAD_TO_ASPECT_RATIO = 'pad_to_aspect_ratio'
BLACK_PATCHES = 'black_patches'
ADD_BLACK_PATCH = 'add_black_patch'
SELECTOR = 'selector'
SELECTOR_TUPLES = 'selector_tuples'
SSD_CROP_SELECTOR_ID = 'ssd_crop_selector_id'
SSD_CROP_PAD_SELECTOR_ID = 'ssd_crop_pad_selector_id'
# 23 permitted function ids
_VALID_FNS = [ROTATION90, HORIZONTAL_FLIP, VERTICAL_FLIP, PIXEL_VALUE_SCALE,
IMAGE_SCALE, RGB_TO_GRAY, ADJUST_BRIGHTNESS, ADJUST_CONTRAST,
ADJUST_HUE, ADJUST_SATURATION, DISTORT_COLOR, STRICT_CROP_IMAGE,
CROP_IMAGE, PAD_IMAGE, CROP_TO_ASPECT_RATIO, RESIZE_METHOD,
PAD_TO_ASPECT_RATIO, BLACK_PATCHES, ADD_BLACK_PATCH, SELECTOR,
SELECTOR_TUPLES, SSD_CROP_SELECTOR_ID, SSD_CROP_PAD_SELECTOR_ID]
def __init__(self):
self._history = defaultdict(dict)
def clear(self):
"""Resets cache."""
self._history = defaultdict(dict)
def get(self, function_id, key):
"""Gets stored value given a function id and key.
Args:
function_id: identifier for the preprocessing function used.
key: identifier for the variable stored.
Returns:
value: the corresponding value, expected to be a tensor or
nested structure of tensors.
Raises:
ValueError: if function_id is not one of the 23 valid function ids.
"""
if function_id not in self._VALID_FNS:
raise ValueError('Function id not recognized: %s.' % str(function_id))
return self._history[function_id].get(key)
def update(self, function_id, key, value):
"""Adds a value to the dictionary.
Args:
function_id: identifier for the preprocessing function used.
key: identifier for the variable stored.
value: the value to store, expected to be a tensor or nested structure
of tensors.
Raises:
ValueError: if function_id is not one of the 23 valid function ids.
"""
if function_id not in self._VALID_FNS:
raise ValueError('Function id not recognized: %s.' % str(function_id))
self._history[function_id][key] = value
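The get/update pattern above can be sketched without TensorFlow. This is an illustration under stated assumptions, not the preprocessor's actual code: `MiniCache` is a simplified stand-in for `PreprocessorCache`, `get_or_create` is a hypothetical helper, and Python's `random` module replaces TensorFlow's random ops.

```python
import random
from collections import defaultdict


class MiniCache(object):
  """Simplified stand-in for PreprocessorCache (illustrative only)."""

  HORIZONTAL_FLIP = 'horizontal_flip'
  _VALID_FNS = [HORIZONTAL_FLIP]

  def __init__(self):
    self._history = defaultdict(dict)

  def get(self, function_id, key):
    if function_id not in self._VALID_FNS:
      raise ValueError('Function id not recognized: %s.' % str(function_id))
    return self._history[function_id].get(key)

  def update(self, function_id, key, value):
    if function_id not in self._VALID_FNS:
      raise ValueError('Function id not recognized: %s.' % str(function_id))
    self._history[function_id][key] = value


def get_or_create(generator, function_id, cache, key=''):
  """Hypothetical helper: reuse a cached random value or draw a new one."""
  if cache is None:
    return generator()
  value = cache.get(function_id, key)
  if value is None:
    value = generator()
    cache.update(function_id, key, value)
  return value


cache = MiniCache()
first = get_or_create(random.random, MiniCache.HORIZONTAL_FLIP, cache)
second = get_or_create(random.random, MiniCache.HORIZONTAL_FLIP, cache)
# Both calls share the cache, so the second call reuses the first draw.
```

Passing `cache=None` draws a fresh value on every call, which corresponds to the uncached behaviour.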
......@@ -21,6 +21,7 @@ import six
import tensorflow as tf
from object_detection.core import preprocessor
from object_detection.core import preprocessor_cache
from object_detection.core import standard_fields as fields
if six.PY2:
......@@ -290,6 +291,15 @@ class PreprocessorTest(tf.test.TestCase):
def expectedLabelsAfterThresholdingWithMissingScore(self):
return tf.constant([2], dtype=tf.float32)
def testRgbToGrayscale(self):
images = self.createTestImages()
grayscale_images = preprocessor._rgb_to_grayscale(images)
expected_images = tf.image.rgb_to_grayscale(images)
with self.test_session() as sess:
(grayscale_images, expected_images) = sess.run(
[grayscale_images, expected_images])
self.assertAllEqual(expected_images, grayscale_images)
def testNormalizeImage(self):
preprocess_options = [(preprocessor.normalize_image, {
'original_minval': 0,
......@@ -435,6 +445,55 @@ class PreprocessorTest(tf.test.TestCase):
rotated_mask, expected_mask = sess.run([rotated_mask, expected_mask])
self.assertAllEqual(rotated_mask.flatten(), expected_mask.flatten())
def _testPreprocessorCache(self,
preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False,
num_runs=4):
cache = preprocessor_cache.PreprocessorCache()
images = self.createTestImages()
boxes = self.createTestBoxes()
classes = self.createTestLabels()
masks = self.createTestMasks()
keypoints = self.createTestKeypoints()
preprocessor_arg_map = preprocessor.get_default_func_arg_map(
include_instance_masks=test_masks, include_keypoints=test_keypoints)
out = []
for i in range(num_runs):
tensor_dict = {
fields.InputDataFields.image: images,
}
num_outputs = 1
if test_boxes:
tensor_dict[fields.InputDataFields.groundtruth_boxes] = boxes
tensor_dict[fields.InputDataFields.groundtruth_classes] = classes
num_outputs += 1
if test_masks:
tensor_dict[fields.InputDataFields.groundtruth_instance_masks] = masks
num_outputs += 1
if test_keypoints:
tensor_dict[fields.InputDataFields.groundtruth_keypoints] = keypoints
num_outputs += 1
out.append(preprocessor.preprocess(
tensor_dict, preprocess_options, preprocessor_arg_map, cache))
with self.test_session() as sess:
to_run = []
for i in range(num_runs):
to_run.append(out[i][fields.InputDataFields.image])
if test_boxes:
to_run.append(out[i][fields.InputDataFields.groundtruth_boxes])
if test_masks:
to_run.append(
out[i][fields.InputDataFields.groundtruth_instance_masks])
if test_keypoints:
to_run.append(out[i][fields.InputDataFields.groundtruth_keypoints])
out_array = sess.run(to_run)
for i in range(num_outputs, len(out_array)):
self.assertAllClose(out_array[i], out_array[i - num_outputs])
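The final loop of this helper compares each fetched output with the same output from the previous run, relying on the flat `to_run` list holding `num_outputs` entries per run. A plain-Python sketch of that strided comparison follows; `runs_match` is a hypothetical name, and simple equality stands in for `assertAllClose`.

```python
def runs_match(flat_outputs, num_outputs):
  """Checks that every run produced the same outputs as the previous run."""
  for i in range(num_outputs, len(flat_outputs)):
    # Entry i - num_outputs is the same output slot from the previous run.
    if flat_outputs[i] != flat_outputs[i - num_outputs]:
      return False
  return True
```

For example, `runs_match(['img', 'box', 'img', 'box'], 2)` compares the second run's image and boxes with the first run's.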
def testRandomHorizontalFlip(self):
preprocess_options = [(preprocessor.random_horizontal_flip, {})]
images = self.expectedImagesAfterNormalization()
......@@ -491,6 +550,16 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(boxes_, boxes_expected_)
self.assertAllClose(images_diff_, images_diff_expected_)
def testRandomHorizontalFlipWithCache(self):
keypoint_flip_permutation = self.createKeypointFlipPermutation()
preprocess_options = [
(preprocessor.random_horizontal_flip,
{'keypoint_flip_permutation': keypoint_flip_permutation})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRunRandomHorizontalFlipWithMaskAndKeypoints(self):
preprocess_options = [(preprocessor.random_horizontal_flip, {})]
image_height = 3
......@@ -578,6 +647,16 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(boxes_, boxes_expected_)
self.assertAllClose(images_diff_, images_diff_expected_)
def testRandomVerticalFlipWithCache(self):
keypoint_flip_permutation = self.createKeypointFlipPermutation()
preprocess_options = [
(preprocessor.random_vertical_flip,
{'keypoint_flip_permutation': keypoint_flip_permutation})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRunRandomVerticalFlipWithMaskAndKeypoints(self):
preprocess_options = [(preprocessor.random_vertical_flip, {})]
image_height = 3
......@@ -665,6 +744,13 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(boxes_, boxes_expected_)
self.assertAllClose(images_diff_, images_diff_expected_)
def testRandomRotation90WithCache(self):
preprocess_options = [(preprocessor.random_rotation90, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRunRandomRotation90WithMaskAndKeypoints(self):
preprocess_options = [(preprocessor.random_rotation90, {})]
image_height = 3
......@@ -716,6 +802,20 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(values_greater_, values_true_)
self.assertAllClose(values_less_, values_true_)
def testRandomPixelValueScaleWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_pixel_value_scale, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def testRandomImageScale(self):
preprocess_options = [(preprocessor.random_image_scale, {})]
images_original = self.createTestImages()
......@@ -736,6 +836,13 @@ class PreprocessorTest(tf.test.TestCase):
self.assertTrue(
images_original_shape_[2] * 2.0 >= images_scaled_shape_[2])
def testRandomImageScaleWithCache(self):
preprocess_options = [(preprocessor.random_image_scale, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomRGBtoGray(self):
preprocess_options = [(preprocessor.random_rgb_to_gray, {})]
images_original = self.createTestImages()
......@@ -769,6 +876,14 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(images_g_diff_, image_zero1_)
self.assertAllClose(images_b_diff_, image_zero1_)
def testRandomRGBtoGrayWithCache(self):
preprocess_options = [(
preprocessor.random_rgb_to_gray, {'probability': 0.5})]
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomAdjustBrightness(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -789,6 +904,20 @@ class PreprocessorTest(tf.test.TestCase):
[image_original_shape, image_bright_shape])
self.assertAllEqual(image_original_shape_, image_bright_shape_)
def testRandomAdjustBrightnessWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_adjust_brightness, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomAdjustContrast(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -809,6 +938,20 @@ class PreprocessorTest(tf.test.TestCase):
[image_original_shape, image_contrast_shape])
self.assertAllEqual(image_original_shape_, image_contrast_shape_)
def testRandomAdjustContrastWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_adjust_contrast, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomAdjustHue(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -829,6 +972,20 @@ class PreprocessorTest(tf.test.TestCase):
[image_original_shape, image_hue_shape])
self.assertAllEqual(image_original_shape_, image_hue_shape_)
def testRandomAdjustHueWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_adjust_hue, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomDistortColor(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -849,6 +1006,20 @@ class PreprocessorTest(tf.test.TestCase):
[images_original_shape, images_distorted_color_shape])
self.assertAllEqual(images_original_shape_, images_distorted_color_shape_)
def testRandomDistortColorWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_distort_color, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomJitterBoxes(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.random_jitter_boxes, {}))
......@@ -900,6 +1071,21 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllEqual(boxes_rank_, distorted_boxes_rank_)
self.assertAllEqual(images_rank_, distorted_images_rank_)
def testRandomCropImageWithCache(self):
preprocess_options = [(preprocessor.random_rgb_to_gray,
{'probability': 0.5}),
(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1,
}),
(preprocessor.random_crop_image, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def testRandomCropImageGrayscale(self):
preprocessing_options = [(preprocessor.rgb_to_gray, {}),
(preprocessor.normalize_image, {
......@@ -1446,6 +1632,13 @@ class PreprocessorTest(tf.test.TestCase):
self.expectedKeypointsAfterThresholding()])
self.assertAllClose(retained_keypoints_, expected_keypoints_)
def testRandomCropToAspectRatioWithCache(self):
preprocess_options = [(preprocessor.random_crop_to_aspect_ratio, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def testRunRandomCropToAspectRatioWithMasks(self):
image = self.createColorfulTestImage()
boxes = self.createTestBoxes()
......@@ -1536,6 +1729,13 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(distorted_keypoints_.flatten(),
expected_keypoints.flatten())
def testRandomPadToAspectRatioWithCache(self):
preprocess_options = [(preprocessor.random_pad_to_aspect_ratio, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRunRandomPadToAspectRatioWithMasks(self):
image = self.createColorfulTestImage()
boxes = self.createTestBoxes()
......@@ -1624,6 +1824,17 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(distorted_keypoints_.flatten(),
expected_keypoints.flatten())
def testRandomPadImageWithCache(self):
preprocess_options = [(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1,}), (preprocessor.random_pad_image, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRandomPadImage(self):
preprocessing_options = [(preprocessor.normalize_image, {
'original_minval': 0,
......@@ -1670,6 +1881,17 @@ class PreprocessorTest(tf.test.TestCase):
self.assertTrue(np.all((boxes_[:, 3] - boxes_[:, 1]) >= (
padded_boxes_[:, 3] - padded_boxes_[:, 1])))
def testRandomCropPadImageWithCache(self):
preprocess_options = [(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1,}), (preprocessor.random_crop_pad_image, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRandomCropPadImageWithRandomCoefOne(self):
preprocessing_options = [(preprocessor.normalize_image, {
'original_minval': 0,
......@@ -1788,6 +2010,22 @@ class PreprocessorTest(tf.test.TestCase):
self.assertEqual(images_shape_[1], padded_images_shape_[1])
self.assertEqual(2 * images_shape_[2], padded_images_shape_[2])
def testRandomBlackPatchesWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_black_patches, {
'size_to_image_ratio': 0.5
}))
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRandomBlackPatches(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -1812,6 +2050,22 @@ class PreprocessorTest(tf.test.TestCase):
[images_shape, blacked_images_shape])
self.assertAllEqual(images_shape_, blacked_images_shape_)
def testRandomResizeMethodWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_resize_method, {
'target_size': (75, 150)
}))
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRandomResizeMethod(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -2144,6 +2398,20 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllEqual([0, 1, 1, 0, 1], one_hot)
def testSSDRandomCropWithCache(self):
preprocess_options = [
(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}),
(preprocessor.ssd_random_crop, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def testSSDRandomCrop(self):
preprocessing_options = [
(preprocessor.normalize_image, {
......@@ -2216,6 +2484,20 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllEqual(boxes_rank_, distorted_boxes_rank_)
self.assertAllEqual(images_rank_, distorted_images_rank_)
def testSSDRandomCropFixedAspectRatioWithCache(self):
preprocess_options = [
(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}),
(preprocessor.ssd_random_crop_fixed_aspect_ratio, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def _testSSDRandomCropFixedAspectRatio(self,
include_label_scores,
include_instance_masks,
......
......@@ -58,6 +58,9 @@ class InputDataFields(object):
groundtruth_keypoint_visibilities: ground truth keypoint visibilities.
groundtruth_label_scores: groundtruth label scores.
groundtruth_weights: groundtruth weight factor for bounding boxes.
num_groundtruth_boxes: number of groundtruth boxes.
true_image_shape: true shapes of images in the resized images, as resized
images can be padded with zeros.
"""
image = 'image'
original_image = 'original_image'
......@@ -81,6 +84,8 @@ class InputDataFields(object):
groundtruth_keypoint_visibilities = 'groundtruth_keypoint_visibilities'
groundtruth_label_scores = 'groundtruth_label_scores'
groundtruth_weights = 'groundtruth_weights'
num_groundtruth_boxes = 'num_groundtruth_boxes'
true_image_shape = 'true_image_shape'
class DetectionResultFields(object):
......
......@@ -389,7 +389,8 @@ def create_target_assigner(reference, stage=None,
def batch_assign_targets(target_assigner,
anchors_batch,
gt_box_batch,
gt_class_targets_batch):
gt_class_targets_batch,
gt_weights_batch=None):
"""Batched assignment of classification and regression targets.
Args:
......@@ -402,6 +403,8 @@ def batch_assign_targets(target_assigner,
each tensor has shape [num_gt_boxes_i, classification_target_size] and
num_gt_boxes_i is the number of boxes in the ith boxlist of
gt_box_batch.
gt_weights_batch: A list of 1-D tf.float32 tensors of shape
[num_boxes] containing weights for groundtruth boxes.
Returns:
batch_cls_targets: a tensor with shape [batch_size, num_anchors,
......@@ -435,11 +438,13 @@ def batch_assign_targets(target_assigner,
reg_targets_list = []
reg_weights_list = []
match_list = []
for anchors, gt_boxes, gt_class_targets in zip(
anchors_batch, gt_box_batch, gt_class_targets_batch):
if gt_weights_batch is None:
gt_weights_batch = [None] * len(gt_class_targets_batch)
for anchors, gt_boxes, gt_class_targets, gt_weights in zip(
anchors_batch, gt_box_batch, gt_class_targets_batch, gt_weights_batch):
(cls_targets, cls_weights, reg_targets,
reg_weights, match) = target_assigner.assign(
anchors, gt_boxes, gt_class_targets)
anchors, gt_boxes, gt_class_targets, gt_weights)
cls_targets_list.append(cls_targets)
cls_weights_list.append(cls_weights)
reg_targets_list.append(reg_targets)
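The optional `gt_weights_batch` argument is expanded to one `None` per image so that it can be zipped with the other per-image lists. A minimal sketch of that defaulting step, using a hypothetical helper name (`pair_with_weights`) and plain lists in place of tensors:

```python
def pair_with_weights(gt_class_targets_batch, gt_weights_batch=None):
  """Pairs each image's class targets with its weights (or None)."""
  if gt_weights_batch is None:
    # zip() needs one entry per image; None placeholders supply them.
    gt_weights_batch = [None] * len(gt_class_targets_batch)
  return list(zip(gt_class_targets_batch, gt_weights_batch))
```

Callers that omit the weights therefore still get one tuple per image, with `None` in the weights position.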
......
......@@ -632,6 +632,81 @@ class BatchTargetAssignerTest(test_case.TestCase):
self.assertAllClose(reg_targets_out, exp_reg_targets)
self.assertAllClose(reg_weights_out, exp_reg_weights)
def test_batch_assign_multiclass_targets_with_padded_groundtruth(self):
def graph_fn(anchor_means, anchor_stddevs, groundtruth_boxlist1,
groundtruth_boxlist2, class_targets1, class_targets2,
groundtruth_weights1, groundtruth_weights2):
box_list1 = box_list.BoxList(groundtruth_boxlist1)
box_list2 = box_list.BoxList(groundtruth_boxlist2)
gt_box_batch = [box_list1, box_list2]
gt_class_targets = [class_targets1, class_targets2]
gt_weights = [groundtruth_weights1, groundtruth_weights2]
anchors_boxlist = box_list.BoxList(anchor_means)
anchors_boxlist.add_field('stddev', anchor_stddevs)
multiclass_target_assigner = self._get_multi_class_target_assigner(
num_classes=3)
(cls_targets, cls_weights, reg_targets, reg_weights,
_) = targetassigner.batch_assign_targets(
multiclass_target_assigner, anchors_boxlist, gt_box_batch,
gt_class_targets, gt_weights)
return (cls_targets, cls_weights, reg_targets, reg_weights)
groundtruth_boxlist1 = np.array([[0., 0., 0.2, 0.2],
[0., 0., 0., 0.]], dtype=np.float32)
groundtruth_weights1 = np.array([1, 0], dtype=np.float32)
groundtruth_boxlist2 = np.array([[0, 0.25123152, 1, 1],
[0.015789, 0.0985, 0.55789, 0.3842],
[0, 0, 0, 0]],
dtype=np.float32)
groundtruth_weights2 = np.array([1, 1, 0], dtype=np.float32)
class_targets1 = np.array([[0, 1, 0, 0], [0, 0, 0, 0]], dtype=np.float32)
class_targets2 = np.array([[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 0, 0, 0]], dtype=np.float32)
anchor_means = np.array([[0, 0, .25, .25],
[0, .25, 1, 1],
[0, .1, .5, .5],
[.75, .75, 1, 1]], dtype=np.float32)
anchor_stddevs = np.array([[.1, .1, .1, .1],
[.1, .1, .1, .1],
[.1, .1, .1, .1],
[.1, .1, .1, .1]], dtype=np.float32)
exp_reg_targets = [[[0, 0, -0.5, -0.5],
[0, 0, 0, 0],
[0, 0, 0, 0,],
[0, 0, 0, 0,],],
[[0, 0, 0, 0,],
[0, 0.01231521, 0, 0],
[0.15789001, -0.01500003, 0.57889998, -1.15799987],
[0, 0, 0, 0]]]
exp_cls_weights = [[1, 1, 1, 1],
[1, 1, 1, 1]]
exp_cls_targets = [[[0, 1, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0]],
[[1, 0, 0, 0],
[0, 0, 0, 1],
[0, 0, 1, 0],
[1, 0, 0, 0]]]
exp_reg_weights = [[1, 0, 0, 0],
[0, 1, 1, 0]]
(cls_targets_out, cls_weights_out, reg_targets_out,
reg_weights_out) = self.execute(graph_fn, [anchor_means, anchor_stddevs,
groundtruth_boxlist1,
groundtruth_boxlist2,
class_targets1,
class_targets2,
groundtruth_weights1,
groundtruth_weights2])
self.assertAllClose(cls_targets_out, exp_cls_targets)
self.assertAllClose(cls_weights_out, exp_cls_weights)
self.assertAllClose(reg_targets_out, exp_reg_targets)
self.assertAllClose(reg_weights_out, exp_reg_weights)
def test_batch_assign_multidimensional_targets(self):
def graph_fn(anchor_means, anchor_stddevs, groundtruth_boxlist1,
groundtruth_boxlist2, class_targets1, class_targets2):
......
......@@ -134,7 +134,8 @@ class TfExampleDecoder(data_decoder.DataDecoder):
self.items_to_handlers[
fields.InputDataFields.groundtruth_instance_masks] = (
slim_example_decoder.ItemHandlerCallback(
['image/object/mask'], self._decode_png_instance_masks))
['image/object/mask', 'image/height', 'image/width'],
self._decode_png_instance_masks))
else:
raise ValueError('Did not recognize the `instance_mask_type` option.')
if label_map_proto_file:
......@@ -178,10 +179,15 @@ class TfExampleDecoder(data_decoder.DataDecoder):
[None, 4] containing box corners.
fields.InputDataFields.groundtruth_classes - 1D int64 tensor of shape
[None] containing classes for the boxes.
fields.InputDataFields.groundtruth_weights - 1D float32 tensor of
shape [None] indicating the weights of groundtruth boxes.
fields.InputDataFields.num_groundtruth_boxes - int32 scalar indicating
the number of groundtruth_boxes.
fields.InputDataFields.groundtruth_area - 1D float32 tensor of shape
[None] containing object mask area in pixels squared.
fields.InputDataFields.groundtruth_is_crowd - 1D bool tensor of shape
[None] indicating if the boxes enclose a crowd.
Optional:
fields.InputDataFields.groundtruth_difficult - 1D bool tensor of shape
[None] indicating if the boxes represent `difficult` instances.
......@@ -189,8 +195,6 @@ class TfExampleDecoder(data_decoder.DataDecoder):
[None] indicating if the boxes represent `group_of` instances.
fields.InputDataFields.groundtruth_instance_masks - 3D float32 tensor of
shape [None, None, None] containing instance masks.
fields.InputDataFields.groundtruth_weights - 1D float32 tensor of
shape [None] indicating the weights of groundtruth boxes.
"""
serialized_example = tf.reshape(tf_example_string_tensor, shape=[])
decoder = slim_example_decoder.TFExampleDecoder(self.keys_to_features,
......@@ -201,6 +205,20 @@ class TfExampleDecoder(data_decoder.DataDecoder):
is_crowd = fields.InputDataFields.groundtruth_is_crowd
tensor_dict[is_crowd] = tf.cast(tensor_dict[is_crowd], dtype=tf.bool)
tensor_dict[fields.InputDataFields.image].set_shape([None, None, 3])
tensor_dict[fields.InputDataFields.num_groundtruth_boxes] = tf.shape(
tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]
def default_groundtruth_weights():
return tf.ones(
[tf.shape(tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]],
dtype=tf.float32)
tensor_dict[fields.InputDataFields.groundtruth_weights] = tf.cond(
tf.greater(
tf.shape(
tensor_dict[fields.InputDataFields.groundtruth_weights])[0],
0), lambda: tensor_dict[fields.InputDataFields.groundtruth_weights],
default_groundtruth_weights)
return tensor_dict
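The `tf.cond` above keeps the decoded weights when any are present and otherwise falls back to a weight of one per groundtruth box. A plain-Python analogue, assuming list inputs and a hypothetical function name (`weights_or_default`):

```python
def weights_or_default(groundtruth_weights, num_boxes):
  """Returns the stored weights, or all-ones when none were stored."""
  if len(groundtruth_weights) > 0:
    return list(groundtruth_weights)
  # No weights were stored in the example, so weight every box equally.
  return [1.0] * num_boxes
```

With no stored weights and two boxes this yields `[1.0, 1.0]`, which is the case `testDecodeDefaultGroundtruthWeights` exercises.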
def _reshape_instance_masks(self, keys_to_tensors):
......@@ -247,6 +265,11 @@ class TfExampleDecoder(data_decoder.DataDecoder):
return image
png_masks = keys_to_tensors['image/object/mask']
height = keys_to_tensors['image/height']
width = keys_to_tensors['image/width']
if isinstance(png_masks, tf.SparseTensor):
png_masks = tf.sparse_tensor_to_dense(png_masks, default_value='')
return tf.map_fn(decode_png_mask, png_masks, dtype=tf.float32)
return tf.cond(
tf.greater(tf.size(png_masks), 0),
lambda: tf.map_fn(decode_png_mask, png_masks, dtype=tf.float32),
lambda: tf.zeros(tf.to_int32(tf.stack([0, height, width]))))
......@@ -58,7 +58,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def testDecodeJpegImage(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
decoded_jpeg = self._DecodeImage(encoded_jpeg)
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -79,7 +79,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertEqual('image_id', tensor_dict[fields.InputDataFields.source_id])
def testDecodeImageKeyAndFilename(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
example = tf.train.Example(features=tf.train.Features(feature={
'image/encoded': self._BytesFeature(encoded_jpeg),
......@@ -97,7 +97,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertEqual('filename', tensor_dict[fields.InputDataFields.filename])
def testDecodePngImage(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_png = self._EncodeImage(image_tensor, encoding_type='png')
decoded_png = self._DecodeImage(encoded_png, encoding_type='png')
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -147,8 +147,32 @@ class TfExampleDecoderTest(tf.test.TestCase):
decoded_masks,
tensor_dict[fields.InputDataFields.groundtruth_instance_masks])
def testDecodeEmptyPngInstanceMasks(self):
image_tensor = np.random.randint(256, size=(10, 10, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
encoded_masks = []
example = tf.train.Example(
features=tf.train.Features(
feature={
'image/encoded': self._BytesFeature(encoded_jpeg),
'image/format': self._BytesFeature('jpeg'),
'image/object/mask': self._BytesFeature(encoded_masks),
'image/height': self._Int64Feature([10]),
'image/width': self._Int64Feature([10]),
})).SerializeToString()
example_decoder = tf_example_decoder.TfExampleDecoder(
load_instance_masks=True, instance_mask_type=input_reader_pb2.PNG_MASKS)
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
self.assertAllEqual(
tensor_dict[fields.InputDataFields.groundtruth_instance_masks].shape,
[0, 10, 10])
def testDecodeBoundingBox(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_ymins = [0.0, 4.0]
bbox_xmins = [1.0, 5.0]
......@@ -175,9 +199,39 @@ class TfExampleDecoderTest(tf.test.TestCase):
bbox_ymaxs, bbox_xmaxs]).transpose()
self.assertAllEqual(expected_boxes,
tensor_dict[fields.InputDataFields.groundtruth_boxes])
self.assertAllEqual(
2, tensor_dict[fields.InputDataFields.num_groundtruth_boxes])
def testDecodeDefaultGroundtruthWeights(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_ymins = [0.0, 4.0]
bbox_xmins = [1.0, 5.0]
bbox_ymaxs = [2.0, 6.0]
bbox_xmaxs = [3.0, 7.0]
example = tf.train.Example(features=tf.train.Features(feature={
'image/encoded': self._BytesFeature(encoded_jpeg),
'image/format': self._BytesFeature('jpeg'),
'image/object/bbox/ymin': self._FloatFeature(bbox_ymins),
'image/object/bbox/xmin': self._FloatFeature(bbox_xmins),
'image/object/bbox/ymax': self._FloatFeature(bbox_ymaxs),
'image/object/bbox/xmax': self._FloatFeature(bbox_xmaxs),
})).SerializeToString()
example_decoder = tf_example_decoder.TfExampleDecoder()
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
self.assertAllEqual((tensor_dict[fields.InputDataFields.groundtruth_boxes].
get_shape().as_list()), [None, 4])
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
self.assertAllClose(tensor_dict[fields.InputDataFields.groundtruth_weights],
np.ones(2, dtype=np.float32))
def testDecodeObjectLabel(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_classes = [0, 1]
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -199,8 +253,89 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllEqual(bbox_classes,
tensor_dict[fields.InputDataFields.groundtruth_classes])
def testDecodeObjectLabelNoText(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_classes = [1, 2]
example = tf.train.Example(features=tf.train.Features(feature={
'image/encoded': self._BytesFeature(encoded_jpeg),
'image/format': self._BytesFeature('jpeg'),
'image/object/class/label': self._Int64Feature(bbox_classes),
})).SerializeToString()
label_map_string = """
item {
id:1
name:'cat'
}
item {
id:2
name:'dog'
}
"""
label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt')
with tf.gfile.Open(label_map_path, 'wb') as f:
f.write(label_map_string)
example_decoder = tf_example_decoder.TfExampleDecoder(
label_map_proto_file=label_map_path)
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
self.assertAllEqual((tensor_dict[
fields.InputDataFields.groundtruth_classes].get_shape().as_list()),
[None])
init = tf.tables_initializer()
with self.test_session() as sess:
sess.run(init)
tensor_dict = sess.run(tensor_dict)
self.assertAllEqual(bbox_classes,
tensor_dict[fields.InputDataFields.groundtruth_classes])
def testDecodeObjectLabelUnrecognizedName(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_classes_text = ['cat', 'cheetah']
example = tf.train.Example(
features=tf.train.Features(
feature={
'image/encoded':
self._BytesFeature(encoded_jpeg),
'image/format':
self._BytesFeature('jpeg'),
'image/object/class/text':
self._BytesFeature(bbox_classes_text),
})).SerializeToString()
label_map_string = """
item {
id:2
name:'cat'
}
item {
id:1
name:'dog'
}
"""
label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt')
with tf.gfile.Open(label_map_path, 'wb') as f:
f.write(label_map_string)
example_decoder = tf_example_decoder.TfExampleDecoder(
label_map_proto_file=label_map_path)
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
self.assertAllEqual((tensor_dict[fields.InputDataFields.groundtruth_classes]
.get_shape().as_list()), [None])
with self.test_session() as sess:
sess.run(tf.tables_initializer())
tensor_dict = sess.run(tensor_dict)
self.assertAllEqual([2, -1],
tensor_dict[fields.InputDataFields.groundtruth_classes])
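
The unrecognized-name test above expects a class name that is missing from the label map ('cheetah') to decode to -1, while 'cat' decodes to its id 2. A minimal pure-Python sketch of that lookup behaviour, assuming a plain dict in place of the decoder's lookup table (`names_to_ids` and `DEFAULT_ID` are hypothetical names for illustration, not part of `tf_example_decoder`):

```python
# Illustrative stand-in for a name-to-id lookup; not the decoder's real API.
DEFAULT_ID = -1  # value the test expects for names absent from the label map

# Mirrors the label map written to label_map.pbtxt in the test above.
label_map = {'cat': 2, 'dog': 1}


def names_to_ids(names, name_to_id, default=DEFAULT_ID):
    """Map each class name to its id, falling back to `default` if unknown."""
    return [name_to_id.get(name, default) for name in names]


print(names_to_ids(['cat', 'cheetah'], label_map))  # [2, -1]
```
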
def testDecodeObjectLabelWithMapping(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_classes_text = ['cat', 'dog']
example = tf.train.Example(
......@@ -242,7 +377,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
tensor_dict[fields.InputDataFields.groundtruth_classes])
def testDecodeObjectArea(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_area = [100., 174.]
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -263,7 +398,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
tensor_dict[fields.InputDataFields.groundtruth_area])
def testDecodeObjectIsCrowd(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_is_crowd = [0, 1]
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -286,7 +421,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
fields.InputDataFields.groundtruth_is_crowd])
def testDecodeObjectDifficult(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_difficult = [0, 1]
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -309,7 +444,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
fields.InputDataFields.groundtruth_difficult])
def testDecodeObjectGroupOf(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_group_of = [0, 1]
example = tf.train.Example(features=tf.train.Features(
......@@ -333,7 +468,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
tensor_dict[fields.InputDataFields.groundtruth_group_of])
def testDecodeObjectWeight(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_weights = [0.75, 1.0]
example = tf.train.Example(features=tf.train.Features(
......@@ -362,7 +497,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
image_width = 3
# Randomly generate image.
image_tensor = np.random.randint(255, size=(image_height,
image_tensor = np.random.randint(256, size=(image_height,
image_width,
3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......@@ -413,7 +548,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
image_height = 5
image_width = 3
# Randomly generate image.
image_tensor = np.random.randint(255, size=(image_height,
image_tensor = np.random.randint(256, size=(image_height,
image_width,
3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......
......@@ -87,13 +87,12 @@ def create_tf_example(image,
to the format expected by the Tensorflow Object Detection API (which is
[ymin, xmin, ymax, xmax] with coordinates normalized relative
to image size).
image_dir: Directory containing the image files.
image_dir: directory containing the image files.
category_index: a dict containing COCO category information keyed
by the 'id' field of each category. See the
label_map_util.create_category_index function.
include_masks: Whether to include instance segmentation masks
(PNG encoded) in the result. default: False.
Returns:
example: The converted tf.Example
num_annotations_skipped: Number of (invalid) annotations that were ignored.
......@@ -104,6 +103,7 @@ def create_tf_example(image,
image_height = image['height']
image_width = image['width']
filename = image['file_name']
image_id = image['id']
full_path = os.path.join(image_dir, filename)
with tf.gfile.GFile(full_path, 'rb') as fid:
......@@ -118,6 +118,7 @@ def create_tf_example(image,
ymax = []
is_crowd = []
category_names = []
category_ids = []
area = []
encoded_mask_png = []
num_annotations_skipped = 0
......@@ -135,12 +136,13 @@ def create_tf_example(image,
ymax.append(float(y + height) / image_height)
is_crowd.append(object_annotations['iscrowd'])
category_id = int(object_annotations['category_id'])
category_ids.append(category_id)
category_names.append(category_index[category_id]['name'].encode('utf8'))
area.append(object_annotations['area'])
if include_masks:
run_len_encoding = mask.frPyObjects(
object_annotations['segmentation'], image_height, image_width)
run_len_encoding = mask.frPyObjects(object_annotations['segmentation'],
image_height, image_width)
binary_mask = mask.decode(run_len_encoding)
if not object_annotations['iscrowd']:
binary_mask = np.amax(binary_mask, axis=2)
......@@ -148,31 +150,41 @@ def create_tf_example(image,
output_io = io.BytesIO()
pil_image.save(output_io, format='PNG')
encoded_mask_png.append(output_io.getvalue())
feature_dict = {
'image/height': dataset_util.int64_feature(image_height),
'image/width': dataset_util.int64_feature(image_width),
'image/filename': dataset_util.bytes_feature(
filename.encode('utf8')),
'image/source_id': dataset_util.bytes_feature(
filename.encode('utf8')),
'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
'image/encoded': dataset_util.bytes_feature(encoded_jpg),
'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
'image/object/class/text': dataset_util.bytes_list_feature(
category_names),
'image/object/is_crowd': dataset_util.int64_list_feature(is_crowd),
'image/object/area': dataset_util.float_list_feature(area),
'image/height':
dataset_util.int64_feature(image_height),
'image/width':
dataset_util.int64_feature(image_width),
'image/filename':
dataset_util.bytes_feature(filename.encode('utf8')),
'image/source_id':
dataset_util.bytes_feature(str(image_id).encode('utf8')),
'image/key/sha256':
dataset_util.bytes_feature(key.encode('utf8')),
'image/encoded':
dataset_util.bytes_feature(encoded_jpg),
'image/format':
dataset_util.bytes_feature('jpeg'.encode('utf8')),
'image/object/bbox/xmin':
dataset_util.float_list_feature(xmin),
'image/object/bbox/xmax':
dataset_util.float_list_feature(xmax),
'image/object/bbox/ymin':
dataset_util.float_list_feature(ymin),
'image/object/bbox/ymax':
dataset_util.float_list_feature(ymax),
'image/object/class/label':
dataset_util.int64_list_feature(category_ids),
'image/object/is_crowd':
dataset_util.int64_list_feature(is_crowd),
'image/object/area':
dataset_util.float_list_feature(area),
}
if include_masks:
feature_dict['image/object/mask'] = (
dataset_util.bytes_list_feature(encoded_mask_png))
example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
return example, num_annotations_skipped
return key, example, num_annotations_skipped
def _create_tf_record_from_coco_annotations(
......@@ -217,7 +229,7 @@ def _create_tf_record_from_coco_annotations(
if idx % 100 == 0:
tf.logging.info('On image %d of %d', idx, len(images))
annotations_list = annotations_index[image['id']]
tf_example, num_annotations_skipped = create_tf_example(
_, tf_example, num_annotations_skipped = create_tf_example(
image, annotations_list, image_dir, category_index, include_masks)
total_num_annotations_skipped += num_annotations_skipped
writer.write(tf_example.SerializeToString())
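
The box handling in `create_tf_example` turns a COCO `[x, y, width, height]` pixel box into normalized `[ymin, xmin, ymax, xmax]` corners, as the docstring describes. The sketch below isolates that arithmetic. The helper name `normalize_coco_bbox` is hypothetical, and the 256x256 image size is an assumption chosen so that the box `[64, 64, 128, 128]` yields the `ymax` of 0.75 checked in the accompanying test:

```python
def normalize_coco_bbox(bbox, image_height, image_width):
    """Convert a COCO [x, y, width, height] box to normalized corners.

    Returns (ymin, xmin, ymax, xmax), each divided by the matching image
    dimension, following the same arithmetic as the diff above.
    """
    x, y, width, height = bbox
    ymin = float(y) / image_height
    xmin = float(x) / image_width
    ymax = float(y + height) / image_height
    xmax = float(x + width) / image_width
    return ymin, xmin, ymax, xmax


# Assumed 256x256 image; the real size is not shown in this part of the diff.
print(normalize_coco_bbox([64, 64, 128, 128], 256, 256))
```
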
......
......@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test for create_coco_tf_record.py."""
import io
......@@ -52,26 +51,34 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
'id': 11,
}
annotations_list = [
{
'area': .5,
'iscrowd': False,
'image_id': 11,
'bbox': [64, 64, 128, 128],
'category_id': 2,
'id': 1000,
}
]
annotations_list = [{
'area': .5,
'iscrowd': False,
'image_id': 11,
'bbox': [64, 64, 128, 128],
'category_id': 2,
'id': 1000,
}]
image_dir = tmp_dir
category_index = {
1: {'name': 'dog', 'id': 1},
2: {'name': 'cat', 'id': 2},
3: {'name': 'human', 'id': 3}
1: {
'name': 'dog',
'id': 1
},
2: {
'name': 'cat',
'id': 2
},
3: {
'name': 'human',
'id': 3
}
}
example, num_annotations_skipped = create_coco_tf_record.create_tf_example(
image, annotations_list, image_dir, category_index)
(_, example,
num_annotations_skipped) = create_coco_tf_record.create_tf_example(
image, annotations_list, image_dir, category_index)
self.assertEqual(num_annotations_skipped, 0)
self._assertProtoEqual(
......@@ -83,7 +90,7 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
[image_file_name])
self._assertProtoEqual(
example.features.feature['image/source_id'].bytes_list.value,
[image_file_name])
[str(image['id'])])
self._assertProtoEqual(
example.features.feature['image/format'].bytes_list.value, ['jpeg'])
self._assertProtoEqual(
......@@ -98,9 +105,6 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
self._assertProtoEqual(
example.features.feature['image/object/bbox/ymax'].float_list.value,
[0.75])
self._assertProtoEqual(
example.features.feature['image/object/class/text'].bytes_list.value,
['cat'])
def test_create_tf_example_with_instance_masks(self):
image_file_name = 'tmp_image.jpg'
......@@ -117,26 +121,27 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
'id': 11,
}
annotations_list = [
{
'area': .5,
'iscrowd': False,
'image_id': 11,
'bbox': [0, 0, 8, 8],
'segmentation': [[4, 0, 0, 0, 0, 4],
[8, 4, 4, 8, 8, 8]],
'category_id': 1,
'id': 1000,
}
]
annotations_list = [{
'area': .5,
'iscrowd': False,
'image_id': 11,
'bbox': [0, 0, 8, 8],
'segmentation': [[4, 0, 0, 0, 0, 4], [8, 4, 4, 8, 8, 8]],
'category_id': 1,
'id': 1000,
}]
image_dir = tmp_dir
category_index = {
1: {'name': 'dog', 'id': 1},
1: {
'name': 'dog',
'id': 1
},
}
example, num_annotations_skipped = create_coco_tf_record.create_tf_example(
image, annotations_list, image_dir, category_index, include_masks=True)
(_, example,
num_annotations_skipped) = create_coco_tf_record.create_tf_example(
image, annotations_list, image_dir, category_index, include_masks=True)
self.assertEqual(num_annotations_skipped, 0)
self._assertProtoEqual(
......@@ -148,7 +153,7 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
[image_file_name])
self._assertProtoEqual(
example.features.feature['image/source_id'].bytes_list.value,
[image_file_name])
[str(image['id'])])
self._assertProtoEqual(
example.features.feature['image/format'].bytes_list.value, ['jpeg'])
self._assertProtoEqual(
......@@ -163,24 +168,20 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
self._assertProtoEqual(
example.features.feature['image/object/bbox/ymax'].float_list.value,
[1])
self._assertProtoEqual(
example.features.feature['image/object/class/text'].bytes_list.value,
['dog'])
encoded_mask_pngs = [io.BytesIO(encoded_masks)
for encoded_masks in example.features.feature[
'image/object/mask'].bytes_list.value]
pil_masks = [np.array(PIL.Image.open(encoded_mask_png))
for encoded_mask_png in encoded_mask_pngs]
encoded_mask_pngs = [
io.BytesIO(encoded_masks) for encoded_masks in example.features.feature[
'image/object/mask'].bytes_list.value
]
pil_masks = [
np.array(PIL.Image.open(encoded_mask_png))
for encoded_mask_png in encoded_mask_pngs
]
self.assertTrue(len(pil_masks) == 1)
self.assertAllEqual(pil_masks[0],
[[1, 1, 1, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 1, 1, 1, 1]])
[[1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1, 1]])
if __name__ == '__main__':
......
......@@ -509,6 +509,11 @@ def result_dict_for_single_example(image,
detection_masks = detections[detection_fields.detection_masks][0]
# TODO: This should be done in model's postprocess
# function ideally.
num_detections = tf.to_int32(detections[detection_fields.num_detections][0])
detection_boxes = tf.slice(
detection_boxes, begin=[0, 0], size=[num_detections, -1])
detection_masks = tf.slice(
detection_masks, begin=[0, 0, 0], size=[num_detections, -1, -1])
detection_masks_reframed = ops.reframe_box_masks_to_image_masks(
detection_masks, detection_boxes, image_shape[1], image_shape[2])
detection_masks_reframed = tf.cast(
......
......@@ -24,6 +24,7 @@ import tensorflow as tf
from object_detection import eval_util
from object_detection.core import prefetcher
from object_detection.core import standard_fields as fields
from object_detection.metrics import coco_evaluation
from object_detection.utils import object_detection_evaluation
# A dictionary of metric names to classes that implement the metric. The classes
......@@ -39,7 +40,11 @@ EVAL_METRICS_CLASS_DICT = {
'weighted_pascal_voc_instance_segmentation_metrics':
object_detection_evaluation.WeightedPascalInstanceSegmentationEvaluator,
'open_images_detection_metrics':
object_detection_evaluation.OpenImagesDetectionEvaluator
object_detection_evaluation.OpenImagesDetectionEvaluator,
'coco_detection_metrics':
coco_evaluation.CocoDetectionEvaluator,
'coco_mask_metrics':
coco_evaluation.CocoMaskEvaluator,
}
EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'
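
The two new entries make the COCO evaluators selectable by name alongside the existing metrics. The lookup code itself is not part of this hunk, so the following is only a sketch of how such a name-to-class registry might be consulted. The evaluator classes are empty placeholders, and `get_evaluator_class`, its fallback to the default metric, and its error message are all assumptions made for illustration:

```python
# Placeholder classes standing in for the real evaluators.
class PascalDetectionEvaluator(object):
    pass


class CocoDetectionEvaluator(object):
    pass


class CocoMaskEvaluator(object):
    pass


# Reduced copy of the registry; keys match the names used in the diff.
EVAL_METRICS_CLASS_DICT = {
    'pascal_voc_detection_metrics': PascalDetectionEvaluator,
    'coco_detection_metrics': CocoDetectionEvaluator,
    'coco_mask_metrics': CocoMaskEvaluator,
}

EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'


def get_evaluator_class(metric_name=None):
    """Hypothetical helper: resolve a metric name to its evaluator class."""
    name = metric_name or EVAL_DEFAULT_METRIC
    if name not in EVAL_METRICS_CLASS_DICT:
        raise ValueError('Metric not found: {}'.format(name))
    return EVAL_METRICS_CLASS_DICT[name]


print(get_evaluator_class('coco_mask_metrics').__name__)
```
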
......