Commit 1efe98bb authored by Zhichao Lu, committed by lzc5123016

Merged commit includes the following changes:

185215255  by Zhichao Lu:

    Stop populating image/object/class/text field when generating COCO tf record.

--
185213306  by Zhichao Lu:

    Use the params batch size and not the one from train_config in input_fn

--
185209081  by Zhichao Lu:

    Handle the case when there are no ground-truth masks for an image.

--
185195531  by Zhichao Lu:

    Remove unstack and stack operations on features from third_party/object_detection/model.py.

--
185195017  by Zhichao Lu:

    Matrix multiplication based gather op implementation.
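
    Gathering rows can be expressed as a matrix product with a one-hot
    indicator matrix, which avoids gather ops that some accelerator
    backends handle poorly. A minimal sketch of the idea behind the new
    ops.matmul_gather_on_zeroth_axis helper used by Match below:

        import tensorflow as tf

        def matmul_gather_on_zeroth_axis(params, indices):
          # Flatten params to 2-D, select rows via a one-hot matmul, then
          # restore trailing dims; equivalent to tf.gather(params, indices).
          params_shape = tf.shape(params)
          params2d = tf.reshape(params, [params_shape[0], -1])
          indicator = tf.one_hot(indices, depth=params_shape[0],
                                 dtype=params2d.dtype)
          gathered = tf.matmul(indicator, params2d)
          out_shape = tf.concat([tf.shape(indices), params_shape[1:]], axis=0)
          return tf.reshape(gathered, out_shape)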

--
185187744  by Zhichao Lu:

    Fix a minor issue in eval_util.

--
185098733  by Zhichao Lu:

    Internal change

--
185076656  by Zhichao Lu:

    Increase the number of boxes for coco17.

--
185074199  by Zhichao Lu:

    Add config for SSD Resnet50 v1 with FPN.

--
185060199  by Zhichao Lu:

    Fix a bug in clear_detections.
    This method was setting detection_keys to an empty dictionary instead of an empty set. I've refactored so that this method and the constructor use the same code path.
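
    A minimal sketch of the refactor (class and method names illustrative):

        class ObjectDetectionEvaluation(object):

          def __init__(self):
            self._initialize_detections()

          def _initialize_detections(self):
            # Single code path shared by the constructor and clear_detections.
            self.detection_keys = set()

          def clear_detections(self):
            # Previously assigned {} here, i.e. an empty dict, not a set.
            self._initialize_detections()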

--
185031359  by Zhichao Lu:

    Eval TPU trained models continuously.
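
    A hedged sketch of the loop; estimator, eval_input_fn and train_dir
    are assumed to be defined elsewhere:

        import tensorflow as tf

        # Yields each new checkpoint path as training writes them; gives up
        # after an hour without a new checkpoint.
        for ckpt_path in tf.contrib.training.checkpoints_iterator(
            train_dir, timeout=3600):
          metrics = estimator.evaluate(
              input_fn=eval_input_fn, checkpoint_path=ckpt_path)
          tf.logging.info('Eval results at %s: %s', ckpt_path, metrics)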

--
185016591  by Zhichao Lu:

    Use TPUEstimatorSpec for TPU
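
    A minimal sketch of the switch (toy model; the real model_fn is much
    larger):

        import tensorflow as tf
        from tensorflow.contrib import tpu

        def model_fn(features, labels, mode, params):
          # Toy linear model so the sketch is self-contained.
          logits = tf.layers.dense(features['x'], 1)
          loss = tf.losses.mean_squared_error(labels, logits)
          optimizer = tf.train.GradientDescentOptimizer(0.01)
          if params.get('use_tpu'):
            # Aggregate gradients across TPU shards.
            optimizer = tpu.CrossShardOptimizer(optimizer)
          train_op = optimizer.minimize(
              loss, global_step=tf.train.get_global_step())
          if params.get('use_tpu'):
            # TPUEstimator expects a TPUEstimatorSpec, not an EstimatorSpec.
            return tpu.TPUEstimatorSpec(mode=mode, loss=loss,
                                        train_op=train_op)
          return tf.estimator.EstimatorSpec(mode=mode, loss=loss,
                                            train_op=train_op)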

--
185013651  by Zhichao Lu:

    Add PreprocessorCache to record and duplicate augmentations.
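
    Usage sketch, mirroring the new preprocessor test below: running
    preprocess() twice with the same cache replays identical random
    augmentations.

        import tensorflow as tf
        from object_detection.core import preprocessor
        from object_detection.core import preprocessor_cache
        from object_detection.core import standard_fields as fields

        cache = preprocessor_cache.PreprocessorCache()
        options = [(preprocessor.random_horizontal_flip, {})]
        arg_map = preprocessor.get_default_func_arg_map()
        tensor_dict = {
            fields.InputDataFields.image: tf.random_uniform([1, 32, 32, 3])}
        out1 = preprocessor.preprocess(tensor_dict, options, arg_map, cache)
        out2 = preprocessor.preprocess(tensor_dict, options, arg_map, cache)
        # Both outputs apply the same random flip decision.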

--
184921763  by Zhichao Lu:

    Minor fixes for object detection.

--
184920610  by Zhichao Lu:

    Adds a model builder test for "embedded_ssd_mobilenet_v1" feature extractor.

--
184919284  by Zhichao Lu:

    Added unit tests for TPU, with optional training / eval.

--
184915910  by Zhichao Lu:

    Update third_party g3 doc with Mask RCNN detection models.

--
184914085  by Zhichao Lu:

    Slight change to the WeightSharedConvolutionalBoxPredictor implementation to match RetinaNet more closely. Specifically, we now construct the box encoding and class predictor towers separately rather than having them share weights until the penultimate layer.

--
184913786  by Zhichao Lu:

    Plumbs SSD Resnet V1 with FPN models into model builder.

--
184910030  by Zhichao Lu:

    Add coco metrics to evaluator.

--
184897758  by Zhichao Lu:

    Merge changes from github.

--
184888736  by Zhichao Lu:

    Ensure groundtruth_weights are always 1-D.

--
184887256  by Zhichao Lu:

    Introduce an option to control adding summaries in the model, so summaries can be turned off when necessary.

--
184865559  by Zhichao Lu:

    Updating inputs so that a dictionary of tensors is returned from input_fn, and moving unbatch/unpad to model.py.
    Also removing the source_id key from the features dictionary and replacing it with an integer hash.
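
    A hedged sketch of the replacement (dict names and bucket count are
    illustrative):

        import tensorflow as tf

        HASH_BINS = 1 << 31  # illustrative number of hash buckets

        # Replace the string source_id with a deterministic integer hash.
        features['hash'] = tf.string_to_hash_bucket_fast(
            tensor_dict['source_id'], HASH_BINS)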

--
184859205  by Zhichao Lu:

    This CL tries to hide those differences by making the default settings work with the public code.

--
184769779  by Zhichao Lu:

    Pass groundtruth weights into ssd meta architecture all the way to target assigner.

    This will allow training ssd models with padded groundtruth tensors.

--
184767117  by Zhichao Lu:

    * Add `params` arg to make all input fns work with TPUEstimator
    * Add --master
    * Output eval results

--
184766244  by Zhichao Lu:

    Update create_coco_tf_record to include category indices

--
184752937  by Zhichao Lu:

    Create a third_party version of TPU compatible mobilenet_v2_focal_loss coco config.

--
184750174  by Zhichao Lu:

    A few small fixes for multiscale anchor generator and a test.

--
184746581  by Zhichao Lu:

    Update the Jupyter notebook to show masks if the model provides them.

--
184728646  by Zhichao Lu:

    Adding a few more tests to make sure decoding with/without label maps performs as expected.

--
184624154  by Zhichao Lu:

    Add an object detection binary for TPU.

--
184622118  by Zhichao Lu:

    Batch, transform, and unbatch in the tflearn interface.

--
184595064  by Zhichao Lu:

    Add support for training grayscale models.

--
184532026  by Zhichao Lu:

    Change dataset_builder.build to perform optional batching using tf.data.Dataset API
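
    A minimal sketch of the idea (the function shape is illustrative, not
    the exact builder signature):

        import tensorflow as tf

        def build(file_pattern, decode_fn, batch_size=None):
          dataset = tf.data.TFRecordDataset(tf.gfile.Glob(file_pattern))
          dataset = dataset.map(decode_fn)
          if batch_size is not None:
            # Pad variable-length tensors (e.g. boxes) so examples stack.
            dataset = dataset.padded_batch(
                batch_size, padded_shapes=dataset.output_shapes)
          return dataset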

--
184330239  by Zhichao Lu:

    Add augment_input_data and transform_input_data helper functions to third_party/tensorflow_models/object_detection/inputs.py

--
184328681  by Zhichao Lu:

    Use an internal rgb to gray method that can be quantized.
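
    A sketch of the idea: build the conversion from plain multiply/add ops
    (which quantize cleanly) instead of calling tf.image.rgb_to_grayscale,
    using the same ITU-R luma weights:

        import tensorflow as tf

        def _rgb_to_grayscale(images):
          # Weighted sum over the channel axis, using only quantizable ops.
          rgb_weights = [0.2989, 0.5870, 0.1140]
          return tf.reduce_sum(images * rgb_weights, axis=-1, keep_dims=True)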

--
184327909  by Zhichao Lu:

    Helper function to return padding shapes to use with Dataset.padded_batch.
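
    A hedged sketch of such a helper (key names and sizes illustrative):

        def get_padding_shapes(height, width, max_num_boxes):
          # Static shapes for Dataset.padded_batch, keyed like the decoded
          # feature dictionary.
          return {
              'image': [height, width, 3],
              'groundtruth_boxes': [max_num_boxes, 4],
              'groundtruth_classes': [max_num_boxes],
              'num_groundtruth_boxes': [],  # scalar, no padding needed
          }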

--
184326291  by Zhichao Lu:

    Added decode_func for specialized decoding.

--
184314676  by Zhichao Lu:

    Add unstack_batch method to inputs.py.

    This will enable us to convert batched tensors to lists of tensors, which matches the OD API convention of consuming the groundtruth batch as a list of tensors.
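
    A minimal sketch (key names follow the fields added in this commit;
    the padded-box slicing is illustrative):

        import tensorflow as tf

        def unstack_batch(tensor_dict):
          # Convert each [batch_size, ...] tensor into a list of per-image
          # tensors, as the OD API consumes groundtruth as lists.
          unbatched = {key: tf.unstack(tensor)
                       for key, tensor in tensor_dict.items()}
          # Strip zero padding using the recorded true box counts.
          num_boxes = unbatched.pop('num_groundtruth_boxes')
          unbatched['groundtruth_boxes'] = [
              boxes[:num] for boxes, num in zip(
                  unbatched['groundtruth_boxes'], num_boxes)]
          return unbatched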

--
184281269  by Zhichao Lu:

    Internal test target changes.

--
184192851  by Zhichao Lu:

    Adding `Estimator` interface for object detection.

--
184187885  by Zhichao Lu:

    Add config_util functions to help with input pipeline.

    1. function to return expected shapes from the resizer config
    2. function to extract image_resizer_config from model_config.

--
184139892  by Zhichao Lu:

    Adding support for depthwise SSD (ssd-lite) and depthwise box predictions.

--
184089891  by Zhichao Lu:

    Fix third_party faster rcnn resnet101 coco config.

--
184083378  by Zhichao Lu:

    When there is no object/weights field in the tf.Example proto, return a default weight of 1.0 for all boxes.

--

PiperOrigin-RevId: 185215255
parent fbc5ba06
......@@ -123,6 +123,7 @@ py_library(
"matcher.py",
],
deps = [
"//tensorflow/models/research/object_detection/utils:ops",
],
)
......@@ -160,12 +161,20 @@ py_library(
":box_list",
":box_list_ops",
":keypoint_ops",
":preprocessor_cache",
":standard_fields",
"//tensorflow",
"//tensorflow/models/research/object_detection/utils:shape_utils",
],
)
py_library(
name = "preprocessor_cache",
srcs = [
"preprocessor_cache.py",
],
)
py_test(
name = "preprocessor_test",
srcs = [
......@@ -173,6 +182,7 @@ py_test(
],
deps = [
":preprocessor",
":preprocessor_cache",
"//tensorflow",
],
)
......
......@@ -582,7 +582,8 @@ class ConvolutionalBoxPredictor(BoxPredictor):
kernel_size,
box_code_size,
apply_sigmoid_to_scores=False,
class_prediction_bias_init=0.0):
class_prediction_bias_init=0.0,
use_depthwise=False):
"""Constructor.
Args:
......@@ -611,6 +612,8 @@ class ConvolutionalBoxPredictor(BoxPredictor):
class_predictions.
class_prediction_bias_init: constant value to initialize bias of the last
conv2d layer before class prediction.
use_depthwise: Whether to use depthwise convolutions for prediction
steps. Default is False.
Raises:
ValueError: if min_depth > max_depth.
......@@ -628,6 +631,7 @@ class ConvolutionalBoxPredictor(BoxPredictor):
self._dropout_keep_prob = dropout_keep_prob
self._apply_sigmoid_to_scores = apply_sigmoid_to_scores
self._class_prediction_bias_init = class_prediction_bias_init
self._use_depthwise = use_depthwise
def _predict(self, image_features, num_predictions_per_location_list):
"""Computes encoded object locations and corresponding confidences.
......@@ -683,15 +687,36 @@ class ConvolutionalBoxPredictor(BoxPredictor):
net, depth, [1, 1], scope='Conv2d_%d_1x1_%d' % (i, depth))
with slim.arg_scope([slim.conv2d], activation_fn=None,
normalizer_fn=None, normalizer_params=None):
if self._use_depthwise:
box_encodings = slim.separable_conv2d(
net, None, [self._kernel_size, self._kernel_size],
padding='SAME', depth_multiplier=1, stride=1,
rate=1, scope='BoxEncodingPredictor_depthwise')
box_encodings = slim.conv2d(
box_encodings,
num_predictions_per_location * self._box_code_size, [1, 1],
scope='BoxEncodingPredictor')
else:
box_encodings = slim.conv2d(
net, num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
scope='BoxEncodingPredictor')
if self._use_dropout:
net = slim.dropout(net, keep_prob=self._dropout_keep_prob)
if self._use_depthwise:
class_predictions_with_background = slim.separable_conv2d(
net, None, [self._kernel_size, self._kernel_size],
padding='SAME', depth_multiplier=1, stride=1,
rate=1, scope='ClassPredictor_depthwise')
class_predictions_with_background = slim.conv2d(
class_predictions_with_background,
num_predictions_per_location * num_class_slots,
[1, 1], scope='ClassPredictor')
else:
class_predictions_with_background = slim.conv2d(
net, num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size], scope='ClassPredictor',
[self._kernel_size, self._kernel_size],
scope='ClassPredictor',
biases_initializer=tf.constant_initializer(
self._class_prediction_bias_init))
if self._apply_sigmoid_to_scores:
......@@ -729,7 +754,8 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
Defines the box predictor as defined in
https://arxiv.org/abs/1708.02002. This class differs from
ConvolutionalBoxPredictor in that it shares weights and biases while
predicting from different feature maps.
predicting from different feature maps. Separate multi-layer towers are
constructed for the box encoding and class predictors respectively.
"""
def __init__(self,
......@@ -811,22 +837,35 @@ class WeightSharedConvolutionalBoxPredictor(BoxPredictor):
with tf.variable_scope('WeightSharedConvolutionalBoxPredictor',
reuse=tf.AUTO_REUSE):
num_class_slots = self.num_classes + 1
net = image_feature
box_encodings_net = image_feature
class_predictions_net = image_feature
with slim.arg_scope(self._conv_hyperparams):
for i in range(self._num_layers_before_predictor):
net = slim.conv2d(net,
box_encodings_net = slim.conv2d(
box_encodings_net,
self._depth,
[self._kernel_size, self._kernel_size],
stride=1,
padding='SAME',
scope='conv2d_{}'.format(i))
scope='BoxEncodingPredictionTower/conv2d_{}'.format(i))
box_encodings = slim.conv2d(
net, num_predictions_per_location * self._box_code_size,
box_encodings_net,
num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
activation_fn=None, stride=1, padding='SAME',
scope='BoxEncodingPredictor')
for i in range(self._num_layers_before_predictor):
class_predictions_net = slim.conv2d(
class_predictions_net,
self._depth,
[self._kernel_size, self._kernel_size],
stride=1,
padding='SAME',
scope='ClassPredictionTower/conv2d_{}'.format(i))
class_predictions_with_background = slim.conv2d(
net, num_predictions_per_location * num_class_slots,
class_predictions_net,
num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size],
activation_fn=None, stride=1, padding='SAME',
biases_initializer=tf.constant_initializer(
......
......@@ -316,9 +316,69 @@ class ConvolutionalBoxPredictorTest(test_case.TestCase):
[tf.shape(box_encodings), tf.shape(objectness_predictions)],
feed_dict={image_features:
np.random.rand(4, resolution, resolution, 64)})
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 1, 4])
self.assertAllEqual(objectness_predictions_shape,
[4, expected_num_anchors, 1])
expected_variable_set = set([
'BoxPredictor/Conv2d_0_1x1_32/biases',
'BoxPredictor/Conv2d_0_1x1_32/weights',
'BoxPredictor/BoxEncodingPredictor/biases',
'BoxPredictor/BoxEncodingPredictor/weights',
'BoxPredictor/ClassPredictor/biases',
'BoxPredictor/ClassPredictor/weights'])
self.assertEqual(expected_variable_set, actual_variable_set)
def test_use_depthwise_convolution(self):
image_features = tf.placeholder(dtype=tf.float32, shape=[4, None, None, 64])
conv_box_predictor = box_predictor.ConvolutionalBoxPredictor(
is_training=False,
num_classes=0,
conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(),
min_depth=0,
max_depth=32,
num_layers_before_predictor=1,
dropout_keep_prob=0.8,
kernel_size=1,
box_code_size=4,
use_dropout=True,
use_depthwise=True
)
box_predictions = conv_box_predictor.predict(
[image_features], num_predictions_per_location=[5],
scope='BoxPredictor')
box_encodings = box_predictions[box_predictor.BOX_ENCODINGS]
objectness_predictions = box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND]
init_op = tf.global_variables_initializer()
resolution = 32
expected_num_anchors = resolution*resolution*5
with self.test_session() as sess:
sess.run(init_op)
(box_encodings_shape,
objectness_predictions_shape) = sess.run(
[tf.shape(box_encodings), tf.shape(objectness_predictions)],
feed_dict={image_features:
np.random.rand(4, resolution, resolution, 64)})
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 1, 4])
self.assertAllEqual(objectness_predictions_shape,
[4, expected_num_anchors, 1])
expected_variable_set = set([
'BoxPredictor/Conv2d_0_1x1_32/biases',
'BoxPredictor/Conv2d_0_1x1_32/weights',
'BoxPredictor/BoxEncodingPredictor_depthwise/biases',
'BoxPredictor/BoxEncodingPredictor_depthwise/depthwise_weights',
'BoxPredictor/BoxEncodingPredictor/biases',
'BoxPredictor/BoxEncodingPredictor/weights',
'BoxPredictor/ClassPredictor_depthwise/biases',
'BoxPredictor/ClassPredictor_depthwise/depthwise_weights',
'BoxPredictor/ClassPredictor/biases',
'BoxPredictor/ClassPredictor/weights'])
self.assertEqual(expected_variable_set, actual_variable_set)
class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
......@@ -440,14 +500,26 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 32, 32, 3], dtype=tf.float32))
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_0/weights',
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_0/biases',
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_1/weights',
'BoxPredictor/WeightSharedConvolutionalBoxPredictor/conv2d_1/biases',
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_0/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_0/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_1/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictionTower/conv2d_1/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/biases'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
'BoxEncodingPredictor/weights'),
('BoxPredictor/WeightSharedConvolutionalBoxPredictor/'
......@@ -489,6 +561,5 @@ class WeightSharedConvolutionalBoxPredictorTest(test_case.TestCase):
self.assertAllEqual(objectness_predictions_shape,
[4, expected_num_anchors, 1])
if __name__ == '__main__':
tf.test.main()
......@@ -36,6 +36,8 @@ from abc import abstractmethod
import tensorflow as tf
from object_detection.utils import ops
class Match(object):
"""Class to store results from the matcher.
......@@ -44,7 +46,7 @@ class Match(object):
convenient methods to query the matching results.
"""
def __init__(self, match_results):
def __init__(self, match_results, use_matmul_gather=False):
"""Constructs a Match object.
Args:
......@@ -52,6 +54,8 @@ class Match(object):
meaning that column i is matched with row match_results[i].
(2) match_results[i]=-1, meaning that column i is not matched.
(3) match_results[i]=-2, meaning that column i is ignored.
use_matmul_gather: Use matrix multiplication based gather instead of
standard tf.gather. (Default: False).
Raises:
ValueError: if match_results does not have rank 1 or is not an
......@@ -63,6 +67,9 @@ class Match(object):
raise ValueError('match_results should be an int32 or int64 scalar '
'tensor')
self._match_results = match_results
self._gather_op = tf.gather
if use_matmul_gather:
self._gather_op = ops.matmul_gather_on_zeroth_axis
@property
def match_results(self):
......@@ -163,7 +170,7 @@ class Match(object):
row_indices: int32 tensor of shape [K] with row indices.
"""
return self._reshape_and_cast(
tf.gather(self._match_results, self.matched_column_indices()))
self._gather_op(self._match_results, self.matched_column_indices()))
def _reshape_and_cast(self, t):
return tf.cast(tf.reshape(t, [-1]), tf.int32)
......@@ -193,7 +200,7 @@ class Match(object):
input_tensor = tf.concat([tf.stack([ignored_value, unmatched_value]),
input_tensor], axis=0)
gather_indices = tf.maximum(self.match_results + 2, 0)
gathered_tensor = tf.gather(input_tensor, gather_indices)
gathered_tensor = self._gather_op(input_tensor, gather_indices)
return gathered_tensor
......@@ -202,6 +209,16 @@ class Matcher(object):
"""
__metaclass__ = ABCMeta
def __init__(self, use_matmul_gather=False):
"""Constructs a Matcher.
Args:
use_matmul_gather: Force constructed match objects to use matrix
multiplication based gather instead of standard tf.gather.
(Default: False).
"""
self._use_matmul_gather = use_matmul_gather
def match(self, similarity_matrix, scope=None, **params):
"""Computes matches among row and column indices and returns the result.
......@@ -219,7 +236,8 @@ class Matcher(object):
A Match object with the results of matching.
"""
with tf.name_scope(scope, 'Match', [similarity_matrix, params]) as scope:
return Match(self._match(similarity_matrix, **params))
return Match(self._match(similarity_matrix, **params),
self._use_matmul_gather)
@abstractmethod
def _match(self, similarity_matrix, **params):
......
......@@ -172,5 +172,21 @@ class MatchTest(tf.test.TestCase):
gathered_tensor_out = gathered_tensor.eval()
self.assertAllEqual(expected_gathered_tensor, gathered_tensor_out)
def test_multidimensional_gather_based_on_match_with_matmul_gather_op(self):
match_results = tf.constant([1, -1, -2])
input_tensor = tf.constant([[0, 0.5, 0, 0.5], [0, 0, 0.5, 0.5]],
dtype=tf.float32)
expected_gathered_tensor = [[0, 0, 0.5, 0.5], [0, 0, 0, 0], [0, 0, 0, 0]]
match = matcher.Match(match_results, use_matmul_gather=True)
gathered_tensor = match.gather_based_on_match(input_tensor,
unmatched_value=tf.zeros(4),
ignored_value=tf.zeros(4))
self.assertEqual(gathered_tensor.dtype, tf.float32)
with self.test_session() as sess:
self.assertTrue(
all(op.name != 'Gather' for op in sess.graph.get_operations()))
gathered_tensor_out = gathered_tensor.eval()
self.assertAllEqual(expected_gathered_tensor, gathered_tensor_out)
if __name__ == '__main__':
tf.test.main()
......@@ -236,7 +236,8 @@ class DetectionModel(object):
groundtruth_boxes_list,
groundtruth_classes_list,
groundtruth_masks_list=None,
groundtruth_keypoints_list=None):
groundtruth_keypoints_list=None,
groundtruth_weights_list=None):
"""Provide groundtruth tensors.
Args:
......@@ -257,10 +258,15 @@ class DetectionModel(object):
shape [num_boxes, num_keypoints, 2] containing keypoints.
Keypoints are assumed to be provided in normalized coordinates and
missing keypoints should be encoded as NaN.
groundtruth_weights_list: A list of 1-D tf.float32 tensors of shape
[num_boxes] containing weights for groundtruth boxes.
"""
self._groundtruth_lists[fields.BoxListFields.boxes] = groundtruth_boxes_list
self._groundtruth_lists[
fields.BoxListFields.classes] = groundtruth_classes_list
if groundtruth_weights_list:
self._groundtruth_lists[fields.BoxListFields.
weights] = groundtruth_weights_list
if groundtruth_masks_list:
self._groundtruth_lists[
fields.BoxListFields.masks] = groundtruth_masks_list
......
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Records previous preprocessing operations and allows them to be repeated.
Used with object_detection.core.preprocessor. Passing a PreprocessorCache
into individual data augmentation functions or the general preprocess() function
will store all randomly generated variables in the PreprocessorCache. When
a preprocessor function is called multiple times with the same
PreprocessorCache object, that function will perform the same augmentation
on all calls.
"""
from collections import defaultdict
class PreprocessorCache(object):
"""Dictionary wrapper storing random variables generated during preprocessing.
"""
# Constant keys representing different preprocessing functions
ROTATION90 = 'rotation90'
HORIZONTAL_FLIP = 'horizontal_flip'
VERTICAL_FLIP = 'vertical_flip'
PIXEL_VALUE_SCALE = 'pixel_value_scale'
IMAGE_SCALE = 'image_scale'
RGB_TO_GRAY = 'rgb_to_gray'
ADJUST_BRIGHTNESS = 'adjust_brightness'
ADJUST_CONTRAST = 'adjust_contrast'
ADJUST_HUE = 'adjust_hue'
ADJUST_SATURATION = 'adjust_saturation'
DISTORT_COLOR = 'distort_color'
STRICT_CROP_IMAGE = 'strict_crop_image'
CROP_IMAGE = 'crop_image'
PAD_IMAGE = 'pad_image'
CROP_TO_ASPECT_RATIO = 'crop_to_aspect_ratio'
RESIZE_METHOD = 'resize_method'
PAD_TO_ASPECT_RATIO = 'pad_to_aspect_ratio'
BLACK_PATCHES = 'black_patches'
ADD_BLACK_PATCH = 'add_black_patch'
SELECTOR = 'selector'
SELECTOR_TUPLES = 'selector_tuples'
SSD_CROP_SELECTOR_ID = 'ssd_crop_selector_id'
SSD_CROP_PAD_SELECTOR_ID = 'ssd_crop_pad_selector_id'
# 23 permitted function ids
_VALID_FNS = [ROTATION90, HORIZONTAL_FLIP, VERTICAL_FLIP, PIXEL_VALUE_SCALE,
IMAGE_SCALE, RGB_TO_GRAY, ADJUST_BRIGHTNESS, ADJUST_CONTRAST,
ADJUST_HUE, ADJUST_SATURATION, DISTORT_COLOR, STRICT_CROP_IMAGE,
CROP_IMAGE, PAD_IMAGE, CROP_TO_ASPECT_RATIO, RESIZE_METHOD,
PAD_TO_ASPECT_RATIO, BLACK_PATCHES, ADD_BLACK_PATCH, SELECTOR,
SELECTOR_TUPLES, SSD_CROP_SELECTOR_ID, SSD_CROP_PAD_SELECTOR_ID]
def __init__(self):
self._history = defaultdict(dict)
def clear(self):
"""Resets cache."""
self._history = defaultdict(dict)  # keep get/update working after clear
def get(self, function_id, key):
"""Gets stored value given a function id and key.
Args:
function_id: identifier for the preprocessing function used.
key: identifier for the variable stored.
Returns:
value: the corresponding value, expected to be a tensor or
nested structure of tensors.
Raises:
ValueError: if function_id is not one of the 23 valid function ids.
"""
if function_id not in self._VALID_FNS:
raise ValueError('Function id not recognized: %s.' % str(function_id))
return self._history[function_id].get(key)
def update(self, function_id, key, value):
"""Adds a value to the dictionary.
Args:
function_id: identifier for the preprocessing function used.
key: identifier for the variable stored.
value: the value to store, expected to be a tensor or nested structure
of tensors.
Raises:
ValueError: if function_id is not one of the 23 valid function ids.
"""
if function_id not in self._VALID_FNS:
raise ValueError('Function id not recognized: %s.' % str(function_id))
self._history[function_id][key] = value
......@@ -21,6 +21,7 @@ import six
import tensorflow as tf
from object_detection.core import preprocessor
from object_detection.core import preprocessor_cache
from object_detection.core import standard_fields as fields
if six.PY2:
......@@ -290,6 +291,15 @@ class PreprocessorTest(tf.test.TestCase):
def expectedLabelsAfterThresholdingWithMissingScore(self):
return tf.constant([2], dtype=tf.float32)
def testRgbToGrayscale(self):
images = self.createTestImages()
grayscale_images = preprocessor._rgb_to_grayscale(images)
expected_images = tf.image.rgb_to_grayscale(images)
with self.test_session() as sess:
(grayscale_images, expected_images) = sess.run(
[grayscale_images, expected_images])
self.assertAllEqual(expected_images, grayscale_images)
def testNormalizeImage(self):
preprocess_options = [(preprocessor.normalize_image, {
'original_minval': 0,
......@@ -435,6 +445,55 @@ class PreprocessorTest(tf.test.TestCase):
rotated_mask, expected_mask = sess.run([rotated_mask, expected_mask])
self.assertAllEqual(rotated_mask.flatten(), expected_mask.flatten())
def _testPreprocessorCache(self,
preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False,
num_runs=4):
cache = preprocessor_cache.PreprocessorCache()
images = self.createTestImages()
boxes = self.createTestBoxes()
classes = self.createTestLabels()
masks = self.createTestMasks()
keypoints = self.createTestKeypoints()
preprocessor_arg_map = preprocessor.get_default_func_arg_map(
include_instance_masks=test_masks, include_keypoints=test_keypoints)
out = []
for i in range(num_runs):
tensor_dict = {
fields.InputDataFields.image: images,
}
num_outputs = 1
if test_boxes:
tensor_dict[fields.InputDataFields.groundtruth_boxes] = boxes
tensor_dict[fields.InputDataFields.groundtruth_classes] = classes
num_outputs += 1
if test_masks:
tensor_dict[fields.InputDataFields.groundtruth_instance_masks] = masks
num_outputs += 1
if test_keypoints:
tensor_dict[fields.InputDataFields.groundtruth_keypoints] = keypoints
num_outputs += 1
out.append(preprocessor.preprocess(
tensor_dict, preprocess_options, preprocessor_arg_map, cache))
with self.test_session() as sess:
to_run = []
for i in range(num_runs):
to_run.append(out[i][fields.InputDataFields.image])
if test_boxes:
to_run.append(out[i][fields.InputDataFields.groundtruth_boxes])
if test_masks:
to_run.append(
out[i][fields.InputDataFields.groundtruth_instance_masks])
if test_keypoints:
to_run.append(out[i][fields.InputDataFields.groundtruth_keypoints])
out_array = sess.run(to_run)
for i in range(num_outputs, len(out_array)):
self.assertAllClose(out_array[i], out_array[i - num_outputs])
def testRandomHorizontalFlip(self):
preprocess_options = [(preprocessor.random_horizontal_flip, {})]
images = self.expectedImagesAfterNormalization()
......@@ -491,6 +550,16 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(boxes_, boxes_expected_)
self.assertAllClose(images_diff_, images_diff_expected_)
def testRandomHorizontalFlipWithCache(self):
keypoint_flip_permutation = self.createKeypointFlipPermutation()
preprocess_options = [
(preprocessor.random_horizontal_flip,
{'keypoint_flip_permutation': keypoint_flip_permutation})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRunRandomHorizontalFlipWithMaskAndKeypoints(self):
preprocess_options = [(preprocessor.random_horizontal_flip, {})]
image_height = 3
......@@ -578,6 +647,16 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(boxes_, boxes_expected_)
self.assertAllClose(images_diff_, images_diff_expected_)
def testRandomVerticalFlipWithCache(self):
keypoint_flip_permutation = self.createKeypointFlipPermutation()
preprocess_options = [
(preprocessor.random_vertical_flip,
{'keypoint_flip_permutation': keypoint_flip_permutation})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRunRandomVerticalFlipWithMaskAndKeypoints(self):
preprocess_options = [(preprocessor.random_vertical_flip, {})]
image_height = 3
......@@ -665,6 +744,13 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(boxes_, boxes_expected_)
self.assertAllClose(images_diff_, images_diff_expected_)
def testRandomRotation90WithCache(self):
preprocess_options = [(preprocessor.random_rotation90, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRunRandomRotation90WithMaskAndKeypoints(self):
preprocess_options = [(preprocessor.random_rotation90, {})]
image_height = 3
......@@ -716,6 +802,20 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(values_greater_, values_true_)
self.assertAllClose(values_less_, values_true_)
def testRandomPixelValueScaleWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_pixel_value_scale, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def testRandomImageScale(self):
preprocess_options = [(preprocessor.random_image_scale, {})]
images_original = self.createTestImages()
......@@ -736,6 +836,13 @@ class PreprocessorTest(tf.test.TestCase):
self.assertTrue(
images_original_shape_[2] * 2.0 >= images_scaled_shape_[2])
def testRandomImageScaleWithCache(self):
preprocess_options = [(preprocessor.random_image_scale, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomRGBtoGray(self):
preprocess_options = [(preprocessor.random_rgb_to_gray, {})]
images_original = self.createTestImages()
......@@ -769,6 +876,14 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(images_g_diff_, image_zero1_)
self.assertAllClose(images_b_diff_, image_zero1_)
def testRandomRGBtoGrayWithCache(self):
preprocess_options = [(
preprocessor.random_rgb_to_gray, {'probability': 0.5})]
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomAdjustBrightness(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -789,6 +904,20 @@ class PreprocessorTest(tf.test.TestCase):
[image_original_shape, image_bright_shape])
self.assertAllEqual(image_original_shape_, image_bright_shape_)
def testRandomAdjustBrightnessWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_adjust_brightness, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomAdjustContrast(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -809,6 +938,20 @@ class PreprocessorTest(tf.test.TestCase):
[image_original_shape, image_contrast_shape])
self.assertAllEqual(image_original_shape_, image_contrast_shape_)
def testRandomAdjustContrastWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_adjust_contrast, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomAdjustHue(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -829,6 +972,20 @@ class PreprocessorTest(tf.test.TestCase):
[image_original_shape, image_hue_shape])
self.assertAllEqual(image_original_shape_, image_hue_shape_)
def testRandomAdjustHueWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_adjust_hue, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomDistortColor(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -849,6 +1006,20 @@ class PreprocessorTest(tf.test.TestCase):
[images_original_shape, images_distorted_color_shape])
self.assertAllEqual(images_original_shape_, images_distorted_color_shape_)
def testRandomDistortColorWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_distort_color, {}))
self._testPreprocessorCache(preprocess_options,
test_boxes=False,
test_masks=False,
test_keypoints=False)
def testRandomJitterBoxes(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.random_jitter_boxes, {}))
......@@ -900,6 +1071,21 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllEqual(boxes_rank_, distorted_boxes_rank_)
self.assertAllEqual(images_rank_, distorted_images_rank_)
def testRandomCropImageWithCache(self):
preprocess_options = [(preprocessor.random_rgb_to_gray,
{'probability': 0.5}),
(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1,
}),
(preprocessor.random_crop_image, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def testRandomCropImageGrayscale(self):
preprocessing_options = [(preprocessor.rgb_to_gray, {}),
(preprocessor.normalize_image, {
......@@ -1446,6 +1632,13 @@ class PreprocessorTest(tf.test.TestCase):
self.expectedKeypointsAfterThresholding()])
self.assertAllClose(retained_keypoints_, expected_keypoints_)
def testRandomCropToAspectRatioWithCache(self):
preprocess_options = [(preprocessor.random_crop_to_aspect_ratio, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def testRunRandomCropToAspectRatioWithMasks(self):
image = self.createColorfulTestImage()
boxes = self.createTestBoxes()
......@@ -1536,6 +1729,13 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(distorted_keypoints_.flatten(),
expected_keypoints.flatten())
def testRandomPadToAspectRatioWithCache(self):
preprocess_options = [(preprocessor.random_pad_to_aspect_ratio, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRunRandomPadToAspectRatioWithMasks(self):
image = self.createColorfulTestImage()
boxes = self.createTestBoxes()
......@@ -1624,6 +1824,17 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllClose(distorted_keypoints_.flatten(),
expected_keypoints.flatten())
def testRandomPadImageWithCache(self):
preprocess_options = [(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1,}), (preprocessor.random_pad_image, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRandomPadImage(self):
preprocessing_options = [(preprocessor.normalize_image, {
'original_minval': 0,
......@@ -1670,6 +1881,17 @@ class PreprocessorTest(tf.test.TestCase):
self.assertTrue(np.all((boxes_[:, 3] - boxes_[:, 1]) >= (
padded_boxes_[:, 3] - padded_boxes_[:, 1])))
def testRandomCropPadImageWithCache(self):
preprocess_options = [(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1,}), (preprocessor.random_crop_pad_image, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRandomCropPadImageWithRandomCoefOne(self):
preprocessing_options = [(preprocessor.normalize_image, {
'original_minval': 0,
......@@ -1788,6 +2010,22 @@ class PreprocessorTest(tf.test.TestCase):
self.assertEqual(images_shape_[1], padded_images_shape_[1])
self.assertEqual(2 * images_shape_[2], padded_images_shape_[2])
def testRandomBlackPatchesWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_black_patches, {
'size_to_image_ratio': 0.5
}))
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRandomBlackPatches(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -1812,6 +2050,22 @@ class PreprocessorTest(tf.test.TestCase):
[images_shape, blacked_images_shape])
self.assertAllEqual(images_shape_, blacked_images_shape_)
def testRandomResizeMethodWithCache(self):
preprocess_options = []
preprocess_options.append((preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}))
preprocess_options.append((preprocessor.random_resize_method, {
'target_size': (75, 150)
}))
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=True,
test_keypoints=True)
def testRandomResizeMethod(self):
preprocessing_options = []
preprocessing_options.append((preprocessor.normalize_image, {
......@@ -2144,6 +2398,20 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllEqual([0, 1, 1, 0, 1], one_hot)
def testSSDRandomCropWithCache(self):
preprocess_options = [
(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}),
(preprocessor.ssd_random_crop, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def testSSDRandomCrop(self):
preprocessing_options = [
(preprocessor.normalize_image, {
......@@ -2216,6 +2484,20 @@ class PreprocessorTest(tf.test.TestCase):
self.assertAllEqual(boxes_rank_, distorted_boxes_rank_)
self.assertAllEqual(images_rank_, distorted_images_rank_)
def testSSDRandomCropFixedAspectRatioWithCache(self):
preprocess_options = [
(preprocessor.normalize_image, {
'original_minval': 0,
'original_maxval': 255,
'target_minval': 0,
'target_maxval': 1
}),
(preprocessor.ssd_random_crop_fixed_aspect_ratio, {})]
self._testPreprocessorCache(preprocess_options,
test_boxes=True,
test_masks=False,
test_keypoints=False)
def _testSSDRandomCropFixedAspectRatio(self,
include_label_scores,
include_instance_masks,
......
......@@ -58,6 +58,9 @@ class InputDataFields(object):
groundtruth_keypoint_visibilities: ground truth keypoint visibilities.
groundtruth_label_scores: groundtruth label scores.
groundtruth_weights: groundtruth weight factor for bounding boxes.
num_groundtruth_boxes: number of groundtruth boxes.
true_image_shape: true shape of the image within the resized image, as
resized images can be padded with zeros.
"""
image = 'image'
original_image = 'original_image'
......@@ -81,6 +84,8 @@ class InputDataFields(object):
groundtruth_keypoint_visibilities = 'groundtruth_keypoint_visibilities'
groundtruth_label_scores = 'groundtruth_label_scores'
groundtruth_weights = 'groundtruth_weights'
num_groundtruth_boxes = 'num_groundtruth_boxes'
true_image_shape = 'true_image_shape'
class DetectionResultFields(object):
......
......@@ -389,7 +389,8 @@ def create_target_assigner(reference, stage=None,
def batch_assign_targets(target_assigner,
anchors_batch,
gt_box_batch,
gt_class_targets_batch):
gt_class_targets_batch,
gt_weights_batch=None):
"""Batched assignment of classification and regression targets.
Args:
......@@ -402,6 +403,8 @@ def batch_assign_targets(target_assigner,
each tensor has shape [num_gt_boxes_i, classification_target_size] and
num_gt_boxes_i is the number of boxes in the ith boxlist of
gt_box_batch.
gt_weights_batch: A list of 1-D tf.float32 tensors of shape
[num_boxes] containing weights for groundtruth boxes.
Returns:
batch_cls_targets: a tensor with shape [batch_size, num_anchors,
......@@ -435,11 +438,13 @@ def batch_assign_targets(target_assigner,
reg_targets_list = []
reg_weights_list = []
match_list = []
for anchors, gt_boxes, gt_class_targets in zip(
anchors_batch, gt_box_batch, gt_class_targets_batch):
if gt_weights_batch is None:
gt_weights_batch = [None] * len(gt_class_targets_batch)
for anchors, gt_boxes, gt_class_targets, gt_weights in zip(
anchors_batch, gt_box_batch, gt_class_targets_batch, gt_weights_batch):
(cls_targets, cls_weights, reg_targets,
reg_weights, match) = target_assigner.assign(
anchors, gt_boxes, gt_class_targets)
anchors, gt_boxes, gt_class_targets, gt_weights)
cls_targets_list.append(cls_targets)
cls_weights_list.append(cls_weights)
reg_targets_list.append(reg_targets)
......
......@@ -632,6 +632,81 @@ class BatchTargetAssignerTest(test_case.TestCase):
self.assertAllClose(reg_targets_out, exp_reg_targets)
self.assertAllClose(reg_weights_out, exp_reg_weights)
def test_batch_assign_multiclass_targets_with_padded_groundtruth(self):
def graph_fn(anchor_means, anchor_stddevs, groundtruth_boxlist1,
groundtruth_boxlist2, class_targets1, class_targets2,
groundtruth_weights1, groundtruth_weights2):
box_list1 = box_list.BoxList(groundtruth_boxlist1)
box_list2 = box_list.BoxList(groundtruth_boxlist2)
gt_box_batch = [box_list1, box_list2]
gt_class_targets = [class_targets1, class_targets2]
gt_weights = [groundtruth_weights1, groundtruth_weights2]
anchors_boxlist = box_list.BoxList(anchor_means)
anchors_boxlist.add_field('stddev', anchor_stddevs)
multiclass_target_assigner = self._get_multi_class_target_assigner(
num_classes=3)
(cls_targets, cls_weights, reg_targets, reg_weights,
_) = targetassigner.batch_assign_targets(
multiclass_target_assigner, anchors_boxlist, gt_box_batch,
gt_class_targets, gt_weights)
return (cls_targets, cls_weights, reg_targets, reg_weights)
groundtruth_boxlist1 = np.array([[0., 0., 0.2, 0.2],
[0., 0., 0., 0.]], dtype=np.float32)
groundtruth_weights1 = np.array([1, 0], dtype=np.float32)
groundtruth_boxlist2 = np.array([[0, 0.25123152, 1, 1],
[0.015789, 0.0985, 0.55789, 0.3842],
[0, 0, 0, 0]],
dtype=np.float32)
groundtruth_weights2 = np.array([1, 1, 0], dtype=np.float32)
class_targets1 = np.array([[0, 1, 0, 0], [0, 0, 0, 0]], dtype=np.float32)
class_targets2 = np.array([[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 0, 0, 0]], dtype=np.float32)
anchor_means = np.array([[0, 0, .25, .25],
[0, .25, 1, 1],
[0, .1, .5, .5],
[.75, .75, 1, 1]], dtype=np.float32)
anchor_stddevs = np.array([[.1, .1, .1, .1],
[.1, .1, .1, .1],
[.1, .1, .1, .1],
[.1, .1, .1, .1]], dtype=np.float32)
exp_reg_targets = [[[0, 0, -0.5, -0.5],
[0, 0, 0, 0],
[0, 0, 0, 0,],
[0, 0, 0, 0,],],
[[0, 0, 0, 0,],
[0, 0.01231521, 0, 0],
[0.15789001, -0.01500003, 0.57889998, -1.15799987],
[0, 0, 0, 0]]]
exp_cls_weights = [[1, 1, 1, 1],
[1, 1, 1, 1]]
exp_cls_targets = [[[0, 1, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0]],
[[1, 0, 0, 0],
[0, 0, 0, 1],
[0, 0, 1, 0],
[1, 0, 0, 0]]]
exp_reg_weights = [[1, 0, 0, 0],
[0, 1, 1, 0]]
(cls_targets_out, cls_weights_out, reg_targets_out,
reg_weights_out) = self.execute(graph_fn, [anchor_means, anchor_stddevs,
groundtruth_boxlist1,
groundtruth_boxlist2,
class_targets1,
class_targets2,
groundtruth_weights1,
groundtruth_weights2])
self.assertAllClose(cls_targets_out, exp_cls_targets)
self.assertAllClose(cls_weights_out, exp_cls_weights)
self.assertAllClose(reg_targets_out, exp_reg_targets)
self.assertAllClose(reg_weights_out, exp_reg_weights)
def test_batch_assign_multidimensional_targets(self):
def graph_fn(anchor_means, anchor_stddevs, groundtruth_boxlist1,
groundtruth_boxlist2, class_targets1, class_targets2):
......
......@@ -134,7 +134,8 @@ class TfExampleDecoder(data_decoder.DataDecoder):
self.items_to_handlers[
fields.InputDataFields.groundtruth_instance_masks] = (
slim_example_decoder.ItemHandlerCallback(
['image/object/mask'], self._decode_png_instance_masks))
['image/object/mask', 'image/height', 'image/width'],
self._decode_png_instance_masks))
else:
raise ValueError('Did not recognize the `instance_mask_type` option.')
if label_map_proto_file:
......@@ -178,10 +179,15 @@ class TfExampleDecoder(data_decoder.DataDecoder):
[None, 4] containing box corners.
fields.InputDataFields.groundtruth_classes - 1D int64 tensor of shape
[None] containing classes for the boxes.
fields.InputDataFields.groundtruth_weights - 1D float32 tensor of
shape [None] indicating the weights of groundtruth boxes.
fields.InputDataFields.num_groundtruth_boxes - int32 scalar indicating
the number of groundtruth_boxes.
fields.InputDataFields.groundtruth_area - 1D float32 tensor of shape
[None] containing object mask area in pixels squared.
fields.InputDataFields.groundtruth_is_crowd - 1D bool tensor of shape
[None] indicating if the boxes enclose a crowd.
Optional:
fields.InputDataFields.groundtruth_difficult - 1D bool tensor of shape
[None] indicating if the boxes represent `difficult` instances.
......@@ -189,8 +195,6 @@ class TfExampleDecoder(data_decoder.DataDecoder):
[None] indicating if the boxes represent `group_of` instances.
fields.InputDataFields.groundtruth_instance_masks - 3D float32 tensor of
shape [None, None, None] containing instance masks.
fields.InputDataFields.groundtruth_weights - 1D float32 tensor of
shape [None] indicating the weights of groundtruth boxes.
"""
serialized_example = tf.reshape(tf_example_string_tensor, shape=[])
decoder = slim_example_decoder.TFExampleDecoder(self.keys_to_features,
......@@ -201,6 +205,20 @@ class TfExampleDecoder(data_decoder.DataDecoder):
is_crowd = fields.InputDataFields.groundtruth_is_crowd
tensor_dict[is_crowd] = tf.cast(tensor_dict[is_crowd], dtype=tf.bool)
tensor_dict[fields.InputDataFields.image].set_shape([None, None, 3])
tensor_dict[fields.InputDataFields.num_groundtruth_boxes] = tf.shape(
tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]
def default_groundtruth_weights():
return tf.ones(
[tf.shape(tensor_dict[fields.InputDataFields.groundtruth_boxes])[0]],
dtype=tf.float32)
tensor_dict[fields.InputDataFields.groundtruth_weights] = tf.cond(
tf.greater(
tf.shape(
tensor_dict[fields.InputDataFields.groundtruth_weights])[0],
0), lambda: tensor_dict[fields.InputDataFields.groundtruth_weights],
default_groundtruth_weights)
return tensor_dict
def _reshape_instance_masks(self, keys_to_tensors):
......@@ -247,6 +265,11 @@ class TfExampleDecoder(data_decoder.DataDecoder):
return image
png_masks = keys_to_tensors['image/object/mask']
height = keys_to_tensors['image/height']
width = keys_to_tensors['image/width']
if isinstance(png_masks, tf.SparseTensor):
png_masks = tf.sparse_tensor_to_dense(png_masks, default_value='')
return tf.map_fn(decode_png_mask, png_masks, dtype=tf.float32)
return tf.cond(
tf.greater(tf.size(png_masks), 0),
lambda: tf.map_fn(decode_png_mask, png_masks, dtype=tf.float32),
lambda: tf.zeros(tf.to_int32(tf.stack([0, height, width]))))
......@@ -58,7 +58,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def testDecodeJpegImage(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
decoded_jpeg = self._DecodeImage(encoded_jpeg)
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -79,7 +79,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertEqual('image_id', tensor_dict[fields.InputDataFields.source_id])
def testDecodeImageKeyAndFilename(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
example = tf.train.Example(features=tf.train.Features(feature={
'image/encoded': self._BytesFeature(encoded_jpeg),
......@@ -97,7 +97,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertEqual('filename', tensor_dict[fields.InputDataFields.filename])
def testDecodePngImage(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_png = self._EncodeImage(image_tensor, encoding_type='png')
decoded_png = self._DecodeImage(encoded_png, encoding_type='png')
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -147,8 +147,32 @@ class TfExampleDecoderTest(tf.test.TestCase):
decoded_masks,
tensor_dict[fields.InputDataFields.groundtruth_instance_masks])
def testDecodeEmptyPngInstanceMasks(self):
image_tensor = np.random.randint(256, size=(10, 10, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
encoded_masks = []
example = tf.train.Example(
features=tf.train.Features(
feature={
'image/encoded': self._BytesFeature(encoded_jpeg),
'image/format': self._BytesFeature('jpeg'),
'image/object/mask': self._BytesFeature(encoded_masks),
'image/height': self._Int64Feature([10]),
'image/width': self._Int64Feature([10]),
})).SerializeToString()
example_decoder = tf_example_decoder.TfExampleDecoder(
load_instance_masks=True, instance_mask_type=input_reader_pb2.PNG_MASKS)
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
self.assertAllEqual(
tensor_dict[fields.InputDataFields.groundtruth_instance_masks].shape,
[0, 10, 10])
def testDecodeBoundingBox(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_ymins = [0.0, 4.0]
bbox_xmins = [1.0, 5.0]
......@@ -175,9 +199,39 @@ class TfExampleDecoderTest(tf.test.TestCase):
bbox_ymaxs, bbox_xmaxs]).transpose()
self.assertAllEqual(expected_boxes,
tensor_dict[fields.InputDataFields.groundtruth_boxes])
self.assertAllEqual(
2, tensor_dict[fields.InputDataFields.num_groundtruth_boxes])
def testDecodeDefaultGroundtruthWeights(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_ymins = [0.0, 4.0]
bbox_xmins = [1.0, 5.0]
bbox_ymaxs = [2.0, 6.0]
bbox_xmaxs = [3.0, 7.0]
example = tf.train.Example(features=tf.train.Features(feature={
'image/encoded': self._BytesFeature(encoded_jpeg),
'image/format': self._BytesFeature('jpeg'),
'image/object/bbox/ymin': self._FloatFeature(bbox_ymins),
'image/object/bbox/xmin': self._FloatFeature(bbox_xmins),
'image/object/bbox/ymax': self._FloatFeature(bbox_ymaxs),
'image/object/bbox/xmax': self._FloatFeature(bbox_xmaxs),
})).SerializeToString()
example_decoder = tf_example_decoder.TfExampleDecoder()
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
self.assertAllEqual((tensor_dict[fields.InputDataFields.groundtruth_boxes].
get_shape().as_list()), [None, 4])
with self.test_session() as sess:
tensor_dict = sess.run(tensor_dict)
self.assertAllClose(tensor_dict[fields.InputDataFields.groundtruth_weights],
np.ones(2, dtype=np.float32))
def testDecodeObjectLabel(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_classes = [0, 1]
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -199,8 +253,89 @@ class TfExampleDecoderTest(tf.test.TestCase):
self.assertAllEqual(bbox_classes,
tensor_dict[fields.InputDataFields.groundtruth_classes])
def testDecodeObjectLabelNoText(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_classes = [1, 2]
example = tf.train.Example(features=tf.train.Features(feature={
'image/encoded': self._BytesFeature(encoded_jpeg),
'image/format': self._BytesFeature('jpeg'),
'image/object/class/label': self._Int64Feature(bbox_classes),
})).SerializeToString()
label_map_string = """
item {
id:1
name:'cat'
}
item {
id:2
name:'dog'
}
"""
label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt')
with tf.gfile.Open(label_map_path, 'wb') as f:
f.write(label_map_string)
example_decoder = tf_example_decoder.TfExampleDecoder(
label_map_proto_file=label_map_path)
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
self.assertAllEqual((tensor_dict[
fields.InputDataFields.groundtruth_classes].get_shape().as_list()),
[None])
init = tf.tables_initializer()
with self.test_session() as sess:
sess.run(init)
tensor_dict = sess.run(tensor_dict)
self.assertAllEqual(bbox_classes,
tensor_dict[fields.InputDataFields.groundtruth_classes])
def testDecodeObjectLabelUnrecognizedName(self):
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_classes_text = ['cat', 'cheetah']
example = tf.train.Example(
features=tf.train.Features(
feature={
'image/encoded':
self._BytesFeature(encoded_jpeg),
'image/format':
self._BytesFeature('jpeg'),
'image/object/class/text':
self._BytesFeature(bbox_classes_text),
})).SerializeToString()
label_map_string = """
item {
id:2
name:'cat'
}
item {
id:1
name:'dog'
}
"""
label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt')
with tf.gfile.Open(label_map_path, 'wb') as f:
f.write(label_map_string)
example_decoder = tf_example_decoder.TfExampleDecoder(
label_map_proto_file=label_map_path)
tensor_dict = example_decoder.decode(tf.convert_to_tensor(example))
self.assertAllEqual((tensor_dict[fields.InputDataFields.groundtruth_classes]
.get_shape().as_list()), [None])
with self.test_session() as sess:
sess.run(tf.tables_initializer())
tensor_dict = sess.run(tensor_dict)
self.assertAllEqual([2, -1],
tensor_dict[fields.InputDataFields.groundtruth_classes])
def testDecodeObjectLabelWithMapping(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
bbox_classes_text = ['cat', 'dog']
example = tf.train.Example(
......@@ -242,7 +377,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
tensor_dict[fields.InputDataFields.groundtruth_classes])
def testDecodeObjectArea(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_area = [100., 174.]
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -263,7 +398,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
tensor_dict[fields.InputDataFields.groundtruth_area])
def testDecodeObjectIsCrowd(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_is_crowd = [0, 1]
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -286,7 +421,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
fields.InputDataFields.groundtruth_is_crowd])
def testDecodeObjectDifficult(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_difficult = [0, 1]
example = tf.train.Example(features=tf.train.Features(feature={
......@@ -309,7 +444,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
fields.InputDataFields.groundtruth_difficult])
def testDecodeObjectGroupOf(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_group_of = [0, 1]
example = tf.train.Example(features=tf.train.Features(
......@@ -333,7 +468,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
tensor_dict[fields.InputDataFields.groundtruth_group_of])
def testDecodeObjectWeight(self):
image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8)
image_tensor = np.random.randint(256, size=(4, 5, 3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
object_weights = [0.75, 1.0]
example = tf.train.Example(features=tf.train.Features(
......@@ -362,7 +497,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
image_width = 3
# Randomly generate image.
image_tensor = np.random.randint(255, size=(image_height,
image_tensor = np.random.randint(256, size=(image_height,
image_width,
3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......@@ -413,7 +548,7 @@ class TfExampleDecoderTest(tf.test.TestCase):
image_height = 5
image_width = 3
# Randomly generate image.
image_tensor = np.random.randint(255, size=(image_height,
image_tensor = np.random.randint(256, size=(image_height,
image_width,
3)).astype(np.uint8)
encoded_jpeg = self._EncodeImage(image_tensor)
......
......@@ -87,13 +87,12 @@ def create_tf_example(image,
to the format expected by the Tensorflow Object Detection API (which is
[ymin, xmin, ymax, xmax] with coordinates normalized relative to image
size).
image_dir: Directory containing the image files.
image_dir: directory containing the image files.
category_index: a dict containing COCO category information keyed
by the 'id' field of each category. See the
label_map_util.create_category_index function.
include_masks: Whether to include instance segmentation masks
(PNG encoded) in the result. default: False.
Returns:
example: The converted tf.Example
num_annotations_skipped: Number of (invalid) annotations that were ignored.
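The docstring describes the conversion performed in the function body: COCO annotations store boxes as [x, y, width, height] in absolute pixels, and they are rewritten as [ymin, xmin, ymax, xmax] normalized by image size. A standalone sketch of that arithmetic (the helper name is illustrative, not part of the module):

def coco_bbox_to_normalized(bbox, image_height, image_width):
  # COCO bbox: [x, y, width, height] in absolute pixel coordinates.
  x, y, width, height = bbox
  # Object Detection API order: [ymin, xmin, ymax, xmax], normalized.
  return (float(y) / image_height, float(x) / image_width,
          float(y + height) / image_height, float(x + width) / image_width)

# A 128x128 box at (64, 64) in a 256x256 image, as in the test below.
assert coco_bbox_to_normalized([64, 64, 128, 128], 256, 256) == (
    0.25, 0.25, 0.75, 0.75)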
@@ -104,6 +103,7 @@ def create_tf_example(image,
image_height = image['height']
image_width = image['width']
filename = image['file_name']
+image_id = image['id']
full_path = os.path.join(image_dir, filename)
with tf.gfile.GFile(full_path, 'rb') as fid:
@@ -118,6 +118,7 @@ def create_tf_example(image,
ymax = []
is_crowd = []
category_names = []
+category_ids = []
area = []
encoded_mask_png = []
num_annotations_skipped = 0
@@ -135,12 +136,13 @@ def create_tf_example(image,
ymax.append(float(y + height) / image_height)
is_crowd.append(object_annotations['iscrowd'])
category_id = int(object_annotations['category_id'])
+category_ids.append(category_id)
category_names.append(category_index[category_id]['name'].encode('utf8'))
area.append(object_annotations['area'])
if include_masks:
-run_len_encoding = mask.frPyObjects(
-    object_annotations['segmentation'], image_height, image_width)
+run_len_encoding = mask.frPyObjects(object_annotations['segmentation'],
+                                    image_height, image_width)
binary_mask = mask.decode(run_len_encoding)
if not object_annotations['iscrowd']:
binary_mask = np.amax(binary_mask, axis=2)
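For the include_masks branch above: pycocotools' mask.frPyObjects turns a polygon list into one run-length encoding per polygon, mask.decode then yields a [height, width, num_polygons] binary array, and np.amax over the last axis merges the parts into a single instance mask. Crowd annotations are already stored as a single RLE, which is why the merge is skipped when iscrowd is set. A short sketch, assuming pycocotools is installed:

import numpy as np
from pycocotools import mask

image_height, image_width = 8, 8
# Two triangular polygon parts, each a flat [x1, y1, x2, y2, ...] list,
# matching the segmentation used in the test further below.
segmentation = [[4, 0, 0, 0, 0, 4], [8, 4, 4, 8, 8, 8]]

rles = mask.frPyObjects(segmentation, image_height, image_width)
decoded = mask.decode(rles)             # shape [8, 8, 2], one slice per part
binary_mask = np.amax(decoded, axis=2)  # single merged [8, 8] instance mask
assert binary_mask.shape == (image_height, image_width)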
@@ -148,31 +150,41 @@ def create_tf_example(image,
output_io = io.BytesIO()
pil_image.save(output_io, format='PNG')
encoded_mask_png.append(output_io.getvalue())
feature_dict = {
-'image/height': dataset_util.int64_feature(image_height),
-'image/width': dataset_util.int64_feature(image_width),
-'image/filename': dataset_util.bytes_feature(
-    filename.encode('utf8')),
-'image/source_id': dataset_util.bytes_feature(
-    filename.encode('utf8')),
-'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
-'image/encoded': dataset_util.bytes_feature(encoded_jpg),
-'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
-'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
-'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
-'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
-'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
-'image/object/class/text': dataset_util.bytes_list_feature(
-    category_names),
-'image/object/is_crowd': dataset_util.int64_list_feature(is_crowd),
-'image/object/area': dataset_util.float_list_feature(area),
+'image/height':
+    dataset_util.int64_feature(image_height),
+'image/width':
+    dataset_util.int64_feature(image_width),
+'image/filename':
+    dataset_util.bytes_feature(filename.encode('utf8')),
+'image/source_id':
+    dataset_util.bytes_feature(str(image_id).encode('utf8')),
+'image/key/sha256':
+    dataset_util.bytes_feature(key.encode('utf8')),
+'image/encoded':
+    dataset_util.bytes_feature(encoded_jpg),
+'image/format':
+    dataset_util.bytes_feature('jpeg'.encode('utf8')),
+'image/object/bbox/xmin':
+    dataset_util.float_list_feature(xmin),
+'image/object/bbox/xmax':
+    dataset_util.float_list_feature(xmax),
+'image/object/bbox/ymin':
+    dataset_util.float_list_feature(ymin),
+'image/object/bbox/ymax':
+    dataset_util.float_list_feature(ymax),
+'image/object/class/label':
+    dataset_util.int64_list_feature(category_ids),
+'image/object/is_crowd':
+    dataset_util.int64_list_feature(is_crowd),
+'image/object/area':
+    dataset_util.float_list_feature(area),
}
if include_masks:
feature_dict['image/object/mask'] = (
dataset_util.bytes_list_feature(encoded_mask_png))
example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
-return example, num_annotations_skipped
+return key, example, num_annotations_skipped
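Beyond the pure reformatting, this hunk lands three substantive changes: image/source_id now carries the stringified COCO image id instead of the filename, a numeric image/object/class/label feature is written from category_ids while image/object/class/text is dropped (matching the commit description), and the function now also returns the image's sha256 key. A hedged sketch of how a consumer could parse the updated fields back out, assuming the TF 1.x parsing API:

import tensorflow as tf

# Only the fields touched by this change; the remaining bbox, is_crowd
# and area features are elided for brevity.
feature_spec = {
    'image/source_id': tf.FixedLenFeature((), tf.string),
    'image/object/class/label': tf.VarLenFeature(tf.int64),
}

def parse_fn(serialized_example):
  parsed = tf.parse_single_example(serialized_example, feature_spec)
  # source_id is now a stringified integer id such as b'11', not a filename.
  labels = tf.sparse_tensor_to_dense(parsed['image/object/class/label'])
  return parsed['image/source_id'], labels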
def _create_tf_record_from_coco_annotations(
@@ -217,7 +229,7 @@ def _create_tf_record_from_coco_annotations(
if idx % 100 == 0:
tf.logging.info('On image %d of %d', idx, len(images))
annotations_list = annotations_index[image['id']]
-tf_example, num_annotations_skipped = create_tf_example(
+_, tf_example, num_annotations_skipped = create_tf_example(
image, annotations_list, image_dir, category_index, include_masks)
total_num_annotations_skipped += num_annotations_skipped
writer.write(tf_example.SerializeToString())
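The caller above discards the new first return value (the image's sha256 key) with _. A quick way to sanity-check the records this loop writes, assuming a hypothetical output path of /tmp/coco.record and the TF 1.x tf.python_io reader:

import tensorflow as tf

for serialized in tf.python_io.tf_record_iterator('/tmp/coco.record'):
  example = tf.train.Example.FromString(serialized)
  # Should print the stringified COCO image id, not a filename.
  print(example.features.feature['image/source_id'].bytes_list.value)
  break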
@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test for create_coco_tf_record.py."""
import io
@@ -52,25 +51,33 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
'id': 11,
}
-annotations_list = [
-    {
+annotations_list = [{
'area': .5,
'iscrowd': False,
'image_id': 11,
'bbox': [64, 64, 128, 128],
'category_id': 2,
'id': 1000,
-    }
-]
+}]
image_dir = tmp_dir
category_index = {
-1: {'name': 'dog', 'id': 1},
-2: {'name': 'cat', 'id': 2},
-3: {'name': 'human', 'id': 3}
+1: {
+    'name': 'dog',
+    'id': 1
+},
+2: {
+    'name': 'cat',
+    'id': 2
+},
+3: {
+    'name': 'human',
+    'id': 3
+}
}
-example, num_annotations_skipped = create_coco_tf_record.create_tf_example(
+(_, example,
+ num_annotations_skipped) = create_coco_tf_record.create_tf_example(
image, annotations_list, image_dir, category_index)
self.assertEqual(num_annotations_skipped, 0)
@@ -83,7 +90,7 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
[image_file_name])
self._assertProtoEqual(
example.features.feature['image/source_id'].bytes_list.value,
-[image_file_name])
+[str(image['id'])])
self._assertProtoEqual(
example.features.feature['image/format'].bytes_list.value, ['jpeg'])
self._assertProtoEqual(
@@ -98,9 +105,6 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
self._assertProtoEqual(
example.features.feature['image/object/bbox/ymax'].float_list.value,
[0.75])
-self._assertProtoEqual(
-    example.features.feature['image/object/class/text'].bytes_list.value,
-    ['cat'])
def test_create_tf_example_with_instance_masks(self):
image_file_name = 'tmp_image.jpg'
@@ -117,25 +121,26 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
'id': 11,
}
-annotations_list = [
-    {
+annotations_list = [{
'area': .5,
'iscrowd': False,
'image_id': 11,
'bbox': [0, 0, 8, 8],
-'segmentation': [[4, 0, 0, 0, 0, 4],
-                 [8, 4, 4, 8, 8, 8]],
+'segmentation': [[4, 0, 0, 0, 0, 4], [8, 4, 4, 8, 8, 8]],
'category_id': 1,
'id': 1000,
-    }
-]
+}]
image_dir = tmp_dir
category_index = {
-1: {'name': 'dog', 'id': 1},
+1: {
+    'name': 'dog',
+    'id': 1
+},
}
-example, num_annotations_skipped = create_coco_tf_record.create_tf_example(
+(_, example,
+ num_annotations_skipped) = create_coco_tf_record.create_tf_example(
image, annotations_list, image_dir, category_index, include_masks=True)
self.assertEqual(num_annotations_skipped, 0)
@@ -148,7 +153,7 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
[image_file_name])
self._assertProtoEqual(
example.features.feature['image/source_id'].bytes_list.value,
-[image_file_name])
+[str(image['id'])])
self._assertProtoEqual(
example.features.feature['image/format'].bytes_list.value, ['jpeg'])
self._assertProtoEqual(
@@ -163,24 +168,20 @@ class CreateCocoTFRecordTest(tf.test.TestCase):
self._assertProtoEqual(
example.features.feature['image/object/bbox/ymax'].float_list.value,
[1])
-self._assertProtoEqual(
-    example.features.feature['image/object/class/text'].bytes_list.value,
-    ['dog'])
-encoded_mask_pngs = [io.BytesIO(encoded_masks)
-                     for encoded_masks in example.features.feature[
-                         'image/object/mask'].bytes_list.value]
-pil_masks = [np.array(PIL.Image.open(encoded_mask_png))
-             for encoded_mask_png in encoded_mask_pngs]
+encoded_mask_pngs = [
+    io.BytesIO(encoded_masks) for encoded_masks in example.features.feature[
+        'image/object/mask'].bytes_list.value
+]
+pil_masks = [
+    np.array(PIL.Image.open(encoded_mask_png))
+    for encoded_mask_png in encoded_mask_pngs
+]
self.assertTrue(len(pil_masks) == 1)
self.assertAllEqual(pil_masks[0],
-[[1, 1, 1, 0, 0, 0, 0, 0],
- [1, 1, 0, 0, 0, 0, 0, 0],
- [1, 0, 0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 0, 0, 0, 1],
- [0, 0, 0, 0, 0, 0, 1, 1],
- [0, 0, 0, 0, 0, 1, 1, 1],
- [0, 0, 0, 0, 1, 1, 1, 1]])
+[[1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0],
+ [1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0],
+ [0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 1, 1],
+ [0, 0, 0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1, 1]])
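For reference, the expected 8x8 mask above is the union of the two polygon parts from the annotation: [4, 0, 0, 0, 0, 4] traces the triangle in the top-left corner (the 1s in the first three rows) and [8, 4, 4, 8, 8, 8] the triangle in the bottom-right corner.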
if __name__ == '__main__':
@@ -509,6 +509,11 @@ def result_dict_for_single_example(image,
detection_masks = detections[detection_fields.detection_masks][0]
# TODO: This should be done in model's postprocess
# function ideally.
+num_detections = tf.to_int32(detections[detection_fields.num_detections][0])
+detection_boxes = tf.slice(
+    detection_boxes, begin=[0, 0], size=[num_detections, -1])
+detection_masks = tf.slice(
+    detection_masks, begin=[0, 0, 0], size=[num_detections, -1, -1])
detection_masks_reframed = ops.reframe_box_masks_to_image_masks(
detection_masks, detection_boxes, image_shape[1], image_shape[2])
detection_masks_reframed = tf.cast(
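The five added lines handle padded detections: postprocessed outputs are padded up to a fixed maximum, so only the first num_detections rows are valid and must be sliced off before the masks are reframed to image coordinates. The same tf.slice pattern in isolation, with dummy shapes standing in for real model output:

import tensorflow as tf

max_detections, mask_size = 100, 33
detection_boxes = tf.zeros([max_detections, 4])
detection_masks = tf.zeros([max_detections, mask_size, mask_size])
num_detections = tf.constant(3, dtype=tf.int32)

# size=-1 keeps a dimension's full extent; only the leading detection
# dimension is truncated to the valid rows.
valid_boxes = tf.slice(
    detection_boxes, begin=[0, 0], size=[num_detections, -1])
valid_masks = tf.slice(
    detection_masks, begin=[0, 0, 0], size=[num_detections, -1, -1])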
@@ -24,6 +24,7 @@ import tensorflow as tf
from object_detection import eval_util
from object_detection.core import prefetcher
from object_detection.core import standard_fields as fields
+from object_detection.metrics import coco_evaluation
from object_detection.utils import object_detection_evaluation
# A dictionary of metric names to classes that implement the metric. The classes
@@ -39,7 +40,11 @@ EVAL_METRICS_CLASS_DICT = {
'weighted_pascal_voc_instance_segmentation_metrics':
object_detection_evaluation.WeightedPascalInstanceSegmentationEvaluator,
'open_images_detection_metrics':
-object_detection_evaluation.OpenImagesDetectionEvaluator
+object_detection_evaluation.OpenImagesDetectionEvaluator,
+'coco_detection_metrics':
+    coco_evaluation.CocoDetectionEvaluator,
+'coco_mask_metrics':
+    coco_evaluation.CocoMaskEvaluator,
}
EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'
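With the two COCO evaluators registered in EVAL_METRICS_CLASS_DICT, they can be selected by key ('coco_detection_metrics' or 'coco_mask_metrics') from the eval configuration like the existing metrics, or constructed directly. A minimal sketch of direct construction, assuming the evaluator takes the usual categories list:

from object_detection.metrics import coco_evaluation

# Categories follow the label-map convention used across the API:
# dicts with an integer 'id' and a string 'name'.
categories = [{'id': 1, 'name': 'dog'}, {'id': 2, 'name': 'cat'}]
evaluator = coco_evaluation.CocoDetectionEvaluator(categories)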