Unverified Commit 9bbf8015 authored by pkulzc, committed by GitHub

Merged commit includes the following changes: (#6932)

250447559  by Zhichao Lu:

    Update the expected file format for the Instance Segmentation challenge:
    - add ImageWidth and ImageHeight fields and store the values per prediction
    - for the mask, store only the encoded image and assume its size is ImageWidth x ImageHeight
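
    For illustration only (the column names beyond ImageWidth and ImageHeight
    are hypothetical, not specified by this change), a prediction row might
    look like:

        ImageID,ImageWidth,ImageHeight,PredictionString
        img0001,1024,768,/m/01g317 0.85 <base64-encoded PNG mask>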

--
250402780  by rathodv:

    Fix failing Mask R-CNN TPU convergence test.

    Cast second stage prediction tensors from bfloat16 to float32 to prevent errors in the third target assignment (Mask Prediction): concatenating tensors of the differing types bfloat16 and float32 isn't allowed.
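
    A minimal sketch of the failure mode and the fix (tensor names and shapes
    here are illustrative, not the model's actual tensors):

        import tensorflow as tf

        preds = tf.zeros([8, 4], dtype=tf.bfloat16)   # second stage predictions
        targets = tf.zeros([8, 4], dtype=tf.float32)  # target assignment inputs
        # tf.concat([preds, targets], axis=0) would fail: dtypes must match.
        merged = tf.concat([tf.cast(preds, tf.float32), targets], axis=0)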

--
250300240  by Zhichao Lu:

    Add Open Images Challenge 2019 object detection and instance segmentation
    support to the Estimator framework.

--
249944839  by rathodv:

    Modify exporter.py to add multiclass score nodes in exported inference graphs.
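
    Once exported, the new node can be fetched by name like the existing
    outputs (a TF1-style sketch; the graph path is a placeholder):

        import tensorflow as tf

        graph_def = tf.GraphDef()
        with tf.gfile.GFile('/path/to/frozen_inference_graph.pb', 'rb') as f:
          graph_def.ParseFromString(f.read())
        with tf.Graph().as_default() as graph:
          tf.import_graph_def(graph_def, name='')
          # Shape: [batch, max_detections, num_classes_with_background].
          multiclass_scores = graph.get_tensor_by_name(
              'detection_multiclass_scores:0')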

--
249935201  by rathodv:

    Modify postprocess methods to preserve multiclass scores after non max suppression.

--
249878079  by Zhichao Lu:

    This CL slightly refactors some Object Detection helper functions for data creation, evaluation, and groundtruth providing.

    This will allow the eager+function custom loops to share code with the existing estimator training loops.

    Concretely we make the following changes:
    1. In input creation we separate dataset-creation into top-level helpers, and allow it to optionally accept a pre-constructed model directly instead of always creating a model from the config just for feature preprocessing.

    2. In coco evaluation we split the update_op creation into its own function, which the custom loops will call directly.

    3. In model_lib we move groundtruth providing / data-structure munging into a helper function.

    4. For now we put an escape hatch in `_summarize_target_assignment` when executing with tf v2.0 behavior, because the summary APIs used only work with tf 1.x.

--
249673507  by rathodv:

    Use explicit casts instead of tf.to_float and tf.to_int32 to avoid warnings.
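
    The mechanical rewrite applied throughout (both forms compute the same
    thing; the to_* helpers are deprecated in late TF 1.x):

        import tensorflow as tf

        x = tf.constant([1, 2, 3])
        y_old = tf.to_float(x)                # deprecated, logs a warning
        y_new = tf.cast(x, dtype=tf.float32)  # equivalent, no warning
        z = tf.cast(tf.constant([1.5, 2.5]), dtype=tf.int32)  # replaces tf.to_int32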

--
249656006  by Zhichao Lu:

    Add a named "raw_keypoint_locations" node that corresponds to the "raw_box_locations" node.

--
249651674  by rathodv:

    Keep proposal boxes in float format. MatMulCropAndResize can handle the type even when the features themselves are bfloat16.

--
249568633  by rathodv:

    Support q > 1 in class agnostic NMS.
    Break post_processing_test.py into 3 separate files to avoid linter errors.

--
249535530  by rathodv:

    Update some deprecated arguments to tf ops.

--
249368223  by rathodv:

    Modify MatMulCropAndResize to use the MultiLevelRoIAlign method and move the tests to the spatial_transform_ops.py module.

    This CL establishes that CropAndResize and RoIAlign are equivalent and differ only in the sampling point grid within the boxes. CropAndResize uses a uniform size x size point grid whose corner points exactly overlap the box corners, while RoIAlign divides each box into size x size cells and uses their centers as sampling points. In this CL, we switch MatMulCropAndResize to the MultiLevelRoIAlign implementation with the `align_corner` option, since the MultiLevelRoIAlign implementation is more memory efficient on TPU than the original MatMulCropAndResize.
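
    The sampling-grid difference can be sketched in one dimension for a box
    spanning [lo, hi] with `size` output bins (illustrative code, not the
    library implementation):

        import numpy as np

        def crop_and_resize_points(lo, hi, size):
          # Uniform grid whose end points coincide with the box corners
          # (the align_corners behavior).
          return lo + (hi - lo) * np.arange(size) / (size - 1)

        def roi_align_points(lo, hi, size):
          # Divide the box into `size` equal cells and sample cell centers.
          return lo + (hi - lo) * (np.arange(size) + 0.5) / size

        print(crop_and_resize_points(0.0, 1.0, 3))  # [0.  0.5 1. ]
        print(roi_align_points(0.0, 1.0, 3))        # [0.1667 0.5 0.8333]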

--
249337338  by chowdhery:

    Add class-agnostic non-max-suppression in post_processing

--
249139196  by Zhichao Lu:

    Fix positional argument bug in export_tflite_ssd_graph

--
249120219  by Zhichao Lu:

    Add evaluator for computing precision limited to a given recall range.
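
    Mirroring the new test code in this commit, the evaluator is configured
    from an EvalConfig roughly as follows:

        from object_detection import eval_util
        from object_detection.protos import eval_pb2

        eval_config = eval_pb2.EvalConfig()
        eval_config.metrics_set.extend(['precision_at_recall_detection_metrics'])
        eval_config.recall_lower_bound = 0.2  # report precision for recall
        eval_config.recall_upper_bound = 0.6  # limited to the range [0.2, 0.6]
        evaluator_options = eval_util.evaluator_options_from_eval_config(
            eval_config)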

--
249030593  by Zhichao Lu:

    Evaluation util to run segmentation and detection challenge evaluation.

--
248554358  by Zhichao Lu:

    This change contains the auxiliary changes required for TF 2.0 style training with eager+functions+dist strat loops, but not the loops themselves.

    It includes:
    - Updates to shape usage to support both tensorshape v1 and tensorshape v2
    - A fix to FreezableBatchNorm to not override the `training` arg in call when `None` was passed to the constructor (Not an issue in the estimator loops but it was in the custom loops)
    - Puts some constants in init_scope so they work in eager + functions
    - Makes learning rate schedules return a callable in eager mode (required so they update when the global_step changes; see the sketch after this list)
    - Makes DetectionModel a tf.module so it tracks variables (e.g. ones nested in layers)
    - Removes some references to `op.name` for some losses and replaces it w/ explicit names
    - A small part of the change to allow the coco evaluation metrics to work in eager mode
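
    As an illustration of the learning-rate item above (a minimal sketch, not
    the project's actual schedule code):

        import tensorflow as tf

        def exponential_decay(base_lr, decay_steps, decay_rate, global_step):
          def learning_rate():
            # Recomputed on every call, so the value tracks global_step.
            power = tf.cast(global_step, tf.float32) / decay_steps
            return base_lr * tf.pow(decay_rate, power)
          # Graph mode can use the tensor once; eager optimizers need the
          # callable so the rate is re-evaluated each step.
          return learning_rate if tf.executing_eagerly() else learning_rate()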

--
248271226  by rathodv:

    Add MultiLevel RoIAlign op.

--
248229103  by rathodv:

    Add functions to (1) pad feature maps and (2) ravel 5-D indices.
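
    The raveling utility can be pictured generically (a sketch of the idea,
    not the module's actual signature):

        import numpy as np

        def ravel_indices(indices, shape):
          # Map an N-D index (here 5-D) to a flat index into the reshaped
          # tensor, matching np.ravel_multi_index.
          strides = np.cumprod([1] + list(shape[::-1][:-1]))[::-1]
          return int((np.asarray(indices) * strides).sum())

        shape = (2, 3, 4, 5, 6)
        assert ravel_indices((1, 2, 3, 4, 5), shape) == np.ravel_multi_index(
            (1, 2, 3, 4, 5), shape)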

--
248206769  by rathodv:

    Add utilities needed to introduce RoI Align op.

--
248177733  by pengchong:

    Internal changes

--
247742582  by Zhichao Lu:

    Open Images Challenge 2019 instance segmentation metric: part 2

--
247525401  by Zhichao Lu:

    Update comments on max_classes_per_detection.

--
247520753  by rathodv:

    Add multilevel crop and resize operation that builds on top of matmul_crop_and_resize.

--
247391600  by Zhichao Lu:

    Open Images Challenge 2019 instance segmentation metric

--
247325813  by chowdhery:

    Quantized MobileNet v2 SSD FPNLite config with depth multiplier 0.75

--

PiperOrigin-RevId: 250447559
parent f42fddee
@@ -55,12 +55,24 @@ a handful of auxiliary annotations associated with each bounding box, namely,
 instance masks and keypoints.
 """
 import abc
+import tensorflow as tf
+
 from object_detection.core import standard_fields as fields
 
 
-class DetectionModel(object):
-  """Abstract base class for detection models."""
+# If using a new enough version of TensorFlow, detection models should be a
+# tf module or keras model for tracking.
+try:
+  _BaseClass = tf.Module
+except AttributeError:
+  _BaseClass = object
+
+
+class DetectionModel(_BaseClass):
+  """Abstract base class for detection models.
+
+  Extends tf.Module to guarantee variable tracking.
+  """
+
   __metaclass__ = abc.ABCMeta
 
   def __init__(self, num_classes):
@@ -55,7 +55,7 @@ def prefetch(tensor_dict, capacity):
   enqueue_op = prefetch_queue.enqueue(tensor_dict)
   tf.train.queue_runner.add_queue_runner(tf.train.queue_runner.QueueRunner(
       prefetch_queue, [enqueue_op]))
-  tf.summary.scalar('queue/%s/fraction_of_%d_full' % (prefetch_queue.name,
-                                                      capacity),
-                    tf.to_float(prefetch_queue.size()) * (1. / capacity))
+  tf.summary.scalar(
+      'queue/%s/fraction_of_%d_full' % (prefetch_queue.name, capacity),
+      tf.cast(prefetch_queue.size(), dtype=tf.float32) * (1. / capacity))
   return prefetch_queue
@@ -261,7 +261,7 @@ def normalize_image(image, original_minval, original_maxval, target_minval,
     original_maxval = float(original_maxval)
     target_minval = float(target_minval)
     target_maxval = float(target_maxval)
-    image = tf.to_float(image)
+    image = tf.cast(image, dtype=tf.float32)
     image = tf.subtract(image, original_minval)
     image = tf.multiply(image, (target_maxval - target_minval) /
                         (original_maxval - original_minval))
@@ -810,10 +810,12 @@ def random_image_scale(image,
         generator_func, preprocessor_cache.PreprocessorCache.IMAGE_SCALE,
         preprocess_vars_cache)
-    image_newysize = tf.to_int32(
-        tf.multiply(tf.to_float(image_height), size_coef))
-    image_newxsize = tf.to_int32(
-        tf.multiply(tf.to_float(image_width), size_coef))
+    image_newysize = tf.cast(
+        tf.multiply(tf.cast(image_height, dtype=tf.float32), size_coef),
+        dtype=tf.int32)
+    image_newxsize = tf.cast(
+        tf.multiply(tf.cast(image_width, dtype=tf.float32), size_coef),
+        dtype=tf.int32)
     image = tf.image.resize_images(
         image, [image_newysize, image_newxsize], align_corners=True)
     result.append(image)
@@ -1237,7 +1239,7 @@ def _strict_random_crop_image(image,
     new_image.set_shape([None, None, image.get_shape()[2]])
 
     # [1, 4]
-    im_box_rank2 = tf.squeeze(im_box, squeeze_dims=[0])
+    im_box_rank2 = tf.squeeze(im_box, axis=[0])
     # [4]
     im_box_rank1 = tf.squeeze(im_box)
@@ -1555,13 +1557,15 @@ def random_pad_image(image,
   new_image += image_color_padded
 
   # setting boxes
-  new_window = tf.to_float(
-      tf.stack([
-          -offset_height, -offset_width, target_height - offset_height,
-          target_width - offset_width
-      ]))
-  new_window /= tf.to_float(
-      tf.stack([image_height, image_width, image_height, image_width]))
+  new_window = tf.cast(
+      tf.stack([
+          -offset_height, -offset_width, target_height - offset_height,
+          target_width - offset_width
+      ]),
+      dtype=tf.float32)
+  new_window /= tf.cast(
+      tf.stack([image_height, image_width, image_height, image_width]),
+      dtype=tf.float32)
   boxlist = box_list.BoxList(boxes)
   new_boxlist = box_list_ops.change_coordinate_frame(boxlist, new_window)
   new_boxes = new_boxlist.get()
@@ -1616,8 +1620,8 @@ def random_absolute_pad_image(image,
       form.
   """
   min_image_size = tf.shape(image)[:2]
-  max_image_size = min_image_size + tf.to_int32(
-      [max_height_padding, max_width_padding])
+  max_image_size = min_image_size + tf.cast(
+      [max_height_padding, max_width_padding], dtype=tf.int32)
   return random_pad_image(image, boxes, min_image_size=min_image_size,
                           max_image_size=max_image_size, pad_color=pad_color,
                           seed=seed,
@@ -1723,12 +1727,14 @@ def random_crop_pad_image(image,
   cropped_image, cropped_boxes, cropped_labels = result[:3]
 
-  min_image_size = tf.to_int32(
-      tf.to_float(tf.stack([image_height, image_width])) *
-      min_padded_size_ratio)
-  max_image_size = tf.to_int32(
-      tf.to_float(tf.stack([image_height, image_width])) *
-      max_padded_size_ratio)
+  min_image_size = tf.cast(
+      tf.cast(tf.stack([image_height, image_width]), dtype=tf.float32) *
+      min_padded_size_ratio,
+      dtype=tf.int32)
+  max_image_size = tf.cast(
+      tf.cast(tf.stack([image_height, image_width]), dtype=tf.float32) *
+      max_padded_size_ratio,
+      dtype=tf.int32)
 
   padded_image, padded_boxes = random_pad_image(
       cropped_image,
@@ -1840,16 +1846,23 @@ def random_crop_to_aspect_ratio(image,
     image_shape = tf.shape(image)
     orig_height = image_shape[0]
     orig_width = image_shape[1]
-    orig_aspect_ratio = tf.to_float(orig_width) / tf.to_float(orig_height)
+    orig_aspect_ratio = tf.cast(
+        orig_width, dtype=tf.float32) / tf.cast(
+            orig_height, dtype=tf.float32)
     new_aspect_ratio = tf.constant(aspect_ratio, dtype=tf.float32)
 
     def target_height_fn():
-      return tf.to_int32(tf.round(tf.to_float(orig_width) / new_aspect_ratio))
+      return tf.cast(
+          tf.round(tf.cast(orig_width, dtype=tf.float32) / new_aspect_ratio),
+          dtype=tf.int32)
 
     target_height = tf.cond(orig_aspect_ratio >= new_aspect_ratio,
                             lambda: orig_height, target_height_fn)
 
     def target_width_fn():
-      return tf.to_int32(tf.round(tf.to_float(orig_height) * new_aspect_ratio))
+      return tf.cast(
+          tf.round(tf.cast(orig_height, dtype=tf.float32) * new_aspect_ratio),
+          dtype=tf.int32)
 
     target_width = tf.cond(orig_aspect_ratio <= new_aspect_ratio,
                            lambda: orig_width, target_width_fn)
@@ -1870,10 +1883,14 @@ def random_crop_to_aspect_ratio(image,
         image, offset_height, offset_width, target_height, target_width)
 
     im_box = tf.stack([
-        tf.to_float(offset_height) / tf.to_float(orig_height),
-        tf.to_float(offset_width) / tf.to_float(orig_width),
-        tf.to_float(offset_height + target_height) / tf.to_float(orig_height),
-        tf.to_float(offset_width + target_width) / tf.to_float(orig_width)
+        tf.cast(offset_height, dtype=tf.float32) /
+        tf.cast(orig_height, dtype=tf.float32),
+        tf.cast(offset_width, dtype=tf.float32) /
+        tf.cast(orig_width, dtype=tf.float32),
+        tf.cast(offset_height + target_height, dtype=tf.float32) /
+        tf.cast(orig_height, dtype=tf.float32),
+        tf.cast(offset_width + target_width, dtype=tf.float32) /
+        tf.cast(orig_width, dtype=tf.float32)
     ])
 
     boxlist = box_list.BoxList(boxes)
@@ -1996,8 +2013,8 @@ def random_pad_to_aspect_ratio(image,
   with tf.name_scope('RandomPadToAspectRatio', values=[image]):
     image_shape = tf.shape(image)
-    image_height = tf.to_float(image_shape[0])
-    image_width = tf.to_float(image_shape[1])
+    image_height = tf.cast(image_shape[0], dtype=tf.float32)
+    image_width = tf.cast(image_shape[1], dtype=tf.float32)
     image_aspect_ratio = image_width / image_height
     new_aspect_ratio = tf.constant(aspect_ratio, dtype=tf.float32)
     target_height = tf.cond(
@@ -2034,7 +2051,8 @@ def random_pad_to_aspect_ratio(image,
     target_width = tf.round(scale * target_width)
 
     new_image = tf.image.pad_to_bounding_box(
-        image, 0, 0, tf.to_int32(target_height), tf.to_int32(target_width))
+        image, 0, 0, tf.cast(target_height, dtype=tf.int32),
+        tf.cast(target_width, dtype=tf.int32))
 
     im_box = tf.stack([
         0.0,
@@ -2050,9 +2068,9 @@ def random_pad_to_aspect_ratio(image,
     if masks is not None:
       new_masks = tf.expand_dims(masks, -1)
-      new_masks = tf.image.pad_to_bounding_box(new_masks, 0, 0,
-                                               tf.to_int32(target_height),
-                                               tf.to_int32(target_width))
+      new_masks = tf.image.pad_to_bounding_box(
+          new_masks, 0, 0, tf.cast(target_height, dtype=tf.int32),
+          tf.cast(target_width, dtype=tf.int32))
       new_masks = tf.squeeze(new_masks, [-1])
       result.append(new_masks)
@@ -2106,10 +2124,12 @@ def random_black_patches(image,
   image_shape = tf.shape(image)
   image_height = image_shape[0]
   image_width = image_shape[1]
-  box_size = tf.to_int32(
-      tf.multiply(
-          tf.minimum(tf.to_float(image_height), tf.to_float(image_width)),
-          size_to_image_ratio))
+  box_size = tf.cast(
+      tf.multiply(
+          tf.minimum(
+              tf.cast(image_height, dtype=tf.float32),
+              tf.cast(image_width, dtype=tf.float32)), size_to_image_ratio),
+      dtype=tf.int32)
 
   generator_func = functools.partial(tf.random_uniform, [], minval=0.0,
                                      maxval=(1.0 - size_to_image_ratio),
@@ -2123,8 +2143,12 @@ def random_black_patches(image,
         preprocessor_cache.PreprocessorCache.ADD_BLACK_PATCH,
         preprocess_vars_cache, key=str(idx) + 'x')
-    y_min = tf.to_int32(normalized_y_min * tf.to_float(image_height))
-    x_min = tf.to_int32(normalized_x_min * tf.to_float(image_width))
+    y_min = tf.cast(
+        normalized_y_min * tf.cast(image_height, dtype=tf.float32),
+        dtype=tf.int32)
+    x_min = tf.cast(
+        normalized_x_min * tf.cast(image_width, dtype=tf.float32),
+        dtype=tf.int32)
     black_box = tf.ones([box_size, box_size, 3], dtype=tf.float32)
     mask = 1.0 - tf.image.pad_to_bounding_box(black_box, y_min, x_min,
                                               image_height, image_width)
@@ -2156,7 +2180,7 @@ def image_to_float(image):
     image: image in tf.float32 format.
   """
   with tf.name_scope('ImageToFloat', values=[image]):
-    image = tf.to_float(image)
+    image = tf.cast(image, dtype=tf.float32)
     return image
@@ -2342,10 +2366,12 @@ def resize_to_min_dimension(image, masks=None, min_dimension=600,
     (image_height, image_width, num_channels) = _get_image_info(image)
     min_image_dimension = tf.minimum(image_height, image_width)
     min_target_dimension = tf.maximum(min_image_dimension, min_dimension)
-    target_ratio = tf.to_float(min_target_dimension) / tf.to_float(
-        min_image_dimension)
-    target_height = tf.to_int32(tf.to_float(image_height) * target_ratio)
-    target_width = tf.to_int32(tf.to_float(image_width) * target_ratio)
+    target_ratio = tf.cast(min_target_dimension, dtype=tf.float32) / tf.cast(
+        min_image_dimension, dtype=tf.float32)
+    target_height = tf.cast(
+        tf.cast(image_height, dtype=tf.float32) * target_ratio, dtype=tf.int32)
+    target_width = tf.cast(
+        tf.cast(image_width, dtype=tf.float32) * target_ratio, dtype=tf.int32)
     image = tf.image.resize_images(
         tf.expand_dims(image, axis=0), size=[target_height, target_width],
         method=method,
@@ -2398,10 +2424,12 @@ def resize_to_max_dimension(image, masks=None, max_dimension=600,
     (image_height, image_width, num_channels) = _get_image_info(image)
     max_image_dimension = tf.maximum(image_height, image_width)
     max_target_dimension = tf.minimum(max_image_dimension, max_dimension)
-    target_ratio = tf.to_float(max_target_dimension) / tf.to_float(
-        max_image_dimension)
-    target_height = tf.to_int32(tf.to_float(image_height) * target_ratio)
-    target_width = tf.to_int32(tf.to_float(image_width) * target_ratio)
+    target_ratio = tf.cast(max_target_dimension, dtype=tf.float32) / tf.cast(
+        max_image_dimension, dtype=tf.float32)
+    target_height = tf.cast(
+        tf.cast(image_height, dtype=tf.float32) * target_ratio, dtype=tf.int32)
+    target_width = tf.cast(
+        tf.cast(image_width, dtype=tf.float32) * target_ratio, dtype=tf.int32)
     image = tf.image.resize_images(
         tf.expand_dims(image, axis=0), size=[target_height, target_width],
         method=method,
@@ -2639,11 +2667,11 @@ def random_self_concat_image(
   if axis == 0:
     # Concat vertically, so need to reduce the y coordinates.
-    old_scaling = tf.to_float([0.5, 1.0, 0.5, 1.0])
-    new_translation = tf.to_float([0.5, 0.0, 0.5, 0.0])
+    old_scaling = tf.constant([0.5, 1.0, 0.5, 1.0])
+    new_translation = tf.constant([0.5, 0.0, 0.5, 0.0])
   elif axis == 1:
-    old_scaling = tf.to_float([1.0, 0.5, 1.0, 0.5])
-    new_translation = tf.to_float([0.0, 0.5, 0.0, 0.5])
+    old_scaling = tf.constant([1.0, 0.5, 1.0, 0.5])
+    new_translation = tf.constant([0.0, 0.5, 0.0, 0.5])
 
   old_boxes = old_scaling * boxes
   new_boxes = old_boxes + new_translation
@@ -795,8 +795,8 @@ class PreprocessorTest(tf.test.TestCase):
     images = self.createTestImages()
     tensor_dict = {fields.InputDataFields.image: images}
     tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options)
-    images_min = tf.to_float(images) * 0.9 / 255.0
-    images_max = tf.to_float(images) * 1.1 / 255.0
+    images_min = tf.cast(images, dtype=tf.float32) * 0.9 / 255.0
+    images_max = tf.cast(images, dtype=tf.float32) * 1.1 / 255.0
     images = tensor_dict[fields.InputDataFields.image]
     values_greater = tf.greater_equal(images, images_min)
     values_less = tf.less_equal(images, images_max)
@@ -858,20 +858,26 @@ class PreprocessorTest(tf.test.TestCase):
         value=images_gray, num_or_size_splits=3, axis=3)
     images_r, images_g, images_b = tf.split(
         value=images_original, num_or_size_splits=3, axis=3)
-    images_r_diff1 = tf.squared_difference(tf.to_float(images_r),
-                                           tf.to_float(images_gray_r))
-    images_r_diff2 = tf.squared_difference(tf.to_float(images_gray_r),
-                                           tf.to_float(images_gray_g))
+    images_r_diff1 = tf.squared_difference(
+        tf.cast(images_r, dtype=tf.float32),
+        tf.cast(images_gray_r, dtype=tf.float32))
+    images_r_diff2 = tf.squared_difference(
+        tf.cast(images_gray_r, dtype=tf.float32),
+        tf.cast(images_gray_g, dtype=tf.float32))
     images_r_diff = tf.multiply(images_r_diff1, images_r_diff2)
-    images_g_diff1 = tf.squared_difference(tf.to_float(images_g),
-                                           tf.to_float(images_gray_g))
-    images_g_diff2 = tf.squared_difference(tf.to_float(images_gray_g),
-                                           tf.to_float(images_gray_b))
+    images_g_diff1 = tf.squared_difference(
+        tf.cast(images_g, dtype=tf.float32),
+        tf.cast(images_gray_g, dtype=tf.float32))
+    images_g_diff2 = tf.squared_difference(
+        tf.cast(images_gray_g, dtype=tf.float32),
+        tf.cast(images_gray_b, dtype=tf.float32))
     images_g_diff = tf.multiply(images_g_diff1, images_g_diff2)
-    images_b_diff1 = tf.squared_difference(tf.to_float(images_b),
-                                           tf.to_float(images_gray_b))
-    images_b_diff2 = tf.squared_difference(tf.to_float(images_gray_b),
-                                           tf.to_float(images_gray_r))
+    images_b_diff1 = tf.squared_difference(
+        tf.cast(images_b, dtype=tf.float32),
+        tf.cast(images_gray_b, dtype=tf.float32))
+    images_b_diff2 = tf.squared_difference(
+        tf.cast(images_gray_b, dtype=tf.float32),
+        tf.cast(images_gray_r, dtype=tf.float32))
     images_b_diff = tf.multiply(images_b_diff1, images_b_diff2)
     image_zero1 = tf.constant(0, dtype=tf.float32, shape=[1, 4, 4, 1])
     with self.test_session() as sess:
@@ -2135,7 +2141,7 @@ class PreprocessorTest(tf.test.TestCase):
     boxes = self.createTestBoxes()
     labels = self.createTestLabels()
     tensor_dict = {
-        fields.InputDataFields.image: tf.to_float(images),
+        fields.InputDataFields.image: tf.cast(images, dtype=tf.float32),
         fields.InputDataFields.groundtruth_boxes: boxes,
         fields.InputDataFields.groundtruth_classes: labels,
     }
@@ -2856,7 +2862,7 @@ class PreprocessorTest(tf.test.TestCase):
     scores = self.createTestMultiClassScores()
     tensor_dict = {
-        fields.InputDataFields.image: tf.to_float(images),
+        fields.InputDataFields.image: tf.cast(images, dtype=tf.float32),
         fields.InputDataFields.groundtruth_boxes: boxes,
         fields.InputDataFields.groundtruth_classes: labels,
         fields.InputDataFields.groundtruth_weights: weights,
@@ -109,6 +109,8 @@ class DetectionResultFields(object):
     key: unique key corresponding to image.
     detection_boxes: coordinates of the detection boxes in the image.
     detection_scores: detection scores for the detection boxes in the image.
+    detection_multiclass_scores: class score distribution (including background)
+      for detection boxes in the image including background class.
     detection_classes: detection-level class labels.
     detection_masks: contains a segmentation mask for each detection box.
     detection_boundaries: contains an object boundary for each detection box.
@@ -123,6 +125,7 @@ class DetectionResultFields(object):
   key = 'key'
   detection_boxes = 'detection_boxes'
   detection_scores = 'detection_scores'
+  detection_multiclass_scores = 'detection_multiclass_scores'
   detection_classes = 'detection_classes'
   detection_masks = 'detection_masks'
   detection_boundaries = 'detection_boundaries'
@@ -660,16 +660,16 @@ def batch_assign_confidences(target_assigner,
     explicit_example_mask = tf.logical_or(positive_mask, negative_mask)
     positive_anchors = tf.reduce_any(positive_mask, axis=-1)
 
-    regression_weights = tf.to_float(positive_anchors)
+    regression_weights = tf.cast(positive_anchors, dtype=tf.float32)
     regression_targets = (
         reg_targets * tf.expand_dims(regression_weights, axis=-1))
     regression_weights_expanded = tf.expand_dims(regression_weights, axis=-1)
 
     cls_targets_without_background = (
-        cls_targets_without_background * (1 - tf.to_float(negative_mask)))
-    cls_weights_without_background = (
-        (1 - implicit_class_weight) * tf.to_float(explicit_example_mask)
-        + implicit_class_weight)
+        cls_targets_without_background *
+        (1 - tf.cast(negative_mask, dtype=tf.float32)))
+    cls_weights_without_background = ((1 - implicit_class_weight) * tf.cast(
+        explicit_example_mask, dtype=tf.float32) + implicit_class_weight)
 
     if include_background_class:
       cls_weights_background = (
@@ -59,8 +59,15 @@ class _ClassTensorHandler(slim_example_decoder.Tensor):
         label_map_proto_file, use_display_name=False)
     # We use a default_value of -1, but we expect all labels to be contained
     # in the label map.
-    name_to_id_table = tf.contrib.lookup.HashTable(
-        initializer=tf.contrib.lookup.KeyValueTensorInitializer(
+    try:
+      # Dynamically try to load the tf v2 lookup, falling back to contrib
+      lookup = tf.compat.v2.lookup
+      hash_table_class = tf.compat.v2.lookup.StaticHashTable
+    except AttributeError:
+      lookup = tf.contrib.lookup
+      hash_table_class = tf.contrib.lookup.HashTable
+    name_to_id_table = hash_table_class(
+        initializer=lookup.KeyValueTensorInitializer(
             keys=tf.constant(list(name_to_id.keys())),
             values=tf.constant(list(name_to_id.values()), dtype=tf.int64)),
         default_value=-1)
@@ -68,8 +75,8 @@ class _ClassTensorHandler(slim_example_decoder.Tensor):
         label_map_proto_file, use_display_name=True)
     # We use a default_value of -1, but we expect all labels to be contained
     # in the label map.
-    display_name_to_id_table = tf.contrib.lookup.HashTable(
-        initializer=tf.contrib.lookup.KeyValueTensorInitializer(
+    display_name_to_id_table = hash_table_class(
+        initializer=lookup.KeyValueTensorInitializer(
             keys=tf.constant(list(display_name_to_id.keys())),
             values=tf.constant(
                 list(display_name_to_id.values()), dtype=tf.int64)),
@@ -444,7 +451,8 @@ class TfExampleDecoder(data_decoder.DataDecoder):
       masks = keys_to_tensors['image/object/mask']
       if isinstance(masks, tf.SparseTensor):
         masks = tf.sparse_tensor_to_dense(masks)
-      masks = tf.reshape(tf.to_float(tf.greater(masks, 0.0)), to_shape)
+      masks = tf.reshape(
+          tf.cast(tf.greater(masks, 0.0), dtype=tf.float32), to_shape)
       return tf.cast(masks, tf.float32)
 
   def _decode_png_instance_masks(self, keys_to_tensors):
@@ -465,7 +473,7 @@ class TfExampleDecoder(data_decoder.DataDecoder):
       image = tf.squeeze(
          tf.image.decode_image(image_buffer, channels=1), axis=2)
       image.set_shape([None, None])
-      image = tf.to_float(tf.greater(image, 0))
+      image = tf.cast(tf.greater(image, 0), dtype=tf.float32)
       return image
 
     png_masks = keys_to_tensors['image/object/mask']
@@ -476,4 +484,4 @@ class TfExampleDecoder(data_decoder.DataDecoder):
     return tf.cond(
         tf.greater(tf.size(png_masks), 0),
         lambda: tf.map_fn(decode_png_mask, png_masks, dtype=tf.float32),
-        lambda: tf.zeros(tf.to_int32(tf.stack([0, height, width]))))
+        lambda: tf.zeros(tf.cast(tf.stack([0, height, width]), dtype=tf.int32)))
@@ -44,10 +44,15 @@ EVAL_METRICS_CLASS_DICT = {
         coco_evaluation.CocoMaskEvaluator,
     'oid_challenge_detection_metrics':
         object_detection_evaluation.OpenImagesDetectionChallengeEvaluator,
+    'oid_challenge_segmentation_metrics':
+        object_detection_evaluation
+        .OpenImagesInstanceSegmentationChallengeEvaluator,
     'pascal_voc_detection_metrics':
         object_detection_evaluation.PascalDetectionEvaluator,
     'weighted_pascal_voc_detection_metrics':
         object_detection_evaluation.WeightedPascalDetectionEvaluator,
+    'precision_at_recall_detection_metrics':
+        object_detection_evaluation.PrecisionAtRecallDetectionEvaluator,
     'pascal_voc_instance_segmentation_metrics':
         object_detection_evaluation.PascalInstanceSegmentationEvaluator,
     'weighted_pascal_voc_instance_segmentation_metrics':
@@ -776,7 +781,8 @@ def result_dict_for_batched_example(images,
   detection_fields = fields.DetectionResultFields
   detection_boxes = detections[detection_fields.detection_boxes]
   detection_scores = detections[detection_fields.detection_scores]
-  num_detections = tf.to_int32(detections[detection_fields.num_detections])
+  num_detections = tf.cast(detections[detection_fields.num_detections],
+                           dtype=tf.int32)
 
   if class_agnostic:
     detection_classes = tf.ones_like(detection_scores, dtype=tf.int64)
@@ -939,4 +945,9 @@ def evaluator_options_from_eval_config(eval_config):
           'include_metrics_per_category': (
              eval_config.include_metrics_per_category)
       }
+    elif eval_metric_fn_key == 'precision_at_recall_detection_metrics':
+      evaluator_options[eval_metric_fn_key] = {
+          'recall_lower_bound': (eval_config.recall_lower_bound),
+          'recall_upper_bound': (eval_config.recall_upper_bound)
+      }
   return evaluator_options
@@ -31,9 +31,9 @@ from object_detection.utils import test_case
 class EvalUtilTest(test_case.TestCase, parameterized.TestCase):
 
   def _get_categories_list(self):
-    return [{'id': 0, 'name': 'person'},
-            {'id': 1, 'name': 'dog'},
-            {'id': 2, 'name': 'cat'}]
+    return [{'id': 1, 'name': 'person'},
+            {'id': 2, 'name': 'dog'},
+            {'id': 3, 'name': 'cat'}]
 
   def _make_evaluation_dict(self,
                             resized_groundtruth_masks=False,
@@ -192,43 +192,66 @@ class EvalUtilTest(test_case.TestCase, parameterized.TestCase):
   def test_get_eval_metric_ops_for_evaluators(self):
     eval_config = eval_pb2.EvalConfig()
-    eval_config.metrics_set.extend(
-        ['coco_detection_metrics', 'coco_mask_metrics'])
+    eval_config.metrics_set.extend([
+        'coco_detection_metrics', 'coco_mask_metrics',
+        'precision_at_recall_detection_metrics'
+    ])
     eval_config.include_metrics_per_category = True
+    eval_config.recall_lower_bound = 0.2
+    eval_config.recall_upper_bound = 0.6
 
     evaluator_options = eval_util.evaluator_options_from_eval_config(
         eval_config)
-    self.assertTrue(evaluator_options['coco_detection_metrics'][
-        'include_metrics_per_category'])
-    self.assertTrue(evaluator_options['coco_mask_metrics'][
-        'include_metrics_per_category'])
+    self.assertTrue(evaluator_options['coco_detection_metrics']
+                    ['include_metrics_per_category'])
+    self.assertTrue(
+        evaluator_options['coco_mask_metrics']['include_metrics_per_category'])
+    self.assertAlmostEqual(
+        evaluator_options['precision_at_recall_detection_metrics']
+        ['recall_lower_bound'], eval_config.recall_lower_bound)
+    self.assertAlmostEqual(
+        evaluator_options['precision_at_recall_detection_metrics']
+        ['recall_upper_bound'], eval_config.recall_upper_bound)
 
   def test_get_evaluator_with_evaluator_options(self):
     eval_config = eval_pb2.EvalConfig()
-    eval_config.metrics_set.extend(['coco_detection_metrics'])
+    eval_config.metrics_set.extend(
+        ['coco_detection_metrics', 'precision_at_recall_detection_metrics'])
     eval_config.include_metrics_per_category = True
+    eval_config.recall_lower_bound = 0.2
+    eval_config.recall_upper_bound = 0.6
     categories = self._get_categories_list()
 
     evaluator_options = eval_util.evaluator_options_from_eval_config(
         eval_config)
-    evaluator = eval_util.get_evaluators(
-        eval_config, categories, evaluator_options)
+    evaluator = eval_util.get_evaluators(eval_config, categories,
+                                         evaluator_options)
 
     self.assertTrue(evaluator[0]._include_metrics_per_category)
+    self.assertAlmostEqual(evaluator[1]._recall_lower_bound,
+                           eval_config.recall_lower_bound)
+    self.assertAlmostEqual(evaluator[1]._recall_upper_bound,
+                           eval_config.recall_upper_bound)
 
   def test_get_evaluator_with_no_evaluator_options(self):
     eval_config = eval_pb2.EvalConfig()
-    eval_config.metrics_set.extend(['coco_detection_metrics'])
+    eval_config.metrics_set.extend(
+        ['coco_detection_metrics', 'precision_at_recall_detection_metrics'])
     eval_config.include_metrics_per_category = True
+    eval_config.recall_lower_bound = 0.2
+    eval_config.recall_upper_bound = 0.6
     categories = self._get_categories_list()
 
     evaluator = eval_util.get_evaluators(
         eval_config, categories, evaluator_options=None)
 
     # Even though we are setting eval_config.include_metrics_per_category = True
-    # this option is never passed into the DetectionEvaluator constructor (via
-    # `evaluator_options`).
+    # and bounds on recall, these options are never passed into the
+    # DetectionEvaluator constructor (via `evaluator_options`).
     self.assertFalse(evaluator[0]._include_metrics_per_category)
+    self.assertAlmostEqual(evaluator[1]._recall_lower_bound, 0.0)
+    self.assertAlmostEqual(evaluator[1]._recall_upper_bound, 1.0)
 
 
 if __name__ == '__main__':
   tf.test.main()
@@ -106,7 +106,7 @@ flags.DEFINE_string('trained_checkpoint_prefix', None, 'Checkpoint prefix.')
 flags.DEFINE_integer('max_detections', 10,
                      'Maximum number of detections (boxes) to show.')
 flags.DEFINE_integer('max_classes_per_detection', 1,
-                     'Number of classes to display per detection box.')
+                     'Maximum number of classes to output per detection box.')
 flags.DEFINE_integer(
     'detections_per_class', 100,
     'Number of anchors used per class in Regular Non-Max-Suppression.')
@@ -136,7 +136,7 @@ def main(argv):
   export_tflite_ssd_graph_lib.export_tflite_graph(
       pipeline_config, FLAGS.trained_checkpoint_prefix, FLAGS.output_directory,
      FLAGS.add_postprocessing_op, FLAGS.max_detections,
-      FLAGS.max_classes_per_detection, FLAGS.use_regular_nms)
+      FLAGS.max_classes_per_detection, use_regular_nms=FLAGS.use_regular_nms)
 
 
 if __name__ == '__main__':
@@ -176,6 +176,9 @@ def add_output_tensor_nodes(postprocessed_tensors,
       containing detected boxes.
     * detection_scores: float32 tensor of shape [batch_size, num_boxes]
       containing scores for the detected boxes.
+    * detection_multiclass_scores: (Optional) float32 tensor of shape
+      [batch_size, num_boxes, num_classes_with_background] for containing class
+      score distribution for detected boxes including background if any.
     * detection_classes: float32 tensor of shape [batch_size, num_boxes]
       containing class predictions for the detected boxes.
     * detection_keypoints: (Optional) float32 tensor of shape
@@ -189,6 +192,8 @@ def add_output_tensor_nodes(postprocessed_tensors,
     postprocessed_tensors: a dictionary containing the following fields
       'detection_boxes': [batch, max_detections, 4]
       'detection_scores': [batch, max_detections]
+      'detection_multiclass_scores': [batch, max_detections,
+        num_classes_with_background]
       'detection_classes': [batch, max_detections]
       'detection_masks': [batch, max_detections, mask_height, mask_width]
         (optional).
@@ -204,6 +209,8 @@ def add_output_tensor_nodes(postprocessed_tensors,
   label_id_offset = 1
   boxes = postprocessed_tensors.get(detection_fields.detection_boxes)
   scores = postprocessed_tensors.get(detection_fields.detection_scores)
+  multiclass_scores = postprocessed_tensors.get(
+      detection_fields.detection_multiclass_scores)
   raw_boxes = postprocessed_tensors.get(detection_fields.raw_detection_boxes)
   raw_scores = postprocessed_tensors.get(detection_fields.raw_detection_scores)
   classes = postprocessed_tensors.get(
@@ -216,6 +223,9 @@ def add_output_tensor_nodes(postprocessed_tensors,
       boxes, name=detection_fields.detection_boxes)
   outputs[detection_fields.detection_scores] = tf.identity(
       scores, name=detection_fields.detection_scores)
+  if multiclass_scores is not None:
+    outputs[detection_fields.detection_multiclass_scores] = tf.identity(
+        multiclass_scores, name=detection_fields.detection_multiclass_scores)
   outputs[detection_fields.detection_classes] = tf.identity(
       classes, name=detection_fields.detection_classes)
   outputs[detection_fields.num_detections] = tf.identity(
@@ -306,7 +316,7 @@ def write_graph_and_checkpoint(inference_graph_def,
 
 def _get_outputs_from_inputs(input_tensors, detection_model,
                              output_collection_name):
-  inputs = tf.to_float(input_tensors)
+  inputs = tf.cast(input_tensors, dtype=tf.float32)
   preprocessed_inputs, true_image_shapes = detection_model.preprocess(inputs)
   output_tensors = detection_model.predict(
       preprocessed_inputs, true_image_shapes)
@@ -59,6 +59,9 @@ class FakeModel(model.DetectionModel):
                                         [0.0, 0.0, 0.0, 0.0]]], tf.float32),
         'detection_scores': tf.constant([[0.7, 0.6],
                                          [0.9, 0.0]], tf.float32),
+        'detection_multiclass_scores': tf.constant([[[0.3, 0.7], [0.4, 0.6]],
+                                                    [[0.1, 0.9], [0.0, 0.0]]],
+                                                   tf.float32),
         'detection_classes': tf.constant([[0, 1],
                                           [1, 0]], tf.float32),
         'num_detections': tf.constant([2, 1], tf.float32),
@@ -371,6 +374,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
       inference_graph.get_tensor_by_name('image_tensor:0')
      inference_graph.get_tensor_by_name('detection_boxes:0')
       inference_graph.get_tensor_by_name('detection_scores:0')
+      inference_graph.get_tensor_by_name('detection_multiclass_scores:0')
       inference_graph.get_tensor_by_name('detection_classes:0')
       inference_graph.get_tensor_by_name('detection_keypoints:0')
       inference_graph.get_tensor_by_name('detection_masks:0')
@@ -398,6 +402,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
       inference_graph.get_tensor_by_name('image_tensor:0')
       inference_graph.get_tensor_by_name('detection_boxes:0')
       inference_graph.get_tensor_by_name('detection_scores:0')
+      inference_graph.get_tensor_by_name('detection_multiclass_scores:0')
       inference_graph.get_tensor_by_name('detection_classes:0')
       inference_graph.get_tensor_by_name('num_detections:0')
       with self.assertRaises(KeyError):
@@ -491,15 +496,20 @@ class ExportInferenceGraphTest(tf.test.TestCase):
           'encoded_image_string_tensor:0')
       boxes = inference_graph.get_tensor_by_name('detection_boxes:0')
       scores = inference_graph.get_tensor_by_name('detection_scores:0')
+      multiclass_scores = inference_graph.get_tensor_by_name(
+          'detection_multiclass_scores:0')
       classes = inference_graph.get_tensor_by_name('detection_classes:0')
       keypoints = inference_graph.get_tensor_by_name('detection_keypoints:0')
       masks = inference_graph.get_tensor_by_name('detection_masks:0')
       num_detections = inference_graph.get_tensor_by_name('num_detections:0')
       for image_str in [jpg_image_str, png_image_str]:
         image_str_batch_np = np.hstack([image_str]* 2)
-        (boxes_np, scores_np, classes_np, keypoints_np, masks_np,
-         num_detections_np) = sess.run(
-             [boxes, scores, classes, keypoints, masks, num_detections],
+        (boxes_np, scores_np, multiclass_scores_np, classes_np, keypoints_np,
+         masks_np, num_detections_np) = sess.run(
+             [
+                 boxes, scores, multiclass_scores, classes, keypoints, masks,
+                 num_detections
+             ],
             feed_dict={image_str_tensor: image_str_batch_np})
         self.assertAllClose(boxes_np, [[[0.0, 0.0, 0.5, 0.5],
                                         [0.5, 0.5, 0.8, 0.8]],
@@ -507,6 +517,8 @@ class ExportInferenceGraphTest(tf.test.TestCase):
                                         [0.0, 0.0, 0.0, 0.0]]])
         self.assertAllClose(scores_np, [[0.7, 0.6],
                                         [0.9, 0.0]])
+        self.assertAllClose(multiclass_scores_np, [[[0.3, 0.7], [0.4, 0.6]],
+                                                   [[0.1, 0.9], [0.0, 0.0]]])
         self.assertAllClose(classes_np, [[1, 2],
                                          [2, 1]])
         self.assertAllClose(keypoints_np, np.arange(48).reshape([2, 2, 6, 2]))
...@@ -127,7 +127,7 @@ def transform_input_data(tensor_dict, ...@@ -127,7 +127,7 @@ def transform_input_data(tensor_dict,
# Apply model preprocessing ops and resize instance masks. # Apply model preprocessing ops and resize instance masks.
image = tensor_dict[fields.InputDataFields.image] image = tensor_dict[fields.InputDataFields.image]
preprocessed_resized_image, true_image_shape = model_preprocess_fn( preprocessed_resized_image, true_image_shape = model_preprocess_fn(
tf.expand_dims(tf.to_float(image), axis=0)) tf.expand_dims(tf.cast(image, dtype=tf.float32), axis=0))
if use_bfloat16: if use_bfloat16:
preprocessed_resized_image = tf.cast( preprocessed_resized_image = tf.cast(
preprocessed_resized_image, tf.bfloat16) preprocessed_resized_image, tf.bfloat16)
...@@ -219,14 +219,15 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes, ...@@ -219,14 +219,15 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
num_additional_channels = 0 num_additional_channels = 0
if fields.InputDataFields.image_additional_channels in tensor_dict: if fields.InputDataFields.image_additional_channels in tensor_dict:
num_additional_channels = tensor_dict[ num_additional_channels = shape_utils.get_dim_as_int(tensor_dict[
fields.InputDataFields.image_additional_channels].shape[2].value fields.InputDataFields.image_additional_channels].shape[2])
# We assume that if num_additional_channels > 0, then it has already been # We assume that if num_additional_channels > 0, then it has already been
# concatenated to the base image (but not the ground truth). # concatenated to the base image (but not the ground truth).
num_channels = 3 num_channels = 3
if fields.InputDataFields.image in tensor_dict: if fields.InputDataFields.image in tensor_dict:
num_channels = tensor_dict[fields.InputDataFields.image].shape[2].value num_channels = shape_utils.get_dim_as_int(
tensor_dict[fields.InputDataFields.image].shape[2])
if num_additional_channels: if num_additional_channels:
if num_additional_channels >= num_channels: if num_additional_channels >= num_channels:
...@@ -234,7 +235,8 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes, ...@@ -234,7 +235,8 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
'Image must be already concatenated with additional channels.') 'Image must be already concatenated with additional channels.')
if (fields.InputDataFields.original_image in tensor_dict and if (fields.InputDataFields.original_image in tensor_dict and
tensor_dict[fields.InputDataFields.original_image].shape[2].value == shape_utils.get_dim_as_int(
tensor_dict[fields.InputDataFields.original_image].shape[2]) ==
num_channels): num_channels):
raise ValueError( raise ValueError(
'Image must be already concatenated with additional channels.') 'Image must be already concatenated with additional channels.')
...@@ -273,19 +275,21 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes, ...@@ -273,19 +275,21 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
if fields.InputDataFields.original_image in tensor_dict: if fields.InputDataFields.original_image in tensor_dict:
padding_shapes[fields.InputDataFields.original_image] = [ padding_shapes[fields.InputDataFields.original_image] = [
height, width, tensor_dict[fields.InputDataFields. height, width,
original_image].shape[2].value shape_utils.get_dim_as_int(tensor_dict[fields.InputDataFields.
original_image].shape[2])
] ]
if fields.InputDataFields.groundtruth_keypoints in tensor_dict: if fields.InputDataFields.groundtruth_keypoints in tensor_dict:
tensor_shape = ( tensor_shape = (
tensor_dict[fields.InputDataFields.groundtruth_keypoints].shape) tensor_dict[fields.InputDataFields.groundtruth_keypoints].shape)
padding_shape = [max_num_boxes, tensor_shape[1].value, padding_shape = [max_num_boxes,
tensor_shape[2].value] shape_utils.get_dim_as_int(tensor_shape[1]),
shape_utils.get_dim_as_int(tensor_shape[2])]
padding_shapes[fields.InputDataFields.groundtruth_keypoints] = padding_shape padding_shapes[fields.InputDataFields.groundtruth_keypoints] = padding_shape
if fields.InputDataFields.groundtruth_keypoint_visibilities in tensor_dict: if fields.InputDataFields.groundtruth_keypoint_visibilities in tensor_dict:
tensor_shape = tensor_dict[fields.InputDataFields. tensor_shape = tensor_dict[fields.InputDataFields.
groundtruth_keypoint_visibilities].shape groundtruth_keypoint_visibilities].shape
padding_shape = [max_num_boxes, tensor_shape[1].value] padding_shape = [max_num_boxes, shape_utils.get_dim_as_int(tensor_shape[1])]
padding_shapes[fields.InputDataFields. padding_shapes[fields.InputDataFields.
groundtruth_keypoint_visibilities] = padding_shape groundtruth_keypoint_visibilities] = padding_shape
@@ -318,7 +322,7 @@ def augment_input_data(tensor_dict, data_augmentation_options):
     input tensor dictionary.
   """
   tensor_dict[fields.InputDataFields.image] = tf.expand_dims(
-      tf.to_float(tensor_dict[fields.InputDataFields.image]), 0)
+      tf.cast(tensor_dict[fields.InputDataFields.image], dtype=tf.float32), 0)

   include_instance_masks = (fields.InputDataFields.groundtruth_instance_masks
                             in tensor_dict)
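The `tf.to_float` to `tf.cast` substitutions in this hunk (and the matching ones further down) are behavior-preserving: `tf.to_float(x)` is shorthand for `tf.cast(x, tf.float32)`, and the shorthand is deprecated in later TF 1.x releases. For example:

```python
import tensorflow as tf

images = tf.constant([[0, 128], [255, 64]], dtype=tf.uint8)
# Deprecated spelling that triggers warnings:
#   float_images = tf.to_float(images)
# Equivalent explicit cast used throughout this change:
float_images = tf.cast(images, dtype=tf.float32)
```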
@@ -438,9 +442,22 @@ def create_train_input_fn(train_config, train_input_config,
   """
   def _train_input_fn(params=None):
+    return train_input(train_config, train_input_config, model_config,
+                       params=params)
+
+  return _train_input_fn
+
+
+def train_input(train_config, train_input_config,
+                model_config, model=None, params=None):
   """Returns `features` and `labels` tensor dictionaries for training.

   Args:
+    train_config: A train_pb2.TrainConfig.
+    train_input_config: An input_reader_pb2.InputReader.
+    model_config: A model_pb2.DetectionModel.
+    model: A pre-constructed Detection Model.
+      If None, one will be created from the config.
     params: Parameter dictionary passed from the estimator.

   Returns:
@@ -490,6 +507,12 @@ def create_train_input_fn(train_config, train_input_config,
     raise TypeError('The `model_config` must be a '
                     'model_pb2.DetectionModel.')

+  if model is None:
+    model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
+        model_config, is_training=True).preprocess
+  else:
+    model_preprocess_fn = model.preprocess
+
   def transform_and_pad_input_data_fn(tensor_dict):
     """Combines transform and pad operation."""
     data_augmentation_options = [
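Hoisting the preprocess-fn selection out of the transform function lets a caller pass a pre-constructed model whose `preprocess` method is reused, instead of building a throwaway model from the config every time the input pipeline is constructed; `eval_input` below gets the same treatment. A hedged usage sketch; the `model_builder` import path and call are assumptions based on the surrounding code, not taken verbatim from this change:

```python
# Hypothetical usage of the new top-level train_input with a pre-built model.
from object_detection import inputs
from object_detection.builders import model_builder

detection_model = model_builder.build(model_config, is_training=True)

# Reuses detection_model.preprocess instead of building a second model:
dataset = inputs.train_input(
    train_config, train_input_config, model_config, model=detection_model)
```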
@@ -500,8 +523,6 @@ def create_train_input_fn(train_config, train_input_config,
         augment_input_data,
         data_augmentation_options=data_augmentation_options)

-    model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
-        model_config, is_training=True).preprocess
     image_resizer_config = config_util.get_image_resizer_config(model_config)
     image_resizer_fn = image_resizer_builder.build(image_resizer_config)
     transform_data_fn = functools.partial(
@@ -528,8 +549,6 @@ def create_train_input_fn(train_config, train_input_config,
       batch_size=params['batch_size'] if params else train_config.batch_size)
   return dataset

-  return _train_input_fn
-

 def create_eval_input_fn(eval_config, eval_input_config, model_config):
   """Creates an eval `input` function for `Estimator`.
@@ -544,9 +563,22 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
   """
   def _eval_input_fn(params=None):
+    return eval_input(eval_config, eval_input_config, model_config,
+                      params=params)
+
+  return _eval_input_fn
+
+
+def eval_input(eval_config, eval_input_config, model_config,
+               model=None, params=None):
   """Returns `features` and `labels` tensor dictionaries for evaluation.

   Args:
+    eval_config: An eval_pb2.EvalConfig.
+    eval_input_config: An input_reader_pb2.InputReader.
+    model_config: A model_pb2.DetectionModel.
+    model: A pre-constructed Detection Model.
+      If None, one will be created from the config.
     params: Parameter dictionary passed from the estimator.

   Returns:
@@ -593,11 +625,15 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
     raise TypeError('The `model_config` must be a '
                     'model_pb2.DetectionModel.')

+  if model is None:
+    model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
+        model_config, is_training=False).preprocess
+  else:
+    model_preprocess_fn = model.preprocess
+
   def transform_and_pad_input_data_fn(tensor_dict):
     """Combines transform and pad operation."""
     num_classes = config_util.get_number_of_classes(model_config)
-    model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
-        model_config, is_training=False).preprocess
     image_resizer_config = config_util.get_image_resizer_config(model_config)
     image_resizer_fn = image_resizer_builder.build(image_resizer_config)
@@ -621,8 +657,6 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
       transform_input_data_fn=transform_and_pad_input_data_fn)
   return dataset

-  return _eval_input_fn
-

 def create_predict_input_fn(model_config, predict_input_config):
   """Creates a predict `input` function for `Estimator`.
@@ -664,7 +698,7 @@ def create_predict_input_fn(model_config, predict_input_config):
         load_instance_masks=False,
         num_additional_channels=predict_input_config.num_additional_channels)
     input_dict = transform_fn(decoder.decode(example))
-    images = tf.to_float(input_dict[fields.InputDataFields.image])
+    images = tf.cast(input_dict[fields.InputDataFields.image], dtype=tf.float32)
     images = tf.expand_dims(images, axis=0)
     true_image_shape = tf.expand_dims(
         input_dict[fields.InputDataFields.true_image_shape], axis=0)
...
@@ -53,6 +53,9 @@ EVAL_METRICS_CLASS_DICT = {
     # DEPRECATED: please use oid_challenge_detection_metrics instead
     'oid_challenge_object_detection_metrics':
         object_detection_evaluation.OpenImagesDetectionChallengeEvaluator,
+    'oid_challenge_segmentation_metrics':
+        object_detection_evaluation
+        .OpenImagesInstanceSegmentationChallengeEvaluator,
 }

 EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'
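With the new entry registered, the Open Images instance segmentation evaluator can be resolved by the same `metrics_set` name mechanism as the existing metrics. A hedged sketch of the lookup; the constructor argument follows the convention of the other evaluators and is an assumption here:

```python
# Sketch: resolve the newly registered evaluator by its metrics_set name.
# The categories argument (a list of {'id', 'name'} dicts) mirrors the other
# evaluators in this file and is assumed, not quoted from this change.
evaluator_class = EVAL_METRICS_CLASS_DICT['oid_challenge_segmentation_metrics']
evaluator = evaluator_class(categories=[{'id': 1, 'name': 'person'}])
```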
@@ -80,7 +83,7 @@ def _extract_predictions_and_losses(model,
     input_dict = prefetch_queue.dequeue()
     original_image = tf.expand_dims(input_dict[fields.InputDataFields.image], 0)
     preprocessed_image, true_image_shapes = model.preprocess(
-        tf.to_float(original_image))
+        tf.cast(original_image, dtype=tf.float32))
     prediction_dict = model.predict(preprocessed_image, true_image_shapes)
     detections = model.postprocess(prediction_dict, true_image_shapes)
...
@@ -62,7 +62,7 @@ def create_input_queue(batch_size_per_clone, create_tensor_dict_fn,
       tensor_dict[fields.InputDataFields.image], 0)
   images = tensor_dict[fields.InputDataFields.image]
-  float_images = tf.to_float(images)
+  float_images = tf.cast(images, dtype=tf.float32)
   tensor_dict[fields.InputDataFields.image] = float_images
   include_instance_masks = (fields.InputDataFields.groundtruth_instance_masks
...
@@ -184,7 +184,7 @@ class ArgMaxMatcher(matcher.Matcher):
       return matches

     if similarity_matrix.shape.is_fully_defined():
-      if similarity_matrix.shape[0].value == 0:
+      if shape_utils.get_dim_as_int(similarity_matrix.shape[0]) == 0:
         return _match_when_rows_are_empty()
       else:
         return _match_when_rows_are_non_empty()
...
@@ -62,7 +62,7 @@ class GreedyBipartiteMatcher(matcher.Matcher):
     # Convert similarity matrix to distance matrix as tf.image.bipartite tries
     # to find minimum distance matches.
     distance_matrix = -1 * similarity_matrix
-    num_valid_rows = tf.reduce_sum(tf.to_float(valid_rows))
+    num_valid_rows = tf.reduce_sum(tf.cast(valid_rows, dtype=tf.float32))
     _, match_results = image_ops.bipartite_match(
         distance_matrix, num_valid_rows=num_valid_rows)
     match_results = tf.reshape(match_results, [-1])
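The cast here converts the boolean `valid_rows` mask to floats so the reduction counts the rows that participate in matching, e.g.:

```python
import tensorflow as tf

valid_rows = tf.constant([True, True, False, True])
# Each True contributes 1.0 after the cast, so this sums to 3.0.
num_valid_rows = tf.reduce_sum(tf.cast(valid_rows, dtype=tf.float32))
```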