Unverified commit 9bbf8015 authored by pkulzc, committed by GitHub

Merged commit includes the following changes: (#6932)

250447559  by Zhichao Lu:

    Update the expected file format for the Instance Segmentation challenge:
    - add fields ImageWidth and ImageHeight, and store the values per prediction
    - as the mask, store only the encoded image and assume its size is ImageWidth x ImageHeight (see the sketch below)
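
    A minimal sketch of what a prediction row could look like under this
    format. The encoding helper is an assumption (COCO RLE via pycocotools,
    zlib-compressed and base64-encoded), not the official challenge spec,
    and all field values are illustrative:

    import base64
    import zlib

    import numpy as np
    from pycocotools import mask as coco_mask


    def encode_binary_mask(mask):
      """Serializes a full-image binary mask; its size is implied elsewhere."""
      rle = coco_mask.encode(np.asfortranarray(mask.astype(np.uint8)))
      return base64.b64encode(zlib.compress(rle['counts'])).decode('ascii')


    prediction = {
        'ImageID': '000026e7ee790996',  # hypothetical example id
        'ImageWidth': 1024,             # new: stored per prediction
        'ImageHeight': 768,             # new: stored per prediction
        'LabelName': '/m/01g317',
        'Score': 0.87,
        # Only the encoded image is stored; its size is assumed to be
        # ImageWidth x ImageHeight.
        'Mask': encode_binary_mask(np.zeros((768, 1024), dtype=np.uint8)),
    }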

--
250402780  by rathodv:

    Fix failing Mask R-CNN TPU convergence test.

    Cast second stage prediction tensors from bfloat16 to float32 to prevent errors in the third target assignment (mask prediction): concatenating tensors of different types (bfloat16 and float32) isn't allowed.
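
    A sketch of the fix with illustrative tensor names; tf.concat requires
    all of its inputs to share one dtype:

    import tensorflow as tf

    # Illustrative stand-ins for second stage prediction tensors.
    box_encodings = tf.zeros([8, 4], dtype=tf.bfloat16)
    class_predictions = tf.zeros([8, 2], dtype=tf.float32)

    # Without the cast, tf.concat below would fail on mixed dtypes.
    box_encodings = tf.cast(box_encodings, dtype=tf.float32)
    merged = tf.concat([box_encodings, class_predictions], axis=-1)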

--
250300240  by Zhichao Lu:

    Add Open Images Challenge 2019 object detection and instance segmentation
    support to the Estimator framework.

--
249944839  by rathodv:

    Modify exporter.py to add multiclass score nodes in exported inference graphs.

--
249935201  by rathodv:

    Modify postprocess methods to preserve multiclass scores after non max suppression.
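
    One plausible way to thread the full score matrix through NMS is the
    `additional_fields` mechanism of multiclass_non_max_suppression shown in
    the post_processing diff below; this sketch is illustrative, not
    necessarily the exact implementation:

    import tensorflow as tf
    from object_detection.core import post_processing

    boxes = tf.zeros([100, 1, 4], dtype=tf.float32)  # [k, q, 4]
    scores = tf.random.uniform([100, 3])             # [k, num_classes]

    # additional_fields tensors share their first dimension with `scores`,
    # so the per-class scores survive NMS alongside the selected boxes.
    nms_result, _ = post_processing.multiclass_non_max_suppression(
        boxes, scores, score_thresh=0.3, iou_thresh=0.5,
        max_size_per_class=10,
        additional_fields={'multiclass_scores': scores})
    multiclass_scores = nms_result.get_field('multiclass_scores')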

--
249878079  by Zhichao Lu:

    This CL slightly refactors some Object Detection helper functions for data creation, evaluation, and groundtruth providing.

    This will allow the eager+function custom loops to share code with the existing estimator training loops.

    Concretely we make the following changes:
    1. In input creation we separate dataset-creation into top-level helpers, and allow it to optionally accept a pre-constructed model directly instead of always creating a model from the config just for feature preprocessing.

    2. In coco evaluation we split the update_op creation into its own function, which the custom loops will call directly.

    3. In model_lib we move groundtruth providing / data structure munging into a helper function.

    4. For now we put an escape hatch in `_summarize_target_assignment` when executing with TF v2.0 behavior, because the summary APIs used only work with TF 1.x (see the sketch after this list).
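
    A sketch of the escape hatch in (4); the argument is illustrative, not
    the function's real signature:

    import tensorflow as tf


    def _summarize_target_assignment(avg_num_positive_anchors):
      if tf.executing_eagerly():
        # The tf.summary call below is TF 1.x-only, so skip it under
        # v2.0 behavior for now.
        return
      tf.summary.scalar('AvgNumPositiveAnchorsPerImage',
                        avg_num_positive_anchors)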

--
249673507  by rathodv:

    Use explicit casts instead of tf.to_float and tf.to_int32 to avoid warnings.
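
    The mechanical rewrite this refers to, applied throughout the
    preprocessor.py diff below:

    import tensorflow as tf

    x = tf.constant([1, 2, 3])
    x_float = tf.cast(x, dtype=tf.float32)    # was: tf.to_float(x)
    x_int = tf.cast(x_float, dtype=tf.int32)  # was: tf.to_int32(x_float)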

--
249656006  by Zhichao Lu:

    Add named "raw_keypoint_locations" node that corresponds with the "raw_box_locations" node.

--
249651674  by rathodv:

    Keep proposal boxes in float format. MatMulCropAndResize can handle the type even when the features themselves are bfloat16.

--
249568633  by rathodv:

    Support q > 1 in class agnostic NMS.
    Break post_processing_test.py into 3 separate files to avoid linter errors.
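
    The q > 1 handling mirrors the class_agnostic_non_max_suppression code in
    the diff below: when a separate box is predicted per class, keep only the
    box of each detection's highest scoring class before running NMS.

    import tensorflow as tf

    boxes = tf.zeros([100, 3, 4], dtype=tf.float32)  # q = 3 boxes per anchor
    scores = tf.random.uniform([100, 3])

    class_ids = tf.expand_dims(
        tf.argmax(scores, axis=1, output_type=tf.int32), axis=1)  # [k, 1]
    boxes = tf.batch_gather(boxes, class_ids)  # [k, 3, 4] -> [k, 1, 4]
    boxes = tf.squeeze(boxes, axis=[1])        # [k, 4]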

--
249535530  by rathodv:

    Update some deprecated arguments to tf ops.

--
249368223  by rathodv:

    Modify MatMulCropAndResize to use MultiLevelRoIAlign method and move the tests to spatial_transform_ops.py module.

    This CL establishes that CropAndResize and RoIAlign are equivalent and differ only in the sampling point grid within the boxes. CropAndResize uses a uniform size x size point grid whose corner points exactly overlap the box corners, while RoIAlign divides each box into size x size cells and uses their centers as sampling points. In this CL, we switch MatMulCropAndResize to use the MultiLevelRoIAlign implementation with the `align_corner` option, since the MultiLevelRoIAlign implementation is more memory efficient on TPU than the original MatMulCropAndResize.
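
    A 1-D illustration of the sampling-grid difference described above
    (plain numpy, nothing from the codebase):

    import numpy as np


    def crop_and_resize_points(a, b, n):
      # Uniform grid whose first and last points sit on the interval ends.
      return a + (b - a) * np.arange(n) / (n - 1)


    def roi_align_points(a, b, n):
      # Split [a, b] into n equal cells and sample each cell's center.
      return a + (b - a) * (np.arange(n) + 0.5) / n


    print(crop_and_resize_points(0.0, 1.0, 4))  # 0, 1/3, 2/3, 1
    print(roi_align_points(0.0, 1.0, 4))        # 0.125, 0.375, 0.625, 0.875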

--
249337338  by chowdhery:

    Add class-agnostic non-max-suppression in post_processing

--
249139196  by Zhichao Lu:

    Fix positional argument bug in export_tflite_ssd_graph

--
249120219  by Zhichao Lu:

    Add evaluator for computing precision limited to a given recall range.
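
    A sketch of the assumed semantics (precision averaged over the part of
    the PR curve whose recall lies inside a given range); the actual
    evaluator may differ in detail:

    import numpy as np


    def mean_precision_in_recall_range(precisions, recalls, lower, upper):
      precisions = np.asarray(precisions)
      recalls = np.asarray(recalls)
      in_range = (recalls >= lower) & (recalls <= upper)
      if not in_range.any():
        return 0.0
      return float(precisions[in_range].mean())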

--
249030593  by Zhichao Lu:

    Evaluation util to run segmentation and detection challenge evaluation.

--
248554358  by Zhichao Lu:

    This change contains the auxiliary changes required for TF 2.0 style training with eager+functions+dist strat loops, but not the loops themselves.

    It includes:
    - Updates to shape usage to support both tensorshape v1 and tensorshape v2
    - A fix to FreezableBatchNorm to not override the `training` arg in call when `None` was passed to the constructor (Not an issue in the estimator loops but it was in the custom loops)
    - Puts some constants in init_scope so they work in eager + functions
    - Makes learning rate schedules return a callable in eager mode (required so they update when the global_step changes; see the sketch after this list)
    - Makes DetectionModel a tf.module so it tracks variables (e.g. ones nested in layers)
    - Removes some references to `op.name` for some losses and replaces it w/ explicit names
    - A small part of the change to allow the coco evaluation metrics to work in eager mode
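
    A sketch of the learning-rate point above, with an illustrative schedule
    name (not the repo's actual helper):

    import tensorflow as tf


    def exponential_decay_lr(global_step, initial_lr, decay_steps,
                             decay_factor):
      def lr():
        # Re-reads global_step on every call, so eager optimizer steps see
        # the current rate rather than the rate at graph-build time.
        return initial_lr * decay_factor ** (
            tf.cast(global_step, tf.float32) / decay_steps)

      return lr if tf.executing_eagerly() else lr()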

--
248271226  by rathodv:

    Add MultiLevel RoIAlign op.

--
248229103  by rathodv:

    Add functions to 1. pad feature maps 2. ravel 5-D indices
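
    Raveling 5-D indices turns coordinates (b, l, y, x, c) of a
    [B, L, H, W, C] tensor into flat offsets, so values can be fetched with a
    single gather from the tensor reshaped to [-1]. A minimal scalar sketch
    (the in-repo helper is likely vectorized over tensors of indices):

    def ravel_5d_indices(b, l, y, x, c, shape):
      _, num_levels, height, width, channels = shape
      return (((b * num_levels + l) * height + y) * width + x) * channels + c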

--
248206769  by rathodv:

    Add utilities needed to introduce RoI Align op.

--
248177733  by pengchong:

    Internal changes

--
247742582  by Zhichao Lu:

    Open Images Challenge 2019 instance segmentation metric: part 2

--
247525401  by Zhichao Lu:

    Update comments on max_class_per_detection.

--
247520753  by rathodv:

    Add multilevel crop and resize operation that builds on top of matmul_crop_and_resize.
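
    For a multilevel crop, each box must first be assigned to a feature
    level. This log doesn't spell out the rule, so the sketch below uses the
    standard FPN heuristic (larger boxes read from coarser levels) purely as
    an illustration:

    import math


    def assign_fpn_level(box_height, box_width, k_min=2, k_max=5, k0=4):
      scale = math.sqrt(box_height * box_width)  # assumes a non-empty box
      level = int(math.floor(k0 + math.log2(scale / 224.0)))
      return max(k_min, min(k_max, level))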

--
247391600  by Zhichao Lu:

    Open Images Challenge 2019 instance segmentation metric

--
247325813  by chowdhery:

    Quantized MobileNet v2 SSD FPNLite config with depth multiplier 0.75

--

PiperOrigin-RevId: 250447559
parent f42fddee
@@ -55,12 +55,24 @@ a handful of auxiliary annotations associated with each bounding box, namely,
instance masks and keypoints.
"""
import abc

import tensorflow as tf

from object_detection.core import standard_fields as fields


# If using a new enough version of TensorFlow, detection models should be a
# tf module or keras model for tracking.
try:
  _BaseClass = tf.Module
except AttributeError:
  _BaseClass = object


class DetectionModel(_BaseClass):
  """Abstract base class for detection models.

  Extends tf.Module to guarantee variable tracking.
  """
  __metaclass__ = abc.ABCMeta

  def __init__(self, num_classes):
...
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for tensorflow_models.object_detection.core.post_processing."""
import numpy as np
import tensorflow as tf
from object_detection.core import post_processing
from object_detection.core import standard_fields as fields
from object_detection.utils import test_case
class MulticlassNonMaxSuppressionTest(test_case.TestCase):
def test_multiclass_nms_select_with_shared_boxes(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_output_size)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_select_with_shared_boxes_pad_to_max_output_size(self):
boxes = np.array([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], np.float32)
scores = np.array([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]], np.float32)
score_thresh = 0.1
iou_thresh = .5
max_size_per_class = 4
max_output_size = 5
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
def graph_fn(boxes, scores):
nms, num_valid_nms_boxes = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_size_per_class,
max_total_size=max_output_size,
pad_to_max_output_size=True)
return [nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes), num_valid_nms_boxes]
[nms_corners_output, nms_scores_output, nms_classes_output,
num_valid_nms_boxes] = self.execute(graph_fn, [boxes, scores])
self.assertEqual(num_valid_nms_boxes, 4)
self.assertAllClose(nms_corners_output[0:num_valid_nms_boxes],
exp_nms_corners)
self.assertAllClose(nms_scores_output[0:num_valid_nms_boxes],
exp_nms_scores)
self.assertAllClose(nms_classes_output[0:num_valid_nms_boxes],
exp_nms_classes)
def test_multiclass_nms_select_with_shared_boxes_given_keypoints(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
num_keypoints = 6
keypoints = tf.tile(
tf.reshape(tf.range(8), [8, 1, 1]),
[1, num_keypoints, 2])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
exp_nms_keypoints_tensor = tf.tile(
tf.reshape(tf.constant([3, 0, 6, 5], dtype=tf.float32), [4, 1, 1]),
[1, num_keypoints, 2])
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
additional_fields={fields.BoxListFields.keypoints: keypoints})
with self.test_session() as sess:
(nms_corners_output,
nms_scores_output,
nms_classes_output,
nms_keypoints,
exp_nms_keypoints) = sess.run([
nms.get(),
nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes),
nms.get_field(fields.BoxListFields.keypoints),
exp_nms_keypoints_tensor
])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
self.assertAllEqual(nms_keypoints, exp_nms_keypoints)
def test_multiclass_nms_with_shared_boxes_given_keypoint_heatmaps(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
num_boxes = tf.shape(boxes)[0]
heatmap_height = 5
heatmap_width = 5
num_keypoints = 17
keypoint_heatmaps = tf.ones(
[num_boxes, heatmap_height, heatmap_width, num_keypoints],
dtype=tf.float32)
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
exp_nms_keypoint_heatmaps = np.ones(
(4, heatmap_height, heatmap_width, num_keypoints), dtype=np.float32)
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
additional_fields={
fields.BoxListFields.keypoint_heatmaps: keypoint_heatmaps
})
with self.test_session() as sess:
(nms_corners_output,
nms_scores_output,
nms_classes_output,
nms_keypoint_heatmaps) = sess.run(
[nms.get(),
nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes),
nms.get_field(fields.BoxListFields.keypoint_heatmaps)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
self.assertAllEqual(nms_keypoint_heatmaps, exp_nms_keypoint_heatmaps)
def test_multiclass_nms_with_additional_fields(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
coarse_boxes_key = 'coarse_boxes'
coarse_boxes = tf.constant([[0.1, 0.1, 1.1, 1.1],
[0.1, 0.2, 1.1, 1.2],
[0.1, -0.2, 1.1, 1.0],
[0.1, 10.1, 1.1, 11.1],
[0.1, 10.2, 1.1, 11.2],
[0.1, 100.1, 1.1, 101.1],
[0.1, 1000.1, 1.1, 1002.1],
[0.1, 1000.1, 1.1, 1002.2]], tf.float32)
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = np.array([[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]], dtype=np.float32)
exp_nms_coarse_corners = np.array([[0.1, 10.1, 1.1, 11.1],
[0.1, 0.1, 1.1, 1.1],
[0.1, 1000.1, 1.1, 1002.1],
[0.1, 100.1, 1.1, 101.1]],
dtype=np.float32)
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
additional_fields={coarse_boxes_key: coarse_boxes})
with self.test_session() as sess:
(nms_corners_output,
nms_scores_output,
nms_classes_output,
nms_coarse_corners) = sess.run(
[nms.get(),
nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes),
nms.get_field(coarse_boxes_key)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
self.assertAllEqual(nms_coarse_corners, exp_nms_coarse_corners)
def test_multiclass_nms_select_with_shared_boxes_given_masks(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
num_classes = 2
mask_height = 3
mask_width = 3
masks = tf.tile(
tf.reshape(tf.range(8), [8, 1, 1, 1]),
[1, num_classes, mask_height, mask_width])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
exp_nms_masks_tensor = tf.tile(
tf.reshape(tf.constant([3, 0, 6, 5], dtype=tf.float32), [4, 1, 1]),
[1, mask_height, mask_width])
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_output_size, masks=masks)
with self.test_session() as sess:
(nms_corners_output,
nms_scores_output,
nms_classes_output,
nms_masks,
exp_nms_masks) = sess.run([nms.get(),
nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes),
nms.get_field(fields.BoxListFields.masks),
exp_nms_masks_tensor])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
self.assertAllEqual(nms_masks, exp_nms_masks)
def test_multiclass_nms_select_with_clip_window(self):
boxes = tf.constant([[[0, 0, 10, 10]],
[[1, 1, 11, 11]]], tf.float32)
scores = tf.constant([[.9], [.75]])
clip_window = tf.constant([5, 4, 8, 7], tf.float32)
score_thresh = 0.0
iou_thresh = 0.5
max_output_size = 100
exp_nms_corners = [[5, 4, 8, 7]]
exp_nms_scores = [.9]
exp_nms_classes = [0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
clip_window=clip_window)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_select_with_clip_window_change_coordinate_frame(self):
boxes = tf.constant([[[0, 0, 10, 10]],
[[1, 1, 11, 11]]], tf.float32)
scores = tf.constant([[.9], [.75]])
clip_window = tf.constant([5, 4, 8, 7], tf.float32)
score_thresh = 0.0
iou_thresh = 0.5
max_output_size = 100
exp_nms_corners = [[0, 0, 1, 1]]
exp_nms_scores = [.9]
exp_nms_classes = [0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
clip_window=clip_window,
change_coordinate_frame=True)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_select_with_per_class_cap(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
score_thresh = 0.1
iou_thresh = .5
max_size_per_class = 2
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002]]
exp_nms_scores = [.95, .9, .85]
exp_nms_classes = [0, 0, 1]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_size_per_class)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_select_with_total_cap(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
score_thresh = 0.1
iou_thresh = .5
max_size_per_class = 4
max_total_size = 2
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1]]
exp_nms_scores = [.95, .9]
exp_nms_classes = [0, 0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_size_per_class,
max_total_size)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_threshold_then_select_with_shared_boxes(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9], [.75], [.6], [.95], [.5], [.3], [.01], [.01]])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 3
exp_nms = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 100, 1, 101]]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_output_size)
with self.test_session() as sess:
nms_output = sess.run(nms.get())
self.assertAllClose(nms_output, exp_nms)
def test_multiclass_nms_select_with_separate_boxes(self):
boxes = tf.constant([[[0, 0, 1, 1], [0, 0, 4, 5]],
[[0, 0.1, 1, 1.1], [0, 0.1, 2, 1.1]],
[[0, -0.1, 1, 0.9], [0, -0.1, 1, 0.9]],
[[0, 10, 1, 11], [0, 10, 1, 11]],
[[0, 10.1, 1, 11.1], [0, 10.1, 1, 11.1]],
[[0, 100, 1, 101], [0, 100, 1, 101]],
[[0, 1000, 1, 1002], [0, 999, 2, 1004]],
[[0, 1000, 1, 1002.1], [0, 999, 2, 1002.7]]],
tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 999, 2, 1004],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_output_size)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
if __name__ == '__main__':
tf.test.main()
@@ -24,6 +24,94 @@ from object_detection.core import standard_fields as fields
from object_detection.utils import shape_utils
def _validate_boxes_scores_iou_thresh(boxes, scores, iou_thresh,
change_coordinate_frame, clip_window):
"""Validates boxes, scores and iou_thresh.
This function validates the boxes, scores and iou_thresh, and checks that,
if change_coordinate_frame is True, clip_window is specified.
Args:
boxes: A [k, q, 4] float32 tensor containing k detections. `q` can be either
number of classes or 1 depending on whether a separate box is predicted
per class.
scores: A [k, num_classes] float32 tensor containing the scores for each of
the k detections. The scores have to be non-negative when
pad_to_max_output_size is True.
iou_thresh: scalar threshold for IOU (new boxes that have high IOU overlap
with previously selected boxes are removed).
change_coordinate_frame: Whether to normalize coordinates after clipping
relative to clip_window (this can only be set to True if a clip_window is
provided)
clip_window: A float32 tensor of the form [y_min, x_min, y_max, x_max]
representing the window to clip and normalize boxes to before performing
non-max suppression.
Raises:
ValueError: if iou_thresh is not in [0, 1] or if input boxlist does not
have a valid scores field.
"""
if not 0 <= iou_thresh <= 1.0:
raise ValueError('iou_thresh must be between 0 and 1')
if scores.shape.ndims != 2:
raise ValueError('scores field must be of rank 2')
if scores.shape[1].value is None:
raise ValueError('scores must have statically defined second ' 'dimension')
if boxes.shape.ndims != 3:
raise ValueError('boxes must be of rank 3.')
if not (shape_utils.get_dim_as_int(
boxes.shape[1]) == shape_utils.get_dim_as_int(scores.shape[1]) or
shape_utils.get_dim_as_int(boxes.shape[1]) == 1):
raise ValueError('second dimension of boxes must be either 1 or equal '
'to the second dimension of scores')
if boxes.shape[2].value != 4:
raise ValueError('last dimension of boxes must be of size 4.')
if change_coordinate_frame and clip_window is None:
raise ValueError('if change_coordinate_frame is True, then a clip_window'
'must be specified.')
def _clip_window_prune_boxes(sorted_boxes, clip_window, pad_to_max_output_size,
change_coordinate_frame):
"""Prune boxes with zero area.
Args:
sorted_boxes: A BoxList containing k detections.
clip_window: A float32 tensor of the form [y_min, x_min, y_max, x_max]
representing the window to clip and normalize boxes to before performing
non-max suppression.
pad_to_max_output_size: flag indicating whether to pad to max output size or
not.
change_coordinate_frame: Whether to normalize coordinates after clipping
relative to clip_window (this can only be set to True if a clip_window is
provided).
Returns:
sorted_boxes: A BoxList containing k detections after pruning.
num_valid_nms_boxes_cumulative: Number of valid NMS boxes
"""
sorted_boxes = box_list_ops.clip_to_window(
sorted_boxes,
clip_window,
filter_nonoverlapping=not pad_to_max_output_size)
# Set the scores of boxes with zero area to -1 to keep the default
# behaviour of pruning out zero area boxes.
sorted_boxes_size = tf.shape(sorted_boxes.get())[0]
non_zero_box_area = tf.cast(box_list_ops.area(sorted_boxes), tf.bool)
sorted_boxes_scores = tf.where(
non_zero_box_area, sorted_boxes.get_field(fields.BoxListFields.scores),
-1 * tf.ones(sorted_boxes_size))
sorted_boxes.add_field(fields.BoxListFields.scores, sorted_boxes_scores)
num_valid_nms_boxes_cumulative = tf.reduce_sum(
tf.cast(tf.greater_equal(sorted_boxes_scores, 0), tf.int32))
sorted_boxes = box_list_ops.sort_by_field(sorted_boxes,
fields.BoxListFields.scores)
if change_coordinate_frame:
sorted_boxes = box_list_ops.change_coordinate_frame(sorted_boxes,
clip_window)
return sorted_boxes, num_valid_nms_boxes_cumulative
def multiclass_non_max_suppression(boxes,
                                   scores,
                                   score_thresh,
@@ -97,28 +185,12 @@ def multiclass_non_max_suppression(boxes,
    ValueError: if iou_thresh is not in [0, 1] or if input boxlist does not have
      a valid scores field.
  """
  _validate_boxes_scores_iou_thresh(boxes, scores, iou_thresh,
                                    change_coordinate_frame, clip_window)

  with tf.name_scope(scope, 'MultiClassNonMaxSuppression'):
    num_scores = tf.shape(scores)[0]
    num_classes = shape_utils.get_dim_as_int(scores.get_shape()[1])

    selected_boxes_list = []
    num_valid_nms_boxes_cumulative = tf.constant(0)
@@ -128,7 +200,7 @@ def multiclass_non_max_suppression(boxes,
    if boundaries is not None:
      per_class_boundaries_list = tf.unstack(boundaries, axis=1)
    boxes_ids = (range(num_classes) if len(per_class_boxes_list) > 1
                 else [0] * num_classes)
    for class_idx, boxes_idx in zip(range(num_classes), boxes_ids):
      per_class_boxes = per_class_boxes_list[boxes_idx]
      boxlist_and_class_scores = box_list.BoxList(per_class_boxes)
@@ -193,32 +265,13 @@ def multiclass_non_max_suppression(boxes,
    if clip_window is not None:
      # When pad_to_max_output_size is False, it prunes the boxes with zero
      # area.
      sorted_boxes, num_valid_nms_boxes_cumulative = _clip_window_prune_boxes(
          sorted_boxes, clip_window, pad_to_max_output_size,
          change_coordinate_frame)

    if max_total_size:
      max_total_size = tf.minimum(max_total_size, sorted_boxes.num_boxes())
      sorted_boxes = box_list_ops.gather(sorted_boxes, tf.range(max_total_size))
      num_valid_nms_boxes_cumulative = tf.where(
          max_total_size > num_valid_nms_boxes_cumulative,
          num_valid_nms_boxes_cumulative, max_total_size)
@@ -230,6 +283,175 @@ def multiclass_non_max_suppression(boxes,
    return sorted_boxes, num_valid_nms_boxes_cumulative
def class_agnostic_non_max_suppression(boxes,
scores,
score_thresh,
iou_thresh,
max_classes_per_detection=1,
max_total_size=0,
clip_window=None,
change_coordinate_frame=False,
masks=None,
boundaries=None,
pad_to_max_output_size=False,
additional_fields=None,
scope=None):
"""Class-agnostic version of non maximum suppression.
This op greedily selects a subset of detection bounding boxes, pruning
away boxes that have high IOU (intersection over union) overlap (> thresh)
with already selected boxes. It operates on all the boxes using
max scores across all classes for which scores are provided (via the scores
field of the input box_list), pruning boxes with score less than a provided
threshold prior to applying NMS.
Please note that this operation is performed in a class-agnostic way,
therefore any background classes should be removed prior to calling this
function.
Selected boxes are guaranteed to be sorted in decreasing order by score (but
the sort is not guaranteed to be stable).
Args:
boxes: A [k, q, 4] float32 tensor containing k detections. `q` can be either
number of classes or 1 depending on whether a separate box is predicted
per class.
scores: A [k, num_classes] float32 tensor containing the scores for each of
the k detections. The scores have to be non-negative when
pad_to_max_output_size is True.
score_thresh: scalar threshold for score (low scoring boxes are removed).
iou_thresh: scalar threshold for IOU (new boxes that have high IOU overlap
with previously selected boxes are removed).
max_classes_per_detection: maximum number of retained classes per detection
box in class-agnostic NMS.
max_total_size: maximum number of boxes retained over all classes. By
default returns all boxes retained after capping boxes per class.
clip_window: A float32 tensor of the form [y_min, x_min, y_max, x_max]
representing the window to clip and normalize boxes to before performing
non-max suppression.
change_coordinate_frame: Whether to normalize coordinates after clipping
relative to clip_window (this can only be set to True if a clip_window is
provided)
masks: (optional) a [k, q, mask_height, mask_width] float32 tensor
containing box masks. `q` can be either number of classes or 1 depending
on whether a separate mask is predicted per class.
boundaries: (optional) a [k, q, boundary_height, boundary_width] float32
tensor containing box boundaries. `q` can be either number of classes or 1
depending on whether a separate boundary is predicted per class.
pad_to_max_output_size: If true, the output nmsed boxes are padded to be of
length `max_size_per_class`. Defaults to false.
additional_fields: (optional) If not None, a dictionary that maps keys to
tensors whose first dimensions are all of size `k`. After non-maximum
suppression, all tensors corresponding to the selected boxes will be added
to resulting BoxList.
scope: name scope.
Returns:
A tuple of sorted_boxes and num_valid_nms_boxes. The sorted_boxes is a
BoxList that holds M boxes with a rank-1 scores field representing
corresponding scores for each box with scores sorted in decreasing order
and a rank-1 classes field representing a class label for each box. The
num_valid_nms_boxes is a 0-D integer tensor representing the number of
valid elements in `BoxList`, with the valid elements appearing first.
Raises:
ValueError: if iou_thresh is not in [0, 1] or if input boxlist does not have
a valid scores field.
"""
_validate_boxes_scores_iou_thresh(boxes, scores, iou_thresh,
change_coordinate_frame, clip_window)
if max_classes_per_detection > 1:
raise ValueError('Max classes per detection box >1 not supported.')
q = boxes.shape[1].value
if q > 1:
class_ids = tf.expand_dims(
tf.argmax(scores, axis=1, output_type=tf.int32), axis=1)
boxes = tf.batch_gather(boxes, class_ids)
if masks is not None:
masks = tf.batch_gather(masks, class_ids)
if boundaries is not None:
boundaries = tf.batch_gather(boundaries, class_ids)
boxes = tf.squeeze(boxes, axis=[1])
if masks is not None:
masks = tf.squeeze(masks, axis=[1])
if boundaries is not None:
boundaries = tf.squeeze(boundaries, axis=[1])
with tf.name_scope(scope, 'ClassAgnosticNonMaxSuppression'):
boxlist_and_class_scores = box_list.BoxList(boxes)
max_scores = tf.reduce_max(scores, axis=-1)
classes_with_max_scores = tf.argmax(scores, axis=-1)
boxlist_and_class_scores.add_field(fields.BoxListFields.scores, max_scores)
if masks is not None:
boxlist_and_class_scores.add_field(fields.BoxListFields.masks, masks)
if boundaries is not None:
boxlist_and_class_scores.add_field(fields.BoxListFields.boundaries,
boundaries)
if additional_fields is not None:
for key, tensor in additional_fields.items():
boxlist_and_class_scores.add_field(key, tensor)
if pad_to_max_output_size:
max_selection_size = max_total_size
selected_indices, num_valid_nms_boxes = (
tf.image.non_max_suppression_padded(
boxlist_and_class_scores.get(),
boxlist_and_class_scores.get_field(fields.BoxListFields.scores),
max_selection_size,
iou_threshold=iou_thresh,
score_threshold=score_thresh,
pad_to_max_output_size=True))
else:
max_selection_size = tf.minimum(max_total_size,
boxlist_and_class_scores.num_boxes())
selected_indices = tf.image.non_max_suppression(
boxlist_and_class_scores.get(),
boxlist_and_class_scores.get_field(fields.BoxListFields.scores),
max_selection_size,
iou_threshold=iou_thresh,
score_threshold=score_thresh)
num_valid_nms_boxes = tf.shape(selected_indices)[0]
selected_indices = tf.concat([
selected_indices,
tf.zeros(max_selection_size - num_valid_nms_boxes, tf.int32)
], 0)
nms_result = box_list_ops.gather(boxlist_and_class_scores, selected_indices)
valid_nms_boxes_indx = tf.less(
tf.range(max_selection_size), num_valid_nms_boxes)
nms_scores = nms_result.get_field(fields.BoxListFields.scores)
nms_result.add_field(
fields.BoxListFields.scores,
tf.where(valid_nms_boxes_indx, nms_scores,
-1 * tf.ones(max_selection_size)))
selected_classes = tf.gather(classes_with_max_scores, selected_indices)
nms_result.add_field(fields.BoxListFields.classes, selected_classes)
selected_boxes = nms_result
sorted_boxes = box_list_ops.sort_by_field(selected_boxes,
fields.BoxListFields.scores)
if clip_window is not None:
# When pad_to_max_output_size is False, it prunes the boxes with zero
# area.
sorted_boxes, num_valid_nms_boxes = _clip_window_prune_boxes(
sorted_boxes, clip_window, pad_to_max_output_size,
change_coordinate_frame)
if max_total_size:
max_total_size = tf.minimum(max_total_size, sorted_boxes.num_boxes())
sorted_boxes = box_list_ops.gather(sorted_boxes, tf.range(max_total_size))
num_valid_nms_boxes = tf.where(max_total_size > num_valid_nms_boxes,
num_valid_nms_boxes, max_total_size)
# Select only the valid boxes if pad_to_max_output_size is False.
if not pad_to_max_output_size:
sorted_boxes = box_list_ops.gather(sorted_boxes,
tf.range(num_valid_nms_boxes))
return sorted_boxes, num_valid_nms_boxes
def batch_multiclass_non_max_suppression(boxes,
                                         scores,
                                         score_thresh,
@@ -243,7 +465,9 @@ def batch_multiclass_non_max_suppression(boxes,
                                         additional_fields=None,
                                         scope=None,
                                         use_static_shapes=False,
                                         parallel_iterations=32,
                                         use_class_agnostic_nms=False,
                                         max_classes_per_detection=1):
  """Multi-class version of non maximum suppression that operates on a batch.

  This op is similar to `multiclass_non_max_suppression` but operates on a batch
@@ -253,8 +477,8 @@ def batch_multiclass_non_max_suppression(boxes,
  Args:
    boxes: A [batch_size, num_anchors, q, 4] float32 tensor containing
      detections. If `q` is 1 then same boxes are used for all classes
      otherwise, if `q` is equal to number of classes, class-specific boxes are
      used.
    scores: A [batch_size, num_anchors, num_classes] float32 tensor containing
      the scores for each of the `num_anchors` detections. The scores have to be
      non-negative when use_static_shapes is set True.
@@ -274,8 +498,8 @@ def batch_multiclass_non_max_suppression(boxes,
      relative to clip_window (this can only be set to True if a clip_window is
      provided)
    num_valid_boxes: (optional) a Tensor of type `int32`. A 1-D tensor of shape
      [batch_size] representing the number of valid boxes to be considered for
      each image in the batch. This parameter allows for ignoring zero
      paddings.
    masks: (optional) a [batch_size, num_anchors, q, mask_height, mask_width]
      float32 tensor containing box masks. `q` can be either number of classes
@@ -288,6 +512,10 @@ def batch_multiclass_non_max_suppression(boxes,
      Defaults to false.
    parallel_iterations: (optional) number of batch items to process in
      parallel.
    use_class_agnostic_nms: If true, this uses class-agnostic non max
      suppression
    max_classes_per_detection: Maximum number of retained classes per detection
      box in class-agnostic NMS.

  Returns:
    'nmsed_boxes': A [batch_size, max_detections, 4] float32 tensor
@@ -313,8 +541,8 @@ def batch_multiclass_non_max_suppression(boxes,
    ValueError: if `q` in boxes.shape is not 1 or not equal to number of
      classes as inferred from scores.shape.
  """
  q = shape_utils.get_dim_as_int(boxes.shape[2])
  num_classes = shape_utils.get_dim_as_int(scores.shape[2])
  if q != 1 and q != num_classes:
    raise ValueError('third dimension of boxes must be either 1 or equal '
                     'to the third dimension of scores')
@@ -335,8 +563,8 @@ def batch_multiclass_non_max_suppression(boxes,
    del additional_fields
  with tf.name_scope(scope, 'BatchMultiClassNonMaxSuppression'):
    boxes_shape = boxes.shape
    batch_size = shape_utils.get_dim_as_int(boxes_shape[0])
    num_anchors = shape_utils.get_dim_as_int(boxes_shape[1])

    if batch_size is None:
      batch_size = tf.shape(boxes)[0]
@@ -434,31 +662,47 @@ def batch_multiclass_non_max_suppression(boxes,
      per_image_masks = tf.reshape(
          tf.slice(per_image_masks, 4 * [0],
                   tf.stack([per_image_num_valid_boxes, -1, -1, -1])),
          [-1, q, shape_utils.get_dim_as_int(per_image_masks.shape[2]),
           shape_utils.get_dim_as_int(per_image_masks.shape[3])])
      if per_image_additional_fields is not None:
        for key, tensor in per_image_additional_fields.items():
          additional_field_shape = tensor.get_shape()
          additional_field_dim = len(additional_field_shape)
          per_image_additional_fields[key] = tf.reshape(
              tf.slice(
                  per_image_additional_fields[key],
                  additional_field_dim * [0],
                  tf.stack([per_image_num_valid_boxes] +
                           (additional_field_dim - 1) * [-1])), [-1] + [
                               shape_utils.get_dim_as_int(dim)
                               for dim in additional_field_shape[1:]
                           ])
      if use_class_agnostic_nms:
        nmsed_boxlist, num_valid_nms_boxes = class_agnostic_non_max_suppression(
            per_image_boxes,
            per_image_scores,
            score_thresh,
            iou_thresh,
            max_classes_per_detection,
            max_total_size,
            clip_window=per_image_clip_window,
            change_coordinate_frame=change_coordinate_frame,
            masks=per_image_masks,
            pad_to_max_output_size=use_static_shapes,
            additional_fields=per_image_additional_fields)
      else:
        nmsed_boxlist, num_valid_nms_boxes = multiclass_non_max_suppression(
            per_image_boxes,
            per_image_scores,
            score_thresh,
            iou_thresh,
            max_size_per_class,
            max_total_size,
            clip_window=per_image_clip_window,
            change_coordinate_frame=change_coordinate_frame,
            masks=per_image_masks,
            pad_to_max_output_size=use_static_shapes,
            additional_fields=per_image_additional_fields)

      if not use_static_shapes:
        nmsed_boxlist = box_list_ops.pad_or_clip_box_list(
@@ -499,7 +743,7 @@ def batch_multiclass_non_max_suppression(boxes,
    if num_additional_fields > 0:
      # Sort the keys to ensure arranging elements in same order as
      # in _single_image_nms_fn.
      batch_nmsed_keys = list(ordered_additional_fields.keys())
      for i in range(len(batch_nmsed_keys)):
        batch_nmsed_additional_fields[
            batch_nmsed_keys[i]] = batch_nmsed_values[i]
...
...@@ -55,7 +55,7 @@ def prefetch(tensor_dict, capacity): ...@@ -55,7 +55,7 @@ def prefetch(tensor_dict, capacity):
enqueue_op = prefetch_queue.enqueue(tensor_dict) enqueue_op = prefetch_queue.enqueue(tensor_dict)
tf.train.queue_runner.add_queue_runner(tf.train.queue_runner.QueueRunner( tf.train.queue_runner.add_queue_runner(tf.train.queue_runner.QueueRunner(
prefetch_queue, [enqueue_op])) prefetch_queue, [enqueue_op]))
tf.summary.scalar('queue/%s/fraction_of_%d_full' % (prefetch_queue.name, tf.summary.scalar(
capacity), 'queue/%s/fraction_of_%d_full' % (prefetch_queue.name, capacity),
tf.to_float(prefetch_queue.size()) * (1. / capacity)) tf.cast(prefetch_queue.size(), dtype=tf.float32) * (1. / capacity))
return prefetch_queue return prefetch_queue
...@@ -261,7 +261,7 @@ def normalize_image(image, original_minval, original_maxval, target_minval, ...@@ -261,7 +261,7 @@ def normalize_image(image, original_minval, original_maxval, target_minval,
original_maxval = float(original_maxval) original_maxval = float(original_maxval)
target_minval = float(target_minval) target_minval = float(target_minval)
target_maxval = float(target_maxval) target_maxval = float(target_maxval)
image = tf.to_float(image) image = tf.cast(image, dtype=tf.float32)
image = tf.subtract(image, original_minval) image = tf.subtract(image, original_minval)
image = tf.multiply(image, (target_maxval - target_minval) / image = tf.multiply(image, (target_maxval - target_minval) /
(original_maxval - original_minval)) (original_maxval - original_minval))
...@@ -810,10 +810,12 @@ def random_image_scale(image, ...@@ -810,10 +810,12 @@ def random_image_scale(image,
generator_func, preprocessor_cache.PreprocessorCache.IMAGE_SCALE, generator_func, preprocessor_cache.PreprocessorCache.IMAGE_SCALE,
preprocess_vars_cache) preprocess_vars_cache)
image_newysize = tf.to_int32( image_newysize = tf.cast(
tf.multiply(tf.to_float(image_height), size_coef)) tf.multiply(tf.cast(image_height, dtype=tf.float32), size_coef),
image_newxsize = tf.to_int32( dtype=tf.int32)
tf.multiply(tf.to_float(image_width), size_coef)) image_newxsize = tf.cast(
tf.multiply(tf.cast(image_width, dtype=tf.float32), size_coef),
dtype=tf.int32)
image = tf.image.resize_images( image = tf.image.resize_images(
image, [image_newysize, image_newxsize], align_corners=True) image, [image_newysize, image_newxsize], align_corners=True)
result.append(image) result.append(image)
...@@ -1237,7 +1239,7 @@ def _strict_random_crop_image(image, ...@@ -1237,7 +1239,7 @@ def _strict_random_crop_image(image,
new_image.set_shape([None, None, image.get_shape()[2]]) new_image.set_shape([None, None, image.get_shape()[2]])
# [1, 4] # [1, 4]
im_box_rank2 = tf.squeeze(im_box, squeeze_dims=[0]) im_box_rank2 = tf.squeeze(im_box, axis=[0])
# [4] # [4]
im_box_rank1 = tf.squeeze(im_box) im_box_rank1 = tf.squeeze(im_box)
...@@ -1555,13 +1557,15 @@ def random_pad_image(image, ...@@ -1555,13 +1557,15 @@ def random_pad_image(image,
new_image += image_color_padded new_image += image_color_padded
# setting boxes # setting boxes
new_window = tf.to_float( new_window = tf.cast(
tf.stack([ tf.stack([
-offset_height, -offset_width, target_height - offset_height, -offset_height, -offset_width, target_height - offset_height,
target_width - offset_width target_width - offset_width
])) ]),
new_window /= tf.to_float( dtype=tf.float32)
tf.stack([image_height, image_width, image_height, image_width])) new_window /= tf.cast(
tf.stack([image_height, image_width, image_height, image_width]),
dtype=tf.float32)
boxlist = box_list.BoxList(boxes) boxlist = box_list.BoxList(boxes)
new_boxlist = box_list_ops.change_coordinate_frame(boxlist, new_window) new_boxlist = box_list_ops.change_coordinate_frame(boxlist, new_window)
new_boxes = new_boxlist.get() new_boxes = new_boxlist.get()
...@@ -1616,8 +1620,8 @@ def random_absolute_pad_image(image, ...@@ -1616,8 +1620,8 @@ def random_absolute_pad_image(image,
form. form.
""" """
min_image_size = tf.shape(image)[:2] min_image_size = tf.shape(image)[:2]
max_image_size = min_image_size + tf.to_int32( max_image_size = min_image_size + tf.cast(
[max_height_padding, max_width_padding]) [max_height_padding, max_width_padding], dtype=tf.int32)
return random_pad_image(image, boxes, min_image_size=min_image_size, return random_pad_image(image, boxes, min_image_size=min_image_size,
max_image_size=max_image_size, pad_color=pad_color, max_image_size=max_image_size, pad_color=pad_color,
seed=seed, seed=seed,
...@@ -1723,12 +1727,14 @@ def random_crop_pad_image(image, ...@@ -1723,12 +1727,14 @@ def random_crop_pad_image(image,
cropped_image, cropped_boxes, cropped_labels = result[:3] cropped_image, cropped_boxes, cropped_labels = result[:3]
min_image_size = tf.to_int32( min_image_size = tf.cast(
tf.to_float(tf.stack([image_height, image_width])) * tf.cast(tf.stack([image_height, image_width]), dtype=tf.float32) *
min_padded_size_ratio) min_padded_size_ratio,
max_image_size = tf.to_int32( dtype=tf.int32)
tf.to_float(tf.stack([image_height, image_width])) * max_image_size = tf.cast(
max_padded_size_ratio) tf.cast(tf.stack([image_height, image_width]), dtype=tf.float32) *
max_padded_size_ratio,
dtype=tf.int32)
padded_image, padded_boxes = random_pad_image( padded_image, padded_boxes = random_pad_image(
cropped_image, cropped_image,
...@@ -1840,16 +1846,23 @@ def random_crop_to_aspect_ratio(image, ...@@ -1840,16 +1846,23 @@ def random_crop_to_aspect_ratio(image,
image_shape = tf.shape(image) image_shape = tf.shape(image)
orig_height = image_shape[0] orig_height = image_shape[0]
orig_width = image_shape[1] orig_width = image_shape[1]
orig_aspect_ratio = tf.to_float(orig_width) / tf.to_float(orig_height) orig_aspect_ratio = tf.cast(
orig_width, dtype=tf.float32) / tf.cast(
orig_height, dtype=tf.float32)
new_aspect_ratio = tf.constant(aspect_ratio, dtype=tf.float32) new_aspect_ratio = tf.constant(aspect_ratio, dtype=tf.float32)
def target_height_fn(): def target_height_fn():
return tf.to_int32(tf.round(tf.to_float(orig_width) / new_aspect_ratio)) return tf.cast(
tf.round(tf.cast(orig_width, dtype=tf.float32) / new_aspect_ratio),
dtype=tf.int32)
target_height = tf.cond(orig_aspect_ratio >= new_aspect_ratio, target_height = tf.cond(orig_aspect_ratio >= new_aspect_ratio,
lambda: orig_height, target_height_fn) lambda: orig_height, target_height_fn)
def target_width_fn(): def target_width_fn():
return tf.to_int32(tf.round(tf.to_float(orig_height) * new_aspect_ratio)) return tf.cast(
tf.round(tf.cast(orig_height, dtype=tf.float32) * new_aspect_ratio),
dtype=tf.int32)
target_width = tf.cond(orig_aspect_ratio <= new_aspect_ratio, target_width = tf.cond(orig_aspect_ratio <= new_aspect_ratio,
lambda: orig_width, target_width_fn) lambda: orig_width, target_width_fn)
...@@ -1870,10 +1883,14 @@ def random_crop_to_aspect_ratio(image, ...@@ -1870,10 +1883,14 @@ def random_crop_to_aspect_ratio(image,
image, offset_height, offset_width, target_height, target_width) image, offset_height, offset_width, target_height, target_width)
im_box = tf.stack([ im_box = tf.stack([
tf.to_float(offset_height) / tf.to_float(orig_height), tf.cast(offset_height, dtype=tf.float32) /
        tf.cast(orig_height, dtype=tf.float32),
        tf.cast(offset_width, dtype=tf.float32) /
        tf.cast(orig_width, dtype=tf.float32),
        tf.cast(offset_height + target_height, dtype=tf.float32) /
        tf.cast(orig_height, dtype=tf.float32),
        tf.cast(offset_width + target_width, dtype=tf.float32) /
        tf.cast(orig_width, dtype=tf.float32)
    ])
    boxlist = box_list.BoxList(boxes)
@@ -1996,8 +2013,8 @@ def random_pad_to_aspect_ratio(image,
  with tf.name_scope('RandomPadToAspectRatio', values=[image]):
    image_shape = tf.shape(image)
    image_height = tf.cast(image_shape[0], dtype=tf.float32)
    image_width = tf.cast(image_shape[1], dtype=tf.float32)
    image_aspect_ratio = image_width / image_height
    new_aspect_ratio = tf.constant(aspect_ratio, dtype=tf.float32)
    target_height = tf.cond(
@@ -2034,7 +2051,8 @@ def random_pad_to_aspect_ratio(image,
    target_width = tf.round(scale * target_width)
    new_image = tf.image.pad_to_bounding_box(
        image, 0, 0, tf.cast(target_height, dtype=tf.int32),
        tf.cast(target_width, dtype=tf.int32))
    im_box = tf.stack([
        0.0,
@@ -2050,9 +2068,9 @@ def random_pad_to_aspect_ratio(image,
    if masks is not None:
      new_masks = tf.expand_dims(masks, -1)
      new_masks = tf.image.pad_to_bounding_box(
          new_masks, 0, 0, tf.cast(target_height, dtype=tf.int32),
          tf.cast(target_width, dtype=tf.int32))
      new_masks = tf.squeeze(new_masks, [-1])
      result.append(new_masks)
@@ -2106,10 +2124,12 @@ def random_black_patches(image,
    image_shape = tf.shape(image)
    image_height = image_shape[0]
    image_width = image_shape[1]
    box_size = tf.cast(
        tf.multiply(
            tf.minimum(
                tf.cast(image_height, dtype=tf.float32),
                tf.cast(image_width, dtype=tf.float32)), size_to_image_ratio),
        dtype=tf.int32)
    generator_func = functools.partial(tf.random_uniform, [], minval=0.0,
                                       maxval=(1.0 - size_to_image_ratio),
@@ -2123,8 +2143,12 @@ def random_black_patches(image,
        preprocessor_cache.PreprocessorCache.ADD_BLACK_PATCH,
        preprocess_vars_cache, key=str(idx) + 'x')
    y_min = tf.cast(
        normalized_y_min * tf.cast(image_height, dtype=tf.float32),
        dtype=tf.int32)
    x_min = tf.cast(
        normalized_x_min * tf.cast(image_width, dtype=tf.float32),
        dtype=tf.int32)
    black_box = tf.ones([box_size, box_size, 3], dtype=tf.float32)
    mask = 1.0 - tf.image.pad_to_bounding_box(black_box, y_min, x_min,
                                              image_height, image_width)
@@ -2156,7 +2180,7 @@ def image_to_float(image):
    image: image in tf.float32 format.
  """
  with tf.name_scope('ImageToFloat', values=[image]):
    image = tf.cast(image, dtype=tf.float32)
    return image
@@ -2342,10 +2366,12 @@ def resize_to_min_dimension(image, masks=None, min_dimension=600,
    (image_height, image_width, num_channels) = _get_image_info(image)
    min_image_dimension = tf.minimum(image_height, image_width)
    min_target_dimension = tf.maximum(min_image_dimension, min_dimension)
    target_ratio = tf.cast(min_target_dimension, dtype=tf.float32) / tf.cast(
        min_image_dimension, dtype=tf.float32)
    target_height = tf.cast(
        tf.cast(image_height, dtype=tf.float32) * target_ratio, dtype=tf.int32)
    target_width = tf.cast(
        tf.cast(image_width, dtype=tf.float32) * target_ratio, dtype=tf.int32)
    image = tf.image.resize_images(
        tf.expand_dims(image, axis=0), size=[target_height, target_width],
        method=method,
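For intuition, the resize arithmetic above only ever scales up: the shorter image side is raised to at least `min_dimension` and the aspect ratio is preserved. A standalone sketch of the same arithmetic in plain Python (not the library function itself):

    def target_size(height, width, min_dimension=600):
      # Mirrors resize_to_min_dimension's ratio computation.
      min_image_dimension = min(height, width)
      min_target_dimension = max(min_image_dimension, min_dimension)
      target_ratio = float(min_target_dimension) / min_image_dimension
      return int(height * target_ratio), int(width * target_ratio)

    print(target_size(300, 400))  # (600, 800): short side scaled up to 600
    print(target_size(700, 900))  # (700, 900): already >= 600, ratio is 1.0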
@@ -2398,10 +2424,12 @@ def resize_to_max_dimension(image, masks=None, max_dimension=600,
    (image_height, image_width, num_channels) = _get_image_info(image)
    max_image_dimension = tf.maximum(image_height, image_width)
    max_target_dimension = tf.minimum(max_image_dimension, max_dimension)
    target_ratio = tf.cast(max_target_dimension, dtype=tf.float32) / tf.cast(
        max_image_dimension, dtype=tf.float32)
    target_height = tf.cast(
        tf.cast(image_height, dtype=tf.float32) * target_ratio, dtype=tf.int32)
    target_width = tf.cast(
        tf.cast(image_width, dtype=tf.float32) * target_ratio, dtype=tf.int32)
    image = tf.image.resize_images(
        tf.expand_dims(image, axis=0), size=[target_height, target_width],
        method=method,
@@ -2639,11 +2667,11 @@ def random_self_concat_image(
    if axis == 0:
      # Concat vertically, so need to reduce the y coordinates.
      old_scaling = tf.constant([0.5, 1.0, 0.5, 1.0])
      new_translation = tf.constant([0.5, 0.0, 0.5, 0.0])
    elif axis == 1:
      old_scaling = tf.constant([1.0, 0.5, 1.0, 0.5])
      new_translation = tf.constant([0.0, 0.5, 0.0, 0.5])
    old_boxes = old_scaling * boxes
    new_boxes = old_boxes + new_translation
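The scaling/translation constants encode the coordinate bookkeeping for self-concatenation: after stacking the image with itself, every normalized box shrinks by half along the concat axis, and the boxes belonging to the second copy are shifted by 0.5. A small numeric check (NumPy used purely for illustration):

    import numpy as np

    boxes = np.array([[0.0, 0.0, 1.0, 1.0]])  # [y_min, x_min, y_max, x_max]
    old_scaling = np.array([0.5, 1.0, 0.5, 1.0])    # vertical concat (axis=0)
    new_translation = np.array([0.5, 0.0, 0.5, 0.0])
    old_boxes = old_scaling * boxes           # box in the top copy
    new_boxes = old_boxes + new_translation   # same box in the bottom copy
    print(old_boxes)  # [[0.  0.  0.5 1. ]]
    print(new_boxes)  # [[0.5 0.  1.  1. ]]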
...
@@ -795,8 +795,8 @@ class PreprocessorTest(tf.test.TestCase):
    images = self.createTestImages()
    tensor_dict = {fields.InputDataFields.image: images}
    tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options)
    images_min = tf.cast(images, dtype=tf.float32) * 0.9 / 255.0
    images_max = tf.cast(images, dtype=tf.float32) * 1.1 / 255.0
    images = tensor_dict[fields.InputDataFields.image]
    values_greater = tf.greater_equal(images, images_min)
    values_less = tf.less_equal(images, images_max)
@@ -858,20 +858,26 @@ class PreprocessorTest(tf.test.TestCase):
        value=images_gray, num_or_size_splits=3, axis=3)
    images_r, images_g, images_b = tf.split(
        value=images_original, num_or_size_splits=3, axis=3)
    images_r_diff1 = tf.squared_difference(
        tf.cast(images_r, dtype=tf.float32),
        tf.cast(images_gray_r, dtype=tf.float32))
    images_r_diff2 = tf.squared_difference(
        tf.cast(images_gray_r, dtype=tf.float32),
        tf.cast(images_gray_g, dtype=tf.float32))
    images_r_diff = tf.multiply(images_r_diff1, images_r_diff2)
    images_g_diff1 = tf.squared_difference(
        tf.cast(images_g, dtype=tf.float32),
        tf.cast(images_gray_g, dtype=tf.float32))
    images_g_diff2 = tf.squared_difference(
        tf.cast(images_gray_g, dtype=tf.float32),
        tf.cast(images_gray_b, dtype=tf.float32))
    images_g_diff = tf.multiply(images_g_diff1, images_g_diff2)
    images_b_diff1 = tf.squared_difference(
        tf.cast(images_b, dtype=tf.float32),
        tf.cast(images_gray_b, dtype=tf.float32))
    images_b_diff2 = tf.squared_difference(
        tf.cast(images_gray_b, dtype=tf.float32),
        tf.cast(images_gray_r, dtype=tf.float32))
    images_b_diff = tf.multiply(images_b_diff1, images_b_diff2)
    image_zero1 = tf.constant(0, dtype=tf.float32, shape=[1, 4, 4, 1])
    with self.test_session() as sess:
@@ -2135,7 +2141,7 @@ class PreprocessorTest(tf.test.TestCase):
    boxes = self.createTestBoxes()
    labels = self.createTestLabels()
    tensor_dict = {
        fields.InputDataFields.image: tf.cast(images, dtype=tf.float32),
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
    }
@@ -2856,7 +2862,7 @@ class PreprocessorTest(tf.test.TestCase):
    scores = self.createTestMultiClassScores()
    tensor_dict = {
        fields.InputDataFields.image: tf.cast(images, dtype=tf.float32),
        fields.InputDataFields.groundtruth_boxes: boxes,
        fields.InputDataFields.groundtruth_classes: labels,
        fields.InputDataFields.groundtruth_weights: weights,
...
@@ -109,6 +109,8 @@ class DetectionResultFields(object):
    key: unique key corresponding to image.
    detection_boxes: coordinates of the detection boxes in the image.
    detection_scores: detection scores for the detection boxes in the image.
    detection_multiclass_scores: class score distribution (including
      background) for the detection boxes in the image.
    detection_classes: detection-level class labels.
    detection_masks: contains a segmentation mask for each detection box.
    detection_boundaries: contains an object boundary for each detection box.
@@ -123,6 +125,7 @@ class DetectionResultFields(object):
  key = 'key'
  detection_boxes = 'detection_boxes'
  detection_scores = 'detection_scores'
  detection_multiclass_scores = 'detection_multiclass_scores'
  detection_classes = 'detection_classes'
  detection_masks = 'detection_masks'
  detection_boundaries = 'detection_boundaries'
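Downstream code reads the new field through the same key constants as the existing outputs; a minimal sketch (assuming `import tensorflow as tf`, with an illustrative dict standing in for a real postprocess() result):

    # `detections` is a stand-in for a model's postprocess() output dict.
    detections = {
        DetectionResultFields.detection_multiclass_scores:
            tf.zeros([1, 100, 91])  # [batch, max_detections, classes + background]
    }
    multiclass_scores = detections[
        DetectionResultFields.detection_multiclass_scores]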
...
@@ -660,16 +660,16 @@ def batch_assign_confidences(target_assigner,
    explicit_example_mask = tf.logical_or(positive_mask, negative_mask)
    positive_anchors = tf.reduce_any(positive_mask, axis=-1)
    regression_weights = tf.cast(positive_anchors, dtype=tf.float32)
    regression_targets = (
        reg_targets * tf.expand_dims(regression_weights, axis=-1))
    regression_weights_expanded = tf.expand_dims(regression_weights, axis=-1)
    cls_targets_without_background = (
        cls_targets_without_background *
        (1 - tf.cast(negative_mask, dtype=tf.float32)))
    cls_weights_without_background = ((1 - implicit_class_weight) * tf.cast(
        explicit_example_mask, dtype=tf.float32) + implicit_class_weight)
    if include_background_class:
      cls_weights_background = (
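The weight formula collapses to two cases: anchors with an explicit positive or negative label get full weight, and everything else gets the implicit weight. A quick numeric check of `(1 - w) * mask + w` (NumPy for illustration):

    import numpy as np

    implicit_class_weight = 0.5
    explicit_example_mask = np.array([1.0, 0.0, 1.0])
    cls_weights = ((1 - implicit_class_weight) * explicit_example_mask
                   + implicit_class_weight)
    print(cls_weights)  # [1.  0.5 1. ] -- explicit anchors 1.0, implicit 0.5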
...
@@ -59,8 +59,15 @@ class _ClassTensorHandler(slim_example_decoder.Tensor):
          label_map_proto_file, use_display_name=False)
      # We use a default_value of -1, but we expect all labels to be contained
      # in the label map.
      try:
        # Dynamically try to load the tf v2 lookup, falling back to contrib.
        lookup = tf.compat.v2.lookup
        hash_table_class = tf.compat.v2.lookup.StaticHashTable
      except AttributeError:
        lookup = tf.contrib.lookup
        hash_table_class = tf.contrib.lookup.HashTable
      name_to_id_table = hash_table_class(
          initializer=lookup.KeyValueTensorInitializer(
              keys=tf.constant(list(name_to_id.keys())),
              values=tf.constant(list(name_to_id.values()), dtype=tf.int64)),
          default_value=-1)
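The try/except makes the decoder work on both API surfaces: `tf.compat.v2.lookup.StaticHashTable` where the TF 2.x symbols exist, `tf.contrib.lookup.HashTable` otherwise, with `KeyValueTensorInitializer` taking the same arguments in both. A usage sketch of the resulting table (toy keys and values, TF 1.x session style):

    table = hash_table_class(
        initializer=lookup.KeyValueTensorInitializer(
            keys=tf.constant(['cat', 'dog']),
            values=tf.constant([1, 2], dtype=tf.int64)),
        default_value=-1)
    ids = table.lookup(tf.constant(['dog', 'bird']))
    with tf.Session() as sess:
      sess.run(tf.tables_initializer())
      print(sess.run(ids))  # [ 2 -1] -- unknown keys map to default_value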
@@ -68,8 +75,8 @@ class _ClassTensorHandler(slim_example_decoder.Tensor):
          label_map_proto_file, use_display_name=True)
      # We use a default_value of -1, but we expect all labels to be contained
      # in the label map.
      display_name_to_id_table = hash_table_class(
          initializer=lookup.KeyValueTensorInitializer(
              keys=tf.constant(list(display_name_to_id.keys())),
              values=tf.constant(
                  list(display_name_to_id.values()), dtype=tf.int64)),
@@ -444,7 +451,8 @@ class TfExampleDecoder(data_decoder.DataDecoder):
      masks = keys_to_tensors['image/object/mask']
      if isinstance(masks, tf.SparseTensor):
        masks = tf.sparse_tensor_to_dense(masks)
      masks = tf.reshape(
          tf.cast(tf.greater(masks, 0.0), dtype=tf.float32), to_shape)
      return tf.cast(masks, tf.float32)

  def _decode_png_instance_masks(self, keys_to_tensors):
@@ -465,7 +473,7 @@ class TfExampleDecoder(data_decoder.DataDecoder):
      image = tf.squeeze(
          tf.image.decode_image(image_buffer, channels=1), axis=2)
      image.set_shape([None, None])
      image = tf.cast(tf.greater(image, 0), dtype=tf.float32)
      return image

    png_masks = keys_to_tensors['image/object/mask']
@@ -476,4 +484,4 @@ class TfExampleDecoder(data_decoder.DataDecoder):
    return tf.cond(
        tf.greater(tf.size(png_masks), 0),
        lambda: tf.map_fn(decode_png_mask, png_masks, dtype=tf.float32),
        lambda: tf.zeros(tf.cast(tf.stack([0, height, width]), dtype=tf.int32)))
@@ -44,10 +44,15 @@ EVAL_METRICS_CLASS_DICT = {
        coco_evaluation.CocoMaskEvaluator,
    'oid_challenge_detection_metrics':
        object_detection_evaluation.OpenImagesDetectionChallengeEvaluator,
'oid_challenge_segmentation_metrics':
object_detection_evaluation
.OpenImagesInstanceSegmentationChallengeEvaluator,
    'pascal_voc_detection_metrics':
        object_detection_evaluation.PascalDetectionEvaluator,
    'weighted_pascal_voc_detection_metrics':
        object_detection_evaluation.WeightedPascalDetectionEvaluator,
'precision_at_recall_detection_metrics':
object_detection_evaluation.PrecisionAtRecallDetectionEvaluator,
    'pascal_voc_instance_segmentation_metrics':
        object_detection_evaluation.PascalInstanceSegmentationEvaluator,
    'weighted_pascal_voc_instance_segmentation_metrics':
@@ -776,7 +781,8 @@ def result_dict_for_batched_example(images,
  detection_fields = fields.DetectionResultFields
  detection_boxes = detections[detection_fields.detection_boxes]
  detection_scores = detections[detection_fields.detection_scores]
  num_detections = tf.cast(detections[detection_fields.num_detections],
                           dtype=tf.int32)
  if class_agnostic:
    detection_classes = tf.ones_like(detection_scores, dtype=tf.int64)
@@ -939,4 +945,9 @@ def evaluator_options_from_eval_config(eval_config):
        'include_metrics_per_category': (
            eval_config.include_metrics_per_category)
    }
elif eval_metric_fn_key == 'precision_at_recall_detection_metrics':
evaluator_options[eval_metric_fn_key] = {
'recall_lower_bound': (eval_config.recall_lower_bound),
'recall_upper_bound': (eval_config.recall_upper_bound)
}
  return evaluator_options
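For reference, this is what the new branch produces end to end; a sketch with illustrative bounds:

    eval_config = eval_pb2.EvalConfig()
    eval_config.metrics_set.append('precision_at_recall_detection_metrics')
    eval_config.recall_lower_bound = 0.2
    eval_config.recall_upper_bound = 0.6
    options = evaluator_options_from_eval_config(eval_config)
    # options['precision_at_recall_detection_metrics'] ==
    #     {'recall_lower_bound': 0.2, 'recall_upper_bound': 0.6}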
@@ -31,9 +31,9 @@ from object_detection.utils import test_case
class EvalUtilTest(test_case.TestCase, parameterized.TestCase):

  def _get_categories_list(self):
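    # Ids start at 1 here; id 0 is conventionally reserved for background.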
    return [{'id': 1, 'name': 'person'},
            {'id': 2, 'name': 'dog'},
            {'id': 3, 'name': 'cat'}]
  def _make_evaluation_dict(self,
                            resized_groundtruth_masks=False,
@@ -192,43 +192,66 @@ class EvalUtilTest(test_case.TestCase, parameterized.TestCase):
  def test_get_eval_metric_ops_for_evaluators(self):
    eval_config = eval_pb2.EvalConfig()
    eval_config.metrics_set.extend([
        'coco_detection_metrics', 'coco_mask_metrics',
        'precision_at_recall_detection_metrics'
    ])
    eval_config.include_metrics_per_category = True
eval_config.recall_lower_bound = 0.2
eval_config.recall_upper_bound = 0.6
    evaluator_options = eval_util.evaluator_options_from_eval_config(
        eval_config)
    self.assertTrue(evaluator_options['coco_detection_metrics']
                    ['include_metrics_per_category'])
    self.assertTrue(
        evaluator_options['coco_mask_metrics']['include_metrics_per_category'])
self.assertAlmostEqual(
evaluator_options['precision_at_recall_detection_metrics']
['recall_lower_bound'], eval_config.recall_lower_bound)
self.assertAlmostEqual(
evaluator_options['precision_at_recall_detection_metrics']
['recall_upper_bound'], eval_config.recall_upper_bound)
  def test_get_evaluator_with_evaluator_options(self):
    eval_config = eval_pb2.EvalConfig()
    eval_config.metrics_set.extend(
        ['coco_detection_metrics', 'precision_at_recall_detection_metrics'])
    eval_config.include_metrics_per_category = True
eval_config.recall_lower_bound = 0.2
eval_config.recall_upper_bound = 0.6
    categories = self._get_categories_list()
    evaluator_options = eval_util.evaluator_options_from_eval_config(
        eval_config)
    evaluator = eval_util.get_evaluators(eval_config, categories,
                                         evaluator_options)
    self.assertTrue(evaluator[0]._include_metrics_per_category)
self.assertAlmostEqual(evaluator[1]._recall_lower_bound,
eval_config.recall_lower_bound)
self.assertAlmostEqual(evaluator[1]._recall_upper_bound,
eval_config.recall_upper_bound)
  def test_get_evaluator_with_no_evaluator_options(self):
    eval_config = eval_pb2.EvalConfig()
    eval_config.metrics_set.extend(
        ['coco_detection_metrics', 'precision_at_recall_detection_metrics'])
    eval_config.include_metrics_per_category = True
eval_config.recall_lower_bound = 0.2
eval_config.recall_upper_bound = 0.6
    categories = self._get_categories_list()
    evaluator = eval_util.get_evaluators(
        eval_config, categories, evaluator_options=None)
    # Even though we are setting eval_config.include_metrics_per_category = True
    # and bounds on recall, these options are never passed into the
    # DetectionEvaluator constructor (via `evaluator_options`).
    self.assertFalse(evaluator[0]._include_metrics_per_category)
self.assertAlmostEqual(evaluator[1]._recall_lower_bound, 0.0)
self.assertAlmostEqual(evaluator[1]._recall_upper_bound, 1.0)
if __name__ == '__main__':
  tf.test.main()
@@ -106,7 +106,7 @@ flags.DEFINE_string('trained_checkpoint_prefix', None, 'Checkpoint prefix.')
flags.DEFINE_integer('max_detections', 10,
                     'Maximum number of detections (boxes) to show.')
flags.DEFINE_integer('max_classes_per_detection', 1,
                     'Maximum number of classes to output per detection box.')
flags.DEFINE_integer(
    'detections_per_class', 100,
    'Number of anchors used per class in Regular Non-Max-Suppression.')
@@ -136,7 +136,7 @@ def main(argv):
  export_tflite_ssd_graph_lib.export_tflite_graph(
      pipeline_config, FLAGS.trained_checkpoint_prefix, FLAGS.output_directory,
      FLAGS.add_postprocessing_op, FLAGS.max_detections,
      FLAGS.max_classes_per_detection, use_regular_nms=FLAGS.use_regular_nms)

if __name__ == '__main__':
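Passing `use_regular_nms` by keyword fixes the positional-argument bug: `export_tflite_graph` accepts further optional parameters after `max_classes_per_detection`, so a positional value can silently bind to the wrong one. A schematic illustration with a hypothetical signature (not the library's actual one):

    def export_tflite_graph(config, ckpt, out_dir, add_postprocessing_op=True,
                            max_detections=10, max_classes_per_detection=1,
                            detections_per_class=100, use_regular_nms=False):
      return detections_per_class, use_regular_nms

    # Positionally, True lands on detections_per_class -- a silent bug:
    print(export_tflite_graph('c', 'k', 'o', True, 10, 1, True))  # (True, False)
    # The keyword form binds it where intended:
    print(export_tflite_graph('c', 'k', 'o', True, 10, 1,
                              use_regular_nms=True))              # (100, True)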
...
@@ -176,6 +176,9 @@ def add_output_tensor_nodes(postprocessed_tensors,
      containing detected boxes.
    * detection_scores: float32 tensor of shape [batch_size, num_boxes]
      containing scores for the detected boxes.
    * detection_multiclass_scores: (Optional) float32 tensor of shape
      [batch_size, num_boxes, num_classes_with_background] containing the
      class score distribution (including background, if any) for each
      detected box.
    * detection_classes: float32 tensor of shape [batch_size, num_boxes]
      containing class predictions for the detected boxes.
    * detection_keypoints: (Optional) float32 tensor of shape
@@ -189,6 +192,8 @@ def add_output_tensor_nodes(postprocessed_tensors,
    postprocessed_tensors: a dictionary containing the following fields
      'detection_boxes': [batch, max_detections, 4]
      'detection_scores': [batch, max_detections]
      'detection_multiclass_scores': [batch, max_detections,
        num_classes_with_background]
      'detection_classes': [batch, max_detections]
      'detection_masks': [batch, max_detections, mask_height, mask_width]
        (optional).
@@ -204,6 +209,8 @@ def add_output_tensor_nodes(postprocessed_tensors,
  label_id_offset = 1
  boxes = postprocessed_tensors.get(detection_fields.detection_boxes)
  scores = postprocessed_tensors.get(detection_fields.detection_scores)
  multiclass_scores = postprocessed_tensors.get(
      detection_fields.detection_multiclass_scores)
  raw_boxes = postprocessed_tensors.get(detection_fields.raw_detection_boxes)
  raw_scores = postprocessed_tensors.get(detection_fields.raw_detection_scores)
  classes = postprocessed_tensors.get(
@@ -216,6 +223,9 @@ def add_output_tensor_nodes(postprocessed_tensors,
      boxes, name=detection_fields.detection_boxes)
  outputs[detection_fields.detection_scores] = tf.identity(
      scores, name=detection_fields.detection_scores)
  if multiclass_scores is not None:
    outputs[detection_fields.detection_multiclass_scores] = tf.identity(
        multiclass_scores, name=detection_fields.detection_multiclass_scores)
  outputs[detection_fields.detection_classes] = tf.identity(
      classes, name=detection_fields.detection_classes)
  outputs[detection_fields.num_detections] = tf.identity(
@@ -306,7 +316,7 @@ def write_graph_and_checkpoint(inference_graph_def,
def _get_outputs_from_inputs(input_tensors, detection_model,
                             output_collection_name):
  inputs = tf.cast(input_tensors, dtype=tf.float32)
  preprocessed_inputs, true_image_shapes = detection_model.preprocess(inputs)
  output_tensors = detection_model.predict(
      preprocessed_inputs, true_image_shapes)
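Once exported, the new output can be fetched by name alongside the existing ones; a sketch, where `inference_graph`, `image_tensor`, and `image_np` stand for an already-loaded frozen graph and its input:

    multiclass_scores = inference_graph.get_tensor_by_name(
        'detection_multiclass_scores:0')
    with tf.Session(graph=inference_graph) as sess:
      multiclass_scores_np = sess.run(
          multiclass_scores, feed_dict={image_tensor: image_np})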
...
@@ -59,6 +59,9 @@ class FakeModel(model.DetectionModel):
                                    [0.0, 0.0, 0.0, 0.0]]], tf.float32),
        'detection_scores': tf.constant([[0.7, 0.6],
                                         [0.9, 0.0]], tf.float32),
'detection_multiclass_scores': tf.constant([[[0.3, 0.7], [0.4, 0.6]],
[[0.1, 0.9], [0.0, 0.0]]],
tf.float32),
        'detection_classes': tf.constant([[0, 1],
                                          [1, 0]], tf.float32),
        'num_detections': tf.constant([2, 1], tf.float32),
@@ -371,6 +374,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
      inference_graph.get_tensor_by_name('image_tensor:0')
      inference_graph.get_tensor_by_name('detection_boxes:0')
      inference_graph.get_tensor_by_name('detection_scores:0')
inference_graph.get_tensor_by_name('detection_multiclass_scores:0')
      inference_graph.get_tensor_by_name('detection_classes:0')
      inference_graph.get_tensor_by_name('detection_keypoints:0')
      inference_graph.get_tensor_by_name('detection_masks:0')
@@ -398,6 +402,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
      inference_graph.get_tensor_by_name('image_tensor:0')
      inference_graph.get_tensor_by_name('detection_boxes:0')
      inference_graph.get_tensor_by_name('detection_scores:0')
inference_graph.get_tensor_by_name('detection_multiclass_scores:0')
      inference_graph.get_tensor_by_name('detection_classes:0')
      inference_graph.get_tensor_by_name('num_detections:0')
      with self.assertRaises(KeyError):
@@ -491,15 +496,20 @@ class ExportInferenceGraphTest(tf.test.TestCase):
          'encoded_image_string_tensor:0')
      boxes = inference_graph.get_tensor_by_name('detection_boxes:0')
      scores = inference_graph.get_tensor_by_name('detection_scores:0')
multiclass_scores = inference_graph.get_tensor_by_name(
'detection_multiclass_scores:0')
      classes = inference_graph.get_tensor_by_name('detection_classes:0')
      keypoints = inference_graph.get_tensor_by_name('detection_keypoints:0')
      masks = inference_graph.get_tensor_by_name('detection_masks:0')
      num_detections = inference_graph.get_tensor_by_name('num_detections:0')
      for image_str in [jpg_image_str, png_image_str]:
        image_str_batch_np = np.hstack([image_str] * 2)
        (boxes_np, scores_np, multiclass_scores_np, classes_np, keypoints_np,
         masks_np, num_detections_np) = sess.run(
             [
                 boxes, scores, multiclass_scores, classes, keypoints, masks,
                 num_detections
             ],
             feed_dict={image_str_tensor: image_str_batch_np})
        self.assertAllClose(boxes_np, [[[0.0, 0.0, 0.5, 0.5],
                                        [0.5, 0.5, 0.8, 0.8]],
@@ -507,6 +517,8 @@ class ExportInferenceGraphTest(tf.test.TestCase):
                                        [0.0, 0.0, 0.0, 0.0]]])
        self.assertAllClose(scores_np, [[0.7, 0.6],
                                        [0.9, 0.0]])
self.assertAllClose(multiclass_scores_np, [[[0.3, 0.7], [0.4, 0.6]],
[[0.1, 0.9], [0.0, 0.0]]])
        self.assertAllClose(classes_np, [[1, 2],
                                         [2, 1]])
        self.assertAllClose(keypoints_np, np.arange(48).reshape([2, 2, 6, 2]))
...
@@ -127,7 +127,7 @@ def transform_input_data(tensor_dict,
  # Apply model preprocessing ops and resize instance masks.
  image = tensor_dict[fields.InputDataFields.image]
  preprocessed_resized_image, true_image_shape = model_preprocess_fn(
      tf.expand_dims(tf.cast(image, dtype=tf.float32), axis=0))
  if use_bfloat16:
    preprocessed_resized_image = tf.cast(
        preprocessed_resized_image, tf.bfloat16)
@@ -219,14 +219,15 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
  num_additional_channels = 0
  if fields.InputDataFields.image_additional_channels in tensor_dict:
    num_additional_channels = shape_utils.get_dim_as_int(tensor_dict[
        fields.InputDataFields.image_additional_channels].shape[2])
  # We assume that if num_additional_channels > 0, then it has already been
  # concatenated to the base image (but not the ground truth).
  num_channels = 3
  if fields.InputDataFields.image in tensor_dict:
    num_channels = shape_utils.get_dim_as_int(
        tensor_dict[fields.InputDataFields.image].shape[2])
  if num_additional_channels:
    if num_additional_channels >= num_channels:
@@ -234,7 +235,8 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
          'Image must be already concatenated with additional channels.')
    if (fields.InputDataFields.original_image in tensor_dict and
        shape_utils.get_dim_as_int(
            tensor_dict[fields.InputDataFields.original_image].shape[2]) ==
        num_channels):
      raise ValueError(
          'Image must be already concatenated with additional channels.')
@@ -273,19 +275,21 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
  if fields.InputDataFields.original_image in tensor_dict:
    padding_shapes[fields.InputDataFields.original_image] = [
        height, width,
        shape_utils.get_dim_as_int(tensor_dict[fields.InputDataFields.
                                               original_image].shape[2])
    ]
  if fields.InputDataFields.groundtruth_keypoints in tensor_dict:
    tensor_shape = (
        tensor_dict[fields.InputDataFields.groundtruth_keypoints].shape)
    padding_shape = [max_num_boxes,
                     shape_utils.get_dim_as_int(tensor_shape[1]),
                     shape_utils.get_dim_as_int(tensor_shape[2])]
    padding_shapes[fields.InputDataFields.groundtruth_keypoints] = padding_shape
  if fields.InputDataFields.groundtruth_keypoint_visibilities in tensor_dict:
    tensor_shape = tensor_dict[fields.InputDataFields.
                               groundtruth_keypoint_visibilities].shape
    padding_shape = [max_num_boxes, shape_utils.get_dim_as_int(tensor_shape[1])]
    padding_shapes[fields.InputDataFields.
                   groundtruth_keypoint_visibilities] = padding_shape
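`shape_utils.get_dim_as_int` papers over the TF 1.x/2.x difference where static shape entries are `tf.Dimension` objects versus plain ints. A sketch of what such a helper can look like (the actual implementation may differ):

    def get_dim_as_int(dim):
      """Returns a shape dimension as a plain int (or None if unknown)."""
      try:
        return dim.value  # tf.Dimension (TF 1.x style shapes)
      except AttributeError:
        return dim        # already an int (TF 2.x style shapes)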
@@ -318,7 +322,7 @@ def augment_input_data(tensor_dict, data_augmentation_options):
    input tensor dictionary.
  """
  tensor_dict[fields.InputDataFields.image] = tf.expand_dims(
      tf.cast(tensor_dict[fields.InputDataFields.image], dtype=tf.float32), 0)
  include_instance_masks = (fields.InputDataFields.groundtruth_instance_masks
                            in tensor_dict)
@@ -438,97 +442,112 @@ def create_train_input_fn(train_config, train_input_config,
  """
  def _train_input_fn(params=None):
    return train_input(train_config, train_input_config, model_config,
                       params=params)

  return _train_input_fn


def train_input(train_config, train_input_config,
                model_config, model=None, params=None):
"""Returns `features` and `labels` tensor dictionaries for training.
Args:
train_config: A train_pb2.TrainConfig.
train_input_config: An input_reader_pb2.InputReader.
model_config: A model_pb2.DetectionModel.
model: A pre-constructed Detection Model.
If None, one will be created from the config.
params: Parameter dictionary passed from the estimator.
Returns:
A tf.data.Dataset that holds (features, labels) tuple.
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [batch_size, H, W, C]
float32 tensor with preprocessed images.
features[HASH_KEY] is a [batch_size] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [batch_size, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] (optional) is a
[batch_size, H, W, C] float32 tensor with original images.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.num_groundtruth_boxes] is a [batch_size]
int32 tensor indicating the number of groundtruth boxes.
labels[fields.InputDataFields.groundtruth_boxes] is a
[batch_size, num_boxes, 4] float32 tensor containing the corners of
the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
[batch_size, num_boxes, num_classes] float32 one-hot tensor of
classes.
labels[fields.InputDataFields.groundtruth_weights] is a
[batch_size, num_boxes] float32 tensor containing groundtruth weights
for the boxes.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[batch_size, num_boxes, H, W] float32 tensor containing only binary
values, which represent instance masks for objects.
labels[fields.InputDataFields.groundtruth_keypoints] is a
[batch_size, num_boxes, num_keypoints, 2] float32 tensor containing
keypoints for each box.
Raises:
TypeError: if the `train_config`, `train_input_config` or `model_config`
are not of the correct type.
"""
if not isinstance(train_config, train_pb2.TrainConfig):
raise TypeError('For training mode, the `train_config` must be a '
'train_pb2.TrainConfig.')
if not isinstance(train_input_config, input_reader_pb2.InputReader):
raise TypeError('The `train_input_config` must be a '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
if model is None:
model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
model_config, is_training=True).preprocess
else:
model_preprocess_fn = model.preprocess
def transform_and_pad_input_data_fn(tensor_dict):
"""Combines transform and pad operation."""
data_augmentation_options = [
preprocessor_builder.build(step)
for step in train_config.data_augmentation_options
]
data_augmentation_fn = functools.partial(
augment_input_data,
data_augmentation_options=data_augmentation_options)
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model_preprocess_fn,
image_resizer_fn=image_resizer_fn,
num_classes=config_util.get_number_of_classes(model_config),
data_augmentation_fn=data_augmentation_fn,
merge_multiple_boxes=train_config.merge_multiple_label_boxes,
retain_original_image=train_config.retain_original_images,
use_multiclass_scores=train_config.use_multiclass_scores,
use_bfloat16=train_config.use_bfloat16)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=train_input_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
train_input_config,
transform_input_data_fn=transform_and_pad_input_data_fn,
batch_size=params['batch_size'] if params else train_config.batch_size)
return dataset
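Hoisting the body into a top-level `train_input` lets custom (e.g. eager) loops build the dataset directly and reuse a pre-constructed model's `preprocess`, instead of going through the estimator-only closure. A usage sketch, assuming `detection_model` and the configs were built elsewhere:

    dataset = train_input(train_config, train_input_config, model_config,
                          model=detection_model)
    iterator = dataset.make_one_shot_iterator()  # TF 1.x style iteration
    features, labels = iterator.get_next()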
def create_eval_input_fn(eval_config, eval_input_config, model_config):
@@ -544,84 +563,99 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
  """
  def _eval_input_fn(params=None):
    return eval_input(eval_config, eval_input_config, model_config,
                      params=params)

  return _eval_input_fn


def eval_input(eval_config, eval_input_config, model_config,
               model=None, params=None):
"""Returns `features` and `labels` tensor dictionaries for evaluation.
Args:
eval_config: An eval_pb2.EvalConfig.
eval_input_config: An input_reader_pb2.InputReader.
model_config: A model_pb2.DetectionModel.
model: A pre-constructed Detection Model.
If None, one will be created from the config.
params: Parameter dictionary passed from the estimator.
Returns:
A tf.data.Dataset that holds (features, labels) tuple.
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [1, H, W, C] float32 tensor
with preprocessed images.
features[HASH_KEY] is a [1] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [1, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] is a [1, H', W', C]
float32 tensor with the original image.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.groundtruth_boxes] is a [1, num_boxes, 4]
float32 tensor containing the corners of the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
[num_boxes, num_classes] float32 one-hot tensor of classes.
labels[fields.InputDataFields.groundtruth_area] is a [1, num_boxes]
float32 tensor containing object areas.
labels[fields.InputDataFields.groundtruth_is_crowd] is a [1, num_boxes]
bool tensor indicating if the boxes enclose a crowd.
labels[fields.InputDataFields.groundtruth_difficult] is a [1, num_boxes]
int32 tensor indicating if the boxes represent difficult instances.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[1, num_boxes, H, W] float32 tensor containing only binary values,
which represent instance masks for objects.
Raises:
TypeError: if the `eval_config`, `eval_input_config` or `model_config`
are not of the correct type.
"""
params = params or {}
if not isinstance(eval_config, eval_pb2.EvalConfig):
raise TypeError('For eval mode, the `eval_config` must be a '
'train_pb2.EvalConfig.')
if not isinstance(eval_input_config, input_reader_pb2.InputReader):
raise TypeError('The `eval_input_config` must be a '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
if model is None:
model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
model_config, is_training=False).preprocess
else:
model_preprocess_fn = model.preprocess
def transform_and_pad_input_data_fn(tensor_dict):
"""Combines transform and pad operation."""
num_classes = config_util.get_number_of_classes(model_config)
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model_preprocess_fn,
image_resizer_fn=image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=None,
retain_original_image=eval_config.retain_original_images)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=eval_input_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
eval_input_config,
batch_size=params['batch_size'] if params else eval_config.batch_size,
transform_input_data_fn=transform_and_pad_input_data_fn)
return dataset
def create_predict_input_fn(model_config, predict_input_config):
@@ -664,7 +698,7 @@ def create_predict_input_fn(model_config, predict_input_config):
        load_instance_masks=False,
        num_additional_channels=predict_input_config.num_additional_channels)
    input_dict = transform_fn(decoder.decode(example))
    images = tf.cast(input_dict[fields.InputDataFields.image], dtype=tf.float32)
    images = tf.expand_dims(images, axis=0)
    true_image_shape = tf.expand_dims(
        input_dict[fields.InputDataFields.true_image_shape], axis=0)
...
@@ -53,6 +53,9 @@ EVAL_METRICS_CLASS_DICT = {
    # DEPRECATED: please use oid_challenge_detection_metrics instead
    'oid_challenge_object_detection_metrics':
        object_detection_evaluation.OpenImagesDetectionChallengeEvaluator,
'oid_challenge_segmentation_metrics':
object_detection_evaluation
.OpenImagesInstanceSegmentationChallengeEvaluator,
} }
EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics' EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'
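With the new dictionary entry, the Open Images instance-segmentation evaluator is selected by name like any other metric. A minimal sketch, assuming the evaluator follows the `categories` constructor convention of the other evaluators in this dict (`categories` being the usual list of dicts such as `{'id': 1, 'name': 'cat'}`):

```python
# Sketch: the metric key resolves to an evaluator class; `categories` is an
# assumed in-scope list following this repo's category-dict convention.
evaluator_class = EVAL_METRICS_CLASS_DICT['oid_challenge_segmentation_metrics']
evaluator = evaluator_class(categories=categories)
```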
@@ -80,7 +83,7 @@ def _extract_predictions_and_losses(model,
     input_dict = prefetch_queue.dequeue()
     original_image = tf.expand_dims(input_dict[fields.InputDataFields.image], 0)
     preprocessed_image, true_image_shapes = model.preprocess(
-        tf.to_float(original_image))
+        tf.cast(original_image, dtype=tf.float32))
     prediction_dict = model.predict(preprocessed_image, true_image_shapes)
     detections = model.postprocess(prediction_dict, true_image_shapes)
...
@@ -62,7 +62,7 @@ def create_input_queue(batch_size_per_clone, create_tensor_dict_fn,
       tensor_dict[fields.InputDataFields.image], 0)
   images = tensor_dict[fields.InputDataFields.image]
-  float_images = tf.to_float(images)
+  float_images = tf.cast(images, dtype=tf.float32)
   tensor_dict[fields.InputDataFields.image] = float_images
   include_instance_masks = (fields.InputDataFields.groundtruth_instance_masks
...
@@ -184,7 +184,7 @@ class ArgMaxMatcher(matcher.Matcher):
       return matches

     if similarity_matrix.shape.is_fully_defined():
-      if similarity_matrix.shape[0].value == 0:
+      if shape_utils.get_dim_as_int(similarity_matrix.shape[0]) == 0:
         return _match_when_rows_are_empty()
       else:
         return _match_when_rows_are_non_empty()
...
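`shape_utils.get_dim_as_int` papers over the TF1/TF2 difference in how static shapes are represented: under TF1, `shape[i]` is a `Dimension` object whose integer lives behind `.value`, while with TF2 behavior it is already a plain int (or None). A sketch of such a compatibility shim:

```python
def get_dim_as_int(dim):
  """Returns a static TensorShape dimension as a plain int (or None).

  Handles both the TF1 `Dimension` object (int behind `.value`) and the
  TF2 representation (already an int or None).
  """
  try:
    return dim.value
  except AttributeError:
    return dim
```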
@@ -62,7 +62,7 @@ class GreedyBipartiteMatcher(matcher.Matcher):
     # Convert similarity matrix to distance matrix as tf.image.bipartite tries
     # to find minimum distance matches.
     distance_matrix = -1 * similarity_matrix
-    num_valid_rows = tf.reduce_sum(tf.to_float(valid_rows))
+    num_valid_rows = tf.reduce_sum(tf.cast(valid_rows, dtype=tf.float32))
     _, match_results = image_ops.bipartite_match(
         distance_matrix, num_valid_rows=num_valid_rows)
     match_results = tf.reshape(match_results, [-1])
...
@@ -722,9 +722,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
     """
     clip_heights = image_shapes[:, 0]
     clip_widths = image_shapes[:, 1]
-    clip_window = tf.to_float(tf.stack([tf.zeros_like(clip_heights),
-                                        tf.zeros_like(clip_heights),
-                                        clip_heights, clip_widths], axis=1))
+    clip_window = tf.cast(
+        tf.stack([
+            tf.zeros_like(clip_heights),
+            tf.zeros_like(clip_heights), clip_heights, clip_widths
+        ],
+                 axis=1),
+        dtype=tf.float32)
     return clip_window
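For concreteness, the stacked clip window is one `[ymin, xmin, ymax, xmax] = [0, 0, height, width]` row per image:

```python
import tensorflow as tf

image_shapes = tf.constant([[300, 400], [600, 800]])  # per-image (height, width)
clip_heights = image_shapes[:, 0]
clip_widths = image_shapes[:, 1]
clip_window = tf.cast(
    tf.stack([tf.zeros_like(clip_heights), tf.zeros_like(clip_heights),
              clip_heights, clip_widths], axis=1),
    dtype=tf.float32)
# clip_window -> [[0., 0., 300., 400.],
#                 [0., 0., 600., 800.]]
```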
   def _proposal_postprocess(self, rpn_box_encodings,
@@ -732,7 +736,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
                             image_shape, true_image_shapes):
     """Wraps over FasterRCNNMetaArch._postprocess_rpn()."""
     image_shape_2d = self._image_batch_shape_2d(image_shape)
-    proposal_boxes_normalized, _, num_proposals, _, _ = \
+    proposal_boxes_normalized, _, _, num_proposals, _, _ = \
         self._postprocess_rpn(
             rpn_box_encodings, rpn_objectness_predictions_with_background,
             anchors, image_shape_2d, true_image_shapes)
@@ -817,17 +821,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
     prediction_dict = self._predict_first_stage(preprocessed_inputs)

     if self._number_of_stages >= 2:
-      # If mixed-precision training on TPU is enabled, rpn_box_encodings and
-      # rpn_objectness_predictions_with_background are bfloat16 tensors.
-      # Considered prediction results, they need to be casted to float32
-      # tensors for correct postprocess_rpn computation in predict_second_stage.
       prediction_dict.update(
           self._predict_second_stage(
-              tf.to_float(prediction_dict['rpn_box_encodings']),
-              tf.to_float(
-                  prediction_dict['rpn_objectness_predictions_with_background']
-              ), prediction_dict['rpn_features_to_crop'],
-              prediction_dict['anchors'], prediction_dict['image_shape'],
+              prediction_dict['rpn_box_encodings'],
+              prediction_dict['rpn_objectness_predictions_with_background'],
+              prediction_dict['rpn_features_to_crop'],
+              prediction_dict['anchors'],
+              prediction_dict['image_shape'],
               true_image_shapes))

     if self._number_of_stages == 3:
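The removed per-call casts are now applied once where the RPN outputs are produced (see the `_predict_first_stage` hunks below), so every downstream consumer sees float32. Keeping dtypes uniform matters because TensorFlow refuses to concatenate mixed-precision tensors:

```python
import tensorflow as tf

a = tf.ones([2, 3], dtype=tf.bfloat16)
b = tf.ones([2, 3], dtype=tf.float32)
# tf.concat([a, b], axis=0)  # raises: dtypes bfloat16 and float32 don't match
c = tf.concat([tf.cast(a, tf.float32), b], axis=0)  # cast first, then concat
```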
@@ -848,21 +848,21 @@ class FasterRCNNMetaArch(model.DetectionModel):
     Returns:
       prediction_dict: a dictionary holding "raw" prediction tensors:
-        1) rpn_box_predictor_features: A 4-D float32 tensor with shape
+        1) rpn_box_predictor_features: A 4-D float32/bfloat16 tensor with shape
           [batch_size, height, width, depth] to be used for predicting proposal
           boxes and corresponding objectness scores.
-        2) rpn_features_to_crop: A 4-D float32 tensor with shape
+        2) rpn_features_to_crop: A 4-D float32/bfloat16 tensor with shape
           [batch_size, height, width, depth] representing image features to crop
           using the proposal boxes predicted by the RPN.
         3) image_shape: a 1-D tensor of shape [4] representing the input
           image shape.
-        4) rpn_box_encodings: 3-D float tensor of shape
+        4) rpn_box_encodings: 3-D float32 tensor of shape
           [batch_size, num_anchors, self._box_coder.code_size] containing
           predicted boxes.
-        5) rpn_objectness_predictions_with_background: 3-D float tensor of shape
-          [batch_size, num_anchors, 2] containing class
-          predictions (logits) for each of the anchors. Note that this
-          tensor *includes* background class predictions (at class index 0).
+        5) rpn_objectness_predictions_with_background: 3-D float32 tensor of
+          shape [batch_size, num_anchors, 2] containing class predictions
+          (logits) for each of the anchors. Note that this tensor *includes*
+          background class predictions (at class index 0).
         6) anchors: A 2-D tensor of shape [num_anchors, 4] representing anchors
           for the first stage RPN (in absolute coordinates). Note that
           `num_anchors` can differ depending on whether the model is created in
@@ -875,7 +875,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
     # The Faster R-CNN paper recommends pruning anchors that venture outside
     # the image window at training time and clipping at inference time.
-    clip_window = tf.to_float(tf.stack([0, 0, image_shape[1], image_shape[2]]))
+    clip_window = tf.cast(tf.stack([0, 0, image_shape[1], image_shape[2]]),
+                          dtype=tf.float32)
     if self._is_training:
       if self.clip_anchors_to_image:
         anchors_boxlist = box_list_ops.clip_to_window(
@@ -899,9 +900,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
         'image_shape':
             image_shape,
         'rpn_box_encodings':
-            rpn_box_encodings,
+            tf.cast(rpn_box_encodings, dtype=tf.float32),
         'rpn_objectness_predictions_with_background':
-            rpn_objectness_predictions_with_background,
+            tf.cast(rpn_objectness_predictions_with_background,
+                    dtype=tf.float32),
         'anchors':
             anchors_boxlist.data['boxes'],
     }
@@ -954,13 +956,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
     Returns:
       prediction_dict: a dictionary holding "raw" prediction tensors:
-        1) refined_box_encodings: a 3-D tensor with shape
+        1) refined_box_encodings: a 3-D float32 tensor with shape
           [total_num_proposals, num_classes, self._box_coder.code_size]
           representing predicted (final) refined box encodings, where
           total_num_proposals=batch_size*self._max_num_proposals. If using a
           shared box across classes the shape will instead be
           [total_num_proposals, 1, self._box_coder.code_size].
-        2) class_predictions_with_background: a 3-D tensor with shape
+        2) class_predictions_with_background: a 3-D float32 tensor with shape
           [total_num_proposals, num_classes + 1] containing class
           predictions (logits) for each of the anchors, where
           total_num_proposals=batch_size*self._max_num_proposals.
@@ -980,7 +982,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
           boxes proposed by the RPN, thus enabling one to extract features and
           get box classification and prediction for externally selected areas
           of the image.
-        6) box_classifier_features: a 4-D float32 or bfloat16 tensor
+        6) box_classifier_features: a 4-D float32/bfloat16 tensor
           representing the features for each proposal.
     """
     proposal_boxes_normalized, num_proposals = self._proposal_postprocess(
@@ -1008,13 +1010,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
     Returns:
       prediction_dict: a dictionary holding "raw" prediction tensors:
-        1) refined_box_encodings: a 3-D tensor with shape
+        1) refined_box_encodings: a 3-D float32 tensor with shape
           [total_num_proposals, num_classes, self._box_coder.code_size]
           representing predicted (final) refined box encodings, where
           total_num_proposals=batch_size*self._max_num_proposals. If using a
           shared box across classes the shape will instead be
           [total_num_proposals, 1, self._box_coder.code_size].
-        2) class_predictions_with_background: a 3-D tensor with shape
+        2) class_predictions_with_background: a 3-D float32 tensor with shape
           [total_num_proposals, num_classes + 1] containing class
           predictions (logits) for each of the anchors, where
           total_num_proposals=batch_size*self._max_num_proposals.
@@ -1029,17 +1031,12 @@ class FasterRCNNMetaArch(model.DetectionModel):
           boxes proposed by the RPN, thus enabling one to extract features and
           get box classification and prediction for externally selected areas
           of the image.
-        5) box_classifier_features: a 4-D float32 or bfloat16 tensor
+        5) box_classifier_features: a 4-D float32/bfloat16 tensor
           representing the features for each proposal.
     """
-    # If mixed-precision training on TPU is enabled, the dtype of
-    # rpn_features_to_crop is bfloat16, otherwise it is float32. tf.cast is
-    # used to match the dtype of proposal_boxes_normalized to that of
-    # rpn_features_to_crop for further computation.
     flattened_proposal_feature_maps = (
         self._compute_second_stage_input_feature_maps(
-            rpn_features_to_crop,
-            tf.cast(proposal_boxes_normalized, rpn_features_to_crop.dtype)))
+            rpn_features_to_crop, proposal_boxes_normalized))

     box_classifier_features = self._extract_box_classifier_features(
         flattened_proposal_feature_maps)
@@ -1066,9 +1063,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
         proposal_boxes_normalized, image_shape, self._parallel_iterations)

     prediction_dict = {
-        'refined_box_encodings': refined_box_encodings,
+        'refined_box_encodings': tf.cast(refined_box_encodings,
+                                         dtype=tf.float32),
         'class_predictions_with_background':
-            class_predictions_with_background,
+            tf.cast(class_predictions_with_background, dtype=tf.float32),
         'proposal_boxes': absolute_proposal_boxes,
         'box_classifier_features': box_classifier_features,
         'proposal_boxes_normalized': proposal_boxes_normalized,
@@ -1215,7 +1213,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
     _, num_classes, height, width = instance_masks.get_shape().as_list()
     k = tf.shape(instance_masks)[0]
     instance_masks = tf.reshape(instance_masks, [-1, height, width])
-    classes = tf.to_int32(tf.reshape(classes, [-1]))
+    classes = tf.cast(tf.reshape(classes, [-1]), dtype=tf.int32)
     gather_idx = tf.range(k) * num_classes + classes
     return tf.gather(instance_masks, gather_idx)
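After the reshape, row `i * num_classes + c` of `instance_masks` holds the class-`c` mask of detection `i`, so the flat gather selects each detection's own-class mask. A tiny worked example:

```python
import tensorflow as tf

k, num_classes, height, width = 2, 3, 4, 4
# [k * num_classes, height, width]: one mask per (detection, class) pair.
instance_masks = tf.reshape(
    tf.range(k * num_classes * height * width, dtype=tf.float32),
    [k * num_classes, height, width])
classes = tf.constant([2, 0])                     # predicted class per detection
gather_idx = tf.range(k) * num_classes + classes  # -> [2, 3]
own_class_masks = tf.gather(instance_masks, gather_idx)  # shape [2, 4, 4]
```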
@@ -1415,6 +1413,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
       detections: a dictionary containing the following fields
         detection_boxes: [batch, max_detection, 4]
         detection_scores: [batch, max_detections]
+        detection_multiclass_scores: [batch, max_detections, 2]
         detection_classes: [batch, max_detections]
           (this entry is only created if rpn_mode=False)
         num_detections: [batch]
@@ -1427,7 +1426,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
     with tf.name_scope('FirstStagePostprocessor'):
       if self._number_of_stages == 1:
-        (proposal_boxes, proposal_scores, num_proposals, raw_proposal_boxes,
+        (proposal_boxes, proposal_scores, proposal_multiclass_scores,
+         num_proposals, raw_proposal_boxes,
          raw_proposal_scores) = self._postprocess_rpn(
              prediction_dict['rpn_box_encodings'],
              prediction_dict['rpn_objectness_predictions_with_background'],
@@ -1437,8 +1437,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
                 proposal_boxes,
             fields.DetectionResultFields.detection_scores:
                 proposal_scores,
+            fields.DetectionResultFields.detection_multiclass_scores:
+                proposal_multiclass_scores,
             fields.DetectionResultFields.num_detections:
-                tf.to_float(num_proposals),
+                tf.cast(num_proposals, dtype=tf.float32),
             fields.DetectionResultFields.raw_detection_boxes:
                 raw_proposal_boxes,
             fields.DetectionResultFields.raw_detection_scores:
@@ -1545,6 +1547,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
       proposal_scores: A float tensor with shape
         [batch_size, max_num_proposals] representing the (potentially zero
         padded) proposal objectness scores for all images in the batch.
+      proposal_multiclass_scores: A float tensor with shape
+        [batch_size, max_num_proposals, 2] representing the (potentially zero
+        padded) proposal multiclass scores for all images in the batch.
       num_proposals: A Tensor of type `int32`. A 1-D tensor of shape [batch]
         representing the number of proposals predicted for each image in
         the batch.
@@ -1566,10 +1571,12 @@ class FasterRCNNMetaArch(model.DetectionModel):
         rpn_objectness_predictions_with_background_batch)
     rpn_objectness_softmax_without_background = rpn_objectness_softmax[:, :, 1]
     clip_window = self._compute_clip_window(image_shapes)
-    (proposal_boxes, proposal_scores, _, _, _,
+    additional_fields = {'multiclass_scores': rpn_objectness_softmax}
+    (proposal_boxes, proposal_scores, _, _, nmsed_additional_fields,
      num_proposals) = self._first_stage_nms_fn(
          tf.expand_dims(raw_proposal_boxes, axis=2),
          tf.expand_dims(rpn_objectness_softmax_without_background, axis=2),
+         additional_fields=additional_fields,
          clip_window=clip_window)
     if self._is_training:
       proposal_boxes = tf.stop_gradient(proposal_boxes)
@@ -1596,7 +1603,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
         normalize_boxes,
         elems=[raw_proposal_boxes, image_shapes],
         dtype=tf.float32)
-    return (normalized_proposal_boxes, proposal_scores, num_proposals,
+    proposal_multiclass_scores = nmsed_additional_fields['multiclass_scores']
+    return (normalized_proposal_boxes, proposal_scores,
+            proposal_multiclass_scores, num_proposals,
             raw_normalized_proposal_boxes, rpn_objectness_softmax)
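This is the mechanism the multiclass-score plumbing relies on: the batched NMS helper accepts an `additional_fields` dict of per-box tensors and gathers them with the same indices as the kept boxes, so per-proposal score distributions survive suppression already aligned with the output. A rough sketch of the pattern (the input tensors `boxes`, `scores`, and `multiclass_scores` are assumed placeholders, not repo code):

```python
# Illustrative only: per-box side data rides through NMS via additional_fields.
from object_detection.core import post_processing

(nmsed_boxes, nmsed_scores, nmsed_classes, _nmsed_masks,
 nmsed_additional_fields, num_detections) = (
     post_processing.batch_multiclass_non_max_suppression(
         boxes,        # assumed: [batch, num_anchors, q, 4]
         scores,       # assumed: [batch, num_anchors, num_classes]
         score_thresh=0.0,
         iou_thresh=0.7,
         max_size_per_class=100,
         max_total_size=300,
         additional_fields={'multiclass_scores': multiclass_scores}))
kept_multiclass_scores = nmsed_additional_fields['multiclass_scores']
```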
   def _sample_box_classifier_batch(
@@ -1713,11 +1722,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
         for i, boxes in enumerate(
             self.groundtruth_lists(fields.BoxListFields.boxes))
     ]
-    groundtruth_classes_with_background_list = [
-        tf.to_float(
-            tf.pad(one_hot_encoding, [[0, 0], [1, 0]], mode='CONSTANT'))
-        for one_hot_encoding in self.groundtruth_lists(
-            fields.BoxListFields.classes)]
+    groundtruth_classes_with_background_list = []
+    for one_hot_encoding in self.groundtruth_lists(
+        fields.BoxListFields.classes):
+      groundtruth_classes_with_background_list.append(
+          tf.cast(
+              tf.pad(one_hot_encoding, [[0, 0], [1, 0]], mode='CONSTANT'),
+              dtype=tf.float32))
     groundtruth_masks_list = self._groundtruth_lists.get(
         fields.BoxListFields.masks)
@@ -1860,6 +1871,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
       A dictionary containing:
         `detection_boxes`: [batch, max_detection, 4] in normalized co-ordinates.
         `detection_scores`: [batch, max_detections]
+        `detection_multiclass_scores`: [batch, max_detections,
+          num_classes_with_background] tensor with class score distribution for
+          post-processed detection boxes including background class if any.
         `detection_classes`: [batch, max_detections]
         `num_detections`: [batch]
         `detection_masks`:
@@ -1894,20 +1908,24 @@ class FasterRCNNMetaArch(model.DetectionModel):
     clip_window = self._compute_clip_window(image_shapes)
     mask_predictions_batch = None
     if mask_predictions is not None:
-      mask_height = mask_predictions.shape[2].value
-      mask_width = mask_predictions.shape[3].value
+      mask_height = shape_utils.get_dim_as_int(mask_predictions.shape[2])
+      mask_width = shape_utils.get_dim_as_int(mask_predictions.shape[3])
       mask_predictions = tf.sigmoid(mask_predictions)
       mask_predictions_batch = tf.reshape(
           mask_predictions, [-1, self.max_num_proposals,
                              self.num_classes, mask_height, mask_width])
-    (nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks, _,
-     num_detections) = self._second_stage_nms_fn(
+    additional_fields = {
+        'multiclass_scores': class_predictions_with_background_batch_normalized
+    }
+    (nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
+     nmsed_additional_fields, num_detections) = self._second_stage_nms_fn(
         refined_decoded_boxes_batch,
         class_predictions_batch,
         clip_window=clip_window,
         change_coordinate_frame=True,
         num_valid_boxes=num_proposals,
+        additional_fields=additional_fields,
         masks=mask_predictions_batch)
     if refined_decoded_boxes_batch.shape[2] > 1:
       class_ids = tf.expand_dims(
@@ -1948,8 +1966,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
             nmsed_scores,
         fields.DetectionResultFields.detection_classes:
             nmsed_classes,
+        fields.DetectionResultFields.detection_multiclass_scores:
+            nmsed_additional_fields['multiclass_scores'],
         fields.DetectionResultFields.num_detections:
-            tf.to_float(num_detections),
+            tf.cast(num_detections, dtype=tf.float32),
         fields.DetectionResultFields.raw_detection_boxes:
             raw_normalized_detection_boxes,
         fields.DetectionResultFields.raw_detection_scores:
@@ -2096,18 +2116,18 @@ class FasterRCNNMetaArch(model.DetectionModel):
       return self._first_stage_sampler.subsample(
           tf.cast(cls_weights, tf.bool),
           self._first_stage_minibatch_size, tf.cast(cls_targets, tf.bool))

-    batch_sampled_indices = tf.to_float(shape_utils.static_or_dynamic_map_fn(
+    batch_sampled_indices = tf.cast(shape_utils.static_or_dynamic_map_fn(
         _minibatch_subsample_fn,
         [batch_cls_targets, batch_cls_weights],
         dtype=tf.bool,
         parallel_iterations=self._parallel_iterations,
-        back_prop=True))
+        back_prop=True), dtype=tf.float32)

     # Normalize by number of examples in sampled minibatch
     normalizer = tf.maximum(
         tf.reduce_sum(batch_sampled_indices, axis=1), 1.0)
     batch_one_hot_targets = tf.one_hot(
-        tf.to_int32(batch_cls_targets), depth=2)
+        tf.cast(batch_cls_targets, dtype=tf.int32), depth=2)
     sampled_reg_indices = tf.multiply(batch_sampled_indices,
                                       batch_reg_weights)
@@ -2133,8 +2153,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
                                          name='localization_loss')
     objectness_loss = tf.multiply(self._first_stage_obj_loss_weight,
                                   objectness_loss, name='objectness_loss')
-    loss_dict = {localization_loss.op.name: localization_loss,
-                 objectness_loss.op.name: objectness_loss}
+    loss_dict = {'Loss/RPNLoss/localization_loss': localization_loss,
+                 'Loss/RPNLoss/objectness_loss': objectness_loss}
     return loss_dict
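Replacing `op.name` keys with fixed strings decouples reported loss names from graph construction: `op.name` picks up whatever name-scope uniquification TensorFlow applies, so the keys can drift between graph rebuilds, while literal keys stay stable. A small illustration of the uniquification that motivates this (TF 1.x graph mode assumed):

```python
import tensorflow as tf  # TF 1.x graph mode assumed for this illustration

a = tf.multiply(2.0, 3.0, name='objectness_loss')
b = tf.multiply(2.0, 3.0, name='objectness_loss')
print(a.op.name)  # objectness_loss
print(b.op.name)  # objectness_loss_1 -- uniquified, so dict keys would drift
```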
   def _loss_box_classifier(self,
@@ -2216,8 +2236,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
         for proposal_boxes_single_image in tf.unstack(proposal_boxes)]
     batch_size = len(proposal_boxlists)

-    num_proposals_or_one = tf.to_float(tf.expand_dims(
-        tf.maximum(num_proposals, tf.ones_like(num_proposals)), 1))
+    num_proposals_or_one = tf.cast(tf.expand_dims(
+        tf.maximum(num_proposals, tf.ones_like(num_proposals)), 1),
+        dtype=tf.float32)
     normalizer = tf.tile(num_proposals_or_one,
                          [1, self.max_num_proposals]) * batch_size
@@ -2276,9 +2297,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
         ndims=2) / normalizer
     second_stage_loc_loss = tf.reduce_sum(
-        second_stage_loc_losses * tf.to_float(paddings_indicator))
+        second_stage_loc_losses * tf.cast(paddings_indicator,
+                                          dtype=tf.float32))
     second_stage_cls_loss = tf.reduce_sum(
-        second_stage_cls_losses * tf.to_float(paddings_indicator))
+        second_stage_cls_losses * tf.cast(paddings_indicator,
+                                          dtype=tf.float32))

     if self._hard_example_miner:
       (second_stage_loc_loss, second_stage_cls_loss
@@ -2293,8 +2316,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
                                        second_stage_cls_loss,
                                        name='classification_loss')
-    loss_dict = {localization_loss.op.name: localization_loss,
-                 classification_loss.op.name: classification_loss}
+    loss_dict = {'Loss/BoxClassifierLoss/localization_loss':
+                     localization_loss,
+                 'Loss/BoxClassifierLoss/classification_loss':
+                     classification_loss}
     second_stage_mask_loss = None
     if prediction_masks is not None:
       if groundtruth_masks_list is None:
@@ -2332,8 +2357,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
           prediction_masks_with_background,
           tf.greater(one_hot_flat_cls_targets_with_background, 0))
-      mask_height = prediction_masks.shape[2].value
-      mask_width = prediction_masks.shape[3].value
+      mask_height = shape_utils.get_dim_as_int(prediction_masks.shape[2])
+      mask_width = shape_utils.get_dim_as_int(prediction_masks.shape[3])
       reshaped_prediction_masks = tf.reshape(
           prediction_masks_masked_by_class_targets,
           [batch_size, -1, mask_height * mask_width])
@@ -2364,7 +2389,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
           [batch_size, -1, mask_height * mask_width])
       mask_losses_weights = (
-          batch_mask_target_weights * tf.to_float(paddings_indicator))
+          batch_mask_target_weights * tf.cast(paddings_indicator,
+                                              dtype=tf.float32))
       mask_losses = self._second_stage_mask_loss(
           reshaped_prediction_masks,
           batch_cropped_gt_mask,
@@ -2419,7 +2445,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
         for detection_boxes_single_image in tf.unstack(proposal_boxes)
     ]
     paddings_indicator = self._padded_batched_proposals_indicator(
-        tf.to_int32(num_detections), detection_boxes.shape[1])
+        tf.cast(num_detections, dtype=tf.int32), detection_boxes.shape[1])
     (batch_cls_targets_with_background, _, _, _,
      _) = target_assigner.batch_assign_targets(
         target_assigner=self._detector_target_assigner,
...