Unverified commit 9bbf8015 authored by pkulzc, committed by GitHub

Merged commit includes the following changes: (#6932)

250447559  by Zhichao Lu:

    Update the expected file format for the Instance Segmentation challenge:
    - add fields ImageWidth and ImageHeight and store the values per prediction
    - for masks, store only the encoded image and assume its size is ImageWidth x ImageHeight
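    For illustration, a single prediction under the new format might look like
    the following (field names other than ImageWidth/ImageHeight are
    hypothetical, not taken from this change):

        # One prediction per entry; only the encoded mask image is stored, and
        # its size is assumed to be ImageWidth x ImageHeight.
        prediction = {
            'ImageID': '0001eeaf4aed83f9',   # hypothetical field name
            'ImageWidth': 1024,
            'ImageHeight': 768,
            'LabelName': '/m/0cmf2',         # hypothetical field name
            'Score': 0.87,
            'Mask': b'<encoded PNG bytes>',  # no per-mask size stored
        }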

--
250402780  by rathodv:

    Fix failing Mask R-CNN TPU convergence test.

    Cast second stage prediction tensors from bfloat16 to float32 to prevent errors in the third target assignment (Mask Prediction): concatenating tensors of different types (bfloat16 and float32) isn't allowed.
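    A minimal sketch of the failure mode (tensor names are illustrative, not
    the model's actual variables):

        import tensorflow as tf

        preds_bf16 = tf.ones([4, 8], dtype=tf.bfloat16)  # second stage predictions
        targets_f32 = tf.ones([4, 8], dtype=tf.float32)
        # tf.concat([preds_bf16, targets_f32], axis=1)   # fails: mismatched dtypes
        merged = tf.concat(
            [tf.cast(preds_bf16, tf.float32), targets_f32], axis=1)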

--
250300240  by Zhichao Lu:

    Add Open Images Challenge 2019 object detection and instance segmentation
    support to the Estimator framework.

--
249944839  by rathodv:

    Modify exporter.py to add multiclass score nodes in exported inference graphs.

--
249935201  by rathodv:

    Modify postprocess methods to preserve multiclass scores after non max suppression.
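    The core idea, sketched with hypothetical tensors (not the library's exact
    code): gather the full per-class score rows for the indices NMS keeps, so
    the class distribution survives alongside the boxes and top scores.

        import tensorflow as tf

        multiclass_scores = tf.constant([[0.1, 0.7, 0.2],
                                         [0.6, 0.3, 0.1],
                                         [0.2, 0.2, 0.6]])  # [num_boxes, num_classes]
        selected_indices = tf.constant([1, 2])  # hypothetical NMS output
        kept_multiclass_scores = tf.gather(multiclass_scores, selected_indices)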

--
249878079  by Zhichao Lu:

    This CL slightly refactors some Object Detection helper functions for data creation, evaluation, and groundtruth providing.

    This will allow the eager+function custom loops to share code with the existing estimator training loops.

    Concretely, we make the following changes:
    1. In input creation, we separate dataset creation into top-level helpers, and allow it to optionally accept a pre-constructed model directly instead of always creating a model from the config just for feature preprocessing.

    2. In COCO evaluation, we split the update_op creation into its own function, which the custom loops will call directly.

    3. In model_lib, we move groundtruth providing / data-structure munging into a helper function.

    4. For now, we put an escape hatch in `_summarize_target_assignment` when executing with TF 2.0 behavior, because the summary APIs used only work with TF 1.x (a sketch follows below).
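    A minimal sketch of such an escape hatch, assuming a TF 1.x runtime in
    which v2 behavior may be enabled (the function body is illustrative):

        import tensorflow as tf

        def _summarize_target_assignment_sketch(batch_cls_weights):
          if tf.executing_eagerly():
            # Escape hatch: the v1 summary ops below don't work eagerly.
            return
          tf.summary.scalar('avg_cls_weight', tf.reduce_mean(batch_cls_weights))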

--
249673507  by rathodv:

    Use explicit casts instead of tf.to_float and tf.to_int32 to avoid warnings.
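    For example (deprecated forms shown commented out):

        import tensorflow as tf

        x = tf.constant([1, 2, 3])
        # y = tf.to_float(x)                       # deprecated, emits a warning
        y = tf.cast(x, dtype=tf.float32)           # explicit equivalent
        # i = tf.to_int32(tf.constant([1.5]))      # deprecated, emits a warning
        i = tf.cast(tf.constant([1.5]), dtype=tf.int32)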

--
249656006  by Zhichao Lu:

    Add named "raw_keypoint_locations" node that corresponds with the "raw_box_locations" node.

--
249651674  by rathodv:

    Keep proposal boxes in float format. MatMulCropAndResize can handle the type even when the features themselves are bfloat16.

--
249568633  by rathodv:

    Support q > 1 in class-agnostic NMS.
    Break post_processing_test.py into 3 separate files to avoid linter errors.

--
249535530  by rathodv:

    Update some deprecated arguments to tf ops.

--
249368223  by rathodv:

    Modify MatMulCropAndResize to use the MultiLevelRoIAlign method and move the tests to the spatial_transform_ops.py module.

    This CL establishes that CropAndResize and RoIAlign are equivalent and differ only in the sampling point grid within the boxes. CropAndResize uses a uniform size x size point grid whose corner points exactly overlap the box corners, while RoIAlign divides boxes into size x size cells and uses their centers as sampling points. In this CL, we switch MatMulCropAndResize to the MultiLevelRoIAlign implementation with the `align_corner` option, as the MultiLevelRoIAlign implementation is more memory efficient on TPU than the original MatMulCropAndResize.
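    A worked 1-D illustration of the two sampling grids (numbers are for a box
    spanning [0.0, 1.0] with size=4; this is an explanatory sketch, not the
    op's code):

        import numpy as np

        start, end, size = 0.0, 1.0, 4
        # CropAndResize: uniform grid whose end points coincide with box corners.
        crop_and_resize_pts = start + np.arange(size) * (end - start) / (size - 1)
        # -> [0.0, 0.333, 0.667, 1.0]
        # RoIAlign: split the box into `size` cells and sample each cell center.
        roi_align_pts = start + (np.arange(size) + 0.5) * (end - start) / size
        # -> [0.125, 0.375, 0.625, 0.875]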

--
249337338  by chowdhery:

    Add class-agnostic non-max-suppression in post_processing

--
249139196  by Zhichao Lu:

    Fix positional argument bug in export_tflite_ssd_graph

--
249120219  by Zhichao Lu:

    Add evaluator for computing precision limited to a given recall range.
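    The core computation, sketched with numpy (variable names are assumptions,
    not the evaluator's API):

        import numpy as np

        recalls = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
        precisions = np.array([1.0, 0.9, 0.8, 0.6, 0.4])
        recall_lower_bound, recall_upper_bound = 0.3, 0.8
        in_range = (recalls >= recall_lower_bound) & (recalls <= recall_upper_bound)
        precision_in_range = precisions[in_range].mean()  # ~0.767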

--
249030593  by Zhichao Lu:

    Evaluation util to run segmentation and detection challenge evaluation.

--
248554358  by Zhichao Lu:

    This change contains the auxiliary changes required for TF 2.0 style training with eager + functions + distribution strategy loops, but not the loops themselves.

    It includes:
    - Updates to shape usage to support both TensorShape v1 and TensorShape v2
    - A fix to FreezableBatchNorm so it does not override the `training` arg in `call` when `None` was passed to the constructor (not an issue in the estimator loops, but it was in the custom loops)
    - Puts some constants in init_scope so they work in eager + functions
    - Makes learning rate schedules return a callable in eager mode (required so they update when the global_step changes; see the sketch after this list)
    - Makes DetectionModel a tf.Module so it tracks variables (e.g. ones nested in layers)
    - Removes some references to `op.name` for some losses and replaces them with explicit names
    - A small part of the change to allow the COCO evaluation metrics to work in eager mode
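    A sketch of the learning-rate point, assuming nothing about the actual
    builders: in eager mode a plain tensor is evaluated once at creation time,
    so the schedule is returned as a callable that recomputes from global_step.

        import tensorflow as tf

        def make_exponential_decay(initial_lr, decay_steps, decay_rate,
                                   global_step):
          def learning_rate():  # hypothetical helper, re-evaluated on each call
            step = tf.cast(global_step, tf.float32)
            return initial_lr * decay_rate ** (step / decay_steps)
          return learning_rate

        global_step = tf.Variable(0, trainable=False)
        lr_fn = make_exponential_decay(0.1, 1000, 0.5, global_step)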

--
248271226  by rathodv:

    Add MultiLevel RoIAlign op.

--
248229103  by rathodv:

    Add functions to (1) pad feature maps and (2) ravel 5-D indices.
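    The raveling step amounts to the usual mixed-radix index computation; a
    small self-contained sketch (function name and dims are illustrative):

        def ravel_5d_indices(b, level, y, x, c, dims):
          """Flatten [b, level, y, x, c] given per-dimension sizes."""
          num_levels, height, width, channels = dims[1:]
          return ((((b * num_levels + level) * height + y)
                   * width + x) * channels + c)

        flat = ravel_5d_indices(1, 0, 2, 3, 0, dims=(2, 4, 8, 8, 16))  # 4400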

--
248206769  by rathodv:

    Add utilities needed to introduce RoI Align op.

--
248177733  by pengchong:

    Internal changes

--
247742582  by Zhichao Lu:

    Open Images Challenge 2019 instance segmentation metric: part 2

--
247525401  by Zhichao Lu:

    Update comments on max_class_per_detection.

--
247520753  by rathodv:

    Add multilevel crop and resize operation that builds on top of matmul_crop_and_resize.

--
247391600  by Zhichao Lu:

    Open Images Challenge 2019 instance segmentation metric

--
247325813  by chowdhery:

    Quantized MobileNet v2 SSD FPNLite config with depth multiplier 0.75

--

PiperOrigin-RevId: 250447559
parent f42fddee
......@@ -55,12 +55,24 @@ a handful of auxiliary annotations associated with each bounding box, namely,
instance masks and keypoints.
"""
import abc
import tensorflow as tf
from object_detection.core import standard_fields as fields
class DetectionModel(object):
"""Abstract base class for detection models."""
# If using a new enough version of TensorFlow, detection models should be a
# tf module or keras model for tracking.
try:
_BaseClass = tf.Module
except AttributeError:
_BaseClass = object
class DetectionModel(_BaseClass):
"""Abstract base class for detection models.
Extends tf.Module to guarantee variable tracking.
"""
__metaclass__ = abc.ABCMeta
def __init__(self, num_classes):
......
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for tensorflow_models.object_detection.core.post_processing."""
import numpy as np
import tensorflow as tf
from object_detection.core import post_processing
from object_detection.core import standard_fields as fields
from object_detection.utils import test_case
class MulticlassNonMaxSuppressionTest(test_case.TestCase):
def test_multiclass_nms_select_with_shared_boxes(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_output_size)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_select_with_shared_boxes_pad_to_max_output_size(self):
boxes = np.array([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], np.float32)
scores = np.array([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]], np.float32)
score_thresh = 0.1
iou_thresh = .5
max_size_per_class = 4
max_output_size = 5
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
def graph_fn(boxes, scores):
nms, num_valid_nms_boxes = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_size_per_class,
max_total_size=max_output_size,
pad_to_max_output_size=True)
return [nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes), num_valid_nms_boxes]
[nms_corners_output, nms_scores_output, nms_classes_output,
num_valid_nms_boxes] = self.execute(graph_fn, [boxes, scores])
self.assertEqual(num_valid_nms_boxes, 4)
self.assertAllClose(nms_corners_output[0:num_valid_nms_boxes],
exp_nms_corners)
self.assertAllClose(nms_scores_output[0:num_valid_nms_boxes],
exp_nms_scores)
self.assertAllClose(nms_classes_output[0:num_valid_nms_boxes],
exp_nms_classes)
def test_multiclass_nms_select_with_shared_boxes_given_keypoints(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
num_keypoints = 6
keypoints = tf.tile(
tf.reshape(tf.range(8), [8, 1, 1]),
[1, num_keypoints, 2])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
exp_nms_keypoints_tensor = tf.tile(
tf.reshape(tf.constant([3, 0, 6, 5], dtype=tf.float32), [4, 1, 1]),
[1, num_keypoints, 2])
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
additional_fields={fields.BoxListFields.keypoints: keypoints})
with self.test_session() as sess:
(nms_corners_output,
nms_scores_output,
nms_classes_output,
nms_keypoints,
exp_nms_keypoints) = sess.run([
nms.get(),
nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes),
nms.get_field(fields.BoxListFields.keypoints),
exp_nms_keypoints_tensor
])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
self.assertAllEqual(nms_keypoints, exp_nms_keypoints)
def test_multiclass_nms_with_shared_boxes_given_keypoint_heatmaps(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
num_boxes = tf.shape(boxes)[0]
heatmap_height = 5
heatmap_width = 5
num_keypoints = 17
keypoint_heatmaps = tf.ones(
[num_boxes, heatmap_height, heatmap_width, num_keypoints],
dtype=tf.float32)
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
exp_nms_keypoint_heatmaps = np.ones(
(4, heatmap_height, heatmap_width, num_keypoints), dtype=np.float32)
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
additional_fields={
fields.BoxListFields.keypoint_heatmaps: keypoint_heatmaps
})
with self.test_session() as sess:
(nms_corners_output,
nms_scores_output,
nms_classes_output,
nms_keypoint_heatmaps) = sess.run(
[nms.get(),
nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes),
nms.get_field(fields.BoxListFields.keypoint_heatmaps)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
self.assertAllEqual(nms_keypoint_heatmaps, exp_nms_keypoint_heatmaps)
def test_multiclass_nms_with_additional_fields(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
coarse_boxes_key = 'coarse_boxes'
coarse_boxes = tf.constant([[0.1, 0.1, 1.1, 1.1],
[0.1, 0.2, 1.1, 1.2],
[0.1, -0.2, 1.1, 1.0],
[0.1, 10.1, 1.1, 11.1],
[0.1, 10.2, 1.1, 11.2],
[0.1, 100.1, 1.1, 101.1],
[0.1, 1000.1, 1.1, 1002.1],
[0.1, 1000.1, 1.1, 1002.2]], tf.float32)
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = np.array([[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]], dtype=np.float32)
exp_nms_coarse_corners = np.array([[0.1, 10.1, 1.1, 11.1],
[0.1, 0.1, 1.1, 1.1],
[0.1, 1000.1, 1.1, 1002.1],
[0.1, 100.1, 1.1, 101.1]],
dtype=np.float32)
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
additional_fields={coarse_boxes_key: coarse_boxes})
with self.test_session() as sess:
(nms_corners_output,
nms_scores_output,
nms_classes_output,
nms_coarse_corners) = sess.run(
[nms.get(),
nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes),
nms.get_field(coarse_boxes_key)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
self.assertAllEqual(nms_coarse_corners, exp_nms_coarse_corners)
def test_multiclass_nms_select_with_shared_boxes_given_masks(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
num_classes = 2
mask_height = 3
mask_width = 3
masks = tf.tile(
tf.reshape(tf.range(8), [8, 1, 1, 1]),
[1, num_classes, mask_height, mask_width])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
exp_nms_masks_tensor = tf.tile(
tf.reshape(tf.constant([3, 0, 6, 5], dtype=tf.float32), [4, 1, 1]),
[1, mask_height, mask_width])
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_output_size, masks=masks)
with self.test_session() as sess:
(nms_corners_output,
nms_scores_output,
nms_classes_output,
nms_masks,
exp_nms_masks) = sess.run([nms.get(),
nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes),
nms.get_field(fields.BoxListFields.masks),
exp_nms_masks_tensor])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
self.assertAllEqual(nms_masks, exp_nms_masks)
def test_multiclass_nms_select_with_clip_window(self):
boxes = tf.constant([[[0, 0, 10, 10]],
[[1, 1, 11, 11]]], tf.float32)
scores = tf.constant([[.9], [.75]])
clip_window = tf.constant([5, 4, 8, 7], tf.float32)
score_thresh = 0.0
iou_thresh = 0.5
max_output_size = 100
exp_nms_corners = [[5, 4, 8, 7]]
exp_nms_scores = [.9]
exp_nms_classes = [0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
clip_window=clip_window)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_select_with_clip_window_change_coordinate_frame(self):
boxes = tf.constant([[[0, 0, 10, 10]],
[[1, 1, 11, 11]]], tf.float32)
scores = tf.constant([[.9], [.75]])
clip_window = tf.constant([5, 4, 8, 7], tf.float32)
score_thresh = 0.0
iou_thresh = 0.5
max_output_size = 100
exp_nms_corners = [[0, 0, 1, 1]]
exp_nms_scores = [.9]
exp_nms_classes = [0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes,
scores,
score_thresh,
iou_thresh,
max_output_size,
clip_window=clip_window,
change_coordinate_frame=True)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_select_with_per_class_cap(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
score_thresh = 0.1
iou_thresh = .5
max_size_per_class = 2
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 1000, 1, 1002]]
exp_nms_scores = [.95, .9, .85]
exp_nms_classes = [0, 0, 1]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_size_per_class)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_select_with_total_cap(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
score_thresh = 0.1
iou_thresh = .5
max_size_per_class = 4
max_total_size = 2
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1]]
exp_nms_scores = [.95, .9]
exp_nms_classes = [0, 0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_size_per_class,
max_total_size)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
def test_multiclass_nms_threshold_then_select_with_shared_boxes(self):
boxes = tf.constant([[[0, 0, 1, 1]],
[[0, 0.1, 1, 1.1]],
[[0, -0.1, 1, 0.9]],
[[0, 10, 1, 11]],
[[0, 10.1, 1, 11.1]],
[[0, 100, 1, 101]],
[[0, 1000, 1, 1002]],
[[0, 1000, 1, 1002.1]]], tf.float32)
scores = tf.constant([[.9], [.75], [.6], [.95], [.5], [.3], [.01], [.01]])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 3
exp_nms = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 100, 1, 101]]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_output_size)
with self.test_session() as sess:
nms_output = sess.run(nms.get())
self.assertAllClose(nms_output, exp_nms)
def test_multiclass_nms_select_with_separate_boxes(self):
boxes = tf.constant([[[0, 0, 1, 1], [0, 0, 4, 5]],
[[0, 0.1, 1, 1.1], [0, 0.1, 2, 1.1]],
[[0, -0.1, 1, 0.9], [0, -0.1, 1, 0.9]],
[[0, 10, 1, 11], [0, 10, 1, 11]],
[[0, 10.1, 1, 11.1], [0, 10.1, 1, 11.1]],
[[0, 100, 1, 101], [0, 100, 1, 101]],
[[0, 1000, 1, 1002], [0, 999, 2, 1004]],
[[0, 1000, 1, 1002.1], [0, 999, 2, 1002.7]]],
tf.float32)
scores = tf.constant([[.9, 0.01], [.75, 0.05],
[.6, 0.01], [.95, 0],
[.5, 0.01], [.3, 0.01],
[.01, .85], [.01, .5]])
score_thresh = 0.1
iou_thresh = .5
max_output_size = 4
exp_nms_corners = [[0, 10, 1, 11],
[0, 0, 1, 1],
[0, 999, 2, 1004],
[0, 100, 1, 101]]
exp_nms_scores = [.95, .9, .85, .3]
exp_nms_classes = [0, 0, 1, 0]
nms, _ = post_processing.multiclass_non_max_suppression(
boxes, scores, score_thresh, iou_thresh, max_output_size)
with self.test_session() as sess:
nms_corners_output, nms_scores_output, nms_classes_output = sess.run(
[nms.get(), nms.get_field(fields.BoxListFields.scores),
nms.get_field(fields.BoxListFields.classes)])
self.assertAllClose(nms_corners_output, exp_nms_corners)
self.assertAllClose(nms_scores_output, exp_nms_scores)
self.assertAllClose(nms_classes_output, exp_nms_classes)
if __name__ == '__main__':
tf.test.main()
......@@ -24,6 +24,94 @@ from object_detection.core import standard_fields as fields
from object_detection.utils import shape_utils
def _validate_boxes_scores_iou_thresh(boxes, scores, iou_thresh,
change_coordinate_frame, clip_window):
"""Validates boxes, scores and iou_thresh.
This function validates the boxes, scores, iou_thresh
and if change_coordinate_frame is True, clip_window must be specified.
Args:
boxes: A [k, q, 4] float32 tensor containing k detections. `q` can be either
number of classes or 1 depending on whether a separate box is predicted
per class.
scores: A [k, num_classes] float32 tensor containing the scores for each of
the k detections. The scores have to be non-negative when
pad_to_max_output_size is True.
iou_thresh: scalar threshold for IOU (new boxes that have high IOU overlap
with previously selected boxes are removed).
change_coordinate_frame: Whether to normalize coordinates after clipping
relative to clip_window (this can only be set to True if a clip_window is
provided)
clip_window: A float32 tensor of the form [y_min, x_min, y_max, x_max]
representing the window to clip and normalize boxes to before performing
non-max suppression.
Raises:
ValueError: if iou_thresh is not in [0, 1] or if input boxlist does not
have a valid scores field.
"""
if not 0 <= iou_thresh <= 1.0:
raise ValueError('iou_thresh must be between 0 and 1')
if scores.shape.ndims != 2:
raise ValueError('scores field must be of rank 2')
if scores.shape[1].value is None:
raise ValueError('scores must have statically defined second ' 'dimension')
if boxes.shape.ndims != 3:
raise ValueError('boxes must be of rank 3.')
if not (shape_utils.get_dim_as_int(
boxes.shape[1]) == shape_utils.get_dim_as_int(scores.shape[1]) or
shape_utils.get_dim_as_int(boxes.shape[1]) == 1):
raise ValueError('second dimension of boxes must be either 1 or equal '
'to the second dimension of scores')
if boxes.shape[2].value != 4:
raise ValueError('last dimension of boxes must be of size 4.')
if change_coordinate_frame and clip_window is None:
raise ValueError('if change_coordinate_frame is True, then a clip_window '
'must be specified.')
def _clip_window_prune_boxes(sorted_boxes, clip_window, pad_to_max_output_size,
change_coordinate_frame):
"""Prune boxes with zero area.
Args:
sorted_boxes: A BoxList containing k detections.
clip_window: A float32 tensor of the form [y_min, x_min, y_max, x_max]
representing the window to clip and normalize boxes to before performing
non-max suppression.
pad_to_max_output_size: flag indicating whether to pad to max output size or
not.
change_coordinate_frame: Whether to normalize coordinates after clipping
relative to clip_window (this can only be set to True if a clip_window is
provided).
Returns:
sorted_boxes: A BoxList containing k detections after pruning.
num_valid_nms_boxes_cumulative: Number of valid NMS boxes
"""
sorted_boxes = box_list_ops.clip_to_window(
sorted_boxes,
clip_window,
filter_nonoverlapping=not pad_to_max_output_size)
# Set the scores of boxes with zero area to -1 to keep the default
# behaviour of pruning out zero area boxes.
sorted_boxes_size = tf.shape(sorted_boxes.get())[0]
non_zero_box_area = tf.cast(box_list_ops.area(sorted_boxes), tf.bool)
sorted_boxes_scores = tf.where(
non_zero_box_area, sorted_boxes.get_field(fields.BoxListFields.scores),
-1 * tf.ones(sorted_boxes_size))
sorted_boxes.add_field(fields.BoxListFields.scores, sorted_boxes_scores)
num_valid_nms_boxes_cumulative = tf.reduce_sum(
tf.cast(tf.greater_equal(sorted_boxes_scores, 0), tf.int32))
sorted_boxes = box_list_ops.sort_by_field(sorted_boxes,
fields.BoxListFields.scores)
if change_coordinate_frame:
sorted_boxes = box_list_ops.change_coordinate_frame(sorted_boxes,
clip_window)
return sorted_boxes, num_valid_nms_boxes_cumulative
def multiclass_non_max_suppression(boxes,
scores,
score_thresh,
......@@ -97,28 +185,12 @@ def multiclass_non_max_suppression(boxes,
ValueError: if iou_thresh is not in [0, 1] or if input boxlist does not have
a valid scores field.
"""
if not 0 <= iou_thresh <= 1.0:
raise ValueError('iou_thresh must be between 0 and 1')
if scores.shape.ndims != 2:
raise ValueError('scores field must be of rank 2')
if scores.shape[1].value is None:
raise ValueError('scores must have statically defined second '
'dimension')
if boxes.shape.ndims != 3:
raise ValueError('boxes must be of rank 3.')
if not (boxes.shape[1].value == scores.shape[1].value or
boxes.shape[1].value == 1):
raise ValueError('second dimension of boxes must be either 1 or equal '
'to the second dimension of scores')
if boxes.shape[2].value != 4:
raise ValueError('last dimension of boxes must be of size 4.')
if change_coordinate_frame and clip_window is None:
raise ValueError('if change_coordinate_frame is True, then a clip_window'
'must be specified.')
_validate_boxes_scores_iou_thresh(boxes, scores, iou_thresh,
change_coordinate_frame, clip_window)
with tf.name_scope(scope, 'MultiClassNonMaxSuppression'):
num_scores = tf.shape(scores)[0]
num_classes = scores.get_shape()[1]
num_classes = shape_utils.get_dim_as_int(scores.get_shape()[1])
selected_boxes_list = []
num_valid_nms_boxes_cumulative = tf.constant(0)
......@@ -128,7 +200,7 @@ def multiclass_non_max_suppression(boxes,
if boundaries is not None:
per_class_boundaries_list = tf.unstack(boundaries, axis=1)
boxes_ids = (range(num_classes) if len(per_class_boxes_list) > 1
else [0] * num_classes.value)
else [0] * num_classes)
for class_idx, boxes_idx in zip(range(num_classes), boxes_ids):
per_class_boxes = per_class_boxes_list[boxes_idx]
boxlist_and_class_scores = box_list.BoxList(per_class_boxes)
......@@ -193,32 +265,13 @@ def multiclass_non_max_suppression(boxes,
if clip_window is not None:
# When pad_to_max_output_size is False, it prunes the boxes with zero
# area.
sorted_boxes = box_list_ops.clip_to_window(
sorted_boxes,
clip_window,
filter_nonoverlapping=not pad_to_max_output_size)
# Set the scores of boxes with zero area to -1 to keep the default
# behaviour of pruning out zero area boxes.
sorted_boxes_size = tf.shape(sorted_boxes.get())[0]
non_zero_box_area = tf.cast(box_list_ops.area(sorted_boxes), tf.bool)
sorted_boxes_scores = tf.where(
non_zero_box_area,
sorted_boxes.get_field(fields.BoxListFields.scores),
-1*tf.ones(sorted_boxes_size))
sorted_boxes.add_field(fields.BoxListFields.scores, sorted_boxes_scores)
num_valid_nms_boxes_cumulative = tf.reduce_sum(
tf.cast(tf.greater_equal(sorted_boxes_scores, 0), tf.int32))
sorted_boxes = box_list_ops.sort_by_field(sorted_boxes,
fields.BoxListFields.scores)
if change_coordinate_frame:
sorted_boxes = box_list_ops.change_coordinate_frame(
sorted_boxes, clip_window)
sorted_boxes, num_valid_nms_boxes_cumulative = _clip_window_prune_boxes(
sorted_boxes, clip_window, pad_to_max_output_size,
change_coordinate_frame)
if max_total_size:
max_total_size = tf.minimum(max_total_size,
sorted_boxes.num_boxes())
sorted_boxes = box_list_ops.gather(sorted_boxes,
tf.range(max_total_size))
max_total_size = tf.minimum(max_total_size, sorted_boxes.num_boxes())
sorted_boxes = box_list_ops.gather(sorted_boxes, tf.range(max_total_size))
num_valid_nms_boxes_cumulative = tf.where(
max_total_size > num_valid_nms_boxes_cumulative,
num_valid_nms_boxes_cumulative, max_total_size)
......@@ -230,6 +283,175 @@ def multiclass_non_max_suppression(boxes,
return sorted_boxes, num_valid_nms_boxes_cumulative
def class_agnostic_non_max_suppression(boxes,
scores,
score_thresh,
iou_thresh,
max_classes_per_detection=1,
max_total_size=0,
clip_window=None,
change_coordinate_frame=False,
masks=None,
boundaries=None,
pad_to_max_output_size=False,
additional_fields=None,
scope=None):
"""Class-agnostic version of non maximum suppression.
This op greedily selects a subset of detection bounding boxes, pruning
away boxes that have high IOU (intersection over union) overlap (> thresh)
with already selected boxes. It operates on all the boxes using
max scores across all classes for which scores are provided (via the scores
field of the input box_list), pruning boxes with score less than a provided
threshold prior to applying NMS.
Please note that this operation is performed in a class-agnostic way,
therefore any background classes should be removed prior to calling this
function.
Selected boxes are guaranteed to be sorted in decreasing order by score (but
the sort is not guaranteed to be stable).
Args:
boxes: A [k, q, 4] float32 tensor containing k detections. `q` can be either
number of classes or 1 depending on whether a separate box is predicted
per class.
scores: A [k, num_classes] float32 tensor containing the scores for each of
the k detections. The scores have to be non-negative when
pad_to_max_output_size is True.
score_thresh: scalar threshold for score (low scoring boxes are removed).
iou_thresh: scalar threshold for IOU (new boxes that have high IOU overlap
with previously selected boxes are removed).
max_classes_per_detection: maximum number of retained classes per detection
box in class-agnostic NMS.
max_total_size: maximum number of boxes retained over all classes. By
default returns all boxes retained after capping boxes per class.
clip_window: A float32 tensor of the form [y_min, x_min, y_max, x_max]
representing the window to clip and normalize boxes to before performing
non-max suppression.
change_coordinate_frame: Whether to normalize coordinates after clipping
relative to clip_window (this can only be set to True if a clip_window is
provided)
masks: (optional) a [k, q, mask_height, mask_width] float32 tensor
containing box masks. `q` can be either number of classes or 1 depending
on whether a separate mask is predicted per class.
boundaries: (optional) a [k, q, boundary_height, boundary_width] float32
tensor containing box boundaries. `q` can be either number of classes or 1
depending on whether a separate boundary is predicted per class.
pad_to_max_output_size: If true, the output nmsed boxes are padded to be of
length `max_total_size`. Defaults to false.
additional_fields: (optional) If not None, a dictionary that maps keys to
tensors whose first dimensions are all of size `k`. After non-maximum
suppression, all tensors corresponding to the selected boxes will be added
to resulting BoxList.
scope: name scope.
Returns:
A tuple of sorted_boxes and num_valid_nms_boxes. The sorted_boxes is a
BoxList holding M boxes, with a rank-1 scores field representing
corresponding scores for each box (sorted in decreasing order) and a
rank-1 classes field representing a class label for each box. The
num_valid_nms_boxes is a 0-D integer tensor representing the number of
valid elements in `BoxList`, with the valid elements appearing first.
Raises:
ValueError: if iou_thresh is not in [0, 1] or if input boxlist does not have
a valid scores field.
"""
_validate_boxes_scores_iou_thresh(boxes, scores, iou_thresh,
change_coordinate_frame, clip_window)
if max_classes_per_detection > 1:
raise ValueError('Max classes per detection box >1 not supported.')
q = boxes.shape[1].value
if q > 1:
class_ids = tf.expand_dims(
tf.argmax(scores, axis=1, output_type=tf.int32), axis=1)
boxes = tf.batch_gather(boxes, class_ids)
if masks is not None:
masks = tf.batch_gather(masks, class_ids)
if boundaries is not None:
boundaries = tf.batch_gather(boundaries, class_ids)
boxes = tf.squeeze(boxes, axis=[1])
if masks is not None:
masks = tf.squeeze(masks, axis=[1])
if boundaries is not None:
boundaries = tf.squeeze(boundaries, axis=[1])
with tf.name_scope(scope, 'ClassAgnosticNonMaxSuppression'):
boxlist_and_class_scores = box_list.BoxList(boxes)
max_scores = tf.reduce_max(scores, axis=-1)
classes_with_max_scores = tf.argmax(scores, axis=-1)
boxlist_and_class_scores.add_field(fields.BoxListFields.scores, max_scores)
if masks is not None:
boxlist_and_class_scores.add_field(fields.BoxListFields.masks, masks)
if boundaries is not None:
boxlist_and_class_scores.add_field(fields.BoxListFields.boundaries,
boundaries)
if additional_fields is not None:
for key, tensor in additional_fields.items():
boxlist_and_class_scores.add_field(key, tensor)
if pad_to_max_output_size:
max_selection_size = max_total_size
selected_indices, num_valid_nms_boxes = (
tf.image.non_max_suppression_padded(
boxlist_and_class_scores.get(),
boxlist_and_class_scores.get_field(fields.BoxListFields.scores),
max_selection_size,
iou_threshold=iou_thresh,
score_threshold=score_thresh,
pad_to_max_output_size=True))
else:
max_selection_size = tf.minimum(max_total_size,
boxlist_and_class_scores.num_boxes())
selected_indices = tf.image.non_max_suppression(
boxlist_and_class_scores.get(),
boxlist_and_class_scores.get_field(fields.BoxListFields.scores),
max_selection_size,
iou_threshold=iou_thresh,
score_threshold=score_thresh)
num_valid_nms_boxes = tf.shape(selected_indices)[0]
selected_indices = tf.concat([
selected_indices,
tf.zeros(max_selection_size - num_valid_nms_boxes, tf.int32)
], 0)
nms_result = box_list_ops.gather(boxlist_and_class_scores, selected_indices)
valid_nms_boxes_indx = tf.less(
tf.range(max_selection_size), num_valid_nms_boxes)
nms_scores = nms_result.get_field(fields.BoxListFields.scores)
nms_result.add_field(
fields.BoxListFields.scores,
tf.where(valid_nms_boxes_indx, nms_scores,
-1 * tf.ones(max_selection_size)))
selected_classes = tf.gather(classes_with_max_scores, selected_indices)
nms_result.add_field(fields.BoxListFields.classes, selected_classes)
selected_boxes = nms_result
sorted_boxes = box_list_ops.sort_by_field(selected_boxes,
fields.BoxListFields.scores)
if clip_window is not None:
# When pad_to_max_output_size is False, it prunes the boxes with zero
# area.
sorted_boxes, num_valid_nms_boxes = _clip_window_prune_boxes(
sorted_boxes, clip_window, pad_to_max_output_size,
change_coordinate_frame)
if max_total_size:
max_total_size = tf.minimum(max_total_size, sorted_boxes.num_boxes())
sorted_boxes = box_list_ops.gather(sorted_boxes, tf.range(max_total_size))
num_valid_nms_boxes = tf.where(max_total_size > num_valid_nms_boxes,
num_valid_nms_boxes, max_total_size)
# Select only the valid boxes if pad_to_max_output_size is False.
if not pad_to_max_output_size:
sorted_boxes = box_list_ops.gather(sorted_boxes,
tf.range(num_valid_nms_boxes))
return sorted_boxes, num_valid_nms_boxes
def batch_multiclass_non_max_suppression(boxes,
scores,
score_thresh,
......@@ -243,7 +465,9 @@ def batch_multiclass_non_max_suppression(boxes,
additional_fields=None,
scope=None,
use_static_shapes=False,
parallel_iterations=32):
parallel_iterations=32,
use_class_agnostic_nms=False,
max_classes_per_detection=1):
"""Multi-class version of non maximum suppression that operates on a batch.
This op is similar to `multiclass_non_max_suppression` but operates on a batch
......@@ -253,8 +477,8 @@ def batch_multiclass_non_max_suppression(boxes,
Args:
boxes: A [batch_size, num_anchors, q, 4] float32 tensor containing
detections. If `q` is 1 then same boxes are used for all classes
otherwise, if `q` is equal to number of classes, class-specific boxes
are used.
otherwise, if `q` is equal to number of classes, class-specific boxes are
used.
scores: A [batch_size, num_anchors, num_classes] float32 tensor containing
the scores for each of the `num_anchors` detections. The scores have to be
non-negative when use_static_shapes is set True.
......@@ -274,8 +498,8 @@ def batch_multiclass_non_max_suppression(boxes,
relative to clip_window (this can only be set to True if a clip_window is
provided)
num_valid_boxes: (optional) a Tensor of type `int32`. A 1-D tensor of shape
[batch_size] representing the number of valid boxes to be considered
for each image in the batch. This parameter allows for ignoring zero
[batch_size] representing the number of valid boxes to be considered for
each image in the batch. This parameter allows for ignoring zero
paddings.
masks: (optional) a [batch_size, num_anchors, q, mask_height, mask_width]
float32 tensor containing box masks. `q` can be either number of classes
......@@ -288,6 +512,10 @@ def batch_multiclass_non_max_suppression(boxes,
Defaults to false.
parallel_iterations: (optional) number of batch items to process in
parallel.
use_class_agnostic_nms: If true, this uses class-agnostic non max
suppression.
max_classes_per_detection: Maximum number of retained classes per detection
box in class-agnostic NMS.
Returns:
'nmsed_boxes': A [batch_size, max_detections, 4] float32 tensor
......@@ -313,8 +541,8 @@ def batch_multiclass_non_max_suppression(boxes,
ValueError: if `q` in boxes.shape is not 1 or not equal to number of
classes as inferred from scores.shape.
"""
q = boxes.shape[2].value
num_classes = scores.shape[2].value
q = shape_utils.get_dim_as_int(boxes.shape[2])
num_classes = shape_utils.get_dim_as_int(scores.shape[2])
if q != 1 and q != num_classes:
raise ValueError('third dimension of boxes must be either 1 or equal '
'to the third dimension of scores')
......@@ -335,8 +563,8 @@ def batch_multiclass_non_max_suppression(boxes,
del additional_fields
with tf.name_scope(scope, 'BatchMultiClassNonMaxSuppression'):
boxes_shape = boxes.shape
batch_size = boxes_shape[0].value
num_anchors = boxes_shape[1].value
batch_size = shape_utils.get_dim_as_int(boxes_shape[0])
num_anchors = shape_utils.get_dim_as_int(boxes_shape[1])
if batch_size is None:
batch_size = tf.shape(boxes)[0]
......@@ -434,31 +662,47 @@ def batch_multiclass_non_max_suppression(boxes,
per_image_masks = tf.reshape(
tf.slice(per_image_masks, 4 * [0],
tf.stack([per_image_num_valid_boxes, -1, -1, -1])),
[-1, q, per_image_masks.shape[2].value,
per_image_masks.shape[3].value])
[-1, q, shape_utils.get_dim_as_int(per_image_masks.shape[2]),
shape_utils.get_dim_as_int(per_image_masks.shape[3])])
if per_image_additional_fields is not None:
for key, tensor in per_image_additional_fields.items():
additional_field_shape = tensor.get_shape()
additional_field_dim = len(additional_field_shape)
per_image_additional_fields[key] = tf.reshape(
tf.slice(per_image_additional_fields[key],
additional_field_dim * [0],
tf.stack([per_image_num_valid_boxes] +
(additional_field_dim - 1) * [-1])),
[-1] + [dim.value for dim in additional_field_shape[1:]])
nmsed_boxlist, num_valid_nms_boxes = multiclass_non_max_suppression(
per_image_boxes,
per_image_scores,
score_thresh,
iou_thresh,
max_size_per_class,
max_total_size,
clip_window=per_image_clip_window,
change_coordinate_frame=change_coordinate_frame,
masks=per_image_masks,
pad_to_max_output_size=use_static_shapes,
additional_fields=per_image_additional_fields)
tf.slice(
per_image_additional_fields[key],
additional_field_dim * [0],
tf.stack([per_image_num_valid_boxes] +
(additional_field_dim - 1) * [-1])), [-1] + [
shape_utils.get_dim_as_int(dim)
for dim in additional_field_shape[1:]
])
if use_class_agnostic_nms:
nmsed_boxlist, num_valid_nms_boxes = class_agnostic_non_max_suppression(
per_image_boxes,
per_image_scores,
score_thresh,
iou_thresh,
max_classes_per_detection,
max_total_size,
clip_window=per_image_clip_window,
change_coordinate_frame=change_coordinate_frame,
masks=per_image_masks,
pad_to_max_output_size=use_static_shapes,
additional_fields=per_image_additional_fields)
else:
nmsed_boxlist, num_valid_nms_boxes = multiclass_non_max_suppression(
per_image_boxes,
per_image_scores,
score_thresh,
iou_thresh,
max_size_per_class,
max_total_size,
clip_window=per_image_clip_window,
change_coordinate_frame=change_coordinate_frame,
masks=per_image_masks,
pad_to_max_output_size=use_static_shapes,
additional_fields=per_image_additional_fields)
if not use_static_shapes:
nmsed_boxlist = box_list_ops.pad_or_clip_box_list(
......@@ -499,7 +743,7 @@ def batch_multiclass_non_max_suppression(boxes,
if num_additional_fields > 0:
# Sort the keys to ensure arranging elements in same order as
# in _single_image_nms_fn.
batch_nmsed_keys = ordered_additional_fields.keys()
batch_nmsed_keys = list(ordered_additional_fields.keys())
for i in range(len(batch_nmsed_keys)):
batch_nmsed_additional_fields[
batch_nmsed_keys[i]] = batch_nmsed_values[i]
......
......@@ -55,7 +55,7 @@ def prefetch(tensor_dict, capacity):
enqueue_op = prefetch_queue.enqueue(tensor_dict)
tf.train.queue_runner.add_queue_runner(tf.train.queue_runner.QueueRunner(
prefetch_queue, [enqueue_op]))
tf.summary.scalar('queue/%s/fraction_of_%d_full' % (prefetch_queue.name,
capacity),
tf.to_float(prefetch_queue.size()) * (1. / capacity))
tf.summary.scalar(
'queue/%s/fraction_of_%d_full' % (prefetch_queue.name, capacity),
tf.cast(prefetch_queue.size(), dtype=tf.float32) * (1. / capacity))
return prefetch_queue
......@@ -261,7 +261,7 @@ def normalize_image(image, original_minval, original_maxval, target_minval,
original_maxval = float(original_maxval)
target_minval = float(target_minval)
target_maxval = float(target_maxval)
image = tf.to_float(image)
image = tf.cast(image, dtype=tf.float32)
image = tf.subtract(image, original_minval)
image = tf.multiply(image, (target_maxval - target_minval) /
(original_maxval - original_minval))
......@@ -810,10 +810,12 @@ def random_image_scale(image,
generator_func, preprocessor_cache.PreprocessorCache.IMAGE_SCALE,
preprocess_vars_cache)
image_newysize = tf.to_int32(
tf.multiply(tf.to_float(image_height), size_coef))
image_newxsize = tf.to_int32(
tf.multiply(tf.to_float(image_width), size_coef))
image_newysize = tf.cast(
tf.multiply(tf.cast(image_height, dtype=tf.float32), size_coef),
dtype=tf.int32)
image_newxsize = tf.cast(
tf.multiply(tf.cast(image_width, dtype=tf.float32), size_coef),
dtype=tf.int32)
image = tf.image.resize_images(
image, [image_newysize, image_newxsize], align_corners=True)
result.append(image)
......@@ -1237,7 +1239,7 @@ def _strict_random_crop_image(image,
new_image.set_shape([None, None, image.get_shape()[2]])
# [1, 4]
im_box_rank2 = tf.squeeze(im_box, squeeze_dims=[0])
im_box_rank2 = tf.squeeze(im_box, axis=[0])
# [4]
im_box_rank1 = tf.squeeze(im_box)
......@@ -1555,13 +1557,15 @@ def random_pad_image(image,
new_image += image_color_padded
# setting boxes
new_window = tf.to_float(
new_window = tf.cast(
tf.stack([
-offset_height, -offset_width, target_height - offset_height,
target_width - offset_width
]))
new_window /= tf.to_float(
tf.stack([image_height, image_width, image_height, image_width]))
]),
dtype=tf.float32)
new_window /= tf.cast(
tf.stack([image_height, image_width, image_height, image_width]),
dtype=tf.float32)
boxlist = box_list.BoxList(boxes)
new_boxlist = box_list_ops.change_coordinate_frame(boxlist, new_window)
new_boxes = new_boxlist.get()
......@@ -1616,8 +1620,8 @@ def random_absolute_pad_image(image,
form.
"""
min_image_size = tf.shape(image)[:2]
max_image_size = min_image_size + tf.to_int32(
[max_height_padding, max_width_padding])
max_image_size = min_image_size + tf.cast(
[max_height_padding, max_width_padding], dtype=tf.int32)
return random_pad_image(image, boxes, min_image_size=min_image_size,
max_image_size=max_image_size, pad_color=pad_color,
seed=seed,
......@@ -1723,12 +1727,14 @@ def random_crop_pad_image(image,
cropped_image, cropped_boxes, cropped_labels = result[:3]
min_image_size = tf.to_int32(
tf.to_float(tf.stack([image_height, image_width])) *
min_padded_size_ratio)
max_image_size = tf.to_int32(
tf.to_float(tf.stack([image_height, image_width])) *
max_padded_size_ratio)
min_image_size = tf.cast(
tf.cast(tf.stack([image_height, image_width]), dtype=tf.float32) *
min_padded_size_ratio,
dtype=tf.int32)
max_image_size = tf.cast(
tf.cast(tf.stack([image_height, image_width]), dtype=tf.float32) *
max_padded_size_ratio,
dtype=tf.int32)
padded_image, padded_boxes = random_pad_image(
cropped_image,
......@@ -1840,16 +1846,23 @@ def random_crop_to_aspect_ratio(image,
image_shape = tf.shape(image)
orig_height = image_shape[0]
orig_width = image_shape[1]
orig_aspect_ratio = tf.to_float(orig_width) / tf.to_float(orig_height)
orig_aspect_ratio = tf.cast(
orig_width, dtype=tf.float32) / tf.cast(
orig_height, dtype=tf.float32)
new_aspect_ratio = tf.constant(aspect_ratio, dtype=tf.float32)
def target_height_fn():
return tf.to_int32(tf.round(tf.to_float(orig_width) / new_aspect_ratio))
return tf.cast(
tf.round(tf.cast(orig_width, dtype=tf.float32) / new_aspect_ratio),
dtype=tf.int32)
target_height = tf.cond(orig_aspect_ratio >= new_aspect_ratio,
lambda: orig_height, target_height_fn)
def target_width_fn():
return tf.to_int32(tf.round(tf.to_float(orig_height) * new_aspect_ratio))
return tf.cast(
tf.round(tf.cast(orig_height, dtype=tf.float32) * new_aspect_ratio),
dtype=tf.int32)
target_width = tf.cond(orig_aspect_ratio <= new_aspect_ratio,
lambda: orig_width, target_width_fn)
......@@ -1870,10 +1883,14 @@ def random_crop_to_aspect_ratio(image,
image, offset_height, offset_width, target_height, target_width)
im_box = tf.stack([
tf.to_float(offset_height) / tf.to_float(orig_height),
tf.to_float(offset_width) / tf.to_float(orig_width),
tf.to_float(offset_height + target_height) / tf.to_float(orig_height),
tf.to_float(offset_width + target_width) / tf.to_float(orig_width)
tf.cast(offset_height, dtype=tf.float32) /
tf.cast(orig_height, dtype=tf.float32),
tf.cast(offset_width, dtype=tf.float32) /
tf.cast(orig_width, dtype=tf.float32),
tf.cast(offset_height + target_height, dtype=tf.float32) /
tf.cast(orig_height, dtype=tf.float32),
tf.cast(offset_width + target_width, dtype=tf.float32) /
tf.cast(orig_width, dtype=tf.float32)
])
boxlist = box_list.BoxList(boxes)
......@@ -1996,8 +2013,8 @@ def random_pad_to_aspect_ratio(image,
with tf.name_scope('RandomPadToAspectRatio', values=[image]):
image_shape = tf.shape(image)
image_height = tf.to_float(image_shape[0])
image_width = tf.to_float(image_shape[1])
image_height = tf.cast(image_shape[0], dtype=tf.float32)
image_width = tf.cast(image_shape[1], dtype=tf.float32)
image_aspect_ratio = image_width / image_height
new_aspect_ratio = tf.constant(aspect_ratio, dtype=tf.float32)
target_height = tf.cond(
......@@ -2034,7 +2051,8 @@ def random_pad_to_aspect_ratio(image,
target_width = tf.round(scale * target_width)
new_image = tf.image.pad_to_bounding_box(
image, 0, 0, tf.to_int32(target_height), tf.to_int32(target_width))
image, 0, 0, tf.cast(target_height, dtype=tf.int32),
tf.cast(target_width, dtype=tf.int32))
im_box = tf.stack([
0.0,
......@@ -2050,9 +2068,9 @@ def random_pad_to_aspect_ratio(image,
if masks is not None:
new_masks = tf.expand_dims(masks, -1)
new_masks = tf.image.pad_to_bounding_box(new_masks, 0, 0,
tf.to_int32(target_height),
tf.to_int32(target_width))
new_masks = tf.image.pad_to_bounding_box(
new_masks, 0, 0, tf.cast(target_height, dtype=tf.int32),
tf.cast(target_width, dtype=tf.int32))
new_masks = tf.squeeze(new_masks, [-1])
result.append(new_masks)
......@@ -2106,10 +2124,12 @@ def random_black_patches(image,
image_shape = tf.shape(image)
image_height = image_shape[0]
image_width = image_shape[1]
box_size = tf.to_int32(
box_size = tf.cast(
tf.multiply(
tf.minimum(tf.to_float(image_height), tf.to_float(image_width)),
size_to_image_ratio))
tf.minimum(
tf.cast(image_height, dtype=tf.float32),
tf.cast(image_width, dtype=tf.float32)), size_to_image_ratio),
dtype=tf.int32)
generator_func = functools.partial(tf.random_uniform, [], minval=0.0,
maxval=(1.0 - size_to_image_ratio),
......@@ -2123,8 +2143,12 @@ def random_black_patches(image,
preprocessor_cache.PreprocessorCache.ADD_BLACK_PATCH,
preprocess_vars_cache, key=str(idx) + 'x')
y_min = tf.to_int32(normalized_y_min * tf.to_float(image_height))
x_min = tf.to_int32(normalized_x_min * tf.to_float(image_width))
y_min = tf.cast(
normalized_y_min * tf.cast(image_height, dtype=tf.float32),
dtype=tf.int32)
x_min = tf.cast(
normalized_x_min * tf.cast(image_width, dtype=tf.float32),
dtype=tf.int32)
black_box = tf.ones([box_size, box_size, 3], dtype=tf.float32)
mask = 1.0 - tf.image.pad_to_bounding_box(black_box, y_min, x_min,
image_height, image_width)
......@@ -2156,7 +2180,7 @@ def image_to_float(image):
image: image in tf.float32 format.
"""
with tf.name_scope('ImageToFloat', values=[image]):
image = tf.to_float(image)
image = tf.cast(image, dtype=tf.float32)
return image
......@@ -2342,10 +2366,12 @@ def resize_to_min_dimension(image, masks=None, min_dimension=600,
(image_height, image_width, num_channels) = _get_image_info(image)
min_image_dimension = tf.minimum(image_height, image_width)
min_target_dimension = tf.maximum(min_image_dimension, min_dimension)
target_ratio = tf.to_float(min_target_dimension) / tf.to_float(
min_image_dimension)
target_height = tf.to_int32(tf.to_float(image_height) * target_ratio)
target_width = tf.to_int32(tf.to_float(image_width) * target_ratio)
target_ratio = tf.cast(min_target_dimension, dtype=tf.float32) / tf.cast(
min_image_dimension, dtype=tf.float32)
target_height = tf.cast(
tf.cast(image_height, dtype=tf.float32) * target_ratio, dtype=tf.int32)
target_width = tf.cast(
tf.cast(image_width, dtype=tf.float32) * target_ratio, dtype=tf.int32)
image = tf.image.resize_images(
tf.expand_dims(image, axis=0), size=[target_height, target_width],
method=method,
......@@ -2398,10 +2424,12 @@ def resize_to_max_dimension(image, masks=None, max_dimension=600,
(image_height, image_width, num_channels) = _get_image_info(image)
max_image_dimension = tf.maximum(image_height, image_width)
max_target_dimension = tf.minimum(max_image_dimension, max_dimension)
target_ratio = tf.to_float(max_target_dimension) / tf.to_float(
max_image_dimension)
target_height = tf.to_int32(tf.to_float(image_height) * target_ratio)
target_width = tf.to_int32(tf.to_float(image_width) * target_ratio)
target_ratio = tf.cast(max_target_dimension, dtype=tf.float32) / tf.cast(
max_image_dimension, dtype=tf.float32)
target_height = tf.cast(
tf.cast(image_height, dtype=tf.float32) * target_ratio, dtype=tf.int32)
target_width = tf.cast(
tf.cast(image_width, dtype=tf.float32) * target_ratio, dtype=tf.int32)
image = tf.image.resize_images(
tf.expand_dims(image, axis=0), size=[target_height, target_width],
method=method,
......@@ -2639,11 +2667,11 @@ def random_self_concat_image(
if axis == 0:
# Concat vertically, so need to reduce the y coordinates.
old_scaling = tf.to_float([0.5, 1.0, 0.5, 1.0])
new_translation = tf.to_float([0.5, 0.0, 0.5, 0.0])
old_scaling = tf.constant([0.5, 1.0, 0.5, 1.0])
new_translation = tf.constant([0.5, 0.0, 0.5, 0.0])
elif axis == 1:
old_scaling = tf.to_float([1.0, 0.5, 1.0, 0.5])
new_translation = tf.to_float([0.0, 0.5, 0.0, 0.5])
old_scaling = tf.constant([1.0, 0.5, 1.0, 0.5])
new_translation = tf.constant([0.0, 0.5, 0.0, 0.5])
old_boxes = old_scaling * boxes
new_boxes = old_boxes + new_translation
......
......@@ -795,8 +795,8 @@ class PreprocessorTest(tf.test.TestCase):
images = self.createTestImages()
tensor_dict = {fields.InputDataFields.image: images}
tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options)
images_min = tf.to_float(images) * 0.9 / 255.0
images_max = tf.to_float(images) * 1.1 / 255.0
images_min = tf.cast(images, dtype=tf.float32) * 0.9 / 255.0
images_max = tf.cast(images, dtype=tf.float32) * 1.1 / 255.0
images = tensor_dict[fields.InputDataFields.image]
values_greater = tf.greater_equal(images, images_min)
values_less = tf.less_equal(images, images_max)
......@@ -858,20 +858,26 @@ class PreprocessorTest(tf.test.TestCase):
value=images_gray, num_or_size_splits=3, axis=3)
images_r, images_g, images_b = tf.split(
value=images_original, num_or_size_splits=3, axis=3)
images_r_diff1 = tf.squared_difference(tf.to_float(images_r),
tf.to_float(images_gray_r))
images_r_diff2 = tf.squared_difference(tf.to_float(images_gray_r),
tf.to_float(images_gray_g))
images_r_diff1 = tf.squared_difference(
tf.cast(images_r, dtype=tf.float32),
tf.cast(images_gray_r, dtype=tf.float32))
images_r_diff2 = tf.squared_difference(
tf.cast(images_gray_r, dtype=tf.float32),
tf.cast(images_gray_g, dtype=tf.float32))
images_r_diff = tf.multiply(images_r_diff1, images_r_diff2)
images_g_diff1 = tf.squared_difference(tf.to_float(images_g),
tf.to_float(images_gray_g))
images_g_diff2 = tf.squared_difference(tf.to_float(images_gray_g),
tf.to_float(images_gray_b))
images_g_diff1 = tf.squared_difference(
tf.cast(images_g, dtype=tf.float32),
tf.cast(images_gray_g, dtype=tf.float32))
images_g_diff2 = tf.squared_difference(
tf.cast(images_gray_g, dtype=tf.float32),
tf.cast(images_gray_b, dtype=tf.float32))
images_g_diff = tf.multiply(images_g_diff1, images_g_diff2)
images_b_diff1 = tf.squared_difference(tf.to_float(images_b),
tf.to_float(images_gray_b))
images_b_diff2 = tf.squared_difference(tf.to_float(images_gray_b),
tf.to_float(images_gray_r))
images_b_diff1 = tf.squared_difference(
tf.cast(images_b, dtype=tf.float32),
tf.cast(images_gray_b, dtype=tf.float32))
images_b_diff2 = tf.squared_difference(
tf.cast(images_gray_b, dtype=tf.float32),
tf.cast(images_gray_r, dtype=tf.float32))
images_b_diff = tf.multiply(images_b_diff1, images_b_diff2)
image_zero1 = tf.constant(0, dtype=tf.float32, shape=[1, 4, 4, 1])
with self.test_session() as sess:
......@@ -2135,7 +2141,7 @@ class PreprocessorTest(tf.test.TestCase):
boxes = self.createTestBoxes()
labels = self.createTestLabels()
tensor_dict = {
fields.InputDataFields.image: tf.to_float(images),
fields.InputDataFields.image: tf.cast(images, dtype=tf.float32),
fields.InputDataFields.groundtruth_boxes: boxes,
fields.InputDataFields.groundtruth_classes: labels,
}
......@@ -2856,7 +2862,7 @@ class PreprocessorTest(tf.test.TestCase):
scores = self.createTestMultiClassScores()
tensor_dict = {
fields.InputDataFields.image: tf.to_float(images),
fields.InputDataFields.image: tf.cast(images, dtype=tf.float32),
fields.InputDataFields.groundtruth_boxes: boxes,
fields.InputDataFields.groundtruth_classes: labels,
fields.InputDataFields.groundtruth_weights: weights,
......
......@@ -109,6 +109,8 @@ class DetectionResultFields(object):
key: unique key corresponding to image.
detection_boxes: coordinates of the detection boxes in the image.
detection_scores: detection scores for the detection boxes in the image.
detection_multiclass_scores: class score distribution (including background)
for each detection box in the image.
detection_classes: detection-level class labels.
detection_masks: contains a segmentation mask for each detection box.
detection_boundaries: contains an object boundary for each detection box.
......@@ -123,6 +125,7 @@ class DetectionResultFields(object):
key = 'key'
detection_boxes = 'detection_boxes'
detection_scores = 'detection_scores'
detection_multiclass_scores = 'detection_multiclass_scores'
detection_classes = 'detection_classes'
detection_masks = 'detection_masks'
detection_boundaries = 'detection_boundaries'
......
......@@ -660,16 +660,16 @@ def batch_assign_confidences(target_assigner,
explicit_example_mask = tf.logical_or(positive_mask, negative_mask)
positive_anchors = tf.reduce_any(positive_mask, axis=-1)
regression_weights = tf.to_float(positive_anchors)
regression_weights = tf.cast(positive_anchors, dtype=tf.float32)
regression_targets = (
reg_targets * tf.expand_dims(regression_weights, axis=-1))
regression_weights_expanded = tf.expand_dims(regression_weights, axis=-1)
cls_targets_without_background = (
cls_targets_without_background * (1 - tf.to_float(negative_mask)))
cls_weights_without_background = (
(1 - implicit_class_weight) * tf.to_float(explicit_example_mask)
+ implicit_class_weight)
cls_targets_without_background *
(1 - tf.cast(negative_mask, dtype=tf.float32)))
cls_weights_without_background = ((1 - implicit_class_weight) * tf.cast(
explicit_example_mask, dtype=tf.float32) + implicit_class_weight)
if include_background_class:
cls_weights_background = (
......
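The rewritten weighting is easiest to read as a blend: anchors with an explicit positive or negative label get weight 1.0, and all others fall back to implicit_class_weight. A small numeric sketch, assuming a single row of three anchors:

import tensorflow as tf

implicit_class_weight = 0.5
explicit_example_mask = tf.constant([True, False, True])
cls_weights = ((1 - implicit_class_weight) *
               tf.cast(explicit_example_mask, dtype=tf.float32) +
               implicit_class_weight)
# cls_weights == [1.0, 0.5, 1.0]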
......@@ -59,8 +59,15 @@ class _ClassTensorHandler(slim_example_decoder.Tensor):
label_map_proto_file, use_display_name=False)
# We use a default_value of -1, but we expect all labels to be contained
# in the label map.
name_to_id_table = tf.contrib.lookup.HashTable(
initializer=tf.contrib.lookup.KeyValueTensorInitializer(
try:
# Dynamically try to load the tf v2 lookup, falling back to contrib
lookup = tf.compat.v2.lookup
hash_table_class = tf.compat.v2.lookup.StaticHashTable
except AttributeError:
lookup = tf.contrib.lookup
hash_table_class = tf.contrib.lookup.HashTable
name_to_id_table = hash_table_class(
initializer=lookup.KeyValueTensorInitializer(
keys=tf.constant(list(name_to_id.keys())),
values=tf.constant(list(name_to_id.values()), dtype=tf.int64)),
default_value=-1)
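The try/except is the usual shim for running one decoder under both API generations: newer TF 1.x releases expose the 2.x lookup API under tf.compat.v2, while older ones only have tf.contrib.lookup. A self-contained sketch with a toy label map:

import tensorflow as tf

name_to_id = {'person': 1, 'dog': 2}
try:
  lookup = tf.compat.v2.lookup
  hash_table_class = tf.compat.v2.lookup.StaticHashTable
except AttributeError:
  lookup = tf.contrib.lookup
  hash_table_class = tf.contrib.lookup.HashTable
table = hash_table_class(
    initializer=lookup.KeyValueTensorInitializer(
        keys=tf.constant(list(name_to_id.keys())),
        values=tf.constant(list(name_to_id.values()), dtype=tf.int64)),
    default_value=-1)
ids = table.lookup(tf.constant(['dog', 'cat']))  # -> [2, -1]
# In graph mode, run tf.tables_initializer() before evaluating `ids`.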
......@@ -68,8 +75,8 @@ class _ClassTensorHandler(slim_example_decoder.Tensor):
label_map_proto_file, use_display_name=True)
# We use a default_value of -1, but we expect all labels to be contained
# in the label map.
display_name_to_id_table = tf.contrib.lookup.HashTable(
initializer=tf.contrib.lookup.KeyValueTensorInitializer(
display_name_to_id_table = hash_table_class(
initializer=lookup.KeyValueTensorInitializer(
keys=tf.constant(list(display_name_to_id.keys())),
values=tf.constant(
list(display_name_to_id.values()), dtype=tf.int64)),
......@@ -444,7 +451,8 @@ class TfExampleDecoder(data_decoder.DataDecoder):
masks = keys_to_tensors['image/object/mask']
if isinstance(masks, tf.SparseTensor):
masks = tf.sparse_tensor_to_dense(masks)
masks = tf.reshape(tf.to_float(tf.greater(masks, 0.0)), to_shape)
masks = tf.reshape(
tf.cast(tf.greater(masks, 0.0), dtype=tf.float32), to_shape)
return tf.cast(masks, tf.float32)
def _decode_png_instance_masks(self, keys_to_tensors):
......@@ -465,7 +473,7 @@ class TfExampleDecoder(data_decoder.DataDecoder):
image = tf.squeeze(
tf.image.decode_image(image_buffer, channels=1), axis=2)
image.set_shape([None, None])
image = tf.to_float(tf.greater(image, 0))
image = tf.cast(tf.greater(image, 0), dtype=tf.float32)
return image
png_masks = keys_to_tensors['image/object/mask']
......@@ -476,4 +484,4 @@ class TfExampleDecoder(data_decoder.DataDecoder):
return tf.cond(
tf.greater(tf.size(png_masks), 0),
lambda: tf.map_fn(decode_png_mask, png_masks, dtype=tf.float32),
lambda: tf.zeros(tf.to_int32(tf.stack([0, height, width]))))
lambda: tf.zeros(tf.cast(tf.stack([0, height, width]), dtype=tf.int32)))
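Both mask decode paths reduce to the same binarization idiom, now spelled with an explicit cast. A toy sketch:

import tensorflow as tf

masks = tf.constant([[0.0, 0.3], [0.9, 0.0]])
binary = tf.cast(tf.greater(masks, 0.0), dtype=tf.float32)
# binary == [[0., 1.], [1., 0.]]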
......@@ -44,10 +44,15 @@ EVAL_METRICS_CLASS_DICT = {
coco_evaluation.CocoMaskEvaluator,
'oid_challenge_detection_metrics':
object_detection_evaluation.OpenImagesDetectionChallengeEvaluator,
'oid_challenge_segmentation_metrics':
object_detection_evaluation
.OpenImagesInstanceSegmentationChallengeEvaluator,
'pascal_voc_detection_metrics':
object_detection_evaluation.PascalDetectionEvaluator,
'weighted_pascal_voc_detection_metrics':
object_detection_evaluation.WeightedPascalDetectionEvaluator,
'precision_at_recall_detection_metrics':
object_detection_evaluation.PrecisionAtRecallDetectionEvaluator,
'pascal_voc_instance_segmentation_metrics':
object_detection_evaluation.PascalInstanceSegmentationEvaluator,
'weighted_pascal_voc_instance_segmentation_metrics':
......@@ -776,7 +781,8 @@ def result_dict_for_batched_example(images,
detection_fields = fields.DetectionResultFields
detection_boxes = detections[detection_fields.detection_boxes]
detection_scores = detections[detection_fields.detection_scores]
num_detections = tf.to_int32(detections[detection_fields.num_detections])
num_detections = tf.cast(detections[detection_fields.num_detections],
dtype=tf.int32)
if class_agnostic:
detection_classes = tf.ones_like(detection_scores, dtype=tf.int64)
......@@ -939,4 +945,9 @@ def evaluator_options_from_eval_config(eval_config):
'include_metrics_per_category': (
eval_config.include_metrics_per_category)
}
elif eval_metric_fn_key == 'precision_at_recall_detection_metrics':
evaluator_options[eval_metric_fn_key] = {
'recall_lower_bound': (eval_config.recall_lower_bound),
'recall_upper_bound': (eval_config.recall_upper_bound)
}
return evaluator_options
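A hedged usage sketch for the new recall-bounded evaluator options, mirroring the test below (proto fields as shown in this diff):

from object_detection import eval_util
from object_detection.protos import eval_pb2

eval_config = eval_pb2.EvalConfig()
eval_config.metrics_set.append('precision_at_recall_detection_metrics')
eval_config.recall_lower_bound = 0.2
eval_config.recall_upper_bound = 0.6
options = eval_util.evaluator_options_from_eval_config(eval_config)
# options['precision_at_recall_detection_metrics'] ==
#     {'recall_lower_bound': 0.2, 'recall_upper_bound': 0.6}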
......@@ -31,9 +31,9 @@ from object_detection.utils import test_case
class EvalUtilTest(test_case.TestCase, parameterized.TestCase):
def _get_categories_list(self):
return [{'id': 0, 'name': 'person'},
{'id': 1, 'name': 'dog'},
{'id': 2, 'name': 'cat'}]
return [{'id': 1, 'name': 'person'},
{'id': 2, 'name': 'dog'},
{'id': 3, 'name': 'cat'}]
def _make_evaluation_dict(self,
resized_groundtruth_masks=False,
......@@ -192,43 +192,66 @@ class EvalUtilTest(test_case.TestCase, parameterized.TestCase):
def test_get_eval_metric_ops_for_evaluators(self):
eval_config = eval_pb2.EvalConfig()
eval_config.metrics_set.extend(
['coco_detection_metrics', 'coco_mask_metrics'])
eval_config.metrics_set.extend([
'coco_detection_metrics', 'coco_mask_metrics',
'precision_at_recall_detection_metrics'
])
eval_config.include_metrics_per_category = True
eval_config.recall_lower_bound = 0.2
eval_config.recall_upper_bound = 0.6
evaluator_options = eval_util.evaluator_options_from_eval_config(
eval_config)
self.assertTrue(evaluator_options['coco_detection_metrics'][
'include_metrics_per_category'])
self.assertTrue(evaluator_options['coco_mask_metrics'][
'include_metrics_per_category'])
self.assertTrue(evaluator_options['coco_detection_metrics']
['include_metrics_per_category'])
self.assertTrue(
evaluator_options['coco_mask_metrics']['include_metrics_per_category'])
self.assertAlmostEqual(
evaluator_options['precision_at_recall_detection_metrics']
['recall_lower_bound'], eval_config.recall_lower_bound)
self.assertAlmostEqual(
evaluator_options['precision_at_recall_detection_metrics']
['recall_upper_bound'], eval_config.recall_upper_bound)
def test_get_evaluator_with_evaluator_options(self):
eval_config = eval_pb2.EvalConfig()
eval_config.metrics_set.extend(['coco_detection_metrics'])
eval_config.metrics_set.extend(
['coco_detection_metrics', 'precision_at_recall_detection_metrics'])
eval_config.include_metrics_per_category = True
eval_config.recall_lower_bound = 0.2
eval_config.recall_upper_bound = 0.6
categories = self._get_categories_list()
evaluator_options = eval_util.evaluator_options_from_eval_config(
eval_config)
evaluator = eval_util.get_evaluators(
eval_config, categories, evaluator_options)
evaluator = eval_util.get_evaluators(eval_config, categories,
evaluator_options)
self.assertTrue(evaluator[0]._include_metrics_per_category)
self.assertAlmostEqual(evaluator[1]._recall_lower_bound,
eval_config.recall_lower_bound)
self.assertAlmostEqual(evaluator[1]._recall_upper_bound,
eval_config.recall_upper_bound)
def test_get_evaluator_with_no_evaluator_options(self):
eval_config = eval_pb2.EvalConfig()
eval_config.metrics_set.extend(['coco_detection_metrics'])
eval_config.metrics_set.extend(
['coco_detection_metrics', 'precision_at_recall_detection_metrics'])
eval_config.include_metrics_per_category = True
eval_config.recall_lower_bound = 0.2
eval_config.recall_upper_bound = 0.6
categories = self._get_categories_list()
evaluator = eval_util.get_evaluators(
eval_config, categories, evaluator_options=None)
# Even though we are setting eval_config.include_metrics_per_category = True
# this option is never passed into the DetectionEvaluator constructor (via
# `evaluator_options`).
# and bounds on recall, these options are never passed into the
# DetectionEvaluator constructor (via `evaluator_options`).
self.assertFalse(evaluator[0]._include_metrics_per_category)
self.assertAlmostEqual(evaluator[1]._recall_lower_bound, 0.0)
self.assertAlmostEqual(evaluator[1]._recall_upper_bound, 1.0)
if __name__ == '__main__':
tf.test.main()
......@@ -106,7 +106,7 @@ flags.DEFINE_string('trained_checkpoint_prefix', None, 'Checkpoint prefix.')
flags.DEFINE_integer('max_detections', 10,
'Maximum number of detections (boxes) to show.')
flags.DEFINE_integer('max_classes_per_detection', 1,
'Number of classes to display per detection box.')
'Maximum number of classes to output per detection box.')
flags.DEFINE_integer(
'detections_per_class', 100,
'Number of anchors used per class in Regular Non-Max-Suppression.')
......@@ -136,7 +136,7 @@ def main(argv):
export_tflite_ssd_graph_lib.export_tflite_graph(
pipeline_config, FLAGS.trained_checkpoint_prefix, FLAGS.output_directory,
FLAGS.add_postprocessing_op, FLAGS.max_detections,
FLAGS.max_classes_per_detection, FLAGS.use_regular_nms)
FLAGS.max_classes_per_detection, use_regular_nms=FLAGS.use_regular_nms)
if __name__ == '__main__':
......
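Passing use_regular_nms by keyword guards against positional drift: with several integer parameters sitting between max_classes_per_detection and use_regular_nms, a positional boolean can silently bind to the wrong slot. A hypothetical illustration (signature simplified, not the real export function):

def export_graph(config, max_detections=10, max_classes_per_detection=1,
                 detections_per_class=100, use_regular_nms=False):
  return locals()

# Positional call: True lands on detections_per_class, not use_regular_nms.
export_graph('cfg', 10, 1, True)
# Keyword call pins the value to the intended parameter:
export_graph('cfg', 10, 1, use_regular_nms=True)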
......@@ -176,6 +176,9 @@ def add_output_tensor_nodes(postprocessed_tensors,
containing detected boxes.
* detection_scores: float32 tensor of shape [batch_size, num_boxes]
containing scores for the detected boxes.
* detection_multiclass_scores: (Optional) float32 tensor of shape
[batch_size, num_boxes, num_classes_with_background] containing the class
score distribution for detected boxes, including the background class if any.
* detection_classes: float32 tensor of shape [batch_size, num_boxes]
containing class predictions for the detected boxes.
* detection_keypoints: (Optional) float32 tensor of shape
......@@ -189,6 +192,8 @@ def add_output_tensor_nodes(postprocessed_tensors,
postprocessed_tensors: a dictionary containing the following fields
'detection_boxes': [batch, max_detections, 4]
'detection_scores': [batch, max_detections]
'detection_multiclass_scores': [batch, max_detections,
num_classes_with_background]
'detection_classes': [batch, max_detections]
'detection_masks': [batch, max_detections, mask_height, mask_width]
(optional).
......@@ -204,6 +209,8 @@ def add_output_tensor_nodes(postprocessed_tensors,
label_id_offset = 1
boxes = postprocessed_tensors.get(detection_fields.detection_boxes)
scores = postprocessed_tensors.get(detection_fields.detection_scores)
multiclass_scores = postprocessed_tensors.get(
detection_fields.detection_multiclass_scores)
raw_boxes = postprocessed_tensors.get(detection_fields.raw_detection_boxes)
raw_scores = postprocessed_tensors.get(detection_fields.raw_detection_scores)
classes = postprocessed_tensors.get(
......@@ -216,6 +223,9 @@ def add_output_tensor_nodes(postprocessed_tensors,
boxes, name=detection_fields.detection_boxes)
outputs[detection_fields.detection_scores] = tf.identity(
scores, name=detection_fields.detection_scores)
if multiclass_scores is not None:
outputs[detection_fields.detection_multiclass_scores] = tf.identity(
multiclass_scores, name=detection_fields.detection_multiclass_scores)
outputs[detection_fields.detection_classes] = tf.identity(
classes, name=detection_fields.detection_classes)
outputs[detection_fields.num_detections] = tf.identity(
......@@ -306,7 +316,7 @@ def write_graph_and_checkpoint(inference_graph_def,
def _get_outputs_from_inputs(input_tensors, detection_model,
output_collection_name):
inputs = tf.to_float(input_tensors)
inputs = tf.cast(input_tensors, dtype=tf.float32)
preprocessed_inputs, true_image_shapes = detection_model.preprocess(inputs)
output_tensors = detection_model.predict(
preprocessed_inputs, true_image_shapes)
......
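Because add_output_tensor_nodes names each output tensor after its field, a graph exported with these changes exposes the new scores under the usual ':0' convention. A hedged sketch, assuming inference_graph is an already-loaded frozen tf.Graph and image_batch is a prepared input array:

import tensorflow as tf

with tf.Session(graph=inference_graph) as sess:
  multiclass_scores = inference_graph.get_tensor_by_name(
      'detection_multiclass_scores:0')
  scores_np = sess.run(multiclass_scores,
                       feed_dict={'image_tensor:0': image_batch})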
......@@ -59,6 +59,9 @@ class FakeModel(model.DetectionModel):
[0.0, 0.0, 0.0, 0.0]]], tf.float32),
'detection_scores': tf.constant([[0.7, 0.6],
[0.9, 0.0]], tf.float32),
'detection_multiclass_scores': tf.constant([[[0.3, 0.7], [0.4, 0.6]],
[[0.1, 0.9], [0.0, 0.0]]],
tf.float32),
'detection_classes': tf.constant([[0, 1],
[1, 0]], tf.float32),
'num_detections': tf.constant([2, 1], tf.float32),
......@@ -371,6 +374,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
inference_graph.get_tensor_by_name('image_tensor:0')
inference_graph.get_tensor_by_name('detection_boxes:0')
inference_graph.get_tensor_by_name('detection_scores:0')
inference_graph.get_tensor_by_name('detection_multiclass_scores:0')
inference_graph.get_tensor_by_name('detection_classes:0')
inference_graph.get_tensor_by_name('detection_keypoints:0')
inference_graph.get_tensor_by_name('detection_masks:0')
......@@ -398,6 +402,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
inference_graph.get_tensor_by_name('image_tensor:0')
inference_graph.get_tensor_by_name('detection_boxes:0')
inference_graph.get_tensor_by_name('detection_scores:0')
inference_graph.get_tensor_by_name('detection_multiclass_scores:0')
inference_graph.get_tensor_by_name('detection_classes:0')
inference_graph.get_tensor_by_name('num_detections:0')
with self.assertRaises(KeyError):
......@@ -491,15 +496,20 @@ class ExportInferenceGraphTest(tf.test.TestCase):
'encoded_image_string_tensor:0')
boxes = inference_graph.get_tensor_by_name('detection_boxes:0')
scores = inference_graph.get_tensor_by_name('detection_scores:0')
multiclass_scores = inference_graph.get_tensor_by_name(
'detection_multiclass_scores:0')
classes = inference_graph.get_tensor_by_name('detection_classes:0')
keypoints = inference_graph.get_tensor_by_name('detection_keypoints:0')
masks = inference_graph.get_tensor_by_name('detection_masks:0')
num_detections = inference_graph.get_tensor_by_name('num_detections:0')
for image_str in [jpg_image_str, png_image_str]:
image_str_batch_np = np.hstack([image_str] * 2)
(boxes_np, scores_np, classes_np, keypoints_np, masks_np,
num_detections_np) = sess.run(
[boxes, scores, classes, keypoints, masks, num_detections],
(boxes_np, scores_np, multiclass_scores_np, classes_np, keypoints_np,
masks_np, num_detections_np) = sess.run(
[
boxes, scores, multiclass_scores, classes, keypoints, masks,
num_detections
],
feed_dict={image_str_tensor: image_str_batch_np})
self.assertAllClose(boxes_np, [[[0.0, 0.0, 0.5, 0.5],
[0.5, 0.5, 0.8, 0.8]],
......@@ -507,6 +517,8 @@ class ExportInferenceGraphTest(tf.test.TestCase):
[0.0, 0.0, 0.0, 0.0]]])
self.assertAllClose(scores_np, [[0.7, 0.6],
[0.9, 0.0]])
self.assertAllClose(multiclass_scores_np, [[[0.3, 0.7], [0.4, 0.6]],
[[0.1, 0.9], [0.0, 0.0]]])
self.assertAllClose(classes_np, [[1, 2],
[2, 1]])
self.assertAllClose(keypoints_np, np.arange(48).reshape([2, 2, 6, 2]))
......
......@@ -127,7 +127,7 @@ def transform_input_data(tensor_dict,
# Apply model preprocessing ops and resize instance masks.
image = tensor_dict[fields.InputDataFields.image]
preprocessed_resized_image, true_image_shape = model_preprocess_fn(
tf.expand_dims(tf.to_float(image), axis=0))
tf.expand_dims(tf.cast(image, dtype=tf.float32), axis=0))
if use_bfloat16:
preprocessed_resized_image = tf.cast(
preprocessed_resized_image, tf.bfloat16)
......@@ -219,14 +219,15 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
num_additional_channels = 0
if fields.InputDataFields.image_additional_channels in tensor_dict:
num_additional_channels = tensor_dict[
fields.InputDataFields.image_additional_channels].shape[2].value
num_additional_channels = shape_utils.get_dim_as_int(tensor_dict[
fields.InputDataFields.image_additional_channels].shape[2])
# We assume that if num_additional_channels > 0, then it has already been
# concatenated to the base image (but not the ground truth).
num_channels = 3
if fields.InputDataFields.image in tensor_dict:
num_channels = tensor_dict[fields.InputDataFields.image].shape[2].value
num_channels = shape_utils.get_dim_as_int(
tensor_dict[fields.InputDataFields.image].shape[2])
if num_additional_channels:
if num_additional_channels >= num_channels:
......@@ -234,7 +235,8 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
'Image must be already concatenated with additional channels.')
if (fields.InputDataFields.original_image in tensor_dict and
tensor_dict[fields.InputDataFields.original_image].shape[2].value ==
shape_utils.get_dim_as_int(
tensor_dict[fields.InputDataFields.original_image].shape[2]) ==
num_channels):
raise ValueError(
'Image must be already concatenated with additional channels.')
......@@ -273,19 +275,21 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
if fields.InputDataFields.original_image in tensor_dict:
padding_shapes[fields.InputDataFields.original_image] = [
height, width, tensor_dict[fields.InputDataFields.
original_image].shape[2].value
height, width,
shape_utils.get_dim_as_int(tensor_dict[fields.InputDataFields.
original_image].shape[2])
]
if fields.InputDataFields.groundtruth_keypoints in tensor_dict:
tensor_shape = (
tensor_dict[fields.InputDataFields.groundtruth_keypoints].shape)
padding_shape = [max_num_boxes, tensor_shape[1].value,
tensor_shape[2].value]
padding_shape = [max_num_boxes,
shape_utils.get_dim_as_int(tensor_shape[1]),
shape_utils.get_dim_as_int(tensor_shape[2])]
padding_shapes[fields.InputDataFields.groundtruth_keypoints] = padding_shape
if fields.InputDataFields.groundtruth_keypoint_visibilities in tensor_dict:
tensor_shape = tensor_dict[fields.InputDataFields.
groundtruth_keypoint_visibilities].shape
padding_shape = [max_num_boxes, tensor_shape[1].value]
padding_shape = [max_num_boxes, shape_utils.get_dim_as_int(tensor_shape[1])]
padding_shapes[fields.InputDataFields.
groundtruth_keypoint_visibilities] = padding_shape
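The .value accessor only exists on TF 1.x tf.Dimension objects; in TF 2.x a TensorShape holds plain ints (or None), so these call sites go through shape_utils.get_dim_as_int instead. A sketch of the shim's presumed behavior:

def get_dim_as_int(dim):
  """Returns a shape dimension as an int, or None if unknown."""
  try:
    return dim.value  # TF 1.x: tf.Dimension
  except AttributeError:
    return dim        # TF 2.x: already an int (or None)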
......@@ -318,7 +322,7 @@ def augment_input_data(tensor_dict, data_augmentation_options):
input tensor dictionary.
"""
tensor_dict[fields.InputDataFields.image] = tf.expand_dims(
tf.to_float(tensor_dict[fields.InputDataFields.image]), 0)
tf.cast(tensor_dict[fields.InputDataFields.image], dtype=tf.float32), 0)
include_instance_masks = (fields.InputDataFields.groundtruth_instance_masks
in tensor_dict)
......@@ -438,97 +442,112 @@ def create_train_input_fn(train_config, train_input_config,
"""
def _train_input_fn(params=None):
"""Returns `features` and `labels` tensor dictionaries for training.
return train_input(train_config, train_input_config, model_config,
params=params)
Args:
params: Parameter dictionary passed from the estimator.
return _train_input_fn
Returns:
A tf.data.Dataset that holds (features, labels) tuple.
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [batch_size, H, W, C]
float32 tensor with preprocessed images.
features[HASH_KEY] is a [batch_size] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [batch_size, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] (optional) is a
[batch_size, H, W, C] float32 tensor with original images.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.num_groundtruth_boxes] is a [batch_size]
int32 tensor indicating the number of groundtruth boxes.
labels[fields.InputDataFields.groundtruth_boxes] is a
[batch_size, num_boxes, 4] float32 tensor containing the corners of
the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
[batch_size, num_boxes, num_classes] float32 one-hot tensor of
classes.
labels[fields.InputDataFields.groundtruth_weights] is a
[batch_size, num_boxes] float32 tensor containing groundtruth weights
for the boxes.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[batch_size, num_boxes, H, W] float32 tensor containing only binary
values, which represent instance masks for objects.
labels[fields.InputDataFields.groundtruth_keypoints] is a
[batch_size, num_boxes, num_keypoints, 2] float32 tensor containing
keypoints for each box.
Raises:
TypeError: if the `train_config`, `train_input_config` or `model_config`
are not of the correct type.
"""
if not isinstance(train_config, train_pb2.TrainConfig):
raise TypeError('For training mode, the `train_config` must be a '
'train_pb2.TrainConfig.')
if not isinstance(train_input_config, input_reader_pb2.InputReader):
raise TypeError('The `train_input_config` must be a '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
def transform_and_pad_input_data_fn(tensor_dict):
"""Combines transform and pad operation."""
data_augmentation_options = [
preprocessor_builder.build(step)
for step in train_config.data_augmentation_options
]
data_augmentation_fn = functools.partial(
augment_input_data,
data_augmentation_options=data_augmentation_options)
model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
model_config, is_training=True).preprocess
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model_preprocess_fn,
image_resizer_fn=image_resizer_fn,
num_classes=config_util.get_number_of_classes(model_config),
data_augmentation_fn=data_augmentation_fn,
merge_multiple_boxes=train_config.merge_multiple_label_boxes,
retain_original_image=train_config.retain_original_images,
use_multiclass_scores=train_config.use_multiclass_scores,
use_bfloat16=train_config.use_bfloat16)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=train_input_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
train_input_config,
transform_input_data_fn=transform_and_pad_input_data_fn,
batch_size=params['batch_size'] if params else train_config.batch_size)
return dataset
return _train_input_fn
def train_input(train_config, train_input_config,
model_config, model=None, params=None):
"""Returns `features` and `labels` tensor dictionaries for training.
Args:
train_config: A train_pb2.TrainConfig.
train_input_config: An input_reader_pb2.InputReader.
model_config: A model_pb2.DetectionModel.
model: A pre-constructed Detection Model.
If None, one will be created from the config.
params: Parameter dictionary passed from the estimator.
Returns:
A tf.data.Dataset that holds (features, labels) tuple.
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [batch_size, H, W, C]
float32 tensor with preprocessed images.
features[HASH_KEY] is a [batch_size] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [batch_size, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] (optional) is a
[batch_size, H, W, C] float32 tensor with original images.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.num_groundtruth_boxes] is a [batch_size]
int32 tensor indicating the number of groundtruth boxes.
labels[fields.InputDataFields.groundtruth_boxes] is a
[batch_size, num_boxes, 4] float32 tensor containing the corners of
the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
[batch_size, num_boxes, num_classes] float32 one-hot tensor of
classes.
labels[fields.InputDataFields.groundtruth_weights] is a
[batch_size, num_boxes] float32 tensor containing groundtruth weights
for the boxes.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[batch_size, num_boxes, H, W] float32 tensor containing only binary
values, which represent instance masks for objects.
labels[fields.InputDataFields.groundtruth_keypoints] is a
[batch_size, num_boxes, num_keypoints, 2] float32 tensor containing
keypoints for each box.
Raises:
TypeError: if the `train_config`, `train_input_config` or `model_config`
are not of the correct type.
"""
if not isinstance(train_config, train_pb2.TrainConfig):
raise TypeError('For training mode, the `train_config` must be a '
'train_pb2.TrainConfig.')
if not isinstance(train_input_config, input_reader_pb2.InputReader):
raise TypeError('The `train_input_config` must be a '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
if model is None:
model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
model_config, is_training=True).preprocess
else:
model_preprocess_fn = model.preprocess
def transform_and_pad_input_data_fn(tensor_dict):
"""Combines transform and pad operation."""
data_augmentation_options = [
preprocessor_builder.build(step)
for step in train_config.data_augmentation_options
]
data_augmentation_fn = functools.partial(
augment_input_data,
data_augmentation_options=data_augmentation_options)
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model_preprocess_fn,
image_resizer_fn=image_resizer_fn,
num_classes=config_util.get_number_of_classes(model_config),
data_augmentation_fn=data_augmentation_fn,
merge_multiple_boxes=train_config.merge_multiple_label_boxes,
retain_original_image=train_config.retain_original_images,
use_multiclass_scores=train_config.use_multiclass_scores,
use_bfloat16=train_config.use_bfloat16)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=train_input_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
train_input_config,
transform_input_data_fn=transform_and_pad_input_data_fn,
batch_size=params['batch_size'] if params else train_config.batch_size)
return dataset
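The refactor leaves _train_input_fn a thin wrapper, so eager/custom training loops can call train_input directly and hand it an already-constructed model whose preprocess function is reused instead of building a throwaway model from the config. A hedged sketch (configs assumed already parsed; eval_input follows the same pattern with is_training=False):

from object_detection import inputs
from object_detection.builders import model_builder

detection_model = model_builder.build(model_config, is_training=True)
dataset = inputs.train_input(train_config, train_input_config, model_config,
                             model=detection_model)
for features, labels in dataset:  # tf.data.Dataset of (features, labels)
  pass  # feed the batch to a custom training step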
def create_eval_input_fn(eval_config, eval_input_config, model_config):
......@@ -544,84 +563,99 @@ def create_eval_input_fn(eval_config, eval_input_config, model_config):
"""
def _eval_input_fn(params=None):
"""Returns `features` and `labels` tensor dictionaries for evaluation.
return eval_input(eval_config, eval_input_config, model_config,
params=params)
Args:
params: Parameter dictionary passed from the estimator.
return _eval_input_fn
Returns:
A tf.data.Dataset that holds (features, labels) tuple.
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [1, H, W, C] float32 tensor
with preprocessed images.
features[HASH_KEY] is a [1] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [1, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] is a [1, H', W', C]
float32 tensor with the original image.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.groundtruth_boxes] is a [1, num_boxes, 4]
float32 tensor containing the corners of the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
[num_boxes, num_classes] float32 one-hot tensor of classes.
labels[fields.InputDataFields.groundtruth_area] is a [1, num_boxes]
float32 tensor containing object areas.
labels[fields.InputDataFields.groundtruth_is_crowd] is a [1, num_boxes]
bool tensor indicating if the boxes enclose a crowd.
labels[fields.InputDataFields.groundtruth_difficult] is a [1, num_boxes]
int32 tensor indicating if the boxes represent difficult instances.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[1, num_boxes, H, W] float32 tensor containing only binary values,
which represent instance masks for objects.
Raises:
TypeError: if the `eval_config`, `eval_input_config` or `model_config`
are not of the correct type.
"""
params = params or {}
if not isinstance(eval_config, eval_pb2.EvalConfig):
raise TypeError('For eval mode, the `eval_config` must be a '
'train_pb2.EvalConfig.')
if not isinstance(eval_input_config, input_reader_pb2.InputReader):
raise TypeError('The `eval_input_config` must be a '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
def transform_and_pad_input_data_fn(tensor_dict):
"""Combines transform and pad operation."""
num_classes = config_util.get_number_of_classes(model_config)
model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
model_config, is_training=False).preprocess
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model_preprocess_fn,
image_resizer_fn=image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=None,
retain_original_image=eval_config.retain_original_images)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=eval_input_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
eval_input_config,
batch_size=params['batch_size'] if params else eval_config.batch_size,
transform_input_data_fn=transform_and_pad_input_data_fn)
return dataset
return _eval_input_fn
def eval_input(eval_config, eval_input_config, model_config,
model=None, params=None):
"""Returns `features` and `labels` tensor dictionaries for evaluation.
Args:
eval_config: An eval_pb2.EvalConfig.
eval_input_config: An input_reader_pb2.InputReader.
model_config: A model_pb2.DetectionModel.
model: A pre-constructed Detection Model.
If None, one will be created from the config.
params: Parameter dictionary passed from the estimator.
Returns:
A tf.data.Dataset that holds (features, labels) tuple.
features: Dictionary of feature tensors.
features[fields.InputDataFields.image] is a [1, H, W, C] float32 tensor
with preprocessed images.
features[HASH_KEY] is a [1] int32 tensor representing unique
identifiers for the images.
features[fields.InputDataFields.true_image_shape] is a [1, 3]
int32 tensor representing the true image shapes, as preprocessed
images could be padded.
features[fields.InputDataFields.original_image] is a [1, H', W', C]
float32 tensor with the original image.
labels: Dictionary of groundtruth tensors.
labels[fields.InputDataFields.groundtruth_boxes] is a [1, num_boxes, 4]
float32 tensor containing the corners of the groundtruth boxes.
labels[fields.InputDataFields.groundtruth_classes] is a
[num_boxes, num_classes] float32 one-hot tensor of classes.
labels[fields.InputDataFields.groundtruth_area] is a [1, num_boxes]
float32 tensor containing object areas.
labels[fields.InputDataFields.groundtruth_is_crowd] is a [1, num_boxes]
bool tensor indicating if the boxes enclose a crowd.
labels[fields.InputDataFields.groundtruth_difficult] is a [1, num_boxes]
int32 tensor indicating if the boxes represent difficult instances.
-- Optional --
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[1, num_boxes, H, W] float32 tensor containing only binary values,
which represent instance masks for objects.
Raises:
TypeError: if the `eval_config`, `eval_input_config` or `model_config`
are not of the correct type.
"""
params = params or {}
if not isinstance(eval_config, eval_pb2.EvalConfig):
raise TypeError('For eval mode, the `eval_config` must be a '
'train_pb2.EvalConfig.')
if not isinstance(eval_input_config, input_reader_pb2.InputReader):
raise TypeError('The `eval_input_config` must be a '
'input_reader_pb2.InputReader.')
if not isinstance(model_config, model_pb2.DetectionModel):
raise TypeError('The `model_config` must be a '
'model_pb2.DetectionModel.')
if model is None:
model_preprocess_fn = INPUT_BUILDER_UTIL_MAP['model_build'](
model_config, is_training=False).preprocess
else:
model_preprocess_fn = model.preprocess
def transform_and_pad_input_data_fn(tensor_dict):
"""Combines transform and pad operation."""
num_classes = config_util.get_number_of_classes(model_config)
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model_preprocess_fn,
image_resizer_fn=image_resizer_fn,
num_classes=num_classes,
data_augmentation_fn=None,
retain_original_image=eval_config.retain_original_images)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=eval_input_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
eval_input_config,
batch_size=params['batch_size'] if params else eval_config.batch_size,
transform_input_data_fn=transform_and_pad_input_data_fn)
return dataset
def create_predict_input_fn(model_config, predict_input_config):
......@@ -664,7 +698,7 @@ def create_predict_input_fn(model_config, predict_input_config):
load_instance_masks=False,
num_additional_channels=predict_input_config.num_additional_channels)
input_dict = transform_fn(decoder.decode(example))
images = tf.to_float(input_dict[fields.InputDataFields.image])
images = tf.cast(input_dict[fields.InputDataFields.image], dtype=tf.float32)
images = tf.expand_dims(images, axis=0)
true_image_shape = tf.expand_dims(
input_dict[fields.InputDataFields.true_image_shape], axis=0)
......
......@@ -53,6 +53,9 @@ EVAL_METRICS_CLASS_DICT = {
# DEPRECATED: please use oid_challenge_detection_metrics instead
'oid_challenge_object_detection_metrics':
object_detection_evaluation.OpenImagesDetectionChallengeEvaluator,
'oid_challenge_segmentation_metrics':
object_detection_evaluation
.OpenImagesInstanceSegmentationChallengeEvaluator,
}
EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'
......@@ -80,7 +83,7 @@ def _extract_predictions_and_losses(model,
input_dict = prefetch_queue.dequeue()
original_image = tf.expand_dims(input_dict[fields.InputDataFields.image], 0)
preprocessed_image, true_image_shapes = model.preprocess(
tf.to_float(original_image))
tf.cast(original_image, dtype=tf.float32))
prediction_dict = model.predict(preprocessed_image, true_image_shapes)
detections = model.postprocess(prediction_dict, true_image_shapes)
......
......@@ -62,7 +62,7 @@ def create_input_queue(batch_size_per_clone, create_tensor_dict_fn,
tensor_dict[fields.InputDataFields.image], 0)
images = tensor_dict[fields.InputDataFields.image]
float_images = tf.to_float(images)
float_images = tf.cast(images, dtype=tf.float32)
tensor_dict[fields.InputDataFields.image] = float_images
include_instance_masks = (fields.InputDataFields.groundtruth_instance_masks
......
......@@ -184,7 +184,7 @@ class ArgMaxMatcher(matcher.Matcher):
return matches
if similarity_matrix.shape.is_fully_defined():
if similarity_matrix.shape[0].value == 0:
if shape_utils.get_dim_as_int(similarity_matrix.shape[0]) == 0:
return _match_when_rows_are_empty()
else:
return _match_when_rows_are_non_empty()
......
......@@ -62,7 +62,7 @@ class GreedyBipartiteMatcher(matcher.Matcher):
# Convert similarity matrix to distance matrix as tf.image.bipartite tries
# to find minimum distance matches.
distance_matrix = -1 * similarity_matrix
num_valid_rows = tf.reduce_sum(tf.to_float(valid_rows))
num_valid_rows = tf.reduce_sum(tf.cast(valid_rows, dtype=tf.float32))
_, match_results = image_ops.bipartite_match(
distance_matrix, num_valid_rows=num_valid_rows)
match_results = tf.reshape(match_results, [-1])
......
......@@ -722,9 +722,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
"""
clip_heights = image_shapes[:, 0]
clip_widths = image_shapes[:, 1]
clip_window = tf.to_float(tf.stack([tf.zeros_like(clip_heights),
tf.zeros_like(clip_heights),
clip_heights, clip_widths], axis=1))
clip_window = tf.cast(
tf.stack([
tf.zeros_like(clip_heights),
tf.zeros_like(clip_heights), clip_heights, clip_widths
],
axis=1),
dtype=tf.float32)
return clip_window
def _proposal_postprocess(self, rpn_box_encodings,
......@@ -732,7 +736,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
image_shape, true_image_shapes):
"""Wraps over FasterRCNNMetaArch._postprocess_rpn()."""
image_shape_2d = self._image_batch_shape_2d(image_shape)
proposal_boxes_normalized, _, num_proposals, _, _ = \
proposal_boxes_normalized, _, _, num_proposals, _, _ = \
self._postprocess_rpn(
rpn_box_encodings, rpn_objectness_predictions_with_background,
anchors, image_shape_2d, true_image_shapes)
......@@ -817,17 +821,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
prediction_dict = self._predict_first_stage(preprocessed_inputs)
if self._number_of_stages >= 2:
# If mixed-precision training on TPU is enabled, rpn_box_encodings and
# rpn_objectness_predictions_with_background are bfloat16 tensors.
# Since they are treated as prediction results, they need to be cast to
# float32 tensors for correct postprocess_rpn computation in
# predict_second_stage.
prediction_dict.update(
self._predict_second_stage(
tf.to_float(prediction_dict['rpn_box_encodings']),
tf.to_float(
prediction_dict['rpn_objectness_predictions_with_background']
), prediction_dict['rpn_features_to_crop'],
prediction_dict['anchors'], prediction_dict['image_shape'],
prediction_dict['rpn_box_encodings'],
prediction_dict['rpn_objectness_predictions_with_background'],
prediction_dict['rpn_features_to_crop'],
prediction_dict['anchors'],
prediction_dict['image_shape'],
true_image_shapes))
if self._number_of_stages == 3:
......@@ -848,21 +848,21 @@ class FasterRCNNMetaArch(model.DetectionModel):
Returns:
prediction_dict: a dictionary holding "raw" prediction tensors:
1) rpn_box_predictor_features: A 4-D float32 tensor with shape
1) rpn_box_predictor_features: A 4-D float32/bfloat16 tensor with shape
[batch_size, height, width, depth] to be used for predicting proposal
boxes and corresponding objectness scores.
2) rpn_features_to_crop: A 4-D float32 tensor with shape
2) rpn_features_to_crop: A 4-D float32/bfloat16 tensor with shape
[batch_size, height, width, depth] representing image features to crop
using the proposal boxes predicted by the RPN.
3) image_shape: a 1-D tensor of shape [4] representing the input
image shape.
4) rpn_box_encodings: 3-D float tensor of shape
4) rpn_box_encodings: 3-D float32 tensor of shape
[batch_size, num_anchors, self._box_coder.code_size] containing
predicted boxes.
5) rpn_objectness_predictions_with_background: 3-D float tensor of shape
[batch_size, num_anchors, 2] containing class
predictions (logits) for each of the anchors. Note that this
tensor *includes* background class predictions (at class index 0).
5) rpn_objectness_predictions_with_background: 3-D float32 tensor of
shape [batch_size, num_anchors, 2] containing class predictions
(logits) for each of the anchors. Note that this tensor *includes*
background class predictions (at class index 0).
6) anchors: A 2-D tensor of shape [num_anchors, 4] representing anchors
for the first stage RPN (in absolute coordinates). Note that
`num_anchors` can differ depending on whether the model is created in
......@@ -875,7 +875,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
# The Faster R-CNN paper recommends pruning anchors that venture outside
# the image window at training time and clipping at inference time.
clip_window = tf.to_float(tf.stack([0, 0, image_shape[1], image_shape[2]]))
clip_window = tf.cast(tf.stack([0, 0, image_shape[1], image_shape[2]]),
dtype=tf.float32)
if self._is_training:
if self.clip_anchors_to_image:
anchors_boxlist = box_list_ops.clip_to_window(
......@@ -899,9 +900,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
'image_shape':
image_shape,
'rpn_box_encodings':
rpn_box_encodings,
tf.cast(rpn_box_encodings, dtype=tf.float32),
'rpn_objectness_predictions_with_background':
rpn_objectness_predictions_with_background,
tf.cast(rpn_objectness_predictions_with_background,
dtype=tf.float32),
'anchors':
anchors_boxlist.data['boxes'],
}
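Casting the RPN outputs to float32 at this boundary matters downstream: TensorFlow will not concatenate mixed bfloat16/float32 tensors, which is what later target assignment would otherwise attempt under mixed-precision TPU training. A toy reproduction:

import tensorflow as tf

a = tf.cast(tf.ones([2, 3]), tf.bfloat16)
b = tf.ones([2, 3], dtype=tf.float32)
# tf.concat([a, b], axis=0)  # raises: tensors must have the same dtype
c = tf.concat([tf.cast(a, tf.float32), b], axis=0)  # cast, then concat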
......@@ -954,13 +956,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
Returns:
prediction_dict: a dictionary holding "raw" prediction tensors:
1) refined_box_encodings: a 3-D tensor with shape
1) refined_box_encodings: a 3-D float32 tensor with shape
[total_num_proposals, num_classes, self._box_coder.code_size]
representing predicted (final) refined box encodings, where
total_num_proposals=batch_size*self._max_num_proposals. If using a
shared box across classes the shape will instead be
[total_num_proposals, 1, self._box_coder.code_size].
2) class_predictions_with_background: a 3-D tensor with shape
2) class_predictions_with_background: a 3-D float32 tensor with shape
[total_num_proposals, num_classes + 1] containing class
predictions (logits) for each of the anchors, where
total_num_proposals=batch_size*self._max_num_proposals.
......@@ -980,7 +982,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
boxes proposed by the RPN, thus enabling one to extract features and
get box classification and prediction for externally selected areas
of the image.
6) box_classifier_features: a 4-D float32 or bfloat16 tensor
6) box_classifier_features: a 4-D float32/bfloat16 tensor
representing the features for each proposal.
"""
proposal_boxes_normalized, num_proposals = self._proposal_postprocess(
......@@ -1008,13 +1010,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
Returns:
prediction_dict: a dictionary holding "raw" prediction tensors:
1) refined_box_encodings: a 3-D tensor with shape
1) refined_box_encodings: a 3-D float32 tensor with shape
[total_num_proposals, num_classes, self._box_coder.code_size]
representing predicted (final) refined box encodings, where
total_num_proposals=batch_size*self._max_num_proposals. If using a
shared box across classes the shape will instead be
[total_num_proposals, 1, self._box_coder.code_size].
2) class_predictions_with_background: a 3-D tensor with shape
2) class_predictions_with_background: a 3-D float32 tensor with shape
[total_num_proposals, num_classes + 1] containing class
predictions (logits) for each of the anchors, where
total_num_proposals=batch_size*self._max_num_proposals.
......@@ -1029,17 +1031,12 @@ class FasterRCNNMetaArch(model.DetectionModel):
boxes proposed by the RPN, thus enabling one to extract features and
get box classification and prediction for externally selected areas
of the image.
5) box_classifier_features: a 4-D float32 or bfloat16 tensor
5) box_classifier_features: a 4-D float32/bfloat16 tensor
representing the features for each proposal.
"""
# If mixed-precision training on TPU is enabled, the dtype of
# rpn_features_to_crop is bfloat16, otherwise it is float32. tf.cast is
# used to match the dtype of proposal_boxes_normalized to that of
# rpn_features_to_crop for further computation.
flattened_proposal_feature_maps = (
self._compute_second_stage_input_feature_maps(
rpn_features_to_crop,
tf.cast(proposal_boxes_normalized, rpn_features_to_crop.dtype)))
rpn_features_to_crop, proposal_boxes_normalized))
box_classifier_features = self._extract_box_classifier_features(
flattened_proposal_feature_maps)
......@@ -1066,9 +1063,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
proposal_boxes_normalized, image_shape, self._parallel_iterations)
prediction_dict = {
'refined_box_encodings': refined_box_encodings,
'refined_box_encodings': tf.cast(refined_box_encodings,
dtype=tf.float32),
'class_predictions_with_background':
class_predictions_with_background,
tf.cast(class_predictions_with_background, dtype=tf.float32),
'proposal_boxes': absolute_proposal_boxes,
'box_classifier_features': box_classifier_features,
'proposal_boxes_normalized': proposal_boxes_normalized,
......@@ -1215,7 +1213,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
_, num_classes, height, width = instance_masks.get_shape().as_list()
k = tf.shape(instance_masks)[0]
instance_masks = tf.reshape(instance_masks, [-1, height, width])
classes = tf.to_int32(tf.reshape(classes, [-1]))
classes = tf.cast(tf.reshape(classes, [-1]), dtype=tf.int32)
gather_idx = tf.range(k) * num_classes + classes
return tf.gather(instance_masks, gather_idx)
......@@ -1415,6 +1413,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
detections: a dictionary containing the following fields
detection_boxes: [batch, max_detections, 4]
detection_scores: [batch, max_detections]
detection_multiclass_scores: [batch, max_detections, 2]
detection_classes: [batch, max_detections]
(this entry is only created if rpn_mode=False)
num_detections: [batch]
......@@ -1427,7 +1426,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
with tf.name_scope('FirstStagePostprocessor'):
if self._number_of_stages == 1:
(proposal_boxes, proposal_scores, num_proposals, raw_proposal_boxes,
(proposal_boxes, proposal_scores, proposal_multiclass_scores,
num_proposals, raw_proposal_boxes,
raw_proposal_scores) = self._postprocess_rpn(
prediction_dict['rpn_box_encodings'],
prediction_dict['rpn_objectness_predictions_with_background'],
......@@ -1437,8 +1437,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
proposal_boxes,
fields.DetectionResultFields.detection_scores:
proposal_scores,
fields.DetectionResultFields.detection_multiclass_scores:
proposal_multiclass_scores,
fields.DetectionResultFields.num_detections:
tf.to_float(num_proposals),
tf.cast(num_proposals, dtype=tf.float32),
fields.DetectionResultFields.raw_detection_boxes:
raw_proposal_boxes,
fields.DetectionResultFields.raw_detection_scores:
......@@ -1545,6 +1547,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
proposal_scores: A float tensor with shape
[batch_size, max_num_proposals] representing the (potentially zero
padded) proposal objectness scores for all images in the batch.
proposal_multiclass_scores: A float tensor with shape
[batch_size, max_num_proposals, 2] representing the (potentially zero
padded) proposal multiclass scores for all images in the batch.
num_proposals: A Tensor of type `int32`. A 1-D tensor of shape [batch]
representing the number of proposals predicted for each image in
the batch.
......@@ -1566,10 +1571,12 @@ class FasterRCNNMetaArch(model.DetectionModel):
rpn_objectness_predictions_with_background_batch)
rpn_objectness_softmax_without_background = rpn_objectness_softmax[:, :, 1]
clip_window = self._compute_clip_window(image_shapes)
(proposal_boxes, proposal_scores, _, _, _,
additional_fields = {'multiclass_scores': rpn_objectness_softmax}
(proposal_boxes, proposal_scores, _, _, nmsed_additional_fields,
num_proposals) = self._first_stage_nms_fn(
tf.expand_dims(raw_proposal_boxes, axis=2),
tf.expand_dims(rpn_objectness_softmax_without_background, axis=2),
additional_fields=additional_fields,
clip_window=clip_window)
if self._is_training:
proposal_boxes = tf.stop_gradient(proposal_boxes)
......@@ -1596,7 +1603,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
normalize_boxes,
elems=[raw_proposal_boxes, image_shapes],
dtype=tf.float32)
return (normalized_proposal_boxes, proposal_scores, num_proposals,
proposal_multiclass_scores = nmsed_additional_fields['multiclass_scores']
return (normalized_proposal_boxes, proposal_scores,
proposal_multiclass_scores, num_proposals,
raw_normalized_proposal_boxes, rpn_objectness_softmax)
def _sample_box_classifier_batch(
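Threading the objectness softmax through NMS as an additional field amounts to gathering the side tensor with the same indices that select the surviving boxes, keeping it box-aligned. A simplified single-image sketch using the core TF op rather than the library's batched NMS:

import tensorflow as tf

boxes = tf.random.uniform([10, 4])
scores = tf.random.uniform([10])
multiclass_scores = tf.random.uniform([10, 2])  # e.g. [background, object]
keep = tf.image.non_max_suppression(boxes, scores, max_output_size=5)
kept_boxes = tf.gather(boxes, keep)
kept_multiclass = tf.gather(multiclass_scores, keep)  # stays box-aligned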
......@@ -1713,11 +1722,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
for i, boxes in enumerate(
self.groundtruth_lists(fields.BoxListFields.boxes))
]
groundtruth_classes_with_background_list = [
tf.to_float(
tf.pad(one_hot_encoding, [[0, 0], [1, 0]], mode='CONSTANT'))
for one_hot_encoding in self.groundtruth_lists(
fields.BoxListFields.classes)]
groundtruth_classes_with_background_list = []
for one_hot_encoding in self.groundtruth_lists(
fields.BoxListFields.classes):
groundtruth_classes_with_background_list.append(
tf.cast(
tf.pad(one_hot_encoding, [[0, 0], [1, 0]], mode='CONSTANT'),
dtype=tf.float32))
groundtruth_masks_list = self._groundtruth_lists.get(
fields.BoxListFields.masks)
......@@ -1860,6 +1871,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
A dictionary containing:
`detection_boxes`: [batch, max_detections, 4] in normalized coordinates.
`detection_scores`: [batch, max_detections]
`detection_multiclass_scores`: [batch, max_detections,
num_classes_with_background] tensor with the class score distribution for
post-processed detection boxes, including the background class if any.
`detection_classes`: [batch, max_detections]
`num_detections`: [batch]
`detection_masks`:
......@@ -1894,20 +1908,24 @@ class FasterRCNNMetaArch(model.DetectionModel):
clip_window = self._compute_clip_window(image_shapes)
mask_predictions_batch = None
if mask_predictions is not None:
mask_height = mask_predictions.shape[2].value
mask_width = mask_predictions.shape[3].value
mask_height = shape_utils.get_dim_as_int(mask_predictions.shape[2])
mask_width = shape_utils.get_dim_as_int(mask_predictions.shape[3])
mask_predictions = tf.sigmoid(mask_predictions)
mask_predictions_batch = tf.reshape(
mask_predictions, [-1, self.max_num_proposals,
self.num_classes, mask_height, mask_width])
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks, _,
num_detections) = self._second_stage_nms_fn(
additional_fields = {
'multiclass_scores': class_predictions_with_background_batch_normalized
}
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
nmsed_additional_fields, num_detections) = self._second_stage_nms_fn(
refined_decoded_boxes_batch,
class_predictions_batch,
clip_window=clip_window,
change_coordinate_frame=True,
num_valid_boxes=num_proposals,
additional_fields=additional_fields,
masks=mask_predictions_batch)
if refined_decoded_boxes_batch.shape[2] > 1:
class_ids = tf.expand_dims(
......@@ -1948,8 +1966,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
nmsed_scores,
fields.DetectionResultFields.detection_classes:
nmsed_classes,
fields.DetectionResultFields.detection_multiclass_scores:
nmsed_additional_fields['multiclass_scores'],
fields.DetectionResultFields.num_detections:
tf.to_float(num_detections),
tf.cast(num_detections, dtype=tf.float32),
fields.DetectionResultFields.raw_detection_boxes:
raw_normalized_detection_boxes,
fields.DetectionResultFields.raw_detection_scores:
......@@ -2096,18 +2116,18 @@ class FasterRCNNMetaArch(model.DetectionModel):
return self._first_stage_sampler.subsample(
tf.cast(cls_weights, tf.bool),
self._first_stage_minibatch_size, tf.cast(cls_targets, tf.bool))
batch_sampled_indices = tf.to_float(shape_utils.static_or_dynamic_map_fn(
batch_sampled_indices = tf.cast(shape_utils.static_or_dynamic_map_fn(
_minibatch_subsample_fn,
[batch_cls_targets, batch_cls_weights],
dtype=tf.bool,
parallel_iterations=self._parallel_iterations,
back_prop=True))
back_prop=True), dtype=tf.float32)
# Normalize by number of examples in sampled minibatch
normalizer = tf.maximum(
tf.reduce_sum(batch_sampled_indices, axis=1), 1.0)
batch_one_hot_targets = tf.one_hot(
tf.to_int32(batch_cls_targets), depth=2)
tf.cast(batch_cls_targets, dtype=tf.int32), depth=2)
sampled_reg_indices = tf.multiply(batch_sampled_indices,
batch_reg_weights)
......@@ -2133,8 +2153,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
name='localization_loss')
objectness_loss = tf.multiply(self._first_stage_obj_loss_weight,
objectness_loss, name='objectness_loss')
loss_dict = {localization_loss.op.name: localization_loss,
objectness_loss.op.name: objectness_loss}
loss_dict = {'Loss/RPNLoss/localization_loss': localization_loss,
'Loss/RPNLoss/objectness_loss': objectness_loss}
return loss_dict
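Hard-coding the loss keys (rather than reading op.name) presumably keeps metric names stable: in graph mode TensorFlow uniquifies repeated op names, so a rebuilt or re-entered graph can yield keys with trailing suffixes. A toy demonstration:

import tensorflow as tf

loss = tf.multiply(1.0, 2.0, name='objectness_loss')
loss_again = tf.multiply(1.0, 2.0, name='objectness_loss')
print(loss.op.name, loss_again.op.name)
# -> objectness_loss objectness_loss_1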
def _loss_box_classifier(self,
......@@ -2216,8 +2236,9 @@ class FasterRCNNMetaArch(model.DetectionModel):
for proposal_boxes_single_image in tf.unstack(proposal_boxes)]
batch_size = len(proposal_boxlists)
num_proposals_or_one = tf.to_float(tf.expand_dims(
tf.maximum(num_proposals, tf.ones_like(num_proposals)), 1))
num_proposals_or_one = tf.cast(tf.expand_dims(
tf.maximum(num_proposals, tf.ones_like(num_proposals)), 1),
dtype=tf.float32)
normalizer = tf.tile(num_proposals_or_one,
[1, self.max_num_proposals]) * batch_size
......@@ -2276,9 +2297,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
ndims=2) / normalizer
second_stage_loc_loss = tf.reduce_sum(
second_stage_loc_losses * tf.to_float(paddings_indicator))
second_stage_loc_losses * tf.cast(paddings_indicator,
dtype=tf.float32))
second_stage_cls_loss = tf.reduce_sum(
second_stage_cls_losses * tf.to_float(paddings_indicator))
second_stage_cls_losses * tf.cast(paddings_indicator,
dtype=tf.float32))
if self._hard_example_miner:
(second_stage_loc_loss, second_stage_cls_loss
......@@ -2293,8 +2316,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
second_stage_cls_loss,
name='classification_loss')
loss_dict = {localization_loss.op.name: localization_loss,
classification_loss.op.name: classification_loss}
loss_dict = {'Loss/BoxClassifierLoss/localization_loss':
localization_loss,
'Loss/BoxClassifierLoss/classification_loss':
classification_loss}
second_stage_mask_loss = None
if prediction_masks is not None:
if groundtruth_masks_list is None:
......@@ -2332,8 +2357,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
prediction_masks_with_background,
tf.greater(one_hot_flat_cls_targets_with_background, 0))
mask_height = prediction_masks.shape[2].value
mask_width = prediction_masks.shape[3].value
mask_height = shape_utils.get_dim_as_int(prediction_masks.shape[2])
mask_width = shape_utils.get_dim_as_int(prediction_masks.shape[3])
reshaped_prediction_masks = tf.reshape(
prediction_masks_masked_by_class_targets,
[batch_size, -1, mask_height * mask_width])
......@@ -2364,7 +2389,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
[batch_size, -1, mask_height * mask_width])
mask_losses_weights = (
batch_mask_target_weights * tf.to_float(paddings_indicator))
batch_mask_target_weights * tf.cast(paddings_indicator,
dtype=tf.float32))
mask_losses = self._second_stage_mask_loss(
reshaped_prediction_masks,
batch_cropped_gt_mask,
......@@ -2419,7 +2445,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
for detection_boxes_single_image in tf.unstack(proposal_boxes)
]
paddings_indicator = self._padded_batched_proposals_indicator(
tf.to_int32(num_detections), detection_boxes.shape[1])
tf.cast(num_detections, dtype=tf.int32), detection_boxes.shape[1])
(batch_cls_targets_with_background, _, _, _,
_) = target_assigner.batch_assign_targets(
target_assigner=self._detector_target_assigner,
......