Commit 80444539 authored by Zhuoran Liu, committed by pkulzc

Add TPU SavedModel exporter and refactor OD code (#6737)

247226201  by ronnyvotel:

    Updating the visualization tools to accept unique_ids for color coding.

--
247067830  by Zhichao Lu:

    Add box_encodings_clip_range options for the convolutional box predictor (for TPU compatibility).

--
246888475  by Zhichao Lu:

    Remove unused _update_eval_steps function.

--
246163259  by lzc:

    Add a gather op that can handle ignore indices (which are "-1"s in this case).
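A minimal NumPy sketch of the described behavior (illustrative only; the helper name and the zero-fill convention are assumptions, not the actual op added in this change):

```python
import numpy as np

def gather_with_ignore(params, indices, ignore_value=-1):
  """Gathers rows of `params`; rows whose index equals -1 are zeroed."""
  indices = np.asarray(indices)
  ignored = indices == ignore_value
  safe = np.where(ignored, 0, indices)  # avoid out-of-bounds lookups
  out = params[safe]                    # fancy indexing returns a copy
  out[ignored] = 0                      # zero out the ignored rows
  return out
```

For example, `gather_with_ignore(np.array([[1., 2.], [3., 4.]]), [1, -1, 0])` yields the rows `[3, 4]`, `[0, 0]`, `[1, 2]`.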

--
246084944  by Zhichao Lu:

    Keras based implementation for SSD + MobilenetV2 + FPN.

--
245544227  by rathodv:

    Add batch_get_targets method to target assigner module to gather any groundtruth tensors based on the results of target assigner.

--
245540854  by rathodv:

    Update target assigner to return match tensor instead of a match object.

--
245434441  by Zhichao Lu:

    Add README for tpu_exporters package.

--
245381834  by lzc:

    Internal change.

--
245298983  by Zhichao Lu:

    Add conditional_shape_resizer to config_util

--
245134666  by Zhichao Lu:

    Adds ConditionalShapeResizer to the ImageResizer proto, which enables resizing only if the input image height or width is greater or smaller than a specified size. Also enables specification of the resize method in the resize_to_{max, min}_dimension methods.

--
245093975  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (faster-rcnn)

--
245072421  by Zhichao Lu:

    Adds a new image resizing method "resize_to_max_dimension" which resizes images only if a dimension is greater than the maximum desired value while maintaining aspect ratio.
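The rule can be sketched in plain Python (an illustrative helper with assumed rounding behavior, not the library implementation):

```python
def resize_to_max_dimension(height, width, max_dimension):
  """Returns (height, width): downscaled only if needed, aspect ratio kept."""
  largest = max(height, width)
  if largest <= max_dimension:
    return height, width  # already within bounds; no resize
  scale = max_dimension / float(largest)
  return int(round(height * scale)), int(round(width * scale))
```

For instance, an 800x1200 image with max_dimension=600 becomes 400x600, while a 300x400 image is left untouched.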

--
244946998  by lzc:

    Internal Changes.

--
244943693  by Zhichao Lu:

    Add a custom config to mobilenet v2 that makes it more detection friendly.

--
244754158  by derekjchow:

    Internal change.

--
244699875  by Zhichao Lu:

    Add check_range=False to box_list_ops.to_normalized_coordinates when training
    for instance segmentation.  This is consistent with other calls when training
    for object detection.  There could be wrongly annotated boxes in the dataset.

--
244507425  by rathodv:

    Support bfloat16 for ssd models.

--
244399982  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd)

--
244209387  by Zhichao Lu:

    Internal change.

--
243922296  by rathodv:

    Change `raw_detection_scores` to contain softmax/sigmoid scores (not logits) for `raw_detection_boxes`.

--
243883978  by Zhichao Lu:

    Add a sample fully conv config.

--
243369455  by Zhichao Lu:

    Fix regularization loss gap in Keras and Slim.

--
243292002  by lzc:

    Internal changes.

--
243097958  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd model)

--
243007177  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd model)

--
242776550  by Zhichao Lu:

    Make object detection pre-processing run on GPU.  tf.map_fn() uses
    TensorArrayV3 ops, which have no int32 GPU implementation.  Cast to int64,
    then cast back to int32.

--
242723128  by Zhichao Lu:

    Using sorted dictionaries for additional heads in non_max_suppression to ensure tensor order
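The fix boils down to iterating the additional heads in sorted key order so the packed tensor order is stable across runs; a minimal sketch (hypothetical helper, not the actual NMS code):

```python
def pack_heads(additional_fields):
  """Returns (keys, values) in a deterministic, sorted key order."""
  keys = sorted(additional_fields)
  return keys, [additional_fields[k] for k in keys]
```

Because plain dict iteration order is not guaranteed across Python versions, sorting the keys before packing guarantees the same tensor ordering on every run.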

--
242495311  by Zhichao Lu:

    Update documentation to reflect new TFLite examples repo location

--
242230527  by Zhichao Lu:

    Fix Dropout bugs for WeightSharedConvolutionalBoxPred.

--
242226573  by Zhichao Lu:

    Create Keras-based WeightSharedConvolutionalBoxPredictor.

--
241806074  by Zhichao Lu:

    Add inference in unit tests of TFX OD template.

--
241641498  by lzc:

    Internal change.

--
241637481  by Zhichao Lu:

    matmul_crop_and_resize(): Switch to dynamic shaping, so that not all dimensions are required to be known.

--
241429980  by Zhichao Lu:

    Internal change

--
241167237  by Zhichao Lu:

    Adds a faster_rcnn_inception_resnet_v2 Keras feature extractor, and updates the model builder to construct it.

--
241088616  by Zhichao Lu:

    Make it compatible with different dtype, e.g. float32, bfloat16, etc.

--
240897364  by lzc:

    Use image_np_expanded in object_detection_tutorial notebook.

--
240890393  by Zhichao Lu:

    Disable multicore inference for the OD template as it's not yet compatible.

--
240352168  by Zhichao Lu:

    Make SSDResnetV1FpnFeatureExtractor not protected to allow inheritance.

--
240351470  by lzc:

    Internal change.

--
239878928  by Zhichao Lu:

    Defines Keras box predictors for Faster RCNN and RFCN

--
239872103  by Zhichao Lu:

    Delete duplicated inputs in test.

--
239714273  by Zhichao Lu:

    Adding scope variable to all class heads

--
239698643  by Zhichao Lu:

    Create FPN feature extractor for object detection.

--
239696657  by Zhichao Lu:

    Internal Change.

--
239299404  by Zhichao Lu:

    Allows the faster rcnn meta-architecture to support Keras subcomponents

--
238502595  by Zhichao Lu:

    Lay the groundwork for symmetric quantization.

--
238496885  by Zhichao Lu:

    Add flexible_grid_anchor_generator

--
238138727  by lzc:

    Remove dead code.

    _USE_C_SHAPES has been forced True in TensorFlow releases since
    TensorFlow 1.9
    (https://github.com/tensorflow/tensorflow/commit/1d74a69443f741e69f9f52cb6bc2940b4d4ae3b7)

--
238123936  by rathodv:

    Add num_matched_groundtruth summary to target assigner in SSD.

--
238103345  by ronnyvotel:

    Raising error if input file pattern does not match any files.
    Also printing the number of evaluation images for coco metrics.

--
238044081  by Zhichao Lu:

    Fix docstring to state the correct dimensionality of `class_predictions_with_background`.

--
237920279  by Zhichao Lu:

    [XLA] Rework debug flags for dumping HLO.

    The following flags (usually passed via the XLA_FLAGS envvar) are removed:

      xla_dump_computations_to
      xla_dump_executions_to
      xla_dump_ir_to
      xla_dump_optimized_hlo_proto_to
      xla_dump_per_pass_hlo_proto_to
      xla_dump_unoptimized_hlo_proto_to
      xla_generate_hlo_graph
      xla_generate_hlo_text_to
      xla_hlo_dump_as_html
      xla_hlo_graph_path
      xla_log_hlo_text

    The following new flags are added:

      xla_dump_to
      xla_dump_hlo_module_re
      xla_dump_hlo_pass_re
      xla_dump_hlo_as_text
      xla_dump_hlo_as_proto
      xla_dump_hlo_as_dot
      xla_dump_hlo_as_url
      xla_dump_hlo_as_html
      xla_dump_ir
      xla_dump_hlo_snapshots

    The default is not to dump anything at all, but as soon as some dumping flag is
    specified, we enable the following defaults (most of which can be overridden).

     * dump to stdout (overridden by --xla_dump_to)
     * dump HLO modules at the very beginning and end of the optimization pipeline
     * don't dump between any HLO passes (overridden by --xla_dump_hlo_pass_re)
     * dump all HLO modules (overridden by --xla_dump_hlo_module_re)
     * dump in textual format (overridden by
       --xla_dump_hlo_as_{text,proto,dot,url,html}).

    For example, to dump optimized and unoptimized HLO text and protos to /tmp/foo,
    pass

      --xla_dump_to=/tmp/foo --xla_dump_hlo_as_text --xla_dump_hlo_as_proto

    For details on these flags' meanings, see xla.proto.

    The intent of this change is to make dumping both simpler to use and more
    powerful.

    For example:

     * Previously there was no way to dump the HLO module during the pass pipeline
       in HLO text format; the only option was --xla_dump_per_pass_hlo_proto_to,
       which dumped in proto format.

       Now this is --xla_dump_hlo_pass_re=.* --xla_dump_hlo_as_text.  (In fact, the
       second flag is not necessary in this case, as dumping as text is the
       default.)

     * Previously there was no way to dump HLO as a graph before and after
       compilation; the only option was --xla_generate_hlo_graph, which would dump
       before/after every pass.

       Now this is --xla_dump_hlo_as_{dot,url,html} (depending on what format you
       want the graph in).

     * Previously, there was no coordination between the filenames written by the
       various flags, so info about one module might be dumped with various
       filename prefixes.  Now the filenames are consistent and all dumps from a
       particular module are next to each other.

    If you only specify some of these flags, we try to figure out what you wanted.
    For example:

     * --xla_dump_to implies --xla_dump_hlo_as_text unless you specify some
       other --xla_dump_hlo_as_* flag.

     * --xla_dump_hlo_as_text or --xla_dump_ir implies dumping to stdout unless you
       specify a different --xla_dump_to directory.  You can explicitly dump to
       stdout with --xla_dump_to=-.

    As part of this change, I simplified the debugging code in the HLO passes for
    dumping HLO modules.  Previously, many tests explicitly VLOG'ed the HLO module
    before, after, and sometimes during the pass.  I removed these VLOGs.  If you
    want dumps before/during/after an HLO pass, use --xla_dump_hlo_pass_re=<pass_name>.

--
237510043  by lzc:

    Internal Change.

--
237469515  by Zhichao Lu:

    Parameterize model_builder.build in inputs.py.

--
237293511  by rathodv:

    Remove multiclass_scores from tensor_dict in transform_data_fn always.

--
237260333  by ronnyvotel:

    Updating faster_rcnn_meta_arch to define prediction dictionary fields that are batched.

--

PiperOrigin-RevId: 247226201
parent c4f34e58
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test for object detection's TPU exporter."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl.testing import parameterized
import numpy as np
import tensorflow as tf
from object_detection.tpu_exporters import export_saved_model_tpu_lib
flags = tf.app.flags
FLAGS = flags.FLAGS
def get_path(path_suffix):
return os.path.join(tf.resource_loader.get_data_files_path(), 'testdata',
path_suffix)
class ExportSavedModelTPUTest(tf.test.TestCase, parameterized.TestCase):
@parameterized.named_parameters(
('ssd', get_path('ssd/ssd_pipeline.config'), 'image_tensor', True, 20),
('faster_rcnn',
get_path('faster_rcnn/faster_rcnn_resnet101_atrous_coco.config'),
'image_tensor', True, 20))
def testExportAndLoad(self,
pipeline_config_file,
input_type='image_tensor',
use_bfloat16=False,
repeat=1):
input_placeholder_name = 'placeholder_tensor'
export_dir = os.path.join(FLAGS.test_tmpdir, 'tpu_saved_model')
if tf.gfile.Exists(export_dir):
tf.gfile.DeleteRecursively(export_dir)
ckpt_path = None
export_saved_model_tpu_lib.export(pipeline_config_file, ckpt_path,
export_dir, input_placeholder_name,
input_type, use_bfloat16)
inputs = np.random.rand(256, 256, 3)
tensor_dict_out = export_saved_model_tpu_lib.run_inference_from_saved_model(
inputs, export_dir, input_placeholder_name, repeat)
for k, v in tensor_dict_out.items():
tf.logging.info('{}: {}'.format(k, v))
if __name__ == '__main__':
tf.test.main()
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Python library for faster_rcnn model, tailored for TPU inference."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# pylint: disable=protected-access
import tensorflow as tf
# pylint: disable=g-import-not-at-top
# Checking TF version, because this module relies on TPUPartitionedCall
# in tensorflow.python.tpu, which is not available until TF r1.14.
major, minor, _ = tf.__version__.split('.') # pylint: disable=protected-access
if int(major) < 1 or (int(major == 1) and int(minor) < 14):
raise RuntimeError(
'TensorFlow version >= 1.14 is required. Found ({}).'.format(
tf.__version__))
from tensorflow.python.framework import function
from tensorflow.python.tpu import functional as tpu_functional
from tensorflow.python.tpu.ops import tpu_ops
from object_detection import exporter
from object_detection.builders import model_builder
from object_detection.tpu_exporters import utils

ANCHORS = 'anchors'
BOX_CLASSIFIER_FEATURES = 'box_classifier_features'
BOX_ENCODINGS = 'box_encodings'
CLASS_PREDICTIONS_WITH_BACKGROUND = 'class_predictions_with_background'
IMAGE_SHAPE = 'image_shape'
NUM_PROPOSALS = 'num_proposals'
PROPOSAL_BOXES = 'proposal_boxes'
PROPOSAL_BOXES_NORMALIZED = 'proposal_boxes_normalized'
REFINED_BOX_ENCODINGS = 'refined_box_encodings'
RPN_BOX_ENCODINGS = 'rpn_box_encodings'
RPN_BOX_PREDICTOR_FEATURES = 'rpn_box_predictor_features'
RPN_FEATURES_TO_CROP = 'rpn_features_to_crop'
RPN_OBJECTNESS_PREDICTIONS_WITH_BACKGROUND = (
    'rpn_objectness_predictions_with_background')


def modify_config(pipeline_config):
  """Modifies pipeline config to build the correct graph for TPU."""
  # faster_rcnn.use_static_shapes and faster_rcnn.use_static_shapes_for_eval
  # are set to True in order for detection_model.use_static_shapes to be True.
  # We need to set this so that clip_to_window in _predict_first_stage
  # can work on TPU. However, as a side effect, the flag forces the use of
  # the padded version of NMS.
  pipeline_config.model.faster_rcnn.use_static_shapes = True
  pipeline_config.model.faster_rcnn.use_static_shapes_for_eval = True
  pipeline_config.model.faster_rcnn.use_matmul_crop_and_resize = True
  pipeline_config.model.faster_rcnn.clip_anchors_to_image = True
  return pipeline_config


def get_prediction_tensor_shapes(pipeline_config):
  """Gets static shapes of tensors by building the graph on CPU.

  This function builds the graph on CPU and obtains static shapes of output
  tensors from TPUPartitionedCall. The shape information is later used to set
  the shapes of tensors when the TPU graph is built; this is necessary because
  tensors coming out of TPUPartitionedCall lose their shape information, which
  many downstream CPU operations need.

  Args:
    pipeline_config: A TrainEvalPipelineConfig proto.

  Returns:
    A python dict of tensors' names and their shapes.
  """
  pipeline_config = modify_config(pipeline_config)
  detection_model = model_builder.build(
      pipeline_config.model, is_training=False)
  _, input_tensors = exporter.input_placeholder_fn_map['image_tensor']()
  inputs = tf.to_float(input_tensors)
  preprocessed_inputs, true_image_shapes = detection_model.preprocess(inputs)
  prediction_dict = detection_model.predict(preprocessed_inputs,
                                            true_image_shapes)
  shapes_info = {k: v.shape.as_list() for k, v in prediction_dict.items()}
  return shapes_info


def build_graph(pipeline_config,
                shapes_info,
                input_type='encoded_image_string_tensor',
                use_bfloat16=True):
  """Builds serving graph of faster_rcnn to be exported.

  Args:
    pipeline_config: A TrainEvalPipelineConfig proto.
    shapes_info: A python dict of tensors' names and their shapes, returned by
      `get_prediction_tensor_shapes()`.
    input_type: One of
      'encoded_image_string_tensor': a 1d tensor with dtype=tf.string
      'image_tensor': a 4d tensor with dtype=tf.uint8
      'tf_example': a 1d tensor with dtype=tf.string
    use_bfloat16: If true, use tf.bfloat16 on TPU.

  Returns:
    placeholder_tensor: A placeholder tensor, type determined by `input_type`.
    result_tensor_dict: A python dict of tensors' names and tensors.
  """
  pipeline_config = modify_config(pipeline_config)
  detection_model = model_builder.build(
      pipeline_config.model, is_training=False)
  placeholder_tensor, input_tensors = \
      exporter.input_placeholder_fn_map[input_type]()

  # CPU pre-processing
  inputs = tf.to_float(input_tensors)
  preprocessed_inputs, true_image_shapes = detection_model.preprocess(inputs)
  # Dimshuffle: [b, h, w, c] -> [b, c, h, w]
  preprocessed_inputs = tf.transpose(preprocessed_inputs, perm=[0, 3, 1, 2])
  if use_bfloat16:
    preprocessed_inputs = tf.cast(preprocessed_inputs, dtype=tf.bfloat16)

  # TPU feature extraction
  def tpu_subgraph_first_stage_fn(preprocessed_inputs):
    """Defines the first part of graph on TPU."""
    # [b, c, h, w] -> [b, h, w, c]
    preprocessed_inputs = tf.transpose(preprocessed_inputs, perm=[0, 2, 3, 1])
    prediction_dict = detection_model._predict_first_stage(preprocessed_inputs)
    # [b, h, w, c] -> [b, c, h, w]
    rpn_box_predictor_features = tf.transpose(
        prediction_dict[RPN_BOX_PREDICTOR_FEATURES], perm=[0, 3, 1, 2])
    # [b, h, w, c] -> [b, c, h, w]
    rpn_features_to_crop = tf.transpose(
        prediction_dict[RPN_FEATURES_TO_CROP], perm=[0, 3, 1, 2])
    # [batch, anchor, depth] -> [depth, batch, anchor]
    rpn_box_encodings = tf.transpose(
        prediction_dict[RPN_BOX_ENCODINGS], perm=[2, 0, 1])
    # [batch, anchor, depth] -> [depth, batch, anchor]
    rpn_objectness_predictions_with_background = tf.transpose(
        prediction_dict[RPN_OBJECTNESS_PREDICTIONS_WITH_BACKGROUND],
        perm=[2, 0, 1])
    # [anchors, depth]
    anchors = tf.transpose(prediction_dict[ANCHORS], perm=[1, 0])
    return (rpn_box_predictor_features, rpn_features_to_crop,
            prediction_dict['image_shape'], rpn_box_encodings,
            rpn_objectness_predictions_with_background, anchors)

  @function.Defun(capture_resource_var_by_value=False)
  def tpu_subgraph_first_stage():
    if use_bfloat16:
      with tf.contrib.tpu.bfloat16_scope():
        return tf.contrib.tpu.rewrite(tpu_subgraph_first_stage_fn,
                                      [preprocessed_inputs])
    else:
      return tf.contrib.tpu.rewrite(tpu_subgraph_first_stage_fn,
                                    [preprocessed_inputs])

  (rpn_box_predictor_features, rpn_features_to_crop, image_shape,
   rpn_box_encodings, rpn_objectness_predictions_with_background,
   anchors) = tpu_functional.TPUPartitionedCall(
       args=tpu_subgraph_first_stage.captured_inputs,
       device_ordinal=tpu_ops.tpu_ordinal_selector(),
       Tout=[
           o.type
           for o in tpu_subgraph_first_stage.definition.signature.output_arg
       ],
       f=tpu_subgraph_first_stage)

  prediction_dict = {
      RPN_BOX_PREDICTOR_FEATURES:
          tf.transpose(rpn_box_predictor_features, perm=[0, 2, 3, 1]),
      RPN_FEATURES_TO_CROP:
          tf.transpose(rpn_features_to_crop, perm=[0, 2, 3, 1]),
      IMAGE_SHAPE:
          image_shape,
      RPN_BOX_ENCODINGS:
          tf.transpose(rpn_box_encodings, perm=[1, 2, 0]),
      RPN_OBJECTNESS_PREDICTIONS_WITH_BACKGROUND:
          tf.transpose(
              rpn_objectness_predictions_with_background, perm=[1, 2, 0]),
      ANCHORS:
          tf.transpose(anchors, perm=[1, 0]),
  }
  for k in prediction_dict:
    prediction_dict[k].set_shape(shapes_info[k])

  if use_bfloat16:
    prediction_dict = utils.bfloat16_to_float32_nested(prediction_dict)

  # CPU region proposal (NMS)
  proposal_boxes_normalized, num_proposals = \
      detection_model._proposal_postprocess(
          tf.cast(prediction_dict[RPN_BOX_ENCODINGS], dtype=tf.float32),
          tf.cast(
              prediction_dict[RPN_OBJECTNESS_PREDICTIONS_WITH_BACKGROUND],
              dtype=tf.float32), prediction_dict[ANCHORS],
          prediction_dict[IMAGE_SHAPE], true_image_shapes)
  prediction_dict[NUM_PROPOSALS] = num_proposals

  # [b, h, w, c] -> [b, c, h, w]
  prediction_dict[RPN_FEATURES_TO_CROP] = tf.transpose(
      prediction_dict[RPN_FEATURES_TO_CROP], perm=[0, 3, 1, 2])
  if use_bfloat16:
    prediction_dict[RPN_FEATURES_TO_CROP] = tf.cast(
        prediction_dict[RPN_FEATURES_TO_CROP], dtype=tf.bfloat16)
    proposal_boxes_normalized = tf.cast(
        proposal_boxes_normalized, dtype=tf.bfloat16)

  # TPU box prediction
  def tpu_subgraph_second_stage_fn(rpn_features_to_crop,
                                   proposal_boxes_normalized, image_shape):
    """Defines the second part of graph on TPU."""
    rpn_features_to_crop = tf.transpose(rpn_features_to_crop, perm=[0, 2, 3, 1])
    output_dict = detection_model._box_prediction(
        rpn_features_to_crop, proposal_boxes_normalized, image_shape)
    return [
        output_dict[REFINED_BOX_ENCODINGS],
        output_dict[CLASS_PREDICTIONS_WITH_BACKGROUND],
        output_dict[PROPOSAL_BOXES], output_dict[BOX_CLASSIFIER_FEATURES]
    ]

  @function.Defun(capture_resource_var_by_value=False)
  def tpu_subgraph_second_stage():
    """TPU subgraph 2 wrapper."""
    if use_bfloat16:
      with tf.contrib.tpu.bfloat16_scope():
        return tf.contrib.tpu.rewrite(tpu_subgraph_second_stage_fn, [
            prediction_dict[RPN_FEATURES_TO_CROP],
            proposal_boxes_normalized,
            prediction_dict[IMAGE_SHAPE],
        ])
    else:
      return tf.contrib.tpu.rewrite(tpu_subgraph_second_stage_fn, [
          prediction_dict[RPN_FEATURES_TO_CROP],
          proposal_boxes_normalized,
          prediction_dict[IMAGE_SHAPE],
      ])

  (refined_box_encodings, class_predictions_with_background, proposal_boxes,
   box_classifier_features) = tpu_functional.TPUPartitionedCall(
       args=tpu_subgraph_second_stage.captured_inputs,
       device_ordinal=tpu_ops.tpu_ordinal_selector(),
       Tout=[
           o.type
           for o in tpu_subgraph_second_stage.definition.signature.output_arg
       ],
       f=tpu_subgraph_second_stage)

  prediction_dict[RPN_FEATURES_TO_CROP] = tf.transpose(
      prediction_dict[RPN_FEATURES_TO_CROP], perm=[0, 2, 3, 1])

  prediction_dict_updater = {
      REFINED_BOX_ENCODINGS: refined_box_encodings,
      CLASS_PREDICTIONS_WITH_BACKGROUND: class_predictions_with_background,
      PROPOSAL_BOXES: proposal_boxes,
      BOX_CLASSIFIER_FEATURES: box_classifier_features,
      PROPOSAL_BOXES_NORMALIZED: proposal_boxes_normalized,
  }
  for k in prediction_dict_updater:
    prediction_dict_updater[k].set_shape(shapes_info[k])
  prediction_dict.update(prediction_dict_updater)

  if use_bfloat16:
    prediction_dict = utils.bfloat16_to_float32_nested(prediction_dict)

  # CPU post-processing (NMS)
  postprocessed_tensors = detection_model.postprocess(prediction_dict,
                                                      true_image_shapes)
  result_tensor_dict = exporter.add_output_tensor_nodes(postprocessed_tensors,
                                                        'inference_op')
  return placeholder_tensor, result_tensor_dict
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Python library for ssd model, tailored for TPU inference."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
# pylint: disable=g-import-not-at-top
# Checking TF version, because this module relies on TPUPartitionedCall
# in tensorflow.python.tpu, which is not available until TF r1.14.
major, minor, _ = tf.__version__.split('.') # pylint: disable=protected-access
if int(major) < 1 or (int(major == 1) and int(minor) < 14):
raise RuntimeError(
'TensorFlow version >= 1.14 is required. Found ({}).'.format(
tf.__version__)) # pylint: disable=protected-access
from tensorflow.python.framework import function
from tensorflow.python.tpu import functional as tpu_functional
from tensorflow.python.tpu.ops import tpu_ops
from object_detection import exporter
from object_detection.builders import model_builder
from object_detection.tpu_exporters import utils
ANCHORS = 'anchors'
BOX_ENCODINGS = 'box_encodings'
CLASS_PREDICTIONS_WITH_BACKGROUND = 'class_predictions_with_background'


def get_prediction_tensor_shapes(pipeline_config):
  """Gets static shapes of tensors by building the graph on CPU.

  This function builds the graph on CPU and obtains static shapes of output
  tensors from TPUPartitionedCall. The shape information is later used to set
  the shapes of tensors when the TPU graph is built; this is necessary because
  tensors coming out of TPUPartitionedCall lose their shape information, which
  many downstream CPU operations need.

  Args:
    pipeline_config: A TrainEvalPipelineConfig proto.

  Returns:
    A python dict of tensors' names and their shapes.
  """
  detection_model = model_builder.build(
      pipeline_config.model, is_training=False)
  _, input_tensors = exporter.input_placeholder_fn_map['image_tensor']()
  inputs = tf.to_float(input_tensors)
  preprocessed_inputs, true_image_shapes = detection_model.preprocess(inputs)
  prediction_dict = detection_model.predict(preprocessed_inputs,
                                            true_image_shapes)
  return {
      BOX_ENCODINGS:
          prediction_dict[BOX_ENCODINGS].shape.as_list(),
      CLASS_PREDICTIONS_WITH_BACKGROUND:
          prediction_dict[CLASS_PREDICTIONS_WITH_BACKGROUND].shape.as_list(),
      ANCHORS:
          prediction_dict[ANCHORS].shape.as_list(),
  }


def recover_shape(preprocessed_inputs, prediction_outputs, shapes_info):
  """Recovers shape from TPUPartitionedCall.

  Args:
    preprocessed_inputs: 4D tensor, shaped (batch, channels, height, width)
    prediction_outputs: Python list of tensors, in the following order -
      box_encodings - 3D tensor, shaped (code_size, batch, num_anchors);
      class_predictions_with_background - 3D tensor, shaped (num_classes + 1,
      batch, num_anchors); anchors - 2D tensor, shaped (4, num_anchors)
    shapes_info: Python dict of tensor shapes as lists.

  Returns:
    preprocessed_inputs: 4D tensor, shaped (batch, height, width, channels)
    box_encodings: 3D tensor, shaped (batch, num_anchors, code_size)
    class_predictions_with_background: 3D tensor,
      shaped (batch, num_anchors, num_classes + 1)
    anchors: 2D tensor, shaped (num_anchors, 4)
  """
  # Dimshuffle: (b, c, h, w) -> (b, h, w, c)
  preprocessed_inputs = tf.transpose(preprocessed_inputs, perm=[0, 2, 3, 1])

  box_encodings = tf.transpose(prediction_outputs[0], perm=[1, 2, 0])
  # [None, None, detection_model._box_coder.code_size]
  box_encodings.set_shape(shapes_info[BOX_ENCODINGS])

  class_predictions_with_background = tf.transpose(
      prediction_outputs[1], perm=[1, 2, 0])
  # [None, None, num_classes + 1]
  class_predictions_with_background.set_shape(
      shapes_info[CLASS_PREDICTIONS_WITH_BACKGROUND])

  anchors = tf.transpose(prediction_outputs[2], perm=[1, 0])
  # [None, 4]
  anchors.set_shape(shapes_info[ANCHORS])

  return (preprocessed_inputs, box_encodings, class_predictions_with_background,
          anchors)


def build_graph(pipeline_config,
                shapes_info,
                input_type='encoded_image_string_tensor',
                use_bfloat16=False):
  """Builds TPU serving graph of ssd to be exported.

  Args:
    pipeline_config: A TrainEvalPipelineConfig proto.
    shapes_info: A python dict of tensors' names and their shapes, returned by
      `get_prediction_tensor_shapes()`.
    input_type: One of
      'encoded_image_string_tensor': a 1d tensor with dtype=tf.string
      'image_tensor': a 4d tensor with dtype=tf.uint8
      'tf_example': a 1d tensor with dtype=tf.string
    use_bfloat16: If true, use tf.bfloat16 on TPU.

  Returns:
    placeholder_tensor: A placeholder tensor, type determined by `input_type`.
    result_tensor_dict: A python dict of tensors' names and tensors.
  """
  detection_model = model_builder.build(
      pipeline_config.model, is_training=False)
  placeholder_tensor, input_tensors = \
      exporter.input_placeholder_fn_map[input_type]()
  inputs = tf.to_float(input_tensors)
  preprocessed_inputs, true_image_shapes = detection_model.preprocess(inputs)

  # Dimshuffle: (b, h, w, c) -> (b, c, h, w)
  # This is to avoid extra padding due to TPU memory layout:
  # We swap larger dimensions in and smaller dimensions out, so that small
  # dimensions don't get padded tens / hundreds of times their own size.
  # This trick is applied to other similar tensors below.
  preprocessed_inputs = tf.transpose(preprocessed_inputs, perm=[0, 3, 1, 2])
  if use_bfloat16:
    preprocessed_inputs = tf.cast(preprocessed_inputs, dtype=tf.bfloat16)

  def predict_tpu_subgraph(preprocessed_inputs, true_image_shapes):
    """Wraps over the CPU version of `predict()`.

    This builds the same graph as the original `predict()`, manipulates
    result tensors' dimensions to be memory efficient on TPU, and
    returns them as a list of tensors.

    Args:
      preprocessed_inputs: A 4D tensor of shape (batch, channels, height, width)
      true_image_shapes: True image shapes tensor.

    Returns:
      A Python list of tensors:
        box_encodings: 3D tensor of shape (code_size, batch_size, num_anchors)
        class_predictions_with_background: 3D tensor,
          shape (num_classes + 1, batch_size, num_anchors)
        anchors: 2D tensor of shape (4, num_anchors)
    """
    # Dimshuffle: (b, c, h, w) -> (b, h, w, c)
    preprocessed_inputs = tf.transpose(preprocessed_inputs, perm=[0, 2, 3, 1])
    if use_bfloat16:
      with tf.contrib.tpu.bfloat16_scope():
        prediction_dict = detection_model.predict(preprocessed_inputs,
                                                  true_image_shapes)
    else:
      prediction_dict = detection_model.predict(preprocessed_inputs,
                                                true_image_shapes)
    # Dimshuffle: (batch, anchors, depth) -> (depth, batch, anchors)
    return [
        tf.transpose(prediction_dict[BOX_ENCODINGS], perm=[2, 0, 1]),
        tf.transpose(
            prediction_dict[CLASS_PREDICTIONS_WITH_BACKGROUND], perm=[2, 0, 1]),
        tf.transpose(prediction_dict[ANCHORS], perm=[1, 0]),
    ]

  @function.Defun(capture_resource_var_by_value=False)
  def predict_tpu():
    return tf.contrib.tpu.rewrite(predict_tpu_subgraph,
                                  [preprocessed_inputs, true_image_shapes])

  prediction_outputs = tpu_functional.TPUPartitionedCall(
      args=predict_tpu.captured_inputs,
      device_ordinal=tpu_ops.tpu_ordinal_selector(),
      Tout=[o.type for o in predict_tpu.definition.signature.output_arg],
      f=predict_tpu)

  (preprocessed_inputs, box_encodings, class_predictions_with_background,
   anchors) = recover_shape(preprocessed_inputs, prediction_outputs,
                            shapes_info)

  output_tensors = {
      'preprocessed_inputs': preprocessed_inputs,
      BOX_ENCODINGS: box_encodings,
      CLASS_PREDICTIONS_WITH_BACKGROUND: class_predictions_with_background,
      ANCHORS: anchors,
  }
  if use_bfloat16:
    output_tensors = utils.bfloat16_to_float32_nested(output_tensors)
  postprocessed_tensors = detection_model.postprocess(output_tensors,
                                                      true_image_shapes)
  result_tensor_dict = exporter.add_output_tensor_nodes(postprocessed_tensors,
                                                        'inference_op')
  return placeholder_tensor, result_tensor_dict
# Faster R-CNN with Resnet-101 (v1), Atrous version
# Trained on COCO, initialized from Imagenet classification checkpoint
model {
  faster_rcnn {
    num_classes: 90
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}
model {
  ssd {
    num_classes: 2
    image_resizer {
      fixed_shape_resizer {
        height: 1280
        width: 1280
      }
    }
    feature_extractor {
      type: "ssd_resnet50_v1_fpn"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 0.000399999989895
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.0299999993294
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.996999979019
          center: true
          scale: true
          epsilon: 0.0010000000475
          train: true
        }
      }
      override_base_feature_extractor_hyperparams: true
      fpn {
        min_level: 2
      }
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 0.000399999989895
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.00999999977648
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.996999979019
            scale: true
            epsilon: 0.0010000000475
          }
        }
        depth: 256
        num_layers_before_predictor: 4
        kernel_size: 3
        class_prediction_bias_init: -4.59999990463
      }
    }
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 2
        max_level: 7
        anchor_scale: 3.0
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        scales_per_octave: 2
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 9.99999993923e-09
        iou_threshold: 0.600000023842
        max_detections_per_class: 300
        max_total_detections: 600
        use_static_shapes: true
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.25
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Utilities for TPU inference."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf


def bfloat16_to_float32(tensor):
  """Converts a tensor to tf.float32 only if it is tf.bfloat16."""
  if tensor.dtype == tf.bfloat16:
    return tf.cast(tensor, dtype=tf.float32)
  else:
    return tensor


def bfloat16_to_float32_nested(bfloat16_tensor_dict):
  """Converts bfloat16 tensors in a nested structure to float32.

  Other tensors not of dtype bfloat16 will be left as is.

  Args:
    bfloat16_tensor_dict: A Python dict, values being Tensor or Python
      list/tuple of Tensor.

  Returns:
    A Python dict with the same structure as `bfloat16_tensor_dict`,
    with all bfloat16 tensors converted to float32.
  """
  float32_tensor_dict = {}
  for k, v in bfloat16_tensor_dict.items():
    if isinstance(v, tf.Tensor):
      float32_tensor_dict[k] = bfloat16_to_float32(v)
    elif isinstance(v, (list, tuple)):
      float32_tensor_dict[k] = [bfloat16_to_float32(t) for t in v]
  return float32_tensor_dict
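The same dtype-upcasting walk is easy to exercise outside TensorFlow. Below is a minimal NumPy sketch of the idea, using `float16` as a stand-in for bfloat16 (plain NumPy has no native bfloat16 dtype); the function name is hypothetical, not part of the package:

```python
import numpy as np


def low_precision_to_float32_nested(nested):
  """Upcasts float16 arrays (standing in for bfloat16) to float32.

  Handles a dict whose values are arrays or lists/tuples of arrays,
  mirroring the structure handled by bfloat16_to_float32_nested.
  Other dtypes are left untouched.
  """
  def _convert(arr):
    return arr.astype(np.float32) if arr.dtype == np.float16 else arr

  result = {}
  for key, value in nested.items():
    if isinstance(value, np.ndarray):
      result[key] = _convert(value)
    elif isinstance(value, (list, tuple)):
      result[key] = [_convert(v) for v in value]
  return result


tensors = {
    'a': np.ones([2, 3], dtype=np.float16),
    'b': [np.ones([1], dtype=np.float16), np.ones([1], dtype=np.int32)],
}
converted = low_precision_to_float32_nested(tensors)
```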
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test for Utility functions."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from object_detection.tpu_exporters import utils


class UtilsTest(tf.test.TestCase):

  def testBfloat16ToFloat32(self):
    bfloat16_tensor = tf.random.uniform([2, 3], dtype=tf.bfloat16)
    float32_tensor = utils.bfloat16_to_float32(bfloat16_tensor)
    self.assertEqual(float32_tensor.dtype, tf.float32)

  def testOtherDtypesNotConverted(self):
    int32_tensor = tf.ones([2, 3], dtype=tf.int32)
    converted_tensor = utils.bfloat16_to_float32(int32_tensor)
    self.assertEqual(converted_tensor.dtype, tf.int32)

  def testBfloat16ToFloat32Nested(self):
    tensor_dict = {
        'key1': tf.random.uniform([2, 3], dtype=tf.bfloat16),
        'key2': [
            tf.random.uniform([1, 2], dtype=tf.bfloat16) for _ in range(3)
        ],
        'key3': tf.ones([2, 3], dtype=tf.int32),
    }
    tensor_dict = utils.bfloat16_to_float32_nested(tensor_dict)

    self.assertEqual(tensor_dict['key1'].dtype, tf.float32)
    for t in tensor_dict['key2']:
      self.assertEqual(t.dtype, tf.float32)
    self.assertEqual(tensor_dict['key3'].dtype, tf.int32)


if __name__ == '__main__':
  tf.test.main()
@@ -73,7 +73,9 @@ def get_spatial_image_size(image_resizer_config):
       return [image_resizer_config.keep_aspect_ratio_resizer.max_dimension] * 2
     else:
       return [-1, -1]
-  if image_resizer_config.HasField("identity_resizer"):
+  if image_resizer_config.HasField(
+      "identity_resizer") or image_resizer_config.HasField(
+          "conditional_shape_resizer"):
     return [-1, -1]
   raise ValueError("Unknown image resizer type.")
@@ -856,11 +858,6 @@ def _update_train_steps(configs, train_steps):
   configs["train_config"].num_steps = int(train_steps)


-def _update_eval_steps(configs, eval_steps):
-  """Updates `configs` to reflect new number of eval steps per evaluation."""
-  configs["eval_config"].num_examples = int(eval_steps)
-
-
 def _update_all_eval_input_configs(configs, field, value):
   """Updates the content of `field` with `value` for all eval input configs."""
   for eval_input_config in configs["eval_input_configs"]:
@@ -612,6 +612,12 @@ class ConfigUtilTest(tf.test.TestCase):
     image_shape = config_util.get_spatial_image_size(image_resizer_config)
     self.assertAllEqual(image_shape, [-1, -1])

+  def testGetSpatialImageSizeFromConditionalShapeResizer(self):
+    image_resizer_config = image_resizer_pb2.ImageResizer()
+    image_resizer_config.conditional_shape_resizer.size_threshold = 100
+    image_shape = config_util.get_spatial_image_size(image_resizer_config)
+    self.assertAllEqual(image_shape, [-1, -1])
+
   def testEvalShuffle(self):
     """Tests that `eval_shuffle` keyword arguments are applied correctly."""
     original_shuffle = True
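The resize-only-when-needed behavior described in the commit log (rescale an image only when a dimension exceeds the maximum, preserving aspect ratio) boils down to simple shape arithmetic. A sketch with a hypothetical helper, `conditional_resize_shape`, which is not part of the API:

```python
def conditional_resize_shape(height, width, max_dimension):
  """Returns the (height, width) a resize-to-max-dimension style resizer
  would produce. Hypothetical helper mirroring the described semantics."""
  largest_side = max(height, width)
  # Leave the image alone when it already fits within the threshold.
  if largest_side <= max_dimension:
    return height, width
  # Otherwise scale the larger side down to max_dimension, keeping the
  # aspect ratio.
  scale = max_dimension / largest_side
  return int(round(height * scale)), int(round(width * scale))


small = conditional_resize_shape(600, 800, max_dimension=1024)
large = conditional_resize_shape(1200, 1600, max_dimension=1024)
```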
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Utility functions for manipulating Keras models."""

import tensorflow as tf


def extract_submodel(model, inputs, outputs, name=None):
  """Extracts a section of a Keras model into a new model.

  This method walks an existing model from the specified outputs back to the
  specified inputs in order to construct a new model containing only a portion
  of the old model, while sharing the layers and weights with the original
  model.

  WARNING: This method does not work for submodels containing layers that have
  been used multiple times in the original model, or in other models beyond
  the original model. (E.g. does not work for submodels that contain layers that
  use shared weights). This also means that multiple overlapping submodels
  cannot be extracted from the same model.

  It also relies on recursion and will hit python's recursion limit for large
  submodels.

  Args:
    model: The existing Keras model this method extracts a submodel from.
    inputs: The layer inputs in the existing model that start the submodel
    outputs: The layer outputs in the existing model that should be output by
      the submodel
    name: The name for the extracted model

  Returns:
    The extracted submodel specified by the given inputs and outputs
  """
  output_to_layer = {}
  output_to_layer_input = {}
  for layer in model.layers:
    layer_output = layer.output
    layer_inputs = layer.input
    output_to_layer[layer_output] = layer
    output_to_layer_input[layer_output] = layer_inputs

  model_inputs_dict = {}
  memoized_results = {}

  # Relies on recursion, very low limit in python
  def _recurse_in_model(tensor):
    """Walk the existing model recursively to copy a submodel."""
    if tensor in memoized_results:
      return memoized_results[tensor]
    if (tensor == inputs) or (isinstance(inputs, list) and tensor in inputs):
      if tensor not in model_inputs_dict:
        model_inputs_dict[tensor] = tf.keras.layers.Input(tensor=tensor)
      out = model_inputs_dict[tensor]
    else:
      cur_inputs = output_to_layer_input[tensor]
      cur_layer = output_to_layer[tensor]
      if isinstance(cur_inputs, list):
        out = cur_layer([_recurse_in_model(inp) for inp in cur_inputs])
      else:
        out = cur_layer(_recurse_in_model(cur_inputs))
    memoized_results[tensor] = out
    return out

  if isinstance(outputs, list):
    model_outputs = [_recurse_in_model(tensor) for tensor in outputs]
  else:
    model_outputs = _recurse_in_model(outputs)

  if isinstance(inputs, list):
    model_inputs = [model_inputs_dict[tensor] for tensor in inputs]
  else:
    model_inputs = model_inputs_dict[inputs]

  return tf.keras.Model(inputs=model_inputs, outputs=model_outputs, name=name)
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test utility functions for manipulating Keras models."""

import tensorflow as tf

from object_detection.utils import model_util


class ExtractSubmodelUtilTest(tf.test.TestCase):

  def test_simple_model(self):
    inputs = tf.keras.Input(shape=(256,))  # Returns a placeholder tensor

    # A layer instance is callable on a tensor, and returns a tensor.
    x = tf.keras.layers.Dense(128, activation='relu', name='a')(inputs)
    x = tf.keras.layers.Dense(64, activation='relu', name='b')(x)
    x = tf.keras.layers.Dense(32, activation='relu', name='c')(x)
    x = tf.keras.layers.Dense(16, activation='relu', name='d')(x)
    x = tf.keras.layers.Dense(8, activation='relu', name='e')(x)
    predictions = tf.keras.layers.Dense(10, activation='softmax')(x)
    model = tf.keras.Model(inputs=inputs, outputs=predictions)

    new_in = model.get_layer(name='b').input
    new_out = model.get_layer(name='d').output

    new_model = model_util.extract_submodel(
        model=model,
        inputs=new_in,
        outputs=new_out)

    batch_size = 3
    ones = tf.ones((batch_size, 128))
    final_out = new_model(ones)
    self.assertAllEqual(final_out.shape, (batch_size, 16))


if __name__ == '__main__':
  tf.test.main()
@@ -16,7 +16,6 @@
 """A module for helper tensorflow ops."""
 import collections
 import math

-import numpy as np
 import six
 import tensorflow as tf
@@ -53,17 +52,19 @@ def normalized_to_image_coordinates(normalized_boxes, image_shape,
   """Converts a batch of boxes from normal to image coordinates.

   Args:
-    normalized_boxes: a float32 tensor of shape [None, num_boxes, 4] in
-      normalized coordinates.
-    image_shape: a float32 tensor of shape [4] containing the image shape.
+    normalized_boxes: a tensor of shape [None, num_boxes, 4] in normalized
+      coordinates. The dtype of this tensor must support tf.mul.
+    image_shape: a tensor of shape [4] containing the image shape, with same
+      dtype as `normalized_boxes`.
     parallel_iterations: parallelism for the map_fn op.

   Returns:
-    absolute_boxes: a float32 tensor of shape [None, num_boxes, 4] containing
-      the boxes in image coordinates.
+    absolute_boxes: a tensor of shape [None, num_boxes, 4] containing
+      the boxes in image coordinates, with same dtype as `normalized_boxes`.
   """
-  x_scale = tf.cast(image_shape[2], tf.float32)
-  y_scale = tf.cast(image_shape[1], tf.float32)
+  x_scale = tf.cast(image_shape[2], normalized_boxes.dtype)
+  y_scale = tf.cast(image_shape[1], normalized_boxes.dtype)
   def _to_absolute_coordinates(normalized_boxes):
     y_min, x_min, y_max, x_max = tf.split(
         value=normalized_boxes, num_or_size_splits=4, axis=1)
@@ -77,7 +78,7 @@ def normalized_to_image_coordinates(normalized_boxes, image_shape,
   absolute_boxes = shape_utils.static_or_dynamic_map_fn(
       _to_absolute_coordinates,
       elems=(normalized_boxes),
-      dtype=tf.float32,
+      dtype=normalized_boxes.dtype,
       parallel_iterations=parallel_iterations,
       back_prop=True)
   return absolute_boxes
@@ -881,27 +882,33 @@ def merge_boxes_with_multiple_labels(boxes,
   merged_box_indices = tf.unsorted_segment_min(
       tf.range(num_boxes), unique_indices, num_unique_boxes)
   merged_boxes = tf.gather(boxes, merged_box_indices)
+  unique_indices = tf.to_int64(unique_indices)
+  classes = tf.to_int64(classes)

   def map_box_encodings(i):
     """Produces box K-hot and score encodings for each class index."""
     box_mask = tf.equal(
-        unique_indices, i * tf.ones(num_boxes, dtype=tf.int32))
+        unique_indices, i * tf.ones(num_boxes, dtype=tf.int64))
     box_mask = tf.reshape(box_mask, [-1])
     box_indices = tf.boolean_mask(classes, box_mask)
     box_confidences = tf.boolean_mask(confidences, box_mask)
     box_class_encodings = tf.sparse_to_dense(
-        box_indices, [num_classes], 1, validate_indices=False)
+        box_indices, [num_classes], tf.constant(1, dtype=tf.int64),
+        validate_indices=False)
     box_confidence_encodings = tf.sparse_to_dense(
         box_indices, [num_classes], box_confidences, validate_indices=False)
     return box_class_encodings, box_confidence_encodings

+  # Important to avoid int32 here since there is no GPU kernel for int32.
+  # int64 and float32 are fine.
   class_encodings, confidence_encodings = tf.map_fn(
       map_box_encodings,
-      tf.range(num_unique_boxes),
+      tf.range(tf.to_int64(num_unique_boxes)),
       back_prop=False,
-      dtype=(tf.int32, tf.float32))
+      dtype=(tf.int64, tf.float32))

   merged_boxes = tf.reshape(merged_boxes, [-1, 4])
+  class_encodings = tf.to_int32(class_encodings)
   class_encodings = tf.reshape(class_encodings, [-1, num_classes])
   confidence_encodings = tf.reshape(confidence_encodings, [-1, num_classes])
   merged_box_indices = tf.reshape(merged_box_indices, [-1])
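The merging logic these dtype changes feed into is easier to see outside the graph: duplicate boxes collapse into a single entry whose classes form a K-hot vector with per-class confidences. A simplified NumPy sketch (not the TF implementation; it relies on Python 3.7+ insertion-ordered dicts):

```python
import numpy as np


def merge_boxes_np(boxes, classes, confidences, num_classes):
  """Collapses duplicate boxes into K-hot class and confidence encodings."""
  merged = {}  # insertion-ordered: keyed by box coordinates
  for box, cls, conf in zip(boxes, classes, confidences):
    key = tuple(box)
    if key not in merged:
      merged[key] = (np.zeros(num_classes, dtype=np.int64),
                     np.zeros(num_classes, dtype=np.float32))
    k_hot, confs = merged[key]
    k_hot[cls] = 1      # mark this class as present for the merged box
    confs[cls] = conf   # record its confidence in the matching slot
  merged_boxes = np.array([list(k) for k in merged], dtype=np.float32)
  k_hots = np.stack([v[0] for v in merged.values()])
  confs_out = np.stack([v[1] for v in merged.values()])
  return merged_boxes, k_hots, confs_out


boxes = [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75],
         [0.25, 0.25, 0.75, 0.75]]
merged_boxes, k_hot, confs = merge_boxes_np(
    boxes, classes=[0, 4, 2], confidences=[0.8, 0.2, 0.1], num_classes=5)
```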
@@ -1003,8 +1010,8 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None):
   2) Only XLA supported operations are used (e.g., matrix multiplication).
   3) There is no `box_indices` argument --- to run this op on multiple images,
      one must currently call this op independently on each image.
-  4) All shapes and the `crop_size` parameter are assumed to be statically
-     defined. Moreover, the number of boxes must be strictly nonzero.
+  4) The `crop_size` parameter is assumed to be statically defined.
+     Moreover, the number of boxes must be strictly nonzero.

   Args:
     image: A `Tensor`. Must be one of the following types: `uint8`, `int8`,
@@ -1029,41 +1036,20 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None):

   Returns:
     A 5-D tensor of shape `[batch, num_boxes, crop_height, crop_width, depth]`
-
-  Raises:
-    ValueError: if image tensor does not have shape
-      `[batch, image_height, image_width, depth]` and all dimensions statically
-      defined.
-    ValueError: if boxes tensor does not have shape `[batch, num_boxes, 4]`
-      where num_boxes > 0.
-    ValueError: if crop_size is not a list of two positive integers
   """
-  img_shape = image.shape.as_list()
-  boxes_shape = boxes.shape.as_list()
-  _, img_height, img_width, _ = img_shape
-  if not isinstance(crop_size, list) or len(crop_size) != 2:
-    raise ValueError('`crop_size` must be a list of length 2')
-  dimensions = img_shape + crop_size + boxes_shape
-  if not all([isinstance(dim, int) for dim in dimensions]):
-    raise ValueError('all input shapes must be statically defined')
-  if len(boxes_shape) != 3 or boxes_shape[2] != 4:
-    raise ValueError('`boxes` should have shape `[batch, num_boxes, 4]`')
-  if len(img_shape) != 4:
-    raise ValueError('image should have shape '
-                     '`[batch, image_height, image_width, depth]`')
-  num_crops = boxes_shape[0]
-  if not num_crops > 0:
-    raise ValueError('number of boxes must be > 0')
-  if not (crop_size[0] > 0 and crop_size[1] > 0):
-    raise ValueError('`crop_size` must be a list of two positive integers.')
+  img_shape = tf.shape(image)
+  img_height = img_shape[1]
+  img_width = img_shape[2]

   def _lin_space_weights(num, img_size):
     if num > 1:
-      start_weights = tf.linspace(img_size - 1.0, 0.0, num)
-      stop_weights = img_size - 1 - start_weights
+      start_weights = tf.linspace(tf.to_float(img_size) - 1.0, 0.0, num)
+      stop_weights = tf.to_float(img_size) - 1.0 - start_weights
     else:
-      start_weights = tf.constant(num * [.5 * (img_size - 1)], dtype=tf.float32)
-      stop_weights = tf.constant(num * [.5 * (img_size - 1)], dtype=tf.float32)
+      start_weights = tf.ones([num], dtype=tf.float32) * \
+          .5 * (tf.to_float(img_size) - 1.0)
+      stop_weights = tf.ones([num], dtype=tf.float32) * \
+          .5 * (tf.to_float(img_size) - 1.0)
     return (start_weights, stop_weights)

   with tf.name_scope(scope, 'MatMulCropAndResize'):
@@ -1076,19 +1062,19 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None):
     [y1, x1, y2, x2] = tf.unstack(boxes, axis=2)

     # Pixel centers of input image and grid points along height and width
-    image_idx_h = tf.constant(
-        np.reshape(np.arange(img_height), (1, 1, 1, img_height)),
-        dtype=boxes.dtype)
-    image_idx_w = tf.constant(
-        np.reshape(np.arange(img_width), (1, 1, 1, img_width)),
-        dtype=boxes.dtype)
+    image_idx_h = tf.cast(
+        tf.reshape(tf.range(img_height), (1, 1, 1, img_height)),
+        dtype=boxes.dtype)
+    image_idx_w = tf.cast(
+        tf.reshape(tf.range(img_width), (1, 1, 1, img_width)),
+        dtype=boxes.dtype)
     grid_pos_h = tf.expand_dims(
-        tf.einsum('ab,c->abc', y1, y1_weights) + tf.einsum(
-            'ab,c->abc', y2, y2_weights),
+        tf.einsum('ab,c->abc', y1, y1_weights) +
+        tf.einsum('ab,c->abc', y2, y2_weights),
         axis=3)
     grid_pos_w = tf.expand_dims(
-        tf.einsum('ab,c->abc', x1, x1_weights) + tf.einsum(
-            'ab,c->abc', x2, x2_weights),
+        tf.einsum('ab,c->abc', x1, x1_weights) +
+        tf.einsum('ab,c->abc', x2, x2_weights),
         axis=3)
@@ -1096,7 +1082,8 @@ def matmul_crop_and_resize(image, boxes, crop_size, scope=None):
     kernel_h = tf.nn.relu(1 - tf.abs(image_idx_h - grid_pos_h))
     kernel_w = tf.nn.relu(1 - tf.abs(image_idx_w - grid_pos_w))

-    # Compute matrix multiplication between the spatial dimensions of the image
+    # Compute matrix multiplication between
+    # the spatial dimensions of the image
     # and height-wise kernel using einsum.
     intermediate_image = tf.einsum('abci,aiop->abcop', kernel_h, image)
     # Compute matrix multiplication between the spatial dimensions of the
@@ -1124,6 +1111,58 @@ def native_crop_and_resize(image, boxes, crop_size, scope=None):
   return tf.reshape(cropped_regions, final_shape)
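The kernel trick behind `matmul_crop_and_resize` is easiest to see in one dimension: a tent kernel `relu(1 - |pixel - grid|)` turns bilinear sampling into a matrix product. A simplified 1-D NumPy sketch (not the TF implementation, which builds the grid from start/stop weights):

```python
import numpy as np


def matmul_crop_and_resize_1d(signal, start, stop, crop_size):
  """Bilinearly samples crop_size points in [start, stop] (pixel coords)
  from a 1-D signal using only a matrix multiply."""
  grid = np.linspace(start, stop, crop_size)  # sample positions
  idx = np.arange(len(signal))                # pixel centers
  # Tent kernel: each row holds the bilinear weights for one sample point.
  kernel = np.maximum(0.0, 1.0 - np.abs(idx[None, :] - grid[:, None]))
  return kernel @ signal                      # shape [crop_size]


signal = np.array([0.0, 1.0, 2.0, 3.0])
resampled = matmul_crop_and_resize_1d(signal, start=0.0, stop=3.0, crop_size=3)
# The midpoint 1.5 interpolates between pixels 1 and 2, giving [0, 1.5, 3].
```

The 2-D version in the diff applies the same kernels along height and width via two `einsum` contractions, which keeps the op XLA-compatible.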
def bfloat16_to_float32_nested(tensor_nested):
  """Converts bfloat16 tensors in a nested structure to float32.

  Args:
    tensor_nested: A Tensor, or a Python dict/list/tuple of Tensor.

  Returns:
    A structure with the same layout as `tensor_nested`, with all bfloat16
    tensors converted to float32.
  """
  if isinstance(tensor_nested, tf.Tensor):
    if tensor_nested.dtype == tf.bfloat16:
      return tf.cast(tensor_nested, dtype=tf.float32)
    else:
      return tensor_nested
  elif isinstance(tensor_nested, (list, tuple)):
    out_tensor_dict = [bfloat16_to_float32_nested(t) for t in tensor_nested]
  elif isinstance(tensor_nested, dict):
    out_tensor_dict = {
        k: bfloat16_to_float32_nested(v) for k, v in tensor_nested.items()
    }
  else:
    # Leave any other value (e.g. None, Python scalars) unchanged.
    return tensor_nested
  return out_tensor_dict


def gather_with_padding_values(input_tensor, indices, padding_value):
  """Gathers elements from tensor and pads `padding_value` for ignore indices.

  Gathers elements from `input_tensor` based on `indices`. If there are ignore
  indices (which are "-1"s) in `indices`, `padding_value` will be gathered for
  those positions.

  Args:
    input_tensor: A N-D tensor of shape [M, d_1, d_2 .. d_(N-1)] to gather
      values from.
    indices: A 1-D tensor in which each element is either an index in the
      first dimension of input_tensor or -1.
    padding_value: A (N-1)-D tensor of shape [d_1, d_2 .. d_(N-1)] which will be
      used as gathered value for each ignore index in `indices`.

  Returns:
    gathered_tensor: A tensor of shape [L, d_1, d_2 .. d_(N-1)] containing
      values gathered from input_tensor. The first dimension L is equal to the
      length of `indices`.
  """
  padding_value = tf.expand_dims(padding_value, axis=0)
  input_tensor = tf.concat([padding_value, input_tensor], axis=0)
  gather_indices = indices + 1
  gathered_tensor = tf.gather(input_tensor, gather_indices)
  return gathered_tensor
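The pad-and-shift trick used by `gather_with_padding_values` is compact: prepend the padding row at index 0 and shift every index up by one, so that `-1` lands exactly on the padding row. A NumPy sketch of the same idea:

```python
import numpy as np


def gather_with_padding_values_np(input_array, indices, padding_value):
  """Gathers rows; any index of -1 yields padding_value instead."""
  # Prepend the padding row so it occupies index 0 of the padded array.
  padded = np.concatenate([padding_value[None, :], input_array], axis=0)
  # Shifting indices by +1 maps -1 onto the padding row.
  return padded[np.asarray(indices) + 1]


values = np.array([[0.1, 0.1], [0.2, 0.2]], dtype=np.float32)
gathered = gather_with_padding_values_np(
    values, indices=[1, -1, 0], padding_value=np.zeros(2, dtype=np.float32))
```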
@@ -1223,6 +1223,35 @@ class MergeBoxesWithMultipleLabelsTest(tf.test.TestCase):
     self.assertAllEqual(np_merged_confidences.shape, [0, 5])
     self.assertAllEqual(np_merged_box_indices.shape, [0])

+  def testMergeBoxesWithMultipleLabelsUsesInt64(self):
+    boxes = tf.constant(
+        [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75],
+         [0.25, 0.25, 0.75, 0.75]],
+        dtype=tf.float32)
+    class_indices = tf.constant([0, 4, 2], dtype=tf.int32)
+    class_confidences = tf.constant([0.8, 0.2, 0.1], dtype=tf.float32)
+    num_classes = 5
+    ops.merge_boxes_with_multiple_labels(
+        boxes, class_indices, class_confidences, num_classes)
+    graph = tf.get_default_graph()
+
+    def assert_dtype_is_int64(op_name):
+      op = graph.get_operation_by_name(op_name)
+      self.assertEqual(op.get_attr('dtype'), tf.int64)
+
+    def assert_t_is_int64(op_name):
+      op = graph.get_operation_by_name(op_name)
+      self.assertEqual(op.get_attr('T'), tf.int64)
+
+    assert_dtype_is_int64('map/TensorArray')
+    assert_dtype_is_int64('map/TensorArray_1')
+    assert_dtype_is_int64('map/while/TensorArrayReadV3')
+    assert_t_is_int64('map/while/TensorArrayWrite/TensorArrayWriteV3')
+    assert_t_is_int64(
+        'map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3')
+    assert_dtype_is_int64('map/TensorArrayStack/TensorArrayGatherV3')
+
 class NearestNeighborUpsamplingTest(test_case.TestCase):
@@ -1470,6 +1499,56 @@ class OpsTestCropAndResize(test_case.TestCase):
     self.assertAllClose(crop_output, expected_output)


class TestBfloat16ToFloat32(test_case.TestCase):

  def test_convert_list(self):
    var_list = [
        tf.constant([1.], dtype=tf.bfloat16),
        tf.constant([2], dtype=tf.int32)
    ]
    casted_var_list = ops.bfloat16_to_float32_nested(var_list)
    self.assertEqual(casted_var_list[0].dtype, tf.float32)
    self.assertEqual(casted_var_list[1].dtype, tf.int32)

  def test_convert_tensor_dict(self):
    tensor_dict = {
        'key1': tf.constant([1.], dtype=tf.bfloat16),
        'key2': [
            tf.constant([0.5], dtype=tf.bfloat16),
            tf.constant([7], dtype=tf.int32),
        ],
        'key3': tf.constant([2], dtype=tf.uint8),
    }
    tensor_dict = ops.bfloat16_to_float32_nested(tensor_dict)
    self.assertEqual(tensor_dict['key1'].dtype, tf.float32)
    self.assertEqual(tensor_dict['key2'][0].dtype, tf.float32)
    self.assertEqual(tensor_dict['key2'][1].dtype, tf.int32)
    self.assertEqual(tensor_dict['key3'].dtype, tf.uint8)


class TestGatherWithPaddingValues(test_case.TestCase):

  def test_gather_with_padding_values(self):
    indices = tf.constant([1, -1, 0, -1])
    input_tensor = tf.constant([[0, 0, 0.1, 0.1], [0, 0, 0.2, 0.2]],
                               dtype=tf.float32)
    expected_gathered_tensor = [
        [0, 0, 0.2, 0.2],
        [0, 0, 0, 0],
        [0, 0, 0.1, 0.1],
        [0, 0, 0, 0],
    ]
    gathered_tensor = ops.gather_with_padding_values(
        input_tensor,
        indices=indices,
        padding_value=tf.zeros_like(input_tensor[0]))
    self.assertEqual(gathered_tensor.dtype, tf.float32)
    with self.test_session():
      gathered_tensor_np = gathered_tensor.eval()
    self.assertAllClose(expected_gathered_tensor, gathered_tensor_np)
@@ -21,7 +21,6 @@ The functions do not return a value, instead they modify the image itself.
 """
 import abc
 import collections
-import functools
 # Set headless-friendly backend.
 import matplotlib; matplotlib.use('Agg')  # pylint: disable=multiple-statements
 import matplotlib.pyplot as plt  # pylint: disable=g-import-not-at-top
@@ -65,6 +64,34 @@ STANDARD_COLORS = [
 ]
def _get_multiplier_for_color_randomness():
  """Returns a multiplier to get semi-random colors from successive indices.

  This function computes a prime number, p, in the range [2, 17] that:
  - is closest to len(STANDARD_COLORS) / 10
  - does not divide len(STANDARD_COLORS)

  If no prime numbers in that range satisfy the constraints, p is returned as 1.

  Once p is established, it can be used as a multiplier to select
  non-consecutive colors from STANDARD_COLORS:
  colors = [(p * i) % len(STANDARD_COLORS) for i in range(20)]
  """
  num_colors = len(STANDARD_COLORS)
  prime_candidates = [5, 7, 11, 13, 17]

  # Remove all prime candidates that divide the number of colors.
  prime_candidates = [p for p in prime_candidates if num_colors % p]
  if not prime_candidates:
    return 1

  # Return the closest prime number to num_colors / 10.
  abs_distance = [np.abs(num_colors / 10. - p) for p in prime_candidates]
  num_candidates = len(abs_distance)
  inds = [i for _, i in sorted(zip(abs_distance, range(num_candidates)))]
  return prime_candidates[inds[0]]
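The selection rule can be exercised standalone. The sketch below mirrors the same prime-multiplier logic (using `min` with a key in place of the sort, which agrees except possibly on exact ties) and shows the de-correlated palette indices it produces; the 148-entry palette size is a hypothetical example, not the actual length of STANDARD_COLORS:

```python
def color_multiplier(num_colors):
  """Picks a prime p from {5, 7, 11, 13, 17} that does not divide
  num_colors and is closest to num_colors / 10; returns 1 if none
  qualifies."""
  candidates = [p for p in [5, 7, 11, 13, 17] if num_colors % p]
  if not candidates:
    return 1
  return min(candidates, key=lambda p: abs(num_colors / 10.0 - p))


# With a hypothetical 148-color palette, successive ids map to
# well-separated palette slots instead of adjacent (similar) colors.
p = color_multiplier(148)
indices = [(p * i) % 148 for i in range(5)]
```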
def save_image_array_as_png(image, output_path):
  """Saves an image (represented as a numpy array) to PNG.
@@ -266,46 +293,98 @@ def draw_bounding_boxes_on_image(image,
                                   boxes[i, 3], color, thickness, display_str_list)
def _visualize_boxes(image, boxes, classes, scores, category_index, **kwargs): def create_visualization_fn(category_index, include_masks=False,
return visualize_boxes_and_labels_on_image_array( include_keypoints=False, include_track_ids=False,
image, boxes, classes, scores, category_index=category_index, **kwargs) **kwargs):
"""Constructs a visualization function that can be wrapped in a py_func.
py_funcs only accept positional arguments. This function returns a suitable
function with the correct positional argument mapping. The positional
arguments in order are:
0: image
1: boxes
2: classes
3: scores
[4-6]: masks (optional)
[4-6]: keypoints (optional)
[4-6]: track_ids (optional)
-- Example 1 --
vis_only_masks_fn = create_visualization_fn(category_index,
include_masks=True, include_keypoints=False, include_track_ids=False,
**kwargs)
image = tf.py_func(vis_only_masks_fn,
inp=[image, boxes, classes, scores, masks],
Tout=tf.uint8)
-- Example 2 --
vis_masks_and_track_ids_fn = create_visualization_fn(category_index,
include_masks=True, include_keypoints=False, include_track_ids=True,
**kwargs)
image = tf.py_func(vis_masks_and_track_ids_fn,
inp=[image, boxes, classes, scores, masks, track_ids],
Tout=tf.uint8)
Args:
category_index: a dict that maps integer ids to category dicts. e.g.
{1: {'id': 1, 'name': 'dog'}, 2: {'id': 2, 'name': 'cat'}, ...}
include_masks: Whether masks should be expected as a positional argument in
the returned function.
include_keypoints: Whether keypoints should be expected as a positional
argument in the returned function.
include_track_ids: Whether track ids should be expected as a positional
argument in the returned function.
**kwargs: Additional kwargs that will be passed to
visualize_boxes_and_labels_on_image_array.
Returns:
Returns a function that only takes tensors as positional arguments.
"""
def visualization_py_func_fn(*args):
"""Visualization function that can be wrapped in a tf.py_func.
Args:
*args: First 4 positional arguments must be:
image - uint8 numpy array with shape (img_height, img_width, 3).
boxes - a numpy array of shape [N, 4].
classes - a numpy array of shape [N].
scores - a numpy array of shape [N] or None.
-- Optional positional arguments --
instance_masks - a numpy array of shape [N, image_height, image_width].
keypoints - a numpy array of shape [N, num_keypoints, 2].
track_ids - a numpy array of shape [N] with unique track ids.
Returns:
uint8 numpy array with shape (img_height, img_width, 3) with overlaid
boxes.
"""
image = args[0]
boxes = args[1]
classes = args[2]
scores = args[3]
masks = keypoints = track_ids = None
pos_arg_ptr = 4  # Positional argument for first optional tensor (masks).
if include_masks:
masks = args[pos_arg_ptr]
pos_arg_ptr += 1
if include_keypoints:
keypoints = args[pos_arg_ptr]
pos_arg_ptr += 1
if include_track_ids:
track_ids = args[pos_arg_ptr]
return visualize_boxes_and_labels_on_image_array(
image,
boxes,
classes,
scores,
category_index=category_index,
instance_masks=masks,
keypoints=keypoints,
track_ids=track_ids,
**kwargs)
return visualization_py_func_fn
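Because `tf.py_func` only accepts positional arguments, the closure above walks a pointer through the optional trailing tensors. A minimal TF-free sketch of the same pattern (the name `make_positional_fn` and the returned dict are hypothetical, for illustration only):

```python
def make_positional_fn(include_masks=False, include_keypoints=False,
                       include_track_ids=False):
    """Return a closure that unpacks optional trailing positional args."""
    def fn(*args):
        image, boxes, classes, scores = args[:4]
        masks = keypoints = track_ids = None
        ptr = 4  # index of the first optional positional argument
        if include_masks:
            masks = args[ptr]
            ptr += 1
        if include_keypoints:
            keypoints = args[ptr]
            ptr += 1
        if include_track_ids:
            track_ids = args[ptr]
        return {'masks': masks, 'keypoints': keypoints, 'track_ids': track_ids}
    return fn

# Masks and track ids enabled, keypoints skipped: track ids land at index 5.
vis = make_positional_fn(include_masks=True, include_track_ids=True)
out = vis('img', 'boxes', 'classes', 'scores', 'MASKS', 'TRACKS')
print(out['masks'], out['track_ids'])  # MASKS TRACKS
```

The pointer walk is what lets one factory serve every combination of optional tensors, replacing the four hard-coded `_visualize_boxes*` helpers that the diff removes.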
def _resize_original_image(image, image_shape):
@@ -327,6 +406,7 @@ def draw_bounding_boxes_on_image_tensors(images,
true_image_shape=None,
instance_masks=None,
keypoints=None,
track_ids=None,
max_boxes_to_draw=20,
min_score_thresh=0.2,
use_normalized_coordinates=True):
@@ -350,6 +430,9 @@ def draw_bounding_boxes_on_image_tensors(images,
instance masks.
keypoints: A 4D float32 tensor of shape [N, max_detection, num_keypoints, 2]
with keypoints.
track_ids: [N, max_detections] int32 tensor of unique track ids (i.e.
instance ids for each object). If provided, the color-coding of boxes is
dictated by these ids, and not classes.
max_boxes_to_draw: Maximum number of boxes to draw on an image. Default 20.
min_score_thresh: Minimum score threshold for visualization. Default 0.2.
use_normalized_coordinates: Whether to assume boxes and keypoints are in
@@ -380,40 +463,20 @@ def draw_bounding_boxes_on_image_tensors(images,
else:
original_shapes = original_image_spatial_shape
visualize_boxes_fn = create_visualization_fn(
category_index,
include_masks=instance_masks is not None,
include_keypoints=keypoints is not None,
include_track_ids=track_ids is not None,
**visualization_keyword_args)
elems = [true_shapes, original_shapes, images, boxes, classes, scores]
if instance_masks is not None:
elems.append(instance_masks)
if keypoints is not None:
elems.append(keypoints)
if track_ids is not None:
elems.append(track_ids)
def draw_boxes(image_and_detections):
"""Draws boxes on image."""
@@ -627,6 +690,7 @@ def visualize_boxes_and_labels_on_image_array(
instance_masks=None,
instance_boundaries=None,
keypoints=None,
track_ids=None,
use_normalized_coordinates=False,
max_boxes_to_draw=20,
min_score_thresh=.5,
@@ -634,7 +698,8 @@ def visualize_boxes_and_labels_on_image_array(
line_thickness=4,
groundtruth_box_visualization_color='black',
skip_scores=False,
skip_labels=False,
skip_track_ids=False):
"""Overlay labeled boxes on an image with formatted scores and label names. """Overlay labeled boxes on an image with formatted scores and label names.
This function groups boxes that correspond to the same location This function groups boxes that correspond to the same location
...@@ -658,6 +723,9 @@ def visualize_boxes_and_labels_on_image_array( ...@@ -658,6 +723,9 @@ def visualize_boxes_and_labels_on_image_array(
with values ranging between 0 and 1, can be None. with values ranging between 0 and 1, can be None.
keypoints: a numpy array of shape [N, num_keypoints, 2], can keypoints: a numpy array of shape [N, num_keypoints, 2], can
be None be None
track_ids: a numpy array of shape [N] with unique track ids. If provided,
color-coding of boxes will be determined by these ids, and not the class
indices.
use_normalized_coordinates: whether boxes is to be interpreted as
normalized coordinates or not.
max_boxes_to_draw: maximum number of boxes to visualize. If None, draw
@@ -671,6 +739,7 @@ def visualize_boxes_and_labels_on_image_array(
boxes
skip_scores: whether to skip score when drawing a single detection
skip_labels: whether to skip label when drawing a single detection
skip_track_ids: whether to skip track id when drawing a single detection
Returns:
uint8 numpy array with shape (img_height, img_width, 3) with overlaid boxes.
@@ -682,6 +751,7 @@ def visualize_boxes_and_labels_on_image_array(
box_to_instance_masks_map = {}
box_to_instance_boundaries_map = {}
box_to_keypoints_map = collections.defaultdict(list)
box_to_track_ids_map = {}
if not max_boxes_to_draw:
max_boxes_to_draw = boxes.shape[0]
for i in range(min(max_boxes_to_draw, boxes.shape[0])):
@@ -693,6 +763,8 @@ def visualize_boxes_and_labels_on_image_array(
box_to_instance_boundaries_map[box] = instance_boundaries[i]
if keypoints is not None:
box_to_keypoints_map[box].extend(keypoints[i])
if track_ids is not None:
box_to_track_ids_map[box] = track_ids[i]
if scores is None:
box_to_color_map[box] = groundtruth_box_visualization_color
else:
@@ -709,9 +781,18 @@ def visualize_boxes_and_labels_on_image_array(
display_str = '{}%'.format(int(100*scores[i]))
else:
display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
if not skip_track_ids and track_ids is not None:
if not display_str:
display_str = 'ID {}'.format(track_ids[i])
else:
display_str = '{}: ID {}'.format(display_str, track_ids[i])
box_to_display_str_map[box].append(display_str)
if agnostic_mode:
box_to_color_map[box] = 'DarkOrange'
elif track_ids is not None:
prime_multiplier = _get_multiplier_for_color_randomness()
box_to_color_map[box] = STANDARD_COLORS[
(prime_multiplier * track_ids[i]) % len(STANDARD_COLORS)]
else:
box_to_color_map[box] = STANDARD_COLORS[
classes[i] % len(STANDARD_COLORS)]
...
@@ -29,6 +29,30 @@ _TESTDATA_PATH = 'object_detection/test_images'
class VisualizationUtilsTest(tf.test.TestCase):
def test_get_prime_multiplier_for_color_randomness(self):
# Show that the default multiplier is not 1 and does not divide the total
# number of standard colors.
multiplier = visualization_utils._get_multiplier_for_color_randomness()
self.assertNotEqual(
0, multiplier % len(visualization_utils.STANDARD_COLORS))
self.assertNotEqual(1, multiplier)
# Show that with 34 colors, the closest prime number to 34/10 that
# satisfies the constraints is 5.
visualization_utils.STANDARD_COLORS = [
'color_{}'.format(str(i)) for i in range(34)
]
multiplier = visualization_utils._get_multiplier_for_color_randomness()
self.assertEqual(5, multiplier)
# Show that with 110 colors, the closest prime number to 110/10 that
# satisfies the constraints is 13 (since 11 equally divides 110).
visualization_utils.STANDARD_COLORS = [
'color_{}'.format(str(i)) for i in range(110)
]
multiplier = visualization_utils._get_multiplier_for_color_randomness()
self.assertEqual(13, multiplier)
def create_colorful_test_image(self):
"""This function creates an image that can be used to test vis functions.
@@ -158,6 +182,55 @@ class VisualizationUtilsTest(tf.test.TestCase):
image_pil = Image.fromarray(images_with_boxes_np[i, ...])
image_pil.save(output_file)
def test_draw_bounding_boxes_on_image_tensors_with_track_ids(self):
"""Tests that bounding box utility produces reasonable results."""
category_index = {1: {'id': 1, 'name': 'dog'}, 2: {'id': 2, 'name': 'cat'}}
fname = os.path.join(_TESTDATA_PATH, 'image1.jpg')
image_np = np.array(Image.open(fname))
images_np = np.stack((image_np, image_np), axis=0)
original_image_shape = [[636, 512], [636, 512]]
with tf.Graph().as_default():
images_tensor = tf.constant(value=images_np, dtype=tf.uint8)
image_shape = tf.constant(original_image_shape, dtype=tf.int32)
boxes = tf.constant([[[0.4, 0.25, 0.75, 0.75],
[0.5, 0.3, 0.7, 0.9],
[0.7, 0.5, 0.8, 0.9]],
[[0.41, 0.25, 0.75, 0.75],
[0.51, 0.3, 0.7, 0.9],
[0.75, 0.5, 0.8, 0.9]]])
classes = tf.constant([[1, 1, 2], [1, 1, 2]], dtype=tf.int64)
scores = tf.constant([[0.8, 0.5, 0.7], [0.6, 0.5, 0.8]])
track_ids = tf.constant([[3, 9, 7], [3, 9, 144]], dtype=tf.int32)
images_with_boxes = (
visualization_utils.draw_bounding_boxes_on_image_tensors(
images_tensor,
boxes,
classes,
scores,
category_index,
original_image_spatial_shape=image_shape,
true_image_shape=image_shape,
track_ids=track_ids,
min_score_thresh=0.2))
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
# Write output images for visualization.
images_with_boxes_np = sess.run(images_with_boxes)
self.assertEqual(images_np.shape[0], images_with_boxes_np.shape[0])
self.assertEqual(images_np.shape[3], images_with_boxes_np.shape[3])
self.assertEqual(
tuple(original_image_shape[0]), images_with_boxes_np.shape[1:3])
for i in range(images_with_boxes_np.shape[0]):
img_name = 'image_with_track_ids_' + str(i) + '.png'
output_file = os.path.join(self.get_temp_dir(), img_name)
logging.info('Writing output image %d to %s', i, output_file)
image_pil = Image.fromarray(images_with_boxes_np[i, ...])
image_pil.save(output_file)
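The point of track-id color coding, which this test exercises with ids 3, 9, 7, and 144, is stability: the same id maps to the same palette entry in every frame, while the prime multiplier spreads consecutive ids across non-adjacent colors. A toy illustration (the 34-entry palette, multiplier 5, and `color_for_track` are all hypothetical stand-ins for the library's logic):

```python
STANDARD_COLORS = ['color_{}'.format(i) for i in range(34)]  # toy palette
PRIME = 5  # multiplier so successive ids land on non-adjacent colors

def color_for_track(track_id):
    """Map a track id to a stable, well-spread palette entry."""
    return STANDARD_COLORS[(PRIME * track_id) % len(STANDARD_COLORS)]

# Same id -> same color in every frame; nearby ids -> distinct colors.
print(color_for_track(3), color_for_track(3))    # color_15 color_15
print(color_for_track(9), color_for_track(144))  # color_11 color_6
```

Class-based coloring, by contrast, would give every 'dog' box the same color, making individual tracks indistinguishable.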
def test_draw_bounding_boxes_on_image_tensors_with_additional_channels(self):
"""Tests the case where input image tensor has more than 3 channels."""
category_index = {1: {'id': 1, 'name': 'dog'}}
...