Unverified Commit 8518d053 authored by pkulzc and committed by GitHub

Open source MnasFPN and minor fixes to OD API (#8484)

310447280  by lzc:

    Internal change

310420845  by Zhichao Lu:

    Open source the internal Context RCNN code.

--
310362339  by Zhichao Lu:

    Internal change

310259448  by lzc:

    Update required TF version for OD API.

--
310252159  by Zhichao Lu:

    Port patch_ops_test to TF1/TF2 and TPUs.

--
310247180  by Zhichao Lu:

    Ignore keypoint heatmap loss in the regions/bounding boxes with target keypoint
    class but no valid keypoint annotations.
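
    For illustration only (the tensor names below are assumptions, not the
    actual target assigner code), a minimal sketch of how such a mask can be
    derived from keypoint visibilities:

```python
import tensorflow as tf

def per_instance_heatmap_loss_weights(keypoint_visibilities):
  """keypoint_visibilities: [num_instances, num_keypoints] bool tensor.

  Returns a [num_instances] float32 weight that is 0 for instances with no
  valid keypoint annotation, so their heatmap loss contribution is ignored.
  """
  has_valid_keypoint = tf.reduce_any(keypoint_visibilities, axis=1)
  return tf.cast(has_valid_keypoint, tf.float32)

vis = tf.constant([[True, False], [False, False]])
print(per_instance_heatmap_loss_weights(vis))  # [1. 0.]
```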

--
310178294  by Zhichao Lu:

    Opensource MnasFPN
    https://arxiv.org/abs/1912.01106

--
310094222  by lzc:

    Internal changes.

--
310085250  by lzc:

    Internal Change.

--
310016447  by huizhongc:

    Remove unrecognized classes from labeled_classes.

--
310009470  by rathodv:

    Mark batcher.py as TF1 only.

--
310001984  by rathodv:

    Update core/preprocessor.py to be compatible with TF1/TF2.

--
309455035  by Zhichao Lu:

    Makes the freezable_batch_norm_test run w/ v2 behavior.

    The main change is that in v2, batch norm updates happen right away when running in training mode, so we need to restore the weights between batch norm calls to make sure the numerical checks all start from the same place.
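
    A minimal sketch (illustrative, not the test itself) of the v2 behavior
    being worked around: a Keras BatchNormalization layer called with
    training=True updates its moving statistics immediately, so weights are
    restored between calls.

```python
import numpy as np
import tensorflow as tf  # assumes TF 2.x eager behavior

bn = tf.keras.layers.BatchNormalization(momentum=0.9)
x = tf.constant(np.random.randn(4, 3).astype(np.float32) + 1.0)

bn(x, training=False)            # builds the layer; no statistics update
initial_weights = bn.get_weights()

bn(x, training=True)             # moving mean/variance update right away
moved = any(not np.allclose(a, b)
            for a, b in zip(initial_weights, bn.get_weights()))
print(moved)                     # True: statistics changed after one call

# Restore the weights so the next numerical check starts from the same place.
bn.set_weights(initial_weights)
```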

--
309425881  by Zhichao Lu:

    Make TF1/TF2 optimizer builder tests explicit.

--
309408646  by Zhichao Lu:

    Make dataset builder tests TF1 and TF2 compatible.

--
309246305  by Zhichao Lu:

    Added the functionality of combining the person keypoints and object detection
    annotations in the binary that converts the COCO raw data to TfRecord.

--
309125076  by Zhichao Lu:

    Convert target_assigner_utils to TF1/TF2.

--
308966359  by huizhongc:

    Support SSD training with partially labeled groundtruth.

--
308937159  by rathodv:

    Update core/target_assigner.py to be compatible with TF1/TF2.

--
308774302  by Zhichao Lu:

    Internal

--
308732860  by rathodv:

    Make core/prefetcher.py  compatible with TF1 only.

--
308726984  by rathodv:

    Update core/multiclass_nms_test.py to be TF1/TF2 compatible.

--
308714718  by rathodv:

    Update core/region_similarity_calculator_test.py to be TF1/TF2 compatible.

--
308707960  by rathodv:

    Update core/minibatch_sampler_test.py to be TF1/TF2 compatible.

--
308700595  by rathodv:

    Update core/losses_test.py to be TF1/TF2 compatible and remove losses_test_v2.py

--
308361472  by rathodv:

    Update core/matcher_test.py to be TF1/TF2 compatible.

--
308335846  by Zhichao Lu:

    Updated the COCO evaluation logic and propagated the groundtruth area
    information through. This change matches the groundtruth format expected by
    the COCO keypoint evaluation.

--
308256924  by rathodv:

    Update core/keypoints_ops_test.py to be TF1/TF2 compatible.

--
308256826  by rathodv:

    Update class_agnostic_nms_test.py to be TF1/TF2 compatible.

--
308256112  by rathodv:

    Update box_list_ops_test.py to be TF1/TF2 compatible.

--
308159360  by Zhichao Lu:

    Internal change

308145008  by Zhichao Lu:

    Added 'image/class/confidence' field in the TFExample decoder.

--
307651875  by rathodv:

    Refactor core/box_list.py to support TF1/TF2.

--
307651798  by rathodv:

    Modify box_coder.py base class to work with TF1/TF2

--
307651652  by rathodv:

    Refactor core/balanced_positive_negative_sampler.py to support TF1/TF2.

--
307651571  by rathodv:

    Modify BoxCoders tests to use test_case:execute method to allow testing with TF1.X and TF2.X

--
307651480  by rathodv:

    Modify Matcher tests to use test_case:execute method to allow testing with TF1.X and TF2.X

--
307651409  by rathodv:

    Modify AnchorGenerator tests to use test_case:execute method to allow testing with TF1.X and TF2.X

--
307651314  by rathodv:

    Refactor model_builder to support TF1 or TF2 models based on TensorFlow version.

--
307092053  by Zhichao Lu:

    Use manager to save checkpoint.

--
307071352  by ronnyvotel:

    Fixing keypoint visibilities. Now by default, the visibility is marked True if the keypoint is labeled (regardless of whether it is visible or not).
    Also, if visibilities are not present in the dataset, they will be created based on whether the keypoint coordinates are finite (vis = True) or NaN (vis = False).
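
    A minimal sketch of the visibility-from-coordinates rule described above
    (the helper name is illustrative, not the actual utility):

```python
import tensorflow as tf

def visibilities_from_coordinates(keypoints):
  """keypoints: [num_instances, num_keypoints, 2] float32, NaN = unlabeled."""
  # A keypoint is treated as visible iff both of its coordinates are finite.
  finite = tf.math.is_finite(keypoints)
  return tf.reduce_all(finite, axis=-1)

kpts = tf.constant([[[0.2, 0.3], [float('nan'), float('nan')]]])
print(visibilities_from_coordinates(kpts))  # [[ True False]]
```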

--
307069557  by Zhichao Lu:

    Internal change to add a few fields related to postprocessing parameters in
    center_net.proto and pass those parameters to the keypoint postprocessing
    functions.

--
307012091  by Zhichao Lu:

    Make Adam Optimizer's epsilon proto configurable.

    Potential issue: tf.compat.v1's AdamOptimizer has a default epsilon of 1e-08 ([doc-link](https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/AdamOptimizer)), whereas tf.keras's Adam optimizer has a default epsilon of 1e-07 ([doc-link](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam)).
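
    For illustration (not part of the change itself), the new proto field makes
    it possible to set epsilon explicitly so the Keras optimizer matches the
    tf.compat.v1 default:

```python
import tensorflow as tf

# tf.compat.v1.train.AdamOptimizer defaults to epsilon=1e-08, while
# tf.keras.optimizers.Adam defaults to epsilon=1e-07; passing epsilon
# explicitly keeps the two configurations numerically aligned.
keras_adam = tf.keras.optimizers.Adam(learning_rate=1e-3, epsilon=1e-08)
print(keras_adam.get_config()['epsilon'])  # 1e-08
```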

--
306858598  by Zhichao Lu:

    Internal changes to update the CenterNet model:
    1) Modified eval job loss computation to avoid averaging over batches with zero loss.
    2) Updated CenterNet keypoint heatmap target assigner to apply box size to the heatmap Gaussian standard deviation.
    3) Updated the CenterNet meta arch keypoint losses computation to apply weights outside of loss function.

--
306731223  by jonathanhuang:

    Internal change.

--
306549183  by rathodv:

    Internal Update.

--
306542930  by rathodv:

    Internal Update

--
306322697  by rathodv:

    Internal.

--
305345036  by Zhichao Lu:

    Adding COCO Camera Traps Json to tf.Example beam code

--
304104869  by lzc:

    Internal changes.

--
304068971  by jonathanhuang:

    Internal change.

--
304050469  by Zhichao Lu:

    Internal change.

--
303880642  by huizhongc:

    Support parsing partially labeled groundtruth.

--
303841743  by Zhichao Lu:

    Deprecate nms_on_host in SSDMetaArch.

--
303803204  by rathodv:

    Internal change.

--
303793895  by jonathanhuang:

    Internal change.

--
303467631  by rathodv:

    Py3 update for detection inference test.

--
303444542  by rathodv:

    Py3 update to metrics module

--
303421960  by rathodv:

    Update json_utils to python3.

--
302787583  by ronnyvotel:

    Coco results generator for submission to the coco test server.

--
302719091  by Zhichao Lu:

    Internal change to add the ResNet50 image feature extractor for CenterNet model.

--
302116230  by Zhichao Lu:

    Added the functions to overlay the heatmaps with images in visualization util
    library.

--
301888316  by Zhichao Lu:

    Fix checkpoint_filepath not defined error.

--
301840312  by ronnyvotel:

    Adding keypoint_scores to visualizations.

--
301683475  by ronnyvotel:

    Introducing the ability to preprocess `keypoint_visibilities`.

    Some data augmentation ops such as random crop can filter instances and keypoints. It's important to also filter keypoint visibilities, so that the groundtruth tensors are always in alignment.
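
    A simplified sketch (assumed names, not the actual preprocessor code) of
    keeping keypoint visibilities aligned when an augmentation op keeps only a
    subset of instances:

```python
import tensorflow as tf

def filter_instances(boxes, keypoints, keypoint_visibilities, keep_indices):
  """Gathers all per-instance groundtruth tensors with the same indices.

  boxes: [num_instances, 4], keypoints: [num_instances, num_kpts, 2],
  keypoint_visibilities: [num_instances, num_kpts] bool.
  """
  return (tf.gather(boxes, keep_indices),
          tf.gather(keypoints, keep_indices),
          tf.gather(keypoint_visibilities, keep_indices))

boxes = tf.zeros([3, 4])
kpts = tf.zeros([3, 17, 2])
vis = tf.ones([3, 17], dtype=tf.bool)
b, k, v = filter_instances(boxes, kpts, vis, keep_indices=[0, 2])
print(b.shape, k.shape, v.shape)  # (2, 4) (2, 17, 2) (2, 17)
```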

--
301532344  by Zhichao Lu:

    Don't use tf.divide since "Quantization not yet supported for op: DIV"

--
301480348  by ronnyvotel:

    Introducing keypoint evaluation into model lib v2.
    Also, making some fixes to coco keypoint evaluation.

--
301454018  by Zhichao Lu:

    Added the image summary to visualize the train/eval input images and eval's
    prediction/groundtruth side-by-side image.

--
301317527  by Zhichao Lu:

    Updated the random_absolute_pad_image function in the preprocessor library to
    support the keypoints argument.

--
301300324  by Zhichao Lu:

    Apply name change(experimental_run_v2 -> run) for all callers in Tensorflow.

--
301297115  by ronnyvotel:

    Utility function for setting keypoint visibilities based on keypoint coordinates.

--
301248885  by Zhichao Lu:

    Allow MultiWorkerMirroredStrategy (MWMS) use by adding checkpoint handling with temporary directories in model_lib_v2. Added missing WeakKeyDictionary cfer_fn_cache field in CollectiveAllReduceStrategyExtended.

--
301224559  by Zhichao Lu:

    1) Fixes model_lib to also use keypoints while preparing model groundtruth.
    2) Tests model_lib with newly added keypoint metrics config.

--
300836556  by Zhichao Lu:

    Internal changes to add keypoint estimation parameters in CenterNet proto.

--
300795208  by Zhichao Lu:

    Updated the eval_util library to populate the keypoint groundtruth to
    eval_dict.

--
299474766  by Zhichao Lu:

    Modifies eval_util to create Keypoint Evaluator objects when configured in eval config.

--
299453920  by Zhichao Lu:

    Add swish activation as a hyperparams option.

--
299240093  by ronnyvotel:

    Keypoint postprocessing for CenterNetMetaArch.

--
299176395  by Zhichao Lu:

    Internal change.

--
299135608  by Zhichao Lu:

    Internal changes to refactor the CenterNet model in preparation for keypoint estimation tasks.

--
298915482  by Zhichao Lu:

    Make dataset_builder aware of input_context for distributed training.

--
298713595  by Zhichao Lu:

    Handling data with negative size boxes.

--
298695964  by Zhichao Lu:

    Expose change_coordinate_frame as a config parameter; fix multiclass_scores optional field.

--
298492150  by Zhichao Lu:

    Rename optimizer_builder_test_v2.py -> optimizer_builder_v2_test.py

--
298476471  by Zhichao Lu:

    Internal changes to support CenterNet keypoint estimation.

--
298365851  by ronnyvotel:

    Fixing a bug where groundtruth_keypoint_weights were being padded with a dynamic dimension.

--
297843700  by Zhichao Lu:

    Internal change.

--
297706988  by lzc:

    Internal change.

--
297705287  by ronnyvotel:

    Creating the "snapping" behavior in CenterNet, where regressed keypoints are refined with updated candidate keypoints from a heatmap.
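
    A rough sketch of the snapping idea under simplifying assumptions (a single
    instance, candidates already extracted from the heatmap; this is not the
    CenterNet implementation): each regressed keypoint is replaced by its
    nearest candidate of the same keypoint type when one is close enough.

```python
import tensorflow as tf

def snap_to_candidates(regressed, candidates, max_distance=0.05):
  """regressed: [num_kpts, 2]; candidates: [num_kpts, num_cand, 2]."""
  # Distance from each regressed keypoint to each candidate of its type.
  dists = tf.norm(candidates - regressed[:, tf.newaxis, :], axis=-1)
  nearest = tf.argmin(dists, axis=-1)                       # [num_kpts]
  nearest_dist = tf.reduce_min(dists, axis=-1)              # [num_kpts]
  snapped = tf.gather(candidates, nearest, axis=1, batch_dims=1)
  # Keep the regressed location when no candidate is close enough.
  keep_regressed = nearest_dist > max_distance
  return tf.where(keep_regressed[:, tf.newaxis], regressed, snapped)

regressed = tf.constant([[0.50, 0.50], [0.20, 0.20]])
candidates = tf.constant([[[0.52, 0.49], [0.90, 0.90]],
                          [[0.70, 0.70], [0.75, 0.75]]])
print(snap_to_candidates(regressed, candidates))
# First keypoint snaps to (0.52, 0.49); second keeps its regressed location.
```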

--
297700447  by Zhichao Lu:

    Improve checkpoint checking logic with TF2 loop.

--
297686094  by Zhichao Lu:

    Convert "import tensorflow as tf" to "import tensorflow.compat.v1".

--
297670468  by lzc:

    Internal change.

--
297241327  by Zhichao Lu:

    Convert "import tensorflow as tf" to "import tensorflow.compat.v1".

--
297205959  by Zhichao Lu:

    Internal changes to refactor the CenterNet object detection target assigner into a separate library.

--
297143806  by Zhichao Lu:

    Convert "import tensorflow as tf" to "import tensorflow.compat.v1".

--
297129625  by Zhichao Lu:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297117070  by Zhichao Lu:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297030190  by Zhichao Lu:

    Add configuration options for visualizing keypoint edges

--
296359649  by Zhichao Lu:

    Support DepthwiseConv2dNative (of separable conv) in weight equalization loss.

--
296290582  by Zhichao Lu:

    Internal change.

--
296093857  by Zhichao Lu:

    Internal changes to add general target assigner utilities.

--
295975116  by Zhichao Lu:

    Fix visualize_boxes_and_labels_on_image_array to show max_boxes_to_draw correctly.

--
295819711  by Zhichao Lu:

    Adds a flag to visualize_boxes_and_labels_on_image_array to skip the drawing of axis aligned bounding boxes.

--
295811929  by Zhichao Lu:

    Keypoint support in random_square_crop_by_scale.

--
295788458  by rathodv:

    Remove unused checkpoint to reduce repo size on github

--
295787184  by Zhichao Lu:

    Enable visualization of edges between keypoints

--
295763508  by Zhichao Lu:

    [Context RCNN] Add an option to enable/disable the cropping feature in the
    post-process step in the meta architecture.

--
295605344  by Zhichao Lu:

    internal change.

--
294926050  by ronnyvotel:

    Adding per-keypoint groundtruth weights. These weights are intended to be used as multipliers in a keypoint loss function.

    Groundtruth keypoint weights are constructed as follows:
    - Initialize the weight for each keypoint type based on user-specified weights in the input_reader proto
    - Mask out (i.e. make zero) all keypoint weights that are not visible.
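
    The actual helper is keypoint_ops.keypoint_weights_from_visibilities; the
    following is a simplified sketch of the two-step construction above:

```python
import tensorflow as tf

def keypoint_weights_from_visibilities(visibilities, per_type_weights):
  """visibilities: [num_instances, num_kpts] bool;
  per_type_weights: list of num_kpts floats from the input_reader proto."""
  type_weights = tf.constant(per_type_weights, dtype=tf.float32)
  # Broadcast per-type weights over instances, then zero invisible keypoints.
  return tf.where(visibilities,
                  tf.ones_like(visibilities, tf.float32) * type_weights,
                  tf.zeros_like(visibilities, tf.float32))

vis = tf.constant([[True, False, True]])
print(keypoint_weights_from_visibilities(vis, [1.0, 1.0, 0.5]))
# [[1.  0.  0.5]]
```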

--
294829061  by lzc:

    Internal change.

--
294566503  by Zhichao Lu:

    Changed internal CenterNet Model configuration.

--
294346662  by ronnyvotel:

    Use NaN values for keypoint coordinates that are not visible.

--
294333339  by Zhichao Lu:

    Change experimental_distribute_dataset -> experimental_distribute_dataset_from_function

--
293928752  by Zhichao Lu:

    Internal change

--
293909384  by Zhichao Lu:

    Add capabilities to train 1024x1024 CenterNet models.

--
293637554  by ronnyvotel:

    Adding keypoint visibilities to TfExampleDecoder.

--
293501558  by lzc:

    Internal change.

--
293252851  by Zhichao Lu:

    Change tf.gfile.GFile to tf.io.gfile.GFile.

--
292730217  by Zhichao Lu:

    Internal change.

--
292456563  by lzc:

    Internal changes.

--
292355612  by Zhichao Lu:

    Use tf.gather and tf.scatter_nd instead of matrix ops.
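
    For context (illustrative only, not the changed code), selecting and
    re-placing rows with tf.gather / tf.scatter_nd avoids building dense
    one-hot or selection matrices and multiplying:

```python
import tensorflow as tf

values = tf.constant([[1., 2.], [3., 4.], [5., 6.]])
indices = tf.constant([2, 0])

# Instead of one_hot(indices) @ values, gather the rows directly.
selected = tf.gather(values, indices)            # [[5., 6.], [1., 2.]]

# Scatter them back into a larger tensor by index; unwritten rows stay zero.
scattered = tf.scatter_nd(indices[:, tf.newaxis], selected, shape=[3, 2])
print(selected.numpy())    # [[5. 6.] [1. 2.]]
print(scattered.numpy())   # [[1. 2.] [0. 0.] [5. 6.]]
```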

--
292245265  by rathodv:

    Internal

--
291989323  by richardmunoz:

    Refactor out building a DataDecoder from building a tf.data.Dataset.

--
291950147  by Zhichao Lu:

    Flip bounding boxes in arbitrary shaped tensors.
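
    A hedged sketch (not the exact library code) of horizontally flipping
    normalized [ymin, xmin, ymax, xmax] boxes stored with any leading shape,
    e.g. [num_boxes, 4] or [batch, num_boxes, 4]:

```python
import tensorflow as tf

def flip_boxes_left_right(boxes):
  """boxes: [..., 4] normalized [ymin, xmin, ymax, xmax]."""
  ymin, xmin, ymax, xmax = tf.unstack(boxes, axis=-1)
  return tf.stack([ymin, 1.0 - xmax, ymax, 1.0 - xmin], axis=-1)

boxes = tf.constant([[[0.1, 0.2, 0.5, 0.6]]])   # shape [1, 1, 4]
print(flip_boxes_left_right(boxes))             # [[[0.1 0.4 0.5 0.8]]]
```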

--
291401052  by huizhongc:

    Fix multiscale grid anchor generator to allow fully convolutional inference. When exporting a model with identity_resizer as the image_resizer, there is an incorrect box offset in the detection results. We add the anchor offset to address this problem.

--
291298871  by Zhichao Lu:

    Py3 compatibility changes.

--
290957957  by Zhichao Lu:

    Hourglass feature extractor for CenterNet.

--
290564372  by Zhichao Lu:

    Internal change.

--
290155278  by rathodv:

    Remove Dataset Explorer.

--
290155153  by Zhichao Lu:

    Internal change

--
290122054  by Zhichao Lu:

    Unify the format in the faster_rcnn.proto

--
290116084  by Zhichao Lu:

    Deprecate tensorflow.contrib.

--
290100672  by Zhichao Lu:

    Update MobilenetV3 SSD candidates

--
289926392  by Zhichao Lu:

    Internal change

--
289553440  by Zhichao Lu:

    [Object Detection API] Fix the comments about the dimension of the rpn_box_encodings from 4-D to 3-D.

--
288994128  by lzc:

    Internal changes.

--
288942194  by lzc:

    Internal change.

--
288746124  by Zhichao Lu:

    Configurable channel mean/std. dev in CenterNet feature extractors.

--
288552509  by rathodv:

    Internal.

--
288541285  by rathodv:

    Internal update.

--
288396396  by Zhichao Lu:

    Make object detection import contrib explicitly

--
288255791  by rathodv:

    Internal

--
288078600  by Zhichao Lu:

    Fix model_lib_v2 test

--
287952244  by rathodv:

    Internal

--
287921774  by Zhichao Lu:

    internal change

--
287906173  by Zhichao Lu:

    internal change

--
287889407  by jonathanhuang:

    PY3 compatibility

--
287889042  by rathodv:

    Internal

--
287876178  by Zhichao Lu:

    Internal change.

--
287770490  by Zhichao Lu:

    Add CenterNet proto and builder

--
287694213  by Zhichao Lu:

    Support for running multiple steps per tf.function call.

--
287377183  by jonathanhuang:

    PY3 compatibility

--
287371344  by rathodv:

    Support loading keypoint labels and ids.

--
287368213  by rathodv:

    Add protos supporting keypoint evaluation.

--
286673200  by rathodv:

    dataset_tools PY3 migration

--
286635106  by Zhichao Lu:

    Update code for upcoming tf.contrib removal

--
286479439  by Zhichao Lu:

    Internal change

--
286311711  by Zhichao Lu:

    Skeleton of context model within TFODAPI

--
286005546  by Zhichao Lu:

    Fix Faster-RCNN training when using keep_aspect_ratio_resizer with pad_to_max_dimension

--
285906400  by derekjchow:

    Internal change

--
285822795  by Zhichao Lu:

    Add CenterNet meta arch target assigners.

--
285447238  by Zhichao Lu:

    Internal changes.

--
285016927  by Zhichao Lu:

    Make _dummy_computation a tf.function. This fixes breakage caused by
    cl/284256438

--
284827274  by Zhichao Lu:

    Convert to python 3.

--
284645593  by rathodv:

    Internal change

--
284639893  by rathodv:

    Add missing documentation for keypoints in eval_util.py.

--
284323712  by Zhichao Lu:

    Internal changes.

--
284295290  by Zhichao Lu:

    Updating input config proto and dataset builder to include context fields

    Updating standard_fields and tf_example_decoder to include context features

--
284226821  by derekjchow:

    Update exporter.

--
284211030  by Zhichao Lu:

    API changes in CenterNet informed by the experiments with the hourglass network.

--
284190451  by Zhichao Lu:

    Add support for CenterNet losses in protos and builders.

--
284093961  by lzc:

    Internal changes.

--
284028174  by Zhichao Lu:

    Internal change

--
284014719  by derekjchow:

    Do not pad top_down feature maps unnecessarily.

--
284005765  by Zhichao Lu:

    Add new pad_to_multiple_resizer

--
283858233  by Zhichao Lu:

    Make target assigner work when under tf.function.

--
283836611  by Zhichao Lu:

    Make config getters more general.

--
283808990  by Zhichao Lu:

    Internal change

--
283754588  by Zhichao Lu:

    Internal changes.

--
282460301  by Zhichao Lu:

    Add ability to restore v2 style checkpoints.

--
281605842  by lzc:

    Add option to disable loss computation in OD API eval job.

--
280298212  by Zhichao Lu:

    Add backwards compatible change

--
280237857  by Zhichao Lu:

    internal change

--

PiperOrigin-RevId: 310447280
parent ac5fff19
# Lint as: python2, python3
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......
# Lint as: python2, python3
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -30,10 +31,18 @@ from object_detection.protos import graph_rewriter_pb2
from object_detection.protos import pipeline_pb2
from object_detection.protos import post_processing_pb2
# pylint: disable=g-import-not-at-top
try:
from tensorflow.contrib import slim as contrib_slim
except ImportError:
# TF 2.0 doesn't ship with contrib.
pass
if six.PY2:
import mock # pylint: disable=g-import-not-at-top
import mock
else:
from unittest import mock # pylint: disable=g-import-not-at-top
from unittest import mock # pylint: disable=g-importing-member
# pylint: enable=g-import-not-at-top
class FakeModel(model.DetectionModel):
......@@ -45,7 +54,7 @@ class FakeModel(model.DetectionModel):
pass
def predict(self, preprocessed_inputs, true_image_shapes):
features = tf.contrib.slim.conv2d(preprocessed_inputs, 3, 1)
features = contrib_slim.conv2d(preprocessed_inputs, 3, 1)
with tf.control_dependencies([features]):
prediction_tensors = {
'box_encodings':
......@@ -105,17 +114,17 @@ class ExportTfliteGraphTest(tf.test.TestCase):
saver.save(sess, checkpoint_path)
def _assert_quant_vars_exists(self, tflite_graph_file):
with tf.gfile.Open(tflite_graph_file) as f:
with tf.gfile.Open(tflite_graph_file, mode='rb') as f:
graph_string = f.read()
print(graph_string)
self.assertTrue('quant' in graph_string)
self.assertIn(six.ensure_binary('quant'), graph_string)
def _import_graph_and_run_inference(self, tflite_graph_file, num_channels=3):
"""Imports a tflite graph, runs single inference and returns outputs."""
graph = tf.Graph()
with graph.as_default():
graph_def = tf.GraphDef()
with tf.gfile.Open(tflite_graph_file) as f:
with tf.gfile.Open(tflite_graph_file, mode='rb') as f:
graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name='')
input_tensor = graph.get_tensor_by_name('normalized_input_image_tensor:0')
......@@ -330,21 +339,21 @@ class ExportTfliteGraphTest(tf.test.TestCase):
graph = tf.Graph()
with graph.as_default():
graph_def = tf.GraphDef()
with tf.gfile.Open(tflite_graph_file) as f:
with tf.gfile.Open(tflite_graph_file, mode='rb') as f:
graph_def.ParseFromString(f.read())
all_op_names = [node.name for node in graph_def.node]
self.assertIn('TFLite_Detection_PostProcess', all_op_names)
self.assertNotIn('UnattachedTensor', all_op_names)
for node in graph_def.node:
if node.name == 'TFLite_Detection_PostProcess':
self.assertTrue(node.attr['_output_quantized'].b is True)
self.assertTrue(node.attr['_output_quantized'].b)
self.assertTrue(
node.attr['_support_output_type_float_in_quantized_op'].b is True)
self.assertTrue(node.attr['y_scale'].f == 10.0)
self.assertTrue(node.attr['x_scale'].f == 10.0)
self.assertTrue(node.attr['h_scale'].f == 5.0)
self.assertTrue(node.attr['w_scale'].f == 5.0)
self.assertTrue(node.attr['num_classes'].i == 2)
node.attr['_support_output_type_float_in_quantized_op'].b)
self.assertEqual(node.attr['y_scale'].f, 10.0)
self.assertEqual(node.attr['x_scale'].f, 10.0)
self.assertEqual(node.attr['h_scale'].f, 5.0)
self.assertEqual(node.attr['w_scale'].f, 5.0)
self.assertEqual(node.attr['num_classes'].i, 2)
self.assertTrue(
all([
t == types_pb2.DT_FLOAT
......@@ -362,7 +371,7 @@ class ExportTfliteGraphTest(tf.test.TestCase):
graph = tf.Graph()
with graph.as_default():
graph_def = tf.GraphDef()
with tf.gfile.Open(tflite_graph_file) as f:
with tf.gfile.Open(tflite_graph_file, mode='rb') as f:
graph_def.ParseFromString(f.read())
all_op_names = [node.name for node in graph_def.node]
self.assertIn('UnattachedTensor', all_op_names)
......@@ -381,7 +390,7 @@ class ExportTfliteGraphTest(tf.test.TestCase):
graph = tf.Graph()
with graph.as_default():
graph_def = tf.GraphDef()
with tf.gfile.Open(tflite_graph_file) as f:
with tf.gfile.Open(tflite_graph_file, mode='rb') as f:
graph_def.ParseFromString(f.read())
all_op_names = [node.name for node in graph_def.node]
self.assertIn('TFLite_Detection_PostProcess', all_op_names)
......
......@@ -17,7 +17,6 @@
import os
import tempfile
import tensorflow as tf
from tensorflow.contrib.quantize.python import graph_matcher
from tensorflow.core.protobuf import saver_pb2
from tensorflow.python.tools import freeze_graph # pylint: disable=g-direct-tensorflow-import
from object_detection.builders import graph_rewriter_builder
......@@ -27,7 +26,15 @@ from object_detection.data_decoders import tf_example_decoder
from object_detection.utils import config_util
from object_detection.utils import shape_utils
slim = tf.contrib.slim
# pylint: disable=g-import-not-at-top
try:
from tensorflow.contrib import slim
from tensorflow.contrib import tfprof as contrib_tfprof
from tensorflow.contrib.quantize.python import graph_matcher
except ImportError:
# TF 2.0 doesn't ship with contrib.
pass
# pylint: enable=g-import-not-at-top
freeze_graph_with_def_protos = freeze_graph.freeze_graph_with_def_protos
......@@ -41,7 +48,7 @@ def rewrite_nn_resize_op(is_quantized=False):
is_quantized: True if the default graph is quantized.
"""
def remove_nn():
"""Remove nearest neighbor upsampling structure and replace with TF op."""
"""Remove nearest neighbor upsampling structures and replace with TF op."""
input_pattern = graph_matcher.OpTypePattern(
'FakeQuantWithMinMaxVars' if is_quantized else '*')
stack_1_pattern = graph_matcher.OpTypePattern(
......@@ -50,10 +57,15 @@ def rewrite_nn_resize_op(is_quantized=False):
'Pack', inputs=[stack_1_pattern, stack_1_pattern], ordered_inputs=False)
reshape_pattern = graph_matcher.OpTypePattern(
'Reshape', inputs=[stack_2_pattern, 'Const'], ordered_inputs=False)
consumer_pattern = graph_matcher.OpTypePattern(
consumer_pattern1 = graph_matcher.OpTypePattern(
'Add|AddV2|Max|Mul', inputs=[reshape_pattern, '*'],
ordered_inputs=False)
consumer_pattern2 = graph_matcher.OpTypePattern(
'StridedSlice', inputs=[reshape_pattern, '*', '*', '*'],
ordered_inputs=False)
def replace_matches(consumer_pattern):
"""Search for nearest neighbor pattern and replace with TF op."""
match_counter = 0
matcher = graph_matcher.GraphMatcher(consumer_pattern)
for match in matcher.match_graph(tf.get_default_graph()):
......@@ -72,6 +84,11 @@ def rewrite_nn_resize_op(is_quantized=False):
consumer_op._update_input(index, nn_resize) # pylint: disable=protected-access
break
return match_counter
match_counter = replace_matches(consumer_pattern1)
match_counter += replace_matches(consumer_pattern2)
tf.logging.info('Found and fixed {} matches'.format(match_counter))
return match_counter
......@@ -524,8 +541,8 @@ def profile_inference_graph(graph):
graph: the inference graph.
"""
tfprof_vars_option = (
tf.contrib.tfprof.model_analyzer.TRAINABLE_VARS_PARAMS_STAT_OPTIONS)
tfprof_flops_option = tf.contrib.tfprof.model_analyzer.FLOAT_OPS_OPTIONS
contrib_tfprof.model_analyzer.TRAINABLE_VARS_PARAMS_STAT_OPTIONS)
tfprof_flops_option = contrib_tfprof.model_analyzer.FLOAT_OPS_OPTIONS
# Batchnorm is usually folded during inference.
tfprof_vars_option['trim_name_regexes'] = ['.*BatchNorm.*']
......@@ -534,10 +551,8 @@ def profile_inference_graph(graph):
'.*BatchNorm.*', '.*Initializer.*', '.*Regularizer.*', '.*BiasAdd.*'
]
tf.contrib.tfprof.model_analyzer.print_model_analysis(
graph,
tfprof_options=tfprof_vars_option)
contrib_tfprof.model_analyzer.print_model_analysis(
graph, tfprof_options=tfprof_vars_option)
tf.contrib.tfprof.model_analyzer.print_model_analysis(
graph,
tfprof_options=tfprof_flops_option)
contrib_tfprof.model_analyzer.print_model_analysis(
graph, tfprof_options=tfprof_flops_option)
# Lint as: python2, python3
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -14,6 +15,9 @@
# ==============================================================================
"""Tests for object_detection.export_inference_graph."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import numpy as np
import six
......@@ -36,7 +40,13 @@ if six.PY2:
else:
from unittest import mock # pylint: disable=g-import-not-at-top
slim = tf.contrib.slim
# pylint: disable=g-import-not-at-top
try:
from tensorflow.contrib import slim as contrib_slim
except ImportError:
# TF 2.0 doesn't ship with contrib.
pass
# pylint: enable=g-import-not-at-top
class FakeModel(model.DetectionModel):
......@@ -55,7 +65,7 @@ class FakeModel(model.DetectionModel):
return {'image': tf.layers.conv2d(preprocessed_inputs, 3, 1)}
def postprocess(self, prediction_dict, true_image_shapes):
with tf.control_dependencies(prediction_dict.values()):
with tf.control_dependencies(list(prediction_dict.values())):
postprocessed_tensors = {
'detection_boxes': tf.constant([[[0.0, 0.0, 0.5, 0.5],
[0.5, 0.5, 0.8, 0.8]],
......@@ -135,7 +145,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
od_graph = tf.Graph()
with od_graph.as_default():
od_graph_def = tf.GraphDef()
with tf.gfile.GFile(inference_graph_path) as fid:
with tf.gfile.GFile(inference_graph_path, mode='rb') as fid:
if is_binary:
od_graph_def.ParseFromString(fid.read())
else:
......@@ -147,7 +157,9 @@ class ExportInferenceGraphTest(tf.test.TestCase):
with self.test_session():
encoded_image = tf.image.encode_jpeg(tf.constant(image_array)).eval()
def _bytes_feature(value):
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
return tf.train.Feature(
bytes_list=tf.train.BytesList(value=[six.ensure_binary(value)]))
example = tf.train.Example(features=tf.train.Features(feature={
'image/encoded': _bytes_feature(encoded_image),
'image/format': _bytes_feature('jpg'),
......@@ -401,7 +413,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
self._load_inference_graph(inference_graph_path, is_binary=False)
has_quant_nodes = False
for v in variables_helper.get_global_variables_safely():
if v.op.name.endswith('act_quant/min'):
if six.ensure_str(v.op.name).endswith('act_quant/min'):
has_quant_nodes = True
break
self.assertTrue(has_quant_nodes)
......@@ -724,7 +736,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
input_shape=None,
output_collection_name='inference_op',
graph_hook_fn=None)
output_node_names = ','.join(outputs.keys())
output_node_names = ','.join(list(outputs.keys()))
saver = tf.train.Saver()
input_saver_def = saver.as_saver_def()
exporter.freeze_graph_with_def_protos(
......@@ -877,7 +889,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
input_shape=None,
output_collection_name='inference_op',
graph_hook_fn=None)
output_node_names = ','.join(outputs.keys())
output_node_names = ','.join(list(outputs.keys()))
saver = tf.train.Saver()
input_saver_def = saver.as_saver_def()
frozen_graph_def = exporter.freeze_graph_with_def_protos(
......@@ -1080,7 +1092,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
g = tf.Graph()
with g.as_default():
x = array_ops.placeholder(dtypes.float32, shape=(8, 10, 10, 8))
x_conv = tf.contrib.slim.conv2d(x, 8, 1)
x_conv = contrib_slim.conv2d(x, 8, 1)
y = array_ops.placeholder(dtypes.float32, shape=(8, 20, 20, 8))
s = ops.nearest_neighbor_upsampling(x_conv, 2)
t = s + y
......@@ -1103,6 +1115,50 @@ class ExportInferenceGraphTest(tf.test.TestCase):
self.assertTrue(resize_op_found)
def test_rewrite_nn_resize_op_odd_size(self):
g = tf.Graph()
with g.as_default():
x = array_ops.placeholder(dtypes.float32, shape=(8, 10, 10, 8))
s = ops.nearest_neighbor_upsampling(x, 2)
t = s[:, :19, :19, :]
exporter.rewrite_nn_resize_op()
resize_op_found = False
for op in g.get_operations():
if op.type == 'ResizeNearestNeighbor':
resize_op_found = True
self.assertEqual(op.inputs[0], x)
self.assertEqual(op.outputs[0].consumers()[0], t.op)
break
self.assertTrue(resize_op_found)
def test_rewrite_nn_resize_op_quantized_odd_size(self):
g = tf.Graph()
with g.as_default():
x = array_ops.placeholder(dtypes.float32, shape=(8, 10, 10, 8))
x_conv = contrib_slim.conv2d(x, 8, 1)
s = ops.nearest_neighbor_upsampling(x_conv, 2)
t = s[:, :19, :19, :]
graph_rewriter_config = graph_rewriter_pb2.GraphRewriter()
graph_rewriter_config.quantization.delay = 500000
graph_rewriter_fn = graph_rewriter_builder.build(
graph_rewriter_config, is_training=False)
graph_rewriter_fn()
exporter.rewrite_nn_resize_op(is_quantized=True)
resize_op_found = False
for op in g.get_operations():
if op.type == 'ResizeNearestNeighbor':
resize_op_found = True
self.assertEqual(op.inputs[0].op.type, 'FakeQuantWithMinMaxVars')
self.assertEqual(op.outputs[0].consumers()[0], t.op)
break
self.assertTrue(resize_op_found)
def test_rewrite_nn_resize_op_multiple_path(self):
g = tf.Graph()
with g.as_default():
......@@ -1136,7 +1192,7 @@ class ExportInferenceGraphTest(tf.test.TestCase):
self.assertNotEqual(node.op, 'Pack')
if node.op == 'ResizeNearestNeighbor':
counter_resize_op += 1
self.assertIn(node.name + ':0', t_input_ops)
self.assertIn(six.ensure_str(node.name) + ':0', t_input_ops)
self.assertEqual(counter_resize_op, 2)
......
# Tensorflow detection model zoo
We provide a collection of detection models pre-trained on the [COCO
dataset](http://cocodataset.org/), the [Kitti dataset](http://www.cvlibs.net/datasets/kitti/),
the [Open Images dataset](https://storage.googleapis.com/openimages/web/index.html), the
[AVA v2.1 dataset](https://research.google.com/ava/) and the
dataset](http://cocodataset.org), the [Kitti dataset](http://www.cvlibs.net/datasets/kitti/),
the
[Open Images dataset](https://storage.googleapis.com/openimages/web/index.html),
the [AVA v2.1 dataset](https://research.google.com/ava/) and the
[iNaturalist Species Detection Dataset](https://github.com/visipedia/inat_comp/blob/master/2017/README.md#bounding-boxes).
These models can be useful for out-of-the-box inference if you are interested in
categories already in those datasets. They are also useful for initializing your
......@@ -107,7 +108,8 @@ Note: If you download the tar.gz file of quantized models and un-tar, you will g
### Mobile models
Model name | Pixel 1 Latency (ms) | COCO mAP | Outputs
----------------------------------------------------------------------------------------------------------------------------------- | :------------------: | :------: | :-----:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------: | :------: | :-----:
[ssd_mobilenet_v2_mnasfpn_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_mnasfpn_shared_box_predictor_320x320_coco_sync_2020_05_06.tar.gz) | 183 | 26.6 | Boxes
[ssd_mobilenet_v3_large_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v3_large_coco_2020_01_14.tar.gz) | 119 | 22.6 | Boxes
[ssd_mobilenet_v3_small_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v3_small_coco_2020_01_14.tar.gz) | 43 | 15.4 | Boxes
......
......@@ -11,7 +11,7 @@ Tensorflow Object Detection API depends on the following libraries:
* tf Slim (which is included in the "tensorflow/models/research/" checkout)
* Jupyter notebook
* Matplotlib
* Tensorflow (>=1.12.0)
* Tensorflow (1.15.0)
* Cython
* contextlib2
* cocoapi
......@@ -59,7 +59,9 @@ If that is your case, try the [manual](#Manual-protobuf-compiler-installation-an
git clone https://github.com/tensorflow/models.git
```
To use this library, you need to download this repository, whenever it says `<path-to-tensorflow>` it will be referring to the folder that you downloaded this repository into.
To use this library, you need to download this repository, whenever it says
`<path-to-tensorflow>` it will be referring to the folder that you downloaded
this repository into.
## COCO API installation
......@@ -80,18 +82,20 @@ make
cp -r pycocotools <path_to_tensorflow>/models/research/
```
Alternatively, users can install `pycocotools` using pip:
Alternatively, users can install `pycocotools` using pip:
```bash
```bash
pip install --user pycocotools
```
```
## Protobuf Compilation
The Tensorflow Object Detection API uses Protobufs to configure model and
training parameters. Before the framework can be used, the Protobuf libraries
must be compiled. This should be done by running the following command from
the [tensorflow/models/research/](https://github.com/tensorflow/models/tree/master/research/) directory:
the [tensorflow/models/research/
](https://github.com/tensorflow/models/tree/master/research/)
directory:
``` bash
......@@ -154,7 +158,8 @@ export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
Note: This command needs to run from every new terminal you start. If you wish
to avoid running this manually, you can add it as a new line to the end of your
~/.bashrc file, replacing \`pwd\` with the absolute path of
tensorflow/models/research on your system. After updating ~/.bashrc file you can run the following command:
tensorflow/models/research on your system. After updating ~/.bashrc file you
can run the following command:
``` bash
source ~/.bashrc
......@@ -165,6 +170,8 @@ source ~/.bashrc
You can test that you have correctly installed the Tensorflow Object Detection\
API by running the following command:
```bash
python object_detection/builders/model_builder_test.py
# If using Tensorflow 1.X:
python object_detection/builders/model_builder_tf1_test.py
```
......@@ -142,11 +142,11 @@ python -m object_detection/inference/infer_detections \
Inference preserves all fields of the input TFExamples, and adds new fields to
store the inferred detections. This allows [computing evaluation
measures](#computing-evaluation-measures) on the output TFRecord alone, as ground
truth boxes are preserved as well. Since measure computations don't require
access to the images, `infer_detections` can optionally discard them with the
`--discard_image_pixels` flag. Discarding the images drastically reduces the
size of the output TFRecord.
measures](#computing-evaluation-measures) on the output TFRecord alone, as
groundtruth boxes are preserved as well. Since measure computations don't
require access to the images, `infer_detections` can optionally discard them
with the `--discard_image_pixels` flag. Discarding the images drastically
reduces the size of the output TFRecord.
### Accelerating inference
......
......@@ -56,7 +56,7 @@ via the following command. For a quantized model, run this from the tensorflow/
directory:
```shell
bazel run --config=opt tensorflow/lite/toco:toco -- \
bazel run -c opt tensorflow/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb \
--output_file=$OUTPUT_DIR/detect.tflite \
--input_shapes=1,300,300,3 \
......@@ -82,7 +82,7 @@ parameters and can be run via the TensorFlow Lite interpreter on the Android
device. For a floating point model, run this from the tensorflow/ directory:
```shell
bazel run --config=opt tensorflow/lite/toco:toco -- \
bazel run -c opt tensorflow/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb \
--output_file=$OUTPUT_DIR/detect.tflite \
--input_shapes=1,300,300,3 \
......
......@@ -46,26 +46,16 @@ have static shape:
* **Groundtruth tensors with static shape** - Images in a typical detection
dataset have variable number of groundtruth boxes and associated classes.
Setting `max_number_of_boxes` to a large enough number in the
`train_input_reader` and `eval_input_reader` pads the groundtruth tensors
with zeros to a static shape. Padded groundtruth tensors are correctly
handled internally within the model.
Setting `max_number_of_boxes` to a large enough number in `train_config`
pads the groundtruth tensors with zeros to a static shape. Padded
groundtruth tensors are correctly handled internally within the model.
```
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record-?????-of-00100"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
max_number_of_boxes: 200
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record-?????-of-0010"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
train_config: {
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
batch_size: 64
max_number_of_boxes: 200
unpad_groundtruth_tensors: false
}
```
......
......@@ -15,11 +15,12 @@
r"""Tests for detection_inference.py."""
import os
import StringIO
import numpy as np
from PIL import Image
import six
import tensorflow as tf
from google.protobuf import text_format
from object_detection.core import standard_fields
from object_detection.inference import detection_inference
......@@ -32,7 +33,7 @@ def get_mock_tfrecord_path():
def create_mock_tfrecord():
pil_image = Image.fromarray(np.array([[[123, 0, 0]]], dtype=np.uint8), 'RGB')
image_output_stream = StringIO.StringIO()
image_output_stream = six.BytesIO()
pil_image.save(image_output_stream, format='png')
encoded_image = image_output_stream.getvalue()
......@@ -46,6 +47,7 @@ def create_mock_tfrecord():
tf_example = tf.train.Example(features=tf.train.Features(feature=feature_map))
with tf.python_io.TFRecordWriter(get_mock_tfrecord_path()) as writer:
writer.write(tf_example.SerializeToString())
return encoded_image
def get_mock_graph_path():
......@@ -76,7 +78,7 @@ class InferDetectionsTests(tf.test.TestCase):
def test_simple(self):
create_mock_graph()
create_mock_tfrecord()
encoded_image = create_mock_tfrecord()
serialized_example_tensor, image_tensor = detection_inference.build_input(
[get_mock_tfrecord_path()])
......@@ -94,8 +96,8 @@ class InferDetectionsTests(tf.test.TestCase):
tf_example = detection_inference.infer_detections_and_add_to_example(
serialized_example_tensor, detected_boxes_tensor,
detected_scores_tensor, detected_labels_tensor, False)
self.assertProtoEquals(r"""
expected_example = tf.train.Example()
text_format.Merge(r"""
features {
feature {
key: "image/detection/bbox/ymin"
......@@ -115,17 +117,14 @@ class InferDetectionsTests(tf.test.TestCase):
feature {
key: "image/detection/score"
value { float_list { value: [0.1, 0.2] } } }
feature {
key: "image/encoded"
value { bytes_list { value:
"\211PNG\r\n\032\n\000\000\000\rIHDR\000\000\000\001\000\000"
"\000\001\010\002\000\000\000\220wS\336\000\000\000\022IDATx"
"\234b\250f`\000\000\000\000\377\377\003\000\001u\000|gO\242"
"\213\000\000\000\000IEND\256B`\202" } } }
feature {
key: "test_field"
value { float_list { value: [1.0, 2.0, 3.0, 4.0] } } } }
""", tf_example)
value { float_list { value: [1.0, 2.0, 3.0, 4.0] } } } }""",
expected_example)
expected_example.features.feature[
standard_fields.TfExampleFields
.image_encoded].CopyFrom(dataset_util.bytes_feature(encoded_image))
self.assertProtoEquals(expected_example, tf_example)
def test_discard_image(self):
create_mock_graph()
......
......@@ -43,6 +43,7 @@ from object_detection.utils import shape_utils
HASH_KEY = 'hash'
HASH_BINS = 1 << 31
SERVING_FED_EXAMPLE_KEY = 'serialized_example'
_LABEL_OFFSET = 1
# A map of names to methods that help build the input pipeline.
INPUT_BUILDER_UTIL_MAP = {
......@@ -67,6 +68,64 @@ def _multiclass_scores_or_one_hot_labels(multiclass_scores,
return tf.cond(tf.size(multiclass_scores) > 0, true_fn, false_fn)
def _convert_labeled_classes_to_k_hot(groundtruth_labeled_classes, num_classes):
"""Returns k-hot encoding of the labeled classes."""
# If the input labeled_classes is empty, it assumes all classes are
# exhaustively labeled, thus returning an all-one encoding.
def true_fn():
return tf.sparse_to_dense(
groundtruth_labeled_classes - _LABEL_OFFSET, [num_classes],
tf.constant(1, dtype=tf.float32),
validate_indices=False)
def false_fn():
return tf.ones(num_classes, dtype=tf.float32)
return tf.cond(tf.size(groundtruth_labeled_classes) > 0, true_fn, false_fn)
def _remove_unrecognized_classes(class_ids, unrecognized_label):
"""Returns class ids with unrecognized classes filtered out."""
recognized_indices = tf.where(tf.greater(class_ids, unrecognized_label))
return tf.gather(class_ids, recognized_indices)
def assert_or_prune_invalid_boxes(boxes):
"""Makes sure boxes have valid sizes (ymax >= ymin, xmax >= xmin).
When the hardware supports assertions, the function raises an error when
boxes have an invalid size. If assertions are not supported (e.g. on TPU),
boxes with invalid sizes are filtered out.
Args:
boxes: float tensor of shape [num_boxes, 4]
Returns:
boxes: float tensor of shape [num_valid_boxes, 4] with invalid boxes
filtered out.
Raises:
tf.errors.InvalidArgumentError: When we detect boxes with invalid size.
This is not supported on TPUs.
"""
ymin, xmin, ymax, xmax = tf.split(
boxes, num_or_size_splits=4, axis=1)
height_check = tf.Assert(tf.reduce_all(ymax >= ymin), [ymin, ymax])
width_check = tf.Assert(tf.reduce_all(xmax >= xmin), [xmin, xmax])
with tf.control_dependencies([height_check, width_check]):
boxes_tensor = tf.concat([ymin, xmin, ymax, xmax], axis=1)
boxlist = box_list.BoxList(boxes_tensor)
# TODO(b/149221748) Remove pruning when XLA supports assertions.
boxlist = box_list_ops.prune_small_boxes(boxlist, 0)
return boxlist.get()
def transform_input_data(tensor_dict,
model_preprocess_fn,
image_resizer_fn,
......@@ -76,7 +135,8 @@ def transform_input_data(tensor_dict,
retain_original_image=False,
use_multiclass_scores=False,
use_bfloat16=False,
retain_original_image_additional_channels=False):
retain_original_image_additional_channels=False,
keypoint_type_weight=None):
"""A single function that is responsible for all input data transformations.
Data transformation functions are applied in the following order.
......@@ -85,10 +145,15 @@ def transform_input_data(tensor_dict,
fields.InputDataFields.image.
2. data_augmentation_fn (optional): applied on tensor_dict.
3. model_preprocess_fn: applied only on image tensor in tensor_dict.
4. image_resizer_fn: applied on original image and instance mask tensor in
4. keypoint_type_weight (optional): If groundtruth keypoints are in
the tensor dictionary, per-keypoint weights are produced. These weights are
initialized by `keypoint_type_weight` (or ones if left None).
Then, for all keypoints that are not visible, the weights are set to 0 (to
avoid penalizing the model in a loss function).
5. image_resizer_fn: applied on original image and instance mask tensor in
tensor_dict.
5. one_hot_encoding: applied to classes tensor in tensor_dict.
6. merge_multiple_boxes (optional): when groundtruth boxes are exactly the
6. one_hot_encoding: applied to classes tensor in tensor_dict.
7. merge_multiple_boxes (optional): when groundtruth boxes are exactly the
same they can be merged into a single box with an associated k-hot class
label.
......@@ -117,12 +182,25 @@ def transform_input_data(tensor_dict,
use_bfloat16: (optional) a bool, whether to use bfloat16 in training.
retain_original_image_additional_channels: (optional) Whether to retain
original image additional channels in the output dictionary.
keypoint_type_weight: A list (of length num_keypoints) containing
groundtruth loss weights to use for each keypoint. If None, will use a
weight of 1.
Returns:
A dictionary keyed by fields.InputDataFields containing the tensors obtained
after applying all the transformations.
"""
out_tensor_dict = tensor_dict.copy()
labeled_classes_field = fields.InputDataFields.groundtruth_labeled_classes
if labeled_classes_field in out_tensor_dict:
# tf_example_decoder casts unrecognized labels to -1. Remove these
# unrecognized labels before converting labeled_classes to k-hot vector.
out_tensor_dict[labeled_classes_field] = _remove_unrecognized_classes(
out_tensor_dict[labeled_classes_field], unrecognized_label=-1)
out_tensor_dict[labeled_classes_field] = _convert_labeled_classes_to_k_hot(
out_tensor_dict[labeled_classes_field], num_classes)
if fields.InputDataFields.multiclass_scores in out_tensor_dict:
out_tensor_dict[
fields.InputDataFields
......@@ -173,8 +251,11 @@ def transform_input_data(tensor_dict,
bboxes = out_tensor_dict[fields.InputDataFields.groundtruth_boxes]
boxlist = box_list.BoxList(bboxes)
realigned_bboxes = box_list_ops.change_coordinate_frame(boxlist, im_box)
realigned_boxes_tensor = realigned_bboxes.get()
valid_boxes_tensor = assert_or_prune_invalid_boxes(realigned_boxes_tensor)
out_tensor_dict[
fields.InputDataFields.groundtruth_boxes] = realigned_bboxes.get()
fields.InputDataFields.groundtruth_boxes] = valid_boxes_tensor
if fields.InputDataFields.groundtruth_keypoints in tensor_dict:
keypoints = out_tensor_dict[fields.InputDataFields.groundtruth_keypoints]
......@@ -182,10 +263,24 @@ def transform_input_data(tensor_dict,
im_box)
out_tensor_dict[
fields.InputDataFields.groundtruth_keypoints] = realigned_keypoints
flds_gt_kpt = fields.InputDataFields.groundtruth_keypoints
flds_gt_kpt_vis = fields.InputDataFields.groundtruth_keypoint_visibilities
flds_gt_kpt_weights = fields.InputDataFields.groundtruth_keypoint_weights
if flds_gt_kpt_vis not in out_tensor_dict:
out_tensor_dict[flds_gt_kpt_vis] = tf.ones_like(
out_tensor_dict[flds_gt_kpt][:, :, 0],
dtype=tf.bool)
out_tensor_dict[flds_gt_kpt_weights] = (
keypoint_ops.keypoint_weights_from_visibilities(
out_tensor_dict[flds_gt_kpt_vis],
keypoint_type_weight))
if use_bfloat16:
preprocessed_resized_image = tf.cast(
preprocessed_resized_image, tf.bfloat16)
if fields.InputDataFields.context_features in out_tensor_dict:
out_tensor_dict[fields.InputDataFields.context_features] = tf.cast(
out_tensor_dict[fields.InputDataFields.context_features], tf.bfloat16)
out_tensor_dict[fields.InputDataFields.image] = tf.squeeze(
preprocessed_resized_image, axis=0)
out_tensor_dict[fields.InputDataFields.true_image_shape] = tf.squeeze(
......@@ -198,9 +293,8 @@ def transform_input_data(tensor_dict,
out_tensor_dict[
fields.InputDataFields.groundtruth_instance_masks] = resized_masks
label_offset = 1
zero_indexed_groundtruth_classes = out_tensor_dict[
fields.InputDataFields.groundtruth_classes] - label_offset
fields.InputDataFields.groundtruth_classes] - _LABEL_OFFSET
if use_multiclass_scores:
out_tensor_dict[
fields.InputDataFields.groundtruth_classes] = out_tensor_dict[
......@@ -242,8 +336,12 @@ def transform_input_data(tensor_dict,
return out_tensor_dict
def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
spatial_image_shape=None):
def pad_input_data_to_static_shapes(tensor_dict,
max_num_boxes,
num_classes,
spatial_image_shape=None,
max_num_context_features=None,
context_feature_length=None):
"""Pads input tensors to static shapes.
In case num_additional_channels > 0, we assume that the additional channels
......@@ -257,6 +355,9 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
padding.
spatial_image_shape: A list of two integers of the form [height, width]
containing expected spatial shape of the image.
max_num_context_features (optional): The maximum number of context
features needed to compute shapes padding.
context_feature_length (optional): The length of the context feature.
Returns:
A dictionary keyed by fields.InputDataFields containing padding shapes for
......@@ -264,7 +365,9 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
Raises:
ValueError: If groundtruth classes is neither rank 1 nor rank 2, or if we
detect that additional channels have not been concatenated yet.
detect that additional channels have not been concatenated yet, or if
max_num_context_features is not specified and context_features is in the
tensor dict.
"""
if not spatial_image_shape or spatial_image_shape == [-1, -1]:
......@@ -296,10 +399,14 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
raise ValueError(
'Image must be already concatenated with additional channels.')
if fields.InputDataFields.context_features in tensor_dict and (
max_num_context_features is None):
raise ValueError('max_num_context_features must be specified in the model '
'config if include_context is specified in the input '
'config')
padding_shapes = {
fields.InputDataFields.image: [
height, width, num_channels
],
fields.InputDataFields.image: [height, width, num_channels],
fields.InputDataFields.original_image_spatial_shape: [2],
fields.InputDataFields.image_additional_channels: [
height, width, num_additional_channels
......@@ -326,6 +433,7 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
fields.InputDataFields.true_image_shape: [3],
fields.InputDataFields.groundtruth_image_classes: [num_classes],
fields.InputDataFields.groundtruth_image_confidences: [num_classes],
fields.InputDataFields.groundtruth_labeled_classes: [num_classes],
}
if fields.InputDataFields.original_image in tensor_dict:
......@@ -348,6 +456,25 @@ def pad_input_data_to_static_shapes(tensor_dict, max_num_boxes, num_classes,
padding_shapes[fields.InputDataFields.
groundtruth_keypoint_visibilities] = padding_shape
if fields.InputDataFields.groundtruth_keypoint_weights in tensor_dict:
tensor_shape = (
tensor_dict[fields.InputDataFields.groundtruth_keypoint_weights].shape)
padding_shape = [max_num_boxes, shape_utils.get_dim_as_int(tensor_shape[1])]
padding_shapes[fields.InputDataFields.
groundtruth_keypoint_weights] = padding_shape
# Prepare for ContextRCNN related fields.
if fields.InputDataFields.context_features in tensor_dict:
padding_shape = [max_num_context_features, context_feature_length]
padding_shapes[fields.InputDataFields.context_features] = padding_shape
tensor_shape = tf.shape(
tensor_dict[fields.InputDataFields.context_features])
tensor_dict[fields.InputDataFields.valid_context_size] = tensor_shape[0]
padding_shapes[fields.InputDataFields.valid_context_size] = []
if fields.InputDataFields.context_feature_length in tensor_dict:
padding_shapes[fields.InputDataFields.context_feature_length] = []
padded_tensor_dict = {}
for tensor_name in tensor_dict:
padded_tensor_dict[tensor_name] = shape_utils.pad_or_clip_nd(
......@@ -383,6 +510,8 @@ def augment_input_data(tensor_dict, data_augmentation_options):
in tensor_dict)
include_keypoints = (fields.InputDataFields.groundtruth_keypoints
in tensor_dict)
include_keypoint_visibilities = (
fields.InputDataFields.groundtruth_keypoint_visibilities in tensor_dict)
include_label_weights = (fields.InputDataFields.groundtruth_weights
in tensor_dict)
include_label_confidences = (fields.InputDataFields.groundtruth_confidences
......@@ -396,7 +525,8 @@ def augment_input_data(tensor_dict, data_augmentation_options):
include_label_confidences=include_label_confidences,
include_multiclass_scores=include_multiclass_scores,
include_instance_masks=include_instance_masks,
include_keypoints=include_keypoints))
include_keypoints=include_keypoints,
include_keypoint_visibilities=include_keypoint_visibilities))
tensor_dict[fields.InputDataFields.image] = tf.squeeze(
tensor_dict[fields.InputDataFields.image], axis=0)
return tensor_dict
......@@ -416,11 +546,14 @@ def _get_labels_dict(input_dict):
optional_label_keys = [
fields.InputDataFields.groundtruth_confidences,
fields.InputDataFields.groundtruth_labeled_classes,
fields.InputDataFields.groundtruth_keypoints,
fields.InputDataFields.groundtruth_instance_masks,
fields.InputDataFields.groundtruth_area,
fields.InputDataFields.groundtruth_is_crowd,
fields.InputDataFields.groundtruth_difficult
fields.InputDataFields.groundtruth_difficult,
fields.InputDataFields.groundtruth_keypoint_visibilities,
fields.InputDataFields.groundtruth_keypoint_weights,
]
for key in optional_label_keys:
......@@ -461,7 +594,7 @@ def _replace_empty_string_with_random_number(string_tensor):
return out_string
def _get_features_dict(input_dict):
def _get_features_dict(input_dict, include_source_id=False):
"""Extracts features dict from input dict."""
source_id = _replace_empty_string_with_random_number(
......@@ -477,12 +610,20 @@ def _get_features_dict(input_dict):
fields.InputDataFields.original_image_spatial_shape:
input_dict[fields.InputDataFields.original_image_spatial_shape]
}
if include_source_id:
features[fields.InputDataFields.source_id] = source_id
if fields.InputDataFields.original_image in input_dict:
features[fields.InputDataFields.original_image] = input_dict[
fields.InputDataFields.original_image]
if fields.InputDataFields.image_additional_channels in input_dict:
features[fields.InputDataFields.image_additional_channels] = input_dict[
fields.InputDataFields.image_additional_channels]
if fields.InputDataFields.context_features in input_dict:
features[fields.InputDataFields.context_features] = input_dict[
fields.InputDataFields.context_features]
if fields.InputDataFields.valid_context_size in input_dict:
features[fields.InputDataFields.valid_context_size] = input_dict[
fields.InputDataFields.valid_context_size]
return features
......@@ -507,7 +648,7 @@ def create_train_input_fn(train_config, train_input_config,
def train_input(train_config, train_input_config,
model_config, model=None, params=None):
model_config, model=None, params=None, input_context=None):
"""Returns `features` and `labels` tensor dictionaries for training.
Args:
......@@ -517,6 +658,9 @@ def train_input(train_config, train_input_config,
model: A pre-constructed Detection Model.
If None, one will be created from the config.
params: Parameter dictionary passed from the estimator.
input_context: optional, A tf.distribute.InputContext object used to
shard filenames and compute per-replica batch_size when this function
is being called per-replica.
Returns:
A tf.data.Dataset that holds (features, labels) tuple.
......@@ -550,6 +694,12 @@ def train_input(train_config, train_input_config,
labels[fields.InputDataFields.groundtruth_keypoints] is a
[batch_size, num_boxes, num_keypoints, 2] float32 tensor containing
keypoints for each box.
labels[fields.InputDataFields.groundtruth_weights] is a
[batch_size, num_boxes, num_keypoints] float32 tensor containing
groundtruth weights for the keypoints.
labels[fields.InputDataFields.groundtruth_visibilities] is a
[batch_size, num_boxes, num_keypoints] bool tensor containing
groundtruth visibilities for each keypoint.
Raises:
TypeError: if the `train_config`, `train_input_config` or `model_config`
......@@ -571,6 +721,8 @@ def train_input(train_config, train_input_config,
else:
model_preprocess_fn = model.preprocess
num_classes = config_util.get_number_of_classes(model_config)
def transform_and_pad_input_data_fn(tensor_dict):
"""Combines transform and pad operation."""
data_augmentation_options = [
......@@ -583,28 +735,37 @@ def train_input(train_config, train_input_config,
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
keypoint_type_weight = train_input_config.keypoint_type_weight or None
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model_preprocess_fn,
image_resizer_fn=image_resizer_fn,
num_classes=config_util.get_number_of_classes(model_config),
num_classes=num_classes,
data_augmentation_fn=data_augmentation_fn,
merge_multiple_boxes=train_config.merge_multiple_label_boxes,
retain_original_image=train_config.retain_original_images,
use_multiclass_scores=train_config.use_multiclass_scores,
use_bfloat16=train_config.use_bfloat16)
use_bfloat16=train_config.use_bfloat16,
keypoint_type_weight=keypoint_type_weight)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=train_input_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
num_classes=num_classes,
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))
image_resizer_config),
max_num_context_features=config_util.get_max_num_context_features(
model_config),
context_feature_length=config_util.get_context_feature_length(
model_config))
include_source_id = train_input_config.include_source_id
return (_get_features_dict(tensor_dict, include_source_id),
_get_labels_dict(tensor_dict))
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
train_input_config,
transform_input_data_fn=transform_and_pad_input_data_fn,
batch_size=params['batch_size'] if params else train_config.batch_size)
batch_size=params['batch_size'] if params else train_config.batch_size,
input_context=input_context)
return dataset
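For orientation, a minimal sketch (not part of this diff; the strategy and config objects are assumed to exist) of how the new input_context argument might be threaded through when one dataset is built per replica under tf.distribute:

import tensorflow as tf

def build_per_replica_dataset(strategy, train_config, train_input_config,
                              model_config):
  def dataset_fn(input_context):
    # train_input uses input_context to shard filenames and to derive the
    # per-replica batch size from the configured global batch size.
    return train_input(train_config, train_input_config, model_config,
                       input_context=input_context)
  return strategy.experimental_distribute_datasets_from_function(dataset_fn)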
......@@ -667,6 +828,12 @@ def eval_input(eval_config, eval_input_config, model_config,
labels[fields.InputDataFields.groundtruth_instance_masks] is a
[1, num_boxes, H, W] float32 tensor containing only binary values,
which represent instance masks for objects.
labels[fields.InputDataFields.groundtruth_keypoint_weights] is a
[batch_size, num_boxes, num_keypoints] float32 tensor containing
groundtruth weights for the keypoints.
labels[fields.InputDataFields.groundtruth_keypoint_visibilities] is a
[batch_size, num_boxes, num_keypoints] bool tensor containing
groundtruth visibilities for each keypoint.
Raises:
TypeError: if the `eval_config`, `eval_input_config` or `model_config`
......@@ -703,6 +870,7 @@ def eval_input(eval_config, eval_input_config, model_config,
image_resizer_config = config_util.get_image_resizer_config(model_config)
image_resizer_fn = image_resizer_builder.build(image_resizer_config)
keypoint_type_weight = eval_input_config.keypoint_type_weight or None
transform_data_fn = functools.partial(
transform_input_data, model_preprocess_fn=model_preprocess_fn,
......@@ -711,14 +879,21 @@ def eval_input(eval_config, eval_input_config, model_config,
data_augmentation_fn=None,
retain_original_image=eval_config.retain_original_images,
retain_original_image_additional_channels=
eval_config.retain_original_image_additional_channels)
eval_config.retain_original_image_additional_channels,
keypoint_type_weight=keypoint_type_weight)
tensor_dict = pad_input_data_to_static_shapes(
tensor_dict=transform_data_fn(tensor_dict),
max_num_boxes=eval_input_config.max_number_of_boxes,
num_classes=config_util.get_number_of_classes(model_config),
spatial_image_shape=config_util.get_spatial_image_size(
image_resizer_config))
return (_get_features_dict(tensor_dict), _get_labels_dict(tensor_dict))
image_resizer_config),
max_num_context_features=config_util.get_max_num_context_features(
model_config),
context_feature_length=config_util.get_context_feature_length(
model_config))
include_source_id = eval_input_config.include_source_id
return (_get_features_dict(tensor_dict, include_source_id),
_get_labels_dict(tensor_dict))
dataset = INPUT_BUILDER_UTIL_MAP['dataset_build'](
eval_input_config,
batch_size=params['batch_size'] if params else eval_config.batch_size,
......
......@@ -20,6 +20,7 @@ from __future__ import print_function
import functools
import os
from absl import logging
from absl.testing import parameterized
import numpy as np
......@@ -484,16 +485,18 @@ class InputsTest(test_case.TestCase, parameterized.TestCase):
empty_string = ''
feed_dict = {string_placeholder: empty_string}
tf.set_random_seed(0)
with self.test_session() as sess:
out_string = sess.run(replaced_string, feed_dict=feed_dict)
# Test whether out_string is a string which represents an integer.
int(out_string) # throws an error if out_string is not castable to int.
is_integer = True
try:
# Test whether out_string is a string which represents an integer; the
# cast below will throw an error if out_string is not castable to int.
int(out_string)
except ValueError:
is_integer = False
self.assertEqual(out_string, b'2798129067578209328')
self.assertTrue(is_integer)
def test_force_no_resize(self):
"""Tests the functionality of force_no_reisze option."""
......@@ -681,7 +684,7 @@ def _fake_resize50_preprocess_fn(image):
return tf.expand_dims(image, 0), tf.expand_dims(shape, axis=0)
class DataTransformationFnTest(test_case.TestCase):
class DataTransformationFnTest(test_case.TestCase, parameterized.TestCase):
def test_combine_additional_channels_if_present(self):
image = np.random.rand(4, 4, 3).astype(np.float32)
......@@ -766,6 +769,54 @@ class DataTransformationFnTest(test_case.TestCase):
np.array([[0, 1, 0], [0, 0, 1]], np.float32),
transformed_inputs[fields.InputDataFields.groundtruth_classes])
@parameterized.parameters(
{'labeled_classes': [1, 2]},
{'labeled_classes': []},
{'labeled_classes': [1, -1, 2]} # -1 denotes an unrecognized class
)
def test_use_labeled_classes(self, labeled_classes):
def compute_fn(image, groundtruth_boxes, groundtruth_classes,
groundtruth_labeled_classes):
tensor_dict = {
fields.InputDataFields.image:
image,
fields.InputDataFields.groundtruth_boxes:
groundtruth_boxes,
fields.InputDataFields.groundtruth_classes:
groundtruth_classes,
fields.InputDataFields.groundtruth_labeled_classes:
groundtruth_labeled_classes
}
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_model_preprocessor_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=3)
return input_transformation_fn(tensor_dict=tensor_dict)
image = np.random.rand(4, 4, 3).astype(np.float32)
groundtruth_boxes = np.array([[.5, .5, 1, 1], [.5, .5, 1, 1]], np.float32)
groundtruth_classes = np.array([1, 2], np.int32)
groundtruth_labeled_classes = np.array(labeled_classes, np.int32)
transformed_inputs = self.execute_cpu(compute_fn, [
image, groundtruth_boxes, groundtruth_classes,
groundtruth_labeled_classes
])
if labeled_classes == [1, 2] or labeled_classes == [1, -1, 2]:
transformed_labeled_classes = [1, 1, 0]
elif not labeled_classes:
transformed_labeled_classes = [1, 1, 1]
else:
logging.exception('Unexpected labeled_classes %r', labeled_classes)
self.assertAllEqual(
np.array(transformed_labeled_classes, np.float32),
transformed_inputs[fields.InputDataFields.groundtruth_labeled_classes])
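# Summarizing the parameterized cases above (illustrative, num_classes=3):
# labeled_classes [1, 2] and [1, -1, 2] both yield the multi-hot vector
# [1, 1, 0] (the unrecognized -1 entry is dropped), while an empty
# labeled_classes list means all classes are treated as labeled, i.e. [1, 1, 1].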
def test_returns_correct_class_label_encodings(self):
tensor_dict = {
fields.InputDataFields.image:
......@@ -809,7 +860,7 @@ class DataTransformationFnTest(test_case.TestCase):
np.array([[[.1, .1]], [[.2, .2]], [[.5, .5]]],
np.float32)),
fields.InputDataFields.groundtruth_keypoint_visibilities:
tf.constant([True, False, True]),
tf.constant([[True, True], [False, False], [True, True]]),
fields.InputDataFields.groundtruth_instance_masks:
tf.constant(np.random.rand(3, 4, 4).astype(np.float32)),
fields.InputDataFields.groundtruth_is_crowd:
......@@ -847,7 +898,7 @@ class DataTransformationFnTest(test_case.TestCase):
self.assertAllEqual(
transformed_inputs[
fields.InputDataFields.groundtruth_keypoint_visibilities],
[True, True])
[[True, True], [True, True]])
self.assertAllEqual(
transformed_inputs[
fields.InputDataFields.groundtruth_instance_masks].shape, [2, 4, 4])
......@@ -1060,7 +1111,7 @@ class DataTransformationFnTest(test_case.TestCase):
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([1, 2], np.int32)),
fields.InputDataFields.groundtruth_keypoints:
tf.constant([[0.1, 0.2], [0.3, 0.4]]),
tf.constant([[[0.1, 0.2]], [[0.3, 0.4]]]),
}
num_classes = 3
......@@ -1078,7 +1129,75 @@ class DataTransformationFnTest(test_case.TestCase):
[[.5, .25, 1., .5], [.0, .0, .5, .25]])
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_keypoints],
[[[.1, .1], [.3, .2]]])
[[[.1, .1]], [[.3, .2]]])
def test_groundtruth_keypoint_weights(self):
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(100, 50, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1, 1], [.0, .0, .5, .5]],
np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([1, 2], np.int32)),
fields.InputDataFields.groundtruth_keypoints:
tf.constant([[[0.1, 0.2], [0.3, 0.4]],
[[0.5, 0.6], [0.7, 0.8]]]),
fields.InputDataFields.groundtruth_keypoint_visibilities:
tf.constant([[True, False], [True, True]]),
}
num_classes = 3
keypoint_type_weight = [1.0, 2.0]
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_resize50_preprocess_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes,
keypoint_type_weight=keypoint_type_weight)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_keypoints],
[[[0.1, 0.1], [0.3, 0.2]],
[[0.5, 0.3], [0.7, 0.4]]])
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_keypoint_weights],
[[1.0, 0.0], [1.0, 2.0]])
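# Illustrative arithmetic (derived from the inputs above, not new behavior):
# the per-keypoint weight is the per-type weight gated by visibility, i.e.
#   weights = keypoint_type_weight * visibilities
#           = [[1.0 * 1, 2.0 * 0], [1.0 * 1, 2.0 * 1]]
#           = [[1.0, 0.0], [1.0, 2.0]]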
def test_groundtruth_keypoint_weights_default(self):
tensor_dict = {
fields.InputDataFields.image:
tf.constant(np.random.rand(100, 50, 3).astype(np.float32)),
fields.InputDataFields.groundtruth_boxes:
tf.constant(np.array([[.5, .5, 1, 1], [.0, .0, .5, .5]],
np.float32)),
fields.InputDataFields.groundtruth_classes:
tf.constant(np.array([1, 2], np.int32)),
fields.InputDataFields.groundtruth_keypoints:
tf.constant([[[0.1, 0.2], [0.3, 0.4]],
[[0.5, 0.6], [0.7, 0.8]]]),
}
num_classes = 3
input_transformation_fn = functools.partial(
inputs.transform_input_data,
model_preprocess_fn=_fake_resize50_preprocess_fn,
image_resizer_fn=_fake_image_resizer_fn,
num_classes=num_classes)
with self.test_session() as sess:
transformed_inputs = sess.run(
input_transformation_fn(tensor_dict=tensor_dict))
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_keypoints],
[[[0.1, 0.1], [0.3, 0.2]],
[[0.5, 0.3], [0.7, 0.4]]])
self.assertAllClose(
transformed_inputs[fields.InputDataFields.groundtruth_keypoint_weights],
[[1.0, 1.0], [1.0, 1.0]])
class PadInputDataToStaticShapesFnTest(test_case.TestCase):
......@@ -1272,6 +1391,44 @@ class PadInputDataToStaticShapesFnTest(test_case.TestCase):
fields.InputDataFields.groundtruth_keypoint_visibilities]
.shape.as_list(), [3, 16])
def test_context_features(self):
context_memory_size = 8
context_feature_length = 10
max_num_context_features = 20
input_tensor_dict = {
fields.InputDataFields.context_features:
tf.placeholder(tf.float32,
[context_memory_size, context_feature_length]),
fields.InputDataFields.context_feature_length:
tf.placeholder(tf.float32, [])
}
padded_tensor_dict = inputs.pad_input_data_to_static_shapes(
tensor_dict=input_tensor_dict,
max_num_boxes=3,
num_classes=3,
spatial_image_shape=[5, 6],
max_num_context_features=max_num_context_features,
context_feature_length=context_feature_length)
self.assertAllEqual(
padded_tensor_dict[
fields.InputDataFields.context_features].shape.as_list(),
[max_num_context_features, context_feature_length])
with self.test_session() as sess:
feed_dict = {
input_tensor_dict[fields.InputDataFields.context_features]:
np.ones([context_memory_size, context_feature_length],
dtype=np.float32),
input_tensor_dict[fields.InputDataFields.context_feature_length]:
context_feature_length
}
padded_tensor_dict_out = sess.run(padded_tensor_dict, feed_dict=feed_dict)
self.assertEqual(
padded_tensor_dict_out[fields.InputDataFields.valid_context_size],
context_memory_size)
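# In plain NumPy, the padding exercised by the test above amounts to the
# following (sizes taken from the test; illustrative sketch only):
#   context_features = np.ones([8, 10], np.float32)            # memory x length
#   padded = np.pad(context_features, [[0, 20 - 8], [0, 0]])   # shape [20, 10]
#   valid_context_size = context_features.shape[0]             # 8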
if __name__ == '__main__':
tf.test.main()
......@@ -15,65 +15,72 @@
"""Tests for object_detection.core.bipartite_matcher."""
import numpy as np
import tensorflow as tf
from object_detection.matchers import bipartite_matcher
from object_detection.utils import test_case
class GreedyBipartiteMatcherTest(tf.test.TestCase):
class GreedyBipartiteMatcherTest(test_case.TestCase):
def test_get_expected_matches_when_all_rows_are_valid(self):
similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]])
valid_rows = tf.ones([2], dtype=tf.bool)
similarity_matrix = np.array([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]],
dtype=np.float32)
valid_rows = np.ones([2], dtype=np.bool)
expected_match_results = [-1, 1, 0]
def graph_fn(similarity_matrix, valid_rows):
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, valid_rows=valid_rows)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
return match._match_results
match_results_out = self.execute(graph_fn, [similarity_matrix, valid_rows])
self.assertAllEqual(match_results_out, expected_match_results)
def test_get_expected_matches_with_all_rows_be_default(self):
similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]])
similarity_matrix = np.array([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]],
dtype=np.float32)
expected_match_results = [-1, 1, 0]
def graph_fn(similarity_matrix):
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
return match._match_results
match_results_out = self.execute(graph_fn, [similarity_matrix])
self.assertAllEqual(match_results_out, expected_match_results)
def test_get_no_matches_with_zero_valid_rows(self):
similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]])
valid_rows = tf.zeros([2], dtype=tf.bool)
similarity_matrix = np.array([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]],
dtype=np.float32)
valid_rows = np.zeros([2], dtype=np.bool)
expected_match_results = [-1, -1, -1]
def graph_fn(similarity_matrix, valid_rows):
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, valid_rows)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
match = matcher.match(similarity_matrix, valid_rows=valid_rows)
return match._match_results
match_results_out = self.execute(graph_fn, [similarity_matrix, valid_rows])
self.assertAllEqual(match_results_out, expected_match_results)
def test_get_expected_matches_with_only_one_valid_row(self):
similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]])
valid_rows = tf.constant([True, False], dtype=tf.bool)
similarity_matrix = np.array([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]],
dtype=np.float32)
valid_rows = np.array([True, False], dtype=np.bool)
expected_match_results = [-1, -1, 0]
def graph_fn(similarity_matrix, valid_rows):
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, valid_rows)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
match = matcher.match(similarity_matrix, valid_rows=valid_rows)
return match._match_results
match_results_out = self.execute(graph_fn, [similarity_matrix, valid_rows])
self.assertAllEqual(match_results_out, expected_match_results)
def test_get_expected_matches_with_only_one_valid_row_at_bottom(self):
similarity_matrix = tf.constant([[0.15, 0.2, 0.3], [0.50, 0.1, 0.8]])
valid_rows = tf.constant([False, True], dtype=tf.bool)
similarity_matrix = np.array([[0.15, 0.2, 0.3], [0.50, 0.1, 0.8]],
dtype=np.float32)
valid_rows = np.array([False, True], dtype=np.bool)
expected_match_results = [-1, -1, 0]
def graph_fn(similarity_matrix, valid_rows):
matcher = bipartite_matcher.GreedyBipartiteMatcher()
match = matcher.match(similarity_matrix, valid_rows)
with self.test_session() as sess:
match_results_out = sess.run(match._match_results)
match = matcher.match(similarity_matrix, valid_rows=valid_rows)
return match._match_results
match_results_out = self.execute(graph_fn, [similarity_matrix, valid_rows])
self.assertAllEqual(match_results_out, expected_match_results)
......
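The conversion above follows the pattern used throughout this change: build the graph inside a graph_fn that takes NumPy inputs and returns tensors, then run it through test_case.execute so the same test body works under TF1 graph execution and TF2 eager execution. A minimal sketch (hypothetical toy test, for illustration only):

def test_sum_sketch(self):
  x = np.array([1.0, 2.0, 3.0], dtype=np.float32)

  def graph_fn(x):
    # Any TF computation under test.
    return tf.reduce_sum(x)

  self.assertAllClose(self.execute(graph_fn, [x]), 6.0)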
......@@ -92,11 +92,11 @@ configured in the meta architecture:
non-max suppression and normalize them. In this case, the `postprocess` method
skips both `_postprocess_rpn` and `_postprocess_box_classifier`.
"""
from __future__ import print_function
import abc
import functools
import tensorflow as tf
from tensorflow.contrib import framework as contrib_framework
from tensorflow.contrib import slim as contrib_slim
from object_detection.anchor_generators import grid_anchor_generator
from object_detection.builders import box_predictor_builder
......@@ -112,7 +112,14 @@ from object_detection.utils import ops
from object_detection.utils import shape_utils
from object_detection.utils import variables_helper
slim = contrib_slim
# pylint: disable=g-import-not-at-top
try:
from tensorflow.contrib import framework as contrib_framework
from tensorflow.contrib import slim as contrib_slim
except ImportError:
# TF 2.0 doesn't ship with contrib.
pass
# pylint: enable=g-import-not-at-top
_UNINITIALIZED_FEATURE_EXTRACTOR = '__uninitialized__'
......@@ -141,7 +148,7 @@ class FasterRCNNFeatureExtractor(object):
self._is_training = is_training
self._first_stage_features_stride = first_stage_features_stride
self._train_batch_norm = (batch_norm_trainable and is_training)
self._reuse_weights = reuse_weights
self._reuse_weights = tf.AUTO_REUSE if reuse_weights else None
self._weight_decay = weight_decay
@abc.abstractmethod
......@@ -329,7 +336,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
use_static_shapes=False,
resize_masks=True,
freeze_batchnorm=False,
return_raw_detections_during_predict=False):
return_raw_detections_during_predict=False,
output_final_box_features=False):
"""FasterRCNNMetaArch Constructor.
Args:
......@@ -461,6 +469,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
return_raw_detections_during_predict: Whether to return raw detection
boxes in the predict() method. These are decoded boxes that have not
been through postprocessing (i.e. NMS). Default False.
output_final_box_features: Whether to output final box features. If true,
it crops the feature map based on the final box predictions and returns
them in the dict as detection_features.
Raises:
ValueError: If `second_stage_batch_size` > `first_stage_max_proposals` at
training time.
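A minimal usage sketch (names assumed from the tests later in this diff, not a definitive API description): with output_final_box_features=True, the postprocessed detections carry an extra detection_features entry cropped from rpn_features_to_crop.

# Hypothetical call; `model` was built with output_final_box_features=True.
detections = model.postprocess(prediction_dict, true_image_shapes)
# Raises ValueError if prediction_dict does not contain 'rpn_features_to_crop'.
box_features = detections['detection_features']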
......@@ -554,13 +566,16 @@ class FasterRCNNMetaArch(model.DetectionModel):
self._first_stage_box_predictor_arg_scope_fn = (
first_stage_box_predictor_arg_scope_fn)
def rpn_box_predictor_feature_extractor(rpn_features_to_crop):
with slim.arg_scope(self._first_stage_box_predictor_arg_scope_fn()):
with contrib_slim.arg_scope(
self._first_stage_box_predictor_arg_scope_fn()):
reuse = tf.get_variable_scope().reuse
return slim.conv2d(
return contrib_slim.conv2d(
rpn_features_to_crop,
self._first_stage_box_predictor_depth,
kernel_size=[self._first_stage_box_predictor_kernel_size,
self._first_stage_box_predictor_kernel_size],
kernel_size=[
self._first_stage_box_predictor_kernel_size,
self._first_stage_box_predictor_kernel_size
],
rate=self._first_stage_atrous_rate,
activation_fn=tf.nn.relu6,
scope='Conv',
......@@ -630,6 +645,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
self._batched_prediction_tensor_names = []
self._return_raw_detections_during_predict = (
return_raw_detections_during_predict)
self._output_final_box_features = output_final_box_features
@property
def first_stage_feature_extractor_scope(self):
......@@ -678,6 +694,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
'tensor names.')
return self._batched_prediction_tensor_names
@property
def feature_extractor(self):
return self._feature_extractor
def preprocess(self, inputs):
"""Feature-extractor specific preprocessing.
......@@ -746,7 +766,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
anchors, image_shape_2d, true_image_shapes)
return proposal_boxes_normalized, num_proposals
def predict(self, preprocessed_inputs, true_image_shapes):
def predict(self, preprocessed_inputs, true_image_shapes, **side_inputs):
"""Predicts unpostprocessed tensors from input tensor.
This function takes an input batch of images and runs it through the
......@@ -771,6 +791,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
**side_inputs: additional tensors that are required by the network.
Returns:
prediction_dict: a dictionary holding "raw" prediction tensors:
......@@ -841,9 +862,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
prediction_dict['rpn_box_encodings'],
prediction_dict['rpn_objectness_predictions_with_background'],
prediction_dict['rpn_features_to_crop'],
prediction_dict['anchors'],
prediction_dict['image_shape'],
true_image_shapes))
prediction_dict['anchors'], prediction_dict['image_shape'],
true_image_shapes, **side_inputs))
if self._number_of_stages == 3:
prediction_dict = self._predict_third_stage(prediction_dict,
......@@ -948,14 +968,12 @@ class FasterRCNNMetaArch(model.DetectionModel):
def _predict_second_stage(self, rpn_box_encodings,
rpn_objectness_predictions_with_background,
rpn_features_to_crop,
anchors,
image_shape,
true_image_shapes):
rpn_features_to_crop, anchors, image_shape,
true_image_shapes, **side_inputs):
"""Predicts the output tensors from second stage of Faster R-CNN.
Args:
rpn_box_encodings: 4-D float tensor of shape
rpn_box_encodings: 3-D float tensor of shape
[batch_size, num_valid_anchors, self._box_coder.code_size] containing
predicted boxes.
rpn_objectness_predictions_with_background: 2-D float tensor of shape
......@@ -972,6 +990,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
**side_inputs: additional tensors that are required by the network.
Returns:
prediction_dict: a dictionary holding "raw" prediction tensors:
......@@ -1016,12 +1035,13 @@ class FasterRCNNMetaArch(model.DetectionModel):
image_shape, true_image_shapes)
prediction_dict = self._box_prediction(rpn_features_to_crop,
proposal_boxes_normalized,
image_shape, true_image_shapes)
image_shape, true_image_shapes,
**side_inputs)
prediction_dict['num_proposals'] = num_proposals
return prediction_dict
def _box_prediction(self, rpn_features_to_crop, proposal_boxes_normalized,
image_shape, true_image_shapes):
image_shape, true_image_shapes, **side_inputs):
"""Predicts the output tensors from second stage of Faster R-CNN.
Args:
......@@ -1037,6 +1057,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
**side_inputs: additional tensors that are required by the network.
Returns:
prediction_dict: a dictionary holding "raw" prediction tensors:
......@@ -1076,7 +1097,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
"""
flattened_proposal_feature_maps = (
self._compute_second_stage_input_feature_maps(
rpn_features_to_crop, proposal_boxes_normalized))
rpn_features_to_crop, proposal_boxes_normalized, **side_inputs))
box_classifier_features = self._extract_box_classifier_features(
flattened_proposal_feature_maps)
......@@ -1508,16 +1529,21 @@ class FasterRCNNMetaArch(model.DetectionModel):
Raises:
ValueError: If `predict` is called before `preprocess`.
ValueError: If `_output_final_box_features` is true but
rpn_features_to_crop is not in the prediction_dict.
"""
with tf.name_scope('FirstStagePostprocessor'):
if self._number_of_stages == 1:
image_shapes = self._image_batch_shape_2d(
prediction_dict['image_shape'])
(proposal_boxes, proposal_scores, proposal_multiclass_scores,
num_proposals, raw_proposal_boxes,
raw_proposal_scores) = self._postprocess_rpn(
prediction_dict['rpn_box_encodings'],
prediction_dict['rpn_objectness_predictions_with_background'],
prediction_dict['anchors'], true_image_shapes, true_image_shapes)
prediction_dict['anchors'], image_shapes, true_image_shapes)
return {
fields.DetectionResultFields.detection_boxes:
proposal_boxes,
......@@ -1546,7 +1572,11 @@ class FasterRCNNMetaArch(model.DetectionModel):
true_image_shapes,
mask_predictions=mask_predictions)
if 'rpn_features_to_crop' in prediction_dict and self._initial_crop_size:
if self._output_final_box_features:
if 'rpn_features_to_crop' not in prediction_dict:
raise ValueError(
'Please make sure rpn_features_to_crop is in the prediction_dict.'
)
detections_dict[
'detection_features'] = self._add_detection_features_output_node(
detections_dict[fields.DetectionResultFields.detection_boxes],
......@@ -1679,7 +1709,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
rpn_objectness_softmax = tf.nn.softmax(
rpn_objectness_predictions_with_background_batch)
rpn_objectness_softmax_without_background = rpn_objectness_softmax[:, :, 1]
clip_window = self._compute_clip_window(image_shapes)
clip_window = self._compute_clip_window(true_image_shapes)
additional_fields = {'multiclass_scores': rpn_objectness_softmax}
(proposal_boxes, proposal_scores, _, _, nmsed_additional_fields,
num_proposals) = self._first_stage_nms_fn(
......@@ -1692,7 +1722,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
if not self._hard_example_miner:
(groundtruth_boxlists, groundtruth_classes_with_background_list, _,
groundtruth_weights_list
) = self._format_groundtruth_data(true_image_shapes)
) = self._format_groundtruth_data(image_shapes)
(proposal_boxes, proposal_scores,
num_proposals) = self._sample_box_classifier_batch(
proposal_boxes, proposal_scores, num_proposals,
......@@ -1798,7 +1828,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
tf.stack(single_image_proposal_score_sample),
tf.stack(single_image_num_proposals_sample))
def _format_groundtruth_data(self, true_image_shapes):
def _format_groundtruth_data(self, image_shapes):
"""Helper function for preparing groundtruth data for target assignment.
In order to be consistent with the model.DetectionModel interface,
......@@ -1811,10 +1841,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
image_shape.
Args:
true_image_shapes: int32 tensor of shape [batch, 3] where each row is
of the form [height, width, channels] indicating the shapes
of true images in the resized images, as resized images can be padded
with zeros.
image_shapes: a 2-D int32 tensor of shape [batch_size, 3] containing
the shapes of the input images in the batch.
Returns:
groundtruth_boxlists: A list of BoxLists containing (absolute) coordinates
......@@ -1826,10 +1854,10 @@ class FasterRCNNMetaArch(model.DetectionModel):
shape [num_boxes, image_height, image_width] containing instance masks.
This is set to None if no masks exist in the provided groundtruth.
"""
# pylint: disable=g-complex-comprehension
groundtruth_boxlists = [
box_list_ops.to_absolute_coordinates(
box_list.BoxList(boxes), true_image_shapes[i, 0],
true_image_shapes[i, 1])
box_list.BoxList(boxes), image_shapes[i, 0], image_shapes[i, 1])
for i, boxes in enumerate(
self.groundtruth_lists(fields.BoxListFields.boxes))
]
......@@ -1934,7 +1962,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
if self._use_static_shapes else None))
def _compute_second_stage_input_feature_maps(self, features_to_crop,
proposal_boxes_normalized):
proposal_boxes_normalized,
**side_inputs):
"""Crops to a set of proposals from the feature map for a batch of images.
Helper function for self._postprocess_rpn. This function calls
......@@ -1947,6 +1976,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
proposal_boxes_normalized: A float32 tensor with shape [batch_size,
num_proposals, box_code_size] containing proposal boxes in
normalized coordinates.
**side_inputs: additional tensors that are required by the network.
Returns:
A float32 tensor with shape [K, new_height, new_width, depth].
......@@ -2189,7 +2219,8 @@ class FasterRCNNMetaArch(model.DetectionModel):
with tf.name_scope(scope, 'Loss', prediction_dict.values()):
(groundtruth_boxlists, groundtruth_classes_with_background_list,
groundtruth_masks_list, groundtruth_weights_list
) = self._format_groundtruth_data(true_image_shapes)
) = self._format_groundtruth_data(
self._image_batch_shape_2d(prediction_dict['image_shape']))
loss_dict = self._loss_rpn(
prediction_dict['rpn_box_encodings'],
prediction_dict['rpn_objectness_predictions_with_background'],
......@@ -2222,7 +2253,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
participate in the loss computation, and returns the RPN losses.
Args:
rpn_box_encodings: A 4-D float tensor of shape
rpn_box_encodings: A 3-D float tensor of shape
[batch_size, num_anchors, self._box_coder.code_size] containing
predicted proposal box encodings.
rpn_objectness_predictions_with_background: A 2-D float tensor of shape
......@@ -2765,7 +2796,7 @@ class FasterRCNNMetaArch(model.DetectionModel):
self.second_stage_feature_extractor_scope)
variables_to_restore = variables_helper.get_global_variables_safely()
variables_to_restore.append(slim.get_or_create_global_step())
variables_to_restore.append(tf.train.get_or_create_global_step())
# Only load feature extractor variables to be consistent with loading from
# a classification checkpoint.
include_patterns = None
......
# Lint as: python2, python3
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -15,8 +16,14 @@
"""Tests for object_detection.meta_architectures.faster_rcnn_meta_arch."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import numpy as np
from six.moves import range
from six.moves import zip
import tensorflow as tf
from object_detection.meta_architectures import faster_rcnn_meta_arch_test_lib
......@@ -488,8 +495,8 @@ class FasterRCNNMetaArchTest(
batch_size = 2
initial_crop_size = 3
maxpool_stride = 1
height = initial_crop_size/maxpool_stride
width = initial_crop_size/maxpool_stride
height = initial_crop_size // maxpool_stride
width = initial_crop_size // maxpool_stride
depth = 3
image_shape = np.array((2, 36, 48, 3), dtype=np.int32)
for (num_proposals_shape, refined_box_encoding_shape,
......@@ -574,9 +581,102 @@ class FasterRCNNMetaArchTest(
maxpool_stride,
num_features):
return (batch_size * max_num_proposals,
initial_crop_size/maxpool_stride,
initial_crop_size/maxpool_stride,
initial_crop_size // maxpool_stride,
initial_crop_size // maxpool_stride,
num_features)
@parameterized.parameters({'use_keras': True}, {'use_keras': False})
def test_output_final_box_features(self, use_keras):
model = self._build_model(
is_training=False,
use_keras=use_keras,
number_of_stages=2,
second_stage_batch_size=6,
output_final_box_features=True)
batch_size = 2
total_num_padded_proposals = batch_size * model.max_num_proposals
proposal_boxes = tf.constant([[[1, 1, 2, 3], [0, 0, 1, 1], [.5, .5, .6, .6],
4 * [0], 4 * [0], 4 * [0], 4 * [0], 4 * [0]],
[[2, 3, 6, 8], [1, 2, 5, 3], 4 * [0], 4 * [0],
4 * [0], 4 * [0], 4 * [0], 4 * [0]]],
dtype=tf.float32)
num_proposals = tf.constant([3, 2], dtype=tf.int32)
refined_box_encodings = tf.zeros(
[total_num_padded_proposals, model.num_classes, 4], dtype=tf.float32)
class_predictions_with_background = tf.ones(
[total_num_padded_proposals, model.num_classes + 1], dtype=tf.float32)
image_shape = tf.constant([batch_size, 36, 48, 3], dtype=tf.int32)
mask_height = 2
mask_width = 2
mask_predictions = 30. * tf.ones([
total_num_padded_proposals, model.num_classes, mask_height, mask_width
],
dtype=tf.float32)
exp_detection_masks = np.array([[[[1, 1], [1, 1]], [[1, 1], [1, 1]],
[[1, 1], [1, 1]], [[1, 1], [1, 1]],
[[1, 1], [1, 1]]],
[[[1, 1], [1, 1]], [[1, 1], [1, 1]],
[[1, 1], [1, 1]], [[1, 1], [1, 1]],
[[0, 0], [0, 0]]]])
_, true_image_shapes = model.preprocess(tf.zeros(image_shape))
# It should fail due to no rpn_features_to_crop in the input dict.
with self.assertRaises(ValueError):
detections = model.postprocess(
{
'refined_box_encodings':
refined_box_encodings,
'class_predictions_with_background':
class_predictions_with_background,
'num_proposals':
num_proposals,
'proposal_boxes':
proposal_boxes,
'image_shape':
image_shape,
'mask_predictions':
mask_predictions
}, true_image_shapes)
rpn_features_to_crop = tf.ones((batch_size, mask_height, mask_width, 3),
tf.float32)
detections = model.postprocess(
{
'refined_box_encodings':
refined_box_encodings,
'class_predictions_with_background':
class_predictions_with_background,
'num_proposals':
num_proposals,
'proposal_boxes':
proposal_boxes,
'image_shape':
image_shape,
'mask_predictions':
mask_predictions,
'rpn_features_to_crop':
rpn_features_to_crop
}, true_image_shapes)
with self.test_session() as sess:
init_op = tf.global_variables_initializer()
sess.run(init_op)
detections_out = sess.run(detections)
self.assertAllEqual(detections_out['detection_boxes'].shape, [2, 5, 4])
self.assertAllClose(detections_out['detection_scores'],
[[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]])
self.assertAllClose(detections_out['detection_classes'],
[[0, 0, 0, 1, 1], [0, 0, 1, 1, 0]])
self.assertAllClose(detections_out['num_detections'], [5, 4])
self.assertAllClose(detections_out['detection_masks'],
exp_detection_masks)
self.assertTrue(np.amax(detections_out['detection_masks']) <= 1.0)
self.assertTrue(np.amin(detections_out['detection_masks']) >= 0.0)
self.assertIn('detection_features', detections_out)
if __name__ == '__main__':
tf.test.main()
......@@ -18,10 +18,10 @@ import functools
from absl.testing import parameterized
import numpy as np
import six
import tensorflow as tf
from google.protobuf import text_format
from tensorflow.contrib import slim as contrib_slim
from object_detection.anchor_generators import grid_anchor_generator
from object_detection.builders import box_predictor_builder
from object_detection.builders import hyperparams_builder
......@@ -38,7 +38,14 @@ from object_detection.utils import ops
from object_detection.utils import test_case
from object_detection.utils import test_utils
slim = contrib_slim
# pylint: disable=g-import-not-at-top
try:
from tensorflow.contrib import slim as contrib_slim
except ImportError:
# TF 2.0 doesn't ship with contrib.
pass
# pylint: enable=g-import-not-at-top
BOX_CODE_SIZE = 4
......@@ -58,14 +65,14 @@ class FakeFasterRCNNFeatureExtractor(
def _extract_proposal_features(self, preprocessed_inputs, scope):
with tf.variable_scope('mock_model'):
proposal_features = 0 * slim.conv2d(
proposal_features = 0 * contrib_slim.conv2d(
preprocessed_inputs, num_outputs=3, kernel_size=1, scope='layer1')
return proposal_features, {}
def _extract_box_classifier_features(self, proposal_feature_maps, scope):
with tf.variable_scope('mock_model'):
return 0 * slim.conv2d(proposal_feature_maps,
num_outputs=3, kernel_size=1, scope='layer2')
return 0 * contrib_slim.conv2d(
proposal_feature_maps, num_outputs=3, kernel_size=1, scope='layer2')
class FakeFasterRCNNKerasFeatureExtractor(
......@@ -226,7 +233,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
use_static_shapes=False,
calibration_mapping_value=None,
share_box_across_classes=False,
return_raw_detections_during_predict=False):
return_raw_detections_during_predict=False,
output_final_box_features=False):
def image_resizer_fn(image, masks=None):
"""Fake image resizer function."""
......@@ -372,47 +380,70 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
ops.matmul_crop_and_resize
if use_matmul_crop_and_resize else ops.native_crop_and_resize)
common_kwargs = {
'is_training': is_training,
'num_classes': num_classes,
'image_resizer_fn': image_resizer_fn,
'feature_extractor': fake_feature_extractor,
'number_of_stages': number_of_stages,
'first_stage_anchor_generator': first_stage_anchor_generator,
'first_stage_target_assigner': first_stage_target_assigner,
'first_stage_atrous_rate': first_stage_atrous_rate,
'is_training':
is_training,
'num_classes':
num_classes,
'image_resizer_fn':
image_resizer_fn,
'feature_extractor':
fake_feature_extractor,
'number_of_stages':
number_of_stages,
'first_stage_anchor_generator':
first_stage_anchor_generator,
'first_stage_target_assigner':
first_stage_target_assigner,
'first_stage_atrous_rate':
first_stage_atrous_rate,
'first_stage_box_predictor_arg_scope_fn':
first_stage_box_predictor_arg_scope_fn,
'first_stage_box_predictor_kernel_size':
first_stage_box_predictor_kernel_size,
'first_stage_box_predictor_depth': first_stage_box_predictor_depth,
'first_stage_minibatch_size': first_stage_minibatch_size,
'first_stage_sampler': first_stage_sampler,
'first_stage_box_predictor_depth':
first_stage_box_predictor_depth,
'first_stage_minibatch_size':
first_stage_minibatch_size,
'first_stage_sampler':
first_stage_sampler,
'first_stage_non_max_suppression_fn':
first_stage_non_max_suppression_fn,
'first_stage_max_proposals': first_stage_max_proposals,
'first_stage_max_proposals':
first_stage_max_proposals,
'first_stage_localization_loss_weight':
first_stage_localization_loss_weight,
'first_stage_objectness_loss_weight':
first_stage_objectness_loss_weight,
'second_stage_target_assigner': second_stage_target_assigner,
'second_stage_batch_size': second_stage_batch_size,
'second_stage_sampler': second_stage_sampler,
'second_stage_target_assigner':
second_stage_target_assigner,
'second_stage_batch_size':
second_stage_batch_size,
'second_stage_sampler':
second_stage_sampler,
'second_stage_non_max_suppression_fn':
second_stage_non_max_suppression_fn,
'second_stage_score_conversion_fn': second_stage_score_conversion_fn,
'second_stage_score_conversion_fn':
second_stage_score_conversion_fn,
'second_stage_localization_loss_weight':
second_stage_localization_loss_weight,
'second_stage_classification_loss_weight':
second_stage_classification_loss_weight,
'second_stage_classification_loss':
second_stage_classification_loss,
'hard_example_miner': hard_example_miner,
'crop_and_resize_fn': crop_and_resize_fn,
'clip_anchors_to_image': clip_anchors_to_image,
'use_static_shapes': use_static_shapes,
'resize_masks': True,
'hard_example_miner':
hard_example_miner,
'crop_and_resize_fn':
crop_and_resize_fn,
'clip_anchors_to_image':
clip_anchors_to_image,
'use_static_shapes':
use_static_shapes,
'resize_masks':
True,
'return_raw_detections_during_predict':
return_raw_detections_during_predict
return_raw_detections_during_predict,
'output_final_box_features':
output_final_box_features
}
return self._get_model(
......@@ -866,12 +897,13 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
use_matmul_gather_in_matcher=use_static_shapes,
first_stage_max_proposals=first_stage_max_proposals,
pad_to_max_dimension=pad_to_max_dimension)
_, true_image_shapes = model.preprocess(images)
preprocessed_images, true_image_shapes = model.preprocess(images)
proposals = model.postprocess({
'rpn_box_encodings': rpn_box_encodings,
'rpn_objectness_predictions_with_background':
rpn_objectness_predictions_with_background,
'rpn_features_to_crop': rpn_features_to_crop,
'image_shape': tf.shape(preprocessed_images),
'anchors': anchors}, true_image_shapes)
return (proposals['num_detections'], proposals['detection_boxes'],
proposals['detection_scores'], proposals['raw_detection_boxes'],
......@@ -925,6 +957,12 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
expected_raw_scores = [[[0., 1.], [1., 0.], [1., 0.], [0., 1.]],
[[1., 0.], [0., 1.], [0., 1.], [1., 0.]]]
if pad_to_max_dimension is not None:
expected_raw_proposal_boxes = (np.array(expected_raw_proposal_boxes) *
32 / pad_to_max_dimension)
expected_proposal_boxes = (np.array(expected_proposal_boxes) *
32 / pad_to_max_dimension)
self.assertAllClose(results[0], expected_num_proposals)
for indx, num_proposals in enumerate(expected_num_proposals):
self.assertAllClose(results[1][indx][0:num_proposals],
......@@ -982,7 +1020,8 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
'rpn_objectness_predictions_with_background':
rpn_objectness_predictions_with_background,
'rpn_features_to_crop': rpn_features_to_crop,
'anchors': anchors}, true_image_shapes)
'anchors': anchors,
'image_shape': image_shape}, true_image_shapes)
expected_proposal_boxes = [
[[0, 0, .5, .5], [.5, .5, 1, 1]], [[0, .5, .5, 1], [.5, 0, 1, .5]]]
expected_proposal_scores = [[1, 1],
......@@ -1933,8 +1972,9 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
with test_graph_classification.as_default():
image = tf.placeholder(dtype=tf.float32, shape=[1, 20, 20, 3])
with tf.variable_scope('mock_model'):
net = slim.conv2d(image, num_outputs=3, kernel_size=1, scope='layer1')
slim.conv2d(net, num_outputs=3, kernel_size=1, scope='layer2')
net = contrib_slim.conv2d(
image, num_outputs=3, kernel_size=1, scope='layer1')
contrib_slim.conv2d(net, num_outputs=3, kernel_size=1, scope='layer2')
init_op = tf.global_variables_initializer()
saver = tf.train.Saver()
......@@ -2012,10 +2052,12 @@ class FasterRCNNMetaArchTestBase(test_case.TestCase, parameterized.TestCase):
with self.test_session(graph=test_graph_detection2) as sess:
saver.restore(sess, saved_model_path)
uninitialized_vars_list = sess.run(tf.report_uninitialized_variables())
self.assertIn('another_variable', uninitialized_vars_list)
self.assertIn(six.b('another_variable'), uninitialized_vars_list)
for var in uninitialized_vars_list:
self.assertNotIn(model2.first_stage_feature_extractor_scope, var)
self.assertNotIn(model2.second_stage_feature_extractor_scope, var)
self.assertNotIn(
six.b(model2.first_stage_feature_extractor_scope), var)
self.assertNotIn(
six.b(model2.second_stage_feature_extractor_scope), var)
@parameterized.parameters(
{'use_keras': True},
......
......@@ -83,7 +83,8 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
use_static_shapes=False,
resize_masks=False,
freeze_batchnorm=False,
return_raw_detections_during_predict=False):
return_raw_detections_during_predict=False,
output_final_box_features=False):
"""RFCNMetaArch Constructor.
Args:
......@@ -192,6 +193,9 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
return_raw_detections_during_predict: Whether to return raw detection
boxes in the predict() method. These are decoded boxes that have not
been through postprocessing (i.e. NMS). Default False.
output_final_box_features: Whether to output final box features. If true,
it crops the feature map based on the final box predictions and returns
them in the dict as detection_features.
Raises:
ValueError: If `second_stage_batch_size` > `first_stage_max_proposals`
......@@ -240,7 +244,8 @@ class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch):
resize_masks,
freeze_batchnorm=freeze_batchnorm,
return_raw_detections_during_predict=(
return_raw_detections_during_predict))
return_raw_detections_during_predict),
output_final_box_features=output_final_box_features)
self._rfcn_box_predictor = second_stage_rfcn_box_predictor
......
......@@ -19,9 +19,7 @@ models.
"""
import abc
import tensorflow as tf
from tensorflow.contrib import slim as contrib_slim
from tensorflow.contrib import tpu as contrib_tpu
from tensorflow.python.util.deprecation import deprecated_args
from object_detection.core import box_list
from object_detection.core import box_list_ops
from object_detection.core import matcher
......@@ -33,7 +31,14 @@ from object_detection.utils import shape_utils
from object_detection.utils import variables_helper
from object_detection.utils import visualization_utils
slim = contrib_slim
# pylint: disable=g-import-not-at-top
try:
from tensorflow.contrib import slim as contrib_slim
except ImportError:
# TF 2.0 doesn't ship with contrib.
pass
# pylint: enable=g-import-not-at-top
class SSDFeatureExtractor(object):
......@@ -278,6 +283,9 @@ class SSDKerasFeatureExtractor(tf.keras.Model):
class SSDMetaArch(model.DetectionModel):
"""SSD Meta-architecture definition."""
@deprecated_args(None,
'NMS is always placed on TPU; do not use nms_on_host '
'as it has no effect.', 'nms_on_host')
def __init__(self,
is_training,
anchor_generator,
......@@ -457,7 +465,10 @@ class SSDMetaArch(model.DetectionModel):
self._return_raw_detections_during_predict = (
return_raw_detections_during_predict)
self._nms_on_host = nms_on_host
@property
def feature_extractor(self):
return self._feature_extractor
@property
def anchors(self):
......@@ -590,9 +601,9 @@ class SSDMetaArch(model.DetectionModel):
if self._feature_extractor.is_keras_model:
feature_maps = self._feature_extractor(preprocessed_inputs)
else:
with slim.arg_scope([slim.batch_norm],
is_training=(self._is_training and
not self._freeze_batchnorm),
with contrib_slim.arg_scope(
[contrib_slim.batch_norm],
is_training=(self._is_training and not self._freeze_batchnorm),
updates_collections=batchnorm_updates_collections):
with tf.variable_scope(None, self._extract_features_scope,
[preprocessed_inputs]):
......@@ -611,9 +622,9 @@ class SSDMetaArch(model.DetectionModel):
if self._box_predictor.is_keras_model:
predictor_results_dict = self._box_predictor(feature_maps)
else:
with slim.arg_scope([slim.batch_norm],
is_training=(self._is_training and
not self._freeze_batchnorm),
with contrib_slim.arg_scope(
[contrib_slim.batch_norm],
is_training=(self._is_training and not self._freeze_batchnorm),
updates_collections=batchnorm_updates_collections):
predictor_results_dict = self._box_predictor.predict(
feature_maps, self._anchor_generator.num_anchors_per_location())
......@@ -780,36 +791,16 @@ class SSDMetaArch(model.DetectionModel):
detection_keypoints, 'raw_keypoint_locations')
additional_fields[fields.BoxListFields.keypoints] = detection_keypoints
with tf.init_scope():
if tf.executing_eagerly():
# soft device placement in eager mode will automatically handle
# outside compilation.
def _non_max_suppression_wrapper(kwargs):
return self._non_max_suppression_fn(**kwargs)
else:
def _non_max_suppression_wrapper(kwargs):
if self._nms_on_host:
# Note: NMS is not memory efficient on TPU. This forces the NMS
# to run outside of the TPU.
return contrib_tpu.outside_compilation(
lambda x: self._non_max_suppression_fn(**x), kwargs)
else:
return self._non_max_suppression_fn(**kwargs)
(nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_masks,
nmsed_additional_fields,
num_detections) = _non_max_suppression_wrapper({
'boxes':
num_detections) = self._non_max_suppression_fn(
detection_boxes,
'scores':
detection_scores,
'clip_window':
self._compute_clip_window(preprocessed_images, true_image_shapes),
'additional_fields':
additional_fields,
'masks':
prediction_dict.get('mask_predictions')
})
clip_window=self._compute_clip_window(
preprocessed_images, true_image_shapes),
additional_fields=additional_fields,
masks=prediction_dict.get('mask_predictions'))
detection_dict = {
fields.DetectionResultFields.detection_boxes:
nmsed_boxes,
......@@ -817,9 +808,6 @@ class SSDMetaArch(model.DetectionModel):
nmsed_scores,
fields.DetectionResultFields.detection_classes:
nmsed_classes,
fields.DetectionResultFields.detection_multiclass_scores:
nmsed_additional_fields.get(
'multiclass_scores') if nmsed_additional_fields else None,
fields.DetectionResultFields.num_detections:
tf.cast(num_detections, dtype=tf.float32),
fields.DetectionResultFields.raw_detection_boxes:
......@@ -827,6 +815,12 @@ class SSDMetaArch(model.DetectionModel):
fields.DetectionResultFields.raw_detection_scores:
detection_scores_with_background
}
if (nmsed_additional_fields is not None and
fields.InputDataFields.multiclass_scores in nmsed_additional_fields):
detection_dict[
fields.DetectionResultFields.detection_multiclass_scores] = (
nmsed_additional_fields[
fields.InputDataFields.multiclass_scores])
if (nmsed_additional_fields is not None and
'anchor_indices' in nmsed_additional_fields):
detection_dict.update({
......@@ -907,6 +901,8 @@ class SSDMetaArch(model.DetectionModel):
if self.groundtruth_has_field(fields.InputDataFields.is_annotated):
losses_mask = tf.stack(self.groundtruth_lists(
fields.InputDataFields.is_annotated))
location_losses = self._localization_loss(
prediction_dict['box_encodings'],
batch_reg_targets,
......@@ -1068,10 +1064,14 @@ class SSDMetaArch(model.DetectionModel):
batch_reg_targets: a tensor with shape [batch_size, num_anchors,
box_code_dimension]
batch_reg_weights: a tensor with shape [batch_size, num_anchors],
match_list: a list of matcher.Match objects encoding the match between
anchors and groundtruth boxes for each image of the batch,
with rows of the Match objects corresponding to groundtruth boxes
and columns corresponding to anchors.
match: an int32 tensor of shape [batch_size, num_anchors] containing the
result of anchor-groundtruth matching. Each position in the tensor
corresponds to an anchor and holds the following meaning:
(1) if match[x, i] >= 0, anchor i is matched with groundtruth
match[x, i].
(2) if match[x, i] = -1, anchor i is marked as background.
(3) if match[x, i] = -2, anchor i is ignored: it is not background but
does not have sufficient overlap to be called foreground.
"""
groundtruth_boxlists = [
box_list.BoxList(boxes) for boxes in groundtruth_boxes_list
......
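To make the match encoding described above concrete, a tiny illustrative example (values invented):

# One row of `match` for an image x with four anchors:
#   match[x] = [2, -1, -2, 0]
# anchor 0 is matched to groundtruth box 2,
# anchor 1 is background,
# anchor 2 is ignored (insufficient overlap to be foreground),
# anchor 3 is matched to groundtruth box 0.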
# Lint as: python2, python3
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -15,18 +16,30 @@
"""Tests for object_detection.meta_architectures.ssd_meta_arch."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import numpy as np
import six
from six.moves import range
import tensorflow as tf
from tensorflow.contrib import slim as contrib_slim
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.meta_architectures import ssd_meta_arch_test_lib
from object_detection.protos import model_pb2
from object_detection.utils import test_utils
slim = contrib_slim
# pylint: disable=g-import-not-at-top
try:
from tensorflow.contrib import slim as contrib_slim
except ImportError:
# TF 2.0 doesn't ship with contrib.
pass
# pylint: enable=g-import-not-at-top
keras = tf.keras.layers
......@@ -681,9 +694,9 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
layer_two(net)
else:
with tf.variable_scope('mock_model'):
net = slim.conv2d(image, num_outputs=32, kernel_size=1,
scope='layer1')
slim.conv2d(net, num_outputs=3, kernel_size=1, scope='layer2')
net = contrib_slim.conv2d(
image, num_outputs=32, kernel_size=1, scope='layer1')
contrib_slim.conv2d(net, num_outputs=3, kernel_size=1, scope='layer2')
init_op = tf.global_variables_initializer()
saver = tf.train.Saver()
......@@ -711,7 +724,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
with self.test_session(graph=test_graph_detection) as sess:
saver.restore(sess, saved_model_path)
for var in sess.run(tf.report_uninitialized_variables()):
self.assertNotIn('FeatureExtractor', var)
self.assertNotIn(six.ensure_binary('FeatureExtractor'), var)
def test_load_all_det_checkpoint_vars(self, use_keras):
test_graph_detection = tf.Graph()
......@@ -776,5 +789,7 @@ class SsdMetaArchTest(ssd_meta_arch_test_lib.SSDMetaArchTestBase,
self.assertAllClose(localization_loss, expected_localization_loss)
self.assertAllClose(classification_loss, expected_classification_loss)
if __name__ == '__main__':
tf.test.main()