Unverified Commit 8518d053 authored by pkulzc, committed by GitHub

Open source MnasFPN and minor fixes to OD API (#8484)

310447280  by lzc:

    Internal change

310420845  by Zhichao Lu:

    Open source the internal Context RCNN code.

--
310362339  by Zhichao Lu:

    Internal change

310259448  by lzc:

    Update required TF version for OD API.

--
310252159  by Zhichao Lu:

    Port patch_ops_test to TF1/TF2 as well as TPUs.

--
310247180  by Zhichao Lu:

    Ignore keypoint heatmap loss in the regions/bounding boxes with target keypoint
    class but no valid keypoint annotations.

--
310178294  by Zhichao Lu:

    Opensource MnasFPN
    https://arxiv.org/abs/1912.01106

--
310094222  by lzc:

    Internal changes.

--
310085250  by lzc:

    Internal Change.

--
310016447  by huizhongc:

    Remove unrecognized classes from labeled_classes.

--
310009470  by rathodv:

    Mark batcher.py as TF1 only.

--
310001984  by rathodv:

    Update core/preprocessor.py to be compatible with TF1/TF2.

--
309455035  by Zhichao Lu:

    Makes the freezable_batch_norm_test run w/ v2 behavior.

    The main change is that with v2 behavior, updates happen right away when running batchnorm in training mode. So we need to restore the weights between batchnorm calls to make sure the numerical checks all start from the same place.
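    A minimal sketch of the behavior described above (illustrative only, not the test's code), assuming standard tf.keras layers: the moving statistics change as soon as the layer runs with training=True, so weights are restored between calls.

    ```python
    import tensorflow as tf

    bn = tf.keras.layers.BatchNormalization()
    x = tf.random.normal([4, 8])
    _ = bn(x, training=False)        # build the layer and its variables
    initial_weights = bn.get_weights()

    _ = bn(x, training=True)         # moving mean/variance update immediately under v2
    bn.set_weights(initial_weights)  # restore so the next numerical check starts fresh
    ```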

--
309425881  by Zhichao Lu:

    Make TF1/TF2 optimizer builder tests explicit.

--
309408646  by Zhichao Lu:

    Make dataset builder tests TF1 and TF2 compatible.

--
309246305  by Zhichao Lu:

    Added the functionality of combining the person keypoints and object detection
    annotations in the binary that converts the COCO raw data to TfRecord.

--
309125076  by Zhichao Lu:

    Convert target_assigner_utils to TF1/TF2.

--
308966359  by huizhongc:

    Support SSD training with partially labeled groundtruth.

--
308937159  by rathodv:

    Update core/target_assigner.py to be compatible with TF1/TF2.

--
308774302  by Zhichao Lu:

    Internal

--
308732860  by rathodv:

    Make core/prefetcher.py  compatible with TF1 only.

--
308726984  by rathodv:

    Update core/multiclass_nms_test.py to be TF1/TF2 compatible.

--
308714718  by rathodv:

    Update core/region_similarity_calculator_test.py to be TF1/TF2 compatible.

--
308707960  by rathodv:

    Update core/minibatch_sampler_test.py to be TF1/TF2 compatible.

--
308700595  by rathodv:

    Update core/losses_test.py to be TF1/TF2 compatible and remove losses_test_v2.py

--
308361472  by rathodv:

    Update core/matcher_test.py to be TF1/TF2 compatible.

--
308335846  by Zhichao Lu:

    Updated the COCO evaluation logic and populated the groundtruth area
    information through. This change matches the groundtruth format expected by
    the COCO keypoint evaluation.

--
308256924  by rathodv:

    Update core/keypoints_ops_test.py to be TF1/TF2 compatible.

--
308256826  by rathodv:

    Update class_agnostic_nms_test.py to be TF1/TF2 compatible.

--
308256112  by rathodv:

    Update box_list_ops_test.py to be TF1/TF2 compatible.

--
308159360  by Zhichao Lu:

    Internal change

308145008  by Zhichao Lu:

    Added 'image/class/confidence' field in the TFExample decoder.

--
307651875  by rathodv:

    Refactor core/box_list.py to support TF1/TF2.

--
307651798  by rathodv:

    Modify box_coder.py base class to work with TF1/TF2.

--
307651652  by rathodv:

    Refactor core/balanced_positive_negative_sampler.py to support TF1/TF2.

--
307651571  by rathodv:

    Modify BoxCoders tests to use test_case:execute method to allow testing with TF1.X and TF2.X

--
307651480  by rathodv:

    Modify Matcher tests to use test_case:execute method to allow testing with TF1.X and TF2.X

--
307651409  by rathodv:

    Modify AnchorGenerator tests to use test_case:execute method to allow testing with TF1.X and TF2.X

--
307651314  by rathodv:

    Refactor model_builder to support TF1 or TF2 models based on TensorFlow version.

--
307092053  by Zhichao Lu:

    Use manager to save checkpoint.
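    A hedged sketch of the pattern (the model and directory below are illustrative, not from this change), using tf.train.CheckpointManager instead of calling Checkpoint.save() directly:

    ```python
    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(4)])  # placeholder model
    optimizer = tf.keras.optimizers.SGD()
    ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer)
    manager = tf.train.CheckpointManager(ckpt, directory='/tmp/od_ckpt',
                                         max_to_keep=3)
    save_path = manager.save()  # the manager also rotates older checkpoints
    ```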

--
307071352  by ronnyvotel:

    Fixing keypoint visibilities. Now by default, the visibility is marked True if the keypoint is labeled (regardless of whether it is visible or not).
    Also, if visibilities are not present in the dataset, they will be created based on whether the keypoint coordinates are finite (vis = True) or NaN (vis = False).
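    A minimal sketch of that rule (illustrative only; the helper name below is hypothetical and not the utility added by this change):

    ```python
    import tensorflow as tf

    def visibilities_from_keypoints(keypoints):
      """keypoints: [num_instances, num_keypoints, 2] float tensor of (y, x)."""
      # Finite coordinates -> visible (True); NaN coordinates -> not visible (False).
      return tf.logical_not(tf.reduce_any(tf.math.is_nan(keypoints), axis=2))
    ```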

--
307069557  by Zhichao Lu:

    Internal change to add few fields related to postprocessing parameters in
    center_net.proto and populate those parameters to the keypoint postprocessing
    functions.

--
307012091  by Zhichao Lu:

    Make Adam Optimizer's epsilon proto configurable.

    Potential issue: tf.compat.v1's AdamOptimizer has a default epsilon of 1e-08 ([doc-link](https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/AdamOptimizer)), whereas tf.keras's Adam has a default epsilon of 1e-07 ([doc-link](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam)).
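    A small sketch of the mismatch (defaults taken from the linked docs); exposing epsilon in the proto lets both builders agree on a value:

    ```python
    import tensorflow as tf

    v1_adam = tf.compat.v1.train.AdamOptimizer()   # epsilon defaults to 1e-08
    v2_adam = tf.keras.optimizers.Adam()           # epsilon defaults to 1e-07
    # With the proto field configurable, both can be built with the same value, e.g.:
    v2_adam_matched = tf.keras.optimizers.Adam(epsilon=1e-08)
    ```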

--
306858598  by Zhichao Lu:

    Internal changes to update the CenterNet model:
    1) Modified eval job loss computation to avoid averaging over batches with zero loss.
    2) Updated the CenterNet keypoint heatmap target assigner to apply box size to the heatmap Gaussian standard deviation.
    3) Updated the CenterNet meta arch keypoint loss computation to apply weights outside of the loss function.

--
306731223  by jonathanhuang:

    Internal change.

--
306549183  by rathodv:

    Internal Update.

--
306542930  by rathodv:

    Internal Update

--
306322697  by rathodv:

    Internal.

--
305345036  by Zhichao Lu:

    Adding COCO Camera Traps Json to tf.Example beam code

--
304104869  by lzc:

    Internal changes.

--
304068971  by jonathanhuang:

    Internal change.

--
304050469  by Zhichao Lu:

    Internal change.

--
303880642  by huizhongc:

    Support parsing partially labeled groundtruth.

--
303841743  by Zhichao Lu:

    Deprecate nms_on_host in SSDMetaArch.

--
303803204  by rathodv:

    Internal change.

--
303793895  by jonathanhuang:

    Internal change.

--
303467631  by rathodv:

    Py3 update for detection inference test.

--
303444542  by rathodv:

    Py3 update to metrics module

--
303421960  by rathodv:

    Update json_utils to python3.

--
302787583  by ronnyvotel:

    Coco results generator for submission to the coco test server.

--
302719091  by Zhichao Lu:

    Internal change to add the ResNet50 image feature extractor for CenterNet model.

--
302116230  by Zhichao Lu:

    Added the functions to overlay the heatmaps with images in visualization util
    library.

--
301888316  by Zhichao Lu:

    Fix checkpoint_filepath not defined error.

--
301840312  by ronnyvotel:

    Adding keypoint_scores to visualizations.

--
301683475  by ronnyvotel:

    Introducing the ability to preprocess `keypoint_visibilities`.

    Some data augmentation ops such as random crop can filter instances and keypoints. It's important to also filter keypoint visibilities, so that the groundtruth tensors are always in alignment.

--
301532344  by Zhichao Lu:

    Don't use tf.divide since "Quantization not yet supported for op: DIV"

--
301480348  by ronnyvotel:

    Introducing keypoint evaluation into model lib v2.
    Also, making some fixes to coco keypoint evaluation.

--
301454018  by Zhichao Lu:

    Added the image summary to visualize the train/eval input images and eval's
    prediction/groundtruth side-by-side image.

--
301317527  by Zhichao Lu:

    Updated the random_absolute_pad_image function in the preprocessor library to
    support the keypoints argument.

--
301300324  by Zhichao Lu:

    Apply name change(experimental_run_v2 -> run) for all callers in Tensorflow.

--
301297115  by ronnyvotel:

    Utility function for setting keypoint visibilities based on keypoint coordinates.

--
301248885  by Zhichao Lu:

    Allow MultiWorkerMirroredStrategy (MWMS) use by adding checkpoint handling with temporary directories in model_lib_v2. Added the missing WeakKeyDictionary cfer_fn_cache field in CollectiveAllReduceStrategyExtended.

--
301224559  by Zhichao Lu:

    1) Fixes model_lib to also use keypoints while preparing model groundtruth.
    2) Tests model_lib with newly added keypoint metrics config.

--
300836556  by Zhichao Lu:

    Internal changes to add keypoint estimation parameters in CenterNet proto.

--
300795208  by Zhichao Lu:

    Updated the eval_util library to populate the keypoint groundtruth to
    eval_dict.

--
299474766  by Zhichao Lu:

    Modifies eval_util to create Keypoint Evaluator objects when configured in eval config.

--
299453920  by Zhichao Lu:

    Add swish activation as a hyperparams option.

--
299240093  by ronnyvotel:

    Keypoint postprocessing for CenterNetMetaArch.

--
299176395  by Zhichao Lu:

    Internal change.

--
299135608  by Zhichao Lu:

    Internal changes to refactor the CenterNet model in preparation for keypoint estimation tasks.

--
298915482  by Zhichao Lu:

    Make dataset_builder aware of input_context for distributed training.

--
298713595  by Zhichao Lu:

    Handling data with negative size boxes.

--
298695964  by Zhichao Lu:

    Expose change_coordinate_frame as a config parameter; fix multiclass_scores optional field.

--
298492150  by Zhichao Lu:

    Rename optimizer_builder_test_v2.py -> optimizer_builder_v2_test.py

--
298476471  by Zhichao Lu:

    Internal changes to support CenterNet keypoint estimation.

--
298365851  by ronnyvotel:

    Fixing a bug where groundtruth_keypoint_weights were being padded with a dynamic dimension.

--
297843700  by Zhichao Lu:

    Internal change.

--
297706988  by lzc:

    Internal change.

--
297705287  by ronnyvotel:

    Creating the "snapping" behavior in CenterNet, where regressed keypoints are refined with updated candidate keypoints from a heatmap.

--
297700447  by Zhichao Lu:

    Improve checkpoint checking logic with TF2 loop.

--
297686094  by Zhichao Lu:

    Convert "import tensorflow as tf" to "import tensorflow.compat.v1".

--
297670468  by lzc:

    Internal change.

--
297241327  by Zhichao Lu:

    Convert "import tensorflow as tf" to "import tensorflow.compat.v1".

--
297205959  by Zhichao Lu:

    Internal changes to refactor the CenterNet object detection target assigner into a separate library.

--
297143806  by Zhichao Lu:

    Convert "import tensorflow as tf" to "import tensorflow.compat.v1".

--
297129625  by Zhichao Lu:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297117070  by Zhichao Lu:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297030190  by Zhichao Lu:

    Add configuration options for visualizing keypoint edges

--
296359649  by Zhichao Lu:

    Support DepthwiseConv2dNative (used in separable conv) in the weight equalization loss.

--
296290582  by Zhichao Lu:

    Internal change.

--
296093857  by Zhichao Lu:

    Internal changes to add general target assigner utilities.

--
295975116  by Zhichao Lu:

    Fix visualize_boxes_and_labels_on_image_array to show max_boxes_to_draw correctly.

--
295819711  by Zhichao Lu:

    Adds a flag to visualize_boxes_and_labels_on_image_array to skip the drawing of axis aligned bounding boxes.

--
295811929  by Zhichao Lu:

    Keypoint support in random_square_crop_by_scale.

--
295788458  by rathodv:

    Remove unused checkpoint to reduce repo size on github

--
295787184  by Zhichao Lu:

    Enable visualization of edges between keypoints

--
295763508  by Zhichao Lu:

    [Context RCNN] Add an option to enable/disable feature cropping in the post
    process step in the meta architecture.

--
295605344  by Zhichao Lu:

    internal change.

--
294926050  by ronnyvotel:

    Adding per-keypoint groundtruth weights. These weights are intended to be used as multipliers in a keypoint loss function.

    Groundtruth keypoint weights are constructed as follows:
    - Initialize the weight for each keypoint type based on user-specified weights in the input_reader proto
    - Mask out (i.e. make zero) all keypoint weights that are not visible.
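    A hedged sketch of that construction (illustrative only; the helper below is hypothetical, not the API's implementation):

    ```python
    import tensorflow as tf

    def groundtruth_keypoint_weights(visibilities, per_type_weights):
      """visibilities: [num_instances, num_keypoints] bool; per_type_weights: [num_keypoints]."""
      per_type = tf.cast(per_type_weights, tf.float32)
      weights = tf.ones(tf.shape(visibilities), tf.float32) * per_type  # broadcast per keypoint type
      return weights * tf.cast(visibilities, tf.float32)                # zero out invisible keypoints
    ```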

--
294829061  by lzc:

    Internal change.

--
294566503  by Zhichao Lu:

    Changed internal CenterNet Model configuration.

--
294346662  by ronnyvotel:

    Using NaN values in keypoint coordinates that are not visible.

--
294333339  by Zhichao Lu:

    Change experimental_distribute_dataset -> experimental_distribute_dataset_from_function

--
293928752  by Zhichao Lu:

    Internal change

--
293909384  by Zhichao Lu:

    Add capabilities to train 1024x1024 CenterNet models.

--
293637554  by ronnyvotel:

    Adding keypoint visibilities to TfExampleDecoder.

--
293501558  by lzc:

    Internal change.

--
293252851  by Zhichao Lu:

    Change tf.gfile.GFile to tf.io.gfile.GFile.

--
292730217  by Zhichao Lu:

    Internal change.

--
292456563  by lzc:

    Internal changes.

--
292355612  by Zhichao Lu:

    Use tf.gather and tf.scatter_nd instead of matrix ops.

--
292245265  by rathodv:

    Internal

--
291989323  by richardmunoz:

    Refactor out building a DataDecoder from building a tf.data.Dataset.

--
291950147  by Zhichao Lu:

    Flip bounding boxes in arbitrary shaped tensors.

--
291401052  by huizhongc:

    Fix multiscale grid anchor generator to allow fully convolutional inference. When exporting a model with identity_resizer as the image_resizer, there is an incorrect box offset in the detection results. We add the anchor offset to address this problem.

--
291298871  by Zhichao Lu:

    Py3 compatibility changes.

--
290957957  by Zhichao Lu:

    Hourglass feature extractor for CenterNet.

--
290564372  by Zhichao Lu:

    Internal change.

--
290155278  by rathodv:

    Remove Dataset Explorer.

--
290155153  by Zhichao Lu:

    Internal change

--
290122054  by Zhichao Lu:

    Unify the format in the faster_rcnn.proto

--
290116084  by Zhichao Lu:

    Deprecate tensorflow.contrib.

--
290100672  by Zhichao Lu:

    Update MobilenetV3 SSD candidates

--
289926392  by Zhichao Lu:

    Internal change

--
289553440  by Zhichao Lu:

    [Object Detection API] Fix the comments about the dimension of the rpn_box_encodings from 4-D to 3-D.

--
288994128  by lzc:

    Internal changes.

--
288942194  by lzc:

    Internal change.

--
288746124  by Zhichao Lu:

    Configurable channel mean/std. dev in CenterNet feature extractors.

--
288552509  by rathodv:

    Internal.

--
288541285  by rathodv:

    Internal update.

--
288396396  by Zhichao Lu:

    Make object detection import contrib explicitly

--
288255791  by rathodv:

    Internal

--
288078600  by Zhichao Lu:

    Fix model_lib_v2 test

--
287952244  by rathodv:

    Internal

--
287921774  by Zhichao Lu:

    internal change

--
287906173  by Zhichao Lu:

    internal change

--
287889407  by jonathanhuang:

    PY3 compatibility

--
287889042  by rathodv:

    Internal

--
287876178  by Zhichao Lu:

    Internal change.

--
287770490  by Zhichao Lu:

    Add CenterNet proto and builder

--
287694213  by Zhichao Lu:

    Support for running multiple steps per tf.function call.
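    A hedged sketch of the general pattern (not the model_lib_v2 code itself): trace one tf.function that runs several train steps per call to amortize per-call overhead.

    ```python
    import tensorflow as tf

    def make_multi_step_fn(train_step_fn, steps_per_call):
      @tf.function
      def run_steps(iterator):
        for _ in tf.range(steps_per_call):
          train_step_fn(next(iterator))  # one optimizer step per loop iteration
      return run_steps
    ```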

--
287377183  by jonathanhuang:

    PY3 compatibility

--
287371344  by rathodv:

    Support loading keypoint labels and ids.

--
287368213  by rathodv:

    Add protos supporting keypoint evaluation.

--
286673200  by rathodv:

    dataset_tools PY3 migration

--
286635106  by Zhichao Lu:

    Update code for upcoming tf.contrib removal

--
286479439  by Zhichao Lu:

    Internal change

--
286311711  by Zhichao Lu:

    Skeleton of context model within TFODAPI

--
286005546  by Zhichao Lu:

    Fix Faster-RCNN training when using keep_aspect_ratio_resizer with pad_to_max_dimension

--
285906400  by derekjchow:

    Internal change

--
285822795  by Zhichao Lu:

    Add CenterNet meta arch target assigners.

--
285447238  by Zhichao Lu:

    Internal changes.

--
285016927  by Zhichao Lu:

    Make _dummy_computation a tf.function. This fixes breakage caused by
    cl/284256438

--
284827274  by Zhichao Lu:

    Convert to python 3.

--
284645593  by rathodv:

    Internal change

--
284639893  by rathodv:

    Add missing documentation for keypoints in eval_util.py.

--
284323712  by Zhichao Lu:

    Internal changes.

--
284295290  by Zhichao Lu:

    Updating input config proto and dataset builder to include context fields

    Updating standard_fields and tf_example_decoder to include context features

--
284226821  by derekjchow:

    Update exporter.

--
284211030  by Zhichao Lu:

    API changes in CenterNet informed by the experiments with the hourglass network.

--
284190451  by Zhichao Lu:

    Add support for CenterNet losses in protos and builders.

--
284093961  by lzc:

    Internal changes.

--
284028174  by Zhichao Lu:

    Internal change

--
284014719  by derekjchow:

    Do not pad top_down feature maps unnecessarily.

--
284005765  by Zhichao Lu:

    Add new pad_to_multiple_resizer

--
283858233  by Zhichao Lu:

    Make target assigner work when under tf.function.

--
283836611  by Zhichao Lu:

    Make config getters more general.

--
283808990  by Zhichao Lu:

    Internal change

--
283754588  by Zhichao Lu:

    Internal changes.

--
282460301  by Zhichao Lu:

    Add ability to restore v2 style checkpoints.

--
281605842  by lzc:

    Add option to disable loss computation in OD API eval job.

--
280298212  by Zhichao Lu:

    Add backwards compatible change

--
280237857  by Zhichao Lu:

    internal change

--

PiperOrigin-RevId: 310447280
parent ac5fff19
# Lint as: python2, python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""SSD MobilenetV2 NAS-FPN Feature Extractor."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
from six.moves import range
import tensorflow as tf
from tensorflow.contrib import slim as contrib_slim
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.utils import ops
from object_detection.utils import shape_utils
from nets.mobilenet import mobilenet
from nets.mobilenet import mobilenet_v2
slim = contrib_slim
Block = collections.namedtuple(
    'Block', ['inputs', 'output_level', 'kernel_size', 'expansion_size'])

_MNASFPN_CELL_CONFIG = [
    Block(inputs=(1, 2), output_level=4, kernel_size=3, expansion_size=256),
    Block(inputs=(0, 4), output_level=3, kernel_size=3, expansion_size=128),
    Block(inputs=(5, 4), output_level=4, kernel_size=3, expansion_size=128),
    Block(inputs=(4, 3), output_level=5, kernel_size=5, expansion_size=128),
    Block(inputs=(4, 3), output_level=6, kernel_size=3, expansion_size=96),
]

MNASFPN_DEF = dict(
    feature_levels=[3, 4, 5, 6],
    spec=[_MNASFPN_CELL_CONFIG] * 4,
)
def _maybe_pad(feature, use_explicit_padding, kernel_size=3):
  return ops.fixed_padding(feature,
                           kernel_size) if use_explicit_padding else feature


# Wrapper around mobilenet.depth_multiplier
def _apply_multiplier(d, multiplier, min_depth):
  p = {'num_outputs': d}
  mobilenet.depth_multiplier(
      p, multiplier=multiplier, divisible_by=8, min_depth=min_depth)
  return p['num_outputs']
def _apply_size_dependent_ordering(input_feature, feature_level, block_level,
                                   expansion_size, use_explicit_padding,
                                   use_native_resize_op):
  """Applies Size-Dependent-Ordering when resizing feature maps.

  See https://arxiv.org/abs/1912.01106

  Args:
    input_feature: input feature map to be resized.
    feature_level: the level of the input feature.
    block_level: the desired output level for the block.
    expansion_size: the expansion size for the block.
    use_explicit_padding: Whether to use explicit padding.
    use_native_resize_op: Whether to use native resize op.

  Returns:
    A transformed feature at the desired resolution and expansion size.
  """
  padding = 'VALID' if use_explicit_padding else 'SAME'
  if feature_level >= block_level:  # Perform 1x1 then upsampling.
    node = slim.conv2d(
        input_feature,
        expansion_size, [1, 1],
        activation_fn=None,
        normalizer_fn=slim.batch_norm,
        padding=padding,
        scope='Conv1x1')
    if feature_level == block_level:
      return node
    scale = 2**(feature_level - block_level)
    if use_native_resize_op:
      input_shape = shape_utils.combined_static_and_dynamic_shape(node)
      node = tf.image.resize_nearest_neighbor(
          node, [input_shape[1] * scale, input_shape[2] * scale])
    else:
      node = ops.nearest_neighbor_upsampling(node, scale=scale)
  else:  # Perform downsampling then 1x1.
    stride = 2**(block_level - feature_level)
    node = slim.max_pool2d(
        _maybe_pad(input_feature, use_explicit_padding), [3, 3],
        stride=[stride, stride],
        padding=padding,
        scope='Downsample')
    node = slim.conv2d(
        node,
        expansion_size, [1, 1],
        activation_fn=None,
        normalizer_fn=slim.batch_norm,
        padding=padding,
        scope='Conv1x1')
  return node
def _mnasfpn_cell(feature_maps,
                  feature_levels,
                  cell_spec,
                  output_channel=48,
                  use_explicit_padding=False,
                  use_native_resize_op=False,
                  multiplier_func=None):
  """Create a MnasFPN cell.

  Args:
    feature_maps: input feature maps.
    feature_levels: levels of the feature maps.
    cell_spec: A list of Block configs.
    output_channel: Number of features for the input, output and intermediate
      feature maps.
    use_explicit_padding: Whether to use explicit padding.
    use_native_resize_op: Whether to use native resize op.
    multiplier_func: Depth-multiplier function. If None, use identity function.

  Returns:
    A transformed list of feature maps at the same resolutions as the inputs.
  """
  # This is the level where multipliers are realized.
  if multiplier_func is None:
    multiplier_func = lambda x: x
  num_outputs = len(feature_maps)
  cell_features = list(feature_maps)
  cell_levels = list(feature_levels)
  padding = 'VALID' if use_explicit_padding else 'SAME'
  for bi, block in enumerate(cell_spec):
    with tf.variable_scope('block_{}'.format(bi)):
      block_level = block.output_level
      intermediate_feature = None
      for i, inp in enumerate(block.inputs):
        with tf.variable_scope('input_{}'.format(i)):
          input_level = cell_levels[inp]
          node = _apply_size_dependent_ordering(
              cell_features[inp], input_level, block_level,
              multiplier_func(block.expansion_size), use_explicit_padding,
              use_native_resize_op)
        # Add features incrementally to avoid producing AddN, which doesn't
        # play well with TfLite.
        if intermediate_feature is None:
          intermediate_feature = node
        else:
          intermediate_feature += node
      node = tf.nn.relu6(intermediate_feature)
      node = slim.separable_conv2d(
          _maybe_pad(node, use_explicit_padding, block.kernel_size),
          multiplier_func(output_channel),
          block.kernel_size,
          activation_fn=None,
          normalizer_fn=slim.batch_norm,
          padding=padding,
          scope='SepConv')
      cell_features.append(node)
      cell_levels.append(block_level)

  # Cell-wide residuals.
  out_idx = range(len(cell_features) - num_outputs, len(cell_features))
  for in_i, out_i in enumerate(out_idx):
    if cell_features[out_i].shape.as_list(
    ) == cell_features[in_i].shape.as_list():
      cell_features[out_i] += cell_features[in_i]

  return cell_features[-num_outputs:]
def mnasfpn(feature_maps,
            head_def,
            output_channel=48,
            use_explicit_padding=False,
            use_native_resize_op=False,
            multiplier_func=None):
  """Create the MnasFPN head given head_def."""
  features = feature_maps
  for ci, cell_spec in enumerate(head_def['spec']):
    with tf.variable_scope('cell_{}'.format(ci)):
      features = _mnasfpn_cell(features, head_def['feature_levels'], cell_spec,
                               output_channel, use_explicit_padding,
                               use_native_resize_op, multiplier_func)
  return features


def training_scope(l2_weight_decay=1e-4, is_training=None):
  """Arg scope for training MnasFPN."""
  with slim.arg_scope(
      [slim.conv2d],
      weights_initializer=tf.initializers.he_normal(),
      weights_regularizer=slim.l2_regularizer(l2_weight_decay)), \
      slim.arg_scope(
          [slim.separable_conv2d],
          weights_initializer=tf.initializers.truncated_normal(
              stddev=0.536),  # He_normal for 3x3 depthwise kernel.
          weights_regularizer=slim.l2_regularizer(l2_weight_decay)), \
      slim.arg_scope([slim.batch_norm],
                     is_training=is_training,
                     epsilon=0.01,
                     decay=0.99,
                     center=True,
                     scale=True) as s:
    return s
class SSDMobileNetV2MnasFPNFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
  """SSD Feature Extractor using MobilenetV2 MnasFPN features."""

  def __init__(self,
               is_training,
               depth_multiplier,
               min_depth,
               pad_to_multiple,
               conv_hyperparams_fn,
               fpn_min_level=3,
               fpn_max_level=6,
               additional_layer_depth=48,
               head_def=None,
               reuse_weights=None,
               use_explicit_padding=False,
               use_depthwise=False,
               use_native_resize_op=False,
               override_base_feature_extractor_hyperparams=False,
               data_format='channels_last'):
    """SSD MnasFPN feature extractor based on Mobilenet v2 architecture.

    See https://arxiv.org/abs/1912.01106

    Args:
      is_training: whether the network is in training mode.
      depth_multiplier: float depth multiplier for feature extractor.
      min_depth: minimum feature extractor depth.
      pad_to_multiple: the nearest multiple to zero pad the input height and
        width dimensions to.
      conv_hyperparams_fn: A function to construct tf slim arg_scope for conv2d
        and separable_conv2d ops in the layers that are added on top of the
        base feature extractor.
      fpn_min_level: the highest resolution feature map to use in MnasFPN.
        Currently the only valid value is 3.
      fpn_max_level: the smallest resolution feature map to construct or use in
        MnasFPN. Currently the only valid value is 6.
      additional_layer_depth: additional feature map layer channel depth for
        NAS-FPN.
      head_def: A dictionary specifying the MnasFPN head architecture. Default
        uses MNASFPN_DEF.
      reuse_weights: whether to reuse variables. Default is None.
      use_explicit_padding: Whether to use explicit padding when extracting
        features. Default is False.
      use_depthwise: Whether to use depthwise convolutions. Default is False.
      use_native_resize_op: Whether to use native resize op. Default is False.
      override_base_feature_extractor_hyperparams: Whether to override
        hyperparameters of the base feature extractor with the one from
        `conv_hyperparams_fn`.
      data_format: The ordering of the dimensions in the inputs. The valid
        values are {'channels_first', 'channels_last'}.
    """
    super(SSDMobileNetV2MnasFPNFeatureExtractor, self).__init__(
        is_training=is_training,
        depth_multiplier=depth_multiplier,
        min_depth=min_depth,
        pad_to_multiple=pad_to_multiple,
        conv_hyperparams_fn=conv_hyperparams_fn,
        reuse_weights=reuse_weights,
        use_explicit_padding=use_explicit_padding,
        use_depthwise=use_depthwise,
        override_base_feature_extractor_hyperparams=(
            override_base_feature_extractor_hyperparams))
    if fpn_min_level != 3 or fpn_max_level != 6:
      raise ValueError('Min and max levels of MnasFPN must be 3 and 6 for now.')
    self._fpn_min_level = fpn_min_level
    self._fpn_max_level = fpn_max_level
    self._fpn_layer_depth = additional_layer_depth
    self._head_def = head_def if head_def else MNASFPN_DEF
    self._data_format = data_format
    self._use_native_resize_op = use_native_resize_op

  def preprocess(self, resized_inputs):
    """SSD preprocessing.

    Maps pixel values to the range [-1, 1].

    Args:
      resized_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.

    Returns:
      preprocessed_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.
    """
    return (2.0 / 255.0) * resized_inputs - 1.0
  def _verify_config(self, inputs):
    """Verifies the MnasFPN config against its inputs."""
    num_inputs = len(inputs)
    assert len(self._head_def['feature_levels']) == num_inputs

    base_width = inputs[0].shape.as_list(
    )[1] * 2**self._head_def['feature_levels'][0]
    for i in range(1, num_inputs):
      width = inputs[i].shape.as_list()[1]
      level = self._head_def['feature_levels'][i]
      expected_width = base_width // 2**level
      if width != expected_width:
        raise ValueError(
            'Resolution of input {} does not match its level {}.'.format(
                i, level))

    for cell_spec in self._head_def['spec']:
      # The last K nodes in a cell are the inputs to the next cell. Assert that
      # their feature maps are at the right level.
      for i in range(num_inputs):
        if cell_spec[-num_inputs +
                     i].output_level != self._head_def['feature_levels'][i]:
          raise ValueError(
              'Mismatch between node level {} and desired output level {}.'
              .format(cell_spec[-num_inputs + i].output_level,
                      self._head_def['feature_levels'][i]))
      # Assert that each block only uses preceding blocks.
      for bi, block_spec in enumerate(cell_spec):
        for inp in block_spec.inputs:
          if inp >= bi + num_inputs:
            raise ValueError(
                'Block {} is trying to access uncreated block {}.'.format(
                    bi, inp))
  def extract_features(self, preprocessed_inputs):
    """Extract features from preprocessed inputs.

    Args:
      preprocessed_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.

    Returns:
      feature_maps: a list of tensors where the ith tensor has shape
        [batch, height_i, width_i, depth_i]
    """
    preprocessed_inputs = shape_utils.check_min_image_dim(
        33, preprocessed_inputs)
    with tf.variable_scope('MobilenetV2', reuse=self._reuse_weights) as scope:
      with slim.arg_scope(
          mobilenet_v2.training_scope(is_training=None, bn_decay=0.99)), \
          slim.arg_scope(
              [mobilenet.depth_multiplier], min_depth=self._min_depth):
        with slim.arg_scope(
            training_scope(l2_weight_decay=4e-5,
                           is_training=self._is_training)):
          _, image_features = mobilenet_v2.mobilenet_base(
              ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
              final_endpoint='layer_18',
              depth_multiplier=self._depth_multiplier,
              use_explicit_padding=self._use_explicit_padding,
              scope=scope)

    multiplier_func = functools.partial(
        _apply_multiplier,
        multiplier=self._depth_multiplier,
        min_depth=self._min_depth)
    with tf.variable_scope('MnasFPN', reuse=self._reuse_weights):
      with slim.arg_scope(
          training_scope(l2_weight_decay=1e-4, is_training=self._is_training)):
        # Create C6 by downsampling C5.
        c6 = slim.max_pool2d(
            _maybe_pad(image_features['layer_18'], self._use_explicit_padding),
            [3, 3],
            stride=[2, 2],
            padding='VALID' if self._use_explicit_padding else 'SAME',
            scope='C6_downsample')
        c6 = slim.conv2d(
            c6,
            multiplier_func(self._fpn_layer_depth),
            [1, 1],
            activation_fn=tf.identity,
            normalizer_fn=slim.batch_norm,
            weights_regularizer=None,  # this 1x1 has no kernel regularizer.
            padding='VALID',
            scope='C6_Conv1x1')
        image_features['C6'] = tf.identity(c6)  # Needed for quantization.
        for k in sorted(image_features.keys()):
          tf.logging.error('{}: {}'.format(k, image_features[k]))
        mnasfpn_inputs = [
            image_features['layer_7'],  # C3
            image_features['layer_14'],  # C4
            image_features['layer_18'],  # C5
            image_features['C6']  # C6
        ]
        self._verify_config(mnasfpn_inputs)
        feature_maps = mnasfpn(
            mnasfpn_inputs,
            head_def=self._head_def,
            output_channel=self._fpn_layer_depth,
            use_explicit_padding=self._use_explicit_padding,
            use_native_resize_op=self._use_native_resize_op,
            multiplier_func=multiplier_func)
    return feature_maps
# Lint as: python2, python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for ssd_mobilenet_v2_nas_fpn_feature_extractor."""
import numpy as np
import tensorflow as tf
from tensorflow.contrib import slim as contrib_slim
from object_detection.models import ssd_feature_extractor_test
from object_detection.models import ssd_mobilenet_v2_mnasfpn_feature_extractor as mnasfpn_feature_extractor
slim = contrib_slim
class SsdMobilenetV2MnasFPNFeatureExtractorTest(
    ssd_feature_extractor_test.SsdFeatureExtractorTestBase):

  def _create_feature_extractor(self,
                                depth_multiplier,
                                pad_to_multiple,
                                use_explicit_padding=False):
    min_depth = 16
    is_training = True
    fpn_num_filters = 48
    return mnasfpn_feature_extractor.SSDMobileNetV2MnasFPNFeatureExtractor(
        is_training,
        depth_multiplier,
        min_depth,
        pad_to_multiple,
        self.conv_hyperparams_fn,
        additional_layer_depth=fpn_num_filters,
        use_explicit_padding=use_explicit_padding)

  def test_extract_features_returns_correct_shapes_320_256(self):
    image_height = 320
    image_width = 256
    depth_multiplier = 1.0
    pad_to_multiple = 1
    expected_feature_map_shape = [(2, 40, 32, 48), (2, 20, 16, 48),
                                  (2, 10, 8, 48), (2, 5, 4, 48)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True)

  def test_extract_features_returns_correct_shapes_enforcing_min_depth(self):
    image_height = 256
    image_width = 256
    depth_multiplier = 0.5**12
    pad_to_multiple = 1
    expected_feature_map_shape = [(2, 32, 32, 16), (2, 16, 16, 16),
                                  (2, 8, 8, 16), (2, 4, 4, 16)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True)

  def test_preprocess_returns_correct_value_range(self):
    image_height = 320
    image_width = 320
    depth_multiplier = 1
    pad_to_multiple = 1
    test_image = np.random.rand(2, image_height, image_width, 3)
    feature_extractor = self._create_feature_extractor(depth_multiplier,
                                                       pad_to_multiple)
    preprocessed_image = feature_extractor.preprocess(test_image)
    self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0)))


if __name__ == '__main__':
  tf.test.main()
......@@ -157,7 +157,7 @@ class SSDMobileNetV3FeatureExtractorBase(ssd_meta_arch.SSDFeatureExtractor):
insert_1x1_conv=True,
image_features=image_features)
return feature_maps.values()
return list(feature_maps.values())
class SSDMobileNetV3LargeFeatureExtractor(SSDMobileNetV3FeatureExtractorBase):
......
# Lint as: python2, python3
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -154,7 +155,7 @@ class SSDPNASNetFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
insert_1x1_conv=True,
image_features=image_features)
return feature_maps.values()
return list(feature_maps.values())
def restore_from_classification_checkpoint_fn(self, feature_extractor_scope):
"""Returns a map of variables to load from a foreign checkpoint.
......
# Lint as: python2, python3
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -17,6 +18,11 @@
See https://arxiv.org/abs/1708.02002 for details.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from six.moves import range
import tensorflow as tf
from tensorflow.contrib import slim as contrib_slim
......
# Lint as: python2, python3
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -13,10 +14,14 @@
# limitations under the License.
# ==============================================================================
"""Tests for ssd resnet v1 FPN feature extractors."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import abc
import itertools
from absl.testing import parameterized
import numpy as np
from six.moves import zip
import tensorflow as tf
from object_detection.models import ssd_feature_extractor_test
......@@ -112,8 +117,8 @@ class SSDResnetFPNFeatureExtractorTestBase(
image_tensor = np.random.rand(2, image_height, image_width,
3).astype(np.float32)
feature_maps = self.execute(graph_fn, [image_tensor])
for feature_map, expected_shape in itertools.izip(
feature_maps, expected_feature_map_shape):
for feature_map, expected_shape in zip(feature_maps,
expected_feature_map_shape):
self.assertAllEqual(feature_map.shape, expected_shape)
def test_extract_features_returns_correct_shapes_with_pad_to_multiple(
......
# Lint as: python2, python3
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -15,6 +16,12 @@
"""SSD Keras-based ResnetV1 FPN Feature Extractor."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from six.moves import range
from six.moves import zip
import tensorflow as tf
from object_detection.meta_architectures import ssd_meta_arch
......@@ -121,7 +128,7 @@ class SSDResNetV1FpnKerasFeatureExtractor(
self._resnet_v1_base_model = resnet_v1_base_model
self._resnet_v1_base_model_name = resnet_v1_base_model_name
self._resnet_block_names = ['block1', 'block2', 'block3', 'block4']
self._resnet_v1 = None
self.classification_backbone = None
self._fpn_features_generator = None
self._coarse_feature_layers = []
......@@ -139,7 +146,7 @@ class SSDResNetV1FpnKerasFeatureExtractor(
output_layers = _RESNET_MODEL_OUTPUT_LAYERS[self._resnet_v1_base_model_name]
outputs = [full_resnet_v1_model.get_layer(output_layer_name).output
for output_layer_name in output_layers]
self._resnet_v1 = tf.keras.Model(
self.classification_backbone = tf.keras.Model(
inputs=full_resnet_v1_model.inputs,
outputs=outputs)
# pylint:disable=g-long-lambda
......@@ -214,13 +221,14 @@ class SSDResNetV1FpnKerasFeatureExtractor(
preprocessed_inputs = shape_utils.check_min_image_dim(
129, preprocessed_inputs)
image_features = self._resnet_v1(
image_features = self.classification_backbone(
ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple))
feature_block_list = []
for level in range(self._fpn_min_level, self._base_fpn_max_level + 1):
feature_block_list.append('block{}'.format(level - 1))
feature_block_map = dict(zip(self._resnet_block_names, image_features))
feature_block_map = dict(
list(zip(self._resnet_block_names, image_features)))
fpn_input_image_features = [
(feature_block, feature_block_map[feature_block])
for feature_block in feature_block_list]
......@@ -238,6 +246,17 @@ class SSDResNetV1FpnKerasFeatureExtractor(
feature_maps.append(last_feature_map)
return feature_maps
def restore_from_classification_checkpoint_fn(self, feature_extractor_scope):
"""Returns a map for restoring from an (object-based) checkpoint.
Args:
feature_extractor_scope: A scope name for the feature extractor (unused).
Returns:
A dict mapping keys to Keras models
"""
return {'feature_extractor': self.classification_backbone}
class SSDResNet50V1FpnKerasFeatureExtractor(
SSDResNetV1FpnKerasFeatureExtractor):
......
# Lint as: python2, python3
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -162,7 +163,7 @@ class _SSDResnetPpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
image_features={
'image_features': self._filter_features(activations)['block3']
})
return feature_maps.values()
return list(feature_maps.values())
class SSDResnet50V1PpnFeatureExtractor(_SSDResnetPpnFeatureExtractor):
......
# Lint as: python2, python3
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -14,13 +15,19 @@
# ==============================================================================
"""Convolutional Box Predictors with and without weight sharing."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from six.moves import range
from six.moves import zip
import tensorflow as tf
from tensorflow.contrib import slim as contrib_slim
from object_detection.core import box_predictor
from object_detection.utils import shape_utils
from object_detection.utils import static_shape
slim = tf.contrib.slim
slim = contrib_slim
BOX_ENCODINGS = box_predictor.BOX_ENCODINGS
CLASS_PREDICTIONS_WITH_BACKGROUND = (
......
# Lint as: python2, python3
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -15,8 +16,14 @@
"""Tests for object_detection.predictors.convolutional_box_predictor."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import numpy as np
from six.moves import range
from six.moves import zip
import tensorflow as tf
from google.protobuf import text_format
......
# Lint as: python2, python3
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -14,8 +15,13 @@
# ==============================================================================
"""Convolutional Box Predictors with and without weight sharing."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
from six.moves import range
import tensorflow as tf
from object_detection.core import box_predictor
......@@ -400,7 +406,7 @@ class WeightSharedConvolutionalBoxPredictor(box_predictor.KerasBoxPredictor):
self._head_scope_conv_layers[tower_name_scope] = conv_layers
return base_tower_layers
for feature_index, input_shape in enumerate(input_shapes):
for feature_index in range(len(input_shapes)):
# Additional projection layers should not be shared as input channels
# (and thus weight shapes) are different
inserted_layer_counter, projection_layers = (
......
......@@ -107,6 +107,7 @@ class MaskRCNNBoxHead(head.Head):
box_encodings = slim.fully_connected(
flattened_roi_pooled_features,
number_of_boxes * self._box_code_size,
reuse=tf.AUTO_REUSE,
activation_fn=None,
scope='BoxEncodingPredictor')
box_encodings = tf.reshape(box_encodings,
......
......@@ -98,6 +98,7 @@ class MaskRCNNClassHead(head.Head):
class_predictions_with_background = slim.fully_connected(
flattened_roi_pooled_features,
self._num_class_slots,
reuse=tf.AUTO_REUSE,
activation_fn=None,
scope=self._scope)
class_predictions_with_background = tf.reshape(
......
# Lint as: python2, python3
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -19,7 +20,12 @@ Contains Mask prediction head classes for different meta architectures.
All the mask prediction heads have a predict function that receives the
`features` as the first argument and returns `mask_predictions`.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from six.moves import range
import tensorflow as tf
from object_detection.predictors.heads import head
......@@ -255,9 +261,9 @@ class MaskRCNNMaskHead(head.KerasHead):
if self._convolve_then_upsample:
# Replace Transposed Convolution with a Nearest Neighbor upsampling step
# followed by 3x3 convolution.
height_scale = self._mask_height / shape_utils.get_dim_as_int(
height_scale = self._mask_height // shape_utils.get_dim_as_int(
input_shapes[1])
width_scale = self._mask_width / shape_utils.get_dim_as_int(
width_scale = self._mask_width // shape_utils.get_dim_as_int(
input_shapes[2])
# pylint: disable=g-long-lambda
self._mask_predictor_layers.append(tf.keras.layers.Lambda(
......
# Lint as: python2, python3
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -22,6 +23,11 @@ Keypoints could be used to represent the human body joint locations as in
Mask RCNN paper. Or they could be used to represent different part locations of
objects.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from six.moves import range
import tensorflow as tf
from tensorflow.contrib import slim as contrib_slim
......
# Lint as: python2, python3
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -19,7 +20,12 @@ Contains Mask prediction head classes for different meta architectures.
All the mask prediction heads have a predict function that receives the
`features` as the first argument and returns `mask_predictions`.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from six.moves import range
import tensorflow as tf
from tensorflow.contrib import slim as contrib_slim
......@@ -155,8 +161,8 @@ class MaskRCNNMaskHead(head.Head):
if self._convolve_then_upsample:
# Replace Transposed Convolution with a Nearest Neighbor upsampling step
# followed by 3x3 convolution.
height_scale = self._mask_height / features.shape[1].value
width_scale = self._mask_width / features.shape[2].value
height_scale = self._mask_height // features.shape[1].value
width_scale = self._mask_width // features.shape[2].value
features = ops.nearest_neighbor_upsampling(
features, height_scale=height_scale, width_scale=width_scale)
features = slim.conv2d(
......
......@@ -14,11 +14,11 @@
# ==============================================================================
"""Mask R-CNN Box Predictor."""
import tensorflow as tf
from tensorflow.contrib import slim as contrib_slim
from object_detection.core import box_predictor
slim = tf.contrib.slim
slim = contrib_slim
BOX_ENCODINGS = box_predictor.BOX_ENCODINGS
CLASS_PREDICTIONS_WITH_BACKGROUND = (
......
......@@ -15,10 +15,11 @@
"""RFCN Box Predictor."""
import tensorflow as tf
from tensorflow.contrib import slim as contrib_slim
from object_detection.core import box_predictor
from object_detection.utils import ops
slim = tf.contrib.slim
slim = contrib_slim
BOX_ENCODINGS = box_predictor.BOX_ENCODINGS
CLASS_PREDICTIONS_WITH_BACKGROUND = (
......
......@@ -3,7 +3,7 @@ syntax = "proto2";
package object_detection.protos;
// Message for configuring DetectionModel evaluation jobs (eval.py).
// Next id - 30
// Next id - 33
message EvalConfig {
optional uint32 batch_size = 25 [default = 1];
// Number of visualization images to generate.
......@@ -31,6 +31,10 @@ message EvalConfig {
// Type of metrics to use for evaluation.
repeated string metrics_set = 8;
// Type of metrics to use for evaluation. Unlike `metrics_set` above, this
// field allows configuring evaluation metric through config files.
repeated ParameterizedMetric parameterized_metric = 31;
// Path to export detections to COCO compatible JSON format.
optional string export_path = 9 [default =''];
......@@ -45,7 +49,7 @@ message EvalConfig {
// Whether to evaluate instance masks.
// Note that since there is no evaluation code currently for instance
// segmenation this option is unused.
// segmentation this option is unused.
optional bool eval_instance_masks = 12 [default = false];
// Minimum score threshold for a detected object box to be visualized
......@@ -90,5 +94,59 @@ message EvalConfig {
// When this flag is set, images are not resized during evaluation.
// When this flag is not set (default case), image are resized according
// to the image_resizer config in the model during evaluation.
optional bool force_no_resize = 29 [default=false];
optional bool force_no_resize = 29 [default = false];
// Whether to use a dummy loss in eval so model.loss() is not executed.
optional bool use_dummy_loss_in_eval = 30 [default = false];
// Specifies which keypoints should be connected by an edge, which may improve
// visualization. An example would be human pose estimation where certain
// joints can be connected.
repeated KeypointEdge keypoint_edge = 32;
}
// A message to configure parameterized evaluation metric.
message ParameterizedMetric {
oneof parameterized_metric {
CocoKeypointMetrics coco_keypoint_metrics = 1;
}
}
// A message to evaluate COCO keypoint metrics for a specific class.
message CocoKeypointMetrics {
// Identifies the class of object to which keypoints belong. By default this
// should use the class's "display_name" in the label map.
optional string class_label = 1;
// Keypoint specific standard deviations for COCO keypoint metrics, which
// controls how OKS is computed.
// See http://cocodataset.org/#keypoints-eval for details.
// If your keypoints are similar to the COCO keypoints use the precomputed
// standard deviations below:
// "nose": 0.026
// "left_eye": 0.025
// "right_eye": 0.025
// "left_ear": 0.035
// "right_ear": 0.035
// "left_shoulder": 0.079
// "right_shoulder": 0.079
// "left_elbow": 0.072
// "right_elbow": 0.072
// "left_wrist": 0.062
// "right_wrist": 0.062
// "left_hip": 0.107
// "right_hip": 0.107
// "left_knee": 0.087
// "right_knee": 0.087
// "left_ankle": 0.089
// "right_ankle": 0.089
map<string, float> keypoint_label_to_sigmas = 2;
}
// Defines an edge that should be drawn between two keypoints.
message KeypointEdge {
// Index of the keypoint where the edge starts from. Index starts at 0.
optional int32 start = 1;
// Index of the keypoint where the edge ends. Index starts at 0.
optional int32 end = 2;
}
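A hedged usage example of the new EvalConfig fields above (the class label, sigma, and edge indices are illustrative, not mandated by the proto):

```python
from google.protobuf import text_format
from object_detection.protos import eval_pb2

eval_config = text_format.Parse("""
  parameterized_metric {
    coco_keypoint_metrics {
      class_label: "person"
      keypoint_label_to_sigmas { key: "nose" value: 0.026 }
    }
  }
  keypoint_edge { start: 0 end: 1 }
""", eval_pb2.EvalConfig())
```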
......@@ -18,9 +18,8 @@ import "object_detection/protos/post_processing.proto";
// `first_stage_` and `second_stage_` to indicate the stage to which each
// parameter pertains when relevant.
message FasterRcnn {
// Whether to construct only the Region Proposal Network (RPN).
optional int32 number_of_stages = 1 [default=2];
optional int32 number_of_stages = 1 [default = 2];
// Number of classes to predict.
optional int32 num_classes = 3;
......@@ -31,7 +30,6 @@ message FasterRcnn {
// Feature extractor config.
optional FasterRcnnFeatureExtractor feature_extractor = 5;
// (First stage) region proposal network (RPN) parameters.
// Anchor generator to compute RPN anchors.
......@@ -39,40 +37,39 @@ message FasterRcnn {
// Atrous rate for the convolution op applied to the
// `first_stage_features_to_crop` tensor to obtain box predictions.
optional int32 first_stage_atrous_rate = 7 [default=1];
optional int32 first_stage_atrous_rate = 7 [default = 1];
// Hyperparameters for the convolutional RPN box predictor.
optional Hyperparams first_stage_box_predictor_conv_hyperparams = 8;
// Kernel size to use for the convolution op just prior to RPN box
// predictions.
optional int32 first_stage_box_predictor_kernel_size = 9 [default=3];
optional int32 first_stage_box_predictor_kernel_size = 9 [default = 3];
// Output depth for the convolution op just prior to RPN box predictions.
optional int32 first_stage_box_predictor_depth = 10 [default=512];
optional int32 first_stage_box_predictor_depth = 10 [default = 512];
// The batch size to use for computing the first stage objectness and
// location losses.
optional int32 first_stage_minibatch_size = 11 [default=256];
optional int32 first_stage_minibatch_size = 11 [default = 256];
// Fraction of positive examples per image for the RPN.
optional float first_stage_positive_balance_fraction = 12 [default=0.5];
optional float first_stage_positive_balance_fraction = 12 [default = 0.5];
// Non max suppression score threshold applied to first stage RPN proposals.
optional float first_stage_nms_score_threshold = 13 [default=0.0];
optional float first_stage_nms_score_threshold = 13 [default = 0.0];
// Non max suppression IOU threshold applied to first stage RPN proposals.
optional float first_stage_nms_iou_threshold = 14 [default=0.7];
optional float first_stage_nms_iou_threshold = 14 [default = 0.7];
// Maximum number of RPN proposals retained after first stage postprocessing.
optional int32 first_stage_max_proposals = 15 [default=300];
optional int32 first_stage_max_proposals = 15 [default = 300];
// First stage RPN localization loss weight.
optional float first_stage_localization_loss_weight = 16 [default=1.0];
optional float first_stage_localization_loss_weight = 16 [default = 1.0];
// First stage RPN objectness loss weight.
optional float first_stage_objectness_loss_weight = 17 [default=1.0];
optional float first_stage_objectness_loss_weight = 17 [default = 1.0];
// Per-region cropping parameters.
// Note that if a R-FCN model is constructed the per region cropping
......@@ -89,7 +86,6 @@ message FasterRcnn {
// Stride of the max pool op on the cropped feature map during ROI pooling.
optional int32 maxpool_stride = 20;
// (Second stage) box classifier parameters
// Hyperparameters for the second stage box predictor. If box predictor type
......@@ -100,10 +96,10 @@ message FasterRcnn {
// The batch size per image used for computing the classification and refined
// location loss of the box classifier.
// Note that this field is ignored if `hard_example_miner` is configured.
optional int32 second_stage_batch_size = 22 [default=64];
optional int32 second_stage_batch_size = 22 [default = 64];
// Fraction of positive examples to use per image for the box classifier.
optional float second_stage_balance_fraction = 23 [default=0.25];
optional float second_stage_balance_fraction = 23 [default = 0.25];
// Post processing to apply on the second stage box classifier predictions.
// Note: the `score_converter` provided to the FasterRCNNMetaArch constructor
......@@ -111,15 +107,15 @@ message FasterRcnn {
optional PostProcessing second_stage_post_processing = 24;
// Second stage refined localization loss weight.
optional float second_stage_localization_loss_weight = 25 [default=1.0];
optional float second_stage_localization_loss_weight = 25 [default = 1.0];
// Second stage classification loss weight
optional float second_stage_classification_loss_weight = 26 [default=1.0];
optional float second_stage_classification_loss_weight = 26 [default = 1.0];
// Second stage instance mask loss weight. Note that this is only applicable
// when `MaskRCNNBoxPredictor` is selected for second stage and configured to
// predict instance masks.
optional float second_stage_mask_prediction_loss_weight = 27 [default=1.0];
optional float second_stage_mask_prediction_loss_weight = 27 [default = 1.0];
// If not left to default, applies hard example mining only to classification
// and localization loss..
......@@ -178,6 +174,30 @@ message FasterRcnn {
// Whether to use tf.image.combined_non_max_suppression.
optional bool use_combined_nms_in_first_stage = 40 [default = false];
// Whether to output final box feature. If true, it will crop the feature map
// in the postprocess() method based on the final predictions.
optional bool output_final_box_features = 42 [default = false];
// Configs for context model.
optional Context context_config = 41;
}
message Context {
// Configuration proto for Context .
// Next id: 4
// The maximum number of contextual features per-image, used for padding
optional int32 max_num_context_features = 1 [default = 8500];
// The bottleneck feature dimension of the attention block.
optional int32 attention_bottleneck_dimension = 2 [default = 2048];
// The attention temperature.
optional float attention_temperature = 3 [default = 0.01];
// The context feature length.
optional int32 context_feature_length = 4 [default = 2057];
}
message FasterRcnnFeatureExtractor {
......@@ -186,10 +206,10 @@ message FasterRcnnFeatureExtractor {
optional string type = 1;
// Output stride of extracted RPN feature map.
optional int32 first_stage_features_stride = 2 [default=16];
optional int32 first_stage_features_stride = 2 [default = 16];
// Whether to update batch norm parameters during training or not.
// When training with a relative large batch size (e.g. 8), it could be
// desirable to enable batch norm update.
optional bool batch_norm_trainable = 3 [default=false];
optional bool batch_norm_trainable = 3 [default = false];
}