Commit e00e0e13 authored by dreamdragon

Merge remote-tracking branch 'upstream/master'

parents b915db4e 402b561b
......@@ -115,6 +115,7 @@ approximately 10 times slower.
First ensure that you have installed the following required packages:
* **Bazel** ([instructions](http://bazel.io/docs/install.html))
* **Python 2.7**
* **TensorFlow** 1.0 or greater ([instructions](https://www.tensorflow.org/install/))
* **NumPy** ([instructions](http://www.scipy.org/install.html))
* **Natural Language Toolkit (NLTK)**:
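  As a hedged, minimal sketch (not taken verbatim from the README), the tokenizer data that im2txt's evaluation scripts typically rely on can be fetched once NLTK itself is installed:

```python
# Minimal sketch, assuming NLTK is already installed (e.g. via pip or
# the conda environment below); 'punkt' is the tokenizer data package
# commonly needed for caption evaluation.
import nltk
nltk.download('punkt')
```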
......
name: im2txt
channels:
- defaults
dependencies:
- _tflow_select=2.3.0=mkl
- absl-py=0.5.0=py27_0
- astor=0.7.1=py27_0
- backports=1.0=py27_1
- backports.functools_lru_cache=1.5=py27_1
- backports.shutil_get_terminal_size=1.0.0=py27_2
- backports.weakref=1.0.post1=py27_0
- backports_abc=0.5=py27_0
- blas=1.0=mkl
- bleach=3.0.2=py27_0
- ca-certificates=2018.03.07=0
- certifi=2018.10.15=py27_0
- configparser=3.5.0=py27_0
- cycler=0.10.0=py27_0
- dbus=1.13.2=h714fa37_1
- decorator=4.3.0=py27_0
- entrypoints=0.2.3=py27_2
- enum34=1.1.6=py27_1
- expat=2.2.6=he6710b0_0
- fastcache=1.0.2=py27h14c3975_2
- fontconfig=2.13.0=h9420a91_0
- freetype=2.9.1=h8a8886c_1
- funcsigs=1.0.2=py27_0
- functools32=3.2.3.2=py27_1
- futures=3.2.0=py27_0
- gast=0.2.0=py27_0
- glib=2.56.2=hd408876_0
- gmp=6.1.2=h6c8ec71_1
- gmpy2=2.0.8=py27h10f8cd9_2
- grpcio=1.12.1=py27hdbcaa40_0
- gst-plugins-base=1.14.0=hbbd80ab_1
- gstreamer=1.14.0=hb453b48_1
- h5py=2.8.0=py27h989c5e5_3
- hdf5=1.10.2=hba1933b_1
- icu=58.2=h9c2bf20_1
- intel-openmp=2019.0=118
- ipaddress=1.0.22=py27_0
- ipykernel=4.10.0=py27_0
- ipython=5.8.0=py27_0
- ipython_genutils=0.2.0=py27_0
- ipywidgets=7.4.2=py27_0
- jinja2=2.10=py27_0
- jpeg=9b=h024ee3a_2
- jsonschema=2.6.0=py27_0
- jupyter=1.0.0=py27_7
- jupyter_client=5.2.3=py27_0
- jupyter_console=5.2.0=py27_1
- jupyter_core=4.4.0=py27_0
- keras-applications=1.0.6=py27_0
- keras-preprocessing=1.0.5=py27_0
- kiwisolver=1.0.1=py27hf484d3e_0
- libedit=3.1.20170329=h6b74fdf_2
- libffi=3.2.1=hd88cf55_4
- libgcc-ng=8.2.0=hdf63c60_1
- libgfortran-ng=7.3.0=hdf63c60_0
- libpng=1.6.35=hbc83047_0
- libprotobuf=3.6.0=hdbcaa40_0
- libsodium=1.0.16=h1bed415_0
- libstdcxx-ng=8.2.0=hdf63c60_1
- libuuid=1.0.3=h1bed415_2
- libxcb=1.13=h1bed415_1
- libxml2=2.9.8=h26e45fe_1
- linecache2=1.0.0=py27_0
- markdown=3.0.1=py27_0
- markupsafe=1.0=py27h14c3975_1
- matplotlib=2.2.3=py27hb69df0a_0
- mistune=0.8.4=py27h7b6447c_0
- mkl=2019.0=118
- mkl_fft=1.0.6=py27h7dd41cf_0
- mkl_random=1.0.1=py27h4414c95_1
- mock=2.0.0=py27_0
- mpc=1.1.0=h10f8cd9_1
- mpfr=4.0.1=hdf1c602_3
- mpmath=1.0.0=py27_2
- nbconvert=5.3.1=py27_0
- nbformat=4.4.0=py27_0
- ncurses=6.1=hf484d3e_0
- nltk=3.3.0=py27_0
- nose=1.3.7=py27_2
- notebook=5.7.0=py27_0
- numpy=1.15.3=py27h1d66e8a_0
- numpy-base=1.15.3=py27h81de0dd_0
- openssl=1.0.2p=h14c3975_0
- pandas=0.23.4=py27h04863e7_0
- pandoc=2.2.3.2=0
- pandocfilters=1.4.2=py27_1
- pathlib2=2.3.2=py27_0
- pbr=4.3.0=py27_0
- pcre=8.42=h439df22_0
- pexpect=4.6.0=py27_0
- pickleshare=0.7.5=py27_0
- pip=10.0.1=py27_0
- prometheus_client=0.4.2=py27_0
- prompt_toolkit=1.0.15=py27_0
- protobuf=3.6.0=py27hf484d3e_0
- ptyprocess=0.6.0=py27_0
- pygments=2.2.0=py27_0
- pyparsing=2.2.2=py27_0
- pyqt=5.9.2=py27h05f1152_2
- python=2.7.15=h77bded6_2
- python-dateutil=2.7.3=py27_0
- pytz=2018.5=py27_0
- pyzmq=17.1.2=py27h14c3975_0
- qt=5.9.6=h8703b6f_2
- qtconsole=4.4.2=py27_0
- readline=7.0=h7b6447c_5
- scandir=1.9.0=py27h14c3975_0
- scipy=1.1.0=py27hfa4b5c9_1
- send2trash=1.5.0=py27_0
- setuptools=40.4.3=py27_0
- simplegeneric=0.8.1=py27_2
- singledispatch=3.4.0.3=py27_0
- sip=4.19.8=py27hf484d3e_0
- six=1.11.0=py27_1
- sqlite=3.25.2=h7b6447c_0
- subprocess32=3.5.3=py27h7b6447c_0
- sympy=1.3=py27_0
- tensorboard=1.11.0=py27hf484d3e_0
- tensorflow=1.11.0=mkl_py27h25e0b76_0
- tensorflow-base=1.11.0=mkl_py27h3c3e929_0
- termcolor=1.1.0=py27_1
- terminado=0.8.1=py27_1
- testpath=0.4.2=py27_0
- tk=8.6.8=hbc83047_0
- tornado=5.1.1=py27h7b6447c_0
- traceback2=1.4.0=py27_0
- traitlets=4.3.2=py27_0
- unittest2=1.1.0=py27_0
- wcwidth=0.1.7=py27_0
- webencodings=0.5.1=py27_1
- werkzeug=0.14.1=py27_0
- wheel=0.32.2=py27_0
- widgetsnbextension=3.4.2=py27_0
- xz=5.2.4=h14c3975_4
- zeromq=4.2.5=hf484d3e_1
- zlib=1.2.11=ha838bed_2
prefix: /home/arinto_murdopo/anaconda3/envs/im2txt
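As a hedged sanity check that an environment built from the file above resolved its pins correctly (the expected version strings follow the pins in the file, not an external source):

```python
# Illustrative check only; not part of the committed environment file.
import nltk
import numpy as np
import tensorflow as tf

print(tf.__version__)    # 1.11.0 per the tensorflow pin
print(np.__version__)    # 1.15.3 per the numpy pin
print(nltk.__version__)  # 3.3 per the nltk pin
```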
......@@ -341,7 +341,7 @@
},
"cell_type": "markdown",
"source": [
"In order toview the outputs of our optimization, we are required to perform the inverse preprocessing step. Furthermore, since our optimized image may take its values anywhere between $- \\infty$ and $\\infty$, we must clip to maintain our values from within the 0-255 range. "
"In order to view the outputs of our optimization, we are required to perform the inverse preprocessing step. Furthermore, since our optimized image may take its values anywhere between $- \\infty$ and $\\infty$, we must clip to maintain our values from within the 0-255 range. "
]
},
{
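A minimal sketch of the inverse-preprocessing and clipping step the cell above describes, assuming the forward pass used VGG19-style preprocessing (BGR channel order with ImageNet channel means subtracted, as in tf.keras.applications.vgg19.preprocess_input):

```python
import numpy as np

def deprocess_img(processed_img):
  x = processed_img.copy()
  # Undo the ImageNet mean subtraction (BGR means; an assumption about
  # the forward preprocessing used here).
  x[..., 0] += 103.939
  x[..., 1] += 116.779
  x[..., 2] += 123.68
  x = x[..., ::-1]  # BGR -> RGB
  # The optimized image can take any real value, so clip to [0, 255].
  return np.clip(x, 0, 255).astype('uint8')
```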
......@@ -380,7 +380,7 @@
},
"cell_type": "markdown",
"source": [
"### Define content and style representationst\n",
"### Define content and style representations\n",
"In order to get both the content and style representations of our image, we will look at some intermediate layers within our model. As we go deeper into the model, these intermediate layers represent higher and higher order features. In this case, we are using the network architecture VGG19, a pretrained image classification network. These intermediate layers are necessary to define the representation of content and style from our images. For an input image, we will try to match the corresponding style and content target representations at these intermediate layers. \n",
"\n",
"#### Why intermediate layers?\n",
......@@ -1183,7 +1183,7 @@
"### What we covered:\n",
"\n",
"* We built several different loss functions and used backpropagation to transform our input image in order to minimize these losses\n",
" * In order to do this we had to load in an a **pretrained model** and used its learned feature maps to describe the content and style representation of our images.\n",
" * In order to do this we had to load in a **pretrained model** and use its learned feature maps to describe the content and style representation of our images.\n",
" * Our main loss functions were primarily computing the distance in terms of these different representations\n",
"* We implemented this with a custom model and **eager execution**\n",
" * We built our custom model with the Functional API \n",
......
......@@ -108,9 +108,6 @@ class MultiscaleGridAnchorGenerator(anchor_generator.AnchorGenerator):
ValueError: if im_height and im_width are 1, but normalized coordinates
were requested.
"""
if not isinstance(im_height, int) or not isinstance(im_width, int):
raise ValueError('MultiscaleGridAnchorGenerator currently requires '
'input image shape to be statically defined.')
anchor_grid_list = []
for feat_shape, grid_info in zip(feature_map_shape_list,
self._anchor_grid_info):
......@@ -122,10 +119,11 @@ class MultiscaleGridAnchorGenerator(anchor_generator.AnchorGenerator):
feat_h = feat_shape[0]
feat_w = feat_shape[1]
anchor_offset = [0, 0]
if im_height % 2.0**level == 0 or im_height == 1:
anchor_offset[0] = stride / 2.0
if im_width % 2.0**level == 0 or im_width == 1:
anchor_offset[1] = stride / 2.0
if isinstance(im_height, int) and isinstance(im_width, int):
if im_height % 2.0**level == 0 or im_height == 1:
anchor_offset[0] = stride / 2.0
if im_width % 2.0**level == 0 or im_width == 1:
anchor_offset[1] = stride / 2.0
ag = grid_anchor_generator.GridAnchorGenerator(
scales,
aspect_ratios,
......
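A worked instance (hedged, with assumed values) of the offset rule this hunk guards behind the isinstance check:

```python
# At pyramid level 5 the stride is 2**5 = 32. A statically known 64x64
# image satisfies 64 % 32 == 0, so anchors get a half-stride offset of
# 16.0; with a dynamic (Tensor) image size the isinstance check is False
# and the offset stays [0, 0], which the updated test below relies on.
level = 5
stride = 2.0 ** level
im_height = 64
anchor_offset = stride / 2.0 if im_height % stride == 0 else 0.0
assert anchor_offset == 16.0
```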
......@@ -116,7 +116,7 @@ class MultiscaleGridAnchorGeneratorTest(test_case.TestCase):
normalize_coordinates=False)
self.assertEqual(anchor_generator.num_anchors_per_location(), [6, 6])
def test_construct_single_anchor_fails_with_tensor_image_size(self):
def test_construct_single_anchor_dynamic_size(self):
min_level = 5
max_level = 5
anchor_scale = 4.0
......@@ -125,12 +125,22 @@ class MultiscaleGridAnchorGeneratorTest(test_case.TestCase):
im_height = tf.constant(64)
im_width = tf.constant(64)
feature_map_shape_list = [(2, 2)]
# Zero offsets are used.
exp_anchor_corners = [[-64, -64, 64, 64],
[-64, -32, 64, 96],
[-32, -64, 96, 64],
[-32, -32, 96, 96]]
anchor_generator = mg.MultiscaleGridAnchorGenerator(
min_level, max_level, anchor_scale, aspect_ratios, scales_per_octave,
normalize_coordinates=False)
with self.assertRaisesRegexp(ValueError, 'statically defined'):
anchor_generator.generate(
feature_map_shape_list, im_height=im_height, im_width=im_width)
anchors_list = anchor_generator.generate(
feature_map_shape_list, im_height=im_height, im_width=im_width)
anchor_corners = anchors_list[0].get()
with self.test_session():
anchor_corners_out = anchor_corners.eval()
self.assertAllClose(anchor_corners_out, exp_anchor_corners)
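The expected corners above follow from simple arithmetic; a hedged worked check:

```python
# stride = 2**5 = 32 and base anchor side = anchor_scale * stride =
# 4 * 32 = 128. With zero offsets the 2x2 grid centers sit at (0, 0),
# (0, 32), (32, 0) and (32, 32); each corner is center +/- 64, e.g.
# center (0, 32) -> [-64, -32, 64, 96].
stride = 2 ** 5
half = 4.0 * stride / 2  # 64.0
cy, cx = 0, 32
assert [cy - half, cx - half, cy + half, cx + half] == [-64, -32, 64, 96]
```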
def test_construct_single_anchor_with_odd_input_dimension(self):
......
......@@ -42,6 +42,7 @@ def build_convolutional_box_predictor(is_training,
kernel_size,
box_code_size,
apply_sigmoid_to_scores=False,
add_background_class=True,
class_prediction_bias_init=0.0,
use_depthwise=False,
mask_head_config=None):
......@@ -49,7 +50,10 @@ def build_convolutional_box_predictor(is_training,
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: Number of classes.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
conv_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for convolution ops.
min_depth: Minimum feature depth prior to predicting box encodings
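A worked instance of the num_classes convention documented above, with assumed numbers that mirror the updated builder tests:

```python
# K = 90 foreground classes; with the default add_background_class=True
# the class head predicts 91 slots, matching the _num_class_slots == 91
# assertion in the tests further down. With an explicit background label
# in the groundtruth, pass add_background_class=False and keep 90.
num_classes = 90
add_background_class = True
num_class_slots = num_classes + 1 if add_background_class else num_classes
assert num_class_slots == 91
```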
......@@ -71,6 +75,7 @@ def build_convolutional_box_predictor(is_training,
box_code_size: Size of encoding for each box.
apply_sigmoid_to_scores: If True, apply the sigmoid on the output
class_predictions.
add_background_class: Whether to add an implicit background class.
class_prediction_bias_init: Constant value to initialize bias of the last
conv2d layer before class prediction.
use_depthwise: Whether to use depthwise convolutions for prediction
......@@ -88,7 +93,7 @@ def build_convolutional_box_predictor(is_training,
use_depthwise=use_depthwise)
class_prediction_head = class_head.ConvolutionalClassHead(
is_training=is_training,
num_classes=num_classes,
num_class_slots=num_classes + 1 if add_background_class else num_classes,
use_dropout=use_dropout,
dropout_keep_prob=dropout_keep_prob,
kernel_size=kernel_size,
......@@ -136,15 +141,19 @@ def build_convolutional_keras_box_predictor(is_training,
dropout_keep_prob,
kernel_size,
box_code_size,
add_background_class=True,
class_prediction_bias_init=0.0,
use_depthwise=False,
mask_head_config=None,
name='BoxPredictor'):
"""Builds the ConvolutionalBoxPredictor from the arguments.
"""Builds the Keras ConvolutionalBoxPredictor from the arguments.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: Number of classes.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops.
freeze_batchnorm: Whether to freeze batch norm parameters during
......@@ -175,6 +184,7 @@ def build_convolutional_keras_box_predictor(is_training,
then the kernel size is automatically set to be
min(feature_width, feature_height).
box_code_size: Size of encoding for each box.
add_background_class: Whether to add an implicit background class.
class_prediction_bias_init: constant value to initialize bias of the last
conv2d layer before class prediction.
use_depthwise: Whether to use depthwise convolutions for prediction
......@@ -185,7 +195,7 @@ def build_convolutional_keras_box_predictor(is_training,
will auto-generate one from the class name.
Returns:
A ConvolutionalBoxPredictor class.
A Keras ConvolutionalBoxPredictor class.
"""
box_prediction_heads = []
class_prediction_heads = []
......@@ -210,7 +220,8 @@ def build_convolutional_keras_box_predictor(is_training,
class_prediction_heads.append(
keras_class_head.ConvolutionalClassHead(
is_training=is_training,
num_classes=num_classes,
num_class_slots=(
num_classes + 1 if add_background_class else num_classes),
use_dropout=use_dropout,
dropout_keep_prob=dropout_keep_prob,
kernel_size=kernel_size,
......@@ -264,6 +275,7 @@ def build_weight_shared_convolutional_box_predictor(
num_layers_before_predictor,
box_code_size,
kernel_size=3,
add_background_class=True,
class_prediction_bias_init=0.0,
use_dropout=False,
dropout_keep_prob=0.8,
......@@ -288,6 +300,7 @@ def build_weight_shared_convolutional_box_predictor(
the predictor.
box_code_size: Size of encoding for each box.
kernel_size: Size of final convolution kernel.
add_background_class: Whether to add an implicit background class.
class_prediction_bias_init: constant value to initialize bias of the last
conv2d layer before class prediction.
use_dropout: Whether to apply dropout to class prediction head.
......@@ -313,7 +326,8 @@ def build_weight_shared_convolutional_box_predictor(
box_encodings_clip_range=box_encodings_clip_range)
class_prediction_head = (
class_head.WeightSharedConvolutionalClassHead(
num_classes=num_classes,
num_class_slots=(
num_classes + 1 if add_background_class else num_classes),
kernel_size=kernel_size,
class_prediction_bias_init=class_prediction_bias_init,
use_dropout=use_dropout,
......@@ -355,6 +369,7 @@ def build_mask_rcnn_box_predictor(is_training,
use_dropout,
dropout_keep_prob,
box_code_size,
add_background_class=True,
share_box_across_classes=False,
predict_instance_masks=False,
conv_hyperparams_fn=None,
......@@ -362,40 +377,46 @@ def build_mask_rcnn_box_predictor(is_training,
mask_width=14,
mask_prediction_num_conv_layers=2,
mask_prediction_conv_depth=256,
masks_are_class_agnostic=False):
masks_are_class_agnostic=False,
convolve_then_upsample_masks=False):
"""Builds and returns a MaskRCNNBoxPredictor class.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
fc_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for fully connected ops.
use_dropout: Option to use dropout or not. Note that a single dropout
op is applied here prior to both box and class predictions, which stands
in contrast to the ConvolutionalBoxPredictor below.
dropout_keep_prob: Keep probability for dropout.
This is only used if use_dropout is True.
box_code_size: Size of encoding for each box.
share_box_across_classes: Whether to share boxes across classes rather
than use a different box for each class.
predict_instance_masks: If True, will add a third stage mask prediction
to the returned class.
conv_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for convolution ops.
mask_height: Desired output mask height. The default value is 14.
mask_width: Desired output mask width. The default value is 14.
mask_prediction_num_conv_layers: Number of convolution layers applied to
the image_features in mask prediction branch.
mask_prediction_conv_depth: The depth for the first conv2d_transpose op
applied to the image_features in the mask prediction branch. If set
to 0, the depth of the convolution layers will be automatically chosen
based on the number of object classes and the number of channels in the
image features.
masks_are_class_agnostic: Boolean determining if the mask-head is
class-agnostic or not.
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
fc_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for fully connected ops.
use_dropout: Option to use dropout or not. Note that a single dropout
op is applied here prior to both box and class predictions, which stands
in contrast to the ConvolutionalBoxPredictor below.
dropout_keep_prob: Keep probability for dropout.
This is only used if use_dropout is True.
box_code_size: Size of encoding for each box.
add_background_class: Whether to add an implicit background class.
share_box_across_classes: Whether to share boxes across classes rather
than use a different box for each class.
predict_instance_masks: If True, will add a third stage mask prediction
to the returned class.
conv_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for convolution ops.
mask_height: Desired output mask height. The default value is 14.
mask_width: Desired output mask width. The default value is 14.
mask_prediction_num_conv_layers: Number of convolution layers applied to
the image_features in mask prediction branch.
mask_prediction_conv_depth: The depth for the first conv2d_transpose op
applied to the image_features in the mask prediction branch. If set
to 0, the depth of the convolution layers will be automatically chosen
based on the number of object classes and the number of channels in the
image features.
masks_are_class_agnostic: Boolean determining if the mask-head is
class-agnostic or not.
convolve_then_upsample_masks: Whether to apply convolutions on mask
features before upsampling using nearest neighbor resizing. Otherwise,
mask features are resized to [`mask_height`, `mask_width`] using
bilinear resizing before applying convolutions.
Returns:
A MaskRCNNBoxPredictor class.
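A hedged sketch of the two mask-feature paths the new convolve_then_upsample_masks flag selects between; the layer widths and ops here are illustrative, not the library's exact implementation:

```python
import tensorflow as tf

def mask_features(x, mask_height, mask_width, convolve_then_upsample):
  """x: [batch, roi_height, roi_width, channels] mask features (assumed)."""
  if convolve_then_upsample:
    # Convolve at the native ROI resolution, then nearest-neighbor resize.
    x = tf.layers.conv2d(x, 256, 3, padding='same', activation=tf.nn.relu)
    x = tf.image.resize_nearest_neighbor(x, [mask_height, mask_width])
  else:
    # Default path: bilinearly resize first, then convolve.
    x = tf.image.resize_bilinear(x, [mask_height, mask_width])
    x = tf.layers.conv2d(x, 256, 3, padding='same', activation=tf.nn.relu)
  return x
```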
......@@ -410,7 +431,7 @@ def build_mask_rcnn_box_predictor(is_training,
share_box_across_classes=share_box_across_classes)
class_prediction_head = class_head.MaskRCNNClassHead(
is_training=is_training,
num_classes=num_classes,
num_class_slots=num_classes + 1 if add_background_class else num_classes,
fc_hyperparams_fn=fc_hyperparams_fn,
use_dropout=use_dropout,
dropout_keep_prob=dropout_keep_prob)
......@@ -425,7 +446,8 @@ def build_mask_rcnn_box_predictor(is_training,
mask_width=mask_width,
mask_prediction_num_conv_layers=mask_prediction_num_conv_layers,
mask_prediction_conv_depth=mask_prediction_conv_depth,
masks_are_class_agnostic=masks_are_class_agnostic)
masks_are_class_agnostic=masks_are_class_agnostic,
convolve_then_upsample=convolve_then_upsample_masks)
return mask_rcnn_box_predictor.MaskRCNNBoxPredictor(
is_training=is_training,
num_classes=num_classes,
......@@ -464,7 +486,8 @@ BoxEncodingsClipRange = collections.namedtuple('BoxEncodingsClipRange',
['min', 'max'])
def build(argscope_fn, box_predictor_config, is_training, num_classes):
def build(argscope_fn, box_predictor_config, is_training, num_classes,
add_background_class=True):
"""Builds box predictor based on the configuration.
Builds box predictor based on the configuration. See box_predictor.proto for
......@@ -479,6 +502,7 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
configuration.
is_training: Whether the model is in training mode.
num_classes: Number of classes to predict.
add_background_class: Whether to add an implicit background class.
Returns:
box_predictor: box_predictor.BoxPredictor object.
......@@ -502,6 +526,7 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
return build_convolutional_box_predictor(
is_training=is_training,
num_classes=num_classes,
add_background_class=add_background_class,
conv_hyperparams_fn=conv_hyperparams_fn,
use_dropout=config_box_predictor.use_dropout,
dropout_keep_prob=config_box_predictor.dropout_keep_probability,
......@@ -542,6 +567,7 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
return build_weight_shared_convolutional_box_predictor(
is_training=is_training,
num_classes=num_classes,
add_background_class=add_background_class,
conv_hyperparams_fn=conv_hyperparams_fn,
depth=config_box_predictor.depth,
num_layers_before_predictor=(
......@@ -570,6 +596,7 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
return build_mask_rcnn_box_predictor(
is_training=is_training,
num_classes=num_classes,
add_background_class=add_background_class,
fc_hyperparams_fn=fc_hyperparams_fn,
use_dropout=config_box_predictor.use_dropout,
dropout_keep_prob=config_box_predictor.dropout_keep_probability,
......@@ -585,7 +612,9 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
mask_prediction_conv_depth=(
config_box_predictor.mask_prediction_conv_depth),
masks_are_class_agnostic=(
config_box_predictor.masks_are_class_agnostic))
config_box_predictor.masks_are_class_agnostic),
convolve_then_upsample_masks=(
config_box_predictor.convolve_then_upsample_masks))
if box_predictor_oneof == 'rfcn_box_predictor':
config_box_predictor = box_predictor_config.rfcn_box_predictor
......@@ -603,3 +632,78 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
box_code_size=config_box_predictor.box_code_size)
return box_predictor_object
raise ValueError('Unknown box predictor: {}'.format(box_predictor_oneof))
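A hedged usage sketch of the new add_background_class argument on build(); the box_predictor_config proto is assumed to be constructed elsewhere:

```python
from object_detection.builders import box_predictor_builder
from object_detection.builders import hyperparams_builder

# `box_predictor_config` is an assumed box_predictor_pb2.BoxPredictor
# proto. Passing add_background_class=False keeps the class head at
# num_classes slots, for groundtruth that already carries an explicit
# background class (see the builder tests below).
predictor = box_predictor_builder.build(
    argscope_fn=hyperparams_builder.build,
    box_predictor_config=box_predictor_config,
    is_training=True,
    num_classes=90,
    add_background_class=False)
```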
def build_keras(conv_hyperparams_fn, freeze_batchnorm, inplace_batchnorm_update,
num_predictions_per_location_list, box_predictor_config,
is_training, num_classes, add_background_class=True):
"""Builds a Keras-based box predictor based on the configuration.
Builds Keras-based box predictor based on the configuration.
See box_predictor.proto for configurable options. Also, see box_predictor.py
for more details.
Args:
conv_hyperparams_fn: A function that takes a hyperparams_pb2.Hyperparams
proto and returns a `hyperparams_builder.KerasLayerHyperparams`
for Conv or FC hyperparameters.
freeze_batchnorm: Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
inplace_batchnorm_update: Whether to update batch norm moving average
values inplace. When this is false train op must add a control
dependency on tf.graphkeys.UPDATE_OPS collection in order to update
batch norm statistics.
num_predictions_per_location_list: A list of integers representing the
number of box predictions to be made per spatial location for each
feature map.
box_predictor_config: box_predictor_pb2.BoxPredictor proto containing
configuration.
is_training: Whether the model is in training mode.
num_classes: Number of classes to predict.
add_background_class: Whether to add an implicit background class.
Returns:
box_predictor: box_predictor.KerasBoxPredictor object.
Raises:
ValueError: On unknown box predictor, or one with no Keras box predictor.
"""
if not isinstance(box_predictor_config, box_predictor_pb2.BoxPredictor):
raise ValueError('box_predictor_config not of type '
'box_predictor_pb2.BoxPredictor.')
box_predictor_oneof = box_predictor_config.WhichOneof('box_predictor_oneof')
if box_predictor_oneof == 'convolutional_box_predictor':
config_box_predictor = box_predictor_config.convolutional_box_predictor
conv_hyperparams = conv_hyperparams_fn(
config_box_predictor.conv_hyperparams)
mask_head_config = (
config_box_predictor.mask_head
if config_box_predictor.HasField('mask_head') else None)
return build_convolutional_keras_box_predictor(
is_training=is_training,
num_classes=num_classes,
add_background_class=add_background_class,
conv_hyperparams=conv_hyperparams,
freeze_batchnorm=freeze_batchnorm,
inplace_batchnorm_update=inplace_batchnorm_update,
num_predictions_per_location_list=num_predictions_per_location_list,
use_dropout=config_box_predictor.use_dropout,
dropout_keep_prob=config_box_predictor.dropout_keep_probability,
box_code_size=config_box_predictor.box_code_size,
kernel_size=config_box_predictor.kernel_size,
num_layers_before_predictor=(
config_box_predictor.num_layers_before_predictor),
min_depth=config_box_predictor.min_depth,
max_depth=config_box_predictor.max_depth,
class_prediction_bias_init=(
config_box_predictor.class_prediction_bias_init),
use_depthwise=config_box_predictor.use_depthwise,
mask_head_config=mask_head_config)
raise ValueError(
'Unknown box predictor for Keras: {}'.format(box_predictor_oneof))
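A minimal sketch of the tf.GraphKeys.UPDATE_OPS pattern that the inplace_batchnorm_update docstring above refers to; the tiny model here is purely illustrative:

```python
import tensorflow as tf

# When in-place batch-norm updates are off, the train op must depend on
# the collected update ops or the moving averages never advance.
x = tf.random_normal([8, 4])
w = tf.Variable(tf.ones([4, 2]))
h = tf.layers.batch_normalization(tf.matmul(x, w), training=True)
loss = tf.reduce_sum(tf.square(h))
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
  train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
```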
......@@ -113,7 +113,8 @@ class ConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
argscope_fn=mock_conv_argscope_builder,
box_predictor_config=box_predictor_proto,
is_training=False,
num_classes=10)
num_classes=10,
add_background_class=False)
class_head = box_predictor._class_prediction_head
self.assertEqual(box_predictor._min_depth, 2)
self.assertEqual(box_predictor._max_depth, 16)
......@@ -122,6 +123,7 @@ class ConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
self.assertAlmostEqual(class_head._dropout_keep_prob, 0.4)
self.assertTrue(class_head._apply_sigmoid_to_scores)
self.assertAlmostEqual(class_head._class_prediction_bias_init, 4.0)
self.assertEqual(class_head._num_class_slots, 10)
self.assertEqual(box_predictor.num_classes, 10)
self.assertFalse(box_predictor._is_training)
self.assertTrue(class_head._use_depthwise)
......@@ -154,6 +156,7 @@ class ConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
self.assertTrue(class_head._use_dropout)
self.assertAlmostEqual(class_head._dropout_keep_prob, 0.8)
self.assertFalse(class_head._apply_sigmoid_to_scores)
self.assertEqual(class_head._num_class_slots, 91)
self.assertEqual(box_predictor.num_classes, 90)
self.assertTrue(box_predictor._is_training)
self.assertFalse(class_head._use_depthwise)
......@@ -306,7 +309,8 @@ class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
argscope_fn=mock_conv_argscope_builder,
box_predictor_config=box_predictor_proto,
is_training=False,
num_classes=10)
num_classes=10,
add_background_class=False)
class_head = box_predictor._class_prediction_head
self.assertEqual(box_predictor._depth, 2)
self.assertEqual(box_predictor._num_layers_before_predictor, 2)
......@@ -349,7 +353,8 @@ class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
argscope_fn=mock_conv_argscope_builder,
box_predictor_config=box_predictor_proto,
is_training=False,
num_classes=10)
num_classes=10,
add_background_class=False)
class_head = box_predictor._class_prediction_head
self.assertEqual(box_predictor._depth, 2)
self.assertEqual(box_predictor._num_layers_before_predictor, 2)
......@@ -627,6 +632,48 @@ class MaskRCNNBoxPredictorBuilderTest(tf.test.TestCase):
third_stage_heads[mask_rcnn_box_predictor.MASK_PREDICTIONS]
._mask_prediction_conv_depth, 512)
def test_build_box_predictor_with_convolve_then_upsample_masks(self):
box_predictor_proto = box_predictor_pb2.BoxPredictor()
box_predictor_proto.mask_rcnn_box_predictor.fc_hyperparams.op = (
hyperparams_pb2.Hyperparams.FC)
box_predictor_proto.mask_rcnn_box_predictor.conv_hyperparams.op = (
hyperparams_pb2.Hyperparams.CONV)
box_predictor_proto.mask_rcnn_box_predictor.predict_instance_masks = True
box_predictor_proto.mask_rcnn_box_predictor.mask_prediction_conv_depth = 512
box_predictor_proto.mask_rcnn_box_predictor.mask_height = 24
box_predictor_proto.mask_rcnn_box_predictor.mask_width = 24
box_predictor_proto.mask_rcnn_box_predictor.convolve_then_upsample_masks = (
True)
mock_argscope_fn = mock.Mock(return_value='arg_scope')
box_predictor = box_predictor_builder.build(
argscope_fn=mock_argscope_fn,
box_predictor_config=box_predictor_proto,
is_training=True,
num_classes=90)
mock_argscope_fn.assert_has_calls(
[mock.call(box_predictor_proto.mask_rcnn_box_predictor.fc_hyperparams,
True),
mock.call(box_predictor_proto.mask_rcnn_box_predictor.conv_hyperparams,
True)], any_order=True)
box_head = box_predictor._box_prediction_head
class_head = box_predictor._class_prediction_head
third_stage_heads = box_predictor._third_stage_heads
self.assertFalse(box_head._use_dropout)
self.assertFalse(class_head._use_dropout)
self.assertAlmostEqual(box_head._dropout_keep_prob, 0.5)
self.assertAlmostEqual(class_head._dropout_keep_prob, 0.5)
self.assertEqual(box_predictor.num_classes, 90)
self.assertTrue(box_predictor._is_training)
self.assertEqual(box_head._box_code_size, 4)
self.assertTrue(
mask_rcnn_box_predictor.MASK_PREDICTIONS in third_stage_heads)
self.assertEqual(
third_stage_heads[mask_rcnn_box_predictor.MASK_PREDICTIONS]
._mask_prediction_conv_depth, 512)
self.assertTrue(third_stage_heads[mask_rcnn_box_predictor.MASK_PREDICTIONS]
._convolve_then_upsample)
class RfcnBoxPredictorBuilderTest(tf.test.TestCase):
......
......@@ -64,6 +64,10 @@ class KerasLayerHyperparams(object):
hyperparams_config.batch_norm)
self._activation_fn = _build_activation_fn(hyperparams_config.activation)
# TODO(kaftan): Unclear if these kwargs apply to separable & depthwise conv
# (Those might use depthwise_* instead of kernel_*)
# We should probably switch to using build_conv2d_layer and
# build_depthwise_conv2d_layer methods instead.
self._op_params = {
'kernel_regularizer': _build_keras_regularizer(
hyperparams_config.regularizer),
......
......@@ -106,10 +106,35 @@ def build(image_resizer_config):
raise ValueError(
'Invalid image resizer option: \'%s\'.' % image_resizer_oneof)
def grayscale_image_resizer(image):
[resized_image, resized_image_shape] = image_resizer_fn(image)
grayscale_image = preprocessor.rgb_to_gray(resized_image)
grayscale_image_shape = tf.concat([resized_image_shape[:-1], [1]], 0)
return [grayscale_image, grayscale_image_shape]
def grayscale_image_resizer(image, masks=None):
"""Convert to grayscale before applying image_resizer_fn.
Args:
image: A 3D tensor of shape [height, width, 3]
masks: (optional) rank 3 float32 tensor with shape [num_instances, height,
width] containing instance masks.
Returns:
Note that the position of the resized_image_shape changes based on whether
masks are present.
resized_image: A 3D tensor of shape [new_height, new_width, 1],
where the image has been resized (with bilinear interpolation) so that
min(new_height, new_width) == min_dimension or
max(new_height, new_width) == max_dimension.
resized_masks: If masks is not None, also outputs masks. A 3D tensor of
shape [num_instances, new_height, new_width].
resized_image_shape: A 1D tensor of shape [3] containing shape of the
resized image.
"""
# image_resizer_fn returns [resized_image, resized_image_shape] if
# masks is None; otherwise it returns
# [resized_image, resized_masks, resized_image_shape]. In either case we
# only modify the first and last elements of the returned list.
retval = image_resizer_fn(image, masks)
resized_image = retval[0]
resized_image_shape = retval[-1]
retval[0] = preprocessor.rgb_to_gray(resized_image)
retval[-1] = tf.concat([resized_image_shape[:-1], [1]], 0)
return retval
return functools.partial(grayscale_image_resizer)
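A hedged usage sketch of the resizer returned here; `image_resizer_config` is an assumed proto configured to select the grayscale path:

```python
import tensorflow as tf
from object_detection.builders import image_resizer_builder

# With no masks, the closure returns [resized_image, resized_image_shape];
# the image comes back single-channel and the reported shape ends in 1.
resizer_fn = image_resizer_builder.build(image_resizer_config)
resized_image, resized_shape = resizer_fn(tf.zeros([480, 640, 3]))
```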
......@@ -136,6 +136,14 @@ def build_faster_rcnn_classification_loss(loss_config):
config = loss_config.weighted_logits_softmax
return losses.WeightedSoftmaxClassificationAgainstLogitsLoss(
logit_scale=config.logit_scale)
if loss_type == 'weighted_sigmoid_focal':
config = loss_config.weighted_sigmoid_focal
alpha = None
if config.HasField('alpha'):
alpha = config.alpha
return losses.SigmoidFocalClassificationLoss(
gamma=config.gamma,
alpha=alpha)
# By default, Faster RCNN second stage classifier uses Softmax loss
# with anchor-wise outputs.
......
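For reference, a hedged sketch of the sigmoid focal loss this branch wires up (Lin et al., "Focal Loss for Dense Object Detection"); the library's SigmoidFocalClassificationLoss is the authoritative implementation:

```python
import tensorflow as tf

def sigmoid_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
  per_entry_ce = tf.nn.sigmoid_cross_entropy_with_logits(
      labels=targets, logits=logits)
  p = tf.sigmoid(logits)
  p_t = targets * p + (1 - targets) * (1 - p)  # prob of the true class
  modulator = tf.pow(1.0 - p_t, gamma)         # down-weights easy examples
  alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
  return alpha_t * modulator * per_entry_ce
```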
......@@ -280,7 +280,7 @@ class ClassificationLossBuilderTest(tf.test.TestCase):
losses.WeightedSigmoidClassificationLoss))
predictions = tf.constant([[[0.0, 1.0, 0.0], [0.0, 0.5, 0.5]]])
targets = tf.constant([[[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]])
weights = tf.constant([[1.0, 1.0]])
weights = tf.constant([[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]])
loss = classification_loss(predictions, targets, weights=weights)
self.assertEqual(loss.shape, [1, 2, 3])
......@@ -473,6 +473,19 @@ class FasterRcnnClassificationLossBuilderTest(tf.test.TestCase):
isinstance(classification_loss,
losses.WeightedSoftmaxClassificationAgainstLogitsLoss))
def test_build_sigmoid_focal_loss(self):
losses_text_proto = """
weighted_sigmoid_focal {
}
"""
losses_proto = losses_pb2.ClassificationLoss()
text_format.Merge(losses_text_proto, losses_proto)
classification_loss = losses_builder.build_faster_rcnn_classification_loss(
losses_proto)
self.assertTrue(
isinstance(classification_loss,
losses.SigmoidFocalClassificationLoss))
def test_build_softmax_loss_by_default(self):
losses_text_proto = """
"""
......
......@@ -47,6 +47,8 @@ from object_detection.models.ssd_mobilenet_v1_fpn_feature_extractor import SSDMo
from object_detection.models.ssd_mobilenet_v1_ppn_feature_extractor import SSDMobileNetV1PpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.models.ssd_mobilenet_v2_fpn_feature_extractor import SSDMobileNetV2FpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v2_keras_feature_extractor import SSDMobileNetV2KerasFeatureExtractor
from object_detection.models.ssd_pnasnet_feature_extractor import SSDPNASNetFeatureExtractor
from object_detection.predictors import rfcn_box_predictor
from object_detection.protos import model_pb2
from object_detection.utils import ops
......@@ -69,6 +71,11 @@ SSD_FEATURE_EXTRACTOR_CLASS_MAP = {
'ssd_resnet152_v1_ppn':
ssd_resnet_v1_ppn.SSDResnet152V1PpnFeatureExtractor,
'embedded_ssd_mobilenet_v1': EmbeddedSSDMobileNetV1FeatureExtractor,
'ssd_pnasnet': SSDPNASNetFeatureExtractor,
}
SSD_KERAS_FEATURE_EXTRACTOR_CLASS_MAP = {
'ssd_mobilenet_v2_keras': SSDMobileNetV2KerasFeatureExtractor
}
# A map of names to Faster R-CNN feature extractors.
......@@ -90,8 +97,7 @@ FASTER_RCNN_FEATURE_EXTRACTOR_CLASS_MAP = {
}
def build(model_config, is_training, add_summaries=True,
add_background_class=True):
def build(model_config, is_training, add_summaries=True):
"""Builds a DetectionModel based on the model config.
Args:
......@@ -99,10 +105,6 @@ def build(model_config, is_training, add_summaries=True,
DetectionModel.
is_training: True if this model is being built for training purposes.
add_summaries: Whether to add tensorflow summaries in the model graph.
add_background_class: Whether to add an implicit background class to one-hot
encodings of groundtruth labels. Set to false if using groundtruth labels
with an explicit background class or using multiclass scores instead of
truth in the case of distillation. Ignored in the case of faster_rcnn.
Returns:
DetectionModel based on the config.
......@@ -113,21 +115,26 @@ def build(model_config, is_training, add_summaries=True,
raise ValueError('model_config not of type model_pb2.DetectionModel.')
meta_architecture = model_config.WhichOneof('model')
if meta_architecture == 'ssd':
return _build_ssd_model(model_config.ssd, is_training, add_summaries,
add_background_class)
return _build_ssd_model(model_config.ssd, is_training, add_summaries)
if meta_architecture == 'faster_rcnn':
return _build_faster_rcnn_model(model_config.faster_rcnn, is_training,
add_summaries)
raise ValueError('Unknown meta architecture: {}'.format(meta_architecture))
def _build_ssd_feature_extractor(feature_extractor_config, is_training,
def _build_ssd_feature_extractor(feature_extractor_config,
is_training,
freeze_batchnorm,
reuse_weights=None):
"""Builds a ssd_meta_arch.SSDFeatureExtractor based on config.
Args:
feature_extractor_config: An SSDFeatureExtractor proto config from ssd.proto.
is_training: True if this feature extractor is being built for training.
freeze_batchnorm: Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
reuse_weights: if the feature extractor should reuse weights.
Returns:
......@@ -137,20 +144,31 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
ValueError: On invalid feature extractor type.
"""
feature_type = feature_extractor_config.type
is_keras_extractor = feature_type in SSD_KERAS_FEATURE_EXTRACTOR_CLASS_MAP
depth_multiplier = feature_extractor_config.depth_multiplier
min_depth = feature_extractor_config.min_depth
pad_to_multiple = feature_extractor_config.pad_to_multiple
use_explicit_padding = feature_extractor_config.use_explicit_padding
use_depthwise = feature_extractor_config.use_depthwise
conv_hyperparams = hyperparams_builder.build(
feature_extractor_config.conv_hyperparams, is_training)
if is_keras_extractor:
conv_hyperparams = hyperparams_builder.KerasLayerHyperparams(
feature_extractor_config.conv_hyperparams)
else:
conv_hyperparams = hyperparams_builder.build(
feature_extractor_config.conv_hyperparams, is_training)
override_base_feature_extractor_hyperparams = (
feature_extractor_config.override_base_feature_extractor_hyperparams)
if feature_type not in SSD_FEATURE_EXTRACTOR_CLASS_MAP:
if (feature_type not in SSD_FEATURE_EXTRACTOR_CLASS_MAP) and (
not is_keras_extractor):
raise ValueError('Unknown ssd feature_extractor: {}'.format(feature_type))
feature_extractor_class = SSD_FEATURE_EXTRACTOR_CLASS_MAP[feature_type]
if is_keras_extractor:
feature_extractor_class = SSD_KERAS_FEATURE_EXTRACTOR_CLASS_MAP[
feature_type]
else:
feature_extractor_class = SSD_FEATURE_EXTRACTOR_CLASS_MAP[feature_type]
kwargs = {
'is_training':
is_training,
......@@ -160,10 +178,6 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
min_depth,
'pad_to_multiple':
pad_to_multiple,
'conv_hyperparams_fn':
conv_hyperparams,
'reuse_weights':
reuse_weights,
'use_explicit_padding':
use_explicit_padding,
'use_depthwise':
......@@ -172,6 +186,18 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
override_base_feature_extractor_hyperparams
}
if is_keras_extractor:
kwargs.update({
'conv_hyperparams': conv_hyperparams,
'inplace_batchnorm_update': False,
'freeze_batchnorm': freeze_batchnorm
})
else:
kwargs.update({
'conv_hyperparams_fn': conv_hyperparams,
'reuse_weights': reuse_weights,
})
if feature_extractor_config.HasField('fpn'):
kwargs.update({
'fpn_min_level':
......@@ -185,8 +211,7 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
return feature_extractor_class(**kwargs)
def _build_ssd_model(ssd_config, is_training, add_summaries,
add_background_class=True):
def _build_ssd_model(ssd_config, is_training, add_summaries):
"""Builds an SSD detection model based on the model config.
Args:
......@@ -194,10 +219,6 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
SSDMetaArch.
is_training: True if this model is being built for training purposes.
add_summaries: Whether to add tf summaries in the model.
add_background_class: Whether to add an implicit background class to one-hot
encodings of groundtruth labels. Set to false if using groundtruth labels
with an explicit background class or using multiclass scores instead of
truth in the case of distillation.
Returns:
SSDMetaArch based on the config.
......@@ -210,6 +231,7 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
# Feature extractor
feature_extractor = _build_ssd_feature_extractor(
feature_extractor_config=ssd_config.feature_extractor,
freeze_batchnorm=ssd_config.freeze_batchnorm,
is_training=is_training)
box_coder = box_coder_builder.build(ssd_config.box_coder)
......@@ -218,11 +240,23 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
ssd_config.similarity_calculator)
encode_background_as_zeros = ssd_config.encode_background_as_zeros
negative_class_weight = ssd_config.negative_class_weight
ssd_box_predictor = box_predictor_builder.build(hyperparams_builder.build,
ssd_config.box_predictor,
is_training, num_classes)
anchor_generator = anchor_generator_builder.build(
ssd_config.anchor_generator)
if feature_extractor.is_keras_model:
ssd_box_predictor = box_predictor_builder.build_keras(
conv_hyperparams_fn=hyperparams_builder.KerasLayerHyperparams,
freeze_batchnorm=ssd_config.freeze_batchnorm,
inplace_batchnorm_update=False,
num_predictions_per_location_list=anchor_generator
.num_anchors_per_location(),
box_predictor_config=ssd_config.box_predictor,
is_training=is_training,
num_classes=num_classes,
add_background_class=ssd_config.add_background_class)
else:
ssd_box_predictor = box_predictor_builder.build(
hyperparams_builder.build, ssd_config.box_predictor, is_training,
num_classes, ssd_config.add_background_class)
image_resizer_fn = image_resizer_builder.build(ssd_config.image_resizer)
non_max_suppression_fn, score_conversion_fn = post_processing_builder.build(
ssd_config.post_processing)
......@@ -244,7 +278,7 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
if ssd_config.use_expected_classification_loss_under_sampling:
expected_classification_loss_under_sampling = functools.partial(
ops.expected_classification_loss_under_sampling,
minimum_negative_sampling=ssd_config.minimum_negative_sampling,
min_num_negative_samples=ssd_config.min_num_negative_samples,
desired_negative_sampling_ratio=ssd_config.
desired_negative_sampling_ratio)
......@@ -271,7 +305,7 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
normalize_loc_loss_by_codesize=normalize_loc_loss_by_codesize,
freeze_batchnorm=ssd_config.freeze_batchnorm,
inplace_batchnorm_update=ssd_config.inplace_batchnorm_update,
add_background_class=add_background_class,
add_background_class=ssd_config.add_background_class,
random_example_sampler=random_example_sampler,
expected_classification_loss_under_sampling=
expected_classification_loss_under_sampling)
......@@ -357,12 +391,11 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
frcnn_config.first_stage_box_predictor_kernel_size)
first_stage_box_predictor_depth = frcnn_config.first_stage_box_predictor_depth
first_stage_minibatch_size = frcnn_config.first_stage_minibatch_size
# TODO(bhattad): When eval is supported using static shapes, add separate
# use_static_shapes_for_trainig and use_static_shapes_for_evaluation.
use_static_shapes = frcnn_config.use_static_shapes and is_training
use_static_shapes = frcnn_config.use_static_shapes
first_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=frcnn_config.first_stage_positive_balance_fraction,
is_static=frcnn_config.use_static_balanced_label_sampler and is_training)
is_static=(frcnn_config.use_static_balanced_label_sampler and
use_static_shapes))
first_stage_max_proposals = frcnn_config.first_stage_max_proposals
if (frcnn_config.first_stage_nms_iou_threshold < 0 or
frcnn_config.first_stage_nms_iou_threshold > 1.0):
......@@ -377,7 +410,7 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
iou_thresh=frcnn_config.first_stage_nms_iou_threshold,
max_size_per_class=frcnn_config.first_stage_max_proposals,
max_total_size=frcnn_config.first_stage_max_proposals,
use_static_shapes=use_static_shapes and is_training)
use_static_shapes=use_static_shapes)
first_stage_loc_loss_weight = (
frcnn_config.first_stage_localization_loss_weight)
first_stage_obj_loss_weight = frcnn_config.first_stage_objectness_loss_weight
......@@ -398,7 +431,8 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
second_stage_batch_size = frcnn_config.second_stage_batch_size
second_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=frcnn_config.second_stage_balance_fraction,
is_static=frcnn_config.use_static_balanced_label_sampler and is_training)
is_static=(frcnn_config.use_static_balanced_label_sampler and
use_static_shapes))
(second_stage_non_max_suppression_fn, second_stage_score_conversion_fn
) = post_processing_builder.build(frcnn_config.second_stage_post_processing)
second_stage_localization_loss_weight = (
......
......@@ -39,6 +39,9 @@ from object_detection.models.ssd_mobilenet_v1_fpn_feature_extractor import SSDMo
from object_detection.models.ssd_mobilenet_v1_ppn_feature_extractor import SSDMobileNetV1PpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.models.ssd_mobilenet_v2_fpn_feature_extractor import SSDMobileNetV2FpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v2_keras_feature_extractor import SSDMobileNetV2KerasFeatureExtractor
from object_detection.predictors import convolutional_box_predictor
from object_detection.predictors import convolutional_keras_box_predictor
from object_detection.protos import model_pb2
FRCNN_RESNET_FEAT_MAPS = {
......@@ -148,7 +151,7 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
}
}
use_expected_classification_loss_under_sampling: true
minimum_negative_sampling: 10
min_num_negative_samples: 10
desired_negative_sampling_ratio: 2
}"""
model_proto = model_pb2.DetectionModel()
......@@ -160,7 +163,7 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
self.assertIsNotNone(model._expected_classification_loss_under_sampling)
self.assertEqual(
model._expected_classification_loss_under_sampling.keywords, {
'minimum_negative_sampling': 10,
'min_num_negative_samples': 10,
'desired_negative_sampling_ratio': 2
})
......@@ -713,6 +716,86 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch)
self.assertIsInstance(model._feature_extractor,
SSDMobileNetV2FeatureExtractor)
self.assertIsInstance(model._box_predictor,
convolutional_box_predictor.ConvolutionalBoxPredictor)
self.assertTrue(model._normalize_loc_loss_by_codesize)
self.assertTrue(model._target_assigner._weight_regression_loss_by_score)
def test_create_ssd_mobilenet_v2_keras_model_from_config(self):
model_text_proto = """
ssd {
feature_extractor {
type: 'ssd_mobilenet_v2_keras'
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}
box_coder {
faster_rcnn_box_coder {
}
}
matcher {
argmax_matcher {
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
aspect_ratios: 1.0
}
}
image_resizer {
fixed_shape_resizer {
height: 320
width: 320
}
}
box_predictor {
convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}
}
normalize_loc_loss_by_codesize: true
loss {
classification_loss {
weighted_softmax {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
}
weight_regression_loss_by_score: true
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
model = self.create_model(model_proto)
self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch)
self.assertIsInstance(model._feature_extractor,
SSDMobileNetV2KerasFeatureExtractor)
self.assertIsInstance(
model._box_predictor,
convolutional_keras_box_predictor.ConvolutionalBoxPredictor)
self.assertTrue(model._normalize_loc_loss_by_codesize)
self.assertTrue(model._target_assigner._weight_regression_loss_by_score)
......
......@@ -167,6 +167,7 @@ def build(preprocessor_step_config):
config.max_aspect_ratio),
'area_range': (config.min_area, config.max_area),
'overlap_thresh': config.overlap_thresh,
'clip_boxes': config.clip_boxes,
'random_coef': config.random_coef,
})
......@@ -217,6 +218,7 @@ def build(preprocessor_step_config):
config.max_aspect_ratio),
'area_range': (config.min_area, config.max_area),
'overlap_thresh': config.overlap_thresh,
'clip_boxes': config.clip_boxes,
'random_coef': config.random_coef,
}
if min_padded_size_ratio:
......@@ -252,6 +254,7 @@ def build(preprocessor_step_config):
for op in config.operations]
area_range = [(op.min_area, op.max_area) for op in config.operations]
overlap_thresh = [op.overlap_thresh for op in config.operations]
clip_boxes = [op.clip_boxes for op in config.operations]
random_coef = [op.random_coef for op in config.operations]
return (preprocessor.ssd_random_crop,
{
......@@ -259,6 +262,7 @@ def build(preprocessor_step_config):
'aspect_ratio_range': aspect_ratio_range,
'area_range': area_range,
'overlap_thresh': overlap_thresh,
'clip_boxes': clip_boxes,
'random_coef': random_coef,
})
return (preprocessor.ssd_random_crop, {})
......@@ -271,6 +275,7 @@ def build(preprocessor_step_config):
for op in config.operations]
area_range = [(op.min_area, op.max_area) for op in config.operations]
overlap_thresh = [op.overlap_thresh for op in config.operations]
clip_boxes = [op.clip_boxes for op in config.operations]
random_coef = [op.random_coef for op in config.operations]
min_padded_size_ratio = [tuple(op.min_padded_size_ratio)
for op in config.operations]
......@@ -284,6 +289,7 @@ def build(preprocessor_step_config):
'aspect_ratio_range': aspect_ratio_range,
'area_range': area_range,
'overlap_thresh': overlap_thresh,
'clip_boxes': clip_boxes,
'random_coef': random_coef,
'min_padded_size_ratio': min_padded_size_ratio,
'max_padded_size_ratio': max_padded_size_ratio,
......@@ -297,6 +303,7 @@ def build(preprocessor_step_config):
min_object_covered = [op.min_object_covered for op in config.operations]
area_range = [(op.min_area, op.max_area) for op in config.operations]
overlap_thresh = [op.overlap_thresh for op in config.operations]
clip_boxes = [op.clip_boxes for op in config.operations]
random_coef = [op.random_coef for op in config.operations]
return (preprocessor.ssd_random_crop_fixed_aspect_ratio,
{
......@@ -304,6 +311,7 @@ def build(preprocessor_step_config):
'aspect_ratio': config.aspect_ratio,
'area_range': area_range,
'overlap_thresh': overlap_thresh,
'clip_boxes': clip_boxes,
'random_coef': random_coef,
})
return (preprocessor.ssd_random_crop_fixed_aspect_ratio, {})
......@@ -332,6 +340,7 @@ def build(preprocessor_step_config):
kwargs['area_range'] = [(op.min_area, op.max_area)
for op in config.operations]
kwargs['overlap_thresh'] = [op.overlap_thresh for op in config.operations]
kwargs['clip_boxes'] = [op.clip_boxes for op in config.operations]
kwargs['random_coef'] = [op.random_coef for op in config.operations]
return (preprocessor.ssd_random_crop_pad_fixed_aspect_ratio, kwargs)
......
......@@ -222,6 +222,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.25
max_area: 0.875
overlap_thresh: 0.5
clip_boxes: False
random_coef: 0.125
}
"""
......@@ -234,6 +235,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio_range': (0.75, 1.5),
'area_range': (0.25, 0.875),
'overlap_thresh': 0.5,
'clip_boxes': False,
'random_coef': 0.125,
})
......@@ -261,6 +263,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.25
max_area: 0.875
overlap_thresh: 0.5
clip_boxes: False
random_coef: 0.125
}
"""
......@@ -273,6 +276,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio_range': (0.75, 1.5),
'area_range': (0.25, 0.875),
'overlap_thresh': 0.5,
'clip_boxes': False,
'random_coef': 0.125,
})
......@@ -285,6 +289,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.25
max_area: 0.875
overlap_thresh: 0.5
clip_boxes: False
random_coef: 0.125
min_padded_size_ratio: 0.5
min_padded_size_ratio: 0.75
......@@ -304,6 +309,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio_range': (0.75, 1.5),
'area_range': (0.25, 0.875),
'overlap_thresh': 0.5,
'clip_boxes': False,
'random_coef': 0.125,
'min_padded_size_ratio': (0.5, 0.75),
'max_padded_size_ratio': (0.5, 0.75),
......@@ -315,6 +321,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
random_crop_to_aspect_ratio {
aspect_ratio: 0.85
overlap_thresh: 0.35
clip_boxes: False
}
"""
preprocessor_proto = preprocessor_pb2.PreprocessingStep()
......@@ -322,7 +329,8 @@ class PreprocessorBuilderTest(tf.test.TestCase):
function, args = preprocessor_builder.build(preprocessor_proto)
self.assertEqual(function, preprocessor.random_crop_to_aspect_ratio)
self.assert_dictionary_close(args, {'aspect_ratio': 0.85,
'overlap_thresh': 0.35})
'overlap_thresh': 0.35,
'clip_boxes': False})
def test_build_random_black_patches(self):
preprocessor_text_proto = """
......@@ -411,6 +419,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.0
clip_boxes: False
random_coef: 0.375
}
operations {
......@@ -420,6 +429,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.25
clip_boxes: True
random_coef: 0.375
}
}
......@@ -432,6 +442,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio_range': [(0.875, 1.125), (0.75, 1.5)],
'area_range': [(0.5, 1.0), (0.5, 1.0)],
'overlap_thresh': [0.0, 0.25],
'clip_boxes': [False, True],
'random_coef': [0.375, 0.375]})
def test_build_ssd_random_crop_empty_operations(self):
......@@ -455,6 +466,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.0
clip_boxes: False
random_coef: 0.375
min_padded_size_ratio: [1.0, 1.0]
max_padded_size_ratio: [2.0, 2.0]
......@@ -469,6 +481,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.25
clip_boxes: True
random_coef: 0.375
min_padded_size_ratio: [1.0, 1.0]
max_padded_size_ratio: [2.0, 2.0]
......@@ -486,6 +499,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio_range': [(0.875, 1.125), (0.75, 1.5)],
'area_range': [(0.5, 1.0), (0.5, 1.0)],
'overlap_thresh': [0.0, 0.25],
'clip_boxes': [False, True],
'random_coef': [0.375, 0.375],
'min_padded_size_ratio': [(1.0, 1.0), (1.0, 1.0)],
'max_padded_size_ratio': [(2.0, 2.0), (2.0, 2.0)],
......@@ -499,6 +513,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.0
clip_boxes: False
random_coef: 0.375
}
operations {
......@@ -506,6 +521,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.25
clip_boxes: True
random_coef: 0.375
}
aspect_ratio: 0.875
......@@ -519,6 +535,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio': 0.875,
'area_range': [(0.5, 1.0), (0.5, 1.0)],
'overlap_thresh': [0.0, 0.25],
'clip_boxes': [False, True],
'random_coef': [0.375, 0.375]})
def test_build_ssd_random_crop_pad_fixed_aspect_ratio(self):
......@@ -531,6 +548,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.0
clip_boxes: False
random_coef: 0.375
}
operations {
......@@ -540,6 +558,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.25
clip_boxes: True
random_coef: 0.375
}
aspect_ratio: 0.875
......@@ -557,6 +576,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio_range': [(0.875, 1.125), (0.75, 1.5)],
'area_range': [(0.5, 1.0), (0.5, 1.0)],
'overlap_thresh': [0.0, 0.25],
'clip_boxes': [False, True],
'random_coef': [0.375, 0.375],
'min_padded_size_ratio': (1.0, 1.0),
'max_padded_size_ratio': (2.0, 2.0)})
......
......@@ -225,7 +225,9 @@ class WeightedSigmoidClassificationLoss(Loss):
num_classes] representing the predicted logits for each class
target_tensor: A float tensor of shape [batch_size, num_anchors,
num_classes] representing one-hot encoded classification targets
weights: a float tensor of shape [batch_size, num_anchors]
weights: a float tensor of shape [batch_size, num_anchors,
num_classes] or [batch_size, num_anchors, 1]. If the shape is
[batch_size, num_anchors, 1], all the classes are equally weighted.
class_indices: (Optional) A 1-D integer tensor of class indices.
If provided, computes loss only for the specified class indices.
......@@ -233,7 +235,6 @@ class WeightedSigmoidClassificationLoss(Loss):
loss: a float tensor of shape [batch_size, num_anchors, num_classes]
representing the value of the loss function.
"""
weights = tf.expand_dims(weights, 2)
if class_indices is not None:
weights *= tf.reshape(
ops.indices_to_dense_vector(class_indices,
......@@ -273,7 +274,9 @@ class SigmoidFocalClassificationLoss(Loss):
num_classes] representing the predicted logits for each class
target_tensor: A float tensor of shape [batch_size, num_anchors,
num_classes] representing one-hot encoded classification targets
weights: a float tensor of shape [batch_size, num_anchors]
weights: a float tensor of shape [batch_size, num_anchors,
num_classes] or [batch_size, num_anchors, 1]. If the shape is
[batch_size, num_anchors, 1], all the classes are equally weighted.
class_indices: (Optional) A 1-D integer tensor of class indices.
If provided, computes loss only for the specified class indices.
......@@ -281,7 +284,6 @@ class SigmoidFocalClassificationLoss(Loss):
loss: a float tensor of shape [batch_size, num_anchors, num_classes]
representing the value of the loss function.
"""
weights = tf.expand_dims(weights, 2)
if class_indices is not None:
weights *= tf.reshape(
ops.indices_to_dense_vector(class_indices,
......@@ -326,12 +328,15 @@ class WeightedSoftmaxClassificationLoss(Loss):
num_classes] representing the predicted logits for each class
target_tensor: A float tensor of shape [batch_size, num_anchors,
num_classes] representing one-hot encoded classification targets
weights: a float tensor of shape [batch_size, num_anchors]
weights: a float tensor of shape either [batch_size, num_anchors,
num_classes] or [batch_size, num_anchors, 1]. If the shape is
[batch_size, num_anchors, 1], all classes are weighted equally.
Returns:
loss: a float tensor of shape [batch_size, num_anchors]
representing the value of the loss function.
"""
weights = tf.reduce_mean(weights, axis=2)
num_classes = prediction_tensor.get_shape().as_list()[-1]
prediction_tensor = tf.divide(
prediction_tensor, self._logit_scale, name='scale_logit')
......@@ -372,12 +377,15 @@ class WeightedSoftmaxClassificationAgainstLogitsLoss(Loss):
num_classes] representing the predicted logits for each class
target_tensor: A float tensor of shape [batch_size, num_anchors,
num_classes] representing logit classification targets
weights: a float tensor of shape [batch_size, num_anchors]
weights: a float tensor of shape either [batch_size, num_anchors,
num_classes] or [batch_size, num_anchors, 1]. If the shape is
[batch_size, num_anchors, 1], all classes are weighted equally.
Returns:
loss: a float tensor of shape [batch_size, num_anchors]
representing the value of the loss function.
"""
weights = tf.reduce_mean(weights, axis=2)
num_classes = prediction_tensor.get_shape().as_list()[-1]
target_tensor = self._scale_and_softmax_logits(target_tensor)
prediction_tensor = tf.divide(prediction_tensor, self._logit_scale,
......@@ -431,7 +439,9 @@ class BootstrappedSigmoidClassificationLoss(Loss):
num_classes] representing the predicted logits for each class
target_tensor: A float tensor of shape [batch_size, num_anchors,
num_classes] representing one-hot encoded classification targets
weights: a float tensor of shape [batch_size, num_anchors]
weights: a float tensor of shape either [batch_size, num_anchors,
num_classes] or [batch_size, num_anchors, 1]. If the shape is
[batch_size, num_anchors, 1], all classes are weighted equally.
Returns:
loss: a float tensor of shape [batch_size, num_anchors, num_classes]
......@@ -446,7 +456,7 @@ class BootstrappedSigmoidClassificationLoss(Loss):
tf.sigmoid(prediction_tensor) > 0.5, tf.float32)
per_entry_cross_ent = (tf.nn.sigmoid_cross_entropy_with_logits(
labels=bootstrap_target_tensor, logits=prediction_tensor))
return per_entry_cross_ent * tf.expand_dims(weights, 2)
return per_entry_cross_ent * weights
class HardExampleMiner(object):
......
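The weight-handling change above shifts responsibility for the class dimension to callers: the losses no longer expand rank-2 weights internally, so existing code that passed [batch_size, num_anchors] weights must add a trailing axis itself. A minimal sketch of the migration, with illustrative values:

    import tensorflow as tf

    # Per-anchor weights in the old rank-2 layout.
    anchor_weights = tf.constant([[1.0, 1.0, 0.5, 1.0],
                                  [1.0, 1.0, 1.0, 0.0]])

    # A size-1 trailing axis broadcasts against
    # [batch_size, num_anchors, num_classes], weighting every class equally
    # and reproducing the old behaviour.
    loss_weights = tf.expand_dims(anchor_weights, axis=2)  # shape [2, 4, 1]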
......@@ -209,8 +209,14 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
loss_op = losses.WeightedSigmoidClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss)
......@@ -237,8 +243,14 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
loss_op = losses.WeightedSigmoidClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss, axis=2)
......@@ -266,8 +278,14 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 1, 0, 0],
[1, 1, 1, 0],
[1, 0, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[0, 0, 0, 0]]], tf.float32)
# Ignores the last class.
class_indices = tf.constant([0, 1, 2], tf.int32)
loss_op = losses.WeightedSigmoidClassificationLoss()
......@@ -306,9 +324,18 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0],
[1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]], tf.float32)
losses_mask = tf.constant([True, True, False], tf.bool)
loss_op = losses.WeightedSigmoidClassificationLoss()
......@@ -345,7 +372,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0],
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1], [1], [1], [1], [1], [1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=None)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -371,7 +398,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1],
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1], [1], [1], [1], [1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=None)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -397,7 +424,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1],
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1], [1], [1], [1], [1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=None)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -423,7 +450,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1],
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1], [1], [1], [1], [1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=1.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -451,7 +478,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1],
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1], [1], [1], [1], [1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=0.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -485,8 +512,14 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=0.5, gamma=0.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = focal_loss_op(prediction_tensor, target_tensor,
......@@ -515,8 +548,14 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=None, gamma=0.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = focal_loss_op(prediction_tensor, target_tensor,
......@@ -546,8 +585,14 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 0, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=1.0, gamma=0.0)
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -578,8 +623,14 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 0, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=0.75, gamma=0.0)
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -620,9 +671,18 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1, 0, 0],
[1, 0, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]], tf.float32)
losses_mask = tf.constant([True, True, False], tf.bool)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=0.75, gamma=0.0)
......@@ -659,8 +719,14 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[0, 1, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[0.5, 0.5, 0.5],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
loss_op = losses.WeightedSoftmaxClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss)
......@@ -687,8 +753,14 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[0, 1, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[0.5, 0.5, 0.5],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
loss_op = losses.WeightedSoftmaxClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
......@@ -718,8 +790,14 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[0, 1, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]], tf.float32)
loss_op = losses.WeightedSoftmaxClassificationLoss(logit_scale=logit_scale)
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
......@@ -755,9 +833,18 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
[1, 0, 0],
[1, 0, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 0],
[1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[0.5, 0.5, 0.5],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]], tf.float32)
losses_mask = tf.constant([True, True, False], tf.bool)
loss_op = losses.WeightedSoftmaxClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights,
......@@ -792,6 +879,11 @@ class WeightedSoftmaxClassificationAgainstLogitsLossTest(tf.test.TestCase):
[100, -100, -100]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 1]], tf.float32)
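# Expand the legacy rank-2 weights to [batch_size, num_anchors, num_classes]
# by tiling a new trailing axis num_classes (= 3) times.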
weights_shape = tf.shape(weights)
weights_multiple = tf.concat(
[tf.ones_like(weights_shape), tf.constant([3])],
axis=0)
weights = tf.tile(tf.expand_dims(weights, 2), weights_multiple)
loss_op = losses.WeightedSoftmaxClassificationAgainstLogitsLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss)
......@@ -820,6 +912,11 @@ class WeightedSoftmaxClassificationAgainstLogitsLossTest(tf.test.TestCase):
[100, -100, -100]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 0]], tf.float32)
weights_shape = tf.shape(weights)
weights_multiple = tf.concat(
[tf.ones_like(weights_shape), tf.constant([3])],
axis=0)
weights = tf.tile(tf.expand_dims(weights, 2), weights_multiple)
loss_op = losses.WeightedSoftmaxClassificationAgainstLogitsLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
......@@ -849,6 +946,11 @@ class WeightedSoftmaxClassificationAgainstLogitsLossTest(tf.test.TestCase):
[100, -100, -100]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 0]], tf.float32)
weights_shape = tf.shape(weights)
weights_multiple = tf.concat(
[tf.ones_like(weights_shape), tf.constant([3])],
axis=0)
weights = tf.tile(tf.expand_dims(weights, 2), weights_multiple)
loss_op = losses.WeightedSoftmaxClassificationAgainstLogitsLoss(
logit_scale=logit_scale)
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
......@@ -894,8 +996,14 @@ class BootstrappedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
alpha = tf.constant(.5, tf.float32)
loss_op = losses.BootstrappedSigmoidClassificationLoss(
alpha, bootstrap_type='soft')
......@@ -923,8 +1031,14 @@ class BootstrappedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
alpha = tf.constant(.5, tf.float32)
loss_op = losses.BootstrappedSigmoidClassificationLoss(
alpha, bootstrap_type='hard')
......@@ -952,8 +1066,14 @@ class BootstrappedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
alpha = tf.constant(.5, tf.float32)
loss_op = losses.BootstrappedSigmoidClassificationLoss(
alpha, bootstrap_type='hard')
......
......@@ -197,8 +197,10 @@ class Match(object):
The shape of the gathered tensor is [match_results.shape[0]] +
input_tensor.shape[1:].
"""
input_tensor = tf.concat([tf.stack([ignored_value, unmatched_value]),
input_tensor], axis=0)
input_tensor = tf.concat(
[tf.stack([ignored_value, unmatched_value]),
tf.to_float(input_tensor)],
axis=0)
gather_indices = tf.maximum(self.match_results + 2, 0)
gathered_tensor = self._gather_op(input_tensor, gather_indices)
return gathered_tensor
......
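For context on the Match.gather_based_on_match change above: match_results uses -2 for ignored and -1 for unmatched entries, so prepending the two fill values and shifting every index by +2 lets a single gather cover all three cases. A toy sketch of the index arithmetic (values are hypothetical, not from this commit):

    import tensorflow as tf

    # -2 = ignored, -1 = unmatched, k >= 0 = matched to row k.
    match_results = tf.constant([-2, -1, 0, 2])
    rows = tf.constant([10.0, 20.0, 30.0])

    # Prepend ignored_value (0.0) and unmatched_value (-1.0), then shift:
    # -2 -> index 0, -1 -> index 1, k -> index k + 2.
    table = tf.concat([tf.stack([0.0, -1.0]), rows], axis=0)
    gathered = tf.gather(table, tf.maximum(match_results + 2, 0))
    # gathered == [0.0, -1.0, 10.0, 30.0]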
......@@ -289,6 +289,18 @@ class DetectionModel(object):
self._groundtruth_lists[
fields.InputDataFields.is_annotated] = is_annotated_list
@abstractmethod
def regularization_losses(self):
"""Returns a list of regularization losses for this model.
Returns a list of regularization losses for this model that the estimator
needs to use during training/optimization.
Returns:
A list of regularization loss tensors.
"""
pass
@abstractmethod
def restore_map(self, fine_tune_checkpoint_type='detection'):
"""Returns a map of variables to load from a foreign checkpoint.
......@@ -312,3 +324,16 @@ class DetectionModel(object):
the model graph.
"""
pass
@abstractmethod
def updates(self):
"""Returns a list of update operators for this model.
Returns a list of update operators for this model that must be executed at
each training step. The estimator's train op needs to have a control
dependency on these updates.
Returns:
A list of update operators.
"""
pass
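A minimal sketch, not from this commit, of how a concrete DetectionModel subclass might satisfy the two new abstract methods using standard TF 1.x graph collections:

    import tensorflow as tf

    # Inside a hypothetical DetectionModel subclass:

    def regularization_losses(self):
      # Regularizers attached through tf.layers/slim land in this collection.
      return tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)

    def updates(self):
      # Batch-norm moving-average updates and similar ops accumulate here;
      # the estimator's train op must take a control dependency on them.
      return tf.get_collection(tf.GraphKeys.UPDATE_OPS)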
......@@ -15,6 +15,7 @@
"""Post-processing operations on detected boxes."""
import numpy as np
import tensorflow as tf
from object_detection.core import box_list
......@@ -407,28 +408,36 @@ def batch_multiclass_non_max_suppression(boxes,
for key, value in zip(additional_fields, args[4:-1])
}
per_image_num_valid_boxes = args[-1]
per_image_boxes = tf.reshape(
tf.slice(per_image_boxes, 3 * [0],
tf.stack([per_image_num_valid_boxes, -1, -1])), [-1, q, 4])
per_image_scores = tf.reshape(
tf.slice(per_image_scores, [0, 0],
tf.stack([per_image_num_valid_boxes, -1])),
[-1, num_classes])
per_image_masks = tf.reshape(
tf.slice(per_image_masks, 4 * [0],
tf.stack([per_image_num_valid_boxes, -1, -1, -1])),
[-1, q, per_image_masks.shape[2].value,
per_image_masks.shape[3].value])
if per_image_additional_fields is not None:
for key, tensor in per_image_additional_fields.items():
additional_field_shape = tensor.get_shape()
additional_field_dim = len(additional_field_shape)
per_image_additional_fields[key] = tf.reshape(
tf.slice(per_image_additional_fields[key],
additional_field_dim * [0],
tf.stack([per_image_num_valid_boxes] +
(additional_field_dim - 1) * [-1])),
[-1] + [dim.value for dim in additional_field_shape[1:]])
if use_static_shapes:
total_proposals = tf.shape(per_image_scores)
per_image_scores = tf.where(
tf.less(tf.range(total_proposals[0]), per_image_num_valid_boxes),
per_image_scores,
tf.fill(total_proposals, np.finfo('float32').min))
else:
per_image_boxes = tf.reshape(
tf.slice(per_image_boxes, 3 * [0],
tf.stack([per_image_num_valid_boxes, -1, -1])), [-1, q, 4])
per_image_scores = tf.reshape(
tf.slice(per_image_scores, [0, 0],
tf.stack([per_image_num_valid_boxes, -1])),
[-1, num_classes])
per_image_masks = tf.reshape(
tf.slice(per_image_masks, 4 * [0],
tf.stack([per_image_num_valid_boxes, -1, -1, -1])),
[-1, q, per_image_masks.shape[2].value,
per_image_masks.shape[3].value])
if per_image_additional_fields is not None:
for key, tensor in per_image_additional_fields.items():
additional_field_shape = tensor.get_shape()
additional_field_dim = len(additional_field_shape)
per_image_additional_fields[key] = tf.reshape(
tf.slice(per_image_additional_fields[key],
additional_field_dim * [0],
tf.stack([per_image_num_valid_boxes] +
(additional_field_dim - 1) * [-1])),
[-1] + [dim.value for dim in additional_field_shape[1:]])
nmsed_boxlist, num_valid_nms_boxes = multiclass_non_max_suppression(
per_image_boxes,
per_image_scores,
......
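The new use_static_shapes branch above avoids tf.slice/tf.reshape, whose outputs lose static shape information, by instead overwriting the scores of invalid proposals with float32's minimum so NMS can never select them. A self-contained sketch of that masking trick with toy values:

    import numpy as np
    import tensorflow as tf

    # Toy scores for 4 proposals x 2 classes; only the first 2 are valid.
    scores = tf.constant([[0.9, 0.1],
                          [0.8, 0.2],
                          [0.7, 0.3],
                          [0.6, 0.4]])
    num_valid = tf.constant(2)

    # Keep the static [4, 2] shape and mask invalid rows instead of slicing.
    total = tf.shape(scores)
    masked = tf.where(tf.less(tf.range(total[0]), num_valid),
                      scores,
                      tf.fill(total, np.finfo('float32').min))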