Commit e00e0e13 authored by dreamdragon

Merge remote-tracking branch 'upstream/master'

parents b915db4e 402b561b
......@@ -115,6 +115,7 @@ approximately 10 times slower.
First ensure that you have installed the following required packages:
* **Bazel** ([instructions](http://bazel.io/docs/install.html))
* **Python 2.7**
* **TensorFlow** 1.0 or greater ([instructions](https://www.tensorflow.org/install/))
* **NumPy** ([instructions](http://www.scipy.org/install.html))
* **Natural Language Toolkit (NLTK)**:
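  As a hedged, minimal sketch (not taken verbatim from the README), the tokenizer data that im2txt's evaluation scripts typically rely on can be fetched once NLTK itself is installed:

```python
# Minimal sketch, assuming NLTK is already installed (e.g. via pip or
# the conda environment below); 'punkt' is the tokenizer data package
# commonly needed for caption evaluation.
import nltk
nltk.download('punkt')
```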
......
name: im2txt
channels:
- defaults
dependencies:
- _tflow_select=2.3.0=mkl
- absl-py=0.5.0=py27_0
- astor=0.7.1=py27_0
- backports=1.0=py27_1
- backports.functools_lru_cache=1.5=py27_1
- backports.shutil_get_terminal_size=1.0.0=py27_2
- backports.weakref=1.0.post1=py27_0
- backports_abc=0.5=py27_0
- blas=1.0=mkl
- bleach=3.0.2=py27_0
- ca-certificates=2018.03.07=0
- certifi=2018.10.15=py27_0
- configparser=3.5.0=py27_0
- cycler=0.10.0=py27_0
- dbus=1.13.2=h714fa37_1
- decorator=4.3.0=py27_0
- entrypoints=0.2.3=py27_2
- enum34=1.1.6=py27_1
- expat=2.2.6=he6710b0_0
- fastcache=1.0.2=py27h14c3975_2
- fontconfig=2.13.0=h9420a91_0
- freetype=2.9.1=h8a8886c_1
- funcsigs=1.0.2=py27_0
- functools32=3.2.3.2=py27_1
- futures=3.2.0=py27_0
- gast=0.2.0=py27_0
- glib=2.56.2=hd408876_0
- gmp=6.1.2=h6c8ec71_1
- gmpy2=2.0.8=py27h10f8cd9_2
- grpcio=1.12.1=py27hdbcaa40_0
- gst-plugins-base=1.14.0=hbbd80ab_1
- gstreamer=1.14.0=hb453b48_1
- h5py=2.8.0=py27h989c5e5_3
- hdf5=1.10.2=hba1933b_1
- icu=58.2=h9c2bf20_1
- intel-openmp=2019.0=118
- ipaddress=1.0.22=py27_0
- ipykernel=4.10.0=py27_0
- ipython=5.8.0=py27_0
- ipython_genutils=0.2.0=py27_0
- ipywidgets=7.4.2=py27_0
- jinja2=2.10=py27_0
- jpeg=9b=h024ee3a_2
- jsonschema=2.6.0=py27_0
- jupyter=1.0.0=py27_7
- jupyter_client=5.2.3=py27_0
- jupyter_console=5.2.0=py27_1
- jupyter_core=4.4.0=py27_0
- keras-applications=1.0.6=py27_0
- keras-preprocessing=1.0.5=py27_0
- kiwisolver=1.0.1=py27hf484d3e_0
- libedit=3.1.20170329=h6b74fdf_2
- libffi=3.2.1=hd88cf55_4
- libgcc-ng=8.2.0=hdf63c60_1
- libgfortran-ng=7.3.0=hdf63c60_0
- libpng=1.6.35=hbc83047_0
- libprotobuf=3.6.0=hdbcaa40_0
- libsodium=1.0.16=h1bed415_0
- libstdcxx-ng=8.2.0=hdf63c60_1
- libuuid=1.0.3=h1bed415_2
- libxcb=1.13=h1bed415_1
- libxml2=2.9.8=h26e45fe_1
- linecache2=1.0.0=py27_0
- markdown=3.0.1=py27_0
- markupsafe=1.0=py27h14c3975_1
- matplotlib=2.2.3=py27hb69df0a_0
- mistune=0.8.4=py27h7b6447c_0
- mkl=2019.0=118
- mkl_fft=1.0.6=py27h7dd41cf_0
- mkl_random=1.0.1=py27h4414c95_1
- mock=2.0.0=py27_0
- mpc=1.1.0=h10f8cd9_1
- mpfr=4.0.1=hdf1c602_3
- mpmath=1.0.0=py27_2
- nbconvert=5.3.1=py27_0
- nbformat=4.4.0=py27_0
- ncurses=6.1=hf484d3e_0
- nltk=3.3.0=py27_0
- nose=1.3.7=py27_2
- notebook=5.7.0=py27_0
- numpy=1.15.3=py27h1d66e8a_0
- numpy-base=1.15.3=py27h81de0dd_0
- openssl=1.0.2p=h14c3975_0
- pandas=0.23.4=py27h04863e7_0
- pandoc=2.2.3.2=0
- pandocfilters=1.4.2=py27_1
- pathlib2=2.3.2=py27_0
- pbr=4.3.0=py27_0
- pcre=8.42=h439df22_0
- pexpect=4.6.0=py27_0
- pickleshare=0.7.5=py27_0
- pip=10.0.1=py27_0
- prometheus_client=0.4.2=py27_0
- prompt_toolkit=1.0.15=py27_0
- protobuf=3.6.0=py27hf484d3e_0
- ptyprocess=0.6.0=py27_0
- pygments=2.2.0=py27_0
- pyparsing=2.2.2=py27_0
- pyqt=5.9.2=py27h05f1152_2
- python=2.7.15=h77bded6_2
- python-dateutil=2.7.3=py27_0
- pytz=2018.5=py27_0
- pyzmq=17.1.2=py27h14c3975_0
- qt=5.9.6=h8703b6f_2
- qtconsole=4.4.2=py27_0
- readline=7.0=h7b6447c_5
- scandir=1.9.0=py27h14c3975_0
- scipy=1.1.0=py27hfa4b5c9_1
- send2trash=1.5.0=py27_0
- setuptools=40.4.3=py27_0
- simplegeneric=0.8.1=py27_2
- singledispatch=3.4.0.3=py27_0
- sip=4.19.8=py27hf484d3e_0
- six=1.11.0=py27_1
- sqlite=3.25.2=h7b6447c_0
- subprocess32=3.5.3=py27h7b6447c_0
- sympy=1.3=py27_0
- tensorboard=1.11.0=py27hf484d3e_0
- tensorflow=1.11.0=mkl_py27h25e0b76_0
- tensorflow-base=1.11.0=mkl_py27h3c3e929_0
- termcolor=1.1.0=py27_1
- terminado=0.8.1=py27_1
- testpath=0.4.2=py27_0
- tk=8.6.8=hbc83047_0
- tornado=5.1.1=py27h7b6447c_0
- traceback2=1.4.0=py27_0
- traitlets=4.3.2=py27_0
- unittest2=1.1.0=py27_0
- wcwidth=0.1.7=py27_0
- webencodings=0.5.1=py27_1
- werkzeug=0.14.1=py27_0
- wheel=0.32.2=py27_0
- widgetsnbextension=3.4.2=py27_0
- xz=5.2.4=h14c3975_4
- zeromq=4.2.5=hf484d3e_1
- zlib=1.2.11=ha838bed_2
prefix: /home/arinto_murdopo/anaconda3/envs/im2txt
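As a hedged sanity check that an environment built from the file above resolved its pins correctly (the expected version strings follow the pins in the file, not an external source):

```python
# Illustrative check only; not part of the committed environment file.
import nltk
import numpy as np
import tensorflow as tf

print(tf.__version__)    # 1.11.0 per the tensorflow pin
print(np.__version__)    # 1.15.3 per the numpy pin
print(nltk.__version__)  # 3.3 per the nltk pin
```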
......@@ -341,7 +341,7 @@
},
"cell_type": "markdown",
"source": [
"In order toview the outputs of our optimization, we are required to perform the inverse preprocessing step. Furthermore, since our optimized image may take its values anywhere between $- \\infty$ and $\\infty$, we must clip to maintain our values from within the 0-255 range. "
"In order to view the outputs of our optimization, we are required to perform the inverse preprocessing step. Furthermore, since our optimized image may take its values anywhere between $- \\infty$ and $\\infty$, we must clip to maintain our values from within the 0-255 range. "
]
},
{
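A minimal sketch of the inverse-preprocessing and clipping step the cell above describes, assuming the forward pass used VGG19-style preprocessing (BGR channel order with ImageNet channel means subtracted, as in tf.keras.applications.vgg19.preprocess_input):

```python
import numpy as np

def deprocess_img(processed_img):
  x = processed_img.copy()
  # Undo the ImageNet mean subtraction (BGR means; an assumption about
  # the forward preprocessing used here).
  x[..., 0] += 103.939
  x[..., 1] += 116.779
  x[..., 2] += 123.68
  x = x[..., ::-1]  # BGR -> RGB
  # The optimized image can take any real value, so clip to [0, 255].
  return np.clip(x, 0, 255).astype('uint8')
```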
......@@ -380,7 +380,7 @@
},
"cell_type": "markdown",
"source": [
"### Define content and style representationst\n",
"### Define content and style representations\n",
"In order to get both the content and style representations of our image, we will look at some intermediate layers within our model. As we go deeper into the model, these intermediate layers represent higher and higher order features. In this case, we are using the network architecture VGG19, a pretrained image classification network. These intermediate layers are necessary to define the representation of content and style from our images. For an input image, we will try to match the corresponding style and content target representations at these intermediate layers. \n",
"\n",
"#### Why intermediate layers?\n",
......@@ -1183,7 +1183,7 @@
"### What we covered:\n",
"\n",
"* We built several different loss functions and used backpropagation to transform our input image in order to minimize these losses\n",
" * In order to do this we had to load in an a **pretrained model** and used its learned feature maps to describe the content and style representation of our images.\n",
" * In order to do this we had to load in a **pretrained model** and use its learned feature maps to describe the content and style representation of our images.\n",
" * Our main loss functions were primarily computing the distance in terms of these different representations\n",
"* We implemented this with a custom model and **eager execution**\n",
" * We built our custom model with the Functional API \n",
......
......@@ -108,9 +108,6 @@ class MultiscaleGridAnchorGenerator(anchor_generator.AnchorGenerator):
ValueError: if im_height and im_width are 1, but normalized coordinates
were requested.
"""
if not isinstance(im_height, int) or not isinstance(im_width, int):
raise ValueError('MultiscaleGridAnchorGenerator currently requires '
'input image shape to be statically defined.')
anchor_grid_list = []
for feat_shape, grid_info in zip(feature_map_shape_list,
self._anchor_grid_info):
......@@ -122,10 +119,11 @@ class MultiscaleGridAnchorGenerator(anchor_generator.AnchorGenerator):
feat_h = feat_shape[0]
feat_w = feat_shape[1]
anchor_offset = [0, 0]
if im_height % 2.0**level == 0 or im_height == 1:
anchor_offset[0] = stride / 2.0
if im_width % 2.0**level == 0 or im_width == 1:
anchor_offset[1] = stride / 2.0
if isinstance(im_height, int) and isinstance(im_width, int):
if im_height % 2.0**level == 0 or im_height == 1:
anchor_offset[0] = stride / 2.0
if im_width % 2.0**level == 0 or im_width == 1:
anchor_offset[1] = stride / 2.0
ag = grid_anchor_generator.GridAnchorGenerator(
scales,
aspect_ratios,
......
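A worked instance (hedged, with assumed values) of the offset rule this hunk guards behind the isinstance check:

```python
# At pyramid level 5 the stride is 2**5 = 32. A statically known 64x64
# image satisfies 64 % 32 == 0, so anchors get a half-stride offset of
# 16.0; with a dynamic (Tensor) image size the isinstance check is False
# and the offset stays [0, 0], which the updated test below relies on.
level = 5
stride = 2.0 ** level
im_height = 64
anchor_offset = stride / 2.0 if im_height % stride == 0 else 0.0
assert anchor_offset == 16.0
```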
......@@ -116,7 +116,7 @@ class MultiscaleGridAnchorGeneratorTest(test_case.TestCase):
normalize_coordinates=False)
self.assertEqual(anchor_generator.num_anchors_per_location(), [6, 6])
def test_construct_single_anchor_fails_with_tensor_image_size(self):
def test_construct_single_anchor_dynamic_size(self):
min_level = 5
max_level = 5
anchor_scale = 4.0
......@@ -125,12 +125,22 @@ class MultiscaleGridAnchorGeneratorTest(test_case.TestCase):
im_height = tf.constant(64)
im_width = tf.constant(64)
feature_map_shape_list = [(2, 2)]
# Zero offsets are used.
exp_anchor_corners = [[-64, -64, 64, 64],
[-64, -32, 64, 96],
[-32, -64, 96, 64],
[-32, -32, 96, 96]]
anchor_generator = mg.MultiscaleGridAnchorGenerator(
min_level, max_level, anchor_scale, aspect_ratios, scales_per_octave,
normalize_coordinates=False)
with self.assertRaisesRegexp(ValueError, 'statically defined'):
anchor_generator.generate(
feature_map_shape_list, im_height=im_height, im_width=im_width)
anchors_list = anchor_generator.generate(
feature_map_shape_list, im_height=im_height, im_width=im_width)
anchor_corners = anchors_list[0].get()
with self.test_session():
anchor_corners_out = anchor_corners.eval()
self.assertAllClose(anchor_corners_out, exp_anchor_corners)
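The expected corners above follow from simple arithmetic; a hedged worked check:

```python
# stride = 2**5 = 32 and base anchor side = anchor_scale * stride =
# 4 * 32 = 128. With zero offsets the 2x2 grid centers sit at (0, 0),
# (0, 32), (32, 0) and (32, 32); each corner is center +/- 64, e.g.
# center (0, 32) -> [-64, -32, 64, 96].
stride = 2 ** 5
half = 4.0 * stride / 2  # 64.0
cy, cx = 0, 32
assert [cy - half, cx - half, cy + half, cx + half] == [-64, -32, 64, 96]
```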
def test_construct_single_anchor_with_odd_input_dimension(self):
......
......@@ -42,6 +42,7 @@ def build_convolutional_box_predictor(is_training,
kernel_size,
box_code_size,
apply_sigmoid_to_scores=False,
add_background_class=True,
class_prediction_bias_init=0.0,
use_depthwise=False,
mask_head_config=None):
......@@ -49,7 +50,10 @@ def build_convolutional_box_predictor(is_training,
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: Number of classes.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
conv_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for convolution ops.
min_depth: Minimum feature depth prior to predicting box encodings
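A worked instance of the num_classes convention documented above, with assumed numbers that mirror the updated builder tests:

```python
# K = 90 foreground classes; with the default add_background_class=True
# the class head predicts 91 slots, matching the _num_class_slots == 91
# assertion in the tests further down. With an explicit background label
# in the groundtruth, pass add_background_class=False and keep 90.
num_classes = 90
add_background_class = True
num_class_slots = num_classes + 1 if add_background_class else num_classes
assert num_class_slots == 91
```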
......@@ -71,6 +75,7 @@ def build_convolutional_box_predictor(is_training,
box_code_size: Size of encoding for each box.
apply_sigmoid_to_scores: If True, apply the sigmoid on the output
class_predictions.
add_background_class: Whether to add an implicit background class.
class_prediction_bias_init: Constant value to initialize bias of the last
conv2d layer before class prediction.
use_depthwise: Whether to use depthwise convolutions for prediction
......@@ -88,7 +93,7 @@ def build_convolutional_box_predictor(is_training,
use_depthwise=use_depthwise)
class_prediction_head = class_head.ConvolutionalClassHead(
is_training=is_training,
num_classes=num_classes,
num_class_slots=num_classes + 1 if add_background_class else num_classes,
use_dropout=use_dropout,
dropout_keep_prob=dropout_keep_prob,
kernel_size=kernel_size,
......@@ -136,15 +141,19 @@ def build_convolutional_keras_box_predictor(is_training,
dropout_keep_prob,
kernel_size,
box_code_size,
add_background_class=True,
class_prediction_bias_init=0.0,
use_depthwise=False,
mask_head_config=None,
name='BoxPredictor'):
"""Builds the ConvolutionalBoxPredictor from the arguments.
"""Builds the Keras ConvolutionalBoxPredictor from the arguments.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: Number of classes.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops.
freeze_batchnorm: Whether to freeze batch norm parameters during
......@@ -175,6 +184,7 @@ def build_convolutional_keras_box_predictor(is_training,
then the kernel size is automatically set to be
min(feature_width, feature_height).
box_code_size: Size of encoding for each box.
add_background_class: Whether to add an implicit background class.
class_prediction_bias_init: constant value to initialize bias of the last
conv2d layer before class prediction.
use_depthwise: Whether to use depthwise convolutions for prediction
......@@ -185,7 +195,7 @@ def build_convolutional_keras_box_predictor(is_training,
will auto-generate one from the class name.
Returns:
A ConvolutionalBoxPredictor class.
A Keras ConvolutionalBoxPredictor class.
"""
box_prediction_heads = []
class_prediction_heads = []
......@@ -210,7 +220,8 @@ def build_convolutional_keras_box_predictor(is_training,
class_prediction_heads.append(
keras_class_head.ConvolutionalClassHead(
is_training=is_training,
num_classes=num_classes,
num_class_slots=(
num_classes + 1 if add_background_class else num_classes),
use_dropout=use_dropout,
dropout_keep_prob=dropout_keep_prob,
kernel_size=kernel_size,
......@@ -264,6 +275,7 @@ def build_weight_shared_convolutional_box_predictor(
num_layers_before_predictor,
box_code_size,
kernel_size=3,
add_background_class=True,
class_prediction_bias_init=0.0,
use_dropout=False,
dropout_keep_prob=0.8,
......@@ -288,6 +300,7 @@ def build_weight_shared_convolutional_box_predictor(
the predictor.
box_code_size: Size of encoding for each box.
kernel_size: Size of final convolution kernel.
add_background_class: Whether to add an implicit background class.
class_prediction_bias_init: constant value to initialize bias of the last
conv2d layer before class prediction.
use_dropout: Whether to apply dropout to class prediction head.
......@@ -313,7 +326,8 @@ def build_weight_shared_convolutional_box_predictor(
box_encodings_clip_range=box_encodings_clip_range)
class_prediction_head = (
class_head.WeightSharedConvolutionalClassHead(
num_classes=num_classes,
num_class_slots=(
num_classes + 1 if add_background_class else num_classes),
kernel_size=kernel_size,
class_prediction_bias_init=class_prediction_bias_init,
use_dropout=use_dropout,
......@@ -355,6 +369,7 @@ def build_mask_rcnn_box_predictor(is_training,
use_dropout,
dropout_keep_prob,
box_code_size,
add_background_class=True,
share_box_across_classes=False,
predict_instance_masks=False,
conv_hyperparams_fn=None,
......@@ -362,40 +377,46 @@ def build_mask_rcnn_box_predictor(is_training,
mask_width=14,
mask_prediction_num_conv_layers=2,
mask_prediction_conv_depth=256,
masks_are_class_agnostic=False):
masks_are_class_agnostic=False,
convolve_then_upsample_masks=False):
"""Builds and returns a MaskRCNNBoxPredictor class.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
fc_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for fully connected ops.
use_dropout: Option to use dropout or not. Note that a single dropout
op is applied here prior to both box and class predictions, which stands
in contrast to the ConvolutionalBoxPredictor below.
dropout_keep_prob: Keep probability for dropout.
This is only used if use_dropout is True.
box_code_size: Size of encoding for each box.
share_box_across_classes: Whether to share boxes across classes rather
than use a different box for each class.
predict_instance_masks: If True, will add a third stage mask prediction
to the returned class.
conv_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for convolution ops.
mask_height: Desired output mask height. The default value is 14.
mask_width: Desired output mask width. The default value is 14.
mask_prediction_num_conv_layers: Number of convolution layers applied to
the image_features in mask prediction branch.
mask_prediction_conv_depth: The depth for the first conv2d_transpose op
applied to the image_features in the mask prediction branch. If set
to 0, the depth of the convolution layers will be automatically chosen
based on the number of object classes and the number of channels in the
image features.
masks_are_class_agnostic: Boolean determining if the mask-head is
class-agnostic or not.
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
fc_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for fully connected ops.
use_dropout: Option to use dropout or not. Note that a single dropout
op is applied here prior to both box and class predictions, which stands
in contrast to the ConvolutionalBoxPredictor below.
dropout_keep_prob: Keep probability for dropout.
This is only used if use_dropout is True.
box_code_size: Size of encoding for each box.
add_background_class: Whether to add an implicit background class.
share_box_across_classes: Whether to share boxes across classes rather
than use a different box for each class.
predict_instance_masks: If True, will add a third stage mask prediction
to the returned class.
conv_hyperparams_fn: A function to generate tf-slim arg_scope with
hyperparameters for convolution ops.
mask_height: Desired output mask height. The default value is 14.
mask_width: Desired output mask width. The default value is 14.
mask_prediction_num_conv_layers: Number of convolution layers applied to
the image_features in mask prediction branch.
mask_prediction_conv_depth: The depth for the first conv2d_transpose op
applied to the image_features in the mask prediction branch. If set
to 0, the depth of the convolution layers will be automatically chosen
based on the number of object classes and the number of channels in the
image features.
masks_are_class_agnostic: Boolean determining if the mask-head is
class-agnostic or not.
convolve_then_upsample_masks: Whether to apply convolutions on mask
features before upsampling using nearest neighbor resizing. Otherwise,
mask features are resized to [`mask_height`, `mask_width`] using
bilinear resizing before applying convolutions.
Returns:
A MaskRCNNBoxPredictor class.
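A hedged sketch of the two mask-feature paths the new convolve_then_upsample_masks flag selects between; the layer widths and ops here are illustrative, not the library's exact implementation:

```python
import tensorflow as tf

def mask_features(x, mask_height, mask_width, convolve_then_upsample):
  """x: [batch, roi_height, roi_width, channels] mask features (assumed)."""
  if convolve_then_upsample:
    # Convolve at the native ROI resolution, then nearest-neighbor resize.
    x = tf.layers.conv2d(x, 256, 3, padding='same', activation=tf.nn.relu)
    x = tf.image.resize_nearest_neighbor(x, [mask_height, mask_width])
  else:
    # Default path: bilinearly resize first, then convolve.
    x = tf.image.resize_bilinear(x, [mask_height, mask_width])
    x = tf.layers.conv2d(x, 256, 3, padding='same', activation=tf.nn.relu)
  return x
```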
......@@ -410,7 +431,7 @@ def build_mask_rcnn_box_predictor(is_training,
share_box_across_classes=share_box_across_classes)
class_prediction_head = class_head.MaskRCNNClassHead(
is_training=is_training,
num_classes=num_classes,
num_class_slots=num_classes + 1 if add_background_class else num_classes,
fc_hyperparams_fn=fc_hyperparams_fn,
use_dropout=use_dropout,
dropout_keep_prob=dropout_keep_prob)
......@@ -425,7 +446,8 @@ def build_mask_rcnn_box_predictor(is_training,
mask_width=mask_width,
mask_prediction_num_conv_layers=mask_prediction_num_conv_layers,
mask_prediction_conv_depth=mask_prediction_conv_depth,
masks_are_class_agnostic=masks_are_class_agnostic)
masks_are_class_agnostic=masks_are_class_agnostic,
convolve_then_upsample=convolve_then_upsample_masks)
return mask_rcnn_box_predictor.MaskRCNNBoxPredictor(
is_training=is_training,
num_classes=num_classes,
......@@ -464,7 +486,8 @@ BoxEncodingsClipRange = collections.namedtuple('BoxEncodingsClipRange',
['min', 'max'])
def build(argscope_fn, box_predictor_config, is_training, num_classes):
def build(argscope_fn, box_predictor_config, is_training, num_classes,
add_background_class=True):
"""Builds box predictor based on the configuration.
Builds box predictor based on the configuration. See box_predictor.proto for
......@@ -479,6 +502,7 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
configuration.
is_training: Whether the model is in training mode.
num_classes: Number of classes to predict.
add_background_class: Whether to add an implicit background class.
Returns:
box_predictor: box_predictor.BoxPredictor object.
......@@ -502,6 +526,7 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
return build_convolutional_box_predictor(
is_training=is_training,
num_classes=num_classes,
add_background_class=add_background_class,
conv_hyperparams_fn=conv_hyperparams_fn,
use_dropout=config_box_predictor.use_dropout,
dropout_keep_prob=config_box_predictor.dropout_keep_probability,
......@@ -542,6 +567,7 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
return build_weight_shared_convolutional_box_predictor(
is_training=is_training,
num_classes=num_classes,
add_background_class=add_background_class,
conv_hyperparams_fn=conv_hyperparams_fn,
depth=config_box_predictor.depth,
num_layers_before_predictor=(
......@@ -570,6 +596,7 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
return build_mask_rcnn_box_predictor(
is_training=is_training,
num_classes=num_classes,
add_background_class=add_background_class,
fc_hyperparams_fn=fc_hyperparams_fn,
use_dropout=config_box_predictor.use_dropout,
dropout_keep_prob=config_box_predictor.dropout_keep_probability,
......@@ -585,7 +612,9 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
mask_prediction_conv_depth=(
config_box_predictor.mask_prediction_conv_depth),
masks_are_class_agnostic=(
config_box_predictor.masks_are_class_agnostic))
config_box_predictor.masks_are_class_agnostic),
convolve_then_upsample_masks=(
config_box_predictor.convolve_then_upsample_masks))
if box_predictor_oneof == 'rfcn_box_predictor':
config_box_predictor = box_predictor_config.rfcn_box_predictor
......@@ -603,3 +632,78 @@ def build(argscope_fn, box_predictor_config, is_training, num_classes):
box_code_size=config_box_predictor.box_code_size)
return box_predictor_object
raise ValueError('Unknown box predictor: {}'.format(box_predictor_oneof))
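A hedged usage sketch of the new add_background_class argument on build(); the box_predictor_config proto is assumed to be constructed elsewhere:

```python
from object_detection.builders import box_predictor_builder
from object_detection.builders import hyperparams_builder

# `box_predictor_config` is an assumed box_predictor_pb2.BoxPredictor
# proto. Passing add_background_class=False keeps the class head at
# num_classes slots, for groundtruth that already carries an explicit
# background class (see the builder tests below).
predictor = box_predictor_builder.build(
    argscope_fn=hyperparams_builder.build,
    box_predictor_config=box_predictor_config,
    is_training=True,
    num_classes=90,
    add_background_class=False)
```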
def build_keras(conv_hyperparams_fn, freeze_batchnorm, inplace_batchnorm_update,
num_predictions_per_location_list, box_predictor_config,
is_training, num_classes, add_background_class=True):
"""Builds a Keras-based box predictor based on the configuration.
Builds Keras-based box predictor based on the configuration.
See box_predictor.proto for configurable options. Also, see box_predictor.py
for more details.
Args:
conv_hyperparams_fn: A function that takes a hyperparams_pb2.Hyperparams
proto and returns a `hyperparams_builder.KerasLayerHyperparams`
for Conv or FC hyperparameters.
freeze_batchnorm: Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
inplace_batchnorm_update: Whether to update batch norm moving average
values inplace. When this is false train op must add a control
dependency on tf.graphkeys.UPDATE_OPS collection in order to update
batch norm statistics.
num_predictions_per_location_list: A list of integers representing the
number of box predictions to be made per spatial location for each
feature map.
box_predictor_config: box_predictor_pb2.BoxPredictor proto containing
configuration.
is_training: Whether the model is in training mode.
num_classes: Number of classes to predict.
add_background_class: Whether to add an implicit background class.
Returns:
box_predictor: box_predictor.KerasBoxPredictor object.
Raises:
ValueError: On unknown box predictor, or one with no Keras box predictor.
"""
if not isinstance(box_predictor_config, box_predictor_pb2.BoxPredictor):
raise ValueError('box_predictor_config not of type '
'box_predictor_pb2.BoxPredictor.')
box_predictor_oneof = box_predictor_config.WhichOneof('box_predictor_oneof')
if box_predictor_oneof == 'convolutional_box_predictor':
config_box_predictor = box_predictor_config.convolutional_box_predictor
conv_hyperparams = conv_hyperparams_fn(
config_box_predictor.conv_hyperparams)
mask_head_config = (
config_box_predictor.mask_head
if config_box_predictor.HasField('mask_head') else None)
return build_convolutional_keras_box_predictor(
is_training=is_training,
num_classes=num_classes,
add_background_class=add_background_class,
conv_hyperparams=conv_hyperparams,
freeze_batchnorm=freeze_batchnorm,
inplace_batchnorm_update=inplace_batchnorm_update,
num_predictions_per_location_list=num_predictions_per_location_list,
use_dropout=config_box_predictor.use_dropout,
dropout_keep_prob=config_box_predictor.dropout_keep_probability,
box_code_size=config_box_predictor.box_code_size,
kernel_size=config_box_predictor.kernel_size,
num_layers_before_predictor=(
config_box_predictor.num_layers_before_predictor),
min_depth=config_box_predictor.min_depth,
max_depth=config_box_predictor.max_depth,
class_prediction_bias_init=(
config_box_predictor.class_prediction_bias_init),
use_depthwise=config_box_predictor.use_depthwise,
mask_head_config=mask_head_config)
raise ValueError(
'Unknown box predictor for Keras: {}'.format(box_predictor_oneof))
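A minimal sketch of the tf.GraphKeys.UPDATE_OPS pattern that the inplace_batchnorm_update docstring above refers to; the tiny model here is purely illustrative:

```python
import tensorflow as tf

# When in-place batch-norm updates are off, the train op must depend on
# the collected update ops or the moving averages never advance.
x = tf.random_normal([8, 4])
w = tf.Variable(tf.ones([4, 2]))
h = tf.layers.batch_normalization(tf.matmul(x, w), training=True)
loss = tf.reduce_sum(tf.square(h))
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
  train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
```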
......@@ -113,7 +113,8 @@ class ConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
argscope_fn=mock_conv_argscope_builder,
box_predictor_config=box_predictor_proto,
is_training=False,
num_classes=10)
num_classes=10,
add_background_class=False)
class_head = box_predictor._class_prediction_head
self.assertEqual(box_predictor._min_depth, 2)
self.assertEqual(box_predictor._max_depth, 16)
......@@ -122,6 +123,7 @@ class ConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
self.assertAlmostEqual(class_head._dropout_keep_prob, 0.4)
self.assertTrue(class_head._apply_sigmoid_to_scores)
self.assertAlmostEqual(class_head._class_prediction_bias_init, 4.0)
self.assertEqual(class_head._num_class_slots, 10)
self.assertEqual(box_predictor.num_classes, 10)
self.assertFalse(box_predictor._is_training)
self.assertTrue(class_head._use_depthwise)
......@@ -154,6 +156,7 @@ class ConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
self.assertTrue(class_head._use_dropout)
self.assertAlmostEqual(class_head._dropout_keep_prob, 0.8)
self.assertFalse(class_head._apply_sigmoid_to_scores)
self.assertEqual(class_head._num_class_slots, 91)
self.assertEqual(box_predictor.num_classes, 90)
self.assertTrue(box_predictor._is_training)
self.assertFalse(class_head._use_depthwise)
......@@ -306,7 +309,8 @@ class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
argscope_fn=mock_conv_argscope_builder,
box_predictor_config=box_predictor_proto,
is_training=False,
num_classes=10)
num_classes=10,
add_background_class=False)
class_head = box_predictor._class_prediction_head
self.assertEqual(box_predictor._depth, 2)
self.assertEqual(box_predictor._num_layers_before_predictor, 2)
......@@ -349,7 +353,8 @@ class WeightSharedConvolutionalBoxPredictorBuilderTest(tf.test.TestCase):
argscope_fn=mock_conv_argscope_builder,
box_predictor_config=box_predictor_proto,
is_training=False,
num_classes=10)
num_classes=10,
add_background_class=False)
class_head = box_predictor._class_prediction_head
self.assertEqual(box_predictor._depth, 2)
self.assertEqual(box_predictor._num_layers_before_predictor, 2)
......@@ -627,6 +632,48 @@ class MaskRCNNBoxPredictorBuilderTest(tf.test.TestCase):
third_stage_heads[mask_rcnn_box_predictor.MASK_PREDICTIONS]
._mask_prediction_conv_depth, 512)
def test_build_box_predictor_with_convolve_then_upsample_masks(self):
box_predictor_proto = box_predictor_pb2.BoxPredictor()
box_predictor_proto.mask_rcnn_box_predictor.fc_hyperparams.op = (
hyperparams_pb2.Hyperparams.FC)
box_predictor_proto.mask_rcnn_box_predictor.conv_hyperparams.op = (
hyperparams_pb2.Hyperparams.CONV)
box_predictor_proto.mask_rcnn_box_predictor.predict_instance_masks = True
box_predictor_proto.mask_rcnn_box_predictor.mask_prediction_conv_depth = 512
box_predictor_proto.mask_rcnn_box_predictor.mask_height = 24
box_predictor_proto.mask_rcnn_box_predictor.mask_width = 24
box_predictor_proto.mask_rcnn_box_predictor.convolve_then_upsample_masks = (
True)
mock_argscope_fn = mock.Mock(return_value='arg_scope')
box_predictor = box_predictor_builder.build(
argscope_fn=mock_argscope_fn,
box_predictor_config=box_predictor_proto,
is_training=True,
num_classes=90)
mock_argscope_fn.assert_has_calls(
[mock.call(box_predictor_proto.mask_rcnn_box_predictor.fc_hyperparams,
True),
mock.call(box_predictor_proto.mask_rcnn_box_predictor.conv_hyperparams,
True)], any_order=True)
box_head = box_predictor._box_prediction_head
class_head = box_predictor._class_prediction_head
third_stage_heads = box_predictor._third_stage_heads
self.assertFalse(box_head._use_dropout)
self.assertFalse(class_head._use_dropout)
self.assertAlmostEqual(box_head._dropout_keep_prob, 0.5)
self.assertAlmostEqual(class_head._dropout_keep_prob, 0.5)
self.assertEqual(box_predictor.num_classes, 90)
self.assertTrue(box_predictor._is_training)
self.assertEqual(box_head._box_code_size, 4)
self.assertTrue(
mask_rcnn_box_predictor.MASK_PREDICTIONS in third_stage_heads)
self.assertEqual(
third_stage_heads[mask_rcnn_box_predictor.MASK_PREDICTIONS]
._mask_prediction_conv_depth, 512)
self.assertTrue(third_stage_heads[mask_rcnn_box_predictor.MASK_PREDICTIONS]
._convolve_then_upsample)
class RfcnBoxPredictorBuilderTest(tf.test.TestCase):
......
......@@ -64,6 +64,10 @@ class KerasLayerHyperparams(object):
hyperparams_config.batch_norm)
self._activation_fn = _build_activation_fn(hyperparams_config.activation)
# TODO(kaftan): Unclear if these kwargs apply to separable & depthwise conv
# (Those might use depthwise_* instead of kernel_*)
# We should probably switch to using build_conv2d_layer and
# build_depthwise_conv2d_layer methods instead.
self._op_params = {
'kernel_regularizer': _build_keras_regularizer(
hyperparams_config.regularizer),
......
......@@ -106,10 +106,35 @@ def build(image_resizer_config):
raise ValueError(
'Invalid image resizer option: \'%s\'.' % image_resizer_oneof)
def grayscale_image_resizer(image):
[resized_image, resized_image_shape] = image_resizer_fn(image)
grayscale_image = preprocessor.rgb_to_gray(resized_image)
grayscale_image_shape = tf.concat([resized_image_shape[:-1], [1]], 0)
return [grayscale_image, grayscale_image_shape]
def grayscale_image_resizer(image, masks=None):
"""Convert to grayscale before applying image_resizer_fn.
Args:
image: A 3D tensor of shape [height, width, 3]
masks: (optional) rank 3 float32 tensor with shape [num_instances, height,
width] containing instance masks.
Returns:
Note that the position of the resized_image_shape changes based on whether
masks are present.
resized_image: A 3D tensor of shape [new_height, new_width, 1],
where the image has been resized (with bilinear interpolation) so that
min(new_height, new_width) == min_dimension or
max(new_height, new_width) == max_dimension.
resized_masks: If masks is not None, also outputs masks. A 3D tensor of
shape [num_instances, new_height, new_width].
resized_image_shape: A 1D tensor of shape [3] containing shape of the
resized image.
"""
# image_resizer_fn returns [resized_image, resized_image_shape] if
# masks is None; otherwise it returns
# [resized_image, resized_masks, resized_image_shape]. In either case we
# only modify the first and last elements of the returned list.
retval = image_resizer_fn(image, masks)
resized_image = retval[0]
resized_image_shape = retval[-1]
retval[0] = preprocessor.rgb_to_gray(resized_image)
retval[-1] = tf.concat([resized_image_shape[:-1], [1]], 0)
return retval
return functools.partial(grayscale_image_resizer)
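A hedged usage sketch of the resizer returned here; `image_resizer_config` is an assumed proto configured to select the grayscale path:

```python
import tensorflow as tf
from object_detection.builders import image_resizer_builder

# With no masks, the closure returns [resized_image, resized_image_shape];
# the image comes back single-channel and the reported shape ends in 1.
resizer_fn = image_resizer_builder.build(image_resizer_config)
resized_image, resized_shape = resizer_fn(tf.zeros([480, 640, 3]))
```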
......@@ -136,6 +136,14 @@ def build_faster_rcnn_classification_loss(loss_config):
config = loss_config.weighted_logits_softmax
return losses.WeightedSoftmaxClassificationAgainstLogitsLoss(
logit_scale=config.logit_scale)
if loss_type == 'weighted_sigmoid_focal':
config = loss_config.weighted_sigmoid_focal
alpha = None
if config.HasField('alpha'):
alpha = config.alpha
return losses.SigmoidFocalClassificationLoss(
gamma=config.gamma,
alpha=alpha)
# By default, Faster RCNN second stage classifier uses Softmax loss
# with anchor-wise outputs.
......
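For reference, a hedged sketch of the sigmoid focal loss this branch wires up (Lin et al., "Focal Loss for Dense Object Detection"); the library's SigmoidFocalClassificationLoss is the authoritative implementation:

```python
import tensorflow as tf

def sigmoid_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
  per_entry_ce = tf.nn.sigmoid_cross_entropy_with_logits(
      labels=targets, logits=logits)
  p = tf.sigmoid(logits)
  p_t = targets * p + (1 - targets) * (1 - p)  # prob of the true class
  modulator = tf.pow(1.0 - p_t, gamma)         # down-weights easy examples
  alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
  return alpha_t * modulator * per_entry_ce
```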
......@@ -280,7 +280,7 @@ class ClassificationLossBuilderTest(tf.test.TestCase):
losses.WeightedSigmoidClassificationLoss))
predictions = tf.constant([[[0.0, 1.0, 0.0], [0.0, 0.5, 0.5]]])
targets = tf.constant([[[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]])
weights = tf.constant([[1.0, 1.0]])
weights = tf.constant([[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]])
loss = classification_loss(predictions, targets, weights=weights)
self.assertEqual(loss.shape, [1, 2, 3])
......@@ -473,6 +473,19 @@ class FasterRcnnClassificationLossBuilderTest(tf.test.TestCase):
isinstance(classification_loss,
losses.WeightedSoftmaxClassificationAgainstLogitsLoss))
def test_build_sigmoid_focal_loss(self):
losses_text_proto = """
weighted_sigmoid_focal {
}
"""
losses_proto = losses_pb2.ClassificationLoss()
text_format.Merge(losses_text_proto, losses_proto)
classification_loss = losses_builder.build_faster_rcnn_classification_loss(
losses_proto)
self.assertTrue(
isinstance(classification_loss,
losses.SigmoidFocalClassificationLoss))
def test_build_softmax_loss_by_default(self):
losses_text_proto = """
"""
......
......@@ -47,6 +47,8 @@ from object_detection.models.ssd_mobilenet_v1_fpn_feature_extractor import SSDMo
from object_detection.models.ssd_mobilenet_v1_ppn_feature_extractor import SSDMobileNetV1PpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.models.ssd_mobilenet_v2_fpn_feature_extractor import SSDMobileNetV2FpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v2_keras_feature_extractor import SSDMobileNetV2KerasFeatureExtractor
from object_detection.models.ssd_pnasnet_feature_extractor import SSDPNASNetFeatureExtractor
from object_detection.predictors import rfcn_box_predictor
from object_detection.protos import model_pb2
from object_detection.utils import ops
......@@ -69,6 +71,11 @@ SSD_FEATURE_EXTRACTOR_CLASS_MAP = {
'ssd_resnet152_v1_ppn':
ssd_resnet_v1_ppn.SSDResnet152V1PpnFeatureExtractor,
'embedded_ssd_mobilenet_v1': EmbeddedSSDMobileNetV1FeatureExtractor,
'ssd_pnasnet': SSDPNASNetFeatureExtractor,
}
SSD_KERAS_FEATURE_EXTRACTOR_CLASS_MAP = {
'ssd_mobilenet_v2_keras': SSDMobileNetV2KerasFeatureExtractor
}
# A map of names to Faster R-CNN feature extractors.
......@@ -90,8 +97,7 @@ FASTER_RCNN_FEATURE_EXTRACTOR_CLASS_MAP = {
}
def build(model_config, is_training, add_summaries=True,
add_background_class=True):
def build(model_config, is_training, add_summaries=True):
"""Builds a DetectionModel based on the model config.
Args:
......@@ -99,10 +105,6 @@ def build(model_config, is_training, add_summaries=True,
DetectionModel.
is_training: True if this model is being built for training purposes.
add_summaries: Whether to add tensorflow summaries in the model graph.
add_background_class: Whether to add an implicit background class to one-hot
encodings of groundtruth labels. Set to false if using groundtruth labels
with an explicit background class or using multiclass scores instead of
truth in the case of distillation. Ignored in the case of faster_rcnn.
Returns:
DetectionModel based on the config.
......@@ -113,21 +115,26 @@ def build(model_config, is_training, add_summaries=True,
raise ValueError('model_config not of type model_pb2.DetectionModel.')
meta_architecture = model_config.WhichOneof('model')
if meta_architecture == 'ssd':
return _build_ssd_model(model_config.ssd, is_training, add_summaries,
add_background_class)
return _build_ssd_model(model_config.ssd, is_training, add_summaries)
if meta_architecture == 'faster_rcnn':
return _build_faster_rcnn_model(model_config.faster_rcnn, is_training,
add_summaries)
raise ValueError('Unknown meta architecture: {}'.format(meta_architecture))
def _build_ssd_feature_extractor(feature_extractor_config, is_training,
def _build_ssd_feature_extractor(feature_extractor_config,
is_training,
freeze_batchnorm,
reuse_weights=None):
"""Builds a ssd_meta_arch.SSDFeatureExtractor based on config.
Args:
feature_extractor_config: An SSDFeatureExtractor proto config from ssd.proto.
is_training: True if this feature extractor is being built for training.
freeze_batchnorm: Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
reuse_weights: if the feature extractor should reuse weights.
Returns:
......@@ -137,20 +144,31 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
ValueError: On invalid feature extractor type.
"""
feature_type = feature_extractor_config.type
is_keras_extractor = feature_type in SSD_KERAS_FEATURE_EXTRACTOR_CLASS_MAP
depth_multiplier = feature_extractor_config.depth_multiplier
min_depth = feature_extractor_config.min_depth
pad_to_multiple = feature_extractor_config.pad_to_multiple
use_explicit_padding = feature_extractor_config.use_explicit_padding
use_depthwise = feature_extractor_config.use_depthwise
conv_hyperparams = hyperparams_builder.build(
feature_extractor_config.conv_hyperparams, is_training)
if is_keras_extractor:
conv_hyperparams = hyperparams_builder.KerasLayerHyperparams(
feature_extractor_config.conv_hyperparams)
else:
conv_hyperparams = hyperparams_builder.build(
feature_extractor_config.conv_hyperparams, is_training)
override_base_feature_extractor_hyperparams = (
feature_extractor_config.override_base_feature_extractor_hyperparams)
if feature_type not in SSD_FEATURE_EXTRACTOR_CLASS_MAP:
if (feature_type not in SSD_FEATURE_EXTRACTOR_CLASS_MAP) and (
not is_keras_extractor):
raise ValueError('Unknown ssd feature_extractor: {}'.format(feature_type))
feature_extractor_class = SSD_FEATURE_EXTRACTOR_CLASS_MAP[feature_type]
if is_keras_extractor:
feature_extractor_class = SSD_KERAS_FEATURE_EXTRACTOR_CLASS_MAP[
feature_type]
else:
feature_extractor_class = SSD_FEATURE_EXTRACTOR_CLASS_MAP[feature_type]
kwargs = {
'is_training':
is_training,
......@@ -160,10 +178,6 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
min_depth,
'pad_to_multiple':
pad_to_multiple,
'conv_hyperparams_fn':
conv_hyperparams,
'reuse_weights':
reuse_weights,
'use_explicit_padding':
use_explicit_padding,
'use_depthwise':
......@@ -172,6 +186,18 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
override_base_feature_extractor_hyperparams
}
if is_keras_extractor:
kwargs.update({
'conv_hyperparams': conv_hyperparams,
'inplace_batchnorm_update': False,
'freeze_batchnorm': freeze_batchnorm
})
else:
kwargs.update({
'conv_hyperparams_fn': conv_hyperparams,
'reuse_weights': reuse_weights,
})
if feature_extractor_config.HasField('fpn'):
kwargs.update({
'fpn_min_level':
......@@ -185,8 +211,7 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
return feature_extractor_class(**kwargs)
def _build_ssd_model(ssd_config, is_training, add_summaries,
add_background_class=True):
def _build_ssd_model(ssd_config, is_training, add_summaries):
"""Builds an SSD detection model based on the model config.
Args:
......@@ -194,10 +219,6 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
SSDMetaArch.
is_training: True if this model is being built for training purposes.
add_summaries: Whether to add tf summaries in the model.
add_background_class: Whether to add an implicit background class to one-hot
encodings of groundtruth labels. Set to false if using groundtruth labels
with an explicit background class or using multiclass scores instead of
truth in the case of distillation.
Returns:
SSDMetaArch based on the config.
......@@ -210,6 +231,7 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
# Feature extractor
feature_extractor = _build_ssd_feature_extractor(
feature_extractor_config=ssd_config.feature_extractor,
freeze_batchnorm=ssd_config.freeze_batchnorm,
is_training=is_training)
box_coder = box_coder_builder.build(ssd_config.box_coder)
......@@ -218,11 +240,23 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
ssd_config.similarity_calculator)
encode_background_as_zeros = ssd_config.encode_background_as_zeros
negative_class_weight = ssd_config.negative_class_weight
ssd_box_predictor = box_predictor_builder.build(hyperparams_builder.build,
ssd_config.box_predictor,
is_training, num_classes)
anchor_generator = anchor_generator_builder.build(
ssd_config.anchor_generator)
if feature_extractor.is_keras_model:
ssd_box_predictor = box_predictor_builder.build_keras(
conv_hyperparams_fn=hyperparams_builder.KerasLayerHyperparams,
freeze_batchnorm=ssd_config.freeze_batchnorm,
inplace_batchnorm_update=False,
num_predictions_per_location_list=anchor_generator
.num_anchors_per_location(),
box_predictor_config=ssd_config.box_predictor,
is_training=is_training,
num_classes=num_classes,
add_background_class=ssd_config.add_background_class)
else:
ssd_box_predictor = box_predictor_builder.build(
hyperparams_builder.build, ssd_config.box_predictor, is_training,
num_classes, ssd_config.add_background_class)
image_resizer_fn = image_resizer_builder.build(ssd_config.image_resizer)
non_max_suppression_fn, score_conversion_fn = post_processing_builder.build(
ssd_config.post_processing)
......@@ -244,7 +278,7 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
if ssd_config.use_expected_classification_loss_under_sampling:
expected_classification_loss_under_sampling = functools.partial(
ops.expected_classification_loss_under_sampling,
minimum_negative_sampling=ssd_config.minimum_negative_sampling,
min_num_negative_samples=ssd_config.min_num_negative_samples,
desired_negative_sampling_ratio=ssd_config.
desired_negative_sampling_ratio)
......@@ -271,7 +305,7 @@ def _build_ssd_model(ssd_config, is_training, add_summaries,
normalize_loc_loss_by_codesize=normalize_loc_loss_by_codesize,
freeze_batchnorm=ssd_config.freeze_batchnorm,
inplace_batchnorm_update=ssd_config.inplace_batchnorm_update,
add_background_class=add_background_class,
add_background_class=ssd_config.add_background_class,
random_example_sampler=random_example_sampler,
expected_classification_loss_under_sampling=
expected_classification_loss_under_sampling)
......@@ -357,12 +391,11 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
frcnn_config.first_stage_box_predictor_kernel_size)
first_stage_box_predictor_depth = frcnn_config.first_stage_box_predictor_depth
first_stage_minibatch_size = frcnn_config.first_stage_minibatch_size
# TODO(bhattad): When eval is supported using static shapes, add separate
# use_static_shapes_for_trainig and use_static_shapes_for_evaluation.
use_static_shapes = frcnn_config.use_static_shapes and is_training
use_static_shapes = frcnn_config.use_static_shapes
first_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=frcnn_config.first_stage_positive_balance_fraction,
is_static=frcnn_config.use_static_balanced_label_sampler and is_training)
is_static=(frcnn_config.use_static_balanced_label_sampler and
use_static_shapes))
first_stage_max_proposals = frcnn_config.first_stage_max_proposals
if (frcnn_config.first_stage_nms_iou_threshold < 0 or
frcnn_config.first_stage_nms_iou_threshold > 1.0):
......@@ -377,7 +410,7 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
iou_thresh=frcnn_config.first_stage_nms_iou_threshold,
max_size_per_class=frcnn_config.first_stage_max_proposals,
max_total_size=frcnn_config.first_stage_max_proposals,
use_static_shapes=use_static_shapes and is_training)
use_static_shapes=use_static_shapes)
first_stage_loc_loss_weight = (
frcnn_config.first_stage_localization_loss_weight)
first_stage_obj_loss_weight = frcnn_config.first_stage_objectness_loss_weight
......@@ -398,7 +431,8 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
second_stage_batch_size = frcnn_config.second_stage_batch_size
second_stage_sampler = sampler.BalancedPositiveNegativeSampler(
positive_fraction=frcnn_config.second_stage_balance_fraction,
is_static=frcnn_config.use_static_balanced_label_sampler and is_training)
is_static=(frcnn_config.use_static_balanced_label_sampler and
use_static_shapes))
(second_stage_non_max_suppression_fn, second_stage_score_conversion_fn
) = post_processing_builder.build(frcnn_config.second_stage_post_processing)
second_stage_localization_loss_weight = (
......
......@@ -39,6 +39,9 @@ from object_detection.models.ssd_mobilenet_v1_fpn_feature_extractor import SSDMo
from object_detection.models.ssd_mobilenet_v1_ppn_feature_extractor import SSDMobileNetV1PpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
from object_detection.models.ssd_mobilenet_v2_fpn_feature_extractor import SSDMobileNetV2FpnFeatureExtractor
from object_detection.models.ssd_mobilenet_v2_keras_feature_extractor import SSDMobileNetV2KerasFeatureExtractor
from object_detection.predictors import convolutional_box_predictor
from object_detection.predictors import convolutional_keras_box_predictor
from object_detection.protos import model_pb2
FRCNN_RESNET_FEAT_MAPS = {
......@@ -148,7 +151,7 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
}
}
use_expected_classification_loss_under_sampling: true
minimum_negative_sampling: 10
min_num_negative_samples: 10
desired_negative_sampling_ratio: 2
}"""
model_proto = model_pb2.DetectionModel()
......@@ -160,7 +163,7 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
self.assertIsNotNone(model._expected_classification_loss_under_sampling)
self.assertEqual(
model._expected_classification_loss_under_sampling.keywords, {
'minimum_negative_sampling': 10,
'min_num_negative_samples': 10,
'desired_negative_sampling_ratio': 2
})
......@@ -713,6 +716,86 @@ class ModelBuilderTest(tf.test.TestCase, parameterized.TestCase):
self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch)
self.assertIsInstance(model._feature_extractor,
SSDMobileNetV2FeatureExtractor)
self.assertIsInstance(model._box_predictor,
convolutional_box_predictor.ConvolutionalBoxPredictor)
self.assertTrue(model._normalize_loc_loss_by_codesize)
self.assertTrue(model._target_assigner._weight_regression_loss_by_score)
def test_create_ssd_mobilenet_v2_keras_model_from_config(self):
model_text_proto = """
ssd {
feature_extractor {
type: 'ssd_mobilenet_v2_keras'
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}
box_coder {
faster_rcnn_box_coder {
}
}
matcher {
argmax_matcher {
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
aspect_ratios: 1.0
}
}
image_resizer {
fixed_shape_resizer {
height: 320
width: 320
}
}
box_predictor {
convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
}
}
}
}
}
normalize_loc_loss_by_codesize: true
loss {
classification_loss {
weighted_softmax {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
}
weight_regression_loss_by_score: true
}"""
model_proto = model_pb2.DetectionModel()
text_format.Merge(model_text_proto, model_proto)
model = self.create_model(model_proto)
self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch)
self.assertIsInstance(model._feature_extractor,
SSDMobileNetV2KerasFeatureExtractor)
self.assertIsInstance(
model._box_predictor,
convolutional_keras_box_predictor.ConvolutionalBoxPredictor)
self.assertTrue(model._normalize_loc_loss_by_codesize)
self.assertTrue(model._target_assigner._weight_regression_loss_by_score)
......
......@@ -167,6 +167,7 @@ def build(preprocessor_step_config):
config.max_aspect_ratio),
'area_range': (config.min_area, config.max_area),
'overlap_thresh': config.overlap_thresh,
'clip_boxes': config.clip_boxes,
'random_coef': config.random_coef,
})
......@@ -217,6 +218,7 @@ def build(preprocessor_step_config):
config.max_aspect_ratio),
'area_range': (config.min_area, config.max_area),
'overlap_thresh': config.overlap_thresh,
'clip_boxes': config.clip_boxes,
'random_coef': config.random_coef,
}
if min_padded_size_ratio:
......@@ -252,6 +254,7 @@ def build(preprocessor_step_config):
for op in config.operations]
area_range = [(op.min_area, op.max_area) for op in config.operations]
overlap_thresh = [op.overlap_thresh for op in config.operations]
clip_boxes = [op.clip_boxes for op in config.operations]
random_coef = [op.random_coef for op in config.operations]
return (preprocessor.ssd_random_crop,
{
......@@ -259,6 +262,7 @@ def build(preprocessor_step_config):
'aspect_ratio_range': aspect_ratio_range,
'area_range': area_range,
'overlap_thresh': overlap_thresh,
'clip_boxes': clip_boxes,
'random_coef': random_coef,
})
return (preprocessor.ssd_random_crop, {})
......@@ -271,6 +275,7 @@ def build(preprocessor_step_config):
for op in config.operations]
area_range = [(op.min_area, op.max_area) for op in config.operations]
overlap_thresh = [op.overlap_thresh for op in config.operations]
clip_boxes = [op.clip_boxes for op in config.operations]
random_coef = [op.random_coef for op in config.operations]
min_padded_size_ratio = [tuple(op.min_padded_size_ratio)
for op in config.operations]
......@@ -284,6 +289,7 @@ def build(preprocessor_step_config):
'aspect_ratio_range': aspect_ratio_range,
'area_range': area_range,
'overlap_thresh': overlap_thresh,
'clip_boxes': clip_boxes,
'random_coef': random_coef,
'min_padded_size_ratio': min_padded_size_ratio,
'max_padded_size_ratio': max_padded_size_ratio,
......@@ -297,6 +303,7 @@ def build(preprocessor_step_config):
min_object_covered = [op.min_object_covered for op in config.operations]
area_range = [(op.min_area, op.max_area) for op in config.operations]
overlap_thresh = [op.overlap_thresh for op in config.operations]
clip_boxes = [op.clip_boxes for op in config.operations]
random_coef = [op.random_coef for op in config.operations]
return (preprocessor.ssd_random_crop_fixed_aspect_ratio,
{
......@@ -304,6 +311,7 @@ def build(preprocessor_step_config):
'aspect_ratio': config.aspect_ratio,
'area_range': area_range,
'overlap_thresh': overlap_thresh,
'clip_boxes': clip_boxes,
'random_coef': random_coef,
})
return (preprocessor.ssd_random_crop_fixed_aspect_ratio, {})
......@@ -332,6 +340,7 @@ def build(preprocessor_step_config):
kwargs['area_range'] = [(op.min_area, op.max_area)
for op in config.operations]
kwargs['overlap_thresh'] = [op.overlap_thresh for op in config.operations]
kwargs['clip_boxes'] = [op.clip_boxes for op in config.operations]
kwargs['random_coef'] = [op.random_coef for op in config.operations]
return (preprocessor.ssd_random_crop_pad_fixed_aspect_ratio, kwargs)
......
......@@ -222,6 +222,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.25
max_area: 0.875
overlap_thresh: 0.5
clip_boxes: False
random_coef: 0.125
}
"""
......@@ -234,6 +235,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio_range': (0.75, 1.5),
'area_range': (0.25, 0.875),
'overlap_thresh': 0.5,
'clip_boxes': False,
'random_coef': 0.125,
})
......@@ -261,6 +263,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.25
max_area: 0.875
overlap_thresh: 0.5
clip_boxes: False
random_coef: 0.125
}
"""
......@@ -273,6 +276,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio_range': (0.75, 1.5),
'area_range': (0.25, 0.875),
'overlap_thresh': 0.5,
'clip_boxes': False,
'random_coef': 0.125,
})
......@@ -285,6 +289,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.25
max_area: 0.875
overlap_thresh: 0.5
clip_boxes: False
random_coef: 0.125
min_padded_size_ratio: 0.5
min_padded_size_ratio: 0.75
......@@ -304,6 +309,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio_range': (0.75, 1.5),
'area_range': (0.25, 0.875),
'overlap_thresh': 0.5,
'clip_boxes': False,
'random_coef': 0.125,
'min_padded_size_ratio': (0.5, 0.75),
'max_padded_size_ratio': (0.5, 0.75),
......@@ -315,6 +321,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
random_crop_to_aspect_ratio {
aspect_ratio: 0.85
overlap_thresh: 0.35
clip_boxes: False
}
"""
preprocessor_proto = preprocessor_pb2.PreprocessingStep()
......@@ -322,7 +329,8 @@ class PreprocessorBuilderTest(tf.test.TestCase):
function, args = preprocessor_builder.build(preprocessor_proto)
self.assertEqual(function, preprocessor.random_crop_to_aspect_ratio)
self.assert_dictionary_close(args, {'aspect_ratio': 0.85,
'overlap_thresh': 0.35})
'overlap_thresh': 0.35,
'clip_boxes': False})
def test_build_random_black_patches(self):
preprocessor_text_proto = """
......@@ -411,6 +419,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.0
clip_boxes: False
random_coef: 0.375
}
operations {
......@@ -420,6 +429,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.25
clip_boxes: True
random_coef: 0.375
}
}
......@@ -432,6 +442,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio_range': [(0.875, 1.125), (0.75, 1.5)],
'area_range': [(0.5, 1.0), (0.5, 1.0)],
'overlap_thresh': [0.0, 0.25],
'clip_boxes': [False, True],
'random_coef': [0.375, 0.375]})
def test_build_ssd_random_crop_empty_operations(self):
......@@ -455,6 +466,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.0
clip_boxes: False
random_coef: 0.375
min_padded_size_ratio: [1.0, 1.0]
max_padded_size_ratio: [2.0, 2.0]
......@@ -469,6 +481,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.25
clip_boxes: True
random_coef: 0.375
min_padded_size_ratio: [1.0, 1.0]
max_padded_size_ratio: [2.0, 2.0]
......@@ -486,6 +499,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio_range': [(0.875, 1.125), (0.75, 1.5)],
'area_range': [(0.5, 1.0), (0.5, 1.0)],
'overlap_thresh': [0.0, 0.25],
'clip_boxes': [False, True],
'random_coef': [0.375, 0.375],
'min_padded_size_ratio': [(1.0, 1.0), (1.0, 1.0)],
'max_padded_size_ratio': [(2.0, 2.0), (2.0, 2.0)],
......@@ -499,6 +513,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.0
clip_boxes: False
random_coef: 0.375
}
operations {
......@@ -506,6 +521,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.25
clip_boxes: True
random_coef: 0.375
}
aspect_ratio: 0.875
......@@ -519,6 +535,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio': 0.875,
'area_range': [(0.5, 1.0), (0.5, 1.0)],
'overlap_thresh': [0.0, 0.25],
'clip_boxes': [False, True],
'random_coef': [0.375, 0.375]})
def test_build_ssd_random_crop_pad_fixed_aspect_ratio(self):
......@@ -531,6 +548,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.0
clip_boxes: False
random_coef: 0.375
}
operations {
......@@ -540,6 +558,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
min_area: 0.5
max_area: 1.0
overlap_thresh: 0.25
clip_boxes: True
random_coef: 0.375
}
aspect_ratio: 0.875
......@@ -557,6 +576,7 @@ class PreprocessorBuilderTest(tf.test.TestCase):
'aspect_ratio_range': [(0.875, 1.125), (0.75, 1.5)],
'area_range': [(0.5, 1.0), (0.5, 1.0)],
'overlap_thresh': [0.0, 0.25],
'clip_boxes': [False, True],
'random_coef': [0.375, 0.375],
'min_padded_size_ratio': (1.0, 1.0),
'max_padded_size_ratio': (2.0, 2.0)})
......
......@@ -225,7 +225,9 @@ class WeightedSigmoidClassificationLoss(Loss):
num_classes] representing the predicted logits for each class
target_tensor: A float tensor of shape [batch_size, num_anchors,
num_classes] representing one-hot encoded classification targets
weights: a float tensor of shape [batch_size, num_anchors]
weights: a float tensor of shape [batch_size, num_anchors,
num_classes] or [batch_size, num_anchors, 1]. If the shape is
[batch_size, num_anchors, 1], all the classes are equally weighted.
class_indices: (Optional) A 1-D integer tensor of class indices.
If provided, computes loss only for the specified class indices.
......@@ -233,7 +235,6 @@ class WeightedSigmoidClassificationLoss(Loss):
loss: a float tensor of shape [batch_size, num_anchors, num_classes]
representing the value of the loss function.
"""
weights = tf.expand_dims(weights, 2)
if class_indices is not None:
weights *= tf.reshape(
ops.indices_to_dense_vector(class_indices,
......@@ -273,7 +274,9 @@ class SigmoidFocalClassificationLoss(Loss):
num_classes] representing the predicted logits for each class
target_tensor: A float tensor of shape [batch_size, num_anchors,
num_classes] representing one-hot encoded classification targets
weights: a float tensor of shape [batch_size, num_anchors]
weights: a float tensor of shape [batch_size, num_anchors,
num_classes] or [batch_size, num_anchors, 1]. If the shape is
[batch_size, num_anchors, 1], all the classes are equally weighted.
class_indices: (Optional) A 1-D integer tensor of class indices.
If provided, computes loss only for the specified class indices.
......@@ -281,7 +284,6 @@ class SigmoidFocalClassificationLoss(Loss):
loss: a float tensor of shape [batch_size, num_anchors, num_classes]
representing the value of the loss function.
"""
weights = tf.expand_dims(weights, 2)
if class_indices is not None:
weights *= tf.reshape(
ops.indices_to_dense_vector(class_indices,
......@@ -326,12 +328,15 @@ class WeightedSoftmaxClassificationLoss(Loss):
num_classes] representing the predicted logits for each class
target_tensor: A float tensor of shape [batch_size, num_anchors,
num_classes] representing one-hot encoded classification targets
weights: a float tensor of shape [batch_size, num_anchors]
weights: a float tensor of shape either [batch_size, num_anchors,
num_classes] or [batch_size, num_anchors, 1]. If the shape is
[batch_size, num_anchors, 1], all classes are weighted equally.
Returns:
loss: a float tensor of shape [batch_size, num_anchors]
representing the value of the loss function.
"""
weights = tf.reduce_mean(weights, axis=2)
num_classes = prediction_tensor.get_shape().as_list()[-1]
prediction_tensor = tf.divide(
prediction_tensor, self._logit_scale, name='scale_logit')
......@@ -372,12 +377,15 @@ class WeightedSoftmaxClassificationAgainstLogitsLoss(Loss):
num_classes] representing the predicted logits for each class
target_tensor: A float tensor of shape [batch_size, num_anchors,
num_classes] representing logit classification targets
weights: a float tensor of shape [batch_size, num_anchors]
weights: a float tensor of shape either [batch_size, num_anchors,
num_classes] or [batch_size, num_anchors, 1]. If the shape is
[batch_size, num_anchors, 1], all classes are weighted equally.
Returns:
loss: a float tensor of shape [batch_size, num_anchors]
representing the value of the loss function.
"""
weights = tf.reduce_mean(weights, axis=2)
num_classes = prediction_tensor.get_shape().as_list()[-1]
target_tensor = self._scale_and_softmax_logits(target_tensor)
prediction_tensor = tf.divide(prediction_tensor, self._logit_scale,
......@@ -431,7 +439,9 @@ class BootstrappedSigmoidClassificationLoss(Loss):
num_classes] representing the predicted logits for each class
target_tensor: A float tensor of shape [batch_size, num_anchors,
num_classes] representing one-hot encoded classification targets
weights: a float tensor of shape [batch_size, num_anchors]
weights: a float tensor of shape either [batch_size, num_anchors,
num_classes] or [batch_size, num_anchors, 1]. If the shape is
[batch_size, num_anchors, 1], all classes are weighted equally.
Returns:
loss: a float tensor of shape [batch_size, num_anchors, num_classes]
......@@ -446,7 +456,7 @@ class BootstrappedSigmoidClassificationLoss(Loss):
tf.sigmoid(prediction_tensor) > 0.5, tf.float32)
per_entry_cross_ent = (tf.nn.sigmoid_cross_entropy_with_logits(
labels=bootstrap_target_tensor, logits=prediction_tensor))
return per_entry_cross_ent * tf.expand_dims(weights, 2)
return per_entry_cross_ent * weights
class HardExampleMiner(object):
......
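The weight-handling change above shifts responsibility for the class dimension to callers: the losses no longer expand rank-2 weights internally, so existing code that passed [batch_size, num_anchors] weights must add a trailing axis itself. A minimal sketch of the migration, with illustrative values:

    import tensorflow as tf

    # Per-anchor weights in the old rank-2 layout.
    anchor_weights = tf.constant([[1.0, 1.0, 0.5, 1.0],
                                  [1.0, 1.0, 1.0, 0.0]])

    # A size-1 trailing axis broadcasts against
    # [batch_size, num_anchors, num_classes], weighting every class equally
    # and reproducing the old behaviour.
    loss_weights = tf.expand_dims(anchor_weights, axis=2)  # shape [2, 4, 1]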
......@@ -209,8 +209,14 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
loss_op = losses.WeightedSigmoidClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss)
......@@ -237,8 +243,14 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
loss_op = losses.WeightedSigmoidClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss, axis=2)
......@@ -266,8 +278,14 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 1, 0, 0],
[1, 1, 1, 0],
[1, 0, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[0, 0, 0, 0]]], tf.float32)
# Ignores the last class.
class_indices = tf.constant([0, 1, 2], tf.int32)
loss_op = losses.WeightedSigmoidClassificationLoss()
......@@ -306,9 +324,18 @@ class WeightedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0],
[1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]], tf.float32)
losses_mask = tf.constant([True, True, False], tf.bool)
loss_op = losses.WeightedSigmoidClassificationLoss()
......@@ -345,7 +372,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0],
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1], [1], [1], [1], [1], [1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=None)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -371,7 +398,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1],
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1], [1], [1], [1], [1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=None)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -397,7 +424,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1],
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1], [1], [1], [1], [1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=None)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -423,7 +450,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1],
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1], [1], [1], [1], [1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=1.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -451,7 +478,7 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1],
[0],
[0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1], [1], [1], [1], [1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(gamma=2.0, alpha=0.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -485,8 +512,14 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=0.5, gamma=0.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = focal_loss_op(prediction_tensor, target_tensor,
......@@ -515,8 +548,14 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=None, gamma=0.0)
sigmoid_loss_op = losses.WeightedSigmoidClassificationLoss()
focal_loss = focal_loss_op(prediction_tensor, target_tensor,
......@@ -546,8 +585,14 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 0, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=1.0, gamma=0.0)
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -578,8 +623,14 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 0, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]], tf.float32)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=0.75, gamma=0.0)
focal_loss = tf.reduce_sum(focal_loss_op(prediction_tensor, target_tensor,
......@@ -620,9 +671,18 @@ class SigmoidFocalClassificationLossTest(tf.test.TestCase):
[1, 0, 0],
[1, 0, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]], tf.float32)
losses_mask = tf.constant([True, True, False], tf.bool)
focal_loss_op = losses.SigmoidFocalClassificationLoss(alpha=0.75, gamma=0.0)
......@@ -659,8 +719,14 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[0, 1, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[0.5, 0.5, 0.5],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
loss_op = losses.WeightedSoftmaxClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss)
......@@ -687,8 +753,14 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[0, 1, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[0.5, 0.5, 0.5],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
loss_op = losses.WeightedSoftmaxClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
......@@ -718,8 +790,14 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[0, 1, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]], tf.float32)
loss_op = losses.WeightedSoftmaxClassificationLoss(logit_scale=logit_scale)
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
......@@ -755,9 +833,18 @@ class WeightedSoftmaxClassificationLossTest(tf.test.TestCase):
[1, 0, 0],
[1, 0, 0],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 0],
[1, 1, 1, 1]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[0.5, 0.5, 0.5],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]], tf.float32)
losses_mask = tf.constant([True, True, False], tf.bool)
loss_op = losses.WeightedSoftmaxClassificationLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights,
......@@ -792,6 +879,11 @@ class WeightedSoftmaxClassificationAgainstLogitsLossTest(tf.test.TestCase):
[100, -100, -100]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 1]], tf.float32)
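# Expand the legacy rank-2 weights to [batch_size, num_anchors, num_classes]
# by tiling a new trailing axis num_classes (= 3) times.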
weights_shape = tf.shape(weights)
weights_multiple = tf.concat(
[tf.ones_like(weights_shape), tf.constant([3])],
axis=0)
weights = tf.tile(tf.expand_dims(weights, 2), weights_multiple)
loss_op = losses.WeightedSoftmaxClassificationAgainstLogitsLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
loss = tf.reduce_sum(loss)
......@@ -820,6 +912,11 @@ class WeightedSoftmaxClassificationAgainstLogitsLossTest(tf.test.TestCase):
[100, -100, -100]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 0]], tf.float32)
weights_shape = tf.shape(weights)
weights_multiple = tf.concat(
[tf.ones_like(weights_shape), tf.constant([3])],
axis=0)
weights = tf.tile(tf.expand_dims(weights, 2), weights_multiple)
loss_op = losses.WeightedSoftmaxClassificationAgainstLogitsLoss()
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
......@@ -849,6 +946,11 @@ class WeightedSoftmaxClassificationAgainstLogitsLossTest(tf.test.TestCase):
[100, -100, -100]]], tf.float32)
weights = tf.constant([[1, 1, .5, 1],
[1, 1, 1, 0]], tf.float32)
weights_shape = tf.shape(weights)
weights_multiple = tf.concat(
[tf.ones_like(weights_shape), tf.constant([3])],
axis=0)
weights = tf.tile(tf.expand_dims(weights, 2), weights_multiple)
loss_op = losses.WeightedSoftmaxClassificationAgainstLogitsLoss(
logit_scale=logit_scale)
loss = loss_op(prediction_tensor, target_tensor, weights=weights)
......@@ -894,8 +996,14 @@ class BootstrappedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
alpha = tf.constant(.5, tf.float32)
loss_op = losses.BootstrappedSigmoidClassificationLoss(
alpha, bootstrap_type='soft')
......@@ -923,8 +1031,14 @@ class BootstrappedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
alpha = tf.constant(.5, tf.float32)
loss_op = losses.BootstrappedSigmoidClassificationLoss(
alpha, bootstrap_type='hard')
......@@ -952,8 +1066,14 @@ class BootstrappedSigmoidClassificationLossTest(tf.test.TestCase):
[0, 1, 0],
[1, 1, 1],
[1, 0, 0]]], tf.float32)
weights = tf.constant([[1, 1, 1, 1],
[1, 1, 1, 0]], tf.float32)
weights = tf.constant([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]], tf.float32)
alpha = tf.constant(.5, tf.float32)
loss_op = losses.BootstrappedSigmoidClassificationLoss(
alpha, bootstrap_type='hard')
......
......@@ -197,8 +197,10 @@ class Match(object):
The shape of the gathered tensor is [match_results.shape[0]] +
input_tensor.shape[1:].
"""
input_tensor = tf.concat([tf.stack([ignored_value, unmatched_value]),
input_tensor], axis=0)
input_tensor = tf.concat(
[tf.stack([ignored_value, unmatched_value]),
tf.to_float(input_tensor)],
axis=0)
gather_indices = tf.maximum(self.match_results + 2, 0)
gathered_tensor = self._gather_op(input_tensor, gather_indices)
return gathered_tensor
......
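For context on the Match.gather_based_on_match change above: match_results uses -2 for ignored and -1 for unmatched entries, so prepending the two fill values and shifting every index by +2 lets a single gather cover all three cases. A toy sketch of the index arithmetic (values are hypothetical, not from this commit):

    import tensorflow as tf

    # -2 = ignored, -1 = unmatched, k >= 0 = matched to row k.
    match_results = tf.constant([-2, -1, 0, 2])
    rows = tf.constant([10.0, 20.0, 30.0])

    # Prepend ignored_value (0.0) and unmatched_value (-1.0), then shift:
    # -2 -> index 0, -1 -> index 1, k -> index k + 2.
    table = tf.concat([tf.stack([0.0, -1.0]), rows], axis=0)
    gathered = tf.gather(table, tf.maximum(match_results + 2, 0))
    # gathered == [0.0, -1.0, 10.0, 30.0]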
......@@ -289,6 +289,18 @@ class DetectionModel(object):
self._groundtruth_lists[
fields.InputDataFields.is_annotated] = is_annotated_list
@abstractmethod
def regularization_losses(self):
"""Returns a list of regularization losses for this model.
Returns a list of regularization losses for this model that the estimator
needs to use during training/optimization.
Returns:
A list of regularization loss tensors.
"""
pass
@abstractmethod
def restore_map(self, fine_tune_checkpoint_type='detection'):
"""Returns a map of variables to load from a foreign checkpoint.
......@@ -312,3 +324,16 @@ class DetectionModel(object):
the model graph.
"""
pass
@abstractmethod
def updates(self):
"""Returns a list of update operators for this model.
Returns a list of update operators for this model that must be executed at
each training step. The estimator's train op needs to have a control
dependency on these updates.
Returns:
A list of update operators.
"""
pass
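A minimal sketch, not from this commit, of how a concrete DetectionModel subclass might satisfy the two new abstract methods using standard TF 1.x graph collections:

    import tensorflow as tf

    # Inside a hypothetical DetectionModel subclass:

    def regularization_losses(self):
      # Regularizers attached through tf.layers/slim land in this collection.
      return tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)

    def updates(self):
      # Batch-norm moving-average updates and similar ops accumulate here;
      # the estimator's train op must take a control dependency on them.
      return tf.get_collection(tf.GraphKeys.UPDATE_OPS)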
......@@ -15,6 +15,7 @@
"""Post-processing operations on detected boxes."""
import numpy as np
import tensorflow as tf
from object_detection.core import box_list
......@@ -407,28 +408,36 @@ def batch_multiclass_non_max_suppression(boxes,
for key, value in zip(additional_fields, args[4:-1])
}
per_image_num_valid_boxes = args[-1]
per_image_boxes = tf.reshape(
tf.slice(per_image_boxes, 3 * [0],
tf.stack([per_image_num_valid_boxes, -1, -1])), [-1, q, 4])
per_image_scores = tf.reshape(
tf.slice(per_image_scores, [0, 0],
tf.stack([per_image_num_valid_boxes, -1])),
[-1, num_classes])
per_image_masks = tf.reshape(
tf.slice(per_image_masks, 4 * [0],
tf.stack([per_image_num_valid_boxes, -1, -1, -1])),
[-1, q, per_image_masks.shape[2].value,
per_image_masks.shape[3].value])
if per_image_additional_fields is not None:
for key, tensor in per_image_additional_fields.items():
additional_field_shape = tensor.get_shape()
additional_field_dim = len(additional_field_shape)
per_image_additional_fields[key] = tf.reshape(
tf.slice(per_image_additional_fields[key],
additional_field_dim * [0],
tf.stack([per_image_num_valid_boxes] +
(additional_field_dim - 1) * [-1])),
[-1] + [dim.value for dim in additional_field_shape[1:]])
if use_static_shapes:
total_proposals = tf.shape(per_image_scores)
per_image_scores = tf.where(
tf.less(tf.range(total_proposals[0]), per_image_num_valid_boxes),
per_image_scores,
tf.fill(total_proposals, np.finfo('float32').min))
else:
per_image_boxes = tf.reshape(
tf.slice(per_image_boxes, 3 * [0],
tf.stack([per_image_num_valid_boxes, -1, -1])), [-1, q, 4])
per_image_scores = tf.reshape(
tf.slice(per_image_scores, [0, 0],
tf.stack([per_image_num_valid_boxes, -1])),
[-1, num_classes])
per_image_masks = tf.reshape(
tf.slice(per_image_masks, 4 * [0],
tf.stack([per_image_num_valid_boxes, -1, -1, -1])),
[-1, q, per_image_masks.shape[2].value,
per_image_masks.shape[3].value])
if per_image_additional_fields is not None:
for key, tensor in per_image_additional_fields.items():
additional_field_shape = tensor.get_shape()
additional_field_dim = len(additional_field_shape)
per_image_additional_fields[key] = tf.reshape(
tf.slice(per_image_additional_fields[key],
additional_field_dim * [0],
tf.stack([per_image_num_valid_boxes] +
(additional_field_dim - 1) * [-1])),
[-1] + [dim.value for dim in additional_field_shape[1:]])
nmsed_boxlist, num_valid_nms_boxes = multiclass_non_max_suppression(
per_image_boxes,
per_image_scores,
......
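The new use_static_shapes branch above avoids tf.slice/tf.reshape, whose outputs lose static shape information, by instead overwriting the scores of invalid proposals with float32's minimum so NMS can never select them. A self-contained sketch of that masking trick with toy values:

    import numpy as np
    import tensorflow as tf

    # Toy scores for 4 proposals x 2 classes; only the first 2 are valid.
    scores = tf.constant([[0.9, 0.1],
                          [0.8, 0.2],
                          [0.7, 0.3],
                          [0.6, 0.4]])
    num_valid = tf.constant(2)

    # Keep the static [4, 2] shape and mask invalid rows instead of slicing.
    total = tf.shape(scores)
    masked = tf.where(tf.less(tf.range(total[0]), num_valid),
                      scores,
                      tf.fill(total, np.finfo('float32').min))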