Commit 80444539 authored by Zhuoran Liu's avatar Zhuoran Liu Committed by pkulzc

Add TPU SavedModel exporter and refactor OD code (#6737)

247226201  by ronnyvotel:

    Updating the visualization tools to accept unique_ids for color coding.

--
247067830  by Zhichao Lu:

    Add box_encodings_clip_range options for the convolutional box predictor (for TPU compatibility).
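    A minimal plain-Python sketch of what such a clip range does (the function name and the sample range are illustrative, not the actual box predictor API): extreme box encodings are clamped into a fixed interval, avoiding overflow-prone values in TPU numerics.

    ```python
    def clip_encodings(encodings, clip_min, clip_max):
        # Clamp each box encoding into [clip_min, clip_max].
        return [min(max(v, clip_min), clip_max) for v in encodings]

    print(clip_encodings([-15.0, 0.3, 42.0], -10.0, 10.0))  # [-10.0, 0.3, 10.0]
    ```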

--
246888475  by Zhichao Lu:

    Remove unused _update_eval_steps function.

--
246163259  by lzc:

    Add a gather op that can handle ignore indices (which are "-1"s in this case).
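    The semantics can be sketched in plain Python (names and the placeholder value are illustrative; the real op works on tensors): an index of -1 marks an unmatched entry and selects a placeholder instead of raising an out-of-range error.

    ```python
    def gather_with_ignore(values, indices, ignore_value=0):
        # Gather values[i] for each index; map ignored (-1) indices to
        # a placeholder rather than failing.
        return [values[i] if i >= 0 else ignore_value for i in indices]

    # Rows matched to a groundtruth entry keep its value; unmatched rows
    # (index -1) receive the placeholder.
    print(gather_with_ignore([10, 20, 30], [2, -1, 0]))  # [30, 0, 10]
    ```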

--
246084944  by Zhichao Lu:

    Keras based implementation for SSD + MobilenetV2 + FPN.

--
245544227  by rathodv:

    Add batch_get_targets method to target assigner module to gather any groundtruth tensors based on the results of target assigner.

--
245540854  by rathodv:

    Update target assigner to return match tensor instead of a match object.

--
245434441  by Zhichao Lu:

    Add README for tpu_exporters package.

--
245381834  by lzc:

    Internal change.

--
245298983  by Zhichao Lu:

    Add conditional_shape_resizer to config_util

--
245134666  by Zhichao Lu:

    Adds ConditionalShapeResizer to the ImageResizer proto, which enables resizing only if the input image height or width is greater or smaller than a certain size. Also enables specification of the resize method in resize_to_{max, min}_dimension methods.
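    The shape arithmetic behind the "resize only if too large" condition can be sketched in plain Python (function name and rounding are illustrative; the real resizer operates on image tensors):

    ```python
    def resize_to_max_dimension(height, width, max_dim):
        # Downscale only when the larger side exceeds max_dim,
        # preserving aspect ratio; otherwise leave the shape unchanged.
        largest = max(height, width)
        if largest <= max_dim:
            return height, width  # condition not met: no resize
        scale = max_dim / largest
        return int(round(height * scale)), int(round(width * scale))

    print(resize_to_max_dimension(800, 1200, 600))  # (400, 600)
    print(resize_to_max_dimension(300, 400, 600))   # (300, 400)
    ```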

--
245093975  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (faster-rcnn)

--
245072421  by Zhichao Lu:

    Adds a new image resizing method "resize_to_max_dimension" which resizes images only if a dimension is greater than the maximum desired value while maintaining aspect ratio.

--
244946998  by lzc:

    Internal Changes.

--
244943693  by Zhichao Lu:

    Add a custom config to mobilenet v2 that makes it more detection friendly.

--
244754158  by derekjchow:

    Internal change.

--
244699875  by Zhichao Lu:

    Add check_range=False to box_list_ops.to_normalized_coordinates when training
    for instance segmentation.  This is consistent with other calls when training
    for object detection.  There could be wrongly annotated boxes in the dataset.
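    The normalization itself is simple (a plain-Python sketch with an illustrative name); with check_range=False, coordinates outside [0, 1] produced by wrongly annotated boxes pass through instead of tripping a range assertion.

    ```python
    def to_normalized(box, height, width):
        # Convert [ymin, xmin, ymax, xmax] absolute pixel coordinates
        # to the [0, 1] normalized range.
        ymin, xmin, ymax, xmax = box
        return [ymin / height, xmin / width, ymax / height, xmax / width]

    print(to_normalized([100, 200, 300, 400], 400, 800))  # [0.25, 0.25, 0.75, 0.5]
    ```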

--
244507425  by rathodv:

    Support bfloat16 for ssd models.

--
244399982  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd)

--
244209387  by Zhichao Lu:

    Internal change.

--
243922296  by rathodv:

    Change `raw_detection_scores` to contain softmax/sigmoid scores (not logits) for `raw_detection_boxes`.
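    For reference, the logit-to-score conversion for the softmax case looks like this (a plain-Python illustration, not the model code):

    ```python
    import math

    def softmax(logits):
        # Convert raw class logits to probabilities that sum to 1.
        # Subtracting the max is the usual numerical-stability trick.
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

    probs = softmax([2.0, 1.0, 0.0])
    print(round(sum(probs), 6))  # 1.0
    ```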

--
243883978  by Zhichao Lu:

    Add a sample fully conv config.

--
243369455  by Zhichao Lu:

    Fix regularization loss gap in Keras and Slim.

--
243292002  by lzc:

    Internal changes.

--
243097958  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd model)

--
243007177  by Zhichao Lu:

    Exporting SavedModel for Object Detection TPU inference. (ssd model)

--
242776550  by Zhichao Lu:

    Make object detection pre-processing run on GPU.  tf.map_fn() uses
    TensorArrayV3 ops, which have no int32 GPU implementation.  Cast to int64,
    then cast back to int32.

--
242723128  by Zhichao Lu:

    Using sorted dictionaries for additional heads in non_max_suppression to ensure deterministic tensor ordering.
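    The idea is that plain dict iteration order is not guaranteed across runs (in the Python versions of the time), so iterating keys in sorted order pins the tensor order. A minimal sketch with illustrative names:

    ```python
    def ordered_head_outputs(additional_fields):
        # Iterate extra prediction heads in sorted key order so downstream
        # ops always see the tensors in the same order across runs.
        return [additional_fields[key] for key in sorted(additional_fields)]

    heads = {'masks': 'mask_tensor', 'keypoints': 'kp_tensor'}
    print(ordered_head_outputs(heads))  # ['kp_tensor', 'mask_tensor']
    ```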

--
242495311  by Zhichao Lu:

    Update documentation to reflect new TFLite examples repo location

--
242230527  by Zhichao Lu:

    Fix Dropout bugs for WeightSharedConvolutionalBoxPred.

--
242226573  by Zhichao Lu:

    Create Keras-based WeightSharedConvolutionalBoxPredictor.

--
241806074  by Zhichao Lu:

    Add inference in unit tests of TFX OD template.

--
241641498  by lzc:

    Internal change.

--
241637481  by Zhichao Lu:

    matmul_crop_and_resize(): Switch to dynamic shaping, so that not all dimensions are required to be known.

--
241429980  by Zhichao Lu:

    Internal change

--
241167237  by Zhichao Lu:

    Adds a faster_rcnn_inception_resnet_v2 Keras feature extractor, and updates the model builder to construct it.

--
241088616  by Zhichao Lu:

    Make it compatible with different dtype, e.g. float32, bfloat16, etc.

--
240897364  by lzc:

    Use image_np_expanded in object_detection_tutorial notebook.

--
240890393  by Zhichao Lu:

    Disable multicore inference for OD template as it's not yet compatible.

--
240352168  by Zhichao Lu:

    Make SSDResnetV1FpnFeatureExtractor not protected to allow inheritance.

--
240351470  by lzc:

    Internal change.

--
239878928  by Zhichao Lu:

    Defines Keras box predictors for Faster RCNN and RFCN

--
239872103  by Zhichao Lu:

    Delete duplicated inputs in test.

--
239714273  by Zhichao Lu:

    Adding scope variable to all class heads

--
239698643  by Zhichao Lu:

    Create FPN feature extractor for object detection.

--
239696657  by Zhichao Lu:

    Internal Change.

--
239299404  by Zhichao Lu:

    Allows the faster rcnn meta-architecture to support Keras subcomponents

--
238502595  by Zhichao Lu:

    Lay the groundwork for symmetric quantization.

--
238496885  by Zhichao Lu:

    Add flexible_grid_anchor_generator
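    As a rough illustration of grid anchor generation (simplified to one anchor center per cell; the real generator also emits per-anchor scales and aspect ratios, and the "flexible" variant allows these to differ per feature map):

    ```python
    def anchor_centers(grid_height, grid_width, stride, offset):
        # Centers of anchors laid out on a feature-map grid: one center
        # per cell, spaced by `stride`, shifted by `offset`.
        return [(offset + y * stride, offset + x * stride)
                for y in range(grid_height) for x in range(grid_width)]

    print(anchor_centers(2, 2, 16, 8))  # [(8, 8), (8, 24), (24, 8), (24, 24)]
    ```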

--
238138727  by lzc:

    Remove dead code.

    _USE_C_SHAPES has been forced True in TensorFlow releases since
    TensorFlow 1.9
    (https://github.com/tensorflow/tensorflow/commit/1d74a69443f741e69f9f52cb6bc2940b4d4ae3b7)

--
238123936  by rathodv:

    Add num_matched_groundtruth summary to target assigner in SSD.

--
238103345  by ronnyvotel:

    Raising error if input file pattern does not match any files.
    Also printing the number of evaluation images for coco metrics.

--
238044081  by Zhichao Lu:

    Fix docstring to state the correct dimensionality of `class_predictions_with_background`.

--
237920279  by Zhichao Lu:

    [XLA] Rework debug flags for dumping HLO.

    The following flags (usually passed via the XLA_FLAGS envvar) are removed:

      xla_dump_computations_to
      xla_dump_executions_to
      xla_dump_ir_to
      xla_dump_optimized_hlo_proto_to
      xla_dump_per_pass_hlo_proto_to
      xla_dump_unoptimized_hlo_proto_to
      xla_generate_hlo_graph
      xla_generate_hlo_text_to
      xla_hlo_dump_as_html
      xla_hlo_graph_path
      xla_log_hlo_text

    The following new flags are added:

      xla_dump_to
      xla_dump_hlo_module_re
      xla_dump_hlo_pass_re
      xla_dump_hlo_as_text
      xla_dump_hlo_as_proto
      xla_dump_hlo_as_dot
      xla_dump_hlo_as_url
      xla_dump_hlo_as_html
      xla_dump_ir
      xla_dump_hlo_snapshots

    The default is not to dump anything at all, but as soon as some dumping flag is
    specified, we enable the following defaults (most of which can be overridden).

     * dump to stdout (overridden by --xla_dump_to)
     * dump HLO modules at the very beginning and end of the optimization pipeline
     * don't dump between any HLO passes (overridden by --xla_dump_hlo_pass_re)
     * dump all HLO modules (overridden by --xla_dump_hlo_module_re)
     * dump in textual format (overridden by
       --xla_dump_hlo_as_{text,proto,dot,url,html}).

    For example, to dump optimized and unoptimized HLO text and protos to /tmp/foo,
    pass

      --xla_dump_to=/tmp/foo --xla_dump_hlo_as_text --xla_dump_hlo_as_proto

    For details on these flags' meanings, see xla.proto.

    The intent of this change is to make dumping both simpler to use and more
    powerful.

    For example:

     * Previously there was no way to dump the HLO module during the pass pipeline
       in HLO text format; the only option was --xla_dump_per_pass_hlo_proto_to, which
       dumped in proto format.

       Now this is --xla_dump_hlo_pass_re=.* --xla_dump_hlo_as_text.  (In fact, the
       second flag is not necessary in this case, as dumping as text is the
       default.)

     * Previously there was no way to dump HLO as a graph before and after
       compilation; the only option was --xla_generate_hlo_graph, which would dump
       before/after every pass.

       Now this is --xla_dump_hlo_as_{dot,url,html} (depending on what format you
       want the graph in).

     * Previously, there was no coordination between the filenames written by the
       various flags, so info about one module might be dumped with various
       filename prefixes.  Now the filenames are consistent and all dumps from a
       particular module are next to each other.

    If you only specify some of these flags, we try to figure out what you wanted.
    For example:

     * --xla_dump_to implies --xla_dump_hlo_as_text unless you specify some
       other --xla_dump_hlo_as_* flag.

     * --xla_dump_hlo_as_text or --xla_dump_ir implies dumping to stdout unless you
       specify a different --xla_dump_to directory.  You can explicitly dump to
       stdout with --xla_dump_to=-.

    As part of this change, I simplified the debugging code in the HLO passes for
    dumping HLO modules.  Previously, many tests explicitly VLOG'ed the HLO module
    before, after, and sometimes during the pass.  I removed these VLOGs.  If you
    want dumps before/during/after an HLO pass, use --xla_dump_hlo_pass_re=<pass_name>.

--
237510043  by lzc:

    Internal Change.

--
237469515  by Zhichao Lu:

    Parameterize model_builder.build in inputs.py.

--
237293511  by rathodv:

    Always remove multiclass_scores from tensor_dict in transform_data_fn.

--
237260333  by ronnyvotel:

    Updating faster_rcnn_meta_arch to define prediction dictionary fields that are batched.

--

PiperOrigin-RevId: 247226201
parent c4f34e58
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for inception_resnet_v2.py.
This test mainly focuses on comparing slim inception resnet v2 and Keras
inception resnet v2 for object detection. To verify the consistency of the two
models, we compare:
1. Output shape of each layer given different inputs
2. Number of global variables
We also visualize the model structure via Tensorboard, and compare the model
layout and the parameters of each Op to make sure the two implementations are
consistent.
"""
import itertools
import numpy as np
import tensorflow as tf
from object_detection.models.keras_models import inception_resnet_v2
from object_detection.utils import test_case
_KERAS_TO_SLIM_ENDPOINT_NAMES = {
    'activation': 'Conv2d_1a_3x3',
    'activation_1': 'Conv2d_2a_3x3',
    'activation_2': 'Conv2d_2b_3x3',
    'activation_3': 'Conv2d_3b_1x1',
    'activation_4': 'Conv2d_4a_3x3',
    'max_pooling2d': 'MaxPool_3a_3x3',
    'max_pooling2d_1': 'MaxPool_5a_3x3',
    'mixed_5b': 'Mixed_5b',
    'mixed_6a': 'Mixed_6a',
    'block17_20_ac': 'PreAuxLogits',
    'mixed_7a': 'Mixed_7a',
    'conv_7b_ac': 'Conv2d_7b_1x1',
}

_SLIM_ENDPOINT_SHAPES_128 = {
    'Conv2d_1a_3x3': (2, 64, 64, 32),
    'Conv2d_2a_3x3': (2, 64, 64, 32),
    'Conv2d_2b_3x3': (2, 64, 64, 64),
    'Conv2d_3b_1x1': (2, 32, 32, 80),
    'Conv2d_4a_3x3': (2, 32, 32, 192),
    'Conv2d_7b_1x1': (2, 4, 4, 1536),
    'MaxPool_3a_3x3': (2, 32, 32, 64),
    'MaxPool_5a_3x3': (2, 16, 16, 192),
    'Mixed_5b': (2, 16, 16, 320),
    'Mixed_6a': (2, 8, 8, 1088),
    'Mixed_7a': (2, 4, 4, 2080),
    'PreAuxLogits': (2, 8, 8, 1088)}

_SLIM_ENDPOINT_SHAPES_128_STRIDE_8 = {
    'Conv2d_1a_3x3': (2, 64, 64, 32),
    'Conv2d_2a_3x3': (2, 64, 64, 32),
    'Conv2d_2b_3x3': (2, 64, 64, 64),
    'Conv2d_3b_1x1': (2, 32, 32, 80),
    'Conv2d_4a_3x3': (2, 32, 32, 192),
    'MaxPool_3a_3x3': (2, 32, 32, 64),
    'MaxPool_5a_3x3': (2, 16, 16, 192),
    'Mixed_5b': (2, 16, 16, 320),
    'Mixed_6a': (2, 16, 16, 1088),
    'PreAuxLogits': (2, 16, 16, 1088)}

_SLIM_ENDPOINT_SHAPES_128_ALIGN_FEATURE_MAPS_FALSE = {
    'Conv2d_1a_3x3': (2, 63, 63, 32),
    'Conv2d_2a_3x3': (2, 61, 61, 32),
    'Conv2d_2b_3x3': (2, 61, 61, 64),
    'Conv2d_3b_1x1': (2, 30, 30, 80),
    'Conv2d_4a_3x3': (2, 28, 28, 192),
    'Conv2d_7b_1x1': (2, 2, 2, 1536),
    'MaxPool_3a_3x3': (2, 30, 30, 64),
    'MaxPool_5a_3x3': (2, 13, 13, 192),
    'Mixed_5b': (2, 13, 13, 320),
    'Mixed_6a': (2, 6, 6, 1088),
    'Mixed_7a': (2, 2, 2, 2080),
    'PreAuxLogits': (2, 6, 6, 1088)}

_SLIM_ENDPOINT_SHAPES_299 = {}
_SLIM_ENDPOINT_SHAPES_299_STRIDE_8 = {}
_SLIM_ENDPOINT_SHAPES_299_ALIGN_FEATURE_MAPS_FALSE = {}

_KERAS_LAYERS_TO_CHECK = list(_KERAS_TO_SLIM_ENDPOINT_NAMES.keys())

_NUM_CHANNELS = 3
_BATCH_SIZE = 2
class InceptionResnetV2Test(test_case.TestCase):

  def _create_application_with_layer_outputs(
      self, layer_names, batchnorm_training,
      output_stride=16,
      align_feature_maps=False,
      batchnorm_scale=False,
      weight_decay=0.00004,
      default_batchnorm_momentum=0.9997,
      default_batchnorm_epsilon=0.001,):
    """Constructs Keras inception_resnet_v2 that extracts layer outputs."""
    # Have to clear the Keras backend to ensure isolation in layer naming.
    tf.keras.backend.clear_session()
    if not layer_names:
      layer_names = _KERAS_LAYERS_TO_CHECK
    full_model = inception_resnet_v2.inception_resnet_v2(
        batchnorm_training=batchnorm_training,
        output_stride=output_stride,
        align_feature_maps=align_feature_maps,
        weights=None,
        batchnorm_scale=batchnorm_scale,
        weight_decay=weight_decay,
        default_batchnorm_momentum=default_batchnorm_momentum,
        default_batchnorm_epsilon=default_batchnorm_epsilon,
        include_top=False)
    layer_outputs = [full_model.get_layer(name=layer).output
                     for layer in layer_names]
    return tf.keras.Model(
        inputs=full_model.inputs,
        outputs=layer_outputs)

  def _check_returns_correct_shape(
      self, image_height, image_width,
      expected_feature_map_shape, layer_names=None, batchnorm_training=True,
      output_stride=16,
      align_feature_maps=False,
      batchnorm_scale=False,
      weight_decay=0.00004,
      default_batchnorm_momentum=0.9997,
      default_batchnorm_epsilon=0.001,):
    if not layer_names:
      layer_names = _KERAS_LAYERS_TO_CHECK
    model = self._create_application_with_layer_outputs(
        layer_names=layer_names,
        batchnorm_training=batchnorm_training,
        output_stride=output_stride,
        align_feature_maps=align_feature_maps,
        batchnorm_scale=batchnorm_scale,
        weight_decay=weight_decay,
        default_batchnorm_momentum=default_batchnorm_momentum,
        default_batchnorm_epsilon=default_batchnorm_epsilon)
    image_tensor = np.random.rand(_BATCH_SIZE, image_height, image_width,
                                  _NUM_CHANNELS).astype(np.float32)
    feature_maps = model(image_tensor)
    for feature_map, layer_name in zip(feature_maps, layer_names):
      endpoint_name = _KERAS_TO_SLIM_ENDPOINT_NAMES[layer_name]
      expected_shape = expected_feature_map_shape[endpoint_name]
      self.assertAllEqual(feature_map.shape, expected_shape)

  def _get_variables(self, layer_names=None):
    tf.keras.backend.clear_session()
    model = self._create_application_with_layer_outputs(
        layer_names=layer_names,
        batchnorm_training=False)
    preprocessed_inputs = tf.placeholder(
        tf.float32, (4, None, None, _NUM_CHANNELS))
    model(preprocessed_inputs)
    return model.variables

  def test_returns_correct_shapes_128(self):
    image_height = 128
    image_width = 128
    expected_feature_map_shape = _SLIM_ENDPOINT_SHAPES_128
    self._check_returns_correct_shape(
        image_height, image_width, expected_feature_map_shape,
        align_feature_maps=True)

  def test_returns_correct_shapes_128_output_stride_8(self):
    image_height = 128
    image_width = 128
    expected_feature_map_shape = _SLIM_ENDPOINT_SHAPES_128_STRIDE_8
    # Output stride of 8 is not defined beyond 'block17_20_ac', which is
    # PreAuxLogits in slim, so those layers are excluded from the Keras vs
    # slim comparison.
    excluded_layers = {'mixed_7a', 'conv_7b_ac'}
    layer_names = [l for l in _KERAS_LAYERS_TO_CHECK
                   if l not in excluded_layers]
    self._check_returns_correct_shape(
        image_height, image_width, expected_feature_map_shape,
        layer_names=layer_names, output_stride=8, align_feature_maps=True)

  def test_returns_correct_shapes_128_align_feature_maps_false(self):
    image_height = 128
    image_width = 128
    expected_feature_map_shape = (
        _SLIM_ENDPOINT_SHAPES_128_ALIGN_FEATURE_MAPS_FALSE)
    self._check_returns_correct_shape(
        image_height, image_width, expected_feature_map_shape,
        align_feature_maps=False)

  def test_hyperparam_override(self):
    model = inception_resnet_v2.inception_resnet_v2(
        batchnorm_training=True,
        default_batchnorm_momentum=0.2,
        default_batchnorm_epsilon=0.1,
        weights=None,
        include_top=False)
    bn_layer = model.get_layer(name='freezable_batch_norm')
    self.assertAllClose(bn_layer.momentum, 0.2)
    self.assertAllClose(bn_layer.epsilon, 0.1)

  def test_variable_count(self):
    variables = self._get_variables()
    # 896 is the number of variables in the slim inception resnet v2 model.
    self.assertEqual(len(variables), 896)


if __name__ == '__main__':
  tf.test.main()
@@ -22,6 +22,7 @@ from __future__ import print_function
 import tensorflow as tf
 from object_detection.core import freezable_batch_norm
+from object_detection.models.keras_models import model_utils
 def _fixed_padding(inputs, kernel_size, rate=1):  # pylint: disable=invalid-name
@@ -59,7 +60,8 @@ class _LayersOverride(object):
                conv_hyperparams=None,
                use_explicit_padding=False,
                alpha=1.0,
-               min_depth=None):
+               min_depth=None,
+               conv_defs=None):
     """Alternative tf.keras.layers interface, for use by the Keras MobileNetV1.
     It is used by the Keras applications kwargs injection API to
@@ -90,6 +92,8 @@ class _LayersOverride(object):
         modifies the number of filters in each convolutional layer. It's called
         depth multiplier in Keras application MobilenetV1.
       min_depth: Minimum number of filters in the convolutional layers.
+      conv_defs: Network layout to specify the mobilenet_v1 body. Default is
+        `None` to use the default mobilenet_v1 network layout.
     """
     self._alpha = alpha
     self._batchnorm_training = batchnorm_training
@@ -97,6 +101,7 @@ class _LayersOverride(object):
     self._conv_hyperparams = conv_hyperparams
     self._use_explicit_padding = use_explicit_padding
     self._min_depth = min_depth
+    self._conv_defs = conv_defs
     self.regularizer = tf.keras.regularizers.l2(0.00004 * 0.5)
     self.initializer = tf.truncated_normal_initializer(stddev=0.09)
@@ -122,6 +127,11 @@ class _LayersOverride(object):
       the input argument, or that will first pad the input then apply a Conv2D
       layer.
     """
+    layer_name = kwargs['name']
+    if self._conv_defs:
+      conv_filters = model_utils.get_conv_def(self._conv_defs, layer_name)
+      if conv_filters:
+        filters = conv_filters
     # Apply the width multiplier and the minimum depth to the convolution layers
     filters = int(filters * self._alpha)
     if self._min_depth and filters < self._min_depth:
@@ -163,7 +173,12 @@ class _LayersOverride(object):
     """
     if self._conv_hyperparams:
       kwargs = self._conv_hyperparams.params(**kwargs)
+      # Both the regularizer and the initializer also apply to the depthwise
+      # layer in MobilenetV1, so we remap kernel_* to depthwise_* here.
+      kwargs['depthwise_regularizer'] = kwargs['kernel_regularizer']
+      kwargs['depthwise_initializer'] = kwargs['kernel_initializer']
     else:
+      kwargs['depthwise_regularizer'] = self.regularizer
       kwargs['depthwise_initializer'] = self.initializer
     kwargs['padding'] = 'same'
@@ -278,6 +293,7 @@ def mobilenet_v1(batchnorm_training,
                  use_explicit_padding=False,
                  alpha=1.0,
                  min_depth=None,
+                 conv_defs=None,
                  **kwargs):
   """Instantiates the MobileNetV1 architecture, modified for object detection.
@@ -309,6 +325,8 @@ def mobilenet_v1(batchnorm_training,
     alpha: The width multiplier referenced in the MobileNetV1 paper. It
       modifies the number of filters in each convolutional layer.
     min_depth: Minimum number of filters in the convolutional layers.
+    conv_defs: Network layout to specify the mobilenet_v1 body. Default is
+      `None` to use the default mobilenet_v1 network layout.
     **kwargs: Keyword arguments forwarded directly to the
       `tf.keras.applications.Mobilenet` method that constructs the Keras
       model.
@@ -322,7 +340,8 @@ def mobilenet_v1(batchnorm_training,
       conv_hyperparams=conv_hyperparams,
       use_explicit_padding=use_explicit_padding,
       min_depth=min_depth,
-      alpha=alpha)
+      alpha=alpha,
+      conv_defs=conv_defs)
   return tf.keras.applications.MobileNet(
       alpha=alpha, layers=layers_override, **kwargs)
 # pylint: enable=invalid-name
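The diff above relies on model_utils.ConvDefs and model_utils.get_conv_def. A sketch of those helpers, with the behavior inferred from how they are used here (the real implementation lives in object_detection/models/keras_models/model_utils.py):

```python
import collections

# A ConvDefs entry overrides the filter count for one named conv layer.
ConvDefs = collections.namedtuple('ConvDefs', ['conv_name', 'filters'])


def get_conv_def(conv_defs, layer_name):
    """Return the filter override for layer_name, or None if absent."""
    for conv_def in conv_defs:
        if conv_def.conv_name == layer_name:
            return conv_def.filters
    return None


overrides = [ConvDefs(conv_name='conv_pw_12', filters=512),
             ConvDefs(conv_name='conv_pw_13', filters=256)]
print(get_conv_def(overrides, 'conv_pw_13'))  # 256
print(get_conv_def(overrides, 'conv_pw_1'))   # None
```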
@@ -33,6 +33,7 @@ from google.protobuf import text_format
 from object_detection.builders import hyperparams_builder
 from object_detection.models.keras_models import mobilenet_v1
+from object_detection.models.keras_models import model_utils
 from object_detection.models.keras_models import test_utils
 from object_detection.protos import hyperparams_pb2
 from object_detection.utils import test_case
@@ -88,7 +89,8 @@ class MobilenetV1Test(test_case.TestCase):
       conv_hyperparams=None,
       use_explicit_padding=False,
       alpha=1.0,
-      min_depth=None):
+      min_depth=None,
+      conv_defs=None):
     """Constructs Keras MobilenetV1 that extracts intermediate layer outputs."""
     if not layer_names:
       layer_names = _KERAS_LAYERS_TO_CHECK
@@ -99,6 +101,7 @@ class MobilenetV1Test(test_case.TestCase):
         use_explicit_padding=use_explicit_padding,
         alpha=alpha,
         min_depth=min_depth,
+        conv_defs=conv_defs,
         include_top=False)
     layer_outputs = [full_model.get_layer(name=layer).output
                      for layer in layer_names]
@@ -109,14 +112,15 @@ class MobilenetV1Test(test_case.TestCase):
   def _check_returns_correct_shape(
       self, image_height, image_width, depth_multiplier,
       expected_feature_map_shape, use_explicit_padding=False, min_depth=8,
-      layer_names=None):
+      layer_names=None, conv_defs=None):
     def graph_fn(image_tensor):
       model = self._create_application_with_layer_outputs(
           layer_names=layer_names,
           batchnorm_training=False,
           use_explicit_padding=use_explicit_padding,
           min_depth=min_depth,
-          alpha=depth_multiplier)
+          alpha=depth_multiplier,
+          conv_defs=conv_defs)
       return model(image_tensor)
     image_tensor = np.random.rand(_BATCH_SIZE, image_height, image_width,
@@ -211,6 +215,23 @@ class MobilenetV1Test(test_case.TestCase):
     self._check_returns_correct_shape(
         image_height, image_width, depth_multiplier, expected_feature_map_shape)
+  def test_returns_correct_shapes_with_conv_defs(self):
+    image_height = 299
+    image_width = 299
+    depth_multiplier = 1.0
+    conv_def_block_12 = model_utils.ConvDefs(
+        conv_name='conv_pw_12', filters=512)
+    conv_def_block_13 = model_utils.ConvDefs(
+        conv_name='conv_pw_13', filters=256)
+    conv_defs = [conv_def_block_12, conv_def_block_13]
+    expected_feature_map_shape = (
+        test_utils.moblenet_v1_expected_feature_map_shape_with_conv_defs)
+    self._check_returns_correct_shape(
+        image_height, image_width, depth_multiplier, expected_feature_map_shape,
+        conv_defs=conv_defs)
   def test_hyperparam_override(self):
     hyperparams = self._build_conv_hyperparams()
     model = mobilenet_v1.mobilenet_v1(
...
@@ -21,6 +21,7 @@ from __future__ import print_function
 import tensorflow as tf
 from object_detection.core import freezable_batch_norm
+from object_detection.models.keras_models import model_utils
 from object_detection.utils import ops
@@ -45,7 +46,8 @@ class _LayersOverride(object):
                conv_hyperparams=None,
                use_explicit_padding=False,
                alpha=1.0,
-               min_depth=None):
+               min_depth=None,
+               conv_defs=None):
     """Alternative tf.keras.layers interface, for use by the Keras MobileNetV2.
     It is used by the Keras applications kwargs injection API to
@@ -75,6 +77,8 @@ class _LayersOverride(object):
       alpha: The width multiplier referenced in the MobileNetV2 paper. It
         modifies the number of filters in each convolutional layer.
       min_depth: Minimum number of filters in the convolutional layers.
+      conv_defs: Network layout to specify the mobilenet_v2 body. Default is
+        `None` to use the default mobilenet_v2 network layout.
     """
     self._alpha = alpha
     self._batchnorm_training = batchnorm_training
@@ -82,6 +86,7 @@ class _LayersOverride(object):
     self._conv_hyperparams = conv_hyperparams
     self._use_explicit_padding = use_explicit_padding
     self._min_depth = min_depth
+    self._conv_defs = conv_defs
     self.regularizer = tf.keras.regularizers.l2(0.00004 * 0.5)
     self.initializer = tf.truncated_normal_initializer(stddev=0.09)
@@ -106,8 +111,14 @@ class _LayersOverride(object):
     """
     # Make sure 'alpha' is always applied to the last convolution block's size
     # (This overrides the Keras application's functionality)
-    if kwargs.get('name') == 'Conv_1' and self._alpha < 1.0:
-      filters = _make_divisible(1280 * self._alpha, 8)
+    layer_name = kwargs.get('name')
+    if layer_name == 'Conv_1':
+      if self._conv_defs:
+        filters = model_utils.get_conv_def(self._conv_defs, 'Conv_1')
+      else:
+        filters = 1280
+      if self._alpha < 1.0:
+        filters = _make_divisible(filters * self._alpha, 8)
     # Apply the minimum depth to the convolution layers
     if (self._min_depth and (filters < self._min_depth)
@@ -263,6 +274,7 @@ def mobilenet_v2(batchnorm_training,
                  use_explicit_padding=False,
                  alpha=1.0,
                  min_depth=None,
+                 conv_defs=None,
                  **kwargs):
   """Instantiates the MobileNetV2 architecture, modified for object detection.
@@ -294,6 +306,8 @@ def mobilenet_v2(batchnorm_training,
     alpha: The width multiplier referenced in the MobileNetV2 paper. It
       modifies the number of filters in each convolutional layer.
     min_depth: Minimum number of filters in the convolutional layers.
+    conv_defs: Network layout to specify the mobilenet_v2 body. Default is
+      `None` to use the default mobilenet_v2 network layout.
     **kwargs: Keyword arguments forwarded directly to the
       `tf.keras.applications.MobilenetV2` method that constructs the Keras
       model.
@@ -307,7 +321,8 @@ def mobilenet_v2(batchnorm_training,
       conv_hyperparams=conv_hyperparams,
       use_explicit_padding=use_explicit_padding,
       min_depth=min_depth,
-      alpha=alpha)
+      alpha=alpha,
+      conv_defs=conv_defs)
   return tf.keras.applications.MobileNetV2(alpha=alpha,
                                            layers=layers_override,
                                            **kwargs)
...
...@@ -22,6 +22,7 @@ from google.protobuf import text_format
from object_detection.builders import hyperparams_builder
from object_detection.models.keras_models import mobilenet_v2
from object_detection.models.keras_models import model_utils
from object_detection.models.keras_models import test_utils
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
...@@ -77,7 +78,8 @@ class MobilenetV2Test(test_case.TestCase):
      conv_hyperparams=None,
      use_explicit_padding=False,
      alpha=1.0,
      min_depth=None,
      conv_defs=None):
    """Constructs Keras mobilenetv2 that extracts intermediate layer outputs."""
    if not layer_names:
      layer_names = _layers_to_check
...@@ -88,7 +90,8 @@ class MobilenetV2Test(test_case.TestCase):
        use_explicit_padding=use_explicit_padding,
        alpha=alpha,
        min_depth=min_depth,
        include_top=False,
        conv_defs=conv_defs)
    layer_outputs = [full_model.get_layer(name=layer).output
                     for layer in layer_names]
    return tf.keras.Model(
...@@ -98,13 +101,15 @@ class MobilenetV2Test(test_case.TestCase):
  def _check_returns_correct_shape(
      self, batch_size, image_height, image_width, depth_multiplier,
      expected_feature_map_shapes, use_explicit_padding=False, min_depth=None,
      layer_names=None, conv_defs=None):
    def graph_fn(image_tensor):
      model = self._create_application_with_layer_outputs(
          layer_names=layer_names,
          batchnorm_training=False,
          use_explicit_padding=use_explicit_padding,
          min_depth=min_depth,
          alpha=depth_multiplier,
          conv_defs=conv_defs)
      return model(image_tensor)
    image_tensor = np.random.rand(batch_size, image_height, image_width,
...@@ -202,6 +207,21 @@ class MobilenetV2Test(test_case.TestCase):
        2, image_height, image_width, depth_multiplier,
        expected_feature_map_shape, min_depth=32)

  def test_returns_correct_shapes_with_conv_defs(self):
    image_height = 299
    image_width = 299
    depth_multiplier = 1.0
    conv_1 = model_utils.ConvDefs(conv_name='Conv_1', filters=256)
    conv_defs = [conv_1]
    expected_feature_map_shape = (
        test_utils.moblenet_v2_expected_feature_map_shape_with_conv_defs)
    self._check_returns_correct_shape(
        2, image_height, image_width, depth_multiplier,
        expected_feature_map_shape, conv_defs=conv_defs)

  def test_hyperparam_override(self):
    hyperparams = self._build_conv_hyperparams()
    model = mobilenet_v2.mobilenet_v2(
...
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Utils for Keras models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
# ConvDefs specifies a custom config for part of the model structure. For
# example, ConvDefs(conv_name='conv_pw_12', filters=512) for MobileNet V1
# sets the number of filters of the conv layer named 'conv_pw_12' to 512.
ConvDefs = collections.namedtuple('ConvDefs', ['conv_name', 'filters'])
def get_conv_def(conv_defs, layer_name):
  """Gets the custom config for a layer of the model structure.

  Args:
    conv_defs: A list of `ConvDefs` named tuples specifying the custom config
      of the model network. See `ConvDefs` for details.
    layer_name: A string, the name of the layer to be customized.

  Returns:
    The number of filters for the layer, or `None` if there is no custom
    config for the requested layer.
  """
  for conv_def in conv_defs:
    if layer_name == conv_def.conv_name:
      return conv_def.filters
  return None
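A minimal usage sketch of `ConvDefs` and `get_conv_def`; the definitions are repeated here so the snippet runs standalone, and the layer name `'Conv_1'` (the final 1x1 conv of MobileNet V2) is taken from the test above:

```python
import collections

# Mirror of the module's definitions so the sketch is self-contained.
ConvDefs = collections.namedtuple('ConvDefs', ['conv_name', 'filters'])

def get_conv_def(conv_defs, layer_name):
  """Returns the custom filter count for layer_name, or None."""
  for conv_def in conv_defs:
    if layer_name == conv_def.conv_name:
      return conv_def.filters
  return None

# Override the final 1x1 conv of MobileNet V2 to 256 filters.
conv_defs = [ConvDefs(conv_name='Conv_1', filters=256)]
print(get_conv_def(conv_defs, 'Conv_1'))  # 256
print(get_conv_def(conv_defs, 'Conv1'))   # None (no custom config)
```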
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""MobileNet v2 models for Keras.
MobileNetV2 is a general architecture and can be used for multiple use cases.
Depending on the use case, it can use different input layer size and
different width factors. This allows different width models to reduce
the number of multiply-adds and thereby
reduce inference cost on mobile devices.
MobileNetV2 is very similar to the original MobileNet,
except that it uses inverted residual blocks with
bottlenecking features. It has a drastically lower
parameter count than the original MobileNet.
MobileNets support any input size greater
than 32 x 32, with larger image sizes
offering better performance.
The number of parameters and number of multiply-adds
can be modified by using the `alpha` parameter,
which increases/decreases the number of filters in each layer.
By altering the image size and `alpha` parameter,
all 22 models from the paper can be built, with ImageNet weights provided.
The paper demonstrates the performance of MobileNets using `alpha` values of
0.35, 0.5, 0.75, 1.0 (also called 100% MobileNet), 1.3, and 1.4.
For each of these `alpha` values, weights for 5 different input image sizes
are provided (224, 192, 160, 128, and 96).
The following table describes the performance of
MobileNet on various input sizes:
------------------------------------------------------------------------
MACs stands for Multiply Adds

| Classification Checkpoint | MACs (M) | Parameters (M) | Top 1 Accuracy | Top 5 Accuracy |
|---------------------------|----------|----------------|----------------|----------------|
| [mobilenet_v2_1.4_224] | 582 | 6.06 | 75.0 | 92.5 |
| [mobilenet_v2_1.3_224] | 509 | 5.34 | 74.4 | 92.1 |
| [mobilenet_v2_1.0_224] | 300 | 3.47 | 71.8 | 91.0 |
| [mobilenet_v2_1.0_192] | 221 | 3.47 | 70.7 | 90.1 |
| [mobilenet_v2_1.0_160] | 154 | 3.47 | 68.8 | 89.0 |
| [mobilenet_v2_1.0_128] | 99 | 3.47 | 65.3 | 86.9 |
| [mobilenet_v2_1.0_96] | 56 | 3.47 | 60.3 | 83.2 |
| [mobilenet_v2_0.75_224] | 209 | 2.61 | 69.8 | 89.6 |
| [mobilenet_v2_0.75_192] | 153 | 2.61 | 68.7 | 88.9 |
| [mobilenet_v2_0.75_160] | 107 | 2.61 | 66.4 | 87.3 |
| [mobilenet_v2_0.75_128] | 69 | 2.61 | 63.2 | 85.3 |
| [mobilenet_v2_0.75_96] | 39 | 2.61 | 58.8 | 81.6 |
| [mobilenet_v2_0.5_224] | 97 | 1.95 | 65.4 | 86.4 |
| [mobilenet_v2_0.5_192] | 71 | 1.95 | 63.9 | 85.4 |
| [mobilenet_v2_0.5_160] | 50 | 1.95 | 61.0 | 83.2 |
| [mobilenet_v2_0.5_128] | 32 | 1.95 | 57.7 | 80.8 |
| [mobilenet_v2_0.5_96] | 18 | 1.95 | 51.2 | 75.8 |
| [mobilenet_v2_0.35_224] | 59 | 1.66 | 60.3 | 82.9 |
| [mobilenet_v2_0.35_192] | 43 | 1.66 | 58.2 | 81.2 |
| [mobilenet_v2_0.35_160] | 30 | 1.66 | 55.7 | 79.1 |
| [mobilenet_v2_0.35_128] | 20 | 1.66 | 50.8 | 75.0 |
| [mobilenet_v2_0.35_96] | 11 | 1.66 | 45.5 | 70.4 |
The weights for all 16 models are obtained and translated from the
TensorFlow checkpoints found at
https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/README.md
# Reference
This file contains building code for MobileNetV2, based on
[MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381)
Tests comparing this model to the existing Tensorflow model can be
found at [mobilenet_v2_keras](https://github.com/JonathanCMitchell/mobilenet_v2_keras)
"""
from __future__ import print_function
from __future__ import absolute_import
from __future__ import division
import os
import warnings
import h5py
import numpy as np
from ..models import Model
from ..layers import Input
from ..layers import Activation
from ..layers import Dropout
from ..layers import Reshape
from ..layers import BatchNormalization
from ..layers import Conv2D
from ..layers import DepthwiseConv2D
from ..layers import GlobalAveragePooling2D
from ..layers import Add
from ..layers import Flatten
from ..layers import Dense
from .. import initializers
from .. import regularizers
from .. import constraints
from ..utils import conv_utils
from ..utils.data_utils import get_file
from ..engine import get_source_inputs
from ..engine.base_layer import InputSpec
from . import imagenet_utils
from .imagenet_utils import _obtain_input_shape
from .imagenet_utils import decode_predictions
from .. import backend as K
# TODO Change path to v1.1
BASE_WEIGHT_PATH = 'https://github.com/JonathanCMitchell/mobilenet_v2_keras/releases/download/v1.1/'
def relu6(x):
return K.relu(x, max_value=6)
def preprocess_input(x):
"""Preprocesses a numpy array encoding a batch of images.
This function applies the "Inception" preprocessing which converts
the RGB values from [0, 255] to [-1, 1]. Note that this preprocessing
function is different from `imagenet_utils.preprocess_input()`.
# Arguments
x: a 4D numpy array consists of RGB values within [0, 255].
# Returns
Preprocessed array.
"""
x /= 128.
x -= 1.
return x.astype(np.float32)
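The mapping can be checked directly; this is a standalone copy of the function above applied to one RGB pixel:

```python
import numpy as np

def preprocess_input(x):
  # "Inception" preprocessing: RGB [0, 255] -> [-1, 1).
  x /= 128.
  x -= 1.
  return x.astype(np.float32)

batch = np.array([[[[0., 128., 255.]]]])  # a 1x1 image with one RGB pixel
out = preprocess_input(batch)
# 0 -> -1.0, 128 -> 0.0, 255 -> ~0.992
```

Note that the division and subtraction mutate the input array in place when it is already a float array, so callers should pass a copy if the original pixel values are still needed.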
# This function is taken from the original tf repo.
# It ensures that all layers have a channel number that is divisible by 8
# It can be seen here:
# https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
def _make_divisible(v, divisor, min_value=None):
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
# Make sure that round down does not go down by more than 10%.
if new_v < 0.9 * v:
new_v += divisor
return new_v
def MobileNetV2(input_shape=None,
alpha=1.0,
depth_multiplier=1,
dropout=1e-3,
include_top=True,
weights='imagenet',
input_tensor=None,
classes=1000):
"""Instantiates the MobileNetV2 architecture.
To load a MobileNetV2 model via `load_model`, import the custom
object `relu6` and pass it to the `custom_objects` parameter.
E.g.
model = load_model('mobilenet.h5', custom_objects={
'relu6': mobilenet.relu6})
# Arguments
input_shape: optional shape tuple, to be specified if you would
like to use a model with an input img resolution that is not
(224, 224, 3).
It should have exactly 3 input channels, as in (224, 224, 3).
You can also omit this option if you would like
to infer input_shape from an input_tensor.
If you include both input_tensor and input_shape, input_shape
will be used if they match; if the shapes do not match, an
error will be raised.
E.g. `(160, 160, 3)` would be one valid value.
alpha: controls the width of the network. This is known as the
width multiplier in the MobileNetV2 paper.
- If `alpha` < 1.0, proportionally decreases the number
of filters in each layer.
- If `alpha` > 1.0, proportionally increases the number
of filters in each layer.
- If `alpha` = 1, default number of filters from the paper
are used at each layer.
depth_multiplier: depth multiplier for depthwise convolution
(also called the resolution multiplier)
dropout: dropout rate, dropout is currently not in use
include_top: whether to include the fully-connected
layer at the top of the network.
weights: one of `None` (random initialization),
'imagenet' (pre-training on ImageNet),
or the path to the weights file to be loaded.
input_tensor: optional Keras tensor (i.e. output of
`layers.Input()`)
to use as image input for the model.
classes: optional number of classes to classify images
into, only to be specified if `include_top` is True, and
if no `weights` argument is specified.
# Returns
A Keras model instance.
# Raises
ValueError: in case of invalid argument for `weights`,
or invalid input shape or invalid depth_multiplier, alpha,
rows when weights='imagenet'
"""
if not (weights in {'imagenet', None} or os.path.exists(weights)):
raise ValueError('The `weights` argument should be either '
'`None` (random initialization), `imagenet` '
'(pre-training on ImageNet), '
'or the path to the weights file to be loaded.')
if weights == 'imagenet' and include_top and classes != 1000:
raise ValueError('If using `weights` as ImageNet with `include_top` '
'as true, `classes` should be 1000')
# Determine proper input shape and default size.
# If both input_shape and input_tensor are used, they should match
if input_shape is not None and input_tensor is not None:
try:
is_input_t_tensor = K.is_keras_tensor(input_tensor)
except ValueError:
try:
is_input_t_tensor = K.is_keras_tensor(
get_source_inputs(input_tensor))
except ValueError:
raise ValueError('input_tensor: ', input_tensor,
'is not type input_tensor')
if is_input_t_tensor:
if K.image_data_format() == 'channels_first':
if input_tensor._keras_shape[1] != input_shape[1]:
raise ValueError('input_shape: ', input_shape,
'and input_tensor: ', input_tensor,
'do not meet the same shape requirements')
else:
if input_tensor._keras_shape[2] != input_shape[1]:
raise ValueError('input_shape: ', input_shape,
'and input_tensor: ', input_tensor,
'do not meet the same shape requirements')
else:
raise ValueError('input_tensor specified: ', input_tensor,
'is not a keras tensor')
# If input_shape is None, infer shape from input_tensor
if input_shape is None and input_tensor is not None:
try:
K.is_keras_tensor(input_tensor)
except ValueError:
raise ValueError('input_tensor: ', input_tensor,
'is type: ', type(input_tensor),
'which is not a valid type')
if input_shape is None and not K.is_keras_tensor(input_tensor):
default_size = 224
elif input_shape is None and K.is_keras_tensor(input_tensor):
if K.image_data_format() == 'channels_first':
rows = input_tensor._keras_shape[2]
cols = input_tensor._keras_shape[3]
else:
rows = input_tensor._keras_shape[1]
cols = input_tensor._keras_shape[2]
if rows == cols and rows in [96, 128, 160, 192, 224]:
default_size = rows
else:
default_size = 224
# If input_shape is None and no input_tensor
elif input_shape is None:
default_size = 224
# If input_shape is not None, assume default size
else:
if K.image_data_format() == 'channels_first':
rows = input_shape[1]
cols = input_shape[2]
else:
rows = input_shape[0]
cols = input_shape[1]
if rows == cols and rows in [96, 128, 160, 192, 224]:
default_size = rows
else:
default_size = 224
input_shape = _obtain_input_shape(input_shape,
default_size=default_size,
min_size=32,
data_format=K.image_data_format(),
require_flatten=include_top,
weights=weights)
if K.image_data_format() == 'channels_last':
row_axis, col_axis = (0, 1)
else:
row_axis, col_axis = (1, 2)
rows = input_shape[row_axis]
cols = input_shape[col_axis]
if weights == 'imagenet':
if depth_multiplier != 1:
raise ValueError('If imagenet weights are being loaded, '
'depth multiplier must be 1')
if alpha not in [0.35, 0.50, 0.75, 1.0, 1.3, 1.4]:
raise ValueError('If imagenet weights are being loaded, '
'alpha can be one of `0.35`, `0.50`, `0.75`, '
'`1.0`, `1.3` or `1.4` only.')
if rows != cols or rows not in [96, 128, 160, 192, 224]:
if rows is None:
rows = 224
warnings.warn('MobileNet shape is undefined.'
' Weights for input shape'
'(224, 224) will be loaded.')
else:
raise ValueError('If imagenet weights are being loaded, '
'input must have a static square shape'
'(one of (96, 96), (128, 128), (160, 160),'
'(192, 192), or (224, 224)).'
'Input shape provided = %s' % (input_shape,))
if K.image_data_format() != 'channels_last':
warnings.warn('The MobileNet family of models is only available '
'for the input data format "channels_last" '
'(width, height, channels). '
'However your settings specify the default '
'data format "channels_first" (channels, width, height).'
' You should set `image_data_format="channels_last"` '
'in your Keras config located at ~/.keras/keras.json. '
'The model being returned right now will expect inputs '
'to follow the "channels_last" data format.')
K.set_image_data_format('channels_last')
old_data_format = 'channels_first'
else:
old_data_format = None
if input_tensor is None:
img_input = Input(shape=input_shape)
else:
if not K.is_keras_tensor(input_tensor):
img_input = Input(tensor=input_tensor, shape=input_shape)
else:
img_input = input_tensor
first_block_filters = _make_divisible(32 * alpha, 8)
x = Conv2D(first_block_filters,
kernel_size=3,
strides=(2, 2), padding='same',
use_bias=False, name='Conv1')(img_input)
x = BatchNormalization(epsilon=1e-3, momentum=0.999, name='bn_Conv1')(x)
x = Activation(relu6, name='Conv1_relu')(x)
x = _first_inverted_res_block(x,
filters=16,
alpha=alpha,
stride=1,
expansion=1,
block_id=0)
x = _inverted_res_block(x, filters=24, alpha=alpha, stride=2,
expansion=6, block_id=1)
x = _inverted_res_block(x, filters=24, alpha=alpha, stride=1,
expansion=6, block_id=2)
x = _inverted_res_block(x, filters=32, alpha=alpha, stride=2,
expansion=6, block_id=3)
x = _inverted_res_block(x, filters=32, alpha=alpha, stride=1,
expansion=6, block_id=4)
x = _inverted_res_block(x, filters=32, alpha=alpha, stride=1,
expansion=6, block_id=5)
x = _inverted_res_block(x, filters=64, alpha=alpha, stride=2,
expansion=6, block_id=6)
x = _inverted_res_block(x, filters=64, alpha=alpha, stride=1,
expansion=6, block_id=7)
x = _inverted_res_block(x, filters=64, alpha=alpha, stride=1,
expansion=6, block_id=8)
x = _inverted_res_block(x, filters=64, alpha=alpha, stride=1,
expansion=6, block_id=9)
x = _inverted_res_block(x, filters=96, alpha=alpha, stride=1,
expansion=6, block_id=10)
x = _inverted_res_block(x, filters=96, alpha=alpha, stride=1,
expansion=6, block_id=11)
x = _inverted_res_block(x, filters=96, alpha=alpha, stride=1,
expansion=6, block_id=12)
x = _inverted_res_block(x, filters=160, alpha=alpha, stride=2,
expansion=6, block_id=13)
x = _inverted_res_block(x, filters=160, alpha=alpha, stride=1,
expansion=6, block_id=14)
x = _inverted_res_block(x, filters=160, alpha=alpha, stride=1,
expansion=6, block_id=15)
x = _inverted_res_block(x, filters=320, alpha=alpha, stride=1,
expansion=6, block_id=16)
# no alpha applied to last conv as stated in the paper:
# if the width multiplier is greater than 1 we
# increase the number of output channels
if alpha > 1.0:
last_block_filters = _make_divisible(1280 * alpha, 8)
else:
last_block_filters = 1280
x = Conv2D(last_block_filters,
kernel_size=1,
use_bias=False,
name='Conv_1')(x)
x = BatchNormalization(epsilon=1e-3, momentum=0.999, name='Conv_1_bn')(x)
x = Activation(relu6, name='out_relu')(x)
if include_top:
x = GlobalAveragePooling2D()(x)
x = Dense(classes, activation='softmax',
use_bias=True, name='Logits')(x)
# Ensure that the model takes into account
# any potential predecessors of `input_tensor`.
if input_tensor is not None:
inputs = get_source_inputs(input_tensor)
else:
inputs = img_input
# Create model.
model = Model(inputs, x, name='mobilenetv2_%0.2f_%s' % (alpha, rows))
# load weights
if weights == 'imagenet':
if K.image_data_format() == 'channels_first':
raise ValueError('Weights for "channels_first" format '
'are not available.')
if include_top:
model_name = 'mobilenet_v2_weights_tf_dim_ordering_tf_kernels_' + \
str(alpha) + '_' + str(rows) + '.h5'
weight_path = BASE_WEIGHT_PATH + model_name
weights_path = get_file(model_name, weight_path,
cache_subdir='models')
else:
model_name = 'mobilenet_v2_weights_tf_dim_ordering_tf_kernels_' + \
str(alpha) + '_' + str(rows) + '_no_top' + '.h5'
weight_path = BASE_WEIGHT_PATH + model_name
weights_path = get_file(model_name, weight_path,
cache_subdir='models')
model.load_weights(weights_path)
elif weights is not None:
model.load_weights(weights)
if old_data_format:
K.set_image_data_format(old_data_format)
return model
def _inverted_res_block(inputs, expansion, stride, alpha, filters, block_id):
in_channels = inputs._keras_shape[-1]
prefix = 'features.' + str(block_id) + '.conv.'
pointwise_conv_filters = int(filters * alpha)
pointwise_filters = _make_divisible(pointwise_conv_filters, 8)
# Expand
x = Conv2D(expansion * in_channels, kernel_size=1, padding='same',
use_bias=False, activation=None,
name='mobl%d_conv_expand' % block_id)(inputs)
x = BatchNormalization(epsilon=1e-3, momentum=0.999,
name='bn%d_conv_bn_expand' %
block_id)(x)
x = Activation(relu6, name='conv_%d_relu' % block_id)(x)
# Depthwise
x = DepthwiseConv2D(kernel_size=3, strides=stride, activation=None,
use_bias=False, padding='same',
name='mobl%d_conv_depthwise' % block_id)(x)
x = BatchNormalization(epsilon=1e-3, momentum=0.999,
name='bn%d_conv_depthwise' % block_id)(x)
x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x)
# Project
x = Conv2D(pointwise_filters,
kernel_size=1, padding='same', use_bias=False, activation=None,
name='mobl%d_conv_project' % block_id)(x)
x = BatchNormalization(epsilon=1e-3, momentum=0.999,
name='bn%d_conv_bn_project' % block_id)(x)
if in_channels == pointwise_filters and stride == 1:
return Add(name='res_connect_' + str(block_id))([inputs, x])
return x
def _first_inverted_res_block(inputs,
expansion, stride,
alpha, filters, block_id):
in_channels = inputs._keras_shape[-1]
prefix = 'features.' + str(block_id) + '.conv.'
pointwise_conv_filters = int(filters * alpha)
pointwise_filters = _make_divisible(pointwise_conv_filters, 8)
# Depthwise
x = DepthwiseConv2D(kernel_size=3,
strides=stride, activation=None,
use_bias=False, padding='same',
name='mobl%d_conv_depthwise' %
block_id)(inputs)
x = BatchNormalization(epsilon=1e-3, momentum=0.999,
name='bn%d_conv_depthwise' %
block_id)(x)
x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x)
# Project
x = Conv2D(pointwise_filters,
kernel_size=1,
padding='same',
use_bias=False,
activation=None,
name='mobl%d_conv_project' %
block_id)(x)
x = BatchNormalization(epsilon=1e-3, momentum=0.999,
name='bn%d_conv_project' %
block_id)(x)
if in_channels == pointwise_filters and stride == 1:
return Add(name='res_connect_' + str(block_id))([inputs, x])
return x
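As a sanity check on the block wiring above: in both block builders the residual `Add` is only inserted when the stride is 1 and the projected channel count equals the input channel count. A small standalone sketch (reusing a copy of `_make_divisible` from earlier in the file; `has_residual` is an illustrative helper, not part of the module):

```python
def _make_divisible(v, divisor, min_value=None):
  # Copy of the helper defined earlier in this file.
  if min_value is None:
    min_value = divisor
  new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
  if new_v < 0.9 * v:
    new_v += divisor
  return new_v

def has_residual(in_channels, filters, alpha, stride):
  """True when an inverted residual block keeps its skip connection."""
  pointwise_filters = _make_divisible(int(filters * alpha), 8)
  return stride == 1 and in_channels == pointwise_filters

print(has_residual(24, 24, alpha=1.0, stride=1))  # True: e.g. block 2 (24 -> 24)
print(has_residual(24, 32, alpha=1.0, stride=2))  # False: block 3 downsamples
```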
...@@ -106,6 +106,16 @@ moblenet_v1_expected_feature_map_shape_enforcing_min_depth = [
    (2, 10, 10, 8), (2, 10, 10, 8), (2, 10, 10, 8),
]
moblenet_v1_expected_feature_map_shape_with_conv_defs = [
(2, 150, 150, 32), (2, 150, 150, 32), (2, 150, 150, 64), (2, 75, 75, 64),
(2, 75, 75, 128), (2, 75, 75, 128), (2, 75, 75, 128), (2, 38, 38, 128),
(2, 38, 38, 256), (2, 38, 38, 256), (2, 38, 38, 256), (2, 19, 19, 256),
(2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512),
(2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512),
(2, 19, 19, 512), (2, 19, 19, 512), (2, 19, 19, 512), (2, 10, 10, 512),
(2, 10, 10, 512), (2, 10, 10, 512), (2, 10, 10, 256),
]
# For Mobilenet V2
moblenet_v2_expected_feature_map_shape_128 = [
    (2, 64, 64, 32), (2, 64, 64, 96), (2, 32, 32, 96), (2, 32, 32, 24),
...@@ -187,3 +197,18 @@ moblenet_v2_expected_feature_map_shape_enforcing_min_depth = [
    (2, 10, 10, 32), (2, 10, 10, 32)
]
moblenet_v2_expected_feature_map_shape_with_conv_defs = [
(2, 150, 150, 32), (2, 150, 150, 96), (2, 75, 75, 96), (2, 75, 75, 24),
(2, 75, 75, 144), (2, 75, 75, 144), (2, 75, 75, 24), (2, 75, 75, 144),
(2, 38, 38, 144), (2, 38, 38, 32), (2, 38, 38, 192), (2, 38, 38, 192),
(2, 38, 38, 32), (2, 38, 38, 192), (2, 38, 38, 192), (2, 38, 38, 32),
(2, 38, 38, 192), (2, 19, 19, 192), (2, 19, 19, 64), (2, 19, 19, 384),
(2, 19, 19, 384), (2, 19, 19, 64), (2, 19, 19, 384), (2, 19, 19, 384),
(2, 19, 19, 64), (2, 19, 19, 384), (2, 19, 19, 384), (2, 19, 19, 64),
(2, 19, 19, 384), (2, 19, 19, 384), (2, 19, 19, 96), (2, 19, 19, 576),
(2, 19, 19, 576), (2, 19, 19, 96), (2, 19, 19, 576), (2, 19, 19, 576),
(2, 19, 19, 96), (2, 19, 19, 576), (2, 10, 10, 576), (2, 10, 10, 160),
(2, 10, 10, 960), (2, 10, 10, 960), (2, 10, 10, 160), (2, 10, 10, 960),
(2, 10, 10, 960), (2, 10, 10, 160), (2, 10, 10, 960), (2, 10, 10, 960),
(2, 10, 10, 320), (2, 10, 10, 256)
]
...@@ -13,21 +13,32 @@
# limitations under the License.
# ==============================================================================
"""Tests for ssd_mobilenet_v1_fpn_feature_extractor.

Using the parameterized test decorator, this test covers both the Slim-based
and the Keras-based MobileNet V1 FPN feature extractors in SSD.
"""
from absl.testing import parameterized
import numpy as np
import tensorflow as tf

from object_detection.models import ssd_feature_extractor_test
from object_detection.models import ssd_mobilenet_v1_fpn_feature_extractor
from object_detection.models import ssd_mobilenet_v1_fpn_keras_feature_extractor

slim = tf.contrib.slim
@parameterized.parameters(
    {'use_keras': False},
    {'use_keras': True},
)
class SsdMobilenetV1FpnFeatureExtractorTest(
    ssd_feature_extractor_test.SsdFeatureExtractorTestBase):

  def _create_feature_extractor(self, depth_multiplier, pad_to_multiple,
                                is_training=True, use_explicit_padding=False,
                                use_keras=False):
    """Constructs a new feature extractor.

    Args:
...@@ -38,10 +49,27 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
      use_explicit_padding: Use 'VALID' padding for convolutions, but prepad
        inputs so that the output dimensions are the same as if 'SAME' padding
        were used.
      use_keras: If True, builds a Keras-based feature extractor; if False,
        builds a Slim-based one.

    Returns:
      an ssd_meta_arch.SSDFeatureExtractor object.
    """
    min_depth = 32
    if use_keras:
      return (ssd_mobilenet_v1_fpn_keras_feature_extractor.
              SSDMobileNetV1FpnKerasFeatureExtractor(
                  is_training=is_training,
                  depth_multiplier=depth_multiplier,
                  min_depth=min_depth,
                  pad_to_multiple=pad_to_multiple,
                  conv_hyperparams=self._build_conv_hyperparams(
                      add_batch_norm=False),
                  freeze_batchnorm=False,
                  inplace_batchnorm_update=False,
                  use_explicit_padding=use_explicit_padding,
                  use_depthwise=True,
                  name='MobilenetV1_FPN'))
    else:
      return (ssd_mobilenet_v1_fpn_feature_extractor.
              SSDMobileNetV1FpnFeatureExtractor(
                  is_training,
...@@ -49,9 +77,10 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
                  min_depth,
                  pad_to_multiple,
                  self.conv_hyperparams_fn,
                  use_depthwise=True,
                  use_explicit_padding=use_explicit_padding))

  def test_extract_features_returns_correct_shapes_256(self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 1.0
...@@ -61,12 +90,14 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
        (2, 2, 2, 256)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_returns_correct_shapes_384(self, use_keras):
    image_height = 320
    image_width = 320
    depth_multiplier = 1.0
...@@ -76,12 +107,14 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
        (2, 3, 3, 256)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_with_dynamic_image_shape(self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 1.0
...@@ -91,12 +124,15 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
        (2, 2, 2, 256)]
    self.check_extract_features_returns_correct_shapes_with_dynamic_inputs(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shapes_with_dynamic_inputs(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_returns_correct_shapes_with_pad_to_multiple(
      self, use_keras):
    image_height = 299
    image_width = 299
    depth_multiplier = 1.0
...@@ -106,12 +142,15 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
        (2, 3, 3, 256)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_returns_correct_shapes_enforcing_min_depth(
self, use_keras):
image_height = 256 image_height = 256
image_width = 256 image_width = 256
depth_multiplier = 0.5**12 depth_multiplier = 0.5**12
...@@ -121,38 +160,50 @@ class SsdMobilenetV1FpnFeatureExtractorTest( ...@@ -121,38 +160,50 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
(2, 2, 2, 32)] (2, 2, 2, 32)]
self.check_extract_features_returns_correct_shape( self.check_extract_features_returns_correct_shape(
2, image_height, image_width, depth_multiplier, pad_to_multiple, 2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape, use_explicit_padding=False) expected_feature_map_shape, use_explicit_padding=False,
use_keras=use_keras)
self.check_extract_features_returns_correct_shape( self.check_extract_features_returns_correct_shape(
2, image_height, image_width, depth_multiplier, pad_to_multiple, 2, image_height, image_width, depth_multiplier, pad_to_multiple,
expected_feature_map_shape, use_explicit_padding=True) expected_feature_map_shape, use_explicit_padding=True,
use_keras=use_keras)
def test_extract_features_raises_error_with_invalid_image_size(self): def test_extract_features_raises_error_with_invalid_image_size(
self, use_keras):
image_height = 32 image_height = 32
image_width = 32 image_width = 32
depth_multiplier = 1.0 depth_multiplier = 1.0
pad_to_multiple = 1 pad_to_multiple = 1
self.check_extract_features_raises_error_with_invalid_image_size( self.check_extract_features_raises_error_with_invalid_image_size(
image_height, image_width, depth_multiplier, pad_to_multiple) image_height, image_width, depth_multiplier, pad_to_multiple,
use_keras=use_keras)
def test_preprocess_returns_correct_value_range(self): def test_preprocess_returns_correct_value_range(self, use_keras):
image_height = 256 image_height = 256
image_width = 256 image_width = 256
depth_multiplier = 1 depth_multiplier = 1
pad_to_multiple = 1 pad_to_multiple = 1
test_image = np.random.rand(2, image_height, image_width, 3) test_image = np.random.rand(2, image_height, image_width, 3)
feature_extractor = self._create_feature_extractor(depth_multiplier, feature_extractor = self._create_feature_extractor(depth_multiplier,
pad_to_multiple) pad_to_multiple,
use_keras=use_keras)
preprocessed_image = feature_extractor.preprocess(test_image) preprocessed_image = feature_extractor.preprocess(test_image)
self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0))) self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0)))
def test_variables_only_created_in_scope(self): def test_variables_only_created_in_scope(self, use_keras):
depth_multiplier = 1 depth_multiplier = 1
pad_to_multiple = 1 pad_to_multiple = 1
scope_name = 'MobilenetV1' scope_name = 'MobilenetV1'
self.check_feature_extractor_variables_under_scope( self.check_feature_extractor_variables_under_scope(
depth_multiplier, pad_to_multiple, scope_name) depth_multiplier, pad_to_multiple, scope_name, use_keras=use_keras)
def test_fused_batchnorm(self): def test_variable_count(self, use_keras):
depth_multiplier = 1
pad_to_multiple = 1
variables = self.get_feature_extractor_variables(
depth_multiplier, pad_to_multiple, use_keras=use_keras)
self.assertEqual(len(variables), 153)
def test_fused_batchnorm(self, use_keras):
image_height = 256 image_height = 256
image_width = 256 image_width = 256
depth_multiplier = 1 depth_multiplier = 1
...@@ -160,9 +211,14 @@ class SsdMobilenetV1FpnFeatureExtractorTest( ...@@ -160,9 +211,14 @@ class SsdMobilenetV1FpnFeatureExtractorTest(
image_placeholder = tf.placeholder(tf.float32, image_placeholder = tf.placeholder(tf.float32,
[1, image_height, image_width, 3]) [1, image_height, image_width, 3])
feature_extractor = self._create_feature_extractor(depth_multiplier, feature_extractor = self._create_feature_extractor(depth_multiplier,
pad_to_multiple) pad_to_multiple,
use_keras=use_keras)
preprocessed_image = feature_extractor.preprocess(image_placeholder) preprocessed_image = feature_extractor.preprocess(image_placeholder)
if use_keras:
_ = feature_extractor(preprocessed_image)
else:
_ = feature_extractor.extract_features(preprocessed_image) _ = feature_extractor.extract_features(preprocessed_image)
self.assertTrue( self.assertTrue(
any(op.type == 'FusedBatchNorm' any(op.type == 'FusedBatchNorm'
for op in tf.get_default_graph().get_operations())) for op in tf.get_default_graph().get_operations()))
......
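The test classes in this change are parameterized over `use_keras` with absl's `@parameterized.parameters`, so one test body exercises both the Slim and Keras code paths. As a minimal, stdlib-only sketch of that pattern (the decorator and class names here are hypothetical, not from the repo):

```python
import unittest


def parameterize(*param_dicts):
  """Minimal stand-in for absl's @parameterized.parameters decorator.

  Generates one test method per parameter dict so a single test body
  can cover both the Slim and Keras code paths.
  """
  def decorator(cls):
    for name in [n for n in list(cls.__dict__) if n.startswith('test')]:
      template = getattr(cls, name)
      delattr(cls, name)
      for params in param_dicts:
        suffix = '_'.join(
            '{}_{}'.format(k, v) for k, v in sorted(params.items()))
        # Factory function pins the template and kwargs for each variant.
        def make_case(fn, kwargs):
          return lambda self: fn(self, **kwargs)
        setattr(cls, '{}_{}'.format(name, suffix), make_case(template, params))
    return cls
  return decorator


@parameterize({'use_keras': False}, {'use_keras': True})
class DemoFeatureExtractorTest(unittest.TestCase):

  def test_flag_is_boolean(self, use_keras):
    # In the real tests this flag picks the Slim or Keras extractor.
    self.assertIn(use_keras, (False, True))
```

With absl installed, the real `@parameterized.parameters` behaves the same way: each dict becomes one generated test method, which is why every `test_*` in the diff gained a `use_keras` argument.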
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""SSD Keras-based MobilenetV1 FPN Feature Extractor."""

import tensorflow as tf

from object_detection.meta_architectures import ssd_meta_arch
from object_detection.models import feature_map_generators
from object_detection.models.keras_models import mobilenet_v1
from object_detection.models.keras_models import model_utils
from object_detection.utils import ops
from object_detection.utils import shape_utils


# A modified config of mobilenet v1 that makes it more detection friendly.
def _create_modified_mobilenet_config():
  conv_def_block_12 = model_utils.ConvDefs(conv_name='conv_pw_12', filters=512)
  conv_def_block_13 = model_utils.ConvDefs(conv_name='conv_pw_13', filters=256)
  return [conv_def_block_12, conv_def_block_13]


class SSDMobileNetV1FpnKerasFeatureExtractor(
    ssd_meta_arch.SSDKerasFeatureExtractor):
  """SSD Feature Extractor using Keras-based MobilenetV1 FPN features."""

  def __init__(self,
               is_training,
               depth_multiplier,
               min_depth,
               pad_to_multiple,
               conv_hyperparams,
               freeze_batchnorm,
               inplace_batchnorm_update,
               fpn_min_level=3,
               fpn_max_level=7,
               additional_layer_depth=256,
               use_explicit_padding=False,
               use_depthwise=False,
               override_base_feature_extractor_hyperparams=False,
               name=None):
    """SSD Keras based FPN feature extractor Mobilenet v1 architecture.

    Args:
      is_training: whether the network is in training mode.
      depth_multiplier: float depth multiplier for feature extractor.
      min_depth: minimum feature extractor depth.
      pad_to_multiple: the nearest multiple to zero pad the input height and
        width dimensions to.
      conv_hyperparams: a `hyperparams_builder.KerasLayerHyperparams` object
        containing convolution hyperparameters for the layers added on top of
        the base feature extractor.
      freeze_batchnorm: whether to freeze batch norm parameters during
        training or not. When training with a small batch size (e.g. 1), it is
        desirable to freeze batch norm update and use pretrained batch norm
        params.
      inplace_batchnorm_update: whether to update batch norm moving average
        values inplace. When this is false the train op must add a control
        dependency on the tf.GraphKeys.UPDATE_OPS collection in order to
        update batch norm statistics.
      fpn_min_level: the highest resolution feature map to use in FPN. The
        valid values are {2, 3, 4, 5}, which map to MobileNet v1 layers
        {Conv2d_3_pointwise, Conv2d_5_pointwise, Conv2d_11_pointwise,
        Conv2d_13_pointwise}, respectively.
      fpn_max_level: the smallest resolution feature map to construct or use
        in FPN. FPN construction uses feature maps starting from
        fpn_min_level up to fpn_max_level. If the backbone network does not
        provide enough feature maps, additional feature maps are created by
        applying stride 2 convolutions until the desired number of FPN levels
        is reached.
      additional_layer_depth: additional feature map layer channel depth.
      use_explicit_padding: whether to use explicit padding when extracting
        features. Default is False.
      use_depthwise: whether to use depthwise convolutions. Default is False.
      override_base_feature_extractor_hyperparams: whether to override
        hyperparameters of the base feature extractor with the ones from
        `conv_hyperparams`.
      name: a string name scope to assign to the model. If 'None', Keras
        will auto-generate one from the class name.
    """
    super(SSDMobileNetV1FpnKerasFeatureExtractor, self).__init__(
        is_training=is_training,
        depth_multiplier=depth_multiplier,
        min_depth=min_depth,
        pad_to_multiple=pad_to_multiple,
        conv_hyperparams=conv_hyperparams,
        freeze_batchnorm=freeze_batchnorm,
        inplace_batchnorm_update=inplace_batchnorm_update,
        use_explicit_padding=use_explicit_padding,
        use_depthwise=use_depthwise,
        override_base_feature_extractor_hyperparams=
        override_base_feature_extractor_hyperparams,
        name=name)
    self._fpn_min_level = fpn_min_level
    self._fpn_max_level = fpn_max_level
    self._additional_layer_depth = additional_layer_depth
    self._conv_defs = None
    if self._use_depthwise:
      self._conv_defs = _create_modified_mobilenet_config()
    self._feature_blocks = [
        'Conv2d_3_pointwise', 'Conv2d_5_pointwise', 'Conv2d_11_pointwise',
        'Conv2d_13_pointwise'
    ]
    self._mobilenet_v1 = None
    self._fpn_features_generator = None
    self._coarse_feature_layers = []

  def build(self, input_shape):
    full_mobilenet_v1 = mobilenet_v1.mobilenet_v1(
        batchnorm_training=(self._is_training and not self._freeze_batchnorm),
        conv_hyperparams=(self._conv_hyperparams
                          if self._override_base_feature_extractor_hyperparams
                          else None),
        weights=None,
        use_explicit_padding=self._use_explicit_padding,
        alpha=self._depth_multiplier,
        min_depth=self._min_depth,
        conv_defs=self._conv_defs,
        include_top=False)
    conv2d_3_pointwise = full_mobilenet_v1.get_layer(
        name='conv_pw_3_relu').output
    conv2d_5_pointwise = full_mobilenet_v1.get_layer(
        name='conv_pw_5_relu').output
    conv2d_11_pointwise = full_mobilenet_v1.get_layer(
        name='conv_pw_11_relu').output
    conv2d_13_pointwise = full_mobilenet_v1.get_layer(
        name='conv_pw_13_relu').output
    self._mobilenet_v1 = tf.keras.Model(
        inputs=full_mobilenet_v1.inputs,
        outputs=[conv2d_3_pointwise, conv2d_5_pointwise,
                 conv2d_11_pointwise, conv2d_13_pointwise]
    )
    # pylint:disable=g-long-lambda
    self._depth_fn = lambda d: max(
        int(d * self._depth_multiplier), self._min_depth)
    self._base_fpn_max_level = min(self._fpn_max_level, 5)
    self._num_levels = self._base_fpn_max_level + 1 - self._fpn_min_level
    self._fpn_features_generator = (
        feature_map_generators.KerasFpnTopDownFeatureMaps(
            num_levels=self._num_levels,
            depth=self._depth_fn(self._additional_layer_depth),
            use_depthwise=self._use_depthwise,
            use_explicit_padding=self._use_explicit_padding,
            is_training=self._is_training,
            conv_hyperparams=self._conv_hyperparams,
            freeze_batchnorm=self._freeze_batchnorm,
            name='FeatureMaps'))
    # Construct coarse feature layers
    padding = 'VALID' if self._use_explicit_padding else 'SAME'
    kernel_size = 3
    stride = 2
    for i in range(self._base_fpn_max_level + 1, self._fpn_max_level + 1):
      coarse_feature_layers = []
      if self._use_explicit_padding:
        def fixed_padding(features, kernel_size=kernel_size):
          return ops.fixed_padding(features, kernel_size)
        coarse_feature_layers.append(tf.keras.layers.Lambda(
            fixed_padding, name='fixed_padding'))
      layer_name = 'bottom_up_Conv2d_{}'.format(
          i - self._base_fpn_max_level + 13)
      conv_block = feature_map_generators.create_conv_block(
          self._use_depthwise, kernel_size, padding, stride, layer_name,
          self._conv_hyperparams, self._is_training, self._freeze_batchnorm,
          self._depth_fn(self._additional_layer_depth))
      coarse_feature_layers.extend(conv_block)
      self._coarse_feature_layers.append(coarse_feature_layers)
    self.built = True

  def preprocess(self, resized_inputs):
    """SSD preprocessing.

    Maps pixel values to the range [-1, 1].

    Args:
      resized_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.

    Returns:
      preprocessed_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.
    """
    return (2.0 / 255.0) * resized_inputs - 1.0

  def _extract_features(self, preprocessed_inputs):
    """Extract features from preprocessed inputs.

    Args:
      preprocessed_inputs: a [batch, height, width, channels] float tensor
        representing a batch of images.

    Returns:
      feature_maps: a list of tensors where the ith tensor has shape
        [batch, height_i, width_i, depth_i]
    """
    preprocessed_inputs = shape_utils.check_min_image_dim(
        33, preprocessed_inputs)
    image_features = self._mobilenet_v1(
        ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple))
    feature_block_list = []
    for level in range(self._fpn_min_level, self._base_fpn_max_level + 1):
      feature_block_list.append(self._feature_blocks[level - 2])
    feature_start_index = len(self._feature_blocks) - self._num_levels
    fpn_input_image_features = [
        (key, image_features[feature_start_index + index])
        for index, key in enumerate(feature_block_list)]
    fpn_features = self._fpn_features_generator(fpn_input_image_features)
    feature_maps = []
    for level in range(self._fpn_min_level, self._base_fpn_max_level + 1):
      feature_maps.append(fpn_features['top_down_{}'.format(
          self._feature_blocks[level - 2])])
    last_feature_map = fpn_features['top_down_{}'.format(
        self._feature_blocks[self._base_fpn_max_level - 2])]
    for coarse_feature_layers in self._coarse_feature_layers:
      for layer in coarse_feature_layers:
        last_feature_map = layer(last_feature_map)
      feature_maps.append(last_feature_map)
    return feature_maps
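The `build()` method above derives the number of FPN levels and per-layer channel depths from a few integer parameters. A minimal sketch of that arithmetic (plain functions mirroring `self._depth_fn` and the level bookkeeping; the function names here are illustrative, not part of the API):

```python
def depth_fn(d, depth_multiplier, min_depth):
  # Mirrors self._depth_fn: scale the nominal depth by the multiplier,
  # but never drop below the configured minimum depth.
  return max(int(d * depth_multiplier), min_depth)


def fpn_level_plan(fpn_min_level=3, fpn_max_level=7):
  # The MobileNet v1 backbone only provides feature maps down to level 5;
  # coarser levels are synthesized with stride-2 conv blocks.
  base_fpn_max_level = min(fpn_max_level, 5)
  num_levels = base_fpn_max_level + 1 - fpn_min_level
  num_coarse_blocks = fpn_max_level - base_fpn_max_level
  return base_fpn_max_level, num_levels, num_coarse_blocks
```

With the class defaults (`fpn_min_level=3`, `fpn_max_level=7`), the FPN generator covers three backbone levels and two additional stride-2 blocks are appended, which matches the loop bounds in `build()` and the five feature maps returned by `_extract_features()`.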
...@@ -13,21 +13,32 @@
# limitations under the License.
# ==============================================================================
"""Tests for ssd_mobilenet_v2_fpn_feature_extractor.

By using parameterized test decorator, this test serves for both Slim-based and
Keras-based Mobilenet V2 FPN feature extractors in SSD.
"""
from absl.testing import parameterized

import numpy as np
import tensorflow as tf

from object_detection.models import ssd_feature_extractor_test
from object_detection.models import ssd_mobilenet_v2_fpn_feature_extractor
from object_detection.models import ssd_mobilenet_v2_fpn_keras_feature_extractor

slim = tf.contrib.slim


@parameterized.parameters(
    {'use_keras': False},
    {'use_keras': True},
)
class SsdMobilenetV2FpnFeatureExtractorTest(
    ssd_feature_extractor_test.SsdFeatureExtractorTestBase):

  def _create_feature_extractor(self, depth_multiplier, pad_to_multiple,
                                is_training=True, use_explicit_padding=False,
                                use_keras=False):
    """Constructs a new feature extractor.

    Args:
...@@ -38,10 +49,26 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
      use_explicit_padding: Use 'VALID' padding for convolutions, but prepad
        inputs so that the output dimensions are the same as if 'SAME' padding
        were used.
      use_keras: if True builds a keras-based feature extractor, if False
        builds a slim-based one.

    Returns:
      an ssd_meta_arch.SSDFeatureExtractor object.
    """
    min_depth = 32
    if use_keras:
      return (ssd_mobilenet_v2_fpn_keras_feature_extractor.
              SSDMobileNetV2FpnKerasFeatureExtractor(
                  is_training=is_training,
                  depth_multiplier=depth_multiplier,
                  min_depth=min_depth,
                  pad_to_multiple=pad_to_multiple,
                  conv_hyperparams=self._build_conv_hyperparams(
                      add_batch_norm=False),
                  freeze_batchnorm=False,
                  inplace_batchnorm_update=False,
                  use_explicit_padding=use_explicit_padding,
                  name='MobilenetV2_FPN'))
    else:
      return (ssd_mobilenet_v2_fpn_feature_extractor.
              SSDMobileNetV2FpnFeatureExtractor(
                  is_training,
...@@ -51,7 +78,7 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
                  self.conv_hyperparams_fn,
                  use_explicit_padding=use_explicit_padding))

  def test_extract_features_returns_correct_shapes_256(self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 1.0
...@@ -61,12 +88,14 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
                                  (2, 2, 2, 256)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_returns_correct_shapes_384(self, use_keras):
    image_height = 320
    image_width = 320
    depth_multiplier = 1.0
...@@ -76,12 +105,14 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
                                  (2, 3, 3, 256)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_with_dynamic_image_shape(self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 1.0
...@@ -91,12 +122,15 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
                                  (2, 2, 2, 256)]
    self.check_extract_features_returns_correct_shapes_with_dynamic_inputs(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shapes_with_dynamic_inputs(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_returns_correct_shapes_with_pad_to_multiple(
      self, use_keras):
    image_height = 299
    image_width = 299
    depth_multiplier = 1.0
...@@ -106,12 +140,15 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
                                  (2, 3, 3, 256)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_returns_correct_shapes_enforcing_min_depth(
      self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 0.5**12
...@@ -121,38 +158,43 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
                                  (2, 2, 2, 32)]
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=False,
        use_keras=use_keras)
    self.check_extract_features_returns_correct_shape(
        2, image_height, image_width, depth_multiplier, pad_to_multiple,
        expected_feature_map_shape, use_explicit_padding=True,
        use_keras=use_keras)

  def test_extract_features_raises_error_with_invalid_image_size(
      self, use_keras):
    image_height = 32
    image_width = 32
    depth_multiplier = 1.0
    pad_to_multiple = 1
    self.check_extract_features_raises_error_with_invalid_image_size(
        image_height, image_width, depth_multiplier, pad_to_multiple,
        use_keras=use_keras)

  def test_preprocess_returns_correct_value_range(self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 1
    pad_to_multiple = 1
    test_image = np.random.rand(2, image_height, image_width, 3)
    feature_extractor = self._create_feature_extractor(depth_multiplier,
                                                       pad_to_multiple,
                                                       use_keras=use_keras)
    preprocessed_image = feature_extractor.preprocess(test_image)
    self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0)))

  def test_variables_only_created_in_scope(self, use_keras):
    depth_multiplier = 1
    pad_to_multiple = 1
    scope_name = 'MobilenetV2'
    self.check_feature_extractor_variables_under_scope(
        depth_multiplier, pad_to_multiple, scope_name, use_keras=use_keras)

  def test_fused_batchnorm(self, use_keras):
    image_height = 256
    image_width = 256
    depth_multiplier = 1
...@@ -160,19 +202,30 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
    image_placeholder = tf.placeholder(tf.float32,
                                       [1, image_height, image_width, 3])
    feature_extractor = self._create_feature_extractor(depth_multiplier,
                                                       pad_to_multiple,
                                                       use_keras=use_keras)
    preprocessed_image = feature_extractor.preprocess(image_placeholder)
    if use_keras:
      _ = feature_extractor(preprocessed_image)
    else:
      _ = feature_extractor.extract_features(preprocessed_image)
    self.assertTrue(
        any(op.type == 'FusedBatchNorm'
            for op in tf.get_default_graph().get_operations()))

  def test_variable_count(self, use_keras):
    depth_multiplier = 1
    pad_to_multiple = 1
    variables = self.get_feature_extractor_variables(
        depth_multiplier, pad_to_multiple, use_keras=use_keras)
    self.assertEqual(len(variables), 274)

  def test_get_expected_feature_map_variable_names(self, use_keras):
    depth_multiplier = 1.0
    pad_to_multiple = 1
    slim_expected_feature_maps_variables = set([
        # Slim Mobilenet V2 feature maps
        'MobilenetV2/expanded_conv_4/depthwise/depthwise_weights',
        'MobilenetV2/expanded_conv_7/depthwise/depthwise_weights',
        'MobilenetV2/expanded_conv_14/depthwise/depthwise_weights',
...@@ -186,13 +239,32 @@ class SsdMobilenetV2FpnFeatureExtractorTest(
        'MobilenetV2/fpn/projection_2/weights',
        'MobilenetV2/fpn/projection_3/weights',
    ])
    keras_expected_feature_maps_variables = set([
        # Keras Mobilenet V2 feature maps
        'MobilenetV2_FPN/block_4_depthwise/depthwise_kernel',
        'MobilenetV2_FPN/block_7_depthwise/depthwise_kernel',
        'MobilenetV2_FPN/block_14_depthwise/depthwise_kernel',
        'MobilenetV2_FPN/Conv_1/kernel',
        # FPN layers
        'MobilenetV2_FPN/bottom_up_Conv2d_20_conv/kernel',
        'MobilenetV2_FPN/bottom_up_Conv2d_21_conv/kernel',
        'MobilenetV2_FPN/FeatureMaps/top_down/smoothing_1_conv/kernel',
        'MobilenetV2_FPN/FeatureMaps/top_down/smoothing_2_conv/kernel',
        'MobilenetV2_FPN/FeatureMaps/top_down/projection_1/kernel',
        'MobilenetV2_FPN/FeatureMaps/top_down/projection_2/kernel',
        'MobilenetV2_FPN/FeatureMaps/top_down/projection_3/kernel'
    ])

    g = tf.Graph()
    with g.as_default():
      preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3))
      feature_extractor = self._create_feature_extractor(
          depth_multiplier, pad_to_multiple, use_keras=use_keras)
      if use_keras:
        feature_extractor(preprocessed_inputs)
        expected_feature_maps_variables = keras_expected_feature_maps_variables
      else:
        feature_extractor.extract_features(preprocessed_inputs)
        expected_feature_maps_variables = slim_expected_feature_maps_variables
      actual_variable_set = set([
          var.op.name for var in g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
      ])
......
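Both the Slim and Keras extractors share the same `preprocess` transform, and `test_preprocess_returns_correct_value_range` asserts its output stays within [-1, 1]. A minimal sketch of that transform on plain Python floats (the function name is illustrative):

```python
def preprocess(pixels):
  # Maps pixel values in [0, 255] to [-1.0, 1.0], matching the
  # (2.0 / 255.0) * x - 1.0 transform used by the SSD feature extractors.
  return [(2.0 / 255.0) * p - 1.0 for p in pixels]


# Black (0), mid-gray (127.5), and white (255) land at -1, 0, and 1.
scaled = preprocess([0.0, 127.5, 255.0])
```

The test's assertion `np.all(np.less_equal(np.abs(preprocessed_image), 1.0))` is exactly the claim that this affine map never leaves [-1, 1] for valid pixel inputs.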
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""SSD Keras-based MobilenetV2 FPN Feature Extractor."""
import tensorflow as tf
from object_detection.meta_architectures import ssd_meta_arch
from object_detection.models import feature_map_generators
from object_detection.models.keras_models import mobilenet_v2
from object_detection.utils import ops
from object_detection.utils import shape_utils
# Total number of blocks in Mobilenet_V2 base network.
NUM_LAYERS = 19
# A modified config of mobilenet v2 that makes it more detection friendly.
def _create_modified_mobilenet_config():
last_conv = mobilenet_v2.ConvDefs(conv_name='Conv_1', filters=256)
return [last_conv]
class SSDMobileNetV2FpnKerasFeatureExtractor(
ssd_meta_arch.SSDKerasFeatureExtractor):
"""SSD Feature Extractor using Keras-based MobilenetV2 FPN features."""
def __init__(self,
is_training,
depth_multiplier,
min_depth,
pad_to_multiple,
conv_hyperparams,
freeze_batchnorm,
inplace_batchnorm_update,
fpn_min_level=3,
fpn_max_level=7,
additional_layer_depth=256,
reuse_weights=None,
use_explicit_padding=False,
use_depthwise=False,
override_base_feature_extractor_hyperparams=False,
name=None):
"""SSD Keras based FPN feature extractor Mobilenet v2 architecture.
Args:
is_training: whether the network is in training mode.
depth_multiplier: float depth multiplier for feature extractor.
min_depth: minimum feature extractor depth.
pad_to_multiple: the nearest multiple to zero pad the input height and
width dimensions to.
conv_hyperparams: a `hyperparams_builder.KerasLayerHyperparams` object
containing convolution hyperparameters for the layers added on top of
the base feature extractor.
freeze_batchnorm: whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
inplace_batchnorm_update: whether to update batch norm moving average
values inplace. When this is false train op must add a control
dependency on tf.graphkeys.UPDATE_OPS collection in order to update
batch norm statistics.
fpn_min_level: the highest resolution feature map to use in FPN. The valid
values are {2, 3, 4, 5} which map to MobileNet v2 layers
{layer_4, layer_7, layer_14, layer_19}, respectively.
fpn_max_level: the smallest resolution feature map to construct or use in
FPN. FPN construction uses feature maps starting from fpn_min_level
up to fpn_max_level. In the case that there are not enough feature
maps in the backbone network, additional feature maps are created by
applying stride 2 convolutions until we get the desired number of fpn
levels.
additional_layer_depth: additional feature map layer channel depth.
reuse_weights: whether to reuse variables. Default is None.
use_explicit_padding: Whether to use explicit padding when extracting
features. Default is False.
use_depthwise: Whether to use depthwise convolutions. Default is False.
override_base_feature_extractor_hyperparams: Whether to override
hyperparameters of the base feature extractor with the one from
`conv_hyperparams`.
name: a string name scope to assign to the model. If 'None', Keras
will auto-generate one from the class name.
"""
super(SSDMobileNetV2FpnKerasFeatureExtractor, self).__init__(
is_training=is_training,
depth_multiplier=depth_multiplier,
min_depth=min_depth,
pad_to_multiple=pad_to_multiple,
conv_hyperparams=conv_hyperparams,
freeze_batchnorm=freeze_batchnorm,
inplace_batchnorm_update=inplace_batchnorm_update,
use_explicit_padding=use_explicit_padding,
use_depthwise=use_depthwise,
override_base_feature_extractor_hyperparams=
override_base_feature_extractor_hyperparams,
name=name)
self._fpn_min_level = fpn_min_level
self._fpn_max_level = fpn_max_level
self._additional_layer_depth = additional_layer_depth
self._conv_defs = None
if self._use_depthwise:
self._conv_defs = _create_modified_mobilenet_config()
self._feature_blocks = ['layer_4', 'layer_7', 'layer_14', 'layer_19']
self._mobilenet_v2 = None
self._fpn_features_generator = None
self._coarse_feature_layers = []
def build(self, input_shape):
full_mobilenet_v2 = mobilenet_v2.mobilenet_v2(
batchnorm_training=(self._is_training and not self._freeze_batchnorm),
conv_hyperparams=(self._conv_hyperparams
if self._override_base_feature_extractor_hyperparams
else None),
weights=None,
use_explicit_padding=self._use_explicit_padding,
alpha=self._depth_multiplier,
min_depth=self._min_depth,
include_top=False)
layer_names = [layer.name for layer in full_mobilenet_v2.layers]
outputs = []
for layer_idx in [4, 7, 14]:
add_name = 'block_{}_add'.format(layer_idx - 2)
project_name = 'block_{}_project_BN'.format(layer_idx - 2)
output_layer_name = add_name if add_name in layer_names else project_name
outputs.append(full_mobilenet_v2.get_layer(output_layer_name).output)
layer_19 = full_mobilenet_v2.get_layer(name='out_relu').output
outputs.append(layer_19)
self._mobilenet_v2 = tf.keras.Model(
inputs=full_mobilenet_v2.inputs,
outputs=outputs)
# pylint:disable=g-long-lambda
self._depth_fn = lambda d: max(
int(d * self._depth_multiplier), self._min_depth)
self._base_fpn_max_level = min(self._fpn_max_level, 5)
self._num_levels = self._base_fpn_max_level + 1 - self._fpn_min_level
self._fpn_features_generator = (
feature_map_generators.KerasFpnTopDownFeatureMaps(
num_levels=self._num_levels,
depth=self._depth_fn(self._additional_layer_depth),
use_depthwise=self._use_depthwise,
use_explicit_padding=self._use_explicit_padding,
is_training=self._is_training,
conv_hyperparams=self._conv_hyperparams,
freeze_batchnorm=self._freeze_batchnorm,
name='FeatureMaps'))
# Construct coarse feature layers
padding = 'VALID' if self._use_explicit_padding else 'SAME'
kernel_size = 3
stride = 2
for i in range(self._base_fpn_max_level + 1, self._fpn_max_level + 1):
coarse_feature_layers = []
if self._use_explicit_padding:
def fixed_padding(features, kernel_size=kernel_size):
return ops.fixed_padding(features, kernel_size)
coarse_feature_layers.append(tf.keras.layers.Lambda(
fixed_padding, name='fixed_padding'))
layer_name = 'bottom_up_Conv2d_{}'.format(
i - self._base_fpn_max_level + NUM_LAYERS)
conv_block = feature_map_generators.create_conv_block(
self._use_depthwise, kernel_size, padding, stride, layer_name,
self._conv_hyperparams, self._is_training, self._freeze_batchnorm,
self._depth_fn(self._additional_layer_depth))
coarse_feature_layers.extend(conv_block)
self._coarse_feature_layers.append(coarse_feature_layers)
self.built = True
def preprocess(self, resized_inputs):
"""SSD preprocessing.
Maps pixel values to the range [-1, 1].
Args:
resized_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
"""
return (2.0 / 255.0) * resized_inputs - 1.0
def _extract_features(self, preprocessed_inputs):
"""Extract features from preprocessed inputs.
Args:
preprocessed_inputs: a [batch, height, width, channels] float tensor
representing a batch of images.
Returns:
feature_maps: a list of tensors where the ith tensor has shape
[batch, height_i, width_i, depth_i]
"""
preprocessed_inputs = shape_utils.check_min_image_dim(
33, preprocessed_inputs)
image_features = self._mobilenet_v2(
ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple))
feature_block_list = []
for level in range(self._fpn_min_level, self._base_fpn_max_level + 1):
feature_block_list.append(self._feature_blocks[level - 2])
feature_start_index = len(self._feature_blocks) - self._num_levels
fpn_input_image_features = [
(key, image_features[feature_start_index + index])
for index, key in enumerate(feature_block_list)]
fpn_features = self._fpn_features_generator(fpn_input_image_features)
feature_maps = []
for level in range(self._fpn_min_level, self._base_fpn_max_level + 1):
feature_maps.append(fpn_features['top_down_{}'.format(
self._feature_blocks[level - 2])])
last_feature_map = fpn_features['top_down_{}'.format(
self._feature_blocks[self._base_fpn_max_level - 2])]
for coarse_feature_layers in self._coarse_feature_layers:
for layer in coarse_feature_layers:
last_feature_map = layer(last_feature_map)
feature_maps.append(last_feature_map)
return feature_maps
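The level bookkeeping in `build` and `_extract_features` above can be illustrated with a small pure-Python sketch (a simplification of the extractor's logic, assuming the default `fpn_min_level=3`, `fpn_max_level=7`):

```python
# Sketch of the FPN level bookkeeping used in the extractor above.
# Feature blocks exported by the trimmed MobilenetV2 backbone.
FEATURE_BLOCKS = ['layer_4', 'layer_7', 'layer_14', 'layer_19']
NUM_LAYERS = 19  # total blocks in the MobilenetV2 base network

def fpn_plan(fpn_min_level=3, fpn_max_level=7):
    # Levels above 5 have no backbone feature map; they are created by
    # extra stride-2 "coarse" convolutions on top of the last FPN level.
    base_fpn_max_level = min(fpn_max_level, 5)
    # FPN levels 2..5 map to layer_4, layer_7, layer_14, layer_19.
    blocks = [FEATURE_BLOCKS[level - 2]
              for level in range(fpn_min_level, base_fpn_max_level + 1)]
    coarse_names = [
        'bottom_up_Conv2d_{}'.format(i - base_fpn_max_level + NUM_LAYERS)
        for i in range(base_fpn_max_level + 1, fpn_max_level + 1)]
    return blocks, coarse_names

blocks, coarse = fpn_plan()
print(blocks)   # ['layer_7', 'layer_14', 'layer_19']
print(coarse)   # ['bottom_up_Conv2d_20', 'bottom_up_Conv2d_21']
```

With the defaults, three backbone blocks feed the top-down FPN and two extra stride-2 layers provide levels 6 and 7, matching the `bottom_up_Conv2d_20`/`21` names constructed in `build`.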
...@@ -29,7 +29,7 @@ from nets import resnet_v1
slim = tf.contrib.slim
class SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
"""SSD FPN feature extractor based on Resnet v1 architecture."""
def __init__(self,
...@@ -84,7 +84,7 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
Raises:
ValueError: On supplying invalid arguments for unused arguments.
"""
super(SSDResnetV1FpnFeatureExtractor, self).__init__(
is_training=is_training,
depth_multiplier=depth_multiplier,
min_depth=min_depth,
...@@ -198,7 +198,7 @@ class _SSDResnetV1FpnFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
return feature_maps
class SSDResnet50V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
"""SSD Resnet50 V1 FPN feature extractor."""
def __init__(self,
...@@ -255,7 +255,7 @@ class SSDResnet50V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor):
override_base_feature_extractor_hyperparams)
class SSDResnet101V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
"""SSD Resnet101 V1 FPN feature extractor."""
def __init__(self,
...@@ -312,7 +312,7 @@ class SSDResnet101V1FpnFeatureExtractor(_SSDResnetV1FpnFeatureExtractor):
override_base_feature_extractor_hyperparams)
class SSDResnet152V1FpnFeatureExtractor(SSDResnetV1FpnFeatureExtractor):
"""SSD Resnet152 V1 FPN feature extractor."""
def __init__(self,
......
...@@ -357,9 +357,9 @@ class WeightSharedConvolutionalBoxPredictor(box_predictor.BoxPredictor):
inserted_layer_counter = 0
target_channel = max(set(feature_channels), key=feature_channels.count)
tf.logging.info('Not all feature maps have the same number of '
'channels, found: {}, appending additional projection '
'layers to bring all feature maps to uniformly have {} '
'channels.'.format(feature_channels, target_channel))
else:
# Placeholder variables if has_different_feature_channels is False.
target_channel = -1
...@@ -377,6 +377,8 @@ class WeightSharedConvolutionalBoxPredictor(box_predictor.BoxPredictor):
with tf.variable_scope('WeightSharedConvolutionalBoxPredictor',
reuse=tf.AUTO_REUSE):
with slim.arg_scope(self._conv_hyperparams_fn()):
# TODO(wangjiang) Pass is_training to the head class directly.
with slim.arg_scope([slim.dropout], is_training=self._is_training):
(image_feature,
inserted_layer_counter) = self._insert_additional_projection_layer(
image_feature, inserted_layer_counter, target_channel)
......
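The projection logic in the hunk above keys off the most common channel depth among the incoming feature maps. A minimal pure-Python sketch of that selection (mirroring the `max(set(...), key=feature_channels.count)` line in the predictor):

```python
def pick_target_channel(feature_channels):
    """Returns (needs_projection, target_channel) as in the predictor above.

    When the feature maps disagree on depth, the most common depth wins
    and 1x1 projection layers bring the others to it; otherwise -1
    signals that no projection layers are inserted.
    """
    if len(set(feature_channels)) > 1:
        return True, max(set(feature_channels), key=feature_channels.count)
    return False, -1

# Three maps at 64 channels, one at 32: project everything to 64.
print(pick_target_channel([64, 64, 32, 64]))  # (True, 64)
print(pick_target_channel([256, 256]))        # (False, -1)
```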
...@@ -197,3 +197,272 @@ class ConvolutionalBoxPredictor(box_predictor.KerasBoxPredictor):
predictions[head_name].append(prediction)
return predictions
class WeightSharedConvolutionalBoxPredictor(box_predictor.KerasBoxPredictor):
"""Convolutional Box Predictor with weight sharing based on Keras.
Defines the box predictor as defined in
https://arxiv.org/abs/1708.02002. This class differs from
ConvolutionalBoxPredictor in that it shares weights and biases while
predicting from different feature maps. However, batch_norm parameters are not
shared because the statistics of the activations vary among the different
feature maps.
Also note that separate multi-layer towers are constructed for the box
encoding and class predictors respectively.
"""
def __init__(self,
is_training,
num_classes,
box_prediction_head,
class_prediction_head,
other_heads,
conv_hyperparams,
depth,
num_layers_before_predictor,
freeze_batchnorm,
inplace_batchnorm_update,
kernel_size=3,
apply_batch_norm=False,
share_prediction_tower=False,
use_depthwise=False,
name=None):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
box_prediction_head: The head that predicts the boxes.
class_prediction_head: The head that predicts the classes.
other_heads: A dictionary mapping head names to convolutional
head classes.
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops.
depth: depth of conv layers.
num_layers_before_predictor: Number of the additional conv layers before
the predictor.
freeze_batchnorm: Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
inplace_batchnorm_update: Whether to update batch norm moving average
values inplace. When this is false train op must add a control
dependency on tf.graphkeys.UPDATE_OPS collection in order to update
batch norm statistics.
kernel_size: Size of final convolution kernel.
apply_batch_norm: Whether to apply batch normalization to conv layers in
this predictor.
share_prediction_tower: Whether to share the multi-layer tower among box
prediction head, class prediction head and other heads.
use_depthwise: Whether to use depthwise separable conv2d instead of
regular conv2d.
name: A string name scope to assign to the model. If `None`, Keras
will auto-generate one from the class name.
"""
super(WeightSharedConvolutionalBoxPredictor, self).__init__(
is_training, num_classes, freeze_batchnorm=freeze_batchnorm,
inplace_batchnorm_update=inplace_batchnorm_update,
name=name)
self._box_prediction_head = box_prediction_head
self._prediction_heads = {
CLASS_PREDICTIONS_WITH_BACKGROUND: class_prediction_head,
}
if other_heads:
self._prediction_heads.update(other_heads)
# We generate a consistent ordering for the prediction head names,
# so that all workers build the model in the exact same order.
self._sorted_head_names = sorted(self._prediction_heads.keys())
self._conv_hyperparams = conv_hyperparams
self._depth = depth
self._num_layers_before_predictor = num_layers_before_predictor
self._kernel_size = kernel_size
self._apply_batch_norm = apply_batch_norm
self._share_prediction_tower = share_prediction_tower
self._use_depthwise = use_depthwise
# Additional projection layers to bring all feature maps to uniform
# channels.
self._additional_projection_layers = []
# The base tower layers for each head.
self._base_tower_layers_for_heads = {
BOX_ENCODINGS: [],
CLASS_PREDICTIONS_WITH_BACKGROUND: [],
}
for head_name in other_heads.keys():
self._base_tower_layers_for_heads[head_name] = []
# A dict maps the tower_name_scope of each head to the shared conv layers in
# the base tower for different feature map levels.
self._head_scope_conv_layers = {}
def _insert_additional_projection_layer(
self, inserted_layer_counter, target_channel):
projection_layers = []
if inserted_layer_counter >= 0:
use_bias = False if self._apply_batch_norm else True
projection_layers.append(keras.Conv2D(
target_channel, [1, 1], strides=1, padding='SAME',
name='ProjectionLayer/conv2d_{}'.format(inserted_layer_counter),
**self._conv_hyperparams.params(use_bias=use_bias)))
if self._apply_batch_norm:
projection_layers.append(self._conv_hyperparams.build_batch_norm(
training=(self._is_training and not self._freeze_batchnorm),
name='ProjectionLayer/conv2d_{}/BatchNorm'.format(
inserted_layer_counter)))
inserted_layer_counter += 1
return inserted_layer_counter, projection_layers
def _compute_base_tower(self, tower_name_scope, feature_index):
conv_layers = []
batch_norm_layers = []
activation_layers = []
use_bias = False if self._apply_batch_norm else True
for additional_conv_layer_idx in range(self._num_layers_before_predictor):
layer_name = '{}/conv2d_{}'.format(
tower_name_scope, additional_conv_layer_idx)
if tower_name_scope not in self._head_scope_conv_layers:
if self._use_depthwise:
conv_layers.append(
tf.keras.layers.SeparableConv2D(
self._depth,
[self._kernel_size, self._kernel_size],
padding='SAME',
name=layer_name,
**self._conv_hyperparams.params(use_bias=use_bias)))
else:
conv_layers.append(
tf.keras.layers.Conv2D(
self._depth,
[self._kernel_size, self._kernel_size],
padding='SAME',
name=layer_name,
**self._conv_hyperparams.params(use_bias=use_bias)))
# Each feature gets a separate batchnorm parameter even though they share
# the same convolution weights.
if self._apply_batch_norm:
batch_norm_layers.append(self._conv_hyperparams.build_batch_norm(
training=(self._is_training and not self._freeze_batchnorm),
name='{}/conv2d_{}/BatchNorm/feature_{}'.format(
tower_name_scope, additional_conv_layer_idx, feature_index)))
activation_layers.append(tf.keras.layers.Lambda(tf.nn.relu6))
# Set conv layers as the shared conv layers for different feature maps with
# the same tower_name_scope.
if tower_name_scope in self._head_scope_conv_layers:
conv_layers = self._head_scope_conv_layers[tower_name_scope]
# Stack the base_tower_layers in the order of conv_layer, batch_norm_layer
# and activation_layer
base_tower_layers = []
for i in range(self._num_layers_before_predictor):
base_tower_layers.extend([conv_layers[i]])
if self._apply_batch_norm:
base_tower_layers.extend([batch_norm_layers[i]])
base_tower_layers.extend([activation_layers[i]])
return conv_layers, base_tower_layers
def build(self, input_shapes):
"""Creates the variables of the layer."""
feature_channels = [
input_shape[3].value for input_shape in input_shapes
]
has_different_feature_channels = len(set(feature_channels)) > 1
if has_different_feature_channels:
inserted_layer_counter = 0
target_channel = max(set(feature_channels), key=feature_channels.count)
tf.logging.info('Not all feature maps have the same number of '
'channels, found: {}, appending additional projection '
'layers to bring all feature maps to uniformly have {} '
'channels.'.format(feature_channels, target_channel))
else:
# Placeholder variables if has_different_feature_channels is False.
target_channel = -1
inserted_layer_counter = -1
def _build_layers(tower_name_scope, feature_index):
conv_layers, base_tower_layers = self._compute_base_tower(
tower_name_scope=tower_name_scope, feature_index=feature_index)
if tower_name_scope not in self._head_scope_conv_layers:
self._head_scope_conv_layers[tower_name_scope] = conv_layers
return base_tower_layers
for feature_index, input_shape in enumerate(input_shapes):
# Additional projection layers should not be shared as input channels
# (and thus weight shapes) are different
inserted_layer_counter, projection_layers = (
self._insert_additional_projection_layer(
inserted_layer_counter, target_channel))
self._additional_projection_layers.append(projection_layers)
if self._share_prediction_tower:
box_tower_scope = 'PredictionTower'
else:
box_tower_scope = 'BoxPredictionTower'
# For box tower base
box_tower_layers = _build_layers(box_tower_scope, feature_index)
self._base_tower_layers_for_heads[BOX_ENCODINGS].append(box_tower_layers)
for head_name in self._sorted_head_names:
if head_name == CLASS_PREDICTIONS_WITH_BACKGROUND:
tower_name_scope = 'ClassPredictionTower'
else:
tower_name_scope = '{}PredictionTower'.format(head_name)
box_tower_layers = _build_layers(tower_name_scope, feature_index)
self._base_tower_layers_for_heads[head_name].append(box_tower_layers)
self.built = True
def _predict(self, image_features):
"""Computes encoded object locations and corresponding confidences.
Args:
image_features: A list of float tensors of shape [batch_size, height_i,
width_i, channels_i] containing features for a batch of images.
Returns:
box_encodings: A list of float tensors of shape
[batch_size, num_anchors_i, q, code_size] representing the location of
the objects, where q is 1 or the number of classes. Each entry in the
list corresponds to a feature map in the input `image_features` list.
class_predictions_with_background: A list of float tensors of shape
[batch_size, num_anchors_i, num_classes + 1] representing the class
predictions for the proposals. Each entry in the list corresponds to a
feature map in the input `image_features` list.
"""
predictions = collections.defaultdict(list)
def _apply_layers(base_tower_layers, image_feature):
for layer in base_tower_layers:
image_feature = layer(image_feature)
return image_feature
for (index, image_feature) in enumerate(image_features):
# Apply additional projection layers to image features
for layer in self._additional_projection_layers[index]:
image_feature = layer(image_feature)
# Apply box tower layers.
box_tower_feature = _apply_layers(
self._base_tower_layers_for_heads[BOX_ENCODINGS][index],
image_feature)
box_encodings = self._box_prediction_head(box_tower_feature)
predictions[BOX_ENCODINGS].append(box_encodings)
for head_name in self._sorted_head_names:
head_obj = self._prediction_heads[head_name]
if self._share_prediction_tower:
head_tower_feature = box_tower_feature
else:
head_tower_feature = _apply_layers(
self._base_tower_layers_for_heads[head_name][index],
image_feature)
prediction = head_obj(head_tower_feature)
predictions[head_name].append(prediction)
return predictions
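The output shapes asserted in the tests below follow from simple arithmetic: each feature map contributes `height * width * num_predictions_per_location` anchors to the concatenated predictions. A small sketch of that count:

```python
def num_anchors(feature_map_shapes, num_predictions_per_location_list):
    """Total anchors across feature maps, as in the predictor tests.

    Each spatial position on a feature map gets a fixed number of
    predictions (anchors), so an [H, W] map with k predictions per
    location yields H * W * k anchors after flattening.
    """
    return sum(h * w * k for (h, w), k in
               zip(feature_map_shapes, num_predictions_per_location_list))

# One 8x8 map with 5 anchors per location -> 320 boxes, so the
# concatenated box_encodings has shape [batch, 320, code_size].
print(num_anchors([(8, 8)], [5]))             # 320
print(num_anchors([(8, 8), (8, 8)], [5, 5]))  # 640
```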
...@@ -21,6 +21,9 @@ from google.protobuf import text_format
from object_detection.builders import box_predictor_builder
from object_detection.builders import hyperparams_builder
from object_detection.predictors import convolutional_keras_box_predictor as box_predictor
from object_detection.predictors.heads import keras_box_head
from object_detection.predictors.heads import keras_class_head
from object_detection.predictors.heads import keras_mask_head
from object_detection.protos import hyperparams_pb2
from object_detection.utils import test_case
...@@ -255,5 +258,651 @@ class ConvolutionalKerasBoxPredictorTest(test_case.TestCase):
self.assertEqual(conv_box_predictor._sorted_head_names,
['box_encodings', 'class_predictions_with_background'])
class WeightSharedConvolutionalKerasBoxPredictorTest(test_case.TestCase):
def _build_conv_hyperparams(self, add_batch_norm=True):
conv_hyperparams = hyperparams_pb2.Hyperparams()
conv_hyperparams_text_proto = """
activation: RELU_6
regularizer {
l2_regularizer {
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
mean: 0.0
}
}
"""
if add_batch_norm:
batch_norm_proto = """
batch_norm {
train: true,
}
"""
conv_hyperparams_text_proto += batch_norm_proto
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams)
return hyperparams_builder.KerasLayerHyperparams(conv_hyperparams)
# pylint: disable=line-too-long
def test_get_boxes_for_five_aspect_ratios_per_location(self):
def graph_fn(image_features):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=0,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5],
depth=32,
num_layers_before_predictor=1,
box_code_size=4))
box_predictions = conv_box_predictor([image_features])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
objectness_predictions = tf.concat(box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND], axis=1)
return (box_encodings, objectness_predictions)
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, objectness_predictions) = self.execute(
graph_fn, [image_features])
self.assertAllEqual(box_encodings.shape, [4, 320, 4])
self.assertAllEqual(objectness_predictions.shape, [4, 320, 1])
def test_bias_predictions_to_background_with_sigmoid_score_conversion(self):
def graph_fn(image_features):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=True,
num_classes=2,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5],
depth=32,
num_layers_before_predictor=1,
class_prediction_bias_init=-4.6,
box_code_size=4))
box_predictions = conv_box_predictor([image_features])
class_predictions = tf.concat(box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND], axis=1)
return (tf.nn.sigmoid(class_predictions),)
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
class_predictions = self.execute(graph_fn, [image_features])
self.assertAlmostEqual(np.mean(class_predictions), 0.01, places=3)
def test_get_multi_class_predictions_for_five_aspect_ratios_per_location(
self):
num_classes_without_background = 6
def graph_fn(image_features):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5],
depth=32,
num_layers_before_predictor=1,
box_code_size=4))
box_predictions = conv_box_predictor([image_features])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND], axis=1)
return (box_encodings, class_predictions_with_background)
image_features = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features])
self.assertAllEqual(box_encodings.shape, [4, 320, 4])
self.assertAllEqual(class_predictions_with_background.shape,
[4, 320, num_classes_without_background+1])
def test_get_multi_class_predictions_from_two_feature_maps(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=1,
box_code_size=4))
box_predictions = conv_box_predictor([image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
image_features1 = np.random.rand(4, 8, 8, 64).astype(np.float32)
image_features2 = np.random.rand(4, 8, 8, 64).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features1, image_features2])
self.assertAllEqual(box_encodings.shape, [4, 640, 4])
self.assertAllEqual(class_predictions_with_background.shape,
[4, 640, num_classes_without_background+1])
def test_get_multi_class_predictions_from_feature_maps_of_different_depth(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2, image_features3):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5, 5],
depth=32,
num_layers_before_predictor=1,
box_code_size=4))
box_predictions = conv_box_predictor(
[image_features1, image_features2, image_features3])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
image_features1 = np.random.rand(4, 8, 8, 64).astype(np.float32)
image_features2 = np.random.rand(4, 8, 8, 64).astype(np.float32)
image_features3 = np.random.rand(4, 8, 8, 32).astype(np.float32)
(box_encodings, class_predictions_with_background) = self.execute(
graph_fn, [image_features1, image_features2, image_features3])
self.assertAllEqual(box_encodings.shape, [4, 960, 4])
self.assertAllEqual(class_predictions_with_background.shape,
[4, 960, num_classes_without_background+1])
def test_predictions_multiple_feature_maps_share_weights_separate_batchnorm(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=2,
box_code_size=4))
box_predictions = conv_box_predictor([image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Box prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/BatchNorm/feature_0/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/BatchNorm/feature_1/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/BatchNorm/feature_0/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/BatchNorm/feature_1/beta'),
# Box prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/bias'),
# Class prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/BatchNorm/feature_0/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/BatchNorm/feature_1/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/BatchNorm/feature_0/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/BatchNorm/feature_1/beta'),
# Class prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/bias')])
self.assertEqual(expected_variable_set, actual_variable_set)

def test_predictions_multiple_feature_maps_share_weights_without_batchnorm(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=2,
box_code_size=4,
apply_batch_norm=False))
box_predictions = conv_box_predictor([image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Box prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/bias'),
# Box prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/bias'),
# Class prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/bias'),
# Class prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/bias')])
self.assertEqual(expected_variable_set, actual_variable_set)

def test_predictions_multiple_feature_maps_share_weights_with_depthwise(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(
add_batch_norm=False),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=2,
box_code_size=4,
apply_batch_norm=False,
use_depthwise=True))
box_predictions = conv_box_predictor([image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Box prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/depthwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/pointwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/depthwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/pointwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/bias'),
# Box prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/depthwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/pointwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/bias'),
# Class prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/depthwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/pointwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/depthwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/pointwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/bias'),
# Class prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/depthwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/pointwise_kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/bias')])
self.assertEqual(expected_variable_set, actual_variable_set)

def test_no_batchnorm_params_when_batchnorm_is_not_configured(self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(
add_batch_norm=False),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=2,
box_code_size=4,
apply_batch_norm=False))
box_predictions = conv_box_predictor(
[image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Box prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'BoxPredictionTower/conv2d_1/bias'),
# Box prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/bias'),
# Class prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'ClassPredictionTower/conv2d_1/bias'),
# Class prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/bias')])
self.assertEqual(expected_variable_set, actual_variable_set)

def test_predictions_share_weights_share_tower_separate_batchnorm(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=2,
box_code_size=4,
share_prediction_tower=True))
box_predictions = conv_box_predictor(
[image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Shared prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_0/BatchNorm/feature_0/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_0/BatchNorm/feature_1/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_1/BatchNorm/feature_0/beta'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_1/BatchNorm/feature_1/beta'),
# Box prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/bias'),
# Class prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/bias')])
self.assertEqual(expected_variable_set, actual_variable_set)

def test_predictions_share_weights_share_tower_without_batchnorm(
self):
num_classes_without_background = 6
def graph_fn(image_features1, image_features2):
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(
add_batch_norm=False),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5, 5],
depth=32,
num_layers_before_predictor=2,
box_code_size=4,
share_prediction_tower=True,
apply_batch_norm=False))
box_predictions = conv_box_predictor(
[image_features1, image_features2])
box_encodings = tf.concat(
box_predictions[box_predictor.BOX_ENCODINGS], axis=1)
class_predictions_with_background = tf.concat(
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
axis=1)
return (box_encodings, class_predictions_with_background)
with self.test_session(graph=tf.Graph()):
graph_fn(tf.random_uniform([4, 32, 32, 3], dtype=tf.float32),
tf.random_uniform([4, 16, 16, 3], dtype=tf.float32))
actual_variable_set = set(
[var.op.name for var in tf.trainable_variables()])
expected_variable_set = set([
# Shared prediction tower
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_0/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_0/bias'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_1/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'PredictionTower/conv2d_1/bias'),
# Box prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalBoxHead/BoxPredictor/bias'),
# Class prediction head
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/kernel'),
('WeightSharedConvolutionalBoxPredictor/'
'WeightSharedConvolutionalClassHead/ClassPredictor/bias')])
self.assertEqual(expected_variable_set, actual_variable_set)

def test_get_predictions_with_feature_maps_of_dynamic_shape(
self):
image_features = tf.placeholder(dtype=tf.float32, shape=[4, None, None, 64])
conv_box_predictor = (
box_predictor_builder.build_weight_shared_convolutional_keras_box_predictor(
is_training=False,
num_classes=0,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
num_predictions_per_location_list=[5],
depth=32,
num_layers_before_predictor=1,
box_code_size=4))
box_predictions = conv_box_predictor([image_features])
box_encodings = tf.concat(box_predictions[box_predictor.BOX_ENCODINGS],
axis=1)
objectness_predictions = tf.concat(box_predictions[
box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND], axis=1)
init_op = tf.global_variables_initializer()
resolution = 32
expected_num_anchors = resolution*resolution*5
with self.test_session() as sess:
sess.run(init_op)
(box_encodings_shape,
objectness_predictions_shape) = sess.run(
[tf.shape(box_encodings), tf.shape(objectness_predictions)],
feed_dict={image_features:
np.random.rand(4, resolution, resolution, 64)})
self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 4])
self.assertAllEqual(objectness_predictions_shape,
[4, expected_num_anchors, 1])

def test_other_heads_predictions(self):
box_code_size = 4
num_classes_without_background = 3
other_head_name = 'Mask'
mask_height = 5
mask_width = 5
num_predictions_per_location = 5
def graph_fn(image_features):
box_prediction_head = keras_box_head.WeightSharedConvolutionalBoxHead(
box_code_size=box_code_size,
conv_hyperparams=self._build_conv_hyperparams(),
num_predictions_per_location=num_predictions_per_location)
class_prediction_head = keras_class_head.WeightSharedConvolutionalClassHead(
num_class_slots=num_classes_without_background + 1,
conv_hyperparams=self._build_conv_hyperparams(),
num_predictions_per_location=num_predictions_per_location)
other_heads = {
other_head_name:
keras_mask_head.WeightSharedConvolutionalMaskHead(
num_classes=num_classes_without_background,
conv_hyperparams=self._build_conv_hyperparams(),
num_predictions_per_location=num_predictions_per_location,
mask_height=mask_height,
mask_width=mask_width)
}
conv_box_predictor = box_predictor.WeightSharedConvolutionalBoxPredictor(
is_training=False,
num_classes=num_classes_without_background,
box_prediction_head=box_prediction_head,
class_prediction_head=class_prediction_head,
other_heads=other_heads,
conv_hyperparams=self._build_conv_hyperparams(),
freeze_batchnorm=False,
inplace_batchnorm_update=False,
depth=32,
num_layers_before_predictor=2)
box_predictions = conv_box_predictor([image_features])
for key, value in box_predictions.items():
box_predictions[key] = tf.concat(value, axis=1)
assert len(box_predictions) == 3
return (box_predictions[box_predictor.BOX_ENCODINGS],
box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND],
box_predictions[other_head_name])
batch_size = 4
feature_ht = 8
feature_wt = 8
image_features = np.random.rand(batch_size, feature_ht, feature_wt,
64).astype(np.float32)
(box_encodings, class_predictions, other_head_predictions) = self.execute(
graph_fn, [image_features])
num_anchors = feature_ht * feature_wt * num_predictions_per_location
self.assertAllEqual(box_encodings.shape,
[batch_size, num_anchors, box_code_size])
self.assertAllEqual(
class_predictions.shape,
[batch_size, num_anchors, num_classes_without_background + 1])
self.assertAllEqual(other_head_predictions.shape, [
batch_size, num_anchors, num_classes_without_background, mask_height,
mask_width
])

if __name__ == '__main__':
  tf.test.main()
@@ -120,7 +120,8 @@ class ConvolutionalBoxHead(head.Head):
                is_training,
                box_code_size,
                kernel_size,
-               use_depthwise=False):
+               use_depthwise=False,
+               box_encodings_clip_range=None):
     """Constructor.
 
     Args:
@@ -132,6 +133,7 @@ class ConvolutionalBoxHead(head.Head):
         min(feature_width, feature_height).
       use_depthwise: Whether to use depthwise convolutions for prediction
         steps. Default is False.
+      box_encodings_clip_range: Min and max values for clipping box_encodings.
 
     Raises:
       ValueError: if min_depth > max_depth.
@@ -141,6 +143,7 @@ class ConvolutionalBoxHead(head.Head):
     self._box_code_size = box_code_size
     self._kernel_size = kernel_size
     self._use_depthwise = use_depthwise
+    self._box_encodings_clip_range = box_encodings_clip_range
 
   def predict(self, features, num_predictions_per_location):
     """Predicts boxes.
@@ -180,6 +183,11 @@ class ConvolutionalBoxHead(head.Head):
     batch_size = features.get_shape().as_list()[0]
     if batch_size is None:
       batch_size = tf.shape(features)[0]
+    # Clipping the box encodings to make the inference graph TPU friendly.
+    if self._box_encodings_clip_range is not None:
+      box_encodings = tf.clip_by_value(
+          box_encodings, self._box_encodings_clip_range.min,
+          self._box_encodings_clip_range.max)
     box_encodings = tf.reshape(box_encodings,
                                [batch_size, -1, 1, self._box_code_size])
     return box_encodings
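The clipping added in the hunk above bounds the regression outputs so the inference graph stays TPU friendly. A minimal NumPy sketch of the same idea — the clip-range values here are illustrative only and do not come from the real `box_encodings_clip_range` proto message:

```python
import numpy as np

# Hypothetical clip range standing in for box_encodings_clip_range
# (which carries min/max fields); the values are illustrative.
clip_min, clip_max = -10.0, 10.0

# Fake box encodings: batch of 1, two anchors, 4 coordinates each.
box_encodings = np.array([[[-50.0, 0.5, 2.0, 100.0],
                           [3.0, -1.0, 12.0, -0.25]]])

# Clipping keeps every encoding inside a bounded range, avoiding the
# extreme values that can overflow lower-precision TPU arithmetic.
clipped = np.clip(box_encodings, clip_min, clip_max)
print(clipped.tolist())
```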
@@ -198,7 +206,8 @@ class WeightSharedConvolutionalBoxHead(head.Head):
                box_code_size,
                kernel_size=3,
                use_depthwise=False,
-               box_encodings_clip_range=None):
+               box_encodings_clip_range=None,
+               return_flat_predictions=True):
     """Constructor.
 
     Args:
@@ -207,12 +216,18 @@ class WeightSharedConvolutionalBoxHead(head.Head):
       use_depthwise: Whether to use depthwise convolutions for prediction steps.
         Default is False.
       box_encodings_clip_range: Min and max values for clipping box_encodings.
+      return_flat_predictions: If true, returns flattened prediction tensor
+        of shape [batch, height * width * num_predictions_per_location,
+        box_code_size]. Otherwise returns the prediction tensor before
+        reshaping, whose shape is [batch, height, width,
+        num_predictions_per_location * box_code_size].
     """
     super(WeightSharedConvolutionalBoxHead, self).__init__()
     self._box_code_size = box_code_size
     self._kernel_size = kernel_size
     self._use_depthwise = use_depthwise
     self._box_encodings_clip_range = box_encodings_clip_range
+    self._return_flat_predictions = return_flat_predictions
 
   def predict(self, features, num_predictions_per_location):
     """Predicts boxes.
@@ -226,7 +241,9 @@ class WeightSharedConvolutionalBoxHead(head.Head):
     Returns:
       box_encodings: A float tensor of shape
         [batch_size, num_anchors, code_size] representing the location of
-        the objects.
+        the objects, or a float tensor of shape [batch, height, width,
+        num_predictions_per_location * box_code_size] representing grid box
+        location predictions if self._return_flat_predictions is False.
     """
     box_encodings_net = features
     if self._use_depthwise:
@@ -248,6 +265,7 @@ class WeightSharedConvolutionalBoxHead(head.Head):
       box_encodings = tf.clip_by_value(
          box_encodings, self._box_encodings_clip_range.min,
          self._box_encodings_clip_range.max)
-    box_encodings = tf.reshape(box_encodings,
-                               [batch_size, -1, self._box_code_size])
+    if self._return_flat_predictions:
+      box_encodings = tf.reshape(box_encodings,
+                                 [batch_size, -1, self._box_code_size])
     return box_encodings
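The `return_flat_predictions` flag introduced above controls whether the per-cell grid output is collapsed into a single anchor axis. A NumPy sketch of that reshape, with made-up feature-map sizes:

```python
import numpy as np

batch, height, width = 2, 4, 4
num_predictions_per_location, box_code_size = 5, 4

# Raw head output: per-cell predictions before any reshaping,
# shape [batch, height, width, num_predictions_per_location * box_code_size].
grid_preds = np.zeros(
    (batch, height, width, num_predictions_per_location * box_code_size))

# return_flat_predictions=True collapses the spatial grid into one anchor
# axis: [batch, height * width * num_predictions_per_location, box_code_size].
flat_preds = grid_preds.reshape(batch, -1, box_code_size)
print(flat_preds.shape)  # → (2, 80, 4)
```

With `return_flat_predictions=False` the caller instead receives `grid_preds` unchanged, which is what the TPU exporter consumes.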
@@ -39,7 +39,8 @@ class MaskRCNNClassHead(head.Head):
                num_class_slots,
                fc_hyperparams_fn,
                use_dropout,
-               dropout_keep_prob):
+               dropout_keep_prob,
+               scope='ClassPredictor'):
     """Constructor.
 
     Args:
@@ -53,6 +54,7 @@ class MaskRCNNClassHead(head.Head):
         in contrast to the ConvolutionalBoxPredictor below.
       dropout_keep_prob: Keep probability for dropout.
         This is only used if use_dropout is True.
+      scope: Scope name for the convolution operation.
     """
     super(MaskRCNNClassHead, self).__init__()
     self._is_training = is_training
@@ -60,6 +62,7 @@ class MaskRCNNClassHead(head.Head):
     self._fc_hyperparams_fn = fc_hyperparams_fn
     self._use_dropout = use_dropout
     self._dropout_keep_prob = dropout_keep_prob
+    self._scope = scope
 
   def predict(self, features, num_predictions_per_location=1):
     """Predicts boxes and class scores.
@@ -95,7 +98,7 @@ class MaskRCNNClassHead(head.Head):
         flattened_roi_pooled_features,
         self._num_class_slots,
         activation_fn=None,
-        scope='ClassPredictor')
+        scope=self._scope)
     class_predictions_with_background = tf.reshape(
         class_predictions_with_background,
         [-1, 1, self._num_class_slots])
@@ -113,7 +116,8 @@ class ConvolutionalClassHead(head.Head):
                kernel_size,
                apply_sigmoid_to_scores=False,
                class_prediction_bias_init=0.0,
-               use_depthwise=False):
+               use_depthwise=False,
+               scope='ClassPredictor'):
     """Constructor.
 
     Args:
@@ -135,6 +139,7 @@ class ConvolutionalClassHead(head.Head):
         conv2d layer before class prediction.
       use_depthwise: Whether to use depthwise convolutions for prediction
         steps. Default is False.
+      scope: Scope name for the convolution operation.
 
     Raises:
       ValueError: if min_depth > max_depth.
@@ -148,6 +153,7 @@ class ConvolutionalClassHead(head.Head):
     self._apply_sigmoid_to_scores = apply_sigmoid_to_scores
     self._class_prediction_bias_init = class_prediction_bias_init
     self._use_depthwise = use_depthwise
+    self._scope = scope
 
   def predict(self, features, num_predictions_per_location):
     """Predicts boxes.
@@ -167,17 +173,18 @@ class ConvolutionalClassHead(head.Head):
     if self._use_dropout:
       net = slim.dropout(net, keep_prob=self._dropout_keep_prob)
     if self._use_depthwise:
+      depthwise_scope = self._scope + '_depthwise'
       class_predictions_with_background = slim.separable_conv2d(
           net, None, [self._kernel_size, self._kernel_size],
           padding='SAME', depth_multiplier=1, stride=1,
-          rate=1, scope='ClassPredictor_depthwise')
+          rate=1, scope=depthwise_scope)
       class_predictions_with_background = slim.conv2d(
           class_predictions_with_background,
           num_predictions_per_location * self._num_class_slots, [1, 1],
           activation_fn=None,
           normalizer_fn=None,
           normalizer_params=None,
-          scope='ClassPredictor')
+          scope=self._scope)
     else:
       class_predictions_with_background = slim.conv2d(
           net,
@@ -186,7 +193,7 @@ class ConvolutionalClassHead(head.Head):
           activation_fn=None,
           normalizer_fn=None,
           normalizer_params=None,
-          scope='ClassPredictor',
+          scope=self._scope,
           biases_initializer=tf.constant_initializer(
               self._class_prediction_bias_init))
     if self._apply_sigmoid_to_scores:
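The new `scope` argument threads through to the slim layers, which name their variables `<scope>/weights` and `<scope>/biases` (the convention the `test_scope_name` tests below check for the default scope). A small hypothetical helper, not part of the library, showing the name mapping:

```python
# Hypothetical illustration of how a configurable scope argument maps to
# variable names under the slim conv2d/fully_connected naming convention.
def expected_variable_names(scope='ClassPredictor'):
    return {scope + '/weights', scope + '/biases'}

print(sorted(expected_variable_names()))
# A non-default scope lets two class heads coexist without name clashes:
print(sorted(expected_variable_names('AuxClassPredictor')))
```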
@@ -217,7 +224,9 @@ class WeightSharedConvolutionalClassHead(head.Head):
                use_dropout=False,
                dropout_keep_prob=0.8,
                use_depthwise=False,
-               score_converter_fn=tf.identity):
+               score_converter_fn=tf.identity,
+               return_flat_predictions=True,
+               scope='ClassPredictor'):
     """Constructor.
 
     Args:
@@ -232,6 +241,12 @@ class WeightSharedConvolutionalClassHead(head.Head):
         steps. Default is False.
       score_converter_fn: Callable elementwise nonlinearity (that takes tensors
         as inputs and returns tensors).
+      return_flat_predictions: If true, returns flattened prediction tensor
+        of shape [batch, height * width * num_predictions_per_location,
+        num_class_slots]. Otherwise returns the prediction tensor before
+        reshaping, whose shape is [batch, height, width,
+        num_predictions_per_location * num_class_slots].
+      scope: Scope name for the convolution operation.
     """
     super(WeightSharedConvolutionalClassHead, self).__init__()
     self._num_class_slots = num_class_slots
@@ -241,6 +256,8 @@ class WeightSharedConvolutionalClassHead(head.Head):
     self._dropout_keep_prob = dropout_keep_prob
     self._use_depthwise = use_depthwise
     self._score_converter_fn = score_converter_fn
+    self._return_flat_predictions = return_flat_predictions
+    self._scope = scope
 
   def predict(self, features, num_predictions_per_location):
     """Predicts boxes.
@@ -254,7 +271,10 @@ class WeightSharedConvolutionalClassHead(head.Head):
     Returns:
       class_predictions_with_background: A tensor of shape
         [batch_size, num_anchors, num_class_slots] representing the class
-        predictions for the proposals.
+        predictions for the proposals, or a tensor of shape [batch, height,
+        width, num_predictions_per_location * num_class_slots] representing
+        class predictions before reshaping if self._return_flat_predictions is
+        False.
     """
     class_predictions_net = features
     if self._use_dropout:
@@ -272,13 +292,15 @@ class WeightSharedConvolutionalClassHead(head.Head):
           normalizer_fn=None,
           biases_initializer=tf.constant_initializer(
              self._class_prediction_bias_init),
-          scope='ClassPredictor')
+          scope=self._scope)
     batch_size = features.get_shape().as_list()[0]
     if batch_size is None:
       batch_size = tf.shape(features)[0]
     class_predictions_with_background = self._score_converter_fn(
         class_predictions_with_background)
-    class_predictions_with_background = tf.reshape(
-        class_predictions_with_background,
-        [batch_size, -1, self._num_class_slots])
+    if self._return_flat_predictions:
+      class_predictions_with_background = tf.reshape(
+          class_predictions_with_background,
+          [batch_size, -1, self._num_class_slots])
     return class_predictions_with_background
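`score_converter_fn` defaults to `tf.identity`, so the head emits raw logits; passing an elementwise nonlinearity such as `tf.sigmoid` converts them to scores inside the graph. A NumPy sketch of what a sigmoid converter would do (illustrative values, not library code):

```python
import numpy as np

def sigmoid_score_converter(logits):
    # Elementwise nonlinearity, analogous to passing tf.sigmoid as
    # score_converter_fn in place of the tf.identity default.
    return 1.0 / (1.0 + np.exp(-logits))

# One anchor row of class logits: background, class A, class B.
logits = np.array([[0.0, 2.0, -2.0]])
scores = sigmoid_score_converter(logits)
print(np.round(scores, 3).tolist())  # → [[0.5, 0.881, 0.119]]
```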
...@@ -56,6 +56,30 @@ class MaskRCNNClassHeadTest(test_case.TestCase): ...@@ -56,6 +56,30 @@ class MaskRCNNClassHeadTest(test_case.TestCase):
features=roi_pooled_features, num_predictions_per_location=1) features=roi_pooled_features, num_predictions_per_location=1)
self.assertAllEqual([64, 1, 20], prediction.get_shape().as_list()) self.assertAllEqual([64, 1, 20], prediction.get_shape().as_list())
def test_scope_name(self):
expected_var_names = set([
"""ClassPredictor/weights""",
"""ClassPredictor/biases"""
])
g = tf.Graph()
with g.as_default():
class_prediction_head = class_head.MaskRCNNClassHead(
is_training=True,
num_class_slots=20,
fc_hyperparams_fn=self._build_arg_scope_with_hyperparams(),
use_dropout=True,
dropout_keep_prob=0.5)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
class_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
actual_variable_set = set([
var.op.name for var in g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
])
self.assertSetEqual(expected_var_names, actual_variable_set)
class ConvolutionalClassPredictorTest(test_case.TestCase):
...@@ -92,6 +116,29 @@ class ConvolutionalClassPredictorTest(test_case.TestCase):
    self.assertAllEqual([64, 323, 20],
                        class_predictions.get_shape().as_list())
def test_scope_name(self):
expected_var_names = set([
"""ClassPredictor/weights""",
"""ClassPredictor/biases"""
])
g = tf.Graph()
with g.as_default():
class_prediction_head = class_head.ConvolutionalClassHead(
is_training=True,
num_class_slots=20,
use_dropout=True,
dropout_keep_prob=0.5,
kernel_size=3)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
class_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
actual_variable_set = set([
var.op.name for var in g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
])
self.assertSetEqual(expected_var_names, actual_variable_set)
class WeightSharedConvolutionalClassPredictorTest(test_case.TestCase):
...@@ -123,6 +170,25 @@ class WeightSharedConvolutionalClassPredictorTest(test_case.TestCase):
      num_predictions_per_location=1)
    self.assertAllEqual([64, 323, 20], class_predictions.get_shape().as_list())
def test_scope_name(self):
expected_var_names = set([
"""ClassPredictor/weights""",
"""ClassPredictor/biases"""
])
g = tf.Graph()
with g.as_default():
class_prediction_head = class_head.WeightSharedConvolutionalClassHead(
num_class_slots=20)
image_feature = tf.random_uniform(
[64, 17, 19, 1024], minval=-10.0, maxval=10.0, dtype=tf.float32)
class_prediction_head.predict(
features=image_feature,
num_predictions_per_location=1)
actual_variable_set = set([
var.op.name for var in g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
])
self.assertSetEqual(expected_var_names, actual_variable_set)
if __name__ == '__main__':
  tf.test.main()
...@@ -34,7 +34,8 @@ class ConvolutionalBoxHead(head.KerasHead):
               num_predictions_per_location,
               conv_hyperparams,
               freeze_batchnorm,
               use_depthwise=False,
box_encodings_clip_range=None,
               name=None):
    """Constructor.
...@@ -55,6 +56,7 @@ class ConvolutionalBoxHead(head.KerasHead):
        params.
      use_depthwise: Whether to use depthwise convolutions for prediction
        steps. Default is False.
box_encodings_clip_range: Min and max values for clipping box_encodings.
      name: A string name scope to assign to the model. If `None`, Keras
        will auto-generate one from the class name.
...@@ -67,6 +69,7 @@ class ConvolutionalBoxHead(head.KerasHead):
    self._kernel_size = kernel_size
    self._num_predictions_per_location = num_predictions_per_location
    self._use_depthwise = use_depthwise
self._box_encodings_clip_range = box_encodings_clip_range
    self._box_encoder_layers = []
...@@ -119,6 +122,202 @@ class ConvolutionalBoxHead(head.KerasHead):
    batch_size = features.get_shape().as_list()[0]
    if batch_size is None:
      batch_size = tf.shape(features)[0]
# Clipping the box encodings to make the inference graph TPU friendly.
if self._box_encodings_clip_range is not None:
box_encodings = tf.clip_by_value(
box_encodings, self._box_encodings_clip_range.min,
self._box_encodings_clip_range.max)
    box_encodings = tf.reshape(box_encodings,
                               [batch_size, -1, 1, self._box_code_size])
    return box_encodings
class MaskRCNNBoxHead(head.KerasHead):
"""Box prediction head.
This is a piece of Mask RCNN which is responsible for predicting
just the box encodings.
Please refer to Mask RCNN paper:
https://arxiv.org/abs/1703.06870
"""
def __init__(self,
is_training,
num_classes,
fc_hyperparams,
freeze_batchnorm,
use_dropout,
dropout_keep_prob,
box_code_size,
share_box_across_classes=False,
name=None):
"""Constructor.
Args:
is_training: Indicates whether the BoxPredictor is in training mode.
num_classes: number of classes. Note that num_classes *does not*
include the background category, so if groundtruth labels take values
in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
assigned classification targets can range from {0,... K}).
fc_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for fully connected dense ops.
freeze_batchnorm: Whether to freeze batch norm parameters during
training or not. When training with a small batch size (e.g. 1), it is
desirable to freeze batch norm update and use pretrained batch norm
params.
use_dropout: Option to use dropout or not. Note that a single dropout
op is applied here prior to both box and class predictions, which stands
in contrast to the ConvolutionalBoxPredictor below.
dropout_keep_prob: Keep probability for dropout.
This is only used if use_dropout is True.
box_code_size: Size of encoding for each box.
share_box_across_classes: Whether to share boxes across classes rather
than use a different box for each class.
name: A string name scope to assign to the box head. If `None`, Keras
will auto-generate one from the class name.
"""
super(MaskRCNNBoxHead, self).__init__(name=name)
self._is_training = is_training
self._num_classes = num_classes
self._fc_hyperparams = fc_hyperparams
self._freeze_batchnorm = freeze_batchnorm
self._use_dropout = use_dropout
self._dropout_keep_prob = dropout_keep_prob
self._box_code_size = box_code_size
self._share_box_across_classes = share_box_across_classes
self._box_encoder_layers = [tf.keras.layers.Flatten()]
if self._use_dropout:
self._box_encoder_layers.append(
tf.keras.layers.Dropout(rate=1.0 - self._dropout_keep_prob))
self._number_of_boxes = 1
if not self._share_box_across_classes:
self._number_of_boxes = self._num_classes
self._box_encoder_layers.append(
tf.keras.layers.Dense(self._number_of_boxes * self._box_code_size,
name='BoxEncodingPredictor_dense'))
self._box_encoder_layers.append(
fc_hyperparams.build_batch_norm(training=(is_training and
not freeze_batchnorm),
name='BoxEncodingPredictor_batchnorm'))
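The constructor above sizes the final dense layer from `share_box_across_classes`: one box encoding per location when boxes are shared, otherwise one per class. A small sketch of the resulting output width (the class count and code size are illustrative):

```python
def box_predictor_units(num_classes, box_code_size, share_box_across_classes):
    """Output units of the final dense layer, mirroring the head's sizing logic."""
    number_of_boxes = 1 if share_box_across_classes else num_classes
    return number_of_boxes * box_code_size

# E.g. 90 classes with 4-value box codes (illustrative numbers).
print(box_predictor_units(90, 4, share_box_across_classes=False))  # -> 360
print(box_predictor_units(90, 4, share_box_across_classes=True))   # -> 4
```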
def _predict(self, features):
"""Predicts box encodings.
Args:
features: A float tensor of shape [batch_size, height, width,
channels] containing features for a batch of images.
Returns:
box_encodings: A float tensor of shape
[batch_size, 1, num_classes, code_size] representing the location of the
objects.
"""
spatial_averaged_roi_pooled_features = tf.reduce_mean(
features, [1, 2], keep_dims=True, name='AvgPool')
net = spatial_averaged_roi_pooled_features
for layer in self._box_encoder_layers:
net = layer(net)
box_encodings = tf.reshape(net,
[-1, 1,
self._number_of_boxes,
self._box_code_size])
return box_encodings
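`_predict` first averages the ROI-pooled features over the two spatial dimensions (`tf.reduce_mean` over axes 1 and 2) before the dense box encoder runs. A pure-Python sketch of that spatial averaging for a single feature map, with an illustrative 2x2, single-channel input:

```python
def spatial_average(feature_map):
    """Mean over height and width of a [height][width][channels] nested list."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    return [
        sum(feature_map[i][j][c] for i in range(height) for j in range(width))
        / (height * width)
        for c in range(channels)
    ]

# A 2x2 spatial grid with one channel collapses to one value per channel.
fmap = [[[1.0], [2.0]], [[3.0], [4.0]]]
print(spatial_average(fmap))  # -> [2.5]
```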
# TODO(b/128922690): Unify the implementations of ConvolutionalBoxHead
# and WeightSharedConvolutionalBoxHead
class WeightSharedConvolutionalBoxHead(head.KerasHead):
"""Weight shared convolutional box prediction head based on Keras.
This head allows sharing the same set of parameters (weights) when called more
than once on different feature maps.
"""
def __init__(self,
box_code_size,
num_predictions_per_location,
conv_hyperparams,
kernel_size=3,
use_depthwise=False,
box_encodings_clip_range=None,
return_flat_predictions=True,
name=None):
"""Constructor.
Args:
box_code_size: Size of encoding for each box.
num_predictions_per_location: Number of box predictions to be made per
spatial location. Int specifying number of boxes per location.
conv_hyperparams: A `hyperparams_builder.KerasLayerHyperparams` object
containing hyperparameters for convolution ops.
kernel_size: Size of final convolution kernel.
use_depthwise: Whether to use depthwise convolutions for prediction steps.
Default is False.
box_encodings_clip_range: Min and max values for clipping box_encodings.
      return_flat_predictions: If true, returns flattened prediction tensor
        of shape [batch, height * width * num_predictions_per_location,
        box_code_size]. Otherwise returns the prediction tensor before
        reshaping, whose shape is [batch, height, width,
        num_predictions_per_location * box_code_size].
name: A string name scope to assign to the model. If `None`, Keras
will auto-generate one from the class name.
"""
super(WeightSharedConvolutionalBoxHead, self).__init__(name=name)
self._box_code_size = box_code_size
self._kernel_size = kernel_size
self._num_predictions_per_location = num_predictions_per_location
self._use_depthwise = use_depthwise
self._box_encodings_clip_range = box_encodings_clip_range
self._return_flat_predictions = return_flat_predictions
self._box_encoder_layers = []
if self._use_depthwise:
self._box_encoder_layers.append(
tf.keras.layers.SeparableConv2D(
num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
padding='SAME',
name='BoxPredictor',
**conv_hyperparams.params(use_bias=True)))
else:
self._box_encoder_layers.append(
tf.keras.layers.Conv2D(
num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
padding='SAME',
name='BoxPredictor',
**conv_hyperparams.params(use_bias=True)))
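The `use_depthwise` flag swaps the final `Conv2D` for a `SeparableConv2D`, which factors the k x k convolution into a depthwise pass followed by a 1x1 pointwise pass and therefore needs far fewer weights. A rough weight count, ignoring biases, with illustrative channel sizes (256-channel features, 6 anchors x 4 box coordinates):

```python
def conv2d_weights(kernel, c_in, c_out):
    """Weight count of a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out

def separable_conv2d_weights(kernel, c_in, c_out, depth_multiplier=1):
    """Depthwise (k*k*c_in*mult) plus pointwise (c_in*mult*c_out) weights."""
    return (kernel * kernel * c_in * depth_multiplier
            + c_in * depth_multiplier * c_out)

print(conv2d_weights(3, 256, 24))            # -> 55296
print(separable_conv2d_weights(3, 256, 24))  # -> 8448
```

The roughly 6x reduction in weights is the usual motivation for depthwise heads on mobile and TPU-oriented models.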
def _predict(self, features):
"""Predicts boxes.
Args:
features: A float tensor of shape [batch_size, height, width, channels]
containing image features.
Returns:
      box_encodings: A float tensor of shape
        [batch_size, num_anchors, code_size] representing the location of the
        objects, or the tensor before reshaping, of shape [batch, height,
        width, num_predictions_per_location * box_code_size], if
        self._return_flat_predictions is False.
"""
box_encodings = features
for layer in self._box_encoder_layers:
box_encodings = layer(box_encodings)
batch_size = features.get_shape().as_list()[0]
if batch_size is None:
batch_size = tf.shape(features)[0]
# Clipping the box encodings to make the inference graph TPU friendly.
if self._box_encodings_clip_range is not None:
box_encodings = tf.clip_by_value(
box_encodings, self._box_encodings_clip_range.min,
self._box_encodings_clip_range.max)
if self._return_flat_predictions:
box_encodings = tf.reshape(box_encodings,
[batch_size, -1, self._box_code_size])
return box_encodings
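With `return_flat_predictions` enabled, the [batch, height, width, num_predictions_per_location * box_code_size] output above is flattened to [batch, num_anchors, box_code_size], where num_anchors = height * width * num_predictions_per_location. A quick element-count check of that reshape, with illustrative shapes for one feature-map level:

```python
# Illustrative shapes for one level of a weight-shared box head.
batch, height, width = 8, 19, 19
num_predictions_per_location, box_code_size = 6, 4

num_anchors = height * width * num_predictions_per_location
elements_before = batch * height * width * num_predictions_per_location * box_code_size
elements_after = batch * num_anchors * box_code_size
print(num_anchors, elements_before == elements_after)  # -> 2166 True
```

The reshape only regroups dimensions; the `-1` in the code lets TensorFlow infer `num_anchors` from the remaining element count.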