Unverified commit c627506f authored by André Araujo, committed by GitHub

DELF open-source library v2.0 (#8454)

* Merged commit includes the following changes:
253126424  by Andre Araujo:

    Scripts to compute metrics for Google Landmarks dataset.

    Also, a small fix to metric in retrieval case: avoids duplicate predicted images.

--
253118971  by Andre Araujo:

    Metrics for Google Landmarks dataset.

--
253106953  by Andre Araujo:

    Library to read files from Google Landmarks challenges.

--
250700636  by Andre Araujo:

    Handle case of aggregation extraction with empty set of input features.

--
250516819  by Andre Araujo:

    Add minimum size for DELF extractor.

--
250435822  by Andre Araujo:

    Add max_image_size/min_image_size for open-source DELF proto / module.

--
250414606  by Andre Araujo:

    Refactor extract_aggregation to allow reuse with different datasets.

--
250356863  by Andre Araujo:

    Remove unnecessary cmd_args variable from boxes_and_features_extraction.

--
249783379  by Andre Araujo:

    Create directory for writing mapping file if it does not exist.

--
249581591  by Andre Araujo:

    Refactor scripts to extract boxes and features from images in Revisited datasets.
    Also, change tf.logging.info --> print for easier logging in open source code.

--
249511821  by Andre Araujo:

    Small change to function for file/directory handling.

--
249289499  by Andre Araujo:

    Internal change.

--

PiperOrigin-RevId: 253126424

* Updating DELF init to adjust to latest changes

* Editing init files for python packages

* Edit D2R dataset reader to work with py3.

PiperOrigin-RevId: 253135576

* DELF package: fix import ordering

* Adding new requirements to setup.py

* Adding init file for training dir

* Merged commit includes the following changes:

FolderOrigin-RevId: /google/src/cloud/andrearaujo/delf_oss/google3/..

* Adding init file for training subdirs

* Working version of DELF training

* Internal change.

PiperOrigin-RevId: 253248648

* Fix variance loading in open-source code.

PiperOrigin-RevId: 260619120

* Separate image re-ranking as a standalone library, and add metric writing to dataset library.

PiperOrigin-RevId: 260998608

* Tool to read written D2R Revisited datasets metrics file. Test is added.

Also adds a unit test for previously-existing SaveMetricsFile function.

PiperOrigin-RevId: 263361410

* Add optional resize factor for feature extraction.

PiperOrigin-RevId: 264437080

* Adapt to spacing changes in NumPy's new version.

PiperOrigin-RevId: 265127245

* Make image matching function visible, and add support for RANSAC seed.

PiperOrigin-RevId: 277177468

* Avoid matplotlib failure due to missing display backend.

PiperOrigin-RevId: 287316435

* Removes tf.contrib dependency.

PiperOrigin-RevId: 288842237

* Fix tf contrib removal for feature_aggregation_extractor.

PiperOrigin-RevId: 289487669

* Merged commit includes the following changes:
309118395  by Andre Araujo:

    Make DELF open-source code compatible with TF2.

--
309067582  by Andre Araujo:

    Handle image resizing rounding properly for python extraction.

    New behavior is tested with unit tests.

--
308690144  by Andre Araujo:

    Several changes to improve DELF model/training code and make it work in TF 2.1.0:
    - Rename some files for better clarity
    - Using compat.v1 versions of functions
    - Formatting changes
    - Using more appropriate TF function names

--
308689397  by Andre Araujo:

    Internal change.

--
308341315  by Andre Araujo:

    Remove old slim dependency in DELF open-source model.

    This avoids issues with requiring old TF-v1, making it compatible with latest TF.

--
306777559  by Andre Araujo:

    Internal change

--
304505811  by Andre Araujo:

    Raise error during geometric verification if local features have different dimensionalities.

--
301739992  by Andre Araujo:

    Transform some geometric verification constants into arguments, to allow custom matching.

--
301300324  by Andre Araujo:

    Apply name change (experimental_run_v2 -> run) for all callers in TensorFlow.

--
299919057  by Andre Araujo:

    Automated refactoring to make code Python 3 compatible.

--
297953698  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297521242  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297278247  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297270405  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297238741  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297108605  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
294676131  by Andre Araujo:

    Add option to resize images to square resolutions without aspect ratio preservation.

--
293849641  by Andre Araujo:

    Internal change.

--
293840896  by Andre Araujo:

    Changing Slim import to tf_slim codebase.

--
293661660  by Andre Araujo:

    Allow the delf training script to read from TFRecords dataset.

--
291755295  by Andre Araujo:

    Internal change.

--
291448508  by Andre Araujo:

    Internal change.

--
291414459  by Andre Araujo:

    Adding train script.

--
291384336  by Andre Araujo:

    Adding model export script and test.

--
291260565  by Andre Araujo:

    Adding placeholder for Google Landmarks dataset.

--
291205548  by Andre Araujo:

    Definition of DELF model using Keras ResNet50 as backbone.

--
289500793  by Andre Araujo:

    Add TFRecord building script for delf.

--

PiperOrigin-RevId: 309118395

* Updating README, dependency versions

* Updating training README

* Fixing init import of export_model

* Fixing init import of export_model_utils

* tkinter in INSTALL_INSTRUCTIONS

* Merged commit includes the following changes:

FolderOrigin-RevId: /google/src/cloud/andrearaujo/delf_oss/google3/..

* INSTALL_INSTRUCTIONS mentioning different cloning options
parent 71d2680d
@@ -29,16 +29,36 @@ from delf import extractor

 class ExtractorTest(tf.test.TestCase, parameterized.TestCase):

   @parameterized.named_parameters(
-      ('Max-1Min-1', -1, -1, [4, 2, 3], 1.0),
-      ('Max2Min-1', 2, -1, [2, 1, 3], 0.5),
-      ('Max8Min-1', 8, -1, [4, 2, 3], 1.0),
-      ('Max-1Min1', -1, 1, [4, 2, 3], 1.0),
-      ('Max-1Min8', -1, 8, [8, 4, 3], 2.0),
-      ('Max16Min8', 16, 8, [8, 4, 3], 2.0),
-      ('Max2Min2', 2, 2, [2, 1, 3], 0.5),
+      ('Max-1Min-1', -1, -1, 1.0, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max-1Min-1Square', -1, -1, 1.0, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max2Min-1', 2, -1, 1.0, False, [2, 1, 3], [0.5, 0.5]),
+      ('Max2Min-1Square', 2, -1, 1.0, True, [2, 2, 3], [0.5, 1.0]),
+      ('Max8Min-1', 8, -1, 1.0, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max8Min-1Square', 8, -1, 1.0, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max-1Min1', -1, 1, 1.0, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max-1Min1Square', -1, 1, 1.0, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max-1Min8', -1, 8, 1.0, False, [8, 4, 3], [2.0, 2.0]),
+      ('Max-1Min8Square', -1, 8, 1.0, True, [8, 8, 3], [2.0, 4.0]),
+      ('Max16Min8', 16, 8, 1.0, False, [8, 4, 3], [2.0, 2.0]),
+      ('Max16Min8Square', 16, 8, 1.0, True, [8, 8, 3], [2.0, 4.0]),
+      ('Max2Min2', 2, 2, 1.0, False, [2, 1, 3], [0.5, 0.5]),
+      ('Max2Min2Square', 2, 2, 1.0, True, [2, 2, 3], [0.5, 1.0]),
+      ('Max-1Min-1Factor0.5', -1, -1, 0.5, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max-1Min-1Factor0.5Square', -1, -1, 0.5, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max2Min-1Factor2.0', 2, -1, 2.0, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max2Min-1Factor2.0Square', 2, -1, 2.0, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max-1Min8Factor0.5', -1, 8, 0.5, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max-1Min8Factor0.5Square', -1, 8, 0.5, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max-1Min8Factor0.25', -1, 8, 0.25, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max-1Min8Factor0.25Square', -1, 8, 0.25, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max2Min2Factor2.0', 2, 2, 2.0, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max2Min2Factor2.0Square', 2, 2, 2.0, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max16Min8Factor0.5', 16, 8, 0.5, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max16Min8Factor0.5Square', 16, 8, 0.5, True, [4, 4, 3], [1.0, 2.0]),
   )
-  def testResizeImageWorks(self, max_image_size, min_image_size, expected_shape,
-                           expected_scale_factor):
+  def testResizeImageWorks(self, max_image_size, min_image_size, resize_factor,
+                           square_output, expected_shape,
+                           expected_scale_factors):
     # Construct image of size 4x2x3.
     image = np.array([[[0, 0, 0], [1, 1, 1]], [[2, 2, 2], [3, 3, 3]],
                       [[4, 4, 4], [5, 5, 5]], [[6, 6, 6], [7, 7, 7]]],
@@ -48,9 +68,31 @@ class ExtractorTest(tf.test.TestCase, parameterized.TestCase):
     config = delf_config_pb2.DelfConfig(
         max_image_size=max_image_size, min_image_size=min_image_size)

-    resized_image, scale_factor = extractor.ResizeImage(image, config)
+    resized_image, scale_factors = extractor.ResizeImage(
+        image, config, resize_factor, square_output)
     self.assertAllEqual(resized_image.shape, expected_shape)
-    self.assertAllClose(scale_factor, expected_scale_factor)
+    self.assertAllClose(scale_factors, expected_scale_factors)
+
+  @parameterized.named_parameters(
+      ('Max2Min2', 2, 2, 1.0, False, [2, 1, 3], [0.666666, 0.5]),
+      ('Max2Min2Square', 2, 2, 1.0, True, [2, 2, 3], [0.666666, 1.0]),
+  )
+  def testResizeImageRoundingWorks(self, max_image_size, min_image_size,
+                                   resize_factor, square_output,
+                                   expected_shape, expected_scale_factors):
+    # Construct image of size 3x2x3.
+    image = np.array([[[0, 0, 0], [1, 1, 1]], [[2, 2, 2], [3, 3, 3]],
+                      [[4, 4, 4], [5, 5, 5]]],
+                     dtype='uint8')
+    # Set up config.
+    config = delf_config_pb2.DelfConfig(
+        max_image_size=max_image_size, min_image_size=min_image_size)
+
+    resized_image, scale_factors = extractor.ResizeImage(
+        image, config, resize_factor, square_output)
+    self.assertAllEqual(resized_image.shape, expected_shape)
+    self.assertAllClose(scale_factors, expected_scale_factors)

 if __name__ == '__main__':
...
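From the parameterized cases above, `ResizeImage` now takes a resize factor and a square-output flag and returns per-axis scale factors. A minimal sketch of a call, mirroring the 'Max-1Min8Square' case (argument order and expected outputs taken from the test; the config values are illustrative):

```python
import numpy as np
from delf import delf_config_pb2
from delf import extractor

image = np.zeros((4, 2, 3), dtype='uint8')
config = delf_config_pb2.DelfConfig(max_image_size=-1, min_image_size=8)
# resize_factor=1.0, square_output=True: the aspect ratio is not preserved,
# so the vertical and horizontal scale factors differ.
resized_image, scale_factors = extractor.ResizeImage(image, config, 1.0, True)
print(resized_image.shape)  # (8, 8, 3), per the test expectations above.
print(scale_factors)        # [2.0, 4.0]
```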
@@ -27,7 +27,10 @@ from __future__ import print_function
 import argparse
 import sys

-import matplotlib.image as mpimg
+import matplotlib
+# Needed before pyplot import for matplotlib to work properly.
+matplotlib.use('Agg')
+import matplotlib.image as mpimg  # pylint: disable=g-import-not-at-top
 import matplotlib.pyplot as plt
 import numpy as np
 from scipy import spatial
@@ -45,17 +48,17 @@ _DISTANCE_THRESHOLD = 0.8

 def main(unused_argv):
-  tf.logging.set_verbosity(tf.logging.INFO)
+  tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)

   # Read features.
   locations_1, _, descriptors_1, _, _ = feature_io.ReadFromFile(
       cmd_args.features_1_path)
   num_features_1 = locations_1.shape[0]
-  tf.logging.info("Loaded image 1's %d features" % num_features_1)
+  tf.compat.v1.logging.info("Loaded image 1's %d features" % num_features_1)
   locations_2, _, descriptors_2, _, _ = feature_io.ReadFromFile(
       cmd_args.features_2_path)
   num_features_2 = locations_2.shape[0]
-  tf.logging.info("Loaded image 2's %d features" % num_features_2)
+  tf.compat.v1.logging.info("Loaded image 2's %d features" % num_features_2)

   # Find nearest-neighbor matches using a KD tree.
   d1_tree = spatial.cKDTree(descriptors_1)
@@ -81,7 +84,7 @@ def main(unused_argv):
       residual_threshold=20,
       max_trials=1000)

-  tf.logging.info('Found %d inliers' % sum(inliers))
+  tf.compat.v1.logging.info('Found %d inliers' % sum(inliers))

   # Visualize correspondences, and save to file.
   _, ax = plt.subplots()
...
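The hunks above force the non-interactive Agg backend and move logging to `tf.compat.v1`; the matching step they surround is a KD-tree nearest-neighbor query thresholded by `_DISTANCE_THRESHOLD`. A self-contained sketch of that pattern, with random stand-in descriptors rather than real DELF outputs:

```python
import matplotlib
matplotlib.use('Agg')  # Must run before any pyplot import on headless hosts.
import numpy as np
from scipy import spatial

_DISTANCE_THRESHOLD = 0.8
descriptors_1 = np.random.rand(100, 40)
descriptors_2 = np.random.rand(120, 40)

d1_tree = spatial.cKDTree(descriptors_1)
_, indices = d1_tree.query(
    descriptors_2, distance_upper_bound=_DISTANCE_THRESHOLD)
# query() returns index == len(descriptors_1) when no neighbor lies within
# the distance bound, so those entries are filtered out.
num_features_1 = descriptors_1.shape[0]
matches = [(i, j) for j, i in enumerate(indices) if i != num_features_1]
print('%d putative matches' % len(matches))
```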
@@ -27,6 +27,7 @@ import tensorflow as tf

 from delf import aggregation_config_pb2

+_CLUSTER_CENTERS_VAR_NAME = "clusters"
 _NORM_SQUARED_TOLERANCE = 1e-12

 # Aliases for aggregation types.
@@ -66,10 +67,7 @@ class ExtractAggregatedRepresentation(object):
           aggregation_config.feature_dimensionality
       ])
       tf.compat.v1.train.init_from_checkpoint(
-          aggregation_config.codebook_path, {
-              tf.contrib.factorization.KMeansClustering.CLUSTER_CENTERS_VAR_NAME:
-                  codebook
-          })
+          aggregation_config.codebook_path, {_CLUSTER_CENTERS_VAR_NAME: codebook})

       # Construct extraction graph based on desired options.
       if self._aggregation_type == _VLAD:
@@ -270,7 +268,7 @@ class ExtractAggregatedRepresentation(object):
         output_vlad: VLAD descriptor updated to take into account contribution
           from ind-th feature.
       """
-      return ind + 1, tf.compat.v1.tensor_scatter_add(
+      return ind + 1, tf.tensor_scatter_nd_add(
          vlad, tf.expand_dims(selected_visual_words[ind], axis=1),
          tf.tile(
              tf.expand_dims(features[ind], axis=0), [num_assignments, 1]) -
...
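The tf.contrib-free codebook loading above hardcodes the "clusters" variable name, and the VLAD update swaps in `tf.tensor_scatter_nd_add`. A quick standalone illustration of that op's semantics (a sketch, runnable in TF2 eager mode):

```python
import tensorflow as tf

vlad = tf.zeros([4, 3])                 # One row per visual word.
indices = tf.constant([[1], [1], [3]])  # Target rows; duplicates accumulate.
updates = tf.ones([3, 3])               # One residual row per index.
print(tf.tensor_scatter_nd_add(vlad, indices, updates))
# Row 1 receives two accumulated updates, row 3 one; rows 0 and 2 stay zero.
```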
@@ -12,8 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # ==============================================================================
-"""DELF feature extractor.
-"""
+"""DELF feature extractor."""

 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
@@ -39,8 +38,8 @@ def NormalizePixelValues(image,
   Returns:
     image: a float32 tensor of the same shape as the input image.
   """
-  image = tf.to_float(image)
-  image = tf.div(tf.subtract(image, pixel_value_offset), pixel_value_scale)
+  image = tf.cast(image, dtype=tf.float32)
+  image = tf.truediv(tf.subtract(image, pixel_value_offset), pixel_value_scale)
   return image
@@ -53,6 +52,7 @@ def CalculateReceptiveBoxes(height, width, rf, stride, padding):
     rf: The receptive field size.
     stride: The effective stride between two adjacent feature points.
     padding: The effective padding size.
+
   Returns:
     rf_boxes: [N, 4] receptive boxes tensor. Here N equals to height x width.
       Each box is represented by [ymin, xmin, ymax, xmax].
@@ -60,7 +60,8 @@ def CalculateReceptiveBoxes(height, width, rf, stride, padding):
   x, y = tf.meshgrid(tf.range(width), tf.range(height))
   coordinates = tf.reshape(tf.stack([y, x], axis=2), [-1, 2])
   # [y,x,y,x]
-  point_boxes = tf.to_float(tf.concat([coordinates, coordinates], 1))
+  point_boxes = tf.cast(
+      tf.concat([coordinates, coordinates], 1), dtype=tf.float32)
   bias = [-padding, -padding, -padding + rf - 1, -padding + rf - 1]
   rf_boxes = stride * point_boxes + bias
   return rf_boxes
@@ -94,12 +95,10 @@ def ExtractKeypointDescriptor(image, layer_name, image_scales, iou,
     abs_thres: A float tensor denoting the score threshold for feature
       selection.
     model_fn: Model function. Follows the signature:
-
       * Args:
         * `images`: Image tensor which is re-scaled.
        * `normalized_image`: Whether or not the images are normalized.
        * `reuse`: Whether or not the layer and its variables should be reused.
-
       * Returns:
        * `attention`: Attention score after the non-linearity.
        * `feature_map`: Feature map obtained from the ResNet model.
@@ -117,7 +116,8 @@ def ExtractKeypointDescriptor(image, layer_name, image_scales, iou,
   Raises:
     ValueError: If the layer_name is unsupported.
   """
-  original_image_shape_float = tf.gather(tf.to_float(tf.shape(image)), [0, 1])
+  original_image_shape_float = tf.gather(
+      tf.cast(tf.shape(image), dtype=tf.float32), [0, 1])
   image_tensor = NormalizePixelValues(image)
   image_tensor = tf.expand_dims(image_tensor, 0, name='image/expand_dims')
@@ -163,8 +163,10 @@ def ExtractKeypointDescriptor(image, layer_name, image_scales, iou,
       scores: Concatenated attention score tensor with the shape of [K].
     """
     scale = tf.gather(image_scales, scale_index)
-    new_image_size = tf.to_int32(tf.round(original_image_shape_float * scale))
-    resized_image = tf.image.resize_bilinear(image_tensor, new_image_size)
+    new_image_size = tf.cast(
+        tf.round(original_image_shape_float * scale), dtype=tf.int32)
+    resized_image = tf.compat.v1.image.resize_bilinear(image_tensor,
+                                                       new_image_size)
     attention, feature_map = model_fn(
         resized_image, normalized_image=True, reuse=reuse)
@@ -254,7 +256,7 @@ def BuildModel(layer_name, attention_nonlinear, attention_type,
       Currently, only 'softplus' is supported.
     attention_type: Type of the attention used. Options are:
       'use_l2_normalized_feature' and 'use_default_input_feature'. Note that
-      this is irrelevant during inference time.
+        this is irrelevant during inference time.
     attention_kernel_size: Size of attention kernel (kernel is square).

   Returns:
@@ -268,6 +270,7 @@ def BuildModel(layer_name, attention_nonlinear, attention_type,
       images: Image tensor.
       normalized_image: Whether or not the images are normalized.
       reuse: Whether or not the layer and its variables should be reused.
+
     Returns:
       attention: Attention score after the non-linearity.
      feature_map: Feature map after ResNet convolution.
@@ -328,57 +331,72 @@ def ApplyPcaAndWhitening(data,
   return output


-def DelfFeaturePostProcessing(boxes, descriptors, config):
-  """Extract DELF features from input image.
+def PostProcessDescriptors(descriptors, use_pca, pca_parameters):
+  """Post-process descriptors.

   Args:
-    boxes: [N, 4] float tensor which denotes the selected receptive box. N is
-      the number of final feature points which pass through keypoint selection
-      and NMS steps.
     descriptors: [N, input_dim] float tensor.
-    config: DelfConfig proto with DELF extraction options.
+    use_pca: Whether to use PCA.
+    pca_parameters: DelfPcaParameters proto.

   Returns:
-    locations: [N, 2] float tensor which denotes the selected keypoint
-      locations.
-    final_descriptors: [N, output_dim] float tensor with DELF descriptors after
+    final_descriptors: [N, output_dim] float tensor with descriptors after
       normalization and (possibly) PCA/whitening.
   """
-  # Get center of descriptor boxes, corresponding to feature locations.
-  locations = CalculateKeypointCenters(boxes)
-
-  # Post-process descriptors: L2-normalize, and if desired apply PCA (followed
-  # by L2-normalization).
-  with tf.variable_scope('postprocess'):
+  # L2-normalize, and if desired apply PCA (followed by L2-normalization).
+  with tf.compat.v1.variable_scope('postprocess'):
     final_descriptors = tf.nn.l2_normalize(
-        descriptors, dim=1, name='l2_normalization')
+        descriptors, axis=1, name='l2_normalization')

-    if config.delf_local_config.use_pca:
+    if use_pca:
       # Load PCA parameters.
       pca_mean = tf.constant(
-          datum_io.ReadFromFile(
-              config.delf_local_config.pca_parameters.mean_path),
-          dtype=tf.float32)
+          datum_io.ReadFromFile(pca_parameters.mean_path), dtype=tf.float32)
       pca_matrix = tf.constant(
-          datum_io.ReadFromFile(
-              config.delf_local_config.pca_parameters.projection_matrix_path),
+          datum_io.ReadFromFile(pca_parameters.projection_matrix_path),
           dtype=tf.float32)
-      pca_dim = config.delf_local_config.pca_parameters.pca_dim
+      pca_dim = pca_parameters.pca_dim
       pca_variances = None
-      if config.delf_local_config.pca_parameters.use_whitening:
-        pca_variances = tf.constant(
-            datum_io.ReadFromFile(
-                config.delf_local_config.pca_parameters.pca_variances_path),
-            dtype=tf.float32)
+      if pca_parameters.use_whitening:
+        pca_variances = tf.squeeze(
+            tf.constant(
+                datum_io.ReadFromFile(pca_parameters.pca_variances_path),
+                dtype=tf.float32))

       # Apply PCA, and whitening if desired.
-      final_descriptors = ApplyPcaAndWhitening(
-          final_descriptors, pca_matrix, pca_mean, pca_dim,
-          config.delf_local_config.pca_parameters.use_whitening, pca_variances)
+      final_descriptors = ApplyPcaAndWhitening(final_descriptors, pca_matrix,
+                                               pca_mean, pca_dim,
+                                               pca_parameters.use_whitening,
+                                               pca_variances)

       # Re-normalize.
       final_descriptors = tf.nn.l2_normalize(
-          final_descriptors, dim=1, name='pca_l2_normalization')
+          final_descriptors, axis=1, name='pca_l2_normalization')

+  return final_descriptors
+
+
+def DelfFeaturePostProcessing(boxes, descriptors, config):
+  """Extract DELF features from input image.
+
+  Args:
+    boxes: [N, 4] float tensor which denotes the selected receptive box. N is
+      the number of final feature points which pass through keypoint selection
+      and NMS steps.
+    descriptors: [N, input_dim] float tensor.
+    config: DelfConfig proto with DELF extraction options.
+
+  Returns:
+    locations: [N, 2] float tensor which denotes the selected keypoint
+      locations.
+    final_descriptors: [N, output_dim] float tensor with DELF descriptors after
+      normalization and (possibly) PCA/whitening.
+  """
+  # Get center of descriptor boxes, corresponding to feature locations.
+  locations = CalculateKeypointCenters(boxes)
+
+  final_descriptors = PostProcessDescriptors(
+      descriptors, config.delf_local_config.use_pca,
+      config.delf_local_config.pca_parameters)

   return locations, final_descriptors
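With `PostProcessDescriptors` factored out, descriptor normalization can be reused without a full `DelfConfig`. A minimal sketch with PCA disabled; passing `None` for the unused `pca_parameters` is our own shortcut here (real callers pass `config.delf_local_config.pca_parameters`), and the sketch builds a graph since the function uses compat.v1 scopes:

```python
import tensorflow as tf
from delf import feature_extractor

graph = tf.Graph()
with graph.as_default():
  descriptors = tf.random.uniform([10, 128])
  # With use_pca=False, only L2-normalization is applied and pca_parameters
  # is never read.
  final_descriptors = feature_extractor.PostProcessDescriptors(
      descriptors, use_pca=False, pca_parameters=None)
with tf.compat.v1.Session(graph=graph) as sess:
  print(sess.run(final_descriptors).shape)  # (10, 128)
```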
@@ -34,7 +34,7 @@ class FeatureExtractorTest(tf.test.TestCase):
         image, pixel_value_offset=5.0, pixel_value_scale=2.0)
     exp_normalized_image = [[[-1.0, 125.0, -2.5], [14.5, 3.5, 0.0]],
                             [[20.0, 0.0, 30.0], [25.5, 36.0, 42.0]]]
-    with self.test_session() as sess:
+    with self.session() as sess:
       normalized_image_out = sess.run(normalized_image)
     self.assertAllEqual(normalized_image_out, exp_normalized_image)
@@ -43,7 +43,7 @@ class FeatureExtractorTest(tf.test.TestCase):
     boxes = feature_extractor.CalculateReceptiveBoxes(
         height=1, width=2, rf=291, stride=32, padding=145)
     exp_boxes = [[-145., -145., 145., 145.], [-145., -113., 145., 177.]]
-    with self.test_session() as sess:
+    with self.session() as sess:
       boxes_out = sess.run(boxes)
     self.assertAllEqual(exp_boxes, boxes_out)
@@ -52,7 +52,7 @@ class FeatureExtractorTest(tf.test.TestCase):
     boxes = [[-10.0, 0.0, 11.0, 21.0], [-2.5, 5.0, 18.5, 26.0],
              [45.0, -2.5, 66.0, 18.5]]
     centers = feature_extractor.CalculateKeypointCenters(boxes)
-    with self.test_session() as sess:
+    with self.session() as sess:
       centers_out = sess.run(centers)
     exp_centers = [[0.5, 10.5], [8.0, 15.5], [55.5, 8.0]]
@@ -72,12 +72,11 @@ class FeatureExtractorTest(tf.test.TestCase):
       del normalized_image, reuse  # Unused variables in the test.
       image_shape = tf.shape(image)
       attention = tf.squeeze(tf.norm(image, axis=3))
-      feature_map = tf.concat(
-          [
-              tf.tile(image, [1, 1, 1, 341]),
-              tf.zeros([1, image_shape[1], image_shape[2], 1])
-          ],
-          axis=3)
+      feature_map = tf.concat([
+          tf.tile(image, [1, 1, 1, 341]),
+          tf.zeros([1, image_shape[1], image_shape[2], 1])
+      ],
+                              axis=3)
       return attention, feature_map

     boxes, feature_scales, features, scores = (
@@ -99,7 +98,7 @@ class FeatureExtractorTest(tf.test.TestCase):
             axis=1))
     exp_scores = [[1.723042], [1.600781]]

-    with self.test_session() as sess:
+    with self.session() as sess:
       boxes_out, feature_scales_out, features_out, scores_out = sess.run(
           [boxes, feature_scales, features, scores])
@@ -118,16 +117,18 @@ class FeatureExtractorTest(tf.test.TestCase):
     use_whitening = True
     pca_variances = tf.constant([4.0, 1.0])

-    output = feature_extractor.ApplyPcaAndWhitening(
-        data, pca_matrix, pca_mean, output_dim, use_whitening, pca_variances)
+    output = feature_extractor.ApplyPcaAndWhitening(data, pca_matrix, pca_mean,
+                                                    output_dim, use_whitening,
+                                                    pca_variances)
     exp_output = [[2.5, -5.0], [-6.0, -2.0], [-0.5, -3.0], [1.0, -2.0]]

-    with self.test_session() as sess:
+    with self.session() as sess:
       output_out = sess.run(output)
     self.assertAllEqual(exp_output, output_out)

 if __name__ == '__main__':
+  tf.compat.v1.disable_eager_execution()
   tf.test.main()
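The test migration above swaps `self.test_session()` for `self.session()` and disables eager execution at startup so the graph-mode tests keep working under TF2. The pattern in isolation (a sketch):

```python
import tensorflow as tf

class GraphModeTest(tf.test.TestCase):

  def testRunsInSession(self):
    total = tf.constant(1.0) + tf.constant(1.0)
    with self.session() as sess:  # Replaces deprecated self.test_session().
      self.assertAllClose(sess.run(total), 2.0)

if __name__ == '__main__':
  tf.compat.v1.disable_eager_execution()
  tf.test.main()
```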
@@ -168,7 +168,7 @@ def ReadFromFile(file_path):
     attention: [N] float array with attention scores.
     orientations: [N] float array with orientations.
   """
-  with tf.gfile.FastGFile(file_path, 'rb') as f:
+  with tf.io.gfile.GFile(file_path, 'rb') as f:
     return ParseFromString(f.read())
@@ -192,5 +192,5 @@ def WriteToFile(file_path,
   """
   serialized_data = SerializeToString(locations, scales, descriptors, attention,
                                       orientations)
-  with tf.gfile.FastGFile(file_path, 'w') as f:
+  with tf.io.gfile.GFile(file_path, 'w') as f:
     f.write(serialized_data)
@@ -81,7 +81,7 @@ class DelfFeaturesIoTest(tf.test.TestCase):
   def testWriteAndReadToFile(self):
     locations, scales, descriptors, attention, orientations = create_data()

-    tmpdir = tf.test.get_temp_dir()
+    tmpdir = tf.compat.v1.test.get_temp_dir()
     filename = os.path.join(tmpdir, 'test.delf')
     feature_io.WriteToFile(filename, locations, scales, descriptors, attention,
                            orientations)
@@ -94,7 +94,7 @@ class DelfFeaturesIoTest(tf.test.TestCase):
     self.assertAllEqual(orientations, data_read[4])

   def testWriteAndReadToFileEmptyFile(self):
-    tmpdir = tf.test.get_temp_dir()
+    tmpdir = tf.compat.v1.test.get_temp_dir()
     filename = os.path.join(tmpdir, 'test.delf')
     feature_io.WriteToFile(filename, np.array([]), np.array([]), np.array([]),
                            np.array([]), np.array([]))
...
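`tf.gfile.FastGFile` no longer exists in TF2; `tf.io.gfile.GFile` is the drop-in replacement used above. A tiny round-trip sketch (the path is illustrative):

```python
import tensorflow as tf

path = '/tmp/example.delf'  # Illustrative path.
with tf.io.gfile.GFile(path, 'wb') as f:
  f.write(b'serialized-data')
with tf.io.gfile.GFile(path, 'rb') as f:
  assert f.read() == b'serialized-data'
```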
@@ -49,7 +49,7 @@ def ReadSolution(file_path, task):
   public_solution = {}
   private_solution = {}
   ignored_ids = []
-  with tf.gfile.GFile(file_path, 'r') as csv_file:
+  with tf.io.gfile.GFile(file_path, 'r') as csv_file:
     reader = csv.reader(csv_file)
     next(reader, None)  # Skip header.
     for row in reader:
@@ -108,7 +108,7 @@ def ReadPredictions(file_path, public_ids, private_ids, ignored_ids, task):
   """
   public_predictions = {}
   private_predictions = {}
-  with tf.gfile.GFile(file_path, 'r') as csv_file:
+  with tf.io.gfile.GFile(file_path, 'r') as csv_file:
     reader = csv.reader(csv_file)
     next(reader, None)  # Skip header.
     for row in reader:
...
@@ -29,8 +29,9 @@ class DatasetFileIoTest(tf.test.TestCase):

   def testReadRecognitionSolutionWorks(self):
     # Define inputs.
-    file_path = os.path.join(tf.test.get_temp_dir(), 'recognition_solution.csv')
-    with tf.gfile.GFile(file_path, 'w') as f:
+    file_path = os.path.join(tf.compat.v1.test.get_temp_dir(),
+                             'recognition_solution.csv')
+    with tf.io.gfile.GFile(file_path, 'w') as f:
       f.write('id,landmarks,Usage\n')
       f.write('0123456789abcdef,0 12,Public\n')
       f.write('0223456789abcdef,,Public\n')
@@ -60,8 +61,9 @@ class DatasetFileIoTest(tf.test.TestCase):

   def testReadRetrievalSolutionWorks(self):
     # Define inputs.
-    file_path = os.path.join(tf.test.get_temp_dir(), 'retrieval_solution.csv')
-    with tf.gfile.GFile(file_path, 'w') as f:
+    file_path = os.path.join(tf.compat.v1.test.get_temp_dir(),
+                             'retrieval_solution.csv')
+    with tf.io.gfile.GFile(file_path, 'w') as f:
       f.write('id,images,Usage\n')
       f.write('0123456789abcdef,None,Ignored\n')
       f.write('0223456789abcdef,fedcba9876543210 fedcba9876543200,Public\n')
@@ -91,9 +93,9 @@ class DatasetFileIoTest(tf.test.TestCase):

   def testReadRecognitionPredictionsWorks(self):
     # Define inputs.
-    file_path = os.path.join(tf.test.get_temp_dir(),
+    file_path = os.path.join(tf.compat.v1.test.get_temp_dir(),
                              'recognition_predictions.csv')
-    with tf.gfile.GFile(file_path, 'w') as f:
+    with tf.io.gfile.GFile(file_path, 'w') as f:
       f.write('id,landmarks\n')
       f.write('0123456789abcdef,12 0.1 \n')
       f.write('0423456789abcdef,0 19.0\n')
@@ -129,9 +131,9 @@ class DatasetFileIoTest(tf.test.TestCase):

   def testReadRetrievalPredictionsWorks(self):
     # Define inputs.
-    file_path = os.path.join(tf.test.get_temp_dir(),
+    file_path = os.path.join(tf.compat.v1.test.get_temp_dir(),
                              'retrieval_predictions.csv')
-    with tf.gfile.GFile(file_path, 'w') as f:
+    with tf.io.gfile.GFile(file_path, 'w') as f:
       f.write('id,images\n')
       f.write('0123456789abcdef,fedcba9876543250 \n')
       f.write('0423456789abcdef,fedcba9876543260\n')
...
# DELF training instructions
## Data preparation
See the
[build_image_dataset.py](https://github.com/andrefaraujo/models/blob/master/research/delf/delf/python/training/build_image_dataset.py)
script to prepare the data: follow the instructions therein to download the
dataset (via Kaggle), then run the script.
## Running training
Assuming the data was downloaded to `/tmp/gld_tfrecord/`, running the following
command should start training a model:
```sh
python tensorflow_models/research/delf/delf/python/training/train.py \
--train_file_pattern=/tmp/gld_tfrecord/train* \
--validation_file_pattern=/tmp/gld_tfrecord/train* \
--debug
```
Note that you may want to split the train TFRecords into separate train and
validation sets (we usually simply split them 80/20 at random); see the sketch
below.
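
One way to do that split at the shard level is sketched below; the
`validation-*` naming is illustrative, not something the scripts produce
themselves:

```python
# Sketch: move ~20% of the train shards to validation-* names.
import random
import tensorflow as tf

shards = sorted(tf.io.gfile.glob('/tmp/gld_tfrecord/train-*'))
random.seed(0)
random.shuffle(shards)
for path in shards[:len(shards) // 5]:
  tf.io.gfile.rename(path, path.replace('train-', 'validation-'))
```

After this, `--validation_file_pattern=/tmp/gld_tfrecord/validation*` can be
passed to the training command above.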
# Copyright 2020 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Module for DELF training."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# pylint: disable=unused-import
from delf.python.training import build_image_dataset
# pylint: enable=unused-import
#!/usr/bin/python
# Copyright 2020 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Converts landmark image data to TFRecords file format with Example protos.
The image data set is expected to reside in JPEG files ending in '.jpg'.
This script assumes you have downloaded the images using the provided script:
https://www.kaggle.com/tobwey/landmark-recognition-challenge-image-downloader
This script converts the training and testing data into
a sharded data set consisting of TFRecord files
train_directory/train-00000-of-00128
train_directory/train-00001-of-00128
...
train_directory/train-00127-of-00128
and
test_directory/test-00000-of-00128
test_directory/test-00001-of-00128
...
test_directory/test-00127-of-00128
where we have selected 128 shards for both data sets. Each record
within the TFRecord file is a serialized Example proto. The Example proto
contains the following fields:
image/encoded: string containing JPEG encoded image in RGB colorspace
image/height: integer, image height in pixels
image/width: integer, image width in pixels
image/colorspace: string, specifying the colorspace, always 'RGB'
image/channels: integer, specifying the number of channels, always 3
image/format: string, specifying the format, always 'JPEG'
image/filename: string, the unique id of the image file
e.g. '97c0a12e07ae8dd5' or '650c989dd3493748'
Furthermore, if the data set type is training, it would contain one more field:
image/class/label: integer, the landmark_id from the input training csv file.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl import app
from absl import flags
import numpy as np
import pandas as pd
import tensorflow as tf
FLAGS = flags.FLAGS
flags.DEFINE_string('train_directory', '/tmp/', 'Training data directory.')
flags.DEFINE_string('test_directory', '/tmp/', 'Testing data directory.')
flags.DEFINE_string('output_directory', '/tmp/', 'Output data directory.')
flags.DEFINE_string('train_csv_path', '/tmp/train.csv',
'Training data csv file path.')
flags.DEFINE_string('test_csv_path', '/tmp/test.csv',
'Testing data csv file path.')
flags.DEFINE_integer('num_shards', 128, 'Number of shards in output data.')
def _get_image_files_and_labels(name, csv_path, image_dir):
"""Process input and get the image file paths, image ids and the labels.
Args:
name: 'train' or 'test'.
csv_path: path to the Google-landmark Dataset csv Data Sources files.
image_dir: directory that stores downloaded images.
Returns:
image_paths: the paths to all images in the image_dir.
file_ids: the unique ids of images.
labels: the landmark ids of all images. When name='test', the returned labels
will be an empty list.
Raises:
ValueError: if input name is not supported.
"""
image_paths = tf.io.gfile.glob(image_dir + '/*.jpg')
file_ids = [os.path.basename(os.path.normpath(f))[:-4] for f in image_paths]
if name == 'train':
with tf.io.gfile.GFile(csv_path, 'rb') as csv_file:
df = pd.read_csv(csv_file)
df = df.set_index('id')
labels = [int(df.loc[fid]['landmark_id']) for fid in file_ids]
elif name == 'test':
labels = []
else:
raise ValueError('Unsupported dataset split name: %s' % name)
return image_paths, file_ids, labels
def _process_image(filename):
"""Process a single image file.
Args:
filename: string, path to an image file e.g., '/path/to/example.jpg'.
Returns:
image_buffer: string, JPEG encoding of RGB image.
height: integer, image height in pixels.
width: integer, image width in pixels.
Raises:
ValueError: if parsed image has wrong number of dimensions or channels.
"""
# Read the image file.
with tf.io.gfile.GFile(filename, 'rb') as f:
image_data = f.read()
# Decode the RGB JPEG.
image = tf.io.decode_jpeg(image_data, channels=3)
  # Check that the image was converted to RGB.
  if len(image.shape) != 3:
    raise ValueError('The parsed image number of dimensions is not 3 but %d' %
                     len(image.shape))
height = image.shape[0]
width = image.shape[1]
if image.shape[2] != 3:
raise ValueError('The parsed image channels is not 3 but %d' %
(image.shape[2]))
return image_data, height, width
def _int64_feature(value):
"""Returns an int64_list from a bool / enum / int / uint."""
return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
def _bytes_feature(value):
"""Returns a bytes_list from a string / byte."""
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def _convert_to_example(file_id, image_buffer, height, width, label=None):
"""Build an Example proto for the given inputs.
Args:
file_id: string, unique id of an image file, e.g., '97c0a12e07ae8dd5'.
image_buffer: string, JPEG encoding of RGB image.
height: integer, image height in pixels.
width: integer, image width in pixels.
label: integer, the landmark id and prediction label.
Returns:
Example proto.
"""
colorspace = 'RGB'
channels = 3
image_format = 'JPEG'
features = {
'image/height': _int64_feature(height),
'image/width': _int64_feature(width),
'image/colorspace': _bytes_feature(colorspace.encode('utf-8')),
'image/channels': _int64_feature(channels),
'image/format': _bytes_feature(image_format.encode('utf-8')),
'image/id': _bytes_feature(file_id.encode('utf-8')),
'image/encoded': _bytes_feature(image_buffer)
}
if label is not None:
features['image/class/label'] = _int64_feature(label)
example = tf.train.Example(features=tf.train.Features(feature=features))
return example
def _write_tfrecord(output_prefix, image_paths, file_ids, labels):
"""Read image files and write image and label data into TFRecord files.
Args:
output_prefix: string, the prefix of output files, e.g. 'train'.
image_paths: list of strings, the paths to images to be converted.
file_ids: list of strings, the image unique ids.
labels: list of integers, the landmark ids of images. It is an empty list
when output_prefix='test'.
Raises:
ValueError: if the length of input images, ids and labels don't match
"""
if output_prefix == 'test':
labels = [None] * len(image_paths)
if not len(image_paths) == len(file_ids) == len(labels):
    raise ValueError('length of image_paths, file_ids, labels should be the' +
' same. But they are %d, %d, %d, respectively' %
(len(image_paths), len(file_ids), len(labels)))
  spacing = np.linspace(0, len(image_paths), FLAGS.num_shards + 1, dtype=int)
for shard in range(FLAGS.num_shards):
output_file = os.path.join(
FLAGS.output_directory,
'%s-%.5d-of-%.5d' % (output_prefix, shard, FLAGS.num_shards))
writer = tf.io.TFRecordWriter(output_file)
print('Processing shard ', shard, ' and writing file ', output_file)
for i in range(spacing[shard], spacing[shard + 1]):
image_buffer, height, width = _process_image(image_paths[i])
example = _convert_to_example(file_ids[i], image_buffer, height, width,
labels[i])
writer.write(example.SerializeToString())
writer.close()
def _build_tfrecord_dataset(name, csv_path, image_dir):
"""Build a TFRecord dataset.
Args:
name: 'train' or 'test' to indicate which set of data to be processed.
csv_path: path to the Google-landmark Dataset csv Data Sources files.
image_dir: directory that stores downloaded images.
Returns:
Nothing. After the function call, sharded TFRecord files are materialized.
"""
image_paths, file_ids, labels = _get_image_files_and_labels(
name, csv_path, image_dir)
_write_tfrecord(name, image_paths, file_ids, labels)
def main(unused_argv):
_build_tfrecord_dataset('train', FLAGS.train_csv_path, FLAGS.train_directory)
_build_tfrecord_dataset('test', FLAGS.test_csv_path, FLAGS.test_directory)
if __name__ == '__main__':
app.run(main)
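A sketch of reading back one record produced by this script, using the Example fields documented in the module docstring (the file path is illustrative; assumes TF2 eager execution):

```python
import tensorflow as tf

dataset = tf.data.TFRecordDataset('/tmp/train-00000-of-00128')
features = {
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    'image/height': tf.io.FixedLenFeature([], tf.int64),
    'image/width': tf.io.FixedLenFeature([], tf.int64),
    # Present only for the training split.
    'image/class/label': tf.io.FixedLenFeature([], tf.int64, default_value=0),
}
for record in dataset.take(1):
  parsed = tf.io.parse_single_example(record, features)
  image = tf.io.decode_jpeg(parsed['image/encoded'], channels=3)
  print(image.shape, parsed['image/class/label'].numpy())
```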
# Copyright 2020 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Module exposing datasets for training."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# pylint: disable=unused-import
from delf.python.training.datasets import googlelandmarks
# pylint: enable=unused-import
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Google Landmarks Dataset(GLD).
Placeholder for Google Landmarks dataset.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import tensorflow as tf
class _DataAugmentationParams(object):
"""Default parameters for augmentation."""
# The following are used for training.
min_object_covered = 0.1
aspect_ratio_range_min = 3. / 4
aspect_ratio_range_max = 4. / 3
area_range_min = 0.08
area_range_max = 1.0
max_attempts = 100
update_labels = False
# 'central_fraction' is used for central crop in inference.
central_fraction = 0.875
random_reflection = False
input_rows = 321
input_cols = 321
def NormalizeImages(images, pixel_value_scale=0.5, pixel_value_offset=0.5):
"""Normalize pixel values in image.
Output is computed as
normalized_images = (images - pixel_value_offset) / pixel_value_scale.
Args:
images: `Tensor`, images to normalize.
pixel_value_scale: float, scale.
pixel_value_offset: float, offset.
Returns:
normalized_images: `Tensor`, normalized images.
"""
images = tf.cast(images, tf.float32)
normalized_images = tf.math.divide(
tf.subtract(images, pixel_value_offset), pixel_value_scale)
return normalized_images
def _ImageNetCrop(image):
"""Imagenet-style crop with random bbox and aspect ratio.
Args:
image: a `Tensor`, image to crop.
Returns:
cropped_image: `Tensor`, cropped image.
"""
params = _DataAugmentationParams()
bbox = tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4])
(bbox_begin, bbox_size, _) = tf.image.sample_distorted_bounding_box(
tf.shape(image),
bounding_boxes=bbox,
min_object_covered=params.min_object_covered,
aspect_ratio_range=(params.aspect_ratio_range_min,
params.aspect_ratio_range_max),
area_range=(params.area_range_min, params.area_range_max),
max_attempts=params.max_attempts,
use_image_if_no_bounding_boxes=True)
cropped_image = tf.slice(image, bbox_begin, bbox_size)
cropped_image.set_shape([None, None, 3])
cropped_image = tf.image.resize(
cropped_image, [params.input_rows, params.input_cols], method='area')
if params.random_reflection:
cropped_image = tf.image.random_flip_left_right(cropped_image)
return cropped_image
def _ParseFunction(example, name_to_features, image_size, augmentation):
"""Parse a single TFExample to get the image and label and process the image.
Args:
example: a `TFExample`.
name_to_features: a `dict`. The mapping from feature names to its type.
image_size: an `int`. The image size for the decoded image, on each side.
augmentation: a `boolean`. True if the image will be augmented.
Returns:
image: a `Tensor`. The processed image.
label: a `Tensor`. The ground-truth label.
"""
parsed_example = tf.io.parse_single_example(example, name_to_features)
# Parse to get image.
image = parsed_example['image/encoded']
image = tf.io.decode_jpeg(image)
if augmentation:
image = _ImageNetCrop(image)
else:
image = tf.image.resize(image, [image_size, image_size])
image.set_shape([image_size, image_size, 3])
# Parse to get label.
label = parsed_example['image/class/label']
return image, label
def CreateDataset(file_pattern,
image_size=321,
batch_size=32,
augmentation=False,
seed=0):
"""Creates a dataset.
Args:
file_pattern: str, file pattern of the dataset files.
image_size: int, image size.
batch_size: int, batch size.
augmentation: bool, whether to apply augmentation.
seed: int, seed for shuffling the dataset.
Returns:
tf.data.TFRecordDataset.
"""
filenames = tf.io.gfile.glob(file_pattern)
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.repeat().shuffle(buffer_size=100, seed=seed)
# Create a description of the features.
feature_description = {
'image/height': tf.io.FixedLenFeature([], tf.int64, default_value=0),
'image/width': tf.io.FixedLenFeature([], tf.int64, default_value=0),
'image/channels': tf.io.FixedLenFeature([], tf.int64, default_value=0),
'image/format': tf.io.FixedLenFeature([], tf.string, default_value=''),
'image/filename': tf.io.FixedLenFeature([], tf.string, default_value=''),
'image/encoded': tf.io.FixedLenFeature([], tf.string, default_value=''),
'image/class/label': tf.io.FixedLenFeature([], tf.int64, default_value=0),
}
customized_parse_func = functools.partial(
_ParseFunction,
name_to_features=feature_description,
image_size=image_size,
augmentation=augmentation)
dataset = dataset.map(customized_parse_func)
dataset = dataset.batch(batch_size)
return dataset
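Putting `CreateDataset` to use (a sketch; the file pattern is illustrative, and since the dataset repeats indefinitely, iteration must be bounded explicitly):

```python
import tensorflow as tf
from delf.python.training.datasets import googlelandmarks

dataset = googlelandmarks.CreateDataset(
    '/tmp/gld_tfrecord/train*', image_size=321, batch_size=32,
    augmentation=False)
images, labels = next(iter(dataset))
print(images.shape, labels.shape)  # (32, 321, 321, 3) (32,)
```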
# Copyright 2020 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""DELF model module, used for training and exporting."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# pylint: disable=unused-import
from delf.python.training.model import delf_model
from delf.python.training.model import export_model_utils
from delf.python.training.model import resnet50
# pylint: enable=unused-import
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""DELF model implementation based on the following paper.
Large-Scale Image Retrieval with Attentive Deep Local Features
https://arxiv.org/abs/1612.06321
"""
import tensorflow as tf
from delf.python.training.model import resnet50 as resnet
layers = tf.keras.layers
reg = tf.keras.regularizers
_DECAY = 0.0001
class AttentionModel(tf.keras.Model):
"""Instantiates attention model.
Uses two [kernel_size x kernel_size] convolutions and softplus as activation
to compute an attention map with the same resolution as the featuremap.
Features are l2-normalized and aggregated using attention probabilities as
weights.
"""
def __init__(self, kernel_size=1, decay=_DECAY, name='attention'):
"""Initialization of attention model.
Args:
kernel_size: int, kernel size of convolutions.
decay: float, decay for l2 regularization of kernel weights.
name: str, name to identify model.
"""
super(AttentionModel, self).__init__(name=name)
# First convolutional layer (called with relu activation).
self.conv1 = layers.Conv2D(
512,
kernel_size,
kernel_regularizer=reg.l2(decay),
padding='same',
name='attn_conv1')
self.bn_conv1 = layers.BatchNormalization(axis=3, name='bn_conv1')
# Second convolutional layer, with softplus activation.
self.conv2 = layers.Conv2D(
1,
kernel_size,
kernel_regularizer=reg.l2(decay),
padding='same',
name='attn_conv2')
self.activation_layer = layers.Activation('softplus')
def call(self, inputs, training=True):
x = self.conv1(inputs)
x = self.bn_conv1(x, training=training)
x = tf.nn.relu(x)
score = self.conv2(x)
prob = self.activation_layer(score)
# L2-normalize the featuremap before pooling.
inputs = tf.nn.l2_normalize(inputs, axis=-1)
feat = tf.reduce_mean(tf.multiply(inputs, prob), [1, 2], keepdims=False)
return feat, prob, score
class Delf(tf.keras.Model):
"""Instantiates Keras DELF model using ResNet50 as backbone.
This class implements the [DELF](https://arxiv.org/abs/1612.06321) model for
extracting local features from images. The backbone is a ResNet50 network
that extracts featuremaps from both conv_4 and conv_5 layers. Activations
from conv_4 are used to compute an attention map of the same resolution.
"""
def __init__(self, block3_strides=True, name='DELF'):
"""Initialization of DELF model.
Args:
block3_strides: bool, whether to add strides to the output of block3.
name: str, name to identify model.
"""
super(Delf, self).__init__(name=name)
# Backbone using Keras ResNet50.
self.backbone = resnet.ResNet50(
'channels_last',
name='backbone',
include_top=False,
pooling='avg',
block3_strides=block3_strides,
average_pooling=False)
# Attention model.
self.attention = AttentionModel(name='attention')
# Define classifiers for training backbone and attention models.
def init_classifiers(self, num_classes):
self.num_classes = num_classes
self.desc_classification = layers.Dense(
num_classes, activation=None, kernel_regularizer=None, name='desc_fc')
self.attn_classification = layers.Dense(
num_classes, activation=None, kernel_regularizer=None, name='att_fc')
# Weights to optimize for descriptor fine tuning.
@property
def desc_trainable_weights(self):
return (self.backbone.trainable_weights +
self.desc_classification.trainable_weights)
# Weights to optimize for attention model training.
@property
def attn_trainable_weights(self):
return (self.attention.trainable_weights +
self.attn_classification.trainable_weights)
def call(self, input_image, training=True):
blocks = {'block3': None}
self.backbone(input_image, intermediates_dict=blocks, training=training)
features = blocks['block3']
_, probs, _ = self.attention(features, training=training)
return probs, features
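# A minimal usage sketch of the Delf model (illustrative; batch and image
# sizes are arbitrary assumptions, not requirements):
#
#   model = Delf(block3_strides=True, name='DELF')
#   model.init_classifiers(num_classes=1000)
#   images = tf.random.uniform((2, 321, 321, 3))
#   probs, features = model(images, training=False)
#   # probs: attention probabilities computed over the block3 feature map.
#   # features: the block3 feature map itself.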
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for the DELF model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import tensorflow as tf
from delf.python.training.model import delf_model
class DelfTest(tf.test.TestCase, parameterized.TestCase):
@parameterized.named_parameters(
('block3_stridesTrue', True),
('block3_stridesFalse', False),
)
def test_build_model(self, block3_strides):
image_size = 321
num_classes = 1000
batch_size = 2
input_shape = (batch_size, image_size, image_size, 3)
model = delf_model.Delf(block3_strides=block3_strides, name='DELF')
model.init_classifiers(num_classes)
images = tf.random.uniform(input_shape, minval=-1.0, maxval=1.0, seed=0)
blocks = {}
# Get global feature by pooling block4 features.
desc_prelogits = model.backbone(
images, intermediates_dict=blocks, training=False)
desc_logits = model.desc_classification(desc_prelogits)
self.assertAllEqual(desc_prelogits.shape, (batch_size, 2048))
self.assertAllEqual(desc_logits.shape, (batch_size, num_classes))
features = blocks['block3']
attn_prelogits, _, _ = model.attention(features)
attn_logits = model.attn_classification(attn_prelogits)
self.assertAllEqual(attn_prelogits.shape, (batch_size, 1024))
self.assertAllEqual(attn_logits.shape, (batch_size, num_classes))
@parameterized.named_parameters(
('block3_stridesTrue', True),
('block3_stridesFalse', False),
)
def test_train_step(self, block3_strides):
image_size = 321
num_classes = 1000
batch_size = 2
clip_val = 10.0
input_shape = (batch_size, image_size, image_size, 3)
model = delf_model.Delf(block3_strides=block3_strides, name='DELF')
model.init_classifiers(num_classes)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
images = tf.random.uniform(input_shape, minval=0.0, maxval=1.0, seed=0)
labels = tf.random.uniform((batch_size,),
minval=0,
maxval=model.num_classes - 1,
dtype=tf.int64)
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
def compute_loss(labels, predictions):
per_example_loss = loss_object(labels, predictions)
return tf.nn.compute_average_loss(
per_example_loss, global_batch_size=batch_size)
with tf.GradientTape() as desc_tape:
blocks = {}
desc_prelogits = model.backbone(
images, intermediates_dict=blocks, training=False)
desc_logits = model.desc_classification(desc_prelogits)
desc_loss = compute_loss(labels, desc_logits)
gradients = desc_tape.gradient(desc_loss, model.desc_trainable_weights)
clipped, _ = tf.clip_by_global_norm(gradients, clip_norm=clip_val)
optimizer.apply_gradients(zip(clipped, model.desc_trainable_weights))
with tf.GradientTape() as attn_tape:
block3 = blocks['block3']
block3 = tf.stop_gradient(block3)
attn_prelogits, _, _ = model.attention(block3, training=True)
attn_logits = model.attn_classification(attn_prelogits)
attn_loss = compute_loss(labels, attn_logits)
gradients = attn_tape.gradient(attn_loss, model.attn_trainable_weights)
clipped, _ = tf.clip_by_global_norm(gradients, clip_norm=clip_val)
optimizer.apply_gradients(zip(clipped, model.attn_trainable_weights))
if __name__ == '__main__':
tf.test.main()
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Export DELF tensorflow inference model.
This model includes feature extraction, receptive field calculation and
keypoint selection, and outputs the selected feature descriptors.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl import app
from absl import flags
import tensorflow as tf
from delf.python.training.model import delf_model
from delf.python.training.model import export_model_utils
FLAGS = flags.FLAGS
flags.DEFINE_string('ckpt_path', '/tmp/delf-logdir/delf-weights',
'Path to saved checkpoint.')
flags.DEFINE_string('export_path', None, 'Path where model will be exported.')
flags.DEFINE_boolean('block3_strides', False,
'Whether to apply strides after block3.')
flags.DEFINE_float('iou', 1.0, 'IOU for non-max suppression.')
def _build_tensor_info(tensor_dict):
"""Replace the dict's value by the tensor info.
Args:
tensor_dict: A dictionary contains <string, tensor>.
Returns:
dict: New dictionary contains <string, tensor_info>.
"""
return {
k: tf.compat.v1.saved_model.utils.build_tensor_info(t)
for k, t in tensor_dict.items()
}
def main(argv):
if len(argv) > 1:
raise app.UsageError('Too many command-line arguments.')
export_path = FLAGS.export_path
if os.path.exists(export_path):
    raise ValueError('export_path already exists.')
with tf.Graph().as_default() as g, tf.compat.v1.Session(graph=g) as sess:
# Setup the DELF model for extraction.
model = delf_model.Delf(block3_strides=FLAGS.block3_strides, name='DELF')
# Initial forward pass to build model.
images = tf.zeros((1, 321, 321, 3), dtype=tf.float32)
model(images)
stride_factor = 2.0 if FLAGS.block3_strides else 1.0
# Setup the multiscale keypoint extraction.
input_image = tf.compat.v1.placeholder(
tf.uint8, shape=(None, None, 3), name='input_image')
input_abs_thres = tf.compat.v1.placeholder(
tf.float32, shape=(), name='input_abs_thres')
input_scales = tf.compat.v1.placeholder(
tf.float32, shape=[None], name='input_scales')
input_max_feature_num = tf.compat.v1.placeholder(
tf.int32, shape=(), name='input_max_feature_num')
extracted_features = export_model_utils.ExtractLocalFeatures(
input_image, input_scales, input_max_feature_num, input_abs_thres,
FLAGS.iou, lambda x: model(x, training=False), stride_factor)
# Load the weights.
checkpoint_path = FLAGS.ckpt_path
model.load_weights(checkpoint_path)
    print('Checkpoint loaded from', checkpoint_path)
named_input_tensors = {
'input_image': input_image,
'input_scales': input_scales,
'input_abs_thres': input_abs_thres,
'input_max_feature_num': input_max_feature_num,
}
# Outputs to the exported model.
named_output_tensors = {}
named_output_tensors['boxes'] = tf.identity(
extracted_features[0], name='boxes')
named_output_tensors['features'] = tf.identity(
extracted_features[1], name='features')
named_output_tensors['scales'] = tf.identity(
extracted_features[2], name='scales')
named_output_tensors['scores'] = tf.identity(
extracted_features[3], name='scores')
# Export the model.
signature_def = tf.compat.v1.saved_model.signature_def_utils.build_signature_def(
inputs=_build_tensor_info(named_input_tensors),
outputs=_build_tensor_info(named_output_tensors))
print('Exporting trained model to:', export_path)
builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(export_path)
init_op = None
builder.add_meta_graph_and_variables(
sess, [tf.compat.v1.saved_model.tag_constants.SERVING],
signature_def_map={
tf.compat.v1.saved_model.signature_constants
.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
signature_def
},
main_op=init_op)
builder.save()
if __name__ == '__main__':
app.run(main)
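# A hedged sketch of consuming the exported model (not part of this script;
# '/tmp/delf_export' and `image_np` are hypothetical). Tensor names follow
# the placeholders and identities defined above:
#
#   with tf.compat.v1.Session(graph=tf.Graph()) as sess:
#     tf.compat.v1.saved_model.loader.load(
#         sess, [tf.compat.v1.saved_model.tag_constants.SERVING],
#         '/tmp/delf_export')
#     boxes, features, scales, scores = sess.run(
#         ['boxes:0', 'features:0', 'scales:0', 'scores:0'],
#         feed_dict={
#             'input_image:0': image_np,  # uint8 numpy array, [h, w, 3].
#             'input_scales:0': [0.5, 1.0, 2.0],
#             'input_abs_thres:0': 100.0,
#             'input_max_feature_num:0': 1000,
#         })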
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Helper functions for DELF model exporting."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from delf import feature_extractor
from delf.python.training.datasets import googlelandmarks as gld
from object_detection.core import box_list
from object_detection.core import box_list_ops
def ExtractLocalFeatures(image, image_scales, max_feature_num, abs_thres, iou,
attention_model_fn, stride_factor):
"""Extract local features for input image.
Args:
image: image tensor of type tf.uint8 with shape [h, w, channels].
image_scales: 1D float tensor which contains float scales used for image
pyramid construction.
    max_feature_num: int tensor denoting the maximum number of selected
      feature points.
    abs_thres: float tensor denoting the score threshold for feature selection.
    iou: float scalar denoting the IoU threshold for NMS.
attention_model_fn: model function. Follows the signature:
* Args:
* `images`: Image tensor which is re-scaled.
* Returns:
* `attention_prob`: attention map after the non-linearity.
* `feature_map`: feature map after ResNet convolution.
stride_factor: integer accounting for striding after block3.
Returns:
    boxes: [N, 4] float tensor denoting the selected receptive field boxes.
      N is the number of final feature points which pass the keypoint
      selection and NMS steps.
features: [N, depth] float tensor.
feature_scales: [N] float tensor. It is the inverse of the input image
scales such that larger image scales correspond to larger image regions,
which is compatible with keypoints detected with other techniques, for
example Congas.
    scores: [N, 1] float tensor denoting the attention scores.
"""
original_image_shape_float = tf.gather(
tf.dtypes.cast(tf.shape(image), tf.float32), [0, 1])
image_tensor = gld.NormalizeImages(
image, pixel_value_offset=128.0, pixel_value_scale=128.0)
image_tensor = tf.expand_dims(image_tensor, 0, name='image/expand_dims')
# Hard code the feature depth and receptive field parameters for now.
rf, stride, padding = [291.0, 16.0 * stride_factor, 145.0]
feature_depth = 1024
def _ProcessSingleScale(scale_index, boxes, features, scales, scores):
"""Resizes the image and run feature extraction and keypoint selection.
This function will be passed into tf.while_loop() and be called
repeatedly. The input boxes are collected from the previous iteration
[0: scale_index -1]. We get the current scale by
image_scales[scale_index], and run resize image, feature extraction and
keypoint selection. Then we will get a new set of selected_boxes for
current scale. In the end, we concat the previous boxes with current
selected_boxes as the output.
Args:
scale_index: A valid index in the image_scales.
boxes: Box tensor with the shape of [N, 4].
features: Feature tensor with the shape of [N, depth].
scales: Scale tensor with the shape of [N].
scores: Attention score tensor with the shape of [N].
Returns:
scale_index: The next scale index for processing.
boxes: Concatenated box tensor with the shape of [K, 4]. K >= N.
features: Concatenated feature tensor with the shape of [K, depth].
scales: Concatenated scale tensor with the shape of [K].
scores: Concatenated score tensor with the shape of [K].
"""
scale = tf.gather(image_scales, scale_index)
new_image_size = tf.dtypes.cast(
tf.round(original_image_shape_float * scale), tf.int32)
resized_image = tf.image.resize(image_tensor, new_image_size)
attention_prob, feature_map = attention_model_fn(resized_image)
attention_prob = tf.squeeze(attention_prob, axis=[0])
feature_map = tf.squeeze(feature_map, axis=[0])
rf_boxes = feature_extractor.CalculateReceptiveBoxes(
tf.shape(feature_map)[0],
tf.shape(feature_map)[1], rf, stride, padding)
# Re-project back to the original image space.
rf_boxes = tf.divide(rf_boxes, scale)
attention_prob = tf.reshape(attention_prob, [-1])
feature_map = tf.reshape(feature_map, [-1, feature_depth])
# Use attention score to select feature vectors.
indices = tf.reshape(tf.where(attention_prob >= abs_thres), [-1])
selected_boxes = tf.gather(rf_boxes, indices)
selected_features = tf.gather(feature_map, indices)
selected_scores = tf.gather(attention_prob, indices)
selected_scales = tf.ones_like(selected_scores, tf.float32) / scale
# Concat with the previous result from different scales.
boxes = tf.concat([boxes, selected_boxes], 0)
features = tf.concat([features, selected_features], 0)
scales = tf.concat([scales, selected_scales], 0)
scores = tf.concat([scores, selected_scores], 0)
return scale_index + 1, boxes, features, scales, scores
output_boxes = tf.zeros([0, 4], dtype=tf.float32)
output_features = tf.zeros([0, feature_depth], dtype=tf.float32)
output_scales = tf.zeros([0], dtype=tf.float32)
output_scores = tf.zeros([0], dtype=tf.float32)
  # Process the first scale separately; the following scales reuse the graph
  # variables.
(_, output_boxes, output_features, output_scales,
output_scores) = _ProcessSingleScale(0, output_boxes, output_features,
output_scales, output_scores)
i = tf.constant(1, dtype=tf.int32)
num_scales = tf.shape(image_scales)[0]
keep_going = lambda j, b, f, scales, scores: tf.less(j, num_scales)
(_, output_boxes, output_features, output_scales,
output_scores) = tf.while_loop(
cond=keep_going,
body=_ProcessSingleScale,
loop_vars=[
i, output_boxes, output_features, output_scales, output_scores
],
shape_invariants=[
i.get_shape(),
tf.TensorShape([None, 4]),
tf.TensorShape([None, feature_depth]),
tf.TensorShape([None]),
tf.TensorShape([None])
],
back_prop=False)
feature_boxes = box_list.BoxList(output_boxes)
feature_boxes.add_field('features', output_features)
feature_boxes.add_field('scales', output_scales)
feature_boxes.add_field('scores', output_scores)
nms_max_boxes = tf.minimum(max_feature_num, feature_boxes.num_boxes())
final_boxes = box_list_ops.non_max_suppression(feature_boxes, iou,
nms_max_boxes)
return final_boxes.get(), final_boxes.get_field(
'features'), final_boxes.get_field('scales'), tf.expand_dims(
final_boxes.get_field('scores'), 1)
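# A hedged sketch of the receptive-box arithmetic behind the hard-coded
# rf/stride/padding values above (assumed to match
# feature_extractor.CalculateReceptiveBoxes; `_receptive_box` is a
# hypothetical helper, not part of the DELF API):
#
#   def _receptive_box(y, x, rf=291.0, stride=16.0, padding=145.0):
#     """Returns [ymin, xmin, ymax, xmax] for the feature at grid (y, x)."""
#     return [y * stride - padding, x * stride - padding,
#             y * stride - padding + rf - 1, x * stride - padding + rf - 1]
#
#   # _receptive_box(0, 0) == [-145.0, -145.0, 145.0, 145.0]: a 291-pixel
#   # box centered near the resized-image origin. ExtractLocalFeatures then
#   # divides by `scale` to re-project boxes to original image coordinates,
#   # so a larger image scale yields a smaller original-image region, which
#   # is why selected_scales stores the inverse scale 1 / scale.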
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""ResNet50 backbone used in DELF model.
Copied over from tensorflow/python/eager/benchmarks/resnet50/resnet50.py,
because that code does not support dependencies.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import tensorflow as tf
layers = tf.keras.layers
class _IdentityBlock(tf.keras.Model):
"""_IdentityBlock is the block that has no conv layer at shortcut.
Args:
kernel_size: the kernel size of middle conv layer at main path
filters: list of integers, the filters of 3 conv layer at main path
stage: integer, current stage label, used for generating layer names
block: 'a','b'..., current block label, used for generating layer names
data_format: data_format for the input ('channels_first' or
'channels_last').
"""
def __init__(self, kernel_size, filters, stage, block, data_format):
super(_IdentityBlock, self).__init__(name='')
filters1, filters2, filters3 = filters
conv_name_base = 'res' + str(stage) + block + '_branch'
bn_name_base = 'bn' + str(stage) + block + '_branch'
bn_axis = 1 if data_format == 'channels_first' else 3
self.conv2a = layers.Conv2D(
filters1, (1, 1), name=conv_name_base + '2a', data_format=data_format)
self.bn2a = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '2a')
self.conv2b = layers.Conv2D(
filters2,
kernel_size,
padding='same',
data_format=data_format,
name=conv_name_base + '2b')
self.bn2b = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '2b')
self.conv2c = layers.Conv2D(
filters3, (1, 1), name=conv_name_base + '2c', data_format=data_format)
self.bn2c = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '2c')
def call(self, input_tensor, training=False):
x = self.conv2a(input_tensor)
x = self.bn2a(x, training=training)
x = tf.nn.relu(x)
x = self.conv2b(x)
x = self.bn2b(x, training=training)
x = tf.nn.relu(x)
x = self.conv2c(x)
x = self.bn2c(x, training=training)
x += input_tensor
return tf.nn.relu(x)
class _ConvBlock(tf.keras.Model):
"""_ConvBlock is the block that has a conv layer at shortcut.
Args:
kernel_size: the kernel size of middle conv layer at main path
filters: list of integers, the filters of 3 conv layer at main path
stage: integer, current stage label, used for generating layer names
block: 'a','b'..., current block label, used for generating layer names
data_format: data_format for the input ('channels_first' or
'channels_last').
strides: strides for the convolution. Note that from stage 3, the first
conv layer at main path is with strides=(2,2), and the shortcut should
have strides=(2,2) as well.
"""
def __init__(self,
kernel_size,
filters,
stage,
block,
data_format,
strides=(2, 2)):
super(_ConvBlock, self).__init__(name='')
filters1, filters2, filters3 = filters
conv_name_base = 'res' + str(stage) + block + '_branch'
bn_name_base = 'bn' + str(stage) + block + '_branch'
bn_axis = 1 if data_format == 'channels_first' else 3
self.conv2a = layers.Conv2D(
filters1, (1, 1),
strides=strides,
name=conv_name_base + '2a',
data_format=data_format)
self.bn2a = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '2a')
self.conv2b = layers.Conv2D(
filters2,
kernel_size,
padding='same',
name=conv_name_base + '2b',
data_format=data_format)
self.bn2b = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '2b')
self.conv2c = layers.Conv2D(
filters3, (1, 1), name=conv_name_base + '2c', data_format=data_format)
self.bn2c = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '2c')
self.conv_shortcut = layers.Conv2D(
filters3, (1, 1),
strides=strides,
name=conv_name_base + '1',
data_format=data_format)
self.bn_shortcut = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '1')
def call(self, input_tensor, training=False):
x = self.conv2a(input_tensor)
x = self.bn2a(x, training=training)
x = tf.nn.relu(x)
x = self.conv2b(x)
x = self.bn2b(x, training=training)
x = tf.nn.relu(x)
x = self.conv2c(x)
x = self.bn2c(x, training=training)
shortcut = self.conv_shortcut(input_tensor)
shortcut = self.bn_shortcut(shortcut, training=training)
x += shortcut
return tf.nn.relu(x)
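# A small shape sketch for these blocks (illustrative assumptions only):
#
#   conv_block = _ConvBlock(3, [64, 64, 256], stage=2, block='a',
#                           data_format='channels_last', strides=(1, 1))
#   y = conv_block(tf.zeros((1, 56, 56, 64)), training=False)
#   # y: [1, 56, 56, 256]; with the default strides=(2, 2) the spatial
#   # resolution would instead be halved to [1, 28, 28, 256].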
# pylint: disable=not-callable
class ResNet50(tf.keras.Model):
"""Instantiates the ResNet50 architecture.
Args:
data_format: format for the image. Either 'channels_first' or
'channels_last'. 'channels_first' is typically faster on GPUs while
'channels_last' is typically faster on CPUs. See
https://www.tensorflow.org/performance/performance_guide#data_formats
name: Prefix applied to names of variables created in the model.
include_top: whether to include the fully-connected layer at the top of the
network.
pooling: Optional pooling mode for feature extraction when `include_top` is
False. 'None' means that the output of the model will be the 4D tensor
output of the last convolutional layer. 'avg' means that global average
pooling will be applied to the output of the last convolutional layer, and
thus the output of the model will be a 2D tensor. 'max' means that global
max pooling will be applied.
block3_strides: whether to add a stride of 2 to block3 to make it compatible
      with the tf.slim ResNet implementation.
average_pooling: whether to do average pooling of block4 features before
global pooling.
classes: optional number of classes to classify images into, only to be
specified if `include_top` is True.
Raises:
ValueError: in case of invalid argument for data_format.
"""
def __init__(self,
data_format,
name='',
include_top=True,
pooling=None,
block3_strides=False,
average_pooling=True,
classes=1000):
super(ResNet50, self).__init__(name=name)
valid_channel_values = ('channels_first', 'channels_last')
if data_format not in valid_channel_values:
raise ValueError('Unknown data_format: %s. Valid values: %s' %
(data_format, valid_channel_values))
self.include_top = include_top
self.block3_strides = block3_strides
self.average_pooling = average_pooling
self.pooling = pooling
def conv_block(filters, stage, block, strides=(2, 2)):
return _ConvBlock(
3,
filters,
stage=stage,
block=block,
data_format=data_format,
strides=strides)
def id_block(filters, stage, block):
return _IdentityBlock(
3, filters, stage=stage, block=block, data_format=data_format)
self.conv1 = layers.Conv2D(
64, (7, 7),
strides=(2, 2),
data_format=data_format,
padding='same',
name='conv1')
bn_axis = 1 if data_format == 'channels_first' else 3
self.bn_conv1 = layers.BatchNormalization(axis=bn_axis, name='bn_conv1')
self.max_pool = layers.MaxPooling2D((3, 3),
strides=(2, 2),
data_format=data_format)
self.l2a = conv_block([64, 64, 256], stage=2, block='a', strides=(1, 1))
self.l2b = id_block([64, 64, 256], stage=2, block='b')
self.l2c = id_block([64, 64, 256], stage=2, block='c')
self.l3a = conv_block([128, 128, 512], stage=3, block='a')
self.l3b = id_block([128, 128, 512], stage=3, block='b')
self.l3c = id_block([128, 128, 512], stage=3, block='c')
self.l3d = id_block([128, 128, 512], stage=3, block='d')
self.l4a = conv_block([256, 256, 1024], stage=4, block='a')
self.l4b = id_block([256, 256, 1024], stage=4, block='b')
self.l4c = id_block([256, 256, 1024], stage=4, block='c')
self.l4d = id_block([256, 256, 1024], stage=4, block='d')
self.l4e = id_block([256, 256, 1024], stage=4, block='e')
self.l4f = id_block([256, 256, 1024], stage=4, block='f')
# Striding layer that can be used on top of block3 to produce feature maps
# with the same resolution as the TF-Slim implementation.
if self.block3_strides:
self.subsampling_layer = layers.MaxPooling2D((1, 1),
strides=(2, 2),
data_format=data_format)
self.l5a = conv_block([512, 512, 2048],
stage=5,
block='a',
strides=(1, 1))
else:
self.l5a = conv_block([512, 512, 2048], stage=5, block='a')
self.l5b = id_block([512, 512, 2048], stage=5, block='b')
self.l5c = id_block([512, 512, 2048], stage=5, block='c')
self.avg_pool = layers.AveragePooling2D((7, 7),
strides=(7, 7),
data_format=data_format)
if self.include_top:
self.flatten = layers.Flatten()
self.fc1000 = layers.Dense(classes, name='fc1000')
else:
reduction_indices = [1, 2] if data_format == 'channels_last' else [2, 3]
reduction_indices = tf.constant(reduction_indices)
if pooling == 'avg':
self.global_pooling = functools.partial(
tf.reduce_mean, axis=reduction_indices, keepdims=False)
elif pooling == 'max':
self.global_pooling = functools.partial(
tf.reduce_max, axis=reduction_indices, keepdims=False)
else:
self.global_pooling = None
def call(self, inputs, training=True, intermediates_dict=None):
"""Call the ResNet50 model.
Args:
inputs: Images to compute features for.
      training: Whether the model is in the training phase.
      intermediates_dict: `None` or dictionary. If not None, accumulate feature
        maps from intermediate blocks into the dictionary.
Returns:
Tensor with featuremap.
"""
x = self.conv1(inputs)
x = self.bn_conv1(x, training=training)
x = tf.nn.relu(x)
if intermediates_dict is not None:
intermediates_dict['block0'] = x
x = self.max_pool(x)
if intermediates_dict is not None:
intermediates_dict['block0mp'] = x
# Block 1 (equivalent to "conv2" in Resnet paper).
x = self.l2a(x, training=training)
x = self.l2b(x, training=training)
x = self.l2c(x, training=training)
if intermediates_dict is not None:
intermediates_dict['block1'] = x
# Block 2 (equivalent to "conv3" in Resnet paper).
x = self.l3a(x, training=training)
x = self.l3b(x, training=training)
x = self.l3c(x, training=training)
x = self.l3d(x, training=training)
if intermediates_dict is not None:
intermediates_dict['block2'] = x
# Block 3 (equivalent to "conv4" in Resnet paper).
x = self.l4a(x, training=training)
x = self.l4b(x, training=training)
x = self.l4c(x, training=training)
x = self.l4d(x, training=training)
x = self.l4e(x, training=training)
x = self.l4f(x, training=training)
if self.block3_strides:
x = self.subsampling_layer(x)
if intermediates_dict is not None:
intermediates_dict['block3'] = x
else:
if intermediates_dict is not None:
intermediates_dict['block3'] = x
x = self.l5a(x, training=training)
x = self.l5b(x, training=training)
x = self.l5c(x, training=training)
if self.average_pooling:
x = self.avg_pool(x)
if intermediates_dict is not None:
intermediates_dict['block4'] = x
else:
if intermediates_dict is not None:
intermediates_dict['block4'] = x
if self.include_top:
return self.fc1000(self.flatten(x))
elif self.global_pooling:
return self.global_pooling(x)
else:
return x
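# A minimal usage sketch of the intermediates_dict mechanism (illustrative;
# the input size is an arbitrary assumption):
#
#   model = ResNet50('channels_last', include_top=False, pooling='avg',
#                    block3_strides=True, average_pooling=False)
#   blocks = {}
#   global_feature = model(tf.zeros((1, 321, 321, 3)), training=False,
#                          intermediates_dict=blocks)
#   # blocks now maps 'block0', 'block0mp', 'block1', ..., 'block4' to the
#   # corresponding feature maps; DELF feeds blocks['block3'] to its
#   # attention model.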