Commit 9683ee99 authored by Lukasz Kaiser's avatar Lukasz Kaiser Committed by GitHub

Merge pull request #2568 from andrefaraujo/master

Adding DELF model
parents 0def57a5 7f5bdcd4
@@ -7,6 +7,7 @@ research/audioset/* @plakal @dpwe
research/autoencoders/* @snurkabill
research/cognitive_mapping_and_planning/* @s-gupta
research/compression/* @nmjohn
research/delf/* @andrefaraujo
research/differential_privacy/* @panyx0718
research/domain_adaptation/* @bousmalis @ddohan
research/im2txt/* @cshallue
@@ -6,41 +6,63 @@ respective authors. To propose a model for inclusion, please submit a pull
request.
Currently, the models are compatible with TensorFlow 1.0 or later. If you are
running TensorFlow 0.12 or earlier, please [upgrade your
installation](https://www.tensorflow.org/install).
## Models
- [adversarial_crypto](adversarial_crypto): protecting communications with
adversarial neural cryptography.
- [adversarial_text](adversarial_text): semi-supervised sequence learning with
adversarial training.
- [attention_ocr](attention_ocr): a model for real-world image text
extraction.
- [audioset](audioset): Models and supporting code for use with
[AudioSet](http://g.co.audioset).
- [autoencoder](autoencoder): various autoencoders.
- [cognitive_mapping_and_planning](cognitive_mapping_and_planning):
implementation of a spatial memory based mapping and planning architecture
for visual navigation.
- [compression](compression): compressing and decompressing images using a
pre-trained Residual GRU network.
- [delf](delf): deep local features for image matching and retrieval.
- [differential_privacy](differential_privacy): privacy-preserving student
models from multiple teachers.
- [domain_adaptation](domain_adaptation): domain separation networks.
- [im2txt](im2txt): image-to-text neural network for image captioning.
- [inception](inception): deep convolutional networks for computer vision.
- [learning_to_remember_rare_events](learning_to_remember_rare_events): a
large-scale life-long memory module for use in deep learning.
- [lfads](lfads): sequential variational autoencoder for analyzing
neuroscience data.
- [lm_1b](lm_1b): language modeling on the one billion word benchmark.
- [namignizer](namignizer): recognize and generate names.
- [neural_gpu](neural_gpu): highly parallel neural computer.
- [neural_programmer](neural_programmer): neural network augmented with logic
and mathematic operations.
- [next_frame_prediction](next_frame_prediction): probabilistic future frame
synthesis via cross convolutional networks.
- [object_detection](object_detection): localizing and identifying multiple
objects in a single image.
- [pcl_rl](pcl_rl): code for several reinforcement learning algorithms,
including Path Consistency Learning.
- [ptn](ptn): perspective transformer nets for 3D object reconstruction.
- [qa_kg](qa_kg): module networks for question answering on knowledge graphs.
- [real_nvp](real_nvp): density estimation using real-valued non-volume
preserving (real NVP) transformations.
- [rebar](rebar): low-variance, unbiased gradient estimates for discrete
latent variable models.
- [resnet](resnet): deep and wide residual networks.
- [skip_thoughts](skip_thoughts): recurrent neural network sentence-to-vector
encoder.
- [slim](slim): image classification models in TF-Slim.
- [street](street): identify the name of a street (in France) from an image
using a Deep RNN.
- [swivel](swivel): the Swivel algorithm for generating word embeddings.
- [syntaxnet](syntaxnet): neural models of natural language syntax.
- [textsum](textsum): sequence-to-sequence with attention model for text
summarization.
- [transformer](transformer): spatial transformer network, which allows the
spatial manipulation of data within the network.
- [video_prediction](video_prediction): predicting future video frames with
neural advection.
*pyc
*~
*pb2.py
*pb2.pyc
## Quick start: DELF extraction and matching
To illustrate DELF usage, we will use the Oxford buildings dataset. To follow
these instructions closely, please download the dataset to the
`tensorflow/models/research/delf/delf/python/examples` directory, as in the
following commands:
```bash
# From tensorflow/models/research/delf/delf/python/examples/
mkdir data && cd data
wget http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/oxbuild_images.tgz
mkdir oxford5k_images oxford5k_features
tar -xvzf oxbuild_images.tgz -C oxford5k_images/
cd ../
echo data/oxford5k_images/hertford_000056.jpg >> list_images.txt
echo data/oxford5k_images/oxford_000317.jpg >> list_images.txt
```
Also, you will need to download the trained DELF model:
```bash
# From tensorflow/models/research/delf/delf/python/examples/
mkdir parameters && cd parameters
wget http://download.tensorflow.org/models/delf_v1_20171026.tar.gz
tar -xvzf delf_v1_20171026.tar.gz
```
### DELF feature extraction
Now that you have everything in place, running this command should extract DELF
features for the images `hertford_000056.jpg` and `oxford_000317.jpg`:
```bash
# From tensorflow/models/research/delf/delf/python/examples/
python extract_features.py \
--config_path delf_config_example.pbtxt \
--list_images_path list_images.txt \
--output_dir data/oxford5k_features
```
### Image matching using DELF features
After feature extraction, run this command to perform feature matching between
the images `hertford_000056.jpg` and `oxford_000317.jpg`:
```bash
python match_images.py \
--image_1_path data/oxford5k_images/hertford_000056.jpg \
--image_2_path data/oxford5k_images/oxford_000317.jpg \
--features_1_path data/oxford5k_features/hertford_000056.delf \
--features_2_path data/oxford5k_features/oxford_000317.delf \
--output_image matched_images.png
```
The image `matched_images.png` is generated and should look similar to this one:
![MatchedImagesExample](delf/python/examples/matched_images_example.png)
## DELF installation
### TensorFlow
For detailed steps to install TensorFlow, follow the [TensorFlow installation
instructions](https://www.tensorflow.org/install/). A typical user can install
TensorFlow using one of the following commands:
```bash
# For CPU:
pip install tensorflow
# For GPU:
pip install tensorflow-gpu
```
### Protobuf
The DELF library uses [protobuf](https://github.com/google/protobuf) (the Python
version) to configure feature extraction and its file format. You will need the
`protoc` compiler, version >= 3.3. The easiest way to get it is to download a
pre-built binary. For Linux, this can be done as follows (see
[here](https://github.com/google/protobuf/releases) for other platforms):
```bash
wget https://github.com/google/protobuf/releases/download/v3.3.0/protoc-3.3.0-linux-x86_64.zip
unzip protoc-3.3.0-linux-x86_64.zip
PATH_TO_PROTOC=`pwd`
```
### Python dependencies
Install the Python library dependencies:
```bash
sudo pip install matplotlib
sudo pip install numpy
sudo pip install scikit-image
sudo pip install scipy
```
### `tensorflow/models`
Now, clone `tensorflow/models` and install the required libraries (note that the
`object_detection` library requires you to add `tensorflow/models/research/` to
your `PYTHONPATH`, as instructed
[here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md)):
```bash
git clone https://github.com/tensorflow/models
# First, install slim's "nets" package.
cd models/research/slim/
sudo pip install -e .
# Second, setup the object_detection module by editing PYTHONPATH.
cd ..
# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`
```
Then, compile DELF's protobufs. Use `PATH_TO_PROTOC` as the directory where you
downloaded the `protoc` compiler.
```bash
# From tensorflow/models/research/delf/
${PATH_TO_PROTOC?}/bin/protoc delf/protos/*.proto --python_out=.
```
Finally, install the DELF package.
```bash
# From tensorflow/models/research/delf/
sudo pip install -e . # Install "delf" package.
```
At this point, running
```bash
python -c 'import delf'
```
should return without errors. This indicates that the DELF package was loaded
successfully.
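For a slightly more thorough sanity check, the following sketch imports the main
submodules exposed by the package (as listed in `delf/__init__.py` in this
release); it assumes the installation steps above completed successfully:
```python
# Sanity-check sketch: import the main DELF submodules and print where they
# were loaded from. Assumes `pip install -e .` and protobuf compilation above.
from delf import datum_io
from delf import feature_io
from delf import feature_extractor
from delf import delf_config_pb2

for module in (datum_io, feature_io, feature_extractor, delf_config_pb2):
  print(module.__name__, module.__file__)
```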
# DELF: DEep Local Features
This project presents code for extracting DELF features, which were introduced
with the paper ["Large-Scale Image Retrieval with Attentive Deep Local
Features"](https://arxiv.org/abs/1612.06321). A simple application is also
illustrated, where two images containing the same landmark can be matched to
each other, to obtain local image correspondences.
DELF is particularly useful for large-scale instance-level image recognition. It
detects and describes semantic local features which can be geometrically
verified between images showing the same object instance. The pre-trained model
released here has been optimized for landmark recognition, so expect it to work
well in this area. We also provide TensorFlow code for building the DELF model,
which can then be used to train models for other types of objects.
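As a rough sketch of how the released TensorFlow code can be used to build the
model graph, the snippet below mirrors the way `feature_extractor.BuildModel`
uses `delf_v1.DelfV1` (names and defaults are taken from the code in this
release; the input placeholder is only illustrative):
```python
# Minimal graph-construction sketch (TF1 graph mode), assuming the `delf`
# package and its `nets` (slim) dependency are installed.
import tensorflow as tf
from delf import delf_v1

images = tf.placeholder(tf.float32, shape=[None, None, None, 3])
model = delf_v1.DelfV1('resnet_v1_50/block3')
# Returns prelogits, attention probabilities/scores, the raw feature map and
# network end points; here we keep the attention map and the feature map.
_, attention_prob, _, feature_map, _ = model.GetAttentionPrelogit(
    images,
    attention_nonlinear='softplus',
    attention_type='use_l2_normalized_feature',
    training_resnet=False,
    training_attention=False)
```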
If you make use of this code, please consider citing:
```
"Large-Scale Image Retrieval with Attentive Deep Local Features",
Hyeonwoo Noh, Andre Araujo, Jack Sim, Tobias Weyand, Bohyung Han,
Proc. ICCV'17
```
## Installation
To be able to use this code, please follow [these
instructions](INSTALL_INSTRUCTIONS.md) to properly install the DELF library.
## Quick start: DELF extraction and matching
Please follow [these instructions](EXTRACTION_MATCHING.md). At the end, you
should obtain a nice figure showing local feature matches, as:
![MatchedImagesExample](delf/python/examples/matched_images_example.png)
## Code overview
DELF's code is located under the `delf` directory. There are two directories
therein, `protos` and `python`.
### `delf/protos`
This directory contains three protobufs:
- `datum.proto`: general-purpose protobuf for serializing float tensors.
- `feature.proto`: protobuf for serializing DELF features.
- `delf_config.proto`: protobuf for configuring DELF extraction.
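For instance, the `DelfConfig` defined in `delf_config.proto` is typically read
from a text-format file; here is a minimal sketch of parsing it (mirroring
`examples/extract_features.py`, and assuming the protos were compiled with
`protoc` as described in the installation instructions):
```python
# Sketch: parse a DelfConfig from a text-format proto file.
import tensorflow as tf
from google.protobuf import text_format
from delf import delf_config_pb2

config = delf_config_pb2.DelfConfig()
with tf.gfile.FastGFile('delf_config_example.pbtxt', 'r') as f:
  text_format.Merge(f.read(), config)
print(config.delf_local_config.max_feature_num)  # 1000 in the example config.
```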
### `delf/python`
This directory contains files for several different purposes:
- `datum_io.py`, `feature_io.py` are helper files for reading and writing
tensors and features.
- `delf_v1.py` contains the code to create DELF models.
- `feature_extractor.py` contains the code to extract features using DELF.
This is particularly useful for extracting features over multiple scales,
with keypoint selection based on attention scores, and PCA/whitening
post-processing.
Besides these, other files in this directory contain tests for different
modules.
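As an illustration of the I/O helpers, here is a minimal round-trip sketch using
`datum_io` (following the functions defined in `datum_io.py` and exercised by
its test; the temporary path is arbitrary):
```python
# Sketch: write a numpy array to a DatumProto file and read it back.
import numpy as np
from delf import datum_io

data = np.arange(6, dtype=np.float32).reshape(3, 2)
datum_io.WriteToFile(data, '/tmp/example.datum')
recovered = datum_io.ReadFromFile('/tmp/example.datum')
assert np.array_equal(data, recovered)
```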
The subdirectory `delf/python/examples` contains sample scripts to run DELF
feature extraction and matching:
- `extract_features.py` enables DELF extraction from a list of images.
- `match_images.py` supports image matching using DELF features extracted
using `extract_features.py`.
- `delf_config_example.pbtxt` shows an example instantiation of the DelfConfig
proto, used for DELF feature extraction.
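After running `extract_features.py`, the resulting `.delf` files can also be
loaded programmatically; the sketch below follows the way `match_images.py`
reads them (only the fields unpacked there are assumed):
```python
# Sketch: load DELF features saved by extract_features.py.
from delf import feature_io

locations, _, descriptors, _, _ = feature_io.ReadFromFile(
    'data/oxford5k_features/hertford_000056.delf')
print(locations.shape)    # [num_features, 2] keypoint locations.
print(descriptors.shape)  # [num_features, descriptor_dim] DELF descriptors.
```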
## Dataset
The Google-Landmarks dataset will be released together with a Kaggle-hosted
landmark recognition competition. We will include the link to it here once it is
launched (expect this to be done around mid-January, 2018).
## Maintainers
André Araujo (@andrefaraujo)
## Release history
### October 26, 2017
Initial release containing DELF-v1 code, including feature extraction and
matching examples.
**Thanks to contributors**: André Araujo, Hyeonwoo Noh, Youlong Cheng,
Jack Sim.
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Module to extract deep local features."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# pylint: disable=unused-import
from delf.protos import datum_pb2
from delf.protos import delf_config_pb2
from delf.protos import feature_pb2
from delf.python import datum_io
from delf.python import delf_v1
from delf.python import feature_extractor
from delf.python import feature_io
# pylint: enable=unused-import
// Protocol buffer for serializing arbitrary float tensors.
// Note: Currently, only floating point features are supported.
syntax = "proto2";
package delf.protos;
// A DatumProto is a data structure used to serialize a tensor with arbitrary
// shape. A DatumProto contains an array of floating point values and its shape,
// represented as a sequence of integer values. Values are stored in row-major
// order.
//
// Example:
// 3 x 2 array
//
// [1.1, 2.2]
// [3.3, 4.4]
// [5.5, 6.6]
//
// can be represented with the following DatumProto:
//
// DatumProto {
// shape {
// dim: 3
// dim: 2
// }
// float_list {
// value: 1.1
// value: 2.2
// value: 3.3
// value: 4.4
// value: 5.5
// value: 6.6
// }
// }
// DatumShape is the array of dimensions of the tensor.
message DatumShape {
repeated int64 dim = 1 [packed = true];
}
// FloatList is the container of tensor values. The tensor values are saved as
// a list of floating point values.
message FloatList {
repeated float value = 1 [packed = true];
}
message DatumProto {
optional DatumShape shape = 1;
oneof kind_oneof {
FloatList float_list = 2;
}
}
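For reference, a minimal Python sketch constructing the 3 x 2 example above
(assuming the compiled `datum_pb2` module; this mirrors what
`datum_io.ArrayToDatum` does):
```python
# Sketch: build the example DatumProto by hand.
from delf import datum_pb2

datum = datum_pb2.DatumProto()
datum.shape.dim.extend([3, 2])
datum.float_list.value.extend([1.1, 2.2, 3.3, 4.4, 5.5, 6.6])
```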
// Protocol buffer for configuring DELF feature extraction.
syntax = "proto2";
package delf.protos;
message DelfPcaParameters {
// Path to PCA mean file.
optional string mean_path = 1; // Required.
// Path to PCA matrix file.
optional string projection_matrix_path = 2; // Required.
// Dimensionality of feature after PCA.
optional int32 pca_dim = 3; // Required.
// If whitening is to be used, this must be set to true.
optional bool use_whitening = 4 [default = false];
// Path to PCA variances file, used for whitening. This is used only if
// use_whitening is set to true.
optional string pca_variances_path = 5;
}
message DelfLocalFeatureConfig {
// If PCA is to be used, this must be set to true.
optional bool use_pca = 1 [default = true];
// Target layer name for DELF model. This is used to obtain receptive field
// parameters used for localizing features with respect to the input image.
optional string layer_name = 2 [default = ""];
// Intersection over union threshold for the non-max suppression (NMS)
// operation. If two features overlap by at most this amount, both are kept.
// Otherwise, the one with largest attention score is kept. This should be a
// number between 0.0 (no region is selected) and 1.0 (all regions are
// selected and NMS is not performed).
optional float iou_threshold = 3 [default = 1.0];
// Maximum number of features that will be selected. The features with largest
// scores (eg, largest attention score if score_type is "Att") are the
// selected ones.
optional int32 max_feature_num = 4 [default = 1000];
// Threshold to be used for feature selection: no feature with score lower
// than this number will be selected.
optional float score_threshold = 5 [default = 100.0];
// PCA parameters for DELF local feature. This is used only if use_pca is
// true.
optional DelfPcaParameters pca_parameters = 6;
}
message DelfConfig {
// Path to DELF model.
optional string model_path = 1; // Required.
// Image scales to be used.
repeated float image_scales = 2;
// Configuration used for DELF local features.
optional DelfLocalFeatureConfig delf_local_config = 3;
}
// Protocol buffer for serializing the DELF feature information.
syntax = "proto2";
package delf.protos;
import "delf/protos/datum.proto";
// DelfFeature stores a single DELF feature: its descriptor (a DatumProto) and
// its keypoint geometry (location, scale, orientation and strength).
message DelfFeature {
optional DatumProto descriptor = 1;
optional float x = 2;
optional float y = 3;
optional float scale = 4;
optional float orientation = 5;
optional float strength = 6;
}
message DelfFeatures {
repeated DelfFeature feature = 1;
}
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Python interface for DatumProto.
DatumProto is a protocol buffer used to serialize a tensor with arbitrary shape.
Please refer to datum.proto for details.
Supports reading and writing DatumProto from/to numpy arrays and files.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from delf import datum_pb2
import numpy as np
import tensorflow as tf
def ArrayToDatum(arr):
"""Converts numpy array to DatumProto.
Args:
arr: Numpy array of arbitrary shape.
Returns:
datum: DatumProto object.
"""
datum = datum_pb2.DatumProto()
datum.float_list.value.extend(arr.astype(float).flat)
datum.shape.dim.extend(arr.shape)
return datum
def DatumToArray(datum):
"""Converts data saved in DatumProto to numpy array.
Args:
datum: DatumProto object.
Returns:
Numpy array of arbitrary shape.
"""
return np.array(datum.float_list.value).astype(float).reshape(datum.shape.dim)
def SerializeToString(arr):
"""Converts numpy array to serialized DatumProto.
Args:
arr: Numpy array of arbitrary shape.
Returns:
Serialized DatumProto string.
"""
datum = ArrayToDatum(arr)
return datum.SerializeToString()
def ParseFromString(string):
"""Converts serialized DatumProto string to numpy array.
Args:
string: Serialized DatumProto string.
Returns:
Numpy array.
"""
datum = datum_pb2.DatumProto()
datum.ParseFromString(string)
return DatumToArray(datum)
def ReadFromFile(file_path):
"""Helper function to load data from a DatumProto format in a file.
Args:
file_path: Path to file containing data.
Returns:
data: Numpy array.
"""
with tf.gfile.FastGFile(file_path, 'r') as f:
return ParseFromString(f.read())
def WriteToFile(data, file_path):
"""Helper function to write data to a file in DatumProto format.
Args:
data: Numpy array.
file_path: Path to file that will be written.
"""
serialized_data = SerializeToString(data)
with tf.gfile.FastGFile(file_path, 'w') as f:
f.write(serialized_data)
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for datum_io, the python interface of DatumProto."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from delf import datum_io
from delf import datum_pb2
import numpy as np
import os
import tensorflow as tf
class DatumIoTest(tf.test.TestCase):
def Conversion2dTestWithType(self, dtype):
original_data = np.arange(9).reshape(3, 3).astype(dtype)
serialized = datum_io.SerializeToString(original_data)
retrieved_data = datum_io.ParseFromString(serialized)
self.assertTrue(np.array_equal(original_data, retrieved_data))
def Conversion3dTestWithType(self, dtype):
original_data = np.arange(24).reshape(2, 3, 4).astype(dtype)
serialized = datum_io.SerializeToString(original_data)
retrieved_data = datum_io.ParseFromString(serialized)
self.assertTrue(np.array_equal(original_data, retrieved_data))
def testConversion2dWithType(self):
self.Conversion2dTestWithType(np.int8)
self.Conversion2dTestWithType(np.int16)
self.Conversion2dTestWithType(np.int32)
self.Conversion2dTestWithType(np.int64)
self.Conversion2dTestWithType(np.float16)
self.Conversion2dTestWithType(np.float32)
self.Conversion2dTestWithType(np.float64)
def testConversion3dWithType(self):
self.Conversion3dTestWithType(np.int8)
self.Conversion3dTestWithType(np.int16)
self.Conversion3dTestWithType(np.int32)
self.Conversion3dTestWithType(np.int64)
self.Conversion3dTestWithType(np.float16)
self.Conversion3dTestWithType(np.float32)
self.Conversion3dTestWithType(np.float64)
def testWriteAndReadToFile(self):
data = np.array([[[-1.0, 125.0, -2.5], [14.5, 3.5, 0.0]],
[[20.0, 0.0, 30.0], [25.5, 36.0, 42.0]]])
tmpdir = tf.test.get_temp_dir()
filename = os.path.join(tmpdir, 'test.datum')
datum_io.WriteToFile(data, filename)
data_read = datum_io.ReadFromFile(filename)
self.assertAllEqual(data_read, data)
if __name__ == '__main__':
tf.test.main()
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""DELF model implementation based on the following paper:
Large-Scale Image Retrieval with Attentive Deep Local Features
https://arxiv.org/abs/1612.06321
Please refer to the README.md file for detailed explanations on using the DELF
model.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from nets import resnet_v1
import tensorflow as tf
slim = tf.contrib.slim
_SUPPORTED_TARGET_LAYER = ['resnet_v1_50/block3', 'resnet_v1_50/block4']
# The variable scope for the attention portion of the model.
_ATTENTION_VARIABLE_SCOPE = 'attention_block'
# The attention_type determines whether the attention based feature aggregation
# is performed on the L2-normalized feature map or on the default feature map
# where L2-normalization is not applied. Note that in both cases, attention
# functions are built on the un-normalized feature map. This is only relevant
# for the training stage.
# Currently supported options are as follows:
# * use_l2_normalized_feature:
# The option use_l2_normalized_feature first applies L2-normalization on the
# feature map and then applies attention based feature aggregation. This
# option is used for the DELF+FT+Att model in the paper.
# * use_default_input_feature:
# The option use_default_input_feature aggregates unnormalized feature map
# directly.
_SUPPORTED_ATTENTION_TYPES = [
'use_l2_normalized_feature', 'use_default_input_feature'
]
# Supported types of non-linearity for the attention score function.
_SUPPORTED_ATTENTION_NONLINEARITY = ['softplus']
class DelfV1(object):
"""Creates a DELF model.
Args:
target_layer_type: The name of target CNN architecture and its layer.
Raises:
ValueError: If an unknown target_layer_type is provided.
"""
def __init__(self, target_layer_type=_SUPPORTED_TARGET_LAYER[0]):
tf.logging.info('Creating model %s ', target_layer_type)
self._target_layer_type = target_layer_type
if self._target_layer_type not in _SUPPORTED_TARGET_LAYER:
raise ValueError('Unknown model type.')
@property
def target_layer_type(self):
return self._target_layer_type
def _PerformAttention(self,
attention_feature_map,
feature_map,
attention_nonlinear,
kernel=1):
"""Helper function to construct the attention part of the model.
Computes attention score map and aggregates the input feature map based on
the attention score map.
Args:
attention_feature_map: Potentially normalized feature map that will
be aggregated with attention score map.
feature_map: Unnormalized feature map that will be used to compute
attention score map.
attention_nonlinear: Type of non-linearity that will be applied to
attention value.
kernel: Convolutional kernel to use in attention layers (eg: 1, [3, 3]).
Returns:
attention_feat: Aggregated feature vector.
attention_prob: Attention score map after the non-linearity.
attention_score: Attention score map before the non-linearity.
Raises:
ValueError: If unknown attention non-linearity type is provided.
"""
with tf.variable_scope(
'attention', values=[attention_feature_map, feature_map]):
with tf.variable_scope('compute', values=[feature_map]):
activation_fn_conv1 = tf.nn.relu
feature_map_conv1 = slim.conv2d(
feature_map,
512,
kernel,
rate=1,
activation_fn=activation_fn_conv1,
scope='conv1')
attention_score = slim.conv2d(
feature_map_conv1,
1,
kernel,
rate=1,
activation_fn=None,
normalizer_fn=None,
scope='conv2')
# Set activation of conv2 layer of attention model.
with tf.variable_scope(
'merge', values=[attention_feature_map, attention_score]):
if attention_nonlinear not in _SUPPORTED_ATTENTION_NONLINEARITY:
raise ValueError('Unknown attention non-linearity.')
if attention_nonlinear == 'softplus':
with tf.variable_scope(
'softplus_attention',
values=[attention_feature_map, attention_score]):
attention_prob = tf.nn.softplus(attention_score)
attention_feat = tf.reduce_mean(
tf.multiply(attention_feature_map, attention_prob), [1, 2])
attention_feat = tf.expand_dims(tf.expand_dims(attention_feat, 1), 2)
return attention_feat, attention_prob, attention_score
def _GetAttentionSubnetwork(
self,
feature_map,
end_points,
attention_nonlinear=_SUPPORTED_ATTENTION_NONLINEARITY[0],
attention_type=_SUPPORTED_ATTENTION_TYPES[0],
kernel=1,
reuse=False):
"""Constructs the part of the model performing attention.
Args:
feature_map: A tensor of size [batch, height, width, channels]. Usually it
corresponds to the output feature map of a fully-convolutional network.
end_points: Set of activations of the network constructed so far.
attention_nonlinear: Type of non-linearity on top of the attention
function.
attention_type: Type of the attention structure.
kernel: Convolutional kernel to use in attention layers (eg, [3, 3]).
reuse: Whether or not the layer and its variables should be reused.
Returns:
prelogits: A tensor of size [batch, 1, 1, channels].
attention_prob: Attention score after the non-linearity.
attention_score: Attention score before the non-linearity.
end_points: Updated set of activations, for external use.
Raises:
ValueError: If unknown attention_type is provided.
"""
with tf.variable_scope(
_ATTENTION_VARIABLE_SCOPE,
values=[feature_map, end_points],
reuse=reuse):
if attention_type not in _SUPPORTED_ATTENTION_TYPES:
raise ValueError('Unknown attention_type.')
if attention_type == 'use_l2_normalized_feature':
attention_feature_map = tf.nn.l2_normalize(
feature_map, 3, name='l2_normalize')
elif attention_type == 'use_default_input_feature':
attention_feature_map = feature_map
end_points['attention_feature_map'] = attention_feature_map
attention_outputs = self._PerformAttention(
attention_feature_map, feature_map, attention_nonlinear, kernel)
prelogits, attention_prob, attention_score = attention_outputs
end_points['prelogits'] = prelogits
end_points['attention_prob'] = attention_prob
end_points['attention_score'] = attention_score
return prelogits, attention_prob, attention_score, end_points
def GetResnet50Subnetwork(self,
images,
is_training=False,
global_pool=False,
reuse=None):
"""Constructs resnet_v1_50 part of the DELF model.
Args:
images: A tensor of size [batch, height, width, channels].
is_training: Whether or not the model is in training mode.
global_pool: If True, perform global average pooling after feature
extraction. This may be useful for DELF's descriptor fine-tuning stage.
reuse: Whether or not the layer and its variables should be reused.
Returns:
net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
If global_pool is True, height_out = width_out = 1.
end_points: A set of activations for external use.
"""
block = resnet_v1.resnet_v1_block
blocks = [
block('block1', base_depth=64, num_units=3, stride=2),
block('block2', base_depth=128, num_units=4, stride=2),
block('block3', base_depth=256, num_units=6, stride=2),
]
if self._target_layer_type == 'resnet_v1_50/block4':
blocks.append(block('block4', base_depth=512, num_units=3, stride=1))
net, end_points = resnet_v1.resnet_v1(
images,
blocks,
is_training=is_training,
global_pool=global_pool,
reuse=reuse,
scope='resnet_v1_50')
return net, end_points
def GetAttentionPrelogit(
self,
images,
weight_decay=0.0001,
attention_nonlinear=_SUPPORTED_ATTENTION_NONLINEARITY[0],
attention_type=_SUPPORTED_ATTENTION_TYPES[0],
kernel=1,
training_resnet=False,
training_attention=False,
reuse=False,
use_batch_norm=True):
"""Constructs attention model on resnet_v1_50.
Args:
images: A tensor of size [batch, height, width, channels].
weight_decay: The parameters for weight_decay regularizer.
attention_nonlinear: Type of non-linearity on top of the attention
function.
attention_type: Type of the attention structure.
kernel: Convolutional kernel to use in attention layers (eg, [3, 3]).
training_resnet: Whether or not the Resnet blocks from the model are in
training mode.
training_attention: Whether or not the attention part of the model is
in training mode.
reuse: Whether or not the layer and its variables should be reused.
use_batch_norm: Whether or not to use batch normalization.
Returns:
prelogits: A tensor of size [batch, 1, 1, channels].
attention_prob: Attention score after the non-linearity.
attention_score: Attention score before the non-linearity.
feature_map: Features extracted from the model, which are not
l2-normalized.
end_points: Set of activations for external use.
"""
# Construct Resnet50 features.
with slim.arg_scope(
resnet_v1.resnet_arg_scope(use_batch_norm=use_batch_norm)):
_, end_points = self.GetResnet50Subnetwork(
images, is_training=training_resnet, reuse=reuse)
feature_map = end_points[self._target_layer_type]
# Construct attention subnetwork on top of features.
with slim.arg_scope(
resnet_v1.resnet_arg_scope(
weight_decay=weight_decay, use_batch_norm=use_batch_norm)):
with slim.arg_scope([slim.batch_norm], is_training=training_attention):
(prelogits, attention_prob, attention_score,
end_points) = self._GetAttentionSubnetwork(
feature_map,
end_points,
attention_nonlinear=attention_nonlinear,
attention_type=attention_type,
kernel=kernel,
reuse=reuse)
return prelogits, attention_prob, attention_score, feature_map, end_points
def _GetAttentionModel(
self,
images,
num_classes,
weight_decay=0.0001,
attention_nonlinear=_SUPPORTED_ATTENTION_NONLINEARITY[0],
attention_type=_SUPPORTED_ATTENTION_TYPES[0],
kernel=1,
training_resnet=False,
training_attention=False,
reuse=False):
"""Constructs attention model on resnet_v1_50.
Args:
images: A tensor of size [batch, height, width, channels]
num_classes: The number of output classes.
weight_decay: The parameters for weight_decay regularizer.
attention_nonlinear: Type of non-linearity on top of the attention
function.
attention_type: Type of the attention structure.
kernel: Convolutional kernel to use in attention layers (eg, [3, 3]).
training_resnet: Whether or not the Resnet blocks from the model are in
training mode.
training_attention: Whether or not the attention part of the model is in
training mode.
reuse: Whether or not the layer and its variables should be reused.
Returns:
logits: A tensor of size [batch, num_classes].
attention_prob: Attention score after the non-linearity.
attention_score: Attention score before the non-linearity.
feature_map: Features extracted from the model, which are not
l2-normalized.
"""
attention_feat, attention_prob, attention_score, feature_map, _ = (
self.GetAttentionPrelogit(
images,
weight_decay,
attention_nonlinear=attention_nonlinear,
attention_type=attention_type,
kernel=kernel,
training_resnet=training_resnet,
training_attention=training_attention,
reuse=reuse))
with slim.arg_scope(
resnet_v1.resnet_arg_scope(
weight_decay=weight_decay, batch_norm_scale=True)):
with slim.arg_scope([slim.batch_norm], is_training=training_attention):
with tf.variable_scope(
_ATTENTION_VARIABLE_SCOPE, values=[attention_feat], reuse=reuse):
logits = slim.conv2d(
attention_feat,
num_classes, [1, 1],
activation_fn=None,
normalizer_fn=None,
scope='logits')
logits = tf.squeeze(logits, [1, 2], name='spatial_squeeze')
return logits, attention_prob, attention_score, feature_map
def AttentionModel(self,
images,
num_classes,
weight_decay=0.0001,
attention_nonlinear=_SUPPORTED_ATTENTION_NONLINEARITY[0],
attention_type=_SUPPORTED_ATTENTION_TYPES[0],
kernel=1,
training_resnet=False,
training_attention=False,
reuse=False):
"""Constructs attention based classification model for training.
Args:
images: A tensor of size [batch, height, width, channels]
num_classes: The number of output classes.
weight_decay: The parameters for weight_decay regularizer.
attention_nonlinear: Type of non-linearity on top of the attention
function.
attention_type: Type of the attention structure.
kernel: Convolutional kernel to use in attention layers (eg, [3, 3]).
training_resnet: Whether or not the Resnet blocks from the model are in
training mode.
training_attention: Whether or not the model is in training mode. Note
that this function only supports training the attention part of the
model, ie, the feature extraction layers are not trained.
reuse: Whether or not the layer and its variables should be reused.
Returns:
logit: A tensor of size [batch, num_classes]
attention: Attention score after the non-linearity.
feature_map: Features extracted from the model, which are not
l2-normalized.
Raises:
ValueError: If unknown target_layer_type is provided.
"""
if 'resnet_v1_50' in self._target_layer_type:
net_outputs = self._GetAttentionModel(
images,
num_classes,
weight_decay,
attention_nonlinear=attention_nonlinear,
attention_type=attention_type,
kernel=kernel,
training_resnet=training_resnet,
training_attention=training_attention,
reuse=reuse)
logits, attention, _, feature_map = net_outputs
else:
raise ValueError('Unknown target_layer_type.')
return logits, attention, feature_map
model_path: "parameters/delf_v1_20171026/model/"
image_scales: .25
image_scales: .3536
image_scales: .5
image_scales: .7071
image_scales: 1.0
image_scales: 1.4142
image_scales: 2.0
delf_local_config {
use_pca: true
# Note that for the exported model provided as an example, layer_name and
# iou_threshold are hard-coded in the checkpoint. So, the layer_name and
# iou_threshold variables here have no effect on the provided
# extract_features.py script.
layer_name: "resnet_v1_50/block3"
iou_threshold: 1.0
max_feature_num: 1000
score_threshold: 100.0
pca_parameters {
mean_path: "parameters/delf_v1_20171026/pca/mean.datum"
projection_matrix_path: "parameters/delf_v1_20171026/pca/pca_proj_mat.datum"
pca_dim: 40
use_whitening: false
}
}
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Extracts DELF features from a list of images, saving them to file.
The images must be in JPG format. The program checks if descriptors already
exist, and skips computation for those.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
from google.protobuf import text_format
import numpy as np
import os
import sys
import tensorflow as tf
from tensorflow.python.platform import app
import time
from delf import delf_config_pb2
from delf import feature_extractor
from delf import feature_io
from delf import feature_pb2
cmd_args = None
# Extension of feature files.
_DELF_EXT = '.delf'
# Interval (in number of images) at which extraction progress is logged.
_STATUS_CHECK_ITERATIONS = 100
def _ReadImageList(list_path):
"""Helper function to read image paths.
Args:
list_path: Path to list of images, one image path per line.
Returns:
image_paths: List of image paths.
"""
with tf.gfile.GFile(list_path, 'r') as f:
image_paths = f.readlines()
image_paths = [entry.rstrip() for entry in image_paths]
return image_paths
def main(unused_argv):
tf.logging.set_verbosity(tf.logging.INFO)
# Read list of images.
tf.logging.info('Reading list of images...')
image_paths = _ReadImageList(cmd_args.list_images_path)
num_images = len(image_paths)
tf.logging.info('done! Found %d images', num_images)
# Parse DelfConfig proto.
config = delf_config_pb2.DelfConfig()
with tf.gfile.FastGFile(cmd_args.config_path, 'r') as f:
text_format.Merge(f.read(), config)
# Create output directory if necessary.
if not os.path.exists(cmd_args.output_dir):
os.makedirs(cmd_args.output_dir)
# Tell TensorFlow that the model will be built into the default Graph.
with tf.Graph().as_default():
# Reading list of images.
filename_queue = tf.train.string_input_producer(image_paths, shuffle=False)
reader = tf.WholeFileReader()
_, value = reader.read(filename_queue)
image_tf = tf.image.decode_jpeg(value, channels=3)
with tf.Session() as sess:
# Initialize variables.
init_op = tf.global_variables_initializer()
sess.run(init_op)
# Loading model that will be used.
tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING],
config.model_path)
graph = tf.get_default_graph()
input_image = graph.get_tensor_by_name('input_image:0')
input_score_threshold = graph.get_tensor_by_name('input_abs_thres:0')
input_image_scales = graph.get_tensor_by_name('input_scales:0')
input_max_feature_num = graph.get_tensor_by_name(
'input_max_feature_num:0')
boxes = graph.get_tensor_by_name('boxes:0')
raw_descriptors = graph.get_tensor_by_name('features:0')
feature_scales = graph.get_tensor_by_name('scales:0')
attention_with_extra_dim = graph.get_tensor_by_name('scores:0')
attention = tf.reshape(attention_with_extra_dim,
[tf.shape(attention_with_extra_dim)[0]])
locations, descriptors = feature_extractor.DelfFeaturePostProcessing(
boxes, raw_descriptors, config)
# Start input enqueue threads.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
start = time.clock()
for i in range(num_images):
# Write to log-info once in a while.
if i == 0:
tf.logging.info('Starting to extract DELF features from images...')
elif i % _STATUS_CHECK_ITERATIONS == 0:
elapsed = (time.clock() - start)
tf.logging.info('Processing image %d out of %d, last %d '
'images took %f seconds', i, num_images,
_STATUS_CHECK_ITERATIONS, elapsed)
start = time.clock()
# Get next image.
im = sess.run(image_tf)
# If descriptor already exists, skip its computation.
out_desc_filename = os.path.splitext(os.path.basename(
image_paths[i]))[0] + _DELF_EXT
out_desc_fullpath = os.path.join(cmd_args.output_dir, out_desc_filename)
if tf.gfile.Exists(out_desc_fullpath):
tf.logging.info('Skipping %s', image_paths[i])
continue
# Extract and save features.
(locations_out, descriptors_out, feature_scales_out,
attention_out) = sess.run(
[locations, descriptors, feature_scales, attention],
feed_dict={
input_image:
im,
input_score_threshold:
config.delf_local_config.score_threshold,
input_image_scales:
list(config.image_scales),
input_max_feature_num:
config.delf_local_config.max_feature_num
})
serialized_desc = feature_io.WriteToFile(
out_desc_fullpath, locations_out, feature_scales_out,
descriptors_out, attention_out)
# Finalize enqueue threads.
coord.request_stop()
coord.join(threads)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.register('type', 'bool', lambda v: v.lower() == 'true')
parser.add_argument(
'--config_path',
type=str,
default='delf_config_example.pbtxt',
help="""
Path to DelfConfig proto text file with configuration to be used for DELF
extraction.
""")
parser.add_argument(
'--list_images_path',
type=str,
default='list_images.txt',
help="""
Path to list of images whose DELF features will be extracted.
""")
parser.add_argument(
'--output_dir',
type=str,
default='test_features',
help="""
Directory where DELF features will be written to. Each image's features
will be written to a file with same name, and extension replaced by .delf.
""")
cmd_args, unparsed = parser.parse_known_args()
app.run(main=main, argv=[sys.argv[0]] + unparsed)
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Matches two images using their DELF features.
The matching is done using feature-based nearest-neighbor search, followed by
geometric verification using RANSAC.
The DELF features can be extracted using the extract_features.py script.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
from delf import feature_io
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial import cKDTree
from skimage.feature import plot_matches
from skimage.measure import ransac
from skimage.transform import AffineTransform
import sys
import tensorflow as tf
from tensorflow.python.platform import app
cmd_args = None
_DISTANCE_THRESHOLD = 0.8
def main(unused_argv):
tf.logging.set_verbosity(tf.logging.INFO)
# Read features.
locations_1, _, descriptors_1, _, _ = feature_io.ReadFromFile(
cmd_args.features_1_path)
num_features_1 = locations_1.shape[0]
tf.logging.info("Loaded image 1's %d features" % num_features_1)
locations_2, _, descriptors_2, _, _ = feature_io.ReadFromFile(
cmd_args.features_2_path)
num_features_2 = locations_2.shape[0]
tf.logging.info("Loaded image 2's %d features" % num_features_2)
# Find nearest-neighbor matches using a KD tree.
d1_tree = cKDTree(descriptors_1)
distances, indices = d1_tree.query(
descriptors_2, distance_upper_bound=_DISTANCE_THRESHOLD)
# Select feature locations for putative matches.
locations_2_to_use = np.array([
locations_2[i,] for i in range(num_features_2)
if indices[i] != num_features_1
])
locations_1_to_use = np.array([
locations_1[indices[i],] for i in range(num_features_2)
if indices[i] != num_features_1
])
# Perform geometric verification using RANSAC.
model_robust, inliers = ransac(
(locations_1_to_use, locations_2_to_use),
AffineTransform,
min_samples=3,
residual_threshold=20,
max_trials=1000)
tf.logging.info('Found %d inliers' % sum(inliers))
# Visualize correspondences, and save to file.
fig, ax = plt.subplots()
img_1 = mpimg.imread(cmd_args.image_1_path)
img_2 = mpimg.imread(cmd_args.image_2_path)
inlier_idxs = np.nonzero(inliers)[0]
plot_matches(
ax,
img_1,
img_2,
locations_1_to_use,
locations_2_to_use,
np.column_stack((inlier_idxs, inlier_idxs)),
matches_color='b')
ax.axis('off')
ax.set_title('DELF correspondences')
plt.savefig(cmd_args.output_image)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.register('type', 'bool', lambda v: v.lower() == 'true')
parser.add_argument(
'--image_1_path',
type=str,
default='test_images/image_1.jpg',
help="""
Path to test image 1.
""")
parser.add_argument(
'--image_2_path',
type=str,
default='test_images/image_2.jpg',
help="""
Path to test image 2.
""")
parser.add_argument(
'--features_1_path',
type=str,
default='test_features/image_1.delf',
help="""
Path to DELF features from image 1.
""")
parser.add_argument(
'--features_2_path',
type=str,
default='test_features/image_2.delf',
help="""
Path to DELF features from image 2.
""")
parser.add_argument(
'--output_image',
type=str,
default='test_match.png',
help="""
Path where an image showing the matches will be saved.
""")
cmd_args, unparsed = parser.parse_known_args()
app.run(main=main, argv=[sys.argv[0]] + unparsed)
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""DELF feature extractor.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from delf import datum_io
from delf import delf_v1
from delf import delf_config_pb2
from object_detection.core import box_list
from object_detection.core import box_list_ops
import tensorflow as tf
def NormalizePixelValues(image,
pixel_value_offset=128.0,
pixel_value_scale=128.0):
"""Normalize image pixel values.
Args:
image: a uint8 tensor.
pixel_value_offset: a Python float, offset for normalizing pixel values.
pixel_value_scale: a Python float, scale for normalizing pixel values.
Returns:
image: a float32 tensor of the same shape as the input image.
"""
image = tf.to_float(image)
image = tf.div(tf.subtract(image, pixel_value_offset), pixel_value_scale)
return image
def CalculateReceptiveBoxes(height, width, rf, stride, padding):
"""Calculate receptive boxes for each feature point.
Args:
height: The height of feature map.
width: The width of feature map.
rf: The receptive field size.
stride: The effective stride between two adjacent feature points.
padding: The effective padding size.
Returns:
rf_boxes: [N, 4] receptive boxes tensor. Here N equals to height x width.
Each box is represented by [ymin, xmin, ymax, xmax].
"""
x, y = tf.meshgrid(tf.range(width), tf.range(height))
coordinates = tf.reshape(tf.stack([y, x], axis=2), [-1, 2])
# [y,x,y,x]
point_boxes = tf.to_float(tf.concat([coordinates, coordinates], 1))
bias = [-padding, -padding, -padding + rf - 1, -padding + rf - 1]
rf_boxes = stride * point_boxes + bias
return rf_boxes
def CalculateKeypointCenters(boxes):
"""Helper function to compute feature centers, from RF boxes.
Args:
boxes: [N, 4] float tensor.
Returns:
centers: [N, 2] float tensor.
"""
return tf.divide(
tf.add(
tf.gather(boxes, [0, 1], axis=1), tf.gather(boxes, [2, 3], axis=1)),
2.0)
def ExtractKeypointDescriptor(image, layer_name, image_scales, iou,
max_feature_num, abs_thres, model_fn):
"""Extract keypoint descriptor for input image.
Args:
image: An image tensor with shape [h, w, channels].
layer_name: The endpoint of feature extraction layer.
image_scales: A 1D float tensor which contains the scales.
iou: A float scalar denoting the IOU threshold for NMS.
max_feature_num: An int tensor denoting the maximum number of selected feature points.
abs_thres: A float tensor denoting the score threshold for feature
selection.
model_fn: Model function. Follows the signature:
* Args:
* `images`: Image tensor which is re-scaled.
* `normalized_image`: Whether or not the images are normalized.
* `reuse`: Whether or not the layer and its variables should be reused.
* Returns:
* `attention`: Attention score after the non-linearity.
* `feature_map`: Feature map obtained from the ResNet model.
Returns:
boxes: [N, 4] float tensor which denotes the selected receptive box. N is
the number of final feature points which pass through keypoint selection
and NMS steps.
feature_scales: [N] float tensor. It is the inverse of the input image
scales such that larger image scales correspond to larger image regions,
which is compatible with scale-space keypoint detection convention.
features: [N, depth] float tensor with feature descriptors.
scores: [N, 1] float tensor denoting the attention score.
Raises:
ValueError: If the layer_name is unsupported.
"""
original_image_shape_float = tf.gather(tf.to_float(tf.shape(image)), [0, 1])
image_tensor = NormalizePixelValues(image)
image_tensor = tf.expand_dims(image_tensor, 0, name='image/expand_dims')
# Feature depth and receptive field parameters for each network version.
if layer_name == 'resnet_v1_50/block3':
feature_depth = 1024
rf, stride, padding = [291.0, 32.0, 145.0]
elif layer_name == 'resnet_v1_50/block4':
feature_depth = 2048
rf, stride, padding = [483.0, 32.0, 241.0]
else:
raise ValueError('Unsupported layer_name.')
def _ProcessSingleScale(scale_index,
boxes,
features,
scales,
scores,
reuse=True):
"""Resize the image and run feature extraction and keypoint selection.
This function will be passed into tf.while_loop() and be called
repeatedly. The input boxes are collected from the previous iteration
[0: scale_index -1]. We get the current scale by
image_scales[scale_index], and run image resizing, feature extraction and
keypoint selection. Then we will get a new set of selected_boxes for
current scale. In the end, we concat the previous boxes with current
selected_boxes as the output.
Args:
scale_index: A valid index in the image_scales.
boxes: Box tensor with the shape of [N, 4].
features: Feature tensor with the shape of [N, depth].
scales: Scale tensor with the shape of [N].
scores: Attention score tensor with the shape of [N].
reuse: Whether or not the layer and its variables should be reused.
Returns:
scale_index: The next scale index for processing.
boxes: Concatenated box tensor with the shape of [K, 4]. K >= N.
features: Concatenated feature tensor with the shape of [K, depth].
scales: Concatenated scale tensor with the shape of [K].
scores: Concatenated attention score tensor with the shape of [K].
"""
scale = tf.gather(image_scales, scale_index)
new_image_size = tf.to_int32(tf.round(original_image_shape_float * scale))
resized_image = tf.image.resize_bilinear(image_tensor, new_image_size)
attention, feature_map = model_fn(
resized_image, normalized_image=True, reuse=reuse)
rf_boxes = CalculateReceptiveBoxes(
tf.shape(feature_map)[1], tf.shape(feature_map)[2], rf, stride, padding)
# Re-project back to the original image space.
rf_boxes = tf.divide(rf_boxes, scale)
attention = tf.reshape(attention, [-1])
feature_map = tf.reshape(feature_map, [-1, feature_depth])
# Use attention score to select feature vectors.
indices = tf.reshape(tf.where(attention >= abs_thres), [-1])
selected_boxes = tf.gather(rf_boxes, indices)
selected_features = tf.gather(feature_map, indices)
selected_scores = tf.gather(attention, indices)
selected_scales = tf.ones_like(selected_scores, tf.float32) / scale
# Concat with the previous result from different scales.
boxes = tf.concat([boxes, selected_boxes], 0)
features = tf.concat([features, selected_features], 0)
scales = tf.concat([scales, selected_scales], 0)
scores = tf.concat([scores, selected_scores], 0)
return scale_index + 1, boxes, features, scales, scores
output_boxes = tf.zeros([0, 4], dtype=tf.float32)
output_features = tf.zeros([0, feature_depth], dtype=tf.float32)
output_scales = tf.zeros([0], dtype=tf.float32)
output_scores = tf.zeros([0], dtype=tf.float32)
# Process the first scale separately, the following scales will reuse the
# graph variables.
(_, output_boxes, output_features, output_scales,
output_scores) = _ProcessSingleScale(
0,
output_boxes,
output_features,
output_scales,
output_scores,
reuse=False)
i = tf.constant(1, dtype=tf.int32)
num_scales = tf.shape(image_scales)[0]
keep_going = lambda j, boxes, features, scales, scores: tf.less(j, num_scales)
(_, output_boxes, output_features, output_scales,
output_scores) = tf.while_loop(
cond=keep_going,
body=_ProcessSingleScale,
loop_vars=[
i, output_boxes, output_features, output_scales, output_scores
],
shape_invariants=[
i.get_shape(),
tf.TensorShape([None, 4]),
tf.TensorShape([None, feature_depth]),
tf.TensorShape([None]),
tf.TensorShape([None])
],
back_prop=False)
feature_boxes = box_list.BoxList(output_boxes)
feature_boxes.add_field('features', output_features)
feature_boxes.add_field('scales', output_scales)
feature_boxes.add_field('scores', output_scores)
nms_max_boxes = tf.minimum(max_feature_num, feature_boxes.num_boxes())
final_boxes = box_list_ops.non_max_suppression(feature_boxes, iou,
nms_max_boxes)
return (final_boxes.get(), final_boxes.get_field('scales'),
final_boxes.get_field('features'), tf.expand_dims(
final_boxes.get_field('scores'), 1))
def BuildModel(layer_name, attention_nonlinear, attention_type,
attention_kernel_size):
"""Build the DELF model.
This function is helpful for constructing the model function which will be fed
to ExtractKeypointDescriptor().
Args:
layer_name: the endpoint of feature extraction layer.
attention_nonlinear: Type of the non-linearity for the attention function.
Currently, only 'softplus' is supported.
attention_type: Type of the attention used. Options are:
'use_l2_normalized_feature' and 'use_default_input_feature'. Note that
this is irrelevant during inference time.
attention_kernel_size: Size of attention kernel (kernel is square).
Returns:
Attention model function.
"""
def _ModelFn(images, normalized_image, reuse):
"""Attention model to get feature map and attention score map.
Args:
images: Image tensor.
normalized_image: Whether or not the images are normalized.
reuse: Whether or not the layer and its variables should be reused.
Returns:
attention: Attention score after the non-linearity.
feature_map: Feature map after ResNet convolution.
"""
if normalized_image:
image_tensor = images
else:
image_tensor = NormalizePixelValues(images)
# Extract features and attention scores.
model = delf_v1.DelfV1(layer_name)
_, attention, _, feature_map, _ = model.GetAttentionPrelogit(
image_tensor,
attention_nonlinear=attention_nonlinear,
attention_type=attention_type,
kernel=[attention_kernel_size, attention_kernel_size],
training_resnet=False,
training_attention=False,
reuse=reuse)
return attention, feature_map
return _ModelFn
def ApplyPcaAndWhitening(data,
pca_matrix,
pca_mean,
output_dim,
use_whitening=False,
pca_variances=None):
"""Applies PCA/whitening to data.
Args:
data: [N, dim] float tensor containing data which undergoes PCA/whitening.
pca_matrix: [dim, dim] float tensor PCA matrix, row-major.
pca_mean: [dim] float tensor, mean to subtract before projection.
output_dim: Number of dimensions to use in output data, of type int.
use_whitening: Whether whitening is to be used.
pca_variances: [dim] float tensor containing PCA variances. Only used if
use_whitening is True.
Returns:
output: [N, output_dim] float tensor with output of PCA/whitening operation.
"""
output = tf.matmul(
tf.subtract(data, pca_mean),
tf.slice(pca_matrix, [0, 0], [output_dim, -1]),
transpose_b=True,
name='pca_matmul')
# Apply whitening if desired.
if use_whitening:
output = tf.divide(
output,
tf.sqrt(tf.slice(pca_variances, [0], [output_dim])),
name='whitening')
return output
def DelfFeaturePostProcessing(boxes, descriptors, config):
"""Extract DELF features from input image.
Args:
boxes: [N, 4] float tensor which denotes the selected receptive box. N is
the number of final feature points which pass through keypoint selection
and NMS steps.
descriptors: [N, input_dim] float tensor.
config: DelfConfig proto with DELF extraction options.
Returns:
locations: [N, 2] float tensor which denotes the selected keypoint
locations.
final_descriptors: [N, output_dim] float tensor with DELF descriptors after
normalization and (possibly) PCA/whitening.
"""
# Get center of descriptor boxes, corresponding to feature locations.
locations = CalculateKeypointCenters(boxes)
# Post-process descriptors: L2-normalize, and if desired apply PCA (followed
# by L2-normalization).
with tf.variable_scope('postprocess'):
final_descriptors = tf.nn.l2_normalize(
descriptors, dim=1, name='l2_normalization')
if config.delf_local_config.use_pca:
# Load PCA parameters.
pca_mean = tf.constant(
datum_io.ReadFromFile(
config.delf_local_config.pca_parameters.mean_path),
dtype=tf.float32)
pca_matrix = tf.constant(
datum_io.ReadFromFile(
config.delf_local_config.pca_parameters.projection_matrix_path),
dtype=tf.float32)
pca_dim = config.delf_local_config.pca_parameters.pca_dim
pca_variances = None
if config.delf_local_config.pca_parameters.use_whitening:
pca_variances = tf.constant(
datum_io.ReadFromFile(
config.delf_local_config.pca_parameters.pca_variances_path),
dtype=tf.float32)
# Apply PCA, and whitening if desired.
final_descriptors = ApplyPcaAndWhitening(
final_descriptors, pca_matrix, pca_mean, pca_dim,
config.delf_local_config.pca_parameters.use_whitening, pca_variances)
# Re-normalize.
final_descriptors = tf.nn.l2_normalize(
final_descriptors, dim=1, name='pca_l2_normalization')
return locations, final_descriptors