Commit 9683ee99 authored by Lukasz Kaiser's avatar Lukasz Kaiser Committed by GitHub

Merge pull request #2568 from andrefaraujo/master

Adding DELF model
parents 0def57a5 7f5bdcd4
@@ -7,6 +7,7 @@ research/audioset/* @plakal @dpwe
research/autoencoders/* @snurkabill
research/cognitive_mapping_and_planning/* @s-gupta
research/compression/* @nmjohn
research/delf/* @andrefaraujo
research/differential_privacy/* @panyx0718
research/domain_adaptation/* @bousmalis @ddohan
research/im2txt/* @cshallue
@@ -6,41 +6,63 @@ respective authors. To propose a model for inclusion, please submit a pull
request.
Currently, the models are compatible with TensorFlow 1.0 or later. If you are
running TensorFlow 0.12 or earlier, please [upgrade your
installation](https://www.tensorflow.org/install).
## Models
- [adversarial_crypto](adversarial_crypto): protecting communications with
adversarial neural cryptography.
- [adversarial_text](adversarial_text): semi-supervised sequence learning with
adversarial training.
- [attention_ocr](attention_ocr): a model for real-world image text
extraction.
- [audioset](audioset): Models and supporting code for use with
[AudioSet](http://g.co.audioset).
- [autoencoder](autoencoder): various autoencoders.
- [cognitive_mapping_and_planning](cognitive_mapping_and_planning):
implementation of a spatial memory based mapping and planning architecture
for visual navigation.
- [compression](compression): compressing and decompressing images using a
pre-trained Residual GRU network.
- [delf](delf): deep local features for image matching and retrieval.
- [differential_privacy](differential_privacy): privacy-preserving student
models from multiple teachers.
- [domain_adaptation](domain_adaptation): domain separation networks.
- [im2txt](im2txt): image-to-text neural network for image captioning.
- [inception](inception): deep convolutional networks for computer vision.
- [learning_to_remember_rare_events](learning_to_remember_rare_events): a
large-scale life-long memory module for use in deep learning.
- [lfads](lfads): sequential variational autoencoder for analyzing
neuroscience data.
- [lm_1b](lm_1b): language modeling on the one billion word benchmark.
- [namignizer](namignizer): recognize and generate names.
- [neural_gpu](neural_gpu): highly parallel neural computer.
- [neural_programmer](neural_programmer): neural network augmented with logic
and mathematic operations.
- [next_frame_prediction](next_frame_prediction): probabilistic future frame
synthesis via cross convolutional networks.
- [object_detection](object_detection): localizing and identifying multiple
objects in a single image.
- [pcl_rl](pcl_rl): code for several reinforcement learning algorithms,
including Path Consistency Learning.
- [ptn](ptn): perspective transformer nets for 3D object reconstruction.
- [qa_kg](qa_kg): module networks for question answering on knowledge graphs.
- [real_nvp](real_nvp): density estimation using real-valued non-volume
preserving (real NVP) transformations.
- [rebar](rebar): low-variance, unbiased gradient estimates for discrete
latent variable models.
- [resnet](resnet): deep and wide residual networks.
- [skip_thoughts](skip_thoughts): recurrent neural network sentence-to-vector
encoder.
- [slim](slim): image classification models in TF-Slim.
- [street](street): identify the name of a street (in France) from an image
using a Deep RNN.
- [swivel](swivel): the Swivel algorithm for generating word embeddings.
- [syntaxnet](syntaxnet): neural models of natural language syntax.
- [textsum](textsum): sequence-to-sequence with attention model for text
summarization.
- [transformer](transformer): spatial transformer network, which allows the
spatial manipulation of data within the network.
- [video_prediction](video_prediction): predicting future video frames with
neural advection.
*pyc
*~
*pb2.py
*pb2.pyc
## Quick start: DELF extraction and matching
To illustrate DELF usage, we will use the Oxford buildings dataset. To follow
these instructions closely, please download the dataset to the
`tensorflow/models/research/delf/delf/python/examples` directory, as in the
following commands:
```bash
# From tensorflow/models/research/delf/delf/python/examples/
mkdir data && cd data
wget http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/oxbuild_images.tgz
mkdir oxford5k_images oxford5k_features
tar -xvzf oxbuild_images.tgz -C oxford5k_images/
cd ../
echo data/oxford5k_images/hertford_000056.jpg >> list_images.txt
echo data/oxford5k_images/oxford_000317.jpg >> list_images.txt
```
Also, you will need to download the trained DELF model:
```bash
# From tensorflow/models/research/delf/delf/python/examples/
mkdir parameters && cd parameters
wget http://download.tensorflow.org/models/delf_v1_20171026.tar.gz
tar -xvzf delf_v1_20171026.tar.gz
```
### DELF feature extraction
Now that you have everything in place, running this command should extract DELF
features for the images `hertford_000056.jpg` and `oxford_000317.jpg`:
```bash
# From tensorflow/models/research/delf/delf/python/examples/
python extract_features.py \
--config_path delf_config_example.pbtxt \
--list_images_path list_images.txt \
--output_dir data/oxford5k_features
```
### Image matching using DELF features
After feature extraction, run this command to perform feature matching between
the images `hertford_000056.jpg` and `oxford_000317.jpg`:
```bash
python match_images.py \
--image_1_path data/oxford5k_images/hertford_000056.jpg \
--image_2_path data/oxford5k_images/oxford_000317.jpg \
--features_1_path data/oxford5k_features/hertford_000056.delf \
--features_2_path data/oxford5k_features/oxford_000317.delf \
--output_image matched_images.png
```
The image `matched_images.png` is generated and should look similar to this one:
![MatchedImagesExample](delf/python/examples/matched_images_example.png)
## DELF installation
### TensorFlow
For detailed steps to install TensorFlow, follow the [TensorFlow installation
instructions](https://www.tensorflow.org/install/). A typical user can install
TensorFlow using one of the following commands:
```bash
# For CPU:
pip install tensorflow
# For GPU:
pip install tensorflow-gpu
```
### Protobuf
The DELF library uses [protobuf](https://github.com/google/protobuf) (the Python
version) to configure feature extraction and its file format. You will need the
`protoc` compiler, version >= 3.3. The easiest way to get it is to download a
pre-built binary. For Linux, this can be done as follows (see
[here](https://github.com/google/protobuf/releases) for other platforms):
```bash
wget https://github.com/google/protobuf/releases/download/v3.3.0/protoc-3.3.0-linux-x86_64.zip
unzip protoc-3.3.0-linux-x86_64.zip
PATH_TO_PROTOC=`pwd`
```
### Python dependencies
Install the Python library dependencies:
```bash
sudo pip install matplotlib
sudo pip install numpy
sudo pip install scikit-image
sudo pip install scipy
```
### `tensorflow/models`
Now, clone `tensorflow/models` and install the required libraries (note that the
`object_detection` library requires you to add `tensorflow/models/research/` to
your `PYTHONPATH`, as instructed
[here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md)):
```bash
git clone https://github.com/tensorflow/models
# First, install slim's "nets" package.
cd models/research/slim/
sudo pip install -e .
# Second, setup the object_detection module by editing PYTHONPATH.
cd ..
# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`
```
Then, compile DELF's protobufs. Use `PATH_TO_PROTOC` as the directory where you
downloaded the `protoc` compiler.
```bash
# From tensorflow/models/research/delf/
${PATH_TO_PROTOC?}/bin/protoc delf/protos/*.proto --python_out=.
```
Finally, install the DELF package.
```bash
# From tensorflow/models/research/delf/
sudo pip install -e . # Install "delf" package.
```
At this point, running
```bash
python -c 'import delf'
```
should return without errors. This indicates that the DELF package was loaded
successfully.
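For a slightly more thorough sanity check, the following sketch imports the main
submodules exposed by the package (as listed in `delf/__init__.py` in this
release); it assumes the installation steps above completed successfully:
```python
# Sanity-check sketch: import the main DELF submodules and print where they
# were loaded from. Assumes `pip install -e .` and protobuf compilation above.
from delf import datum_io
from delf import feature_io
from delf import feature_extractor
from delf import delf_config_pb2

for module in (datum_io, feature_io, feature_extractor, delf_config_pb2):
  print(module.__name__, module.__file__)
```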
# DELF: DEep Local Features
This project presents code for extracting DELF features, which were introduced
with the paper ["Large-Scale Image Retrieval with Attentive Deep Local
Features"](https://arxiv.org/abs/1612.06321). A simple application is also
illustrated, where two images containing the same landmark can be matched to
each other, to obtain local image correspondences.
DELF is particularly useful for large-scale instance-level image recognition. It
detects and describes semantic local features which can be geometrically
verified between images showing the same object instance. The pre-trained model
released here has been optimized for landmark recognition, so expect it to work
well in this area. We also provide TensorFlow code for building the DELF model,
which can then be used to train models for other types of objects.
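As a rough sketch of how the released TensorFlow code can be used to build the
model graph, the snippet below mirrors the way `feature_extractor.BuildModel`
uses `delf_v1.DelfV1` (names and defaults are taken from the code in this
release; the input placeholder is only illustrative):
```python
# Minimal graph-construction sketch (TF1 graph mode), assuming the `delf`
# package and its `nets` (slim) dependency are installed.
import tensorflow as tf
from delf import delf_v1

images = tf.placeholder(tf.float32, shape=[None, None, None, 3])
model = delf_v1.DelfV1('resnet_v1_50/block3')
# Returns prelogits, attention probabilities/scores, the raw feature map and
# network end points; here we keep the attention map and the feature map.
_, attention_prob, _, feature_map, _ = model.GetAttentionPrelogit(
    images,
    attention_nonlinear='softplus',
    attention_type='use_l2_normalized_feature',
    training_resnet=False,
    training_attention=False)
```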
If you make use of this code, please consider citing:
```
"Large-Scale Image Retrieval with Attentive Deep Local Features",
Hyeonwoo Noh, Andre Araujo, Jack Sim, Tobias Weyand, Bohyung Han,
Proc. ICCV'17
```
## Installation
To be able to use this code, please follow [these
instructions](INSTALL_INSTRUCTIONS.md) to properly install the DELF library.
## Quick start: DELF extraction and matching
Please follow [these instructions](EXTRACTION_MATCHING.md). At the end, you
should obtain a nice figure showing local feature matches, as:
![MatchedImagesExample](delf/python/examples/matched_images_example.png)
## Code overview
DELF's code is located under the `delf` directory. There are two directories
therein, `protos` and `python`.
### `delf/protos`
This directory contains three protobufs:
- `datum.proto`: general-purpose protobuf for serializing float tensors.
- `feature.proto`: protobuf for serializing DELF features.
- `delf_config.proto`: protobuf for configuring DELF extraction.
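For instance, the `DelfConfig` defined in `delf_config.proto` is typically read
from a text-format file; here is a minimal sketch of parsing it (mirroring
`examples/extract_features.py`, and assuming the protos were compiled with
`protoc` as described in the installation instructions):
```python
# Sketch: parse a DelfConfig from a text-format proto file.
import tensorflow as tf
from google.protobuf import text_format
from delf import delf_config_pb2

config = delf_config_pb2.DelfConfig()
with tf.gfile.FastGFile('delf_config_example.pbtxt', 'r') as f:
  text_format.Merge(f.read(), config)
print(config.delf_local_config.max_feature_num)  # 1000 in the example config.
```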
### `delf/python`
This directory contains files for several different purposes:
- `datum_io.py`, `feature_io.py` are helper files for reading and writing
tensors and features.
- `delf_v1.py` contains the code to create DELF models.
- `feature_extractor.py` contains the code to extract features using DELF.
This is particularly useful for extracting features over multiple scales,
with keypoint selection based on attention scores, and PCA/whitening
post-processing.
Besides these, other files in this directory contain tests for different
modules.
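As an illustration of the I/O helpers, here is a minimal round-trip sketch using
`datum_io` (following the functions defined in `datum_io.py` and exercised by
its test; the temporary path is arbitrary):
```python
# Sketch: write a numpy array to a DatumProto file and read it back.
import numpy as np
from delf import datum_io

data = np.arange(6, dtype=np.float32).reshape(3, 2)
datum_io.WriteToFile(data, '/tmp/example.datum')
recovered = datum_io.ReadFromFile('/tmp/example.datum')
assert np.array_equal(data, recovered)
```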
The subdirectory `delf/python/examples` contains sample scripts to run DELF
feature extraction and matching:
- `extract_features.py` enables DELF extraction from a list of images.
- `match_images.py` supports image matching using DELF features extracted
using `extract_features.py`.
- `delf_config_example.pbtxt` shows an example instantiation of the DelfConfig
proto, used for DELF feature extraction.
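After running `extract_features.py`, the resulting `.delf` files can also be
loaded programmatically; the sketch below follows the way `match_images.py`
reads them (only the fields unpacked there are assumed):
```python
# Sketch: load DELF features saved by extract_features.py.
from delf import feature_io

locations, _, descriptors, _, _ = feature_io.ReadFromFile(
    'data/oxford5k_features/hertford_000056.delf')
print(locations.shape)    # [num_features, 2] keypoint locations.
print(descriptors.shape)  # [num_features, descriptor_dim] DELF descriptors.
```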
## Dataset
The Google-Landmarks dataset will be released together with a Kaggle-hosted
landmark recognition competition. We will include the link to it here once it is
launched (expect this to be done around mid-January, 2018).
## Maintainers
André Araujo (@andrefaraujo)
## Release history
### October 26, 2017
Initial release containing DELF-v1 code, including feature extraction and
matching examples.
**Thanks to contributors**: André Araujo, Hyeonwoo Noh, Youlong Cheng,
Jack Sim.
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Module to extract deep local features."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# pylint: disable=unused-import
from delf.protos import datum_pb2
from delf.protos import delf_config_pb2
from delf.protos import feature_pb2
from delf.python import datum_io
from delf.python import delf_v1
from delf.python import feature_extractor
from delf.python import feature_io
# pylint: enable=unused-import
// Protocol buffer for serializing arbitrary float tensors.
// Note: Currently, only floating point features are supported.
syntax = "proto2";
package delf.protos;
// A DatumProto is a data structure used to serialize a tensor with arbitrary
// shape. A DatumProto contains an array of floating point values and its shape,
// represented as a sequence of integer values. Values are stored in row-major
// order.
//
// Example:
// 3 x 2 array
//
// [1.1, 2.2]
// [3.3, 4.4]
// [5.5, 6.6]
//
// can be represented with the following DatumProto:
//
// DatumProto {
// shape {
// dim: 3
// dim: 2
// }
// float_list {
// value: 1.1
// value: 2.2
// value: 3.3
// value: 4.4
// value: 5.5
// value: 6.6
// }
// }
// DatumShape is the array of dimensions of the tensor.
message DatumShape {
repeated int64 dim = 1 [packed = true];
}
// FloatList is the container of tensor values. The tensor values are saved as
// a list of floating point values.
message FloatList {
repeated float value = 1 [packed = true];
}
message DatumProto {
optional DatumShape shape = 1;
oneof kind_oneof {
FloatList float_list = 2;
}
}
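For reference, a minimal Python sketch constructing the 3 x 2 example above
(assuming the compiled `datum_pb2` module; this mirrors what
`datum_io.ArrayToDatum` does):
```python
# Sketch: build the example DatumProto by hand.
from delf import datum_pb2

datum = datum_pb2.DatumProto()
datum.shape.dim.extend([3, 2])
datum.float_list.value.extend([1.1, 2.2, 3.3, 4.4, 5.5, 6.6])
```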
// Protocol buffer for configuring DELF feature extraction.
syntax = "proto2";
package delf.protos;
message DelfPcaParameters {
// Path to PCA mean file.
optional string mean_path = 1; // Required.
// Path to PCA matrix file.
optional string projection_matrix_path = 2; // Required.
// Dimensionality of feature after PCA.
optional int32 pca_dim = 3; // Required.
// If whitening is to be used, this must be set to true.
optional bool use_whitening = 4 [default = false];
// Path to PCA variances file, used for whitening. This is used only if
// use_whitening is set to true.
optional string pca_variances_path = 5;
}
message DelfLocalFeatureConfig {
// If PCA is to be used, this must be set to true.
optional bool use_pca = 1 [default = true];
// Target layer name for DELF model. This is used to obtain receptive field
// parameters used for localizing features with respect to the input image.
optional string layer_name = 2 [default = ""];
// Intersection over union threshold for the non-max suppression (NMS)
// operation. If two features overlap by at most this amount, both are kept.
// Otherwise, the one with largest attention score is kept. This should be a
// number between 0.0 (no region is selected) and 1.0 (all regions are
// selected and NMS is not performed).
optional float iou_threshold = 3 [default = 1.0];
// Maximum number of features that will be selected. The features with largest
// scores (eg, largest attention score if score_type is "Att") are the
// selected ones.
optional int32 max_feature_num = 4 [default = 1000];
// Threshold to be used for feature selection: no feature with score lower
// than this number will be selected.
optional float score_threshold = 5 [default = 100.0];
// PCA parameters for DELF local feature. This is used only if use_pca is
// true.
optional DelfPcaParameters pca_parameters = 6;
}
message DelfConfig {
// Path to DELF model.
optional string model_path = 1; // Required.
// Image scales to be used.
repeated float image_scales = 2;
// Configuration used for DELF local features.
optional DelfLocalFeatureConfig delf_local_config = 3;
}
// Protocol buffer for serializing the DELF feature information.
syntax = "proto2";
package delf.protos;
import "delf/protos/datum.proto";
// DelfFeature stores a single DELF feature: its descriptor (a DatumProto) and
// its keypoint geometry (location, scale, orientation and strength).
message DelfFeature {
optional DatumProto descriptor = 1;
optional float x = 2;
optional float y = 3;
optional float scale = 4;
optional float orientation = 5;
optional float strength = 6;
}
message DelfFeatures {
repeated DelfFeature feature = 1;
}
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Python interface for DatumProto.
DatumProto is a protocol buffer used to serialize a tensor with arbitrary shape.
Please refer to datum.proto for details.
Supports reading and writing DatumProto from/to numpy arrays and files.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from delf import datum_pb2
import numpy as np
import tensorflow as tf
def ArrayToDatum(arr):
"""Converts numpy array to DatumProto.
Args:
arr: Numpy array of arbitrary shape.
Returns:
datum: DatumProto object.
"""
datum = datum_pb2.DatumProto()
datum.float_list.value.extend(arr.astype(float).flat)
datum.shape.dim.extend(arr.shape)
return datum
def DatumToArray(datum):
"""Converts data saved in DatumProto to numpy array.
Args:
datum: DatumProto object.
Returns:
Numpy array of arbitrary shape.
"""
return np.array(datum.float_list.value).astype(float).reshape(datum.shape.dim)
def SerializeToString(arr):
"""Converts numpy array to serialized DatumProto.
Args:
arr: Numpy array of arbitrary shape.
Returns:
Serialized DatumProto string.
"""
datum = ArrayToDatum(arr)
return datum.SerializeToString()
def ParseFromString(string):
"""Converts serialized DatumProto string to numpy array.
Args:
string: Serialized DatumProto string.
Returns:
Numpy array.
"""
datum = datum_pb2.DatumProto()
datum.ParseFromString(string)
return DatumToArray(datum)
def ReadFromFile(file_path):
"""Helper function to load data from a DatumProto format in a file.
Args:
file_path: Path to file containing data.
Returns:
data: Numpy array.
"""
with tf.gfile.FastGFile(file_path, 'r') as f:
return ParseFromString(f.read())
def WriteToFile(data, file_path):
"""Helper function to write data to a file in DatumProto format.
Args:
data: Numpy array.
file_path: Path to file that will be written.
"""
serialized_data = SerializeToString(data)
with tf.gfile.FastGFile(file_path, 'w') as f:
f.write(serialized_data)
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for datum_io, the python interface of DatumProto."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from delf import datum_io
from delf import datum_pb2
import numpy as np
import os
import tensorflow as tf
class DatumIoTest(tf.test.TestCase):
def Conversion2dTestWithType(self, dtype):
original_data = np.arange(9).reshape(3, 3).astype(dtype)
serialized = datum_io.SerializeToString(original_data)
retrieved_data = datum_io.ParseFromString(serialized)
self.assertTrue(np.array_equal(original_data, retrieved_data))
def Conversion3dTestWithType(self, dtype):
original_data = np.arange(24).reshape(2, 3, 4).astype(dtype)
serialized = datum_io.SerializeToString(original_data)
retrieved_data = datum_io.ParseFromString(serialized)
self.assertTrue(np.array_equal(original_data, retrieved_data))
def testConversion2dWithType(self):
self.Conversion2dTestWithType(np.int8)
self.Conversion2dTestWithType(np.int16)
self.Conversion2dTestWithType(np.int32)
self.Conversion2dTestWithType(np.int64)
self.Conversion2dTestWithType(np.float16)
self.Conversion2dTestWithType(np.float32)
self.Conversion2dTestWithType(np.float64)
def testConversion3dWithType(self):
self.Conversion3dTestWithType(np.int8)
self.Conversion3dTestWithType(np.int16)
self.Conversion3dTestWithType(np.int32)
self.Conversion3dTestWithType(np.int64)
self.Conversion3dTestWithType(np.float16)
self.Conversion3dTestWithType(np.float32)
self.Conversion3dTestWithType(np.float64)
def testWriteAndReadToFile(self):
data = np.array([[[-1.0, 125.0, -2.5], [14.5, 3.5, 0.0]],
[[20.0, 0.0, 30.0], [25.5, 36.0, 42.0]]])
tmpdir = tf.test.get_temp_dir()
filename = os.path.join(tmpdir, 'test.datum')
datum_io.WriteToFile(data, filename)
data_read = datum_io.ReadFromFile(filename)
self.assertAllEqual(data_read, data)
if __name__ == '__main__':
tf.test.main()
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""DELF model implementation based on the following paper:
Large-Scale Image Retrieval with Attentive Deep Local Features
https://arxiv.org/abs/1612.06321
Please refer to the README.md file for detailed explanations on using the DELF
model.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from nets import resnet_v1
import tensorflow as tf
slim = tf.contrib.slim
_SUPPORTED_TARGET_LAYER = ['resnet_v1_50/block3', 'resnet_v1_50/block4']
# The variable scope for the attention portion of the model.
_ATTENTION_VARIABLE_SCOPE = 'attention_block'
# The attention_type determines whether the attention based feature aggregation
# is performed on the L2-normalized feature map or on the default feature map
# where L2-normalization is not applied. Note that in both cases, attention
# functions are built on the un-normalized feature map. This is only relevant
# for the training stage.
# Currently supported options are as follows:
# * use_l2_normalized_feature:
# The option use_l2_normalized_feature first applies L2-normalization on the
# feature map and then applies attention based feature aggregation. This
# option is used for the DELF+FT+Att model in the paper.
# * use_default_input_feature:
# The option use_default_input_feature aggregates unnormalized feature map
# directly.
_SUPPORTED_ATTENTION_TYPES = [
'use_l2_normalized_feature', 'use_default_input_feature'
]
# Supported types of non-linearity for the attention score function.
_SUPPORTED_ATTENTION_NONLINEARITY = ['softplus']
class DelfV1(object):
"""Creates a DELF model.
Args:
target_layer_type: The name of target CNN architecture and its layer.
Raises:
ValueError: If an unknown target_layer_type is provided.
"""
def __init__(self, target_layer_type=_SUPPORTED_TARGET_LAYER[0]):
tf.logging.info('Creating model %s ', target_layer_type)
self._target_layer_type = target_layer_type
if self._target_layer_type not in _SUPPORTED_TARGET_LAYER:
raise ValueError('Unknown model type.')
@property
def target_layer_type(self):
return self._target_layer_type
def _PerformAttention(self,
attention_feature_map,
feature_map,
attention_nonlinear,
kernel=1):
"""Helper function to construct the attention part of the model.
Computes attention score map and aggregates the input feature map based on
the attention score map.
Args:
attention_feature_map: Potentially normalized feature map that will
be aggregated with attention score map.
feature_map: Unnormalized feature map that will be used to compute
attention score map.
attention_nonlinear: Type of non-linearity that will be applied to
attention value.
kernel: Convolutional kernel to use in attention layers (eg: 1, [3, 3]).
Returns:
attention_feat: Aggregated feature vector.
attention_prob: Attention score map after the non-linearity.
attention_score: Attention score map before the non-linearity.
Raises:
ValueError: If unknown attention non-linearity type is provided.
"""
with tf.variable_scope(
'attention', values=[attention_feature_map, feature_map]):
with tf.variable_scope('compute', values=[feature_map]):
activation_fn_conv1 = tf.nn.relu
feature_map_conv1 = slim.conv2d(
feature_map,
512,
kernel,
rate=1,
activation_fn=activation_fn_conv1,
scope='conv1')
attention_score = slim.conv2d(
feature_map_conv1,
1,
kernel,
rate=1,
activation_fn=None,
normalizer_fn=None,
scope='conv2')
# Set activation of conv2 layer of attention model.
with tf.variable_scope(
'merge', values=[attention_feature_map, attention_score]):
if attention_nonlinear not in _SUPPORTED_ATTENTION_NONLINEARITY:
raise ValueError('Unknown attention non-linearity.')
if attention_nonlinear == 'softplus':
with tf.variable_scope(
'softplus_attention',
values=[attention_feature_map, attention_score]):
attention_prob = tf.nn.softplus(attention_score)
attention_feat = tf.reduce_mean(
tf.multiply(attention_feature_map, attention_prob), [1, 2])
attention_feat = tf.expand_dims(tf.expand_dims(attention_feat, 1), 2)
return attention_feat, attention_prob, attention_score
def _GetAttentionSubnetwork(
self,
feature_map,
end_points,
attention_nonlinear=_SUPPORTED_ATTENTION_NONLINEARITY[0],
attention_type=_SUPPORTED_ATTENTION_TYPES[0],
kernel=1,
reuse=False):
"""Constructs the part of the model performing attention.
Args:
feature_map: A tensor of size [batch, height, width, channels]. Usually it
corresponds to the output feature map of a fully-convolutional network.
end_points: Set of activations of the network constructed so far.
attention_nonlinear: Type of non-linearity on top of the attention
function.
attention_type: Type of the attention structure.
kernel: Convolutional kernel to use in attention layers (eg, [3, 3]).
reuse: Whether or not the layer and its variables should be reused.
Returns:
prelogits: A tensor of size [batch, 1, 1, channels].
attention_prob: Attention score after the non-linearity.
attention_score: Attention score before the non-linearity.
end_points: Updated set of activations, for external use.
Raises:
ValueError: If unknown attention_type is provided.
"""
with tf.variable_scope(
_ATTENTION_VARIABLE_SCOPE,
values=[feature_map, end_points],
reuse=reuse):
if attention_type not in _SUPPORTED_ATTENTION_TYPES:
raise ValueError('Unknown attention_type.')
if attention_type == 'use_l2_normalized_feature':
attention_feature_map = tf.nn.l2_normalize(
feature_map, 3, name='l2_normalize')
elif attention_type == 'use_default_input_feature':
attention_feature_map = feature_map
end_points['attention_feature_map'] = attention_feature_map
attention_outputs = self._PerformAttention(
attention_feature_map, feature_map, attention_nonlinear, kernel)
prelogits, attention_prob, attention_score = attention_outputs
end_points['prelogits'] = prelogits
end_points['attention_prob'] = attention_prob
end_points['attention_score'] = attention_score
return prelogits, attention_prob, attention_score, end_points
def GetResnet50Subnetwork(self,
images,
is_training=False,
global_pool=False,
reuse=None):
"""Constructs resnet_v1_50 part of the DELF model.
Args:
images: A tensor of size [batch, height, width, channels].
is_training: Whether or not the model is in training mode.
global_pool: If True, perform global average pooling after feature
extraction. This may be useful for DELF's descriptor fine-tuning stage.
reuse: Whether or not the layer and its variables should be reused.
Returns:
net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
If global_pool is True, height_out = width_out = 1.
end_points: A set of activations for external use.
"""
block = resnet_v1.resnet_v1_block
blocks = [
block('block1', base_depth=64, num_units=3, stride=2),
block('block2', base_depth=128, num_units=4, stride=2),
block('block3', base_depth=256, num_units=6, stride=2),
]
if self._target_layer_type == 'resnet_v1_50/block4':
blocks.append(block('block4', base_depth=512, num_units=3, stride=1))
net, end_points = resnet_v1.resnet_v1(
images,
blocks,
is_training=is_training,
global_pool=global_pool,
reuse=reuse,
scope='resnet_v1_50')
return net, end_points
def GetAttentionPrelogit(
self,
images,
weight_decay=0.0001,
attention_nonlinear=_SUPPORTED_ATTENTION_NONLINEARITY[0],
attention_type=_SUPPORTED_ATTENTION_TYPES[0],
kernel=1,
training_resnet=False,
training_attention=False,
reuse=False,
use_batch_norm=True):
"""Constructs attention model on resnet_v1_50.
Args:
images: A tensor of size [batch, height, width, channels].
weight_decay: The parameters for weight_decay regularizer.
attention_nonlinear: Type of non-linearity on top of the attention
function.
attention_type: Type of the attention structure.
kernel: Convolutional kernel to use in attention layers (eg, [3, 3]).
training_resnet: Whether or not the Resnet blocks from the model are in
training mode.
training_attention: Whether or not the attention part of the model is
in training mode.
reuse: Whether or not the layer and its variables should be reused.
use_batch_norm: Whether or not to use batch normalization.
Returns:
prelogits: A tensor of size [batch, 1, 1, channels].
attention_prob: Attention score after the non-linearity.
attention_score: Attention score before the non-linearity.
feature_map: Features extracted from the model, which are not
l2-normalized.
end_points: Set of activations for external use.
"""
# Construct Resnet50 features.
with slim.arg_scope(
resnet_v1.resnet_arg_scope(use_batch_norm=use_batch_norm)):
_, end_points = self.GetResnet50Subnetwork(
images, is_training=training_resnet, reuse=reuse)
feature_map = end_points[self._target_layer_type]
# Construct attention subnetwork on top of features.
with slim.arg_scope(
resnet_v1.resnet_arg_scope(
weight_decay=weight_decay, use_batch_norm=use_batch_norm)):
with slim.arg_scope([slim.batch_norm], is_training=training_attention):
(prelogits, attention_prob, attention_score,
end_points) = self._GetAttentionSubnetwork(
feature_map,
end_points,
attention_nonlinear=attention_nonlinear,
attention_type=attention_type,
kernel=kernel,
reuse=reuse)
return prelogits, attention_prob, attention_score, feature_map, end_points
def _GetAttentionModel(
self,
images,
num_classes,
weight_decay=0.0001,
attention_nonlinear=_SUPPORTED_ATTENTION_NONLINEARITY[0],
attention_type=_SUPPORTED_ATTENTION_TYPES[0],
kernel=1,
training_resnet=False,
training_attention=False,
reuse=False):
"""Constructs attention model on resnet_v1_50.
Args:
images: A tensor of size [batch, height, width, channels]
num_classes: The number of output classes.
weight_decay: The parameters for weight_decay regularizer.
attention_nonlinear: Type of non-linearity on top of the attention
function.
attention_type: Type of the attention structure.
kernel: Convolutional kernel to use in attention layers (eg, [3, 3]).
training_resnet: Whether or not the Resnet blocks from the model are in
training mode.
training_attention: Whether or not the attention part of the model is in
training mode.
reuse: Whether or not the layer and its variables should be reused.
Returns:
logits: A tensor of size [batch, num_classes].
attention_prob: Attention score after the non-linearity.
attention_score: Attention score before the non-linearity.
feature_map: Features extracted from the model, which are not
l2-normalized.
"""
attention_feat, attention_prob, attention_score, feature_map, _ = (
self.GetAttentionPrelogit(
images,
weight_decay,
attention_nonlinear=attention_nonlinear,
attention_type=attention_type,
kernel=kernel,
training_resnet=training_resnet,
training_attention=training_attention,
reuse=reuse))
with slim.arg_scope(
resnet_v1.resnet_arg_scope(
weight_decay=weight_decay, batch_norm_scale=True)):
with slim.arg_scope([slim.batch_norm], is_training=training_attention):
with tf.variable_scope(
_ATTENTION_VARIABLE_SCOPE, values=[attention_feat], reuse=reuse):
logits = slim.conv2d(
attention_feat,
num_classes, [1, 1],
activation_fn=None,
normalizer_fn=None,
scope='logits')
logits = tf.squeeze(logits, [1, 2], name='spatial_squeeze')
return logits, attention_prob, attention_score, feature_map
def AttentionModel(self,
images,
num_classes,
weight_decay=0.0001,
attention_nonlinear=_SUPPORTED_ATTENTION_NONLINEARITY[0],
attention_type=_SUPPORTED_ATTENTION_TYPES[0],
kernel=1,
training_resnet=False,
training_attention=False,
reuse=False):
"""Constructs attention based classification model for training.
Args:
images: A tensor of size [batch, height, width, channels]
num_classes: The number of output classes.
weight_decay: The parameters for weight_decay regularizer.
attention_nonlinear: Type of non-linearity on top of the attention
function.
attention_type: Type of the attention structure.
kernel: Convolutional kernel to use in attention layers (eg, [3, 3]).
training_resnet: Whether or not the Resnet blocks from the model are in
training mode.
training_attention: Whether or not the model is in training mode. Note
that this function only supports training the attention part of the
model, ie, the feature extraction layers are not trained.
reuse: Whether or not the layer and its variables should be reused.
Returns:
logit: A tensor of size [batch, num_classes]
attention: Attention score after the non-linearity.
feature_map: Features extracted from the model, which are not
l2-normalized.
Raises:
ValueError: If unknown target_layer_type is provided.
"""
if 'resnet_v1_50' in self._target_layer_type:
net_outputs = self._GetAttentionModel(
images,
num_classes,
weight_decay,
attention_nonlinear=attention_nonlinear,
attention_type=attention_type,
kernel=kernel,
training_resnet=training_resnet,
training_attention=training_attention,
reuse=reuse)
logits, attention, _, feature_map = net_outputs
else:
raise ValueError('Unknown target_layer_type.')
return logits, attention, feature_map
model_path: "parameters/delf_v1_20171026/model/"
image_scales: .25
image_scales: .3536
image_scales: .5
image_scales: .7071
image_scales: 1.0
image_scales: 1.4142
image_scales: 2.0
delf_local_config {
use_pca: true
# Note that for the exported model provided as an example, layer_name and
# iou_threshold are hard-coded in the checkpoint. So, the layer_name and
# iou_threshold variables here have no effect on the provided
# extract_features.py script.
layer_name: "resnet_v1_50/block3"
iou_threshold: 1.0
max_feature_num: 1000
score_threshold: 100.0
pca_parameters {
mean_path: "parameters/delf_v1_20171026/pca/mean.datum"
projection_matrix_path: "parameters/delf_v1_20171026/pca/pca_proj_mat.datum"
pca_dim: 40
use_whitening: false
}
}
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Extracts DELF features from a list of images, saving them to file.
The images must be in JPG format. The program checks if descriptors already
exist, and skips computation for those.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
from google.protobuf import text_format
import numpy as np
import os
import sys
import tensorflow as tf
from tensorflow.python.platform import app
import time
from delf import delf_config_pb2
from delf import feature_extractor
from delf import feature_io
from delf import feature_pb2
cmd_args = None
# Extension of feature files.
_DELF_EXT = '.delf'
# Interval (in number of images) at which extraction progress is logged.
_STATUS_CHECK_ITERATIONS = 100
def _ReadImageList(list_path):
"""Helper function to read image paths.
Args:
list_path: Path to list of images, one image path per line.
Returns:
image_paths: List of image paths.
"""
with tf.gfile.GFile(list_path, 'r') as f:
image_paths = f.readlines()
image_paths = [entry.rstrip() for entry in image_paths]
return image_paths
def main(unused_argv):
tf.logging.set_verbosity(tf.logging.INFO)
# Read list of images.
tf.logging.info('Reading list of images...')
image_paths = _ReadImageList(cmd_args.list_images_path)
num_images = len(image_paths)
tf.logging.info('done! Found %d images', num_images)
# Parse DelfConfig proto.
config = delf_config_pb2.DelfConfig()
with tf.gfile.FastGFile(cmd_args.config_path, 'r') as f:
text_format.Merge(f.read(), config)
# Create output directory if necessary.
if not os.path.exists(cmd_args.output_dir):
os.makedirs(cmd_args.output_dir)
# Tell TensorFlow that the model will be built into the default Graph.
with tf.Graph().as_default():
# Reading list of images.
filename_queue = tf.train.string_input_producer(image_paths, shuffle=False)
reader = tf.WholeFileReader()
_, value = reader.read(filename_queue)
image_tf = tf.image.decode_jpeg(value, channels=3)
with tf.Session() as sess:
# Initialize variables.
init_op = tf.global_variables_initializer()
sess.run(init_op)
# Loading model that will be used.
tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING],
config.model_path)
graph = tf.get_default_graph()
input_image = graph.get_tensor_by_name('input_image:0')
input_score_threshold = graph.get_tensor_by_name('input_abs_thres:0')
input_image_scales = graph.get_tensor_by_name('input_scales:0')
input_max_feature_num = graph.get_tensor_by_name(
'input_max_feature_num:0')
boxes = graph.get_tensor_by_name('boxes:0')
raw_descriptors = graph.get_tensor_by_name('features:0')
feature_scales = graph.get_tensor_by_name('scales:0')
attention_with_extra_dim = graph.get_tensor_by_name('scores:0')
attention = tf.reshape(attention_with_extra_dim,
[tf.shape(attention_with_extra_dim)[0]])
locations, descriptors = feature_extractor.DelfFeaturePostProcessing(
boxes, raw_descriptors, config)
# Start input enqueue threads.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
start = time.clock()
for i in range(num_images):
# Write to log-info once in a while.
if i == 0:
tf.logging.info('Starting to extract DELF features from images...')
elif i % _STATUS_CHECK_ITERATIONS == 0:
elapsed = (time.clock() - start)
tf.logging.info('Processing image %d out of %d, last %d '
'images took %f seconds', i, num_images,
_STATUS_CHECK_ITERATIONS, elapsed)
start = time.clock()
# Get next image.
im = sess.run(image_tf)
# If descriptor already exists, skip its computation.
out_desc_filename = os.path.splitext(os.path.basename(
image_paths[i]))[0] + _DELF_EXT
out_desc_fullpath = os.path.join(cmd_args.output_dir, out_desc_filename)
if tf.gfile.Exists(out_desc_fullpath):
tf.logging.info('Skipping %s', image_paths[i])
continue
# Extract and save features.
(locations_out, descriptors_out, feature_scales_out,
attention_out) = sess.run(
[locations, descriptors, feature_scales, attention],
feed_dict={
input_image:
im,
input_score_threshold:
config.delf_local_config.score_threshold,
input_image_scales:
list(config.image_scales),
input_max_feature_num:
config.delf_local_config.max_feature_num
})
serialized_desc = feature_io.WriteToFile(
out_desc_fullpath, locations_out, feature_scales_out,
descriptors_out, attention_out)
# Finalize enqueue threads.
coord.request_stop()
coord.join(threads)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.register('type', 'bool', lambda v: v.lower() == 'true')
parser.add_argument(
'--config_path',
type=str,
default='delf_config_example.pbtxt',
help="""
Path to DelfConfig proto text file with configuration to be used for DELF
extraction.
""")
parser.add_argument(
'--list_images_path',
type=str,
default='list_images.txt',
help="""
Path to list of images whose DELF features will be extracted.
""")
parser.add_argument(
'--output_dir',
type=str,
default='test_features',
help="""
Directory where DELF features will be written to. Each image's features
will be written to a file with same name, and extension replaced by .delf.
""")
cmd_args, unparsed = parser.parse_known_args()
app.run(main=main, argv=[sys.argv[0]] + unparsed)
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Matches two images using their DELF features.
The matching is done using feature-based nearest-neighbor search, followed by
geometric verification using RANSAC.
The DELF features can be extracted using the extract_features.py script.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
from delf import feature_io
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial import cKDTree
from skimage.feature import plot_matches
from skimage.measure import ransac
from skimage.transform import AffineTransform
import sys
import tensorflow as tf
from tensorflow.python.platform import app
cmd_args = None
_DISTANCE_THRESHOLD = 0.8
def main(unused_argv):
tf.logging.set_verbosity(tf.logging.INFO)
# Read features.
locations_1, _, descriptors_1, _, _ = feature_io.ReadFromFile(
cmd_args.features_1_path)
num_features_1 = locations_1.shape[0]
tf.logging.info("Loaded image 1's %d features" % num_features_1)
locations_2, _, descriptors_2, _, _ = feature_io.ReadFromFile(
cmd_args.features_2_path)
num_features_2 = locations_2.shape[0]
tf.logging.info("Loaded image 2's %d features" % num_features_2)
# Find nearest-neighbor matches using a KD tree.
d1_tree = cKDTree(descriptors_1)
distances, indices = d1_tree.query(
descriptors_2, distance_upper_bound=_DISTANCE_THRESHOLD)
# Select feature locations for putative matches.
locations_2_to_use = np.array([
locations_2[i,] for i in range(num_features_2)
if indices[i] != num_features_1
])
locations_1_to_use = np.array([
locations_1[indices[i],] for i in range(num_features_2)
if indices[i] != num_features_1
])
# Perform geometric verification using RANSAC.
model_robust, inliers = ransac(
(locations_1_to_use, locations_2_to_use),
AffineTransform,
min_samples=3,
residual_threshold=20,
max_trials=1000)
tf.logging.info('Found %d inliers' % sum(inliers))
# Visualize correspondences, and save to file.
fig, ax = plt.subplots()
img_1 = mpimg.imread(cmd_args.image_1_path)
img_2 = mpimg.imread(cmd_args.image_2_path)
inlier_idxs = np.nonzero(inliers)[0]
plot_matches(
ax,
img_1,
img_2,
locations_1_to_use,
locations_2_to_use,
np.column_stack((inlier_idxs, inlier_idxs)),
matches_color='b')
ax.axis('off')
ax.set_title('DELF correspondences')
plt.savefig(cmd_args.output_image)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.register('type', 'bool', lambda v: v.lower() == 'true')
parser.add_argument(
'--image_1_path',
type=str,
default='test_images/image_1.jpg',
help="""
Path to test image 1.
""")
parser.add_argument(
'--image_2_path',
type=str,
default='test_images/image_2.jpg',
help="""
Path to test image 2.
""")
parser.add_argument(
'--features_1_path',
type=str,
default='test_features/image_1.delf',
help="""
Path to DELF features from image 1.
""")
parser.add_argument(
'--features_2_path',
type=str,
default='test_features/image_2.delf',
help="""
Path to DELF features from image 2.
""")
parser.add_argument(
'--output_image',
type=str,
default='test_match.png',
help="""
Path where an image showing the matches will be saved.
""")
cmd_args, unparsed = parser.parse_known_args()
app.run(main=main, argv=[sys.argv[0]] + unparsed)
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""DELF feature extractor.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from delf import datum_io
from delf import delf_v1
from delf import delf_config_pb2
from object_detection.core import box_list
from object_detection.core import box_list_ops
import tensorflow as tf
def NormalizePixelValues(image,
pixel_value_offset=128.0,
pixel_value_scale=128.0):
"""Normalize image pixel values.
Args:
image: a uint8 tensor.
pixel_value_offset: a Python float, offset for normalizing pixel values.
pixel_value_scale: a Python float, scale for normalizing pixel values.
Returns:
image: a float32 tensor of the same shape as the input image.
"""
image = tf.to_float(image)
image = tf.div(tf.subtract(image, pixel_value_offset), pixel_value_scale)
return image
def CalculateReceptiveBoxes(height, width, rf, stride, padding):
"""Calculate receptive boxes for each feature point.
Args:
height: The height of feature map.
width: The width of feature map.
rf: The receptive field size.
stride: The effective stride between two adjacent feature points.
padding: The effective padding size.
Returns:
rf_boxes: [N, 4] receptive boxes tensor. Here N equals to height x width.
Each box is represented by [ymin, xmin, ymax, xmax].
"""
x, y = tf.meshgrid(tf.range(width), tf.range(height))
coordinates = tf.reshape(tf.stack([y, x], axis=2), [-1, 2])
# [y,x,y,x]
point_boxes = tf.to_float(tf.concat([coordinates, coordinates], 1))
bias = [-padding, -padding, -padding + rf - 1, -padding + rf - 1]
rf_boxes = stride * point_boxes + bias
return rf_boxes
def CalculateKeypointCenters(boxes):
"""Helper function to compute feature centers, from RF boxes.
Args:
boxes: [N, 4] float tensor.
Returns:
centers: [N, 2] float tensor.
"""
return tf.divide(
tf.add(
tf.gather(boxes, [0, 1], axis=1), tf.gather(boxes, [2, 3], axis=1)),
2.0)
def ExtractKeypointDescriptor(image, layer_name, image_scales, iou,
max_feature_num, abs_thres, model_fn):
"""Extract keypoint descriptor for input image.
Args:
image: An image tensor with shape [h, w, channels].
layer_name: The endpoint of feature extraction layer.
image_scales: A 1D float tensor which contains the scales.
iou: A float scalar denoting the IOU threshold for NMS.
max_feature_num: An int tensor denoting the maximum number of selected feature points.
abs_thres: A float tensor denoting the score threshold for feature
selection.
model_fn: Model function. Follows the signature:
* Args:
* `images`: Image tensor which is re-scaled.
* `normalized_image`: Whether or not the images are normalized.
* `reuse`: Whether or not the layer and its variables should be reused.
* Returns:
* `attention`: Attention score after the non-linearity.
* `feature_map`: Feature map obtained from the ResNet model.
Returns:
boxes: [N, 4] float tensor which denotes the selected receptive box. N is
the number of final feature points which pass through keypoint selection
and NMS steps.
feature_scales: [N] float tensor. It is the inverse of the input image
scales such that larger image scales correspond to larger image regions,
which is compatible with scale-space keypoint detection convention.
features: [N, depth] float tensor with feature descriptors.
scores: [N, 1] float tensor denoting the attention score.
Raises:
ValueError: If the layer_name is unsupported.
"""
original_image_shape_float = tf.gather(tf.to_float(tf.shape(image)), [0, 1])
image_tensor = NormalizePixelValues(image)
image_tensor = tf.expand_dims(image_tensor, 0, name='image/expand_dims')
# Feature depth and receptive field parameters for each network version.
if layer_name == 'resnet_v1_50/block3':
feature_depth = 1024
rf, stride, padding = [291.0, 32.0, 145.0]
elif layer_name == 'resnet_v1_50/block4':
feature_depth = 2048
rf, stride, padding = [483.0, 32.0, 241.0]
else:
raise ValueError('Unsupported layer_name.')
def _ProcessSingleScale(scale_index,
boxes,
features,
scales,
scores,
reuse=True):
"""Resize the image and run feature extraction and keypoint selection.
This function will be passed into tf.while_loop() and be called
repeatedly. The input boxes are collected from the previous iteration
[0: scale_index -1]. We get the current scale by
image_scales[scale_index], and run image resizing, feature extraction and
keypoint selection. Then we will get a new set of selected_boxes for
current scale. In the end, we concat the previous boxes with current
selected_boxes as the output.
Args:
scale_index: A valid index in the image_scales.
boxes: Box tensor with the shape of [N, 4].
features: Feature tensor with the shape of [N, depth].
scales: Scale tensor with the shape of [N].
scores: Attention score tensor with the shape of [N].
reuse: Whether or not the layer and its variables should be reused.
Returns:
scale_index: The next scale index for processing.
boxes: Concatenated box tensor with the shape of [K, 4]. K >= N.
features: Concatenated feature tensor with the shape of [K, depth].
scales: Concatenated scale tensor with the shape of [K].
scores: Concatenated attention score tensor with the shape of [K].
"""
scale = tf.gather(image_scales, scale_index)
new_image_size = tf.to_int32(tf.round(original_image_shape_float * scale))
resized_image = tf.image.resize_bilinear(image_tensor, new_image_size)
attention, feature_map = model_fn(
resized_image, normalized_image=True, reuse=reuse)
rf_boxes = CalculateReceptiveBoxes(
tf.shape(feature_map)[1], tf.shape(feature_map)[2], rf, stride, padding)
# Re-project back to the original image space.
rf_boxes = tf.divide(rf_boxes, scale)
attention = tf.reshape(attention, [-1])
feature_map = tf.reshape(feature_map, [-1, feature_depth])
# Use attention score to select feature vectors.
indices = tf.reshape(tf.where(attention >= abs_thres), [-1])
selected_boxes = tf.gather(rf_boxes, indices)
selected_features = tf.gather(feature_map, indices)
selected_scores = tf.gather(attention, indices)
selected_scales = tf.ones_like(selected_scores, tf.float32) / scale
# Concat with the previous result from different scales.
boxes = tf.concat([boxes, selected_boxes], 0)
features = tf.concat([features, selected_features], 0)
scales = tf.concat([scales, selected_scales], 0)
scores = tf.concat([scores, selected_scores], 0)
return scale_index + 1, boxes, features, scales, scores
output_boxes = tf.zeros([0, 4], dtype=tf.float32)
output_features = tf.zeros([0, feature_depth], dtype=tf.float32)
output_scales = tf.zeros([0], dtype=tf.float32)
output_scores = tf.zeros([0], dtype=tf.float32)
# Process the first scale separately, the following scales will reuse the
# graph variables.
(_, output_boxes, output_features, output_scales,
output_scores) = _ProcessSingleScale(
0,
output_boxes,
output_features,
output_scales,
output_scores,
reuse=False)
i = tf.constant(1, dtype=tf.int32)
num_scales = tf.shape(image_scales)[0]
keep_going = lambda j, boxes, features, scales, scores: tf.less(j, num_scales)
(_, output_boxes, output_features, output_scales,
output_scores) = tf.while_loop(
cond=keep_going,
body=_ProcessSingleScale,
loop_vars=[
i, output_boxes, output_features, output_scales, output_scores
],
shape_invariants=[
i.get_shape(),
tf.TensorShape([None, 4]),
tf.TensorShape([None, feature_depth]),
tf.TensorShape([None]),
tf.TensorShape([None])
],
back_prop=False)
feature_boxes = box_list.BoxList(output_boxes)
feature_boxes.add_field('features', output_features)
feature_boxes.add_field('scales', output_scales)
feature_boxes.add_field('scores', output_scores)
nms_max_boxes = tf.minimum(max_feature_num, feature_boxes.num_boxes())
final_boxes = box_list_ops.non_max_suppression(feature_boxes, iou,
nms_max_boxes)
return (final_boxes.get(), final_boxes.get_field('scales'),
final_boxes.get_field('features'), tf.expand_dims(
final_boxes.get_field('scores'), 1))
def BuildModel(layer_name, attention_nonlinear, attention_type,
attention_kernel_size):
"""Build the DELF model.
This function is helpful for constructing the model function which will be fed
to ExtractKeypointDescriptor().
Args:
layer_name: the endpoint of feature extraction layer.
attention_nonlinear: Type of the non-linearity for the attention function.
Currently, only 'softplus' is supported.
attention_type: Type of the attention used. Options are:
'use_l2_normalized_feature' and 'use_default_input_feature'. Note that
this is irrelevant during inference time.
attention_kernel_size: Size of attention kernel (kernel is square).
Returns:
Attention model function.
"""
def _ModelFn(images, normalized_image, reuse):
"""Attention model to get feature map and attention score map.
Args:
images: Image tensor.
normalized_image: Whether or not the images are normalized.
reuse: Whether or not the layer and its variables should be reused.
Returns:
attention: Attention score after the non-linearity.
feature_map: Feature map after ResNet convolution.
"""
if normalized_image:
image_tensor = images
else:
image_tensor = NormalizePixelValues(images)
# Extract features and attention scores.
model = delf_v1.DelfV1(layer_name)
_, attention, _, feature_map, _ = model.GetAttentionPrelogit(
image_tensor,
attention_nonlinear=attention_nonlinear,
attention_type=attention_type,
kernel=[attention_kernel_size, attention_kernel_size],
training_resnet=False,
training_attention=False,
reuse=reuse)
return attention, feature_map
return _ModelFn
def ApplyPcaAndWhitening(data,
pca_matrix,
pca_mean,
output_dim,
use_whitening=False,
pca_variances=None):
"""Applies PCA/whitening to data.
Args:
data: [N, dim] float tensor containing data which undergoes PCA/whitening.
pca_matrix: [dim, dim] float tensor PCA matrix, row-major.
pca_mean: [dim] float tensor, mean to subtract before projection.
output_dim: Number of dimensions to use in output data, of type int.
use_whitening: Whether whitening is to be used.
pca_variances: [dim] float tensor containing PCA variances. Only used if
use_whitening is True.
Returns:
output: [N, output_dim] float tensor with output of PCA/whitening operation.
"""
output = tf.matmul(
tf.subtract(data, pca_mean),
tf.slice(pca_matrix, [0, 0], [output_dim, -1]),
transpose_b=True,
name='pca_matmul')
# Apply whitening if desired.
if use_whitening:
output = tf.divide(
output,
tf.sqrt(tf.slice(pca_variances, [0], [output_dim])),
name='whitening')
return output
def DelfFeaturePostProcessing(boxes, descriptors, config):
"""Extract DELF features from input image.
Args:
boxes: [N, 4] float tensor which denotes the selected receptive box. N is
the number of final feature points which pass through keypoint selection
and NMS steps.
descriptors: [N, input_dim] float tensor.
config: DelfConfig proto with DELF extraction options.
Returns:
locations: [N, 2] float tensor which denotes the selected keypoint
locations.
final_descriptors: [N, output_dim] float tensor with DELF descriptors after
normalization and (possibly) PCA/whitening.
"""
# Get center of descriptor boxes, corresponding to feature locations.
locations = CalculateKeypointCenters(boxes)
# Post-process descriptors: L2-normalize, and if desired apply PCA (followed
# by L2-normalization).
with tf.variable_scope('postprocess'):
final_descriptors = tf.nn.l2_normalize(
descriptors, dim=1, name='l2_normalization')
if config.delf_local_config.use_pca:
# Load PCA parameters.
pca_mean = tf.constant(
datum_io.ReadFromFile(
config.delf_local_config.pca_parameters.mean_path),
dtype=tf.float32)
pca_matrix = tf.constant(
datum_io.ReadFromFile(
config.delf_local_config.pca_parameters.projection_matrix_path),
dtype=tf.float32)
pca_dim = config.delf_local_config.pca_parameters.pca_dim
pca_variances = None
if config.delf_local_config.pca_parameters.use_whitening:
pca_variances = tf.constant(
datum_io.ReadFromFile(
config.delf_local_config.pca_parameters.pca_variances_path),
dtype=tf.float32)
# Apply PCA, and whitening if desired.
final_descriptors = ApplyPcaAndWhitening(
final_descriptors, pca_matrix, pca_mean, pca_dim,
config.delf_local_config.pca_parameters.use_whitening, pca_variances)
# Re-normalize.
final_descriptors = tf.nn.l2_normalize(
final_descriptors, dim=1, name='pca_l2_normalization')
return locations, final_descriptors