## Global features: CNN Image Retrieval
This Python toolbox implements the training and testing of the approach described in the papers:
[![Paper](http://img.shields.io/badge/paper-arXiv.1711.02512-B3181B.svg)](https://arxiv.org/abs/1711.02512)
```
"Fine-tuning CNN Image Retrieval with No Human Annotation",
Radenović F., Tolias G., Chum O.,
TPAMI 2018
```
[![Paper](http://img.shields.io/badge/paper-arXiv.1604.02426-B3181B.svg)](http://arxiv.org/abs/1604.02426)
```
"CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples",
Radenović F., Tolias G., Chum O.,
ECCV 2016
```
Fine-tuned CNNs are used for global feature extraction, with the goal of using
the resulting descriptors for image retrieval. The networks are trained on the
<i>SfM120k</i> landmark images dataset.

<img src="http://cmp.felk.cvut.cz/cnnimageretrieval/img/cnnimageretrieval_network_medium.png" width="100%"/>
When initializing the network, one of the popular pre-trained architectures
for classification tasks (such as ResNet or VGG) is used as the network's
backbone. The fully connected layers of such architectures are discarded,
resulting in a fully convolutional backbone. Given an input image of size
[W × H × C], where C is the number of channels and W and H are the image
width and height, the output is a tensor X with dimensions [W' × H' × K],
where K is the number of feature maps in the last layer. Tensor X can be
viewed as a set of deep local features of the input image. For deep
convolutional features, a simple aggregation approach based on global pooling
arguably provides the best results: it is fast, has a small number of
parameters, and carries a low risk of overfitting. Keeping this in mind, we
convert the local features to a global descriptor vector using one of the
retrieval system's global poolings (MAC, SPoC, or GeM). After this stage,
the feature vector has dimensionality K (for MAC pooling, for example, it
holds the maximum activation of each feature map). The final output
dimensionality for the most common networks varies from 512 to 2048, making
this image representation relatively compact.
The pooled vectors are subsequently L2-normalized. The resulting
representation is then optionally passed through fully connected layers
before being subjected to a new L2 re-normalization. The final image
representation allows measuring the similarity of two images by simply
using their inner product.
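The pooling and normalization steps described above can be sketched in a few lines. This is a minimal numpy illustration of GeM pooling followed by L2 normalization, not the actual DELF implementation; the function name `gem_pool` and the default `p=3.0` are illustrative assumptions:

```python
import numpy as np

def gem_pool(features, p=3.0, eps=1e-6):
  """Generalized-mean (GeM) pooling of a [H', W', K] feature tensor.

  p=1 recovers average pooling (SPoC); large p approaches max pooling (MAC).
  """
  clipped = np.clip(features, eps, None)  # GeM assumes non-negative inputs.
  pooled = np.mean(clipped ** p, axis=(0, 1)) ** (1.0 / p)  # shape [K]
  return pooled / np.linalg.norm(pooled)  # L2-normalize the descriptor

# Two images are then compared by the inner product of their descriptors.
similarity = float(np.dot(gem_pool(np.random.rand(7, 7, 512)),
                          gem_pool(np.random.rand(7, 7, 512))))
```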
### Install DELF library
To be able to use this code, please follow
[these instructions](../../../../INSTALL_INSTRUCTIONS.md) to properly install
the DELF library.
### Usage
<details>
<summary><b>Training</b></summary><br/>
Navigate (`cd`) to the folder `DELF_ROOT/delf/python/training/global_features`.

An example training script is located at `DELF_ROOT/delf/python/training/global_features/train.py`.
```
python3 train.py [--arch ARCH] [--batch_size N] [--data_root PATH]
[--debug] [--directory PATH] [--epochs N] [--gpu_id ID]
[--image_size SIZE] [--launch_tensorboard] [--loss LOSS]
[--loss_margin LM] [--lr LR] [--momentum M] [--multiscale SCALES]
[--neg_num N] [--optimizer OPTIMIZER] [--pool POOL] [--pool_size N]
[--pretrained] [--precompute_whitening DATASET] [--resume]
[--query_size N] [--test_datasets DATASET] [--test_freq N]
[--test_whiten] [--training_dataset DATASET] [--update_every N]
[--validation_type TYPE] [--weight_decay N] [--whitening]
```
For a detailed explanation of the options, run:
```
python3 train.py --helpfull
```
Standard training of our models was run with the following parameters:
```
python3 train.py \
--directory="DESTINATION_PATH" \
--gpu_id='0' \
--data_root="TRAINING_DATA_DIRECTORY" \
--training_dataset='retrieval-SfM-120k' \
--test_datasets='roxford5k,rparis6k' \
--arch='ResNet101' \
--pool='gem' \
--whitening=True \
--debug=True \
--loss='triplet' \
--loss_margin=0.85 \
--optimizer='adam' \
--lr=5e-7 --neg_num=3 --query_size=2000 \
--pool_size=20000 --batch_size=5 \
--image_size=1024 --epochs=100 --test_freq=5 \
--multiscale='[1, 2**(1/2), 1/2**(1/2)]'
```
**Note**: Data and networks used for training and testing are automatically downloaded when using the example training
script (```DELF_ROOT/delf/python/training/global_features/train.py```).
</details>
<details>
<summary><b>Training logic flow</b></summary><br/>
**Initialization phase**

1. Checking whether the required datasets (both test and train/val) are present in the data folder and automatically downloading them if they are not.
1. Setting up the logging and creating a logging/checkpoint directory.
1. Initializing the model according to the user-provided parameters (architecture/pooling/whitening/pretrained, etc.).
1. Defining the loss (contrastive/triplet) according to the user parameters.
1. Defining the optimizer (Adam/SGD with learning rate/weight decay/momentum) according to the user parameters.
1. Initializing the CheckpointManager and resuming from the latest checkpoint if the resume flag is set.
1. Launching Tensorboard if the flag is set.
1. Initializing the training (and, if required, validation) datasets.
1. Freezing BatchNorm weight updates: since training processes one image at a time, the statistics would not be per batch, so we freeze them and use the pretrained ImageNet statistics instead.
1. Evaluating the network performance on the test datasets before training.
**Training phase**

The main training loop (for the required number of epochs):

1. Finding the hard negative pairs in the dataset (using a forward pass through the model).
1. Creating the training dataset from a generator that changes every epoch. Each element of the dataset consists of 1 query image, 1 positive image, and N hard negative images (N is specified by the `neg_num` flag), together with an array labeling the images: Positive (-1), Query (0), Negative (1).
1. Performing one training step and calculating the final epoch loss.
1. If validation is required, finding hard negatives in the validation set, which has the same structure as the training set, then performing one validation step and calculating the loss.
1. Evaluating on the test datasets every `test_freq` epochs.
1. Saving a checkpoint (the optimizer state and the model weights).
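The objective computed in the training step can be sketched as follows. This is a minimal numpy version of the triplet loss with the margin used in the standard setup; descriptors are assumed L2-normalized, and the helper name `triplet_loss` is illustrative, not the actual DELF implementation:

```python
import numpy as np

def triplet_loss(query, positive, negatives, margin=0.85):
  """Sum over negatives of max(0, margin + d(q, pos) - d(q, neg)),
  where d is the Euclidean distance between L2-normalized descriptors."""
  d_pos = np.linalg.norm(query - positive)
  d_neg = np.linalg.norm(query - np.asarray(negatives), axis=1)
  return float(np.sum(np.maximum(0.0, margin + d_pos - d_neg)))
```

An easy hard negative (far from the query) contributes zero loss; a negative sitting on top of the query contributes the full margin.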
</details>
## Exporting the Trained Model
Assuming the training output (the TensorFlow checkpoint) is located in the
`--directory` path, the following command exports the model:
```
python3 model/export_CNN_global_model.py \
[--ckpt_path PATH] [--export_path PATH] [--input_scales_list LIST]
[--multi_scale_pool_type TYPE] [--normalize_global_descriptor BOOL]
[--arch ARCHITECTURE] [--pool POOLING] [--whitening BOOL]
```
*NOTE:* The checkpoint path must point to an `.h5` file.
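The effect of `--multi_scale_pool_type` and `--normalize_global_descriptor` can be illustrated with a simplified numpy sketch of how per-scale descriptors are combined; `pool_multiscale` is an illustrative name, not the exported model's API:

```python
import numpy as np

def pool_multiscale(descriptors, pool_type='average', normalize=True):
  """Combines per-scale global descriptors of shape [S, D] into one [D] vector."""
  d = np.asarray(descriptors, dtype=float)
  pooled = d.mean(axis=0) if pool_type == 'average' else d.sum(axis=0)
  if normalize:
    pooled = pooled / np.linalg.norm(pooled)  # optional L2 normalization
  return pooled
```

With `pool_type='None'` in the real exporter, the [S, D] matrix is returned as-is instead of being pooled.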
## Testing the trained model
After the trained model has been exported, it can be used to extract global
features similarly as for the DELG model. Please follow
[these instructions](https://github.com/tensorflow/models/tree/master/research/delf/delf/python/training#testing-the-trained-model).
After training with the standard setup for 100 epochs, the following results
are obtained on the ROxford and RParis datasets under single-scale evaluation:
```
>> roxford5k: mAP E: 74.88, M: 58.28, H: 30.4
>> roxford5k: mP@k[1, 5, 10] E: [89.71 84.8 79.07],
M: [91.43 84.67 78.24],
H: [68.57 53.29 43.29]
>> rparis6k: mAP E: 89.21, M: 73.69, H: 49.1
>> rparis6k: mP@k[1, 5, 10] E: [98.57 97.43 95.57],
M: [98.57 99.14 98.14],
H: [94.29 90. 87.29]
```
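The mAP numbers above are means, over queries, of the average precision. AP can be computed from a ranked binary relevance list as in this simplified sketch (the official revisited-benchmark evaluation additionally handles junk images and the Easy/Medium/Hard splits):

```python
import numpy as np

def average_precision(ranked_relevance):
  """AP for one query: mean of precision@k taken at the ranks of relevant items."""
  rel = np.asarray(ranked_relevance, dtype=float)
  if rel.sum() == 0:
    return 0.0
  precision_at_k = np.cumsum(rel) / (np.arange(rel.size) + 1)
  return float((precision_at_k * rel).sum() / rel.sum())
```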
# Lint as: python3
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Export global CNN feature tensorflow inference model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl import app
from absl import flags
import tensorflow as tf
from delf.python.training.model import global_model
from delf.python.training.model import export_model_utils
FLAGS = flags.FLAGS

flags.DEFINE_string('ckpt_path', None, help='Path to saved checkpoint.')
flags.DEFINE_string('export_path', None,
                    help='Path where model will be exported.')
flags.DEFINE_list(
    'input_scales_list', None,
    'Optional input image scales to use. If None (default), an input '
    'end-point "input_scales" is added for the exported model. If not None, '
    'the specified list of floats will be hard-coded as the desired input '
    'scales.')
flags.DEFINE_enum(
    'multi_scale_pool_type', 'None', ['None', 'average', 'sum'],
    "If 'None' (default), the model is exported with an output end-point "
    "'global_descriptors', where the global descriptor for each scale is "
    "returned separately. If not 'None', the global descriptor of each scale "
    "is pooled and a 1D global descriptor is returned, with output end-point "
    "'global_descriptor'.")
flags.DEFINE_boolean('normalize_global_descriptor', False,
                     'If True, L2-normalizes global descriptor.')
# Network architecture and initialization options.
flags.DEFINE_string('arch', 'ResNet101',
                    'model architecture (default: ResNet101)')
flags.DEFINE_string('pool', 'gem', 'pooling options (default: gem)')
flags.DEFINE_boolean('whitening', False,
                     'train model with learnable whitening (linear layer) '
                     'after the pooling')
def _NormalizeImages(images, *args, **kwargs):
  """Normalizes pixel values in images.

  Args:
    images: `Tensor`, images to normalize.
    *args: Unused positional arguments.
    **kwargs: Unused keyword arguments (e.g. `pixel_value_offset`).

  Returns:
    normalized_images: `Tensor`, normalized images.
  """
  del args, kwargs  # Normalization parameters are fixed by the 'caffe' mode.
  return tf.keras.applications.imagenet_utils.preprocess_input(
      images, mode='caffe')
class _ExtractModule(tf.Module):
  """Helper module to build and save global feature model."""

  def __init__(self,
               multi_scale_pool_type='None',
               normalize_global_descriptor=False,
               input_scales_tensor=None):
    """Initialization of global feature model.

    Args:
      multi_scale_pool_type: Type of multi-scale pooling to perform.
      normalize_global_descriptor: Whether to L2-normalize global descriptor.
      input_scales_tensor: If None, the exported function to be used should be
        ExtractFeatures, where an input end-point "input_scales" is added for
        the exported model. If not None, the specified 1D tensor of floats
        will be hard-coded as the desired input scales, in conjunction with
        ExtractFeaturesFixedScales.
    """
    self._multi_scale_pool_type = multi_scale_pool_type
    self._normalize_global_descriptor = normalize_global_descriptor
    if input_scales_tensor is None:
      self._input_scales_tensor = []
    else:
      self._input_scales_tensor = input_scales_tensor

    self._model = global_model.GlobalFeatureNet(
        FLAGS.arch, FLAGS.pool, FLAGS.whitening, pretrained=False)

  def LoadWeights(self, checkpoint_path):
    self._model.load_weights(checkpoint_path)

  @tf.function(input_signature=[
      tf.TensorSpec(shape=[None, None, 3], dtype=tf.uint8,
                    name='input_image'),
      tf.TensorSpec(shape=[None], dtype=tf.float32, name='input_scales'),
      tf.TensorSpec(shape=[None], dtype=tf.int32,
                    name='input_global_scales_ind')
  ])
  def ExtractFeatures(self, input_image, input_scales,
                      input_global_scales_ind):
    extracted_features = export_model_utils.ExtractGlobalFeatures(
        input_image,
        input_scales,
        input_global_scales_ind,
        lambda x: self._model(x, training=False),
        multi_scale_pool_type=self._multi_scale_pool_type,
        normalize_global_descriptor=self._normalize_global_descriptor,
        # Pass the function itself, not the result of calling it.
        normalization_function=_NormalizeImages)

    named_output_tensors = {}
    named_output_tensors['global_descriptors'] = tf.identity(
        extracted_features, name='global_descriptors')
    return named_output_tensors

  @tf.function(input_signature=[
      tf.TensorSpec(shape=[None, None, 3], dtype=tf.uint8, name='input_image')
  ])
  def ExtractFeaturesFixedScales(self, input_image):
    return self.ExtractFeatures(input_image, self._input_scales_tensor,
                                tf.range(tf.size(self._input_scales_tensor)))
def main(argv):
  if len(argv) > 1:
    raise app.UsageError('Too many command-line arguments.')

  export_path = FLAGS.export_path
  if os.path.exists(export_path):
    raise ValueError('export_path %s already exists.' % export_path)

  if FLAGS.input_scales_list is None:
    input_scales_tensor = None
  else:
    input_scales_tensor = tf.constant(
        [float(s) for s in FLAGS.input_scales_list],
        dtype=tf.float32,
        shape=[len(FLAGS.input_scales_list)],
        name='input_scales')
  module = _ExtractModule(FLAGS.multi_scale_pool_type,
                          FLAGS.normalize_global_descriptor,
                          input_scales_tensor)

  # Load the weights.
  checkpoint_path = FLAGS.ckpt_path
  module.LoadWeights(checkpoint_path)
  print('Checkpoint loaded from ', checkpoint_path)

  # Save the module.
  if FLAGS.input_scales_list is None:
    served_function = module.ExtractFeatures
  else:
    served_function = module.ExtractFeaturesFixedScales

  tf.saved_model.save(
      module, export_path, signatures={'serving_default': served_function})


if __name__ == '__main__':
  app.run(main)
```
@@ -183,7 +183,8 @@ def ExtractGlobalFeatures(image,
                          global_scales_ind,
                          model_fn,
                          multi_scale_pool_type='None',
-                         normalize_global_descriptor=False):
+                         normalize_global_descriptor=False,
+                         normalization_function=gld.NormalizeImages):
   """Extract global features for input image.

   Args:
@@ -201,6 +202,7 @@ def ExtractGlobalFeatures(image,
       and a 1D global descriptor is returned.
     normalize_global_descriptor: If True, output global descriptors are
       L2-normalized.
+    normalization_function: Function used for normalization.

   Returns:
     global_descriptors: If `multi_scale_pool_type` is 'None', returns a [S, D]
@@ -213,7 +215,7 @@ def ExtractGlobalFeatures(image,
   """
   original_image_shape_float = tf.gather(
       tf.dtypes.cast(tf.shape(image), tf.float32), [0, 1])
-  image_tensor = gld.NormalizeImages(
+  image_tensor = normalization_function(
       image, pixel_value_offset=128.0, pixel_value_scale=128.0)
   image_tensor = tf.expand_dims(image_tensor, 0, name='image/expand_dims')
```