## Global features: CNN Image Retrieval
This Python toolbox implements the training and testing of the approach described in the papers:
[![Paper](http://img.shields.io/badge/paper-arXiv.1711.02512-B3181B.svg)](https://arxiv.org/abs/1711.02512)
```
"Fine-tuning CNN Image Retrieval with No Human Annotation",
Radenović F., Tolias G., Chum O.,
TPAMI 2018
```
[![Paper](http://img.shields.io/badge/paper-arXiv.1604.02426-B3181B.svg)](http://arxiv.org/abs/1604.02426)
```
"CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples",
Radenović F., Tolias G., Chum O.,
ECCV 2016
```
Fine-tuned CNNs are used for global feature extraction, with the goal of using
the resulting descriptors for image retrieval. The networks are trained on the
<i>SfM120k</i> landmark images dataset.

<img src="http://cmp.felk.cvut.cz/cnnimageretrieval/img/cnnimageretrieval_network_medium.png" width="100%"/>
When initializing the network, one of the popular pre-trained architectures
for classification tasks (such as ResNet or VGG) is used as the network's
backbone. The fully connected layers of such architectures are discarded,
resulting in a fully convolutional backbone. Given an input image of size
[W × H × C], where C is the number of channels and W and H are the image
width and height, the output is a tensor X with dimensions [W' × H' × K],
where K is the number of feature maps in the last layer. Tensor X can be
viewed as a set of deep local features of the input image. For deep
convolutional features, a simple aggregation approach based on global pooling
arguably provides the best results: it is fast, has a small number of
parameters, and carries a low risk of overfitting. Keeping this in mind, we
convert the local features to a global descriptor vector using one of the
retrieval system's global poolings (MAC, SPoC, or GeM). After this stage,
the feature vector has dimensionality K (for MAC pooling, for example, it
holds the maximum activation of each feature map). The final output
dimensionality for the most common networks varies from 512 to 2048, making
this image representation relatively compact.
The pooled vectors are subsequently L2-normalized. The resulting
representation is then optionally passed through fully connected layers
before being subjected to a new L2 re-normalization. The final image
representation allows measuring the similarity of two images by simply
using their inner product.
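The pooling and normalization steps described above can be sketched in a few lines. This is a minimal numpy illustration of GeM pooling followed by L2 normalization, not the actual DELF implementation; the function name `gem_pool` and the default `p=3.0` are illustrative assumptions:

```python
import numpy as np

def gem_pool(features, p=3.0, eps=1e-6):
  """Generalized-mean (GeM) pooling of a [H', W', K] feature tensor.

  p=1 recovers average pooling (SPoC); large p approaches max pooling (MAC).
  """
  clipped = np.clip(features, eps, None)  # GeM assumes non-negative inputs.
  pooled = np.mean(clipped ** p, axis=(0, 1)) ** (1.0 / p)  # shape [K]
  return pooled / np.linalg.norm(pooled)  # L2-normalize the descriptor

# Two images are then compared by the inner product of their descriptors.
similarity = float(np.dot(gem_pool(np.random.rand(7, 7, 512)),
                          gem_pool(np.random.rand(7, 7, 512))))
```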
### Install DELF library
To be able to use this code, please follow
[these instructions](../../../../INSTALL_INSTRUCTIONS.md) to properly install
the DELF library.
### Usage
<details>
<summary><b>Training</b></summary><br/>
Navigate (`cd`) to the folder `DELF_ROOT/delf/python/training/global_features`.

An example training script is located at `DELF_ROOT/delf/python/training/global_features/train.py`.
```
python3 train.py [--arch ARCH] [--batch_size N] [--data_root PATH]
[--debug] [--directory PATH] [--epochs N] [--gpu_id ID]
[--image_size SIZE] [--launch_tensorboard] [--loss LOSS]
[--loss_margin LM] [--lr LR] [--momentum M] [--multiscale SCALES]
[--neg_num N] [--optimizer OPTIMIZER] [--pool POOL] [--pool_size N]
[--pretrained] [--precompute_whitening DATASET] [--resume]
[--query_size N] [--test_datasets DATASET] [--test_freq N]
[--test_whiten] [--training_dataset DATASET] [--update_every N]
[--validation_type TYPE] [--weight_decay N] [--whitening]
```
For a detailed explanation of the options, run:
```
python3 train.py --helpfull
```
Standard training of our models was run with the following parameters:
```
python3 train.py \
--directory="DESTINATION_PATH" \
--gpu_id='0' \
--data_root="TRAINING_DATA_DIRECTORY" \
--training_dataset='retrieval-SfM-120k' \
--test_datasets='roxford5k,rparis6k' \
--arch='ResNet101' \
--pool='gem' \
--whitening=True \
--debug=True \
--loss='triplet' \
--loss_margin=0.85 \
--optimizer='adam' \
--lr=5e-7 --neg_num=3 --query_size=2000 \
--pool_size=20000 --batch_size=5 \
--image_size=1024 --epochs=100 --test_freq=5 \
--multiscale='[1, 2**(1/2), 1/2**(1/2)]'
```
**Note**: Data and networks used for training and testing are automatically downloaded when using the example training
script (```DELF_ROOT/delf/python/training/global_features/train.py```).
</details>
<details>
<summary><b>Training logic flow</b></summary><br/>
**Initialization phase**

1. Checking whether the required datasets (both test and train/val) are present in the data folder and automatically downloading them if they are not.
1. Setting up the logging and creating a logging/checkpoint directory.
1. Initializing the model according to the user-provided parameters (architecture/pooling/whitening/pretrained, etc.).
1. Defining the loss (contrastive/triplet) according to the user parameters.
1. Defining the optimizer (Adam/SGD with learning rate/weight decay/momentum) according to the user parameters.
1. Initializing the CheckpointManager and resuming from the latest checkpoint if the resume flag is set.
1. Launching Tensorboard if the flag is set.
1. Initializing the training (and, if required, validation) datasets.
1. Freezing BatchNorm weight updates: since training processes one image at a time, the statistics would not be per batch, so we freeze them and use the pretrained ImageNet statistics instead.
1. Evaluating the network performance on the test datasets before training.
**Training phase**

The main training loop (for the required number of epochs):

1. Finding the hard negative pairs in the dataset (using a forward pass through the model).
1. Creating the training dataset from a generator that changes every epoch. Each element of the dataset consists of 1 query image, 1 positive image, and N hard negative images (N is specified by the `neg_num` flag), together with an array labeling the images: Positive (-1), Query (0), Negative (1).
1. Performing one training step and calculating the final epoch loss.
1. If validation is required, finding hard negatives in the validation set, which has the same structure as the training set, then performing one validation step and calculating the loss.
1. Evaluating on the test datasets every `test_freq` epochs.
1. Saving a checkpoint (the optimizer state and the model weights).
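The objective computed in the training step can be sketched as follows. This is a minimal numpy version of the triplet loss with the margin used in the standard setup; descriptors are assumed L2-normalized, and the helper name `triplet_loss` is illustrative, not the actual DELF implementation:

```python
import numpy as np

def triplet_loss(query, positive, negatives, margin=0.85):
  """Sum over negatives of max(0, margin + d(q, pos) - d(q, neg)),
  where d is the Euclidean distance between L2-normalized descriptors."""
  d_pos = np.linalg.norm(query - positive)
  d_neg = np.linalg.norm(query - np.asarray(negatives), axis=1)
  return float(np.sum(np.maximum(0.0, margin + d_pos - d_neg)))
```

An easy hard negative (far from the query) contributes zero loss; a negative sitting on top of the query contributes the full margin.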
</details>
## Exporting the Trained Model
Assuming the training output (the TensorFlow checkpoint) is located in the
`--directory` path, the following command exports the model:
```
python3 model/export_CNN_global_model.py \
[--ckpt_path PATH] [--export_path PATH] [--input_scales_list LIST]
[--multi_scale_pool_type TYPE] [--normalize_global_descriptor BOOL]
[--arch ARCHITECTURE] [--pool POOLING] [--whitening BOOL]
```
*NOTE:* The checkpoint path must point to an `.h5` file.
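The effect of `--multi_scale_pool_type` and `--normalize_global_descriptor` can be illustrated with a simplified numpy sketch of how per-scale descriptors are combined; `pool_multiscale` is an illustrative name, not the exported model's API:

```python
import numpy as np

def pool_multiscale(descriptors, pool_type='average', normalize=True):
  """Combines per-scale global descriptors of shape [S, D] into one [D] vector."""
  d = np.asarray(descriptors, dtype=float)
  pooled = d.mean(axis=0) if pool_type == 'average' else d.sum(axis=0)
  if normalize:
    pooled = pooled / np.linalg.norm(pooled)  # optional L2 normalization
  return pooled
```

With `pool_type='None'` in the real exporter, the [S, D] matrix is returned as-is instead of being pooled.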
## Testing the trained model
After the trained model has been exported, it can be used to extract global
features similarly as for the DELG model. Please follow
[these instructions](https://github.com/tensorflow/models/tree/master/research/delf/delf/python/training#testing-the-trained-model).
After training with the standard setup for 100 epochs, the following results
are obtained on the ROxford and RParis datasets under single-scale evaluation:
```
>> roxford5k: mAP E: 74.88, M: 58.28, H: 30.4
>> roxford5k: mP@k[1, 5, 10] E: [89.71 84.8 79.07],
M: [91.43 84.67 78.24],
H: [68.57 53.29 43.29]
>> rparis6k: mAP E: 89.21, M: 73.69, H: 49.1
>> rparis6k: mP@k[1, 5, 10] E: [98.57 97.43 95.57],
M: [98.57 99.14 98.14],
H: [94.29 90. 87.29]
```
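The mAP numbers above are means, over queries, of the average precision. AP can be computed from a ranked binary relevance list as in this simplified sketch (the official revisited-benchmark evaluation additionally handles junk images and the Easy/Medium/Hard splits):

```python
import numpy as np

def average_precision(ranked_relevance):
  """AP for one query: mean of precision@k taken at the ranks of relevant items."""
  rel = np.asarray(ranked_relevance, dtype=float)
  if rel.sum() == 0:
    return 0.0
  precision_at_k = np.cumsum(rel) / (np.arange(rel.size) + 1)
  return float((precision_at_k * rel).sum() / rel.sum())
```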
# Lint as: python3
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Export global CNN feature tensorflow inference model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl import app
from absl import flags
import tensorflow as tf
from delf.python.training.model import global_model
from delf.python.training.model import export_model_utils
FLAGS = flags.FLAGS

flags.DEFINE_string('ckpt_path', None, help='Path to saved checkpoint.')
flags.DEFINE_string('export_path', None,
                    help='Path where model will be exported.')
flags.DEFINE_list(
    'input_scales_list', None,
    'Optional input image scales to use. If None (default), an input '
    'end-point "input_scales" is added for the exported model. If not None, '
    'the specified list of floats will be hard-coded as the desired input '
    'scales.')
flags.DEFINE_enum(
    'multi_scale_pool_type', 'None', ['None', 'average', 'sum'],
    "If 'None' (default), the model is exported with an output end-point "
    "'global_descriptors', where the global descriptor for each scale is "
    "returned separately. If not 'None', the global descriptor of each scale "
    "is pooled and a 1D global descriptor is returned, with output end-point "
    "'global_descriptor'.")
flags.DEFINE_boolean('normalize_global_descriptor', False,
                     'If True, L2-normalizes global descriptor.')
# Network architecture and initialization options.
flags.DEFINE_string('arch', 'ResNet101',
                    'model architecture (default: ResNet101)')
flags.DEFINE_string('pool', 'gem', 'pooling options (default: gem)')
flags.DEFINE_boolean('whitening', False,
                     'train model with learnable whitening (linear layer) '
                     'after the pooling')
def _NormalizeImages(images, *args, **kwargs):
  """Normalizes pixel values in images.

  Args:
    images: `Tensor`, images to normalize.
    *args: Unused positional arguments.
    **kwargs: Unused keyword arguments (e.g. `pixel_value_offset`).

  Returns:
    normalized_images: `Tensor`, normalized images.
  """
  del args, kwargs  # Normalization parameters are fixed by the 'caffe' mode.
  return tf.keras.applications.imagenet_utils.preprocess_input(
      images, mode='caffe')
class _ExtractModule(tf.Module):
  """Helper module to build and save global feature model."""

  def __init__(self,
               multi_scale_pool_type='None',
               normalize_global_descriptor=False,
               input_scales_tensor=None):
    """Initialization of global feature model.

    Args:
      multi_scale_pool_type: Type of multi-scale pooling to perform.
      normalize_global_descriptor: Whether to L2-normalize global descriptor.
      input_scales_tensor: If None, the exported function to be used should be
        ExtractFeatures, where an input end-point "input_scales" is added for
        the exported model. If not None, the specified 1D tensor of floats
        will be hard-coded as the desired input scales, in conjunction with
        ExtractFeaturesFixedScales.
    """
    self._multi_scale_pool_type = multi_scale_pool_type
    self._normalize_global_descriptor = normalize_global_descriptor
    if input_scales_tensor is None:
      self._input_scales_tensor = []
    else:
      self._input_scales_tensor = input_scales_tensor

    self._model = global_model.GlobalFeatureNet(
        FLAGS.arch, FLAGS.pool, FLAGS.whitening, pretrained=False)

  def LoadWeights(self, checkpoint_path):
    self._model.load_weights(checkpoint_path)

  @tf.function(input_signature=[
      tf.TensorSpec(shape=[None, None, 3], dtype=tf.uint8,
                    name='input_image'),
      tf.TensorSpec(shape=[None], dtype=tf.float32, name='input_scales'),
      tf.TensorSpec(shape=[None], dtype=tf.int32,
                    name='input_global_scales_ind')
  ])
  def ExtractFeatures(self, input_image, input_scales,
                      input_global_scales_ind):
    extracted_features = export_model_utils.ExtractGlobalFeatures(
        input_image,
        input_scales,
        input_global_scales_ind,
        lambda x: self._model(x, training=False),
        multi_scale_pool_type=self._multi_scale_pool_type,
        normalize_global_descriptor=self._normalize_global_descriptor,
        # Pass the function itself, not the result of calling it.
        normalization_function=_NormalizeImages)

    named_output_tensors = {}
    named_output_tensors['global_descriptors'] = tf.identity(
        extracted_features, name='global_descriptors')
    return named_output_tensors

  @tf.function(input_signature=[
      tf.TensorSpec(shape=[None, None, 3], dtype=tf.uint8, name='input_image')
  ])
  def ExtractFeaturesFixedScales(self, input_image):
    return self.ExtractFeatures(input_image, self._input_scales_tensor,
                                tf.range(tf.size(self._input_scales_tensor)))
def main(argv):
  if len(argv) > 1:
    raise app.UsageError('Too many command-line arguments.')

  export_path = FLAGS.export_path
  if os.path.exists(export_path):
    raise ValueError('export_path %s already exists.' % export_path)

  if FLAGS.input_scales_list is None:
    input_scales_tensor = None
  else:
    input_scales_tensor = tf.constant(
        [float(s) for s in FLAGS.input_scales_list],
        dtype=tf.float32,
        shape=[len(FLAGS.input_scales_list)],
        name='input_scales')
  module = _ExtractModule(FLAGS.multi_scale_pool_type,
                          FLAGS.normalize_global_descriptor,
                          input_scales_tensor)

  # Load the weights.
  checkpoint_path = FLAGS.ckpt_path
  module.LoadWeights(checkpoint_path)
  print('Checkpoint loaded from ', checkpoint_path)

  # Save the module.
  if FLAGS.input_scales_list is None:
    served_function = module.ExtractFeatures
  else:
    served_function = module.ExtractFeaturesFixedScales

  tf.saved_model.save(
      module, export_path, signatures={'serving_default': served_function})


if __name__ == '__main__':
  app.run(main)
```
@@ -183,7 +183,8 @@ def ExtractGlobalFeatures(image,
                          global_scales_ind,
                          model_fn,
                          multi_scale_pool_type='None',
-                         normalize_global_descriptor=False):
+                         normalize_global_descriptor=False,
+                         normalization_function=gld.NormalizeImages):
   """Extract global features for input image.

   Args:
@@ -201,6 +202,7 @@ def ExtractGlobalFeatures(image,
       and a 1D global descriptor is returned.
     normalize_global_descriptor: If True, output global descriptors are
       L2-normalized.
+    normalization_function: Function used for normalization.

   Returns:
     global_descriptors: If `multi_scale_pool_type` is 'None', returns a [S, D]
@@ -213,7 +215,7 @@ def ExtractGlobalFeatures(image,
   """
   original_image_shape_float = tf.gather(
       tf.dtypes.cast(tf.shape(image), tf.float32), [0, 1])
-  image_tensor = gld.NormalizeImages(
+  image_tensor = normalization_function(
       image, pixel_value_offset=128.0, pixel_value_scale=128.0)
   image_tensor = tf.expand_dims(image_tensor, 0, name='image/expand_dims')
```