Unverified Commit c627506f authored by André Araujo, committed by GitHub

DELF open-source library v2.0 (#8454)

* Merged commit includes the following changes:
253126424  by Andre Araujo:

    Scripts to compute metrics for Google Landmarks dataset.

    Also, a small fix to metric in retrieval case: avoids duplicate predicted images.

--
253118971  by Andre Araujo:

    Metrics for Google Landmarks dataset.

--
253106953  by Andre Araujo:

    Library to read files from Google Landmarks challenges.

--
250700636  by Andre Araujo:

    Handle case of aggregation extraction with empty set of input features.

--
250516819  by Andre Araujo:

    Add minimum size for DELF extractor.

--
250435822  by Andre Araujo:

    Add max_image_size/min_image_size for open-source DELF proto / module.

--
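    A minimal sketch of how bounds like the max_image_size/min_image_size options above are typically applied before feature extraction (illustrative only, not the library's exact logic; the real options live in the DELF config): downscale when the largest side exceeds the maximum, upscale when it falls below the minimum, otherwise keep the original resolution.

        def resize_factor(height, width, max_image_size, min_image_size):
          """Returns the scale to apply; a negative bound is treated as disabled here."""
          largest_side = max(height, width)
          if max_image_size >= 0 and largest_side > max_image_size:
            return max_image_size / largest_side
          if min_image_size >= 0 and largest_side < min_image_size:
            return min_image_size / largest_side
          return 1.0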
250414606  by Andre Araujo:

    Refactor extract_aggregation to allow reuse with different datasets.

--
250356863  by Andre Araujo:

    Remove unnecessary cmd_args variable from boxes_and_features_extraction.

--
249783379  by Andre Araujo:

    Create directory for writing mapping file if it does not exist.

--
249581591  by Andre Araujo:

    Refactor scripts to extract boxes and features from images in Revisited datasets.
    Also, change tf.logging.info --> print for easier logging in open source code.

--
249511821  by Andre Araujo:

    Small change to function for file/directory handling.

--
249289499  by Andre Araujo:

    Internal change.

--

PiperOrigin-RevId: 253126424

* Updating DELF init to adjust to latest changes

* Editing init files for python packages

* Edit D2R dataset reader to work with py3.

PiperOrigin-RevId: 253135576

* DELF package: fix import ordering

* Adding new requirements to setup.py

* Adding init file for training dir

* Merged commit includes the following changes:

FolderOrigin-RevId: /google/src/cloud/andrearaujo/delf_oss/google3/..

* Adding init file for training subdirs

* Working version of DELF training

* Internal change.

PiperOrigin-RevId: 253248648

* Fix variance loading in open-source code.

PiperOrigin-RevId: 260619120

* Separate image re-ranking as a standalone library, and add metric writing to dataset library.

PiperOrigin-RevId: 260998608

* Tool to read written metrics files for D2R Revisited datasets; a test is added.

Also adds a unit test for previously-existing SaveMetricsFile function.

PiperOrigin-RevId: 263361410

* Add optional resize factor for feature extraction.

PiperOrigin-RevId: 264437080

* Fix for NumPy's new version spacing changes.

PiperOrigin-RevId: 265127245

* Make image matching function visible, and add support for RANSAC seed.

PiperOrigin-RevId: 277177468

* Avoid matplotlib failure due to missing display backend.

PiperOrigin-RevId: 287316435

* Removes tf.contrib dependency.

PiperOrigin-RevId: 288842237

* Fix tf contrib removal for feature_aggregation_extractor.

PiperOrigin-RevId: 289487669

* Merged commit includes the following changes:
309118395  by Andre Araujo:

    Make DELF open-source code compatible with TF2.

--
309067582  by Andre Araujo:

    Handle image resizing rounding properly for python extraction.

    New behavior is tested with unit tests.

--
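    The rounding fix above can be illustrated with a small sketch (the helper name is an assumption): resized dimensions are computed by rounding to the nearest integer rather than truncating, so a 0.5 resize of a 711-pixel side yields 356 instead of 355.

        import numpy as np

        def scaled_dimensions(height, width, scale_factor):
          """Resized (height, width), rounding to nearest instead of truncating."""
          new_height = int(np.round(height * scale_factor))
          new_width = int(np.round(width * scale_factor))
          return new_height, new_width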
308690144  by Andre Araujo:

    Several changes to improve DELF model/training code and make it work in TF 2.1.0:
    - Rename some files for better clarity
    - Using compat.v1 versions of functions
    - Formatting changes
    - Using more appropriate TF function names

--
308689397  by Andre Araujo:

    Internal change.

--
308341315  by Andre Araujo:

    Remove old slim dependency in DELF open-source model.

    This avoids issues with requiring old TF-v1, making it compatible with latest TF.

--
306777559  by Andre Araujo:

    Internal change

--
304505811  by Andre Araujo:

    Raise error during geometric verification if local features have different dimensionalities.

--
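    The safeguard above can be sketched as follows (function and argument names are illustrative): geometric verification assumes both images' local descriptors live in the same space, so a mismatch should raise an error instead of failing silently later.

        def check_descriptor_dimensions(query_descriptors, index_descriptors):
          """Raises ValueError if descriptor arrays of shape [N, D] disagree on D."""
          if query_descriptors.shape[1] != index_descriptors.shape[1]:
            raise ValueError(
                'Local features have different dimensionalities: %d vs %d.' %
                (query_descriptors.shape[1], index_descriptors.shape[1]))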
301739992  by Andre Araujo:

    Transform some geometric verification constants into arguments, to allow custom matching.

--
301300324  by Andre Araujo:

    Apply name change (experimental_run_v2 -> run) for all callers in TensorFlow.

--
299919057  by Andre Araujo:

    Automated refactoring to make code Python 3 compatible.

--
297953698  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
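    The migration pattern referred to in these entries is the standard one-line import replacement, which keeps TF1-style code importable under a TF 2.x installation:

        # Before (TF 1.x):
        # import tensorflow as tf

        # After (TF 2.x installation, TF 1.x API surface):
        import tensorflow.compat.v1 as tf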
297521242  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297278247  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297270405  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297238741  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297108605  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
294676131  by Andre Araujo:

    Add option to resize images to square resolutions without aspect ratio preservation.

--
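    A sketch of the option described above (argument names are assumptions, not the module's API): when square output is requested, resize directly to target_size x target_size and ignore the aspect ratio; otherwise scale the longer side to target_size as before.

        import tensorflow as tf

        def resize_image(image, target_size, square_output=False):
          if square_output:
            # Aspect ratio is intentionally not preserved.
            return tf.image.resize(image, [target_size, target_size])
          height = tf.cast(tf.shape(image)[0], tf.float32)
          width = tf.cast(tf.shape(image)[1], tf.float32)
          scale = tf.cast(target_size, tf.float32) / tf.maximum(height, width)
          new_height = tf.cast(tf.round(height * scale), tf.int32)
          new_width = tf.cast(tf.round(width * scale), tf.int32)
          return tf.image.resize(image, tf.stack([new_height, new_width]))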
293849641  by Andre Araujo:

    Internal change.

--
293840896  by Andre Araujo:

    Changing Slim import to tf_slim codebase.

--
293661660  by Andre Araujo:

    Allow the delf training script to read from TFRecords dataset.

--
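    In spirit, reading from TFRecords looks like the sketch below (feature keys and file pattern are assumptions; the actual reader is the googlelandmarks dataset module used by the training script further down):

        import tensorflow as tf

        def parse_example(serialized):
          features = tf.io.parse_single_example(
              serialized,
              features={
                  'image/encoded': tf.io.FixedLenFeature([], tf.string),
                  'image/class/label': tf.io.FixedLenFeature([], tf.int64),
              })
          image = tf.io.decode_jpeg(features['image/encoded'], channels=3)
          return image, features['image/class/label']

        files = tf.data.Dataset.list_files('/tmp/data/train*')
        dataset = files.interleave(tf.data.TFRecordDataset)
        dataset = dataset.map(parse_example).batch(32)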
291755295  by Andre Araujo:

    Internal change.

--
291448508  by Andre Araujo:

    Internal change.

--
291414459  by Andre Araujo:

    Adding train script.

--
291384336  by Andre Araujo:

    Adding model export script and test.

--
291260565  by Andre Araujo:

    Adding placeholder for Google Landmarks dataset.

--
291205548  by Andre Araujo:

    Definition of DELF model using Keras ResNet50 as backbone.

--
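    A minimal sketch of the backbone idea (the real model is defined in delf_model.py; the intermediate layer name below comes from Keras ResNet50 and should be treated as an assumption): use ResNet50 without its classification head and also expose an intermediate block so the attention module can operate on shallower feature maps.

        import tensorflow as tf

        resnet = tf.keras.applications.ResNet50(include_top=False, weights=None)
        backbone = tf.keras.Model(
            inputs=resnet.input,
            outputs=[resnet.get_layer('conv4_block6_out').output, resnet.output])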
289500793  by Andre Araujo:

    Add TFRecord building script for delf.

--

PiperOrigin-RevId: 309118395

* Updating README, dependency versions

* Updating training README

* Fixing init import of export_model

* Fixing init import of export_model_utils

* Mention tkinter in INSTALL_INSTRUCTIONS

* Merged commit includes the following changes:

FolderOrigin-RevId: /google/src/cloud/andrearaujo/delf_oss/google3/..

* INSTALL_INSTRUCTIONS mentioning different cloning options
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Training script for DELF on Google Landmarks Dataset.
Script to train DELF using classification loss on Google Landmarks Dataset
using MirroredStrategy to so it can run on multiple GPUs.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

from absl import app
from absl import flags
from absl import logging
import tensorflow as tf
import tensorflow_probability as tfp

# Placeholder for internal import. Do not remove this line.
from delf.python.training.datasets import googlelandmarks as gld
from delf.python.training.model import delf_model

FLAGS = flags.FLAGS

flags.DEFINE_boolean('debug', False, 'Debug mode.')
flags.DEFINE_string('logdir', '/tmp/delf', 'TensorBoard logdir.')
flags.DEFINE_string('train_file_pattern', '/tmp/data/train*',
                    'File pattern of training dataset files.')
flags.DEFINE_string('validation_file_pattern', '/tmp/data/validation*',
                    'File pattern of validation dataset files.')
flags.DEFINE_integer('seed', 0, 'Seed for the training dataset.')
flags.DEFINE_float('initial_lr', 0.001, 'Initial learning rate.')
flags.DEFINE_integer('batch_size', 32, 'Global batch size.')
flags.DEFINE_integer('max_iters', 500000, 'Maximum iterations.')
flags.DEFINE_boolean('block3_strides', False, 'Whether to use block3_strides.')
flags.DEFINE_boolean('use_augmentation', True,
                     'Whether to use ImageNet style augmentation.')


def _record_accuracy(metric, logits, labels):
  """Record accuracy given predicted logits and ground-truth labels."""
  softmax_probabilities = tf.keras.layers.Softmax()(logits)
  metric.update_state(labels, softmax_probabilities)


def _attention_summaries(scores, global_step):
  """Record statistics of the attention score."""
  tf.summary.scalar('attention/max', tf.reduce_max(scores), step=global_step)
  tf.summary.scalar('attention/min', tf.reduce_min(scores), step=global_step)
  tf.summary.scalar('attention/mean', tf.reduce_mean(scores), step=global_step)
  tf.summary.scalar(
      'attention/percent_25',
      tfp.stats.percentile(scores, 25.0),
      step=global_step)
  tf.summary.scalar(
      'attention/percent_50',
      tfp.stats.percentile(scores, 50.0),
      step=global_step)
  tf.summary.scalar(
      'attention/percent_75',
      tfp.stats.percentile(scores, 75.0),
      step=global_step)


def create_model(num_classes):
  """Define DELF model, and initialize classifiers."""
  model = delf_model.Delf(block3_strides=FLAGS.block3_strides, name='DELF')
  model.init_classifiers(num_classes)
  return model


def _learning_rate_schedule(global_step_value, max_iters, initial_lr):
  """Calculates learning_rate with linear decay.

  Args:
    global_step_value: int, global step.
    max_iters: int, maximum iterations.
    initial_lr: float, initial learning rate.

  Returns:
    lr: float, learning rate.
  """
  lr = initial_lr * (1.0 - global_step_value / max_iters)
  return lr


def main(argv):
  if len(argv) > 1:
    raise app.UsageError('Too many command-line arguments.')

  # ------------------------------------------------------------
  # Log flags used.
  logging.info('Running training script with\n')
  logging.info('logdir= %s', FLAGS.logdir)
  logging.info('initial_lr= %f', FLAGS.initial_lr)
  logging.info('block3_strides= %s', str(FLAGS.block3_strides))

  # ------------------------------------------------------------
  # Create the strategy.
  strategy = tf.distribute.MirroredStrategy()
  logging.info('Number of devices: %d', strategy.num_replicas_in_sync)
  if FLAGS.debug:
    print('Number of devices:', strategy.num_replicas_in_sync)

  max_iters = FLAGS.max_iters
  global_batch_size = FLAGS.batch_size
  image_size = 321
  num_eval = 1000
  report_interval = 100
  eval_interval = 1000
  save_interval = 20000
  initial_lr = FLAGS.initial_lr
  clip_val = tf.constant(10.0)

  if FLAGS.debug:
    global_batch_size = 4
    max_iters = 4
    num_eval = 1
    save_interval = 1
    report_interval = 1

  # TODO(andrearaujo): Using placeholder, replace with actual value using
  # GoogleLandmarksInfo() from datasets/googlelandmarks.py.
  num_classes = 14951

  # ------------------------------------------------------------
  # Create the distributed train/validation sets.
  train_dataset = gld.CreateDataset(
      file_pattern=FLAGS.train_file_pattern,
      batch_size=global_batch_size,
      image_size=image_size,
      augmentation=FLAGS.use_augmentation,
      seed=FLAGS.seed)
  validation_dataset = gld.CreateDataset(
      file_pattern=FLAGS.validation_file_pattern,
      batch_size=global_batch_size,
      image_size=image_size,
      augmentation=False,
      seed=FLAGS.seed)

  train_iterator = strategy.make_dataset_iterator(train_dataset)
  validation_iterator = strategy.make_dataset_iterator(validation_dataset)

  train_iterator.initialize()
  validation_iterator.initialize()

  # Create a checkpoint directory to store the checkpoints.
  checkpoint_prefix = os.path.join(FLAGS.logdir, 'delf_tf2-ckpt')

  # ------------------------------------------------------------
  # Finally, we do everything in distributed scope.
  with strategy.scope():
    # Compute loss.
    # Set reduction to `none` so we can do the reduction afterwards and divide
    # by global batch size.
    loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

    def compute_loss(labels, predictions):
      per_example_loss = loss_object(labels, predictions)
      return tf.nn.compute_average_loss(
          per_example_loss, global_batch_size=global_batch_size)

    # Set up metrics.
    desc_validation_loss = tf.keras.metrics.Mean(name='desc_validation_loss')
    attn_validation_loss = tf.keras.metrics.Mean(name='attn_validation_loss')
    desc_train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(
        name='desc_train_accuracy')
    attn_train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(
        name='attn_train_accuracy')
    desc_validation_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(
        name='desc_validation_accuracy')
    attn_validation_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(
        name='attn_validation_accuracy')

    # ------------------------------------------------------------
    # Setup DELF model and optimizer.
    model = create_model(num_classes)
    logging.info('Model, datasets loaded.\nnum_classes= %d', num_classes)

    optimizer = tf.keras.optimizers.SGD(learning_rate=initial_lr, momentum=0.9)

    # Setup summary writer.
    summary_writer = tf.summary.create_file_writer(
        os.path.join(FLAGS.logdir, 'train_logs'), flush_millis=10000)

    # Setup checkpoint directory.
    checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
    manager = tf.train.CheckpointManager(
        checkpoint, checkpoint_prefix, max_to_keep=3)

    # ------------------------------------------------------------
    # Train step to run on one GPU.
    def train_step(inputs):
      """Train one batch."""
      images, labels = inputs
      # Temporary workaround to avoid some corrupted labels.
      labels = tf.clip_by_value(labels, 0, model.num_classes)

      global_step = optimizer.iterations
      tf.summary.scalar(
          'image_range/max', tf.reduce_max(images), step=global_step)
      tf.summary.scalar(
          'image_range/min', tf.reduce_min(images), step=global_step)

      def _backprop_loss(tape, loss, weights):
        """Backpropagate losses using clipped gradients.

        Args:
          tape: gradient tape.
          loss: scalar Tensor, loss value.
          weights: keras model weights.
        """
        gradients = tape.gradient(loss, weights)
        clipped, _ = tf.clip_by_global_norm(gradients, clip_norm=clip_val)
        optimizer.apply_gradients(zip(clipped, weights))

      # Record gradients and loss through backbone.
      with tf.GradientTape() as desc_tape:
        blocks = {}
        prelogits = model.backbone(
            images, intermediates_dict=blocks, training=True)

        # Report sparsity.
        activations_zero_fractions = {
            'sparsity/%s' % k: tf.nn.zero_fraction(v)
            for k, v in blocks.items()
        }
        for k, v in activations_zero_fractions.items():
          tf.summary.scalar(k, v, step=global_step)

        # Apply descriptor classifier.
        logits = model.desc_classification(prelogits)
        desc_loss = compute_loss(labels, logits)

      # Backprop only through backbone weights.
      _backprop_loss(desc_tape, desc_loss, model.desc_trainable_weights)
      # Record descriptor train accuracy.
      _record_accuracy(desc_train_accuracy, logits, labels)

      # Record gradients and loss through attention block.
      with tf.GradientTape() as attn_tape:
        block3 = blocks['block3']  # pytype: disable=key-error
        # Stopping gradients according to DELG paper:
        # (https://arxiv.org/abs/2001.05027).
        block3 = tf.stop_gradient(block3)
        prelogits, scores, _ = model.attention(block3, training=True)
        _attention_summaries(scores, global_step)

        # Apply attention block classifier.
        logits = model.attn_classification(prelogits)
        attn_loss = compute_loss(labels, logits)

      # Backprop only through attention weights.
      _backprop_loss(attn_tape, attn_loss, model.attn_trainable_weights)
      # Record attention train accuracy.
      _record_accuracy(attn_train_accuracy, logits, labels)

      return desc_loss, attn_loss

    # ------------------------------------------------------------
    def validation_step(inputs):
      """Validate one batch."""
      images, labels = inputs
      labels = tf.clip_by_value(labels, 0, model.num_classes)

      # Get descriptor predictions.
      blocks = {}
      prelogits = model.backbone(
          images, intermediates_dict=blocks, training=False)
      logits = model.desc_classification(prelogits, training=False)
      softmax_probabilities = tf.keras.layers.Softmax()(logits)

      validation_loss = loss_object(labels, logits)
      desc_validation_loss.update_state(validation_loss)
      desc_validation_accuracy.update_state(labels, softmax_probabilities)

      # Get attention predictions.
      block3 = blocks['block3']  # pytype: disable=key-error
      prelogits, _, _ = model.attention(block3, training=False)
      logits = model.attn_classification(prelogits, training=False)
      softmax_probabilities = tf.keras.layers.Softmax()(logits)

      validation_loss = loss_object(labels, logits)
      attn_validation_loss.update_state(validation_loss)
      attn_validation_accuracy.update_state(labels, softmax_probabilities)

      return (desc_validation_accuracy.result(),
              attn_validation_accuracy.result())

    # `run` replicates the provided computation and runs it
    # with the distributed input.
    @tf.function
    def distributed_train_step(dataset_inputs):
      """Get the actual losses."""
      # Each (desc, attn) is a list of 3 losses - crossentropy, reg, total.
      desc_per_replica_loss, attn_per_replica_loss = (
          strategy.run(train_step, args=(dataset_inputs,)))

      # Reduce over the replicas.
      desc_global_loss = strategy.reduce(
          tf.distribute.ReduceOp.SUM, desc_per_replica_loss, axis=None)
      attn_global_loss = strategy.reduce(
          tf.distribute.ReduceOp.SUM, attn_per_replica_loss, axis=None)

      return desc_global_loss, attn_global_loss

    @tf.function
    def distributed_validation_step(dataset_inputs):
      return strategy.run(validation_step, args=(dataset_inputs,))

    # ------------------------------------------------------------
    # *** TRAIN LOOP ***
    with summary_writer.as_default():
      with tf.summary.record_if(
          tf.math.equal(0, optimizer.iterations % report_interval)):

        global_step_value = optimizer.iterations.numpy()
        while global_step_value < max_iters:

          # input_batch : images(b, h, w, c), labels(b,).
          try:
            input_batch = train_iterator.get_next()
          except tf.errors.OutOfRangeError:
            # Break if we run out of data in the dataset.
            logging.info('Stopping training at global step %d, no more data',
                         global_step_value)
            break

          # Set learning rate for optimizer to use.
          global_step = optimizer.iterations
          global_step_value = global_step.numpy()
          learning_rate = _learning_rate_schedule(global_step_value, max_iters,
                                                  initial_lr)
          optimizer.learning_rate = learning_rate
          tf.summary.scalar(
              'learning_rate', optimizer.learning_rate, step=global_step)

          # Run the training step over num_gpu gpus.
          desc_dist_loss, attn_dist_loss = distributed_train_step(input_batch)

          # Log losses and accuracies to tensorboard.
          tf.summary.scalar(
              'loss/desc/crossentropy', desc_dist_loss, step=global_step)
          tf.summary.scalar(
              'loss/attn/crossentropy', attn_dist_loss, step=global_step)
          tf.summary.scalar(
              'train_accuracy/desc',
              desc_train_accuracy.result(),
              step=global_step)
          tf.summary.scalar(
              'train_accuracy/attn',
              attn_train_accuracy.result(),
              step=global_step)

          # Print to console if running locally.
          if FLAGS.debug:
            if global_step_value % report_interval == 0:
              print(global_step.numpy())
              print('desc:', desc_dist_loss.numpy())
              print('attn:', attn_dist_loss.numpy())

          # Validate once in {eval_interval*n, n \in N} steps.
          if global_step_value % eval_interval == 0:
            for i in range(num_eval):
              try:
                validation_batch = validation_iterator.get_next()
                desc_validation_result, attn_validation_result = (
                    distributed_validation_step(validation_batch))
              except tf.errors.OutOfRangeError:
                logging.info('Stopping eval at batch %d, no more data', i)
                break

            # Log validation results to tensorboard.
            tf.summary.scalar(
                'validation/desc', desc_validation_result, step=global_step)
            tf.summary.scalar(
                'validation/attn', attn_validation_result, step=global_step)

            logging.info('\nValidation(%f)\n', global_step_value)
            logging.info(': desc: %f\n', desc_validation_result.numpy())
            logging.info(': attn: %f\n', attn_validation_result.numpy())

            # Print to console.
            if FLAGS.debug:
              print('Validation: desc:', desc_validation_result.numpy())
              print('          : attn:', attn_validation_result.numpy())

          # Save checkpoint once (each save_interval*n, n \in N) steps.
          if global_step_value % save_interval == 0:
            save_path = manager.save()
            logging.info('Saved (%d) at %s', global_step_value, save_path)

            file_path = '%s/delf_weights' % FLAGS.logdir
            model.save_weights(file_path, save_format='tf')
            logging.info('Saved weights (%d) at %s', global_step_value,
                         file_path)

          # Reset metrics for next step.
          desc_train_accuracy.reset_states()
          attn_train_accuracy.reset_states()
          desc_validation_loss.reset_states()
          attn_validation_loss.reset_states()
          desc_validation_accuracy.reset_states()
          attn_validation_accuracy.reset_states()

          if global_step.numpy() > max_iters:
            break

  logging.info('Finished training for %d steps.', max_iters)


if __name__ == '__main__':
  app.run(main)
@@ -16,10 +16,22 @@
 from setuptools import setup, find_packages

+install_requires = [
+    'absl-py >= 0.7.1',
+    'protobuf >= 3.8.0',
+    'pandas >= 0.24.2',
+    'numpy >= 1.16.1',
+    'scipy >= 1.2.2',
+    'tensorflow >= 2.1',
+    'tf_slim >= 1.1',
+    'tensorflow_probability >= 0.9.0',
+]
+
 setup(
     name='delf',
-    version='0.1',
+    version='2.0',
     include_package_data=True,
     packages=find_packages(),
+    install_requires=install_requires,
     description='DELF (DEep Local Features)',
 )