Commit 657dcda5 authored by Kaushik Shivakumar

pull latest

parents 26e24e21 e6017471
# DELF Training Instructions
This README documents the end-to-end process for training a landmark detection
and retrieval model using the DELF library on the
[Google Landmarks Dataset v2](https://github.com/cvdfoundation/google-landmark)
(GLDv2). This can be achieved following these steps:
1. Install the DELF Python library.
2. Download the raw images of the GLDv2 dataset.
3. Prepare the training data.
@@ -11,8 +14,9 @@
The next sections will cover each of these steps in greater detail.
## Prerequisites
Clone the [TensorFlow Model Garden](https://github.com/tensorflow/models)
repository and move into the `models/research/delf/delf/python/training` folder.
```
git clone https://github.com/tensorflow/models.git
cd models/research/delf/delf/python/training
```
@@ -20,74 +24,101 @@
## Install the DELF Library
The DELF Python library can be installed by running the
[`install_delf.sh`](./install_delf.sh) script using the command:
```
bash install_delf.sh
```
The script installs both the DELF library and its dependencies in the following
sequence:
* Install TensorFlow 2.2 and TensorFlow 2.2 for GPU.
* Install the [TF-Slim](https://github.com/google-research/tf-slim) library
  from source.
* Download [protoc](https://github.com/protocolbuffers/protobuf) and compile
  the DELF Protocol Buffers.
* Install the matplotlib, numpy, scikit-image, scipy and python3-tk Python
  libraries.
* Install the
  [TensorFlow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection)
  from the cloned TensorFlow Model Garden repository.
* Install the DELF package.
*Please note that the current installation only works on 64-bit Linux
architectures due to the `protoc` binary downloaded by the installation script.
If you wish to install the DELF library on other architectures, please update
the [`install_delf.sh`](./install_delf.sh) script by referencing the desired
`protoc`
[binary release](https://github.com/protocolbuffers/protobuf/releases).*
## Download the GLDv2 Training Data
The [GLDv2](https://github.com/cvdfoundation/google-landmark) images are grouped
in 3 datasets: TRAIN, INDEX, TEST. Images in each dataset are grouped into
`*.tar` files and individually referenced in `*.csv` files containing training
metadata and licensing information. The number of `*.tar` files per dataset is
as follows:
* TRAIN: 500 files.
* INDEX: 100 files.
* TEST: 20 files.
To download the GLDv2 images, run the
[`download_dataset.sh`](./download_dataset.sh) script as in the following
example:
```
bash download_dataset.sh 500 100 20
```
The script takes the following parameters, in order:
* The number of image files from the TRAIN dataset to download (maximum 500).
* The number of image files from the INDEX dataset to download (maximum 100).
* The number of image files from the TEST dataset to download (maximum 20).
The script downloads the GLDv2 images under the following directory structure:
* gldv2_dataset/
    * train/ - Contains raw images from the TRAIN dataset.
    * index/ - Contains raw images from the INDEX dataset.
    * test/ - Contains raw images from the TEST dataset.
Each of the three folders `gldv2_dataset/train/`, `gldv2_dataset/index/` and
`gldv2_dataset/test/` contains the following:
* The downloaded `*.tar` files.
* The corresponding MD5 checksum files, `*.txt`.
* The unpacked content of the downloaded files. (*Images are organized in
  folders and subfolders based on the first, second and third character in
  their file name*; see the path sketch after this list.)
* The CSV files containing training and licensing metadata of the downloaded
  images.
*Please note that due to the large size of the GLDv2 dataset, the download can
take up to 12 hours and require up to 1 TB of disk space. To save bandwidth and
disk space, you may want to start by downloading only the TRAIN dataset, the
only one required for training, thus saving approximately 95 GB, the
equivalent of the INDEX and TEST datasets. To further save disk space, the
`*.tar` files can be deleted after downloading and unpacking them.*
## Prepare the Data for Training
Preparing the data for training consists of creating
[TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord) files from
the raw GLDv2 images grouped into TRAIN and VALIDATION splits. The training set
produced contains only the *clean* subset of the GLDv2 dataset. The
[CVPR'20 paper](https://arxiv.org/abs/2004.01804) introducing the GLDv2 dataset
contains a detailed description of the *clean* subset.
Generating the TFRecord files containing the TRAIN and VALIDATION splits of the
*clean* GLDv2 subset can be achieved by running the
[`build_image_dataset.py`](./build_image_dataset.py) script. Assuming that the
GLDv2 images have been downloaded to the `gldv2_dataset` folder, the script can
be run as follows:
```
python3 build_image_dataset.py \
--train_csv_path=gldv2_dataset/train/train.csv \
@@ -98,31 +129,165 @@
--generate_train_validation_splits \
--validation_split_size=0.2
```
*Please refer to the source code of the
[`build_image_dataset.py`](./build_image_dataset.py) script for a detailed
description of its parameters.*

The TFRecord files written in the `OUTPUT_DIRECTORY` will be prefixed as
follows:
* TRAIN split: `train-*`
* VALIDATION split: `validation-*`
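To sanity-check the generated files before training, a few examples can be
decoded directly from the `train-*` TFRecords. This is a minimal sketch; the
feature keys match the ones used by the dataset reader further down in this
commit, and the `gldv2_dataset/tfrecord/` path assumes the output directory
used in the training example below:

```
# Sketch: decode a couple of examples from the generated TRAIN TFRecord files.
import tensorflow as tf

features = {
    'image/encoded': tf.io.FixedLenFeature([], tf.string, default_value=''),
    'image/class/label': tf.io.FixedLenFeature([], tf.int64, default_value=0),
}

filenames = tf.io.gfile.glob('gldv2_dataset/tfrecord/train-*')
dataset = tf.data.TFRecordDataset(filenames)
for record in dataset.take(2):
  example = tf.io.parse_single_example(record, features)
  image = tf.io.decode_jpeg(example['image/encoded'])
  print(image.shape, int(example['image/class/label']))
```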
The same script can be used to generate TFRecord files for the TEST split for
post-training evaluation purposes. This can be achieved by adding the
parameters:
```
--test_csv_path=gldv2_dataset/train/test.csv \
--test_directory=gldv2_dataset/test/*/*/*/ \
```
In this scenario, the TFRecord files of the TEST split written in the
`OUTPUT_DIRECTORY` will be named according to the pattern `test-*`.

*Please note that due to the large size of the GLDv2 dataset, the generation of
the TFRecord files can take up to 12 hours and require up to 500 GB of disk
space.*
## Running the Training
For the training to converge faster, it is possible to initialize the ResNet
backbone with the weights of a pretrained ImageNet model. The ImageNet
checkpoint is available at the following location:
[`http://storage.googleapis.com/delf/resnet50_imagenet_weights.tar.gz`](http://storage.googleapis.com/delf/resnet50_imagenet_weights.tar.gz).
To download and unpack it run the following commands on a Linux box:
```
curl -Os http://storage.googleapis.com/delf/resnet50_imagenet_weights.tar.gz
tar -xzvf resnet50_imagenet_weights.tar.gz
```
Assuming the TFRecord files were generated in the `gldv2_dataset/tfrecord/`
directory, running the following command should start training a model and
output the results in the `gldv2_training` directory:
```
python3 train.py \
--train_file_pattern=gldv2_dataset/tfrecord/train* \
--validation_file_pattern=gldv2_dataset/tfrecord/validation* \
--imagenet_checkpoint=resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5 \
--dataset_version=gld_v2_clean \
--logdir=gldv2_training/
```
On a multi-GPU machine, the batch size can be increased to speed up training
using the `--batch_size` parameter. On a machine with 8 Tesla P100 GPUs, you
can set the batch size to `256`:
```
--batch_size=256
```
## Exporting the Trained Model
Assuming the training output, the TensorFlow checkpoint, is in the
`gldv2_training` directory, running the following commands exports the model.
### DELF local feature model
```
python3 model/export_model.py \
--ckpt_path=gldv2_training/delf_weights \
--export_path=gldv2_model_local \
--block3_strides
```
### Kaggle-compatible global feature model
To export a global feature model in the format required by the
[2020 Landmark Retrieval challenge](https://www.kaggle.com/c/landmark-retrieval-2020),
you can use the following command:
```
python3 model/export_global_model.py \
--ckpt_path=gldv2_training/delf_weights \
--export_path=gldv2_model_global \
--input_scales_list=0.70710677,1.0,1.4142135 \
--multi_scale_pool_type=sum \
--normalize_global_descriptor
```
## Testing the Trained Model
After the trained model has been exported, it can be used to extract DELF
features from two images of the same landmark and to perform a matching test
between the two images, based on the extracted features, to validate that they
represent the same landmark.
Start by downloading the Oxford buildings dataset:
```
mkdir data && cd data
wget http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/oxbuild_images.tgz
mkdir oxford5k_images oxford5k_features
tar -xvzf oxbuild_images.tgz -C oxford5k_images/
cd ../
echo data/oxford5k_images/hertford_000056.jpg >> list_images.txt
echo data/oxford5k_images/oxford_000317.jpg >> list_images.txt
```
Make a copy of the
[`delf_config_example.pbtxt`](../examples/delf_config_example.pbtxt) protobuffer
file which configures the DELF feature extraction. Update the file by making the
following changes:
* set the `model_path` attribute to the directory containing the exported
model, `gldv2_model_local` in this example
* add at the root level the attribute `is_tf2_exported` with the value `true`
* set the `use_pca` attribute inside `delf_local_config` to `false`
The resulting file should resemble the following:
```
model_path: "gldv2_model_local"
image_scales: .25
image_scales: .3536
image_scales: .5
image_scales: .7071
image_scales: 1.0
image_scales: 1.4142
image_scales: 2.0
is_tf2_exported: true
delf_local_config {
use_pca: false
max_feature_num: 1000
score_threshold: 100.0
}
```
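A quick way to make sure the edited file still parses is to load it with the
DELF configuration proto. This is only a sketch and assumes the
`delf_config_pb2` module installed with the DELF package; the field names are
the ones shown in the file above:

```
# Sketch: parse the edited DELF configuration to catch typos early.
from google.protobuf import text_format
from delf import delf_config_pb2  # assumed installed with the DELF package

config = delf_config_pb2.DelfConfig()
with open('delf_config_example.pbtxt') as f:
  text_format.Parse(f.read(), config)
print(config.model_path, config.delf_local_config.use_pca)
```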
Run the following command to extract DELF features for the images
`hertford_000056.jpg` and `oxford_000317.jpg`:
```
python3 ../examples/extract_features.py \
--config_path delf_config_example.pbtxt \
--list_images_path list_images.txt \
--output_dir data/oxford5k_features
```
Run the following command to perform feature matching between the images
`hertford_000056.jpg` and `oxford_000317.jpg`:
```
python3 ../examples/match_images.py \
--image_1_path data/oxford5k_images/hertford_000056.jpg \
--image_2_path data/oxford5k_images/oxford_000317.jpg \
--features_1_path data/oxford5k_features/hertford_000056.delf \
--features_2_path data/oxford5k_features/oxford_000317.delf \
--output_image matched_images.png
```
The generated image `matched_images.png` should look similar to this one:
![MatchedImagesDemo](./matched_images_demo.png)
@@ -302,6 +302,21 @@ def _write_relabeling_rules(relabeling_rules):
csv_writer.writerow([new_label, old_label])
def _shuffle_by_columns(np_array, random_state):
"""Shuffle the columns of a 2D numpy array.
Args:
np_array: array to shuffle.
random_state: numpy RandomState to be used for shuffling.
Returns:
The shuffled array.
"""
columns = np_array.shape[1]
columns_indices = np.arange(columns)
random_state.shuffle(columns_indices)
return np_array[:, columns_indices]
def _build_train_and_validation_splits(image_paths, file_ids, labels,
validation_split_size, seed):
"""Create TRAIN and VALIDATION splits containing all labels in equal proportion.
@@ -353,19 +368,21 @@ def _build_train_and_validation_splits(image_paths, file_ids, labels,
for label, indexes in image_attrs_idx_by_label.items():
# Create the subset for the current label.
image_attrs_label = image_attrs[:, indexes]
images_per_label = image_attrs_label.shape[1]
# Shuffle the current label subset.
columns_indices = np.arange(images_per_label) image_attrs_label = _shuffle_by_columns(image_attrs_label, rs)
rs.shuffle(columns_indices)
image_attrs_label = image_attrs_label[:, columns_indices]
# Split the current label subset into TRAIN and VALIDATION splits and add
# each split to the list of all splits.
images_per_label = image_attrs_label.shape[1]
cutoff_idx = max(1, int(validation_split_size * images_per_label))
splits[_VALIDATION_SPLIT].append(image_attrs_label[:, 0 : cutoff_idx])
splits[_TRAIN_SPLIT].append(image_attrs_label[:, cutoff_idx : ])
validation_split = np.concatenate(splits[_VALIDATION_SPLIT], axis=1) # Concatenate all subsets of image attributes into TRAIN and VALIDATION splits
train_split = np.concatenate(splits[_TRAIN_SPLIT], axis=1) # and reshuffle them again to ensure variance of labels across batches.
validation_split = _shuffle_by_columns(
np.concatenate(splits[_VALIDATION_SPLIT], axis=1), rs)
train_split = _shuffle_by_columns(
np.concatenate(splits[_TRAIN_SPLIT], axis=1), rs)
# Unstack the image attribute arrays in the TRAIN and VALIDATION splits and
# convert them back to lists. Convert labels back to 'int' from 'str'
...
@@ -29,11 +29,7 @@ import tensorflow as tf
class _GoogleLandmarksInfo(object):
"""Metadata about the Google Landmarks dataset."""
num_classes = { num_classes = {'gld_v1': 14951, 'gld_v2': 203094, 'gld_v2_clean': 81313}
'gld_v1': 14951,
'gld_v2': 203094,
'gld_v2_clean': 81313
}
class _DataAugmentationParams(object):
@@ -123,6 +119,8 @@ def _ParseFunction(example, name_to_features, image_size, augmentation):
# Parse to get image.
image = parsed_example['image/encoded']
image = tf.io.decode_jpeg(image)
image = NormalizeImages(
image, pixel_value_scale=128.0, pixel_value_offset=128.0)
if augmentation:
image = _ImageNetCrop(image)
else:
@@ -130,6 +128,7 @@ def _ParseFunction(example, name_to_features, image_size, augmentation):
image.set_shape([image_size, image_size, 3])
# Parse to get label.
label = parsed_example['image/class/label']
return image, label
@@ -162,6 +161,7 @@ def CreateDataset(file_pattern,
'image/width': tf.io.FixedLenFeature([], tf.int64, default_value=0),
'image/channels': tf.io.FixedLenFeature([], tf.int64, default_value=0),
'image/format': tf.io.FixedLenFeature([], tf.string, default_value=''),
'image/id': tf.io.FixedLenFeature([], tf.string, default_value=''),
'image/filename': tf.io.FixedLenFeature([], tf.string, default_value=''),
'image/encoded': tf.io.FixedLenFeature([], tf.string, default_value=''),
'image/class/label': tf.io.FixedLenFeature([], tf.int64, default_value=0),
...
@@ -132,10 +132,12 @@ class Delf(tf.keras.Model):
self.attn_classification.trainable_weights)
def call(self, input_image, training=True):
blocks = {'block3': None} blocks = {}
self.backbone(input_image, intermediates_dict=blocks, training=training)
features = blocks['block3'] self.backbone.build_call(
input_image, intermediates_dict=blocks, training=training)
features = blocks['block3'] # pytype: disable=key-error
_, probs, _ = self.attention(features, training=training)
return probs, features
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Export global feature tensorflow inference model.
This model includes image pyramids for multi-scale processing.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl import app
from absl import flags
import tensorflow as tf
from delf.python.training.model import delf_model
from delf.python.training.model import export_model_utils
FLAGS = flags.FLAGS
flags.DEFINE_string('ckpt_path', '/tmp/delf-logdir/delf-weights',
'Path to saved checkpoint.')
flags.DEFINE_string('export_path', None, 'Path where model will be exported.')
flags.DEFINE_list(
'input_scales_list', None,
'Optional input image scales to use. If None (default), an input end-point '
'"input_scales" is added for the exported model. If not None, the '
'specified list of floats will be hard-coded as the desired input scales.')
flags.DEFINE_enum(
'multi_scale_pool_type', 'None', ['None', 'average', 'sum'],
"If 'None' (default), the model is exported with an output end-point "
"'global_descriptors', where the global descriptor for each scale is "
"returned separately. If not 'None', the global descriptor of each scale is"
' pooled and a 1D global descriptor is returned, with output end-point '
"'global_descriptor'.")
flags.DEFINE_boolean('normalize_global_descriptor', False,
'If True, L2-normalizes global descriptor.')
class _ExtractModule(tf.Module):
"""Helper module to build and save global feature model."""
def __init__(self,
multi_scale_pool_type='None',
normalize_global_descriptor=False,
input_scales_tensor=None):
"""Initialization of global feature model.
Args:
multi_scale_pool_type: Type of multi-scale pooling to perform.
normalize_global_descriptor: Whether to L2-normalize global descriptor.
input_scales_tensor: If None, the exported function to be used should be
ExtractFeatures, where an input end-point "input_scales" is added for
the exported model. If not None, the specified 1D tensor of floats will
be hard-coded as the desired input scales, in conjunction with
ExtractFeaturesFixedScales.
"""
self._multi_scale_pool_type = multi_scale_pool_type
self._normalize_global_descriptor = normalize_global_descriptor
if input_scales_tensor is None:
self._input_scales_tensor = []
else:
self._input_scales_tensor = input_scales_tensor
# Setup the DELF model for extraction.
self._model = delf_model.Delf(block3_strides=False, name='DELF')
def LoadWeights(self, checkpoint_path):
self._model.load_weights(checkpoint_path)
@tf.function(input_signature=[
tf.TensorSpec(shape=[None, None, 3], dtype=tf.uint8, name='input_image'),
tf.TensorSpec(shape=[None], dtype=tf.float32, name='input_scales'),
tf.TensorSpec(
shape=[None], dtype=tf.int32, name='input_global_scales_ind')
])
def ExtractFeatures(self, input_image, input_scales, input_global_scales_ind):
extracted_features = export_model_utils.ExtractGlobalFeatures(
input_image,
input_scales,
input_global_scales_ind,
lambda x: self._model.backbone.build_call(x, training=False),
multi_scale_pool_type=self._multi_scale_pool_type,
normalize_global_descriptor=self._normalize_global_descriptor)
named_output_tensors = {}
if self._multi_scale_pool_type == 'None':
named_output_tensors['global_descriptors'] = tf.identity(
extracted_features, name='global_descriptors')
else:
named_output_tensors['global_descriptor'] = tf.identity(
extracted_features, name='global_descriptor')
return named_output_tensors
@tf.function(input_signature=[
tf.TensorSpec(shape=[None, None, 3], dtype=tf.uint8, name='input_image')
])
def ExtractFeaturesFixedScales(self, input_image):
return self.ExtractFeatures(input_image, self._input_scales_tensor,
tf.range(tf.size(self._input_scales_tensor)))
def main(argv):
if len(argv) > 1:
raise app.UsageError('Too many command-line arguments.')
export_path = FLAGS.export_path
if os.path.exists(export_path):
raise ValueError('export_path %s already exists.' % export_path)
if FLAGS.input_scales_list is None:
input_scales_tensor = None
else:
input_scales_tensor = tf.constant(
[float(s) for s in FLAGS.input_scales_list],
dtype=tf.float32,
shape=[len(FLAGS.input_scales_list)],
name='input_scales')
module = _ExtractModule(FLAGS.multi_scale_pool_type,
FLAGS.normalize_global_descriptor,
input_scales_tensor)
# Load the weights.
checkpoint_path = FLAGS.ckpt_path
module.LoadWeights(checkpoint_path)
print('Checkpoint loaded from ', checkpoint_path)
# Save the module
if FLAGS.input_scales_list is None:
served_function = module.ExtractFeatures
else:
served_function = module.ExtractFeaturesFixedScales
tf.saved_model.save(
module, export_path, signatures={'serving_default': served_function})
if __name__ == '__main__':
app.run(main)
@@ -42,67 +42,39 @@ flags.DEFINE_boolean('block3_strides', False,
flags.DEFINE_float('iou', 1.0, 'IOU for non-max suppression.')
def _build_tensor_info(tensor_dict): class _ExtractModule(tf.Module):
"""Replace the dict's value by the tensor info. """Helper module to build and save DELF model."""
Args: def __init__(self, block3_strides, iou):
tensor_dict: A dictionary contains <string, tensor>. """Initialization of DELF model.
Returns: Args:
dict: New dictionary contains <string, tensor_info>. block3_strides: bool, whether to add strides to the output of block3.
iou: IOU for non-max suppression.
""" """
return { self._stride_factor = 2.0 if block3_strides else 1.0
k: tf.compat.v1.saved_model.utils.build_tensor_info(t) self._iou = iou
for k, t in tensor_dict.items()
}
def main(argv):
if len(argv) > 1:
raise app.UsageError('Too many command-line arguments.')
export_path = FLAGS.export_path
if os.path.exists(export_path):
raise ValueError('Export_path already exists.')
with tf.Graph().as_default() as g, tf.compat.v1.Session(graph=g) as sess:
# Setup the DELF model for extraction.
model = delf_model.Delf(block3_strides=FLAGS.block3_strides, name='DELF') self._model = delf_model.Delf(
block3_strides=block3_strides, name='DELF')
# Initial forward pass to build model. def LoadWeights(self, checkpoint_path):
images = tf.zeros((1, 321, 321, 3), dtype=tf.float32) self._model.load_weights(checkpoint_path)
model(images)
stride_factor = 2.0 if FLAGS.block3_strides else 1.0 @tf.function(input_signature=[
tf.TensorSpec(shape=[None, None, 3], dtype=tf.uint8, name='input_image'),
# Setup the multiscale keypoint extraction. tf.TensorSpec(shape=[None], dtype=tf.float32, name='input_scales'),
input_image = tf.compat.v1.placeholder( tf.TensorSpec(shape=(), dtype=tf.int32, name='input_max_feature_num'),
tf.uint8, shape=(None, None, 3), name='input_image') tf.TensorSpec(shape=(), dtype=tf.float32, name='input_abs_thres')
input_abs_thres = tf.compat.v1.placeholder( ])
tf.float32, shape=(), name='input_abs_thres') def ExtractFeatures(self, input_image, input_scales, input_max_feature_num,
input_scales = tf.compat.v1.placeholder( input_abs_thres):
tf.float32, shape=[None], name='input_scales')
input_max_feature_num = tf.compat.v1.placeholder(
tf.int32, shape=(), name='input_max_feature_num')
extracted_features = export_model_utils.ExtractLocalFeatures(
input_image, input_scales, input_max_feature_num, input_abs_thres,
FLAGS.iou, lambda x: model(x, training=False), stride_factor) self._iou, lambda x: self._model(x, training=False),
self._stride_factor)
# Load the weights.
checkpoint_path = FLAGS.ckpt_path
model.load_weights(checkpoint_path)
print('Checkpoint loaded from ', checkpoint_path)
named_input_tensors = {
'input_image': input_image,
'input_scales': input_scales,
'input_abs_thres': input_abs_thres,
'input_max_feature_num': input_max_feature_num,
}
# Outputs to the exported model.
named_output_tensors = {}
named_output_tensors['boxes'] = tf.identity(
extracted_features[0], name='boxes')
@@ -112,25 +84,27 @@ def main(argv):
extracted_features[2], name='scales')
named_output_tensors['scores'] = tf.identity(
extracted_features[3], name='scores')
return named_output_tensors
def main(argv):
if len(argv) > 1:
raise app.UsageError('Too many command-line arguments.')
export_path = FLAGS.export_path
if os.path.exists(export_path):
raise ValueError(f'Export_path {export_path} already exists. Please '
'specify a different path or delete the existing one.')
module = _ExtractModule(FLAGS.block3_strides, FLAGS.iou)
# Load the weights.
checkpoint_path = FLAGS.ckpt_path
module.LoadWeights(checkpoint_path)
print('Checkpoint loaded from ', checkpoint_path)
# Export the model. # Save the module
signature_def = tf.compat.v1.saved_model.signature_def_utils.build_signature_def( tf.saved_model.save(module, export_path)
inputs=_build_tensor_info(named_input_tensors),
outputs=_build_tensor_info(named_output_tensors))
print('Exporting trained model to:', export_path)
builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(export_path)
init_op = None
builder.add_meta_graph_and_variables(
sess, [tf.compat.v1.saved_model.tag_constants.SERVING],
signature_def_map={
tf.compat.v1.saved_model.signature_constants
.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
signature_def
},
main_op=init_op)
builder.save()
if __name__ == '__main__':
...
@@ -142,7 +142,9 @@ def ExtractLocalFeatures(image, image_scales, max_feature_num, abs_thres, iou,
keep_going = lambda j, b, f, scales, scores: tf.less(j, num_scales)
(_, output_boxes, output_features, output_scales,
output_scores) = tf.while_loop( output_scores) = tf.nest.map_structure(
tf.stop_gradient,
tf.while_loop(
cond=keep_going,
body=_ProcessSingleScale,
loop_vars=[
@@ -154,8 +156,7 @@ def ExtractLocalFeatures(image, image_scales, max_feature_num, abs_thres, iou,
tf.TensorShape([None, feature_depth]),
tf.TensorShape([None]),
tf.TensorShape([None])
], ]))
back_prop=False)
feature_boxes = box_list.BoxList(output_boxes)
feature_boxes.add_field('features', output_features)
@@ -169,3 +170,99 @@ def ExtractLocalFeatures(image, image_scales, max_feature_num, abs_thres, iou,
return final_boxes.get(), final_boxes.get_field(
'features'), final_boxes.get_field('scales'), tf.expand_dims(
final_boxes.get_field('scores'), 1)
@tf.function
def ExtractGlobalFeatures(image,
image_scales,
global_scales_ind,
model_fn,
multi_scale_pool_type='None',
normalize_global_descriptor=False):
"""Extract global features for input image.
Args:
image: image tensor of type tf.uint8 with shape [h, w, channels].
image_scales: 1D float tensor which contains float scales used for image
pyramid construction.
global_scales_ind: Feature extraction happens only for a subset of
`image_scales`, those with corresponding indices from this tensor.
model_fn: model function. Follows the signature:
* Args:
* `images`: Image tensor which is re-scaled.
* Returns:
* `global_descriptors`: Global descriptors for input images.
multi_scale_pool_type: If set, the global descriptor of each scale is pooled
and a 1D global descriptor is returned.
normalize_global_descriptor: If True, output global descriptors are
L2-normalized.
Returns:
global_descriptors: If `multi_scale_pool_type` is 'None', returns a [S, D]
float tensor. S is the number of scales, and D the global descriptor
dimensionality. Each D-dimensional entry is a global descriptor, which may
be L2-normalized depending on `normalize_global_descriptor`. If
`multi_scale_pool_type` is not 'None', returns a [D] float tensor with the
pooled global descriptor.
"""
original_image_shape_float = tf.gather(
tf.dtypes.cast(tf.shape(image), tf.float32), [0, 1])
image_tensor = gld.NormalizeImages(
image, pixel_value_offset=128.0, pixel_value_scale=128.0)
image_tensor = tf.expand_dims(image_tensor, 0, name='image/expand_dims')
def _ResizeAndExtract(scale_index):
"""Helper function to resize image then extract global feature.
Args:
scale_index: A valid index in image_scales.
Returns:
global_descriptor: [1,D] tensor denoting the extracted global descriptor.
"""
scale = tf.gather(image_scales, scale_index)
new_image_size = tf.dtypes.cast(
tf.round(original_image_shape_float * scale), tf.int32)
resized_image = tf.image.resize(image_tensor, new_image_size)
global_descriptor = model_fn(resized_image)
return global_descriptor
# First loop to find initial scale to be used.
num_scales = tf.shape(image_scales)[0]
initial_scale_index = tf.constant(-1, dtype=tf.int32)
for scale_index in tf.range(num_scales):
if tf.reduce_any(tf.equal(global_scales_ind, scale_index)):
initial_scale_index = scale_index
break
output_global = _ResizeAndExtract(initial_scale_index)
# Loop over subsequent scales.
for scale_index in tf.range(initial_scale_index + 1, num_scales):
# Allow an undefined number of global feature scales to be extracted.
tf.autograph.experimental.set_loop_options(
shape_invariants=[(output_global, tf.TensorShape([None, None]))])
if tf.reduce_any(tf.equal(global_scales_ind, scale_index)):
global_descriptor = _ResizeAndExtract(scale_index)
output_global = tf.concat([output_global, global_descriptor], 0)
normalization_axis = 1
if multi_scale_pool_type == 'average':
output_global = tf.reduce_mean(
output_global,
axis=0,
keepdims=False,
name='multi_scale_average_pooling')
normalization_axis = 0
elif multi_scale_pool_type == 'sum':
output_global = tf.reduce_sum(
output_global, axis=0, keepdims=False, name='multi_scale_sum_pooling')
normalization_axis = 0
if normalize_global_descriptor:
output_global = tf.nn.l2_normalize(
output_global, axis=normalization_axis, name='l2_normalization')
return output_global
@@ -22,9 +22,14 @@ from __future__ import division
from __future__ import print_function
import functools
import os
import tempfile
from absl import logging
import h5py
import tensorflow as tf
layers = tf.keras.layers
@@ -284,8 +289,8 @@ class ResNet50(tf.keras.Model):
else:
self.global_pooling = None
def call(self, inputs, training=True, intermediates_dict=None): def build_call(self, inputs, training=True, intermediates_dict=None):
"""Call the ResNet50 model. """Building the ResNet50 model.
Args:
inputs: Images to compute features for.
@@ -356,3 +361,79 @@ class ResNet50(tf.keras.Model):
return self.global_pooling(x)
else:
return x
def call(self, inputs, training=True, intermediates_dict=None):
"""Call the ResNet50 model.
Args:
inputs: Images to compute features for.
training: Whether model is in training phase.
intermediates_dict: `None` or dictionary. If not None, accumulate feature
maps from intermediate blocks into the dictionary.
Returns:
Tensor with featuremap.
"""
return self.build_call(inputs, training, intermediates_dict)
def restore_weights(self, filepath):
"""Load pretrained weights.
This function loads a .h5 file from the filepath with saved model weights
and assigns them to the model.
Args:
filepath: String, path to the .h5 file
Raises:
ValueError: if the file referenced by `filepath` does not exist.
"""
if not tf.io.gfile.exists(filepath):
raise ValueError('Unable to load weights from %s. You must provide a '
'valid file.' % (filepath))
# Create a local copy of the weights file for h5py to be able to read it.
local_filename = os.path.basename(filepath)
tmp_filename = os.path.join(tempfile.gettempdir(), local_filename)
tf.io.gfile.copy(filepath, tmp_filename, overwrite=True)
# Load the content of the weights file.
f = h5py.File(tmp_filename, mode='r')
saved_layer_names = [n.decode('utf8') for n in f.attrs['layer_names']]
try:
# Iterate through all the layers assuming the max `depth` is 2.
for layer in self.layers:
if hasattr(layer, 'layers'):
for inlayer in layer.layers:
# Make sure the weights are in the saved model, and that we are in
# the innermost layer.
if inlayer.name not in saved_layer_names:
raise ValueError('Layer %s absent from the pretrained weights. '
'Unable to load its weights.' % (inlayer.name))
if hasattr(inlayer, 'layers'):
raise ValueError('Layer %s is not a depth 2 layer. Unable to load '
'its weights.' % (inlayer.name))
# Assign the weights in the current layer.
g = f[inlayer.name]
weight_names = [n.decode('utf8') for n in g.attrs['weight_names']]
weight_values = [g[weight_name] for weight_name in weight_names]
print('Setting the weights for layer %s' % (inlayer.name))
inlayer.set_weights(weight_values)
finally:
# Clean up the temporary file.
tf.io.gfile.remove(tmp_filename)
def log_weights(self):
"""Log backbone weights."""
logging.info('Logging backbone weights')
logging.info('------------------------')
for layer in self.layers:
if hasattr(layer, 'layers'):
for inlayer in layer.layers:
logging.info('Weights for layer: %s, inlayer % s', layer.name,
inlayer.name)
weights = inlayer.get_weights()
logging.info(weights)
else:
logging.info('Layer %s does not have inner layers.',
layer.name)
@@ -43,17 +43,20 @@ flags.DEFINE_string('train_file_pattern', '/tmp/data/train*',
'File pattern of training dataset files.')
flags.DEFINE_string('validation_file_pattern', '/tmp/data/validation*',
'File pattern of validation dataset files.')
flags.DEFINE_enum('dataset_version', 'gld_v1', flags.DEFINE_enum(
['gld_v1', 'gld_v2', 'gld_v2_clean'], 'dataset_version', 'gld_v1', ['gld_v1', 'gld_v2', 'gld_v2_clean'],
'Google Landmarks dataset version, used to determine the'
'number of classes.')
flags.DEFINE_integer('seed', 0, 'Seed to training dataset.')
flags.DEFINE_float('initial_lr', 0.001, 'Initial learning rate.') flags.DEFINE_float('initial_lr', 0.01, 'Initial learning rate.')
flags.DEFINE_integer('batch_size', 32, 'Global batch size.')
flags.DEFINE_integer('max_iters', 500000, 'Maximum iterations.')
flags.DEFINE_boolean('block3_strides', False, 'Whether to use block3_strides.') flags.DEFINE_boolean('block3_strides', True, 'Whether to use block3_strides.')
flags.DEFINE_boolean('use_augmentation', True,
'Whether to use ImageNet style augmentation.')
flags.DEFINE_string(
'imagenet_checkpoint', None,
'ImageNet checkpoint for ResNet backbone. If None, no checkpoint is used.')
def _record_accuracy(metric, logits, labels):
@@ -64,6 +67,10 @@ def _record_accuracy(metric, logits, labels):
def _attention_summaries(scores, global_step):
"""Record statistics of the attention score."""
tf.summary.image(
'batch_attention',
scores / tf.reduce_max(scores + 1e-3),
step=global_step)
tf.summary.scalar('attention/max', tf.reduce_max(scores), step=global_step)
tf.summary.scalar('attention/min', tf.reduce_min(scores), step=global_step)
tf.summary.scalar('attention/mean', tf.reduce_mean(scores), step=global_step)
@@ -124,7 +131,7 @@ def main(argv):
max_iters = FLAGS.max_iters
global_batch_size = FLAGS.batch_size
image_size = 321
num_eval = 1000 num_eval_batches = int(50000 / global_batch_size)
report_interval = 100
eval_interval = 1000
save_interval = 20000
@@ -134,9 +141,10 @@ def main(argv):
clip_val = tf.constant(10.0)
if FLAGS.debug:
tf.config.run_functions_eagerly(True)
global_batch_size = 4
max_iters = 4 max_iters = 100
num_eval = 1 num_eval_batches = 1
save_interval = 1
report_interval = 1
@@ -159,11 +167,12 @@ def main(argv):
augmentation=False,
seed=FLAGS.seed)
train_iterator = strategy.make_dataset_iterator(train_dataset) train_dist_dataset = strategy.experimental_distribute_dataset(train_dataset)
validation_iterator = strategy.make_dataset_iterator(validation_dataset) validation_dist_dataset = strategy.experimental_distribute_dataset(
validation_dataset)
train_iterator.initialize() train_iter = iter(train_dist_dataset)
validation_iterator.initialize() validation_iter = iter(validation_dist_dataset)
# Create a checkpoint directory to store the checkpoints.
checkpoint_prefix = os.path.join(FLAGS.logdir, 'delf_tf2-ckpt')
@@ -219,11 +228,14 @@ def main(argv):
labels = tf.clip_by_value(labels, 0, model.num_classes)
global_step = optimizer.iterations
tf.summary.image('batch_images', (images + 1.0) / 2.0, step=global_step)
tf.summary.scalar(
'image_range/max', tf.reduce_max(images), step=global_step)
tf.summary.scalar(
'image_range/min', tf.reduce_min(images), step=global_step)
# TODO(andrearaujo): we should try to unify the backprop into a single
# function, instead of applying once to descriptor then to attention.
def _backprop_loss(tape, loss, weights):
"""Backpropagate losses using clipped gradients.
@@ -344,12 +356,25 @@ def main(argv):
with tf.summary.record_if(
tf.math.equal(0, optimizer.iterations % report_interval)):
# TODO(dananghel): try to load pretrained weights at backbone creation.
# Load pretrained weights for ResNet50 trained on ImageNet.
if FLAGS.imagenet_checkpoint is not None:
logging.info('Attempting to load ImageNet pretrained weights.')
input_batch = next(train_iter)
_, _ = distributed_train_step(input_batch)
model.backbone.restore_weights(FLAGS.imagenet_checkpoint)
logging.info('Done.')
else:
logging.info('Skip loading ImageNet pretrained weights.')
if FLAGS.debug:
model.backbone.log_weights()
global_step_value = optimizer.iterations.numpy()
while global_step_value < max_iters:
# input_batch : images(b, h, w, c), labels(b,).
try:
input_batch = train_iterator.get_next() input_batch = next(train_iter)
except tf.errors.OutOfRangeError:
# Break if we run out of data in the dataset.
logging.info('Stopping training at global step %d, no more data',
@@ -392,9 +417,9 @@ def main(argv):
# Validate once in {eval_interval*n, n \in N} steps.
if global_step_value % eval_interval == 0:
for i in range(num_eval): for i in range(num_eval_batches):
try:
validation_batch = validation_iterator.get_next() validation_batch = next(validation_iter)
desc_validation_result, attn_validation_result = (
distributed_validation_step(validation_batch))
except tf.errors.OutOfRangeError:
@@ -416,13 +441,17 @@ def main(argv):
print(' : attn:', attn_validation_result.numpy())
# Save checkpoint once (each save_interval*n, n \in N) steps.
# TODO(andrearaujo): save only in one of the two ways. They are
# identical, the only difference is that the manager adds some extra
# prefixes and variables (eg, optimizer variables).
if global_step_value % save_interval == 0:
save_path = manager.save()
logging.info('Saved({global_step_value}) at %s', save_path) logging.info('Saved (%d) at %s', global_step_value, save_path)
file_path = '%s/delf_weights' % FLAGS.logdir
model.save_weights(file_path, save_format='tf')
logging.info('Saved weights({global_step_value}) at %s', file_path) logging.info('Saved weights (%d) at %s', global_step_value,
file_path)
# Reset metrics for next step.
desc_train_accuracy.reset_states()
...
@@ -118,6 +118,8 @@ Importantly, these contextual images need not be labeled.
novel camera deployment to improve performance at that camera, boosting
model generalizability.
Read about Context R-CNN on the Google AI blog [here](https://ai.googleblog.com/2020/06/leveraging-temporal-context-for-object.html).
We have provided code for generating data with associated context
[here](g3doc/context_rcnn.md), and a sample config for a Context R-CNN
model [here](samples/configs/context_rcnn_resnet101_snapshot_serengeti_sync.config).
...
@@ -390,7 +390,7 @@ class DatasetBuilderTest(test_case.TestCase):
return iter1.get_next(), iter2.get_next()
output_dict1, output_dict2 = self.execute(graph_fn, [])
self.assertAllEqual(['0'], output_dict1[fields.InputDataFields.source_id]) self.assertAllEqual([b'0'], output_dict1[fields.InputDataFields.source_id])
self.assertEqual([b'1'], output_dict2[fields.InputDataFields.source_id])
def test_sample_one_of_n_shards(self):
...
@@ -58,7 +58,8 @@ def build(input_reader_config):
use_display_name=input_reader_config.use_display_name,
num_additional_channels=input_reader_config.num_additional_channels,
num_keypoints=input_reader_config.num_keypoints,
expand_hierarchy_labels=input_reader_config.expand_labels_hierarchy) expand_hierarchy_labels=input_reader_config.expand_labels_hierarchy,
load_dense_pose=input_reader_config.load_dense_pose)
return decoder
elif input_type == input_reader_pb2.InputType.Value('TF_SEQUENCE_EXAMPLE'):
decoder = tf_sequence_example_decoder.TfSequenceExampleDecoder(
...
@@ -52,6 +52,7 @@ if tf_version.is_tf2():
from object_detection.models import faster_rcnn_inception_resnet_v2_keras_feature_extractor as frcnn_inc_res_keras
from object_detection.models import faster_rcnn_resnet_keras_feature_extractor as frcnn_resnet_keras
from object_detection.models import ssd_resnet_v1_fpn_keras_feature_extractor as ssd_resnet_v1_fpn_keras
from object_detection.models import faster_rcnn_resnet_v1_fpn_keras_feature_extractor as frcnn_resnet_fpn_keras
from object_detection.models.ssd_mobilenet_v1_fpn_keras_feature_extractor import SSDMobileNetV1FpnKerasFeatureExtractor
from object_detection.models.ssd_mobilenet_v1_keras_feature_extractor import SSDMobileNetV1KerasFeatureExtractor
from object_detection.models.ssd_mobilenet_v2_fpn_keras_feature_extractor import SSDMobileNetV2FpnKerasFeatureExtractor
@@ -109,6 +110,12 @@ if tf_version.is_tf2():
frcnn_resnet_keras.FasterRCNNResnet152KerasFeatureExtractor,
'faster_rcnn_inception_resnet_v2_keras':
frcnn_inc_res_keras.FasterRCNNInceptionResnetV2KerasFeatureExtractor,
'faster_rcnn_resnet50_fpn_keras':
frcnn_resnet_fpn_keras.FasterRCNNResnet50FpnKerasFeatureExtractor,
'faster_rcnn_resnet101_fpn_keras':
frcnn_resnet_fpn_keras.FasterRCNNResnet101FpnKerasFeatureExtractor,
'faster_rcnn_resnet152_fpn_keras':
frcnn_resnet_fpn_keras.FasterRCNNResnet152FpnKerasFeatureExtractor,
}
CENTER_NET_EXTRACTOR_FUNCTION_MAP = {
...
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""DensePose operations.
DensePose part ids are represented as tensors of shape
[num_instances, num_points] and coordinates are represented as tensors of shape
[num_instances, num_points, 4] where each point holds (y, x, v, u). The location
of the DensePose sampled point is (y, x) in normalized coordinates. The surface
coordinate (in the part coordinate frame) is (v, u). Note that dim 1 of both
tensors may contain padding, since the number of sampled points per instance
is not fixed. The value `num_points` represents the maximum number of sampled
points for an instance in the example.
"""
import os
import scipy.io
import tensorflow.compat.v1 as tf
from object_detection.utils import shape_utils
PART_NAMES = [
b'torso_back', b'torso_front', b'right_hand', b'left_hand', b'left_foot',
b'right_foot', b'right_upper_leg_back', b'left_upper_leg_back',
b'right_upper_leg_front', b'left_upper_leg_front', b'right_lower_leg_back',
b'left_lower_leg_back', b'right_lower_leg_front', b'left_lower_leg_front',
b'left_upper_arm_back', b'right_upper_arm_back', b'left_upper_arm_front',
b'right_upper_arm_front', b'left_lower_arm_back', b'right_lower_arm_back',
b'left_lower_arm_front', b'right_lower_arm_front', b'right_face',
b'left_face',
]
_SRC_PATH = ('google3/third_party/tensorflow_models/object_detection/'
'dataset_tools/densepose')
def scale(dp_surface_coords, y_scale, x_scale, scope=None):
"""Scales DensePose coordinates in y and x dimensions.
Args:
dp_surface_coords: a tensor of shape [num_instances, num_points, 4], with
coordinates in (y, x, v, u) format.
y_scale: (float) scalar tensor
x_scale: (float) scalar tensor
scope: name scope.
Returns:
new_dp_surface_coords: a tensor of shape [num_instances, num_points, 4]
"""
with tf.name_scope(scope, 'DensePoseScale'):
y_scale = tf.cast(y_scale, tf.float32)
x_scale = tf.cast(x_scale, tf.float32)
new_dp_surface_coords = dp_surface_coords * [[[y_scale, x_scale, 1, 1]]]
return new_dp_surface_coords
def clip_to_window(dp_surface_coords, window, scope=None):
"""Clips DensePose points to a window.
This op clips any input DensePose points to a window.
Args:
dp_surface_coords: a tensor of shape [num_instances, num_points, 4] with
DensePose surface coordinates in (y, x, v, u) format.
window: a tensor of shape [4] representing the [y_min, x_min, y_max, x_max]
window to which the op should clip the keypoints.
scope: name scope.
Returns:
new_dp_surface_coords: a tensor of shape [num_instances, num_points, 4].
"""
with tf.name_scope(scope, 'DensePoseClipToWindow'):
y, x, v, u = tf.split(value=dp_surface_coords, num_or_size_splits=4, axis=2)
win_y_min, win_x_min, win_y_max, win_x_max = tf.unstack(window)
y = tf.maximum(tf.minimum(y, win_y_max), win_y_min)
x = tf.maximum(tf.minimum(x, win_x_max), win_x_min)
new_dp_surface_coords = tf.concat([y, x, v, u], 2)
return new_dp_surface_coords
def prune_outside_window(dp_num_points, dp_part_ids, dp_surface_coords, window,
scope=None):
"""Prunes DensePose points that fall outside a given window.
This function replaces points that fall outside the given window with zeros.
See also clip_to_window which clips any DensePose points that fall outside the
given window.
Note that this operation uses dynamic shapes, and therefore is not currently
suitable for TPU.
Args:
dp_num_points: a tensor of shape [num_instances] that indicates how many
(non-padded) DensePose points there are per instance.
dp_part_ids: a tensor of shape [num_instances, num_points] with DensePose
part ids. These part_ids are 0-indexed, where the first non-background
part has index 0.
dp_surface_coords: a tensor of shape [num_instances, num_points, 4] with
DensePose surface coordinates in (y, x, v, u) format.
window: a tensor of shape [4] representing the [y_min, x_min, y_max, x_max]
window outside of which the op should prune the points.
scope: name scope.
Returns:
new_dp_num_points: a tensor of shape [num_instances] that indicates how many
(non-padded) DensePose points there are per instance after pruning.
new_dp_part_ids: a tensor of shape [num_instances, num_points] with
DensePose part ids. These part_ids are 0-indexed, where the first
non-background part has index 0.
new_dp_surface_coords: a tensor of shape [num_instances, num_points, 4] with
DensePose surface coordinates after pruning.
"""
with tf.name_scope(scope, 'DensePosePruneOutsideWindow'):
y, x, _, _ = tf.unstack(dp_surface_coords, axis=-1)
win_y_min, win_x_min, win_y_max, win_x_max = tf.unstack(window)
num_instances, num_points = shape_utils.combined_static_and_dynamic_shape(
dp_part_ids)
dp_num_points_tiled = tf.tile(dp_num_points[:, tf.newaxis],
multiples=[1, num_points])
range_tiled = tf.tile(tf.range(num_points)[tf.newaxis, :],
multiples=[num_instances, 1])
valid_initial = range_tiled < dp_num_points_tiled
valid_in_window = tf.logical_and(
tf.logical_and(y >= win_y_min, y <= win_y_max),
tf.logical_and(x >= win_x_min, x <= win_x_max))
valid_indices = tf.logical_and(valid_initial, valid_in_window)
new_dp_num_points = tf.math.reduce_sum(
tf.cast(valid_indices, tf.int32), axis=1)
max_num_points = tf.math.reduce_max(new_dp_num_points)
def gather_and_reshuffle(elems):
dp_part_ids, dp_surface_coords, valid_indices = elems
locs = tf.where(valid_indices)[:, 0]
valid_part_ids = tf.gather(dp_part_ids, locs, axis=0)
valid_part_ids_padded = shape_utils.pad_or_clip_nd(
valid_part_ids, output_shape=[max_num_points])
valid_surface_coords = tf.gather(dp_surface_coords, locs, axis=0)
valid_surface_coords_padded = shape_utils.pad_or_clip_nd(
valid_surface_coords, output_shape=[max_num_points, 4])
return [valid_part_ids_padded, valid_surface_coords_padded]
new_dp_part_ids, new_dp_surface_coords = (
shape_utils.static_or_dynamic_map_fn(
gather_and_reshuffle,
elems=[dp_part_ids, dp_surface_coords, valid_indices],
dtype=[tf.int32, tf.float32],
back_prop=False))
return new_dp_num_points, new_dp_part_ids, new_dp_surface_coords
def change_coordinate_frame(dp_surface_coords, window, scope=None):
"""Changes coordinate frame of the points to be relative to window's frame.
Given a window of the form [y_min, x_min, y_max, x_max] in normalized
coordinates, changes DensePose coordinates to be relative to this window.
An example use case is data augmentation: where we are given groundtruth
points and would like to randomly crop the image to some window. In this
case we need to change the coordinate frame of each sampled point to be
relative to this new window.
Args:
dp_surface_coords: a tensor of shape [num_instances, num_points, 4] with
DensePose surface coordinates in (y, x, v, u) format.
window: a tensor of shape [4] representing the [y_min, x_min, y_max, x_max]
window we should change the coordinate frame to.
scope: name scope.
Returns:
new_dp_surface_coords: a tensor of shape [num_instances, num_points, 4].
"""
with tf.name_scope(scope, 'DensePoseChangeCoordinateFrame'):
win_height = window[2] - window[0]
win_width = window[3] - window[1]
new_dp_surface_coords = scale(
dp_surface_coords - [window[0], window[1], 0, 0],
1.0 / win_height, 1.0 / win_width)
return new_dp_surface_coords
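# Worked numeric sketch (illustrative, mirroring the unit test below): with
# window [0.25, 0.25, 0.75, 0.75] (height = width = 0.5), a point at
# (y, x) = (0.5, 0.0) maps to ((0.5 - 0.25) / 0.5, (0.0 - 0.25) / 0.5)
# = (0.5, -0.5); the (v, u) surface coordinates are left untouched.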
def to_normalized_coordinates(dp_surface_coords, height, width,
check_range=True, scope=None):
"""Converts absolute DensePose coordinates to normalized in range [0, 1].
This function raises an assertion error at graph execution time when the
maximum coordinate value is smaller than 1.01, which indicates that the
coordinates are already normalized. The threshold of 1.01 allows for small
rounding errors.
Args:
dp_surface_coords: a tensor of shape [num_instances, num_points, 4] with
DensePose absolute surface coordinates in (y, x, v, u) format.
height: Height of image.
width: Width of image.
check_range: If True, checks if the coordinates are already normalized.
scope: name scope.
Returns:
A tensor of shape [num_instances, num_points, 4] with normalized
coordinates.
"""
with tf.name_scope(scope, 'DensePoseToNormalizedCoordinates'):
height = tf.cast(height, tf.float32)
width = tf.cast(width, tf.float32)
if check_range:
max_val = tf.reduce_max(dp_surface_coords[:, :, :2])
max_assert = tf.Assert(tf.greater(max_val, 1.01),
['max value is lower than 1.01: ', max_val])
with tf.control_dependencies([max_assert]):
width = tf.identity(width)
return scale(dp_surface_coords, 1.0 / height, 1.0 / width)
def to_absolute_coordinates(dp_surface_coords, height, width,
check_range=True, scope=None):
"""Converts normalized DensePose coordinates to absolute pixel coordinates.
This function raises an assertion error at graph execution time when the
maximum coordinate value is larger than 1.01, in which case the coordinates
are already absolute.
Args:
dp_surface_coords: a tensor of shape [num_instances, num_points, 4] with
DensePose normalized surface coordinates in (y, x, v, u) format.
height: Height of image.
width: Width of image.
check_range: If True, checks if the coordinates are normalized or not.
scope: name scope.
Returns:
A tensor of shape [num_instances, num_points, 4] with absolute coordinates.
"""
with tf.name_scope(scope, 'DensePoseToAbsoluteCoordinates'):
height = tf.cast(height, tf.float32)
width = tf.cast(width, tf.float32)
if check_range:
max_val = tf.reduce_max(dp_surface_coords[:, :, :2])
max_assert = tf.Assert(tf.greater_equal(1.01, max_val),
['maximum coordinate value is larger than 1.01: ',
max_val])
with tf.control_dependencies([max_assert]):
width = tf.identity(width)
return scale(dp_surface_coords, height, width)
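# Illustrative round trip (not part of the original module): for a 40x60 image,
#   normalized = to_normalized_coordinates(dp_surface_coords, 40, 60)
#   recovered = to_absolute_coordinates(normalized, 40, 60)
# returns the original (y, x) pixel values, while (v, u) are never rescaled
# because scale() multiplies those channels by 1.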
class DensePoseHorizontalFlip(object):
"""Class responsible for horizontal flipping of parts and surface coords."""
def __init__(self):
"""Constructor."""
uv_symmetry_transforms_path = os.path.join(
tf.resource_loader.get_data_files_path(), '..', 'dataset_tools',
'densepose', 'UV_symmetry_transforms.mat')
data = scipy.io.loadmat(uv_symmetry_transforms_path)
# Create lookup maps which indicate how a VU coordinate changes after a
# horizontal flip.
uv_symmetry_map = {}
for key in ('U_transforms', 'V_transforms'):
uv_symmetry_map_per_part = []
for i in range(data[key].shape[1]):
# The following tensor has shape [256, 256].
map_per_part = tf.constant(data[key][0, i], dtype=tf.float32)
uv_symmetry_map_per_part.append(map_per_part)
uv_symmetry_map[key] = tf.reshape(
tf.stack(uv_symmetry_map_per_part, axis=0), [-1])
# The following dictionary contains flattened lookup maps for the U and V
# coordinates separately. The shape of each is [24 * 256 * 256].
self.uv_symmetries = uv_symmetry_map
# Create a list that maps each part index to its flipped part index (0-indexed).
part_symmetries = []
for i, part_name in enumerate(PART_NAMES):
if b'left' in part_name:
part_symmetries.append(PART_NAMES.index(
part_name.replace(b'left', b'right')))
elif b'right' in part_name:
part_symmetries.append(PART_NAMES.index(
part_name.replace(b'right', b'left')))
else:
part_symmetries.append(i)
self.part_symmetries = part_symmetries
def flip_parts_and_coords(self, part_ids, vu):
"""Flips part ids and coordinates.
Args:
part_ids: a [num_instances, num_points] int32 tensor with pre-flipped part
ids. These part_ids are 0-indexed, where the first non-background part
has index 0.
vu: a [num_instances, num_points, 2] float32 tensor with pre-flipped vu
normalized coordinates.
Returns:
new_part_ids: a [num_instances, num_points] int32 tensor with post-flipped
part ids. These part_ids are 0-indexed, where the first non-background
part has index 0.
new_vu: a [num_instances, num_points, 2] float32 tensor with post-flipped
vu coordinates.
"""
num_instances, num_points = shape_utils.combined_static_and_dynamic_shape(
part_ids)
part_ids_flattened = tf.reshape(part_ids, [-1])
new_part_ids_flattened = tf.gather(self.part_symmetries, part_ids_flattened)
new_part_ids = tf.reshape(new_part_ids_flattened,
[num_instances, num_points])
# Convert VU floating point coordinates to integer locations in a 256x256 grid.
vu = tf.math.minimum(tf.math.maximum(vu, 0.0), 1.0)
vu_locs = tf.cast(vu * 256., dtype=tf.int32)
vu_locs_flattened = tf.reshape(vu_locs, [-1, 2])
v_locs_flattened, u_locs_flattened = tf.unstack(vu_locs_flattened, axis=1)
# Convert vu_locs into lookup indices (in flattened part symmetries map).
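# Each of the 24 parts owns a 256x256 lookup table, so the flattened index is
# part_id * 65536 (= 256 * 256) plus 256 * v + u within that part's table.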
symmetry_lookup_inds = (
part_ids_flattened * 65536 + 256 * v_locs_flattened + u_locs_flattened)
# New VU coordinates.
v_new = tf.gather(self.uv_symmetries['V_transforms'], symmetry_lookup_inds)
u_new = tf.gather(self.uv_symmetries['U_transforms'], symmetry_lookup_inds)
new_vu_flattened = tf.stack([v_new, u_new], axis=1)
new_vu = tf.reshape(new_vu_flattened, [num_instances, num_points, 2])
return new_part_ids, new_vu
def flip_horizontal(dp_part_ids, dp_surface_coords, scope=None):
"""Flips the DensePose points horizontally around the flip_point.
This operation flips dense pose annotations horizontally. Note that part ids
and surface coordinates may or may not change as a result of the flip.
Args:
dp_part_ids: a tensor of shape [num_instances, num_points] with DensePose
part ids. These part_ids are 0-indexed, where the first non-background
part has index 0.
dp_surface_coords: a tensor of shape [num_instances, num_points, 4] with
DensePose surface coordinates in (y, x, v, u) normalized format.
scope: name scope.
Returns:
new_dp_part_ids: a tensor of shape [num_instances, num_points] with
DensePose part ids after flipping.
new_dp_surface_coords: a tensor of shape [num_instances, num_points, 4] with
DensePose surface coordinates after flipping.
"""
with tf.name_scope(scope, 'DensePoseFlipHorizontal'):
# First flip x coordinate.
y, x, vu = tf.split(dp_surface_coords, num_or_size_splits=[1, 1, 2], axis=2)
xflipped = 1.0 - x
# Flip part ids and surface coordinates.
horizontal_flip = DensePoseHorizontalFlip()
new_dp_part_ids, new_vu = horizontal_flip.flip_parts_and_coords(
dp_part_ids, vu)
new_dp_surface_coords = tf.concat([y, xflipped, new_vu], axis=2)
return new_dp_part_ids, new_dp_surface_coords
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for object_detection.core.densepose_ops."""
import numpy as np
import tensorflow.compat.v1 as tf
from object_detection.core import densepose_ops
from object_detection.utils import test_case
class DensePoseOpsTest(test_case.TestCase):
"""Tests for common DensePose operations."""
def test_scale(self):
def graph_fn():
dp_surface_coords = tf.constant([
[[0.0, 0.0, 0.1, 0.2], [100.0, 200.0, 0.3, 0.4]],
[[50.0, 120.0, 0.5, 0.6], [100.0, 140.0, 0.7, 0.8]]
])
y_scale = tf.constant(1.0 / 100)
x_scale = tf.constant(1.0 / 200)
output = densepose_ops.scale(dp_surface_coords, y_scale, x_scale)
return output
output = self.execute(graph_fn, [])
expected_dp_surface_coords = np.array([
[[0., 0., 0.1, 0.2], [1.0, 1.0, 0.3, 0.4]],
[[0.5, 0.6, 0.5, 0.6], [1.0, 0.7, 0.7, 0.8]]
])
self.assertAllClose(output, expected_dp_surface_coords)
def test_clip_to_window(self):
def graph_fn():
dp_surface_coords = tf.constant([
[[0.25, 0.5, 0.1, 0.2], [0.75, 0.75, 0.3, 0.4]],
[[0.5, 0.0, 0.5, 0.6], [1.0, 1.0, 0.7, 0.8]]
])
window = tf.constant([0.25, 0.25, 0.75, 0.75])
output = densepose_ops.clip_to_window(dp_surface_coords, window)
return output
output = self.execute(graph_fn, [])
expected_dp_surface_coords = np.array([
[[0.25, 0.5, 0.1, 0.2], [0.75, 0.75, 0.3, 0.4]],
[[0.5, 0.25, 0.5, 0.6], [0.75, 0.75, 0.7, 0.8]]
])
self.assertAllClose(output, expected_dp_surface_coords)
def test_prune_outside_window(self):
def graph_fn():
dp_num_points = tf.constant([2, 0, 1])
dp_part_ids = tf.constant([[1, 1], [0, 0], [16, 0]])
dp_surface_coords = tf.constant([
[[0.9, 0.5, 0.1, 0.2], [0.75, 0.75, 0.3, 0.4]],
[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
[[0.8, 0.5, 0.6, 0.6], [0.5, 0.5, 0.7, 0.7]]
])
window = tf.constant([0.25, 0.25, 0.75, 0.75])
new_dp_num_points, new_dp_part_ids, new_dp_surface_coords = (
densepose_ops.prune_outside_window(dp_num_points, dp_part_ids,
dp_surface_coords, window))
return new_dp_num_points, new_dp_part_ids, new_dp_surface_coords
new_dp_num_points, new_dp_part_ids, new_dp_surface_coords = (
self.execute_cpu(graph_fn, []))
expected_dp_num_points = np.array([1, 0, 0])
expected_dp_part_ids = np.array([[1], [0], [0]])
expected_dp_surface_coords = np.array([
[[0.75, 0.75, 0.3, 0.4]],
[[0.0, 0.0, 0.0, 0.0]],
[[0.0, 0.0, 0.0, 0.0]]
])
self.assertAllEqual(new_dp_num_points, expected_dp_num_points)
self.assertAllEqual(new_dp_part_ids, expected_dp_part_ids)
self.assertAllClose(new_dp_surface_coords, expected_dp_surface_coords)
def test_change_coordinate_frame(self):
def graph_fn():
dp_surface_coords = tf.constant([
[[0.25, 0.5, 0.1, 0.2], [0.75, 0.75, 0.3, 0.4]],
[[0.5, 0.0, 0.5, 0.6], [1.0, 1.0, 0.7, 0.8]]
])
window = tf.constant([0.25, 0.25, 0.75, 0.75])
output = densepose_ops.change_coordinate_frame(dp_surface_coords, window)
return output
output = self.execute(graph_fn, [])
expected_dp_surface_coords = np.array([
[[0, 0.5, 0.1, 0.2], [1.0, 1.0, 0.3, 0.4]],
[[0.5, -0.5, 0.5, 0.6], [1.5, 1.5, 0.7, 0.8]]
])
self.assertAllClose(output, expected_dp_surface_coords)
def test_to_normalized_coordinates(self):
def graph_fn():
dp_surface_coords = tf.constant([
[[10., 30., 0.1, 0.2], [30., 45., 0.3, 0.4]],
[[20., 0., 0.5, 0.6], [40., 60., 0.7, 0.8]]
])
output = densepose_ops.to_normalized_coordinates(
dp_surface_coords, 40, 60)
return output
output = self.execute(graph_fn, [])
expected_dp_surface_coords = np.array([
[[0.25, 0.5, 0.1, 0.2], [0.75, 0.75, 0.3, 0.4]],
[[0.5, 0.0, 0.5, 0.6], [1.0, 1.0, 0.7, 0.8]]
])
self.assertAllClose(output, expected_dp_surface_coords)
def test_to_absolute_coordinates(self):
def graph_fn():
dp_surface_coords = tf.constant([
[[0.25, 0.5, 0.1, 0.2], [0.75, 0.75, 0.3, 0.4]],
[[0.5, 0.0, 0.5, 0.6], [1.0, 1.0, 0.7, 0.8]]
])
output = densepose_ops.to_absolute_coordinates(
dp_surface_coords, 40, 60)
return output
output = self.execute(graph_fn, [])
expected_dp_surface_coords = np.array([
[[10., 30., 0.1, 0.2], [30., 45., 0.3, 0.4]],
[[20., 0., 0.5, 0.6], [40., 60., 0.7, 0.8]]
])
self.assertAllClose(output, expected_dp_surface_coords)
def test_horizontal_flip(self):
part_ids_np = np.array([[1, 4], [0, 8]], dtype=np.int32)
surf_coords_np = np.array([
[[0.1, 0.7, 0.2, 0.4], [0.3, 0.8, 0.2, 0.4]],
[[0.0, 0.5, 0.8, 0.7], [0.6, 1.0, 0.7, 0.9]],
], dtype=np.float32)
def graph_fn():
part_ids = tf.constant(part_ids_np, dtype=tf.int32)
surf_coords = tf.constant(surf_coords_np, dtype=tf.float32)
flipped_part_ids, flipped_surf_coords = densepose_ops.flip_horizontal(
part_ids, surf_coords)
flipped_twice_part_ids, flipped_twice_surf_coords = (
densepose_ops.flip_horizontal(flipped_part_ids, flipped_surf_coords))
return (flipped_part_ids, flipped_surf_coords,
flipped_twice_part_ids, flipped_twice_surf_coords)
(flipped_part_ids, flipped_surf_coords, flipped_twice_part_ids,
flipped_twice_surf_coords) = self.execute(graph_fn, [])
expected_flipped_part_ids = [[1, 5], # 1->1, 4->5
[0, 9]] # 0->0, 8->9
expected_flipped_surf_coords_yx = np.array([
[[0.1, 1.0-0.7], [0.3, 1.0-0.8]],
[[0.0, 1.0-0.5], [0.6, 1.0-1.0]],
], dtype=np.float32)
self.assertAllEqual(expected_flipped_part_ids, flipped_part_ids)
self.assertAllClose(expected_flipped_surf_coords_yx,
flipped_surf_coords[:, :, 0:2])
self.assertAllEqual(part_ids_np, flipped_twice_part_ids)
self.assertAllClose(surf_coords_np, flipped_twice_surf_coords, rtol=1e-2,
atol=1e-2)
if __name__ == '__main__':
tf.test.main()
...@@ -391,7 +391,9 @@ class DetectionModel(six.with_metaclass(abc.ABCMeta, _BaseClass)):
pass
@abc.abstractmethod
def restore_map(self,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False):
"""Returns a map of variables to load from a foreign checkpoint. """Returns a map of variables to load from a foreign checkpoint.
Returns a map of variable names to load from a checkpoint to variables in Returns a map of variable names to load from a checkpoint to variables in
...@@ -407,6 +409,9 @@ class DetectionModel(six.with_metaclass(abc.ABCMeta, _BaseClass)): ...@@ -407,6 +409,9 @@ class DetectionModel(six.with_metaclass(abc.ABCMeta, _BaseClass)):
checkpoint (with compatible variable names) or to restore from a checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training. classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'. Valid values: `detection`, `classification`. Default 'detection'.
load_all_detection_checkpoint_vars: whether to load all variables (when
`fine_tune_checkpoint_type` is `detection`). If False, only variables
within the feature extractor scope are included. Default False.
Returns:
A dict mapping variable names (to load from a checkpoint) to variables in
...@@ -414,6 +419,36 @@ class DetectionModel(six.with_metaclass(abc.ABCMeta, _BaseClass)):
"""
pass
@abc.abstractmethod
def restore_from_objects(self, fine_tune_checkpoint_type='detection'):
"""Returns a map of variables to load from a foreign checkpoint.
Returns a dictionary of Tensorflow 2 Trackable objects (e.g. tf.Module
or Checkpoint). This enables the model to initialize based on weights from
another task. For example, the feature extractor variables from a
classification model can be used to bootstrap training of an object
detector. When loading from an object detection model, the checkpoint model
should have the same parameters as this detection model with exception of
the num_classes parameter.
Note that this function is intended to be used to restore Keras-based
models when running Tensorflow 2, whereas restore_map (above) is intended
to be used to restore Slim-based models when running Tensorflow 1.x.
TODO(jonathanhuang,rathodv): Check tf_version and raise unimplemented
error for both restore_map and restore_from_objects depending on version.
Args:
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
Returns:
A dict mapping keys to Trackable objects (tf.Module or Checkpoint).
"""
pass
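# Illustrative sketch (not part of this diff): in a TF2 training loop, the
# returned dictionary of Trackable objects could be consumed roughly as
# follows, where `fine_tune_checkpoint_path` is a hypothetical path to the
# foreign checkpoint:
#   restore_dict = detection_model.restore_from_objects(
#       fine_tune_checkpoint_type='classification')
#   ckpt = tf.train.Checkpoint(**restore_dict)
#   ckpt.restore(fine_tune_checkpoint_path).expect_partial()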
@abc.abstractmethod
def updates(self):
"""Returns a list of update operators for this model.
...
...@@ -57,6 +57,9 @@ class FakeModel(model.DetectionModel):
def restore_map(self):
return {}
def restore_from_objects(self, fine_tune_checkpoint_type):
pass
def regularization_losses(self):
return []
...