Unverified commit c627506f authored by André Araujo, committed by GitHub

DELF open-source library v2.0 (#8454)

* Merged commit includes the following changes:
253126424  by Andre Araujo:

    Scripts to compute metrics for Google Landmarks dataset.

    Also, a small fix to metric in retrieval case: avoids duplicate predicted images.

--
253118971  by Andre Araujo:

    Metrics for Google Landmarks dataset.

--
253106953  by Andre Araujo:

    Library to read files from Google Landmarks challenges.

--
250700636  by Andre Araujo:

    Handle case of aggregation extraction with empty set of input features.

--
250516819  by Andre Araujo:

    Add minimum size for DELF extractor.

--
250435822  by Andre Araujo:

    Add max_image_size/min_image_size for open-source DELF proto / module.

--
250414606  by Andre Araujo:

    Refactor extract_aggregation to allow reuse with different datasets.

--
250356863  by Andre Araujo:

    Remove unnecessary cmd_args variable from boxes_and_features_extraction.

--
249783379  by Andre Araujo:

    Create directory for writing mapping file if it does not exist.

--
249581591  by Andre Araujo:

    Refactor scripts to extract boxes and features from images in Revisited datasets.
    Also, change tf.logging.info --> print for easier logging in open source code.

--
249511821  by Andre Araujo:

    Small change to function for file/directory handling.

--
249289499  by Andre Araujo:

    Internal change.

--

PiperOrigin-RevId: 253126424

* Updating DELF init to adjust to latest changes

* Editing init files for python packages

* Edit D2R dataset reader to work with py3.

PiperOrigin-RevId: 253135576

* DELF package: fix import ordering

* Adding new requirements to setup.py

* Adding init file for training dir

* Merged commit includes the following changes:

FolderOrigin-RevId: /google/src/cloud/andrearaujo/delf_oss/google3/..

* Adding init file for training subdirs

* Working version of DELF training

* Internal change.

PiperOrigin-RevId: 253248648

* Fix variance loading in open-source code.

PiperOrigin-RevId: 260619120

* Separate image re-ranking as a standalone library, and add metric writing to dataset library.

PiperOrigin-RevId: 260998608

* Tool to read written D2R Revisited datasets metrics file. Test is added.

Also adds a unit test for previously-existing SaveMetricsFile function.

PiperOrigin-RevId: 263361410

* Add optional resize factor for feature extraction.

PiperOrigin-RevId: 264437080

* Adapt to spacing changes in NumPy's new version.

PiperOrigin-RevId: 265127245

* Make image matching function visible, and add support for RANSAC seed.

PiperOrigin-RevId: 277177468

* Avoid matplotlib failure due to missing display backend.

PiperOrigin-RevId: 287316435

* Removes tf.contrib dependency.

PiperOrigin-RevId: 288842237

* Fix tf contrib removal for feature_aggregation_extractor.

PiperOrigin-RevId: 289487669

* Merged commit includes the following changes:
309118395  by Andre Araujo:

    Make DELF open-source code compatible with TF2.

--
309067582  by Andre Araujo:

    Handle image resizing rounding properly for python extraction.

    New behavior is tested with unit tests.

--
308690144  by Andre Araujo:

    Several changes to improve DELF model/training code and make it work in TF 2.1.0:
    - Rename some files for better clarity
    - Using compat.v1 versions of functions
    - Formatting changes
    - Using more appropriate TF function names

--
308689397  by Andre Araujo:

    Internal change.

--
308341315  by Andre Araujo:

    Remove old slim dependency in DELF open-source model.

    This avoids issues with requiring old TF-v1, making it compatible with latest TF.

--
306777559  by Andre Araujo:

    Internal change

--
304505811  by Andre Araujo:

    Raise error during geometric verification if local features have different dimensionalities.

--
301739992  by Andre Araujo:

    Transform some geometric verification constants into arguments, to allow custom matching.

--
301300324  by Andre Araujo:

    Apply name change (experimental_run_v2 -> run) for all callers in TensorFlow.

--
299919057  by Andre Araujo:

    Automated refactoring to make code Python 3 compatible.

--
297953698  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297521242  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297278247  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297270405  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297238741  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
297108605  by Andre Araujo:

    Explicitly replace "import tensorflow" with "tensorflow.compat.v1" for TF2.x migration

--
294676131  by Andre Araujo:

    Add option to resize images to square resolutions without aspect ratio preservation.

--
293849641  by Andre Araujo:

    Internal change.

--
293840896  by Andre Araujo:

    Changing Slim import to tf_slim codebase.

--
293661660  by Andre Araujo:

    Allow the delf training script to read from TFRecords dataset.

--
291755295  by Andre Araujo:

    Internal change.

--
291448508  by Andre Araujo:

    Internal change.

--
291414459  by Andre Araujo:

    Adding train script.

--
291384336  by Andre Araujo:

    Adding model export script and test.

--
291260565  by Andre Araujo:

    Adding placeholder for Google Landmarks dataset.

--
291205548  by Andre Araujo:

    Definition of DELF model using Keras ResNet50 as backbone.

--
289500793  by Andre Araujo:

    Add TFRecord building script for delf.

--

PiperOrigin-RevId: 309118395

* Updating README, dependency versions

* Updating training README

* Fixing init import of export_model

* Fixing init import of export_model_utils

* tkinter in INSTALL_INSTRUCTIONS

* Merged commit includes the following changes:

FolderOrigin-RevId: /google/src/cloud/andrearaujo/delf_oss/google3/..

* INSTALL_INSTRUCTIONS mentioning different cloning options
parent 71d2680d
@@ -29,16 +29,36 @@ from delf import extractor

 class ExtractorTest(tf.test.TestCase, parameterized.TestCase):

   @parameterized.named_parameters(
-      ('Max-1Min-1', -1, -1, [4, 2, 3], 1.0),
-      ('Max2Min-1', 2, -1, [2, 1, 3], 0.5),
-      ('Max8Min-1', 8, -1, [4, 2, 3], 1.0),
-      ('Max-1Min1', -1, 1, [4, 2, 3], 1.0),
-      ('Max-1Min8', -1, 8, [8, 4, 3], 2.0),
-      ('Max16Min8', 16, 8, [8, 4, 3], 2.0),
-      ('Max2Min2', 2, 2, [2, 1, 3], 0.5),
+      ('Max-1Min-1', -1, -1, 1.0, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max-1Min-1Square', -1, -1, 1.0, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max2Min-1', 2, -1, 1.0, False, [2, 1, 3], [0.5, 0.5]),
+      ('Max2Min-1Square', 2, -1, 1.0, True, [2, 2, 3], [0.5, 1.0]),
+      ('Max8Min-1', 8, -1, 1.0, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max8Min-1Square', 8, -1, 1.0, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max-1Min1', -1, 1, 1.0, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max-1Min1Square', -1, 1, 1.0, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max-1Min8', -1, 8, 1.0, False, [8, 4, 3], [2.0, 2.0]),
+      ('Max-1Min8Square', -1, 8, 1.0, True, [8, 8, 3], [2.0, 4.0]),
+      ('Max16Min8', 16, 8, 1.0, False, [8, 4, 3], [2.0, 2.0]),
+      ('Max16Min8Square', 16, 8, 1.0, True, [8, 8, 3], [2.0, 4.0]),
+      ('Max2Min2', 2, 2, 1.0, False, [2, 1, 3], [0.5, 0.5]),
+      ('Max2Min2Square', 2, 2, 1.0, True, [2, 2, 3], [0.5, 1.0]),
+      ('Max-1Min-1Factor0.5', -1, -1, 0.5, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max-1Min-1Factor0.5Square', -1, -1, 0.5, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max2Min-1Factor2.0', 2, -1, 2.0, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max2Min-1Factor2.0Square', 2, -1, 2.0, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max-1Min8Factor0.5', -1, 8, 0.5, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max-1Min8Factor0.5Square', -1, 8, 0.5, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max-1Min8Factor0.25', -1, 8, 0.25, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max-1Min8Factor0.25Square', -1, 8, 0.25, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max2Min2Factor2.0', 2, 2, 2.0, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max2Min2Factor2.0Square', 2, 2, 2.0, True, [4, 4, 3], [1.0, 2.0]),
+      ('Max16Min8Factor0.5', 16, 8, 0.5, False, [4, 2, 3], [1.0, 1.0]),
+      ('Max16Min8Factor0.5Square', 16, 8, 0.5, True, [4, 4, 3], [1.0, 2.0]),
   )
-  def testResizeImageWorks(self, max_image_size, min_image_size, expected_shape,
-                           expected_scale_factor):
+  def testResizeImageWorks(self, max_image_size, min_image_size, resize_factor,
+                           square_output, expected_shape,
+                           expected_scale_factors):
     # Construct image of size 4x2x3.
     image = np.array([[[0, 0, 0], [1, 1, 1]], [[2, 2, 2], [3, 3, 3]],
                       [[4, 4, 4], [5, 5, 5]], [[6, 6, 6], [7, 7, 7]]],
@@ -48,9 +68,31 @@ class ExtractorTest(tf.test.TestCase, parameterized.TestCase):
     config = delf_config_pb2.DelfConfig(
         max_image_size=max_image_size, min_image_size=min_image_size)

-    resized_image, scale_factor = extractor.ResizeImage(image, config)
+    resized_image, scale_factors = extractor.ResizeImage(
+        image, config, resize_factor, square_output)
     self.assertAllEqual(resized_image.shape, expected_shape)
-    self.assertAllClose(scale_factor, expected_scale_factor)
+    self.assertAllClose(scale_factors, expected_scale_factors)
+
+  @parameterized.named_parameters(
+      ('Max2Min2', 2, 2, 1.0, False, [2, 1, 3], [0.666666, 0.5]),
+      ('Max2Min2Square', 2, 2, 1.0, True, [2, 2, 3], [0.666666, 1.0]),
+  )
+  def testResizeImageRoundingWorks(self, max_image_size, min_image_size,
+                                   resize_factor, square_output,
+                                   expected_shape, expected_scale_factors):
+    # Construct image of size 3x2x3.
+    image = np.array([[[0, 0, 0], [1, 1, 1]], [[2, 2, 2], [3, 3, 3]],
+                      [[4, 4, 4], [5, 5, 5]]],
+                     dtype='uint8')
+    # Set up config.
+    config = delf_config_pb2.DelfConfig(
+        max_image_size=max_image_size, min_image_size=min_image_size)
+
+    resized_image, scale_factors = extractor.ResizeImage(
+        image, config, resize_factor, square_output)
+    self.assertAllEqual(resized_image.shape, expected_shape)
+    self.assertAllClose(scale_factors, expected_scale_factors)

 if __name__ == '__main__':
...
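From the parameterized cases above, `ResizeImage` now takes a resize factor and a square-output flag and returns per-axis scale factors. A minimal sketch of a call, mirroring the 'Max-1Min8Square' case (argument order and expected outputs taken from the test; the config values are illustrative):

```python
import numpy as np
from delf import delf_config_pb2
from delf import extractor

image = np.zeros((4, 2, 3), dtype='uint8')
config = delf_config_pb2.DelfConfig(max_image_size=-1, min_image_size=8)
# resize_factor=1.0, square_output=True: the aspect ratio is not preserved,
# so the vertical and horizontal scale factors differ.
resized_image, scale_factors = extractor.ResizeImage(image, config, 1.0, True)
print(resized_image.shape)  # (8, 8, 3), per the test expectations above.
print(scale_factors)        # [2.0, 4.0]
```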
@@ -27,7 +27,10 @@ from __future__ import print_function
 import argparse
 import sys

-import matplotlib.image as mpimg
+import matplotlib
+# Needed before pyplot import for matplotlib to work properly.
+matplotlib.use('Agg')
+import matplotlib.image as mpimg  # pylint: disable=g-import-not-at-top
 import matplotlib.pyplot as plt
 import numpy as np
 from scipy import spatial
@@ -45,17 +48,17 @@ _DISTANCE_THRESHOLD = 0.8

 def main(unused_argv):
-  tf.logging.set_verbosity(tf.logging.INFO)
+  tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)

   # Read features.
   locations_1, _, descriptors_1, _, _ = feature_io.ReadFromFile(
       cmd_args.features_1_path)
   num_features_1 = locations_1.shape[0]
-  tf.logging.info("Loaded image 1's %d features" % num_features_1)
+  tf.compat.v1.logging.info("Loaded image 1's %d features" % num_features_1)
   locations_2, _, descriptors_2, _, _ = feature_io.ReadFromFile(
       cmd_args.features_2_path)
   num_features_2 = locations_2.shape[0]
-  tf.logging.info("Loaded image 2's %d features" % num_features_2)
+  tf.compat.v1.logging.info("Loaded image 2's %d features" % num_features_2)

   # Find nearest-neighbor matches using a KD tree.
   d1_tree = spatial.cKDTree(descriptors_1)
@@ -81,7 +84,7 @@ def main(unused_argv):
       residual_threshold=20,
       max_trials=1000)

-  tf.logging.info('Found %d inliers' % sum(inliers))
+  tf.compat.v1.logging.info('Found %d inliers' % sum(inliers))

   # Visualize correspondences, and save to file.
   _, ax = plt.subplots()
...
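The hunks above force the non-interactive Agg backend and move logging to `tf.compat.v1`; the matching step they surround is a KD-tree nearest-neighbor query thresholded by `_DISTANCE_THRESHOLD`. A self-contained sketch of that pattern, with random stand-in descriptors rather than real DELF outputs:

```python
import matplotlib
matplotlib.use('Agg')  # Must run before any pyplot import on headless hosts.
import numpy as np
from scipy import spatial

_DISTANCE_THRESHOLD = 0.8
descriptors_1 = np.random.rand(100, 40)
descriptors_2 = np.random.rand(120, 40)

d1_tree = spatial.cKDTree(descriptors_1)
_, indices = d1_tree.query(
    descriptors_2, distance_upper_bound=_DISTANCE_THRESHOLD)
# query() returns index == len(descriptors_1) when no neighbor lies within
# the distance bound, so those entries are filtered out.
num_features_1 = descriptors_1.shape[0]
matches = [(i, j) for j, i in enumerate(indices) if i != num_features_1]
print('%d putative matches' % len(matches))
```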
@@ -27,6 +27,7 @@ import tensorflow as tf

 from delf import aggregation_config_pb2

+_CLUSTER_CENTERS_VAR_NAME = "clusters"
 _NORM_SQUARED_TOLERANCE = 1e-12

 # Aliases for aggregation types.
@@ -66,10 +67,7 @@ class ExtractAggregatedRepresentation(object):
           aggregation_config.feature_dimensionality
       ])
       tf.compat.v1.train.init_from_checkpoint(
-          aggregation_config.codebook_path, {
-              tf.contrib.factorization.KMeansClustering.CLUSTER_CENTERS_VAR_NAME:
-                  codebook
-          })
+          aggregation_config.codebook_path, {_CLUSTER_CENTERS_VAR_NAME: codebook})

       # Construct extraction graph based on desired options.
       if self._aggregation_type == _VLAD:
@@ -270,7 +268,7 @@ class ExtractAggregatedRepresentation(object):
         output_vlad: VLAD descriptor updated to take into account contribution
           from ind-th feature.
       """
-      return ind + 1, tf.compat.v1.tensor_scatter_add(
+      return ind + 1, tf.tensor_scatter_nd_add(
          vlad, tf.expand_dims(selected_visual_words[ind], axis=1),
          tf.tile(
              tf.expand_dims(features[ind], axis=0), [num_assignments, 1]) -
...
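The tf.contrib-free codebook loading above hardcodes the "clusters" variable name, and the VLAD update swaps in `tf.tensor_scatter_nd_add`. A quick standalone illustration of that op's semantics (a sketch, runnable in TF2 eager mode):

```python
import tensorflow as tf

vlad = tf.zeros([4, 3])                 # One row per visual word.
indices = tf.constant([[1], [1], [3]])  # Target rows; duplicates accumulate.
updates = tf.ones([3, 3])               # One residual row per index.
print(tf.tensor_scatter_nd_add(vlad, indices, updates))
# Row 1 receives two accumulated updates, row 3 one; rows 0 and 2 stay zero.
```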
@@ -12,8 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # ==============================================================================
-"""DELF feature extractor.
-"""
+"""DELF feature extractor."""

 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
@@ -39,8 +38,8 @@ def NormalizePixelValues(image,
   Returns:
     image: a float32 tensor of the same shape as the input image.
   """
-  image = tf.to_float(image)
-  image = tf.div(tf.subtract(image, pixel_value_offset), pixel_value_scale)
+  image = tf.cast(image, dtype=tf.float32)
+  image = tf.truediv(tf.subtract(image, pixel_value_offset), pixel_value_scale)
   return image
@@ -53,6 +52,7 @@ def CalculateReceptiveBoxes(height, width, rf, stride, padding):
     rf: The receptive field size.
     stride: The effective stride between two adjacent feature points.
     padding: The effective padding size.
+
   Returns:
     rf_boxes: [N, 4] receptive boxes tensor. Here N equals to height x width.
       Each box is represented by [ymin, xmin, ymax, xmax].
@@ -60,7 +60,8 @@ def CalculateReceptiveBoxes(height, width, rf, stride, padding):
   x, y = tf.meshgrid(tf.range(width), tf.range(height))
   coordinates = tf.reshape(tf.stack([y, x], axis=2), [-1, 2])
   # [y,x,y,x]
-  point_boxes = tf.to_float(tf.concat([coordinates, coordinates], 1))
+  point_boxes = tf.cast(
+      tf.concat([coordinates, coordinates], 1), dtype=tf.float32)
   bias = [-padding, -padding, -padding + rf - 1, -padding + rf - 1]
   rf_boxes = stride * point_boxes + bias
   return rf_boxes
@@ -94,12 +95,10 @@ def ExtractKeypointDescriptor(image, layer_name, image_scales, iou,
     abs_thres: A float tensor denoting the score threshold for feature
       selection.
     model_fn: Model function. Follows the signature:
-
       * Args:
         * `images`: Image tensor which is re-scaled.
        * `normalized_image`: Whether or not the images are normalized.
        * `reuse`: Whether or not the layer and its variables should be reused.
-
       * Returns:
        * `attention`: Attention score after the non-linearity.
        * `feature_map`: Feature map obtained from the ResNet model.
@@ -117,7 +116,8 @@ def ExtractKeypointDescriptor(image, layer_name, image_scales, iou,
   Raises:
     ValueError: If the layer_name is unsupported.
   """
-  original_image_shape_float = tf.gather(tf.to_float(tf.shape(image)), [0, 1])
+  original_image_shape_float = tf.gather(
+      tf.cast(tf.shape(image), dtype=tf.float32), [0, 1])
   image_tensor = NormalizePixelValues(image)
   image_tensor = tf.expand_dims(image_tensor, 0, name='image/expand_dims')
@@ -163,8 +163,10 @@ def ExtractKeypointDescriptor(image, layer_name, image_scales, iou,
       scores: Concatenated attention score tensor with the shape of [K].
     """
     scale = tf.gather(image_scales, scale_index)
-    new_image_size = tf.to_int32(tf.round(original_image_shape_float * scale))
-    resized_image = tf.image.resize_bilinear(image_tensor, new_image_size)
+    new_image_size = tf.cast(
+        tf.round(original_image_shape_float * scale), dtype=tf.int32)
+    resized_image = tf.compat.v1.image.resize_bilinear(image_tensor,
+                                                       new_image_size)
     attention, feature_map = model_fn(
         resized_image, normalized_image=True, reuse=reuse)
@@ -254,7 +256,7 @@ def BuildModel(layer_name, attention_nonlinear, attention_type,
       Currently, only 'softplus' is supported.
     attention_type: Type of the attention used. Options are:
       'use_l2_normalized_feature' and 'use_default_input_feature'. Note that
-      this is irrelevant during inference time.
+        this is irrelevant during inference time.
     attention_kernel_size: Size of attention kernel (kernel is square).

   Returns:
@@ -268,6 +270,7 @@ def BuildModel(layer_name, attention_nonlinear, attention_type,
       images: Image tensor.
       normalized_image: Whether or not the images are normalized.
       reuse: Whether or not the layer and its variables should be reused.
+
     Returns:
       attention: Attention score after the non-linearity.
      feature_map: Feature map after ResNet convolution.
@@ -328,57 +331,72 @@ def ApplyPcaAndWhitening(data,
   return output


-def DelfFeaturePostProcessing(boxes, descriptors, config):
-  """Extract DELF features from input image.
+def PostProcessDescriptors(descriptors, use_pca, pca_parameters):
+  """Post-process descriptors.

   Args:
-    boxes: [N, 4] float tensor which denotes the selected receptive box. N is
-      the number of final feature points which pass through keypoint selection
-      and NMS steps.
     descriptors: [N, input_dim] float tensor.
-    config: DelfConfig proto with DELF extraction options.
+    use_pca: Whether to use PCA.
+    pca_parameters: DelfPcaParameters proto.

   Returns:
-    locations: [N, 2] float tensor which denotes the selected keypoint
-      locations.
-    final_descriptors: [N, output_dim] float tensor with DELF descriptors after
+    final_descriptors: [N, output_dim] float tensor with descriptors after
       normalization and (possibly) PCA/whitening.
   """
-  # Get center of descriptor boxes, corresponding to feature locations.
-  locations = CalculateKeypointCenters(boxes)
-
-  # Post-process descriptors: L2-normalize, and if desired apply PCA (followed
-  # by L2-normalization).
-  with tf.variable_scope('postprocess'):
+  # L2-normalize, and if desired apply PCA (followed by L2-normalization).
+  with tf.compat.v1.variable_scope('postprocess'):
     final_descriptors = tf.nn.l2_normalize(
-        descriptors, dim=1, name='l2_normalization')
+        descriptors, axis=1, name='l2_normalization')

-    if config.delf_local_config.use_pca:
+    if use_pca:
       # Load PCA parameters.
       pca_mean = tf.constant(
-          datum_io.ReadFromFile(
-              config.delf_local_config.pca_parameters.mean_path),
-          dtype=tf.float32)
+          datum_io.ReadFromFile(pca_parameters.mean_path), dtype=tf.float32)
       pca_matrix = tf.constant(
-          datum_io.ReadFromFile(
-              config.delf_local_config.pca_parameters.projection_matrix_path),
+          datum_io.ReadFromFile(pca_parameters.projection_matrix_path),
           dtype=tf.float32)
-      pca_dim = config.delf_local_config.pca_parameters.pca_dim
+      pca_dim = pca_parameters.pca_dim
       pca_variances = None
-      if config.delf_local_config.pca_parameters.use_whitening:
-        pca_variances = tf.constant(
-            datum_io.ReadFromFile(
-                config.delf_local_config.pca_parameters.pca_variances_path),
-            dtype=tf.float32)
+      if pca_parameters.use_whitening:
+        pca_variances = tf.squeeze(
+            tf.constant(
+                datum_io.ReadFromFile(pca_parameters.pca_variances_path),
+                dtype=tf.float32))

       # Apply PCA, and whitening if desired.
-      final_descriptors = ApplyPcaAndWhitening(
-          final_descriptors, pca_matrix, pca_mean, pca_dim,
-          config.delf_local_config.pca_parameters.use_whitening, pca_variances)
+      final_descriptors = ApplyPcaAndWhitening(final_descriptors, pca_matrix,
+                                               pca_mean, pca_dim,
+                                               pca_parameters.use_whitening,
+                                               pca_variances)

       # Re-normalize.
       final_descriptors = tf.nn.l2_normalize(
-          final_descriptors, dim=1, name='pca_l2_normalization')
+          final_descriptors, axis=1, name='pca_l2_normalization')

+  return final_descriptors
+
+
+def DelfFeaturePostProcessing(boxes, descriptors, config):
+  """Extract DELF features from input image.
+
+  Args:
+    boxes: [N, 4] float tensor which denotes the selected receptive box. N is
+      the number of final feature points which pass through keypoint selection
+      and NMS steps.
+    descriptors: [N, input_dim] float tensor.
+    config: DelfConfig proto with DELF extraction options.
+
+  Returns:
+    locations: [N, 2] float tensor which denotes the selected keypoint
+      locations.
+    final_descriptors: [N, output_dim] float tensor with DELF descriptors after
+      normalization and (possibly) PCA/whitening.
+  """
+  # Get center of descriptor boxes, corresponding to feature locations.
+  locations = CalculateKeypointCenters(boxes)
+
+  final_descriptors = PostProcessDescriptors(
+      descriptors, config.delf_local_config.use_pca,
+      config.delf_local_config.pca_parameters)

   return locations, final_descriptors
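With `PostProcessDescriptors` factored out, descriptor normalization can be reused without a full `DelfConfig`. A minimal sketch with PCA disabled; passing `None` for the unused `pca_parameters` is our own shortcut here (real callers pass `config.delf_local_config.pca_parameters`), and the sketch builds a graph since the function uses compat.v1 scopes:

```python
import tensorflow as tf
from delf import feature_extractor

graph = tf.Graph()
with graph.as_default():
  descriptors = tf.random.uniform([10, 128])
  # With use_pca=False, only L2-normalization is applied and pca_parameters
  # is never read.
  final_descriptors = feature_extractor.PostProcessDescriptors(
      descriptors, use_pca=False, pca_parameters=None)
with tf.compat.v1.Session(graph=graph) as sess:
  print(sess.run(final_descriptors).shape)  # (10, 128)
```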
@@ -34,7 +34,7 @@ class FeatureExtractorTest(tf.test.TestCase):
         image, pixel_value_offset=5.0, pixel_value_scale=2.0)
     exp_normalized_image = [[[-1.0, 125.0, -2.5], [14.5, 3.5, 0.0]],
                             [[20.0, 0.0, 30.0], [25.5, 36.0, 42.0]]]
-    with self.test_session() as sess:
+    with self.session() as sess:
       normalized_image_out = sess.run(normalized_image)
     self.assertAllEqual(normalized_image_out, exp_normalized_image)
@@ -43,7 +43,7 @@ class FeatureExtractorTest(tf.test.TestCase):
     boxes = feature_extractor.CalculateReceptiveBoxes(
         height=1, width=2, rf=291, stride=32, padding=145)
     exp_boxes = [[-145., -145., 145., 145.], [-145., -113., 145., 177.]]
-    with self.test_session() as sess:
+    with self.session() as sess:
       boxes_out = sess.run(boxes)
     self.assertAllEqual(exp_boxes, boxes_out)
@@ -52,7 +52,7 @@ class FeatureExtractorTest(tf.test.TestCase):
     boxes = [[-10.0, 0.0, 11.0, 21.0], [-2.5, 5.0, 18.5, 26.0],
              [45.0, -2.5, 66.0, 18.5]]
     centers = feature_extractor.CalculateKeypointCenters(boxes)
-    with self.test_session() as sess:
+    with self.session() as sess:
       centers_out = sess.run(centers)
     exp_centers = [[0.5, 10.5], [8.0, 15.5], [55.5, 8.0]]
@@ -72,12 +72,11 @@ class FeatureExtractorTest(tf.test.TestCase):
       del normalized_image, reuse  # Unused variables in the test.
       image_shape = tf.shape(image)
       attention = tf.squeeze(tf.norm(image, axis=3))
-      feature_map = tf.concat(
-          [
-              tf.tile(image, [1, 1, 1, 341]),
-              tf.zeros([1, image_shape[1], image_shape[2], 1])
-          ],
-          axis=3)
+      feature_map = tf.concat([
+          tf.tile(image, [1, 1, 1, 341]),
+          tf.zeros([1, image_shape[1], image_shape[2], 1])
+      ],
+                              axis=3)
       return attention, feature_map

     boxes, feature_scales, features, scores = (
@@ -99,7 +98,7 @@ class FeatureExtractorTest(tf.test.TestCase):
             axis=1))
     exp_scores = [[1.723042], [1.600781]]

-    with self.test_session() as sess:
+    with self.session() as sess:
       boxes_out, feature_scales_out, features_out, scores_out = sess.run(
           [boxes, feature_scales, features, scores])
@@ -118,16 +117,18 @@ class FeatureExtractorTest(tf.test.TestCase):
     use_whitening = True
     pca_variances = tf.constant([4.0, 1.0])

-    output = feature_extractor.ApplyPcaAndWhitening(
-        data, pca_matrix, pca_mean, output_dim, use_whitening, pca_variances)
+    output = feature_extractor.ApplyPcaAndWhitening(data, pca_matrix, pca_mean,
+                                                    output_dim, use_whitening,
+                                                    pca_variances)
     exp_output = [[2.5, -5.0], [-6.0, -2.0], [-0.5, -3.0], [1.0, -2.0]]

-    with self.test_session() as sess:
+    with self.session() as sess:
       output_out = sess.run(output)
     self.assertAllEqual(exp_output, output_out)

 if __name__ == '__main__':
+  tf.compat.v1.disable_eager_execution()
   tf.test.main()
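The test migration above swaps `self.test_session()` for `self.session()` and disables eager execution at startup so the graph-mode tests keep working under TF2. The pattern in isolation (a sketch):

```python
import tensorflow as tf

class GraphModeTest(tf.test.TestCase):

  def testRunsInSession(self):
    total = tf.constant(1.0) + tf.constant(1.0)
    with self.session() as sess:  # Replaces deprecated self.test_session().
      self.assertAllClose(sess.run(total), 2.0)

if __name__ == '__main__':
  tf.compat.v1.disable_eager_execution()
  tf.test.main()
```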
@@ -168,7 +168,7 @@ def ReadFromFile(file_path):
     attention: [N] float array with attention scores.
     orientations: [N] float array with orientations.
   """
-  with tf.gfile.FastGFile(file_path, 'rb') as f:
+  with tf.io.gfile.GFile(file_path, 'rb') as f:
     return ParseFromString(f.read())
@@ -192,5 +192,5 @@ def WriteToFile(file_path,
   """
   serialized_data = SerializeToString(locations, scales, descriptors, attention,
                                       orientations)
-  with tf.gfile.FastGFile(file_path, 'w') as f:
+  with tf.io.gfile.GFile(file_path, 'w') as f:
     f.write(serialized_data)
@@ -81,7 +81,7 @@ class DelfFeaturesIoTest(tf.test.TestCase):
   def testWriteAndReadToFile(self):
     locations, scales, descriptors, attention, orientations = create_data()

-    tmpdir = tf.test.get_temp_dir()
+    tmpdir = tf.compat.v1.test.get_temp_dir()
     filename = os.path.join(tmpdir, 'test.delf')
     feature_io.WriteToFile(filename, locations, scales, descriptors, attention,
                            orientations)
@@ -94,7 +94,7 @@ class DelfFeaturesIoTest(tf.test.TestCase):
     self.assertAllEqual(orientations, data_read[4])

   def testWriteAndReadToFileEmptyFile(self):
-    tmpdir = tf.test.get_temp_dir()
+    tmpdir = tf.compat.v1.test.get_temp_dir()
     filename = os.path.join(tmpdir, 'test.delf')
     feature_io.WriteToFile(filename, np.array([]), np.array([]), np.array([]),
                            np.array([]), np.array([]))
...
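`tf.gfile.FastGFile` no longer exists in TF2; `tf.io.gfile.GFile` is the drop-in replacement used above. A tiny round-trip sketch (the path is illustrative):

```python
import tensorflow as tf

path = '/tmp/example.delf'  # Illustrative path.
with tf.io.gfile.GFile(path, 'wb') as f:
  f.write(b'serialized-data')
with tf.io.gfile.GFile(path, 'rb') as f:
  assert f.read() == b'serialized-data'
```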
@@ -49,7 +49,7 @@ def ReadSolution(file_path, task):
   public_solution = {}
   private_solution = {}
   ignored_ids = []
-  with tf.gfile.GFile(file_path, 'r') as csv_file:
+  with tf.io.gfile.GFile(file_path, 'r') as csv_file:
     reader = csv.reader(csv_file)
     next(reader, None)  # Skip header.
     for row in reader:
@@ -108,7 +108,7 @@ def ReadPredictions(file_path, public_ids, private_ids, ignored_ids, task):
   """
   public_predictions = {}
   private_predictions = {}
-  with tf.gfile.GFile(file_path, 'r') as csv_file:
+  with tf.io.gfile.GFile(file_path, 'r') as csv_file:
     reader = csv.reader(csv_file)
     next(reader, None)  # Skip header.
     for row in reader:
...
@@ -29,8 +29,9 @@ class DatasetFileIoTest(tf.test.TestCase):

   def testReadRecognitionSolutionWorks(self):
     # Define inputs.
-    file_path = os.path.join(tf.test.get_temp_dir(), 'recognition_solution.csv')
-    with tf.gfile.GFile(file_path, 'w') as f:
+    file_path = os.path.join(tf.compat.v1.test.get_temp_dir(),
+                             'recognition_solution.csv')
+    with tf.io.gfile.GFile(file_path, 'w') as f:
       f.write('id,landmarks,Usage\n')
       f.write('0123456789abcdef,0 12,Public\n')
       f.write('0223456789abcdef,,Public\n')
@@ -60,8 +61,9 @@ class DatasetFileIoTest(tf.test.TestCase):

   def testReadRetrievalSolutionWorks(self):
     # Define inputs.
-    file_path = os.path.join(tf.test.get_temp_dir(), 'retrieval_solution.csv')
-    with tf.gfile.GFile(file_path, 'w') as f:
+    file_path = os.path.join(tf.compat.v1.test.get_temp_dir(),
+                             'retrieval_solution.csv')
+    with tf.io.gfile.GFile(file_path, 'w') as f:
       f.write('id,images,Usage\n')
       f.write('0123456789abcdef,None,Ignored\n')
       f.write('0223456789abcdef,fedcba9876543210 fedcba9876543200,Public\n')
@@ -91,9 +93,9 @@ class DatasetFileIoTest(tf.test.TestCase):

   def testReadRecognitionPredictionsWorks(self):
     # Define inputs.
-    file_path = os.path.join(tf.test.get_temp_dir(),
+    file_path = os.path.join(tf.compat.v1.test.get_temp_dir(),
                              'recognition_predictions.csv')
-    with tf.gfile.GFile(file_path, 'w') as f:
+    with tf.io.gfile.GFile(file_path, 'w') as f:
       f.write('id,landmarks\n')
       f.write('0123456789abcdef,12 0.1 \n')
       f.write('0423456789abcdef,0 19.0\n')
@@ -129,9 +131,9 @@ class DatasetFileIoTest(tf.test.TestCase):

   def testReadRetrievalPredictionsWorks(self):
     # Define inputs.
-    file_path = os.path.join(tf.test.get_temp_dir(),
+    file_path = os.path.join(tf.compat.v1.test.get_temp_dir(),
                              'retrieval_predictions.csv')
-    with tf.gfile.GFile(file_path, 'w') as f:
+    with tf.io.gfile.GFile(file_path, 'w') as f:
       f.write('id,images\n')
       f.write('0123456789abcdef,fedcba9876543250 \n')
       f.write('0423456789abcdef,fedcba9876543260\n')
...
# DELF training instructions
## Data preparation
See the
[build_image_dataset.py](https://github.com/andrefaraujo/models/blob/master/research/delf/delf/python/training/build_image_dataset.py)
script to prepare the data: follow the instructions therein to download the
dataset (via Kaggle), then run the script.
## Running training
Assuming the data was downloaded to `/tmp/gld_tfrecord/`, running the following
command should start training a model:
```sh
python tensorflow_models/research/delf/delf/python/training/train.py \
--train_file_pattern=/tmp/gld_tfrecord/train* \
--validation_file_pattern=/tmp/gld_tfrecord/train* \
--debug
```
Note that you may want to split the train TFRecords into separate train and
validation sets (we usually simply split them 80/20 at random); see the sketch
below.
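
One way to do that split at the shard level is sketched below; the
`validation-*` naming is illustrative, not something the scripts produce
themselves:

```python
# Sketch: move ~20% of the train shards to validation-* names.
import random
import tensorflow as tf

shards = sorted(tf.io.gfile.glob('/tmp/gld_tfrecord/train-*'))
random.seed(0)
random.shuffle(shards)
for path in shards[:len(shards) // 5]:
  tf.io.gfile.rename(path, path.replace('train-', 'validation-'))
```

After this, `--validation_file_pattern=/tmp/gld_tfrecord/validation*` can be
passed to the training command above.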
# Copyright 2020 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Module for DELF training."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# pylint: disable=unused-import
from delf.python.training import build_image_dataset
# pylint: enable=unused-import
#!/usr/bin/python
# Copyright 2020 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Converts landmark image data to TFRecords file format with Example protos.
The image data set is expected to reside in JPEG files ending in '.jpg'.
This script assumes you have downloaded the images using the provided script:
https://www.kaggle.com/tobwey/landmark-recognition-challenge-image-downloader
This script converts the training and testing data into
a sharded data set consisting of TFRecord files
train_directory/train-00000-of-00128
train_directory/train-00001-of-00128
...
train_directory/train-00127-of-00128
and
test_directory/test-00000-of-00128
test_directory/test-00001-of-00128
...
test_directory/test-00127-of-00128
where we have selected 128 shards for both data sets. Each record
within the TFRecord file is a serialized Example proto. The Example proto
contains the following fields:
image/encoded: string containing JPEG encoded image in RGB colorspace
image/height: integer, image height in pixels
image/width: integer, image width in pixels
image/colorspace: string, specifying the colorspace, always 'RGB'
image/channels: integer, specifying the number of channels, always 3
image/format: string, specifying the format, always 'JPEG'
image/filename: string, the unique id of the image file
e.g. '97c0a12e07ae8dd5' or '650c989dd3493748'
Furthermore, if the data set type is training, it would contain one more field:
image/class/label: integer, the landmark_id from the input training csv file.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl import app
from absl import flags
import numpy as np
import pandas as pd
import tensorflow as tf
FLAGS = flags.FLAGS
flags.DEFINE_string('train_directory', '/tmp/', 'Training data directory.')
flags.DEFINE_string('test_directory', '/tmp/', 'Testing data directory.')
flags.DEFINE_string('output_directory', '/tmp/', 'Output data directory.')
flags.DEFINE_string('train_csv_path', '/tmp/train.csv',
'Training data csv file path.')
flags.DEFINE_string('test_csv_path', '/tmp/test.csv',
'Testing data csv file path.')
flags.DEFINE_integer('num_shards', 128, 'Number of shards in output data.')
def _get_image_files_and_labels(name, csv_path, image_dir):
"""Process input and get the image file paths, image ids and the labels.
Args:
name: 'train' or 'test'.
csv_path: path to the Google-landmark Dataset csv Data Sources files.
image_dir: directory that stores downloaded images.
Returns:
image_paths: the paths to all images in the image_dir.
file_ids: the unique ids of images.
labels: the landmark ids of all images. When name='test', the returned labels
will be an empty list.
Raises:
ValueError: if input name is not supported.
"""
image_paths = tf.io.gfile.glob(image_dir + '/*.jpg')
file_ids = [os.path.basename(os.path.normpath(f))[:-4] for f in image_paths]
if name == 'train':
with tf.io.gfile.GFile(csv_path, 'rb') as csv_file:
df = pd.read_csv(csv_file)
df = df.set_index('id')
labels = [int(df.loc[fid]['landmark_id']) for fid in file_ids]
elif name == 'test':
labels = []
else:
raise ValueError('Unsupported dataset split name: %s' % name)
return image_paths, file_ids, labels
def _process_image(filename):
"""Process a single image file.
Args:
filename: string, path to an image file e.g., '/path/to/example.jpg'.
Returns:
image_buffer: string, JPEG encoding of RGB image.
height: integer, image height in pixels.
width: integer, image width in pixels.
Raises:
ValueError: if parsed image has wrong number of dimensions or channels.
"""
# Read the image file.
with tf.io.gfile.GFile(filename, 'rb') as f:
image_data = f.read()
# Decode the RGB JPEG.
image = tf.io.decode_jpeg(image_data, channels=3)
  # Check that the image was converted to RGB.
  if len(image.shape) != 3:
    raise ValueError('The parsed image number of dimensions is not 3 but %d' %
                     len(image.shape))
height = image.shape[0]
width = image.shape[1]
if image.shape[2] != 3:
raise ValueError('The parsed image channels is not 3 but %d' %
(image.shape[2]))
return image_data, height, width
def _int64_feature(value):
"""Returns an int64_list from a bool / enum / int / uint."""
return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
def _bytes_feature(value):
"""Returns a bytes_list from a string / byte."""
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def _convert_to_example(file_id, image_buffer, height, width, label=None):
"""Build an Example proto for the given inputs.
Args:
file_id: string, unique id of an image file, e.g., '97c0a12e07ae8dd5'.
image_buffer: string, JPEG encoding of RGB image.
height: integer, image height in pixels.
width: integer, image width in pixels.
label: integer, the landmark id and prediction label.
Returns:
Example proto.
"""
colorspace = 'RGB'
channels = 3
image_format = 'JPEG'
features = {
'image/height': _int64_feature(height),
'image/width': _int64_feature(width),
'image/colorspace': _bytes_feature(colorspace.encode('utf-8')),
'image/channels': _int64_feature(channels),
'image/format': _bytes_feature(image_format.encode('utf-8')),
'image/id': _bytes_feature(file_id.encode('utf-8')),
'image/encoded': _bytes_feature(image_buffer)
}
if label is not None:
features['image/class/label'] = _int64_feature(label)
example = tf.train.Example(features=tf.train.Features(feature=features))
return example
def _write_tfrecord(output_prefix, image_paths, file_ids, labels):
"""Read image files and write image and label data into TFRecord files.
Args:
output_prefix: string, the prefix of output files, e.g. 'train'.
image_paths: list of strings, the paths to images to be converted.
file_ids: list of strings, the image unique ids.
labels: list of integers, the landmark ids of images. It is an empty list
when output_prefix='test'.
Raises:
ValueError: if the length of input images, ids and labels don't match
"""
if output_prefix == 'test':
labels = [None] * len(image_paths)
if not len(image_paths) == len(file_ids) == len(labels):
    raise ValueError('length of image_paths, file_ids, labels should be the' +
' same. But they are %d, %d, %d, respectively' %
(len(image_paths), len(file_ids), len(labels)))
  spacing = np.linspace(0, len(image_paths), FLAGS.num_shards + 1, dtype=int)
for shard in range(FLAGS.num_shards):
output_file = os.path.join(
FLAGS.output_directory,
'%s-%.5d-of-%.5d' % (output_prefix, shard, FLAGS.num_shards))
writer = tf.io.TFRecordWriter(output_file)
print('Processing shard ', shard, ' and writing file ', output_file)
for i in range(spacing[shard], spacing[shard + 1]):
image_buffer, height, width = _process_image(image_paths[i])
example = _convert_to_example(file_ids[i], image_buffer, height, width,
labels[i])
writer.write(example.SerializeToString())
writer.close()
def _build_tfrecord_dataset(name, csv_path, image_dir):
"""Build a TFRecord dataset.
Args:
name: 'train' or 'test' to indicate which set of data to be processed.
csv_path: path to the Google-landmark Dataset csv Data Sources files.
image_dir: directory that stores downloaded images.
Returns:
Nothing. After the function call, sharded TFRecord files are materialized.
"""
image_paths, file_ids, labels = _get_image_files_and_labels(
name, csv_path, image_dir)
_write_tfrecord(name, image_paths, file_ids, labels)
def main(unused_argv):
_build_tfrecord_dataset('train', FLAGS.train_csv_path, FLAGS.train_directory)
_build_tfrecord_dataset('test', FLAGS.test_csv_path, FLAGS.test_directory)
if __name__ == '__main__':
app.run(main)
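A sketch of reading back one record produced by this script, using the Example fields documented in the module docstring (the file path is illustrative; assumes TF2 eager execution):

```python
import tensorflow as tf

dataset = tf.data.TFRecordDataset('/tmp/train-00000-of-00128')
features = {
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    'image/height': tf.io.FixedLenFeature([], tf.int64),
    'image/width': tf.io.FixedLenFeature([], tf.int64),
    # Present only for the training split.
    'image/class/label': tf.io.FixedLenFeature([], tf.int64, default_value=0),
}
for record in dataset.take(1):
  parsed = tf.io.parse_single_example(record, features)
  image = tf.io.decode_jpeg(parsed['image/encoded'], channels=3)
  print(image.shape, parsed['image/class/label'].numpy())
```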
# Copyright 2020 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Module exposing datasets for training."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# pylint: disable=unused-import
from delf.python.training.datasets import googlelandmarks
# pylint: enable=unused-import
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Google Landmarks Dataset(GLD).
Placeholder for Google Landmarks dataset.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import tensorflow as tf
class _DataAugmentationParams(object):
"""Default parameters for augmentation."""
# The following are used for training.
min_object_covered = 0.1
aspect_ratio_range_min = 3. / 4
aspect_ratio_range_max = 4. / 3
area_range_min = 0.08
area_range_max = 1.0
max_attempts = 100
update_labels = False
# 'central_fraction' is used for central crop in inference.
central_fraction = 0.875
random_reflection = False
input_rows = 321
input_cols = 321
def NormalizeImages(images, pixel_value_scale=0.5, pixel_value_offset=0.5):
"""Normalize pixel values in image.
Output is computed as
normalized_images = (images - pixel_value_offset) / pixel_value_scale.
Args:
images: `Tensor`, images to normalize.
pixel_value_scale: float, scale.
pixel_value_offset: float, offset.
Returns:
normalized_images: `Tensor`, normalized images.
"""
images = tf.cast(images, tf.float32)
normalized_images = tf.math.divide(
tf.subtract(images, pixel_value_offset), pixel_value_scale)
return normalized_images
def _ImageNetCrop(image):
"""Imagenet-style crop with random bbox and aspect ratio.
Args:
image: a `Tensor`, image to crop.
Returns:
cropped_image: `Tensor`, cropped image.
"""
params = _DataAugmentationParams()
bbox = tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4])
(bbox_begin, bbox_size, _) = tf.image.sample_distorted_bounding_box(
tf.shape(image),
bounding_boxes=bbox,
min_object_covered=params.min_object_covered,
aspect_ratio_range=(params.aspect_ratio_range_min,
params.aspect_ratio_range_max),
area_range=(params.area_range_min, params.area_range_max),
max_attempts=params.max_attempts,
use_image_if_no_bounding_boxes=True)
cropped_image = tf.slice(image, bbox_begin, bbox_size)
cropped_image.set_shape([None, None, 3])
cropped_image = tf.image.resize(
cropped_image, [params.input_rows, params.input_cols], method='area')
if params.random_reflection:
cropped_image = tf.image.random_flip_left_right(cropped_image)
return cropped_image
def _ParseFunction(example, name_to_features, image_size, augmentation):
"""Parse a single TFExample to get the image and label and process the image.
Args:
example: a `TFExample`.
name_to_features: a `dict`. The mapping from feature names to its type.
image_size: an `int`. The image size for the decoded image, on each side.
augmentation: a `boolean`. True if the image will be augmented.
Returns:
image: a `Tensor`. The processed image.
label: a `Tensor`. The ground-truth label.
"""
parsed_example = tf.io.parse_single_example(example, name_to_features)
# Parse to get image.
image = parsed_example['image/encoded']
image = tf.io.decode_jpeg(image)
if augmentation:
image = _ImageNetCrop(image)
else:
image = tf.image.resize(image, [image_size, image_size])
image.set_shape([image_size, image_size, 3])
# Parse to get label.
label = parsed_example['image/class/label']
return image, label
def CreateDataset(file_pattern,
image_size=321,
batch_size=32,
augmentation=False,
seed=0):
"""Creates a dataset.
Args:
file_pattern: str, file pattern of the dataset files.
image_size: int, image size.
batch_size: int, batch size.
augmentation: bool, whether to apply augmentation.
seed: int, seed for shuffling the dataset.
Returns:
tf.data.TFRecordDataset.
"""
filenames = tf.io.gfile.glob(file_pattern)
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.repeat().shuffle(buffer_size=100, seed=seed)
# Create a description of the features.
feature_description = {
'image/height': tf.io.FixedLenFeature([], tf.int64, default_value=0),
'image/width': tf.io.FixedLenFeature([], tf.int64, default_value=0),
'image/channels': tf.io.FixedLenFeature([], tf.int64, default_value=0),
'image/format': tf.io.FixedLenFeature([], tf.string, default_value=''),
'image/filename': tf.io.FixedLenFeature([], tf.string, default_value=''),
'image/encoded': tf.io.FixedLenFeature([], tf.string, default_value=''),
'image/class/label': tf.io.FixedLenFeature([], tf.int64, default_value=0),
}
customized_parse_func = functools.partial(
_ParseFunction,
name_to_features=feature_description,
image_size=image_size,
augmentation=augmentation)
dataset = dataset.map(customized_parse_func)
dataset = dataset.batch(batch_size)
return dataset
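Putting `CreateDataset` to use (a sketch; the file pattern is illustrative, and since the dataset repeats indefinitely, iteration must be bounded explicitly):

```python
import tensorflow as tf
from delf.python.training.datasets import googlelandmarks

dataset = googlelandmarks.CreateDataset(
    '/tmp/gld_tfrecord/train*', image_size=321, batch_size=32,
    augmentation=False)
images, labels = next(iter(dataset))
print(images.shape, labels.shape)  # (32, 321, 321, 3) (32,)
```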
# Copyright 2020 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""DELF model module, used for training and exporting."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# pylint: disable=unused-import
from delf.python.training.model import delf_model
from delf.python.training.model import export_model_utils
from delf.python.training.model import resnet50
# pylint: enable=unused-import
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""DELF model implementation based on the following paper.
Large-Scale Image Retrieval with Attentive Deep Local Features
https://arxiv.org/abs/1612.06321
"""
import tensorflow as tf
from delf.python.training.model import resnet50 as resnet
layers = tf.keras.layers
reg = tf.keras.regularizers
_DECAY = 0.0001
class AttentionModel(tf.keras.Model):
"""Instantiates attention model.
Uses two [kernel_size x kernel_size] convolutions and softplus as activation
to compute an attention map with the same resolution as the featuremap.
Features are l2-normalized and aggregated using attention probabilities as
weights.
"""
def __init__(self, kernel_size=1, decay=_DECAY, name='attention'):
"""Initialization of attention model.
Args:
kernel_size: int, kernel size of convolutions.
decay: float, decay for l2 regularization of kernel weights.
name: str, name to identify model.
"""
super(AttentionModel, self).__init__(name=name)
# First convolutional layer (called with relu activation).
self.conv1 = layers.Conv2D(
512,
kernel_size,
kernel_regularizer=reg.l2(decay),
padding='same',
name='attn_conv1')
self.bn_conv1 = layers.BatchNormalization(axis=3, name='bn_conv1')
# Second convolutional layer, with softplus activation.
self.conv2 = layers.Conv2D(
1,
kernel_size,
kernel_regularizer=reg.l2(decay),
padding='same',
name='attn_conv2')
self.activation_layer = layers.Activation('softplus')
def call(self, inputs, training=True):
x = self.conv1(inputs)
x = self.bn_conv1(x, training=training)
x = tf.nn.relu(x)
score = self.conv2(x)
prob = self.activation_layer(score)
# L2-normalize the featuremap before pooling.
inputs = tf.nn.l2_normalize(inputs, axis=-1)
feat = tf.reduce_mean(tf.multiply(inputs, prob), [1, 2], keepdims=False)
return feat, prob, score
class Delf(tf.keras.Model):
"""Instantiates Keras DELF model using ResNet50 as backbone.
This class implements the [DELF](https://arxiv.org/abs/1612.06321) model for
extracting local features from images. The backbone is a ResNet50 network
that extracts featuremaps from both conv_4 and conv_5 layers. Activations
from conv_4 are used to compute an attention map of the same resolution.
"""
def __init__(self, block3_strides=True, name='DELF'):
"""Initialization of DELF model.
Args:
block3_strides: bool, whether to add strides to the output of block3.
name: str, name to identify model.
"""
super(Delf, self).__init__(name=name)
# Backbone using Keras ResNet50.
self.backbone = resnet.ResNet50(
'channels_last',
name='backbone',
include_top=False,
pooling='avg',
block3_strides=block3_strides,
average_pooling=False)
# Attention model.
self.attention = AttentionModel(name='attention')
# Define classifiers for training backbone and attention models.
def init_classifiers(self, num_classes):
self.num_classes = num_classes
self.desc_classification = layers.Dense(
num_classes, activation=None, kernel_regularizer=None, name='desc_fc')
self.attn_classification = layers.Dense(
num_classes, activation=None, kernel_regularizer=None, name='att_fc')
# Weights to optimize for descriptor fine tuning.
@property
def desc_trainable_weights(self):
return (self.backbone.trainable_weights +
self.desc_classification.trainable_weights)
# Weights to optimize for attention model training.
@property
def attn_trainable_weights(self):
return (self.attention.trainable_weights +
self.attn_classification.trainable_weights)
def call(self, input_image, training=True):
blocks = {'block3': None}
self.backbone(input_image, intermediates_dict=blocks, training=training)
features = blocks['block3']
_, probs, _ = self.attention(features, training=training)
return probs, features
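# A minimal usage sketch of the Delf model (illustrative; batch and image
# sizes are arbitrary assumptions, not requirements):
#
#   model = Delf(block3_strides=True, name='DELF')
#   model.init_classifiers(num_classes=1000)
#   images = tf.random.uniform((2, 321, 321, 3))
#   probs, features = model(images, training=False)
#   # probs: attention probabilities computed over the block3 feature map.
#   # features: the block3 feature map itself.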
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for the DELF model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl.testing import parameterized
import tensorflow as tf
from delf.python.training.model import delf_model
class DelfTest(tf.test.TestCase, parameterized.TestCase):
@parameterized.named_parameters(
('block3_stridesTrue', True),
('block3_stridesFalse', False),
)
def test_build_model(self, block3_strides):
image_size = 321
num_classes = 1000
batch_size = 2
input_shape = (batch_size, image_size, image_size, 3)
model = delf_model.Delf(block3_strides=block3_strides, name='DELF')
model.init_classifiers(num_classes)
images = tf.random.uniform(input_shape, minval=-1.0, maxval=1.0, seed=0)
blocks = {}
# Get global feature by pooling block4 features.
desc_prelogits = model.backbone(
images, intermediates_dict=blocks, training=False)
desc_logits = model.desc_classification(desc_prelogits)
self.assertAllEqual(desc_prelogits.shape, (batch_size, 2048))
self.assertAllEqual(desc_logits.shape, (batch_size, num_classes))
features = blocks['block3']
attn_prelogits, _, _ = model.attention(features)
attn_logits = model.attn_classification(attn_prelogits)
self.assertAllEqual(attn_prelogits.shape, (batch_size, 1024))
self.assertAllEqual(attn_logits.shape, (batch_size, num_classes))
@parameterized.named_parameters(
('block3_stridesTrue', True),
('block3_stridesFalse', False),
)
def test_train_step(self, block3_strides):
image_size = 321
num_classes = 1000
batch_size = 2
clip_val = 10.0
input_shape = (batch_size, image_size, image_size, 3)
model = delf_model.Delf(block3_strides=block3_strides, name='DELF')
model.init_classifiers(num_classes)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
images = tf.random.uniform(input_shape, minval=0.0, maxval=1.0, seed=0)
labels = tf.random.uniform((batch_size,),
minval=0,
maxval=model.num_classes - 1,
dtype=tf.int64)
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
def compute_loss(labels, predictions):
per_example_loss = loss_object(labels, predictions)
return tf.nn.compute_average_loss(
per_example_loss, global_batch_size=batch_size)
with tf.GradientTape() as desc_tape:
blocks = {}
desc_prelogits = model.backbone(
images, intermediates_dict=blocks, training=False)
desc_logits = model.desc_classification(desc_prelogits)
desc_loss = compute_loss(labels, desc_logits)
gradients = desc_tape.gradient(desc_loss, model.desc_trainable_weights)
clipped, _ = tf.clip_by_global_norm(gradients, clip_norm=clip_val)
optimizer.apply_gradients(zip(clipped, model.desc_trainable_weights))
with tf.GradientTape() as attn_tape:
block3 = blocks['block3']
block3 = tf.stop_gradient(block3)
attn_prelogits, _, _ = model.attention(block3, training=True)
attn_logits = model.attn_classification(attn_prelogits)
attn_loss = compute_loss(labels, attn_logits)
gradients = attn_tape.gradient(attn_loss, model.attn_trainable_weights)
clipped, _ = tf.clip_by_global_norm(gradients, clip_norm=clip_val)
optimizer.apply_gradients(zip(clipped, model.attn_trainable_weights))
if __name__ == '__main__':
tf.test.main()
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Export DELF tensorflow inference model.
This model includes feature extraction, receptive field calculation and
keypoint selection, and outputs the selected feature descriptors.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from absl import app
from absl import flags
import tensorflow as tf
from delf.python.training.model import delf_model
from delf.python.training.model import export_model_utils
FLAGS = flags.FLAGS
flags.DEFINE_string('ckpt_path', '/tmp/delf-logdir/delf-weights',
'Path to saved checkpoint.')
flags.DEFINE_string('export_path', None, 'Path where model will be exported.')
flags.DEFINE_boolean('block3_strides', False,
'Whether to apply strides after block3.')
flags.DEFINE_float('iou', 1.0, 'IOU for non-max suppression.')
def _build_tensor_info(tensor_dict):
"""Replace the dict's value by the tensor info.
Args:
tensor_dict: A dictionary contains <string, tensor>.
Returns:
dict: New dictionary contains <string, tensor_info>.
"""
return {
k: tf.compat.v1.saved_model.utils.build_tensor_info(t)
for k, t in tensor_dict.items()
}
def main(argv):
if len(argv) > 1:
raise app.UsageError('Too many command-line arguments.')
export_path = FLAGS.export_path
if os.path.exists(export_path):
    raise ValueError('export_path already exists.')
with tf.Graph().as_default() as g, tf.compat.v1.Session(graph=g) as sess:
# Setup the DELF model for extraction.
model = delf_model.Delf(block3_strides=FLAGS.block3_strides, name='DELF')
# Initial forward pass to build model.
images = tf.zeros((1, 321, 321, 3), dtype=tf.float32)
model(images)
stride_factor = 2.0 if FLAGS.block3_strides else 1.0
# Setup the multiscale keypoint extraction.
input_image = tf.compat.v1.placeholder(
tf.uint8, shape=(None, None, 3), name='input_image')
input_abs_thres = tf.compat.v1.placeholder(
tf.float32, shape=(), name='input_abs_thres')
input_scales = tf.compat.v1.placeholder(
tf.float32, shape=[None], name='input_scales')
input_max_feature_num = tf.compat.v1.placeholder(
tf.int32, shape=(), name='input_max_feature_num')
extracted_features = export_model_utils.ExtractLocalFeatures(
input_image, input_scales, input_max_feature_num, input_abs_thres,
FLAGS.iou, lambda x: model(x, training=False), stride_factor)
# Load the weights.
checkpoint_path = FLAGS.ckpt_path
model.load_weights(checkpoint_path)
    print('Checkpoint loaded from', checkpoint_path)
named_input_tensors = {
'input_image': input_image,
'input_scales': input_scales,
'input_abs_thres': input_abs_thres,
'input_max_feature_num': input_max_feature_num,
}
# Outputs to the exported model.
named_output_tensors = {}
named_output_tensors['boxes'] = tf.identity(
extracted_features[0], name='boxes')
named_output_tensors['features'] = tf.identity(
extracted_features[1], name='features')
named_output_tensors['scales'] = tf.identity(
extracted_features[2], name='scales')
named_output_tensors['scores'] = tf.identity(
extracted_features[3], name='scores')
# Export the model.
signature_def = tf.compat.v1.saved_model.signature_def_utils.build_signature_def(
inputs=_build_tensor_info(named_input_tensors),
outputs=_build_tensor_info(named_output_tensors))
print('Exporting trained model to:', export_path)
builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(export_path)
init_op = None
builder.add_meta_graph_and_variables(
sess, [tf.compat.v1.saved_model.tag_constants.SERVING],
signature_def_map={
tf.compat.v1.saved_model.signature_constants
.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
signature_def
},
main_op=init_op)
builder.save()
if __name__ == '__main__':
app.run(main)
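# A hedged sketch of consuming the exported model (not part of this script;
# '/tmp/delf_export' and `image_np` are hypothetical). Tensor names follow
# the placeholders and identities defined above:
#
#   with tf.compat.v1.Session(graph=tf.Graph()) as sess:
#     tf.compat.v1.saved_model.loader.load(
#         sess, [tf.compat.v1.saved_model.tag_constants.SERVING],
#         '/tmp/delf_export')
#     boxes, features, scales, scores = sess.run(
#         ['boxes:0', 'features:0', 'scales:0', 'scores:0'],
#         feed_dict={
#             'input_image:0': image_np,  # uint8 numpy array, [h, w, 3].
#             'input_scales:0': [0.5, 1.0, 2.0],
#             'input_abs_thres:0': 100.0,
#             'input_max_feature_num:0': 1000,
#         })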
# Lint as: python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Helper functions for DELF model exporting."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from delf import feature_extractor
from delf.python.training.datasets import googlelandmarks as gld
from object_detection.core import box_list
from object_detection.core import box_list_ops
def ExtractLocalFeatures(image, image_scales, max_feature_num, abs_thres, iou,
attention_model_fn, stride_factor):
"""Extract local features for input image.
Args:
image: image tensor of type tf.uint8 with shape [h, w, channels].
image_scales: 1D float tensor which contains float scales used for image
pyramid construction.
    max_feature_num: int tensor denoting the maximum number of selected
      feature points.
    abs_thres: float tensor denoting the score threshold for feature selection.
    iou: float scalar denoting the IoU threshold for NMS.
attention_model_fn: model function. Follows the signature:
* Args:
* `images`: Image tensor which is re-scaled.
* Returns:
* `attention_prob`: attention map after the non-linearity.
* `feature_map`: feature map after ResNet convolution.
stride_factor: integer accounting for striding after block3.
Returns:
    boxes: [N, 4] float tensor denoting the selected receptive field boxes.
      N is the number of final feature points which pass the keypoint
      selection and NMS steps.
features: [N, depth] float tensor.
feature_scales: [N] float tensor. It is the inverse of the input image
scales such that larger image scales correspond to larger image regions,
which is compatible with keypoints detected with other techniques, for
example Congas.
    scores: [N, 1] float tensor denoting the attention scores.
"""
original_image_shape_float = tf.gather(
tf.dtypes.cast(tf.shape(image), tf.float32), [0, 1])
image_tensor = gld.NormalizeImages(
image, pixel_value_offset=128.0, pixel_value_scale=128.0)
image_tensor = tf.expand_dims(image_tensor, 0, name='image/expand_dims')
# Hard code the feature depth and receptive field parameters for now.
rf, stride, padding = [291.0, 16.0 * stride_factor, 145.0]
feature_depth = 1024
def _ProcessSingleScale(scale_index, boxes, features, scales, scores):
"""Resizes the image and run feature extraction and keypoint selection.
This function will be passed into tf.while_loop() and be called
repeatedly. The input boxes are collected from the previous iteration
[0: scale_index -1]. We get the current scale by
image_scales[scale_index], and run resize image, feature extraction and
keypoint selection. Then we will get a new set of selected_boxes for
current scale. In the end, we concat the previous boxes with current
selected_boxes as the output.
Args:
scale_index: A valid index in the image_scales.
boxes: Box tensor with the shape of [N, 4].
features: Feature tensor with the shape of [N, depth].
scales: Scale tensor with the shape of [N].
scores: Attention score tensor with the shape of [N].
Returns:
scale_index: The next scale index for processing.
boxes: Concatenated box tensor with the shape of [K, 4]. K >= N.
features: Concatenated feature tensor with the shape of [K, depth].
scales: Concatenated scale tensor with the shape of [K].
scores: Concatenated score tensor with the shape of [K].
"""
scale = tf.gather(image_scales, scale_index)
new_image_size = tf.dtypes.cast(
tf.round(original_image_shape_float * scale), tf.int32)
resized_image = tf.image.resize(image_tensor, new_image_size)
attention_prob, feature_map = attention_model_fn(resized_image)
attention_prob = tf.squeeze(attention_prob, axis=[0])
feature_map = tf.squeeze(feature_map, axis=[0])
rf_boxes = feature_extractor.CalculateReceptiveBoxes(
tf.shape(feature_map)[0],
tf.shape(feature_map)[1], rf, stride, padding)
# Re-project back to the original image space.
rf_boxes = tf.divide(rf_boxes, scale)
attention_prob = tf.reshape(attention_prob, [-1])
feature_map = tf.reshape(feature_map, [-1, feature_depth])
# Use attention score to select feature vectors.
indices = tf.reshape(tf.where(attention_prob >= abs_thres), [-1])
selected_boxes = tf.gather(rf_boxes, indices)
selected_features = tf.gather(feature_map, indices)
selected_scores = tf.gather(attention_prob, indices)
selected_scales = tf.ones_like(selected_scores, tf.float32) / scale
# Concat with the previous result from different scales.
boxes = tf.concat([boxes, selected_boxes], 0)
features = tf.concat([features, selected_features], 0)
scales = tf.concat([scales, selected_scales], 0)
scores = tf.concat([scores, selected_scores], 0)
return scale_index + 1, boxes, features, scales, scores
output_boxes = tf.zeros([0, 4], dtype=tf.float32)
output_features = tf.zeros([0, feature_depth], dtype=tf.float32)
output_scales = tf.zeros([0], dtype=tf.float32)
output_scores = tf.zeros([0], dtype=tf.float32)
  # Process the first scale separately; the following scales reuse the graph
  # variables.
(_, output_boxes, output_features, output_scales,
output_scores) = _ProcessSingleScale(0, output_boxes, output_features,
output_scales, output_scores)
i = tf.constant(1, dtype=tf.int32)
num_scales = tf.shape(image_scales)[0]
keep_going = lambda j, b, f, scales, scores: tf.less(j, num_scales)
(_, output_boxes, output_features, output_scales,
output_scores) = tf.while_loop(
cond=keep_going,
body=_ProcessSingleScale,
loop_vars=[
i, output_boxes, output_features, output_scales, output_scores
],
shape_invariants=[
i.get_shape(),
tf.TensorShape([None, 4]),
tf.TensorShape([None, feature_depth]),
tf.TensorShape([None]),
tf.TensorShape([None])
],
back_prop=False)
feature_boxes = box_list.BoxList(output_boxes)
feature_boxes.add_field('features', output_features)
feature_boxes.add_field('scales', output_scales)
feature_boxes.add_field('scores', output_scores)
nms_max_boxes = tf.minimum(max_feature_num, feature_boxes.num_boxes())
final_boxes = box_list_ops.non_max_suppression(feature_boxes, iou,
nms_max_boxes)
return final_boxes.get(), final_boxes.get_field(
'features'), final_boxes.get_field('scales'), tf.expand_dims(
final_boxes.get_field('scores'), 1)
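# A hedged sketch of the receptive-box arithmetic behind the hard-coded
# rf/stride/padding values above (assumed to match
# feature_extractor.CalculateReceptiveBoxes; `_receptive_box` is a
# hypothetical helper, not part of the DELF API):
#
#   def _receptive_box(y, x, rf=291.0, stride=16.0, padding=145.0):
#     """Returns [ymin, xmin, ymax, xmax] for the feature at grid (y, x)."""
#     return [y * stride - padding, x * stride - padding,
#             y * stride - padding + rf - 1, x * stride - padding + rf - 1]
#
#   # _receptive_box(0, 0) == [-145.0, -145.0, 145.0, 145.0]: a 291-pixel
#   # box centered near the resized-image origin. ExtractLocalFeatures then
#   # divides by `scale` to re-project boxes to original image coordinates,
#   # so a larger image scale yields a smaller original-image region, which
#   # is why selected_scales stores the inverse scale 1 / scale.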
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""ResNet50 backbone used in DELF model.
Copied over from tensorflow/python/eager/benchmarks/resnet50/resnet50.py,
because that code does not support dependencies.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import tensorflow as tf
layers = tf.keras.layers
class _IdentityBlock(tf.keras.Model):
"""_IdentityBlock is the block that has no conv layer at shortcut.
Args:
kernel_size: the kernel size of middle conv layer at main path
filters: list of integers, the filters of 3 conv layer at main path
stage: integer, current stage label, used for generating layer names
block: 'a','b'..., current block label, used for generating layer names
data_format: data_format for the input ('channels_first' or
'channels_last').
"""
def __init__(self, kernel_size, filters, stage, block, data_format):
super(_IdentityBlock, self).__init__(name='')
filters1, filters2, filters3 = filters
conv_name_base = 'res' + str(stage) + block + '_branch'
bn_name_base = 'bn' + str(stage) + block + '_branch'
bn_axis = 1 if data_format == 'channels_first' else 3
self.conv2a = layers.Conv2D(
filters1, (1, 1), name=conv_name_base + '2a', data_format=data_format)
self.bn2a = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '2a')
self.conv2b = layers.Conv2D(
filters2,
kernel_size,
padding='same',
data_format=data_format,
name=conv_name_base + '2b')
self.bn2b = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '2b')
self.conv2c = layers.Conv2D(
filters3, (1, 1), name=conv_name_base + '2c', data_format=data_format)
self.bn2c = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '2c')
def call(self, input_tensor, training=False):
x = self.conv2a(input_tensor)
x = self.bn2a(x, training=training)
x = tf.nn.relu(x)
x = self.conv2b(x)
x = self.bn2b(x, training=training)
x = tf.nn.relu(x)
x = self.conv2c(x)
x = self.bn2c(x, training=training)
x += input_tensor
return tf.nn.relu(x)
class _ConvBlock(tf.keras.Model):
"""_ConvBlock is the block that has a conv layer at shortcut.
Args:
kernel_size: the kernel size of middle conv layer at main path
filters: list of integers, the filters of 3 conv layer at main path
stage: integer, current stage label, used for generating layer names
block: 'a','b'..., current block label, used for generating layer names
data_format: data_format for the input ('channels_first' or
'channels_last').
strides: strides for the convolution. Note that from stage 3, the first
conv layer at main path is with strides=(2,2), and the shortcut should
have strides=(2,2) as well.
"""
def __init__(self,
kernel_size,
filters,
stage,
block,
data_format,
strides=(2, 2)):
super(_ConvBlock, self).__init__(name='')
filters1, filters2, filters3 = filters
conv_name_base = 'res' + str(stage) + block + '_branch'
bn_name_base = 'bn' + str(stage) + block + '_branch'
bn_axis = 1 if data_format == 'channels_first' else 3
self.conv2a = layers.Conv2D(
filters1, (1, 1),
strides=strides,
name=conv_name_base + '2a',
data_format=data_format)
self.bn2a = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '2a')
self.conv2b = layers.Conv2D(
filters2,
kernel_size,
padding='same',
name=conv_name_base + '2b',
data_format=data_format)
self.bn2b = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '2b')
self.conv2c = layers.Conv2D(
filters3, (1, 1), name=conv_name_base + '2c', data_format=data_format)
self.bn2c = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '2c')
self.conv_shortcut = layers.Conv2D(
filters3, (1, 1),
strides=strides,
name=conv_name_base + '1',
data_format=data_format)
self.bn_shortcut = layers.BatchNormalization(
axis=bn_axis, name=bn_name_base + '1')
def call(self, input_tensor, training=False):
x = self.conv2a(input_tensor)
x = self.bn2a(x, training=training)
x = tf.nn.relu(x)
x = self.conv2b(x)
x = self.bn2b(x, training=training)
x = tf.nn.relu(x)
x = self.conv2c(x)
x = self.bn2c(x, training=training)
shortcut = self.conv_shortcut(input_tensor)
shortcut = self.bn_shortcut(shortcut, training=training)
x += shortcut
return tf.nn.relu(x)
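# A small shape sketch for these blocks (illustrative assumptions only):
#
#   conv_block = _ConvBlock(3, [64, 64, 256], stage=2, block='a',
#                           data_format='channels_last', strides=(1, 1))
#   y = conv_block(tf.zeros((1, 56, 56, 64)), training=False)
#   # y: [1, 56, 56, 256]; with the default strides=(2, 2) the spatial
#   # resolution would instead be halved to [1, 28, 28, 256].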
# pylint: disable=not-callable
class ResNet50(tf.keras.Model):
"""Instantiates the ResNet50 architecture.
Args:
data_format: format for the image. Either 'channels_first' or
'channels_last'. 'channels_first' is typically faster on GPUs while
'channels_last' is typically faster on CPUs. See
https://www.tensorflow.org/performance/performance_guide#data_formats
name: Prefix applied to names of variables created in the model.
include_top: whether to include the fully-connected layer at the top of the
network.
pooling: Optional pooling mode for feature extraction when `include_top` is
False. 'None' means that the output of the model will be the 4D tensor
output of the last convolutional layer. 'avg' means that global average
pooling will be applied to the output of the last convolutional layer, and
thus the output of the model will be a 2D tensor. 'max' means that global
max pooling will be applied.
block3_strides: whether to add a stride of 2 to block3 to make it compatible
      with the tf.slim ResNet implementation.
average_pooling: whether to do average pooling of block4 features before
global pooling.
classes: optional number of classes to classify images into, only to be
specified if `include_top` is True.
Raises:
ValueError: in case of invalid argument for data_format.
"""
def __init__(self,
data_format,
name='',
include_top=True,
pooling=None,
block3_strides=False,
average_pooling=True,
classes=1000):
super(ResNet50, self).__init__(name=name)
valid_channel_values = ('channels_first', 'channels_last')
if data_format not in valid_channel_values:
raise ValueError('Unknown data_format: %s. Valid values: %s' %
(data_format, valid_channel_values))
self.include_top = include_top
self.block3_strides = block3_strides
self.average_pooling = average_pooling
self.pooling = pooling
def conv_block(filters, stage, block, strides=(2, 2)):
return _ConvBlock(
3,
filters,
stage=stage,
block=block,
data_format=data_format,
strides=strides)
def id_block(filters, stage, block):
return _IdentityBlock(
3, filters, stage=stage, block=block, data_format=data_format)
self.conv1 = layers.Conv2D(
64, (7, 7),
strides=(2, 2),
data_format=data_format,
padding='same',
name='conv1')
bn_axis = 1 if data_format == 'channels_first' else 3
self.bn_conv1 = layers.BatchNormalization(axis=bn_axis, name='bn_conv1')
self.max_pool = layers.MaxPooling2D((3, 3),
strides=(2, 2),
data_format=data_format)
self.l2a = conv_block([64, 64, 256], stage=2, block='a', strides=(1, 1))
self.l2b = id_block([64, 64, 256], stage=2, block='b')
self.l2c = id_block([64, 64, 256], stage=2, block='c')
self.l3a = conv_block([128, 128, 512], stage=3, block='a')
self.l3b = id_block([128, 128, 512], stage=3, block='b')
self.l3c = id_block([128, 128, 512], stage=3, block='c')
self.l3d = id_block([128, 128, 512], stage=3, block='d')
self.l4a = conv_block([256, 256, 1024], stage=4, block='a')
self.l4b = id_block([256, 256, 1024], stage=4, block='b')
self.l4c = id_block([256, 256, 1024], stage=4, block='c')
self.l4d = id_block([256, 256, 1024], stage=4, block='d')
self.l4e = id_block([256, 256, 1024], stage=4, block='e')
self.l4f = id_block([256, 256, 1024], stage=4, block='f')
# Striding layer that can be used on top of block3 to produce feature maps
# with the same resolution as the TF-Slim implementation.
if self.block3_strides:
self.subsampling_layer = layers.MaxPooling2D((1, 1),
strides=(2, 2),
data_format=data_format)
self.l5a = conv_block([512, 512, 2048],
stage=5,
block='a',
strides=(1, 1))
else:
self.l5a = conv_block([512, 512, 2048], stage=5, block='a')
self.l5b = id_block([512, 512, 2048], stage=5, block='b')
self.l5c = id_block([512, 512, 2048], stage=5, block='c')
self.avg_pool = layers.AveragePooling2D((7, 7),
strides=(7, 7),
data_format=data_format)
if self.include_top:
self.flatten = layers.Flatten()
self.fc1000 = layers.Dense(classes, name='fc1000')
else:
reduction_indices = [1, 2] if data_format == 'channels_last' else [2, 3]
reduction_indices = tf.constant(reduction_indices)
if pooling == 'avg':
self.global_pooling = functools.partial(
tf.reduce_mean, axis=reduction_indices, keepdims=False)
elif pooling == 'max':
self.global_pooling = functools.partial(
tf.reduce_max, axis=reduction_indices, keepdims=False)
else:
self.global_pooling = None
def call(self, inputs, training=True, intermediates_dict=None):
"""Call the ResNet50 model.
Args:
inputs: Images to compute features for.
      training: Whether the model is in the training phase.
      intermediates_dict: `None` or dictionary. If not None, accumulate feature
        maps from intermediate blocks into the dictionary.
Returns:
Tensor with featuremap.
"""
x = self.conv1(inputs)
x = self.bn_conv1(x, training=training)
x = tf.nn.relu(x)
if intermediates_dict is not None:
intermediates_dict['block0'] = x
x = self.max_pool(x)
if intermediates_dict is not None:
intermediates_dict['block0mp'] = x
# Block 1 (equivalent to "conv2" in Resnet paper).
x = self.l2a(x, training=training)
x = self.l2b(x, training=training)
x = self.l2c(x, training=training)
if intermediates_dict is not None:
intermediates_dict['block1'] = x
# Block 2 (equivalent to "conv3" in Resnet paper).
x = self.l3a(x, training=training)
x = self.l3b(x, training=training)
x = self.l3c(x, training=training)
x = self.l3d(x, training=training)
if intermediates_dict is not None:
intermediates_dict['block2'] = x
# Block 3 (equivalent to "conv4" in Resnet paper).
x = self.l4a(x, training=training)
x = self.l4b(x, training=training)
x = self.l4c(x, training=training)
x = self.l4d(x, training=training)
x = self.l4e(x, training=training)
x = self.l4f(x, training=training)
if self.block3_strides:
x = self.subsampling_layer(x)
if intermediates_dict is not None:
intermediates_dict['block3'] = x
else:
if intermediates_dict is not None:
intermediates_dict['block3'] = x
x = self.l5a(x, training=training)
x = self.l5b(x, training=training)
x = self.l5c(x, training=training)
if self.average_pooling:
x = self.avg_pool(x)
if intermediates_dict is not None:
intermediates_dict['block4'] = x
else:
if intermediates_dict is not None:
intermediates_dict['block4'] = x
if self.include_top:
return self.fc1000(self.flatten(x))
elif self.global_pooling:
return self.global_pooling(x)
else:
return x
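# A minimal usage sketch of the intermediates_dict mechanism (illustrative;
# the input size is an arbitrary assumption):
#
#   model = ResNet50('channels_last', include_top=False, pooling='avg',
#                    block3_strides=True, average_pooling=False)
#   blocks = {}
#   global_feature = model(tf.zeros((1, 321, 321, 3)), training=False,
#                          intermediates_dict=blocks)
#   # blocks now maps 'block0', 'block0mp', 'block1', ..., 'block4' to the
#   # corresponding feature maps; DELF feeds blocks['block3'] to its
#   # attention model.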