diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md deleted file mode 100644 index 098f2c8f3c836a998d78f7607787b7476108a57e..0000000000000000000000000000000000000000 --- a/.github/ISSUE_TEMPLATE.md +++ /dev/null @@ -1 +0,0 @@ -## Please let us know which model this issue is about (specify the top-level directory) diff --git a/ISSUE_TEMPLATE.md b/ISSUE_TEMPLATE.md new file mode 100644 index 0000000000000000000000000000000000000000..4da144cdd9a2b61aa9a136faa639554e12f89de5 --- /dev/null +++ b/ISSUE_TEMPLATE.md @@ -0,0 +1,37 @@ +Please go to Stack Overflow for help and support: + +http://stackoverflow.com/questions/tagged/tensorflow + +Also, please understand that many of the models included in this repository are experimental and research-style code. If you open a GitHub issue, here is our policy: + +1. It must be a bug or a feature request. +2. The form below must be filled out. + +**Here's why we have that policy**: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow. + +------------------------ + +### System information +- **What is the top-level directory of the model you are using**: +- **Have I written custom code (as opposed to using a stock example script provided in TensorFlow)**: +- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**: +- **TensorFlow installed from (source or binary)**: +- **TensorFlow version (use command below)**: +- **Bazel version (if compiling from source)**: +- **CUDA/cuDNN version**: +- **GPU model and memory**: +- **Exact command to reproduce**: + +You can collect some of this information using our environment capture script: + +https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh + +You can obtain the TensorFlow version with + +python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)" + +### Describe the problem +Describe the problem clearly here. Be sure to convey here why it's a bug in TensorFlow or a feature request. + +### Source code / logs +Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem. diff --git a/README.md b/README.md index 08b5a7a56908bfbdc48c76399499550e077cb9d7..f3b4619b7756e7c4090aac73b2253af41e1e958b 100644 --- a/README.md +++ b/README.md @@ -11,18 +11,24 @@ running TensorFlow 0.12 or earlier, please ## Models +- [adversarial_crypto](adversarial_crypto): protecting communications with adversarial neural cryptography. +- [adversarial_text](adversarial_text): semi-supervised sequence learning with adversarial training. +- [attention_ocr](attention_ocr): a model for real-world image text extraction. - [autoencoder](autoencoder): various autoencoders. +- [cognitive_mapping_and_planning](cognitive_mapping_and_planning): implementation of a spatial memory based mapping and planning architecture for visual navigation. - [compression](compression): compressing and decompressing images using a pre-trained Residual GRU network. - [differential_privacy](differential_privacy): privacy-preserving student models from multiple teachers. 
- [domain_adaptation](domain_adaptation): domain separation networks. - [im2txt](im2txt): image-to-text neural network for image captioning. - [inception](inception): deep convolutional networks for computer vision. - [learning_to_remember_rare_events](learning_to_remember_rare_events): a large-scale life-long memory module for use in deep learning. +- [lfads](lfads): sequential variational autoencoder for analyzing neuroscience data. - [lm_1b](lm_1b): language modeling on the one billion word benchmark. - [namignizer](namignizer): recognize and generate names. - [neural_gpu](neural_gpu): highly parallel neural computer. - [neural_programmer](neural_programmer): neural network augmented with logic and mathematic operations. - [next_frame_prediction](next_frame_prediction): probabilistic future frame synthesis via cross convolutional networks. +- [object_detection](object_detection): localizing and identifying multiple objects in a single image. - [real_nvp](real_nvp): density estimation using real-valued non-volume preserving (real NVP) transformations. - [resnet](resnet): deep and wide residual networks. - [skip_thoughts](skip_thoughts): recurrent neural network sentence-to-vector encoder. diff --git a/adversarial_crypto/README.md b/adversarial_crypto/README.md new file mode 100644 index 0000000000000000000000000000000000000000..504ca234bebeb71421128467e0eee3e172abcf6b --- /dev/null +++ b/adversarial_crypto/README.md @@ -0,0 +1,58 @@ +# Learning to Protect Communications with Adversarial Neural Cryptography + +This is a slightly-updated model used for the paper +["Learning to Protect Communications with Adversarial Neural +Cryptography"](https://arxiv.org/abs/1610.06918). + +> We ask whether neural networks can learn to use secret keys to protect +> information from other neural networks. Specifically, we focus on ensuring +> confidentiality properties in a multiagent system, and we specify those +> properties in terms of an adversary. Thus, a system may consist of neural +> networks named Alice and Bob, and we aim to limit what a third neural +> network named Eve learns from eavesdropping on the communication between +> Alice and Bob. We do not prescribe specific cryptographic algorithms to +> these neural networks; instead, we train end-to-end, adversarially. +> We demonstrate that the neural networks can learn how to perform forms of +> encryption and decryption, and also how to apply these operations +> selectively in order to meet confidentiality goals. + +This code allows you to train an encoder/decoder/adversary triplet +and evaluate their effectiveness on randomly generated input and key +pairs. + +## Prerequisites + +The only software requirement for running the encoder and decoder is having +TensorFlow installed. + +Requires TensorFlow r0.12 or later. + +## Training and evaluating + +After installing TensorFlow and ensuring that your paths are configured +appropriately: + +``` +python train_eval.py +``` + +This will begin training a fresh model. If and when the model becomes +sufficiently well-trained, it will reset the Eve model multiple times +and retrain it from scratch, outputting the accuracy thus obtained +in each run. + +## Model differences from the paper + +The model has been simplified slightly from the one described in +the paper - the convolutional layer width was reduced by a factor +of two. In the version in the paper, there was a nonlinear unit +after the fully-connected layer; that nonlinearity has been removed +here. 
These changes improve the robustness of training. The +initializer for the convolution layers has switched to the +tf.contrib.layers default of xavier_initializer instead of +a simpler truncated_normal. + +## Contact information + +This model repository is maintained by David G. Andersen +([dave-andersen](https://github.com/dave-andersen)). diff --git a/adversarial_crypto/train_eval.py b/adversarial_crypto/train_eval.py new file mode 100644 index 0000000000000000000000000000000000000000..6f5a5914b56d426f62dee1b889278e9d62d84f0d --- /dev/null +++ b/adversarial_crypto/train_eval.py @@ -0,0 +1,274 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Adversarial training to learn trivial encryption functions, +from the paper "Learning to Protect Communications with +Adversarial Neural Cryptography", Abadi & Andersen, 2016. + +https://arxiv.org/abs/1610.06918 + +This program creates and trains three neural networks, +termed Alice, Bob, and Eve. Alice takes inputs +in_m (message), in_k (key) and outputs 'ciphertext'. + +Bob takes inputs in_k, ciphertext and tries to reconstruct +the message. + +Eve is an adversarial network that takes input ciphertext +and also tries to reconstruct the message. + +The main function attempts to train these networks and then +evaluates them, all on random plaintext and key values. + +""" + +# TensorFlow Python 3 compatibility +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +import signal +import sys +from six.moves import xrange # pylint: disable=redefined-builtin +import tensorflow as tf + +flags = tf.app.flags + +flags.DEFINE_float('learning_rate', 0.0008, 'Constant learning rate') +flags.DEFINE_integer('batch_size', 4096, 'Batch size') + +FLAGS = flags.FLAGS + +# Input and output configuration. +TEXT_SIZE = 16 +KEY_SIZE = 16 + +# Training parameters. +ITERS_PER_ACTOR = 1 +EVE_MULTIPLIER = 2 # Train Eve 2x for every step of Alice/Bob +# Train until either max loops or Alice/Bob "good enough": +MAX_TRAINING_LOOPS = 850000 +BOB_LOSS_THRESH = 0.02 # Exit when Bob loss < 0.02 and Eve > 7.7 bits +EVE_LOSS_THRESH = 7.7 + +# Logging and evaluation. +PRINT_EVERY = 200 # In training, log every 200 steps. +EVE_EXTRA_ROUNDS = 2000 # At end, train eve a bit more. +RETRAIN_EVE_ITERS = 10000 # Retrain eve up to ITERS*LOOPS times. +RETRAIN_EVE_LOOPS = 25 # With an evaluation each loop +NUMBER_OF_EVE_RESETS = 5 # And do this up to 5 times with a fresh eve. +# Use EVAL_BATCHES samples each time we check accuracy. +EVAL_BATCHES = 1 + + +def batch_of_random_bools(batch_size, n): + """Return a batch of random "boolean" numbers. + + Args: + batch_size: Batch size dimension of returned tensor. + n: number of entries per batch. + + Returns: + A [batch_size, n] tensor of "boolean" numbers, where each number is + represented as -1 or 1. 
+ """ + + as_int = tf.random_uniform( + [batch_size, n], minval=0, maxval=2, dtype=tf.int32) + expanded_range = (as_int * 2) - 1 + return tf.cast(expanded_range, tf.float32) + + +class AdversarialCrypto(object): + """Primary model implementation class for Adversarial Neural Crypto. + + This class contains the code for the model itself, + and when created, plumbs the pathways from Alice to Bob and + Eve, creates the optimizers and loss functions, etc. + + Attributes: + eve_loss: Eve's loss function. + bob_loss: Bob's loss function. Different units from eve_loss. + eve_optimizer: A tf op that runs Eve's optimizer. + bob_optimizer: A tf op that runs Bob's optimizer. + bob_reconstruction_loss: Bob's message reconstruction loss, + which is comparable to eve_loss. + reset_eve_vars: Execute this op to completely reset Eve. + """ + + def get_message_and_key(self): + """Generate random pseudo-boolean key and message values.""" + + batch_size = tf.placeholder_with_default(FLAGS.batch_size, shape=[]) + + in_m = batch_of_random_bools(batch_size, TEXT_SIZE) + in_k = batch_of_random_bools(batch_size, KEY_SIZE) + return in_m, in_k + + def model(self, collection, message, key=None): + """The model for Alice, Bob, and Eve. If key=None, the first FC layer + takes only the message as inputs. Otherwise, it uses both the key + and the message. + + Args: + collection: The graph keys collection to add new vars to. + message: The input message to process. + key: The input key (if any) to use. + """ + + if key is not None: + combined_message = tf.concat(axis=1, values=[message, key]) + else: + combined_message = message + + # Ensure that all variables created are in the specified collection. + with tf.contrib.framework.arg_scope( + [tf.contrib.layers.fully_connected, tf.contrib.layers.conv2d], + variables_collections=[collection]): + + fc = tf.contrib.layers.fully_connected( + combined_message, + TEXT_SIZE + KEY_SIZE, + biases_initializer=tf.constant_initializer(0.0), + activation_fn=None) + + # Perform a sequence of 1D convolutions (by expanding the message out to 2D + # and then squeezing it back down). + fc = tf.expand_dims(fc, 2) + # 2,1 -> 1,2 + conv = tf.contrib.layers.conv2d( + fc, 2, 2, 2, 'SAME', activation_fn=tf.nn.sigmoid) + # 1,2 -> 1, 2 + conv = tf.contrib.layers.conv2d( + conv, 2, 1, 1, 'SAME', activation_fn=tf.nn.sigmoid) + # 1,2 -> 1, 1 + conv = tf.contrib.layers.conv2d( + conv, 1, 1, 1, 'SAME', activation_fn=tf.nn.tanh) + conv = tf.squeeze(conv, 2) + return conv + + def __init__(self): + in_m, in_k = self.get_message_and_key() + encrypted = self.model('alice', in_m, in_k) + decrypted = self.model('bob', encrypted, in_k) + eve_out = self.model('eve', encrypted, None) + + self.reset_eve_vars = tf.group( + *[w.initializer for w in tf.get_collection('eve')]) + + optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate) + + # Eve's goal is to decrypt the entire message: + eve_bits_wrong = tf.reduce_sum( + tf.abs((eve_out + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1]) + self.eve_loss = tf.reduce_sum(eve_bits_wrong) + self.eve_optimizer = optimizer.minimize( + self.eve_loss, var_list=tf.get_collection('eve')) + + # Alice and Bob want to be accurate... + self.bob_bits_wrong = tf.reduce_sum( + tf.abs((decrypted + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1]) + # ... and to not let Eve do better than guessing. 
+ self.bob_reconstruction_loss = tf.reduce_sum(self.bob_bits_wrong) + bob_eve_error_deviation = tf.abs(float(TEXT_SIZE) / 2.0 - eve_bits_wrong) + # 7-9 bits wrong is OK too, so we squish the error function a bit. + # Without doing this, we often tend to hang out at 0.25 / 7.5 error, + # and it seems bad to have continued, high communication error. + bob_eve_loss = tf.reduce_sum( + tf.square(bob_eve_error_deviation) / (TEXT_SIZE / 2)**2) + + # Rescale the losses to [0, 1] per example and combine. + self.bob_loss = (self.bob_reconstruction_loss / TEXT_SIZE + bob_eve_loss) + + self.bob_optimizer = optimizer.minimize( + self.bob_loss, + var_list=(tf.get_collection('alice') + tf.get_collection('bob'))) + + +def doeval(s, ac, n, itercount): + """Evaluate the current network on n batches of random examples. + + Args: + s: The current TensorFlow session + ac: an instance of the AdversarialCrypto class + n: The number of iterations to run. + itercount: Iteration count label for logging. + + Returns: + Bob and eve's loss, as a percent of bits incorrect. + """ + + bob_loss_accum = 0 + eve_loss_accum = 0 + for _ in xrange(n): + bl, el = s.run([ac.bob_reconstruction_loss, ac.eve_loss]) + bob_loss_accum += bl + eve_loss_accum += el + bob_loss_percent = bob_loss_accum / (n * FLAGS.batch_size) + eve_loss_percent = eve_loss_accum / (n * FLAGS.batch_size) + print('%d %.2f %.2f' % (itercount, bob_loss_percent, eve_loss_percent)) + sys.stdout.flush() + return bob_loss_percent, eve_loss_percent + + +def train_until_thresh(s, ac): + for j in xrange(MAX_TRAINING_LOOPS): + for _ in xrange(ITERS_PER_ACTOR): + s.run(ac.bob_optimizer) + for _ in xrange(ITERS_PER_ACTOR * EVE_MULTIPLIER): + s.run(ac.eve_optimizer) + if j % PRINT_EVERY == 0: + bob_avg_loss, eve_avg_loss = doeval(s, ac, EVAL_BATCHES, j) + if (bob_avg_loss < BOB_LOSS_THRESH and eve_avg_loss > EVE_LOSS_THRESH): + print('Target losses achieved.') + return True + return False + + +def train_and_evaluate(): + """Run the full training and evaluation loop.""" + + ac = AdversarialCrypto() + init = tf.global_variables_initializer() + + with tf.Session() as s: + s.run(init) + print('# Batch size: ', FLAGS.batch_size) + print('# Iter Bob_Recon_Error Eve_Recon_Error') + + if train_until_thresh(s, ac): + for _ in xrange(EVE_EXTRA_ROUNDS): + s.run(ac.eve_optimizer) + print('Loss after eve extra training:') + doeval(s, ac, EVAL_BATCHES * 2, 0) + for _ in xrange(NUMBER_OF_EVE_RESETS): + print('Resetting Eve') + s.run(ac.reset_eve_vars) + eve_counter = 0 + for _ in xrange(RETRAIN_EVE_LOOPS): + for _ in xrange(RETRAIN_EVE_ITERS): + eve_counter += 1 + s.run(ac.eve_optimizer) + doeval(s, ac, EVAL_BATCHES, eve_counter) + doeval(s, ac, EVAL_BATCHES, eve_counter) + + +def main(unused_argv): + # Exit more quietly with Ctrl-C. 
+ signal.signal(signal.SIGINT, signal.SIG_DFL) + train_and_evaluate() + + +if __name__ == '__main__': + tf.app.run() diff --git a/adversarial_text/BUILD b/adversarial_text/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..476865f968291edd9ab256e2733b2e0739372fe2 --- /dev/null +++ b/adversarial_text/BUILD @@ -0,0 +1,97 @@ +licenses(["notice"]) # Apache 2.0 + +# Binaries +# ============================================================================== +py_binary( + name = "evaluate", + srcs = ["evaluate.py"], + deps = [ + ":graphs", + # google3 file dep, + # tensorflow dep, + ], +) + +py_binary( + name = "train_classifier", + srcs = ["train_classifier.py"], + deps = [ + ":graphs", + ":train_utils", + # google3 file dep, + # tensorflow dep, + ], +) + +py_binary( + name = "pretrain", + srcs = [ + "pretrain.py", + ], + deps = [ + ":graphs", + ":train_utils", + # google3 file dep, + # tensorflow dep, + ], +) + +# Libraries +# ============================================================================== +py_library( + name = "graphs", + srcs = ["graphs.py"], + deps = [ + ":adversarial_losses", + ":inputs", + ":layers", + # tensorflow dep, + ], +) + +py_library( + name = "adversarial_losses", + srcs = ["adversarial_losses.py"], + deps = [ + # tensorflow dep, + ], +) + +py_library( + name = "inputs", + srcs = ["inputs.py"], + deps = [ + # tensorflow dep, + "//adversarial_text/data:data_utils", + ], +) + +py_library( + name = "layers", + srcs = ["layers.py"], + deps = [ + # tensorflow dep, + ], +) + +py_library( + name = "train_utils", + srcs = ["train_utils.py"], + deps = [ + # numpy dep, + # tensorflow dep, + ], +) + +# Tests +# ============================================================================== +py_test( + name = "graphs_test", + size = "large", + srcs = ["graphs_test.py"], + deps = [ + ":graphs", + # tensorflow dep, + "//adversarial_text/data:data_utils", + ], +) diff --git a/adversarial_text/README.md b/adversarial_text/README.md new file mode 100644 index 0000000000000000000000000000000000000000..bfddc7088fd631ff9121378907694a5bef80d766 --- /dev/null +++ b/adversarial_text/README.md @@ -0,0 +1,156 @@ +# Adversarial Text Classification + +Code for [*Adversarial Training Methods for Semi-Supervised Text Classification*](https://arxiv.org/abs/1605.07725) and [*Semi-Supervised Sequence Learning*](https://arxiv.org/abs/1511.01432). + +## Requirements + +* Bazel ([install](https://bazel.build/versions/master/docs/install.html)) +* TensorFlow >= v1.1 + +## End-to-end IMDB Sentiment Classification + +### Fetch data + +``` +$ wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz \ + -O /tmp/imdb.tar.gz +$ tar -xf /tmp/imdb.tar.gz -C /tmp +``` + +The directory `/tmp/aclImdb` contains the raw IMDB data. + +### Generate vocabulary + +``` +$ IMDB_DATA_DIR=/tmp/imdb +$ bazel run data:gen_vocab -- \ + --output_dir=$IMDB_DATA_DIR \ + --dataset=imdb \ + --imdb_input_dir=/tmp/aclImdb \ + --lowercase=False +``` + +Vocabulary and frequency files will be generated in `$IMDB_DATA_DIR`. + +###  Generate training, validation, and test data + +``` +$ bazel run data:gen_data -- \ + --output_dir=$IMDB_DATA_DIR \ + --dataset=imdb \ + --imdb_input_dir=/tmp/aclImdb \ + --lowercase=False \ + --label_gain=False +``` + +`$IMDB_DATA_DIR` contains TFRecords files. 
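
If you want to sanity-check the generated records before training, a minimal sketch along these lines should work (this is an illustration, not part of the pipeline; it assumes TensorFlow 1.x, the `train_classification.tfrecords` file produced above, and the `token_id` feature name used by `data_utils.SequenceWrapper`):

```
# Count the classification records and peek at the first SequenceExample.
import os
import tensorflow as tf

data_dir = '/tmp/imdb'  # $IMDB_DATA_DIR
path = os.path.join(data_dir, 'train_classification.tfrecords')

count = 0
for record in tf.python_io.tf_record_iterator(path):
  if count == 0:
    seq = tf.train.SequenceExample()
    seq.ParseFromString(record)
    tokens = seq.feature_lists.feature_list['token_id'].feature
    print('first sequence length:', len(tokens))
  count += 1
print('number of records:', count)
```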
+ +### Pretrain IMDB Language Model + +``` +$ PRETRAIN_DIR=/tmp/models/imdb_pretrain +$ bazel run :pretrain -- \ + --train_dir=$PRETRAIN_DIR \ + --data_dir=$IMDB_DATA_DIR \ + --vocab_size=86934 \ + --embedding_dims=256 \ + --rnn_cell_size=1024 \ + --num_candidate_samples=1024 \ + --batch_size=256 \ + --learning_rate=0.001 \ + --learning_rate_decay_factor=0.9999 \ + --max_steps=100000 \ + --max_grad_norm=1.0 \ + --num_timesteps=400 \ + --keep_prob_emb=0.5 \ + --normalize_embeddings +``` + +`$PRETRAIN_DIR` contains checkpoints of the pretrained language model. + +### Train classifier + +Most flags stay the same, save for the removal of candidate sampling and the +addition of `pretrained_model_dir`, from which the classifier will load the +pretrained embedding and LSTM variables, and flags related to adversarial +training and classification. + +``` +$ TRAIN_DIR=/tmp/models/imdb_classify +$ bazel run :train_classifier -- \ + --train_dir=$TRAIN_DIR \ + --pretrained_model_dir=$PRETRAIN_DIR \ + --data_dir=$IMDB_DATA_DIR \ + --vocab_size=86934 \ + --embedding_dims=256 \ + --rnn_cell_size=1024 \ + --cl_num_layers=1 \ + --cl_hidden_size=30 \ + --batch_size=64 \ + --learning_rate=0.0005 \ + --learning_rate_decay_factor=0.9998 \ + --max_steps=15000 \ + --max_grad_norm=1.0 \ + --num_timesteps=400 \ + --keep_prob_emb=0.5 \ + --normalize_embeddings \ + --adv_training_method=vat \ + --perturb_norm_length=5.0 +``` + +### Evaluate on test data + +``` +$ EVAL_DIR=/tmp/models/imdb_eval +$ bazel run :evaluate -- \ + --eval_dir=$EVAL_DIR \ + --checkpoint_dir=$TRAIN_DIR \ + --eval_data=test \ + --run_once \ + --num_examples=25000 \ + --data_dir=$IMDB_DATA_DIR \ + --vocab_size=86934 \ + --embedding_dims=256 \ + --rnn_cell_size=1024 \ + --batch_size=256 \ + --num_timesteps=400 \ + --normalize_embeddings +``` + +## Code Overview + +The main entry points are the binaries listed below. Each training binary builds +a `VatxtModel`, defined in `graphs.py`, which in turn uses graph building blocks +defined in `inputs.py` (defines input data reading and parsing), `layers.py` +(defines core model components), and `adversarial_losses.py` (defines +adversarial training losses). The training loop itself is defined in +`train_utils.py`. + +### Binaries + +* Pretraining: `pretrain.py` +* Classifier Training: `train_classifier.py` +* Evaluation: `evaluate.py` + +### Command-Line Flags + +Flags related to distributed training and the training loop itself are defined +in [`train_utils.py`](https://github.com/tensorflow/models/tree/master/adversarial_text/train_utils.py). + +Flags related to model hyperparameters are defined in [`graphs.py`](https://github.com/tensorflow/models/tree/master/adversarial_text/graphs.py). + +Flags related to adversarial training are defined in [`adversarial_losses.py`](https://github.com/tensorflow/models/tree/master/adversarial_text/adversarial_losses.py). + +Flags particular to each job are defined in the main binary files. + +### Data Generation + +* Vocabulary generation: [`gen_vocab.py`](https://github.com/tensorflow/models/tree/master/adversarial_text/data/gen_vocab.py) +* Data generation: [`gen_data.py`](https://github.com/tensorflow/models/tree/master/adversarial_text/data/gen_data.py) + +Command-line flags defined in [`document_generators.py`](https://github.com/tensorflow/models/tree/master/adversarial_text/data/document_generators.py) +control which dataset is processed and how. 
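
For example, to also emit bigram tokens when generating the IMDB vocabulary and data, the preprocessing flags defined there can be appended to the invocations shown earlier (a sketch reusing the paths from the IMDB walkthrough above; the resulting vocabulary size, and therefore the `--vocab_size` passed to later steps, will change accordingly):

```
$ bazel run data:gen_vocab -- \
    --output_dir=$IMDB_DATA_DIR \
    --dataset=imdb \
    --imdb_input_dir=/tmp/aclImdb \
    --lowercase=False \
    --output_bigrams=True
```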
+ +## Contact for Issues + +* Ryan Sepassi, @rsepassi diff --git a/adversarial_text/adversarial_losses.py b/adversarial_text/adversarial_losses.py new file mode 100644 index 0000000000000000000000000000000000000000..7ca99466688a0444a05f074ee21555f7ec791007 --- /dev/null +++ b/adversarial_text/adversarial_losses.py @@ -0,0 +1,225 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Adversarial losses for text models.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +# Dependency imports + +import tensorflow as tf + +flags = tf.app.flags +FLAGS = flags.FLAGS + +# Adversarial and virtual adversarial training parameters. +flags.DEFINE_float('perturb_norm_length', 5.0, + 'Norm length of adversarial perturbation to be ' + 'optimized with validation. ' + '5.0 is optimal on IMDB with virtual adversarial training. ') + +# Virtual adversarial training parameters +flags.DEFINE_integer('num_power_iteration', 1, 'The number of power iteration') +flags.DEFINE_float('small_constant_for_finite_diff', 1e-1, + 'Small constant for finite difference method') + +# Parameters for building the graph +flags.DEFINE_string('adv_training_method', None, + 'The flag which specifies training method. ' + '"rp" : random perturbation training ' + '"at" : adversarial training ' + '"vat" : virtual adversarial training ' + '"atvat" : at + vat ') +flags.DEFINE_float('adv_reg_coeff', 1.0, + 'Regularization coefficient of adversarial loss.') + + +def random_perturbation_loss(embedded, length, loss_fn): + """Adds noise to embeddings and recomputes classification loss.""" + noise = tf.random_normal(shape=tf.shape(embedded)) + perturb = _scale_l2(_mask_by_length(noise, length), FLAGS.perturb_norm_length) + return loss_fn(embedded + perturb) + + +def adversarial_loss(embedded, loss, loss_fn): + """Adds gradient to embedding and recomputes classification loss.""" + grad, = tf.gradients( + loss, + embedded, + aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N) + grad = tf.stop_gradient(grad) + perturb = _scale_l2(grad, FLAGS.perturb_norm_length) + return loss_fn(embedded + perturb) + + +def virtual_adversarial_loss(logits, embedded, inputs, + logits_from_embedding_fn): + """Virtual adversarial loss. + + Computes virtual adversarial perturbation by finite difference method and + power iteration, adds it to the embedding, and computes the KL divergence + between the new logits and the original logits. + + Args: + logits: 2-D float Tensor, [num_timesteps*batch_size, m], where m=1 if + num_classes=2, otherwise m=num_classes. + embedded: 3-D float Tensor, [batch_size, num_timesteps, embedding_dim]. + inputs: VatxtInput. + logits_from_embedding_fn: callable that takes embeddings and returns + classifier logits. + + Returns: + kl: float scalar. + """ + # Stop gradient of logits. See https://arxiv.org/abs/1507.00677 for details. 
+ logits = tf.stop_gradient(logits) + + # Only care about the KL divergence on the final timestep. + weights = inputs.eos_weights + assert weights is not None + + # Initialize perturbation with random noise. + # shape(embedded) = (batch_size, num_timesteps, embedding_dim) + d = tf.random_normal(shape=tf.shape(embedded)) + + # Perform finite difference method and power iteration. + # See Eq.(8) in the paper http://arxiv.org/pdf/1507.00677.pdf, + # Adding small noise to input and taking gradient with respect to the noise + # corresponds to 1 power iteration. + for _ in xrange(FLAGS.num_power_iteration): + d = _scale_l2( + _mask_by_length(d, inputs.length), FLAGS.small_constant_for_finite_diff) + d_logits = logits_from_embedding_fn(embedded + d) + kl = _kl_divergence_with_logits(logits, d_logits, weights) + d, = tf.gradients( + kl, + d, + aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N) + d = tf.stop_gradient(d) + + perturb = _scale_l2(d, FLAGS.perturb_norm_length) + vadv_logits = logits_from_embedding_fn(embedded + perturb) + return _kl_divergence_with_logits(logits, vadv_logits, weights) + + +def random_perturbation_loss_bidir(embedded, length, loss_fn): + """Adds noise to embeddings and recomputes classification loss.""" + noise = [tf.random_normal(shape=tf.shape(emb)) for emb in embedded] + masked = [_mask_by_length(n, length) for n in noise] + scaled = [_scale_l2(m, FLAGS.perturb_norm_length) for m in masked] + return loss_fn([e + s for (e, s) in zip(embedded, scaled)]) + + +def adversarial_loss_bidir(embedded, loss, loss_fn): + """Adds gradient to embeddings and recomputes classification loss.""" + grads = tf.gradients( + loss, + embedded, + aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N) + adv_exs = [ + emb + _scale_l2(tf.stop_gradient(g), FLAGS.perturb_norm_length) + for emb, g in zip(embedded, grads) + ] + return loss_fn(adv_exs) + + +def virtual_adversarial_loss_bidir(logits, embedded, inputs, + logits_from_embedding_fn): + """Virtual adversarial loss for bidirectional models.""" + logits = tf.stop_gradient(logits) + f_inputs, _ = inputs + weights = f_inputs.eos_weights + assert weights is not None + + perturbs = [ + _mask_by_length(tf.random_normal(shape=tf.shape(emb)), f_inputs.length) + for emb in embedded + ] + for _ in xrange(FLAGS.num_power_iteration): + perturbs = [ + _scale_l2(d, FLAGS.small_constant_for_finite_diff) for d in perturbs + ] + d_logits = logits_from_embedding_fn( + [emb + d for (emb, d) in zip(embedded, perturbs)]) + kl = _kl_divergence_with_logits(logits, d_logits, weights) + perturbs = tf.gradients( + kl, + perturbs, + aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N) + perturbs = [tf.stop_gradient(d) for d in perturbs] + + perturbs = [_scale_l2(d, FLAGS.perturb_norm_length) for d in perturbs] + vadv_logits = logits_from_embedding_fn( + [emb + d for (emb, d) in zip(embedded, perturbs)]) + return _kl_divergence_with_logits(logits, vadv_logits, weights) + + +def _mask_by_length(t, length): + """Mask t, 3-D [batch, time, dim], by length, 1-D [batch,].""" + maxlen = t.get_shape().as_list()[1] + + # Subtract 1 from length to prevent the perturbation from going on 'eos' + mask = tf.sequence_mask(length - 1, maxlen=maxlen) + mask = tf.expand_dims(tf.cast(mask, tf.float32), -1) + # shape(mask) = (batch, num_timesteps, 1) + return t * mask + + +def _scale_l2(x, norm_length): + # shape(x) = (batch, num_timesteps, d) + # Divide x by max(abs(x)) for a numerically stable L2 norm. 
+ # 2norm(x) = a * 2norm(x/a) + # Scale over the full sequence, dims (1, 2) + alpha = tf.reduce_max(tf.abs(x), (1, 2), keep_dims=True) + 1e-12 + l2_norm = alpha * tf.sqrt( + tf.reduce_sum(tf.pow(x / alpha, 2), (1, 2), keep_dims=True) + 1e-6) + x_unit = x / l2_norm + return norm_length * x_unit + + +def _kl_divergence_with_logits(q_logits, p_logits, weights): + """Returns weighted KL divergence between distributions q and p. + + Args: + q_logits: logits for 1st argument of KL divergence shape + [num_timesteps * batch_size, num_classes] if num_classes > 2, and + [num_timesteps * batch_size] if num_classes == 2. + p_logits: logits for 2nd argument of KL divergence with same shape q_logits. + weights: 1-D float tensor with shape [num_timesteps * batch_size]. + Elements should be 1.0 only on end of sequences + + Returns: + KL: float scalar. + """ + # For logistic regression + if FLAGS.num_classes == 2: + q = tf.nn.sigmoid(q_logits) + kl = (-tf.nn.sigmoid_cross_entropy_with_logits(logits=q_logits, labels=q) + + tf.nn.sigmoid_cross_entropy_with_logits(logits=p_logits, labels=q)) + kl = tf.squeeze(kl) + + # For softmax regression + else: + q = tf.nn.softmax(q_logits) + kl = tf.reduce_sum( + q * (tf.nn.log_softmax(q_logits) - tf.nn.log_softmax(p_logits)), 1) + + num_labels = tf.reduce_sum(weights) + num_labels = tf.where(tf.equal(num_labels, 0.), 1., num_labels) + + kl.get_shape().assert_has_rank(1) + weights.get_shape().assert_has_rank(1) + loss = tf.identity(tf.reduce_sum(weights * kl) / num_labels, name='kl') + return loss diff --git a/adversarial_text/data/BUILD b/adversarial_text/data/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..b59f7a30ea67ff8fe49351e2984e1c8a95d7112d --- /dev/null +++ b/adversarial_text/data/BUILD @@ -0,0 +1,52 @@ +licenses(["notice"]) # Apache 2.0 + +package( + default_visibility = [ + "//adversarial_text:__subpackages__", + ], +) + +py_binary( + name = "gen_vocab", + srcs = ["gen_vocab.py"], + deps = [ + ":data_utils", + ":document_generators", + # tensorflow dep, + ], +) + +py_binary( + name = "gen_data", + srcs = ["gen_data.py"], + deps = [ + ":data_utils", + ":document_generators", + # tensorflow dep, + ], +) + +py_library( + name = "document_generators", + srcs = ["document_generators.py"], + deps = [ + # tensorflow dep, + ], +) + +py_library( + name = "data_utils", + srcs = ["data_utils.py"], + deps = [ + # tensorflow dep, + ], +) + +py_test( + name = "data_utils_test", + srcs = ["data_utils_test.py"], + deps = [ + ":data_utils", + # tensorflow dep, + ], +) diff --git a/adversarial_text/data/data_utils.py b/adversarial_text/data/data_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..d458caadd6878b200d7901c886ced3a93e2cb76f --- /dev/null +++ b/adversarial_text/data/data_utils.py @@ -0,0 +1,332 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""Utilities for generating/preprocessing data for adversarial text models.""" + +import operator +import os +import random +import re + +# Dependency imports + +import tensorflow as tf + +EOS_TOKEN = '' + +# Data filenames +# Sequence Autoencoder +ALL_SA = 'all_sa.tfrecords' +TRAIN_SA = 'train_sa.tfrecords' +TEST_SA = 'test_sa.tfrecords' +# Language Model +ALL_LM = 'all_lm.tfrecords' +TRAIN_LM = 'train_lm.tfrecords' +TEST_LM = 'test_lm.tfrecords' +# Classification +TRAIN_CLASS = 'train_classification.tfrecords' +TEST_CLASS = 'test_classification.tfrecords' +VALID_CLASS = 'validate_classification.tfrecords' +# LM with bidirectional LSTM +TRAIN_REV_LM = 'train_reverse_lm.tfrecords' +TEST_REV_LM = 'test_reverse_lm.tfrecords' +# Classification with bidirectional LSTM +TRAIN_BD_CLASS = 'train_bidir_classification.tfrecords' +TEST_BD_CLASS = 'test_bidir_classification.tfrecords' +VALID_BD_CLASS = 'validate_bidir_classification.tfrecords' + + +class ShufflingTFRecordWriter(object): + """Thin wrapper around TFRecordWriter that shuffles records.""" + + def __init__(self, path): + self._path = path + self._records = [] + self._closed = False + + def write(self, record): + assert not self._closed + self._records.append(record) + + def close(self): + assert not self._closed + random.shuffle(self._records) + with tf.python_io.TFRecordWriter(self._path) as f: + for record in self._records: + f.write(record) + self._closed = True + + def __enter__(self): + return self + + def __exit__(self, unused_type, unused_value, unused_traceback): + self.close() + + +class Timestep(object): + """Represents a single timestep in a SequenceWrapper.""" + + def __init__(self, token, label, weight, multivalent_tokens=False): + """Constructs Timestep from empty Features.""" + self._token = token + self._label = label + self._weight = weight + self._multivalent_tokens = multivalent_tokens + self._fill_with_defaults() + + @property + def token(self): + if self._multivalent_tokens: + raise TypeError('Timestep may contain multiple values; use `tokens`') + return self._token.int64_list.value[0] + + @property + def tokens(self): + return self._token.int64_list.value + + @property + def label(self): + return self._label.int64_list.value[0] + + @property + def weight(self): + return self._weight.float_list.value[0] + + def set_token(self, token): + if self._multivalent_tokens: + raise TypeError('Timestep may contain multiple values; use `add_token`') + self._token.int64_list.value[0] = token + return self + + def add_token(self, token): + self._token.int64_list.value.append(token) + return self + + def set_label(self, label): + self._label.int64_list.value[0] = label + return self + + def set_weight(self, weight): + self._weight.float_list.value[0] = weight + return self + + def copy_from(self, timestep): + self.set_token(timestep.token).set_label(timestep.label).set_weight( + timestep.weight) + return self + + def _fill_with_defaults(self): + if not self._multivalent_tokens: + self._token.int64_list.value.append(0) + self._label.int64_list.value.append(0) + self._weight.float_list.value.append(0.0) + + +class SequenceWrapper(object): + """Wrapper around tf.SequenceExample.""" + + F_TOKEN_ID = 'token_id' + F_LABEL = 'label' + F_WEIGHT = 'weight' + + def __init__(self, multivalent_tokens=False): + self._seq = tf.train.SequenceExample() + self._flist = self._seq.feature_lists.feature_list + self._timesteps = [] + self._multivalent_tokens = 
multivalent_tokens + + @property + def seq(self): + return self._seq + + @property + def multivalent_tokens(self): + return self._multivalent_tokens + + @property + def _tokens(self): + return self._flist[SequenceWrapper.F_TOKEN_ID].feature + + @property + def _labels(self): + return self._flist[SequenceWrapper.F_LABEL].feature + + @property + def _weights(self): + return self._flist[SequenceWrapper.F_WEIGHT].feature + + def add_timestep(self): + timestep = Timestep( + self._tokens.add(), + self._labels.add(), + self._weights.add(), + multivalent_tokens=self._multivalent_tokens) + self._timesteps.append(timestep) + return timestep + + def __iter__(self): + for timestep in self._timesteps: + yield timestep + + def __len__(self): + return len(self._timesteps) + + def __getitem__(self, idx): + return self._timesteps[idx] + + +def build_reverse_sequence(seq): + """Builds a sequence that is the reverse of the input sequence.""" + reverse_seq = SequenceWrapper() + + # Copy all but last timestep + for timestep in reversed(seq[:-1]): + reverse_seq.add_timestep().copy_from(timestep) + + # Copy final timestep + reverse_seq.add_timestep().copy_from(seq[-1]) + + return reverse_seq + + +def build_bidirectional_seq(seq, rev_seq): + bidir_seq = SequenceWrapper(multivalent_tokens=True) + for forward_ts, reverse_ts in zip(seq, rev_seq): + bidir_seq.add_timestep().add_token(forward_ts.token).add_token( + reverse_ts.token) + + return bidir_seq + + +def build_lm_sequence(seq): + """Builds language model sequence from input sequence. + + Args: + seq: SequenceWrapper. + + Returns: + SequenceWrapper with `seq` tokens copied over to output sequence tokens and + labels (offset by 1, i.e. predict next token) with weights set to 1.0, + except for token. + """ + lm_seq = SequenceWrapper() + for i, timestep in enumerate(seq): + if i == len(seq) - 1: + lm_seq.add_timestep().set_token(timestep.token).set_label( + seq[i].token).set_weight(0.0) + else: + lm_seq.add_timestep().set_token(timestep.token).set_label( + seq[i + 1].token).set_weight(1.0) + return lm_seq + + +def build_seq_ae_sequence(seq): + """Builds seq_ae sequence from input sequence. + + Args: + seq: SequenceWrapper. + + Returns: + SequenceWrapper with `seq` inputs copied and concatenated, and with labels + copied in on the right-hand (i.e. decoder) side with weights set to 1.0. + The new sequence will have length `len(seq) * 2 - 1`, as the last timestep + of the encoder section and the first step of the decoder section will + overlap. + """ + seq_ae_seq = SequenceWrapper() + + for i in range(len(seq) * 2 - 1): + ts = seq_ae_seq.add_timestep() + + if i < len(seq) - 1: + # Encoder + ts.set_token(seq[i].token) + elif i == len(seq) - 1: + # Transition step + ts.set_token(seq[i].token) + ts.set_label(seq[0].token) + ts.set_weight(1.0) + else: + # Decoder + ts.set_token(seq[i % len(seq)].token) + ts.set_label(seq[(i + 1) % len(seq)].token) + ts.set_weight(1.0) + + return seq_ae_seq + + +def build_labeled_sequence(seq, class_label, label_gain=False): + """Builds labeled sequence from input sequence. + + Args: + seq: SequenceWrapper. + class_label: bool. + label_gain: bool. If True, class_label will be put on every timestep and + weight will increase linearly from 0 to 1. + + Returns: + SequenceWrapper with `seq` copied in and `class_label` added as label to + final timestep. 
+ """ + label_seq = SequenceWrapper(multivalent_tokens=seq.multivalent_tokens) + + # Copy sequence without labels + seq_len = len(seq) + final_timestep = None + for i, timestep in enumerate(seq): + label_timestep = label_seq.add_timestep() + if seq.multivalent_tokens: + for token in timestep.tokens: + label_timestep.add_token(token) + else: + label_timestep.set_token(timestep.token) + if label_gain: + label_timestep.set_label(int(class_label)) + weight = 1.0 if seq_len < 2 else float(i) / (seq_len - 1) + label_timestep.set_weight(weight) + if i == (seq_len - 1): + final_timestep = label_timestep + + # Edit final timestep to have class label and weight = 1. + final_timestep.set_label(int(class_label)).set_weight(1.0) + + return label_seq + + +def split_by_punct(segment): + """Splits str segment by punctuation, filters our empties and spaces.""" + return [s for s in re.split(r'\W+', segment) if s and not s.isspace()] + + +def sort_vocab_by_frequency(vocab_freq_map): + """Sorts vocab_freq_map by count. + + Args: + vocab_freq_map: dict, vocabulary terms with counts. + + Returns: + list> sorted by count, descending. + """ + return sorted( + vocab_freq_map.items(), key=operator.itemgetter(1), reverse=True) + + +def write_vocab_and_frequency(ordered_vocab_freqs, output_dir): + """Writes ordered_vocab_freqs into vocab.txt and vocab_freq.txt.""" + tf.gfile.MakeDirs(output_dir) + with open(os.path.join(output_dir, 'vocab.txt'), 'w') as vocab_f: + with open(os.path.join(output_dir, 'vocab_freq.txt'), 'w') as freq_f: + for word, freq in ordered_vocab_freqs: + vocab_f.write('{}\n'.format(word)) + freq_f.write('{}\n'.format(freq)) diff --git a/adversarial_text/data/data_utils_test.py b/adversarial_text/data/data_utils_test.py new file mode 100644 index 0000000000000000000000000000000000000000..59b7f4e66c083415cef0aea3f7192660cf49e6b8 --- /dev/null +++ b/adversarial_text/data/data_utils_test.py @@ -0,0 +1,200 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""Tests for data_utils.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +# Dependency imports + +import tensorflow as tf + +from adversarial_text.data import data_utils + +data = data_utils + + +class SequenceWrapperTest(tf.test.TestCase): + + def testDefaultTimesteps(self): + seq = data.SequenceWrapper() + t1 = seq.add_timestep() + _ = seq.add_timestep() + self.assertEqual(len(seq), 2) + + self.assertEqual(t1.weight, 0.0) + self.assertEqual(t1.label, 0) + self.assertEqual(t1.token, 0) + + def testSettersAndGetters(self): + ts = data.SequenceWrapper().add_timestep() + ts.set_token(3) + ts.set_label(4) + ts.set_weight(2.0) + self.assertEqual(ts.token, 3) + self.assertEqual(ts.label, 4) + self.assertEqual(ts.weight, 2.0) + + def testTimestepIteration(self): + seq = data.SequenceWrapper() + seq.add_timestep().set_token(0) + seq.add_timestep().set_token(1) + seq.add_timestep().set_token(2) + for i, ts in enumerate(seq): + self.assertEqual(ts.token, i) + + def testFillsSequenceExampleCorrectly(self): + seq = data.SequenceWrapper() + seq.add_timestep().set_token(1).set_label(2).set_weight(3.0) + seq.add_timestep().set_token(10).set_label(20).set_weight(30.0) + + seq_ex = seq.seq + fl = seq_ex.feature_lists.feature_list + fl_token = fl[data.SequenceWrapper.F_TOKEN_ID].feature + fl_label = fl[data.SequenceWrapper.F_LABEL].feature + fl_weight = fl[data.SequenceWrapper.F_WEIGHT].feature + _ = [self.assertEqual(len(f), 2) for f in [fl_token, fl_label, fl_weight]] + self.assertAllEqual([f.int64_list.value[0] for f in fl_token], [1, 10]) + self.assertAllEqual([f.int64_list.value[0] for f in fl_label], [2, 20]) + self.assertAllEqual([f.float_list.value[0] for f in fl_weight], [3.0, 30.0]) + + +class DataUtilsTest(tf.test.TestCase): + + def testSplitByPunct(self): + output = data.split_by_punct( + 'hello! world, i\'ve been\nwaiting\tfor\ryou for.a long time') + expected = [ + 'hello', 'world', 'i', 've', 'been', 'waiting', 'for', 'you', 'for', + 'a', 'long', 'time' + ] + self.assertListEqual(output, expected) + + def _buildDummySequence(self): + seq = data.SequenceWrapper() + for i in range(10): + seq.add_timestep().set_token(i) + return seq + + def testBuildLMSeq(self): + seq = self._buildDummySequence() + lm_seq = data.build_lm_sequence(seq) + for i, ts in enumerate(lm_seq): + # For end of sequence, the token and label should be same, and weight + # should be 0.0. + if i == len(lm_seq) - 1: + self.assertEqual(ts.token, i) + self.assertEqual(ts.label, i) + self.assertEqual(ts.weight, 0.0) + else: + self.assertEqual(ts.token, i) + self.assertEqual(ts.label, i + 1) + self.assertEqual(ts.weight, 1.0) + + def testBuildSAESeq(self): + seq = self._buildDummySequence() + sa_seq = data.build_seq_ae_sequence(seq) + + self.assertEqual(len(sa_seq), len(seq) * 2 - 1) + + # Tokens should be sequence twice, minus the EOS token at the end + for i, ts in enumerate(sa_seq): + self.assertEqual(ts.token, seq[i % 10].token) + + # Weights should be len-1 0.0's and len 1.0's. 
+ for i in range(len(seq) - 1): + self.assertEqual(sa_seq[i].weight, 0.0) + for i in range(len(seq) - 1, len(sa_seq)): + self.assertEqual(sa_seq[i].weight, 1.0) + + # Labels should be len-1 0's, and then the sequence + for i in range(len(seq) - 1): + self.assertEqual(sa_seq[i].label, 0) + for i in range(len(seq) - 1, len(sa_seq)): + self.assertEqual(sa_seq[i].label, seq[i - (len(seq) - 1)].token) + + def testBuildLabelSeq(self): + seq = self._buildDummySequence() + eos_id = len(seq) - 1 + label_seq = data.build_labeled_sequence(seq, True) + for i, ts in enumerate(label_seq[:-1]): + self.assertEqual(ts.token, i) + self.assertEqual(ts.label, 0) + self.assertEqual(ts.weight, 0.0) + + final_timestep = label_seq[-1] + self.assertEqual(final_timestep.token, eos_id) + self.assertEqual(final_timestep.label, 1) + self.assertEqual(final_timestep.weight, 1.0) + + def testBuildBidirLabelSeq(self): + seq = self._buildDummySequence() + reverse_seq = data.build_reverse_sequence(seq) + bidir_seq = data.build_bidirectional_seq(seq, reverse_seq) + label_seq = data.build_labeled_sequence(bidir_seq, True) + + for (i, ts), j in zip( + enumerate(label_seq[:-1]), reversed(range(len(seq) - 1))): + self.assertAllEqual(ts.tokens, [i, j]) + self.assertEqual(ts.label, 0) + self.assertEqual(ts.weight, 0.0) + + final_timestep = label_seq[-1] + eos_id = len(seq) - 1 + self.assertAllEqual(final_timestep.tokens, [eos_id, eos_id]) + self.assertEqual(final_timestep.label, 1) + self.assertEqual(final_timestep.weight, 1.0) + + def testReverseSeq(self): + seq = self._buildDummySequence() + reverse_seq = data.build_reverse_sequence(seq) + for i, ts in enumerate(reversed(reverse_seq[:-1])): + self.assertEqual(ts.token, i) + self.assertEqual(ts.label, 0) + self.assertEqual(ts.weight, 0.0) + + final_timestep = reverse_seq[-1] + eos_id = len(seq) - 1 + self.assertEqual(final_timestep.token, eos_id) + self.assertEqual(final_timestep.label, 0) + self.assertEqual(final_timestep.weight, 0.0) + + def testBidirSeq(self): + seq = self._buildDummySequence() + reverse_seq = data.build_reverse_sequence(seq) + bidir_seq = data.build_bidirectional_seq(seq, reverse_seq) + for (i, ts), j in zip( + enumerate(bidir_seq[:-1]), reversed(range(len(seq) - 1))): + self.assertAllEqual(ts.tokens, [i, j]) + self.assertEqual(ts.label, 0) + self.assertEqual(ts.weight, 0.0) + + final_timestep = bidir_seq[-1] + eos_id = len(seq) - 1 + self.assertAllEqual(final_timestep.tokens, [eos_id, eos_id]) + self.assertEqual(final_timestep.label, 0) + self.assertEqual(final_timestep.weight, 0.0) + + def testLabelGain(self): + seq = self._buildDummySequence() + label_seq = data.build_labeled_sequence(seq, True, label_gain=True) + for i, ts in enumerate(label_seq): + self.assertEqual(ts.token, i) + self.assertEqual(ts.label, 1) + self.assertNear(ts.weight, float(i) / (len(seq) - 1), 1e-3) + + +if __name__ == '__main__': + tf.test.main() diff --git a/adversarial_text/data/document_generators.py b/adversarial_text/data/document_generators.py new file mode 100644 index 0000000000000000000000000000000000000000..aee7fc76ad595736c8d08a7fd8ff80d7e9588a55 --- /dev/null +++ b/adversarial_text/data/document_generators.py @@ -0,0 +1,383 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Input readers and document/token generators for datasets.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from collections import namedtuple +import csv +import os +import random + +# Dependency imports + +import tensorflow as tf + +from adversarial_text.data import data_utils + +flags = tf.app.flags +FLAGS = flags.FLAGS + +flags.DEFINE_string('dataset', '', 'Which dataset to generate data for') + +# Preprocessing config +flags.DEFINE_boolean('output_unigrams', True, 'Whether to output unigrams.') +flags.DEFINE_boolean('output_bigrams', False, 'Whether to output bigrams.') +flags.DEFINE_boolean('output_char', False, 'Whether to output characters.') +flags.DEFINE_boolean('lowercase', True, 'Whether to lowercase document terms.') + +# IMDB +flags.DEFINE_string('imdb_input_dir', '', 'The input directory containing the ' + 'IMDB sentiment dataset.') +flags.DEFINE_integer('imdb_validation_pos_start_id', 10621, 'File id of the ' + 'first file in the pos sentiment validation set.') +flags.DEFINE_integer('imdb_validation_neg_start_id', 10625, 'File id of the ' + 'first file in the neg sentiment validation set.') + +# DBpedia +flags.DEFINE_string('dbpedia_input_dir', '', + 'Path to DBpedia directory containing train.csv and ' + 'test.csv.') + +# Reuters Corpus (rcv1) +flags.DEFINE_string('rcv1_input_dir', '', + 'Path to rcv1 directory containing train.csv, unlab.csv, ' + 'and test.csv.') + +# Rotten Tomatoes +flags.DEFINE_string('rt_input_dir', '', + 'The Rotten Tomatoes dataset input directory.') + +# The amazon reviews input file to use in either the RT or IMDB datasets. +flags.DEFINE_string('amazon_unlabeled_input_file', '', + 'The unlabeled Amazon Reviews dataset input file. If set, ' + 'the input file is used to augment RT and IMDB vocab.') + +Document = namedtuple('Document', + 'content is_validation is_test label add_tokens') + + +def documents(dataset='train', + include_unlabeled=False, + include_validation=False): + """Generates Documents based on FLAGS.dataset. + + Args: + dataset: str, identifies folder within IMDB data directory, test or train. + include_unlabeled: bool, whether to include the unsup directory. Only valid + when dataset=train. + include_validation: bool, whether to include validation data. + + Yields: + Document + + Raises: + ValueError: if include_unlabeled is true but dataset is not 'train' + """ + + if include_unlabeled and dataset != 'train': + raise ValueError('If include_unlabeled=True, must use train dataset') + + # Set the random seed so that we have the same validation set when running + # gen_data and gen_vocab. 
+ random.seed(302) + + ds = FLAGS.dataset + if ds == 'imdb': + docs_gen = imdb_documents + elif ds == 'dbpedia': + docs_gen = dbpedia_documents + elif ds == 'rcv1': + docs_gen = rcv1_documents + elif ds == 'rt': + docs_gen = rt_documents + else: + raise ValueError('Unrecognized dataset %s' % FLAGS.dataset) + + for doc in docs_gen(dataset, include_unlabeled, include_validation): + yield doc + + +def tokens(doc): + """Given a Document, produces character or word tokens. + + Tokens can be either characters, or word-level tokens (unigrams and/or + bigrams). + + Args: + doc: Document to produce tokens from. + + Yields: + token + + Raises: + ValueError: if all FLAGS.{output_unigrams, output_bigrams, output_char} + are False. + """ + if not (FLAGS.output_unigrams or FLAGS.output_bigrams or FLAGS.output_char): + raise ValueError( + 'At least one of {FLAGS.output_unigrams, FLAGS.output_bigrams, ' + 'FLAGS.output_char} must be true') + + content = doc.content.strip() + if FLAGS.lowercase: + content = content.lower() + + if FLAGS.output_char: + for char in content: + yield char + + else: + tokens_ = data_utils.split_by_punct(content) + for i, token in enumerate(tokens_): + if FLAGS.output_unigrams: + yield token + + if FLAGS.output_bigrams: + previous_token = (tokens_[i - 1] if i > 0 else data_utils.EOS_TOKEN) + bigram = '_'.join([previous_token, token]) + yield bigram + if (i + 1) == len(tokens_): + bigram = '_'.join([token, data_utils.EOS_TOKEN]) + yield bigram + + +def imdb_documents(dataset='train', + include_unlabeled=False, + include_validation=False): + """Generates Documents for IMDB dataset. + + Data from http://ai.stanford.edu/~amaas/data/sentiment/ + + Args: + dataset: str, identifies folder within IMDB data directory, test or train. + include_unlabeled: bool, whether to include the unsup directory. Only valid + when dataset=train. + include_validation: bool, whether to include validation data. + + Yields: + Document + + Raises: + ValueError: if FLAGS.imdb_input_dir is empty. + """ + if not FLAGS.imdb_input_dir: + raise ValueError('Must provide FLAGS.imdb_input_dir') + + tf.logging.info('Generating IMDB documents...') + + def check_is_validation(filename, class_label): + if class_label is None: + return False + file_idx = int(filename.split('_')[0]) + is_pos_valid = (class_label and + file_idx >= FLAGS.imdb_validation_pos_start_id) + is_neg_valid = (not class_label and + file_idx >= FLAGS.imdb_validation_neg_start_id) + return is_pos_valid or is_neg_valid + + dirs = [(dataset + '/pos', True), (dataset + '/neg', False)] + if include_unlabeled: + dirs.append(('train/unsup', None)) + + for d, class_label in dirs: + for filename in os.listdir(os.path.join(FLAGS.imdb_input_dir, d)): + is_validation = check_is_validation(filename, class_label) + if is_validation and not include_validation: + continue + + with open(os.path.join(FLAGS.imdb_input_dir, d, filename)) as imdb_f: + content = imdb_f.read() + yield Document( + content=content, + is_validation=is_validation, + is_test=False, + label=class_label, + add_tokens=True) + + if FLAGS.amazon_unlabeled_input_file and include_unlabeled: + with open(FLAGS.amazon_unlabeled_input_file) as rt_f: + for content in rt_f: + yield Document( + content=content, + is_validation=False, + is_test=False, + label=None, + add_tokens=False) + + +def dbpedia_documents(dataset='train', + include_unlabeled=False, + include_validation=False): + """Generates Documents for DBpedia dataset. + + Dataset linked to at https://github.com/zhangxiangxiao/Crepe. 
+ + Args: + dataset: str, identifies the csv file within the DBpedia data directory, + test or train. + include_unlabeled: bool, unused. + include_validation: bool, whether to include validation data, which is a + randomly selected 10% of the data. + + Yields: + Document + + Raises: + ValueError: if FLAGS.dbpedia_input_dir is empty. + """ + del include_unlabeled + + if not FLAGS.dbpedia_input_dir: + raise ValueError('Must provide FLAGS.dbpedia_input_dir') + + tf.logging.info('Generating DBpedia documents...') + + with open(os.path.join(FLAGS.dbpedia_input_dir, dataset + '.csv')) as db_f: + reader = csv.reader(db_f) + for row in reader: + # 10% of the data is randomly held out + is_validation = random.randint(1, 10) == 1 + if is_validation and not include_validation: + continue + + content = row[1] + ' ' + row[2] + yield Document( + content=content, + is_validation=is_validation, + is_test=False, + label=int(row[0]), + add_tokens=True) + + +def rcv1_documents(dataset='train', + include_unlabeled=True, + include_validation=False): + # pylint:disable=line-too-long + """Generates Documents for Reuters Corpus (rcv1) dataset. + + Dataset described at + http://www.ai.mit.edu/projects/jmlr/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.htm + + Args: + dataset: str, identifies the csv file within the rcv1 data directory. + include_unlabeled: bool, whether to include the unlab file. Only valid + when dataset=train. + include_validation: bool, whether to include validation data, which is a + randomly selected 10% of the data. + + Yields: + Document + + Raises: + ValueError: if FLAGS.rcv1_input_dir is empty. + """ + # pylint:enable=line-too-long + + if not FLAGS.rcv1_input_dir: + raise ValueError('Must provide FLAGS.rcv1_input_dir') + + tf.logging.info('Generating rcv1 documents...') + + datasets = [dataset] + if include_unlabeled: + if dataset == 'train': + datasets.append('unlab') + for dset in datasets: + with open(os.path.join(FLAGS.rcv1_input_dir, dset + '.csv')) as db_f: + reader = csv.reader(db_f) + for row in reader: + # 10% of the data is randomly held out + is_validation = random.randint(1, 10) == 1 + if is_validation and not include_validation: + continue + + content = row[1] + yield Document( + content=content, + is_validation=is_validation, + is_test=False, + label=int(row[0]), + add_tokens=True) + + +def rt_documents(dataset='train', + include_unlabeled=True, + include_validation=False): + # pylint:disable=line-too-long + """Generates Documents for the Rotten Tomatoes dataset. + + Dataset available at http://www.cs.cornell.edu/people/pabo/movie-review-data/ + In this dataset, amazon reviews are used for the unlabeled data. + + Args: + dataset: str, identifies the data subdirectory. + include_unlabeled: bool, whether to include the unlabeled data. Only valid + when dataset=train. + include_validation: bool, whether to include validation data, which is a + randomly selected 10% of the data. + + Yields: + Document + + Raises: + ValueError: if FLAGS.rt_input_dir is empty. 
+ """ + # pylint:enable=line-too-long + + if not FLAGS.rt_input_dir: + raise ValueError('Must provide FLAGS.rt_input_dir') + + tf.logging.info('Generating rt documents...') + + data_files = [] + input_filenames = os.listdir(FLAGS.rt_input_dir) + for inp_fname in input_filenames: + if inp_fname.endswith('.pos'): + data_files.append((os.path.join(FLAGS.rt_input_dir, inp_fname), True)) + elif inp_fname.endswith('.neg'): + data_files.append((os.path.join(FLAGS.rt_input_dir, inp_fname), False)) + if include_unlabeled and FLAGS.amazon_unlabeled_input_file: + data_files.append((FLAGS.amazon_unlabeled_input_file, None)) + + for filename, class_label in data_files: + with open(filename) as rt_f: + for content in rt_f: + if class_label is None: + # Process Amazon Review data for unlabeled dataset + if content.startswith('review/text'): + yield Document( + content=content, + is_validation=False, + is_test=False, + label=None, + add_tokens=False) + else: + # 10% of the data is randomly held out for the validation set and + # another 10% of it is randomly held out for the test set + random_int = random.randint(1, 10) + is_validation = random_int == 1 + is_test = random_int == 2 + if (is_test and dataset != 'test') or (is_validation and + not include_validation): + continue + + yield Document( + content=content, + is_validation=is_validation, + is_test=is_test, + label=class_label, + add_tokens=True) diff --git a/adversarial_text/data/gen_data.py b/adversarial_text/data/gen_data.py new file mode 100644 index 0000000000000000000000000000000000000000..66aa141a1ba01ad7c15ab82df8453bbf2ab0352d --- /dev/null +++ b/adversarial_text/data/gen_data.py @@ -0,0 +1,217 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Create TFRecord files of SequenceExample protos from dataset. + +Constructs 3 datasets: + 1. Labeled data for the LSTM classification model, optionally with label gain. + "*_classification.tfrecords" (for both unidirectional and bidirectional + models). + 2. Data for the unsupervised LM-LSTM model that predicts the next token. + "*_lm.tfrecords" (generates forward and reverse data). + 3. Data for the unsupervised SA-LSTM model that uses Seq2Seq. + "*_sa.tfrecords". +""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import string + +# Dependency imports + +import tensorflow as tf + +from adversarial_text.data import data_utils +from adversarial_text.data import document_generators + +data = data_utils +flags = tf.app.flags +FLAGS = flags.FLAGS + +# Flags for input data are in document_generators.py +flags.DEFINE_string('vocab_file', '', 'Path to the vocabulary file. Defaults ' + 'to FLAGS.output_dir/vocab.txt.') +flags.DEFINE_string('output_dir', '', 'Path to save tfrecords.') + +# Config +flags.DEFINE_boolean('label_gain', False, + 'Enable linear label gain. 
If True, sentiment label will ' + 'be included at each timestep with linear weight ' + 'increase.') + + +def build_shuffling_tf_record_writer(fname): + return data.ShufflingTFRecordWriter(os.path.join(FLAGS.output_dir, fname)) + + +def build_tf_record_writer(fname): + return tf.python_io.TFRecordWriter(os.path.join(FLAGS.output_dir, fname)) + + +def build_input_sequence(doc, vocab_ids): + """Builds input sequence from file. + + Splits lines on whitespace. Treats punctuation as whitespace. For word-level + sequences, only keeps terms that are in the vocab. + + Terms are added as token in the SequenceExample. The EOS_TOKEN is also + appended. Label and weight features are set to 0. + + Args: + doc: Document (defined in `document_generators`) from which to build the + sequence. + vocab_ids: dict. + + Returns: + SequenceExampleWrapper. + """ + seq = data.SequenceWrapper() + for token in document_generators.tokens(doc): + if token in vocab_ids: + seq.add_timestep().set_token(vocab_ids[token]) + + # Add EOS token to end + seq.add_timestep().set_token(vocab_ids[data.EOS_TOKEN]) + + return seq + + +def make_vocab_ids(vocab_filename): + if FLAGS.output_char: + ret = dict([(char, i) for i, char in enumerate(string.printable)]) + ret[data.EOS_TOKEN] = len(string.printable) + return ret + else: + with open(vocab_filename) as vocab_f: + return dict([(line.strip(), i) for i, line in enumerate(vocab_f)]) + + +def generate_training_data(vocab_ids, writer_lm_all, writer_seq_ae_all): + """Generates training data.""" + + # Construct training data writers + writer_lm = build_shuffling_tf_record_writer(data.TRAIN_LM) + writer_seq_ae = build_shuffling_tf_record_writer(data.TRAIN_SA) + writer_class = build_shuffling_tf_record_writer(data.TRAIN_CLASS) + writer_valid_class = build_tf_record_writer(data.VALID_CLASS) + writer_rev_lm = build_shuffling_tf_record_writer(data.TRAIN_REV_LM) + writer_bd_class = build_shuffling_tf_record_writer(data.TRAIN_BD_CLASS) + writer_bd_valid_class = build_shuffling_tf_record_writer(data.VALID_BD_CLASS) + + for doc in document_generators.documents( + dataset='train', include_unlabeled=True, include_validation=True): + input_seq = build_input_sequence(doc, vocab_ids) + if len(input_seq) < 2: + continue + rev_seq = data.build_reverse_sequence(input_seq) + lm_seq = data.build_lm_sequence(input_seq) + rev_lm_seq = data.build_lm_sequence(rev_seq) + seq_ae_seq = data.build_seq_ae_sequence(input_seq) + if doc.label is not None: + # Used for sentiment classification. 
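+      # With FLAGS.label_gain enabled, the label is attached to every
+      # timestep with a linearly increasing weight; it is always disabled for
+      # validation examples.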
+ label_seq = data.build_labeled_sequence( + input_seq, + doc.label, + label_gain=(FLAGS.label_gain and not doc.is_validation)) + bd_label_seq = data.build_labeled_sequence( + data.build_bidirectional_seq(input_seq, rev_seq), + doc.label, + label_gain=(FLAGS.label_gain and not doc.is_validation)) + class_writer = writer_valid_class if doc.is_validation else writer_class + bd_class_writer = (writer_bd_valid_class + if doc.is_validation else writer_bd_class) + class_writer.write(label_seq.seq.SerializeToString()) + bd_class_writer.write(bd_label_seq.seq.SerializeToString()) + + # Write + lm_seq_ser = lm_seq.seq.SerializeToString() + seq_ae_seq_ser = seq_ae_seq.seq.SerializeToString() + writer_lm_all.write(lm_seq_ser) + writer_seq_ae_all.write(seq_ae_seq_ser) + if not doc.is_validation: + writer_lm.write(lm_seq_ser) + writer_rev_lm.write(rev_lm_seq.seq.SerializeToString()) + writer_seq_ae.write(seq_ae_seq_ser) + + # Close writers + writer_lm.close() + writer_seq_ae.close() + writer_class.close() + writer_valid_class.close() + writer_rev_lm.close() + writer_bd_class.close() + writer_bd_valid_class.close() + + +def generate_test_data(vocab_ids, writer_lm_all, writer_seq_ae_all): + """Generates test data.""" + # Construct test data writers + writer_lm = build_shuffling_tf_record_writer(data.TEST_LM) + writer_rev_lm = build_shuffling_tf_record_writer(data.TEST_REV_LM) + writer_seq_ae = build_shuffling_tf_record_writer(data.TEST_SA) + writer_class = build_tf_record_writer(data.TEST_CLASS) + writer_bd_class = build_shuffling_tf_record_writer(data.TEST_BD_CLASS) + + for doc in document_generators.documents( + dataset='test', include_unlabeled=False, include_validation=True): + input_seq = build_input_sequence(doc, vocab_ids) + if len(input_seq) < 2: + continue + rev_seq = data.build_reverse_sequence(input_seq) + lm_seq = data.build_lm_sequence(input_seq) + rev_lm_seq = data.build_lm_sequence(rev_seq) + seq_ae_seq = data.build_seq_ae_sequence(input_seq) + label_seq = data.build_labeled_sequence(input_seq, doc.label) + bd_label_seq = data.build_labeled_sequence( + data.build_bidirectional_seq(input_seq, rev_seq), doc.label) + + # Write + writer_class.write(label_seq.seq.SerializeToString()) + writer_bd_class.write(bd_label_seq.seq.SerializeToString()) + lm_seq_ser = lm_seq.seq.SerializeToString() + seq_ae_seq_ser = seq_ae_seq.seq.SerializeToString() + writer_lm.write(lm_seq_ser) + writer_rev_lm.write(rev_lm_seq.seq.SerializeToString()) + writer_seq_ae.write(seq_ae_seq_ser) + writer_lm_all.write(lm_seq_ser) + writer_seq_ae_all.write(seq_ae_seq_ser) + + # Close test writers + writer_lm.close() + writer_rev_lm.close() + writer_seq_ae.close() + writer_class.close() + writer_bd_class.close() + + +def main(_): + tf.logging.set_verbosity(tf.logging.INFO) + tf.logging.info('Assigning vocabulary ids...') + vocab_ids = make_vocab_ids( + FLAGS.vocab_file or os.path.join(FLAGS.output_dir, 'vocab.txt')) + + with build_shuffling_tf_record_writer(data.ALL_LM) as writer_lm_all: + with build_shuffling_tf_record_writer(data.ALL_SA) as writer_seq_ae_all: + + tf.logging.info('Generating training data...') + generate_training_data(vocab_ids, writer_lm_all, writer_seq_ae_all) + + tf.logging.info('Generating test data...') + generate_test_data(vocab_ids, writer_lm_all, writer_seq_ae_all) + + +if __name__ == '__main__': + tf.app.run() diff --git a/adversarial_text/data/gen_vocab.py b/adversarial_text/data/gen_vocab.py new file mode 100644 index 
0000000000000000000000000000000000000000..2ee3e2cd0f6c1f8b750292e92d0c0440642e334a --- /dev/null +++ b/adversarial_text/data/gen_vocab.py @@ -0,0 +1,100 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Generates vocabulary and term frequency files for datasets.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from collections import defaultdict + +# Dependency imports + +import tensorflow as tf + +from adversarial_text.data import data_utils +from adversarial_text.data import document_generators + +flags = tf.app.flags +FLAGS = flags.FLAGS + +# Flags controlling input are in document_generators.py + +flags.DEFINE_string('output_dir', '', + 'Path to save vocab.txt and vocab_freq.txt.') + +flags.DEFINE_boolean('use_unlabeled', True, 'Whether to use the ' + 'unlabeled sentiment dataset in the vocabulary.') +flags.DEFINE_boolean('include_validation', False, 'Whether to include the ' + 'validation set in the vocabulary.') +flags.DEFINE_integer('doc_count_threshold', 1, 'The minimum number of ' + 'documents a word or bigram should occur in to keep ' + 'it in the vocabulary.') + +MAX_VOCAB_SIZE = 100 * 1000 + + +def fill_vocab_from_doc(doc, vocab_freqs, doc_counts): + """Fills vocabulary and doc counts with tokens from doc. + + Args: + doc: Document to read tokens from. + vocab_freqs: dict + doc_counts: dict + + Returns: + None + """ + doc_seen = set() + + for token in document_generators.tokens(doc): + if doc.add_tokens or token in vocab_freqs: + vocab_freqs[token] += 1 + if token not in doc_seen: + doc_counts[token] += 1 + doc_seen.add(token) + + +def main(_): + tf.logging.set_verbosity(tf.logging.INFO) + vocab_freqs = defaultdict(int) + doc_counts = defaultdict(int) + + # Fill vocabulary frequencies map and document counts map + for doc in document_generators.documents( + dataset='train', + include_unlabeled=FLAGS.use_unlabeled, + include_validation=FLAGS.include_validation): + fill_vocab_from_doc(doc, vocab_freqs, doc_counts) + + # Filter out low-occurring terms + vocab_freqs = dict((term, freq) for term, freq in vocab_freqs.iteritems() + if doc_counts[term] > FLAGS.doc_count_threshold) + + # Sort by frequency + ordered_vocab_freqs = data_utils.sort_vocab_by_frequency(vocab_freqs) + + # Limit vocab size + ordered_vocab_freqs = ordered_vocab_freqs[:MAX_VOCAB_SIZE] + + # Add EOS token + ordered_vocab_freqs.append((data_utils.EOS_TOKEN, 1)) + + # Write + tf.gfile.MakeDirs(FLAGS.output_dir) + data_utils.write_vocab_and_frequency(ordered_vocab_freqs, FLAGS.output_dir) + + +if __name__ == '__main__': + tf.app.run() diff --git a/adversarial_text/evaluate.py b/adversarial_text/evaluate.py new file mode 100644 index 0000000000000000000000000000000000000000..a6480ca7466b4b864195c699aa4063b10b4c9b73 --- /dev/null +++ b/adversarial_text/evaluate.py @@ -0,0 +1,138 @@ +# Copyright 2017 Google Inc. 
All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Evaluates text classification model.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import time + +# Dependency imports + +import tensorflow as tf + +import graphs + +flags = tf.app.flags +FLAGS = flags.FLAGS + +flags.DEFINE_string('master', '', + 'BNS name prefix of the Tensorflow eval master, ' + 'or "local".') +flags.DEFINE_string('eval_dir', '/tmp/text_eval', + 'Directory where to write event logs.') +flags.DEFINE_string('eval_data', 'test', 'Specify which dataset is used. ' + '("train", "valid", "test") ') + +flags.DEFINE_string('checkpoint_dir', '/tmp/text_train', + 'Directory where to read model checkpoints.') +flags.DEFINE_integer('eval_interval_secs', 60, 'How often to run the eval.') +flags.DEFINE_integer('num_examples', 32, 'Number of examples to run.') +flags.DEFINE_bool('run_once', False, 'Whether to run eval only once.') + + +def restore_from_checkpoint(sess, saver): + """Restore model from checkpoint. + + Args: + sess: Session. + saver: Saver for restoring the checkpoint. + + Returns: + bool: Whether the checkpoint was found and restored + """ + ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir) + if not ckpt or not ckpt.model_checkpoint_path: + tf.logging.info('No checkpoint found at %s', FLAGS.checkpoint_dir) + return False + + saver.restore(sess, ckpt.model_checkpoint_path) + return True + + +def run_eval(eval_ops, summary_writer, saver): + """Runs evaluation over FLAGS.num_examples examples. + + Args: + eval_ops: dict + summary_writer: Summary writer. + saver: Saver. + + Returns: + dict, with value being the average over all examples. 
+ """ + sv = tf.train.Supervisor(logdir=FLAGS.eval_dir, saver=None, summary_op=None) + with sv.managed_session( + master=FLAGS.master, start_standard_services=False) as sess: + if not restore_from_checkpoint(sess, saver): + return + sv.start_queue_runners(sess) + + metric_names, ops = zip(*eval_ops.items()) + value_ops, update_ops = zip(*ops) + + value_ops_dict = dict(zip(metric_names, value_ops)) + + # Run update ops + num_batches = int(math.ceil(FLAGS.num_examples / FLAGS.batch_size)) + tf.logging.info('Running %d batches for evaluation.', num_batches) + for i in range(num_batches): + if (i + 1) % 10 == 0: + tf.logging.info('Running batch %d/%d...', i + 1, num_batches) + if (i + 1) % 50 == 0: + _log_values(sess, value_ops_dict) + sess.run(update_ops) + + _log_values(sess, value_ops_dict, summary_writer=summary_writer) + + +def _log_values(sess, value_ops, summary_writer=None): + """Evaluate, log, and write summaries of the eval metrics in value_ops.""" + metric_names, value_ops = zip(*value_ops.items()) + values = sess.run(value_ops) + + tf.logging.info('Eval metric values:') + summary = tf.summary.Summary() + for name, val in zip(metric_names, values): + summary.value.add(tag=name, simple_value=val) + tf.logging.info('%s = %.3f', name, val) + + if summary_writer is not None: + global_step_val = sess.run(tf.train.get_global_step()) + summary_writer.add_summary(summary, global_step_val) + + +def main(_): + tf.logging.set_verbosity(tf.logging.INFO) + tf.gfile.MakeDirs(FLAGS.eval_dir) + tf.logging.info('Building eval graph...') + output = graphs.get_model().eval_graph(FLAGS.eval_data) + eval_ops, moving_averaged_variables = output + + saver = tf.train.Saver(moving_averaged_variables) + summary_writer = tf.summary.FileWriter( + FLAGS.eval_dir, graph=tf.get_default_graph()) + + while True: + run_eval(eval_ops, summary_writer, saver) + if FLAGS.run_once: + break + time.sleep(FLAGS.eval_interval_secs) + + +if __name__ == '__main__': + tf.app.run() diff --git a/adversarial_text/graphs.py b/adversarial_text/graphs.py new file mode 100644 index 0000000000000000000000000000000000000000..f6d049f178d89b021282cb8c433455943a48aa5b --- /dev/null +++ b/adversarial_text/graphs.py @@ -0,0 +1,664 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Virtual adversarial text models.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import csv +import os + +# Dependency imports + +import tensorflow as tf + +import adversarial_losses as adv_lib +import inputs as inputs_lib +import layers as layers_lib + +flags = tf.app.flags +FLAGS = flags.FLAGS + +# Flags governing adversarial training are defined in adversarial_losses.py. 
+
+# Classifier
+flags.DEFINE_integer('num_classes', 2, 'Number of classes for classification.')
+
+# Data path
+flags.DEFINE_string('data_dir', '/tmp/IMDB',
+ 'Directory path to preprocessed text dataset.')
+flags.DEFINE_string('vocab_freq_path', None,
+ 'Path to pre-calculated vocab frequency data. If '
+ 'None, use FLAGS.data_dir/vocab_freq.txt.')
+flags.DEFINE_integer('batch_size', 64, 'Size of the batch.')
+flags.DEFINE_integer('num_timesteps', 100, 'Number of timesteps for BPTT.')
+
+# Model architecture
+flags.DEFINE_bool('bidir_lstm', False, 'Whether to build a bidirectional LSTM.')
+flags.DEFINE_integer('rnn_num_layers', 1, 'Number of LSTM layers.')
+flags.DEFINE_integer('rnn_cell_size', 512,
+ 'Number of hidden units in the LSTM.')
+flags.DEFINE_integer('cl_num_layers', 1,
+ 'Number of hidden layers of classification model.')
+flags.DEFINE_integer('cl_hidden_size', 30,
+ 'Number of hidden units in classification layer.')
+flags.DEFINE_integer('num_candidate_samples', -1,
+ 'Number of candidate samples used in the sampled '
+ 'output layer; -1 means a full softmax.')
+flags.DEFINE_bool('use_seq2seq_autoencoder', False,
+ 'If True, seq2seq auto-encoder is used to pretrain. '
+ 'If False, standard language model is used.')
+
+# Vocabulary and embeddings
+flags.DEFINE_integer('embedding_dims', 256, 'Dimensions of embedded vector.')
+flags.DEFINE_integer('vocab_size', 86934,
+ 'The size of the vocabulary. This value must match '
+ 'the vocabulary size of the preprocessed dataset; '
+ 'the last index (vocab_size - 1) is reserved for '
+ 'the EOS token.')
+flags.DEFINE_bool('normalize_embeddings', True,
+ 'Normalize word embeddings by vocab frequency.')
+
+# Optimization
+flags.DEFINE_float('learning_rate', 0.001, 'Learning rate while fine-tuning.')
+flags.DEFINE_float('learning_rate_decay_factor', 1.0,
+ 'Learning rate decay factor.')
+flags.DEFINE_boolean('sync_replicas', False,
+ 'Whether to use synchronous replica training.')
+flags.DEFINE_integer('replicas_to_aggregate', 1,
+ 'The number of replicas to aggregate.')
+
+# Regularization
+flags.DEFINE_float('max_grad_norm', 1.0,
+ 'Clip the global gradient norm to this value.')
+flags.DEFINE_float('keep_prob_emb', 1.0, 'Keep probability on embedding layer. '
+ '0.5 is optimal on IMDB with virtual adversarial training.')
+flags.DEFINE_float('keep_prob_lstm_out', 1.0,
+ 'Keep probability on LSTM output.')
+flags.DEFINE_float('keep_prob_cl_hidden', 1.0,
+ 'Keep probability on classification hidden layer.')
+
+
+def get_model():
+ if FLAGS.bidir_lstm:
+ return VatxtBidirModel()
+ else:
+ return VatxtModel()
+
+
+class VatxtModel(object):
+ """Constructs training and evaluation graphs.
+
+ Main methods: `classifier_training()`, `language_model_training()`,
+ and `eval_graph()`.
+
+ Variable reuse is a critical part of the model, both for sharing variables
+ between the language model and the classifier, and for reusing variables for
+ the adversarial loss calculation. To ensure correct variable reuse, all
+ variables are created in Keras-style layers, wherein stateful layers (i.e.
+ layers with variables) are represented as callable instances of the Layer
+ class. Each time the Layer instance is called, it uses the same variables.
+
+ All Layers are constructed in the __init__ method and reused in the various
+ graph-building functions.
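+
+ A minimal usage sketch (evaluate.py drives `eval_graph()` in the same way):
+
+   model = get_model()
+   train_op, loss, global_step = model.classifier_training()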
+ """ + + def __init__(self, cl_logits_input_dim=None): + self.global_step = tf.contrib.framework.get_or_create_global_step() + self.vocab_freqs = _get_vocab_freqs() + + # Cache VatxtInput objects + self.cl_inputs = None + self.lm_inputs = None + + # Cache intermediate Tensors that are reused + self.tensors = {} + + # Construct layers which are reused in constructing the LM and + # Classification graphs. Instantiating them all once here ensures that + # variable reuse works correctly. + self.layers = {} + self.layers['embedding'] = layers_lib.Embedding( + FLAGS.vocab_size, FLAGS.embedding_dims, FLAGS.normalize_embeddings, + self.vocab_freqs, FLAGS.keep_prob_emb) + self.layers['lstm'] = layers_lib.LSTM( + FLAGS.rnn_cell_size, FLAGS.rnn_num_layers, FLAGS.keep_prob_lstm_out) + self.layers['lm_loss'] = layers_lib.SoftmaxLoss( + FLAGS.vocab_size, + FLAGS.num_candidate_samples, + self.vocab_freqs, + name='LM_loss') + + cl_logits_input_dim = cl_logits_input_dim or FLAGS.rnn_cell_size + self.layers['cl_logits'] = layers_lib.cl_logits_subgraph( + [FLAGS.cl_hidden_size] * FLAGS.cl_num_layers, cl_logits_input_dim, + FLAGS.num_classes, FLAGS.keep_prob_cl_hidden) + + @property + def pretrained_variables(self): + return (self.layers['embedding'].trainable_weights + + self.layers['lstm'].trainable_weights) + + def classifier_training(self): + loss = self.classifier_graph() + train_op = optimize(loss, self.global_step) + return train_op, loss, self.global_step + + def language_model_training(self): + loss = self.language_model_graph() + train_op = optimize(loss, self.global_step) + return train_op, loss, self.global_step + + def classifier_graph(self): + """Constructs classifier graph from inputs to classifier loss. + + * Caches the VatxtInput object in `self.cl_inputs` + * Caches tensors: `cl_embedded`, `cl_logits`, `cl_loss` + + Returns: + loss: scalar float. + """ + inputs = _inputs('train', pretrain=False) + self.cl_inputs = inputs + embedded = self.layers['embedding'](inputs.tokens) + self.tensors['cl_embedded'] = embedded + + _, next_state, logits, loss = self.cl_loss_from_embedding( + embedded, return_intermediates=True) + tf.summary.scalar('classification_loss', loss) + self.tensors['cl_logits'] = logits + self.tensors['cl_loss'] = loss + + acc = layers_lib.accuracy(logits, inputs.labels, inputs.weights) + tf.summary.scalar('accuracy', acc) + + adv_loss = (self.adversarial_loss() * tf.constant( + FLAGS.adv_reg_coeff, name='adv_reg_coeff')) + tf.summary.scalar('adversarial_loss', adv_loss) + + total_loss = loss + adv_loss + tf.summary.scalar('total_classification_loss', total_loss) + + with tf.control_dependencies([inputs.save_state(next_state)]): + total_loss = tf.identity(total_loss) + + return total_loss + + def language_model_graph(self, compute_loss=True): + """Constructs LM graph from inputs to LM loss. + + * Caches the VatxtInput object in `self.lm_inputs` + * Caches tensors: `lm_embedded` + + Args: + compute_loss: bool, whether to compute and return the loss or stop after + the LSTM computation. + + Returns: + loss: scalar float. 
+ """ + inputs = _inputs('train', pretrain=True) + self.lm_inputs = inputs + return self._lm_loss(inputs, compute_loss=compute_loss) + + def _lm_loss(self, + inputs, + emb_key='lm_embedded', + lstm_layer='lstm', + lm_loss_layer='lm_loss', + loss_name='lm_loss', + compute_loss=True): + embedded = self.layers['embedding'](inputs.tokens) + self.tensors[emb_key] = embedded + lstm_out, next_state = self.layers[lstm_layer](embedded, inputs.state, + inputs.length) + if compute_loss: + loss = self.layers[lm_loss_layer]( + [lstm_out, inputs.labels, inputs.weights]) + with tf.control_dependencies([inputs.save_state(next_state)]): + loss = tf.identity(loss) + tf.summary.scalar(loss_name, loss) + + return loss + + def eval_graph(self, dataset='test'): + """Constructs classifier evaluation graph. + + Args: + dataset: the labeled dataset to evaluate, {'train', 'test', 'valid'}. + + Returns: + eval_ops: dict + var_restore_dict: dict mapping variable restoration names to variables. + Trainable variables will be mapped to their moving average names. + """ + inputs = _inputs(dataset, pretrain=False) + embedded = self.layers['embedding'](inputs.tokens) + _, next_state, logits, _ = self.cl_loss_from_embedding( + embedded, inputs=inputs, return_intermediates=True) + + eval_ops = { + 'accuracy': + tf.contrib.metrics.streaming_accuracy( + layers_lib.predictions(logits), inputs.labels, inputs.weights) + } + + with tf.control_dependencies([inputs.save_state(next_state)]): + acc, acc_update = eval_ops['accuracy'] + acc_update = tf.identity(acc_update) + eval_ops['accuracy'] = (acc, acc_update) + + var_restore_dict = make_restore_average_vars_dict() + return eval_ops, var_restore_dict + + def cl_loss_from_embedding(self, + embedded, + inputs=None, + return_intermediates=False): + """Compute classification loss from embedding. + + Args: + embedded: 3-D float Tensor [batch_size, num_timesteps, embedding_dim] + inputs: VatxtInput, defaults to self.cl_inputs. + return_intermediates: bool, whether to return intermediate tensors or only + the final loss. + + Returns: + If return_intermediates is True: + lstm_out, next_state, logits, loss + Else: + loss + """ + if inputs is None: + inputs = self.cl_inputs + + lstm_out, next_state = self.layers['lstm'](embedded, inputs.state, + inputs.length) + logits = self.layers['cl_logits'](lstm_out) + loss = layers_lib.classification_loss(logits, inputs.labels, inputs.weights) + + if return_intermediates: + return lstm_out, next_state, logits, loss + else: + return loss + + def adversarial_loss(self): + """Compute adversarial loss based on FLAGS.adv_training_method.""" + + def random_perturbation_loss(): + return adv_lib.random_perturbation_loss(self.tensors['cl_embedded'], + self.cl_inputs.length, + self.cl_loss_from_embedding) + + def adversarial_loss(): + return adv_lib.adversarial_loss(self.tensors['cl_embedded'], + self.tensors['cl_loss'], + self.cl_loss_from_embedding) + + def virtual_adversarial_loss(): + """Computes virtual adversarial loss. + + Uses lm_inputs and constructs the language model graph if it hasn't yet + been constructed. + + Also ensures that the LM input states are saved for LSTM state-saving + BPTT. + + Returns: + loss: float scalar. 
+ """ + if self.lm_inputs is None: + self.language_model_graph(compute_loss=False) + + def logits_from_embedding(embedded, return_next_state=False): + _, next_state, logits, _ = self.cl_loss_from_embedding( + embedded, inputs=self.lm_inputs, return_intermediates=True) + if return_next_state: + return next_state, logits + else: + return logits + + next_state, lm_cl_logits = logits_from_embedding( + self.tensors['lm_embedded'], return_next_state=True) + + va_loss = adv_lib.virtual_adversarial_loss( + lm_cl_logits, self.tensors['lm_embedded'], self.lm_inputs, + logits_from_embedding) + + with tf.control_dependencies([self.lm_inputs.save_state(next_state)]): + va_loss = tf.identity(va_loss) + + return va_loss + + def combo_loss(): + return adversarial_loss() + virtual_adversarial_loss() + + adv_training_methods = { + # Random perturbation + 'rp': random_perturbation_loss, + # Adversarial training + 'at': adversarial_loss, + # Virtual adversarial training + 'vat': virtual_adversarial_loss, + # Both at and vat + 'atvat': combo_loss, + '': lambda: tf.constant(0.), + None: lambda: tf.constant(0.), + } + + with tf.name_scope('adversarial_loss'): + return adv_training_methods[FLAGS.adv_training_method]() + + +class VatxtBidirModel(VatxtModel): + """Extension of VatxtModel that supports bidirectional input.""" + + def __init__(self): + super(VatxtBidirModel, + self).__init__(cl_logits_input_dim=FLAGS.rnn_cell_size * 2) + + # Reverse LSTM and LM loss for bidirectional models + self.layers['lstm_reverse'] = layers_lib.LSTM( + FLAGS.rnn_cell_size, + FLAGS.rnn_num_layers, + FLAGS.keep_prob_lstm_out, + name='LSTM_Reverse') + self.layers['lm_loss_reverse'] = layers_lib.SoftmaxLoss( + FLAGS.vocab_size, + FLAGS.num_candidate_samples, + self.vocab_freqs, + name='LM_loss_reverse') + + @property + def pretrained_variables(self): + variables = super(VatxtBidirModel, self).pretrained_variables + variables.extend(self.layers['lstm_reverse'].trainable_weights) + return variables + + def classifier_graph(self): + """Constructs classifier graph from inputs to classifier loss. + + * Caches the VatxtInput objects in `self.cl_inputs` + * Caches tensors: `cl_embedded` (tuple of forward and reverse), `cl_logits`, + `cl_loss` + + Returns: + loss: scalar float. + """ + inputs = _inputs('train', pretrain=False, bidir=True) + self.cl_inputs = inputs + f_inputs, _ = inputs + + # Embed both forward and reverse with a shared embedding + embedded = [self.layers['embedding'](inp.tokens) for inp in inputs] + self.tensors['cl_embedded'] = embedded + + _, next_states, logits, loss = self.cl_loss_from_embedding( + embedded, return_intermediates=True) + tf.summary.scalar('classification_loss', loss) + self.tensors['cl_logits'] = logits + self.tensors['cl_loss'] = loss + + acc = layers_lib.accuracy(logits, f_inputs.labels, f_inputs.weights) + tf.summary.scalar('accuracy', acc) + + adv_loss = (self.adversarial_loss() * tf.constant( + FLAGS.adv_reg_coeff, name='adv_reg_coeff')) + tf.summary.scalar('adversarial_loss', adv_loss) + + total_loss = loss + adv_loss + tf.summary.scalar('total_classification_loss', total_loss) + + saves = [inp.save_state(state) for (inp, state) in zip(inputs, next_states)] + with tf.control_dependencies(saves): + total_loss = tf.identity(total_loss) + + return total_loss + + def language_model_graph(self, compute_loss=True): + """Constructs forward and reverse LM graphs from inputs to LM losses. 
+ + * Caches the VatxtInput objects in `self.lm_inputs` + * Caches tensors: `lm_embedded`, `lm_embedded_reverse` + + Args: + compute_loss: bool, whether to compute and return the loss or stop after + the LSTM computation. + + Returns: + loss: scalar float, sum of forward and reverse losses. + """ + inputs = _inputs('train', pretrain=True, bidir=True) + self.lm_inputs = inputs + f_inputs, r_inputs = inputs + f_loss = self._lm_loss(f_inputs, compute_loss=compute_loss) + r_loss = self._lm_loss( + r_inputs, + emb_key='lm_embedded_reverse', + lstm_layer='lstm_reverse', + lm_loss_layer='lm_loss_reverse', + loss_name='lm_loss_reverse', + compute_loss=compute_loss) + if compute_loss: + return f_loss + r_loss + + def eval_graph(self, dataset='test'): + """Constructs classifier evaluation graph. + + Args: + dataset: the labeled dataset to evaluate, {'train', 'test', 'valid'}. + + Returns: + eval_ops: dict + var_restore_dict: dict mapping variable restoration names to variables. + Trainable variables will be mapped to their moving average names. + """ + inputs = _inputs(dataset, pretrain=False, bidir=True) + embedded = [self.layers['embedding'](inp.tokens) for inp in inputs] + _, next_states, logits, _ = self.cl_loss_from_embedding( + embedded, inputs=inputs, return_intermediates=True) + f_inputs, _ = inputs + + eval_ops = { + 'accuracy': + tf.contrib.metrics.streaming_accuracy( + layers_lib.predictions(logits), f_inputs.labels, + f_inputs.weights) + } + + # Save states on accuracy update + saves = [inp.save_state(state) for (inp, state) in zip(inputs, next_states)] + with tf.control_dependencies(saves): + acc, acc_update = eval_ops['accuracy'] + acc_update = tf.identity(acc_update) + eval_ops['accuracy'] = (acc, acc_update) + + var_restore_dict = make_restore_average_vars_dict() + return eval_ops, var_restore_dict + + def cl_loss_from_embedding(self, + embedded, + inputs=None, + return_intermediates=False): + """Compute classification loss from embedding. + + Args: + embedded: Length 2 tuple of 3-D float Tensor + [batch_size, num_timesteps, embedding_dim]. + inputs: Length 2 tuple of VatxtInput, defaults to self.cl_inputs. + return_intermediates: bool, whether to return intermediate tensors or only + the final loss. + + Returns: + If return_intermediates is True: + lstm_out, next_states, logits, loss + Else: + loss + """ + if inputs is None: + inputs = self.cl_inputs + + out = [] + for (layer_name, emb, inp) in zip(['lstm', 'lstm_reverse'], embedded, + inputs): + out.append(self.layers[layer_name](emb, inp.state, inp.length)) + lstm_outs, next_states = zip(*out) + + # Concatenate output of forward and reverse LSTMs + lstm_out = tf.concat(lstm_outs, 1) + + logits = self.layers['cl_logits'](lstm_out) + f_inputs, _ = inputs # pylint: disable=unpacking-non-sequence + loss = layers_lib.classification_loss(logits, f_inputs.labels, + f_inputs.weights) + + if return_intermediates: + return lstm_out, next_states, logits, loss + else: + return loss + + def adversarial_loss(self): + """Compute adversarial loss based on FLAGS.adv_training_method.""" + + def random_perturbation_loss(): + return adv_lib.random_perturbation_loss_bidir(self.tensors['cl_embedded'], + self.cl_inputs[0].length, + self.cl_loss_from_embedding) + + def adversarial_loss(): + return adv_lib.adversarial_loss_bidir(self.tensors['cl_embedded'], + self.tensors['cl_loss'], + self.cl_loss_from_embedding) + + def virtual_adversarial_loss(): + """Computes virtual adversarial loss. 
+ + Uses lm_inputs and constructs the language model graph if it hasn't yet + been constructed. + + Also ensures that the LM input states are saved for LSTM state-saving + BPTT. + + Returns: + loss: float scalar. + """ + if self.lm_inputs is None: + self.language_model_graph(compute_loss=False) + + def logits_from_embedding(embedded, return_next_state=False): + _, next_states, logits, _ = self.cl_loss_from_embedding( + embedded, inputs=self.lm_inputs, return_intermediates=True) + if return_next_state: + return next_states, logits + else: + return logits + + lm_embedded = (self.tensors['lm_embedded'], + self.tensors['lm_embedded_reverse']) + next_states, lm_cl_logits = logits_from_embedding( + lm_embedded, return_next_state=True) + + va_loss = adv_lib.virtual_adversarial_loss_bidir( + lm_cl_logits, lm_embedded, self.lm_inputs, logits_from_embedding) + + saves = [ + inp.save_state(state) + for (inp, state) in zip(self.lm_inputs, next_states) + ] + with tf.control_dependencies(saves): + va_loss = tf.identity(va_loss) + + return va_loss + + def combo_loss(): + return adversarial_loss() + virtual_adversarial_loss() + + adv_training_methods = { + # Random perturbation + 'rp': random_perturbation_loss, + # Adversarial training + 'at': adversarial_loss, + # Virtual adversarial training + 'vat': virtual_adversarial_loss, + # Both at and vat + 'atvat': combo_loss, + '': lambda: tf.constant(0.), + None: lambda: tf.constant(0.), + } + + with tf.name_scope('adversarial_loss'): + return adv_training_methods[FLAGS.adv_training_method]() + + +def _inputs(dataset='train', pretrain=False, bidir=False): + return inputs_lib.inputs( + data_dir=FLAGS.data_dir, + phase=dataset, + bidir=bidir, + pretrain=pretrain, + use_seq2seq=pretrain and FLAGS.use_seq2seq_autoencoder, + state_size=FLAGS.rnn_cell_size, + num_layers=FLAGS.rnn_num_layers, + batch_size=FLAGS.batch_size, + unroll_steps=FLAGS.num_timesteps, + eos_id=FLAGS.vocab_size - 1) + + +def _get_vocab_freqs(): + """Returns vocab frequencies. + + Returns: + List of integers, length=FLAGS.vocab_size. + + Raises: + ValueError: if the length of the frequency file is not equal to the vocab + size, or if the file is not found. + """ + path = FLAGS.vocab_freq_path or os.path.join(FLAGS.data_dir, 'vocab_freq.txt') + + if tf.gfile.Exists(path): + with tf.gfile.Open(path) as f: + # Get pre-calculated frequencies of words. 
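+      # The last comma-separated field of each line is taken as the count, so
+      # both bare counts and 'term,count' lines are accepted.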
+ reader = csv.reader(f, quoting=csv.QUOTE_NONE) + freqs = [int(row[-1]) for row in reader] + if len(freqs) != FLAGS.vocab_size: + raise ValueError('Frequency file length %d != vocab size %d' % + (len(freqs), FLAGS.vocab_size)) + else: + if FLAGS.vocab_freq_path: + raise ValueError('vocab_freq_path not found') + freqs = [1] * FLAGS.vocab_size + + return freqs + + +def make_restore_average_vars_dict(): + """Returns dict mapping moving average names to variables.""" + var_restore_dict = {} + variable_averages = tf.train.ExponentialMovingAverage(0.999) + for v in tf.global_variables(): + if v in tf.trainable_variables(): + name = variable_averages.average_name(v) + else: + name = v.op.name + var_restore_dict[name] = v + return var_restore_dict + + +def optimize(loss, global_step): + return layers_lib.optimize( + loss, global_step, FLAGS.max_grad_norm, FLAGS.learning_rate, + FLAGS.learning_rate_decay_factor, FLAGS.sync_replicas, + FLAGS.replicas_to_aggregate, FLAGS.task) diff --git a/adversarial_text/graphs_test.py b/adversarial_text/graphs_test.py new file mode 100644 index 0000000000000000000000000000000000000000..433afbe743fea18103261263f602bec3504989eb --- /dev/null +++ b/adversarial_text/graphs_test.py @@ -0,0 +1,225 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""Tests for graphs.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from collections import defaultdict +import operator +import os +import random +import shutil +import string +import tempfile + +# Dependency imports + +import tensorflow as tf + +import graphs +from adversarial_text.data import data_utils + +flags = tf.app.flags +FLAGS = flags.FLAGS +data = data_utils + +flags.DEFINE_integer('task', 0, 'Task id; needed for SyncReplicas test') + + +def _build_random_vocabulary(vocab_size=100): + """Builds and returns a dict.""" + vocab = set() + while len(vocab) < (vocab_size - 1): + rand_word = ''.join( + random.choice(string.ascii_lowercase) + for _ in range(random.randint(1, 10))) + vocab.add(rand_word) + + vocab_ids = dict([(word, i) for i, word in enumerate(vocab)]) + vocab_ids[data.EOS_TOKEN] = vocab_size - 1 + return vocab_ids + + +def _build_random_sequence(vocab_ids): + seq_len = random.randint(10, 200) + ids = vocab_ids.values() + seq = data.SequenceWrapper() + for token_id in [random.choice(ids) for _ in range(seq_len)]: + seq.add_timestep().set_token(token_id) + return seq + + +def _build_vocab_frequencies(seqs, vocab_ids): + vocab_freqs = defaultdict(int) + ids_to_words = dict([(i, word) for word, i in vocab_ids.iteritems()]) + for seq in seqs: + for timestep in seq: + vocab_freqs[ids_to_words[timestep.token]] += 1 + + vocab_freqs[data.EOS_TOKEN] = 0 + return vocab_freqs + + +class GraphsTest(tf.test.TestCase): + """Test graph construction methods.""" + + @classmethod + def setUpClass(cls): + # Make model small + FLAGS.batch_size = 2 + FLAGS.num_timesteps = 3 + FLAGS.embedding_dims = 4 + FLAGS.rnn_num_layers = 2 + FLAGS.rnn_cell_size = 4 + FLAGS.cl_num_layers = 2 + FLAGS.cl_hidden_size = 4 + FLAGS.vocab_size = 10 + + # Set input/output flags + FLAGS.data_dir = tempfile.mkdtemp() + + # Build and write sequence files. 
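+    # A small random vocabulary and a handful of random sequences are written
+    # in the same TFRecord formats that gen_data.py produces, so the graph
+    # code reads them exactly as it would real data.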
+ vocab_ids = _build_random_vocabulary(FLAGS.vocab_size) + seqs = [_build_random_sequence(vocab_ids) for _ in range(5)] + seqs_label = [ + data.build_labeled_sequence(seq, random.choice([True, False])) + for seq in seqs + ] + seqs_lm = [data.build_lm_sequence(seq) for seq in seqs] + seqs_ae = [data.build_seq_ae_sequence(seq) for seq in seqs] + seqs_rev = [data.build_reverse_sequence(seq) for seq in seqs] + seqs_bidir = [ + data.build_bidirectional_seq(seq, rev) + for seq, rev in zip(seqs, seqs_rev) + ] + seqs_bidir_label = [ + data.build_labeled_sequence(bd_seq, random.choice([True, False])) + for bd_seq in seqs_bidir + ] + + filenames = [ + data.TRAIN_CLASS, data.TRAIN_LM, data.TRAIN_SA, data.TEST_CLASS, + data.TRAIN_REV_LM, data.TRAIN_BD_CLASS, data.TEST_BD_CLASS + ] + seq_lists = [ + seqs_label, seqs_lm, seqs_ae, seqs_label, seqs_rev, seqs_bidir, + seqs_bidir_label + ] + for fname, seq_list in zip(filenames, seq_lists): + with tf.python_io.TFRecordWriter( + os.path.join(FLAGS.data_dir, fname)) as writer: + for seq in seq_list: + writer.write(seq.seq.SerializeToString()) + + # Write vocab.txt and vocab_freq.txt + vocab_freqs = _build_vocab_frequencies(seqs, vocab_ids) + ordered_vocab_freqs = sorted( + vocab_freqs.items(), key=operator.itemgetter(1), reverse=True) + with open(os.path.join(FLAGS.data_dir, 'vocab.txt'), 'w') as vocab_f: + with open(os.path.join(FLAGS.data_dir, 'vocab_freq.txt'), 'w') as freq_f: + for word, freq in ordered_vocab_freqs: + vocab_f.write('{}\n'.format(word)) + freq_f.write('{}\n'.format(freq)) + + @classmethod + def tearDownClass(cls): + shutil.rmtree(FLAGS.data_dir) + + def setUp(self): + # Reset FLAGS + FLAGS.rnn_num_layers = 1 + FLAGS.sync_replicas = False + FLAGS.adv_training_method = None + FLAGS.num_candidate_samples = -1 + FLAGS.num_classes = 2 + FLAGS.use_seq2seq_autoencoder = False + + # Reset Graph + tf.reset_default_graph() + + def testClassifierGraph(self): + FLAGS.rnn_num_layers = 2 + model = graphs.VatxtModel() + train_op, _, _ = model.classifier_training() + # Pretrained vars: embedding + LSTM layers + self.assertEqual( + len(model.pretrained_variables), 1 + 2 * FLAGS.rnn_num_layers) + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + tf.train.start_queue_runners(sess) + sess.run(train_op) + + def testLanguageModelGraph(self): + train_op, _, _ = graphs.VatxtModel().language_model_training() + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + tf.train.start_queue_runners(sess) + sess.run(train_op) + + def testMulticlass(self): + FLAGS.num_classes = 10 + graphs.VatxtModel().classifier_graph() + + def testATMethods(self): + at_methods = [None, 'rp', 'at', 'vat', 'atvat'] + for method in at_methods: + FLAGS.adv_training_method = method + with tf.Graph().as_default(): + graphs.VatxtModel().classifier_graph() + + # Ensure variables have been reused + # Embedding + LSTM layers + hidden layers + logits layer + expected_num_vars = 1 + 2 * FLAGS.rnn_num_layers + 2 * ( + FLAGS.cl_num_layers) + 2 + self.assertEqual(len(tf.trainable_variables()), expected_num_vars) + + def testSyncReplicas(self): + FLAGS.sync_replicas = True + graphs.VatxtModel().language_model_training() + + def testCandidateSampling(self): + FLAGS.num_candidate_samples = 10 + graphs.VatxtModel().language_model_training() + + def testSeqAE(self): + FLAGS.use_seq2seq_autoencoder = True + graphs.VatxtModel().language_model_training() + + def testBidirLM(self): + graphs.VatxtBidirModel().language_model_graph() + + def 
testBidirClassifier(self): + at_methods = [None, 'rp', 'at', 'vat', 'atvat'] + for method in at_methods: + FLAGS.adv_training_method = method + with tf.Graph().as_default(): + graphs.VatxtBidirModel().classifier_graph() + + # Ensure variables have been reused + # Embedding + 2 LSTM layers + hidden layers + logits layer + expected_num_vars = 1 + 2 * 2 * FLAGS.rnn_num_layers + 2 * ( + FLAGS.cl_num_layers) + 2 + self.assertEqual(len(tf.trainable_variables()), expected_num_vars) + + def testEvalGraph(self): + _, _ = graphs.VatxtModel().eval_graph() + + def testBidirEvalGraph(self): + _, _ = graphs.VatxtBidirModel().eval_graph() + + +if __name__ == '__main__': + tf.test.main() diff --git a/adversarial_text/inputs.py b/adversarial_text/inputs.py new file mode 100644 index 0000000000000000000000000000000000000000..5a2e462cb0718bc23a819617f5e797cbc8d90753 --- /dev/null +++ b/adversarial_text/inputs.py @@ -0,0 +1,353 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Input utils for virtual adversarial text classification.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os + +# Dependency imports + +import tensorflow as tf + +from adversarial_text.data import data_utils + + +class VatxtInput(object): + """Wrapper around NextQueuedSequenceBatch.""" + + def __init__(self, + batch, + state_name=None, + tokens=None, + num_states=0, + eos_id=None): + """Construct VatxtInput. + + Args: + batch: NextQueuedSequenceBatch. + state_name: str, name of state to fetch and save. + tokens: int Tensor, tokens. Defaults to batch's F_TOKEN_ID sequence. + num_states: int The number of states to store. + eos_id: int Id of end of Sequence. + """ + self._batch = batch + self._state_name = state_name + self._tokens = (tokens if tokens is not None else + batch.sequences[data_utils.SequenceWrapper.F_TOKEN_ID]) + self._num_states = num_states + + # Once the tokens have passed through embedding and LSTM, the output Tensor + # shapes will be time-major, i.e. shape = (time, batch, dim). Here we make + # both weights and labels time-major with a transpose, and then merge the + # time and batch dimensions such that they are both vectors of shape + # (time*batch). 
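+    # For example, with batch_size=2 and unroll_steps=3 the (2, 3) weight
+    # matrix becomes (3, 2) after the transpose and a flat vector of length 6
+    # after the reshape, matching the flattened time-major LSTM outputs.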
+ w = batch.sequences[data_utils.SequenceWrapper.F_WEIGHT] + w = tf.transpose(w, [1, 0]) + w = tf.reshape(w, [-1]) + self._weights = w + + l = batch.sequences[data_utils.SequenceWrapper.F_LABEL] + l = tf.transpose(l, [1, 0]) + l = tf.reshape(l, [-1]) + self._labels = l + + # eos weights + self._eos_weights = None + if eos_id: + ew = tf.cast(tf.equal(self._tokens, eos_id), tf.float32) + ew = tf.transpose(ew, [1, 0]) + ew = tf.reshape(ew, [-1]) + self._eos_weights = ew + + @property + def tokens(self): + return self._tokens + + @property + def weights(self): + return self._weights + + @property + def eos_weights(self): + return self._eos_weights + + @property + def labels(self): + return self._labels + + @property + def length(self): + return self._batch.length + + @property + def state_name(self): + return self._state_name + + @property + def state(self): + # LSTM tuple states + state_names = _get_tuple_state_names(self._num_states, self._state_name) + return tuple([ + tf.contrib.rnn.LSTMStateTuple( + self._batch.state(c_name), self._batch.state(h_name)) + for c_name, h_name in state_names + ]) + + def save_state(self, value): + # LSTM tuple states + state_names = _get_tuple_state_names(self._num_states, self._state_name) + save_ops = [] + for (c_state, h_state), (c_name, h_name) in zip(value, state_names): + save_ops.append(self._batch.save_state(c_name, c_state)) + save_ops.append(self._batch.save_state(h_name, h_state)) + return tf.group(*save_ops) + + +def _get_tuple_state_names(num_states, base_name): + """Returns state names for use with LSTM tuple state.""" + state_names = [('{}_{}_c'.format(i, base_name), '{}_{}_h'.format( + i, base_name)) for i in range(num_states)] + return state_names + + +def _split_bidir_tokens(batch): + tokens = batch.sequences[data_utils.SequenceWrapper.F_TOKEN_ID] + # Tokens have shape [batch, time, 2] + # forward and reverse have shape [batch, time]. + forward, reverse = [ + tf.squeeze(t, axis=[2]) for t in tf.split(tokens, 2, axis=2) + ] + return forward, reverse + + +def _filenames_for_data_spec(phase, bidir, pretrain, use_seq2seq): + """Returns input filenames for configuration. + + Args: + phase: str, 'train', 'test', or 'valid'. + bidir: bool, bidirectional model. + pretrain: bool, pretraining or classification. + use_seq2seq: bool, seq2seq data, only valid if pretrain=True. + + Returns: + Tuple of filenames. + + Raises: + ValueError: if an invalid combination of arguments is provided that does not + map to any data files (e.g. pretrain=False, use_seq2seq=True). 
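+
+    For example, (phase='train', bidir=False, pretrain=True,
+    use_seq2seq=False) maps to the single file (data_utils.TRAIN_LM,).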
+ """ + data_spec = (phase, bidir, pretrain, use_seq2seq) + data_specs = { + ('train', True, True, False): (data_utils.TRAIN_LM, + data_utils.TRAIN_REV_LM), + ('train', True, False, False): (data_utils.TRAIN_BD_CLASS,), + ('train', False, True, False): (data_utils.TRAIN_LM,), + ('train', False, True, True): (data_utils.TRAIN_SA,), + ('train', False, False, False): (data_utils.TRAIN_CLASS,), + ('test', True, True, False): (data_utils.TEST_LM, + data_utils.TRAIN_REV_LM), + ('test', True, False, False): (data_utils.TEST_BD_CLASS,), + ('test', False, True, False): (data_utils.TEST_LM,), + ('test', False, True, True): (data_utils.TEST_SA,), + ('test', False, False, False): (data_utils.TEST_CLASS,), + ('valid', True, False, False): (data_utils.VALID_BD_CLASS,), + ('valid', False, False, False): (data_utils.VALID_CLASS,), + } + if data_spec not in data_specs: + raise ValueError( + 'Data specification (phase, bidir, pretrain, use_seq2seq) %s not ' + 'supported' % str(data_spec)) + + return data_specs[data_spec] + + +def _read_single_sequence_example(file_list, tokens_shape=None): + """Reads and parses SequenceExamples from TFRecord-encoded file_list.""" + tf.logging.info('Constructing TFRecordReader from files: %s', file_list) + file_queue = tf.train.string_input_producer(file_list) + reader = tf.TFRecordReader() + seq_key, serialized_record = reader.read(file_queue) + ctx, sequence = tf.parse_single_sequence_example( + serialized_record, + sequence_features={ + data_utils.SequenceWrapper.F_TOKEN_ID: + tf.FixedLenSequenceFeature(tokens_shape or [], dtype=tf.int64), + data_utils.SequenceWrapper.F_LABEL: + tf.FixedLenSequenceFeature([], dtype=tf.int64), + data_utils.SequenceWrapper.F_WEIGHT: + tf.FixedLenSequenceFeature([], dtype=tf.float32), + }) + return seq_key, ctx, sequence + + +def _read_and_batch(data_dir, + fname, + state_name, + state_size, + num_layers, + unroll_steps, + batch_size, + bidir_input=False): + """Inputs for text model. + + Args: + data_dir: str, directory containing TFRecord files of SequenceExample. + fname: str, input file name. + state_name: string, key for saved state of LSTM. + state_size: int, size of LSTM state. + num_layers: int, the number of layers in the LSTM. + unroll_steps: int, number of timesteps to unroll for TBTT. + batch_size: int, batch size. + bidir_input: bool, whether the input is bidirectional. If True, creates 2 + states, state_name and state_name + '_reverse'. + + Returns: + Instance of NextQueuedSequenceBatch + + Raises: + ValueError: if file for input specification is not found. + """ + data_path = os.path.join(data_dir, fname) + if not tf.gfile.Exists(data_path): + raise ValueError('Failed to find file: %s' % data_path) + + tokens_shape = [2] if bidir_input else [] + seq_key, ctx, sequence = _read_single_sequence_example( + [data_path], tokens_shape=tokens_shape) + # Set up stateful queue reader. 
+ state_names = _get_tuple_state_names(num_layers, state_name) + initial_states = {} + for c_state, h_state in state_names: + initial_states[c_state] = tf.zeros(state_size) + initial_states[h_state] = tf.zeros(state_size) + if bidir_input: + rev_state_names = _get_tuple_state_names(num_layers, + '{}_reverse'.format(state_name)) + for rev_c_state, rev_h_state in rev_state_names: + initial_states[rev_c_state] = tf.zeros(state_size) + initial_states[rev_h_state] = tf.zeros(state_size) + batch = tf.contrib.training.batch_sequences_with_states( + input_key=seq_key, + input_sequences=sequence, + input_context=ctx, + input_length=tf.shape(sequence['token_id'])[0], + initial_states=initial_states, + num_unroll=unroll_steps, + batch_size=batch_size, + allow_small_batch=False, + num_threads=4, + capacity=batch_size * 10, + make_keys_unique=True, + make_keys_unique_seed=29392) + return batch + + +def inputs(data_dir=None, + phase='train', + bidir=False, + pretrain=False, + use_seq2seq=False, + state_name='lstm', + state_size=None, + num_layers=0, + batch_size=32, + unroll_steps=100, + eos_id=None): + """Inputs for text model. + + Args: + data_dir: str, directory containing TFRecord files of SequenceExample. + phase: str, dataset for evaluation {'train', 'valid', 'test'}. + bidir: bool, bidirectional LSTM. + pretrain: bool, whether to read pretraining data or classification data. + use_seq2seq: bool, whether to read seq2seq data or the language model data. + state_name: string, key for saved state of LSTM. + state_size: int, size of LSTM state. + num_layers: int, the number of LSTM layers. + batch_size: int, batch size. + unroll_steps: int, number of timesteps to unroll for TBTT. + eos_id: int, id of end of sequence. used for the kl weights on vat + Returns: + Instance of VatxtInput (x2 if bidir=True and pretrain=True, i.e. forward and + reverse). + """ + with tf.name_scope('inputs'): + filenames = _filenames_for_data_spec(phase, bidir, pretrain, use_seq2seq) + + if bidir and pretrain: + # Bidirectional pretraining + # Requires separate forward and reverse language model data. 
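+      # i.e. the forward and reversed language model TFRecord files written
+      # by gen_data.py.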
+ forward_fname, reverse_fname = filenames + forward_batch = _read_and_batch(data_dir, forward_fname, state_name, + state_size, num_layers, unroll_steps, + batch_size) + state_name_rev = state_name + '_reverse' + reverse_batch = _read_and_batch(data_dir, reverse_fname, state_name_rev, + state_size, num_layers, unroll_steps, + batch_size) + forward_input = VatxtInput( + forward_batch, + state_name=state_name, + num_states=num_layers, + eos_id=eos_id) + reverse_input = VatxtInput( + reverse_batch, + state_name=state_name_rev, + num_states=num_layers, + eos_id=eos_id) + return forward_input, reverse_input + + elif bidir: + # Classifier bidirectional LSTM + # Shared data source, but separate token/state streams + fname, = filenames + batch = _read_and_batch( + data_dir, + fname, + state_name, + state_size, + num_layers, + unroll_steps, + batch_size, + bidir_input=True) + forward_tokens, reverse_tokens = _split_bidir_tokens(batch) + forward_input = VatxtInput( + batch, + state_name=state_name, + tokens=forward_tokens, + num_states=num_layers) + reverse_input = VatxtInput( + batch, + state_name=state_name + '_reverse', + tokens=reverse_tokens, + num_states=num_layers) + return forward_input, reverse_input + else: + # Unidirectional LM or classifier + fname, = filenames + batch = _read_and_batch( + data_dir, + fname, + state_name, + state_size, + num_layers, + unroll_steps, + batch_size, + bidir_input=False) + return VatxtInput( + batch, state_name=state_name, num_states=num_layers, eos_id=eos_id) diff --git a/adversarial_text/layers.py b/adversarial_text/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..f99f8e27ff2ef64bf5c10e219890df902b1ade66 --- /dev/null +++ b/adversarial_text/layers.py @@ -0,0 +1,394 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Layers for VatxtModel.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +# Dependency imports + +import tensorflow as tf +K = tf.contrib.keras + + +def cl_logits_subgraph(layer_sizes, input_size, num_classes, keep_prob=1.): + """Construct multiple ReLU layers with dropout and a linear layer.""" + subgraph = K.models.Sequential(name='cl_logits') + for i, layer_size in enumerate(layer_sizes): + if i == 0: + subgraph.add( + K.layers.Dense(layer_size, activation='relu', input_dim=input_size)) + else: + subgraph.add(K.layers.Dense(layer_size, activation='relu')) + + if keep_prob < 1.: + subgraph.add(K.layers.Dropout(1. 
- keep_prob)) + subgraph.add(K.layers.Dense(1 if num_classes == 2 else num_classes)) + return subgraph + + +class Embedding(K.layers.Layer): + """Embedding layer with frequency-based normalization and dropout.""" + + def __init__(self, + vocab_size, + embedding_dim, + normalize=False, + vocab_freqs=None, + keep_prob=1., + **kwargs): + self.vocab_size = vocab_size + self.embedding_dim = embedding_dim + self.normalized = normalize + self.keep_prob = keep_prob + + if normalize: + assert vocab_freqs is not None + self.vocab_freqs = tf.constant( + vocab_freqs, dtype=tf.float32, shape=(vocab_size, 1)) + + super(Embedding, self).__init__(**kwargs) + + def build(self, input_shape): + with tf.device('/cpu:0'): + self.var = self.add_weight( + shape=(self.vocab_size, self.embedding_dim), + initializer=tf.random_uniform_initializer(-1., 1.), + name='embedding') + + if self.normalized: + self.var = self._normalize(self.var) + + super(Embedding, self).build(input_shape) + + def call(self, x): + embedded = tf.nn.embedding_lookup(self.var, x) + if self.keep_prob < 1.: + shape = embedded.get_shape().as_list() + + # Use same dropout masks at each timestep with specifying noise_shape. + # This slightly improves performance. + # Please see https://arxiv.org/abs/1512.05287 for the theoretical + # explanation. + embedded = tf.nn.dropout( + embedded, self.keep_prob, noise_shape=(shape[0], 1, shape[2])) + return embedded + + def _normalize(self, emb): + weights = self.vocab_freqs / tf.reduce_sum(self.vocab_freqs) + mean = tf.reduce_sum(weights * emb, 0, keep_dims=True) + var = tf.reduce_sum(weights * tf.pow(emb - mean, 2.), 0, keep_dims=True) + stddev = tf.sqrt(1e-6 + var) + return (emb - mean) / stddev + + +class LSTM(object): + """LSTM layer using static_rnn. + + Exposes variables in `trainable_weights` property. 
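+
+  A typical call (shapes are illustrative), given embedded inputs x of shape
+  (batch_size, num_timesteps, embedding_dim):
+
+    lstm = LSTM(cell_size=1024, num_layers=1, keep_prob=0.5)
+    lstm_out, next_state = lstm(x, initial_state, seq_length)
+
+  lstm_out has shape (num_timesteps * batch_size, cell_size).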
+ """ + + def __init__(self, cell_size, num_layers=1, keep_prob=1., name='LSTM'): + self.cell_size = cell_size + self.num_layers = num_layers + self.keep_prob = keep_prob + self.reuse = None + self.trainable_weights = None + self.name = name + + def __call__(self, x, initial_state, seq_length): + with tf.variable_scope(self.name, reuse=self.reuse) as vs: + cell = tf.contrib.rnn.MultiRNNCell([ + tf.contrib.rnn.BasicLSTMCell( + self.cell_size, + forget_bias=0.0, + reuse=tf.get_variable_scope().reuse) + for _ in xrange(self.num_layers) + ]) + + # shape(x) = (batch_size, num_timesteps, embedding_dim) + # Convert into a time-major list for static_rnn + x = tf.unstack(tf.transpose(x, perm=[1, 0, 2])) + + lstm_out, next_state = tf.contrib.rnn.static_rnn( + cell, x, initial_state=initial_state, sequence_length=seq_length) + + # Merge time and batch dimensions + # shape(lstm_out) = timesteps * (batch_size, cell_size) + lstm_out = tf.concat(lstm_out, 0) + # shape(lstm_out) = (timesteps*batch_size, cell_size) + + if self.keep_prob < 1.: + lstm_out = tf.nn.dropout(lstm_out, self.keep_prob) + + if self.reuse is None: + self.trainable_weights = vs.global_variables() + + self.reuse = True + + return lstm_out, next_state + + +class SoftmaxLoss(K.layers.Layer): + """Softmax xentropy loss with candidate sampling.""" + + def __init__(self, + vocab_size, + num_candidate_samples=-1, + vocab_freqs=None, + **kwargs): + self.vocab_size = vocab_size + self.num_candidate_samples = num_candidate_samples + self.vocab_freqs = vocab_freqs + super(SoftmaxLoss, self).__init__(**kwargs) + + def build(self, input_shape): + input_shape = input_shape[0] + with tf.device('/cpu:0'): + self.lin_w = self.add_weight( + shape=(input_shape[-1], self.vocab_size), + name='lm_lin_w', + initializer=K.initializers.glorot_uniform()) + self.lin_b = self.add_weight( + shape=(self.vocab_size,), + name='lm_lin_b', + initializer=K.initializers.glorot_uniform()) + + super(SoftmaxLoss, self).build(input_shape) + + def call(self, inputs): + x, labels, weights = inputs + if self.num_candidate_samples > -1: + assert self.vocab_freqs is not None + labels = tf.expand_dims(labels, -1) + sampled = tf.nn.fixed_unigram_candidate_sampler( + true_classes=labels, + num_true=1, + num_sampled=self.num_candidate_samples, + unique=True, + range_max=self.vocab_size, + unigrams=self.vocab_freqs) + + lm_loss = tf.nn.sampled_softmax_loss( + weights=tf.transpose(self.lin_w), + biases=self.lin_b, + labels=labels, + inputs=x, + num_sampled=self.num_candidate_samples, + num_classes=self.vocab_size, + sampled_values=sampled) + else: + logits = tf.matmul(x, self.lin_w) + self.lin_b + lm_loss = tf.nn.sparse_softmax_cross_entropy_with_logits( + logits=logits, labels=labels) + + lm_loss = tf.identity( + tf.reduce_sum(lm_loss * weights) / _num_labels(weights), + name='lm_xentropy_loss') + return lm_loss + + +def classification_loss(logits, labels, weights): + """Computes cross entropy loss between logits and labels. + + Args: + logits: 2-D [timesteps*batch_size, m] float tensor, where m=1 if + num_classes=2, otherwise m=num_classes. + labels: 1-D [timesteps*batch_size] integer tensor. + weights: 1-D [timesteps*batch_size] float tensor. + + Returns: + Loss scalar of type float. 
+ """ + inner_dim = logits.get_shape().as_list()[-1] + with tf.name_scope('classifier_loss'): + # Logistic loss + if inner_dim == 1: + loss = tf.nn.sigmoid_cross_entropy_with_logits( + logits=tf.squeeze(logits), labels=tf.cast(labels, tf.float32)) + # Softmax loss + else: + loss = tf.nn.sparse_softmax_cross_entropy_with_logits( + logits=logits, labels=labels) + + num_lab = _num_labels(weights) + tf.summary.scalar('num_labels', num_lab) + return tf.identity( + tf.reduce_sum(weights * loss) / num_lab, name='classification_xentropy') + + +def accuracy(logits, targets, weights): + """Computes prediction accuracy. + + Args: + logits: 2-D classifier logits [timesteps*batch_size, num_classes] + targets: 1-D [timesteps*batch_size] integer tensor. + weights: 1-D [timesteps*batch_size] float tensor. + + Returns: + Accuracy: float scalar. + """ + with tf.name_scope('accuracy'): + eq = tf.cast(tf.equal(predictions(logits), targets), tf.float32) + return tf.identity( + tf.reduce_sum(weights * eq) / _num_labels(weights), name='accuracy') + + +def predictions(logits): + """Class prediction from logits.""" + inner_dim = logits.get_shape().as_list()[-1] + with tf.name_scope('predictions'): + # For binary classification + if inner_dim == 1: + pred = tf.cast(tf.greater(tf.squeeze(logits), 0.5), tf.int64) + # For multi-class classification + else: + pred = tf.argmax(logits, 1) + return pred + + +def _num_labels(weights): + """Number of 1's in weights. Returns 1. if 0.""" + num_labels = tf.reduce_sum(weights) + num_labels = tf.where(tf.equal(num_labels, 0.), 1., num_labels) + return num_labels + + +def optimize(loss, + global_step, + max_grad_norm, + lr, + lr_decay, + sync_replicas=False, + replicas_to_aggregate=1, + task_id=0): + """Builds optimization graph. + + * Creates an optimizer, and optionally wraps with SyncReplicasOptimizer + * Computes, clips, and applies gradients + * Maintains moving averages for all trainable variables + * Summarizes variables and gradients + + Args: + loss: scalar loss to minimize. + global_step: integer scalar Variable. + max_grad_norm: float scalar. Grads will be clipped to this value. + lr: float scalar, learning rate. + lr_decay: float scalar, learning rate decay rate. + sync_replicas: bool, whether to use SyncReplicasOptimizer. + replicas_to_aggregate: int, number of replicas to aggregate when using + SyncReplicasOptimizer. + task_id: int, id of the current task; used to ensure proper initialization + of SyncReplicasOptimizer. + + Returns: + train_op + """ + with tf.name_scope('optimization'): + # Compute gradients. + tvars = tf.trainable_variables() + grads = tf.gradients( + loss, + tvars, + aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N) + + # Clip non-embedding grads + non_embedding_grads_and_vars = [(g, v) for (g, v) in zip(grads, tvars) + if 'embedding' not in v.op.name] + embedding_grads_and_vars = [(g, v) for (g, v) in zip(grads, tvars) + if 'embedding' in v.op.name] + + ne_grads, ne_vars = zip(*non_embedding_grads_and_vars) + ne_grads, _ = tf.clip_by_global_norm(ne_grads, max_grad_norm) + non_embedding_grads_and_vars = zip(ne_grads, ne_vars) + + grads_and_vars = embedding_grads_and_vars + non_embedding_grads_and_vars + + # Summarize + _summarize_vars_and_grads(grads_and_vars) + + # Decaying learning rate + lr = tf.train.exponential_decay( + lr, global_step, 1, lr_decay, staircase=True) + tf.summary.scalar('learning_rate', lr) + opt = tf.train.AdamOptimizer(lr) + + # Track the moving averages of all trainable variables. 
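+    # (ExponentialMovingAverage keeps shadow copies of the trainable
+    # variables; with num_updates=global_step the effective decay ramps up to
+    # 0.999, and evaluation jobs may restore the averaged values for more
+    # stable predictions.)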
+ variable_averages = tf.train.ExponentialMovingAverage(0.999, global_step) + + # Apply gradients + if sync_replicas: + opt = tf.train.SyncReplicasOptimizer( + opt, + replicas_to_aggregate, + variable_averages=variable_averages, + variables_to_average=tvars, + total_num_replicas=replicas_to_aggregate) + apply_gradient_op = opt.apply_gradients( + grads_and_vars, global_step=global_step) + with tf.control_dependencies([apply_gradient_op]): + train_op = tf.no_op(name='train_op') + + # Initialization ops + tf.add_to_collection(tf.GraphKeys.QUEUE_RUNNERS, + opt.get_chief_queue_runner()) + if task_id == 0: # Chief task + local_init_op = opt.chief_init_op + tf.add_to_collection('chief_init_op', opt.get_init_tokens_op()) + else: + local_init_op = opt.local_step_init_op + tf.add_to_collection('local_init_op', local_init_op) + tf.add_to_collection('ready_for_local_init_op', + opt.ready_for_local_init_op) + else: + # Non-sync optimizer + variables_averages_op = variable_averages.apply(tvars) + apply_gradient_op = opt.apply_gradients(grads_and_vars, global_step) + with tf.control_dependencies([apply_gradient_op, variables_averages_op]): + train_op = tf.no_op(name='train_op') + + return train_op + + +def _summarize_vars_and_grads(grads_and_vars): + tf.logging.info('Trainable variables:') + tf.logging.info('-' * 60) + for grad, var in grads_and_vars: + tf.logging.info(var) + + def tag(name, v=var): + return v.op.name + '_' + name + + # Variable summary + mean = tf.reduce_mean(var) + tf.summary.scalar(tag('mean'), mean) + with tf.name_scope(tag('stddev')): + stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean))) + tf.summary.scalar(tag('stddev'), stddev) + tf.summary.scalar(tag('max'), tf.reduce_max(var)) + tf.summary.scalar(tag('min'), tf.reduce_min(var)) + tf.summary.histogram(tag('histogram'), var) + + # Gradient summary + if grad is not None: + if isinstance(grad, tf.IndexedSlices): + grad_values = grad.values + else: + grad_values = grad + + tf.summary.histogram(tag('gradient'), grad_values) + tf.summary.scalar(tag('gradient_norm'), tf.global_norm([grad_values])) + else: + tf.logging.info('Var %s has no gradient', var.op.name) diff --git a/adversarial_text/pretrain.py b/adversarial_text/pretrain.py new file mode 100644 index 0000000000000000000000000000000000000000..4e1fa6a4cbbfd1b9f5086036555627f3453acc70 --- /dev/null +++ b/adversarial_text/pretrain.py @@ -0,0 +1,46 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Pretrains a recurrent language model. + +Computational time: + 2 days to train 100000 steps on 1 layer 1024 hidden units LSTM, + 256 embeddings, 400 truncated BP, 256 minibatch and on single GPU (Pascal + Titan X, cuDNNv5). 
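+
+Typical invocation (a sketch; --train_dir and --max_steps are defined in
+train_utils.py, and the remaining data/model flags are defined in the modules
+imported below):
+
+  python pretrain.py --train_dir=/tmp/text_train --max_steps=100000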
+""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +# Dependency imports + +import tensorflow as tf + +import graphs +import train_utils + +FLAGS = tf.app.flags.FLAGS + + +def main(_): + """Trains Language Model.""" + tf.logging.set_verbosity(tf.logging.INFO) + with tf.device(tf.train.replica_device_setter(FLAGS.ps_tasks)): + model = graphs.get_model() + train_op, loss, global_step = model.language_model_training() + train_utils.run_training(train_op, loss, global_step) + + +if __name__ == '__main__': + tf.app.run() diff --git a/adversarial_text/train_classifier.py b/adversarial_text/train_classifier.py new file mode 100644 index 0000000000000000000000000000000000000000..f498d2c2fb9fd16f5c38bc10e9d80c124e127cb4 --- /dev/null +++ b/adversarial_text/train_classifier.py @@ -0,0 +1,63 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Trains LSTM text classification model. + +Model trains with adversarial or virtual adversarial training. + +Computational time: + 1.8 hours to train 10000 steps without adversarial or virtual adversarial + training, on 1 layer 1024 hidden units LSTM, 256 embeddings, 400 truncated + BP, 64 minibatch and on single GPU (Pascal Titan X, cuDNNv5). + + 4 hours to train 10000 steps with adversarial or virtual adversarial + training, with above condition. + +To initialize embedding and LSTM cell weights from a pretrained model, set +FLAGS.pretrained_model_dir to the pretrained model's checkpoint directory. +""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +# Dependency imports + +import tensorflow as tf + +import graphs +import train_utils + +flags = tf.app.flags +FLAGS = flags.FLAGS + +flags.DEFINE_string('pretrained_model_dir', None, + 'Directory path to pretrained model to restore from') + + +def main(_): + """Trains LSTM classification model.""" + tf.logging.set_verbosity(tf.logging.INFO) + with tf.device(tf.train.replica_device_setter(FLAGS.ps_tasks)): + model = graphs.get_model() + train_op, loss, global_step = model.classifier_training() + train_utils.run_training( + train_op, + loss, + global_step, + variables_to_restore=model.pretrained_variables, + pretrained_model_dir=FLAGS.pretrained_model_dir) + + +if __name__ == '__main__': + tf.app.run() diff --git a/adversarial_text/train_utils.py b/adversarial_text/train_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..2c09d7ae33dc7c7015b702d36e5c9a0335930286 --- /dev/null +++ b/adversarial_text/train_utils.py @@ -0,0 +1,134 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Utilities for training adversarial text models.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import time + +# Dependency imports + +import numpy as np +import tensorflow as tf + +flags = tf.app.flags +FLAGS = flags.FLAGS + +flags.DEFINE_string('master', '', 'Master address.') +flags.DEFINE_integer('task', 0, 'Task id of the replica running the training.') +flags.DEFINE_integer('ps_tasks', 0, 'Number of parameter servers.') +flags.DEFINE_string('train_dir', '/tmp/text_train', + 'Directory for logs and checkpoints.') +flags.DEFINE_integer('max_steps', 1000000, 'Number of batches to run.') +flags.DEFINE_boolean('log_device_placement', False, + 'Whether to log device placement.') + + +def run_training(train_op, + loss, + global_step, + variables_to_restore=None, + pretrained_model_dir=None): + """Sets up and runs training loop.""" + tf.gfile.MakeDirs(FLAGS.train_dir) + + # Create pretrain Saver + if pretrained_model_dir: + assert variables_to_restore + tf.logging.info('Will attempt restore from %s: %s', pretrained_model_dir, + variables_to_restore) + saver_for_restore = tf.train.Saver(variables_to_restore) + + # Init ops + if FLAGS.sync_replicas: + local_init_op = tf.get_collection('local_init_op')[0] + ready_for_local_init_op = tf.get_collection('ready_for_local_init_op')[0] + else: + local_init_op = tf.train.Supervisor.USE_DEFAULT + ready_for_local_init_op = tf.train.Supervisor.USE_DEFAULT + + is_chief = FLAGS.task == 0 + sv = tf.train.Supervisor( + logdir=FLAGS.train_dir, + is_chief=is_chief, + save_summaries_secs=5 * 60, + save_model_secs=5 * 60, + local_init_op=local_init_op, + ready_for_local_init_op=ready_for_local_init_op, + global_step=global_step) + + # Delay starting standard services to allow possible pretrained model restore. 
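+  # (The standard services are the Supervisor's checkpoint and summary saver
+  # threads; starting them only after the optional restore below avoids
+  # checkpointing an un-restored model.)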
+ with sv.managed_session( + master=FLAGS.master, + config=tf.ConfigProto(log_device_placement=FLAGS.log_device_placement), + start_standard_services=False) as sess: + # Initialization + if is_chief: + if pretrained_model_dir: + maybe_restore_pretrained_model(sess, saver_for_restore, + pretrained_model_dir) + if FLAGS.sync_replicas: + sess.run(tf.get_collection('chief_init_op')[0]) + sv.start_standard_services(sess) + + sv.start_queue_runners(sess) + + # Training loop + global_step_val = 0 + while not sv.should_stop() and global_step_val < FLAGS.max_steps: + global_step_val = train_step(sess, train_op, loss, global_step) + sv.stop() + + # Final checkpoint + if is_chief: + sv.saver.save(sess, sv.save_path, global_step=global_step) + + +def maybe_restore_pretrained_model(sess, saver_for_restore, model_dir): + """Restores pretrained model if there is no ckpt model.""" + ckpt = tf.train.get_checkpoint_state(FLAGS.train_dir) + checkpoint_exists = ckpt and ckpt.model_checkpoint_path + if checkpoint_exists: + tf.logging.info('Checkpoint exists in FLAGS.train_dir; skipping ' + 'pretraining restore') + return + + pretrain_ckpt = tf.train.get_checkpoint_state(model_dir) + if not (pretrain_ckpt and pretrain_ckpt.model_checkpoint_path): + raise ValueError( + 'Asked to restore model from %s but no checkpoint found.' % model_dir) + saver_for_restore.restore(sess, pretrain_ckpt.model_checkpoint_path) + + +def train_step(sess, train_op, loss, global_step): + """Runs a single training step.""" + start_time = time.time() + _, loss_val, global_step_val = sess.run([train_op, loss, global_step]) + duration = time.time() - start_time + + # Logging + if global_step_val % 10 == 0: + examples_per_sec = FLAGS.batch_size / duration + sec_per_batch = float(duration) + + format_str = ('step %d, loss = %.2f (%.1f examples/sec; %.3f ' 'sec/batch)') + tf.logging.info(format_str % (global_step_val, loss_val, examples_per_sec, + sec_per_batch)) + + if np.isnan(loss_val): + raise OverflowError('Loss is nan') + + return global_step_val diff --git a/attention_ocr/README.md b/attention_ocr/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1079bb74073c4e4d355c9f15c046bdfd99b696d6 --- /dev/null +++ b/attention_ocr/README.md @@ -0,0 +1,179 @@ +## Attention-based Extraction of Structured Information from Street View Imagery + +*A TensorFlow model for real-world image text extraction problems.* + +This folder contains the code needed to train a new Attention OCR model on the +[FSNS dataset][FSNS] dataset to transcribe street names in France. You can +also use it to train it on your own data. + +More details can be found in our paper: + +["Attention-based Extraction of Structured Information from Street View +Imagery"](https://arxiv.org/abs/1704.03549) + +## Contacts + +Authors: +Zbigniew Wojna , +Alexander Gorban + +Pull requests: +[alexgorban](https://github.com/alexgorban) + +## Requirements + +1. Install the TensorFlow library ([instructions][TF]). For example: + +``` +virtualenv --system-site-packages ~/.tensorflow +source ~/.tensorflow/bin/activate +pip install --upgrade pip +pip install --upgrade tensorflow_gpu +``` + +2. At least 158GB of free disk space to download the FSNS dataset: + +``` +cd models/attention_ocr/python/datasets +aria2c -c -j 20 -i ../../../street/python/fsns_urls.txt +cd .. +``` + +3. 16GB of RAM or more; 32GB is recommended. +4. `train.py` works with both CPU and GPU, though using GPU is preferable. It has been tested with a Titan X and with a GTX980. 
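+
+Once the FSNS download from step 2 finishes, you can quickly check that the
+shards are visible to the dataset configuration. The snippet below is only a
+sketch: it assumes the default `datasets/data/fsns/` layout used by
+`datasets/fsns.py` and is meant to be run from `models/attention_ocr/python`;
+adjust the glob if your download landed elsewhere.
+
+```
+import glob
+
+# Count the TFRecord shards for each FSNS split (paths assume the defaults
+# in datasets/fsns.py DEFAULT_CONFIG).
+for split in ('train', 'test', 'validation'):
+    print(split, len(glob.glob('datasets/data/fsns/%s/%s*' % (split, split))))
+```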
+ +[TF]: https://www.tensorflow.org/install/ +[FSNS]: https://github.com/tensorflow/models/tree/master/street + +## How to use this code + +To run all unit tests: + +``` +cd models/attention_ocr/python +python -m unittest discover -p '*_test.py' +``` + +To train from scratch: + +``` +python train.py +``` + +To train a model using pre-trained Inception weights as initialization: + +``` +wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz +tar xf inception_v3_2016_08_28.tar.gz +python train.py --checkpoint_inception=inception_v3.ckpt +``` + +To fine tune the Attention OCR model using a checkpoint: + +``` +wget http://download.tensorflow.org/models/attention_ocr_2017_05_17.tar.gz +tar xf attention_ocr_2017_05_17.tar.gz +python train.py --checkpoint=model.ckpt-399731 +``` + +## How to use your own image data to train the model + +You need to define a new dataset. There are two options: + +1. Store data in the same format as the FSNS dataset and just reuse the +[python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/attention_ocr/python/datasets/fsns.py) +module. E.g., create a file datasets/newtextdataset.py: +``` +import fsns + +DEFAULT_DATASET_DIR = 'path/to/the/dataset' + +DEFAULT_CONFIG = { + 'name': + 'MYDATASET', + 'splits': { + 'train': { + 'size': 123, + 'pattern': 'tfexample_train*' + }, + 'test': { + 'size': 123, + 'pattern': 'tfexample_test*' + } + }, + 'charset_filename': + 'charset_size.txt', + 'image_shape': (150, 600, 3), + 'num_of_views': + 4, + 'max_sequence_length': + 37, + 'null_code': + 42, + 'items_to_descriptions': { + 'image': + 'A [150 x 600 x 3] color image.', + 'label': + 'Characters codes.', + 'text': + 'A unicode string.', + 'length': + 'A length of the encoded text.', + 'num_of_views': + 'A number of different views stored within the image.' + } +} + + +def get_split(split_name, dataset_dir=None, config=None): + if not dataset_dir: + dataset_dir = DEFAULT_DATASET_DIR + if not config: + config = DEFAULT_CONFIG + + return fsns.get_split(split_name, dataset_dir, config) +``` +You will also need to include it into the `datasets/__init__.py` and specify the +dataset name in the command line. + +``` +python train.py --dataset_name=newtextdataset +``` + +Please note that eval.py will also require the same flag. + +2. Define a new dataset format. The model needs the following data to train: + +- images: input images, shape [batch_size x H x W x 3]; +- labels: ground truth label ids, shape=[batch_size x seq_length]; +- labels_one_hot: labels in one-hot encoding, shape [batch_size x seq_length x num_char_classes]; + +Refer to [python/data_provider.py](https://github.com/tensorflow/models/blob/master/attention_ocr/python/data_provider.py#L33) +for more details. You can use [python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/attention_ocr/python/datasets/fsns.py) +as the example. + +## How to use a pre-trained model + +The inference part was not released yet, but it is pretty straightforward to +implement one in Python or C++. + +The recommended way is to use the [Serving infrastructure](https://tensorflow.github.io/serving/serving_basic). + +Alternatively you can: +1. define a placeholder for images (or use directly an numpy array) +2. [create a graph ](https://github.com/tensorflow/models/blob/master/attention_ocr/python/eval.py#L60) +`endpoints = model.create_base(images_placeholder, labels_one_hot=None)` +3. 
[load a pretrained model](https://github.com/tensorflow/models/blob/master/attention_ocr/python/model.py#L494)
+4. Run computations through the graph:
+`predictions = sess.run(endpoints.predicted_chars, feed_dict={images_placeholder:images_actual_data})`
+5. Convert character IDs (predictions) to UTF8 using the provided charset file.
+
+## Disclaimer
+
+This code is a modified version of the internal model we used for our paper.
+Currently it reaches 83.79% full sequence accuracy after 400k steps of training.
+The main differences from the version used in the paper: the paper used
+distributed training with 50 GPU (K80) workers and asynchronous updates, while
+the provided checkpoint was created with this code after ~6 days of training
+on a single GPU (Titan X), reaching 81% after 24 hours of training; in
+addition, the coordinate encoding is not yet included (TODO(alexgorban@)).
diff --git a/attention_ocr/python/all_jobs.screenrc b/attention_ocr/python/all_jobs.screenrc
new file mode 100644
index 0000000000000000000000000000000000000000..ef7fdf237387c95eeb9a61e507b1c74db212502d
--- /dev/null
+++ b/attention_ocr/python/all_jobs.screenrc
@@ -0,0 +1,9 @@
+# A GPU/screen config to run all jobs for training and evaluation in parallel.
+# Execute:
+# source /path/to/your/virtualenv/bin/activate
+# screen -R TF -c all_jobs.screenrc
+
+screen -t train 0 python train.py --train_log_dir=workdir/train
+screen -t eval_train 1 python eval.py --split_name=train --train_log_dir=workdir/train --eval_log_dir=workdir/eval_train
+screen -t eval_test 2 python eval.py --split_name=test --train_log_dir=workdir/train --eval_log_dir=workdir/eval_test
+screen -t tensorboard 3 tensorboard --logdir=workdir
diff --git a/attention_ocr/python/common_flags.py b/attention_ocr/python/common_flags.py
new file mode 100644
index 0000000000000000000000000000000000000000..996bf4c6c0e9aa67135e7a6f4b47d64b1e1f9e41
--- /dev/null
+++ b/attention_ocr/python/common_flags.py
@@ -0,0 +1,149 @@
+# Copyright 2017 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================== + +"""Define flags are common for both train.py and eval.py scripts.""" +import sys + +from tensorflow.python.platform import flags +import logging + +import datasets +import model + +FLAGS = flags.FLAGS + +logging.basicConfig( + level=logging.DEBUG, + stream=sys.stderr, + format='%(levelname)s ' + '%(asctime)s.%(msecs)06d: ' + '%(filename)s: ' + '%(lineno)d ' + '%(message)s', + datefmt='%Y-%m-%d %H:%M:%S') + + +def define(): + """Define common flags.""" + # yapf: disable + flags.DEFINE_integer('batch_size', 32, + 'Batch size.') + + flags.DEFINE_integer('crop_width', None, + 'Width of the central crop for images.') + + flags.DEFINE_integer('crop_height', None, + 'Height of the central crop for images.') + + flags.DEFINE_string('train_log_dir', '/tmp/attention_ocr/train', + 'Directory where to write event logs.') + + flags.DEFINE_string('dataset_name', 'fsns', + 'Name of the dataset. Supported: fsns') + + flags.DEFINE_string('split_name', 'train', + 'Dataset split name to run evaluation for: test,train.') + + flags.DEFINE_string('dataset_dir', None, + 'Dataset root folder.') + + flags.DEFINE_string('checkpoint', '', + 'Path for checkpoint to restore weights from.') + + flags.DEFINE_string('master', + '', + 'BNS name of the TensorFlow master to use.') + + # Model hyper parameters + flags.DEFINE_float('learning_rate', 0.004, + 'learning rate') + + flags.DEFINE_string('optimizer', 'momentum', + 'the optimizer to use') + + flags.DEFINE_string('momentum', 0.9, + 'momentum value for the momentum optimizer if used') + + flags.DEFINE_bool('use_augment_input', True, + 'If True will use image augmentation') + + # Method hyper parameters + # conv_tower_fn + flags.DEFINE_string('final_endpoint', 'Mixed_5d', + 'Endpoint to cut inception tower') + + # sequence_logit_fn + flags.DEFINE_bool('use_attention', True, + 'If True will use the attention mechanism') + + flags.DEFINE_bool('use_autoregression', True, + 'If True will use autoregression (a feedback link)') + + flags.DEFINE_integer('num_lstm_units', 256, + 'number of LSTM units for sequence LSTM') + + flags.DEFINE_float('weight_decay', 0.00004, + 'weight decay for char prediction FC layers') + + flags.DEFINE_float('lstm_state_clip_value', 10.0, + 'cell state is clipped by this value prior to the cell' + ' output activation') + + # 'sequence_loss_fn' + flags.DEFINE_float('label_smoothing', 0.1, + 'weight for label smoothing') + + flags.DEFINE_bool('ignore_nulls', True, + 'ignore null characters for computing the loss') + + flags.DEFINE_bool('average_across_timesteps', False, + 'divide the returned cost by the total label weight') + # yapf: enable + + +def get_crop_size(): + if FLAGS.crop_width and FLAGS.crop_height: + return (FLAGS.crop_width, FLAGS.crop_height) + else: + return None + + +def create_dataset(split_name): + ds_module = getattr(datasets, FLAGS.dataset_name) + return ds_module.get_split(split_name, dataset_dir=FLAGS.dataset_dir) + + +def create_mparams(): + return { + 'conv_tower_fn': + model.ConvTowerParams(final_endpoint=FLAGS.final_endpoint), + 'sequence_logit_fn': + model.SequenceLogitsParams( + use_attention=FLAGS.use_attention, + use_autoregression=FLAGS.use_autoregression, + num_lstm_units=FLAGS.num_lstm_units, + weight_decay=FLAGS.weight_decay, + lstm_state_clip_value=FLAGS.lstm_state_clip_value), + 'sequence_loss_fn': + model.SequenceLossParams( + label_smoothing=FLAGS.label_smoothing, + ignore_nulls=FLAGS.ignore_nulls, + 
average_across_timesteps=FLAGS.average_across_timesteps) + } + + +def create_model(*args, **kwargs): + ocr_model = model.Model(mparams=create_mparams(), *args, **kwargs) + return ocr_model diff --git a/attention_ocr/python/data_provider.py b/attention_ocr/python/data_provider.py new file mode 100644 index 0000000000000000000000000000000000000000..1b1181158385cc181566176ae85b710a291b7826 --- /dev/null +++ b/attention_ocr/python/data_provider.py @@ -0,0 +1,199 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Functions to read, decode and pre-process input data for the Model. +""" +import collections +import functools +import tensorflow as tf +from tensorflow.contrib import slim + +import inception_preprocessing + +# Tuple to store input data endpoints for the Model. +# It has following fields (tensors): +# images: input images, +# shape [batch_size x H x W x 3]; +# labels: ground truth label ids, +# shape=[batch_size x seq_length]; +# labels_one_hot: labels in one-hot encoding, +# shape [batch_size x seq_length x num_char_classes]; +InputEndpoints = collections.namedtuple( + 'InputEndpoints', ['images', 'images_orig', 'labels', 'labels_one_hot']) + +# A namedtuple to define a configuration for shuffled batch fetching. +# num_batching_threads: A number of parallel threads to fetch data. +# queue_capacity: a max number of elements in the batch shuffling queue. +# min_after_dequeue: a min number elements in the queue after a dequeue, used +# to ensure a level of mixing of elements. +ShuffleBatchConfig = collections.namedtuple('ShuffleBatchConfig', [ + 'num_batching_threads', 'queue_capacity', 'min_after_dequeue' +]) + +DEFAULT_SHUFFLE_CONFIG = ShuffleBatchConfig( + num_batching_threads=8, queue_capacity=3000, min_after_dequeue=1000) + + +def augment_image(image): + """Augmentation the image with a random modification. + + Args: + image: input Tensor image of rank 3, with the last dimension + of size 3. + + Returns: + Distorted Tensor image of the same shape. + """ + with tf.variable_scope('AugmentImage'): + height = image.get_shape().dims[0].value + width = image.get_shape().dims[1].value + + # Random crop cut from the street sign image, resized to the same size. + # Assures that the crop is covers at least 0.8 area of the input image. 
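+    # (With an empty bounding_boxes tensor and use_image_if_no_bounding_boxes
+    # set, the sampler treats the whole image as the region of interest, so
+    # the resulting crop keeps 80-100% of the area with its aspect ratio
+    # jittered by +/-20%.)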
+ bbox_begin, bbox_size, _ = tf.image.sample_distorted_bounding_box( + tf.shape(image), + bounding_boxes=tf.zeros([0, 0, 4]), + min_object_covered=0.8, + aspect_ratio_range=[0.8, 1.2], + area_range=[0.8, 1.0], + use_image_if_no_bounding_boxes=True) + distorted_image = tf.slice(image, bbox_begin, bbox_size) + + # Randomly chooses one of the 4 interpolation methods + distorted_image = inception_preprocessing.apply_with_random_selector( + distorted_image, + lambda x, method: tf.image.resize_images(x, [height, width], method), + num_cases=4) + distorted_image.set_shape([height, width, 3]) + + # Color distortion + distorted_image = inception_preprocessing.apply_with_random_selector( + distorted_image, + functools.partial( + inception_preprocessing.distort_color, fast_mode=False), + num_cases=4) + distorted_image = tf.clip_by_value(distorted_image, -1.5, 1.5) + + return distorted_image + + +def central_crop(image, crop_size): + """Returns a central crop for the specified size of an image. + + Args: + image: A tensor with shape [height, width, channels] + crop_size: A tuple (crop_width, crop_height) + + Returns: + A tensor of shape [crop_height, crop_width, channels]. + """ + with tf.variable_scope('CentralCrop'): + target_width, target_height = crop_size + image_height, image_width = tf.shape(image)[0], tf.shape(image)[1] + assert_op1 = tf.Assert( + tf.greater_equal(image_height, target_height), + ['image_height < target_height', image_height, target_height]) + assert_op2 = tf.Assert( + tf.greater_equal(image_width, target_width), + ['image_width < target_width', image_width, target_width]) + with tf.control_dependencies([assert_op1, assert_op2]): + offset_width = (image_width - target_width) / 2 + offset_height = (image_height - target_height) / 2 + return tf.image.crop_to_bounding_box(image, offset_height, offset_width, + target_height, target_width) + + +def preprocess_image(image, augment=False, central_crop_size=None, + num_towers=4): + """Normalizes image to have values in a narrow range around zero. + + Args: + image: a [H x W x 3] uint8 tensor. + augment: optional, if True do random image distortion. + central_crop_size: A tuple (crop_width, crop_height). + num_towers: optional, number of shots of the same image in the input image. + + Returns: + A float32 tensor of shape [H x W x 3] with RGB values in the required + range. + """ + with tf.variable_scope('PreprocessImage'): + image = tf.image.convert_image_dtype(image, dtype=tf.float32) + if augment or central_crop_size: + if num_towers == 1: + images = [image] + else: + images = tf.split(value=image, num_or_size_splits=num_towers, axis=1) + if central_crop_size: + view_crop_size = (central_crop_size[0] / num_towers, + central_crop_size[1]) + images = [central_crop(img, view_crop_size) for img in images] + if augment: + images = [augment_image(img) for img in images] + image = tf.concat(images, 1) + + image = tf.subtract(image, 0.5) + image = tf.multiply(image, 2.5) + + return image + + +def get_data(dataset, + batch_size, + augment=False, + central_crop_size=None, + shuffle_config=None, + shuffle=True): + """Wraps calls to DatasetDataProviders and shuffle_batch. + + For more details about supported Dataset objects refer to datasets/fsns.py. + + Args: + dataset: a slim.data.dataset.Dataset object. + batch_size: number of samples per batch. + augment: optional, if True does random image distortion. + central_crop_size: A CharLogittuple (crop_width, crop_height). + shuffle_config: A namedtuple ShuffleBatchConfig. 
+ shuffle: if True use data shuffling. + + Returns: + + """ + if not shuffle_config: + shuffle_config = DEFAULT_SHUFFLE_CONFIG + + provider = slim.dataset_data_provider.DatasetDataProvider( + dataset, + shuffle=shuffle, + common_queue_capacity=2 * batch_size, + common_queue_min=batch_size) + image_orig, label = provider.get(['image', 'label']) + + image = preprocess_image( + image_orig, augment, central_crop_size, num_towers=dataset.num_of_views) + label_one_hot = slim.one_hot_encoding(label, dataset.num_char_classes) + + images, images_orig, labels, labels_one_hot = (tf.train.shuffle_batch( + [image, image_orig, label, label_one_hot], + batch_size=batch_size, + num_threads=shuffle_config.num_batching_threads, + capacity=shuffle_config.queue_capacity, + min_after_dequeue=shuffle_config.min_after_dequeue)) + + return InputEndpoints( + images=images, + images_orig=images_orig, + labels=labels, + labels_one_hot=labels_one_hot) diff --git a/attention_ocr/python/data_provider_test.py b/attention_ocr/python/data_provider_test.py new file mode 100644 index 0000000000000000000000000000000000000000..551bc75e02cc470c40aad8a4066b6bba7ceeb62c --- /dev/null +++ b/attention_ocr/python/data_provider_test.py @@ -0,0 +1,72 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for data_provider.""" + +import numpy as np +import tensorflow as tf +from tensorflow.contrib.slim import queues + +import datasets +import data_provider + + +class DataProviderTest(tf.test.TestCase): + def setUp(self): + tf.test.TestCase.setUp(self) + + def test_preprocessed_image_values_are_in_range(self): + image_shape = (5, 4, 3) + fake_image = np.random.randint(low=0, high=255, size=image_shape) + image_tf = data_provider.preprocess_image(fake_image) + + with self.test_session() as sess: + image_np = sess.run(image_tf) + + self.assertEqual(image_np.shape, image_shape) + min_value, max_value = np.min(image_np), np.max(image_np) + self.assertTrue((-1.28 < min_value) and (min_value < 1.27)) + self.assertTrue((-1.28 < max_value) and (max_value < 1.27)) + + def test_provided_data_has_correct_shape(self): + batch_size = 4 + data = data_provider.get_data( + dataset=datasets.fsns_test.get_test_split(), + batch_size=batch_size, + augment=True, + central_crop_size=None) + + with self.test_session() as sess, queues.QueueRunners(sess): + images_np, labels_np = sess.run([data.images, data.labels_one_hot]) + + self.assertEqual(images_np.shape, (batch_size, 150, 600, 3)) + self.assertEqual(labels_np.shape, (batch_size, 37, 134)) + + def test_optionally_applies_central_crop(self): + batch_size = 4 + data = data_provider.get_data( + dataset=datasets.fsns_test.get_test_split(), + batch_size=batch_size, + augment=True, + central_crop_size=(500, 100)) + + with self.test_session() as sess, queues.QueueRunners(sess): + images_np = sess.run(data.images) + + self.assertEqual(images_np.shape, (batch_size, 100, 500, 3)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/attention_ocr/python/datasets/__init__.py b/attention_ocr/python/datasets/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e2fef7b2dd275051861a29c6d4f708162575eac6 --- /dev/null +++ b/attention_ocr/python/datasets/__init__.py @@ -0,0 +1,19 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +import fsns +import fsns_test + +__all__ = [fsns, fsns_test] diff --git a/attention_ocr/python/datasets/fsns.py b/attention_ocr/python/datasets/fsns.py new file mode 100644 index 0000000000000000000000000000000000000000..d8dd5efb4eb047889b4f8cdab30a1c872f51f44b --- /dev/null +++ b/attention_ocr/python/datasets/fsns.py @@ -0,0 +1,183 @@ +# -*- coding: utf-8 -*- +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Configuration to read FSNS dataset https://goo.gl/3Ldm8v.""" + +import os +import re +import tensorflow as tf +from tensorflow.contrib import slim +import logging + +DEFAULT_DATASET_DIR = os.path.join(os.path.dirname(__file__), 'data/fsns') + +# The dataset configuration, should be used only as a default value. +DEFAULT_CONFIG = { + 'name': 'FSNS', + 'splits': { + 'train': { + 'size': 1044868, + 'pattern': 'train/train*' + }, + 'test': { + 'size': 20404, + 'pattern': 'test/test*' + }, + 'validation': { + 'size': 16150, + 'pattern': 'validation/validation*' + } + }, + 'charset_filename': 'charset_size=134.txt', + 'image_shape': (150, 600, 3), + 'num_of_views': 4, + 'max_sequence_length': 37, + 'null_code': 133, + 'items_to_descriptions': { + 'image': 'A [150 x 600 x 3] color image.', + 'label': 'Characters codes.', + 'text': 'A unicode string.', + 'length': 'A length of the encoded text.', + 'num_of_views': 'A number of different views stored within the image.' + } +} + + +def read_charset(filename, null_character=u'\u2591'): + """Reads a charset definition from a tab separated text file. + + charset file has to have format compatible with the FSNS dataset. + + Args: + filename: a path to the charset file. + null_character: a unicode character used to replace '' character. the + default value is a light shade block '░'. + + Returns: + a dictionary with keys equal to character codes and values - unicode + characters. + """ + pattern = re.compile(r'(\d+)\t(.+)') + charset = {} + with tf.gfile.GFile(filename) as f: + for i, line in enumerate(f): + m = pattern.match(line) + if m is None: + logging.warning('incorrect charset file. line #%d: %s', i, line) + continue + code = int(m.group(1)) + char = m.group(2).decode('utf-8') + if char == '': + char = null_character + charset[code] = char + return charset + + +class _NumOfViewsHandler(slim.tfexample_decoder.ItemHandler): + """Convenience handler to determine number of views stored in an image.""" + + def __init__(self, width_key, original_width_key, num_of_views): + super(_NumOfViewsHandler, self).__init__([width_key, original_width_key]) + self._width_key = width_key + self._original_width_key = original_width_key + self._num_of_views = num_of_views + + def tensors_to_item(self, keys_to_tensors): + return tf.to_int64( + self._num_of_views * keys_to_tensors[self._original_width_key] / + keys_to_tensors[self._width_key]) + + +def get_split(split_name, dataset_dir=None, config=None): + """Returns a dataset tuple for FSNS dataset. + + Args: + split_name: A train/test split name. + dataset_dir: The base directory of the dataset sources, by default it uses + a predefined CNS path (see DEFAULT_DATASET_DIR). + config: A dictionary with dataset configuration. If None - will use the + DEFAULT_CONFIG. + + Returns: + A `Dataset` namedtuple. + + Raises: + ValueError: if `split_name` is not a valid train/test split. 
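+
+  Example (a sketch, assuming the TFRecord shards are present under the
+  default dataset directory):
+
+    dataset = get_split('train')
+    provider = slim.dataset_data_provider.DatasetDataProvider(dataset)
+    image, label, text = provider.get(['image', 'label', 'text'])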
+ """ + if not dataset_dir: + dataset_dir = DEFAULT_DATASET_DIR + + if not config: + config = DEFAULT_CONFIG + + if split_name not in config['splits']: + raise ValueError('split name %s was not recognized.' % split_name) + + logging.info('Using %s dataset split_name=%s dataset_dir=%s', config['name'], + split_name, dataset_dir) + + # Ignores the 'image/height' feature. + zero = tf.zeros([1], dtype=tf.int64) + keys_to_features = { + 'image/encoded': + tf.FixedLenFeature((), tf.string, default_value=''), + 'image/format': + tf.FixedLenFeature((), tf.string, default_value='png'), + 'image/width': + tf.FixedLenFeature([1], tf.int64, default_value=zero), + 'image/orig_width': + tf.FixedLenFeature([1], tf.int64, default_value=zero), + 'image/class': + tf.FixedLenFeature([config['max_sequence_length']], tf.int64), + 'image/unpadded_class': + tf.VarLenFeature(tf.int64), + 'image/text': + tf.FixedLenFeature([1], tf.string, default_value=''), + } + items_to_handlers = { + 'image': + slim.tfexample_decoder.Image( + shape=config['image_shape'], + image_key='image/encoded', + format_key='image/format'), + 'label': + slim.tfexample_decoder.Tensor(tensor_key='image/class'), + 'text': + slim.tfexample_decoder.Tensor(tensor_key='image/text'), + 'num_of_views': + _NumOfViewsHandler( + width_key='image/width', + original_width_key='image/orig_width', + num_of_views=config['num_of_views']) + } + decoder = slim.tfexample_decoder.TFExampleDecoder(keys_to_features, + items_to_handlers) + charset_file = os.path.join(dataset_dir, config['charset_filename']) + charset = read_charset(charset_file) + file_pattern = os.path.join(dataset_dir, + config['splits'][split_name]['pattern']) + return slim.dataset.Dataset( + data_sources=file_pattern, + reader=tf.TFRecordReader, + decoder=decoder, + num_samples=config['splits'][split_name]['size'], + items_to_descriptions=config['items_to_descriptions'], + # additional parameters for convenience. + charset=charset, + num_char_classes=len(charset), + num_of_views=config['num_of_views'], + max_sequence_length=config['max_sequence_length'], + null_code=config['null_code']) diff --git a/attention_ocr/python/datasets/fsns_test.py b/attention_ocr/python/datasets/fsns_test.py new file mode 100644 index 0000000000000000000000000000000000000000..17cee7d404445e2c1e8f28cfb5b87c10fbbc5289 --- /dev/null +++ b/attention_ocr/python/datasets/fsns_test.py @@ -0,0 +1,103 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for FSNS datasets module.""" + +import collections +import os +import tensorflow as tf +from tensorflow.contrib import slim + +import fsns +import unittest_utils + +FLAGS = tf.flags.FLAGS + + +def get_test_split(): + config = fsns.DEFAULT_CONFIG.copy() + config['splits'] = {'test': {'size': 50, 'pattern': 'fsns-00000-of-00001'}} + return fsns.get_split('test', dataset_dir(), config) + + +def dataset_dir(): + return os.path.join(os.path.dirname(__file__), 'testdata/fsns') + + +class FsnsTest(tf.test.TestCase): + def test_decodes_example_proto(self): + expected_label = range(37) + expected_image, encoded = unittest_utils.create_random_image( + 'PNG', shape=(150, 600, 3)) + serialized = unittest_utils.create_serialized_example({ + 'image/encoded': [encoded], + 'image/format': ['PNG'], + 'image/class': + expected_label, + 'image/unpadded_class': + range(10), + 'image/text': ['Raw text'], + 'image/orig_width': [150], + 'image/width': [600] + }) + + decoder = fsns.get_split('train', dataset_dir()).decoder + with self.test_session() as sess: + data_tuple = collections.namedtuple('DecodedData', decoder.list_items()) + data = sess.run(data_tuple(*decoder.decode(serialized))) + + self.assertAllEqual(expected_image, data.image) + self.assertAllEqual(expected_label, data.label) + self.assertEqual(['Raw text'], data.text) + self.assertEqual([1], data.num_of_views) + + def test_label_has_shape_defined(self): + serialized = 'fake' + decoder = fsns.get_split('train', dataset_dir()).decoder + + [label_tf] = decoder.decode(serialized, ['label']) + + self.assertEqual(label_tf.get_shape().dims[0], 37) + + def test_dataset_tuple_has_all_extra_attributes(self): + dataset = fsns.get_split('train', dataset_dir()) + + self.assertTrue(dataset.charset) + self.assertTrue(dataset.num_char_classes) + self.assertTrue(dataset.num_of_views) + self.assertTrue(dataset.max_sequence_length) + self.assertTrue(dataset.null_code) + + def test_can_use_the_test_data(self): + batch_size = 1 + dataset = get_test_split() + provider = slim.dataset_data_provider.DatasetDataProvider( + dataset, + shuffle=True, + common_queue_capacity=2 * batch_size, + common_queue_min=batch_size) + image_tf, label_tf = provider.get(['image', 'label']) + + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + with slim.queues.QueueRunners(sess): + image_np, label_np = sess.run([image_tf, label_tf]) + + self.assertEqual((150, 600, 3), image_np.shape) + self.assertEqual((37, ), label_np.shape) + + +if __name__ == '__main__': + tf.test.main() diff --git a/attention_ocr/python/datasets/testdata/fsns/charset_size=134.txt b/attention_ocr/python/datasets/testdata/fsns/charset_size=134.txt new file mode 100644 index 0000000000000000000000000000000000000000..5c7fcde2ae0ab679f279a083d6de1c50d33ff90b --- /dev/null +++ b/attention_ocr/python/datasets/testdata/fsns/charset_size=134.txt @@ -0,0 +1,139 @@ +0 +133 +1 l +2 ’ +3 é +4 t +5 e +6 i +7 n +8 s +9 x +10 g +11 u +12 o +13 1 +14 8 +15 7 +16 0 +17 - +18 . 
+19 p +20 a +21 r +22 è +23 d +24 c +25 V +26 v +27 b +28 m +29 ) +30 C +31 z +32 S +33 y +34 , +35 k +36 É +37 A +38 h +39 E +40 » +41 D +42 / +43 H +44 M +45 ( +46 G +47 P +48 ç +2 ' +49 R +50 f +51 " +52 2 +53 j +54 | +55 N +56 6 +57 ° +58 5 +59 T +60 O +61 U +62 3 +63 % +64 9 +65 q +66 Z +67 B +68 K +69 w +70 W +71 : +72 4 +73 L +74 F +75 ] +76 ï +2 ‘ +77 I +78 J +79 ä +80 î +81 ; +82 à +83 ê +84 X +85 ü +86 Y +87 ô +88 = +89 + +90 \ +91 { +92 } +93 _ +94 Q +95 œ +96 ñ +97 * +98 ! +99 Ü +51 “ +100 â +101 Ç +102 Œ +103 û +104 ? +105 $ +106 ë +107 « +108 € +109 & +110 < +51 ” +111 æ +112 # +113 ® +114  +115 È +116 > +117 [ +17 — +118 Æ +119 ù +120 Î +121 Ô +122 ÿ +123 À +124 Ê +125 @ +126 Ï +127 © +128 Ë +129 Ù +130 £ +131 Ÿ +132 Û diff --git a/attention_ocr/python/datasets/testdata/fsns/fsns-00000-of-00001 b/attention_ocr/python/datasets/testdata/fsns/fsns-00000-of-00001 new file mode 100644 index 0000000000000000000000000000000000000000..eacafcc810fafba6c747e81a9f5e30e21c98d816 Binary files /dev/null and b/attention_ocr/python/datasets/testdata/fsns/fsns-00000-of-00001 differ diff --git a/attention_ocr/python/datasets/testdata/fsns/links.txt b/attention_ocr/python/datasets/testdata/fsns/links.txt new file mode 100644 index 0000000000000000000000000000000000000000..da98d305fa02a61a9ac42b5e5490aa4e0c709b7e --- /dev/null +++ b/attention_ocr/python/datasets/testdata/fsns/links.txt @@ -0,0 +1 @@ +http://download.tensorflow.org/data/fsns-20160927/testdata/fsns-00000-of-00001 diff --git a/attention_ocr/python/datasets/unittest_utils.py b/attention_ocr/python/datasets/unittest_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..f74a40a4997d95de5c8353998a74ff32158fe7ad --- /dev/null +++ b/attention_ocr/python/datasets/unittest_utils.py @@ -0,0 +1,64 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Functions to make unit testing easier.""" + +import StringIO +import numpy as np +from PIL import Image as PILImage +import tensorflow as tf + + +def create_random_image(image_format, shape): + """Creates an image with random values. + + Args: + image_format: An image format (PNG or JPEG). + shape: A tuple with image shape (including channels). + + Returns: + A tuple (, ) + """ + image = np.random.randint(low=0, high=255, size=shape, dtype='uint8') + io = StringIO.StringIO() + image_pil = PILImage.fromarray(image) + image_pil.save(io, image_format, subsampling=0, quality=100) + return image, io.getvalue() + + +def create_serialized_example(name_to_values): + """Creates a tf.Example proto using a dictionary. + + It automatically detects type of values and define a corresponding feature. + + Args: + name_to_values: A dictionary. + + Returns: + tf.Example proto. 
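+
+  Example:
+    create_serialized_example({'labels': [1, 2, 3], 'data': ['FAKE']})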
+ """ + example = tf.train.Example() + for name, values in name_to_values.items(): + feature = example.features.feature[name] + if isinstance(values[0], str): + add = feature.bytes_list.value.extend + elif isinstance(values[0], float): + add = feature.float32_list.value.extend + elif isinstance(values[0], int): + add = feature.int64_list.value.extend + else: + raise AssertionError('Unsupported type: %s' % type(values[0])) + add(values) + return example.SerializeToString() diff --git a/attention_ocr/python/datasets/unittest_utils_test.py b/attention_ocr/python/datasets/unittest_utils_test.py new file mode 100644 index 0000000000000000000000000000000000000000..a127143320971f24b389afc973accda81cea8432 --- /dev/null +++ b/attention_ocr/python/datasets/unittest_utils_test.py @@ -0,0 +1,64 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for unittest_utils.""" +import StringIO + +import numpy as np +from PIL import Image as PILImage +import tensorflow as tf + +import unittest_utils + + +class UnittestUtilsTest(tf.test.TestCase): + def test_creates_an_image_of_specified_shape(self): + image, _ = unittest_utils.create_random_image('PNG', (10, 20, 3)) + self.assertEqual(image.shape, (10, 20, 3)) + + def test_encoded_image_corresponds_to_numpy_array(self): + image, encoded = unittest_utils.create_random_image('PNG', (20, 10, 3)) + pil_image = PILImage.open(StringIO.StringIO(encoded)) + self.assertAllEqual(image, np.array(pil_image)) + + def test_created_example_has_correct_values(self): + example_serialized = unittest_utils.create_serialized_example({ + 'labels': [1, 2, 3], + 'data': ['FAKE'] + }) + example = tf.train.Example() + example.ParseFromString(example_serialized) + self.assertProtoEquals(""" + features { + feature { + key: "labels" + value { int64_list { + value: 1 + value: 2 + value: 3 + }} + } + feature { + key: "data" + value { bytes_list { + value: "FAKE" + }} + } + } + """, example) + + +if __name__ == '__main__': + tf.test.main() diff --git a/attention_ocr/python/eval.py b/attention_ocr/python/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..ec68ad50bc25cd8528f4e9fd7976adad72782641 --- /dev/null +++ b/attention_ocr/python/eval.py @@ -0,0 +1,78 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Script to evaluate a trained Attention OCR model. + +A simple usage example: +python eval.py +""" +import tensorflow as tf +from tensorflow.contrib import slim +from tensorflow import app +from tensorflow.python.platform import flags + +import data_provider +import common_flags + +FLAGS = flags.FLAGS +common_flags.define() + +# yapf: disable +flags.DEFINE_integer('num_batches', 100, + 'Number of batches to run eval for.') + +flags.DEFINE_string('eval_log_dir', '/tmp/attention_ocr/eval', + 'Directory where the evaluation results are saved to.') + +flags.DEFINE_integer('eval_interval_secs', 60, + 'Frequency in seconds to run evaluations.') + +flags.DEFINE_integer('number_of_steps', None, + 'Number of times to run evaluation.') +# yapf: enable + + +def main(_): + if not tf.gfile.Exists(FLAGS.eval_log_dir): + tf.gfile.MakeDirs(FLAGS.eval_log_dir) + + dataset = common_flags.create_dataset(split_name=FLAGS.split_name) + model = common_flags.create_model(dataset.num_char_classes, + dataset.max_sequence_length, + dataset.num_of_views, dataset.null_code) + data = data_provider.get_data( + dataset, + FLAGS.batch_size, + augment=False, + central_crop_size=common_flags.get_crop_size()) + endpoints = model.create_base(data.images, labels_one_hot=None) + model.create_loss(data, endpoints) + eval_ops = model.create_summaries( + data, endpoints, dataset.charset, is_training=False) + slim.get_or_create_global_step() + session_config = tf.ConfigProto(device_count={"GPU": 0}) + slim.evaluation.evaluation_loop( + master=FLAGS.master, + checkpoint_dir=FLAGS.train_log_dir, + logdir=FLAGS.eval_log_dir, + eval_op=eval_ops, + num_evals=FLAGS.num_batches, + eval_interval_secs=FLAGS.eval_interval_secs, + max_number_of_evaluations=FLAGS.number_of_steps, + session_config=session_config) + + +if __name__ == '__main__': + app.run() diff --git a/attention_ocr/python/inception_preprocessing.py b/attention_ocr/python/inception_preprocessing.py new file mode 100644 index 0000000000000000000000000000000000000000..d3c3a5b07c24bc1a9e62d52b3213aff31c67d7b7 --- /dev/null +++ b/attention_ocr/python/inception_preprocessing.py @@ -0,0 +1,315 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Provides utilities to preprocess images for the Inception networks.""" + +# TODO(gorban): add as a dependency, when slim or tensorflow/models are pipfied +# Source: +# https://raw.githubusercontent.com/tensorflow/models/a9d0e6e8923a4/slim/preprocessing/inception_preprocessing.py +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import tensorflow as tf + +from tensorflow.python.ops import control_flow_ops + + +def apply_with_random_selector(x, func, num_cases): + """Computes func(x, sel), with sel sampled from [0...num_cases-1]. 
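+
+  This helper is used below to pick, per image, one of several equivalent
+  preprocessing variants (e.g. a resize method or a color distortion
+  ordering) at graph execution time.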
+ + Args: + x: input Tensor. + func: Python function to apply. + num_cases: Python int32, number of cases to sample sel from. + + Returns: + The result of func(x, sel), where func receives the value of the + selector as a python integer, but sel is sampled dynamically. + """ + sel = tf.random_uniform([], maxval=num_cases, dtype=tf.int32) + # Pass the real x only to one of the func calls. + return control_flow_ops.merge([ + func(control_flow_ops.switch(x, tf.equal(sel, case))[1], case) + for case in range(num_cases) + ])[0] + + +def distort_color(image, color_ordering=0, fast_mode=True, scope=None): + """Distort the color of a Tensor image. + + Each color distortion is non-commutative and thus ordering of the color ops + matters. Ideally we would randomly permute the ordering of the color ops. + Rather then adding that level of complication, we select a distinct ordering + of color ops for each preprocessing thread. + + Args: + image: 3-D Tensor containing single image in [0, 1]. + color_ordering: Python int, a type of distortion (valid values: 0-3). + fast_mode: Avoids slower ops (random_hue and random_contrast) + scope: Optional scope for name_scope. + Returns: + 3-D Tensor color-distorted image on range [0, 1] + Raises: + ValueError: if color_ordering not in [0, 3] + """ + with tf.name_scope(scope, 'distort_color', [image]): + if fast_mode: + if color_ordering == 0: + image = tf.image.random_brightness(image, max_delta=32. / 255.) + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + else: + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + image = tf.image.random_brightness(image, max_delta=32. / 255.) + else: + if color_ordering == 0: + image = tf.image.random_brightness(image, max_delta=32. / 255.) + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + image = tf.image.random_hue(image, max_delta=0.2) + image = tf.image.random_contrast(image, lower=0.5, upper=1.5) + elif color_ordering == 1: + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + image = tf.image.random_brightness(image, max_delta=32. / 255.) + image = tf.image.random_contrast(image, lower=0.5, upper=1.5) + image = tf.image.random_hue(image, max_delta=0.2) + elif color_ordering == 2: + image = tf.image.random_contrast(image, lower=0.5, upper=1.5) + image = tf.image.random_hue(image, max_delta=0.2) + image = tf.image.random_brightness(image, max_delta=32. / 255.) + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + elif color_ordering == 3: + image = tf.image.random_hue(image, max_delta=0.2) + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + image = tf.image.random_contrast(image, lower=0.5, upper=1.5) + image = tf.image.random_brightness(image, max_delta=32. / 255.) + else: + raise ValueError('color_ordering must be in [0, 3]') + + # The random_* ops do not necessarily clamp. + return tf.clip_by_value(image, 0.0, 1.0) + + +def distorted_bounding_box_crop(image, + bbox, + min_object_covered=0.1, + aspect_ratio_range=(0.75, 1.33), + area_range=(0.05, 1.0), + max_attempts=100, + scope=None): + """Generates cropped_image using a one of the bboxes randomly distorted. + + See `tf.image.sample_distorted_bounding_box` for more documentation. + + Args: + image: 3-D Tensor of image (it will be converted to floats in [0, 1]). + bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] + where each coordinate is [0, 1) and the coordinates are arranged + as [ymin, xmin, ymax, xmax]. 
If num_boxes is 0 then it would use the + whole image. + min_object_covered: An optional `float`. Defaults to `0.1`. The cropped + area of the image must contain at least this fraction of any bounding box + supplied. + aspect_ratio_range: An optional list of `floats`. The cropped area of the + image must have an aspect ratio = width / height within this range. + area_range: An optional list of `floats`. The cropped area of the image + must contain a fraction of the supplied image within in this range. + max_attempts: An optional `int`. Number of attempts at generating a cropped + region of the image of the specified constraints. After `max_attempts` + failures, return the entire image. + scope: Optional scope for name_scope. + Returns: + A tuple, a 3-D Tensor cropped_image and the distorted bbox + """ + with tf.name_scope(scope, 'distorted_bounding_box_crop', [image, bbox]): + # Each bounding box has shape [1, num_boxes, box coords] and + # the coordinates are ordered [ymin, xmin, ymax, xmax]. + + # A large fraction of image datasets contain a human-annotated bounding + # box delineating the region of the image containing the object of interest. + # We choose to create a new bounding box for the object which is a randomly + # distorted version of the human-annotated bounding box that obeys an + # allowed range of aspect ratios, sizes and overlap with the human-annotated + # bounding box. If no box is supplied, then we assume the bounding box is + # the entire image. + sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( + tf.shape(image), + bounding_boxes=bbox, + min_object_covered=min_object_covered, + aspect_ratio_range=aspect_ratio_range, + area_range=area_range, + max_attempts=max_attempts, + use_image_if_no_bounding_boxes=True) + bbox_begin, bbox_size, distort_bbox = sample_distorted_bounding_box + + # Crop the image to the specified bounding box. + cropped_image = tf.slice(image, bbox_begin, bbox_size) + return cropped_image, distort_bbox + + +def preprocess_for_train(image, + height, + width, + bbox, + fast_mode=True, + scope=None): + """Distort one image for training a network. + + Distorting images provides a useful technique for augmenting the data + set during training in order to make the network invariant to aspects + of the image that do not effect the label. + + Additionally it would create image_summaries to display the different + transformations applied to the image. + + Args: + image: 3-D Tensor of image. If dtype is tf.float32 then the range should be + [0, 1], otherwise it would converted to tf.float32 assuming that the range + is [0, MAX], where MAX is largest positive representable number for + int(8/16/32) data type (see `tf.image.convert_image_dtype` for details). + height: integer + width: integer + bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] + where each coordinate is [0, 1) and the coordinates are arranged + as [ymin, xmin, ymax, xmax]. + fast_mode: Optional boolean, if True avoids slower transformations (i.e. + bi-cubic resizing, random_hue or random_contrast). + scope: Optional scope for name_scope. + Returns: + 3-D float Tensor of distorted image used for training with range [-1, 1]. 
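+    The image is zero-centered and rescaled, i.e. image = (image - 0.5) * 2.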
+ """ + with tf.name_scope(scope, 'distort_image', [image, height, width, bbox]): + if bbox is None: + bbox = tf.constant( + [0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4]) + if image.dtype != tf.float32: + image = tf.image.convert_image_dtype(image, dtype=tf.float32) + # Each bounding box has shape [1, num_boxes, box coords] and + # the coordinates are ordered [ymin, xmin, ymax, xmax]. + image_with_box = tf.image.draw_bounding_boxes( + tf.expand_dims(image, 0), bbox) + tf.summary.image('image_with_bounding_boxes', image_with_box) + + distorted_image, distorted_bbox = distorted_bounding_box_crop(image, bbox) + # Restore the shape since the dynamic slice based upon the bbox_size loses + # the third dimension. + distorted_image.set_shape([None, None, 3]) + image_with_distorted_box = tf.image.draw_bounding_boxes( + tf.expand_dims(image, 0), distorted_bbox) + tf.summary.image('images_with_distorted_bounding_box', + image_with_distorted_box) + + # This resizing operation may distort the images because the aspect + # ratio is not respected. We select a resize method in a round robin + # fashion based on the thread number. + # Note that ResizeMethod contains 4 enumerated resizing methods. + + # We select only 1 case for fast_mode bilinear. + num_resize_cases = 1 if fast_mode else 4 + distorted_image = apply_with_random_selector( + distorted_image, + lambda x, method: tf.image.resize_images(x, [height, width], method=method), + num_cases=num_resize_cases) + + tf.summary.image('cropped_resized_image', + tf.expand_dims(distorted_image, 0)) + + # Randomly flip the image horizontally. + distorted_image = tf.image.random_flip_left_right(distorted_image) + + # Randomly distort the colors. There are 4 ways to do it. + distorted_image = apply_with_random_selector( + distorted_image, + lambda x, ordering: distort_color(x, ordering, fast_mode), + num_cases=4) + + tf.summary.image('final_distorted_image', + tf.expand_dims(distorted_image, 0)) + distorted_image = tf.subtract(distorted_image, 0.5) + distorted_image = tf.multiply(distorted_image, 2.0) + return distorted_image + + +def preprocess_for_eval(image, + height, + width, + central_fraction=0.875, + scope=None): + """Prepare one image for evaluation. + + If height and width are specified it would output an image with that size by + applying resize_bilinear. + + If central_fraction is specified it would cropt the central fraction of the + input image. + + Args: + image: 3-D Tensor of image. If dtype is tf.float32 then the range should be + [0, 1], otherwise it would converted to tf.float32 assuming that the range + is [0, MAX], where MAX is largest positive representable number for + int(8/16/32) data type (see `tf.image.convert_image_dtype` for details) + height: integer + width: integer + central_fraction: Optional Float, fraction of the image to crop. + scope: Optional scope for name_scope. + Returns: + 3-D float Tensor of prepared image. + """ + with tf.name_scope(scope, 'eval_image', [image, height, width]): + if image.dtype != tf.float32: + image = tf.image.convert_image_dtype(image, dtype=tf.float32) + # Crop the central region of the image with an area containing 87.5% of + # the original image. + if central_fraction: + image = tf.image.central_crop(image, central_fraction=central_fraction) + + if height and width: + # Resize the image to the specified height and width. 
+ image = tf.expand_dims(image, 0) + image = tf.image.resize_bilinear( + image, [height, width], align_corners=False) + image = tf.squeeze(image, [0]) + image = tf.subtract(image, 0.5) + image = tf.multiply(image, 2.0) + return image + + +def preprocess_image(image, + height, + width, + is_training=False, + bbox=None, + fast_mode=True): + """Pre-process one image for training or evaluation. + + Args: + image: 3-D Tensor [height, width, channels] with the image. + height: integer, image expected height. + width: integer, image expected width. + is_training: Boolean. If true it would transform an image for train, + otherwise it would transform it for evaluation. + bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] + where each coordinate is [0, 1) and the coordinates are arranged as + [ymin, xmin, ymax, xmax]. + fast_mode: Optional boolean, if True avoids slower transformations. + + Returns: + 3-D float Tensor containing an appropriately scaled image + + Raises: + ValueError: if user does not provide bounding box + """ + if is_training: + return preprocess_for_train(image, height, width, bbox, fast_mode) + else: + return preprocess_for_eval(image, height, width) diff --git a/attention_ocr/python/metrics.py b/attention_ocr/python/metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..9e2a6a7579812583dc60546f97976f05befe07ff --- /dev/null +++ b/attention_ocr/python/metrics.py @@ -0,0 +1,90 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Quality metrics for the model.""" + +import tensorflow as tf + + +def char_accuracy(predictions, targets, rej_char, streaming=False): + """Computes character level accuracy. + + Both predictions and targets should have the same shape + [batch_size x seq_length]. + + Args: + predictions: predicted characters ids. + targets: ground truth character ids. + rej_char: the character id used to mark an empty element (end of sequence). + streaming: if True, uses the streaming mean from the slim.metric module. + + Returns: + a update_ops for execution and value tensor whose value on evaluation + returns the total character accuracy. + """ + with tf.variable_scope('CharAccuracy'): + predictions.get_shape().assert_is_compatible_with(targets.get_shape()) + + targets = tf.to_int32(targets) + const_rej_char = tf.constant(rej_char, shape=targets.get_shape()) + weights = tf.to_float(tf.not_equal(targets, const_rej_char)) + correct_chars = tf.to_float(tf.equal(predictions, targets)) + accuracy_per_example = tf.div( + tf.reduce_sum(tf.multiply(correct_chars, weights), 1), + tf.reduce_sum(weights, 1)) + if streaming: + return tf.contrib.metrics.streaming_mean(accuracy_per_example) + else: + return tf.reduce_mean(accuracy_per_example) + + +def sequence_accuracy(predictions, targets, rej_char, streaming=False): + """Computes sequence level accuracy. 
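+
+  A sequence is counted as correct only if all of its characters match the
+  ground truth; positions where the target equals rej_char are forced to
+  match and therefore cannot cause a mismatch.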
+ + Both input tensors should have the same shape: [batch_size x seq_length]. + + Args: + predictions: predicted character classes. + targets: ground truth character classes. + rej_char: the character id used to mark empty element (end of sequence). + streaming: if True, uses the streaming mean from the slim.metric module. + + Returns: + a update_ops for execution and value tensor whose value on evaluation + returns the total sequence accuracy. + """ + + with tf.variable_scope('SequenceAccuracy'): + predictions.get_shape().assert_is_compatible_with(targets.get_shape()) + + targets = tf.to_int32(targets) + const_rej_char = tf.constant( + rej_char, shape=targets.get_shape(), dtype=tf.int32) + include_mask = tf.not_equal(targets, const_rej_char) + include_predictions = tf.to_int32( + tf.where(include_mask, predictions, + tf.zeros_like(predictions) + rej_char)) + correct_chars = tf.to_float(tf.equal(include_predictions, targets)) + correct_chars_counts = tf.cast( + tf.reduce_sum(correct_chars, reduction_indices=[1]), dtype=tf.int32) + target_length = targets.get_shape().dims[1].value + target_chars_counts = tf.constant( + target_length, shape=correct_chars_counts.get_shape()) + accuracy_per_example = tf.to_float( + tf.equal(correct_chars_counts, target_chars_counts)) + if streaming: + return tf.contrib.metrics.streaming_mean(accuracy_per_example) + else: + return tf.reduce_mean(accuracy_per_example) diff --git a/attention_ocr/python/metrics_test.py b/attention_ocr/python/metrics_test.py new file mode 100644 index 0000000000000000000000000000000000000000..68b9724f1d20e62f39f8b1b5c0130d4ea76cf825 --- /dev/null +++ b/attention_ocr/python/metrics_test.py @@ -0,0 +1,97 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for the metrics module.""" +import contextlib +import numpy as np +import tensorflow as tf + +import metrics + + +class AccuracyTest(tf.test.TestCase): + def setUp(self): + tf.test.TestCase.setUp(self) + self.rng = np.random.RandomState([11, 23, 50]) + self.num_char_classes = 3 + self.batch_size = 4 + self.seq_length = 5 + self.rej_char = 42 + + @contextlib.contextmanager + def initialized_session(self): + """Wrapper for test session context manager with required initialization. + + Yields: + A session object that should be used as a context manager. 
+ """ + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + sess.run(tf.local_variables_initializer()) + yield sess + + def _fake_labels(self): + return self.rng.randint( + low=0, + high=self.num_char_classes, + size=(self.batch_size, self.seq_length), + dtype='int32') + + def _incorrect_copy(self, values, bad_indexes): + incorrect = np.copy(values) + incorrect[bad_indexes] = values[bad_indexes] + 1 + return incorrect + + def test_sequence_accuracy_identical_samples(self): + labels_tf = tf.convert_to_tensor(self._fake_labels()) + + accuracy_tf = metrics.sequence_accuracy(labels_tf, labels_tf, + self.rej_char) + with self.initialized_session() as sess: + accuracy_np = sess.run(accuracy_tf) + + self.assertAlmostEqual(accuracy_np, 1.0) + + def test_sequence_accuracy_one_char_difference(self): + ground_truth_np = self._fake_labels() + ground_truth_tf = tf.convert_to_tensor(ground_truth_np) + prediction_tf = tf.convert_to_tensor( + self._incorrect_copy(ground_truth_np, bad_indexes=((0, 0)))) + + accuracy_tf = metrics.sequence_accuracy(prediction_tf, ground_truth_tf, + self.rej_char) + with self.initialized_session() as sess: + accuracy_np = sess.run(accuracy_tf) + + # 1 of 4 sequences is incorrect. + self.assertAlmostEqual(accuracy_np, 1.0 - 1.0 / self.batch_size) + + def test_char_accuracy_one_char_difference_with_padding(self): + ground_truth_np = self._fake_labels() + ground_truth_tf = tf.convert_to_tensor(ground_truth_np) + prediction_tf = tf.convert_to_tensor( + self._incorrect_copy(ground_truth_np, bad_indexes=((0, 0)))) + + accuracy_tf = metrics.char_accuracy(prediction_tf, ground_truth_tf, + self.rej_char) + with self.initialized_session() as sess: + accuracy_np = sess.run(accuracy_tf) + + chars_count = self.seq_length * self.batch_size + self.assertAlmostEqual(accuracy_np, 1.0 - 1.0 / chars_count) + + +if __name__ == '__main__': + tf.test.main() diff --git a/attention_ocr/python/model.py b/attention_ocr/python/model.py new file mode 100644 index 0000000000000000000000000000000000000000..8e0e19bb887e1476a4e2a6df82491a5e9a812460 --- /dev/null +++ b/attention_ocr/python/model.py @@ -0,0 +1,531 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Functions to build the Attention OCR model. + +Usage example: + ocr_model = model.Model(num_char_classes, seq_length, num_of_views) + + data = ... # create namedtuple InputEndpoints + endpoints = model.create_base(data.images, data.labels_one_hot) + # endpoints.predicted_chars is a tensor with predicted character codes. 
+ total_loss = model.create_loss(data, endpoints) +""" +import sys +import collections +import logging +import tensorflow as tf +from tensorflow.contrib import slim +from tensorflow.contrib.slim.nets import inception + +import metrics +import sequence_layers +import utils + + +OutputEndpoints = collections.namedtuple('OutputEndpoints', [ + 'chars_logit', 'chars_log_prob', 'predicted_chars', 'predicted_scores' +]) + +# TODO(gorban): replace with tf.HParams when it is released. +ModelParams = collections.namedtuple('ModelParams', [ + 'num_char_classes', 'seq_length', 'num_views', 'null_code' +]) + +ConvTowerParams = collections.namedtuple('ConvTowerParams', ['final_endpoint']) + +SequenceLogitsParams = collections.namedtuple('SequenceLogitsParams', [ + 'use_attention', 'use_autoregression', 'num_lstm_units', 'weight_decay', + 'lstm_state_clip_value' +]) + +SequenceLossParams = collections.namedtuple('SequenceLossParams', [ + 'label_smoothing', 'ignore_nulls', 'average_across_timesteps' +]) + + +def _dict_to_array(id_to_char, default_character): + num_char_classes = max(id_to_char.keys()) + 1 + array = [default_character] * num_char_classes + for k, v in id_to_char.iteritems(): + array[k] = v + return array + + +class CharsetMapper(object): + """A simple class to map tensor ids into strings. + + It works only when the character set is 1:1 mapping between individual + characters and individual ids. + + Make sure you call tf.tables_initializer().run() as part of the init op. + """ + + def __init__(self, charset, default_character='?'): + """Creates a lookup table. + + Args: + charset: a dictionary with id-to-character mapping. + """ + mapping_strings = tf.constant(_dict_to_array(charset, default_character)) + self.table = tf.contrib.lookup.index_to_string_table_from_tensor( + mapping=mapping_strings, default_value=default_character) + + def get_text(self, ids): + """Returns a string corresponding to a sequence of character ids. + + Args: + ids: a tensor with shape [batch_size, max_sequence_length] + """ + return tf.reduce_join( + self.table.lookup(tf.to_int64(ids)), reduction_indices=1) + + +def get_softmax_loss_fn(label_smoothing): + """Returns sparse or dense loss function depending on the label_smoothing. + + Args: + label_smoothing: weight for label smoothing + + Returns: + a function which takes labels and predictions as arguments and returns + a softmax loss for the selected type of labels (sparse or dense). + """ + if label_smoothing > 0: + + def loss_fn(labels, logits): + return (tf.nn.softmax_cross_entropy_with_logits( + logits=logits, labels=labels)) + else: + + def loss_fn(labels, logits): + return tf.nn.sparse_softmax_cross_entropy_with_logits( + logits=logits, labels=labels) + + return loss_fn + + +class Model(object): + """Class to create the Attention OCR Model.""" + + def __init__(self, + num_char_classes, + seq_length, + num_views, + null_code, + mparams=None): + """Initialized model parameters. + + Args: + num_char_classes: size of character set. + seq_length: number of characters in a sequence. + num_views: Number of views (conv towers) to use. + null_code: A character code corresponding to a character which + indicates end of a sequence. + mparams: a dictionary with hyper parameters for methods, keys - + function names, values - corresponding namedtuples. 
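+        For example (illustrative): {'sequence_loss_fn':
+        SequenceLossParams(label_smoothing=0, ignore_nulls=True,
+        average_across_timesteps=False)}.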
+ """ + super(Model, self).__init__() + self._params = ModelParams( + num_char_classes=num_char_classes, + seq_length=seq_length, + num_views=num_views, + null_code=null_code) + self._mparams = self.default_mparams() + if mparams: + self._mparams.update(mparams) + + def default_mparams(self): + return { + 'conv_tower_fn': + ConvTowerParams(final_endpoint='Mixed_5d'), + 'sequence_logit_fn': + SequenceLogitsParams( + use_attention=True, + use_autoregression=True, + num_lstm_units=256, + weight_decay=0.00004, + lstm_state_clip_value=10.0), + 'sequence_loss_fn': + SequenceLossParams( + label_smoothing=0.1, + ignore_nulls=True, + average_across_timesteps=False) + } + + def set_mparam(self, function, **kwargs): + self._mparams[function] = self._mparams[function]._replace(**kwargs) + + def conv_tower_fn(self, images, is_training=True, reuse=None): + """Computes convolutional features using the InceptionV3 model. + + Args: + images: A tensor of shape [batch_size, height, width, channels]. + is_training: whether is training or not. + reuse: whether or not the network and its variables should be reused. To + be able to reuse 'scope' must be given. + + Returns: + A tensor of shape [batch_size, OH, OW, N], where OWxOH is resolution of + output feature map and N is number of output features (depends on the + network architecture). + """ + mparams = self._mparams['conv_tower_fn'] + logging.debug('Using final_endpoint=%s', mparams.final_endpoint) + with tf.variable_scope('conv_tower_fn/INCE'): + if reuse: + tf.get_variable_scope().reuse_variables() + with slim.arg_scope(inception.inception_v3_arg_scope()): + net, _ = inception.inception_v3_base( + images, final_endpoint=mparams.final_endpoint) + return net + + def _create_lstm_inputs(self, net): + """Splits an input tensor into a list of tensors (features). + + Args: + net: A feature map of shape [batch_size, num_features, feature_size]. + + Raises: + AssertionError: if num_features is less than seq_length. + + Returns: + A list with seq_length tensors of shape [batch_size, feature_size] + """ + num_features = net.get_shape().dims[1].value + if num_features < self._params.seq_length: + raise AssertionError('Incorrect dimension #1 of input tensor' + ' %d should be bigger than %d (shape=%s)' % + (num_features, self._params.seq_length, + net.get_shape())) + elif num_features > self._params.seq_length: + logging.warning('Ignoring some features: use %d of %d (shape=%s)', + self._params.seq_length, num_features, net.get_shape()) + net = tf.slice(net, [0, 0, 0], [-1, self._params.seq_length, -1]) + + return tf.unstack(net, axis=1) + + def sequence_logit_fn(self, net, labels_one_hot): + mparams = self._mparams['sequence_logit_fn'] + # TODO(gorban): remove /alias suffixes from the scopes. + with tf.variable_scope('sequence_logit_fn/SQLR'): + layer_class = sequence_layers.get_layer_class(mparams.use_attention, + mparams.use_autoregression) + layer = layer_class(net, labels_one_hot, self._params, mparams) + return layer.create_logits() + + def max_pool_views(self, nets_list): + """Max pool across all nets in spatial dimensions. + + Args: + nets_list: A list of 4D tensors with identical size. + + Returns: + A tensor with the same size as any input tensors. 
+ """ + batch_size, height, width, num_features = [ + d.value for d in nets_list[0].get_shape().dims + ] + xy_flat_shape = (batch_size, 1, height * width, num_features) + nets_for_merge = [] + with tf.variable_scope('max_pool_views', values=nets_list): + for net in nets_list: + nets_for_merge.append(tf.reshape(net, xy_flat_shape)) + merged_net = tf.concat(nets_for_merge, 1) + net = slim.max_pool2d( + merged_net, kernel_size=[len(nets_list), 1], stride=1) + net = tf.reshape(net, (batch_size, height, width, num_features)) + return net + + def pool_views_fn(self, nets): + """Combines output of multiple convolutional towers into a single tensor. + + It stacks towers one on top another (in height dim) in a 4x1 grid. + The order is arbitrary design choice and shouldn't matter much. + + Args: + nets: list of tensors of shape=[batch_size, height, width, num_features]. + + Returns: + A tensor of shape [batch_size, seq_length, features_size]. + """ + with tf.variable_scope('pool_views_fn/STCK'): + net = tf.concat(nets, 1) + batch_size = net.get_shape().dims[0].value + feature_size = net.get_shape().dims[3].value + return tf.reshape(net, [batch_size, -1, feature_size]) + + def char_predictions(self, chars_logit): + """Returns confidence scores (softmax values) for predicted characters. + + Args: + chars_logit: chars logits, a tensor with shape + [batch_size x seq_length x num_char_classes] + + Returns: + A tuple (ids, log_prob, scores), where: + ids - predicted characters, a int32 tensor with shape + [batch_size x seq_length]; + log_prob - a log probability of all characters, a float tensor with + shape [batch_size, seq_length, num_char_classes]; + scores - corresponding confidence scores for characters, a float + tensor + with shape [batch_size x seq_length]. + """ + log_prob = utils.logits_to_log_prob(chars_logit) + ids = tf.to_int32(tf.argmax(log_prob, dimension=2), name='predicted_chars') + mask = tf.cast( + slim.one_hot_encoding(ids, self._params.num_char_classes), tf.bool) + all_scores = tf.nn.softmax(chars_logit) + selected_scores = tf.boolean_mask(all_scores, mask, name='char_scores') + scores = tf.reshape(selected_scores, shape=(-1, self._params.seq_length)) + return ids, log_prob, scores + + def create_base(self, + images, + labels_one_hot, + scope='AttentionOcr_v1', + reuse=None): + """Creates a base part of the Model (no gradients, losses or summaries). + + Args: + images: A tensor of shape [batch_size, height, width, channels]. + labels_one_hot: Optional (can be None) one-hot encoding for ground truth + labels. If provided the function will create a model for training. + scope: Optional variable_scope. + reuse: whether or not the network and its variables should be reused. To + be able to reuse 'scope' must be given. + + Returns: + A named tuple OutputEndpoints. 
+ """ + logging.debug('images: %s', images) + is_training = labels_one_hot is not None + with tf.variable_scope(scope, reuse=reuse): + views = tf.split( + value=images, num_or_size_splits=self._params.num_views, axis=2) + logging.debug('Views=%d single view: %s', len(views), views[0]) + + nets = [ + self.conv_tower_fn(v, is_training, reuse=(i != 0)) + for i, v in enumerate(views) + ] + logging.debug('Conv tower: %s', nets[0]) + + net = self.pool_views_fn(nets) + logging.debug('Pooled views: %s', net) + + chars_logit = self.sequence_logit_fn(net, labels_one_hot) + logging.debug('chars_logit: %s', chars_logit) + + predicted_chars, chars_log_prob, predicted_scores = ( + self.char_predictions(chars_logit)) + + return OutputEndpoints( + chars_logit=chars_logit, + chars_log_prob=chars_log_prob, + predicted_chars=predicted_chars, + predicted_scores=predicted_scores) + + def create_loss(self, data, endpoints): + """Creates all losses required to train the model. + + Args: + data: InputEndpoints namedtuple. + endpoints: Model namedtuple. + + Returns: + Total loss. + """ + # NOTE: the return value of ModelLoss is not used directly for the + # gradient computation because under the hood it calls slim.losses.AddLoss, + # which registers the loss in an internal collection and later returns it + # as part of GetTotalLoss. We need to use total loss because model may have + # multiple losses including regularization losses. + self.sequence_loss_fn(endpoints.chars_logit, data.labels) + total_loss = slim.losses.get_total_loss() + tf.summary.scalar('TotalLoss', total_loss) + return total_loss + + def label_smoothing_regularization(self, chars_labels, weight=0.1): + """Applies a label smoothing regularization. + + Uses the same method as in https://arxiv.org/abs/1512.00567. + + Args: + chars_labels: ground truth ids of charactes, + shape=[batch_size, seq_length]; + weight: label-smoothing regularization weight. + + Returns: + A sensor with the same shape as the input. + """ + one_hot_labels = tf.one_hot( + chars_labels, depth=self._params.num_char_classes, axis=-1) + pos_weight = 1.0 - weight + neg_weight = weight / self._params.num_char_classes + return one_hot_labels * pos_weight + neg_weight + + def sequence_loss_fn(self, chars_logits, chars_labels): + """Loss function for char sequence. + + Depending on values of hyper parameters it applies label smoothing and can + also ignore all null chars after the first one. + + Args: + chars_logits: logits for predicted characters, + shape=[batch_size, seq_length, num_char_classes]; + chars_labels: ground truth ids of characters, + shape=[batch_size, seq_length]; + mparams: method hyper parameters. + + Returns: + A Tensor with shape [batch_size] - the log-perplexity for each sequence. + """ + mparams = self._mparams['sequence_loss_fn'] + with tf.variable_scope('sequence_loss_fn/SLF'): + if mparams.label_smoothing > 0: + smoothed_one_hot_labels = self.label_smoothing_regularization( + chars_labels, mparams.label_smoothing) + labels_list = tf.unstack(smoothed_one_hot_labels, axis=1) + else: + # NOTE: in case of sparse softmax we are not using one-hot + # encoding. + labels_list = tf.unstack(chars_labels, axis=1) + + batch_size, seq_length, _ = chars_logits.shape.as_list() + if mparams.ignore_nulls: + weights = tf.ones((batch_size, seq_length), dtype=tf.float32) + else: + # Suppose that reject character is the last in the charset. 
+ reject_char = tf.constant( + self._params.num_char_classes - 1, + shape=(batch_size, seq_length), + dtype=tf.int64) + known_char = tf.not_equal(chars_labels, reject_char) + weights = tf.to_float(known_char) + + logits_list = tf.unstack(chars_logits, axis=1) + weights_list = tf.unstack(weights, axis=1) + loss = tf.contrib.legacy_seq2seq.sequence_loss( + logits_list, + labels_list, + weights_list, + softmax_loss_function=get_softmax_loss_fn(mparams.label_smoothing), + average_across_timesteps=mparams.average_across_timesteps) + tf.losses.add_loss(loss) + return loss + + def create_summaries(self, data, endpoints, charset, is_training): + """Creates all summaries for the model. + + Args: + data: InputEndpoints namedtuple. + endpoints: OutputEndpoints namedtuple. + charset: A dictionary with mapping between character codes and + unicode characters. Use the one provided by a dataset.charset. + is_training: If True will create summary prefixes for training job, + otherwise - for evaluation. + + Returns: + A list of evaluation ops + """ + + def sname(label): + prefix = 'train' if is_training else 'eval' + return '%s/%s' % (prefix, label) + + max_outputs = 4 + # TODO(gorban): uncomment, when tf.summary.text released. + # charset_mapper = CharsetMapper(charset) + # pr_text = charset_mapper.get_text( + # endpoints.predicted_chars[:max_outputs,:]) + # tf.summary.text(sname('text/pr'), pr_text) + # gt_text = charset_mapper.get_text(data.labels[:max_outputs,:]) + # tf.summary.text(sname('text/gt'), gt_text) + tf.summary.image(sname('image'), data.images, max_outputs=max_outputs) + + if is_training: + tf.summary.image( + sname('image/orig'), data.images_orig, max_outputs=max_outputs) + for var in tf.trainable_variables(): + tf.summary.histogram(var.op.name, var) + return None + + else: + names_to_values = {} + names_to_updates = {} + + def use_metric(name, value_update_tuple): + names_to_values[name] = value_update_tuple[0] + names_to_updates[name] = value_update_tuple[1] + + use_metric('CharacterAccuracy', + metrics.char_accuracy( + endpoints.predicted_chars, + data.labels, + streaming=True, + rej_char=self._params.null_code)) + # Sequence accuracy computed by cutting sequence at the first null char + use_metric('SequenceAccuracy', + metrics.sequence_accuracy( + endpoints.predicted_chars, + data.labels, + streaming=True, + rej_char=self._params.null_code)) + + for name, value in names_to_values.iteritems(): + summary_name = 'eval/' + name + tf.summary.scalar(summary_name, tf.Print(value, [value], summary_name)) + return names_to_updates.values() + + def create_init_fn_to_restore(self, master_checkpoint, inception_checkpoint): + """Creates an init operations to restore weights from various checkpoints. + + Args: + master_checkpoint: path to a checkpoint which contains all weights for + the whole model. + inception_checkpoint: path to a checkpoint which contains weights for the + inception part only. + + Returns: + a function to run initialization ops. 
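+      The returned function takes a tf.Session as its only argument; it can
+      be passed, for example, as the `init_fn` argument of
+      `slim.learning.train`.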
+ """ + all_assign_ops = [] + all_feed_dict = {} + + def assign_from_checkpoint(variables, checkpoint): + logging.info('Request to re-store %d weights from %s', + len(variables), checkpoint) + if not variables: + logging.error('Can\'t find any variables to restore.') + sys.exit(1) + assign_op, feed_dict = slim.assign_from_checkpoint(checkpoint, variables) + all_assign_ops.append(assign_op) + all_feed_dict.update(feed_dict) + + if master_checkpoint: + assign_from_checkpoint(utils.variables_to_restore(), master_checkpoint) + + if inception_checkpoint: + variables = utils.variables_to_restore( + 'AttentionOcr_v1/conv_tower_fn/INCE', strip_scope=True) + assign_from_checkpoint(variables, inception_checkpoint) + + def init_assign_fn(sess): + logging.info('Restoring checkpoint(s)') + sess.run(all_assign_ops, all_feed_dict) + + return init_assign_fn diff --git a/attention_ocr/python/model_test.py b/attention_ocr/python/model_test.py new file mode 100644 index 0000000000000000000000000000000000000000..3626788b2124779702694a6b71e3aa5923021b32 --- /dev/null +++ b/attention_ocr/python/model_test.py @@ -0,0 +1,181 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for the model.""" + +import numpy as np +import string +import tensorflow as tf +from tensorflow.contrib import slim +from tensorflow.contrib.tfprof import model_analyzer + +import model +import data_provider + + +def create_fake_charset(num_char_classes): + charset = {} + for i in xrange(num_char_classes): + charset[i] = string.printable[i % len(string.printable)] + return charset + + +class ModelTest(tf.test.TestCase): + def setUp(self): + tf.test.TestCase.setUp(self) + + self.rng = np.random.RandomState([11, 23, 50]) + + self.batch_size = 4 + self.image_width = 600 + self.image_height = 30 + self.seq_length = 40 + self.num_char_classes = 72 + self.null_code = 62 + self.num_views = 4 + + feature_size = 288 + self.conv_tower_shape = (self.batch_size, 1, 72, feature_size) + self.features_shape = (self.batch_size, self.seq_length, feature_size) + self.chars_logit_shape = (self.batch_size, self.seq_length, + self.num_char_classes) + self.length_logit_shape = (self.batch_size, self.seq_length + 1) + + self.initialize_fakes() + + def initialize_fakes(self): + self.images_shape = (self.batch_size, self.image_height, self.image_width, + 3) + self.fake_images = tf.constant( + self.rng.randint(low=0, high=255, + size=self.images_shape).astype('float32'), + name='input_node') + self.fake_conv_tower_np = tf.constant( + self.rng.randn(*self.conv_tower_shape).astype('float32')) + self.fake_logits = tf.constant( + self.rng.randn(*self.chars_logit_shape).astype('float32')) + self.fake_labels = tf.constant( + self.rng.randint( + low=0, + high=self.num_char_classes, + size=(self.batch_size, self.seq_length)).astype('int64')) + + def create_model(self): + return model.Model( + self.num_char_classes, 
self.seq_length, num_views=4, null_code=62) + + def test_char_related_shapes(self): + ocr_model = self.create_model() + with self.test_session() as sess: + endpoints_tf = ocr_model.create_base( + images=self.fake_images, labels_one_hot=None) + + sess.run(tf.global_variables_initializer()) + endpoints = sess.run(endpoints_tf) + + self.assertEqual((self.batch_size, self.seq_length, + self.num_char_classes), endpoints.chars_logit.shape) + self.assertEqual((self.batch_size, self.seq_length, + self.num_char_classes), endpoints.chars_log_prob.shape) + self.assertEqual((self.batch_size, self.seq_length), + endpoints.predicted_chars.shape) + self.assertEqual((self.batch_size, self.seq_length), + endpoints.predicted_scores.shape) + + def test_predicted_scores_are_within_range(self): + ocr_model = self.create_model() + + _, _, scores = ocr_model.char_predictions(self.fake_logits) + with self.test_session() as sess: + scores_np = sess.run(scores) + + values_in_range = (scores_np >= 0.0) & (scores_np <= 1.0) + self.assertTrue( + np.all(values_in_range), + msg=('Scores contains out of the range values %s' % + scores_np[np.logical_not(values_in_range)])) + + def test_conv_tower_shape(self): + with self.test_session() as sess: + ocr_model = self.create_model() + conv_tower = ocr_model.conv_tower_fn(self.fake_images) + + sess.run(tf.global_variables_initializer()) + conv_tower_np = sess.run(conv_tower) + + self.assertEqual(self.conv_tower_shape, conv_tower_np.shape) + + def test_model_size_less_then1_gb(self): + # NOTE: Actual amount of memory occupied my TF during training will be at + # least 4X times bigger because of space need to store original weights, + # updates, gradients and variances. It also depends on the type of used + # optimizer. + ocr_model = self.create_model() + ocr_model.create_base(images=self.fake_images, labels_one_hot=None) + with self.test_session() as sess: + tfprof_root = model_analyzer.print_model_analysis( + sess.graph, + tfprof_options=model_analyzer.TRAINABLE_VARS_PARAMS_STAT_OPTIONS) + + model_size_bytes = 4 * tfprof_root.total_parameters + self.assertLess(model_size_bytes, 1 * 2**30) + + def test_create_summaries_is_runnable(self): + ocr_model = self.create_model() + data = data_provider.InputEndpoints( + images=self.fake_images, + images_orig=self.fake_images, + labels=self.fake_labels, + labels_one_hot=slim.one_hot_encoding(self.fake_labels, + self.num_char_classes)) + endpoints = ocr_model.create_base( + images=self.fake_images, labels_one_hot=None) + charset = create_fake_charset(self.num_char_classes) + summaries = ocr_model.create_summaries( + data, endpoints, charset, is_training=False) + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + sess.run(tf.local_variables_initializer()) + tf.tables_initializer().run() + sess.run(summaries) # just check it is runnable + + def test_sequence_loss_function_without_label_smoothing(self): + model = self.create_model() + model.set_mparam('sequence_loss_fn', label_smoothing=0) + + loss = model.sequence_loss_fn(self.fake_logits, self.fake_labels) + with self.test_session() as sess: + loss_np = sess.run(loss) + + # This test checks that the loss function is 'runnable'. 
+ self.assertEqual(loss_np.shape, tuple()) + + +class CharsetMapperTest(tf.test.TestCase): + def test_text_corresponds_to_ids(self): + charset = create_fake_charset(36) + ids = tf.constant( + [[17, 14, 21, 21, 24], [32, 24, 27, 21, 13]], dtype=tf.int64) + charset_mapper = model.CharsetMapper(charset) + + with self.test_session() as sess: + tf.tables_initializer().run() + text = sess.run(charset_mapper.get_text(ids)) + + self.assertAllEqual(text, ['hello', 'world']) + + +if __name__ == '__main__': + tf.test.main() diff --git a/attention_ocr/python/sequence_layers.py b/attention_ocr/python/sequence_layers.py new file mode 100644 index 0000000000000000000000000000000000000000..6e1e8493fdcf81eaf90d6769edefaf55a2baf7e8 --- /dev/null +++ b/attention_ocr/python/sequence_layers.py @@ -0,0 +1,422 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Various implementations of sequence layers for character prediction. + +A 'sequence layer' is a part of a computation graph which is responsible of +producing a sequence of characters using extracted image features. There are +many reasonable ways to implement such layers. All of them are using RNNs. +This module provides implementations which uses 'attention' mechanism to +spatially 'pool' image features and also can use a previously predicted +character to predict the next (aka auto regression). + +Usage: + Select one of available classes, e.g. Attention or use a wrapper function to + pick one based on your requirements: + layer_class = sequence_layers.get_layer_class(use_attention=True, + use_autoregression=True) + layer = layer_class(net, labels_one_hot, model_params, method_params) + char_logits = layer.create_logits() +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import collections +import abc +import logging +import numpy as np + +import tensorflow as tf + +from tensorflow.contrib import slim + + +def orthogonal_initializer(shape, dtype=tf.float32, *args, **kwargs): + """Generates orthonormal matrices with random values. + + Orthonormal initialization is important for RNNs: + http://arxiv.org/abs/1312.6120 + http://smerity.com/articles/2016/orthogonal_init.html + + For non-square shapes the returned matrix will be semi-orthonormal: if the + number of columns exceeds the number of rows, then the rows are orthonormal + vectors; but if the number of rows exceeds the number of columns, then the + columns are orthonormal vectors. + + We use SVD decomposition to generate an orthonormal matrix with random + values. The same way as it is done in the Lasagne library for Theano. Note + that both u and v returned by the svd are orthogonal and random. We just need + to pick one with the right shape. + + Args: + shape: a shape of the tensor matrix to initialize. + dtype: a dtype of the initialized tensor. + *args: not used. 
+ **kwargs: not used. + + Returns: + An initialized tensor. + """ + del args + del kwargs + flat_shape = (shape[0], np.prod(shape[1:])) + w = np.random.randn(*flat_shape) + u, _, v = np.linalg.svd(w, full_matrices=False) + w = u if u.shape == flat_shape else v + return tf.constant(w.reshape(shape), dtype=dtype) + + +SequenceLayerParams = collections.namedtuple('SequenceLogitsParams', [ + 'num_lstm_units', 'weight_decay', 'lstm_state_clip_value' +]) + + +class SequenceLayerBase(object): + """A base abstruct class for all sequence layers. + + A child class has to define following methods: + get_train_input + get_eval_input + unroll_cell + """ + __metaclass__ = abc.ABCMeta + + def __init__(self, net, labels_one_hot, model_params, method_params): + """Stores argument in member variable for further use. + + Args: + net: A tensor with shape [batch_size, num_features, feature_size] which + contains some extracted image features. + labels_one_hot: An optional (can be None) ground truth labels for the + input features. Is a tensor with shape + [batch_size, seq_length, num_char_classes] + model_params: A namedtuple with model parameters (model.ModelParams). + method_params: A SequenceLayerParams instance. + """ + self._params = model_params + self._mparams = method_params + self._net = net + self._labels_one_hot = labels_one_hot + self._batch_size = net.get_shape().dims[0].value + + # Initialize parameters for char logits which will be computed on the fly + # inside an LSTM decoder. + self._char_logits = {} + regularizer = slim.l2_regularizer(self._mparams.weight_decay) + self._softmax_w = slim.model_variable( + 'softmax_w', + [self._mparams.num_lstm_units, self._params.num_char_classes], + initializer=orthogonal_initializer, + regularizer=regularizer) + self._softmax_b = slim.model_variable( + 'softmax_b', [self._params.num_char_classes], + initializer=tf.zeros_initializer(), + regularizer=regularizer) + + @abc.abstractmethod + def get_train_input(self, prev, i): + """Returns a sample to be used to predict a character during training. + + This function is used as a loop_function for an RNN decoder. + + Args: + prev: output tensor from previous step of the RNN. A tensor with shape: + [batch_size, num_char_classes]. + i: index of a character in the output sequence. + + Returns: + A tensor with shape [batch_size, ?] - depth depends on implementation + details. + """ + pass + + @abc.abstractmethod + def get_eval_input(self, prev, i): + """Returns a sample to be used to predict a character during inference. + + This function is used as a loop_function for an RNN decoder. + + Args: + prev: output tensor from previous step of the RNN. A tensor with shape: + [batch_size, num_char_classes]. + i: index of a character in the output sequence. + + Returns: + A tensor with shape [batch_size, ?] - depth depends on implementation + details. + """ + raise AssertionError('Not implemented') + + @abc.abstractmethod + def unroll_cell(self, decoder_inputs, initial_state, loop_function, cell): + """Unrolls an RNN cell for all inputs. + + This is a placeholder to call some RNN decoder. It has a similar to + tf.seq2seq.rnn_decode interface. + + Args: + decoder_inputs: A list of 2D Tensors* [batch_size x input_size]. In fact, + most of existing decoders in presence of a loop_function use only the + first element to determine batch_size and length of the list to + determine number of steps. + initial_state: 2D Tensor with shape [batch_size x cell.state_size]. 
+ loop_function: function will be applied to the i-th output in order to + generate the i+1-st input (see self.get_input). + cell: rnn_cell.RNNCell defining the cell function and size. + + Returns: + A tuple of the form (outputs, state), where: + outputs: A list of character logits of the same length as + decoder_inputs of 2D Tensors with shape [batch_size x num_characters]. + state: The state of each cell at the final time-step. + It is a 2D Tensor of shape [batch_size x cell.state_size]. + """ + pass + + def is_training(self): + """Returns True if the layer is created for training stage.""" + return self._labels_one_hot is not None + + def char_logit(self, inputs, char_index): + """Creates logits for a character if required. + + Args: + inputs: A tensor with shape [batch_size, ?] (depth is implementation + dependent). + char_index: A integer index of a character in the output sequence. + + Returns: + A tensor with shape [batch_size, num_char_classes] + """ + if char_index not in self._char_logits: + self._char_logits[char_index] = tf.nn.xw_plus_b(inputs, self._softmax_w, + self._softmax_b) + return self._char_logits[char_index] + + def char_one_hot(self, logit): + """Creates one hot encoding for a logit of a character. + + Args: + logit: A tensor with shape [batch_size, num_char_classes]. + + Returns: + A tensor with shape [batch_size, num_char_classes] + """ + prediction = tf.argmax(logit, dimension=1) + return slim.one_hot_encoding(prediction, self._params.num_char_classes) + + def get_input(self, prev, i): + """A wrapper for get_train_input and get_eval_input. + + Args: + prev: output tensor from previous step of the RNN. A tensor with shape: + [batch_size, num_char_classes]. + i: index of a character in the output sequence. + + Returns: + A tensor with shape [batch_size, ?] - depth depends on implementation + details. + """ + if self.is_training(): + return self.get_train_input(prev, i) + else: + return self.get_eval_input(prev, i) + + def create_logits(self): + """Creates character sequence logits for a net specified in the constructor. + + A "main" method for the sequence layer which glues together all pieces. + + Returns: + A tensor with shape [batch_size, seq_length, num_char_classes]. + """ + with tf.variable_scope('LSTM'): + first_label = self.get_input(prev=None, i=0) + decoder_inputs = [first_label] + [None] * (self._params.seq_length - 1) + lstm_cell = tf.contrib.rnn.LSTMCell( + self._mparams.num_lstm_units, + use_peepholes=False, + cell_clip=self._mparams.lstm_state_clip_value, + state_is_tuple=True, + initializer=orthogonal_initializer) + lstm_outputs, _ = self.unroll_cell( + decoder_inputs=decoder_inputs, + initial_state=lstm_cell.zero_state(self._batch_size, tf.float32), + loop_function=self.get_input, + cell=lstm_cell) + + with tf.variable_scope('logits'): + logits_list = [ + tf.expand_dims(self.char_logit(logit, i), dim=1) + for i, logit in enumerate(lstm_outputs) + ] + + return tf.concat(logits_list, 1) + + +class NetSlice(SequenceLayerBase): + """A layer which uses a subset of image features to predict each character. + """ + + def __init__(self, *args, **kwargs): + super(NetSlice, self).__init__(*args, **kwargs) + self._zero_label = tf.zeros( + [self._batch_size, self._params.num_char_classes]) + + def get_image_feature(self, char_index): + """Returns a subset of image features for a character. + + Args: + char_index: an index of a character. + + Returns: + A tensor with shape [batch_size, ?]. The output depth depends on the + depth of input net. 
+ """ + batch_size, features_num, _ = [d.value for d in self._net.get_shape()] + slice_len = int(features_num / self._params.seq_length) + # In case when features_num != seq_length, we just pick a subset of image + # features, this choice is arbitrary and there is no intuitive geometrical + # interpretation. If features_num is not dividable by seq_length there will + # be unused image features. + net_slice = self._net[:, char_index:char_index + slice_len, :] + feature = tf.reshape(net_slice, [batch_size, -1]) + logging.debug('Image feature: %s', feature) + return feature + + def get_eval_input(self, prev, i): + """See SequenceLayerBase.get_eval_input for details.""" + del prev + return self.get_image_feature(i) + + def get_train_input(self, prev, i): + """See SequenceLayerBase.get_train_input for details.""" + return self.get_eval_input(prev, i) + + def unroll_cell(self, decoder_inputs, initial_state, loop_function, cell): + """See SequenceLayerBase.unroll_cell for details.""" + return tf.contrib.legacy_seq2seq.rnn_decoder( + decoder_inputs=decoder_inputs, + initial_state=initial_state, + cell=cell, + loop_function=self.get_input) + + +class NetSliceWithAutoregression(NetSlice): + """A layer similar to NetSlice, but it also uses auto regression. + + The "auto regression" means that we use network output for previous character + as a part of input for the current character. + """ + + def __init__(self, *args, **kwargs): + super(NetSliceWithAutoregression, self).__init__(*args, **kwargs) + + def get_eval_input(self, prev, i): + """See SequenceLayerBase.get_eval_input for details.""" + if i == 0: + prev = self._zero_label + else: + logit = self.char_logit(prev, char_index=i - 1) + prev = self.char_one_hot(logit) + image_feature = self.get_image_feature(char_index=i) + return tf.concat([image_feature, prev], 1) + + def get_train_input(self, prev, i): + """See SequenceLayerBase.get_train_input for details.""" + if i == 0: + prev = self._zero_label + else: + prev = self._labels_one_hot[:, i - 1, :] + image_feature = self.get_image_feature(i) + return tf.concat([image_feature, prev], 1) + + +class Attention(SequenceLayerBase): + """A layer which uses attention mechanism to select image features.""" + + def __init__(self, *args, **kwargs): + super(Attention, self).__init__(*args, **kwargs) + self._zero_label = tf.zeros( + [self._batch_size, self._params.num_char_classes]) + + def get_eval_input(self, prev, i): + """See SequenceLayerBase.get_eval_input for details.""" + del prev, i + # The attention_decoder will fetch image features from the net, no need for + # extra inputs. + return self._zero_label + + def get_train_input(self, prev, i): + """See SequenceLayerBase.get_train_input for details.""" + return self.get_eval_input(prev, i) + + def unroll_cell(self, decoder_inputs, initial_state, loop_function, cell): + return tf.contrib.legacy_seq2seq.attention_decoder( + decoder_inputs=decoder_inputs, + initial_state=initial_state, + attention_states=self._net, + cell=cell, + loop_function=self.get_input) + + +class AttentionWithAutoregression(Attention): + """A layer which uses both attention and auto regression.""" + + def __init__(self, *args, **kwargs): + super(AttentionWithAutoregression, self).__init__(*args, **kwargs) + + def get_train_input(self, prev, i): + """See SequenceLayerBase.get_train_input for details.""" + if i == 0: + return self._zero_label + else: + # TODO(gorban): update to gradually introduce gt labels. 
+ return self._labels_one_hot[:, i - 1, :] + + def get_eval_input(self, prev, i): + """See SequenceLayerBase.get_eval_input for details.""" + if i == 0: + return self._zero_label + else: + logit = self.char_logit(prev, char_index=i - 1) + return self.char_one_hot(logit) + + +def get_layer_class(use_attention, use_autoregression): + """A convenience function to get a layer class based on requirements. + + Args: + use_attention: if True a returned class will use attention. + use_autoregression: if True a returned class will use auto regression. + + Returns: + One of available sequence layers (child classes for SequenceLayerBase). + """ + if use_attention and use_autoregression: + layer_class = AttentionWithAutoregression + elif use_attention and not use_autoregression: + layer_class = Attention + elif not use_attention and not use_autoregression: + layer_class = NetSlice + elif not use_attention and use_autoregression: + layer_class = NetSliceWithAutoregression + else: + raise AssertionError('Unsupported sequence layer class') + + logging.debug('Use %s as a layer class', layer_class.__name__) + return layer_class diff --git a/attention_ocr/python/sequence_layers_test.py b/attention_ocr/python/sequence_layers_test.py new file mode 100644 index 0000000000000000000000000000000000000000..fd41e2d824c014084129707631d45de334ec741b --- /dev/null +++ b/attention_ocr/python/sequence_layers_test.py @@ -0,0 +1,112 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for sequence_layers.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np +import tensorflow as tf +from tensorflow.contrib import slim + +import model +import sequence_layers + + +def fake_net(batch_size, num_features, feature_size): + return tf.convert_to_tensor( + np.random.uniform(size=(batch_size, num_features, feature_size)), + dtype=tf.float32) + + +def fake_labels(batch_size, seq_length, num_char_classes): + labels_np = tf.convert_to_tensor( + np.random.randint( + low=0, high=num_char_classes, size=(batch_size, seq_length))) + return slim.one_hot_encoding(labels_np, num_classes=num_char_classes) + + +def create_layer(layer_class, batch_size, seq_length, num_char_classes): + model_params = model.ModelParams( + num_char_classes=num_char_classes, + seq_length=seq_length, + num_views=1, + null_code=num_char_classes) + net = fake_net( + batch_size=batch_size, num_features=seq_length * 5, feature_size=6) + labels_one_hot = fake_labels(batch_size, seq_length, num_char_classes) + layer_params = sequence_layers.SequenceLayerParams( + num_lstm_units=10, weight_decay=0.00004, lstm_state_clip_value=10.0) + return layer_class(net, labels_one_hot, model_params, layer_params) + + +class SequenceLayersTest(tf.test.TestCase): + def test_net_slice_char_logits_with_correct_shape(self): + batch_size = 2 + seq_length = 4 + num_char_classes = 3 + + layer = create_layer(sequence_layers.NetSlice, batch_size, seq_length, + num_char_classes) + char_logits = layer.create_logits() + + self.assertEqual( + tf.TensorShape([batch_size, seq_length, num_char_classes]), + char_logits.get_shape()) + + def test_net_slice_with_autoregression_char_logits_with_correct_shape(self): + batch_size = 2 + seq_length = 4 + num_char_classes = 3 + + layer = create_layer(sequence_layers.NetSliceWithAutoregression, + batch_size, seq_length, num_char_classes) + char_logits = layer.create_logits() + + self.assertEqual( + tf.TensorShape([batch_size, seq_length, num_char_classes]), + char_logits.get_shape()) + + def test_attention_char_logits_with_correct_shape(self): + batch_size = 2 + seq_length = 4 + num_char_classes = 3 + + layer = create_layer(sequence_layers.Attention, batch_size, seq_length, + num_char_classes) + char_logits = layer.create_logits() + + self.assertEqual( + tf.TensorShape([batch_size, seq_length, num_char_classes]), + char_logits.get_shape()) + + def test_attention_with_autoregression_char_logits_with_correct_shape(self): + batch_size = 2 + seq_length = 4 + num_char_classes = 3 + + layer = create_layer(sequence_layers.AttentionWithAutoregression, + batch_size, seq_length, num_char_classes) + char_logits = layer.create_logits() + + self.assertEqual( + tf.TensorShape([batch_size, seq_length, num_char_classes]), + char_logits.get_shape()) + + +if __name__ == '__main__': + tf.test.main() diff --git a/attention_ocr/python/train.py b/attention_ocr/python/train.py new file mode 100644 index 0000000000000000000000000000000000000000..fa91fb73b412287889f05d0af5875e269f1ce367 --- /dev/null +++ b/attention_ocr/python/train.py @@ -0,0 +1,209 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Script to train the Attention OCR model. + +A simple usage example: +python train.py +""" +import collections +import logging +import tensorflow as tf +from tensorflow.contrib import slim +from tensorflow import app +from tensorflow.python.platform import flags +from tensorflow.contrib.tfprof import model_analyzer + +import data_provider +import common_flags + +FLAGS = flags.FLAGS +common_flags.define() + +# yapf: disable +flags.DEFINE_integer('task', 0, + 'The Task ID. This value is used when training with ' + 'multiple workers to identify each worker.') + +flags.DEFINE_integer('ps_tasks', 0, + 'The number of parameter servers. If the value is 0, then' + ' the parameters are handled locally by the worker.') + +flags.DEFINE_integer('save_summaries_secs', 60, + 'The frequency with which summaries are saved, in ' + 'seconds.') + +flags.DEFINE_integer('save_interval_secs', 600, + 'Frequency in seconds of saving the model.') + +flags.DEFINE_integer('max_number_of_steps', int(1e10), + 'The maximum number of gradient steps.') + +flags.DEFINE_string('checkpoint_inception', '', + 'Checkpoint to recover inception weights from.') + +flags.DEFINE_float('clip_gradient_norm', 2.0, + 'If greater than 0 then the gradients would be clipped by ' + 'it.') + +flags.DEFINE_bool('sync_replicas', False, + 'If True will synchronize replicas during training.') + +flags.DEFINE_integer('replicas_to_aggregate', 1, + 'The number of gradients updates before updating params.') + +flags.DEFINE_integer('total_num_replicas', 1, + 'Total number of worker replicas.') + +flags.DEFINE_integer('startup_delay_steps', 15, + 'Number of training steps between replicas startup.') + +flags.DEFINE_boolean('reset_train_dir', False, + 'If true will delete all files in the train_log_dir') + +flags.DEFINE_boolean('show_graph_stats', False, + 'Output model size stats to stderr.') +# yapf: enable + +TrainingHParams = collections.namedtuple('TrainingHParams', [ + 'learning_rate', + 'optimizer', + 'momentum', + 'use_augment_input', +]) + + +def get_training_hparams(): + return TrainingHParams( + learning_rate=FLAGS.learning_rate, + optimizer=FLAGS.optimizer, + momentum=FLAGS.momentum, + use_augment_input=FLAGS.use_augment_input) + + +def create_optimizer(hparams): + """Creates optimized based on the specified flags.""" + if hparams.optimizer == 'momentum': + optimizer = tf.train.MomentumOptimizer( + hparams.learning_rate, momentum=hparams.momentum) + elif hparams.optimizer == 'adam': + optimizer = tf.train.AdamOptimizer(hparams.learning_rate) + elif hparams.optimizer == 'adadelta': + optimizer = tf.train.AdadeltaOptimizer(hparams.learning_rate) + elif hparams.optimizer == 'adagrad': + optimizer = tf.train.AdagradOptimizer(hparams.learning_rate) + elif hparams.optimizer == 'rmsprop': + optimizer = tf.train.RMSPropOptimizer( + hparams.learning_rate, momentum=hparams.momentum) + return optimizer + + +def train(loss, init_fn, hparams): + """Wraps slim.learning.train to run a training loop. 
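+
+  If FLAGS.sync_replicas is set, the optimizer is additionally wrapped so
+  that gradients from several worker replicas are aggregated before the
+  variables are updated (see the sync_replicas branch below).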
+ + Args: + loss: a loss tensor + init_fn: A callable to be executed after all other initialization is done. + hparams: a model hyper parameters + """ + optimizer = create_optimizer(hparams) + + if FLAGS.sync_replicas: + replica_id = tf.constant(FLAGS.task, tf.int32, shape=()) + optimizer = tf.LegacySyncReplicasOptimizer( + opt=optimizer, + replicas_to_aggregate=FLAGS.replicas_to_aggregate, + replica_id=replica_id, + total_num_replicas=FLAGS.total_num_replicas) + sync_optimizer = optimizer + startup_delay_steps = 0 + else: + startup_delay_steps = 0 + sync_optimizer = None + + train_op = slim.learning.create_train_op( + loss, + optimizer, + summarize_gradients=True, + clip_gradient_norm=FLAGS.clip_gradient_norm) + + slim.learning.train( + train_op=train_op, + logdir=FLAGS.train_log_dir, + graph=loss.graph, + master=FLAGS.master, + is_chief=(FLAGS.task == 0), + number_of_steps=FLAGS.max_number_of_steps, + save_summaries_secs=FLAGS.save_summaries_secs, + save_interval_secs=FLAGS.save_interval_secs, + startup_delay_steps=startup_delay_steps, + sync_optimizer=sync_optimizer, + init_fn=init_fn) + + +def prepare_training_dir(): + if not tf.gfile.Exists(FLAGS.train_log_dir): + logging.info('Create a new training directory %s', FLAGS.train_log_dir) + tf.gfile.MakeDirs(FLAGS.train_log_dir) + else: + if FLAGS.reset_train_dir: + logging.info('Reset the training directory %s', FLAGS.train_log_dir) + tf.gfile.DeleteRecursively(FLAGS.train_log_dir) + tf.gfile.MakeDirs(FLAGS.train_log_dir) + else: + logging.info('Use already existing training directory %s', + FLAGS.train_log_dir) + + +def calculate_graph_metrics(): + param_stats = model_analyzer.print_model_analysis( + tf.get_default_graph(), + tfprof_options=model_analyzer.TRAINABLE_VARS_PARAMS_STAT_OPTIONS) + return param_stats.total_parameters + + +def main(_): + prepare_training_dir() + + dataset = common_flags.create_dataset(split_name=FLAGS.split_name) + model = common_flags.create_model(dataset.num_char_classes, + dataset.max_sequence_length, + dataset.num_of_views, dataset.null_code) + hparams = get_training_hparams() + + # If ps_tasks is zero, the local device is used. When using multiple + # (non-local) replicas, the ReplicaDeviceSetter distributes the variables + # across the different devices. + device_setter = tf.train.replica_device_setter( + FLAGS.ps_tasks, merge_devices=True) + with tf.device(device_setter): + data = data_provider.get_data( + dataset, + FLAGS.batch_size, + augment=hparams.use_augment_input, + central_crop_size=common_flags.get_crop_size()) + endpoints = model.create_base(data.images, data.labels_one_hot) + total_loss = model.create_loss(data, endpoints) + model.create_summaries(data, endpoints, dataset.charset, is_training=True) + init_fn = model.create_init_fn_to_restore(FLAGS.checkpoint, + FLAGS.checkpoint_inception) + if FLAGS.show_graph_stats: + logging.info('Total number of weights in the graph: %s', + calculate_graph_metrics()) + train(total_loss, init_fn, hparams) + + +if __name__ == '__main__': + app.run() diff --git a/attention_ocr/python/utils.py b/attention_ocr/python/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..10d93ad21e1444736bf4562ef0df1c939617a5c1 --- /dev/null +++ b/attention_ocr/python/utils.py @@ -0,0 +1,80 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Functions to support building models for StreetView text transcription.""" + +import tensorflow as tf +from tensorflow.contrib import slim + + +def logits_to_log_prob(logits): + """Computes log probabilities using numerically stable trick. + + This uses two numerical stability tricks: + 1) softmax(x) = softmax(x - c) where c is a constant applied to all + arguments. If we set c = max(x) then the softmax is more numerically + stable. + 2) log softmax(x) is not numerically stable, but we can stabilize it + by using the identity log softmax(x) = x - log sum exp(x) + + Args: + logits: Tensor of arbitrary shape whose last dimension contains logits. + + Returns: + A tensor of the same shape as the input, but with corresponding log + probabilities. + """ + + with tf.variable_scope('log_probabilities'): + reduction_indices = len(logits.shape.as_list()) - 1 + max_logits = tf.reduce_max( + logits, reduction_indices=reduction_indices, keep_dims=True) + safe_logits = tf.subtract(logits, max_logits) + sum_exp = tf.reduce_sum( + tf.exp(safe_logits), + reduction_indices=reduction_indices, + keep_dims=True) + log_probs = tf.subtract(safe_logits, tf.log(sum_exp)) + return log_probs + + +def variables_to_restore(scope=None, strip_scope=False): + """Returns a list of variables to restore for the specified list of methods. + + It is supposed that variable name starts with the method's scope (a prefix + returned by _method_scope function). + + Args: + methods_names: a list of names of configurable methods. + strip_scope: if True will return variable names without method's scope. + If methods_names is None will return names unchanged. + model_scope: a scope for a whole model. + + Returns: + a dictionary mapping variable names to variables for restore. 
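+
+  Example (illustrative only, with a made-up scope name): calling
+  variables_to_restore(scope='model', strip_scope=True) returns a dict in
+  which the key 'conv1/weights' maps to the variable 'model/conv1/weights'.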
+ """ + if scope: + variable_map = {} + method_variables = slim.get_variables_to_restore(include=[scope]) + for var in method_variables: + if strip_scope: + var_name = var.op.name[len(scope) + 1:] + else: + var_name = var.op.name + variable_map[var_name] = var + + return variable_map + else: + return {v.op.name: v for v in slim.get_variables_to_restore()} diff --git a/cognitive_mapping_and_planning/.gitignore b/cognitive_mapping_and_planning/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..cbc6a8f0271075171ffdf3c2bc5fb9c528b08fc6 --- /dev/null +++ b/cognitive_mapping_and_planning/.gitignore @@ -0,0 +1,4 @@ +deps +*.pyc +lib*.so +lib*.so* diff --git a/cognitive_mapping_and_planning/README.md b/cognitive_mapping_and_planning/README.md new file mode 100644 index 0000000000000000000000000000000000000000..ce69d34745368159d36ee3421ce1ed9de468cf2b --- /dev/null +++ b/cognitive_mapping_and_planning/README.md @@ -0,0 +1,122 @@ +# Cognitive Mapping and Planning for Visual Navigation +**Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik** + +**Computer Vision and Pattern Recognition (CVPR) 2017.** + +**[ArXiv](https://arxiv.org/abs/1702.03920), +[Project Website](https://sites.google.com/corp/view/cognitive-mapping-and-planning/)** + +### Citing +If you find this code base and models useful in your research, please consider +citing the following paper: + ``` + @inproceedings{gupta2017cognitive, + title={Cognitive Mapping and Planning for Visual Navigation}, + author={Gupta, Saurabh and Davidson, James and Levine, Sergey and + Sukthankar, Rahul and Malik, Jitendra}, + booktitle={CVPR}, + year={2017} + } + ``` + +### Contents +1. [Requirements: software](#requirements-software) +2. [Requirements: data](#requirements-data) +3. [Test Pre-trained Models](#test-pre_trained-models) +4. [Train your Own Models](#train-your-own-models) + +### Requirements: software +1. Python Virtual Env Setup: All code is implemented in Python but depends on a + small number of python packages and a couple of C libraries. We recommend + using virtual environment for installing these python packages and python + bindings for these C libraries. + ```Shell + VENV_DIR=venv + pip install virtualenv + virtualenv $VENV_DIR + source $VENV_DIR/bin/activate + + # You may need to upgrade pip for installing openv-python. + pip install --upgrade pip + # Install simple dependencies. + pip install -r requirements.txt + + # Patch bugs in dependencies. + sh patches/apply_patches.sh + ``` + +2. Install [Tensorflow](https://www.tensorflow.org/) inside this virtual + environment. Typically done with `pip install --upgrade tensorflow-gpu`. + +3. Swiftshader: We use + [Swiftshader](https://github.com/google/swiftshader.git), a CPU based + renderer to render the meshes. It is possible to use other renderers, + replace `SwiftshaderRenderer` in `render/swiftshader_renderer.py` with + bindings to your renderer. + ```Shell + mkdir -p deps + git clone --recursive https://github.com/google/swiftshader.git deps/swiftshader-src + cd deps/swiftshader-src && git checkout 91da6b00584afd7dcaed66da88e2b617429b3950 + mkdir build && cd build && cmake .. && make -j 16 libEGL libGLESv2 + cd ../../../ + cp deps/swiftshader-src/build/libEGL* libEGL.so.1 + cp deps/swiftshader-src/build/libGLESv2* libGLESv2.so.2 + ``` + +4. PyAssimp: We use [PyAssimp](https://github.com/assimp/assimp.git) to load + meshes. 
It is possible to use other libraries to load meshes, replace + `Shape` `render/swiftshader_renderer.py` with bindings to your library for + loading meshes. + ```Shell + mkdir -p deps + git clone https://github.com/assimp/assimp.git deps/assimp-src + cd deps/assimp-src + git checkout 2afeddd5cb63d14bc77b53740b38a54a97d94ee8 + cmake CMakeLists.txt -G 'Unix Makefiles' && make -j 16 + cd port/PyAssimp && python setup.py install + cd ../../../.. + cp deps/assimp-src/lib/libassimp* . + ``` + +5. graph-tool: We use [graph-tool](https://git.skewed.de/count0/graph-tool) + library for graph processing. + ```Shell + mkdir -p deps + # If the following git clone command fails, you can also download the source + # from https://downloads.skewed.de/graph-tool/graph-tool-2.2.44.tar.bz2 + git clone https://git.skewed.de/count0/graph-tool deps/graph-tool-src + cd deps/graph-tool-src && git checkout 178add3a571feb6666f4f119027705d95d2951ab + bash autogen.sh + ./configure --disable-cairo --disable-sparsehash --prefix=$HOME/.local + make -j 16 + make install + cd ../../ + ``` + +### Requirements: data +1. Download the Stanford 3D Indoor Spaces Dataset (S3DIS Dataset) and ImageNet + Pre-trained models for initializing different models. Follow instructions in + `data/README.md` + +### Test Pre-trained Models +1. Download pre-trained models using + `scripts/scripts_download_pretrained_models.sh` + +2. Test models using `scripts/script_test_pretrained_models.sh`. + +### Train Your Own Models +All models were trained asynchronously with 16 workers each worker using data +from a single floor. The default hyper-parameters correspond to this setting. +See [distributed training with +Tensorflow](https://www.tensorflow.org/deploy/distributed) for setting up +distributed training. Training with a single worker is possible with the current +code base but will require some minor changes to allow each worker to load all +training environments. + +### Contact +For questions or issues open an issue on the tensorflow/models [issues +tracker](https://github.com/tensorflow/models/issues). Please assign issues to +@s-gupta. + +### Credits +This code was written by Saurabh Gupta (@s-gupta). diff --git a/cognitive_mapping_and_planning/__init__.py b/cognitive_mapping_and_planning/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/cfgs/__init__.py b/cognitive_mapping_and_planning/cfgs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/cfgs/config_cmp.py b/cognitive_mapping_and_planning/cfgs/config_cmp.py new file mode 100644 index 0000000000000000000000000000000000000000..715eee2b973cb66f816ecdb65bbcc3abdd8a9483 --- /dev/null +++ b/cognitive_mapping_and_planning/cfgs/config_cmp.py @@ -0,0 +1,283 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +import os, sys +import numpy as np +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +import logging +import src.utils as utils +import cfgs.config_common as cc + + +import tensorflow as tf + + +rgb_resnet_v2_50_path = 'data/init_models/resnet_v2_50/model.ckpt-5136169' +d_resnet_v2_50_path = 'data/init_models/distill_rgb_to_d_resnet_v2_50/model.ckpt-120002' + +def get_default_args(): + summary_args = utils.Foo(display_interval=1, test_iters=26, + arop_full_summary_iters=14) + + control_args = utils.Foo(train=False, test=False, + force_batchnorm_is_training_at_test=False, + reset_rng_seed=False, only_eval_when_done=False, + test_mode=None) + return summary_args, control_args + +def get_default_cmp_args(): + batch_norm_param = {'center': True, 'scale': True, + 'activation_fn':tf.nn.relu} + + mapper_arch_args = utils.Foo( + dim_reduce_neurons=64, + fc_neurons=[1024, 1024], + fc_out_size=8, + fc_out_neurons=64, + encoder='resnet_v2_50', + deconv_neurons=[64, 32, 16, 8, 4, 2], + deconv_strides=[2, 2, 2, 2, 2, 2], + deconv_layers_per_block=2, + deconv_kernel_size=4, + fc_dropout=0.5, + combine_type='wt_avg_logits', + batch_norm_param=batch_norm_param) + + readout_maps_arch_args = utils.Foo( + num_neurons=[], + strides=[], + kernel_size=None, + layers_per_block=None) + + arch_args = utils.Foo( + vin_val_neurons=8, vin_action_neurons=8, vin_ks=3, vin_share_wts=False, + pred_neurons=[64, 64], pred_batch_norm_param=batch_norm_param, + conv_on_value_map=0, fr_neurons=16, fr_ver='v2', fr_inside_neurons=64, + fr_stride=1, crop_remove_each=30, value_crop_size=4, + action_sample_type='sample', action_sample_combine_type='one_or_other', + sample_gt_prob_type='inverse_sigmoid_decay', dagger_sample_bn_false=True, + vin_num_iters=36, isd_k=750., use_agent_loc=False, multi_scale=True, + readout_maps=False, rom_arch=readout_maps_arch_args) + + return arch_args, mapper_arch_args + +def get_arch_vars(arch_str): + if arch_str == '': vals = [] + else: vals = arch_str.split('_') + ks = ['var1', 'var2', 'var3'] + ks = ks[:len(vals)] + + # Exp Ver. + if len(vals) == 0: ks.append('var1'); vals.append('v0') + # custom arch. + if len(vals) == 1: ks.append('var2'); vals.append('') + # map scape for projection baseline. + if len(vals) == 2: ks.append('var3'); vals.append('fr2') + + assert(len(vals) == 3) + + vars = utils.Foo() + for k, v in zip(ks, vals): + setattr(vars, k, v) + + logging.error('arch_vars: %s', vars) + return vars + +def process_arch_str(args, arch_str): + # This function modifies args. + args.arch, args.mapper_arch = get_default_cmp_args() + + arch_vars = get_arch_vars(arch_str) + + args.navtask.task_params.outputs.ego_maps = True + args.navtask.task_params.outputs.ego_goal_imgs = True + args.navtask.task_params.outputs.egomotion = True + args.navtask.task_params.toy_problem = False + + if arch_vars.var1 == 'lmap': + args = process_arch_learned_map(args, arch_vars) + + elif arch_vars.var1 == 'pmap': + args = process_arch_projected_map(args, arch_vars) + + else: + logging.fatal('arch_vars.var1 should be lmap or pmap, but is %s', arch_vars.var1) + assert(False) + + return args + +def process_arch_learned_map(args, arch_vars): + # Multiscale vision based system. 
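+  # These are the learned-mapper ('lmap') variants: the agent consumes
+  # first-person images, a trained mapper produces egocentric maps, and the
+  # planner operates on those maps. The mapper/planner sizes configured below
+  # depend on arch_vars.var2 ('Ssc' is single scale, the 'Msc*' variants are
+  # multi-scale).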
+ args.navtask.task_params.input_type = 'vision' + args.navtask.task_params.outputs.images = True + + if args.navtask.camera_param.modalities[0] == 'rgb': + args.solver.pretrained_path = rgb_resnet_v2_50_path + elif args.navtask.camera_param.modalities[0] == 'depth': + args.solver.pretrained_path = d_resnet_v2_50_path + + if arch_vars.var2 == 'Ssc': + sc = 1./args.navtask.task_params.step_size + args.arch.vin_num_iters = 40 + args.navtask.task_params.map_scales = [sc] + max_dist = args.navtask.task_params.max_dist * \ + args.navtask.task_params.num_goals + args.navtask.task_params.map_crop_sizes = [2*max_dist] + + args.arch.fr_stride = 1 + args.arch.vin_action_neurons = 8 + args.arch.vin_val_neurons = 3 + args.arch.fr_inside_neurons = 32 + + args.mapper_arch.pad_map_with_zeros_each = [24] + args.mapper_arch.deconv_neurons = [64, 32, 16] + args.mapper_arch.deconv_strides = [1, 2, 1] + + elif (arch_vars.var2 == 'Msc' or arch_vars.var2 == 'MscROMms' or + arch_vars.var2 == 'MscROMss' or arch_vars.var2 == 'MscNoVin'): + # Code for multi-scale planner. + args.arch.vin_num_iters = 8 + args.arch.crop_remove_each = 4 + args.arch.value_crop_size = 8 + + sc = 1./args.navtask.task_params.step_size + max_dist = args.navtask.task_params.max_dist * \ + args.navtask.task_params.num_goals + n_scales = np.log2(float(max_dist) / float(args.arch.vin_num_iters)) + n_scales = int(np.ceil(n_scales)+1) + + args.navtask.task_params.map_scales = \ + list(sc*(0.5**(np.arange(n_scales))[::-1])) + args.navtask.task_params.map_crop_sizes = [16 for x in range(n_scales)] + + args.arch.fr_stride = 1 + args.arch.vin_action_neurons = 8 + args.arch.vin_val_neurons = 3 + args.arch.fr_inside_neurons = 32 + + args.mapper_arch.pad_map_with_zeros_each = [0 for _ in range(n_scales)] + args.mapper_arch.deconv_neurons = [64*n_scales, 32*n_scales, 16*n_scales] + args.mapper_arch.deconv_strides = [1, 2, 1] + + if arch_vars.var2 == 'MscNoVin': + # No planning version. + args.arch.fr_stride = [1, 2, 1, 2] + args.arch.vin_action_neurons = None + args.arch.vin_val_neurons = 16 + args.arch.fr_inside_neurons = 32 + + args.arch.crop_remove_each = 0 + args.arch.value_crop_size = 4 + args.arch.vin_num_iters = 0 + + elif arch_vars.var2 == 'MscROMms' or arch_vars.var2 == 'MscROMss': + # Code with read outs, MscROMms flattens and reads out, + # MscROMss does not flatten and produces output at multiple scales. 
+ args.navtask.task_params.outputs.readout_maps = True + args.navtask.task_params.map_resize_method = 'antialiasing' + args.arch.readout_maps = True + + if arch_vars.var2 == 'MscROMms': + args.arch.rom_arch.num_neurons = [64, 1] + args.arch.rom_arch.kernel_size = 4 + args.arch.rom_arch.strides = [2,2] + args.arch.rom_arch.layers_per_block = 2 + + args.navtask.task_params.readout_maps_crop_sizes = [64] + args.navtask.task_params.readout_maps_scales = [sc] + + elif arch_vars.var2 == 'MscROMss': + args.arch.rom_arch.num_neurons = \ + [64, len(args.navtask.task_params.map_scales)] + args.arch.rom_arch.kernel_size = 4 + args.arch.rom_arch.strides = [1,1] + args.arch.rom_arch.layers_per_block = 1 + + args.navtask.task_params.readout_maps_crop_sizes = \ + args.navtask.task_params.map_crop_sizes + args.navtask.task_params.readout_maps_scales = \ + args.navtask.task_params.map_scales + + else: + logging.fatal('arch_vars.var2 not one of Msc, MscROMms, MscROMss, MscNoVin.') + assert(False) + + map_channels = args.mapper_arch.deconv_neurons[-1] / \ + (2*len(args.navtask.task_params.map_scales)) + args.navtask.task_params.map_channels = map_channels + + return args + +def process_arch_projected_map(args, arch_vars): + # Single scale vision based system which does not use a mapper but instead + # uses an analytically estimated map. + ds = int(arch_vars.var3[2]) + args.navtask.task_params.input_type = 'analytical_counts' + args.navtask.task_params.outputs.analytical_counts = True + + assert(args.navtask.task_params.modalities[0] == 'depth') + args.navtask.camera_param.img_channels = None + + analytical_counts = utils.Foo(map_sizes=[512/ds], + xy_resolution=[5.*ds], + z_bins=[[-10, 10, 150, 200]], + non_linearity=[arch_vars.var2]) + args.navtask.task_params.analytical_counts = analytical_counts + + sc = 1./ds + args.arch.vin_num_iters = 36 + args.navtask.task_params.map_scales = [sc] + args.navtask.task_params.map_crop_sizes = [512/ds] + + args.arch.fr_stride = [1,2] + args.arch.vin_action_neurons = 8 + args.arch.vin_val_neurons = 3 + args.arch.fr_inside_neurons = 32 + + map_channels = len(analytical_counts.z_bins[0]) + 1 + args.navtask.task_params.map_channels = map_channels + args.solver.freeze_conv = False + + return args + +def get_args_for_config(config_name): + args = utils.Foo() + + args.summary, args.control = get_default_args() + + exp_name, mode_str = config_name.split('+') + arch_str, solver_str, navtask_str = exp_name.split('.') + logging.error('config_name: %s', config_name) + logging.error('arch_str: %s', arch_str) + logging.error('navtask_str: %s', navtask_str) + logging.error('solver_str: %s', solver_str) + logging.error('mode_str: %s', mode_str) + + args.solver = cc.process_solver_str(solver_str) + args.navtask = cc.process_navtask_str(navtask_str) + + args = process_arch_str(args, arch_str) + args.arch.isd_k = args.solver.isd_k + + # Train, test, etc. 
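+  # config_name is expected to look roughly like
+  # '<arch>.<solver>.<navtask>+<mode>_<imset>' (a schematic pattern implied by
+  # the split calls in this function, not an exhaustive specification).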
+ mode, imset = mode_str.split('_') + args = cc.adjust_args_for_mode(args, mode) + args.navtask.building_names = args.navtask.dataset.get_split(imset) + args.control.test_name = '{:s}_on_{:s}'.format(mode, imset) + + # Log the arguments + logging.error('%s', args) + return args diff --git a/cognitive_mapping_and_planning/cfgs/config_common.py b/cognitive_mapping_and_planning/cfgs/config_common.py new file mode 100644 index 0000000000000000000000000000000000000000..440bf5b72f87a1eeca38e22f33b22e82de7345c0 --- /dev/null +++ b/cognitive_mapping_and_planning/cfgs/config_common.py @@ -0,0 +1,261 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +import os +import numpy as np +import logging +import src.utils as utils +import datasets.nav_env_config as nec +from datasets import factory + +def adjust_args_for_mode(args, mode): + if mode == 'train': + args.control.train = True + + elif mode == 'val1': + # Same settings as for training, to make sure nothing wonky is happening + # there. + args.control.test = True + args.control.test_mode = 'val' + args.navtask.task_params.batch_size = 32 + + elif mode == 'val2': + # No data augmentation, not sampling but taking the argmax action, not + # sampling from the ground truth at all. + args.control.test = True + args.arch.action_sample_type = 'argmax' + args.arch.sample_gt_prob_type = 'zero' + args.navtask.task_params.data_augment = \ + utils.Foo(lr_flip=0, delta_angle=0, delta_xy=0, relight=False, + relight_fast=False, structured=False) + args.control.test_mode = 'val' + args.navtask.task_params.batch_size = 32 + + elif mode == 'bench': + # Actually testing the agent in settings that are kept same between + # different runs. + args.navtask.task_params.batch_size = 16 + args.control.test = True + args.arch.action_sample_type = 'argmax' + args.arch.sample_gt_prob_type = 'zero' + args.navtask.task_params.data_augment = \ + utils.Foo(lr_flip=0, delta_angle=0, delta_xy=0, relight=False, + relight_fast=False, structured=False) + args.summary.test_iters = 250 + args.control.only_eval_when_done = True + args.control.reset_rng_seed = True + args.control.test_mode = 'test' + else: + logging.fatal('Unknown mode: %s.', mode) + assert(False) + return args + +def get_solver_vars(solver_str): + if solver_str == '': vals = []; + else: vals = solver_str.split('_') + ks = ['clip', 'dlw', 'long', 'typ', 'isdk', 'adam_eps', 'init_lr']; + ks = ks[:len(vals)] + + # Gradient clipping or not. + if len(vals) == 0: ks.append('clip'); vals.append('noclip'); + # data loss weight. + if len(vals) == 1: ks.append('dlw'); vals.append('dlw20') + # how long to train for. 
+ if len(vals) == 2: ks.append('long'); vals.append('nolong') + # Adam + if len(vals) == 3: ks.append('typ'); vals.append('adam2') + # reg loss wt + if len(vals) == 4: ks.append('rlw'); vals.append('rlw1') + # isd_k + if len(vals) == 5: ks.append('isdk'); vals.append('isdk415') # 415, inflexion at 2.5k. + # adam eps + if len(vals) == 6: ks.append('adam_eps'); vals.append('aeps1en8') + # init lr + if len(vals) == 7: ks.append('init_lr'); vals.append('lr1en3') + + assert(len(vals) == 8) + + vars = utils.Foo() + for k, v in zip(ks, vals): + setattr(vars, k, v) + logging.error('solver_vars: %s', vars) + return vars + +def process_solver_str(solver_str): + solver = utils.Foo( + seed=0, learning_rate_decay=None, clip_gradient_norm=None, max_steps=None, + initial_learning_rate=None, momentum=None, steps_per_decay=None, + logdir=None, sync=False, adjust_lr_sync=True, wt_decay=0.0001, + data_loss_wt=None, reg_loss_wt=None, freeze_conv=True, num_workers=1, + task=0, ps_tasks=0, master='local', typ=None, momentum2=None, + adam_eps=None) + + # Clobber with overrides from solver str. + solver_vars = get_solver_vars(solver_str) + + solver.data_loss_wt = float(solver_vars.dlw[3:].replace('x', '.')) + solver.adam_eps = float(solver_vars.adam_eps[4:].replace('x', '.').replace('n', '-')) + solver.initial_learning_rate = float(solver_vars.init_lr[2:].replace('x', '.').replace('n', '-')) + solver.reg_loss_wt = float(solver_vars.rlw[3:].replace('x', '.')) + solver.isd_k = float(solver_vars.isdk[4:].replace('x', '.')) + + long = solver_vars.long + if long == 'long': + solver.steps_per_decay = 40000 + solver.max_steps = 120000 + elif long == 'long2': + solver.steps_per_decay = 80000 + solver.max_steps = 120000 + elif long == 'nolong' or long == 'nol': + solver.steps_per_decay = 20000 + solver.max_steps = 60000 + else: + logging.fatal('solver_vars.long should be long, long2, nolong or nol.') + assert(False) + + clip = solver_vars.clip + if clip == 'noclip' or clip == 'nocl': + solver.clip_gradient_norm = 0 + elif clip[:4] == 'clip': + solver.clip_gradient_norm = float(clip[4:].replace('x', '.')) + else: + logging.fatal('Unknown solver_vars.clip: %s', clip) + assert(False) + + typ = solver_vars.typ + if typ == 'adam': + solver.typ = 'adam' + solver.momentum = 0.9 + solver.momentum2 = 0.999 + solver.learning_rate_decay = 1.0 + elif typ == 'adam2': + solver.typ = 'adam' + solver.momentum = 0.9 + solver.momentum2 = 0.999 + solver.learning_rate_decay = 0.1 + elif typ == 'sgd': + solver.typ = 'sgd' + solver.momentum = 0.99 + solver.momentum2 = None + solver.learning_rate_decay = 0.1 + else: + logging.fatal('Unknown solver_vars.typ: %s', typ) + assert(False) + + logging.error('solver: %s', solver) + return solver + +def get_navtask_vars(navtask_str): + if navtask_str == '': vals = [] + else: vals = navtask_str.split('_') + + ks_all = ['dataset_name', 'modality', 'task', 'history', 'max_dist', + 'num_steps', 'step_size', 'n_ori', 'aux_views', 'data_aug'] + ks = ks_all[:len(vals)] + + # All data or not. + if len(vals) == 0: ks.append('dataset_name'); vals.append('sbpd') + # modality + if len(vals) == 1: ks.append('modality'); vals.append('rgb') + # semantic task? + if len(vals) == 2: ks.append('task'); vals.append('r2r') + # number of history frames. 
+ if len(vals) == 3: ks.append('history'); vals.append('h0') + # max steps + if len(vals) == 4: ks.append('max_dist'); vals.append('32') + # num steps + if len(vals) == 5: ks.append('num_steps'); vals.append('40') + # step size + if len(vals) == 6: ks.append('step_size'); vals.append('8') + # n_ori + if len(vals) == 7: ks.append('n_ori'); vals.append('4') + # Auxiliary views. + if len(vals) == 8: ks.append('aux_views'); vals.append('nv0') + # Normal data augmentation as opposed to structured data augmentation (if set + # to straug. + if len(vals) == 9: ks.append('data_aug'); vals.append('straug') + + assert(len(vals) == 10) + for i in range(len(ks)): + assert(ks[i] == ks_all[i]) + + vars = utils.Foo() + for k, v in zip(ks, vals): + setattr(vars, k, v) + logging.error('navtask_vars: %s', vals) + return vars + +def process_navtask_str(navtask_str): + navtask = nec.nav_env_base_config() + + # Clobber with overrides from strings. + navtask_vars = get_navtask_vars(navtask_str) + + navtask.task_params.n_ori = int(navtask_vars.n_ori) + navtask.task_params.max_dist = int(navtask_vars.max_dist) + navtask.task_params.num_steps = int(navtask_vars.num_steps) + navtask.task_params.step_size = int(navtask_vars.step_size) + navtask.task_params.data_augment.delta_xy = int(navtask_vars.step_size)/2. + n_aux_views_each = int(navtask_vars.aux_views[2]) + aux_delta_thetas = np.concatenate((np.arange(n_aux_views_each) + 1, + -1 -np.arange(n_aux_views_each))) + aux_delta_thetas = aux_delta_thetas*np.deg2rad(navtask.camera_param.fov) + navtask.task_params.aux_delta_thetas = aux_delta_thetas + + if navtask_vars.data_aug == 'aug': + navtask.task_params.data_augment.structured = False + elif navtask_vars.data_aug == 'straug': + navtask.task_params.data_augment.structured = True + else: + logging.fatal('Unknown navtask_vars.data_aug %s.', navtask_vars.data_aug) + assert(False) + + navtask.task_params.num_history_frames = int(navtask_vars.history[1:]) + navtask.task_params.n_views = 1+navtask.task_params.num_history_frames + + navtask.task_params.goal_channels = int(navtask_vars.n_ori) + + if navtask_vars.task == 'hard': + navtask.task_params.type = 'rng_rejection_sampling_many' + navtask.task_params.rejection_sampling_M = 2000 + navtask.task_params.min_dist = 10 + elif navtask_vars.task == 'r2r': + navtask.task_params.type = 'room_to_room_many' + elif navtask_vars.task == 'ST': + # Semantic task at hand. 
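+    # The goal is a semantic class rather than a point location, so both the
+    # goal channels and the relative-goal-location dimensionality are set
+    # below to the number of semantic classes.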
+ navtask.task_params.goal_channels = \ + len(navtask.task_params.semantic_task.class_map_names) + navtask.task_params.rel_goal_loc_dim = \ + len(navtask.task_params.semantic_task.class_map_names) + navtask.task_params.type = 'to_nearest_obj_acc' + else: + logging.fatal('navtask_vars.task: should be hard or r2r, ST') + assert(False) + + if navtask_vars.modality == 'rgb': + navtask.camera_param.modalities = ['rgb'] + navtask.camera_param.img_channels = 3 + elif navtask_vars.modality == 'd': + navtask.camera_param.modalities = ['depth'] + navtask.camera_param.img_channels = 2 + + navtask.task_params.img_height = navtask.camera_param.height + navtask.task_params.img_width = navtask.camera_param.width + navtask.task_params.modalities = navtask.camera_param.modalities + navtask.task_params.img_channels = navtask.camera_param.img_channels + navtask.task_params.img_fov = navtask.camera_param.fov + + navtask.dataset = factory.get_dataset(navtask_vars.dataset_name) + return navtask diff --git a/cognitive_mapping_and_planning/cfgs/config_distill.py b/cognitive_mapping_and_planning/cfgs/config_distill.py new file mode 100644 index 0000000000000000000000000000000000000000..a6f7985f8f003bc48800153239817d6ecbd53662 --- /dev/null +++ b/cognitive_mapping_and_planning/cfgs/config_distill.py @@ -0,0 +1,114 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +import pprint +import copy +import os +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +import logging +import src.utils as utils +import cfgs.config_common as cc + + +import tensorflow as tf + +rgb_resnet_v2_50_path = 'cache/resnet_v2_50_inception_preprocessed/model.ckpt-5136169' + +def get_default_args(): + robot = utils.Foo(radius=15, base=10, height=140, sensor_height=120, + camera_elevation_degree=-15) + + camera_param = utils.Foo(width=225, height=225, z_near=0.05, z_far=20.0, + fov=60., modalities=['rgb', 'depth']) + + env = utils.Foo(padding=10, resolution=5, num_point_threshold=2, + valid_min=-10, valid_max=200, n_samples_per_face=200) + + data_augment = utils.Foo(lr_flip=0, delta_angle=1, delta_xy=4, relight=False, + relight_fast=False, structured=False) + + task_params = utils.Foo(num_actions=4, step_size=4, num_steps=0, + batch_size=32, room_seed=0, base_class='Building', + task='mapping', n_ori=6, data_augment=data_augment, + output_transform_to_global_map=False, + output_canonical_map=False, + output_incremental_transform=False, + output_free_space=False, move_type='shortest_path', + toy_problem=0) + + buildinger_args = utils.Foo(building_names=['area1_gates_wingA_floor1_westpart'], + env_class=None, robot=robot, + task_params=task_params, env=env, + camera_param=camera_param) + + solver_args = utils.Foo(seed=0, learning_rate_decay=0.1, + clip_gradient_norm=0, max_steps=120000, + initial_learning_rate=0.001, momentum=0.99, + steps_per_decay=40000, logdir=None, sync=False, + adjust_lr_sync=True, wt_decay=0.0001, + data_loss_wt=1.0, reg_loss_wt=1.0, + num_workers=1, task=0, ps_tasks=0, master='local') + + summary_args = utils.Foo(display_interval=1, test_iters=100) + + control_args = utils.Foo(train=False, test=False, + force_batchnorm_is_training_at_test=False) + + arch_args = utils.Foo(rgb_encoder='resnet_v2_50', d_encoder='resnet_v2_50') + + return utils.Foo(solver=solver_args, + summary=summary_args, control=control_args, arch=arch_args, + buildinger=buildinger_args) + +def get_vars(config_name): + vars = config_name.split('_') + if len(vars) == 1: # All data or not. 
+ vars.append('noall') + if len(vars) == 2: # n_ori + vars.append('4') + logging.error('vars: %s', vars) + return vars + +def get_args_for_config(config_name): + args = get_default_args() + config_name, mode = config_name.split('+') + vars = get_vars(config_name) + + logging.info('config_name: %s, mode: %s', config_name, mode) + + args.buildinger.task_params.n_ori = int(vars[2]) + args.solver.freeze_conv = True + args.solver.pretrained_path = resnet_v2_50_path + args.buildinger.task_params.img_channels = 5 + args.solver.data_loss_wt = 0.00001 + + if vars[0] == 'v0': + None + else: + logging.error('config_name: %s undefined', config_name) + + args.buildinger.task_params.height = args.buildinger.camera_param.height + args.buildinger.task_params.width = args.buildinger.camera_param.width + args.buildinger.task_params.modalities = args.buildinger.camera_param.modalities + + if vars[1] == 'all': + args = cc.get_args_for_mode_building_all(args, mode) + elif vars[1] == 'noall': + args = cc.get_args_for_mode_building(args, mode) + + # Log the arguments + logging.error('%s', args) + return args diff --git a/cognitive_mapping_and_planning/cfgs/config_vision_baseline.py b/cognitive_mapping_and_planning/cfgs/config_vision_baseline.py new file mode 100644 index 0000000000000000000000000000000000000000..3cc64fe594ab025fbcfb41543302fa42c7fc0074 --- /dev/null +++ b/cognitive_mapping_and_planning/cfgs/config_vision_baseline.py @@ -0,0 +1,173 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +import pprint +import os +import numpy as np +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +import logging +import src.utils as utils +import cfgs.config_common as cc +import datasets.nav_env_config as nec + + +import tensorflow as tf + +FLAGS = flags.FLAGS + +get_solver_vars = cc.get_solver_vars +get_navtask_vars = cc.get_navtask_vars + + +rgb_resnet_v2_50_path = 'data/init_models/resnet_v2_50/model.ckpt-5136169' +d_resnet_v2_50_path = 'data/init_models/distill_rgb_to_d_resnet_v2_50/model.ckpt-120002' + +def get_default_args(): + summary_args = utils.Foo(display_interval=1, test_iters=26, + arop_full_summary_iters=14) + + control_args = utils.Foo(train=False, test=False, + force_batchnorm_is_training_at_test=False, + reset_rng_seed=False, only_eval_when_done=False, + test_mode=None) + return summary_args, control_args + +def get_default_baseline_args(): + batch_norm_param = {'center': True, 'scale': True, + 'activation_fn':tf.nn.relu} + arch_args = utils.Foo( + pred_neurons=[], goal_embed_neurons=[], img_embed_neurons=[], + batch_norm_param=batch_norm_param, dim_reduce_neurons=64, combine_type='', + encoder='resnet_v2_50', action_sample_type='sample', + action_sample_combine_type='one_or_other', + sample_gt_prob_type='inverse_sigmoid_decay', dagger_sample_bn_false=True, + isd_k=750., use_visit_count=False, lstm_output=False, lstm_ego=False, + lstm_img=False, fc_dropout=0.0, embed_goal_for_state=False, + lstm_output_init_state_from_goal=False) + return arch_args + +def get_arch_vars(arch_str): + if arch_str == '': vals = [] + else: vals = arch_str.split('_') + + ks = ['ver', 'lstm_dim', 'dropout'] + + # Exp Ver + if len(vals) == 0: vals.append('v0') + # LSTM dimentsions + if len(vals) == 1: vals.append('lstm2048') + # Dropout + if len(vals) == 2: vals.append('noDO') + + assert(len(vals) == 3) + + vars = utils.Foo() + for k, v in zip(ks, vals): + setattr(vars, k, v) + + logging.error('arch_vars: %s', vars) + return vars + +def process_arch_str(args, arch_str): + # This function modifies args. + args.arch = get_default_baseline_args() + arch_vars = get_arch_vars(arch_str) + + args.navtask.task_params.outputs.rel_goal_loc = True + args.navtask.task_params.input_type = 'vision' + args.navtask.task_params.outputs.images = True + + if args.navtask.camera_param.modalities[0] == 'rgb': + args.solver.pretrained_path = rgb_resnet_v2_50_path + elif args.navtask.camera_param.modalities[0] == 'depth': + args.solver.pretrained_path = d_resnet_v2_50_path + else: + logging.fatal('Neither of rgb or d') + + if arch_vars.dropout == 'DO': + args.arch.fc_dropout = 0.5 + + args.tfcode = 'B' + + exp_ver = arch_vars.ver + if exp_ver == 'v0': + # Multiplicative interaction between goal loc and image features. + args.arch.combine_type = 'multiply' + args.arch.pred_neurons = [256, 256] + args.arch.goal_embed_neurons = [64, 8] + args.arch.img_embed_neurons = [1024, 512, 256*8] + + elif exp_ver == 'v1': + # Additive interaction between goal and image features. + args.arch.combine_type = 'add' + args.arch.pred_neurons = [256, 256] + args.arch.goal_embed_neurons = [64, 256] + args.arch.img_embed_neurons = [1024, 512, 256] + + elif exp_ver == 'v2': + # LSTM at the output on top of multiple interactions. 
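+    # Same multiplicative goal-image interaction as 'v0', but the combined
+    # features are passed through an LSTM (its dimension comes from
+    # arch_vars.lstm_dim) before the final prediction layer.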
+ args.arch.combine_type = 'multiply' + args.arch.goal_embed_neurons = [64, 8] + args.arch.img_embed_neurons = [1024, 512, 256*8] + args.arch.lstm_output = True + args.arch.lstm_output_dim = int(arch_vars.lstm_dim[4:]) + args.arch.pred_neurons = [256] # The other is inside the LSTM. + + elif exp_ver == 'v0blind': + # LSTM only on the goal location. + args.arch.combine_type = 'goalonly' + args.arch.goal_embed_neurons = [64, 256] + args.arch.img_embed_neurons = [2] # I dont know what it will do otherwise. + args.arch.lstm_output = True + args.arch.lstm_output_dim = 256 + args.arch.pred_neurons = [256] # The other is inside the LSTM. + + else: + logging.fatal('exp_ver: %s undefined', exp_ver) + assert(False) + + # Log the arguments + logging.error('%s', args) + return args + +def get_args_for_config(config_name): + args = utils.Foo() + + args.summary, args.control = get_default_args() + + exp_name, mode_str = config_name.split('+') + arch_str, solver_str, navtask_str = exp_name.split('.') + logging.error('config_name: %s', config_name) + logging.error('arch_str: %s', arch_str) + logging.error('navtask_str: %s', navtask_str) + logging.error('solver_str: %s', solver_str) + logging.error('mode_str: %s', mode_str) + + args.solver = cc.process_solver_str(solver_str) + args.navtask = cc.process_navtask_str(navtask_str) + + args = process_arch_str(args, arch_str) + args.arch.isd_k = args.solver.isd_k + + # Train, test, etc. + mode, imset = mode_str.split('_') + args = cc.adjust_args_for_mode(args, mode) + args.navtask.building_names = args.navtask.dataset.get_split(imset) + args.control.test_name = '{:s}_on_{:s}'.format(mode, imset) + + # Log the arguments + logging.error('%s', args) + return args diff --git a/cognitive_mapping_and_planning/data/.gitignore b/cognitive_mapping_and_planning/data/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..2b6d5e46652d14a9c0a8025dbcccfc2dd4376e4a --- /dev/null +++ b/cognitive_mapping_and_planning/data/.gitignore @@ -0,0 +1,3 @@ +stanford_building_parser_dataset_raw +stanford_building_parser_dataset +init_models diff --git a/cognitive_mapping_and_planning/data/README.md b/cognitive_mapping_and_planning/data/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a8928345351dac19c0e12fd33f99dd2aa600e23b --- /dev/null +++ b/cognitive_mapping_and_planning/data/README.md @@ -0,0 +1,33 @@ +This directory contains the data needed for training and benchmarking various +navigation models. + +1. Download the data from the [dataset website] + (http://buildingparser.stanford.edu/dataset.html). + 1. [Raw meshes](https://goo.gl/forms/2YSPaO2UKmn5Td5m2). We need the meshes + which are in the noXYZ folder. Download the tar files and place them in + the `stanford_building_parser_dataset_raw` folder. You need to download + `area_1_noXYZ.tar`, `area_3_noXYZ.tar`, `area_5a_noXYZ.tar`, + `area_5b_noXYZ.tar`, `area_6_noXYZ.tar` for training and + `area_4_noXYZ.tar` for evaluation. + 2. [Annotations](https://goo.gl/forms/4SoGp4KtH1jfRqEj2) for setting up + tasks. We will need the file called `Stanford3dDataset_v1.2.zip`. Place + the file in the directory `stanford_building_parser_dataset_raw`. + +2. Preprocess the data. + 1. Extract meshes using `scripts/script_preprocess_meshes_S3DIS.sh`. After + this `ls data/stanford_building_parser_dataset/mesh` should have 6 + folders `area1`, `area3`, `area4`, `area5a`, `area5b`, `area6`, with + textures and obj files within each directory. + 2. 
Extract out room information and semantics from zip file using + `scripts/script_preprocess_annoations_S3DIS.sh`. After this there should + be `room-dimension` and `class-maps` folder in + `data/stanford_building_parser_dataset`. (If you find this script to + crash because of an exception in np.loadtxt while processing + `Area_5/office_19/Annotations/ceiling_1.txt`, there is a special + character on line 323474, that should be removed manually.) + +3. Download ImageNet Pre-trained models. We used ResNet-v2-50 for representing + images. For RGB images this is pre-trained on ImageNet. For Depth images we + [distill](https://arxiv.org/abs/1507.00448) the RGB model to depth images + using paired RGB-D images. Both there models are available through + `scripts/script_download_init_models.sh` diff --git a/cognitive_mapping_and_planning/datasets/__init__.py b/cognitive_mapping_and_planning/datasets/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/datasets/factory.py b/cognitive_mapping_and_planning/datasets/factory.py new file mode 100644 index 0000000000000000000000000000000000000000..3f7b5c0a602dbacf9619dc1c2ec98e94200428b6 --- /dev/null +++ b/cognitive_mapping_and_planning/datasets/factory.py @@ -0,0 +1,113 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r"""Wrapper for selecting the navigation environment that we want to train and +test on. 
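+
+Currently only the Stanford Building Parser dataset (name 'sbpd') is wired
+up; see get_dataset below.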
+""" +import numpy as np +import os, glob +import platform + +import logging +from tensorflow.python.platform import app +from tensorflow.python.platform import flags + +import render.swiftshader_renderer as renderer +import src.file_utils as fu +import src.utils as utils + +def get_dataset(dataset_name): + if dataset_name == 'sbpd': + dataset = StanfordBuildingParserDataset(dataset_name) + else: + logging.fatal('Not one of sbpd') + return dataset + +class Loader(): + def get_data_dir(): + pass + + def get_meta_data(self, file_name, data_dir=None): + if data_dir is None: + data_dir = self.get_data_dir() + full_file_name = os.path.join(data_dir, 'meta', file_name) + assert(fu.exists(full_file_name)), \ + '{:s} does not exist'.format(full_file_name) + ext = os.path.splitext(full_file_name)[1] + if ext == '.txt': + ls = [] + with fu.fopen(full_file_name, 'r') as f: + for l in f: + ls.append(l.rstrip()) + elif ext == '.pkl': + ls = utils.load_variables(full_file_name) + return ls + + def load_building(self, name, data_dir=None): + if data_dir is None: + data_dir = self.get_data_dir() + out = {} + out['name'] = name + out['data_dir'] = data_dir + out['room_dimension_file'] = os.path.join(data_dir, 'room-dimension', + name+'.pkl') + out['class_map_folder'] = os.path.join(data_dir, 'class-maps') + return out + + def load_building_meshes(self, building): + dir_name = os.path.join(building['data_dir'], 'mesh', building['name']) + mesh_file_name = glob.glob1(dir_name, '*.obj')[0] + mesh_file_name_full = os.path.join(dir_name, mesh_file_name) + logging.error('Loading building from obj file: %s', mesh_file_name_full) + shape = renderer.Shape(mesh_file_name_full, load_materials=True, + name_prefix=building['name']+'_') + return [shape] + +class StanfordBuildingParserDataset(Loader): + def __init__(self, ver): + self.ver = ver + self.data_dir = None + + def get_data_dir(self): + if self.data_dir is None: + self.data_dir = 'data/stanford_building_parser_dataset/' + return self.data_dir + + def get_benchmark_sets(self): + return self._get_benchmark_sets() + + def get_split(self, split_name): + if self.ver == 'sbpd': + return self._get_split(split_name) + else: + logging.fatal('Unknown version.') + + def _get_benchmark_sets(self): + sets = ['train1', 'val', 'test'] + return sets + + def _get_split(self, split_name): + train = ['area1', 'area5a', 'area5b', 'area6'] + train1 = ['area1'] + val = ['area3'] + test = ['area4'] + + sets = {} + sets['train'] = train + sets['train1'] = train1 + sets['val'] = val + sets['test'] = test + sets['all'] = sorted(list(set(train + val + test))) + return sets[split_name] diff --git a/cognitive_mapping_and_planning/datasets/nav_env.py b/cognitive_mapping_and_planning/datasets/nav_env.py new file mode 100644 index 0000000000000000000000000000000000000000..5710e26dcb113121d99400cb060104224dd91749 --- /dev/null +++ b/cognitive_mapping_and_planning/datasets/nav_env.py @@ -0,0 +1,1465 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+r"""Navigation Environment. Includes the following classes along with some
+helper functions.
+  Building: Loads buildings, computes traversibility, exposes functionality for
+  rendering images.
+
+  GridWorld: Base class which implements functionality for moving an agent on a
+  grid world.
+
+  NavigationEnv: Base class which generates navigation problems on a grid world.
+
+  VisualNavigationEnv: Builds upon NavigationEnv and Building to provide the
+  interface that is used externally to train the agent.
+
+  MeshMapper: Class used for distilling the model and testing the mapper.
+
+  BuildingMultiplexer: Wrapper class that instantiates a VisualNavigationEnv for
+  each building and multiplexes between them as needed.
+"""
+
+import numpy as np
+import os
+import re
+import matplotlib.pyplot as plt
+
+import graph_tool as gt
+import graph_tool.topology
+
+from tensorflow.python.platform import gfile
+import logging
+import src.file_utils as fu
+import src.utils as utils
+import src.graph_utils as gu
+import src.map_utils as mu
+import src.depth_utils as du
+import render.swiftshader_renderer as sru
+from render.swiftshader_renderer import SwiftshaderRenderer
+import cv2
+
+label_nodes_with_class = gu.label_nodes_with_class
+label_nodes_with_class_geodesic = gu.label_nodes_with_class_geodesic
+get_distance_node_list = gu.get_distance_node_list
+convert_to_graph_tool = gu.convert_to_graph_tool
+generate_graph = gu.generate_graph
+get_hardness_distribution = gu.get_hardness_distribution
+rng_next_goal_rejection_sampling = gu.rng_next_goal_rejection_sampling
+rng_next_goal = gu.rng_next_goal
+rng_room_to_room = gu.rng_room_to_room
+rng_target_dist_field = gu.rng_target_dist_field
+
+compute_traversibility = mu.compute_traversibility
+make_map = mu.make_map
+resize_maps = mu.resize_maps
+pick_largest_cc = mu.pick_largest_cc
+get_graph_origin_loc = mu.get_graph_origin_loc
+generate_egocentric_maps = mu.generate_egocentric_maps
+generate_goal_images = mu.generate_goal_images
+get_map_to_predict = mu.get_map_to_predict
+
+bin_points = du.bin_points
+make_geocentric = du.make_geocentric
+get_point_cloud_from_z = du.get_point_cloud_from_z
+get_camera_matrix = du.get_camera_matrix
+
+def _get_semantic_maps(folder_name, building_name, map, flip):
+  # Load file from the cache.
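+  # The cache file name encodes the building name together with the map size,
+  # origin, resolution and flip flag, so maps computed with different map
+  # parameters do not collide in the cache.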
+ file_name = '{:s}_{:d}_{:d}_{:d}_{:d}_{:d}_{:d}.pkl' + file_name = file_name.format(building_name, map.size[0], map.size[1], + map.origin[0], map.origin[1], map.resolution, + flip) + file_name = os.path.join(folder_name, file_name) + logging.info('Loading semantic maps from %s.', file_name) + + if fu.exists(file_name): + a = utils.load_variables(file_name) + maps = a['maps'] #HxWx#C + cats = a['cats'] + else: + logging.error('file_name: %s not found.', file_name) + maps = None + cats = None + return maps, cats + +def _select_classes(all_maps, all_cats, cats_to_use): + inds = [] + for c in cats_to_use: + ind = all_cats.index(c) + inds.append(ind) + out_maps = all_maps[:,:,inds] + return out_maps + +def _get_room_dimensions(file_name, resolution, origin, flip=False): + if fu.exists(file_name): + a = utils.load_variables(file_name)['room_dimension'] + names = a.keys() + dims = np.concatenate(a.values(), axis=0).reshape((-1,6)) + ind = np.argsort(names) + dims = dims[ind,:] + names = [names[x] for x in ind] + if flip: + dims_new = dims*1 + dims_new[:,1] = -dims[:,4] + dims_new[:,4] = -dims[:,1] + dims = dims_new*1 + + dims = dims*100. + dims[:,0] = dims[:,0] - origin[0] + dims[:,1] = dims[:,1] - origin[1] + dims[:,3] = dims[:,3] - origin[0] + dims[:,4] = dims[:,4] - origin[1] + dims = dims / resolution + out = {'names': names, 'dims': dims} + else: + out = None + return out + +def _filter_rooms(room_dims, room_regex): + pattern = re.compile(room_regex) + ind = [] + for i, name in enumerate(room_dims['names']): + if pattern.match(name): + ind.append(i) + new_room_dims = {} + new_room_dims['names'] = [room_dims['names'][i] for i in ind] + new_room_dims['dims'] = room_dims['dims'][ind,:]*1 + return new_room_dims + +def _label_nodes_with_room_id(xyt, room_dims): + # Label the room with the ID into things. + node_room_id = -1*np.ones((xyt.shape[0], 1)) + dims = room_dims['dims'] + for x, name in enumerate(room_dims['names']): + all_ = np.concatenate((xyt[:,[0]] >= dims[x,0], + xyt[:,[0]] <= dims[x,3], + xyt[:,[1]] >= dims[x,1], + xyt[:,[1]] <= dims[x,4]), axis=1) + node_room_id[np.all(all_, axis=1), 0] = x + return node_room_id + +def get_path_ids(start_node_id, end_node_id, pred_map): + id = start_node_id + path = [id] + while id != end_node_id: + id = pred_map[id] + path.append(id) + return path + +def image_pre(images, modalities): + # Assumes images are ...xHxWxC. + # We always assume images are RGB followed by Depth. + if 'depth' in modalities: + d = images[...,-1][...,np.newaxis]*1. + d[d < 0.01] = np.NaN; isnan = np.isnan(d); + d = 100./d; d[isnan] = 0.; + images = np.concatenate((images[...,:-1], d, isnan), axis=images.ndim-1) + if 'rgb' in modalities: + images[...,:3] = images[...,:3]*1. - 128 + return images + +def _get_relative_goal_loc(goal_loc, loc, theta): + r = np.sqrt(np.sum(np.square(goal_loc - loc), axis=1)) + t = np.arctan2(goal_loc[:,1] - loc[:,1], goal_loc[:,0] - loc[:,0]) + t = t-theta[:,0] + np.pi/2 + return np.expand_dims(r,axis=1), np.expand_dims(t, axis=1) + +def _gen_perturbs(rng, batch_size, num_steps, lr_flip, delta_angle, delta_xy, + structured): + perturbs = [] + for i in range(batch_size): + # Doing things one by one for each episode in this batch. This way this + # remains replicatable even when we change the batch size. + p = np.zeros((num_steps+1, 4)) + if lr_flip: + # Flip the whole trajectory. 
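+      # Columns of p are (delta_x, delta_y, delta_angle, flip); the flip is
+      # applied downstream whenever the sampled value in the last column is
+      # positive.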
+ p[:,3] = rng.rand(1)-0.5 + if delta_angle > 0: + if structured: + p[:,2] = (rng.rand(1)-0.5)* delta_angle + else: + p[:,2] = (rng.rand(p.shape[0])-0.5)* delta_angle + if delta_xy > 0: + if structured: + p[:,:2] = (rng.rand(1, 2)-0.5)*delta_xy + else: + p[:,:2] = (rng.rand(p.shape[0], 2)-0.5)*delta_xy + perturbs.append(p) + return perturbs + +def get_multiplexer_class(args, task_number): + assert(args.task_params.base_class == 'Building') + logging.info('Returning BuildingMultiplexer') + R = BuildingMultiplexer(args, task_number) + return R + +class GridWorld(): + def __init__(self): + """Class members that will be assigned by any class that actually uses this + class.""" + self.restrict_to_largest_cc = None + self.robot = None + self.env = None + self.category_list = None + self.traversible = None + + def get_loc_axis(self, node, delta_theta, perturb=None): + """Based on the node orientation returns X, and Y axis. Used to sample the + map in egocentric coordinate frame. + """ + if type(node) == tuple: + node = np.array([node]) + if perturb is None: + perturb = np.zeros((node.shape[0], 4)) + xyt = self.to_actual_xyt_vec(node) + x = xyt[:,[0]] + perturb[:,[0]] + y = xyt[:,[1]] + perturb[:,[1]] + t = xyt[:,[2]] + perturb[:,[2]] + theta = t*delta_theta + loc = np.concatenate((x,y), axis=1) + x_axis = np.concatenate((np.cos(theta), np.sin(theta)), axis=1) + y_axis = np.concatenate((np.cos(theta+np.pi/2.), np.sin(theta+np.pi/2.)), + axis=1) + # Flip the sampled map where need be. + y_axis[np.where(perturb[:,3] > 0)[0], :] *= -1. + return loc, x_axis, y_axis, theta + + def to_actual_xyt(self, pqr): + """Converts from node to location on the map.""" + (p, q, r) = pqr + if self.task.n_ori == 6: + out = (p - q * 0.5 + self.task.origin_loc[0], + q * np.sqrt(3.) / 2. + self.task.origin_loc[1], r) + elif self.task.n_ori == 4: + out = (p + self.task.origin_loc[0], + q + self.task.origin_loc[1], r) + return out + + def to_actual_xyt_vec(self, pqr): + """Converts from node array to location array on the map.""" + p = pqr[:,0][:, np.newaxis] + q = pqr[:,1][:, np.newaxis] + r = pqr[:,2][:, np.newaxis] + if self.task.n_ori == 6: + out = np.concatenate((p - q * 0.5 + self.task.origin_loc[0], + q * np.sqrt(3.) / 2. 
+ self.task.origin_loc[1], + r), axis=1) + elif self.task.n_ori == 4: + out = np.concatenate((p + self.task.origin_loc[0], + q + self.task.origin_loc[1], + r), axis=1) + return out + + def raw_valid_fn_vec(self, xyt): + """Returns if the given set of nodes is valid or not.""" + height = self.traversible.shape[0] + width = self.traversible.shape[1] + x = np.round(xyt[:,[0]]).astype(np.int32) + y = np.round(xyt[:,[1]]).astype(np.int32) + is_inside = np.all(np.concatenate((x >= 0, y >= 0, + x < width, y < height), axis=1), axis=1) + x = np.minimum(np.maximum(x, 0), width-1) + y = np.minimum(np.maximum(y, 0), height-1) + ind = np.ravel_multi_index((y,x), self.traversible.shape) + is_traversible = self.traversible.ravel()[ind] + + is_valid = np.all(np.concatenate((is_inside[:,np.newaxis], is_traversible), + axis=1), axis=1) + return is_valid + + + def valid_fn_vec(self, pqr): + """Returns if the given set of nodes is valid or not.""" + xyt = self.to_actual_xyt_vec(np.array(pqr)) + height = self.traversible.shape[0] + width = self.traversible.shape[1] + x = np.round(xyt[:,[0]]).astype(np.int32) + y = np.round(xyt[:,[1]]).astype(np.int32) + is_inside = np.all(np.concatenate((x >= 0, y >= 0, + x < width, y < height), axis=1), axis=1) + x = np.minimum(np.maximum(x, 0), width-1) + y = np.minimum(np.maximum(y, 0), height-1) + ind = np.ravel_multi_index((y,x), self.traversible.shape) + is_traversible = self.traversible.ravel()[ind] + + is_valid = np.all(np.concatenate((is_inside[:,np.newaxis], is_traversible), + axis=1), axis=1) + return is_valid + + def get_feasible_actions(self, node_ids): + """Returns the feasible set of actions from the current node.""" + a = np.zeros((len(node_ids), self.task_params.num_actions), dtype=np.int32) + gtG = self.task.gtG + next_node = [] + for i, c in enumerate(node_ids): + neigh = gtG.vertex(c).out_neighbours() + neigh_edge = gtG.vertex(c).out_edges() + nn = {} + for n, e in zip(neigh, neigh_edge): + _ = gtG.ep['action'][e] + a[i,_] = 1 + nn[_] = int(n) + next_node.append(nn) + return a, next_node + + def take_action(self, current_node_ids, action): + """Returns the new node after taking the action action. Stays at the current + node if the action is invalid.""" + actions, next_node_ids = self.get_feasible_actions(current_node_ids) + new_node_ids = [] + for i, (c,a) in enumerate(zip(current_node_ids, action)): + if actions[i,a] == 1: + new_node_ids.append(next_node_ids[i][a]) + else: + new_node_ids.append(c) + return new_node_ids + + def set_r_obj(self, r_obj): + """Sets the SwiftshaderRenderer object used for rendering.""" + self.r_obj = r_obj + +class Building(GridWorld): + def __init__(self, building_name, robot, env, + category_list=None, small=False, flip=False, logdir=None, + building_loader=None): + + self.restrict_to_largest_cc = True + self.robot = robot + self.env = env + self.logdir = logdir + + # Load the building meta data. + building = building_loader.load_building(building_name) + if small: + building['mesh_names'] = building['mesh_names'][:5] + + # New code. + shapess = building_loader.load_building_meshes(building) + if flip: + for shapes in shapess: + shapes.flip_shape() + + vs = [] + for shapes in shapess: + vs.append(shapes.get_vertices()[0]) + vs = np.concatenate(vs, axis=0) + map = make_map(env.padding, env.resolution, vertex=vs, sc=100.) 
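+    # Mark which cells of the top-down map a robot with the given base, height
+    # and radius can actually traverse.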
+ map = compute_traversibility( + map, robot.base, robot.height, robot.radius, env.valid_min, + env.valid_max, env.num_point_threshold, shapess=shapess, sc=100., + n_samples_per_face=env.n_samples_per_face) + + room_dims = _get_room_dimensions(building['room_dimension_file'], + env.resolution, map.origin, flip=flip) + class_maps, class_map_names = _get_semantic_maps( + building['class_map_folder'], building_name, map, flip) + + self.class_maps = class_maps + self.class_map_names = class_map_names + self.building = building + self.shapess = shapess + self.map = map + self.traversible = map.traversible*1 + self.building_name = building_name + self.room_dims = room_dims + self.flipped = flip + self.renderer_entitiy_ids = [] + + if self.restrict_to_largest_cc: + self.traversible = pick_largest_cc(self.traversible) + + def load_building_into_scene(self): + # Loads the scene. + self.renderer_entitiy_ids += self.r_obj.load_shapes(self.shapess) + # Free up memory, we dont need the mesh or the materials anymore. + self.shapess = None + + def add_entity_at_nodes(self, nodes, height, shape): + xyt = self.to_actual_xyt_vec(nodes) + nxy = xyt[:,:2]*1. + nxy = nxy * self.map.resolution + nxy = nxy + self.map.origin + Ts = np.concatenate((nxy, nxy[:,:1]), axis=1) + Ts[:,2] = height; Ts = Ts / 100.; + + # Merge all the shapes into a single shape and add that shape. + shape.replicate_shape(Ts) + entity_ids = self.r_obj.load_shapes([shape]) + self.renderer_entitiy_ids += entity_ids + return entity_ids + + def add_shapes(self, shapes): + scene = self.r_obj.viz.scene() + for shape in shapes: + scene.AddShape(shape) + + def add_materials(self, materials): + scene = self.r_obj.viz.scene() + for material in materials: + scene.AddOrUpdateMaterial(material) + + def set_building_visibility(self, visibility): + self.r_obj.set_entity_visible(self.renderer_entitiy_ids, visibility) + + def render_nodes(self, nodes, perturb=None, aux_delta_theta=0.): + self.set_building_visibility(True) + if perturb is None: + perturb = np.zeros((len(nodes), 4)) + + imgs = [] + r = 2 + elevation_z = r * np.tan(np.deg2rad(self.robot.camera_elevation_degree)) + + for i in range(len(nodes)): + xyt = self.to_actual_xyt(nodes[i]) + lookat_theta = 3.0 * np.pi / 2.0 - (xyt[2]+perturb[i,2]+aux_delta_theta) * (self.task.delta_theta) + nxy = np.array([xyt[0]+perturb[i,0], xyt[1]+perturb[i,1]]).reshape(1, -1) + nxy = nxy * self.map.resolution + nxy = nxy + self.map.origin + camera_xyz = np.zeros((1, 3)) + camera_xyz[...] = [nxy[0, 0], nxy[0, 1], self.robot.sensor_height] + camera_xyz = camera_xyz / 100. 
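+      # Place the look-at point a fixed distance r in front of the camera, at
+      # the height implied by the camera's elevation angle.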
+ lookat_xyz = np.array([-r * np.sin(lookat_theta), + -r * np.cos(lookat_theta), elevation_z]) + lookat_xyz = lookat_xyz + camera_xyz[0, :] + self.r_obj.position_camera(camera_xyz[0, :].tolist(), + lookat_xyz.tolist(), [0.0, 0.0, 1.0]) + img = self.r_obj.render(take_screenshot=True, output_type=0) + img = [x for x in img if x is not None] + img = np.concatenate(img, axis=2).astype(np.float32) + if perturb[i,3]>0: + img = img[:,::-1,:] + imgs.append(img) + + self.set_building_visibility(False) + return imgs + + +class MeshMapper(Building): + def __init__(self, robot, env, task_params, building_name, category_list, + flip, logdir=None, building_loader=None): + Building.__init__(self, building_name, robot, env, category_list, + small=task_params.toy_problem, flip=flip, logdir=logdir, + building_loader=building_loader) + self.task_params = task_params + self.task = None + self._preprocess_for_task(self.task_params.building_seed) + + def _preprocess_for_task(self, seed): + if self.task is None or self.task.seed != seed: + rng = np.random.RandomState(seed) + origin_loc = get_graph_origin_loc(rng, self.traversible) + self.task = utils.Foo(seed=seed, origin_loc=origin_loc, + n_ori=self.task_params.n_ori) + G = generate_graph(self.valid_fn_vec, + self.task_params.step_size, self.task.n_ori, + (0, 0, 0)) + gtG, nodes, nodes_to_id = convert_to_graph_tool(G) + self.task.gtG = gtG + self.task.nodes = nodes + self.task.delta_theta = 2.0*np.pi/(self.task.n_ori*1.) + self.task.nodes_to_id = nodes_to_id + logging.info('Building %s, #V=%d, #E=%d', self.building_name, + self.task.nodes.shape[0], self.task.gtG.num_edges()) + + if self.logdir is not None: + write_traversible = cv2.applyColorMap(self.traversible.astype(np.uint8)*255, cv2.COLORMAP_JET) + img_path = os.path.join(self.logdir, + '{:s}_{:d}_graph.png'.format(self.building_name, + seed)) + node_xyt = self.to_actual_xyt_vec(self.task.nodes) + plt.set_cmap('jet'); + fig, ax = utils.subplot(plt, (1,1), (12,12)) + ax.plot(node_xyt[:,0], node_xyt[:,1], 'm.') + ax.imshow(self.traversible, origin='lower'); + ax.set_axis_off(); ax.axis('equal'); + ax.set_title('{:s}, {:d}, {:d}'.format(self.building_name, + self.task.nodes.shape[0], + self.task.gtG.num_edges())) + if self.room_dims is not None: + for i, r in enumerate(self.room_dims['dims']*1): + min_ = r[:3]*1 + max_ = r[3:]*1 + xmin, ymin, zmin = min_ + xmax, ymax, zmax = max_ + + ax.plot([xmin, xmax, xmax, xmin, xmin], + [ymin, ymin, ymax, ymax, ymin], 'g') + with fu.fopen(img_path, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + plt.close(fig) + + + def _gen_rng(self, rng): + # instances is a list of list of node_ids. 
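+    # Trajectories are generated according to task_params.move_type ('circle',
+    # 'shortest_path' or 'circle+forward'), along with random perturbations
+    # that are used for data augmentation.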
+ if self.task_params.move_type == 'circle': + _, _, _, _, paths = rng_target_dist_field(self.task_params.batch_size, + self.task.gtG, rng, 0, 1, + compute_path=True) + instances_ = paths + + instances = [] + for instance_ in instances_: + instance = instance_ + for i in range(self.task_params.num_steps): + instance.append(self.take_action([instance[-1]], [1])[0]) + instances.append(instance) + + elif self.task_params.move_type == 'shortest_path': + _, _, _, _, paths = rng_target_dist_field(self.task_params.batch_size, + self.task.gtG, rng, + self.task_params.num_steps, + self.task_params.num_steps+1, + compute_path=True) + instances = paths + + elif self.task_params.move_type == 'circle+forward': + _, _, _, _, paths = rng_target_dist_field(self.task_params.batch_size, + self.task.gtG, rng, 0, 1, + compute_path=True) + instances_ = paths + instances = [] + for instance_ in instances_: + instance = instance_ + for i in range(self.task_params.n_ori-1): + instance.append(self.take_action([instance[-1]], [1])[0]) + while len(instance) <= self.task_params.num_steps: + while self.take_action([instance[-1]], [3])[0] == instance[-1] and len(instance) <= self.task_params.num_steps: + instance.append(self.take_action([instance[-1]], [2])[0]) + if len(instance) <= self.task_params.num_steps: + instance.append(self.take_action([instance[-1]], [3])[0]) + instances.append(instance) + + # Do random perturbation if needed. + perturbs = _gen_perturbs(rng, self.task_params.batch_size, + self.task_params.num_steps, + self.task_params.data_augment.lr_flip, + self.task_params.data_augment.delta_angle, + self.task_params.data_augment.delta_xy, + self.task_params.data_augment.structured) + return instances, perturbs + + def worker(self, instances, perturbs): + # Output the images and the free space. + + # Make the instances be all the same length. + for i in range(len(instances)): + for j in range(self.task_params.num_steps - len(instances[i]) + 1): + instances[i].append(instances[i][-1]) + if perturbs[i].shape[0] < self.task_params.num_steps+1: + p = np.zeros((self.task_params.num_steps+1, 4)) + p[:perturbs[i].shape[0], :] = perturbs[i] + p[perturbs[i].shape[0]:, :] = perturbs[i][-1,:] + perturbs[i] = p + + instances_ = [] + for instance in instances: + instances_ = instances_ + instance + perturbs_ = np.concatenate(perturbs, axis=0) + + instances_nodes = self.task.nodes[instances_,:] + instances_nodes = [tuple(x) for x in instances_nodes] + + imgs_ = self.render_nodes(instances_nodes, perturbs_) + imgs = []; next = 0; + for instance in instances: + img_i = [] + for _ in instance: + img_i.append(imgs_[next]) + next = next+1 + imgs.append(img_i) + imgs = np.array(imgs) + + # Render out the maps in the egocentric view for all nodes and not just the + # last node. 
+ all_nodes = [] + for x in instances: + all_nodes = all_nodes + x + all_perturbs = np.concatenate(perturbs, axis=0) + loc, x_axis, y_axis, theta = self.get_loc_axis( + self.task.nodes[all_nodes, :]*1, delta_theta=self.task.delta_theta, + perturb=all_perturbs) + fss = None + valids = None + loc_on_map = None + theta_on_map = None + cum_fs = None + cum_valid = None + incremental_locs = None + incremental_thetas = None + + if self.task_params.output_free_space: + fss, valids = get_map_to_predict(loc, x_axis, y_axis, + map=self.traversible*1., + map_size=self.task_params.map_size) + fss = np.array(fss) > 0.5 + fss = np.reshape(fss, [self.task_params.batch_size, + self.task_params.num_steps+1, + self.task_params.map_size, + self.task_params.map_size]) + valids = np.reshape(np.array(valids), fss.shape) + + if self.task_params.output_transform_to_global_map: + # Output the transform to the global map. + loc_on_map = np.reshape(loc*1, [self.task_params.batch_size, + self.task_params.num_steps+1, -1]) + # Converting to location wrt to first location so that warping happens + # properly. + theta_on_map = np.reshape(theta*1, [self.task_params.batch_size, + self.task_params.num_steps+1, -1]) + + if self.task_params.output_incremental_transform: + # Output the transform to the global map. + incremental_locs_ = np.reshape(loc*1, [self.task_params.batch_size, + self.task_params.num_steps+1, -1]) + incremental_locs_[:,1:,:] -= incremental_locs_[:,:-1,:] + t0 = -np.pi/2+np.reshape(theta*1, [self.task_params.batch_size, + self.task_params.num_steps+1, -1]) + t = t0*1 + incremental_locs = incremental_locs_*1 + incremental_locs[:,:,0] = np.sum(incremental_locs_ * np.concatenate((np.cos(t), np.sin(t)), axis=-1), axis=-1) + incremental_locs[:,:,1] = np.sum(incremental_locs_ * np.concatenate((np.cos(t+np.pi/2), np.sin(t+np.pi/2)), axis=-1), axis=-1) + incremental_locs[:,0,:] = incremental_locs_[:,0,:] + # print incremental_locs_[0,:,:], incremental_locs[0,:,:], t0[0,:,:] + + incremental_thetas = np.reshape(theta*1, [self.task_params.batch_size, + self.task_params.num_steps+1, + -1]) + incremental_thetas[:,1:,:] += -incremental_thetas[:,:-1,:] + + if self.task_params.output_canonical_map: + loc_ = loc[0::(self.task_params.num_steps+1), :] + x_axis = np.zeros_like(loc_); x_axis[:,1] = 1 + y_axis = np.zeros_like(loc_); y_axis[:,0] = -1 + cum_fs, cum_valid = get_map_to_predict(loc_, x_axis, y_axis, + map=self.traversible*1., + map_size=self.task_params.map_size) + cum_fs = np.array(cum_fs) > 0.5 + cum_fs = np.reshape(cum_fs, [self.task_params.batch_size, 1, + self.task_params.map_size, + self.task_params.map_size]) + cum_valid = np.reshape(np.array(cum_valid), cum_fs.shape) + + + inputs = {'fs_maps': fss, + 'valid_maps': valids, + 'imgs': imgs, + 'loc_on_map': loc_on_map, + 'theta_on_map': theta_on_map, + 'cum_fs_maps': cum_fs, + 'cum_valid_maps': cum_valid, + 'incremental_thetas': incremental_thetas, + 'incremental_locs': incremental_locs} + return inputs + + def pre(self, inputs): + inputs['imgs'] = image_pre(inputs['imgs'], self.task_params.modalities) + if inputs['loc_on_map'] is not None: + inputs['loc_on_map'] = inputs['loc_on_map'] - inputs['loc_on_map'][:,[0],:] + if inputs['theta_on_map'] is not None: + inputs['theta_on_map'] = np.pi/2. 
- inputs['theta_on_map'] + return inputs + +def _nav_env_reset_helper(type, rng, nodes, batch_size, gtG, max_dist, + num_steps, num_goals, data_augment, **kwargs): + """Generates and returns a new episode.""" + max_compute = max_dist + 4*num_steps + if type == 'general': + start_node_ids, end_node_ids, dist, pred_map, paths = \ + rng_target_dist_field(batch_size, gtG, rng, max_dist, max_compute, + nodes=nodes, compute_path=False) + target_class = None + + elif type == 'room_to_room_many': + goal_node_ids = []; dists = []; + node_room_ids = kwargs['node_room_ids'] + # Sample the first one + start_node_ids_, end_node_ids_, dist_, _, _ = rng_room_to_room( + batch_size, gtG, rng, max_dist, max_compute, + node_room_ids=node_room_ids, nodes=nodes) + start_node_ids = start_node_ids_ + goal_node_ids.append(end_node_ids_) + dists.append(dist_) + for n in range(num_goals-1): + start_node_ids_, end_node_ids_, dist_, _, _ = rng_next_goal( + goal_node_ids[n], batch_size, gtG, rng, max_dist, + max_compute, node_room_ids=node_room_ids, nodes=nodes, + dists_from_start_node=dists[n]) + goal_node_ids.append(end_node_ids_) + dists.append(dist_) + target_class = None + + elif type == 'rng_rejection_sampling_many': + num_goals = num_goals + goal_node_ids = []; dists = []; + + n_ori = kwargs['n_ori'] + step_size = kwargs['step_size'] + min_dist = kwargs['min_dist'] + sampling_distribution = kwargs['sampling_distribution'] + target_distribution = kwargs['target_distribution'] + rejection_sampling_M = kwargs['rejection_sampling_M'] + distribution_bins = kwargs['distribution_bins'] + + for n in range(num_goals): + if n == 0: input_nodes = None + else: input_nodes = goal_node_ids[n-1] + start_node_ids_, end_node_ids_, dist_, _, _, _, _ = rng_next_goal_rejection_sampling( + input_nodes, batch_size, gtG, rng, max_dist, min_dist, + max_compute, sampling_distribution, target_distribution, nodes, + n_ori, step_size, distribution_bins, rejection_sampling_M) + if n == 0: start_node_ids = start_node_ids_ + goal_node_ids.append(end_node_ids_) + dists.append(dist_) + target_class = None + + elif type == 'room_to_room_back': + num_goals = num_goals + assert(num_goals == 2), 'num_goals must be 2.' + goal_node_ids = []; dists = []; + node_room_ids = kwargs['node_room_ids'] + # Sample the first one. + start_node_ids_, end_node_ids_, dist_, _, _ = rng_room_to_room( + batch_size, gtG, rng, max_dist, max_compute, + node_room_ids=node_room_ids, nodes=nodes) + start_node_ids = start_node_ids_ + goal_node_ids.append(end_node_ids_) + dists.append(dist_) + + # Set second goal to be starting position, and compute distance to the start node. + goal_node_ids.append(start_node_ids) + dist = [] + for i in range(batch_size): + dist_ = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=True), + source=gtG.vertex(start_node_ids[i]), target=None) + dist_ = np.array(dist_.get_array()) + dist.append(dist_) + dists.append(dist) + target_class = None + + elif type[:14] == 'to_nearest_obj': + # Generate an episode by sampling one of the target classes (with + # probability proportional to the number of nodes in the world). + # With the sampled class sample a node that is within some distance from + # the sampled class. + class_nodes = kwargs['class_nodes'] + sampling = kwargs['sampling'] + dist_to_class = kwargs['dist_to_class'] + + assert(num_goals == 1), 'Only supports a single goal.' 
+ ind = rng.choice(class_nodes.shape[0], size=batch_size) + target_class = class_nodes[ind,1] + start_node_ids = []; dists = []; goal_node_ids = []; + + for t in target_class: + if sampling == 'uniform': + max_dist = max_dist + cnts = np.bincount(dist_to_class[t], minlength=max_dist+1)*1. + cnts[max_dist+1:] = 0 + p_each = 1./ cnts / (max_dist+1.) + p_each[cnts == 0] = 0 + p = p_each[dist_to_class[t]]*1.; p = p/np.sum(p) + start_node_id = rng.choice(p.shape[0], size=1, p=p)[0] + else: + logging.fatal('Sampling not one of uniform.') + start_node_ids.append(start_node_id) + dists.append(dist_to_class[t]) + # Dummy goal node, same as the start node, so that vis is better. + goal_node_ids.append(start_node_id) + dists = [dists] + goal_node_ids = [goal_node_ids] + + return start_node_ids, goal_node_ids, dists, target_class + + +class NavigationEnv(GridWorld, Building): + """Wrapper around GridWorld which sets up navigation tasks. + """ + def _debug_save_hardness(self, seed): + out_path = os.path.join(self.logdir, '{:s}_{:d}_hardness.png'.format(self.building_name, seed)) + batch_size = 4000 + rng = np.random.RandomState(0) + start_node_ids, end_node_ids, dists, pred_maps, paths, hardnesss, gt_dists = \ + rng_next_goal_rejection_sampling( + None, batch_size, self.task.gtG, rng, self.task_params.max_dist, + self.task_params.min_dist, self.task_params.max_dist, + self.task.sampling_distribution, self.task.target_distribution, + self.task.nodes, self.task_params.n_ori, self.task_params.step_size, + self.task.distribution_bins, self.task.rejection_sampling_M) + bins = self.task.distribution_bins + n_bins = self.task.n_bins + with plt.style.context('ggplot'): + fig, axes = utils.subplot(plt, (1,2), (10,10)) + ax = axes[0] + _ = ax.hist(hardnesss, bins=bins, weights=np.ones_like(hardnesss)/len(hardnesss)) + ax.plot(bins[:-1]+0.5/n_bins, self.task.target_distribution, 'g') + ax.plot(bins[:-1]+0.5/n_bins, self.task.sampling_distribution, 'b') + ax.grid('on') + + ax = axes[1] + _ = ax.hist(gt_dists, bins=np.arange(self.task_params.max_dist+1)) + ax.grid('on') + ax.set_title('Mean: {:0.2f}, Median: {:0.2f}'.format(np.mean(gt_dists), + np.median(gt_dists))) + with fu.fopen(out_path, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + + def _debug_save_map_nodes(self, seed): + """Saves traversible space along with nodes generated on the graph. Takes + the seed as input.""" + img_path = os.path.join(self.logdir, '{:s}_{:d}_graph.png'.format(self.building_name, seed)) + node_xyt = self.to_actual_xyt_vec(self.task.nodes) + plt.set_cmap('jet'); + fig, ax = utils.subplot(plt, (1,1), (12,12)) + ax.plot(node_xyt[:,0], node_xyt[:,1], 'm.') + ax.set_axis_off(); ax.axis('equal'); + + if self.room_dims is not None: + for i, r in enumerate(self.room_dims['dims']*1): + min_ = r[:3]*1 + max_ = r[3:]*1 + xmin, ymin, zmin = min_ + xmax, ymax, zmax = max_ + + ax.plot([xmin, xmax, xmax, xmin, xmin], + [ymin, ymin, ymax, ymax, ymin], 'g') + ax.imshow(self.traversible, origin='lower'); + with fu.fopen(img_path, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + + def _debug_semantic_maps(self, seed): + """Saves traversible space along with nodes generated on the graph. Takes + the seed as input.""" + for i, cls in enumerate(self.task_params.semantic_task.class_map_names): + img_path = os.path.join(self.logdir, '{:s}_flip{:d}_{:s}_graph.png'.format(self.building_name, seed, cls)) + maps = self.traversible*1. 
+ maps += 0.5*(self.task.class_maps_dilated[:,:,i]) + write_traversible = (maps*1.+1.)/3.0 + write_traversible = (write_traversible*255.).astype(np.uint8)[:,:,np.newaxis] + write_traversible = write_traversible + np.zeros((1,1,3), dtype=np.uint8) + fu.write_image(img_path, write_traversible[::-1,:,:]) + + def _preprocess_for_task(self, seed): + """Sets up the task field for doing navigation on the grid world.""" + if self.task is None or self.task.seed != seed: + rng = np.random.RandomState(seed) + origin_loc = get_graph_origin_loc(rng, self.traversible) + self.task = utils.Foo(seed=seed, origin_loc=origin_loc, + n_ori=self.task_params.n_ori) + G = generate_graph(self.valid_fn_vec, self.task_params.step_size, + self.task.n_ori, (0, 0, 0)) + gtG, nodes, nodes_to_id = convert_to_graph_tool(G) + self.task.gtG = gtG + self.task.nodes = nodes + self.task.delta_theta = 2.0*np.pi/(self.task.n_ori*1.) + self.task.nodes_to_id = nodes_to_id + + logging.info('Building %s, #V=%d, #E=%d', self.building_name, + self.task.nodes.shape[0], self.task.gtG.num_edges()) + type = self.task_params.type + if type == 'general': + # Do nothing + _ = None + + elif type == 'room_to_room_many' or type == 'room_to_room_back': + if type == 'room_to_room_back': + assert(self.task_params.num_goals == 2), 'num_goals must be 2.' + + self.room_dims = _filter_rooms(self.room_dims, self.task_params.room_regex) + xyt = self.to_actual_xyt_vec(self.task.nodes) + self.task.node_room_ids = _label_nodes_with_room_id(xyt, self.room_dims) + self.task.reset_kwargs = {'node_room_ids': self.task.node_room_ids} + + elif type == 'rng_rejection_sampling_many': + n_bins = 20 + rejection_sampling_M = self.task_params.rejection_sampling_M + min_dist = self.task_params.min_dist + bins = np.arange(n_bins+1)/(n_bins*1.) + target_d = np.zeros(n_bins); target_d[...] = 1./n_bins; + + sampling_d = get_hardness_distribution( + self.task.gtG, self.task_params.max_dist, self.task_params.min_dist, + np.random.RandomState(0), 4000, bins, self.task.nodes, + self.task_params.n_ori, self.task_params.step_size) + + self.task.reset_kwargs = {'distribution_bins': bins, + 'target_distribution': target_d, + 'sampling_distribution': sampling_d, + 'rejection_sampling_M': rejection_sampling_M, + 'n_bins': n_bins, + 'n_ori': self.task_params.n_ori, + 'step_size': self.task_params.step_size, + 'min_dist': self.task_params.min_dist} + self.task.n_bins = n_bins + self.task.distribution_bins = bins + self.task.target_distribution = target_d + self.task.sampling_distribution = sampling_d + self.task.rejection_sampling_M = rejection_sampling_M + + if self.logdir is not None: + self._debug_save_hardness(seed) + + elif type[:14] == 'to_nearest_obj': + self.room_dims = _filter_rooms(self.room_dims, self.task_params.room_regex) + xyt = self.to_actual_xyt_vec(self.task.nodes) + + self.class_maps = _select_classes(self.class_maps, + self.class_map_names, + self.task_params.semantic_task.class_map_names)*1 + self.class_map_names = self.task_params.semantic_task.class_map_names + nodes_xyt = self.to_actual_xyt_vec(np.array(self.task.nodes)) + + tt = utils.Timer(); tt.tic(); + if self.task_params.type == 'to_nearest_obj_acc': + self.task.class_maps_dilated, self.task.node_class_label = label_nodes_with_class_geodesic( + nodes_xyt, self.class_maps, + self.task_params.semantic_task.pix_distance+8, self.map.traversible, + ff_cost=1., fo_cost=1., oo_cost=4., connectivity=8.) 
+ + dists = [] + for i in range(len(self.class_map_names)): + class_nodes_ = np.where(self.task.node_class_label[:,i])[0] + dists.append(get_distance_node_list(gtG, source_nodes=class_nodes_, direction='to')) + self.task.dist_to_class = dists + a_, b_ = np.where(self.task.node_class_label) + self.task.class_nodes = np.concatenate((a_[:,np.newaxis], b_[:,np.newaxis]), axis=1) + + if self.logdir is not None: + self._debug_semantic_maps(seed) + + self.task.reset_kwargs = {'sampling': self.task_params.semantic_task.sampling, + 'class_nodes': self.task.class_nodes, + 'dist_to_class': self.task.dist_to_class} + + if self.logdir is not None: + self._debug_save_map_nodes(seed) + + def reset(self, rngs): + rng = rngs[0]; rng_perturb = rngs[1]; + nodes = self.task.nodes + tp = self.task_params + + start_node_ids, goal_node_ids, dists, target_class = \ + _nav_env_reset_helper(tp.type, rng, self.task.nodes, tp.batch_size, + self.task.gtG, tp.max_dist, tp.num_steps, + tp.num_goals, tp.data_augment, + **(self.task.reset_kwargs)) + + start_nodes = [tuple(nodes[_,:]) for _ in start_node_ids] + goal_nodes = [[tuple(nodes[_,:]) for _ in __] for __ in goal_node_ids] + data_augment = tp.data_augment + perturbs = _gen_perturbs(rng_perturb, tp.batch_size, + (tp.num_steps+1)*tp.num_goals, + data_augment.lr_flip, data_augment.delta_angle, + data_augment.delta_xy, data_augment.structured) + perturbs = np.array(perturbs) # batch x steps x 4 + end_perturbs = perturbs[:,-(tp.num_goals):,:]*1 # fixed perturb for the goal. + perturbs = perturbs[:,:-(tp.num_goals),:]*1 + + history = -np.ones((tp.batch_size, tp.num_steps*tp.num_goals), dtype=np.int32) + self.episode = utils.Foo( + start_nodes=start_nodes, start_node_ids=start_node_ids, + goal_nodes=goal_nodes, goal_node_ids=goal_node_ids, dist_to_goal=dists, + perturbs=perturbs, goal_perturbs=end_perturbs, history=history, + target_class=target_class, history_frames=[]) + return start_node_ids + + def take_action(self, current_node_ids, action, step_number): + """In addition to returning the action, also returns the reward that the + agent receives.""" + goal_number = step_number / self.task_params.num_steps + new_node_ids = GridWorld.take_action(self, current_node_ids, action) + rewards = [] + for i, n in enumerate(new_node_ids): + reward = 0 + if n == self.episode.goal_node_ids[goal_number][i]: + reward = self.task_params.reward_at_goal + reward = reward - self.task_params.reward_time_penalty + rewards.append(reward) + return new_node_ids, rewards + + + def get_optimal_action(self, current_node_ids, step_number): + """Returns the optimal action from the current node.""" + goal_number = step_number / self.task_params.num_steps + gtG = self.task.gtG + a = np.zeros((len(current_node_ids), self.task_params.num_actions), dtype=np.int32) + d_dict = self.episode.dist_to_goal[goal_number] + for i, c in enumerate(current_node_ids): + neigh = gtG.vertex(c).out_neighbours() + neigh_edge = gtG.vertex(c).out_edges() + ds = np.array([d_dict[i][int(x)] for x in neigh]) + ds_min = np.min(ds) + for i_, e in enumerate(neigh_edge): + if ds[i_] == ds_min: + _ = gtG.ep['action'][e] + a[i, _] = 1 + return a + + def get_targets(self, current_node_ids, step_number): + """Returns the target actions from the current node.""" + action = self.get_optimal_action(current_node_ids, step_number) + action = np.expand_dims(action, axis=1) + return vars(utils.Foo(action=action)) + + def get_targets_name(self): + """Returns the list of names of the targets.""" + return ['action'] + + def cleanup(self): + 
self.episode = None + +class VisualNavigationEnv(NavigationEnv): + """Class for doing visual navigation in environments. Functions for computing + features on states, etc. + """ + def __init__(self, robot, env, task_params, category_list=None, + building_name=None, flip=False, logdir=None, + building_loader=None, r_obj=None): + tt = utils.Timer() + tt.tic() + Building.__init__(self, building_name, robot, env, category_list, + small=task_params.toy_problem, flip=flip, logdir=logdir, + building_loader=building_loader) + + self.set_r_obj(r_obj) + self.task_params = task_params + self.task = None + self.episode = None + self._preprocess_for_task(self.task_params.building_seed) + if hasattr(self.task_params, 'map_scales'): + self.task.scaled_maps = resize_maps( + self.traversible.astype(np.float32)*1, self.task_params.map_scales, + self.task_params.map_resize_method) + else: + logging.fatal('VisualNavigationEnv does not support scale_f anymore.') + self.task.readout_maps_scaled = resize_maps( + self.traversible.astype(np.float32)*1, + self.task_params.readout_maps_scales, + self.task_params.map_resize_method) + tt.toc(log_at=1, log_str='VisualNavigationEnv __init__: ') + + def get_weight(self): + return self.task.nodes.shape[0] + + def get_common_data(self): + goal_nodes = self.episode.goal_nodes + start_nodes = self.episode.start_nodes + perturbs = self.episode.perturbs + goal_perturbs = self.episode.goal_perturbs + target_class = self.episode.target_class + + goal_locs = []; rel_goal_locs = []; + for i in range(len(goal_nodes)): + end_nodes = goal_nodes[i] + goal_loc, _, _, goal_theta = self.get_loc_axis( + np.array(end_nodes), delta_theta=self.task.delta_theta, + perturb=goal_perturbs[:,i,:]) + + # Compute the relative location to all goals from the starting location. + loc, _, _, theta = self.get_loc_axis(np.array(start_nodes), + delta_theta=self.task.delta_theta, + perturb=perturbs[:,0,:]) + r_goal, t_goal = _get_relative_goal_loc(goal_loc*1., loc, theta) + rel_goal_loc = np.concatenate((r_goal*np.cos(t_goal), r_goal*np.sin(t_goal), + np.cos(goal_theta-theta), + np.sin(goal_theta-theta)), axis=1) + rel_goal_locs.append(np.expand_dims(rel_goal_loc, axis=1)) + goal_locs.append(np.expand_dims(goal_loc, axis=1)) + + map = self.traversible*1. + maps = np.repeat(np.expand_dims(np.expand_dims(map, axis=0), axis=0), + self.task_params.batch_size, axis=0)*1 + if self.task_params.type[:14] == 'to_nearest_obj': + for i in range(self.task_params.batch_size): + maps[i,0,:,:] += 0.5*(self.task.class_maps_dilated[:,:,target_class[i]]) + + rel_goal_locs = np.concatenate(rel_goal_locs, axis=1) + goal_locs = np.concatenate(goal_locs, axis=1) + maps = np.expand_dims(maps, axis=-1) + + if self.task_params.type[:14] == 'to_nearest_obj': + rel_goal_locs = np.zeros((self.task_params.batch_size, 1, + len(self.task_params.semantic_task.class_map_names)), + dtype=np.float32) + goal_locs = np.zeros((self.task_params.batch_size, 1, 2), + dtype=np.float32) + for i in range(self.task_params.batch_size): + t = target_class[i] + rel_goal_locs[i,0,t] = 1. 
+ goal_locs[i,0,0] = t + goal_locs[i,0,1] = np.NaN + + return vars(utils.Foo(orig_maps=maps, goal_loc=goal_locs, + rel_goal_loc_at_start=rel_goal_locs)) + + def pre_common_data(self, inputs): + return inputs + + + def get_features(self, current_node_ids, step_number): + task_params = self.task_params + goal_number = step_number / self.task_params.num_steps + end_nodes = self.task.nodes[self.episode.goal_node_ids[goal_number],:]*1 + current_nodes = self.task.nodes[current_node_ids,:]*1 + end_perturbs = self.episode.goal_perturbs[:,goal_number,:][:,np.newaxis,:] + perturbs = self.episode.perturbs + target_class = self.episode.target_class + + # Append to history. + self.episode.history[:,step_number] = np.array(current_node_ids) + + # Render out the images from current node. + outs = {} + + if self.task_params.outputs.images: + imgs_all = [] + imgs = self.render_nodes([tuple(x) for x in current_nodes], + perturb=perturbs[:,step_number,:]) + imgs_all.append(imgs) + aux_delta_thetas = self.task_params.aux_delta_thetas + for i in range(len(aux_delta_thetas)): + imgs = self.render_nodes([tuple(x) for x in current_nodes], + perturb=perturbs[:,step_number,:], + aux_delta_theta=aux_delta_thetas[i]) + imgs_all.append(imgs) + imgs_all = np.array(imgs_all) # A x B x H x W x C + imgs_all = np.transpose(imgs_all, axes=[1,0,2,3,4]) + imgs_all = np.expand_dims(imgs_all, axis=1) # B x N x A x H x W x C + if task_params.num_history_frames > 0: + if step_number == 0: + # Append the same frame 4 times + for i in range(task_params.num_history_frames+1): + self.episode.history_frames.insert(0, imgs_all*1.) + self.episode.history_frames.insert(0, imgs_all) + self.episode.history_frames.pop() + imgs_all_with_history = np.concatenate(self.episode.history_frames, axis=2) + else: + imgs_all_with_history = imgs_all + outs['imgs'] = imgs_all_with_history # B x N x A x H x W x C + + if self.task_params.outputs.node_ids: + outs['node_ids'] = np.array(current_node_ids).reshape((-1,1,1)) + outs['perturbs'] = np.expand_dims(perturbs[:,step_number, :]*1., axis=1) + + if self.task_params.outputs.analytical_counts: + assert(self.task_params.modalities == ['depth']) + d = image_pre(outs['imgs']*1., self.task_params.modalities) + cm = get_camera_matrix(self.task_params.img_width, + self.task_params.img_height, + self.task_params.img_fov) + XYZ = get_point_cloud_from_z(100./d[...,0], cm) + XYZ = make_geocentric(XYZ*100., self.robot.sensor_height, + self.robot.camera_elevation_degree) + for i in range(len(self.task_params.analytical_counts.map_sizes)): + non_linearity = self.task_params.analytical_counts.non_linearity[i] + count, isvalid = bin_points(XYZ*1., + map_size=self.task_params.analytical_counts.map_sizes[i], + xy_resolution=self.task_params.analytical_counts.xy_resolution[i], + z_bins=self.task_params.analytical_counts.z_bins[i]) + assert(count.shape[2] == 1), 'only works for n_views equal to 1.' + count = count[:,:,0,:,:,:] + isvalid = isvalid[:,:,0,:,:,:] + if non_linearity == 'none': + None + elif non_linearity == 'min10': + count = np.minimum(count, 10.) + elif non_linearity == 'sqrt': + count = np.sqrt(count) + else: + logging.fatal('Undefined non_linearity.') + outs['analytical_counts_{:d}'.format(i)] = count + + # Compute the goal location in the cordinate frame of the robot. 
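+    # For goal-driven tasks this is encoded as (r*cos(t), r*sin(t),
+    # cos(goal_theta-theta), sin(goal_theta-theta)); for the to_nearest_obj
+    # tasks it is a one-hot vector over the target classes.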
+ if self.task_params.outputs.rel_goal_loc: + if self.task_params.type[:14] != 'to_nearest_obj': + loc, _, _, theta = self.get_loc_axis(current_nodes, + delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number,:]) + goal_loc, _, _, goal_theta = self.get_loc_axis(end_nodes, + delta_theta=self.task.delta_theta, + perturb=end_perturbs[:,0,:]) + r_goal, t_goal = _get_relative_goal_loc(goal_loc, loc, theta) + + rel_goal_loc = np.concatenate((r_goal*np.cos(t_goal), r_goal*np.sin(t_goal), + np.cos(goal_theta-theta), + np.sin(goal_theta-theta)), axis=1) + outs['rel_goal_loc'] = np.expand_dims(rel_goal_loc, axis=1) + elif self.task_params.type[:14] == 'to_nearest_obj': + rel_goal_loc = np.zeros((self.task_params.batch_size, 1, + len(self.task_params.semantic_task.class_map_names)), + dtype=np.float32) + for i in range(self.task_params.batch_size): + t = target_class[i] + rel_goal_loc[i,0,t] = 1. + outs['rel_goal_loc'] = rel_goal_loc + + # Location on map to plot the trajectory during validation. + if self.task_params.outputs.loc_on_map: + loc, x_axis, y_axis, theta = self.get_loc_axis(current_nodes, + delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number,:]) + outs['loc_on_map'] = np.expand_dims(loc, axis=1) + + # Compute gt_dist to goal + if self.task_params.outputs.gt_dist_to_goal: + gt_dist_to_goal = np.zeros((len(current_node_ids), 1), dtype=np.float32) + for i, n in enumerate(current_node_ids): + gt_dist_to_goal[i,0] = self.episode.dist_to_goal[goal_number][i][n] + outs['gt_dist_to_goal'] = np.expand_dims(gt_dist_to_goal, axis=1) + + # Free space in front of you, map and goal as images. + if self.task_params.outputs.ego_maps: + loc, x_axis, y_axis, theta = self.get_loc_axis(current_nodes, + delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number,:]) + maps = generate_egocentric_maps(self.task.scaled_maps, + self.task_params.map_scales, + self.task_params.map_crop_sizes, loc, + x_axis, y_axis, theta) + + for i in range(len(self.task_params.map_scales)): + outs['ego_maps_{:d}'.format(i)] = \ + np.expand_dims(np.expand_dims(maps[i], axis=1), axis=-1) + + if self.task_params.outputs.readout_maps: + loc, x_axis, y_axis, theta = self.get_loc_axis(current_nodes, + delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number,:]) + maps = generate_egocentric_maps(self.task.readout_maps_scaled, + self.task_params.readout_maps_scales, + self.task_params.readout_maps_crop_sizes, + loc, x_axis, y_axis, theta) + for i in range(len(self.task_params.readout_maps_scales)): + outs['readout_maps_{:d}'.format(i)] = \ + np.expand_dims(np.expand_dims(maps[i], axis=1), axis=-1) + + # Images for the goal. 
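+    # The goal is also rendered as egocentric goal images, one per map scale:
+    # from the relative goal location for goal-driven tasks, or as a one-hot
+    # channel for the to_nearest_obj tasks.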
+ if self.task_params.outputs.ego_goal_imgs: + if self.task_params.type[:14] != 'to_nearest_obj': + loc, x_axis, y_axis, theta = self.get_loc_axis(current_nodes, + delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number,:]) + goal_loc, _, _, _ = self.get_loc_axis(end_nodes, + delta_theta=self.task.delta_theta, + perturb=end_perturbs[:,0,:]) + rel_goal_orientation = np.mod( + np.int32(current_nodes[:,2:] - end_nodes[:,2:]), self.task_params.n_ori) + goal_dist, goal_theta = _get_relative_goal_loc(goal_loc, loc, theta) + goals = generate_goal_images(self.task_params.map_scales, + self.task_params.map_crop_sizes, + self.task_params.n_ori, goal_dist, + goal_theta, rel_goal_orientation) + for i in range(len(self.task_params.map_scales)): + outs['ego_goal_imgs_{:d}'.format(i)] = np.expand_dims(goals[i], axis=1) + + elif self.task_params.type[:14] == 'to_nearest_obj': + for i in range(len(self.task_params.map_scales)): + num_classes = len(self.task_params.semantic_task.class_map_names) + outs['ego_goal_imgs_{:d}'.format(i)] = np.zeros((self.task_params.batch_size, 1, + self.task_params.map_crop_sizes[i], + self.task_params.map_crop_sizes[i], + self.task_params.goal_channels)) + for i in range(self.task_params.batch_size): + t = target_class[i] + for j in range(len(self.task_params.map_scales)): + outs['ego_goal_imgs_{:d}'.format(j)][i,:,:,:,t] = 1. + + # Incremental locs and theta (for map warping), always in the original scale + # of the map, the subequent steps in the tf code scale appropriately. + # Scaling is done by just multiplying incremental_locs appropriately. + if self.task_params.outputs.egomotion: + if step_number == 0: + # Zero Ego Motion + incremental_locs = np.zeros((self.task_params.batch_size, 1, 2), dtype=np.float32) + incremental_thetas = np.zeros((self.task_params.batch_size, 1, 1), dtype=np.float32) + else: + previous_nodes = self.task.nodes[self.episode.history[:,step_number-1], :]*1 + loc, _, _, theta = self.get_loc_axis(current_nodes, + delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number,:]) + previous_loc, _, _, previous_theta = self.get_loc_axis( + previous_nodes, delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number-1,:]) + + incremental_locs_ = np.reshape(loc-previous_loc, [self.task_params.batch_size, 1, -1]) + + t = -np.pi/2+np.reshape(theta*1, [self.task_params.batch_size, 1, -1]) + incremental_locs = incremental_locs_*1 + incremental_locs[:,:,0] = np.sum(incremental_locs_ * + np.concatenate((np.cos(t), np.sin(t)), + axis=-1), axis=-1) + incremental_locs[:,:,1] = np.sum(incremental_locs_ * + np.concatenate((np.cos(t+np.pi/2), + np.sin(t+np.pi/2)), + axis=-1), axis=-1) + incremental_thetas = np.reshape(theta-previous_theta, + [self.task_params.batch_size, 1, -1]) + outs['incremental_locs'] = incremental_locs + outs['incremental_thetas'] = incremental_thetas + + if self.task_params.outputs.visit_count: + # Output the visit count for this state, how many times has the current + # state been visited, and how far in the history was the last visit + # (except this one) + visit_count = np.zeros((self.task_params.batch_size, 1), dtype=np.int32) + last_visit = -np.ones((self.task_params.batch_size, 1), dtype=np.int32) + if step_number >= 1: + h = self.episode.history[:,:(step_number)] + visit_count[:,0] = np.sum(h == np.array(current_node_ids).reshape([-1,1]), + axis=1) + last_visit[:,0] = np.argmax(h[:,::-1] == np.array(current_node_ids).reshape([-1,1]), + axis=1) + 1 + last_visit[visit_count == 0] = -1 # -1 if not visited. 
+ outs['visit_count'] = np.expand_dims(visit_count, axis=1) + outs['last_visit'] = np.expand_dims(last_visit, axis=1) + return outs + + def get_features_name(self): + f = [] + if self.task_params.outputs.images: + f.append('imgs') + if self.task_params.outputs.rel_goal_loc: + f.append('rel_goal_loc') + if self.task_params.outputs.loc_on_map: + f.append('loc_on_map') + if self.task_params.outputs.gt_dist_to_goal: + f.append('gt_dist_to_goal') + if self.task_params.outputs.ego_maps: + for i in range(len(self.task_params.map_scales)): + f.append('ego_maps_{:d}'.format(i)) + if self.task_params.outputs.readout_maps: + for i in range(len(self.task_params.readout_maps_scales)): + f.append('readout_maps_{:d}'.format(i)) + if self.task_params.outputs.ego_goal_imgs: + for i in range(len(self.task_params.map_scales)): + f.append('ego_goal_imgs_{:d}'.format(i)) + if self.task_params.outputs.egomotion: + f.append('incremental_locs') + f.append('incremental_thetas') + if self.task_params.outputs.visit_count: + f.append('visit_count') + f.append('last_visit') + if self.task_params.outputs.analytical_counts: + for i in range(len(self.task_params.analytical_counts.map_sizes)): + f.append('analytical_counts_{:d}'.format(i)) + if self.task_params.outputs.node_ids: + f.append('node_ids') + f.append('perturbs') + return f + + def pre_features(self, inputs): + if self.task_params.outputs.images: + inputs['imgs'] = image_pre(inputs['imgs'], self.task_params.modalities) + return inputs + +class BuildingMultiplexer(): + def __init__(self, args, task_number): + params = vars(args) + for k in params.keys(): + setattr(self, k, params[k]) + self.task_number = task_number + self._pick_data(task_number) + logging.info('Env Class: %s.', self.env_class) + if self.task_params.task == 'planning': + self._setup_planner() + elif self.task_params.task == 'mapping': + self._setup_mapper() + elif self.task_params.task == 'map+plan': + self._setup_mapper() + else: + logging.error('Undefined task: %s'.format(self.task_params.task)) + + def _pick_data(self, task_number): + logging.error('Input Building Names: %s', self.building_names) + self.flip = [np.mod(task_number / len(self.building_names), 2) == 1] + id = np.mod(task_number, len(self.building_names)) + self.building_names = [self.building_names[id]] + self.task_params.building_seed = task_number + logging.error('BuildingMultiplexer: Picked Building Name: %s', self.building_names) + self.building_names = self.building_names[0].split('+') + self.flip = [self.flip[0] for _ in self.building_names] + logging.error('BuildingMultiplexer: Picked Building Name: %s', self.building_names) + logging.error('BuildingMultiplexer: Flipping Buildings: %s', self.flip) + logging.error('BuildingMultiplexer: Set building_seed: %d', self.task_params.building_seed) + self.num_buildings = len(self.building_names) + logging.error('BuildingMultiplexer: Num buildings: %d', self.num_buildings) + + def _setup_planner(self): + # Load building env class. + self.buildings = [] + for i, building_name in enumerate(self.building_names): + b = self.env_class(robot=self.robot, env=self.env, + task_params=self.task_params, + building_name=building_name, flip=self.flip[i], + logdir=self.logdir, building_loader=self.dataset) + self.buildings.append(b) + + def _setup_mapper(self): + # Set up the renderer. 
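+    # A single SwiftshaderRenderer is shared by all buildings; each building's
+    # meshes are loaded into the scene once and toggled visible only while that
+    # building is being rendered.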
+ cp = self.camera_param + rgb_shader, d_shader = sru.get_shaders(cp.modalities) + r_obj = SwiftshaderRenderer() + r_obj.init_display(width=cp.width, height=cp.height, fov=cp.fov, + z_near=cp.z_near, z_far=cp.z_far, rgb_shader=rgb_shader, + d_shader=d_shader) + self.r_obj = r_obj + r_obj.clear_scene() + + # Load building env class. + self.buildings = [] + wt = [] + for i, building_name in enumerate(self.building_names): + b = self.env_class(robot=self.robot, env=self.env, + task_params=self.task_params, + building_name=building_name, flip=self.flip[i], + logdir=self.logdir, building_loader=self.dataset, + r_obj=r_obj) + wt.append(b.get_weight()) + b.load_building_into_scene() + b.set_building_visibility(False) + self.buildings.append(b) + wt = np.array(wt).astype(np.float32) + wt = wt / np.sum(wt+0.0001) + self.building_sampling_weights = wt + + def sample_building(self, rng): + if self.num_buildings == 1: + building_id = rng.choice(range(len(self.building_names))) + else: + building_id = rng.choice(self.num_buildings, + p=self.building_sampling_weights) + b = self.buildings[building_id] + instances = b._gen_rng(rng) + self._building_id = building_id + return self.buildings[building_id], instances + + def sample_env(self, rngs): + rng = rngs[0]; + if self.num_buildings == 1: + building_id = rng.choice(range(len(self.building_names))) + else: + building_id = rng.choice(self.num_buildings, + p=self.building_sampling_weights) + return self.buildings[building_id] + + def pre(self, inputs): + return self.buildings[self._building_id].pre(inputs) + + def __del__(self): + self.r_obj.clear_scene() + logging.error('Clearing scene.') diff --git a/cognitive_mapping_and_planning/datasets/nav_env_config.py b/cognitive_mapping_and_planning/datasets/nav_env_config.py new file mode 100644 index 0000000000000000000000000000000000000000..3d71c5767c4dc0ed9f05cce5c1790f11ede3778a --- /dev/null +++ b/cognitive_mapping_and_planning/datasets/nav_env_config.py @@ -0,0 +1,127 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Configs for stanford navigation environment. + +Base config for stanford navigation enviornment. +""" +import numpy as np +import src.utils as utils +import datasets.nav_env as nav_env + +def nav_env_base_config(): + """Returns the base config for stanford navigation environment. + + Returns: + Base config for stanford navigation environment. 
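+
+  Example (a minimal sketch of overriding the defaults returned below):
+    navtask_args = nav_env_base_config()
+    navtask_args.task_params.batch_size = 8
+    navtask_args.task_params.modalities = ['rgb']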
+ """ + robot = utils.Foo(radius=15, + base=10, + height=140, + sensor_height=120, + camera_elevation_degree=-15) + + env = utils.Foo(padding=10, + resolution=5, + num_point_threshold=2, + valid_min=-10, + valid_max=200, + n_samples_per_face=200) + + camera_param = utils.Foo(width=225, + height=225, + z_near=0.05, + z_far=20.0, + fov=60., + modalities=['rgb'], + img_channels=3) + + data_augment = utils.Foo(lr_flip=0, + delta_angle=0.5, + delta_xy=4, + relight=True, + relight_fast=False, + structured=False) # if True, uses the same perturb for the whole episode. + + outputs = utils.Foo(images=True, + rel_goal_loc=False, + loc_on_map=True, + gt_dist_to_goal=True, + ego_maps=False, + ego_goal_imgs=False, + egomotion=False, + visit_count=False, + analytical_counts=False, + node_ids=True, + readout_maps=False) + + # class_map_names=['board', 'chair', 'door', 'sofa', 'table'] + class_map_names = ['chair', 'door', 'table'] + semantic_task = utils.Foo(class_map_names=class_map_names, pix_distance=16, + sampling='uniform') + + # time per iteration for cmp is 0.82 seconds per episode with 3.4s overhead per batch. + task_params = utils.Foo(max_dist=32, + step_size=8, + num_steps=40, + num_actions=4, + batch_size=4, + building_seed=0, + num_goals=1, + img_height=None, + img_width=None, + img_channels=None, + modalities=None, + outputs=outputs, + map_scales=[1.], + map_crop_sizes=[64], + rel_goal_loc_dim=4, + base_class='Building', + task='map+plan', + n_ori=4, + type='room_to_room_many', + data_augment=data_augment, + room_regex='^((?!hallway).)*$', + toy_problem=False, + map_channels=1, + gt_coverage=False, + input_type='maps', + full_information=False, + aux_delta_thetas=[], + semantic_task=semantic_task, + num_history_frames=0, + node_ids_dim=1, + perturbs_dim=4, + map_resize_method='linear_noantialiasing', + readout_maps_channels=1, + readout_maps_scales=[], + readout_maps_crop_sizes=[], + n_views=1, + reward_time_penalty=0.1, + reward_at_goal=1., + discount_factor=0.99, + rejection_sampling_M=100, + min_dist=None) + + navtask_args = utils.Foo( + building_names=['area1_gates_wingA_floor1_westpart'], + env_class=nav_env.VisualNavigationEnv, + robot=robot, + task_params=task_params, + env=env, + camera_param=camera_param, + cache_rooms=True) + return navtask_args + diff --git a/cognitive_mapping_and_planning/matplotlibrc b/cognitive_mapping_and_planning/matplotlibrc new file mode 100644 index 0000000000000000000000000000000000000000..ed5097572ae68680d0c9afdf510968e1c3d175d4 --- /dev/null +++ b/cognitive_mapping_and_planning/matplotlibrc @@ -0,0 +1 @@ +backend : agg diff --git a/cognitive_mapping_and_planning/output/.gitignore b/cognitive_mapping_and_planning/output/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..a767cafbbd864d0baf76530294598e4c2be60a24 --- /dev/null +++ b/cognitive_mapping_and_planning/output/.gitignore @@ -0,0 +1 @@ +* diff --git a/cognitive_mapping_and_planning/output/README.md b/cognitive_mapping_and_planning/output/README.md new file mode 100644 index 0000000000000000000000000000000000000000..7518c3874390da7e2aa65a89ccdec035ca7610e8 --- /dev/null +++ b/cognitive_mapping_and_planning/output/README.md @@ -0,0 +1,16 @@ +### Pre-Trained Models + +We provide the following pre-trained models: + +Config Name | Checkpoint | Mean Dist. | 50%ile Dist. | 75%ile Dist. 
| Success %age | +:-: | :-: | :-: | :-: | :-: | :-: | +cmp.lmap_Msc.clip5.sbpd_d_r2r | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_d_r2r.tar) | 4.79 | 0 | 1 | 78.9 | +cmp.lmap_Msc.clip5.sbpd_rgb_r2r | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_rgb_r2r.tar) | 7.74 | 0 | 14 | 62.4 | +cmp.lmap_Msc.clip5.sbpd_d_ST | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_d_ST.tar) | 10.67 | 9 | 19 | 39.7 | +cmp.lmap_Msc.clip5.sbpd_rgb_ST | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_rgb_ST.tar) | 11.27 | 10 | 19 | 35.6 | +cmp.lmap_Msc.clip5.sbpd_d_r2r_h0_64_80 | [ckpt](http:////download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_d_r2r_h0_64_80.tar) | 11.6 | 0 | 19 | 66.9 | +bl.v2.noclip.sbpd_d_r2r | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_d_r2r.tar) | 5.90 | 0 | 6 | 71.2 | +bl.v2.noclip.sbpd_rgb_r2r | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_rgb_r2r.tar) | 10.21 | 1 | 21 | 53.4 | +bl.v2.noclip.sbpd_d_ST | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_d_ST.tar) | 13.29 | 14 | 23 | 28.0 | +bl.v2.noclip.sbpd_rgb_ST | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_rgb_ST.tar) | 13.37 | 13 | 20 | 24.2 | +bl.v2.noclip.sbpd_d_r2r_h0_64_80 | [ckpt](http:////download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_d_r2r_h0_64_80.tar) | 15.30 | 0 | 29 | 57.9 | diff --git a/cognitive_mapping_and_planning/patches/GLES2_2_0.py.patch b/cognitive_mapping_and_planning/patches/GLES2_2_0.py.patch new file mode 100644 index 0000000000000000000000000000000000000000..de1be442d5b9fff44862d37b9329e32face2b663 --- /dev/null +++ b/cognitive_mapping_and_planning/patches/GLES2_2_0.py.patch @@ -0,0 +1,14 @@ +10c10 +< from OpenGL import platform, constant, arrays +--- +> from OpenGL import platform, constant, arrays, contextdata +249a250 +> from OpenGL._bytes import _NULL_8_BYTE +399c400 +< array = ArrayDatatype.asArray( pointer, type ) +--- +> array = arrays.ArrayDatatype.asArray( pointer, type ) +405c406 +< ArrayDatatype.voidDataPointer( array ) +--- +> arrays.ArrayDatatype.voidDataPointer( array ) diff --git a/cognitive_mapping_and_planning/patches/apply_patches.sh b/cognitive_mapping_and_planning/patches/apply_patches.sh new file mode 100644 index 0000000000000000000000000000000000000000..4a786058258decdfb381eff25684183d92788ebe --- /dev/null +++ b/cognitive_mapping_and_planning/patches/apply_patches.sh @@ -0,0 +1,18 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +echo $VIRTUAL_ENV +patch $VIRTUAL_ENV/local/lib/python2.7/site-packages/OpenGL/GLES2/VERSION/GLES2_2_0.py patches/GLES2_2_0.py.patch +patch $VIRTUAL_ENV/local/lib/python2.7/site-packages/OpenGL/platform/ctypesloader.py patches/ctypesloader.py.patch diff --git a/cognitive_mapping_and_planning/patches/ctypesloader.py.patch b/cognitive_mapping_and_planning/patches/ctypesloader.py.patch new file mode 100644 index 0000000000000000000000000000000000000000..27dd43b18010dc5fdcd605b9a5d470abaa19151f --- /dev/null +++ b/cognitive_mapping_and_planning/patches/ctypesloader.py.patch @@ -0,0 +1,15 @@ +45c45,46 +< return dllType( name, mode ) +--- +> print './' + name +> return dllType( './' + name, mode ) +47,48c48,53 +< err.args += (name,fullName) +< raise +--- +> try: +> print name +> return dllType( name, mode ) +> except: +> err.args += (name,fullName) +> raise diff --git a/cognitive_mapping_and_planning/render/__init__.py b/cognitive_mapping_and_planning/render/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/render/depth_rgb_encoded.fp b/cognitive_mapping_and_planning/render/depth_rgb_encoded.fp new file mode 100644 index 0000000000000000000000000000000000000000..23e93d27f585e93896799f177888e9c50fa03eed --- /dev/null +++ b/cognitive_mapping_and_planning/render/depth_rgb_encoded.fp @@ -0,0 +1,30 @@ +// This shader computes per-pixel depth (-z coordinate in the camera space, or +// orthogonal distance to the camera plane). The result is multiplied by the +// `kFixedPointFraction` constant and is encoded to RGB channels as an integer +// (R being the least significant byte). + +#ifdef GL_ES +#ifdef GL_FRAGMENT_PRECISION_HIGH +precision highp float; +#else +precision mediump float; +#endif +#endif + +const float kFixedPointFraction = 1000.0; + +varying float vDepth; + +void main(void) { + float d = vDepth; + + // Encode the depth to RGB. + d *= (kFixedPointFraction / 255.0); + gl_FragColor.r = mod(d, 1.0); + d = (d - gl_FragColor.r) / 255.0; + gl_FragColor.g = mod(d, 1.0); + d = (d - gl_FragColor.g) / 255.0; + gl_FragColor.b = mod(d, 1.0); + + gl_FragColor.a = 1.0; +} diff --git a/cognitive_mapping_and_planning/render/depth_rgb_encoded.vp b/cognitive_mapping_and_planning/render/depth_rgb_encoded.vp new file mode 100644 index 0000000000000000000000000000000000000000..2db74f14aa7f253b8f544ec1ab519129f13426a0 --- /dev/null +++ b/cognitive_mapping_and_planning/render/depth_rgb_encoded.vp @@ -0,0 +1,15 @@ +uniform mat4 uViewMatrix; +uniform mat4 uProjectionMatrix; + +attribute vec3 aPosition; + +varying float vDepth; + +void main(void) { + vec4 worldPosition = vec4(aPosition, 1.0); + vec4 viewPosition = uViewMatrix * worldPosition; + gl_Position = uProjectionMatrix * viewPosition; + + // Orthogonal depth is simply -z in the camera space. 
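+  // (In view space the camera looks down the negative z axis, so negating z
+  // gives the positive distance in front of the camera that the fragment
+  // shader then packs into the RGB channels.)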
+ vDepth = -viewPosition.z; +} diff --git a/cognitive_mapping_and_planning/render/rgb_flat_color.fp b/cognitive_mapping_and_planning/render/rgb_flat_color.fp new file mode 100644 index 0000000000000000000000000000000000000000..c8c24d76103793d9cfa9166517177cb332d1a92c --- /dev/null +++ b/cognitive_mapping_and_planning/render/rgb_flat_color.fp @@ -0,0 +1,11 @@ +precision highp float; +varying vec4 vColor; +varying vec2 vTextureCoord; + +uniform sampler2D uTexture; + +void main(void) { + vec4 color = vColor; + color = texture2D(uTexture, vTextureCoord); + gl_FragColor = color; +} diff --git a/cognitive_mapping_and_planning/render/rgb_flat_color.vp b/cognitive_mapping_and_planning/render/rgb_flat_color.vp new file mode 100644 index 0000000000000000000000000000000000000000..ebc79173405f7449921fd40f778fe3695aab5ea8 --- /dev/null +++ b/cognitive_mapping_and_planning/render/rgb_flat_color.vp @@ -0,0 +1,18 @@ +uniform mat4 uViewMatrix; +uniform mat4 uProjectionMatrix; +uniform vec4 uColor; + +attribute vec4 aColor; +attribute vec3 aPosition; +attribute vec2 aTextureCoord; + +varying vec4 vColor; +varying vec2 vTextureCoord; + +void main(void) { + vec4 worldPosition = vec4(aPosition, 1.0); + gl_Position = uProjectionMatrix * (uViewMatrix * worldPosition); + + vColor = aColor * uColor; + vTextureCoord = aTextureCoord; +} diff --git a/cognitive_mapping_and_planning/render/swiftshader_renderer.py b/cognitive_mapping_and_planning/render/swiftshader_renderer.py new file mode 100644 index 0000000000000000000000000000000000000000..74b1be72c11a2877231a66886d02babfd4793ce8 --- /dev/null +++ b/cognitive_mapping_and_planning/render/swiftshader_renderer.py @@ -0,0 +1,427 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r"""Implements loading and rendering of meshes. Contains 2 classes: + Shape: Class that exposes high level functions for loading and manipulating + shapes. This currently is bound to assimp + (https://github.com/assimp/assimp). If you want to interface to a different + library, reimplement this class with bindings to your mesh loading library. + + SwiftshaderRenderer: Class that renders Shapes. Currently this uses python + bindings to OpenGL (EGL), bindings to an alternate renderer may be implemented + here. 
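+
+  A rough sketch of the typical call sequence (argument values here are
+  illustrative; see the individual methods below for details):
+
+    rgb_shader, d_shader = get_shaders(['rgb'])
+    r_obj = SwiftshaderRenderer()
+    r_obj.init_display(width=225, height=225, fov=60., z_near=0.05, z_far=20.0,
+                       rgb_shader=rgb_shader, d_shader=d_shader)
+    entity_ids = r_obj.load_shapes([Shape(obj_file)])
+    r_obj.set_entity_visible(entity_ids, True)
+    r_obj.position_camera(camera_xyz, lookat_xyz, up)
+    rgb_img, d_img = r_obj.render(take_screenshot=True)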
+""" + +import numpy as np, os +import cv2, ctypes, logging, os, numpy as np +import pyassimp as assimp +from OpenGL.GLES2 import * +from OpenGL.EGL import * +import src.rotation_utils as ru + +__version__ = 'swiftshader_renderer' + +def get_shaders(modalities): + rgb_shader = 'rgb_flat_color' if 'rgb' in modalities else None + d_shader = 'depth_rgb_encoded' if 'depth' in modalities else None + return rgb_shader, d_shader + +def sample_points_on_faces(vs, fs, rng, n_samples_per_face): + idx = np.repeat(np.arange(fs.shape[0]), n_samples_per_face) + + r = rng.rand(idx.size, 2) + r1 = r[:,:1]; r2 = r[:,1:]; sqrt_r1 = np.sqrt(r1); + + v1 = vs[fs[idx, 0], :]; v2 = vs[fs[idx, 1], :]; v3 = vs[fs[idx, 2], :]; + pts = (1-sqrt_r1)*v1 + sqrt_r1*(1-r2)*v2 + sqrt_r1*r2*v3 + + v1 = vs[fs[:,0], :]; v2 = vs[fs[:, 1], :]; v3 = vs[fs[:, 2], :]; + ar = 0.5*np.sqrt(np.sum(np.cross(v1-v3, v2-v3)**2, 1)) + + return pts, ar, idx + +class Shape(): + def get_pyassimp_load_options(self): + load_flags = assimp.postprocess.aiProcess_Triangulate; + load_flags = load_flags | assimp.postprocess.aiProcess_SortByPType; + load_flags = load_flags | assimp.postprocess.aiProcess_OptimizeMeshes; + load_flags = load_flags | assimp.postprocess.aiProcess_RemoveRedundantMaterials; + load_flags = load_flags | assimp.postprocess.aiProcess_FindDegenerates; + load_flags = load_flags | assimp.postprocess.aiProcess_GenSmoothNormals; + load_flags = load_flags | assimp.postprocess.aiProcess_JoinIdenticalVertices; + load_flags = load_flags | assimp.postprocess.aiProcess_ImproveCacheLocality; + load_flags = load_flags | assimp.postprocess.aiProcess_GenUVCoords; + load_flags = load_flags | assimp.postprocess.aiProcess_FindInvalidData; + return load_flags + + def __init__(self, obj_file, material_file=None, load_materials=True, + name_prefix='', name_suffix=''): + if material_file is not None: + logging.error('Ignoring material file input, reading them off obj file.') + load_flags = self.get_pyassimp_load_options() + scene = assimp.load(obj_file, processing=load_flags) + filter_ind = self._filter_triangles(scene.meshes) + self.meshes = [scene.meshes[i] for i in filter_ind] + for m in self.meshes: + m.name = name_prefix + m.name + name_suffix + + dir_name = os.path.dirname(obj_file) + # Load materials + materials = None + if load_materials: + materials = [] + for m in self.meshes: + file_name = os.path.join(dir_name, m.material.properties[('file', 1)]) + assert(os.path.exists(file_name)), \ + 'Texture file {:s} foes not exist.'.format(file_name) + img_rgb = cv2.imread(file_name)[::-1,:,::-1] + if img_rgb.shape[0] != img_rgb.shape[1]: + logging.warn('Texture image not square.') + sz = np.maximum(img_rgb.shape[0], img_rgb.shape[1]) + sz = int(np.power(2., np.ceil(np.log2(sz)))) + img_rgb = cv2.resize(img_rgb, (sz,sz), interpolation=cv2.INTER_LINEAR) + else: + sz = img_rgb.shape[0] + sz_ = int(np.power(2., np.ceil(np.log2(sz)))) + if sz != sz_: + logging.warn('Texture image not square of power of 2 size. 
' + + 'Changing size from %d to %d.', sz, sz_) + sz = sz_ + img_rgb = cv2.resize(img_rgb, (sz,sz), interpolation=cv2.INTER_LINEAR) + materials.append(img_rgb) + self.scene = scene + self.materials = materials + + def _filter_triangles(self, meshes): + select = [] + for i in range(len(meshes)): + if meshes[i].primitivetypes == 4: + select.append(i) + return select + + def flip_shape(self): + for m in self.meshes: + m.vertices[:,1] = -m.vertices[:,1] + bb = m.faces*1 + bb[:,1] = m.faces[:,2] + bb[:,2] = m.faces[:,1] + m.faces = bb + # m.vertices[:,[0,1]] = m.vertices[:,[1,0]] + + def get_vertices(self): + vs = [] + for m in self.meshes: + vs.append(m.vertices) + vss = np.concatenate(vs, axis=0) + return vss, vs + + def get_faces(self): + vs = [] + for m in self.meshes: + v = m.faces + vs.append(v) + return vs + + def get_number_of_meshes(self): + return len(self.meshes) + + def scale(self, sx=1., sy=1., sz=1.): + pass + + def sample_points_on_face_of_shape(self, i, n_samples_per_face, sc): + v = self.meshes[i].vertices*sc + f = self.meshes[i].faces + p, face_areas, face_idx = sample_points_on_faces( + v, f, np.random.RandomState(0), n_samples_per_face) + return p, face_areas, face_idx + + def __del__(self): + scene = self.scene + assimp.release(scene) + +class SwiftshaderRenderer(): + def __init__(self): + self.entities = {} + + def init_display(self, width, height, fov, z_near, z_far, rgb_shader, + d_shader): + self.init_renderer_egl(width, height) + dir_path = os.path.dirname(os.path.realpath(__file__)) + if d_shader is not None and rgb_shader is not None: + logging.fatal('Does not support setting both rgb_shader and d_shader.') + + if d_shader is not None: + assert rgb_shader is None + shader = d_shader + self.modality = 'depth' + + if rgb_shader is not None: + assert d_shader is None + shader = rgb_shader + self.modality = 'rgb' + + self.create_shaders(os.path.join(dir_path, shader+'.vp'), + os.path.join(dir_path, shader + '.fp')) + aspect = width*1./(height*1.) 
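+    # fov is interpreted as the vertical field of view; set_camera below uses
+    # this aspect ratio to widen the near plane horizontally.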
+ self.set_camera(fov, z_near, z_far, aspect) + + def init_renderer_egl(self, width, height): + major,minor = ctypes.c_long(),ctypes.c_long() + logging.info('init_renderer_egl: EGL_DEFAULT_DISPLAY: %s', EGL_DEFAULT_DISPLAY) + + egl_display = eglGetDisplay(EGL_DEFAULT_DISPLAY) + logging.info('init_renderer_egl: egl_display: %s', egl_display) + + eglInitialize(egl_display, major, minor) + logging.info('init_renderer_egl: EGL_OPENGL_API, EGL_OPENGL_ES_API: %s, %s', + EGL_OPENGL_API, EGL_OPENGL_ES_API) + eglBindAPI(EGL_OPENGL_ES_API) + + num_configs = ctypes.c_long() + configs = (EGLConfig*1)() + local_attributes = [EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8, + EGL_DEPTH_SIZE, 16, EGL_SURFACE_TYPE, EGL_PBUFFER_BIT, + EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT, EGL_NONE,] + logging.error('init_renderer_egl: local attributes: %s', local_attributes) + local_attributes = arrays.GLintArray.asArray(local_attributes) + success = eglChooseConfig(egl_display, local_attributes, configs, 1, num_configs) + logging.error('init_renderer_egl: eglChooseConfig success, num_configs: %d, %d', success, num_configs.value) + egl_config = configs[0] + + + context_attributes = [EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE] + context_attributes = arrays.GLintArray.asArray(context_attributes) + egl_context = eglCreateContext(egl_display, egl_config, EGL_NO_CONTEXT, context_attributes) + + buffer_attributes = [EGL_WIDTH, width, EGL_HEIGHT, height, EGL_NONE] + buffer_attributes = arrays.GLintArray.asArray(buffer_attributes) + egl_surface = eglCreatePbufferSurface(egl_display, egl_config, buffer_attributes) + + + eglMakeCurrent(egl_display, egl_surface, egl_surface, egl_context) + logging.error("init_renderer_egl: egl_display: %s egl_surface: %s, egl_config: %s", egl_display, egl_surface, egl_context) + + glViewport(0, 0, width, height); + + self.egl_display = egl_display + self.egl_surface = egl_surface + self.egl_config = egl_config + self.egl_mapping = {} + self.render_timer = None + self.load_timer = None + self.height = height + self.width = width + + def create_shaders(self, v_shader_file, f_shader_file): + v_shader = glCreateShader(GL_VERTEX_SHADER) + with open(v_shader_file, 'r') as f: + ls = '' + for l in f: + ls = ls + l + glShaderSource(v_shader, ls) + glCompileShader(v_shader); + assert(glGetShaderiv(v_shader, GL_COMPILE_STATUS) == 1) + + f_shader = glCreateShader(GL_FRAGMENT_SHADER) + with open(f_shader_file, 'r') as f: + ls = '' + for l in f: + ls = ls + l + glShaderSource(f_shader, ls) + glCompileShader(f_shader); + assert(glGetShaderiv(f_shader, GL_COMPILE_STATUS) == 1) + + egl_program = glCreateProgram(); + assert(egl_program) + glAttachShader(egl_program, v_shader) + glAttachShader(egl_program, f_shader) + glLinkProgram(egl_program); + assert(glGetProgramiv(egl_program, GL_LINK_STATUS) == 1) + glUseProgram(egl_program) + + glBindAttribLocation(egl_program, 0, "aPosition") + glBindAttribLocation(egl_program, 1, "aColor") + glBindAttribLocation(egl_program, 2, "aTextureCoord") + + self.egl_program = egl_program + self.egl_mapping['vertexs'] = 0 + self.egl_mapping['vertexs_color'] = 1 + self.egl_mapping['vertexs_tc'] = 2 + + glClearColor(0.0, 0.0, 0.0, 1.0); + # glEnable(GL_CULL_FACE); glCullFace(GL_BACK); + glEnable(GL_DEPTH_TEST); + + glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT) + + def set_camera(self, fov_vertical, z_near, z_far, aspect): + width = 2*np.tan(np.deg2rad(fov_vertical)/2.0)*z_near*aspect; + height = 2*np.tan(np.deg2rad(fov_vertical)/2.0)*z_near; + egl_program = self.egl_program + c 
= np.eye(4, dtype=np.float32) + c[3,3] = 0 + c[3,2] = -1 + c[2,2] = -(z_near+z_far)/(z_far-z_near) + c[2,3] = -2.0*(z_near*z_far)/(z_far-z_near) + c[0,0] = 2.0*z_near/width + c[1,1] = 2.0*z_near/height + c = c.T + + projection_matrix_o = glGetUniformLocation(egl_program, 'uProjectionMatrix') + projection_matrix = np.eye(4, dtype=np.float32) + projection_matrix[...] = c + projection_matrix = np.reshape(projection_matrix, (-1)) + glUniformMatrix4fv(projection_matrix_o, 1, GL_FALSE, projection_matrix) + + + def load_default_object(self): + v = np.array([[0.0, 0.5, 0.0, 1.0, 1.0, 0.0, 1.0], + [-0.5, -0.5, 0.0, 1.0, 0.0, 1.0, 1.0], + [0.5, -0.5, 0.0, 1.0, 1.0, 1.0, 1.0]], dtype=np.float32) + v = np.concatenate((v,v+0.1), axis=0) + v = np.ascontiguousarray(v, dtype=np.float32) + + vbo = glGenBuffers(1) + glBindBuffer (GL_ARRAY_BUFFER, vbo) + glBufferData (GL_ARRAY_BUFFER, v.dtype.itemsize*v.size, v, GL_STATIC_DRAW) + glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 28, ctypes.c_void_p(0)) + glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 28, ctypes.c_void_p(12)) + glEnableVertexAttribArray(0); + glEnableVertexAttribArray(1); + + self.num_to_render = 6; + + def _actual_render(self): + for entity_id, entity in self.entities.iteritems(): + if entity['visible']: + vbo = entity['vbo'] + tbo = entity['tbo'] + num = entity['num'] + + glBindBuffer(GL_ARRAY_BUFFER, vbo) + glVertexAttribPointer(self.egl_mapping['vertexs'], 3, GL_FLOAT, GL_FALSE, + 20, ctypes.c_void_p(0)) + glVertexAttribPointer(self.egl_mapping['vertexs_tc'], 2, GL_FLOAT, + GL_FALSE, 20, ctypes.c_void_p(12)) + glEnableVertexAttribArray(self.egl_mapping['vertexs']); + glEnableVertexAttribArray(self.egl_mapping['vertexs_tc']); + + glBindTexture(GL_TEXTURE_2D, tbo) + glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR); + glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR); + glDrawArrays(GL_TRIANGLES, 0, num) + + def render(self, take_screenshot=False, output_type=0): + # self.render_timer.tic() + self._actual_render() + # self.render_timer.toc(log_at=1000, log_str='render timer', type='time') + + np_rgb_img = None + np_d_img = None + c = 1000. 
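+    # c is the fixed-point scale used by the depth fragment shader
+    # (kFixedPointFraction): depth was packed as an integer depth*c into the
+    # R (least significant), G and B bytes, so it is recovered below as
+    # (R + 255*G + 255^2*B) / c.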
+ if take_screenshot: + if self.modality == 'rgb': + screenshot_rgba = np.zeros((self.height, self.width, 4), dtype=np.uint8) + glReadPixels(0, 0, self.width, self.height, GL_RGBA, GL_UNSIGNED_BYTE, screenshot_rgba) + np_rgb_img = screenshot_rgba[::-1,:,:3]; + + if self.modality == 'depth': + screenshot_d = np.zeros((self.height, self.width, 4), dtype=np.uint8) + glReadPixels(0, 0, self.width, self.height, GL_RGBA, GL_UNSIGNED_BYTE, screenshot_d) + np_d_img = screenshot_d[::-1,:,:3]; + np_d_img = np_d_img[:,:,2]*(255.*255./c) + np_d_img[:,:,1]*(255./c) + np_d_img[:,:,0]*(1./c) + np_d_img = np_d_img.astype(np.float32) + np_d_img[np_d_img == 0] = np.NaN + np_d_img = np_d_img[:,:,np.newaxis] + + glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT) + return np_rgb_img, np_d_img + + def _load_mesh_into_gl(self, mesh, material): + vvt = np.concatenate((mesh.vertices, mesh.texturecoords[0,:,:2]), axis=1) + vvt = np.ascontiguousarray(vvt[mesh.faces.reshape((-1)),:], dtype=np.float32) + num = vvt.shape[0] + vvt = np.reshape(vvt, (-1)) + + vbo = glGenBuffers(1) + glBindBuffer(GL_ARRAY_BUFFER, vbo) + glBufferData(GL_ARRAY_BUFFER, vvt.dtype.itemsize*vvt.size, vvt, GL_STATIC_DRAW) + + tbo = glGenTextures(1) + glBindTexture(GL_TEXTURE_2D, tbo) + glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, material.shape[1], + material.shape[0], 0, GL_RGB, GL_UNSIGNED_BYTE, + np.reshape(material, (-1))) + return num, vbo, tbo + + def load_shapes(self, shapes): + entities = self.entities + entity_ids = [] + for i, shape in enumerate(shapes): + for j in range(len(shape.meshes)): + name = shape.meshes[j].name + assert name not in entities, '{:s} entity already exists.'.format(name) + num, vbo, tbo = self._load_mesh_into_gl(shape.meshes[j], shape.materials[j]) + entities[name] = {'num': num, 'vbo': vbo, 'tbo': tbo, 'visible': False} + entity_ids.append(name) + return entity_ids + + def set_entity_visible(self, entity_ids, visibility): + for entity_id in entity_ids: + self.entities[entity_id]['visible'] = visibility + + def position_camera(self, camera_xyz, lookat_xyz, up): + camera_xyz = np.array(camera_xyz) + lookat_xyz = np.array(lookat_xyz) + up = np.array(up) + lookat_to = lookat_xyz - camera_xyz + lookat_from = np.array([0, 1., 0.]) + up_from = np.array([0, 0., 1.]) + up_to = up * 1. 
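+    # lookat_from/up_from define the canonical camera frame (looking along +y
+    # with +z up); rotate_camera_to_point_at returns the rotation taking this
+    # frame to the requested viewing direction and up vector.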
+ # np.set_printoptions(precision=2, suppress=True) + # print up_from, lookat_from, up_to, lookat_to + r = ru.rotate_camera_to_point_at(up_from, lookat_from, up_to, lookat_to) + R = np.eye(4, dtype=np.float32) + R[:3,:3] = r + + t = np.eye(4, dtype=np.float32) + t[:3,3] = -camera_xyz + + view_matrix = np.dot(R.T, t) + flip_yz = np.eye(4, dtype=np.float32) + flip_yz[1,1] = 0; flip_yz[2,2] = 0; flip_yz[1,2] = 1; flip_yz[2,1] = -1; + view_matrix = np.dot(flip_yz, view_matrix) + view_matrix = view_matrix.T + # print np.concatenate((R, t, view_matrix), axis=1) + view_matrix = np.reshape(view_matrix, (-1)) + view_matrix_o = glGetUniformLocation(self.egl_program, 'uViewMatrix') + glUniformMatrix4fv(view_matrix_o, 1, GL_FALSE, view_matrix) + return None, None #camera_xyz, q + + def clear_scene(self): + keys = self.entities.keys() + for entity_id in keys: + entity = self.entities.pop(entity_id, None) + vbo = entity['vbo'] + tbo = entity['tbo'] + num = entity['num'] + glDeleteBuffers(1, [vbo]) + glDeleteTextures(1, [tbo]) + + def __del__(self): + self.clear_scene() + eglMakeCurrent(self.egl_display, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT) + eglDestroySurface(self.egl_display, self.egl_surface) + eglTerminate(self.egl_display) diff --git a/cognitive_mapping_and_planning/requirements.txt b/cognitive_mapping_and_planning/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..306c807a6c9fd9404afa1c05108e5e835e84edc6 --- /dev/null +++ b/cognitive_mapping_and_planning/requirements.txt @@ -0,0 +1,9 @@ +numpy +pillow +PyOpenGL +PyOpenGL-accelerate +six +networkx +scikit-image +scipy +opencv-python diff --git a/cognitive_mapping_and_planning/scripts/__init__.py b/cognitive_mapping_and_planning/scripts/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/scripts/script_distill.py b/cognitive_mapping_and_planning/scripts/script_distill.py new file mode 100644 index 0000000000000000000000000000000000000000..010c690412ed28011146ab44109dc099d02324e7 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_distill.py @@ -0,0 +1,177 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r""" Script to setup the grid moving agent. 
+ +blaze build --define=ION_GFX_OGLES20=1 -c opt --copt=-mavx --config=cuda_clang \ + learning/brain/public/tensorflow_std_server{,_gpu} \ + experimental/users/saurabhgupta/navigation/cmp/scripts/script_distill.par \ + experimental/users/saurabhgupta/navigation/cmp/scripts/script_distill + + +./blaze-bin/experimental/users/saurabhgupta/navigation/cmp/scripts/script_distill \ + --logdir=/cns/iq-d/home/saurabhgupta/output/stanford-distill/local/v0/ \ + --config_name 'v0+train' --gfs_user robot-intelligence-gpu + +""" +import sys, os, numpy as np +import copy +import argparse, pprint +import time +import cProfile + + +import tensorflow as tf +from tensorflow.contrib import slim +from tensorflow.python.framework import ops +from tensorflow.contrib.framework.python.ops import variables + +import logging +from tensorflow.python.platform import gfile +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +from cfgs import config_distill +from tfcode import tf_utils +import src.utils as utils +import src.file_utils as fu +import tfcode.distillation as distill +import datasets.nav_env as nav_env + +FLAGS = flags.FLAGS + +flags.DEFINE_string('master', 'local', + 'The name of the TensorFlow master to use.') +flags.DEFINE_integer('ps_tasks', 0, 'The number of parameter servers. If the ' + 'value is 0, then the parameters are handled locally by ' + 'the worker.') +flags.DEFINE_integer('task', 0, 'The Task ID. This value is used when training ' + 'with multiple workers to identify each worker.') + +flags.DEFINE_integer('num_workers', 1, '') + +flags.DEFINE_string('config_name', '', '') + +flags.DEFINE_string('logdir', '', '') + +def main(_): + args = config_distill.get_args_for_config(FLAGS.config_name) + args.logdir = FLAGS.logdir + args.solver.num_workers = FLAGS.num_workers + args.solver.task = FLAGS.task + args.solver.ps_tasks = FLAGS.ps_tasks + args.solver.master = FLAGS.master + + args.buildinger.env_class = nav_env.MeshMapper + fu.makedirs(args.logdir) + args.buildinger.logdir = args.logdir + R = nav_env.get_multiplexor_class(args.buildinger, args.solver.task) + + if False: + pr = cProfile.Profile() + pr.enable() + rng = np.random.RandomState(0) + for i in range(1): + b, instances_perturbs = R.sample_building(rng) + inputs = b.worker(*(instances_perturbs)) + for j in range(inputs['imgs'].shape[0]): + p = os.path.join('tmp', '{:d}.png'.format(j)) + img = inputs['imgs'][j,0,:,:,:3]*1 + img = (img).astype(np.uint8) + fu.write_image(p, img) + print(inputs['imgs'].shape) + inputs = R.pre(inputs) + pr.disable() + pr.print_stats(2) + + if args.control.train: + if not gfile.Exists(args.logdir): + gfile.MakeDirs(args.logdir) + + m = utils.Foo() + m.tf_graph = tf.Graph() + + config = tf.ConfigProto() + config.device_count['GPU'] = 1 + config.gpu_options.allow_growth = True + config.gpu_options.per_process_gpu_memory_fraction = 0.8 + + with m.tf_graph.as_default(): + with tf.device(tf.train.replica_device_setter(args.solver.ps_tasks)): + m = distill.setup_to_run(m, args, is_training=True, + batch_norm_is_training=True) + + train_step_kwargs = distill.setup_train_step_kwargs_mesh( + m, R, os.path.join(args.logdir, 'train'), + rng_seed=args.solver.task, is_chief=args.solver.task==0, iters=1, + train_display_interval=args.summary.display_interval) + + final_loss = slim.learning.train( + train_op=m.train_op, + logdir=args.logdir, + master=args.solver.master, + is_chief=args.solver.task == 0, + number_of_steps=args.solver.max_steps, + train_step_fn=tf_utils.train_step_custom, + 
train_step_kwargs=train_step_kwargs, + global_step=m.global_step_op, + init_op=m.init_op, + init_fn=m.init_fn, + sync_optimizer=m.sync_optimizer, + saver=m.saver_op, + summary_op=None, session_config=config) + + if args.control.test: + m = utils.Foo() + m.tf_graph = tf.Graph() + checkpoint_dir = os.path.join(format(args.logdir)) + with m.tf_graph.as_default(): + m = distill.setup_to_run(m, args, is_training=False, + batch_norm_is_training=args.control.force_batchnorm_is_training_at_test) + + train_step_kwargs = distill.setup_train_step_kwargs_mesh( + m, R, os.path.join(args.logdir, args.control.test_name), + rng_seed=args.solver.task+1, is_chief=args.solver.task==0, + iters=args.summary.test_iters, train_display_interval=None) + + sv = slim.learning.supervisor.Supervisor( + graph=ops.get_default_graph(), logdir=None, init_op=m.init_op, + summary_op=None, summary_writer=None, global_step=None, saver=m.saver_op) + + last_checkpoint = None + while True: + last_checkpoint = slim.evaluation.wait_for_new_checkpoint(checkpoint_dir, last_checkpoint) + checkpoint_iter = int(os.path.basename(last_checkpoint).split('-')[1]) + start = time.time() + logging.info('Starting evaluation at %s using checkpoint %s.', + time.strftime('%Y-%m-%d-%H:%M:%S', time.localtime()), + last_checkpoint) + + config = tf.ConfigProto() + config.device_count['GPU'] = 1 + config.gpu_options.allow_growth = True + config.gpu_options.per_process_gpu_memory_fraction = 0.8 + + with sv.managed_session(args.solver.master,config=config, + start_standard_services=False) as sess: + sess.run(m.init_op) + sv.saver.restore(sess, last_checkpoint) + sv.start_queue_runners(sess) + vals, _ = tf_utils.train_step_custom( + sess, None, m.global_step_op, train_step_kwargs, mode='val') + if checkpoint_iter >= args.solver.max_steps: + break + +if __name__ == '__main__': + app.run() diff --git a/cognitive_mapping_and_planning/scripts/script_download_init_models.sh b/cognitive_mapping_and_planning/scripts/script_download_init_models.sh new file mode 100644 index 0000000000000000000000000000000000000000..1900bd0b03566d29dac8a8de5f4fce623be98a92 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_download_init_models.sh @@ -0,0 +1,18 @@ +# Script to download models to initialize the RGB and D models for training.We +# use ResNet-v2-50 for both modalities. + +mkdir -p data/init_models +cd data/init_models + +# RGB Models are initialized by pre-training on ImageNet. +mkdir -p resnet_v2_50 +RGB_URL="http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz" +wget $RGB_URL +tar -xf resnet_v2_50_2017_04_14.tar.gz -C resnet_v2_50 + +# Depth models are initialized by distilling the RGB model to D images using +# Cross-Modal Distillation (https://arxiv.org/abs/1507.00448). +mkdir -p distill_rgb_to_d_resnet_v2_50 +D_URL="http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/distill_rgb_to_d_resnet_v2_50.tar" +wget $D_URL +tar -xf distill_rgb_to_d_resnet_v2_50.tar -C distill_rgb_to_d_resnet_v2_50 diff --git a/cognitive_mapping_and_planning/scripts/script_env_vis.py b/cognitive_mapping_and_planning/scripts/script_env_vis.py new file mode 100644 index 0000000000000000000000000000000000000000..03222dfab3f25d2eecec8c9a66903999b194b405 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_env_vis.py @@ -0,0 +1,186 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""A simple python function to walk in the enviornments that we have created. +PYTHONPATH='.' PYOPENGL_PLATFORM=egl python scripts/script_env_vis.py \ + --dataset_name sbpd --building_name area3 +""" +import sys +import numpy as np +import matplotlib +matplotlib.use('TkAgg') +from PIL import ImageTk, Image +import Tkinter as tk +import logging +from tensorflow.python.platform import app +from tensorflow.python.platform import flags + +import datasets.nav_env_config as nec +import datasets.nav_env as nav_env +import cv2 +from datasets import factory +import render.swiftshader_renderer as renderer + +SwiftshaderRenderer = renderer.SwiftshaderRenderer +VisualNavigationEnv = nav_env.VisualNavigationEnv + +FLAGS = flags.FLAGS +flags.DEFINE_string('dataset_name', 'sbpd', 'Name of the dataset.') +flags.DEFINE_float('fov', 60., 'Field of view') +flags.DEFINE_integer('image_size', 512, 'Size of the image.') +flags.DEFINE_string('building_name', '', 'Name of the building.') + +def get_args(): + navtask = nec.nav_env_base_config() + navtask.task_params.type = 'rng_rejection_sampling_many' + navtask.task_params.rejection_sampling_M = 2000 + navtask.task_params.min_dist = 10 + sz = FLAGS.image_size + navtask.camera_param.fov = FLAGS.fov + navtask.camera_param.height = sz + navtask.camera_param.width = sz + navtask.task_params.img_height = sz + navtask.task_params.img_width = sz + + # navtask.task_params.semantic_task.class_map_names = ['chair', 'door', 'table'] + # navtask.task_params.type = 'to_nearest_obj_acc' + + logging.info('navtask: %s', navtask) + return navtask + +def load_building(dataset_name, building_name): + dataset = factory.get_dataset(dataset_name) + + navtask = get_args() + cp = navtask.camera_param + rgb_shader, d_shader = renderer.get_shaders(cp.modalities) + r_obj = SwiftshaderRenderer() + r_obj.init_display(width=cp.width, height=cp.height, + fov=cp.fov, z_near=cp.z_near, z_far=cp.z_far, + rgb_shader=rgb_shader, d_shader=d_shader) + r_obj.clear_scene() + b = VisualNavigationEnv(robot=navtask.robot, env=navtask.env, + task_params=navtask.task_params, + building_name=building_name, flip=False, + logdir=None, building_loader=dataset, + r_obj=r_obj) + b.load_building_into_scene() + b.set_building_visibility(False) + return b + +def walk_through(b): + # init agent at a random location in the environment. + init_env_state = b.reset([np.random.RandomState(0), np.random.RandomState(0)]) + + global current_node + rng = np.random.RandomState(0) + current_node = rng.choice(b.task.nodes.shape[0]) + + root = tk.Tk() + image = b.render_nodes(b.task.nodes[[current_node],:])[0] + print image.shape + image = image.astype(np.uint8) + im = Image.fromarray(image) + im = ImageTk.PhotoImage(im) + panel = tk.Label(root, image=im) + + map_size = b.traversible.shape + sc = np.max(map_size)/256. 
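+  # Render a small (256x256) top view of the traversible map, centered on the
+  # map and scaled so that the longer side fits, to show alongside the first
+  # person view.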
+  loc = np.array([[map_size[1]/2., map_size[0]/2.]])
+  x_axis = np.zeros_like(loc); x_axis[:,1] = sc
+  y_axis = np.zeros_like(loc); y_axis[:,0] = -sc
+  cum_fs, cum_valid = nav_env.get_map_to_predict(loc, x_axis, y_axis,
+                                                 map=b.traversible*1.,
+                                                 map_size=256)
+  cum_fs = cum_fs[0]
+  cum_fs = cv2.applyColorMap((cum_fs*255).astype(np.uint8), cv2.COLORMAP_JET)
+  im = Image.fromarray(cum_fs)
+  im = ImageTk.PhotoImage(im)
+  panel_overhead = tk.Label(root, image=im)
+
+  def refresh():
+    global current_node
+    image = b.render_nodes(b.task.nodes[[current_node],:])[0]
+    image = image.astype(np.uint8)
+    im = Image.fromarray(image)
+    im = ImageTk.PhotoImage(im)
+    panel.configure(image=im)
+    panel.image = im
+
+  def left_key(event):
+    global current_node
+    current_node = b.take_action([current_node], [2], 1)[0][0]
+    refresh()
+
+  def up_key(event):
+    global current_node
+    current_node = b.take_action([current_node], [3], 1)[0][0]
+    refresh()
+
+  def right_key(event):
+    global current_node
+    current_node = b.take_action([current_node], [1], 1)[0][0]
+    refresh()
+
+  def quit(event):
+    root.destroy()
+
+  panel_overhead.grid(row=4, column=5, rowspan=1, columnspan=1,
+                      sticky=tk.W+tk.E+tk.N+tk.S)
+  panel.bind('<Left>', left_key)
+  panel.bind('<Up>', up_key)
+  panel.bind('<Right>', right_key)
+  panel.bind('q', quit)
+  panel.focus_set()
+  panel.grid(row=0, column=0, rowspan=5, columnspan=5,
+             sticky=tk.W+tk.E+tk.N+tk.S)
+  root.mainloop()
+
+def simple_window():
+  root = tk.Tk()
+
+  image = np.zeros((128, 128, 3), dtype=np.uint8)
+  image[32:96, 32:96, 0] = 255
+  im = Image.fromarray(image)
+  im = ImageTk.PhotoImage(im)
+
+  image = np.zeros((128, 128, 3), dtype=np.uint8)
+  image[32:96, 32:96, 1] = 255
+  im2 = Image.fromarray(image)
+  im2 = ImageTk.PhotoImage(im2)
+
+  panel = tk.Label(root, image=im)
+
+  def left_key(event):
+    panel.configure(image=im2)
+    panel.image = im2
+
+  def quit(event):
+    sys.exit()
+
+  panel.bind('<Left>', left_key)
+  panel.bind('<Up>', left_key)
+  panel.bind('<Right>', left_key)
+  panel.bind('q', quit)
+  panel.focus_set()
+  panel.pack(side = "bottom", fill = "both", expand = "yes")
+  root.mainloop()
+
+def main(_):
+  b = load_building(FLAGS.dataset_name, FLAGS.building_name)
+  walk_through(b)
+
+if __name__ == '__main__':
+  app.run()
diff --git a/cognitive_mapping_and_planning/scripts/script_nav_agent_release.py b/cognitive_mapping_and_planning/scripts/script_nav_agent_release.py
new file mode 100644
index 0000000000000000000000000000000000000000..dab2819a6fcf100cb2e385e45b7aa694c4c5f033
--- /dev/null
+++ b/cognitive_mapping_and_planning/scripts/script_nav_agent_release.py
@@ -0,0 +1,253 @@
+# Copyright 2016 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+r""" Script to train and test the grid navigation agent.
+Usage:
+  1. Testing a model.
+  CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 \
+    PYTHONPATH='.'
PYOPENGL_PLATFORM=egl python scripts/script_nav_agent_release.py \ + --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r+bench_test \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_d_r2r + + 2. Training a model (locally). + CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 \ + PYTHONPATH='.' PYOPENGL_PLATFORM=egl python scripts/script_nav_agent_release.py \ + --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r+train_train \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_d_r2r_ + + 3. Training a model (distributed). + # See https://www.tensorflow.org/deploy/distributed on how to setup distributed + # training. + CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 \ + PYTHONPATH='.' PYOPENGL_PLATFORM=egl python scripts/script_nav_agent_release.py \ + --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r+train_train \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_d_r2r_ \ + --ps_tasks $num_ps --master $master_name --task $worker_id +""" + +import sys, os, numpy as np +import copy +import argparse, pprint +import time +import cProfile +import platform + + +import tensorflow as tf +from tensorflow.contrib import slim +from tensorflow.python.framework import ops +from tensorflow.contrib.framework.python.ops import variables + +import logging +from tensorflow.python.platform import gfile +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +from cfgs import config_cmp +from cfgs import config_vision_baseline +import datasets.nav_env as nav_env +import src.file_utils as fu +import src.utils as utils +import tfcode.cmp as cmp +from tfcode import tf_utils +from tfcode import vision_baseline_lstm + +FLAGS = flags.FLAGS + +flags.DEFINE_string('master', '', + 'The address of the tensorflow master') +flags.DEFINE_integer('ps_tasks', 0, 'The number of parameter servers. If the ' + 'value is 0, then the parameters are handled locally by ' + 'the worker.') +flags.DEFINE_integer('task', 0, 'The Task ID. 
This value is used when training ' + 'with multiple workers to identify each worker.') + +flags.DEFINE_integer('num_workers', 1, '') + +flags.DEFINE_string('config_name', '', '') + +flags.DEFINE_string('logdir', '', '') + +flags.DEFINE_integer('solver_seed', 0, '') + +flags.DEFINE_integer('delay_start_iters', 20, '') + +logging.basicConfig(level=logging.INFO) + +def main(_): + _launcher(FLAGS.config_name, FLAGS.logdir) + +def _launcher(config_name, logdir): + args = _setup_args(config_name, logdir) + + fu.makedirs(args.logdir) + + if args.control.train: + _train(args) + + if args.control.test: + _test(args) + +def get_args_for_config(config_name): + configs = config_name.split('.') + type = configs[0] + config_name = '.'.join(configs[1:]) + if type == 'cmp': + args = config_cmp.get_args_for_config(config_name) + args.setup_to_run = cmp.setup_to_run + args.setup_train_step_kwargs = cmp.setup_train_step_kwargs + + elif type == 'bl': + args = config_vision_baseline.get_args_for_config(config_name) + args.setup_to_run = vision_baseline_lstm.setup_to_run + args.setup_train_step_kwargs = vision_baseline_lstm.setup_train_step_kwargs + + else: + logging.fatal('Unknown type: {:s}'.format(type)) + return args + +def _setup_args(config_name, logdir): + args = get_args_for_config(config_name) + args.solver.num_workers = FLAGS.num_workers + args.solver.task = FLAGS.task + args.solver.ps_tasks = FLAGS.ps_tasks + args.solver.master = FLAGS.master + args.solver.seed = FLAGS.solver_seed + args.logdir = logdir + args.navtask.logdir = None + return args + +def _train(args): + container_name = "" + + R = lambda: nav_env.get_multiplexer_class(args.navtask, args.solver.task) + m = utils.Foo() + m.tf_graph = tf.Graph() + + config = tf.ConfigProto() + config.device_count['GPU'] = 1 + + with m.tf_graph.as_default(): + with tf.device(tf.train.replica_device_setter(args.solver.ps_tasks, + merge_devices=True)): + with tf.container(container_name): + m = args.setup_to_run(m, args, is_training=True, + batch_norm_is_training=True, summary_mode='train') + + train_step_kwargs = args.setup_train_step_kwargs( + m, R(), os.path.join(args.logdir, 'train'), rng_seed=args.solver.task, + is_chief=args.solver.task==0, + num_steps=args.navtask.task_params.num_steps*args.navtask.task_params.num_goals, iters=1, + train_display_interval=args.summary.display_interval, + dagger_sample_bn_false=args.arch.dagger_sample_bn_false) + + delay_start = (args.solver.task*(args.solver.task+1))/2 * FLAGS.delay_start_iters + logging.error('delaying start for task %d by %d steps.', + args.solver.task, delay_start) + + additional_args = {} + final_loss = slim.learning.train( + train_op=m.train_op, + logdir=args.logdir, + master=args.solver.master, + is_chief=args.solver.task == 0, + number_of_steps=args.solver.max_steps, + train_step_fn=tf_utils.train_step_custom_online_sampling, + train_step_kwargs=train_step_kwargs, + global_step=m.global_step_op, + init_op=m.init_op, + init_fn=m.init_fn, + sync_optimizer=m.sync_optimizer, + saver=m.saver_op, + startup_delay_steps=delay_start, + summary_op=None, session_config=config, **additional_args) + +def _test(args): + args.solver.master = '' + container_name = "" + checkpoint_dir = os.path.join(format(args.logdir)) + logging.error('Checkpoint_dir: %s', args.logdir) + + config = tf.ConfigProto(); + config.device_count['GPU'] = 1; + + m = utils.Foo() + m.tf_graph = tf.Graph() + + rng_data_seed = 0; rng_action_seed = 0; + R = lambda: nav_env.get_multiplexer_class(args.navtask, rng_data_seed) + with 
m.tf_graph.as_default(): + with tf.container(container_name): + m = args.setup_to_run( + m, args, is_training=False, + batch_norm_is_training=args.control.force_batchnorm_is_training_at_test, + summary_mode=args.control.test_mode) + train_step_kwargs = args.setup_train_step_kwargs( + m, R(), os.path.join(args.logdir, args.control.test_name), + rng_seed=rng_data_seed, is_chief=True, + num_steps=args.navtask.task_params.num_steps*args.navtask.task_params.num_goals, + iters=args.summary.test_iters, train_display_interval=None, + dagger_sample_bn_false=args.arch.dagger_sample_bn_false) + + saver = slim.learning.tf_saver.Saver(variables.get_variables_to_restore()) + + sv = slim.learning.supervisor.Supervisor( + graph=ops.get_default_graph(), logdir=None, init_op=m.init_op, + summary_op=None, summary_writer=None, global_step=None, saver=m.saver_op) + + last_checkpoint = None + reported = False + while True: + last_checkpoint_ = None + while last_checkpoint_ is None: + last_checkpoint_ = slim.evaluation.wait_for_new_checkpoint( + checkpoint_dir, last_checkpoint, seconds_to_sleep=10, timeout=60) + if last_checkpoint_ is None: break + + last_checkpoint = last_checkpoint_ + checkpoint_iter = int(os.path.basename(last_checkpoint).split('-')[1]) + + logging.info('Starting evaluation at %s using checkpoint %s.', + time.strftime('%Y-%m-%d-%H:%M:%S', time.localtime()), + last_checkpoint) + + if (args.control.only_eval_when_done == False or + checkpoint_iter >= args.solver.max_steps): + start = time.time() + logging.info('Starting evaluation at %s using checkpoint %s.', + time.strftime('%Y-%m-%d-%H:%M:%S', time.localtime()), + last_checkpoint) + + with sv.managed_session(args.solver.master, config=config, + start_standard_services=False) as sess: + sess.run(m.init_op) + sv.saver.restore(sess, last_checkpoint) + sv.start_queue_runners(sess) + if args.control.reset_rng_seed: + train_step_kwargs['rng_data'] = [np.random.RandomState(rng_data_seed), + np.random.RandomState(rng_data_seed)] + train_step_kwargs['rng_action'] = np.random.RandomState(rng_action_seed) + vals, _ = tf_utils.train_step_custom_online_sampling( + sess, None, m.global_step_op, train_step_kwargs, + mode=args.control.test_mode) + should_stop = False + + if checkpoint_iter >= args.solver.max_steps: + should_stop = True + + if should_stop: + break + +if __name__ == '__main__': + app.run() diff --git a/cognitive_mapping_and_planning/scripts/script_plot_trajectory.py b/cognitive_mapping_and_planning/scripts/script_plot_trajectory.py new file mode 100644 index 0000000000000000000000000000000000000000..81c4c899052884b2061cde554c27c43e9574d771 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_plot_trajectory.py @@ -0,0 +1,339 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r""" +Code for plotting trajectories in the top view, and also plot first person views +from saved trajectories. 
Does not run the network but only loads the mesh data +to plot the view points. + CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 + PYTHONPATH='.' PYOPENGL_PLATFORM=egl python scripts/script_plot_trajectory.py \ + --first_person --num_steps 40 \ + --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r \ + --imset test --alsologtostderr --base_dir output --out_dir vis + +""" +import os, sys, numpy as np, copy +import matplotlib +matplotlib.use("Agg") +import matplotlib.pyplot as plt +import matplotlib.animation as animation +from matplotlib.gridspec import GridSpec + +import tensorflow as tf +from tensorflow.contrib import slim +import cv2 +import logging +from tensorflow.python.platform import gfile +from tensorflow.python.platform import app +from tensorflow.python.platform import flags + +from datasets import nav_env +import scripts.script_nav_agent_release as sna +import src.file_utils as fu +from src import graph_utils +from src import utils +FLAGS = flags.FLAGS + +flags.DEFINE_string('out_dir', 'vis', 'Directory where to store the output') +flags.DEFINE_string('type', '', 'Optional type.') +flags.DEFINE_bool('first_person', False, 'Visualize the first person view.') +flags.DEFINE_bool('top_view', False, 'Visualize the trajectory in the top view.') +flags.DEFINE_integer('num_steps', 40, 'Number of steps to run the model for.') +flags.DEFINE_string('imset', 'test', '') +flags.DEFINE_string('base_dir', 'output', 'Cache directory.') + +def _get_suffix_str(): + return '' + + +def _load_trajectory(): + base_dir = FLAGS.base_dir + config_name = FLAGS.config_name+_get_suffix_str() + + dir_name = os.path.join(base_dir, FLAGS.type, config_name) + logging.info('Waiting for snapshot in directory %s.', dir_name) + last_checkpoint = slim.evaluation.wait_for_new_checkpoint(dir_name, None) + checkpoint_iter = int(os.path.basename(last_checkpoint).split('-')[1]) + + # Load the distances. + a = utils.load_variables(os.path.join(dir_name, 'bench_on_'+FLAGS.imset, + 'all_locs_at_t_{:d}.pkl'.format(checkpoint_iter))) + return a + +def _compute_hardness(): + # Load the stanford data to compute the hardness. + if FLAGS.type == '': + args = sna.get_args_for_config(FLAGS.config_name+'+bench_'+FLAGS.imset) + else: + args = sna.get_args_for_config(FLAGS.type+'.'+FLAGS.config_name+'+bench_'+FLAGS.imset) + + args.navtask.logdir = None + R = lambda: nav_env.get_multiplexer_class(args.navtask, 0) + R = R() + + rng_data = [np.random.RandomState(0), np.random.RandomState(0)] + + # Sample a room. + h_dists = [] + gt_dists = [] + for i in range(250): + e = R.sample_env(rng_data) + nodes = e.task.nodes + + # Initialize the agent. 
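+    # For each episode in the batch, compare the heuristic distance estimate
+    # between start and goal against the true geodesic distance to the goal;
+    # their ratio is used later as a measure of episode hardness.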
+ init_env_state = e.reset(rng_data) + + gt_dist_to_goal = [e.episode.dist_to_goal[0][j][s] + for j, s in enumerate(e.episode.start_node_ids)] + + for j in range(args.navtask.task_params.batch_size): + start_node_id = e.episode.start_node_ids[j] + end_node_id =e.episode.goal_node_ids[0][j] + h_dist = graph_utils.heuristic_fn_vec( + nodes[[start_node_id],:], nodes[[end_node_id], :], + n_ori=args.navtask.task_params.n_ori, + step_size=args.navtask.task_params.step_size)[0][0] + gt_dist = e.episode.dist_to_goal[0][j][start_node_id] + h_dists.append(h_dist) + gt_dists.append(gt_dist) + + h_dists = np.array(h_dists) + gt_dists = np.array(gt_dists) + e = R.sample_env([np.random.RandomState(0), np.random.RandomState(0)]) + input = e.get_common_data() + orig_maps = input['orig_maps'][0,0,:,:,0] + return h_dists, gt_dists, orig_maps + +def plot_trajectory_first_person(dt, orig_maps, out_dir): + out_dir = os.path.join(out_dir, FLAGS.config_name+_get_suffix_str(), + FLAGS.imset) + fu.makedirs(out_dir) + + # Load the model so that we can render. + plt.set_cmap('gray') + samples_per_action = 8; wait_at_action = 0; + + Writer = animation.writers['mencoder'] + writer = Writer(fps=3*(samples_per_action+wait_at_action), + metadata=dict(artist='anonymous'), bitrate=1800) + + args = sna.get_args_for_config(FLAGS.config_name + '+bench_'+FLAGS.imset) + args.navtask.logdir = None + navtask_ = copy.deepcopy(args.navtask) + navtask_.camera_param.modalities = ['rgb'] + navtask_.task_params.modalities = ['rgb'] + sz = 512 + navtask_.camera_param.height = sz + navtask_.camera_param.width = sz + navtask_.task_params.img_height = sz + navtask_.task_params.img_width = sz + R = lambda: nav_env.get_multiplexer_class(navtask_, 0) + R = R() + b = R.buildings[0] + + f = [0 for _ in range(wait_at_action)] + \ + [float(_)/samples_per_action for _ in range(samples_per_action)]; + + # Generate things for it to render. + inds_to_do = [] + inds_to_do += [1, 4, 10] #1291, 1268, 1273, 1289, 1302, 1426, 1413, 1449, 1399, 1390] + + for i in inds_to_do: + fig = plt.figure(figsize=(10,8)) + gs = GridSpec(3,4) + gs.update(wspace=0.05, hspace=0.05, left=0.0, top=0.97, right=1.0, bottom=0.) + ax = fig.add_subplot(gs[:,:-1]) + ax1 = fig.add_subplot(gs[0,-1]) + ax2 = fig.add_subplot(gs[1,-1]) + ax3 = fig.add_subplot(gs[2,-1]) + axes = [ax, ax1, ax2, ax3] + # ax = fig.add_subplot(gs[:,:]) + # axes = [ax] + for ax in axes: + ax.set_axis_off() + + node_ids = dt['all_node_ids'][i, :, 0]*1 + # Prune so that last node is not repeated more than 3 times? + if np.all(node_ids[-4:] == node_ids[-1]): + while node_ids[-4] == node_ids[-1]: + node_ids = node_ids[:-1] + num_steps = np.minimum(FLAGS.num_steps, len(node_ids)) + + xyt = b.to_actual_xyt_vec(b.task.nodes[node_ids]) + xyt_diff = xyt[1:,:] - xyt[:-1:,:] + xyt_diff[:,2] = np.mod(xyt_diff[:,2], 4) + ind = np.where(xyt_diff[:,2] == 3)[0] + xyt_diff[ind, 2] = -1 + xyt_diff = np.expand_dims(xyt_diff, axis=1) + to_cat = [xyt_diff*_ for _ in f] + perturbs_all = np.concatenate(to_cat, axis=1) + perturbs_all = np.concatenate([perturbs_all, np.zeros_like(perturbs_all[:,:,:1])], axis=2) + node_ids_all = np.expand_dims(node_ids, axis=1)*1 + node_ids_all = np.concatenate([node_ids_all for _ in f], axis=1) + node_ids_all = np.reshape(node_ids_all[:-1,:], -1) + perturbs_all = np.reshape(perturbs_all, [-1, 4]) + imgs = b.render_nodes(b.task.nodes[node_ids_all,:], perturb=perturbs_all) + + # Get action at each node. 
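+    # Invert the feasible-action map so that, for each step, we can look up
+    # which action moves the agent from node_ids[j] to node_ids[j+1] along the
+    # saved trajectory.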
+ actions = [] + _, action_to_nodes = b.get_feasible_actions(node_ids) + for j in range(num_steps-1): + action_to_node = action_to_nodes[j] + node_to_action = dict(zip(action_to_node.values(), action_to_node.keys())) + actions.append(node_to_action[node_ids[j+1]]) + + def init_fn(): + return fig, + gt_dist_to_goal = [] + + # Render trajectories. + def worker(j): + # Plot the image. + step_number = j/(samples_per_action + wait_at_action) + img = imgs[j]; ax = axes[0]; ax.clear(); ax.set_axis_off(); + img = img.astype(np.uint8); ax.imshow(img); + tt = ax.set_title( + "First Person View\n" + + "Top corners show diagnostics (distance, agents' action) not input to agent.", + fontsize=12) + plt.setp(tt, color='white') + + # Distance to goal. + t = 'Dist to Goal:\n{:2d} steps'.format(int(dt['all_d_at_t'][i, step_number])) + t = ax.text(0.01, 0.99, t, + horizontalalignment='left', + verticalalignment='top', + fontsize=20, color='red', + transform=ax.transAxes, alpha=1.0) + t.set_bbox(dict(color='white', alpha=0.85, pad=-0.1)) + + # Action to take. + action_latex = ['$\odot$ ', '$\curvearrowright$ ', '$\curvearrowleft$ ', '$\Uparrow$ '] + t = ax.text(0.99, 0.99, action_latex[actions[step_number]], + horizontalalignment='right', + verticalalignment='top', + fontsize=40, color='green', + transform=ax.transAxes, alpha=1.0) + t.set_bbox(dict(color='white', alpha=0.85, pad=-0.1)) + + + # Plot the map top view. + ax = axes[-1] + if j == 0: + # Plot the map + locs = dt['all_locs'][i,:num_steps,:] + goal_loc = dt['all_goal_locs'][i,:,:] + xymin = np.minimum(np.min(goal_loc, axis=0), np.min(locs, axis=0)) + xymax = np.maximum(np.max(goal_loc, axis=0), np.max(locs, axis=0)) + xy1 = (xymax+xymin)/2. - 0.7*np.maximum(np.max(xymax-xymin), 24) + xy2 = (xymax+xymin)/2. + 0.7*np.maximum(np.max(xymax-xymin), 24) + + ax.set_axis_on() + ax.patch.set_facecolor((0.333, 0.333, 0.333)) + ax.set_xticks([]); ax.set_yticks([]); + ax.imshow(orig_maps, origin='lower', vmin=-1.0, vmax=2.0) + ax.plot(goal_loc[:,0], goal_loc[:,1], 'g*', markersize=12) + + locs = dt['all_locs'][i,:1,:] + ax.plot(locs[:,0], locs[:,1], 'b.', markersize=12) + + ax.set_xlim([xy1[0], xy2[0]]) + ax.set_ylim([xy1[1], xy2[1]]) + + locs = dt['all_locs'][i,step_number,:] + locs = np.expand_dims(locs, axis=0) + ax.plot(locs[:,0], locs[:,1], 'r.', alpha=1.0, linewidth=0, markersize=4) + tt = ax.set_title('Trajectory in topview', fontsize=14) + plt.setp(tt, color='white') + return fig, + + line_ani = animation.FuncAnimation(fig, worker, + (num_steps-1)*(wait_at_action+samples_per_action), + interval=500, blit=True, init_func=init_fn) + tmp_file_name = 'tmp.mp4' + line_ani.save(tmp_file_name, writer=writer, savefig_kwargs={'facecolor':'black'}) + out_file_name = os.path.join(out_dir, 'vis_{:04d}.mp4'.format(i)) + print out_file_name + + if fu.exists(out_file_name): + gfile.Remove(out_file_name) + gfile.Copy(tmp_file_name, out_file_name) + gfile.Remove(tmp_file_name) + plt.close(fig) + +def plot_trajectory(dt, hardness, orig_maps, out_dir): + out_dir = os.path.join(out_dir, FLAGS.config_name+_get_suffix_str(), + FLAGS.imset) + fu.makedirs(out_dir) + out_file = os.path.join(out_dir, 'all_locs_at_t.pkl') + dt['hardness'] = hardness + utils.save_variables(out_file, dt.values(), dt.keys(), overwrite=True) + + #Plot trajectories onto the maps + plt.set_cmap('gray') + for i in range(4000): + goal_loc = dt['all_goal_locs'][i, :, :] + locs = np.concatenate((dt['all_locs'][i,:,:], + dt['all_locs'][i,:,:]), axis=0) + xymin = np.minimum(np.min(goal_loc, axis=0), 
np.min(locs, axis=0)) + xymax = np.maximum(np.max(goal_loc, axis=0), np.max(locs, axis=0)) + xy1 = (xymax+xymin)/2. - 1.*np.maximum(np.max(xymax-xymin), 24) + xy2 = (xymax+xymin)/2. + 1.*np.maximum(np.max(xymax-xymin), 24) + + fig, ax = utils.tight_imshow_figure(plt, figsize=(6,6)) + ax.set_axis_on() + ax.patch.set_facecolor((0.333, 0.333, 0.333)) + ax.set_xticks([]) + ax.set_yticks([]) + + all_locs = dt['all_locs'][i,:,:]*1 + uniq = np.where(np.any(all_locs[1:,:] != all_locs[:-1,:], axis=1))[0]+1 + uniq = np.sort(uniq).tolist() + uniq.insert(0,0) + uniq = np.array(uniq) + all_locs = all_locs[uniq, :] + + ax.plot(dt['all_locs'][i, 0, 0], + dt['all_locs'][i, 0, 1], 'b.', markersize=24) + ax.plot(dt['all_goal_locs'][i, 0, 0], + dt['all_goal_locs'][i, 0, 1], 'g*', markersize=19) + ax.plot(all_locs[:,0], all_locs[:,1], 'r', alpha=0.4, linewidth=2) + ax.scatter(all_locs[:,0], all_locs[:,1], + c=5+np.arange(all_locs.shape[0])*1./all_locs.shape[0], + cmap='Reds', s=30, linewidth=0) + ax.imshow(orig_maps, origin='lower', vmin=-1.0, vmax=2.0, aspect='equal') + ax.set_xlim([xy1[0], xy2[0]]) + ax.set_ylim([xy1[1], xy2[1]]) + + file_name = os.path.join(out_dir, 'trajectory_{:04d}.png'.format(i)) + print file_name + with fu.fopen(file_name, 'w') as f: + plt.savefig(f) + plt.close(fig) + + +def main(_): + a = _load_trajectory() + h_dists, gt_dists, orig_maps = _compute_hardness() + hardness = 1.-h_dists*1./ gt_dists + + if FLAGS.top_view: + plot_trajectory(a, hardness, orig_maps, out_dir=FLAGS.out_dir) + + if FLAGS.first_person: + plot_trajectory_first_person(a, orig_maps, out_dir=FLAGS.out_dir) + +if __name__ == '__main__': + app.run() diff --git a/cognitive_mapping_and_planning/scripts/script_preprocess_annoations_S3DIS.py b/cognitive_mapping_and_planning/scripts/script_preprocess_annoations_S3DIS.py new file mode 100644 index 0000000000000000000000000000000000000000..58f32d121acf4c638625079907b02161e808af68 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_preprocess_annoations_S3DIS.py @@ -0,0 +1,197 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +import os +import glob +import numpy as np +import logging +import cPickle +from datasets import nav_env +from datasets import factory +from src import utils +from src import map_utils as mu + +logging.basicConfig(level=logging.INFO) +DATA_DIR = 'data/stanford_building_parser_dataset_raw/' + +mkdir_if_missing = utils.mkdir_if_missing +save_variables = utils.save_variables + +def _get_semantic_maps(building_name, transform, map_, flip, cats): + rooms = get_room_in_building(building_name) + maps = [] + for cat in cats: + maps.append(np.zeros((map_.size[1], map_.size[0]))) + + for r in rooms: + room = load_room(building_name, r, category_list=cats) + classes = room['class_id'] + for i, cat in enumerate(cats): + c_ind = cats.index(cat) + ind = [_ for _, c in enumerate(classes) if c == c_ind] + if len(ind) > 0: + vs = [room['vertexs'][x]*1 for x in ind] + vs = np.concatenate(vs, axis=0) + if transform: + vs = np.array([vs[:,1], vs[:,0], vs[:,2]]).T + vs[:,0] = -vs[:,0] + vs[:,1] += 4.20 + vs[:,0] += 6.20 + vs = vs*100. + if flip: + vs[:,1] = -vs[:,1] + maps[i] = maps[i] + \ + mu._project_to_map(map_, vs, ignore_points_outside_map=True) + return maps + +def _map_building_name(building_name): + b = int(building_name.split('_')[0][4]) + out_name = 'Area_{:d}'.format(b) + if b == 5: + if int(building_name.split('_')[0][5]) == 1: + transform = True + else: + transform = False + else: + transform = False + return out_name, transform + +def get_categories(): + cats = ['beam', 'board', 'bookcase', 'ceiling', 'chair', 'clutter', 'column', + 'door', 'floor', 'sofa', 'table', 'wall', 'window'] + return cats + +def _write_map_files(b_in, b_out, transform): + cats = get_categories() + + env = utils.Foo(padding=10, resolution=5, num_point_threshold=2, + valid_min=-10, valid_max=200, n_samples_per_face=200) + robot = utils.Foo(radius=15, base=10, height=140, sensor_height=120, + camera_elevation_degree=-15) + + building_loader = factory.get_dataset('sbpd') + for flip in [False, True]: + b = nav_env.Building(b_out, robot, env, flip=flip, + building_loader=building_loader) + logging.info("building_in: %s, building_out: %s, transform: %d", b_in, + b_out, transform) + maps = _get_semantic_maps(b_in, transform, b.map, flip, cats) + maps = np.transpose(np.array(maps), axes=[1,2,0]) + + # Load file from the cache. + file_name = '{:s}_{:d}_{:d}_{:d}_{:d}_{:d}_{:d}.pkl' + file_name = file_name.format(b.building_name, b.map.size[0], b.map.size[1], + b.map.origin[0], b.map.origin[1], + b.map.resolution, flip) + out_file = os.path.join(DATA_DIR, 'processing', 'class-maps', file_name) + logging.info('Writing semantic maps to %s.', out_file) + save_variables(out_file, [maps, cats], ['maps', 'cats'], overwrite=True) + +def _transform_area5b(room_dimension): + for a in room_dimension.keys(): + r = room_dimension[a]*1 + r[[0,1,3,4]] = r[[1,0,4,3]] + r[[0,3]] = -r[[3,0]] + r[[1,4]] += 4.20 + r[[0,3]] += 6.20 + room_dimension[a] = r + return room_dimension + +def collect_room(building_name, room_name): + room_dir = os.path.join(DATA_DIR, 'Stanford3dDataset_v1.2', building_name, + room_name, 'Annotations') + files = glob.glob1(room_dir, '*.txt') + files = sorted(files, key=lambda s: s.lower()) + vertexs = []; colors = []; + for f in files: + file_name = os.path.join(room_dir, f) + logging.info(' %s', file_name) + a = np.loadtxt(file_name) + vertex = a[:,:3]*1. 
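Each file in a room's `Annotations` directory is treated as a plain-text point list, one point per row, with the first three columns read as XYZ coordinates and the remaining columns as RGB values (this layout is inferred from the slicing above and below). A small self-contained sketch of that parsing, using an in-memory buffer instead of a file from the dataset:

```python
import numpy as np
from io import BytesIO

# Three fake points in the assumed "x y z r g b" layout of an Annotations/*.txt file.
raw = b"0.10 0.20 1.50 120 80 40\n0.12 0.21 1.49 118 82 41\n0.50 0.60 1.00 30 30 200\n"
a = np.loadtxt(BytesIO(raw))

vertex = a[:, :3] * 1.              # xyz coordinates
color = a[:, 3:].astype(np.uint8)   # rgb values in [0, 255]
print(vertex.shape, color.dtype)    # (3, 3) uint8
```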
+ color = a[:,3:]*1 + color = color.astype(np.uint8) + vertexs.append(vertex) + colors.append(color) + files = [f.split('.')[0] for f in files] + out = {'vertexs': vertexs, 'colors': colors, 'names': files} + return out + +def load_room(building_name, room_name, category_list=None): + room = collect_room(building_name, room_name) + room['building_name'] = building_name + room['room_name'] = room_name + instance_id = range(len(room['names'])) + room['instance_id'] = instance_id + if category_list is not None: + name = [r.split('_')[0] for r in room['names']] + class_id = [] + for n in name: + if n in category_list: + class_id.append(category_list.index(n)) + else: + class_id.append(len(category_list)) + room['class_id'] = class_id + room['category_list'] = category_list + return room + +def get_room_in_building(building_name): + building_dir = os.path.join(DATA_DIR, 'Stanford3dDataset_v1.2', building_name) + rn = os.listdir(building_dir) + rn = [x for x in rn if os.path.isdir(os.path.join(building_dir, x))] + rn = sorted(rn, key=lambda s: s.lower()) + return rn + +def write_room_dimensions(b_in, b_out, transform): + rooms = get_room_in_building(b_in) + room_dimension = {} + for r in rooms: + room = load_room(b_in, r, category_list=None) + vertex = np.concatenate(room['vertexs'], axis=0) + room_dimension[r] = np.concatenate((np.min(vertex, axis=0), np.max(vertex, axis=0)), axis=0) + if transform == 1: + room_dimension = _transform_area5b(room_dimension) + + out_file = os.path.join(DATA_DIR, 'processing', 'room-dimension', b_out+'.pkl') + save_variables(out_file, [room_dimension], ['room_dimension'], overwrite=True) + +def write_room_dimensions_all(I): + mkdir_if_missing(os.path.join(DATA_DIR, 'processing', 'room-dimension')) + bs_in = ['Area_1', 'Area_2', 'Area_3', 'Area_4', 'Area_5', 'Area_5', 'Area_6'] + bs_out = ['area1', 'area2', 'area3', 'area4', 'area5a', 'area5b', 'area6'] + transforms = [0, 0, 0, 0, 0, 1, 0] + + for i in I: + b_in = bs_in[i] + b_out = bs_out[i] + t = transforms[i] + write_room_dimensions(b_in, b_out, t) + +def write_class_maps_all(I): + mkdir_if_missing(os.path.join(DATA_DIR, 'processing', 'class-maps')) + bs_in = ['Area_1', 'Area_2', 'Area_3', 'Area_4', 'Area_5', 'Area_5', 'Area_6'] + bs_out = ['area1', 'area2', 'area3', 'area4', 'area5a', 'area5b', 'area6'] + transforms = [0, 0, 0, 0, 0, 1, 0] + + for i in I: + b_in = bs_in[i] + b_out = bs_out[i] + t = transforms[i] + _write_map_files(b_in, b_out, t) + + +if __name__ == '__main__': + write_room_dimensions_all([0, 2, 3, 4, 5, 6]) + write_class_maps_all([0, 2, 3, 4, 5, 6]) + diff --git a/cognitive_mapping_and_planning/scripts/script_preprocess_annoations_S3DIS.sh b/cognitive_mapping_and_planning/scripts/script_preprocess_annoations_S3DIS.sh new file mode 100644 index 0000000000000000000000000000000000000000..1384fabe69259ccc514a14d62aee358d1909bffb --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_preprocess_annoations_S3DIS.sh @@ -0,0 +1,24 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +cd data/stanford_building_parser_dataset_raw +unzip Stanford3dDataset_v1.2.zip +cd ../../ +PYOPENGL_PLATFORM=egl PYTHONPATH='.' python scripts/script_preprocess_annoations_S3DIS.py + +mv data/stanford_building_parser_dataset_raw/processing/room-dimension data/stanford_building_parser_dataset/. +mv data/stanford_building_parser_dataset_raw/processing/class-maps data/stanford_building_parser_dataset/. + +echo "You may now delete data/stanford_building_parser_dataset_raw if needed." diff --git a/cognitive_mapping_and_planning/scripts/script_preprocess_meshes_S3DIS.sh b/cognitive_mapping_and_planning/scripts/script_preprocess_meshes_S3DIS.sh new file mode 100644 index 0000000000000000000000000000000000000000..557a4dde611d42e71d71dd1589abf96f55e6eec6 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_preprocess_meshes_S3DIS.sh @@ -0,0 +1,37 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +mkdir -p data/stanford_building_parser_dataset +mkdir -p data/stanford_building_parser_dataset/mesh +cd data/stanford_building_parser_dataset_raw + +# Untar the files and extract the meshes. +for t in "1" "3" "4" "5a" "5b" "6"; do + tar -xf area_"$t"_noXYZ.tar area_$t/3d/rgb_textures + mv area_$t/3d/rgb_textures ../stanford_building_parser_dataset/mesh/area$t + rmdir area_$t/3d + rmdir area_$t +done + +cd ../../ + +# Preprocess meshes to remove the group and chunk information. +cd data/stanford_building_parser_dataset/ +for t in "1" "3" "4" "5a" "5b" "6"; do + obj_name=`ls mesh/area$t/*.obj` + cp $obj_name "$obj_name".bck + cat $obj_name.bck | grep -v '^g' | grep -v '^o' > $obj_name +done +cd ../../ diff --git a/cognitive_mapping_and_planning/scripts/script_test_pretrained_models.sh b/cognitive_mapping_and_planning/scripts/script_test_pretrained_models.sh new file mode 100644 index 0000000000000000000000000000000000000000..a4299fff5346afb53783a61de5c3e84f102a6304 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_test_pretrained_models.sh @@ -0,0 +1,63 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +# Test CMP models. +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r+bench_test \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_d_r2r + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name cmp.lmap_Msc.clip5.sbpd_rgb_r2r+bench_test \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_rgb_r2r + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name cmp.lmap_Msc.clip5.sbpd_d_ST+bench_test \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_d_ST + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name cmp.lmap_Msc.clip5.sbpd_rgb_ST+bench_test \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_rgb_ST + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r_h0_64_80+bench_test \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_d_r2r_h0_64_80 + +# Test LSTM baseline models. +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name bl.v2.noclip.sbpd_d_r2r+bench_test \ + --logdir output/bl.v2.noclip.sbpd_d_r2r + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name bl.v2.noclip.sbpd_rgb_r2r+bench_test \ + --logdir output/bl.v2.noclip.sbpd_rgb_r2r + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name bl.v2.noclip.sbpd_d_ST+bench_test \ + --logdir output/bl.v2.noclip.sbpd_d_ST + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name bl.v2.noclip.sbpd_rgb_ST+bench_test \ + --logdir output/bl.v2.noclip.sbpd_rgb_ST + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name bl.v2.noclip.sbpd_d_r2r_h0_64_80+bench_test \ + --logdir output/bl.v2.noclip.sbpd_d_r2r_h0_64_80 + +# Visualize test trajectories in top view. +# CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' 
PYOPENGL_PLATFORM=egl \ +# python scripts/script_plot_trajectory.py \ +# --first_person --num_steps 40 \ +# --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r \ +# --imset test --alsologtostderr diff --git a/cognitive_mapping_and_planning/src/__init__.py b/cognitive_mapping_and_planning/src/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/src/depth_utils.py b/cognitive_mapping_and_planning/src/depth_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..b1fb2f51e5caa08ac43c730d587a771576700242 --- /dev/null +++ b/cognitive_mapping_and_planning/src/depth_utils.py @@ -0,0 +1,95 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Utilities for processing depth images. +""" +import numpy as np +import src.rotation_utils as ru + +def get_camera_matrix(width, height, fov): + """Returns a camera matrix from image size and fov.""" + xc = (width-1.) / 2. + zc = (height-1.) / 2. + f = (width / 2.) / np.tan(np.deg2rad(fov / 2.)) + camera_matrix = utils.Foo(xc=xc, zc=zc, f=f) + return camera_matrix + +def get_point_cloud_from_z(Y, camera_matrix): + """Projects the depth image Y into a 3D point cloud. + Inputs: + Y is ...xHxW + camera_matrix + Outputs: + X is positive going right + Y is positive into the image + Z is positive up in the image + XYZ is ...xHxWx3 + """ + x, z = np.meshgrid(np.arange(Y.shape[-1]), + np.arange(Y.shape[-2]-1, -1, -1)) + for i in range(Y.ndim-2): + x = np.expand_dims(x, axis=0) + z = np.expand_dims(z, axis=0) + X = (x-camera_matrix.xc) * Y / camera_matrix.f + Z = (z-camera_matrix.zc) * Y / camera_matrix.f + XYZ = np.concatenate((X[...,np.newaxis], Y[...,np.newaxis], + Z[...,np.newaxis]), axis=X.ndim) + return XYZ + +def make_geocentric(XYZ, sensor_height, camera_elevation_degree): + """Transforms the point cloud into geocentric coordinate frame. + Input: + XYZ : ...x3 + sensor_height : height of the sensor + camera_elevation_degree : camera elevation to rectify. + Output: + XYZ : ...x3 + """ + R = ru.get_r_matrix([1.,0.,0.], angle=np.deg2rad(camera_elevation_degree)) + XYZ = np.matmul(XYZ.reshape(-1,3), R.T).reshape(XYZ.shape) + XYZ[...,2] = XYZ[...,2] + sensor_height + return XYZ + +def bin_points(XYZ_cms, map_size, z_bins, xy_resolution): + """Bins points into xy-z bins + XYZ_cms is ... x H x W x3 + Outputs is ... x map_size x map_size x (len(z_bins)+1) + """ + sh = XYZ_cms.shape + XYZ_cms = XYZ_cms.reshape([-1, sh[-3], sh[-2], sh[-1]]) + n_z_bins = len(z_bins)+1 + map_center = (map_size-1.)/2. 
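The depth images binned here are first back-projected by `get_point_cloud_from_z` above using the standard pinhole model: with focal length f derived from the field of view and principal point (xc, zc), a pixel (x, z) at depth Y maps to X = (x - xc) * Y / f and Z = (z - zc) * Y / f. A quick standalone check of that relationship:

```python
import numpy as np

width, height, fov = 320, 240, 60.0
xc, zc = (width - 1.) / 2., (height - 1.) / 2.
f = (width / 2.) / np.tan(np.deg2rad(fov / 2.))

Y = np.full((height, width), 2.0)     # depth image: a flat wall 2 m away
x, z = np.meshgrid(np.arange(width), np.arange(height - 1, -1, -1))
X = (x - xc) * Y / f
Z = (z - zc) * Y / f

print(X.shape)                        # (240, 320)
print(np.abs(X[120, 160]) < 0.01)     # True: pixels near the principal point sit near the axis
```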
+ counts = [] + isvalids = [] + for XYZ_cm in XYZ_cms: + isnotnan = np.logical_not(np.isnan(XYZ_cm[:,:,0])) + X_bin = np.round(XYZ_cm[:,:,0] / xy_resolution + map_center).astype(np.int32) + Y_bin = np.round(XYZ_cm[:,:,1] / xy_resolution + map_center).astype(np.int32) + Z_bin = np.digitize(XYZ_cm[:,:,2], bins=z_bins).astype(np.int32) + + isvalid = np.array([X_bin >= 0, X_bin < map_size, Y_bin >= 0, Y_bin < map_size, + Z_bin >= 0, Z_bin < n_z_bins, isnotnan]) + isvalid = np.all(isvalid, axis=0) + + ind = (Y_bin * map_size + X_bin) * n_z_bins + Z_bin + ind[np.logical_not(isvalid)] = 0 + count = np.bincount(ind.ravel(), isvalid.ravel().astype(np.int32), + minlength=map_size*map_size*n_z_bins) + count = np.reshape(count, [map_size, map_size, n_z_bins]) + counts.append(count) + isvalids.append(isvalid) + counts = np.array(counts).reshape(list(sh[:-3]) + [map_size, map_size, n_z_bins]) + isvalids = np.array(isvalids).reshape(list(sh[:-3]) + [sh[-3], sh[-2], 1]) + return counts, isvalids diff --git a/cognitive_mapping_and_planning/src/file_utils.py b/cognitive_mapping_and_planning/src/file_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..5bf0e4a2e0d1f11382476b586fc76eb3cb5c583e --- /dev/null +++ b/cognitive_mapping_and_planning/src/file_utils.py @@ -0,0 +1,41 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Utilities for manipulating files. +""" +import os +import PIL +from tensorflow.python.platform import gfile +import cv2 + +exists = lambda path: gfile.Exists(path) +fopen = lambda path, mode: gfile.Open(path, mode) +makedirs = lambda path: gfile.MakeDirs(path) +listdir = lambda path: gfile.ListDir(path) +copyfile = lambda a, b, o: gfile.Copy(a,b,o) + +def write_image(image_path, rgb): + ext = os.path.splitext(image_path)[1] + with gfile.GFile(image_path, 'w') as f: + img_str = cv2.imencode(ext, rgb[:,:,::-1])[1].tostring() + f.write(img_str) + +def read_image(image_path, type='rgb'): + with fopen(file_name, 'r') as f: + I = PIL.Image.open(f) + II = np.array(I) + if type == 'rgb': + II = II[:,:,:3] + return II diff --git a/cognitive_mapping_and_planning/src/graph_utils.py b/cognitive_mapping_and_planning/src/graph_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..d40eb62ca6eb47126074ccb243be5773fc92d83f --- /dev/null +++ b/cognitive_mapping_and_planning/src/graph_utils.py @@ -0,0 +1,550 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Various function to manipulate graphs for computing distances. +""" +import skimage.morphology +import numpy as np +import networkx as nx +import itertools +import graph_tool as gt +import graph_tool.topology +import graph_tool.generation +import src.utils as utils + +# Compute shortest path from all nodes to or from all source nodes +def get_distance_node_list(gtG, source_nodes, direction, weights=None): + gtG_ = gt.Graph(gtG) + v = gtG_.add_vertex() + + if weights is not None: + weights = gtG_.edge_properties[weights] + + for s in source_nodes: + e = gtG_.add_edge(s, int(v)) + if weights is not None: + weights[e] = 0. + + if direction == 'to': + dist = gt.topology.shortest_distance( + gt.GraphView(gtG_, reversed=True), source=gtG_.vertex(int(v)), + target=None, weights=weights) + elif direction == 'from': + dist = gt.topology.shortest_distance( + gt.GraphView(gtG_, reversed=False), source=gtG_.vertex(int(v)), + target=None, weights=weights) + dist = np.array(dist.get_array()) + dist = dist[:-1] + if weights is None: + dist = dist-1 + return dist + +# Functions for semantically labelling nodes in the traversal graph. +def generate_lattice(sz_x, sz_y): + """Generates a lattice with sz_x vertices along x and sz_y vertices along y + direction Each of these vertices is step_size distance apart. Origin is at + (0,0). """ + g = gt.generation.lattice([sz_x, sz_y]) + x, y = np.meshgrid(np.arange(sz_x), np.arange(sz_y)) + x = np.reshape(x, [-1,1]); y = np.reshape(y, [-1,1]); + nodes = np.concatenate((x,y), axis=1) + return g, nodes + +def add_diagonal_edges(g, nodes, sz_x, sz_y, edge_len): + offset = [sz_x+1, sz_x-1] + for o in offset: + s = np.arange(nodes.shape[0]-o-1) + t = s + o + ind = np.all(np.abs(nodes[s,:] - nodes[t,:]) == np.array([[1,1]]), axis=1) + s = s[ind][:,np.newaxis] + t = t[ind][:,np.newaxis] + st = np.concatenate((s,t), axis=1) + for i in range(st.shape[0]): + e = g.add_edge(st[i,0], st[i,1], add_missing=False) + g.ep['wts'][e] = edge_len + +def convert_traversible_to_graph(traversible, ff_cost=1., fo_cost=1., + oo_cost=1., connectivity=4): + assert(connectivity == 4 or connectivity == 8) + + sz_x = traversible.shape[1] + sz_y = traversible.shape[0] + g, nodes = generate_lattice(sz_x, sz_y) + + # Assign costs. + edge_wts = g.new_edge_property('float') + g.edge_properties['wts'] = edge_wts + wts = np.ones(g.num_edges(), dtype=np.float32) + edge_wts.get_array()[:] = wts + + if connectivity == 8: + add_diagonal_edges(g, nodes, sz_x, sz_y, np.sqrt(2.)) + + se = np.array([[int(e.source()), int(e.target())] for e in g.edges()]) + s_xy = nodes[se[:,0]] + t_xy = nodes[se[:,1]] + s_t = np.ravel_multi_index((s_xy[:,1], s_xy[:,0]), traversible.shape) + t_t = np.ravel_multi_index((t_xy[:,1], t_xy[:,0]), traversible.shape) + s_t = traversible.ravel()[s_t] + t_t = traversible.ravel()[t_t] + + wts = np.zeros(g.num_edges(), dtype=np.float32) + wts[np.logical_and(s_t == True, t_t == True)] = ff_cost + wts[np.logical_and(s_t == False, t_t == False)] = oo_cost + wts[np.logical_xor(s_t, t_t)] = fo_cost + + edge_wts = g.edge_properties['wts'] + for i, e in enumerate(g.edges()): + edge_wts[e] = edge_wts[e] * wts[i] + # d = edge_wts.get_array()*1. 
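`get_distance_node_list` above relies on a standard multi-source shortest-path trick: add one auxiliary vertex, connect it to every source with a zero-cost edge, and run a single search from it; with unit edge weights the extra hop is removed by subtracting one. A minimal illustration of the same idea using networkx (already imported by this module) rather than graph_tool:

```python
import networkx as nx

G = nx.path_graph(6)                  # 0 - 1 - 2 - 3 - 4 - 5
sources = [0, 5]

aux = 'aux'
H = G.copy()
for u, v in G.edges():
    H[u][v]['weight'] = 1.            # unit cost on the original edges
for s in sources:
    H.add_edge(aux, s, weight=0.)     # free edges from the auxiliary vertex

dist = nx.single_source_dijkstra_path_length(H, aux, weight='weight')
print({n: dist[n] for n in G.nodes()})  # distance to the nearest source, e.g. 2 -> 2.0
```

One search from the auxiliary vertex replaces one search per source, which matters when there are thousands of labelled source nodes per class.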
+ # edge_wts.get_array()[:] = d*wts + return g, nodes + +def label_nodes_with_class(nodes_xyt, class_maps, pix): + """ + Returns: + class_maps__: one-hot class_map for each class. + node_class_label: one-hot class_map for each class, nodes_xyt.shape[0] x n_classes + """ + # Assign each pixel to a node. + selem = skimage.morphology.disk(pix) + class_maps_ = class_maps*1. + for i in range(class_maps.shape[2]): + class_maps_[:,:,i] = skimage.morphology.dilation(class_maps[:,:,i]*1, selem) + class_maps__ = np.argmax(class_maps_, axis=2) + class_maps__[np.max(class_maps_, axis=2) == 0] = -1 + + # For each node pick out the label from this class map. + x = np.round(nodes_xyt[:,[0]]).astype(np.int32) + y = np.round(nodes_xyt[:,[1]]).astype(np.int32) + ind = np.ravel_multi_index((y,x), class_maps__.shape) + node_class_label = class_maps__.ravel()[ind][:,0] + + # Convert to one hot versions. + class_maps_one_hot = np.zeros(class_maps.shape, dtype=np.bool) + node_class_label_one_hot = np.zeros((node_class_label.shape[0], class_maps.shape[2]), dtype=np.bool) + for i in range(class_maps.shape[2]): + class_maps_one_hot[:,:,i] = class_maps__ == i + node_class_label_one_hot[:,i] = node_class_label == i + return class_maps_one_hot, node_class_label_one_hot + +def label_nodes_with_class_geodesic(nodes_xyt, class_maps, pix, traversible, + ff_cost=1., fo_cost=1., oo_cost=1., + connectivity=4): + """Labels nodes in nodes_xyt with class labels using geodesic distance as + defined by traversible from class_maps. + Inputs: + nodes_xyt + class_maps: counts for each class. + pix: distance threshold to consider close enough to target. + traversible: binary map of whether traversible or not. + Output: + labels: For each node in nodes_xyt returns a label of the class or -1 is + unlabelled. + """ + g, nodes = convert_traversible_to_graph(traversible, ff_cost=ff_cost, + fo_cost=fo_cost, oo_cost=oo_cost, + connectivity=connectivity) + + class_dist = np.zeros_like(class_maps*1.) + n_classes = class_maps.shape[2] + if False: + # Assign each pixel to a class based on number of points. + selem = skimage.morphology.disk(pix) + class_maps_ = class_maps*1. + class_maps__ = np.argmax(class_maps_, axis=2) + class_maps__[np.max(class_maps_, axis=2) == 0] = -1 + + # Label nodes with classes. + for i in range(n_classes): + # class_node_ids = np.where(class_maps__.ravel() == i)[0] + class_node_ids = np.where(class_maps[:,:,i].ravel() > 0)[0] + dist_i = get_distance_node_list(g, class_node_ids, 'to', weights='wts') + class_dist[:,:,i] = np.reshape(dist_i, class_dist[:,:,i].shape) + class_map_geodesic = (class_dist <= pix) + class_map_geodesic = np.reshape(class_map_geodesic, [-1, n_classes]) + + # For each node pick out the label from this class map. 
+ x = np.round(nodes_xyt[:,[0]]).astype(np.int32) + y = np.round(nodes_xyt[:,[1]]).astype(np.int32) + ind = np.ravel_multi_index((y,x), class_dist[:,:,0].shape) + node_class_label = class_map_geodesic[ind[:,0],:] + class_map_geodesic = class_dist <= pix + return class_map_geodesic, node_class_label + +def _get_next_nodes_undirected(n, sc, n_ori): + nodes_to_add = [] + nodes_to_validate = [] + (p, q, r) = n + nodes_to_add.append((n, (p, q, r), 0)) + if n_ori == 4: + for _ in [1, 2, 3, 4]: + if _ == 1: + v = (p - sc, q, r) + elif _ == 2: + v = (p + sc, q, r) + elif _ == 3: + v = (p, q - sc, r) + elif _ == 4: + v = (p, q + sc, r) + nodes_to_validate.append((n, v, _)) + return nodes_to_add, nodes_to_validate + +def _get_next_nodes(n, sc, n_ori): + nodes_to_add = [] + nodes_to_validate = [] + (p, q, r) = n + for r_, a_ in zip([-1, 0, 1], [1, 0, 2]): + nodes_to_add.append((n, (p, q, np.mod(r+r_, n_ori)), a_)) + + if n_ori == 6: + if r == 0: + v = (p + sc, q, r) + elif r == 1: + v = (p + sc, q + sc, r) + elif r == 2: + v = (p, q + sc, r) + elif r == 3: + v = (p - sc, q, r) + elif r == 4: + v = (p - sc, q - sc, r) + elif r == 5: + v = (p, q - sc, r) + elif n_ori == 4: + if r == 0: + v = (p + sc, q, r) + elif r == 1: + v = (p, q + sc, r) + elif r == 2: + v = (p - sc, q, r) + elif r == 3: + v = (p, q - sc, r) + nodes_to_validate.append((n,v,3)) + + return nodes_to_add, nodes_to_validate + +def generate_graph(valid_fn_vec=None, sc=1., n_ori=6, + starting_location=(0, 0, 0), vis=False, directed=True): + timer = utils.Timer() + timer.tic() + if directed: G = nx.DiGraph(directed=True) + else: G = nx.Graph() + G.add_node(starting_location) + new_nodes = G.nodes() + while len(new_nodes) != 0: + nodes_to_add = [] + nodes_to_validate = [] + for n in new_nodes: + if directed: + na, nv = _get_next_nodes(n, sc, n_ori) + else: + na, nv = _get_next_nodes_undirected(n, sc, n_ori) + nodes_to_add = nodes_to_add + na + if valid_fn_vec is not None: + nodes_to_validate = nodes_to_validate + nv + else: + node_to_add = nodes_to_add + nv + + # Validate nodes. 
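The expansion helpers above encode each node as an (x, y, orientation) triple: rotation actions stay in place and change the orientation index, while the single move-forward action displaces the node along the heading that index encodes; only the forward move needs to be validated against the environment. A small sketch of the 4-orientation case:

```python
def next_nodes_4(node, step=1):
    """Successors of (x, y, r): change heading by -1/0/+1, or step forward."""
    p, q, r = node
    succ = []
    for dr, action in zip([-1, 0, 1], [1, 0, 2]):
        succ.append(((p, q, (r + dr) % 4), action))
    forward = {0: (p + step, q, r), 1: (p, q + step, r),
               2: (p - step, q, r), 3: (p, q - step, r)}[r]
    succ.append((forward, 3))          # action 3 = move forward
    return succ

for s, a in next_nodes_4((2, 3, 1)):
    print(s, a)
```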
+ vs = [_[1] for _ in nodes_to_validate] + valids = valid_fn_vec(vs) + + for nva, valid in zip(nodes_to_validate, valids): + if valid: + nodes_to_add.append(nva) + + new_nodes = [] + for n,v,a in nodes_to_add: + if not G.has_node(v): + new_nodes.append(v) + G.add_edge(n, v, action=a) + + timer.toc(average=True, log_at=1, log_str='src.graph_utils.generate_graph') + return (G) + +def vis_G(G, ax, vertex_color='r', edge_color='b', r=None): + if edge_color is not None: + for e in G.edges(): + XYT = zip(*e) + x = XYT[-3] + y = XYT[-2] + t = XYT[-1] + if r is None or t[0] == r: + ax.plot(x, y, edge_color) + if vertex_color is not None: + XYT = zip(*G.nodes()) + x = XYT[-3] + y = XYT[-2] + t = XYT[-1] + ax.plot(x, y, vertex_color + '.') + +def convert_to_graph_tool(G): + timer = utils.Timer() + timer.tic() + gtG = gt.Graph(directed=G.is_directed()) + gtG.ep['action'] = gtG.new_edge_property('int') + + nodes_list = G.nodes() + nodes_array = np.array(nodes_list) + + nodes_id = np.zeros((nodes_array.shape[0],), dtype=np.int64) + + for i in range(nodes_array.shape[0]): + v = gtG.add_vertex() + nodes_id[i] = int(v) + + # d = {key: value for (key, value) in zip(nodes_list, nodes_id)} + d = dict(itertools.izip(nodes_list, nodes_id)) + + for src, dst, data in G.edges_iter(data=True): + e = gtG.add_edge(d[src], d[dst]) + gtG.ep['action'][e] = data['action'] + nodes_to_id = d + timer.toc(average=True, log_at=1, log_str='src.graph_utils.convert_to_graph_tool') + return gtG, nodes_array, nodes_to_id + + +def _rejection_sampling(rng, sampling_d, target_d, bins, hardness, M): + bin_ind = np.digitize(hardness, bins)-1 + i = 0 + ratio = target_d[bin_ind] / (M*sampling_d[bin_ind]) + while i < ratio.size and rng.rand() > ratio[i]: + i = i+1 + return i + +def heuristic_fn_vec(n1, n2, n_ori, step_size): + # n1 is a vector and n2 is a single point. + dx = (n1[:,0] - n2[0,0])/step_size + dy = (n1[:,1] - n2[0,1])/step_size + dt = n1[:,2] - n2[0,2] + dt = np.mod(dt, n_ori) + dt = np.minimum(dt, n_ori-dt) + + if n_ori == 6: + if dx*dy > 0: + d = np.maximum(np.abs(dx), np.abs(dy)) + else: + d = np.abs(dy-dx) + elif n_ori == 4: + d = np.abs(dx) + np.abs(dy) + + return (d + dt).reshape((-1,1)) + +def get_hardness_distribution(gtG, max_dist, min_dist, rng, trials, bins, nodes, + n_ori, step_size): + heuristic_fn = lambda node_ids, node_id: \ + heuristic_fn_vec(nodes[node_ids, :], nodes[[node_id], :], n_ori, step_size) + num_nodes = gtG.num_vertices() + gt_dists = []; h_dists = []; + for i in range(trials): + end_node_id = rng.choice(num_nodes) + gt_dist = gt.topology.shortest_distance(gt.GraphView(gtG, reversed=True), + source=gtG.vertex(end_node_id), + target=None, max_dist=max_dist) + gt_dist = np.array(gt_dist.get_array()) + ind = np.where(np.logical_and(gt_dist <= max_dist, gt_dist >= min_dist))[0] + gt_dist = gt_dist[ind] + h_dist = heuristic_fn(ind, end_node_id)[:,0] + gt_dists.append(gt_dist) + h_dists.append(h_dist) + gt_dists = np.concatenate(gt_dists) + h_dists = np.concatenate(h_dists) + hardness = 1. 
- h_dists*1./gt_dists + hist, _ = np.histogram(hardness, bins) + hist = hist.astype(np.float64) + hist = hist / np.sum(hist) + return hist + +def rng_next_goal_rejection_sampling(start_node_ids, batch_size, gtG, rng, + max_dist, min_dist, max_dist_to_compute, + sampling_d, target_d, + nodes, n_ori, step_size, bins, M): + sample_start_nodes = start_node_ids is None + dists = []; pred_maps = []; end_node_ids = []; start_node_ids_ = []; + hardnesss = []; gt_dists = []; + num_nodes = gtG.num_vertices() + for i in range(batch_size): + done = False + while not done: + if sample_start_nodes: + start_node_id = rng.choice(num_nodes) + else: + start_node_id = start_node_ids[i] + + gt_dist = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=False), source=start_node_id, target=None, + max_dist=max_dist) + gt_dist = np.array(gt_dist.get_array()) + ind = np.where(np.logical_and(gt_dist <= max_dist, gt_dist >= min_dist))[0] + ind = rng.permutation(ind) + gt_dist = gt_dist[ind]*1. + h_dist = heuristic_fn_vec(nodes[ind, :], nodes[[start_node_id], :], + n_ori, step_size)[:,0] + hardness = 1. - h_dist / gt_dist + sampled_ind = _rejection_sampling(rng, sampling_d, target_d, bins, + hardness, M) + if sampled_ind < ind.size: + # print sampled_ind + end_node_id = ind[sampled_ind] + hardness = hardness[sampled_ind] + gt_dist = gt_dist[sampled_ind] + done = True + + # Compute distance from end node to all nodes, to return. + dist, pred_map = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=True), source=end_node_id, target=None, + max_dist=max_dist_to_compute, pred_map=True) + dist = np.array(dist.get_array()) + pred_map = np.array(pred_map.get_array()) + + hardnesss.append(hardness); dists.append(dist); pred_maps.append(pred_map); + start_node_ids_.append(start_node_id); end_node_ids.append(end_node_id); + gt_dists.append(gt_dist); + paths = None + return start_node_ids_, end_node_ids, dists, pred_maps, paths, hardnesss, gt_dists + + +def rng_next_goal(start_node_ids, batch_size, gtG, rng, max_dist, + max_dist_to_compute, node_room_ids, nodes=None, + compute_path=False, dists_from_start_node=None): + # Compute the distance field from the starting location, and then pick a + # destination in another room if possible otherwise anywhere outside this + # room. + dists = []; pred_maps = []; paths = []; end_node_ids = []; + for i in range(batch_size): + room_id = node_room_ids[start_node_ids[i]] + # Compute distances. + if dists_from_start_node == None: + dist, pred_map = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=False), source=gtG.vertex(start_node_ids[i]), + target=None, max_dist=max_dist_to_compute, pred_map=True) + dist = np.array(dist.get_array()) + else: + dist = dists_from_start_node[i] + + # Randomly sample nodes which are within max_dist. + near_ids = dist <= max_dist + near_ids = near_ids[:, np.newaxis] + # Check to see if there is a non-negative node which is close enough. 
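The sampler above reshapes the naturally occurring hardness distribution into a desired one by rejection sampling: a candidate goal falling in hardness bin b is accepted with probability target_d[b] / (M * sampling_d[b]). A compact standalone illustration of that acceptance rule on synthetic candidates with a uniform target over three bins:

```python
import numpy as np

rng = np.random.RandomState(0)
bins = np.array([0.0, 1/3., 2/3., 1.0])
sampling_d = np.array([0.6, 0.3, 0.1])   # what the environment naturally produces
target_d = np.array([1/3., 1/3., 1/3.])  # what we would like to train on
M = np.max(target_d / sampling_d)        # keeps every acceptance probability <= 1

accepted = []
for _ in range(20000):
    h_bin = rng.choice(3, p=sampling_d)  # draw a hardness bin for a fake candidate
    hardness = rng.uniform(bins[h_bin], bins[h_bin + 1])
    b = int(np.digitize([hardness], bins)[0]) - 1
    if rng.rand() < target_d[b] / (M * sampling_d[b]):
        accepted.append(b)

print(np.bincount(accepted) / float(len(accepted)))  # roughly [0.33, 0.33, 0.33]
```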
+ non_same_room_ids = node_room_ids != room_id + non_hallway_ids = node_room_ids != -1 + good1_ids = np.logical_and(near_ids, np.logical_and(non_same_room_ids, non_hallway_ids)) + good2_ids = np.logical_and(near_ids, non_hallway_ids) + good3_ids = near_ids + if np.any(good1_ids): + end_node_id = rng.choice(np.where(good1_ids)[0]) + elif np.any(good2_ids): + end_node_id = rng.choice(np.where(good2_ids)[0]) + elif np.any(good3_ids): + end_node_id = rng.choice(np.where(good3_ids)[0]) + else: + logging.error('Did not find any good nodes.') + + # Compute distance to this new goal for doing distance queries. + dist, pred_map = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=True), source=gtG.vertex(end_node_id), + target=None, max_dist=max_dist_to_compute, pred_map=True) + dist = np.array(dist.get_array()) + pred_map = np.array(pred_map.get_array()) + + dists.append(dist) + pred_maps.append(pred_map) + end_node_ids.append(end_node_id) + + path = None + if compute_path: + path = get_path_ids(start_node_ids[i], end_node_ids[i], pred_map) + paths.append(path) + + return start_node_ids, end_node_ids, dists, pred_maps, paths + + +def rng_room_to_room(batch_size, gtG, rng, max_dist, max_dist_to_compute, + node_room_ids, nodes=None, compute_path=False): + # Sample one of the rooms, compute the distance field. Pick a destination in + # another room if possible otherwise anywhere outside this room. + dists = []; pred_maps = []; paths = []; start_node_ids = []; end_node_ids = []; + room_ids = np.unique(node_room_ids[node_room_ids[:,0] >= 0, 0]) + for i in range(batch_size): + room_id = rng.choice(room_ids) + end_node_id = rng.choice(np.where(node_room_ids[:,0] == room_id)[0]) + end_node_ids.append(end_node_id) + + # Compute distances. + dist, pred_map = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=True), source=gtG.vertex(end_node_id), + target=None, max_dist=max_dist_to_compute, pred_map=True) + dist = np.array(dist.get_array()) + pred_map = np.array(pred_map.get_array()) + dists.append(dist) + pred_maps.append(pred_map) + + # Randomly sample nodes which are within max_dist. + near_ids = dist <= max_dist + near_ids = near_ids[:, np.newaxis] + + # Check to see if there is a non-negative node which is close enough. + non_same_room_ids = node_room_ids != room_id + non_hallway_ids = node_room_ids != -1 + good1_ids = np.logical_and(near_ids, np.logical_and(non_same_room_ids, non_hallway_ids)) + good2_ids = np.logical_and(near_ids, non_hallway_ids) + good3_ids = near_ids + if np.any(good1_ids): + start_node_id = rng.choice(np.where(good1_ids)[0]) + elif np.any(good2_ids): + start_node_id = rng.choice(np.where(good2_ids)[0]) + elif np.any(good3_ids): + start_node_id = rng.choice(np.where(good3_ids)[0]) + else: + logging.error('Did not find any good nodes.') + + start_node_ids.append(start_node_id) + + path = None + if compute_path: + path = get_path_ids(start_node_ids[i], end_node_ids[i], pred_map) + paths.append(path) + + return start_node_ids, end_node_ids, dists, pred_maps, paths + + +def rng_target_dist_field(batch_size, gtG, rng, max_dist, max_dist_to_compute, + nodes=None, compute_path=False): + # Sample a single node, compute distance to all nodes less than max_dist, + # sample nodes which are a particular distance away. 
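Both room-based samplers above choose the goal with the same fallback cascade: prefer reachable nodes that lie in a different room and are not hallway nodes, then any reachable non-hallway node, then any reachable node at all. A short sketch of that cascade on synthetic room labels (all values here are made up):

```python
import numpy as np

rng = np.random.RandomState(0)
node_room_ids = np.array([3, 3, 3, 7, 7, -1, -1, 9])  # -1 marks hallway nodes
dist = np.array([0, 2, 4, 6, 30, 5, 8, 40])           # steps from the start node
start_room, max_dist = 3, 10

near = dist <= max_dist
good1 = near & (node_room_ids != start_room) & (node_room_ids != -1)  # other room
good2 = near & (node_room_ids != -1)                                  # not hallway
for mask in (good1, good2, near):
    if np.any(mask):
        goal = rng.choice(np.where(mask)[0])
        break
print(goal)   # 3: the only nearby node that is in a different, non-hallway room
```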
+ dists = []; pred_maps = []; paths = []; start_node_ids = [] + end_node_ids = rng.choice(gtG.num_vertices(), size=(batch_size,), + replace=False).tolist() + + for i in range(batch_size): + dist, pred_map = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=True), source=gtG.vertex(end_node_ids[i]), + target=None, max_dist=max_dist_to_compute, pred_map=True) + dist = np.array(dist.get_array()) + pred_map = np.array(pred_map.get_array()) + dists.append(dist) + pred_maps.append(pred_map) + + # Randomly sample nodes which are withing max_dist + near_ids = np.where(dist <= max_dist)[0] + start_node_id = rng.choice(near_ids, size=(1,), replace=False)[0] + start_node_ids.append(start_node_id) + + path = None + if compute_path: + path = get_path_ids(start_node_ids[i], end_node_ids[i], pred_map) + paths.append(path) + + return start_node_ids, end_node_ids, dists, pred_maps, paths diff --git a/cognitive_mapping_and_planning/src/map_utils.py b/cognitive_mapping_and_planning/src/map_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..1298bff24e798cb31bd40c106e603d5accd2b573 --- /dev/null +++ b/cognitive_mapping_and_planning/src/map_utils.py @@ -0,0 +1,244 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Various function to compute the ground truth map for training etc. +""" +import copy +import skimage.morphology +import numpy as np +import scipy.ndimage +import matplotlib.pyplot as plt +import PIL + +import src.utils as utils +import cv2 + +def _get_xy_bounding_box(vertex, padding): + """Returns the xy bounding box of the environment.""" + min_ = np.floor(np.min(vertex[:, :2], axis=0) - padding).astype(np.int) + max_ = np.ceil(np.max(vertex[:, :2], axis=0) + padding).astype(np.int) + return min_, max_ + +def _project_to_map(map, vertex, wt=None, ignore_points_outside_map=False): + """Projects points to map, returns how many points are present at each + location.""" + num_points = np.zeros((map.size[1], map.size[0])) + vertex_ = vertex[:, :2] - map.origin + vertex_ = np.round(vertex_ / map.resolution).astype(np.int) + if ignore_points_outside_map: + good_ind = np.all(np.array([vertex_[:,1] >= 0, vertex_[:,1] < map.size[1], + vertex_[:,0] >= 0, vertex_[:,0] < map.size[0]]), + axis=0) + vertex_ = vertex_[good_ind, :] + if wt is not None: + wt = wt[good_ind, :] + if wt is None: + np.add.at(num_points, (vertex_[:, 1], vertex_[:, 0]), 1) + else: + assert(wt.shape[0] == vertex.shape[0]), \ + 'number of weights should be same as vertices.' 
+ np.add.at(num_points, (vertex_[:, 1], vertex_[:, 0]), wt) + return num_points + +def make_map(padding, resolution, vertex=None, sc=1.): + """Returns a map structure.""" + min_, max_ = _get_xy_bounding_box(vertex*sc, padding=padding) + sz = np.ceil((max_ - min_ + 1) / resolution).astype(np.int32) + max_ = min_ + sz * resolution - 1 + map = utils.Foo(origin=min_, size=sz, max=max_, resolution=resolution, + padding=padding) + return map + +def _fill_holes(img, thresh): + """Fills holes less than thresh area (assumes 4 connectivity when computing + hole area.""" + l, n = scipy.ndimage.label(np.logical_not(img)) + img_ = img == True + cnts = np.bincount(l.reshape(-1)) + for i, cnt in enumerate(cnts): + if cnt < thresh: + l[l == i] = -1 + img_[l == -1] = True + return img_ + +def compute_traversibility(map, robot_base, robot_height, robot_radius, + valid_min, valid_max, num_point_threshold, shapess, + sc=100., n_samples_per_face=200): + """Returns a bit map with pixels that are traversible or not as long as the + robot center is inside this volume we are good colisions can be detected by + doing a line search on things, or walking from current location to final + location in the bitmap, or doing bwlabel on the traversibility map.""" + + tt = utils.Timer() + tt.tic() + num_obstcale_points = np.zeros((map.size[1], map.size[0])) + num_points = np.zeros((map.size[1], map.size[0])) + + for i, shapes in enumerate(shapess): + for j in range(shapes.get_number_of_meshes()): + p, face_areas, face_idx = shapes.sample_points_on_face_of_shape( + j, n_samples_per_face, sc) + wt = face_areas[face_idx]/n_samples_per_face + + ind = np.all(np.concatenate( + (p[:, [2]] > robot_base, + p[:, [2]] < robot_base + robot_height), axis=1),axis=1) + num_obstcale_points += _project_to_map(map, p[ind, :], wt[ind]) + + ind = np.all(np.concatenate( + (p[:, [2]] > valid_min, + p[:, [2]] < valid_max), axis=1),axis=1) + num_points += _project_to_map(map, p[ind, :], wt[ind]) + + selem = skimage.morphology.disk(robot_radius / map.resolution) + obstacle_free = skimage.morphology.binary_dilation( + _fill_holes(num_obstcale_points > num_point_threshold, 20), selem) != True + valid_space = _fill_holes(num_points > num_point_threshold, 20) + traversible = np.all(np.concatenate((obstacle_free[...,np.newaxis], + valid_space[...,np.newaxis]), axis=2), + axis=2) + # plt.imshow(np.concatenate((obstacle_free, valid_space, traversible), axis=1)) + # plt.show() + + map_out = copy.deepcopy(map) + map_out.num_obstcale_points = num_obstcale_points + map_out.num_points = num_points + map_out.traversible = traversible + map_out.obstacle_free = obstacle_free + map_out.valid_space = valid_space + tt.toc(log_at=1, log_str='src.map_utils.compute_traversibility: ') + return map_out + + +def resize_maps(map, map_scales, resize_method): + scaled_maps = [] + for i, sc in enumerate(map_scales): + if resize_method == 'antialiasing': + # Resize using open cv so that we can compute the size. + # Use PIL resize to use anti aliasing feature. + map_ = cv2.resize(map*1, None, None, fx=sc, fy=sc, interpolation=cv2.INTER_LINEAR) + w = map_.shape[1]; h = map_.shape[0] + + map_img = PIL.Image.fromarray((map*255).astype(np.uint8)) + map__img = map_img.resize((w,h), PIL.Image.ANTIALIAS) + map_ = np.asarray(map__img).astype(np.float32) + map_ = map_/255. 
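`_fill_holes` above closes small pockets in the binary occupancy estimate: connected components of the background smaller than a pixel-count threshold are flipped to foreground, so that sampling noise does not punch spurious holes into the traversibility map. A small standalone check of that behaviour:

```python
import numpy as np
import scipy.ndimage

img = np.ones((8, 8), dtype=bool)
img[1, 1] = False                   # a 1-pixel hole: should be filled
img[4:7, 4:7] = False               # a 9-pixel hole: should survive a threshold of 5

def fill_small_holes(img, thresh):
    labels, _ = scipy.ndimage.label(np.logical_not(img))  # 4-connectivity by default
    out = img.copy()
    counts = np.bincount(labels.ravel())
    for i, cnt in enumerate(counts):
        if i > 0 and cnt < thresh:   # label 0 is the non-hole region
            out[labels == i] = True
    return out

filled = fill_small_holes(img, thresh=5)
print(filled[1, 1], filled[5, 5])    # True False
```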
+ map_ = np.minimum(map_, 1.0) + map_ = np.maximum(map_, 0.0) + elif resize_method == 'linear_noantialiasing': + map_ = cv2.resize(map*1, None, None, fx=sc, fy=sc, interpolation=cv2.INTER_LINEAR) + else: + logging.error('Unknown resizing method') + scaled_maps.append(map_) + return scaled_maps + + +def pick_largest_cc(traversible): + out = scipy.ndimage.label(traversible)[0] + cnt = np.bincount(out.reshape(-1))[1:] + return out == np.argmax(cnt) + 1 + +def get_graph_origin_loc(rng, traversible): + """Erode the traversibility mask so that we get points in the bulk of the + graph, and not end up with a situation where the graph is localized in the + corner of a cramped room. Output Locs is in the coordinate frame of the + map.""" + + aa = pick_largest_cc(skimage.morphology.binary_erosion(traversible == True, + selem=np.ones((15,15)))) + y, x = np.where(aa > 0) + ind = rng.choice(y.size) + locs = np.array([x[ind], y[ind]]) + locs = locs + rng.rand(*(locs.shape)) - 0.5 + return locs + + +def generate_egocentric_maps(scaled_maps, map_scales, map_crop_sizes, loc, + x_axis, y_axis, theta): + maps = [] + for i, (map_, sc, map_crop_size) in enumerate(zip(scaled_maps, map_scales, map_crop_sizes)): + maps_i = np.array(get_map_to_predict(loc*sc, x_axis, y_axis, map_, + map_crop_size, + interpolation=cv2.INTER_LINEAR)[0]) + maps_i[np.isnan(maps_i)] = 0 + maps.append(maps_i) + return maps + +def generate_goal_images(map_scales, map_crop_sizes, n_ori, goal_dist, + goal_theta, rel_goal_orientation): + goal_dist = goal_dist[:,0] + goal_theta = goal_theta[:,0] + rel_goal_orientation = rel_goal_orientation[:,0] + + goals = []; + # Generate the map images. + for i, (sc, map_crop_size) in enumerate(zip(map_scales, map_crop_sizes)): + goal_i = np.zeros((goal_dist.shape[0], map_crop_size, map_crop_size, n_ori), + dtype=np.float32) + x = goal_dist*np.cos(goal_theta)*sc + (map_crop_size-1.)/2. + y = goal_dist*np.sin(goal_theta)*sc + (map_crop_size-1.)/2. 
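The goal image places a single point at a sub-pixel location (x, y) of the egocentric crop by splatting bilinear weights onto the four surrounding integer pixels of the channel selected by the relative goal orientation; the four weights sum to one whenever the point lies inside the crop. A tiny sketch of the splat itself:

```python
import numpy as np

def splat_bilinear(x, y, size):
    """Distribute unit mass at (x, y) over its four neighbouring pixels."""
    img = np.zeros((size, size), dtype=np.float32)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    for xi, wx in ((x0, x1 - x), (x1, x - x0)):
        for yi, wy in ((y0, y1 - y), (y1, y - y0)):
            if 0 <= xi < size and 0 <= yi < size:
                img[yi, xi] = wx * wy
    return img

g = splat_bilinear(2.25, 3.5, size=8)
print(g.sum())             # 1.0, since all four neighbours are inside the crop
print(np.argwhere(g > 0))  # the four [y, x] pixels around (3.5, 2.25)
```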
+ + for j in range(goal_dist.shape[0]): + gc = rel_goal_orientation[j] + x0 = np.floor(x[j]).astype(np.int32); x1 = x0 + 1; + y0 = np.floor(y[j]).astype(np.int32); y1 = y0 + 1; + if x0 >= 0 and x0 <= map_crop_size-1: + if y0 >= 0 and y0 <= map_crop_size-1: + goal_i[j, y0, x0, gc] = (x1-x[j])*(y1-y[j]) + if y1 >= 0 and y1 <= map_crop_size-1: + goal_i[j, y1, x0, gc] = (x1-x[j])*(y[j]-y0) + + if x1 >= 0 and x1 <= map_crop_size-1: + if y0 >= 0 and y0 <= map_crop_size-1: + goal_i[j, y0, x1, gc] = (x[j]-x0)*(y1-y[j]) + if y1 >= 0 and y1 <= map_crop_size-1: + goal_i[j, y1, x1, gc] = (x[j]-x0)*(y[j]-y0) + + goals.append(goal_i) + return goals + +def get_map_to_predict(src_locs, src_x_axiss, src_y_axiss, map, map_size, + interpolation=cv2.INTER_LINEAR): + fss = [] + valids = [] + + center = (map_size-1.0)/2.0 + dst_theta = np.pi/2.0 + dst_loc = np.array([center, center]) + dst_x_axis = np.array([np.cos(dst_theta), np.sin(dst_theta)]) + dst_y_axis = np.array([np.cos(dst_theta+np.pi/2), np.sin(dst_theta+np.pi/2)]) + + def compute_points(center, x_axis, y_axis): + points = np.zeros((3,2),dtype=np.float32) + points[0,:] = center + points[1,:] = center + x_axis + points[2,:] = center + y_axis + return points + + dst_points = compute_points(dst_loc, dst_x_axis, dst_y_axis) + for i in range(src_locs.shape[0]): + src_loc = src_locs[i,:] + src_x_axis = src_x_axiss[i,:] + src_y_axis = src_y_axiss[i,:] + src_points = compute_points(src_loc, src_x_axis, src_y_axis) + M = cv2.getAffineTransform(src_points, dst_points) + + fs = cv2.warpAffine(map, M, (map_size, map_size), None, flags=interpolation, + borderValue=np.NaN) + valid = np.invert(np.isnan(fs)) + valids.append(valid) + fss.append(fs) + return fss, valids + diff --git a/cognitive_mapping_and_planning/src/rotation_utils.py b/cognitive_mapping_and_planning/src/rotation_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..8d6d4f3cbdb1f808d210dce8b22fa3ba831d45a9 --- /dev/null +++ b/cognitive_mapping_and_planning/src/rotation_utils.py @@ -0,0 +1,73 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Utilities for generating and applying rotation matrices. 
+""" +import numpy as np + +ANGLE_EPS = 0.001 + + +def normalize(v): + return v / np.linalg.norm(v) + + +def get_r_matrix(ax_, angle): + ax = normalize(ax_) + if np.abs(angle) > ANGLE_EPS: + S_hat = np.array( + [[0.0, -ax[2], ax[1]], [ax[2], 0.0, -ax[0]], [-ax[1], ax[0], 0.0]], + dtype=np.float32) + R = np.eye(3) + np.sin(angle)*S_hat + \ + (1-np.cos(angle))*(np.linalg.matrix_power(S_hat, 2)) + else: + R = np.eye(3) + return R + + +def r_between(v_from_, v_to_): + v_from = normalize(v_from_) + v_to = normalize(v_to_) + ax = normalize(np.cross(v_from, v_to)) + angle = np.arccos(np.dot(v_from, v_to)) + return get_r_matrix(ax, angle) + + +def rotate_camera_to_point_at(up_from, lookat_from, up_to, lookat_to): + inputs = [up_from, lookat_from, up_to, lookat_to] + for i in range(4): + inputs[i] = normalize(np.array(inputs[i]).reshape((-1,))) + up_from, lookat_from, up_to, lookat_to = inputs + r1 = r_between(lookat_from, lookat_to) + + new_x = np.dot(r1, np.array([1, 0, 0]).reshape((-1, 1))).reshape((-1)) + to_x = normalize(np.cross(lookat_to, up_to)) + angle = np.arccos(np.dot(new_x, to_x)) + if angle > ANGLE_EPS: + if angle < np.pi - ANGLE_EPS: + ax = normalize(np.cross(new_x, to_x)) + flip = np.dot(lookat_to, ax) + if flip > 0: + r2 = get_r_matrix(lookat_to, angle) + elif flip < 0: + r2 = get_r_matrix(lookat_to, -1. * angle) + else: + # Angle of rotation is too close to 180 degrees, direction of rotation + # does not matter. + r2 = get_r_matrix(lookat_to, angle) + else: + r2 = np.eye(3) + return np.dot(r2, r1) + diff --git a/cognitive_mapping_and_planning/src/utils.py b/cognitive_mapping_and_planning/src/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..f58820c1f4cda35c0b38fb42f02d3f221924dc66 --- /dev/null +++ b/cognitive_mapping_and_planning/src/utils.py @@ -0,0 +1,168 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r"""Generaly Utilities. +""" + +import numpy as np, cPickle, os, time +import src.file_utils as fu +import logging + +class Timer(): + def __init__(self): + self.calls = 0. + self.start_time = 0. + self.time_per_call = 0. + self.total_time = 0. + self.last_log_time = 0. + + def tic(self): + self.start_time = time.time() + + def toc(self, average=True, log_at=-1, log_str='', type='calls'): + if self.start_time == 0: + logging.error('Timer not started by calling tic().') + t = time.time() + diff = time.time() - self.start_time + self.total_time += diff + self.calls += 1. 
+ self.time_per_call = self.total_time/self.calls + + if type == 'calls' and log_at > 0 and np.mod(self.calls, log_at) == 0: + _ = [] + logging.info('%s: %f seconds.', log_str, self.time_per_call) + elif type == 'time' and log_at > 0 and t - self.last_log_time >= log_at: + _ = [] + logging.info('%s: %f seconds.', log_str, self.time_per_call) + self.last_log_time = t + + if average: + return self.time_per_call + else: + return diff + +class Foo(object): + def __init__(self, **kwargs): + self.__dict__.update(kwargs) + def __str__(self): + str_ = '' + for v in vars(self).keys(): + a = getattr(self, v) + if True: #isinstance(v, object): + str__ = str(a) + str__ = str__.replace('\n', '\n ') + else: + str__ = str(a) + str_ += '{:s}: {:s}'.format(v, str__) + str_ += '\n' + return str_ + + +def dict_equal(dict1, dict2): + assert(set(dict1.keys()) == set(dict2.keys())), "Sets of keys between 2 dictionaries are different." + for k in dict1.keys(): + assert(type(dict1[k]) == type(dict2[k])), "Type of key '{:s}' if different.".format(k) + if type(dict1[k]) == np.ndarray: + assert(dict1[k].dtype == dict2[k].dtype), "Numpy Type of key '{:s}' if different.".format(k) + assert(np.allclose(dict1[k], dict2[k])), "Value for key '{:s}' do not match.".format(k) + else: + assert(dict1[k] == dict2[k]), "Value for key '{:s}' do not match.".format(k) + return True + +def subplot(plt, Y_X, sz_y_sz_x = (10, 10)): + Y,X = Y_X + sz_y, sz_x = sz_y_sz_x + plt.rcParams['figure.figsize'] = (X*sz_x, Y*sz_y) + fig, axes = plt.subplots(Y, X) + plt.subplots_adjust(wspace=0.1, hspace=0.1) + return fig, axes + +def tic_toc_print(interval, string): + global tic_toc_print_time_old + if 'tic_toc_print_time_old' not in globals(): + tic_toc_print_time_old = time.time() + print string + else: + new_time = time.time() + if new_time - tic_toc_print_time_old > interval: + tic_toc_print_time_old = new_time; + print string + +def mkdir_if_missing(output_dir): + if not fu.exists(output_dir): + fu.makedirs(output_dir) + +def save_variables(pickle_file_name, var, info, overwrite = False): + if fu.exists(pickle_file_name) and overwrite == False: + raise Exception('{:s} exists and over write is false.'.format(pickle_file_name)) + # Construct the dictionary + assert(type(var) == list); assert(type(info) == list); + d = {} + for i in xrange(len(var)): + d[info[i]] = var[i] + with fu.fopen(pickle_file_name, 'w') as f: + cPickle.dump(d, f, cPickle.HIGHEST_PROTOCOL) + +def load_variables(pickle_file_name): + if fu.exists(pickle_file_name): + with fu.fopen(pickle_file_name, 'r') as f: + d = cPickle.load(f) + return d + else: + raise Exception('{:s} does not exists.'.format(pickle_file_name)) + +def voc_ap(rec, prec): + rec = rec.reshape((-1,1)) + prec = prec.reshape((-1,1)) + z = np.zeros((1,1)) + o = np.ones((1,1)) + mrec = np.vstack((z, rec, o)) + mpre = np.vstack((z, prec, z)) + for i in range(len(mpre)-2, -1, -1): + mpre[i] = max(mpre[i], mpre[i+1]) + + I = np.where(mrec[1:] != mrec[0:-1])[0]+1; + ap = 0; + for i in I: + ap = ap + (mrec[i] - mrec[i-1])*mpre[i]; + return ap + +def tight_imshow_figure(plt, figsize=None): + fig = plt.figure(figsize=figsize) + ax = plt.Axes(fig, [0,0,1,1]) + ax.set_axis_off() + fig.add_axes(ax) + return fig, ax + +def calc_pr(gt, out, wt=None): + if wt is None: + wt = np.ones((gt.size,1)) + + gt = gt.astype(np.float64).reshape((-1,1)) + wt = wt.astype(np.float64).reshape((-1,1)) + out = out.astype(np.float64).reshape((-1,1)) + + gt = gt*wt + tog = np.concatenate([gt, wt, out], axis=1)*1. 
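`calc_pr` sorts predictions by score and turns cumulative sums into a precision/recall curve, which `voc_ap` then integrates into an average-precision value. A small standalone version of the core computation on toy labels, ignoring the optional per-example weights:

```python
import numpy as np

gt = np.array([1, 0, 1, 1, 0], dtype=np.float64)   # ground-truth labels
scores = np.array([0.9, 0.8, 0.7, 0.3, 0.2])       # detector confidences

order = np.argsort(scores)[::-1]                   # most confident first
tp = np.cumsum(gt[order])
prec = tp / np.arange(1, len(gt) + 1)              # true positives / predictions so far
rec = tp / gt.sum()                                # true positives / all positives

print(np.round(prec, 3))  # [1.    0.5   0.667 0.75  0.6  ]
print(np.round(rec, 3))   # [0.333 0.333 0.667 1.    1.   ]
```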
+ ind = np.argsort(tog[:,2], axis=0)[::-1] + tog = tog[ind,:] + cumsumsortgt = np.cumsum(tog[:,0]) + cumsumsortwt = np.cumsum(tog[:,1]) + prec = cumsumsortgt / cumsumsortwt + rec = cumsumsortgt / np.sum(tog[:,0]) + + ap = voc_ap(rec, prec) + return ap, rec, prec + diff --git a/cognitive_mapping_and_planning/tfcode/__init__.py b/cognitive_mapping_and_planning/tfcode/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/tfcode/cmp.py b/cognitive_mapping_and_planning/tfcode/cmp.py new file mode 100644 index 0000000000000000000000000000000000000000..228ef90fddcd9ff41b26795544d93a1f18466158 --- /dev/null +++ b/cognitive_mapping_and_planning/tfcode/cmp.py @@ -0,0 +1,553 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Code for setting up the network for CMP. + +Sets up the mapper and the planner. +""" + +import sys, os, numpy as np +import matplotlib.pyplot as plt +import copy +import argparse, pprint +import time + + +import tensorflow as tf + +from tensorflow.contrib import slim +from tensorflow.contrib.slim import arg_scope + +import logging +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +from src import utils +import src.file_utils as fu +import tfcode.nav_utils as nu +import tfcode.cmp_utils as cu +import tfcode.cmp_summary as cmp_s +from tfcode import tf_utils + +value_iteration_network = cu.value_iteration_network +rotate_preds = cu.rotate_preds +deconv = cu.deconv +get_visual_frustum = cu.get_visual_frustum +fr_v2 = cu.fr_v2 + +setup_train_step_kwargs = nu.default_train_step_kwargs +compute_losses_multi_or = nu.compute_losses_multi_or + +get_repr_from_image = nu.get_repr_from_image + +_save_d_at_t = nu.save_d_at_t +_save_all = nu.save_all +_eval_ap = nu.eval_ap +_eval_dist = nu.eval_dist +_plot_trajectories = nu.plot_trajectories + +_vis_readout_maps = cmp_s._vis_readout_maps +_vis = cmp_s._vis +_summary_vis = cmp_s._summary_vis +_summary_readout_maps = cmp_s._summary_readout_maps +_add_summaries = cmp_s._add_summaries + +def _inputs(problem): + # Set up inputs. + with tf.name_scope('inputs'): + inputs = [] + inputs.append(('orig_maps', tf.float32, + (problem.batch_size, 1, None, None, 1))) + inputs.append(('goal_loc', tf.float32, + (problem.batch_size, problem.num_goals, 2))) + common_input_data, _ = tf_utils.setup_inputs(inputs) + + inputs = [] + if problem.input_type == 'vision': + # Multiple images from an array of cameras. 
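
# _inputs above only declares (name, dtype, shape) specs; tf_utils.setup_inputs
# (defined elsewhere in this patch) materializes them and, judging by how the
# result is indexed later (e.g. input_tensors['common']['orig_maps']), returns
# a dict keyed by name. A minimal stand-in with that assumed behaviour, using
# hypothetical sizes (TF 1.x style):
import tensorflow as tf

def make_placeholders(specs):
  # specs: list of (name, dtype, shape); None in a shape means "unknown".
  return {name: tf.placeholder(dtype, shape=shape, name=name)
          for name, dtype, shape in specs}

common = make_placeholders([
    ('orig_maps', tf.float32, (4, 1, None, None, 1)),   # batch_size=4 (made up)
    ('goal_loc', tf.float32, (4, 2, 2)),                 # num_goals=2 (made up)
])
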
+ inputs.append(('imgs', tf.float32, + (problem.batch_size, None, len(problem.aux_delta_thetas)+1, + problem.img_height, problem.img_width, + problem.img_channels))) + elif problem.input_type == 'analytical_counts': + for i in range(len(problem.map_crop_sizes)): + inputs.append(('analytical_counts_{:d}'.format(i), tf.float32, + (problem.batch_size, None, problem.map_crop_sizes[i], + problem.map_crop_sizes[i], problem.map_channels))) + + if problem.outputs.readout_maps: + for i in range(len(problem.readout_maps_crop_sizes)): + inputs.append(('readout_maps_{:d}'.format(i), tf.float32, + (problem.batch_size, None, + problem.readout_maps_crop_sizes[i], + problem.readout_maps_crop_sizes[i], + problem.readout_maps_channels))) + + for i in range(len(problem.map_crop_sizes)): + inputs.append(('ego_goal_imgs_{:d}'.format(i), tf.float32, + (problem.batch_size, None, problem.map_crop_sizes[i], + problem.map_crop_sizes[i], problem.goal_channels))) + for s in ['sum_num', 'sum_denom', 'max_denom']: + inputs.append(('running_'+s+'_{:d}'.format(i), tf.float32, + (problem.batch_size, 1, problem.map_crop_sizes[i], + problem.map_crop_sizes[i], problem.map_channels))) + + inputs.append(('incremental_locs', tf.float32, + (problem.batch_size, None, 2))) + inputs.append(('incremental_thetas', tf.float32, + (problem.batch_size, None, 1))) + inputs.append(('step_number', tf.int32, (1, None, 1))) + inputs.append(('node_ids', tf.int32, (problem.batch_size, None, + problem.node_ids_dim))) + inputs.append(('perturbs', tf.float32, (problem.batch_size, None, + problem.perturbs_dim))) + + # For plotting result plots + inputs.append(('loc_on_map', tf.float32, (problem.batch_size, None, 2))) + inputs.append(('gt_dist_to_goal', tf.float32, (problem.batch_size, None, 1))) + + step_input_data, _ = tf_utils.setup_inputs(inputs) + + inputs = [] + inputs.append(('action', tf.int32, (problem.batch_size, None, problem.num_actions))) + train_data, _ = tf_utils.setup_inputs(inputs) + train_data.update(step_input_data) + train_data.update(common_input_data) + return common_input_data, step_input_data, train_data + +def readout_general(multi_scale_belief, num_neurons, strides, layers_per_block, + kernel_size, batch_norm_is_training_op, wt_decay): + multi_scale_belief = tf.stop_gradient(multi_scale_belief) + with tf.variable_scope('readout_maps_deconv'): + x, outs = deconv(multi_scale_belief, batch_norm_is_training_op, + wt_decay=wt_decay, neurons=num_neurons, strides=strides, + layers_per_block=layers_per_block, kernel_size=kernel_size, + conv_fn=slim.conv2d_transpose, offset=0, + name='readout_maps_deconv') + probs = tf.sigmoid(x) + return x, probs + + +def running_combine(fss_logits, confs_probs, incremental_locs, + incremental_thetas, previous_sum_num, previous_sum_denom, + previous_max_denom, map_size, num_steps): + # fss_logits is B x N x H x W x C + # confs_logits is B x N x H x W x C + # incremental_locs is B x N x 2 + # incremental_thetas is B x N x 1 + # previous_sum_num etc is B x 1 x H x W x C + + with tf.name_scope('combine_{:d}'.format(num_steps)): + running_sum_nums_ = []; running_sum_denoms_ = []; + running_max_denoms_ = []; + + fss_logits_ = tf.unstack(fss_logits, axis=1, num=num_steps) + confs_probs_ = tf.unstack(confs_probs, axis=1, num=num_steps) + incremental_locs_ = tf.unstack(incremental_locs, axis=1, num=num_steps) + incremental_thetas_ = tf.unstack(incremental_thetas, axis=1, num=num_steps) + running_sum_num = tf.unstack(previous_sum_num, axis=1, num=1)[0] + running_sum_denom = tf.unstack(previous_sum_denom, 
axis=1, num=1)[0] + running_max_denom = tf.unstack(previous_max_denom, axis=1, num=1)[0] + + for i in range(num_steps): + # Rotate the previous running_num and running_denom + running_sum_num, running_sum_denom, running_max_denom = rotate_preds( + incremental_locs_[i], incremental_thetas_[i], map_size, + [running_sum_num, running_sum_denom, running_max_denom], + output_valid_mask=False)[0] + # print i, num_steps, running_sum_num.get_shape().as_list() + running_sum_num = running_sum_num + fss_logits_[i] * confs_probs_[i] + running_sum_denom = running_sum_denom + confs_probs_[i] + running_max_denom = tf.maximum(running_max_denom, confs_probs_[i]) + running_sum_nums_.append(running_sum_num) + running_sum_denoms_.append(running_sum_denom) + running_max_denoms_.append(running_max_denom) + + running_sum_nums = tf.stack(running_sum_nums_, axis=1) + running_sum_denoms = tf.stack(running_sum_denoms_, axis=1) + running_max_denoms = tf.stack(running_max_denoms_, axis=1) + return running_sum_nums, running_sum_denoms, running_max_denoms + +def get_map_from_images(imgs, mapper_arch, task_params, freeze_conv, wt_decay, + is_training, batch_norm_is_training_op, num_maps, + split_maps=True): + # Hit image with a resnet. + n_views = len(task_params.aux_delta_thetas) + 1 + out = utils.Foo() + + images_reshaped = tf.reshape(imgs, + shape=[-1, task_params.img_height, + task_params.img_width, + task_params.img_channels], name='re_image') + + x, out.vars_to_restore = get_repr_from_image( + images_reshaped, task_params.modalities, task_params.data_augment, + mapper_arch.encoder, freeze_conv, wt_decay, is_training) + + # Reshape into nice things so that these can be accumulated over time steps + # for faster backprop. + sh_before = x.get_shape().as_list() + out.encoder_output = tf.reshape(x, shape=[task_params.batch_size, -1, n_views] + sh_before[1:]) + x = tf.reshape(out.encoder_output, shape=[-1] + sh_before[1:]) + + # Add a layer to reduce dimensions for a fc layer. + if mapper_arch.dim_reduce_neurons > 0: + ks = 1; neurons = mapper_arch.dim_reduce_neurons; + init_var = np.sqrt(2.0/(ks**2)/neurons) + batch_norm_param = mapper_arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + out.conv_feat = slim.conv2d(x, neurons, kernel_size=ks, stride=1, + normalizer_fn=slim.batch_norm, normalizer_params=batch_norm_param, + padding='SAME', scope='dim_reduce', + weights_regularizer=slim.l2_regularizer(wt_decay), + weights_initializer=tf.random_normal_initializer(stddev=init_var)) + reshape_conv_feat = slim.flatten(out.conv_feat) + sh = reshape_conv_feat.get_shape().as_list() + out.reshape_conv_feat = tf.reshape(reshape_conv_feat, shape=[-1, sh[1]*n_views]) + + with tf.variable_scope('fc'): + # Fully connected layers to compute the representation in top-view space. + fc_batch_norm_param = {'center': True, 'scale': True, + 'activation_fn':tf.nn.relu, + 'is_training': batch_norm_is_training_op} + f = out.reshape_conv_feat + out_neurons = (mapper_arch.fc_out_size**2)*mapper_arch.fc_out_neurons + neurons = mapper_arch.fc_neurons + [out_neurons] + f, _ = tf_utils.fc_network(f, neurons=neurons, wt_decay=wt_decay, + name='fc', offset=0, + batch_norm_param=fc_batch_norm_param, + is_training=is_training, + dropout_ratio=mapper_arch.fc_dropout) + f = tf.reshape(f, shape=[-1, mapper_arch.fc_out_size, + mapper_arch.fc_out_size, + mapper_arch.fc_out_neurons], name='re_fc') + + # Use pool5 to predict the free space map via deconv layers. 
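
# Shape bookkeeping for get_map_from_images above, with made-up sizes so the
# reshapes are easier to follow (the real values come from task_params and
# mapper_arch):
batch_size, time_steps, n_views = 4, 8, 2   # hypothetical
img_h, img_w, img_c = 224, 224, 3           # hypothetical
fc_out_size, fc_out_neurons = 8, 32         # hypothetical

# Images arrive as (batch, time, n_views, H, W, C) and are folded into one big
# batch of frames for the convolutional encoder.
encoder_batch = batch_size * time_steps * n_views
assert (encoder_batch, img_h, img_w, img_c) == (64, 224, 224, 3)

# After the fully-connected layers the per-step feature is reshaped into a
# small spatial grid, which the deconv stack below upsamples into the
# egocentric free-space prediction.
out_neurons = (fc_out_size ** 2) * fc_out_neurons
assert out_neurons == 2048   # reshaped to (8, 8, 32) before the deconv.
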
+ with tf.variable_scope('deconv'): + x, outs = deconv(f, batch_norm_is_training_op, wt_decay=wt_decay, + neurons=mapper_arch.deconv_neurons, + strides=mapper_arch.deconv_strides, + layers_per_block=mapper_arch.deconv_layers_per_block, + kernel_size=mapper_arch.deconv_kernel_size, + conv_fn=slim.conv2d_transpose, offset=0, name='deconv') + + # Reshape x the right way. + sh = x.get_shape().as_list() + x = tf.reshape(x, shape=[task_params.batch_size, -1] + sh[1:]) + out.deconv_output = x + + # Separate out the map and the confidence predictions, pass the confidence + # through a sigmoid. + if split_maps: + with tf.name_scope('split'): + out_all = tf.split(value=x, axis=4, num_or_size_splits=2*num_maps) + out.fss_logits = out_all[:num_maps] + out.confs_logits = out_all[num_maps:] + with tf.name_scope('sigmoid'): + out.confs_probs = [tf.nn.sigmoid(x) for x in out.confs_logits] + return out + +def setup_to_run(m, args, is_training, batch_norm_is_training, summary_mode): + assert(args.arch.multi_scale), 'removed support for old single scale code.' + # Set up the model. + tf.set_random_seed(args.solver.seed) + task_params = args.navtask.task_params + + batch_norm_is_training_op = \ + tf.placeholder_with_default(batch_norm_is_training, shape=[], + name='batch_norm_is_training_op') + + # Setup the inputs + m.input_tensors = {} + m.train_ops = {} + m.input_tensors['common'], m.input_tensors['step'], m.input_tensors['train'] = \ + _inputs(task_params) + + m.init_fn = None + + if task_params.input_type == 'vision': + m.vision_ops = get_map_from_images( + m.input_tensors['step']['imgs'], args.mapper_arch, + task_params, args.solver.freeze_conv, + args.solver.wt_decay, is_training, batch_norm_is_training_op, + num_maps=len(task_params.map_crop_sizes)) + + # Load variables from snapshot if needed. + if args.solver.pretrained_path is not None: + m.init_fn = slim.assign_from_checkpoint_fn(args.solver.pretrained_path, + m.vision_ops.vars_to_restore) + + # Set up caching of vision features if needed. + if args.solver.freeze_conv: + m.train_ops['step_data_cache'] = [m.vision_ops.encoder_output] + else: + m.train_ops['step_data_cache'] = [] + + # Set up blobs that are needed for the computation in rest of the graph. + m.ego_map_ops = m.vision_ops.fss_logits + m.coverage_ops = m.vision_ops.confs_probs + + # Zero pad these to make them same size as what the planner expects. 
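
# How the per-step mapper outputs are fused over time (see running_combine
# above and the occupancy/conf computation that follows): a confidence-weighted
# running average of the free-space logits. The spatial re-alignment done by
# rotate_preds is ignored here; toy single-cell, three-step numbers:
import numpy as np

logits = np.array([2.0, -1.0, 3.0])   # per-step free-space logits (made up)
confs = np.array([0.9, 0.2, 0.8])     # per-step confidences in [0, 1]

sum_num = np.cumsum(logits * confs)        # running_sum_num
sum_denom = np.cumsum(confs)               # running_sum_denom
max_denom = np.maximum.accumulate(confs)   # running_max_denom

occupancy = sum_num / np.maximum(sum_denom, 0.001)   # fused estimate per step
conf = max_denom   # roughly: has this cell ever been observed confidently?
print(occupancy, conf)
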
+ for i in range(len(m.ego_map_ops)): + if args.mapper_arch.pad_map_with_zeros_each[i] > 0: + paddings = np.zeros((5,2), dtype=np.int32) + paddings[2:4,:] = args.mapper_arch.pad_map_with_zeros_each[i] + paddings_op = tf.constant(paddings, dtype=tf.int32) + m.ego_map_ops[i] = tf.pad(m.ego_map_ops[i], paddings=paddings_op) + m.coverage_ops[i] = tf.pad(m.coverage_ops[i], paddings=paddings_op) + + elif task_params.input_type == 'analytical_counts': + m.ego_map_ops = []; m.coverage_ops = [] + for i in range(len(task_params.map_crop_sizes)): + ego_map_op = m.input_tensors['step']['analytical_counts_{:d}'.format(i)] + coverage_op = tf.cast(tf.greater_equal( + tf.reduce_max(ego_map_op, reduction_indices=[4], + keep_dims=True), 1), tf.float32) + coverage_op = tf.ones_like(ego_map_op) * coverage_op + m.ego_map_ops.append(ego_map_op) + m.coverage_ops.append(coverage_op) + m.train_ops['step_data_cache'] = [] + + num_steps = task_params.num_steps + num_goals = task_params.num_goals + + map_crop_size_ops = [] + for map_crop_size in task_params.map_crop_sizes: + map_crop_size_ops.append(tf.constant(map_crop_size, dtype=tf.int32, shape=(2,))) + + with tf.name_scope('check_size'): + is_single_step = tf.equal(tf.unstack(tf.shape(m.ego_map_ops[0]), num=5)[1], 1) + + fr_ops = []; value_ops = []; + fr_intermediate_ops = []; value_intermediate_ops = []; + crop_value_ops = []; + resize_crop_value_ops = []; + confs = []; occupancys = []; + + previous_value_op = None + updated_state = []; state_names = []; + + for i in range(len(task_params.map_crop_sizes)): + map_crop_size = task_params.map_crop_sizes[i] + with tf.variable_scope('scale_{:d}'.format(i)): + # Accumulate the map. + fn = lambda ns: running_combine( + m.ego_map_ops[i], + m.coverage_ops[i], + m.input_tensors['step']['incremental_locs'] * task_params.map_scales[i], + m.input_tensors['step']['incremental_thetas'], + m.input_tensors['step']['running_sum_num_{:d}'.format(i)], + m.input_tensors['step']['running_sum_denom_{:d}'.format(i)], + m.input_tensors['step']['running_max_denom_{:d}'.format(i)], + map_crop_size, ns) + + running_sum_num, running_sum_denom, running_max_denom = \ + tf.cond(is_single_step, lambda: fn(1), lambda: fn(num_steps*num_goals)) + updated_state += [running_sum_num, running_sum_denom, running_max_denom] + state_names += ['running_sum_num_{:d}'.format(i), + 'running_sum_denom_{:d}'.format(i), + 'running_max_denom_{:d}'.format(i)] + + # Concat the accumulated map and goal + occupancy = running_sum_num / tf.maximum(running_sum_denom, 0.001) + conf = running_max_denom + # print occupancy.get_shape().as_list() + + # Concat occupancy, how much occupied and goal. + with tf.name_scope('concat'): + sh = [-1, map_crop_size, map_crop_size, task_params.map_channels] + occupancy = tf.reshape(occupancy, shape=sh) + conf = tf.reshape(conf, shape=sh) + + sh = [-1, map_crop_size, map_crop_size, task_params.goal_channels] + goal = tf.reshape(m.input_tensors['step']['ego_goal_imgs_{:d}'.format(i)], shape=sh) + to_concat = [occupancy, conf, goal] + + if previous_value_op is not None: + to_concat.append(previous_value_op) + + x = tf.concat(to_concat, 3) + + # Pass the map, previous rewards and the goal through a few convolutional + # layers to get fR. 
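
# Structure of the per-scale planner that follows (repeated for each
# map_crop_size in turn): the accumulated occupancy, the confidence, the goal
# image, and, after the first scale, the upsampled value map from the previous
# scale are concatenated and passed through fr_v2 to produce an intermediate
# map fR; value iteration is (optionally) run on fR; the centre of the
# resulting value map is cropped out; and, except at the last scale, it is
# bilinearly resized to the next scale's crop size so it can be fed forward.
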
+ fr_op, fr_intermediate_op = fr_v2( + x, output_neurons=args.arch.fr_neurons, + inside_neurons=args.arch.fr_inside_neurons, + is_training=batch_norm_is_training_op, name='fr', + wt_decay=args.solver.wt_decay, stride=args.arch.fr_stride) + + # Do Value Iteration on the fR + if args.arch.vin_num_iters > 0: + value_op, value_intermediate_op = value_iteration_network( + fr_op, num_iters=args.arch.vin_num_iters, + val_neurons=args.arch.vin_val_neurons, + action_neurons=args.arch.vin_action_neurons, + kernel_size=args.arch.vin_ks, share_wts=args.arch.vin_share_wts, + name='vin', wt_decay=args.solver.wt_decay) + else: + value_op = fr_op + value_intermediate_op = [] + + # Crop out and upsample the previous value map. + remove = args.arch.crop_remove_each + if remove > 0: + crop_value_op = value_op[:, remove:-remove, remove:-remove,:] + else: + crop_value_op = value_op + crop_value_op = tf.reshape(crop_value_op, shape=[-1, args.arch.value_crop_size, + args.arch.value_crop_size, + args.arch.vin_val_neurons]) + if i < len(task_params.map_crop_sizes)-1: + # Reshape it to shape of the next scale. + previous_value_op = tf.image.resize_bilinear(crop_value_op, + map_crop_size_ops[i+1], + align_corners=True) + resize_crop_value_ops.append(previous_value_op) + + occupancys.append(occupancy) + confs.append(conf) + value_ops.append(value_op) + crop_value_ops.append(crop_value_op) + fr_ops.append(fr_op) + fr_intermediate_ops.append(fr_intermediate_op) + + m.value_ops = value_ops + m.value_intermediate_ops = value_intermediate_ops + m.fr_ops = fr_ops + m.fr_intermediate_ops = fr_intermediate_ops + m.final_value_op = crop_value_op + m.crop_value_ops = crop_value_ops + m.resize_crop_value_ops = resize_crop_value_ops + m.confs = confs + m.occupancys = occupancys + + sh = [-1, args.arch.vin_val_neurons*((args.arch.value_crop_size)**2)] + m.value_features_op = tf.reshape(m.final_value_op, sh, name='reshape_value_op') + + # Determine what action to take. 
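
# value_iteration_network above runs value iteration with learned convolutions:
# each iteration concatenates the current value map with fR, applies a
# convolution producing action_neurons * val_neurons channels, and max-pools
# over the action dimension. The classic tabular update it mimics, on a tiny
# deterministic grid world with a single goal cell (made-up numbers):
import numpy as np

H = W = 8
reward = np.full((H, W), -0.04)   # small step penalty everywhere
reward[6, 6] = 1.0                # goal cell
gamma = 0.95
V = np.zeros((H, W))

for _ in range(40):               # analogous to vin_num_iters
  Vp = np.pad(V, 1, mode='edge')  # moves off the grid stay in place
  neighbours = np.stack([Vp[:-2, 1:-1], Vp[2:, 1:-1],
                         Vp[1:-1, :-2], Vp[1:-1, 2:]])   # up, down, left, right
  V = reward + gamma * neighbours.max(axis=0)            # max over actions

print(V[6, 6], V[0, 0])   # values grow as cells get closer to the goal
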
+ with tf.variable_scope('action_pred'): + batch_norm_param = args.arch.pred_batch_norm_param + if batch_norm_param is not None: + batch_norm_param['is_training'] = batch_norm_is_training_op + m.action_logits_op, _ = tf_utils.fc_network( + m.value_features_op, neurons=args.arch.pred_neurons, + wt_decay=args.solver.wt_decay, name='pred', offset=0, + num_pred=task_params.num_actions, + batch_norm_param=batch_norm_param) + m.action_prob_op = tf.nn.softmax(m.action_logits_op) + + init_state = tf.constant(0., dtype=tf.float32, shape=[ + task_params.batch_size, 1, map_crop_size, map_crop_size, + task_params.map_channels]) + + m.train_ops['state_names'] = state_names + m.train_ops['updated_state'] = updated_state + m.train_ops['init_state'] = [init_state for _ in updated_state] + + m.train_ops['step'] = m.action_prob_op + m.train_ops['common'] = [m.input_tensors['common']['orig_maps'], + m.input_tensors['common']['goal_loc']] + m.train_ops['batch_norm_is_training_op'] = batch_norm_is_training_op + m.loss_ops = []; m.loss_ops_names = []; + + if args.arch.readout_maps: + with tf.name_scope('readout_maps'): + all_occupancys = tf.concat(m.occupancys + m.confs, 3) + readout_maps, probs = readout_general( + all_occupancys, num_neurons=args.arch.rom_arch.num_neurons, + strides=args.arch.rom_arch.strides, + layers_per_block=args.arch.rom_arch.layers_per_block, + kernel_size=args.arch.rom_arch.kernel_size, + batch_norm_is_training_op=batch_norm_is_training_op, + wt_decay=args.solver.wt_decay) + + gt_ego_maps = [m.input_tensors['step']['readout_maps_{:d}'.format(i)] + for i in range(len(task_params.readout_maps_crop_sizes))] + m.readout_maps_gt = tf.concat(gt_ego_maps, 4) + gt_shape = tf.shape(m.readout_maps_gt) + m.readout_maps_logits = tf.reshape(readout_maps, gt_shape) + m.readout_maps_probs = tf.reshape(probs, gt_shape) + + # Add a loss op + m.readout_maps_loss_op = tf.losses.sigmoid_cross_entropy( + tf.reshape(m.readout_maps_gt, [-1, len(task_params.readout_maps_crop_sizes)]), + tf.reshape(readout_maps, [-1, len(task_params.readout_maps_crop_sizes)]), + scope='loss') + m.readout_maps_loss_op = 10.*m.readout_maps_loss_op + + ewma_decay = 0.99 if is_training else 0.0 + weight = tf.ones_like(m.input_tensors['train']['action'], dtype=tf.float32, + name='weight') + m.reg_loss_op, m.data_loss_op, m.total_loss_op, m.acc_ops = \ + compute_losses_multi_or(m.action_logits_op, + m.input_tensors['train']['action'], weights=weight, + num_actions=task_params.num_actions, + data_loss_wt=args.solver.data_loss_wt, + reg_loss_wt=args.solver.reg_loss_wt, + ewma_decay=ewma_decay) + + if args.arch.readout_maps: + m.total_loss_op = m.total_loss_op + m.readout_maps_loss_op + m.loss_ops += [m.readout_maps_loss_op] + m.loss_ops_names += ['readout_maps_loss'] + + m.loss_ops += [m.reg_loss_op, m.data_loss_op, m.total_loss_op] + m.loss_ops_names += ['reg_loss', 'data_loss', 'total_loss'] + + if args.solver.freeze_conv: + vars_to_optimize = list(set(tf.trainable_variables()) - + set(m.vision_ops.vars_to_restore)) + else: + vars_to_optimize = None + + m.lr_op, m.global_step_op, m.train_op, m.should_stop_op, m.optimizer, \ + m.sync_optimizer = tf_utils.setup_training( + m.total_loss_op, + args.solver.initial_learning_rate, + args.solver.steps_per_decay, + args.solver.learning_rate_decay, + args.solver.momentum, + args.solver.max_steps, + args.solver.sync, + args.solver.adjust_lr_sync, + args.solver.num_workers, + args.solver.task, + vars_to_optimize=vars_to_optimize, + clip_gradient_norm=args.solver.clip_gradient_norm, + 
typ=args.solver.typ, momentum2=args.solver.momentum2, + adam_eps=args.solver.adam_eps) + + if args.arch.sample_gt_prob_type == 'inverse_sigmoid_decay': + m.sample_gt_prob_op = tf_utils.inverse_sigmoid_decay(args.arch.isd_k, + m.global_step_op) + elif args.arch.sample_gt_prob_type == 'zero': + m.sample_gt_prob_op = tf.constant(-1.0, dtype=tf.float32) + + elif args.arch.sample_gt_prob_type.split('_')[0] == 'step': + step = int(args.arch.sample_gt_prob_type.split('_')[1]) + m.sample_gt_prob_op = tf_utils.step_gt_prob( + step, m.input_tensors['step']['step_number'][0,0,0]) + + m.sample_action_type = args.arch.action_sample_type + m.sample_action_combine_type = args.arch.action_sample_combine_type + + m.summary_ops = { + summary_mode: _add_summaries(m, args, summary_mode, + args.summary.arop_full_summary_iters)} + + m.init_op = tf.group(tf.global_variables_initializer(), + tf.local_variables_initializer()) + m.saver_op = tf.train.Saver(keep_checkpoint_every_n_hours=4, + write_version=tf.train.SaverDef.V2) + return m diff --git a/cognitive_mapping_and_planning/tfcode/cmp_summary.py b/cognitive_mapping_and_planning/tfcode/cmp_summary.py new file mode 100644 index 0000000000000000000000000000000000000000..55313bfbd52a9e079e1de5093ae1882a9bf1d858 --- /dev/null +++ b/cognitive_mapping_and_planning/tfcode/cmp_summary.py @@ -0,0 +1,213 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Code for setting up summaries for CMP. +""" + +import sys, os, numpy as np +import matplotlib.pyplot as plt + + +import tensorflow as tf + +from tensorflow.contrib import slim +from tensorflow.contrib.slim import arg_scope + +import logging +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +from src import utils +import src.file_utils as fu +import tfcode.nav_utils as nu + +def _vis_readout_maps(outputs, global_step, output_dir, metric_summary, N): + # outputs is [gt_map, pred_map]: + if N >= 0: + outputs = outputs[:N] + N = len(outputs) + + plt.set_cmap('jet') + fig, axes = utils.subplot(plt, (N, outputs[0][0].shape[4]*2), (5,5)) + axes = axes.ravel()[::-1].tolist() + for i in range(N): + gt_map, pred_map = outputs[i] + for j in [0]: + for k in range(gt_map.shape[4]): + # Display something like the midpoint of the trajectory. + id = np.int(gt_map.shape[1]/2) + + ax = axes.pop(); + ax.imshow(gt_map[j,id,:,:,k], origin='lower', interpolation='none', + vmin=0., vmax=1.) + ax.set_axis_off(); + if i == 0: ax.set_title('gt_map') + + ax = axes.pop(); + ax.imshow(pred_map[j,id,:,:,k], origin='lower', interpolation='none', + vmin=0., vmax=1.) 
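
# The 'inverse_sigmoid_decay' schedule requested via sample_gt_prob_type above
# (implemented in tf_utils later in this patch) computes
#   p(step) = k * exp(-step / k) / (1 + k * exp(-step / k)),
# i.e. the probability of sampling the ground-truth action starts near 1 and
# decays towards 0, crossing 0.5 at step = k * ln(k). With a made-up k:
import numpy as np

k = 100.0
steps = np.array([0., 100., 250., 500., 1000.])
tmp = k * np.exp(-steps / k)
print(np.round(tmp / (1. + tmp), 3))   # roughly [0.99, 0.974, 0.891, 0.403, 0.005]
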
+ ax.set_axis_off(); + if i == 0: ax.set_title('pred_map') + + file_name = os.path.join(output_dir, 'readout_map_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + plt.close(fig) + +def _vis(outputs, global_step, output_dir, metric_summary, N): + # Plot the value map, goal for various maps to see what if the model is + # learning anything useful. + # + # outputs is [values, goals, maps, occupancy, conf]. + # + if N >= 0: + outputs = outputs[:N] + N = len(outputs) + + plt.set_cmap('jet') + fig, axes = utils.subplot(plt, (N, outputs[0][0].shape[4]*5), (5,5)) + axes = axes.ravel()[::-1].tolist() + for i in range(N): + values, goals, maps, occupancy, conf = outputs[i] + for j in [0]: + for k in range(values.shape[4]): + # Display something like the midpoint of the trajectory. + id = np.int(values.shape[1]/2) + + ax = axes.pop(); + ax.imshow(goals[j,id,:,:,k], origin='lower', interpolation='none') + ax.set_axis_off(); + if i == 0: ax.set_title('goal') + + ax = axes.pop(); + ax.imshow(occupancy[j,id,:,:,k], origin='lower', interpolation='none') + ax.set_axis_off(); + if i == 0: ax.set_title('occupancy') + + ax = axes.pop(); + ax.imshow(conf[j,id,:,:,k], origin='lower', interpolation='none', + vmin=0., vmax=1.) + ax.set_axis_off(); + if i == 0: ax.set_title('conf') + + ax = axes.pop(); + ax.imshow(values[j,id,:,:,k], origin='lower', interpolation='none') + ax.set_axis_off(); + if i == 0: ax.set_title('value') + + ax = axes.pop(); + ax.imshow(maps[j,id,:,:,k], origin='lower', interpolation='none') + ax.set_axis_off(); + if i == 0: ax.set_title('incr map') + + file_name = os.path.join(output_dir, 'value_vis_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + plt.close(fig) + +def _summary_vis(m, batch_size, num_steps, arop_full_summary_iters): + arop = []; arop_summary_iters = []; arop_eval_fns = []; + vis_value_ops = []; vis_goal_ops = []; vis_map_ops = []; + vis_occupancy_ops = []; vis_conf_ops = []; + for i, val_op in enumerate(m.value_ops): + vis_value_op = tf.reduce_mean(tf.abs(val_op), axis=3, keep_dims=True) + vis_value_ops.append(vis_value_op) + + vis_occupancy_op = tf.reduce_mean(tf.abs(m.occupancys[i]), 3, True) + vis_occupancy_ops.append(vis_occupancy_op) + + vis_conf_op = tf.reduce_max(tf.abs(m.confs[i]), axis=3, keep_dims=True) + vis_conf_ops.append(vis_conf_op) + + ego_goal_imgs_i_op = m.input_tensors['step']['ego_goal_imgs_{:d}'.format(i)] + vis_goal_op = tf.reduce_max(ego_goal_imgs_i_op, 4, True) + vis_goal_ops.append(vis_goal_op) + + vis_map_op = tf.reduce_mean(tf.abs(m.ego_map_ops[i]), 4, True) + vis_map_ops.append(vis_map_op) + + vis_goal_ops = tf.concat(vis_goal_ops, 4) + vis_map_ops = tf.concat(vis_map_ops, 4) + vis_value_ops = tf.concat(vis_value_ops, 3) + vis_occupancy_ops = tf.concat(vis_occupancy_ops, 3) + vis_conf_ops = tf.concat(vis_conf_ops, 3) + + sh = tf.unstack(tf.shape(vis_value_ops))[1:] + vis_value_ops = tf.reshape(vis_value_ops, shape=[batch_size, -1] + sh) + + sh = tf.unstack(tf.shape(vis_conf_ops))[1:] + vis_conf_ops = tf.reshape(vis_conf_ops, shape=[batch_size, -1] + sh) + + sh = tf.unstack(tf.shape(vis_occupancy_ops))[1:] + vis_occupancy_ops = tf.reshape(vis_occupancy_ops, shape=[batch_size,-1] + sh) + + # Save memory, only return time steps that need to be visualized, factor of + # 32 CPU memory saving. 
+ id = np.int(num_steps/2) + vis_goal_ops = tf.expand_dims(vis_goal_ops[:,id,:,:,:], axis=1) + vis_map_ops = tf.expand_dims(vis_map_ops[:,id,:,:,:], axis=1) + vis_value_ops = tf.expand_dims(vis_value_ops[:,id,:,:,:], axis=1) + vis_conf_ops = tf.expand_dims(vis_conf_ops[:,id,:,:,:], axis=1) + vis_occupancy_ops = tf.expand_dims(vis_occupancy_ops[:,id,:,:,:], axis=1) + + arop += [[vis_value_ops, vis_goal_ops, vis_map_ops, vis_occupancy_ops, + vis_conf_ops]] + arop_summary_iters += [arop_full_summary_iters] + arop_eval_fns += [_vis] + return arop, arop_summary_iters, arop_eval_fns + +def _summary_readout_maps(m, num_steps, arop_full_summary_iters): + arop = []; arop_summary_iters = []; arop_eval_fns = []; + id = np.int(num_steps-1) + vis_readout_maps_gt = m.readout_maps_gt + vis_readout_maps_prob = tf.reshape(m.readout_maps_probs, + shape=tf.shape(vis_readout_maps_gt)) + vis_readout_maps_gt = tf.expand_dims(vis_readout_maps_gt[:,id,:,:,:], 1) + vis_readout_maps_prob = tf.expand_dims(vis_readout_maps_prob[:,id,:,:,:], 1) + arop += [[vis_readout_maps_gt, vis_readout_maps_prob]] + arop_summary_iters += [arop_full_summary_iters] + arop_eval_fns += [_vis_readout_maps] + return arop, arop_summary_iters, arop_eval_fns + +def _add_summaries(m, args, summary_mode, arop_full_summary_iters): + task_params = args.navtask.task_params + + summarize_ops = [m.lr_op, m.global_step_op, m.sample_gt_prob_op] + \ + m.loss_ops + m.acc_ops + summarize_names = ['lr', 'global_step', 'sample_gt_prob_op'] + \ + m.loss_ops_names + ['acc_{:d}'.format(i) for i in range(len(m.acc_ops))] + to_aggregate = [0, 0, 0] + [1]*len(m.loss_ops_names) + [1]*len(m.acc_ops) + + scope_name = 'summary' + with tf.name_scope(scope_name): + s_ops = nu.add_default_summaries(summary_mode, arop_full_summary_iters, + summarize_ops, summarize_names, + to_aggregate, m.action_prob_op, + m.input_tensors, scope_name=scope_name) + if summary_mode == 'val': + arop, arop_summary_iters, arop_eval_fns = _summary_vis( + m, task_params.batch_size, task_params.num_steps, + arop_full_summary_iters) + s_ops.additional_return_ops += arop + s_ops.arop_summary_iters += arop_summary_iters + s_ops.arop_eval_fns += arop_eval_fns + + if args.arch.readout_maps: + arop, arop_summary_iters, arop_eval_fns = _summary_readout_maps( + m, task_params.num_steps, arop_full_summary_iters) + s_ops.additional_return_ops += arop + s_ops.arop_summary_iters += arop_summary_iters + s_ops.arop_eval_fns += arop_eval_fns + + return s_ops diff --git a/cognitive_mapping_and_planning/tfcode/cmp_utils.py b/cognitive_mapping_and_planning/tfcode/cmp_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..6d87c697b4b29128c8b8a42caac27aeb4d657ec6 --- /dev/null +++ b/cognitive_mapping_and_planning/tfcode/cmp_utils.py @@ -0,0 +1,164 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Utility functions for setting up the CMP graph. 
+""" + +import os, numpy as np +import matplotlib.pyplot as plt + + +import tensorflow as tf + +from tensorflow.contrib import slim +from tensorflow.contrib.slim import arg_scope +import logging +from src import utils +import src.file_utils as fu +from tfcode import tf_utils + +resnet_v2 = tf_utils.resnet_v2 +custom_residual_block = tf_utils.custom_residual_block + +def value_iteration_network( + fr, num_iters, val_neurons, action_neurons, kernel_size, share_wts=False, + name='vin', wt_decay=0.0001, activation_fn=None, shape_aware=False): + """ + Constructs a Value Iteration Network, convolutions and max pooling across + channels. + Input: + fr: NxWxHxC + val_neurons: Number of channels for maintaining the value. + action_neurons: Computes action_neurons * val_neurons at each iteration to + max pool over. + Output: + value image: NxHxWx(val_neurons) + """ + init_var = np.sqrt(2.0/(kernel_size**2)/(val_neurons*action_neurons)) + vals = [] + with tf.variable_scope(name) as varscope: + if shape_aware == False: + fr_shape = tf.unstack(tf.shape(fr)) + val_shape = tf.stack(fr_shape[:-1] + [val_neurons]) + val = tf.zeros(val_shape, name='val_init') + else: + val = tf.expand_dims(tf.zeros_like(fr[:,:,:,0]), dim=-1) * \ + tf.constant(0., dtype=tf.float32, shape=[1,1,1,val_neurons]) + val_shape = tf.shape(val) + vals.append(val) + for i in range(num_iters): + if share_wts: + # The first Value Iteration maybe special, so it can have its own + # paramterss. + scope = 'conv' + if i == 0: scope = 'conv_0' + if i > 1: varscope.reuse_variables() + else: + scope = 'conv_{:d}'.format(i) + val = slim.conv2d(tf.concat([val, fr], 3, name='concat_{:d}'.format(i)), + num_outputs=action_neurons*val_neurons, + kernel_size=kernel_size, stride=1, activation_fn=activation_fn, + scope=scope, normalizer_fn=None, + weights_regularizer=slim.l2_regularizer(wt_decay), + weights_initializer=tf.random_normal_initializer(stddev=init_var), + biases_initializer=tf.zeros_initializer()) + val = tf.reshape(val, [-1, action_neurons*val_neurons, 1, 1], + name='re_{:d}'.format(i)) + val = slim.max_pool2d(val, kernel_size=[action_neurons,1], + stride=[action_neurons,1], padding='VALID', + scope='val_{:d}'.format(i)) + val = tf.reshape(val, val_shape, name='unre_{:d}'.format(i)) + vals.append(val) + return val, vals + + +def rotate_preds(loc_on_map, relative_theta, map_size, preds, + output_valid_mask): + with tf.name_scope('rotate'): + flow_op = tf_utils.get_flow(loc_on_map, relative_theta, map_size=map_size) + if type(preds) != list: + rotated_preds, valid_mask_warps = tf_utils.dense_resample(preds, flow_op, + output_valid_mask) + else: + rotated_preds = [] ;valid_mask_warps = [] + for pred in preds: + rotated_pred, valid_mask_warp = tf_utils.dense_resample(pred, flow_op, + output_valid_mask) + rotated_preds.append(rotated_pred) + valid_mask_warps.append(valid_mask_warp) + return rotated_preds, valid_mask_warps + +def get_visual_frustum(map_size, shape_like, expand_dims=[0,0]): + with tf.name_scope('visual_frustum'): + l = np.tril(np.ones(map_size)) ;l = l + l[:,::-1] + l = (l == 2).astype(np.float32) + for e in expand_dims: + l = np.expand_dims(l, axis=e) + confs_probs = tf.constant(l, dtype=tf.float32) + confs_probs = tf.ones_like(shape_like, dtype=tf.float32) * confs_probs + return confs_probs + +def deconv(x, is_training, wt_decay, neurons, strides, layers_per_block, + kernel_size, conv_fn, name, offset=0): + """Generates a up sampling network with residual connections. 
+ """ + batch_norm_param = {'center': True, 'scale': True, + 'activation_fn': tf.nn.relu, + 'is_training': is_training} + outs = [] + for i, (neuron, stride) in enumerate(zip(neurons, strides)): + for s in range(layers_per_block): + scope = '{:s}_{:d}_{:d}'.format(name, i+1+offset,s+1) + x = custom_residual_block(x, neuron, kernel_size, stride, scope, + is_training, wt_decay, use_residual=True, + residual_stride_conv=True, conv_fn=conv_fn, + batch_norm_param=batch_norm_param) + stride = 1 + outs.append((x,True)) + return x, outs + +def fr_v2(x, output_neurons, inside_neurons, is_training, name='fr', + wt_decay=0.0001, stride=1, updates_collections=tf.GraphKeys.UPDATE_OPS): + """Performs fusion of information between the map and the reward map. + Inputs + x: NxHxWxC1 + + Outputs + fr map: NxHxWx(output_neurons) + """ + if type(stride) != list: + stride = [stride] + with slim.arg_scope(resnet_v2.resnet_utils.resnet_arg_scope( + is_training=is_training, weight_decay=wt_decay)): + with slim.arg_scope([slim.batch_norm], updates_collections=updates_collections) as arg_sc: + # Change the updates_collections for the conv normalizer_params to None + for i in range(len(arg_sc.keys())): + if 'convolution' in arg_sc.keys()[i]: + arg_sc.values()[i]['normalizer_params']['updates_collections'] = updates_collections + with slim.arg_scope(arg_sc): + bottleneck = resnet_v2.bottleneck + blocks = [] + for i, s in enumerate(stride): + b = resnet_v2.resnet_utils.Block( + 'block{:d}'.format(i + 1), bottleneck, [{ + 'depth': output_neurons, + 'depth_bottleneck': inside_neurons, + 'stride': stride[i] + }]) + blocks.append(b) + x, outs = resnet_v2.resnet_v2(x, blocks, num_classes=None, global_pool=False, + output_stride=None, include_root_block=False, + reuse=False, scope=name) + return x, outs diff --git a/cognitive_mapping_and_planning/tfcode/nav_utils.py b/cognitive_mapping_and_planning/tfcode/nav_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..2f764f33df91a80f6539dcbae1e0fa7093becd29 --- /dev/null +++ b/cognitive_mapping_and_planning/tfcode/nav_utils.py @@ -0,0 +1,435 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Various losses for training navigation agents. + +Defines various loss functions for navigation agents, +compute_losses_multi_or. 
+""" + +import os, numpy as np +import matplotlib.pyplot as plt + + +import tensorflow as tf + +from tensorflow.contrib import slim +from tensorflow.contrib.slim import arg_scope +from tensorflow.contrib.slim.nets import resnet_v2 +from tensorflow.python.training import moving_averages +import logging +from src import utils +import src.file_utils as fu +from tfcode import tf_utils + + +def compute_losses_multi_or(logits, actions_one_hot, weights=None, + num_actions=-1, data_loss_wt=1., reg_loss_wt=1., + ewma_decay=0.99, reg_loss_op=None): + assert(num_actions > 0), 'num_actions must be specified and must be > 0.' + + with tf.name_scope('loss'): + if weights is None: + weight = tf.ones_like(actions_one_hot, dtype=tf.float32, name='weight') + + actions_one_hot = tf.cast(tf.reshape(actions_one_hot, [-1, num_actions], + 're_actions_one_hot'), tf.float32) + weights = tf.reduce_sum(tf.reshape(weights, [-1, num_actions], 're_weight'), + reduction_indices=1) + total = tf.reduce_sum(weights) + + action_prob = tf.nn.softmax(logits) + action_prob = tf.reduce_sum(tf.multiply(action_prob, actions_one_hot), + reduction_indices=1) + example_loss = -tf.log(tf.maximum(tf.constant(1e-4), action_prob)) + + data_loss_op = tf.reduce_sum(example_loss * weights) / total + if reg_loss_op is None: + if reg_loss_wt > 0: + reg_loss_op = tf.add_n(tf.losses.get_regularization_losses()) + else: + reg_loss_op = tf.constant(0.) + + if reg_loss_wt > 0: + total_loss_op = data_loss_wt*data_loss_op + reg_loss_wt*reg_loss_op + else: + total_loss_op = data_loss_wt*data_loss_op + + is_correct = tf.cast(tf.greater(action_prob, 0.5, name='pred_class'), tf.float32) + acc_op = tf.reduce_sum(is_correct*weights) / total + + ewma_acc_op = moving_averages.weighted_moving_average( + acc_op, ewma_decay, weight=total, name='ewma_acc') + + acc_ops = [ewma_acc_op] + + return reg_loss_op, data_loss_op, total_loss_op, acc_ops + + +def get_repr_from_image(images_reshaped, modalities, data_augment, encoder, + freeze_conv, wt_decay, is_training): + # Pass image through lots of convolutional layers, to obtain pool5 + if modalities == ['rgb']: + with tf.name_scope('pre_rgb'): + x = (images_reshaped + 128.) / 255. # Convert to brightness between 0 and 1. + if data_augment.relight and is_training: + x = tf_utils.distort_image(x, fast_mode=data_augment.relight_fast) + x = (x-0.5)*2.0 + scope_name = encoder + elif modalities == ['depth']: + with tf.name_scope('pre_d'): + d_image = images_reshaped + x = 2*(d_image[...,0] - 80.0)/100.0 + y = d_image[...,1] + d_image = tf.concat([tf.expand_dims(x, -1), tf.expand_dims(y, -1)], 3) + x = d_image + scope_name = 'd_'+encoder + + resnet_is_training = is_training and (not freeze_conv) + with slim.arg_scope(resnet_v2.resnet_utils.resnet_arg_scope(resnet_is_training)): + fn = getattr(tf_utils, encoder) + x, end_points = fn(x, num_classes=None, global_pool=False, + output_stride=None, reuse=None, + scope=scope_name) + vars_ = slim.get_variables_to_restore() + + conv_feat = x + return conv_feat, vars_ + +def default_train_step_kwargs(m, obj, logdir, rng_seed, is_chief, num_steps, + iters, train_display_interval, + dagger_sample_bn_false): + train_step_kwargs = {} + train_step_kwargs['obj'] = obj + train_step_kwargs['m'] = m + + # rng_data has 2 independent rngs, one for sampling episodes and one for + # sampling perturbs (so that we can make results reproducible. 
+ train_step_kwargs['rng_data'] = [np.random.RandomState(rng_seed), + np.random.RandomState(rng_seed)] + train_step_kwargs['rng_action'] = np.random.RandomState(rng_seed) + if is_chief: + train_step_kwargs['writer'] = tf.summary.FileWriter(logdir) #, m.tf_graph) + else: + train_step_kwargs['writer'] = None + train_step_kwargs['iters'] = iters + train_step_kwargs['train_display_interval'] = train_display_interval + train_step_kwargs['num_steps'] = num_steps + train_step_kwargs['logdir'] = logdir + train_step_kwargs['dagger_sample_bn_false'] = dagger_sample_bn_false + return train_step_kwargs + +# Utilities for visualizing and analysing validation output. +def save_d_at_t(outputs, global_step, output_dir, metric_summary, N): + """Save distance to goal at all time steps. + + Args: + outputs : [gt_dist_to_goal]. + global_step : number of iterations. + output_dir : output directory. + metric_summary : to append scalars to summary. + N : number of outputs to process. + + """ + d_at_t = np.concatenate(map(lambda x: x[0][:,:,0]*1, outputs), axis=0) + fig, axes = utils.subplot(plt, (1,1), (5,5)) + axes.plot(np.arange(d_at_t.shape[1]), np.mean(d_at_t, axis=0), 'r.') + axes.set_xlabel('time step') + axes.set_ylabel('dist to next goal') + axes.grid('on') + file_name = os.path.join(output_dir, 'dist_at_t_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + file_name = os.path.join(output_dir, 'dist_at_t_{:d}.pkl'.format(global_step)) + utils.save_variables(file_name, [d_at_t], ['d_at_t'], overwrite=True) + plt.close(fig) + return None + +def save_all(outputs, global_step, output_dir, metric_summary, N): + """Save numerous statistics. + + Args: + outputs : [locs, goal_loc, gt_dist_to_goal, node_ids, perturbs] + global_step : number of iterations. + output_dir : output directory. + metric_summary : to append scalars to summary. + N : number of outputs to process. + """ + all_locs = np.concatenate(map(lambda x: x[0], outputs), axis=0) + all_goal_locs = np.concatenate(map(lambda x: x[1], outputs), axis=0) + all_d_at_t = np.concatenate(map(lambda x: x[2][:,:,0]*1, outputs), axis=0) + all_node_ids = np.concatenate(map(lambda x: x[3], outputs), axis=0) + all_perturbs = np.concatenate(map(lambda x: x[4], outputs), axis=0) + + file_name = os.path.join(output_dir, 'all_locs_at_t_{:d}.pkl'.format(global_step)) + vars = [all_locs, all_goal_locs, all_d_at_t, all_node_ids, all_perturbs] + var_names = ['all_locs', 'all_goal_locs', 'all_d_at_t', 'all_node_ids', 'all_perturbs'] + utils.save_variables(file_name, vars, var_names, overwrite=True) + return None + +def eval_ap(outputs, global_step, output_dir, metric_summary, N, num_classes=4): + """Processes the collected outputs to compute AP for action prediction. + + Args: + outputs : [logits, labels] + global_step : global_step. + output_dir : where to store results. + metric_summary : summary object to add summaries to. + N : number of outputs to process. + num_classes : number of classes to compute AP over, and to reshape tensors. 
+ """ + if N >= 0: + outputs = outputs[:N] + logits = np.concatenate(map(lambda x: x[0], outputs), axis=0).reshape((-1, num_classes)) + labels = np.concatenate(map(lambda x: x[1], outputs), axis=0).reshape((-1, num_classes)) + aps = [] + for i in range(logits.shape[1]): + ap, rec, prec = utils.calc_pr(labels[:,i], logits[:,i]) + ap = ap[0] + tf_utils.add_value_to_summary(metric_summary, 'aps/ap_{:d}: '.format(i), ap) + aps.append(ap) + return aps + +def eval_dist(outputs, global_step, output_dir, metric_summary, N): + """Processes the collected outputs during validation to + 1. Plot the distance over time curve. + 2. Compute mean and median distances. + 3. Plots histogram of end distances. + + Args: + outputs : [locs, goal_loc, gt_dist_to_goal]. + global_step : global_step. + output_dir : where to store results. + metric_summary : summary object to add summaries to. + N : number of outputs to process. + """ + SUCCESS_THRESH = 3 + if N >= 0: + outputs = outputs[:N] + + # Plot distance at time t. + d_at_t = [] + for i in range(len(outputs)): + locs, goal_loc, gt_dist_to_goal = outputs[i] + d_at_t.append(gt_dist_to_goal[:,:,0]*1) + + # Plot the distance. + fig, axes = utils.subplot(plt, (1,1), (5,5)) + d_at_t = np.concatenate(d_at_t, axis=0) + axes.plot(np.arange(d_at_t.shape[1]), np.mean(d_at_t, axis=0), 'r.') + axes.set_xlabel('time step') + axes.set_ylabel('dist to next goal') + axes.grid('on') + file_name = os.path.join(output_dir, 'dist_at_t_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + file_name = os.path.join(output_dir, 'dist_at_t_{:d}.pkl'.format(global_step)) + utils.save_variables(file_name, [d_at_t], ['d_at_t'], overwrite=True) + plt.close(fig) + + # Plot the trajectories and the init_distance and final distance. + d_inits = [] + d_ends = [] + for i in range(len(outputs)): + locs, goal_loc, gt_dist_to_goal = outputs[i] + d_inits.append(gt_dist_to_goal[:,0,0]*1) + d_ends.append(gt_dist_to_goal[:,-1,0]*1) + + # Plot the distance. + fig, axes = utils.subplot(plt, (1,1), (5,5)) + d_inits = np.concatenate(d_inits, axis=0) + d_ends = np.concatenate(d_ends, axis=0) + axes.plot(d_inits+np.random.rand(*(d_inits.shape))-0.5, + d_ends+np.random.rand(*(d_ends.shape))-0.5, '.', mec='red', mew=1.0) + axes.set_xlabel('init dist'); axes.set_ylabel('final dist'); + axes.grid('on'); axes.axis('equal'); + title_str = 'mean: {:0.1f}, 50: {:0.1f}, 75: {:0.2f}, s: {:0.1f}' + title_str = title_str.format( + np.mean(d_ends), np.median(d_ends), np.percentile(d_ends, q=75), + 100*(np.mean(d_ends <= SUCCESS_THRESH))) + axes.set_title(title_str) + file_name = os.path.join(output_dir, 'dist_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + + file_name = os.path.join(output_dir, 'dist_{:d}.pkl'.format(global_step)) + utils.save_variables(file_name, [d_inits, d_ends], ['d_inits', 'd_ends'], + overwrite=True) + plt.close(fig) + + # Plot the histogram of the end_distance. 
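
# The scalars that eval_dist reports (and puts in its plot titles) are plain
# statistics of the initial/final distances to goal, with success defined as
# ending within SUCCESS_THRESH of the goal. With a made-up batch of final
# distances:
import numpy as np

SUCCESS_THRESH = 3
d_ends = np.array([0., 2., 5., 1., 12., 3.])
print('mean {:0.1f}, 50 {:0.1f}, 75 {:0.1f}, s {:0.1f}'.format(
    np.mean(d_ends), np.median(d_ends), np.percentile(d_ends, q=75),
    100 * np.mean(d_ends <= SUCCESS_THRESH)))
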
+ with plt.style.context('seaborn-white'): + d_ends_ = np.sort(d_ends) + d_inits_ = np.sort(d_inits) + leg = []; + fig, ax = utils.subplot(plt, (1,1), (5,5)) + ax.grid('on') + ax.set_xlabel('Distance from goal'); ax.xaxis.label.set_fontsize(16); + ax.set_ylabel('Fraction of data'); ax.yaxis.label.set_fontsize(16); + ax.plot(d_ends_, np.arange(d_ends_.size)*1./d_ends_.size, 'r') + ax.plot(d_inits_, np.arange(d_inits_.size)*1./d_inits_.size, 'k') + leg.append('Final'); leg.append('Init'); + ax.legend(leg, fontsize='x-large'); + ax.set_axis_on() + title_str = 'mean: {:0.1f}, 50: {:0.1f}, 75: {:0.2f}, s: {:0.1f}' + title_str = title_str.format( + np.mean(d_ends), np.median(d_ends), np.percentile(d_ends, q=75), + 100*(np.mean(d_ends <= SUCCESS_THRESH))) + ax.set_title(title_str) + file_name = os.path.join(output_dir, 'dist_hist_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + + # Log distance metrics. + tf_utils.add_value_to_summary(metric_summary, 'dists/success_init: ', + 100*(np.mean(d_inits <= SUCCESS_THRESH))) + tf_utils.add_value_to_summary(metric_summary, 'dists/success_end: ', + 100*(np.mean(d_ends <= SUCCESS_THRESH))) + tf_utils.add_value_to_summary(metric_summary, 'dists/dist_init (75): ', + np.percentile(d_inits, q=75)) + tf_utils.add_value_to_summary(metric_summary, 'dists/dist_end (75): ', + np.percentile(d_ends, q=75)) + tf_utils.add_value_to_summary(metric_summary, 'dists/dist_init (median): ', + np.median(d_inits)) + tf_utils.add_value_to_summary(metric_summary, 'dists/dist_end (median): ', + np.median(d_ends)) + tf_utils.add_value_to_summary(metric_summary, 'dists/dist_init (mean): ', + np.mean(d_inits)) + tf_utils.add_value_to_summary(metric_summary, 'dists/dist_end (mean): ', + np.mean(d_ends)) + return np.median(d_inits), np.median(d_ends), np.mean(d_inits), np.mean(d_ends), \ + np.percentile(d_inits, q=75), np.percentile(d_ends, q=75), \ + 100*(np.mean(d_inits) <= SUCCESS_THRESH), 100*(np.mean(d_ends) <= SUCCESS_THRESH) + +def plot_trajectories(outputs, global_step, output_dir, metric_summary, N): + """Processes the collected outputs during validation to plot the trajectories + in the top view. + + Args: + outputs : [locs, orig_maps, goal_loc]. + global_step : global_step. + output_dir : where to store results. + metric_summary : summary object to add summaries to. + N : number of outputs to process. + """ + if N >= 0: + outputs = outputs[:N] + N = len(outputs) + + plt.set_cmap('gray') + fig, axes = utils.subplot(plt, (N, outputs[0][1].shape[0]), (5,5)) + axes = axes.ravel()[::-1].tolist() + for i in range(N): + locs, orig_maps, goal_loc = outputs[i] + is_semantic = np.isnan(goal_loc[0,0,1]) + for j in range(orig_maps.shape[0]): + ax = axes.pop(); + ax.plot(locs[j,0,0], locs[j,0,1], 'ys') + # Plot one by one, so that they come in different colors. 
+ for k in range(goal_loc.shape[1]): + if not is_semantic: + ax.plot(goal_loc[j,k,0], goal_loc[j,k,1], 's') + if False: + ax.plot(locs[j,:,0], locs[j,:,1], 'r.', ms=3) + ax.imshow(orig_maps[j,0,:,:,0], origin='lower') + ax.set_axis_off(); + else: + ax.scatter(locs[j,:,0], locs[j,:,1], c=np.arange(locs.shape[1]), + cmap='jet', s=10, lw=0) + ax.imshow(orig_maps[j,0,:,:,0], origin='lower', vmin=-1.0, vmax=2.0) + if not is_semantic: + xymin = np.minimum(np.min(goal_loc[j,:,:], axis=0), np.min(locs[j,:,:], axis=0)) + xymax = np.maximum(np.max(goal_loc[j,:,:], axis=0), np.max(locs[j,:,:], axis=0)) + else: + xymin = np.min(locs[j,:,:], axis=0) + xymax = np.max(locs[j,:,:], axis=0) + xy1 = (xymax+xymin)/2. - np.maximum(np.max(xymax-xymin), 12) + xy2 = (xymax+xymin)/2. + np.maximum(np.max(xymax-xymin), 12) + ax.set_xlim([xy1[0], xy2[0]]) + ax.set_ylim([xy1[1], xy2[1]]) + ax.set_axis_off() + file_name = os.path.join(output_dir, 'trajectory_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + plt.close(fig) + return None + +def add_default_summaries(mode, arop_full_summary_iters, summarize_ops, + summarize_names, to_aggregate, action_prob_op, + input_tensors, scope_name): + assert(mode == 'train' or mode == 'val' or mode == 'test'), \ + 'add_default_summaries mode is neither train or val or test.' + + s_ops = tf_utils.get_default_summary_ops() + + if mode == 'train': + s_ops.summary_ops, s_ops.print_summary_ops, additional_return_ops, \ + arop_summary_iters, arop_eval_fns = tf_utils.simple_summaries( + summarize_ops, summarize_names, mode, to_aggregate=False, + scope_name=scope_name) + s_ops.additional_return_ops += additional_return_ops + s_ops.arop_summary_iters += arop_summary_iters + s_ops.arop_eval_fns += arop_eval_fns + elif mode == 'val': + s_ops.summary_ops, s_ops.print_summary_ops, additional_return_ops, \ + arop_summary_iters, arop_eval_fns = tf_utils.simple_summaries( + summarize_ops, summarize_names, mode, to_aggregate=to_aggregate, + scope_name=scope_name) + s_ops.additional_return_ops += additional_return_ops + s_ops.arop_summary_iters += arop_summary_iters + s_ops.arop_eval_fns += arop_eval_fns + + elif mode == 'test': + s_ops.summary_ops, s_ops.print_summary_ops, additional_return_ops, \ + arop_summary_iters, arop_eval_fns = tf_utils.simple_summaries( + [], [], mode, to_aggregate=[], scope_name=scope_name) + s_ops.additional_return_ops += additional_return_ops + s_ops.arop_summary_iters += arop_summary_iters + s_ops.arop_eval_fns += arop_eval_fns + + + if mode == 'val': + arop = s_ops.additional_return_ops + arop += [[action_prob_op, input_tensors['train']['action']]] + arop += [[input_tensors['step']['loc_on_map'], + input_tensors['common']['goal_loc'], + input_tensors['step']['gt_dist_to_goal']]] + arop += [[input_tensors['step']['loc_on_map'], + input_tensors['common']['orig_maps'], + input_tensors['common']['goal_loc']]] + s_ops.arop_summary_iters += [-1, arop_full_summary_iters, + arop_full_summary_iters] + s_ops.arop_eval_fns += [eval_ap, eval_dist, plot_trajectories] + + elif mode == 'test': + arop = s_ops.additional_return_ops + arop += [[input_tensors['step']['loc_on_map'], + input_tensors['common']['goal_loc'], + input_tensors['step']['gt_dist_to_goal']]] + arop += [[input_tensors['step']['gt_dist_to_goal']]] + arop += [[input_tensors['step']['loc_on_map'], + input_tensors['common']['goal_loc'], + input_tensors['step']['gt_dist_to_goal'], + input_tensors['step']['node_ids'], + 
input_tensors['step']['perturbs']]] + arop += [[input_tensors['step']['loc_on_map'], + input_tensors['common']['orig_maps'], + input_tensors['common']['goal_loc']]] + s_ops.arop_summary_iters += [-1, -1, -1, arop_full_summary_iters] + s_ops.arop_eval_fns += [eval_dist, save_d_at_t, save_all, + plot_trajectories] + return s_ops + + diff --git a/cognitive_mapping_and_planning/tfcode/tf_utils.py b/cognitive_mapping_and_planning/tfcode/tf_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..5f96d8ff5ce7473f0ec49096abcbac274e6c4fcc --- /dev/null +++ b/cognitive_mapping_and_planning/tfcode/tf_utils.py @@ -0,0 +1,840 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +import numpy as np +import sys +import tensorflow as tf +import src.utils as utils +import logging +from tensorflow.contrib import slim +from tensorflow.contrib.metrics.python.ops import confusion_matrix_ops +from tensorflow.contrib.slim import arg_scope +from tensorflow.contrib.slim.nets import resnet_v2 +from tensorflow.python.framework import dtypes +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import check_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import variable_scope +sys.path.insert(0, '../slim') +from preprocessing import inception_preprocessing as ip + +resnet_v2_50 = resnet_v2.resnet_v2_50 + + +def custom_residual_block(x, neurons, kernel_size, stride, name, is_training, + wt_decay=0.0001, use_residual=True, + residual_stride_conv=True, conv_fn=slim.conv2d, + batch_norm_param=None): + + # batch norm x and relu + init_var = np.sqrt(2.0/(kernel_size**2)/neurons) + with arg_scope([conv_fn], + weights_regularizer=slim.l2_regularizer(wt_decay), + weights_initializer=tf.random_normal_initializer(stddev=init_var), + biases_initializer=tf.zeros_initializer()): + + if batch_norm_param is None: + batch_norm_param = {'center': True, 'scale': False, + 'activation_fn':tf.nn.relu, + 'is_training': is_training} + + y = slim.batch_norm(x, scope=name+'_bn', **batch_norm_param) + + y = conv_fn(y, num_outputs=neurons, kernel_size=kernel_size, stride=stride, + activation_fn=None, scope=name+'_1', + normalizer_fn=slim.batch_norm, + normalizer_params=batch_norm_param) + + y = conv_fn(y, num_outputs=neurons, kernel_size=kernel_size, + stride=1, activation_fn=None, scope=name+'_2') + + if use_residual: + if stride != 1 or x.get_shape().as_list()[-1] != neurons: + batch_norm_param_ = dict(batch_norm_param) + batch_norm_param_['activation_fn'] = None + x = conv_fn(x, num_outputs=neurons, kernel_size=1, + stride=stride if residual_stride_conv else 1, + activation_fn=None, scope=name+'_0_1x1', + normalizer_fn=slim.batch_norm, + normalizer_params=batch_norm_param_) + if not residual_stride_conv: + x = slim.avg_pool2d(x, 1, stride=stride, scope=name+'_0_avg') + + y = tf.add(x, y, name=name+'_add') + + return y + +def 
step_gt_prob(step, step_number_op): + # Change samping probability from 1 to -1 at step steps. + with tf.name_scope('step_gt_prob'): + out = tf.cond(tf.less(step_number_op, step), + lambda: tf.constant(1.), lambda: tf.constant(-1.)) + return out + +def inverse_sigmoid_decay(k, global_step_op): + with tf.name_scope('inverse_sigmoid_decay'): + k = tf.constant(k, dtype=tf.float32) + tmp = k*tf.exp(-tf.cast(global_step_op, tf.float32)/k) + tmp = tmp / (1. + tmp) + return tmp + +def dense_resample(im, flow_im, output_valid_mask, name='dense_resample'): + """ Resample reward at particular locations. + Args: + im: ...xHxWxC matrix to sample from. + flow_im: ...xHxWx2 matrix, samples the image using absolute offsets as given + by the flow_im. + """ + with tf.name_scope(name): + valid_mask = None + + x, y = tf.unstack(flow_im, axis=-1) + x = tf.cast(tf.reshape(x, [-1]), tf.float32) + y = tf.cast(tf.reshape(y, [-1]), tf.float32) + + # constants + shape = tf.unstack(tf.shape(im)) + channels = shape[-1] + width = shape[-2] + height = shape[-3] + num_batch = tf.cast(tf.reduce_prod(tf.stack(shape[:-3])), 'int32') + zero = tf.constant(0, dtype=tf.int32) + + # Round up and down. + x0 = tf.cast(tf.floor(x), 'int32'); x1 = x0 + 1; + y0 = tf.cast(tf.floor(y), 'int32'); y1 = y0 + 1; + + if output_valid_mask: + valid_mask = tf.logical_and( + tf.logical_and(tf.less_equal(x, tf.cast(width, tf.float32)-1.), tf.greater_equal(x, 0.)), + tf.logical_and(tf.less_equal(y, tf.cast(height, tf.float32)-1.), tf.greater_equal(y, 0.))) + valid_mask = tf.reshape(valid_mask, shape=shape[:-1] + [1]) + + x0 = tf.clip_by_value(x0, zero, width-1) + x1 = tf.clip_by_value(x1, zero, width-1) + y0 = tf.clip_by_value(y0, zero, height-1) + y1 = tf.clip_by_value(y1, zero, height-1) + + dim2 = width; dim1 = width * height; + + # Create base index + base = tf.reshape(tf.range(num_batch) * dim1, shape=[-1,1]) + base = tf.reshape(tf.tile(base, [1, height*width]), shape=[-1]) + + base_y0 = base + y0 * dim2 + base_y1 = base + y1 * dim2 + idx_a = base_y0 + x0 + idx_b = base_y1 + x0 + idx_c = base_y0 + x1 + idx_d = base_y1 + x1 + + # use indices to lookup pixels in the flat image and restore channels dim + sh = tf.stack([tf.constant(-1,dtype=tf.int32), channels]) + im_flat = tf.cast(tf.reshape(im, sh), dtype=tf.float32) + pixel_a = tf.gather(im_flat, idx_a) + pixel_b = tf.gather(im_flat, idx_b) + pixel_c = tf.gather(im_flat, idx_c) + pixel_d = tf.gather(im_flat, idx_d) + + # and finally calculate interpolated values + x1_f = tf.to_float(x1) + y1_f = tf.to_float(y1) + + wa = tf.expand_dims(((x1_f - x) * (y1_f - y)), 1) + wb = tf.expand_dims((x1_f - x) * (1.0 - (y1_f - y)), 1) + wc = tf.expand_dims(((1.0 - (x1_f - x)) * (y1_f - y)), 1) + wd = tf.expand_dims(((1.0 - (x1_f - x)) * (1.0 - (y1_f - y))), 1) + + output = tf.add_n([wa * pixel_a, wb * pixel_b, wc * pixel_c, wd * pixel_d]) + output = tf.reshape(output, shape=tf.shape(im)) + return output, valid_mask + +def get_flow(t, theta, map_size, name_scope='gen_flow'): + """ + Rotates the map by theta and translates the rotated map by t. + + Assume that the robot rotates by an angle theta and then moves forward by + translation t. This function returns the flow field field. For every pixel in + the new image it tells us which pixel in the original image it came from: + NewI(x, y) = OldI(flow_x(x,y), flow_y(x,y)). + + Assume there is a point p in the original image. Robot rotates by R and moves + forward by t. p1 = Rt*p; p2 = p1 - t; (the world moves in opposite direction. 
+  So, p2 = Rt*p - t, thus p2 came from R*(p2+t), which is what this function
+  calculates.
+
+  t: ... x 2 (translation for B batches of N motions each).
+  theta: ... x 1 (rotation for B batches of N motions each).
+
+  Output: ... x map_size x map_size x 2
+  """
+
+  with tf.name_scope(name_scope):
+    tx, ty = tf.unstack(tf.reshape(t, shape=[-1, 1, 1, 1, 2]), axis=4)
+    theta = tf.reshape(theta, shape=[-1, 1, 1, 1])
+    c = tf.constant((map_size-1.)/2., dtype=tf.float32)
+
+    x, y = np.meshgrid(np.arange(map_size), np.arange(map_size))
+    x = tf.constant(x[np.newaxis, :, :, np.newaxis], dtype=tf.float32, name='x',
+                    shape=[1, map_size, map_size, 1])
+    y = tf.constant(y[np.newaxis, :, :, np.newaxis], dtype=tf.float32, name='y',
+                    shape=[1, map_size, map_size, 1])
+
+    x = x-(-tx+c)
+    y = y-(-ty+c)
+
+    sin_theta = tf.sin(theta)
+    cos_theta = tf.cos(theta)
+    xr = cos_theta*x - sin_theta*y
+    yr = sin_theta*x + cos_theta*y
+
+    xr = xr + c
+    yr = yr + c
+
+    flow = tf.stack([xr, yr], axis=-1)
+    sh = tf.unstack(tf.shape(t), axis=0)
+    sh = tf.stack(sh[:-1]+[tf.constant(_, dtype=tf.int32) for _ in [map_size, map_size, 2]])
+    flow = tf.reshape(flow, shape=sh)
+    return flow
+
+def distort_image(im, fast_mode=False):
+  # All images in the same batch are transformed the same way, but over
+  # iterations you see different distortions.
+  # im should be float with values between 0 and 1.
+  im_ = tf.reshape(im, shape=(-1,1,3))
+  im_ = ip.apply_with_random_selector(
+      im_, lambda x, ordering: ip.distort_color(x, ordering, fast_mode),
+      num_cases=4)
+  im_ = tf.reshape(im_, tf.shape(im))
+  return im_
+
+def fc_network(x, neurons, wt_decay, name, num_pred=None, offset=0,
+               batch_norm_param=None, dropout_ratio=0.0, is_training=None):
+  if dropout_ratio > 0:
+    assert(is_training is not None), \
+      'is_training needs to be defined when training with dropout.'
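+  # Stack fully connected layers of the widths given in neurons (optionally
+  # with batch norm and dropout), and collect each intermediate representation.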
+ + repr = [] + for i, neuron in enumerate(neurons): + init_var = np.sqrt(2.0/neuron) + if batch_norm_param is not None: + x = slim.fully_connected(x, neuron, activation_fn=None, + weights_initializer=tf.random_normal_initializer(stddev=init_var), + weights_regularizer=slim.l2_regularizer(wt_decay), + normalizer_fn=slim.batch_norm, + normalizer_params=batch_norm_param, + biases_initializer=tf.zeros_initializer(), + scope='{:s}_{:d}'.format(name, offset+i)) + else: + x = slim.fully_connected(x, neuron, activation_fn=tf.nn.relu, + weights_initializer=tf.random_normal_initializer(stddev=init_var), + weights_regularizer=slim.l2_regularizer(wt_decay), + biases_initializer=tf.zeros_initializer(), + scope='{:s}_{:d}'.format(name, offset+i)) + if dropout_ratio > 0: + x = slim.dropout(x, keep_prob=1-dropout_ratio, is_training=is_training, + scope='{:s}_{:d}'.format('dropout_'+name, offset+i)) + repr.append(x) + + if num_pred is not None: + init_var = np.sqrt(2.0/num_pred) + x = slim.fully_connected(x, num_pred, + weights_regularizer=slim.l2_regularizer(wt_decay), + weights_initializer=tf.random_normal_initializer(stddev=init_var), + biases_initializer=tf.zeros_initializer(), + activation_fn=None, + scope='{:s}_pred'.format(name)) + return x, repr + +def concat_state_x_list(f, names): + af = {} + for i, k in enumerate(names): + af[k] = np.concatenate([x[i] for x in f], axis=1) + return af + +def concat_state_x(f, names): + af = {} + for k in names: + af[k] = np.concatenate([x[k] for x in f], axis=1) + # af[k] = np.swapaxes(af[k], 0, 1) + return af + +def sample_action(rng, action_probs, optimal_action, sample_gt_prob, + type='sample', combine_type='one_or_other'): + optimal_action_ = optimal_action/np.sum(optimal_action+0., 1, keepdims=True) + action_probs_ = action_probs/np.sum(action_probs+0.001, 1, keepdims=True) + batch_size = action_probs_.shape[0] + + action = np.zeros((batch_size), dtype=np.int32) + action_sample_wt = np.zeros((batch_size), dtype=np.float32) + if combine_type == 'add': + sample_gt_prob_ = np.minimum(np.maximum(sample_gt_prob, 0.), 1.) + + for i in range(batch_size): + if combine_type == 'one_or_other': + sample_gt = rng.rand() < sample_gt_prob + if sample_gt: distr_ = optimal_action_[i,:]*1. + else: distr_ = action_probs_[i,:]*1. + elif combine_type == 'add': + distr_ = optimal_action_[i,:]*sample_gt_prob_ + \ + (1.-sample_gt_prob_)*action_probs_[i,:] + distr_ = distr_ / np.sum(distr_) + + if type == 'sample': + action[i] = np.argmax(rng.multinomial(1, distr_, size=1)) + elif type == 'argmax': + action[i] = np.argmax(distr_) + action_sample_wt[i] = action_probs_[i, action[i]] / distr_[action[i]] + return action, action_sample_wt + +def train_step_custom_online_sampling(sess, train_op, global_step, + train_step_kwargs, mode='train'): + m = train_step_kwargs['m'] + obj = train_step_kwargs['obj'] + rng_data = train_step_kwargs['rng_data'] + rng_action = train_step_kwargs['rng_action'] + writer = train_step_kwargs['writer'] + iters = train_step_kwargs['iters'] + num_steps = train_step_kwargs['num_steps'] + logdir = train_step_kwargs['logdir'] + dagger_sample_bn_false = train_step_kwargs['dagger_sample_bn_false'] + train_display_interval = train_step_kwargs['train_display_interval'] + if 'outputs' not in m.train_ops: + m.train_ops['outputs'] = [] + + s_ops = m.summary_ops[mode] + val_additional_ops = [] + + # Print all variables here. 
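+  # Disabled debugging block: when enabled, it fetches all Adam optimizer
+  # variables and logs whether each one contains NaNs, along with its norm.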
+ if False: + v = tf.get_collection(tf.GraphKeys.VARIABLES) + v_op = [_.value() for _ in v] + v_op_value = sess.run(v_op) + + filter = lambda x, y: 'Adam' in x.name + # filter = lambda x, y: np.is_any_nan(y) + ind = [i for i, (_, __) in enumerate(zip(v, v_op_value)) if filter(_, __)] + v = [v[i] for i in ind] + v_op_value = [v_op_value[i] for i in ind] + + for i in range(len(v)): + logging.info('XXXX: variable: %30s, is_any_nan: %5s, norm: %f.', + v[i].name, np.any(np.isnan(v_op_value[i])), + np.linalg.norm(v_op_value[i])) + + tt = utils.Timer() + for i in range(iters): + tt.tic() + # Sample a room. + e = obj.sample_env(rng_data) + + # Initialize the agent. + init_env_state = e.reset(rng_data) + + # Get and process the common data. + input = e.get_common_data() + input = e.pre_common_data(input) + feed_dict = prepare_feed_dict(m.input_tensors['common'], input) + if dagger_sample_bn_false: + feed_dict[m.train_ops['batch_norm_is_training_op']] = False + common_data = sess.run(m.train_ops['common'], feed_dict=feed_dict) + + states = [] + state_features = [] + state_targets = [] + net_state_to_input = [] + step_data_cache = [] + executed_actions = [] + rewards = [] + action_sample_wts = [] + states.append(init_env_state) + + net_state = sess.run(m.train_ops['init_state'], feed_dict=feed_dict) + net_state = dict(zip(m.train_ops['state_names'], net_state)) + net_state_to_input.append(net_state) + for j in range(num_steps): + f = e.get_features(states[j], j) + f = e.pre_features(f) + f.update(net_state) + f['step_number'] = np.ones((1,1,1), dtype=np.int32)*j + state_features.append(f) + + feed_dict = prepare_feed_dict(m.input_tensors['step'], state_features[-1]) + optimal_action = e.get_optimal_action(states[j], j) + for x, v in zip(m.train_ops['common'], common_data): + feed_dict[x] = v + if dagger_sample_bn_false: + feed_dict[m.train_ops['batch_norm_is_training_op']] = False + outs = sess.run([m.train_ops['step'], m.sample_gt_prob_op, + m.train_ops['step_data_cache'], + m.train_ops['updated_state'], + m.train_ops['outputs']], feed_dict=feed_dict) + action_probs = outs[0] + sample_gt_prob = outs[1] + step_data_cache.append(dict(zip(m.train_ops['step_data_cache'], outs[2]))) + net_state = outs[3] + if hasattr(e, 'update_state'): + outputs = outs[4] + outputs = dict(zip(m.train_ops['output_names'], outputs)) + e.update_state(outputs, j) + state_targets.append(e.get_targets(states[j], j)) + + if j < num_steps-1: + # Sample from action_probs and optimal action. + action, action_sample_wt = sample_action( + rng_action, action_probs, optimal_action, sample_gt_prob, + m.sample_action_type, m.sample_action_combine_type) + next_state, reward = e.take_action(states[j], action, j) + executed_actions.append(action) + states.append(next_state) + rewards.append(reward) + action_sample_wts.append(action_sample_wt) + net_state = dict(zip(m.train_ops['state_names'], net_state)) + net_state_to_input.append(net_state) + + # Concatenate things together for training. 
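+    # The rollout above collected per-step features, targets, executed actions
+    # and rewards; stack them across time steps so that the whole episode can
+    # be trained on in a single pass below.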
+ rewards = np.array(rewards).T + action_sample_wts = np.array(action_sample_wts).T + executed_actions = np.array(executed_actions).T + all_state_targets = concat_state_x(state_targets, e.get_targets_name()) + all_state_features = concat_state_x(state_features, + e.get_features_name()+['step_number']) + # all_state_net = concat_state_x(net_state_to_input, + # m.train_ops['state_names']) + all_step_data_cache = concat_state_x(step_data_cache, + m.train_ops['step_data_cache']) + + dict_train = dict(input) + dict_train.update(all_state_features) + dict_train.update(all_state_targets) + # dict_train.update(all_state_net) + dict_train.update(net_state_to_input[0]) + dict_train.update(all_step_data_cache) + dict_train.update({'rewards': rewards, + 'action_sample_wts': action_sample_wts, + 'executed_actions': executed_actions}) + feed_dict = prepare_feed_dict(m.input_tensors['train'], dict_train) + for x in m.train_ops['step_data_cache']: + feed_dict[x] = all_step_data_cache[x] + if mode == 'train': + n_step = sess.run(global_step) + + if np.mod(n_step, train_display_interval) == 0: + total_loss, np_global_step, summary, print_summary = sess.run( + [train_op, global_step, s_ops.summary_ops, s_ops.print_summary_ops], + feed_dict=feed_dict) + logging.error("") + else: + total_loss, np_global_step, summary = sess.run( + [train_op, global_step, s_ops.summary_ops], feed_dict=feed_dict) + + if writer is not None and summary is not None: + writer.add_summary(summary, np_global_step) + + should_stop = sess.run(m.should_stop_op) + + if mode != 'train': + arop = [[] for j in range(len(s_ops.additional_return_ops))] + for j in range(len(s_ops.additional_return_ops)): + if s_ops.arop_summary_iters[j] < 0 or i < s_ops.arop_summary_iters[j]: + arop[j] = s_ops.additional_return_ops[j] + val = sess.run(arop, feed_dict=feed_dict) + val_additional_ops.append(val) + tt.toc(log_at=60, log_str='val timer {:d} / {:d}: '.format(i, iters), + type='time') + + if mode != 'train': + # Write the default val summaries. + summary, print_summary, np_global_step = sess.run( + [s_ops.summary_ops, s_ops.print_summary_ops, global_step]) + if writer is not None and summary is not None: + writer.add_summary(summary, np_global_step) + + # write custom validation ops + val_summarys = [] + val_additional_ops = zip(*val_additional_ops) + if len(s_ops.arop_eval_fns) > 0: + val_metric_summary = tf.summary.Summary() + for i in range(len(s_ops.arop_eval_fns)): + val_summary = None + if s_ops.arop_eval_fns[i] is not None: + val_summary = s_ops.arop_eval_fns[i](val_additional_ops[i], + np_global_step, logdir, + val_metric_summary, + s_ops.arop_summary_iters[i]) + val_summarys.append(val_summary) + if writer is not None: + writer.add_summary(val_metric_summary, np_global_step) + + # Return the additional val_ops + total_loss = (val_additional_ops, val_summarys) + should_stop = None + + return total_loss, should_stop + +def train_step_custom_v2(sess, train_op, global_step, train_step_kwargs, + mode='train'): + m = train_step_kwargs['m'] + obj = train_step_kwargs['obj'] + rng = train_step_kwargs['rng'] + writer = train_step_kwargs['writer'] + iters = train_step_kwargs['iters'] + logdir = train_step_kwargs['logdir'] + train_display_interval = train_step_kwargs['train_display_interval'] + + s_ops = m.summary_ops[mode] + val_additional_ops = [] + + # Print all variables here. 
+ if False: + v = tf.get_collection(tf.GraphKeys.VARIABLES) + v_op = [_.value() for _ in v] + v_op_value = sess.run(v_op) + + filter = lambda x, y: 'Adam' in x.name + # filter = lambda x, y: np.is_any_nan(y) + ind = [i for i, (_, __) in enumerate(zip(v, v_op_value)) if filter(_, __)] + v = [v[i] for i in ind] + v_op_value = [v_op_value[i] for i in ind] + + for i in range(len(v)): + logging.info('XXXX: variable: %30s, is_any_nan: %5s, norm: %f.', + v[i].name, np.any(np.isnan(v_op_value[i])), + np.linalg.norm(v_op_value[i])) + + tt = utils.Timer() + for i in range(iters): + tt.tic() + e = obj.sample_env(rng) + rngs = e.gen_rng(rng) + input_data = e.gen_data(*rngs) + input_data = e.pre_data(input_data) + feed_dict = prepare_feed_dict(m.input_tensors, input_data) + + if mode == 'train': + n_step = sess.run(global_step) + + if np.mod(n_step, train_display_interval) == 0: + total_loss, np_global_step, summary, print_summary = sess.run( + [train_op, global_step, s_ops.summary_ops, s_ops.print_summary_ops], + feed_dict=feed_dict) + else: + total_loss, np_global_step, summary = sess.run( + [train_op, global_step, s_ops.summary_ops], + feed_dict=feed_dict) + + if writer is not None and summary is not None: + writer.add_summary(summary, np_global_step) + + should_stop = sess.run(m.should_stop_op) + + if mode != 'train': + arop = [[] for j in range(len(s_ops.additional_return_ops))] + for j in range(len(s_ops.additional_return_ops)): + if s_ops.arop_summary_iters[j] < 0 or i < s_ops.arop_summary_iters[j]: + arop[j] = s_ops.additional_return_ops[j] + val = sess.run(arop, feed_dict=feed_dict) + val_additional_ops.append(val) + tt.toc(log_at=60, log_str='val timer {:d} / {:d}: '.format(i, iters), + type='time') + + if mode != 'train': + # Write the default val summaries. + summary, print_summary, np_global_step = sess.run( + [s_ops.summary_ops, s_ops.print_summary_ops, global_step]) + if writer is not None and summary is not None: + writer.add_summary(summary, np_global_step) + + # write custom validation ops + val_summarys = [] + val_additional_ops = zip(*val_additional_ops) + if len(s_ops.arop_eval_fns) > 0: + val_metric_summary = tf.summary.Summary() + for i in range(len(s_ops.arop_eval_fns)): + val_summary = None + if s_ops.arop_eval_fns[i] is not None: + val_summary = s_ops.arop_eval_fns[i](val_additional_ops[i], + np_global_step, logdir, + val_metric_summary, + s_ops.arop_summary_iters[i]) + val_summarys.append(val_summary) + if writer is not None: + writer.add_summary(val_metric_summary, np_global_step) + + # Return the additional val_ops + total_loss = (val_additional_ops, val_summarys) + should_stop = None + + return total_loss, should_stop + +def train_step_custom(sess, train_op, global_step, train_step_kwargs, + mode='train'): + m = train_step_kwargs['m'] + params = train_step_kwargs['params'] + rng = train_step_kwargs['rng'] + writer = train_step_kwargs['writer'] + iters = train_step_kwargs['iters'] + gen_rng = train_step_kwargs['gen_rng'] + logdir = train_step_kwargs['logdir'] + gen_data = train_step_kwargs['gen_data'] + pre_data = train_step_kwargs['pre_data'] + train_display_interval = train_step_kwargs['train_display_interval'] + + val_additional_ops = [] + # Print all variables here. 
+ if False: + v = tf.get_collection(tf.GraphKeys.VARIABLES) + for _ in v: + val = sess.run(_.value()) + logging.info('variable: %30s, is_any_nan: %5s, norm: %f.', _.name, + np.any(np.isnan(val)), np.linalg.norm(val)) + + for i in range(iters): + rngs = gen_rng(params, rng) + input_data = gen_data(params, *rngs) + input_data = pre_data(params, input_data) + feed_dict = prepare_feed_dict(m.input_tensors, input_data) + + if mode == 'train': + n_step = sess.run(global_step) + + if np.mod(n_step, train_display_interval) == 0: + total_loss, np_global_step, summary, print_summary = sess.run( + [train_op, global_step, m.summary_op[mode], m.print_summary_op[mode]], + feed_dict=feed_dict) + else: + total_loss, np_global_step, summary = sess.run( + [train_op, global_step, m.summary_op[mode]], + feed_dict=feed_dict) + + if writer is not None: + writer.add_summary(summary, np_global_step) + + should_stop = sess.run(m.should_stop_op) + + if mode == 'val': + val = sess.run(m.agg_update_op[mode] + m.additional_return_op[mode], + feed_dict=feed_dict) + val_additional_ops.append(val[len(m.agg_update_op[mode]):]) + + if mode == 'val': + summary, print_summary, np_global_step = sess.run( + [m.summary_op[mode], m.print_summary_op[mode], global_step]) + if writer is not None: + writer.add_summary(summary, np_global_step) + sess.run([m.agg_reset_op[mode]]) + + # write custom validation ops + if m.eval_metrics_fn[mode] is not None: + val_metric_summary = m.eval_metrics_fn[mode](val_additional_ops, + np_global_step, logdir) + if writer is not None: + writer.add_summary(val_metric_summary, np_global_step) + + total_loss = val_additional_ops + should_stop = None + + return total_loss, should_stop + +def setup_training(loss_op, initial_learning_rate, steps_per_decay, + learning_rate_decay, momentum, max_steps, + sync=False, adjust_lr_sync=True, + num_workers=1, replica_id=0, vars_to_optimize=None, + clip_gradient_norm=0, typ=None, momentum2=0.999, + adam_eps=1e-8): + if sync and adjust_lr_sync: + initial_learning_rate = initial_learning_rate * num_workers + max_steps = np.int(max_steps / num_workers) + steps_per_decay = np.int(steps_per_decay / num_workers) + + global_step_op = slim.get_or_create_global_step() + lr_op = tf.train.exponential_decay(initial_learning_rate, + global_step_op, steps_per_decay, learning_rate_decay, staircase=True) + if typ == 'sgd': + optimizer = tf.train.MomentumOptimizer(lr_op, momentum) + elif typ == 'adam': + optimizer = tf.train.AdamOptimizer(learning_rate=lr_op, beta1=momentum, + beta2=momentum2, epsilon=adam_eps) + + if sync: + + sync_optimizer = tf.train.SyncReplicasOptimizer(optimizer, + replicas_to_aggregate=num_workers, + replica_id=replica_id, + total_num_replicas=num_workers) + train_op = slim.learning.create_train_op(loss_op, sync_optimizer, + variables_to_train=vars_to_optimize, + clip_gradient_norm=clip_gradient_norm) + else: + sync_optimizer = None + train_op = slim.learning.create_train_op(loss_op, optimizer, + variables_to_train=vars_to_optimize, + clip_gradient_norm=clip_gradient_norm) + should_stop_op = tf.greater_equal(global_step_op, max_steps) + return lr_op, global_step_op, train_op, should_stop_op, optimizer, sync_optimizer + +def add_value_to_summary(metric_summary, tag, val, log=True, tag_str=None): + """Adds a scalar summary to the summary object. 
Optionally also logs to + logging.""" + new_value = metric_summary.value.add(); + new_value.tag = tag + new_value.simple_value = val + if log: + if tag_str is None: + tag_str = tag + '%f' + logging.info(tag_str, val) + +def add_scalar_summary_op(tensor, name=None, + summary_key='summaries', print_summary_key='print_summaries', prefix=''): + collections = [] + op = tf.summary.scalar(name, tensor, collections=collections) + if summary_key != print_summary_key: + tf.add_to_collection(summary_key, op) + + op = tf.Print(op, [tensor], ' {:-<25s}: '.format(name) + prefix) + tf.add_to_collection(print_summary_key, op) + return op + +def setup_inputs(inputs): + input_tensors = {} + input_shapes = {} + for (name, typ, sz) in inputs: + _ = tf.placeholder(typ, shape=sz, name=name) + input_tensors[name] = _ + input_shapes[name] = sz + return input_tensors, input_shapes + +def prepare_feed_dict(input_tensors, inputs): + feed_dict = {} + for n in input_tensors.keys(): + feed_dict[input_tensors[n]] = inputs[n].astype(input_tensors[n].dtype.as_numpy_dtype) + return feed_dict + +def simple_add_summaries(summarize_ops, summarize_names, + summary_key='summaries', + print_summary_key='print_summaries', prefix=''): + for op, name, in zip(summarize_ops, summarize_names): + add_scalar_summary_op(op, name, summary_key, print_summary_key, prefix) + + summary_op = tf.summary.merge_all(summary_key) + print_summary_op = tf.summary.merge_all(print_summary_key) + return summary_op, print_summary_op + +def add_summary_ops(m, summarize_ops, summarize_names, to_aggregate=None, + summary_key='summaries', + print_summary_key='print_summaries', prefix=''): + if type(to_aggregate) != list: + to_aggregate = [to_aggregate for _ in summarize_ops] + + # set up aggregating metrics + if np.any(to_aggregate): + agg_ops = [] + for op, name, to_agg in zip(summarize_ops, summarize_names, to_aggregate): + if to_agg: + # agg_ops.append(slim.metrics.streaming_mean(op, return_reset_op=True)) + agg_ops.append(tf.contrib.metrics.streaming_mean(op)) + # agg_ops.append(tf.contrib.metrics.streaming_mean(op, return_reset_op=True)) + else: + agg_ops.append([None, None, None]) + + # agg_values_op, agg_update_op, agg_reset_op = zip(*agg_ops) + # agg_update_op = [x for x in agg_update_op if x is not None] + # agg_reset_op = [x for x in agg_reset_op if x is not None] + agg_values_op, agg_update_op = zip(*agg_ops) + agg_update_op = [x for x in agg_update_op if x is not None] + agg_reset_op = [tf.no_op()] + else: + agg_values_op = [None for _ in to_aggregate] + agg_update_op = [tf.no_op()] + agg_reset_op = [tf.no_op()] + + for op, name, to_agg, agg_op in zip(summarize_ops, summarize_names, to_aggregate, agg_values_op): + if to_agg: + add_scalar_summary_op(agg_op, name, summary_key, print_summary_key, prefix) + else: + add_scalar_summary_op(op, name, summary_key, print_summary_key, prefix) + + summary_op = tf.summary.merge_all(summary_key) + print_summary_op = tf.summary.merge_all(print_summary_key) + return summary_op, print_summary_op, agg_update_op, agg_reset_op + + + +def accum_val_ops(outputs, names, global_step, output_dir, metric_summary, N): + """Processes the collected outputs to compute AP for action prediction. + + Args: + outputs : List of scalar ops to summarize. + names : Name of the scalar ops. + global_step : global_step. + output_dir : where to store results. + metric_summary : summary object to add summaries to. + N : number of outputs to process. 
+ """ + outs = [] + if N >= 0: + outputs = outputs[:N] + for i in range(len(outputs[0])): + scalar = np.array(map(lambda x: x[i], outputs)) + assert(scalar.ndim == 1) + add_value_to_summary(metric_summary, names[i], np.mean(scalar), + tag_str='{:>27s}: [{:s}]: %f'.format(names[i], '')) + outs.append(np.mean(scalar)) + return outs + +def get_default_summary_ops(): + return utils.Foo(summary_ops=None, print_summary_ops=None, + additional_return_ops=[], arop_summary_iters=[], + arop_eval_fns=[]) + + +def simple_summaries(summarize_ops, summarize_names, mode, to_aggregate=False, + scope_name='summary'): + + if type(to_aggregate) != list: + to_aggregate = [to_aggregate for _ in summarize_ops] + + summary_key = '{:s}_summaries'.format(mode) + print_summary_key = '{:s}_print_summaries'.format(mode) + prefix=' [{:s}]: '.format(mode) + + # Default ops for things that dont need to be aggregated. + if not np.all(to_aggregate): + for op, name, to_agg in zip(summarize_ops, summarize_names, to_aggregate): + if not to_agg: + add_scalar_summary_op(op, name, summary_key, print_summary_key, prefix) + summary_ops = tf.summary.merge_all(summary_key) + print_summary_ops = tf.summary.merge_all(print_summary_key) + else: + summary_ops = tf.no_op() + print_summary_ops = tf.no_op() + + # Default ops for things that dont need to be aggregated. + if np.any(to_aggregate): + additional_return_ops = [[summarize_ops[i] + for i, x in enumerate(to_aggregate )if x]] + arop_summary_iters = [-1] + s_names = ['{:s}/{:s}'.format(scope_name, summarize_names[i]) + for i, x in enumerate(to_aggregate) if x] + fn = lambda outputs, global_step, output_dir, metric_summary, N: \ + accum_val_ops(outputs, s_names, global_step, output_dir, metric_summary, + N) + arop_eval_fns = [fn] + else: + additional_return_ops = [] + arop_summary_iters = [] + arop_eval_fns = [] + return summary_ops, print_summary_ops, additional_return_ops, \ + arop_summary_iters, arop_eval_fns diff --git a/cognitive_mapping_and_planning/tfcode/vision_baseline_lstm.py b/cognitive_mapping_and_planning/tfcode/vision_baseline_lstm.py new file mode 100644 index 0000000000000000000000000000000000000000..ccf3ab23b06b71ed2a6d300b9a7d2a67a396c52e --- /dev/null +++ b/cognitive_mapping_and_planning/tfcode/vision_baseline_lstm.py @@ -0,0 +1,533 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +import numpy as np + + +import tensorflow as tf + +from tensorflow.contrib import slim + +import logging +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +from src import utils +import src.file_utils as fu +import tfcode.nav_utils as nu +from tfcode import tf_utils + +setup_train_step_kwargs = nu.default_train_step_kwargs +compute_losses_multi_or = nu.compute_losses_multi_or +get_repr_from_image = nu.get_repr_from_image + +_save_d_at_t = nu.save_d_at_t +_save_all = nu.save_all +_eval_ap = nu.eval_ap +_eval_dist = nu.eval_dist +_plot_trajectories = nu.plot_trajectories + +def lstm_online(cell_fn, num_steps, inputs, state, varscope): + # inputs is B x num_steps x C, C channels. + # state is 2 tuple with B x 1 x C1, B x 1 x C2 + # Output state is always B x 1 x C + inputs = tf.unstack(inputs, axis=1, num=num_steps) + state = tf.unstack(state, axis=1, num=1)[0] + outputs = [] + + if num_steps > 1: + varscope.reuse_variables() + + for s in range(num_steps): + output, state = cell_fn(inputs[s], state) + outputs.append(output) + outputs = tf.stack(outputs, axis=1) + state = tf.stack([state], axis=1) + return outputs, state + +def _inputs(problem, lstm_states, lstm_state_dims): + # Set up inputs. + with tf.name_scope('inputs'): + n_views = problem.n_views + + inputs = [] + inputs.append(('orig_maps', tf.float32, + (problem.batch_size, 1, None, None, 1))) + inputs.append(('goal_loc', tf.float32, + (problem.batch_size, problem.num_goals, 2))) + + # For initing LSTM. + inputs.append(('rel_goal_loc_at_start', tf.float32, + (problem.batch_size, problem.num_goals, + problem.rel_goal_loc_dim))) + common_input_data, _ = tf_utils.setup_inputs(inputs) + + inputs = [] + inputs.append(('imgs', tf.float32, (problem.batch_size, None, n_views, + problem.img_height, problem.img_width, + problem.img_channels))) + # Goal location as a tuple of delta location and delta theta. 
+    inputs.append(('rel_goal_loc', tf.float32, (problem.batch_size, None,
+                                                problem.rel_goal_loc_dim)))
+    if problem.outputs.visit_count:
+      inputs.append(('visit_count', tf.int32, (problem.batch_size, None, 1)))
+      inputs.append(('last_visit', tf.int32, (problem.batch_size, None, 1)))
+
+    for i, (state, dim) in enumerate(zip(lstm_states, lstm_state_dims)):
+      inputs.append((state, tf.float32, (problem.batch_size, 1, dim)))
+
+    if problem.outputs.egomotion:
+      inputs.append(('incremental_locs', tf.float32,
+                     (problem.batch_size, None, 2)))
+      inputs.append(('incremental_thetas', tf.float32,
+                     (problem.batch_size, None, 1)))
+
+    inputs.append(('step_number', tf.int32, (1, None, 1)))
+    inputs.append(('node_ids', tf.int32, (problem.batch_size, None,
+                                          problem.node_ids_dim)))
+    inputs.append(('perturbs', tf.float32, (problem.batch_size, None,
+                                            problem.perturbs_dim)))
+
+    # For plotting results.
+    inputs.append(('loc_on_map', tf.float32, (problem.batch_size, None, 2)))
+    inputs.append(('gt_dist_to_goal', tf.float32, (problem.batch_size, None, 1)))
+    step_input_data, _ = tf_utils.setup_inputs(inputs)
+
+    inputs = []
+    inputs.append(('executed_actions', tf.int32, (problem.batch_size, None)))
+    inputs.append(('rewards', tf.float32, (problem.batch_size, None)))
+    inputs.append(('action_sample_wts', tf.float32, (problem.batch_size, None)))
+    inputs.append(('action', tf.int32, (problem.batch_size, None,
+                                        problem.num_actions)))
+    train_data, _ = tf_utils.setup_inputs(inputs)
+    train_data.update(step_input_data)
+    train_data.update(common_input_data)
+  return common_input_data, step_input_data, train_data
+
+
+def _add_summaries(m, summary_mode, arop_full_summary_iters):
+  summarize_ops = [m.lr_op, m.global_step_op, m.sample_gt_prob_op,
+                   m.total_loss_op, m.data_loss_op, m.reg_loss_op] + m.acc_ops
+  summarize_names = ['lr', 'global_step', 'sample_gt_prob_op', 'total_loss',
+                     'data_loss', 'reg_loss'] + \
+                    ['acc_{:d}'.format(i) for i in range(len(m.acc_ops))]
+  to_aggregate = [0, 0, 0, 1, 1, 1] + [1]*len(m.acc_ops)
+
+  scope_name = 'summary'
+  with tf.name_scope(scope_name):
+    s_ops = nu.add_default_summaries(summary_mode, arop_full_summary_iters,
+                                     summarize_ops, summarize_names,
+                                     to_aggregate, m.action_prob_op,
+                                     m.input_tensors, scope_name=scope_name)
+  m.summary_ops = {summary_mode: s_ops}
+
+def visit_count_fc(visit_count, last_visit, embed_neurons, wt_decay,
+                   fc_dropout, is_training=None):
+  with tf.variable_scope('embed_visit_count'):
+    visit_count = tf.reshape(visit_count, shape=[-1])
+    last_visit = tf.reshape(last_visit, shape=[-1])
+
+    visit_count = tf.clip_by_value(visit_count, clip_value_min=-1,
+                                   clip_value_max=15)
+    last_visit = tf.clip_by_value(last_visit, clip_value_min=-1,
+                                  clip_value_max=15)
+    visit_count = tf.one_hot(visit_count, depth=16, axis=1, dtype=tf.float32,
+                             on_value=10., off_value=0.)
+    last_visit = tf.one_hot(last_visit, depth=16, axis=1, dtype=tf.float32,
+                            on_value=10., off_value=0.)
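+    # Concatenate the two 16-way one-hot encodings and embed them with a small
+    # fully connected network.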
+ f = tf.concat([visit_count, last_visit], 1) + x, _ = tf_utils.fc_network( + f, neurons=embed_neurons, wt_decay=wt_decay, name='visit_count_embed', + offset=0, batch_norm_param=None, dropout_ratio=fc_dropout, + is_training=is_training) + return x + +def lstm_setup(name, x, batch_size, is_single_step, lstm_dim, lstm_out, + num_steps, state_input_op): + # returns state_name, state_init_op, updated_state_op, out_op + with tf.name_scope('reshape_'+name): + sh = x.get_shape().as_list() + x = tf.reshape(x, shape=[batch_size, -1, sh[-1]]) + + with tf.variable_scope(name) as varscope: + cell = tf.contrib.rnn.LSTMCell( + num_units=lstm_dim, forget_bias=1.0, state_is_tuple=False, + num_proj=lstm_out, use_peepholes=True, + initializer=tf.random_uniform_initializer(-0.01, 0.01, seed=0), + cell_clip=None, proj_clip=None) + + sh = [batch_size, 1, lstm_dim+lstm_out] + state_init_op = tf.constant(0., dtype=tf.float32, shape=sh) + + fn = lambda ns: lstm_online(cell, ns, x, state_input_op, varscope) + out_op, updated_state_op = tf.cond(is_single_step, lambda: fn(1), lambda: + fn(num_steps)) + + return name, state_init_op, updated_state_op, out_op + +def combine_setup(name, combine_type, embed_img, embed_goal, num_img_neuorons=None, + num_goal_neurons=None): + with tf.name_scope(name + '_' + combine_type): + if combine_type == 'add': + # Simple concat features from goal and image + out = embed_img + embed_goal + + elif combine_type == 'multiply': + # Multiply things together + re_embed_img = tf.reshape( + embed_img, shape=[-1, num_img_neuorons / num_goal_neurons, + num_goal_neurons]) + re_embed_goal = tf.reshape(embed_goal, shape=[-1, num_goal_neurons, 1]) + x = tf.matmul(re_embed_img, re_embed_goal, transpose_a=False, transpose_b=False) + out = slim.flatten(x) + elif combine_type == 'none' or combine_type == 'imgonly': + out = embed_img + elif combine_type == 'goalonly': + out = embed_goal + else: + logging.fatal('Undefined combine_type: %s', combine_type) + return out + + +def preprocess_egomotion(locs, thetas): + with tf.name_scope('pre_ego'): + pre_ego = tf.concat([locs, tf.sin(thetas), tf.cos(thetas)], 2) + sh = pre_ego.get_shape().as_list() + pre_ego = tf.reshape(pre_ego, [-1, sh[-1]]) + return pre_ego + +def setup_to_run(m, args, is_training, batch_norm_is_training, summary_mode): + # Set up the model. + tf.set_random_seed(args.solver.seed) + task_params = args.navtask.task_params + num_steps = task_params.num_steps + num_goals = task_params.num_goals + num_actions = task_params.num_actions + num_actions_ = num_actions + + n_views = task_params.n_views + + batch_norm_is_training_op = \ + tf.placeholder_with_default(batch_norm_is_training, shape=[], + name='batch_norm_is_training_op') + # Setup the inputs + m.input_tensors = {} + lstm_states = []; lstm_state_dims = []; + state_names = []; updated_state_ops = []; init_state_ops = []; + if args.arch.lstm_output: + lstm_states += ['lstm_output'] + lstm_state_dims += [args.arch.lstm_output_dim+task_params.num_actions] + if args.arch.lstm_ego: + lstm_states += ['lstm_ego'] + lstm_state_dims += [args.arch.lstm_ego_dim + args.arch.lstm_ego_out] + lstm_states += ['lstm_img'] + lstm_state_dims += [args.arch.lstm_img_dim + args.arch.lstm_img_out] + elif args.arch.lstm_img: + # An LSTM only on the image + lstm_states += ['lstm_img'] + lstm_state_dims += [args.arch.lstm_img_dim + args.arch.lstm_img_out] + else: + # No LSTMs involved here. 
+ None + + m.input_tensors['common'], m.input_tensors['step'], m.input_tensors['train'] = \ + _inputs(task_params, lstm_states, lstm_state_dims) + + with tf.name_scope('check_size'): + is_single_step = tf.equal(tf.unstack(tf.shape(m.input_tensors['step']['imgs']), + num=6)[1], 1) + + images_reshaped = tf.reshape(m.input_tensors['step']['imgs'], + shape=[-1, task_params.img_height, task_params.img_width, + task_params.img_channels], name='re_image') + + rel_goal_loc_reshaped = tf.reshape(m.input_tensors['step']['rel_goal_loc'], + shape=[-1, task_params.rel_goal_loc_dim], name='re_rel_goal_loc') + + x, vars_ = get_repr_from_image( + images_reshaped, task_params.modalities, task_params.data_augment, + args.arch.encoder, args.solver.freeze_conv, args.solver.wt_decay, + is_training) + + # Reshape into nice things so that these can be accumulated over time steps + # for faster backprop. + sh_before = x.get_shape().as_list() + m.encoder_output = tf.reshape( + x, shape=[task_params.batch_size, -1, n_views] + sh_before[1:]) + x = tf.reshape(m.encoder_output, shape=[-1] + sh_before[1:]) + + # Add a layer to reduce dimensions for a fc layer. + if args.arch.dim_reduce_neurons > 0: + ks = 1; neurons = args.arch.dim_reduce_neurons; + init_var = np.sqrt(2.0/(ks**2)/neurons) + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + m.conv_feat = slim.conv2d( + x, neurons, kernel_size=ks, stride=1, normalizer_fn=slim.batch_norm, + normalizer_params=batch_norm_param, padding='SAME', scope='dim_reduce', + weights_regularizer=slim.l2_regularizer(args.solver.wt_decay), + weights_initializer=tf.random_normal_initializer(stddev=init_var)) + reshape_conv_feat = slim.flatten(m.conv_feat) + sh = reshape_conv_feat.get_shape().as_list() + m.reshape_conv_feat = tf.reshape(reshape_conv_feat, + shape=[-1, sh[1]*n_views]) + + # Restore these from a checkpoint. + if args.solver.pretrained_path is not None: + m.init_fn = slim.assign_from_checkpoint_fn(args.solver.pretrained_path, + vars_) + else: + m.init_fn = None + + # Hit the goal_location with a bunch of fully connected layers, to embed it + # into some space. + with tf.variable_scope('embed_goal'): + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + m.embed_goal, _ = tf_utils.fc_network( + rel_goal_loc_reshaped, neurons=args.arch.goal_embed_neurons, + wt_decay=args.solver.wt_decay, name='goal_embed', offset=0, + batch_norm_param=batch_norm_param, dropout_ratio=args.arch.fc_dropout, + is_training=is_training) + + if args.arch.embed_goal_for_state: + with tf.variable_scope('embed_goal_for_state'): + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + m.embed_goal_for_state, _ = tf_utils.fc_network( + m.input_tensors['common']['rel_goal_loc_at_start'][:,0,:], + neurons=args.arch.goal_embed_neurons, wt_decay=args.solver.wt_decay, + name='goal_embed', offset=0, batch_norm_param=batch_norm_param, + dropout_ratio=args.arch.fc_dropout, is_training=is_training) + + # Hit the goal_location with a bunch of fully connected layers, to embed it + # into some space. 
+ with tf.variable_scope('embed_img'): + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + m.embed_img, _ = tf_utils.fc_network( + m.reshape_conv_feat, neurons=args.arch.img_embed_neurons, + wt_decay=args.solver.wt_decay, name='img_embed', offset=0, + batch_norm_param=batch_norm_param, dropout_ratio=args.arch.fc_dropout, + is_training=is_training) + + # For lstm_ego, and lstm_image, embed the ego motion, accumulate it into an + # LSTM, combine with image features and accumulate those in an LSTM. Finally + # combine what you get from the image LSTM with the goal to output an action. + if args.arch.lstm_ego: + ego_reshaped = preprocess_egomotion(m.input_tensors['step']['incremental_locs'], + m.input_tensors['step']['incremental_thetas']) + with tf.variable_scope('embed_ego'): + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + m.embed_ego, _ = tf_utils.fc_network( + ego_reshaped, neurons=args.arch.ego_embed_neurons, + wt_decay=args.solver.wt_decay, name='ego_embed', offset=0, + batch_norm_param=batch_norm_param, dropout_ratio=args.arch.fc_dropout, + is_training=is_training) + + state_name, state_init_op, updated_state_op, out_op = lstm_setup( + 'lstm_ego', m.embed_ego, task_params.batch_size, is_single_step, + args.arch.lstm_ego_dim, args.arch.lstm_ego_out, num_steps*num_goals, + m.input_tensors['step']['lstm_ego']) + state_names += [state_name] + init_state_ops += [state_init_op] + updated_state_ops += [updated_state_op] + + # Combine the output with the vision features. + m.img_ego_op = combine_setup('img_ego', args.arch.combine_type_ego, + m.embed_img, out_op, + args.arch.img_embed_neurons[-1], + args.arch.lstm_ego_out) + + # LSTM on these vision features. + state_name, state_init_op, updated_state_op, out_op = lstm_setup( + 'lstm_img', m.img_ego_op, task_params.batch_size, is_single_step, + args.arch.lstm_img_dim, args.arch.lstm_img_out, num_steps*num_goals, + m.input_tensors['step']['lstm_img']) + state_names += [state_name] + init_state_ops += [state_init_op] + updated_state_ops += [updated_state_op] + + m.img_for_goal = out_op + num_img_for_goal_neurons = args.arch.lstm_img_out + + elif args.arch.lstm_img: + # LSTM on just the image features. + state_name, state_init_op, updated_state_op, out_op = lstm_setup( + 'lstm_img', m.embed_img, task_params.batch_size, is_single_step, + args.arch.lstm_img_dim, args.arch.lstm_img_out, num_steps*num_goals, + m.input_tensors['step']['lstm_img']) + state_names += [state_name] + init_state_ops += [state_init_op] + updated_state_ops += [updated_state_op] + m.img_for_goal = out_op + num_img_for_goal_neurons = args.arch.lstm_img_out + + else: + m.img_for_goal = m.embed_img + num_img_for_goal_neurons = args.arch.img_embed_neurons[-1] + + + if args.arch.use_visit_count: + m.embed_visit_count = visit_count_fc( + m.input_tensors['step']['visit_count'], + m.input_tensors['step']['last_visit'], args.arch.goal_embed_neurons, + args.solver.wt_decay, args.arch.fc_dropout, is_training=is_training) + m.embed_goal = m.embed_goal + m.embed_visit_count + + m.combined_f = combine_setup('img_goal', args.arch.combine_type, + m.img_for_goal, m.embed_goal, + num_img_for_goal_neurons, + args.arch.goal_embed_neurons[-1]) + + # LSTM on the combined representation. + if args.arch.lstm_output: + name = 'lstm_output' + # A few fully connected layers here. 
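+    # Their output feeds the output LSTM below, which emits the per-step action
+    # logits.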
+ with tf.variable_scope('action_pred'): + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + x, _ = tf_utils.fc_network( + m.combined_f, neurons=args.arch.pred_neurons, + wt_decay=args.solver.wt_decay, name='pred', offset=0, + batch_norm_param=batch_norm_param, dropout_ratio=args.arch.fc_dropout) + + if args.arch.lstm_output_init_state_from_goal: + # Use the goal embedding to initialize the LSTM state. + # UGLY CLUGGY HACK: if this is doing computation for a single time step + # then this will not involve back prop, so we can use the state input from + # the feed dict, otherwise we compute the state representation from the + # goal and feed that in. Necessary for using goal location to generate the + # state representation. + m.embed_goal_for_state = tf.expand_dims(m.embed_goal_for_state, dim=1) + state_op = tf.cond(is_single_step, lambda: m.input_tensors['step'][name], + lambda: m.embed_goal_for_state) + state_name, state_init_op, updated_state_op, out_op = lstm_setup( + name, x, task_params.batch_size, is_single_step, + args.arch.lstm_output_dim, + num_actions_, + num_steps*num_goals, state_op) + init_state_ops += [m.embed_goal_for_state] + else: + state_op = m.input_tensors['step'][name] + state_name, state_init_op, updated_state_op, out_op = lstm_setup( + name, x, task_params.batch_size, is_single_step, + args.arch.lstm_output_dim, + num_actions_, num_steps*num_goals, state_op) + init_state_ops += [state_init_op] + + state_names += [state_name] + updated_state_ops += [updated_state_op] + + out_op = tf.reshape(out_op, shape=[-1, num_actions_]) + if num_actions_ > num_actions: + m.action_logits_op = out_op[:,:num_actions] + m.baseline_op = out_op[:,num_actions:] + else: + m.action_logits_op = out_op + m.baseline_op = None + m.action_prob_op = tf.nn.softmax(m.action_logits_op) + + else: + # A few fully connected layers here. + with tf.variable_scope('action_pred'): + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + out_op, _ = tf_utils.fc_network( + m.combined_f, neurons=args.arch.pred_neurons, + wt_decay=args.solver.wt_decay, name='pred', offset=0, + num_pred=num_actions_, + batch_norm_param=batch_norm_param, + dropout_ratio=args.arch.fc_dropout, is_training=is_training) + if num_actions_ > num_actions: + m.action_logits_op = out_op[:,:num_actions] + m.baseline_op = out_op[:,num_actions:] + else: + m.action_logits_op = out_op + m.baseline_op = None + m.action_prob_op = tf.nn.softmax(m.action_logits_op) + + m.train_ops = {} + m.train_ops['step'] = m.action_prob_op + m.train_ops['common'] = [m.input_tensors['common']['orig_maps'], + m.input_tensors['common']['goal_loc'], + m.input_tensors['common']['rel_goal_loc_at_start']] + m.train_ops['state_names'] = state_names + m.train_ops['init_state'] = init_state_ops + m.train_ops['updated_state'] = updated_state_ops + m.train_ops['batch_norm_is_training_op'] = batch_norm_is_training_op + + # Flat list of ops which cache the step data. 
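+  # When the convolutional encoder is frozen, its output is cached for each
+  # step and fed back directly during the training pass instead of being
+  # recomputed.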
+ m.train_ops['step_data_cache'] = [tf.no_op()] + + if args.solver.freeze_conv: + m.train_ops['step_data_cache'] = [m.encoder_output] + else: + m.train_ops['step_data_cache'] = [] + + ewma_decay = 0.99 if is_training else 0.0 + weight = tf.ones_like(m.input_tensors['train']['action'], dtype=tf.float32, + name='weight') + + m.reg_loss_op, m.data_loss_op, m.total_loss_op, m.acc_ops = \ + compute_losses_multi_or( + m.action_logits_op, m.input_tensors['train']['action'], + weights=weight, num_actions=num_actions, + data_loss_wt=args.solver.data_loss_wt, + reg_loss_wt=args.solver.reg_loss_wt, ewma_decay=ewma_decay) + + + if args.solver.freeze_conv: + vars_to_optimize = list(set(tf.trainable_variables()) - set(vars_)) + else: + vars_to_optimize = None + + m.lr_op, m.global_step_op, m.train_op, m.should_stop_op, m.optimizer, \ + m.sync_optimizer = tf_utils.setup_training( + m.total_loss_op, + args.solver.initial_learning_rate, + args.solver.steps_per_decay, + args.solver.learning_rate_decay, + args.solver.momentum, + args.solver.max_steps, + args.solver.sync, + args.solver.adjust_lr_sync, + args.solver.num_workers, + args.solver.task, + vars_to_optimize=vars_to_optimize, + clip_gradient_norm=args.solver.clip_gradient_norm, + typ=args.solver.typ, momentum2=args.solver.momentum2, + adam_eps=args.solver.adam_eps) + + + if args.arch.sample_gt_prob_type == 'inverse_sigmoid_decay': + m.sample_gt_prob_op = tf_utils.inverse_sigmoid_decay(args.arch.isd_k, + m.global_step_op) + elif args.arch.sample_gt_prob_type == 'zero': + m.sample_gt_prob_op = tf.constant(-1.0, dtype=tf.float32) + elif args.arch.sample_gt_prob_type.split('_')[0] == 'step': + step = int(args.arch.sample_gt_prob_type.split('_')[1]) + m.sample_gt_prob_op = tf_utils.step_gt_prob( + step, m.input_tensors['step']['step_number'][0,0,0]) + + m.sample_action_type = args.arch.action_sample_type + m.sample_action_combine_type = args.arch.action_sample_combine_type + _add_summaries(m, summary_mode, args.summary.arop_full_summary_iters) + + m.init_op = tf.group(tf.global_variables_initializer(), + tf.local_variables_initializer()) + m.saver_op = tf.train.Saver(keep_checkpoint_every_n_hours=4, + write_version=tf.train.SaverDef.V2) + + return m diff --git a/compression/README.md b/compression/README.md index 4b95961b2d20af864cc91d192e9365dbf2c2625f..4406b268fb15944d84072a31f05d25afc6ba252d 100644 --- a/compression/README.md +++ b/compression/README.md @@ -1,107 +1,15 @@ -# Image Compression with Neural Networks +# Compression with Neural Networks -This is a [TensorFlow](http://www.tensorflow.org/) model for compressing and -decompressing images using an already trained Residual GRU model as descibed -in [Full Resolution Image Compression with Recurrent Neural Networks] -(https://arxiv.org/abs/1608.05148). Please consult the paper for more details -on the architecture and compression results. +This is a [TensorFlow](http://www.tensorflow.org/) model repo containing +research on compression with neural networks. This repo currently contains +code for the following papers: -This code will allow you to perform the lossy compression on an model -already trained on compression. This code doesn't not currently contain the -Entropy Coding portions of our paper. +[Full Resolution Image Compression with Recurrent Neural Networks](https://arxiv.org/abs/1608.05148) +## Organization +[Image Encoder](image_encoder/): Encoding and decoding images into their binary representation. 
-## Prerequisites -The only software requirements for running the encoder and decoder is having -Tensorflow installed. You will also need to [download] -(http://download.tensorflow.org/models/compression_residual_gru-2016-08-23.tar.gz) -and extract the model residual_gru.pb. - -If you want to generate the perceptual similarity under MS-SSIM, you will also -need to [Install SciPy](https://www.scipy.org/install.html). - -## Encoding -The Residual GRU network is fully convolutional, but requires the images -height and width in pixels by a multiple of 32. There is an image in this folder -called example.png that is 768x1024 if one is needed for testing. We also -rely on TensorFlow's built in decoding ops, which support only PNG and JPEG at -time of release. - -To encode an image, simply run the following command: - -`python encoder.py --input_image=/your/image/here.png ---output_codes=output_codes.npz --iteration=15 ---model=/path/to/model/residual_gru.pb -` - -The iteration parameter specifies the lossy-quality to target for compression. -The quality can be [0-15], where 0 corresponds to a target of 1/8 (bits per -pixel) bpp and every increment results in an additional 1/8 bpp. - -| Iteration | BPP | Compression Ratio | -|---: |---: |---: | -|0 | 0.125 | 192:1| -|1 | 0.250 | 96:1| -|2 | 0.375 | 64:1| -|3 | 0.500 | 48:1| -|4 | 0.625 | 38.4:1| -|5 | 0.750 | 32:1| -|6 | 0.875 | 27.4:1| -|7 | 1.000 | 24:1| -|8 | 1.125 | 21.3:1| -|9 | 1.250 | 19.2:1| -|10 | 1.375 | 17.4:1| -|11 | 1.500 | 16:1| -|12 | 1.625 | 14.7:1| -|13 | 1.750 | 13.7:1| -|14 | 1.875 | 12.8:1| -|15 | 2.000 | 12:1| - -The output_codes file contains the numpy shape and a flattened, bit-packed -array of the codes. These can be inspected in python by using numpy.load(). - - -## Decoding -After generating codes for an image, the lossy reconstructions for that image -can be done as follows: - -`python decoder.py --input_codes=codes.npz --output_directory=/tmp/decoded/ ---model=residual_gru.pb` - -The output_directory will contain images decoded at each quality level. - - -## Comparing Similarity -One of our primary metrics for comparing how similar two images are -is MS-SSIM. - -To generate these metrics on your images you can run: -`python msssim.py --original_image=/path/to/your/image.png ---compared_image=/tmp/decoded/image_15.png` - - -## Results -CSV results containing the post-entropy bitrates and MS-SSIM over Kodak can -are available for reference. Each row of the CSV represents each of the Kodak -images in their dataset number (1-24). Each column of the CSV represents each -iteration of the model (1-16). - -[Post Entropy Bitrates](https://storage.googleapis.com/compression-ml/residual_gru_results/bitrate.csv) - -[MS-SSIM](https://storage.googleapis.com/compression-ml/residual_gru_results/msssim.csv) - - -## FAQ - -#### How do I train my own compression network? -We currently don't provide the code to build and train a compression -graph from scratch. - -#### I get an InvalidArgumentError: Incompatible shapes. -This is usually due to the fact that our network only supports images that are -both height and width divisible by 32 pixel. Try padding your images to 32 -pixel boundaries. - +[Entropy Coder](entropy_coder/): Lossless compression of the binary representation. ## Contact Info -Model repository maintained by Nick Johnston ([nickj-google](https://github.com/nickj-google)). +Model repository maintained by Nick Johnston ([nmjohn](https://github.com/nmjohn)). 
diff --git a/compression/entropy_coder/README.md b/compression/entropy_coder/README.md new file mode 100644 index 0000000000000000000000000000000000000000..59e889990aab71e12ed13122c9b5a796a048402a --- /dev/null +++ b/compression/entropy_coder/README.md @@ -0,0 +1,109 @@ +# Neural net based entropy coding + +This is a [TensorFlow](http://www.tensorflow.org/) model for additional +lossless compression of bitstreams generated by neural net based image +encoders as described in +[https://arxiv.org/abs/1703.10114](https://arxiv.org/abs/1703.10114). + +To be more specific, the entropy coder aims at compressing further binary +codes which have a 3D tensor structure with: + +* the first two dimensions of the tensors corresponding to the height and +the width of the binary codes, +* the last dimension being the depth of the codes. The last dimension can be +sliced into N groups of K, where each additional group is used by the image +decoder to add more details to the reconstructed image. + +The code in this directory only contains the underlying code probability model +but does not perform the actual compression using arithmetic coding. +The code probability model is enough to compute the theoretical compression +ratio. + + +## Prerequisites +The only software requirements for running the encoder and decoder is having +Tensorflow installed. + +You will also need to add the top level source directory of the entropy coder +to your `PYTHONPATH`, for example: + +`export PYTHONPATH=${PYTHONPATH}:/tmp/models/compression` + + +## Training the entropy coder + +### Synthetic dataset +If you do not have a training dataset, there is a simple code generative model +that you can use to generate a dataset and play with the entropy coder. +The generative model is located under dataset/gen\_synthetic\_dataset.py. Note +that this simple generative model is not going to give good results on real +images as it is not supposed to be close to the statistics of the binary +representation of encoded images. Consider it as a toy dataset, no more, no +less. + +To generate a synthetic dataset with 20000 samples: + +`mkdir -p /tmp/dataset` + +`python ./dataset/gen_synthetic_dataset.py --dataset_dir=/tmp/dataset/ +--count=20000` + +Note that the generator has not been optimized at all, generating the synthetic +dataset is currently pretty slow. + +### Training + +If you just want to play with the entropy coder trainer, here is the command +line that can be used to train the entropy coder on the synthetic dataset: + +`mkdir -p /tmp/entropy_coder_train` + +`python ./core/entropy_coder_train.py --task=0 +--train_dir=/tmp/entropy_coder_train/ +--model=progressive +--model_config=./configs/synthetic/model_config.json +--train_config=./configs/synthetic/train_config.json +--input_config=./configs/synthetic/input_config.json +` + +Training is configured using 3 files formatted using JSON: + +* One file is used to configure the underlying entropy coder model. + Currently, only the *progressive* model is supported. + This model takes 2 mandatory parameters and an optional one: + * `layer_depth`: the number of bits per layer (a.k.a. iteration). + Background: the image decoder takes each layer to add more detail + to the image. + * `layer_count`: the maximum number of layers that should be supported + by the model. This should be equal or greater than the maximum number + of layers in the input binary codes. 
+ * `coded_layer_count`: This can be used to consider only partial codes, + keeping only the first `coded_layer_count` layers and ignoring the + remaining layers. If left empty, the binary codes are left unchanged. +* One file to configure the training, including the learning rate, ... + The meaning of the parameters are pretty straightforward. Note that this + file is only used during training and is not needed during inference. +* One file to specify the input dataset to use during training. + The dataset is formatted using tf.RecordIO. + + +## Inference: file size after entropy coding. + +### Using a synthetic sample + +Here is the command line to generate a single synthetic sample formatted +in the same way as what is provided by the image encoder: + +`python ./dataset/gen_synthetic_single.py +--sample_filename=/tmp/dataset/sample_0000.npz` + +To actually compute the additional compression ratio using the entropy coder +trained in the previous step: + +`python ./core/entropy_coder_single.py +--model=progressive +--model_config=./configs/synthetic/model_config.json +--input_codes=/tmp/dataset/sample_0000.npz +--checkpoint=/tmp/entropy_coder_train/model.ckpt-209078` + +where the checkpoint number should be adjusted accordingly. diff --git a/compression/entropy_coder/__init__.py b/compression/entropy_coder/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/compression/entropy_coder/all_models/__init__.py b/compression/entropy_coder/all_models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/compression/entropy_coder/all_models/all_models.py b/compression/entropy_coder/all_models/all_models.py new file mode 100644 index 0000000000000000000000000000000000000000..e376dac737667a348065eec622920b0a81ed1ac9 --- /dev/null +++ b/compression/entropy_coder/all_models/all_models.py @@ -0,0 +1,19 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Import and register all the entropy coder models.""" + +# pylint: disable=unused-import +from entropy_coder.progressive import progressive diff --git a/compression/entropy_coder/all_models/all_models_test.py b/compression/entropy_coder/all_models/all_models_test.py new file mode 100644 index 0000000000000000000000000000000000000000..b8aff504a0a00d579d1b2768164b78b6c095b235 --- /dev/null +++ b/compression/entropy_coder/all_models/all_models_test.py @@ -0,0 +1,68 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Basic test of all registered models.""" + +import tensorflow as tf + +# pylint: disable=unused-import +import all_models +# pylint: enable=unused-import +from entropy_coder.model import model_factory + + +class AllModelsTest(tf.test.TestCase): + + def testBuildModelForTraining(self): + factory = model_factory.GetModelRegistry() + model_names = factory.GetAvailableModels() + + for m in model_names: + tf.reset_default_graph() + + global_step = tf.Variable(tf.zeros([], dtype=tf.int64), + trainable=False, + name='global_step') + + optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1) + + batch_size = 3 + height = 40 + width = 20 + depth = 5 + binary_codes = tf.placeholder(dtype=tf.float32, + shape=[batch_size, height, width, depth]) + + # Create a model with the default configuration. + print('Creating model: {}'.format(m)) + model = factory.CreateModel(m) + model.Initialize(global_step, + optimizer, + model.GetConfigStringForUnitTest()) + self.assertTrue(model.loss is None, 'model: {}'.format(m)) + self.assertTrue(model.train_op is None, 'model: {}'.format(m)) + self.assertTrue(model.average_code_length is None, 'model: {}'.format(m)) + + # Build the Tensorflow graph corresponding to the model. + model.BuildGraph(binary_codes) + self.assertTrue(model.loss is not None, 'model: {}'.format(m)) + self.assertTrue(model.average_code_length is not None, + 'model: {}'.format(m)) + if model.train_op is None: + print('Model {} is not trainable'.format(m)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/compression/entropy_coder/configs/gru_prime3/model_config.json b/compression/entropy_coder/configs/gru_prime3/model_config.json new file mode 100644 index 0000000000000000000000000000000000000000..cf63a4c454df5c47c732c5eaeea481b2aa714665 --- /dev/null +++ b/compression/entropy_coder/configs/gru_prime3/model_config.json @@ -0,0 +1,4 @@ +{ + "layer_count": 16, + "layer_depth": 32 +} diff --git a/compression/entropy_coder/configs/synthetic/input_config.json b/compression/entropy_coder/configs/synthetic/input_config.json new file mode 100644 index 0000000000000000000000000000000000000000..18455e65120cd45cb04106ed8b6b2d6641e1d49a --- /dev/null +++ b/compression/entropy_coder/configs/synthetic/input_config.json @@ -0,0 +1,4 @@ +{ + "data": "/tmp/dataset/synthetic_dataset", + "unique_code_size": true +} diff --git a/compression/entropy_coder/configs/synthetic/model_config.json b/compression/entropy_coder/configs/synthetic/model_config.json new file mode 100644 index 0000000000000000000000000000000000000000..c6f1f3e11547a75c05019e24c59a7fc6d2a29e3b --- /dev/null +++ b/compression/entropy_coder/configs/synthetic/model_config.json @@ -0,0 +1,4 @@ +{ + "layer_depth": 2, + "layer_count": 8 +} diff --git a/compression/entropy_coder/configs/synthetic/train_config.json b/compression/entropy_coder/configs/synthetic/train_config.json new file mode 100644 index 0000000000000000000000000000000000000000..79e4909fd3f93df983d79890e25b7b61ba14aa40 --- /dev/null +++ 
b/compression/entropy_coder/configs/synthetic/train_config.json @@ -0,0 +1,6 @@ +{ + "batch_size": 4, + "learning_rate": 0.1, + "decay_rate": 0.9, + "samples_per_decay": 20000 +} diff --git a/compression/entropy_coder/core/code_loader.py b/compression/entropy_coder/core/code_loader.py new file mode 100644 index 0000000000000000000000000000000000000000..603ab724afb0e6c4e94db9c121d7799eaf30fa02 --- /dev/null +++ b/compression/entropy_coder/core/code_loader.py @@ -0,0 +1,73 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Load binary codes stored as tf.Example in a TFRecord table.""" + +import tensorflow as tf + + +def ReadFirstCode(dataset): + """Read the first example from a binary code RecordIO table.""" + for record in tf.python_io.tf_record_iterator(dataset): + tf_example = tf.train.Example() + tf_example.ParseFromString(record) + break + return tf_example + + +def LoadBinaryCode(input_config, batch_size): + """Load a batch of binary codes from a tf.Example dataset. + + Args: + input_config: An InputConfig proto containing the input configuration. + batch_size: Output batch size of examples. + + Returns: + A batched tensor of binary codes. + """ + data = input_config.data + + # TODO: Possibly use multiple files (instead of just one). + file_list = [data] + filename_queue = tf.train.string_input_producer(file_list, + capacity=4) + reader = tf.TFRecordReader() + _, values = reader.read(filename_queue) + + serialized_example = tf.reshape(values, shape=[1]) + serialized_features = { + 'code_shape': tf.FixedLenFeature([3], + dtype=tf.int64), + 'code': tf.VarLenFeature(tf.float32), + } + example = tf.parse_example(serialized_example, serialized_features) + + # 3D shape: height x width x binary_code_depth + z = example['code_shape'] + code_shape = tf.reshape(tf.cast(z, tf.int32), [3]) + # Un-flatten the binary codes. + code = tf.reshape(tf.sparse_tensor_to_dense(example['code']), code_shape) + + queue_size = 10 + queue = tf.PaddingFIFOQueue( + queue_size + 3 * batch_size, + dtypes=[code.dtype], + shapes=[[None, None, None]]) + enqueue_op = queue.enqueue([code]) + dequeue_code = queue.dequeue_many(batch_size) + queue_runner = tf.train.queue_runner.QueueRunner(queue, [enqueue_op]) + tf.add_to_collection(tf.GraphKeys.QUEUE_RUNNERS, queue_runner) + + return dequeue_code diff --git a/compression/entropy_coder/core/config_helper.py b/compression/entropy_coder/core/config_helper.py new file mode 100644 index 0000000000000000000000000000000000000000..a7d949e329b93f33d330d1ba494f71ae1704fa3f --- /dev/null +++ b/compression/entropy_coder/core/config_helper.py @@ -0,0 +1,52 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Helper functions used in both train and inference.""" + +import json +import os.path + +import tensorflow as tf + + +def GetConfigString(config_file): + config_string = '' + if config_file is not None: + config_string = open(config_file).read() + return config_string + + +class InputConfig(object): + + def __init__(self, config_string): + config = json.loads(config_string) + self.data = config["data"] + self.unique_code_size = config["unique_code_size"] + + +class TrainConfig(object): + + def __init__(self, config_string): + config = json.loads(config_string) + self.batch_size = config["batch_size"] + self.learning_rate = config["learning_rate"] + self.decay_rate = config["decay_rate"] + self.samples_per_decay = config["samples_per_decay"] + + +def SaveConfig(directory, filename, config_string): + path = os.path.join(directory, filename) + with tf.gfile.Open(path, mode='w') as f: + f.write(config_string) diff --git a/compression/entropy_coder/core/entropy_coder_single.py b/compression/entropy_coder/core/entropy_coder_single.py new file mode 100644 index 0000000000000000000000000000000000000000..40a1317c91c77423d2f6f1cad385f4fcbf98df8c --- /dev/null +++ b/compression/entropy_coder/core/entropy_coder_single.py @@ -0,0 +1,116 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Compute the additional compression ratio after entropy coding.""" + +import io +import os + +import numpy as np +import tensorflow as tf + +import config_helper + +# pylint: disable=unused-import +from entropy_coder.all_models import all_models +# pylint: enable=unused-import +from entropy_coder.model import model_factory + + +# Checkpoint used to restore the model parameters. +tf.app.flags.DEFINE_string('checkpoint', None, + """Model checkpoint.""") + +# Model selection and configuration. +tf.app.flags.DEFINE_string('model', None, """Underlying encoder model.""") +tf.app.flags.DEFINE_string('model_config', None, + """Model config protobuf given as text file.""") + +# File holding the binary codes. 
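+# The expected input is an .npz archive containing a packed 'codes' array and
+# its 'shape', e.g. as written by dataset/gen_synthetic_single.py.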
+tf.flags.DEFINE_string('input_codes', None, 'Location of binary code file.') + +FLAGS = tf.flags.FLAGS + + +def main(_): + if (FLAGS.input_codes is None or FLAGS.model is None): + print ('\nUsage: python entropy_coder_single.py --model=progressive ' + '--model_config=model_config.json' + '--iteration=15\n\n') + return + + #if FLAGS.iteration < -1 or FLAGS.iteration > 15: + # print ('\n--iteration must be between 0 and 15 inclusive, or -1 to infer ' + # 'from file.\n') + # return + #iteration = FLAGS.iteration + + if not tf.gfile.Exists(FLAGS.input_codes): + print '\nInput codes not found.\n' + return + + with tf.gfile.FastGFile(FLAGS.input_codes, 'rb') as code_file: + contents = code_file.read() + loaded_codes = np.load(io.BytesIO(contents)) + assert ['codes', 'shape'] not in loaded_codes.files + loaded_shape = loaded_codes['shape'] + loaded_array = loaded_codes['codes'] + + # Unpack and recover code shapes. + unpacked_codes = np.reshape(np.unpackbits(loaded_array) + [:np.prod(loaded_shape)], + loaded_shape) + + numpy_int_codes = unpacked_codes.transpose([1, 2, 3, 0, 4]) + numpy_int_codes = numpy_int_codes.reshape([numpy_int_codes.shape[0], + numpy_int_codes.shape[1], + numpy_int_codes.shape[2], + -1]) + numpy_codes = numpy_int_codes.astype(np.float32) * 2.0 - 1.0 + + with tf.Graph().as_default() as graph: + # TF tensor to hold the binary codes to losslessly compress. + batch_size = 1 + codes = tf.placeholder(tf.float32, shape=numpy_codes.shape) + + # Create the entropy coder model. + global_step = None + optimizer = None + model = model_factory.GetModelRegistry().CreateModel(FLAGS.model) + model_config_string = config_helper.GetConfigString(FLAGS.model_config) + model.Initialize(global_step, optimizer, model_config_string) + model.BuildGraph(codes) + + saver = tf.train.Saver(sharded=True, keep_checkpoint_every_n_hours=12.0) + + with tf.Session(graph=graph) as sess: + # Initialize local variables. + sess.run(tf.local_variables_initializer()) + + # Restore model variables. + saver.restore(sess, FLAGS.checkpoint) + + tf_tensors = { + 'code_length': model.average_code_length + } + feed_dict = {codes: numpy_codes} + np_tensors = sess.run(tf_tensors, feed_dict=feed_dict) + + print('Additional compression ratio: {}'.format( + np_tensors['code_length'])) + + +if __name__ == '__main__': + tf.app.run() diff --git a/compression/entropy_coder/core/entropy_coder_train.py b/compression/entropy_coder/core/entropy_coder_train.py new file mode 100644 index 0000000000000000000000000000000000000000..248935e3c9504e6945745d6fe97ff6dcccf0d639 --- /dev/null +++ b/compression/entropy_coder/core/entropy_coder_train.py @@ -0,0 +1,184 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Train an entropy coder model.""" + +import time + +import tensorflow as tf + +import code_loader +import config_helper + +# pylint: disable=unused-import +from entropy_coder.all_models import all_models +# pylint: enable=unused-import +from entropy_coder.model import model_factory + + +FLAGS = tf.app.flags.FLAGS + +# Hardware resources configuration. +tf.app.flags.DEFINE_string('master', '', + """Name of the TensorFlow master to use.""") +tf.app.flags.DEFINE_string('train_dir', None, + """Directory where to write event logs.""") +tf.app.flags.DEFINE_integer('task', None, + """Task id of the replica running the training.""") +tf.app.flags.DEFINE_integer('ps_tasks', 0, """Number of tasks in the ps job. + If 0 no ps job is used.""") + +# Model selection and configuration. +tf.app.flags.DEFINE_string('model', None, """Underlying encoder model.""") +tf.app.flags.DEFINE_string('model_config', None, + """Model config protobuf given as text file.""") + +# Training data and parameters configuration. +tf.app.flags.DEFINE_string('input_config', None, + """Path to the training input config file.""") +tf.app.flags.DEFINE_string('train_config', None, + """Path to the training experiment config file.""") + + +def train(): + if FLAGS.train_dir is None: + raise ValueError('Parameter train_dir must be provided') + if FLAGS.task is None: + raise ValueError('Parameter task must be provided') + if FLAGS.model is None: + raise ValueError('Parameter model must be provided') + + input_config_string = config_helper.GetConfigString(FLAGS.input_config) + input_config = config_helper.InputConfig(input_config_string) + + # Training parameters. + train_config_string = config_helper.GetConfigString(FLAGS.train_config) + train_config = config_helper.TrainConfig(train_config_string) + + batch_size = train_config.batch_size + initial_learning_rate = train_config.learning_rate + decay_rate = train_config.decay_rate + samples_per_decay = train_config.samples_per_decay + + # Parameters for learning-rate decay. + # The formula is decay_rate ** floor(steps / decay_steps). + decay_steps = samples_per_decay / batch_size + decay_steps = max(decay_steps, 1) + + first_code = code_loader.ReadFirstCode(input_config.data) + first_code_height = ( + first_code.features.feature['code_shape'].int64_list.value[0]) + first_code_width = ( + first_code.features.feature['code_shape'].int64_list.value[1]) + max_bit_depth = ( + first_code.features.feature['code_shape'].int64_list.value[2]) + print('Maximum code depth: {}'.format(max_bit_depth)) + + with tf.Graph().as_default(): + ps_ops = ["Variable", "VariableV2", "AutoReloadVariable", "VarHandleOp"] + with tf.device(tf.train.replica_device_setter(FLAGS.ps_tasks, + ps_ops=ps_ops)): + codes = code_loader.LoadBinaryCode( + input_config=input_config, + batch_size=batch_size) + if input_config.unique_code_size: + print('Input code size: {} x {}'.format(first_code_height, + first_code_width)) + codes.set_shape( + [batch_size, first_code_height, first_code_width, max_bit_depth]) + else: + codes.set_shape([batch_size, None, None, max_bit_depth]) + codes_effective_shape = tf.shape(codes) + + global_step = tf.contrib.framework.create_global_step() + + # Apply learning-rate decay. 
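+      # The effective rate is:
+      #   initial_learning_rate * decay_rate ** floor(global_step / decay_steps)
+      # For example, with the synthetic train_config (batch_size=4,
+      # decay_rate=0.9, samples_per_decay=20000), decay_steps is 5000, so the
+      # learning rate is multiplied by 0.9 every 5000 steps (staircase=True).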
+ learning_rate = tf.train.exponential_decay( + learning_rate=initial_learning_rate, + global_step=global_step, + decay_steps=decay_steps, + decay_rate=decay_rate, + staircase=True) + tf.summary.scalar('Learning Rate', learning_rate) + optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, + epsilon=1.0) + + # Create the entropy coder model. + model = model_factory.GetModelRegistry().CreateModel(FLAGS.model) + model_config_string = config_helper.GetConfigString(FLAGS.model_config) + model.Initialize(global_step, optimizer, model_config_string) + model.BuildGraph(codes) + + summary_op = tf.summary.merge_all() + + # Verify that the model can actually be trained. + if model.train_op is None: + raise ValueError('Input model {} is not trainable'.format(FLAGS.model)) + + # We disable the summary thread run by Supervisor class by passing + # summary_op=None. We still pass save_summaries_secs because it is used by + # the global step counter thread. + is_chief = (FLAGS.task == 0) + sv = tf.train.Supervisor(logdir=FLAGS.train_dir, + is_chief=is_chief, + global_step=global_step, + # saver=model.saver, + summary_op=None, + save_summaries_secs=120, + save_model_secs=600, + recovery_wait_secs=30) + + sess = sv.PrepareSession(FLAGS.master) + sv.StartQueueRunners(sess) + + step = sess.run(global_step) + print('Trainer initial step: {}.'.format(step)) + + # Once everything has been setup properly, save the configs. + if is_chief: + config_helper.SaveConfig(FLAGS.train_dir, 'input_config.json', + input_config_string) + config_helper.SaveConfig(FLAGS.train_dir, 'model_config.json', + model_config_string) + config_helper.SaveConfig(FLAGS.train_dir, 'train_config.json', + train_config_string) + + # Train the model. + next_summary_time = time.time() + while not sv.ShouldStop(): + feed_dict = None + + # Once in a while, update the summaries on the chief worker. + if is_chief and next_summary_time < time.time(): + summary_str = sess.run(summary_op, feed_dict=feed_dict) + sv.SummaryComputed(sess, summary_str) + next_summary_time = time.time() + sv.save_summaries_secs + else: + tf_tensors = { + 'train': model.train_op, + 'code_length': model.average_code_length + } + np_tensors = sess.run(tf_tensors, feed_dict=feed_dict) + print np_tensors['code_length'] + + sv.Stop() + + +def main(argv=None): # pylint: disable=unused-argument + train() + + +if __name__ == '__main__': + tf.app.run() diff --git a/compression/entropy_coder/dataset/gen_synthetic_dataset.py b/compression/entropy_coder/dataset/gen_synthetic_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..aa511b530692c3f7d9e57756473f18850f632beb --- /dev/null +++ b/compression/entropy_coder/dataset/gen_synthetic_dataset.py @@ -0,0 +1,88 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Generate a synthetic dataset.""" + +import os + +import numpy as np +import tensorflow as tf + +import synthetic_model + + +FLAGS = tf.app.flags.FLAGS + +tf.app.flags.DEFINE_string( + 'dataset_dir', None, + """Directory where to write the dataset and the configs.""") +tf.app.flags.DEFINE_integer( + 'count', 1000, + """Number of samples to generate.""") + + +def int64_feature(values): + """Returns a TF-Feature of int64s. + + Args: + values: A scalar or list of values. + + Returns: + A TF-Feature. + """ + if not isinstance(values, (tuple, list)): + values = [values] + return tf.train.Feature(int64_list=tf.train.Int64List(value=values)) + + +def float_feature(values): + """Returns a TF-Feature of floats. + + Args: + values: A scalar of list of values. + + Returns: + A TF-Feature. + """ + if not isinstance(values, (tuple, list)): + values = [values] + return tf.train.Feature(float_list=tf.train.FloatList(value=values)) + + +def AddToTFRecord(code, tfrecord_writer): + example = tf.train.Example(features=tf.train.Features(feature={ + 'code_shape': int64_feature(code.shape), + 'code': float_feature(code.flatten().tolist()), + })) + tfrecord_writer.write(example.SerializeToString()) + + +def GenerateDataset(filename, count, code_shape): + with tf.python_io.TFRecordWriter(filename) as tfrecord_writer: + for _ in xrange(count): + code = synthetic_model.GenerateSingleCode(code_shape) + # Convert {0,1} codes to {-1,+1} codes. + code = 2.0 * code - 1.0 + AddToTFRecord(code, tfrecord_writer) + + +def main(argv=None): # pylint: disable=unused-argument + GenerateDataset(os.path.join(FLAGS.dataset_dir + '/synthetic_dataset'), + FLAGS.count, + [35, 48, 8]) + + +if __name__ == '__main__': + tf.app.run() diff --git a/compression/entropy_coder/dataset/gen_synthetic_single.py b/compression/entropy_coder/dataset/gen_synthetic_single.py new file mode 100644 index 0000000000000000000000000000000000000000..b8c3821c38b6a0b95f01ad7ffb283cca4beb34b3 --- /dev/null +++ b/compression/entropy_coder/dataset/gen_synthetic_single.py @@ -0,0 +1,72 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Generate a single synthetic sample.""" + +import io +import os + +import numpy as np +import tensorflow as tf + +import synthetic_model + + +FLAGS = tf.app.flags.FLAGS + +tf.app.flags.DEFINE_string( + 'sample_filename', None, + """Output file to store the generated binary code.""") + + +def GenerateSample(filename, code_shape, layer_depth): + # {0, +1} binary codes. + # No conversion since the output file is expected to store + # codes using {0, +1} codes (and not {-1, +1}). + code = synthetic_model.GenerateSingleCode(code_shape) + code = np.round(code) + + # Reformat the code so as to be compatible with what is generated + # by the image encoder. 
+ # The image encoder generates a tensor of size: + # iteration_count x batch_size x height x width x iteration_depth. + # Here: batch_size = 1 + if code_shape[-1] % layer_depth != 0: + raise ValueError('Number of layers is not an integer') + height = code_shape[0] + width = code_shape[1] + code = code.reshape([1, height, width, -1, layer_depth]) + code = np.transpose(code, [3, 0, 1, 2, 4]) + + int_codes = code.astype(np.int8) + exported_codes = np.packbits(int_codes.reshape(-1)) + + output = io.BytesIO() + np.savez_compressed(output, shape=int_codes.shape, codes=exported_codes) + with tf.gfile.FastGFile(filename, 'wb') as code_file: + code_file.write(output.getvalue()) + + +def main(argv=None): # pylint: disable=unused-argument + # Note: the height and the width is different from the training dataset. + # The main purpose is to show that the entropy coder model is fully + # convolutional and can be used on any image size. + layer_depth = 2 + GenerateSample(FLAGS.sample_filename, [31, 36, 8], layer_depth) + + +if __name__ == '__main__': + tf.app.run() + diff --git a/compression/entropy_coder/dataset/synthetic_model.py b/compression/entropy_coder/dataset/synthetic_model.py new file mode 100644 index 0000000000000000000000000000000000000000..4811208386dd9ba72df03a3b01afb90aa0ee58a5 --- /dev/null +++ b/compression/entropy_coder/dataset/synthetic_model.py @@ -0,0 +1,74 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Binary code sample generator.""" + +import numpy as np + + +_CRC_LINE = [ + [0, 1, 0], + [1, 1, 0], + [1, 0, 0] +] + +_CRC_DEPTH = [1, 1, 0, 1] + + +def ComputeLineCrc(code, width, y, x, d): + crc = 0 + for dy in xrange(len(_CRC_LINE)): + i = y - 1 - dy + if i < 0: + continue + for dx in xrange(len(_CRC_LINE[dy])): + j = x - 2 + dx + if j < 0 or j >= width: + continue + crc += 1 if (code[i, j, d] != _CRC_LINE[dy][dx]) else 0 + return crc + + +def ComputeDepthCrc(code, y, x, d): + crc = 0 + for delta in xrange(len(_CRC_DEPTH)): + k = d - 1 - delta + if k < 0: + continue + crc += 1 if (code[y, x, k] != _CRC_DEPTH[delta]) else 0 + return crc + + +def GenerateSingleCode(code_shape): + code = np.zeros(code_shape, dtype=np.int) + + keep_value_proba = 0.8 + + height = code_shape[0] + width = code_shape[1] + depth = code_shape[2] + + for d in xrange(depth): + for y in xrange(height): + for x in xrange(width): + v1 = ComputeLineCrc(code, width, y, x, d) + v2 = ComputeDepthCrc(code, y, x, d) + v = 1 if (v1 + v2 >= 6) else 0 + if np.random.rand() < keep_value_proba: + code[y, x, d] = v + else: + code[y, x, d] = 1 - v + + return code diff --git a/compression/entropy_coder/lib/__init__.py b/compression/entropy_coder/lib/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/compression/entropy_coder/lib/block_base.py b/compression/entropy_coder/lib/block_base.py new file mode 100644 index 0000000000000000000000000000000000000000..615dff82829dbbcab46c7217cd35f6259de01161 --- /dev/null +++ b/compression/entropy_coder/lib/block_base.py @@ -0,0 +1,258 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Base class for Tensorflow building blocks.""" + +import collections +import contextlib +import itertools + +import tensorflow as tf + +_block_stacks = collections.defaultdict(lambda: []) + + +class BlockBase(object): + """Base class for transform wrappers of Tensorflow. + + To implement a Tensorflow transform block, inherit this class. + + 1. To create a variable, use NewVar() method. Do not overload this method! + For example, use as follows. + a_variable = self.NewVar(initial_value) + + 2. All Tensorflow-related code must be done inside 'with self._BlockScope().' + Otherwise, name scoping and block hierarchy will not work. An exception + is _Apply() method, which is already called inside the context manager + by __call__() method. + + 3. Override and implement _Apply() method. This method is called by + __call__() method. + + The users would use blocks like the following. + nn1 = NN(128, bias=Bias(0), act=tf.nn.relu) + y = nn1(x) + + Some things to consider. + + - Use lazy-initialization if possible. That is, initialize at first Apply() + rather than at __init__(). 
+ + Note: if needed, the variables can be created on a specific parameter + server by creating blocks in a scope like: + with g.device(device): + linear = Linear(...) + """ + + def __init__(self, name): + self._variables = [] + self._subblocks = [] + self._called = False + + # Intentionally distinguishing empty string and None. + # If name is an empty string, then do not use name scope. + self.name = name if name is not None else self.__class__.__name__ + self._graph = tf.get_default_graph() + + if self.name: + # Capture the scope string at the init time. + with self._graph.name_scope(self.name) as scope: + self._scope_str = scope + else: + self._scope_str = '' + + # Maintain hierarchy structure of blocks. + self._stack = _block_stacks[self._graph] + if self.__class__ is BlockBase: + # This code is only executed to create the root, which starts in the + # initialized state. + assert not self._stack + self._parent = None + self._called = True # The root is initialized. + return + + # Create a fake root if a root is not already present. + if not self._stack: + self._stack.append(BlockBase('NoOpRoot')) + + self._parent = self._stack[-1] + self._parent._subblocks.append(self) # pylint: disable=protected-access + + def __repr__(self): + return '"{}" ({})'.format(self._scope_str, self.__class__.__name__) + + @contextlib.contextmanager + def _OptionalNameScope(self, scope_str): + if scope_str: + with self._graph.name_scope(scope_str): + yield + else: + yield + + @contextlib.contextmanager + def _BlockScope(self): + """Context manager that handles graph, namescope, and nested blocks.""" + self._stack.append(self) + + try: + with self._graph.as_default(): + with self._OptionalNameScope(self._scope_str): + yield self + finally: # Pop from the stack no matter exception is raised or not. + # The following line is executed when leaving 'with self._BlockScope()' + self._stack.pop() + + def __call__(self, *args, **kwargs): + assert self._stack is _block_stacks[self._graph] + + with self._BlockScope(): + ret = self._Apply(*args, **kwargs) + + self._called = True + return ret + + def _Apply(self, *args, **kwargs): + """Implementation of __call__().""" + raise NotImplementedError() + + # Redirect all variable creation to this single function, so that we can + # switch to better variable creation scheme. + def NewVar(self, value, **kwargs): + """Creates a new variable. + + This function creates a variable, then returns a local copy created by + Identity operation. To get the Variable class object, use LookupRef() + method. + + Note that each time Variable class object is used as an input to an + operation, Tensorflow will create a new Send/Recv pair. This hurts + performance. + + If not for assign operations, use the local copy returned by this method. + + Args: + value: Initialization value of the variable. The shape and the data type + of the variable is determined by this initial value. + **kwargs: Extra named arguments passed to Variable.__init__(). + + Returns: + A local copy of the new variable. + """ + v = tf.Variable(value, **kwargs) + + self._variables.append(v) + return v + + @property + def initialized(self): + """Returns bool if the block is initialized. + + By default, BlockBase assumes that a block is initialized when __call__() + is executed for the first time. If this is an incorrect assumption for some + subclasses, override this property in those subclasses. + + Returns: + True if initialized, False otherwise. 
+ """ + return self._called + + def AssertInitialized(self): + """Asserts initialized property.""" + if not self.initialized: + raise RuntimeError('{} has not been initialized.'.format(self)) + + def VariableList(self): + """Returns the list of all tensorflow variables used inside this block.""" + variables = list(itertools.chain( + itertools.chain.from_iterable( + t.VariableList() for t in self._subblocks), + self._VariableList())) + return variables + + def _VariableList(self): + """Returns the list of all tensorflow variables owned by this block.""" + self.AssertInitialized() + return self._variables + + def CreateWeightLoss(self): + """Returns L2 loss list of (almost) all variables used inside this block. + + When this method needs to be overridden, there are two choices. + + 1. Override CreateWeightLoss() to change the weight loss of all variables + that belong to this block, both directly and indirectly. + 2. Override _CreateWeightLoss() to change the weight loss of all + variables that directly belong to this block but not to the sub-blocks. + + Returns: + A Tensor object or None. + """ + losses = list(itertools.chain( + itertools.chain.from_iterable( + t.CreateWeightLoss() for t in self._subblocks), + self._CreateWeightLoss())) + return losses + + def _CreateWeightLoss(self): + """Returns weight loss list of variables that belong to this block.""" + self.AssertInitialized() + with self._BlockScope(): + return [tf.nn.l2_loss(v) for v in self._variables] + + def CreateUpdateOps(self): + """Creates update operations for this block and its sub-blocks.""" + ops = list(itertools.chain( + itertools.chain.from_iterable( + t.CreateUpdateOps() for t in self._subblocks), + self._CreateUpdateOps())) + return ops + + def _CreateUpdateOps(self): + """Creates update operations for this block.""" + self.AssertInitialized() + return [] + + def MarkAsNonTrainable(self): + """Mark all the variables of this block as non-trainable. + + All the variables owned directly or indirectly (through subblocks) are + marked as non trainable. + + This function along with CheckpointInitOp can be used to load a pretrained + model that consists in only one part of the whole graph. + """ + assert self._called + + all_variables = self.VariableList() + collection = tf.get_collection_ref(tf.GraphKeys.TRAINABLE_VARIABLES) + for v in all_variables: + if v in collection: + collection.remove(v) + + +def CreateWeightLoss(): + """Returns all weight losses from the blocks in the graph.""" + stack = _block_stacks[tf.get_default_graph()] + if not stack: + return [] + return stack[0].CreateWeightLoss() + + +def CreateBlockUpdates(): + """Combines all updates from the blocks in the graph.""" + stack = _block_stacks[tf.get_default_graph()] + if not stack: + return [] + return stack[0].CreateUpdateOps() diff --git a/compression/entropy_coder/lib/block_util.py b/compression/entropy_coder/lib/block_util.py new file mode 100644 index 0000000000000000000000000000000000000000..957f8d603130d8dfa5c2523cce07a926cd8fe330 --- /dev/null +++ b/compression/entropy_coder/lib/block_util.py @@ -0,0 +1,100 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Utility functions for blocks.""" + +from __future__ import division +from __future__ import unicode_literals + +import math + +import numpy as np +import tensorflow as tf + + +class RsqrtInitializer(object): + """Gaussian initializer with standard deviation 1/sqrt(n). + + Note that tf.truncated_normal is used internally. Therefore any random sample + outside two-sigma will be discarded and re-sampled. + """ + + def __init__(self, dims=(0,), **kwargs): + """Creates an initializer. + + Args: + dims: Dimension(s) index to compute standard deviation: + 1.0 / sqrt(product(shape[dims])) + **kwargs: Extra keyword arguments to pass to tf.truncated_normal. + """ + if isinstance(dims, (int, long)): + self._dims = [dims] + else: + self._dims = dims + self._kwargs = kwargs + + def __call__(self, shape, dtype): + stddev = 1.0 / np.sqrt(np.prod([shape[x] for x in self._dims])) + return tf.truncated_normal( + shape=shape, dtype=dtype, stddev=stddev, **self._kwargs) + + +class RectifierInitializer(object): + """Gaussian initializer with standard deviation sqrt(2/fan_in). + + Note that tf.random_normal is used internally to ensure the expected weight + distribution. This is intended to be used with ReLU activations, specially + in ResNets. + + For details please refer to: + Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet + Classification + """ + + def __init__(self, dims=(0,), scale=2.0, **kwargs): + """Creates an initializer. + + Args: + dims: Dimension(s) index to compute standard deviation: + sqrt(scale / product(shape[dims])) + scale: A constant scaling for the initialization used as + sqrt(scale / product(shape[dims])). + **kwargs: Extra keyword arguments to pass to tf.truncated_normal. + """ + if isinstance(dims, (int, long)): + self._dims = [dims] + else: + self._dims = dims + self._kwargs = kwargs + self._scale = scale + + def __call__(self, shape, dtype): + stddev = np.sqrt(self._scale / np.prod([shape[x] for x in self._dims])) + return tf.random_normal( + shape=shape, dtype=dtype, stddev=stddev, **self._kwargs) + + +class GaussianInitializer(object): + """Gaussian initializer with a given standard deviation. + + Note that tf.truncated_normal is used internally. Therefore any random sample + outside two-sigma will be discarded and re-sampled. + """ + + def __init__(self, stddev=1.0): + self._stddev = stddev + + def __call__(self, shape, dtype): + return tf.truncated_normal(shape=shape, dtype=dtype, stddev=self._stddev) diff --git a/compression/entropy_coder/lib/blocks.py b/compression/entropy_coder/lib/blocks.py new file mode 100644 index 0000000000000000000000000000000000000000..002384eb07045f1cad963d217a205ade51ba03b6 --- /dev/null +++ b/compression/entropy_coder/lib/blocks.py @@ -0,0 +1,24 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +from block_base import * +from block_util import * +from blocks_binarizer import * +from blocks_entropy_coding import * +from blocks_lstm import * +from blocks_masked_conv2d import * +from blocks_masked_conv2d_lstm import * +from blocks_operator import * +from blocks_std import * diff --git a/compression/entropy_coder/lib/blocks_binarizer.py b/compression/entropy_coder/lib/blocks_binarizer.py new file mode 100644 index 0000000000000000000000000000000000000000..8206731610613af2cf3ec15210fd5b9977f4a916 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_binarizer.py @@ -0,0 +1,35 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Activation and weight binarizer implementations.""" + +import math + +import numpy as np +import tensorflow as tf + + +def ConvertSignCodeToZeroOneCode(x): + """Conversion from codes {-1, +1} to codes {0, 1}.""" + return 0.5 * (x + 1.0) + + +def ConvertZeroOneCodeToSignCode(x): + """Convert from codes {0, 1} to codes {-1, +1}.""" + return 2.0 * x - 1.0 + + +def CheckZeroOneCode(x): + return tf.reduce_all(tf.equal(x * (x - 1.0), 0)) diff --git a/compression/entropy_coder/lib/blocks_entropy_coding.py b/compression/entropy_coder/lib/blocks_entropy_coding.py new file mode 100644 index 0000000000000000000000000000000000000000..6ee5d97926c1b50b12cb9853d16caa25ba31e8d7 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_entropy_coding.py @@ -0,0 +1,49 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Set of blocks related to entropy coding.""" + +import math + +import tensorflow as tf + +import block_base + +# pylint does not recognize block_base.BlockBase.__call__(). 
+# pylint: disable=not-callable + + +class CodeLength(block_base.BlockBase): + """Theoretical bound for a code length given a probability distribution. + """ + + def __init__(self, name=None): + super(CodeLength, self).__init__(name) + + def _Apply(self, c, p): + """Theoretical bound of the coded length given a probability distribution. + + Args: + c: The binary codes. Belong to {0, 1}. + p: The probability of: P(code==+1) + + Returns: + The average code length. + Note: the average code length can be greater than 1 bit (e.g. when + encoding the least likely symbol). + """ + entropy = ((1.0 - c) * tf.log(1.0 - p) + c * tf.log(p)) / (-math.log(2)) + entropy = tf.reduce_mean(entropy) + return entropy diff --git a/compression/entropy_coder/lib/blocks_entropy_coding_test.py b/compression/entropy_coder/lib/blocks_entropy_coding_test.py new file mode 100644 index 0000000000000000000000000000000000000000..5209865f5991598ee873ed24a4be572e3f9fc515 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_entropy_coding_test.py @@ -0,0 +1,56 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for basic tensorflow blocks_entropy_coding.""" + +from __future__ import division +from __future__ import unicode_literals + +import math + +import numpy as np +import tensorflow as tf + +import blocks_entropy_coding + + +class BlocksEntropyCodingTest(tf.test.TestCase): + + def testCodeLength(self): + shape = [2, 4] + proba_feed = [[0.65, 0.25, 0.70, 0.10], + [0.28, 0.20, 0.44, 0.54]] + symbol_feed = [[1.0, 0.0, 1.0, 0.0], + [0.0, 0.0, 0.0, 1.0]] + mean_code_length = - ( + (math.log(0.65) + math.log(0.75) + math.log(0.70) + math.log(0.90) + + math.log(0.72) + math.log(0.80) + math.log(0.56) + math.log(0.54)) / + math.log(2.0)) / (shape[0] * shape[1]) + + symbol = tf.placeholder(dtype=tf.float32, shape=shape) + proba = tf.placeholder(dtype=tf.float32, shape=shape) + code_length_calculator = blocks_entropy_coding.CodeLength() + code_length = code_length_calculator(symbol, proba) + + with self.test_session(): + tf.global_variables_initializer().run() + code_length_eval = code_length.eval( + feed_dict={symbol: symbol_feed, proba: proba_feed}) + + self.assertAllClose(mean_code_length, code_length_eval) + + +if __name__ == '__main__': + tf.test.main() diff --git a/compression/entropy_coder/lib/blocks_lstm.py b/compression/entropy_coder/lib/blocks_lstm.py new file mode 100644 index 0000000000000000000000000000000000000000..6e474e3e3fcb6eeb3f18daf320e21a3acc88a2bf --- /dev/null +++ b/compression/entropy_coder/lib/blocks_lstm.py @@ -0,0 +1,263 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Blocks of LSTM and its variants.""" + +import numpy as np +import tensorflow as tf + +import block_base +import block_util +import blocks_std + +# pylint does not recognize block_base.BlockBase.__call__(). +# pylint: disable=not-callable + + +def LSTMBiasInit(shape, dtype): + """Returns ones for forget-gate, and zeros for the others.""" + shape = np.array(shape) + + # Check internal consistencies. + assert shape.shape == (1,), shape + assert shape[0] % 4 == 0, shape + + n = shape[0] // 4 + ones = tf.fill([n], tf.constant(1, dtype=dtype)) + zeros = tf.fill([3 * n], tf.constant(0, dtype=dtype)) + return tf.concat([ones, zeros], 0) + + +class LSTMBase(block_base.BlockBase): + """Base class for LSTM implementations. + + These LSTM implementations use the pattern found in [1]. No peephole + connection, i.e., cell content is not used in recurrence computation. + Hidden units are also output units. + + [1] Zaremba, Sutskever, Vinyals. Recurrent Neural Network Regularization, + 2015. arxiv:1409.2329. + """ + + def __init__(self, output_shape, name): + """Initializes LSTMBase class object. + + Args: + output_shape: List representing the LSTM output shape. This argument + does not include batch dimension. For example, if the LSTM output has + shape [batch, depth], then pass [depth]. + name: Name of this block. + """ + super(LSTMBase, self).__init__(name) + + with self._BlockScope(): + self._output_shape = [None] + list(output_shape) + self._hidden = None + self._cell = None + + @property + def hidden(self): + """Returns the hidden units of this LSTM.""" + return self._hidden + + @hidden.setter + def hidden(self, value): + """Assigns to the hidden units of this LSTM. + + Args: + value: The new value for the hidden units. If None, the hidden units are + considered to be filled with zeros. + """ + if value is not None: + value.get_shape().assert_is_compatible_with(self._output_shape) + self._hidden = value + + @property + def cell(self): + """Returns the cell units of this LSTM.""" + return self._cell + + @cell.setter + def cell(self, value): + """Assigns to the cell units of this LSTM. + + Args: + value: The new value for the cell units. If None, the cell units are + considered to be filled with zeros. + """ + if value is not None: + value.get_shape().assert_is_compatible_with(self._output_shape) + self._cell = value + + # Consider moving bias terms to the base, and require this method to be + # linear. + def _TransformInputs(self, _): + """Transforms the input units to (4 * depth) units. + + The forget-gate, input-gate, output-gate, and cell update is computed as + f, i, j, o = T(h) + R(x) + where h is hidden units, x is input units, and T, R are transforms of + h, x, respectively. + + This method implements R. Note that T is strictly linear, so if LSTM is + going to use bias, this method must include the bias to the transformation. + + Subclasses must implement this method. See _Apply() for more details. 
+ """ + raise NotImplementedError() + + def _TransformHidden(self, _): + """Transforms the hidden units to (4 * depth) units. + + The forget-gate, input-gate, output-gate, and cell update is computed as + f, i, j, o = T(h) + R(x) + where h is hidden units, x is input units, and T, R are transforms of + h, x, respectively. + + This method implements T in the equation. The method must implement a + strictly linear transformation. For example, it may use MatMul or Conv2D, + but must not add bias. This is because when hidden units are zeros, then + the LSTM implementation will skip calling this method, instead of passing + zeros to this function. + + Subclasses must implement this method. See _Apply() for more details. + """ + raise NotImplementedError() + + def _Apply(self, *args): + xtransform = self._TransformInputs(*args) + depth_axis = len(self._output_shape) - 1 + + if self.hidden is not None: + htransform = self._TransformHidden(self.hidden) + f, i, j, o = tf.split( + value=htransform + xtransform, num_or_size_splits=4, axis=depth_axis) + else: + f, i, j, o = tf.split( + value=xtransform, num_or_size_splits=4, axis=depth_axis) + + if self.cell is not None: + self.cell = tf.sigmoid(f) * self.cell + tf.sigmoid(i) * tf.tanh(j) + else: + self.cell = tf.sigmoid(i) * tf.tanh(j) + + self.hidden = tf.sigmoid(o) * tf.tanh(self.cell) + return self.hidden + + +class LSTM(LSTMBase): + """Efficient LSTM implementation used in [1]. + + [1] Zaremba, Sutskever, Vinyals. Recurrent Neural Network Regularization, + 2015. arxiv:1409.2329. + """ + + def __init__(self, + depth, + bias=LSTMBiasInit, + initializer=block_util.RsqrtInitializer(), + name=None): + super(LSTM, self).__init__([depth], name) + + with self._BlockScope(): + self._depth = depth + self._nn = blocks_std.NN( + 4 * depth, bias=bias, act=None, initializer=initializer) + self._hidden_linear = blocks_std.Linear( + 4 * depth, initializer=initializer) + + def _TransformInputs(self, *args): + return self._nn(*args) + + def _TransformHidden(self, h): + return self._hidden_linear(h) + + +class Conv2DLSTM(LSTMBase): + """Convolutional LSTM implementation with optimizations inspired by [1]. + + Note that when using the batch normalization feature, the bias initializer + will not be used, since BN effectively cancels its effect out. + + [1] Zaremba, Sutskever, Vinyals. Recurrent Neural Network Regularization, + 2015. arxiv:1409.2329. 
+ """ + + def __init__(self, + depth, + filter_size, + hidden_filter_size, + strides, + padding, + bias=LSTMBiasInit, + initializer=block_util.RsqrtInitializer(dims=(0, 1, 2)), + use_moving_average=False, + name=None): + super(Conv2DLSTM, self).__init__([None, None, depth], name) + self._iter = 0 + + with self._BlockScope(): + self._input_conv = blocks_std.Conv2D( + 4 * depth, + filter_size, + strides, + padding, + bias=None, + act=None, + initializer=initializer, + name='input_conv2d') + + self._hidden_conv = blocks_std.Conv2D( + 4 * depth, + hidden_filter_size, + [1, 1], + 'SAME', + bias=None, + act=None, + initializer=initializer, + name='hidden_conv2d') + + if bias is not None: + self._bias = blocks_std.BiasAdd(bias, name='biases') + else: + self._bias = blocks_std.PassThrough() + + def _TransformInputs(self, x): + return self._bias(self._input_conv(x)) + + def _TransformHidden(self, h): + return self._hidden_conv(h) + + def _Apply(self, *args): + xtransform = self._TransformInputs(*args) + depth_axis = len(self._output_shape) - 1 + + if self.hidden is not None: + htransform = self._TransformHidden(self.hidden) + f, i, j, o = tf.split( + value=htransform + xtransform, num_or_size_splits=4, axis=depth_axis) + else: + f, i, j, o = tf.split( + value=xtransform, num_or_size_splits=4, axis=depth_axis) + + if self.cell is not None: + self.cell = tf.sigmoid(f) * self.cell + tf.sigmoid(i) * tf.tanh(j) + else: + self.cell = tf.sigmoid(i) * tf.tanh(j) + + self.hidden = tf.sigmoid(o) * tf.tanh(self.cell) + + self._iter += 1 + return self.hidden diff --git a/compression/entropy_coder/lib/blocks_lstm_test.py b/compression/entropy_coder/lib/blocks_lstm_test.py new file mode 100644 index 0000000000000000000000000000000000000000..03c32dc136effda11163f2e35c5a48496f0187c0 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_lstm_test.py @@ -0,0 +1,113 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for LSTM tensorflow blocks.""" +from __future__ import division + +import numpy as np +import tensorflow as tf + +import block_base +import blocks_std +import blocks_lstm + + +class BlocksLSTMTest(tf.test.TestCase): + + def CheckUnary(self, y, op_type): + self.assertEqual(op_type, y.op.type) + self.assertEqual(1, len(y.op.inputs)) + return y.op.inputs[0] + + def CheckBinary(self, y, op_type): + self.assertEqual(op_type, y.op.type) + self.assertEqual(2, len(y.op.inputs)) + return y.op.inputs + + def testLSTM(self): + lstm = blocks_lstm.LSTM(10) + lstm.hidden = tf.zeros(shape=[10, 10], dtype=tf.float32) + lstm.cell = tf.zeros(shape=[10, 10], dtype=tf.float32) + x = tf.placeholder(dtype=tf.float32, shape=[10, 11]) + y = lstm(x) + + o, tanhc = self.CheckBinary(y, 'Mul') + self.assertEqual(self.CheckUnary(o, 'Sigmoid').name, 'LSTM/split:3') + + self.assertIs(lstm.cell, self.CheckUnary(tanhc, 'Tanh')) + fc, ij = self.CheckBinary(lstm.cell, 'Add') + + f, _ = self.CheckBinary(fc, 'Mul') + self.assertEqual(self.CheckUnary(f, 'Sigmoid').name, 'LSTM/split:0') + + i, j = self.CheckBinary(ij, 'Mul') + self.assertEqual(self.CheckUnary(i, 'Sigmoid').name, 'LSTM/split:1') + j = self.CheckUnary(j, 'Tanh') + self.assertEqual(j.name, 'LSTM/split:2') + + def testLSTMBiasInit(self): + lstm = blocks_lstm.LSTM(9) + x = tf.placeholder(dtype=tf.float32, shape=[15, 7]) + lstm(x) + b = lstm._nn._bias + + with self.test_session(): + tf.global_variables_initializer().run() + bias_var = b._bias.eval() + + comp = ([1.0] * 9) + ([0.0] * 27) + self.assertAllEqual(bias_var, comp) + + def testConv2DLSTM(self): + lstm = blocks_lstm.Conv2DLSTM(depth=10, + filter_size=[1, 1], + hidden_filter_size=[1, 1], + strides=[1, 1], + padding='SAME') + lstm.hidden = tf.zeros(shape=[10, 11, 11, 10], dtype=tf.float32) + lstm.cell = tf.zeros(shape=[10, 11, 11, 10], dtype=tf.float32) + x = tf.placeholder(dtype=tf.float32, shape=[10, 11, 11, 1]) + y = lstm(x) + + o, tanhc = self.CheckBinary(y, 'Mul') + self.assertEqual(self.CheckUnary(o, 'Sigmoid').name, 'Conv2DLSTM/split:3') + + self.assertIs(lstm.cell, self.CheckUnary(tanhc, 'Tanh')) + fc, ij = self.CheckBinary(lstm.cell, 'Add') + + f, _ = self.CheckBinary(fc, 'Mul') + self.assertEqual(self.CheckUnary(f, 'Sigmoid').name, 'Conv2DLSTM/split:0') + + i, j = self.CheckBinary(ij, 'Mul') + self.assertEqual(self.CheckUnary(i, 'Sigmoid').name, 'Conv2DLSTM/split:1') + j = self.CheckUnary(j, 'Tanh') + self.assertEqual(j.name, 'Conv2DLSTM/split:2') + + def testConv2DLSTMBiasInit(self): + lstm = blocks_lstm.Conv2DLSTM(9, 1, 1, [1, 1], 'SAME') + x = tf.placeholder(dtype=tf.float32, shape=[1, 7, 7, 7]) + lstm(x) + b = lstm._bias + + with self.test_session(): + tf.global_variables_initializer().run() + bias_var = b._bias.eval() + + comp = ([1.0] * 9) + ([0.0] * 27) + self.assertAllEqual(bias_var, comp) + + +if __name__ == '__main__': + tf.test.main() diff --git a/compression/entropy_coder/lib/blocks_masked_conv2d.py b/compression/entropy_coder/lib/blocks_masked_conv2d.py new file mode 100644 index 0000000000000000000000000000000000000000..395af334953676215849683b9b275c64ae967b38 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_masked_conv2d.py @@ -0,0 +1,225 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Define some typical masked 2D convolutions.""" + +import numpy as np +import tensorflow as tf + +import block_util +import blocks_std + +# pylint does not recognize block_base.BlockBase.__call__(). +# pylint: disable=not-callable + + +class RasterScanConv2D(blocks_std.Conv2DBase): + """Conv2D with no dependency on future pixels (in raster scan order). + + For example, assuming a 5 x 5 kernel, the kernel is applied a spatial mask: + T T T T T + T T T T T + T T x F F + F F F F F + F F F F F + where 'T' are pixels which are available when computing the convolution + for pixel 'x'. All the pixels marked with 'F' are not available. + 'x' itself is not available if strict_order is True, otherwise, it is + available. + """ + + def __init__(self, depth, filter_size, strides, padding, + strict_order=True, + bias=None, act=None, initializer=None, name=None): + super(RasterScanConv2D, self).__init__( + depth, filter_size, strides, padding, bias, act, name=name) + + if (filter_size[0] % 2) != 1 or (filter_size[1] % 2) != 1: + raise ValueError('Kernel size should be odd.') + + with self._BlockScope(): + if initializer is None: + initializer = block_util.RsqrtInitializer(dims=(0, 1, 2)) + self._initializer = initializer + self._strict_order = strict_order + + def _CreateKernel(self, shape, dtype): + init = self._initializer(shape, dtype) + kernel = self.NewVar(init) + + mask = np.ones(shape[:2], dtype=dtype.as_numpy_dtype) + center = shape[:2] // 2 + mask[center[0] + 1:, :] = 0 + if not self._strict_order: + mask[center[0], center[1] + 1:] = 0 + else: + mask[center[0], center[1]:] = 0 + mask = mask.reshape(mask.shape + (1, 1)) + + return tf.convert_to_tensor(mask, dtype) * kernel + + +class DepthOrderConv2D(blocks_std.Conv2DBase): + """Conv2D with no dependency on higher depth dimensions. + + More precisely, the output depth #n has only dependencies on input depths #k + for k < n (if strict_order is True) or for k <= n (if strict_order is False). + """ + + def __init__(self, depth, filter_size, strides, padding, + strict_order=True, + bias=None, act=None, initializer=None, name=None): + super(DepthOrderConv2D, self).__init__( + depth, filter_size, strides, padding, bias, act, name=name) + + with self._BlockScope(): + if initializer is None: + initializer = block_util.RsqrtInitializer(dims=(0, 1, 2)) + self._initializer = initializer + self._strict_order = strict_order + + def _CreateKernel(self, shape, dtype): + init = self._initializer(shape, dtype) + kernel = self.NewVar(init) + + mask = np.ones(shape[2:], dtype=dtype.as_numpy_dtype) + depth_output = shape[3] + for d in xrange(depth_output): + if self._strict_order: + mask[d:, d] = 0 + else: + mask[d + 1:, d] = 0 + mask = mask.reshape((1, 1) + mask.shape) + + return tf.convert_to_tensor(mask, dtype) * kernel + + +class GroupRasterScanConv2D(blocks_std.Conv2DBase): + """Conv2D with no dependency on future pixels (in raster scan order). + + This version only introduces dependencies on previous pixels in raster scan + order. 
It can also introduce some dependencies on previous depth positions + of the current pixel (current pixel = center pixel of the kernel) in the + following way: + the depth dimension of the input is split into Ki groups of size + |input_group_size|, the output dimension is split into Ko groups of size + |output_group_size| (usually Ki == Ko). Each output group ko of the current + pixel position can only depend on previous input groups ki + (i.e. ki < ko if strict_order is True or ki <= ko if strict_order is False). + + Notes: + - Block RasterScanConv2D is a special case of GroupRasterScanConv2D + where Ki == Ko == 1 (i.e. input_group_size == input_depth and + output_group_size == output_depth). + - For 1x1 convolution, block DepthOrderConv2D is a special case of + GroupRasterScanConv2D where input_group_size == 1 and + output_group_size == 1. + """ + + def __init__(self, depth, filter_size, strides, padding, + strict_order=True, + input_group_size=1, + output_group_size=1, + bias=None, act=None, initializer=None, name=None): + super(GroupRasterScanConv2D, self).__init__( + depth, filter_size, strides, padding, bias, act, name=name) + + if (filter_size[0] % 2) != 1 or (filter_size[1] % 2) != 1: + raise ValueError('Kernel size should be odd.') + + with self._BlockScope(): + if initializer is None: + initializer = block_util.RsqrtInitializer(dims=(0, 1, 2)) + self._initializer = initializer + self._input_group_size = input_group_size + self._output_group_size = output_group_size + self._strict_order = strict_order + + if depth % self._output_group_size != 0: + raise ValueError( + 'Invalid depth group size: {} for depth {}'.format( + self._output_group_size, depth)) + self._output_group_count = depth // self._output_group_size + + def _CreateKernel(self, shape, dtype): + init = self._initializer(shape, dtype) + kernel = self.NewVar(init) + + depth_input = shape[2] + if depth_input % self._input_group_size != 0: + raise ValueError( + 'Invalid depth group size: {} for depth {}'.format( + self._input_group_size, depth_input)) + input_group_count = depth_input // self._input_group_size + output_group_count = self._output_group_count + + # Set the mask to 0 for future pixels in raster scan order. + center = shape[:2] // 2 + mask = np.ones([shape[0], shape[1], + input_group_count, self._input_group_size, + output_group_count, self._output_group_size], + dtype=dtype.as_numpy_dtype) + mask[center[0] + 1:, :, :, :, :, :] = 0 + mask[center[0], center[1] + 1:, :, :, :, :] = 0 + + # Adjust the mask for the current position (the center position). + depth_output = shape[3] + for d in xrange(output_group_count): + mask[center[0], center[1], d + 1:, :, d:d + 1, :] = 0 + if self._strict_order: + mask[center[0], center[1], d, :, d:d + 1, :] = 0 + + mask = mask.reshape([shape[0], shape[1], depth_input, depth_output]) + return tf.convert_to_tensor(mask, dtype) * kernel + + +class InFillingConv2D(blocks_std.Conv2DBase): + """Conv2D with kernel having no dependency on the current pixel. + + For example, assuming a 5 x 5 kernel, the kernel is applied a spatial mask: + T T T T T + T T T T T + T T x T T + T T T T T + T T T T T + where 'T' marks a pixel which is available when computing the convolution + for pixel 'x'. 'x' itself is not available. 
+ """ + + def __init__(self, depth, filter_size, strides, padding, + bias=None, act=None, initializer=None, name=None): + super(InFillingConv2D, self).__init__( + depth, filter_size, strides, padding, bias, act, name=name) + + if (filter_size[0] % 2) != 1 or (filter_size[1] % 2) != 1: + raise ValueError('Kernel size should be odd.') + if filter_size[0] == 1 and filter_size[1] == 1: + raise ValueError('Kernel size should be larger than 1x1.') + + with self._BlockScope(): + if initializer is None: + initializer = block_util.RsqrtInitializer(dims=(0, 1, 2)) + self._initializer = initializer + + def _CreateKernel(self, shape, dtype): + init = self._initializer(shape, dtype) + kernel = self.NewVar(init) + + mask = np.ones(shape[:2], dtype=dtype.as_numpy_dtype) + center = shape[:2] // 2 + mask[center[0], center[1]] = 0 + mask = mask.reshape(mask.shape + (1, 1)) + + return tf.convert_to_tensor(mask, dtype) * kernel diff --git a/compression/entropy_coder/lib/blocks_masked_conv2d_lstm.py b/compression/entropy_coder/lib/blocks_masked_conv2d_lstm.py new file mode 100644 index 0000000000000000000000000000000000000000..2d6dfeffcaff1289adf3bdec33cb0560db6b0416 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_masked_conv2d_lstm.py @@ -0,0 +1,79 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Masked conv2d LSTM.""" + +import block_base +import block_util +import blocks_masked_conv2d +import blocks_lstm +import blocks_std + +# pylint: disable=not-callable + + +class RasterScanConv2DLSTM(blocks_lstm.LSTMBase): + """Convolutional LSTM implementation with optimizations inspired by [1]. + + Note that when using the batch normalization feature, the bias initializer + will not be used, since BN effectively cancels its effect out. + + [1] Zaremba, Sutskever, Vinyals. Recurrent Neural Network Regularization, + 2015. arxiv:1409.2329. 
+ """ + + def __init__(self, + depth, + filter_size, + hidden_filter_size, + strides, + padding, + bias=blocks_lstm.LSTMBiasInit, + initializer=block_util.RsqrtInitializer(dims=(0, 1, 2)), + name=None): + super(RasterScanConv2DLSTM, self).__init__([None, None, depth], name) + + with self._BlockScope(): + self._input_conv = blocks_masked_conv2d.RasterScanConv2D( + 4 * depth, + filter_size, + strides, + padding, + strict_order=False, + bias=None, + act=None, + initializer=initializer, + name='input_conv2d') + + self._hidden_conv = blocks_std.Conv2D( + 4 * depth, + hidden_filter_size, + [1, 1], + 'SAME', + bias=None, + act=None, + initializer=initializer, + name='hidden_conv2d') + + if bias is not None: + self._bias = blocks_std.BiasAdd(bias, name='biases') + else: + self._bias = blocks_std.PassThrough() + + def _TransformInputs(self, x): + return self._bias(self._input_conv(x)) + + def _TransformHidden(self, h): + return self._hidden_conv(h) diff --git a/compression/entropy_coder/lib/blocks_masked_conv2d_test.py b/compression/entropy_coder/lib/blocks_masked_conv2d_test.py new file mode 100644 index 0000000000000000000000000000000000000000..adb546778e526bfb99fda3bb3e6a4432d0082161 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_masked_conv2d_test.py @@ -0,0 +1,206 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests of the 2D masked convolution blocks.""" + +from __future__ import division +from __future__ import unicode_literals + +import numpy as np +import tensorflow as tf + +import blocks_masked_conv2d + + +class MaskedConv2DTest(tf.test.TestCase): + + def testRasterScanKernel(self): + kernel_size = 5 + input_depth = 1 + output_depth = 1 + kernel_shape = [kernel_size, kernel_size, input_depth, output_depth] + + # pylint: disable=bad-whitespace + kernel_feed = [[ 1.0, 2.0, 3.0, 4.0, 5.0], + [ 6.0, 7.0, 8.0, 9.0, 10.0], + [11.0, 12.0, 13.0, 14.0, 15.0], + [16.0, 17.0, 18.0, 19.0, 20.0], + [21.0, 22.0, 23.0, 24.0, 25.0]] + kernel_feed = np.reshape(kernel_feed, kernel_shape) + kernel_expected = [[ 1.0, 2.0, 3.0, 4.0, 5.0], + [ 6.0, 7.0, 8.0, 9.0, 10.0], + [11.0, 12.0, 0.0, 0.0, 0.0], + [ 0.0, 0.0, 0.0, 0.0, 0.0], + [ 0.0, 0.0, 0.0, 0.0, 0.0]] + kernel_expected = np.reshape(kernel_expected, kernel_shape) + # pylint: enable=bad-whitespace + + init_kernel = lambda s, t: tf.constant(kernel_feed, dtype=t, shape=s) + masked_conv2d = blocks_masked_conv2d.RasterScanConv2D( + output_depth, [kernel_size] * 2, [1] * 2, 'SAME', + initializer=init_kernel) + x = tf.placeholder(dtype=tf.float32, shape=[10] * 3 + [input_depth]) + _ = masked_conv2d(x) + + with self.test_session(): + tf.global_variables_initializer().run() + kernel_value = masked_conv2d._kernel.eval() + + self.assertAllEqual(kernel_expected, kernel_value) + + def testDepthOrderKernel(self): + kernel_size = 1 + input_depth = 7 + output_depth = input_depth + kernel_shape = [kernel_size, kernel_size, input_depth, output_depth] + + kernel_feed = np.ones(kernel_shape) + x_shape = [5] * 3 + [input_depth] + x_feed = np.ones(x_shape) + y_expected = np.zeros(x_shape[0:3] + [output_depth]) + y_expected[:, :, :] = np.arange(output_depth) + + init_kernel = lambda s, t: tf.constant(kernel_feed, dtype=t, shape=s) + masked_conv2d = blocks_masked_conv2d.DepthOrderConv2D( + output_depth, [kernel_size] * 2, [1] * 2, 'SAME', + strict_order=True, + initializer=init_kernel) + x = tf.placeholder(dtype=tf.float32, shape=x_shape) + y = masked_conv2d(x) + + with self.test_session(): + tf.global_variables_initializer().run() + y_value = y.eval(feed_dict={x: x_feed}) + + self.assertAllEqual(y_expected, y_value) + + def testGroupRasterScanKernel(self): + kernel_size = 3 + input_depth = 4 + input_group_size = 2 + output_depth = 2 + output_group_size = 1 + kernel_shape = [kernel_size, kernel_size, input_depth, output_depth] + kernel_feed = np.ones(shape=kernel_shape) + + height = 5 + width = 5 + x_shape = [1, height, width, input_depth] + x_feed = np.ones(shape=x_shape) + + # pylint: disable=bad-whitespace + y_expected = [ + [[ 0, 2], [ 4, 6], [ 4, 6], [ 4, 6], [ 4, 6]], + [[ 8, 10], [16, 18], [16, 18], [16, 18], [12, 14]], + [[ 8, 10], [16, 18], [16, 18], [16, 18], [12, 14]], + [[ 8, 10], [16, 18], [16, 18], [16, 18], [12, 14]], + [[ 8, 10], [16, 18], [16, 18], [16, 18], [12, 14]], + ] + y_expected = np.reshape(y_expected, [1, height, width, output_depth]) + # pylint: enable=bad-whitespace + + init_kernel = lambda s, t: tf.constant(kernel_feed, dtype=t, shape=s) + masked_conv2d = blocks_masked_conv2d.GroupRasterScanConv2D( + output_depth, [kernel_size] * 2, [1] * 2, 'SAME', + strict_order=True, + input_group_size=input_group_size, + output_group_size=output_group_size, + initializer=init_kernel) + x = tf.placeholder(dtype=tf.float32, shape=x_shape) + y = masked_conv2d(x) + + with 
self.test_session(): + tf.global_variables_initializer().run() + y_value = y.eval(feed_dict={x: x_feed}) + + self.assertAllEqual(y_expected, y_value) + + def testInFillingKernel(self): + kernel_size = 5 + input_depth = 1 + output_depth = 1 + kernel_shape = [kernel_size, kernel_size, input_depth, output_depth] + + # pylint: disable=bad-whitespace + kernel_feed = [[ 1.0, 2.0, 3.0, 4.0, 5.0], + [ 6.0, 7.0, 8.0, 9.0, 10.0], + [11.0, 12.0, 13.0, 14.0, 15.0], + [16.0, 17.0, 18.0, 19.0, 20.0], + [21.0, 22.0, 23.0, 24.0, 25.0]] + kernel_feed = np.reshape(kernel_feed, kernel_shape) + kernel_expected = [[ 1.0, 2.0, 3.0, 4.0, 5.0], + [ 6.0, 7.0, 8.0, 9.0, 10.0], + [11.0, 12.0, 0.0, 14.0, 15.0], + [16.0, 17.0, 18.0, 19.0, 20.0], + [21.0, 22.0, 23.0, 24.0, 25.0]] + kernel_expected = np.reshape(kernel_expected, kernel_shape) + # pylint: enable=bad-whitespace + + init_kernel = lambda s, t: tf.constant(kernel_feed, dtype=t, shape=s) + masked_conv2d = blocks_masked_conv2d.InFillingConv2D( + output_depth, [kernel_size] * 2, [1] * 2, 'SAME', + initializer=init_kernel) + x = tf.placeholder(dtype=tf.float32, shape=[10] * 3 + [input_depth]) + _ = masked_conv2d(x) + + with self.test_session(): + tf.global_variables_initializer().run() + kernel_value = masked_conv2d._kernel.eval() + + self.assertAllEqual(kernel_expected, kernel_value) + + def testConv2DMaskedNumerics(self): + kernel_size = 5 + input_shape = [1, 10, 10, 1] + filter_shape = [kernel_size, kernel_size, 1, 1] + strides = [1, 1, 1, 1] + output_shape = [1, 10, 10, 1] + + conv = blocks_masked_conv2d.RasterScanConv2D( + depth=filter_shape[-1], + filter_size=filter_shape[0:2], + strides=strides[1:3], + padding='SAME', + initializer=tf.constant_initializer(value=1.0)) + x = tf.placeholder(dtype=tf.float32, shape=input_shape) + y = conv(x) + + x_feed = - np.ones(input_shape, dtype=float) + y_expected = np.ones(output_shape, dtype=float) + for i in xrange(input_shape[1]): + for j in xrange(input_shape[2]): + x_feed[0, i, j, 0] = 10 * (j + 1) + i + v = 0 + ki_start = max(i - kernel_size // 2, 0) + kj_start = max(j - kernel_size // 2, 0) + kj_end = min(j + kernel_size // 2, input_shape[2] - 1) + for ki in range(ki_start, i + 1): + for kj in range(kj_start, kj_end + 1): + if ki > i: + continue + if ki == i and kj >= j: + continue + v += 10 * (kj + 1) + ki + y_expected[0, i, j, 0] = v + + with self.test_session(): + tf.global_variables_initializer().run() + y_value = y.eval(feed_dict={x: x_feed}) + + self.assertAllEqual(y_expected, y_value) + + +if __name__ == '__main__': + tf.test.main() diff --git a/compression/entropy_coder/lib/blocks_operator.py b/compression/entropy_coder/lib/blocks_operator.py new file mode 100644 index 0000000000000000000000000000000000000000..e35e37b27aa416ed48f91eda866d372601741cba --- /dev/null +++ b/compression/entropy_coder/lib/blocks_operator.py @@ -0,0 +1,87 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Common blocks which work as operators on other blocks.""" + +import tensorflow as tf + +import block_base + +# pylint: disable=not-callable + + +class CompositionOperator(block_base.BlockBase): + """Composition of several blocks.""" + + def __init__(self, block_list, name=None): + """Initialization of the composition operator. + + Args: + block_list: List of blocks.BlockBase that are chained to create + a new blocks.BlockBase. + name: Name of this block. + """ + super(CompositionOperator, self).__init__(name) + self._blocks = block_list + + def _Apply(self, x): + """Apply successively all the blocks on the given input tensor.""" + h = x + for layer in self._blocks: + h = layer(h) + return h + + +class LineOperator(block_base.BlockBase): + """Repeat the same block over all the lines of an input tensor.""" + + def __init__(self, block, name=None): + super(LineOperator, self).__init__(name) + self._block = block + + def _Apply(self, x): + height = x.get_shape()[1].value + if height is None: + raise ValueError('Unknown tensor height') + all_line_x = tf.split(value=x, num_or_size_splits=height, axis=1) + + y = [] + for line_x in all_line_x: + y.append(self._block(line_x)) + y = tf.concat(values=y, axis=1) + + return y + + +class TowerOperator(block_base.BlockBase): + """Parallel execution with concatenation of several blocks.""" + + def __init__(self, block_list, dim=3, name=None): + """Initialization of the parallel exec + concat (Tower). + + Args: + block_list: List of blocks.BlockBase that are chained to create + a new blocks.BlockBase. + dim: the dimension on which to concat. + name: Name of this block. + """ + super(TowerOperator, self).__init__(name) + self._blocks = block_list + self._concat_dim = dim + + def _Apply(self, x): + """Apply successively all the blocks on the given input tensor.""" + outputs = [layer(x) for layer in self._blocks] + return tf.concat(outputs, self._concat_dim) diff --git a/compression/entropy_coder/lib/blocks_operator_test.py b/compression/entropy_coder/lib/blocks_operator_test.py new file mode 100644 index 0000000000000000000000000000000000000000..8b6d80da1d09102585e4725dd5c59f48d48eafcd --- /dev/null +++ b/compression/entropy_coder/lib/blocks_operator_test.py @@ -0,0 +1,64 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests of the block operators.""" + +import numpy as np +import tensorflow as tf + +import block_base +import blocks_operator + + +class AddOneBlock(block_base.BlockBase): + + def __init__(self, name=None): + super(AddOneBlock, self).__init__(name) + + def _Apply(self, x): + return x + 1.0 + + +class SquareBlock(block_base.BlockBase): + + def __init__(self, name=None): + super(SquareBlock, self).__init__(name) + + def _Apply(self, x): + return x * x + + +class BlocksOperatorTest(tf.test.TestCase): + + def testComposition(self): + x_value = np.array([[1.0, 2.0, 3.0], + [-1.0, -2.0, -3.0]]) + y_expected_value = np.array([[4.0, 9.0, 16.0], + [0.0, 1.0, 4.0]]) + + x = tf.placeholder(dtype=tf.float32, shape=[2, 3]) + complex_block = blocks_operator.CompositionOperator( + [AddOneBlock(), + SquareBlock()]) + y = complex_block(x) + + with self.test_session(): + y_value = y.eval(feed_dict={x: x_value}) + + self.assertAllClose(y_expected_value, y_value) + + +if __name__ == '__main__': + tf.test.main() diff --git a/compression/entropy_coder/lib/blocks_std.py b/compression/entropy_coder/lib/blocks_std.py new file mode 100644 index 0000000000000000000000000000000000000000..2c617485342452f500d4b1b0b18e33b07d51e487 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_std.py @@ -0,0 +1,363 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Basic blocks for building tensorflow models.""" + +import numpy as np +import tensorflow as tf + +import block_base +import block_util + +# pylint does not recognize block_base.BlockBase.__call__(). +# pylint: disable=not-callable + + +def HandleConvPaddingModes(x, padding, kernel_shape, strides): + """Returns an updated tensor and padding type for REFLECT and SYMMETRIC. + + Args: + x: A 4D tensor with shape [batch_size, height, width, depth]. + padding: Padding mode (SAME, VALID, REFLECT, or SYMMETRIC). + kernel_shape: Shape of convolution kernel that will be applied. + strides: Convolution stride that will be used. + + Returns: + x and padding after adjustments for REFLECT and SYMMETRIC. + """ + # For 1x1 convolution, all padding modes are the same. + if np.all(kernel_shape[:2] == 1): + return x, 'VALID' + + if padding == 'REFLECT' or padding == 'SYMMETRIC': + # We manually compute the number of paddings as if 'SAME'. + # From Tensorflow kernel, the formulas are as follows. + # output_shape = ceil(input_shape / strides) + # paddings = (output_shape - 1) * strides + filter_size - input_shape + # Let x, y, s be a shorthand notations for input_shape, output_shape, and + # strides, respectively. Let (x - 1) = sn + r where 0 <= r < s. Note that + # y - 1 = ceil(x / s) - 1 = floor((x - 1) / s) = n + # provided that x > 0. Therefore + # paddings = n * s + filter_size - (sn + r + 1) + # = filter_size - r - 1. 
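+ # Worked example (illustrative): input_shape = 11, stride = 2, and
+ # filter_size = 5 give r = (11 - 1) % 2 = 0, so paddings = 5 - 0 - 1 = 4,
+ # split as 2 rows/cols before and 2 after.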
+ input_shape = x.get_shape() # shape at graph construction time + img_shape = tf.shape(x)[1:3] # image shape (no batch) at run time + remainder = tf.mod(img_shape - 1, strides[1:3]) + pad_sizes = kernel_shape[:2] - remainder - 1 + + pad_rows = pad_sizes[0] + pad_cols = pad_sizes[1] + pad = tf.stack([[0, 0], tf.stack([pad_rows // 2, (pad_rows + 1) // 2]), + tf.stack([pad_cols // 2, (pad_cols + 1) // 2]), [0, 0]]) + + # Manually pad the input and switch the padding mode to 'VALID'. + x = tf.pad(x, pad, mode=padding) + x.set_shape([input_shape[0], x.get_shape()[1], + x.get_shape()[2], input_shape[3]]) + padding = 'VALID' + + return x, padding + + +class PassThrough(block_base.BlockBase): + """A dummy transform block that does nothing.""" + + def __init__(self): + # Pass an empty string to disable name scoping. + super(PassThrough, self).__init__(name='') + + def _Apply(self, inp): + return inp + + @property + def initialized(self): + """Always returns True.""" + return True + + +class Bias(object): + """An initialization helper class for BiasAdd block below.""" + + def __init__(self, value=0): + self.value = value + + +class BiasAdd(block_base.BlockBase): + """A tf.nn.bias_add wrapper. + + This wrapper may act as a PassThrough block depending on the initializer + provided, to make easier optional bias applications in NN blocks, etc. + See __init__() for the details. + """ + + def __init__(self, initializer=Bias(0), name=None): + """Initializes Bias block. + + |initializer| parameter have two special cases. + + 1. If initializer is None, then this block works as a PassThrough. + 2. If initializer is a Bias class object, then tf.constant_initializer is + used with the stored value. + + Args: + initializer: An initializer for the bias variable. + name: Name of this block. + """ + super(BiasAdd, self).__init__(name) + + with self._BlockScope(): + if isinstance(initializer, Bias): + self._initializer = tf.constant_initializer(value=initializer.value) + else: + self._initializer = initializer + + self._bias = None + + def _Apply(self, x): + if not self._bias: + init = self._initializer([int(x.get_shape()[-1])], x.dtype) + self._bias = self.NewVar(init) + + return tf.nn.bias_add(x, self._bias) + + def CreateWeightLoss(self): + return [] + + +class LinearBase(block_base.BlockBase): + """A matmul wrapper. + + Returns input * W, where matrix W can be customized through derivation. + """ + + def __init__(self, depth, name=None): + super(LinearBase, self).__init__(name) + + with self._BlockScope(): + self._depth = depth + self._matrix = None + + def _CreateKernel(self, shape, dtype): + raise NotImplementedError('This method must be sub-classed.') + + def _Apply(self, x): + if not self._matrix: + shape = [int(x.get_shape()[-1]), self._depth] + self._matrix = self._CreateKernel(shape, x.dtype) + + return tf.matmul(x, self._matrix) + + +class Linear(LinearBase): + """A matmul wrapper. + + Returns input * W, where matrix W is learned. + """ + + def __init__(self, + depth, + initializer=block_util.RsqrtInitializer(), + name=None): + super(Linear, self).__init__(depth, name) + + with self._BlockScope(): + self._initializer = initializer + + def _CreateKernel(self, shape, dtype): + init = self._initializer(shape, dtype) + return self.NewVar(init) + + +class NN(block_base.BlockBase): + """A neural network layer wrapper. + + Returns act(input * W + b), where matrix W, bias b are learned, and act is an + optional activation function (i.e., nonlinearity). + + This transform block can handle multiple inputs. 
If x_1, x_2, ..., x_m are + the inputs, then returns act(x_1 * W_1 + ... + x_m * W_m + b). + + Attributes: + nunits: The dimension of the output. + """ + + def __init__(self, + depth, + bias=Bias(0), + act=None, # e.g., tf.nn.relu + initializer=block_util.RsqrtInitializer(), + linear_block_factory=(lambda d, i: Linear(d, initializer=i)), + name=None): + """Initializes NN block. + + Args: + depth: The depth of the output. + bias: An initializer for the bias, or a Bias class object. If None, there + will be no bias term for this NN block. See BiasAdd block. + act: Optional activation function. If None, no activation is applied. + initializer: The initialization method for the matrix weights. + linear_block_factory: A function used to create a linear block. + name: The name of this block. + """ + super(NN, self).__init__(name) + + with self._BlockScope(): + self._linear_block_factory = linear_block_factory + self._depth = depth + self._initializer = initializer + self._matrices = None + + self._bias = BiasAdd(bias) if bias else PassThrough() + self._act = act if act else PassThrough() + + def _Apply(self, *args): + if not self._matrices: + self._matrices = [ + self._linear_block_factory(self._depth, self._initializer) + for _ in args] + + if len(self._matrices) != len(args): + raise ValueError('{} expected {} inputs, but observed {} inputs'.format( + self.name, len(self._matrices), len(args))) + + if len(args) > 1: + y = tf.add_n([m(x) for m, x in zip(self._matrices, args)]) + else: + y = self._matrices[0](args[0]) + + return self._act(self._bias(y)) + + +class Conv2DBase(block_base.BlockBase): + """A tf.nn.conv2d operator.""" + + def __init__(self, depth, filter_size, strides, padding, + bias=None, act=None, atrous_rate=None, conv=tf.nn.conv2d, + name=None): + """Initializes a Conv2DBase block. + + Arguments: + depth: The output depth of the block (i.e. #filters); if negative, the + output depth will be set to be the same as the input depth. + filter_size: The size of the 2D filter. If it's specified as an integer, + it's going to create a square filter. Otherwise, this is a tuple + specifying the height x width of the filter. + strides: A tuple specifying the y and x stride. + padding: One of the valid padding modes allowed by tf.nn.conv2d, or + 'REFLECT'/'SYMMETRIC' for mirror padding. + bias: An initializer for the bias, or a Bias class object. If None, there + will be no bias in this block. See BiasAdd block. + act: Optional activation function applied to the output. + atrous_rate: optional input rate for ATrous convolution. If not None, this + will be used and the strides will be ignored. + conv: The convolution function to use (e.g. tf.nn.conv2d). + name: The name for this conv2d op. + """ + super(Conv2DBase, self).__init__(name) + + with self._BlockScope(): + self._act = act if act else PassThrough() + self._bias = BiasAdd(bias) if bias else PassThrough() + + self._kernel_shape = np.zeros((4,), dtype=np.int32) + self._kernel_shape[:2] = filter_size + self._kernel_shape[3] = depth + + self._strides = np.ones((4,), dtype=np.int32) + self._strides[1:3] = strides + self._strides = list(self._strides) + + self._padding = padding + + self._kernel = None + self._conv = conv + + self._atrous_rate = atrous_rate + + def _CreateKernel(self, shape, dtype): + raise NotImplementedError('This method must be sub-classed') + + def _Apply(self, x): + """Apply the self._conv op. + + Arguments: + x: input tensor. It needs to be a 4D tensor of the form + [batch, height, width, channels]. 
+ Returns: + The output of the convolution of x with the current convolutional + kernel. + Raises: + ValueError: if number of channels is not defined at graph construction. + """ + input_shape = x.get_shape().with_rank(4) + input_shape[3:].assert_is_fully_defined() # channels must be defined + if self._kernel is None: + assert self._kernel_shape[2] == 0, self._kernel_shape + self._kernel_shape[2] = input_shape[3].value + if self._kernel_shape[3] < 0: + # Make output depth be the same as input depth. + self._kernel_shape[3] = self._kernel_shape[2] + self._kernel = self._CreateKernel(self._kernel_shape, x.dtype) + + x, padding = HandleConvPaddingModes( + x, self._padding, self._kernel_shape, self._strides) + if self._atrous_rate is None: + x = self._conv(x, self._kernel, strides=self._strides, padding=padding) + else: + x = self._conv(x, self._kernel, rate=self._atrous_rate, padding=padding) + + if self._padding != 'VALID': + # Manually update shape. Known shape information can be lost by tf.pad(). + height = (1 + (input_shape[1].value - 1) // self._strides[1] + if input_shape[1].value else None) + width = (1 + (input_shape[2].value - 1) // self._strides[2] + if input_shape[2].value else None) + shape = x.get_shape() + x.set_shape([shape[0], height, width, shape[3]]) + + return self._act(self._bias(x)) + + +class Conv2D(Conv2DBase): + """A tf.nn.conv2d operator.""" + + def __init__(self, depth, filter_size, strides, padding, + bias=None, act=None, initializer=None, name=None): + """Initializes a Conv2D block. + + Arguments: + depth: The output depth of the block (i.e., #filters) + filter_size: The size of the 2D filter. If it's specified as an integer, + it's going to create a square filter. Otherwise, this is a tuple + specifying the height x width of the filter. + strides: A tuple specifying the y and x stride. + padding: One of the valid padding modes allowed by tf.nn.conv2d, or + 'REFLECT'/'SYMMETRIC' for mirror padding. + bias: An initializer for the bias, or a Bias class object. If None, there + will be no bias in this block. See BiasAdd block. + act: Optional activation function applied to the output. + initializer: Optional initializer for weights. + name: The name for this conv2d op. + """ + super(Conv2D, self).__init__(depth, filter_size, strides, padding, bias, + act, conv=tf.nn.conv2d, name=name) + + with self._BlockScope(): + if initializer is None: + initializer = block_util.RsqrtInitializer(dims=(0, 1, 2)) + self._initializer = initializer + + def _CreateKernel(self, shape, dtype): + return self.NewVar(self._initializer(shape, dtype)) diff --git a/compression/entropy_coder/lib/blocks_std_test.py b/compression/entropy_coder/lib/blocks_std_test.py new file mode 100644 index 0000000000000000000000000000000000000000..7e8d42cf1020dabaeb58ca52049610ce74245092 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_std_test.py @@ -0,0 +1,339 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for basic tensorflow blocks_std.""" + +from __future__ import division +from __future__ import unicode_literals + +import math +import os + +import numpy as np +import tensorflow as tf + +import blocks_std + + +def _NumpyConv2D(x, f, strides, padding, rate=1): + assert strides[0] == 1 and strides[3] == 1, strides + + if rate > 1: + f_shape = f.shape + expand_f = np.zeros([f_shape[0], ((f_shape[1] - 1) * rate + 1), + f_shape[2], f_shape[3]]) + expand_f[:, [y * rate for y in range(f_shape[1])], :, :] = f + f = np.zeros([((f_shape[0] - 1) * rate + 1), expand_f.shape[1], + f_shape[2], f_shape[3]]) + f[[y * rate for y in range(f_shape[0])], :, :, :] = expand_f + + if padding != 'VALID': + assert x.shape[1] > 0 and x.shape[2] > 0, x.shape + # Compute the number of padded rows and cols. + # See Conv2D block comments for a math explanation. + remainder = ((x.shape[1] - 1) % strides[1], (x.shape[2] - 1) % strides[2]) + pad_rows = f.shape[0] - remainder[0] - 1 + pad_cols = f.shape[1] - remainder[1] - 1 + pad = ((0, 0), + (pad_rows // 2, (pad_rows + 1) // 2), + (pad_cols // 2, (pad_cols + 1) // 2), + (0, 0)) + + # Pad the input using numpy.pad(). + mode = None + if padding == 'SAME': + mode = str('constant') + if padding == 'REFLECT': + mode = str('reflect') + if padding == 'SYMMETRIC': + mode = str('symmetric') + x = np.pad(x, pad, mode=mode) + + # Since x is now properly padded, proceed as if padding mode is VALID. + x_window = np.empty( + (x.shape[0], + int(math.ceil((x.shape[1] - f.shape[0] + 1) / strides[1])), + int(math.ceil((x.shape[2] - f.shape[1] + 1) / strides[2])), + np.prod(f.shape[:3]))) + + # The output at pixel location (i, j) is the result of linear transformation + # applied to the window whose top-left corner is at + # (i * row_stride, j * col_stride). 
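+ # For example, with strides = [1, 2, 2, 1], the window for output pixel
+ # (i=1, j=3) has its top-left corner at input row 1 * 2 = 2, column 3 * 2 = 6.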
+ for i in xrange(x_window.shape[1]): + k = i * strides[1] + for j in xrange(x_window.shape[2]): + l = j * strides[2] + x_window[:, i, j, :] = x[:, + k:(k + f.shape[0]), + l:(l + f.shape[1]), + :].reshape((x_window.shape[0], -1)) + + y = np.tensordot(x_window, f.reshape((-1, f.shape[3])), axes=1) + return y + + +class BlocksStdTest(tf.test.TestCase): + + def CheckUnary(self, y, op_type): + self.assertEqual(op_type, y.op.type) + self.assertEqual(1, len(y.op.inputs)) + return y.op.inputs[0] + + def CheckBinary(self, y, op_type): + self.assertEqual(op_type, y.op.type) + self.assertEqual(2, len(y.op.inputs)) + return y.op.inputs + + def testPassThrough(self): + p = blocks_std.PassThrough() + x = tf.placeholder(dtype=tf.float32, shape=[1]) + self.assertIs(p(x), x) + + def CheckBiasAdd(self, y, b): + x, u = self.CheckBinary(y, 'BiasAdd') + self.assertIs(u, b._bias.value()) + self.assertEqual(x.dtype, u.dtype.base_dtype) + return x + + def testBiasAdd(self): + b = blocks_std.BiasAdd() + x = tf.placeholder(dtype=tf.float32, shape=[4, 8]) + y = b(x) + self.assertEqual(b._bias.get_shape(), x.get_shape()[-1:]) + self.assertIs(x, self.CheckBiasAdd(y, b)) + + def testBiasRankTest(self): + b = blocks_std.BiasAdd() + x = tf.placeholder(dtype=tf.float32, shape=[10]) + with self.assertRaises(ValueError): + b(x) + + def CheckLinear(self, y, m): + x, w = self.CheckBinary(y, 'MatMul') + self.assertIs(w, m._matrix.value()) + self.assertEqual(x.dtype, w.dtype.base_dtype) + return x + + def testLinear(self): + m = blocks_std.Linear(10) + x = tf.placeholder(dtype=tf.float32, shape=[8, 9]) + y = m(x) + self.assertEqual(m._matrix.get_shape(), [9, 10]) + self.assertIs(x, self.CheckLinear(y, m)) + + def testLinearShared(self): + # Create a linear map which is applied twice on different inputs + # (i.e. the weights of the map are shared). + linear_map = blocks_std.Linear(6) + x1 = tf.random_normal(shape=[1, 5]) + x2 = tf.random_normal(shape=[1, 5]) + xs = x1 + x2 + + # Apply the transform with the same weights. + y1 = linear_map(x1) + y2 = linear_map(x2) + ys = linear_map(xs) + + with self.test_session() as sess: + # Initialize all the variables of the graph. 
+ tf.global_variables_initializer().run() + + y1_res, y2_res, ys_res = sess.run([y1, y2, ys]) + self.assertAllClose(y1_res + y2_res, ys_res) + + def CheckNN(self, y, nn, act=None): + if act: + pre_act = self.CheckUnary(y, act) + else: + pre_act = y + + if not isinstance(nn._bias, blocks_std.PassThrough): + pre_bias = self.CheckBiasAdd(pre_act, nn._bias) + else: + pre_bias = pre_act + + if len(nn._matrices) > 1: + self.assertEqual('AddN', pre_bias.op.type) + pre_bias = pre_bias.op.inputs + else: + pre_bias = [pre_bias] + + self.assertEqual(len(pre_bias), len(nn._matrices)) + return [self.CheckLinear(u, m) for u, m in zip(pre_bias, nn._matrices)] + + def testNNWithoutActWithoutBias(self): + nn = blocks_std.NN(10, act=None, bias=None) + x = tf.placeholder(dtype=tf.float32, shape=[5, 7]) + y = nn(x) + self.assertIs(x, self.CheckNN(y, nn)[0]) + + def testNNWithoutBiasWithAct(self): + nn = blocks_std.NN(10, act=tf.nn.relu, bias=None) + x = tf.placeholder(dtype=tf.float32, shape=[5, 7]) + y = nn(x) + self.assertIs(x, self.CheckNN(y, nn, 'Relu')[0]) + + def testNNWithBiasWithoutAct(self): + nn = blocks_std.NN(10, bias=blocks_std.Bias(0), act=None) + x = tf.placeholder(dtype=tf.float32, shape=[5, 7]) + y = nn(x) + self.assertIs(x, self.CheckNN(y, nn)[0]) + + def testNNWithBiasWithAct(self): + nn = blocks_std.NN(10, bias=blocks_std.Bias(0), act=tf.square) + x = tf.placeholder(dtype=tf.float32, shape=[5, 7]) + y = nn(x) + self.assertIs(x, self.CheckNN(y, nn, 'Square')[0]) + + def testNNMultipleInputs(self): + nn = blocks_std.NN(10, bias=blocks_std.Bias(0), act=tf.tanh) + x = [tf.placeholder(dtype=tf.float32, shape=[5, 7]), + tf.placeholder(dtype=tf.float32, shape=[5, 3]), + tf.placeholder(dtype=tf.float32, shape=[5, 5])] + y = nn(*x) + xs = self.CheckNN(y, nn, 'Tanh') + self.assertEqual(len(x), len(xs)) + for u, v in zip(x, xs): + self.assertIs(u, v) + + def testConv2DSAME(self): + np.random.seed(142536) + + x_shape = [4, 16, 11, 5] + f_shape = [4, 3, 5, 6] + strides = [1, 2, 2, 1] + padding = 'SAME' + + conv = blocks_std.Conv2D(depth=f_shape[-1], + filter_size=f_shape[0:2], + strides=strides[1:3], + padding=padding, + act=None, + bias=None) + x_value = np.random.normal(size=x_shape) + x = tf.convert_to_tensor(x_value, dtype=tf.float32) + y = conv(x) + + with self.test_session(): + tf.global_variables_initializer().run() + f_value = conv._kernel.eval() + y_value = y.eval() + + y_expected = _NumpyConv2D(x_value, f_value, + strides=strides, padding=padding) + self.assertAllClose(y_expected, y_value) + + def testConv2DValid(self): + np.random.seed(253647) + + x_shape = [4, 11, 12, 5] + f_shape = [5, 2, 5, 5] + strides = [1, 2, 2, 1] + padding = 'VALID' + + conv = blocks_std.Conv2D(depth=f_shape[-1], + filter_size=f_shape[0:2], + strides=strides[1:3], + padding=padding, + act=None, + bias=None) + x_value = np.random.normal(size=x_shape) + x = tf.convert_to_tensor(x_value, dtype=tf.float32) + y = conv(x) + + with self.test_session(): + tf.global_variables_initializer().run() + f_value = conv._kernel.eval() + y_value = y.eval() + + y_expected = _NumpyConv2D(x_value, f_value, + strides=strides, padding=padding) + self.assertAllClose(y_expected, y_value) + + def testConv2DSymmetric(self): + np.random.seed(364758) + + x_shape = [4, 10, 12, 6] + f_shape = [3, 4, 6, 5] + strides = [1, 1, 1, 1] + padding = 'SYMMETRIC' + + conv = blocks_std.Conv2D(depth=f_shape[-1], + filter_size=f_shape[0:2], + strides=strides[1:3], + padding=padding, + act=None, + bias=None) + x_value = np.random.normal(size=x_shape) + x = 
tf.convert_to_tensor(x_value, dtype=tf.float32) + y = conv(x) + + with self.test_session(): + tf.global_variables_initializer().run() + f_value = conv._kernel.eval() + y_value = y.eval() + + y_expected = _NumpyConv2D(x_value, f_value, + strides=strides, padding=padding) + self.assertAllClose(y_expected, y_value) + + def testConv2DReflect(self): + np.random.seed(768798) + + x_shape = [4, 10, 12, 6] + f_shape = [3, 4, 6, 5] + strides = [1, 2, 2, 1] + padding = 'REFLECT' + + conv = blocks_std.Conv2D(depth=f_shape[-1], + filter_size=f_shape[0:2], + strides=strides[1:3], + padding=padding, + act=None, + bias=None) + x_value = np.random.normal(size=x_shape) + x = tf.convert_to_tensor(x_value, dtype=tf.float32) + y = conv(x) + + with self.test_session(): + tf.global_variables_initializer().run() + f_value = conv._kernel.eval() + y_value = y.eval() + + y_expected = _NumpyConv2D(x_value, f_value, + strides=strides, padding=padding) + self.assertAllClose(y_expected, y_value) + + def testConv2DBias(self): + input_shape = [19, 14, 14, 64] + filter_shape = [3, 7, 64, 128] + strides = [1, 2, 2, 1] + output_shape = [19, 6, 4, 128] + + conv = blocks_std.Conv2D(depth=filter_shape[-1], + filter_size=filter_shape[0:2], + strides=strides[1:3], + padding='VALID', + act=None, + bias=blocks_std.Bias(1)) + x = tf.placeholder(dtype=tf.float32, shape=input_shape) + + y = conv(x) + self.CheckBiasAdd(y, conv._bias) + self.assertEqual(output_shape, y.get_shape().as_list()) + + +if __name__ == '__main__': + tf.test.main() diff --git a/compression/entropy_coder/model/__init__.py b/compression/entropy_coder/model/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/compression/entropy_coder/model/entropy_coder_model.py b/compression/entropy_coder/model/entropy_coder_model.py new file mode 100644 index 0000000000000000000000000000000000000000..67f7eb5bc05f3df7363529c19fa77d176caaabc1 --- /dev/null +++ b/compression/entropy_coder/model/entropy_coder_model.py @@ -0,0 +1,55 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Entropy coder model.""" + + +class EntropyCoderModel(object): + """Entropy coder model.""" + + def __init__(self): + # Loss used for training the model. + self.loss = None + + # Tensorflow op to run to train the model. + self.train_op = None + + # Tensor corresponding to the average code length of the input bit field + # tensor. The average code length is a number of output bits per input bit. + # To get an effective compression, this number should be between 0.0 + # and 1.0 (1.0 corresponds to no compression). + self.average_code_length = None + + def Initialize(self, global_step, optimizer, config_string): + raise NotImplementedError() + + def BuildGraph(self, input_codes): + """Build the Tensorflow graph corresponding to the entropy coder model. 
+ + Args: + input_codes: Tensor of size: batch_size x height x width x bit_depth + corresponding to the codes to compress. + The input codes are {-1, +1} codes. + """ + # TODO: + # - consider switching to {0, 1} codes. + # - consider passing an extra tensor which gives for each (b, y, x) + # what is the actual depth (which would allow to use more or less bits + # for each (y, x) location. + raise NotImplementedError() + + def GetConfigStringForUnitTest(self): + """Returns a default model configuration to be used for unit tests.""" + return None diff --git a/compression/entropy_coder/model/model_factory.py b/compression/entropy_coder/model/model_factory.py new file mode 100644 index 0000000000000000000000000000000000000000..e6f9902f3bb720e76f228f2774a9eaf7774ef191 --- /dev/null +++ b/compression/entropy_coder/model/model_factory.py @@ -0,0 +1,53 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Entropy coder model registrar.""" + + +class ModelFactory(object): + """Factory of encoder/decoder models.""" + + def __init__(self): + self._model_dictionary = dict() + + def RegisterModel(self, + entropy_coder_model_name, + entropy_coder_model_factory): + self._model_dictionary[entropy_coder_model_name] = ( + entropy_coder_model_factory) + + def CreateModel(self, model_name): + current_model_factory = self._model_dictionary[model_name] + return current_model_factory() + + def GetAvailableModels(self): + return self._model_dictionary.keys() + + +_model_registry = ModelFactory() + + +def GetModelRegistry(): + return _model_registry + + +class RegisterEntropyCoderModel(object): + + def __init__(self, model_name): + self._model_name = model_name + + def __call__(self, f): + _model_registry.RegisterModel(self._model_name, f) + return f diff --git a/compression/entropy_coder/progressive/__init__.py b/compression/entropy_coder/progressive/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/compression/entropy_coder/progressive/progressive.py b/compression/entropy_coder/progressive/progressive.py new file mode 100644 index 0000000000000000000000000000000000000000..98777d8d5e7a7c72aba8aa11673c46830f6ef7d2 --- /dev/null +++ b/compression/entropy_coder/progressive/progressive.py @@ -0,0 +1,241 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Code probability model used for entropy coding.""" + +import json + +import tensorflow as tf + +from entropy_coder.lib import blocks +from entropy_coder.model import entropy_coder_model +from entropy_coder.model import model_factory + +# pylint: disable=not-callable + + +class BrnnPredictor(blocks.BlockBase): + """BRNN prediction applied on one layer.""" + + def __init__(self, code_depth, name=None): + super(BrnnPredictor, self).__init__(name) + + with self._BlockScope(): + hidden_depth = 2 * code_depth + + # What is coming from the previous layer/iteration + # is going through a regular Conv2D layer as opposed to the binary codes + # of the current layer/iteration which are going through a masked + # convolution. + self._adaptation0 = blocks.RasterScanConv2D( + hidden_depth, [7, 7], [1, 1], 'SAME', + strict_order=True, + bias=blocks.Bias(0), act=tf.tanh) + self._adaptation1 = blocks.Conv2D( + hidden_depth, [3, 3], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh) + self._predictor = blocks.CompositionOperator([ + blocks.LineOperator( + blocks.RasterScanConv2DLSTM( + depth=hidden_depth, + filter_size=[1, 3], + hidden_filter_size=[1, 3], + strides=[1, 1], + padding='SAME')), + blocks.Conv2D(hidden_depth, [1, 1], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh), + blocks.Conv2D(code_depth, [1, 1], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh) + ]) + + def _Apply(self, x, s): + # Code estimation using both: + # - the state from the previous iteration/layer, + # - the binary codes that are before in raster scan order. + h = tf.concat(values=[self._adaptation0(x), self._adaptation1(s)], axis=3) + + estimated_codes = self._predictor(h) + + return estimated_codes + + +class LayerPrediction(blocks.BlockBase): + """Binary code prediction for one layer.""" + + def __init__(self, layer_count, code_depth, name=None): + super(LayerPrediction, self).__init__(name) + + self._layer_count = layer_count + + # No previous layer. + self._layer_state = None + self._current_layer = 0 + + with self._BlockScope(): + # Layers used to do the conditional code prediction. + self._brnn_predictors = [] + for _ in xrange(layer_count): + self._brnn_predictors.append(BrnnPredictor(code_depth)) + + # Layers used to generate the input of the LSTM operating on the + # iteration/depth domain. + hidden_depth = 2 * code_depth + self._state_blocks = [] + for _ in xrange(layer_count): + self._state_blocks.append(blocks.CompositionOperator([ + blocks.Conv2D( + hidden_depth, [3, 3], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh), + blocks.Conv2D( + code_depth, [3, 3], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh) + ])) + + # Memory of the RNN is equivalent to the size of 2 layers of binary + # codes. + hidden_depth = 2 * code_depth + self._layer_rnn = blocks.CompositionOperator([ + blocks.Conv2DLSTM( + depth=hidden_depth, + filter_size=[1, 1], + hidden_filter_size=[1, 1], + strides=[1, 1], + padding='SAME'), + blocks.Conv2D(hidden_depth, [1, 1], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh), + blocks.Conv2D(code_depth, [1, 1], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh) + ]) + + def _Apply(self, x): + assert self._current_layer < self._layer_count + + # Layer state is set to 0 when there is no previous iteration. 
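+ # (tf.zeros_like gives this initial state the same shape as the incoming
+ # codes, so the first BrnnPredictor call sees an all-zero previous state.)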
+ if self._layer_state is None: + self._layer_state = tf.zeros_like(x, dtype=tf.float32) + + # Code estimation using both: + # - the state from the previous iteration/layer, + # - the binary codes that are before in raster scan order. + estimated_codes = self._brnn_predictors[self._current_layer]( + x, self._layer_state) + + # Compute the updated layer state. + h = self._state_blocks[self._current_layer](x) + self._layer_state = self._layer_rnn(h) + self._current_layer += 1 + + return estimated_codes + + +class ProgressiveModel(entropy_coder_model.EntropyCoderModel): + """Progressive BRNN entropy coder model.""" + + def __init__(self): + super(ProgressiveModel, self).__init__() + + def Initialize(self, global_step, optimizer, config_string): + if config_string is None: + raise ValueError('The progressive model requires a configuration.') + config = json.loads(config_string) + if 'coded_layer_count' not in config: + config['coded_layer_count'] = 0 + + self._config = config + self._optimizer = optimizer + self._global_step = global_step + + def BuildGraph(self, input_codes): + """Build the graph corresponding to the progressive BRNN model.""" + layer_depth = self._config['layer_depth'] + layer_count = self._config['layer_count'] + + code_shape = input_codes.get_shape() + code_depth = code_shape[-1].value + if self._config['coded_layer_count'] > 0: + prefix_depth = self._config['coded_layer_count'] * layer_depth + if code_depth < prefix_depth: + raise ValueError('Invalid prefix depth: {} VS {}'.format( + prefix_depth, code_depth)) + input_codes = input_codes[:, :, :, :prefix_depth] + + code_shape = input_codes.get_shape() + code_depth = code_shape[-1].value + if code_depth % layer_depth != 0: + raise ValueError( + 'Code depth must be a multiple of the layer depth: {} vs {}'.format( + code_depth, layer_depth)) + code_layer_count = code_depth // layer_depth + if code_layer_count > layer_count: + raise ValueError('Input codes have too many layers: {}, max={}'.format( + code_layer_count, layer_count)) + + # Block used to estimate binary codes. + layer_prediction = LayerPrediction(layer_count, layer_depth) + + # Block used to compute code lengths. + code_length_block = blocks.CodeLength() + + # Loop over all the layers. + code_length = [] + code_layers = tf.split( + value=input_codes, num_or_size_splits=code_layer_count, axis=3) + for k in xrange(code_layer_count): + x = code_layers[k] + predicted_x = layer_prediction(x) + # Saturate the prediction to avoid infinite code length. + epsilon = 0.001 + predicted_x = tf.clip_by_value( + predicted_x, -1 + epsilon, +1 - epsilon) + code_length.append(code_length_block( + blocks.ConvertSignCodeToZeroOneCode(x), + blocks.ConvertSignCodeToZeroOneCode(predicted_x))) + tf.summary.scalar('code_length_layer_{:02d}'.format(k), code_length[-1]) + code_length = tf.stack(code_length) + self.loss = tf.reduce_mean(code_length) + tf.summary.scalar('loss', self.loss) + + # Loop over all the remaining layers just to make sure they are + # instantiated. Otherwise, loading model params could fail. + dummy_x = tf.zeros_like(code_layers[0]) + for _ in xrange(layer_count - code_layer_count): + dummy_predicted_x = layer_prediction(dummy_x) + + # Average bitrate over total_line_count. 
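+ # (code_length holds one scalar per coded layer; its mean over layers is
+ # the average code length reported by the model, i.e. output bits per
+ # input code bit.)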
+ self.average_code_length = tf.reduce_mean(code_length)
+
+ if self._optimizer:
+ optim_op = self._optimizer.minimize(self.loss,
+ global_step=self._global_step)
+ block_updates = blocks.CreateBlockUpdates()
+ if block_updates:
+ with tf.get_default_graph().control_dependencies([optim_op]):
+ self.train_op = tf.group(*block_updates)
+ else:
+ self.train_op = optim_op
+ else:
+ self.train_op = None
+
+ def GetConfigStringForUnitTest(self):
+ s = '{\n'
+ s += '"layer_depth": 1,\n'
+ s += '"layer_count": 8\n'
+ s += '}\n'
+ return s
+
+
+@model_factory.RegisterEntropyCoderModel('progressive')
+def CreateProgressiveModel():
+ return ProgressiveModel()
diff --git a/compression/image_encoder/README.md b/compression/image_encoder/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a47da977aa4db4be26528c5ebfe030024f31291b
--- /dev/null
+++ b/compression/image_encoder/README.md
@@ -0,0 +1,105 @@
+# Image Compression with Neural Networks
+
+This is a [TensorFlow](http://www.tensorflow.org/) model for compressing and
+decompressing images using an already trained Residual GRU model as described
+in [Full Resolution Image Compression with Recurrent Neural Networks](https://arxiv.org/abs/1608.05148). Please consult the paper for more details
+on the architecture and compression results.
+
+This code allows you to perform lossy compression with a model that has
+already been trained for compression. It does not currently contain the
+Entropy Coding portions of our paper.
+
+
+## Prerequisites
+The only software requirement for running the encoder and decoder is having
+TensorFlow installed. You will also need to [download](http://download.tensorflow.org/models/compression_residual_gru-2016-08-23.tar.gz)
+and extract the model residual_gru.pb.
+
+If you want to compute the perceptual similarity under MS-SSIM, you will also
+need to [Install SciPy](https://www.scipy.org/install.html).
+
+## Encoding
+The Residual GRU network is fully convolutional, but requires the image's
+height and width in pixels to be multiples of 32. There is an image in this
+folder called example.png that is 768x1024, if one is needed for testing. We
+also rely on TensorFlow's built-in decoding ops, which support only PNG and
+JPEG at the time of release.
+
+To encode an image, simply run the following command:
+
+`python encoder.py --input_image=/your/image/here.png
+--output_codes=output_codes.npz --iteration=15
+--model=/path/to/model/residual_gru.pb
+`
+
+The iteration parameter specifies the lossy quality to target for compression.
+The quality can be in [0-15], where 0 corresponds to a target of 1/8 bits per
+pixel (bpp) and every increment results in an additional 1/8 bpp.
+
+| Iteration | BPP | Compression Ratio |
+|---: |---: |---: |
+|0 | 0.125 | 192:1|
+|1 | 0.250 | 96:1|
+|2 | 0.375 | 64:1|
+|3 | 0.500 | 48:1|
+|4 | 0.625 | 38.4:1|
+|5 | 0.750 | 32:1|
+|6 | 0.875 | 27.4:1|
+|7 | 1.000 | 24:1|
+|8 | 1.125 | 21.3:1|
+|9 | 1.250 | 19.2:1|
+|10 | 1.375 | 17.4:1|
+|11 | 1.500 | 16:1|
+|12 | 1.625 | 14.7:1|
+|13 | 1.750 | 13.7:1|
+|14 | 1.875 | 12.8:1|
+|15 | 2.000 | 12:1|
+
+The output_codes file contains the numpy shape and a flattened, bit-packed
+array of the codes. These can be inspected in Python by using numpy.load().
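+As a rough illustration (the exact array names stored in the .npz file are not
+documented here, so the sketch below simply lists whatever is present rather
+than assuming a key):
+
+```python
+import numpy as np
+
+# Load the file written by encoder.py and list the arrays it contains.
+codes = np.load('output_codes.npz')
+for name in codes.files:
+    print(name, codes[name].shape, codes[name].dtype)
+```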
+ + +## Decoding +After generating codes for an image, the lossy reconstructions for that image +can be done as follows: + +`python decoder.py --input_codes=codes.npz --output_directory=/tmp/decoded/ +--model=residual_gru.pb` + +The output_directory will contain images decoded at each quality level. + + +## Comparing Similarity +One of our primary metrics for comparing how similar two images are +is MS-SSIM. + +To generate these metrics on your images you can run: +`python msssim.py --original_image=/path/to/your/image.png +--compared_image=/tmp/decoded/image_15.png` + + +## Results +CSV results containing the post-entropy bitrates and MS-SSIM over Kodak can +are available for reference. Each row of the CSV represents each of the Kodak +images in their dataset number (1-24). Each column of the CSV represents each +iteration of the model (1-16). + +[Post Entropy Bitrates](https://storage.googleapis.com/compression-ml/residual_gru_results/bitrate.csv) + +[MS-SSIM](https://storage.googleapis.com/compression-ml/residual_gru_results/msssim.csv) + + +## FAQ + +#### How do I train my own compression network? +We currently don't provide the code to build and train a compression +graph from scratch. + +#### I get an InvalidArgumentError: Incompatible shapes. +This is usually due to the fact that our network only supports images that are +both height and width divisible by 32 pixel. Try padding your images to 32 +pixel boundaries. + + +## Contact Info +Model repository maintained by Nick Johnston ([nmjohn](https://github.com/nmjohn)). diff --git a/compression/decoder.py b/compression/image_encoder/decoder.py similarity index 100% rename from compression/decoder.py rename to compression/image_encoder/decoder.py diff --git a/compression/encoder.py b/compression/image_encoder/encoder.py similarity index 100% rename from compression/encoder.py rename to compression/image_encoder/encoder.py diff --git a/compression/example.png b/compression/image_encoder/example.png similarity index 100% rename from compression/example.png rename to compression/image_encoder/example.png diff --git a/compression/msssim.py b/compression/image_encoder/msssim.py similarity index 100% rename from compression/msssim.py rename to compression/image_encoder/msssim.py diff --git a/differential_privacy/README.md b/differential_privacy/README.md index 9cda93aa18c06b51f2671e56b731adcf746189b9..4bd6c22c99830a329db4ae887d8243d0c1b8f931 100644 --- a/differential_privacy/README.md +++ b/differential_privacy/README.md @@ -3,7 +3,7 @@ Open Sourced By: Xin Pan (xpan@google.com, github: panyx0718) -###Introduction for dp_sgd/README.md +### Introduction for [dp_sgd/README.md](dp_sgd/README.md) Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires @@ -18,7 +18,7 @@ manageable cost in software complexity, training efficiency, and model quality. 
paper: https://arxiv.org/abs/1607.00133 -###Introduction for multiple_teachers/README.md +### Introduction for [multiple_teachers/README.md](multiple_teachers/README.md) This repository contains code to create a setup for learning privacy-preserving student models by transferring knowledge from an ensemble of teachers trained diff --git a/differential_privacy/dp_sgd/README.md b/differential_privacy/dp_sgd/README.md index 887a13e8fbb61633ab6f869c60dc65ec2bcbf6bb..6c0846748b3516a12ccc126ef1bea843b6635914 100644 --- a/differential_privacy/dp_sgd/README.md +++ b/differential_privacy/dp_sgd/README.md @@ -8,14 +8,14 @@ Open Sourced By: Xin Pan (xpan@google.com, github: panyx0718) -Machine learning techniques based on neural networks are achieving remarkable -results in a wide variety of domains. Often, the training of models requires -large, representative datasets, which may be crowdsourced and contain sensitive -information. The models should not expose private information in these datasets. -Addressing this goal, we develop new algorithmic techniques for learning and a -refined analysis of privacy costs within the framework of differential privacy. -Our implementation and experiments demonstrate that we can train deep neural -networks with non-convex objectives, under a modest privacy budget, and at a +Machine learning techniques based on neural networks are achieving remarkable +results in a wide variety of domains. Often, the training of models requires +large, representative datasets, which may be crowdsourced and contain sensitive +information. The models should not expose private information in these datasets. +Addressing this goal, we develop new algorithmic techniques for learning and a +refined analysis of privacy costs within the framework of differential privacy. +Our implementation and experiments demonstrate that we can train deep neural +networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality. paper: https://arxiv.org/abs/1607.00133 @@ -46,7 +46,7 @@ https://github.com/panyx0718/models/tree/master/slim # Download the data to the data/ directory. # List the codes. -ls -R differential_privacy/ +$ ls -R differential_privacy/ differential_privacy/: dp_sgd __init__.py privacy_accountant README.md @@ -72,16 +72,16 @@ differential_privacy/privacy_accountant/tf: accountant.py accountant_test.py BUILD # List the data. -ls -R data/ +$ ls -R data/ ./data: mnist_test.tfrecord mnist_train.tfrecord # Build the codes. -bazel build -c opt differential_privacy/... +$ bazel build -c opt differential_privacy/... # Run the mnist differntial privacy training codes. -bazel-bin/differential_privacy/dp_sgd/dp_mnist/dp_mnist \ +$ bazel-bin/differential_privacy/dp_sgd/dp_mnist/dp_mnist \ --training_data_path=data/mnist_train.tfrecord \ --eval_data_path=data/mnist_test.tfrecord \ --save_path=/tmp/mnist_dir @@ -102,6 +102,6 @@ train_accuracy: 0.53 eval_accuracy: 0.53 ... 
-ls /tmp/mnist_dir/ +$ ls /tmp/mnist_dir/ checkpoint ckpt ckpt.meta results-0.json ``` diff --git a/differential_privacy/multiple_teachers/analysis.py b/differential_privacy/multiple_teachers/analysis.py index 1fe6df27c38ea7a546a0a0f90cdb7ce6ff7ad864..44647cdfaa10fc2d23ee7d249a2be9a6d07fefdd 100644 --- a/differential_privacy/multiple_teachers/analysis.py +++ b/differential_privacy/multiple_teachers/analysis.py @@ -216,10 +216,10 @@ def main(unused_argv): # If we are reproducing results from paper https://arxiv.org/abs/1610.05755, # download the required binaries with label information. ################################################################## - + # Binaries for MNIST results paper_binaries_mnist = \ - ["https://github.com/npapernot/multiple-teachers-for-privacy/blob/master/mnist_250_teachers_labels.npy?raw=true", + ["https://github.com/npapernot/multiple-teachers-for-privacy/blob/master/mnist_250_teachers_labels.npy?raw=true", "https://github.com/npapernot/multiple-teachers-for-privacy/blob/master/mnist_250_teachers_100_indices_used_by_student.npy?raw=true"] if FLAGS.counts_file == "mnist_250_teachers_labels.npy" \ or FLAGS.indices_file == "mnist_250_teachers_100_indices_used_by_student.npy": @@ -254,7 +254,7 @@ def main(unused_argv): total_log_mgf_nm = np.array([0.0 for _ in l_list]) total_ss_nm = np.array([0.0 for _ in l_list]) noise_eps = FLAGS.noise_eps - + for i in indices: total_log_mgf_nm += np.array( [logmgf_from_counts(counts_mat[i], noise_eps, l) diff --git a/differential_privacy/multiple_teachers/deep_cnn.py b/differential_privacy/multiple_teachers/deep_cnn.py index d502c9926b0459c95bdec503f1ab483cadd38559..cc34d0a2f3ea7907a439faf178b1bb04467821dd 100644 --- a/differential_privacy/multiple_teachers/deep_cnn.py +++ b/differential_privacy/multiple_teachers/deep_cnn.py @@ -95,9 +95,9 @@ def inference(images, dropout=False): # conv1 with tf.variable_scope('conv1') as scope: - kernel = _variable_with_weight_decay('weights', + kernel = _variable_with_weight_decay('weights', shape=first_conv_shape, - stddev=1e-4, + stddev=1e-4, wd=0.0) conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME') biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0)) @@ -108,25 +108,25 @@ def inference(images, dropout=False): # pool1 - pool1 = tf.nn.max_pool(conv1, - ksize=[1, 3, 3, 1], + pool1 = tf.nn.max_pool(conv1, + ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], - padding='SAME', + padding='SAME', name='pool1') - + # norm1 - norm1 = tf.nn.lrn(pool1, - 4, - bias=1.0, - alpha=0.001 / 9.0, + norm1 = tf.nn.lrn(pool1, + 4, + bias=1.0, + alpha=0.001 / 9.0, beta=0.75, name='norm1') # conv2 with tf.variable_scope('conv2') as scope: - kernel = _variable_with_weight_decay('weights', + kernel = _variable_with_weight_decay('weights', shape=[5, 5, 64, 128], - stddev=1e-4, + stddev=1e-4, wd=0.0) conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME') biases = _variable_on_cpu('biases', [128], tf.constant_initializer(0.1)) @@ -137,18 +137,18 @@ def inference(images, dropout=False): # norm2 - norm2 = tf.nn.lrn(conv2, - 4, - bias=1.0, - alpha=0.001 / 9.0, + norm2 = tf.nn.lrn(conv2, + 4, + bias=1.0, + alpha=0.001 / 9.0, beta=0.75, name='norm2') - + # pool2 - pool2 = tf.nn.max_pool(norm2, + pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1], - strides=[1, 2, 2, 1], - padding='SAME', + strides=[1, 2, 2, 1], + padding='SAME', name='pool2') # local3 @@ -156,9 +156,9 @@ def inference(images, dropout=False): # Move everything into depth so we can perform a single matrix multiply. 
reshape = tf.reshape(pool2, [FLAGS.batch_size, -1]) dim = reshape.get_shape()[1].value - weights = _variable_with_weight_decay('weights', + weights = _variable_with_weight_decay('weights', shape=[dim, 384], - stddev=0.04, + stddev=0.04, wd=0.004) biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1)) local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name) @@ -167,9 +167,9 @@ def inference(images, dropout=False): # local4 with tf.variable_scope('local4') as scope: - weights = _variable_with_weight_decay('weights', + weights = _variable_with_weight_decay('weights', shape=[384, 192], - stddev=0.04, + stddev=0.04, wd=0.004) biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1)) local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name) @@ -178,11 +178,11 @@ def inference(images, dropout=False): # compute logits with tf.variable_scope('softmax_linear') as scope: - weights = _variable_with_weight_decay('weights', + weights = _variable_with_weight_decay('weights', [192, FLAGS.nb_labels], - stddev=1/192.0, + stddev=1/192.0, wd=0.0) - biases = _variable_on_cpu('biases', + biases = _variable_on_cpu('biases', [FLAGS.nb_labels], tf.constant_initializer(0.0)) logits = tf.add(tf.matmul(local4, weights), biases, name=scope.name) @@ -386,7 +386,7 @@ def train_op_fun(total_loss, global_step): """ # Variables that affect learning rate. nb_ex_per_train_epoch = int(60000 / FLAGS.nb_teachers) - + num_batches_per_epoch = nb_ex_per_train_epoch / FLAGS.batch_size decay_steps = int(num_batches_per_epoch * FLAGS.epochs_per_decay) diff --git a/differential_privacy/multiple_teachers/input.py b/differential_privacy/multiple_teachers/input.py index e57da68782a425660ca020469f520bfbe96a1aca..bc8dec915b2a0f836e501455704016f4b1e4eff1 100644 --- a/differential_privacy/multiple_teachers/input.py +++ b/differential_privacy/multiple_teachers/input.py @@ -47,7 +47,7 @@ def create_dir_if_needed(dest_directory): def maybe_download(file_urls, directory): """ Download a set of files in temporary local folder - :param directory: the directory where to download + :param directory: the directory where to download :return: a tuple of filepaths corresponding to the files given as input """ # Create directory if doesn't exist @@ -73,7 +73,7 @@ def maybe_download(file_urls, directory): result.append(filepath) # Test if file already exists - if not gfile.Exists(filepath): + if not tf.gfile.Exists(filepath): def _progress(count, block_size, total_size): sys.stdout.write('\r>> Downloading %s %.1f%%' % (filename, float(count * block_size) / float(total_size) * 100.0)) @@ -124,7 +124,7 @@ def extract_svhn(local_url): :return: """ - with gfile.Open(local_url, mode='r') as file_obj: + with tf.gfile.Open(local_url, mode='r') as file_obj: # Load MATLAB matrix using scipy IO dict = loadmat(file_obj) diff --git a/differential_privacy/multiple_teachers/train_teachers.py b/differential_privacy/multiple_teachers/train_teachers.py index 16e55b151695d357d21f4c243e32417338cd2447..fdb7634f4d8f29d8292642bf6fe050fcd082854f 100644 --- a/differential_privacy/multiple_teachers/train_teachers.py +++ b/differential_privacy/multiple_teachers/train_teachers.py @@ -64,11 +64,11 @@ def train_teacher(dataset, nb_teachers, teacher_id): else: print("Check value of dataset flag") return False - + # Retrieve subset of data for this teacher - data, labels = input.partition_dataset(train_data, - train_labels, - nb_teachers, + data, labels = input.partition_dataset(train_data, + train_labels, + 
nb_teachers, teacher_id) print("Length of training data: " + str(len(labels))) diff --git a/im2txt/README.md b/im2txt/README.md index 510ee544efb2c6b2a4ee9094d326bc2a2d182e3b..223cf91fba52643e77116b4f6149bbd2bb8ba1c3 100644 --- a/im2txt/README.md +++ b/im2txt/README.md @@ -145,7 +145,8 @@ available space for storing the downloaded and processed data. MSCOCO_DIR="${HOME}/im2txt/data/mscoco" # Build the preprocessing script. -bazel build im2txt/download_and_preprocess_mscoco +cd tensorflow-models/im2txt +bazel build //im2txt:download_and_preprocess_mscoco # Run the preprocessing script. bazel-bin/im2txt/download_and_preprocess_mscoco "${MSCOCO_DIR}" @@ -211,7 +212,8 @@ INCEPTION_CHECKPOINT="${HOME}/im2txt/data/inception_v3.ckpt" MODEL_DIR="${HOME}/im2txt/model" # Build the model. -bazel build -c opt im2txt/... +cd tensorflow-models/im2txt +bazel build -c opt //im2txt/... # Run the training script. bazel-bin/im2txt/train \ @@ -304,7 +306,8 @@ VOCAB_FILE="${HOME}/im2txt/data/mscoco/word_counts.txt" IMAGE_FILE="${HOME}/im2txt/data/mscoco/raw-data/val2014/COCO_val2014_000000224477.jpg" # Build the inference binary. -bazel build -c opt im2txt/run_inference +cd tensorflow-models/im2txt +bazel build -c opt //im2txt:run_inference # Ignore GPU devices (only necessary if your GPU is currently memory # constrained, for example, by running the training script). diff --git a/inception/README.md b/inception/README.md index bbf13eb5a812fb08f09d30a1bc7ae7c092427a9f..f4731213755714b01b49036bb8a745cf354df9dd 100644 --- a/inception/README.md +++ b/inception/README.md @@ -86,7 +86,8 @@ you will not need to interact with the script again. DATA_DIR=$HOME/imagenet-data # build the preprocessing script. -bazel build inception/download_and_preprocess_imagenet +cd tensorflow-models/inception +bazel build //inception:download_and_preprocess_imagenet # run it bazel-bin/inception/download_and_preprocess_imagenet "${DATA_DIR}" @@ -153,7 +154,8 @@ To train this model, you simply need to specify the following: ```shell # Build the model. Note that we need to make sure the TensorFlow is ready to # use before this as this command will not build TensorFlow. -bazel build inception/imagenet_train +cd tensorflow-models/inception +bazel build //inception:imagenet_train # run it bazel-bin/inception/imagenet_train --num_gpus=1 --batch_size=32 --train_dir=/tmp/imagenet_train --data_dir=/tmp/imagenet_data @@ -189,7 +191,8 @@ GPU cards. ```shell # Build the model. Note that we need to make sure the TensorFlow is ready to # use before this as this command will not build TensorFlow. -bazel build inception/imagenet_train +cd tensorflow-models/inception +bazel build //inception:imagenet_train # run it bazel-bin/inception/imagenet_train --num_gpus=2 --batch_size=64 --train_dir=/tmp/imagenet_train @@ -260,7 +263,7 @@ Note that in this example each replica has a single tower that uses one GPU. The command-line flags `worker_hosts` and `ps_hosts` specify available servers. The same binary will be used for both the `worker` jobs and the `ps` jobs. Command line flag `job_name` will be used to specify what role a task will be -playing and `task_id` will be used to idenify which one of the jobs it is +playing and `task_id` will be used to identify which one of the jobs it is running. Several things to note here: * The numbers of `ps` and `worker` tasks are inferred from the lists of hosts @@ -288,7 +291,8 @@ running. Several things to note here: ```shell # Build the model. 
Note that we need to make sure the TensorFlow is ready to # use before this as this command will not build TensorFlow. -bazel build inception/imagenet_distributed_train +cd tensorflow-models/inception +bazel build //inception:imagenet_distributed_train # To start worker 0, go to the worker0 host and run the following (Note that # task_id should be in the range [0, num_worker_tasks): @@ -367,6 +371,13 @@ I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:206] Initialize HostPo I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:202] Started server with target: grpc://localhost:2222 ``` +If you compiled TensorFlow (from v1.1-rc3) with VERBS support and you have the +required device and IB verbs SW stack, you can specify --protocol='grpc+verbs' +In order to use Verbs RDMA for Tensor passing between workers and ps. +Need to add the the --protocol flag in all tasks (ps and workers). +The default protocol is the TensorFlow default protocol of grpc. + + [Congratulations!](https://www.youtube.com/watch?v=9bZkp7q19f0) You are now training Inception in a distributed manner. @@ -388,7 +399,8 @@ Briefly, one can evaluate the model by running: ```shell # Build the model. Note that we need to make sure the TensorFlow is ready to # use before this as this command will not build TensorFlow. -bazel build inception/imagenet_eval +cd tensorflow-models/inception +bazel build //inception:imagenet_eval # run it bazel-bin/inception/imagenet_eval --checkpoint_dir=/tmp/imagenet_train --eval_dir=/tmp/imagenet_eval @@ -443,7 +455,8 @@ but feel free to edit accordingly. FLOWERS_DATA_DIR=/tmp/flowers-data/ # build the preprocessing script. -bazel build inception/download_and_preprocess_flowers +cd tensorflow-models/inception +bazel build //inception:download_and_preprocess_flowers # run it bazel-bin/inception/download_and_preprocess_flowers "${FLOWERS_DATA_DIR}" @@ -523,7 +536,8 @@ the flowers data set with the following command. ```shell # Build the model. Note that we need to make sure the TensorFlow is ready to # use before this as this command will not build TensorFlow. -bazel build inception/flowers_train +cd tensorflow-models/inception +bazel build //inception:flowers_train # Path to the downloaded Inception-v3 model. MODEL_PATH="${INCEPTION_MODEL_DIR}/inception-v3/model.ckpt-157585" @@ -559,7 +573,8 @@ fine-tuned model, you will need to run `flowers_eval`: ```shell # Build the model. Note that we need to make sure the TensorFlow is ready to # use before this as this command will not build TensorFlow. -bazel build inception/flowers_eval +cd tensorflow-models/inception +bazel build //inception:flowers_eval # Directory where we saved the fine-tuned checkpoint and events files. TRAIN_DIR=/tmp/flowers_train/ @@ -647,7 +662,8 @@ To run `build_image_data.py`, you can run the following command line: OUTPUT_DIRECTORY=$HOME/my-custom-data/ # build the preprocessing script. -bazel build inception/build_image_data +cd tensorflow-models/inception +bazel build //inception:build_image_data # convert the data. bazel-bin/inception/build_image_data \ @@ -749,7 +765,7 @@ batch-splitting the model across multiple GPUs. permit training the model with higher learning rates. * Often the GPU memory is a bottleneck that prevents employing larger batch - sizes. Employing more GPUs allows one to user larger batch sizes because + sizes. Employing more GPUs allows one to use larger batch sizes because this model splits the batch across the GPUs. 
**NOTE** If one wishes to train this model with *asynchronous* gradient updates, diff --git a/inception/inception/data/download_and_preprocess_flowers.sh b/inception/inception/data/download_and_preprocess_flowers.sh index 1c1f9cd21d63dfc0d84a0d4071beac05dc7d8ce3..ab8d451c34117eac52362942155b39b46e7f93e3 100755 --- a/inception/inception/data/download_and_preprocess_flowers.sh +++ b/inception/inception/data/download_and_preprocess_flowers.sh @@ -44,16 +44,16 @@ DATA_DIR="${1%/}" SCRATCH_DIR="${DATA_DIR}/raw-data" mkdir -p "${DATA_DIR}" mkdir -p "${SCRATCH_DIR}" -# http://stackoverflow.com/questions/59895/getting-the-source-directory-of-a-bash-script-from-within -WORK_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" +WORK_DIR="$0.runfiles/inception/inception" # Download the flowers data. DATA_URL="http://download.tensorflow.org/example_images/flower_photos.tgz" CURRENT_DIR=$(pwd) +cd "${DATA_DIR}" TARBALL="flower_photos.tgz" if [ ! -f ${TARBALL} ]; then echo "Downloading flower data set." - curl -o ${DATA_DIR}/${TARBALL} "${DATA_URL}" + curl -o ${TARBALL} "${DATA_URL}" else echo "Skipping download of flower data." fi @@ -64,9 +64,8 @@ VALIDATION_DIRECTORY="${SCRATCH_DIR}/validation" # Expands the data into the flower_photos/ directory and rename it as the # train directory. -tar xf ${DATA_DIR}/flower_photos.tgz +tar xf flower_photos.tgz rm -rf "${TRAIN_DIRECTORY}" "${VALIDATION_DIRECTORY}" -mkdir -p "${TRAIN_DIRECTORY}" mv flower_photos "${TRAIN_DIRECTORY}" # Generate a list of 5 labels: daisy, dandelion, roses, sunflowers, tulips @@ -88,9 +87,9 @@ done < "${LABELS_FILE}" # Build the TFRecords version of the image data. cd "${CURRENT_DIR}" -BUILD_SCRIPT="${WORK_DIR}/build_image_data.py" +BUILD_SCRIPT="${WORK_DIR}/build_image_data" OUTPUT_DIRECTORY="${DATA_DIR}" -python "${BUILD_SCRIPT}" \ +"${BUILD_SCRIPT}" \ --train_directory="${TRAIN_DIRECTORY}" \ --validation_directory="${VALIDATION_DIRECTORY}" \ --output_directory="${OUTPUT_DIRECTORY}" \ diff --git a/inception/inception/data/download_imagenet.sh b/inception/inception/data/download_imagenet.sh index 576c99a2b0d9a1891221e4f07c9af24696f6e5c7..49b3b7d5609d92392420b015b5509077dc560e8d 100755 --- a/inception/inception/data/download_imagenet.sh +++ b/inception/inception/data/download_imagenet.sh @@ -40,7 +40,6 @@ fi OUTDIR="${1:-./imagenet-data}" SYNSETS_FILE="${2:-./synsets.txt}" -SYNSETS_FILE="${PWD}/${SYNSETS_FILE}" echo "Saving downloaded files to $OUTDIR" mkdir -p "${OUTDIR}" diff --git a/inception/inception/data/preprocess_imagenet_validation_data.py b/inception/inception/data/preprocess_imagenet_validation_data.py old mode 100644 new mode 100755 index 8308277a079c148eb45410dd2495926880c283d8..ae1576fff38f6d218959aaaa35faae30d7139d9d --- a/inception/inception/data/preprocess_imagenet_validation_data.py +++ b/inception/inception/data/preprocess_imagenet_validation_data.py @@ -76,7 +76,7 @@ if __name__ == '__main__': basename = 'ILSVRC2012_val_000%.5d.JPEG' % (i + 1) original_filename = os.path.join(data_dir, basename) if not os.path.exists(original_filename): - print('Failed to find: ' % original_filename) + print('Failed to find: %s' % original_filename) sys.exit(-1) new_filename = os.path.join(data_dir, labels[i], basename) os.rename(original_filename, new_filename) diff --git a/inception/inception/data/process_bounding_boxes.py b/inception/inception/data/process_bounding_boxes.py old mode 100644 new mode 100755 diff --git a/inception/inception/imagenet_distributed_train.py b/inception/inception/imagenet_distributed_train.py index 
1c3ee3ab8eb676d6083f1638cf4a2fa7730a9183..f3615e012f042649b52e37aeaeeb2c3efc07f92c 100644 --- a/inception/inception/imagenet_distributed_train.py +++ b/inception/inception/imagenet_distributed_train.py @@ -45,7 +45,8 @@ def main(unused_args): {'ps': ps_hosts, 'worker': worker_hosts}, job_name=FLAGS.job_name, - task_index=FLAGS.task_id) + task_index=FLAGS.task_id, + protocol=FLAGS.protocol) if FLAGS.job_name == 'ps': # `ps` jobs wait for incoming connections from the workers. diff --git a/inception/inception/imagenet_eval.py b/inception/inception/imagenet_eval.py index 5444f192786822695f3caaf219d4a72bb6e874df..e6f8bac2ee71021914715172296d63dd56b5a6f9 100644 --- a/inception/inception/imagenet_eval.py +++ b/inception/inception/imagenet_eval.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== -"""A binary to evaluate Inception on the flowers data set. +"""A binary to evaluate Inception on the ImageNet data set. Note that using the supplied pre-trained inception checkpoint, the eval should achieve: diff --git a/inception/inception/inception_distributed_train.py b/inception/inception/inception_distributed_train.py index 67078585b2ea5f350d412a4c1c52e9716eae4dec..c1a589acb5fe386fd648ae3fae926ee927c0ca79 100644 --- a/inception/inception/inception_distributed_train.py +++ b/inception/inception/inception_distributed_train.py @@ -42,6 +42,9 @@ tf.app.flags.DEFINE_string('worker_hosts', '', """Comma-separated list of hostname:port for the """ """worker jobs. e.g. """ """'machine1:2222,machine2:1111,machine2:2222'""") +tf.app.flags.DEFINE_string('protocol', 'grpc', + """Communication protocol to use in distributed """ + """execution (default grpc) """) tf.app.flags.DEFINE_string('train_dir', '/tmp/imagenet_train', """Directory where to write event logs """ diff --git a/inception/inception/inception_train.py b/inception/inception/inception_train.py index 32c959df8ae71f7008ffe57e255d671eda223b48..e1c32713b2012aec8a18637ec5dd79a1cc84d90f 100644 --- a/inception/inception/inception_train.py +++ b/inception/inception/inception_train.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== -"""A library to train Inception using multiple GPU's with synchronous updates. +"""A library to train Inception using multiple GPUs with synchronous updates. """ from __future__ import absolute_import from __future__ import division @@ -83,7 +83,7 @@ def _tower_loss(images, labels, num_classes, scope, reuse_variables=None): """Calculate the total loss on a single tower running the ImageNet model. We perform 'batch splitting'. This means that we cut up a batch across - multiple GPU's. For instance, if the batch size = 32 and num_gpus = 2, + multiple GPUs. For instance, if the batch size = 32 and num_gpus = 2, then each tower will operate on an batch of 16 images. 
Args: diff --git a/inception/inception/slim/ops_test.py b/inception/inception/slim/ops_test.py index 0978e0ef3783ed50e618cb70504a4619d127b2c9..13dc5d9aacf6e283540a406d419a67d2d7215161 100644 --- a/inception/inception/slim/ops_test.py +++ b/inception/inception/slim/ops_test.py @@ -21,8 +21,6 @@ from __future__ import print_function import numpy as np import tensorflow as tf -from tensorflow.python.ops import control_flow_ops - from inception.slim import ops from inception.slim import scopes from inception.slim import variables @@ -420,7 +418,7 @@ class DropoutTest(tf.test.TestCase): with self.test_session(): images = tf.random_uniform((5, height, width, 3), seed=1) output = ops.dropout(images) - self.assertEquals(output.op.name, 'Dropout/dropout/mul_1') + self.assertEquals(output.op.name, 'Dropout/dropout/mul') output.get_shape().assert_is_compatible_with(images.get_shape()) def testCreateDropoutNoTraining(self): @@ -601,8 +599,7 @@ class BatchNormTest(tf.test.TestCase): output = ops.batch_norm(images, decay=0.1) update_ops = tf.get_collection(ops.UPDATE_OPS_COLLECTION) with tf.control_dependencies(update_ops): - barrier = tf.no_op(name='gradient_barrier') - output = control_flow_ops.with_dependencies([barrier], output) + output = tf.identity(output) # Initialize all variables sess.run(tf.global_variables_initializer()) moving_mean = variables.get_variables('BatchNorm/moving_mean')[0] @@ -631,8 +628,7 @@ class BatchNormTest(tf.test.TestCase): output = ops.batch_norm(images, decay=0.1, is_training=False) update_ops = tf.get_collection(ops.UPDATE_OPS_COLLECTION) with tf.control_dependencies(update_ops): - barrier = tf.no_op(name='gradient_barrier') - output = control_flow_ops.with_dependencies([barrier], output) + output = tf.identity(output) # Initialize all variables sess.run(tf.global_variables_initializer()) moving_mean = variables.get_variables('BatchNorm/moving_mean')[0] @@ -665,8 +661,7 @@ class BatchNormTest(tf.test.TestCase): output = ops.batch_norm(images, decay=0.1, is_training=False) update_ops = tf.get_collection(ops.UPDATE_OPS_COLLECTION) with tf.control_dependencies(update_ops): - barrier = tf.no_op(name='gradient_barrier') - output = control_flow_ops.with_dependencies([barrier], output) + output = tf.identity(output) # Initialize all variables sess.run(tf.global_variables_initializer()) moving_mean = variables.get_variables('BatchNorm/moving_mean')[0] diff --git a/lfads/README.md b/lfads/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0dacb79db819698b67041fb8cb9c6608e3c70645 --- /dev/null +++ b/lfads/README.md @@ -0,0 +1,196 @@ +# LFADS - Latent Factor Analysis via Dynamical Systems + +This code implements the model from the paper "[LFADS - Latent Factor Analysis via Dynamical Systems](http://biorxiv.org/content/early/2017/06/20/152884)". It is a sequential variational auto-encoder designed specifically for investigating neuroscience data, but can be applied widely to any time series data. In an unsupervised setting, LFADS is able to decompose time series data into various factors, such as an initial condition, a generative dynamical system, control inputs to that generator, and a low dimensional description of the observed data, called the factors. Additionally, the observation model is a loss on a probability distribution, so when LFADS processes a dataset, a denoised version of the dataset is also created. 
For example, if the dataset is raw spike counts, then under the negative log-likelihood loss of a Poisson distribution, the denoised data would be the inferred Poisson rates.
+
+
+## Prerequisites
+
+The code is written in Python 2.7.6. You will also need:
+
+* **TensorFlow** version 1.1 ([install](http://tflearn.org/installation/)) -
+  there is an incompatibility with LFADS and TF v1.2, which we are in the
+  process of resolving
+* **NumPy, SciPy, Matplotlib** ([install SciPy stack](https://www.scipy.org/install.html), contains all of them)
+* **h5py** ([install](https://pypi.python.org/pypi/h5py))
+
+
+## Getting started
+
+Before starting, run the following:
+
+```sh
+$ export PYTHONPATH=$PYTHONPATH:/path/to/your/directory/lfads/
+```
+ +where "path/to/your/directory" is replaced with the path to the LFADS repository (you can get this path by using the `pwd` command). This allows the nested directories to access modules from their parent directory. + +## Generate synthetic data + +In order to generate the synthetic datasets first, from the top-level lfads directory, run: + +```sh +$ cd synth_data +$ ./run_generate_synth_data.sh +$ cd .. +``` + +These synthetic datasets are provided 1. to gain insight into how the LFADS algorithm operates, and 2. to give reasonable starting points for analyses you might be interested for your own data. + +## Train an LFADS model + +Now that we have our example datasets, we can train some models! To spin up an LFADS model on the synthetic data, run any of the following commands. For the examples that are in the paper, the important hyperparameters are roughly replicated. Most hyperparameters are insensitive to small changes or won't ever be changed unless you want a very fine level of control. In the first example, all hyperparameter flags are enumerated for easy copy-pasting, but for the rest of the examples only the most important flags (~the first 8) are specified for brevity. For a full list of flags, their descriptions, and their default values, refer to the top of `run_lfads.py`. Please see Table 1 in the Online Methods of the associated paper for definitions of the most important hyperparameters. + +```sh +# Run LFADS on chaotic rnn data with no input pulses (g = 1.5) +$ python run_lfads.py --kind=train \ +--data_dir=/tmp/rnn_synth_data_v1.0/ \ +--data_filename_stem=chaotic_rnn_no_inputs \ +--lfads_save_dir=/tmp/lfads_chaotic_rnn_no_inputs \ +--co_dim=0 \ +--factors_dim=20 \ +--ext_input_dim=0 \ +--controller_input_lag=1 \ +--output_dist=poisson \ +--do_causal_controller=false \ +--batch_size=128 \ +--learning_rate_init=0.01 \ +--learning_rate_stop=1e-05 \ +--learning_rate_decay_factor=0.95 \ +--learning_rate_n_to_compare=6 \ +--do_reset_learning_rate=false \ +--keep_prob=0.95 \ +--con_dim=128 \ +--gen_dim=200 \ +--ci_enc_dim=128 \ +--ic_dim=64 \ +--ic_enc_dim=128 \ +--ic_prior_var_min=0.1 \ +--gen_cell_input_weight_scale=1.0 \ +--cell_weight_scale=1.0 \ +--do_feed_factors_to_controller=true \ +--kl_start_step=0 \ +--kl_increase_steps=2000 \ +--kl_ic_weight=1.0 \ +--l2_con_scale=0.0 \ +--l2_gen_scale=2000.0 \ +--l2_start_step=0 \ +--l2_increase_steps=2000 \ +--ic_prior_var_scale=0.1 \ +--ic_post_var_min=0.0001 \ +--kl_co_weight=1.0 \ +--prior_ar_nvar=0.1 \ +--cell_clip_value=5.0 \ +--max_ckpt_to_keep_lve=5 \ +--do_train_prior_ar_atau=true \ +--co_prior_var_scale=0.1 \ +--csv_log=fitlog \ +--feedback_factors_or_rates=factors \ +--do_train_prior_ar_nvar=true \ +--max_grad_norm=200.0 \ +--device=gpu:0 \ +--num_steps_for_gen_ic=100000000 \ +--ps_nexamples_to_process=100000000 \ +--checkpoint_name=lfads_vae \ +--temporal_spike_jitter_width=0 \ +--checkpoint_pb_load_name=checkpoint \ +--inject_ext_input_to_gen=false \ +--co_mean_corr_scale=0.0 \ +--gen_cell_rec_weight_scale=1.0 \ +--max_ckpt_to_keep=5 \ +--output_filename_stem="" \ +--ic_prior_var_max=0.1 \ +--prior_ar_atau=10.0 \ +--do_train_io_only=false + +# Run LFADS on chaotic rnn data with input pulses (g = 2.5) +$ python run_lfads.py --kind=train \ +--data_dir=/tmp/rnn_synth_data_v1.0/ \ +--data_filename_stem=chaotic_rnn_inputs_g2p5 \ +--lfads_save_dir=/tmp/lfads_chaotic_rnn_inputs_g2p5 \ +--co_dim=1 \ +--factors_dim=20 + +# Run LFADS on multi-session RNN data +$ python run_lfads.py --kind=train \ 
+--data_dir=/tmp/rnn_synth_data_v1.0/ \ +--data_filename_stem=chaotic_rnn_multisession \ +--lfads_save_dir=/tmp/lfads_chaotic_rnn_multisession \ +--factors_dim=10 + +# Run LFADS on integration to bound model data +$ python run_lfads.py --kind=train \ +--data_dir=/tmp/rnn_synth_data_v1.0/ \ +--data_filename_stem=itb_rnn \ +--lfads_save_dir=/tmp/lfads_itb_rnn \ +--co_dim=1 \ +--factors_dim=20 \ +--controller_input_lag=0 + +# Run LFADS on chaotic RNN data with labels +$ python run_lfads.py --kind=train \ +--data_dir=/tmp/rnn_synth_data_v1.0/ \ +--data_filename_stem=chaotic_rnns_labeled \ +--lfads_save_dir=/tmp/lfads_chaotic_rnns_labeled \ +--co_dim=0 \ +--factors_dim=20 \ +--controller_input_lag=0 \ +--ext_input_dim=1 + +``` + +**Tip**: If you are running LFADS on GPU and would like to run more than one model concurrently, set the `--allow_gpu_growth=True` flag on each job, otherwise one model will take up the entire GPU for performance purposes. Also, one needs to install the TensorFlow libraries with GPU support. + + +## Visualize a training model + +To visualize training curves and various other metrics while training and LFADS model, run the following command on your model directory. To launch a tensorboard on the chaotic RNN data with input pulses, for example: + +```sh +tensorboard --logdir=/tmp/lfads_chaotic_rnn_inputs_g2p5 +``` + +## Evaluate a trained model + +Once your model is finished training, there are multiple ways you can evaluate +it. Below are some sample commands to evaluate an LFADS model trained on the +chaotic rnn data with input pulses (g = 2.5). The key differences here are +setting the `--kind` flag to the appropriate mode, as well as the +`--checkpoint_pb_load_name` flag to `checkpoint_lve` and the `--batch_size` flag +(if you'd like to make it larger or smaller). All other flags should be the +same as used in training, so that the same model architecture is built. + +```sh +# Take samples from posterior then average (denoising operation) +$ python run_lfads.py --kind=posterior_sample_and_average \ +--data_dir=/tmp/rnn_synth_data_v1.0/ \ +--data_filename_stem=chaotic_rnn_inputs_g2p5 \ +--lfads_save_dir=/tmp/lfads_chaotic_rnn_inputs_g2p5 \ +--co_dim=1 \ +--factors_dim=20 \ +--batch_size=1024 \ +--checkpoint_pb_load_name=checkpoint_lve + +# Sample from prior (generation of completely new samples) +$ python run_lfads.py --kind=prior_sample \ +--data_dir=/tmp/rnn_synth_data_v1.0/ \ +--data_filename_stem=chaotic_rnn_inputs_g2p5 \ +--lfads_save_dir=/tmp/lfads_chaotic_rnn_inputs_g2p5 \ +--co_dim=1 \ +--factors_dim=20 \ +--batch_size=50 \ +--checkpoint_pb_load_name=checkpoint_lve + +# Write down model parameters +$ python run_lfads.py --kind=write_model_params \ +--data_dir=/tmp/rnn_synth_data_v1.0/ \ +--data_filename_stem=chaotic_rnn_inputs_g2p5 \ +--lfads_save_dir=/tmp/lfads_chaotic_rnn_inputs_g2p5 \ +--co_dim=1 \ +--factors_dim=20 \ +--checkpoint_pb_load_name=checkpoint_lve +``` + +## Contact + +File any issues with the [issue tracker](https://github.com/tensorflow/models/issues). For any questions or problems, this code is maintained by [@sussillo](https://github.com/sussillo) and [@jazcollins](https://github.com/jazcollins). + diff --git a/lfads/distributions.py b/lfads/distributions.py new file mode 100644 index 0000000000000000000000000000000000000000..56f14cfe3b6351c24f1d1d7a0c6bcbb6a76b01c8 --- /dev/null +++ b/lfads/distributions.py @@ -0,0 +1,493 @@ +# Copyright 2017 Google Inc. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# ============================================================================== +import numpy as np +import tensorflow as tf +from utils import linear, log_sum_exp + +class Poisson(object): + """Poisson distributon + + Computes the log probability under the model. + + """ + def __init__(self, log_rates): + """ Create Poisson distributions with log_rates parameters. + + Args: + log_rates: a tensor-like list of log rates underlying the Poisson dist. + """ + self.logr = log_rates + + def logp(self, bin_counts): + """Compute the log probability for the counts in the bin, under the model. + + Args: + bin_counts: array-like integer counts + + Returns: + The log-probability under the Poisson models for each element of + bin_counts. + """ + k = tf.to_float(bin_counts) + # log poisson(k, r) = log(r^k * e^(-r) / k!) = k log(r) - r - log k! + # log poisson(k, r=exp(x)) = k * x - exp(x) - lgamma(k + 1) + return k * self.logr - tf.exp(self.logr) - tf.lgamma(k + 1) + + +def diag_gaussian_log_likelihood(z, mu=0.0, logvar=0.0): + """Log-likelihood under a Gaussian distribution with diagonal covariance. + Returns the log-likelihood for each dimension. One should sum the + results for the log-likelihood under the full multidimensional model. + + Args: + z: The value to compute the log-likelihood. + mu: The mean of the Gaussian + logvar: The log variance of the Gaussian. + + Returns: + The log-likelihood under the Gaussian model. + """ + + return -0.5 * (logvar + np.log(2*np.pi) + \ + tf.square((z-mu)/tf.exp(0.5*logvar))) + + +def gaussian_pos_log_likelihood(unused_mean, logvar, noise): + """Gaussian log-likelihood function for a posterior in VAE + + Note: This function is specialized for a posterior distribution, that has the + form of z = mean + sigma * noise. + + Args: + unused_mean: ignore + logvar: The log variance of the distribution + noise: The noise used in the sampling of the posterior. + + Returns: + The log-likelihood under the Gaussian model. + """ + # ln N(z; mean, sigma) = - ln(sigma) - 0.5 ln 2pi - noise^2 / 2 + return - 0.5 * (logvar + np.log(2 * np.pi) + tf.square(noise)) + + +class Gaussian(object): + """Base class for Gaussian distribution classes.""" + pass + + +class DiagonalGaussian(Gaussian): + """Diagonal Gaussian with different constant mean and variances in each + dimension. + """ + + def __init__(self, batch_size, z_size, mean, logvar): + """Create a diagonal gaussian distribution. + + Args: + batch_size: The size of the batch, i.e. 0th dim in 2D tensor of samples. + z_size: The dimension of the distribution, i.e. 1st dim in 2D tensor. + mean: The N-D mean of the distribution. + logvar: The N-D log variance of the diagonal distribution. 
+ """ + size__xz = [None, z_size] + self.mean = mean # bxn already + self.logvar = logvar # bxn already + self.noise = noise = tf.random_normal(tf.shape(logvar)) + self.sample = mean + tf.exp(0.5 * logvar) * noise + mean.set_shape(size__xz) + logvar.set_shape(size__xz) + self.sample.set_shape(size__xz) + + def logp(self, z=None): + """Compute the log-likelihood under the distribution. + + Args: + z (optional): value to compute likelihood for, if None, use sample. + + Returns: + The likelihood of z under the model. + """ + if z is None: + z = self.sample + + # This is needed to make sure that the gradients are simple. + # The value of the function shouldn't change. + if z == self.sample: + return gaussian_pos_log_likelihood(self.mean, self.logvar, self.noise) + + return diag_gaussian_log_likelihood(z, self.mean, self.logvar) + + +class LearnableDiagonalGaussian(Gaussian): + """Diagonal Gaussian whose mean and variance are learned parameters.""" + + def __init__(self, batch_size, z_size, name, mean_init=0.0, + var_init=1.0, var_min=0.0, var_max=1000000.0): + """Create a learnable diagonal gaussian distribution. + + Args: + batch_size: The size of the batch, i.e. 0th dim in 2D tensor of samples. + z_size: The dimension of the distribution, i.e. 1st dim in 2D tensor. + name: prefix name for the mean and log TF variables. + mean_init (optional): The N-D mean initialization of the distribution. + var_init (optional): The N-D variance initialization of the diagonal + distribution. + var_min (optional): The minimum value the learned variance can take in any + dimension. + var_max (optional): The maximum value the learned variance can take in any + dimension. + """ + + size_1xn = [1, z_size] + size__xn = [None, z_size] + size_bx1 = tf.stack([batch_size, 1]) + assert var_init > 0.0, "Problems" + assert var_max >= var_min, "Problems" + assert var_init >= var_min, "Problems" + assert var_max >= var_init, "Problems" + + + z_mean_1xn = tf.get_variable(name=name+"/mean", shape=size_1xn, + initializer=tf.constant_initializer(mean_init)) + self.mean_bxn = mean_bxn = tf.tile(z_mean_1xn, size_bx1) + mean_bxn.set_shape(size__xn) # tile loses shape + + log_var_init = np.log(var_init) + if var_max > var_min: + var_is_trainable = True + else: + var_is_trainable = False + + z_logvar_1xn = \ + tf.get_variable(name=(name+"/logvar"), shape=size_1xn, + initializer=tf.constant_initializer(log_var_init), + trainable=var_is_trainable) + + if var_is_trainable: + z_logit_var_1xn = tf.exp(z_logvar_1xn) + z_var_1xn = tf.nn.sigmoid(z_logit_var_1xn)*(var_max-var_min) + var_min + z_logvar_1xn = tf.log(z_var_1xn) + + logvar_bxn = tf.tile(z_logvar_1xn, size_bx1) + self.logvar_bxn = logvar_bxn + self.noise_bxn = noise_bxn = tf.random_normal(tf.shape(logvar_bxn)) + self.sample_bxn = mean_bxn + tf.exp(0.5 * logvar_bxn) * noise_bxn + + def logp(self, z=None): + """Compute the log-likelihood under the distribution. + + Args: + z (optional): value to compute likelihood for, if None, use sample. + + Returns: + The likelihood of z under the model. + """ + if z is None: + z = self.sample + + # This is needed to make sure that the gradients are simple. + # The value of the function shouldn't change. 
+ if z == self.sample_bxn: + return gaussian_pos_log_likelihood(self.mean_bxn, self.logvar_bxn, + self.noise_bxn) + + return diag_gaussian_log_likelihood(z, self.mean_bxn, self.logvar_bxn) + + @property + def mean(self): + return self.mean_bxn + + @property + def logvar(self): + return self.logvar_bxn + + @property + def sample(self): + return self.sample_bxn + + +class DiagonalGaussianFromInput(Gaussian): + """Diagonal Gaussian whose mean and variance are conditioned on other + variables. + + Note: the parameters to convert from input to the learned mean and log + variance are held in this class. + """ + + def __init__(self, x_bxu, z_size, name, var_min=0.0): + """Create an input dependent diagonal Gaussian distribution. + + Args: + x: The input tensor from which the mean and variance are computed, + via a linear transformation of x. I.e. + mu = Wx + b, log(var) = Mx + c + z_size: The size of the distribution. + name: The name to prefix to learned variables. + var_min (optional): Minimal variance allowed. This is an additional + way to control the amount of information getting through the stochastic + layer. + """ + size_bxn = tf.stack([tf.shape(x_bxu)[0], z_size]) + self.mean_bxn = mean_bxn = linear(x_bxu, z_size, name=(name+"/mean")) + logvar_bxn = linear(x_bxu, z_size, name=(name+"/logvar")) + if var_min > 0.0: + logvar_bxn = tf.log(tf.exp(logvar_bxn) + var_min) + self.logvar_bxn = logvar_bxn + + self.noise_bxn = noise_bxn = tf.random_normal(size_bxn) + self.noise_bxn.set_shape([None, z_size]) + self.sample_bxn = mean_bxn + tf.exp(0.5 * logvar_bxn) * noise_bxn + + def logp(self, z=None): + """Compute the log-likelihood under the distribution. + + Args: + z (optional): value to compute likelihood for, if None, use sample. + + Returns: + The likelihood of z under the model. + """ + + if z is None: + z = self.sample + + # This is needed to make sure that the gradients are simple. + # The value of the function shouldn't change. + if z == self.sample_bxn: + return gaussian_pos_log_likelihood(self.mean_bxn, + self.logvar_bxn, self.noise_bxn) + + return diag_gaussian_log_likelihood(z, self.mean_bxn, self.logvar_bxn) + + @property + def mean(self): + return self.mean_bxn + + @property + def logvar(self): + return self.logvar_bxn + + @property + def sample(self): + return self.sample_bxn + + +class GaussianProcess: + """Base class for Gaussian processes.""" + pass + + +class LearnableAutoRegressive1Prior(GaussianProcess): + """AR(1) model where autocorrelation and process variance are learned + parameters. Assumed zero mean. + + """ + + def __init__(self, batch_size, z_size, + autocorrelation_taus, noise_variances, + do_train_prior_ar_atau, do_train_prior_ar_nvar, + num_steps, name): + """Create a learnable autoregressive (1) process. + + Args: + batch_size: The size of the batch, i.e. 0th dim in 2D tensor of samples. + z_size: The dimension of the distribution, i.e. 1st dim in 2D tensor. + autocorrelation_taus: The auto correlation time constant of the AR(1) + process. + A value of 0 is uncorrelated gaussian noise. + noise_variances: The variance of the additive noise, *not* the process + variance. + do_train_prior_ar_atau: Train or leave as constant, the autocorrelation? + do_train_prior_ar_nvar: Train or leave as constant, the noise variance? + num_steps: Number of steps to run the process. + name: The name to prefix to learned TF variables. + """ + + # Note the use of the plural in all of these quantities. 
This is intended + # to mark that even though a sample z_t from the posterior is thought of a + # single sample of a multidimensional gaussian, the prior is actually + # thought of as U AR(1) processes, where U is the dimension of the inferred + # input. + size_bx1 = tf.stack([batch_size, 1]) + size__xu = [None, z_size] + # process variance, the variance at time t over all instantiations of AR(1) + # with these parameters. + log_evar_inits_1xu = tf.expand_dims(tf.log(noise_variances), 0) + self.logevars_1xu = logevars_1xu = \ + tf.Variable(log_evar_inits_1xu, name=name+"/logevars", dtype=tf.float32, + trainable=do_train_prior_ar_nvar) + self.logevars_bxu = logevars_bxu = tf.tile(logevars_1xu, size_bx1) + logevars_bxu.set_shape(size__xu) # tile loses shape + + # \tau, which is the autocorrelation time constant of the AR(1) process + log_atau_inits_1xu = tf.expand_dims(tf.log(autocorrelation_taus), 0) + self.logataus_1xu = logataus_1xu = \ + tf.Variable(log_atau_inits_1xu, name=name+"/logatau", dtype=tf.float32, + trainable=do_train_prior_ar_atau) + + # phi in x_t = \mu + phi x_tm1 + \eps + # phi = exp(-1/tau) + # phi = exp(-1/exp(logtau)) + # phi = exp(-exp(-logtau)) + phis_1xu = tf.exp(-tf.exp(-logataus_1xu)) + self.phis_bxu = phis_bxu = tf.tile(phis_1xu, size_bx1) + phis_bxu.set_shape(size__xu) + + # process noise + # pvar = evar / (1- phi^2) + # logpvar = log ( exp(logevar) / (1 - phi^2) ) + # logpvar = logevar - log(1-phi^2) + # logpvar = logevar - (log(1-phi) + log(1+phi)) + self.logpvars_1xu = \ + logevars_1xu - tf.log(1.0-phis_1xu) - tf.log(1.0+phis_1xu) + self.logpvars_bxu = logpvars_bxu = tf.tile(self.logpvars_1xu, size_bx1) + logpvars_bxu.set_shape(size__xu) + + # process mean (zero but included in for completeness) + self.pmeans_bxu = pmeans_bxu = tf.zeros_like(phis_bxu) + + # For sampling from the prior during de-novo generation. + self.means_t = means_t = [None] * num_steps + self.logvars_t = logvars_t = [None] * num_steps + self.samples_t = samples_t = [None] * num_steps + self.gaussians_t = gaussians_t = [None] * num_steps + sample_bxu = tf.zeros_like(phis_bxu) + for t in range(num_steps): + # process variance used here to make process completely stationary + if t == 0: + logvar_pt_bxu = self.logpvars_bxu + else: + logvar_pt_bxu = self.logevars_bxu + + z_mean_pt_bxu = pmeans_bxu + phis_bxu * sample_bxu + gaussians_t[t] = DiagonalGaussian(batch_size, z_size, + mean=z_mean_pt_bxu, + logvar=logvar_pt_bxu) + sample_bxu = gaussians_t[t].sample + samples_t[t] = sample_bxu + logvars_t[t] = logvar_pt_bxu + means_t[t] = z_mean_pt_bxu + + def logp_t(self, z_t_bxu, z_tm1_bxu=None): + """Compute the log-likelihood under the distribution for a given time t, + not the whole sequence. + + Args: + z_t_bxu: sample to compute likelihood for at time t. + z_tm1_bxu (optional): sample condition probability of z_t upon. + + Returns: + The likelihood of p_t under the model at time t. i.e. + p(z_t|z_tm1) = N(z_tm1 * phis, eps^2) + + """ + if z_tm1_bxu is None: + return diag_gaussian_log_likelihood(z_t_bxu, self.pmeans_bxu, + self.logpvars_bxu) + else: + means_t_bxu = self.pmeans_bxu + self.phis_bxu * z_tm1_bxu + logp_tgtm1_bxu = diag_gaussian_log_likelihood(z_t_bxu, + means_t_bxu, + self.logevars_bxu) + return logp_tgtm1_bxu + + +class KLCost_GaussianGaussian(object): + """log p(x|z) + KL(q||p) terms for Gaussian posterior and Gaussian prior. See + eqn 10 and Appendix B in VAE for latter term, + http://arxiv.org/abs/1312.6114 + + The log p(x|z) term is the reconstruction error under the model. 
+ The KL term represents the penalty for passing information from the encoder + to the decoder. + To sample KL(q||p), we simply sample + ln q - ln p + by drawing samples from q and averaging. + """ + + def __init__(self, zs, prior_zs): + """Create a lower bound in three parts, normalized reconstruction + cost, normalized KL divergence cost, and their sum. + + E_q[ln p(z_i | z_{i+1}) / q(z_i | x) + \int q(z) ln p(z) dz = - 0.5 ln(2pi) - 0.5 \sum (ln(sigma_p^2) + \ + sigma_q^2 / sigma_p^2 + (mean_p - mean_q)^2 / sigma_p^2) + + \int q(z) ln q(z) dz = - 0.5 ln(2pi) - 0.5 \sum (ln(sigma_q^2) + 1) + + Args: + zs: posterior z ~ q(z|x) + prior_zs: prior zs + """ + # L = -KL + log p(x|z), to maximize bound on likelihood + # -L = KL - log p(x|z), to minimize bound on NLL + # so 'KL cost' is postive KL divergence + kl_b = 0.0 + for z, prior_z in zip(zs, prior_zs): + assert isinstance(z, Gaussian) + assert isinstance(prior_z, Gaussian) + # ln(2pi) terms cancel + kl_b += 0.5 * tf.reduce_sum( + prior_z.logvar - z.logvar + + tf.exp(z.logvar - prior_z.logvar) + + tf.square((z.mean - prior_z.mean) / tf.exp(0.5 * prior_z.logvar)) + - 1.0, [1]) + + self.kl_cost_b = kl_b + self.kl_cost = tf.reduce_mean(kl_b) + + +class KLCost_GaussianGaussianProcessSampled(object): + """ log p(x|z) + KL(q||p) terms for Gaussian posterior and Gaussian process + prior via sampling. + + The log p(x|z) term is the reconstruction error under the model. + The KL term represents the penalty for passing information from the encoder + to the decoder. + To sample KL(q||p), we simply sample + ln q - ln p + by drawing samples from q and averaging. + """ + + def __init__(self, post_zs, prior_z_process): + """Create a lower bound in three parts, normalized reconstruction + cost, normalized KL divergence cost, and their sum. + + Args: + post_zs: posterior z ~ q(z|x) + prior_z_process: prior AR(1) process + """ + assert len(post_zs) > 1, "GP is for time, need more than 1 time step." + assert isinstance(prior_z_process, GaussianProcess), "Must use GP." + + # L = -KL + log p(x|z), to maximize bound on likelihood + # -L = KL - log p(x|z), to minimize bound on NLL + # so 'KL cost' is postive KL divergence + z0_bxu = post_zs[0].sample + logq_bxu = post_zs[0].logp(z0_bxu) + logp_bxu = prior_z_process.logp_t(z0_bxu) + z_tm1_bxu = z0_bxu + for z_t in post_zs[1:]: + # posterior is independent in time, prior is not + z_t_bxu = z_t.sample + logq_bxu += z_t.logp(z_t_bxu) + logp_bxu += prior_z_process.logp_t(z_t_bxu, z_tm1_bxu) + z_tm1 = z_t_bxu + + kl_bxu = logq_bxu - logp_bxu + kl_b = tf.reduce_sum(kl_bxu, [1]) + self.kl_cost_b = kl_b + self.kl_cost = tf.reduce_mean(kl_b) diff --git a/lfads/lfads.py b/lfads/lfads.py new file mode 100644 index 0000000000000000000000000000000000000000..bc8f2bbaf5e8f972878cc672af2e3959f0783da8 --- /dev/null +++ b/lfads/lfads.py @@ -0,0 +1,1935 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +# ============================================================================== +""" +LFADS - Latent Factor Analysis via Dynamical Systems. + +LFADS is an unsupervised method to decompose time series data into +various factors, such as an initial condition, a generative +dynamical system, control inputs to that generator, and a low +dimensional description of the observed data, called the factors. +Additionally, the observations have a noise model (in this case +Poisson), so a denoised version of the observations is also created +(e.g. underlying rates of a Poisson distribution given the observed +event counts). + +The main data structure being passed around is a dataset. This is a dictionary +of data dictionaries. + +DATASET: The top level dictionary is simply name (string -> dictionary). +The nested dictionary is the DATA DICTIONARY, which has the following keys: + 'train_data' and 'valid_data', whose values are the corresponding training + and validation data with shape + ExTxD, E - # examples, T - # time steps, D - # dimensions in data. + The data dictionary also has a few more keys: + 'train_ext_input' and 'valid_ext_input', if there are know external inputs + to the system being modeled, these take on dimensions: + ExTxI, E - # examples, T - # time steps, I = # dimensions in input. + 'alignment_matrix_cxf' - If you are using multiple days data, it's possible + that one can align the channels (see manuscript). If so each dataset will + contain this matrix, which will be used for both the input adapter and the + output adapter for each dataset. These matrices, if provided, must be of + size [data_dim x factors] where data_dim is the number of neurons recorded + on that day, and factors is chosen and set through the '--factors' flag. + + If one runs LFADS on data where the true rates are known for some trials, + (say simulated, testing data, as in the example shipped with the paper), then + one can add three more fields for plotting purposes. These are 'train_truth' + and 'valid_truth', and 'conversion_factor'. These have the same dimensions as + 'train_data', and 'valid_data' but represent the underlying rates of the + observations. Finally, if one needs to convert scale for plotting the true + underlying firing rates, there is the 'conversion_factor' key. +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + + +import numpy as np +import os +import tensorflow as tf +from distributions import LearnableDiagonalGaussian, DiagonalGaussianFromInput +from distributions import diag_gaussian_log_likelihood +from distributions import KLCost_GaussianGaussian, Poisson +from distributions import LearnableAutoRegressive1Prior +from distributions import KLCost_GaussianGaussianProcessSampled + +from utils import init_linear, linear, list_t_bxn_to_tensor_bxtxn, write_data +from utils import log_sum_exp, flatten +from plot_lfads import plot_lfads + + +class GRU(object): + """Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078). + + """ + def __init__(self, num_units, forget_bias=1.0, weight_scale=1.0, + clip_value=np.inf, collections=None): + """Create a GRU object. + + Args: + num_units: Number of units in the GRU + forget_bias (optional): Hack to help learning. + weight_scale (optional): weights are scaled by ws/sqrt(#inputs), with + ws being the weight scale. + clip_value (optional): if the recurrent values grow above this value, + clip them. 
+ collections (optional): List of additonal collections variables should + belong to. + """ + self._num_units = num_units + self._forget_bias = forget_bias + self._weight_scale = weight_scale + self._clip_value = clip_value + self._collections = collections + + @property + def state_size(self): + return self._num_units + + @property + def output_size(self): + return self._num_units + + @property + def state_multiplier(self): + return 1 + + def output_from_state(self, state): + """Return the output portion of the state.""" + return state + + def __call__(self, inputs, state, scope=None): + """Gated recurrent unit (GRU) function. + + Args: + inputs: A 2D batch x input_dim tensor of inputs. + state: The previous state from the last time step. + scope (optional): TF variable scope for defined GRU variables. + + Returns: + A tuple (state, state), where state is the newly computed state at time t. + It is returned twice to respect an interface that works for LSTMs. + """ + + x = inputs + h = state + if inputs is not None: + xh = tf.concat(axis=1, values=[x, h]) + else: + xh = h + + with tf.variable_scope(scope or type(self).__name__): # "GRU" + with tf.variable_scope("Gates"): # Reset gate and update gate. + # We start with bias of 1.0 to not reset and not update. + r, u = tf.split(axis=1, num_or_size_splits=2, value=linear(xh, + 2 * self._num_units, + alpha=self._weight_scale, + name="xh_2_ru", + collections=self._collections)) + r, u = tf.sigmoid(r), tf.sigmoid(u + self._forget_bias) + with tf.variable_scope("Candidate"): + xrh = tf.concat(axis=1, values=[x, r * h]) + c = tf.tanh(linear(xrh, self._num_units, name="xrh_2_c", + collections=self._collections)) + new_h = u * h + (1 - u) * c + new_h = tf.clip_by_value(new_h, -self._clip_value, self._clip_value) + + return new_h, new_h + + +class GenGRU(object): + """Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078). + + This version is specialized for the generator, but isn't as fast, so + we have two. Note this allows for l2 regularization on the recurrent + weights, but also implicitly rescales the inputs via the 1/sqrt(input) + scaling in the linear helper routine to be large magnitude, if there are + fewer inputs than recurrent state. + + """ + def __init__(self, num_units, forget_bias=1.0, + input_weight_scale=1.0, rec_weight_scale=1.0, clip_value=np.inf, + input_collections=None, recurrent_collections=None): + """Create a GRU object. + + Args: + num_units: Number of units in the GRU + forget_bias (optional): Hack to help learning. + input_weight_scale (optional): weights are scaled ws/sqrt(#inputs), with + ws being the weight scale. + rec_weight_scale (optional): weights are scaled ws/sqrt(#inputs), + with ws being the weight scale. + clip_value (optional): if the recurrent values grow above this value, + clip them. + input_collections (optional): List of additonal collections variables + that input->rec weights should belong to. + recurrent_collections (optional): List of additonal collections variables + that rec->rec weights should belong to. 
+ """ + self._num_units = num_units + self._forget_bias = forget_bias + self._input_weight_scale = input_weight_scale + self._rec_weight_scale = rec_weight_scale + self._clip_value = clip_value + self._input_collections = input_collections + self._rec_collections = recurrent_collections + + @property + def state_size(self): + return self._num_units + + @property + def output_size(self): + return self._num_units + + @property + def state_multiplier(self): + return 1 + + def output_from_state(self, state): + """Return the output portion of the state.""" + return state + + def __call__(self, inputs, state, scope=None): + """Gated recurrent unit (GRU) function. + + Args: + inputs: A 2D batch x input_dim tensor of inputs. + state: The previous state from the last time step. + scope (optional): TF variable scope for defined GRU variables. + + Returns: + A tuple (state, state), where state is the newly computed state at time t. + It is returned twice to respect an interface that works for LSTMs. + """ + + x = inputs + h = state + with tf.variable_scope(scope or type(self).__name__): # "GRU" + with tf.variable_scope("Gates"): # Reset gate and update gate. + # We start with bias of 1.0 to not reset and not update. + r_x = u_x = 0.0 + if x is not None: + r_x, u_x = tf.split(axis=1, num_or_size_splits=2, value=linear(x, + 2 * self._num_units, + alpha=self._input_weight_scale, + do_bias=False, + name="x_2_ru", + normalized=False, + collections=self._input_collections)) + + r_h, u_h = tf.split(axis=1, num_or_size_splits=2, value=linear(h, + 2 * self._num_units, + do_bias=True, + alpha=self._rec_weight_scale, + name="h_2_ru", + collections=self._rec_collections)) + r = r_x + r_h + u = u_x + u_h + r, u = tf.sigmoid(r), tf.sigmoid(u + self._forget_bias) + + with tf.variable_scope("Candidate"): + c_x = 0.0 + if x is not None: + c_x = linear(x, self._num_units, name="x_2_c", do_bias=False, + alpha=self._input_weight_scale, + normalized=False, + collections=self._input_collections) + c_rh = linear(r*h, self._num_units, name="rh_2_c", do_bias=True, + alpha=self._rec_weight_scale, + collections=self._rec_collections) + c = tf.tanh(c_x + c_rh) + + new_h = u * h + (1 - u) * c + new_h = tf.clip_by_value(new_h, -self._clip_value, self._clip_value) + + return new_h, new_h + + +class LFADS(object): + """LFADS - Latent Factor Analysis via Dynamical Systems. + + LFADS is an unsupervised method to decompose time series data into + various factors, such as an initial condition, a generative + dynamical system, inferred inputs to that generator, and a low + dimensional description of the observed data, called the factors. + Additoinally, the observations have a noise model (in this case + Poisson), so a denoised version of the observations is also created + (e.g. underlying rates of a Poisson distribution given the observed + event counts). + """ + + def __init__(self, hps, kind="train", datasets=None): + """Create an LFADS model. + + train - a model for training, sampling of posteriors is used + posterior_sample_and_average - sample from the posterior, this is used + for evaluating the expected value of the outputs of LFADS, given a + specific input, by averaging over multiple samples from the approx + posterior. Also used for the lower bound on the negative + log-likelihood using IWAE error (Importance Weighed Auto-encoder). + This is the denoising operation. + prior_sample - a model for generation - sampling from priors is used + + Args: + hps: The dictionary of hyper parameters. 
+ kind: the type of model to build (see above). + datasets: a dictionary of named data_dictionaries, see top of lfads.py + """ + print("Building graph...") + all_kinds = ['train', 'posterior_sample_and_average', 'prior_sample'] + assert kind in all_kinds, 'Wrong kind' + if hps.feedback_factors_or_rates == "rates": + assert len(hps.dataset_names) == 1, \ + "Multiple datasets not supported for rate feedback." + num_steps = hps.num_steps + ic_dim = hps.ic_dim + co_dim = hps.co_dim + ext_input_dim = hps.ext_input_dim + cell_class = GRU + gen_cell_class = GenGRU + + def makelambda(v): # Used with tf.case + return lambda: v + + # Define the data placeholder, and deal with all parts of the graph + # that are dataset dependent. + self.dataName = tf.placeholder(tf.string, shape=()) + # The batch_size to be inferred from data, as normal. + # Additionally, the data_dim will be inferred as well, allowing for a + # single placeholder for all datasets, regardless of data dimension. + if hps.output_dist == 'poisson': + # Enforce correct dtype + assert np.issubdtype( + datasets[hps.dataset_names[0]]['train_data'].dtype, int), \ + "Data dtype must be int for poisson output distribution" + data_dtype = tf.int32 + elif hps.output_dist == 'gaussian': + assert np.issubdtype( + datasets[hps.dataset_names[0]]['train_data'].dtype, float), \ + "Data dtype must be float for gaussian output dsitribution" + data_dtype = tf.float32 + else: + assert False, "NIY" + self.dataset_ph = dataset_ph = tf.placeholder(data_dtype, + [None, num_steps, None], + name="data") + self.train_step = tf.get_variable("global_step", [], tf.int64, + tf.zeros_initializer(), + trainable=False) + self.hps = hps + ndatasets = hps.ndatasets + factors_dim = hps.factors_dim + self.preds = preds = [None] * ndatasets + self.fns_in_fac_Ws = fns_in_fac_Ws = [None] * ndatasets + self.fns_in_fatcor_bs = fns_in_fac_bs = [None] * ndatasets + self.fns_out_fac_Ws = fns_out_fac_Ws = [None] * ndatasets + self.fns_out_fac_bs = fns_out_fac_bs = [None] * ndatasets + self.datasetNames = dataset_names = hps.dataset_names + self.ext_inputs = ext_inputs = None + + if len(dataset_names) == 1: # single session + if 'alignment_matrix_cxf' in datasets[dataset_names[0]].keys(): + used_in_factors_dim = factors_dim + in_identity_if_poss = False + else: + used_in_factors_dim = hps.dataset_dims[dataset_names[0]] + in_identity_if_poss = True + else: # multisession + used_in_factors_dim = factors_dim + in_identity_if_poss = False + + for d, name in enumerate(dataset_names): + data_dim = hps.dataset_dims[name] + in_mat_cxf = None + if datasets and 'alignment_matrix_cxf' in datasets[name].keys(): + dataset = datasets[name] + print("Using alignment matrix provided for dataset:", name) + in_mat_cxf = dataset['alignment_matrix_cxf'].astype(np.float32) + if in_mat_cxf.shape != (data_dim, factors_dim): + raise ValueError("""Alignment matrix must have dimensions %d x %d + (data_dim x factors_dim), but currently has %d x %d."""% + (data_dim, factors_dim, in_mat_cxf.shape[0], + in_mat_cxf.shape[1])) + + in_fac_lin = init_linear(data_dim, used_in_factors_dim, do_bias=True, + mat_init_value=in_mat_cxf, + identity_if_possible=in_identity_if_poss, + normalized=False, name="x_2_infac_"+name, + collections=['IO_transformations']) + in_fac_W, in_fac_b = in_fac_lin + fns_in_fac_Ws[d] = makelambda(in_fac_W) + fns_in_fac_bs[d] = makelambda(in_fac_b) + + with tf.variable_scope("glm"): + out_identity_if_poss = False + if len(dataset_names) == 1 and \ + factors_dim == 
hps.dataset_dims[dataset_names[0]]: + out_identity_if_poss = True + for d, name in enumerate(dataset_names): + data_dim = hps.dataset_dims[name] + in_mat_cxf = None + if datasets and 'alignment_matrix_cxf' in datasets[name].keys(): + dataset = datasets[name] + in_mat_cxf = dataset['alignment_matrix_cxf'].astype(np.float32) + + out_mat_cxf = None + if in_mat_cxf is not None: + out_mat_cxf = in_mat_cxf.T + + if hps.output_dist == 'poisson': + out_fac_lin = init_linear(factors_dim, data_dim, do_bias=True, + mat_init_value=out_mat_cxf, + identity_if_possible=out_identity_if_poss, + normalized=False, + name="fac_2_logrates_"+name, + collections=['IO_transformations']) + out_fac_W, out_fac_b = out_fac_lin + + elif hps.output_dist == 'gaussian': + out_fac_lin_mean = \ + init_linear(factors_dim, data_dim, do_bias=True, + mat_init_value=out_mat_cxf, + normalized=False, + name="fac_2_means_"+name, + collections=['IO_transformations']) + out_fac_lin_logvar = \ + init_linear(factors_dim, data_dim, do_bias=True, + mat_init_value=out_mat_cxf, + normalized=False, + name="fac_2_logvars_"+name, + collections=['IO_transformations']) + out_fac_W_mean, out_fac_b_mean = out_fac_lin_mean + out_fac_W_logvar, out_fac_b_logvar = out_fac_lin_logvar + out_fac_W = tf.concat( + axis=1, values=[out_fac_W_mean, out_fac_W_logvar]) + out_fac_b = tf.concat( + axis=1, values=[out_fac_b_mean, out_fac_b_logvar]) + else: + assert False, "NIY" + + preds[d] = tf.equal(tf.constant(name), self.dataName) + data_dim = hps.dataset_dims[name] + fns_out_fac_Ws[d] = makelambda(out_fac_W) + fns_out_fac_bs[d] = makelambda(out_fac_b) + + pf_pairs_in_fac_Ws = zip(preds, fns_in_fac_Ws) + pf_pairs_in_fac_bs = zip(preds, fns_in_fac_bs) + pf_pairs_out_fac_Ws = zip(preds, fns_out_fac_Ws) + pf_pairs_out_fac_bs = zip(preds, fns_out_fac_bs) + + case_default = lambda: tf.constant([-8675309.0]) + this_in_fac_W = tf.case(pf_pairs_in_fac_Ws, case_default, exclusive=True) + this_in_fac_b = tf.case(pf_pairs_in_fac_bs, case_default, exclusive=True) + this_out_fac_W = tf.case(pf_pairs_out_fac_Ws, case_default, exclusive=True) + this_out_fac_b = tf.case(pf_pairs_out_fac_bs, case_default, exclusive=True) + + # External inputs (not changing by dataset, by definition). + if hps.ext_input_dim > 0: + self.ext_input = tf.placeholder(tf.float32, + [None, num_steps, ext_input_dim], + name="ext_input") + else: + self.ext_input = None + ext_input_bxtxi = self.ext_input + + self.keep_prob = keep_prob = tf.placeholder(tf.float32, [], "keep_prob") + self.batch_size = batch_size = int(hps.batch_size) + self.learning_rate = tf.Variable(float(hps.learning_rate_init), + trainable=False, name="learning_rate") + self.learning_rate_decay_op = self.learning_rate.assign( + self.learning_rate * hps.learning_rate_decay_factor) + + # Dropout the data. + dataset_do_bxtxd = tf.nn.dropout(tf.to_float(dataset_ph), keep_prob) + if hps.ext_input_dim > 0: + ext_input_do_bxtxi = tf.nn.dropout(ext_input_bxtxi, keep_prob) + else: + ext_input_do_bxtxi = None + + # ENCODERS + def encode_data(dataset_bxtxd, enc_cell, name, forward_or_reverse, + num_steps_to_encode): + """Encode data for LFADS + Args: + dataset_bxtxd - the data to encode, as a 3 tensor, with dims + time x batch x data dims. 
+ enc_cell: encoder cell + name: name of encoder + forward_or_reverse: string, encode in forward or reverse direction + num_steps_to_encode: number of steps to encode, 0:num_steps_to_encode + Returns: + encoded data as a list with num_steps_to_encode items, in order + """ + if forward_or_reverse == "forward": + dstr = "_fwd" + time_fwd_or_rev = range(num_steps_to_encode) + else: + dstr = "_rev" + time_fwd_or_rev = reversed(range(num_steps_to_encode)) + + with tf.variable_scope(name+"_enc"+dstr, reuse=False): + enc_state = tf.tile( + tf.Variable(tf.zeros([1, enc_cell.state_size]), + name=name+"_enc_t0"+dstr), tf.stack([batch_size, 1])) + enc_state.set_shape([None, enc_cell.state_size]) # tile loses shape + + enc_outs = [None] * num_steps_to_encode + for i, t in enumerate(time_fwd_or_rev): + with tf.variable_scope(name+"_enc"+dstr, reuse=True if i > 0 else None): + dataset_t_bxd = dataset_bxtxd[:,t,:] + in_fac_t_bxf = tf.matmul(dataset_t_bxd, this_in_fac_W) + this_in_fac_b + in_fac_t_bxf.set_shape([None, used_in_factors_dim]) + if ext_input_dim > 0 and not hps.inject_ext_input_to_gen: + ext_input_t_bxi = ext_input_do_bxtxi[:,t,:] + enc_input_t_bxfpe = tf.concat( + axis=1, values=[in_fac_t_bxf, ext_input_t_bxi]) + else: + enc_input_t_bxfpe = in_fac_t_bxf + enc_out, enc_state = enc_cell(enc_input_t_bxfpe, enc_state) + enc_outs[t] = enc_out + + return enc_outs + + # Encode initial condition means and variances + # ([x_T, x_T-1, ... x_0] and [x_0, x_1, ... x_T] -> g0/c0) + self.ic_enc_fwd = [None] * num_steps + self.ic_enc_rev = [None] * num_steps + if ic_dim > 0: + enc_ic_cell = cell_class(hps.ic_enc_dim, + weight_scale=hps.cell_weight_scale, + clip_value=hps.cell_clip_value) + ic_enc_fwd = encode_data(dataset_do_bxtxd, enc_ic_cell, + "ic", "forward", + hps.num_steps_for_gen_ic) + ic_enc_rev = encode_data(dataset_do_bxtxd, enc_ic_cell, + "ic", "reverse", + hps.num_steps_for_gen_ic) + self.ic_enc_fwd = ic_enc_fwd + self.ic_enc_rev = ic_enc_rev + + # Encoder control input means and variances, bi-directional encoding so: + # ([x_T, x_T-1, ..., x_0] and [x_0, x_1 ... x_T] -> u_t) + self.ci_enc_fwd = [None] * num_steps + self.ci_enc_rev = [None] * num_steps + if co_dim > 0: + enc_ci_cell = cell_class(hps.ci_enc_dim, + weight_scale=hps.cell_weight_scale, + clip_value=hps.cell_clip_value) + ci_enc_fwd = encode_data(dataset_do_bxtxd, enc_ci_cell, + "ci", "forward", + hps.num_steps) + if hps.do_causal_controller: + ci_enc_rev = None + else: + ci_enc_rev = encode_data(dataset_do_bxtxd, enc_ci_cell, + "ci", "reverse", + hps.num_steps) + self.ci_enc_fwd = ci_enc_fwd + self.ci_enc_rev = ci_enc_rev + + # STOCHASTIC LATENT VARIABLES, priors and posteriors + # (initial conditions g0, and control inputs, u_t) + # Note that zs represent all the stochastic latent variables. 
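The encoders just defined run a GRU over the dropout-masked data both forward and backward; the code below then concatenates the last forward output with the reverse-pass output at t=0 and maps the result to a diagonal Gaussian posterior over the initial condition g0. A toy numpy sketch of that pattern, with a plain tanh RNN standing in for the GRU and all shapes illustrative (not LFADS code):

```
import numpy as np

rng = np.random.RandomState(0)
B, T, D, H, IC = 4, 10, 3, 8, 5        # batch, time, data dim, encoder dim, ic dim
x_btd = rng.randn(B, T, D)
Wx, Wh = 0.1 * rng.randn(D, H), 0.1 * rng.randn(H, H)
W_mean, W_logvar = 0.1 * rng.randn(2 * H, IC), 0.1 * rng.randn(2 * H, IC)

def encode(x_btd, reverse=False):
  # Plain tanh RNN standing in for the GRU; returns states in visit order.
  h = np.zeros((x_btd.shape[0], H))
  states = []
  time_order = reversed(range(x_btd.shape[1])) if reverse else range(x_btd.shape[1])
  for t in time_order:
    h = np.tanh(x_btd[:, t, :].dot(Wx) + h.dot(Wh))
    states.append(h)
  return states

fwd = encode(x_btd)                    # fwd[-1] has seen x_0 ... x_{T-1}
rev = encode(x_btd, reverse=True)      # rev[-1] is the t=0 output of the reverse pass
ic_enc = np.concatenate([fwd[-1], rev[-1]], axis=1)       # batch x 2H
g0_mean, g0_logvar = ic_enc.dot(W_mean), ic_enc.dot(W_logvar)
g0_sample = g0_mean + np.exp(0.5 * g0_logvar) * rng.randn(B, IC)
print(g0_sample.shape)                 # (4, 5)
```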
+ with tf.variable_scope("z", reuse=False): + self.prior_zs_g0 = None + self.posterior_zs_g0 = None + self.g0s_val = None + if ic_dim > 0: + self.prior_zs_g0 = \ + LearnableDiagonalGaussian(batch_size, ic_dim, name="prior_g0", + mean_init=0.0, + var_min=hps.ic_prior_var_min, + var_init=hps.ic_prior_var_scale, + var_max=hps.ic_prior_var_max) + ic_enc = tf.concat(axis=1, values=[ic_enc_fwd[-1], ic_enc_rev[0]]) + ic_enc = tf.nn.dropout(ic_enc, keep_prob) + self.posterior_zs_g0 = \ + DiagonalGaussianFromInput(ic_enc, ic_dim, "ic_enc_2_post_g0", + var_min=hps.ic_post_var_min) + if kind in ["train", "posterior_sample_and_average"]: + zs_g0 = self.posterior_zs_g0 + else: + zs_g0 = self.prior_zs_g0 + if kind in ["train", "posterior_sample_and_average", "prior_sample"]: + self.g0s_val = zs_g0.sample + else: + self.g0s_val = zs_g0.mean + + # Priors for controller, 'co' for controller output + self.prior_zs_co = prior_zs_co = [None] * num_steps + self.posterior_zs_co = posterior_zs_co = [None] * num_steps + self.zs_co = zs_co = [None] * num_steps + self.prior_zs_ar_con = None + if co_dim > 0: + # Controller outputs + autocorrelation_taus = [hps.prior_ar_atau for x in range(hps.co_dim)] + noise_variances = [hps.prior_ar_nvar for x in range(hps.co_dim)] + self.prior_zs_ar_con = prior_zs_ar_con = \ + LearnableAutoRegressive1Prior(batch_size, hps.co_dim, + autocorrelation_taus, + noise_variances, + hps.do_train_prior_ar_atau, + hps.do_train_prior_ar_nvar, + num_steps, "u_prior_ar1") + + # CONTROLLER -> GENERATOR -> RATES + # (u(t) -> gen(t) -> factors(t) -> rates(t) -> p(x_t|z_t) ) + self.controller_outputs = u_t = [None] * num_steps + self.con_ics = con_state = None + self.con_states = con_states = [None] * num_steps + self.con_outs = con_outs = [None] * num_steps + self.gen_inputs = gen_inputs = [None] * num_steps + if co_dim > 0: + # gen_cell_class here for l2 penalty recurrent weights + # didn't split the cell_weight scale here, because I doubt it matters + con_cell = gen_cell_class(hps.con_dim, + input_weight_scale=hps.cell_weight_scale, + rec_weight_scale=hps.cell_weight_scale, + clip_value=hps.cell_clip_value, + recurrent_collections=['l2_con_reg']) + with tf.variable_scope("con", reuse=False): + self.con_ics = tf.tile( + tf.Variable(tf.zeros([1, hps.con_dim*con_cell.state_multiplier]), \ + name="c0"), + tf.stack([batch_size, 1])) + self.con_ics.set_shape([None, con_cell.state_size]) # tile loses shape + con_states[-1] = self.con_ics + + gen_cell = gen_cell_class(hps.gen_dim, + input_weight_scale=hps.gen_cell_input_weight_scale, + rec_weight_scale=hps.gen_cell_rec_weight_scale, + clip_value=hps.cell_clip_value, + recurrent_collections=['l2_gen_reg']) + with tf.variable_scope("gen", reuse=False): + if ic_dim == 0: + self.gen_ics = tf.tile( + tf.Variable(tf.zeros([1, gen_cell.state_size]), name="g0"), + tf.stack([batch_size, 1])) + else: + self.gen_ics = linear(self.g0s_val, gen_cell.state_size, + identity_if_possible=True, + name="g0_2_gen_ic") + + self.gen_states = gen_states = [None] * num_steps + self.gen_outs = gen_outs = [None] * num_steps + gen_states[-1] = self.gen_ics + gen_outs[-1] = gen_cell.output_from_state(gen_states[-1]) + self.factors = factors = [None] * num_steps + factors[-1] = linear(gen_outs[-1], factors_dim, do_bias=False, + normalized=True, name="gen_2_fac") + + self.rates = rates = [None] * num_steps + # rates[-1] is collected to potentially feed back to controller + with tf.variable_scope("glm", reuse=False): + if hps.output_dist == 'poisson': + log_rates_t0 = 
tf.matmul(factors[-1], this_out_fac_W) + this_out_fac_b
+        log_rates_t0.set_shape([None, None])
+        rates[-1] = tf.exp(log_rates_t0) # rate
+        rates[-1].set_shape([None, hps.dataset_dims[hps.dataset_names[0]]])
+      elif hps.output_dist == 'gaussian':
+        mean_n_logvars = tf.matmul(factors[-1],this_out_fac_W) + this_out_fac_b
+        mean_n_logvars.set_shape([None, None])
+        means_t_bxd, logvars_t_bxd = tf.split(axis=1, num_or_size_splits=2,
+                                              value=mean_n_logvars)
+        rates[-1] = means_t_bxd
+      else:
+        assert False, "NIY"
+
+
+    # We support multiple output distributions, for example Poisson and
+    # Gaussian. In these two cases there are one and two parameters per data
+    # dimension, respectively (rates vs. mean and variance). So the
+    # output_dist_params tensor has a variable size along dimension 1 (the
+    # non-batch dimension), handled via tf.concat and tf.split. In the
+    # gaussian case, for example, it is batch x (D+D), where the first D dims
+    # are the means and the next D are the variances. For a distribution with
+    # 3 parameters, it would be batch x (D+D+D).
+    self.output_dist_params = dist_params = [None] * num_steps
+    self.log_p_xgz_b = log_p_xgz_b = 0.0 # log P(x|z)
+    for t in range(num_steps):
+      # Controller
+      if co_dim > 0:
+        # Build inputs for controller
+        tlag = t - hps.controller_input_lag
+        if tlag < 0:
+          con_in_f_t = tf.zeros_like(ci_enc_fwd[0])
+        else:
+          con_in_f_t = ci_enc_fwd[tlag]
+        if hps.do_causal_controller:
+          # If the controller is causal (wrt the data generation process), then
+          # it cannot see future data. Thus, excluding ci_enc_rev[t] is obvious.
+          # Less obvious is the need to exclude factors[t-1]. This arises
+          # because information flows from g0 through factors to the controller
+          # input. The g0 encoding is backwards, so we must necessarily exclude
+          # the factors in order to keep the controller input purely from a
+          # forward encoding (however unlikely it is that the
+          # g0->factors->controller channel might actually be used in this way).
+ con_in_list_t = [con_in_f_t] + else: + tlag_rev = t + hps.controller_input_lag + if tlag_rev >= num_steps: + # better than zeros + con_in_r_t = tf.zeros_like(ci_enc_rev[0]) + else: + con_in_r_t = ci_enc_rev[tlag_rev] + con_in_list_t = [con_in_f_t, con_in_r_t] + + if hps.do_feed_factors_to_controller: + if hps.feedback_factors_or_rates == "factors": + con_in_list_t.append(factors[t-1]) + elif hps.feedback_factors_or_rates == "rates": + con_in_list_t.append(rates[t-1]) + else: + assert False, "NIY" + + con_in_t = tf.concat(axis=1, values=con_in_list_t) + con_in_t = tf.nn.dropout(con_in_t, keep_prob) + with tf.variable_scope("con", reuse=True if t > 0 else None): + con_outs[t], con_states[t] = con_cell(con_in_t, con_states[t-1]) + posterior_zs_co[t] = \ + DiagonalGaussianFromInput(con_outs[t], co_dim, + name="con_to_post_co") + if kind == "train": + u_t[t] = posterior_zs_co[t].sample + elif kind == "posterior_sample_and_average": + u_t[t] = posterior_zs_co[t].sample + else: + u_t[t] = prior_zs_ar_con.samples_t[t] + + # Inputs to the generator (controller output + external input) + if ext_input_dim > 0 and hps.inject_ext_input_to_gen: + ext_input_t_bxi = ext_input_do_bxtxi[:,t,:] + if co_dim > 0: + gen_inputs[t] = tf.concat(axis=1, values=[u_t[t], ext_input_t_bxi]) + else: + gen_inputs[t] = ext_input_t_bxi + else: + gen_inputs[t] = u_t[t] + + # Generator + data_t_bxd = dataset_ph[:,t,:] + with tf.variable_scope("gen", reuse=True if t > 0 else None): + gen_outs[t], gen_states[t] = gen_cell(gen_inputs[t], gen_states[t-1]) + gen_outs[t] = tf.nn.dropout(gen_outs[t], keep_prob) + with tf.variable_scope("gen", reuse=True): # ic defined it above + factors[t] = linear(gen_outs[t], factors_dim, do_bias=False, + normalized=True, name="gen_2_fac") + with tf.variable_scope("glm", reuse=True if t > 0 else None): + if hps.output_dist == 'poisson': + log_rates_t = tf.matmul(factors[t], this_out_fac_W) + this_out_fac_b + log_rates_t.set_shape([None, None]) + rates[t] = dist_params[t] = tf.exp(log_rates_t) # rates feed back + rates[t].set_shape([None, hps.dataset_dims[hps.dataset_names[0]]]) + loglikelihood_t = Poisson(log_rates_t).logp(data_t_bxd) + + elif hps.output_dist == 'gaussian': + mean_n_logvars = tf.matmul(factors[t],this_out_fac_W) + this_out_fac_b + mean_n_logvars.set_shape([None, None]) + means_t_bxd, logvars_t_bxd = tf.split(axis=1, num_or_size_splits=2, + value=mean_n_logvars) + rates[t] = means_t_bxd # rates feed back to controller + dist_params[t] = tf.concat( + axis=1, values=[means_t_bxd, tf.exp(logvars_t_bxd)]) + loglikelihood_t = \ + diag_gaussian_log_likelihood(data_t_bxd, + means_t_bxd, logvars_t_bxd) + else: + assert False, "NIY" + + log_p_xgz_b += tf.reduce_sum(loglikelihood_t, [1]) + + # Correlation of inferred inputs cost. + self.corr_cost = tf.constant(0.0) + if hps.co_mean_corr_scale > 0.0: + all_sum_corr = [] + for i in range(hps.co_dim): + for j in range(i+1, hps.co_dim): + sum_corr_ij = tf.constant(0.0) + for t in range(num_steps): + u_mean_t = posterior_zs_co[t].mean + sum_corr_ij += u_mean_t[:,i]*u_mean_t[:,j] + all_sum_corr.append(0.5 * tf.square(sum_corr_ij)) + self.corr_cost = tf.reduce_mean(all_sum_corr) # div by batch and by n*(n-1)/2 pairs + + # Variational Lower Bound on posterior, p(z|x), plus reconstruction cost. + # KL and reconstruction costs are normalized only by batch size, not by + # dimension, or by time steps. 
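The reconstruction term in the bound assembled below is the summed log p(x|z) accumulated in the time loop above; in the Poisson case each step scores the observed spike counts with Poisson(log_rates_t).logp(data_t_bxd), i.e. the Poisson log-likelihood parameterized by log-rates. The Poisson class itself lives in distributions.py; the following is a standalone numpy/scipy sketch of the same quantity, not that class:

```
import numpy as np
from scipy.special import gammaln

def poisson_logp(counts_bxd, log_rates_bxd):
  # log p(k | rate) = k * log(rate) - rate - log(k!)
  return (counts_bxd * log_rates_bxd
          - np.exp(log_rates_bxd)
          - gammaln(counts_bxd + 1.0))

counts = np.array([[0.0, 2.0, 5.0]])
log_rates = np.log(np.array([[0.5, 2.0, 4.0]]))
print(poisson_logp(counts, log_rates).sum(axis=1))  # summed over data dims, as for log_p_xgz_b
```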
+ kl_cost_g0_b = tf.zeros_like(batch_size, dtype=tf.float32) + kl_cost_co_b = tf.zeros_like(batch_size, dtype=tf.float32) + self.kl_cost = tf.constant(0.0) # VAE KL cost + self.recon_cost = tf.constant(0.0) # VAE reconstruction cost + self.nll_bound_vae = tf.constant(0.0) + self.nll_bound_iwae = tf.constant(0.0) # for eval with IWAE cost. + if kind in ["train", "posterior_sample_and_average"]: + kl_cost_g0_b = 0.0 + kl_cost_co_b = 0.0 + if ic_dim > 0: + g0_priors = [self.prior_zs_g0] + g0_posts = [self.posterior_zs_g0] + kl_cost_g0_b = KLCost_GaussianGaussian(g0_posts, g0_priors).kl_cost_b + kl_cost_g0_b = hps.kl_ic_weight * kl_cost_g0_b + if co_dim > 0: + kl_cost_co_b = \ + KLCost_GaussianGaussianProcessSampled( + posterior_zs_co, prior_zs_ar_con).kl_cost_b + kl_cost_co_b = hps.kl_co_weight * kl_cost_co_b + + # L = -KL + log p(x|z), to maximize bound on likelihood + # -L = KL - log p(x|z), to minimize bound on NLL + # so 'reconstruction cost' is negative log likelihood + self.recon_cost = - tf.reduce_mean(log_p_xgz_b) + self.kl_cost = tf.reduce_mean(kl_cost_g0_b + kl_cost_co_b) + + lb_on_ll_b = log_p_xgz_b - kl_cost_g0_b - kl_cost_co_b + + # VAE error averages outside the log + self.nll_bound_vae = -tf.reduce_mean(lb_on_ll_b) + + # IWAE error averages inside the log + k = tf.cast(tf.shape(log_p_xgz_b)[0], tf.float32) + iwae_lb_on_ll = -tf.log(k) + log_sum_exp(lb_on_ll_b) + self.nll_bound_iwae = -iwae_lb_on_ll + + # L2 regularization on the generator, normalized by number of parameters. + self.l2_cost = tf.constant(0.0) + if self.hps.l2_gen_scale > 0.0 or self.hps.l2_con_scale > 0.0: + l2_costs = [] + l2_numels = [] + l2_reg_var_lists = [tf.get_collection('l2_gen_reg'), + tf.get_collection('l2_con_reg')] + l2_reg_scales = [self.hps.l2_gen_scale, self.hps.l2_con_scale] + for l2_reg_vars, l2_scale in zip(l2_reg_var_lists, l2_reg_scales): + for v in l2_reg_vars: + numel = tf.reduce_prod(tf.concat(axis=0, values=tf.shape(v))) + numel_f = tf.cast(numel, tf.float32) + l2_numels.append(numel_f) + v_l2 = tf.reduce_sum(v*v) + l2_costs.append(0.5 * l2_scale * v_l2) + self.l2_cost = tf.add_n(l2_costs) / tf.add_n(l2_numels) + + # Compute the cost for training, part of the graph regardless. + # The KL cost can be problematic at the beginning of optimization, + # so we allow an exponential increase in weighting the KL from 0 + # to 1. 
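Before the KL/L2 warm-up schedule that follows, note that the two likelihood bounds assembled above differ only in where the average over posterior samples is taken: nll_bound_vae averages the per-sample bound directly, while nll_bound_iwae averages inside the log (a log-mean-exp), which by Jensen's inequality can only tighten the bound. A toy numpy sketch with hypothetical per-sample values:

```
import numpy as np

lb = np.array([-102.3, -98.7, -110.1, -95.4])   # hypothetical per-sample bounds on log p(x)
nll_bound_vae = -np.mean(lb)                    # average outside the log

m = lb.max()                                    # numerically stable log-mean-exp
log_mean_exp = m + np.log(np.mean(np.exp(lb - m)))
nll_bound_iwae = -log_mean_exp                  # average inside the log

print(nll_bound_vae, nll_bound_iwae)            # the IWAE bound is never larger
```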
+ self.kl_decay_step = tf.maximum(self.train_step - hps.kl_start_step, 0) + self.l2_decay_step = tf.maximum(self.train_step - hps.l2_start_step, 0) + kl_decay_step_f = tf.cast(self.kl_decay_step, tf.float32) + l2_decay_step_f = tf.cast(self.l2_decay_step, tf.float32) + kl_increase_steps_f = tf.cast(hps.kl_increase_steps, tf.float32) + l2_increase_steps_f = tf.cast(hps.l2_increase_steps, tf.float32) + self.kl_weight = kl_weight = \ + tf.minimum(kl_decay_step_f / kl_increase_steps_f, 1.0) + self.l2_weight = l2_weight = \ + tf.minimum(l2_decay_step_f / l2_increase_steps_f, 1.0) + + self.timed_kl_cost = kl_weight * self.kl_cost + self.timed_l2_cost = l2_weight * self.l2_cost + self.weight_corr_cost = hps.co_mean_corr_scale * self.corr_cost + self.cost = self.recon_cost + self.timed_kl_cost + \ + self.timed_l2_cost + self.weight_corr_cost + + if kind != "train": + # save every so often + self.seso_saver = tf.train.Saver(tf.global_variables(), + max_to_keep=hps.max_ckpt_to_keep) + # lowest validation error + self.lve_saver = tf.train.Saver(tf.global_variables(), + max_to_keep=hps.max_ckpt_to_keep_lve) + + return + + # OPTIMIZATION + if not self.hps.do_train_io_only: + self.train_vars = tvars = \ + tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, + scope=tf.get_variable_scope().name) + else: + self.train_vars = tvars = \ + tf.get_collection('IO_transformations', + scope=tf.get_variable_scope().name) + print("done.") + print("Model Variables (to be optimized): ") + total_params = 0 + for i in range(len(tvars)): + shape = tvars[i].get_shape().as_list() + print(" ", i, tvars[i].name, shape) + total_params += np.prod(shape) + print("Total model parameters: ", total_params) + + grads = tf.gradients(self.cost, tvars) + grads, grad_global_norm = tf.clip_by_global_norm(grads, hps.max_grad_norm) + opt = tf.train.AdamOptimizer(self.learning_rate, beta1=0.9, beta2=0.999, + epsilon=1e-01) + self.grads = grads + self.grad_global_norm = grad_global_norm + self.train_op = opt.apply_gradients( + zip(grads, tvars), global_step=self.train_step) + + self.seso_saver = tf.train.Saver(tf.global_variables(), + max_to_keep=hps.max_ckpt_to_keep) + + # lowest validation error + self.lve_saver = tf.train.Saver(tf.global_variables(), + max_to_keep=hps.max_ckpt_to_keep) + + # SUMMARIES, used only during training. + # example summary + self.example_image = tf.placeholder(tf.float32, shape=[1,None,None,3], + name='image_tensor') + self.example_summ = tf.summary.image("LFADS example", self.example_image, + collections=["example_summaries"]) + + # general training summaries + self.lr_summ = tf.summary.scalar("Learning rate", self.learning_rate) + self.kl_weight_summ = tf.summary.scalar("KL weight", self.kl_weight) + self.l2_weight_summ = tf.summary.scalar("L2 weight", self.l2_weight) + self.corr_cost_summ = tf.summary.scalar("Corr cost", self.weight_corr_cost) + self.grad_global_norm_summ = tf.summary.scalar("Gradient global norm", + self.grad_global_norm) + if hps.co_dim > 0: + self.atau_summ = [None] * hps.co_dim + self.pvar_summ = [None] * hps.co_dim + for c in range(hps.co_dim): + self.atau_summ[c] = \ + tf.summary.scalar("AR Autocorrelation taus " + str(c), + tf.exp(self.prior_zs_ar_con.logataus_1xu[0,c])) + self.pvar_summ[c] = \ + tf.summary.scalar("AR Variances " + str(c), + tf.exp(self.prior_zs_ar_con.logpvars_1xu[0,c])) + + # cost summaries, separated into different collections for + # training vs validation. 
We make placeholders for these, because + # even though the graph computes these costs on a per-batch basis, + # we want to report the more reliable metric of per-epoch cost. + kl_cost_ph = tf.placeholder(tf.float32, shape=[], name='kl_cost_ph') + self.kl_t_cost_summ = tf.summary.scalar("KL cost (train)", kl_cost_ph, + collections=["train_summaries"]) + self.kl_v_cost_summ = tf.summary.scalar("KL cost (valid)", kl_cost_ph, + collections=["valid_summaries"]) + l2_cost_ph = tf.placeholder(tf.float32, shape=[], name='l2_cost_ph') + self.l2_cost_summ = tf.summary.scalar("L2 cost", l2_cost_ph, + collections=["train_summaries"]) + + recon_cost_ph = tf.placeholder(tf.float32, shape=[], name='recon_cost_ph') + self.recon_t_cost_summ = tf.summary.scalar("Reconstruction cost (train)", + recon_cost_ph, + collections=["train_summaries"]) + self.recon_v_cost_summ = tf.summary.scalar("Reconstruction cost (valid)", + recon_cost_ph, + collections=["valid_summaries"]) + + total_cost_ph = tf.placeholder(tf.float32, shape=[], name='total_cost_ph') + self.cost_t_summ = tf.summary.scalar("Total cost (train)", total_cost_ph, + collections=["train_summaries"]) + self.cost_v_summ = tf.summary.scalar("Total cost (valid)", total_cost_ph, + collections=["valid_summaries"]) + + self.kl_cost_ph = kl_cost_ph + self.l2_cost_ph = l2_cost_ph + self.recon_cost_ph = recon_cost_ph + self.total_cost_ph = total_cost_ph + + # Merged summaries, for easy coding later. + self.merged_examples = tf.summary.merge_all(key="example_summaries") + self.merged_generic = tf.summary.merge_all() # default key is 'summaries' + self.merged_train = tf.summary.merge_all(key="train_summaries") + self.merged_valid = tf.summary.merge_all(key="valid_summaries") + + session = tf.get_default_session() + self.logfile = os.path.join(hps.lfads_save_dir, "lfads_log") + self.writer = tf.summary.FileWriter(self.logfile, session.graph) + + def build_feed_dict(self, train_name, data_bxtxd, ext_input_bxtxi=None, + keep_prob=None): + """Build the feed dictionary, handles cases where there is no value defined. + + Args: + train_name: The key into the datasets, to set the tf.case statement for + the proper readin / readout matrices. + data_bxtxd: The data tensor + ext_input_bxtxi (optional): The external input tensor + keep_prob: The drop out keep probability. + + Returns: + The feed dictionary with TF tensors as keys and data as values, for use + with tf.Session.run() + + """ + feed_dict = {} + B, T, _ = data_bxtxd.shape + feed_dict[self.dataName] = train_name + feed_dict[self.dataset_ph] = data_bxtxd + + if self.ext_input is not None and ext_input_bxtxi is not None: + feed_dict[self.ext_input] = ext_input_bxtxi + + if keep_prob is None: + feed_dict[self.keep_prob] = self.hps.keep_prob + else: + feed_dict[self.keep_prob] = keep_prob + + return feed_dict + + @staticmethod + def get_batch(data_extxd, ext_input_extxi=None, batch_size=None, + example_idxs=None): + """Get a batch of data, either randomly chosen, or specified directly. + + Args: + data_extxd: The data to model, numpy tensors with shape: + # examples x # time steps x # dimensions + ext_input_extxi (optional): The external inputs, numpy tensor with shape: + # examples x # time steps x # external input dimensions + batch_size: The size of the batch to return + example_idxs (optional): The example indices used to select examples. + + Returns: + A tuple with two parts: + 1. Batched data numpy tensor with shape: + batch_size x # time steps x # dimensions + 2. 
Batched external input numpy tensor with shape: + batch_size x # time steps x # external input dims + """ + assert batch_size is not None or example_idxs is not None, "Problems" + E, T, D = data_extxd.shape + if example_idxs is None: + example_idxs = np.random.choice(E, batch_size) + + ext_input_bxtxi = None + if ext_input_extxi is not None: + ext_input_bxtxi = ext_input_extxi[example_idxs,:,:] + + return data_extxd[example_idxs,:,:], ext_input_bxtxi + + @staticmethod + def example_idxs_mod_batch_size(nexamples, batch_size): + """Given a number of examples, E, and a batch_size, B, generate indices + [0, 1, 2, ... B-1; + [B, B+1, ... 2*B-1; + ... + ] + returning those indices as a 2-dim tensor shaped like E/B x B. Note that + shape is only correct if E % B == 0. If not, then an extra row is generated + so that the remainder of examples is included. The extra examples are + explicitly to to the zero index (see randomize_example_idxs_mod_batch_size) + for randomized behavior. + + Args: + nexamples: The number of examples to batch up. + batch_size: The size of the batch. + Returns: + 2-dim tensor as described above. + """ + bmrem = batch_size - (nexamples % batch_size) + bmrem_examples = [] + if bmrem < batch_size: + #bmrem_examples = np.zeros(bmrem, dtype=np.int32) + ridxs = np.random.permutation(nexamples)[0:bmrem].astype(np.int32) + bmrem_examples = np.sort(ridxs) + example_idxs = range(nexamples) + list(bmrem_examples) + example_idxs_e_x_edivb = np.reshape(example_idxs, [-1, batch_size]) + return example_idxs_e_x_edivb, bmrem + + @staticmethod + def randomize_example_idxs_mod_batch_size(nexamples, batch_size): + """Indices 1:nexamples, randomized, in 2D form of + shape = (nexamples / batch_size) x batch_size. The remainder + is managed by drawing randomly from 1:nexamples. + + Args: + nexamples: number of examples to randomize + batch_size: number of elements in batch + + Returns: + The randomized, properly shaped indicies. + """ + assert nexamples > batch_size, "Problems" + bmrem = batch_size - nexamples % batch_size + bmrem_examples = [] + if bmrem < batch_size: + bmrem_examples = np.random.choice(range(nexamples), + size=bmrem, replace=False) + example_idxs = range(nexamples) + list(bmrem_examples) + mixed_example_idxs = np.random.permutation(example_idxs) + example_idxs_e_x_edivb = np.reshape(mixed_example_idxs, [-1, batch_size]) + return example_idxs_e_x_edivb, bmrem + + def shuffle_spikes_in_time(self, data_bxtxd): + """Shuffle the spikes in the temporal dimension. This is useful to + help the LFADS system avoid overfitting to individual spikes or fast + oscillations found in the data that are irrelevant to behavior. A + pure 'tabula rasa' approach would avoid this, but LFADS is sensitive + enough to pick up dynamics that you may not want. + + Args: + data_bxtxd: numpy array of spike count data to be shuffled. + Returns: + S_bxtxd, a numpy array with the same dimensions and contents as + data_bxtxd, but shuffled appropriately. + + """ + + B, T, N = data_bxtxd.shape + w = self.hps.temporal_spike_jitter_width + + if w == 0: + return data_bxtxd + + max_counts = np.max(data_bxtxd) + S_bxtxd = np.zeros([B,T,N]) + + # Intuitively, shuffle spike occurances, 0 or 1, but since we have counts, + # Do it over and over again up to the max count. 
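The comment above describes the decomposition the loop below relies on: a count array equals the sum, over thresholds mc = 1..max_count, of the indicators (counts >= mc), so each binary "layer" can be jittered in time as if it held single spikes. A standalone numpy check of that identity (illustrative only):

```
import numpy as np

counts = np.array([[0, 2, 1, 3]])
layers = [(counts >= mc).astype(int) for mc in range(1, counts.max() + 1)]
print(np.sum(layers, axis=0))   # [[0 2 1 3]] -- the layers sum back to the counts
```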
+ for mc in range(1,max_counts+1): + idxs = np.nonzero(data_bxtxd >= mc) + + data_ones = np.zeros_like(data_bxtxd) + data_ones[data_bxtxd >= mc] = 1 + + nfound = len(idxs[0]) + shuffles_incrs_in_time = np.random.randint(-w, w, size=nfound) + + shuffle_tidxs = idxs[1].copy() + shuffle_tidxs += shuffles_incrs_in_time + + # Reflect on the boundaries to not lose mass. + shuffle_tidxs[shuffle_tidxs < 0] = -shuffle_tidxs[shuffle_tidxs < 0] + shuffle_tidxs[shuffle_tidxs > T-1] = \ + (T-1)-(shuffle_tidxs[shuffle_tidxs > T-1] -(T-1)) + + for iii in zip(idxs[0], shuffle_tidxs, idxs[2]): + S_bxtxd[iii] += 1 + + return S_bxtxd + + def shuffle_and_flatten_datasets(self, datasets, kind='train'): + """Since LFADS supports multiple datasets in the same dynamical model, + we have to be careful to use all the data in a single training epoch. But + since the datasets my have different data dimensionality, we cannot batch + examples from data dictionaries together. Instead, we generate random + batches within each data dictionary, and then randomize these batches + while holding onto the dataname, so that when it's time to feed + the graph, the correct in/out matrices can be selected, per batch. + + Args: + datasets: A dict of data dicts. The dataset dict is simply a + name(string)-> data dictionary mapping (See top of lfads.py). + kind: 'train' or 'valid' + + Returns: + A flat list, in which each element is a pair ('name', indices). + """ + batch_size = self.hps.batch_size + ndatasets = len(datasets) + random_example_idxs = {} + epoch_idxs = {} + all_name_example_idx_pairs = [] + kind_data = kind + '_data' + for name, data_dict in datasets.items(): + nexamples, ntime, data_dim = data_dict[kind_data].shape + epoch_idxs[name] = 0 + random_example_idxs, _ = \ + self.randomize_example_idxs_mod_batch_size(nexamples, batch_size) + + epoch_size = random_example_idxs.shape[0] + names = [name] * epoch_size + all_name_example_idx_pairs += zip(names, random_example_idxs) + + np.random.shuffle(all_name_example_idx_pairs) # shuffle in place + + return all_name_example_idx_pairs + + def train_epoch(self, datasets, batch_size=None, do_save_ckpt=True): + """Train the model through the entire dataset once. + + Args: + datasets: A dict of data dicts. The dataset dict is simply a + name(string)-> data dictionary mapping (See top of lfads.py). + batch_size (optional): The batch_size to use + do_save_ckpt (optional): Should the routine save a checkpoint on this + training epoch? + + Returns: + A tuple with 6 float values: + (total cost of the epoch, epoch reconstruction cost, + epoch kl cost, KL weight used this training epoch, + total l2 cost on generator, and the corresponding weight). 
+ """ + ops_to_eval = [self.cost, self.recon_cost, + self.kl_cost, self.kl_weight, + self.l2_cost, self.l2_weight, + self.train_op] + collected_op_values = self.run_epoch(datasets, ops_to_eval, kind="train") + + total_cost = total_recon_cost = total_kl_cost = 0.0 + # normalizing by batch done in distributions.py + epoch_size = len(collected_op_values) + for op_values in collected_op_values: + total_cost += op_values[0] + total_recon_cost += op_values[1] + total_kl_cost += op_values[2] + + kl_weight = collected_op_values[-1][3] + l2_cost = collected_op_values[-1][4] + l2_weight = collected_op_values[-1][5] + + epoch_total_cost = total_cost / epoch_size + epoch_recon_cost = total_recon_cost / epoch_size + epoch_kl_cost = total_kl_cost / epoch_size + + if do_save_ckpt: + session = tf.get_default_session() + checkpoint_path = os.path.join(self.hps.lfads_save_dir, + self.hps.checkpoint_name + '.ckpt') + self.seso_saver.save(session, checkpoint_path, + global_step=self.train_step) + + return epoch_total_cost, epoch_recon_cost, epoch_kl_cost, \ + kl_weight, l2_cost, l2_weight + + + def run_epoch(self, datasets, ops_to_eval, kind="train", batch_size=None, + do_collect=True, keep_prob=None): + """Run the model through the entire dataset once. + + Args: + datasets: A dict of data dicts. The dataset dict is simply a + name(string)-> data dictionary mapping (See top of lfads.py). + ops_to_eval: A list of tensorflow operations that will be evaluated in + the tf.session.run() call. + batch_size (optional): The batch_size to use + do_collect (optional): Should the routine collect all session.run + output as a list, and return it? + keep_prob (optional): The dropout keep probability. + + Returns: + A list of lists, the internal list is the return for the ops for each + session.run() call. The outer list collects over the epoch. + """ + hps = self.hps + all_name_example_idx_pairs = \ + self.shuffle_and_flatten_datasets(datasets, kind) + + kind_data = kind + '_data' + kind_ext_input = kind + '_ext_input' + + total_cost = total_recon_cost = total_kl_cost = 0.0 + session = tf.get_default_session() + epoch_size = len(all_name_example_idx_pairs) + evaled_ops_list = [] + for name, example_idxs in all_name_example_idx_pairs: + data_dict = datasets[name] + data_extxd = data_dict[kind_data] + if hps.output_dist == 'poisson' and hps.temporal_spike_jitter_width > 0: + data_extxd = self.shuffle_spikes_in_time(data_extxd) + + ext_input_extxi = data_dict[kind_ext_input] + data_bxtxd, ext_input_bxtxi = self.get_batch(data_extxd, ext_input_extxi, + example_idxs=example_idxs) + + feed_dict = self.build_feed_dict(name, data_bxtxd, ext_input_bxtxi, + keep_prob=keep_prob) + evaled_ops_np = session.run(ops_to_eval, feed_dict=feed_dict) + if do_collect: + evaled_ops_list.append(evaled_ops_np) + + return evaled_ops_list + + def summarize_all(self, datasets, summary_values): + """Plot and summarize stuff in tensorboard. + + Note that everything done in the current function is otherwise done on + a single, randomly selected dataset (except for summary_values, which are + passed in.) + + Args: + datasets, the dictionary of datasets used in the study. + summary_values: These summary values are created from the training loop, + and so summarize the entire set of datasets. 
+ """ + hps = self.hps + tr_kl_cost = summary_values['tr_kl_cost'] + tr_recon_cost = summary_values['tr_recon_cost'] + tr_total_cost = summary_values['tr_total_cost'] + kl_weight = summary_values['kl_weight'] + l2_weight = summary_values['l2_weight'] + l2_cost = summary_values['l2_cost'] + has_any_valid_set = summary_values['has_any_valid_set'] + i = summary_values['nepochs'] + + session = tf.get_default_session() + train_summ, train_step = session.run([self.merged_train, + self.train_step], + feed_dict={self.l2_cost_ph:l2_cost, + self.kl_cost_ph:tr_kl_cost, + self.recon_cost_ph:tr_recon_cost, + self.total_cost_ph:tr_total_cost}) + self.writer.add_summary(train_summ, train_step) + if has_any_valid_set: + ev_kl_cost = summary_values['ev_kl_cost'] + ev_recon_cost = summary_values['ev_recon_cost'] + ev_total_cost = summary_values['ev_total_cost'] + eval_summ = session.run(self.merged_valid, + feed_dict={self.kl_cost_ph:ev_kl_cost, + self.recon_cost_ph:ev_recon_cost, + self.total_cost_ph:ev_total_cost}) + self.writer.add_summary(eval_summ, train_step) + print("Epoch:%d, step:%d (TRAIN, VALID): total: %.2f, %.2f\ + recon: %.2f, %.2f, kl: %.2f, %.2f, l2: %.5f,\ + kl weight: %.2f, l2 weight: %.2f" % \ + (i, train_step, tr_total_cost, ev_total_cost, + tr_recon_cost, ev_recon_cost, tr_kl_cost, ev_kl_cost, + l2_cost, kl_weight, l2_weight)) + + csv_outstr = "epoch,%d, step,%d, total,%.2f,%.2f, \ + recon,%.2f,%.2f, kl,%.2f,%.2f, l2,%.5f, \ + klweight,%.2f, l2weight,%.2f\n"% \ + (i, train_step, tr_total_cost, ev_total_cost, + tr_recon_cost, ev_recon_cost, tr_kl_cost, ev_kl_cost, + l2_cost, kl_weight, l2_weight) + + else: + print("Epoch:%d, step:%d TRAIN: total: %.2f recon: %.2f, kl: %.2f,\ + l2: %.5f, kl weight: %.2f, l2 weight: %.2f" % \ + (i, train_step, tr_total_cost, tr_recon_cost, tr_kl_cost, + l2_cost, kl_weight, l2_weight)) + csv_outstr = "epoch,%d, step,%d, total,%.2f, recon,%.2f, kl,%.2f, \ + l2,%.5f, klweight,%.2f, l2weight,%.2f\n"% \ + (i, train_step, tr_total_cost, tr_recon_cost, + tr_kl_cost, l2_cost, kl_weight, l2_weight) + + if self.hps.csv_log: + csv_file = os.path.join(self.hps.lfads_save_dir, self.hps.csv_log+'.csv') + with open(csv_file, "a") as myfile: + myfile.write(csv_outstr) + + + def plot_single_example(self, datasets): + """Plot an image relating to a randomly chosen, specific example. We use + posterior sample and average by taking one example, and filling a whole + batch with that example, sample from the posterior, and then average the + quantities. 
+ + """ + hps = self.hps + all_data_names = datasets.keys() + data_name = np.random.permutation(all_data_names)[0] + data_dict = datasets[data_name] + has_valid_set = True if data_dict['valid_data'] is not None else False + cf = 1.0 # plotting concern + + # posterior sample and average here + E, _, _ = data_dict['train_data'].shape + eidx = np.random.choice(E) + example_idxs = eidx * np.ones(hps.batch_size, dtype=np.int32) + + train_data_bxtxd, train_ext_input_bxtxi = \ + self.get_batch(data_dict['train_data'], data_dict['train_ext_input'], + example_idxs=example_idxs) + + truth_train_data_bxtxd = None + if 'train_truth' in data_dict and data_dict['train_truth'] is not None: + truth_train_data_bxtxd, _ = self.get_batch(data_dict['train_truth'], + example_idxs=example_idxs) + cf = data_dict['conversion_factor'] + + # plotter does averaging + train_model_values = self.eval_model_runs_batch(data_name, + train_data_bxtxd, + train_ext_input_bxtxi, + do_average_batch=False) + + train_step = train_model_values['train_steps'] + feed_dict = self.build_feed_dict(data_name, train_data_bxtxd, + train_ext_input_bxtxi, keep_prob=1.0) + + session = tf.get_default_session() + generic_summ = session.run(self.merged_generic, feed_dict=feed_dict) + self.writer.add_summary(generic_summ, train_step) + + valid_data_bxtxd = valid_model_values = valid_ext_input_bxtxi = None + truth_valid_data_bxtxd = None + if has_valid_set: + E, _, _ = data_dict['valid_data'].shape + eidx = np.random.choice(E) + example_idxs = eidx * np.ones(hps.batch_size, dtype=np.int32) + valid_data_bxtxd, valid_ext_input_bxtxi = \ + self.get_batch(data_dict['valid_data'], + data_dict['valid_ext_input'], + example_idxs=example_idxs) + if 'valid_truth' in data_dict and data_dict['valid_truth'] is not None: + truth_valid_data_bxtxd, _ = self.get_batch(data_dict['valid_truth'], + example_idxs=example_idxs) + else: + truth_valid_data_bxtxd = None + + # plotter does averaging + valid_model_values = self.eval_model_runs_batch(data_name, + valid_data_bxtxd, + valid_ext_input_bxtxi, + do_average_batch=False) + + example_image = plot_lfads(train_bxtxd=train_data_bxtxd, + train_model_vals=train_model_values, + train_ext_input_bxtxi=train_ext_input_bxtxi, + train_truth_bxtxd=truth_train_data_bxtxd, + valid_bxtxd=valid_data_bxtxd, + valid_model_vals=valid_model_values, + valid_ext_input_bxtxi=valid_ext_input_bxtxi, + valid_truth_bxtxd=truth_valid_data_bxtxd, + bidx=None, cf=cf, output_dist=hps.output_dist) + example_image = np.expand_dims(example_image, axis=0) + example_summ = session.run(self.merged_examples, + feed_dict={self.example_image : example_image}) + self.writer.add_summary(example_summ) + + def train_model(self, datasets): + """Train the model, print per-epoch information, and save checkpoints. + + Loop over training epochs. The function that actually does the + training is train_epoch. This function iterates over the training + data, one epoch at a time. The learning rate schedule is such + that it will stay the same until the cost goes up in comparison to + the last few values, then it will drop. + + Args: + datasets: A dict of data dicts. The dataset dict is simply a + name(string)-> data dictionary mapping (See top of lfads.py). 
+ """ + hps = self.hps + has_any_valid_set = False + for data_dict in datasets.values(): + if data_dict['valid_data'] is not None: + has_any_valid_set = True + break + + session = tf.get_default_session() + lr = session.run(self.learning_rate) + lr_stop = hps.learning_rate_stop + i = -1 + train_costs = [] + valid_costs = [] + ev_total_cost = ev_recon_cost = ev_kl_cost = 0.0 + lowest_ev_cost = np.Inf + while True: + i += 1 + do_save_ckpt = True if i % 10 ==0 else False + tr_total_cost, tr_recon_cost, tr_kl_cost, kl_weight, l2_cost, l2_weight = \ + self.train_epoch(datasets, do_save_ckpt=do_save_ckpt) + + # Evaluate the validation cost, and potentially save. Note that this + # routine will not save a validation checkpoint until the kl weight and + # l2 weights are equal to 1.0. + if has_any_valid_set: + ev_total_cost, ev_recon_cost, ev_kl_cost = \ + self.eval_cost_epoch(datasets, kind='valid') + valid_costs.append(ev_total_cost) + + # > 1 may give more consistent results, but not the actual lowest vae. + # == 1 gives the lowest vae seen so far. + n_lve = 1 + run_avg_lve = np.mean(valid_costs[-n_lve:]) + + # conditions for saving checkpoints: + # KL weight must have finished stepping (>=1.0), AND + # L2 weight must have finished stepping OR L2 is not being used, AND + # the current run has a lower LVE than previous runs AND + # len(valid_costs > n_lve) (not sure what that does) + if kl_weight >= 1.0 and \ + (l2_weight >= 1.0 or \ + (self.hps.l2_gen_scale == 0.0 and self.hps.l2_con_scale == 0.0)) \ + and (len(valid_costs) > n_lve and run_avg_lve < lowest_ev_cost): + + lowest_ev_cost = run_avg_lve + checkpoint_path = os.path.join(self.hps.lfads_save_dir, + self.hps.checkpoint_name + '_lve.ckpt') + self.lve_saver.save(session, checkpoint_path, + global_step=self.train_step, + latest_filename='checkpoint_lve') + + # Plot and summarize. + values = {'nepochs':i, 'has_any_valid_set': has_any_valid_set, + 'tr_total_cost':tr_total_cost, 'ev_total_cost':ev_total_cost, + 'tr_recon_cost':tr_recon_cost, 'ev_recon_cost':ev_recon_cost, + 'tr_kl_cost':tr_kl_cost, 'ev_kl_cost':ev_kl_cost, + 'l2_weight':l2_weight, 'kl_weight':kl_weight, + 'l2_cost':l2_cost} + self.summarize_all(datasets, values) + self.plot_single_example(datasets) + + # Manage learning rate. + train_res = tr_total_cost + n_lr = hps.learning_rate_n_to_compare + if len(train_costs) > n_lr and train_res > np.max(train_costs[-n_lr:]): + _ = session.run(self.learning_rate_decay_op) + lr = session.run(self.learning_rate) + print(" Decreasing learning rate to %f." % lr) + # Force the system to run n_lr times while at this lr. + train_costs.append(np.inf) + else: + train_costs.append(train_res) + + if lr < lr_stop: + print("Stopping optimization based on learning rate criteria.") + break + + def eval_cost_epoch(self, datasets, kind='train', ext_input_extxi=None, + batch_size=None): + """Evaluate the cost of the epoch. + + Args: + data_dict: The dictionary of data (training and validation) used for + training and evaluation of the model, respectively. 
+
+    Returns:
+      a 3 tuple of costs:
+        (epoch total cost, epoch reconstruction cost, epoch KL cost)
+    """
+    ops_to_eval = [self.cost, self.recon_cost, self.kl_cost]
+    collected_op_values = self.run_epoch(datasets, ops_to_eval, kind=kind,
+                                         keep_prob=1.0)
+
+    total_cost = total_recon_cost = total_kl_cost = 0.0
+    # normalizing by batch done in distributions.py
+    epoch_size = len(collected_op_values)
+    for op_values in collected_op_values:
+      total_cost += op_values[0]
+      total_recon_cost += op_values[1]
+      total_kl_cost += op_values[2]
+
+    epoch_total_cost = total_cost / epoch_size
+    epoch_recon_cost = total_recon_cost / epoch_size
+    epoch_kl_cost = total_kl_cost / epoch_size
+
+    return epoch_total_cost, epoch_recon_cost, epoch_kl_cost
+
+  def eval_model_runs_batch(self, data_name, data_bxtxd, ext_input_bxtxi=None,
+                            do_eval_cost=False, do_average_batch=False):
+    """Returns all the goodies for the entire model, per batch.
+
+    Args:
+      data_name: The name of the data dict, to select which in/out matrices
+        to use.
+      data_bxtxd: Numpy array training data with shape:
+        batch_size x # time steps x # dimensions
+      ext_input_bxtxi: Numpy array training external input with shape:
+        batch_size x # time steps x # external input dims
+      do_eval_cost (optional): If true, evaluate the IWAE (Importance Weighted
+        Autoencoder) log likelihood bound, instead of the VAE version.
+      do_average_batch (optional): average over the batch, useful for getting
+        good IWAE costs, and model outputs for a single data point.
+
+    Returns:
+      A dictionary with the outputs of the model decoder, namely:
+        prior g0 mean, prior g0 variance, approximate posterior mean,
+        approximate posterior variance, the generator initial conditions, the
+        control inputs (if enabled), the state of the generator, the factors,
+        and the rates.
+    """
+    session = tf.get_default_session()
+    feed_dict = self.build_feed_dict(data_name, data_bxtxd,
+                                     ext_input_bxtxi, keep_prob=1.0)
+
+    # Non-temporal signals will be batch x dim.
+    # Temporal signals are list length T with elements batch x dim.
+    tf_vals = [self.gen_ics, self.gen_states, self.factors,
+               self.output_dist_params]
+    tf_vals.append(self.cost)
+    tf_vals.append(self.nll_bound_vae)
+    tf_vals.append(self.nll_bound_iwae)
+    tf_vals.append(self.train_step) # not train_op!
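eval_model_runs_batch, together with eval_model_runs_avg_epoch further below, implements the posterior-sample-and-average denoising step: a single trial is tiled across the batch, each batch element draws its own latent sample, and averaging the decoded outputs over the batch approximates the posterior expectation. A toy standalone numpy sketch of that estimator (names, shapes, and the decoder are illustrative, not LFADS code):

```
import numpy as np

rng = np.random.RandomState(0)
batch_size, z_dim = 64, 3
post_mean = np.array([0.2, -0.1, 0.4])          # posterior stats for one tiled trial
post_logvar = np.array([-1.0, -1.0, -1.0])

def decode(z_bxd):
  # Stand-in for generator + readout; any deterministic decoder works here.
  return np.tanh(z_bxd).sum(axis=-1, keepdims=True)

z_bxd = post_mean + np.exp(0.5 * post_logvar) * rng.randn(batch_size, z_dim)
avg_output = decode(z_bxd).mean(axis=0)         # approximates E_q[decode(z)]
print(avg_output)
```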
+ if self.hps.ic_dim > 0: + tf_vals += [self.prior_zs_g0.mean, self.prior_zs_g0.logvar, + self.posterior_zs_g0.mean, self.posterior_zs_g0.logvar] + if self.hps.co_dim > 0: + tf_vals.append(self.controller_outputs) + tf_vals_flat, fidxs = flatten(tf_vals) + + np_vals_flat = session.run(tf_vals_flat, feed_dict=feed_dict) + + ff = 0 + gen_ics = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + gen_states = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + factors = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + out_dist_params = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + costs = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + nll_bound_vaes = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + nll_bound_iwaes = [np_vals_flat[f] for f in fidxs[ff]]; ff +=1 + train_steps = [np_vals_flat[f] for f in fidxs[ff]]; ff +=1 + if self.hps.ic_dim > 0: + prior_g0_mean = [np_vals_flat[f] for f in fidxs[ff]]; ff +=1 + prior_g0_logvar = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + post_g0_mean = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + post_g0_logvar = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + if self.hps.co_dim > 0: + controller_outputs = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + + # [0] are to take out the non-temporal items from lists + gen_ics = gen_ics[0] + costs = costs[0] + nll_bound_vaes = nll_bound_vaes[0] + nll_bound_iwaes = nll_bound_iwaes[0] + train_steps = train_steps[0] + + # Convert to full tensors, not lists of tensors in time dim. + gen_states = list_t_bxn_to_tensor_bxtxn(gen_states) + factors = list_t_bxn_to_tensor_bxtxn(factors) + out_dist_params = list_t_bxn_to_tensor_bxtxn(out_dist_params) + if self.hps.ic_dim > 0: + prior_g0_mean = prior_g0_mean[0] + prior_g0_logvar = prior_g0_logvar[0] + post_g0_mean = post_g0_mean[0] + post_g0_logvar = post_g0_logvar[0] + if self.hps.co_dim > 0: + controller_outputs = list_t_bxn_to_tensor_bxtxn(controller_outputs) + + if do_average_batch: + gen_ics = np.mean(gen_ics, axis=0) + gen_states = np.mean(gen_states, axis=0) + factors = np.mean(factors, axis=0) + out_dist_params = np.mean(out_dist_params, axis=0) + if self.hps.ic_dim > 0: + prior_g0_mean = np.mean(prior_g0_mean, axis=0) + prior_g0_logvar = np.mean(prior_g0_logvar, axis=0) + post_g0_mean = np.mean(post_g0_mean, axis=0) + post_g0_logvar = np.mean(post_g0_logvar, axis=0) + if self.hps.co_dim > 0: + controller_outputs = np.mean(controller_outputs, axis=0) + + model_vals = {} + model_vals['gen_ics'] = gen_ics + model_vals['gen_states'] = gen_states + model_vals['factors'] = factors + model_vals['output_dist_params'] = out_dist_params + model_vals['costs'] = costs + model_vals['nll_bound_vaes'] = nll_bound_vaes + model_vals['nll_bound_iwaes'] = nll_bound_iwaes + model_vals['train_steps'] = train_steps + if self.hps.ic_dim > 0: + model_vals['prior_g0_mean'] = prior_g0_mean + model_vals['prior_g0_logvar'] = prior_g0_logvar + model_vals['post_g0_mean'] = post_g0_mean + model_vals['post_g0_logvar'] = post_g0_logvar + if self.hps.co_dim > 0: + model_vals['controller_outputs'] = controller_outputs + + return model_vals + + def eval_model_runs_avg_epoch(self, data_name, data_extxd, + ext_input_extxi=None): + """Returns all the expected value for goodies for the entire model. + + The expected value is taken over hidden (z) variables, namely the initial + conditions and the control inputs. The expected value is approximate, and + accomplished via sampling (batch_size) samples for every examples. + + Args: + data_name: The name of the data dict, to select which in/out matrices + to use. 
+ data_extxd: Numpy array training data with shape: + # examples x # time steps x # dimensions + ext_input_extxi (optional): Numpy array training external input with + shape: # examples x # time steps x # external input dims + + Returns: + A dictionary with the averaged outputs of the model decoder, namely: + prior g0 mean, prior g0 variance, approx. posterior mean, approx + posterior mean, the generator initial conditions, the control inputs (if + enabled), the state of the generator, the factors, and the output + distribution parameters, e.g. (rates or mean and variances). + """ + hps = self.hps + batch_size = hps.batch_size + E, T, D = data_extxd.shape + E_to_process = hps.ps_nexamples_to_process + if E_to_process > E: + print("Setting number of posterior samples to process to : ", E) + E_to_process = E + + if hps.ic_dim > 0: + prior_g0_mean = np.zeros([E_to_process, hps.ic_dim]) + prior_g0_logvar = np.zeros([E_to_process, hps.ic_dim]) + post_g0_mean = np.zeros([E_to_process, hps.ic_dim]) + post_g0_logvar = np.zeros([E_to_process, hps.ic_dim]) + + if hps.co_dim > 0: + controller_outputs = np.zeros([E_to_process, T, hps.co_dim]) + gen_ics = np.zeros([E_to_process, hps.gen_dim]) + gen_states = np.zeros([E_to_process, T, hps.gen_dim]) + factors = np.zeros([E_to_process, T, hps.factors_dim]) + + if hps.output_dist == 'poisson': + out_dist_params = np.zeros([E_to_process, T, D]) + elif hps.output_dist == 'gaussian': + out_dist_params = np.zeros([E_to_process, T, D+D]) + else: + assert False, "NIY" + + costs = np.zeros(E_to_process) + nll_bound_vaes = np.zeros(E_to_process) + nll_bound_iwaes = np.zeros(E_to_process) + train_steps = np.zeros(E_to_process) + for es_idx in range(E_to_process): + print("Running %d of %d." % (es_idx+1, E_to_process)) + example_idxs = es_idx * np.ones(batch_size, dtype=np.int32) + data_bxtxd, ext_input_bxtxi = self.get_batch(data_extxd, + ext_input_extxi, + batch_size=batch_size, + example_idxs=example_idxs) + model_values = self.eval_model_runs_batch(data_name, data_bxtxd, + ext_input_bxtxi, + do_eval_cost=True, + do_average_batch=True) + + if self.hps.ic_dim > 0: + prior_g0_mean[es_idx,:] = model_values['prior_g0_mean'] + prior_g0_logvar[es_idx,:] = model_values['prior_g0_logvar'] + post_g0_mean[es_idx,:] = model_values['post_g0_mean'] + post_g0_logvar[es_idx,:] = model_values['post_g0_logvar'] + gen_ics[es_idx,:] = model_values['gen_ics'] + + if self.hps.co_dim > 0: + controller_outputs[es_idx,:,:] = model_values['controller_outputs'] + gen_states[es_idx,:,:] = model_values['gen_states'] + factors[es_idx,:,:] = model_values['factors'] + out_dist_params[es_idx,:,:] = model_values['output_dist_params'] + costs[es_idx] = model_values['costs'] + nll_bound_vaes[es_idx] = model_values['nll_bound_vaes'] + nll_bound_iwaes[es_idx] = model_values['nll_bound_iwaes'] + train_steps[es_idx] = model_values['train_steps'] + print('bound nll(vae): %.3f, bound nll(iwae): %.3f' \ + % (nll_bound_vaes[es_idx], nll_bound_iwaes[es_idx])) + + model_runs = {} + if self.hps.ic_dim > 0: + model_runs['prior_g0_mean'] = prior_g0_mean + model_runs['prior_g0_logvar'] = prior_g0_logvar + model_runs['post_g0_mean'] = post_g0_mean + model_runs['post_g0_logvar'] = post_g0_logvar + model_runs['gen_ics'] = gen_ics + + if self.hps.co_dim > 0: + model_runs['controller_outputs'] = controller_outputs + model_runs['gen_states'] = gen_states + model_runs['factors'] = factors + model_runs['output_dist_params'] = out_dist_params + model_runs['costs'] = costs + model_runs['nll_bound_vaes'] = nll_bound_vaes 
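The loop above approximates the posterior expectation for each example by replicating that example batch_size times, running the stochastic model on every copy, and averaging over the batch axis. A small numpy-only sketch of that Monte Carlo step follows; all names and values are illustrative.

```
import numpy as np

rng = np.random.RandomState(0)
batch_size, T, D = 128, 10, 3
one_example_txd = rng.rand(T, D)
# Replicate the single example across the batch, as example_idxs does above.
replicated_bxtxd = np.tile(one_example_txd[None, :, :], (batch_size, 1, 1))
# Stand-in for running the model with freshly sampled latent variables.
sampled_runs_bxtxd = replicated_bxtxd + 0.1 * rng.randn(batch_size, T, D)
# do_average_batch=True corresponds to this mean over the batch axis.
posterior_average_txd = np.mean(sampled_runs_bxtxd, axis=0)
print(posterior_average_txd.shape)   # (10, 3)
```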
+ model_runs['nll_bound_iwaes'] = nll_bound_iwaes + model_runs['train_steps'] = train_steps + return model_runs + + def write_model_runs(self, datasets, output_fname=None): + """Run the model on the data in data_dict, and save the computed values. + + LFADS generates a number of outputs for each examples, and these are all + saved. They are: + The mean and variance of the prior of g0. + The mean and variance of approximate posterior of g0. + The control inputs (if enabled) + The initial conditions, g0, for all examples. + The generator states for all time. + The factors for all time. + The output distribution parameters (e.g. rates) for all time. + + Args: + datasets: a dictionary of named data_dictionaries, see top of lfads.py + output_fname: a file name stem for the output files. + """ + hps = self.hps + kind = hps.kind + + for data_name, data_dict in datasets.items(): + data_tuple = [('train', data_dict['train_data'], + data_dict['train_ext_input']), + ('valid', data_dict['valid_data'], + data_dict['valid_ext_input'])] + for data_kind, data_extxd, ext_input_extxi in data_tuple: + if not output_fname: + fname = "model_runs_" + data_name + '_' + data_kind + '_' + kind + else: + fname = output_fname + data_name + '_' + data_kind + '_' + kind + + print("Writing data for %s data and kind %s." % (data_name, data_kind)) + model_runs = self.eval_model_runs_avg_epoch(data_name, data_extxd, + ext_input_extxi) + full_fname = os.path.join(hps.lfads_save_dir, fname) + write_data(full_fname, model_runs, compression='gzip') + print("Done.") + + def write_model_samples(self, dataset_name, output_fname=None): + """Use the prior distribution to generate batch_size number of samples + from the model. + + LFADS generates a number of outputs for each sample, and these are all + saved. They are: + The mean and variance of the prior of g0. + The control inputs (if enabled) + The initial conditions, g0, for all examples. + The generator states for all time. + The factors for all time. + The output distribution parameters (e.g. rates) for all time. + + Args: + dataset_name: The name of the dataset to grab the factors -> rates + alignment matrices from. + output_fname: The name of the file in which to save the generated + samples. + """ + hps = self.hps + batch_size = hps.batch_size + + print("Generating %d samples" % (batch_size)) + tf_vals = [self.factors, self.gen_states, self.gen_ics, + self.cost, self.output_dist_params] + if hps.ic_dim > 0: + tf_vals += [self.prior_zs_g0.mean, self.prior_zs_g0.logvar] + if hps.co_dim > 0: + tf_vals += [self.prior_zs_ar_con.samples_t] + tf_vals_flat, fidxs = flatten(tf_vals) + + session = tf.get_default_session() + feed_dict = {} + feed_dict[self.dataName] = dataset_name + feed_dict[self.keep_prob] = 1.0 + + np_vals_flat = session.run(tf_vals_flat, feed_dict=feed_dict) + + ff = 0 + factors = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + gen_states = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + gen_ics = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + costs = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + output_dist_params = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + if hps.ic_dim > 0: + prior_g0_mean = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + prior_g0_logvar = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + if hps.co_dim > 0: + prior_zs_ar_con = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1 + + # [0] are to take out the non-temporal items from lists + gen_ics = gen_ics[0] + costs = costs[0] + + # Convert to full tensors, not lists of tensors in time dim. 
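The files written above can be inspected offline. The sketch below reads one back, assuming, as the compression='gzip' argument suggests, that write_data produces an HDF5 file with one dataset per key of model_runs; the save directory and dataset name are hypothetical and only the filename pattern comes from the code above.

```
import os
import h5py

lfads_save_dir = "/tmp/lfads_chaotic_rnn_inputs_g1p5/"   # hypothetical
fname = "model_runs_dataset_N50_S50_train_posterior_sample_and_average"
with h5py.File(os.path.join(lfads_save_dir, fname), "r") as hf:
  factors = hf["factors"][:]                 # examples x time x factors_dim
  rates = hf["output_dist_params"][:]        # examples x time x data_dim
  print(factors.shape, rates.shape)
```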
+ gen_states = list_t_bxn_to_tensor_bxtxn(gen_states) + factors = list_t_bxn_to_tensor_bxtxn(factors) + output_dist_params = list_t_bxn_to_tensor_bxtxn(output_dist_params) + if hps.ic_dim > 0: + prior_g0_mean = prior_g0_mean[0] + prior_g0_logvar = prior_g0_logvar[0] + if hps.co_dim > 0: + prior_zs_ar_con = list_t_bxn_to_tensor_bxtxn(prior_zs_ar_con) + + model_vals = {} + model_vals['gen_ics'] = gen_ics + model_vals['gen_states'] = gen_states + model_vals['factors'] = factors + model_vals['output_dist_params'] = output_dist_params + model_vals['costs'] = costs.reshape(1) + if hps.ic_dim > 0: + model_vals['prior_g0_mean'] = prior_g0_mean + model_vals['prior_g0_logvar'] = prior_g0_logvar + if hps.co_dim > 0: + model_vals['prior_zs_ar_con'] = prior_zs_ar_con + + full_fname = os.path.join(hps.lfads_save_dir, output_fname) + write_data(full_fname, model_vals, compression='gzip') + print("Done.") + + @staticmethod + def eval_model_parameters(use_nested=True, include_strs=None): + """Evaluate and return all of the TF variables in the model. + + Args: + use_nested (optional): For returning values, use a nested dictoinary, based + on variable scoping, or return all variables in a flat dictionary. + include_strs (optional): A list of strings to use as a filter, to reduce the + number of variables returned. A variable name must contain at least one + string in include_strs as a sub-string in order to be returned. + + Returns: + The parameters of the model. This can be in a flat + dictionary, or a nested dictionary, where the nesting is by variable + scope. + """ + all_tf_vars = tf.global_variables() + session = tf.get_default_session() + all_tf_vars_eval = session.run(all_tf_vars) + vars_dict = {} + strs = ["LFADS"] + if include_strs: + strs += include_strs + + for i, (var, var_eval) in enumerate(zip(all_tf_vars, all_tf_vars_eval)): + if any(s in include_strs for s in var.name): + if not isinstance(var_eval, np.ndarray): # for H5PY + print(var.name, """ is not numpy array, saving as numpy array + with value: """, var_eval, type(var_eval)) + e = np.array(var_eval) + print(e, type(e)) + else: + e = var_eval + vars_dict[var.name] = e + + if not use_nested: + return vars_dict + + var_names = vars_dict.keys() + nested_vars_dict = {} + current_dict = nested_vars_dict + for v, var_name in enumerate(var_names): + var_split_name_list = var_name.split('/') + split_name_list_len = len(var_split_name_list) + current_dict = nested_vars_dict + for p, part in enumerate(var_split_name_list): + if p < split_name_list_len - 1: + if part in current_dict: + current_dict = current_dict[part] + else: + current_dict[part] = {} + current_dict = current_dict[part] + else: + current_dict[part] = vars_dict[var_name] + + return nested_vars_dict + + @staticmethod + def spikify_rates(rates_bxtxd): + """Randomly spikify underlying rates according a Poisson distribution + + Args: + rates_bxtxd: a numpy tensor with shape: + + Returns: + A numpy array with the same shape as rates_bxtxd, but with the event + counts. 
+ """ + + B,T,N = rates_bxtxd.shape + assert all([B > 0, N > 0]), "problems" + + # Because the rates are changing, there is nesting + spikes_bxtxd = np.zeros([B,T,N], dtype=np.int32) + for b in range(B): + for t in range(T): + for n in range(N): + rate = rates_bxtxd[b,t,n] + count = np.random.poisson(rate) + spikes_bxtxd[b,t,n] = count + + return spikes_bxtxd diff --git a/lfads/plot_lfads.py b/lfads/plot_lfads.py new file mode 100644 index 0000000000000000000000000000000000000000..b4ebba9f489b38de4b4f1dd69bcae45206c9fbf6 --- /dev/null +++ b/lfads/plot_lfads.py @@ -0,0 +1,223 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# ============================================================================== +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import matplotlib +matplotlib.use('Agg') +from matplotlib import pyplot as plt +import numpy as np +import tensorflow as tf + +def _plot_item(W, name, full_name, nspaces): + plt.figure() + if W.shape == (): + print(name, ": ", W) + elif W.shape[0] == 1: + plt.stem(W.T) + plt.title(full_name) + elif W.shape[1] == 1: + plt.stem(W) + plt.title(full_name) + else: + plt.imshow(np.abs(W), interpolation='nearest', cmap='jet'); + plt.colorbar() + plt.title(full_name) + + +def all_plot(d, full_name="", exclude="", nspaces=0): + """Recursively plot all the LFADS model parameters in the nested + dictionary.""" + for k, v in d.iteritems(): + this_name = full_name+"/"+k + if isinstance(v, dict): + all_plot(v, full_name=this_name, exclude=exclude, nspaces=nspaces+4) + else: + if exclude == "" or exclude not in this_name: + _plot_item(v, name=k, full_name=full_name+"/"+k, nspaces=nspaces+4) + + +def plot_priors(): + g0s_prior_mean_bxn = train_modelvals['prior_g0_mean'] + g0s_prior_var_bxn = train_modelvals['prior_g0_var'] + g0s_post_mean_bxn = train_modelvals['posterior_g0_mean'] + g0s_post_var_bxn = train_modelvals['posterior_g0_var'] + + plt.figure(figsize=(10,4), tight_layout=True); + plt.subplot(1,2,1) + plt.hist(g0s_post_mean_bxn.flatten(), bins=20, color='b'); + plt.hist(g0s_prior_mean_bxn.flatten(), bins=20, color='g'); + + plt.title('Histogram of Prior/Posterior Mean Values') + plt.subplot(1,2,2) + plt.hist((g0s_post_var_bxn.flatten()), bins=20, color='b'); + plt.hist((g0s_prior_var_bxn.flatten()), bins=20, color='g'); + plt.title('Histogram of Prior/Posterior Log Variance Values') + + plt.figure(figsize=(10,10), tight_layout=True) + plt.subplot(2,2,1) + plt.imshow(g0s_prior_mean_bxn.T, interpolation='nearest', cmap='jet') + plt.colorbar(fraction=0.025, pad=0.04) + plt.title('Prior g0 means') + + plt.subplot(2,2,2) + plt.imshow(g0s_post_mean_bxn.T, interpolation='nearest', cmap='jet') + plt.colorbar(fraction=0.025, pad=0.04) + plt.title('Posterior g0 means'); + + plt.subplot(2,2,3) + plt.imshow(g0s_prior_var_bxn.T, interpolation='nearest', cmap='jet') + plt.colorbar(fraction=0.025, pad=0.04) + plt.title('Prior g0 variance Values') + 
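spikify_rates above draws an independent Poisson count for every (trial, time, neuron) bin. Since np.random.poisson broadcasts over an array of rates, the same operation can be written without the nested loops; the following is a vectorized sketch, not the code used in this repository.

```
import numpy as np

def spikify_rates_vectorized(rates_bxtxd, seed=0):
  # np.random.poisson accepts an array of rates and returns counts of the
  # same shape, so no explicit loops over batch, time, or neurons are needed.
  rng = np.random.RandomState(seed)
  return rng.poisson(rates_bxtxd).astype(np.int32)

rates_bxtxd = np.full((2, 5, 3), 4.0)               # B x T x N rates
print(spikify_rates_vectorized(rates_bxtxd).shape)  # (2, 5, 3)
```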
+ plt.subplot(2,2,4) + plt.imshow(g0s_post_var_bxn.T, interpolation='nearest', cmap='jet') + plt.colorbar(fraction=0.025, pad=0.04) + plt.title('Posterior g0 variance Values') + + plt.figure(figsize=(10,5)) + plt.stem(np.sort(np.log(g0s_post_mean_bxn.std(axis=0)))); + plt.title('Log standard deviation of h0 means'); + + +def plot_time_series(vals_bxtxn, bidx=None, n_to_plot=np.inf, scale=1.0, + color='r', title=None): + + if bidx is None: + vals_txn = np.mean(vals_bxtxn, axis=0) + else: + vals_txn = vals_bxtxn[bidx,:,:] + + T, N = vals_txn.shape + if n_to_plot > N: + n_to_plot = N + + plt.plot(vals_txn[:,0:n_to_plot] + scale*np.array(range(n_to_plot)), + color=color, lw=1.0) + plt.axis('tight') + if title: + plt.title(title) + + +def plot_lfads_timeseries(data_bxtxn, model_vals, ext_input_bxtxi=None, + truth_bxtxn=None, bidx=None, output_dist="poisson", + conversion_factor=1.0, subplot_cidx=0, + col_title=None): + + n_to_plot = 10 + scale = 1.0 + nrows = 7 + plt.subplot(nrows,2,1+subplot_cidx) + + if output_dist == 'poisson': + rates = means = conversion_factor * model_vals['output_dist_params'] + plot_time_series(rates, bidx, n_to_plot=n_to_plot, scale=scale, + title=col_title + " rates (LFADS - red, Truth - black)") + elif output_dist == 'gaussian': + means_vars = model_vals['output_dist_params'] + means, vars = np.split(means_vars,2, axis=2) # bxtxn + stds = np.sqrt(vars) + plot_time_series(means, bidx, n_to_plot=n_to_plot, scale=scale, + title=col_title + " means (LFADS - red, Truth - black)") + plot_time_series(means+stds, bidx, n_to_plot=n_to_plot, scale=scale, + color='c') + plot_time_series(means-stds, bidx, n_to_plot=n_to_plot, scale=scale, + color='c') + else: + assert 'NIY' + + + if truth_bxtxn is not None: + plot_time_series(truth_bxtxn, bidx, n_to_plot=n_to_plot, color='k', + scale=scale) + + input_title = "" + if "controller_outputs" in model_vals.keys(): + input_title += " Controller Output" + plt.subplot(nrows,2,3+subplot_cidx) + u_t = model_vals['controller_outputs'][0:-1] + plot_time_series(u_t, bidx, n_to_plot=n_to_plot, color='c', scale=1.0, + title=col_title + input_title) + + if ext_input_bxtxi is not None: + input_title += " External Input" + plot_time_series(ext_input_bxtxi, n_to_plot=n_to_plot, color='b', + scale=scale, title=col_title + input_title) + + plt.subplot(nrows,2,5+subplot_cidx) + plot_time_series(means, bidx, + n_to_plot=n_to_plot, scale=1.0, + title=col_title + " Spikes (LFADS - red, Spikes - black)") + plot_time_series(data_bxtxn, bidx, n_to_plot=n_to_plot, color='k', scale=1.0) + + plt.subplot(nrows,2,7+subplot_cidx) + plot_time_series(model_vals['factors'], bidx, n_to_plot=n_to_plot, color='b', + scale=2.0, title=col_title + " Factors") + + plt.subplot(nrows,2,9+subplot_cidx) + plot_time_series(model_vals['gen_states'], bidx, n_to_plot=n_to_plot, + color='g', scale=1.0, title=col_title + " Generator State") + + if bidx is not None: + data_nxt = data_bxtxn[bidx,:,:].T + params_nxt = model_vals['output_dist_params'][bidx,:,:].T + else: + data_nxt = np.mean(data_bxtxn, axis=0).T + params_nxt = np.mean(model_vals['output_dist_params'], axis=0).T + if output_dist == 'poisson': + means_nxt = params_nxt + elif output_dist == 'gaussian': # (means+vars) x time + means_nxt = np.vsplit(params_nxt,2)[0] # get means + else: + assert "NIY" + + plt.subplot(nrows,2,11+subplot_cidx) + plt.imshow(data_nxt, aspect='auto', interpolation='nearest') + plt.title(col_title + ' Data') + + plt.subplot(nrows,2,13+subplot_cidx) + plt.imshow(means_nxt, aspect='auto', 
interpolation='nearest') + plt.title(col_title + ' Means') + + +def plot_lfads(train_bxtxd, train_model_vals, + train_ext_input_bxtxi=None, train_truth_bxtxd=None, + valid_bxtxd=None, valid_model_vals=None, + valid_ext_input_bxtxi=None, valid_truth_bxtxd=None, + bidx=None, cf=1.0, output_dist='poisson'): + + # Plotting + f = plt.figure(figsize=(18,20), tight_layout=True) + plot_lfads_timeseries(train_bxtxd, train_model_vals, + train_ext_input_bxtxi, + truth_bxtxn=train_truth_bxtxd, + conversion_factor=cf, bidx=bidx, + output_dist=output_dist, col_title='Train') + plot_lfads_timeseries(valid_bxtxd, valid_model_vals, + valid_ext_input_bxtxi, + truth_bxtxn=valid_truth_bxtxd, + conversion_factor=cf, bidx=bidx, + output_dist=output_dist, + subplot_cidx=1, col_title='Valid') + + # Convert from figure to an numpy array width x height x 3 (last for RGB) + f.canvas.draw() + data = np.fromstring(f.canvas.tostring_rgb(), dtype=np.uint8, sep='') + data_wxhx3 = data.reshape(f.canvas.get_width_height()[::-1] + (3,)) + plt.close() + + return data_wxhx3 diff --git a/lfads/run_lfads.py b/lfads/run_lfads.py new file mode 100755 index 0000000000000000000000000000000000000000..74c5bd00a233c35c035fc1cadac7deedd5ee2519 --- /dev/null +++ b/lfads/run_lfads.py @@ -0,0 +1,778 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# ============================================================================== +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from lfads import LFADS +import numpy as np +import os +import tensorflow as tf +import re +import utils + +# Lots of hyperparameters, but most are pretty insensitive. The +# explanation of these hyperparameters is found below, in the flags +# session. + +CHECKPOINT_PB_LOAD_NAME = "checkpoint" +CHECKPOINT_NAME = "lfads_vae" +CSV_LOG = "fitlog" +OUTPUT_FILENAME_STEM = "" +DEVICE = "gpu:0" # "cpu:0", or other gpus, e.g. "gpu:1" +MAX_CKPT_TO_KEEP = 5 +MAX_CKPT_TO_KEEP_LVE = 5 +PS_NEXAMPLES_TO_PROCESS = 1e8 # if larger than number of examples, process all +EXT_INPUT_DIM = 0 +IC_DIM = 64 +FACTORS_DIM = 50 +IC_ENC_DIM = 128 +GEN_DIM = 200 +GEN_CELL_INPUT_WEIGHT_SCALE = 1.0 +GEN_CELL_REC_WEIGHT_SCALE = 1.0 +CELL_WEIGHT_SCALE = 1.0 +BATCH_SIZE = 128 +LEARNING_RATE_INIT = 0.01 +LEARNING_RATE_DECAY_FACTOR = 0.95 +LEARNING_RATE_STOP = 0.00001 +LEARNING_RATE_N_TO_COMPARE = 6 +INJECT_EXT_INPUT_TO_GEN = False +DO_TRAIN_IO_ONLY = False +DO_RESET_LEARNING_RATE = False +FEEDBACK_FACTORS_OR_RATES = "factors" + +# Calibrated just above the average value for the rnn synthetic data. 
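As a rough sanity check on the defaults above: the learning rate only ever shrinks by LEARNING_RATE_DECAY_FACTOR per reduction, so at most about 135 reductions can occur before LEARNING_RATE_STOP is reached and training halts. The reductions themselves are triggered adaptively, as described with the optimization flags below.

```
import math

lr_init, decay, lr_stop = 0.01, 0.95, 0.00001
n_reductions = math.ceil(math.log(lr_stop / lr_init) / math.log(decay))
print(n_reductions)                        # 135
print(lr_init * decay ** n_reductions)     # ~9.8e-06, just below lr_stop
```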
+MAX_GRAD_NORM = 200.0 +CELL_CLIP_VALUE = 5.0 +KEEP_PROB = 0.95 +TEMPORAL_SPIKE_JITTER_WIDTH = 0 +OUTPUT_DISTRIBUTION = 'poisson' # 'poisson' or 'gaussian' +NUM_STEPS_FOR_GEN_IC = np.inf # set to num_steps if greater than num_steps + +DATA_DIR = "/tmp/rnn_synth_data_v1.0/" +DATA_FILENAME_STEM = "chaotic_rnn_inputs_g1p5" +LFADS_SAVE_DIR = "/tmp/lfads_chaotic_rnn_inputs_g1p5/" +CO_DIM = 1 +DO_CAUSAL_CONTROLLER = False +DO_FEED_FACTORS_TO_CONTROLLER = True +CONTROLLER_INPUT_LAG = 1 +PRIOR_AR_AUTOCORRELATION = 10.0 +PRIOR_AR_PROCESS_VAR = 0.1 +DO_TRAIN_PRIOR_AR_ATAU = True +DO_TRAIN_PRIOR_AR_NVAR = True +CI_ENC_DIM = 128 +CON_DIM = 128 +CO_PRIOR_VAR_SCALE = 0.1 +KL_INCREASE_STEPS = 2000 +L2_INCREASE_STEPS = 2000 +L2_GEN_SCALE = 2000.0 +L2_CON_SCALE = 0.0 +# scale of regularizer on time correlation of inferred inputs +CO_MEAN_CORR_SCALE = 0.0 +KL_IC_WEIGHT = 1.0 +KL_CO_WEIGHT = 1.0 +KL_START_STEP = 0 +L2_START_STEP = 0 +IC_PRIOR_VAR_MIN = 0.1 +IC_PRIOR_VAR_SCALE = 0.1 +IC_PRIOR_VAR_MAX = 0.1 +IC_POST_VAR_MIN = 0.0001 # protection from KL blowing up + +flags = tf.app.flags +flags.DEFINE_string("kind", "train", + "Type of model to build {train, \ + posterior_sample_and_average, \ + prior_sample, write_model_params") +flags.DEFINE_string("output_dist", OUTPUT_DISTRIBUTION, + "Type of output distribution, 'poisson' or 'gaussian'") +flags.DEFINE_boolean("allow_gpu_growth", False, + "If true, only allocate amount of memory needed for \ + Session. Otherwise, use full GPU memory.") + +# DATA +flags.DEFINE_string("data_dir", DATA_DIR, "Data for training") +flags.DEFINE_string("data_filename_stem", DATA_FILENAME_STEM, + "Filename stem for data dictionaries.") +flags.DEFINE_string("lfads_save_dir", LFADS_SAVE_DIR, "model save dir") +flags.DEFINE_string("checkpoint_pb_load_name", CHECKPOINT_PB_LOAD_NAME, + "Name of checkpoint files, use 'checkpoint_lve' for best \ + error") +flags.DEFINE_string("checkpoint_name", CHECKPOINT_NAME, + "Name of checkpoint files (.ckpt appended)") +flags.DEFINE_string("output_filename_stem", OUTPUT_FILENAME_STEM, + "Name of output file (postfix will be added)") +flags.DEFINE_string("device", DEVICE, + "Which device to use (default: \"gpu:0\", can also be \ + \"cpu:0\", \"gpu:1\", etc)") +flags.DEFINE_string("csv_log", CSV_LOG, + "Name of file to keep running log of fit likelihoods, \ + etc (.csv appended)") +flags.DEFINE_integer("max_ckpt_to_keep", MAX_CKPT_TO_KEEP, + "Max # of checkpoints to keep (rolling)") +flags.DEFINE_integer("ps_nexamples_to_process", PS_NEXAMPLES_TO_PROCESS, + "Number of examples to process for posterior sample and \ + average (not number of samples to average over).") +flags.DEFINE_integer("max_ckpt_to_keep_lve", MAX_CKPT_TO_KEEP_LVE, + "Max # of checkpoints to keep for lowest validation error \ + models (rolling)") +flags.DEFINE_integer("ext_input_dim", EXT_INPUT_DIM, "Dimension of external \ +inputs") +flags.DEFINE_integer("num_steps_for_gen_ic", NUM_STEPS_FOR_GEN_IC, + "Number of steps to train the generator initial conditon.") + + +# If there are observed inputs, there are two ways to add that observed +# input to the model. The first is by treating as something to be +# inferred, and thus encoding the observed input via the encoders, and then +# input to the generator via the "inferred inputs" channel. Second, one +# can input the input directly into the generator. This has the downside +# of making the generation process strictly dependent on knowing the +# observed input for any generated trial. 
+flags.DEFINE_boolean("inject_ext_input_to_gen", + INJECT_EXT_INPUT_TO_GEN, + "Should observed inputs be input to model via encoders, \ + or injected directly into generator?") + +# CELL + +# The combined recurrent and input weights of the encoder and +# controller cells are by default set to scale at ws/sqrt(#inputs), +# with ws=1.0. You can change this scaling with this parameter. +flags.DEFINE_float("cell_weight_scale", CELL_WEIGHT_SCALE, + "Input scaling for input weights in generator.") + + +# GENERATION + +# Note that the dimension of the initial conditions is separated from the +# dimensions of the generator initial conditions (and a linear matrix will +# adapt the shapes if necessary). This is just another way to control +# complexity. In all likelihood, setting the ic dims to the size of the +# generator hidden state is just fine. +flags.DEFINE_integer("ic_dim", IC_DIM, "Dimension of h0") +# Setting the dimensions of the factors to something smaller than the data +# dimension is a way to get a reduced dimensionality representation of your +# data. +flags.DEFINE_integer("factors_dim", FACTORS_DIM, + "Number of factors from generator") +flags.DEFINE_integer("ic_enc_dim", IC_ENC_DIM, + "Cell hidden size, encoder of h0") + +# Controlling the size of the generator is one way to control complexity of +# the dynamics (there is also l2, which will squeeze out unnecessary +# dynamics also). The modern deep learning approach is to make these cells +# as large as tolerable (from a waiting perspective), and then regularize +# them to death with drop out or whatever. I don't know if this is correct +# for the LFADS application or not. +flags.DEFINE_integer("gen_dim", GEN_DIM, + "Cell hidden size, generator.") +# The weights of the generator cell by default set to scale at +# ws/sqrt(#inputs), with ws=1.0. You can change ws for +# the input weights or the recurrent weights with these hyperparameters. +flags.DEFINE_float("gen_cell_input_weight_scale", GEN_CELL_INPUT_WEIGHT_SCALE, + "Input scaling for input weights in generator.") +flags.DEFINE_float("gen_cell_rec_weight_scale", GEN_CELL_REC_WEIGHT_SCALE, + "Input scaling for rec weights in generator.") + +# KL DISTRIBUTIONS +# If you don't know what you are donig here, please leave alone, the +# defaults should be fine for most cases, irregardless of other parameters. +# +# If you don't want the prior variance to be learned, set the +# following values to the same thing: ic_prior_var_min, +# ic_prior_var_scale, ic_prior_var_max. The prior mean will be +# learned regardless. +flags.DEFINE_float("ic_prior_var_min", IC_PRIOR_VAR_MIN, + "Minimum variance in posterior h0 codes.") +flags.DEFINE_float("ic_prior_var_scale", IC_PRIOR_VAR_SCALE, + "Variance of ic prior distribution") +flags.DEFINE_float("ic_prior_var_max", IC_PRIOR_VAR_MAX, + "Maximum variance of IC prior distribution.") +# If you really want to limit the information from encoder to decoder, +# Increase ic_post_var_min above 0.0. 
+flags.DEFINE_float("ic_post_var_min", IC_POST_VAR_MIN, + "Minimum variance of IC posterior distribution.") +flags.DEFINE_float("co_prior_var_scale", CO_PRIOR_VAR_SCALE, + "Variance of control input prior distribution.") + + +flags.DEFINE_float("prior_ar_atau", PRIOR_AR_AUTOCORRELATION, + "Initial autocorrelation of AR(1) priors.") +flags.DEFINE_float("prior_ar_nvar", PRIOR_AR_PROCESS_VAR, + "Initial noise variance for AR(1) priors.") +flags.DEFINE_boolean("do_train_prior_ar_atau", DO_TRAIN_PRIOR_AR_ATAU, + "Is the value for atau an init, or the constant value?") +flags.DEFINE_boolean("do_train_prior_ar_nvar", DO_TRAIN_PRIOR_AR_NVAR, + "Is the value for noise variance an init, or the constant \ + value?") + +# CONTROLLER +# This parameter critically controls whether or not there is a controller +# (along with controller encoders placed into the LFADS graph. If CO_DIM > +# 1, that means there is a 1 dimensional controller outputs, if equal to 0, +# then no controller. +flags.DEFINE_integer("co_dim", CO_DIM, + "Number of control net outputs (>0 builds that graph).") + +# The controller will be more powerful if it can see the encoding of the entire +# trial. However, this allows the controller to create inferred inputs that are +# acausal with respect to the actual data generation process. E.g. the data +# generator could have an input at time t, but the controller, after seeing the +# entirety of the trial could infer that the input is coming a little before +# time t, because there are no restrictions on the data the controller sees. +# One can force the controller to be causal (with respect to perturbations in +# the data generator) so that it only sees forward encodings of the data at time +# t that originate at times before or at time t. One can also control the data +# the controller sees by using an input lag (forward encoding at time [t-tlag] +# for controller input at time t. The same can be done in the reverse direction +# (controller input at time t from reverse encoding at time [t+tlag], in the +# case of an acausal controller). Setting this lag > 0 (even lag=1) can be a +# powerful way of avoiding very spiky decodes. Finally, one can manually control +# whether the factors at time t-1 are fed to the controller at time t. +# +# If you don't care about any of this, and just want to smooth your data, set +# do_causal_controller = False +# do_feed_factors_to_controller = True +# causal_input_lag = 0 +flags.DEFINE_boolean("do_causal_controller", + DO_CAUSAL_CONTROLLER, + "Restrict the controller create only causal inferred \ + inputs?") +# Strictly speaking, feeding either the factors or the rates to the controller +# violates causality, since the g0 gets to see all the data. This may or may not +# be only a theoretical concern. +flags.DEFINE_boolean("do_feed_factors_to_controller", + DO_FEED_FACTORS_TO_CONTROLLER, + "Should factors[t-1] be input to controller at time t?") +flags.DEFINE_string("feedback_factors_or_rates", FEEDBACK_FACTORS_OR_RATES, + "Feedback the factors or the rates to the controller? 
\ + Acceptable values: 'factors' or 'rates'.") +flags.DEFINE_integer("controller_input_lag", CONTROLLER_INPUT_LAG, + "Time lag on the encoding to controller t-lag for \ + forward, t+lag for reverse.") + +flags.DEFINE_integer("ci_enc_dim", CI_ENC_DIM, + "Cell hidden size, encoder of control inputs") +flags.DEFINE_integer("con_dim", CON_DIM, + "Cell hidden size, controller") + + +# OPTIMIZATION +flags.DEFINE_integer("batch_size", BATCH_SIZE, + "Batch size to use during training.") +flags.DEFINE_float("learning_rate_init", LEARNING_RATE_INIT, + "Learning rate initial value") +flags.DEFINE_float("learning_rate_decay_factor", LEARNING_RATE_DECAY_FACTOR, + "Learning rate decay, decay by this fraction every so \ + often.") +flags.DEFINE_float("learning_rate_stop", LEARNING_RATE_STOP, + "The lr is adaptively reduced, stop training at this value.") +# Rather put the learning rate on an exponentially decreasiong schedule, +# the current algorithm pays attention to the learning rate, and if it +# isn't regularly decreasing, it will decrease the learning rate. So far, +# it works fine, though it is not perfect. +flags.DEFINE_integer("learning_rate_n_to_compare", LEARNING_RATE_N_TO_COMPARE, + "Number of previous costs current cost has to be worse \ + than, to lower learning rate.") + +# This sets a value, above which, the gradients will be clipped. This hp +# is extremely useful to avoid an infrequent, but highly pathological +# problem whereby the gradient is so large that it destroys the +# optimziation by setting parameters too large, leading to a vicious cycle +# that ends in NaNs. If it's too large, it's useless, if it's too small, +# it essentially becomes the learning rate. It's pretty insensitive, though. +flags.DEFINE_float("max_grad_norm", MAX_GRAD_NORM, + "Max norm of gradient before clipping.") + +# If your optimizations start "NaN-ing out", reduce this value so that +# the values of the network don't grow out of control. Typically, once +# this parameter is set to a reasonable value, one stops having numerical +# problems. +flags.DEFINE_float("cell_clip_value", CELL_CLIP_VALUE, + "Max value recurrent cell can take before being clipped.") + +# This flag is used for an experiment where one sees if training a model with +# many days data can be used to learn the dynamics from a held-out days data. +# If you don't care about that particular experiment, this flag should always be +# false. +flags.DEFINE_boolean("do_train_io_only", DO_TRAIN_IO_ONLY, + "Train only the input (readin) and output (readout) \ + affine functions.") + +flags.DEFINE_boolean("do_reset_learning_rate", DO_RESET_LEARNING_RATE, + "Reset the learning rate to initial value.") + + +# OVERFITTING +# Dropout is done on the input data, on controller inputs (from +# encoder), on outputs from generator to factors. +flags.DEFINE_float("keep_prob", KEEP_PROB, "Dropout keep probability.") +# It appears that the system will happily fit spikes (blessing or +# curse, depending). You may not want this. Jittering the spikes a +# bit will help (-/+ bin size, as specified here). +flags.DEFINE_integer("temporal_spike_jitter_width", + TEMPORAL_SPIKE_JITTER_WIDTH, + "Shuffle spikes around this window.") + +# General note about helping ascribe controller inputs vs dynamics: +# +# If controller is heavily penalized, then it won't have any output. +# If dynamics are heavily penalized, then generator won't make +# dynamics. 
Note this l2 penalty is only on the recurrent portion of +# the RNNs, as dropout is also available, penalizing the feed-forward +# connections. +flags.DEFINE_float("l2_gen_scale", L2_GEN_SCALE, + "L2 regularization cost for the generator only.") +flags.DEFINE_float("l2_con_scale", L2_CON_SCALE, + "L2 regularization cost for the controller only.") +flags.DEFINE_float("co_mean_corr_scale", CO_MEAN_CORR_SCALE, + "Cost of correlation (thru time)in the means of \ + controller output.") + +# UNDERFITTING +# If the primary task of LFADS is "filtering" of data and not +# generation, then it is possible that the KL penalty is too strong. +# Empirically, we have found this to be the case. So we add a +# hyperparameter in front of the the two KL terms (one for the initial +# conditions to the generator, the other for the controller outputs). +# You should always think of the the default values as 1.0, and that +# leads to a standard VAE formulation whereby the numbers that are +# optimized are a lower-bound on the log-likelihood of the data. When +# these 2 HPs deviate from 1.0, one cannot make any statement about +# what those LL lower bounds mean anymore, and they cannot be compared +# (AFAIK). +flags.DEFINE_float("kl_ic_weight", KL_IC_WEIGHT, + "Strength of KL weight on initial conditions KL penatly.") +flags.DEFINE_float("kl_co_weight", KL_CO_WEIGHT, + "Strength of KL weight on controller output KL penalty.") + +# Sometimes the task can be sufficiently hard to learn that the +# optimizer takes the 'easy route', and simply minimizes the KL +# divergence, setting it to near zero, and the optimization gets +# stuck. These two parameters will help avoid that by by getting the +# optimization to 'latch' on to the main optimization, and only +# turning in the regularizers later. +flags.DEFINE_integer("kl_start_step", KL_START_STEP, + "Start increasing weight after this many steps.") +# training passes, not epochs, increase by 0.5 every kl_increase_steps +flags.DEFINE_integer("kl_increase_steps", KL_INCREASE_STEPS, + "Increase weight of kl cost to avoid local minimum.") +# Same story for l2 regularizer. One wants a simple generator, for scientific +# reasons, but not at the expense of hosing the optimization. +flags.DEFINE_integer("l2_start_step", L2_START_STEP, + "Start increasing l2 weight after this many steps.") +flags.DEFINE_integer("l2_increase_steps", L2_INCREASE_STEPS, + "Increase weight of l2 cost to avoid local minimum.") + +FLAGS = flags.FLAGS + + +def build_model(hps, kind="train", datasets=None): + """Builds a model from either random initialization, or saved parameters. + + Args: + hps: The hyper parameters for the model. + kind: (optional) The kind of model to build. Training vs inference require + different graphs. + datasets: The datasets structure (see top of lfads.py). + + Returns: + an LFADS model. + """ + + build_kind = kind + if build_kind == "write_model_params": + build_kind = "train" + with tf.variable_scope("LFADS", reuse=None): + model = LFADS(hps, kind=build_kind, datasets=datasets) + + if not os.path.exists(hps.lfads_save_dir): + print("Save directory %s does not exist, creating it." 
% hps.lfads_save_dir) + os.makedirs(hps.lfads_save_dir) + + cp_pb_ln = hps.checkpoint_pb_load_name + cp_pb_ln = 'checkpoint' if cp_pb_ln == "" else cp_pb_ln + if cp_pb_ln == 'checkpoint': + print("Loading latest training checkpoint in: ", hps.lfads_save_dir) + saver = model.seso_saver + elif cp_pb_ln == 'checkpoint_lve': + print("Loading lowest validation checkpoint in: ", hps.lfads_save_dir) + saver = model.lve_saver + else: + print("Loading checkpoint: ", cp_pb_ln, ", in: ", hps.lfads_save_dir) + saver = model.seso_saver + + ckpt = tf.train.get_checkpoint_state(hps.lfads_save_dir, + latest_filename=cp_pb_ln) + + session = tf.get_default_session() + print("ckpt: ", ckpt) + if ckpt and tf.train.checkpoint_exists(ckpt.model_checkpoint_path): + print("Reading model parameters from %s" % ckpt.model_checkpoint_path) + saver.restore(session, ckpt.model_checkpoint_path) + else: + print("Created model with fresh parameters.") + if kind in ["posterior_sample_and_average", "prior_sample", + "write_model_params"]: + print("Possible error!!! You are running ", kind, " on a newly \ + initialized model!") + print("Are you sure you sure ", ckpt.model_checkpoint_path, " exists?") + + tf.global_variables_initializer().run() + + if ckpt: + train_step_str = re.search('-[0-9]+$', ckpt.model_checkpoint_path).group() + else: + train_step_str = '-0' + + fname = 'hyperparameters' + train_step_str + '.txt' + hp_fname = os.path.join(hps.lfads_save_dir, fname) + hps_for_saving = jsonify_dict(hps) + utils.write_data(hp_fname, hps_for_saving, use_json=True) + + return model + + +def jsonify_dict(d): + """Turns python booleans into strings so hps dict can be written in json. + Creates a shallow-copied dictionary first, then accomplishes string + conversion. + + Args: + d: hyperparameter dictionary + + Returns: hyperparameter dictionary with bool's as strings + """ + + d2 = d.copy() # shallow copy is fine by assumption of d being shallow + def jsonify_bool(boolean_value): + if boolean_value: + return "true" + else: + return "false" + + for key in d2.keys(): + if isinstance(d2[key], bool): + d2[key] = jsonify_bool(d2[key]) + return d2 + + +def build_hyperparameter_dict(flags): + """Simple script for saving hyper parameters. Under the hood the + flags structure isn't a dictionary, so it has to be simplified since we + want to be able to view file as text. + + Args: + flags: From tf.app.flags + + Returns: + dictionary of hyper parameters (ignoring other flag types). 
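The hyperparameter filename written by build_model above is keyed to the global step encoded in the checkpoint path. A sketch of that derivation follows; the checkpoint directory and step number are made up, and only the regex and filename pattern come from the code above.

```
import re

# Hypothetical checkpoint path: <lfads_save_dir>/<checkpoint_name>-<step>
model_checkpoint_path = "/tmp/lfads_chaotic_rnn_inputs_g1p5/lfads_vae-12000"
train_step_str = re.search('-[0-9]+$', model_checkpoint_path).group()
fname = 'hyperparameters' + train_step_str + '.txt'
print(fname)   # hyperparameters-12000.txt
```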
+ """ + d = {} + # Data + d['output_dist'] = flags.output_dist + d['data_dir'] = flags.data_dir + d['lfads_save_dir'] = flags.lfads_save_dir + d['checkpoint_pb_load_name'] = flags.checkpoint_pb_load_name + d['checkpoint_name'] = flags.checkpoint_name + d['output_filename_stem'] = flags.output_filename_stem + d['max_ckpt_to_keep'] = flags.max_ckpt_to_keep + d['max_ckpt_to_keep_lve'] = flags.max_ckpt_to_keep_lve + d['ps_nexamples_to_process'] = flags.ps_nexamples_to_process + d['ext_input_dim'] = flags.ext_input_dim + d['data_filename_stem'] = flags.data_filename_stem + d['device'] = flags.device + d['csv_log'] = flags.csv_log + d['num_steps_for_gen_ic'] = flags.num_steps_for_gen_ic + d['inject_ext_input_to_gen'] = flags.inject_ext_input_to_gen + # Cell + d['cell_weight_scale'] = flags.cell_weight_scale + # Generation + d['ic_dim'] = flags.ic_dim + d['factors_dim'] = flags.factors_dim + d['ic_enc_dim'] = flags.ic_enc_dim + d['gen_dim'] = flags.gen_dim + d['gen_cell_input_weight_scale'] = flags.gen_cell_input_weight_scale + d['gen_cell_rec_weight_scale'] = flags.gen_cell_rec_weight_scale + # KL distributions + d['ic_prior_var_min'] = flags.ic_prior_var_min + d['ic_prior_var_scale'] = flags.ic_prior_var_scale + d['ic_prior_var_max'] = flags.ic_prior_var_max + d['ic_post_var_min'] = flags.ic_post_var_min + d['co_prior_var_scale'] = flags.co_prior_var_scale + d['prior_ar_atau'] = flags.prior_ar_atau + d['prior_ar_nvar'] = flags.prior_ar_nvar + d['do_train_prior_ar_atau'] = flags.do_train_prior_ar_atau + d['do_train_prior_ar_nvar'] = flags.do_train_prior_ar_nvar + # Controller + d['do_causal_controller'] = flags.do_causal_controller + d['controller_input_lag'] = flags.controller_input_lag + d['do_feed_factors_to_controller'] = flags.do_feed_factors_to_controller + d['feedback_factors_or_rates'] = flags.feedback_factors_or_rates + d['co_dim'] = flags.co_dim + d['ci_enc_dim'] = flags.ci_enc_dim + d['con_dim'] = flags.con_dim + d['co_mean_corr_scale'] = flags.co_mean_corr_scale + # Optimization + d['batch_size'] = flags.batch_size + d['learning_rate_init'] = flags.learning_rate_init + d['learning_rate_decay_factor'] = flags.learning_rate_decay_factor + d['learning_rate_stop'] = flags.learning_rate_stop + d['learning_rate_n_to_compare'] = flags.learning_rate_n_to_compare + d['max_grad_norm'] = flags.max_grad_norm + d['cell_clip_value'] = flags.cell_clip_value + d['do_train_io_only'] = flags.do_train_io_only + d['do_reset_learning_rate'] = flags.do_reset_learning_rate + + # Overfitting + d['keep_prob'] = flags.keep_prob + d['temporal_spike_jitter_width'] = flags.temporal_spike_jitter_width + d['l2_gen_scale'] = flags.l2_gen_scale + d['l2_con_scale'] = flags.l2_con_scale + # Underfitting + d['kl_ic_weight'] = flags.kl_ic_weight + d['kl_co_weight'] = flags.kl_co_weight + d['kl_start_step'] = flags.kl_start_step + d['kl_increase_steps'] = flags.kl_increase_steps + d['l2_start_step'] = flags.l2_start_step + d['l2_increase_steps'] = flags.l2_increase_steps + + return d + + +class hps_dict_to_obj(dict): + """Helper class allowing us to access hps dictionary more easily.""" + + def __getattr__(self, key): + if key in self: + return self[key] + else: + assert False, ("%s does not exist." % key) + def __setattr__(self, key, value): + self[key] = value + + +def train(hps, datasets): + """Train the LFADS model. + + Args: + hps: The dictionary of hyperparameters. + datasets: A dictionary of data dictionaries. The dataset dict is simply a + name(string)-> data dictionary mapping (See top of lfads.py). 
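The hps_dict_to_obj helper above lets the rest of the code read hyperparameters either as dictionary entries or as attributes. A small usage sketch follows, with the class body repeated so the example runs on its own.

```
class hps_dict_to_obj(dict):
  """Helper class allowing us to access hps dictionary more easily."""

  def __getattr__(self, key):
    if key in self:
      return self[key]
    else:
      assert False, ("%s does not exist." % key)

  def __setattr__(self, key, value):
    self[key] = value


hps = hps_dict_to_obj({'ic_dim': 64, 'co_dim': 1})
print(hps.ic_dim, hps['co_dim'])   # 64 1
hps.batch_size = 128               # __setattr__ writes through to the dict
print(hps['batch_size'])           # 128
```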
+ """ + model = build_model(hps, kind="train", datasets=datasets) + if hps.do_reset_learning_rate: + sess = tf.get_default_session() + sess.run(model.learning_rate.initializer) + + model.train_model(datasets) + + +def write_model_runs(hps, datasets, output_fname=None): + """Run the model on the data in data_dict, and save the computed values. + + LFADS generates a number of outputs for each examples, and these are all + saved. They are: + The mean and variance of the prior of g0. + The mean and variance of approximate posterior of g0. + The control inputs (if enabled) + The initial conditions, g0, for all examples. + The generator states for all time. + The factors for all time. + The rates for all time. + + Args: + hps: The dictionary of hyperparameters. + datasets: A dictionary of data dictionaries. The dataset dict is simply a + name(string)-> data dictionary mapping (See top of lfads.py). + output_fname (optional): output filename stem to write the model runs. + """ + model = build_model(hps, kind=hps.kind, datasets=datasets) + model.write_model_runs(datasets, output_fname) + + +def write_model_samples(hps, datasets, dataset_name=None, output_fname=None): + """Use the prior distribution to generate samples from the model. + Generates batch_size number of samples (set through FLAGS). + + LFADS generates a number of outputs for each examples, and these are all + saved. They are: + The mean and variance of the prior of g0. + The control inputs (if enabled) + The initial conditions, g0, for all examples. + The generator states for all time. + The factors for all time. + The output distribution parameters (e.g. rates) for all time. + + Args: + hps: The dictionary of hyperparameters. + datasets: A dictionary of data dictionaries. The dataset dict is simply a + name(string)-> data dictionary mapping (See top of lfads.py). + dataset_name: The name of the dataset to grab the factors -> rates + alignment matrices from. Only a concern with models trained on + multi-session data. By default, uses the first dataset in the data dict. + output_fname: The name prefix of the file in which to save the generated + samples. + """ + if not output_fname: + output_fname = "model_runs_" + hps.kind + else: + output_fname = output_fname + "model_runs_" + hps.kind + if not dataset_name: + dataset_name = datasets.keys()[0] + else: + if dataset_name not in datasets.keys(): + raise ValueError("Invalid dataset name '%s'."%(dataset_name)) + model = build_model(hps, kind=hps.kind, datasets=datasets) + model.write_model_samples(dataset_name, output_fname) + + +def write_model_parameters(hps, output_fname=None, datasets=None): + """Save all the model parameters + + Save all the parameters to hps.lfads_save_dir. + + Args: + hps: The dictionary of hyperparameters. + output_fname: The prefix of the file in which to save the generated + samples. + datasets: A dictionary of data dictionaries. The dataset dict is simply a + name(string)-> data dictionary mapping (See top of lfads.py). 
+ """ + if not output_fname: + output_fname = "model_params" + else: + output_fname = output_fname + "_model_params" + fname = os.path.join(hps.lfads_save_dir, output_fname) + print("Writing model parameters to: ", fname) + # save the optimizer params as well + model = build_model(hps, kind="write_model_params", datasets=datasets) + model_params = model.eval_model_parameters(use_nested=False, + include_strs="LFADS") + utils.write_data(fname, model_params, compression=None) + print("Done.") + + +def clean_data_dict(data_dict): + """Add some key/value pairs to the data dict, if they are missing. + Args: + data_dict - dictionary containing data for LFADS + Returns: + data_dict with some keys filled in, if they are absent. + """ + + keys = ['train_truth', 'train_ext_input', 'valid_data', + 'valid_truth', 'valid_ext_input', 'valid_train'] + for k in keys: + if k not in data_dict: + data_dict[k] = None + + return data_dict + + +def load_datasets(data_dir, data_filename_stem): + """Load the datasets from a specified directory. + + Example files look like + >data_dir/my_dataset_first_day + >data_dir/my_dataset_second_day + + If my_dataset (filename) stem is in the directory, the read routine will try + and load it. The datasets dictionary will then look like + dataset['first_day'] -> (first day data dictionary) + dataset['second_day'] -> (first day data dictionary) + + Args: + data_dir: The directory from which to load the datasets. + data_filename_stem: The stem of the filename for the datasets. + + Returns: + datasets: a dataset dictionary, with one name->data dictionary pair for + each dataset file. + """ + print("Reading data from ", data_dir) + datasets = utils.read_datasets(data_dir, data_filename_stem) + for k, data_dict in datasets.items(): + datasets[k] = clean_data_dict(data_dict) + + train_total_size = len(data_dict['train_data']) + if train_total_size == 0: + print("Did not load training set.") + else: + print("Found training set with number examples: ", train_total_size) + + valid_total_size = len(data_dict['valid_data']) + if valid_total_size == 0: + print("Did not load validation set.") + else: + print("Found validation set with number examples: ", valid_total_size) + + return datasets + + +def main(_): + """Get this whole shindig off the ground.""" + d = build_hyperparameter_dict(FLAGS) + hps = hps_dict_to_obj(d) # hyper parameters + kind = FLAGS.kind + + # Read the data, if necessary. + train_set = valid_set = None + if kind in ["train", "posterior_sample_and_average", "prior_sample", + "write_model_params"]: + datasets = load_datasets(hps.data_dir, hps.data_filename_stem) + else: + raise ValueError('Kind {} is not supported.'.format(kind)) + + # infer the dataset names and dataset dimensions from the loaded files + hps.kind = kind # needs to be added here, cuz not saved as hyperparam + hps.dataset_names = [] + hps.dataset_dims = {} + for key in datasets: + hps.dataset_names.append(key) + hps.dataset_dims[key] = datasets[key]['data_dim'] + + # also store down the dimensionality of the data + # - just pull from one set, required to be same for all sets + hps.num_steps = datasets.values()[0]['num_steps'] + hps.ndatasets = len(hps.dataset_names) + + if hps.num_steps_for_gen_ic > hps.num_steps: + hps.num_steps_for_gen_ic = hps.num_steps + + # Build and run the model, for varying purposes. 
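clean_data_dict above back-fills optional keys with None so downstream code can test for them uniformly. A standalone sketch of its effect follows, with the function body repeated and a made-up toy data dict.

```
def clean_data_dict(data_dict):
  keys = ['train_truth', 'train_ext_input', 'valid_data',
          'valid_truth', 'valid_ext_input', 'valid_train']
  for k in keys:
    if k not in data_dict:
      data_dict[k] = None
  return data_dict


d = clean_data_dict({'train_data': [[0, 1, 2]]})
print(d['train_ext_input'])   # None
print(d['train_data'])        # [[0, 1, 2]]
```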
+ config = tf.ConfigProto(allow_soft_placement=True, + log_device_placement=False) + if FLAGS.allow_gpu_growth: + config.gpu_options.allow_growth = True + sess = tf.Session(config=config) + with sess.as_default(): + with tf.device(hps.device): + if kind == "train": + train(hps, datasets) + elif kind == "posterior_sample_and_average": + write_model_runs(hps, datasets, hps.output_filename_stem) + elif kind == "prior_sample": + write_model_samples(hps, datasets, hps.output_filename_stem) + elif kind == "write_model_params": + write_model_parameters(hps, hps.output_filename_stem, datasets) + else: + assert False, ("Kind %s is not implemented. " % kind) + + +if __name__ == "__main__": + tf.app.run() + diff --git a/lfads/synth_data/generate_chaotic_rnn_data.py b/lfads/synth_data/generate_chaotic_rnn_data.py new file mode 100644 index 0000000000000000000000000000000000000000..a89936df6f6f1ac7ccdd0d851b291bf706b8096c --- /dev/null +++ b/lfads/synth_data/generate_chaotic_rnn_data.py @@ -0,0 +1,193 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# ============================================================================== +from __future__ import print_function + +import h5py +import numpy as np +import os +import tensorflow as tf # used for flags here + +from utils import write_datasets +from synthetic_data_utils import add_alignment_projections, generate_data +from synthetic_data_utils import generate_rnn, get_train_n_valid_inds +from synthetic_data_utils import nparray_and_transpose +from synthetic_data_utils import spikify_data, split_list_by_inds +import matplotlib +import matplotlib.pyplot as plt +import scipy.signal + +matplotlib.rcParams['image.interpolation'] = 'nearest' +DATA_DIR = "rnn_synth_data_v1.0" + +flags = tf.app.flags +flags.DEFINE_string("save_dir", "/tmp/" + DATA_DIR + "/", + "Directory for saving data.") +flags.DEFINE_string("datafile_name", "thits_data", + "Name of data file for input case.") +flags.DEFINE_integer("synth_data_seed", 5, "Random seed for RNN generation.") +flags.DEFINE_float("T", 1.0, "Time in seconds to generate.") +flags.DEFINE_integer("C", 100, "Number of conditions") +flags.DEFINE_integer("N", 50, "Number of units for the RNN") +flags.DEFINE_integer("S", 50, "Number of sampled units from RNN") +flags.DEFINE_integer("npcs", 10, "Number of PCS for multi-session case.") +flags.DEFINE_float("train_percentage", 4.0/5.0, + "Percentage of train vs validation trials") +flags.DEFINE_integer("nspikifications", 40, + "Number of spikifications of the same underlying rates.") +flags.DEFINE_float("g", 1.5, "Complexity of dynamics") +flags.DEFINE_float("x0_std", 1.0, + "Volume from which to pull initial conditions (affects diversity of dynamics.") +flags.DEFINE_float("tau", 0.025, "Time constant of RNN") +flags.DEFINE_float("dt", 0.010, "Time bin") +flags.DEFINE_float("input_magnitude", 20.0, + "For the input case, what is the value of the input?") +flags.DEFINE_float("max_firing_rate", 30.0, "Map 1.0 of RNN to a 
spikes per second") +FLAGS = flags.FLAGS + + +# Note that with N small, (as it is 25 above), the finite size effects +# will have pretty dramatic effects on the dynamics of the random RNN. +# If you want more complex dynamics, you'll have to run the script a +# lot, or increase N (or g). + +# Getting hard vs. easy data can be a little stochastic, so we set the seed. + +# Pull out some commonly used parameters. +# These are user parameters (configuration) +rng = np.random.RandomState(seed=FLAGS.synth_data_seed) +T = FLAGS.T +C = FLAGS.C +N = FLAGS.N +S = FLAGS.S +input_magnitude = FLAGS.input_magnitude +nspikifications = FLAGS.nspikifications +E = nspikifications * C # total number of trials +# S is the number of measurements in each datasets, w/ each +# dataset having a different set of observations. +ndatasets = N/S # ok if rounded down +train_percentage = FLAGS.train_percentage +ntime_steps = int(T / FLAGS.dt) +# End of user parameters + +rnn = generate_rnn(rng, N, FLAGS.g, FLAGS.tau, FLAGS.dt, FLAGS.max_firing_rate) + +# Check to make sure the RNN is the one we used in the paper. +if N == 50: + assert abs(rnn['W'][0,0] - 0.06239899) < 1e-8, 'Error in random seed?' + rem_check = nspikifications * train_percentage + assert abs(rem_check - int(rem_check)) < 1e-8, \ + 'Train percentage * nspikifications should be integral number.' + + +# Initial condition generation, and condition label generation. This +# happens outside of the dataset loop, so that all datasets have the +# same conditions, which is similar to a neurophys setup. +condition_number = 0 +x0s = [] +condition_labels = [] +for c in range(C): + x0 = FLAGS.x0_std * rng.randn(N, 1) + x0s.append(np.tile(x0, nspikifications)) # replicate x0 nspikifications times + # replicate the condition label nspikifications times + for ns in range(nspikifications): + condition_labels.append(condition_number) + condition_number += 1 +x0s = np.concatenate(x0s, axis=1) + +# Containers for storing data across data. +datasets = {} +for n in range(ndatasets): + print(n+1, " of ", ndatasets) + + # First generate all firing rates. in the next loop, generate all + # spikifications this allows the random state for rate generation to be + # independent of n_spikifications. + dataset_name = 'dataset_N' + str(N) + '_S' + str(S) + if S < N: + dataset_name += '_n' + str(n+1) + + # Sample neuron subsets. The assumption is the PC axes of the RNN + # are not unit aligned, so sampling units is adequate to sample all + # the high-variance PCs. + P_sxn = np.eye(S,N) + for m in range(n): + P_sxn = np.roll(P_sxn, S, axis=1) + + if input_magnitude > 0.0: + # time of "hits" randomly chosen between [1/4 and 3/4] of total time + input_times = rng.choice(int(ntime_steps/2), size=[E]) + int(ntime_steps/4) + else: + input_times = None + + rates, x0s, inputs = \ + generate_data(rnn, T=T, E=E, x0s=x0s, P_sxn=P_sxn, + input_magnitude=input_magnitude, + input_times=input_times) + spikes = spikify_data(rates, rng, rnn['dt'], rnn['max_firing_rate']) + + # split into train and validation sets + train_inds, valid_inds = get_train_n_valid_inds(E, train_percentage, + nspikifications) + + # Split the data, inputs, labels and times into train vs. validation. 
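The projection P_sxn built above starts as np.eye(S, N), which picks out the first S of the N RNN units, and is rolled by S columns once per dataset so each dataset observes the next block of S units. A tiny numpy sketch with made-up sizes:

```
import numpy as np

S, N = 2, 6
P_sxn = np.eye(S, N)
print(np.where(P_sxn)[1])          # [0 1]: dataset 1 observes units 0 and 1
P_sxn = np.roll(P_sxn, S, axis=1)
print(np.where(P_sxn)[1])          # [2 3]: dataset 2 observes units 2 and 3
```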
+ rates_train, rates_valid = \ + split_list_by_inds(rates, train_inds, valid_inds) + spikes_train, spikes_valid = \ + split_list_by_inds(spikes, train_inds, valid_inds) + input_train, inputs_valid = \ + split_list_by_inds(inputs, train_inds, valid_inds) + condition_labels_train, condition_labels_valid = \ + split_list_by_inds(condition_labels, train_inds, valid_inds) + input_times_train, input_times_valid = \ + split_list_by_inds(input_times, train_inds, valid_inds) + + # Turn rates, spikes, and input into numpy arrays. + rates_train = nparray_and_transpose(rates_train) + rates_valid = nparray_and_transpose(rates_valid) + spikes_train = nparray_and_transpose(spikes_train) + spikes_valid = nparray_and_transpose(spikes_valid) + input_train = nparray_and_transpose(input_train) + inputs_valid = nparray_and_transpose(inputs_valid) + + # Note that we put these 'truth' rates and input into this + # structure, the only data that is used in LFADS are the spike + # trains. The rest is either for printing or posterity. + data = {'train_truth': rates_train, + 'valid_truth': rates_valid, + 'input_train_truth' : input_train, + 'input_valid_truth' : inputs_valid, + 'train_data' : spikes_train, + 'valid_data' : spikes_valid, + 'train_percentage' : train_percentage, + 'nspikifications' : nspikifications, + 'dt' : rnn['dt'], + 'input_magnitude' : input_magnitude, + 'input_times_train' : input_times_train, + 'input_times_valid' : input_times_valid, + 'P_sxn' : P_sxn, + 'condition_labels_train' : condition_labels_train, + 'condition_labels_valid' : condition_labels_valid, + 'conversion_factor': 1.0 / rnn['conversion_factor']} + datasets[dataset_name] = data + +if S < N: + # Note that this isn't necessary for this synthetic example, but + # it's useful to see how the input factor matrices were initialized + # for actual neurophysiology data. + datasets = add_alignment_projections(datasets, npcs=FLAGS.npcs) + +# Write out the datasets. +write_datasets(FLAGS.save_dir, FLAGS.datafile_name, datasets) diff --git a/lfads/synth_data/generate_itb_data.py b/lfads/synth_data/generate_itb_data.py new file mode 100644 index 0000000000000000000000000000000000000000..e2e54179e267a59ef023483c499e2901e0b34ac6 --- /dev/null +++ b/lfads/synth_data/generate_itb_data.py @@ -0,0 +1,208 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +# ============================================================================== +from __future__ import print_function + +import h5py +import numpy as np +import os +import tensorflow as tf + +from utils import write_datasets +from synthetic_data_utils import normalize_rates +from synthetic_data_utils import get_train_n_valid_inds, nparray_and_transpose +from synthetic_data_utils import spikify_data, split_list_by_inds + +DATA_DIR = "rnn_synth_data_v1.0" + +flags = tf.app.flags +flags.DEFINE_string("save_dir", "/tmp/" + DATA_DIR + "/", + "Directory for saving data.") +flags.DEFINE_string("datafile_name", "itb_rnn", + "Name of data file for input case.") +flags.DEFINE_integer("synth_data_seed", 5, "Random seed for RNN generation.") +flags.DEFINE_float("T", 1.0, "Time in seconds to generate.") +flags.DEFINE_integer("C", 800, "Number of conditions") +flags.DEFINE_integer("N", 50, "Number of units for the RNN") +flags.DEFINE_float("train_percentage", 4.0/5.0, + "Percentage of train vs validation trials") +flags.DEFINE_integer("nspikifications", 5, + "Number of spikifications of the same underlying rates.") +flags.DEFINE_float("tau", 0.025, "Time constant of RNN") +flags.DEFINE_float("dt", 0.010, "Time bin") +flags.DEFINE_float("max_firing_rate", 30.0, + "Map 1.0 of RNN to a spikes per second") +flags.DEFINE_float("u_std", 0.25, + "Std dev of input to integration to bound model") +flags.DEFINE_string("checkpoint_path", "SAMPLE_CHECKPOINT", + """Path to directory with checkpoints of model + trained on integration to bound task. Currently this + is a placeholder which tells the code to grab the + checkpoint that is provided with the code + (in /trained_itb/..). If you have your own checkpoint + you would like to restore, you would point it to + that path.""") +FLAGS = flags.FLAGS + + +class IntegrationToBoundModel: + def __init__(self, N): + scale = 0.8 / float(N**0.5) + self.N = N + self.Wh_nxn = tf.Variable(tf.random_normal([N, N], stddev=scale)) + self.b_1xn = tf.Variable(tf.zeros([1, N])) + self.Bu_1xn = tf.Variable(tf.zeros([1, N])) + self.Wro_nxo = tf.Variable(tf.random_normal([N, 1], stddev=scale)) + self.bro_o = tf.Variable(tf.zeros([1])) + + def call(self, h_tm1_bxn, u_bx1): + act_t_bxn = tf.matmul(h_tm1_bxn, self.Wh_nxn) + self.b_1xn + u_bx1 * self.Bu_1xn + h_t_bxn = tf.nn.tanh(act_t_bxn) + z_t = tf.nn.xw_plus_b(h_t_bxn, self.Wro_nxo, self.bro_o) + return z_t, h_t_bxn + +def get_data_batch(batch_size, T, rng, u_std): + u_bxt = rng.randn(batch_size, T) * u_std + running_sum_b = np.zeros([batch_size]) + labels_bxt = np.zeros([batch_size, T]) + for t in xrange(T): + running_sum_b += u_bxt[:, t] + labels_bxt[:, t] += running_sum_b + labels_bxt = np.clip(labels_bxt, -1, 1) + return u_bxt, labels_bxt + + +rng = np.random.RandomState(seed=FLAGS.synth_data_seed) +u_rng = np.random.RandomState(seed=FLAGS.synth_data_seed+1) +T = FLAGS.T +C = FLAGS.C +N = FLAGS.N # must be same N as in trained model (provided example is N = 50) +nspikifications = FLAGS.nspikifications +E = nspikifications * C # total number of trials +train_percentage = FLAGS.train_percentage +ntimesteps = int(T / FLAGS.dt) +batch_size = 1 # gives one example per ntrial + +model = IntegrationToBoundModel(N) +inputs_ph_t = [tf.placeholder(tf.float32, + shape=[None, 1]) for _ in range(ntimesteps)] +state = tf.zeros([batch_size, N]) +saver = tf.train.Saver() + +P_nxn = rng.randn(N,N) / np.sqrt(N) # random projections + +# unroll RNN for T timesteps +outputs_t = [] +states_t = [] + +for inp in inputs_ph_t: + output, state = 
model.call(state, inp) + outputs_t.append(output) + states_t.append(state) + +with tf.Session() as sess: + # restore the latest model ckpt + if FLAGS.checkpoint_path == "SAMPLE_CHECKPOINT": + dir_path = os.path.dirname(os.path.realpath(__file__)) + model_checkpoint_path = os.path.join(dir_path, "trained_itb/model-65000") + else: + model_checkpoint_path = FLAGS.checkpoint_path + try: + saver.restore(sess, model_checkpoint_path) + print ('Model restored from', model_checkpoint_path) + except: + assert False, ("No checkpoints to restore from, is the path %s correct?" + %model_checkpoint_path) + + # generate data for trials + data_e = [] + u_e = [] + outs_e = [] + for c in range(C): + u_1xt, outs_1xt = get_data_batch(batch_size, ntimesteps, u_rng, FLAGS.u_std) + + feed_dict = {} + for t in xrange(ntimesteps): + feed_dict[inputs_ph_t[t]] = np.reshape(u_1xt[:,t], (batch_size,-1)) + + states_t_bxn, outputs_t_bxn = sess.run([states_t, outputs_t], + feed_dict=feed_dict) + states_nxt = np.transpose(np.squeeze(np.asarray(states_t_bxn))) + outputs_t_bxn = np.squeeze(np.asarray(outputs_t_bxn)) + r_sxt = np.dot(P_nxn, states_nxt) + + for s in xrange(nspikifications): + data_e.append(r_sxt) + u_e.append(u_1xt) + outs_e.append(outputs_t_bxn) + + truth_data_e = normalize_rates(data_e, E, N) + +spiking_data_e = spikify_data(truth_data_e, rng, dt=FLAGS.dt, + max_firing_rate=FLAGS.max_firing_rate) +train_inds, valid_inds = get_train_n_valid_inds(E, train_percentage, + nspikifications) + +data_train_truth, data_valid_truth = split_list_by_inds(truth_data_e, + train_inds, + valid_inds) +data_train_spiking, data_valid_spiking = split_list_by_inds(spiking_data_e, + train_inds, + valid_inds) + +data_train_truth = nparray_and_transpose(data_train_truth) +data_valid_truth = nparray_and_transpose(data_valid_truth) +data_train_spiking = nparray_and_transpose(data_train_spiking) +data_valid_spiking = nparray_and_transpose(data_valid_spiking) + +# save down the inputs used to generate this data +train_inputs_u, valid_inputs_u = split_list_by_inds(u_e, + train_inds, + valid_inds) +train_inputs_u = nparray_and_transpose(train_inputs_u) +valid_inputs_u = nparray_and_transpose(valid_inputs_u) + +# save down the network outputs (may be useful later) +train_outputs_u, valid_outputs_u = split_list_by_inds(outs_e, + train_inds, + valid_inds) +train_outputs_u = np.array(train_outputs_u) +valid_outputs_u = np.array(valid_outputs_u) + + +data = { 'train_truth': data_train_truth, + 'valid_truth': data_valid_truth, + 'train_data' : data_train_spiking, + 'valid_data' : data_valid_spiking, + 'train_percentage' : train_percentage, + 'nspikifications' : nspikifications, + 'dt' : FLAGS.dt, + 'u_std' : FLAGS.u_std, + 'max_firing_rate': FLAGS.max_firing_rate, + 'train_inputs_u': train_inputs_u, + 'valid_inputs_u': valid_inputs_u, + 'train_outputs_u': train_outputs_u, + 'valid_outputs_u': valid_outputs_u, + 'conversion_factor' : FLAGS.max_firing_rate/(1.0/FLAGS.dt) } + +# just one dataset here +datasets = {} +dataset_name = 'dataset_N' + str(N) +datasets[dataset_name] = data + +# write out the dataset +write_datasets(FLAGS.save_dir, FLAGS.datafile_name, datasets) +print ('Saved to ', os.path.join(FLAGS.save_dir, + FLAGS.datafile_name + '_' + dataset_name)) diff --git a/lfads/synth_data/generate_labeled_rnn_data.py b/lfads/synth_data/generate_labeled_rnn_data.py new file mode 100644 index 0000000000000000000000000000000000000000..8cb40908a0ae457a372e21db8c942b6ecfe023dd --- /dev/null +++ b/lfads/synth_data/generate_labeled_rnn_data.py @@ 
-0,0 +1,146 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# ============================================================================== +from __future__ import print_function + +import os +import h5py +import numpy as np + +from synthetic_data_utils import generate_data, generate_rnn +from synthetic_data_utils import get_train_n_valid_inds +from synthetic_data_utils import nparray_and_transpose +from synthetic_data_utils import spikify_data, split_list_by_inds +import tensorflow as tf +from utils import write_datasets + +DATA_DIR = "rnn_synth_data_v1.0" + +flags = tf.app.flags +flags.DEFINE_string("save_dir", "/tmp/" + DATA_DIR + "/", + "Directory for saving data.") +flags.DEFINE_string("datafile_name", "conditioned_rnn_data", + "Name of data file for input case.") +flags.DEFINE_integer("synth_data_seed", 5, "Random seed for RNN generation.") +flags.DEFINE_float("T", 1.0, "Time in seconds to generate.") +flags.DEFINE_integer("C", 400, "Number of conditions") +flags.DEFINE_integer("N", 50, "Number of units for the RNN") +flags.DEFINE_float("train_percentage", 4.0/5.0, + "Percentage of train vs validation trials") +flags.DEFINE_integer("nspikifications", 10, + "Number of spikifications of the same underlying rates.") +flags.DEFINE_float("g", 1.5, "Complexity of dynamics") +flags.DEFINE_float("x0_std", 1.0, + "Volume from which to pull initial conditions (affects diversity of dynamics.") +flags.DEFINE_float("tau", 0.025, "Time constant of RNN") +flags.DEFINE_float("dt", 0.010, "Time bin") +flags.DEFINE_float("max_firing_rate", 30.0, "Map 1.0 of RNN to a spikes per second") +FLAGS = flags.FLAGS + +rng = np.random.RandomState(seed=FLAGS.synth_data_seed) +rnn_rngs = [np.random.RandomState(seed=FLAGS.synth_data_seed+1), + np.random.RandomState(seed=FLAGS.synth_data_seed+2)] +T = FLAGS.T +C = FLAGS.C +N = FLAGS.N +nspikifications = FLAGS.nspikifications +E = nspikifications * C +train_percentage = FLAGS.train_percentage +ntimesteps = int(T / FLAGS.dt) + +rnn_a = generate_rnn(rnn_rngs[0], N, FLAGS.g, FLAGS.tau, FLAGS.dt, + FLAGS.max_firing_rate) +rnn_b = generate_rnn(rnn_rngs[1], N, FLAGS.g, FLAGS.tau, FLAGS.dt, + FLAGS.max_firing_rate) +rnns = [rnn_a, rnn_b] + +# pick which RNN is used on each trial +rnn_to_use = rng.randint(2, size=E) +ext_input = np.repeat(np.expand_dims(rnn_to_use, axis=1), ntimesteps, axis=1) +ext_input = np.expand_dims(ext_input, axis=2) # these are "a's" in the paper + +x0s = [] +condition_labels = [] +condition_number = 0 +for c in range(C): + x0 = FLAGS.x0_std * rng.randn(N, 1) + x0s.append(np.tile(x0, nspikifications)) + for ns in range(nspikifications): + condition_labels.append(condition_number) + condition_number += 1 +x0s = np.concatenate(x0s, axis=1) + +P_nxn = rng.randn(N, N) / np.sqrt(N) + +# generate trials for both RNNs +rates_a, x0s_a, _ = generate_data(rnn_a, T=T, E=E, x0s=x0s, P_sxn=P_nxn, + input_magnitude=0.0, input_times=None) +spikes_a = spikify_data(rates_a, rng, rnn_a['dt'], 
rnn_a['max_firing_rate']) + +rates_b, x0s_b, _ = generate_data(rnn_b, T=T, E=E, x0s=x0s, P_sxn=P_nxn, + input_magnitude=0.0, input_times=None) +spikes_b = spikify_data(rates_b, rng, rnn_b['dt'], rnn_b['max_firing_rate']) + +# not the best way to do this but E is small enough +rates = [] +spikes = [] +for trial in xrange(E): + if rnn_to_use[trial] == 0: + rates.append(rates_a[trial]) + spikes.append(spikes_a[trial]) + else: + rates.append(rates_b[trial]) + spikes.append(spikes_b[trial]) + +# split into train and validation sets +train_inds, valid_inds = get_train_n_valid_inds(E, train_percentage, + nspikifications) + +rates_train, rates_valid = split_list_by_inds(rates, train_inds, valid_inds) +spikes_train, spikes_valid = split_list_by_inds(spikes, train_inds, valid_inds) +condition_labels_train, condition_labels_valid = split_list_by_inds( + condition_labels, train_inds, valid_inds) +ext_input_train, ext_input_valid = split_list_by_inds( + ext_input, train_inds, valid_inds) + +rates_train = nparray_and_transpose(rates_train) +rates_valid = nparray_and_transpose(rates_valid) +spikes_train = nparray_and_transpose(spikes_train) +spikes_valid = nparray_and_transpose(spikes_valid) + +# add train_ext_input and valid_ext input +data = {'train_truth': rates_train, + 'valid_truth': rates_valid, + 'train_data' : spikes_train, + 'valid_data' : spikes_valid, + 'train_ext_input' : np.array(ext_input_train), + 'valid_ext_input': np.array(ext_input_valid), + 'train_percentage' : train_percentage, + 'nspikifications' : nspikifications, + 'dt' : FLAGS.dt, + 'P_sxn' : P_nxn, + 'condition_labels_train' : condition_labels_train, + 'condition_labels_valid' : condition_labels_valid, + 'conversion_factor': 1.0 / rnn_a['conversion_factor']} + +# just one dataset here +datasets = {} +dataset_name = 'dataset_N' + str(N) +datasets[dataset_name] = data + +# write out the dataset +write_datasets(FLAGS.save_dir, FLAGS.datafile_name, datasets) +print ('Saved to ', os.path.join(FLAGS.save_dir, + FLAGS.datafile_name + '_' + dataset_name)) diff --git a/lfads/synth_data/run_generate_synth_data.sh b/lfads/synth_data/run_generate_synth_data.sh new file mode 100755 index 0000000000000000000000000000000000000000..c73fee5b11b1eb826f767cb80c3865c27da33589 --- /dev/null +++ b/lfads/synth_data/run_generate_synth_data.sh @@ -0,0 +1,37 @@ +#!/bin/bash + +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +# ============================================================================== + +SYNTH_PATH=/tmp/rnn_synth_data_v1.0/ + +echo "Generating chaotic rnn data with no input pulses (g=1.5)" +python generate_chaotic_rnn_data.py --save_dir=$SYNTH_PATH --datafile_name=chaotic_rnn_no_inputs --synth_data_seed=5 --T=1.0 --C=400 --N=50 --S=50 --train_percentage=0.8 --nspikifications=10 --g=1.5 --x0_std=1.0 --tau=0.025 --dt=0.01 --input_magnitude=0.0 --max_firing_rate=30.0 + +echo "Generating chaotic rnn data with input pulses (g=1.5)" +python generate_chaotic_rnn_data.py --save_dir=$SYNTH_PATH --datafile_name=chaotic_rnn_inputs_g1p5 --synth_data_seed=5 --T=1.0 --C=400 --N=50 --S=50 --train_percentage=0.8 --nspikifications=10 --g=1.5 --x0_std=1.0 --tau=0.025 --dt=0.01 --input_magnitude=20.0 --max_firing_rate=30.0 + +echo "Generating chaotic rnn data with input pulses (g=2.5)" +python generate_chaotic_rnn_data.py --save_dir=$SYNTH_PATH --datafile_name=chaotic_rnn_inputs_g2p5 --synth_data_seed=5 --T=1.0 --C=400 --N=50 --S=50 --train_percentage=0.8 --nspikifications=10 --g=2.5 --x0_std=1.0 --tau=0.025 --dt=0.01 --input_magnitude=20.0 --max_firing_rate=30.0 + +echo "Generate the multi-session RNN data (no multi-session synth example in paper)" +python generate_chaotic_rnn_data.py --save_dir=$SYNTH_PATH --datafile_name=chaotic_rnn_multisession --synth_data_seed=5 --T=1.0 --C=150 --N=100 --S=20 --npcs=10 --train_percentage=0.8 --nspikifications=40 --g=1.5 --x0_std=1.0 --tau=0.025 --dt=0.01 --input_magnitude=0.0 --max_firing_rate=30.0 + +echo "Generating Integration-to-bound RNN data" +python generate_itb_data.py --save_dir=$SYNTH_PATH --datafile_name=itb_rnn --u_std=0.25 --checkpoint_path=SAMPLE_CHECKPOINT --synth_data_seed=5 --T=1.0 --C=800 --N=50 --train_percentage=0.8 --nspikifications=5 --tau=0.025 --dt=0.01 --max_firing_rate=30.0 + +echo "Generating chaotic rnn data with external input labels (no external input labels example in paper)" +python generate_labeled_rnn_data.py --save_dir=$SYNTH_PATH --datafile_name=chaotic_rnns_labeled --synth_data_seed=5 --T=1.0 --C=400 --N=50 --train_percentage=0.8 --nspikifications=10 --g=1.5 --x0_std=1.0 --tau=0.025 --dt=0.01 --max_firing_rate=30.0 diff --git a/lfads/synth_data/synthetic_data_utils.py b/lfads/synth_data/synthetic_data_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..d01031c191e2abb8e38a936138d9dcd2c8a215fe --- /dev/null +++ b/lfads/synth_data/synthetic_data_utils.py @@ -0,0 +1,322 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# ============================================================================== +from __future__ import print_function + +import h5py +import numpy as np +import os + +from utils import write_datasets +import matplotlib +import matplotlib.pyplot as plt +import scipy.signal + + +def generate_rnn(rng, N, g, tau, dt, max_firing_rate): + """Create a (vanilla) RNN with a bunch of hyper parameters for generating +chaotic data. 
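+
+  Editorial note: generate_data below integrates these parameters as
+    x_t = (1 - dt/tau) * x_{t-1} + (dt/tau) * (g*W.dot(r_{t-1}) + b + Bin*u_t)
+    r_t = tanh(x_t)
+  so g scales the recurrent weights and dt/tau is the Euler integration step.
+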
+ Args: + rng: numpy random number generator + N: number of hidden units + g: scaling of recurrent weight matrix in g W, with W ~ N(0,1/N) + tau: time scale of individual unit dynamics + dt: time step for equation updates + max_firing_rate: how to resecale the -1,1 firing rates + Returns: + the dictionary of these parameters, plus some others. +""" + rnn = {} + rnn['N'] = N + rnn['W'] = rng.randn(N,N)/np.sqrt(N) + rnn['Bin'] = rng.randn(N)/np.sqrt(1.0) + rnn['Bin2'] = rng.randn(N)/np.sqrt(1.0) + rnn['b'] = np.zeros(N) + rnn['g'] = g + rnn['tau'] = tau + rnn['dt'] = dt + rnn['max_firing_rate'] = max_firing_rate + mfr = rnn['max_firing_rate'] # spikes / sec + nbins_per_sec = 1.0/rnn['dt'] # bins / sec + # Used for plotting in LFADS + rnn['conversion_factor'] = mfr / nbins_per_sec # spikes / bin + return rnn + + +def generate_data(rnn, T, E, x0s=None, P_sxn=None, input_magnitude=0.0, + input_times=None): + """ Generates data from an randomly initialized RNN. + Args: + rnn: the rnn + T: Time in seconds to run (divided by rnn['dt'] to get steps, rounded down. + E: total number of examples + S: number of samples (subsampling N) + Returns: + A list of length E of NxT tensors of the network being run. + """ + N = rnn['N'] + def run_rnn(rnn, x0, ntime_steps, input_time=None): + rs = np.zeros([N,ntime_steps]) + x_tm1 = x0 + r_tm1 = np.tanh(x0) + tau = rnn['tau'] + dt = rnn['dt'] + alpha = (1.0-dt/tau) + W = dt/tau*rnn['W']*rnn['g'] + Bin = dt/tau*rnn['Bin'] + Bin2 = dt/tau*rnn['Bin2'] + b = dt/tau*rnn['b'] + + us = np.zeros([1, ntime_steps]) + for t in range(ntime_steps): + x_t = alpha*x_tm1 + np.dot(W,r_tm1) + b + if input_time is not None and t == input_time: + us[0,t] = input_magnitude + x_t += Bin * us[0,t] # DCS is this what was used? + r_t = np.tanh(x_t) + x_tm1 = x_t + r_tm1 = r_t + rs[:,t] = r_t + return rs, us + + if P_sxn is None: + P_sxn = np.eye(N) + ntime_steps = int(T / rnn['dt']) + data_e = [] + inputs_e = [] + for e in range(E): + input_time = input_times[e] if input_times is not None else None + r_nxt, u_uxt = run_rnn(rnn, x0s[:,e], ntime_steps, input_time) + r_sxt = np.dot(P_sxn, r_nxt) + inputs_e.append(u_uxt) + data_e.append(r_sxt) + + S = P_sxn.shape[0] + data_e = normalize_rates(data_e, E, S) + + return data_e, x0s, inputs_e + + +def normalize_rates(data_e, E, S): + # Normalization, made more complex because of the P matrices. + # Normalize by min and max in each channel. This normalization will + # cause offset differences between identical rnn runs, but different + # t hits. + for e in range(E): + r_sxt = data_e[e] + for i in range(S): + rmin = np.min(r_sxt[i,:]) + rmax = np.max(r_sxt[i,:]) + assert rmax - rmin != 0, 'Something wrong' + r_sxt[i,:] = (r_sxt[i,:] - rmin)/(rmax-rmin) + data_e[e] = r_sxt + return data_e + + +def spikify_data(data_e, rng, dt=1.0, max_firing_rate=100): + """ Apply spikes to a continuous dataset whose values are between 0.0 and 1.0 + Args: + data_e: nexamples length list of NxT trials + dt: how often the data are sampled + max_firing_rate: the firing rate that is associated with a value of 1.0 + Returns: + spikified_data_e: a list of length b of the data represented as spikes, + sampled from the underlying poisson process. 
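+
+    Example (editorial, for illustration): with dt=0.01 and
+    max_firing_rate=100, a normalized rate of 0.5 is drawn as
+    rng.poisson(0.5 * 100 * 0.01), i.e. Poisson counts with a mean of
+    0.5 spikes per bin.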
+ """ + + spikifies_data_e = [] + E = len(data_e) + spikes_e = [] + for e in range(E): + data = data_e[e] + N,T = data.shape + data_s = np.zeros([N,T]).astype(np.int) + for n in range(N): + f = data[n,:] + s = rng.poisson(f*max_firing_rate*dt, size=T) + data_s[n,:] = s + spikes_e.append(data_s) + + return spikes_e + + +def get_train_n_valid_inds(num_trials, train_fraction, nspikifications): + """Split the numbers between 0 and num_trials-1 into two portions for + training and validation, based on the train fraction. + Args: + num_trials: the number of trials + train_fraction: (e.g. .80) + nspikifications: the number of spiking trials per initial condition + Returns: + a 2-tuple of two lists: the training indices and validation indices + """ + train_inds = [] + valid_inds = [] + for i in range(num_trials): + # This line divides up the trials so that within one initial condition, + # the randomness of spikifying the condition is shared among both + # training and validation data splits. + if (i % nspikifications)+1 > train_fraction * nspikifications: + valid_inds.append(i) + else: + train_inds.append(i) + + return train_inds, valid_inds + + +def split_list_by_inds(data, inds1, inds2): + """Take the data, a list, and split it up based on the indices in inds1 and + inds2. + Args: + data: the list of data to split + inds1, the first list of indices + inds2, the second list of indices + Returns: a 2-tuple of two lists. + """ + if data is None or len(data) == 0: + return [], [] + else: + dout1 = [data[i] for i in inds1] + dout2 = [data[i] for i in inds2] + return dout1, dout2 + + +def nparray_and_transpose(data_a_b_c): + """Convert the list of items in data to a numpy array, and transpose it + Args: + data: data_asbsc: a nested, nested list of length a, with sublist length + b, with sublist length c. + Returns: + a numpy 3-tensor with dimensions a x c x b +""" + data_axbxc = np.array([datum_b_c for datum_b_c in data_a_b_c]) + data_axcxb = np.transpose(data_axbxc, axes=[0,2,1]) + return data_axcxb + + +def add_alignment_projections(datasets, npcs, ntime=None, nsamples=None): + """Create a matrix that aligns the datasets a bit, under + the assumption that each dataset is observing the same underlying dynamical + system. + + Args: + datasets: The dictionary of dataset structures. + npcs: The number of pcs for each, basically like lfads factors. + nsamples (optional): Number of samples to take for each dataset. + ntime (optional): Number of time steps to take in each sample. + + Returns: + The dataset structures, with the field alignment_matrix_cxf added. + This is # channels x npcs dimension +""" + nchannels_all = 0 + channel_idxs = {} + conditions_all = {} + nconditions_all = 0 + for name, dataset in datasets.items(): + cidxs = np.where(dataset['P_sxn'])[1] # non-zero entries in columns + channel_idxs[name] = [cidxs[0], cidxs[-1]+1] + nchannels_all += cidxs[-1]+1 - cidxs[0] + conditions_all[name] = np.unique(dataset['condition_labels_train']) + + all_conditions_list = \ + np.unique(np.ndarray.flatten(np.array(conditions_all.values()))) + nconditions_all = all_conditions_list.shape[0] + + if ntime is None: + ntime = dataset['train_data'].shape[1] + if nsamples is None: + nsamples = dataset['train_data'].shape[0] + + # In the data workup in the paper, Chethan did intra condition + # averaging, so let's do that here. 
+ avg_data_all = {} + for name, conditions in conditions_all.items(): + dataset = datasets[name] + avg_data_all[name] = {} + for cname in conditions: + td_idxs = np.argwhere(np.array(dataset['condition_labels_train'])==cname) + data = np.squeeze(dataset['train_data'][td_idxs,:,:], axis=1) + avg_data = np.mean(data, axis=0) + avg_data_all[name][cname] = avg_data + + # Visualize this in the morning. + all_data_nxtc = np.zeros([nchannels_all, ntime * nconditions_all]) + for name, dataset in datasets.items(): + cidx_s = channel_idxs[name][0] + cidx_f = channel_idxs[name][1] + for cname in conditions_all[name]: + cidxs = np.argwhere(all_conditions_list == cname) + if cidxs.shape[0] > 0: + cidx = cidxs[0][0] + all_tidxs = np.arange(0, ntime+1) + cidx*ntime + all_data_nxtc[cidx_s:cidx_f, all_tidxs[0]:all_tidxs[-1]] = \ + avg_data_all[name][cname].T + + # A bit of filtering. We don't care about spectral properties, or + # filtering artifacts, simply correlate time steps a bit. + filt_len = 6 + bc_filt = np.ones([filt_len])/float(filt_len) + for c in range(nchannels_all): + all_data_nxtc[c,:] = scipy.signal.filtfilt(bc_filt, [1.0], all_data_nxtc[c,:]) + + # Compute the PCs. + all_data_mean_nx1 = np.mean(all_data_nxtc, axis=1, keepdims=True) + all_data_zm_nxtc = all_data_nxtc - all_data_mean_nx1 + corr_mat_nxn = np.dot(all_data_zm_nxtc, all_data_zm_nxtc.T) + evals_n, evecs_nxn = np.linalg.eigh(corr_mat_nxn) + sidxs = np.flipud(np.argsort(evals_n)) # sort such that 0th is highest + evals_n = evals_n[sidxs] + evecs_nxn = evecs_nxn[:,sidxs] + + # Project all the channels data onto the low-D PCA basis, where + # low-d is the npcs parameter. + all_data_pca_pxtc = np.dot(evecs_nxn[:, 0:npcs].T, all_data_zm_nxtc) + + # Now for each dataset, we regress the channel data onto the top + # pcs, and this will be our alignment matrix for that dataset. 
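+  # Editorial note on shapes: np.linalg.lstsq(A, B) below is called with
+  # A = all_data_zm_chxtc.T of shape (ntime*nconditions, nchannels) and
+  # B = all_data_pca_pxtc.T of shape (ntime*nconditions, npcs), so the
+  # returned W_chxp (nchannels x npcs) minimizes |B - A*W|^2 and is stored
+  # as each dataset's alignment_matrix_cxf.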
+ # |B - A*W|^2 + for name, dataset in datasets.items(): + cidx_s = channel_idxs[name][0] + cidx_f = channel_idxs[name][1] + all_data_zm_chxtc = all_data_zm_nxtc[cidx_s:cidx_f,:] # ch for channel + W_chxp, _, _, _ = \ + np.linalg.lstsq(all_data_zm_chxtc.T, all_data_pca_pxtc.T) + dataset['alignment_matrix_cxf'] = W_chxp + + do_debug_plot = False + if do_debug_plot: + pc_vecs = evecs_nxn[:,0:npcs] + ntoplot = 400 + + plt.figure() + plt.plot(np.log10(evals_n), '-x') + plt.figure() + plt.subplot(311) + plt.imshow(all_data_pca_pxtc) + plt.colorbar() + + plt.subplot(312) + plt.imshow(np.dot(W_chxp.T, all_data_zm_chxtc)) + plt.colorbar() + + plt.subplot(313) + plt.imshow(np.dot(all_data_zm_chxtc.T, W_chxp).T - all_data_pca_pxtc) + plt.colorbar() + + import pdb + pdb.set_trace() + + return datasets diff --git a/lfads/synth_data/trained_itb/model-65000.data-00000-of-00001 b/lfads/synth_data/trained_itb/model-65000.data-00000-of-00001 new file mode 100644 index 0000000000000000000000000000000000000000..9459a2a1b72f56dc16b3eca210911f14081e7fd5 Binary files /dev/null and b/lfads/synth_data/trained_itb/model-65000.data-00000-of-00001 differ diff --git a/lfads/synth_data/trained_itb/model-65000.index b/lfads/synth_data/trained_itb/model-65000.index new file mode 100644 index 0000000000000000000000000000000000000000..dd9c793acf8dc79e07833d1c0edc8a2fa86d806a Binary files /dev/null and b/lfads/synth_data/trained_itb/model-65000.index differ diff --git a/lfads/synth_data/trained_itb/model-65000.meta b/lfads/synth_data/trained_itb/model-65000.meta new file mode 100644 index 0000000000000000000000000000000000000000..07bd2b9688eda16e329e7b08492151a65a88fb8a Binary files /dev/null and b/lfads/synth_data/trained_itb/model-65000.meta differ diff --git a/lfads/utils.py b/lfads/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..7eb1db84f298894803b3917da6e018cb706ba0c4 --- /dev/null +++ b/lfads/utils.py @@ -0,0 +1,357 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# ============================================================================== +from __future__ import print_function + +import os +import h5py +import json + +import numpy as np +import tensorflow as tf + + +def log_sum_exp(x_k): + """Computes log \sum exp in a numerically stable way. + log ( sum_i exp(x_i) ) + log ( sum_i exp(x_i - m + m) ), with m = max(x_i) + log ( sum_i exp(x_i - m)*exp(m) ) + log ( sum_i exp(x_i - m) + m + + Args: + x_k - k -dimensional list of arguments to log_sum_exp. + + Returns: + log_sum_exp of the arguments. + """ + m = tf.reduce_max(x_k) + x1_k = x_k - m + u_k = tf.exp(x1_k) + z = tf.reduce_sum(u_k) + return tf.log(z) + m + + +def linear(x, out_size, do_bias=True, alpha=1.0, identity_if_possible=False, + normalized=False, name=None, collections=None): + """Linear (affine) transformation, y = x W + b, for a variety of + configurations. + + Args: + x: input The tensor to tranformation. + out_size: The integer size of non-batch output dimension. 
+ do_bias (optional): Add a learnable bias vector to the operation. + alpha (optional): A multiplicative scaling for the weight initialization + of the matrix, in the form \alpha * 1/\sqrt{x.shape[1]}. + identity_if_possible (optional): just return identity, + if x.shape[1] == out_size. + normalized (optional): Option to divide out by the norms of the rows of W. + name (optional): The name prefix to add to variables. + collections (optional): List of additional collections. (Placed in + tf.GraphKeys.GLOBAL_VARIABLES already, so no need for that.) + + Returns: + In the equation, y = x W + b, returns the tensorflow op that yields y. + """ + in_size = int(x.get_shape()[1]) # from Dimension(10) -> 10 + stddev = alpha/np.sqrt(float(in_size)) + mat_init = tf.random_normal_initializer(0.0, stddev) + wname = (name + "/W") if name else "/W" + + if identity_if_possible and in_size == out_size: + # Sometimes linear layers are nothing more than size adapters. + return tf.identity(x, name=(wname+'_ident')) + + W,b = init_linear(in_size, out_size, do_bias=do_bias, alpha=alpha, + normalized=normalized, name=name, collections=collections) + + if do_bias: + return tf.matmul(x, W) + b + else: + return tf.matmul(x, W) + + +def init_linear(in_size, out_size, do_bias=True, mat_init_value=None, alpha=1.0, + identity_if_possible=False, normalized=False, + name=None, collections=None): + """Linear (affine) transformation, y = x W + b, for a variety of + configurations. + + Args: + in_size: The integer size of the non-batc input dimension. [(x),y] + out_size: The integer size of non-batch output dimension. [x,(y)] + do_bias (optional): Add a learnable bias vector to the operation. + mat_init_value (optional): numpy constant for matrix initialization, if None + , do random, with additional parameters. + alpha (optional): A multiplicative scaling for the weight initialization + of the matrix, in the form \alpha * 1/\sqrt{x.shape[1]}. + identity_if_possible (optional): just return identity, + if x.shape[1] == out_size. + normalized (optional): Option to divide out by the norms of the rows of W. + name (optional): The name prefix to add to variables. + collections (optional): List of additional collections. (Placed in + tf.GraphKeys.GLOBAL_VARIABLES already, so no need for that.) + + Returns: + In the equation, y = x W + b, returns the pair (W, b). + """ + + if mat_init_value is not None and mat_init_value.shape != (in_size, out_size): + raise ValueError( + 'Provided mat_init_value must have shape [%d, %d].'%(in_size, out_size)) + + if mat_init_value is None: + stddev = alpha/np.sqrt(float(in_size)) + mat_init = tf.random_normal_initializer(0.0, stddev) + + wname = (name + "/W") if name else "/W" + + if identity_if_possible and in_size == out_size: + return (tf.constant(np.eye(in_size).astype(np.float32)), + tf.zeros(in_size)) + + # Note the use of get_variable vs. tf.Variable. this is because get_variable + # does not allow the initialization of the variable with a value. 
+ if normalized: + w_collections = [tf.GraphKeys.GLOBAL_VARIABLES, "norm-variables"] + if collections: + w_collections += collections + if mat_init_value is not None: + w = tf.Variable(mat_init_value, name=wname, collections=w_collections) + else: + w = tf.get_variable(wname, [in_size, out_size], initializer=mat_init, + collections=w_collections) + w = tf.nn.l2_normalize(w, dim=0) # x W, so xW_j = \sum_i x_bi W_ij + else: + w_collections = [tf.GraphKeys.GLOBAL_VARIABLES] + if collections: + w_collections += collections + if mat_init_value is not None: + w = tf.Variable(mat_init_value, name=wname, collections=w_collections) + else: + w = tf.get_variable(wname, [in_size, out_size], initializer=mat_init, + collections=w_collections) + + if do_bias: + b_collections = [tf.GraphKeys.GLOBAL_VARIABLES] + if collections: + b_collections += collections + bname = (name + "/b") if name else "/b" + b = tf.get_variable(bname, [1, out_size], + initializer=tf.zeros_initializer(), + collections=b_collections) + else: + b = None + + return (w, b) + + +def write_data(data_fname, data_dict, use_json=False, compression=None): + """Write data in HD5F format. + + Args: + data_fname: The filename of teh file in which to write the data. + data_dict: The dictionary of data to write. The keys are strings + and the values are numpy arrays. + use_json (optional): human readable format for simple items + compression (optional): The compression to use for h5py (disabled by + default because the library borks on scalars, otherwise try 'gzip'). + """ + + dir_name = os.path.dirname(data_fname) + if not os.path.exists(dir_name): + os.makedirs(dir_name) + + if use_json: + the_file = open(data_fname,'w') + json.dump(data_dict, the_file) + the_file.close() + else: + try: + with h5py.File(data_fname, 'w') as hf: + for k, v in data_dict.items(): + clean_k = k.replace('/', '_') + if clean_k is not k: + print('Warning: saving variable with name: ', k, ' as ', clean_k) + else: + print('Saving variable with name: ', clean_k) + hf.create_dataset(clean_k, data=v, compression=compression) + except IOError: + print("Cannot open %s for writing.", data_fname) + raise + + +def read_data(data_fname): + """ Read saved data in HDF5 format. + + Args: + data_fname: The filename of the file from which to read the data. + Returns: + A dictionary whose keys will vary depending on dataset (but should + always contain the keys 'train_data' and 'valid_data') and whose + values are numpy arrays. + """ + + try: + with h5py.File(data_fname, 'r') as hf: + data_dict = {k: np.array(v) for k, v in hf.items()} + return data_dict + except IOError: + print("Cannot open %s for reading." % data_fname) + raise + + +def write_datasets(data_path, data_fname_stem, dataset_dict, compression=None): + """Write datasets in HD5F format. + + This function assumes the dataset_dict is a mapping ( string -> + to data_dict ). It calls write_data for each data dictionary, + post-fixing the data filename with the key of the dataset. + + Args: + data_path: The path to the save directory. + data_fname_stem: The filename stem of the file in which to write the data. + dataset_dict: The dictionary of datasets. The keys are strings + and the values data dictionaries (str -> numpy arrays) associations. + compression (optional): The compression to use for h5py (disabled by + default because the library borks on scalars, otherwise try 'gzip'). 
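+
+  Example (editorial; names are hypothetical):
+    write_datasets('/tmp/lfads_data', 'my_synth', {'dataset_A': data_dict})
+    writes a single HDF5 file at /tmp/lfads_data/my_synth_dataset_A.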
+ """ + + full_name_stem = os.path.join(data_path, data_fname_stem) + for s, data_dict in dataset_dict.items(): + write_data(full_name_stem + "_" + s, data_dict, compression=compression) + + +def read_datasets(data_path, data_fname_stem): + """Read dataset sin HD5F format. + + This function assumes the dataset_dict is a mapping ( string -> + to data_dict ). It calls write_data for each data dictionary, + post-fixing the data filename with the key of the dataset. + + Args: + data_path: The path to the save directory. + data_fname_stem: The filename stem of the file in which to write the data. + """ + + dataset_dict = {} + fnames = os.listdir(data_path) + + print ('loading data from ' + data_path + ' with stem ' + data_fname_stem) + for fname in fnames: + if fname.startswith(data_fname_stem): + data_dict = read_data(os.path.join(data_path,fname)) + idx = len(data_fname_stem) + 1 + key = fname[idx:] + data_dict['data_dim'] = data_dict['train_data'].shape[2] + data_dict['num_steps'] = data_dict['train_data'].shape[1] + dataset_dict[key] = data_dict + + if len(dataset_dict) == 0: + raise ValueError("Failed to load any datasets, are you sure that the " + "'--data_dir' and '--data_filename_stem' flag values " + "are correct?") + + print (str(len(dataset_dict)) + ' datasets loaded') + return dataset_dict + + +# NUMPY utility functions +def list_t_bxn_to_list_b_txn(values_t_bxn): + """Convert a length T list of BxN numpy tensors of length B list of TxN numpy + tensors. + + Args: + values_t_bxn: The length T list of BxN numpy tensors. + + Returns: + The length B list of TxN numpy tensors. + """ + T = len(values_t_bxn) + B, N = values_t_bxn[0].shape + values_b_txn = [] + for b in range(B): + values_pb_txn = np.zeros([T,N]) + for t in range(T): + values_pb_txn[t,:] = values_t_bxn[t][b,:] + values_b_txn.append(values_pb_txn) + + return values_b_txn + + +def list_t_bxn_to_tensor_bxtxn(values_t_bxn): + """Convert a length T list of BxN numpy tensors to single numpy tensor with + shape BxTxN. + + Args: + values_t_bxn: The length T list of BxN numpy tensors. + + Returns: + values_bxtxn: The BxTxN numpy tensor. + """ + + T = len(values_t_bxn) + B, N = values_t_bxn[0].shape + values_bxtxn = np.zeros([B,T,N]) + for t in range(T): + values_bxtxn[:,t,:] = values_t_bxn[t] + + return values_bxtxn + + +def tensor_bxtxn_to_list_t_bxn(tensor_bxtxn): + """Convert a numpy tensor with shape BxTxN to a length T list of numpy tensors + with shape BxT. + + Args: + tensor_bxtxn: The BxTxN numpy tensor. + + Returns: + A length T list of numpy tensors with shape BxT. + """ + + values_t_bxn = [] + B, T, N = tensor_bxtxn.shape + for t in range(T): + values_t_bxn.append(np.squeeze(tensor_bxtxn[:,t,:])) + + return values_t_bxn + + +def flatten(list_of_lists): + """Takes a list of lists and returns a list of the elements. + + Args: + list_of_lists: List of lists. + + Returns: + flat_list: Flattened list. + flat_list_idxs: Flattened list indices. 
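+
+  Example (editorial): flatten([[1, 2], [3], 4]) gives
+  flat_list == [1, 2, 3, 4] and flat_list_idxs == [[0, 1], [2], [3]]
+  (under Python 3 the inner index lists are range objects).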
+ """ + flat_list = [] + flat_list_idxs = [] + start_idx = 0 + for item in list_of_lists: + if isinstance(item, list): + flat_list += item + l = len(item) + idxs = range(start_idx, start_idx+l) + start_idx = start_idx+l + else: # a value + flat_list.append(item) + idxs = [start_idx] + start_idx += 1 + flat_list_idxs.append(idxs) + + return flat_list, flat_list_idxs diff --git a/lm_1b/README.md b/lm_1b/README.md index 86203cd646c26e870aacebc5e1e06df709674b15..24de775c86b8b2d0b680d2188841ed9a138df462 100644 --- a/lm_1b/README.md +++ b/lm_1b/README.md @@ -73,7 +73,7 @@ LSTM-8192-2048 (50\% Dropout) | 32.2 | 3.3 How To Run -Pre-requesite: +Prerequisites: * Install TensorFlow. * Install Bazel. @@ -97,7 +97,7 @@ Pre-requesite: [link](http://download.tensorflow.org/models/LM_LSTM_CNN/vocab-2016-09-10.txt) * test dataset: link [link](http://download.tensorflow.org/models/LM_LSTM_CNN/test/news.en.heldout-00000-of-00050) -* It is recommended to run on modern desktop instead of laptop. +* It is recommended to run on a modern desktop instead of a laptop. ```shell # 1. Clone the code to your workspace. @@ -105,7 +105,7 @@ Pre-requesite: # 3. Create an empty WORKSPACE file in your workspace. # 4. Create an empty output directory in your workspace. # Example directory structure below: -ls -R +$ ls -R .: data lm_1b output WORKSPACE @@ -121,13 +121,13 @@ BUILD data_utils.py lm_1b_eval.py README.md ./output: # Build the codes. -bazel build -c opt lm_1b/... +$ bazel build -c opt lm_1b/... # Run sample mode: -bazel-bin/lm_1b/lm_1b_eval --mode sample \ - --prefix "I love that I" \ - --pbtxt data/graph-2016-09-10.pbtxt \ - --vocab_file data/vocab-2016-09-10.txt \ - --ckpt 'data/ckpt-*' +$ bazel-bin/lm_1b/lm_1b_eval --mode sample \ + --prefix "I love that I" \ + --pbtxt data/graph-2016-09-10.pbtxt \ + --vocab_file data/vocab-2016-09-10.txt \ + --ckpt 'data/ckpt-*' ...(omitted some TensorFlow output) I love I love that @@ -138,11 +138,11 @@ I love that I find that amazing ...(omitted) # Run eval mode: -bazel-bin/lm_1b/lm_1b_eval --mode eval \ - --pbtxt data/graph-2016-09-10.pbtxt \ - --vocab_file data/vocab-2016-09-10.txt \ - --input_data data/news.en.heldout-00000-of-00050 \ - --ckpt 'data/ckpt-*' +$ bazel-bin/lm_1b/lm_1b_eval --mode eval \ + --pbtxt data/graph-2016-09-10.pbtxt \ + --vocab_file data/vocab-2016-09-10.txt \ + --input_data data/news.en.heldout-00000-of-00050 \ + --ckpt 'data/ckpt-*' ...(omitted some TensorFlow output) Loaded step 14108582. # perplexity is high initially because words without context are harder to @@ -166,28 +166,28 @@ Eval Step: 4531, Average Perplexity: 29.285674. ...(omitted. At convergence, it should be around 30.) # Run dump_emb mode: -bazel-bin/lm_1b/lm_1b_eval --mode dump_emb \ - --pbtxt data/graph-2016-09-10.pbtxt \ - --vocab_file data/vocab-2016-09-10.txt \ - --ckpt 'data/ckpt-*' \ - --save_dir output +$ bazel-bin/lm_1b/lm_1b_eval --mode dump_emb \ + --pbtxt data/graph-2016-09-10.pbtxt \ + --vocab_file data/vocab-2016-09-10.txt \ + --ckpt 'data/ckpt-*' \ + --save_dir output ...(omitted some TensorFlow output) Finished softmax weights Finished word embedding 0/793471 Finished word embedding 1/793471 Finished word embedding 2/793471 ...(omitted) -ls output/ +$ ls output/ embeddings_softmax.npy ... # Run dump_lstm_emb mode: -bazel-bin/lm_1b/lm_1b_eval --mode dump_lstm_emb \ - --pbtxt data/graph-2016-09-10.pbtxt \ - --vocab_file data/vocab-2016-09-10.txt \ - --ckpt 'data/ckpt-*' \ - --sentence "I love who I am ." 
\ - --save_dir output -ls output/ +$ bazel-bin/lm_1b/lm_1b_eval --mode dump_lstm_emb \ + --pbtxt data/graph-2016-09-10.pbtxt \ + --vocab_file data/vocab-2016-09-10.txt \ + --ckpt 'data/ckpt-*' \ + --sentence "I love who I am ." \ + --save_dir output +$ ls output/ lstm_emb_step_0.npy lstm_emb_step_2.npy lstm_emb_step_4.npy lstm_emb_step_6.npy lstm_emb_step_1.npy lstm_emb_step_3.npy lstm_emb_step_5.npy diff --git a/lm_1b/lm_1b_eval.py b/lm_1b/lm_1b_eval.py index 65c48aa4a543091b4f0af27ce7927da206de4ca2..ce8634757558c135ba137a9b9e09a733977adc3a 100644 --- a/lm_1b/lm_1b_eval.py +++ b/lm_1b/lm_1b_eval.py @@ -19,6 +19,7 @@ import os import sys import numpy as np +from six.moves import xrange import tensorflow as tf from google.protobuf import text_format @@ -83,7 +84,7 @@ def _LoadModel(gd_file, ckpt_file): with tf.Graph().as_default(): sys.stderr.write('Recovering graph.\n') with tf.gfile.FastGFile(gd_file, 'r') as f: - s = f.read() + s = f.read().decode() gd = tf.GraphDef() text_format.Merge(s, gd) @@ -230,7 +231,7 @@ def _DumpEmb(vocab): sys.stderr.write('Finished softmax weights\n') all_embs = np.zeros([vocab.size, 1024]) - for i in range(vocab.size): + for i in xrange(vocab.size): input_dict = {t['inputs_in']: inputs, t['targets_in']: targets, t['target_weights_in']: weights} diff --git a/neural_gpu/README.md b/neural_gpu/README.md index b73dd85ef7cea67b6c3ca681f52b89f0119d8f93..510f1c5e0aef697f503bc7b856e032db7e402be7 100644 --- a/neural_gpu/README.md +++ b/neural_gpu/README.md @@ -1,6 +1,6 @@ # NeuralGPU -Code for the Neural GPU model described in [[http://arxiv.org/abs/1511.08228]]. -The extended version was described in [[https://arxiv.org/abs/1610.08613]]. +Code for the Neural GPU model described in http://arxiv.org/abs/1511.08228. +The extended version was described in https://arxiv.org/abs/1610.08613. Requirements: * TensorFlow (see tensorflow.org for how to install) diff --git a/neural_gpu/neural_gpu.py b/neural_gpu/neural_gpu.py index 4d18773937f0c94c45f1c0a3baebea64955847ad..e8ba66e9d774f48cc4e5d7ccbd8a1c16f999f705 100644 --- a/neural_gpu/neural_gpu.py +++ b/neural_gpu/neural_gpu.py @@ -478,8 +478,10 @@ class NeuralGPU(object): # This is just for running a baseline RNN seq2seq model. if do_rnn: self.after_enc_step.append(step) # Not meaningful here, but needed. - lstm_cell = tf.contrib.rnn.BasicLSTMCell(height * nmaps) - cell = tf.contrib.rnn.MultiRNNCell([lstm_cell] * nconvs) + def lstm_cell(): + return tf.contrib.rnn.BasicLSTMCell(height * nmaps) + cell = tf.contrib.rnn.MultiRNNCell( + [lstm_cell() for _ in range(nconvs)]) with tf.variable_scope("encoder"): encoder_outputs, encoder_state = tf.nn.dynamic_rnn( cell, tf.reshape(step, [batch_size, length, height * nmaps]), diff --git a/next_frame_prediction/README.md b/next_frame_prediction/README.md index 09d32205e390de4d15fea9901bb3209723e161c3..d79a6d4c78a5f2f703fe59bbdf0c6df5f865fab8 100644 --- a/next_frame_prediction/README.md +++ b/next_frame_prediction/README.md @@ -12,17 +12,11 @@ Authors: Xin Pan (Github: panyx0718), Anelia Angelova Results: - ![Sample1](g3doc/cross_conv.png) - - + ![Sample2](g3doc/cross_conv2.png) - - ![Loss](g3doc/cross_conv3.png) - - Prerequisite: @@ -40,7 +34,7 @@ to tf.SequenceExample. How to run: ```shell -ls -R +$ ls -R .: data next_frame_prediction WORKSPACE @@ -58,18 +52,18 @@ cross_conv2.png cross_conv3.png cross_conv.png # Build everything. -bazel build -c opt next_frame_prediction/... +$ bazel build -c opt next_frame_prediction/... # The following example runs the generated 2d objects. 
# For Sprites dataset, image_size should be 60, norm_scale should be 255.0. # Batch size is normally 16~64, depending on your memory size. -# + # Run training. -bazel-bin/next_frame_prediction/cross_conv/train \ - --batch_size=1 \ - --data_filepattern=data/tfrecords \ - --image_size=64 \ - --log_root=/tmp/predict +$ bazel-bin/next_frame_prediction/cross_conv/train \ + --batch_size=1 \ + --data_filepattern=data/tfrecords \ + --image_size=64 \ + --log_root=/tmp/predict step: 1, loss: 24.428671 step: 2, loss: 19.211605 @@ -81,11 +75,11 @@ step: 7, loss: 1.747665 step: 8, loss: 1.572436 step: 9, loss: 1.586816 step: 10, loss: 1.434191 -# + # Run eval. -bazel-bin/next_frame_prediction/cross_conv/eval \ - --batch_size=1 \ - --data_filepattern=data/tfrecords_test \ - --image_size=64 \ - --log_root=/tmp/predict +$ bazel-bin/next_frame_prediction/cross_conv/eval \ + --batch_size=1 \ + --data_filepattern=data/tfrecords_test \ + --image_size=64 \ + --log_root=/tmp/predict ``` diff --git a/object_detection/BUILD b/object_detection/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..f77e3d644334b572f4cf8aa261ffcf34ab518bc6 --- /dev/null +++ b/object_detection/BUILD @@ -0,0 +1,178 @@ +# Tensorflow Object Detection API: main runnables. + +package( + default_visibility = ["//visibility:public"], +) + +licenses(["notice"]) + +# Apache 2.0 + +py_binary( + name = "train", + srcs = [ + "train.py", + ], + deps = [ + ":trainer", + "//tensorflow", + "//tensorflow_models/object_detection/builders:input_reader_builder", + "//tensorflow_models/object_detection/builders:model_builder", + "//tensorflow_models/object_detection/protos:input_reader_py_pb2", + "//tensorflow_models/object_detection/protos:model_py_pb2", + "//tensorflow_models/object_detection/protos:pipeline_py_pb2", + "//tensorflow_models/object_detection/protos:train_py_pb2", + ], +) + +py_library( + name = "trainer", + srcs = ["trainer.py"], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/builders:optimizer_builder", + "//tensorflow_models/object_detection/builders:preprocessor_builder", + "//tensorflow_models/object_detection/core:batcher", + "//tensorflow_models/object_detection/core:standard_fields", + "//tensorflow_models/object_detection/utils:ops", + "//tensorflow_models/object_detection/utils:variables_helper", + "//tensorflow_models/slim:model_deploy", + ], +) + +py_test( + name = "trainer_test", + srcs = ["trainer_test.py"], + deps = [ + ":trainer", + "//tensorflow", + "//tensorflow_models/object_detection/core:losses", + "//tensorflow_models/object_detection/core:model", + "//tensorflow_models/object_detection/core:standard_fields", + "//tensorflow_models/object_detection/protos:train_py_pb2", + ], +) + +py_library( + name = "eval_util", + srcs = [ + "eval_util.py", + ], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/utils:label_map_util", + "//tensorflow_models/object_detection/utils:object_detection_evaluation", + "//tensorflow_models/object_detection/utils:visualization_utils", + ], +) + +py_library( + name = "evaluator", + srcs = ["evaluator.py"], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection:eval_util", + "//tensorflow_models/object_detection/core:box_list", + "//tensorflow_models/object_detection/core:box_list_ops", + "//tensorflow_models/object_detection/core:prefetcher", + "//tensorflow_models/object_detection/core:standard_fields", + "//tensorflow_models/object_detection/protos:eval_py_pb2", + ], +) + +py_binary( + name = "eval", + srcs = [ + 
"eval.py", + ], + deps = [ + ":evaluator", + "//tensorflow", + "//tensorflow_models/object_detection/builders:input_reader_builder", + "//tensorflow_models/object_detection/builders:model_builder", + "//tensorflow_models/object_detection/protos:eval_py_pb2", + "//tensorflow_models/object_detection/protos:input_reader_py_pb2", + "//tensorflow_models/object_detection/protos:model_py_pb2", + "//tensorflow_models/object_detection/protos:pipeline_py_pb2", + "//tensorflow_models/object_detection/utils:label_map_util", + ], +) + +py_library( + name = "exporter", + srcs = [ + "exporter.py", + ], + deps = [ + "//tensorflow", + "//tensorflow/python/tools:freeze_graph_lib", + "//tensorflow_models/object_detection/builders:model_builder", + "//tensorflow_models/object_detection/core:standard_fields", + "//tensorflow_models/object_detection/data_decoders:tf_example_decoder", + ], +) + +py_test( + name = "exporter_test", + srcs = [ + "exporter_test.py", + ], + deps = [ + ":exporter", + "//tensorflow", + "//tensorflow_models/object_detection/builders:model_builder", + "//tensorflow_models/object_detection/core:model", + "//tensorflow_models/object_detection/protos:pipeline_py_pb2", + ], +) + +py_binary( + name = "export_inference_graph", + srcs = [ + "export_inference_graph.py", + ], + deps = [ + ":exporter", + "//tensorflow", + "//tensorflow_models/object_detection/protos:pipeline_py_pb2", + ], +) + +py_binary( + name = "create_pascal_tf_record", + srcs = [ + "create_pascal_tf_record.py", + ], + deps = [ + "//third_party/py/PIL:pil", + "//third_party/py/lxml", + "//tensorflow", + "//tensorflow_models/object_detection/utils:dataset_util", + "//tensorflow_models/object_detection/utils:label_map_util", + ], +) + +py_test( + name = "create_pascal_tf_record_test", + srcs = [ + "create_pascal_tf_record_test.py", + ], + deps = [ + ":create_pascal_tf_record", + "//tensorflow", + ], +) + +py_binary( + name = "create_pet_tf_record", + srcs = [ + "create_pet_tf_record.py", + ], + deps = [ + "//third_party/py/PIL:pil", + "//third_party/py/lxml", + "//tensorflow", + "//tensorflow_models/object_detection/utils:dataset_util", + "//tensorflow_models/object_detection/utils:label_map_util", + ], +) diff --git a/object_detection/CONTRIBUTING.md b/object_detection/CONTRIBUTING.md new file mode 100644 index 0000000000000000000000000000000000000000..e3d87e3ce90fb4dd22b00a2c5368bf17c3610661 --- /dev/null +++ b/object_detection/CONTRIBUTING.md @@ -0,0 +1,13 @@ +# Contributing to the Tensorflow Object Detection API + +Patches to Tensorflow Object Detection API are welcome! + +We require contributors to fill out either the individual or corporate +Contributor License Agreement (CLA). + + * If you are an individual writing original source code and you're sure you own the intellectual property, then you'll need to sign an [individual CLA](http://code.google.com/legal/individual-cla-v1.0.html). + * If you work for a company that wants to allow you to contribute your work, then you'll need to sign a [corporate CLA](http://code.google.com/legal/corporate-cla-v1.0.html). + +Please follow the +[Tensorflow contributing guidelines](https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md) +when submitting pull requests. 
diff --git a/object_detection/README.md b/object_detection/README.md new file mode 100644 index 0000000000000000000000000000000000000000..eaf13817b41c694e92d449ab803e37eaef79aa54 --- /dev/null +++ b/object_detection/README.md @@ -0,0 +1,80 @@ +# Tensorflow Object Detection API +Creating accurate machine learning models capable of localizing and identifying +multiple objects in a single image remains a core challenge in computer vision. +The TensorFlow Object Detection API is an open source framework built on top of +TensorFlow that makes it easy to construct, train and deploy object detection +models. At Google we’ve certainly found this codebase to be useful for our +computer vision needs, and we hope that you will as well. +

+Contributions to the codebase are welcome and we would love to hear back from
+you if you find this API useful. Finally, if you use the Tensorflow Object
+Detection API for a research publication, please consider citing:
+
+```
+"Speed/accuracy trade-offs for modern convolutional object detectors."
+Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z,
+Song Y, Guadarrama S, Murphy K, CVPR 2017
+```
+\[[link](https://arxiv.org/abs/1611.10012)\]\[[bibtex](
+https://scholar.googleusercontent.com/scholar.bib?q=info:l291WsrB-hQJ:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAWUIIlnPZ_L9jxvPwcC49kDlELtaeIyU-&scisf=4&ct=citation&cd=-1&hl=en&scfhb=1)\]
+
+## Maintainers
+
+* Jonathan Huang, github: [jch1](https://github.com/jch1)
+* Vivek Rathod, github: [tombstone](https://github.com/tombstone)
+* Derek Chow, github: [derekjchow](https://github.com/derekjchow)
+* Chen Sun, github: [jesu9](https://github.com/jesu9)
+* Menglong Zhu, github: [dreamdragon](https://github.com/dreamdragon)
+
+## Table of contents
+
+Quick Start:
+* Quick Start: Jupyter notebook for off-the-shelf inference
+* Quick Start: Training a pet detector
+
+Setup:
+* Installation
+* Configuring an object detection pipeline
+* Preparing inputs
+
+Running:
+* Running locally
+* Running on the cloud
+
+Extras:
+* Tensorflow detection model zoo
+* Exporting a trained model for inference
+* Defining your own model architecture
+ +## Release information + +### June 15, 2017 + +In addition to our base Tensorflow detection model definitions, this +release includes: + +* A selection of trainable detection models, including: + * Single Shot Multibox Detector (SSD) with MobileNet, + * SSD with Inception V2, + * Region-Based Fully Convolutional Networks (R-FCN) with Resnet 101, + * Faster RCNN with Resnet 101, + * Faster RCNN with Inception Resnet v2 +* Frozen weights (trained on the COCO dataset) for each of the above models to + be used for out-of-the-box inference purposes. +* A [Jupyter notebook](object_detection_tutorial.ipynb) for performing + out-of-the-box inference with one of our released models +* Convenient [local training](g3doc/running_locally.md) scripts as well as + distributed training and evaluation pipelines via + [Google Cloud](g3doc/running_on_cloud.md). + + +Thanks to contributors: Jonathan Huang, Vivek Rathod, Derek Chow, +Chen Sun, Menglong Zhu, Matthew Tang, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Jasper Uijlings, +Viacheslav Kovalevskyi, Kevin Murphy diff --git a/object_detection/__init__.py b/object_detection/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/object_detection/anchor_generators/BUILD b/object_detection/anchor_generators/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..cb421a0c16b5d7b367590d5a9f4004fac2f48212 --- /dev/null +++ b/object_detection/anchor_generators/BUILD @@ -0,0 +1,56 @@ +# Tensorflow Object Detection API: Anchor Generator implementations. + +package( + default_visibility = ["//visibility:public"], +) + +licenses(["notice"]) + +# Apache 2.0 +py_library( + name = "grid_anchor_generator", + srcs = [ + "grid_anchor_generator.py", + ], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/core:anchor_generator", + "//tensorflow_models/object_detection/core:box_list", + "//tensorflow_models/object_detection/utils:ops", + ], +) + +py_test( + name = "grid_anchor_generator_test", + srcs = [ + "grid_anchor_generator_test.py", + ], + deps = [ + ":grid_anchor_generator", + "//tensorflow", + ], +) + +py_library( + name = "multiple_grid_anchor_generator", + srcs = [ + "multiple_grid_anchor_generator.py", + ], + deps = [ + ":grid_anchor_generator", + "//tensorflow", + "//tensorflow_models/object_detection/core:anchor_generator", + "//tensorflow_models/object_detection/core:box_list_ops", + ], +) + +py_test( + name = "multiple_grid_anchor_generator_test", + srcs = [ + "multiple_grid_anchor_generator_test.py", + ], + deps = [ + ":multiple_grid_anchor_generator", + "//third_party/py/numpy", + ], +) diff --git a/object_detection/anchor_generators/__init__.py b/object_detection/anchor_generators/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/object_detection/anchor_generators/grid_anchor_generator.py b/object_detection/anchor_generators/grid_anchor_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..d2ea2c07d5dd9fec976361da265ee6ff620fab5a --- /dev/null +++ b/object_detection/anchor_generators/grid_anchor_generator.py @@ -0,0 +1,194 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Generates grid anchors on the fly as used in Faster RCNN. + +Generates grid anchors on the fly as described in: +"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" +Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. +""" + +import tensorflow as tf + +from object_detection.core import anchor_generator +from object_detection.core import box_list +from object_detection.utils import ops + + +class GridAnchorGenerator(anchor_generator.AnchorGenerator): + """Generates a grid of anchors at given scales and aspect ratios.""" + + def __init__(self, + scales=(0.5, 1.0, 2.0), + aspect_ratios=(0.5, 1.0, 2.0), + base_anchor_size=None, + anchor_stride=None, + anchor_offset=None): + """Constructs a GridAnchorGenerator. + + Args: + scales: a list of (float) scales, default=(0.5, 1.0, 2.0) + aspect_ratios: a list of (float) aspect ratios, default=(0.5, 1.0, 2.0) + base_anchor_size: base anchor size as height, width ( + (length-2 float32 list, default=[256, 256]) + anchor_stride: difference in centers between base anchors for adjacent + grid positions (length-2 float32 list, default=[16, 16]) + anchor_offset: center of the anchor with scale and aspect ratio 1 for the + upper left element of the grid, this should be zero for + feature networks with only VALID padding and even receptive + field size, but may need additional calculation if other + padding is used (length-2 float32 tensor, default=[0, 0]) + """ + # Handle argument defaults + if base_anchor_size is None: + base_anchor_size = [256, 256] + base_anchor_size = tf.constant(base_anchor_size, tf.float32) + if anchor_stride is None: + anchor_stride = [16, 16] + anchor_stride = tf.constant(anchor_stride, dtype=tf.float32) + if anchor_offset is None: + anchor_offset = [0, 0] + anchor_offset = tf.constant(anchor_offset, dtype=tf.float32) + + self._scales = scales + self._aspect_ratios = aspect_ratios + self._base_anchor_size = base_anchor_size + self._anchor_stride = anchor_stride + self._anchor_offset = anchor_offset + + def name_scope(self): + return 'GridAnchorGenerator' + + def num_anchors_per_location(self): + """Returns the number of anchors per spatial location. + + Returns: + a list of integers, one for each expected feature map to be passed to + the `generate` function. + """ + return [len(self._scales) * len(self._aspect_ratios)] + + def _generate(self, feature_map_shape_list): + """Generates a collection of bounding boxes to be used as anchors. + + Args: + feature_map_shape_list: list of pairs of convnet layer resolutions in the + format [(height_0, width_0)]. For example, setting + feature_map_shape_list=[(8, 8)] asks for anchors that correspond + to an 8x8 layer. For this anchor generator, only lists of length 1 are + allowed. + + Returns: + boxes: a BoxList holding a collection of N anchor boxes + Raises: + ValueError: if feature_map_shape_list, box_specs_list do not have the same + length. 
+ ValueError: if feature_map_shape_list does not consist of pairs of + integers + """ + if not (isinstance(feature_map_shape_list, list) + and len(feature_map_shape_list) == 1): + raise ValueError('feature_map_shape_list must be a list of length 1.') + if not all([isinstance(list_item, tuple) and len(list_item) == 2 + for list_item in feature_map_shape_list]): + raise ValueError('feature_map_shape_list must be a list of pairs.') + grid_height, grid_width = feature_map_shape_list[0] + scales_grid, aspect_ratios_grid = ops.meshgrid(self._scales, + self._aspect_ratios) + scales_grid = tf.reshape(scales_grid, [-1]) + aspect_ratios_grid = tf.reshape(aspect_ratios_grid, [-1]) + return tile_anchors(grid_height, + grid_width, + scales_grid, + aspect_ratios_grid, + self._base_anchor_size, + self._anchor_stride, + self._anchor_offset) + + +def tile_anchors(grid_height, + grid_width, + scales, + aspect_ratios, + base_anchor_size, + anchor_stride, + anchor_offset): + """Create a tiled set of anchors strided along a grid in image space. + + This op creates a set of anchor boxes by placing a "basis" collection of + boxes with user-specified scales and aspect ratios centered at evenly + distributed points along a grid. The basis collection is specified via the + scale and aspect_ratios arguments. For example, setting scales=[.1, .2, .2] + and aspect ratios = [2,2,1/2] means that we create three boxes: one with scale + .1, aspect ratio 2, one with scale .2, aspect ratio 2, and one with scale .2 + and aspect ratio 1/2. Each box is multiplied by "base_anchor_size" before + placing it over its respective center. + + Grid points are specified via grid_height, grid_width parameters as well as + the anchor_stride and anchor_offset parameters. + + Args: + grid_height: size of the grid in the y direction (int or int scalar tensor) + grid_width: size of the grid in the x direction (int or int scalar tensor) + scales: a 1-d (float) tensor representing the scale of each box in the + basis set. + aspect_ratios: a 1-d (float) tensor representing the aspect ratio of each + box in the basis set. The length of the scales and aspect_ratios tensors + must be equal. 
+ base_anchor_size: base anchor size as [height, width] + (float tensor of shape [2]) + anchor_stride: difference in centers between base anchors for adjacent grid + positions (float tensor of shape [2]) + anchor_offset: center of the anchor with scale and aspect ratio 1 for the + upper left element of the grid, this should be zero for + feature networks with only VALID padding and even receptive + field size, but may need some additional calculation if other + padding is used (float tensor of shape [2]) + Returns: + a BoxList holding a collection of N anchor boxes + """ + ratio_sqrts = tf.sqrt(aspect_ratios) + heights = scales / ratio_sqrts * base_anchor_size[0] + widths = scales * ratio_sqrts * base_anchor_size[1] + + # Get a grid of box centers + y_centers = tf.to_float(tf.range(grid_height)) + y_centers = y_centers * anchor_stride[0] + anchor_offset[0] + x_centers = tf.to_float(tf.range(grid_width)) + x_centers = x_centers * anchor_stride[1] + anchor_offset[1] + x_centers, y_centers = ops.meshgrid(x_centers, y_centers) + + widths_grid, x_centers_grid = ops.meshgrid(widths, x_centers) + heights_grid, y_centers_grid = ops.meshgrid(heights, y_centers) + bbox_centers = tf.stack([y_centers_grid, x_centers_grid], axis=3) + bbox_sizes = tf.stack([heights_grid, widths_grid], axis=3) + bbox_centers = tf.reshape(bbox_centers, [-1, 2]) + bbox_sizes = tf.reshape(bbox_sizes, [-1, 2]) + bbox_corners = _center_size_bbox_to_corners_bbox(bbox_centers, bbox_sizes) + return box_list.BoxList(bbox_corners) + + +def _center_size_bbox_to_corners_bbox(centers, sizes): + """Converts bbox center-size representation to corners representation. + + Args: + centers: a tensor with shape [N, 2] representing bounding box centers + sizes: a tensor with shape [N, 2] representing bounding boxes + + Returns: + corners: tensor with shape [N, 4] representing bounding boxes in corners + representation + """ + return tf.concat([centers - .5 * sizes, centers + .5 * sizes], 1) diff --git a/object_detection/anchor_generators/grid_anchor_generator_test.py b/object_detection/anchor_generators/grid_anchor_generator_test.py new file mode 100644 index 0000000000000000000000000000000000000000..80a82a3905bc09d1f77a18267ff7a2fd7f5a1f1e --- /dev/null +++ b/object_detection/anchor_generators/grid_anchor_generator_test.py @@ -0,0 +1,76 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for object_detection.grid_anchor_generator.""" + +import tensorflow as tf + +from object_detection.anchor_generators import grid_anchor_generator + + +class GridAnchorGeneratorTest(tf.test.TestCase): + + def test_construct_single_anchor(self): + """Builds a 1x1 anchor grid to test the size of the output boxes.""" + scales = [0.5, 1.0, 2.0] + aspect_ratios = [0.25, 1.0, 4.0] + anchor_offset = [7, -3] + exp_anchor_corners = [[-121, -35, 135, 29], [-249, -67, 263, 61], + [-505, -131, 519, 125], [-57, -67, 71, 61], + [-121, -131, 135, 125], [-249, -259, 263, 253], + [-25, -131, 39, 125], [-57, -259, 71, 253], + [-121, -515, 135, 509]] + + anchor_generator = grid_anchor_generator.GridAnchorGenerator( + scales, aspect_ratios, + anchor_offset=anchor_offset) + anchors = anchor_generator.generate(feature_map_shape_list=[(1, 1)]) + anchor_corners = anchors.get() + + with self.test_session(): + anchor_corners_out = anchor_corners.eval() + self.assertAllClose(anchor_corners_out, exp_anchor_corners) + + def test_construct_anchor_grid(self): + base_anchor_size = [10, 10] + anchor_stride = [19, 19] + anchor_offset = [0, 0] + scales = [0.5, 1.0, 2.0] + aspect_ratios = [1.0] + + exp_anchor_corners = [[-2.5, -2.5, 2.5, 2.5], [-5., -5., 5., 5.], + [-10., -10., 10., 10.], [-2.5, 16.5, 2.5, 21.5], + [-5., 14., 5, 24], [-10., 9., 10, 29], + [16.5, -2.5, 21.5, 2.5], [14., -5., 24, 5], + [9., -10., 29, 10], [16.5, 16.5, 21.5, 21.5], + [14., 14., 24, 24], [9., 9., 29, 29]] + + anchor_generator = grid_anchor_generator.GridAnchorGenerator( + scales, + aspect_ratios, + base_anchor_size=base_anchor_size, + anchor_stride=anchor_stride, + anchor_offset=anchor_offset) + + anchors = anchor_generator.generate(feature_map_shape_list=[(2, 2)]) + anchor_corners = anchors.get() + + with self.test_session(): + anchor_corners_out = anchor_corners.eval() + self.assertAllClose(anchor_corners_out, exp_anchor_corners) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/anchor_generators/multiple_grid_anchor_generator.py b/object_detection/anchor_generators/multiple_grid_anchor_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..655d99f19a08a72ba03cdfd48a6620efacdd1b56 --- /dev/null +++ b/object_detection/anchor_generators/multiple_grid_anchor_generator.py @@ -0,0 +1,273 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Generates grid anchors on the fly corresponding to multiple CNN layers. + +Generates grid anchors on the fly corresponding to multiple CNN layers as +described in: +"SSD: Single Shot MultiBox Detector" +Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, +Cheng-Yang Fu, Alexander C. 
Berg +(see Section 2.2: Choosing scales and aspect ratios for default boxes) +""" + +import numpy as np + +import tensorflow as tf + +from object_detection.anchor_generators import grid_anchor_generator +from object_detection.core import anchor_generator +from object_detection.core import box_list_ops + + +class MultipleGridAnchorGenerator(anchor_generator.AnchorGenerator): + """Generate a grid of anchors for multiple CNN layers.""" + + def __init__(self, + box_specs_list, + base_anchor_size=None, + clip_window=None): + """Constructs a MultipleGridAnchorGenerator. + + To construct anchors, at multiple grid resolutions, one must provide a + list of feature_map_shape_list (e.g., [(8, 8), (4, 4)]), and for each grid + size, a corresponding list of (scale, aspect ratio) box specifications. + + For example: + box_specs_list = [[(.1, 1.0), (.1, 2.0)], # for 8x8 grid + [(.2, 1.0), (.3, 1.0), (.2, 2.0)]] # for 4x4 grid + + To support the fully convolutional setting, we pass grid sizes in at + generation time, while scale and aspect ratios are fixed at construction + time. + + Args: + box_specs_list: list of list of (scale, aspect ratio) pairs with the + outside list having the same number of entries as feature_map_shape_list + (which is passed in at generation time). + base_anchor_size: base anchor size as [height, width] + (length-2 float tensor, default=[256, 256]). + clip_window: a tensor of shape [4] specifying a window to which all + anchors should be clipped. If clip_window is None, then no clipping + is performed. + + Raises: + ValueError: if box_specs_list is not a list of list of pairs + ValueError: if clip_window is not either None or a tensor of shape [4] + """ + if isinstance(box_specs_list, list) and all( + [isinstance(list_item, list) for list_item in box_specs_list]): + self._box_specs = box_specs_list + else: + raise ValueError('box_specs_list is expected to be a ' + 'list of lists of pairs') + if base_anchor_size is None: + base_anchor_size = tf.constant([256, 256], dtype=tf.float32) + self._base_anchor_size = base_anchor_size + if clip_window is not None and clip_window.get_shape().as_list() != [4]: + raise ValueError('clip_window must either be None or a shape [4] tensor') + self._clip_window = clip_window + self._scales = [] + self._aspect_ratios = [] + for box_spec in self._box_specs: + if not all([isinstance(entry, tuple) and len(entry) == 2 + for entry in box_spec]): + raise ValueError('box_specs_list is expected to be a ' + 'list of lists of pairs') + scales, aspect_ratios = zip(*box_spec) + self._scales.append(scales) + self._aspect_ratios.append(aspect_ratios) + + def name_scope(self): + return 'MultipleGridAnchorGenerator' + + def num_anchors_per_location(self): + """Returns the number of anchors per spatial location. + + Returns: + a list of integers, one for each expected feature map to be passed to + the Generate function. + """ + return [len(box_specs) for box_specs in self._box_specs] + + def _generate(self, + feature_map_shape_list, + im_height=1, + im_width=1, + anchor_strides=None, + anchor_offsets=None): + """Generates a collection of bounding boxes to be used as anchors. + + The number of anchors generated for a single grid with shape MxM where we + place k boxes over each grid center is k*M^2 and thus the total number of + anchors is the sum over all grids. 
In our box_specs_list example + (see the constructor docstring), we would place two boxes over each grid + point on an 8x8 grid and three boxes over each grid point on a 4x4 grid and + thus end up with 2*8^2 + 3*4^2 = 176 anchors in total. The layout of the + output anchors follows the order of how the grid sizes and box_specs are + specified (with box_spec index varying the fastest, followed by width + index, then height index, then grid index). + + Args: + feature_map_shape_list: list of pairs of convnet layer resolutions in the + format [(height_0, width_0), (height_1, width_1), ...]. For example, + setting feature_map_shape_list=[(8, 8), (7, 7)] asks for anchors that + correspond to an 8x8 layer followed by a 7x7 layer. + im_height: the height of the image to generate the grid for. If both + im_height and im_width are 1, the generated anchors default to + normalized coordinates, otherwise absolute coordinates are used for the + grid. + im_width: the width of the image to generate the grid for. If both + im_height and im_width are 1, the generated anchors default to + normalized coordinates, otherwise absolute coordinates are used for the + grid. + anchor_strides: list of pairs of strides (in y and x directions + respectively). For example, setting + anchor_strides=[(.25, .25), (.5, .5)] means that we want the anchors + corresponding to the first layer to be strided by .25 and those in the + second layer to be strided by .5 in both y and x directions. By + default, if anchor_strides=None, then they are set to be the reciprocal + of the corresponding grid sizes. The pairs can also be specified as + dynamic tf.int or tf.float numbers, e.g. for variable shape input + images. + anchor_offsets: list of pairs of offsets (in y and x directions + respectively). The offset specifies where we want the center of the + (0, 0)-th anchor to lie for each layer. For example, setting + anchor_offsets=[(.125, .125), (.25, .25)]) means that we want the + (0, 0)-th anchor of the first layer to lie at (.125, .125) in image + space and likewise that we want the (0, 0)-th anchor of the second + layer to lie at (.25, .25) in image space. By default, if + anchor_offsets=None, then they are set to be half of the corresponding + anchor stride. The pairs can also be specified as dynamic tf.int or + tf.float numbers, e.g. for variable shape input images. + + Returns: + boxes: a BoxList holding a collection of N anchor boxes + Raises: + ValueError: if feature_map_shape_list, box_specs_list do not have the same + length. 
+ ValueError: if feature_map_shape_list does not consist of pairs of + integers + """ + if not (isinstance(feature_map_shape_list, list) + and len(feature_map_shape_list) == len(self._box_specs)): + raise ValueError('feature_map_shape_list must be a list with the same ' + 'length as self._box_specs') + if not all([isinstance(list_item, tuple) and len(list_item) == 2 + for list_item in feature_map_shape_list]): + raise ValueError('feature_map_shape_list must be a list of pairs.') + if not anchor_strides: + anchor_strides = [(tf.to_float(im_height) / tf.to_float(pair[0]), + tf.to_float(im_width) / tf.to_float(pair[1])) + for pair in feature_map_shape_list] + if not anchor_offsets: + anchor_offsets = [(0.5 * stride[0], 0.5 * stride[1]) + for stride in anchor_strides] + for arg, arg_name in zip([anchor_strides, anchor_offsets], + ['anchor_strides', 'anchor_offsets']): + if not (isinstance(arg, list) and len(arg) == len(self._box_specs)): + raise ValueError('%s must be a list with the same length ' + 'as self._box_specs' % arg_name) + if not all([isinstance(list_item, tuple) and len(list_item) == 2 + for list_item in arg]): + raise ValueError('%s must be a list of pairs.' % arg_name) + + anchor_grid_list = [] + min_im_shape = tf.to_float(tf.minimum(im_height, im_width)) + base_anchor_size = min_im_shape * self._base_anchor_size + for grid_size, scales, aspect_ratios, stride, offset in zip( + feature_map_shape_list, self._scales, self._aspect_ratios, + anchor_strides, anchor_offsets): + anchor_grid_list.append( + grid_anchor_generator.tile_anchors( + grid_height=grid_size[0], + grid_width=grid_size[1], + scales=scales, + aspect_ratios=aspect_ratios, + base_anchor_size=base_anchor_size, + anchor_stride=stride, + anchor_offset=offset)) + concatenated_anchors = box_list_ops.concatenate(anchor_grid_list) + num_anchors = concatenated_anchors.num_boxes_static() + if num_anchors is None: + num_anchors = concatenated_anchors.num_boxes() + if self._clip_window is not None: + clip_window = tf.multiply( + tf.to_float([im_height, im_width, im_height, im_width]), + self._clip_window) + concatenated_anchors = box_list_ops.clip_to_window( + concatenated_anchors, clip_window, filter_nonoverlapping=False) + # TODO: make reshape an option for the clip_to_window op + concatenated_anchors.set( + tf.reshape(concatenated_anchors.get(), [num_anchors, 4])) + + stddevs_tensor = 0.01 * tf.ones( + [num_anchors, 4], dtype=tf.float32, name='stddevs') + concatenated_anchors.add_field('stddev', stddevs_tensor) + + return concatenated_anchors + + +def create_ssd_anchors(num_layers=6, + min_scale=0.2, + max_scale=0.95, + aspect_ratios=(1.0, 2.0, 3.0, 1.0/2, 1.0/3), + base_anchor_size=None, + reduce_boxes_in_lowest_layer=True): + """Creates MultipleGridAnchorGenerator for SSD anchors. + + This function instantiates a MultipleGridAnchorGenerator that reproduces + ``default box`` construction proposed by Liu et al in the SSD paper. + See Section 2.2 for details. Grid sizes are assumed to be passed in + at generation time from finest resolution to coarsest resolution --- this is + used to (linearly) interpolate scales of anchor boxes corresponding to the + intermediate grid sizes. + + Anchors that are returned by calling the `generate` method on the returned + MultipleGridAnchorGenerator object are always in normalized coordinates + and clipped to the unit square: (i.e. all coordinates lie in [0, 1]x[0, 1]). 
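  As an illustrative sketch (the grid sizes below simply mimic a typical
  SSD-style feature pyramid, matching the accompanying tests), the returned
  generator might be used as:

    anchor_generator = create_ssd_anchors(num_layers=6, min_scale=0.2,
                                          max_scale=0.95)
    anchors = anchor_generator.generate(
        feature_map_shape_list=[(38, 38), (19, 19), (10, 10),
                                (5, 5), (3, 3), (1, 1)])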
+ + Args: + num_layers: integer number of grid layers to create anchors for (actual + grid sizes passed in at generation time) + min_scale: scale of anchors corresponding to finest resolution (float) + max_scale: scale of anchors corresponding to coarsest resolution (float) + aspect_ratios: list or tuple of (float) aspect ratios to place on each + grid point. + base_anchor_size: base anchor size as [height, width]. + reduce_boxes_in_lowest_layer: a boolean to indicate whether the fixed 3 + boxes per location is used in the lowest layer. + + Returns: + a MultipleGridAnchorGenerator + """ + if base_anchor_size is None: + base_anchor_size = [1.0, 1.0] + base_anchor_size = tf.constant(base_anchor_size, dtype=tf.float32) + box_specs_list = [] + scales = [min_scale + (max_scale - min_scale) * i / (num_layers - 1) + for i in range(num_layers)] + [1.0] + for layer, scale, scale_next in zip( + range(num_layers), scales[:-1], scales[1:]): + layer_box_specs = [] + if layer == 0 and reduce_boxes_in_lowest_layer: + layer_box_specs = [(0.1, 1.0), (scale, 2.0), (scale, 0.5)] + else: + for aspect_ratio in aspect_ratios: + layer_box_specs.append((scale, aspect_ratio)) + if aspect_ratio == 1.0: + layer_box_specs.append((np.sqrt(scale*scale_next), 1.0)) + box_specs_list.append(layer_box_specs) + return MultipleGridAnchorGenerator(box_specs_list, base_anchor_size) diff --git a/object_detection/anchor_generators/multiple_grid_anchor_generator_test.py b/object_detection/anchor_generators/multiple_grid_anchor_generator_test.py new file mode 100644 index 0000000000000000000000000000000000000000..a7f0346b646865259527638beeec8bd35d4ee276 --- /dev/null +++ b/object_detection/anchor_generators/multiple_grid_anchor_generator_test.py @@ -0,0 +1,253 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for anchor_generators.multiple_grid_anchor_generator_test.py.""" + +import numpy as np + +import tensorflow as tf + +from object_detection.anchor_generators import multiple_grid_anchor_generator as ag + + +class MultipleGridAnchorGeneratorTest(tf.test.TestCase): + + def test_construct_single_anchor_grid(self): + """Builds a 1x1 anchor grid to test the size of the output boxes.""" + exp_anchor_corners = [[-121, -35, 135, 29], [-249, -67, 263, 61], + [-505, -131, 519, 125], [-57, -67, 71, 61], + [-121, -131, 135, 125], [-249, -259, 263, 253], + [-25, -131, 39, 125], [-57, -259, 71, 253], + [-121, -515, 135, 509]] + + base_anchor_size = tf.constant([256, 256], dtype=tf.float32) + box_specs_list = [[(.5, .25), (1.0, .25), (2.0, .25), + (.5, 1.0), (1.0, 1.0), (2.0, 1.0), + (.5, 4.0), (1.0, 4.0), (2.0, 4.0)]] + anchor_generator = ag.MultipleGridAnchorGenerator( + box_specs_list, base_anchor_size) + anchors = anchor_generator.generate(feature_map_shape_list=[(1, 1)], + anchor_strides=[(16, 16)], + anchor_offsets=[(7, -3)]) + anchor_corners = anchors.get() + with self.test_session(): + anchor_corners_out = anchor_corners.eval() + self.assertAllClose(anchor_corners_out, exp_anchor_corners) + + def test_construct_anchor_grid(self): + base_anchor_size = tf.constant([10, 10], dtype=tf.float32) + box_specs_list = [[(0.5, 1.0), (1.0, 1.0), (2.0, 1.0)]] + + exp_anchor_corners = [[-2.5, -2.5, 2.5, 2.5], [-5., -5., 5., 5.], + [-10., -10., 10., 10.], [-2.5, 16.5, 2.5, 21.5], + [-5., 14., 5, 24], [-10., 9., 10, 29], + [16.5, -2.5, 21.5, 2.5], [14., -5., 24, 5], + [9., -10., 29, 10], [16.5, 16.5, 21.5, 21.5], + [14., 14., 24, 24], [9., 9., 29, 29]] + + anchor_generator = ag.MultipleGridAnchorGenerator( + box_specs_list, base_anchor_size) + anchors = anchor_generator.generate(feature_map_shape_list=[(2, 2)], + anchor_strides=[(19, 19)], + anchor_offsets=[(0, 0)]) + anchor_corners = anchors.get() + + with self.test_session(): + anchor_corners_out = anchor_corners.eval() + self.assertAllClose(anchor_corners_out, exp_anchor_corners) + + def test_construct_anchor_grid_non_square(self): + base_anchor_size = tf.constant([1, 1], dtype=tf.float32) + box_specs_list = [[(1.0, 1.0)]] + + exp_anchor_corners = [[0., -0.25, 1., 0.75], [0., 0.25, 1., 1.25]] + + anchor_generator = ag.MultipleGridAnchorGenerator(box_specs_list, + base_anchor_size) + anchors = anchor_generator.generate(feature_map_shape_list=[(tf.constant( + 1, dtype=tf.int32), tf.constant(2, dtype=tf.int32))]) + anchor_corners = anchors.get() + + with self.test_session(): + anchor_corners_out = anchor_corners.eval() + self.assertAllClose(anchor_corners_out, exp_anchor_corners) + + def test_construct_anchor_grid_unnormalized(self): + base_anchor_size = tf.constant([1, 1], dtype=tf.float32) + box_specs_list = [[(1.0, 1.0)]] + + exp_anchor_corners = [[0., 0., 320., 320.], [0., 320., 320., 640.]] + + anchor_generator = ag.MultipleGridAnchorGenerator(box_specs_list, + base_anchor_size) + anchors = anchor_generator.generate( + feature_map_shape_list=[(tf.constant(1, dtype=tf.int32), tf.constant( + 2, dtype=tf.int32))], + im_height=320, + im_width=640) + anchor_corners = anchors.get() + + with self.test_session(): + anchor_corners_out = anchor_corners.eval() + self.assertAllClose(anchor_corners_out, exp_anchor_corners) + + def test_construct_multiple_grids(self): + base_anchor_size = tf.constant([1.0, 1.0], dtype=tf.float32) + box_specs_list = [[(1.0, 1.0), (2.0, 1.0), (1.0, 
0.5)], + [(1.0, 1.0), (1.0, 0.5)]] + + # height and width of box with .5 aspect ratio + h = np.sqrt(2) + w = 1.0/np.sqrt(2) + exp_small_grid_corners = [[-.25, -.25, .75, .75], + [.25-.5*h, .25-.5*w, .25+.5*h, .25+.5*w], + [-.25, .25, .75, 1.25], + [.25-.5*h, .75-.5*w, .25+.5*h, .75+.5*w], + [.25, -.25, 1.25, .75], + [.75-.5*h, .25-.5*w, .75+.5*h, .25+.5*w], + [.25, .25, 1.25, 1.25], + [.75-.5*h, .75-.5*w, .75+.5*h, .75+.5*w]] + # only test first entry of larger set of anchors + exp_big_grid_corners = [[.125-.5, .125-.5, .125+.5, .125+.5], + [.125-1.0, .125-1.0, .125+1.0, .125+1.0], + [.125-.5*h, .125-.5*w, .125+.5*h, .125+.5*w],] + + anchor_generator = ag.MultipleGridAnchorGenerator( + box_specs_list, base_anchor_size) + anchors = anchor_generator.generate(feature_map_shape_list=[(4, 4), (2, 2)], + anchor_strides=[(.25, .25), (.5, .5)], + anchor_offsets=[(.125, .125), + (.25, .25)]) + anchor_corners = anchors.get() + + with self.test_session(): + anchor_corners_out = anchor_corners.eval() + self.assertEquals(anchor_corners_out.shape, (56, 4)) + big_grid_corners = anchor_corners_out[0:3, :] + small_grid_corners = anchor_corners_out[48:, :] + self.assertAllClose(small_grid_corners, exp_small_grid_corners) + self.assertAllClose(big_grid_corners, exp_big_grid_corners) + + def test_construct_multiple_grids_with_clipping(self): + base_anchor_size = tf.constant([1.0, 1.0], dtype=tf.float32) + box_specs_list = [[(1.0, 1.0), (2.0, 1.0), (1.0, 0.5)], + [(1.0, 1.0), (1.0, 0.5)]] + + # height and width of box with .5 aspect ratio + h = np.sqrt(2) + w = 1.0/np.sqrt(2) + exp_small_grid_corners = [[0, 0, .75, .75], + [0, 0, .25+.5*h, .25+.5*w], + [0, .25, .75, 1], + [0, .75-.5*w, .25+.5*h, 1], + [.25, 0, 1, .75], + [.75-.5*h, 0, 1, .25+.5*w], + [.25, .25, 1, 1], + [.75-.5*h, .75-.5*w, 1, 1]] + + clip_window = tf.constant([0, 0, 1, 1], dtype=tf.float32) + anchor_generator = ag.MultipleGridAnchorGenerator( + box_specs_list, base_anchor_size, clip_window=clip_window) + anchors = anchor_generator.generate(feature_map_shape_list=[(4, 4), (2, 2)]) + anchor_corners = anchors.get() + + with self.test_session(): + anchor_corners_out = anchor_corners.eval() + small_grid_corners = anchor_corners_out[48:, :] + self.assertAllClose(small_grid_corners, exp_small_grid_corners) + + def test_invalid_box_specs(self): + # not all box specs are pairs + box_specs_list = [[(1.0, 1.0), (2.0, 1.0), (1.0, 0.5)], + [(1.0, 1.0), (1.0, 0.5, .3)]] + with self.assertRaises(ValueError): + ag.MultipleGridAnchorGenerator(box_specs_list) + + # box_specs_list is not a list of lists + box_specs_list = [(1.0, 1.0), (2.0, 1.0), (1.0, 0.5)] + with self.assertRaises(ValueError): + ag.MultipleGridAnchorGenerator(box_specs_list) + + def test_invalid_generate_arguments(self): + base_anchor_size = tf.constant([1.0, 1.0], dtype=tf.float32) + box_specs_list = [[(1.0, 1.0), (2.0, 1.0), (1.0, 0.5)], + [(1.0, 1.0), (1.0, 0.5)]] + anchor_generator = ag.MultipleGridAnchorGenerator( + box_specs_list, base_anchor_size) + + # incompatible lengths with box_specs_list + with self.assertRaises(ValueError): + anchor_generator.generate(feature_map_shape_list=[(4, 4), (2, 2)], + anchor_strides=[(.25, .25)], + anchor_offsets=[(.125, .125), (.25, .25)]) + with self.assertRaises(ValueError): + anchor_generator.generate(feature_map_shape_list=[(4, 4), (2, 2), (1, 1)], + anchor_strides=[(.25, .25), (.5, .5)], + anchor_offsets=[(.125, .125), (.25, .25)]) + with self.assertRaises(ValueError): + anchor_generator.generate(feature_map_shape_list=[(4, 4), (2, 2)], + 
anchor_strides=[(.5, .5)], + anchor_offsets=[(.25, .25)]) + + # not pairs + with self.assertRaises(ValueError): + anchor_generator.generate(feature_map_shape_list=[(4, 4, 4), (2, 2)], + anchor_strides=[(.25, .25), (.5, .5)], + anchor_offsets=[(.125, .125), (.25, .25)]) + with self.assertRaises(ValueError): + anchor_generator.generate(feature_map_shape_list=[(4, 4), (2, 2)], + anchor_strides=[(.25, .25, .1), (.5, .5)], + anchor_offsets=[(.125, .125), + (.25, .25)]) + with self.assertRaises(ValueError): + anchor_generator.generate(feature_map_shape_list=[(4), (2, 2)], + anchor_strides=[(.25, .25), (.5, .5)], + anchor_offsets=[(.125), (.25)]) + + +class CreateSSDAnchorsTest(tf.test.TestCase): + + def test_create_ssd_anchors_returns_correct_shape(self): + anchor_generator = ag.create_ssd_anchors( + num_layers=6, min_scale=0.2, max_scale=0.95, + aspect_ratios=(1.0, 2.0, 3.0, 1.0/2, 1.0/3), + reduce_boxes_in_lowest_layer=True) + + feature_map_shape_list = [(38, 38), (19, 19), (10, 10), + (5, 5), (3, 3), (1, 1)] + anchors = anchor_generator.generate( + feature_map_shape_list=feature_map_shape_list) + anchor_corners = anchors.get() + with self.test_session(): + anchor_corners_out = anchor_corners.eval() + self.assertEquals(anchor_corners_out.shape, (7308, 4)) + + anchor_generator = ag.create_ssd_anchors( + num_layers=6, min_scale=0.2, max_scale=0.95, + aspect_ratios=(1.0, 2.0, 3.0, 1.0/2, 1.0/3), + reduce_boxes_in_lowest_layer=False) + + feature_map_shape_list = [(38, 38), (19, 19), (10, 10), + (5, 5), (3, 3), (1, 1)] + anchors = anchor_generator.generate( + feature_map_shape_list=feature_map_shape_list) + anchor_corners = anchors.get() + with self.test_session(): + anchor_corners_out = anchor_corners.eval() + self.assertEquals(anchor_corners_out.shape, (11640, 4)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/box_coders/BUILD b/object_detection/box_coders/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..ecb3cc7aa597c3708e2c0c6e8e5c937cd708ce53 --- /dev/null +++ b/object_detection/box_coders/BUILD @@ -0,0 +1,102 @@ +# Tensorflow Object Detection API: Box Coder implementations. 
+ +package( + default_visibility = ["//visibility:public"], +) + +licenses(["notice"]) + +# Apache 2.0 +py_library( + name = "faster_rcnn_box_coder", + srcs = [ + "faster_rcnn_box_coder.py", + ], + deps = [ + "//tensorflow_models/object_detection/core:box_coder", + "//tensorflow_models/object_detection/core:box_list", + ], +) + +py_test( + name = "faster_rcnn_box_coder_test", + srcs = [ + "faster_rcnn_box_coder_test.py", + ], + deps = [ + ":faster_rcnn_box_coder", + "//tensorflow", + "//tensorflow_models/object_detection/core:box_list", + ], +) + +py_library( + name = "keypoint_box_coder", + srcs = [ + "keypoint_box_coder.py", + ], + deps = [ + "//tensorflow_models/object_detection/core:box_coder", + "//tensorflow_models/object_detection/core:box_list", + "//tensorflow_models/object_detection/core:standard_fields", + ], +) + +py_test( + name = "keypoint_box_coder_test", + srcs = [ + "keypoint_box_coder_test.py", + ], + deps = [ + ":keypoint_box_coder", + "//tensorflow", + "//tensorflow_models/object_detection/core:box_list", + "//tensorflow_models/object_detection/core:standard_fields", + ], +) + +py_library( + name = "mean_stddev_box_coder", + srcs = [ + "mean_stddev_box_coder.py", + ], + deps = [ + "//tensorflow_models/object_detection/core:box_coder", + "//tensorflow_models/object_detection/core:box_list", + ], +) + +py_test( + name = "mean_stddev_box_coder_test", + srcs = [ + "mean_stddev_box_coder_test.py", + ], + deps = [ + ":mean_stddev_box_coder", + "//tensorflow", + "//tensorflow_models/object_detection/core:box_list", + ], +) + +py_library( + name = "square_box_coder", + srcs = [ + "square_box_coder.py", + ], + deps = [ + "//tensorflow_models/object_detection/core:box_coder", + "//tensorflow_models/object_detection/core:box_list", + ], +) + +py_test( + name = "square_box_coder_test", + srcs = [ + "square_box_coder_test.py", + ], + deps = [ + ":square_box_coder", + "//tensorflow", + "//tensorflow_models/object_detection/core:box_list", + ], +) diff --git a/object_detection/box_coders/__init__.py b/object_detection/box_coders/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/object_detection/box_coders/faster_rcnn_box_coder.py b/object_detection/box_coders/faster_rcnn_box_coder.py new file mode 100644 index 0000000000000000000000000000000000000000..af25e21a105ffa85931d3f30a1ca41c89c5dde53 --- /dev/null +++ b/object_detection/box_coders/faster_rcnn_box_coder.py @@ -0,0 +1,118 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Faster RCNN box coder. + +Faster RCNN box coder follows the coding schema described below: + ty = (y - ya) / ha + tx = (x - xa) / wa + th = log(h / ha) + tw = log(w / wa) + where x, y, w, h denote the box's center coordinates, width and height + respectively. 
Similarly, xa, ya, wa, ha denote the anchor's center + coordinates, width and height. tx, ty, tw and th denote the anchor-encoded + center, width and height respectively. + + See http://arxiv.org/abs/1506.01497 for details. +""" + +import tensorflow as tf + +from object_detection.core import box_coder +from object_detection.core import box_list + +EPSILON = 1e-8 + + +class FasterRcnnBoxCoder(box_coder.BoxCoder): + """Faster RCNN box coder.""" + + def __init__(self, scale_factors=None): + """Constructor for FasterRcnnBoxCoder. + + Args: + scale_factors: List of 4 positive scalars to scale ty, tx, th and tw. + If set to None, does not perform scaling. For Faster RCNN, + the open-source implementation recommends using [10.0, 10.0, 5.0, 5.0]. + """ + if scale_factors: + assert len(scale_factors) == 4 + for scalar in scale_factors: + assert scalar > 0 + self._scale_factors = scale_factors + + @property + def code_size(self): + return 4 + + def _encode(self, boxes, anchors): + """Encode a box collection with respect to anchor collection. + + Args: + boxes: BoxList holding N boxes to be encoded. + anchors: BoxList of anchors. + + Returns: + a tensor representing N anchor-encoded boxes of the format + [ty, tx, th, tw]. + """ + # Convert anchors to the center coordinate representation. + ycenter_a, xcenter_a, ha, wa = anchors.get_center_coordinates_and_sizes() + ycenter, xcenter, h, w = boxes.get_center_coordinates_and_sizes() + # Avoid NaN in division and log below. + ha += EPSILON + wa += EPSILON + h += EPSILON + w += EPSILON + + tx = (xcenter - xcenter_a) / wa + ty = (ycenter - ycenter_a) / ha + tw = tf.log(w / wa) + th = tf.log(h / ha) + # Scales location targets as used in paper for joint training. + if self._scale_factors: + ty *= self._scale_factors[0] + tx *= self._scale_factors[1] + th *= self._scale_factors[2] + tw *= self._scale_factors[3] + return tf.transpose(tf.stack([ty, tx, th, tw])) + + def _decode(self, rel_codes, anchors): + """Decode relative codes to boxes. + + Args: + rel_codes: a tensor representing N anchor-encoded boxes. + anchors: BoxList of anchors. + + Returns: + boxes: BoxList holding N bounding boxes. + """ + ycenter_a, xcenter_a, ha, wa = anchors.get_center_coordinates_and_sizes() + + ty, tx, th, tw = tf.unstack(tf.transpose(rel_codes)) + if self._scale_factors: + ty /= self._scale_factors[0] + tx /= self._scale_factors[1] + th /= self._scale_factors[2] + tw /= self._scale_factors[3] + w = tf.exp(tw) * wa + h = tf.exp(th) * ha + ycenter = ty * ha + ycenter_a + xcenter = tx * wa + xcenter_a + ymin = ycenter - h / 2. + xmin = xcenter - w / 2. + ymax = ycenter + h / 2. + xmax = xcenter + w / 2. + return box_list.BoxList(tf.transpose(tf.stack([ymin, xmin, ymax, xmax]))) diff --git a/object_detection/box_coders/faster_rcnn_box_coder_test.py b/object_detection/box_coders/faster_rcnn_box_coder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..b2135f06eea093110c9da17c1c46b7d247f8e806 --- /dev/null +++ b/object_detection/box_coders/faster_rcnn_box_coder_test.py @@ -0,0 +1,94 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.box_coder.faster_rcnn_box_coder.""" + +import tensorflow as tf + +from object_detection.box_coders import faster_rcnn_box_coder +from object_detection.core import box_list + + +class FasterRcnnBoxCoderTest(tf.test.TestCase): + + def test_get_correct_relative_codes_after_encoding(self): + boxes = [[10.0, 10.0, 20.0, 15.0], [0.2, 0.1, 0.5, 0.4]] + anchors = [[15.0, 12.0, 30.0, 18.0], [0.1, 0.0, 0.7, 0.9]] + expected_rel_codes = [[-0.5, -0.416666, -0.405465, -0.182321], + [-0.083333, -0.222222, -0.693147, -1.098612]] + boxes = box_list.BoxList(tf.constant(boxes)) + anchors = box_list.BoxList(tf.constant(anchors)) + coder = faster_rcnn_box_coder.FasterRcnnBoxCoder() + rel_codes = coder.encode(boxes, anchors) + with self.test_session() as sess: + rel_codes_out, = sess.run([rel_codes]) + self.assertAllClose(rel_codes_out, expected_rel_codes) + + def test_get_correct_relative_codes_after_encoding_with_scaling(self): + boxes = [[10.0, 10.0, 20.0, 15.0], [0.2, 0.1, 0.5, 0.4]] + anchors = [[15.0, 12.0, 30.0, 18.0], [0.1, 0.0, 0.7, 0.9]] + scale_factors = [2, 3, 4, 5] + expected_rel_codes = [[-1., -1.25, -1.62186, -0.911608], + [-0.166667, -0.666667, -2.772588, -5.493062]] + boxes = box_list.BoxList(tf.constant(boxes)) + anchors = box_list.BoxList(tf.constant(anchors)) + coder = faster_rcnn_box_coder.FasterRcnnBoxCoder( + scale_factors=scale_factors) + rel_codes = coder.encode(boxes, anchors) + with self.test_session() as sess: + rel_codes_out, = sess.run([rel_codes]) + self.assertAllClose(rel_codes_out, expected_rel_codes) + + def test_get_correct_boxes_after_decoding(self): + anchors = [[15.0, 12.0, 30.0, 18.0], [0.1, 0.0, 0.7, 0.9]] + rel_codes = [[-0.5, -0.416666, -0.405465, -0.182321], + [-0.083333, -0.222222, -0.693147, -1.098612]] + expected_boxes = [[10.0, 10.0, 20.0, 15.0], [0.2, 0.1, 0.5, 0.4]] + anchors = box_list.BoxList(tf.constant(anchors)) + coder = faster_rcnn_box_coder.FasterRcnnBoxCoder() + boxes = coder.decode(rel_codes, anchors) + with self.test_session() as sess: + boxes_out, = sess.run([boxes.get()]) + self.assertAllClose(boxes_out, expected_boxes) + + def test_get_correct_boxes_after_decoding_with_scaling(self): + anchors = [[15.0, 12.0, 30.0, 18.0], [0.1, 0.0, 0.7, 0.9]] + rel_codes = [[-1., -1.25, -1.62186, -0.911608], + [-0.166667, -0.666667, -2.772588, -5.493062]] + scale_factors = [2, 3, 4, 5] + expected_boxes = [[10.0, 10.0, 20.0, 15.0], [0.2, 0.1, 0.5, 0.4]] + anchors = box_list.BoxList(tf.constant(anchors)) + coder = faster_rcnn_box_coder.FasterRcnnBoxCoder( + scale_factors=scale_factors) + boxes = coder.decode(rel_codes, anchors) + with self.test_session() as sess: + boxes_out, = sess.run([boxes.get()]) + self.assertAllClose(boxes_out, expected_boxes) + + def test_very_small_Width_nan_after_encoding(self): + boxes = [[10.0, 10.0, 10.0000001, 20.0]] + anchors = [[15.0, 12.0, 30.0, 18.0]] + expected_rel_codes = [[-0.833333, 0., -21.128731, 0.510826]] + boxes = box_list.BoxList(tf.constant(boxes)) + anchors = box_list.BoxList(tf.constant(anchors)) + 
coder = faster_rcnn_box_coder.FasterRcnnBoxCoder() + rel_codes = coder.encode(boxes, anchors) + with self.test_session() as sess: + rel_codes_out, = sess.run([rel_codes]) + self.assertAllClose(rel_codes_out, expected_rel_codes) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/box_coders/keypoint_box_coder.py b/object_detection/box_coders/keypoint_box_coder.py new file mode 100644 index 0000000000000000000000000000000000000000..34ed1af2300528f061e1aa248805f8baf9408f2e --- /dev/null +++ b/object_detection/box_coders/keypoint_box_coder.py @@ -0,0 +1,171 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Keypoint box coder. + +The keypoint box coder follows the coding schema described below (this is +similar to the FasterRcnnBoxCoder, except that it encodes keypoints in addition +to box coordinates): + ty = (y - ya) / ha + tx = (x - xa) / wa + th = log(h / ha) + tw = log(w / wa) + tky0 = (ky0 - ya) / ha + tkx0 = (kx0 - xa) / ha + tky1 = (ky1 - ya) / ha + tkx1 = (kx1 - xa) / ha + ... + where x, y, w, h denote the box's center coordinates, width and height + respectively. Similarly, xa, ya, wa, ha denote the anchor's center + coordinates, width and height. tx, ty, tw and th denote the anchor-encoded + center, width and height respectively. ky0, kx0, ky1, kx1, ... denote the + keypoints' coordinates, and tky0, tkx0, tky1, tkx1, ... denote the + anchor-encoded keypoint coordinates. +""" + +import tensorflow as tf + +from object_detection.core import box_coder +from object_detection.core import box_list +from object_detection.core import standard_fields as fields + +EPSILON = 1e-8 + + +class KeypointBoxCoder(box_coder.BoxCoder): + """Keypoint box coder.""" + + def __init__(self, num_keypoints, scale_factors=None): + """Constructor for KeypointBoxCoder. + + Args: + num_keypoints: Number of keypoints to encode/decode. + scale_factors: List of 4 positive scalars to scale ty, tx, th and tw. + In addition to scaling ty and tx, the first 2 scalars are used to scale + the y and x coordinates of the keypoints as well. If set to None, does + not perform scaling. + """ + self._num_keypoints = num_keypoints + + if scale_factors: + assert len(scale_factors) == 4 + for scalar in scale_factors: + assert scalar > 0 + self._scale_factors = scale_factors + self._keypoint_scale_factors = None + if scale_factors is not None: + self._keypoint_scale_factors = tf.expand_dims(tf.tile( + [tf.to_float(scale_factors[0]), tf.to_float(scale_factors[1])], + [num_keypoints]), 1) + + @property + def code_size(self): + return 4 + self._num_keypoints * 2 + + def _encode(self, boxes, anchors): + """Encode a box and keypoint collection with respect to anchor collection. + + Args: + boxes: BoxList holding N boxes and keypoints to be encoded. Boxes are + tensors with the shape [N, 4], and keypoints are tensors with the shape + [N, num_keypoints, 2]. 
+ anchors: BoxList of anchors. + + Returns: + a tensor representing N anchor-encoded boxes of the format + [ty, tx, th, tw, tky0, tkx0, tky1, tkx1, ...] where tky0 and tkx0 + represent the y and x coordinates of the first keypoint, tky1 and tkx1 + represent the y and x coordinates of the second keypoint, and so on. + """ + # Convert anchors to the center coordinate representation. + ycenter_a, xcenter_a, ha, wa = anchors.get_center_coordinates_and_sizes() + ycenter, xcenter, h, w = boxes.get_center_coordinates_and_sizes() + keypoints = boxes.get_field(fields.BoxListFields.keypoints) + keypoints = tf.transpose(tf.reshape(keypoints, + [-1, self._num_keypoints * 2])) + num_boxes = boxes.num_boxes() + + # Avoid NaN in division and log below. + ha += EPSILON + wa += EPSILON + h += EPSILON + w += EPSILON + + tx = (xcenter - xcenter_a) / wa + ty = (ycenter - ycenter_a) / ha + tw = tf.log(w / wa) + th = tf.log(h / ha) + + tiled_anchor_centers = tf.tile( + tf.stack([ycenter_a, xcenter_a]), [self._num_keypoints, 1]) + tiled_anchor_sizes = tf.tile( + tf.stack([ha, wa]), [self._num_keypoints, 1]) + tkeypoints = (keypoints - tiled_anchor_centers) / tiled_anchor_sizes + + # Scales location targets as used in paper for joint training. + if self._scale_factors: + ty *= self._scale_factors[0] + tx *= self._scale_factors[1] + th *= self._scale_factors[2] + tw *= self._scale_factors[3] + tkeypoints *= tf.tile(self._keypoint_scale_factors, [1, num_boxes]) + + tboxes = tf.stack([ty, tx, th, tw]) + return tf.transpose(tf.concat([tboxes, tkeypoints], 0)) + + def _decode(self, rel_codes, anchors): + """Decode relative codes to boxes and keypoints. + + Args: + rel_codes: a tensor with shape [N, 4 + 2 * num_keypoints] representing N + anchor-encoded boxes and keypoints + anchors: BoxList of anchors. + + Returns: + boxes: BoxList holding N bounding boxes and keypoints. + """ + ycenter_a, xcenter_a, ha, wa = anchors.get_center_coordinates_and_sizes() + + num_codes = tf.shape(rel_codes)[0] + result = tf.unstack(tf.transpose(rel_codes)) + ty, tx, th, tw = result[:4] + tkeypoints = result[4:] + if self._scale_factors: + ty /= self._scale_factors[0] + tx /= self._scale_factors[1] + th /= self._scale_factors[2] + tw /= self._scale_factors[3] + tkeypoints /= tf.tile(self._keypoint_scale_factors, [1, num_codes]) + + w = tf.exp(tw) * wa + h = tf.exp(th) * ha + ycenter = ty * ha + ycenter_a + xcenter = tx * wa + xcenter_a + ymin = ycenter - h / 2. + xmin = xcenter - w / 2. + ymax = ycenter + h / 2. + xmax = xcenter + w / 2. + decoded_boxes_keypoints = box_list.BoxList( + tf.transpose(tf.stack([ymin, xmin, ymax, xmax]))) + + tiled_anchor_centers = tf.tile( + tf.stack([ycenter_a, xcenter_a]), [self._num_keypoints, 1]) + tiled_anchor_sizes = tf.tile( + tf.stack([ha, wa]), [self._num_keypoints, 1]) + keypoints = tkeypoints * tiled_anchor_sizes + tiled_anchor_centers + keypoints = tf.reshape(tf.transpose(keypoints), + [-1, self._num_keypoints, 2]) + decoded_boxes_keypoints.add_field(fields.BoxListFields.keypoints, keypoints) + return decoded_boxes_keypoints diff --git a/object_detection/box_coders/keypoint_box_coder_test.py b/object_detection/box_coders/keypoint_box_coder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..330641e586af98af5f4764fb08f5307458777458 --- /dev/null +++ b/object_detection/box_coders/keypoint_box_coder_test.py @@ -0,0 +1,140 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.box_coder.keypoint_box_coder.""" + +import tensorflow as tf + +from object_detection.box_coders import keypoint_box_coder +from object_detection.core import box_list +from object_detection.core import standard_fields as fields + + +class KeypointBoxCoderTest(tf.test.TestCase): + + def test_get_correct_relative_codes_after_encoding(self): + boxes = [[10., 10., 20., 15.], + [0.2, 0.1, 0.5, 0.4]] + keypoints = [[[15., 12.], [10., 15.]], + [[0.5, 0.3], [0.2, 0.4]]] + num_keypoints = len(keypoints[0]) + anchors = [[15., 12., 30., 18.], + [0.1, 0.0, 0.7, 0.9]] + expected_rel_codes = [ + [-0.5, -0.416666, -0.405465, -0.182321, + -0.5, -0.5, -0.833333, 0.], + [-0.083333, -0.222222, -0.693147, -1.098612, + 0.166667, -0.166667, -0.333333, -0.055556] + ] + boxes = box_list.BoxList(tf.constant(boxes)) + boxes.add_field(fields.BoxListFields.keypoints, tf.constant(keypoints)) + anchors = box_list.BoxList(tf.constant(anchors)) + coder = keypoint_box_coder.KeypointBoxCoder(num_keypoints) + rel_codes = coder.encode(boxes, anchors) + with self.test_session() as sess: + rel_codes_out, = sess.run([rel_codes]) + self.assertAllClose(rel_codes_out, expected_rel_codes) + + def test_get_correct_relative_codes_after_encoding_with_scaling(self): + boxes = [[10., 10., 20., 15.], + [0.2, 0.1, 0.5, 0.4]] + keypoints = [[[15., 12.], [10., 15.]], + [[0.5, 0.3], [0.2, 0.4]]] + num_keypoints = len(keypoints[0]) + anchors = [[15., 12., 30., 18.], + [0.1, 0.0, 0.7, 0.9]] + scale_factors = [2, 3, 4, 5] + expected_rel_codes = [ + [-1., -1.25, -1.62186, -0.911608, + -1.0, -1.5, -1.666667, 0.], + [-0.166667, -0.666667, -2.772588, -5.493062, + 0.333333, -0.5, -0.666667, -0.166667] + ] + boxes = box_list.BoxList(tf.constant(boxes)) + boxes.add_field(fields.BoxListFields.keypoints, tf.constant(keypoints)) + anchors = box_list.BoxList(tf.constant(anchors)) + coder = keypoint_box_coder.KeypointBoxCoder( + num_keypoints, scale_factors=scale_factors) + rel_codes = coder.encode(boxes, anchors) + with self.test_session() as sess: + rel_codes_out, = sess.run([rel_codes]) + self.assertAllClose(rel_codes_out, expected_rel_codes) + + def test_get_correct_boxes_after_decoding(self): + anchors = [[15., 12., 30., 18.], + [0.1, 0.0, 0.7, 0.9]] + rel_codes = [ + [-0.5, -0.416666, -0.405465, -0.182321, + -0.5, -0.5, -0.833333, 0.], + [-0.083333, -0.222222, -0.693147, -1.098612, + 0.166667, -0.166667, -0.333333, -0.055556] + ] + expected_boxes = [[10., 10., 20., 15.], + [0.2, 0.1, 0.5, 0.4]] + expected_keypoints = [[[15., 12.], [10., 15.]], + [[0.5, 0.3], [0.2, 0.4]]] + num_keypoints = len(expected_keypoints[0]) + anchors = box_list.BoxList(tf.constant(anchors)) + coder = keypoint_box_coder.KeypointBoxCoder(num_keypoints) + boxes = coder.decode(rel_codes, anchors) + with self.test_session() as sess: + boxes_out, keypoints_out = sess.run( + [boxes.get(), 
boxes.get_field(fields.BoxListFields.keypoints)]) + self.assertAllClose(boxes_out, expected_boxes) + self.assertAllClose(keypoints_out, expected_keypoints) + + def test_get_correct_boxes_after_decoding_with_scaling(self): + anchors = [[15., 12., 30., 18.], + [0.1, 0.0, 0.7, 0.9]] + rel_codes = [ + [-1., -1.25, -1.62186, -0.911608, + -1.0, -1.5, -1.666667, 0.], + [-0.166667, -0.666667, -2.772588, -5.493062, + 0.333333, -0.5, -0.666667, -0.166667] + ] + scale_factors = [2, 3, 4, 5] + expected_boxes = [[10., 10., 20., 15.], + [0.2, 0.1, 0.5, 0.4]] + expected_keypoints = [[[15., 12.], [10., 15.]], + [[0.5, 0.3], [0.2, 0.4]]] + num_keypoints = len(expected_keypoints[0]) + anchors = box_list.BoxList(tf.constant(anchors)) + coder = keypoint_box_coder.KeypointBoxCoder( + num_keypoints, scale_factors=scale_factors) + boxes = coder.decode(rel_codes, anchors) + with self.test_session() as sess: + boxes_out, keypoints_out = sess.run( + [boxes.get(), boxes.get_field(fields.BoxListFields.keypoints)]) + self.assertAllClose(boxes_out, expected_boxes) + self.assertAllClose(keypoints_out, expected_keypoints) + + def test_very_small_width_nan_after_encoding(self): + boxes = [[10., 10., 10.0000001, 20.]] + keypoints = [[[10., 10.], [10.0000001, 20.]]] + anchors = [[15., 12., 30., 18.]] + expected_rel_codes = [[-0.833333, 0., -21.128731, 0.510826, + -0.833333, -0.833333, -0.833333, 0.833333]] + boxes = box_list.BoxList(tf.constant(boxes)) + boxes.add_field(fields.BoxListFields.keypoints, tf.constant(keypoints)) + anchors = box_list.BoxList(tf.constant(anchors)) + coder = keypoint_box_coder.KeypointBoxCoder(2) + rel_codes = coder.encode(boxes, anchors) + with self.test_session() as sess: + rel_codes_out, = sess.run([rel_codes]) + self.assertAllClose(rel_codes_out, expected_rel_codes) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/box_coders/mean_stddev_box_coder.py b/object_detection/box_coders/mean_stddev_box_coder.py new file mode 100644 index 0000000000000000000000000000000000000000..726b4a61cbedddbf9f2b8d2001a4419b80a3f9e0 --- /dev/null +++ b/object_detection/box_coders/mean_stddev_box_coder.py @@ -0,0 +1,70 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Mean stddev box coder. + +This box coder use the following coding schema to encode boxes: +rel_code = (box_corner - anchor_corner_mean) / anchor_corner_stddev. +""" +from object_detection.core import box_coder +from object_detection.core import box_list + + +class MeanStddevBoxCoder(box_coder.BoxCoder): + """Mean stddev box coder.""" + + @property + def code_size(self): + return 4 + + def _encode(self, boxes, anchors): + """Encode a box collection with respect to anchor collection. + + Args: + boxes: BoxList holding N boxes to be encoded. + anchors: BoxList of N anchors. We assume that anchors has an associated + stddev field. 
+ + Returns: + a tensor representing N anchor-encoded boxes + Raises: + ValueError: if the anchors BoxList does not have a stddev field + """ + if not anchors.has_field('stddev'): + raise ValueError('anchors must have a stddev field') + box_corners = boxes.get() + means = anchors.get() + stddev = anchors.get_field('stddev') + return (box_corners - means) / stddev + + def _decode(self, rel_codes, anchors): + """Decode. + + Args: + rel_codes: a tensor representing N anchor-encoded boxes. + anchors: BoxList of anchors. We assume that anchors has an associated + stddev field. + + Returns: + boxes: BoxList holding N bounding boxes + Raises: + ValueError: if the anchors BoxList does not have a stddev field + """ + if not anchors.has_field('stddev'): + raise ValueError('anchors must have a stddev field') + means = anchors.get() + stddevs = anchors.get_field('stddev') + box_corners = rel_codes * stddevs + means + return box_list.BoxList(box_corners) diff --git a/object_detection/box_coders/mean_stddev_box_coder_test.py b/object_detection/box_coders/mean_stddev_box_coder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..0d3a895280b71c7a8075b58b492470a9a8b95618 --- /dev/null +++ b/object_detection/box_coders/mean_stddev_box_coder_test.py @@ -0,0 +1,58 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
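The mean/stddev schema above is easy to verify by hand. This NumPy-only sketch is illustrative and not part of the change; the array values are copied from the unit test that follows, and it reproduces the encode/decode arithmetic without constructing BoxList objects:

```python
import numpy as np

# Priors (anchors) carry a per-coordinate mean and stddev.
prior_means = np.array([[0.0, 0.0, 0.5, 0.5],
                        [0.5, 0.5, 1.0, 0.8]])
prior_stddevs = np.full((2, 4), 0.1)

# Ground-truth box corners to encode.
boxes = np.array([[0.0, 0.0, 0.5, 0.5],
                  [0.0, 0.0, 0.5, 0.5]])

# Encode: rel_code = (box_corner - anchor_corner_mean) / anchor_corner_stddev.
rel_codes = (boxes - prior_means) / prior_stddevs
# -> [[0, 0, 0, 0], [-5, -5, -5, -3]], matching expected_rel_codes in the
#    test below.

# Decode is the exact inverse: box_corner = rel_code * stddev + mean.
decoded = rel_codes * prior_stddevs + prior_means
assert np.allclose(decoded, boxes)
```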
+# ============================================================================== + +"""Tests for object_detection.box_coder.mean_stddev_boxcoder.""" + +import tensorflow as tf + +from object_detection.box_coders import mean_stddev_box_coder +from object_detection.core import box_list + + +class MeanStddevBoxCoderTest(tf.test.TestCase): + + def testGetCorrectRelativeCodesAfterEncoding(self): + box_corners = [[0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5]] + boxes = box_list.BoxList(tf.constant(box_corners)) + expected_rel_codes = [[0.0, 0.0, 0.0, 0.0], [-5.0, -5.0, -5.0, -3.0]] + prior_means = tf.constant([[0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 1.0, 0.8]]) + prior_stddevs = tf.constant(2 * [4 * [.1]]) + priors = box_list.BoxList(prior_means) + priors.add_field('stddev', prior_stddevs) + + coder = mean_stddev_box_coder.MeanStddevBoxCoder() + rel_codes = coder.encode(boxes, priors) + with self.test_session() as sess: + rel_codes_out = sess.run(rel_codes) + self.assertAllClose(rel_codes_out, expected_rel_codes) + + def testGetCorrectBoxesAfterDecoding(self): + rel_codes = tf.constant([[0.0, 0.0, 0.0, 0.0], [-5.0, -5.0, -5.0, -3.0]]) + expected_box_corners = [[0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5]] + prior_means = tf.constant([[0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 1.0, 0.8]]) + prior_stddevs = tf.constant(2 * [4 * [.1]]) + priors = box_list.BoxList(prior_means) + priors.add_field('stddev', prior_stddevs) + + coder = mean_stddev_box_coder.MeanStddevBoxCoder() + decoded_boxes = coder.decode(rel_codes, priors) + decoded_box_corners = decoded_boxes.get() + with self.test_session() as sess: + decoded_out = sess.run(decoded_box_corners) + self.assertAllClose(decoded_out, expected_box_corners) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/box_coders/square_box_coder.py b/object_detection/box_coders/square_box_coder.py new file mode 100644 index 0000000000000000000000000000000000000000..ee46b689524838518182ff0f9208168e78c8b2cf --- /dev/null +++ b/object_detection/box_coders/square_box_coder.py @@ -0,0 +1,126 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Square box coder. + +Square box coder follows the coding schema described below: +l = sqrt(h * w) +la = sqrt(ha * wa) +ty = (y - ya) / la +tx = (x - xa) / la +tl = log(l / la) +where x, y, w, h denote the box's center coordinates, width, and height, +respectively. Similarly, xa, ya, wa, ha denote the anchor's center +coordinates, width and height. tx, ty, tl denote the anchor-encoded +center, and length, respectively. Because the encoded box is a square, only +one length is encoded. + +This has shown to provide performance improvements over the Faster RCNN box +coder when the objects being detected tend to be square (e.g. faces) and when +the input images are not distorted via resizing. 
+""" + +import tensorflow as tf + +from object_detection.core import box_coder +from object_detection.core import box_list + +EPSILON = 1e-8 + + +class SquareBoxCoder(box_coder.BoxCoder): + """Encodes a 3-scalar representation of a square box.""" + + def __init__(self, scale_factors=None): + """Constructor for SquareBoxCoder. + + Args: + scale_factors: List of 3 positive scalars to scale ty, tx, and tl. + If set to None, does not perform scaling. For faster RCNN, + the open-source implementation recommends using [10.0, 10.0, 5.0]. + + Raises: + ValueError: If scale_factors is not length 3 or contains values less than + or equal to 0. + """ + if scale_factors: + if len(scale_factors) != 3: + raise ValueError('The argument scale_factors must be a list of length ' + '3.') + if any(scalar <= 0 for scalar in scale_factors): + raise ValueError('The values in scale_factors must all be greater ' + 'than 0.') + self._scale_factors = scale_factors + + @property + def code_size(self): + return 3 + + def _encode(self, boxes, anchors): + """Encodes a box collection with respect to an anchor collection. + + Args: + boxes: BoxList holding N boxes to be encoded. + anchors: BoxList of anchors. + + Returns: + a tensor representing N anchor-encoded boxes of the format + [ty, tx, tl]. + """ + # Convert anchors to the center coordinate representation. + ycenter_a, xcenter_a, ha, wa = anchors.get_center_coordinates_and_sizes() + la = tf.sqrt(ha * wa) + ycenter, xcenter, h, w = boxes.get_center_coordinates_and_sizes() + l = tf.sqrt(h * w) + # Avoid NaN in division and log below. + la += EPSILON + l += EPSILON + + tx = (xcenter - xcenter_a) / la + ty = (ycenter - ycenter_a) / la + tl = tf.log(l / la) + # Scales location targets for joint training. + if self._scale_factors: + ty *= self._scale_factors[0] + tx *= self._scale_factors[1] + tl *= self._scale_factors[2] + return tf.transpose(tf.stack([ty, tx, tl])) + + def _decode(self, rel_codes, anchors): + """Decodes relative codes to boxes. + + Args: + rel_codes: a tensor representing N anchor-encoded boxes. + anchors: BoxList of anchors. + + Returns: + boxes: BoxList holding N bounding boxes. + """ + ycenter_a, xcenter_a, ha, wa = anchors.get_center_coordinates_and_sizes() + la = tf.sqrt(ha * wa) + + ty, tx, tl = tf.unstack(tf.transpose(rel_codes)) + if self._scale_factors: + ty /= self._scale_factors[0] + tx /= self._scale_factors[1] + tl /= self._scale_factors[2] + l = tf.exp(tl) * la + ycenter = ty * la + ycenter_a + xcenter = tx * la + xcenter_a + ymin = ycenter - l / 2. + xmin = xcenter - l / 2. + ymax = ycenter + l / 2. + xmax = xcenter + l / 2. + return box_list.BoxList(tf.transpose(tf.stack([ymin, xmin, ymax, xmax]))) diff --git a/object_detection/box_coders/square_box_coder_test.py b/object_detection/box_coders/square_box_coder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..7f739c6b4f38de3d280cb91e9c8e04a661a621e4 --- /dev/null +++ b/object_detection/box_coders/square_box_coder_test.py @@ -0,0 +1,97 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.box_coder.square_box_coder.""" + +import tensorflow as tf + +from object_detection.box_coders import square_box_coder +from object_detection.core import box_list + + +class SquareBoxCoderTest(tf.test.TestCase): + + def test_correct_relative_codes_with_default_scale(self): + boxes = [[10.0, 10.0, 20.0, 15.0], [0.2, 0.1, 0.5, 0.4]] + anchors = [[15.0, 12.0, 30.0, 18.0], [0.1, 0.0, 0.7, 0.9]] + scale_factors = None + expected_rel_codes = [[-0.790569, -0.263523, -0.293893], + [-0.068041, -0.272166, -0.89588]] + + boxes = box_list.BoxList(tf.constant(boxes)) + anchors = box_list.BoxList(tf.constant(anchors)) + coder = square_box_coder.SquareBoxCoder(scale_factors=scale_factors) + rel_codes = coder.encode(boxes, anchors) + with self.test_session() as sess: + (rel_codes_out,) = sess.run([rel_codes]) + self.assertAllClose(rel_codes_out, expected_rel_codes) + + def test_correct_relative_codes_with_non_default_scale(self): + boxes = [[10.0, 10.0, 20.0, 15.0], [0.2, 0.1, 0.5, 0.4]] + anchors = [[15.0, 12.0, 30.0, 18.0], [0.1, 0.0, 0.7, 0.9]] + scale_factors = [2, 3, 4] + expected_rel_codes = [[-1.581139, -0.790569, -1.175573], + [-0.136083, -0.816497, -3.583519]] + boxes = box_list.BoxList(tf.constant(boxes)) + anchors = box_list.BoxList(tf.constant(anchors)) + coder = square_box_coder.SquareBoxCoder(scale_factors=scale_factors) + rel_codes = coder.encode(boxes, anchors) + with self.test_session() as sess: + (rel_codes_out,) = sess.run([rel_codes]) + self.assertAllClose(rel_codes_out, expected_rel_codes) + + def test_correct_relative_codes_with_small_width(self): + boxes = [[10.0, 10.0, 10.0000001, 20.0]] + anchors = [[15.0, 12.0, 30.0, 18.0]] + scale_factors = None + expected_rel_codes = [[-1.317616, 0., -20.670586]] + boxes = box_list.BoxList(tf.constant(boxes)) + anchors = box_list.BoxList(tf.constant(anchors)) + coder = square_box_coder.SquareBoxCoder(scale_factors=scale_factors) + rel_codes = coder.encode(boxes, anchors) + with self.test_session() as sess: + (rel_codes_out,) = sess.run([rel_codes]) + self.assertAllClose(rel_codes_out, expected_rel_codes) + + def test_correct_boxes_with_default_scale(self): + anchors = [[15.0, 12.0, 30.0, 18.0], [0.1, 0.0, 0.7, 0.9]] + rel_codes = [[-0.5, -0.416666, -0.405465], + [-0.083333, -0.222222, -0.693147]] + scale_factors = None + expected_boxes = [[14.594306, 7.884875, 20.918861, 14.209432], + [0.155051, 0.102989, 0.522474, 0.470412]] + anchors = box_list.BoxList(tf.constant(anchors)) + coder = square_box_coder.SquareBoxCoder(scale_factors=scale_factors) + boxes = coder.decode(rel_codes, anchors) + with self.test_session() as sess: + (boxes_out,) = sess.run([boxes.get()]) + self.assertAllClose(boxes_out, expected_boxes) + + def test_correct_boxes_with_non_default_scale(self): + anchors = [[15.0, 12.0, 30.0, 18.0], [0.1, 0.0, 0.7, 0.9]] + rel_codes = [[-1., -1.25, -1.62186], [-0.166667, -0.666667, -2.772588]] + scale_factors = [2, 3, 4] + expected_boxes = [[14.594306, 7.884875, 20.918861, 14.209432], + [0.155051, 0.102989, 0.522474, 0.470412]] + anchors = box_list.BoxList(tf.constant(anchors)) + coder = square_box_coder.SquareBoxCoder(scale_factors=scale_factors) + boxes = coder.decode(rel_codes, anchors) + with self.test_session() as sess: + (boxes_out,) = sess.run([boxes.get()]) + self.assertAllClose(boxes_out, expected_boxes) + + +if 
__name__ == '__main__': + tf.test.main() diff --git a/object_detection/builders/BUILD b/object_detection/builders/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..bb40de5b5e396deae86ff53a5cbb4068ec86bee8 --- /dev/null +++ b/object_detection/builders/BUILD @@ -0,0 +1,296 @@ +# Tensorflow Object Detection API: component builders. + +package( + default_visibility = ["//visibility:public"], +) + +licenses(["notice"]) + +# Apache 2.0 +py_library( + name = "model_builder", + srcs = ["model_builder.py"], + deps = [ + ":anchor_generator_builder", + ":box_coder_builder", + ":box_predictor_builder", + ":hyperparams_builder", + ":image_resizer_builder", + ":losses_builder", + ":matcher_builder", + ":post_processing_builder", + ":region_similarity_calculator_builder", + "//tensorflow_models/object_detection/core:box_predictor", + "//tensorflow_models/object_detection/meta_architectures:faster_rcnn_meta_arch", + "//tensorflow_models/object_detection/meta_architectures:rfcn_meta_arch", + "//tensorflow_models/object_detection/meta_architectures:ssd_meta_arch", + "//tensorflow_models/object_detection/models:faster_rcnn_inception_resnet_v2_feature_extractor", + "//tensorflow_models/object_detection/models:faster_rcnn_resnet_v1_feature_extractor", + "//tensorflow_models/object_detection/models:ssd_inception_v2_feature_extractor", + "//tensorflow_models/object_detection/models:ssd_mobilenet_v1_feature_extractor", + "//tensorflow_models/object_detection/protos:model_py_pb2", + ], +) + +py_test( + name = "model_builder_test", + srcs = ["model_builder_test.py"], + deps = [ + ":model_builder", + "//tensorflow", + "//tensorflow_models/object_detection/meta_architectures:faster_rcnn_meta_arch", + "//tensorflow_models/object_detection/meta_architectures:ssd_meta_arch", + "//tensorflow_models/object_detection/models:ssd_inception_v2_feature_extractor", + "//tensorflow_models/object_detection/models:ssd_mobilenet_v1_feature_extractor", + "//tensorflow_models/object_detection/protos:model_py_pb2", + ], +) + +py_library( + name = "matcher_builder", + srcs = ["matcher_builder.py"], + deps = [ + "//tensorflow_models/object_detection/matchers:argmax_matcher", + "//tensorflow_models/object_detection/matchers:bipartite_matcher", + "//tensorflow_models/object_detection/protos:matcher_py_pb2", + ], +) + +py_test( + name = "matcher_builder_test", + srcs = ["matcher_builder_test.py"], + deps = [ + ":matcher_builder", + "//tensorflow_models/object_detection/matchers:argmax_matcher", + "//tensorflow_models/object_detection/matchers:bipartite_matcher", + "//tensorflow_models/object_detection/protos:matcher_py_pb2", + ], +) + +py_library( + name = "box_coder_builder", + srcs = ["box_coder_builder.py"], + deps = [ + "//tensorflow_models/object_detection/box_coders:faster_rcnn_box_coder", + "//tensorflow_models/object_detection/box_coders:mean_stddev_box_coder", + "//tensorflow_models/object_detection/box_coders:square_box_coder", + "//tensorflow_models/object_detection/protos:box_coder_py_pb2", + ], +) + +py_test( + name = "box_coder_builder_test", + srcs = ["box_coder_builder_test.py"], + deps = [ + ":box_coder_builder", + "//tensorflow", + "//tensorflow_models/object_detection/box_coders:faster_rcnn_box_coder", + "//tensorflow_models/object_detection/box_coders:mean_stddev_box_coder", + "//tensorflow_models/object_detection/box_coders:square_box_coder", + "//tensorflow_models/object_detection/protos:box_coder_py_pb2", + ], +) + +py_library( + name = "anchor_generator_builder", + srcs = 
["anchor_generator_builder.py"], + deps = [ + "//tensorflow_models/object_detection/anchor_generators:grid_anchor_generator", + "//tensorflow_models/object_detection/anchor_generators:multiple_grid_anchor_generator", + "//tensorflow_models/object_detection/protos:anchor_generator_py_pb2", + ], +) + +py_test( + name = "anchor_generator_builder_test", + srcs = ["anchor_generator_builder_test.py"], + deps = [ + ":anchor_generator_builder", + "//tensorflow", + "//tensorflow_models/object_detection/anchor_generators:grid_anchor_generator", + "//tensorflow_models/object_detection/anchor_generators:multiple_grid_anchor_generator", + "//tensorflow_models/object_detection/protos:anchor_generator_py_pb2", + ], +) + +py_library( + name = "input_reader_builder", + srcs = ["input_reader_builder.py"], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/data_decoders:tf_example_decoder", + "//tensorflow_models/object_detection/protos:input_reader_py_pb2", + ], +) + +py_test( + name = "input_reader_builder_test", + srcs = [ + "input_reader_builder_test.py", + ], + deps = [ + ":input_reader_builder", + "//tensorflow", + "//tensorflow_models/object_detection/core:standard_fields", + "//tensorflow_models/object_detection/protos:input_reader_py_pb2", + ], +) + +py_library( + name = "losses_builder", + srcs = ["losses_builder.py"], + deps = [ + "//tensorflow_models/object_detection/core:losses", + "//tensorflow_models/object_detection/protos:losses_py_pb2", + ], +) + +py_test( + name = "losses_builder_test", + srcs = ["losses_builder_test.py"], + deps = [ + ":losses_builder", + "//tensorflow_models/object_detection/core:losses", + "//tensorflow_models/object_detection/protos:losses_py_pb2", + ], +) + +py_library( + name = "optimizer_builder", + srcs = ["optimizer_builder.py"], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/utils:learning_schedules", + ], +) + +py_test( + name = "optimizer_builder_test", + srcs = ["optimizer_builder_test.py"], + deps = [ + ":optimizer_builder", + "//tensorflow", + "//tensorflow_models/object_detection/protos:optimizer_py_pb2", + ], +) + +py_library( + name = "post_processing_builder", + srcs = ["post_processing_builder.py"], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/core:post_processing", + "//tensorflow_models/object_detection/protos:post_processing_py_pb2", + ], +) + +py_test( + name = "post_processing_builder_test", + srcs = ["post_processing_builder_test.py"], + deps = [ + ":post_processing_builder", + "//tensorflow", + "//tensorflow_models/object_detection/protos:post_processing_py_pb2", + ], +) + +py_library( + name = "hyperparams_builder", + srcs = ["hyperparams_builder.py"], + deps = [ + "//tensorflow_models/object_detection/protos:hyperparams_py_pb2", + ], +) + +py_test( + name = "hyperparams_builder_test", + srcs = ["hyperparams_builder_test.py"], + deps = [ + ":hyperparams_builder", + "//tensorflow", + "//tensorflow_models/object_detection/protos:hyperparams_py_pb2", + ], +) + +py_library( + name = "box_predictor_builder", + srcs = ["box_predictor_builder.py"], + deps = [ + ":hyperparams_builder", + "//tensorflow_models/object_detection/core:box_predictor", + "//tensorflow_models/object_detection/protos:box_predictor_py_pb2", + ], +) + +py_test( + name = "box_predictor_builder_test", + srcs = ["box_predictor_builder_test.py"], + deps = [ + ":box_predictor_builder", + ":hyperparams_builder", + "//tensorflow", + "//tensorflow_models/object_detection/protos:box_predictor_py_pb2", + 
"//tensorflow_models/object_detection/protos:hyperparams_py_pb2", + ], +) + +py_library( + name = "region_similarity_calculator_builder", + srcs = ["region_similarity_calculator_builder.py"], + deps = [ + "//tensorflow_models/object_detection/core:region_similarity_calculator", + "//tensorflow_models/object_detection/protos:region_similarity_calculator_py_pb2", + ], +) + +py_test( + name = "region_similarity_calculator_builder_test", + srcs = ["region_similarity_calculator_builder_test.py"], + deps = [ + ":region_similarity_calculator_builder", + "//tensorflow", + ], +) + +py_library( + name = "preprocessor_builder", + srcs = ["preprocessor_builder.py"], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/core:preprocessor", + "//tensorflow_models/object_detection/protos:preprocessor_py_pb2", + ], +) + +py_test( + name = "preprocessor_builder_test", + srcs = [ + "preprocessor_builder_test.py", + ], + deps = [ + ":preprocessor_builder", + "//tensorflow", + "//tensorflow_models/object_detection/core:preprocessor", + "//tensorflow_models/object_detection/protos:preprocessor_py_pb2", + ], +) + +py_library( + name = "image_resizer_builder", + srcs = ["image_resizer_builder.py"], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/core:preprocessor", + "//tensorflow_models/object_detection/protos:image_resizer_py_pb2", + ], +) + +py_test( + name = "image_resizer_builder_test", + srcs = ["image_resizer_builder_test.py"], + deps = [ + ":image_resizer_builder", + "//tensorflow", + "//tensorflow_models/object_detection/protos:image_resizer_py_pb2", + ], +) diff --git a/object_detection/builders/__init__.py b/object_detection/builders/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/object_detection/builders/anchor_generator_builder.py b/object_detection/builders/anchor_generator_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..7b08deddbc26bf9e9fa52f681e6407954fc987bd --- /dev/null +++ b/object_detection/builders/anchor_generator_builder.py @@ -0,0 +1,66 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""A function to build an object detection anchor generator from config.""" + +from object_detection.anchor_generators import grid_anchor_generator +from object_detection.anchor_generators import multiple_grid_anchor_generator +from object_detection.protos import anchor_generator_pb2 + + +def build(anchor_generator_config): + """Builds an anchor generator based on the config. + + Args: + anchor_generator_config: An anchor_generator.proto object containing the + config for the desired anchor generator. + + Returns: + Anchor generator based on the config. + + Raises: + ValueError: On empty anchor generator proto. 
+ """ + if not isinstance(anchor_generator_config, + anchor_generator_pb2.AnchorGenerator): + raise ValueError('anchor_generator_config not of type ' + 'anchor_generator_pb2.AnchorGenerator') + if anchor_generator_config.WhichOneof( + 'anchor_generator_oneof') == 'grid_anchor_generator': + grid_anchor_generator_config = anchor_generator_config.grid_anchor_generator + return grid_anchor_generator.GridAnchorGenerator( + scales=[float(scale) for scale in grid_anchor_generator_config.scales], + aspect_ratios=[float(aspect_ratio) + for aspect_ratio + in grid_anchor_generator_config.aspect_ratios], + base_anchor_size=[grid_anchor_generator_config.height, + grid_anchor_generator_config.width], + anchor_stride=[grid_anchor_generator_config.height_stride, + grid_anchor_generator_config.width_stride], + anchor_offset=[grid_anchor_generator_config.height_offset, + grid_anchor_generator_config.width_offset]) + elif anchor_generator_config.WhichOneof( + 'anchor_generator_oneof') == 'ssd_anchor_generator': + ssd_anchor_generator_config = anchor_generator_config.ssd_anchor_generator + return multiple_grid_anchor_generator.create_ssd_anchors( + num_layers=ssd_anchor_generator_config.num_layers, + min_scale=ssd_anchor_generator_config.min_scale, + max_scale=ssd_anchor_generator_config.max_scale, + aspect_ratios=ssd_anchor_generator_config.aspect_ratios, + reduce_boxes_in_lowest_layer=(ssd_anchor_generator_config + .reduce_boxes_in_lowest_layer)) + else: + raise ValueError('Empty anchor generator.') + diff --git a/object_detection/builders/anchor_generator_builder_test.py b/object_detection/builders/anchor_generator_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..657be18ef115ede611bddff0ce435d522488f757 --- /dev/null +++ b/object_detection/builders/anchor_generator_builder_test.py @@ -0,0 +1,194 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for anchor_generator_builder.""" + +import tensorflow as tf + +from google.protobuf import text_format +from object_detection.anchor_generators import grid_anchor_generator +from object_detection.anchor_generators import multiple_grid_anchor_generator +from object_detection.builders import anchor_generator_builder +from object_detection.protos import anchor_generator_pb2 + + +class AnchorGeneratorBuilderTest(tf.test.TestCase): + + def assert_almost_list_equal(self, expected_list, actual_list, delta=None): + self.assertEqual(len(expected_list), len(actual_list)) + for expected_item, actual_item in zip(expected_list, actual_list): + self.assertAlmostEqual(expected_item, actual_item, delta=delta) + + def test_build_grid_anchor_generator_with_defaults(self): + anchor_generator_text_proto = """ + grid_anchor_generator { + } + """ + anchor_generator_proto = anchor_generator_pb2.AnchorGenerator() + text_format.Merge(anchor_generator_text_proto, anchor_generator_proto) + anchor_generator_object = anchor_generator_builder.build( + anchor_generator_proto) + self.assertTrue(isinstance(anchor_generator_object, + grid_anchor_generator.GridAnchorGenerator)) + self.assertListEqual(anchor_generator_object._scales, []) + self.assertListEqual(anchor_generator_object._aspect_ratios, []) + with self.test_session() as sess: + base_anchor_size, anchor_offset, anchor_stride = sess.run( + [anchor_generator_object._base_anchor_size, + anchor_generator_object._anchor_offset, + anchor_generator_object._anchor_stride]) + self.assertAllEqual(anchor_offset, [0, 0]) + self.assertAllEqual(anchor_stride, [16, 16]) + self.assertAllEqual(base_anchor_size, [256, 256]) + + def test_build_grid_anchor_generator_with_non_default_parameters(self): + anchor_generator_text_proto = """ + grid_anchor_generator { + height: 128 + width: 512 + height_stride: 10 + width_stride: 20 + height_offset: 30 + width_offset: 40 + scales: [0.4, 2.2] + aspect_ratios: [0.3, 4.5] + } + """ + anchor_generator_proto = anchor_generator_pb2.AnchorGenerator() + text_format.Merge(anchor_generator_text_proto, anchor_generator_proto) + anchor_generator_object = anchor_generator_builder.build( + anchor_generator_proto) + self.assertTrue(isinstance(anchor_generator_object, + grid_anchor_generator.GridAnchorGenerator)) + self.assert_almost_list_equal(anchor_generator_object._scales, + [0.4, 2.2]) + self.assert_almost_list_equal(anchor_generator_object._aspect_ratios, + [0.3, 4.5]) + with self.test_session() as sess: + base_anchor_size, anchor_offset, anchor_stride = sess.run( + [anchor_generator_object._base_anchor_size, + anchor_generator_object._anchor_offset, + anchor_generator_object._anchor_stride]) + self.assertAllEqual(anchor_offset, [30, 40]) + self.assertAllEqual(anchor_stride, [10, 20]) + self.assertAllEqual(base_anchor_size, [128, 512]) + + def test_build_ssd_anchor_generator_with_defaults(self): + anchor_generator_text_proto = """ + ssd_anchor_generator { + aspect_ratios: [1.0] + } + """ + anchor_generator_proto = anchor_generator_pb2.AnchorGenerator() + text_format.Merge(anchor_generator_text_proto, anchor_generator_proto) + anchor_generator_object = anchor_generator_builder.build( + anchor_generator_proto) + self.assertTrue(isinstance(anchor_generator_object, + multiple_grid_anchor_generator. 
+ MultipleGridAnchorGenerator)) + for actual_scales, expected_scales in zip( + list(anchor_generator_object._scales), + [(0.1, 0.2, 0.2), + (0.35, 0.418), + (0.499, 0.570), + (0.649, 0.721), + (0.799, 0.871), + (0.949, 0.974)]): + self.assert_almost_list_equal(expected_scales, actual_scales, delta=1e-2) + for actual_aspect_ratio, expected_aspect_ratio in zip( + list(anchor_generator_object._aspect_ratios), + [(1.0, 2.0, 0.5)] + 5 * [(1.0, 1.0)]): + self.assert_almost_list_equal(expected_aspect_ratio, actual_aspect_ratio) + + with self.test_session() as sess: + base_anchor_size = sess.run(anchor_generator_object._base_anchor_size) + self.assertAllClose(base_anchor_size, [1.0, 1.0]) + + def test_build_ssd_anchor_generator_withoud_reduced_boxes(self): + anchor_generator_text_proto = """ + ssd_anchor_generator { + aspect_ratios: [1.0] + reduce_boxes_in_lowest_layer: false + } + """ + anchor_generator_proto = anchor_generator_pb2.AnchorGenerator() + text_format.Merge(anchor_generator_text_proto, anchor_generator_proto) + anchor_generator_object = anchor_generator_builder.build( + anchor_generator_proto) + self.assertTrue(isinstance(anchor_generator_object, + multiple_grid_anchor_generator. + MultipleGridAnchorGenerator)) + + for actual_scales, expected_scales in zip( + list(anchor_generator_object._scales), + [(0.2, 0.264), + (0.35, 0.418), + (0.499, 0.570), + (0.649, 0.721), + (0.799, 0.871), + (0.949, 0.974)]): + self.assert_almost_list_equal(expected_scales, actual_scales, delta=1e-2) + + for actual_aspect_ratio, expected_aspect_ratio in zip( + list(anchor_generator_object._aspect_ratios), + 6 * [(1.0, 1.0)]): + self.assert_almost_list_equal(expected_aspect_ratio, actual_aspect_ratio) + + with self.test_session() as sess: + base_anchor_size = sess.run(anchor_generator_object._base_anchor_size) + self.assertAllClose(base_anchor_size, [1.0, 1.0]) + + def test_build_ssd_anchor_generator_with_non_default_parameters(self): + anchor_generator_text_proto = """ + ssd_anchor_generator { + num_layers: 2 + min_scale: 0.3 + max_scale: 0.8 + aspect_ratios: [2.0] + } + """ + anchor_generator_proto = anchor_generator_pb2.AnchorGenerator() + text_format.Merge(anchor_generator_text_proto, anchor_generator_proto) + anchor_generator_object = anchor_generator_builder.build( + anchor_generator_proto) + self.assertTrue(isinstance(anchor_generator_object, + multiple_grid_anchor_generator. 
+ MultipleGridAnchorGenerator)) + + for actual_scales, expected_scales in zip( + list(anchor_generator_object._scales), + [(0.1, 0.3, 0.3), (0.8,)]): + self.assert_almost_list_equal(expected_scales, actual_scales, delta=1e-2) + + for actual_aspect_ratio, expected_aspect_ratio in zip( + list(anchor_generator_object._aspect_ratios), + [(1.0, 2.0, 0.5), (2.0,)]): + self.assert_almost_list_equal(expected_aspect_ratio, actual_aspect_ratio) + + with self.test_session() as sess: + base_anchor_size = sess.run(anchor_generator_object._base_anchor_size) + self.assertAllClose(base_anchor_size, [1.0, 1.0]) + + def test_raise_value_error_on_empty_anchor_genertor(self): + anchor_generator_text_proto = """ + """ + anchor_generator_proto = anchor_generator_pb2.AnchorGenerator() + text_format.Merge(anchor_generator_text_proto, anchor_generator_proto) + with self.assertRaises(ValueError): + anchor_generator_builder.build(anchor_generator_proto) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/builders/box_coder_builder.py b/object_detection/builders/box_coder_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..ff7ac01fe293016256b802209679d10a75b37a88 --- /dev/null +++ b/object_detection/builders/box_coder_builder.py @@ -0,0 +1,55 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""A function to build an object detection box coder from configuration.""" +from object_detection.box_coders import faster_rcnn_box_coder +from object_detection.box_coders import mean_stddev_box_coder +from object_detection.box_coders import square_box_coder +from object_detection.protos import box_coder_pb2 + + +def build(box_coder_config): + """Builds a box coder object based on the box coder config. + + Args: + box_coder_config: A box_coder.proto object containing the config for the + desired box coder. + + Returns: + BoxCoder based on the config. + + Raises: + ValueError: On empty box coder proto. 
+ """ + if not isinstance(box_coder_config, box_coder_pb2.BoxCoder): + raise ValueError('box_coder_config not of type box_coder_pb2.BoxCoder.') + + if box_coder_config.WhichOneof('box_coder_oneof') == 'faster_rcnn_box_coder': + return faster_rcnn_box_coder.FasterRcnnBoxCoder(scale_factors=[ + box_coder_config.faster_rcnn_box_coder.y_scale, + box_coder_config.faster_rcnn_box_coder.x_scale, + box_coder_config.faster_rcnn_box_coder.height_scale, + box_coder_config.faster_rcnn_box_coder.width_scale + ]) + if (box_coder_config.WhichOneof('box_coder_oneof') == + 'mean_stddev_box_coder'): + return mean_stddev_box_coder.MeanStddevBoxCoder() + if box_coder_config.WhichOneof('box_coder_oneof') == 'square_box_coder': + return square_box_coder.SquareBoxCoder(scale_factors=[ + box_coder_config.square_box_coder.y_scale, + box_coder_config.square_box_coder.x_scale, + box_coder_config.square_box_coder.length_scale + ]) + raise ValueError('Empty box coder.') diff --git a/object_detection/builders/box_coder_builder_test.py b/object_detection/builders/box_coder_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..b5adcad5149961a8a58c9429c3f7853981761bd7 --- /dev/null +++ b/object_detection/builders/box_coder_builder_test.py @@ -0,0 +1,107 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for box_coder_builder.""" + +import tensorflow as tf + +from google.protobuf import text_format +from object_detection.box_coders import faster_rcnn_box_coder +from object_detection.box_coders import mean_stddev_box_coder +from object_detection.box_coders import square_box_coder +from object_detection.builders import box_coder_builder +from object_detection.protos import box_coder_pb2 + + +class BoxCoderBuilderTest(tf.test.TestCase): + + def test_build_faster_rcnn_box_coder_with_defaults(self): + box_coder_text_proto = """ + faster_rcnn_box_coder { + } + """ + box_coder_proto = box_coder_pb2.BoxCoder() + text_format.Merge(box_coder_text_proto, box_coder_proto) + box_coder_object = box_coder_builder.build(box_coder_proto) + self.assertTrue(isinstance(box_coder_object, + faster_rcnn_box_coder.FasterRcnnBoxCoder)) + self.assertEqual(box_coder_object._scale_factors, [10.0, 10.0, 5.0, 5.0]) + + def test_build_faster_rcnn_box_coder_with_non_default_parameters(self): + box_coder_text_proto = """ + faster_rcnn_box_coder { + y_scale: 6.0 + x_scale: 3.0 + height_scale: 7.0 + width_scale: 8.0 + } + """ + box_coder_proto = box_coder_pb2.BoxCoder() + text_format.Merge(box_coder_text_proto, box_coder_proto) + box_coder_object = box_coder_builder.build(box_coder_proto) + self.assertTrue(isinstance(box_coder_object, + faster_rcnn_box_coder.FasterRcnnBoxCoder)) + self.assertEqual(box_coder_object._scale_factors, [6.0, 3.0, 7.0, 8.0]) + + def test_build_mean_stddev_box_coder(self): + box_coder_text_proto = """ + mean_stddev_box_coder { + } + """ + box_coder_proto = box_coder_pb2.BoxCoder() + text_format.Merge(box_coder_text_proto, box_coder_proto) + box_coder_object = box_coder_builder.build(box_coder_proto) + self.assertTrue( + isinstance(box_coder_object, + mean_stddev_box_coder.MeanStddevBoxCoder)) + + def test_build_square_box_coder_with_defaults(self): + box_coder_text_proto = """ + square_box_coder { + } + """ + box_coder_proto = box_coder_pb2.BoxCoder() + text_format.Merge(box_coder_text_proto, box_coder_proto) + box_coder_object = box_coder_builder.build(box_coder_proto) + self.assertTrue( + isinstance(box_coder_object, square_box_coder.SquareBoxCoder)) + self.assertEqual(box_coder_object._scale_factors, [10.0, 10.0, 5.0]) + + def test_build_square_box_coder_with_non_default_parameters(self): + box_coder_text_proto = """ + square_box_coder { + y_scale: 6.0 + x_scale: 3.0 + length_scale: 7.0 + } + """ + box_coder_proto = box_coder_pb2.BoxCoder() + text_format.Merge(box_coder_text_proto, box_coder_proto) + box_coder_object = box_coder_builder.build(box_coder_proto) + self.assertTrue( + isinstance(box_coder_object, square_box_coder.SquareBoxCoder)) + self.assertEqual(box_coder_object._scale_factors, [6.0, 3.0, 7.0]) + + def test_raise_error_on_empty_box_coder(self): + box_coder_text_proto = """ + """ + box_coder_proto = box_coder_pb2.BoxCoder() + text_format.Merge(box_coder_text_proto, box_coder_proto) + with self.assertRaises(ValueError): + box_coder_builder.build(box_coder_proto) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/builders/box_predictor_builder.py b/object_detection/builders/box_predictor_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..4f7c5045e4bbc28f1278a824984c29679b6e8bfc --- /dev/null +++ b/object_detection/builders/box_predictor_builder.py @@ -0,0 +1,106 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Function to build box predictor from configuration.""" + +from object_detection.core import box_predictor +from object_detection.protos import box_predictor_pb2 + + +def build(argscope_fn, box_predictor_config, is_training, num_classes): + """Builds box predictor based on the configuration. + + Builds box predictor based on the configuration. See box_predictor.proto for + configurable options. Also, see box_predictor.py for more details. + + Args: + argscope_fn: A function that takes the following inputs: + * hyperparams_pb2.Hyperparams proto + * a boolean indicating if the model is in training mode. + and returns a tf slim argscope for Conv and FC hyperparameters. + box_predictor_config: box_predictor_pb2.BoxPredictor proto containing + configuration. + is_training: Whether the models is in training mode. + num_classes: Number of classes to predict. + + Returns: + box_predictor: box_predictor.BoxPredictor object. + + Raises: + ValueError: On unknown box predictor. + """ + if not isinstance(box_predictor_config, box_predictor_pb2.BoxPredictor): + raise ValueError('box_predictor_config not of type ' + 'box_predictor_pb2.BoxPredictor.') + + box_predictor_oneof = box_predictor_config.WhichOneof('box_predictor_oneof') + + if box_predictor_oneof == 'convolutional_box_predictor': + conv_box_predictor = box_predictor_config.convolutional_box_predictor + conv_hyperparams = argscope_fn(conv_box_predictor.conv_hyperparams, + is_training) + box_predictor_object = box_predictor.ConvolutionalBoxPredictor( + is_training=is_training, + num_classes=num_classes, + conv_hyperparams=conv_hyperparams, + min_depth=conv_box_predictor.min_depth, + max_depth=conv_box_predictor.max_depth, + num_layers_before_predictor=(conv_box_predictor. 
+ num_layers_before_predictor), + use_dropout=conv_box_predictor.use_dropout, + dropout_keep_prob=conv_box_predictor.dropout_keep_probability, + kernel_size=conv_box_predictor.kernel_size, + box_code_size=conv_box_predictor.box_code_size, + apply_sigmoid_to_scores=conv_box_predictor.apply_sigmoid_to_scores) + return box_predictor_object + + if box_predictor_oneof == 'mask_rcnn_box_predictor': + mask_rcnn_box_predictor = box_predictor_config.mask_rcnn_box_predictor + fc_hyperparams = argscope_fn(mask_rcnn_box_predictor.fc_hyperparams, + is_training) + conv_hyperparams = None + if mask_rcnn_box_predictor.HasField('conv_hyperparams'): + conv_hyperparams = argscope_fn(mask_rcnn_box_predictor.conv_hyperparams, + is_training) + box_predictor_object = box_predictor.MaskRCNNBoxPredictor( + is_training=is_training, + num_classes=num_classes, + fc_hyperparams=fc_hyperparams, + use_dropout=mask_rcnn_box_predictor.use_dropout, + dropout_keep_prob=mask_rcnn_box_predictor.dropout_keep_probability, + box_code_size=mask_rcnn_box_predictor.box_code_size, + conv_hyperparams=conv_hyperparams, + predict_instance_masks=mask_rcnn_box_predictor.predict_instance_masks, + mask_prediction_conv_depth=(mask_rcnn_box_predictor. + mask_prediction_conv_depth), + predict_keypoints=mask_rcnn_box_predictor.predict_keypoints) + return box_predictor_object + + if box_predictor_oneof == 'rfcn_box_predictor': + rfcn_box_predictor = box_predictor_config.rfcn_box_predictor + conv_hyperparams = argscope_fn(rfcn_box_predictor.conv_hyperparams, + is_training) + box_predictor_object = box_predictor.RfcnBoxPredictor( + is_training=is_training, + num_classes=num_classes, + conv_hyperparams=conv_hyperparams, + crop_size=[rfcn_box_predictor.crop_height, + rfcn_box_predictor.crop_width], + num_spatial_bins=[rfcn_box_predictor.num_spatial_bins_height, + rfcn_box_predictor.num_spatial_bins_width], + depth=rfcn_box_predictor.depth, + box_code_size=rfcn_box_predictor.box_code_size) + return box_predictor_object + raise ValueError('Unknown box predictor: {}'.format(box_predictor_oneof)) diff --git a/object_detection/builders/box_predictor_builder_test.py b/object_detection/builders/box_predictor_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..3f6a574a2d44dd5a364a57b46a9a635c7b474479 --- /dev/null +++ b/object_detection/builders/box_predictor_builder_test.py @@ -0,0 +1,391 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
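Outside of tests, the argscope_fn argument is typically hyperparams_builder.build, as the default-predictor test below does. A hedged usage sketch follows; the regularizer and initializer values are made up for illustration:

```python
from google.protobuf import text_format

from object_detection.builders import box_predictor_builder
from object_detection.builders import hyperparams_builder
from object_detection.protos import box_predictor_pb2

# Illustrative config for a convolutional box predictor; conv_hyperparams is
# converted into a tf-slim arg_scope by hyperparams_builder.build.
config = box_predictor_pb2.BoxPredictor()
text_format.Merge("""
  convolutional_box_predictor {
    conv_hyperparams {
      regularizer { l2_regularizer { weight: 0.0004 } }
      initializer { truncated_normal_initializer { stddev: 0.03 } }
    }
  }
""", config)

predictor = box_predictor_builder.build(
    argscope_fn=hyperparams_builder.build,
    box_predictor_config=config,
    is_training=True,
    num_classes=90)
```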
+# ============================================================================== + +"""Tests for box_predictor_builder.""" +import mock +import tensorflow as tf + +from google.protobuf import text_format +from object_detection.builders import box_predictor_builder +from object_detection.builders import hyperparams_builder +from object_detection.protos import box_predictor_pb2 +from object_detection.protos import hyperparams_pb2 + + +class ConvolutionalBoxPredictorBuilderTest(tf.test.TestCase): + + def test_box_predictor_calls_conv_argscope_fn(self): + conv_hyperparams_text_proto = """ + regularizer { + l1_regularizer { + weight: 0.0003 + } + } + initializer { + truncated_normal_initializer { + mean: 0.0 + stddev: 0.3 + } + } + activation: RELU_6 + """ + hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, hyperparams_proto) + def mock_conv_argscope_builder(conv_hyperparams_arg, is_training): + return (conv_hyperparams_arg, is_training) + + box_predictor_proto = box_predictor_pb2.BoxPredictor() + box_predictor_proto.convolutional_box_predictor.conv_hyperparams.CopyFrom( + hyperparams_proto) + box_predictor = box_predictor_builder.build( + argscope_fn=mock_conv_argscope_builder, + box_predictor_config=box_predictor_proto, + is_training=False, + num_classes=10) + (conv_hyperparams_actual, is_training) = box_predictor._conv_hyperparams + self.assertAlmostEqual((hyperparams_proto.regularizer. + l1_regularizer.weight), + (conv_hyperparams_actual.regularizer.l1_regularizer. + weight)) + self.assertAlmostEqual((hyperparams_proto.initializer. + truncated_normal_initializer.stddev), + (conv_hyperparams_actual.initializer. + truncated_normal_initializer.stddev)) + self.assertAlmostEqual((hyperparams_proto.initializer. + truncated_normal_initializer.mean), + (conv_hyperparams_actual.initializer. 
+ truncated_normal_initializer.mean)) + self.assertEqual(hyperparams_proto.activation, + conv_hyperparams_actual.activation) + self.assertFalse(is_training) + + def test_construct_non_default_conv_box_predictor(self): + box_predictor_text_proto = """ + convolutional_box_predictor { + min_depth: 2 + max_depth: 16 + num_layers_before_predictor: 2 + use_dropout: false + dropout_keep_probability: 0.4 + kernel_size: 3 + box_code_size: 3 + apply_sigmoid_to_scores: true + } + """ + conv_hyperparams_text_proto = """ + regularizer { + l1_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + """ + hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, hyperparams_proto) + def mock_conv_argscope_builder(conv_hyperparams_arg, is_training): + return (conv_hyperparams_arg, is_training) + + box_predictor_proto = box_predictor_pb2.BoxPredictor() + text_format.Merge(box_predictor_text_proto, box_predictor_proto) + box_predictor_proto.convolutional_box_predictor.conv_hyperparams.CopyFrom( + hyperparams_proto) + box_predictor = box_predictor_builder.build( + argscope_fn=mock_conv_argscope_builder, + box_predictor_config=box_predictor_proto, + is_training=False, + num_classes=10) + self.assertEqual(box_predictor._min_depth, 2) + self.assertEqual(box_predictor._max_depth, 16) + self.assertEqual(box_predictor._num_layers_before_predictor, 2) + self.assertFalse(box_predictor._use_dropout) + self.assertAlmostEqual(box_predictor._dropout_keep_prob, 0.4) + self.assertTrue(box_predictor._apply_sigmoid_to_scores) + self.assertEqual(box_predictor.num_classes, 10) + self.assertFalse(box_predictor._is_training) + + def test_construct_default_conv_box_predictor(self): + box_predictor_text_proto = """ + convolutional_box_predictor { + conv_hyperparams { + regularizer { + l1_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + } + }""" + box_predictor_proto = box_predictor_pb2.BoxPredictor() + text_format.Merge(box_predictor_text_proto, box_predictor_proto) + box_predictor = box_predictor_builder.build( + argscope_fn=hyperparams_builder.build, + box_predictor_config=box_predictor_proto, + is_training=True, + num_classes=90) + self.assertEqual(box_predictor._min_depth, 0) + self.assertEqual(box_predictor._max_depth, 0) + self.assertEqual(box_predictor._num_layers_before_predictor, 0) + self.assertTrue(box_predictor._use_dropout) + self.assertAlmostEqual(box_predictor._dropout_keep_prob, 0.8) + self.assertFalse(box_predictor._apply_sigmoid_to_scores) + self.assertEqual(box_predictor.num_classes, 90) + self.assertTrue(box_predictor._is_training) + + +class MaskRCNNBoxPredictorBuilderTest(tf.test.TestCase): + + def test_box_predictor_builder_calls_fc_argscope_fn(self): + fc_hyperparams_text_proto = """ + regularizer { + l1_regularizer { + weight: 0.0003 + } + } + initializer { + truncated_normal_initializer { + mean: 0.0 + stddev: 0.3 + } + } + activation: RELU_6 + op: FC + """ + hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(fc_hyperparams_text_proto, hyperparams_proto) + box_predictor_proto = box_predictor_pb2.BoxPredictor() + box_predictor_proto.mask_rcnn_box_predictor.fc_hyperparams.CopyFrom( + hyperparams_proto) + mock_argscope_fn = mock.Mock(return_value='arg_scope') + box_predictor = box_predictor_builder.build( + argscope_fn=mock_argscope_fn, + box_predictor_config=box_predictor_proto, + is_training=False, + num_classes=10) + mock_argscope_fn.assert_called_with(hyperparams_proto, False) + 
self.assertEqual(box_predictor._fc_hyperparams, 'arg_scope') + + def test_non_default_mask_rcnn_box_predictor(self): + fc_hyperparams_text_proto = """ + regularizer { + l1_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + activation: RELU_6 + op: FC + """ + box_predictor_text_proto = """ + mask_rcnn_box_predictor { + use_dropout: true + dropout_keep_probability: 0.8 + box_code_size: 3 + } + """ + hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(fc_hyperparams_text_proto, hyperparams_proto) + def mock_fc_argscope_builder(fc_hyperparams_arg, is_training): + return (fc_hyperparams_arg, is_training) + + box_predictor_proto = box_predictor_pb2.BoxPredictor() + text_format.Merge(box_predictor_text_proto, box_predictor_proto) + box_predictor_proto.mask_rcnn_box_predictor.fc_hyperparams.CopyFrom( + hyperparams_proto) + box_predictor = box_predictor_builder.build( + argscope_fn=mock_fc_argscope_builder, + box_predictor_config=box_predictor_proto, + is_training=True, + num_classes=90) + self.assertTrue(box_predictor._use_dropout) + self.assertAlmostEqual(box_predictor._dropout_keep_prob, 0.8) + self.assertEqual(box_predictor.num_classes, 90) + self.assertTrue(box_predictor._is_training) + self.assertEqual(box_predictor._box_code_size, 3) + + def test_build_default_mask_rcnn_box_predictor(self): + box_predictor_proto = box_predictor_pb2.BoxPredictor() + box_predictor_proto.mask_rcnn_box_predictor.fc_hyperparams.op = ( + hyperparams_pb2.Hyperparams.FC) + box_predictor = box_predictor_builder.build( + argscope_fn=mock.Mock(return_value='arg_scope'), + box_predictor_config=box_predictor_proto, + is_training=True, + num_classes=90) + self.assertFalse(box_predictor._use_dropout) + self.assertAlmostEqual(box_predictor._dropout_keep_prob, 0.5) + self.assertEqual(box_predictor.num_classes, 90) + self.assertTrue(box_predictor._is_training) + self.assertEqual(box_predictor._box_code_size, 4) + self.assertFalse(box_predictor._predict_instance_masks) + self.assertFalse(box_predictor._predict_keypoints) + + def test_build_box_predictor_with_mask_branch(self): + box_predictor_proto = box_predictor_pb2.BoxPredictor() + box_predictor_proto.mask_rcnn_box_predictor.fc_hyperparams.op = ( + hyperparams_pb2.Hyperparams.FC) + box_predictor_proto.mask_rcnn_box_predictor.conv_hyperparams.op = ( + hyperparams_pb2.Hyperparams.CONV) + box_predictor_proto.mask_rcnn_box_predictor.predict_instance_masks = True + box_predictor_proto.mask_rcnn_box_predictor.mask_prediction_conv_depth = 512 + mock_argscope_fn = mock.Mock(return_value='arg_scope') + box_predictor = box_predictor_builder.build( + argscope_fn=mock_argscope_fn, + box_predictor_config=box_predictor_proto, + is_training=True, + num_classes=90) + mock_argscope_fn.assert_has_calls( + [mock.call(box_predictor_proto.mask_rcnn_box_predictor.fc_hyperparams, + True), + mock.call(box_predictor_proto.mask_rcnn_box_predictor.conv_hyperparams, + True)], any_order=True) + self.assertFalse(box_predictor._use_dropout) + self.assertAlmostEqual(box_predictor._dropout_keep_prob, 0.5) + self.assertEqual(box_predictor.num_classes, 90) + self.assertTrue(box_predictor._is_training) + self.assertEqual(box_predictor._box_code_size, 4) + self.assertTrue(box_predictor._predict_instance_masks) + self.assertEqual(box_predictor._mask_prediction_conv_depth, 512) + self.assertFalse(box_predictor._predict_keypoints) + + +class RfcnBoxPredictorBuilderTest(tf.test.TestCase): + + def test_box_predictor_calls_fc_argscope_fn(self): + 
conv_hyperparams_text_proto = """ + regularizer { + l1_regularizer { + weight: 0.0003 + } + } + initializer { + truncated_normal_initializer { + mean: 0.0 + stddev: 0.3 + } + } + activation: RELU_6 + """ + hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, hyperparams_proto) + def mock_conv_argscope_builder(conv_hyperparams_arg, is_training): + return (conv_hyperparams_arg, is_training) + + box_predictor_proto = box_predictor_pb2.BoxPredictor() + box_predictor_proto.rfcn_box_predictor.conv_hyperparams.CopyFrom( + hyperparams_proto) + box_predictor = box_predictor_builder.build( + argscope_fn=mock_conv_argscope_builder, + box_predictor_config=box_predictor_proto, + is_training=False, + num_classes=10) + (conv_hyperparams_actual, is_training) = box_predictor._conv_hyperparams + self.assertAlmostEqual((hyperparams_proto.regularizer. + l1_regularizer.weight), + (conv_hyperparams_actual.regularizer.l1_regularizer. + weight)) + self.assertAlmostEqual((hyperparams_proto.initializer. + truncated_normal_initializer.stddev), + (conv_hyperparams_actual.initializer. + truncated_normal_initializer.stddev)) + self.assertAlmostEqual((hyperparams_proto.initializer. + truncated_normal_initializer.mean), + (conv_hyperparams_actual.initializer. + truncated_normal_initializer.mean)) + self.assertEqual(hyperparams_proto.activation, + conv_hyperparams_actual.activation) + self.assertFalse(is_training) + + def test_non_default_rfcn_box_predictor(self): + conv_hyperparams_text_proto = """ + regularizer { + l1_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + activation: RELU_6 + """ + box_predictor_text_proto = """ + rfcn_box_predictor { + num_spatial_bins_height: 4 + num_spatial_bins_width: 4 + depth: 4 + box_code_size: 3 + crop_height: 16 + crop_width: 16 + } + """ + hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, hyperparams_proto) + def mock_conv_argscope_builder(conv_hyperparams_arg, is_training): + return (conv_hyperparams_arg, is_training) + + box_predictor_proto = box_predictor_pb2.BoxPredictor() + text_format.Merge(box_predictor_text_proto, box_predictor_proto) + box_predictor_proto.rfcn_box_predictor.conv_hyperparams.CopyFrom( + hyperparams_proto) + box_predictor = box_predictor_builder.build( + argscope_fn=mock_conv_argscope_builder, + box_predictor_config=box_predictor_proto, + is_training=True, + num_classes=90) + self.assertEqual(box_predictor.num_classes, 90) + self.assertTrue(box_predictor._is_training) + self.assertEqual(box_predictor._box_code_size, 3) + self.assertEqual(box_predictor._num_spatial_bins, [4, 4]) + self.assertEqual(box_predictor._crop_size, [16, 16]) + + def test_default_rfcn_box_predictor(self): + conv_hyperparams_text_proto = """ + regularizer { + l1_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + activation: RELU_6 + """ + hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, hyperparams_proto) + def mock_conv_argscope_builder(conv_hyperparams_arg, is_training): + return (conv_hyperparams_arg, is_training) + + box_predictor_proto = box_predictor_pb2.BoxPredictor() + box_predictor_proto.rfcn_box_predictor.conv_hyperparams.CopyFrom( + hyperparams_proto) + box_predictor = box_predictor_builder.build( + argscope_fn=mock_conv_argscope_builder, + box_predictor_config=box_predictor_proto, + is_training=True, + num_classes=90) + self.assertEqual(box_predictor.num_classes, 90) + 
self.assertTrue(box_predictor._is_training) + self.assertEqual(box_predictor._box_code_size, 4) + self.assertEqual(box_predictor._num_spatial_bins, [3, 3]) + self.assertEqual(box_predictor._crop_size, [12, 12]) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/builders/hyperparams_builder.py b/object_detection/builders/hyperparams_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..c8c18e39cde892fa70b4543b81c4ee286d893ac3 --- /dev/null +++ b/object_detection/builders/hyperparams_builder.py @@ -0,0 +1,169 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Builder function to construct tf-slim arg_scope for convolution, fc ops.""" +import tensorflow as tf + +from object_detection.protos import hyperparams_pb2 + +slim = tf.contrib.slim + + +def build(hyperparams_config, is_training): + """Builds tf-slim arg_scope for convolution ops based on the config. + + Returns an arg_scope to use for convolution ops containing weights + initializer, weights regularizer, activation function, batch norm function + and batch norm parameters based on the configuration. + + Note that if the batch_norm parameteres are not specified in the config + (i.e. left to default) then batch norm is excluded from the arg_scope. + + The batch norm parameters are set for updates based on `is_training` argument + and conv_hyperparams_config.batch_norm.train parameter. During training, they + are updated only if batch_norm.train parameter is true. However, during eval, + no updates are made to the batch norm variables. In both cases, their current + values are used during forward pass. + + Args: + hyperparams_config: hyperparams.proto object containing + hyperparameters. + is_training: Whether the network is in training mode. + + Returns: + arg_scope: tf-slim arg_scope containing hyperparameters for ops. + + Raises: + ValueError: if hyperparams_config is not of type hyperparams.Hyperparams. 
+ """ + if not isinstance(hyperparams_config, + hyperparams_pb2.Hyperparams): + raise ValueError('hyperparams_config not of type ' + 'hyperparams_pb.Hyperparams.') + + batch_norm = None + batch_norm_params = None + if hyperparams_config.HasField('batch_norm'): + batch_norm = slim.batch_norm + batch_norm_params = _build_batch_norm_params( + hyperparams_config.batch_norm, is_training) + + affected_ops = [slim.conv2d, slim.separable_conv2d, slim.conv2d_transpose] + if hyperparams_config.HasField('op') and ( + hyperparams_config.op == hyperparams_pb2.Hyperparams.FC): + affected_ops = [slim.fully_connected] + with slim.arg_scope( + affected_ops, + weights_regularizer=_build_regularizer( + hyperparams_config.regularizer), + weights_initializer=_build_initializer( + hyperparams_config.initializer), + activation_fn=_build_activation_fn(hyperparams_config.activation), + normalizer_fn=batch_norm, + normalizer_params=batch_norm_params) as sc: + return sc + + +def _build_activation_fn(activation_fn): + """Builds a callable activation from config. + + Args: + activation_fn: hyperparams_pb2.Hyperparams.activation + + Returns: + Callable activation function. + + Raises: + ValueError: On unknown activation function. + """ + if activation_fn == hyperparams_pb2.Hyperparams.NONE: + return None + if activation_fn == hyperparams_pb2.Hyperparams.RELU: + return tf.nn.relu + if activation_fn == hyperparams_pb2.Hyperparams.RELU_6: + return tf.nn.relu6 + raise ValueError('Unknown activation function: {}'.format(activation_fn)) + + +def _build_regularizer(regularizer): + """Builds a tf-slim regularizer from config. + + Args: + regularizer: hyperparams_pb2.Hyperparams.regularizer proto. + + Returns: + tf-slim regularizer. + + Raises: + ValueError: On unknown regularizer. + """ + regularizer_oneof = regularizer.WhichOneof('regularizer_oneof') + if regularizer_oneof == 'l1_regularizer': + return slim.l1_regularizer(scale=float(regularizer.l1_regularizer.weight)) + if regularizer_oneof == 'l2_regularizer': + return slim.l2_regularizer(scale=float(regularizer.l2_regularizer.weight)) + raise ValueError('Unknown regularizer function: {}'.format(regularizer_oneof)) + + +def _build_initializer(initializer): + """Build a tf initializer from config. + + Args: + initializer: hyperparams_pb2.Hyperparams.regularizer proto. + + Returns: + tf initializer. + + Raises: + ValueError: On unknown initializer. + """ + initializer_oneof = initializer.WhichOneof('initializer_oneof') + if initializer_oneof == 'truncated_normal_initializer': + return tf.truncated_normal_initializer( + mean=initializer.truncated_normal_initializer.mean, + stddev=initializer.truncated_normal_initializer.stddev) + if initializer_oneof == 'variance_scaling_initializer': + enum_descriptor = (hyperparams_pb2.VarianceScalingInitializer. + DESCRIPTOR.enum_types_by_name['Mode']) + mode = enum_descriptor.values_by_number[initializer. + variance_scaling_initializer. + mode].name + return slim.variance_scaling_initializer( + factor=initializer.variance_scaling_initializer.factor, + mode=mode, + uniform=initializer.variance_scaling_initializer.uniform) + raise ValueError('Unknown initializer function: {}'.format( + initializer_oneof)) + + +def _build_batch_norm_params(batch_norm, is_training): + """Build a dictionary of batch_norm params from config. + + Args: + batch_norm: hyperparams_pb2.ConvHyperparams.batch_norm proto. + is_training: Whether the models is in training mode. + + Returns: + A dictionary containing batch_norm parameters. 
+ """ + batch_norm_params = { + 'decay': batch_norm.decay, + 'center': batch_norm.center, + 'scale': batch_norm.scale, + 'epsilon': batch_norm.epsilon, + 'fused': True, + 'is_training': is_training and batch_norm.train, + } + return batch_norm_params diff --git a/object_detection/builders/hyperparams_builder_test.py b/object_detection/builders/hyperparams_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..7b0572a03f879b996d466d2c06355aeaf192c4ac --- /dev/null +++ b/object_detection/builders/hyperparams_builder_test.py @@ -0,0 +1,450 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests object_detection.core.hyperparams_builder.""" + +import numpy as np +import tensorflow as tf + +from google.protobuf import text_format + +# TODO: Rewrite third_party imports. +from object_detection.builders import hyperparams_builder +from object_detection.protos import hyperparams_pb2 + +slim = tf.contrib.slim + + +class HyperparamsBuilderTest(tf.test.TestCase): + + # TODO: Make this a public api in slim arg_scope.py. + def _get_scope_key(self, op): + return getattr(op, '_key_op', str(op)) + + def test_default_arg_scope_has_conv2d_op(self): + conv_hyperparams_text_proto = """ + regularizer { + l1_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + self.assertTrue(self._get_scope_key(slim.conv2d) in scope) + + def test_default_arg_scope_has_separable_conv2d_op(self): + conv_hyperparams_text_proto = """ + regularizer { + l1_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + self.assertTrue(self._get_scope_key(slim.separable_conv2d) in scope) + + def test_default_arg_scope_has_conv2d_transpose_op(self): + conv_hyperparams_text_proto = """ + regularizer { + l1_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + self.assertTrue(self._get_scope_key(slim.conv2d_transpose) in scope) + + def test_explicit_fc_op_arg_scope_has_fully_connected_op(self): + conv_hyperparams_text_proto = """ + op: FC + regularizer { + l1_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, 
conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + self.assertTrue(self._get_scope_key(slim.fully_connected) in scope) + + def test_separable_conv2d_and_conv2d_and_transpose_have_same_parameters(self): + conv_hyperparams_text_proto = """ + regularizer { + l1_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + kwargs_1, kwargs_2, kwargs_3 = scope.values() + self.assertDictEqual(kwargs_1, kwargs_2) + self.assertDictEqual(kwargs_1, kwargs_3) + + def test_return_l1_regularized_weights(self): + conv_hyperparams_text_proto = """ + regularizer { + l1_regularizer { + weight: 0.5 + } + } + initializer { + truncated_normal_initializer { + } + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + conv_scope_arguments = scope.values()[0] + regularizer = conv_scope_arguments['weights_regularizer'] + weights = np.array([1., -1, 4., 2.]) + with self.test_session() as sess: + result = sess.run(regularizer(tf.constant(weights))) + self.assertAllClose(np.abs(weights).sum() * 0.5, result) + + def test_return_l2_regularizer_weights(self): + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + weight: 0.42 + } + } + initializer { + truncated_normal_initializer { + } + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + conv_scope_arguments = scope.values()[0] + + regularizer = conv_scope_arguments['weights_regularizer'] + weights = np.array([1., -1, 4., 2.]) + with self.test_session() as sess: + result = sess.run(regularizer(tf.constant(weights))) + self.assertAllClose(np.power(weights, 2).sum() / 2.0 * 0.42, result) + + def test_return_non_default_batch_norm_params_with_train_during_train(self): + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + batch_norm { + decay: 0.7 + center: false + scale: true + epsilon: 0.03 + train: true + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + conv_scope_arguments = scope.values()[0] + self.assertEqual(conv_scope_arguments['normalizer_fn'], slim.batch_norm) + batch_norm_params = conv_scope_arguments['normalizer_params'] + self.assertAlmostEqual(batch_norm_params['decay'], 0.7) + self.assertAlmostEqual(batch_norm_params['epsilon'], 0.03) + self.assertFalse(batch_norm_params['center']) + self.assertTrue(batch_norm_params['scale']) + self.assertTrue(batch_norm_params['is_training']) + + def test_return_batch_norm_params_with_notrain_during_eval(self): + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + batch_norm { + decay: 0.7 + center: false + scale: true + epsilon: 0.03 + train: true + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, 
conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=False) + conv_scope_arguments = scope.values()[0] + self.assertEqual(conv_scope_arguments['normalizer_fn'], slim.batch_norm) + batch_norm_params = conv_scope_arguments['normalizer_params'] + self.assertAlmostEqual(batch_norm_params['decay'], 0.7) + self.assertAlmostEqual(batch_norm_params['epsilon'], 0.03) + self.assertFalse(batch_norm_params['center']) + self.assertTrue(batch_norm_params['scale']) + self.assertFalse(batch_norm_params['is_training']) + + def test_return_batch_norm_params_with_notrain_when_train_is_false(self): + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + batch_norm { + decay: 0.7 + center: false + scale: true + epsilon: 0.03 + train: false + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + conv_scope_arguments = scope.values()[0] + self.assertEqual(conv_scope_arguments['normalizer_fn'], slim.batch_norm) + batch_norm_params = conv_scope_arguments['normalizer_params'] + self.assertAlmostEqual(batch_norm_params['decay'], 0.7) + self.assertAlmostEqual(batch_norm_params['epsilon'], 0.03) + self.assertFalse(batch_norm_params['center']) + self.assertTrue(batch_norm_params['scale']) + self.assertFalse(batch_norm_params['is_training']) + + def test_do_not_use_batch_norm_if_default(self): + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + conv_scope_arguments = scope.values()[0] + self.assertEqual(conv_scope_arguments['normalizer_fn'], None) + self.assertEqual(conv_scope_arguments['normalizer_params'], None) + + def test_use_none_activation(self): + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + activation: NONE + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + conv_scope_arguments = scope.values()[0] + self.assertEqual(conv_scope_arguments['activation_fn'], None) + + def test_use_relu_activation(self): + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + activation: RELU + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + conv_scope_arguments = scope.values()[0] + self.assertEqual(conv_scope_arguments['activation_fn'], tf.nn.relu) + + def test_use_relu_6_activation(self): + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + activation: RELU_6 + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + 
conv_scope_arguments = scope.values()[0] + self.assertEqual(conv_scope_arguments['activation_fn'], tf.nn.relu6) + + def _assert_variance_in_range(self, initializer, shape, variance, + tol=1e-2): + with tf.Graph().as_default() as g: + with self.test_session(graph=g) as sess: + var = tf.get_variable( + name='test', + shape=shape, + dtype=tf.float32, + initializer=initializer) + sess.run(tf.global_variables_initializer()) + values = sess.run(var) + self.assertAllClose(np.var(values), variance, tol, tol) + + def test_variance_in_range_with_variance_scaling_initializer_fan_in(self): + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + } + } + initializer { + variance_scaling_initializer { + factor: 2.0 + mode: FAN_IN + uniform: false + } + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + conv_scope_arguments = scope.values()[0] + initializer = conv_scope_arguments['weights_initializer'] + self._assert_variance_in_range(initializer, shape=[100, 40], + variance=2. / 100.) + + def test_variance_in_range_with_variance_scaling_initializer_fan_out(self): + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + } + } + initializer { + variance_scaling_initializer { + factor: 2.0 + mode: FAN_OUT + uniform: false + } + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + conv_scope_arguments = scope.values()[0] + initializer = conv_scope_arguments['weights_initializer'] + self._assert_variance_in_range(initializer, shape=[100, 40], + variance=2. / 40.) + + def test_variance_in_range_with_variance_scaling_initializer_fan_avg(self): + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + } + } + initializer { + variance_scaling_initializer { + factor: 2.0 + mode: FAN_AVG + uniform: false + } + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + conv_scope_arguments = scope.values()[0] + initializer = conv_scope_arguments['weights_initializer'] + self._assert_variance_in_range(initializer, shape=[100, 40], + variance=4. / (100. + 40.)) + + def test_variance_in_range_with_variance_scaling_initializer_uniform(self): + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + } + } + initializer { + variance_scaling_initializer { + factor: 2.0 + mode: FAN_IN + uniform: true + } + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + conv_scope_arguments = scope.values()[0] + initializer = conv_scope_arguments['weights_initializer'] + self._assert_variance_in_range(initializer, shape=[100, 40], + variance=2. / 100.) 
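The tests above inspect the built arg_scope by key; for orientation, here is a minimal usage sketch, not part of the patch itself, showing the intended flow from a text-format Hyperparams proto to a slim op. It assumes the TF 1.x / tf.contrib.slim environment the rest of this diff targets, and the layer name and hyperparameter values are illustrative only.

```
import tensorflow as tf
from google.protobuf import text_format

from object_detection.builders import hyperparams_builder
from object_detection.protos import hyperparams_pb2

slim = tf.contrib.slim

conv_hyperparams_text_proto = """
  regularizer { l2_regularizer { weight: 0.0004 } }
  initializer { truncated_normal_initializer { stddev: 0.03 } }
  activation: RELU_6
  batch_norm { decay: 0.997 epsilon: 0.001 train: true }
"""
conv_hyperparams_proto = hyperparams_pb2.Hyperparams()
text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto)

# The returned scope bundles weights_initializer, weights_regularizer,
# activation_fn and batch norm settings for conv2d, separable_conv2d and
# conv2d_transpose ops.
scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True)

images = tf.placeholder(tf.float32, shape=[None, 300, 300, 3])
with slim.arg_scope(scope):
  net = slim.conv2d(images, 32, [3, 3], scope='example_conv')
```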
+ + def test_variance_in_range_with_truncated_normal_initializer(self): + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + mean: 0.0 + stddev: 0.8 + } + } + """ + conv_hyperparams_proto = hyperparams_pb2.Hyperparams() + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams_proto) + scope = hyperparams_builder.build(conv_hyperparams_proto, is_training=True) + conv_scope_arguments = scope.values()[0] + initializer = conv_scope_arguments['weights_initializer'] + self._assert_variance_in_range(initializer, shape=[100, 40], + variance=0.49, tol=1e-1) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/builders/image_resizer_builder.py b/object_detection/builders/image_resizer_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..542e2de031a6f2c5cb97d06176c3fc3738ce202e --- /dev/null +++ b/object_detection/builders/image_resizer_builder.py @@ -0,0 +1,62 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Builder function for image resizing operations.""" +import functools + +from object_detection.core import preprocessor +from object_detection.protos import image_resizer_pb2 + + +def build(image_resizer_config): + """Builds callable for image resizing operations. + + Args: + image_resizer_config: image_resizer.proto object containing parameters for + an image resizing operation. + + Returns: + image_resizer_fn: Callable for image resizing. This callable always takes + a rank-3 image tensor (corresponding to a single image) and returns a + rank-3 image tensor, possibly with new spatial dimensions. + + Raises: + ValueError: if `image_resizer_config` is of incorrect type. + ValueError: if `image_resizer_config.image_resizer_oneof` is of expected + type. + ValueError: if min_dimension > max_dimension when keep_aspect_ratio_resizer + is used. 
+ """ + if not isinstance(image_resizer_config, image_resizer_pb2.ImageResizer): + raise ValueError('image_resizer_config not of type ' + 'image_resizer_pb2.ImageResizer.') + + if image_resizer_config.WhichOneof( + 'image_resizer_oneof') == 'keep_aspect_ratio_resizer': + keep_aspect_ratio_config = image_resizer_config.keep_aspect_ratio_resizer + if not (keep_aspect_ratio_config.min_dimension + <= keep_aspect_ratio_config.max_dimension): + raise ValueError('min_dimension > max_dimension') + return functools.partial( + preprocessor.resize_to_range, + min_dimension=keep_aspect_ratio_config.min_dimension, + max_dimension=keep_aspect_ratio_config.max_dimension) + if image_resizer_config.WhichOneof( + 'image_resizer_oneof') == 'fixed_shape_resizer': + fixed_shape_resizer_config = image_resizer_config.fixed_shape_resizer + return functools.partial(preprocessor.resize_image, + new_height=fixed_shape_resizer_config.height, + new_width=fixed_shape_resizer_config.width) + raise ValueError('Invalid image resizer option.') diff --git a/object_detection/builders/image_resizer_builder_test.py b/object_detection/builders/image_resizer_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..79c6287d4840f7d2e1e64ac414cee7868c91d197 --- /dev/null +++ b/object_detection/builders/image_resizer_builder_test.py @@ -0,0 +1,70 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
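As a usage sketch, not part of the patch itself, the callable produced by image_resizer_builder.build can be applied directly to a single rank-3 image tensor. The dimension values below are illustrative rather than defaults.

```
import tensorflow as tf
from google.protobuf import text_format

from object_detection.builders import image_resizer_builder
from object_detection.protos import image_resizer_pb2

image_resizer_config = image_resizer_pb2.ImageResizer()
text_format.Merge("""
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
""", image_resizer_config)

image_resizer_fn = image_resizer_builder.build(image_resizer_config)

# Scales the image while preserving aspect ratio so the short side reaches
# 600, unless that would push the long side past 1024.
image = tf.placeholder(tf.float32, shape=[None, None, 3])
resized_image = image_resizer_fn(image)
```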
+# ============================================================================== + +"""Tests for object_detection.builders.image_resizer_builder.""" +import tensorflow as tf +from google.protobuf import text_format +from object_detection.builders import image_resizer_builder +from object_detection.protos import image_resizer_pb2 + + +class ImageResizerBuilderTest(tf.test.TestCase): + + def _shape_of_resized_random_image_given_text_proto( + self, input_shape, text_proto): + image_resizer_config = image_resizer_pb2.ImageResizer() + text_format.Merge(text_proto, image_resizer_config) + image_resizer_fn = image_resizer_builder.build(image_resizer_config) + images = tf.to_float(tf.random_uniform( + input_shape, minval=0, maxval=255, dtype=tf.int32)) + resized_images = image_resizer_fn(images) + with self.test_session() as sess: + return sess.run(resized_images).shape + + def test_built_keep_aspect_ratio_resizer_returns_expected_shape(self): + image_resizer_text_proto = """ + keep_aspect_ratio_resizer { + min_dimension: 10 + max_dimension: 20 + } + """ + input_shape = (50, 25, 3) + expected_output_shape = (20, 10, 3) + output_shape = self._shape_of_resized_random_image_given_text_proto( + input_shape, image_resizer_text_proto) + self.assertEqual(output_shape, expected_output_shape) + + def test_built_fixed_shape_resizer_returns_expected_shape(self): + image_resizer_text_proto = """ + fixed_shape_resizer { + height: 10 + width: 20 + } + """ + input_shape = (50, 25, 3) + expected_output_shape = (10, 20, 3) + output_shape = self._shape_of_resized_random_image_given_text_proto( + input_shape, image_resizer_text_proto) + self.assertEqual(output_shape, expected_output_shape) + + def test_raises_error_on_invalid_input(self): + invalid_input = 'invalid_input' + with self.assertRaises(ValueError): + image_resizer_builder.build(invalid_input) + + +if __name__ == '__main__': + tf.test.main() + diff --git a/object_detection/builders/input_reader_builder.py b/object_detection/builders/input_reader_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..5baa08137244d3cc734008b99623a5cfe9803c04 --- /dev/null +++ b/object_detection/builders/input_reader_builder.py @@ -0,0 +1,65 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Input reader builder. + +Creates data sources for DetectionModels from an InputReader config. See +input_reader.proto for options. + +Note: If users wishes to also use their own InputReaders with the Object +Detection configuration framework, they should define their own builder function +that wraps the build function. +""" + +import tensorflow as tf + +from object_detection.data_decoders import tf_example_decoder +from object_detection.protos import input_reader_pb2 + +parallel_reader = tf.contrib.slim.parallel_reader + + +def build(input_reader_config): + """Builds a tensor dictionary based on the InputReader config. 
+ + Args: + input_reader_config: A input_reader_pb2.InputReader object. + + Returns: + A tensor dict based on the input_reader_config. + + Raises: + ValueError: On invalid input reader proto. + """ + if not isinstance(input_reader_config, input_reader_pb2.InputReader): + raise ValueError('input_reader_config not of type ' + 'input_reader_pb2.InputReader.') + + if input_reader_config.WhichOneof('input_reader') == 'tf_record_input_reader': + config = input_reader_config.tf_record_input_reader + _, string_tensor = parallel_reader.parallel_read( + config.input_path, + reader_class=tf.TFRecordReader, + num_epochs=(input_reader_config.num_epochs + if input_reader_config.num_epochs else None), + num_readers=input_reader_config.num_readers, + shuffle=input_reader_config.shuffle, + dtypes=[tf.string, tf.string], + capacity=input_reader_config.queue_capacity, + min_after_dequeue=input_reader_config.min_after_dequeue) + + return tf_example_decoder.TfExampleDecoder().decode(string_tensor) + + raise ValueError('Unsupported input_reader_config.') diff --git a/object_detection/builders/input_reader_builder_test.py b/object_detection/builders/input_reader_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..05b8a95e50d91eb76b2627f3676114a4cc4cf1ee --- /dev/null +++ b/object_detection/builders/input_reader_builder_test.py @@ -0,0 +1,92 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
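A usage sketch, not part of the patch itself, mirroring the test that follows: decode a tensor dictionary from a TFRecord file referenced by a text-format InputReader config. The record path and log directory are placeholders.

```
import tensorflow as tf
from google.protobuf import text_format

from object_detection.builders import input_reader_builder
from object_detection.core import standard_fields as fields
from object_detection.protos import input_reader_pb2

input_reader_proto = input_reader_pb2.InputReader()
text_format.Merge("""
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: '/tmp/train.record'
  }
""", input_reader_proto)

# A dict of tensors keyed by standard_fields.InputDataFields entries.
tensor_dict = input_reader_builder.build(input_reader_proto)

sv = tf.train.Supervisor(logdir='/tmp/example_logdir')
with sv.prepare_or_wait_for_session() as sess:
  sv.start_queue_runners(sess)
  example = sess.run(tensor_dict)
  print(example[fields.InputDataFields.image].shape)
```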
+# ============================================================================== + +"""Tests for input_reader_builder.""" + +import os +import numpy as np +import tensorflow as tf + +from google.protobuf import text_format + +from tensorflow.core.example import example_pb2 +from tensorflow.core.example import feature_pb2 +from object_detection.builders import input_reader_builder +from object_detection.core import standard_fields as fields +from object_detection.protos import input_reader_pb2 + + +class InputReaderBuilderTest(tf.test.TestCase): + + def create_tf_record(self): + path = os.path.join(self.get_temp_dir(), 'tfrecord') + writer = tf.python_io.TFRecordWriter(path) + + image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8) + with self.test_session(): + encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).eval() + example = example_pb2.Example(features=feature_pb2.Features(feature={ + 'image/encoded': feature_pb2.Feature( + bytes_list=feature_pb2.BytesList(value=[encoded_jpeg])), + 'image/format': feature_pb2.Feature( + bytes_list=feature_pb2.BytesList(value=['jpeg'.encode('utf-8')])), + 'image/object/bbox/xmin': feature_pb2.Feature( + float_list=feature_pb2.FloatList(value=[0.0])), + 'image/object/bbox/xmax': feature_pb2.Feature( + float_list=feature_pb2.FloatList(value=[1.0])), + 'image/object/bbox/ymin': feature_pb2.Feature( + float_list=feature_pb2.FloatList(value=[0.0])), + 'image/object/bbox/ymax': feature_pb2.Feature( + float_list=feature_pb2.FloatList(value=[1.0])), + 'image/object/class/label': feature_pb2.Feature( + int64_list=feature_pb2.Int64List(value=[2])), + })) + writer.write(example.SerializeToString()) + writer.close() + + return path + + def test_build_tf_record_input_reader(self): + tf_record_path = self.create_tf_record() + + input_reader_text_proto = """ + shuffle: false + num_readers: 1 + tf_record_input_reader {{ + input_path: '{0}' + }} + """.format(tf_record_path) + input_reader_proto = input_reader_pb2.InputReader() + text_format.Merge(input_reader_text_proto, input_reader_proto) + tensor_dict = input_reader_builder.build(input_reader_proto) + + sv = tf.train.Supervisor(logdir=self.get_temp_dir()) + with sv.prepare_or_wait_for_session() as sess: + sv.start_queue_runners(sess) + output_dict = sess.run(tensor_dict) + + self.assertEquals( + (4, 5, 3), output_dict[fields.InputDataFields.image].shape) + self.assertEquals( + [2], output_dict[fields.InputDataFields.groundtruth_classes]) + self.assertEquals( + (1, 4), output_dict[fields.InputDataFields.groundtruth_boxes].shape) + self.assertAllEqual( + [0.0, 0.0, 1.0, 1.0], + output_dict[fields.InputDataFields.groundtruth_boxes][0]) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/builders/losses_builder.py b/object_detection/builders/losses_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..7163e4877a598247ca0b5ed275cb6c7289a981fc --- /dev/null +++ b/object_detection/builders/losses_builder.py @@ -0,0 +1,161 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""A function to build localization and classification losses from config.""" + +from object_detection.core import losses +from object_detection.protos import losses_pb2 + + +def build(loss_config): + """Build losses based on the config. + + Builds classification, localization losses and optionally a hard example miner + based on the config. + + Args: + loss_config: A losses_pb2.Loss object. + + Returns: + classification_loss: Classification loss object. + localization_loss: Localization loss object. + classification_weight: Classification loss weight. + localization_weight: Localization loss weight. + hard_example_miner: Hard example miner object. + """ + classification_loss = _build_classification_loss( + loss_config.classification_loss) + localization_loss = _build_localization_loss( + loss_config.localization_loss) + classification_weight = loss_config.classification_weight + localization_weight = loss_config.localization_weight + hard_example_miner = None + if loss_config.HasField('hard_example_miner'): + hard_example_miner = build_hard_example_miner( + loss_config.hard_example_miner, + classification_weight, + localization_weight) + return (classification_loss, localization_loss, + classification_weight, + localization_weight, hard_example_miner) + + +def build_hard_example_miner(config, + classification_weight, + localization_weight): + """Builds hard example miner based on the config. + + Args: + config: A losses_pb2.HardExampleMiner object. + classification_weight: Classification loss weight. + localization_weight: Localization loss weight. + + Returns: + Hard example miner. + + """ + loss_type = None + if config.loss_type == losses_pb2.HardExampleMiner.BOTH: + loss_type = 'both' + if config.loss_type == losses_pb2.HardExampleMiner.CLASSIFICATION: + loss_type = 'cls' + if config.loss_type == losses_pb2.HardExampleMiner.LOCALIZATION: + loss_type = 'loc' + + max_negatives_per_positive = None + num_hard_examples = None + if config.max_negatives_per_positive > 0: + max_negatives_per_positive = config.max_negatives_per_positive + if config.num_hard_examples > 0: + num_hard_examples = config.num_hard_examples + hard_example_miner = losses.HardExampleMiner( + num_hard_examples=num_hard_examples, + iou_threshold=config.iou_threshold, + loss_type=loss_type, + cls_loss_weight=classification_weight, + loc_loss_weight=localization_weight, + max_negatives_per_positive=max_negatives_per_positive, + min_negatives_per_image=config.min_negatives_per_image) + return hard_example_miner + + +def _build_localization_loss(loss_config): + """Builds a localization loss based on the loss config. + + Args: + loss_config: A losses_pb2.LocalizationLoss object. + + Returns: + Loss based on the config. + + Raises: + ValueError: On invalid loss_config. 
+ """ + if not isinstance(loss_config, losses_pb2.LocalizationLoss): + raise ValueError('loss_config not of type losses_pb2.LocalizationLoss.') + + loss_type = loss_config.WhichOneof('localization_loss') + + if loss_type == 'weighted_l2': + config = loss_config.weighted_l2 + return losses.WeightedL2LocalizationLoss( + anchorwise_output=config.anchorwise_output) + + if loss_type == 'weighted_smooth_l1': + config = loss_config.weighted_smooth_l1 + return losses.WeightedSmoothL1LocalizationLoss( + anchorwise_output=config.anchorwise_output) + + if loss_type == 'weighted_iou': + return losses.WeightedIOULocalizationLoss() + + raise ValueError('Empty loss config.') + + +def _build_classification_loss(loss_config): + """Builds a classification loss based on the loss config. + + Args: + loss_config: A losses_pb2.ClassificationLoss object. + + Returns: + Loss based on the config. + + Raises: + ValueError: On invalid loss_config. + """ + if not isinstance(loss_config, losses_pb2.ClassificationLoss): + raise ValueError('loss_config not of type losses_pb2.ClassificationLoss.') + + loss_type = loss_config.WhichOneof('classification_loss') + + if loss_type == 'weighted_sigmoid': + config = loss_config.weighted_sigmoid + return losses.WeightedSigmoidClassificationLoss( + anchorwise_output=config.anchorwise_output) + + if loss_type == 'weighted_softmax': + config = loss_config.weighted_softmax + return losses.WeightedSoftmaxClassificationLoss( + anchorwise_output=config.anchorwise_output) + + if loss_type == 'bootstrapped_sigmoid': + config = loss_config.bootstrapped_sigmoid + return losses.BootstrappedSigmoidClassificationLoss( + alpha=config.alpha, + bootstrap_type=('hard' if config.hard_bootstrap else 'soft'), + anchorwise_output=config.anchorwise_output) + + raise ValueError('Empty loss config.') diff --git a/object_detection/builders/losses_builder_test.py b/object_detection/builders/losses_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..90e5d639ce9746a9b5e496ebb8a9a16c3d5063a2 --- /dev/null +++ b/object_detection/builders/losses_builder_test.py @@ -0,0 +1,323 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
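For reference, a sketch with illustrative values, not part of the patch itself, of driving losses_builder.build from a text-format Loss config; the five-element return tuple is unpacked in the order the builder documents.

```
from google.protobuf import text_format

from object_detection.builders import losses_builder
from object_detection.protos import losses_pb2

losses_proto = losses_pb2.Loss()
text_format.Merge("""
  localization_loss { weighted_smooth_l1 { } }
  classification_loss { weighted_softmax { } }
  classification_weight: 1.0
  localization_weight: 1.0
  hard_example_miner {
    num_hard_examples: 64
    iou_threshold: 0.7
    loss_type: CLASSIFICATION
  }
""", losses_proto)

# Loss objects, their weights, and an optional HardExampleMiner (None when
# the hard_example_miner field is omitted from the config).
(classification_loss, localization_loss, classification_weight,
 localization_weight, hard_example_miner) = losses_builder.build(losses_proto)
```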
+# ============================================================================== + +"""Tests for losses_builder.""" + +import tensorflow as tf + +from google.protobuf import text_format +from object_detection.builders import losses_builder +from object_detection.core import losses +from object_detection.protos import losses_pb2 + + +class LocalizationLossBuilderTest(tf.test.TestCase): + + def test_build_weighted_l2_localization_loss(self): + losses_text_proto = """ + localization_loss { + weighted_l2 { + } + } + classification_loss { + weighted_softmax { + } + } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + _, localization_loss, _, _, _ = losses_builder.build(losses_proto) + self.assertTrue(isinstance(localization_loss, + losses.WeightedL2LocalizationLoss)) + + def test_build_weighted_smooth_l1_localization_loss(self): + losses_text_proto = """ + localization_loss { + weighted_smooth_l1 { + } + } + classification_loss { + weighted_softmax { + } + } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + _, localization_loss, _, _, _ = losses_builder.build(losses_proto) + self.assertTrue(isinstance(localization_loss, + losses.WeightedSmoothL1LocalizationLoss)) + + def test_build_weighted_iou_localization_loss(self): + losses_text_proto = """ + localization_loss { + weighted_iou { + } + } + classification_loss { + weighted_softmax { + } + } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + _, localization_loss, _, _, _ = losses_builder.build(losses_proto) + self.assertTrue(isinstance(localization_loss, + losses.WeightedIOULocalizationLoss)) + + def test_anchorwise_output(self): + losses_text_proto = """ + localization_loss { + weighted_smooth_l1 { + anchorwise_output: true + } + } + classification_loss { + weighted_softmax { + } + } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + _, localization_loss, _, _, _ = losses_builder.build(losses_proto) + self.assertTrue(isinstance(localization_loss, + losses.WeightedSmoothL1LocalizationLoss)) + predictions = tf.constant([[[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]]) + targets = tf.constant([[[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]]) + weights = tf.constant([[1.0, 1.0]]) + loss = localization_loss(predictions, targets, weights=weights) + self.assertEqual(loss.shape, [1, 2]) + + def test_raise_error_on_empty_localization_config(self): + losses_text_proto = """ + classification_loss { + weighted_softmax { + } + } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + with self.assertRaises(ValueError): + losses_builder._build_localization_loss(losses_proto) + + +class ClassificationLossBuilderTest(tf.test.TestCase): + + def test_build_weighted_sigmoid_classification_loss(self): + losses_text_proto = """ + classification_loss { + weighted_sigmoid { + } + } + localization_loss { + weighted_l2 { + } + } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + classification_loss, _, _, _, _ = losses_builder.build(losses_proto) + self.assertTrue(isinstance(classification_loss, + losses.WeightedSigmoidClassificationLoss)) + + def test_build_weighted_softmax_classification_loss(self): + losses_text_proto = """ + classification_loss { + weighted_softmax { + } + } + localization_loss { + weighted_l2 { + } + } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, 
losses_proto) + classification_loss, _, _, _, _ = losses_builder.build(losses_proto) + self.assertTrue(isinstance(classification_loss, + losses.WeightedSoftmaxClassificationLoss)) + + def test_build_bootstrapped_sigmoid_classification_loss(self): + losses_text_proto = """ + classification_loss { + bootstrapped_sigmoid { + alpha: 0.5 + } + } + localization_loss { + weighted_l2 { + } + } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + classification_loss, _, _, _, _ = losses_builder.build(losses_proto) + self.assertTrue(isinstance(classification_loss, + losses.BootstrappedSigmoidClassificationLoss)) + + def test_anchorwise_output(self): + losses_text_proto = """ + classification_loss { + weighted_sigmoid { + anchorwise_output: true + } + } + localization_loss { + weighted_l2 { + } + } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + classification_loss, _, _, _, _ = losses_builder.build(losses_proto) + self.assertTrue(isinstance(classification_loss, + losses.WeightedSigmoidClassificationLoss)) + predictions = tf.constant([[[0.0, 1.0, 0.0], [0.0, 0.5, 0.5]]]) + targets = tf.constant([[[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]]) + weights = tf.constant([[1.0, 1.0]]) + loss = classification_loss(predictions, targets, weights=weights) + self.assertEqual(loss.shape, [1, 2]) + + def test_raise_error_on_empty_config(self): + losses_text_proto = """ + localization_loss { + weighted_l2 { + } + } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + with self.assertRaises(ValueError): + losses_builder.build(losses_proto) + + +class HardExampleMinerBuilderTest(tf.test.TestCase): + + def test_do_not_build_hard_example_miner_by_default(self): + losses_text_proto = """ + localization_loss { + weighted_l2 { + } + } + classification_loss { + weighted_softmax { + } + } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + _, _, _, _, hard_example_miner = losses_builder.build(losses_proto) + self.assertEqual(hard_example_miner, None) + + def test_build_hard_example_miner_for_classification_loss(self): + losses_text_proto = """ + localization_loss { + weighted_l2 { + } + } + classification_loss { + weighted_softmax { + } + } + hard_example_miner { + loss_type: CLASSIFICATION + } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + _, _, _, _, hard_example_miner = losses_builder.build(losses_proto) + self.assertTrue(isinstance(hard_example_miner, losses.HardExampleMiner)) + self.assertEqual(hard_example_miner._loss_type, 'cls') + + def test_build_hard_example_miner_for_localization_loss(self): + losses_text_proto = """ + localization_loss { + weighted_l2 { + } + } + classification_loss { + weighted_softmax { + } + } + hard_example_miner { + loss_type: LOCALIZATION + } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + _, _, _, _, hard_example_miner = losses_builder.build(losses_proto) + self.assertTrue(isinstance(hard_example_miner, losses.HardExampleMiner)) + self.assertEqual(hard_example_miner._loss_type, 'loc') + + def test_build_hard_example_miner_with_non_default_values(self): + losses_text_proto = """ + localization_loss { + weighted_l2 { + } + } + classification_loss { + weighted_softmax { + } + } + hard_example_miner { + num_hard_examples: 32 + iou_threshold: 0.5 + loss_type: LOCALIZATION + max_negatives_per_positive: 10 + min_negatives_per_image: 3 
+ } + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + _, _, _, _, hard_example_miner = losses_builder.build(losses_proto) + self.assertTrue(isinstance(hard_example_miner, losses.HardExampleMiner)) + self.assertEqual(hard_example_miner._num_hard_examples, 32) + self.assertAlmostEqual(hard_example_miner._iou_threshold, 0.5) + self.assertEqual(hard_example_miner._max_negatives_per_positive, 10) + self.assertEqual(hard_example_miner._min_negatives_per_image, 3) + + +class LossBuilderTest(tf.test.TestCase): + + def test_build_all_loss_parameters(self): + losses_text_proto = """ + localization_loss { + weighted_l2 { + } + } + classification_loss { + weighted_softmax { + } + } + hard_example_miner { + } + classification_weight: 0.8 + localization_weight: 0.2 + """ + losses_proto = losses_pb2.Loss() + text_format.Merge(losses_text_proto, losses_proto) + (classification_loss, localization_loss, + classification_weight, localization_weight, + hard_example_miner) = losses_builder.build(losses_proto) + self.assertTrue(isinstance(hard_example_miner, losses.HardExampleMiner)) + self.assertTrue(isinstance(classification_loss, + losses.WeightedSoftmaxClassificationLoss)) + self.assertTrue(isinstance(localization_loss, + losses.WeightedL2LocalizationLoss)) + self.assertAlmostEqual(classification_weight, 0.8) + self.assertAlmostEqual(localization_weight, 0.2) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/builders/matcher_builder.py b/object_detection/builders/matcher_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..6ec49da973abc76ce18106ed7e22945c7356e95c --- /dev/null +++ b/object_detection/builders/matcher_builder.py @@ -0,0 +1,51 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""A function to build an object detection matcher from configuration.""" + +from object_detection.matchers import argmax_matcher +from object_detection.matchers import bipartite_matcher +from object_detection.protos import matcher_pb2 + + +def build(matcher_config): + """Builds a matcher object based on the matcher config. + + Args: + matcher_config: A matcher.proto object containing the config for the desired + Matcher. + + Returns: + Matcher based on the config. + + Raises: + ValueError: On empty matcher proto. 
+ """ + if not isinstance(matcher_config, matcher_pb2.Matcher): + raise ValueError('matcher_config not of type matcher_pb2.Matcher.') + if matcher_config.WhichOneof('matcher_oneof') == 'argmax_matcher': + matcher = matcher_config.argmax_matcher + matched_threshold = unmatched_threshold = None + if not matcher.ignore_thresholds: + matched_threshold = matcher.matched_threshold + unmatched_threshold = matcher.unmatched_threshold + return argmax_matcher.ArgMaxMatcher( + matched_threshold=matched_threshold, + unmatched_threshold=unmatched_threshold, + negatives_lower_than_unmatched=matcher.negatives_lower_than_unmatched, + force_match_for_each_row=matcher.force_match_for_each_row) + if matcher_config.WhichOneof('matcher_oneof') == 'bipartite_matcher': + return bipartite_matcher.GreedyBipartiteMatcher() + raise ValueError('Empty matcher.') diff --git a/object_detection/builders/matcher_builder_test.py b/object_detection/builders/matcher_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..c4275aaef9975aa0922ca0330b48e2676be2d1b1 --- /dev/null +++ b/object_detection/builders/matcher_builder_test.py @@ -0,0 +1,97 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for matcher_builder.""" + +import tensorflow as tf + +from google.protobuf import text_format +from object_detection.builders import matcher_builder +from object_detection.matchers import argmax_matcher +from object_detection.matchers import bipartite_matcher +from object_detection.protos import matcher_pb2 + + +class MatcherBuilderTest(tf.test.TestCase): + + def test_build_arg_max_matcher_with_defaults(self): + matcher_text_proto = """ + argmax_matcher { + } + """ + matcher_proto = matcher_pb2.Matcher() + text_format.Merge(matcher_text_proto, matcher_proto) + matcher_object = matcher_builder.build(matcher_proto) + self.assertTrue(isinstance(matcher_object, argmax_matcher.ArgMaxMatcher)) + self.assertAlmostEqual(matcher_object._matched_threshold, 0.5) + self.assertAlmostEqual(matcher_object._unmatched_threshold, 0.5) + self.assertTrue(matcher_object._negatives_lower_than_unmatched) + self.assertFalse(matcher_object._force_match_for_each_row) + + def test_build_arg_max_matcher_without_thresholds(self): + matcher_text_proto = """ + argmax_matcher { + ignore_thresholds: true + } + """ + matcher_proto = matcher_pb2.Matcher() + text_format.Merge(matcher_text_proto, matcher_proto) + matcher_object = matcher_builder.build(matcher_proto) + self.assertTrue(isinstance(matcher_object, argmax_matcher.ArgMaxMatcher)) + self.assertEqual(matcher_object._matched_threshold, None) + self.assertEqual(matcher_object._unmatched_threshold, None) + self.assertTrue(matcher_object._negatives_lower_than_unmatched) + self.assertFalse(matcher_object._force_match_for_each_row) + + def test_build_arg_max_matcher_with_non_default_parameters(self): + matcher_text_proto = """ + 
argmax_matcher { + matched_threshold: 0.7 + unmatched_threshold: 0.3 + negatives_lower_than_unmatched: false + force_match_for_each_row: true + } + """ + matcher_proto = matcher_pb2.Matcher() + text_format.Merge(matcher_text_proto, matcher_proto) + matcher_object = matcher_builder.build(matcher_proto) + self.assertTrue(isinstance(matcher_object, argmax_matcher.ArgMaxMatcher)) + self.assertAlmostEqual(matcher_object._matched_threshold, 0.7) + self.assertAlmostEqual(matcher_object._unmatched_threshold, 0.3) + self.assertFalse(matcher_object._negatives_lower_than_unmatched) + self.assertTrue(matcher_object._force_match_for_each_row) + + def test_build_bipartite_matcher(self): + matcher_text_proto = """ + bipartite_matcher { + } + """ + matcher_proto = matcher_pb2.Matcher() + text_format.Merge(matcher_text_proto, matcher_proto) + matcher_object = matcher_builder.build(matcher_proto) + self.assertTrue( + isinstance(matcher_object, bipartite_matcher.GreedyBipartiteMatcher)) + + def test_raise_error_on_empty_matcher(self): + matcher_text_proto = """ + """ + matcher_proto = matcher_pb2.Matcher() + text_format.Merge(matcher_text_proto, matcher_proto) + with self.assertRaises(ValueError): + matcher_builder.build(matcher_proto) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/builders/model_builder.py b/object_detection/builders/model_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..7df3959c322d7a795ab748aaa1858f8854e806be --- /dev/null +++ b/object_detection/builders/model_builder.py @@ -0,0 +1,303 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
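A short sketch, not part of the patch itself, of the matcher path exercised by the tests above: an argmax_matcher block yields an ArgMaxMatcher, while a bipartite_matcher block would yield a GreedyBipartiteMatcher instead. Threshold values are illustrative.

```
from google.protobuf import text_format

from object_detection.builders import matcher_builder
from object_detection.protos import matcher_pb2

matcher_proto = matcher_pb2.Matcher()
text_format.Merge("""
  argmax_matcher {
    matched_threshold: 0.7
    unmatched_threshold: 0.3
  }
""", matcher_proto)

# Returns an argmax_matcher.ArgMaxMatcher configured with the thresholds above.
matcher = matcher_builder.build(matcher_proto)
```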
+# ============================================================================== + +"""A function to build a DetectionModel from configuration.""" +from object_detection.builders import anchor_generator_builder +from object_detection.builders import box_coder_builder +from object_detection.builders import box_predictor_builder +from object_detection.builders import hyperparams_builder +from object_detection.builders import image_resizer_builder +from object_detection.builders import losses_builder +from object_detection.builders import matcher_builder +from object_detection.builders import post_processing_builder +from object_detection.builders import region_similarity_calculator_builder as sim_calc +from object_detection.core import box_predictor +from object_detection.meta_architectures import faster_rcnn_meta_arch +from object_detection.meta_architectures import rfcn_meta_arch +from object_detection.meta_architectures import ssd_meta_arch +from object_detection.models import faster_rcnn_inception_resnet_v2_feature_extractor as frcnn_inc_res +from object_detection.models import faster_rcnn_resnet_v1_feature_extractor as frcnn_resnet_v1 +from object_detection.models.ssd_inception_v2_feature_extractor import SSDInceptionV2FeatureExtractor +from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobileNetV1FeatureExtractor +from object_detection.protos import model_pb2 + +# A map of names to SSD feature extractors. +SSD_FEATURE_EXTRACTOR_CLASS_MAP = { + 'ssd_inception_v2': SSDInceptionV2FeatureExtractor, + 'ssd_mobilenet_v1': SSDMobileNetV1FeatureExtractor, +} + +# A map of names to Faster R-CNN feature extractors. +FASTER_RCNN_FEATURE_EXTRACTOR_CLASS_MAP = { + 'faster_rcnn_resnet50': + frcnn_resnet_v1.FasterRCNNResnet50FeatureExtractor, + 'faster_rcnn_resnet101': + frcnn_resnet_v1.FasterRCNNResnet101FeatureExtractor, + 'faster_rcnn_resnet152': + frcnn_resnet_v1.FasterRCNNResnet152FeatureExtractor, + 'faster_rcnn_inception_resnet_v2': + frcnn_inc_res.FasterRCNNInceptionResnetV2FeatureExtractor +} + + +def build(model_config, is_training): + """Builds a DetectionModel based on the model config. + + Args: + model_config: A model.proto object containing the config for the desired + DetectionModel. + is_training: True if this model is being built for training purposes. + + Returns: + DetectionModel based on the config. + + Raises: + ValueError: On invalid meta architecture or model. + """ + if not isinstance(model_config, model_pb2.DetectionModel): + raise ValueError('model_config not of type model_pb2.DetectionModel.') + meta_architecture = model_config.WhichOneof('model') + if meta_architecture == 'ssd': + return _build_ssd_model(model_config.ssd, is_training) + if meta_architecture == 'faster_rcnn': + return _build_faster_rcnn_model(model_config.faster_rcnn, is_training) + raise ValueError('Unknown meta architecture: {}'.format(meta_architecture)) + + +def _build_ssd_feature_extractor(feature_extractor_config, is_training, + reuse_weights=None): + """Builds a ssd_meta_arch.SSDFeatureExtractor based on config. + + Args: + feature_extractor_config: A SSDFeatureExtractor proto config from ssd.proto. + is_training: True if this feature extractor is being built for training. + reuse_weights: if the feature extractor should reuse weights. + + Returns: + ssd_meta_arch.SSDFeatureExtractor based on config. + + Raises: + ValueError: On invalid feature extractor type. 
+ """ + feature_type = feature_extractor_config.type + depth_multiplier = feature_extractor_config.depth_multiplier + min_depth = feature_extractor_config.min_depth + conv_hyperparams = hyperparams_builder.build( + feature_extractor_config.conv_hyperparams, is_training) + + if feature_type not in SSD_FEATURE_EXTRACTOR_CLASS_MAP: + raise ValueError('Unknown ssd feature_extractor: {}'.format(feature_type)) + + feature_extractor_class = SSD_FEATURE_EXTRACTOR_CLASS_MAP[feature_type] + return feature_extractor_class(depth_multiplier, min_depth, conv_hyperparams, + reuse_weights) + + +def _build_ssd_model(ssd_config, is_training): + """Builds an SSD detection model based on the model config. + + Args: + ssd_config: A ssd.proto object containing the config for the desired + SSDMetaArch. + is_training: True if this model is being built for training purposes. + + Returns: + SSDMetaArch based on the config. + Raises: + ValueError: If ssd_config.type is not recognized (i.e. not registered in + model_class_map). + """ + num_classes = ssd_config.num_classes + + # Feature extractor + feature_extractor = _build_ssd_feature_extractor(ssd_config.feature_extractor, + is_training) + + box_coder = box_coder_builder.build(ssd_config.box_coder) + matcher = matcher_builder.build(ssd_config.matcher) + region_similarity_calculator = sim_calc.build( + ssd_config.similarity_calculator) + ssd_box_predictor = box_predictor_builder.build(hyperparams_builder.build, + ssd_config.box_predictor, + is_training, num_classes) + anchor_generator = anchor_generator_builder.build( + ssd_config.anchor_generator) + image_resizer_fn = image_resizer_builder.build(ssd_config.image_resizer) + non_max_suppression_fn, score_conversion_fn = post_processing_builder.build( + ssd_config.post_processing) + (classification_loss, localization_loss, classification_weight, + localization_weight, + hard_example_miner) = losses_builder.build(ssd_config.loss) + normalize_loss_by_num_matches = ssd_config.normalize_loss_by_num_matches + + return ssd_meta_arch.SSDMetaArch( + is_training, + anchor_generator, + ssd_box_predictor, + box_coder, + feature_extractor, + matcher, + region_similarity_calculator, + image_resizer_fn, + non_max_suppression_fn, + score_conversion_fn, + classification_loss, + localization_loss, + classification_weight, + localization_weight, + normalize_loss_by_num_matches, + hard_example_miner) + + +def _build_faster_rcnn_feature_extractor( + feature_extractor_config, is_training, reuse_weights=None): + """Builds a faster_rcnn_meta_arch.FasterRCNNFeatureExtractor based on config. + + Args: + feature_extractor_config: A FasterRcnnFeatureExtractor proto config from + faster_rcnn.proto. + is_training: True if this feature extractor is being built for training. + reuse_weights: if the feature extractor should reuse weights. + + Returns: + faster_rcnn_meta_arch.FasterRCNNFeatureExtractor based on config. + + Raises: + ValueError: On invalid feature extractor type. 
+ """ + feature_type = feature_extractor_config.type + first_stage_features_stride = ( + feature_extractor_config.first_stage_features_stride) + + if feature_type not in FASTER_RCNN_FEATURE_EXTRACTOR_CLASS_MAP: + raise ValueError('Unknown Faster R-CNN feature_extractor: {}'.format( + feature_type)) + feature_extractor_class = FASTER_RCNN_FEATURE_EXTRACTOR_CLASS_MAP[ + feature_type] + return feature_extractor_class( + is_training, first_stage_features_stride, reuse_weights) + + +def _build_faster_rcnn_model(frcnn_config, is_training): + """Builds a Faster R-CNN or R-FCN detection model based on the model config. + + Builds R-FCN model if the second_stage_box_predictor in the config is of type + `rfcn_box_predictor` else builds a Faster R-CNN model. + + Args: + frcnn_config: A faster_rcnn.proto object containing the config for the + desired FasterRCNNMetaArch or RFCNMetaArch. + is_training: True if this model is being built for training purposes. + + Returns: + FasterRCNNMetaArch based on the config. + Raises: + ValueError: If frcnn_config.type is not recognized (i.e. not registered in + model_class_map). + """ + num_classes = frcnn_config.num_classes + image_resizer_fn = image_resizer_builder.build(frcnn_config.image_resizer) + + feature_extractor = _build_faster_rcnn_feature_extractor( + frcnn_config.feature_extractor, is_training) + + first_stage_only = frcnn_config.first_stage_only + first_stage_anchor_generator = anchor_generator_builder.build( + frcnn_config.first_stage_anchor_generator) + + first_stage_atrous_rate = frcnn_config.first_stage_atrous_rate + first_stage_box_predictor_arg_scope = hyperparams_builder.build( + frcnn_config.first_stage_box_predictor_conv_hyperparams, is_training) + first_stage_box_predictor_kernel_size = ( + frcnn_config.first_stage_box_predictor_kernel_size) + first_stage_box_predictor_depth = frcnn_config.first_stage_box_predictor_depth + first_stage_minibatch_size = frcnn_config.first_stage_minibatch_size + first_stage_positive_balance_fraction = ( + frcnn_config.first_stage_positive_balance_fraction) + first_stage_nms_score_threshold = frcnn_config.first_stage_nms_score_threshold + first_stage_nms_iou_threshold = frcnn_config.first_stage_nms_iou_threshold + first_stage_max_proposals = frcnn_config.first_stage_max_proposals + first_stage_loc_loss_weight = ( + frcnn_config.first_stage_localization_loss_weight) + first_stage_obj_loss_weight = frcnn_config.first_stage_objectness_loss_weight + + initial_crop_size = frcnn_config.initial_crop_size + maxpool_kernel_size = frcnn_config.maxpool_kernel_size + maxpool_stride = frcnn_config.maxpool_stride + + second_stage_box_predictor = box_predictor_builder.build( + hyperparams_builder.build, + frcnn_config.second_stage_box_predictor, + is_training=is_training, + num_classes=num_classes) + second_stage_batch_size = frcnn_config.second_stage_batch_size + second_stage_balance_fraction = frcnn_config.second_stage_balance_fraction + (second_stage_non_max_suppression_fn, second_stage_score_conversion_fn + ) = post_processing_builder.build(frcnn_config.second_stage_post_processing) + second_stage_localization_loss_weight = ( + frcnn_config.second_stage_localization_loss_weight) + second_stage_classification_loss_weight = ( + frcnn_config.second_stage_classification_loss_weight) + + hard_example_miner = None + if frcnn_config.HasField('hard_example_miner'): + hard_example_miner = losses_builder.build_hard_example_miner( + frcnn_config.hard_example_miner, + second_stage_classification_loss_weight, + 
second_stage_localization_loss_weight) + + common_kwargs = { + 'is_training': is_training, + 'num_classes': num_classes, + 'image_resizer_fn': image_resizer_fn, + 'feature_extractor': feature_extractor, + 'first_stage_only': first_stage_only, + 'first_stage_anchor_generator': first_stage_anchor_generator, + 'first_stage_atrous_rate': first_stage_atrous_rate, + 'first_stage_box_predictor_arg_scope': + first_stage_box_predictor_arg_scope, + 'first_stage_box_predictor_kernel_size': + first_stage_box_predictor_kernel_size, + 'first_stage_box_predictor_depth': first_stage_box_predictor_depth, + 'first_stage_minibatch_size': first_stage_minibatch_size, + 'first_stage_positive_balance_fraction': + first_stage_positive_balance_fraction, + 'first_stage_nms_score_threshold': first_stage_nms_score_threshold, + 'first_stage_nms_iou_threshold': first_stage_nms_iou_threshold, + 'first_stage_max_proposals': first_stage_max_proposals, + 'first_stage_localization_loss_weight': first_stage_loc_loss_weight, + 'first_stage_objectness_loss_weight': first_stage_obj_loss_weight, + 'second_stage_batch_size': second_stage_batch_size, + 'second_stage_balance_fraction': second_stage_balance_fraction, + 'second_stage_non_max_suppression_fn': + second_stage_non_max_suppression_fn, + 'second_stage_score_conversion_fn': second_stage_score_conversion_fn, + 'second_stage_localization_loss_weight': + second_stage_localization_loss_weight, + 'second_stage_classification_loss_weight': + second_stage_classification_loss_weight, + 'hard_example_miner': hard_example_miner} + + if isinstance(second_stage_box_predictor, box_predictor.RfcnBoxPredictor): + return rfcn_meta_arch.RFCNMetaArch( + second_stage_rfcn_box_predictor=second_stage_box_predictor, + **common_kwargs) + else: + return faster_rcnn_meta_arch.FasterRCNNMetaArch( + initial_crop_size=initial_crop_size, + maxpool_kernel_size=maxpool_kernel_size, + maxpool_stride=maxpool_stride, + second_stage_mask_rcnn_box_predictor=second_stage_box_predictor, + **common_kwargs) diff --git a/object_detection/builders/model_builder_test.py b/object_detection/builders/model_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..28b15e6508f193cf57c80e3c1980523ceec1ac00 --- /dev/null +++ b/object_detection/builders/model_builder_test.py @@ -0,0 +1,456 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
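Before the model builder tests below, here is a hedged sketch of how `model_builder.build` is typically driven from a text-format `DetectionModel` config stored on disk; the helper name and file path are illustrative and not part of this diff:

```
from google.protobuf import text_format

from object_detection.builders import model_builder
from object_detection.protos import model_pb2


def build_detection_model(config_path, is_training):
  """Parses a text-format DetectionModel config and builds the model.

  `config_path` is a hypothetical path to a file containing a
  model_pb2.DetectionModel message in text format.
  """
  model_proto = model_pb2.DetectionModel()
  with open(config_path, 'r') as f:
    text_format.Merge(f.read(), model_proto)
  # build() dispatches on the oneof: 'ssd' -> _build_ssd_model,
  # 'faster_rcnn' -> _build_faster_rcnn_model; anything else raises ValueError.
  return model_builder.build(model_proto, is_training=is_training)
```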
+# ============================================================================== + +"""Tests for object_detection.models.model_builder.""" + +import tensorflow as tf + +from google.protobuf import text_format +from object_detection.builders import model_builder +from object_detection.meta_architectures import faster_rcnn_meta_arch +from object_detection.meta_architectures import rfcn_meta_arch +from object_detection.meta_architectures import ssd_meta_arch +from object_detection.models import faster_rcnn_inception_resnet_v2_feature_extractor as frcnn_inc_res +from object_detection.models import faster_rcnn_resnet_v1_feature_extractor as frcnn_resnet_v1 +from object_detection.models.ssd_inception_v2_feature_extractor import SSDInceptionV2FeatureExtractor +from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobileNetV1FeatureExtractor +from object_detection.protos import model_pb2 + +FEATURE_EXTRACTOR_MAPS = { + 'faster_rcnn_resnet50': + frcnn_resnet_v1.FasterRCNNResnet50FeatureExtractor, + 'faster_rcnn_resnet101': + frcnn_resnet_v1.FasterRCNNResnet101FeatureExtractor, + 'faster_rcnn_resnet152': + frcnn_resnet_v1.FasterRCNNResnet152FeatureExtractor +} + + +class ModelBuilderTest(tf.test.TestCase): + + def create_model(self, model_config): + """Builds a DetectionModel based on the model config. + + Args: + model_config: A model.proto object containing the config for the desired + DetectionModel. + + Returns: + DetectionModel based on the config. + """ + return model_builder.build(model_config, is_training=True) + + def test_create_ssd_inception_v2_model_from_config(self): + model_text_proto = """ + ssd { + feature_extractor { + type: 'ssd_inception_v2' + conv_hyperparams { + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + } + } + box_coder { + faster_rcnn_box_coder { + } + } + matcher { + argmax_matcher { + } + } + similarity_calculator { + iou_similarity { + } + } + anchor_generator { + ssd_anchor_generator { + aspect_ratios: 1.0 + } + } + image_resizer { + fixed_shape_resizer { + height: 320 + width: 320 + } + } + box_predictor { + convolutional_box_predictor { + conv_hyperparams { + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + } + } + } + loss { + classification_loss { + weighted_softmax { + } + } + localization_loss { + weighted_smooth_l1 { + } + } + } + }""" + model_proto = model_pb2.DetectionModel() + text_format.Merge(model_text_proto, model_proto) + model = self.create_model(model_proto) + self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch) + self.assertIsInstance(model._feature_extractor, + SSDInceptionV2FeatureExtractor) + + def test_create_ssd_mobilenet_v1_model_from_config(self): + model_text_proto = """ + ssd { + feature_extractor { + type: 'ssd_mobilenet_v1' + conv_hyperparams { + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + } + } + box_coder { + faster_rcnn_box_coder { + } + } + matcher { + argmax_matcher { + } + } + similarity_calculator { + iou_similarity { + } + } + anchor_generator { + ssd_anchor_generator { + aspect_ratios: 1.0 + } + } + image_resizer { + fixed_shape_resizer { + height: 320 + width: 320 + } + } + box_predictor { + convolutional_box_predictor { + conv_hyperparams { + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + } + } + } + loss { + classification_loss { + weighted_softmax { + } + } + localization_loss { + 
weighted_smooth_l1 { + } + } + } + }""" + model_proto = model_pb2.DetectionModel() + text_format.Merge(model_text_proto, model_proto) + model = self.create_model(model_proto) + self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch) + self.assertIsInstance(model._feature_extractor, + SSDMobileNetV1FeatureExtractor) + + def test_create_faster_rcnn_resnet_v1_models_from_config(self): + model_text_proto = """ + faster_rcnn { + num_classes: 3 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 600 + max_dimension: 1024 + } + } + feature_extractor { + type: 'faster_rcnn_resnet101' + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + } + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + } + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.01 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + }""" + model_proto = model_pb2.DetectionModel() + text_format.Merge(model_text_proto, model_proto) + for extractor_type, extractor_class in FEATURE_EXTRACTOR_MAPS.items(): + model_proto.faster_rcnn.feature_extractor.type = extractor_type + model = model_builder.build(model_proto, is_training=True) + self.assertIsInstance(model, faster_rcnn_meta_arch.FasterRCNNMetaArch) + self.assertIsInstance(model._feature_extractor, extractor_class) + + def test_create_faster_rcnn_inception_resnet_v2_model_from_config(self): + model_text_proto = """ + faster_rcnn { + num_classes: 3 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 600 + max_dimension: 1024 + } + } + feature_extractor { + type: 'faster_rcnn_inception_resnet_v2' + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + } + initial_crop_size: 17 + maxpool_kernel_size: 1 + maxpool_stride: 1 + second_stage_box_predictor { + mask_rcnn_box_predictor { + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + } + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.01 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + }""" + model_proto = model_pb2.DetectionModel() + text_format.Merge(model_text_proto, model_proto) + model = model_builder.build(model_proto, is_training=True) + self.assertIsInstance(model, faster_rcnn_meta_arch.FasterRCNNMetaArch) + self.assertIsInstance( + model._feature_extractor, + frcnn_inc_res.FasterRCNNInceptionResnetV2FeatureExtractor) + + def test_create_faster_rcnn_model_from_config_with_example_miner(self): + model_text_proto = """ + faster_rcnn { + num_classes: 3 + feature_extractor { + type: 'faster_rcnn_inception_resnet_v2' + } + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 600 + 
max_dimension: 1024 + } + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + } + second_stage_box_predictor { + mask_rcnn_box_predictor { + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + } + } + } + hard_example_miner { + num_hard_examples: 10 + iou_threshold: 0.99 + } + }""" + model_proto = model_pb2.DetectionModel() + text_format.Merge(model_text_proto, model_proto) + model = model_builder.build(model_proto, is_training=True) + self.assertIsNotNone(model._hard_example_miner) + + def test_create_rfcn_resnet_v1_model_from_config(self): + model_text_proto = """ + faster_rcnn { + num_classes: 3 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 600 + max_dimension: 1024 + } + } + feature_extractor { + type: 'faster_rcnn_resnet101' + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + } + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + rfcn_box_predictor { + conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + } + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.01 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + }""" + model_proto = model_pb2.DetectionModel() + text_format.Merge(model_text_proto, model_proto) + for extractor_type, extractor_class in FEATURE_EXTRACTOR_MAPS.items(): + model_proto.faster_rcnn.feature_extractor.type = extractor_type + model = model_builder.build(model_proto, is_training=True) + self.assertIsInstance(model, rfcn_meta_arch.RFCNMetaArch) + self.assertIsInstance(model._feature_extractor, extractor_class) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/builders/optimizer_builder.py b/object_detection/builders/optimizer_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..f74b056207d8454feded119cbd81f6382100637e --- /dev/null +++ b/object_detection/builders/optimizer_builder.py @@ -0,0 +1,112 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
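The R-FCN test above depends on the dispatch at the end of `_build_faster_rcnn_model`: the meta-architecture is selected by the type of the second-stage box predictor rather than by a separate config flag. A minimal sketch mirroring that `isinstance` check:

```
from object_detection.core import box_predictor
from object_detection.meta_architectures import faster_rcnn_meta_arch
from object_detection.meta_architectures import rfcn_meta_arch


def _meta_arch_class_for(second_stage_box_predictor):
  """Mirrors the dispatch at the end of _build_faster_rcnn_model."""
  if isinstance(second_stage_box_predictor, box_predictor.RfcnBoxPredictor):
    # An rfcn_box_predictor config produces an R-FCN model.
    return rfcn_meta_arch.RFCNMetaArch
  # Any other predictor (e.g. mask_rcnn_box_predictor) produces Faster R-CNN.
  return faster_rcnn_meta_arch.FasterRCNNMetaArch
```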
+# ============================================================================== + +"""Functions to build DetectionModel training optimizers.""" + +import tensorflow as tf +from object_detection.utils import learning_schedules + +slim = tf.contrib.slim + + +def build(optimizer_config, global_summaries): + """Create optimizer based on config. + + Args: + optimizer_config: A Optimizer proto message. + global_summaries: A set to attach learning rate summary to. + + Returns: + An optimizer. + + Raises: + ValueError: when using an unsupported input data type. + """ + optimizer_type = optimizer_config.WhichOneof('optimizer') + optimizer = None + + if optimizer_type == 'rms_prop_optimizer': + config = optimizer_config.rms_prop_optimizer + optimizer = tf.train.RMSPropOptimizer( + _create_learning_rate(config.learning_rate, global_summaries), + decay=config.decay, + momentum=config.momentum_optimizer_value, + epsilon=config.epsilon) + + if optimizer_type == 'momentum_optimizer': + config = optimizer_config.momentum_optimizer + optimizer = tf.train.MomentumOptimizer( + _create_learning_rate(config.learning_rate, global_summaries), + momentum=config.momentum_optimizer_value) + + if optimizer_type == 'adam_optimizer': + config = optimizer_config.adam_optimizer + optimizer = tf.train.AdamOptimizer( + _create_learning_rate(config.learning_rate, global_summaries)) + + if optimizer is None: + raise ValueError('Optimizer %s not supported.' % optimizer_type) + + if optimizer_config.use_moving_average: + optimizer = tf.contrib.opt.MovingAverageOptimizer( + optimizer, average_decay=optimizer_config.moving_average_decay) + + return optimizer + + +def _create_learning_rate(learning_rate_config, global_summaries): + """Create optimizer learning rate based on config. + + Args: + learning_rate_config: A LearningRate proto message. + global_summaries: A set to attach learning rate summary to. + + Returns: + A learning rate. + + Raises: + ValueError: when using an unsupported input data type. + """ + learning_rate = None + learning_rate_type = learning_rate_config.WhichOneof('learning_rate') + if learning_rate_type == 'constant_learning_rate': + config = learning_rate_config.constant_learning_rate + learning_rate = config.learning_rate + + if learning_rate_type == 'exponential_decay_learning_rate': + config = learning_rate_config.exponential_decay_learning_rate + learning_rate = tf.train.exponential_decay( + config.initial_learning_rate, + slim.get_or_create_global_step(), + config.decay_steps, + config.decay_factor, + staircase=config.staircase) + + if learning_rate_type == 'manual_step_learning_rate': + config = learning_rate_config.manual_step_learning_rate + if not config.schedule: + raise ValueError('Empty learning rate schedule.') + learning_rate_step_boundaries = [x.step for x in config.schedule] + learning_rate_sequence = [config.initial_learning_rate] + learning_rate_sequence += [x.learning_rate for x in config.schedule] + learning_rate = learning_schedules.manual_stepping( + slim.get_or_create_global_step(), learning_rate_step_boundaries, + learning_rate_sequence) + + if learning_rate is None: + raise ValueError('Learning_rate %s not supported.' 
% learning_rate_type) + + global_summaries.add(tf.summary.scalar('Learning Rate', learning_rate)) + return learning_rate diff --git a/object_detection/builders/optimizer_builder_test.py b/object_detection/builders/optimizer_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..958d2e1d27819d7cfbfe7be89aff6d48560c2ead --- /dev/null +++ b/object_detection/builders/optimizer_builder_test.py @@ -0,0 +1,197 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for optimizer_builder.""" + +import tensorflow as tf + +from google.protobuf import text_format + +from object_detection.builders import optimizer_builder +from object_detection.protos import optimizer_pb2 + + +class LearningRateBuilderTest(tf.test.TestCase): + + def testBuildConstantLearningRate(self): + learning_rate_text_proto = """ + constant_learning_rate { + learning_rate: 0.004 + } + """ + global_summaries = set([]) + learning_rate_proto = optimizer_pb2.LearningRate() + text_format.Merge(learning_rate_text_proto, learning_rate_proto) + learning_rate = optimizer_builder._create_learning_rate( + learning_rate_proto, global_summaries) + self.assertAlmostEqual(learning_rate, 0.004) + + def testBuildExponentialDecayLearningRate(self): + learning_rate_text_proto = """ + exponential_decay_learning_rate { + initial_learning_rate: 0.004 + decay_steps: 99999 + decay_factor: 0.85 + staircase: false + } + """ + global_summaries = set([]) + learning_rate_proto = optimizer_pb2.LearningRate() + text_format.Merge(learning_rate_text_proto, learning_rate_proto) + learning_rate = optimizer_builder._create_learning_rate( + learning_rate_proto, global_summaries) + self.assertTrue(isinstance(learning_rate, tf.Tensor)) + + def testBuildManualStepLearningRate(self): + learning_rate_text_proto = """ + manual_step_learning_rate { + schedule { + step: 0 + learning_rate: 0.006 + } + schedule { + step: 90000 + learning_rate: 0.00006 + } + } + """ + global_summaries = set([]) + learning_rate_proto = optimizer_pb2.LearningRate() + text_format.Merge(learning_rate_text_proto, learning_rate_proto) + learning_rate = optimizer_builder._create_learning_rate( + learning_rate_proto, global_summaries) + self.assertTrue(isinstance(learning_rate, tf.Tensor)) + + def testRaiseErrorOnEmptyLearningRate(self): + learning_rate_text_proto = """ + """ + global_summaries = set([]) + learning_rate_proto = optimizer_pb2.LearningRate() + text_format.Merge(learning_rate_text_proto, learning_rate_proto) + with self.assertRaises(ValueError): + optimizer_builder._create_learning_rate( + learning_rate_proto, global_summaries) + + +class OptimizerBuilderTest(tf.test.TestCase): + + def testBuildRMSPropOptimizer(self): + optimizer_text_proto = """ + rms_prop_optimizer: { + learning_rate: { + exponential_decay_learning_rate { + initial_learning_rate: 0.004 + decay_steps: 800720 + decay_factor: 0.95 + } + } + 
momentum_optimizer_value: 0.9 + decay: 0.9 + epsilon: 1.0 + } + use_moving_average: false + """ + global_summaries = set([]) + optimizer_proto = optimizer_pb2.Optimizer() + text_format.Merge(optimizer_text_proto, optimizer_proto) + optimizer = optimizer_builder.build(optimizer_proto, global_summaries) + self.assertTrue(isinstance(optimizer, tf.train.RMSPropOptimizer)) + + def testBuildMomentumOptimizer(self): + optimizer_text_proto = """ + momentum_optimizer: { + learning_rate: { + constant_learning_rate { + learning_rate: 0.001 + } + } + momentum_optimizer_value: 0.99 + } + use_moving_average: false + """ + global_summaries = set([]) + optimizer_proto = optimizer_pb2.Optimizer() + text_format.Merge(optimizer_text_proto, optimizer_proto) + optimizer = optimizer_builder.build(optimizer_proto, global_summaries) + self.assertTrue(isinstance(optimizer, tf.train.MomentumOptimizer)) + + def testBuildAdamOptimizer(self): + optimizer_text_proto = """ + adam_optimizer: { + learning_rate: { + constant_learning_rate { + learning_rate: 0.002 + } + } + } + use_moving_average: false + """ + global_summaries = set([]) + optimizer_proto = optimizer_pb2.Optimizer() + text_format.Merge(optimizer_text_proto, optimizer_proto) + optimizer = optimizer_builder.build(optimizer_proto, global_summaries) + self.assertTrue(isinstance(optimizer, tf.train.AdamOptimizer)) + + def testBuildMovingAverageOptimizer(self): + optimizer_text_proto = """ + adam_optimizer: { + learning_rate: { + constant_learning_rate { + learning_rate: 0.002 + } + } + } + use_moving_average: True + """ + global_summaries = set([]) + optimizer_proto = optimizer_pb2.Optimizer() + text_format.Merge(optimizer_text_proto, optimizer_proto) + optimizer = optimizer_builder.build(optimizer_proto, global_summaries) + self.assertTrue( + isinstance(optimizer, tf.contrib.opt.MovingAverageOptimizer)) + + def testBuildMovingAverageOptimizerWithNonDefaultDecay(self): + optimizer_text_proto = """ + adam_optimizer: { + learning_rate: { + constant_learning_rate { + learning_rate: 0.002 + } + } + } + use_moving_average: True + moving_average_decay: 0.2 + """ + global_summaries = set([]) + optimizer_proto = optimizer_pb2.Optimizer() + text_format.Merge(optimizer_text_proto, optimizer_proto) + optimizer = optimizer_builder.build(optimizer_proto, global_summaries) + self.assertTrue( + isinstance(optimizer, tf.contrib.opt.MovingAverageOptimizer)) + # TODO: Find a way to not depend on the private members. + self.assertAlmostEqual(optimizer._ema._decay, 0.2) + + def testBuildEmptyOptimizer(self): + optimizer_text_proto = """ + """ + global_summaries = set([]) + optimizer_proto = optimizer_pb2.Optimizer() + text_format.Merge(optimizer_text_proto, optimizer_proto) + with self.assertRaises(ValueError): + optimizer_builder.build(optimizer_proto, global_summaries) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/builders/post_processing_builder.py b/object_detection/builders/post_processing_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..ab8c04ef9e431210c0413c0f213f28ced7d8b99d --- /dev/null +++ b/object_detection/builders/post_processing_builder.py @@ -0,0 +1,111 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
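For context, a hedged usage sketch for the optimizer builder tested above: parse an `Optimizer` text proto, pass in a set that collects the learning-rate summary, and use the returned optimizer as usual. The toy loss is illustrative only and not taken from this diff:

```
import tensorflow as tf
from google.protobuf import text_format

from object_detection.builders import optimizer_builder
from object_detection.protos import optimizer_pb2

optimizer_text_proto = """
momentum_optimizer: {
  learning_rate: {
    constant_learning_rate {
      learning_rate: 0.001
    }
  }
  momentum_optimizer_value: 0.9
}
use_moving_average: false
"""
optimizer_proto = optimizer_pb2.Optimizer()
text_format.Merge(optimizer_text_proto, optimizer_proto)

# build() adds a 'Learning Rate' scalar summary to this set as a side effect.
global_summaries = set()
optimizer = optimizer_builder.build(optimizer_proto, global_summaries)

# Illustrative-only loss; in the detection pipeline the loss comes from the model.
weight = tf.Variable(1.0)
train_op = optimizer.minimize(weight * weight)
```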
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Builder function for post processing operations.""" +import functools + +import tensorflow as tf +from object_detection.core import post_processing +from object_detection.protos import post_processing_pb2 + + +def build(post_processing_config): + """Builds callables for post-processing operations. + + Builds callables for non-max suppression and score conversion based on the + configuration. + + Non-max suppression callable takes `boxes`, `scores`, and optionally + `clip_window`, `parallel_iterations` and `scope` as inputs. It returns + `nms_boxes`, `nms_scores`, `nms_nms_classes` and `num_detections`. See + post_processing.batch_multiclass_non_max_suppression for the type and shape + of these tensors. + + Score converter callable should be called with `input` tensor. The callable + returns the output from one of 3 tf operations based on the configuration - + tf.identity, tf.sigmoid or tf.nn.softmax. See tensorflow documentation for + argument and return value descriptions. + + Args: + post_processing_config: post_processing.proto object containing the + parameters for the post-processing operations. + + Returns: + non_max_suppressor_fn: Callable for non-max suppression. + score_converter_fn: Callable for score conversion. + + Raises: + ValueError: if the post_processing_config is of incorrect type. + """ + if not isinstance(post_processing_config, post_processing_pb2.PostProcessing): + raise ValueError('post_processing_config not of type ' + 'post_processing_pb2.Postprocessing.') + non_max_suppressor_fn = _build_non_max_suppressor( + post_processing_config.batch_non_max_suppression) + score_converter_fn = _build_score_converter( + post_processing_config.score_converter) + return non_max_suppressor_fn, score_converter_fn + + +def _build_non_max_suppressor(nms_config): + """Builds non-max suppresson based on the nms config. + + Args: + nms_config: post_processing_pb2.PostProcessing.BatchNonMaxSuppression proto. + + Returns: + non_max_suppressor_fn: Callable non-max suppressor. + + Raises: + ValueError: On incorrect iou_threshold or on incompatible values of + max_total_detections and max_detections_per_class. + """ + if nms_config.iou_threshold < 0 or nms_config.iou_threshold > 1.0: + raise ValueError('iou_threshold not in [0, 1.0].') + if nms_config.max_detections_per_class > nms_config.max_total_detections: + raise ValueError('max_detections_per_class should be no greater than ' + 'max_total_detections.') + + non_max_suppressor_fn = functools.partial( + post_processing.batch_multiclass_non_max_suppression, + score_thresh=nms_config.score_threshold, + iou_thresh=nms_config.iou_threshold, + max_size_per_class=nms_config.max_detections_per_class, + max_total_size=nms_config.max_total_detections) + return non_max_suppressor_fn + + +def _build_score_converter(score_converter_config): + """Builds score converter based on the config. + + Builds one of [tf.identity, tf.sigmoid, tf.softmax] score converters based on + the config. 
+ + Args: + score_converter_config: post_processing_pb2.PostProcessing.score_converter. + + Returns: + Callable score converter op. + + Raises: + ValueError: On unknown score converter. + """ + if score_converter_config == post_processing_pb2.PostProcessing.IDENTITY: + return tf.identity + if score_converter_config == post_processing_pb2.PostProcessing.SIGMOID: + return tf.sigmoid + if score_converter_config == post_processing_pb2.PostProcessing.SOFTMAX: + return tf.nn.softmax + raise ValueError('Unknown score converter.') diff --git a/object_detection/builders/post_processing_builder_test.py b/object_detection/builders/post_processing_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..514ce6d2be008a164e760e34effc528a95ac82ee --- /dev/null +++ b/object_detection/builders/post_processing_builder_test.py @@ -0,0 +1,73 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for post_processing_builder.""" + +import tensorflow as tf +from google.protobuf import text_format +from object_detection.builders import post_processing_builder +from object_detection.protos import post_processing_pb2 + + +class PostProcessingBuilderTest(tf.test.TestCase): + + def test_build_non_max_suppressor_with_correct_parameters(self): + post_processing_text_proto = """ + batch_non_max_suppression { + score_threshold: 0.7 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + """ + post_processing_config = post_processing_pb2.PostProcessing() + text_format.Merge(post_processing_text_proto, post_processing_config) + non_max_suppressor, _ = post_processing_builder.build( + post_processing_config) + self.assertEqual(non_max_suppressor.keywords['max_size_per_class'], 100) + self.assertEqual(non_max_suppressor.keywords['max_total_size'], 300) + self.assertAlmostEqual(non_max_suppressor.keywords['score_thresh'], 0.7) + self.assertAlmostEqual(non_max_suppressor.keywords['iou_thresh'], 0.6) + + def test_build_identity_score_converter(self): + post_processing_text_proto = """ + score_converter: IDENTITY + """ + post_processing_config = post_processing_pb2.PostProcessing() + text_format.Merge(post_processing_text_proto, post_processing_config) + _, score_converter = post_processing_builder.build(post_processing_config) + self.assertEqual(score_converter, tf.identity) + + def test_build_sigmoid_score_converter(self): + post_processing_text_proto = """ + score_converter: SIGMOID + """ + post_processing_config = post_processing_pb2.PostProcessing() + text_format.Merge(post_processing_text_proto, post_processing_config) + _, score_converter = post_processing_builder.build(post_processing_config) + self.assertEqual(score_converter, tf.sigmoid) + + def test_build_softmax_score_converter(self): + post_processing_text_proto = """ + score_converter: SOFTMAX + """ + post_processing_config = 
post_processing_pb2.PostProcessing()
+    text_format.Merge(post_processing_text_proto, post_processing_config)
+    _, score_converter = post_processing_builder.build(post_processing_config)
+    self.assertEqual(score_converter, tf.nn.softmax)
+
+
+if __name__ == '__main__':
+  tf.test.main()
diff --git a/object_detection/builders/preprocessor_builder.py b/object_detection/builders/preprocessor_builder.py
new file mode 100644
index 0000000000000000000000000000000000000000..d88b31b23776589338de70024f37b4d6c6c43dfd
--- /dev/null
+++ b/object_detection/builders/preprocessor_builder.py
@@ -0,0 +1,277 @@
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Builder for preprocessing steps."""
+
+import tensorflow as tf
+
+from object_detection.core import preprocessor
+from object_detection.protos import preprocessor_pb2
+
+
+def _get_step_config_from_proto(preprocessor_step_config, step_name):
+  """Returns the value of a field named step_name from proto.
+
+  Args:
+    preprocessor_step_config: A preprocessor_pb2.PreprocessingStep object.
+    step_name: Name of the field to get value from.
+
+  Returns:
+    result_dict: a sub proto message from preprocessor_step_config which will be
+                 later converted to a dictionary.
+
+  Raises:
+    ValueError: If field does not exist in proto.
+  """
+  for field, value in preprocessor_step_config.ListFields():
+    if field.name == step_name:
+      return value
+
+  raise ValueError('Could not get field %s from proto!' % step_name)
+
+
+def _get_dict_from_proto(config):
+  """Helper function to put all proto fields into a dictionary.
+
+  For many preprocessing steps, there's a trivial 1-1 mapping from proto fields
+  to function arguments. This function automatically populates a dictionary with
+  the arguments from the proto.
+
+  Protos that CANNOT be trivially populated include:
+  * nested messages.
+  * steps that check if an optional field is set (i.e. where None != 0).
+  * protos that don't map 1-1 to arguments (i.e. list should be reshaped).
+  * fields requiring additional validation (i.e. repeated field has n elements).
+
+  Args:
+    config: A protobuf object that does not violate the conditions above.
+
+  Returns:
+    result_dict: |config| converted into a python dictionary.
+  """
+  result_dict = {}
+  for field, value in config.ListFields():
+    result_dict[field.name] = value
+  return result_dict
+
+
+# A map from a PreprocessingStep proto config field name to the preprocessing
+# function that should be used. The PreprocessingStep proto should be parsable
+# with _get_dict_from_proto.
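To make the 1-1 mapping described above concrete: for any step listed in the `PREPROCESSING_FUNCTION_MAP` that follows, `build()` (defined further below) returns the mapped preprocessor function together with a kwargs dict produced by `_get_dict_from_proto`. A small hedged sketch (proto floats are single precision, so the values are only approximately 0.8 and 1.2):

```
from google.protobuf import text_format

from object_detection.builders import preprocessor_builder
from object_detection.protos import preprocessor_pb2

step_proto = preprocessor_pb2.PreprocessingStep()
text_format.Merge("""
random_pixel_value_scale {
  minval: 0.8
  maxval: 1.2
}
""", step_proto)

# function is preprocessor.random_pixel_value_scale; kwargs is roughly
# {'minval': 0.8, 'maxval': 1.2}, taken straight from the proto fields.
function, kwargs = preprocessor_builder.build(step_proto)
```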
+PREPROCESSING_FUNCTION_MAP = { + 'normalize_image': preprocessor.normalize_image, + 'random_horizontal_flip': preprocessor.random_horizontal_flip, + 'random_pixel_value_scale': preprocessor.random_pixel_value_scale, + 'random_image_scale': preprocessor.random_image_scale, + 'random_rgb_to_gray': preprocessor.random_rgb_to_gray, + 'random_adjust_brightness': preprocessor.random_adjust_brightness, + 'random_adjust_contrast': preprocessor.random_adjust_contrast, + 'random_adjust_hue': preprocessor.random_adjust_hue, + 'random_adjust_saturation': preprocessor.random_adjust_saturation, + 'random_distort_color': preprocessor.random_distort_color, + 'random_jitter_boxes': preprocessor.random_jitter_boxes, + 'random_crop_to_aspect_ratio': preprocessor.random_crop_to_aspect_ratio, + 'random_black_patches': preprocessor.random_black_patches, + 'scale_boxes_to_pixel_coordinates': ( + preprocessor.scale_boxes_to_pixel_coordinates), + 'subtract_channel_mean': preprocessor.subtract_channel_mean, +} + + +# A map to convert from preprocessor_pb2.ResizeImage.Method enum to +# tf.image.ResizeMethod. +RESIZE_METHOD_MAP = { + preprocessor_pb2.ResizeImage.AREA: tf.image.ResizeMethod.AREA, + preprocessor_pb2.ResizeImage.BICUBIC: tf.image.ResizeMethod.BICUBIC, + preprocessor_pb2.ResizeImage.BILINEAR: tf.image.ResizeMethod.BILINEAR, + preprocessor_pb2.ResizeImage.NEAREST_NEIGHBOR: ( + tf.image.ResizeMethod.NEAREST_NEIGHBOR), +} + + +def build(preprocessor_step_config): + """Builds preprocessing step based on the configuration. + + Args: + preprocessor_step_config: PreprocessingStep configuration proto. + + Returns: + function, argmap: A callable function and an argument map to call function + with. + + Raises: + ValueError: On invalid configuration. + """ + step_type = preprocessor_step_config.WhichOneof('preprocessing_step') + + if step_type in PREPROCESSING_FUNCTION_MAP: + preprocessing_function = PREPROCESSING_FUNCTION_MAP[step_type] + step_config = _get_step_config_from_proto(preprocessor_step_config, + step_type) + function_args = _get_dict_from_proto(step_config) + return (preprocessing_function, function_args) + + if step_type == 'random_crop_image': + config = preprocessor_step_config.random_crop_image + return (preprocessor.random_crop_image, + { + 'min_object_covered': config.min_object_covered, + 'aspect_ratio_range': (config.min_aspect_ratio, + config.max_aspect_ratio), + 'area_range': (config.min_area, config.max_area), + 'overlap_thresh': config.overlap_thresh, + 'random_coef': config.random_coef, + }) + + if step_type == 'random_pad_image': + config = preprocessor_step_config.random_pad_image + min_image_size = None + if (config.HasField('min_image_height') != + config.HasField('min_image_width')): + raise ValueError('min_image_height and min_image_width should be either ' + 'both set or both unset.') + if config.HasField('min_image_height'): + min_image_size = (config.min_image_height, config.min_image_width) + + max_image_size = None + if (config.HasField('max_image_height') != + config.HasField('max_image_width')): + raise ValueError('max_image_height and max_image_width should be either ' + 'both set or both unset.') + if config.HasField('max_image_height'): + max_image_size = (config.max_image_height, config.max_image_width) + + pad_color = config.pad_color + if pad_color and len(pad_color) != 3: + raise ValueError('pad_color should have 3 elements (RGB) if set!') + if not pad_color: + pad_color = None + return (preprocessor.random_pad_image, + { + 'min_image_size': min_image_size, + 
'max_image_size': max_image_size,
+              'pad_color': pad_color,
+          })
+
+  if step_type == 'random_crop_pad_image':
+    config = preprocessor_step_config.random_crop_pad_image
+    min_padded_size_ratio = config.min_padded_size_ratio
+    if min_padded_size_ratio and len(min_padded_size_ratio) != 2:
+      raise ValueError('min_padded_size_ratio should have 2 elements if set!')
+    max_padded_size_ratio = config.max_padded_size_ratio
+    if max_padded_size_ratio and len(max_padded_size_ratio) != 2:
+      raise ValueError('max_padded_size_ratio should have 2 elements if set!')
+    pad_color = config.pad_color
+    if pad_color and len(pad_color) != 3:
+      raise ValueError('pad_color should have 3 elements if set!')
+    return (preprocessor.random_crop_pad_image,
+            {
+                'min_object_covered': config.min_object_covered,
+                'aspect_ratio_range': (config.min_aspect_ratio,
+                                       config.max_aspect_ratio),
+                'area_range': (config.min_area, config.max_area),
+                'overlap_thresh': config.overlap_thresh,
+                'random_coef': config.random_coef,
+                'min_padded_size_ratio': (min_padded_size_ratio if
+                                          min_padded_size_ratio else None),
+                'max_padded_size_ratio': (max_padded_size_ratio if
+                                          max_padded_size_ratio else None),
+                'pad_color': (pad_color if pad_color else None),
+            })
+
+  if step_type == 'random_resize_method':
+    config = preprocessor_step_config.random_resize_method
+    return (preprocessor.random_resize_method,
+            {
+                'target_size': [config.target_height, config.target_width],
+            })
+
+  if step_type == 'resize_image':
+    config = preprocessor_step_config.resize_image
+    method = RESIZE_METHOD_MAP[config.method]
+    return (preprocessor.resize_image,
+            {
+                'new_height': config.new_height,
+                'new_width': config.new_width,
+                'method': method
+            })
+
+  if step_type == 'ssd_random_crop':
+    config = preprocessor_step_config.ssd_random_crop
+    if config.operations:
+      min_object_covered = [op.min_object_covered for op in config.operations]
+      aspect_ratio_range = [(op.min_aspect_ratio, op.max_aspect_ratio)
+                            for op in config.operations]
+      area_range = [(op.min_area, op.max_area) for op in config.operations]
+      overlap_thresh = [op.overlap_thresh for op in config.operations]
+      random_coef = [op.random_coef for op in config.operations]
+      return (preprocessor.ssd_random_crop,
+              {
+                  'min_object_covered': min_object_covered,
+                  'aspect_ratio_range': aspect_ratio_range,
+                  'area_range': area_range,
+                  'overlap_thresh': overlap_thresh,
+                  'random_coef': random_coef,
+              })
+    return (preprocessor.ssd_random_crop, {})
+
+  if step_type == 'ssd_random_crop_pad':
+    config = preprocessor_step_config.ssd_random_crop_pad
+    if config.operations:
+      min_object_covered = [op.min_object_covered for op in config.operations]
+      aspect_ratio_range = [(op.min_aspect_ratio, op.max_aspect_ratio)
+                            for op in config.operations]
+      area_range = [(op.min_area, op.max_area) for op in config.operations]
+      overlap_thresh = [op.overlap_thresh for op in config.operations]
+      random_coef = [op.random_coef for op in config.operations]
+      min_padded_size_ratio = [
+          (op.min_padded_size_ratio[0], op.min_padded_size_ratio[1])
+          for op in config.operations]
+      max_padded_size_ratio = [
+          (op.max_padded_size_ratio[0], op.max_padded_size_ratio[1])
+          for op in config.operations]
+      pad_color = [(op.pad_color_r, op.pad_color_g, op.pad_color_b)
+                   for op in config.operations]
+      return (preprocessor.ssd_random_crop_pad,
+              {
+                  'min_object_covered': min_object_covered,
+                  'aspect_ratio_range': aspect_ratio_range,
+                  'area_range': area_range,
+                  'overlap_thresh': overlap_thresh,
+                  'random_coef': random_coef,
'min_padded_size_ratio': min_padded_size_ratio, + 'max_padded_size_ratio': max_padded_size_ratio, + 'pad_color': pad_color, + }) + return (preprocessor.ssd_random_crop_pad, {}) + + if step_type == 'ssd_random_crop_fixed_aspect_ratio': + config = preprocessor_step_config.ssd_random_crop_fixed_aspect_ratio + if config.operations: + min_object_covered = [op.min_object_covered for op in config.operations] + area_range = [(op.min_area, op.max_area) for op in config.operations] + overlap_thresh = [op.overlap_thresh for op in config.operations] + random_coef = [op.random_coef for op in config.operations] + return (preprocessor.ssd_random_crop_fixed_aspect_ratio, + { + 'min_object_covered': min_object_covered, + 'aspect_ratio': config.aspect_ratio, + 'area_range': area_range, + 'overlap_thresh': overlap_thresh, + 'random_coef': random_coef, + }) + return (preprocessor.ssd_random_crop_fixed_aspect_ratio, {}) + + raise ValueError('Unknown preprocessing step.') diff --git a/object_detection/builders/preprocessor_builder_test.py b/object_detection/builders/preprocessor_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..8f8ba253d85f7632876ad72888204e0992c7c95b --- /dev/null +++ b/object_detection/builders/preprocessor_builder_test.py @@ -0,0 +1,452 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
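The tests that follow verify the `(function, args)` pairs returned by the builder above. For context, a hedged sketch of how a caller can apply such a pair; it assumes, as is the case for the image-only steps, that the preprocessor function takes the image tensor as its first positional argument:

```
import functools

import tensorflow as tf
from google.protobuf import text_format

from object_detection.builders import preprocessor_builder
from object_detection.protos import preprocessor_pb2

step_proto = preprocessor_pb2.PreprocessingStep()
text_format.Merge('random_adjust_brightness { max_delta: 0.2 }', step_proto)
function, kwargs = preprocessor_builder.build(step_proto)

# Bind the configured arguments once, then apply the step to an image tensor.
preprocess_step = functools.partial(function, **kwargs)
image = tf.zeros([320, 320, 3], dtype=tf.float32)  # placeholder input, illustrative only
augmented_image = preprocess_step(image)
```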
+# ============================================================================== + +"""Tests for preprocessor_builder.""" + +import tensorflow as tf + +from google.protobuf import text_format + +from object_detection.builders import preprocessor_builder +from object_detection.core import preprocessor +from object_detection.protos import preprocessor_pb2 + + +class PreprocessorBuilderTest(tf.test.TestCase): + + def assert_dictionary_close(self, dict1, dict2): + """Helper to check if two dicts with floatst or integers are close.""" + self.assertEqual(sorted(dict1.keys()), sorted(dict2.keys())) + for key in dict1: + value = dict1[key] + if isinstance(value, float): + self.assertAlmostEqual(value, dict2[key]) + else: + self.assertEqual(value, dict2[key]) + + def test_build_normalize_image(self): + preprocessor_text_proto = """ + normalize_image { + original_minval: 0.0 + original_maxval: 255.0 + target_minval: -1.0 + target_maxval: 1.0 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.normalize_image) + self.assertEqual(args, { + 'original_minval': 0.0, + 'original_maxval': 255.0, + 'target_minval': -1.0, + 'target_maxval': 1.0, + }) + + def test_build_random_horizontal_flip(self): + preprocessor_text_proto = """ + random_horizontal_flip { + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_horizontal_flip) + self.assertEqual(args, {}) + + def test_build_random_pixel_value_scale(self): + preprocessor_text_proto = """ + random_pixel_value_scale { + minval: 0.8 + maxval: 1.2 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_pixel_value_scale) + self.assert_dictionary_close(args, {'minval': 0.8, 'maxval': 1.2}) + + def test_build_random_image_scale(self): + preprocessor_text_proto = """ + random_image_scale { + min_scale_ratio: 0.8 + max_scale_ratio: 2.2 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_image_scale) + self.assert_dictionary_close(args, {'min_scale_ratio': 0.8, + 'max_scale_ratio': 2.2}) + + def test_build_random_rgb_to_gray(self): + preprocessor_text_proto = """ + random_rgb_to_gray { + probability: 0.8 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_rgb_to_gray) + self.assert_dictionary_close(args, {'probability': 0.8}) + + def test_build_random_adjust_brightness(self): + preprocessor_text_proto = """ + random_adjust_brightness { + max_delta: 0.2 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, 
preprocessor.random_adjust_brightness) + self.assert_dictionary_close(args, {'max_delta': 0.2}) + + def test_build_random_adjust_contrast(self): + preprocessor_text_proto = """ + random_adjust_contrast { + min_delta: 0.7 + max_delta: 1.1 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_adjust_contrast) + self.assert_dictionary_close(args, {'min_delta': 0.7, 'max_delta': 1.1}) + + def test_build_random_adjust_hue(self): + preprocessor_text_proto = """ + random_adjust_hue { + max_delta: 0.01 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_adjust_hue) + self.assert_dictionary_close(args, {'max_delta': 0.01}) + + def test_build_random_adjust_saturation(self): + preprocessor_text_proto = """ + random_adjust_saturation { + min_delta: 0.75 + max_delta: 1.15 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_adjust_saturation) + self.assert_dictionary_close(args, {'min_delta': 0.75, 'max_delta': 1.15}) + + def test_build_random_distort_color(self): + preprocessor_text_proto = """ + random_distort_color { + color_ordering: 1 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_distort_color) + self.assertEqual(args, {'color_ordering': 1}) + + def test_build_random_jitter_boxes(self): + preprocessor_text_proto = """ + random_jitter_boxes { + ratio: 0.1 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_jitter_boxes) + self.assert_dictionary_close(args, {'ratio': 0.1}) + + def test_build_random_crop_image(self): + preprocessor_text_proto = """ + random_crop_image { + min_object_covered: 0.75 + min_aspect_ratio: 0.75 + max_aspect_ratio: 1.5 + min_area: 0.25 + max_area: 0.875 + overlap_thresh: 0.5 + random_coef: 0.125 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_crop_image) + self.assertEqual(args, { + 'min_object_covered': 0.75, + 'aspect_ratio_range': (0.75, 1.5), + 'area_range': (0.25, 0.875), + 'overlap_thresh': 0.5, + 'random_coef': 0.125, + }) + + def test_build_random_pad_image(self): + preprocessor_text_proto = """ + random_pad_image { + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_pad_image) + self.assertEqual(args, { + 'min_image_size': None, + 'max_image_size': None, + 'pad_color': None, + }) + + def 
test_build_random_crop_pad_image(self): + preprocessor_text_proto = """ + random_crop_pad_image { + min_object_covered: 0.75 + min_aspect_ratio: 0.75 + max_aspect_ratio: 1.5 + min_area: 0.25 + max_area: 0.875 + overlap_thresh: 0.5 + random_coef: 0.125 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_crop_pad_image) + self.assertEqual(args, { + 'min_object_covered': 0.75, + 'aspect_ratio_range': (0.75, 1.5), + 'area_range': (0.25, 0.875), + 'overlap_thresh': 0.5, + 'random_coef': 0.125, + 'min_padded_size_ratio': None, + 'max_padded_size_ratio': None, + 'pad_color': None, + }) + + def test_build_random_crop_to_aspect_ratio(self): + preprocessor_text_proto = """ + random_crop_to_aspect_ratio { + aspect_ratio: 0.85 + overlap_thresh: 0.35 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_crop_to_aspect_ratio) + self.assert_dictionary_close(args, {'aspect_ratio': 0.85, + 'overlap_thresh': 0.35}) + + def test_build_random_black_patches(self): + preprocessor_text_proto = """ + random_black_patches { + max_black_patches: 20 + probability: 0.95 + size_to_image_ratio: 0.12 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_black_patches) + self.assert_dictionary_close(args, {'max_black_patches': 20, + 'probability': 0.95, + 'size_to_image_ratio': 0.12}) + + def test_build_random_resize_method(self): + preprocessor_text_proto = """ + random_resize_method { + target_height: 75 + target_width: 100 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.random_resize_method) + self.assert_dictionary_close(args, {'target_size': [75, 100]}) + + def test_build_scale_boxes_to_pixel_coordinates(self): + preprocessor_text_proto = """ + scale_boxes_to_pixel_coordinates {} + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.scale_boxes_to_pixel_coordinates) + self.assertEqual(args, {}) + + def test_build_resize_image(self): + preprocessor_text_proto = """ + resize_image { + new_height: 75 + new_width: 100 + method: BICUBIC + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.resize_image) + self.assertEqual(args, {'new_height': 75, + 'new_width': 100, + 'method': tf.image.ResizeMethod.BICUBIC}) + + def test_build_subtract_channel_mean(self): + preprocessor_text_proto = """ + subtract_channel_mean { + means: [1.0, 2.0, 3.0] + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, 
args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.subtract_channel_mean) + self.assertEqual(args, {'means': [1.0, 2.0, 3.0]}) + + def test_build_ssd_random_crop(self): + preprocessor_text_proto = """ + ssd_random_crop { + operations { + min_object_covered: 0.0 + min_aspect_ratio: 0.875 + max_aspect_ratio: 1.125 + min_area: 0.5 + max_area: 1.0 + overlap_thresh: 0.0 + random_coef: 0.375 + } + operations { + min_object_covered: 0.25 + min_aspect_ratio: 0.75 + max_aspect_ratio: 1.5 + min_area: 0.5 + max_area: 1.0 + overlap_thresh: 0.25 + random_coef: 0.375 + } + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.ssd_random_crop) + self.assertEqual(args, {'min_object_covered': [0.0, 0.25], + 'aspect_ratio_range': [(0.875, 1.125), (0.75, 1.5)], + 'area_range': [(0.5, 1.0), (0.5, 1.0)], + 'overlap_thresh': [0.0, 0.25], + 'random_coef': [0.375, 0.375]}) + + def test_build_ssd_random_crop_empty_operations(self): + preprocessor_text_proto = """ + ssd_random_crop { + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.ssd_random_crop) + self.assertEqual(args, {}) + + def test_build_ssd_random_crop_pad(self): + preprocessor_text_proto = """ + ssd_random_crop_pad { + operations { + min_object_covered: 0.0 + min_aspect_ratio: 0.875 + max_aspect_ratio: 1.125 + min_area: 0.5 + max_area: 1.0 + overlap_thresh: 0.0 + random_coef: 0.375 + min_padded_size_ratio: [0.0, 0.0] + max_padded_size_ratio: [2.0, 2.0] + pad_color_r: 0.5 + pad_color_g: 0.5 + pad_color_b: 0.5 + } + operations { + min_object_covered: 0.25 + min_aspect_ratio: 0.75 + max_aspect_ratio: 1.5 + min_area: 0.5 + max_area: 1.0 + overlap_thresh: 0.25 + random_coef: 0.375 + min_padded_size_ratio: [0.0, 0.0] + max_padded_size_ratio: [2.0, 2.0] + pad_color_r: 0.5 + pad_color_g: 0.5 + pad_color_b: 0.5 + } + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.ssd_random_crop_pad) + self.assertEqual(args, {'min_object_covered': [0.0, 0.25], + 'aspect_ratio_range': [(0.875, 1.125), (0.75, 1.5)], + 'area_range': [(0.5, 1.0), (0.5, 1.0)], + 'overlap_thresh': [0.0, 0.25], + 'random_coef': [0.375, 0.375], + 'min_padded_size_ratio': [(0.0, 0.0), (0.0, 0.0)], + 'max_padded_size_ratio': [(2.0, 2.0), (2.0, 2.0)], + 'pad_color': [(0.5, 0.5, 0.5), (0.5, 0.5, 0.5)]}) + + def test_build_ssd_random_crop_fixed_aspect_ratio(self): + preprocessor_text_proto = """ + ssd_random_crop_fixed_aspect_ratio { + operations { + min_object_covered: 0.0 + min_area: 0.5 + max_area: 1.0 + overlap_thresh: 0.0 + random_coef: 0.375 + } + operations { + min_object_covered: 0.25 + min_area: 0.5 + max_area: 1.0 + overlap_thresh: 0.25 + random_coef: 0.375 + } + aspect_ratio: 0.875 + } + """ + preprocessor_proto = preprocessor_pb2.PreprocessingStep() + text_format.Merge(preprocessor_text_proto, preprocessor_proto) + function, args = preprocessor_builder.build(preprocessor_proto) + self.assertEqual(function, preprocessor.ssd_random_crop_fixed_aspect_ratio) + self.assertEqual(args, 
{'min_object_covered': [0.0, 0.25], + 'aspect_ratio': 0.875, + 'area_range': [(0.5, 1.0), (0.5, 1.0)], + 'overlap_thresh': [0.0, 0.25], + 'random_coef': [0.375, 0.375]}) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/builders/region_similarity_calculator_builder.py b/object_detection/builders/region_similarity_calculator_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..fa1d671754df07043957ccf9e04f651c114c1cf9 --- /dev/null +++ b/object_detection/builders/region_similarity_calculator_builder.py @@ -0,0 +1,56 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Builder for region similarity calculators.""" + +from object_detection.core import region_similarity_calculator +from object_detection.protos import region_similarity_calculator_pb2 + + +def build(region_similarity_calculator_config): + """Builds region similarity calculator based on the configuration. + + Builds one of [IouSimilarity, IoaSimilarity, NegSqDistSimilarity] objects. See + core/region_similarity_calculator.proto for details. + + Args: + region_similarity_calculator_config: RegionSimilarityCalculator + configuration proto. + + Returns: + region_similarity_calculator: RegionSimilarityCalculator object. + + Raises: + ValueError: On unknown region similarity calculator. + """ + + if not isinstance( + region_similarity_calculator_config, + region_similarity_calculator_pb2.RegionSimilarityCalculator): + raise ValueError( + 'region_similarity_calculator_config not of type ' + 'region_similarity_calculator_pb2.RegionsSimilarityCalculator') + + similarity_calculator = region_similarity_calculator_config.WhichOneof( + 'region_similarity') + if similarity_calculator == 'iou_similarity': + return region_similarity_calculator.IouSimilarity() + if similarity_calculator == 'ioa_similarity': + return region_similarity_calculator.IoaSimilarity() + if similarity_calculator == 'neg_sq_dist_similarity': + return region_similarity_calculator.NegSqDistSimilarity() + + raise ValueError('Unknown region similarity calculator.') + diff --git a/object_detection/builders/region_similarity_calculator_builder_test.py b/object_detection/builders/region_similarity_calculator_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..ca3a5512e374fc03f39de1f3f77cf22bc6f6556e --- /dev/null +++ b/object_detection/builders/region_similarity_calculator_builder_test.py @@ -0,0 +1,67 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for region_similarity_calculator_builder.""" + +import tensorflow as tf + +from google.protobuf import text_format +from object_detection.builders import region_similarity_calculator_builder +from object_detection.core import region_similarity_calculator +from object_detection.protos import region_similarity_calculator_pb2 as sim_calc_pb2 + + +class RegionSimilarityCalculatorBuilderTest(tf.test.TestCase): + + def testBuildIoaSimilarityCalculator(self): + similarity_calc_text_proto = """ + ioa_similarity { + } + """ + similarity_calc_proto = sim_calc_pb2.RegionSimilarityCalculator() + text_format.Merge(similarity_calc_text_proto, similarity_calc_proto) + similarity_calc = region_similarity_calculator_builder.build( + similarity_calc_proto) + self.assertTrue(isinstance(similarity_calc, + region_similarity_calculator.IoaSimilarity)) + + def testBuildIouSimilarityCalculator(self): + similarity_calc_text_proto = """ + iou_similarity { + } + """ + similarity_calc_proto = sim_calc_pb2.RegionSimilarityCalculator() + text_format.Merge(similarity_calc_text_proto, similarity_calc_proto) + similarity_calc = region_similarity_calculator_builder.build( + similarity_calc_proto) + self.assertTrue(isinstance(similarity_calc, + region_similarity_calculator.IouSimilarity)) + + def testBuildNegSqDistSimilarityCalculator(self): + similarity_calc_text_proto = """ + neg_sq_dist_similarity { + } + """ + similarity_calc_proto = sim_calc_pb2.RegionSimilarityCalculator() + text_format.Merge(similarity_calc_text_proto, similarity_calc_proto) + similarity_calc = region_similarity_calculator_builder.build( + similarity_calc_proto) + self.assertTrue(isinstance(similarity_calc, + region_similarity_calculator. + NegSqDistSimilarity)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/core/BUILD b/object_detection/core/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..a3384bcc0a4dd7a9fabe78723a701596a28c0b06 --- /dev/null +++ b/object_detection/core/BUILD @@ -0,0 +1,362 @@ +# Tensorflow Object Detection API: Core. 
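+# Added note (not in the original BUILD file): the dependency labels below,
+# e.g. "//tensorflow_models/object_detection/utils:shape_utils", are resolved
+# relative to the workspace root, so these rules appear to assume the
+# object_detection package is checked out under a top-level tensorflow_models/
+# directory of the Bazel workspace.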
+ +package( + default_visibility = ["//visibility:public"], +) + +licenses(["notice"]) +# Apache 2.0 + +py_library( + name = "batcher", + srcs = ["batcher.py"], + deps = [ + ":prefetcher", + ":preprocessor", + ":standard_fields", + "//tensorflow", + ], +) + +py_test( + name = "batcher_test", + srcs = ["batcher_test.py"], + deps = [ + ":batcher", + "//tensorflow", + ], +) + +py_library( + name = "box_list", + srcs = [ + "box_list.py", + ], + deps = [ + "//tensorflow", + ], +) + +py_test( + name = "box_list_test", + srcs = ["box_list_test.py"], + deps = [ + ":box_list", + ], +) + +py_library( + name = "box_list_ops", + srcs = [ + "box_list_ops.py", + ], + deps = [ + ":box_list", + "//tensorflow", + "//tensorflow_models/object_detection/utils:shape_utils", + ], +) + +py_test( + name = "box_list_ops_test", + srcs = ["box_list_ops_test.py"], + deps = [ + ":box_list", + ":box_list_ops", + ], +) + +py_library( + name = "box_coder", + srcs = [ + "box_coder.py", + ], + deps = [ + "//tensorflow", + ], +) + +py_test( + name = "box_coder_test", + srcs = [ + "box_coder_test.py", + ], + deps = [ + ":box_coder", + ":box_list", + "//tensorflow", + ], +) + +py_library( + name = "keypoint_ops", + srcs = [ + "keypoint_ops.py", + ], + deps = [ + "//tensorflow", + ], +) + +py_test( + name = "keypoint_ops_test", + srcs = ["keypoint_ops_test.py"], + deps = [ + ":keypoint_ops", + ], +) + +py_library( + name = "losses", + srcs = ["losses.py"], + deps = [ + ":box_list", + ":box_list_ops", + "//tensorflow", + "//tensorflow_models/object_detection/utils:ops", + ], +) + +py_library( + name = "matcher", + srcs = [ + "matcher.py", + ], + deps = [ + ], +) + +py_library( + name = "model", + srcs = ["model.py"], + deps = [ + ":standard_fields", + ], +) + +py_test( + name = "matcher_test", + srcs = [ + "matcher_test.py", + ], + deps = [ + ":matcher", + "//tensorflow", + ], +) + +py_library( + name = "prefetcher", + srcs = ["prefetcher.py"], + deps = ["//tensorflow"], +) + +py_library( + name = "preprocessor", + srcs = [ + "preprocessor.py", + ], + deps = [ + ":box_list", + ":box_list_ops", + ":keypoint_ops", + ":standard_fields", + "//tensorflow", + ], +) + +py_test( + name = "preprocessor_test", + srcs = [ + "preprocessor_test.py", + ], + deps = [ + ":preprocessor", + "//tensorflow", + ], +) + +py_test( + name = "losses_test", + srcs = ["losses_test.py"], + deps = [ + ":box_list", + ":losses", + ":matcher", + "//tensorflow", + ], +) + +py_test( + name = "prefetcher_test", + srcs = ["prefetcher_test.py"], + deps = [ + ":prefetcher", + "//tensorflow", + ], +) + +py_library( + name = "standard_fields", + srcs = [ + "standard_fields.py", + ], +) + +py_library( + name = "post_processing", + srcs = ["post_processing.py"], + deps = [ + ":box_list", + ":box_list_ops", + ":standard_fields", + "//tensorflow", + ], +) + +py_test( + name = "post_processing_test", + srcs = ["post_processing_test.py"], + deps = [ + ":box_list", + ":box_list_ops", + ":post_processing", + "//tensorflow", + ], +) + +py_library( + name = "target_assigner", + srcs = [ + "target_assigner.py", + ], + deps = [ + ":box_list", + ":box_list_ops", + ":matcher", + ":region_similarity_calculator", + "//tensorflow", + "//tensorflow_models/object_detection/box_coders:faster_rcnn_box_coder", + "//tensorflow_models/object_detection/box_coders:mean_stddev_box_coder", + "//tensorflow_models/object_detection/core:box_coder", + "//tensorflow_models/object_detection/matchers:argmax_matcher", + "//tensorflow_models/object_detection/matchers:bipartite_matcher", + ], +) + 
+py_test( + name = "target_assigner_test", + size = "large", + timeout = "long", + srcs = ["target_assigner_test.py"], + deps = [ + ":box_list", + ":region_similarity_calculator", + ":target_assigner", + "//tensorflow", + "//tensorflow_models/object_detection/box_coders:mean_stddev_box_coder", + "//tensorflow_models/object_detection/matchers:bipartite_matcher", + ], +) + +py_library( + name = "data_decoder", + srcs = ["data_decoder.py"], +) + +py_library( + name = "box_predictor", + srcs = ["box_predictor.py"], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/utils:ops", + "//tensorflow_models/object_detection/utils:static_shape", + ], +) + +py_test( + name = "box_predictor_test", + srcs = ["box_predictor_test.py"], + deps = [ + ":box_predictor", + "//tensorflow", + "//tensorflow_models/object_detection/builders:hyperparams_builder", + "//tensorflow_models/object_detection/protos:hyperparams_py_pb2", + ], +) + +py_library( + name = "region_similarity_calculator", + srcs = [ + "region_similarity_calculator.py", + ], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/core:box_list_ops", + ], +) + +py_test( + name = "region_similarity_calculator_test", + srcs = [ + "region_similarity_calculator_test.py", + ], + deps = [ + ":region_similarity_calculator", + "//tensorflow_models/object_detection/core:box_list", + ], +) + +py_library( + name = "anchor_generator", + srcs = [ + "anchor_generator.py", + ], + deps = [ + "//tensorflow", + ], +) + +py_library( + name = "minibatch_sampler", + srcs = [ + "minibatch_sampler.py", + ], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/utils:ops", + ], +) + +py_test( + name = "minibatch_sampler_test", + srcs = [ + "minibatch_sampler_test.py", + ], + deps = [ + ":minibatch_sampler", + "//tensorflow", + ], +) + +py_library( + name = "balanced_positive_negative_sampler", + srcs = [ + "balanced_positive_negative_sampler.py", + ], + deps = [ + ":minibatch_sampler", + "//tensorflow", + ], +) + +py_test( + name = "balanced_positive_negative_sampler_test", + srcs = [ + "balanced_positive_negative_sampler_test.py", + ], + deps = [ + ":balanced_positive_negative_sampler", + "//tensorflow", + ], +) diff --git a/object_detection/core/__init__.py b/object_detection/core/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/object_detection/core/anchor_generator.py b/object_detection/core/anchor_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..ed6a2bc54dee7ecf0e691005b7d37692604feb20 --- /dev/null +++ b/object_detection/core/anchor_generator.py @@ -0,0 +1,142 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Base anchor generator. + +The job of the anchor generator is to create (or load) a collection +of bounding boxes to be used as anchors. 
+ +Generated anchors are assumed to match some convolutional grid or list of grid +shapes. For example, we might want to generate anchors matching an 8x8 +feature map and a 4x4 feature map. If we place 3 anchors per grid location +on the first feature map and 6 anchors per grid location on the second feature +map, then 3*8*8 + 6*4*4 = 288 anchors are generated in total. + +To support fully convolutional settings, feature map shapes are passed +dynamically at generation time. The number of anchors to place at each location +is static --- implementations of AnchorGenerator must always be able return +the number of anchors that it uses per location for each feature map. +""" +from abc import ABCMeta +from abc import abstractmethod + +import tensorflow as tf + + +class AnchorGenerator(object): + """Abstract base class for anchor generators.""" + __metaclass__ = ABCMeta + + @abstractmethod + def name_scope(self): + """Name scope. + + Must be defined by implementations. + + Returns: + a string representing the name scope of the anchor generation operation. + """ + pass + + @property + def check_num_anchors(self): + """Whether to dynamically check the number of anchors generated. + + Can be overridden by implementations that would like to disable this + behavior. + + Returns: + a boolean controlling whether the Generate function should dynamically + check the number of anchors generated against the mathematically + expected number of anchors. + """ + return True + + @abstractmethod + def num_anchors_per_location(self): + """Returns the number of anchors per spatial location. + + Returns: + a list of integers, one for each expected feature map to be passed to + the `generate` function. + """ + pass + + def generate(self, feature_map_shape_list, **params): + """Generates a collection of bounding boxes to be used as anchors. + + TODO: remove **params from argument list and make stride and offsets (for + multiple_grid_anchor_generator) constructor arguments. + + Args: + feature_map_shape_list: list of (height, width) pairs in the format + [(height_0, width_0), (height_1, width_1), ...] that the generated + anchors must align with. Pairs can be provided as 1-dimensional + integer tensors of length 2 or simply as tuples of integers. + **params: parameters for anchor generation op + + Returns: + boxes: a BoxList holding a collection of N anchor boxes + Raises: + ValueError: if the number of feature map shapes does not match the length + of NumAnchorsPerLocation. + """ + if self.check_num_anchors and ( + len(feature_map_shape_list) != len(self.num_anchors_per_location())): + raise ValueError('Number of feature maps is expected to equal the length ' + 'of `num_anchors_per_location`.') + with tf.name_scope(self.name_scope()): + anchors = self._generate(feature_map_shape_list, **params) + if self.check_num_anchors: + with tf.control_dependencies([ + self._assert_correct_number_of_anchors( + anchors, feature_map_shape_list)]): + anchors.set(tf.identity(anchors.get())) + return anchors + + @abstractmethod + def _generate(self, feature_map_shape_list, **params): + """To be overridden by implementations. + + Args: + feature_map_shape_list: list of (height, width) pairs in the format + [(height_0, width_0), (height_1, width_1), ...] that the generated + anchors must align with. 
+ **params: parameters for anchor generation op + + Returns: + boxes: a BoxList holding a collection of N anchor boxes + """ + pass + + def _assert_correct_number_of_anchors(self, anchors, feature_map_shape_list): + """Assert that correct number of anchors was generated. + + Args: + anchors: box_list.BoxList object holding anchors generated + feature_map_shape_list: list of (height, width) pairs in the format + [(height_0, width_0), (height_1, width_1), ...] that the generated + anchors must align with. + Returns: + Op that raises InvalidArgumentError if the number of anchors does not + match the number of expected anchors. + """ + expected_num_anchors = 0 + for num_anchors_per_location, feature_map_shape in zip( + self.num_anchors_per_location(), feature_map_shape_list): + expected_num_anchors += (num_anchors_per_location + * feature_map_shape[0] + * feature_map_shape[1]) + return tf.assert_equal(expected_num_anchors, anchors.num_boxes()) diff --git a/object_detection/core/balanced_positive_negative_sampler.py b/object_detection/core/balanced_positive_negative_sampler.py new file mode 100644 index 0000000000000000000000000000000000000000..68844c4f9642e21e341f7327bfc6f4414648380a --- /dev/null +++ b/object_detection/core/balanced_positive_negative_sampler.py @@ -0,0 +1,92 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Class to subsample minibatches by balancing positives and negatives. + +Subsamples minibatches based on a pre-specified positive fraction in range +[0,1]. The class presumes there are many more negatives than positive examples: +if the desired batch_size cannot be achieved with the pre-specified positive +fraction, it fills the rest with negative examples. If this is not sufficient +for obtaining the desired batch_size, it returns fewer examples. + +The main function to call is Subsample(self, indicator, labels). For convenience +one can also call SubsampleWeights(self, weights, labels) which is defined in +the minibatch_sampler base class. +""" + +import tensorflow as tf + +from object_detection.core import minibatch_sampler + + +class BalancedPositiveNegativeSampler(minibatch_sampler.MinibatchSampler): + """Subsamples minibatches to a desired balance of positives and negatives.""" + + def __init__(self, positive_fraction=0.5): + """Constructs a minibatch sampler. + + Args: + positive_fraction: desired fraction of positive examples (scalar in [0,1]) + + Raises: + ValueError: if positive_fraction < 0, or positive_fraction > 1 + """ + if positive_fraction < 0 or positive_fraction > 1: + raise ValueError('positive_fraction should be in range [0,1]. ' + 'Received: %s.' % positive_fraction) + self._positive_fraction = positive_fraction + + def subsample(self, indicator, batch_size, labels): + """Returns subsampled minibatch. + + Args: + indicator: boolean tensor of shape [N] whose True entries can be sampled. 
+ batch_size: desired batch size. + labels: boolean tensor of shape [N] denoting positive(=True) and negative + (=False) examples. + + Returns: + is_sampled: boolean tensor of shape [N], True for entries which are + sampled. + + Raises: + ValueError: if labels and indicator are not 1D boolean tensors. + """ + if len(indicator.get_shape().as_list()) != 1: + raise ValueError('indicator must be 1 dimensional, got a tensor of ' + 'shape %s' % indicator.get_shape()) + if len(labels.get_shape().as_list()) != 1: + raise ValueError('labels must be 1 dimensional, got a tensor of ' + 'shape %s' % labels.get_shape()) + if labels.dtype != tf.bool: + raise ValueError('labels should be of type bool. Received: %s' % + labels.dtype) + if indicator.dtype != tf.bool: + raise ValueError('indicator should be of type bool. Received: %s' % + indicator.dtype) + + # Only sample from indicated samples + negative_idx = tf.logical_not(labels) + positive_idx = tf.logical_and(labels, indicator) + negative_idx = tf.logical_and(negative_idx, indicator) + + # Sample positive and negative samples separately + max_num_pos = int(self._positive_fraction * batch_size) + sampled_pos_idx = self.subsample_indicator(positive_idx, max_num_pos) + max_num_neg = batch_size - tf.reduce_sum(tf.cast(sampled_pos_idx, tf.int32)) + sampled_neg_idx = self.subsample_indicator(negative_idx, max_num_neg) + + sampled_idx = tf.logical_or(sampled_pos_idx, sampled_neg_idx) + return sampled_idx diff --git a/object_detection/core/balanced_positive_negative_sampler_test.py b/object_detection/core/balanced_positive_negative_sampler_test.py new file mode 100644 index 0000000000000000000000000000000000000000..23991cf5667b4bcfc5cd292058316c90f91475d2 --- /dev/null +++ b/object_detection/core/balanced_positive_negative_sampler_test.py @@ -0,0 +1,83 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.core.balanced_positive_negative_sampler.""" + +import numpy as np +import tensorflow as tf + +from object_detection.core import balanced_positive_negative_sampler + + +class BalancedPositiveNegativeSamplerTest(tf.test.TestCase): + + def test_subsample_all_examples(self): + numpy_labels = np.random.permutation(300) + indicator = tf.constant(np.ones(300) == 1) + numpy_labels = (numpy_labels - 200) > 0 + + labels = tf.constant(numpy_labels) + + sampler = (balanced_positive_negative_sampler. 
+ BalancedPositiveNegativeSampler()) + is_sampled = sampler.subsample(indicator, 64, labels) + with self.test_session() as sess: + is_sampled = sess.run(is_sampled) + self.assertTrue(sum(is_sampled) == 64) + self.assertTrue(sum(np.logical_and(numpy_labels, is_sampled)) == 32) + self.assertTrue(sum(np.logical_and( + np.logical_not(numpy_labels), is_sampled)) == 32) + + def test_subsample_selection(self): + # Test random sampling when only some examples can be sampled: + # 100 samples, 20 positives, 10 positives cannot be sampled + numpy_labels = np.arange(100) + numpy_indicator = numpy_labels < 90 + indicator = tf.constant(numpy_indicator) + numpy_labels = (numpy_labels - 80) >= 0 + + labels = tf.constant(numpy_labels) + + sampler = (balanced_positive_negative_sampler. + BalancedPositiveNegativeSampler()) + is_sampled = sampler.subsample(indicator, 64, labels) + with self.test_session() as sess: + is_sampled = sess.run(is_sampled) + self.assertTrue(sum(is_sampled) == 64) + self.assertTrue(sum(np.logical_and(numpy_labels, is_sampled)) == 10) + self.assertTrue(sum(np.logical_and( + np.logical_not(numpy_labels), is_sampled)) == 54) + self.assertAllEqual(is_sampled, np.logical_and(is_sampled, + numpy_indicator)) + + def test_raises_error_with_incorrect_label_shape(self): + labels = tf.constant([[True, False, False]]) + indicator = tf.constant([True, False, True]) + sampler = (balanced_positive_negative_sampler. + BalancedPositiveNegativeSampler()) + with self.assertRaises(ValueError): + sampler.subsample(indicator, 64, labels) + + def test_raises_error_with_incorrect_indicator_shape(self): + labels = tf.constant([True, False, False]) + indicator = tf.constant([[True, False, True]]) + sampler = (balanced_positive_negative_sampler. + BalancedPositiveNegativeSampler()) + with self.assertRaises(ValueError): + sampler.subsample(indicator, 64, labels) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/core/batcher.py b/object_detection/core/batcher.py new file mode 100644 index 0000000000000000000000000000000000000000..2325a5edef3c4417b6d8ff47c0833507e6159c7c --- /dev/null +++ b/object_detection/core/batcher.py @@ -0,0 +1,136 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Provides functions to batch a dictionary of input tensors.""" +import collections + +import tensorflow as tf + +from object_detection.core import prefetcher + +rt_shape_str = '_runtime_shapes' + + +class BatchQueue(object): + """BatchQueue class. + + This class creates a batch queue to asynchronously enqueue tensors_dict. + It also adds a FIFO prefetcher so that the batches are readily available + for the consumers. Dequeue ops for a BatchQueue object can be created via + the Dequeue method which evaluates to a batch of tensor_dict. 
+ + Example input pipeline with batching: + ------------------------------------ + key, string_tensor = slim.parallel_reader.parallel_read(...) + tensor_dict = decoder.decode(string_tensor) + tensor_dict = preprocessor.preprocess(tensor_dict, ...) + batch_queue = batcher.BatchQueue(tensor_dict, + batch_size=32, + batch_queue_capacity=2000, + num_batch_queue_threads=8, + prefetch_queue_capacity=20) + tensor_dict = batch_queue.dequeue() + outputs = Model(tensor_dict) + ... + ----------------------------------- + + Notes: + ----- + This class batches tensors of unequal sizes by zero padding and unpadding + them after generating a batch. This can be computationally expensive when + batching tensors (such as images) that are of vastly different sizes. So it is + recommended that the shapes of such tensors be fully defined in tensor_dict + while other lightweight tensors such as bounding box corners and class labels + can be of varying sizes. Use either crop or resize operations to fully define + the shape of an image in tensor_dict. + + It is also recommended to perform any preprocessing operations on tensors + before passing to BatchQueue and subsequently calling the Dequeue method. + + Another caveat is that this class does not read the last batch if it is not + full. The current implementation makes it hard to support that use case. So, + for evaluation, when it is critical to run all the examples through your + network use the input pipeline example mentioned in core/prefetcher.py. + """ + + def __init__(self, tensor_dict, batch_size, batch_queue_capacity, + num_batch_queue_threads, prefetch_queue_capacity): + """Constructs a batch queue holding tensor_dict. + + Args: + tensor_dict: dictionary of tensors to batch. + batch_size: batch size. + batch_queue_capacity: max capacity of the queue from which the tensors are + batched. + num_batch_queue_threads: number of threads to use for batching. + prefetch_queue_capacity: max capacity of the queue used to prefetch + assembled batches. + """ + # Remember static shapes to set shapes of batched tensors. + static_shapes = collections.OrderedDict( + {key: tensor.get_shape() for key, tensor in tensor_dict.items()}) + # Remember runtime shapes to unpad tensors after batching. + runtime_shapes = collections.OrderedDict( + {(key + rt_shape_str): tf.shape(tensor) + for key, tensor in tensor_dict.iteritems()}) + + all_tensors = tensor_dict + all_tensors.update(runtime_shapes) + batched_tensors = tf.train.batch( + all_tensors, + capacity=batch_queue_capacity, + batch_size=batch_size, + dynamic_pad=True, + num_threads=num_batch_queue_threads) + + self._queue = prefetcher.prefetch(batched_tensors, + prefetch_queue_capacity) + self._static_shapes = static_shapes + self._batch_size = batch_size + + def dequeue(self): + """Dequeues a batch of tensor_dict from the BatchQueue. + + TODO: use allow_smaller_final_batch to allow running over the whole eval set + + Returns: + A list of tensor_dicts of the requested batch_size. + """ + batched_tensors = self._queue.dequeue() + # Separate input tensors from tensors containing their runtime shapes. 
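+ # Added note: keys ending in the '_runtime_shapes' suffix (rt_shape_str) carry
+ # the pre-padding shape of the corresponding tensor. The loop below routes
+ # those entries into `shapes` and everything else into `tensors`, so that the
+ # tf.slice call further down can strip the zero padding from each unbatched
+ # tensor using its recorded runtime shape.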
+ tensors = {} + shapes = {} + for key, batched_tensor in batched_tensors.items(): + unbatched_tensor_list = tf.unstack(batched_tensor) + for i, unbatched_tensor in enumerate(unbatched_tensor_list): + if rt_shape_str in key: + shapes[(key[:-len(rt_shape_str)], i)] = unbatched_tensor + else: + tensors[(key, i)] = unbatched_tensor + + # Undo that padding using shapes and create a list of size `batch_size` that + # contains tensor dictionaries. + tensor_dict_list = [] + batch_size = self._batch_size + for batch_id in range(batch_size): + tensor_dict = {} + for key in self._static_shapes: + tensor_dict[key] = tf.slice(tensors[(key, batch_id)], + tf.zeros_like(shapes[(key, batch_id)]), + shapes[(key, batch_id)]) + tensor_dict[key].set_shape(self._static_shapes[key]) + tensor_dict_list.append(tensor_dict) + + return tensor_dict_list diff --git a/object_detection/core/batcher_test.py b/object_detection/core/batcher_test.py new file mode 100644 index 0000000000000000000000000000000000000000..61b4390b4cdcff146b721872ee98f9a48c6f67f0 --- /dev/null +++ b/object_detection/core/batcher_test.py @@ -0,0 +1,158 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.core.batcher.""" + +import numpy as np +import tensorflow as tf + +from object_detection.core import batcher + +slim = tf.contrib.slim + + +class BatcherTest(tf.test.TestCase): + + def test_batch_and_unpad_2d_tensors_of_different_sizes_in_1st_dimension(self): + with self.test_session() as sess: + batch_size = 3 + num_batches = 2 + examples = tf.Variable(tf.constant(2, dtype=tf.int32)) + counter = examples.count_up_to(num_batches * batch_size + 2) + boxes = tf.tile( + tf.reshape(tf.range(4), [1, 4]), tf.stack([counter, tf.constant(1)])) + batch_queue = batcher.BatchQueue( + tensor_dict={'boxes': boxes}, + batch_size=batch_size, + batch_queue_capacity=100, + num_batch_queue_threads=1, + prefetch_queue_capacity=100) + batch = batch_queue.dequeue() + + for tensor_dict in batch: + for tensor in tensor_dict.values(): + self.assertAllEqual([None, 4], tensor.get_shape().as_list()) + + tf.initialize_all_variables().run() + with slim.queues.QueueRunners(sess): + i = 2 + for _ in range(num_batches): + batch_np = sess.run(batch) + for tensor_dict in batch_np: + for tensor in tensor_dict.values(): + self.assertAllEqual(tensor, np.tile(np.arange(4), (i, 1))) + i += 1 + with self.assertRaises(tf.errors.OutOfRangeError): + sess.run(batch) + + def test_batch_and_unpad_2d_tensors_of_different_sizes_in_all_dimensions( + self): + with self.test_session() as sess: + batch_size = 3 + num_batches = 2 + examples = tf.Variable(tf.constant(2, dtype=tf.int32)) + counter = examples.count_up_to(num_batches * batch_size + 2) + image = tf.reshape( + tf.range(counter * counter), tf.stack([counter, counter])) + batch_queue = batcher.BatchQueue( + tensor_dict={'image': image}, + batch_size=batch_size, + 
batch_queue_capacity=100, + num_batch_queue_threads=1, + prefetch_queue_capacity=100) + batch = batch_queue.dequeue() + + for tensor_dict in batch: + for tensor in tensor_dict.values(): + self.assertAllEqual([None, None], tensor.get_shape().as_list()) + + tf.initialize_all_variables().run() + with slim.queues.QueueRunners(sess): + i = 2 + for _ in range(num_batches): + batch_np = sess.run(batch) + for tensor_dict in batch_np: + for tensor in tensor_dict.values(): + self.assertAllEqual(tensor, np.arange(i * i).reshape((i, i))) + i += 1 + with self.assertRaises(tf.errors.OutOfRangeError): + sess.run(batch) + + def test_batch_and_unpad_2d_tensors_of_same_size_in_all_dimensions(self): + with self.test_session() as sess: + batch_size = 3 + num_batches = 2 + examples = tf.Variable(tf.constant(1, dtype=tf.int32)) + counter = examples.count_up_to(num_batches * batch_size + 1) + image = tf.reshape(tf.range(1, 13), [4, 3]) * counter + batch_queue = batcher.BatchQueue( + tensor_dict={'image': image}, + batch_size=batch_size, + batch_queue_capacity=100, + num_batch_queue_threads=1, + prefetch_queue_capacity=100) + batch = batch_queue.dequeue() + + for tensor_dict in batch: + for tensor in tensor_dict.values(): + self.assertAllEqual([4, 3], tensor.get_shape().as_list()) + + tf.initialize_all_variables().run() + with slim.queues.QueueRunners(sess): + i = 1 + for _ in range(num_batches): + batch_np = sess.run(batch) + for tensor_dict in batch_np: + for tensor in tensor_dict.values(): + self.assertAllEqual(tensor, np.arange(1, 13).reshape((4, 3)) * i) + i += 1 + with self.assertRaises(tf.errors.OutOfRangeError): + sess.run(batch) + + def test_batcher_when_batch_size_is_one(self): + with self.test_session() as sess: + batch_size = 1 + num_batches = 2 + examples = tf.Variable(tf.constant(2, dtype=tf.int32)) + counter = examples.count_up_to(num_batches * batch_size + 2) + image = tf.reshape( + tf.range(counter * counter), tf.stack([counter, counter])) + batch_queue = batcher.BatchQueue( + tensor_dict={'image': image}, + batch_size=batch_size, + batch_queue_capacity=100, + num_batch_queue_threads=1, + prefetch_queue_capacity=100) + batch = batch_queue.dequeue() + + for tensor_dict in batch: + for tensor in tensor_dict.values(): + self.assertAllEqual([None, None], tensor.get_shape().as_list()) + + tf.initialize_all_variables().run() + with slim.queues.QueueRunners(sess): + i = 2 + for _ in range(num_batches): + batch_np = sess.run(batch) + for tensor_dict in batch_np: + for tensor in tensor_dict.values(): + self.assertAllEqual(tensor, np.arange(i * i).reshape((i, i))) + i += 1 + with self.assertRaises(tf.errors.OutOfRangeError): + sess.run(batch) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/core/box_coder.py b/object_detection/core/box_coder.py new file mode 100644 index 0000000000000000000000000000000000000000..f20ac956dfbce1fa69d1b9e6f5b023b704e1ec8a --- /dev/null +++ b/object_detection/core/box_coder.py @@ -0,0 +1,151 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Base box coder. + +Box coders convert between coordinate frames, namely image-centric +(with (0,0) on the top left of image) and anchor-centric (with (0,0) being +defined by a specific anchor). + +Users of a BoxCoder can call two methods: + encode: which encodes a box with respect to a given anchor + (or rather, a tensor of boxes wrt a corresponding tensor of anchors) and + decode: which inverts this encoding with a decode operation. +In both cases, the arguments are assumed to be in 1-1 correspondence already; +it is not the job of a BoxCoder to perform matching. +""" +from abc import ABCMeta +from abc import abstractmethod +from abc import abstractproperty + +import tensorflow as tf + + +# Box coder types. +FASTER_RCNN = 'faster_rcnn' +KEYPOINT = 'keypoint' +MEAN_STDDEV = 'mean_stddev' +SQUARE = 'square' + + +class BoxCoder(object): + """Abstract base class for box coder.""" + __metaclass__ = ABCMeta + + @abstractproperty + def code_size(self): + """Return the size of each code. + + This number is a constant and should agree with the output of the `encode` + op (e.g. if rel_codes is the output of self.encode(...), then it should have + shape [N, code_size()]). This abstractproperty should be overridden by + implementations. + + Returns: + an integer constant + """ + pass + + def encode(self, boxes, anchors): + """Encode a box list relative to an anchor collection. + + Args: + boxes: BoxList holding N boxes to be encoded + anchors: BoxList of N anchors + + Returns: + a tensor representing N relative-encoded boxes + """ + with tf.name_scope('Encode'): + return self._encode(boxes, anchors) + + def decode(self, rel_codes, anchors): + """Decode boxes that are encoded relative to an anchor collection. + + Args: + rel_codes: a tensor representing N relative-encoded boxes + anchors: BoxList of anchors + + Returns: + boxlist: BoxList holding N boxes encoded in the ordinary way (i.e., + with corners y_min, x_min, y_max, x_max) + """ + with tf.name_scope('Decode'): + return self._decode(rel_codes, anchors) + + @abstractmethod + def _encode(self, boxes, anchors): + """Method to be overriden by implementations. + + Args: + boxes: BoxList holding N boxes to be encoded + anchors: BoxList of N anchors + + Returns: + a tensor representing N relative-encoded boxes + """ + pass + + @abstractmethod + def _decode(self, rel_codes, anchors): + """Method to be overriden by implementations. + + Args: + rel_codes: a tensor representing N relative-encoded boxes + anchors: BoxList of anchors + + Returns: + boxlist: BoxList holding N boxes encoded in the ordinary way (i.e., + with corners y_min, x_min, y_max, x_max) + """ + pass + + +def batch_decode(encoded_boxes, box_coder, anchors): + """Decode a batch of encoded boxes. + + This op takes a batch of encoded bounding boxes and transforms + them to a batch of bounding boxes specified by their corners in + the order of [y_min, x_min, y_max, x_max]. + + Args: + encoded_boxes: a float32 tensor of shape [batch_size, num_anchors, + code_size] representing the location of the objects. + box_coder: a BoxCoder object. + anchors: a BoxList of anchors used to encode `encoded_boxes`. + + Returns: + decoded_boxes: a float32 tensor of shape [batch_size, num_anchors, + coder_size] representing the corners of the objects in the order + of [y_min, x_min, y_max, x_max]. 
+ + Raises: + ValueError: if batch sizes of the inputs are inconsistent, or if + the number of anchors inferred from encoded_boxes and anchors are + inconsistent. + """ + encoded_boxes.get_shape().assert_has_rank(3) + if encoded_boxes.get_shape()[1].value != anchors.num_boxes_static(): + raise ValueError('The number of anchors inferred from encoded_boxes' + ' and anchors are inconsistent: shape[1] of encoded_boxes' + ' %s should be equal to the number of anchors: %s.' % + (encoded_boxes.get_shape()[1].value, + anchors.num_boxes_static())) + + decoded_boxes = tf.stack([ + box_coder.decode(boxes, anchors).get() + for boxes in tf.unstack(encoded_boxes) + ]) + return decoded_boxes diff --git a/object_detection/core/box_coder_test.py b/object_detection/core/box_coder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..c087a325275f84604a114d064e050147001d32d0 --- /dev/null +++ b/object_detection/core/box_coder_test.py @@ -0,0 +1,61 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.core.box_coder.""" + +import tensorflow as tf + +from object_detection.core import box_coder +from object_detection.core import box_list + + +class MockBoxCoder(box_coder.BoxCoder): + """Test BoxCoder that encodes/decodes using the multiply-by-two function.""" + + def code_size(self): + return 4 + + def _encode(self, boxes, anchors): + return 2.0 * boxes.get() + + def _decode(self, rel_codes, anchors): + return box_list.BoxList(rel_codes / 2.0) + + +class BoxCoderTest(tf.test.TestCase): + + def test_batch_decode(self): + mock_anchor_corners = tf.constant( + [[0, 0.1, 0.2, 0.3], [0.2, 0.4, 0.4, 0.6]], tf.float32) + mock_anchors = box_list.BoxList(mock_anchor_corners) + mock_box_coder = MockBoxCoder() + + expected_boxes = [[[0.0, 0.1, 0.5, 0.6], [0.5, 0.6, 0.7, 0.8]], + [[0.1, 0.2, 0.3, 0.4], [0.7, 0.8, 0.9, 1.0]]] + + encoded_boxes_list = [mock_box_coder.encode( + box_list.BoxList(tf.constant(boxes)), mock_anchors) + for boxes in expected_boxes] + encoded_boxes = tf.stack(encoded_boxes_list) + decoded_boxes = box_coder.batch_decode( + encoded_boxes, mock_box_coder, mock_anchors) + + with self.test_session() as sess: + decoded_boxes_result = sess.run(decoded_boxes) + self.assertAllClose(expected_boxes, decoded_boxes_result) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/core/box_list.py b/object_detection/core/box_list.py new file mode 100644 index 0000000000000000000000000000000000000000..c0196f053030b103a6021ac159f6203f77ba1eed --- /dev/null +++ b/object_detection/core/box_list.py @@ -0,0 +1,207 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Bounding Box List definition. + +BoxList represents a list of bounding boxes as tensorflow +tensors, where each bounding box is represented as a row of 4 numbers, +[y_min, x_min, y_max, x_max]. It is assumed that all bounding boxes +within a given list correspond to a single image. See also +box_list_ops.py for common box related operations (such as area, iou, etc). + +Optionally, users can add additional related fields (such as weights). +We assume the following things to be true about fields: +* they correspond to boxes in the box_list along the 0th dimension +* they have inferrable rank at graph construction time +* all dimensions except for possibly the 0th can be inferred + (i.e., not None) at graph construction time. + +Some other notes: + * Following tensorflow conventions, we use height, width ordering, + and correspondingly, y,x (or ymin, xmin, ymax, xmax) ordering + * Tensors are always provided as (flat) [N, 4] tensors. +""" + +import tensorflow as tf + + +class BoxList(object): + """Box collection.""" + + def __init__(self, boxes): + """Constructs box collection. + + Args: + boxes: a tensor of shape [N, 4] representing box corners + + Raises: + ValueError: if invalid dimensions for bbox data or if bbox data is not in + float32 format. + """ + if len(boxes.get_shape()) != 2 or boxes.get_shape()[-1] != 4: + raise ValueError('Invalid dimensions for box data.') + if boxes.dtype != tf.float32: + raise ValueError('Invalid tensor type: should be tf.float32') + self.data = {'boxes': boxes} + + def num_boxes(self): + """Returns number of boxes held in collection. + + Returns: + a tensor representing the number of boxes held in the collection. + """ + return tf.shape(self.data['boxes'])[0] + + def num_boxes_static(self): + """Returns number of boxes held in collection. + + This number is inferred at graph construction time rather than run-time. + + Returns: + Number of boxes held in collection (integer) or None if this is not + inferrable at graph construction time. + """ + return self.data['boxes'].get_shape()[0].value + + def get_all_fields(self): + """Returns all fields.""" + return self.data.keys() + + def get_extra_fields(self): + """Returns all non-box fields (i.e., everything not named 'boxes').""" + return [k for k in self.data.keys() if k != 'boxes'] + + def add_field(self, field, field_data): + """Add field to box list. + + This method can be used to add related box data such as + weights/labels, etc. + + Args: + field: a string key to access the data via `get` + field_data: a tensor containing the data to store in the BoxList + """ + self.data[field] = field_data + + def has_field(self, field): + return field in self.data + + def get(self): + """Convenience function for accessing box coordinates. + + Returns: + a tensor with shape [N, 4] representing box coordinates. + """ + return self.get_field('boxes') + + def set(self, boxes): + """Convenience function for setting box coordinates. 
+ + Args: + boxes: a tensor of shape [N, 4] representing box corners + + Raises: + ValueError: if invalid dimensions for bbox data + """ + if len(boxes.get_shape()) != 2 or boxes.get_shape()[-1] != 4: + raise ValueError('Invalid dimensions for box data.') + self.data['boxes'] = boxes + + def get_field(self, field): + """Accesses a box collection and associated fields. + + This function returns specified field with object; if no field is specified, + it returns the box coordinates. + + Args: + field: this optional string parameter can be used to specify + a related field to be accessed. + + Returns: + a tensor representing the box collection or an associated field. + + Raises: + ValueError: if invalid field + """ + if not self.has_field(field): + raise ValueError('field ' + str(field) + ' does not exist') + return self.data[field] + + def set_field(self, field, value): + """Sets the value of a field. + + Updates the field of a box_list with a given value. + + Args: + field: (string) name of the field to set value. + value: the value to assign to the field. + + Raises: + ValueError: if the box_list does not have specified field. + """ + if not self.has_field(field): + raise ValueError('field %s does not exist' % field) + self.data[field] = value + + def get_center_coordinates_and_sizes(self, scope=None): + """Computes the center coordinates, height and width of the boxes. + + Args: + scope: name scope of the function. + + Returns: + a list of 4 1-D tensors [ycenter, xcenter, height, width]. + """ + with tf.name_scope(scope, 'get_center_coordinates_and_sizes'): + box_corners = self.get() + ymin, xmin, ymax, xmax = tf.unstack(tf.transpose(box_corners)) + width = xmax - xmin + height = ymax - ymin + ycenter = ymin + height / 2. + xcenter = xmin + width / 2. + return [ycenter, xcenter, height, width] + + def transpose_coordinates(self, scope=None): + """Transpose the coordinate representation in a boxlist. + + Args: + scope: name scope of the function. + """ + with tf.name_scope(scope, 'transpose_coordinates'): + y_min, x_min, y_max, x_max = tf.split( + value=self.get(), num_or_size_splits=4, axis=1) + self.set(tf.concat([x_min, y_min, x_max, y_max], 1)) + + def as_tensor_dict(self, fields=None): + """Retrieves specified fields as a dictionary of tensors. + + Args: + fields: (optional) list of fields to return in the dictionary. + If None (default), all fields are returned. + + Returns: + tensor_dict: A dictionary of tensors specified by fields. + + Raises: + ValueError: if specified field is not contained in boxlist. + """ + tensor_dict = {} + if fields is None: + fields = self.get_all_fields() + for field in fields: + if not self.has_field(field): + raise ValueError('boxlist must contain all specified fields') + tensor_dict[field] = self.get_field(field) + return tensor_dict diff --git a/object_detection/core/box_list_ops.py b/object_detection/core/box_list_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..b083fabfb0cc17eb567433568b9793229ea9fe7e --- /dev/null +++ b/object_detection/core/box_list_ops.py @@ -0,0 +1,975 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Bounding Box List operations. + +Example box operations that are supported: + * areas: compute bounding box areas + * iou: pairwise intersection-over-union scores + * sq_dist: pairwise distances between bounding boxes + +Whenever box_list_ops functions output a BoxList, the fields of the incoming +BoxList are retained unless documented otherwise. +""" +import tensorflow as tf + +from object_detection.core import box_list +from object_detection.utils import shape_utils + + +class SortOrder(object): + """Enum class for sort order. + + Attributes: + ascend: ascend order. + descend: descend order. + """ + ascend = 1 + descend = 2 + + +def area(boxlist, scope=None): + """Computes area of boxes. + + Args: + boxlist: BoxList holding N boxes + scope: name scope. + + Returns: + a tensor with shape [N] representing box areas. + """ + with tf.name_scope(scope, 'Area'): + y_min, x_min, y_max, x_max = tf.split( + value=boxlist.get(), num_or_size_splits=4, axis=1) + return tf.squeeze((y_max - y_min) * (x_max - x_min), [1]) + + +def height_width(boxlist, scope=None): + """Computes height and width of boxes in boxlist. + + Args: + boxlist: BoxList holding N boxes + scope: name scope. + + Returns: + Height: A tensor with shape [N] representing box heights. + Width: A tensor with shape [N] representing box widths. + """ + with tf.name_scope(scope, 'HeightWidth'): + y_min, x_min, y_max, x_max = tf.split( + value=boxlist.get(), num_or_size_splits=4, axis=1) + return tf.squeeze(y_max - y_min, [1]), tf.squeeze(x_max - x_min, [1]) + + +def scale(boxlist, y_scale, x_scale, scope=None): + """scale box coordinates in x and y dimensions. + + Args: + boxlist: BoxList holding N boxes + y_scale: (float) scalar tensor + x_scale: (float) scalar tensor + scope: name scope. + + Returns: + boxlist: BoxList holding N boxes + """ + with tf.name_scope(scope, 'Scale'): + y_scale = tf.cast(y_scale, tf.float32) + x_scale = tf.cast(x_scale, tf.float32) + y_min, x_min, y_max, x_max = tf.split( + value=boxlist.get(), num_or_size_splits=4, axis=1) + y_min = y_scale * y_min + y_max = y_scale * y_max + x_min = x_scale * x_min + x_max = x_scale * x_max + scaled_boxlist = box_list.BoxList( + tf.concat([y_min, x_min, y_max, x_max], 1)) + return _copy_extra_fields(scaled_boxlist, boxlist) + + +def clip_to_window(boxlist, window, filter_nonoverlapping=True, scope=None): + """Clip bounding boxes to a window. + + This op clips any input bounding boxes (represented by bounding box + corners) to a window, optionally filtering out boxes that do not + overlap at all with the window. + + Args: + boxlist: BoxList holding M_in boxes + window: a tensor of shape [4] representing the [y_min, x_min, y_max, x_max] + window to which the op should clip boxes. + filter_nonoverlapping: whether to filter out boxes that do not overlap at + all with the window. + scope: name scope. 
+ + Returns: + a BoxList holding M_out boxes where M_out <= M_in + """ + with tf.name_scope(scope, 'ClipToWindow'): + y_min, x_min, y_max, x_max = tf.split( + value=boxlist.get(), num_or_size_splits=4, axis=1) + win_y_min, win_x_min, win_y_max, win_x_max = tf.unstack(window) + y_min_clipped = tf.maximum(tf.minimum(y_min, win_y_max), win_y_min) + y_max_clipped = tf.maximum(tf.minimum(y_max, win_y_max), win_y_min) + x_min_clipped = tf.maximum(tf.minimum(x_min, win_x_max), win_x_min) + x_max_clipped = tf.maximum(tf.minimum(x_max, win_x_max), win_x_min) + clipped = box_list.BoxList( + tf.concat([y_min_clipped, x_min_clipped, y_max_clipped, x_max_clipped], + 1)) + clipped = _copy_extra_fields(clipped, boxlist) + if filter_nonoverlapping: + areas = area(clipped) + nonzero_area_indices = tf.cast( + tf.reshape(tf.where(tf.greater(areas, 0.0)), [-1]), tf.int32) + clipped = gather(clipped, nonzero_area_indices) + return clipped + + +def prune_outside_window(boxlist, window, scope=None): + """Prunes bounding boxes that fall outside a given window. + + This function prunes bounding boxes that even partially fall outside the given + window. See also clip_to_window which only prunes bounding boxes that fall + completely outside the window, and clips any bounding boxes that partially + overflow. + + Args: + boxlist: a BoxList holding M_in boxes. + window: a float tensor of shape [4] representing [ymin, xmin, ymax, xmax] + of the window + scope: name scope. + + Returns: + pruned_corners: a tensor with shape [M_out, 4] where M_out <= M_in + valid_indices: a tensor with shape [M_out] indexing the valid bounding boxes + in the input tensor. + """ + with tf.name_scope(scope, 'PruneOutsideWindow'): + y_min, x_min, y_max, x_max = tf.split( + value=boxlist.get(), num_or_size_splits=4, axis=1) + win_y_min, win_x_min, win_y_max, win_x_max = tf.unstack(window) + coordinate_violations = tf.concat([ + tf.less(y_min, win_y_min), tf.less(x_min, win_x_min), + tf.greater(y_max, win_y_max), tf.greater(x_max, win_x_max) + ], 1) + valid_indices = tf.reshape( + tf.where(tf.logical_not(tf.reduce_any(coordinate_violations, 1))), [-1]) + return gather(boxlist, valid_indices), valid_indices + + +def prune_completely_outside_window(boxlist, window, scope=None): + """Prunes bounding boxes that fall completely outside of the given window. + + The function clip_to_window prunes bounding boxes that fall + completely outside the window, but also clips any bounding boxes that + partially overflow. This function does not clip partially overflowing boxes. + + Args: + boxlist: a BoxList holding M_in boxes. + window: a float tensor of shape [4] representing [ymin, xmin, ymax, xmax] + of the window + scope: name scope. + + Returns: + pruned_corners: a tensor with shape [M_out, 4] where M_out <= M_in + valid_indices: a tensor with shape [M_out] indexing the valid bounding boxes + in the input tensor. 
+  """
+  with tf.name_scope(scope, 'PruneCompletelyOutsideWindow'):
+    y_min, x_min, y_max, x_max = tf.split(
+        value=boxlist.get(), num_or_size_splits=4, axis=1)
+    win_y_min, win_x_min, win_y_max, win_x_max = tf.unstack(window)
+    coordinate_violations = tf.concat([
+        tf.greater_equal(y_min, win_y_max), tf.greater_equal(x_min, win_x_max),
+        tf.less_equal(y_max, win_y_min), tf.less_equal(x_max, win_x_min)
+    ], 1)
+    valid_indices = tf.reshape(
+        tf.where(tf.logical_not(tf.reduce_any(coordinate_violations, 1))), [-1])
+    return gather(boxlist, valid_indices), valid_indices
+
+
+def intersection(boxlist1, boxlist2, scope=None):
+  """Compute pairwise intersection areas between boxes.
+
+  Args:
+    boxlist1: BoxList holding N boxes
+    boxlist2: BoxList holding M boxes
+    scope: name scope.
+
+  Returns:
+    a tensor with shape [N, M] representing pairwise intersections
+  """
+  with tf.name_scope(scope, 'Intersection'):
+    y_min1, x_min1, y_max1, x_max1 = tf.split(
+        value=boxlist1.get(), num_or_size_splits=4, axis=1)
+    y_min2, x_min2, y_max2, x_max2 = tf.split(
+        value=boxlist2.get(), num_or_size_splits=4, axis=1)
+    all_pairs_min_ymax = tf.minimum(y_max1, tf.transpose(y_max2))
+    all_pairs_max_ymin = tf.maximum(y_min1, tf.transpose(y_min2))
+    intersect_heights = tf.maximum(0.0, all_pairs_min_ymax - all_pairs_max_ymin)
+    all_pairs_min_xmax = tf.minimum(x_max1, tf.transpose(x_max2))
+    all_pairs_max_xmin = tf.maximum(x_min1, tf.transpose(x_min2))
+    intersect_widths = tf.maximum(0.0, all_pairs_min_xmax - all_pairs_max_xmin)
+    return intersect_heights * intersect_widths
+
+
+def matched_intersection(boxlist1, boxlist2, scope=None):
+  """Compute intersection areas between corresponding boxes in two boxlists.
+
+  Args:
+    boxlist1: BoxList holding N boxes
+    boxlist2: BoxList holding N boxes
+    scope: name scope.
+
+  Returns:
+    a tensor with shape [N] representing intersections between corresponding
+    boxes
+  """
+  with tf.name_scope(scope, 'MatchedIntersection'):
+    y_min1, x_min1, y_max1, x_max1 = tf.split(
+        value=boxlist1.get(), num_or_size_splits=4, axis=1)
+    y_min2, x_min2, y_max2, x_max2 = tf.split(
+        value=boxlist2.get(), num_or_size_splits=4, axis=1)
+    min_ymax = tf.minimum(y_max1, y_max2)
+    max_ymin = tf.maximum(y_min1, y_min2)
+    intersect_heights = tf.maximum(0.0, min_ymax - max_ymin)
+    min_xmax = tf.minimum(x_max1, x_max2)
+    max_xmin = tf.maximum(x_min1, x_min2)
+    intersect_widths = tf.maximum(0.0, min_xmax - max_xmin)
+    return tf.reshape(intersect_heights * intersect_widths, [-1])
+
+
+def iou(boxlist1, boxlist2, scope=None):
+  """Computes pairwise intersection-over-union between box collections.
+
+  Args:
+    boxlist1: BoxList holding N boxes
+    boxlist2: BoxList holding M boxes
+    scope: name scope.
+
+  Returns:
+    a tensor with shape [N, M] representing pairwise iou scores.
+  """
+  with tf.name_scope(scope, 'IOU'):
+    intersections = intersection(boxlist1, boxlist2)
+    areas1 = area(boxlist1)
+    areas2 = area(boxlist2)
+    unions = (
+        tf.expand_dims(areas1, 1) + tf.expand_dims(areas2, 0) - intersections)
+    return tf.where(
+        tf.equal(intersections, 0.0),
+        tf.zeros_like(intersections), tf.truediv(intersections, unions))
+
+
+def matched_iou(boxlist1, boxlist2, scope=None):
+  """Compute intersection-over-union between corresponding boxes in boxlists.
+
+  Args:
+    boxlist1: BoxList holding N boxes
+    boxlist2: BoxList holding N boxes
+    scope: name scope.
+
+  Returns:
+    a tensor with shape [N] representing iou scores between corresponding boxes.
+ """ + with tf.name_scope(scope, 'MatchedIOU'): + intersections = matched_intersection(boxlist1, boxlist2) + areas1 = area(boxlist1) + areas2 = area(boxlist2) + unions = areas1 + areas2 - intersections + return tf.where( + tf.equal(intersections, 0.0), + tf.zeros_like(intersections), tf.truediv(intersections, unions)) + + +def ioa(boxlist1, boxlist2, scope=None): + """Computes pairwise intersection-over-area between box collections. + + intersection-over-area (IOA) between two boxes box1 and box2 is defined as + their intersection area over box2's area. Note that ioa is not symmetric, + that is, ioa(box1, box2) != ioa(box2, box1). + + Args: + boxlist1: BoxList holding N boxes + boxlist2: BoxList holding M boxes + scope: name scope. + + Returns: + a tensor with shape [N, M] representing pairwise ioa scores. + """ + with tf.name_scope(scope, 'IOA'): + intersections = intersection(boxlist1, boxlist2) + areas = tf.expand_dims(area(boxlist2), 0) + return tf.truediv(intersections, areas) + + +def prune_non_overlapping_boxes( + boxlist1, boxlist2, min_overlap=0.0, scope=None): + """Prunes the boxes in boxlist1 that overlap less than thresh with boxlist2. + + For each box in boxlist1, we want its IOA to be more than minoverlap with + at least one of the boxes in boxlist2. If it does not, we remove it. + + Args: + boxlist1: BoxList holding N boxes. + boxlist2: BoxList holding M boxes. + min_overlap: Minimum required overlap between boxes, to count them as + overlapping. + scope: name scope. + + Returns: + new_boxlist1: A pruned boxlist with size [N', 4]. + keep_inds: A tensor with shape [N'] indexing kept bounding boxes in the + first input BoxList `boxlist1`. + """ + with tf.name_scope(scope, 'PruneNonOverlappingBoxes'): + ioa_ = ioa(boxlist2, boxlist1) # [M, N] tensor + ioa_ = tf.reduce_max(ioa_, reduction_indices=[0]) # [N] tensor + keep_bool = tf.greater_equal(ioa_, tf.constant(min_overlap)) + keep_inds = tf.squeeze(tf.where(keep_bool), squeeze_dims=[1]) + new_boxlist1 = gather(boxlist1, keep_inds) + return new_boxlist1, keep_inds + + +def prune_small_boxes(boxlist, min_side, scope=None): + """Prunes small boxes in the boxlist which have a side smaller than min_side. + + Args: + boxlist: BoxList holding N boxes. + min_side: Minimum width AND height of box to survive pruning. + scope: name scope. + + Returns: + A pruned boxlist. + """ + with tf.name_scope(scope, 'PruneSmallBoxes'): + height, width = height_width(boxlist) + is_valid = tf.logical_and(tf.greater_equal(width, min_side), + tf.greater_equal(height, min_side)) + return gather(boxlist, tf.reshape(tf.where(is_valid), [-1])) + + +def change_coordinate_frame(boxlist, window, scope=None): + """Change coordinate frame of the boxlist to be relative to window's frame. + + Given a window of the form [ymin, xmin, ymax, xmax], + changes bounding box coordinates from boxlist to be relative to this window + (e.g., the min corner maps to (0,0) and the max corner maps to (1,1)). + + An example use case is data augmentation: where we are given groundtruth + boxes (boxlist) and would like to randomly crop the image to some + window (window). In this case we need to change the coordinate frame of + each groundtruth box to be relative to this new window. + + Args: + boxlist: A BoxList object holding N boxes. + window: A rank 1 tensor [4]. + scope: name scope. + + Returns: + Returns a BoxList object with N boxes. 
+ """ + with tf.name_scope(scope, 'ChangeCoordinateFrame'): + win_height = window[2] - window[0] + win_width = window[3] - window[1] + boxlist_new = scale(box_list.BoxList( + boxlist.get() - [window[0], window[1], window[0], window[1]]), + 1.0 / win_height, 1.0 / win_width) + boxlist_new = _copy_extra_fields(boxlist_new, boxlist) + return boxlist_new + + +def sq_dist(boxlist1, boxlist2, scope=None): + """Computes the pairwise squared distances between box corners. + + This op treats each box as if it were a point in a 4d Euclidean space and + computes pairwise squared distances. + + Mathematically, we are given two matrices of box coordinates X and Y, + where X(i,:) is the i'th row of X, containing the 4 numbers defining the + corners of the i'th box in boxlist1. Similarly Y(j,:) corresponds to + boxlist2. We compute + Z(i,j) = ||X(i,:) - Y(j,:)||^2 + = ||X(i,:)||^2 + ||Y(j,:)||^2 - 2 X(i,:)' * Y(j,:), + + Args: + boxlist1: BoxList holding N boxes + boxlist2: BoxList holding M boxes + scope: name scope. + + Returns: + a tensor with shape [N, M] representing pairwise distances + """ + with tf.name_scope(scope, 'SqDist'): + sqnorm1 = tf.reduce_sum(tf.square(boxlist1.get()), 1, keep_dims=True) + sqnorm2 = tf.reduce_sum(tf.square(boxlist2.get()), 1, keep_dims=True) + innerprod = tf.matmul(boxlist1.get(), boxlist2.get(), + transpose_a=False, transpose_b=True) + return sqnorm1 + tf.transpose(sqnorm2) - 2.0 * innerprod + + +def boolean_mask(boxlist, indicator, fields=None, scope=None): + """Select boxes from BoxList according to indicator and return new BoxList. + + `boolean_mask` returns the subset of boxes that are marked as "True" by the + indicator tensor. By default, `boolean_mask` returns boxes corresponding to + the input index list, as well as all additional fields stored in the boxlist + (indexing into the first dimension). However one can optionally only draw + from a subset of fields. + + Args: + boxlist: BoxList holding N boxes + indicator: a rank-1 boolean tensor + fields: (optional) list of fields to also gather from. If None (default), + all fields are gathered from. Pass an empty fields list to only gather + the box coordinates. + scope: name scope. + + Returns: + subboxlist: a BoxList corresponding to the subset of the input BoxList + specified by indicator + Raises: + ValueError: if `indicator` is not a rank-1 boolean tensor. + """ + with tf.name_scope(scope, 'BooleanMask'): + if indicator.shape.ndims != 1: + raise ValueError('indicator should have rank 1') + if indicator.dtype != tf.bool: + raise ValueError('indicator should be a boolean tensor') + subboxlist = box_list.BoxList(tf.boolean_mask(boxlist.get(), indicator)) + if fields is None: + fields = boxlist.get_extra_fields() + for field in fields: + if not boxlist.has_field(field): + raise ValueError('boxlist must contain all specified fields') + subfieldlist = tf.boolean_mask(boxlist.get_field(field), indicator) + subboxlist.add_field(field, subfieldlist) + return subboxlist + + +def gather(boxlist, indices, fields=None, scope=None): + """Gather boxes from BoxList according to indices and return new BoxList. + + By default, `gather` returns boxes corresponding to the input index list, as + well as all additional fields stored in the boxlist (indexing into the + first dimension). However one can optionally only gather from a + subset of fields. + + Args: + boxlist: BoxList holding N boxes + indices: a rank-1 tensor of type int32 / int64 + fields: (optional) list of fields to also gather from. 
If None (default), + all fields are gathered from. Pass an empty fields list to only gather + the box coordinates. + scope: name scope. + + Returns: + subboxlist: a BoxList corresponding to the subset of the input BoxList + specified by indices + Raises: + ValueError: if specified field is not contained in boxlist or if the + indices are not of type int32 + """ + with tf.name_scope(scope, 'Gather'): + if len(indices.shape.as_list()) != 1: + raise ValueError('indices should have rank 1') + if indices.dtype != tf.int32 and indices.dtype != tf.int64: + raise ValueError('indices should be an int32 / int64 tensor') + subboxlist = box_list.BoxList(tf.gather(boxlist.get(), indices)) + if fields is None: + fields = boxlist.get_extra_fields() + for field in fields: + if not boxlist.has_field(field): + raise ValueError('boxlist must contain all specified fields') + subfieldlist = tf.gather(boxlist.get_field(field), indices) + subboxlist.add_field(field, subfieldlist) + return subboxlist + + +def concatenate(boxlists, fields=None, scope=None): + """Concatenate list of BoxLists. + + This op concatenates a list of input BoxLists into a larger BoxList. It also + handles concatenation of BoxList fields as long as the field tensor shapes + are equal except for the first dimension. + + Args: + boxlists: list of BoxList objects + fields: optional list of fields to also concatenate. By default, all + fields from the first BoxList in the list are included in the + concatenation. + scope: name scope. + + Returns: + a BoxList with number of boxes equal to + sum([boxlist.num_boxes() for boxlist in BoxList]) + Raises: + ValueError: if boxlists is invalid (i.e., is not a list, is empty, or + contains non BoxList objects), or if requested fields are not contained in + all boxlists + """ + with tf.name_scope(scope, 'Concatenate'): + if not isinstance(boxlists, list): + raise ValueError('boxlists should be a list') + if not boxlists: + raise ValueError('boxlists should have nonzero length') + for boxlist in boxlists: + if not isinstance(boxlist, box_list.BoxList): + raise ValueError('all elements of boxlists should be BoxList objects') + concatenated = box_list.BoxList( + tf.concat([boxlist.get() for boxlist in boxlists], 0)) + if fields is None: + fields = boxlists[0].get_extra_fields() + for field in fields: + first_field_shape = boxlists[0].get_field(field).get_shape().as_list() + first_field_shape[0] = -1 + if None in first_field_shape: + raise ValueError('field %s must have fully defined shape except for the' + ' 0th dimension.' % field) + for boxlist in boxlists: + if not boxlist.has_field(field): + raise ValueError('boxlist must contain all requested fields') + field_shape = boxlist.get_field(field).get_shape().as_list() + field_shape[0] = -1 + if field_shape != first_field_shape: + raise ValueError('field %s must have same shape for all boxlists ' + 'except for the 0th dimension.' % field) + concatenated_field = tf.concat( + [boxlist.get_field(field) for boxlist in boxlists], 0) + concatenated.add_field(field, concatenated_field) + return concatenated + + +def sort_by_field(boxlist, field, order=SortOrder.descend, scope=None): + """Sort boxes and associated fields according to a scalar field. + + A common use case is reordering the boxes according to descending scores. + + Args: + boxlist: BoxList holding N boxes. + field: A BoxList field for sorting and reordering the BoxList. + order: (Optional) descend or ascend. Default is descend. + scope: name scope. 
+ + Returns: + sorted_boxlist: A sorted BoxList with the field in the specified order. + + Raises: + ValueError: if specified field does not exist + ValueError: if the order is not either descend or ascend + """ + with tf.name_scope(scope, 'SortByField'): + if order != SortOrder.descend and order != SortOrder.ascend: + raise ValueError('Invalid sort order') + + field_to_sort = boxlist.get_field(field) + if len(field_to_sort.shape.as_list()) != 1: + raise ValueError('Field should have rank 1') + + num_boxes = boxlist.num_boxes() + num_entries = tf.size(field_to_sort) + length_assert = tf.Assert( + tf.equal(num_boxes, num_entries), + ['Incorrect field size: actual vs expected.', num_entries, num_boxes]) + + with tf.control_dependencies([length_assert]): + # TODO: Remove with tf.device when top_k operation runs correctly on GPU. + with tf.device('/cpu:0'): + _, sorted_indices = tf.nn.top_k(field_to_sort, num_boxes, sorted=True) + + if order == SortOrder.ascend: + sorted_indices = tf.reverse_v2(sorted_indices, [0]) + + return gather(boxlist, sorted_indices) + + +def visualize_boxes_in_image(image, boxlist, normalized=False, scope=None): + """Overlay bounding box list on image. + + Currently this visualization plots a 1 pixel thick red bounding box on top + of the image. Note that tf.image.draw_bounding_boxes essentially is + 1 indexed. + + Args: + image: an image tensor with shape [height, width, 3] + boxlist: a BoxList + normalized: (boolean) specify whether corners are to be interpreted + as absolute coordinates in image space or normalized with respect to the + image size. + scope: name scope. + + Returns: + image_and_boxes: an image tensor with shape [height, width, 3] + """ + with tf.name_scope(scope, 'VisualizeBoxesInImage'): + if not normalized: + height, width, _ = tf.unstack(tf.shape(image)) + boxlist = scale(boxlist, + 1.0 / tf.cast(height, tf.float32), + 1.0 / tf.cast(width, tf.float32)) + corners = tf.expand_dims(boxlist.get(), 0) + image = tf.expand_dims(image, 0) + return tf.squeeze(tf.image.draw_bounding_boxes(image, corners), [0]) + + +def filter_field_value_equals(boxlist, field, value, scope=None): + """Filter to keep only boxes with field entries equal to the given value. + + Args: + boxlist: BoxList holding N boxes. + field: field name for filtering. + value: scalar value. + scope: name scope. + + Returns: + a BoxList holding M boxes where M <= N + + Raises: + ValueError: if boxlist not a BoxList object or if it does not have + the specified field. + """ + with tf.name_scope(scope, 'FilterFieldValueEquals'): + if not isinstance(boxlist, box_list.BoxList): + raise ValueError('boxlist must be a BoxList') + if not boxlist.has_field(field): + raise ValueError('boxlist must contain the specified field') + filter_field = boxlist.get_field(field) + gather_index = tf.reshape(tf.where(tf.equal(filter_field, value)), [-1]) + return gather(boxlist, gather_index) + + +def filter_greater_than(boxlist, thresh, scope=None): + """Filter to keep only boxes with score exceeding a given threshold. + + This op keeps the collection of boxes whose corresponding scores are + greater than the input threshold. + + TODO: Change function name to FilterScoresGreaterThan + + Args: + boxlist: BoxList holding N boxes. Must contain a 'scores' field + representing detection scores. + thresh: scalar threshold + scope: name scope. 
+ + Returns: + a BoxList holding M boxes where M <= N + + Raises: + ValueError: if boxlist not a BoxList object or if it does not + have a scores field + """ + with tf.name_scope(scope, 'FilterGreaterThan'): + if not isinstance(boxlist, box_list.BoxList): + raise ValueError('boxlist must be a BoxList') + if not boxlist.has_field('scores'): + raise ValueError('input boxlist must have \'scores\' field') + scores = boxlist.get_field('scores') + if len(scores.shape.as_list()) > 2: + raise ValueError('Scores should have rank 1 or 2') + if len(scores.shape.as_list()) == 2 and scores.shape.as_list()[1] != 1: + raise ValueError('Scores should have rank 1 or have shape ' + 'consistent with [None, 1]') + high_score_indices = tf.cast(tf.reshape( + tf.where(tf.greater(scores, thresh)), + [-1]), tf.int32) + return gather(boxlist, high_score_indices) + + +def non_max_suppression(boxlist, thresh, max_output_size, scope=None): + """Non maximum suppression. + + This op greedily selects a subset of detection bounding boxes, pruning + away boxes that have high IOU (intersection over union) overlap (> thresh) + with already selected boxes. Note that this only works for a single class --- + to apply NMS to multi-class predictions, use MultiClassNonMaxSuppression. + + Args: + boxlist: BoxList holding N boxes. Must contain a 'scores' field + representing detection scores. + thresh: scalar threshold + max_output_size: maximum number of retained boxes + scope: name scope. + + Returns: + a BoxList holding M boxes where M <= max_output_size + Raises: + ValueError: if thresh is not in [0, 1] + """ + with tf.name_scope(scope, 'NonMaxSuppression'): + if not 0 <= thresh <= 1.0: + raise ValueError('thresh must be between 0 and 1') + if not isinstance(boxlist, box_list.BoxList): + raise ValueError('boxlist must be a BoxList') + if not boxlist.has_field('scores'): + raise ValueError('input boxlist must have \'scores\' field') + selected_indices = tf.image.non_max_suppression( + boxlist.get(), boxlist.get_field('scores'), + max_output_size, iou_threshold=thresh) + return gather(boxlist, selected_indices) + + +def _copy_extra_fields(boxlist_to_copy_to, boxlist_to_copy_from): + """Copies the extra fields of boxlist_to_copy_from to boxlist_to_copy_to. + + Args: + boxlist_to_copy_to: BoxList to which extra fields are copied. + boxlist_to_copy_from: BoxList from which fields are copied. + + Returns: + boxlist_to_copy_to with extra fields. + """ + for field in boxlist_to_copy_from.get_extra_fields(): + boxlist_to_copy_to.add_field(field, boxlist_to_copy_from.get_field(field)) + return boxlist_to_copy_to + + +def to_normalized_coordinates(boxlist, height, width, + check_range=True, scope=None): + """Converts absolute box coordinates to normalized coordinates in [0, 1]. + + Usually one uses the dynamic shape of the image or conv-layer tensor: + boxlist = box_list_ops.to_normalized_coordinates(boxlist, + tf.shape(images)[1], + tf.shape(images)[2]), + + This function raises an assertion failed error at graph execution time when + the maximum coordinate is smaller than 1.01 (which means that coordinates are + already normalized). The value 1.01 is to deal with small rounding errors. + + Args: + boxlist: BoxList with coordinates in terms of pixel-locations. + height: Maximum value for height of absolute box coordinates. + width: Maximum value for width of absolute box coordinates. + check_range: If True, checks if the coordinates are normalized or not. + scope: name scope. + + Returns: + boxlist with normalized coordinates in [0, 1]. 
+ """ + with tf.name_scope(scope, 'ToNormalizedCoordinates'): + height = tf.cast(height, tf.float32) + width = tf.cast(width, tf.float32) + + if check_range: + max_val = tf.reduce_max(boxlist.get()) + max_assert = tf.Assert(tf.greater(max_val, 1.01), + ['max value is lower than 1.01: ', max_val]) + with tf.control_dependencies([max_assert]): + width = tf.identity(width) + + return scale(boxlist, 1 / height, 1 / width) + + +def to_absolute_coordinates(boxlist, height, width, + check_range=True, scope=None): + """Converts normalized box coordinates to absolute pixel coordinates. + + This function raises an assertion failed error when the maximum box coordinate + value is larger than 1.01 (in which case coordinates are already absolute). + + Args: + boxlist: BoxList with coordinates in range [0, 1]. + height: Maximum value for height of absolute box coordinates. + width: Maximum value for width of absolute box coordinates. + check_range: If True, checks if the coordinates are normalized or not. + scope: name scope. + + Returns: + boxlist with absolute coordinates in terms of the image size. + + """ + with tf.name_scope(scope, 'ToAbsoluteCoordinates'): + height = tf.cast(height, tf.float32) + width = tf.cast(width, tf.float32) + + # Ensure range of input boxes is correct. + if check_range: + box_maximum = tf.reduce_max(boxlist.get()) + max_assert = tf.Assert(tf.greater_equal(1.01, box_maximum), + ['maximum box coordinate value is larger ' + 'than 1.01: ', box_maximum]) + with tf.control_dependencies([max_assert]): + width = tf.identity(width) + + return scale(boxlist, height, width) + + +def refine_boxes_multi_class(pool_boxes, + num_classes, + nms_iou_thresh, + nms_max_detections, + voting_iou_thresh=0.5): + """Refines a pool of boxes using non max suppression and box voting. + + Box refinement is done independently for each class. + + Args: + pool_boxes: (BoxList) A collection of boxes to be refined. pool_boxes must + have a rank 1 'scores' field and a rank 1 'classes' field. + num_classes: (int scalar) Number of classes. + nms_iou_thresh: (float scalar) iou threshold for non max suppression (NMS). + nms_max_detections: (int scalar) maximum output size for NMS. + voting_iou_thresh: (float scalar) iou threshold for box voting. + + Returns: + BoxList of refined boxes. + + Raises: + ValueError: if + a) nms_iou_thresh or voting_iou_thresh is not in [0, 1]. + b) pool_boxes is not a BoxList. + c) pool_boxes does not have a scores and classes field. + """ + if not 0.0 <= nms_iou_thresh <= 1.0: + raise ValueError('nms_iou_thresh must be between 0 and 1') + if not 0.0 <= voting_iou_thresh <= 1.0: + raise ValueError('voting_iou_thresh must be between 0 and 1') + if not isinstance(pool_boxes, box_list.BoxList): + raise ValueError('pool_boxes must be a BoxList') + if not pool_boxes.has_field('scores'): + raise ValueError('pool_boxes must have a \'scores\' field') + if not pool_boxes.has_field('classes'): + raise ValueError('pool_boxes must have a \'classes\' field') + + refined_boxes = [] + for i in range(num_classes): + boxes_class = filter_field_value_equals(pool_boxes, 'classes', i) + refined_boxes_class = refine_boxes(boxes_class, nms_iou_thresh, + nms_max_detections, voting_iou_thresh) + refined_boxes.append(refined_boxes_class) + return sort_by_field(concatenate(refined_boxes), 'scores') + + +def refine_boxes(pool_boxes, + nms_iou_thresh, + nms_max_detections, + voting_iou_thresh=0.5): + """Refines a pool of boxes using non max suppression and box voting. 
+ + Args: + pool_boxes: (BoxList) A collection of boxes to be refined. pool_boxes must + have a rank 1 'scores' field. + nms_iou_thresh: (float scalar) iou threshold for non max suppression (NMS). + nms_max_detections: (int scalar) maximum output size for NMS. + voting_iou_thresh: (float scalar) iou threshold for box voting. + + Returns: + BoxList of refined boxes. + + Raises: + ValueError: if + a) nms_iou_thresh or voting_iou_thresh is not in [0, 1]. + b) pool_boxes is not a BoxList. + c) pool_boxes does not have a scores field. + """ + if not 0.0 <= nms_iou_thresh <= 1.0: + raise ValueError('nms_iou_thresh must be between 0 and 1') + if not 0.0 <= voting_iou_thresh <= 1.0: + raise ValueError('voting_iou_thresh must be between 0 and 1') + if not isinstance(pool_boxes, box_list.BoxList): + raise ValueError('pool_boxes must be a BoxList') + if not pool_boxes.has_field('scores'): + raise ValueError('pool_boxes must have a \'scores\' field') + + nms_boxes = non_max_suppression( + pool_boxes, nms_iou_thresh, nms_max_detections) + return box_voting(nms_boxes, pool_boxes, voting_iou_thresh) + + +def box_voting(selected_boxes, pool_boxes, iou_thresh=0.5): + """Performs box voting as described in S. Gidaris and N. Komodakis, ICCV 2015. + + Performs box voting as described in 'Object detection via a multi-region & + semantic segmentation-aware CNN model', Gidaris and Komodakis, ICCV 2015. For + each box 'B' in selected_boxes, we find the set 'S' of boxes in pool_boxes + with iou overlap >= iou_thresh. The location of B is set to the weighted + average location of boxes in S (scores are used for weighting). And the score + of B is set to the average score of boxes in S. + + Args: + selected_boxes: BoxList containing a subset of boxes in pool_boxes. These + boxes are usually selected from pool_boxes using non max suppression. + pool_boxes: BoxList containing a set of (possibly redundant) boxes. + iou_thresh: (float scalar) iou threshold for matching boxes in + selected_boxes and pool_boxes. + + Returns: + BoxList containing averaged locations and scores for each box in + selected_boxes. + + Raises: + ValueError: if + a) selected_boxes or pool_boxes is not a BoxList. + b) if iou_thresh is not in [0, 1]. + c) pool_boxes does not have a scores field. + """ + if not 0.0 <= iou_thresh <= 1.0: + raise ValueError('iou_thresh must be between 0 and 1') + if not isinstance(selected_boxes, box_list.BoxList): + raise ValueError('selected_boxes must be a BoxList') + if not isinstance(pool_boxes, box_list.BoxList): + raise ValueError('pool_boxes must be a BoxList') + if not pool_boxes.has_field('scores'): + raise ValueError('pool_boxes must have a \'scores\' field') + + iou_ = iou(selected_boxes, pool_boxes) + match_indicator = tf.to_float(tf.greater(iou_, iou_thresh)) + num_matches = tf.reduce_sum(match_indicator, 1) + # TODO: Handle the case where some boxes in selected_boxes do not match to any + # boxes in pool_boxes. For such boxes without any matches, we should return + # the original boxes without voting. 
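+
+  # With match_indicator an [N, M] {0, 1} matrix and scores an [M, 1] column,
+  # the voting below computes, for each selected box i:
+  #   averaged_score_i = sum_j(indicator_ij * score_j) / num_matches_i
+  #   voted_box_i = sum_j(indicator_ij * score_j * box_j)
+  #                 / sum_j(indicator_ij * score_j)
+  # i.e. a score-weighted average of all matched pool boxes.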
+ match_assert = tf.Assert( + tf.reduce_all(tf.greater(num_matches, 0)), + ['Each box in selected_boxes must match with at least one box ' + 'in pool_boxes.']) + + scores = tf.expand_dims(pool_boxes.get_field('scores'), 1) + scores_assert = tf.Assert( + tf.reduce_all(tf.greater_equal(scores, 0)), + ['Scores must be non negative.']) + + with tf.control_dependencies([scores_assert, match_assert]): + sum_scores = tf.matmul(match_indicator, scores) + averaged_scores = tf.reshape(sum_scores, [-1]) / num_matches + + box_locations = tf.matmul(match_indicator, + pool_boxes.get() * scores) / sum_scores + averaged_boxes = box_list.BoxList(box_locations) + _copy_extra_fields(averaged_boxes, selected_boxes) + averaged_boxes.add_field('scores', averaged_scores) + return averaged_boxes + + +def pad_or_clip_box_list(boxlist, num_boxes, scope=None): + """Pads or clips all fields of a BoxList. + + Args: + boxlist: A BoxList with arbitrary of number of boxes. + num_boxes: First num_boxes in boxlist are kept. + The fields are zero-padded if num_boxes is bigger than the + actual number of boxes. + scope: name scope. + + Returns: + BoxList with all fields padded or clipped. + """ + with tf.name_scope(scope, 'PadOrClipBoxList'): + subboxlist = box_list.BoxList(shape_utils.pad_or_clip_tensor( + boxlist.get(), num_boxes)) + for field in boxlist.get_extra_fields(): + subfield = shape_utils.pad_or_clip_tensor( + boxlist.get_field(field), num_boxes) + subboxlist.add_field(field, subfield) + return subboxlist diff --git a/object_detection/core/box_list_ops_test.py b/object_detection/core/box_list_ops_test.py new file mode 100644 index 0000000000000000000000000000000000000000..467bb3c67d62a892cb09df2d8c3519d58dab6da0 --- /dev/null +++ b/object_detection/core/box_list_ops_test.py @@ -0,0 +1,962 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.core.box_list_ops.""" +import numpy as np +import tensorflow as tf +from tensorflow.python.framework import errors + +from object_detection.core import box_list +from object_detection.core import box_list_ops + + +class BoxListOpsTest(tf.test.TestCase): + """Tests for common bounding box operations.""" + + def test_area(self): + corners = tf.constant([[0.0, 0.0, 10.0, 20.0], [1.0, 2.0, 3.0, 4.0]]) + exp_output = [200.0, 4.0] + boxes = box_list.BoxList(corners) + areas = box_list_ops.area(boxes) + with self.test_session() as sess: + areas_output = sess.run(areas) + self.assertAllClose(areas_output, exp_output) + + def test_height_width(self): + corners = tf.constant([[0.0, 0.0, 10.0, 20.0], [1.0, 2.0, 3.0, 4.0]]) + exp_output_heights = [10., 2.] + exp_output_widths = [20., 2.] 
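+    # Corners are [ymin, xmin, ymax, xmax], so height = ymax - ymin
+    # (10 - 0 = 10, 3 - 1 = 2) and width = xmax - xmin (20 - 0 = 20, 4 - 2 = 2).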
+ boxes = box_list.BoxList(corners) + heights, widths = box_list_ops.height_width(boxes) + with self.test_session() as sess: + output_heights, output_widths = sess.run([heights, widths]) + self.assertAllClose(output_heights, exp_output_heights) + self.assertAllClose(output_widths, exp_output_widths) + + def test_scale(self): + corners = tf.constant([[0, 0, 100, 200], [50, 120, 100, 140]], + dtype=tf.float32) + boxes = box_list.BoxList(corners) + boxes.add_field('extra_data', tf.constant([[1], [2]])) + + y_scale = tf.constant(1.0/100) + x_scale = tf.constant(1.0/200) + scaled_boxes = box_list_ops.scale(boxes, y_scale, x_scale) + exp_output = [[0, 0, 1, 1], [0.5, 0.6, 1.0, 0.7]] + with self.test_session() as sess: + scaled_corners_out = sess.run(scaled_boxes.get()) + self.assertAllClose(scaled_corners_out, exp_output) + extra_data_out = sess.run(scaled_boxes.get_field('extra_data')) + self.assertAllEqual(extra_data_out, [[1], [2]]) + + def test_clip_to_window_filter_boxes_which_fall_outside_the_window( + self): + window = tf.constant([0, 0, 9, 14], tf.float32) + corners = tf.constant([[5.0, 5.0, 6.0, 6.0], + [-1.0, -2.0, 4.0, 5.0], + [2.0, 3.0, 5.0, 9.0], + [0.0, 0.0, 9.0, 14.0], + [-100.0, -100.0, 300.0, 600.0], + [-10.0, -10.0, -9.0, -9.0]]) + boxes = box_list.BoxList(corners) + boxes.add_field('extra_data', tf.constant([[1], [2], [3], [4], [5], [6]])) + exp_output = [[5.0, 5.0, 6.0, 6.0], [0.0, 0.0, 4.0, 5.0], + [2.0, 3.0, 5.0, 9.0], [0.0, 0.0, 9.0, 14.0], + [0.0, 0.0, 9.0, 14.0]] + pruned = box_list_ops.clip_to_window( + boxes, window, filter_nonoverlapping=True) + with self.test_session() as sess: + pruned_output = sess.run(pruned.get()) + self.assertAllClose(pruned_output, exp_output) + extra_data_out = sess.run(pruned.get_field('extra_data')) + self.assertAllEqual(extra_data_out, [[1], [2], [3], [4], [5]]) + + def test_clip_to_window_without_filtering_boxes_which_fall_outside_the_window( + self): + window = tf.constant([0, 0, 9, 14], tf.float32) + corners = tf.constant([[5.0, 5.0, 6.0, 6.0], + [-1.0, -2.0, 4.0, 5.0], + [2.0, 3.0, 5.0, 9.0], + [0.0, 0.0, 9.0, 14.0], + [-100.0, -100.0, 300.0, 600.0], + [-10.0, -10.0, -9.0, -9.0]]) + boxes = box_list.BoxList(corners) + boxes.add_field('extra_data', tf.constant([[1], [2], [3], [4], [5], [6]])) + exp_output = [[5.0, 5.0, 6.0, 6.0], [0.0, 0.0, 4.0, 5.0], + [2.0, 3.0, 5.0, 9.0], [0.0, 0.0, 9.0, 14.0], + [0.0, 0.0, 9.0, 14.0], [0.0, 0.0, 0.0, 0.0]] + pruned = box_list_ops.clip_to_window( + boxes, window, filter_nonoverlapping=False) + with self.test_session() as sess: + pruned_output = sess.run(pruned.get()) + self.assertAllClose(pruned_output, exp_output) + extra_data_out = sess.run(pruned.get_field('extra_data')) + self.assertAllEqual(extra_data_out, [[1], [2], [3], [4], [5], [6]]) + + def test_prune_outside_window_filters_boxes_which_fall_outside_the_window( + self): + window = tf.constant([0, 0, 9, 14], tf.float32) + corners = tf.constant([[5.0, 5.0, 6.0, 6.0], + [-1.0, -2.0, 4.0, 5.0], + [2.0, 3.0, 5.0, 9.0], + [0.0, 0.0, 9.0, 14.0], + [-10.0, -10.0, -9.0, -9.0], + [-100.0, -100.0, 300.0, 600.0]]) + boxes = box_list.BoxList(corners) + boxes.add_field('extra_data', tf.constant([[1], [2], [3], [4], [5], [6]])) + exp_output = [[5.0, 5.0, 6.0, 6.0], + [2.0, 3.0, 5.0, 9.0], + [0.0, 0.0, 9.0, 14.0]] + pruned, keep_indices = box_list_ops.prune_outside_window(boxes, window) + with self.test_session() as sess: + pruned_output = sess.run(pruned.get()) + self.assertAllClose(pruned_output, exp_output) + keep_indices_out = sess.run(keep_indices) + 
self.assertAllEqual(keep_indices_out, [0, 2, 3]) + extra_data_out = sess.run(pruned.get_field('extra_data')) + self.assertAllEqual(extra_data_out, [[1], [3], [4]]) + + def test_prune_completely_outside_window(self): + window = tf.constant([0, 0, 9, 14], tf.float32) + corners = tf.constant([[5.0, 5.0, 6.0, 6.0], + [-1.0, -2.0, 4.0, 5.0], + [2.0, 3.0, 5.0, 9.0], + [0.0, 0.0, 9.0, 14.0], + [-10.0, -10.0, -9.0, -9.0], + [-100.0, -100.0, 300.0, 600.0]]) + boxes = box_list.BoxList(corners) + boxes.add_field('extra_data', tf.constant([[1], [2], [3], [4], [5], [6]])) + exp_output = [[5.0, 5.0, 6.0, 6.0], + [-1.0, -2.0, 4.0, 5.0], + [2.0, 3.0, 5.0, 9.0], + [0.0, 0.0, 9.0, 14.0], + [-100.0, -100.0, 300.0, 600.0]] + pruned, keep_indices = box_list_ops.prune_completely_outside_window(boxes, + window) + with self.test_session() as sess: + pruned_output = sess.run(pruned.get()) + self.assertAllClose(pruned_output, exp_output) + keep_indices_out = sess.run(keep_indices) + self.assertAllEqual(keep_indices_out, [0, 1, 2, 3, 5]) + extra_data_out = sess.run(pruned.get_field('extra_data')) + self.assertAllEqual(extra_data_out, [[1], [2], [3], [4], [6]]) + + def test_intersection(self): + corners1 = tf.constant([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]) + corners2 = tf.constant([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]]) + exp_output = [[2.0, 0.0, 6.0], [1.0, 0.0, 5.0]] + boxes1 = box_list.BoxList(corners1) + boxes2 = box_list.BoxList(corners2) + intersect = box_list_ops.intersection(boxes1, boxes2) + with self.test_session() as sess: + intersect_output = sess.run(intersect) + self.assertAllClose(intersect_output, exp_output) + + def test_matched_intersection(self): + corners1 = tf.constant([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]) + corners2 = tf.constant([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0]]) + exp_output = [2.0, 0.0] + boxes1 = box_list.BoxList(corners1) + boxes2 = box_list.BoxList(corners2) + intersect = box_list_ops.matched_intersection(boxes1, boxes2) + with self.test_session() as sess: + intersect_output = sess.run(intersect) + self.assertAllClose(intersect_output, exp_output) + + def test_iou(self): + corners1 = tf.constant([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]) + corners2 = tf.constant([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]]) + exp_output = [[2.0 / 16.0, 0, 6.0 / 400.0], [1.0 / 16.0, 0.0, 5.0 / 400.0]] + boxes1 = box_list.BoxList(corners1) + boxes2 = box_list.BoxList(corners2) + iou = box_list_ops.iou(boxes1, boxes2) + with self.test_session() as sess: + iou_output = sess.run(iou) + self.assertAllClose(iou_output, exp_output) + + def test_matched_iou(self): + corners1 = tf.constant([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]) + corners2 = tf.constant([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0]]) + exp_output = [2.0 / 16.0, 0] + boxes1 = box_list.BoxList(corners1) + boxes2 = box_list.BoxList(corners2) + iou = box_list_ops.matched_iou(boxes1, boxes2) + with self.test_session() as sess: + iou_output = sess.run(iou) + self.assertAllClose(iou_output, exp_output) + + def test_iouworks_on_empty_inputs(self): + corners1 = tf.constant([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]) + corners2 = tf.constant([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]]) + boxes1 = box_list.BoxList(corners1) + boxes2 = box_list.BoxList(corners2) + boxes_empty = box_list.BoxList(tf.zeros((0, 4))) + iou_empty_1 = box_list_ops.iou(boxes1, boxes_empty) + iou_empty_2 = box_list_ops.iou(boxes_empty, boxes2) 
+ iou_empty_3 = box_list_ops.iou(boxes_empty, boxes_empty) + with self.test_session() as sess: + iou_output_1, iou_output_2, iou_output_3 = sess.run( + [iou_empty_1, iou_empty_2, iou_empty_3]) + self.assertAllEqual(iou_output_1.shape, (2, 0)) + self.assertAllEqual(iou_output_2.shape, (0, 3)) + self.assertAllEqual(iou_output_3.shape, (0, 0)) + + def test_ioa(self): + corners1 = tf.constant([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]) + corners2 = tf.constant([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]]) + exp_output_1 = [[2.0 / 12.0, 0, 6.0 / 400.0], + [1.0 / 12.0, 0.0, 5.0 / 400.0]] + exp_output_2 = [[2.0 / 6.0, 1.0 / 5.0], + [0, 0], + [6.0 / 6.0, 5.0 / 5.0]] + boxes1 = box_list.BoxList(corners1) + boxes2 = box_list.BoxList(corners2) + ioa_1 = box_list_ops.ioa(boxes1, boxes2) + ioa_2 = box_list_ops.ioa(boxes2, boxes1) + with self.test_session() as sess: + ioa_output_1, ioa_output_2 = sess.run([ioa_1, ioa_2]) + self.assertAllClose(ioa_output_1, exp_output_1) + self.assertAllClose(ioa_output_2, exp_output_2) + + def test_prune_non_overlapping_boxes(self): + corners1 = tf.constant([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]) + corners2 = tf.constant([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]]) + boxes1 = box_list.BoxList(corners1) + boxes2 = box_list.BoxList(corners2) + minoverlap = 0.5 + + exp_output_1 = boxes1 + exp_output_2 = box_list.BoxList(tf.constant(0.0, shape=[0, 4])) + output_1, keep_indices_1 = box_list_ops.prune_non_overlapping_boxes( + boxes1, boxes2, min_overlap=minoverlap) + output_2, keep_indices_2 = box_list_ops.prune_non_overlapping_boxes( + boxes2, boxes1, min_overlap=minoverlap) + with self.test_session() as sess: + (output_1_, keep_indices_1_, output_2_, keep_indices_2_, exp_output_1_, + exp_output_2_) = sess.run( + [output_1.get(), keep_indices_1, + output_2.get(), keep_indices_2, + exp_output_1.get(), exp_output_2.get()]) + self.assertAllClose(output_1_, exp_output_1_) + self.assertAllClose(output_2_, exp_output_2_) + self.assertAllEqual(keep_indices_1_, [0, 1]) + self.assertAllEqual(keep_indices_2_, []) + + def test_prune_small_boxes(self): + boxes = tf.constant([[4.0, 3.0, 7.0, 5.0], + [5.0, 6.0, 10.0, 7.0], + [3.0, 4.0, 6.0, 8.0], + [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]]) + exp_boxes = [[3.0, 4.0, 6.0, 8.0], + [0.0, 0.0, 20.0, 20.0]] + boxes = box_list.BoxList(boxes) + pruned_boxes = box_list_ops.prune_small_boxes(boxes, 3) + with self.test_session() as sess: + pruned_boxes = sess.run(pruned_boxes.get()) + self.assertAllEqual(pruned_boxes, exp_boxes) + + def test_prune_small_boxes_prunes_boxes_with_negative_side(self): + boxes = tf.constant([[4.0, 3.0, 7.0, 5.0], + [5.0, 6.0, 10.0, 7.0], + [3.0, 4.0, 6.0, 8.0], + [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0], + [2.0, 3.0, 1.5, 7.0], # negative height + [2.0, 3.0, 5.0, 1.7]]) # negative width + exp_boxes = [[3.0, 4.0, 6.0, 8.0], + [0.0, 0.0, 20.0, 20.0]] + boxes = box_list.BoxList(boxes) + pruned_boxes = box_list_ops.prune_small_boxes(boxes, 3) + with self.test_session() as sess: + pruned_boxes = sess.run(pruned_boxes.get()) + self.assertAllEqual(pruned_boxes, exp_boxes) + + def test_change_coordinate_frame(self): + corners = tf.constant([[0.25, 0.5, 0.75, 0.75], [0.5, 0.0, 1.0, 1.0]]) + window = tf.constant([0.25, 0.25, 0.75, 0.75]) + boxes = box_list.BoxList(corners) + + expected_corners = tf.constant([[0, 0.5, 1.0, 1.0], [0.5, -0.5, 1.5, 1.5]]) + expected_boxes = box_list.BoxList(expected_corners) + output = 
box_list_ops.change_coordinate_frame(boxes, window) + + with self.test_session() as sess: + output_, expected_boxes_ = sess.run([output.get(), expected_boxes.get()]) + self.assertAllClose(output_, expected_boxes_) + + def test_ioaworks_on_empty_inputs(self): + corners1 = tf.constant([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]) + corners2 = tf.constant([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]]) + boxes1 = box_list.BoxList(corners1) + boxes2 = box_list.BoxList(corners2) + boxes_empty = box_list.BoxList(tf.zeros((0, 4))) + ioa_empty_1 = box_list_ops.ioa(boxes1, boxes_empty) + ioa_empty_2 = box_list_ops.ioa(boxes_empty, boxes2) + ioa_empty_3 = box_list_ops.ioa(boxes_empty, boxes_empty) + with self.test_session() as sess: + ioa_output_1, ioa_output_2, ioa_output_3 = sess.run( + [ioa_empty_1, ioa_empty_2, ioa_empty_3]) + self.assertAllEqual(ioa_output_1.shape, (2, 0)) + self.assertAllEqual(ioa_output_2.shape, (0, 3)) + self.assertAllEqual(ioa_output_3.shape, (0, 0)) + + def test_pairwise_distances(self): + corners1 = tf.constant([[0.0, 0.0, 0.0, 0.0], + [1.0, 1.0, 0.0, 2.0]]) + corners2 = tf.constant([[3.0, 4.0, 1.0, 0.0], + [-4.0, 0.0, 0.0, 3.0], + [0.0, 0.0, 0.0, 0.0]]) + exp_output = [[26, 25, 0], [18, 27, 6]] + boxes1 = box_list.BoxList(corners1) + boxes2 = box_list.BoxList(corners2) + dist_matrix = box_list_ops.sq_dist(boxes1, boxes2) + with self.test_session() as sess: + dist_output = sess.run(dist_matrix) + self.assertAllClose(dist_output, exp_output) + + def test_boolean_mask(self): + corners = tf.constant( + [4 * [0.0], 4 * [1.0], 4 * [2.0], 4 * [3.0], 4 * [4.0]]) + indicator = tf.constant([True, False, True, False, True], tf.bool) + expected_subset = [4 * [0.0], 4 * [2.0], 4 * [4.0]] + boxes = box_list.BoxList(corners) + subset = box_list_ops.boolean_mask(boxes, indicator) + with self.test_session() as sess: + subset_output = sess.run(subset.get()) + self.assertAllClose(subset_output, expected_subset) + + def test_boolean_mask_with_field(self): + corners = tf.constant( + [4 * [0.0], 4 * [1.0], 4 * [2.0], 4 * [3.0], 4 * [4.0]]) + indicator = tf.constant([True, False, True, False, True], tf.bool) + weights = tf.constant([[.1], [.3], [.5], [.7], [.9]], tf.float32) + expected_subset = [4 * [0.0], 4 * [2.0], 4 * [4.0]] + expected_weights = [[.1], [.5], [.9]] + + boxes = box_list.BoxList(corners) + boxes.add_field('weights', weights) + subset = box_list_ops.boolean_mask(boxes, indicator, ['weights']) + with self.test_session() as sess: + subset_output, weights_output = sess.run( + [subset.get(), subset.get_field('weights')]) + self.assertAllClose(subset_output, expected_subset) + self.assertAllClose(weights_output, expected_weights) + + def test_gather(self): + corners = tf.constant( + [4 * [0.0], 4 * [1.0], 4 * [2.0], 4 * [3.0], 4 * [4.0]]) + indices = tf.constant([0, 2, 4], tf.int32) + expected_subset = [4 * [0.0], 4 * [2.0], 4 * [4.0]] + boxes = box_list.BoxList(corners) + subset = box_list_ops.gather(boxes, indices) + with self.test_session() as sess: + subset_output = sess.run(subset.get()) + self.assertAllClose(subset_output, expected_subset) + + def test_gather_with_field(self): + corners = tf.constant([4*[0.0], 4*[1.0], 4*[2.0], 4*[3.0], 4*[4.0]]) + indices = tf.constant([0, 2, 4], tf.int32) + weights = tf.constant([[.1], [.3], [.5], [.7], [.9]], tf.float32) + expected_subset = [4 * [0.0], 4 * [2.0], 4 * [4.0]] + expected_weights = [[.1], [.5], [.9]] + + boxes = box_list.BoxList(corners) + boxes.add_field('weights', weights) + subset = 
box_list_ops.gather(boxes, indices, ['weights']) + with self.test_session() as sess: + subset_output, weights_output = sess.run( + [subset.get(), subset.get_field('weights')]) + self.assertAllClose(subset_output, expected_subset) + self.assertAllClose(weights_output, expected_weights) + + def test_gather_with_invalid_field(self): + corners = tf.constant([4 * [0.0], 4 * [1.0]]) + indices = tf.constant([0, 1], tf.int32) + weights = tf.constant([[.1], [.3]], tf.float32) + + boxes = box_list.BoxList(corners) + boxes.add_field('weights', weights) + with self.assertRaises(ValueError): + box_list_ops.gather(boxes, indices, ['foo', 'bar']) + + def test_gather_with_invalid_inputs(self): + corners = tf.constant( + [4 * [0.0], 4 * [1.0], 4 * [2.0], 4 * [3.0], 4 * [4.0]]) + indices_float32 = tf.constant([0, 2, 4], tf.float32) + boxes = box_list.BoxList(corners) + with self.assertRaises(ValueError): + _ = box_list_ops.gather(boxes, indices_float32) + indices_2d = tf.constant([[0, 2, 4]], tf.int32) + boxes = box_list.BoxList(corners) + with self.assertRaises(ValueError): + _ = box_list_ops.gather(boxes, indices_2d) + + def test_gather_with_dynamic_indexing(self): + corners = tf.constant([4 * [0.0], 4 * [1.0], 4 * [2.0], 4 * [3.0], 4 * [4.0] + ]) + weights = tf.constant([.5, .3, .7, .1, .9], tf.float32) + indices = tf.reshape(tf.where(tf.greater(weights, 0.4)), [-1]) + expected_subset = [4 * [0.0], 4 * [2.0], 4 * [4.0]] + expected_weights = [.5, .7, .9] + + boxes = box_list.BoxList(corners) + boxes.add_field('weights', weights) + subset = box_list_ops.gather(boxes, indices, ['weights']) + with self.test_session() as sess: + subset_output, weights_output = sess.run([subset.get(), subset.get_field( + 'weights')]) + self.assertAllClose(subset_output, expected_subset) + self.assertAllClose(weights_output, expected_weights) + + def test_sort_by_field_ascending_order(self): + exp_corners = [[0, 0, 1, 1], [0, 0.1, 1, 1.1], [0, -0.1, 1, 0.9], + [0, 10, 1, 11], [0, 10.1, 1, 11.1], [0, 100, 1, 101]] + exp_scores = [.95, .9, .75, .6, .5, .3] + exp_weights = [.2, .45, .6, .75, .8, .92] + shuffle = [2, 4, 0, 5, 1, 3] + corners = tf.constant([exp_corners[i] for i in shuffle], tf.float32) + boxes = box_list.BoxList(corners) + boxes.add_field('scores', tf.constant( + [exp_scores[i] for i in shuffle], tf.float32)) + boxes.add_field('weights', tf.constant( + [exp_weights[i] for i in shuffle], tf.float32)) + sort_by_weight = box_list_ops.sort_by_field( + boxes, + 'weights', + order=box_list_ops.SortOrder.ascend) + with self.test_session() as sess: + corners_out, scores_out, weights_out = sess.run([ + sort_by_weight.get(), + sort_by_weight.get_field('scores'), + sort_by_weight.get_field('weights')]) + self.assertAllClose(corners_out, exp_corners) + self.assertAllClose(scores_out, exp_scores) + self.assertAllClose(weights_out, exp_weights) + + def test_sort_by_field_descending_order(self): + exp_corners = [[0, 0, 1, 1], [0, 0.1, 1, 1.1], [0, -0.1, 1, 0.9], + [0, 10, 1, 11], [0, 10.1, 1, 11.1], [0, 100, 1, 101]] + exp_scores = [.95, .9, .75, .6, .5, .3] + exp_weights = [.2, .45, .6, .75, .8, .92] + shuffle = [2, 4, 0, 5, 1, 3] + + corners = tf.constant([exp_corners[i] for i in shuffle], tf.float32) + boxes = box_list.BoxList(corners) + boxes.add_field('scores', tf.constant( + [exp_scores[i] for i in shuffle], tf.float32)) + boxes.add_field('weights', tf.constant( + [exp_weights[i] for i in shuffle], tf.float32)) + + sort_by_score = box_list_ops.sort_by_field(boxes, 'scores') + with self.test_session() as sess: + corners_out, 
scores_out, weights_out = sess.run([sort_by_score.get( + ), sort_by_score.get_field('scores'), sort_by_score.get_field('weights')]) + self.assertAllClose(corners_out, exp_corners) + self.assertAllClose(scores_out, exp_scores) + self.assertAllClose(weights_out, exp_weights) + + def test_sort_by_field_invalid_inputs(self): + corners = tf.constant([4 * [0.0], 4 * [0.5], 4 * [1.0], 4 * [2.0], 4 * + [3.0], 4 * [4.0]]) + misc = tf.constant([[.95, .9], [.5, .3]], tf.float32) + weights = tf.constant([.1, .2], tf.float32) + boxes = box_list.BoxList(corners) + boxes.add_field('misc', misc) + boxes.add_field('weights', weights) + + with self.test_session() as sess: + with self.assertRaises(ValueError): + box_list_ops.sort_by_field(boxes, 'area') + + with self.assertRaises(ValueError): + box_list_ops.sort_by_field(boxes, 'misc') + + with self.assertRaisesWithPredicateMatch(errors.InvalidArgumentError, + 'Incorrect field size'): + sess.run(box_list_ops.sort_by_field(boxes, 'weights').get()) + + def test_visualize_boxes_in_image(self): + image = tf.zeros((6, 4, 3)) + corners = tf.constant([[0, 0, 5, 3], + [0, 0, 3, 2]], tf.float32) + boxes = box_list.BoxList(corners) + image_and_boxes = box_list_ops.visualize_boxes_in_image(image, boxes) + image_and_boxes_bw = tf.to_float( + tf.greater(tf.reduce_sum(image_and_boxes, 2), 0.0)) + exp_result = [[1, 1, 1, 0], + [1, 1, 1, 0], + [1, 1, 1, 0], + [1, 0, 1, 0], + [1, 1, 1, 0], + [0, 0, 0, 0]] + with self.test_session() as sess: + output = sess.run(image_and_boxes_bw) + self.assertAllEqual(output.astype(int), exp_result) + + def test_filter_field_value_equals(self): + corners = tf.constant([[0, 0, 1, 1], + [0, 0.1, 1, 1.1], + [0, -0.1, 1, 0.9], + [0, 10, 1, 11], + [0, 10.1, 1, 11.1], + [0, 100, 1, 101]], tf.float32) + boxes = box_list.BoxList(corners) + boxes.add_field('classes', tf.constant([1, 2, 1, 2, 2, 1])) + exp_output1 = [[0, 0, 1, 1], [0, -0.1, 1, 0.9], [0, 100, 1, 101]] + exp_output2 = [[0, 0.1, 1, 1.1], [0, 10, 1, 11], [0, 10.1, 1, 11.1]] + + filtered_boxes1 = box_list_ops.filter_field_value_equals( + boxes, 'classes', 1) + filtered_boxes2 = box_list_ops.filter_field_value_equals( + boxes, 'classes', 2) + with self.test_session() as sess: + filtered_output1, filtered_output2 = sess.run([filtered_boxes1.get(), + filtered_boxes2.get()]) + self.assertAllClose(filtered_output1, exp_output1) + self.assertAllClose(filtered_output2, exp_output2) + + def test_filter_greater_than(self): + corners = tf.constant([[0, 0, 1, 1], + [0, 0.1, 1, 1.1], + [0, -0.1, 1, 0.9], + [0, 10, 1, 11], + [0, 10.1, 1, 11.1], + [0, 100, 1, 101]], tf.float32) + boxes = box_list.BoxList(corners) + boxes.add_field('scores', tf.constant([.1, .75, .9, .5, .5, .8])) + thresh = .6 + exp_output = [[0, 0.1, 1, 1.1], [0, -0.1, 1, 0.9], [0, 100, 1, 101]] + + filtered_boxes = box_list_ops.filter_greater_than(boxes, thresh) + with self.test_session() as sess: + filtered_output = sess.run(filtered_boxes.get()) + self.assertAllClose(filtered_output, exp_output) + + def test_clip_box_list(self): + boxlist = box_list.BoxList( + tf.constant([[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.5, 0.5], + [0.6, 0.6, 0.8, 0.8], [0.2, 0.2, 0.3, 0.3]], tf.float32)) + boxlist.add_field('classes', tf.constant([0, 0, 1, 1])) + boxlist.add_field('scores', tf.constant([0.75, 0.65, 0.3, 0.2])) + num_boxes = 2 + clipped_boxlist = box_list_ops.pad_or_clip_box_list(boxlist, num_boxes) + + expected_boxes = [[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.5, 0.5]] + expected_classes = [0, 0] + expected_scores = [0.75, 0.65] + with 
self.test_session() as sess: + boxes_out, classes_out, scores_out = sess.run( + [clipped_boxlist.get(), clipped_boxlist.get_field('classes'), + clipped_boxlist.get_field('scores')]) + + self.assertAllClose(expected_boxes, boxes_out) + self.assertAllEqual(expected_classes, classes_out) + self.assertAllClose(expected_scores, scores_out) + + def test_pad_box_list(self): + boxlist = box_list.BoxList( + tf.constant([[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.5, 0.5]], tf.float32)) + boxlist.add_field('classes', tf.constant([0, 1])) + boxlist.add_field('scores', tf.constant([0.75, 0.2])) + num_boxes = 4 + padded_boxlist = box_list_ops.pad_or_clip_box_list(boxlist, num_boxes) + + expected_boxes = [[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.5, 0.5], + [0, 0, 0, 0], [0, 0, 0, 0]] + expected_classes = [0, 1, 0, 0] + expected_scores = [0.75, 0.2, 0, 0] + with self.test_session() as sess: + boxes_out, classes_out, scores_out = sess.run( + [padded_boxlist.get(), padded_boxlist.get_field('classes'), + padded_boxlist.get_field('scores')]) + + self.assertAllClose(expected_boxes, boxes_out) + self.assertAllEqual(expected_classes, classes_out) + self.assertAllClose(expected_scores, scores_out) + + +class ConcatenateTest(tf.test.TestCase): + + def test_invalid_input_box_list_list(self): + with self.assertRaises(ValueError): + box_list_ops.concatenate(None) + with self.assertRaises(ValueError): + box_list_ops.concatenate([]) + with self.assertRaises(ValueError): + corners = tf.constant([[0, 0, 0, 0]], tf.float32) + boxlist = box_list.BoxList(corners) + box_list_ops.concatenate([boxlist, 2]) + + def test_concatenate_with_missing_fields(self): + corners1 = tf.constant([[0, 0, 0, 0], [1, 2, 3, 4]], tf.float32) + scores1 = tf.constant([1.0, 2.1]) + corners2 = tf.constant([[0, 3, 1, 6], [2, 4, 3, 8]], tf.float32) + boxlist1 = box_list.BoxList(corners1) + boxlist1.add_field('scores', scores1) + boxlist2 = box_list.BoxList(corners2) + with self.assertRaises(ValueError): + box_list_ops.concatenate([boxlist1, boxlist2]) + + def test_concatenate_with_incompatible_field_shapes(self): + corners1 = tf.constant([[0, 0, 0, 0], [1, 2, 3, 4]], tf.float32) + scores1 = tf.constant([1.0, 2.1]) + corners2 = tf.constant([[0, 3, 1, 6], [2, 4, 3, 8]], tf.float32) + scores2 = tf.constant([[1.0, 1.0], [2.1, 3.2]]) + boxlist1 = box_list.BoxList(corners1) + boxlist1.add_field('scores', scores1) + boxlist2 = box_list.BoxList(corners2) + boxlist2.add_field('scores', scores2) + with self.assertRaises(ValueError): + box_list_ops.concatenate([boxlist1, boxlist2]) + + def test_concatenate_is_correct(self): + corners1 = tf.constant([[0, 0, 0, 0], [1, 2, 3, 4]], tf.float32) + scores1 = tf.constant([1.0, 2.1]) + corners2 = tf.constant([[0, 3, 1, 6], [2, 4, 3, 8], [1, 0, 5, 10]], + tf.float32) + scores2 = tf.constant([1.0, 2.1, 5.6]) + + exp_corners = [[0, 0, 0, 0], + [1, 2, 3, 4], + [0, 3, 1, 6], + [2, 4, 3, 8], + [1, 0, 5, 10]] + exp_scores = [1.0, 2.1, 1.0, 2.1, 5.6] + + boxlist1 = box_list.BoxList(corners1) + boxlist1.add_field('scores', scores1) + boxlist2 = box_list.BoxList(corners2) + boxlist2.add_field('scores', scores2) + result = box_list_ops.concatenate([boxlist1, boxlist2]) + with self.test_session() as sess: + corners_output, scores_output = sess.run( + [result.get(), result.get_field('scores')]) + self.assertAllClose(corners_output, exp_corners) + self.assertAllClose(scores_output, exp_scores) + + +class NonMaxSuppressionTest(tf.test.TestCase): + + def test_with_invalid_scores_field(self): + corners = tf.constant([[0, 0, 1, 1], + [0, 0.1, 1, 1.1], 
+ [0, -0.1, 1, 0.9], + [0, 10, 1, 11], + [0, 10.1, 1, 11.1], + [0, 100, 1, 101]], tf.float32) + boxes = box_list.BoxList(corners) + boxes.add_field('scores', tf.constant([.9, .75, .6, .95, .5])) + iou_thresh = .5 + max_output_size = 3 + nms = box_list_ops.non_max_suppression( + boxes, iou_thresh, max_output_size) + with self.test_session() as sess: + with self.assertRaisesWithPredicateMatch( + errors.InvalidArgumentError, 'scores has incompatible shape'): + sess.run(nms.get()) + + def test_select_from_three_clusters(self): + corners = tf.constant([[0, 0, 1, 1], + [0, 0.1, 1, 1.1], + [0, -0.1, 1, 0.9], + [0, 10, 1, 11], + [0, 10.1, 1, 11.1], + [0, 100, 1, 101]], tf.float32) + boxes = box_list.BoxList(corners) + boxes.add_field('scores', tf.constant([.9, .75, .6, .95, .5, .3])) + iou_thresh = .5 + max_output_size = 3 + + exp_nms = [[0, 10, 1, 11], + [0, 0, 1, 1], + [0, 100, 1, 101]] + nms = box_list_ops.non_max_suppression( + boxes, iou_thresh, max_output_size) + with self.test_session() as sess: + nms_output = sess.run(nms.get()) + self.assertAllClose(nms_output, exp_nms) + + def test_select_at_most_two_boxes_from_three_clusters(self): + corners = tf.constant([[0, 0, 1, 1], + [0, 0.1, 1, 1.1], + [0, -0.1, 1, 0.9], + [0, 10, 1, 11], + [0, 10.1, 1, 11.1], + [0, 100, 1, 101]], tf.float32) + boxes = box_list.BoxList(corners) + boxes.add_field('scores', tf.constant([.9, .75, .6, .95, .5, .3])) + iou_thresh = .5 + max_output_size = 2 + + exp_nms = [[0, 10, 1, 11], + [0, 0, 1, 1]] + nms = box_list_ops.non_max_suppression( + boxes, iou_thresh, max_output_size) + with self.test_session() as sess: + nms_output = sess.run(nms.get()) + self.assertAllClose(nms_output, exp_nms) + + def test_select_at_most_thirty_boxes_from_three_clusters(self): + corners = tf.constant([[0, 0, 1, 1], + [0, 0.1, 1, 1.1], + [0, -0.1, 1, 0.9], + [0, 10, 1, 11], + [0, 10.1, 1, 11.1], + [0, 100, 1, 101]], tf.float32) + boxes = box_list.BoxList(corners) + boxes.add_field('scores', tf.constant([.9, .75, .6, .95, .5, .3])) + iou_thresh = .5 + max_output_size = 30 + + exp_nms = [[0, 10, 1, 11], + [0, 0, 1, 1], + [0, 100, 1, 101]] + nms = box_list_ops.non_max_suppression( + boxes, iou_thresh, max_output_size) + with self.test_session() as sess: + nms_output = sess.run(nms.get()) + self.assertAllClose(nms_output, exp_nms) + + def test_select_single_box(self): + corners = tf.constant([[0, 0, 1, 1]], tf.float32) + boxes = box_list.BoxList(corners) + boxes.add_field('scores', tf.constant([.9])) + iou_thresh = .5 + max_output_size = 3 + + exp_nms = [[0, 0, 1, 1]] + nms = box_list_ops.non_max_suppression( + boxes, iou_thresh, max_output_size) + with self.test_session() as sess: + nms_output = sess.run(nms.get()) + self.assertAllClose(nms_output, exp_nms) + + def test_select_from_ten_identical_boxes(self): + corners = tf.constant(10 * [[0, 0, 1, 1]], tf.float32) + boxes = box_list.BoxList(corners) + boxes.add_field('scores', tf.constant(10 * [.9])) + iou_thresh = .5 + max_output_size = 3 + + exp_nms = [[0, 0, 1, 1]] + nms = box_list_ops.non_max_suppression( + boxes, iou_thresh, max_output_size) + with self.test_session() as sess: + nms_output = sess.run(nms.get()) + self.assertAllClose(nms_output, exp_nms) + + def test_copy_extra_fields(self): + corners = tf.constant([[0, 0, 1, 1], + [0, 0.1, 1, 1.1]], tf.float32) + boxes = box_list.BoxList(corners) + tensor1 = np.array([[1], [4]]) + tensor2 = np.array([[1, 1], [2, 2]]) + boxes.add_field('tensor1', tf.constant(tensor1)) + boxes.add_field('tensor2', tf.constant(tensor2)) + new_boxes = 
box_list.BoxList(tf.constant([[0, 0, 10, 10], + [1, 3, 5, 5]], tf.float32)) + new_boxes = box_list_ops._copy_extra_fields(new_boxes, boxes) + with self.test_session() as sess: + self.assertAllClose(tensor1, sess.run(new_boxes.get_field('tensor1'))) + self.assertAllClose(tensor2, sess.run(new_boxes.get_field('tensor2'))) + + +class CoordinatesConversionTest(tf.test.TestCase): + + def test_to_normalized_coordinates(self): + coordinates = tf.constant([[0, 0, 100, 100], + [25, 25, 75, 75]], tf.float32) + img = tf.ones((128, 100, 100, 3)) + boxlist = box_list.BoxList(coordinates) + normalized_boxlist = box_list_ops.to_normalized_coordinates( + boxlist, tf.shape(img)[1], tf.shape(img)[2]) + expected_boxes = [[0, 0, 1, 1], + [0.25, 0.25, 0.75, 0.75]] + + with self.test_session() as sess: + normalized_boxes = sess.run(normalized_boxlist.get()) + self.assertAllClose(normalized_boxes, expected_boxes) + + def test_to_normalized_coordinates_already_normalized(self): + coordinates = tf.constant([[0, 0, 1, 1], + [0.25, 0.25, 0.75, 0.75]], tf.float32) + img = tf.ones((128, 100, 100, 3)) + boxlist = box_list.BoxList(coordinates) + normalized_boxlist = box_list_ops.to_normalized_coordinates( + boxlist, tf.shape(img)[1], tf.shape(img)[2]) + + with self.test_session() as sess: + with self.assertRaisesOpError('assertion failed'): + sess.run(normalized_boxlist.get()) + + def test_to_absolute_coordinates(self): + coordinates = tf.constant([[0, 0, 1, 1], + [0.25, 0.25, 0.75, 0.75]], tf.float32) + img = tf.ones((128, 100, 100, 3)) + boxlist = box_list.BoxList(coordinates) + absolute_boxlist = box_list_ops.to_absolute_coordinates(boxlist, + tf.shape(img)[1], + tf.shape(img)[2]) + expected_boxes = [[0, 0, 100, 100], + [25, 25, 75, 75]] + + with self.test_session() as sess: + absolute_boxes = sess.run(absolute_boxlist.get()) + self.assertAllClose(absolute_boxes, expected_boxes) + + def test_to_absolute_coordinates_already_abolute(self): + coordinates = tf.constant([[0, 0, 100, 100], + [25, 25, 75, 75]], tf.float32) + img = tf.ones((128, 100, 100, 3)) + boxlist = box_list.BoxList(coordinates) + absolute_boxlist = box_list_ops.to_absolute_coordinates(boxlist, + tf.shape(img)[1], + tf.shape(img)[2]) + + with self.test_session() as sess: + with self.assertRaisesOpError('assertion failed'): + sess.run(absolute_boxlist.get()) + + def test_convert_to_normalized_and_back(self): + coordinates = np.random.uniform(size=(100, 4)) + coordinates = np.round(np.sort(coordinates) * 200) + coordinates[:, 2:4] += 1 + coordinates[99, :] = [0, 0, 201, 201] + img = tf.ones((128, 202, 202, 3)) + + boxlist = box_list.BoxList(tf.constant(coordinates, tf.float32)) + boxlist = box_list_ops.to_normalized_coordinates(boxlist, + tf.shape(img)[1], + tf.shape(img)[2]) + boxlist = box_list_ops.to_absolute_coordinates(boxlist, + tf.shape(img)[1], + tf.shape(img)[2]) + + with self.test_session() as sess: + out = sess.run(boxlist.get()) + self.assertAllClose(out, coordinates) + + def test_convert_to_absolute_and_back(self): + coordinates = np.random.uniform(size=(100, 4)) + coordinates = np.sort(coordinates) + coordinates[99, :] = [0, 0, 1, 1] + img = tf.ones((128, 202, 202, 3)) + + boxlist = box_list.BoxList(tf.constant(coordinates, tf.float32)) + boxlist = box_list_ops.to_absolute_coordinates(boxlist, + tf.shape(img)[1], + tf.shape(img)[2]) + boxlist = box_list_ops.to_normalized_coordinates(boxlist, + tf.shape(img)[1], + tf.shape(img)[2]) + + with self.test_session() as sess: + out = sess.run(boxlist.get()) + self.assertAllClose(out, coordinates) + 
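+# The two round-trip tests above rely on to_normalized_coordinates and
+# to_absolute_coordinates being inverses for a fixed image shape. A minimal
+# usage sketch (illustrative only; `image` and `boxes` are hypothetical
+# tensors, not fixtures defined in this file):
+#
+#   boxlist = box_list.BoxList(boxes)  # absolute [ymin, xmin, ymax, xmax]
+#   normalized = box_list_ops.to_normalized_coordinates(
+#       boxlist, tf.shape(image)[1], tf.shape(image)[2])
+#   restored = box_list_ops.to_absolute_coordinates(
+#       normalized, tf.shape(image)[1], tf.shape(image)[2])
+#   # restored.get() matches the original boxes up to floating point error.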
+ +class BoxRefinementTest(tf.test.TestCase): + + def test_box_voting(self): + candidates = box_list.BoxList( + tf.constant([[0.1, 0.1, 0.4, 0.4], [0.6, 0.6, 0.8, 0.8]], tf.float32)) + candidates.add_field('ExtraField', tf.constant([1, 2])) + pool = box_list.BoxList( + tf.constant([[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.5, 0.5], + [0.6, 0.6, 0.8, 0.8]], tf.float32)) + pool.add_field('scores', tf.constant([0.75, 0.25, 0.3])) + averaged_boxes = box_list_ops.box_voting(candidates, pool) + expected_boxes = [[0.1, 0.1, 0.425, 0.425], [0.6, 0.6, 0.8, 0.8]] + expected_scores = [0.5, 0.3] + with self.test_session() as sess: + boxes_out, scores_out, extra_field_out = sess.run( + [averaged_boxes.get(), averaged_boxes.get_field('scores'), + averaged_boxes.get_field('ExtraField')]) + + self.assertAllClose(expected_boxes, boxes_out) + self.assertAllClose(expected_scores, scores_out) + self.assertAllEqual(extra_field_out, [1, 2]) + + def test_box_voting_fails_with_negative_scores(self): + candidates = box_list.BoxList( + tf.constant([[0.1, 0.1, 0.4, 0.4]], tf.float32)) + pool = box_list.BoxList(tf.constant([[0.1, 0.1, 0.4, 0.4]], tf.float32)) + pool.add_field('scores', tf.constant([-0.2])) + averaged_boxes = box_list_ops.box_voting(candidates, pool) + + with self.test_session() as sess: + with self.assertRaisesOpError('Scores must be non negative'): + sess.run([averaged_boxes.get()]) + + def test_box_voting_fails_when_unmatched(self): + candidates = box_list.BoxList( + tf.constant([[0.1, 0.1, 0.4, 0.4]], tf.float32)) + pool = box_list.BoxList(tf.constant([[0.6, 0.6, 0.8, 0.8]], tf.float32)) + pool.add_field('scores', tf.constant([0.2])) + averaged_boxes = box_list_ops.box_voting(candidates, pool) + + with self.test_session() as sess: + with self.assertRaisesOpError('Each box in selected_boxes must match ' + 'with at least one box in pool_boxes.'): + sess.run([averaged_boxes.get()]) + + def test_refine_boxes(self): + pool = box_list.BoxList( + tf.constant([[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.5, 0.5], + [0.6, 0.6, 0.8, 0.8]], tf.float32)) + pool.add_field('ExtraField', tf.constant([1, 2, 3])) + pool.add_field('scores', tf.constant([0.75, 0.25, 0.3])) + refined_boxes = box_list_ops.refine_boxes(pool, 0.5, 10) + + expected_boxes = [[0.1, 0.1, 0.425, 0.425], [0.6, 0.6, 0.8, 0.8]] + expected_scores = [0.5, 0.3] + with self.test_session() as sess: + boxes_out, scores_out, extra_field_out = sess.run( + [refined_boxes.get(), refined_boxes.get_field('scores'), + refined_boxes.get_field('ExtraField')]) + + self.assertAllClose(expected_boxes, boxes_out) + self.assertAllClose(expected_scores, scores_out) + self.assertAllEqual(extra_field_out, [1, 3]) + + def test_refine_boxes_multi_class(self): + pool = box_list.BoxList( + tf.constant([[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.5, 0.5], + [0.6, 0.6, 0.8, 0.8], [0.2, 0.2, 0.3, 0.3]], tf.float32)) + pool.add_field('classes', tf.constant([0, 0, 1, 1])) + pool.add_field('scores', tf.constant([0.75, 0.25, 0.3, 0.2])) + refined_boxes = box_list_ops.refine_boxes_multi_class(pool, 3, 0.5, 10) + + expected_boxes = [[0.1, 0.1, 0.425, 0.425], [0.6, 0.6, 0.8, 0.8], + [0.2, 0.2, 0.3, 0.3]] + expected_scores = [0.5, 0.3, 0.2] + with self.test_session() as sess: + boxes_out, scores_out, extra_field_out = sess.run( + [refined_boxes.get(), refined_boxes.get_field('scores'), + refined_boxes.get_field('classes')]) + + self.assertAllClose(expected_boxes, boxes_out) + self.assertAllClose(expected_scores, scores_out) + self.assertAllEqual(extra_field_out, [0, 1, 1]) + +if __name__ == '__main__': + 
tf.test.main() diff --git a/object_detection/core/box_list_test.py b/object_detection/core/box_list_test.py new file mode 100644 index 0000000000000000000000000000000000000000..edc00ebbc40227713739e2583fe9fc067e9449e2 --- /dev/null +++ b/object_detection/core/box_list_test.py @@ -0,0 +1,134 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.core.box_list.""" + +import tensorflow as tf + +from object_detection.core import box_list + + +class BoxListTest(tf.test.TestCase): + """Tests for BoxList class.""" + + def test_num_boxes(self): + data = tf.constant([[0, 0, 1, 1], [1, 1, 2, 3], [3, 4, 5, 5]], tf.float32) + expected_num_boxes = 3 + + boxes = box_list.BoxList(data) + with self.test_session() as sess: + num_boxes_output = sess.run(boxes.num_boxes()) + self.assertEquals(num_boxes_output, expected_num_boxes) + + def test_get_correct_center_coordinates_and_sizes(self): + boxes = [[10.0, 10.0, 20.0, 15.0], [0.2, 0.1, 0.5, 0.4]] + boxes = box_list.BoxList(tf.constant(boxes)) + centers_sizes = boxes.get_center_coordinates_and_sizes() + expected_centers_sizes = [[15, 0.35], [12.5, 0.25], [10, 0.3], [5, 0.3]] + with self.test_session() as sess: + centers_sizes_out = sess.run(centers_sizes) + self.assertAllClose(centers_sizes_out, expected_centers_sizes) + + def test_create_box_list_with_dynamic_shape(self): + data = tf.constant([[0, 0, 1, 1], [1, 1, 2, 3], [3, 4, 5, 5]], tf.float32) + indices = tf.reshape(tf.where(tf.greater([1, 0, 1], 0)), [-1]) + data = tf.gather(data, indices) + assert data.get_shape().as_list() == [None, 4] + expected_num_boxes = 2 + + boxes = box_list.BoxList(data) + with self.test_session() as sess: + num_boxes_output = sess.run(boxes.num_boxes()) + self.assertEquals(num_boxes_output, expected_num_boxes) + + def test_transpose_coordinates(self): + boxes = [[10.0, 10.0, 20.0, 15.0], [0.2, 0.1, 0.5, 0.4]] + boxes = box_list.BoxList(tf.constant(boxes)) + boxes.transpose_coordinates() + expected_corners = [[10.0, 10.0, 15.0, 20.0], [0.1, 0.2, 0.4, 0.5]] + with self.test_session() as sess: + corners_out = sess.run(boxes.get()) + self.assertAllClose(corners_out, expected_corners) + + def test_box_list_invalid_inputs(self): + data0 = tf.constant([[[0, 0, 1, 1], [3, 4, 5, 5]]], tf.float32) + data1 = tf.constant([[0, 0, 1], [1, 1, 2], [3, 4, 5]], tf.float32) + data2 = tf.constant([[0, 0, 1], [1, 1, 2], [3, 4, 5]], tf.int32) + + with self.assertRaises(ValueError): + _ = box_list.BoxList(data0) + with self.assertRaises(ValueError): + _ = box_list.BoxList(data1) + with self.assertRaises(ValueError): + _ = box_list.BoxList(data2) + + def test_num_boxes_static(self): + box_corners = [[10.0, 10.0, 20.0, 15.0], [0.2, 0.1, 0.5, 0.4]] + boxes = box_list.BoxList(tf.constant(box_corners)) + self.assertEquals(boxes.num_boxes_static(), 2) + self.assertEquals(type(boxes.num_boxes_static()), int) + + def 
test_num_boxes_static_for_uninferrable_shape(self): + placeholder = tf.placeholder(tf.float32, shape=[None, 4]) + boxes = box_list.BoxList(placeholder) + self.assertEquals(boxes.num_boxes_static(), None) + + def test_as_tensor_dict(self): + boxlist = box_list.BoxList( + tf.constant([[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.5, 0.5]], tf.float32)) + boxlist.add_field('classes', tf.constant([0, 1])) + boxlist.add_field('scores', tf.constant([0.75, 0.2])) + tensor_dict = boxlist.as_tensor_dict() + + expected_boxes = [[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.5, 0.5]] + expected_classes = [0, 1] + expected_scores = [0.75, 0.2] + + with self.test_session() as sess: + tensor_dict_out = sess.run(tensor_dict) + self.assertAllEqual(3, len(tensor_dict_out)) + self.assertAllClose(expected_boxes, tensor_dict_out['boxes']) + self.assertAllEqual(expected_classes, tensor_dict_out['classes']) + self.assertAllClose(expected_scores, tensor_dict_out['scores']) + + def test_as_tensor_dict_with_features(self): + boxlist = box_list.BoxList( + tf.constant([[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.5, 0.5]], tf.float32)) + boxlist.add_field('classes', tf.constant([0, 1])) + boxlist.add_field('scores', tf.constant([0.75, 0.2])) + tensor_dict = boxlist.as_tensor_dict(['boxes', 'classes', 'scores']) + + expected_boxes = [[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.5, 0.5]] + expected_classes = [0, 1] + expected_scores = [0.75, 0.2] + + with self.test_session() as sess: + tensor_dict_out = sess.run(tensor_dict) + self.assertAllEqual(3, len(tensor_dict_out)) + self.assertAllClose(expected_boxes, tensor_dict_out['boxes']) + self.assertAllEqual(expected_classes, tensor_dict_out['classes']) + self.assertAllClose(expected_scores, tensor_dict_out['scores']) + + def test_as_tensor_dict_missing_field(self): + boxlist = box_list.BoxList( + tf.constant([[0.1, 0.1, 0.4, 0.4], [0.1, 0.1, 0.5, 0.5]], tf.float32)) + boxlist.add_field('classes', tf.constant([0, 1])) + boxlist.add_field('scores', tf.constant([0.75, 0.2])) + with self.assertRaises(ValueError): + boxlist.as_tensor_dict(['foo', 'bar']) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/core/box_predictor.py b/object_detection/core/box_predictor.py new file mode 100644 index 0000000000000000000000000000000000000000..71540c11f5fe3639defd43345aa24c9c548791b9 --- /dev/null +++ b/object_detection/core/box_predictor.py @@ -0,0 +1,546 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Box predictor for object detectors. + +Box predictors are classes that take a high level +image feature map as input and produce two predictions, +(1) a tensor encoding box locations, and +(2) a tensor encoding classes for each box. + +These components are passed directly to loss functions +in our detection models. + +These modules are separated from the main model since the same +few box predictor architectures are shared across many models. 
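+
+Example usage (an illustrative sketch, not taken from this file: `predictor`
+stands for any constructed BoxPredictor subclass instance and `image_features`
+for a [batch_size, height, width, channels] float tensor):
+
+  predictions = predictor.predict(image_features,
+                                  num_predictions_per_location=1,
+                                  scope='BoxPredictor')
+  box_encodings = predictions[BOX_ENCODINGS]
+  class_predictions = predictions[CLASS_PREDICTIONS_WITH_BACKGROUND]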
+""" +from abc import abstractmethod +import tensorflow as tf +from object_detection.utils import ops +from object_detection.utils import static_shape + +slim = tf.contrib.slim + +BOX_ENCODINGS = 'box_encodings' +CLASS_PREDICTIONS_WITH_BACKGROUND = 'class_predictions_with_background' +MASK_PREDICTIONS = 'mask_predictions' + + +class BoxPredictor(object): + """BoxPredictor.""" + + def __init__(self, is_training, num_classes): + """Constructor. + + Args: + is_training: Indicates whether the BoxPredictor is in training mode. + num_classes: number of classes. Note that num_classes *does not* + include the background category, so if groundtruth labels take values + in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the + assigned classification targets can range from {0,... K}). + """ + self._is_training = is_training + self._num_classes = num_classes + + @property + def num_classes(self): + return self._num_classes + + def predict(self, image_features, num_predictions_per_location, scope, + **params): + """Computes encoded object locations and corresponding confidences. + + Takes a high level image feature map as input and produce two predictions, + (1) a tensor encoding box locations, and + (2) a tensor encoding class scores for each corresponding box. + In this interface, we only assume that two tensors are returned as output + and do not assume anything about their shapes. + + Args: + image_features: A float tensor of shape [batch_size, height, width, + channels] containing features for a batch of images. + num_predictions_per_location: an integer representing the number of box + predictions to be made per spatial location in the feature map. + scope: Variable and Op scope name. + **params: Additional keyword arguments for specific implementations of + BoxPredictor. + + Returns: + A dictionary containing at least the following tensors. + box_encodings: A float tensor of shape + [batch_size, num_anchors, q, code_size] representing the location of + the objects, where q is 1 or the number of classes. + class_predictions_with_background: A float tensor of shape + [batch_size, num_anchors, num_classes + 1] representing the class + predictions for the proposals. + """ + with tf.variable_scope(scope): + return self._predict(image_features, num_predictions_per_location, + **params) + + # TODO: num_predictions_per_location could be moved to constructor. + # This is currently only used by ConvolutionalBoxPredictor. + @abstractmethod + def _predict(self, image_features, num_predictions_per_location, **params): + """Implementations must override this method. + + Args: + image_features: A float tensor of shape [batch_size, height, width, + channels] containing features for a batch of images. + num_predictions_per_location: an integer representing the number of box + predictions to be made per spatial location in the feature map. + **params: Additional keyword arguments for specific implementations of + BoxPredictor. + + Returns: + A dictionary containing at least the following tensors. + box_encodings: A float tensor of shape + [batch_size, num_anchors, q, code_size] representing the location of + the objects, where q is 1 or the number of classes. + class_predictions_with_background: A float tensor of shape + [batch_size, num_anchors, num_classes + 1] representing the class + predictions for the proposals. + """ + pass + + +class RfcnBoxPredictor(BoxPredictor): + """RFCN Box Predictor. 
+
+  Applies a position sensitive ROI pooling on position sensitive feature maps to
+  predict classes and refined locations. See https://arxiv.org/abs/1605.06409
+  for details.
+
+  This is used for the second stage of the RFCN meta architecture. Notice that
+  locations are *not* shared across classes, thus for each anchor, a separate
+  prediction is made for each class.
+  """
+
+  def __init__(self,
+               is_training,
+               num_classes,
+               conv_hyperparams,
+               num_spatial_bins,
+               depth,
+               crop_size,
+               box_code_size):
+    """Constructor.
+
+    Args:
+      is_training: Indicates whether the BoxPredictor is in training mode.
+      num_classes: number of classes. Note that num_classes *does not*
+        include the background category, so if groundtruth labels take values
+        in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the
+        assigned classification targets can range from {0,... K}).
+      conv_hyperparams: Slim arg_scope with hyperparameters for convolutional
+        layers.
+      num_spatial_bins: A list of two integers `[spatial_bins_y,
+        spatial_bins_x]`.
+      depth: Target depth to reduce the input feature maps to.
+      crop_size: A list of two integers `[crop_height, crop_width]`.
+      box_code_size: Size of encoding for each box.
+    """
+    super(RfcnBoxPredictor, self).__init__(is_training, num_classes)
+    self._conv_hyperparams = conv_hyperparams
+    self._num_spatial_bins = num_spatial_bins
+    self._depth = depth
+    self._crop_size = crop_size
+    self._box_code_size = box_code_size
+
+  @property
+  def num_classes(self):
+    return self._num_classes
+
+  def _predict(self, image_features, num_predictions_per_location,
+               proposal_boxes):
+    """Computes encoded object locations and corresponding confidences.
+
+    Args:
+      image_features: A float tensor of shape [batch_size, height, width,
+        channels] containing features for a batch of images.
+      num_predictions_per_location: an integer representing the number of box
+        predictions to be made per spatial location in the feature map.
+        Currently, this must be set to 1, or an error will be raised.
+      proposal_boxes: A float tensor of shape [batch_size, num_proposals,
+        box_code_size].
+
+    Returns:
+      box_encodings: A float tensor of shape
+        [batch_size, 1, num_classes, code_size] representing the
+        location of the objects.
+      class_predictions_with_background: A float tensor of shape
+        [batch_size, 1, num_classes + 1] representing the class
+        predictions for the proposals.
+    Raises:
+      ValueError: if num_predictions_per_location is not 1.
+    """
+    if num_predictions_per_location != 1:
+      raise ValueError('Currently RfcnBoxPredictor only supports '
+                       'predicting a single box per class per location.')
+
+    batch_size = tf.shape(proposal_boxes)[0]
+    num_boxes = tf.shape(proposal_boxes)[1]
+    def get_box_indices(proposals):
+      proposals_shape = proposals.get_shape().as_list()
+      if any(dim is None for dim in proposals_shape):
+        proposals_shape = tf.shape(proposals)
+      ones_mat = tf.ones(proposals_shape[:2], dtype=tf.int32)
+      multiplier = tf.expand_dims(
+          tf.range(start=0, limit=proposals_shape[0]), 1)
+      return tf.reshape(ones_mat * multiplier, [-1])
+
+    net = image_features
+    with slim.arg_scope(self._conv_hyperparams):
+      net = slim.conv2d(net, self._depth, [1, 1], scope='reduce_depth')
+      # Location predictions.
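+      # The 1x1 convolution below emits num_spatial_bins[0] *
+      # num_spatial_bins[1] * num_classes * box_code_size channels;
+      # ops.position_sensitive_crop_regions then pools a different channel
+      # group for each spatial bin of every proposal, which is what makes the
+      # resulting box encodings position sensitive.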
+ location_feature_map_depth = (self._num_spatial_bins[0] * + self._num_spatial_bins[1] * + self.num_classes * + self._box_code_size) + location_feature_map = slim.conv2d(net, location_feature_map_depth, + [1, 1], activation_fn=None, + scope='refined_locations') + box_encodings = ops.position_sensitive_crop_regions( + location_feature_map, + boxes=tf.reshape(proposal_boxes, [-1, self._box_code_size]), + box_ind=get_box_indices(proposal_boxes), + crop_size=self._crop_size, + num_spatial_bins=self._num_spatial_bins, + global_pool=True) + box_encodings = tf.squeeze(box_encodings, squeeze_dims=[1, 2]) + box_encodings = tf.reshape(box_encodings, + [batch_size * num_boxes, 1, self.num_classes, + self._box_code_size]) + + # Class predictions. + total_classes = self.num_classes + 1 # Account for background class. + class_feature_map_depth = (self._num_spatial_bins[0] * + self._num_spatial_bins[1] * + total_classes) + class_feature_map = slim.conv2d(net, class_feature_map_depth, [1, 1], + activation_fn=None, + scope='class_predictions') + class_predictions_with_background = ops.position_sensitive_crop_regions( + class_feature_map, + boxes=tf.reshape(proposal_boxes, [-1, self._box_code_size]), + box_ind=get_box_indices(proposal_boxes), + crop_size=self._crop_size, + num_spatial_bins=self._num_spatial_bins, + global_pool=True) + class_predictions_with_background = tf.squeeze( + class_predictions_with_background, squeeze_dims=[1, 2]) + class_predictions_with_background = tf.reshape( + class_predictions_with_background, + [batch_size * num_boxes, 1, total_classes]) + + return {BOX_ENCODINGS: box_encodings, + CLASS_PREDICTIONS_WITH_BACKGROUND: + class_predictions_with_background} + + +class MaskRCNNBoxPredictor(BoxPredictor): + """Mask R-CNN Box Predictor. + + See Mask R-CNN: He, K., Gkioxari, G., Dollar, P., & Girshick, R. (2017). + Mask R-CNN. arXiv preprint arXiv:1703.06870. + + This is used for the second stage of the Mask R-CNN detector where proposals + cropped from an image are arranged along the batch dimension of the input + image_features tensor. Notice that locations are *not* shared across classes, + thus for each anchor, a separate prediction is made for each class. + + In addition to predicting boxes and classes, optionally this class allows + predicting masks and/or keypoints inside detection boxes. + + Currently this box predictor makes per-class predictions; that is, each + anchor makes a separate box prediction for each class. + """ + + def __init__(self, + is_training, + num_classes, + fc_hyperparams, + use_dropout, + dropout_keep_prob, + box_code_size, + conv_hyperparams=None, + predict_instance_masks=False, + mask_prediction_conv_depth=256, + predict_keypoints=False): + """Constructor. + + Args: + is_training: Indicates whether the BoxPredictor is in training mode. + num_classes: number of classes. Note that num_classes *does not* + include the background category, so if groundtruth labels take values + in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the + assigned classification targets can range from {0,... K}). + fc_hyperparams: Slim arg_scope with hyperparameters for fully + connected ops. + use_dropout: Option to use dropout or not. Note that a single dropout + op is applied here prior to both box and class predictions, which stands + in contrast to the ConvolutionalBoxPredictor below. + dropout_keep_prob: Keep probability for dropout. + This is only used if use_dropout is True. + box_code_size: Size of encoding for each box. 
+      conv_hyperparams: Slim arg_scope with hyperparameters for convolution
+        ops.
+      predict_instance_masks: Whether to predict object masks inside detection
+        boxes.
+      mask_prediction_conv_depth: The depth for the first conv2d_transpose op
+        applied to the image_features in the mask prediction branch.
+      predict_keypoints: Whether to predict keypoints inside detection boxes.
+
+    Raises:
+      ValueError: If predict_keypoints is true, or if predict_instance_masks
+        is true and conv_hyperparams is not provided.
+    """
+    super(MaskRCNNBoxPredictor, self).__init__(is_training, num_classes)
+    self._fc_hyperparams = fc_hyperparams
+    self._use_dropout = use_dropout
+    self._box_code_size = box_code_size
+    self._dropout_keep_prob = dropout_keep_prob
+    self._conv_hyperparams = conv_hyperparams
+    self._predict_instance_masks = predict_instance_masks
+    self._mask_prediction_conv_depth = mask_prediction_conv_depth
+    self._predict_keypoints = predict_keypoints
+    if self._predict_keypoints:
+      raise ValueError('Keypoint prediction is unimplemented.')
+    if ((self._predict_instance_masks or self._predict_keypoints) and
+        self._conv_hyperparams is None):
+      raise ValueError('`conv_hyperparams` must be provided when predicting '
+                       'masks.')
+
+  @property
+  def num_classes(self):
+    return self._num_classes
+
+  def _predict(self, image_features, num_predictions_per_location):
+    """Computes encoded object locations and corresponding confidences.
+
+    Flattens image_features and applies fully connected ops (with no
+    non-linearity) to predict box encodings and class predictions. In this
+    setting, anchors are not spatially arranged in any way and are assumed to
+    have been folded into the batch dimension. Thus we output 1 for the
+    anchors dimension.
+
+    Args:
+      image_features: A float tensor of shape [batch_size, height, width,
+        channels] containing features for a batch of images.
+      num_predictions_per_location: an integer representing the number of box
+        predictions to be made per spatial location in the feature map.
+        Currently, this must be set to 1, or an error will be raised.
+
+    Returns:
+      A dictionary containing the following tensors.
+        box_encodings: A float tensor of shape
+          [batch_size, 1, num_classes, code_size] representing the
+          location of the objects.
+        class_predictions_with_background: A float tensor of shape
+          [batch_size, 1, num_classes + 1] representing the class
+          predictions for the proposals.
+      If predict_instance_masks is True the dictionary also contains:
+        instance_masks: A float tensor of shape
+          [batch_size, 1, num_classes, image_height, image_width]
+      If predict_keypoints is True the dictionary also contains:
+        keypoints: [batch_size, 1, num_keypoints, 2]
+
+    Raises:
+      ValueError: if num_predictions_per_location is not 1.
+ """ + if num_predictions_per_location != 1: + raise ValueError('Currently FullyConnectedBoxPredictor only supports ' + 'predicting a single box per class per location.') + spatial_averaged_image_features = tf.reduce_mean(image_features, [1, 2], + keep_dims=True, + name='AvgPool') + flattened_image_features = slim.flatten(spatial_averaged_image_features) + if self._use_dropout: + flattened_image_features = slim.dropout(flattened_image_features, + keep_prob=self._dropout_keep_prob, + is_training=self._is_training) + with slim.arg_scope(self._fc_hyperparams): + box_encodings = slim.fully_connected( + flattened_image_features, + self._num_classes * self._box_code_size, + activation_fn=None, + scope='BoxEncodingPredictor') + class_predictions_with_background = slim.fully_connected( + flattened_image_features, + self._num_classes + 1, + activation_fn=None, + scope='ClassPredictor') + box_encodings = tf.reshape( + box_encodings, [-1, 1, self._num_classes, self._box_code_size]) + class_predictions_with_background = tf.reshape( + class_predictions_with_background, [-1, 1, self._num_classes + 1]) + + predictions_dict = { + BOX_ENCODINGS: box_encodings, + CLASS_PREDICTIONS_WITH_BACKGROUND: class_predictions_with_background + } + + if self._predict_instance_masks: + with slim.arg_scope(self._conv_hyperparams): + upsampled_features = slim.conv2d_transpose( + image_features, + num_outputs=self._mask_prediction_conv_depth, + kernel_size=[2, 2], + stride=2) + mask_predictions = slim.conv2d(upsampled_features, + num_outputs=self.num_classes, + activation_fn=None, + kernel_size=[1, 1]) + instance_masks = tf.expand_dims(tf.transpose(mask_predictions, + perm=[0, 3, 1, 2]), + axis=1, + name='MaskPredictor') + predictions_dict[MASK_PREDICTIONS] = instance_masks + return predictions_dict + + +class ConvolutionalBoxPredictor(BoxPredictor): + """Convolutional Box Predictor. + + Optionally add an intermediate 1x1 convolutional layer after features and + predict in parallel branches box_encodings and + class_predictions_with_background. + + Currently this box predictor assumes that predictions are "shared" across + classes --- that is each anchor makes box predictions which do not depend + on class. + """ + + def __init__(self, + is_training, + num_classes, + conv_hyperparams, + min_depth, + max_depth, + num_layers_before_predictor, + use_dropout, + dropout_keep_prob, + kernel_size, + box_code_size, + apply_sigmoid_to_scores=False): + """Constructor. + + Args: + is_training: Indicates whether the BoxPredictor is in training mode. + num_classes: number of classes. Note that num_classes *does not* + include the background category, so if groundtruth labels take values + in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the + assigned classification targets can range from {0,... K}). + conv_hyperparams: Slim arg_scope with hyperparameters for convolution ops. + min_depth: Minumum feature depth prior to predicting box encodings + and class predictions. + max_depth: Maximum feature depth prior to predicting box encodings + and class predictions. If max_depth is set to 0, no additional + feature map will be inserted before location and class predictions. + num_layers_before_predictor: Number of the additional conv layers before + the predictor. + use_dropout: Option to use dropout for class prediction or not. + dropout_keep_prob: Keep probability for dropout. + This is only used if use_dropout is True. + kernel_size: Size of final convolution kernel. 
If the + spatial resolution of the feature map is smaller than the kernel size, + then the kernel size is automatically set to be + min(feature_width, feature_height). + box_code_size: Size of encoding for each box. + apply_sigmoid_to_scores: if True, apply the sigmoid on the output + class_predictions. + + Raises: + ValueError: if min_depth > max_depth. + """ + super(ConvolutionalBoxPredictor, self).__init__(is_training, num_classes) + if min_depth > max_depth: + raise ValueError('min_depth should be less than or equal to max_depth') + self._conv_hyperparams = conv_hyperparams + self._min_depth = min_depth + self._max_depth = max_depth + self._num_layers_before_predictor = num_layers_before_predictor + self._use_dropout = use_dropout + self._kernel_size = kernel_size + self._box_code_size = box_code_size + self._dropout_keep_prob = dropout_keep_prob + self._apply_sigmoid_to_scores = apply_sigmoid_to_scores + + def _predict(self, image_features, num_predictions_per_location): + """Computes encoded object locations and corresponding confidences. + + Args: + image_features: A float tensor of shape [batch_size, height, width, + channels] containing features for a batch of images. + num_predictions_per_location: an integer representing the number of box + predictions to be made per spatial location in the feature map. + + Returns: + A dictionary containing the following tensors. + box_encodings: A float tensor of shape [batch_size, num_anchors, 1, + code_size] representing the location of the objects, where + num_anchors = feat_height * feat_width * num_predictions_per_location + class_predictions_with_background: A float tensor of shape + [batch_size, num_anchors, num_classes + 1] representing the class + predictions for the proposals. + """ + features_depth = static_shape.get_depth(image_features.get_shape()) + depth = max(min(features_depth, self._max_depth), self._min_depth) + + # Add a slot for the background class. + num_class_slots = self.num_classes + 1 + net = image_features + with slim.arg_scope(self._conv_hyperparams), \ + slim.arg_scope([slim.dropout], is_training=self._is_training): + # Add additional conv layers before the predictor. 
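+      # `depth` is the incoming feature depth clamped to
+      # [self._min_depth, self._max_depth] above; each optional 1x1 conv
+      # projects the feature map to that depth before the final box encoding
+      # and class prediction convolutions below.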
+ if depth > 0 and self._num_layers_before_predictor > 0: + for i in range(self._num_layers_before_predictor): + net = slim.conv2d( + net, depth, [1, 1], scope='Conv2d_%d_1x1_%d' % (i, depth)) + with slim.arg_scope([slim.conv2d], activation_fn=None, + normalizer_fn=None, normalizer_params=None): + box_encodings = slim.conv2d( + net, num_predictions_per_location * self._box_code_size, + [self._kernel_size, self._kernel_size], + scope='BoxEncodingPredictor') + if self._use_dropout: + net = slim.dropout(net, keep_prob=self._dropout_keep_prob) + class_predictions_with_background = slim.conv2d( + net, num_predictions_per_location * num_class_slots, + [self._kernel_size, self._kernel_size], scope='ClassPredictor') + if self._apply_sigmoid_to_scores: + class_predictions_with_background = tf.sigmoid( + class_predictions_with_background) + + batch_size = static_shape.get_batch_size(image_features.get_shape()) + if batch_size is None: + features_height = static_shape.get_height(image_features.get_shape()) + features_width = static_shape.get_width(image_features.get_shape()) + flattened_predictions_size = (features_height * features_width * + num_predictions_per_location) + box_encodings = tf.reshape( + box_encodings, + [-1, flattened_predictions_size, 1, self._box_code_size]) + class_predictions_with_background = tf.reshape( + class_predictions_with_background, + [-1, flattened_predictions_size, num_class_slots]) + else: + box_encodings = tf.reshape( + box_encodings, [batch_size, -1, 1, self._box_code_size]) + class_predictions_with_background = tf.reshape( + class_predictions_with_background, [batch_size, -1, num_class_slots]) + return {BOX_ENCODINGS: box_encodings, + CLASS_PREDICTIONS_WITH_BACKGROUND: + class_predictions_with_background} diff --git a/object_detection/core/box_predictor_test.py b/object_detection/core/box_predictor_test.py new file mode 100644 index 0000000000000000000000000000000000000000..e5e5a3c9a15a2b99e42cbfd0101b9b25051c9000 --- /dev/null +++ b/object_detection/core/box_predictor_test.py @@ -0,0 +1,323 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for object_detection.core.box_predictor.""" + +import numpy as np +import tensorflow as tf + +from google.protobuf import text_format +from object_detection.builders import hyperparams_builder +from object_detection.core import box_predictor +from object_detection.protos import hyperparams_pb2 + + +class MaskRCNNBoxPredictorTest(tf.test.TestCase): + + def _build_arg_scope_with_hyperparams(self, + op_type=hyperparams_pb2.Hyperparams.FC): + hyperparams = hyperparams_pb2.Hyperparams() + hyperparams_text_proto = """ + activation: NONE + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + """ + text_format.Merge(hyperparams_text_proto, hyperparams) + hyperparams.op = op_type + return hyperparams_builder.build(hyperparams, is_training=True) + + def test_get_boxes_with_five_classes(self): + image_features = tf.random_uniform([2, 7, 7, 3], dtype=tf.float32) + mask_box_predictor = box_predictor.MaskRCNNBoxPredictor( + is_training=False, + num_classes=5, + fc_hyperparams=self._build_arg_scope_with_hyperparams(), + use_dropout=False, + dropout_keep_prob=0.5, + box_code_size=4, + ) + box_predictions = mask_box_predictor.predict( + image_features, num_predictions_per_location=1, scope='BoxPredictor') + box_encodings = box_predictions[box_predictor.BOX_ENCODINGS] + class_predictions_with_background = box_predictions[ + box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND] + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + (box_encodings_shape, + class_predictions_with_background_shape) = sess.run( + [tf.shape(box_encodings), + tf.shape(class_predictions_with_background)]) + self.assertAllEqual(box_encodings_shape, [2, 1, 5, 4]) + self.assertAllEqual(class_predictions_with_background_shape, [2, 1, 6]) + + def test_value_error_on_predict_instance_masks_with_no_conv_hyperparms(self): + with self.assertRaises(ValueError): + box_predictor.MaskRCNNBoxPredictor( + is_training=False, + num_classes=5, + fc_hyperparams=self._build_arg_scope_with_hyperparams(), + use_dropout=False, + dropout_keep_prob=0.5, + box_code_size=4, + predict_instance_masks=True) + + def test_get_instance_masks(self): + image_features = tf.random_uniform([2, 7, 7, 3], dtype=tf.float32) + mask_box_predictor = box_predictor.MaskRCNNBoxPredictor( + is_training=False, + num_classes=5, + fc_hyperparams=self._build_arg_scope_with_hyperparams(), + use_dropout=False, + dropout_keep_prob=0.5, + box_code_size=4, + conv_hyperparams=self._build_arg_scope_with_hyperparams( + op_type=hyperparams_pb2.Hyperparams.CONV), + predict_instance_masks=True) + box_predictions = mask_box_predictor.predict( + image_features, num_predictions_per_location=1, scope='BoxPredictor') + mask_predictions = box_predictions[box_predictor.MASK_PREDICTIONS] + self.assertListEqual([2, 1, 5, 14, 14], + mask_predictions.get_shape().as_list()) + + def test_do_not_return_instance_masks_and_keypoints_without_request(self): + image_features = tf.random_uniform([2, 7, 7, 3], dtype=tf.float32) + mask_box_predictor = box_predictor.MaskRCNNBoxPredictor( + is_training=False, + num_classes=5, + fc_hyperparams=self._build_arg_scope_with_hyperparams(), + use_dropout=False, + dropout_keep_prob=0.5, + box_code_size=4) + box_predictions = mask_box_predictor.predict( + image_features, num_predictions_per_location=1, scope='BoxPredictor') + self.assertEqual(len(box_predictions), 2) + 
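+    # Only box encodings and class predictions should be returned when neither
+    # predict_instance_masks nor predict_keypoints was requested.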
self.assertTrue(box_predictor.BOX_ENCODINGS in box_predictions) + self.assertTrue(box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND + in box_predictions) + + def test_value_error_on_predict_keypoints(self): + with self.assertRaises(ValueError): + box_predictor.MaskRCNNBoxPredictor( + is_training=False, + num_classes=5, + fc_hyperparams=self._build_arg_scope_with_hyperparams(), + use_dropout=False, + dropout_keep_prob=0.5, + box_code_size=4, + predict_keypoints=True) + + +class RfcnBoxPredictorTest(tf.test.TestCase): + + def _build_arg_scope_with_conv_hyperparams(self): + conv_hyperparams = hyperparams_pb2.Hyperparams() + conv_hyperparams_text_proto = """ + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + """ + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams) + return hyperparams_builder.build(conv_hyperparams, is_training=True) + + def test_get_correct_box_encoding_and_class_prediction_shapes(self): + image_features = tf.random_uniform([4, 8, 8, 64], dtype=tf.float32) + proposal_boxes = tf.random_normal([4, 2, 4], dtype=tf.float32) + rfcn_box_predictor = box_predictor.RfcnBoxPredictor( + is_training=False, + num_classes=2, + conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(), + num_spatial_bins=[3, 3], + depth=4, + crop_size=[12, 12], + box_code_size=4 + ) + box_predictions = rfcn_box_predictor.predict( + image_features, num_predictions_per_location=1, scope='BoxPredictor', + proposal_boxes=proposal_boxes) + box_encodings = box_predictions[box_predictor.BOX_ENCODINGS] + class_predictions_with_background = box_predictions[ + box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND] + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + (box_encodings_shape, + class_predictions_shape) = sess.run( + [tf.shape(box_encodings), + tf.shape(class_predictions_with_background)]) + self.assertAllEqual(box_encodings_shape, [8, 1, 2, 4]) + self.assertAllEqual(class_predictions_shape, [8, 1, 3]) + + +class ConvolutionalBoxPredictorTest(tf.test.TestCase): + + def _build_arg_scope_with_conv_hyperparams(self): + conv_hyperparams = hyperparams_pb2.Hyperparams() + conv_hyperparams_text_proto = """ + activation: RELU_6 + regularizer { + l2_regularizer { + } + } + initializer { + truncated_normal_initializer { + } + } + """ + text_format.Merge(conv_hyperparams_text_proto, conv_hyperparams) + return hyperparams_builder.build(conv_hyperparams, is_training=True) + + def test_get_boxes_for_five_aspect_ratios_per_location(self): + image_features = tf.random_uniform([4, 8, 8, 64], dtype=tf.float32) + conv_box_predictor = box_predictor.ConvolutionalBoxPredictor( + is_training=False, + num_classes=0, + conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(), + min_depth=0, + max_depth=32, + num_layers_before_predictor=1, + use_dropout=True, + dropout_keep_prob=0.8, + kernel_size=1, + box_code_size=4 + ) + box_predictions = conv_box_predictor.predict( + image_features, num_predictions_per_location=5, scope='BoxPredictor') + box_encodings = box_predictions[box_predictor.BOX_ENCODINGS] + objectness_predictions = box_predictions[ + box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND] + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + (box_encodings_shape, + objectness_predictions_shape) = sess.run( + [tf.shape(box_encodings), tf.shape(objectness_predictions)]) + self.assertAllEqual(box_encodings_shape, [4, 320, 1, 4]) + 
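+      # 320 anchors = 8 * 8 feature map locations * 5 predictions per location.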
self.assertAllEqual(objectness_predictions_shape, [4, 320, 1]) + + def test_get_boxes_for_one_aspect_ratio_per_location(self): + image_features = tf.random_uniform([4, 8, 8, 64], dtype=tf.float32) + conv_box_predictor = box_predictor.ConvolutionalBoxPredictor( + is_training=False, + num_classes=0, + conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(), + min_depth=0, + max_depth=32, + num_layers_before_predictor=1, + use_dropout=True, + dropout_keep_prob=0.8, + kernel_size=1, + box_code_size=4 + ) + box_predictions = conv_box_predictor.predict( + image_features, num_predictions_per_location=1, scope='BoxPredictor') + box_encodings = box_predictions[box_predictor.BOX_ENCODINGS] + objectness_predictions = box_predictions[ + box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND] + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + (box_encodings_shape, + objectness_predictions_shape) = sess.run( + [tf.shape(box_encodings), tf.shape(objectness_predictions)]) + self.assertAllEqual(box_encodings_shape, [4, 64, 1, 4]) + self.assertAllEqual(objectness_predictions_shape, [4, 64, 1]) + + def test_get_multi_class_predictions_for_five_aspect_ratios_per_location( + self): + num_classes_without_background = 6 + image_features = tf.random_uniform([4, 8, 8, 64], dtype=tf.float32) + conv_box_predictor = box_predictor.ConvolutionalBoxPredictor( + is_training=False, + num_classes=num_classes_without_background, + conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(), + min_depth=0, + max_depth=32, + num_layers_before_predictor=1, + use_dropout=True, + dropout_keep_prob=0.8, + kernel_size=1, + box_code_size=4 + ) + box_predictions = conv_box_predictor.predict( + image_features, + num_predictions_per_location=5, + scope='BoxPredictor') + box_encodings = box_predictions[box_predictor.BOX_ENCODINGS] + class_predictions_with_background = box_predictions[ + box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND] + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + (box_encodings_shape, class_predictions_with_background_shape + ) = sess.run([ + tf.shape(box_encodings), tf.shape(class_predictions_with_background)]) + self.assertAllEqual(box_encodings_shape, [4, 320, 1, 4]) + self.assertAllEqual(class_predictions_with_background_shape, + [4, 320, num_classes_without_background+1]) + + def test_get_boxes_for_five_aspect_ratios_per_location_fully_convolutional( + self): + image_features = tf.placeholder(dtype=tf.float32, shape=[4, None, None, 64]) + conv_box_predictor = box_predictor.ConvolutionalBoxPredictor( + is_training=False, + num_classes=0, + conv_hyperparams=self._build_arg_scope_with_conv_hyperparams(), + min_depth=0, + max_depth=32, + num_layers_before_predictor=1, + use_dropout=True, + dropout_keep_prob=0.8, + kernel_size=1, + box_code_size=4 + ) + box_predictions = conv_box_predictor.predict( + image_features, num_predictions_per_location=5, scope='BoxPredictor') + box_encodings = box_predictions[box_predictor.BOX_ENCODINGS] + objectness_predictions = box_predictions[ + box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND] + init_op = tf.global_variables_initializer() + + resolution = 32 + expected_num_anchors = resolution*resolution*5 + with self.test_session() as sess: + sess.run(init_op) + (box_encodings_shape, + objectness_predictions_shape) = sess.run( + [tf.shape(box_encodings), tf.shape(objectness_predictions)], + feed_dict={image_features: + np.random.rand(4, resolution, resolution, 64)}) + 
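+      # expected_num_anchors = 32 * 32 * 5 = 5120 for the 32x32 feature map fed
+      # through the placeholder above.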
self.assertAllEqual(box_encodings_shape, [4, expected_num_anchors, 1, 4]) + self.assertAllEqual(objectness_predictions_shape, + [4, expected_num_anchors, 1]) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/core/data_decoder.py b/object_detection/core/data_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..9ae18c1f957ea69432b08740451abb2af2548910 --- /dev/null +++ b/object_detection/core/data_decoder.py @@ -0,0 +1,41 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Interface for data decoders. + +Data decoders decode the input data and return a dictionary of tensors keyed by +the entries in core.reader.Fields. +""" +from abc import ABCMeta +from abc import abstractmethod + + +class DataDecoder(object): + """Interface for data decoders.""" + __metaclass__ = ABCMeta + + @abstractmethod + def decode(self, data): + """Return a single image and associated labels. + + Args: + data: a string tensor holding a serialized protocol buffer corresponding + to data for a single image. + + Returns: + tensor_dict: a dictionary containing tensors. Possible keys are defined in + reader.Fields. + """ + pass diff --git a/object_detection/core/keypoint_ops.py b/object_detection/core/keypoint_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..4a550d3c9c241cf5179f11eaf42b27ff659adc5c --- /dev/null +++ b/object_detection/core/keypoint_ops.py @@ -0,0 +1,231 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Keypoint operations. + +Keypoints are represented as tensors of shape [num_instances, num_keypoints, 2], +where the last dimension holds rank 2 tensors of the form [y, x] representing +the coordinates of the keypoint. +""" +import numpy as np +import tensorflow as tf + + +def scale(keypoints, y_scale, x_scale, scope=None): + """Scales keypoint coordinates in x and y dimensions. + + Args: + keypoints: a tensor of shape [num_instances, num_keypoints, 2] + y_scale: (float) scalar tensor + x_scale: (float) scalar tensor + scope: name scope. 
+ + Returns: + new_keypoints: a tensor of shape [num_instances, num_keypoints, 2] + """ + with tf.name_scope(scope, 'Scale'): + y_scale = tf.cast(y_scale, tf.float32) + x_scale = tf.cast(x_scale, tf.float32) + new_keypoints = keypoints * [[[y_scale, x_scale]]] + return new_keypoints + + +def clip_to_window(keypoints, window, scope=None): + """Clips keypoints to a window. + + This op clips any input keypoints to a window. + + Args: + keypoints: a tensor of shape [num_instances, num_keypoints, 2] + window: a tensor of shape [4] representing the [y_min, x_min, y_max, x_max] + window to which the op should clip the keypoints. + scope: name scope. + + Returns: + new_keypoints: a tensor of shape [num_instances, num_keypoints, 2] + """ + with tf.name_scope(scope, 'ClipToWindow'): + y, x = tf.split(value=keypoints, num_or_size_splits=2, axis=2) + win_y_min, win_x_min, win_y_max, win_x_max = tf.unstack(window) + y = tf.maximum(tf.minimum(y, win_y_max), win_y_min) + x = tf.maximum(tf.minimum(x, win_x_max), win_x_min) + new_keypoints = tf.concat([y, x], 2) + return new_keypoints + + +def prune_outside_window(keypoints, window, scope=None): + """Prunes keypoints that fall outside a given window. + + This function replaces keypoints that fall outside the given window with nan. + See also clip_to_window which clips any keypoints that fall outside the given + window. + + Args: + keypoints: a tensor of shape [num_instances, num_keypoints, 2] + window: a tensor of shape [4] representing the [y_min, x_min, y_max, x_max] + window outside of which the op should prune the keypoints. + scope: name scope. + + Returns: + new_keypoints: a tensor of shape [num_instances, num_keypoints, 2] + """ + with tf.name_scope(scope, 'PruneOutsideWindow'): + y, x = tf.split(value=keypoints, num_or_size_splits=2, axis=2) + win_y_min, win_x_min, win_y_max, win_x_max = tf.unstack(window) + + valid_indices = tf.logical_and( + tf.logical_and(y >= win_y_min, y <= win_y_max), + tf.logical_and(x >= win_x_min, x <= win_x_max)) + + new_y = tf.where(valid_indices, y, np.nan * tf.ones_like(y)) + new_x = tf.where(valid_indices, x, np.nan * tf.ones_like(x)) + new_keypoints = tf.concat([new_y, new_x], 2) + + return new_keypoints + + +def change_coordinate_frame(keypoints, window, scope=None): + """Changes coordinate frame of the keypoints to be relative to window's frame. + + Given a window of the form [y_min, x_min, y_max, x_max], changes keypoint + coordinates from keypoints of shape [num_instances, num_keypoints, 2] + to be relative to this window. + + An example use case is data augmentation: where we are given groundtruth + keypoints and would like to randomly crop the image to some window. In this + case we need to change the coordinate frame of each groundtruth keypoint to be + relative to this new window. + + Args: + keypoints: a tensor of shape [num_instances, num_keypoints, 2] + window: a tensor of shape [4] representing the [y_min, x_min, y_max, x_max] + window we should change the coordinate frame to. + scope: name scope. + + Returns: + new_keypoints: a tensor of shape [num_instances, num_keypoints, 2] + """ + with tf.name_scope(scope, 'ChangeCoordinateFrame'): + win_height = window[2] - window[0] + win_width = window[3] - window[1] + new_keypoints = scale(keypoints - [window[0], window[1]], 1.0 / win_height, + 1.0 / win_width) + return new_keypoints + + +def to_normalized_coordinates(keypoints, height, width, + check_range=True, scope=None): + """Converts absolute keypoint coordinates to normalized coordinates in [0, 1]. 
+ + Usually one uses the dynamic shape of the image or conv-layer tensor: + keypoints = keypoint_ops.to_normalized_coordinates(keypoints, + tf.shape(images)[1], + tf.shape(images)[2]), + + This function raises an assertion failed error at graph execution time when + the maximum coordinate is smaller than 1.01 (which means that coordinates are + already normalized). The value 1.01 is to deal with small rounding errors. + + Args: + keypoints: A tensor of shape [num_instances, num_keypoints, 2]. + height: Maximum value for y coordinate of absolute keypoint coordinates. + width: Maximum value for x coordinate of absolute keypoint coordinates. + check_range: If True, checks if the coordinates are normalized. + scope: name scope. + + Returns: + tensor of shape [num_instances, num_keypoints, 2] with normalized + coordinates in [0, 1]. + """ + with tf.name_scope(scope, 'ToNormalizedCoordinates'): + height = tf.cast(height, tf.float32) + width = tf.cast(width, tf.float32) + + if check_range: + max_val = tf.reduce_max(keypoints) + max_assert = tf.Assert(tf.greater(max_val, 1.01), + ['max value is lower than 1.01: ', max_val]) + with tf.control_dependencies([max_assert]): + width = tf.identity(width) + + return scale(keypoints, 1.0 / height, 1.0 / width) + + +def to_absolute_coordinates(keypoints, height, width, + check_range=True, scope=None): + """Converts normalized keypoint coordinates to absolute pixel coordinates. + + This function raises an assertion failed error when the maximum keypoint + coordinate value is larger than 1.01 (in which case coordinates are already + absolute). + + Args: + keypoints: A tensor of shape [num_instances, num_keypoints, 2] + height: Maximum value for y coordinate of absolute keypoint coordinates. + width: Maximum value for x coordinate of absolute keypoint coordinates. + check_range: If True, checks if the coordinates are normalized or not. + scope: name scope. + + Returns: + tensor of shape [num_instances, num_keypoints, 2] with absolute coordinates + in terms of the image size. + + """ + with tf.name_scope(scope, 'ToAbsoluteCoordinates'): + height = tf.cast(height, tf.float32) + width = tf.cast(width, tf.float32) + + # Ensure range of input keypoints is correct. + if check_range: + max_val = tf.reduce_max(keypoints) + max_assert = tf.Assert(tf.greater_equal(1.01, max_val), + ['maximum keypoint coordinate value is larger ' + 'than 1.01: ', max_val]) + with tf.control_dependencies([max_assert]): + width = tf.identity(width) + + return scale(keypoints, height, width) + + +def flip_horizontal(keypoints, flip_point, flip_permutation, scope=None): + """Flips the keypoints horizontally around the flip_point. + + This operation flips the x coordinate for each keypoint around the flip_point + and also permutes the keypoints in a manner specified by flip_permutation. + + Args: + keypoints: a tensor of shape [num_instances, num_keypoints, 2] + flip_point: (float) scalar tensor representing the x coordinate to flip the + keypoints around. + flip_permutation: rank 1 int32 tensor containing the keypoint flip + permutation. This specifies the mapping from original keypoint indices + to the flipped keypoint indices. This is used primarily for keypoints + that are not reflection invariant. E.g. Suppose there are 3 keypoints + representing ['head', 'right_eye', 'left_eye'], then a logical choice for + flip_permutation might be [0, 2, 1] since we want to swap the 'left_eye' + and 'right_eye' after a horizontal flip. + scope: name scope. 
+ + Returns: + new_keypoints: a tensor of shape [num_instances, num_keypoints, 2] + """ + with tf.name_scope(scope, 'FlipHorizontal'): + keypoints = tf.transpose(keypoints, [1, 0, 2]) + keypoints = tf.gather(keypoints, flip_permutation) + v, u = tf.split(value=keypoints, num_or_size_splits=2, axis=2) + u = flip_point * 2.0 - u + new_keypoints = tf.concat([v, u], 2) + new_keypoints = tf.transpose(new_keypoints, [1, 0, 2]) + return new_keypoints diff --git a/object_detection/core/keypoint_ops_test.py b/object_detection/core/keypoint_ops_test.py new file mode 100644 index 0000000000000000000000000000000000000000..27c227bcfe20e65bbda237c1fc7c572ac593fc0e --- /dev/null +++ b/object_detection/core/keypoint_ops_test.py @@ -0,0 +1,168 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.core.keypoint_ops.""" +import numpy as np +import tensorflow as tf + +from object_detection.core import keypoint_ops + + +class KeypointOpsTest(tf.test.TestCase): + """Tests for common keypoint operations.""" + + def test_scale(self): + keypoints = tf.constant([ + [[0.0, 0.0], [100.0, 200.0]], + [[50.0, 120.0], [100.0, 140.0]] + ]) + y_scale = tf.constant(1.0 / 100) + x_scale = tf.constant(1.0 / 200) + + expected_keypoints = tf.constant([ + [[0., 0.], [1.0, 1.0]], + [[0.5, 0.6], [1.0, 0.7]] + ]) + output = keypoint_ops.scale(keypoints, y_scale, x_scale) + + with self.test_session() as sess: + output_, expected_keypoints_ = sess.run([output, expected_keypoints]) + self.assertAllClose(output_, expected_keypoints_) + + def test_clip_to_window(self): + keypoints = tf.constant([ + [[0.25, 0.5], [0.75, 0.75]], + [[0.5, 0.0], [1.0, 1.0]] + ]) + window = tf.constant([0.25, 0.25, 0.75, 0.75]) + + expected_keypoints = tf.constant([ + [[0.25, 0.5], [0.75, 0.75]], + [[0.5, 0.25], [0.75, 0.75]] + ]) + output = keypoint_ops.clip_to_window(keypoints, window) + + with self.test_session() as sess: + output_, expected_keypoints_ = sess.run([output, expected_keypoints]) + self.assertAllClose(output_, expected_keypoints_) + + def test_prune_outside_window(self): + keypoints = tf.constant([ + [[0.25, 0.5], [0.75, 0.75]], + [[0.5, 0.0], [1.0, 1.0]] + ]) + window = tf.constant([0.25, 0.25, 0.75, 0.75]) + + expected_keypoints = tf.constant([[[0.25, 0.5], [0.75, 0.75]], + [[np.nan, np.nan], [np.nan, np.nan]]]) + output = keypoint_ops.prune_outside_window(keypoints, window) + + with self.test_session() as sess: + output_, expected_keypoints_ = sess.run([output, expected_keypoints]) + self.assertAllClose(output_, expected_keypoints_) + + def test_change_coordinate_frame(self): + keypoints = tf.constant([ + [[0.25, 0.5], [0.75, 0.75]], + [[0.5, 0.0], [1.0, 1.0]] + ]) + window = tf.constant([0.25, 0.25, 0.75, 0.75]) + + expected_keypoints = tf.constant([ + [[0, 0.5], [1.0, 1.0]], + [[0.5, -0.5], [1.5, 1.5]] + ]) + output = keypoint_ops.change_coordinate_frame(keypoints, 
window) + + with self.test_session() as sess: + output_, expected_keypoints_ = sess.run([output, expected_keypoints]) + self.assertAllClose(output_, expected_keypoints_) + + def test_to_normalized_coordinates(self): + keypoints = tf.constant([ + [[10., 30.], [30., 45.]], + [[20., 0.], [40., 60.]] + ]) + output = keypoint_ops.to_normalized_coordinates( + keypoints, 40, 60) + expected_keypoints = tf.constant([ + [[0.25, 0.5], [0.75, 0.75]], + [[0.5, 0.0], [1.0, 1.0]] + ]) + + with self.test_session() as sess: + output_, expected_keypoints_ = sess.run([output, expected_keypoints]) + self.assertAllClose(output_, expected_keypoints_) + + def test_to_normalized_coordinates_already_normalized(self): + keypoints = tf.constant([ + [[0.25, 0.5], [0.75, 0.75]], + [[0.5, 0.0], [1.0, 1.0]] + ]) + output = keypoint_ops.to_normalized_coordinates( + keypoints, 40, 60) + + with self.test_session() as sess: + with self.assertRaisesOpError('assertion failed'): + sess.run(output) + + def test_to_absolute_coordinates(self): + keypoints = tf.constant([ + [[0.25, 0.5], [0.75, 0.75]], + [[0.5, 0.0], [1.0, 1.0]] + ]) + output = keypoint_ops.to_absolute_coordinates( + keypoints, 40, 60) + expected_keypoints = tf.constant([ + [[10., 30.], [30., 45.]], + [[20., 0.], [40., 60.]] + ]) + + with self.test_session() as sess: + output_, expected_keypoints_ = sess.run([output, expected_keypoints]) + self.assertAllClose(output_, expected_keypoints_) + + def test_to_absolute_coordinates_already_absolute(self): + keypoints = tf.constant([ + [[10., 30.], [30., 45.]], + [[20., 0.], [40., 60.]] + ]) + output = keypoint_ops.to_absolute_coordinates( + keypoints, 40, 60) + + with self.test_session() as sess: + with self.assertRaisesOpError('assertion failed'): + sess.run(output) + + def test_flip_horizontal(self): + keypoints = tf.constant([ + [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]], + [[0.4, 0.4], [0.5, 0.5], [0.6, 0.6]] + ]) + flip_permutation = [0, 2, 1] + + expected_keypoints = tf.constant([ + [[0.1, 0.9], [0.3, 0.7], [0.2, 0.8]], + [[0.4, 0.6], [0.6, 0.4], [0.5, 0.5]], + ]) + output = keypoint_ops.flip_horizontal(keypoints, 0.5, flip_permutation) + + with self.test_session() as sess: + output_, expected_keypoints_ = sess.run([output, expected_keypoints]) + self.assertAllClose(output_, expected_keypoints_) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/core/losses.py b/object_detection/core/losses.py new file mode 100644 index 0000000000000000000000000000000000000000..75c7b5fc40ddafe63161b311eb9aa910f5a22eb8 --- /dev/null +++ b/object_detection/core/losses.py @@ -0,0 +1,551 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Classification and regression loss functions for object detection. 
+ +Localization losses: + * WeightedL2LocalizationLoss + * WeightedSmoothL1LocalizationLoss + * WeightedIOULocalizationLoss + +Classification losses: + * WeightedSigmoidClassificationLoss + * WeightedSoftmaxClassificationLoss + * BootstrappedSigmoidClassificationLoss +""" +from abc import ABCMeta +from abc import abstractmethod + +import tensorflow as tf + +from object_detection.core import box_list +from object_detection.core import box_list_ops +from object_detection.utils import ops + +slim = tf.contrib.slim + + +class Loss(object): + """Abstract base class for loss functions.""" + __metaclass__ = ABCMeta + + def __call__(self, + prediction_tensor, + target_tensor, + ignore_nan_targets=False, + scope=None, + **params): + """Call the loss function. + + Args: + prediction_tensor: a tensor representing predicted quantities. + target_tensor: a tensor representing regression or classification targets. + ignore_nan_targets: whether to ignore nan targets in the loss computation. + E.g. can be used if the target tensor is missing groundtruth data that + shouldn't be factored into the loss. + scope: Op scope name. Defaults to 'Loss' if None. + **params: Additional keyword arguments for specific implementations of + the Loss. + + Returns: + loss: a tensor representing the value of the loss function. + """ + with tf.name_scope(scope, 'Loss', + [prediction_tensor, target_tensor, params]) as scope: + if ignore_nan_targets: + target_tensor = tf.where(tf.is_nan(target_tensor), + prediction_tensor, + target_tensor) + return self._compute_loss(prediction_tensor, target_tensor, **params) + + @abstractmethod + def _compute_loss(self, prediction_tensor, target_tensor, **params): + """Method to be overriden by implementations. + + Args: + prediction_tensor: a tensor representing predicted quantities + target_tensor: a tensor representing regression or classification targets + **params: Additional keyword arguments for specific implementations of + the Loss. + + Returns: + loss: a tensor representing the value of the loss function + """ + pass + + +class WeightedL2LocalizationLoss(Loss): + """L2 localization loss function with anchorwise output support. + + Loss[b,a] = .5 * ||weights[b,a] * (prediction[b,a,:] - target[b,a,:])||^2 + """ + + def __init__(self, anchorwise_output=False): + """Constructor. + + Args: + anchorwise_output: Outputs loss per anchor. (default False) + + """ + self._anchorwise_output = anchorwise_output + + def _compute_loss(self, prediction_tensor, target_tensor, weights): + """Compute loss function. + + Args: + prediction_tensor: A float tensor of shape [batch_size, num_anchors, + code_size] representing the (encoded) predicted locations of objects. + target_tensor: A float tensor of shape [batch_size, num_anchors, + code_size] representing the regression targets + weights: a float tensor of shape [batch_size, num_anchors] + + Returns: + loss: a (scalar) tensor representing the value of the loss function + or a float tensor of shape [batch_size, num_anchors] + """ + weighted_diff = (prediction_tensor - target_tensor) * tf.expand_dims( + weights, 2) + square_diff = 0.5 * tf.square(weighted_diff) + if self._anchorwise_output: + return tf.reduce_sum(square_diff, 2) + return tf.reduce_sum(square_diff) + + +class WeightedSmoothL1LocalizationLoss(Loss): + """Smooth L1 localization loss function. + + The smooth L1_loss is defined elementwise as .5 x^2 if |x|<1 and |x|-.5 + otherwise, where x is the difference between predictions and target. 
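The elementwise penalty described above is easy to see in isolation. A short NumPy sketch (illustrative only, not part of this change):

```
import numpy as np

def smooth_l1(x):
  # 0.5 * x**2 where |x| < 1, and |x| - 0.5 elsewhere.
  abs_x = np.abs(x)
  return np.where(abs_x < 1, 0.5 * np.square(abs_x), abs_x - 0.5)

diff = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(smooth_l1(diff))  # [1.5, 0.125, 0., 0.125, 1.5]
```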
+ + See also Equation (3) in the Fast R-CNN paper by Ross Girshick (ICCV 2015) + """ + + def __init__(self, anchorwise_output=False): + """Constructor. + + Args: + anchorwise_output: Outputs loss per anchor. (default False) + + """ + self._anchorwise_output = anchorwise_output + + def _compute_loss(self, prediction_tensor, target_tensor, weights): + """Compute loss function. + + Args: + prediction_tensor: A float tensor of shape [batch_size, num_anchors, + code_size] representing the (encoded) predicted locations of objects. + target_tensor: A float tensor of shape [batch_size, num_anchors, + code_size] representing the regression targets + weights: a float tensor of shape [batch_size, num_anchors] + + Returns: + loss: a (scalar) tensor representing the value of the loss function + """ + diff = prediction_tensor - target_tensor + abs_diff = tf.abs(diff) + abs_diff_lt_1 = tf.less(abs_diff, 1) + anchorwise_smooth_l1norm = tf.reduce_sum( + tf.where(abs_diff_lt_1, 0.5 * tf.square(abs_diff), abs_diff - 0.5), + 2) * weights + if self._anchorwise_output: + return anchorwise_smooth_l1norm + return tf.reduce_sum(anchorwise_smooth_l1norm) + + +class WeightedIOULocalizationLoss(Loss): + """IOU localization loss function. + + Sums the IOU for corresponding pairs of predicted/groundtruth boxes + and for each pair assign a loss of 1 - IOU. We then compute a weighted + sum over all pairs which is returned as the total loss. + """ + + def _compute_loss(self, prediction_tensor, target_tensor, weights): + """Compute loss function. + + Args: + prediction_tensor: A float tensor of shape [batch_size, num_anchors, 4] + representing the decoded predicted boxes + target_tensor: A float tensor of shape [batch_size, num_anchors, 4] + representing the decoded target boxes + weights: a float tensor of shape [batch_size, num_anchors] + + Returns: + loss: a (scalar) tensor representing the value of the loss function + """ + predicted_boxes = box_list.BoxList(tf.reshape(prediction_tensor, [-1, 4])) + target_boxes = box_list.BoxList(tf.reshape(target_tensor, [-1, 4])) + per_anchor_iou_loss = 1.0 - box_list_ops.matched_iou(predicted_boxes, + target_boxes) + return tf.reduce_sum(tf.reshape(weights, [-1]) * per_anchor_iou_loss) + + +class WeightedSigmoidClassificationLoss(Loss): + """Sigmoid cross entropy classification loss function.""" + + def __init__(self, anchorwise_output=False): + """Constructor. + + Args: + anchorwise_output: Outputs loss per anchor. (default False) + + """ + self._anchorwise_output = anchorwise_output + + def _compute_loss(self, + prediction_tensor, + target_tensor, + weights, + class_indices=None): + """Compute loss function. + + Args: + prediction_tensor: A float tensor of shape [batch_size, num_anchors, + num_classes] representing the predicted logits for each class + target_tensor: A float tensor of shape [batch_size, num_anchors, + num_classes] representing one-hot encoded classification targets + weights: a float tensor of shape [batch_size, num_anchors] + class_indices: (Optional) A 1-D integer tensor of class indices. + If provided, computes loss only for the specified class indices. 
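The optional class_indices argument effectively zeroes the loss contribution of every class column that is not listed. Conceptually the masking looks like the following NumPy sketch (illustrative only; the actual code builds the mask with ops.indices_to_dense_vector):

```
import numpy as np

num_classes = 5
class_indices = np.array([1, 3])

# Dense 0/1 mask over class columns.
class_mask = np.zeros(num_classes, dtype=np.float32)
class_mask[class_indices] = 1.0  # [0., 1., 0., 1., 0.]

# weights has shape [batch_size, num_anchors, 1]; broadcasting the mask over
# the class dimension removes all unselected columns from the loss.
weights = np.ones((2, 3, 1), dtype=np.float32)
masked_weights = weights * class_mask.reshape(1, 1, -1)  # shape (2, 3, 5)
```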
+ + Returns: + loss: a (scalar) tensor representing the value of the loss function + or a float tensor of shape [batch_size, num_anchors] + """ + weights = tf.expand_dims(weights, 2) + if class_indices is not None: + weights *= tf.reshape( + ops.indices_to_dense_vector(class_indices, + tf.shape(prediction_tensor)[2]), + [1, 1, -1]) + per_entry_cross_ent = (tf.nn.sigmoid_cross_entropy_with_logits( + labels=target_tensor, logits=prediction_tensor)) + if self._anchorwise_output: + return tf.reduce_sum(per_entry_cross_ent * weights, 2) + return tf.reduce_sum(per_entry_cross_ent * weights) + + +class WeightedSoftmaxClassificationLoss(Loss): + """Softmax loss function.""" + + def __init__(self, anchorwise_output=False): + """Constructor. + + Args: + anchorwise_output: Whether to output loss per anchor (default False) + + """ + self._anchorwise_output = anchorwise_output + + def _compute_loss(self, prediction_tensor, target_tensor, weights): + """Compute loss function. + + Args: + prediction_tensor: A float tensor of shape [batch_size, num_anchors, + num_classes] representing the predicted logits for each class + target_tensor: A float tensor of shape [batch_size, num_anchors, + num_classes] representing one-hot encoded classification targets + weights: a float tensor of shape [batch_size, num_anchors] + + Returns: + loss: a (scalar) tensor representing the value of the loss function + """ + num_classes = prediction_tensor.get_shape().as_list()[-1] + per_row_cross_ent = (tf.nn.softmax_cross_entropy_with_logits( + labels=tf.reshape(target_tensor, [-1, num_classes]), + logits=tf.reshape(prediction_tensor, [-1, num_classes]))) + if self._anchorwise_output: + return tf.reshape(per_row_cross_ent, tf.shape(weights)) * weights + return tf.reduce_sum(per_row_cross_ent * tf.reshape(weights, [-1])) + + +class BootstrappedSigmoidClassificationLoss(Loss): + """Bootstrapped sigmoid cross entropy classification loss function. + + This loss uses a convex combination of training labels and the current model's + predictions as training targets in the classification loss. The idea is that + as the model improves over time, its predictions can be trusted more and we + can use these predictions to mitigate the damage of noisy/incorrect labels, + because incorrect labels are likely to be eventually highly inconsistent with + other stimuli predicted to have the same label by the model. + + In "soft" bootstrapping, we use all predicted class probabilities, whereas in + "hard" bootstrapping, we use the single class favored by the model. + + See also Training Deep Neural Networks On Noisy Labels with Bootstrapping by + Reed et al. (ICLR 2015). + """ + + def __init__(self, alpha, bootstrap_type='soft', anchorwise_output=False): + """Constructor. + + Args: + alpha: a float32 scalar tensor between 0 and 1 representing interpolation + weight + bootstrap_type: set to either 'hard' or 'soft' (default) + anchorwise_output: Outputs loss per anchor. (default False) + + Raises: + ValueError: if bootstrap_type is not either 'hard' or 'soft' + """ + if bootstrap_type != 'hard' and bootstrap_type != 'soft': + raise ValueError('Unrecognized bootstrap_type: must be one of ' + '\'hard\' or \'soft.\'') + self._alpha = alpha + self._bootstrap_type = bootstrap_type + self._anchorwise_output = anchorwise_output + + def _compute_loss(self, prediction_tensor, target_tensor, weights): + """Compute loss function. 
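The soft and hard bootstrap variants differ only in how the model's prediction enters the convex combination. A NumPy sketch of both target computations (illustrative only, not part of this change):

```
import numpy as np

def sigmoid(x):
  return 1.0 / (1.0 + np.exp(-x))

alpha = 0.95
labels = np.array([1.0, 0.0, 1.0])
logits = np.array([2.0, -1.0, -3.0])  # the last label disagrees with the model

soft_targets = alpha * labels + (1 - alpha) * sigmoid(logits)
# -> approx [0.994, 0.013, 0.952]
hard_targets = alpha * labels + (1 - alpha) * (sigmoid(logits) > 0.5).astype(np.float32)
# -> [1.0, 0.0, 0.95]
```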
+ + Args: + prediction_tensor: A float tensor of shape [batch_size, num_anchors, + num_classes] representing the predicted logits for each class + target_tensor: A float tensor of shape [batch_size, num_anchors, + num_classes] representing one-hot encoded classification targets + weights: a float tensor of shape [batch_size, num_anchors] + + Returns: + loss: a (scalar) tensor representing the value of the loss function + or a float tensor of shape [batch_size, num_anchors] + """ + if self._bootstrap_type == 'soft': + bootstrap_target_tensor = self._alpha * target_tensor + ( + 1.0 - self._alpha) * tf.sigmoid(prediction_tensor) + else: + bootstrap_target_tensor = self._alpha * target_tensor + ( + 1.0 - self._alpha) * tf.cast( + tf.sigmoid(prediction_tensor) > 0.5, tf.float32) + per_entry_cross_ent = (tf.nn.sigmoid_cross_entropy_with_logits( + labels=bootstrap_target_tensor, logits=prediction_tensor)) + if self._anchorwise_output: + return tf.reduce_sum(per_entry_cross_ent * tf.expand_dims(weights, 2), 2) + return tf.reduce_sum(per_entry_cross_ent * tf.expand_dims(weights, 2)) + + +class HardExampleMiner(object): + """Hard example mining for regions in a list of images. + + Implements hard example mining to select a subset of regions to be + back-propagated. For each image, selects the regions with highest losses, + subject to the condition that a newly selected region cannot have + an IOU > iou_threshold with any of the previously selected regions. + This can be achieved by re-using a greedy non-maximum suppression algorithm. + A constraint on the number of negatives mined per positive region can also be + enforced. + + Reference papers: "Training Region-based Object Detectors with Online + Hard Example Mining" (CVPR 2016) by Srivastava et al., and + "SSD: Single Shot MultiBox Detector" (ECCV 2016) by Liu et al. + """ + + def __init__(self, + num_hard_examples=64, + iou_threshold=0.7, + loss_type='both', + cls_loss_weight=0.05, + loc_loss_weight=0.06, + max_negatives_per_positive=None, + min_negatives_per_image=0): + """Constructor. + + The hard example mining implemented by this class can replicate the behavior + in the two aforementioned papers (Srivastava et al., and Liu et al). + To replicate the A2 paper (Srivastava et al), num_hard_examples is set + to a fixed parameter (64 by default) and iou_threshold is set to .7 for + running non-max-suppression the predicted boxes prior to hard mining. + In order to replicate the SSD paper (Liu et al), num_hard_examples should + be set to None, max_negatives_per_positive should be 3 and iou_threshold + should be 1.0 (in order to effectively turn off NMS). + + Args: + num_hard_examples: maximum number of hard examples to be + selected per image (prior to enforcing max negative to positive ratio + constraint). If set to None, all examples obtained after NMS are + considered. + iou_threshold: minimum intersection over union for an example + to be discarded during NMS. + loss_type: use only classification losses ('cls', default), + localization losses ('loc') or both losses ('both'). + In the last case, cls_loss_weight and loc_loss_weight are used to + compute weighted sum of the two losses. + cls_loss_weight: weight for classification loss. + loc_loss_weight: weight for location loss. + max_negatives_per_positive: maximum number of negatives to retain for + each positive anchor. By default, num_negatives_per_positive is None, + which means that we do not enforce a prespecified negative:positive + ratio. 
Note also that num_negatives_per_positives can be a float + (and will be converted to be a float even if it is passed in otherwise). + min_negatives_per_image: minimum number of negative anchors to sample for + a given image. Setting this to a positive number allows sampling + negatives in an image without any positive anchors and thus not biased + towards at least one detection per image. + """ + self._num_hard_examples = num_hard_examples + self._iou_threshold = iou_threshold + self._loss_type = loss_type + self._cls_loss_weight = cls_loss_weight + self._loc_loss_weight = loc_loss_weight + self._max_negatives_per_positive = max_negatives_per_positive + self._min_negatives_per_image = min_negatives_per_image + if self._max_negatives_per_positive is not None: + self._max_negatives_per_positive = float(self._max_negatives_per_positive) + self._num_positives_list = None + self._num_negatives_list = None + + def __call__(self, + location_losses, + cls_losses, + decoded_boxlist_list, + match_list=None): + """Computes localization and classification losses after hard mining. + + Args: + location_losses: a float tensor of shape [num_images, num_anchors] + representing anchorwise localization losses. + cls_losses: a float tensor of shape [num_images, num_anchors] + representing anchorwise classification losses. + decoded_boxlist_list: a list of decoded BoxList representing location + predictions for each image. + match_list: an optional list of matcher.Match objects encoding the match + between anchors and groundtruth boxes for each image of the batch, + with rows of the Match objects corresponding to groundtruth boxes + and columns corresponding to anchors. Match objects in match_list are + used to reference which anchors are positive, negative or ignored. If + self._max_negatives_per_positive exists, these are then used to enforce + a prespecified negative to positive ratio. + + Returns: + mined_location_loss: a float scalar with sum of localization losses from + selected hard examples. + mined_cls_loss: a float scalar with sum of classification losses from + selected hard examples. + Raises: + ValueError: if location_losses, cls_losses and decoded_boxlist_list do + not have compatible shapes (i.e., they must correspond to the same + number of images). + ValueError: if match_list is specified but its length does not match + len(decoded_boxlist_list). 
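For concreteness, the two mining configurations described in the constructor docstring would be set up roughly as follows (illustrative only; the hyperparameter values are the ones suggested above):

```
from object_detection.core import losses

# OHEM-style mining: fixed budget of hard examples, NMS at IOU 0.7.
ohem_miner = losses.HardExampleMiner(
    num_hard_examples=64, iou_threshold=0.7, loss_type='both',
    cls_loss_weight=0.05, loc_loss_weight=0.06)

# SSD-style mining: no fixed budget, NMS effectively disabled, and at most
# 3 negatives retained per positive anchor.
ssd_miner = losses.HardExampleMiner(
    num_hard_examples=None, iou_threshold=1.0, loss_type='cls',
    max_negatives_per_positive=3)
```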
+ """ + mined_location_losses = [] + mined_cls_losses = [] + location_losses = tf.unstack(location_losses) + cls_losses = tf.unstack(cls_losses) + num_images = len(decoded_boxlist_list) + if not match_list: + match_list = num_images * [None] + if not len(location_losses) == len(decoded_boxlist_list) == len(cls_losses): + raise ValueError('location_losses, cls_losses and decoded_boxlist_list ' + 'do not have compatible shapes.') + if not isinstance(match_list, list): + raise ValueError('match_list must be a list.') + if len(match_list) != len(decoded_boxlist_list): + raise ValueError('match_list must either be None or have ' + 'length=len(decoded_boxlist_list).') + num_positives_list = [] + num_negatives_list = [] + for ind, detection_boxlist in enumerate(decoded_boxlist_list): + box_locations = detection_boxlist.get() + match = match_list[ind] + image_losses = cls_losses[ind] + if self._loss_type == 'loc': + image_losses = location_losses[ind] + elif self._loss_type == 'both': + image_losses *= self._cls_loss_weight + image_losses += location_losses[ind] * self._loc_loss_weight + if self._num_hard_examples is not None: + num_hard_examples = self._num_hard_examples + else: + num_hard_examples = detection_boxlist.num_boxes() + selected_indices = tf.image.non_max_suppression( + box_locations, image_losses, num_hard_examples, self._iou_threshold) + if self._max_negatives_per_positive is not None and match: + (selected_indices, num_positives, + num_negatives) = self._subsample_selection_to_desired_neg_pos_ratio( + selected_indices, match, self._max_negatives_per_positive, + self._min_negatives_per_image) + num_positives_list.append(num_positives) + num_negatives_list.append(num_negatives) + mined_location_losses.append( + tf.reduce_sum(tf.gather(location_losses[ind], selected_indices))) + mined_cls_losses.append( + tf.reduce_sum(tf.gather(cls_losses[ind], selected_indices))) + location_loss = tf.reduce_sum(tf.stack(mined_location_losses)) + cls_loss = tf.reduce_sum(tf.stack(mined_cls_losses)) + if match and self._max_negatives_per_positive: + self._num_positives_list = num_positives_list + self._num_negatives_list = num_negatives_list + return (location_loss, cls_loss) + + def summarize(self): + """Summarize the number of positives and negatives after mining.""" + if self._num_positives_list and self._num_negatives_list: + avg_num_positives = tf.reduce_mean(tf.to_float(self._num_positives_list)) + avg_num_negatives = tf.reduce_mean(tf.to_float(self._num_negatives_list)) + tf.summary.scalar('HardExampleMiner/NumPositives', avg_num_positives) + tf.summary.scalar('HardExampleMiner/NumNegatives', avg_num_negatives) + + def _subsample_selection_to_desired_neg_pos_ratio(self, + indices, + match, + max_negatives_per_positive, + min_negatives_per_image=0): + """Subsample a collection of selected indices to a desired neg:pos ratio. + + This function takes a subset of M indices (indexing into a large anchor + collection of N anchors where M=0, + meaning that column i is matched with row match_results[i]. + (2) match_results[i]=-1, meaning that column i is not matched. + (3) match_results[i]=-2, meaning that column i is ignored. 
+ + Raises: + ValueError: if match_results does not have rank 1 or is not an + integer int32 scalar tensor + """ + if match_results.shape.ndims != 1: + raise ValueError('match_results should have rank 1') + if match_results.dtype != tf.int32: + raise ValueError('match_results should be an int32 or int64 scalar ' + 'tensor') + self._match_results = match_results + + @property + def match_results(self): + """The accessor for match results. + + Returns: + the tensor which encodes the match results. + """ + return self._match_results + + def matched_column_indices(self): + """Returns column indices that match to some row. + + The indices returned by this op are always sorted in increasing order. + + Returns: + column_indices: int32 tensor of shape [K] with column indices. + """ + return self._reshape_and_cast(tf.where(tf.greater(self._match_results, -1))) + + def matched_column_indicator(self): + """Returns column indices that are matched. + + Returns: + column_indices: int32 tensor of shape [K] with column indices. + """ + return tf.greater_equal(self._match_results, 0) + + def num_matched_columns(self): + """Returns number (int32 scalar tensor) of matched columns.""" + return tf.size(self.matched_column_indices()) + + def unmatched_column_indices(self): + """Returns column indices that do not match any row. + + The indices returned by this op are always sorted in increasing order. + + Returns: + column_indices: int32 tensor of shape [K] with column indices. + """ + return self._reshape_and_cast(tf.where(tf.equal(self._match_results, -1))) + + def unmatched_column_indicator(self): + """Returns column indices that are unmatched. + + Returns: + column_indices: int32 tensor of shape [K] with column indices. + """ + return tf.equal(self._match_results, -1) + + def num_unmatched_columns(self): + """Returns number (int32 scalar tensor) of unmatched columns.""" + return tf.size(self.unmatched_column_indices()) + + def ignored_column_indices(self): + """Returns column indices that are ignored (neither Matched nor Unmatched). + + The indices returned by this op are always sorted in increasing order. + + Returns: + column_indices: int32 tensor of shape [K] with column indices. + """ + return self._reshape_and_cast(tf.where(self.ignored_column_indicator())) + + def ignored_column_indicator(self): + """Returns boolean column indicator where True means the colum is ignored. + + Returns: + column_indicator: boolean vector which is True for all ignored column + indices. + """ + return tf.equal(self._match_results, -2) + + def num_ignored_columns(self): + """Returns number (int32 scalar tensor) of matched columns.""" + return tf.size(self.ignored_column_indices()) + + def unmatched_or_ignored_column_indices(self): + """Returns column indices that are unmatched or ignored. + + The indices returned by this op are always sorted in increasing order. + + Returns: + column_indices: int32 tensor of shape [K] with column indices. + """ + return self._reshape_and_cast(tf.where(tf.greater(0, self._match_results))) + + def matched_row_indices(self): + """Returns row indices that match some column. + + The indices returned by this op are ordered so as to be in correspondence + with the output of matched_column_indicator(). For example if + self.matched_column_indicator() is [0,2], and self.matched_row_indices() is + [7, 3], then we know that column 0 was matched to row 7 and column 2 was + matched to row 3. + + Returns: + row_indices: int32 tensor of shape [K] with row indices. 
+ """ + return self._reshape_and_cast( + tf.gather(self._match_results, self.matched_column_indices())) + + def _reshape_and_cast(self, t): + return tf.cast(tf.reshape(t, [-1]), tf.int32) + + +class Matcher(object): + """Abstract base class for matcher. + """ + __metaclass__ = ABCMeta + + def match(self, similarity_matrix, scope=None, **params): + """Computes matches among row and column indices and returns the result. + + Computes matches among the row and column indices based on the similarity + matrix and optional arguments. + + Args: + similarity_matrix: Float tensor of shape [N, M] with pairwise similarity + where higher value means more similar. + scope: Op scope name. Defaults to 'Match' if None. + **params: Additional keyword arguments for specific implementations of + the Matcher. + + Returns: + A Match object with the results of matching. + """ + with tf.name_scope(scope, 'Match', [similarity_matrix, params]) as scope: + return Match(self._match(similarity_matrix, **params)) + + @abstractmethod + def _match(self, similarity_matrix, **params): + """Method to be overriden by implementations. + + Args: + similarity_matrix: Float tensor of shape [N, M] with pairwise similarity + where higher value means more similar. + **params: Additional keyword arguments for specific implementations of + the Matcher. + + Returns: + match_results: Integer tensor of shape [M]: match_results[i]>=0 means + that column i is matched to row match_results[i], match_results[i]=-1 + means that the column is not matched. match_results[i]=-2 means that + the column is ignored (usually this happens when there is a very weak + match which one neither wants as positive nor negative example). + """ + pass diff --git a/object_detection/core/matcher_test.py b/object_detection/core/matcher_test.py new file mode 100644 index 0000000000000000000000000000000000000000..7054015f23c5713e30bac14dd166cc9ee7668da9 --- /dev/null +++ b/object_detection/core/matcher_test.py @@ -0,0 +1,150 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
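To illustrate the Matcher contract above, a minimal thresholded argmax matcher could be written as follows. This is only a sketch; a matcher used in practice would handle thresholds, ties, and ignored columns more carefully.

```
import tensorflow as tf

from object_detection.core import matcher


class NaiveArgMaxMatcher(matcher.Matcher):
  """Matches each column to its best-scoring row, or to -1 below a threshold."""

  def __init__(self, matched_threshold=0.5):
    self._matched_threshold = matched_threshold

  def _match(self, similarity_matrix):
    # similarity_matrix is [N, M]; one int32 result per column, as required.
    matches = tf.cast(tf.argmax(similarity_matrix, 0), tf.int32)
    best_scores = tf.reduce_max(similarity_matrix, 0)
    return tf.where(best_scores >= self._matched_threshold,
                    matches, -1 * tf.ones_like(matches))
```

Calling `NaiveArgMaxMatcher().match(similarity_matrix)` then yields a `Match` object whose query methods (matched_column_indices and friends) are exercised by the tests below.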
+# ============================================================================== + +"""Tests for object_detection.core.matcher.""" +import numpy as np +import tensorflow as tf + +from object_detection.core import matcher + + +class AnchorMatcherTest(tf.test.TestCase): + + def test_get_correct_matched_columnIndices(self): + match_results = tf.constant([3, 1, -1, 0, -1, 5, -2]) + match = matcher.Match(match_results) + expected_column_indices = [0, 1, 3, 5] + matched_column_indices = match.matched_column_indices() + self.assertEquals(matched_column_indices.dtype, tf.int32) + with self.test_session() as sess: + matched_column_indices = sess.run(matched_column_indices) + self.assertAllEqual(matched_column_indices, expected_column_indices) + + def test_get_correct_counts(self): + match_results = tf.constant([3, 1, -1, 0, -1, 5, -2]) + match = matcher.Match(match_results) + exp_num_matched_columns = 4 + exp_num_unmatched_columns = 2 + exp_num_ignored_columns = 1 + num_matched_columns = match.num_matched_columns() + num_unmatched_columns = match.num_unmatched_columns() + num_ignored_columns = match.num_ignored_columns() + self.assertEquals(num_matched_columns.dtype, tf.int32) + self.assertEquals(num_unmatched_columns.dtype, tf.int32) + self.assertEquals(num_ignored_columns.dtype, tf.int32) + with self.test_session() as sess: + (num_matched_columns_out, num_unmatched_columns_out, + num_ignored_columns_out) = sess.run( + [num_matched_columns, num_unmatched_columns, num_ignored_columns]) + self.assertAllEqual(num_matched_columns_out, exp_num_matched_columns) + self.assertAllEqual(num_unmatched_columns_out, exp_num_unmatched_columns) + self.assertAllEqual(num_ignored_columns_out, exp_num_ignored_columns) + + def testGetCorrectUnmatchedColumnIndices(self): + match_results = tf.constant([3, 1, -1, 0, -1, 5, -2]) + match = matcher.Match(match_results) + expected_column_indices = [2, 4] + unmatched_column_indices = match.unmatched_column_indices() + self.assertEquals(unmatched_column_indices.dtype, tf.int32) + with self.test_session() as sess: + unmatched_column_indices = sess.run(unmatched_column_indices) + self.assertAllEqual(unmatched_column_indices, expected_column_indices) + + def testGetCorrectMatchedRowIndices(self): + match_results = tf.constant([3, 1, -1, 0, -1, 5, -2]) + match = matcher.Match(match_results) + expected_row_indices = [3, 1, 0, 5] + matched_row_indices = match.matched_row_indices() + self.assertEquals(matched_row_indices.dtype, tf.int32) + with self.test_session() as sess: + matched_row_inds = sess.run(matched_row_indices) + self.assertAllEqual(matched_row_inds, expected_row_indices) + + def test_get_correct_ignored_column_indices(self): + match_results = tf.constant([3, 1, -1, 0, -1, 5, -2]) + match = matcher.Match(match_results) + expected_column_indices = [6] + ignored_column_indices = match.ignored_column_indices() + self.assertEquals(ignored_column_indices.dtype, tf.int32) + with self.test_session() as sess: + ignored_column_indices = sess.run(ignored_column_indices) + self.assertAllEqual(ignored_column_indices, expected_column_indices) + + def test_get_correct_matched_column_indicator(self): + match_results = tf.constant([3, 1, -1, 0, -1, 5, -2]) + match = matcher.Match(match_results) + expected_column_indicator = [True, True, False, True, False, True, False] + matched_column_indicator = match.matched_column_indicator() + self.assertEquals(matched_column_indicator.dtype, tf.bool) + with self.test_session() as sess: + matched_column_indicator = 
sess.run(matched_column_indicator) + self.assertAllEqual(matched_column_indicator, expected_column_indicator) + + def test_get_correct_unmatched_column_indicator(self): + match_results = tf.constant([3, 1, -1, 0, -1, 5, -2]) + match = matcher.Match(match_results) + expected_column_indicator = [False, False, True, False, True, False, False] + unmatched_column_indicator = match.unmatched_column_indicator() + self.assertEquals(unmatched_column_indicator.dtype, tf.bool) + with self.test_session() as sess: + unmatched_column_indicator = sess.run(unmatched_column_indicator) + self.assertAllEqual(unmatched_column_indicator, expected_column_indicator) + + def test_get_correct_ignored_column_indicator(self): + match_results = tf.constant([3, 1, -1, 0, -1, 5, -2]) + match = matcher.Match(match_results) + expected_column_indicator = [False, False, False, False, False, False, True] + ignored_column_indicator = match.ignored_column_indicator() + self.assertEquals(ignored_column_indicator.dtype, tf.bool) + with self.test_session() as sess: + ignored_column_indicator = sess.run(ignored_column_indicator) + self.assertAllEqual(ignored_column_indicator, expected_column_indicator) + + def test_get_correct_unmatched_ignored_column_indices(self): + match_results = tf.constant([3, 1, -1, 0, -1, 5, -2]) + match = matcher.Match(match_results) + expected_column_indices = [2, 4, 6] + unmatched_ignored_column_indices = (match. + unmatched_or_ignored_column_indices()) + self.assertEquals(unmatched_ignored_column_indices.dtype, tf.int32) + with self.test_session() as sess: + unmatched_ignored_column_indices = sess.run( + unmatched_ignored_column_indices) + self.assertAllEqual(unmatched_ignored_column_indices, + expected_column_indices) + + def test_all_columns_accounted_for(self): + # Note: deliberately setting to small number so not always + # all possibilities appear (matched, unmatched, ignored) + num_matches = 10 + match_results = tf.random_uniform( + [num_matches], minval=-2, maxval=5, dtype=tf.int32) + match = matcher.Match(match_results) + matched_column_indices = match.matched_column_indices() + unmatched_column_indices = match.unmatched_column_indices() + ignored_column_indices = match.ignored_column_indices() + with self.test_session() as sess: + matched, unmatched, ignored = sess.run([ + matched_column_indices, unmatched_column_indices, + ignored_column_indices + ]) + all_indices = np.hstack((matched, unmatched, ignored)) + all_indices_sorted = np.sort(all_indices) + self.assertAllEqual(all_indices_sorted, + np.arange(num_matches, dtype=np.int32)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/core/minibatch_sampler.py b/object_detection/core/minibatch_sampler.py new file mode 100644 index 0000000000000000000000000000000000000000..dc622221ae526360d0a5f85f914bc2c53365911c --- /dev/null +++ b/object_detection/core/minibatch_sampler.py @@ -0,0 +1,90 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Base minibatch sampler module. + +The job of the minibatch_sampler is to subsample a minibatch based on some +criterion. + +The main function call is: + subsample(indicator, batch_size, **params). +Indicator is a 1d boolean tensor where True denotes which examples can be +sampled. It returns a boolean indicator where True denotes an example has been +sampled.. + +Subclasses should implement the Subsample function and can make use of the +@staticmethod SubsampleIndicator. +""" + +from abc import ABCMeta +from abc import abstractmethod + +import tensorflow as tf + +from object_detection.utils import ops + + +class MinibatchSampler(object): + """Abstract base class for subsampling minibatches.""" + __metaclass__ = ABCMeta + + def __init__(self): + """Constructs a minibatch sampler.""" + pass + + @abstractmethod + def subsample(self, indicator, batch_size, **params): + """Returns subsample of entries in indicator. + + Args: + indicator: boolean tensor of shape [N] whose True entries can be sampled. + batch_size: desired batch size. + **params: additional keyword arguments for specific implementations of + the MinibatchSampler. + + Returns: + sample_indicator: boolean tensor of shape [N] whose True entries have been + sampled. If sum(indicator) >= batch_size, sum(is_sampled) = batch_size + """ + pass + + @staticmethod + def subsample_indicator(indicator, num_samples): + """Subsample indicator vector. + + Given a boolean indicator vector with M elements set to `True`, the function + assigns all but `num_samples` of these previously `True` elements to + `False`. If `num_samples` is greater than M, the original indicator vector + is returned. + + Args: + indicator: a 1-dimensional boolean tensor indicating which elements + are allowed to be sampled and which are not. + num_samples: int32 scalar tensor + + Returns: + a boolean tensor with the same shape as input (indicator) tensor + """ + indices = tf.where(indicator) + indices = tf.random_shuffle(indices) + indices = tf.reshape(indices, [-1]) + + num_samples = tf.minimum(tf.size(indices), num_samples) + selected_indices = tf.slice(indices, [0], tf.reshape(num_samples, [1])) + + selected_indicator = ops.indices_to_dense_vector(selected_indices, + tf.shape(indicator)[0]) + + return tf.equal(selected_indicator, 1) diff --git a/object_detection/core/minibatch_sampler_test.py b/object_detection/core/minibatch_sampler_test.py new file mode 100644 index 0000000000000000000000000000000000000000..7420ae5d03ca5318d2fd5df4dd4a5cee400189b1 --- /dev/null +++ b/object_detection/core/minibatch_sampler_test.py @@ -0,0 +1,82 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
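The subsample_indicator helper above is a shuffle-then-take-k selection over the allowed indices. The same idea in NumPy (illustrative only, not part of this change):

```
import numpy as np

indicator = np.array([True, False, True, False, True, True, False])
num_samples = 3

candidates = np.flatnonzero(indicator)  # indices that are allowed to be sampled
np.random.shuffle(candidates)
selected = candidates[:min(num_samples, candidates.size)]

sample_indicator = np.zeros_like(indicator)
sample_indicator[selected] = True  # exactly min(num_samples, #True) entries set
```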
+# ============================================================================== + +"""Tests for google3.research.vale.object_detection.minibatch_sampler.""" + +import numpy as np +import tensorflow as tf + +from object_detection.core import minibatch_sampler + + +class MinibatchSamplerTest(tf.test.TestCase): + + def test_subsample_indicator_when_more_true_elements_than_num_samples(self): + np_indicator = [True, False, True, False, True, True, False] + indicator = tf.constant(np_indicator) + samples = minibatch_sampler.MinibatchSampler.subsample_indicator( + indicator, 3) + with self.test_session() as sess: + samples_out = sess.run(samples) + self.assertTrue(np.sum(samples_out), 3) + self.assertAllEqual(samples_out, + np.logical_and(samples_out, np_indicator)) + + def test_subsample_when_more_true_elements_than_num_samples_no_shape(self): + np_indicator = [True, False, True, False, True, True, False] + indicator = tf.placeholder(tf.bool) + feed_dict = {indicator: np_indicator} + + samples = minibatch_sampler.MinibatchSampler.subsample_indicator( + indicator, 3) + with self.test_session() as sess: + samples_out = sess.run(samples, feed_dict=feed_dict) + self.assertTrue(np.sum(samples_out), 3) + self.assertAllEqual(samples_out, + np.logical_and(samples_out, np_indicator)) + + def test_subsample_indicator_when_less_true_elements_than_num_samples(self): + np_indicator = [True, False, True, False, True, True, False] + indicator = tf.constant(np_indicator) + samples = minibatch_sampler.MinibatchSampler.subsample_indicator( + indicator, 5) + with self.test_session() as sess: + samples_out = sess.run(samples) + self.assertTrue(np.sum(samples_out), 4) + self.assertAllEqual(samples_out, + np.logical_and(samples_out, np_indicator)) + + def test_subsample_indicator_when_num_samples_is_zero(self): + np_indicator = [True, False, True, False, True, True, False] + indicator = tf.constant(np_indicator) + samples_none = minibatch_sampler.MinibatchSampler.subsample_indicator( + indicator, 0) + with self.test_session() as sess: + samples_none_out = sess.run(samples_none) + self.assertAllEqual( + np.zeros_like(samples_none_out, dtype=bool), + samples_none_out) + + def test_subsample_indicator_when_indicator_all_false(self): + indicator_empty = tf.zeros([0], dtype=tf.bool) + samples_empty = minibatch_sampler.MinibatchSampler.subsample_indicator( + indicator_empty, 4) + with self.test_session() as sess: + samples_empty_out = sess.run(samples_empty) + self.assertEqual(0, samples_empty_out.size) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/core/model.py b/object_detection/core/model.py new file mode 100644 index 0000000000000000000000000000000000000000..b8a448b65223993fe2bf1a40b3e8efeee359346b --- /dev/null +++ b/object_detection/core/model.py @@ -0,0 +1,252 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Abstract detection model. 
+ +This file defines a generic base class for detection models. Programs that are +designed to work with arbitrary detection models should only depend on this +class. We intend for the functions in this class to follow tensor-in/tensor-out +design, thus all functions have tensors or lists/dictionaries holding tensors as +inputs and outputs. + +Abstractly, detection models predict output tensors given input images +which can be passed to a loss function at training time or passed to a +postprocessing function at eval time. The computation graphs at a high level +consequently look as follows: + +Training time: +inputs (images tensor) -> preprocess -> predict -> loss -> outputs (loss tensor) + +Evaluation time: +inputs (images tensor) -> preprocess -> predict -> postprocess + -> outputs (boxes tensor, scores tensor, classes tensor, num_detections tensor) + +DetectionModels must thus implement four functions (1) preprocess, (2) predict, +(3) postprocess and (4) loss. DetectionModels should make no assumptions about +the input size or aspect ratio --- they are responsible for doing any +resize/reshaping necessary (see docstring for the preprocess function). +Output classes are always integers in the range [0, num_classes). Any mapping +of these integers to semantic labels is to be handled outside of this class. + +By default, DetectionModels produce bounding box detections; However, we support +a handful of auxiliary annotations associated with each bounding box, namely, +instance masks and keypoints. +""" +from abc import ABCMeta +from abc import abstractmethod + +from object_detection.core import standard_fields as fields + + +class DetectionModel(object): + """Abstract base class for detection models.""" + __metaclass__ = ABCMeta + + def __init__(self, num_classes): + """Constructor. + + Args: + num_classes: number of classes. Note that num_classes *does not* include + background categories that might be implicitly be predicted in various + implementations. + """ + self._num_classes = num_classes + self._groundtruth_lists = {} + + @property + def num_classes(self): + return self._num_classes + + def groundtruth_lists(self, field): + """Access list of groundtruth tensors. + + Args: + field: a string key, options are + fields.BoxListFields.{boxes,classes,masks,keypoints} + + Returns: + a list of tensors holding groundtruth information (see also + provide_groundtruth function below), with one entry for each image in the + batch. + Raises: + RuntimeError: if the field has not been provided via provide_groundtruth. + """ + if field not in self._groundtruth_lists: + raise RuntimeError('Groundtruth tensor %s has not been provided', field) + return self._groundtruth_lists[field] + + @abstractmethod + def preprocess(self, inputs): + """Input preprocessing. + + To be overridden by implementations. + + This function is responsible for any scaling/shifting of input values that + is necessary prior to running the detector on an input image. + It is also responsible for any resizing that might be necessary as images + are assumed to arrive in arbitrary sizes. While this function could + conceivably be part of the predict method (below), it is often convenient + to keep these separate --- for example, we may want to preprocess on one + device, place onto a queue, and let another device (e.g., the GPU) handle + prediction. 
+ + A few important notes about the preprocess function: + + We assume that this operation does not have any trainable variables nor + does it affect the groundtruth annotations in any way (thus data + augmentation operations such as random cropping should be performed + externally). + + There is no assumption that the batchsize in this function is the same as + the batch size in the predict function. In fact, we recommend calling the + preprocess function prior to calling any batching operations (which should + happen outside of the model) and thus assuming that batch sizes are equal + to 1 in the preprocess function. + + There is also no explicit assumption that the output resolutions + must be fixed across inputs --- this is to support "fully convolutional" + settings in which input images can have different shapes/resolutions. + + Args: + inputs: a [batch, height_in, width_in, channels] float32 tensor + representing a batch of images with values between 0 and 255.0. + + Returns: + preprocessed_inputs: a [batch, height_out, width_out, channels] float32 + tensor representing a batch of images. + """ + pass + + @abstractmethod + def predict(self, preprocessed_inputs): + """Predict prediction tensors from inputs tensor. + + Outputs of this function can be passed to loss or postprocess functions. + + Args: + preprocessed_inputs: a [batch, height, width, channels] float32 tensor + representing a batch of images. + + Returns: + prediction_dict: a dictionary holding prediction tensors to be + passed to the Loss or Postprocess functions. + """ + pass + + @abstractmethod + def postprocess(self, prediction_dict, **params): + """Convert predicted output tensors to final detections. + + Outputs adhere to the following conventions: + * Classes are integers in [0, num_classes); background classes are removed + and the first non-background class is mapped to 0. + * Boxes are to be interpreted as being in [y_min, x_min, y_max, x_max] + format and normalized relative to the image window. + * `num_detections` is provided for settings where detections are padded to a + fixed number of boxes. + * We do not specifically assume any kind of probabilistic interpretation + of the scores --- the only important thing is their relative ordering. + Thus implementations of the postprocess function are free to output + logits, probabilities, calibrated probabilities, or anything else. + + Args: + prediction_dict: a dictionary holding prediction tensors. + **params: Additional keyword arguments for specific implementations of + DetectionModel. + + Returns: + detections: a dictionary containing the following fields + detection_boxes: [batch, max_detections, 4] + detection_scores: [batch, max_detections] + detection_classes: [batch, max_detections] + instance_masks: [batch, max_detections, image_height, image_width] + (optional) + keypoints: [batch, max_detections, num_keypoints, 2] (optional) + num_detections: [batch] + """ + pass + + @abstractmethod + def loss(self, prediction_dict): + """Compute scalar loss tensors with respect to provided groundtruth. + + Calling this function requires that groundtruth tensors have been + provided via the provide_groundtruth function. + + Args: + prediction_dict: a dictionary holding predicted tensors + + Returns: + a dictionary mapping strings (loss names) to scalar tensors representing + loss values. 
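To make the tensor-in/tensor-out contract above concrete, here is a toy subclass that satisfies the abstract interface and shows the preprocess -> predict -> loss call flow. It performs no real detection and is purely illustrative; none of it is part of the change itself.

```
import tensorflow as tf

from object_detection.core import model


class ToyModel(model.DetectionModel):
  """A do-nothing DetectionModel used only to illustrate the call flow."""

  def preprocess(self, inputs):
    # Scale pixel values from [0, 255] to roughly [-1, 1].
    return (inputs - 128.0) / 128.0

  def predict(self, preprocessed_inputs):
    return {'mean_activation': tf.reduce_mean(preprocessed_inputs)}

  def postprocess(self, prediction_dict, **params):
    return {'detection_boxes': tf.zeros([1, 1, 4]),
            'detection_scores': tf.zeros([1, 1]),
            'detection_classes': tf.zeros([1, 1]),
            'num_detections': tf.constant([1.0])}

  def loss(self, prediction_dict):
    return {'dummy_loss': tf.square(prediction_dict['mean_activation'])}

  def restore_fn(self, checkpoint_path, from_detection_checkpoint=True):
    return lambda sess: None


images = tf.zeros([1, 32, 32, 3])
toy = ToyModel(num_classes=1)
losses_dict = toy.loss(toy.predict(toy.preprocess(images)))
```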
+ """ + pass + + def provide_groundtruth(self, + groundtruth_boxes_list, + groundtruth_classes_list, + groundtruth_masks_list=None, + groundtruth_keypoints_list=None): + """Provide groundtruth tensors. + + Args: + groundtruth_boxes_list: a list of 2-D tf.float32 tensors of shape + [num_boxes, 4] containing coordinates of the groundtruth boxes. + Groundtruth boxes are provided in [y_min, x_min, y_max, x_max] + format and assumed to be normalized and clipped + relative to the image window with y_min <= y_max and x_min <= x_max. + groundtruth_classes_list: a list of 2-D tf.float32 one-hot (or k-hot) + tensors of shape [num_boxes, num_classes] containing the class targets + with the 0th index assumed to map to the first non-background class. + groundtruth_masks_list: a list of 2-D tf.float32 tensors of + shape [max_detections, height_in, width_in] containing instance + masks with values in {0, 1}. If None, no masks are provided. + Mask resolution `height_in`x`width_in` must agree with the resolution + of the input image tensor provided to the `preprocess` function. + groundtruth_keypoints_list: a list of 2-D tf.float32 tensors of + shape [batch, max_detections, num_keypoints, 2] containing keypoints. + Keypoints are assumed to be provided in normalized coordinates and + missing keypoints should be encoded as NaN. + """ + self._groundtruth_lists[fields.BoxListFields.boxes] = groundtruth_boxes_list + self._groundtruth_lists[ + fields.BoxListFields.classes] = groundtruth_classes_list + if groundtruth_masks_list: + self._groundtruth_lists[ + fields.BoxListFields.masks] = groundtruth_masks_list + if groundtruth_keypoints_list: + self._groundtruth_lists[ + fields.BoxListFields.keypoints] = groundtruth_keypoints_list + + @abstractmethod + def restore_fn(self, checkpoint_path, from_detection_checkpoint=True): + """Return callable for loading a foreign checkpoint into tensorflow graph. + + Loads variables from a different tensorflow graph (typically feature + extractor variables). This enables the model to initialize based on weights + from another task. For example, the feature extractor variables from a + classification model can be used to bootstrap training of an object + detector. When loading from an object detection model, the checkpoint model + should have the same parameters as this detection model with exception of + the num_classes parameter. + + Args: + checkpoint_path: path to checkpoint to restore. + from_detection_checkpoint: whether to restore from a full detection + checkpoint (with compatible variable names) or to restore from a + classification checkpoint for initialization prior to training. + + Returns: + a callable which takes a tf.Session as input and loads a checkpoint when + run. + """ + pass diff --git a/object_detection/core/post_processing.py b/object_detection/core/post_processing.py new file mode 100644 index 0000000000000000000000000000000000000000..5983ca169834c1070b67ba7b53393f81449ba392 --- /dev/null +++ b/object_detection/core/post_processing.py @@ -0,0 +1,298 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Post-processing operations on detected boxes.""" + +import tensorflow as tf + +from object_detection.core import box_list +from object_detection.core import box_list_ops +from object_detection.core import standard_fields as fields + + +def multiclass_non_max_suppression(boxes, + scores, + score_thresh, + iou_thresh, + max_size_per_class, + max_total_size=0, + clip_window=None, + change_coordinate_frame=False, + masks=None, + additional_fields=None, + scope=None): + """Multi-class version of non maximum suppression. + + This op greedily selects a subset of detection bounding boxes, pruning + away boxes that have high IOU (intersection over union) overlap (> thresh) + with already selected boxes. It operates independently for each class for + which scores are provided (via the scores field of the input box_list), + pruning boxes with score less than a provided threshold prior to + applying NMS. + + Please note that this operation is performed on *all* classes, therefore any + background classes should be removed prior to calling this function. + + Args: + boxes: A [k, q, 4] float32 tensor containing k detections. `q` can be either + number of classes or 1 depending on whether a separate box is predicted + per class. + scores: A [k, num_classes] float32 tensor containing the scores for each of + the k detections. + score_thresh: scalar threshold for score (low scoring boxes are removed). + iou_thresh: scalar threshold for IOU (new boxes that have high IOU overlap + with previously selected boxes are removed). + max_size_per_class: maximum number of retained boxes per class. + max_total_size: maximum number of boxes retained over all classes. By + default returns all boxes retained after capping boxes per class. + clip_window: A float32 tensor of the form [y_min, x_min, y_max, x_max] + representing the window to clip and normalize boxes to before performing + non-max suppression. + change_coordinate_frame: Whether to normalize coordinates after clipping + relative to clip_window (this can only be set to True if a clip_window + is provided) + masks: (optional) a [k, q, mask_height, mask_width] float32 tensor + containing box masks. `q` can be either number of classes or 1 depending + on whether a separate mask is predicted per class. + additional_fields: (optional) If not None, a dictionary that maps keys to + tensors whose first dimensions are all of size `k`. After non-maximum + suppression, all tensors corresponding to the selected boxes will be + added to resulting BoxList. + scope: name scope. + + Returns: + a BoxList holding M boxes with a rank-1 scores field representing + corresponding scores for each box with scores sorted in decreasing order + and a rank-1 classes field representing a class label for each box. + If masks, keypoints, keypoint_heatmaps is not None, the boxlist will + contain masks, keypoints, keypoint_heatmaps corresponding to boxes. + + Raises: + ValueError: if iou_thresh is not in [0, 1] or if input boxlist does not have + a valid scores field. 
+ """ + if not 0 <= iou_thresh <= 1.0: + raise ValueError('iou_thresh must be between 0 and 1') + if scores.shape.ndims != 2: + raise ValueError('scores field must be of rank 2') + if scores.shape[1].value is None: + raise ValueError('scores must have statically defined second ' + 'dimension') + if boxes.shape.ndims != 3: + raise ValueError('boxes must be of rank 3.') + if not (boxes.shape[1].value == scores.shape[1].value or + boxes.shape[1].value == 1): + raise ValueError('second dimension of boxes must be either 1 or equal ' + 'to the second dimension of scores') + if boxes.shape[2].value != 4: + raise ValueError('last dimension of boxes must be of size 4.') + if change_coordinate_frame and clip_window is None: + raise ValueError('if change_coordinate_frame is True, then a clip_window' + 'must be specified.') + + with tf.name_scope(scope, 'MultiClassNonMaxSuppression'): + num_boxes = tf.shape(boxes)[0] + num_scores = tf.shape(scores)[0] + num_classes = scores.get_shape()[1] + + length_assert = tf.Assert( + tf.equal(num_boxes, num_scores), + ['Incorrect scores field length: actual vs expected.', + num_scores, num_boxes]) + + selected_boxes_list = [] + per_class_boxes_list = tf.unstack(boxes, axis=1) + if masks is not None: + per_class_masks_list = tf.unstack(masks, axis=1) + boxes_ids = (range(num_classes) if len(per_class_boxes_list) > 1 + else [0] * num_classes) + for class_idx, boxes_idx in zip(range(num_classes), boxes_ids): + per_class_boxes = per_class_boxes_list[boxes_idx] + boxlist_and_class_scores = box_list.BoxList(per_class_boxes) + with tf.control_dependencies([length_assert]): + class_scores = tf.reshape( + tf.slice(scores, [0, class_idx], tf.stack([num_scores, 1])), [-1]) + boxlist_and_class_scores.add_field(fields.BoxListFields.scores, + class_scores) + if masks is not None: + per_class_masks = per_class_masks_list[boxes_idx] + boxlist_and_class_scores.add_field(fields.BoxListFields.masks, + per_class_masks) + if additional_fields is not None: + for key, tensor in additional_fields.items(): + boxlist_and_class_scores.add_field(key, tensor) + boxlist_filtered = box_list_ops.filter_greater_than( + boxlist_and_class_scores, score_thresh) + if clip_window is not None: + boxlist_filtered = box_list_ops.clip_to_window( + boxlist_filtered, clip_window) + if change_coordinate_frame: + boxlist_filtered = box_list_ops.change_coordinate_frame( + boxlist_filtered, clip_window) + max_selection_size = tf.minimum(max_size_per_class, + boxlist_filtered.num_boxes()) + selected_indices = tf.image.non_max_suppression( + boxlist_filtered.get(), + boxlist_filtered.get_field(fields.BoxListFields.scores), + max_selection_size, + iou_threshold=iou_thresh) + nms_result = box_list_ops.gather(boxlist_filtered, selected_indices) + nms_result.add_field( + fields.BoxListFields.classes, (tf.zeros_like( + nms_result.get_field(fields.BoxListFields.scores)) + class_idx)) + selected_boxes_list.append(nms_result) + selected_boxes = box_list_ops.concatenate(selected_boxes_list) + sorted_boxes = box_list_ops.sort_by_field(selected_boxes, + fields.BoxListFields.scores) + if max_total_size: + max_total_size = tf.minimum(max_total_size, + sorted_boxes.num_boxes()) + sorted_boxes = box_list_ops.gather(sorted_boxes, + tf.range(max_total_size)) + return sorted_boxes + + +def batch_multiclass_non_max_suppression(boxes, + scores, + score_thresh, + iou_thresh, + max_size_per_class, + max_total_size=0, + clip_window=None, + change_coordinate_frame=False, + num_valid_boxes=None, + masks=None, + scope=None): + 
"""Multi-class version of non maximum suppression that operates on a batch. + + This op is similar to `multiclass_non_max_suppression` but operates on a batch + of boxes and scores. See documentation for `multiclass_non_max_suppression` + for details. + + Args: + boxes: A [batch_size, num_anchors, q, 4] float32 tensor containing + detections. If `q` is 1 then same boxes are used for all classes + otherwise, if `q` is equal to number of classes, class-specific boxes + are used. + scores: A [batch_size, num_anchors, num_classes] float32 tensor containing + the scores for each of the `num_anchors` detections. + score_thresh: scalar threshold for score (low scoring boxes are removed). + iou_thresh: scalar threshold for IOU (new boxes that have high IOU overlap + with previously selected boxes are removed). + max_size_per_class: maximum number of retained boxes per class. + max_total_size: maximum number of boxes retained over all classes. By + default returns all boxes retained after capping boxes per class. + clip_window: A float32 tensor of the form [y_min, x_min, y_max, x_max] + representing the window to clip boxes to before performing non-max + suppression. + change_coordinate_frame: Whether to normalize coordinates after clipping + relative to clip_window (this can only be set to True if a clip_window + is provided) + num_valid_boxes: (optional) a Tensor of type `int32`. A 1-D tensor of shape + [batch_size] representing the number of valid boxes to be considered + for each image in the batch. This parameter allows for ignoring zero + paddings. + masks: (optional) a [batch_size, num_anchors, q, mask_height, mask_width] + float32 tensor containing box masks. `q` can be either number of classes + or 1 depending on whether a separate mask is predicted per class. + scope: tf scope name. + + Returns: + A dictionary containing the following entries: + 'detection_boxes': A [batch_size, max_detections, 4] float32 tensor + containing the non-max suppressed boxes. + 'detection_scores': A [bath_size, max_detections] float32 tensor containing + the scores for the boxes. + 'detection_classes': A [batch_size, max_detections] float32 tensor + containing the class for boxes. + 'num_detections': A [batchsize] float32 tensor indicating the number of + valid detections per batch item. Only the top num_detections[i] entries in + nms_boxes[i], nms_scores[i] and nms_class[i] are valid. the rest of the + entries are zero paddings. + 'detection_masks': (optional) a + [batch_size, max_detections, mask_height, mask_width] float32 tensor + containing masks for each selected box. + + Raises: + ValueError: if iou_thresh is not in [0, 1] or if input boxlist does not have + a valid scores field. 
+ """ + q = boxes.shape[2].value + num_classes = scores.shape[2].value + if q != 1 and q != num_classes: + raise ValueError('third dimension of boxes must be either 1 or equal ' + 'to the third dimension of scores') + + with tf.name_scope(scope, 'BatchMultiClassNonMaxSuppression'): + per_image_boxes_list = tf.unstack(boxes) + per_image_scores_list = tf.unstack(scores) + num_valid_boxes_list = len(per_image_boxes_list) * [None] + per_image_masks_list = len(per_image_boxes_list) * [None] + if num_valid_boxes is not None: + num_valid_boxes_list = tf.unstack(num_valid_boxes) + if masks is not None: + per_image_masks_list = tf.unstack(masks) + + detection_boxes_list = [] + detection_scores_list = [] + detection_classes_list = [] + num_detections_list = [] + detection_masks_list = [] + for (per_image_boxes, per_image_scores, per_image_masks, num_valid_boxes + ) in zip(per_image_boxes_list, per_image_scores_list, + per_image_masks_list, num_valid_boxes_list): + if num_valid_boxes is not None: + per_image_boxes = tf.reshape( + tf.slice(per_image_boxes, 3*[0], + tf.stack([num_valid_boxes, -1, -1])), [-1, q, 4]) + per_image_scores = tf.reshape( + tf.slice(per_image_scores, [0, 0], + tf.stack([num_valid_boxes, -1])), [-1, num_classes]) + if masks is not None: + per_image_masks = tf.reshape( + tf.slice(per_image_masks, 4*[0], + tf.stack([num_valid_boxes, -1, -1, -1])), + [-1, q, masks.shape[3].value, masks.shape[4].value]) + nmsed_boxlist = multiclass_non_max_suppression( + per_image_boxes, + per_image_scores, + score_thresh, + iou_thresh, + max_size_per_class, + max_total_size, + masks=per_image_masks, + clip_window=clip_window, + change_coordinate_frame=change_coordinate_frame) + num_detections_list.append(tf.to_float(nmsed_boxlist.num_boxes())) + padded_boxlist = box_list_ops.pad_or_clip_box_list(nmsed_boxlist, + max_total_size) + detection_boxes_list.append(padded_boxlist.get()) + detection_scores_list.append( + padded_boxlist.get_field(fields.BoxListFields.scores)) + detection_classes_list.append( + padded_boxlist.get_field(fields.BoxListFields.classes)) + if masks is not None: + detection_masks_list.append( + padded_boxlist.get_field(fields.BoxListFields.masks)) + + nms_dict = { + 'detection_boxes': tf.stack(detection_boxes_list), + 'detection_scores': tf.stack(detection_scores_list), + 'detection_classes': tf.stack(detection_classes_list), + 'num_detections': tf.stack(num_detections_list) + } + if masks is not None: + nms_dict['detection_masks'] = tf.stack(detection_masks_list) + return nms_dict diff --git a/object_detection/core/post_processing_test.py b/object_detection/core/post_processing_test.py new file mode 100644 index 0000000000000000000000000000000000000000..d2fccec73080ab9768256e56df56272561cfa448 --- /dev/null +++ b/object_detection/core/post_processing_test.py @@ -0,0 +1,673 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for tensorflow_models.object_detection.core.post_processing.""" +import numpy as np +import tensorflow as tf +from object_detection.core import post_processing +from object_detection.core import standard_fields as fields + + +class MulticlassNonMaxSuppressionTest(tf.test.TestCase): + + def test_with_invalid_scores_size(self): + boxes = tf.constant([[[0, 0, 1, 1]], + [[0, 0.1, 1, 1.1]], + [[0, -0.1, 1, 0.9]], + [[0, 10, 1, 11]], + [[0, 10.1, 1, 11.1]], + [[0, 100, 1, 101]]], tf.float32) + scores = tf.constant([[.9], [.75], [.6], [.95], [.5]]) + iou_thresh = .5 + score_thresh = 0.6 + max_output_size = 3 + nms = post_processing.multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, max_output_size) + with self.test_session() as sess: + with self.assertRaisesWithPredicateMatch( + tf.errors.InvalidArgumentError, 'Incorrect scores field length'): + sess.run(nms.get()) + + def test_multiclass_nms_select_with_shared_boxes(self): + boxes = tf.constant([[[0, 0, 1, 1]], + [[0, 0.1, 1, 1.1]], + [[0, -0.1, 1, 0.9]], + [[0, 10, 1, 11]], + [[0, 10.1, 1, 11.1]], + [[0, 100, 1, 101]], + [[0, 1000, 1, 1002]], + [[0, 1000, 1, 1002.1]]], tf.float32) + scores = tf.constant([[.9, 0.01], [.75, 0.05], + [.6, 0.01], [.95, 0], + [.5, 0.01], [.3, 0.01], + [.01, .85], [.01, .5]]) + score_thresh = 0.1 + iou_thresh = .5 + max_output_size = 4 + + exp_nms_corners = [[0, 10, 1, 11], + [0, 0, 1, 1], + [0, 1000, 1, 1002], + [0, 100, 1, 101]] + exp_nms_scores = [.95, .9, .85, .3] + exp_nms_classes = [0, 0, 1, 0] + + nms = post_processing.multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, max_output_size) + with self.test_session() as sess: + nms_corners_output, nms_scores_output, nms_classes_output = sess.run( + [nms.get(), nms.get_field(fields.BoxListFields.scores), + nms.get_field(fields.BoxListFields.classes)]) + self.assertAllClose(nms_corners_output, exp_nms_corners) + self.assertAllClose(nms_scores_output, exp_nms_scores) + self.assertAllClose(nms_classes_output, exp_nms_classes) + + def test_multiclass_nms_select_with_shared_boxes_given_keypoints(self): + boxes = tf.constant([[[0, 0, 1, 1]], + [[0, 0.1, 1, 1.1]], + [[0, -0.1, 1, 0.9]], + [[0, 10, 1, 11]], + [[0, 10.1, 1, 11.1]], + [[0, 100, 1, 101]], + [[0, 1000, 1, 1002]], + [[0, 1000, 1, 1002.1]]], tf.float32) + scores = tf.constant([[.9, 0.01], [.75, 0.05], + [.6, 0.01], [.95, 0], + [.5, 0.01], [.3, 0.01], + [.01, .85], [.01, .5]]) + num_keypoints = 6 + keypoints = tf.tile( + tf.reshape(tf.range(8), [8, 1, 1]), + [1, num_keypoints, 2]) + score_thresh = 0.1 + iou_thresh = .5 + max_output_size = 4 + + exp_nms_corners = [[0, 10, 1, 11], + [0, 0, 1, 1], + [0, 1000, 1, 1002], + [0, 100, 1, 101]] + exp_nms_scores = [.95, .9, .85, .3] + exp_nms_classes = [0, 0, 1, 0] + exp_nms_keypoints_tensor = tf.tile( + tf.reshape(tf.constant([3, 0, 6, 5], dtype=tf.float32), [4, 1, 1]), + [1, num_keypoints, 2]) + + nms = post_processing.multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, max_output_size, + additional_fields={ + fields.BoxListFields.keypoints: keypoints}) + + with self.test_session() as sess: + (nms_corners_output, + nms_scores_output, + nms_classes_output, + nms_keypoints, + exp_nms_keypoints) = sess.run([ + nms.get(), + nms.get_field(fields.BoxListFields.scores), + nms.get_field(fields.BoxListFields.classes), + nms.get_field(fields.BoxListFields.keypoints), + exp_nms_keypoints_tensor + ]) + 
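+      # NMS keeps boxes 3, 0, 6 and 5 (in descending score order), and the
+      # keypoints field is gathered with the same surviving indices as the
+      # boxes.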
self.assertAllClose(nms_corners_output, exp_nms_corners) + self.assertAllClose(nms_scores_output, exp_nms_scores) + self.assertAllClose(nms_classes_output, exp_nms_classes) + self.assertAllEqual(nms_keypoints, exp_nms_keypoints) + + def test_multiclass_nms_with_shared_boxes_given_keypoint_heatmaps(self): + boxes = tf.constant([[[0, 0, 1, 1]], + [[0, 0.1, 1, 1.1]], + [[0, -0.1, 1, 0.9]], + [[0, 10, 1, 11]], + [[0, 10.1, 1, 11.1]], + [[0, 100, 1, 101]], + [[0, 1000, 1, 1002]], + [[0, 1000, 1, 1002.1]]], tf.float32) + + scores = tf.constant([[.9, 0.01], [.75, 0.05], + [.6, 0.01], [.95, 0], + [.5, 0.01], [.3, 0.01], + [.01, .85], [.01, .5]]) + + num_boxes = tf.shape(boxes)[0] + heatmap_height = 5 + heatmap_width = 5 + num_keypoints = 17 + keypoint_heatmaps = tf.ones( + [num_boxes, heatmap_height, heatmap_width, num_keypoints], + dtype=tf.float32) + + score_thresh = 0.1 + iou_thresh = .5 + max_output_size = 4 + exp_nms_corners = [[0, 10, 1, 11], + [0, 0, 1, 1], + [0, 1000, 1, 1002], + [0, 100, 1, 101]] + + exp_nms_scores = [.95, .9, .85, .3] + exp_nms_classes = [0, 0, 1, 0] + exp_nms_keypoint_heatmaps = np.ones( + (4, heatmap_height, heatmap_width, num_keypoints), dtype=np.float32) + + nms = post_processing.multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, max_output_size, + additional_fields={ + fields.BoxListFields.keypoint_heatmaps: keypoint_heatmaps}) + + with self.test_session() as sess: + (nms_corners_output, + nms_scores_output, + nms_classes_output, + nms_keypoint_heatmaps) = sess.run( + [nms.get(), + nms.get_field(fields.BoxListFields.scores), + nms.get_field(fields.BoxListFields.classes), + nms.get_field(fields.BoxListFields.keypoint_heatmaps)]) + + self.assertAllClose(nms_corners_output, exp_nms_corners) + self.assertAllClose(nms_scores_output, exp_nms_scores) + self.assertAllClose(nms_classes_output, exp_nms_classes) + self.assertAllEqual(nms_keypoint_heatmaps, exp_nms_keypoint_heatmaps) + + def test_multiclass_nms_with_additional_fields(self): + boxes = tf.constant([[[0, 0, 1, 1]], + [[0, 0.1, 1, 1.1]], + [[0, -0.1, 1, 0.9]], + [[0, 10, 1, 11]], + [[0, 10.1, 1, 11.1]], + [[0, 100, 1, 101]], + [[0, 1000, 1, 1002]], + [[0, 1000, 1, 1002.1]]], tf.float32) + + scores = tf.constant([[.9, 0.01], [.75, 0.05], + [.6, 0.01], [.95, 0], + [.5, 0.01], [.3, 0.01], + [.01, .85], [.01, .5]]) + + coarse_boxes_key = 'coarse_boxes' + coarse_boxes = tf.constant([[0.1, 0.1, 1.1, 1.1], + [0.1, 0.2, 1.1, 1.2], + [0.1, -0.2, 1.1, 1.0], + [0.1, 10.1, 1.1, 11.1], + [0.1, 10.2, 1.1, 11.2], + [0.1, 100.1, 1.1, 101.1], + [0.1, 1000.1, 1.1, 1002.1], + [0.1, 1000.1, 1.1, 1002.2]], tf.float32) + + score_thresh = 0.1 + iou_thresh = .5 + max_output_size = 4 + + exp_nms_corners = np.array([[0, 10, 1, 11], + [0, 0, 1, 1], + [0, 1000, 1, 1002], + [0, 100, 1, 101]], dtype=np.float32) + + exp_nms_coarse_corners = np.array([[0.1, 10.1, 1.1, 11.1], + [0.1, 0.1, 1.1, 1.1], + [0.1, 1000.1, 1.1, 1002.1], + [0.1, 100.1, 1.1, 101.1]], + dtype=np.float32) + + exp_nms_scores = [.95, .9, .85, .3] + exp_nms_classes = [0, 0, 1, 0] + + nms = post_processing.multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, max_output_size, + additional_fields={coarse_boxes_key: coarse_boxes}) + + with self.test_session() as sess: + (nms_corners_output, + nms_scores_output, + nms_classes_output, + nms_coarse_corners) = sess.run( + [nms.get(), + nms.get_field(fields.BoxListFields.scores), + nms.get_field(fields.BoxListFields.classes), + nms.get_field(coarse_boxes_key)]) + + 
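+      # The 'coarse_boxes' field plays no role in suppression; it is simply
+      # gathered with the same surviving indices as the regular boxes.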
self.assertAllClose(nms_corners_output, exp_nms_corners) + self.assertAllClose(nms_scores_output, exp_nms_scores) + self.assertAllClose(nms_classes_output, exp_nms_classes) + self.assertAllEqual(nms_coarse_corners, exp_nms_coarse_corners) + + def test_multiclass_nms_select_with_shared_boxes_given_masks(self): + boxes = tf.constant([[[0, 0, 1, 1]], + [[0, 0.1, 1, 1.1]], + [[0, -0.1, 1, 0.9]], + [[0, 10, 1, 11]], + [[0, 10.1, 1, 11.1]], + [[0, 100, 1, 101]], + [[0, 1000, 1, 1002]], + [[0, 1000, 1, 1002.1]]], tf.float32) + scores = tf.constant([[.9, 0.01], [.75, 0.05], + [.6, 0.01], [.95, 0], + [.5, 0.01], [.3, 0.01], + [.01, .85], [.01, .5]]) + num_classes = 2 + mask_height = 3 + mask_width = 3 + masks = tf.tile( + tf.reshape(tf.range(8), [8, 1, 1, 1]), + [1, num_classes, mask_height, mask_width]) + score_thresh = 0.1 + iou_thresh = .5 + max_output_size = 4 + + exp_nms_corners = [[0, 10, 1, 11], + [0, 0, 1, 1], + [0, 1000, 1, 1002], + [0, 100, 1, 101]] + exp_nms_scores = [.95, .9, .85, .3] + exp_nms_classes = [0, 0, 1, 0] + exp_nms_masks_tensor = tf.tile( + tf.reshape(tf.constant([3, 0, 6, 5], dtype=tf.float32), [4, 1, 1]), + [1, mask_height, mask_width]) + + nms = post_processing.multiclass_non_max_suppression(boxes, scores, + score_thresh, + iou_thresh, + max_output_size, + masks=masks) + with self.test_session() as sess: + (nms_corners_output, + nms_scores_output, + nms_classes_output, + nms_masks, + exp_nms_masks) = sess.run([nms.get(), + nms.get_field(fields.BoxListFields.scores), + nms.get_field(fields.BoxListFields.classes), + nms.get_field(fields.BoxListFields.masks), + exp_nms_masks_tensor]) + self.assertAllClose(nms_corners_output, exp_nms_corners) + self.assertAllClose(nms_scores_output, exp_nms_scores) + self.assertAllClose(nms_classes_output, exp_nms_classes) + self.assertAllEqual(nms_masks, exp_nms_masks) + + def test_multiclass_nms_select_with_clip_window(self): + boxes = tf.constant([[[0, 0, 10, 10]], + [[1, 1, 11, 11]]], tf.float32) + scores = tf.constant([[.9], [.75]]) + clip_window = tf.constant([5, 4, 8, 7], tf.float32) + score_thresh = 0.0 + iou_thresh = 0.5 + max_output_size = 100 + + exp_nms_corners = [[5, 4, 8, 7]] + exp_nms_scores = [.9] + exp_nms_classes = [0] + + nms = post_processing.multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, max_output_size, + clip_window=clip_window) + with self.test_session() as sess: + nms_corners_output, nms_scores_output, nms_classes_output = sess.run( + [nms.get(), nms.get_field(fields.BoxListFields.scores), + nms.get_field(fields.BoxListFields.classes)]) + self.assertAllClose(nms_corners_output, exp_nms_corners) + self.assertAllClose(nms_scores_output, exp_nms_scores) + self.assertAllClose(nms_classes_output, exp_nms_classes) + + def test_multiclass_nms_select_with_clip_window_change_coordinate_frame(self): + boxes = tf.constant([[[0, 0, 10, 10]], + [[1, 1, 11, 11]]], tf.float32) + scores = tf.constant([[.9], [.75]]) + clip_window = tf.constant([5, 4, 8, 7], tf.float32) + score_thresh = 0.0 + iou_thresh = 0.5 + max_output_size = 100 + + exp_nms_corners = [[0, 0, 1, 1]] + exp_nms_scores = [.9] + exp_nms_classes = [0] + + nms = post_processing.multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, max_output_size, + clip_window=clip_window, change_coordinate_frame=True) + with self.test_session() as sess: + nms_corners_output, nms_scores_output, nms_classes_output = sess.run( + [nms.get(), nms.get_field(fields.BoxListFields.scores), + nms.get_field(fields.BoxListFields.classes)]) + 
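+      # The surviving box is first clipped to the [5, 4, 8, 7] window and
+      # then, because change_coordinate_frame=True, re-expressed relative to
+      # that window, which maps it to [0, 0, 1, 1].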
self.assertAllClose(nms_corners_output, exp_nms_corners) + self.assertAllClose(nms_scores_output, exp_nms_scores) + self.assertAllClose(nms_classes_output, exp_nms_classes) + + def test_multiclass_nms_select_with_per_class_cap(self): + boxes = tf.constant([[[0, 0, 1, 1]], + [[0, 0.1, 1, 1.1]], + [[0, -0.1, 1, 0.9]], + [[0, 10, 1, 11]], + [[0, 10.1, 1, 11.1]], + [[0, 100, 1, 101]], + [[0, 1000, 1, 1002]], + [[0, 1000, 1, 1002.1]]], tf.float32) + scores = tf.constant([[.9, 0.01], [.75, 0.05], + [.6, 0.01], [.95, 0], + [.5, 0.01], [.3, 0.01], + [.01, .85], [.01, .5]]) + score_thresh = 0.1 + iou_thresh = .5 + max_size_per_class = 2 + + exp_nms_corners = [[0, 10, 1, 11], + [0, 0, 1, 1], + [0, 1000, 1, 1002]] + exp_nms_scores = [.95, .9, .85] + exp_nms_classes = [0, 0, 1] + + nms = post_processing.multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, max_size_per_class) + with self.test_session() as sess: + nms_corners_output, nms_scores_output, nms_classes_output = sess.run( + [nms.get(), nms.get_field(fields.BoxListFields.scores), + nms.get_field(fields.BoxListFields.classes)]) + self.assertAllClose(nms_corners_output, exp_nms_corners) + self.assertAllClose(nms_scores_output, exp_nms_scores) + self.assertAllClose(nms_classes_output, exp_nms_classes) + + def test_multiclass_nms_select_with_total_cap(self): + boxes = tf.constant([[[0, 0, 1, 1]], + [[0, 0.1, 1, 1.1]], + [[0, -0.1, 1, 0.9]], + [[0, 10, 1, 11]], + [[0, 10.1, 1, 11.1]], + [[0, 100, 1, 101]], + [[0, 1000, 1, 1002]], + [[0, 1000, 1, 1002.1]]], tf.float32) + scores = tf.constant([[.9, 0.01], [.75, 0.05], + [.6, 0.01], [.95, 0], + [.5, 0.01], [.3, 0.01], + [.01, .85], [.01, .5]]) + score_thresh = 0.1 + iou_thresh = .5 + max_size_per_class = 4 + max_total_size = 2 + + exp_nms_corners = [[0, 10, 1, 11], + [0, 0, 1, 1]] + exp_nms_scores = [.95, .9] + exp_nms_classes = [0, 0] + + nms = post_processing.multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, max_size_per_class, + max_total_size) + with self.test_session() as sess: + nms_corners_output, nms_scores_output, nms_classes_output = sess.run( + [nms.get(), nms.get_field(fields.BoxListFields.scores), + nms.get_field(fields.BoxListFields.classes)]) + self.assertAllClose(nms_corners_output, exp_nms_corners) + self.assertAllClose(nms_scores_output, exp_nms_scores) + self.assertAllClose(nms_classes_output, exp_nms_classes) + + def test_multiclass_nms_threshold_then_select_with_shared_boxes(self): + boxes = tf.constant([[[0, 0, 1, 1]], + [[0, 0.1, 1, 1.1]], + [[0, -0.1, 1, 0.9]], + [[0, 10, 1, 11]], + [[0, 10.1, 1, 11.1]], + [[0, 100, 1, 101]], + [[0, 1000, 1, 1002]], + [[0, 1000, 1, 1002.1]]], tf.float32) + scores = tf.constant([[.9], [.75], [.6], [.95], [.5], [.3], [.01], [.01]]) + score_thresh = 0.1 + iou_thresh = .5 + max_output_size = 3 + + exp_nms = [[0, 10, 1, 11], + [0, 0, 1, 1], + [0, 100, 1, 101]] + nms = post_processing.multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, max_output_size) + with self.test_session() as sess: + nms_output = sess.run(nms.get()) + self.assertAllClose(nms_output, exp_nms) + + def test_multiclass_nms_select_with_separate_boxes(self): + boxes = tf.constant([[[0, 0, 1, 1], [0, 0, 4, 5]], + [[0, 0.1, 1, 1.1], [0, 0.1, 2, 1.1]], + [[0, -0.1, 1, 0.9], [0, -0.1, 1, 0.9]], + [[0, 10, 1, 11], [0, 10, 1, 11]], + [[0, 10.1, 1, 11.1], [0, 10.1, 1, 11.1]], + [[0, 100, 1, 101], [0, 100, 1, 101]], + [[0, 1000, 1, 1002], [0, 999, 2, 1004]], + [[0, 1000, 1, 1002.1], [0, 999, 2, 1002.7]]], + tf.float32) + 
scores = tf.constant([[.9, 0.01], [.75, 0.05], + [.6, 0.01], [.95, 0], + [.5, 0.01], [.3, 0.01], + [.01, .85], [.01, .5]]) + score_thresh = 0.1 + iou_thresh = .5 + max_output_size = 4 + + exp_nms_corners = [[0, 10, 1, 11], + [0, 0, 1, 1], + [0, 999, 2, 1004], + [0, 100, 1, 101]] + exp_nms_scores = [.95, .9, .85, .3] + exp_nms_classes = [0, 0, 1, 0] + + nms = post_processing.multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, max_output_size) + with self.test_session() as sess: + nms_corners_output, nms_scores_output, nms_classes_output = sess.run( + [nms.get(), nms.get_field(fields.BoxListFields.scores), + nms.get_field(fields.BoxListFields.classes)]) + self.assertAllClose(nms_corners_output, exp_nms_corners) + self.assertAllClose(nms_scores_output, exp_nms_scores) + self.assertAllClose(nms_classes_output, exp_nms_classes) + + def test_batch_multiclass_nms_with_batch_size_1(self): + boxes = tf.constant([[[[0, 0, 1, 1], [0, 0, 4, 5]], + [[0, 0.1, 1, 1.1], [0, 0.1, 2, 1.1]], + [[0, -0.1, 1, 0.9], [0, -0.1, 1, 0.9]], + [[0, 10, 1, 11], [0, 10, 1, 11]], + [[0, 10.1, 1, 11.1], [0, 10.1, 1, 11.1]], + [[0, 100, 1, 101], [0, 100, 1, 101]], + [[0, 1000, 1, 1002], [0, 999, 2, 1004]], + [[0, 1000, 1, 1002.1], [0, 999, 2, 1002.7]]]], + tf.float32) + scores = tf.constant([[[.9, 0.01], [.75, 0.05], + [.6, 0.01], [.95, 0], + [.5, 0.01], [.3, 0.01], + [.01, .85], [.01, .5]]]) + score_thresh = 0.1 + iou_thresh = .5 + max_output_size = 4 + + exp_nms_corners = [[[0, 10, 1, 11], + [0, 0, 1, 1], + [0, 999, 2, 1004], + [0, 100, 1, 101]]] + exp_nms_scores = [[.95, .9, .85, .3]] + exp_nms_classes = [[0, 0, 1, 0]] + + nms_dict = post_processing.batch_multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, + max_size_per_class=max_output_size, max_total_size=max_output_size) + with self.test_session() as sess: + nms_output = sess.run(nms_dict) + self.assertAllClose(nms_output['detection_boxes'], exp_nms_corners) + self.assertAllClose(nms_output['detection_scores'], exp_nms_scores) + self.assertAllClose(nms_output['detection_classes'], exp_nms_classes) + self.assertEqual(nms_output['num_detections'], [4]) + + def test_batch_multiclass_nms_with_batch_size_2(self): + boxes = tf.constant([[[[0, 0, 1, 1], [0, 0, 4, 5]], + [[0, 0.1, 1, 1.1], [0, 0.1, 2, 1.1]], + [[0, -0.1, 1, 0.9], [0, -0.1, 1, 0.9]], + [[0, 10, 1, 11], [0, 10, 1, 11]]], + [[[0, 10.1, 1, 11.1], [0, 10.1, 1, 11.1]], + [[0, 100, 1, 101], [0, 100, 1, 101]], + [[0, 1000, 1, 1002], [0, 999, 2, 1004]], + [[0, 1000, 1, 1002.1], [0, 999, 2, 1002.7]]]], + tf.float32) + scores = tf.constant([[[.9, 0.01], [.75, 0.05], + [.6, 0.01], [.95, 0]], + [[.5, 0.01], [.3, 0.01], + [.01, .85], [.01, .5]]]) + score_thresh = 0.1 + iou_thresh = .5 + max_output_size = 4 + + exp_nms_corners = [[[0, 10, 1, 11], + [0, 0, 1, 1], + [0, 0, 0, 0], + [0, 0, 0, 0]], + [[0, 999, 2, 1004], + [0, 10.1, 1, 11.1], + [0, 100, 1, 101], + [0, 0, 0, 0]]] + exp_nms_scores = [[.95, .9, 0, 0], + [.85, .5, .3, 0]] + exp_nms_classes = [[0, 0, 0, 0], + [1, 0, 0, 0]] + + nms_dict = post_processing.batch_multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, + max_size_per_class=max_output_size, max_total_size=max_output_size) + with self.test_session() as sess: + nms_output = sess.run(nms_dict) + self.assertAllClose(nms_output['detection_boxes'], exp_nms_corners) + self.assertAllClose(nms_output['detection_scores'], exp_nms_scores) + self.assertAllClose(nms_output['detection_classes'], exp_nms_classes) + 
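+      # The first image keeps 2 boxes and the second keeps 3; the remaining
+      # rows in the padded outputs above are zeros.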
self.assertAllClose(nms_output['num_detections'], [2, 3]) + + def test_batch_multiclass_nms_with_masks(self): + boxes = tf.constant([[[[0, 0, 1, 1], [0, 0, 4, 5]], + [[0, 0.1, 1, 1.1], [0, 0.1, 2, 1.1]], + [[0, -0.1, 1, 0.9], [0, -0.1, 1, 0.9]], + [[0, 10, 1, 11], [0, 10, 1, 11]]], + [[[0, 10.1, 1, 11.1], [0, 10.1, 1, 11.1]], + [[0, 100, 1, 101], [0, 100, 1, 101]], + [[0, 1000, 1, 1002], [0, 999, 2, 1004]], + [[0, 1000, 1, 1002.1], [0, 999, 2, 1002.7]]]], + tf.float32) + scores = tf.constant([[[.9, 0.01], [.75, 0.05], + [.6, 0.01], [.95, 0]], + [[.5, 0.01], [.3, 0.01], + [.01, .85], [.01, .5]]]) + masks = tf.constant([[[[[0, 1], [2, 3]], [[1, 2], [3, 4]]], + [[[2, 3], [4, 5]], [[3, 4], [5, 6]]], + [[[4, 5], [6, 7]], [[5, 6], [7, 8]]], + [[[6, 7], [8, 9]], [[7, 8], [9, 10]]]], + [[[[8, 9], [10, 11]], [[9, 10], [11, 12]]], + [[[10, 11], [12, 13]], [[11, 12], [13, 14]]], + [[[12, 13], [14, 15]], [[13, 14], [15, 16]]], + [[[14, 15], [16, 17]], [[15, 16], [17, 18]]]]], + tf.float32) + score_thresh = 0.1 + iou_thresh = .5 + max_output_size = 4 + + exp_nms_corners = [[[0, 10, 1, 11], + [0, 0, 1, 1], + [0, 0, 0, 0], + [0, 0, 0, 0]], + [[0, 999, 2, 1004], + [0, 10.1, 1, 11.1], + [0, 100, 1, 101], + [0, 0, 0, 0]]] + exp_nms_scores = [[.95, .9, 0, 0], + [.85, .5, .3, 0]] + exp_nms_classes = [[0, 0, 0, 0], + [1, 0, 0, 0]] + exp_nms_masks = [[[[6, 7], [8, 9]], + [[0, 1], [2, 3]], + [[0, 0], [0, 0]], + [[0, 0], [0, 0]]], + [[[13, 14], [15, 16]], + [[8, 9], [10, 11]], + [[10, 11], [12, 13]], + [[0, 0], [0, 0]]]] + + nms_dict = post_processing.batch_multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, + max_size_per_class=max_output_size, max_total_size=max_output_size, + masks=masks) + with self.test_session() as sess: + nms_output = sess.run(nms_dict) + self.assertAllClose(nms_output['detection_boxes'], exp_nms_corners) + self.assertAllClose(nms_output['detection_scores'], exp_nms_scores) + self.assertAllClose(nms_output['detection_classes'], exp_nms_classes) + self.assertAllClose(nms_output['num_detections'], [2, 3]) + self.assertAllClose(nms_output['detection_masks'], exp_nms_masks) + + def test_batch_multiclass_nms_with_masks_and_num_valid_boxes(self): + boxes = tf.constant([[[[0, 0, 1, 1], [0, 0, 4, 5]], + [[0, 0.1, 1, 1.1], [0, 0.1, 2, 1.1]], + [[0, -0.1, 1, 0.9], [0, -0.1, 1, 0.9]], + [[0, 10, 1, 11], [0, 10, 1, 11]]], + [[[0, 10.1, 1, 11.1], [0, 10.1, 1, 11.1]], + [[0, 100, 1, 101], [0, 100, 1, 101]], + [[0, 1000, 1, 1002], [0, 999, 2, 1004]], + [[0, 1000, 1, 1002.1], [0, 999, 2, 1002.7]]]], + tf.float32) + scores = tf.constant([[[.9, 0.01], [.75, 0.05], + [.6, 0.01], [.95, 0]], + [[.5, 0.01], [.3, 0.01], + [.01, .85], [.01, .5]]]) + masks = tf.constant([[[[[0, 1], [2, 3]], [[1, 2], [3, 4]]], + [[[2, 3], [4, 5]], [[3, 4], [5, 6]]], + [[[4, 5], [6, 7]], [[5, 6], [7, 8]]], + [[[6, 7], [8, 9]], [[7, 8], [9, 10]]]], + [[[[8, 9], [10, 11]], [[9, 10], [11, 12]]], + [[[10, 11], [12, 13]], [[11, 12], [13, 14]]], + [[[12, 13], [14, 15]], [[13, 14], [15, 16]]], + [[[14, 15], [16, 17]], [[15, 16], [17, 18]]]]], + tf.float32) + num_valid_boxes = tf.constant([1, 1], tf.int32) + score_thresh = 0.1 + iou_thresh = .5 + max_output_size = 4 + + exp_nms_corners = [[[0, 0, 1, 1], + [0, 0, 0, 0], + [0, 0, 0, 0], + [0, 0, 0, 0]], + [[0, 10.1, 1, 11.1], + [0, 0, 0, 0], + [0, 0, 0, 0], + [0, 0, 0, 0]]] + exp_nms_scores = [[.9, 0, 0, 0], + [.5, 0, 0, 0]] + exp_nms_classes = [[0, 0, 0, 0], + [0, 0, 0, 0]] + exp_nms_masks = [[[[0, 1], [2, 3]], + [[0, 0], [0, 0]], + [[0, 0], [0, 0]], + [[0, 0], [0, 
0]]], + [[[8, 9], [10, 11]], + [[0, 0], [0, 0]], + [[0, 0], [0, 0]], + [[0, 0], [0, 0]]]] + + nms_dict = post_processing.batch_multiclass_non_max_suppression( + boxes, scores, score_thresh, iou_thresh, + max_size_per_class=max_output_size, max_total_size=max_output_size, + num_valid_boxes=num_valid_boxes, masks=masks) + with self.test_session() as sess: + nms_output = sess.run(nms_dict) + self.assertAllClose(nms_output['detection_boxes'], exp_nms_corners) + self.assertAllClose(nms_output['detection_scores'], exp_nms_scores) + self.assertAllClose(nms_output['detection_classes'], exp_nms_classes) + self.assertAllClose(nms_output['num_detections'], [1, 1]) + self.assertAllClose(nms_output['detection_masks'], exp_nms_masks) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/core/prefetcher.py b/object_detection/core/prefetcher.py new file mode 100644 index 0000000000000000000000000000000000000000..e690c599fa74e024d9b7ec857628cdbfb0e3ee81 --- /dev/null +++ b/object_detection/core/prefetcher.py @@ -0,0 +1,61 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Provides functions to prefetch tensors to feed into models.""" +import tensorflow as tf + + +def prefetch(tensor_dict, capacity): + """Creates a prefetch queue for tensors. + + Creates a FIFO queue to asynchronously enqueue tensor_dicts and returns a + dequeue op that evaluates to a tensor_dict. This function is useful in + prefetching preprocessed tensors so that the data is readily available for + consumers. + + Example input pipeline when you don't need batching: + ---------------------------------------------------- + key, string_tensor = slim.parallel_reader.parallel_read(...) + tensor_dict = decoder.decode(string_tensor) + tensor_dict = preprocessor.preprocess(tensor_dict, ...) + prefetch_queue = prefetcher.prefetch(tensor_dict, capacity=20) + tensor_dict = prefetch_queue.dequeue() + outputs = Model(tensor_dict) + ... + ---------------------------------------------------- + + For input pipelines with batching, refer to core/batcher.py + + Args: + tensor_dict: a dictionary of tensors to prefetch. + capacity: the size of the prefetch queue. + + Returns: + a FIFO prefetcher queue + """ + names = list(tensor_dict.keys()) + dtypes = [t.dtype for t in tensor_dict.values()] + shapes = [t.get_shape() for t in tensor_dict.values()] + prefetch_queue = tf.PaddingFIFOQueue(capacity, dtypes=dtypes, + shapes=shapes, + names=names, + name='prefetch_queue') + enqueue_op = prefetch_queue.enqueue(tensor_dict) + tf.train.queue_runner.add_queue_runner(tf.train.queue_runner.QueueRunner( + prefetch_queue, [enqueue_op])) + tf.summary.scalar('queue/%s/fraction_of_%d_full' % (prefetch_queue.name, + capacity), + tf.to_float(prefetch_queue.size()) * (1. 
/ capacity)) + return prefetch_queue diff --git a/object_detection/core/prefetcher_test.py b/object_detection/core/prefetcher_test.py new file mode 100644 index 0000000000000000000000000000000000000000..63f557e3318c25d02434bc1dd0763f1df35b18ac --- /dev/null +++ b/object_detection/core/prefetcher_test.py @@ -0,0 +1,101 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.core.prefetcher.""" +import tensorflow as tf + +from object_detection.core import prefetcher + +slim = tf.contrib.slim + + +class PrefetcherTest(tf.test.TestCase): + + def test_prefetch_tensors_with_fully_defined_shapes(self): + with self.test_session() as sess: + batch_size = 10 + image_size = 32 + num_batches = 5 + examples = tf.Variable(tf.constant(0, dtype=tf.int64)) + counter = examples.count_up_to(num_batches) + image = tf.random_normal([batch_size, image_size, + image_size, 3], + dtype=tf.float32, + name='images') + label = tf.random_uniform([batch_size, 1], 0, 10, + dtype=tf.int32, name='labels') + + prefetch_queue = prefetcher.prefetch(tensor_dict={'counter': counter, + 'image': image, + 'label': label}, + capacity=100) + tensor_dict = prefetch_queue.dequeue() + + self.assertAllEqual(tensor_dict['image'].get_shape().as_list(), + [batch_size, image_size, image_size, 3]) + self.assertAllEqual(tensor_dict['label'].get_shape().as_list(), + [batch_size, 1]) + + tf.initialize_all_variables().run() + with slim.queues.QueueRunners(sess): + for _ in range(num_batches): + results = sess.run(tensor_dict) + self.assertEquals(results['image'].shape, + (batch_size, image_size, image_size, 3)) + self.assertEquals(results['label'].shape, (batch_size, 1)) + with self.assertRaises(tf.errors.OutOfRangeError): + sess.run(tensor_dict) + + def test_prefetch_tensors_with_partially_defined_shapes(self): + with self.test_session() as sess: + batch_size = 10 + image_size = 32 + num_batches = 5 + examples = tf.Variable(tf.constant(0, dtype=tf.int64)) + counter = examples.count_up_to(num_batches) + image = tf.random_normal([batch_size, + tf.Variable(image_size), + tf.Variable(image_size), 3], + dtype=tf.float32, + name='image') + image.set_shape([batch_size, None, None, 3]) + label = tf.random_uniform([batch_size, tf.Variable(1)], 0, + 10, dtype=tf.int32, name='label') + label.set_shape([batch_size, None]) + + prefetch_queue = prefetcher.prefetch(tensor_dict={'counter': counter, + 'image': image, + 'label': label}, + capacity=100) + tensor_dict = prefetch_queue.dequeue() + + self.assertAllEqual(tensor_dict['image'].get_shape().as_list(), + [batch_size, None, None, 3]) + self.assertAllEqual(tensor_dict['label'].get_shape().as_list(), + [batch_size, None]) + + tf.initialize_all_variables().run() + with slim.queues.QueueRunners(sess): + for _ in range(num_batches): + results = sess.run(tensor_dict) + self.assertEquals(results['image'].shape, + (batch_size, image_size, 
image_size, 3)) + self.assertEquals(results['label'].shape, (batch_size, 1)) + with self.assertRaises(tf.errors.OutOfRangeError): + sess.run(tensor_dict) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/core/preprocessor.py b/object_detection/core/preprocessor.py new file mode 100644 index 0000000000000000000000000000000000000000..25bd0cbaf485e8cf8f295cf941fd5e692a1d469c --- /dev/null +++ b/object_detection/core/preprocessor.py @@ -0,0 +1,1921 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Preprocess images and bounding boxes for detection. + +We perform two sets of operations in preprocessing stage: +(a) operations that are applied to both training and testing data, +(b) operations that are applied only to training data for the purpose of + data augmentation. + +A preprocessing function receives a set of inputs, +e.g. an image and bounding boxes, +performs an operation on them, and returns them. +Some examples are: randomly cropping the image, randomly mirroring the image, + randomly changing the brightness, contrast, hue and + randomly jittering the bounding boxes. + +The preprocess function receives a tensor_dict which is a dictionary that maps +different field names to their tensors. For example, +tensor_dict[fields.InputDataFields.image] holds the image tensor. +The image is a rank 4 tensor: [1, height, width, channels] with +dtype=tf.float32. The groundtruth_boxes is a rank 2 tensor: [N, 4] where +in each row there is a box with [ymin xmin ymax xmax]. +Boxes are in normalized coordinates meaning +their coordinate values range in [0, 1] + +Important Note: In tensor_dict, images is a rank 4 tensor, but preprocessing +functions receive a rank 3 tensor for processing the image. Thus, inside the +preprocess function we squeeze the image to become a rank 3 tensor and then +we pass it to the functions. At the end of the preprocess we expand the image +back to rank 4. +""" + +import sys +import tensorflow as tf + +from tensorflow.python.ops import control_flow_ops + +from object_detection.core import box_list +from object_detection.core import box_list_ops +from object_detection.core import keypoint_ops +from object_detection.core import standard_fields as fields + + +def _apply_with_random_selector(x, func, num_cases): + """Computes func(x, sel), with sel sampled from [0...num_cases-1]. + + Args: + x: input Tensor. + func: Python function to apply. + num_cases: Python int32, number of cases to sample sel from. + + Returns: + The result of func(x, sel), where func receives the value of the + selector as a python integer, but sel is sampled dynamically. + """ + rand_sel = tf.random_uniform([], maxval=num_cases, dtype=tf.int32) + # Pass the real x only to one of the func calls. 
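+  # control_flow_ops.switch routes `x` only to the branch whose predicate is
+  # True, so just the selected case receives real data; merge then returns
+  # the single branch output that was actually computed.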
+ return control_flow_ops.merge([func( + control_flow_ops.switch(x, tf.equal(rand_sel, case))[1], case) + for case in range(num_cases)])[0] + + +def _apply_with_random_selector_tuples(x, func, num_cases): + """Computes func(x, sel), with sel sampled from [0...num_cases-1]. + + Args: + x: A tuple of input tensors. + func: Python function to apply. + num_cases: Python int32, number of cases to sample sel from. + + Returns: + The result of func(x, sel), where func receives the value of the + selector as a python integer, but sel is sampled dynamically. + """ + num_inputs = len(x) + rand_sel = tf.random_uniform([], maxval=num_cases, dtype=tf.int32) + # Pass the real x only to one of the func calls. + + tuples = [list() for t in x] + for case in range(num_cases): + new_x = [control_flow_ops.switch(t, tf.equal(rand_sel, case))[1] for t in x] + output = func(tuple(new_x), case) + for j in range(num_inputs): + tuples[j].append(output[j]) + + for i in range(num_inputs): + tuples[i] = control_flow_ops.merge(tuples[i])[0] + return tuple(tuples) + + +def _random_integer(minval, maxval, seed): + """Returns a random 0-D tensor between minval and maxval. + + Args: + minval: minimum value of the random tensor. + maxval: maximum value of the random tensor. + seed: random seed. + + Returns: + A random 0-D tensor between minval and maxval. + """ + return tf.random_uniform( + [], minval=minval, maxval=maxval, dtype=tf.int32, seed=seed) + + +def normalize_image(image, original_minval, original_maxval, target_minval, + target_maxval): + """Normalizes pixel values in the image. + + Moves the pixel values from the current [original_minval, original_maxval] + range to a the [target_minval, target_maxval] range. + + Args: + image: rank 3 float32 tensor containing 1 + image -> [height, width, channels]. + original_minval: current image minimum value. + original_maxval: current image maximum value. + target_minval: target image minimum value. + target_maxval: target image maximum value. + + Returns: + image: image which is the same shape as input image. + """ + with tf.name_scope('NormalizeImage', values=[image]): + original_minval = float(original_minval) + original_maxval = float(original_maxval) + target_minval = float(target_minval) + target_maxval = float(target_maxval) + image = tf.to_float(image) + image = tf.subtract(image, original_minval) + image = tf.multiply(image, (target_maxval - target_minval) / + (original_maxval - original_minval)) + image = tf.add(image, target_minval) + return image + + +def flip_boxes(boxes): + """Left-right flip the boxes. + + Args: + boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. + Boxes are in normalized form meaning their coordinates vary + between [0, 1]. + Each row is in the form of [ymin, xmin, ymax, xmax]. + + Returns: + Flipped boxes. + """ + # Flip boxes. + ymin, xmin, ymax, xmax = tf.split(value=boxes, num_or_size_splits=4, axis=1) + flipped_xmin = tf.subtract(1.0, xmax) + flipped_xmax = tf.subtract(1.0, xmin) + flipped_boxes = tf.concat([ymin, flipped_xmin, ymax, flipped_xmax], 1) + return flipped_boxes + + +def retain_boxes_above_threshold( + boxes, labels, label_scores, masks=None, keypoints=None, threshold=0.0): + """Retains boxes whose label score is above a given threshold. + + If the label score for a box is missing (represented by NaN), the box is + retained. The boxes that don't pass the threshold will not appear in the + returned tensor. 
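+  For example, with threshold=0.5 a box scored 0.4 is dropped, a box scored
+  0.7 is kept, and a box whose score is NaN is kept as well.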
+ + Args: + boxes: float32 tensor of shape [num_instance, 4] representing boxes + location in normalized coordinates. + labels: rank 1 int32 tensor of shape [num_instance] containing the object + classes. + label_scores: float32 tensor of shape [num_instance] representing the + score for each box. + masks: (optional) rank 3 float32 tensor with shape + [num_instances, height, width] containing instance masks. The masks are of + the same height, width as the input `image`. + keypoints: (optional) rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized + coordinates. + threshold: scalar python float. + + Returns: + retained_boxes: [num_retained_instance, 4] + retianed_labels: [num_retained_instance] + retained_label_scores: [num_retained_instance] + + If masks, or keypoints are not None, the function also returns: + + retained_masks: [num_retained_instance, height, width] + retained_keypoints: [num_retained_instance, num_keypoints, 2] + """ + with tf.name_scope('RetainBoxesAboveThreshold', + values=[boxes, labels, label_scores]): + indices = tf.where( + tf.logical_or(label_scores > threshold, tf.is_nan(label_scores))) + indices = tf.squeeze(indices, axis=1) + retained_boxes = tf.gather(boxes, indices) + retained_labels = tf.gather(labels, indices) + retained_label_scores = tf.gather(label_scores, indices) + result = [retained_boxes, retained_labels, retained_label_scores] + + if masks is not None: + retained_masks = tf.gather(masks, indices) + result.append(retained_masks) + + if keypoints is not None: + retained_keypoints = tf.gather(keypoints, indices) + result.append(retained_keypoints) + + return result + + +def _flip_masks(masks): + """Left-right flips masks. + + Args: + masks: rank 3 float32 tensor with shape + [num_instances, height, width] representing instance masks. + + Returns: + flipped masks: rank 3 float32 tensor with shape + [num_instances, height, width] representing instance masks. + """ + return masks[:, :, ::-1] + + +def random_horizontal_flip( + image, + boxes=None, + masks=None, + keypoints=None, + keypoint_flip_permutation=None, + seed=None): + """Randomly decides whether to mirror the image and detections or not. + + The probability of flipping the image is 50%. + + Args: + image: rank 3 float32 tensor with shape [height, width, channels]. + boxes: (optional) rank 2 float32 tensor with shape [N, 4] + containing the bounding boxes. + Boxes are in normalized form meaning their coordinates vary + between [0, 1]. + Each row is in the form of [ymin, xmin, ymax, xmax]. + masks: (optional) rank 3 float32 tensor with shape + [num_instances, height, width] containing instance masks. The masks + are of the same height, width as the input `image`. + keypoints: (optional) rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2]. The keypoints are in y-x + normalized coordinates. + keypoint_flip_permutation: rank 1 int32 tensor containing keypoint flip + permutation. + seed: random seed + + Returns: + image: image which is the same shape as input image. + + If boxes, masks, keypoints, and keypoint_flip_permutation is not None, + the function also returns the following tensors. + + boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. + Boxes are in normalized form meaning their coordinates vary + between [0, 1]. + masks: rank 3 float32 tensor with shape [num_instances, height, width] + containing instance masks. 
+ keypoints: rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2] + + Raises: + ValueError: if keypoints are provided but keypoint_flip_permutation is not. + """ + def _flip_image(image): + # flip image + image_flipped = tf.image.flip_left_right(image) + return image_flipped + + if keypoints is not None and keypoint_flip_permutation is None: + raise ValueError( + 'keypoints are provided but keypoints_flip_permutation is not provided') + + with tf.name_scope('RandomHorizontalFlip', values=[image, boxes]): + result = [] + # random variable defining whether to do flip or not + do_a_flip_random = tf.random_uniform([], seed=seed) + # flip only if there are bounding boxes in image! + do_a_flip_random = tf.logical_and( + tf.greater(tf.size(boxes), 0), tf.greater(do_a_flip_random, 0.5)) + + # flip image + image = tf.cond(do_a_flip_random, lambda: _flip_image(image), lambda: image) + result.append(image) + + # flip boxes + if boxes is not None: + boxes = tf.cond( + do_a_flip_random, lambda: flip_boxes(boxes), lambda: boxes) + result.append(boxes) + + # flip masks + if masks is not None: + masks = tf.cond( + do_a_flip_random, lambda: _flip_masks(masks), lambda: masks) + result.append(masks) + + # flip keypoints + if keypoints is not None and keypoint_flip_permutation is not None: + permutation = keypoint_flip_permutation + keypoints = tf.cond( + do_a_flip_random, + lambda: keypoint_ops.flip_horizontal(keypoints, 0.5, permutation), + lambda: keypoints) + result.append(keypoints) + + return tuple(result) + + +def random_pixel_value_scale(image, minval=0.9, maxval=1.1, seed=None): + """Scales each value in the pixels of the image. + + This function scales each pixel independent of the other ones. + For each value in image tensor, draws a random number between + minval and maxval and multiples the values with them. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + minval: lower ratio of scaling pixel values. + maxval: upper ratio of scaling pixel values. + seed: random seed. + + Returns: + image: image which is the same shape as input image. + """ + with tf.name_scope('RandomPixelValueScale', values=[image]): + color_coef = tf.random_uniform( + tf.shape(image), + minval=minval, + maxval=maxval, + dtype=tf.float32, + seed=seed) + image = tf.multiply(image, color_coef) + image = tf.clip_by_value(image, 0.0, 1.0) + + return image + + +def random_image_scale(image, + masks=None, + min_scale_ratio=0.5, + max_scale_ratio=2.0, + seed=None): + """Scales the image size. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels]. + masks: (optional) rank 3 float32 tensor containing masks with + size [height, width, num_masks]. The value is set to None if there are no + masks. + min_scale_ratio: minimum scaling ratio. + max_scale_ratio: maximum scaling ratio. + seed: random seed. + + Returns: + image: image which is the same rank as input image. + masks: If masks is not none, resized masks which are the same rank as input + masks will be returned. 
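+
+  For example, with min_scale_ratio=0.5 and max_scale_ratio=2.0 a 100x200
+  image may be resized to anything between 50x100 and 200x400; both sides
+  share the same randomly drawn ratio.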
+ """ + with tf.name_scope('RandomImageScale', values=[image]): + result = [] + image_shape = tf.shape(image) + image_height = image_shape[0] + image_width = image_shape[1] + size_coef = tf.random_uniform([], + minval=min_scale_ratio, + maxval=max_scale_ratio, + dtype=tf.float32, seed=seed) + image_newysize = tf.to_int32( + tf.multiply(tf.to_float(image_height), size_coef)) + image_newxsize = tf.to_int32( + tf.multiply(tf.to_float(image_width), size_coef)) + image = tf.image.resize_images( + image, [image_newysize, image_newxsize], align_corners=True) + result.append(image) + if masks: + masks = tf.image.resize_nearest_neighbor( + masks, [image_newysize, image_newxsize], align_corners=True) + result.append(masks) + return tuple(result) + + +def random_rgb_to_gray(image, probability=0.1, seed=None): + """Changes the image from RGB to Grayscale with the given probability. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + probability: the probability of returning a grayscale image. + The probability should be a number between [0, 1]. + seed: random seed. + + Returns: + image: image which is the same shape as input image. + """ + def _image_to_gray(image): + image_gray1 = tf.image.rgb_to_grayscale(image) + image_gray3 = tf.image.grayscale_to_rgb(image_gray1) + return image_gray3 + + with tf.name_scope('RandomRGBtoGray', values=[image]): + # random variable defining whether to do flip or not + do_gray_random = tf.random_uniform([], seed=seed) + + image = tf.cond( + tf.greater(do_gray_random, probability), lambda: image, + lambda: _image_to_gray(image)) + + return image + + +def random_adjust_brightness(image, max_delta=0.2): + """Randomly adjusts brightness. + + Makes sure the output image is still between 0 and 1. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + max_delta: how much to change the brightness. A value between [0, 1). + + Returns: + image: image which is the same shape as input image. + boxes: boxes which is the same shape as input boxes. + """ + with tf.name_scope('RandomAdjustBrightness', values=[image]): + image = tf.image.random_brightness(image, max_delta) + image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0) + return image + + +def random_adjust_contrast(image, min_delta=0.8, max_delta=1.25): + """Randomly adjusts contrast. + + Makes sure the output image is still between 0 and 1. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + min_delta: see max_delta. + max_delta: how much to change the contrast. Contrast will change with a + value between min_delta and max_delta. This value will be + multiplied to the current contrast of the image. + + Returns: + image: image which is the same shape as input image. + """ + with tf.name_scope('RandomAdjustContrast', values=[image]): + image = tf.image.random_contrast(image, min_delta, max_delta) + image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0) + return image + + +def random_adjust_hue(image, max_delta=0.02): + """Randomly adjusts hue. + + Makes sure the output image is still between 0 and 1. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + max_delta: change hue randomly with a value between 0 and max_delta. 
+ + Returns: + image: image which is the same shape as input image. + """ + with tf.name_scope('RandomAdjustHue', values=[image]): + image = tf.image.random_hue(image, max_delta) + image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0) + return image + + +def random_adjust_saturation(image, min_delta=0.8, max_delta=1.25): + """Randomly adjusts saturation. + + Makes sure the output image is still between 0 and 1. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + min_delta: see max_delta. + max_delta: how much to change the saturation. Saturation will change with a + value between min_delta and max_delta. This value will be + multiplied to the current saturation of the image. + + Returns: + image: image which is the same shape as input image. + """ + with tf.name_scope('RandomAdjustSaturation', values=[image]): + image = tf.image.random_saturation(image, min_delta, max_delta) + image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0) + return image + + +def random_distort_color(image, color_ordering=0): + """Randomly distorts color. + + Randomly distorts color using a combination of brightness, hue, contrast + and saturation changes. Makes sure the output image is still between 0 and 1. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + color_ordering: Python int, a type of distortion (valid values: 0, 1). + + Returns: + image: image which is the same shape as input image. + + Raises: + ValueError: if color_ordering is not in {0, 1}. + """ + with tf.name_scope('RandomDistortColor', values=[image]): + if color_ordering == 0: + image = tf.image.random_brightness(image, max_delta=32. / 255.) + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + image = tf.image.random_hue(image, max_delta=0.2) + image = tf.image.random_contrast(image, lower=0.5, upper=1.5) + elif color_ordering == 1: + image = tf.image.random_brightness(image, max_delta=32. / 255.) + image = tf.image.random_contrast(image, lower=0.5, upper=1.5) + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + image = tf.image.random_hue(image, max_delta=0.2) + else: + raise ValueError('color_ordering must be in {0, 1}') + + # The random_* ops do not necessarily clamp. + image = tf.clip_by_value(image, 0.0, 1.0) + return image + + +def random_jitter_boxes(boxes, ratio=0.05, seed=None): + """Randomly jitter boxes in image. + + Args: + boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. + Boxes are in normalized form meaning their coordinates vary + between [0, 1]. + Each row is in the form of [ymin, xmin, ymax, xmax]. + ratio: The ratio of the box width and height that the corners can jitter. + For example if the width is 100 pixels and ratio is 0.05, + the corners can jitter up to 5 pixels in the x direction. + seed: random seed. + + Returns: + boxes: boxes which is the same shape as input boxes. + """ + def random_jitter_box(box, ratio, seed): + """Randomly jitter box. + + Args: + box: bounding box [1, 1, 4]. + ratio: max ratio between jittered box and original box, + a number between [0, 0.5]. + seed: random seed. + + Returns: + jittered_box: jittered box. 
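+
+      For example, with ratio=0.05 a box of normalized height 0.2 can have
+      each corner moved by at most 0.01 in the y direction.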
+ """ + rand_numbers = tf.random_uniform( + [1, 1, 4], minval=-ratio, maxval=ratio, dtype=tf.float32, seed=seed) + box_width = tf.subtract(box[0, 0, 3], box[0, 0, 1]) + box_height = tf.subtract(box[0, 0, 2], box[0, 0, 0]) + hw_coefs = tf.stack([box_height, box_width, box_height, box_width]) + hw_rand_coefs = tf.multiply(hw_coefs, rand_numbers) + jittered_box = tf.add(box, hw_rand_coefs) + jittered_box = tf.clip_by_value(jittered_box, 0.0, 1.0) + return jittered_box + + with tf.name_scope('RandomJitterBoxes', values=[boxes]): + # boxes are [N, 4]. Lets first make them [N, 1, 1, 4] + boxes_shape = tf.shape(boxes) + boxes = tf.expand_dims(boxes, 1) + boxes = tf.expand_dims(boxes, 2) + + distorted_boxes = tf.map_fn( + lambda x: random_jitter_box(x, ratio, seed), boxes, dtype=tf.float32) + + distorted_boxes = tf.reshape(distorted_boxes, boxes_shape) + + return distorted_boxes + + +def _strict_random_crop_image(image, + boxes, + labels, + masks=None, + keypoints=None, + min_object_covered=1.0, + aspect_ratio_range=(0.75, 1.33), + area_range=(0.1, 1.0), + overlap_thresh=0.3): + """Performs random crop. + + Note: boxes will be clipped to the crop. Keypoint coordinates that are + outside the crop will be set to NaN, which is consistent with the original + keypoint encoding for non-existing keypoints. This function always crops + the image and is supposed to be used by `random_crop_image` function which + sometimes returns image unchanged. + + Args: + image: rank 3 float32 tensor containing 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + boxes: rank 2 float32 tensor containing the bounding boxes with shape + [num_instances, 4]. + Boxes are in normalized form meaning their coordinates vary + between [0, 1]. + Each row is in the form of [ymin, xmin, ymax, xmax]. + labels: rank 1 int32 tensor containing the object classes. + masks: (optional) rank 3 float32 tensor with shape + [num_instances, height, width] containing instance masks. The masks + are of the same height, width as the input `image`. + keypoints: (optional) rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2]. The keypoints are in y-x + normalized coordinates. + min_object_covered: the cropped image must cover at least this fraction of + at least one of the input bounding boxes. + aspect_ratio_range: allowed range for aspect ratio of cropped image. + area_range: allowed range for area ratio between cropped image and the + original image. + overlap_thresh: minimum overlap thresh with new cropped + image to keep the box. + + Returns: + image: image which is the same rank as input image. + boxes: boxes which is the same rank as input boxes. + Boxes are in normalized form. + labels: new labels. + + If masks, or keypoints is not None, the function also returns: + + masks: rank 3 float32 tensor with shape [num_instances, height, width] + containing instance masks. + keypoints: rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2] + """ + with tf.name_scope('RandomCropImage', values=[image, boxes]): + image_shape = tf.shape(image) + + # boxes are [N, 4]. Lets first make them [N, 1, 4]. 
+ boxes_expanded = tf.expand_dims( + tf.clip_by_value( + boxes, clip_value_min=0.0, clip_value_max=1.0), 1) + + sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( + image_shape, + bounding_boxes=boxes_expanded, + min_object_covered=min_object_covered, + aspect_ratio_range=aspect_ratio_range, + area_range=area_range, + max_attempts=100, + use_image_if_no_bounding_boxes=True) + + im_box_begin, im_box_size, im_box = sample_distorted_bounding_box + + new_image = tf.slice(image, im_box_begin, im_box_size) + new_image.set_shape([None, None, image.get_shape()[2]]) + + # [1, 4] + im_box_rank2 = tf.squeeze(im_box, squeeze_dims=[0]) + # [4] + im_box_rank1 = tf.squeeze(im_box) + + boxlist = box_list.BoxList(boxes) + boxlist.add_field('labels', labels) + + im_boxlist = box_list.BoxList(im_box_rank2) + + # remove boxes that are outside cropped image + boxlist, inside_window_ids = box_list_ops.prune_completely_outside_window( + boxlist, im_box_rank1) + + # remove boxes that are outside image + overlapping_boxlist, keep_ids = box_list_ops.prune_non_overlapping_boxes( + boxlist, im_boxlist, overlap_thresh) + + # change the coordinate of the remaining boxes + new_labels = overlapping_boxlist.get_field('labels') + new_boxlist = box_list_ops.change_coordinate_frame(overlapping_boxlist, + im_box_rank1) + new_boxes = new_boxlist.get() + new_boxes = tf.clip_by_value( + new_boxes, clip_value_min=0.0, clip_value_max=1.0) + + result = [new_image, new_boxes, new_labels] + + if masks is not None: + masks_of_boxes_inside_window = tf.gather(masks, inside_window_ids) + masks_of_boxes_completely_inside_window = tf.gather( + masks_of_boxes_inside_window, keep_ids) + masks_box_begin = [im_box_begin[2], im_box_begin[0], im_box_begin[1]] + masks_box_size = [im_box_size[2], im_box_size[0], im_box_size[1]] + new_masks = tf.slice( + masks_of_boxes_completely_inside_window, + masks_box_begin, masks_box_size) + result.append(new_masks) + + if keypoints is not None: + keypoints_of_boxes_inside_window = tf.gather(keypoints, inside_window_ids) + keypoints_of_boxes_completely_inside_window = tf.gather( + keypoints_of_boxes_inside_window, keep_ids) + new_keypoints = keypoint_ops.change_coordinate_frame( + keypoints_of_boxes_completely_inside_window, im_box_rank1) + new_keypoints = keypoint_ops.prune_outside_window(new_keypoints, + [0.0, 0.0, 1.0, 1.0]) + result.append(new_keypoints) + + return tuple(result) + + +def random_crop_image(image, + boxes, + labels, + masks=None, + keypoints=None, + min_object_covered=1.0, + aspect_ratio_range=(0.75, 1.33), + area_range=(0.1, 1.0), + overlap_thresh=0.3, + random_coef=0.0, + seed=None): + """Randomly crops the image. + + Given the input image and its bounding boxes, this op randomly + crops a subimage. Given a user-provided set of input constraints, + the crop window is resampled until it satisfies these constraints. + If within 100 trials it is unable to find a valid crop, the original + image is returned. See the Args section for a description of the input + constraints. Both input boxes and returned Boxes are in normalized + form (e.g., lie in the unit square [0, 1]). + This function will return the original image with probability random_coef. + + Note: boxes will be clipped to the crop. Keypoint coordinates that are + outside the crop will be set to NaN, which is consistent with the original + keypoint encoding for non-existing keypoints. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. 
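A minimal call of `random_crop_image`, under the assumption that TensorFlow 1.x is installed and that the tensors below are placeholder test data, looks like this sketch:

```
# Illustrative sketch; shapes and values are assumptions for demonstration.
import tensorflow as tf

from object_detection.core import preprocessor

image = tf.random_uniform([100, 200, 3], dtype=tf.float32)
boxes = tf.constant([[0.1, 0.1, 0.9, 0.9]], dtype=tf.float32)
labels = tf.constant([1], dtype=tf.int32)

# random_coef=0.0 means the crop is always attempted; if no valid crop is
# found within 100 trials the original image is returned.
new_image, new_boxes, new_labels = preprocessor.random_crop_image(
    image, boxes, labels,
    min_object_covered=1.0,
    aspect_ratio_range=(0.75, 1.33),
    area_range=(0.1, 1.0),
    overlap_thresh=0.3,
    random_coef=0.0)

with tf.Session() as sess:
  img, bxs, lbls = sess.run([new_image, new_boxes, new_labels])
  print(img.shape, bxs, lbls)
```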
+ boxes: rank 2 float32 tensor containing the bounding boxes with shape + [num_instances, 4]. + Boxes are in normalized form meaning their coordinates vary + between [0, 1]. + Each row is in the form of [ymin, xmin, ymax, xmax]. + labels: rank 1 int32 tensor containing the object classes. + masks: (optional) rank 3 float32 tensor with shape + [num_instances, height, width] containing instance masks. The masks + are of the same height, width as the input `image`. + keypoints: (optional) rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2]. The keypoints are in y-x + normalized coordinates. + min_object_covered: the cropped image must cover at least this fraction of + at least one of the input bounding boxes. + aspect_ratio_range: allowed range for aspect ratio of cropped image. + area_range: allowed range for area ratio between cropped image and the + original image. + overlap_thresh: minimum overlap thresh with new cropped + image to keep the box. + random_coef: a random coefficient that defines the chance of getting the + original image. If random_coef is 0, we will always get the + cropped image, and if it is 1.0, we will always get the + original image. + seed: random seed. + + Returns: + image: Image shape will be [new_height, new_width, channels]. + boxes: boxes which is the same rank as input boxes. Boxes are in normalized + form. + labels: new labels. + + If masks, or keypoints are not None, the function also returns: + + masks: rank 3 float32 tensor with shape [num_instances, height, width] + containing instance masks. + keypoints: rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2] + """ + + def strict_random_crop_image_fn(): + return _strict_random_crop_image( + image, + boxes, + labels, + masks=masks, + keypoints=keypoints, + min_object_covered=min_object_covered, + aspect_ratio_range=aspect_ratio_range, + area_range=area_range, + overlap_thresh=overlap_thresh) + + # avoids tf.cond to make faster RCNN training on borg. See b/140057645. + if random_coef < sys.float_info.min: + result = strict_random_crop_image_fn() + else: + do_a_crop_random = tf.random_uniform([], seed=seed) + do_a_crop_random = tf.greater(do_a_crop_random, random_coef) + + outputs = [image, boxes, labels] + if masks is not None: + outputs.append(masks) + if keypoints is not None: + outputs.append(keypoints) + + result = tf.cond(do_a_crop_random, + strict_random_crop_image_fn, + lambda: tuple(outputs)) + return result + + +def random_pad_image(image, + boxes, + min_image_size=None, + max_image_size=None, + pad_color=None, + seed=None): + """Randomly pads the image. + + This function randomly pads the image with zeros. The final size of the + padded image will be between min_image_size and max_image_size. + if min_image_size is smaller than the input image size, min_image_size will + be set to the input image size. The same for max_image_size. The input image + will be located at a uniformly random location inside the padded image. + The relative location of the boxes to the original image will remain the same. + + Args: + image: rank 3 float32 tensor containing 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. + Boxes are in normalized form meaning their coordinates vary + between [0, 1]. + Each row is in the form of [ymin, xmin, ymax, xmax]. + min_image_size: a tensor of size [min_height, min_width], type tf.int32. 
+ If passed as None, will be set to image size + [height, width]. + max_image_size: a tensor of size [max_height, max_width], type tf.int32. + If passed as None, will be set to twice the + image [height * 2, width * 2]. + pad_color: padding color. A rank 1 tensor of [3] with dtype=tf.float32. + if set as None, it will be set to average color of the input + image. + + seed: random seed. + + Returns: + image: Image shape will be [new_height, new_width, channels]. + boxes: boxes which is the same rank as input boxes. Boxes are in normalized + form. + """ + if pad_color is None: + pad_color = tf.reduce_mean(image, reduction_indices=[0, 1]) + + image_shape = tf.shape(image) + image_height = image_shape[0] + image_width = image_shape[1] + + if max_image_size is None: + max_image_size = tf.stack([image_height * 2, image_width * 2]) + max_image_size = tf.maximum(max_image_size, + tf.stack([image_height, image_width])) + + if min_image_size is None: + min_image_size = tf.stack([image_height, image_width]) + min_image_size = tf.maximum(min_image_size, + tf.stack([image_height, image_width])) + + target_height = tf.cond( + max_image_size[0] > min_image_size[0], + lambda: _random_integer(min_image_size[0], max_image_size[0], seed), + lambda: max_image_size[0]) + + target_width = tf.cond( + max_image_size[1] > min_image_size[1], + lambda: _random_integer(min_image_size[1], max_image_size[1], seed), + lambda: max_image_size[1]) + + offset_height = tf.cond( + target_height > image_height, + lambda: _random_integer(0, target_height - image_height, seed), + lambda: tf.constant(0, dtype=tf.int32)) + + offset_width = tf.cond( + target_width > image_width, + lambda: _random_integer(0, target_width - image_width, seed), + lambda: tf.constant(0, dtype=tf.int32)) + + new_image = tf.image.pad_to_bounding_box( + image, offset_height=offset_height, offset_width=offset_width, + target_height=target_height, target_width=target_width) + + # Setting color of the padded pixels + image_ones = tf.ones_like(image) + image_ones_padded = tf.image.pad_to_bounding_box( + image_ones, offset_height=offset_height, offset_width=offset_width, + target_height=target_height, target_width=target_width) + image_color_paded = (1.0 - image_ones_padded) * pad_color + new_image += image_color_paded + + # setting boxes + new_window = tf.to_float( + tf.stack([ + -offset_height, -offset_width, target_height - offset_height, + target_width - offset_width + ])) + new_window /= tf.to_float( + tf.stack([image_height, image_width, image_height, image_width])) + boxlist = box_list.BoxList(boxes) + new_boxlist = box_list_ops.change_coordinate_frame(boxlist, new_window) + new_boxes = new_boxlist.get() + + return new_image, new_boxes + + +def random_crop_pad_image(image, + boxes, + labels, + min_object_covered=1.0, + aspect_ratio_range=(0.75, 1.33), + area_range=(0.1, 1.0), + overlap_thresh=0.3, + random_coef=0.0, + min_padded_size_ratio=None, + max_padded_size_ratio=None, + pad_color=None, + seed=None): + """Randomly crops and pads the image. + + Given an input image and its bounding boxes, this op first randomly crops + the image and then randomly pads the image with background values. Parameters + min_padded_size_ratio and max_padded_size_ratio, determine the range of the + final output image size. Specifically, the final image size will have a size + in the range of min_padded_size_ratio * tf.shape(image) and + max_padded_size_ratio * tf.shape(image). 
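Before the combined crop-and-pad op described next, here is a minimal sketch of `random_pad_image` on its own, assuming TensorFlow 1.x and placeholder inputs:

```
# Illustrative sketch; with the defaults the padded canvas is drawn between
# 1x and 2x the original size and the padding uses the mean image color.
import tensorflow as tf

from object_detection.core import preprocessor

image = tf.random_uniform([50, 50, 3], dtype=tf.float32)
boxes = tf.constant([[0.0, 0.0, 0.5, 0.5]], dtype=tf.float32)

padded_image, padded_boxes = preprocessor.random_pad_image(image, boxes)

with tf.Session() as sess:
  img, bxs = sess.run([padded_image, padded_boxes])
  print(img.shape, bxs)  # boxes are re-expressed relative to the padded canvas
```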
Note that these ratios are with + respect to the size of the original image, so we can't capture the same + effect easily by independently applying RandomCropImage + followed by RandomPadImage. + + Args: + image: rank 3 float32 tensor containing 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. + Boxes are in normalized form meaning their coordinates vary + between [0, 1]. + Each row is in the form of [ymin, xmin, ymax, xmax]. + labels: rank 1 int32 tensor containing the object classes. + min_object_covered: the cropped image must cover at least this fraction of + at least one of the input bounding boxes. + aspect_ratio_range: allowed range for aspect ratio of cropped image. + area_range: allowed range for area ratio between cropped image and the + original image. + overlap_thresh: minimum overlap thresh with new cropped + image to keep the box. + random_coef: a random coefficient that defines the chance of getting the + original image. If random_coef is 0, we will always get the + cropped image, and if it is 1.0, we will always get the + original image. + min_padded_size_ratio: min ratio of padded image height and width to the + input image's height and width. If None, it will + be set to [0.0, 0.0]. + max_padded_size_ratio: max ratio of padded image height and width to the + input image's height and width. If None, it will + be set to [2.0, 2.0]. + pad_color: padding color. A rank 1 tensor of [3] with dtype=tf.float32. + if set as None, it will be set to average color of the randomly + cropped image. + seed: random seed. + + Returns: + padded_image: padded image. + padded_boxes: boxes which is the same rank as input boxes. Boxes are in + normalized form. + cropped_labels: cropped labels. + """ + image_size = tf.shape(image) + image_height = image_size[0] + image_width = image_size[1] + if min_padded_size_ratio is None: + min_padded_size_ratio = tf.constant([0.0, 0.0], tf.float32) + if max_padded_size_ratio is None: + max_padded_size_ratio = tf.constant([2.0, 2.0], tf.float32) + cropped_image, cropped_boxes, cropped_labels = random_crop_image( + image=image, + boxes=boxes, + labels=labels, + min_object_covered=min_object_covered, + aspect_ratio_range=aspect_ratio_range, + area_range=area_range, + overlap_thresh=overlap_thresh, + random_coef=random_coef, + seed=seed) + + min_image_size = tf.to_int32( + tf.to_float(tf.stack([image_height, image_width])) * + min_padded_size_ratio) + max_image_size = tf.to_int32( + tf.to_float(tf.stack([image_height, image_width])) * + max_padded_size_ratio) + + padded_image, padded_boxes = random_pad_image( + cropped_image, + cropped_boxes, + min_image_size=min_image_size, + max_image_size=max_image_size, + pad_color=pad_color, + seed=seed) + + return padded_image, padded_boxes, cropped_labels + + +def random_crop_to_aspect_ratio(image, + boxes, + labels, + masks=None, + keypoints=None, + aspect_ratio=1.0, + overlap_thresh=0.3, + seed=None): + """Randomly crops an image to the specified aspect ratio. + + Randomly crops the a portion of the image such that the crop is of the + specified aspect ratio, and the crop is as large as possible. If the specified + aspect ratio is larger than the aspect ratio of the image, this op will + randomly remove rows from the top and bottom of the image. If the specified + aspect ratio is less than the aspect ratio of the image, this op will randomly + remove cols from the left and right of the image. 
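A minimal sketch of the combined `random_crop_pad_image` op, assuming TensorFlow 1.x and placeholder data, and relying on the default padding ratios of [0.0, 0.0] and [2.0, 2.0]:

```
# Illustrative sketch; the defaults pad up to twice the original image size.
import tensorflow as tf

from object_detection.core import preprocessor

image = tf.random_uniform([100, 100, 3], dtype=tf.float32)
boxes = tf.constant([[0.2, 0.2, 0.8, 0.8]], dtype=tf.float32)
labels = tf.constant([1], dtype=tf.int32)

padded_image, padded_boxes, cropped_labels = preprocessor.random_crop_pad_image(
    image, boxes, labels)

with tf.Session() as sess:
  print(sess.run(padded_image).shape)
```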
If the specified aspect + ratio is the same as the aspect ratio of the image, this op will return the + image. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. + Boxes are in normalized form meaning their coordinates vary + between [0, 1]. + Each row is in the form of [ymin, xmin, ymax, xmax]. + labels: rank 1 int32 tensor containing the object classes. + masks: (optional) rank 3 float32 tensor with shape + [num_instances, height, width] containing instance masks. The masks + are of the same height, width as the input `image`. + keypoints: (optional) rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2]. The keypoints are in y-x + normalized coordinates. + aspect_ratio: the aspect ratio of cropped image. + overlap_thresh: minimum overlap thresh with new cropped + image to keep the box. + seed: random seed. + + Returns: + image: image which is the same rank as input image. + boxes: boxes which is the same rank as input boxes. + Boxes are in normalized form. + labels: new labels. + + If masks, or keypoints is not None, the function also returns: + + masks: rank 3 float32 tensor with shape [num_instances, height, width] + containing instance masks. + keypoints: rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2] + + Raises: + ValueError: If image is not a 3D tensor. + """ + if len(image.get_shape()) != 3: + raise ValueError('Image should be 3D tensor') + + with tf.name_scope('RandomCropToAspectRatio', values=[image]): + image_shape = tf.shape(image) + orig_height = image_shape[0] + orig_width = image_shape[1] + orig_aspect_ratio = tf.to_float(orig_width) / tf.to_float(orig_height) + new_aspect_ratio = tf.constant(aspect_ratio, dtype=tf.float32) + def target_height_fn(): + return tf.to_int32( + tf.round( + tf.to_float(orig_height) * orig_aspect_ratio / new_aspect_ratio)) + target_height = tf.cond( + orig_aspect_ratio >= new_aspect_ratio, + lambda: orig_height, + target_height_fn) + def target_width_fn(): + return tf.to_int32( + tf.round( + tf.to_float(orig_width) * new_aspect_ratio / orig_aspect_ratio)) + target_width = tf.cond( + orig_aspect_ratio <= new_aspect_ratio, + lambda: orig_width, + target_width_fn) + + # either offset_height = 0 and offset_width is randomly chosen from + # [0, offset_width - target_width), or else offset_width = 0 and + # offset_height is randomly chosen from [0, offset_height - target_height) + offset_height = _random_integer(0, orig_height - target_height + 1, seed) + offset_width = _random_integer(0, orig_width - target_width + 1, seed) + new_image = tf.image.crop_to_bounding_box( + image, offset_height, offset_width, target_height, target_width) + + im_box = tf.stack([ + tf.to_float(offset_height) / tf.to_float(orig_height), + tf.to_float(offset_width) / tf.to_float(orig_width), + tf.to_float(offset_height + target_height) / tf.to_float(orig_height), + tf.to_float(offset_width + target_width) / tf.to_float(orig_width) + ]) + + boxlist = box_list.BoxList(boxes) + boxlist.add_field('labels', labels) + + im_boxlist = box_list.BoxList(tf.expand_dims(im_box, 0)) + + # remove boxes whose overlap with the image is less than overlap_thresh + overlapping_boxlist, keep_ids = box_list_ops.prune_non_overlapping_boxes( + boxlist, im_boxlist, overlap_thresh) + + # change the coordinate of the remaining boxes + new_labels = overlapping_boxlist.get_field('labels') + new_boxlist = 
box_list_ops.change_coordinate_frame(overlapping_boxlist, + im_box) + new_boxlist = box_list_ops.clip_to_window(new_boxlist, + tf.constant( + [0.0, 0.0, 1.0, 1.0], + tf.float32)) + new_boxes = new_boxlist.get() + + result = [new_image, new_boxes, new_labels] + + if masks is not None: + masks_inside_window = tf.gather(masks, keep_ids) + masks_box_begin = tf.stack([0, offset_height, offset_width]) + masks_box_size = tf.stack([-1, target_height, target_width]) + new_masks = tf.slice(masks_inside_window, masks_box_begin, masks_box_size) + result.append(new_masks) + + if keypoints is not None: + keypoints_inside_window = tf.gather(keypoints, keep_ids) + new_keypoints = keypoint_ops.change_coordinate_frame( + keypoints_inside_window, im_box) + new_keypoints = keypoint_ops.prune_outside_window(new_keypoints, + [0.0, 0.0, 1.0, 1.0]) + result.append(new_keypoints) + + return tuple(result) + + +def random_black_patches(image, + max_black_patches=10, + probability=0.5, + size_to_image_ratio=0.1, + random_seed=None): + """Randomly adds some black patches to the image. + + This op adds up to max_black_patches square black patches of a fixed size + to the image where size is specified via the size_to_image_ratio parameter. + + Args: + image: rank 3 float32 tensor containing 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + max_black_patches: number of times that the function tries to add a + black box to the image. + probability: at each try, what is the chance of adding a box. + size_to_image_ratio: Determines the ratio of the size of the black patches + to the size of the image. + box_size = size_to_image_ratio * + min(image_width, image_height) + random_seed: random seed. + + Returns: + image + """ + def add_black_patch_to_image(image): + """Function for adding one patch to the image. + + Args: + image: image + + Returns: + image with a randomly added black box + """ + image_shape = tf.shape(image) + image_height = image_shape[0] + image_width = image_shape[1] + box_size = tf.to_int32( + tf.multiply( + tf.minimum(tf.to_float(image_height), tf.to_float(image_width)), + size_to_image_ratio)) + normalized_y_min = tf.random_uniform( + [], minval=0.0, maxval=(1.0 - size_to_image_ratio), seed=random_seed) + normalized_x_min = tf.random_uniform( + [], minval=0.0, maxval=(1.0 - size_to_image_ratio), seed=random_seed) + y_min = tf.to_int32(normalized_y_min * tf.to_float(image_height)) + x_min = tf.to_int32(normalized_x_min * tf.to_float(image_width)) + black_box = tf.ones([box_size, box_size, 3], dtype=tf.float32) + mask = 1.0 - tf.image.pad_to_bounding_box(black_box, y_min, x_min, + image_height, image_width) + image = tf.multiply(image, mask) + return image + + with tf.name_scope('RandomBlackPatchInImage', values=[image]): + for _ in range(max_black_patches): + random_prob = tf.random_uniform([], minval=0.0, maxval=1.0, + dtype=tf.float32, seed=random_seed) + image = tf.cond( + tf.greater(random_prob, probability), lambda: image, + lambda: add_black_patch_to_image(image)) + + return image + + +def image_to_float(image): + """Used in Faster R-CNN. Casts image pixel values to float. + + Args: + image: input image which might be in tf.uint8 or sth else format + + Returns: + image: image in tf.float32 format. + """ + with tf.name_scope('ImageToFloat', values=[image]): + image = tf.to_float(image) + return image + + +def random_resize_method(image, target_size): + """Uses a random resize method to resize the image to target size. + + Args: + image: a rank 3 tensor. 
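As an illustration of the patch-dropping augmentation above, here is a minimal sketch, assuming TensorFlow 1.x and a random placeholder image:

```
# Illustrative sketch; up to 10 square patches, each sized at ~10% of the
# shorter image side, are zeroed out with probability 0.5 per attempt.
import tensorflow as tf

from object_detection.core import preprocessor

image = tf.random_uniform([64, 64, 3], dtype=tf.float32)
patched = preprocessor.random_black_patches(
    image, max_black_patches=10, probability=0.5, size_to_image_ratio=0.1)

with tf.Session() as sess:
  print(sess.run(patched).min())  # most runs will contain zeroed pixels
```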
+ target_size: a list of [target_height, target_width] + + Returns: + resized image. + """ + + resized_image = _apply_with_random_selector( + image, + lambda x, method: tf.image.resize_images(x, target_size, method), + num_cases=4) + + return resized_image + + +def resize_to_range(image, + masks=None, + min_dimension=None, + max_dimension=None, + align_corners=False): + """Resizes an image so its dimensions are within the provided value. + + The output size can be described by two cases: + 1. If the image can be rescaled so its minimum dimension is equal to the + provided value without the other dimension exceeding max_dimension, + then do so. + 2. Otherwise, resize so the largest dimension is equal to max_dimension. + + Args: + image: A 3D tensor of shape [height, width, channels] + masks: (optional) rank 3 float32 tensor with shape + [num_instances, height, width] containing instance masks. + min_dimension: (optional) (scalar) desired size of the smaller image + dimension. + max_dimension: (optional) (scalar) maximum allowed size + of the larger image dimension. + align_corners: bool. If true, exactly align all 4 corners of the input + and output. Defaults to False. + + Returns: + A 3D tensor of shape [new_height, new_width, channels], + where the image has been resized (with bilinear interpolation) so that + min(new_height, new_width) == min_dimension or + max(new_height, new_width) == max_dimension. + + If masks is not None, also outputs masks: + A 3D tensor of shape [num_instances, new_height, new_width] + + Raises: + ValueError: if the image is not a 3D tensor. + """ + if len(image.get_shape()) != 3: + raise ValueError('Image should be 3D tensor') + + with tf.name_scope('ResizeToRange', values=[image, min_dimension]): + image_shape = tf.shape(image) + orig_height = tf.to_float(image_shape[0]) + orig_width = tf.to_float(image_shape[1]) + orig_min_dim = tf.minimum(orig_height, orig_width) + + # Calculates the larger of the possible sizes + min_dimension = tf.constant(min_dimension, dtype=tf.float32) + large_scale_factor = min_dimension / orig_min_dim + # Scaling orig_(height|width) by large_scale_factor will make the smaller + # dimension equal to min_dimension, save for floating point rounding errors. + # For reasonably-sized images, taking the nearest integer will reliably + # eliminate this error. + large_height = tf.to_int32(tf.round(orig_height * large_scale_factor)) + large_width = tf.to_int32(tf.round(orig_width * large_scale_factor)) + large_size = tf.stack([large_height, large_width]) + + if max_dimension: + # Calculates the smaller of the possible sizes, use that if the larger + # is too big. + orig_max_dim = tf.maximum(orig_height, orig_width) + max_dimension = tf.constant(max_dimension, dtype=tf.float32) + small_scale_factor = max_dimension / orig_max_dim + # Scaling orig_(height|width) by small_scale_factor will make the larger + # dimension equal to max_dimension, save for floating point rounding + # errors. For reasonably-sized images, taking the nearest integer will + # reliably eliminate this error. 
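To make the two resizing cases concrete, here is a minimal usage sketch, assuming TensorFlow 1.x; the 480x640 input is an arbitrary example:

```
# Illustrative sketch: the short side is scaled to min_dimension unless that
# would push the long side past max_dimension.
import tensorflow as tf

from object_detection.core import preprocessor

image = tf.random_uniform([480, 640, 3], dtype=tf.float32)
resized = preprocessor.resize_to_range(image, min_dimension=600,
                                       max_dimension=1024)

with tf.Session() as sess:
  print(sess.run(resized).shape)  # (600, 800, 3): 480 -> 600, 640 -> 800
```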
+ small_height = tf.to_int32(tf.round(orig_height * small_scale_factor)) + small_width = tf.to_int32(tf.round(orig_width * small_scale_factor)) + small_size = tf.stack([small_height, small_width]) + + new_size = tf.cond( + tf.to_float(tf.reduce_max(large_size)) > max_dimension, + lambda: small_size, lambda: large_size) + else: + new_size = large_size + + new_image = tf.image.resize_images(image, new_size, + align_corners=align_corners) + + result = new_image + if masks is not None: + num_instances = tf.shape(masks)[0] + + def resize_masks_branch(): + new_masks = tf.expand_dims(masks, 3) + new_masks = tf.image.resize_nearest_neighbor( + new_masks, new_size, align_corners=align_corners) + new_masks = tf.squeeze(new_masks, axis=3) + return new_masks + + def reshape_masks_branch(): + new_masks = tf.reshape(masks, [0, new_size[0], new_size[1]]) + return new_masks + + masks = tf.cond(num_instances > 0, + resize_masks_branch, + reshape_masks_branch) + result = [new_image, masks] + + return result + + +def scale_boxes_to_pixel_coordinates(image, boxes, keypoints=None): + """Scales boxes from normalized to pixel coordinates. + + Args: + image: A 3D float32 tensor of shape [height, width, channels]. + boxes: A 2D float32 tensor of shape [num_boxes, 4] containing the bounding + boxes in normalized coordinates. Each row is of the form + [ymin, xmin, ymax, xmax]. + keypoints: (optional) rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2]. The keypoints are in y-x normalized + coordinates. + + Returns: + image: unchanged input image. + scaled_boxes: a 2D float32 tensor of shape [num_boxes, 4] containing the + bounding boxes in pixel coordinates. + scaled_keypoints: a 3D float32 tensor with shape + [num_instances, num_keypoints, 2] containing the keypoints in pixel + coordinates. + """ + boxlist = box_list.BoxList(boxes) + image_height = tf.shape(image)[0] + image_width = tf.shape(image)[1] + scaled_boxes = box_list_ops.scale(boxlist, image_height, image_width).get() + result = [image, scaled_boxes] + if keypoints is not None: + scaled_keypoints = keypoint_ops.scale(keypoints, image_height, image_width) + result.append(scaled_keypoints) + return tuple(result) + + +# pylint: disable=g-doc-return-or-yield +def resize_image(image, + masks=None, + new_height=600, + new_width=1024, + method=tf.image.ResizeMethod.BILINEAR, + align_corners=False): + """See `tf.image.resize_images` for detailed doc.""" + with tf.name_scope( + 'ResizeImage', + values=[image, new_height, new_width, method, align_corners]): + new_image = tf.image.resize_images(image, [new_height, new_width], + method=method, + align_corners=align_corners) + result = new_image + if masks is not None: + num_instances = tf.shape(masks)[0] + new_size = tf.constant([new_height, new_width], dtype=tf.int32) + def resize_masks_branch(): + new_masks = tf.expand_dims(masks, 3) + new_masks = tf.image.resize_nearest_neighbor( + new_masks, new_size, align_corners=align_corners) + new_masks = tf.squeeze(new_masks, axis=3) + return new_masks + + def reshape_masks_branch(): + new_masks = tf.reshape(masks, [0, new_size[0], new_size[1]]) + return new_masks + + masks = tf.cond(num_instances > 0, + resize_masks_branch, + reshape_masks_branch) + result = [new_image, masks] + + return result + + +def subtract_channel_mean(image, means=None): + """Normalizes an image by subtracting a mean from each channel. 
+ + Args: + image: A 3D tensor of shape [height, width, channels] + means: float list containing a mean for each channel + Returns: + normalized_images: a tensor of shape [height, width, channels] + Raises: + ValueError: if images is not a 4D tensor or if the number of means is not + equal to the number of channels. + """ + with tf.name_scope('SubtractChannelMean', values=[image, means]): + if len(image.get_shape()) != 3: + raise ValueError('Input must be of size [height, width, channels]') + if len(means) != image.get_shape()[-1]: + raise ValueError('len(means) must match the number of channels') + return image - [[means]] + + +def one_hot_encoding(labels, num_classes=None): + """One-hot encodes the multiclass labels. + + Example usage: + labels = tf.constant([1, 4], dtype=tf.int32) + one_hot = OneHotEncoding(labels, num_classes=5) + one_hot.eval() # evaluates to [0, 1, 0, 0, 1] + + Args: + labels: A tensor of shape [None] corresponding to the labels. + num_classes: Number of classes in the dataset. + Returns: + onehot_labels: a tensor of shape [num_classes] corresponding to the one hot + encoding of the labels. + Raises: + ValueError: if num_classes is not specified. + """ + with tf.name_scope('OneHotEncoding', values=[labels]): + if num_classes is None: + raise ValueError('num_classes must be specified') + + labels = tf.one_hot(labels, num_classes, 1, 0) + return tf.reduce_max(labels, 0) + + +def rgb_to_gray(image): + """Converts a 3 channel RGB image to a 1 channel grayscale image. + + Args: + image: Rank 3 float32 tensor containing 1 image -> [height, width, 3] + with pixel values varying between [0, 1]. + + Returns: + image: A single channel grayscale image -> [image, height, 1]. + """ + return tf.image.rgb_to_grayscale(image) + + +def ssd_random_crop(image, + boxes, + labels, + masks=None, + keypoints=None, + min_object_covered=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0), + aspect_ratio_range=((0.5, 2.0),) * 7, + area_range=((0.1, 1.0),) * 7, + overlap_thresh=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0), + random_coef=(0.15,) * 7, + seed=None): + """Random crop preprocessing with default parameters as in SSD paper. + + Liu et al., SSD: Single shot multibox detector. + For further information on random crop preprocessing refer to RandomCrop + function above. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. + Boxes are in normalized form meaning their coordinates vary + between [0, 1]. + Each row is in the form of [ymin, xmin, ymax, xmax]. + labels: rank 1 int32 tensor containing the object classes. + masks: (optional) rank 3 float32 tensor with shape + [num_instances, height, width] containing instance masks. The masks + are of the same height, width as the input `image`. + keypoints: (optional) rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2]. The keypoints are in y-x + normalized coordinates. + min_object_covered: the cropped image must cover at least this fraction of + at least one of the input bounding boxes. + aspect_ratio_range: allowed range for aspect ratio of cropped image. + area_range: allowed range for area ratio between cropped image and the + original image. + overlap_thresh: minimum overlap thresh with new cropped + image to keep the box. + random_coef: a random coefficient that defines the chance of getting the + original image. 
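A minimal sketch of the channel-mean subtraction and the multi-hot label encoding above, assuming TensorFlow 1.x; the per-channel means are arbitrary example values, not constants defined by this module:

```
# Illustrative sketch; the per-channel means are arbitrary example values.
import tensorflow as tf

from object_detection.core import preprocessor

image = tf.random_uniform([8, 8, 3], dtype=tf.float32)
normalized = preprocessor.subtract_channel_mean(
    image, means=[0.485, 0.456, 0.406])

labels = tf.constant([1, 4], dtype=tf.int32)
multi_hot = preprocessor.one_hot_encoding(labels, num_classes=5)

with tf.Session() as sess:
  _, encoding = sess.run([normalized, multi_hot])
  print(encoding)  # [0 1 0 0 1]
```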
If random_coef is 0, we will always get the + cropped image, and if it is 1.0, we will always get the + original image. + seed: random seed. + + Returns: + image: image which is the same rank as input image. + boxes: boxes which is the same rank as input boxes. + Boxes are in normalized form. + labels: new labels. + + If masks, or keypoints is not None, the function also returns: + + masks: rank 3 float32 tensor with shape [num_instances, height, width] + containing instance masks. + keypoints: rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2] + """ + def random_crop_selector(selected_result, index): + """Applies random_crop_image to selected result. + + Args: + selected_result: A tuple containing image, boxes, labels, keypoints (if + not None), and masks (if not None). + index: The index that was randomly selected. + + Returns: A tuple containing image, boxes, labels, keypoints (if not None), + and masks (if not None). + """ + i = 3 + image, boxes, labels = selected_result[:i] + selected_masks = None + selected_keypoints = None + if masks is not None: + selected_masks = selected_result[i] + i += 1 + if keypoints is not None: + selected_keypoints = selected_result[i] + + return random_crop_image( + image=image, + boxes=boxes, + labels=labels, + masks=selected_masks, + keypoints=selected_keypoints, + min_object_covered=min_object_covered[index], + aspect_ratio_range=aspect_ratio_range[index], + area_range=area_range[index], + overlap_thresh=overlap_thresh[index], + random_coef=random_coef[index], + seed=seed) + + result = _apply_with_random_selector_tuples( + tuple( + t for t in (image, boxes, labels, masks, keypoints) if t is not None), + random_crop_selector, + num_cases=len(min_object_covered)) + return result + + +def ssd_random_crop_pad(image, + boxes, + labels, + min_object_covered=(0.1, 0.3, 0.5, 0.7, 0.9, 1.0), + aspect_ratio_range=((0.5, 2.0),) * 6, + area_range=((0.1, 1.0),) * 6, + overlap_thresh=(0.1, 0.3, 0.5, 0.7, 0.9, 1.0), + random_coef=(0.15,) * 6, + min_padded_size_ratio=(None,) * 6, + max_padded_size_ratio=(None,) * 6, + pad_color=(None,) * 6, + seed=None): + """Random crop preprocessing with default parameters as in SSD paper. + + Liu et al., SSD: Single shot multibox detector. + For further information on random crop preprocessing refer to RandomCrop + function above. + + Args: + image: rank 3 float32 tensor containing 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. + Boxes are in normalized form meaning their coordinates vary + between [0, 1]. + Each row is in the form of [ymin, xmin, ymax, xmax]. + labels: rank 1 int32 tensor containing the object classes. + min_object_covered: the cropped image must cover at least this fraction of + at least one of the input bounding boxes. + aspect_ratio_range: allowed range for aspect ratio of cropped image. + area_range: allowed range for area ratio between cropped image and the + original image. + overlap_thresh: minimum overlap thresh with new cropped + image to keep the box. + random_coef: a random coefficient that defines the chance of getting the + original image. If random_coef is 0, we will always get the + cropped image, and if it is 1.0, we will always get the + original image. + min_padded_size_ratio: min ratio of padded image height and width to the + input image's height and width. If None, it will + be set to [0.0, 0.0]. 
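A minimal sketch of the SSD-style crop, assuming TensorFlow 1.x and placeholder data; one of the seven default parameter tuples is selected uniformly at random on each call:

```
# Illustrative sketch using the default SSD parameter tuples.
import tensorflow as tf

from object_detection.core import preprocessor

image = tf.random_uniform([300, 300, 3], dtype=tf.float32)
boxes = tf.constant([[0.1, 0.1, 0.6, 0.6],
                     [0.3, 0.4, 0.9, 0.9]], dtype=tf.float32)
labels = tf.constant([1, 2], dtype=tf.int32)

new_image, new_boxes, new_labels = preprocessor.ssd_random_crop(
    image, boxes, labels)

with tf.Session() as sess:
  print(sess.run(new_image).shape)
```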
+ max_padded_size_ratio: max ratio of padded image height and width to the + input image's height and width. If None, it will + be set to [2.0, 2.0]. + pad_color: padding color. A rank 1 tensor of [3] with dtype=tf.float32. + if set as None, it will be set to average color of the randomly + cropped image. + seed: random seed. + + Returns: + image: Image shape will be [new_height, new_width, channels]. + boxes: boxes which is the same rank as input boxes. Boxes are in normalized + form. + new_labels: new labels. + """ + def random_crop_pad_selector(image_boxes_labels, index): + image, boxes, labels = image_boxes_labels + + return random_crop_pad_image( + image, + boxes, + labels, + min_object_covered=min_object_covered[index], + aspect_ratio_range=aspect_ratio_range[index], + area_range=area_range[index], + overlap_thresh=overlap_thresh[index], + random_coef=random_coef[index], + min_padded_size_ratio=min_padded_size_ratio[index], + max_padded_size_ratio=max_padded_size_ratio[index], + pad_color=pad_color[index], + seed=seed) + + new_image, new_boxes, new_labels = _apply_with_random_selector_tuples( + (image, boxes, labels), + random_crop_pad_selector, + num_cases=len(min_object_covered)) + return new_image, new_boxes, new_labels + + +def ssd_random_crop_fixed_aspect_ratio( + image, + boxes, + labels, + masks=None, + keypoints=None, + min_object_covered=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0), + aspect_ratio=1.0, + area_range=((0.1, 1.0),) * 7, + overlap_thresh=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0), + random_coef=(0.15,) * 7, + seed=None): + """Random crop preprocessing with default parameters as in SSD paper. + + Liu et al., SSD: Single shot multibox detector. + For further information on random crop preprocessing refer to RandomCrop + function above. + + The only difference is that the aspect ratio of the crops are fixed. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 1]. + boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. + Boxes are in normalized form meaning their coordinates vary + between [0, 1]. + Each row is in the form of [ymin, xmin, ymax, xmax]. + labels: rank 1 int32 tensor containing the object classes. + masks: (optional) rank 3 float32 tensor with shape + [num_instances, height, width] containing instance masks. The masks + are of the same height, width as the input `image`. + keypoints: (optional) rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2]. The keypoints are in y-x + normalized coordinates. + min_object_covered: the cropped image must cover at least this fraction of + at least one of the input bounding boxes. + aspect_ratio: aspect ratio of the cropped image. + area_range: allowed range for area ratio between cropped image and the + original image. + overlap_thresh: minimum overlap thresh with new cropped + image to keep the box. + random_coef: a random coefficient that defines the chance of getting the + original image. If random_coef is 0, we will always get the + cropped image, and if it is 1.0, we will always get the + original image. + seed: random seed. + + Returns: + image: image which is the same rank as input image. + boxes: boxes which is the same rank as input boxes. + Boxes are in normalized form. + labels: new labels. + + If masks, or keypoints is not None, the function also returns: + + masks: rank 3 float32 tensor with shape [num_instances, height, width] + containing instance masks. 
+ keypoints: rank 3 float32 tensor with shape + [num_instances, num_keypoints, 2] + + """ + aspect_ratio_range = ((aspect_ratio, aspect_ratio),) * len(area_range) + + crop_result = ssd_random_crop(image, boxes, labels, masks, keypoints, + min_object_covered, aspect_ratio_range, + area_range, overlap_thresh, random_coef, seed) + i = 3 + new_image, new_boxes, new_labels = crop_result[:i] + new_masks = None + new_keypoints = None + if masks is not None: + new_masks = crop_result[i] + i += 1 + if keypoints is not None: + new_keypoints = crop_result[i] + result = random_crop_to_aspect_ratio( + new_image, + new_boxes, + new_labels, + new_masks, + new_keypoints, + aspect_ratio=aspect_ratio, + seed=seed) + + return result + + +def get_default_func_arg_map(include_instance_masks=False, + include_keypoints=False): + """Returns the default mapping from a preprocessor function to its args. + + Args: + include_instance_masks: If True, preprocessing functions will modify the + instance masks, too. + include_keypoints: If True, preprocessing functions will modify the + keypoints, too. + + Returns: + A map from preprocessing functions to the arguments they receive. + """ + groundtruth_instance_masks = None + if include_instance_masks: + groundtruth_instance_masks = ( + fields.InputDataFields.groundtruth_instance_masks) + + groundtruth_keypoints = None + if include_keypoints: + groundtruth_keypoints = fields.InputDataFields.groundtruth_keypoints + + prep_func_arg_map = { + normalize_image: (fields.InputDataFields.image,), + random_horizontal_flip: (fields.InputDataFields.image, + fields.InputDataFields.groundtruth_boxes, + groundtruth_instance_masks, + groundtruth_keypoints,), + random_pixel_value_scale: (fields.InputDataFields.image,), + random_image_scale: (fields.InputDataFields.image, + groundtruth_instance_masks,), + random_rgb_to_gray: (fields.InputDataFields.image,), + random_adjust_brightness: (fields.InputDataFields.image,), + random_adjust_contrast: (fields.InputDataFields.image,), + random_adjust_hue: (fields.InputDataFields.image,), + random_adjust_saturation: (fields.InputDataFields.image,), + random_distort_color: (fields.InputDataFields.image,), + random_jitter_boxes: (fields.InputDataFields.groundtruth_boxes,), + random_crop_image: (fields.InputDataFields.image, + fields.InputDataFields.groundtruth_boxes, + fields.InputDataFields.groundtruth_classes, + groundtruth_instance_masks, + groundtruth_keypoints,), + random_pad_image: (fields.InputDataFields.image, + fields.InputDataFields.groundtruth_boxes), + random_crop_pad_image: (fields.InputDataFields.image, + fields.InputDataFields.groundtruth_boxes, + fields.InputDataFields.groundtruth_classes), + random_crop_to_aspect_ratio: (fields.InputDataFields.image, + fields.InputDataFields.groundtruth_boxes, + fields.InputDataFields.groundtruth_classes, + groundtruth_instance_masks, + groundtruth_keypoints,), + random_black_patches: (fields.InputDataFields.image,), + retain_boxes_above_threshold: ( + fields.InputDataFields.groundtruth_boxes, + fields.InputDataFields.groundtruth_classes, + fields.InputDataFields.groundtruth_label_scores, + groundtruth_instance_masks, + groundtruth_keypoints,), + image_to_float: (fields.InputDataFields.image,), + random_resize_method: (fields.InputDataFields.image,), + resize_to_range: (fields.InputDataFields.image, + groundtruth_instance_masks,), + scale_boxes_to_pixel_coordinates: ( + fields.InputDataFields.image, + fields.InputDataFields.groundtruth_boxes, + groundtruth_keypoints,), + flip_boxes: 
(fields.InputDataFields.groundtruth_boxes,), + resize_image: (fields.InputDataFields.image, + groundtruth_instance_masks,), + subtract_channel_mean: (fields.InputDataFields.image,), + one_hot_encoding: (fields.InputDataFields.groundtruth_image_classes,), + rgb_to_gray: (fields.InputDataFields.image,), + ssd_random_crop: (fields.InputDataFields.image, + fields.InputDataFields.groundtruth_boxes, + fields.InputDataFields.groundtruth_classes, + groundtruth_instance_masks, + groundtruth_keypoints,), + ssd_random_crop_pad: (fields.InputDataFields.image, + fields.InputDataFields.groundtruth_boxes, + fields.InputDataFields.groundtruth_classes), + ssd_random_crop_fixed_aspect_ratio: ( + fields.InputDataFields.image, + fields.InputDataFields.groundtruth_boxes, + fields.InputDataFields.groundtruth_classes, + groundtruth_instance_masks, + groundtruth_keypoints,), + } + + return prep_func_arg_map + + +def preprocess(tensor_dict, preprocess_options, func_arg_map=None): + """Preprocess images and bounding boxes. + + Various types of preprocessing (to be implemented) based on the + preprocess_options dictionary e.g. "crop image" (affects image and possibly + boxes), "white balance image" (affects only image), etc. If self._options + is None, no preprocessing is done. + + Args: + tensor_dict: dictionary that contains images, boxes, and can contain other + things as well. + images-> rank 4 float32 tensor contains + 1 image -> [1, height, width, 3]. + with pixel values varying between [0, 1] + boxes-> rank 2 float32 tensor containing + the bounding boxes -> [N, 4]. + Boxes are in normalized form meaning + their coordinates vary between [0, 1]. + Each row is in the form + of [ymin, xmin, ymax, xmax]. + preprocess_options: It is a list of tuples, where each tuple contains a + function and a dictionary that contains arguments and + their values. + func_arg_map: mapping from preprocessing functions to arguments that they + expect to receive and return. + + Returns: + tensor_dict: which contains the preprocessed images, bounding boxes, etc. + + Raises: + ValueError: (a) If the functions passed to Preprocess + are not in func_arg_map. + (b) If the arguments that a function needs + do not exist in tensor_dict. + (c) If image in tensor_dict is not rank 4 + """ + if func_arg_map is None: + func_arg_map = get_default_func_arg_map() + + # changes the images to image (rank 4 to rank 3) since the functions + # receive rank 3 tensor for image + if fields.InputDataFields.image in tensor_dict: + images = tensor_dict[fields.InputDataFields.image] + if len(images.get_shape()) != 4: + raise ValueError('images in tensor_dict should be rank 4') + image = tf.squeeze(images, squeeze_dims=[0]) + tensor_dict[fields.InputDataFields.image] = image + + # Preprocess inputs based on preprocess_options + for option in preprocess_options: + func, params = option + if func not in func_arg_map: + raise ValueError('The function %s does not exist in func_arg_map' % + (func.__name__)) + arg_names = func_arg_map[func] + for a in arg_names: + if a is not None and a not in tensor_dict: + raise ValueError('The function %s requires argument %s' % + (func.__name__, a)) + + def get_arg(key): + return tensor_dict[key] if key is not None else None + args = [get_arg(a) for a in arg_names] + results = func(*args, **params) + if not isinstance(results, (list, tuple)): + results = (results,) + # Removes None args since the return values will not contain those. 
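The argument map above is what lets `preprocess` dispatch tensors to each function. A minimal end-to-end sketch, assuming TensorFlow 1.x; the tensors and chosen options are illustrative and mirror how the unit tests below drive the API:

```
# Illustrative sketch of driving preprocess(): a rank-4 image plus boxes,
# and a list of (function, kwargs) options applied in order.
import tensorflow as tf

from object_detection.core import preprocessor
from object_detection.core import standard_fields as fields

tensor_dict = {
    fields.InputDataFields.image:
        tf.random_uniform([1, 100, 100, 3], dtype=tf.float32),
    fields.InputDataFields.groundtruth_boxes:
        tf.constant([[0.1, 0.1, 0.8, 0.8]], dtype=tf.float32),
}

preprocess_options = [
    (preprocessor.random_horizontal_flip, {}),
    (preprocessor.random_adjust_brightness, {'max_delta': 0.2}),
]

tensor_dict = preprocessor.preprocess(tensor_dict, preprocess_options)

with tf.Session() as sess:
  out = sess.run(tensor_dict[fields.InputDataFields.image])
  print(out.shape)  # (1, 100, 100, 3)
```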
+ arg_names = [arg_name for arg_name in arg_names if arg_name is not None] + for res, arg_name in zip(results, arg_names): + tensor_dict[arg_name] = res + + # changes the image to images (rank 3 to rank 4) to be compatible to what + # we received in the first place + if fields.InputDataFields.image in tensor_dict: + image = tensor_dict[fields.InputDataFields.image] + images = tf.expand_dims(image, 0) + tensor_dict[fields.InputDataFields.image] = images + + return tensor_dict diff --git a/object_detection/core/preprocessor_test.py b/object_detection/core/preprocessor_test.py new file mode 100644 index 0000000000000000000000000000000000000000..eca135d165785b2a164d51f1b980e5a55f52df5c --- /dev/null +++ b/object_detection/core/preprocessor_test.py @@ -0,0 +1,1751 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.core.preprocessor.""" + +import numpy as np +import six + +import tensorflow as tf + +from object_detection.core import preprocessor +from object_detection.core import standard_fields as fields + +if six.PY2: + import mock # pylint: disable=g-import-not-at-top +else: + from unittest import mock # pylint: disable=g-import-not-at-top + + +class PreprocessorTest(tf.test.TestCase): + + def createColorfulTestImage(self): + ch255 = tf.fill([1, 100, 200, 1], tf.constant(255, dtype=tf.uint8)) + ch128 = tf.fill([1, 100, 200, 1], tf.constant(128, dtype=tf.uint8)) + ch0 = tf.fill([1, 100, 200, 1], tf.constant(0, dtype=tf.uint8)) + imr = tf.concat([ch255, ch0, ch0], 3) + img = tf.concat([ch255, ch255, ch0], 3) + imb = tf.concat([ch255, ch0, ch255], 3) + imw = tf.concat([ch128, ch128, ch128], 3) + imu = tf.concat([imr, img], 2) + imd = tf.concat([imb, imw], 2) + im = tf.concat([imu, imd], 1) + return im + + def createTestImages(self): + images_r = tf.constant([[[128, 128, 128, 128], [0, 0, 128, 128], + [0, 128, 128, 128], [192, 192, 128, 128]]], + dtype=tf.uint8) + images_r = tf.expand_dims(images_r, 3) + images_g = tf.constant([[[0, 0, 128, 128], [0, 0, 128, 128], + [0, 128, 192, 192], [192, 192, 128, 192]]], + dtype=tf.uint8) + images_g = tf.expand_dims(images_g, 3) + images_b = tf.constant([[[128, 128, 192, 0], [0, 0, 128, 192], + [0, 128, 128, 0], [192, 192, 192, 128]]], + dtype=tf.uint8) + images_b = tf.expand_dims(images_b, 3) + images = tf.concat([images_r, images_g, images_b], 3) + return images + + def createTestBoxes(self): + boxes = tf.constant( + [[0.0, 0.25, 0.75, 1.0], [0.25, 0.5, 0.75, 1.0]], dtype=tf.float32) + return boxes + + def createTestLabelScores(self): + return tf.constant([1.0, 0.5], dtype=tf.float32) + + def createTestLabelScoresWithMissingScore(self): + return tf.constant([0.5, np.nan], dtype=tf.float32) + + def createTestMasks(self): + mask = np.array([ + [[255.0, 0.0, 0.0], + [255.0, 0.0, 0.0], + [255.0, 0.0, 0.0]], + [[255.0, 255.0, 0.0], + [255.0, 255.0, 0.0], + [255.0, 255.0, 0.0]]]) + return 
tf.constant(mask, dtype=tf.float32) + + def createTestKeypoints(self): + keypoints = np.array([ + [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]], + [[0.4, 0.4], [0.5, 0.5], [0.6, 0.6]], + ]) + return tf.constant(keypoints, dtype=tf.float32) + + def createTestKeypointsInsideCrop(self): + keypoints = np.array([ + [[0.4, 0.4], [0.5, 0.5], [0.6, 0.6]], + [[0.4, 0.4], [0.5, 0.5], [0.6, 0.6]], + ]) + return tf.constant(keypoints, dtype=tf.float32) + + def createTestKeypointsOutsideCrop(self): + keypoints = np.array([ + [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]], + [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]], + ]) + return tf.constant(keypoints, dtype=tf.float32) + + def createKeypointFlipPermutation(self): + return np.array([0, 2, 1], dtype=np.int32) + + def createTestLabels(self): + labels = tf.constant([1, 2], dtype=tf.int32) + return labels + + def createTestBoxesOutOfImage(self): + boxes = tf.constant( + [[-0.1, 0.25, 0.75, 1], [0.25, 0.5, 0.75, 1.1]], dtype=tf.float32) + return boxes + + def expectedImagesAfterNormalization(self): + images_r = tf.constant([[[0, 0, 0, 0], [-1, -1, 0, 0], + [-1, 0, 0, 0], [0.5, 0.5, 0, 0]]], + dtype=tf.float32) + images_r = tf.expand_dims(images_r, 3) + images_g = tf.constant([[[-1, -1, 0, 0], [-1, -1, 0, 0], + [-1, 0, 0.5, 0.5], [0.5, 0.5, 0, 0.5]]], + dtype=tf.float32) + images_g = tf.expand_dims(images_g, 3) + images_b = tf.constant([[[0, 0, 0.5, -1], [-1, -1, 0, 0.5], + [-1, 0, 0, -1], [0.5, 0.5, 0.5, 0]]], + dtype=tf.float32) + images_b = tf.expand_dims(images_b, 3) + images = tf.concat([images_r, images_g, images_b], 3) + return images + + def expectedMaxImageAfterColorScale(self): + images_r = tf.constant([[[0.1, 0.1, 0.1, 0.1], [-0.9, -0.9, 0.1, 0.1], + [-0.9, 0.1, 0.1, 0.1], [0.6, 0.6, 0.1, 0.1]]], + dtype=tf.float32) + images_r = tf.expand_dims(images_r, 3) + images_g = tf.constant([[[-0.9, -0.9, 0.1, 0.1], [-0.9, -0.9, 0.1, 0.1], + [-0.9, 0.1, 0.6, 0.6], [0.6, 0.6, 0.1, 0.6]]], + dtype=tf.float32) + images_g = tf.expand_dims(images_g, 3) + images_b = tf.constant([[[0.1, 0.1, 0.6, -0.9], [-0.9, -0.9, 0.1, 0.6], + [-0.9, 0.1, 0.1, -0.9], [0.6, 0.6, 0.6, 0.1]]], + dtype=tf.float32) + images_b = tf.expand_dims(images_b, 3) + images = tf.concat([images_r, images_g, images_b], 3) + return images + + def expectedMinImageAfterColorScale(self): + images_r = tf.constant([[[-0.1, -0.1, -0.1, -0.1], [-1, -1, -0.1, -0.1], + [-1, -0.1, -0.1, -0.1], [0.4, 0.4, -0.1, -0.1]]], + dtype=tf.float32) + images_r = tf.expand_dims(images_r, 3) + images_g = tf.constant([[[-1, -1, -0.1, -0.1], [-1, -1, -0.1, -0.1], + [-1, -0.1, 0.4, 0.4], [0.4, 0.4, -0.1, 0.4]]], + dtype=tf.float32) + images_g = tf.expand_dims(images_g, 3) + images_b = tf.constant([[[-0.1, -0.1, 0.4, -1], [-1, -1, -0.1, 0.4], + [-1, -0.1, -0.1, -1], [0.4, 0.4, 0.4, -0.1]]], + dtype=tf.float32) + images_b = tf.expand_dims(images_b, 3) + images = tf.concat([images_r, images_g, images_b], 3) + return images + + def expectedImagesAfterMirroring(self): + images_r = tf.constant([[[0, 0, 0, 0], [0, 0, -1, -1], + [0, 0, 0, -1], [0, 0, 0.5, 0.5]]], + dtype=tf.float32) + images_r = tf.expand_dims(images_r, 3) + images_g = tf.constant([[[0, 0, -1, -1], [0, 0, -1, -1], + [0.5, 0.5, 0, -1], [0.5, 0, 0.5, 0.5]]], + dtype=tf.float32) + images_g = tf.expand_dims(images_g, 3) + images_b = tf.constant([[[-1, 0.5, 0, 0], [0.5, 0, -1, -1], + [-1, 0, 0, -1], [0, 0.5, 0.5, 0.5]]], + dtype=tf.float32) + images_b = tf.expand_dims(images_b, 3) + images = tf.concat([images_r, images_g, images_b], 3) + return images + + def 
expectedBoxesAfterMirroring(self): + boxes = tf.constant([[0.0, 0.0, 0.75, 0.75], [0.25, 0.0, 0.75, 0.5]], + dtype=tf.float32) + return boxes + + def expectedBoxesAfterXY(self): + boxes = tf.constant([[0.25, 0.0, 1.0, 0.75], [0.5, 0.25, 1, 0.75]], + dtype=tf.float32) + return boxes + + def expectedMasksAfterMirroring(self): + mask = np.array([ + [[0.0, 0.0, 255.0], + [0.0, 0.0, 255.0], + [0.0, 0.0, 255.0]], + [[0.0, 255.0, 255.0], + [0.0, 255.0, 255.0], + [0.0, 255.0, 255.0]]]) + return tf.constant(mask, dtype=tf.float32) + + def expectedLabelScoresAfterThresholding(self): + return tf.constant([1.0], dtype=tf.float32) + + def expectedBoxesAfterThresholding(self): + return tf.constant([[0.0, 0.25, 0.75, 1.0]], dtype=tf.float32) + + def expectedLabelsAfterThresholding(self): + return tf.constant([1], dtype=tf.float32) + + def expectedMasksAfterThresholding(self): + mask = np.array([ + [[255.0, 0.0, 0.0], + [255.0, 0.0, 0.0], + [255.0, 0.0, 0.0]]]) + return tf.constant(mask, dtype=tf.float32) + + def expectedKeypointsAfterThresholding(self): + keypoints = np.array([ + [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]] + ]) + return tf.constant(keypoints, dtype=tf.float32) + + def expectedLabelScoresAfterThresholdingWithMissingScore(self): + return tf.constant([np.nan], dtype=tf.float32) + + def expectedBoxesAfterThresholdingWithMissingScore(self): + return tf.constant([[0.25, 0.5, 0.75, 1]], dtype=tf.float32) + + def expectedLabelsAfterThresholdingWithMissingScore(self): + return tf.constant([2], dtype=tf.float32) + + def testNormalizeImage(self): + preprocess_options = [(preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 256, + 'target_minval': -1, + 'target_maxval': 1 + })] + images = self.createTestImages() + tensor_dict = {fields.InputDataFields.image: images} + tensor_dict = preprocessor.preprocess(tensor_dict, preprocess_options) + images = tensor_dict[fields.InputDataFields.image] + images_expected = self.expectedImagesAfterNormalization() + + with self.test_session() as sess: + (images_, images_expected_) = sess.run( + [images, images_expected]) + images_shape_ = images_.shape + images_expected_shape_ = images_expected_.shape + expected_shape = [1, 4, 4, 3] + self.assertAllEqual(images_expected_shape_, images_shape_) + self.assertAllEqual(images_shape_, expected_shape) + self.assertAllClose(images_, images_expected_) + + def testRetainBoxesAboveThreshold(self): + boxes = self.createTestBoxes() + labels = self.createTestLabels() + label_scores = self.createTestLabelScores() + (retained_boxes, retained_labels, + retained_label_scores) = preprocessor.retain_boxes_above_threshold( + boxes, labels, label_scores, threshold=0.6) + with self.test_session() as sess: + (retained_boxes_, retained_labels_, retained_label_scores_, + expected_retained_boxes_, expected_retained_labels_, + expected_retained_label_scores_) = sess.run([ + retained_boxes, retained_labels, retained_label_scores, + self.expectedBoxesAfterThresholding(), + self.expectedLabelsAfterThresholding(), + self.expectedLabelScoresAfterThresholding()]) + self.assertAllClose( + retained_boxes_, expected_retained_boxes_) + self.assertAllClose( + retained_labels_, expected_retained_labels_) + self.assertAllClose( + retained_label_scores_, expected_retained_label_scores_) + + def testRetainBoxesAboveThresholdWithMasks(self): + boxes = self.createTestBoxes() + labels = self.createTestLabels() + label_scores = self.createTestLabelScores() + masks = self.createTestMasks() + _, _, _, retained_masks = 
preprocessor.retain_boxes_above_threshold( + boxes, labels, label_scores, masks, threshold=0.6) + with self.test_session() as sess: + retained_masks_, expected_retained_masks_ = sess.run([ + retained_masks, self.expectedMasksAfterThresholding()]) + + self.assertAllClose( + retained_masks_, expected_retained_masks_) + + def testRetainBoxesAboveThresholdWithKeypoints(self): + boxes = self.createTestBoxes() + labels = self.createTestLabels() + label_scores = self.createTestLabelScores() + keypoints = self.createTestKeypoints() + (_, _, _, retained_keypoints) = preprocessor.retain_boxes_above_threshold( + boxes, labels, label_scores, keypoints=keypoints, threshold=0.6) + with self.test_session() as sess: + (retained_keypoints_, + expected_retained_keypoints_) = sess.run([ + retained_keypoints, + self.expectedKeypointsAfterThresholding()]) + + self.assertAllClose( + retained_keypoints_, expected_retained_keypoints_) + + def testRetainBoxesAboveThresholdWithMissingScore(self): + boxes = self.createTestBoxes() + labels = self.createTestLabels() + label_scores = self.createTestLabelScoresWithMissingScore() + (retained_boxes, retained_labels, + retained_label_scores) = preprocessor.retain_boxes_above_threshold( + boxes, labels, label_scores, threshold=0.6) + with self.test_session() as sess: + (retained_boxes_, retained_labels_, retained_label_scores_, + expected_retained_boxes_, expected_retained_labels_, + expected_retained_label_scores_) = sess.run([ + retained_boxes, retained_labels, retained_label_scores, + self.expectedBoxesAfterThresholdingWithMissingScore(), + self.expectedLabelsAfterThresholdingWithMissingScore(), + self.expectedLabelScoresAfterThresholdingWithMissingScore()]) + self.assertAllClose( + retained_boxes_, expected_retained_boxes_) + self.assertAllClose( + retained_labels_, expected_retained_labels_) + self.assertAllClose( + retained_label_scores_, expected_retained_label_scores_) + + def testRandomFlipBoxes(self): + boxes = self.createTestBoxes() + + # Case where the boxes are flipped. + boxes_expected1 = self.expectedBoxesAfterMirroring() + + # Case where the boxes are not flipped. + boxes_expected2 = boxes + + # After elementwise multiplication, the result should be all-zero since one + # of them is all-zero. 
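+ # Multiplying the two squared differences therefore checks that each box coordinate matches at least one of the two expected outcomes (flipped or unflipped).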
+ boxes_diff = tf.multiply( + tf.squared_difference(boxes, boxes_expected1), + tf.squared_difference(boxes, boxes_expected2)) + expected_result = tf.zeros_like(boxes_diff) + + with self.test_session() as sess: + (boxes_diff, expected_result) = sess.run([boxes_diff, expected_result]) + self.assertAllEqual(boxes_diff, expected_result) + + def testFlipMasks(self): + test_mask = self.createTestMasks() + flipped_mask = preprocessor._flip_masks(test_mask) + expected_mask = self.expectedMasksAfterMirroring() + with self.test_session() as sess: + flipped_mask, expected_mask = sess.run([flipped_mask, expected_mask]) + self.assertAllEqual(flipped_mask.flatten(), expected_mask.flatten()) + + def testRandomHorizontalFlip(self): + preprocess_options = [(preprocessor.random_horizontal_flip, {})] + images = self.expectedImagesAfterNormalization() + boxes = self.createTestBoxes() + tensor_dict = {fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes} + images_expected1 = self.expectedImagesAfterMirroring() + boxes_expected1 = self.expectedBoxesAfterMirroring() + images_expected2 = images + boxes_expected2 = boxes + tensor_dict = preprocessor.preprocess(tensor_dict, preprocess_options) + images = tensor_dict[fields.InputDataFields.image] + boxes = tensor_dict[fields.InputDataFields.groundtruth_boxes] + + boxes_diff1 = tf.squared_difference(boxes, boxes_expected1) + boxes_diff2 = tf.squared_difference(boxes, boxes_expected2) + boxes_diff = tf.multiply(boxes_diff1, boxes_diff2) + boxes_diff_expected = tf.zeros_like(boxes_diff) + + images_diff1 = tf.squared_difference(images, images_expected1) + images_diff2 = tf.squared_difference(images, images_expected2) + images_diff = tf.multiply(images_diff1, images_diff2) + images_diff_expected = tf.zeros_like(images_diff) + + with self.test_session() as sess: + (images_diff_, images_diff_expected_, boxes_diff_, + boxes_diff_expected_) = sess.run([images_diff, images_diff_expected, + boxes_diff, boxes_diff_expected]) + self.assertAllClose(boxes_diff_, boxes_diff_expected_) + self.assertAllClose(images_diff_, images_diff_expected_) + + def testRunRandomHorizontalFlipWithMaskAndKeypoints(self): + preprocess_options = [(preprocessor.random_horizontal_flip, {})] + image_height = 3 + image_width = 3 + images = tf.random_uniform([1, image_height, image_width, 3]) + boxes = self.createTestBoxes() + masks = self.createTestMasks() + keypoints = self.createTestKeypoints() + keypoint_flip_permutation = self.createKeypointFlipPermutation() + tensor_dict = { + fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_instance_masks: masks, + fields.InputDataFields.groundtruth_keypoints: keypoints + } + preprocess_options = [ + (preprocessor.random_horizontal_flip, + {'keypoint_flip_permutation': keypoint_flip_permutation})] + preprocessor_arg_map = preprocessor.get_default_func_arg_map( + include_instance_masks=True, include_keypoints=True) + tensor_dict = preprocessor.preprocess( + tensor_dict, preprocess_options, func_arg_map=preprocessor_arg_map) + boxes = tensor_dict[fields.InputDataFields.groundtruth_boxes] + masks = tensor_dict[fields.InputDataFields.groundtruth_instance_masks] + keypoints = tensor_dict[fields.InputDataFields.groundtruth_keypoints] + with self.test_session() as sess: + boxes, masks, keypoints = sess.run([boxes, masks, keypoints]) + self.assertTrue(boxes is not None) + self.assertTrue(masks is not None) + self.assertTrue(keypoints is not None) + + def 
testRandomPixelValueScale(self): + preprocessing_options = [] + preprocessing_options.append((preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + })) + preprocessing_options.append((preprocessor.random_pixel_value_scale, {})) + images = self.createTestImages() + tensor_dict = {fields.InputDataFields.image: images} + tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options) + images_min = tf.to_float(images) * 0.9 / 255.0 + images_max = tf.to_float(images) * 1.1 / 255.0 + images = tensor_dict[fields.InputDataFields.image] + values_greater = tf.greater_equal(images, images_min) + values_less = tf.less_equal(images, images_max) + values_true = tf.fill([1, 4, 4, 3], True) + with self.test_session() as sess: + (values_greater_, values_less_, values_true_) = sess.run( + [values_greater, values_less, values_true]) + self.assertAllClose(values_greater_, values_true_) + self.assertAllClose(values_less_, values_true_) + + def testRandomImageScale(self): + preprocess_options = [(preprocessor.random_image_scale, {})] + images_original = self.createTestImages() + tensor_dict = {fields.InputDataFields.image: images_original} + tensor_dict = preprocessor.preprocess(tensor_dict, preprocess_options) + images_scaled = tensor_dict[fields.InputDataFields.image] + images_original_shape = tf.shape(images_original) + images_scaled_shape = tf.shape(images_scaled) + with self.test_session() as sess: + (images_original_shape_, images_scaled_shape_) = sess.run( + [images_original_shape, images_scaled_shape]) + self.assertTrue( + images_original_shape_[1] * 0.5 <= images_scaled_shape_[1]) + self.assertTrue( + images_original_shape_[1] * 2.0 >= images_scaled_shape_[1]) + self.assertTrue( + images_original_shape_[2] * 0.5 <= images_scaled_shape_[2]) + self.assertTrue( + images_original_shape_[2] * 2.0 >= images_scaled_shape_[2]) + + def testRandomRGBtoGray(self): + preprocess_options = [(preprocessor.random_rgb_to_gray, {})] + images_original = self.createTestImages() + tensor_dict = {fields.InputDataFields.image: images_original} + tensor_dict = preprocessor.preprocess(tensor_dict, preprocess_options) + images_gray = tensor_dict[fields.InputDataFields.image] + images_gray_r, images_gray_g, images_gray_b = tf.split( + value=images_gray, num_or_size_splits=3, axis=3) + images_r, images_g, images_b = tf.split( + value=images_original, num_or_size_splits=3, axis=3) + images_r_diff1 = tf.squared_difference(tf.to_float(images_r), + tf.to_float(images_gray_r)) + images_r_diff2 = tf.squared_difference(tf.to_float(images_gray_r), + tf.to_float(images_gray_g)) + images_r_diff = tf.multiply(images_r_diff1, images_r_diff2) + images_g_diff1 = tf.squared_difference(tf.to_float(images_g), + tf.to_float(images_gray_g)) + images_g_diff2 = tf.squared_difference(tf.to_float(images_gray_g), + tf.to_float(images_gray_b)) + images_g_diff = tf.multiply(images_g_diff1, images_g_diff2) + images_b_diff1 = tf.squared_difference(tf.to_float(images_b), + tf.to_float(images_gray_b)) + images_b_diff2 = tf.squared_difference(tf.to_float(images_gray_b), + tf.to_float(images_gray_r)) + images_b_diff = tf.multiply(images_b_diff1, images_b_diff2) + image_zero1 = tf.constant(0, dtype=tf.float32, shape=[1, 4, 4, 1]) + with self.test_session() as sess: + (images_r_diff_, images_g_diff_, images_b_diff_, image_zero1_) = sess.run( + [images_r_diff, images_g_diff, images_b_diff, image_zero1]) + self.assertAllClose(images_r_diff_, image_zero1_) + 
self.assertAllClose(images_g_diff_, image_zero1_) + self.assertAllClose(images_b_diff_, image_zero1_) + + def testRandomAdjustBrightness(self): + preprocessing_options = [] + preprocessing_options.append((preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + })) + preprocessing_options.append((preprocessor.random_adjust_brightness, {})) + images_original = self.createTestImages() + tensor_dict = {fields.InputDataFields.image: images_original} + tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options) + images_bright = tensor_dict[fields.InputDataFields.image] + image_original_shape = tf.shape(images_original) + image_bright_shape = tf.shape(images_bright) + with self.test_session() as sess: + (image_original_shape_, image_bright_shape_) = sess.run( + [image_original_shape, image_bright_shape]) + self.assertAllEqual(image_original_shape_, image_bright_shape_) + + def testRandomAdjustContrast(self): + preprocessing_options = [] + preprocessing_options.append((preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + })) + preprocessing_options.append((preprocessor.random_adjust_contrast, {})) + images_original = self.createTestImages() + tensor_dict = {fields.InputDataFields.image: images_original} + tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options) + images_contrast = tensor_dict[fields.InputDataFields.image] + image_original_shape = tf.shape(images_original) + image_contrast_shape = tf.shape(images_contrast) + with self.test_session() as sess: + (image_original_shape_, image_contrast_shape_) = sess.run( + [image_original_shape, image_contrast_shape]) + self.assertAllEqual(image_original_shape_, image_contrast_shape_) + + def testRandomAdjustHue(self): + preprocessing_options = [] + preprocessing_options.append((preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + })) + preprocessing_options.append((preprocessor.random_adjust_hue, {})) + images_original = self.createTestImages() + tensor_dict = {fields.InputDataFields.image: images_original} + tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options) + images_hue = tensor_dict[fields.InputDataFields.image] + image_original_shape = tf.shape(images_original) + image_hue_shape = tf.shape(images_hue) + with self.test_session() as sess: + (image_original_shape_, image_hue_shape_) = sess.run( + [image_original_shape, image_hue_shape]) + self.assertAllEqual(image_original_shape_, image_hue_shape_) + + def testRandomDistortColor(self): + preprocessing_options = [] + preprocessing_options.append((preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + })) + preprocessing_options.append((preprocessor.random_distort_color, {})) + images_original = self.createTestImages() + images_original_shape = tf.shape(images_original) + tensor_dict = {fields.InputDataFields.image: images_original} + tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options) + images_distorted_color = tensor_dict[fields.InputDataFields.image] + images_distorted_color_shape = tf.shape(images_distorted_color) + with self.test_session() as sess: + (images_original_shape_, images_distorted_color_shape_) = sess.run( + [images_original_shape, images_distorted_color_shape]) + self.assertAllEqual(images_original_shape_, 
images_distorted_color_shape_) + + def testRandomJitterBoxes(self): + preprocessing_options = [] + preprocessing_options.append((preprocessor.random_jitter_boxes, {})) + boxes = self.createTestBoxes() + boxes_shape = tf.shape(boxes) + tensor_dict = {fields.InputDataFields.groundtruth_boxes: boxes} + tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options) + distorted_boxes = tensor_dict[fields.InputDataFields.groundtruth_boxes] + distorted_boxes_shape = tf.shape(distorted_boxes) + + with self.test_session() as sess: + (boxes_shape_, distorted_boxes_shape_) = sess.run( + [boxes_shape, distorted_boxes_shape]) + self.assertAllEqual(boxes_shape_, distorted_boxes_shape_) + + def testRandomCropImage(self): + preprocessing_options = [] + preprocessing_options.append((preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + })) + preprocessing_options.append((preprocessor.random_crop_image, {})) + images = self.createTestImages() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + tensor_dict = {fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels} + distorted_tensor_dict = preprocessor.preprocess(tensor_dict, + preprocessing_options) + distorted_images = distorted_tensor_dict[fields.InputDataFields.image] + distorted_boxes = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + boxes_rank = tf.rank(boxes) + distorted_boxes_rank = tf.rank(distorted_boxes) + images_rank = tf.rank(images) + distorted_images_rank = tf.rank(distorted_images) + self.assertEqual(3, distorted_images.get_shape()[3]) + + with self.test_session() as sess: + (boxes_rank_, distorted_boxes_rank_, images_rank_, + distorted_images_rank_) = sess.run([ + boxes_rank, distorted_boxes_rank, images_rank, distorted_images_rank + ]) + self.assertAllEqual(boxes_rank_, distorted_boxes_rank_) + self.assertAllEqual(images_rank_, distorted_images_rank_) + + def testRandomCropImageGrayscale(self): + preprocessing_options = [(preprocessor.rgb_to_gray, {}), + (preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1, + }), + (preprocessor.random_crop_image, {})] + images = self.createTestImages() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + tensor_dict = { + fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels + } + distorted_tensor_dict = preprocessor.preprocess( + tensor_dict, preprocessing_options) + distorted_images = distorted_tensor_dict[fields.InputDataFields.image] + distorted_boxes = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + boxes_rank = tf.rank(boxes) + distorted_boxes_rank = tf.rank(distorted_boxes) + images_rank = tf.rank(images) + distorted_images_rank = tf.rank(distorted_images) + self.assertEqual(1, distorted_images.get_shape()[3]) + + with self.test_session() as sess: + session_results = sess.run([ + boxes_rank, distorted_boxes_rank, images_rank, distorted_images_rank + ]) + (boxes_rank_, distorted_boxes_rank_, images_rank_, + distorted_images_rank_) = session_results + self.assertAllEqual(boxes_rank_, distorted_boxes_rank_) + self.assertAllEqual(images_rank_, distorted_images_rank_) + + def testRandomCropImageWithBoxOutOfImage(self): + preprocessing_options = [] + 
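# Uses boxes that extend beyond the image; only the tensor ranks are checked below because the crop itself is random. + 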
preprocessing_options.append((preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + })) + preprocessing_options.append((preprocessor.random_crop_image, {})) + images = self.createTestImages() + boxes = self.createTestBoxesOutOfImage() + labels = self.createTestLabels() + tensor_dict = {fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels} + distorted_tensor_dict = preprocessor.preprocess(tensor_dict, + preprocessing_options) + distorted_images = distorted_tensor_dict[fields.InputDataFields.image] + distorted_boxes = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + boxes_rank = tf.rank(boxes) + distorted_boxes_rank = tf.rank(distorted_boxes) + images_rank = tf.rank(images) + distorted_images_rank = tf.rank(distorted_images) + + with self.test_session() as sess: + (boxes_rank_, distorted_boxes_rank_, images_rank_, + distorted_images_rank_) = sess.run( + [boxes_rank, distorted_boxes_rank, images_rank, + distorted_images_rank]) + self.assertAllEqual(boxes_rank_, distorted_boxes_rank_) + self.assertAllEqual(images_rank_, distorted_images_rank_) + + def testRandomCropImageWithRandomCoefOne(self): + preprocessing_options = [(preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + })] + + images = self.createTestImages() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + tensor_dict = {fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels} + tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options) + images = tensor_dict[fields.InputDataFields.image] + + preprocessing_options = [(preprocessor.random_crop_image, { + 'random_coef': 1.0 + })] + distorted_tensor_dict = preprocessor.preprocess(tensor_dict, + preprocessing_options) + + distorted_images = distorted_tensor_dict[fields.InputDataFields.image] + distorted_boxes = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + distorted_labels = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_classes] + boxes_shape = tf.shape(boxes) + distorted_boxes_shape = tf.shape(distorted_boxes) + images_shape = tf.shape(images) + distorted_images_shape = tf.shape(distorted_images) + + with self.test_session() as sess: + (boxes_shape_, distorted_boxes_shape_, images_shape_, + distorted_images_shape_, images_, distorted_images_, + boxes_, distorted_boxes_, labels_, distorted_labels_) = sess.run( + [boxes_shape, distorted_boxes_shape, images_shape, + distorted_images_shape, images, distorted_images, + boxes, distorted_boxes, labels, distorted_labels]) + self.assertAllEqual(boxes_shape_, distorted_boxes_shape_) + self.assertAllEqual(images_shape_, distorted_images_shape_) + self.assertAllClose(images_, distorted_images_) + self.assertAllClose(boxes_, distorted_boxes_) + self.assertAllEqual(labels_, distorted_labels_) + + def testRandomCropWithMockSampleDistortedBoundingBox(self): + preprocessing_options = [(preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + })] + + images = self.createColorfulTestImage() + boxes = tf.constant([[0.1, 0.1, 0.8, 0.3], + [0.2, 0.4, 0.75, 0.75], + [0.3, 0.1, 0.4, 0.7]], dtype=tf.float32) + labels = tf.constant([1, 7, 11], dtype=tf.int32) + tensor_dict = 
{fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels} + tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options) + images = tensor_dict[fields.InputDataFields.image] + + preprocessing_options = [(preprocessor.random_crop_image, {})] + with mock.patch.object( + tf.image, + 'sample_distorted_bounding_box') as mock_sample_distorted_bounding_box: + mock_sample_distorted_bounding_box.return_value = (tf.constant( + [6, 143, 0], dtype=tf.int32), tf.constant( + [190, 237, -1], dtype=tf.int32), tf.constant( + [[[0.03, 0.3575, 0.98, 0.95]]], dtype=tf.float32)) + + distorted_tensor_dict = preprocessor.preprocess(tensor_dict, + preprocessing_options) + + distorted_boxes = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + distorted_labels = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_classes] + expected_boxes = tf.constant([[0.178947, 0.07173, 0.75789469, 0.66244733], + [0.28421, 0.0, 0.38947365, 0.57805908]], + dtype=tf.float32) + expected_labels = tf.constant([7, 11], dtype=tf.int32) + + with self.test_session() as sess: + (distorted_boxes_, distorted_labels_, + expected_boxes_, expected_labels_) = sess.run( + [distorted_boxes, distorted_labels, + expected_boxes, expected_labels]) + self.assertAllClose(distorted_boxes_, expected_boxes_) + self.assertAllEqual(distorted_labels_, expected_labels_) + + def testStrictRandomCropImageWithMasks(self): + image = self.createColorfulTestImage()[0] + boxes = self.createTestBoxes() + labels = self.createTestLabels() + masks = tf.random_uniform([2, 200, 400], dtype=tf.float32) + with mock.patch.object( + tf.image, + 'sample_distorted_bounding_box' + ) as mock_sample_distorted_bounding_box: + mock_sample_distorted_bounding_box.return_value = ( + tf.constant([6, 143, 0], dtype=tf.int32), + tf.constant([190, 237, -1], dtype=tf.int32), + tf.constant([[[0.03, 0.3575, 0.98, 0.95]]], dtype=tf.float32)) + (new_image, new_boxes, new_labels, + new_masks) = preprocessor._strict_random_crop_image( + image, boxes, labels, masks=masks) + with self.test_session() as sess: + new_image, new_boxes, new_labels, new_masks = sess.run([ + new_image, new_boxes, new_labels, new_masks]) + + expected_boxes = np.array([ + [0.0, 0.0, 0.75789469, 1.0], + [0.23157893, 0.24050637, 0.75789469, 1.0], + ], dtype=np.float32) + self.assertAllEqual(new_image.shape, [190, 237, 3]) + self.assertAllEqual(new_masks.shape, [2, 190, 237]) + self.assertAllClose( + new_boxes.flatten(), expected_boxes.flatten()) + + def testStrictRandomCropImageWithKeypoints(self): + image = self.createColorfulTestImage()[0] + boxes = self.createTestBoxes() + labels = self.createTestLabels() + keypoints = self.createTestKeypoints() + with mock.patch.object( + tf.image, + 'sample_distorted_bounding_box' + ) as mock_sample_distorted_bounding_box: + mock_sample_distorted_bounding_box.return_value = ( + tf.constant([6, 143, 0], dtype=tf.int32), + tf.constant([190, 237, -1], dtype=tf.int32), + tf.constant([[[0.03, 0.3575, 0.98, 0.95]]], dtype=tf.float32)) + (new_image, new_boxes, new_labels, + new_keypoints) = preprocessor._strict_random_crop_image( + image, boxes, labels, keypoints=keypoints) + with self.test_session() as sess: + new_image, new_boxes, new_labels, new_keypoints = sess.run([ + new_image, new_boxes, new_labels, new_keypoints]) + + expected_boxes = np.array([ + [0.0, 0.0, 0.75789469, 1.0], + [0.23157893, 0.24050637, 0.75789469, 1.0], + ], dtype=np.float32) + expected_keypoints 
= np.array([ + [[np.nan, np.nan], + [np.nan, np.nan], + [np.nan, np.nan]], + [[0.38947368, 0.07173], + [0.49473682, 0.24050637], + [0.60000002, 0.40928277]] + ], dtype=np.float32) + self.assertAllEqual(new_image.shape, [190, 237, 3]) + self.assertAllClose( + new_boxes.flatten(), expected_boxes.flatten()) + self.assertAllClose( + new_keypoints.flatten(), expected_keypoints.flatten()) + + def testRunRandomCropImageWithMasks(self): + image = self.createColorfulTestImage() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + masks = tf.random_uniform([2, 200, 400], dtype=tf.float32) + + tensor_dict = { + fields.InputDataFields.image: image, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels, + fields.InputDataFields.groundtruth_instance_masks: masks, + } + + preprocessor_arg_map = preprocessor.get_default_func_arg_map( + include_instance_masks=True) + + preprocessing_options = [(preprocessor.random_crop_image, {})] + + with mock.patch.object( + tf.image, + 'sample_distorted_bounding_box' + ) as mock_sample_distorted_bounding_box: + mock_sample_distorted_bounding_box.return_value = ( + tf.constant([6, 143, 0], dtype=tf.int32), + tf.constant([190, 237, -1], dtype=tf.int32), + tf.constant([[[0.03, 0.3575, 0.98, 0.95]]], dtype=tf.float32)) + distorted_tensor_dict = preprocessor.preprocess( + tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map) + distorted_image = distorted_tensor_dict[fields.InputDataFields.image] + distorted_boxes = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + distorted_labels = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_classes] + distorted_masks = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_instance_masks] + with self.test_session() as sess: + (distorted_image_, distorted_boxes_, distorted_labels_, + distorted_masks_) = sess.run( + [distorted_image, distorted_boxes, distorted_labels, + distorted_masks]) + + expected_boxes = np.array([ + [0.0, 0.0, 0.75789469, 1.0], + [0.23157893, 0.24050637, 0.75789469, 1.0], + ], dtype=np.float32) + self.assertAllEqual(distorted_image_.shape, [1, 190, 237, 3]) + self.assertAllEqual(distorted_masks_.shape, [2, 190, 237]) + self.assertAllEqual(distorted_labels_, [1, 2]) + self.assertAllClose( + distorted_boxes_.flatten(), expected_boxes.flatten()) + + def testRunRandomCropImageWithKeypointsInsideCrop(self): + image = self.createColorfulTestImage() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + keypoints = self.createTestKeypointsInsideCrop() + + tensor_dict = { + fields.InputDataFields.image: image, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels, + fields.InputDataFields.groundtruth_keypoints: keypoints + } + + preprocessor_arg_map = preprocessor.get_default_func_arg_map( + include_keypoints=True) + + preprocessing_options = [(preprocessor.random_crop_image, {})] + + with mock.patch.object( + tf.image, + 'sample_distorted_bounding_box' + ) as mock_sample_distorted_bounding_box: + mock_sample_distorted_bounding_box.return_value = ( + tf.constant([6, 143, 0], dtype=tf.int32), + tf.constant([190, 237, -1], dtype=tf.int32), + tf.constant([[[0.03, 0.3575, 0.98, 0.95]]], dtype=tf.float32)) + distorted_tensor_dict = preprocessor.preprocess( + tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map) + distorted_image = distorted_tensor_dict[fields.InputDataFields.image] + distorted_boxes = distorted_tensor_dict[ + 
fields.InputDataFields.groundtruth_boxes] + distorted_labels = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_classes] + distorted_keypoints = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_keypoints] + with self.test_session() as sess: + (distorted_image_, distorted_boxes_, distorted_labels_, + distorted_keypoints_) = sess.run( + [distorted_image, distorted_boxes, distorted_labels, + distorted_keypoints]) + + expected_boxes = np.array([ + [0.0, 0.0, 0.75789469, 1.0], + [0.23157893, 0.24050637, 0.75789469, 1.0], + ], dtype=np.float32) + expected_keypoints = np.array([ + [[0.38947368, 0.07173], + [0.49473682, 0.24050637], + [0.60000002, 0.40928277]], + [[0.38947368, 0.07173], + [0.49473682, 0.24050637], + [0.60000002, 0.40928277]] + ]) + self.assertAllEqual(distorted_image_.shape, [1, 190, 237, 3]) + self.assertAllEqual(distorted_labels_, [1, 2]) + self.assertAllClose( + distorted_boxes_.flatten(), expected_boxes.flatten()) + self.assertAllClose( + distorted_keypoints_.flatten(), expected_keypoints.flatten()) + + def testRunRandomCropImageWithKeypointsOutsideCrop(self): + image = self.createColorfulTestImage() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + keypoints = self.createTestKeypointsOutsideCrop() + + tensor_dict = { + fields.InputDataFields.image: image, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels, + fields.InputDataFields.groundtruth_keypoints: keypoints + } + + preprocessor_arg_map = preprocessor.get_default_func_arg_map( + include_keypoints=True) + + preprocessing_options = [(preprocessor.random_crop_image, {})] + + with mock.patch.object( + tf.image, + 'sample_distorted_bounding_box' + ) as mock_sample_distorted_bounding_box: + mock_sample_distorted_bounding_box.return_value = ( + tf.constant([6, 143, 0], dtype=tf.int32), + tf.constant([190, 237, -1], dtype=tf.int32), + tf.constant([[[0.03, 0.3575, 0.98, 0.95]]], dtype=tf.float32)) + distorted_tensor_dict = preprocessor.preprocess( + tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map) + distorted_image = distorted_tensor_dict[fields.InputDataFields.image] + distorted_boxes = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + distorted_labels = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_classes] + distorted_keypoints = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_keypoints] + with self.test_session() as sess: + (distorted_image_, distorted_boxes_, distorted_labels_, + distorted_keypoints_) = sess.run( + [distorted_image, distorted_boxes, distorted_labels, + distorted_keypoints]) + + expected_boxes = np.array([ + [0.0, 0.0, 0.75789469, 1.0], + [0.23157893, 0.24050637, 0.75789469, 1.0], + ], dtype=np.float32) + expected_keypoints = np.array([ + [[np.nan, np.nan], + [np.nan, np.nan], + [np.nan, np.nan]], + [[np.nan, np.nan], + [np.nan, np.nan], + [np.nan, np.nan]], + ]) + self.assertAllEqual(distorted_image_.shape, [1, 190, 237, 3]) + self.assertAllEqual(distorted_labels_, [1, 2]) + self.assertAllClose( + distorted_boxes_.flatten(), expected_boxes.flatten()) + self.assertAllClose( + distorted_keypoints_.flatten(), expected_keypoints.flatten()) + + def testRunRetainBoxesAboveThreshold(self): + boxes = self.createTestBoxes() + labels = self.createTestLabels() + label_scores = self.createTestLabelScores() + + tensor_dict = { + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels, + 
fields.InputDataFields.groundtruth_label_scores: label_scores + } + + preprocessing_options = [ + (preprocessor.retain_boxes_above_threshold, {'threshold': 0.6}) + ] + + retained_tensor_dict = preprocessor.preprocess( + tensor_dict, preprocessing_options) + retained_boxes = retained_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + retained_labels = retained_tensor_dict[ + fields.InputDataFields.groundtruth_classes] + retained_label_scores = retained_tensor_dict[ + fields.InputDataFields.groundtruth_label_scores] + + with self.test_session() as sess: + (retained_boxes_, retained_labels_, + retained_label_scores_, expected_retained_boxes_, + expected_retained_labels_, expected_retained_label_scores_) = sess.run( + [retained_boxes, retained_labels, retained_label_scores, + self.expectedBoxesAfterThresholding(), + self.expectedLabelsAfterThresholding(), + self.expectedLabelScoresAfterThresholding()]) + + self.assertAllClose(retained_boxes_, expected_retained_boxes_) + self.assertAllClose(retained_labels_, expected_retained_labels_) + self.assertAllClose( + retained_label_scores_, expected_retained_label_scores_) + + def testRunRetainBoxesAboveThresholdWithMasks(self): + boxes = self.createTestBoxes() + labels = self.createTestLabels() + label_scores = self.createTestLabelScores() + masks = self.createTestMasks() + + tensor_dict = { + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels, + fields.InputDataFields.groundtruth_label_scores: label_scores, + fields.InputDataFields.groundtruth_instance_masks: masks + } + + preprocessor_arg_map = preprocessor.get_default_func_arg_map( + include_instance_masks=True) + + preprocessing_options = [ + (preprocessor.retain_boxes_above_threshold, {'threshold': 0.6}) + ] + + retained_tensor_dict = preprocessor.preprocess( + tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map) + retained_masks = retained_tensor_dict[ + fields.InputDataFields.groundtruth_instance_masks] + + with self.test_session() as sess: + (retained_masks_, expected_masks_) = sess.run( + [retained_masks, + self.expectedMasksAfterThresholding()]) + self.assertAllClose(retained_masks_, expected_masks_) + + def testRunRetainBoxesAboveThresholdWithKeypoints(self): + boxes = self.createTestBoxes() + labels = self.createTestLabels() + label_scores = self.createTestLabelScores() + keypoints = self.createTestKeypoints() + + tensor_dict = { + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels, + fields.InputDataFields.groundtruth_label_scores: label_scores, + fields.InputDataFields.groundtruth_keypoints: keypoints + } + + preprocessor_arg_map = preprocessor.get_default_func_arg_map( + include_keypoints=True) + + preprocessing_options = [ + (preprocessor.retain_boxes_above_threshold, {'threshold': 0.6}) + ] + + retained_tensor_dict = preprocessor.preprocess( + tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map) + retained_keypoints = retained_tensor_dict[ + fields.InputDataFields.groundtruth_keypoints] + + with self.test_session() as sess: + (retained_keypoints_, expected_keypoints_) = sess.run( + [retained_keypoints, + self.expectedKeypointsAfterThresholding()]) + self.assertAllClose(retained_keypoints_, expected_keypoints_) + + def testRunRandomCropToAspectRatioWithMasks(self): + image = self.createColorfulTestImage() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + masks = tf.random_uniform([2, 200, 400], dtype=tf.float32) + + 
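# Instance masks are shaped [num_instances, height, width], sharing the spatial size of the colorful test image (200x400). + 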
tensor_dict = { + fields.InputDataFields.image: image, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels, + fields.InputDataFields.groundtruth_instance_masks: masks + } + + preprocessor_arg_map = preprocessor.get_default_func_arg_map( + include_instance_masks=True) + + preprocessing_options = [(preprocessor.random_crop_to_aspect_ratio, {})] + + with mock.patch.object(preprocessor, + '_random_integer') as mock_random_integer: + mock_random_integer.return_value = tf.constant(0, dtype=tf.int32) + distorted_tensor_dict = preprocessor.preprocess( + tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map) + distorted_image = distorted_tensor_dict[fields.InputDataFields.image] + distorted_boxes = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + distorted_labels = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_classes] + distorted_masks = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_instance_masks] + with self.test_session() as sess: + (distorted_image_, distorted_boxes_, distorted_labels_, + distorted_masks_) = sess.run([ + distorted_image, distorted_boxes, distorted_labels, distorted_masks + ]) + + expected_boxes = np.array([0.0, 0.5, 0.75, 1.0], dtype=np.float32) + self.assertAllEqual(distorted_image_.shape, [1, 200, 200, 3]) + self.assertAllEqual(distorted_labels_, [1]) + self.assertAllClose(distorted_boxes_.flatten(), + expected_boxes.flatten()) + self.assertAllEqual(distorted_masks_.shape, [1, 200, 200]) + + def testRunRandomCropToAspectRatioWithKeypoints(self): + image = self.createColorfulTestImage() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + keypoints = self.createTestKeypoints() + + tensor_dict = { + fields.InputDataFields.image: image, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels, + fields.InputDataFields.groundtruth_keypoints: keypoints + } + + preprocessor_arg_map = preprocessor.get_default_func_arg_map( + include_keypoints=True) + + preprocessing_options = [(preprocessor.random_crop_to_aspect_ratio, {})] + + with mock.patch.object(preprocessor, + '_random_integer') as mock_random_integer: + mock_random_integer.return_value = tf.constant(0, dtype=tf.int32) + distorted_tensor_dict = preprocessor.preprocess( + tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map) + distorted_image = distorted_tensor_dict[fields.InputDataFields.image] + distorted_boxes = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + distorted_labels = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_classes] + distorted_keypoints = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_keypoints] + with self.test_session() as sess: + (distorted_image_, distorted_boxes_, distorted_labels_, + distorted_keypoints_) = sess.run([ + distorted_image, distorted_boxes, distorted_labels, + distorted_keypoints + ]) + + expected_boxes = np.array([0.0, 0.5, 0.75, 1.0], dtype=np.float32) + expected_keypoints = np.array( + [[0.1, 0.2], [0.2, 0.4], [0.3, 0.6]], dtype=np.float32) + self.assertAllEqual(distorted_image_.shape, [1, 200, 200, 3]) + self.assertAllEqual(distorted_labels_, [1]) + self.assertAllClose(distorted_boxes_.flatten(), + expected_boxes.flatten()) + self.assertAllClose(distorted_keypoints_.flatten(), + expected_keypoints.flatten()) + + def testRandomPadImage(self): + preprocessing_options = [(preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 
255, + 'target_minval': 0, + 'target_maxval': 1 + })] + + images = self.createTestImages() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + tensor_dict = {fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels} + tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options) + images = tensor_dict[fields.InputDataFields.image] + + preprocessing_options = [(preprocessor.random_pad_image, {})] + padded_tensor_dict = preprocessor.preprocess(tensor_dict, + preprocessing_options) + + padded_images = padded_tensor_dict[fields.InputDataFields.image] + padded_boxes = padded_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + boxes_shape = tf.shape(boxes) + padded_boxes_shape = tf.shape(padded_boxes) + images_shape = tf.shape(images) + padded_images_shape = tf.shape(padded_images) + + with self.test_session() as sess: + (boxes_shape_, padded_boxes_shape_, images_shape_, + padded_images_shape_, boxes_, padded_boxes_) = sess.run( + [boxes_shape, padded_boxes_shape, images_shape, + padded_images_shape, boxes, padded_boxes]) + self.assertAllEqual(boxes_shape_, padded_boxes_shape_) + self.assertTrue((images_shape_[1] >= padded_images_shape_[1] * 0.5).all()) + self.assertTrue((images_shape_[2] >= padded_images_shape_[2] * 0.5).all()) + self.assertTrue((images_shape_[1] <= padded_images_shape_[1]).all()) + self.assertTrue((images_shape_[2] <= padded_images_shape_[2]).all()) + self.assertTrue(np.all((boxes_[:, 2] - boxes_[:, 0]) >= ( + padded_boxes_[:, 2] - padded_boxes_[:, 0]))) + self.assertTrue(np.all((boxes_[:, 3] - boxes_[:, 1]) >= ( + padded_boxes_[:, 3] - padded_boxes_[:, 1]))) + + def testRandomCropPadImageWithRandomCoefOne(self): + preprocessing_options = [(preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + })] + + images = self.createTestImages() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + tensor_dict = {fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels} + tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options) + images = tensor_dict[fields.InputDataFields.image] + + preprocessing_options = [(preprocessor.random_crop_pad_image, { + 'random_coef': 1.0 + })] + padded_tensor_dict = preprocessor.preprocess(tensor_dict, + preprocessing_options) + + padded_images = padded_tensor_dict[fields.InputDataFields.image] + padded_boxes = padded_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + boxes_shape = tf.shape(boxes) + padded_boxes_shape = tf.shape(padded_boxes) + images_shape = tf.shape(images) + padded_images_shape = tf.shape(padded_images) + + with self.test_session() as sess: + (boxes_shape_, padded_boxes_shape_, images_shape_, + padded_images_shape_, boxes_, padded_boxes_) = sess.run( + [boxes_shape, padded_boxes_shape, images_shape, + padded_images_shape, boxes, padded_boxes]) + self.assertAllEqual(boxes_shape_, padded_boxes_shape_) + self.assertTrue((images_shape_[1] >= padded_images_shape_[1] * 0.5).all()) + self.assertTrue((images_shape_[2] >= padded_images_shape_[2] * 0.5).all()) + self.assertTrue((images_shape_[1] <= padded_images_shape_[1]).all()) + self.assertTrue((images_shape_[2] <= padded_images_shape_[2]).all()) + self.assertTrue(np.all((boxes_[:, 2] - boxes_[:, 0]) >= ( + padded_boxes_[:, 2] - padded_boxes_[:, 0]))) + self.assertTrue(np.all((boxes_[:, 3] - 
boxes_[:, 1]) >= ( + padded_boxes_[:, 3] - padded_boxes_[:, 1]))) + + def testRandomCropToAspectRatio(self): + preprocessing_options = [(preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + })] + + images = self.createTestImages() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + tensor_dict = { + fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels + } + tensor_dict = preprocessor.preprocess(tensor_dict, preprocessing_options) + images = tensor_dict[fields.InputDataFields.image] + + preprocessing_options = [(preprocessor.random_crop_to_aspect_ratio, { + 'aspect_ratio': 2.0 + })] + cropped_tensor_dict = preprocessor.preprocess(tensor_dict, + preprocessing_options) + + cropped_images = cropped_tensor_dict[fields.InputDataFields.image] + cropped_boxes = cropped_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + boxes_shape = tf.shape(boxes) + cropped_boxes_shape = tf.shape(cropped_boxes) + images_shape = tf.shape(images) + cropped_images_shape = tf.shape(cropped_images) + + with self.test_session() as sess: + (boxes_shape_, cropped_boxes_shape_, images_shape_, + cropped_images_shape_) = sess.run([ + boxes_shape, cropped_boxes_shape, images_shape, cropped_images_shape + ]) + self.assertAllEqual(boxes_shape_, cropped_boxes_shape_) + self.assertEqual(images_shape_[1], cropped_images_shape_[1] * 2) + self.assertEqual(images_shape_[2], cropped_images_shape_[2]) + + def testRandomBlackPatches(self): + preprocessing_options = [] + preprocessing_options.append((preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + })) + preprocessing_options.append((preprocessor.random_black_patches, { + 'size_to_image_ratio': 0.5 + })) + images = self.createTestImages() + tensor_dict = {fields.InputDataFields.image: images} + blacked_tensor_dict = preprocessor.preprocess(tensor_dict, + preprocessing_options) + blacked_images = blacked_tensor_dict[fields.InputDataFields.image] + images_shape = tf.shape(images) + blacked_images_shape = tf.shape(blacked_images) + + with self.test_session() as sess: + (images_shape_, blacked_images_shape_) = sess.run( + [images_shape, blacked_images_shape]) + self.assertAllEqual(images_shape_, blacked_images_shape_) + + def testRandomResizeMethod(self): + preprocessing_options = [] + preprocessing_options.append((preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + })) + preprocessing_options.append((preprocessor.random_resize_method, { + 'target_size': (75, 150) + })) + images = self.createTestImages() + tensor_dict = {fields.InputDataFields.image: images} + resized_tensor_dict = preprocessor.preprocess(tensor_dict, + preprocessing_options) + resized_images = resized_tensor_dict[fields.InputDataFields.image] + resized_images_shape = tf.shape(resized_images) + expected_images_shape = tf.constant([1, 75, 150, 3], dtype=tf.int32) + + with self.test_session() as sess: + (expected_images_shape_, resized_images_shape_) = sess.run( + [expected_images_shape, resized_images_shape]) + self.assertAllEqual(expected_images_shape_, + resized_images_shape_) + + def testResizeToRange(self): + """Tests image resizing, checking output sizes.""" + in_shape_list = [[60, 40, 3], [15, 30, 3], [15, 50, 3]] + min_dim = 50 + max_dim = 100 + expected_shape_list = [[75, 50, 3], 
[50, 100, 3], [30, 100, 3]] + + for in_shape, expected_shape in zip(in_shape_list, expected_shape_list): + in_image = tf.random_uniform(in_shape) + out_image = preprocessor.resize_to_range( + in_image, min_dimension=min_dim, max_dimension=max_dim) + out_image_shape = tf.shape(out_image) + + with self.test_session() as sess: + out_image_shape = sess.run(out_image_shape) + self.assertAllEqual(out_image_shape, expected_shape) + + def testResizeToRangeWithMasks(self): + """Tests image resizing, checking output sizes.""" + in_image_shape_list = [[60, 40, 3], [15, 30, 3]] + in_masks_shape_list = [[15, 60, 40], [10, 15, 30]] + min_dim = 50 + max_dim = 100 + expected_image_shape_list = [[75, 50, 3], [50, 100, 3]] + expected_masks_shape_list = [[15, 75, 50], [10, 50, 100]] + + for (in_image_shape, expected_image_shape, in_masks_shape, + expected_mask_shape) in zip(in_image_shape_list, + expected_image_shape_list, + in_masks_shape_list, + expected_masks_shape_list): + in_image = tf.random_uniform(in_image_shape) + in_masks = tf.random_uniform(in_masks_shape) + out_image, out_masks = preprocessor.resize_to_range( + in_image, in_masks, min_dimension=min_dim, max_dimension=max_dim) + out_image_shape = tf.shape(out_image) + out_masks_shape = tf.shape(out_masks) + + with self.test_session() as sess: + out_image_shape, out_masks_shape = sess.run( + [out_image_shape, out_masks_shape]) + self.assertAllEqual(out_image_shape, expected_image_shape) + self.assertAllEqual(out_masks_shape, expected_mask_shape) + + def testResizeToRangeWithNoInstanceMask(self): + """Tests image resizing, checking output sizes.""" + in_image_shape_list = [[60, 40, 3], [15, 30, 3]] + in_masks_shape_list = [[0, 60, 40], [0, 15, 30]] + min_dim = 50 + max_dim = 100 + expected_image_shape_list = [[75, 50, 3], [50, 100, 3]] + expected_masks_shape_list = [[0, 75, 50], [0, 50, 100]] + + for (in_image_shape, expected_image_shape, in_masks_shape, + expected_mask_shape) in zip(in_image_shape_list, + expected_image_shape_list, + in_masks_shape_list, + expected_masks_shape_list): + in_image = tf.random_uniform(in_image_shape) + in_masks = tf.random_uniform(in_masks_shape) + out_image, out_masks = preprocessor.resize_to_range( + in_image, in_masks, min_dimension=min_dim, max_dimension=max_dim) + out_image_shape = tf.shape(out_image) + out_masks_shape = tf.shape(out_masks) + + with self.test_session() as sess: + out_image_shape, out_masks_shape = sess.run( + [out_image_shape, out_masks_shape]) + self.assertAllEqual(out_image_shape, expected_image_shape) + self.assertAllEqual(out_masks_shape, expected_mask_shape) + + def testResizeImageWithMasks(self): + """Tests image resizing, checking output sizes.""" + in_image_shape_list = [[60, 40, 3], [15, 30, 3]] + in_masks_shape_list = [[15, 60, 40], [10, 15, 30]] + height = 50 + width = 100 + expected_image_shape_list = [[50, 100, 3], [50, 100, 3]] + expected_masks_shape_list = [[15, 50, 100], [10, 50, 100]] + + for (in_image_shape, expected_image_shape, in_masks_shape, + expected_mask_shape) in zip(in_image_shape_list, + expected_image_shape_list, + in_masks_shape_list, + expected_masks_shape_list): + in_image = tf.random_uniform(in_image_shape) + in_masks = tf.random_uniform(in_masks_shape) + out_image, out_masks = preprocessor.resize_image( + in_image, in_masks, new_height=height, new_width=width) + out_image_shape = tf.shape(out_image) + out_masks_shape = tf.shape(out_masks) + + with self.test_session() as sess: + out_image_shape, out_masks_shape = sess.run( + [out_image_shape, out_masks_shape]) + 
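# Both the image and its instance masks should come out at the requested height and width. + 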
self.assertAllEqual(out_image_shape, expected_image_shape) + self.assertAllEqual(out_masks_shape, expected_mask_shape) + + def testResizeImageWithNoInstanceMask(self): + """Tests image resizing, checking output sizes.""" + in_image_shape_list = [[60, 40, 3], [15, 30, 3]] + in_masks_shape_list = [[0, 60, 40], [0, 15, 30]] + height = 50 + width = 100 + expected_image_shape_list = [[50, 100, 3], [50, 100, 3]] + expected_masks_shape_list = [[0, 50, 100], [0, 50, 100]] + + for (in_image_shape, expected_image_shape, in_masks_shape, + expected_mask_shape) in zip(in_image_shape_list, + expected_image_shape_list, + in_masks_shape_list, + expected_masks_shape_list): + in_image = tf.random_uniform(in_image_shape) + in_masks = tf.random_uniform(in_masks_shape) + out_image, out_masks = preprocessor.resize_image( + in_image, in_masks, new_height=height, new_width=width) + out_image_shape = tf.shape(out_image) + out_masks_shape = tf.shape(out_masks) + + with self.test_session() as sess: + out_image_shape, out_masks_shape = sess.run( + [out_image_shape, out_masks_shape]) + self.assertAllEqual(out_image_shape, expected_image_shape) + self.assertAllEqual(out_masks_shape, expected_mask_shape) + + def testResizeToRange4DImageTensor(self): + image = tf.random_uniform([1, 200, 300, 3]) + with self.assertRaises(ValueError): + preprocessor.resize_to_range(image, 500, 600) + + def testResizeToRangeSameMinMax(self): + """Tests image resizing, checking output sizes.""" + in_shape_list = [[312, 312, 3], [299, 299, 3]] + min_dim = 320 + max_dim = 320 + expected_shape_list = [[320, 320, 3], [320, 320, 3]] + + for in_shape, expected_shape in zip(in_shape_list, expected_shape_list): + in_image = tf.random_uniform(in_shape) + out_image = preprocessor.resize_to_range( + in_image, min_dimension=min_dim, max_dimension=max_dim) + out_image_shape = tf.shape(out_image) + + with self.test_session() as sess: + out_image_shape = sess.run(out_image_shape) + self.assertAllEqual(out_image_shape, expected_shape) + + def testScaleBoxesToPixelCoordinates(self): + """Tests box scaling, checking scaled values.""" + in_shape = [60, 40, 3] + in_boxes = [[0.1, 0.2, 0.4, 0.6], + [0.5, 0.3, 0.9, 0.7]] + + expected_boxes = [[6., 8., 24., 24.], + [30., 12., 54., 28.]] + + in_image = tf.random_uniform(in_shape) + in_boxes = tf.constant(in_boxes) + _, out_boxes = preprocessor.scale_boxes_to_pixel_coordinates( + in_image, boxes=in_boxes) + with self.test_session() as sess: + out_boxes = sess.run(out_boxes) + self.assertAllClose(out_boxes, expected_boxes) + + def testScaleBoxesToPixelCoordinatesWithKeypoints(self): + """Tests box and keypoint scaling, checking scaled values.""" + in_shape = [60, 40, 3] + in_boxes = self.createTestBoxes() + in_keypoints = self.createTestKeypoints() + + expected_boxes = [[0., 10., 45., 40.], + [15., 20., 45., 40.]] + expected_keypoints = [ + [[6., 4.], [12., 8.], [18., 12.]], + [[24., 16.], [30., 20.], [36., 24.]], + ] + + in_image = tf.random_uniform(in_shape) + _, out_boxes, out_keypoints = preprocessor.scale_boxes_to_pixel_coordinates( + in_image, boxes=in_boxes, keypoints=in_keypoints) + with self.test_session() as sess: + out_boxes_, out_keypoints_ = sess.run([out_boxes, out_keypoints]) + self.assertAllClose(out_boxes_, expected_boxes) + self.assertAllClose(out_keypoints_, expected_keypoints) + + def testSubtractChannelMean(self): + """Tests whether channel means have been subtracted.""" + with self.test_session(): + image = tf.zeros((240, 320, 3)) + means = [1, 2, 3] + actual = 
preprocessor.subtract_channel_mean(image, means=means) + actual = actual.eval() + + self.assertTrue((actual[:, :, 0] == -1).all()) + self.assertTrue((actual[:, :, 1] == -2).all()) + self.assertTrue((actual[:, :, 2] == -3).all()) + + def testOneHotEncoding(self): + """Tests one hot encoding of multiclass labels.""" + with self.test_session(): + labels = tf.constant([1, 4, 2], dtype=tf.int32) + one_hot = preprocessor.one_hot_encoding(labels, num_classes=5) + one_hot = one_hot.eval() + + self.assertAllEqual([0, 1, 1, 0, 1], one_hot) + + def testSSDRandomCrop(self): + preprocessing_options = [ + (preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + }), + (preprocessor.ssd_random_crop, {})] + images = self.createTestImages() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + tensor_dict = {fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels} + distorted_tensor_dict = preprocessor.preprocess(tensor_dict, + preprocessing_options) + distorted_images = distorted_tensor_dict[fields.InputDataFields.image] + distorted_boxes = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + + images_rank = tf.rank(images) + distorted_images_rank = tf.rank(distorted_images) + boxes_rank = tf.rank(boxes) + distorted_boxes_rank = tf.rank(distorted_boxes) + + with self.test_session() as sess: + (boxes_rank_, distorted_boxes_rank_, images_rank_, + distorted_images_rank_) = sess.run( + [boxes_rank, distorted_boxes_rank, images_rank, + distorted_images_rank]) + self.assertAllEqual(boxes_rank_, distorted_boxes_rank_) + self.assertAllEqual(images_rank_, distorted_images_rank_) + + def testSSDRandomCropPad(self): + images = self.createTestImages() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + preprocessing_options = [ + (preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + }), + (preprocessor.ssd_random_crop_pad, {})] + tensor_dict = {fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels} + distorted_tensor_dict = preprocessor.preprocess(tensor_dict, + preprocessing_options) + distorted_images = distorted_tensor_dict[fields.InputDataFields.image] + distorted_boxes = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + + images_rank = tf.rank(images) + distorted_images_rank = tf.rank(distorted_images) + boxes_rank = tf.rank(boxes) + distorted_boxes_rank = tf.rank(distorted_boxes) + + with self.test_session() as sess: + (boxes_rank_, distorted_boxes_rank_, images_rank_, + distorted_images_rank_) = sess.run([ + boxes_rank, distorted_boxes_rank, images_rank, distorted_images_rank + ]) + self.assertAllEqual(boxes_rank_, distorted_boxes_rank_) + self.assertAllEqual(images_rank_, distorted_images_rank_) + + def testSSDRandomCropFixedAspectRatio(self): + images = self.createTestImages() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + preprocessing_options = [ + (preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + }), + (preprocessor.ssd_random_crop_fixed_aspect_ratio, {})] + tensor_dict = { + fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels + } + 
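# The SSD-style crop is random, so only tensor ranks (not exact shapes) are compared below. + 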
distorted_tensor_dict = preprocessor.preprocess(tensor_dict, + preprocessing_options) + distorted_images = distorted_tensor_dict[fields.InputDataFields.image] + distorted_boxes = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + + images_rank = tf.rank(images) + distorted_images_rank = tf.rank(distorted_images) + boxes_rank = tf.rank(boxes) + distorted_boxes_rank = tf.rank(distorted_boxes) + + with self.test_session() as sess: + (boxes_rank_, distorted_boxes_rank_, images_rank_, + distorted_images_rank_) = sess.run( + [boxes_rank, distorted_boxes_rank, images_rank, + distorted_images_rank]) + self.assertAllEqual(boxes_rank_, distorted_boxes_rank_) + self.assertAllEqual(images_rank_, distorted_images_rank_) + + def testSSDRandomCropFixedAspectRatioWithMasksAndKeypoints(self): + images = self.createTestImages() + boxes = self.createTestBoxes() + labels = self.createTestLabels() + masks = self.createTestMasks() + keypoints = self.createTestKeypoints() + preprocessing_options = [ + (preprocessor.normalize_image, { + 'original_minval': 0, + 'original_maxval': 255, + 'target_minval': 0, + 'target_maxval': 1 + }), + (preprocessor.ssd_random_crop_fixed_aspect_ratio, {})] + tensor_dict = { + fields.InputDataFields.image: images, + fields.InputDataFields.groundtruth_boxes: boxes, + fields.InputDataFields.groundtruth_classes: labels, + fields.InputDataFields.groundtruth_instance_masks: masks, + fields.InputDataFields.groundtruth_keypoints: keypoints, + } + preprocessor_arg_map = preprocessor.get_default_func_arg_map( + include_instance_masks=True, include_keypoints=True) + distorted_tensor_dict = preprocessor.preprocess( + tensor_dict, preprocessing_options, func_arg_map=preprocessor_arg_map) + distorted_images = distorted_tensor_dict[fields.InputDataFields.image] + distorted_boxes = distorted_tensor_dict[ + fields.InputDataFields.groundtruth_boxes] + + images_rank = tf.rank(images) + distorted_images_rank = tf.rank(distorted_images) + boxes_rank = tf.rank(boxes) + distorted_boxes_rank = tf.rank(distorted_boxes) + + with self.test_session() as sess: + (boxes_rank_, distorted_boxes_rank_, images_rank_, + distorted_images_rank_) = sess.run( + [boxes_rank, distorted_boxes_rank, images_rank, + distorted_images_rank]) + self.assertAllEqual(boxes_rank_, distorted_boxes_rank_) + self.assertAllEqual(images_rank_, distorted_images_rank_) + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/core/region_similarity_calculator.py b/object_detection/core/region_similarity_calculator.py new file mode 100644 index 0000000000000000000000000000000000000000..f344006a3c56c95021dae47fcf5195a1b9743d85 --- /dev/null +++ b/object_detection/core/region_similarity_calculator.py @@ -0,0 +1,114 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Region Similarity Calculators for BoxLists. 
+
+Region Similarity Calculators compute a pairwise measure of similarity
+between the boxes in two BoxLists.
+"""
+from abc import ABCMeta
+from abc import abstractmethod
+
+import tensorflow as tf
+
+from object_detection.core import box_list_ops
+
+
+class RegionSimilarityCalculator(object):
+  """Abstract base class for region similarity calculator."""
+  __metaclass__ = ABCMeta
+
+  def compare(self, boxlist1, boxlist2, scope=None):
+    """Computes matrix of pairwise similarity between BoxLists.
+
+    This op (to be overridden) computes a measure of pairwise similarity between
+    the boxes in the given BoxLists. Higher values indicate more similarity.
+
+    Note that this method simply measures similarity and does not explicitly
+    perform a matching.
+
+    Args:
+      boxlist1: BoxList holding N boxes.
+      boxlist2: BoxList holding M boxes.
+      scope: Op scope name. Defaults to 'Compare' if None.
+
+    Returns:
+      a (float32) tensor of shape [N, M] with pairwise similarity scores.
+    """
+    with tf.name_scope(scope, 'Compare', [boxlist1, boxlist2]) as scope:
+      return self._compare(boxlist1, boxlist2)
+
+  @abstractmethod
+  def _compare(self, boxlist1, boxlist2):
+    pass
+
+
+class IouSimilarity(RegionSimilarityCalculator):
+  """Class to compute similarity based on Intersection over Union (IOU) metric.
+
+  This class computes pairwise similarity between two BoxLists based on IOU.
+  """
+
+  def _compare(self, boxlist1, boxlist2):
+    """Compute pairwise IOU similarity between the two BoxLists.
+
+    Args:
+      boxlist1: BoxList holding N boxes.
+      boxlist2: BoxList holding M boxes.
+
+    Returns:
+      A tensor with shape [N, M] representing pairwise IOU scores.
+    """
+    return box_list_ops.iou(boxlist1, boxlist2)
+
+
+class NegSqDistSimilarity(RegionSimilarityCalculator):
+  """Class to compute similarity based on the negated squared distance metric.
+
+  This class computes pairwise similarity between two BoxLists based on the
+  negative squared distance metric.
+  """
+
+  def _compare(self, boxlist1, boxlist2):
+    """Compute matrix of (negated) squared distances.
+
+    Args:
+      boxlist1: BoxList holding N boxes.
+      boxlist2: BoxList holding M boxes.
+
+    Returns:
+      A tensor with shape [N, M] representing negated pairwise squared
+        distances.
+    """
+    return -1 * box_list_ops.sq_dist(boxlist1, boxlist2)
+
+
+class IoaSimilarity(RegionSimilarityCalculator):
+  """Class to compute similarity based on Intersection over Area (IOA) metric.
+
+  This class computes pairwise similarity between two BoxLists based on their
+  pairwise intersections divided by the areas of the boxes in the second
+  BoxList.
+  """
+
+  def _compare(self, boxlist1, boxlist2):
+    """Compute pairwise IOA similarity between the two BoxLists.
+
+    Args:
+      boxlist1: BoxList holding N boxes.
+      boxlist2: BoxList holding M boxes.
+
+    Returns:
+      A tensor with shape [N, M] representing pairwise IOA scores.
+    """
+    return box_list_ops.ioa(boxlist1, boxlist2)
diff --git a/object_detection/core/region_similarity_calculator_test.py b/object_detection/core/region_similarity_calculator_test.py
new file mode 100644
index 0000000000000000000000000000000000000000..162151a3b53468a7724133ca681efc0df5293563
--- /dev/null
+++ b/object_detection/core/region_similarity_calculator_test.py
@@ -0,0 +1,75 @@
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for region_similarity_calculator.""" +import tensorflow as tf + +from object_detection.core import box_list +from object_detection.core import region_similarity_calculator + + +class RegionSimilarityCalculatorTest(tf.test.TestCase): + + def test_get_correct_pairwise_similarity_based_on_iou(self): + corners1 = tf.constant([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]) + corners2 = tf.constant([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]]) + exp_output = [[2.0 / 16.0, 0, 6.0 / 400.0], [1.0 / 16.0, 0.0, 5.0 / 400.0]] + boxes1 = box_list.BoxList(corners1) + boxes2 = box_list.BoxList(corners2) + iou_similarity_calculator = region_similarity_calculator.IouSimilarity() + iou_similarity = iou_similarity_calculator.compare(boxes1, boxes2) + with self.test_session() as sess: + iou_output = sess.run(iou_similarity) + self.assertAllClose(iou_output, exp_output) + + def test_get_correct_pairwise_similarity_based_on_squared_distances(self): + corners1 = tf.constant([[0.0, 0.0, 0.0, 0.0], + [1.0, 1.0, 0.0, 2.0]]) + corners2 = tf.constant([[3.0, 4.0, 1.0, 0.0], + [-4.0, 0.0, 0.0, 3.0], + [0.0, 0.0, 0.0, 0.0]]) + exp_output = [[-26, -25, 0], [-18, -27, -6]] + boxes1 = box_list.BoxList(corners1) + boxes2 = box_list.BoxList(corners2) + dist_similarity_calc = region_similarity_calculator.NegSqDistSimilarity() + dist_similarity = dist_similarity_calc.compare(boxes1, boxes2) + with self.test_session() as sess: + dist_output = sess.run(dist_similarity) + self.assertAllClose(dist_output, exp_output) + + def test_get_correct_pairwise_similarity_based_on_ioa(self): + corners1 = tf.constant([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]]) + corners2 = tf.constant([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]]) + exp_output_1 = [[2.0 / 12.0, 0, 6.0 / 400.0], + [1.0 / 12.0, 0.0, 5.0 / 400.0]] + exp_output_2 = [[2.0 / 6.0, 1.0 / 5.0], + [0, 0], + [6.0 / 6.0, 5.0 / 5.0]] + boxes1 = box_list.BoxList(corners1) + boxes2 = box_list.BoxList(corners2) + ioa_similarity_calculator = region_similarity_calculator.IoaSimilarity() + ioa_similarity_1 = ioa_similarity_calculator.compare(boxes1, boxes2) + ioa_similarity_2 = ioa_similarity_calculator.compare(boxes2, boxes1) + with self.test_session() as sess: + iou_output_1, iou_output_2 = sess.run( + [ioa_similarity_1, ioa_similarity_2]) + self.assertAllClose(iou_output_1, exp_output_1) + self.assertAllClose(iou_output_2, exp_output_2) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/core/standard_fields.py b/object_detection/core/standard_fields.py new file mode 100644 index 0000000000000000000000000000000000000000..978aad3d82c3e352ae2236bfe0724d2794966ec4 --- /dev/null +++ b/object_detection/core/standard_fields.py @@ -0,0 +1,150 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Contains classes specifying naming conventions used for object detection. + + +Specifies: + InputDataFields: standard fields used by reader/preprocessor/batcher. + BoxListFields: standard field used by BoxList + TfExampleFields: standard fields for tf-example data format (go/tf-example). +""" + + +class InputDataFields(object): + """Names for the input tensors. + + Holds the standard data field names to use for identifying input tensors. This + should be used by the decoder to identify keys for the returned tensor_dict + containing input tensors. And it should be used by the model to identify the + tensors it needs. + + Attributes: + image: image. + original_image: image in the original input size. + key: unique key corresponding to image. + source_id: source of the original image. + filename: original filename of the dataset (without common path). + groundtruth_image_classes: image-level class labels. + groundtruth_boxes: coordinates of the ground truth boxes in the image. + groundtruth_classes: box-level class labels. + groundtruth_label_types: box-level label types (e.g. explicit negative). + groundtruth_is_crowd: is the groundtruth a single object or a crowd. + groundtruth_area: area of a groundtruth segment. + groundtruth_difficult: is a `difficult` object + proposal_boxes: coordinates of object proposal boxes. + proposal_objectness: objectness score of each proposal. + groundtruth_instance_masks: ground truth instance masks. + groundtruth_instance_classes: instance mask-level class labels. + groundtruth_keypoints: ground truth keypoints. + groundtruth_keypoint_visibilities: ground truth keypoint visibilities. + groundtruth_label_scores: groundtruth label scores. + """ + image = 'image' + original_image = 'original_image' + key = 'key' + source_id = 'source_id' + filename = 'filename' + groundtruth_image_classes = 'groundtruth_image_classes' + groundtruth_boxes = 'groundtruth_boxes' + groundtruth_classes = 'groundtruth_classes' + groundtruth_label_types = 'groundtruth_label_types' + groundtruth_is_crowd = 'groundtruth_is_crowd' + groundtruth_area = 'groundtruth_area' + groundtruth_difficult = 'groundtruth_difficult' + proposal_boxes = 'proposal_boxes' + proposal_objectness = 'proposal_objectness' + groundtruth_instance_masks = 'groundtruth_instance_masks' + groundtruth_instance_classes = 'groundtruth_instance_classes' + groundtruth_keypoints = 'groundtruth_keypoints' + groundtruth_keypoint_visibilities = 'groundtruth_keypoint_visibilities' + groundtruth_label_scores = 'groundtruth_label_scores' + + +class BoxListFields(object): + """Naming conventions for BoxLists. + + Attributes: + boxes: bounding box coordinates. + classes: classes per bounding box. + scores: scores per bounding box. + weights: sample weights per bounding box. + objectness: objectness score per bounding box. + masks: masks per bounding box. + keypoints: keypoints per bounding box. + keypoint_heatmaps: keypoint heatmaps per bounding box. 
+ """ + boxes = 'boxes' + classes = 'classes' + scores = 'scores' + weights = 'weights' + objectness = 'objectness' + masks = 'masks' + keypoints = 'keypoints' + keypoint_heatmaps = 'keypoint_heatmaps' + + +class TfExampleFields(object): + """TF-example proto feature names for object detection. + + Holds the standard feature names to load from an Example proto for object + detection. + + Attributes: + image_encoded: JPEG encoded string + image_format: image format, e.g. "JPEG" + filename: filename + channels: number of channels of image + colorspace: colorspace, e.g. "RGB" + height: height of image in pixels, e.g. 462 + width: width of image in pixels, e.g. 581 + source_id: original source of the image + object_class_text: labels in text format, e.g. ["person", "cat"] + object_class_text: labels in numbers, e.g. [16, 8] + object_bbox_xmin: xmin coordinates of groundtruth box, e.g. 10, 30 + object_bbox_xmax: xmax coordinates of groundtruth box, e.g. 50, 40 + object_bbox_ymin: ymin coordinates of groundtruth box, e.g. 40, 50 + object_bbox_ymax: ymax coordinates of groundtruth box, e.g. 80, 70 + object_view: viewpoint of object, e.g. ["frontal", "left"] + object_truncated: is object truncated, e.g. [true, false] + object_occluded: is object occluded, e.g. [true, false] + object_difficult: is object difficult, e.g. [true, false] + object_is_crowd: is the object a single object or a crowd + object_segment_area: the area of the segment. + instance_masks: instance segmentation masks. + instance_classes: Classes for each instance segmentation mask. + """ + image_encoded = 'image/encoded' + image_format = 'image/format' # format is reserved keyword + filename = 'image/filename' + channels = 'image/channels' + colorspace = 'image/colorspace' + height = 'image/height' + width = 'image/width' + source_id = 'image/source_id' + object_class_text = 'image/object/class/text' + object_class_label = 'image/object/class/label' + object_bbox_ymin = 'image/object/bbox/ymin' + object_bbox_xmin = 'image/object/bbox/xmin' + object_bbox_ymax = 'image/object/bbox/ymax' + object_bbox_xmax = 'image/object/bbox/xmax' + object_view = 'image/object/view' + object_truncated = 'image/object/truncated' + object_occluded = 'image/object/occluded' + object_difficult = 'image/object/difficult' + object_is_crowd = 'image/object/is_crowd' + object_segment_area = 'image/object/segment/area' + instance_masks = 'image/segmentation/object' + instance_classes = 'image/segmentation/object/class' diff --git a/object_detection/core/target_assigner.py b/object_detection/core/target_assigner.py new file mode 100644 index 0000000000000000000000000000000000000000..a9f3f5aeac5db83c32728c7531315594ffcceea5 --- /dev/null +++ b/object_detection/core/target_assigner.py @@ -0,0 +1,449 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Base target assigner module. 
+
+The job of a TargetAssigner is, for a given set of anchors (bounding boxes) and
+groundtruth detections (bounding boxes), to assign classification and regression
+targets to each anchor as well as weights to each anchor (specifying, e.g.,
+which anchors should not contribute to training loss).
+
+It assigns classification/regression targets by performing the following steps:
+1) Computing pairwise similarity between anchors and groundtruth boxes using a
+   provided RegionSimilarityCalculator
+2) Computing a matching based on the similarity matrix using a provided Matcher
+3) Assigning regression targets based on the matching and a provided BoxCoder
+4) Assigning classification targets based on the matching and groundtruth labels
+
+Note that TargetAssigners only operate on detections from a single
+image at a time, so any logic for applying a TargetAssigner to multiple
+images must be handled externally.
+"""
+import tensorflow as tf
+
+from object_detection.box_coders import faster_rcnn_box_coder
+from object_detection.box_coders import mean_stddev_box_coder
+from object_detection.core import box_coder as bcoder
+from object_detection.core import box_list
+from object_detection.core import box_list_ops
+from object_detection.core import matcher as mat
+from object_detection.core import region_similarity_calculator as sim_calc
+from object_detection.matchers import argmax_matcher
+from object_detection.matchers import bipartite_matcher
+
+
+class TargetAssigner(object):
+  """Target assigner to compute classification and regression targets."""
+
+  def __init__(self, similarity_calc, matcher, box_coder,
+               positive_class_weight=1.0, negative_class_weight=1.0,
+               unmatched_cls_target=None):
+    """Constructs a Multibox target assigner.
+
+    Args:
+      similarity_calc: a RegionSimilarityCalculator
+      matcher: an object_detection.core.Matcher used to match groundtruth to
+        anchors.
+      box_coder: an object_detection.core.BoxCoder used to encode matching
+        groundtruth boxes with respect to anchors.
+      positive_class_weight: classification weight to be associated to positive
+        anchors (default: 1.0)
+      negative_class_weight: classification weight to be associated to negative
+        anchors (default: 1.0)
+      unmatched_cls_target: a float32 tensor with shape [d_1, d_2, ..., d_k]
+        which is consistent with the classification target for each
+        anchor (and can be empty for scalar targets). This shape must thus be
+        compatible with the groundtruth labels that are passed to the "assign"
+        function (which have shape [num_gt_boxes, d_1, d_2, ..., d_k]).
+        If set to None, unmatched_cls_target is set to be [0] for each anchor.
+ + Raises: + ValueError: if similarity_calc is not a RegionSimilarityCalculator or + if matcher is not a Matcher or if box_coder is not a BoxCoder + """ + if not isinstance(similarity_calc, sim_calc.RegionSimilarityCalculator): + raise ValueError('similarity_calc must be a RegionSimilarityCalculator') + if not isinstance(matcher, mat.Matcher): + raise ValueError('matcher must be a Matcher') + if not isinstance(box_coder, bcoder.BoxCoder): + raise ValueError('box_coder must be a BoxCoder') + self._similarity_calc = similarity_calc + self._matcher = matcher + self._box_coder = box_coder + self._positive_class_weight = positive_class_weight + self._negative_class_weight = negative_class_weight + if unmatched_cls_target is None: + self._unmatched_cls_target = tf.constant([0], tf.float32) + else: + self._unmatched_cls_target = unmatched_cls_target + + @property + def box_coder(self): + return self._box_coder + + def assign(self, anchors, groundtruth_boxes, groundtruth_labels=None, + **params): + """Assign classification and regression targets to each anchor. + + For a given set of anchors and groundtruth detections, match anchors + to groundtruth_boxes and assign classification and regression targets to + each anchor as well as weights based on the resulting match (specifying, + e.g., which anchors should not contribute to training loss). + + Anchors that are not matched to anything are given a classification target + of self._unmatched_cls_target which can be specified via the constructor. + + Args: + anchors: a BoxList representing N anchors + groundtruth_boxes: a BoxList representing M groundtruth boxes + groundtruth_labels: a tensor of shape [num_gt_boxes, d_1, ... d_k] + with labels for each of the ground_truth boxes. The subshape + [d_1, ... d_k] can be empty (corresponding to scalar inputs). When set + to None, groundtruth_labels assumes a binary problem where all + ground_truth boxes get a positive label (of 1). + **params: Additional keyword arguments for specific implementations of + the Matcher. + + Returns: + cls_targets: a float32 tensor with shape [num_anchors, d_1, d_2 ... d_k], + where the subshape [d_1, ..., d_k] is compatible with groundtruth_labels + which has shape [num_gt_boxes, d_1, d_2, ... d_k]. + cls_weights: a float32 tensor with shape [num_anchors] + reg_targets: a float32 tensor with shape [num_anchors, box_code_dimension] + reg_weights: a float32 tensor with shape [num_anchors] + match: a matcher.Match object encoding the match between anchors and + groundtruth boxes, with rows corresponding to groundtruth boxes + and columns corresponding to anchors. 
+
+    Raises:
+      ValueError: if anchors or groundtruth_boxes are not of type
+        box_list.BoxList
+    """
+    if not isinstance(anchors, box_list.BoxList):
+      raise ValueError('anchors must be a BoxList')
+    if not isinstance(groundtruth_boxes, box_list.BoxList):
+      raise ValueError('groundtruth_boxes must be a BoxList')
+
+    if groundtruth_labels is None:
+      groundtruth_labels = tf.ones(tf.expand_dims(groundtruth_boxes.num_boxes(),
+                                                  0))
+      groundtruth_labels = tf.expand_dims(groundtruth_labels, -1)
+    shape_assert = tf.assert_equal(tf.shape(groundtruth_labels)[1:],
+                                   tf.shape(self._unmatched_cls_target))
+
+    with tf.control_dependencies([shape_assert]):
+      match_quality_matrix = self._similarity_calc.compare(groundtruth_boxes,
+                                                           anchors)
+      match = self._matcher.match(match_quality_matrix, **params)
+      reg_targets = self._create_regression_targets(anchors,
+                                                    groundtruth_boxes,
+                                                    match)
+      cls_targets = self._create_classification_targets(groundtruth_labels,
+                                                         match)
+      reg_weights = self._create_regression_weights(match)
+      cls_weights = self._create_classification_weights(
+          match, self._positive_class_weight, self._negative_class_weight)
+
+    num_anchors = anchors.num_boxes_static()
+    if num_anchors is not None:
+      reg_targets = self._reset_target_shape(reg_targets, num_anchors)
+      cls_targets = self._reset_target_shape(cls_targets, num_anchors)
+      reg_weights = self._reset_target_shape(reg_weights, num_anchors)
+      cls_weights = self._reset_target_shape(cls_weights, num_anchors)
+
+    return cls_targets, cls_weights, reg_targets, reg_weights, match
+
+  def _reset_target_shape(self, target, num_anchors):
+    """Sets the static shape of the target.
+
+    Args:
+      target: the target tensor. Its first dimension will be overwritten.
+      num_anchors: the number of anchors, which is used to override the
+        target's first dimension.
+
+    Returns:
+      A tensor with the shape info filled in.
+    """
+    target_shape = target.get_shape().as_list()
+    target_shape[0] = num_anchors
+    target.set_shape(target_shape)
+    return target
+
+  def _create_regression_targets(self, anchors, groundtruth_boxes, match):
+    """Returns a regression target for each anchor.
+
+    Args:
+      anchors: a BoxList representing N anchors
+      groundtruth_boxes: a BoxList representing M groundtruth_boxes
+      match: a matcher.Match object
+
+    Returns:
+      reg_targets: a float32 tensor with shape [N, box_code_dimension]
+    """
+    matched_anchor_indices = match.matched_column_indices()
+    unmatched_ignored_anchor_indices = (match.
+                                        unmatched_or_ignored_column_indices())
+    matched_gt_indices = match.matched_row_indices()
+    matched_anchors = box_list_ops.gather(anchors,
+                                          matched_anchor_indices)
+    matched_gt_boxes = box_list_ops.gather(groundtruth_boxes,
+                                           matched_gt_indices)
+    matched_reg_targets = self._box_coder.encode(matched_gt_boxes,
+                                                 matched_anchors)
+    unmatched_ignored_reg_targets = tf.tile(
+        self._default_regression_target(),
+        tf.stack([tf.size(unmatched_ignored_anchor_indices), 1]))
+    reg_targets = tf.dynamic_stitch(
+        [matched_anchor_indices, unmatched_ignored_anchor_indices],
+        [matched_reg_targets, unmatched_ignored_reg_targets])
+    # TODO: summarize the number of matches on average.
+    return reg_targets
+
+  def _default_regression_target(self):
+    """Returns the default target for anchors to regress to.
+
+    Default regression targets are set to zero (though in
+    this implementation what these targets are set to should
+    not matter as the regression weight of any box set to
+    regress to the default target is zero).
+
+    Returns:
+      default_target: a float32 tensor with shape [1, box_code_dimension]
+    """
+    return tf.constant([self._box_coder.code_size*[0]], tf.float32)
+
+  def _create_classification_targets(self, groundtruth_labels, match):
+    """Create classification targets for each anchor.
+
+    Assigns a classification target for each anchor to the matching
+    groundtruth label that is provided by match. Anchors that are not matched
+    to anything are given the target self._unmatched_cls_target.
+
+    Args:
+      groundtruth_labels: a tensor of shape [num_gt_boxes, d_1, ... d_k]
+        with labels for each of the ground_truth boxes. The subshape
+        [d_1, ... d_k] can be empty (corresponding to scalar labels).
+      match: a matcher.Match object that provides a matching between anchors
+        and groundtruth boxes.
+
+    Returns:
+      cls_targets: a float32 tensor with shape [num_anchors, d_1, d_2 ... d_k],
+        where the subshape [d_1, ..., d_k] is compatible with groundtruth_labels
+        which has shape [num_gt_boxes, d_1, d_2, ... d_k].
+    """
+    matched_anchor_indices = match.matched_column_indices()
+    unmatched_ignored_anchor_indices = (match.
+                                        unmatched_or_ignored_column_indices())
+    matched_gt_indices = match.matched_row_indices()
+    matched_cls_targets = tf.gather(groundtruth_labels, matched_gt_indices)
+
+    ones = self._unmatched_cls_target.shape.ndims * [1]
+    unmatched_ignored_cls_targets = tf.tile(
+        tf.expand_dims(self._unmatched_cls_target, 0),
+        tf.stack([tf.size(unmatched_ignored_anchor_indices)] + ones))
+
+    cls_targets = tf.dynamic_stitch(
+        [matched_anchor_indices, unmatched_ignored_anchor_indices],
+        [matched_cls_targets, unmatched_ignored_cls_targets])
+    return cls_targets
+
+  def _create_regression_weights(self, match):
+    """Set regression weight for each anchor.
+
+    Only positive anchors are set to contribute to the regression loss, so this
+    method returns a weight of 1 for every positive anchor and 0 for every
+    negative anchor.
+
+    Args:
+      match: a matcher.Match object that provides a matching between anchors
+        and groundtruth boxes.
+
+    Returns:
+      reg_weights: a float32 tensor with shape [num_anchors] representing
+        regression weights
+    """
+    reg_weights = tf.cast(match.matched_column_indicator(), tf.float32)
+    return reg_weights
+
+  def _create_classification_weights(self,
+                                     match,
+                                     positive_class_weight=1.0,
+                                     negative_class_weight=1.0):
+    """Create classification weights for each anchor.
+
+    Positive (matched) anchors are associated with a weight of
+    positive_class_weight and negative (unmatched) anchors are associated with
+    a weight of negative_class_weight. When anchors are ignored, weights are
+    set to zero. By default, both positive/negative weights are set to 1.0,
+    but they can be adjusted to handle class imbalance (which is almost always
+    the case in object detection).
+
+    Args:
+      match: a matcher.Match object that provides a matching between anchors
+        and groundtruth boxes.
+      positive_class_weight: weight to be associated to positive anchors
+      negative_class_weight: weight to be associated to negative anchors
+
+    Returns:
+      cls_weights: a float32 tensor with shape [num_anchors] representing
+        classification weights.
+ """ + matched_indicator = tf.cast(match.matched_column_indicator(), tf.float32) + ignore_indicator = tf.cast(match.ignored_column_indicator(), tf.float32) + unmatched_indicator = 1.0 - matched_indicator - ignore_indicator + cls_weights = (positive_class_weight * matched_indicator + + negative_class_weight * unmatched_indicator) + return cls_weights + + def get_box_coder(self): + """Get BoxCoder of this TargetAssigner. + + Returns: + BoxCoder: BoxCoder object. + """ + return self._box_coder + + +# TODO: This method pulls in all the implementation dependencies into core. +# Therefore its best to have this factory method outside of core. +def create_target_assigner(reference, stage=None, + positive_class_weight=1.0, + negative_class_weight=1.0, + unmatched_cls_target=None): + """Factory function for creating standard target assigners. + + Args: + reference: string referencing the type of TargetAssigner. + stage: string denoting stage: {proposal, detection}. + positive_class_weight: classification weight to be associated to positive + anchors (default: 1.0) + negative_class_weight: classification weight to be associated to negative + anchors (default: 1.0) + unmatched_cls_target: a float32 tensor with shape [d_1, d_2, ..., d_k] + which is consistent with the classification target for each + anchor (and can be empty for scalar targets). This shape must thus be + compatible with the groundtruth labels that are passed to the Assign + function (which have shape [num_gt_boxes, d_1, d_2, ..., d_k]). + If set to None, unmatched_cls_target is set to be 0 for each anchor. + + Returns: + TargetAssigner: desired target assigner. + + Raises: + ValueError: if combination reference+stage is invalid. + """ + if reference == 'Multibox' and stage == 'proposal': + similarity_calc = sim_calc.NegSqDistSimilarity() + matcher = bipartite_matcher.GreedyBipartiteMatcher() + box_coder = mean_stddev_box_coder.MeanStddevBoxCoder() + + elif reference == 'FasterRCNN' and stage == 'proposal': + similarity_calc = sim_calc.IouSimilarity() + matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=0.7, + unmatched_threshold=0.3, + force_match_for_each_row=True) + box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder( + scale_factors=[10.0, 10.0, 5.0, 5.0]) + + elif reference == 'FasterRCNN' and stage == 'detection': + similarity_calc = sim_calc.IouSimilarity() + # Uses all proposals with IOU < 0.5 as candidate negatives. + matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=0.5, + negatives_lower_than_unmatched=True) + box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder( + scale_factors=[10.0, 10.0, 5.0, 5.0]) + + elif reference == 'FastRCNN': + similarity_calc = sim_calc.IouSimilarity() + matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=0.5, + unmatched_threshold=0.1, + force_match_for_each_row=False, + negatives_lower_than_unmatched=False) + box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder() + + else: + raise ValueError('No valid combination of reference and stage.') + + return TargetAssigner(similarity_calc, matcher, box_coder, + positive_class_weight=positive_class_weight, + negative_class_weight=negative_class_weight, + unmatched_cls_target=unmatched_cls_target) + + +def batch_assign_targets(target_assigner, + anchors_batch, + gt_box_batch, + gt_class_targets_batch): + """Batched assignment of classification and regression targets. + + Args: + target_assigner: a target assigner. 
+ anchors_batch: BoxList representing N box anchors or list of BoxList objects + with length batch_size representing anchor sets. + gt_box_batch: a list of BoxList objects with length batch_size + representing groundtruth boxes for each image in the batch + gt_class_targets_batch: a list of tensors with length batch_size, where + each tensor has shape [num_gt_boxes_i, classification_target_size] and + num_gt_boxes_i is the number of boxes in the ith boxlist of + gt_box_batch. + + Returns: + batch_cls_targets: a tensor with shape [batch_size, num_anchors, + num_classes], + batch_cls_weights: a tensor with shape [batch_size, num_anchors], + batch_reg_targets: a tensor with shape [batch_size, num_anchors, + box_code_dimension] + batch_reg_weights: a tensor with shape [batch_size, num_anchors], + match_list: a list of matcher.Match objects encoding the match between + anchors and groundtruth boxes for each image of the batch, + with rows of the Match objects corresponding to groundtruth boxes + and columns corresponding to anchors. + Raises: + ValueError: if input list lengths are inconsistent, i.e., + batch_size == len(gt_box_batch) == len(gt_class_targets_batch) + and batch_size == len(anchors_batch) unless anchors_batch is a single + BoxList. + """ + if not isinstance(anchors_batch, list): + anchors_batch = len(gt_box_batch) * [anchors_batch] + if not all( + isinstance(anchors, box_list.BoxList) for anchors in anchors_batch): + raise ValueError('anchors_batch must be a BoxList or list of BoxLists.') + if not (len(anchors_batch) + == len(gt_box_batch) + == len(gt_class_targets_batch)): + raise ValueError('batch size incompatible with lengths of anchors_batch, ' + 'gt_box_batch and gt_class_targets_batch.') + cls_targets_list = [] + cls_weights_list = [] + reg_targets_list = [] + reg_weights_list = [] + match_list = [] + for anchors, gt_boxes, gt_class_targets in zip( + anchors_batch, gt_box_batch, gt_class_targets_batch): + (cls_targets, cls_weights, reg_targets, + reg_weights, match) = target_assigner.assign( + anchors, gt_boxes, gt_class_targets) + cls_targets_list.append(cls_targets) + cls_weights_list.append(cls_weights) + reg_targets_list.append(reg_targets) + reg_weights_list.append(reg_weights) + match_list.append(match) + batch_cls_targets = tf.stack(cls_targets_list) + batch_cls_weights = tf.stack(cls_weights_list) + batch_reg_targets = tf.stack(reg_targets_list) + batch_reg_weights = tf.stack(reg_weights_list) + return (batch_cls_targets, batch_cls_weights, batch_reg_targets, + batch_reg_weights, match_list) diff --git a/object_detection/core/target_assigner_test.py b/object_detection/core/target_assigner_test.py new file mode 100644 index 0000000000000000000000000000000000000000..92c7564897a268c5565d315e1c6cd5c4c26b0731 --- /dev/null +++ b/object_detection/core/target_assigner_test.py @@ -0,0 +1,682 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for object_detection.core.target_assigner.""" +import numpy as np +import tensorflow as tf + +from object_detection.box_coders import mean_stddev_box_coder +from object_detection.core import box_list +from object_detection.core import region_similarity_calculator +from object_detection.core import target_assigner as targetassigner +from object_detection.matchers import argmax_matcher +from object_detection.matchers import bipartite_matcher + + +class TargetAssignerTest(tf.test.TestCase): + + def test_assign_agnostic(self): + similarity_calc = region_similarity_calculator.NegSqDistSimilarity() + matcher = bipartite_matcher.GreedyBipartiteMatcher() + box_coder = mean_stddev_box_coder.MeanStddevBoxCoder() + target_assigner = targetassigner.TargetAssigner( + similarity_calc, matcher, box_coder, unmatched_cls_target=None) + + prior_means = tf.constant([[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 1.0, 0.8], + [0, 0.5, .5, 1.0]]) + prior_stddevs = tf.constant(3 * [4 * [.1]]) + priors = box_list.BoxList(prior_means) + priors.add_field('stddev', prior_stddevs) + + box_corners = [[0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 0.9, 0.9]] + boxes = box_list.BoxList(tf.constant(box_corners)) + exp_cls_targets = [[1], [1], [0]] + exp_cls_weights = [1, 1, 1] + exp_reg_targets = [[0, 0, 0, 0], + [0, 0, -1, 1], + [0, 0, 0, 0]] + exp_reg_weights = [1, 1, 0] + exp_matching_anchors = [0, 1] + + result = target_assigner.assign(priors, boxes, num_valid_rows=2) + (cls_targets, cls_weights, reg_targets, reg_weights, match) = result + + with self.test_session() as sess: + (cls_targets_out, cls_weights_out, + reg_targets_out, reg_weights_out, matching_anchors_out) = sess.run( + [cls_targets, cls_weights, reg_targets, reg_weights, + match.matched_column_indices()]) + + self.assertAllClose(cls_targets_out, exp_cls_targets) + self.assertAllClose(cls_weights_out, exp_cls_weights) + self.assertAllClose(reg_targets_out, exp_reg_targets) + self.assertAllClose(reg_weights_out, exp_reg_weights) + self.assertAllClose(matching_anchors_out, exp_matching_anchors) + self.assertEquals(cls_targets_out.dtype, np.float32) + self.assertEquals(cls_weights_out.dtype, np.float32) + self.assertEquals(reg_targets_out.dtype, np.float32) + self.assertEquals(reg_weights_out.dtype, np.float32) + self.assertEquals(matching_anchors_out.dtype, np.int32) + + def test_assign_with_ignored_matches(self): + # Note: test is very similar to above. The third box matched with an IOU + # of 0.35, which is between the matched and unmatched threshold. This means + # That like above the expected classification targets are [1, 1, 0]. + # Unlike above, the third target is ignored and therefore expected + # classification weights are [1, 1, 0]. 
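+    # (Added note: ignored anchors receive zero classification weight, so they
+    # contribute to neither the positive nor the negative classification loss.)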
+ similarity_calc = region_similarity_calculator.IouSimilarity() + matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=0.5, + unmatched_threshold=0.3) + box_coder = mean_stddev_box_coder.MeanStddevBoxCoder() + target_assigner = targetassigner.TargetAssigner( + similarity_calc, matcher, box_coder) + + prior_means = tf.constant([[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 1.0, 0.8], + [0.0, 0.5, .9, 1.0]]) + prior_stddevs = tf.constant(3 * [4 * [.1]]) + priors = box_list.BoxList(prior_means) + priors.add_field('stddev', prior_stddevs) + + box_corners = [[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 0.9, 0.9]] + boxes = box_list.BoxList(tf.constant(box_corners)) + exp_cls_targets = [[1], [1], [0]] + exp_cls_weights = [1, 1, 0] + exp_reg_targets = [[0, 0, 0, 0], + [0, 0, -1, 1], + [0, 0, 0, 0]] + exp_reg_weights = [1, 1, 0] + exp_matching_anchors = [0, 1] + + result = target_assigner.assign(priors, boxes) + (cls_targets, cls_weights, reg_targets, reg_weights, match) = result + with self.test_session() as sess: + (cls_targets_out, cls_weights_out, + reg_targets_out, reg_weights_out, matching_anchors_out) = sess.run( + [cls_targets, cls_weights, reg_targets, reg_weights, + match.matched_column_indices()]) + + self.assertAllClose(cls_targets_out, exp_cls_targets) + self.assertAllClose(cls_weights_out, exp_cls_weights) + self.assertAllClose(reg_targets_out, exp_reg_targets) + self.assertAllClose(reg_weights_out, exp_reg_weights) + self.assertAllClose(matching_anchors_out, exp_matching_anchors) + self.assertEquals(cls_targets_out.dtype, np.float32) + self.assertEquals(cls_weights_out.dtype, np.float32) + self.assertEquals(reg_targets_out.dtype, np.float32) + self.assertEquals(reg_weights_out.dtype, np.float32) + self.assertEquals(matching_anchors_out.dtype, np.int32) + + def test_assign_multiclass(self): + similarity_calc = region_similarity_calculator.NegSqDistSimilarity() + matcher = bipartite_matcher.GreedyBipartiteMatcher() + box_coder = mean_stddev_box_coder.MeanStddevBoxCoder() + unmatched_cls_target = tf.constant([1, 0, 0, 0, 0, 0, 0], tf.float32) + target_assigner = targetassigner.TargetAssigner( + similarity_calc, matcher, box_coder, + unmatched_cls_target=unmatched_cls_target) + + prior_means = tf.constant([[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 1.0, 0.8], + [0, 0.5, .5, 1.0], + [.75, 0, 1.0, .25]]) + prior_stddevs = tf.constant(4 * [4 * [.1]]) + priors = box_list.BoxList(prior_means) + priors.add_field('stddev', prior_stddevs) + + box_corners = [[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 0.9, 0.9], + [.75, 0, .95, .27]] + boxes = box_list.BoxList(tf.constant(box_corners)) + + groundtruth_labels = tf.constant([[0, 1, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 1, 0], + [0, 0, 0, 1, 0, 0, 0]], tf.float32) + + exp_cls_targets = [[0, 1, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 1, 0], + [1, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 1, 0, 0, 0]] + exp_cls_weights = [1, 1, 1, 1] + exp_reg_targets = [[0, 0, 0, 0], + [0, 0, -1, 1], + [0, 0, 0, 0], + [0, 0, -.5, .2]] + exp_reg_weights = [1, 1, 0, 1] + exp_matching_anchors = [0, 1, 3] + + result = target_assigner.assign(priors, boxes, groundtruth_labels, + num_valid_rows=3) + (cls_targets, cls_weights, reg_targets, reg_weights, match) = result + with self.test_session() as sess: + (cls_targets_out, cls_weights_out, + reg_targets_out, reg_weights_out, matching_anchors_out) = sess.run( + [cls_targets, cls_weights, reg_targets, reg_weights, + match.matched_column_indices()]) + + self.assertAllClose(cls_targets_out, exp_cls_targets) + self.assertAllClose(cls_weights_out, exp_cls_weights) + 
self.assertAllClose(reg_targets_out, exp_reg_targets) + self.assertAllClose(reg_weights_out, exp_reg_weights) + self.assertAllClose(matching_anchors_out, exp_matching_anchors) + self.assertEquals(cls_targets_out.dtype, np.float32) + self.assertEquals(cls_weights_out.dtype, np.float32) + self.assertEquals(reg_targets_out.dtype, np.float32) + self.assertEquals(reg_weights_out.dtype, np.float32) + self.assertEquals(matching_anchors_out.dtype, np.int32) + + def test_assign_multiclass_unequal_class_weights(self): + similarity_calc = region_similarity_calculator.NegSqDistSimilarity() + matcher = bipartite_matcher.GreedyBipartiteMatcher() + box_coder = mean_stddev_box_coder.MeanStddevBoxCoder() + unmatched_cls_target = tf.constant([1, 0, 0, 0, 0, 0, 0], tf.float32) + target_assigner = targetassigner.TargetAssigner( + similarity_calc, matcher, box_coder, + positive_class_weight=1.0, negative_class_weight=0.5, + unmatched_cls_target=unmatched_cls_target) + + prior_means = tf.constant([[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 1.0, 0.8], + [0, 0.5, .5, 1.0], + [.75, 0, 1.0, .25]]) + prior_stddevs = tf.constant(4 * [4 * [.1]]) + priors = box_list.BoxList(prior_means) + priors.add_field('stddev', prior_stddevs) + + box_corners = [[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 0.9, 0.9], + [.75, 0, .95, .27]] + boxes = box_list.BoxList(tf.constant(box_corners)) + + groundtruth_labels = tf.constant([[0, 1, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 1, 0], + [0, 0, 0, 1, 0, 0, 0]], tf.float32) + + exp_cls_weights = [1, 1, .5, 1] + result = target_assigner.assign(priors, boxes, groundtruth_labels, + num_valid_rows=3) + (_, cls_weights, _, _, _) = result + with self.test_session() as sess: + cls_weights_out = sess.run(cls_weights) + self.assertAllClose(cls_weights_out, exp_cls_weights) + + def test_assign_multidimensional_class_targets(self): + similarity_calc = region_similarity_calculator.NegSqDistSimilarity() + matcher = bipartite_matcher.GreedyBipartiteMatcher() + box_coder = mean_stddev_box_coder.MeanStddevBoxCoder() + unmatched_cls_target = tf.constant([[0, 0], [0, 0]], tf.float32) + target_assigner = targetassigner.TargetAssigner( + similarity_calc, matcher, box_coder, + unmatched_cls_target=unmatched_cls_target) + + prior_means = tf.constant([[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 1.0, 0.8], + [0, 0.5, .5, 1.0], + [.75, 0, 1.0, .25]]) + prior_stddevs = tf.constant(4 * [4 * [.1]]) + priors = box_list.BoxList(prior_means) + priors.add_field('stddev', prior_stddevs) + + box_corners = [[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 0.9, 0.9], + [.75, 0, .95, .27]] + boxes = box_list.BoxList(tf.constant(box_corners)) + + groundtruth_labels = tf.constant([[[0, 1], [1, 0]], + [[1, 0], [0, 1]], + [[0, 1], [1, .5]]], tf.float32) + + exp_cls_targets = [[[0, 1], [1, 0]], + [[1, 0], [0, 1]], + [[0, 0], [0, 0]], + [[0, 1], [1, .5]]] + exp_cls_weights = [1, 1, 1, 1] + exp_reg_targets = [[0, 0, 0, 0], + [0, 0, -1, 1], + [0, 0, 0, 0], + [0, 0, -.5, .2]] + exp_reg_weights = [1, 1, 0, 1] + exp_matching_anchors = [0, 1, 3] + + result = target_assigner.assign(priors, boxes, groundtruth_labels, + num_valid_rows=3) + (cls_targets, cls_weights, reg_targets, reg_weights, match) = result + with self.test_session() as sess: + (cls_targets_out, cls_weights_out, + reg_targets_out, reg_weights_out, matching_anchors_out) = sess.run( + [cls_targets, cls_weights, reg_targets, reg_weights, + match.matched_column_indices()]) + + self.assertAllClose(cls_targets_out, exp_cls_targets) + self.assertAllClose(cls_weights_out, exp_cls_weights) + self.assertAllClose(reg_targets_out, 
exp_reg_targets) + self.assertAllClose(reg_weights_out, exp_reg_weights) + self.assertAllClose(matching_anchors_out, exp_matching_anchors) + self.assertEquals(cls_targets_out.dtype, np.float32) + self.assertEquals(cls_weights_out.dtype, np.float32) + self.assertEquals(reg_targets_out.dtype, np.float32) + self.assertEquals(reg_weights_out.dtype, np.float32) + self.assertEquals(matching_anchors_out.dtype, np.int32) + + def test_assign_empty_groundtruth(self): + similarity_calc = region_similarity_calculator.NegSqDistSimilarity() + matcher = bipartite_matcher.GreedyBipartiteMatcher() + box_coder = mean_stddev_box_coder.MeanStddevBoxCoder() + unmatched_cls_target = tf.constant([0, 0, 0], tf.float32) + target_assigner = targetassigner.TargetAssigner( + similarity_calc, matcher, box_coder, + unmatched_cls_target=unmatched_cls_target) + + prior_means = tf.constant([[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 1.0, 0.8], + [0, 0.5, .5, 1.0], + [.75, 0, 1.0, .25]]) + prior_stddevs = tf.constant(4 * [4 * [.1]]) + priors = box_list.BoxList(prior_means) + priors.add_field('stddev', prior_stddevs) + + box_corners_expanded = tf.constant([[0.0, 0.0, 0.0, 0.0]]) + box_corners = tf.slice(box_corners_expanded, [0, 0], [0, 4]) + boxes = box_list.BoxList(box_corners) + + groundtruth_labels_expanded = tf.constant([[0, 0, 0]], tf.float32) + groundtruth_labels = tf.slice(groundtruth_labels_expanded, [0, 0], [0, 3]) + + exp_cls_targets = [[0, 0, 0], + [0, 0, 0], + [0, 0, 0], + [0, 0, 0]] + exp_cls_weights = [1, 1, 1, 1] + exp_reg_targets = [[0, 0, 0, 0], + [0, 0, 0, 0], + [0, 0, 0, 0], + [0, 0, 0, 0]] + exp_reg_weights = [0, 0, 0, 0] + exp_matching_anchors = [] + + result = target_assigner.assign(priors, boxes, groundtruth_labels) + (cls_targets, cls_weights, reg_targets, reg_weights, match) = result + with self.test_session() as sess: + (cls_targets_out, cls_weights_out, + reg_targets_out, reg_weights_out, matching_anchors_out) = sess.run( + [cls_targets, cls_weights, reg_targets, reg_weights, + match.matched_column_indices()]) + + self.assertAllClose(cls_targets_out, exp_cls_targets) + self.assertAllClose(cls_weights_out, exp_cls_weights) + self.assertAllClose(reg_targets_out, exp_reg_targets) + self.assertAllClose(reg_weights_out, exp_reg_weights) + self.assertAllClose(matching_anchors_out, exp_matching_anchors) + self.assertEquals(cls_targets_out.dtype, np.float32) + self.assertEquals(cls_weights_out.dtype, np.float32) + self.assertEquals(reg_targets_out.dtype, np.float32) + self.assertEquals(reg_weights_out.dtype, np.float32) + self.assertEquals(matching_anchors_out.dtype, np.int32) + + def test_raises_error_on_invalid_groundtruth_labels(self): + similarity_calc = region_similarity_calculator.NegSqDistSimilarity() + matcher = bipartite_matcher.GreedyBipartiteMatcher() + box_coder = mean_stddev_box_coder.MeanStddevBoxCoder() + unmatched_cls_target = tf.constant([[0, 0], [0, 0], [0, 0]], tf.float32) + target_assigner = targetassigner.TargetAssigner( + similarity_calc, matcher, box_coder, + unmatched_cls_target=unmatched_cls_target) + + prior_means = tf.constant([[0.0, 0.0, 0.5, 0.5]]) + prior_stddevs = tf.constant([[1.0, 1.0, 1.0, 1.0]]) + priors = box_list.BoxList(prior_means) + priors.add_field('stddev', prior_stddevs) + + box_corners = [[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 0.9, 0.9], + [.75, 0, .95, .27]] + boxes = box_list.BoxList(tf.constant(box_corners)) + + groundtruth_labels = tf.constant([[[0, 1], [1, 0]]], tf.float32) + + with self.assertRaises(ValueError): + target_assigner.assign(priors, boxes, 
groundtruth_labels, + num_valid_rows=3) + + +class BatchTargetAssignerTest(tf.test.TestCase): + + def _get_agnostic_target_assigner(self): + similarity_calc = region_similarity_calculator.NegSqDistSimilarity() + matcher = bipartite_matcher.GreedyBipartiteMatcher() + box_coder = mean_stddev_box_coder.MeanStddevBoxCoder() + return targetassigner.TargetAssigner( + similarity_calc, matcher, box_coder, + positive_class_weight=1.0, + negative_class_weight=1.0, + unmatched_cls_target=None) + + def _get_multi_class_target_assigner(self, num_classes): + similarity_calc = region_similarity_calculator.NegSqDistSimilarity() + matcher = bipartite_matcher.GreedyBipartiteMatcher() + box_coder = mean_stddev_box_coder.MeanStddevBoxCoder() + unmatched_cls_target = tf.constant([1] + num_classes * [0], tf.float32) + return targetassigner.TargetAssigner( + similarity_calc, matcher, box_coder, + positive_class_weight=1.0, + negative_class_weight=1.0, + unmatched_cls_target=unmatched_cls_target) + + def _get_multi_dimensional_target_assigner(self, target_dimensions): + similarity_calc = region_similarity_calculator.NegSqDistSimilarity() + matcher = bipartite_matcher.GreedyBipartiteMatcher() + box_coder = mean_stddev_box_coder.MeanStddevBoxCoder() + unmatched_cls_target = tf.constant(np.zeros(target_dimensions), + tf.float32) + return targetassigner.TargetAssigner( + similarity_calc, matcher, box_coder, + positive_class_weight=1.0, + negative_class_weight=1.0, + unmatched_cls_target=unmatched_cls_target) + + def test_batch_assign_targets(self): + box_list1 = box_list.BoxList(tf.constant([[0., 0., 0.2, 0.2]])) + box_list2 = box_list.BoxList(tf.constant( + [[0, 0.25123152, 1, 1], + [0.015789, 0.0985, 0.55789, 0.3842]] + )) + + gt_box_batch = [box_list1, box_list2] + gt_class_targets = [None, None] + + prior_means = tf.constant([[0, 0, .25, .25], + [0, .25, 1, 1], + [0, .1, .5, .5], + [.75, .75, 1, 1]]) + prior_stddevs = tf.constant([[.1, .1, .1, .1], + [.1, .1, .1, .1], + [.1, .1, .1, .1], + [.1, .1, .1, .1]]) + priors = box_list.BoxList(prior_means) + priors.add_field('stddev', prior_stddevs) + + exp_reg_targets = [[[0, 0, -0.5, -0.5], + [0, 0, 0, 0], + [0, 0, 0, 0,], + [0, 0, 0, 0,],], + [[0, 0, 0, 0,], + [0, 0.01231521, 0, 0], + [0.15789001, -0.01500003, 0.57889998, -1.15799987], + [0, 0, 0, 0]]] + exp_cls_weights = [[1, 1, 1, 1], + [1, 1, 1, 1]] + exp_cls_targets = [[[1], [0], [0], [0]], + [[0], [1], [1], [0]]] + exp_reg_weights = [[1, 0, 0, 0], + [0, 1, 1, 0]] + exp_match_0 = [0] + exp_match_1 = [1, 2] + + agnostic_target_assigner = self._get_agnostic_target_assigner() + (cls_targets, cls_weights, reg_targets, reg_weights, + match_list) = targetassigner.batch_assign_targets( + agnostic_target_assigner, priors, gt_box_batch, gt_class_targets) + self.assertTrue(isinstance(match_list, list) and len(match_list) == 2) + with self.test_session() as sess: + (cls_targets_out, cls_weights_out, reg_targets_out, reg_weights_out, + match_out_0, match_out_1) = sess.run([ + cls_targets, cls_weights, reg_targets, reg_weights] + [ + match.matched_column_indices() for match in match_list]) + self.assertAllClose(cls_targets_out, exp_cls_targets) + self.assertAllClose(cls_weights_out, exp_cls_weights) + self.assertAllClose(reg_targets_out, exp_reg_targets) + self.assertAllClose(reg_weights_out, exp_reg_weights) + self.assertAllClose(match_out_0, exp_match_0) + self.assertAllClose(match_out_1, exp_match_1) + + def test_batch_assign_multiclass_targets(self): + box_list1 = box_list.BoxList(tf.constant([[0., 0., 0.2, 0.2]])) + + 
box_list2 = box_list.BoxList(tf.constant( + [[0, 0.25123152, 1, 1], + [0.015789, 0.0985, 0.55789, 0.3842]] + )) + + gt_box_batch = [box_list1, box_list2] + + class_targets1 = tf.constant([[0, 1, 0, 0]], tf.float32) + class_targets2 = tf.constant([[0, 0, 0, 1], + [0, 0, 1, 0]], tf.float32) + + gt_class_targets = [class_targets1, class_targets2] + + prior_means = tf.constant([[0, 0, .25, .25], + [0, .25, 1, 1], + [0, .1, .5, .5], + [.75, .75, 1, 1]]) + prior_stddevs = tf.constant([[.1, .1, .1, .1], + [.1, .1, .1, .1], + [.1, .1, .1, .1], + [.1, .1, .1, .1]]) + priors = box_list.BoxList(prior_means) + priors.add_field('stddev', prior_stddevs) + + exp_reg_targets = [[[0, 0, -0.5, -0.5], + [0, 0, 0, 0], + [0, 0, 0, 0], + [0, 0, 0, 0]], + [[0, 0, 0, 0], + [0, 0.01231521, 0, 0], + [0.15789001, -0.01500003, 0.57889998, -1.15799987], + [0, 0, 0, 0]]] + exp_cls_weights = [[1, 1, 1, 1], + [1, 1, 1, 1]] + exp_cls_targets = [[[0, 1, 0, 0], + [1, 0, 0, 0], + [1, 0, 0, 0], + [1, 0, 0, 0]], + [[1, 0, 0, 0], + [0, 0, 0, 1], + [0, 0, 1, 0], + [1, 0, 0, 0]]] + exp_reg_weights = [[1, 0, 0, 0], + [0, 1, 1, 0]] + exp_match_0 = [0] + exp_match_1 = [1, 2] + + multiclass_target_assigner = self._get_multi_class_target_assigner( + num_classes=3) + + (cls_targets, cls_weights, reg_targets, reg_weights, + match_list) = targetassigner.batch_assign_targets( + multiclass_target_assigner, priors, gt_box_batch, gt_class_targets) + self.assertTrue(isinstance(match_list, list) and len(match_list) == 2) + with self.test_session() as sess: + (cls_targets_out, cls_weights_out, reg_targets_out, reg_weights_out, + match_out_0, match_out_1) = sess.run([ + cls_targets, cls_weights, reg_targets, reg_weights] + [ + match.matched_column_indices() for match in match_list]) + self.assertAllClose(cls_targets_out, exp_cls_targets) + self.assertAllClose(cls_weights_out, exp_cls_weights) + self.assertAllClose(reg_targets_out, exp_reg_targets) + self.assertAllClose(reg_weights_out, exp_reg_weights) + self.assertAllClose(match_out_0, exp_match_0) + self.assertAllClose(match_out_1, exp_match_1) + + def test_batch_assign_multidimensional_targets(self): + box_list1 = box_list.BoxList(tf.constant([[0., 0., 0.2, 0.2]])) + + box_list2 = box_list.BoxList(tf.constant( + [[0, 0.25123152, 1, 1], + [0.015789, 0.0985, 0.55789, 0.3842]] + )) + + gt_box_batch = [box_list1, box_list2] + class_targets1 = tf.constant([[[0, 1, 1], + [1, 1, 0]]], tf.float32) + class_targets2 = tf.constant([[[0, 1, 1], + [1, 1, 0]], + [[0, 0, 1], + [0, 0, 1]]], tf.float32) + + gt_class_targets = [class_targets1, class_targets2] + + prior_means = tf.constant([[0, 0, .25, .25], + [0, .25, 1, 1], + [0, .1, .5, .5], + [.75, .75, 1, 1]]) + prior_stddevs = tf.constant([[.1, .1, .1, .1], + [.1, .1, .1, .1], + [.1, .1, .1, .1], + [.1, .1, .1, .1]]) + priors = box_list.BoxList(prior_means) + priors.add_field('stddev', prior_stddevs) + + exp_reg_targets = [[[0, 0, -0.5, -0.5], + [0, 0, 0, 0], + [0, 0, 0, 0], + [0, 0, 0, 0]], + [[0, 0, 0, 0], + [0, 0.01231521, 0, 0], + [0.15789001, -0.01500003, 0.57889998, -1.15799987], + [0, 0, 0, 0]]] + exp_cls_weights = [[1, 1, 1, 1], + [1, 1, 1, 1]] + + exp_cls_targets = [[[[0., 1., 1.], + [1., 1., 0.]], + [[0., 0., 0.], + [0., 0., 0.]], + [[0., 0., 0.], + [0., 0., 0.]], + [[0., 0., 0.], + [0., 0., 0.]]], + [[[0., 0., 0.], + [0., 0., 0.]], + [[0., 1., 1.], + [1., 1., 0.]], + [[0., 0., 1.], + [0., 0., 1.]], + [[0., 0., 0.], + [0., 0., 0.]]]] + exp_reg_weights = [[1, 0, 0, 0], + [0, 1, 1, 0]] + exp_match_0 = [0] + exp_match_1 = [1, 2] + + 
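+    # Added note: anchors 1-3 of the first image and anchors 0 and 3 of the
+    # second image are unmatched, so they fall back to the all-zero (2, 3)
+    # unmatched class target in exp_cls_targets above.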
multiclass_target_assigner = self._get_multi_dimensional_target_assigner( + target_dimensions=(2, 3)) + + (cls_targets, cls_weights, reg_targets, reg_weights, + match_list) = targetassigner.batch_assign_targets( + multiclass_target_assigner, priors, gt_box_batch, gt_class_targets) + self.assertTrue(isinstance(match_list, list) and len(match_list) == 2) + with self.test_session() as sess: + (cls_targets_out, cls_weights_out, reg_targets_out, reg_weights_out, + match_out_0, match_out_1) = sess.run([ + cls_targets, cls_weights, reg_targets, reg_weights] + [ + match.matched_column_indices() for match in match_list]) + self.assertAllClose(cls_targets_out, exp_cls_targets) + self.assertAllClose(cls_weights_out, exp_cls_weights) + self.assertAllClose(reg_targets_out, exp_reg_targets) + self.assertAllClose(reg_weights_out, exp_reg_weights) + self.assertAllClose(match_out_0, exp_match_0) + self.assertAllClose(match_out_1, exp_match_1) + + def test_batch_assign_empty_groundtruth(self): + box_coords_expanded = tf.zeros((1, 4), tf.float32) + box_coords = tf.slice(box_coords_expanded, [0, 0], [0, 4]) + box_list1 = box_list.BoxList(box_coords) + gt_box_batch = [box_list1] + + prior_means = tf.constant([[0, 0, .25, .25], + [0, .25, 1, 1]]) + prior_stddevs = tf.constant([[.1, .1, .1, .1], + [.1, .1, .1, .1]]) + priors = box_list.BoxList(prior_means) + priors.add_field('stddev', prior_stddevs) + + exp_reg_targets = [[[0, 0, 0, 0], + [0, 0, 0, 0]]] + exp_cls_weights = [[1, 1]] + exp_cls_targets = [[[1, 0, 0, 0], + [1, 0, 0, 0]]] + exp_reg_weights = [[0, 0]] + exp_match_0 = [] + + num_classes = 3 + pad = 1 + gt_class_targets = tf.zeros((0, num_classes + pad)) + gt_class_targets_batch = [gt_class_targets] + + multiclass_target_assigner = self._get_multi_class_target_assigner( + num_classes=3) + + (cls_targets, cls_weights, reg_targets, reg_weights, + match_list) = targetassigner.batch_assign_targets( + multiclass_target_assigner, priors, + gt_box_batch, gt_class_targets_batch) + self.assertTrue(isinstance(match_list, list) and len(match_list) == 1) + with self.test_session() as sess: + (cls_targets_out, cls_weights_out, reg_targets_out, reg_weights_out, + match_out_0) = sess.run([ + cls_targets, cls_weights, reg_targets, reg_weights] + [ + match.matched_column_indices() for match in match_list]) + self.assertAllClose(cls_targets_out, exp_cls_targets) + self.assertAllClose(cls_weights_out, exp_cls_weights) + self.assertAllClose(reg_targets_out, exp_reg_targets) + self.assertAllClose(reg_weights_out, exp_reg_weights) + self.assertAllClose(match_out_0, exp_match_0) + + +class CreateTargetAssignerTest(tf.test.TestCase): + + def test_create_target_assigner(self): + """Tests that named constructor gives working target assigners. + + TODO: Make this test more general. + """ + corners = [[0.0, 0.0, 1.0, 1.0]] + groundtruth = box_list.BoxList(tf.constant(corners)) + + priors = box_list.BoxList(tf.constant(corners)) + prior_stddevs = tf.constant([[1.0, 1.0, 1.0, 1.0]]) + priors.add_field('stddev', prior_stddevs) + multibox_ta = (targetassigner + .create_target_assigner('Multibox', stage='proposal')) + multibox_ta.assign(priors, groundtruth) + # No tests on output, as that may vary arbitrarily as new target assigners + # are added. As long as it is constructed correctly and runs without errors, + # tests on the individual assigners cover correctness of the assignments. 
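+    # Added note: the remaining named configurations ('FasterRCNN' proposal,
+    # 'FasterRCNN' detection, and 'FastRCNN') are exercised the same way below:
+    # construct the assigner, then smoke-test a single assign() call.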
+ + anchors = box_list.BoxList(tf.constant(corners)) + faster_rcnn_proposals_ta = (targetassigner + .create_target_assigner('FasterRCNN', + stage='proposal')) + faster_rcnn_proposals_ta.assign(anchors, groundtruth) + + fast_rcnn_ta = (targetassigner + .create_target_assigner('FastRCNN')) + fast_rcnn_ta.assign(anchors, groundtruth) + + faster_rcnn_detection_ta = (targetassigner + .create_target_assigner('FasterRCNN', + stage='detection')) + faster_rcnn_detection_ta.assign(anchors, groundtruth) + + with self.assertRaises(ValueError): + targetassigner.create_target_assigner('InvalidDetector', + stage='invalid_stage') + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/create_pascal_tf_record.py b/object_detection/create_pascal_tf_record.py new file mode 100644 index 0000000000000000000000000000000000000000..9da40d90e368742d20af1816c76c0491239fbeab --- /dev/null +++ b/object_detection/create_pascal_tf_record.py @@ -0,0 +1,183 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r"""Convert raw PASCAL dataset to TFRecord for object_detection. + +Example usage: + ./create_pascal_tf_record --data_dir=/home/user/VOCdevkit \ + --year=VOC2012 \ + --output_path=/home/user/pascal.record +""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import hashlib +import io +import logging +import os + +from lxml import etree +import PIL.Image +import tensorflow as tf + +from object_detection.utils import dataset_util +from object_detection.utils import label_map_util + + +flags = tf.app.flags +flags.DEFINE_string('data_dir', '', 'Root directory to raw PASCAL VOC dataset.') +flags.DEFINE_string('set', 'train', 'Convert training set, validation set or ' + 'merged set.') +flags.DEFINE_string('annotations_dir', 'Annotations', + '(Relative) path to annotations directory.') +flags.DEFINE_string('year', 'VOC2007', 'Desired challenge year.') +flags.DEFINE_string('output_path', '', 'Path to output TFRecord') +flags.DEFINE_string('label_map_path', 'data/pascal_label_map.pbtxt', + 'Path to label map proto') +flags.DEFINE_boolean('ignore_difficult_instances', False, 'Whether to ignore ' + 'difficult instances') +FLAGS = flags.FLAGS + +SETS = ['train', 'val', 'trainval', 'test'] +YEARS = ['VOC2007', 'VOC2012', 'merged'] + + +def dict_to_tf_example(data, + dataset_directory, + label_map_dict, + ignore_difficult_instances=False, + image_subdirectory='JPEGImages'): + """Convert XML derived dict to tf.Example proto. + + Notice that this function normalizes the bounding box coordinates provided + by the raw data. + + Args: + data: dict holding PASCAL XML fields for a single image (obtained by + running dataset_util.recursive_parse_xml_to_dict) + dataset_directory: Path to root directory holding PASCAL dataset + label_map_dict: A map from string label names to integers ids. 
+ ignore_difficult_instances: Whether to skip difficult instances in the + dataset (default: False). + image_subdirectory: String specifying subdirectory within the + PASCAL dataset directory holding the actual image data. + + Returns: + example: The converted tf.Example. + + Raises: + ValueError: if the image pointed to by data['filename'] is not a valid JPEG + """ + img_path = os.path.join(data['folder'], image_subdirectory, data['filename']) + full_path = os.path.join(dataset_directory, img_path) + with tf.gfile.GFile(full_path, 'rb') as fid: + encoded_jpg = fid.read() + encoded_jpg_io = io.BytesIO(encoded_jpg) + image = PIL.Image.open(encoded_jpg_io) + if image.format != 'JPEG': + raise ValueError('Image format not JPEG') + key = hashlib.sha256(encoded_jpg).hexdigest() + + width = int(data['size']['width']) + height = int(data['size']['height']) + + xmin = [] + ymin = [] + xmax = [] + ymax = [] + classes = [] + classes_text = [] + truncated = [] + poses = [] + difficult_obj = [] + for obj in data['object']: + difficult = bool(int(obj['difficult'])) + if ignore_difficult_instances and difficult: + continue + + difficult_obj.append(int(difficult)) + + xmin.append(float(obj['bndbox']['xmin']) / width) + ymin.append(float(obj['bndbox']['ymin']) / height) + xmax.append(float(obj['bndbox']['xmax']) / width) + ymax.append(float(obj['bndbox']['ymax']) / height) + classes_text.append(obj['name'].encode('utf8')) + classes.append(label_map_dict[obj['name']]) + truncated.append(int(obj['truncated'])) + poses.append(obj['pose'].encode('utf8')) + + example = tf.train.Example(features=tf.train.Features(feature={ + 'image/height': dataset_util.int64_feature(height), + 'image/width': dataset_util.int64_feature(width), + 'image/filename': dataset_util.bytes_feature( + data['filename'].encode('utf8')), + 'image/source_id': dataset_util.bytes_feature( + data['filename'].encode('utf8')), + 'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')), + 'image/encoded': dataset_util.bytes_feature(encoded_jpg), + 'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')), + 'image/object/bbox/xmin': dataset_util.float_list_feature(xmin), + 'image/object/bbox/xmax': dataset_util.float_list_feature(xmax), + 'image/object/bbox/ymin': dataset_util.float_list_feature(ymin), + 'image/object/bbox/ymax': dataset_util.float_list_feature(ymax), + 'image/object/class/text': dataset_util.bytes_list_feature(classes_text), + 'image/object/class/label': dataset_util.int64_list_feature(classes), + 'image/object/difficult': dataset_util.int64_list_feature(difficult_obj), + 'image/object/truncated': dataset_util.int64_list_feature(truncated), + 'image/object/view': dataset_util.bytes_list_feature(poses), + })) + return example + + +def main(_): + if FLAGS.set not in SETS: + raise ValueError('set must be in : {}'.format(SETS)) + if FLAGS.year not in YEARS: + raise ValueError('year must be in : {}'.format(YEARS)) + + data_dir = FLAGS.data_dir + years = ['VOC2007', 'VOC2012'] + if FLAGS.year != 'merged': + years = [FLAGS.year] + + writer = tf.python_io.TFRecordWriter(FLAGS.output_path) + + label_map_dict = label_map_util.get_label_map_dict(FLAGS.label_map_path) + + for year in years: + logging.info('Reading from PASCAL %s dataset.', year) + examples_path = os.path.join(data_dir, year, 'ImageSets', 'Main', + 'aeroplane_' + FLAGS.set + '.txt') + annotations_dir = os.path.join(data_dir, year, FLAGS.annotations_dir) + examples_list = dataset_util.read_examples_list(examples_path) + for idx, example in 
enumerate(examples_list): + if idx % 100 == 0: + logging.info('On image %d of %d', idx, len(examples_list)) + path = os.path.join(annotations_dir, example + '.xml') + with tf.gfile.GFile(path, 'r') as fid: + xml_str = fid.read() + xml = etree.fromstring(xml_str) + data = dataset_util.recursive_parse_xml_to_dict(xml)['annotation'] + + tf_example = dict_to_tf_example(data, FLAGS.data_dir, label_map_dict, + FLAGS.ignore_difficult_instances) + writer.write(tf_example.SerializeToString()) + + writer.close() + + +if __name__ == '__main__': + tf.app.run() diff --git a/object_detection/create_pascal_tf_record_test.py b/object_detection/create_pascal_tf_record_test.py new file mode 100644 index 0000000000000000000000000000000000000000..dd29c6c2be4f9fd12e42905b2ed00a04f6c6db48 --- /dev/null +++ b/object_detection/create_pascal_tf_record_test.py @@ -0,0 +1,118 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Test for create_pascal_tf_record.py.""" + +import os + +import numpy as np +import PIL.Image +import tensorflow as tf + +from object_detection import create_pascal_tf_record + + +class DictToTFExampleTest(tf.test.TestCase): + + def _assertProtoEqual(self, proto_field, expectation): + """Helper function to assert if a proto field equals some value. + + Args: + proto_field: The protobuf field to compare. + expectation: The expected value of the protobuf field. 
+ """ + proto_list = [p for p in proto_field] + self.assertListEqual(proto_list, expectation) + + def test_dict_to_tf_example(self): + image_file_name = 'tmp_image.jpg' + image_data = np.random.rand(256, 256, 3) + save_path = os.path.join(self.get_temp_dir(), image_file_name) + image = PIL.Image.fromarray(image_data, 'RGB') + image.save(save_path) + + data = { + 'folder': '', + 'filename': image_file_name, + 'size': { + 'height': 256, + 'width': 256, + }, + 'object': [ + { + 'difficult': 1, + 'bndbox': { + 'xmin': 64, + 'ymin': 64, + 'xmax': 192, + 'ymax': 192, + }, + 'name': 'person', + 'truncated': 0, + 'pose': '', + }, + ], + } + + label_map_dict = { + 'background': 0, + 'person': 1, + 'notperson': 2, + } + + example = create_pascal_tf_record.dict_to_tf_example( + data, self.get_temp_dir(), label_map_dict, image_subdirectory='') + self._assertProtoEqual( + example.features.feature['image/height'].int64_list.value, [256]) + self._assertProtoEqual( + example.features.feature['image/width'].int64_list.value, [256]) + self._assertProtoEqual( + example.features.feature['image/filename'].bytes_list.value, + [image_file_name]) + self._assertProtoEqual( + example.features.feature['image/source_id'].bytes_list.value, + [image_file_name]) + self._assertProtoEqual( + example.features.feature['image/format'].bytes_list.value, ['jpeg']) + self._assertProtoEqual( + example.features.feature['image/object/bbox/xmin'].float_list.value, + [0.25]) + self._assertProtoEqual( + example.features.feature['image/object/bbox/ymin'].float_list.value, + [0.25]) + self._assertProtoEqual( + example.features.feature['image/object/bbox/xmax'].float_list.value, + [0.75]) + self._assertProtoEqual( + example.features.feature['image/object/bbox/ymax'].float_list.value, + [0.75]) + self._assertProtoEqual( + example.features.feature['image/object/class/text'].bytes_list.value, + ['person']) + self._assertProtoEqual( + example.features.feature['image/object/class/label'].int64_list.value, + [1]) + self._assertProtoEqual( + example.features.feature['image/object/difficult'].int64_list.value, + [1]) + self._assertProtoEqual( + example.features.feature['image/object/truncated'].int64_list.value, + [0]) + self._assertProtoEqual( + example.features.feature['image/object/view'].bytes_list.value, ['']) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/create_pet_tf_record.py b/object_detection/create_pet_tf_record.py new file mode 100644 index 0000000000000000000000000000000000000000..d7bad283edad8e8b4d85a33884a69f343ff873d1 --- /dev/null +++ b/object_detection/create_pet_tf_record.py @@ -0,0 +1,213 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r"""Convert the Oxford pet dataset to TFRecord for object_detection. + +See: O. M. Parkhi, A. Vedaldi, A. Zisserman, C. V. 
Jawahar + Cats and Dogs + IEEE Conference on Computer Vision and Pattern Recognition, 2012 + http://www.robots.ox.ac.uk/~vgg/data/pets/ + +Example usage: + ./create_pet_tf_record --data_dir=/home/user/pet \ + --output_dir=/home/user/pet/output +""" + +import hashlib +import io +import logging +import os +import random +import re + +from lxml import etree +import PIL.Image +import tensorflow as tf + +from object_detection.utils import dataset_util +from object_detection.utils import label_map_util + +flags = tf.app.flags +flags.DEFINE_string('data_dir', '', 'Root directory to raw pet dataset.') +flags.DEFINE_string('output_dir', '', 'Path to directory to output TFRecords.') +flags.DEFINE_string('label_map_path', 'data/pet_label_map.pbtxt', + 'Path to label map proto') +FLAGS = flags.FLAGS + + +def get_class_name_from_filename(file_name): + """Gets the class name from a file. + + Args: + file_name: The file name to get the class name from. + e.g. "american_pit_bull_terrier_105.jpg" + + Returns: + The class name parsed from the file name, e.g. "american_pit_bull_terrier". + """ + match = re.match(r'([A-Za-z_]+)(_[0-9]+\.jpg)', file_name, re.I) + return match.groups()[0] + + +def dict_to_tf_example(data, + label_map_dict, + image_subdirectory, + ignore_difficult_instances=False): + """Convert XML derived dict to tf.Example proto. + + Notice that this function normalizes the bounding box coordinates provided + by the raw data. + + Args: + data: dict holding PASCAL XML fields for a single image (obtained by + running dataset_util.recursive_parse_xml_to_dict) + label_map_dict: A map from string label names to integer ids. + image_subdirectory: String specifying subdirectory within the + Pascal dataset directory holding the actual image data. + ignore_difficult_instances: Whether to skip difficult instances in the + dataset (default: False). + + Returns: + example: The converted tf.Example.
+ + Raises: + ValueError: if the image pointed to by data['filename'] is not a valid JPEG + """ + img_path = os.path.join(image_subdirectory, data['filename']) + with tf.gfile.GFile(img_path, 'rb') as fid: + encoded_jpg = fid.read() + encoded_jpg_io = io.BytesIO(encoded_jpg) + image = PIL.Image.open(encoded_jpg_io) + if image.format != 'JPEG': + raise ValueError('Image format not JPEG') + key = hashlib.sha256(encoded_jpg).hexdigest() + + width = int(data['size']['width']) + height = int(data['size']['height']) + + xmin = [] + ymin = [] + xmax = [] + ymax = [] + classes = [] + classes_text = [] + truncated = [] + poses = [] + difficult_obj = [] + for obj in data['object']: + difficult = bool(int(obj['difficult'])) + if ignore_difficult_instances and difficult: + continue + + difficult_obj.append(int(difficult)) + + xmin.append(float(obj['bndbox']['xmin']) / width) + ymin.append(float(obj['bndbox']['ymin']) / height) + xmax.append(float(obj['bndbox']['xmax']) / width) + ymax.append(float(obj['bndbox']['ymax']) / height) + class_name = get_class_name_from_filename(data['filename']) + classes_text.append(class_name.encode('utf8')) + classes.append(label_map_dict[class_name]) + truncated.append(int(obj['truncated'])) + poses.append(obj['pose'].encode('utf8')) + + example = tf.train.Example(features=tf.train.Features(feature={ + 'image/height': dataset_util.int64_feature(height), + 'image/width': dataset_util.int64_feature(width), + 'image/filename': dataset_util.bytes_feature( + data['filename'].encode('utf8')), + 'image/source_id': dataset_util.bytes_feature( + data['filename'].encode('utf8')), + 'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')), + 'image/encoded': dataset_util.bytes_feature(encoded_jpg), + 'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')), + 'image/object/bbox/xmin': dataset_util.float_list_feature(xmin), + 'image/object/bbox/xmax': dataset_util.float_list_feature(xmax), + 'image/object/bbox/ymin': dataset_util.float_list_feature(ymin), + 'image/object/bbox/ymax': dataset_util.float_list_feature(ymax), + 'image/object/class/text': dataset_util.bytes_list_feature(classes_text), + 'image/object/class/label': dataset_util.int64_list_feature(classes), + 'image/object/difficult': dataset_util.int64_list_feature(difficult_obj), + 'image/object/truncated': dataset_util.int64_list_feature(truncated), + 'image/object/view': dataset_util.bytes_list_feature(poses), + })) + return example + + +def create_tf_record(output_filename, + label_map_dict, + annotations_dir, + image_dir, + examples): + """Creates a TFRecord file from examples. + + Args: + output_filename: Path to where output file is saved. + label_map_dict: The label map dictionary. + annotations_dir: Directory where annotation files are stored. + image_dir: Directory where image files are stored. + examples: Examples to parse and save to tf record. 
+ """ + writer = tf.python_io.TFRecordWriter(output_filename) + for idx, example in enumerate(examples): + if idx % 100 == 0: + logging.info('On image %d of %d', idx, len(examples)) + path = os.path.join(annotations_dir, 'xmls', example + '.xml') + + if not os.path.exists(path): + logging.warning('Could not find %s, ignoring example.', path) + continue + with tf.gfile.GFile(path, 'r') as fid: + xml_str = fid.read() + xml = etree.fromstring(xml_str) + data = dataset_util.recursive_parse_xml_to_dict(xml)['annotation'] + + tf_example = dict_to_tf_example(data, label_map_dict, image_dir) + writer.write(tf_example.SerializeToString()) + + writer.close() + + +# TODO: Add test for pet/PASCAL main files. +def main(_): + data_dir = FLAGS.data_dir + label_map_dict = label_map_util.get_label_map_dict(FLAGS.label_map_path) + + logging.info('Reading from Pet dataset.') + image_dir = os.path.join(data_dir, 'images') + annotations_dir = os.path.join(data_dir, 'annotations') + examples_path = os.path.join(annotations_dir, 'trainval.txt') + examples_list = dataset_util.read_examples_list(examples_path) + + # Test images are not included in the downloaded data set, so we shall perform + # our own split. + random.seed(42) + random.shuffle(examples_list) + num_examples = len(examples_list) + num_train = int(0.7 * num_examples) + train_examples = examples_list[:num_train] + val_examples = examples_list[num_train:] + logging.info('%d training and %d validation examples.', + len(train_examples), len(val_examples)) + + train_output_path = os.path.join(FLAGS.output_dir, 'pet_train.record') + val_output_path = os.path.join(FLAGS.output_dir, 'pet_val.record') + create_tf_record(train_output_path, label_map_dict, annotations_dir, + image_dir, train_examples) + create_tf_record(val_output_path, label_map_dict, annotations_dir, + image_dir, val_examples) + +if __name__ == '__main__': + tf.app.run() diff --git a/object_detection/data/mscoco_label_map.pbtxt b/object_detection/data/mscoco_label_map.pbtxt new file mode 100644 index 0000000000000000000000000000000000000000..1f4872bd0c7f53e70beecf88af005c07a5df9e08 --- /dev/null +++ b/object_detection/data/mscoco_label_map.pbtxt @@ -0,0 +1,400 @@ +item { + name: "/m/01g317" + id: 1 + display_name: "person" +} +item { + name: "/m/0199g" + id: 2 + display_name: "bicycle" +} +item { + name: "/m/0k4j" + id: 3 + display_name: "car" +} +item { + name: "/m/04_sv" + id: 4 + display_name: "motorcycle" +} +item { + name: "/m/05czz6l" + id: 5 + display_name: "airplane" +} +item { + name: "/m/01bjv" + id: 6 + display_name: "bus" +} +item { + name: "/m/07jdr" + id: 7 + display_name: "train" +} +item { + name: "/m/07r04" + id: 8 + display_name: "truck" +} +item { + name: "/m/019jd" + id: 9 + display_name: "boat" +} +item { + name: "/m/015qff" + id: 10 + display_name: "traffic light" +} +item { + name: "/m/01pns0" + id: 11 + display_name: "fire hydrant" +} +item { + name: "/m/02pv19" + id: 13 + display_name: "stop sign" +} +item { + name: "/m/015qbp" + id: 14 + display_name: "parking meter" +} +item { + name: "/m/0cvnqh" + id: 15 + display_name: "bench" +} +item { + name: "/m/015p6" + id: 16 + display_name: "bird" +} +item { + name: "/m/01yrx" + id: 17 + display_name: "cat" +} +item { + name: "/m/0bt9lr" + id: 18 + display_name: "dog" +} +item { + name: "/m/03k3r" + id: 19 + display_name: "horse" +} +item { + name: "/m/07bgp" + id: 20 + display_name: "sheep" +} +item { + name: "/m/01xq0k1" + id: 21 + display_name: "cow" +} +item { + name: "/m/0bwd_0j" + id: 22 + display_name: "elephant" +} 
+item { + name: "/m/01dws" + id: 23 + display_name: "bear" +} +item { + name: "/m/0898b" + id: 24 + display_name: "zebra" +} +item { + name: "/m/03bk1" + id: 25 + display_name: "giraffe" +} +item { + name: "/m/01940j" + id: 27 + display_name: "backpack" +} +item { + name: "/m/0hnnb" + id: 28 + display_name: "umbrella" +} +item { + name: "/m/080hkjn" + id: 31 + display_name: "handbag" +} +item { + name: "/m/01rkbr" + id: 32 + display_name: "tie" +} +item { + name: "/m/01s55n" + id: 33 + display_name: "suitcase" +} +item { + name: "/m/02wmf" + id: 34 + display_name: "frisbee" +} +item { + name: "/m/071p9" + id: 35 + display_name: "skis" +} +item { + name: "/m/06__v" + id: 36 + display_name: "snowboard" +} +item { + name: "/m/018xm" + id: 37 + display_name: "sports ball" +} +item { + name: "/m/02zt3" + id: 38 + display_name: "kite" +} +item { + name: "/m/03g8mr" + id: 39 + display_name: "baseball bat" +} +item { + name: "/m/03grzl" + id: 40 + display_name: "baseball glove" +} +item { + name: "/m/06_fw" + id: 41 + display_name: "skateboard" +} +item { + name: "/m/019w40" + id: 42 + display_name: "surfboard" +} +item { + name: "/m/0dv9c" + id: 43 + display_name: "tennis racket" +} +item { + name: "/m/04dr76w" + id: 44 + display_name: "bottle" +} +item { + name: "/m/09tvcd" + id: 46 + display_name: "wine glass" +} +item { + name: "/m/08gqpm" + id: 47 + display_name: "cup" +} +item { + name: "/m/0dt3t" + id: 48 + display_name: "fork" +} +item { + name: "/m/04ctx" + id: 49 + display_name: "knife" +} +item { + name: "/m/0cmx8" + id: 50 + display_name: "spoon" +} +item { + name: "/m/04kkgm" + id: 51 + display_name: "bowl" +} +item { + name: "/m/09qck" + id: 52 + display_name: "banana" +} +item { + name: "/m/014j1m" + id: 53 + display_name: "apple" +} +item { + name: "/m/0l515" + id: 54 + display_name: "sandwich" +} +item { + name: "/m/0cyhj_" + id: 55 + display_name: "orange" +} +item { + name: "/m/0hkxq" + id: 56 + display_name: "broccoli" +} +item { + name: "/m/0fj52s" + id: 57 + display_name: "carrot" +} +item { + name: "/m/01b9xk" + id: 58 + display_name: "hot dog" +} +item { + name: "/m/0663v" + id: 59 + display_name: "pizza" +} +item { + name: "/m/0jy4k" + id: 60 + display_name: "donut" +} +item { + name: "/m/0fszt" + id: 61 + display_name: "cake" +} +item { + name: "/m/01mzpv" + id: 62 + display_name: "chair" +} +item { + name: "/m/02crq1" + id: 63 + display_name: "couch" +} +item { + name: "/m/03fp41" + id: 64 + display_name: "potted plant" +} +item { + name: "/m/03ssj5" + id: 65 + display_name: "bed" +} +item { + name: "/m/04bcr3" + id: 67 + display_name: "dining table" +} +item { + name: "/m/09g1w" + id: 70 + display_name: "toilet" +} +item { + name: "/m/07c52" + id: 72 + display_name: "tv" +} +item { + name: "/m/01c648" + id: 73 + display_name: "laptop" +} +item { + name: "/m/020lf" + id: 74 + display_name: "mouse" +} +item { + name: "/m/0qjjc" + id: 75 + display_name: "remote" +} +item { + name: "/m/01m2v" + id: 76 + display_name: "keyboard" +} +item { + name: "/m/050k8" + id: 77 + display_name: "cell phone" +} +item { + name: "/m/0fx9l" + id: 78 + display_name: "microwave" +} +item { + name: "/m/029bxz" + id: 79 + display_name: "oven" +} +item { + name: "/m/01k6s3" + id: 80 + display_name: "toaster" +} +item { + name: "/m/0130jx" + id: 81 + display_name: "sink" +} +item { + name: "/m/040b_t" + id: 82 + display_name: "refrigerator" +} +item { + name: "/m/0bt_c3" + id: 84 + display_name: "book" +} +item { + name: "/m/01x3z" + id: 85 + display_name: "clock" +} +item { + name: "/m/02s195" + 
id: 86 + display_name: "vase" +} +item { + name: "/m/01lsmm" + id: 87 + display_name: "scissors" +} +item { + name: "/m/0kmg4" + id: 88 + display_name: "teddy bear" +} +item { + name: "/m/03wvsk" + id: 89 + display_name: "hair drier" +} +item { + name: "/m/012xff" + id: 90 + display_name: "toothbrush" +} diff --git a/object_detection/data/pascal_label_map.pbtxt b/object_detection/data/pascal_label_map.pbtxt new file mode 100644 index 0000000000000000000000000000000000000000..f79d3d5e979080193632386b2baa764fdbb7a7cd --- /dev/null +++ b/object_detection/data/pascal_label_map.pbtxt @@ -0,0 +1,104 @@ +item { + id: 0 + name: 'none_of_the_above' +} + +item { + id: 1 + name: 'aeroplane' +} + +item { + id: 2 + name: 'bicycle' +} + +item { + id: 3 + name: 'bird' +} + +item { + id: 4 + name: 'boat' +} + +item { + id: 5 + name: 'bottle' +} + +item { + id: 6 + name: 'bus' +} + +item { + id: 7 + name: 'car' +} + +item { + id: 8 + name: 'cat' +} + +item { + id: 9 + name: 'chair' +} + +item { + id: 10 + name: 'cow' +} + +item { + id: 11 + name: 'diningtable' +} + +item { + id: 12 + name: 'dog' +} + +item { + id: 13 + name: 'horse' +} + +item { + id: 14 + name: 'motorbike' +} + +item { + id: 15 + name: 'person' +} + +item { + id: 16 + name: 'pottedplant' +} + +item { + id: 17 + name: 'sheep' +} + +item { + id: 18 + name: 'sofa' +} + +item { + id: 19 + name: 'train' +} + +item { + id: 20 + name: 'tvmonitor' +} diff --git a/object_detection/data/pet_label_map.pbtxt b/object_detection/data/pet_label_map.pbtxt new file mode 100644 index 0000000000000000000000000000000000000000..61813d687e5a0a1851abd71a68d0835d39eb97d0 --- /dev/null +++ b/object_detection/data/pet_label_map.pbtxt @@ -0,0 +1,189 @@ +item { + id: 0 + name: 'none_of_the_above' +} + +item { + id: 1 + name: 'Abyssinian' +} + +item { + id: 2 + name: 'american_bulldog' +} + +item { + id: 3 + name: 'american_pit_bull_terrier' +} + +item { + id: 4 + name: 'basset_hound' +} + +item { + id: 5 + name: 'beagle' +} + +item { + id: 6 + name: 'Bengal' +} + +item { + id: 7 + name: 'Birman' +} + +item { + id: 8 + name: 'Bombay' +} + +item { + id: 9 + name: 'boxer' +} + +item { + id: 10 + name: 'British_Shorthair' +} + +item { + id: 11 + name: 'chihuahua' +} + +item { + id: 12 + name: 'Egyptian_Mau' +} + +item { + id: 13 + name: 'english_cocker_spaniel' +} + +item { + id: 14 + name: 'english_setter' +} + +item { + id: 15 + name: 'german_shorthaired' +} + +item { + id: 16 + name: 'great_pyrenees' +} + +item { + id: 17 + name: 'havanese' +} + +item { + id: 18 + name: 'japanese_chin' +} + +item { + id: 19 + name: 'keeshond' +} + +item { + id: 20 + name: 'leonberger' +} + +item { + id: 21 + name: 'Maine_Coon' +} + +item { + id: 22 + name: 'miniature_pinscher' +} + +item { + id: 23 + name: 'newfoundland' +} + +item { + id: 24 + name: 'Persian' +} + +item { + id: 25 + name: 'pomeranian' +} + +item { + id: 26 + name: 'pug' +} + +item { + id: 27 + name: 'Ragdoll' +} + +item { + id: 28 + name: 'Russian_Blue' +} + +item { + id: 29 + name: 'saint_bernard' +} + +item { + id: 30 + name: 'samoyed' +} + +item { + id: 31 + name: 'scottish_terrier' +} + +item { + id: 32 + name: 'shiba_inu' +} + +item { + id: 33 + name: 'Siamese' +} + +item { + id: 34 + name: 'Sphynx' +} + +item { + id: 35 + name: 'staffordshire_bull_terrier' +} + +item { + id: 36 + name: 'wheaten_terrier' +} + +item { + id: 37 + name: 'yorkshire_terrier' +} diff --git a/object_detection/data_decoders/BUILD b/object_detection/data_decoders/BUILD new file mode 100644 index 
0000000000000000000000000000000000000000..c857294a5f1fc29cd79a629816d66dd3c88df6fb --- /dev/null +++ b/object_detection/data_decoders/BUILD @@ -0,0 +1,28 @@ +# Tensorflow Object Detection API: data decoders. + +package( + default_visibility = ["//visibility:public"], +) + +licenses(["notice"]) +# Apache 2.0 + +py_library( + name = "tf_example_decoder", + srcs = ["tf_example_decoder.py"], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/core:data_decoder", + "//tensorflow_models/object_detection/core:standard_fields", + ], +) + +py_test( + name = "tf_example_decoder_test", + srcs = ["tf_example_decoder_test.py"], + deps = [ + ":tf_example_decoder", + "//tensorflow", + "//tensorflow_models/object_detection/core:standard_fields", + ], +) diff --git a/object_detection/data_decoders/__init__.py b/object_detection/data_decoders/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/object_detection/data_decoders/tf_example_decoder.py b/object_detection/data_decoders/tf_example_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..7426f466e234baf3537a93d0cb994c8d2316d357 --- /dev/null +++ b/object_detection/data_decoders/tf_example_decoder.py @@ -0,0 +1,147 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tensorflow Example proto decoder for object detection. + +A decoder to decode string tensors containing serialized tensorflow.Example +protos for object detection. +""" +import tensorflow as tf + +from object_detection.core import data_decoder +from object_detection.core import standard_fields as fields + +slim_example_decoder = tf.contrib.slim.tfexample_decoder + + +class TfExampleDecoder(data_decoder.DataDecoder): + """Tensorflow Example proto decoder.""" + + def __init__(self): + """Constructor sets keys_to_features and items_to_handlers.""" + self.keys_to_features = { + 'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''), + 'image/format': tf.FixedLenFeature((), tf.string, default_value='jpeg'), + 'image/filename': tf.FixedLenFeature((), tf.string, default_value=''), + 'image/key/sha256': tf.FixedLenFeature((), tf.string, default_value=''), + 'image/source_id': tf.FixedLenFeature((), tf.string, default_value=''), + 'image/height': tf.FixedLenFeature((), tf.int64, 1), + 'image/width': tf.FixedLenFeature((), tf.int64, 1), + # Object boxes and classes. 
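+ # The variable-length fields below are parsed as tf.SparseTensor; the item
+ # handlers in items_to_handlers convert them back to dense tensors when an
+ # example is decoded, e.g. (illustrative usage, names are placeholders):
+ #   decoder = TfExampleDecoder()
+ #   tensor_dict = decoder.decode(serialized_example_tensor)
+ #   boxes = tensor_dict[fields.InputDataFields.groundtruth_boxes]  # dense [N, 4]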
+ 'image/object/bbox/xmin': tf.VarLenFeature(tf.float32), + 'image/object/bbox/xmax': tf.VarLenFeature(tf.float32), + 'image/object/bbox/ymin': tf.VarLenFeature(tf.float32), + 'image/object/bbox/ymax': tf.VarLenFeature(tf.float32), + 'image/object/class/label': tf.VarLenFeature(tf.int64), + 'image/object/area': tf.VarLenFeature(tf.float32), + 'image/object/is_crowd': tf.VarLenFeature(tf.int64), + 'image/object/difficult': tf.VarLenFeature(tf.int64), + # Instance masks and classes. + 'image/segmentation/object': tf.VarLenFeature(tf.int64), + 'image/segmentation/object/class': tf.VarLenFeature(tf.int64) + } + self.items_to_handlers = { + fields.InputDataFields.image: slim_example_decoder.Image( + image_key='image/encoded', format_key='image/format', channels=3), + fields.InputDataFields.source_id: ( + slim_example_decoder.Tensor('image/source_id')), + fields.InputDataFields.key: ( + slim_example_decoder.Tensor('image/key/sha256')), + fields.InputDataFields.filename: ( + slim_example_decoder.Tensor('image/filename')), + # Object boxes and classes. + fields.InputDataFields.groundtruth_boxes: ( + slim_example_decoder.BoundingBox( + ['ymin', 'xmin', 'ymax', 'xmax'], 'image/object/bbox/')), + fields.InputDataFields.groundtruth_classes: ( + slim_example_decoder.Tensor('image/object/class/label')), + fields.InputDataFields.groundtruth_area: slim_example_decoder.Tensor( + 'image/object/area'), + fields.InputDataFields.groundtruth_is_crowd: ( + slim_example_decoder.Tensor('image/object/is_crowd')), + fields.InputDataFields.groundtruth_difficult: ( + slim_example_decoder.Tensor('image/object/difficult')), + # Instance masks and classes. + fields.InputDataFields.groundtruth_instance_masks: ( + slim_example_decoder.ItemHandlerCallback( + ['image/segmentation/object', 'image/height', 'image/width'], + self._reshape_instance_masks)), + fields.InputDataFields.groundtruth_instance_classes: ( + slim_example_decoder.Tensor('image/segmentation/object/class')), + } + + def decode(self, tf_example_string_tensor): + """Decodes serialized tensorflow example and returns a tensor dictionary. + + Args: + tf_example_string_tensor: a string tensor holding a serialized tensorflow + example proto. + + Returns: + A dictionary of the following tensors. + fields.InputDataFields.image - 3D uint8 tensor of shape [None, None, 3] + containing image. + fields.InputDataFields.source_id - string tensor containing original + image id. + fields.InputDataFields.key - string tensor with unique sha256 hash key. + fields.InputDataFields.filename - string tensor with original dataset + filename. + fields.InputDataFields.groundtruth_boxes - 2D float32 tensor of shape + [None, 4] containing box corners. + fields.InputDataFields.groundtruth_classes - 1D int64 tensor of shape + [None] containing classes for the boxes. + fields.InputDataFields.groundtruth_area - 1D float32 tensor of shape + [None] containing containing object mask area in pixel squared. + fields.InputDataFields.groundtruth_is_crowd - 1D bool tensor of shape + [None] indicating if the boxes enclose a crowd. + fields.InputDataFields.groundtruth_difficult - 1D bool tensor of shape + [None] indicating if the boxes represent `difficult` instances. + fields.InputDataFields.groundtruth_instance_masks - 3D int64 tensor of + shape [None, None, None] containing instance masks. + fields.InputDataFields.groundtruth_instance_classes - 1D int64 tensor + of shape [None] containing classes for the instance masks. 
+ """ + + serialized_example = tf.reshape(tf_example_string_tensor, shape=[]) + decoder = slim_example_decoder.TFExampleDecoder(self.keys_to_features, + self.items_to_handlers) + keys = decoder.list_items() + tensors = decoder.decode(serialized_example, items=keys) + tensor_dict = dict(zip(keys, tensors)) + is_crowd = fields.InputDataFields.groundtruth_is_crowd + tensor_dict[is_crowd] = tf.cast(tensor_dict[is_crowd], dtype=tf.bool) + tensor_dict[fields.InputDataFields.image].set_shape([None, None, 3]) + return tensor_dict + + def _reshape_instance_masks(self, keys_to_tensors): + """Reshape instance segmentation masks. + + The instance segmentation masks are reshaped to [num_instances, height, + width] and cast to boolean type to save memory. + + Args: + keys_to_tensors: a dictionary from keys to tensors. + + Returns: + A 3-D boolean tensor of shape [num_instances, height, width]. + """ + masks = keys_to_tensors['image/segmentation/object'] + if isinstance(masks, tf.SparseTensor): + masks = tf.sparse_tensor_to_dense(masks) + height = keys_to_tensors['image/height'] + width = keys_to_tensors['image/width'] + to_shape = tf.cast(tf.stack([-1, height, width]), tf.int32) + + return tf.cast(tf.reshape(masks, to_shape), tf.bool) diff --git a/object_detection/data_decoders/tf_example_decoder_test.py b/object_detection/data_decoders/tf_example_decoder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..de23bec1582bacc4f0c2c7d3d8988e426e56b7b6 --- /dev/null +++ b/object_detection/data_decoders/tf_example_decoder_test.py @@ -0,0 +1,288 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for object_detection.data_decoders.tf_example_decoder.""" + +import numpy as np +import tensorflow as tf + +from object_detection.core import standard_fields as fields +from object_detection.data_decoders import tf_example_decoder + + +class TfExampleDecoderTest(tf.test.TestCase): + + def _EncodeImage(self, image_tensor, encoding_type='jpeg'): + with self.test_session(): + if encoding_type == 'jpeg': + image_encoded = tf.image.encode_jpeg(tf.constant(image_tensor)).eval() + elif encoding_type == 'png': + image_encoded = tf.image.encode_png(tf.constant(image_tensor)).eval() + else: + raise ValueError('Invalid encoding type.') + return image_encoded + + def _DecodeImage(self, image_encoded, encoding_type='jpeg'): + with self.test_session(): + if encoding_type == 'jpeg': + image_decoded = tf.image.decode_jpeg(tf.constant(image_encoded)).eval() + elif encoding_type == 'png': + image_decoded = tf.image.decode_png(tf.constant(image_encoded)).eval() + else: + raise ValueError('Invalid encoding type.') + return image_decoded + + def _Int64Feature(self, value): + return tf.train.Feature(int64_list=tf.train.Int64List(value=value)) + + def _FloatFeature(self, value): + return tf.train.Feature(float_list=tf.train.FloatList(value=value)) + + def _BytesFeature(self, value): + return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value])) + + def testDecodeJpegImage(self): + image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8) + encoded_jpeg = self._EncodeImage(image_tensor) + decoded_jpeg = self._DecodeImage(encoded_jpeg) + example = tf.train.Example(features=tf.train.Features(feature={ + 'image/encoded': self._BytesFeature(encoded_jpeg), + 'image/format': self._BytesFeature('jpeg'), + 'image/source_id': self._BytesFeature('image_id'), + })).SerializeToString() + + example_decoder = tf_example_decoder.TfExampleDecoder() + tensor_dict = example_decoder.decode(tf.convert_to_tensor(example)) + + self.assertAllEqual((tensor_dict[fields.InputDataFields.image]. 
+ get_shape().as_list()), [None, None, 3]) + with self.test_session() as sess: + tensor_dict = sess.run(tensor_dict) + + self.assertAllEqual(decoded_jpeg, tensor_dict[fields.InputDataFields.image]) + self.assertEqual('image_id', tensor_dict[fields.InputDataFields.source_id]) + + def testDecodeImageKeyAndFilename(self): + image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8) + encoded_jpeg = self._EncodeImage(image_tensor) + example = tf.train.Example(features=tf.train.Features(feature={ + 'image/encoded': self._BytesFeature(encoded_jpeg), + 'image/key/sha256': self._BytesFeature('abc'), + 'image/filename': self._BytesFeature('filename') + })).SerializeToString() + + example_decoder = tf_example_decoder.TfExampleDecoder() + tensor_dict = example_decoder.decode(tf.convert_to_tensor(example)) + + with self.test_session() as sess: + tensor_dict = sess.run(tensor_dict) + + self.assertEqual('abc', tensor_dict[fields.InputDataFields.key]) + self.assertEqual('filename', tensor_dict[fields.InputDataFields.filename]) + + def testDecodePngImage(self): + image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8) + encoded_png = self._EncodeImage(image_tensor, encoding_type='png') + decoded_png = self._DecodeImage(encoded_png, encoding_type='png') + example = tf.train.Example(features=tf.train.Features(feature={ + 'image/encoded': self._BytesFeature(encoded_png), + 'image/format': self._BytesFeature('png'), + 'image/source_id': self._BytesFeature('image_id') + })).SerializeToString() + + example_decoder = tf_example_decoder.TfExampleDecoder() + tensor_dict = example_decoder.decode(tf.convert_to_tensor(example)) + + self.assertAllEqual((tensor_dict[fields.InputDataFields.image]. + get_shape().as_list()), [None, None, 3]) + with self.test_session() as sess: + tensor_dict = sess.run(tensor_dict) + + self.assertAllEqual(decoded_png, tensor_dict[fields.InputDataFields.image]) + self.assertEqual('image_id', tensor_dict[fields.InputDataFields.source_id]) + + def testDecodeBoundingBox(self): + image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8) + encoded_jpeg = self._EncodeImage(image_tensor) + bbox_ymins = [0.0, 4.0] + bbox_xmins = [1.0, 5.0] + bbox_ymaxs = [2.0, 6.0] + bbox_xmaxs = [3.0, 7.0] + example = tf.train.Example(features=tf.train.Features(feature={ + 'image/encoded': self._BytesFeature(encoded_jpeg), + 'image/format': self._BytesFeature('jpeg'), + 'image/object/bbox/ymin': self._FloatFeature(bbox_ymins), + 'image/object/bbox/xmin': self._FloatFeature(bbox_xmins), + 'image/object/bbox/ymax': self._FloatFeature(bbox_ymaxs), + 'image/object/bbox/xmax': self._FloatFeature(bbox_xmaxs), + })).SerializeToString() + + example_decoder = tf_example_decoder.TfExampleDecoder() + tensor_dict = example_decoder.decode(tf.convert_to_tensor(example)) + + self.assertAllEqual((tensor_dict[fields.InputDataFields.groundtruth_boxes]. 
+ get_shape().as_list()), [None, 4]) + with self.test_session() as sess: + tensor_dict = sess.run(tensor_dict) + + expected_boxes = np.vstack([bbox_ymins, bbox_xmins, + bbox_ymaxs, bbox_xmaxs]).transpose() + self.assertAllEqual(expected_boxes, + tensor_dict[fields.InputDataFields.groundtruth_boxes]) + + def testDecodeObjectLabel(self): + image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8) + encoded_jpeg = self._EncodeImage(image_tensor) + bbox_classes = [0, 1] + example = tf.train.Example(features=tf.train.Features(feature={ + 'image/encoded': self._BytesFeature(encoded_jpeg), + 'image/format': self._BytesFeature('jpeg'), + 'image/object/class/label': self._Int64Feature(bbox_classes), + })).SerializeToString() + + example_decoder = tf_example_decoder.TfExampleDecoder() + tensor_dict = example_decoder.decode(tf.convert_to_tensor(example)) + + self.assertAllEqual((tensor_dict[ + fields.InputDataFields.groundtruth_classes].get_shape().as_list()), + [None]) + + with self.test_session() as sess: + tensor_dict = sess.run(tensor_dict) + + self.assertAllEqual(bbox_classes, + tensor_dict[fields.InputDataFields.groundtruth_classes]) + + def testDecodeObjectArea(self): + image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8) + encoded_jpeg = self._EncodeImage(image_tensor) + object_area = [100., 174.] + example = tf.train.Example(features=tf.train.Features(feature={ + 'image/encoded': self._BytesFeature(encoded_jpeg), + 'image/format': self._BytesFeature('jpeg'), + 'image/object/area': self._FloatFeature(object_area), + })).SerializeToString() + + example_decoder = tf_example_decoder.TfExampleDecoder() + tensor_dict = example_decoder.decode(tf.convert_to_tensor(example)) + + self.assertAllEqual((tensor_dict[fields.InputDataFields.groundtruth_area]. 
+ get_shape().as_list()), [None]) + with self.test_session() as sess: + tensor_dict = sess.run(tensor_dict) + + self.assertAllEqual(object_area, + tensor_dict[fields.InputDataFields.groundtruth_area]) + + def testDecodeObjectIsCrowd(self): + image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8) + encoded_jpeg = self._EncodeImage(image_tensor) + object_is_crowd = [0, 1] + example = tf.train.Example(features=tf.train.Features(feature={ + 'image/encoded': self._BytesFeature(encoded_jpeg), + 'image/format': self._BytesFeature('jpeg'), + 'image/object/is_crowd': self._Int64Feature(object_is_crowd), + })).SerializeToString() + + example_decoder = tf_example_decoder.TfExampleDecoder() + tensor_dict = example_decoder.decode(tf.convert_to_tensor(example)) + + self.assertAllEqual((tensor_dict[ + fields.InputDataFields.groundtruth_is_crowd].get_shape().as_list()), + [None]) + with self.test_session() as sess: + tensor_dict = sess.run(tensor_dict) + + self.assertAllEqual([bool(item) for item in object_is_crowd], + tensor_dict[ + fields.InputDataFields.groundtruth_is_crowd]) + + def testDecodeObjectDifficult(self): + image_tensor = np.random.randint(255, size=(4, 5, 3)).astype(np.uint8) + encoded_jpeg = self._EncodeImage(image_tensor) + object_difficult = [0, 1] + example = tf.train.Example(features=tf.train.Features(feature={ + 'image/encoded': self._BytesFeature(encoded_jpeg), + 'image/format': self._BytesFeature('jpeg'), + 'image/object/difficult': self._Int64Feature(object_difficult), + })).SerializeToString() + + example_decoder = tf_example_decoder.TfExampleDecoder() + tensor_dict = example_decoder.decode(tf.convert_to_tensor(example)) + + self.assertAllEqual((tensor_dict[ + fields.InputDataFields.groundtruth_difficult].get_shape().as_list()), + [None]) + with self.test_session() as sess: + tensor_dict = sess.run(tensor_dict) + + self.assertAllEqual([bool(item) for item in object_difficult], + tensor_dict[ + fields.InputDataFields.groundtruth_difficult]) + + def testDecodeInstanceSegmentation(self): + num_instances = 4 + image_height = 5 + image_width = 3 + + # Randomly generate image. + image_tensor = np.random.randint(255, size=(image_height, + image_width, + 3)).astype(np.uint8) + encoded_jpeg = self._EncodeImage(image_tensor) + + # Randomly generate instance segmentation masks. + instance_segmentation = ( + np.random.randint(2, size=(num_instances, + image_height, + image_width)).astype(np.int64)) + + # Randomly generate class labels for each instance. + instance_segmentation_classes = np.random.randint( + 100, size=(num_instances)).astype(np.int64) + + example = tf.train.Example(features=tf.train.Features(feature={ + 'image/encoded': self._BytesFeature(encoded_jpeg), + 'image/format': self._BytesFeature('jpeg'), + 'image/height': self._Int64Feature([image_height]), + 'image/width': self._Int64Feature([image_width]), + 'image/segmentation/object': self._Int64Feature( + instance_segmentation.flatten()), + 'image/segmentation/object/class': self._Int64Feature( + instance_segmentation_classes)})).SerializeToString() + example_decoder = tf_example_decoder.TfExampleDecoder() + tensor_dict = example_decoder.decode(tf.convert_to_tensor(example)) + + self.assertAllEqual(( + tensor_dict[fields.InputDataFields.groundtruth_instance_masks]. + get_shape().as_list()), [None, None, None]) + + self.assertAllEqual(( + tensor_dict[fields.InputDataFields.groundtruth_instance_classes]. 
+ get_shape().as_list()), [None]) + + with self.test_session() as sess: + tensor_dict = sess.run(tensor_dict) + + self.assertAllEqual( + instance_segmentation.astype(np.bool), + tensor_dict[fields.InputDataFields.groundtruth_instance_masks]) + self.assertAllEqual( + instance_segmentation_classes, + tensor_dict[fields.InputDataFields.groundtruth_instance_classes]) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/eval.py b/object_detection/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..cf3ab0c5648f2d219ad8a69aa4e15fbec79aac20 --- /dev/null +++ b/object_detection/eval.py @@ -0,0 +1,161 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r"""Evaluation executable for detection models. + +This executable is used to evaluate DetectionModels. There are two ways of +configuring the eval job. + +1) A single pipeline_pb2.TrainEvalPipelineConfig file maybe specified instead. +In this mode, the --eval_training_data flag may be given to force the pipeline +to evaluate on training data instead. + +Example usage: + ./eval \ + --logtostderr \ + --checkpoint_dir=path/to/checkpoint_dir \ + --eval_dir=path/to/eval_dir \ + --pipeline_config_path=pipeline_config.pbtxt + +2) Three configuration files may be provided: a model_pb2.DetectionModel +configuration file to define what type of DetectionModel is being evaulated, an +input_reader_pb2.InputReader file to specify what data the model is evaluating +and an eval_pb2.EvalConfig file to configure evaluation parameters. + +Example usage: + ./eval \ + --logtostderr \ + --checkpoint_dir=path/to/checkpoint_dir \ + --eval_dir=path/to/eval_dir \ + --eval_config_path=eval_config.pbtxt \ + --model_config_path=model_config.pbtxt \ + --input_config_path=eval_input_config.pbtxt +""" +import functools +import tensorflow as tf + +from google.protobuf import text_format +from object_detection import evaluator +from object_detection.builders import input_reader_builder +from object_detection.builders import model_builder +from object_detection.protos import eval_pb2 +from object_detection.protos import input_reader_pb2 +from object_detection.protos import model_pb2 +from object_detection.protos import pipeline_pb2 +from object_detection.utils import label_map_util + +tf.logging.set_verbosity(tf.logging.INFO) + +flags = tf.app.flags +flags.DEFINE_boolean('eval_training_data', False, + 'If training data should be evaluated for this job.') +flags.DEFINE_string('checkpoint_dir', '', + 'Directory containing checkpoints to evaluate, typically ' + 'set to `train_dir` used in the training job.') +flags.DEFINE_string('eval_dir', '', + 'Directory to write eval summaries to.') +flags.DEFINE_string('pipeline_config_path', '', + 'Path to a pipeline_pb2.TrainEvalPipelineConfig config ' + 'file. 
If provided, other configs are ignored') +flags.DEFINE_string('eval_config_path', '', + 'Path to an eval_pb2.EvalConfig config file.') +flags.DEFINE_string('input_config_path', '', + 'Path to an input_reader_pb2.InputReader config file.') +flags.DEFINE_string('model_config_path', '', + 'Path to a model_pb2.DetectionModel config file.') + +FLAGS = flags.FLAGS + + +def get_configs_from_pipeline_file(): + """Reads evaluation configuration from a pipeline_pb2.TrainEvalPipelineConfig. + + Reads evaluation config from file specified by pipeline_config_path flag. + + Returns: + model_config: a model_pb2.DetectionModel + eval_config: a eval_pb2.EvalConfig + input_config: a input_reader_pb2.InputReader + """ + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + with tf.gfile.GFile(FLAGS.pipeline_config_path, 'r') as f: + text_format.Merge(f.read(), pipeline_config) + + model_config = pipeline_config.model + if FLAGS.eval_training_data: + eval_config = pipeline_config.train_config + else: + eval_config = pipeline_config.eval_config + input_config = pipeline_config.eval_input_reader + + return model_config, eval_config, input_config + + +def get_configs_from_multiple_files(): + """Reads evaluation configuration from multiple config files. + + Reads the evaluation config from the following files: + model_config: Read from --model_config_path + eval_config: Read from --eval_config_path + input_config: Read from --input_config_path + + Returns: + model_config: a model_pb2.DetectionModel + eval_config: a eval_pb2.EvalConfig + input_config: a input_reader_pb2.InputReader + """ + eval_config = eval_pb2.EvalConfig() + with tf.gfile.GFile(FLAGS.eval_config_path, 'r') as f: + text_format.Merge(f.read(), eval_config) + + model_config = model_pb2.DetectionModel() + with tf.gfile.GFile(FLAGS.model_config_path, 'r') as f: + text_format.Merge(f.read(), model_config) + + input_config = input_reader_pb2.InputReader() + with tf.gfile.GFile(FLAGS.input_config_path, 'r') as f: + text_format.Merge(f.read(), input_config) + + return model_config, eval_config, input_config + + +def main(unused_argv): + assert FLAGS.checkpoint_dir, '`checkpoint_dir` is missing.' + assert FLAGS.eval_dir, '`eval_dir` is missing.' + if FLAGS.pipeline_config_path: + model_config, eval_config, input_config = get_configs_from_pipeline_file() + else: + model_config, eval_config, input_config = get_configs_from_multiple_files() + + model_fn = functools.partial( + model_builder.build, + model_config=model_config, + is_training=False) + + create_input_dict_fn = functools.partial( + input_reader_builder.build, + input_config) + + label_map = label_map_util.load_labelmap(input_config.label_map_path) + max_num_classes = max([item.id for item in label_map.item]) + categories = label_map_util.convert_label_map_to_categories( + label_map, max_num_classes) + + evaluator.evaluate(create_input_dict_fn, model_fn, eval_config, categories, + FLAGS.checkpoint_dir, FLAGS.eval_dir) + + +if __name__ == '__main__': + tf.app.run() diff --git a/object_detection/eval_util.py b/object_detection/eval_util.py new file mode 100644 index 0000000000000000000000000000000000000000..51e6878ea6ae6651c1dcd75fd33ac304c542c397 --- /dev/null +++ b/object_detection/eval_util.py @@ -0,0 +1,524 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Common functions for repeatedly evaluating a checkpoint. +""" +import copy +import logging +import os +import time + +import numpy as np +import tensorflow as tf + +from object_detection.utils import label_map_util +from object_detection.utils import object_detection_evaluation +from object_detection.utils import visualization_utils as vis_utils + +slim = tf.contrib.slim + + +def write_metrics(metrics, global_step, summary_dir): + """Write metrics to a summary directory. + + Args: + metrics: A dictionary containing metric names and values. + global_step: Global step at which the metrics are computed. + summary_dir: Directory to write tensorflow summaries to. + """ + logging.info('Writing metrics to tf summary.') + summary_writer = tf.summary.FileWriter(summary_dir) + for key in sorted(metrics): + summary = tf.Summary(value=[ + tf.Summary.Value(tag=key, simple_value=metrics[key]), + ]) + summary_writer.add_summary(summary, global_step) + logging.info('%s: %f', key, metrics[key]) + summary_writer.close() + logging.info('Metrics written to tf summary.') + + +def evaluate_detection_results_pascal_voc(result_lists, + categories, + label_id_offset=0, + iou_thres=0.5, + corloc_summary=False): + """Computes Pascal VOC detection metrics given groundtruth and detections. + + This function computes Pascal VOC metrics. This function by default + takes detections and groundtruth boxes encoded in result_lists and writes + evaluation results to tf summaries which can be viewed on tensorboard. + + Args: + result_lists: a dictionary holding lists of groundtruth and detection + data corresponding to each image being evaluated. The following keys + are required: + 'image_id': a list of string ids + 'detection_boxes': a list of float32 numpy arrays of shape [N, 4] + 'detection_scores': a list of float32 numpy arrays of shape [N] + 'detection_classes': a list of int32 numpy arrays of shape [N] + 'groundtruth_boxes': a list of float32 numpy arrays of shape [M, 4] + 'groundtruth_classes': a list of int32 numpy arrays of shape [M] + and the remaining fields below are optional: + 'difficult': a list of boolean arrays of shape [M] indicating the + difficulty of groundtruth boxes. Some datasets like PASCAL VOC provide + this information and it is used to remove difficult examples from eval + in order to not penalize the models on them. + Note that it is okay to have additional fields in result_lists --- they + are simply ignored. + categories: a list of dictionaries representing all possible categories. + Each dict in this list has the following keys: + 'id': (required) an integer id uniquely identifying this category + 'name': (required) string representing category name + e.g., 'cat', 'dog', 'pizza' + label_id_offset: an integer offset for the label space. + iou_thres: float determining the IoU threshold at which a box is considered + correct. Defaults to the standard 0.5. + corloc_summary: boolean. If True, also outputs CorLoc metrics. + + Returns: + A dictionary of metric names to scalar values. 
+ + Raises: + ValueError: if the set of keys in result_lists is not a superset of the + expected list of keys. Unexpected keys are ignored. + ValueError: if the lists in result_lists have inconsistent sizes. + """ + # check for expected keys in result_lists + expected_keys = [ + 'detection_boxes', 'detection_scores', 'detection_classes', 'image_id' + ] + expected_keys += ['groundtruth_boxes', 'groundtruth_classes'] + if not set(expected_keys).issubset(set(result_lists.keys())): + raise ValueError('result_lists does not have expected key set.') + num_results = len(result_lists[expected_keys[0]]) + for key in expected_keys: + if len(result_lists[key]) != num_results: + raise ValueError('Inconsistent list sizes in result_lists') + + # Pascal VOC evaluator assumes foreground index starts from zero. + categories = copy.deepcopy(categories) + for idx in range(len(categories)): + categories[idx]['id'] -= label_id_offset + + # num_classes (maybe encoded as categories) + num_classes = max([cat['id'] for cat in categories]) + 1 + logging.info('Computing Pascal VOC metrics on results.') + if all(image_id.isdigit() for image_id in result_lists['image_id']): + image_ids = [int(image_id) for image_id in result_lists['image_id']] + else: + image_ids = range(num_results) + + evaluator = object_detection_evaluation.ObjectDetectionEvaluation( + num_classes, matching_iou_threshold=iou_thres) + + difficult_lists = None + if 'difficult' in result_lists and result_lists['difficult']: + difficult_lists = result_lists['difficult'] + for idx, image_id in enumerate(image_ids): + difficult = None + if difficult_lists is not None and difficult_lists[idx].size: + difficult = difficult_lists[idx].astype(np.bool) + evaluator.add_single_ground_truth_image_info( + image_id, result_lists['groundtruth_boxes'][idx], + result_lists['groundtruth_classes'][idx] - label_id_offset, + difficult) + evaluator.add_single_detected_image_info( + image_id, result_lists['detection_boxes'][idx], + result_lists['detection_scores'][idx], + result_lists['detection_classes'][idx] - label_id_offset) + per_class_ap, mean_ap, _, _, per_class_corloc, mean_corloc = ( + evaluator.evaluate()) + + metrics = {'Precision/mAP@{}IOU'.format(iou_thres): mean_ap} + category_index = label_map_util.create_category_index(categories) + for idx in range(per_class_ap.size): + if idx in category_index: + display_name = ('PerformanceByCategory/mAP@{}IOU/{}' + .format(iou_thres, category_index[idx]['name'])) + metrics[display_name] = per_class_ap[idx] + + if corloc_summary: + metrics['CorLoc/CorLoc@{}IOU'.format(iou_thres)] = mean_corloc + for idx in range(per_class_corloc.size): + if idx in category_index: + display_name = ( + 'PerformanceByCategory/CorLoc@{}IOU/{}'.format( + iou_thres, category_index[idx]['name'])) + metrics[display_name] = per_class_corloc[idx] + return metrics + + +# TODO: Add tests. +def visualize_detection_results(result_dict, + tag, + global_step, + categories, + summary_dir='', + export_dir='', + agnostic_mode=False, + show_groundtruth=False, + min_score_thresh=.5, + max_num_predictions=20): + """Visualizes detection results and writes visualizations to image summaries. + + This function visualizes an image with its detected bounding boxes and writes + to image summaries which can be viewed on tensorboard. It optionally also + writes images to a directory. In the case of missing entry in the label map, + unknown class name in the visualization is shown as "N/A". 
+ + Args: + result_dict: a dictionary holding groundtruth and detection + data corresponding to each image being evaluated. The following keys + are required: + 'original_image': a numpy array representing the image with shape + [1, height, width, 3] + 'detection_boxes': a numpy array of shape [N, 4] + 'detection_scores': a numpy array of shape [N] + 'detection_classes': a numpy array of shape [N] + The following keys are optional: + 'groundtruth_boxes': a numpy array of shape [N, 4] + 'groundtruth_keypoints': a numpy array of shape [N, num_keypoints, 2] + Detections are assumed to be provided in decreasing order of score and for + display, and we assume that scores are probabilities between 0 and 1. + tag: tensorboard tag (string) to associate with image. + global_step: global step at which the visualization are generated. + categories: a list of dictionaries representing all possible categories. + Each dict in this list has the following keys: + 'id': (required) an integer id uniquely identifying this category + 'name': (required) string representing category name + e.g., 'cat', 'dog', 'pizza' + 'supercategory': (optional) string representing the supercategory + e.g., 'animal', 'vehicle', 'food', etc + summary_dir: the output directory to which the image summaries are written. + export_dir: the output directory to which images are written. If this is + empty (default), then images are not exported. + agnostic_mode: boolean (default: False) controlling whether to evaluate in + class-agnostic mode or not. + show_groundtruth: boolean (default: False) controlling whether to show + groundtruth boxes in addition to detected boxes + min_score_thresh: minimum score threshold for a box to be visualized + max_num_predictions: maximum number of detections to visualize + Raises: + ValueError: if result_dict does not contain the expected keys (i.e., + 'original_image', 'detection_boxes', 'detection_scores', + 'detection_classes') + """ + if not set([ + 'original_image', 'detection_boxes', 'detection_scores', + 'detection_classes' + ]).issubset(set(result_dict.keys())): + raise ValueError('result_dict does not contain all expected keys.') + if show_groundtruth and 'groundtruth_boxes' not in result_dict: + raise ValueError('If show_groundtruth is enabled, result_dict must contain ' + 'groundtruth_boxes.') + logging.info('Creating detection visualizations.') + category_index = label_map_util.create_category_index(categories) + + image = np.squeeze(result_dict['original_image'], axis=0) + detection_boxes = result_dict['detection_boxes'] + detection_scores = result_dict['detection_scores'] + detection_classes = np.int32((result_dict['detection_classes'])) + detection_keypoints = result_dict.get('detection_keypoints', None) + detection_masks = result_dict.get('detection_masks', None) + + # Plot groundtruth underneath detections + if show_groundtruth: + groundtruth_boxes = result_dict['groundtruth_boxes'] + groundtruth_keypoints = result_dict.get('groundtruth_keypoints', None) + vis_utils.visualize_boxes_and_labels_on_image_array( + image, + groundtruth_boxes, + None, + None, + category_index, + keypoints=groundtruth_keypoints, + use_normalized_coordinates=False, + max_boxes_to_draw=None) + vis_utils.visualize_boxes_and_labels_on_image_array( + image, + detection_boxes, + detection_classes, + detection_scores, + category_index, + instance_masks=detection_masks, + keypoints=detection_keypoints, + use_normalized_coordinates=False, + max_boxes_to_draw=max_num_predictions, + min_score_thresh=min_score_thresh, + 
agnostic_mode=agnostic_mode) + + if export_dir: + export_path = os.path.join(export_dir, 'export-{}.png'.format(tag)) + vis_utils.save_image_array_as_png(image, export_path) + + summary = tf.Summary(value=[ + tf.Summary.Value(tag=tag, image=tf.Summary.Image( + encoded_image_string=vis_utils.encode_image_array_as_png_str( + image))) + ]) + summary_writer = tf.summary.FileWriter(summary_dir) + summary_writer.add_summary(summary, global_step) + summary_writer.close() + + logging.info('Detection visualizations written to summary with tag %s.', tag) + + +# TODO: Add tests. +# TODO: Have an argument called `aggregated_processor_tensor_keys` that contains +# a whitelist of tensors used by the `aggregated_result_processor` instead of a +# blacklist. This will prevent us from inadvertently adding any evaluated +# tensors into the `results_list` data structure that are not needed by +# `aggregated_result_preprocessor`. +def run_checkpoint_once(tensor_dict, + update_op, + summary_dir, + aggregated_result_processor=None, + batch_processor=None, + checkpoint_dirs=None, + variables_to_restore=None, + restore_fn=None, + num_batches=1, + master='', + save_graph=False, + save_graph_dir='', + metric_names_to_values=None, + keys_to_exclude_from_results=()): + """Evaluates both python metrics and tensorflow slim metrics. + + Python metrics are processed in batch by the aggregated_result_processor, + while tensorflow slim metrics statistics are computed by running + metric_names_to_updates tensors and aggregated using metric_names_to_values + tensor. + + Args: + tensor_dict: a dictionary holding tensors representing a batch of detections + and corresponding groundtruth annotations. + update_op: a tensorflow update op that will run for each batch along with + the tensors in tensor_dict.. + summary_dir: a directory to write metrics summaries. + aggregated_result_processor: a function taking one arguments: + 1. result_lists: a dictionary with keys matching those in tensor_dict + and corresponding values being the list of results for each tensor + in tensor_dict. The length of each such list is num_batches. + batch_processor: a function taking four arguments: + 1. tensor_dict: the same tensor_dict that is passed in as the first + argument to this function. + 2. sess: a tensorflow session + 3. batch_index: an integer representing the index of the batch amongst + all batches + 4. update_op: a tensorflow update op that will run for each batch. + and returns result_dict, a dictionary of results for that batch. + By default, batch_processor is None, which defaults to running: + return sess.run(tensor_dict) + To skip an image, it suffices to return an empty dictionary in place of + result_dict. + checkpoint_dirs: list of directories to load into an EnsembleModel. If it + has only one directory, EnsembleModel will not be used -- a DetectionModel + will be instantiated directly. Not used if restore_fn is set. + variables_to_restore: None, or a dictionary mapping variable names found in + a checkpoint to model variables. The dictionary would normally be + generated by creating a tf.train.ExponentialMovingAverage object and + calling its variables_to_restore() method. Not used if restore_fn is set. + restore_fn: None, or a function that takes a tf.Session object and correctly + restores all necessary variables from the correct checkpoint file. If + None, attempts to restore from the first directory in checkpoint_dirs. + num_batches: the number of batches to use for evaluation. 
+ master: the location of the Tensorflow session. + save_graph: whether or not the Tensorflow graph is stored as a pbtxt file. + save_graph_dir: where to store the Tensorflow graph on disk. If save_graph + is True this must be non-empty. + metric_names_to_values: A dictionary containing metric names to tensors + which will be evaluated after processing all batches + of [tensor_dict, update_op]. If any metrics depend on statistics computed + during each batch ensure that `update_op` tensor has a control dependency + on the update ops that compute the statistics. + keys_to_exclude_from_results: keys in tensor_dict that will be excluded + from results_list. Note that the tensors corresponding to these keys will + still be evaluated for each batch, but won't be added to results_list. + + Raises: + ValueError: if restore_fn is None and checkpoint_dirs doesn't have at least + one element. + ValueError: if save_graph is True and save_graph_dir is not defined. + """ + if save_graph and not save_graph_dir: + raise ValueError('`save_graph_dir` must be defined.') + sess = tf.Session(master, graph=tf.get_default_graph()) + sess.run(tf.global_variables_initializer()) + sess.run(tf.local_variables_initializer()) + if restore_fn: + restore_fn(sess) + else: + if not checkpoint_dirs: + raise ValueError('`checkpoint_dirs` must have at least one entry.') + checkpoint_file = tf.train.latest_checkpoint(checkpoint_dirs[0]) + saver = tf.train.Saver(variables_to_restore) + saver.restore(sess, checkpoint_file) + + if save_graph: + tf.train.write_graph(sess.graph_def, save_graph_dir, 'eval.pbtxt') + + valid_keys = list(set(tensor_dict.keys()) - set(keys_to_exclude_from_results)) + result_lists = {key: [] for key in valid_keys} + counters = {'skipped': 0, 'success': 0} + other_metrics = None + with tf.contrib.slim.queues.QueueRunners(sess): + try: + for batch in range(int(num_batches)): + if (batch + 1) % 100 == 0: + logging.info('Running eval ops batch %d/%d', batch + 1, num_batches) + if not batch_processor: + try: + (result_dict, _) = sess.run([tensor_dict, update_op]) + counters['success'] += 1 + except tf.errors.InvalidArgumentError: + logging.info('Skipping image') + counters['skipped'] += 1 + result_dict = {} + else: + result_dict = batch_processor( + tensor_dict, sess, batch, counters, update_op) + for key in result_dict: + if key in valid_keys: + result_lists[key].append(result_dict[key]) + if metric_names_to_values is not None: + other_metrics = sess.run(metric_names_to_values) + logging.info('Running eval batches done.') + except tf.errors.OutOfRangeError: + logging.info('Done evaluating -- epoch limit reached') + finally: + # When done, ask the threads to stop. + metrics = aggregated_result_processor(result_lists) + if other_metrics is not None: + metrics.update(other_metrics) + global_step = tf.train.global_step(sess, slim.get_global_step()) + write_metrics(metrics, global_step, summary_dir) + logging.info('# success: %d', counters['success']) + logging.info('# skipped: %d', counters['skipped']) + sess.close() + + +# TODO: Add tests. 
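+# An illustrative sketch of how repeated_checkpoint_run (defined below) is
+# typically wired up; `categories`, `tensor_dict` and the directory paths are
+# hypothetical placeholders (evaluator.py in this change does the actual
+# wiring from an EvalConfig):
+#
+#   def _aggregate(result_lists):
+#     return evaluate_detection_results_pascal_voc(result_lists,
+#                                                  categories=categories)
+#
+#   repeated_checkpoint_run(tensor_dict=tensor_dict,
+#                           update_op=tf.no_op(),
+#                           summary_dir='/tmp/eval_dir',
+#                           aggregated_result_processor=_aggregate,
+#                           checkpoint_dirs=['/tmp/train_dir'],
+#                           num_batches=100)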
+def repeated_checkpoint_run(tensor_dict,
+                            update_op,
+                            summary_dir,
+                            aggregated_result_processor=None,
+                            batch_processor=None,
+                            checkpoint_dirs=None,
+                            variables_to_restore=None,
+                            restore_fn=None,
+                            num_batches=1,
+                            eval_interval_secs=120,
+                            max_number_of_evaluations=None,
+                            master='',
+                            save_graph=False,
+                            save_graph_dir='',
+                            metric_names_to_values=None,
+                            keys_to_exclude_from_results=()):
+  """Periodically evaluates desired tensors using checkpoint_dirs or restore_fn.
+
+  This function repeatedly loads a checkpoint and evaluates a desired
+  set of tensors (provided by tensor_dict) and hands the resulting numpy
+  arrays to the function aggregated_result_processor, which can be used to
+  further process/save/visualize the results.
+
+  Args:
+    tensor_dict: a dictionary holding tensors representing a batch of detections
+      and corresponding groundtruth annotations.
+    update_op: a tensorflow update op that will run for each batch along with
+      the tensors in tensor_dict.
+    summary_dir: a directory to write metrics summaries.
+    aggregated_result_processor: a function taking one argument:
+      1. result_lists: a dictionary with keys matching those in tensor_dict
+        and corresponding values being the list of results for each tensor
+        in tensor_dict. The length of each such list is num_batches.
+    batch_processor: a function taking four arguments:
+      1. tensor_dict: the same tensor_dict that is passed in as the first
+        argument to this function.
+      2. sess: a tensorflow session
+      3. batch_index: an integer representing the index of the batch amongst
+        all batches
+      4. update_op: a tensorflow update op that will run for each batch.
+      and returns result_dict, a dictionary of results for that batch.
+      By default, batch_processor is None, which defaults to running:
+        return sess.run(tensor_dict)
+    checkpoint_dirs: list of directories to load into a DetectionModel or an
+      EnsembleModel if restore_fn isn't set. Also used to determine when to run
+      the next evaluation. Must have at least one element.
+    variables_to_restore: None, or a dictionary mapping variable names found in
+      a checkpoint to model variables. The dictionary would normally be
+      generated by creating a tf.train.ExponentialMovingAverage object and
+      calling its variables_to_restore() method. Not used if restore_fn is set.
+    restore_fn: a function that takes a tf.Session object and correctly restores
+      all necessary variables from the correct checkpoint file.
+    num_batches: the number of batches to use for evaluation.
+    eval_interval_secs: the number of seconds between each evaluation run.
+    max_number_of_evaluations: the max number of iterations of the evaluation.
+      If the value is left as None the evaluation continues indefinitely.
+    master: the location of the Tensorflow session.
+    save_graph: whether or not the Tensorflow graph is saved as a pbtxt file.
+    save_graph_dir: where to save the Tensorflow graph on disk. If save_graph
+      is True this must be non-empty.
+    metric_names_to_values: A dictionary containing metric names to tensors
+      which will be evaluated after processing all batches
+      of [tensor_dict, update_op]. If any metrics depend on statistics computed
+      during each batch ensure that `update_op` tensor has a control dependency
+      on the update ops that compute the statistics.
+    keys_to_exclude_from_results: keys in tensor_dict that will be excluded
+      from results_list. Note that the tensors corresponding to these keys will
+      still be evaluated for each batch, but won't be added to results_list.
+ + Raises: + ValueError: if max_num_of_evaluations is not None or a positive number. + ValueError: if checkpoint_dirs doesn't have at least one element. + """ + if max_number_of_evaluations and max_number_of_evaluations <= 0: + raise ValueError( + '`number_of_steps` must be either None or a positive number.') + + if not checkpoint_dirs: + raise ValueError('`checkpoint_dirs` must have at least one entry.') + + last_evaluated_model_path = None + number_of_evaluations = 0 + while True: + start = time.time() + logging.info('Starting evaluation at ' + time.strftime('%Y-%m-%d-%H:%M:%S', + time.gmtime())) + model_path = tf.train.latest_checkpoint(checkpoint_dirs[0]) + if not model_path: + logging.info('No model found in %s. Will try again in %d seconds', + checkpoint_dirs[0], eval_interval_secs) + elif model_path == last_evaluated_model_path: + logging.info('Found already evaluated checkpoint. Will try again in %d ' + 'seconds', eval_interval_secs) + else: + last_evaluated_model_path = model_path + run_checkpoint_once(tensor_dict, update_op, summary_dir, + aggregated_result_processor, + batch_processor, checkpoint_dirs, + variables_to_restore, restore_fn, num_batches, master, + save_graph, save_graph_dir, metric_names_to_values, + keys_to_exclude_from_results) + number_of_evaluations += 1 + + if (max_number_of_evaluations and + number_of_evaluations >= max_number_of_evaluations): + logging.info('Finished evaluation!') + break + time_to_next_eval = start + eval_interval_secs - time.time() + if time_to_next_eval > 0: + time.sleep(time_to_next_eval) diff --git a/object_detection/evaluator.py b/object_detection/evaluator.py new file mode 100644 index 0000000000000000000000000000000000000000..45f03dc764160c17ced491722ef49d913ff6dc14 --- /dev/null +++ b/object_detection/evaluator.py @@ -0,0 +1,211 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Detection model evaluator. + +This file provides a generic evaluation method that can be used to evaluate a +DetectionModel. +""" +import logging +import tensorflow as tf + +from object_detection import eval_util +from object_detection.core import box_list +from object_detection.core import box_list_ops +from object_detection.core import prefetcher +from object_detection.core import standard_fields as fields +from object_detection.utils import ops + +slim = tf.contrib.slim + +EVAL_METRICS_FN_DICT = { + 'pascal_voc_metrics': eval_util.evaluate_detection_results_pascal_voc +} + + +def _extract_prediction_tensors(model, + create_input_dict_fn, + ignore_groundtruth=False): + """Restores the model in a tensorflow session. + + Args: + model: model to perform predictions with. + create_input_dict_fn: function to create input tensor dictionaries. + ignore_groundtruth: whether groundtruth should be ignored. + + Returns: + tensor_dict: A tensor dictionary with evaluations. 
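+      The dictionary holds 'original_image', 'image_id', 'detection_boxes',
+      'detection_scores' and 'detection_classes' entries, plus
+      'detection_masks' when the model predicts masks, and (unless
+      ignore_groundtruth is True) groundtruth fields such as
+      'groundtruth_boxes', 'groundtruth_classes', 'area', 'is_crowd' and
+      'difficult'.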
+ """ + input_dict = create_input_dict_fn() + prefetch_queue = prefetcher.prefetch(input_dict, capacity=500) + input_dict = prefetch_queue.dequeue() + original_image = tf.expand_dims(input_dict[fields.InputDataFields.image], 0) + preprocessed_image = model.preprocess(tf.to_float(original_image)) + prediction_dict = model.predict(preprocessed_image) + detections = model.postprocess(prediction_dict) + + original_image_shape = tf.shape(original_image) + absolute_detection_boxlist = box_list_ops.to_absolute_coordinates( + box_list.BoxList(tf.squeeze(detections['detection_boxes'], axis=0)), + original_image_shape[1], original_image_shape[2]) + label_id_offset = 1 + tensor_dict = { + 'original_image': original_image, + 'image_id': input_dict[fields.InputDataFields.source_id], + 'detection_boxes': absolute_detection_boxlist.get(), + 'detection_scores': tf.squeeze(detections['detection_scores'], axis=0), + 'detection_classes': ( + tf.squeeze(detections['detection_classes'], axis=0) + + label_id_offset), + } + if 'detection_masks' in detections: + detection_masks = tf.squeeze(detections['detection_masks'], + axis=0) + detection_boxes = tf.squeeze(detections['detection_boxes'], + axis=0) + # TODO: This should be done in model's postprocess function ideally. + detection_masks_reframed = ops.reframe_box_masks_to_image_masks( + detection_masks, + detection_boxes, + original_image_shape[1], original_image_shape[2]) + detection_masks_reframed = tf.to_float(tf.greater(detection_masks_reframed, + 0.5)) + + tensor_dict['detection_masks'] = detection_masks_reframed + # load groundtruth fields into tensor_dict + if not ignore_groundtruth: + normalized_gt_boxlist = box_list.BoxList( + input_dict[fields.InputDataFields.groundtruth_boxes]) + gt_boxlist = box_list_ops.scale(normalized_gt_boxlist, + tf.shape(original_image)[1], + tf.shape(original_image)[2]) + groundtruth_boxes = gt_boxlist.get() + groundtruth_classes = input_dict[fields.InputDataFields.groundtruth_classes] + tensor_dict['groundtruth_boxes'] = groundtruth_boxes + tensor_dict['groundtruth_classes'] = groundtruth_classes + tensor_dict['area'] = input_dict[fields.InputDataFields.groundtruth_area] + tensor_dict['is_crowd'] = input_dict[ + fields.InputDataFields.groundtruth_is_crowd] + tensor_dict['difficult'] = input_dict[ + fields.InputDataFields.groundtruth_difficult] + if 'detection_masks' in tensor_dict: + tensor_dict['groundtruth_instance_masks'] = input_dict[ + fields.InputDataFields.groundtruth_instance_masks] + return tensor_dict + + +def evaluate(create_input_dict_fn, create_model_fn, eval_config, categories, + checkpoint_dir, eval_dir): + """Evaluation function for detection models. + + Args: + create_input_dict_fn: a function to create a tensor input dictionary. + create_model_fn: a function that creates a DetectionModel. + eval_config: a eval_pb2.EvalConfig protobuf. + categories: a list of category dictionaries. Each dict in the list should + have an integer 'id' field and string 'name' field. + checkpoint_dir: directory to load the checkpoints to evaluate from. + eval_dir: directory to write evaluation metrics summary to. + """ + + model = create_model_fn() + + if eval_config.ignore_groundtruth and not eval_config.export_path: + logging.fatal('If ignore_groundtruth=True then an export_path is ' + 'required. 
Aborting!!!') + + tensor_dict = _extract_prediction_tensors( + model=model, + create_input_dict_fn=create_input_dict_fn, + ignore_groundtruth=eval_config.ignore_groundtruth) + + def _process_batch(tensor_dict, sess, batch_index, counters, update_op): + """Evaluates tensors in tensor_dict, visualizing the first K examples. + + This function calls sess.run on tensor_dict, evaluating the original_image + tensor only on the first K examples and visualizing detections overlaid + on this original_image. + + Args: + tensor_dict: a dictionary of tensors + sess: tensorflow session + batch_index: the index of the batch amongst all batches in the run. + counters: a dictionary holding 'success' and 'skipped' fields which can + be updated to keep track of number of successful and failed runs, + respectively. If these fields are not updated, then the success/skipped + counter values shown at the end of evaluation will be incorrect. + update_op: An update op that has to be run along with output tensors. For + example this could be an op to compute statistics for slim metrics. + + Returns: + result_dict: a dictionary of numpy arrays + """ + if batch_index >= eval_config.num_visualizations: + if 'original_image' in tensor_dict: + tensor_dict = {k: v for (k, v) in tensor_dict.items() + if k != 'original_image'} + try: + (result_dict, _) = sess.run([tensor_dict, update_op]) + counters['success'] += 1 + except tf.errors.InvalidArgumentError: + logging.info('Skipping image') + counters['skipped'] += 1 + return {} + global_step = tf.train.global_step(sess, slim.get_global_step()) + if batch_index < eval_config.num_visualizations: + tag = 'image-{}'.format(batch_index) + eval_util.visualize_detection_results( + result_dict, tag, global_step, categories=categories, + summary_dir=eval_dir, + export_dir=eval_config.visualization_export_dir, + show_groundtruth=eval_config.visualization_export_dir) + return result_dict + + def _process_aggregated_results(result_lists): + eval_metric_fn_key = eval_config.metrics_set + if eval_metric_fn_key not in EVAL_METRICS_FN_DICT: + raise ValueError('Metric not found: {}'.format(eval_metric_fn_key)) + return EVAL_METRICS_FN_DICT[eval_metric_fn_key](result_lists, + categories=categories) + + variables_to_restore = tf.global_variables() + global_step = slim.get_or_create_global_step() + variables_to_restore.append(global_step) + if eval_config.use_moving_averages: + variable_averages = tf.train.ExponentialMovingAverage(0.0) + variables_to_restore = variable_averages.variables_to_restore() + saver = tf.train.Saver(variables_to_restore) + def _restore_latest_checkpoint(sess): + latest_checkpoint = tf.train.latest_checkpoint(checkpoint_dir) + saver.restore(sess, latest_checkpoint) + + eval_util.repeated_checkpoint_run( + tensor_dict=tensor_dict, + update_op=tf.no_op(), + summary_dir=eval_dir, + aggregated_result_processor=_process_aggregated_results, + batch_processor=_process_batch, + checkpoint_dirs=[checkpoint_dir], + variables_to_restore=None, + restore_fn=_restore_latest_checkpoint, + num_batches=eval_config.num_examples, + eval_interval_secs=eval_config.eval_interval_secs, + max_number_of_evaluations=( + 1 if eval_config.ignore_groundtruth else + eval_config.max_evals if eval_config.max_evals else + None), + master=eval_config.eval_master, + save_graph=eval_config.save_graph, + save_graph_dir=(eval_dir if eval_config.save_graph else '')) diff --git a/object_detection/export_inference_graph.py b/object_detection/export_inference_graph.py new file mode 100644 index 
0000000000000000000000000000000000000000..e9836e99714fd7cb2ada65edbecbda192d461a93 --- /dev/null +++ b/object_detection/export_inference_graph.py @@ -0,0 +1,101 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r"""Tool to export an object detection model for inference. + +Prepares an object detection tensorflow graph for inference using model +configuration and an optional trained checkpoint. Outputs either an inference +graph or a SavedModel (https://tensorflow.github.io/serving/serving_basic.html). + +The inference graph contains one of three input nodes depending on the user +specified option. + * `image_tensor`: Accepts a uint8 4-D tensor of shape [1, None, None, 3] + * `encoded_image_string_tensor`: Accepts a scalar string tensor of encoded PNG + or JPEG image. + * `tf_example`: Accepts a serialized TFExample proto. The batch size in this + case is always 1. + +and the following output nodes returned by the model.postprocess(..): + * `num_detections`: Outputs float32 tensors of the form [batch] + that specifies the number of valid boxes per image in the batch. + * `detection_boxes`: Outputs float32 tensors of the form + [batch, num_boxes, 4] containing detected boxes. + * `detection_scores`: Outputs float32 tensors of the form + [batch, num_boxes] containing class scores for the detections. + * `detection_classes`: Outputs float32 tensors of the form + [batch, num_boxes] containing classes for the detections. + * `detection_masks`: Outputs float32 tensors of the form + [batch, num_boxes, mask_height, mask_width] containing predicted instance + masks for each box if its present in the dictionary of postprocessed + tensors returned by the model. + +Note that currently `batch` is always 1, but we will support `batch` > 1 in +the future. + +Optionally, one can freeze the graph by converting the weights in the provided +checkpoint as graph constants thereby eliminating the need to use a checkpoint +file during inference. + +Note that this tool uses `use_moving_averages` from eval_config to decide +which weights to freeze. + +Example Usage: +-------------- +python export_inference_graph \ + --input_type image_tensor \ + --pipeline_config_path path/to/ssd_inception_v2.config \ + --checkpoint_path path/to/model-ckpt \ + --inference_graph_path path/to/inference_graph.pb +""" +import tensorflow as tf +from google.protobuf import text_format +from object_detection import exporter +from object_detection.protos import pipeline_pb2 + +slim = tf.contrib.slim +flags = tf.app.flags + +flags.DEFINE_string('input_type', 'image_tensor', 'Type of input node. Can be ' + 'one of [`image_tensor`, `encoded_image_string_tensor`, ' + '`tf_example`]') +flags.DEFINE_string('pipeline_config_path', '', + 'Path to a pipeline_pb2.TrainEvalPipelineConfig config ' + 'file.') +flags.DEFINE_string('checkpoint_path', '', 'Optional path to checkpoint file. 
' + 'If provided, bakes the weights from the checkpoint into ' + 'the graph.') +flags.DEFINE_string('inference_graph_path', '', 'Path to write the output ' + 'inference graph.') +flags.DEFINE_bool('export_as_saved_model', False, 'Whether the exported graph ' + 'should be saved as a SavedModel') + +FLAGS = flags.FLAGS + + +def main(_): + assert FLAGS.pipeline_config_path, 'TrainEvalPipelineConfig missing.' + assert FLAGS.inference_graph_path, 'Inference graph path missing.' + assert FLAGS.input_type, 'Input type missing.' + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + with tf.gfile.GFile(FLAGS.pipeline_config_path, 'r') as f: + text_format.Merge(f.read(), pipeline_config) + exporter.export_inference_graph(FLAGS.input_type, pipeline_config, + FLAGS.checkpoint_path, + FLAGS.inference_graph_path, + FLAGS.export_as_saved_model) + + +if __name__ == '__main__': + tf.app.run() diff --git a/object_detection/exporter.py b/object_detection/exporter.py new file mode 100644 index 0000000000000000000000000000000000000000..b6dd46408157b47c9c0719055cac3ed90b238f50 --- /dev/null +++ b/object_detection/exporter.py @@ -0,0 +1,339 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Functions to export object detection inference graph.""" +import logging +import os +import tensorflow as tf +from tensorflow.python import pywrap_tensorflow +from tensorflow.python.client import session +from tensorflow.python.framework import graph_util +from tensorflow.python.framework import importer +from tensorflow.python.platform import gfile +from tensorflow.python.saved_model import signature_constants +from tensorflow.python.training import saver as saver_lib +from object_detection.builders import model_builder +from object_detection.core import standard_fields as fields +from object_detection.data_decoders import tf_example_decoder + +slim = tf.contrib.slim + + +# TODO: Replace with freeze_graph.freeze_graph_with_def_protos when +# newer version of Tensorflow becomes more common. +def freeze_graph_with_def_protos( + input_graph_def, + input_saver_def, + input_checkpoint, + output_node_names, + restore_op_name, + filename_tensor_name, + clear_devices, + initializer_nodes, + variable_names_blacklist=''): + """Converts all variables in a graph and checkpoint into constants.""" + del restore_op_name, filename_tensor_name # Unused by updated loading code. + + # 'input_checkpoint' may be a prefix if we're using Saver V2 format + if not saver_lib.checkpoint_exists(input_checkpoint): + raise ValueError( + 'Input checkpoint "' + input_checkpoint + '" does not exist!') + + if not output_node_names: + raise ValueError( + 'You must supply the name of a node to --output_node_names.') + + # Remove all the explicit device specifications for this node. This helps to + # make the graph more portable. 
+ if clear_devices: + for node in input_graph_def.node: + node.device = '' + + _ = importer.import_graph_def(input_graph_def, name='') + + with session.Session() as sess: + if input_saver_def: + saver = saver_lib.Saver(saver_def=input_saver_def) + saver.restore(sess, input_checkpoint) + else: + var_list = {} + reader = pywrap_tensorflow.NewCheckpointReader(input_checkpoint) + var_to_shape_map = reader.get_variable_to_shape_map() + for key in var_to_shape_map: + try: + tensor = sess.graph.get_tensor_by_name(key + ':0') + except KeyError: + # This tensor doesn't exist in the graph (for example it's + # 'global_step' or a similar housekeeping element) so skip it. + continue + var_list[key] = tensor + saver = saver_lib.Saver(var_list=var_list) + saver.restore(sess, input_checkpoint) + if initializer_nodes: + sess.run(initializer_nodes) + + variable_names_blacklist = (variable_names_blacklist.split(',') if + variable_names_blacklist else None) + output_graph_def = graph_util.convert_variables_to_constants( + sess, + input_graph_def, + output_node_names.split(','), + variable_names_blacklist=variable_names_blacklist) + + return output_graph_def + + +def get_frozen_graph_def(inference_graph_def, use_moving_averages, + input_checkpoint, output_node_names): + """Freezes all variables in a graph definition.""" + saver = None + if use_moving_averages: + variable_averages = tf.train.ExponentialMovingAverage(0.0) + variables_to_restore = variable_averages.variables_to_restore() + saver = tf.train.Saver(variables_to_restore) + else: + saver = tf.train.Saver() + + frozen_graph_def = freeze_graph_with_def_protos( + input_graph_def=inference_graph_def, + input_saver_def=saver.as_saver_def(), + input_checkpoint=input_checkpoint, + output_node_names=output_node_names, + restore_op_name='save/restore_all', + filename_tensor_name='save/Const:0', + clear_devices=True, + initializer_nodes='') + return frozen_graph_def + + +# TODO: Support batch tf example inputs. +def _tf_example_input_placeholder(): + tf_example_placeholder = tf.placeholder( + tf.string, shape=[], name='tf_example') + tensor_dict = tf_example_decoder.TfExampleDecoder().decode( + tf_example_placeholder) + image = tensor_dict[fields.InputDataFields.image] + return tf.expand_dims(image, axis=0) + + +def _image_tensor_input_placeholder(): + return tf.placeholder(dtype=tf.uint8, + shape=(1, None, None, 3), + name='image_tensor') + + +def _encoded_image_string_tensor_input_placeholder(): + image_str = tf.placeholder(dtype=tf.string, + shape=[], + name='encoded_image_string_tensor') + image_tensor = tf.image.decode_image(image_str, channels=3) + image_tensor.set_shape((None, None, 3)) + return tf.expand_dims(image_tensor, axis=0) + + +input_placeholder_fn_map = { + 'image_tensor': _image_tensor_input_placeholder, + 'encoded_image_string_tensor': + _encoded_image_string_tensor_input_placeholder, + 'tf_example': _tf_example_input_placeholder, +} + + +def _add_output_tensor_nodes(postprocessed_tensors): + """Adds output nodes for detection boxes and scores. + + Adds the following nodes for output tensors - + * num_detections: float32 tensor of shape [batch_size]. + * detection_boxes: float32 tensor of shape [batch_size, num_boxes, 4] + containing detected boxes. + * detection_scores: float32 tensor of shape [batch_size, num_boxes] + containing scores for the detected boxes. + * detection_classes: float32 tensor of shape [batch_size, num_boxes] + containing class predictions for the detected boxes. 
+ * detection_masks: (Optional) float32 tensor of shape + [batch_size, num_boxes, mask_height, mask_width] containing masks for each + detection box. + + Args: + postprocessed_tensors: a dictionary containing the following fields + 'detection_boxes': [batch, max_detections, 4] + 'detection_scores': [batch, max_detections] + 'detection_classes': [batch, max_detections] + 'detection_masks': [batch, max_detections, mask_height, mask_width] + (optional). + 'num_detections': [batch] + + Returns: + A tensor dict containing the added output tensor nodes. + """ + label_id_offset = 1 + boxes = postprocessed_tensors.get('detection_boxes') + scores = postprocessed_tensors.get('detection_scores') + classes = postprocessed_tensors.get('detection_classes') + label_id_offset + masks = postprocessed_tensors.get('detection_masks') + num_detections = postprocessed_tensors.get('num_detections') + outputs = {} + outputs['detection_boxes'] = tf.identity(boxes, name='detection_boxes') + outputs['detection_scores'] = tf.identity(scores, name='detection_scores') + outputs['detection_classes'] = tf.identity(classes, name='detection_classes') + outputs['num_detections'] = tf.identity(num_detections, name='num_detections') + if masks is not None: + outputs['detection_masks'] = tf.identity(masks, name='detection_masks') + return outputs + + +def _write_inference_graph(inference_graph_path, + checkpoint_path=None, + use_moving_averages=False, + output_node_names=( + 'num_detections,detection_scores,' + 'detection_boxes,detection_classes')): + """Writes inference graph to disk with the option to bake in weights. + + If checkpoint_path is not None bakes the weights into the graph thereby + eliminating the need of checkpoint files during inference. If the model + was trained with moving averages, setting use_moving_averages to true + restores the moving averages, otherwise the original set of variables + is restored. + + Args: + inference_graph_path: Path to write inference graph. + checkpoint_path: Optional path to the checkpoint file. + use_moving_averages: Whether to export the original or the moving averages + of the trainable variables from the checkpoint. + output_node_names: Output tensor names, defaults are: num_detections, + detection_scores, detection_boxes, detection_classes. + """ + inference_graph_def = tf.get_default_graph().as_graph_def() + if checkpoint_path: + output_graph_def = get_frozen_graph_def( + inference_graph_def=inference_graph_def, + use_moving_averages=use_moving_averages, + input_checkpoint=checkpoint_path, + output_node_names=output_node_names, + ) + + with gfile.GFile(inference_graph_path, 'wb') as f: + f.write(output_graph_def.SerializeToString()) + logging.info('%d ops in the final graph.', len(output_graph_def.node)) + + return + tf.train.write_graph(inference_graph_def, + os.path.dirname(inference_graph_path), + os.path.basename(inference_graph_path), + as_text=False) + + +def _write_saved_model(inference_graph_path, inputs, outputs, + checkpoint_path=None, use_moving_averages=False): + """Writes SavedModel to disk. + + If checkpoint_path is not None bakes the weights into the graph thereby + eliminating the need of checkpoint files during inference. If the model + was trained with moving averages, setting use_moving_averages to true + restores the moving averages, otherwise the original set of variables + is restored. + + Args: + inference_graph_path: Path to write inference graph. + inputs: The input image tensor to use for detection. 
+ outputs: A tensor dictionary containing the outputs of a DetectionModel. + checkpoint_path: Optional path to the checkpoint file. + use_moving_averages: Whether to export the original or the moving averages + of the trainable variables from the checkpoint. + """ + inference_graph_def = tf.get_default_graph().as_graph_def() + checkpoint_graph_def = None + if checkpoint_path: + output_node_names = ','.join(outputs.keys()) + checkpoint_graph_def = get_frozen_graph_def( + inference_graph_def=inference_graph_def, + use_moving_averages=use_moving_averages, + input_checkpoint=checkpoint_path, + output_node_names=output_node_names + ) + + with tf.Graph().as_default(): + with session.Session() as sess: + + tf.import_graph_def(checkpoint_graph_def) + + builder = tf.saved_model.builder.SavedModelBuilder(inference_graph_path) + + tensor_info_inputs = { + 'inputs': tf.saved_model.utils.build_tensor_info(inputs)} + tensor_info_outputs = {} + for k, v in outputs.items(): + tensor_info_outputs[k] = tf.saved_model.utils.build_tensor_info(v) + + detection_signature = ( + tf.saved_model.signature_def_utils.build_signature_def( + inputs=tensor_info_inputs, + outputs=tensor_info_outputs, + method_name=signature_constants.PREDICT_METHOD_NAME)) + + builder.add_meta_graph_and_variables( + sess, [tf.saved_model.tag_constants.SERVING], + signature_def_map={ + signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: + detection_signature, + }, + ) + builder.save() + + +def _export_inference_graph(input_type, + detection_model, + use_moving_averages, + checkpoint_path, + inference_graph_path, + export_as_saved_model=False): + """Export helper.""" + if input_type not in input_placeholder_fn_map: + raise ValueError('Unknown input type: {}'.format(input_type)) + inputs = tf.to_float(input_placeholder_fn_map[input_type]()) + preprocessed_inputs = detection_model.preprocess(inputs) + output_tensors = detection_model.predict(preprocessed_inputs) + postprocessed_tensors = detection_model.postprocess(output_tensors) + outputs = _add_output_tensor_nodes(postprocessed_tensors) + out_node_names = list(outputs.keys()) + if export_as_saved_model: + _write_saved_model(inference_graph_path, inputs, outputs, checkpoint_path, + use_moving_averages) + else: + _write_inference_graph(inference_graph_path, checkpoint_path, + use_moving_averages, + output_node_names=','.join(out_node_names)) + + +def export_inference_graph(input_type, pipeline_config, checkpoint_path, + inference_graph_path, export_as_saved_model=False): + """Exports inference graph for the model specified in the pipeline config. + + Args: + input_type: Type of input for the graph. Can be one of [`image_tensor`, + `tf_example`]. + pipeline_config: pipeline_pb2.TrainAndEvalPipelineConfig proto. + checkpoint_path: Path to the checkpoint file to freeze. + inference_graph_path: Path to write inference graph to. + export_as_saved_model: If the model should be exported as a SavedModel. If + false, it is saved as an inference graph. 
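+
+  Example (an illustrative sketch mirroring export_inference_graph.py; the
+  paths here are hypothetical):
+
+    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
+    with tf.gfile.GFile('/path/to/pipeline.config', 'r') as f:
+      text_format.Merge(f.read(), pipeline_config)
+    export_inference_graph('image_tensor', pipeline_config,
+                           '/path/to/model.ckpt',
+                           '/path/to/frozen_inference_graph.pb')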
+ """ + detection_model = model_builder.build(pipeline_config.model, + is_training=False) + _export_inference_graph(input_type, detection_model, + pipeline_config.eval_config.use_moving_averages, + checkpoint_path, inference_graph_path, + export_as_saved_model) diff --git a/object_detection/exporter_test.py b/object_detection/exporter_test.py new file mode 100644 index 0000000000000000000000000000000000000000..d613a7f1c58416be462aa60ae67d59cb269bd6b4 --- /dev/null +++ b/object_detection/exporter_test.py @@ -0,0 +1,397 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.export_inference_graph.""" +import os +import numpy as np +import six +import tensorflow as tf +from object_detection import exporter +from object_detection.builders import model_builder +from object_detection.core import model +from object_detection.protos import pipeline_pb2 + +if six.PY2: + import mock # pylint: disable=g-import-not-at-top +else: + from unittest import mock # pylint: disable=g-import-not-at-top + + +class FakeModel(model.DetectionModel): + + def __init__(self, add_detection_masks=False): + self._add_detection_masks = add_detection_masks + + def preprocess(self, inputs): + return tf.identity(inputs) + + def predict(self, preprocessed_inputs): + return {'image': tf.layers.conv2d(preprocessed_inputs, 3, 1)} + + def postprocess(self, prediction_dict): + with tf.control_dependencies(prediction_dict.values()): + postprocessed_tensors = { + 'detection_boxes': tf.constant([[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 0.8, 0.8]], tf.float32), + 'detection_scores': tf.constant([[0.7, 0.6]], tf.float32), + 'detection_classes': tf.constant([[0, 1]], tf.float32), + 'num_detections': tf.constant([2], tf.float32) + } + if self._add_detection_masks: + postprocessed_tensors['detection_masks'] = tf.constant( + np.arange(32).reshape([2, 4, 4]), tf.float32) + return postprocessed_tensors + + def restore_fn(self, checkpoint_path, from_detection_checkpoint): + pass + + def loss(self, prediction_dict): + pass + + +class ExportInferenceGraphTest(tf.test.TestCase): + + def _save_checkpoint_from_mock_model(self, checkpoint_path, + use_moving_averages): + g = tf.Graph() + with g.as_default(): + mock_model = FakeModel() + preprocessed_inputs = mock_model.preprocess( + tf.ones([1, 3, 4, 3], tf.float32)) + predictions = mock_model.predict(preprocessed_inputs) + mock_model.postprocess(predictions) + if use_moving_averages: + tf.train.ExponentialMovingAverage(0.0).apply() + saver = tf.train.Saver() + init = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init) + saver.save(sess, checkpoint_path) + + def _load_inference_graph(self, inference_graph_path): + od_graph = tf.Graph() + with od_graph.as_default(): + od_graph_def = tf.GraphDef() + with tf.gfile.GFile(inference_graph_path) as fid: + serialized_graph = fid.read() + 
od_graph_def.ParseFromString(serialized_graph) + tf.import_graph_def(od_graph_def, name='') + return od_graph + + def _create_tf_example(self, image_array): + with self.test_session(): + encoded_image = tf.image.encode_jpeg(tf.constant(image_array)).eval() + def _bytes_feature(value): + return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value])) + example = tf.train.Example(features=tf.train.Features(feature={ + 'image/encoded': _bytes_feature(encoded_image), + 'image/format': _bytes_feature('jpg'), + 'image/source_id': _bytes_feature('image_id') + })).SerializeToString() + return example + + def test_export_graph_with_image_tensor_input(self): + with mock.patch.object( + model_builder, 'build', autospec=True) as mock_builder: + mock_builder.return_value = FakeModel() + inference_graph_path = os.path.join(self.get_temp_dir(), + 'exported_graph.pbtxt') + + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + pipeline_config.eval_config.use_moving_averages = False + exporter.export_inference_graph( + input_type='image_tensor', + pipeline_config=pipeline_config, + checkpoint_path=None, + inference_graph_path=inference_graph_path) + + def test_export_graph_with_tf_example_input(self): + with mock.patch.object( + model_builder, 'build', autospec=True) as mock_builder: + mock_builder.return_value = FakeModel() + inference_graph_path = os.path.join(self.get_temp_dir(), + 'exported_graph.pbtxt') + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + pipeline_config.eval_config.use_moving_averages = False + exporter.export_inference_graph( + input_type='tf_example', + pipeline_config=pipeline_config, + checkpoint_path=None, + inference_graph_path=inference_graph_path) + + def test_export_graph_with_encoded_image_string_input(self): + with mock.patch.object( + model_builder, 'build', autospec=True) as mock_builder: + mock_builder.return_value = FakeModel() + inference_graph_path = os.path.join(self.get_temp_dir(), + 'exported_graph.pbtxt') + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + pipeline_config.eval_config.use_moving_averages = False + exporter.export_inference_graph( + input_type='encoded_image_string_tensor', + pipeline_config=pipeline_config, + checkpoint_path=None, + inference_graph_path=inference_graph_path) + + def test_export_frozen_graph(self): + checkpoint_path = os.path.join(self.get_temp_dir(), 'model-ckpt') + self._save_checkpoint_from_mock_model(checkpoint_path, + use_moving_averages=False) + inference_graph_path = os.path.join(self.get_temp_dir(), + 'exported_graph.pb') + with mock.patch.object( + model_builder, 'build', autospec=True) as mock_builder: + mock_builder.return_value = FakeModel() + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + pipeline_config.eval_config.use_moving_averages = False + exporter.export_inference_graph( + input_type='image_tensor', + pipeline_config=pipeline_config, + checkpoint_path=checkpoint_path, + inference_graph_path=inference_graph_path) + + def test_export_frozen_graph_with_moving_averages(self): + checkpoint_path = os.path.join(self.get_temp_dir(), 'model-ckpt') + self._save_checkpoint_from_mock_model(checkpoint_path, + use_moving_averages=True) + inference_graph_path = os.path.join(self.get_temp_dir(), + 'exported_graph.pb') + with mock.patch.object( + model_builder, 'build', autospec=True) as mock_builder: + mock_builder.return_value = FakeModel() + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + pipeline_config.eval_config.use_moving_averages = True + 
exporter.export_inference_graph( + input_type='image_tensor', + pipeline_config=pipeline_config, + checkpoint_path=checkpoint_path, + inference_graph_path=inference_graph_path) + + def test_export_model_with_all_output_nodes(self): + checkpoint_path = os.path.join(self.get_temp_dir(), 'model-ckpt') + self._save_checkpoint_from_mock_model(checkpoint_path, + use_moving_averages=False) + inference_graph_path = os.path.join(self.get_temp_dir(), + 'exported_graph.pb') + with mock.patch.object( + model_builder, 'build', autospec=True) as mock_builder: + mock_builder.return_value = FakeModel(add_detection_masks=True) + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + exporter.export_inference_graph( + input_type='image_tensor', + pipeline_config=pipeline_config, + checkpoint_path=checkpoint_path, + inference_graph_path=inference_graph_path) + inference_graph = self._load_inference_graph(inference_graph_path) + with self.test_session(graph=inference_graph): + inference_graph.get_tensor_by_name('image_tensor:0') + inference_graph.get_tensor_by_name('detection_boxes:0') + inference_graph.get_tensor_by_name('detection_scores:0') + inference_graph.get_tensor_by_name('detection_classes:0') + inference_graph.get_tensor_by_name('detection_masks:0') + inference_graph.get_tensor_by_name('num_detections:0') + + def test_export_model_with_detection_only_nodes(self): + checkpoint_path = os.path.join(self.get_temp_dir(), 'model-ckpt') + self._save_checkpoint_from_mock_model(checkpoint_path, + use_moving_averages=False) + inference_graph_path = os.path.join(self.get_temp_dir(), + 'exported_graph.pb') + with mock.patch.object( + model_builder, 'build', autospec=True) as mock_builder: + mock_builder.return_value = FakeModel(add_detection_masks=False) + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + exporter.export_inference_graph( + input_type='image_tensor', + pipeline_config=pipeline_config, + checkpoint_path=checkpoint_path, + inference_graph_path=inference_graph_path) + inference_graph = self._load_inference_graph(inference_graph_path) + with self.test_session(graph=inference_graph): + inference_graph.get_tensor_by_name('image_tensor:0') + inference_graph.get_tensor_by_name('detection_boxes:0') + inference_graph.get_tensor_by_name('detection_scores:0') + inference_graph.get_tensor_by_name('detection_classes:0') + inference_graph.get_tensor_by_name('num_detections:0') + with self.assertRaises(KeyError): + inference_graph.get_tensor_by_name('detection_masks:0') + + def test_export_and_run_inference_with_image_tensor(self): + checkpoint_path = os.path.join(self.get_temp_dir(), 'model-ckpt') + self._save_checkpoint_from_mock_model(checkpoint_path, + use_moving_averages=False) + inference_graph_path = os.path.join(self.get_temp_dir(), + 'exported_graph.pb') + with mock.patch.object( + model_builder, 'build', autospec=True) as mock_builder: + mock_builder.return_value = FakeModel(add_detection_masks=True) + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + pipeline_config.eval_config.use_moving_averages = False + exporter.export_inference_graph( + input_type='image_tensor', + pipeline_config=pipeline_config, + checkpoint_path=checkpoint_path, + inference_graph_path=inference_graph_path) + + inference_graph = self._load_inference_graph(inference_graph_path) + with self.test_session(graph=inference_graph) as sess: + image_tensor = inference_graph.get_tensor_by_name('image_tensor:0') + boxes = inference_graph.get_tensor_by_name('detection_boxes:0') + scores = 
inference_graph.get_tensor_by_name('detection_scores:0') + classes = inference_graph.get_tensor_by_name('detection_classes:0') + masks = inference_graph.get_tensor_by_name('detection_masks:0') + num_detections = inference_graph.get_tensor_by_name('num_detections:0') + (boxes, scores, classes, masks, num_detections) = sess.run( + [boxes, scores, classes, masks, num_detections], + feed_dict={image_tensor: np.ones((1, 4, 4, 3)).astype(np.uint8)}) + self.assertAllClose(boxes, [[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 0.8, 0.8]]) + self.assertAllClose(scores, [[0.7, 0.6]]) + self.assertAllClose(classes, [[1, 2]]) + self.assertAllClose(masks, np.arange(32).reshape([2, 4, 4])) + self.assertAllClose(num_detections, [2]) + + def _create_encoded_image_string(self, image_array_np, encoding_format): + od_graph = tf.Graph() + with od_graph.as_default(): + if encoding_format == 'jpg': + encoded_string = tf.image.encode_jpeg(image_array_np) + elif encoding_format == 'png': + encoded_string = tf.image.encode_png(image_array_np) + else: + raise ValueError('Supports only the following formats: `jpg`, `png`') + with self.test_session(graph=od_graph): + return encoded_string.eval() + + def test_export_and_run_inference_with_encoded_image_string_tensor(self): + checkpoint_path = os.path.join(self.get_temp_dir(), 'model-ckpt') + self._save_checkpoint_from_mock_model(checkpoint_path, + use_moving_averages=False) + inference_graph_path = os.path.join(self.get_temp_dir(), + 'exported_graph.pb') + with mock.patch.object( + model_builder, 'build', autospec=True) as mock_builder: + mock_builder.return_value = FakeModel(add_detection_masks=True) + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + pipeline_config.eval_config.use_moving_averages = False + exporter.export_inference_graph( + input_type='encoded_image_string_tensor', + pipeline_config=pipeline_config, + checkpoint_path=checkpoint_path, + inference_graph_path=inference_graph_path) + + inference_graph = self._load_inference_graph(inference_graph_path) + jpg_image_str = self._create_encoded_image_string( + np.ones((4, 4, 3)).astype(np.uint8), 'jpg') + png_image_str = self._create_encoded_image_string( + np.ones((4, 4, 3)).astype(np.uint8), 'png') + with self.test_session(graph=inference_graph) as sess: + image_str_tensor = inference_graph.get_tensor_by_name( + 'encoded_image_string_tensor:0') + boxes = inference_graph.get_tensor_by_name('detection_boxes:0') + scores = inference_graph.get_tensor_by_name('detection_scores:0') + classes = inference_graph.get_tensor_by_name('detection_classes:0') + masks = inference_graph.get_tensor_by_name('detection_masks:0') + num_detections = inference_graph.get_tensor_by_name('num_detections:0') + for image_str in [jpg_image_str, png_image_str]: + (boxes_np, scores_np, classes_np, masks_np, + num_detections_np) = sess.run( + [boxes, scores, classes, masks, num_detections], + feed_dict={image_str_tensor: image_str}) + self.assertAllClose(boxes_np, [[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 0.8, 0.8]]) + self.assertAllClose(scores_np, [[0.7, 0.6]]) + self.assertAllClose(classes_np, [[1, 2]]) + self.assertAllClose(masks_np, np.arange(32).reshape([2, 4, 4])) + self.assertAllClose(num_detections_np, [2]) + + def test_export_and_run_inference_with_tf_example(self): + checkpoint_path = os.path.join(self.get_temp_dir(), 'model-ckpt') + self._save_checkpoint_from_mock_model(checkpoint_path, + use_moving_averages=False) + inference_graph_path = os.path.join(self.get_temp_dir(), + 'exported_graph.pb') + with mock.patch.object( + 
model_builder, 'build', autospec=True) as mock_builder: + mock_builder.return_value = FakeModel(add_detection_masks=True) + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + pipeline_config.eval_config.use_moving_averages = False + exporter.export_inference_graph( + input_type='tf_example', + pipeline_config=pipeline_config, + checkpoint_path=checkpoint_path, + inference_graph_path=inference_graph_path) + + inference_graph = self._load_inference_graph(inference_graph_path) + with self.test_session(graph=inference_graph) as sess: + tf_example = inference_graph.get_tensor_by_name('tf_example:0') + boxes = inference_graph.get_tensor_by_name('detection_boxes:0') + scores = inference_graph.get_tensor_by_name('detection_scores:0') + classes = inference_graph.get_tensor_by_name('detection_classes:0') + masks = inference_graph.get_tensor_by_name('detection_masks:0') + num_detections = inference_graph.get_tensor_by_name('num_detections:0') + (boxes, scores, classes, masks, num_detections) = sess.run( + [boxes, scores, classes, masks, num_detections], + feed_dict={tf_example: self._create_tf_example( + np.ones((4, 4, 3)).astype(np.uint8))}) + self.assertAllClose(boxes, [[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 0.8, 0.8]]) + self.assertAllClose(scores, [[0.7, 0.6]]) + self.assertAllClose(classes, [[1, 2]]) + self.assertAllClose(masks, np.arange(32).reshape([2, 4, 4])) + self.assertAllClose(num_detections, [2]) + + def test_export_saved_model_and_run_inference(self): + checkpoint_path = os.path.join(self.get_temp_dir(), 'model-ckpt') + self._save_checkpoint_from_mock_model(checkpoint_path, + use_moving_averages=False) + inference_graph_path = os.path.join(self.get_temp_dir(), + 'saved_model') + + with mock.patch.object( + model_builder, 'build', autospec=True) as mock_builder: + mock_builder.return_value = FakeModel(add_detection_masks=True) + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + pipeline_config.eval_config.use_moving_averages = False + exporter.export_inference_graph( + input_type='tf_example', + pipeline_config=pipeline_config, + checkpoint_path=checkpoint_path, + inference_graph_path=inference_graph_path, + export_as_saved_model=True) + + with tf.Graph().as_default() as od_graph: + with self.test_session(graph=od_graph) as sess: + tf.saved_model.loader.load( + sess, [tf.saved_model.tag_constants.SERVING], inference_graph_path) + tf_example = od_graph.get_tensor_by_name('import/tf_example:0') + boxes = od_graph.get_tensor_by_name('import/detection_boxes:0') + scores = od_graph.get_tensor_by_name('import/detection_scores:0') + classes = od_graph.get_tensor_by_name('import/detection_classes:0') + masks = od_graph.get_tensor_by_name('import/detection_masks:0') + num_detections = od_graph.get_tensor_by_name('import/num_detections:0') + (boxes, scores, classes, masks, num_detections) = sess.run( + [boxes, scores, classes, masks, num_detections], + feed_dict={tf_example: self._create_tf_example( + np.ones((4, 4, 3)).astype(np.uint8))}) + self.assertAllClose(boxes, [[0.0, 0.0, 0.5, 0.5], + [0.5, 0.5, 0.8, 0.8]]) + self.assertAllClose(scores, [[0.7, 0.6]]) + self.assertAllClose(classes, [[1, 2]]) + self.assertAllClose(masks, np.arange(32).reshape([2, 4, 4])) + self.assertAllClose(num_detections, [2]) + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/g3doc/configuring_jobs.md b/object_detection/g3doc/configuring_jobs.md new file mode 100644 index 0000000000000000000000000000000000000000..f4d345ffcead5594b56639ad3336867ea3402201 --- /dev/null +++ 
b/object_detection/g3doc/configuring_jobs.md @@ -0,0 +1,162 @@ +# Configuring the Object Detection Training Pipeline + +## Overview + +The Tensorflow Object Detection API uses protobuf files to configure the +training and evaluation process. The schema for the training pipeline can be +found in object_detection/protos/pipeline.proto. At a high level, the config +file is split into 5 parts: + +1. The `model` configuration. This defines what type of model will be trained +(ie. meta-architecture, feature extractor). +2. The `train_config`, which decides what parameters should be used to train +model parameters (ie. SGD parameters, input preprocessing and feature extractor +initialization values). +3. The `eval_config`, which determines what set of metrics will be reported for +evaluation (currently we only support the PASCAL VOC metrics). +4. The `train_input_config`, which defines what dataset the model should be +trained on. +5. The `eval_input_config`, which defines what dataset the model will be +evaluated on. Typically this should be different than the training input +dataset. + +A skeleton configuration file is shown below: + +``` +model { +(... Add model config here...) +} + +train_config : { +(... Add train_config here...) +} + +train_input_reader: { +(... Add train_input configuration here...) +} + +eval_config: { +} + +eval_input_reader: { +(... Add eval_input configuration here...) +} +``` + +## Picking Model Parameters + +There are a large number of model parameters to configure. The best settings +will depend on your given application. Faster R-CNN models are better suited to +cases where high accuracy is desired and latency is of lower priority. +Conversely, if processing time is the most important factor, SSD models are +recommended. Read [our paper](https://arxiv.org/abs/1611.10012) for a more +detailed discussion on the speed vs accuracy tradeoff. + +To help new users get started, sample model configurations have been provided +in the object_detection/samples/model_configs folder. The contents of these +configuration files can be pasted into `model` field of the skeleton +configuration. Users should note that the `num_classes` field should be changed +to a value suited for the dataset the user is training on. + +## Defining Inputs + +The Tensorflow Object Detection API accepts inputs in the TFRecord file format. +Users must specify the locations of both the training and evaluation files. +Additionally, users should also specify a label map, which define the mapping +between a class id and class name. The label map should be identical between +training and evaluation datasets. + +An example input configuration looks as follows: + +``` +tf_record_input_reader { + input_path: "/usr/home/username/data/train.record" +} +label_map_path: "/usr/home/username/data/label_map.pbtxt" +``` + +Users should substitute the `input_path` and `label_map_path` arguments and +insert the input configuration into the `train_input_reader` and +`eval_input_reader` fields in the skeleton configuration. Note that the paths +can also point to Google Cloud Storage buckets (ie. +"gs://project_bucket/train.record") for use on Google Cloud. + +## Configuring the Trainer + +The `train_config` defines parts of the training process: + +1. Model parameter initialization. +2. Input preprocessing. +3. SGD parameters. 
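+
+If you prefer to inspect or edit a pipeline configuration programmatically
+rather than by hand, the compiled protos can be loaded with the protobuf
+`text_format` module. The sketch below is illustrative only: it assumes the
+protos have been compiled as described in the installation instructions, and
+the config path is hypothetical.
+
+```python
+import tensorflow as tf
+
+from google.protobuf import text_format
+from object_detection.protos import pipeline_pb2
+
+# Hypothetical path; substitute your own pipeline config file.
+config_path = '/usr/home/username/models/faster_rcnn_resnet101.config'
+
+pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
+with tf.gfile.GFile(config_path, 'r') as f:
+  text_format.Merge(f.read(), pipeline_config)
+
+# The sections described above are fields of the loaded proto.
+print(pipeline_config.model)
+print(pipeline_config.train_config.batch_size)
+```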
+ +A sample `train_config` is below: + +``` +batch_size: 1 +optimizer { + momentum_optimizer: { + learning_rate: { + manual_step_learning_rate { + initial_learning_rate: 0.0002 + schedule { + step: 0 + learning_rate: .0002 + } + schedule { + step: 900000 + learning_rate: .00002 + } + schedule { + step: 1200000 + learning_rate: .000002 + } + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false +} +fine_tune_checkpoint: "/usr/home/username/tmp/model.ckpt-#####" +from_detection_checkpoint: true +gradient_clipping_by_norm: 10.0 +data_augmentation_options { + random_horizontal_flip { + } +} +``` + +### Model Parameter Initialization + +While optional, it is highly recommended that users utilize other object +detection checkpoints. Training an object detector from scratch can take days. +To speed up the training process, it is recommended that users re-use the +feature extractor parameters from a pre-existing object classification or +detection checkpoint. `train_config` provides two fields to specify +pre-existing checkpoints: `fine_tune_checkpoint` and +`from_detection_checkpoint`. `fine_tune_checkpoint` should provide a path to +the pre-existing checkpoint +(ie:"/usr/home/username/checkpoint/model.ckpt-#####"). +`from_detection_checkpoint` is a boolean value. If false, it assumes the +checkpoint was from an object classification checkpoint. Note that starting +from a detection checkpoint will usually result in a faster training job than +a classification checkpoint. + +The list of provided checkpoints can be found [here](detection_model_zoo.md). + +### Input Preprocessing + +The `data_augmentation_options` in `train_config` can be used to specify +how training data can be modified. This field is optional. + +### SGD Parameters + +The remainings parameters in `train_config` are hyperparameters for gradient +descent. Please note that the optimal learning rates provided in these +configuration files may depend on the specifics of the training setup (e.g. +number of workers, gpu type). + +## Configuring the Evaluator + +Currently evaluation is fixed to generating metrics as defined by the PASCAL +VOC challenge. The parameters for `eval_config` are set to reasonable defaults +and typically do not need to be configured. diff --git a/object_detection/g3doc/defining_your_own_model.md b/object_detection/g3doc/defining_your_own_model.md new file mode 100644 index 0000000000000000000000000000000000000000..6e36543b50a1f1df924b07d34bdf10a93b716268 --- /dev/null +++ b/object_detection/g3doc/defining_your_own_model.md @@ -0,0 +1,137 @@ +# So you want to create a new model! + +In this section, we discuss some of the abstractions that we use +for defining detection models. If you would like to define a new model +architecture for detection and use it in the Tensorflow Detection API, +then this section should also serve as a high level guide to the files that you +will need to edit to get your new model working. + +## DetectionModels (`object_detection/core/model.py`) + +In order to be trained, evaluated, and exported for serving using our +provided binaries, all models under the Tensorflow Object Detection API must +implement the `DetectionModel` interface (see the full definition in `object_detection/core/model.py`). In particular, +each of these models are responsible for implementing 5 functions: + +* `preprocess`: Run any preprocessing (e.g., scaling/shifting/reshaping) of + input values that is necessary prior to running the detector on an input + image. 
+* `predict`: Produce “raw” prediction tensors that can be passed to loss or + postprocess functions. +* `postprocess`: Convert predicted output tensors to final detections. +* `loss`: Compute scalar loss tensors with respect to provided groundtruth. +* `restore`: Load a checkpoint into the Tensorflow graph. + +Given a `DetectionModel` at training time, we pass each image batch through +the following sequence of functions to compute a loss which can be optimized via +SGD: + +``` +inputs (images tensor) -> preprocess -> predict -> loss -> outputs (loss tensor) +``` + +And at eval time, we pass each image batch through the following sequence of +functions to produce a set of detections: + +``` +inputs (images tensor) -> preprocess -> predict -> postprocess -> + outputs (boxes tensor, scores tensor, classes tensor, num_detections tensor) +``` + +Some conventions to be aware of: + +* `DetectionModel`s should make no assumptions about the input size or aspect + ratio --- they are responsible for doing any resize/reshaping necessary + (see docstring for the `preprocess` function). +* Output classes are always integers in the range `[0, num_classes)`. + Any mapping of these integers to semantic labels is to be handled outside + of this class. We never explicitly emit a “background class” --- thus 0 is + the first non-background class and any logic of predicting and removing + implicit background classes must be handled internally by the implementation. +* Detected boxes are to be interpreted as being in + `[y_min, x_min, y_max, x_max]` format and normalized relative to the + image window. +* We do not specifically assume any kind of probabilistic interpretation of the + scores --- the only important thing is their relative ordering. Thus + implementations of the postprocess function are free to output logits, + probabilities, calibrated probabilities, or anything else. + +## Defining a new Faster R-CNN or SSD Feature Extractor + +In most cases, you probably will not implement a `DetectionModel` from scratch +--- instead you might create a new feature extractor to be used by one of the +SSD or Faster R-CNN meta-architectures. (We think of meta-architectures as +classes that define entire families of models using the `DetectionModel` +abstraction). + +Note: For the following discussion to make sense, we recommend first becoming +familiar with the [Faster R-CNN](https://arxiv.org/abs/1506.01497) paper. + +Let’s now imagine that you have invented a brand new network architecture +(say, “InceptionV100”) for classification and want to see how InceptionV100 +would behave as a feature extractor for detection (say, with Faster R-CNN). +A similar procedure would hold for SSD models, but we’ll discuss Faster R-CNN. + +To use InceptionV100, we will have to define a new +`FasterRCNNFeatureExtractor` and pass it to our `FasterRCNNMetaArch` +constructor as input. See +`object_detection/meta_architectures/faster_rcnn_meta_arch.py` for definitions +of `FasterRCNNFeatureExtractor` and `FasterRCNNMetaArch`, respectively. +A `FasterRCNNFeatureExtractor` must define a few +functions: + +* `preprocess`: Run any preprocessing of input values that is necessary prior + to running the detector on an input image. +* `_extract_proposal_features`: Extract first stage Region Proposal Network + (RPN) features. +* `_extract_box_classifier_features`: Extract second stage Box Classifier + features. +* `restore_from_classification_checkpoint_fn`: Load a checkpoint into the + Tensorflow graph. 
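+
+As a rough sketch (not actual library code), a feature extractor for the
+hypothetical InceptionV100 network might look like the following. The class
+and method names come from the discussion above, but the method signatures
+and layer choices here are simplified placeholders; consult
+`object_detection/meta_architectures/faster_rcnn_meta_arch.py` for the exact
+interface.
+
+```python
+import tensorflow as tf
+
+from object_detection.meta_architectures import faster_rcnn_meta_arch
+
+slim = tf.contrib.slim
+
+
+class FasterRCNNInceptionV100FeatureExtractor(
+    faster_rcnn_meta_arch.FasterRCNNFeatureExtractor):
+  """Hypothetical InceptionV100 feature extractor (illustrative only)."""
+
+  def preprocess(self, resized_inputs):
+    # Replicate whatever preprocessing was used when training the
+    # classification checkpoint, e.g. per-channel mean subtraction.
+    channel_means = [123.68, 116.779, 103.939]
+    return resized_inputs - [[channel_means]]
+
+  def _extract_proposal_features(self, preprocessed_inputs, scope):
+    # Everything up to the "cut point" of InceptionV100 would go here; a
+    # single conv layer stands in for the real network in this sketch.
+    with tf.variable_scope(scope):
+      return slim.conv2d(preprocessed_inputs, 64, [3, 3])
+
+  def _extract_box_classifier_features(self, proposal_feature_maps, scope):
+    # The remainder of the network, applied to the cropped proposal features.
+    with tf.variable_scope(scope):
+      return slim.conv2d(proposal_feature_maps, 128, [3, 3])
+
+  def restore_from_classification_checkpoint_fn(self, *scopes):
+    # Returns a map from classification-checkpoint variable names to
+    # variables in the detection graph (argument list abbreviated here).
+    return {variable.op.name: variable for variable in tf.global_variables()}
+```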
+ +See the `object_detection/models/faster_rcnn_resnet_v1_feature_extractor.py` +definition as one example. Some remarks: + +* We typically initialize the weights of this feature extractor + using those from the + [Slim Resnet-101 classification checkpoint](https://github.com/tensorflow/models/tree/master/slim#pre-trained-models), + and we know + that images were preprocessed when training this checkpoint + by subtracting a channel mean from each input + image. Thus, we implement the preprocess function to replicate the same + channel mean subtraction behavior. +* The “full” resnet classification network defined in slim is cut into two + parts --- all but the last “resnet block” is put into the + `_extract_proposal_features` function and the final block is separately + defined in the `_extract_box_classifier_features function`. In general, + some experimentation may be required to decide on an optimal layer at + which to “cut” your feature extractor into these two pieces for Faster R-CNN. + +## Register your model for configuration + +Assuming that your new feature extractor does not require nonstandard +configuration, you will want to ideally be able to simply change the +“feature_extractor.type” fields in your configuration protos to point to a +new feature extractor. In order for our API to know how to understand this +new type though, you will first have to register your new feature +extractor with the model builder (`object_detection/builders/model_builder.py`), +whose job is to create models from config protos.. + +Registration is simple --- just add a pointer to the new Feature Extractor +class that you have defined in one of the SSD or Faster R-CNN Feature +Extractor Class maps at the top of the +`object_detection/builders/model_builder.py` file. +We recommend adding a test in `object_detection/builders/model_builder_test.py` +to make sure that parsing your proto will work as expected. + +## Taking your new model for a spin + +After registration you are ready to go with your model! Some final tips: + +* To save time debugging, try running your configuration file locally first + (both training and evaluation). +* Do a sweep of learning rates to figure out which learning rate is best + for your model. +* A small but often important detail: you may find it necessary to disable + batchnorm training (that is, load the batch norm parameters from the + classification checkpoint, but do not update them during gradient descent). diff --git a/object_detection/g3doc/detection_model_zoo.md b/object_detection/g3doc/detection_model_zoo.md new file mode 100644 index 0000000000000000000000000000000000000000..ba656bae674b54cf149b0c6120a2901f228abe64 --- /dev/null +++ b/object_detection/g3doc/detection_model_zoo.md @@ -0,0 +1,42 @@ +# Tensorflow detection model zoo + +We provide a collection of detection models pre-trained on the +[COCO dataset](http://mscoco.org). +These models can be useful for out-of-the-box inference if you are interested +in categories already in COCO (e.g., humans, cars, etc). +They are also useful for initializing your models when training on novel +datasets. + +In the table below, we list each such pre-trained model including: + +* a model name that corresponds to a config file that was used to train this + model in the `samples/configs` directory, +* a download link to a tar.gz file containing the pre-trained model, +* model speed (one of {slow, medium, fast}), +* detector performance on COCO data as measured by the COCO mAP measure. 
Here, higher is better, and we only report bounding box mAP rounded to the
+  nearest integer.
+* Output types (currently only `Boxes`)
+
+You can un-tar each tar.gz file via, e.g.,:
+
+```
+tar -xzvf ssd_mobilenet_v1_coco.tar.gz
+```
+
+Inside the un-tar'ed directory, you will find:
+
+* a graph proto (`graph.pbtxt`)
+* a checkpoint
+  (`model.ckpt.data-00000-of-00001`, `model.ckpt.index`, `model.ckpt.meta`)
+* a frozen graph proto with weights baked into the graph as constants
+  (`frozen_inference_graph.pb`) to be used for out-of-the-box inference
+  (try this out in the Jupyter notebook!)
+
+| Model name | Speed | COCO mAP | Outputs |
+| ------------ | :--------------: | :--------------: | :-------------: |
+| [ssd_mobilenet_v1_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz) | fast | 21 | Boxes |
+| [ssd_inception_v2_coco](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_11_06_2017.tar.gz) | fast | 24 | Boxes |
+| [rfcn_resnet101_coco](http://download.tensorflow.org/models/object_detection/rfcn_resnet101_coco_11_06_2017.tar.gz) | medium | 30 | Boxes |
+| [faster_rcnn_resnet101_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz) | medium | 32 | Boxes |
+| [faster_rcnn_inception_resnet_v2_atrous_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017.tar.gz) | slow | 37 | Boxes | diff --git a/object_detection/g3doc/exporting_models.md b/object_detection/g3doc/exporting_models.md new file mode 100644 index 0000000000000000000000000000000000000000..5291d6b9f6ad7f368ab602ad6cb9957a5d5e48a0 --- /dev/null +++ b/object_detection/g3doc/exporting_models.md @@ -0,0 +1,22 @@
+# Exporting a trained model for inference
+
+After your model has been trained, you should export it to a Tensorflow
+graph proto. A checkpoint will typically consist of three files:
+
+* `model.ckpt-${CHECKPOINT_NUMBER}.data-00000-of-00001`
+* `model.ckpt-${CHECKPOINT_NUMBER}.index`
+* `model.ckpt-${CHECKPOINT_NUMBER}.meta`
+
+After you've identified a candidate checkpoint to export, run the following
+command from the `tensorflow/models` directory:
+
+``` bash
+# From tensorflow/models
+python object_detection/export_inference_graph.py \
+    --input_type image_tensor \
+    --pipeline_config_path ${PIPELINE_CONFIG_PATH} \
+    --checkpoint_path model.ckpt-${CHECKPOINT_NUMBER} \
+    --inference_graph_path output_inference_graph.pb
+```
+
+Afterwards, you should see a graph named `output_inference_graph.pb`.
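+
+Once exported, the frozen graph can be loaded for inference. The snippet below
+is a minimal, illustrative sketch: it assumes the graph was exported with
+`--input_type image_tensor`, uses the output tensor names produced by the
+exporter (`detection_boxes`, `detection_scores`, `detection_classes`,
+`num_detections`), and feeds a dummy image in place of real data.
+
+```python
+import numpy as np
+import tensorflow as tf
+
+graph_def = tf.GraphDef()
+with tf.gfile.GFile('output_inference_graph.pb', 'rb') as fid:
+  graph_def.ParseFromString(fid.read())
+
+with tf.Graph().as_default() as graph:
+  tf.import_graph_def(graph_def, name='')
+
+with tf.Session(graph=graph) as sess:
+  image_tensor = graph.get_tensor_by_name('image_tensor:0')
+  output_tensors = [graph.get_tensor_by_name(name + ':0') for name in
+                    ['detection_boxes', 'detection_scores',
+                     'detection_classes', 'num_detections']]
+  # A dummy uint8 batch of shape [1, height, width, 3] stands in for an image.
+  dummy_image = np.zeros((1, 300, 300, 3), dtype=np.uint8)
+  boxes, scores, classes, num_detections = sess.run(
+      output_tensors, feed_dict={image_tensor: dummy_image})
+```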
diff --git a/object_detection/g3doc/img/dogs_detections_output.jpg b/object_detection/g3doc/img/dogs_detections_output.jpg new file mode 100644 index 0000000000000000000000000000000000000000..9e88a7010fa90f5c4a74f6caee78f5c975f77e40 Binary files /dev/null and b/object_detection/g3doc/img/dogs_detections_output.jpg differ diff --git a/object_detection/g3doc/img/kites_detections_output.jpg b/object_detection/g3doc/img/kites_detections_output.jpg new file mode 100644 index 0000000000000000000000000000000000000000..7c0f3364deda6614b5bf6fdddad7e7a578f0f6eb Binary files /dev/null and b/object_detection/g3doc/img/kites_detections_output.jpg differ diff --git a/object_detection/g3doc/img/oxford_pet.png b/object_detection/g3doc/img/oxford_pet.png new file mode 100644 index 0000000000000000000000000000000000000000..ddac415f5ef079f8d6fde8dd4c9838735fd96325 Binary files /dev/null and b/object_detection/g3doc/img/oxford_pet.png differ diff --git a/object_detection/g3doc/img/tensorboard.png b/object_detection/g3doc/img/tensorboard.png new file mode 100644 index 0000000000000000000000000000000000000000..fbcdbeb38cf5594681c0e206a08b6d06bd1e86a9 Binary files /dev/null and b/object_detection/g3doc/img/tensorboard.png differ diff --git a/object_detection/g3doc/img/tensorboard2.png b/object_detection/g3doc/img/tensorboard2.png new file mode 100644 index 0000000000000000000000000000000000000000..97ad22daa11870ecebbbe7cadfb2d8bb30d738f6 Binary files /dev/null and b/object_detection/g3doc/img/tensorboard2.png differ diff --git a/object_detection/g3doc/installation.md b/object_detection/g3doc/installation.md new file mode 100644 index 0000000000000000000000000000000000000000..833f5fc2427b7e7f8991db88d873486e35252943 --- /dev/null +++ b/object_detection/g3doc/installation.md @@ -0,0 +1,79 @@ +# Installation + +## Dependencies + +Tensorflow Object Detection API depends on the following libraries: + +* Protobuf 2.6 +* Pillow 1.0 +* lxml +* tf Slim (which is included in the "tensorflow/models" checkout) +* Jupyter notebook +* Matplotlib +* Tensorflow + +For detailed steps to install Tensorflow, follow the +[Tensorflow installation instructions](https://www.tensorflow.org/install/). +A typically user can install Tensorflow using one of the following commands: + +``` bash +# For CPU +pip install tensorflow +# For GPU +pip install tensorflow-gpu +``` + +The remaining libraries can be installed on Ubuntu 16.04 using via apt-get: + +``` bash +sudo apt-get install protobuf-compiler python-pil python-lxml +sudo pip install jupyter +sudo pip install matplotlib +``` + +Alternatively, users can install dependencies using pip: + +``` bash +sudo pip install pillow +sudo pip install lxml +sudo pip install jupyter +sudo pip install matplotlib +``` + +## Protobuf Compilation + +The Tensorflow Object Detection API uses Protobufs to configure model and +training parameters. Before the framework can be used, the Protobuf libraries +must be compiled. This should be done by running the following command from +the tensorflow/models directory: + + +``` bash +# From tensorflow/models/ +protoc object_detection/protos/*.proto --python_out=. +``` + +## Add Libraries to PYTHONPATH + +When running locally, the tensorflow/models/ and slim directories should be +appended to PYTHONPATH. This can be done by running the following from +tensorflow/models/: + + +``` bash +# From tensorflow/models/ +export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim +``` + +Note: This command needs to run from every new terminal you start. 
If you wish +to avoid running this manually, you can add it as a new line to the end of your +~/.bashrc file. + +# Testing the Installation + +You can test that you have correctly installed the Tensorflow Object Detection\ +API by running the following command: + +``` bash +python object_detection/builders/model_builder_test.py +``` diff --git a/object_detection/g3doc/preparing_inputs.md b/object_detection/g3doc/preparing_inputs.md new file mode 100644 index 0000000000000000000000000000000000000000..1e80bebb0f8c15801dfef814aa6f1e6ffa68084b --- /dev/null +++ b/object_detection/g3doc/preparing_inputs.md @@ -0,0 +1,45 @@ +# Preparing Inputs + +Tensorflow Object Detection API reads data using the TFRecord file format. Two +sample scripts (`create_pascal_tf_record.py` and `create_pet_tf_record.py`) are +provided to convert from the PASCAL VOC dataset and Oxford-IIIT Pet dataset to +TFRecords. + +## Generating the PASCAL VOC TFRecord files. + +The raw 2012 PASCAL VOC data set can be downloaded +[here](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar). +Extract the tar file and run the `create_pascal_tf_record` script: + +```bash +# From tensorflow/models/object_detection +tar -xvf VOCtrainval_11-May-2012.tar +python create_pascal_tf_record.py --data_dir=VOCdevkit \ + --year=VOC2012 --set=train --output_path=pascal_train.record +python create_pascal_tf_record.py --data_dir=VOCdevkit \ + --year=VOC2012 --set=val --output_path=pascal_val.record +``` + +You should end up with two TFRecord files named `pascal_train.record` and +`pascal_val.record` in the `tensorflow/models/object_detection` directory. + +The label map for the PASCAL VOC data set can be found at +`data/pascal_label_map.pbtxt`. + +## Generation the Oxford-IIIT Pet TFRecord files. + +The Oxford-IIIT Pet data set can be downloaded from +[their website](http://www.robots.ox.ac.uk/~vgg/data/pets/). Extract the tar +file and run the `create_pet_tf_record` script to generate TFRecords. + +```bash +# From tensorflow/models/object_detection +tar -xvf annotations.tar.gz +tar -xvf images.tar.gz +python create_pet_tf_record.py --data_dir=`pwd` --output_dir=`pwd` +``` + +You should end up with two TFRecord files named `pet_train.record` and +`pet_val.record` in the `tensorflow/models/object_detection` directory. + +The label map for the Pet dataset can be found at `data/pet_label_map.pbtxt`. diff --git a/object_detection/g3doc/running_locally.md b/object_detection/g3doc/running_locally.md new file mode 100644 index 0000000000000000000000000000000000000000..dd53225b33ff1af020cad635aeca115344429d15 --- /dev/null +++ b/object_detection/g3doc/running_locally.md @@ -0,0 +1,81 @@ +# Running Locally + +This page walks through the steps required to train an object detection model +on a local machine. It assumes the reader has completed the +following prerequisites: + +1. The Tensorflow Object Detection API has been installed as documented in the +[installation instructions](installation.md). This includes installing library +dependencies, compiling the configuration protobufs and setting up the Python +environment. +2. A valid data set has been created. See [this page](preparing_inputs.md) for +instructions on how to generate a dataset for the PASCAL VOC challenge or the +Oxford-IIIT Pet dataset. +3. A Object Detection pipeline configuration has been written. See +[this page](configuring_jobs.md) for details on how to write a pipeline configuration. 
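+
+Before moving on, it can be worth sanity-checking the records generated for
+prerequisite 2. A quick, illustrative check (the file names below follow the
+Oxford-IIIT Pet example from [this page](preparing_inputs.md); substitute your
+own) is to count the examples in each TFRecord file:
+
+```python
+import tensorflow as tf
+
+for path in ['pet_train.record', 'pet_val.record']:
+  num_examples = sum(1 for _ in tf.python_io.tf_record_iterator(path))
+  print('%s: %d examples' % (path, num_examples))
+```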
+
+## Recommended Directory Structure for Training and Evaluation
+
+```
++data
+  -label_map file
+  -train TFRecord file
+  -eval TFRecord file
++models
+  + model
+    -pipeline config file
+    +train
+    +eval
+```
+
+## Running the Training Job
+
+A local training job can be run with the following command:
+
+```bash
+# From the tensorflow/models/ directory
+python object_detection/train.py \
+    --logtostderr \
+    --pipeline_config_path=${PATH_TO_YOUR_PIPELINE_CONFIG} \
+    --train_dir=${PATH_TO_TRAIN_DIR}
+```
+
+where `${PATH_TO_YOUR_PIPELINE_CONFIG}` points to the pipeline config and
+`${PATH_TO_TRAIN_DIR}` points to the directory to which training checkpoints
+and events will be written. By default, the training job will
+run indefinitely until the user kills it.
+
+## Running the Evaluation Job
+
+Evaluation is run as a separate job. The eval job will periodically poll the
+train directory for new checkpoints and evaluate them on a test dataset. The
+job can be run using the following command:
+
+```bash
+# From the tensorflow/models/ directory
+python object_detection/eval.py \
+    --logtostderr \
+    --pipeline_config_path=${PATH_TO_YOUR_PIPELINE_CONFIG} \
+    --checkpoint_dir=${PATH_TO_TRAIN_DIR} \
+    --eval_dir=${PATH_TO_EVAL_DIR}
+```
+
+where `${PATH_TO_YOUR_PIPELINE_CONFIG}` points to the pipeline config,
+`${PATH_TO_TRAIN_DIR}` points to the directory in which training checkpoints
+were saved (same as the training job) and `${PATH_TO_EVAL_DIR}` points to the
+directory in which evaluation events will be saved. As with the training job,
+the eval job runs until terminated by default.
+
+## Running Tensorboard
+
+Progress for training and eval jobs can be inspected using Tensorboard. If
+using the recommended directory structure, Tensorboard can be run using the
+following command:
+
+```bash
+tensorboard --logdir=${PATH_TO_MODEL_DIRECTORY}
+```
+
+where `${PATH_TO_MODEL_DIRECTORY}` points to the directory that contains the
+train and eval directories. Please note it may take Tensorboard a couple of
+minutes to populate with data. diff --git a/object_detection/g3doc/running_notebook.md b/object_detection/g3doc/running_notebook.md new file mode 100644 index 0000000000000000000000000000000000000000..8d7948d824b67ee4e6a69c9f8a6c77fcba01b881 --- /dev/null +++ b/object_detection/g3doc/running_notebook.md @@ -0,0 +1,15 @@
+# Quick Start: Jupyter notebook for off-the-shelf inference
+
+If you'd like to hit the ground running and run detection on a few example
+images right out of the box, we recommend trying out the Jupyter notebook demo.
+To run the Jupyter notebook, run the following command from
+`tensorflow/models/object_detection`:
+
+```
+# From tensorflow/models/object_detection
+jupyter notebook
+```
+
+The notebook should open in your favorite web browser. Click the
+[`object_detection_tutorial.ipynb`](../object_detection_tutorial.ipynb) link
+to open the demo. diff --git a/object_detection/g3doc/running_on_cloud.md b/object_detection/g3doc/running_on_cloud.md new file mode 100644 index 0000000000000000000000000000000000000000..b691c0e5b8690a567ff551aff5a0448f58571de2 --- /dev/null +++ b/object_detection/g3doc/running_on_cloud.md @@ -0,0 +1,128 @@
+# Running on Google Cloud Platform
+
+The Tensorflow Object Detection API supports distributed training on Google
+Cloud ML Engine. This section provides instructions on how to train and
+evaluate your model using Cloud ML. The reader should complete the following
+prerequisites:
+
+1. The reader has created and configured a project on Google Cloud Platform.
+See [the Cloud ML quick start guide](https://cloud.google.com/ml-engine/docs/quickstarts/command-line).
+2. The reader has installed the Tensorflow Object Detection API as documented
+in the [installation instructions](installation.md).
+3. The reader has a valid data set stored in a Google Cloud Storage
+bucket. See [this page](preparing_inputs.md) for instructions on how to generate
+a dataset for the PASCAL VOC challenge or the Oxford-IIIT Pet dataset.
+4. The reader has configured a valid Object Detection pipeline, and stored it
+in a Google Cloud Storage bucket. See [this page](configuring_jobs.md) for
+details on how to write a pipeline configuration.
+
+Additionally, it is recommended that users test their job by running training
+and evaluation jobs for a few iterations
+[locally on their own machines](running_locally.md).
+
+## Packaging
+
+In order to run the Tensorflow Object Detection API on Cloud ML, it must be
+packaged (along with its TF-Slim dependency). The required packages can be
+created with the following commands:
+
+``` bash
+# From tensorflow/models/
+python setup.py sdist
+(cd slim && python setup.py sdist)
+```
+
+This will create Python packages in `dist/object_detection-0.1.tar.gz` and
+`slim/dist/slim-0.1.tar.gz`.
+
+## Running a Multiworker Training Job
+
+Google Cloud ML requires a YAML configuration file for a multiworker training
+job using GPUs. A sample YAML file is given below:
+
+```
+trainingInput:
+  runtimeVersion: "1.0"
+  scaleTier: CUSTOM
+  masterType: standard_gpu
+  workerCount: 9
+  workerType: standard_gpu
+  parameterServerCount: 3
+  parameterServerType: standard
+```
+
+Please keep the following guidelines in mind when writing the YAML
+configuration:
+
+* A job with n workers will have n + 1 training machines (n workers + 1 master).
+* The number of parameter servers used should be an odd number to prevent
+  a parameter server from storing only weight variables or only bias variables
+  (due to round robin parameter scheduling).
+* The learning rate in the training config should be decreased when using a
+  larger number of workers. Some experimentation is required to find the
+  optimal learning rate.
+
+The YAML file should be saved on the local machine (not on GCP). Once it has
+been written, a user can start a training job on Cloud ML Engine using the
+following command:
+
+``` bash
+# From tensorflow/models/
+gcloud ml-engine jobs submit training object_detection_`date +%s` \
+    --job-dir=gs://${TRAIN_DIR} \
+    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
+    --module-name object_detection.train \
+    --region us-central1 \
+    --config ${PATH_TO_LOCAL_YAML_FILE} \
+    -- \
+    --train_dir=gs://${TRAIN_DIR} \
+    --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
+```
+
+Where `${PATH_TO_LOCAL_YAML_FILE}` is the local path to the YAML configuration,
+`gs://${TRAIN_DIR}` specifies the directory on Google Cloud Storage where the
+training checkpoints and events will be written, and
+`gs://${PIPELINE_CONFIG_PATH}` points to the pipeline configuration stored on
+Google Cloud Storage.
+
+Users can monitor the progress of their training job on the [ML Engine
+Dashboard](https://console.cloud.google.com/mlengine/jobs).
+
+## Running an Evaluation Job on Cloud
+
+Evaluation jobs run on a single machine, so it is not necessary to write a YAML
+configuration for evaluation.
Run the following command to start the evaluation +job: + +``` bash +gcloud ml-engine jobs submit training object_detection_eval_`date +%s` \ + --job-dir=gs://${TRAIN_DIR} \ + --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \ + --module-name object_detection.eval \ + --region us-central1 \ + --scale-tier BASIC_GPU \ + -- \ + --checkpoint_dir=gs://${TRAIN_DIR} \ + --eval_dir=gs://${EVAL_DIR} \ + --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH} +``` + +Where `gs://${TRAIN_DIR}` points to the directory on Google Cloud Storage where +training checkpoints are saved (same as the training job), `gs://${EVAL_DIR}` +points to where evaluation events will be saved on Google Cloud Storage and +`gs://${PIPELINE_CONFIG_PATH}` points to where the pipeline configuration is +stored on Google Cloud Storage. + +## Running Tensorboard + +You can run Tensorboard locally on your own machine to view progress of your +training and eval jobs on Google Cloud ML. Run the following command to start +Tensorboard: + +``` bash +tensorboard --logdir=gs://${YOUR_CLOUD_BUCKET} +``` + +Note it may Tensorboard a few minutes to populate with results. diff --git a/object_detection/g3doc/running_pets.md b/object_detection/g3doc/running_pets.md new file mode 100644 index 0000000000000000000000000000000000000000..6975b1966c78fc9796a0c46cb8d39f8a99b7cbda --- /dev/null +++ b/object_detection/g3doc/running_pets.md @@ -0,0 +1,303 @@ +# Quick Start: Distributed Training on the Oxford-IIIT Pets Dataset on Google Cloud + +This page is a walkthrough for training an object detector using the Tensorflow +Object Detection API. In this tutorial, we'll be training on the Oxford-IIIT Pets +dataset to build a system to detect various breeds of cats and dogs. The output +of the detector will look like the following: + +![](img/oxford_pet.png) + +## Setting up a Project on Google Cloud + +To accelerate the process, we'll run training and evaluation on [Google Cloud +ML Engine](https://cloud.google.com/ml-engine/) to leverage multiple GPUs. To +begin, you will have to set up Google Cloud via the following steps (if you have +already done this, feel free to skip to the next section): + +1. [Create a GCP project](https://cloud.google.com/resource-manager/docs/creating-managing-projects). +2. [Install the Google Cloud SDK](https://cloud.google.com/sdk/downloads) on +your workstation or laptop. +This will provide the tools you need to upload files to Google Cloud Storage and +start ML training jobs. +3. [Enable the ML Engine +APIs](https://console.cloud.google.com/flows/enableapi?apiid=ml.googleapis.com,compute_component&_ga=1.73374291.1570145678.1496689256). +By default, a new GCP project does not enable APIs to start ML Engine training +jobs. Use the above link to explicitly enable them. +4. [Set up a Google Cloud Storage (GCS) +bucket](https://cloud.google.com/storage/docs/creating-buckets). ML Engine +training jobs can only access files on a Google Cloud Storage bucket. In this +tutorial, we'll be required to upload our dataset and configuration to GCS. + +Please remember the name of your GCS bucket, as we will reference it multiple +times in this document. Substitute `${YOUR_GCS_BUCKET}` with the name of +your bucket in this document. 
For your convenience, you should define the +environment variable below: + +``` bash +export YOUR_GCS_BUCKET=${YOUR_GCS_BUCKET} +``` + +## Installing Tensorflow and the Tensorflow Object Detection API + +Please run through the [installation instructions](installation.md) to install +Tensorflow and all it dependencies. Ensure the Protobuf libraries are +compiled and the library directories are added to `PYTHONPATH`. + +## Getting the Oxford-IIIT Pets Dataset and Uploading it to Google Cloud Storage + +In order to train a detector, we require a dataset of images, bounding boxes and +classifications. For this demo, we'll use the Oxford-IIIT Pets dataset. The raw +dataset for Oxford-IIIT Pets lives +[here](http://www.robots.ox.ac.uk/~vgg/data/pets/). You will need to download +both the image dataset [`images.tar.gz`](http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz) +and the groundtruth data [`annotations.tar.gz`](http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz) +to the `tensorflow/models` directory. This may take some time. After downloading +the tarballs, your `object_detection` directory should appear as follows: + +```lang-none ++ object_detection/ + + data/ + - images.tar.gz + - annotations.tar.gz + - create_pet_tf_record.py + ... other files and directories +``` + +The Tensorflow Object Detection API expects data to be in the TFRecord format, +so we'll now run the `create_pet_tf_record` script to convert from the raw +Oxford-IIIT Pet dataset into TFRecords. Run the following commands from the +`object_detection` directory: + +``` bash +# From tensorflow/models/ +wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz +wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz +tar -xvf annotations.tar.gz +tar -xvf images.tar.gz +python object_detection/create_pet_tf_record.py \ + --label_map_path=object_detection/data/pet_label_map.pbtxt \ + --data_dir=`pwd` \ + --output_dir=`pwd` +``` + +Note: It is normal to see some warnings when running this script. You may ignore +them. + +Two TFRecord files named `pet_train.record` and `pet_val.record` should be generated +in the `object_detection` directory. + +Now that the data has been generated, we'll need to upload it to Google Cloud +Storage so the data can be accessed by ML Engine. Run the following command to +copy the files into your GCS bucket (substituting `${YOUR_GCS_BUCKET}`): + +``` bash +# From tensorflow/models/ +gsutil cp pet_train.record gs://${YOUR_GCS_BUCKET}/data/pet_train.record +gsutil cp pet_val.record gs://${YOUR_GCS_BUCKET}/data/pet_val.record +gsutil cp object_detection/data/pet_label_map.pbtxt gs://${YOUR_GCS_BUCKET}/data/pet_label_map.pbtxt +``` + +Please remember the path where you upload the data to, as we will need this +information when configuring the pipeline in a following step. + +## Downloading a COCO-pretrained Model for Transfer Learning + +Training a state of the art object detector from scratch can take days, even +when using multiple GPUs! In order to speed up training, we'll take an object +detector trained on a different dataset (COCO), and reuse some of it's +parameters to initialize our new model. + +Download our [COCO-pretrained Faster R-CNN with Resnet-101 +model](http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz). +Unzip the contents of the folder and copy the `model.ckpt*` files into your GCS +Bucket. 
+ +``` bash +wget http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz +tar -xvf faster_rcnn_resnet101_coco_11_06_2017.tar.gz +gsutil cp faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.* gs://${YOUR_GCS_BUCKET}/data/ +``` + +Remember the path where you uploaded the model checkpoint to, as we will need it +in the following step. + +## Configuring the Object Detection Pipeline + +In the Tensorflow Object Detection API, the model parameters, training +parameters and eval parameters are all defined by a config file. More details +can be found [here](configuring_jobs.md). For this tutorial, we will use some +predefined templates provided with the source code. In the +`object_detection/samples/configs` folder, there are skeleton object_detection +configuration files. We will use `faster_rcnn_resnet101_pets.config` as a +starting point for configuring the pipeline. Open the file with your favourite +text editor. + +We'll need to configure some paths in order for the template to work. Search the +file for instances of `PATH_TO_BE_CONFIGURED` and replace them with the +appropriate value (typically `gs://${YOUR_GCS_BUCKET}/data/`). Afterwards +upload your edited file onto GCS, making note of the path it was uploaded to +(we'll need it when starting the training/eval jobs). + +``` bash +# From tensorflow/models/ + +# Edit the faster_rcnn_resnet101_pets.config template. Please note that there +# are multiple places where PATH_TO_BE_CONFIGURED needs to be set. +sed -i "s|PATH_TO_BE_CONFIGURED|"gs://${YOUR_GCS_BUCKET}"/data|g" \ + object_detection/samples/configs/faster_rcnn_resnet101_pets.config + +# Copy edited template to cloud. +gsutil cp object_detection/samples/configs/faster_rcnn_resnet101_pets.config \ + gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config +``` + +## Checking Your Google Cloud Storage Bucket + +At this point in the tutorial, you should have uploaded the training/validation +datasets (including label map), our COCO trained FasterRCNN finetune checkpoint and your job +configuration to your Google Cloud Storage Bucket. Your bucket should look like +the following: + +```lang-none ++ ${YOUR_GCS_BUCKET}/ + + data/ + - faster_rcnn_resnet101_pets.config + - model.ckpt.index + - model.ckpt.meta + - model.ckpt.data-00000-of-00001 + - pet_label_map.pbtxt + - pet_train.record + - pet_val.record +``` + +You can inspect your bucket using the [Google Cloud Storage +browser](https://console.cloud.google.com/storage/browser). + +## Starting Training and Evaluation Jobs on Google Cloud ML Engine + +Before we can start a job on Google Cloud ML Engine, we must: + +1. Package the Tensorflow Object Detection code. +2. Write a cluster configuration for our Google Cloud ML job. + +To package the Tensorflow Object Detection code, run the following commands from +the `tensorflow/models/` directory: + +``` bash +# From tensorflow/models/ +python setup.py sdist +(cd slim && python setup.py sdist) +``` + +You should see two tar.gz files created at `dist/object_detection-0.1.tar.gz` +and `slim/dist/slim-0.1.tar.gz`. + +For running the training Cloud ML job, we'll configure the cluster to use 10 +training jobs (1 master + 9 workers) and three parameters servers. The +configuration file can be found at `object_detection/samples/cloud/cloud.yml`. 
+ +To start training, execute the following command from the `tensorflow/models/` +directory: + +``` bash +# From tensorflow/models/ +gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` \ + --job-dir=gs://${YOUR_GCS_BUCKET}/train \ + --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \ + --module-name object_detection.train \ + --region us-central1 \ + --config object_detection/samples/cloud/cloud.yml \ + -- \ + --train_dir=gs://${YOUR_GCS_BUCKET}/train \ + --pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config +``` + +Once training has started, we can run an evaluation concurrently: + +``` bash +# From tensorflow/models/ +gcloud ml-engine jobs submit training `whoami`_object_detection_eval_`date +%s` \ + --job-dir=gs://${YOUR_GCS_BUCKET}/train \ + --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \ + --module-name object_detection.eval \ + --region us-central1 \ + --scale-tier BASIC_GPU \ + -- \ + --checkpoint_dir=gs://${YOUR_GCS_BUCKET}/train \ + --eval_dir=gs://${YOUR_GCS_BUCKET}/eval \ + --pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_pets.config +``` + +Note: Even though we're running an evaluation job, the `gcloud ml-engine jobs +submit training` command is correct. ML Engine does not distinguish between +training and evaluation jobs. + +Users can monitor and stop training and evaluation jobs on the [ML Engine +Dashboard](https://console.cloud.google.com/mlengine/jobs). + +## Monitoring Progress with Tensorboard + +You can monitor progress of the training and eval jobs by running Tensorboard on +your local machine: + +``` bash +# This command needs to be run once to allow your local machine to access your +# GCS bucket. +gcloud auth application-default login + +tensorboard --logdir=gs://${YOUR_GCS_BUCKET} +``` + +Once Tensorboard is running, navigate to `localhost:6006` from your favourite +web browser. You should something similar see the following: + +![](img/tensorboard.png) + +You will also want to click on the images tab to see example detections made by +the model while it trains. After about an hour and a half of training, you can +expect to see something like this: + +![](img/tensorboard2.png) + +Note: It takes roughly 10 minutes for a job to get started on ML Engine, and +roughly an hour for the system to evaluate the validation dataset. It may take +some time to populate the dashboards. If you do not see any entries after half +an hour, check the logs from the [ML Engine +Dashboard](https://console.cloud.google.com/mlengine/jobs). + +## Exporting the Tensorflow Graph + +After your model has been trained, you should export it to a Tensorflow +graph proto. First, you need to identify a candidate checkpoint to export. You +can search your bucket using the [Google Cloud Storage +Browser](https://console.cloud.google.com/storage/browser). The file should be +stored under `${YOUR_GCS_BUCKET}/train`. The checkpoint will typically consist of +three files: + +* `model.ckpt-${CHECKPOINT_NUMBER}.data-00000-of-00001` +* `model.ckpt-${CHECKPOINT_NUMBER}.index` +* `model.ckpt-${CHECKPOINT_NUMBER}.meta` + +After you've identified a candidate checkpoint to export, run the following +command from `tensorflow/models/object_detection`: + +``` bash +# From tensorflow/models +gsutil cp gs://${YOUR_GCS_BUCKET}/train/model.ckpt-${CHECKPOINT_NUMBER}.* . 
+python object_detection/export_inference_graph.py \ + --input_type image_tensor \ + --pipeline_config_path object_detection/samples/configs/faster_rcnn_resnet101_pets.config \ + --checkpoint_path model.ckpt-${CHECKPOINT_NUMBER} \ + --inference_graph_path output_inference_graph.pb +``` + +Afterwards, you should see a graph named `output_inference_graph.pb`. + +## What's Next + +Congratulations, you have now trained an object detector for various cats and +dogs! There different things you can do now: + +1. [Test your exported model using the provided Jupyter notebook.](running_notebook.md) +2. [Experiment with different model configurations.](configuring_jobs.md) +3. Train an object detector using your own data. diff --git a/object_detection/matchers/BUILD b/object_detection/matchers/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..1bc5992f55834a8b49ce500e95a86415d573eeb8 --- /dev/null +++ b/object_detection/matchers/BUILD @@ -0,0 +1,51 @@ +# Tensorflow Object Detection API: Matcher implementations. + +package( + default_visibility = ["//visibility:public"], +) + +licenses(["notice"]) + +# Apache 2.0 +py_library( + name = "argmax_matcher", + srcs = [ + "argmax_matcher.py", + ], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/core:matcher", + ], +) + +py_test( + name = "argmax_matcher_test", + srcs = ["argmax_matcher_test.py"], + deps = [ + ":argmax_matcher", + "//tensorflow", + ], +) + +py_library( + name = "bipartite_matcher", + srcs = [ + "bipartite_matcher.py", + ], + deps = [ + "//tensorflow", + "//tensorflow/contrib/image:image_py", + "//tensorflow_models/object_detection/core:matcher", + ], +) + +py_test( + name = "bipartite_matcher_test", + srcs = [ + "bipartite_matcher_test.py", + ], + deps = [ + ":bipartite_matcher", + "//tensorflow", + ], +) diff --git a/object_detection/matchers/__init__.py b/object_detection/matchers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/object_detection/matchers/argmax_matcher.py b/object_detection/matchers/argmax_matcher.py new file mode 100644 index 0000000000000000000000000000000000000000..97d851858b935398c5b78315c940059ec62aa784 --- /dev/null +++ b/object_detection/matchers/argmax_matcher.py @@ -0,0 +1,189 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Argmax matcher implementation. + +This class takes a similarity matrix and matches columns to rows based on the +maximum value per column. One can specify matched_thresholds and +to prevent columns from matching to rows (generally resulting in a negative +training example) and unmatched_theshold to ignore the match (generally +resulting in neither a positive or negative training example). + +This matcher is used in Fast(er)-RCNN. + +Note: matchers are used in TargetAssigners. 
There is a create_target_assigner +factory function for popular implementations. +""" + +import tensorflow as tf + +from object_detection.core import matcher + + +class ArgMaxMatcher(matcher.Matcher): + """Matcher based on highest value. + + This class computes matches from a similarity matrix. Each column is matched + to a single row. + + To support object detection target assignment this class enables setting both + matched_threshold (upper threshold) and unmatched_threshold (lower thresholds) + defining three categories of similarity which define whether examples are + positive, negative, or ignored: + (1) similarity >= matched_threshold: Highest similarity. Matched/Positive! + (2) matched_threshold > similarity >= unmatched_threshold: Medium similarity. + Depending on negatives_lower_than_unmatched, this is either + Unmatched/Negative OR Ignore. + (3) unmatched_threshold > similarity: Lowest similarity. Depending on flag + negatives_lower_than_unmatched, either Unmatched/Negative OR Ignore. + For ignored matches this class sets the values in the Match object to -2. + """ + + def __init__(self, + matched_threshold, + unmatched_threshold=None, + negatives_lower_than_unmatched=True, + force_match_for_each_row=False): + """Construct ArgMaxMatcher. + + Args: + matched_threshold: Threshold for positive matches. Positive if + sim >= matched_threshold, where sim is the maximum value of the + similarity matrix for a given column. Set to None for no threshold. + unmatched_threshold: Threshold for negative matches. Negative if + sim < unmatched_threshold. Defaults to matched_threshold + when set to None. + negatives_lower_than_unmatched: Boolean which defaults to True. If True + then negative matches are the ones below the unmatched_threshold, + whereas ignored matches are in between the matched and umatched + threshold. If False, then negative matches are in between the matched + and unmatched threshold, and everything lower than unmatched is ignored. + force_match_for_each_row: If True, ensures that each row is matched to + at least one column (which is not guaranteed otherwise if the + matched_threshold is high). Defaults to False. See + argmax_matcher_test.testMatcherForceMatch() for an example. + + Raises: + ValueError: if unmatched_threshold is set but matched_threshold is not set + or if unmatched_threshold > matched_threshold. + """ + if (matched_threshold is None) and (unmatched_threshold is not None): + raise ValueError('Need to also define matched_threshold when' + 'unmatched_threshold is defined') + self._matched_threshold = matched_threshold + if unmatched_threshold is None: + self._unmatched_threshold = matched_threshold + else: + if unmatched_threshold > matched_threshold: + raise ValueError('unmatched_threshold needs to be smaller or equal' + 'to matched_threshold') + self._unmatched_threshold = unmatched_threshold + if not negatives_lower_than_unmatched: + if self._unmatched_threshold == self._matched_threshold: + raise ValueError('When negatives are in between matched and ' + 'unmatched thresholds, these cannot be of equal ' + 'value. matched: %s, unmatched: %s', + self._matched_threshold, self._unmatched_threshold) + self._force_match_for_each_row = force_match_for_each_row + self._negatives_lower_than_unmatched = negatives_lower_than_unmatched + + def _match(self, similarity_matrix): + """Tries to match each column of the similarity matrix to a row. + + Args: + similarity_matrix: tensor of shape [N, M] representing any similarity + metric. 
+ + Returns: + Match object with corresponding matches for each of M columns. + """ + + def _match_when_rows_are_empty(): + """Performs matching when the rows of similarity matrix are empty. + + When the rows are empty, all detections are false positives. So we return + a tensor of -1's to indicate that the columns do not match to any rows. + + Returns: + matches: int32 tensor indicating the row each column matches to. + """ + return -1 * tf.ones([tf.shape(similarity_matrix)[1]], dtype=tf.int32) + + def _match_when_rows_are_non_empty(): + """Performs matching when the rows of similarity matrix are non empty. + + Returns: + matches: int32 tensor indicating the row each column matches to. + """ + # Matches for each column + matches = tf.argmax(similarity_matrix, 0) + + # Deal with matched and unmatched threshold + if self._matched_threshold is not None: + # Get logical indices of ignored and unmatched columns as tf.int64 + matched_vals = tf.reduce_max(similarity_matrix, 0) + below_unmatched_threshold = tf.greater(self._unmatched_threshold, + matched_vals) + between_thresholds = tf.logical_and( + tf.greater_equal(matched_vals, self._unmatched_threshold), + tf.greater(self._matched_threshold, matched_vals)) + + if self._negatives_lower_than_unmatched: + matches = self._set_values_using_indicator(matches, + below_unmatched_threshold, + -1) + matches = self._set_values_using_indicator(matches, + between_thresholds, + -2) + else: + matches = self._set_values_using_indicator(matches, + below_unmatched_threshold, + -2) + matches = self._set_values_using_indicator(matches, + between_thresholds, + -1) + + if self._force_match_for_each_row: + forced_matches_ids = tf.cast(tf.argmax(similarity_matrix, 1), tf.int32) + + # Set matches[forced_matches_ids] = [0, ..., R], R is number of rows. + row_range = tf.range(tf.shape(similarity_matrix)[0]) + col_range = tf.range(tf.shape(similarity_matrix)[1]) + forced_matches_values = tf.cast(row_range, matches.dtype) + keep_matches_ids, _ = tf.setdiff1d(col_range, forced_matches_ids) + keep_matches_values = tf.gather(matches, keep_matches_ids) + matches = tf.dynamic_stitch( + [forced_matches_ids, + keep_matches_ids], [forced_matches_values, keep_matches_values]) + + return tf.cast(matches, tf.int32) + + return tf.cond( + tf.greater(tf.shape(similarity_matrix)[0], 0), + _match_when_rows_are_non_empty, _match_when_rows_are_empty) + + def _set_values_using_indicator(self, x, indicator, val): + """Set the indicated fields of x to val. + + Args: + x: tensor. + indicator: boolean with same shape as x. + val: scalar with value to set. + + Returns: + modified tensor. + """ + indicator = tf.cast(indicator, x.dtype) + return tf.add(tf.multiply(x, 1 - indicator), val * indicator) diff --git a/object_detection/matchers/argmax_matcher_test.py b/object_detection/matchers/argmax_matcher_test.py new file mode 100644 index 0000000000000000000000000000000000000000..36740f4b6f5fa64eeefff2e244600117e866a364 --- /dev/null +++ b/object_detection/matchers/argmax_matcher_test.py @@ -0,0 +1,237 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.matchers.argmax_matcher.""" + +import numpy as np +import tensorflow as tf + +from object_detection.matchers import argmax_matcher + + +class ArgMaxMatcherTest(tf.test.TestCase): + + def test_return_correct_matches_with_default_thresholds(self): + similarity = np.array([[1., 1, 1, 3, 1], + [2, -1, 2, 0, 4], + [3, 0, -1, 0, 0]]) + + matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=None) + expected_matched_rows = np.array([2, 0, 1, 0, 1]) + + sim = tf.constant(similarity) + match = matcher.match(sim) + matched_cols = match.matched_column_indices() + matched_rows = match.matched_row_indices() + unmatched_cols = match.unmatched_column_indices() + + with self.test_session() as sess: + res_matched_cols = sess.run(matched_cols) + res_matched_rows = sess.run(matched_rows) + res_unmatched_cols = sess.run(unmatched_cols) + + self.assertAllEqual(res_matched_rows, expected_matched_rows) + self.assertAllEqual(res_matched_cols, np.arange(similarity.shape[1])) + self.assertEmpty(res_unmatched_cols) + + def test_return_correct_matches_with_empty_rows(self): + + matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=None) + sim = 0.2*tf.ones([0, 5]) + match = matcher.match(sim) + unmatched_cols = match.unmatched_column_indices() + + with self.test_session() as sess: + res_unmatched_cols = sess.run(unmatched_cols) + self.assertAllEqual(res_unmatched_cols, np.arange(5)) + + def test_return_correct_matches_with_matched_threshold(self): + similarity = np.array([[1, 1, 1, 3, 1], + [2, -1, 2, 0, 4], + [3, 0, -1, 0, 0]], dtype=np.int32) + + matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3) + expected_matched_cols = np.array([0, 3, 4]) + expected_matched_rows = np.array([2, 0, 1]) + expected_unmatched_cols = np.array([1, 2]) + + sim = tf.constant(similarity) + match = matcher.match(sim) + matched_cols = match.matched_column_indices() + matched_rows = match.matched_row_indices() + unmatched_cols = match.unmatched_column_indices() + + init_op = tf.global_variables_initializer() + + with self.test_session() as sess: + sess.run(init_op) + res_matched_cols = sess.run(matched_cols) + res_matched_rows = sess.run(matched_rows) + res_unmatched_cols = sess.run(unmatched_cols) + + self.assertAllEqual(res_matched_rows, expected_matched_rows) + self.assertAllEqual(res_matched_cols, expected_matched_cols) + self.assertAllEqual(res_unmatched_cols, expected_unmatched_cols) + + def test_return_correct_matches_with_matched_and_unmatched_threshold(self): + similarity = np.array([[1, 1, 1, 3, 1], + [2, -1, 2, 0, 4], + [3, 0, -1, 0, 0]], dtype=np.int32) + + matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3, + unmatched_threshold=2) + expected_matched_cols = np.array([0, 3, 4]) + expected_matched_rows = np.array([2, 0, 1]) + expected_unmatched_cols = np.array([1]) # col 2 has too high maximum val + + sim = tf.constant(similarity) + match = matcher.match(sim) + matched_cols = match.matched_column_indices() + matched_rows = match.matched_row_indices() + unmatched_cols = 
match.unmatched_column_indices() + + with self.test_session() as sess: + res_matched_cols = sess.run(matched_cols) + res_matched_rows = sess.run(matched_rows) + res_unmatched_cols = sess.run(unmatched_cols) + + self.assertAllEqual(res_matched_rows, expected_matched_rows) + self.assertAllEqual(res_matched_cols, expected_matched_cols) + self.assertAllEqual(res_unmatched_cols, expected_unmatched_cols) + + def test_return_correct_matches_negatives_lower_than_unmatched_false(self): + similarity = np.array([[1, 1, 1, 3, 1], + [2, -1, 2, 0, 4], + [3, 0, -1, 0, 0]], dtype=np.int32) + + matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3, + unmatched_threshold=2, + negatives_lower_than_unmatched=False) + expected_matched_cols = np.array([0, 3, 4]) + expected_matched_rows = np.array([2, 0, 1]) + expected_unmatched_cols = np.array([2]) # col 1 has too low maximum val + + sim = tf.constant(similarity) + match = matcher.match(sim) + matched_cols = match.matched_column_indices() + matched_rows = match.matched_row_indices() + unmatched_cols = match.unmatched_column_indices() + + with self.test_session() as sess: + res_matched_cols = sess.run(matched_cols) + res_matched_rows = sess.run(matched_rows) + res_unmatched_cols = sess.run(unmatched_cols) + + self.assertAllEqual(res_matched_rows, expected_matched_rows) + self.assertAllEqual(res_matched_cols, expected_matched_cols) + self.assertAllEqual(res_unmatched_cols, expected_unmatched_cols) + + def test_return_correct_matches_unmatched_row_not_using_force_match(self): + similarity = np.array([[1, 1, 1, 3, 1], + [-1, 0, -2, -2, -1], + [3, 0, -1, 2, 0]], dtype=np.int32) + + matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3, + unmatched_threshold=2) + expected_matched_cols = np.array([0, 3]) + expected_matched_rows = np.array([2, 0]) + expected_unmatched_cols = np.array([1, 2, 4]) + + sim = tf.constant(similarity) + match = matcher.match(sim) + matched_cols = match.matched_column_indices() + matched_rows = match.matched_row_indices() + unmatched_cols = match.unmatched_column_indices() + + with self.test_session() as sess: + res_matched_cols = sess.run(matched_cols) + res_matched_rows = sess.run(matched_rows) + res_unmatched_cols = sess.run(unmatched_cols) + + self.assertAllEqual(res_matched_rows, expected_matched_rows) + self.assertAllEqual(res_matched_cols, expected_matched_cols) + self.assertAllEqual(res_unmatched_cols, expected_unmatched_cols) + + def test_return_correct_matches_unmatched_row_while_using_force_match(self): + similarity = np.array([[1, 1, 1, 3, 1], + [-1, 0, -2, -2, -1], + [3, 0, -1, 2, 0]], dtype=np.int32) + + matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=3, + unmatched_threshold=2, + force_match_for_each_row=True) + expected_matched_cols = np.array([0, 1, 3]) + expected_matched_rows = np.array([2, 1, 0]) + expected_unmatched_cols = np.array([2, 4]) # col 2 has too high max val + + sim = tf.constant(similarity) + match = matcher.match(sim) + matched_cols = match.matched_column_indices() + matched_rows = match.matched_row_indices() + unmatched_cols = match.unmatched_column_indices() + + with self.test_session() as sess: + res_matched_cols = sess.run(matched_cols) + res_matched_rows = sess.run(matched_rows) + res_unmatched_cols = sess.run(unmatched_cols) + + self.assertAllEqual(res_matched_rows, expected_matched_rows) + self.assertAllEqual(res_matched_cols, expected_matched_cols) + self.assertAllEqual(res_unmatched_cols, expected_unmatched_cols) + + def test_valid_arguments_corner_case(self): + 
argmax_matcher.ArgMaxMatcher(matched_threshold=1, + unmatched_threshold=1) + + def test_invalid_arguments_corner_case_negatives_lower_than_thres_false(self): + with self.assertRaises(ValueError): + argmax_matcher.ArgMaxMatcher(matched_threshold=1, + unmatched_threshold=1, + negatives_lower_than_unmatched=False) + + def test_invalid_arguments_no_matched_threshold(self): + with self.assertRaises(ValueError): + argmax_matcher.ArgMaxMatcher(matched_threshold=None, + unmatched_threshold=4) + + def test_invalid_arguments_unmatched_thres_larger_than_matched_thres(self): + with self.assertRaises(ValueError): + argmax_matcher.ArgMaxMatcher(matched_threshold=1, + unmatched_threshold=2) + + def test_set_values_using_indicator(self): + input_a = np.array([3, 4, 5, 1, 4, 3, 2]) + expected_b = np.array([3, 0, 0, 1, 0, 3, 2]) # Set a>3 to 0 + expected_c = np.array( + [3., 4., 5., -1., 4., 3., -1.]) # Set a<3 to -1. Float32 + idxb_ = input_a > 3 + idxc_ = input_a < 3 + + matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=None) + + a = tf.constant(input_a) + idxb = tf.constant(idxb_) + idxc = tf.constant(idxc_) + b = matcher._set_values_using_indicator(a, idxb, 0) + c = matcher._set_values_using_indicator(tf.cast(a, tf.float32), idxc, -1) + with self.test_session() as sess: + res_b = sess.run(b) + res_c = sess.run(c) + self.assertAllEqual(res_b, expected_b) + self.assertAllEqual(res_c, expected_c) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/matchers/bipartite_matcher.py b/object_detection/matchers/bipartite_matcher.py new file mode 100644 index 0000000000000000000000000000000000000000..3d717d12fe72448f61869fa9734601a422f5003f --- /dev/null +++ b/object_detection/matchers/bipartite_matcher.py @@ -0,0 +1,53 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Bipartite matcher implementation.""" + +import tensorflow as tf + +from tensorflow.contrib.image.python.ops import image_ops +from object_detection.core import matcher + + +class GreedyBipartiteMatcher(matcher.Matcher): + """Wraps a Tensorflow greedy bipartite matcher.""" + + def _match(self, similarity_matrix, num_valid_rows=-1): + """Bipartite matches a collection rows and columns. A greedy bi-partite. + + TODO: Add num_valid_columns options to match only that many columns with + all the rows. + + Args: + similarity_matrix: Float tensor of shape [N, M] with pairwise similarity + where higher values mean more similar. + num_valid_rows: A scalar or a 1-D tensor with one element describing the + number of valid rows of similarity_matrix to consider for the bipartite + matching. If set to be negative, then all rows from similarity_matrix + are used. + + Returns: + match_results: int32 tensor of shape [M] with match_results[i]=-1 + meaning that column i is not matched and otherwise that it is matched to + row match_results[i]. 
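For intuition, here is a hedged NumPy sketch of a greedy bipartite assignment that is consistent with the tests further below: it negates the similarity matrix into a distance matrix and repeatedly pairs the lowest-distance remaining (row, column). It only illustrates the behaviour exercised by those tests; it is not the `image_ops.bipartite_match` kernel, and the helper name and loop structure are assumptions.

```
import numpy as np

def greedy_bipartite_match(similarity, num_valid_rows=-1):
  # Negate similarity so that "most similar" becomes "smallest distance".
  num_rows = similarity.shape[0] if num_valid_rows < 0 else num_valid_rows
  distance = -similarity[:num_rows].astype(np.float64)
  match_results = -1 * np.ones(similarity.shape[1], dtype=np.int32)
  used_rows, used_cols = set(), set()
  for _ in range(min(num_rows, similarity.shape[1])):
    best = None
    for r in range(num_rows):
      for c in range(similarity.shape[1]):
        if r in used_rows or c in used_cols:
          continue
        if best is None or distance[r, c] < distance[best]:
          best = (r, c)
    if best is None:
      break
    match_results[best[1]] = best[0]  # column best[1] matches row best[0]
    used_rows.add(best[0])
    used_cols.add(best[1])
  return match_results

similarity = np.array([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]])
print(greedy_bipartite_match(similarity))                    # [-1  1  0]
print(greedy_bipartite_match(similarity, num_valid_rows=1))  # [-1 -1  0]
```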
+ """ + # Convert similarity matrix to distance matrix as tf.image.bipartite tries + # to find minimum distance matches. + distance_matrix = -1 * similarity_matrix + _, match_results = image_ops.bipartite_match( + distance_matrix, num_valid_rows) + match_results = tf.reshape(match_results, [-1]) + match_results = tf.cast(match_results, tf.int32) + return match_results diff --git a/object_detection/matchers/bipartite_matcher_test.py b/object_detection/matchers/bipartite_matcher_test.py new file mode 100644 index 0000000000000000000000000000000000000000..2ee45a80dfafc82b6ee4965a28719b9840296591 --- /dev/null +++ b/object_detection/matchers/bipartite_matcher_test.py @@ -0,0 +1,71 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.core.bipartite_matcher.""" + +import tensorflow as tf + +from object_detection.matchers import bipartite_matcher + + +class GreedyBipartiteMatcherTest(tf.test.TestCase): + + def test_get_expected_matches_when_all_rows_are_valid(self): + similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]]) + num_valid_rows = 2 + expected_match_results = [-1, 1, 0] + + matcher = bipartite_matcher.GreedyBipartiteMatcher() + match = matcher.match(similarity_matrix, num_valid_rows=num_valid_rows) + with self.test_session() as sess: + match_results_out = sess.run(match._match_results) + self.assertAllEqual(match_results_out, expected_match_results) + + def test_get_expected_matches_with_valid_rows_set_to_minus_one(self): + similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]]) + num_valid_rows = -1 + expected_match_results = [-1, 1, 0] + + matcher = bipartite_matcher.GreedyBipartiteMatcher() + match = matcher.match(similarity_matrix, num_valid_rows=num_valid_rows) + with self.test_session() as sess: + match_results_out = sess.run(match._match_results) + self.assertAllEqual(match_results_out, expected_match_results) + + def test_get_no_matches_with_zero_valid_rows(self): + similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]]) + num_valid_rows = 0 + expected_match_results = [-1, -1, -1] + + matcher = bipartite_matcher.GreedyBipartiteMatcher() + match = matcher.match(similarity_matrix, num_valid_rows=num_valid_rows) + with self.test_session() as sess: + match_results_out = sess.run(match._match_results) + self.assertAllEqual(match_results_out, expected_match_results) + + def test_get_expected_matches_with_only_one_valid_row(self): + similarity_matrix = tf.constant([[0.50, 0.1, 0.8], [0.15, 0.2, 0.3]]) + num_valid_rows = 1 + expected_match_results = [-1, -1, 0] + + matcher = bipartite_matcher.GreedyBipartiteMatcher() + match = matcher.match(similarity_matrix, num_valid_rows=num_valid_rows) + with self.test_session() as sess: + match_results_out = sess.run(match._match_results) + self.assertAllEqual(match_results_out, expected_match_results) + + +if __name__ == '__main__': + 
tf.test.main() diff --git a/object_detection/meta_architectures/BUILD b/object_detection/meta_architectures/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..5a9dcdc3e17930ff145abf189085b85c08829f0d --- /dev/null +++ b/object_detection/meta_architectures/BUILD @@ -0,0 +1,109 @@ +# Tensorflow Object Detection API: Meta-architectures. + +package( + default_visibility = ["//visibility:public"], +) + +licenses(["notice"]) + +# Apache 2.0 + +py_library( + name = "ssd_meta_arch", + srcs = ["ssd_meta_arch.py"], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/core:box_coder", + "//tensorflow_models/object_detection/core:box_list", + "//tensorflow_models/object_detection/core:box_predictor", + "//tensorflow_models/object_detection/core:model", + "//tensorflow_models/object_detection/core:target_assigner", + "//tensorflow_models/object_detection/utils:variables_helper", + ], +) + +py_test( + name = "ssd_meta_arch_test", + srcs = ["ssd_meta_arch_test.py"], + deps = [ + ":ssd_meta_arch", + "//tensorflow", + "//tensorflow/python:training", + "//tensorflow_models/object_detection/core:anchor_generator", + "//tensorflow_models/object_detection/core:box_list", + "//tensorflow_models/object_detection/core:losses", + "//tensorflow_models/object_detection/core:post_processing", + "//tensorflow_models/object_detection/core:region_similarity_calculator", + "//tensorflow_models/object_detection/utils:test_utils", + ], +) + +py_library( + name = "faster_rcnn_meta_arch", + srcs = [ + "faster_rcnn_meta_arch.py", + ], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/anchor_generators:grid_anchor_generator", + "//tensorflow_models/object_detection/core:balanced_positive_negative_sampler", + "//tensorflow_models/object_detection/core:box_list", + "//tensorflow_models/object_detection/core:box_list_ops", + "//tensorflow_models/object_detection/core:box_predictor", + "//tensorflow_models/object_detection/core:losses", + "//tensorflow_models/object_detection/core:model", + "//tensorflow_models/object_detection/core:post_processing", + "//tensorflow_models/object_detection/core:standard_fields", + "//tensorflow_models/object_detection/core:target_assigner", + "//tensorflow_models/object_detection/utils:ops", + "//tensorflow_models/object_detection/utils:variables_helper", + ], +) + +py_library( + name = "faster_rcnn_meta_arch_test_lib", + srcs = [ + "faster_rcnn_meta_arch_test_lib.py", + ], + deps = [ + ":faster_rcnn_meta_arch", + "//tensorflow", + "//tensorflow_models/object_detection/anchor_generators:grid_anchor_generator", + "//tensorflow_models/object_detection/builders:box_predictor_builder", + "//tensorflow_models/object_detection/builders:hyperparams_builder", + "//tensorflow_models/object_detection/builders:post_processing_builder", + "//tensorflow_models/object_detection/core:losses", + "//tensorflow_models/object_detection/protos:box_predictor_py_pb2", + "//tensorflow_models/object_detection/protos:hyperparams_py_pb2", + "//tensorflow_models/object_detection/protos:post_processing_py_pb2", + ], +) + +py_test( + name = "faster_rcnn_meta_arch_test", + srcs = ["faster_rcnn_meta_arch_test.py"], + deps = [ + ":faster_rcnn_meta_arch_test_lib", + ], +) + +py_library( + name = "rfcn_meta_arch", + srcs = ["rfcn_meta_arch.py"], + deps = [ + ":faster_rcnn_meta_arch", + "//tensorflow", + "//tensorflow_models/object_detection/core:box_predictor", + "//tensorflow_models/object_detection/utils:ops", + ], +) + +py_test( + name = "rfcn_meta_arch_test", + srcs = 
["rfcn_meta_arch_test.py"], + deps = [ + ":faster_rcnn_meta_arch_test_lib", + ":rfcn_meta_arch", + "//tensorflow", + ], +) diff --git a/object_detection/meta_architectures/__init__.py b/object_detection/meta_architectures/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/object_detection/meta_architectures/faster_rcnn_meta_arch.py b/object_detection/meta_architectures/faster_rcnn_meta_arch.py new file mode 100644 index 0000000000000000000000000000000000000000..baf6d38fab518b0475d26a17d15ceaf058324b08 --- /dev/null +++ b/object_detection/meta_architectures/faster_rcnn_meta_arch.py @@ -0,0 +1,1451 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Faster R-CNN meta-architecture definition. + +General tensorflow implementation of Faster R-CNN detection models. + +See Faster R-CNN: Ren, Shaoqing, et al. +"Faster R-CNN: Towards real-time object detection with region proposal +networks." Advances in neural information processing systems. 2015. + +We allow for two modes: first_stage_only=True and first_stage_only=False. In +the former setting, all of the user facing methods (e.g., predict, postprocess, +loss) can be used as if the model consisted only of the RPN, returning class +agnostic proposals (these can be thought of as approximate detections with no +associated class information). In the latter setting, proposals are computed, +then passed through a second stage "box classifier" to yield (multi-class) +detections. + +Implementations of Faster R-CNN models must define a new +FasterRCNNFeatureExtractor and override three methods: `preprocess`, +`_extract_proposal_features` (the first stage of the model), and +`_extract_box_classifier_features` (the second stage of the model). Optionally, +the `restore_fn` method can be overridden. See tests for an example. + +A few important notes: ++ Batching conventions: We support batched inference and training where +all images within a batch have the same resolution. Batch sizes are determined +dynamically via the shape of the input tensors (rather than being specified +directly as, e.g., a model constructor). + +A complication is that due to non-max suppression, we are not guaranteed to get +the same number of proposals from the first stage RPN (region proposal network) +for each image (though in practice, we should often get the same number of +proposals). For this reason we pad to a max number of proposals per image +within a batch. This `self.max_num_proposals` property is set to the +`first_stage_max_proposals` parameter at inference time and the +`second_stage_batch_size` at training time since we subsample the batch to +be sent through the box classifier during training. + +For the second stage of the pipeline, we arrange the proposals for all images +within the batch along a single batch dimension. 
For example, the input to +_extract_box_classifier_features is a tensor of shape +`[total_num_proposals, crop_height, crop_width, depth]` where +total_num_proposals is batch_size * self.max_num_proposals. (And note that per +the above comment, a subset of these entries correspond to zero paddings.) + ++ Coordinate representations: +Following the API (see model.DetectionModel definition), our outputs after +postprocessing operations are always normalized boxes however, internally, we +sometimes convert to absolute --- e.g. for loss computation. In particular, +anchors and proposal_boxes are both represented as absolute coordinates. + +TODO: Support TPU implementations and sigmoid loss. +""" +from abc import abstractmethod +from functools import partial +import tensorflow as tf + +from object_detection.anchor_generators import grid_anchor_generator +from object_detection.core import balanced_positive_negative_sampler as sampler +from object_detection.core import box_list +from object_detection.core import box_list_ops +from object_detection.core import box_predictor +from object_detection.core import losses +from object_detection.core import model +from object_detection.core import post_processing +from object_detection.core import standard_fields as fields +from object_detection.core import target_assigner +from object_detection.utils import ops +from object_detection.utils import variables_helper + +slim = tf.contrib.slim + + +class FasterRCNNFeatureExtractor(object): + """Faster R-CNN Feature Extractor definition.""" + + def __init__(self, + is_training, + first_stage_features_stride, + reuse_weights=None, + weight_decay=0.0): + """Constructor. + + Args: + is_training: A boolean indicating whether the training version of the + computation graph should be constructed. + first_stage_features_stride: Output stride of extracted RPN feature map. + reuse_weights: Whether to reuse variables. Default is None. + weight_decay: float weight decay for feature extractor (default: 0.0). + """ + self._is_training = is_training + self._first_stage_features_stride = first_stage_features_stride + self._reuse_weights = reuse_weights + self._weight_decay = weight_decay + + @abstractmethod + def preprocess(self, resized_inputs): + """Feature-extractor specific preprocessing (minus image resizing).""" + pass + + def extract_proposal_features(self, preprocessed_inputs, scope): + """Extracts first stage RPN features. + + This function is responsible for extracting feature maps from preprocessed + images. These features are used by the region proposal network (RPN) to + predict proposals. + + Args: + preprocessed_inputs: A [batch, height, width, channels] float tensor + representing a batch of images. + scope: A scope name. + + Returns: + rpn_feature_map: A tensor with shape [batch, height, width, depth] + """ + with tf.variable_scope(scope, values=[preprocessed_inputs]): + return self._extract_proposal_features(preprocessed_inputs, scope) + + @abstractmethod + def _extract_proposal_features(self, preprocessed_inputs, scope): + """Extracts first stage RPN features, to be overridden.""" + pass + + def extract_box_classifier_features(self, proposal_feature_maps, scope): + """Extracts second stage box classifier features. + + Args: + proposal_feature_maps: A 4-D float tensor with shape + [batch_size * self.max_num_proposals, crop_height, crop_width, depth] + representing the feature map cropped to each proposal. + scope: A scope name. 
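As a quick illustration of the coordinate convention noted above (normalized boxes at the API boundary, absolute coordinates internally), converting a normalized `[ymin, xmin, ymax, xmax]` box to absolute coordinates simply scales by the image height and width. The snippet below is a sketch with assumed image dimensions, not a call into this module.

```
import tensorflow as tf

height, width = 480., 640.
normalized_boxes = tf.constant([[0.1, 0.2, 0.5, 0.4]])  # [ymin, xmin, ymax, xmax]
scale = tf.constant([height, width, height, width])
absolute_boxes = normalized_boxes * scale
# -> [[48., 128., 240., 256.]]
```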
+ + Returns: + proposal_classifier_features: A 4-D float tensor with shape + [batch_size * self.max_num_proposals, height, width, depth] + representing box classifier features for each proposal. + """ + with tf.variable_scope(scope, values=[proposal_feature_maps]): + return self._extract_box_classifier_features(proposal_feature_maps, scope) + + @abstractmethod + def _extract_box_classifier_features(self, proposal_feature_maps, scope): + """Extracts second stage box classifier features, to be overridden.""" + pass + + def restore_from_classification_checkpoint_fn( + self, + checkpoint_path, + first_stage_feature_extractor_scope, + second_stage_feature_extractor_scope): + """Returns callable for loading a checkpoint into the tensorflow graph. + + Args: + checkpoint_path: path to checkpoint to restore. + first_stage_feature_extractor_scope: A scope name for the first stage + feature extractor. + second_stage_feature_extractor_scope: A scope name for the second stage + feature extractor. + + Returns: + a callable which takes a tf.Session as input and loads a checkpoint when + run. + """ + variables_to_restore = {} + for variable in tf.global_variables(): + for scope_name in [first_stage_feature_extractor_scope, + second_stage_feature_extractor_scope]: + if variable.op.name.startswith(scope_name): + var_name = variable.op.name.replace(scope_name + '/', '') + variables_to_restore[var_name] = variable + variables_to_restore = ( + variables_helper.get_variables_available_in_checkpoint( + variables_to_restore, checkpoint_path)) + saver = tf.train.Saver(variables_to_restore) + def restore(sess): + saver.restore(sess, checkpoint_path) + return restore + + +class FasterRCNNMetaArch(model.DetectionModel): + """Faster R-CNN Meta-architecture definition.""" + + def __init__(self, + is_training, + num_classes, + image_resizer_fn, + feature_extractor, + first_stage_only, + first_stage_anchor_generator, + first_stage_atrous_rate, + first_stage_box_predictor_arg_scope, + first_stage_box_predictor_kernel_size, + first_stage_box_predictor_depth, + first_stage_minibatch_size, + first_stage_positive_balance_fraction, + first_stage_nms_score_threshold, + first_stage_nms_iou_threshold, + first_stage_max_proposals, + first_stage_localization_loss_weight, + first_stage_objectness_loss_weight, + initial_crop_size, + maxpool_kernel_size, + maxpool_stride, + second_stage_mask_rcnn_box_predictor, + second_stage_batch_size, + second_stage_balance_fraction, + second_stage_non_max_suppression_fn, + second_stage_score_conversion_fn, + second_stage_localization_loss_weight, + second_stage_classification_loss_weight, + hard_example_miner, + parallel_iterations=16): + """FasterRCNNMetaArch Constructor. + + Args: + is_training: A boolean indicating whether the training version of the + computation graph should be constructed. + num_classes: Number of classes. Note that num_classes *does not* + include the background category, so if groundtruth labels take values + in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the + assigned classification targets can range from {0,... K}). + image_resizer_fn: A callable for image resizing. This callable always + takes a rank-3 image tensor (corresponding to a single image) and + returns a rank-3 image tensor, possibly with new spatial dimensions. + See builders/image_resizer_builder.py. + feature_extractor: A FasterRCNNFeatureExtractor object. + first_stage_only: Whether to construct only the Region Proposal Network + (RPN) part of the model. 
+ first_stage_anchor_generator: An anchor_generator.AnchorGenerator object + (note that currently we only support + grid_anchor_generator.GridAnchorGenerator objects) + first_stage_atrous_rate: A single integer indicating the atrous rate for + the single convolution op which is applied to the `rpn_features_to_crop` + tensor to obtain a tensor to be used for box prediction. Some feature + extractors optionally allow for producing feature maps computed at + denser resolutions. The atrous rate is used to compensate for the + denser feature maps by using an effectively larger receptive field. + (This should typically be set to 1). + first_stage_box_predictor_arg_scope: Slim arg_scope for conv2d, + separable_conv2d and fully_connected ops for the RPN box predictor. + first_stage_box_predictor_kernel_size: Kernel size to use for the + convolution op just prior to RPN box predictions. + first_stage_box_predictor_depth: Output depth for the convolution op + just prior to RPN box predictions. + first_stage_minibatch_size: The "batch size" to use for computing the + objectness and location loss of the region proposal network. This + "batch size" refers to the number of anchors selected as contributing + to the loss function for any given image within the image batch and is + only called "batch_size" due to terminology from the Faster R-CNN paper. + first_stage_positive_balance_fraction: Fraction of positive examples + per image for the RPN. The recommended value for Faster RCNN is 0.5. + first_stage_nms_score_threshold: Score threshold for non max suppression + for the Region Proposal Network (RPN). This value is expected to be in + [0, 1] as it is applied directly after a softmax transformation. The + recommended value for Faster R-CNN is 0. + first_stage_nms_iou_threshold: The Intersection Over Union (IOU) threshold + for performing Non-Max Suppression (NMS) on the boxes predicted by the + Region Proposal Network (RPN). + first_stage_max_proposals: Maximum number of boxes to retain after + performing Non-Max Suppression (NMS) on the boxes predicted by the + Region Proposal Network (RPN). + first_stage_localization_loss_weight: A float + first_stage_objectness_loss_weight: A float + initial_crop_size: A single integer indicating the output size + (width and height are set to be the same) of the initial bilinear + interpolation based cropping during ROI pooling. + maxpool_kernel_size: A single integer indicating the kernel size of the + max pool op on the cropped feature map during ROI pooling. + maxpool_stride: A single integer indicating the stride of the max pool + op on the cropped feature map during ROI pooling. + second_stage_mask_rcnn_box_predictor: Mask R-CNN box predictor to use for + the second stage. + second_stage_batch_size: The batch size used for computing the + classification and refined location loss of the box classifier. This + "batch size" refers to the number of proposals selected as contributing + to the loss function for any given image within the image batch and is + only called "batch_size" due to terminology from the Faster R-CNN paper. + second_stage_balance_fraction: Fraction of positive examples to use + per image for the box classifier. The recommended value for Faster RCNN + is 0.25. 
+ second_stage_non_max_suppression_fn: batch_multiclass_non_max_suppression + callable that takes `boxes`, `scores`, optional `clip_window` and + optional (kwarg) `mask` inputs (with all other inputs already set) + and returns a dictionary containing tensors with keys: + `detection_boxes`, `detection_scores`, `detection_classes`, + `num_detections`, and (optionally) `detection_masks`. See + `post_processing.batch_multiclass_non_max_suppression` for the type and + shape of these tensors. + second_stage_score_conversion_fn: Callable elementwise nonlinearity + (that takes tensors as inputs and returns tensors). This is usually + used to convert logits to probabilities. + second_stage_localization_loss_weight: A float + second_stage_classification_loss_weight: A float + hard_example_miner: A losses.HardExampleMiner object (can be None). + parallel_iterations: (Optional) The number of iterations allowed to run + in parallel for calls to tf.map_fn. + Raises: + ValueError: If `second_stage_batch_size` > `first_stage_max_proposals` + ValueError: If first_stage_anchor_generator is not of type + grid_anchor_generator.GridAnchorGenerator. + """ + super(FasterRCNNMetaArch, self).__init__(num_classes=num_classes) + + if second_stage_batch_size > first_stage_max_proposals: + raise ValueError('second_stage_batch_size should be no greater than ' + 'first_stage_max_proposals.') + if not isinstance(first_stage_anchor_generator, + grid_anchor_generator.GridAnchorGenerator): + raise ValueError('first_stage_anchor_generator must be of type ' + 'grid_anchor_generator.GridAnchorGenerator.') + + self._is_training = is_training + self._image_resizer_fn = image_resizer_fn + self._feature_extractor = feature_extractor + self._first_stage_only = first_stage_only + + # The first class is reserved as background. 
+ unmatched_cls_target = tf.constant( + [1] + self._num_classes * [0], dtype=tf.float32) + self._proposal_target_assigner = target_assigner.create_target_assigner( + 'FasterRCNN', 'proposal') + self._detector_target_assigner = target_assigner.create_target_assigner( + 'FasterRCNN', 'detection', unmatched_cls_target=unmatched_cls_target) + # Both proposal and detector target assigners use the same box coder + self._box_coder = self._proposal_target_assigner.box_coder + + # (First stage) Region proposal network parameters + self._first_stage_anchor_generator = first_stage_anchor_generator + self._first_stage_atrous_rate = first_stage_atrous_rate + self._first_stage_box_predictor_arg_scope = ( + first_stage_box_predictor_arg_scope) + self._first_stage_box_predictor_kernel_size = ( + first_stage_box_predictor_kernel_size) + self._first_stage_box_predictor_depth = first_stage_box_predictor_depth + self._first_stage_minibatch_size = first_stage_minibatch_size + self._first_stage_sampler = sampler.BalancedPositiveNegativeSampler( + positive_fraction=first_stage_positive_balance_fraction) + self._first_stage_box_predictor = box_predictor.ConvolutionalBoxPredictor( + self._is_training, num_classes=1, + conv_hyperparams=self._first_stage_box_predictor_arg_scope, + min_depth=0, max_depth=0, num_layers_before_predictor=0, + use_dropout=False, dropout_keep_prob=1.0, kernel_size=1, + box_code_size=self._box_coder.code_size) + + self._first_stage_nms_score_threshold = first_stage_nms_score_threshold + self._first_stage_nms_iou_threshold = first_stage_nms_iou_threshold + self._first_stage_max_proposals = first_stage_max_proposals + + self._first_stage_localization_loss = ( + losses.WeightedSmoothL1LocalizationLoss(anchorwise_output=True)) + self._first_stage_objectness_loss = ( + losses.WeightedSoftmaxClassificationLoss(anchorwise_output=True)) + self._first_stage_loc_loss_weight = first_stage_localization_loss_weight + self._first_stage_obj_loss_weight = first_stage_objectness_loss_weight + + # Per-region cropping parameters + self._initial_crop_size = initial_crop_size + self._maxpool_kernel_size = maxpool_kernel_size + self._maxpool_stride = maxpool_stride + + self._mask_rcnn_box_predictor = second_stage_mask_rcnn_box_predictor + + self._second_stage_batch_size = second_stage_batch_size + self._second_stage_sampler = sampler.BalancedPositiveNegativeSampler( + positive_fraction=second_stage_balance_fraction) + + self._second_stage_nms_fn = second_stage_non_max_suppression_fn + self._second_stage_score_conversion_fn = second_stage_score_conversion_fn + + self._second_stage_localization_loss = ( + losses.WeightedSmoothL1LocalizationLoss(anchorwise_output=True)) + self._second_stage_classification_loss = ( + losses.WeightedSoftmaxClassificationLoss(anchorwise_output=True)) + self._second_stage_loc_loss_weight = second_stage_localization_loss_weight + self._second_stage_cls_loss_weight = second_stage_classification_loss_weight + self._hard_example_miner = hard_example_miner + self._parallel_iterations = parallel_iterations + + @property + def first_stage_feature_extractor_scope(self): + return 'FirstStageFeatureExtractor' + + @property + def second_stage_feature_extractor_scope(self): + return 'SecondStageFeatureExtractor' + + @property + def first_stage_box_predictor_scope(self): + return 'FirstStageBoxPredictor' + + @property + def second_stage_box_predictor_scope(self): + return 'SecondStageBoxPredictor' + + @property + def max_num_proposals(self): + """Max number of proposals (to pad to) for each image 
in the input batch. + + At training time, this is set to be the `second_stage_batch_size` if hard + example miner is not configured, else it is set to + `first_stage_max_proposals`. At inference time, this is always set to + `first_stage_max_proposals`. + + Returns: + A positive integer. + """ + if self._is_training and not self._hard_example_miner: + return self._second_stage_batch_size + return self._first_stage_max_proposals + + def preprocess(self, inputs): + """Feature-extractor specific preprocessing. + + See base class. + + For Faster R-CNN, we perform image resizing in the base class --- each + class subclassing FasterRCNNMetaArch is responsible for any additional + preprocessing (e.g., scaling pixel values to be in [-1, 1]). + + Args: + inputs: a [batch, height_in, width_in, channels] float tensor representing + a batch of images with values between 0 and 255.0. + + Returns: + preprocessed_inputs: a [batch, height_out, width_out, channels] float + tensor representing a batch of images. + Raises: + ValueError: if inputs tensor does not have type tf.float32 + """ + if inputs.dtype is not tf.float32: + raise ValueError('`preprocess` expects a tf.float32 tensor') + with tf.name_scope('Preprocessor'): + resized_inputs = tf.map_fn(self._image_resizer_fn, + elems=inputs, + dtype=tf.float32, + parallel_iterations=self._parallel_iterations) + return self._feature_extractor.preprocess(resized_inputs) + + def predict(self, preprocessed_inputs): + """Predicts unpostprocessed tensors from input tensor. + + This function takes an input batch of images and runs it through the + forward pass of the network to yield "raw" un-postprocessed predictions. + If `first_stage_only` is True, this function only returns first stage + RPN predictions (un-postprocessed). Otherwise it returns both + first stage RPN predictions as well as second stage box classifier + predictions. + + Other remarks: + + Anchor pruning vs. clipping: following the recommendation of the Faster + R-CNN paper, we prune anchors that venture outside the image window at + training time and clip anchors to the image window at inference time. + + Proposal padding: as described at the top of the file, proposals are + padded to self._max_num_proposals and flattened so that proposals from all + images within the input batch are arranged along the same batch dimension. + + Args: + preprocessed_inputs: a [batch, height, width, channels] float tensor + representing a batch of images. + + Returns: + prediction_dict: a dictionary holding "raw" prediction tensors: + 1) rpn_box_predictor_features: A 4-D float32 tensor with shape + [batch_size, height, width, depth] to be used for predicting proposal + boxes and corresponding objectness scores. + 2) rpn_features_to_crop: A 4-D float32 tensor with shape + [batch_size, height, width, depth] representing image features to crop + using the proposal boxes predicted by the RPN. + 3) image_shape: a 1-D tensor of shape [4] representing the input + image shape. + 4) rpn_box_encodings: 3-D float tensor of shape + [batch_size, num_anchors, self._box_coder.code_size] containing + predicted boxes. + 5) rpn_objectness_predictions_with_background: 3-D float tensor of shape + [batch_size, num_anchors, 2] containing class + predictions (logits) for each of the anchors. Note that this + tensor *includes* background class predictions (at class index 0). + 6) anchors: A 2-D tensor of shape [num_anchors, 4] representing anchors + for the first stage RPN (in absolute coordinates). 
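`preprocess` above applies the configured image resizer to each image in the batch with `tf.map_fn`. A minimal sketch of that pattern follows; `resize_fn` and the 300x300 target size are stand-ins for the builder-produced `image_resizer_fn`, not values taken from this code.

```
import tensorflow as tf

def resize_fn(image):
  # Stand-in for the configured image_resizer_fn: rank-3 image in, rank-3 out.
  return tf.image.resize_images(image, [300, 300])

inputs = tf.placeholder(tf.float32, shape=[None, None, None, 3])
resized = tf.map_fn(resize_fn, elems=inputs, dtype=tf.float32)
# resized has shape [batch, 300, 300, 3]; feature-extractor specific scaling
# (e.g. to [-1, 1]) would follow, as in self._feature_extractor.preprocess.
```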
Note that + `num_anchors` can differ depending on whether the model is created in + training or inference mode. + + (and if first_stage_only=False): + 7) refined_box_encodings: a 3-D tensor with shape + [total_num_proposals, num_classes, 4] representing predicted + (final) refined box encodings, where + total_num_proposals=batch_size*self._max_num_proposals + 8) class_predictions_with_background: a 3-D tensor with shape + [total_num_proposals, num_classes + 1] containing class + predictions (logits) for each of the anchors, where + total_num_proposals=batch_size*self._max_num_proposals. + Note that this tensor *includes* background class predictions + (at class index 0). + 9) num_proposals: An int32 tensor of shape [batch_size] representing the + number of proposals generated by the RPN. `num_proposals` allows us + to keep track of which entries are to be treated as zero paddings and + which are not since we always pad the number of proposals to be + `self.max_num_proposals` for each image. + 10) proposal_boxes: A float32 tensor of shape + [batch_size, self.max_num_proposals, 4] representing + decoded proposal bounding boxes (in absolute coordinates). + 11) mask_predictions: (optional) a 4-D tensor with shape + [total_num_padded_proposals, num_classes, mask_height, mask_width] + containing instance mask predictions. + """ + (rpn_box_predictor_features, rpn_features_to_crop, anchors_boxlist, + image_shape) = self._extract_rpn_feature_maps(preprocessed_inputs) + (rpn_box_encodings, rpn_objectness_predictions_with_background + ) = self._predict_rpn_proposals(rpn_box_predictor_features) + + # The Faster R-CNN paper recommends pruning anchors that venture outside + # the image window at training time and clipping at inference time. + clip_window = tf.to_float(tf.stack([0, 0, image_shape[1], image_shape[2]])) + if self._is_training: + (rpn_box_encodings, rpn_objectness_predictions_with_background, + anchors_boxlist) = self._remove_invalid_anchors_and_predictions( + rpn_box_encodings, rpn_objectness_predictions_with_background, + anchors_boxlist, clip_window) + else: + anchors_boxlist = box_list_ops.clip_to_window( + anchors_boxlist, clip_window) + + anchors = anchors_boxlist.get() + prediction_dict = { + 'rpn_box_predictor_features': rpn_box_predictor_features, + 'rpn_features_to_crop': rpn_features_to_crop, + 'image_shape': image_shape, + 'rpn_box_encodings': rpn_box_encodings, + 'rpn_objectness_predictions_with_background': + rpn_objectness_predictions_with_background, + 'anchors': anchors + } + + if not self._first_stage_only: + prediction_dict.update(self._predict_second_stage( + rpn_box_encodings, + rpn_objectness_predictions_with_background, + rpn_features_to_crop, + anchors, image_shape)) + return prediction_dict + + def _predict_second_stage(self, rpn_box_encodings, + rpn_objectness_predictions_with_background, + rpn_features_to_crop, + anchors, + image_shape): + """Predicts the output tensors from second stage of Faster R-CNN. + + Args: + rpn_box_encodings: 4-D float tensor of shape + [batch_size, num_valid_anchors, self._box_coder.code_size] containing + predicted boxes. + rpn_objectness_predictions_with_background: 2-D float tensor of shape + [batch_size, num_valid_anchors, 2] containing class + predictions (logits) for each of the anchors. Note that this + tensor *includes* background class predictions (at class index 0). 
+ rpn_features_to_crop: A 4-D float32 tensor with shape + [batch_size, height, width, depth] representing image features to crop + using the proposal boxes predicted by the RPN. + anchors: 2-D float tensor of shape + [num_anchors, self._box_coder.code_size]. + image_shape: A 1D int32 tensors of size [4] containing the image shape. + + Returns: + prediction_dict: a dictionary holding "raw" prediction tensors: + 1) refined_box_encodings: a 3-D tensor with shape + [total_num_proposals, num_classes, 4] representing predicted + (final) refined box encodings, where + total_num_proposals=batch_size*self._max_num_proposals + 2) class_predictions_with_background: a 3-D tensor with shape + [total_num_proposals, num_classes + 1] containing class + predictions (logits) for each of the anchors, where + total_num_proposals=batch_size*self._max_num_proposals. + Note that this tensor *includes* background class predictions + (at class index 0). + 3) num_proposals: An int32 tensor of shape [batch_size] representing the + number of proposals generated by the RPN. `num_proposals` allows us + to keep track of which entries are to be treated as zero paddings and + which are not since we always pad the number of proposals to be + `self.max_num_proposals` for each image. + 4) proposal_boxes: A float32 tensor of shape + [batch_size, self.max_num_proposals, 4] representing + decoded proposal bounding boxes (in absolute coordinates). + 5) mask_predictions: (optional) a 4-D tensor with shape + [total_num_padded_proposals, num_classes, mask_height, mask_width] + containing instance mask predictions. + """ + proposal_boxes_normalized, _, num_proposals = self._postprocess_rpn( + rpn_box_encodings, rpn_objectness_predictions_with_background, + anchors, image_shape) + + flattened_proposal_feature_maps = ( + self._compute_second_stage_input_feature_maps( + rpn_features_to_crop, proposal_boxes_normalized)) + + box_classifier_features = ( + self._feature_extractor.extract_box_classifier_features( + flattened_proposal_feature_maps, + scope=self.second_stage_feature_extractor_scope)) + + box_predictions = self._mask_rcnn_box_predictor.predict( + box_classifier_features, + num_predictions_per_location=1, + scope=self.second_stage_box_predictor_scope) + refined_box_encodings = tf.squeeze( + box_predictions[box_predictor.BOX_ENCODINGS], axis=1) + class_predictions_with_background = tf.squeeze(box_predictions[ + box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND], axis=1) + + absolute_proposal_boxes = ops.normalized_to_image_coordinates( + proposal_boxes_normalized, image_shape, self._parallel_iterations) + + prediction_dict = { + 'refined_box_encodings': refined_box_encodings, + 'class_predictions_with_background': + class_predictions_with_background, + 'num_proposals': num_proposals, + 'proposal_boxes': absolute_proposal_boxes, + } + return prediction_dict + + def _extract_rpn_feature_maps(self, preprocessed_inputs): + """Extracts RPN features. + + This function extracts two feature maps: a feature map to be directly + fed to a box predictor (to predict location and objectness scores for + proposals) and a feature map from which to crop regions which will then + be sent to the second stage box classifier. + + Args: + preprocessed_inputs: a [batch, height, width, channels] image tensor. + + Returns: + rpn_box_predictor_features: A 4-D float32 tensor with shape + [batch, height, width, depth] to be used for predicting proposal boxes + and corresponding objectness scores. 
+ rpn_features_to_crop: A 4-D float32 tensor with shape + [batch, height, width, depth] representing image features to crop using + the proposals boxes. + anchors: A BoxList representing anchors (for the RPN) in + absolute coordinates. + image_shape: A 1-D tensor representing the input image shape. + """ + image_shape = tf.shape(preprocessed_inputs) + rpn_features_to_crop = self._feature_extractor.extract_proposal_features( + preprocessed_inputs, scope=self.first_stage_feature_extractor_scope) + + feature_map_shape = tf.shape(rpn_features_to_crop) + anchors = self._first_stage_anchor_generator.generate( + [(feature_map_shape[1], feature_map_shape[2])]) + with slim.arg_scope(self._first_stage_box_predictor_arg_scope): + kernel_size = self._first_stage_box_predictor_kernel_size + rpn_box_predictor_features = slim.conv2d( + rpn_features_to_crop, + self._first_stage_box_predictor_depth, + kernel_size=[kernel_size, kernel_size], + rate=self._first_stage_atrous_rate, + activation_fn=tf.nn.relu6) + return (rpn_box_predictor_features, rpn_features_to_crop, + anchors, image_shape) + + def _predict_rpn_proposals(self, rpn_box_predictor_features): + """Adds box predictors to RPN feature map to predict proposals. + + Note resulting tensors will not have been postprocessed. + + Args: + rpn_box_predictor_features: A 4-D float32 tensor with shape + [batch, height, width, depth] to be used for predicting proposal boxes + and corresponding objectness scores. + + Returns: + box_encodings: 3-D float tensor of shape + [batch_size, num_anchors, self._box_coder.code_size] containing + predicted boxes. + objectness_predictions_with_background: 3-D float tensor of shape + [batch_size, num_anchors, 2] containing class + predictions (logits) for each of the anchors. Note that this + tensor *includes* background class predictions (at class index 0). + + Raises: + RuntimeError: if the anchor generator generates anchors corresponding to + multiple feature maps. We currently assume that a single feature map + is generated for the RPN. + """ + num_anchors_per_location = ( + self._first_stage_anchor_generator.num_anchors_per_location()) + if len(num_anchors_per_location) != 1: + raise RuntimeError('anchor_generator is expected to generate anchors ' + 'corresponding to a single feature map.') + box_predictions = self._first_stage_box_predictor.predict( + rpn_box_predictor_features, + num_anchors_per_location[0], + scope=self.first_stage_box_predictor_scope) + + box_encodings = box_predictions[box_predictor.BOX_ENCODINGS] + objectness_predictions_with_background = box_predictions[ + box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND] + return (tf.squeeze(box_encodings, axis=2), + objectness_predictions_with_background) + + def _remove_invalid_anchors_and_predictions( + self, + box_encodings, + objectness_predictions_with_background, + anchors_boxlist, + clip_window): + """Removes anchors that (partially) fall outside an image. + + Also removes associated box encodings and objectness predictions. + + Args: + box_encodings: 3-D float tensor of shape + [batch_size, num_anchors, self._box_coder.code_size] containing + predicted boxes. + objectness_predictions_with_background: 3-D float tensor of shape + [batch_size, num_anchors, 2] containing class + predictions (logits) for each of the anchors. Note that this + tensor *includes* background class predictions (at class index 0). + anchors_boxlist: A BoxList representing num_anchors anchors (for the RPN) + in absolute coordinates. 
+ clip_window: a 1-D tensor representing the [ymin, xmin, ymax, xmax] + extent of the window to clip/prune to. + + Returns: + box_encodings: 4-D float tensor of shape + [batch_size, num_valid_anchors, self._box_coder.code_size] containing + predicted boxes, where num_valid_anchors <= num_anchors + objectness_predictions_with_background: 2-D float tensor of shape + [batch_size, num_valid_anchors, 2] containing class + predictions (logits) for each of the anchors, where + num_valid_anchors <= num_anchors. Note that this + tensor *includes* background class predictions (at class index 0). + anchors: A BoxList representing num_valid_anchors anchors (for the RPN) in + absolute coordinates. + """ + pruned_anchors_boxlist, keep_indices = box_list_ops.prune_outside_window( + anchors_boxlist, clip_window) + def _batch_gather_kept_indices(predictions_tensor): + return tf.map_fn( + partial(tf.gather, indices=keep_indices), + elems=predictions_tensor, + dtype=tf.float32, + parallel_iterations=self._parallel_iterations, + back_prop=True) + return (_batch_gather_kept_indices(box_encodings), + _batch_gather_kept_indices(objectness_predictions_with_background), + pruned_anchors_boxlist) + + def _flatten_first_two_dimensions(self, inputs): + """Flattens `K-d` tensor along batch dimension to be a `(K-1)-d` tensor. + + Converts `inputs` with shape [A, B, ..., depth] into a tensor of shape + [A * B, ..., depth]. + + Args: + inputs: A float tensor with shape [A, B, ..., depth]. Note that the first + two and last dimensions must be statically defined. + Returns: + A float tensor with shape [A * B, ..., depth] (where the first and last + dimension are statically defined. + """ + inputs_shape = inputs.get_shape().as_list() + flattened_shape = tf.concat([ + [inputs_shape[0]*inputs_shape[1]], tf.shape(inputs)[2:-1], + [inputs_shape[-1]]], 0) + return tf.reshape(inputs, flattened_shape) + + def postprocess(self, prediction_dict): + """Convert prediction tensors to final detections. + + This function converts raw predictions tensors to final detection results. + See base class for output format conventions. Note also that by default, + scores are to be interpreted as logits, but if a score_converter is used, + then scores are remapped (and may thus have a different interpretation). + + If first_stage_only=True, the returned results represent proposals from the + first stage RPN and are padded to have self.max_num_proposals for each + image; otherwise, the results can be interpreted as multiclass detections + from the full two-stage model and are padded to self._max_detections. + + Args: + prediction_dict: a dictionary holding prediction tensors (see the + documentation for the predict method. If first_stage_only=True, we + expect prediction_dict to contain `rpn_box_encodings`, + `rpn_objectness_predictions_with_background`, `rpn_features_to_crop`, + `image_shape`, and `anchors` fields. Otherwise we expect + prediction_dict to additionally contain `refined_box_encodings`, + `class_predictions_with_background`, `num_proposals`, + `proposal_boxes` and, optionally, `mask_predictions` fields. 
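`_flatten_first_two_dimensions` above merges the batch and proposal dimensions while leaving any dynamically shaped middle dimensions alone. A small sketch of the same reshape on a toy tensor (the shapes are assumed for illustration):

```
import tensorflow as tf

x = tf.zeros([2, 8, 7, 7, 32])  # e.g. [batch, max_num_proposals, h, w, depth]
static_shape = x.get_shape().as_list()
flattened_shape = tf.concat(
    [[static_shape[0] * static_shape[1]], tf.shape(x)[2:-1],
     [static_shape[-1]]], 0)
flattened = tf.reshape(x, flattened_shape)
# flattened has shape [16, 7, 7, 32]: the first two dimensions are folded
# together exactly as in _flatten_first_two_dimensions.
```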
+ + Returns: + detections: a dictionary containing the following fields + detection_boxes: [batch, max_detection, 4] + detection_scores: [batch, max_detections] + detection_classes: [batch, max_detections] + (this entry is only created if rpn_mode=False) + num_detections: [batch] + """ + with tf.name_scope('FirstStagePostprocessor'): + image_shape = prediction_dict['image_shape'] + if self._first_stage_only: + proposal_boxes, proposal_scores, num_proposals = self._postprocess_rpn( + prediction_dict['rpn_box_encodings'], + prediction_dict['rpn_objectness_predictions_with_background'], + prediction_dict['anchors'], + image_shape) + return { + 'detection_boxes': proposal_boxes, + 'detection_scores': proposal_scores, + 'num_detections': num_proposals + } + with tf.name_scope('SecondStagePostprocessor'): + mask_predictions = prediction_dict.get(box_predictor.MASK_PREDICTIONS) + detections_dict = self._postprocess_box_classifier( + prediction_dict['refined_box_encodings'], + prediction_dict['class_predictions_with_background'], + prediction_dict['proposal_boxes'], + prediction_dict['num_proposals'], + image_shape, + mask_predictions=mask_predictions) + return detections_dict + + def _postprocess_rpn(self, + rpn_box_encodings_batch, + rpn_objectness_predictions_with_background_batch, + anchors, + image_shape): + """Converts first stage prediction tensors from the RPN to proposals. + + This function decodes the raw RPN predictions, runs non-max suppression + on the result. + + Note that the behavior of this function is slightly modified during + training --- specifically, we stop the gradient from passing through the + proposal boxes and we only return a balanced sampled subset of proposals + with size `second_stage_batch_size`. + + Args: + rpn_box_encodings_batch: A 3-D float32 tensor of shape + [batch_size, num_anchors, self._box_coder.code_size] containing + predicted proposal box encodings. + rpn_objectness_predictions_with_background_batch: A 3-D float tensor of + shape [batch_size, num_anchors, 2] containing objectness predictions + (logits) for each of the anchors with 0 corresponding to background + and 1 corresponding to object. + anchors: A 2-D tensor of shape [num_anchors, 4] representing anchors + for the first stage RPN. Note that `num_anchors` can differ depending + on whether the model is created in training or inference mode. + image_shape: A 1-D tensor representing the input image shape. + + Returns: + proposal_boxes: A float tensor with shape + [batch_size, max_num_proposals, 4] representing the (potentially zero + padded) proposal boxes for all images in the batch. These boxes are + represented as normalized coordinates. + proposal_scores: A float tensor with shape + [batch_size, max_num_proposals] representing the (potentially zero + padded) proposal objectness scores for all images in the batch. + num_proposals: A Tensor of type `int32`. A 1-D tensor of shape [batch] + representing the number of proposals predicted for each image in + the batch. 
+ """ + clip_window = tf.to_float(tf.stack([0, 0, image_shape[1], image_shape[2]])) + if self._is_training: + (groundtruth_boxlists, groundtruth_classes_with_background_list + ) = self._format_groundtruth_data(image_shape) + + proposal_boxes_list = [] + proposal_scores_list = [] + num_proposals_list = [] + for (batch_index, + (rpn_box_encodings, + rpn_objectness_predictions_with_background)) in enumerate(zip( + tf.unstack(rpn_box_encodings_batch), + tf.unstack(rpn_objectness_predictions_with_background_batch))): + decoded_boxes = self._box_coder.decode( + rpn_box_encodings, box_list.BoxList(anchors)) + objectness_scores = tf.unstack( + tf.nn.softmax(rpn_objectness_predictions_with_background), axis=1)[1] + proposal_boxlist = post_processing.multiclass_non_max_suppression( + tf.expand_dims(decoded_boxes.get(), 1), + tf.expand_dims(objectness_scores, 1), + self._first_stage_nms_score_threshold, + self._first_stage_nms_iou_threshold, self._first_stage_max_proposals, + clip_window=clip_window) + + if self._is_training: + proposal_boxlist.set(tf.stop_gradient(proposal_boxlist.get())) + if not self._hard_example_miner: + proposal_boxlist = self._sample_box_classifier_minibatch( + proposal_boxlist, groundtruth_boxlists[batch_index], + groundtruth_classes_with_background_list[batch_index]) + + normalized_proposals = box_list_ops.to_normalized_coordinates( + proposal_boxlist, image_shape[1], image_shape[2], + check_range=False) + + # pad proposals to max_num_proposals + padded_proposals = box_list_ops.pad_or_clip_box_list( + normalized_proposals, num_boxes=self.max_num_proposals) + proposal_boxes_list.append(padded_proposals.get()) + proposal_scores_list.append( + padded_proposals.get_field(fields.BoxListFields.scores)) + num_proposals_list.append(tf.minimum(normalized_proposals.num_boxes(), + self.max_num_proposals)) + + return (tf.stack(proposal_boxes_list), tf.stack(proposal_scores_list), + tf.stack(num_proposals_list)) + + def _format_groundtruth_data(self, image_shape): + """Helper function for preparing groundtruth data for target assignment. + + In order to be consistent with the model.DetectionModel interface, + groundtruth boxes are specified in normalized coordinates and classes are + specified as label indices with no assumed background category. To prepare + for target assignment, we: + 1) convert boxes to absolute coordinates, + 2) add a background class at class index 0 + + Args: + image_shape: A 1-D int32 tensor of shape [4] representing the shape of the + input image batch. + + Returns: + groundtruth_boxlists: A list of BoxLists containing (absolute) coordinates + of the groundtruth boxes. + groundtruth_classes_with_background_list: A list of 2-D one-hot + (or k-hot) tensors of shape [num_boxes, num_classes+1] containing the + class targets with the 0th index assumed to map to the background class. + """ + groundtruth_boxlists = [ + box_list_ops.to_absolute_coordinates( + box_list.BoxList(boxes), image_shape[1], image_shape[2]) + for boxes in self.groundtruth_lists(fields.BoxListFields.boxes)] + groundtruth_classes_with_background_list = [ + tf.to_float( + tf.pad(one_hot_encoding, [[0, 0], [1, 0]], mode='CONSTANT')) + for one_hot_encoding in self.groundtruth_lists( + fields.BoxListFields.classes)] + return groundtruth_boxlists, groundtruth_classes_with_background_list + + def _sample_box_classifier_minibatch(self, + proposal_boxlist, + groundtruth_boxlist, + groundtruth_classes_with_background): + """Samples a mini-batch of proposals to be sent to the box classifier. 
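`_format_groundtruth_data` above turns per-box one-hot class labels that have no background category into targets whose index 0 is background, simply by left-padding a zero column. A hedged sketch with a toy labelling (the example values are assumptions):

```
import tensorflow as tf

# Two groundtruth boxes, three foreground classes (no background column yet).
one_hot = tf.constant([[0., 1., 0.],
                       [1., 0., 0.]])
with_background = tf.pad(one_hot, [[0, 0], [1, 0]], mode='CONSTANT')
# -> [[0., 0., 1., 0.],
#     [0., 1., 0., 0.]]
# Column 0 is the background class; it is never set for real groundtruth boxes.
```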
+ + Helper function for self._postprocess_rpn. + + Args: + proposal_boxlist: A BoxList containing K proposal boxes in absolute + coordinates. + groundtruth_boxlist: A Boxlist containing N groundtruth object boxes in + absolute coordinates. + groundtruth_classes_with_background: A tensor with shape + `[N, self.num_classes + 1]` representing groundtruth classes. The + classes are assumed to be k-hot encoded, and include background as the + zero-th class. + + Returns: + a BoxList contained sampled proposals. + """ + (cls_targets, cls_weights, _, _, _) = self._detector_target_assigner.assign( + proposal_boxlist, groundtruth_boxlist, + groundtruth_classes_with_background) + # Selects all boxes as candidates if none of them is selected according + # to cls_weights. This could happen as boxes within certain IOU ranges + # are ignored. If triggered, the selected boxes will still be ignored + # during loss computation. + cls_weights += tf.to_float(tf.equal(tf.reduce_sum(cls_weights), 0)) + positive_indicator = tf.greater(tf.argmax(cls_targets, axis=1), 0) + sampled_indices = self._second_stage_sampler.subsample( + tf.cast(cls_weights, tf.bool), + self._second_stage_batch_size, + positive_indicator) + return box_list_ops.boolean_mask(proposal_boxlist, sampled_indices) + + def _compute_second_stage_input_feature_maps(self, features_to_crop, + proposal_boxes_normalized): + """Crops to a set of proposals from the feature map for a batch of images. + + Helper function for self._postprocess_rpn. This function calls + `tf.image.crop_and_resize` to create the feature map to be passed to the + second stage box classifier for each proposal. + + Args: + features_to_crop: A float32 tensor with shape + [batch_size, height, width, depth] + proposal_boxes_normalized: A float32 tensor with shape [batch_size, + num_proposals, box_code_size] containing proposal boxes in + normalized coordinates. + + Returns: + A float32 tensor with shape [K, new_height, new_width, depth]. + """ + def get_box_inds(proposals): + proposals_shape = proposals.get_shape().as_list() + if any(dim is None for dim in proposals_shape): + proposals_shape = tf.shape(proposals) + ones_mat = tf.ones(proposals_shape[:2], dtype=tf.int32) + multiplier = tf.expand_dims( + tf.range(start=0, limit=proposals_shape[0]), 1) + return tf.reshape(ones_mat * multiplier, [-1]) + + cropped_regions = tf.image.crop_and_resize( + features_to_crop, + self._flatten_first_two_dimensions(proposal_boxes_normalized), + get_box_inds(proposal_boxes_normalized), + (self._initial_crop_size, self._initial_crop_size)) + return slim.max_pool2d( + cropped_regions, + [self._maxpool_kernel_size, self._maxpool_kernel_size], + stride=self._maxpool_stride) + + def _postprocess_box_classifier(self, + refined_box_encodings, + class_predictions_with_background, + proposal_boxes, + num_proposals, + image_shape, + mask_predictions=None, + mask_threshold=0.5): + """Converts predictions from the second stage box classifier to detections. + + Args: + refined_box_encodings: a 3-D tensor with shape + [total_num_padded_proposals, num_classes, 4] representing predicted + (final) refined box encodings. + class_predictions_with_background: a 3-D tensor with shape + [total_num_padded_proposals, num_classes + 1] containing class + predictions (logits) for each of the proposals. Note that this tensor + *includes* background class predictions (at class index 0). + proposal_boxes: [batch_size, self.max_num_proposals, 4] representing + decoded proposal bounding boxes. 
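`_compute_second_stage_input_feature_maps` above has to tell `tf.image.crop_and_resize` which image each flattened proposal box belongs to; `get_box_inds` builds that index vector by multiplying a ones matrix by per-row batch indices. A small sketch of the same construction, with assumed batch and proposal counts:

```
import tensorflow as tf

batch_size, num_proposals = 2, 3
ones_mat = tf.ones([batch_size, num_proposals], dtype=tf.int32)
multiplier = tf.expand_dims(tf.range(batch_size), 1)  # [[0], [1]]
box_inds = tf.reshape(ones_mat * multiplier, [-1])
# -> [0 0 0 1 1 1]: proposal k in the flattened tensor came from image
# box_inds[k], which is the box_ind argument of tf.image.crop_and_resize.
```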
+ num_proposals: A Tensor of type `int32`. A 1-D tensor of shape [batch] + representing the number of proposals predicted for each image in + the batch. + image_shape: a 1-D tensor representing the input image shape. + mask_predictions: (optional) a 4-D tensor with shape + [total_num_padded_proposals, num_classes, mask_height, mask_width] + containing instance mask predictions. + mask_threshold: a scalar threshold determining which mask values are + rounded to 0 or 1. + + Returns: + A dictionary containing: + `detection_boxes`: [batch, max_detection, 4] + `detection_scores`: [batch, max_detections] + `detection_classes`: [batch, max_detections] + `num_detections`: [batch] + `detection_masks`: + (optional) [batch, max_detections, mask_height, mask_width] + """ + refined_box_encodings_batch = tf.reshape(refined_box_encodings, + [-1, self.max_num_proposals, + self.num_classes, + self._box_coder.code_size]) + class_predictions_with_background_batch = tf.reshape( + class_predictions_with_background, + [-1, self.max_num_proposals, self.num_classes + 1] + ) + refined_decoded_boxes_batch = self._batch_decode_refined_boxes( + refined_box_encodings_batch, proposal_boxes) + class_predictions_with_background_batch = ( + self._second_stage_score_conversion_fn( + class_predictions_with_background_batch)) + class_predictions_batch = tf.reshape( + tf.slice(class_predictions_with_background_batch, + [0, 0, 1], [-1, -1, -1]), + [-1, self.max_num_proposals, self.num_classes]) + clip_window = tf.to_float(tf.stack([0, 0, image_shape[1], image_shape[2]])) + + mask_predictions_batch = None + if mask_predictions is not None: + mask_height = mask_predictions.shape[2].value + mask_width = mask_predictions.shape[3].value + mask_predictions_batch = tf.reshape( + mask_predictions, [-1, self.max_num_proposals, + self.num_classes, mask_height, mask_width]) + detections = self._second_stage_nms_fn( + refined_decoded_boxes_batch, + class_predictions_batch, + clip_window=clip_window, + change_coordinate_frame=True, + num_valid_boxes=num_proposals, + masks=mask_predictions_batch) + if mask_predictions is not None: + detections['detection_masks'] = tf.to_float( + tf.greater_equal(detections['detection_masks'], mask_threshold)) + return detections + + def _batch_decode_refined_boxes(self, refined_box_encodings, proposal_boxes): + """Decode tensor of refined box encodings. + + Args: + refined_box_encodings: a 3-D tensor with shape + [batch_size, max_num_proposals, num_classes, self._box_coder.code_size] + representing predicted (final) refined box encodings. + proposal_boxes: [batch_size, self.max_num_proposals, 4] representing + decoded proposal bounding boxes. + + Returns: + refined_box_predictions: a [batch_size, max_num_proposals, num_classes, 4] + float tensor representing (padded) refined bounding box predictions + (for each image in batch, proposal and class). + """ + tiled_proposal_boxes = tf.tile( + tf.expand_dims(proposal_boxes, 2), [1, 1, self.num_classes, 1]) + tiled_proposals_boxlist = box_list.BoxList( + tf.reshape(tiled_proposal_boxes, [-1, 4])) + decoded_boxes = self._box_coder.decode( + tf.reshape(refined_box_encodings, [-1, self._box_coder.code_size]), + tiled_proposals_boxlist) + return tf.reshape(decoded_boxes.get(), + [-1, self.max_num_proposals, self.num_classes, 4]) + + def loss(self, prediction_dict, scope=None): + """Compute scalar loss tensors given prediction tensors. + + If first_stage_only=True, only RPN related losses are computed (i.e., + `rpn_localization_loss` and `rpn_objectness_loss`). 
Otherwise all
+    losses are computed.
+
+    Args:
+      prediction_dict: a dictionary holding prediction tensors (see the
+        documentation for the predict method). If first_stage_only=True, we
+        expect prediction_dict to contain `rpn_box_encodings`,
+        `rpn_objectness_predictions_with_background`, `rpn_features_to_crop`,
+        `image_shape`, and `anchors` fields. Otherwise we expect
+        prediction_dict to additionally contain `refined_box_encodings`,
+        `class_predictions_with_background`, `num_proposals`, and
+        `proposal_boxes` fields.
+      scope: Optional scope name.
+
+    Returns:
+      a dictionary mapping loss keys (`first_stage_localization_loss`,
+        `first_stage_objectness_loss`, `second_stage_localization_loss`,
+        `second_stage_classification_loss`) to scalar tensors representing
+        corresponding loss values.
+    """
+    with tf.name_scope(scope, 'Loss', prediction_dict.values()):
+      (groundtruth_boxlists, groundtruth_classes_with_background_list
+      ) = self._format_groundtruth_data(prediction_dict['image_shape'])
+      loss_dict = self._loss_rpn(
+          prediction_dict['rpn_box_encodings'],
+          prediction_dict['rpn_objectness_predictions_with_background'],
+          prediction_dict['anchors'],
+          groundtruth_boxlists,
+          groundtruth_classes_with_background_list)
+      if not self._first_stage_only:
+        loss_dict.update(
+            self._loss_box_classifier(
+                prediction_dict['refined_box_encodings'],
+                prediction_dict['class_predictions_with_background'],
+                prediction_dict['proposal_boxes'],
+                prediction_dict['num_proposals'],
+                groundtruth_boxlists,
+                groundtruth_classes_with_background_list))
+    return loss_dict
+
+  def _loss_rpn(self,
+                rpn_box_encodings,
+                rpn_objectness_predictions_with_background,
+                anchors,
+                groundtruth_boxlists,
+                groundtruth_classes_with_background_list):
+    """Computes scalar RPN loss tensors.
+
+    Uses self._proposal_target_assigner to obtain regression and classification
+    targets for the first stage RPN, samples a "minibatch" of anchors to
+    participate in the loss computation, and returns the RPN losses.
+
+    Args:
+      rpn_box_encodings: A 3-D float tensor of shape
+        [batch_size, num_anchors, self._box_coder.code_size] containing
+        predicted proposal box encodings.
+      rpn_objectness_predictions_with_background: A 3-D float tensor of shape
+        [batch_size, num_anchors, 2] containing objectness predictions
+        (logits) for each of the anchors with 0 corresponding to background
+        and 1 corresponding to object.
+      anchors: A 2-D tensor of shape [num_anchors, 4] representing anchors
+        for the first stage RPN. Note that `num_anchors` can differ depending
+        on whether the model is created in training or inference mode.
+      groundtruth_boxlists: A list of BoxLists containing coordinates of the
+        groundtruth boxes.
+      groundtruth_classes_with_background_list: A list of 2-D one-hot
+        (or k-hot) tensors of shape [num_boxes, num_classes+1] containing the
+        class targets with the 0th index assumed to map to the background class.
+
+    Returns:
+      a dictionary mapping loss keys (`first_stage_localization_loss`,
+        `first_stage_objectness_loss`) to scalar tensors representing
+        corresponding loss values.
+ """ + with tf.name_scope('RPNLoss'): + (batch_cls_targets, batch_cls_weights, batch_reg_targets, + batch_reg_weights, _) = target_assigner.batch_assign_targets( + self._proposal_target_assigner, box_list.BoxList(anchors), + groundtruth_boxlists, len(groundtruth_boxlists)*[None]) + batch_cls_targets = tf.squeeze(batch_cls_targets, axis=2) + + def _minibatch_subsample_fn(inputs): + cls_targets, cls_weights = inputs + return self._first_stage_sampler.subsample( + tf.cast(cls_weights, tf.bool), + self._first_stage_minibatch_size, tf.cast(cls_targets, tf.bool)) + batch_sampled_indices = tf.to_float(tf.map_fn( + _minibatch_subsample_fn, + [batch_cls_targets, batch_cls_weights], + dtype=tf.bool, + parallel_iterations=self._parallel_iterations, + back_prop=True)) + + # Normalize by number of examples in sampled minibatch + normalizer = tf.reduce_sum(batch_sampled_indices, axis=1) + batch_one_hot_targets = tf.one_hot( + tf.to_int32(batch_cls_targets), depth=2) + sampled_reg_indices = tf.multiply(batch_sampled_indices, + batch_reg_weights) + + localization_losses = self._first_stage_localization_loss( + rpn_box_encodings, batch_reg_targets, weights=sampled_reg_indices) + objectness_losses = self._first_stage_objectness_loss( + rpn_objectness_predictions_with_background, + batch_one_hot_targets, weights=batch_sampled_indices) + localization_loss = tf.reduce_mean( + tf.reduce_sum(localization_losses, axis=1) / normalizer) + objectness_loss = tf.reduce_mean( + tf.reduce_sum(objectness_losses, axis=1) / normalizer) + loss_dict = { + 'first_stage_localization_loss': + self._first_stage_loc_loss_weight * localization_loss, + 'first_stage_objectness_loss': + self._first_stage_obj_loss_weight * objectness_loss, + } + return loss_dict + + def _loss_box_classifier(self, + refined_box_encodings, + class_predictions_with_background, + proposal_boxes, + num_proposals, + groundtruth_boxlists, + groundtruth_classes_with_background_list): + """Computes scalar box classifier loss tensors. + + Uses self._detector_target_assigner to obtain regression and classification + targets for the second stage box classifier, optionally performs + hard mining, and returns losses. All losses are computed independently + for each image and then averaged across the batch. + + This function assumes that the proposal boxes in the "padded" regions are + actually zero (and thus should not be matched to). + + Args: + refined_box_encodings: a 3-D tensor with shape + [total_num_proposals, num_classes, box_coder.code_size] representing + predicted (final) refined box encodings. + class_predictions_with_background: a 3-D tensor with shape + [total_num_proposals, num_classes + 1] containing class + predictions (logits) for each of the anchors. Note that this tensor + *includes* background class predictions (at class index 0). + proposal_boxes: [batch_size, self.max_num_proposals, 4] representing + decoded proposal bounding boxes. + num_proposals: A Tensor of type `int32`. A 1-D tensor of shape [batch] + representing the number of proposals predicted for each image in + the batch. + groundtruth_boxlists: a list of BoxLists containing coordinates of the + groundtruth boxes. + groundtruth_classes_with_background_list: a list of 2-D one-hot + (or k-hot) tensors of shape [num_boxes, num_classes + 1] containing the + class targets with the 0th index assumed to map to the background class. 
+ + Returns: + a dictionary mapping loss keys ('second_stage_localization_loss', + 'second_stage_classification_loss') to scalar tensors representing + corresponding loss values. + """ + with tf.name_scope('BoxClassifierLoss'): + paddings_indicator = self._padded_batched_proposals_indicator( + num_proposals, self.max_num_proposals) + proposal_boxlists = [ + box_list.BoxList(proposal_boxes_single_image) + for proposal_boxes_single_image in tf.unstack(proposal_boxes)] + batch_size = len(proposal_boxlists) + + num_proposals_or_one = tf.to_float(tf.expand_dims( + tf.maximum(num_proposals, tf.ones_like(num_proposals)), 1)) + normalizer = tf.tile(num_proposals_or_one, + [1, self.max_num_proposals]) * batch_size + + (batch_cls_targets_with_background, batch_cls_weights, batch_reg_targets, + batch_reg_weights, _) = target_assigner.batch_assign_targets( + self._detector_target_assigner, proposal_boxlists, + groundtruth_boxlists, groundtruth_classes_with_background_list) + + # We only predict refined location encodings for the non background + # classes, but we now pad it to make it compatible with the class + # predictions + flat_cls_targets_with_background = tf.reshape( + batch_cls_targets_with_background, + [batch_size * self.max_num_proposals, -1]) + refined_box_encodings_with_background = tf.pad( + refined_box_encodings, [[0, 0], [1, 0], [0, 0]]) + refined_box_encodings_masked_by_class_targets = tf.boolean_mask( + refined_box_encodings_with_background, + tf.greater(flat_cls_targets_with_background, 0)) + reshaped_refined_box_encodings = tf.reshape( + refined_box_encodings_masked_by_class_targets, + [batch_size, -1, 4]) + + second_stage_loc_losses = self._second_stage_localization_loss( + reshaped_refined_box_encodings, + batch_reg_targets, weights=batch_reg_weights) / normalizer + second_stage_cls_losses = self._second_stage_classification_loss( + class_predictions_with_background, + batch_cls_targets_with_background, + weights=batch_cls_weights) / normalizer + second_stage_loc_loss = tf.reduce_sum( + tf.boolean_mask(second_stage_loc_losses, paddings_indicator)) + second_stage_cls_loss = tf.reduce_sum( + tf.boolean_mask(second_stage_cls_losses, paddings_indicator)) + + if self._hard_example_miner: + (second_stage_loc_loss, second_stage_cls_loss + ) = self._unpad_proposals_and_apply_hard_mining( + proposal_boxlists, second_stage_loc_losses, + second_stage_cls_losses, num_proposals) + loss_dict = { + 'second_stage_localization_loss': + (self._second_stage_loc_loss_weight * second_stage_loc_loss), + 'second_stage_classification_loss': + (self._second_stage_cls_loss_weight * second_stage_cls_loss), + } + return loss_dict + + def _padded_batched_proposals_indicator(self, + num_proposals, + max_num_proposals): + """Creates indicator matrix of non-pad elements of padded batch proposals. + + Args: + num_proposals: Tensor of type tf.int32 with shape [batch_size]. + max_num_proposals: Maximum number of proposals per image (integer). + + Returns: + A Tensor of type tf.bool with shape [batch_size, max_num_proposals]. + """ + batch_size = tf.size(num_proposals) + tiled_num_proposals = tf.tile( + tf.expand_dims(num_proposals, 1), [1, max_num_proposals]) + tiled_proposal_index = tf.tile( + tf.expand_dims(tf.range(max_num_proposals), 0), [batch_size, 1]) + return tf.greater(tiled_num_proposals, tiled_proposal_index) + + def _unpad_proposals_and_apply_hard_mining(self, + proposal_boxlists, + second_stage_loc_losses, + second_stage_cls_losses, + num_proposals): + """Unpads proposals and applies hard mining. 
+ + Args: + proposal_boxlists: A list of `batch_size` BoxLists each representing + `self.max_num_proposals` representing decoded proposal bounding boxes + for each image. + second_stage_loc_losses: A Tensor of type `float32`. A tensor of shape + `[batch_size, self.max_num_proposals]` representing per-anchor + second stage localization loss values. + second_stage_cls_losses: A Tensor of type `float32`. A tensor of shape + `[batch_size, self.max_num_proposals]` representing per-anchor + second stage classification loss values. + num_proposals: A Tensor of type `int32`. A 1-D tensor of shape [batch] + representing the number of proposals predicted for each image in + the batch. + + Returns: + second_stage_loc_loss: A scalar float32 tensor representing the second + stage localization loss. + second_stage_cls_loss: A scalar float32 tensor representing the second + stage classification loss. + """ + for (proposal_boxlist, single_image_loc_loss, single_image_cls_loss, + single_image_num_proposals) in zip( + proposal_boxlists, + tf.unstack(second_stage_loc_losses), + tf.unstack(second_stage_cls_losses), + tf.unstack(num_proposals)): + proposal_boxlist = box_list.BoxList( + tf.slice(proposal_boxlist.get(), + [0, 0], [single_image_num_proposals, -1])) + single_image_loc_loss = tf.slice(single_image_loc_loss, + [0], [single_image_num_proposals]) + single_image_cls_loss = tf.slice(single_image_cls_loss, + [0], [single_image_num_proposals]) + return self._hard_example_miner( + location_losses=tf.expand_dims(single_image_loc_loss, 0), + cls_losses=tf.expand_dims(single_image_cls_loss, 0), + decoded_boxlist_list=[proposal_boxlist]) + + def restore_fn(self, checkpoint_path, from_detection_checkpoint=True): + """Returns callable for loading a checkpoint into the tensorflow graph. + + Args: + checkpoint_path: path to checkpoint to restore. + from_detection_checkpoint: whether to restore from a detection checkpoint + (with compatible variable names) or to restore from a classification + checkpoint for initialization prior to training. Note that when + from_detection_checkpoint=True, the current implementation only + supports restoration from an (exactly) identical model (with exception + of the num_classes parameter). + + Returns: + a callable which takes a tf.Session as input and loads a checkpoint when + run. + """ + if not from_detection_checkpoint: + return self._feature_extractor.restore_from_classification_checkpoint_fn( + checkpoint_path, + self.first_stage_feature_extractor_scope, + self.second_stage_feature_extractor_scope) + + variables_to_restore = tf.global_variables() + variables_to_restore.append(slim.get_or_create_global_step()) + # Only load feature extractor variables to be consistent with loading from + # a classification checkpoint. + first_stage_variables = tf.contrib.framework.filter_variables( + variables_to_restore, + include_patterns=[self.first_stage_feature_extractor_scope, + self.second_stage_feature_extractor_scope]) + + saver = tf.train.Saver(first_stage_variables) + + def restore(sess): + saver.restore(sess, checkpoint_path) + return restore diff --git a/object_detection/meta_architectures/faster_rcnn_meta_arch_test.py b/object_detection/meta_architectures/faster_rcnn_meta_arch_test.py new file mode 100644 index 0000000000000000000000000000000000000000..527e24b4eb799a3c2e0972b7ff1ce39f63aa6285 --- /dev/null +++ b/object_detection/meta_architectures/faster_rcnn_meta_arch_test.py @@ -0,0 +1,84 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.meta_architectures.faster_rcnn_meta_arch.""" + +import tensorflow as tf + +from object_detection.meta_architectures import faster_rcnn_meta_arch_test_lib + + +class FasterRCNNMetaArchTest( + faster_rcnn_meta_arch_test_lib.FasterRCNNMetaArchTestBase): + + def test_postprocess_second_stage_only_inference_mode_with_masks(self): + model = self._build_model( + is_training=False, first_stage_only=False, second_stage_batch_size=6) + + batch_size = 2 + total_num_padded_proposals = batch_size * model.max_num_proposals + proposal_boxes = tf.constant( + [[[1, 1, 2, 3], + [0, 0, 1, 1], + [.5, .5, .6, .6], + 4*[0], 4*[0], 4*[0], 4*[0], 4*[0]], + [[2, 3, 6, 8], + [1, 2, 5, 3], + 4*[0], 4*[0], 4*[0], 4*[0], 4*[0], 4*[0]]], dtype=tf.float32) + num_proposals = tf.constant([3, 2], dtype=tf.int32) + refined_box_encodings = tf.zeros( + [total_num_padded_proposals, model.num_classes, 4], dtype=tf.float32) + class_predictions_with_background = tf.ones( + [total_num_padded_proposals, model.num_classes+1], dtype=tf.float32) + image_shape = tf.constant([batch_size, 36, 48, 3], dtype=tf.int32) + + mask_height = 2 + mask_width = 2 + mask_predictions = .6 * tf.ones( + [total_num_padded_proposals, model.num_classes, + mask_height, mask_width], dtype=tf.float32) + exp_detection_masks = [[[[1, 1], [1, 1]], + [[1, 1], [1, 1]], + [[1, 1], [1, 1]], + [[1, 1], [1, 1]], + [[1, 1], [1, 1]]], + [[[1, 1], [1, 1]], + [[1, 1], [1, 1]], + [[1, 1], [1, 1]], + [[1, 1], [1, 1]], + [[0, 0], [0, 0]]]] + + detections = model.postprocess({ + 'refined_box_encodings': refined_box_encodings, + 'class_predictions_with_background': class_predictions_with_background, + 'num_proposals': num_proposals, + 'proposal_boxes': proposal_boxes, + 'image_shape': image_shape, + 'mask_predictions': mask_predictions + }) + with self.test_session() as sess: + detections_out = sess.run(detections) + self.assertAllEqual(detections_out['detection_boxes'].shape, [2, 5, 4]) + self.assertAllClose(detections_out['detection_scores'], + [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]) + self.assertAllClose(detections_out['detection_classes'], + [[0, 0, 0, 1, 1], [0, 0, 1, 1, 0]]) + self.assertAllClose(detections_out['num_detections'], [5, 4]) + self.assertAllClose(detections_out['detection_masks'], + exp_detection_masks) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/meta_architectures/faster_rcnn_meta_arch_test_lib.py b/object_detection/meta_architectures/faster_rcnn_meta_arch_test_lib.py new file mode 100644 index 0000000000000000000000000000000000000000..17e1f62b297929f8908335b3637d2a41b0bea001 --- /dev/null +++ b/object_detection/meta_architectures/faster_rcnn_meta_arch_test_lib.py @@ -0,0 +1,1035 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.meta_architectures.faster_rcnn_meta_arch.""" +import numpy as np +import tensorflow as tf +from google.protobuf import text_format +from object_detection.anchor_generators import grid_anchor_generator +from object_detection.builders import box_predictor_builder +from object_detection.builders import hyperparams_builder +from object_detection.builders import post_processing_builder +from object_detection.core import losses +from object_detection.meta_architectures import faster_rcnn_meta_arch +from object_detection.protos import box_predictor_pb2 +from object_detection.protos import hyperparams_pb2 +from object_detection.protos import post_processing_pb2 + +slim = tf.contrib.slim +BOX_CODE_SIZE = 4 + + +class FakeFasterRCNNFeatureExtractor( + faster_rcnn_meta_arch.FasterRCNNFeatureExtractor): + """Fake feature extracture to use in tests.""" + + def __init__(self): + super(FakeFasterRCNNFeatureExtractor, self).__init__( + is_training=False, + first_stage_features_stride=32, + reuse_weights=None, + weight_decay=0.0) + + def preprocess(self, resized_inputs): + return tf.identity(resized_inputs) + + def _extract_proposal_features(self, preprocessed_inputs, scope): + with tf.variable_scope('mock_model'): + return 0 * slim.conv2d(preprocessed_inputs, + num_outputs=3, kernel_size=1, scope='layer1') + + def _extract_box_classifier_features(self, proposal_feature_maps, scope): + with tf.variable_scope('mock_model'): + return 0 * slim.conv2d(proposal_feature_maps, + num_outputs=3, kernel_size=1, scope='layer2') + + +class FasterRCNNMetaArchTestBase(tf.test.TestCase): + """Base class to test Faster R-CNN and R-FCN meta architectures.""" + + def _build_arg_scope_with_hyperparams(self, + hyperparams_text_proto, + is_training): + hyperparams = hyperparams_pb2.Hyperparams() + text_format.Merge(hyperparams_text_proto, hyperparams) + return hyperparams_builder.build(hyperparams, is_training=is_training) + + def _get_second_stage_box_predictor_text_proto(self): + box_predictor_text_proto = """ + mask_rcnn_box_predictor { + fc_hyperparams { + op: FC + activation: NONE + regularizer { + l2_regularizer { + weight: 0.0005 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + } + """ + return box_predictor_text_proto + + def _get_second_stage_box_predictor(self, num_classes, is_training): + box_predictor_proto = box_predictor_pb2.BoxPredictor() + text_format.Merge(self._get_second_stage_box_predictor_text_proto(), + box_predictor_proto) + return box_predictor_builder.build( + hyperparams_builder.build, + box_predictor_proto, + num_classes=num_classes, + is_training=is_training) + + def _get_model(self, box_predictor, **common_kwargs): + return faster_rcnn_meta_arch.FasterRCNNMetaArch( + initial_crop_size=3, + maxpool_kernel_size=1, + maxpool_stride=1, + 
second_stage_mask_rcnn_box_predictor=box_predictor, + **common_kwargs) + + def _build_model(self, + is_training, + first_stage_only, + second_stage_batch_size, + first_stage_max_proposals=8, + num_classes=2, + hard_mining=False): + + def image_resizer_fn(image): + return tf.identity(image) + + # anchors in this test are designed so that a subset of anchors are inside + # the image and a subset of anchors are outside. + first_stage_anchor_scales = (0.001, 0.005, 0.1) + first_stage_anchor_aspect_ratios = (0.5, 1.0, 2.0) + first_stage_anchor_strides = (1, 1) + first_stage_anchor_generator = grid_anchor_generator.GridAnchorGenerator( + first_stage_anchor_scales, + first_stage_anchor_aspect_ratios, + anchor_stride=first_stage_anchor_strides) + + fake_feature_extractor = FakeFasterRCNNFeatureExtractor() + + first_stage_box_predictor_hyperparams_text_proto = """ + op: CONV + activation: RELU + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + } + } + """ + first_stage_box_predictor_arg_scope = ( + self._build_arg_scope_with_hyperparams( + first_stage_box_predictor_hyperparams_text_proto, is_training)) + + first_stage_box_predictor_kernel_size = 3 + first_stage_atrous_rate = 1 + first_stage_box_predictor_depth = 512 + first_stage_minibatch_size = 3 + first_stage_positive_balance_fraction = .5 + + first_stage_nms_score_threshold = -1.0 + first_stage_nms_iou_threshold = 1.0 + first_stage_max_proposals = first_stage_max_proposals + + first_stage_localization_loss_weight = 1.0 + first_stage_objectness_loss_weight = 1.0 + + post_processing_text_proto = """ + batch_non_max_suppression { + score_threshold: -20.0 + iou_threshold: 1.0 + max_detections_per_class: 5 + max_total_detections: 5 + } + """ + post_processing_config = post_processing_pb2.PostProcessing() + text_format.Merge(post_processing_text_proto, post_processing_config) + second_stage_non_max_suppression_fn, _ = post_processing_builder.build( + post_processing_config) + second_stage_balance_fraction = 1.0 + + second_stage_score_conversion_fn = tf.identity + second_stage_localization_loss_weight = 1.0 + second_stage_classification_loss_weight = 1.0 + + hard_example_miner = None + if hard_mining: + hard_example_miner = losses.HardExampleMiner( + num_hard_examples=1, + iou_threshold=0.99, + loss_type='both', + cls_loss_weight=second_stage_classification_loss_weight, + loc_loss_weight=second_stage_localization_loss_weight, + max_negatives_per_positive=None) + + common_kwargs = { + 'is_training': is_training, + 'num_classes': num_classes, + 'image_resizer_fn': image_resizer_fn, + 'feature_extractor': fake_feature_extractor, + 'first_stage_only': first_stage_only, + 'first_stage_anchor_generator': first_stage_anchor_generator, + 'first_stage_atrous_rate': first_stage_atrous_rate, + 'first_stage_box_predictor_arg_scope': + first_stage_box_predictor_arg_scope, + 'first_stage_box_predictor_kernel_size': + first_stage_box_predictor_kernel_size, + 'first_stage_box_predictor_depth': first_stage_box_predictor_depth, + 'first_stage_minibatch_size': first_stage_minibatch_size, + 'first_stage_positive_balance_fraction': + first_stage_positive_balance_fraction, + 'first_stage_nms_score_threshold': first_stage_nms_score_threshold, + 'first_stage_nms_iou_threshold': first_stage_nms_iou_threshold, + 'first_stage_max_proposals': first_stage_max_proposals, + 'first_stage_localization_loss_weight': + first_stage_localization_loss_weight, + 'first_stage_objectness_loss_weight': + 
first_stage_objectness_loss_weight, + 'second_stage_batch_size': second_stage_batch_size, + 'second_stage_balance_fraction': second_stage_balance_fraction, + 'second_stage_non_max_suppression_fn': + second_stage_non_max_suppression_fn, + 'second_stage_score_conversion_fn': second_stage_score_conversion_fn, + 'second_stage_localization_loss_weight': + second_stage_localization_loss_weight, + 'second_stage_classification_loss_weight': + second_stage_classification_loss_weight, + 'hard_example_miner': hard_example_miner} + + return self._get_model(self._get_second_stage_box_predictor( + num_classes=num_classes, is_training=is_training), **common_kwargs) + + def test_predict_gives_correct_shapes_in_inference_mode_first_stage_only( + self): + test_graph = tf.Graph() + with test_graph.as_default(): + model = self._build_model( + is_training=False, first_stage_only=True, second_stage_batch_size=2) + batch_size = 2 + height = 10 + width = 12 + input_image_shape = (batch_size, height, width, 3) + + preprocessed_inputs = tf.placeholder(dtype=tf.float32, + shape=(batch_size, None, None, 3)) + prediction_dict = model.predict(preprocessed_inputs) + + # In inference mode, anchors are clipped to the image window, but not + # pruned. Since MockFasterRCNN.extract_proposal_features returns a + # tensor with the same shape as its input, the expected number of anchors + # is height * width * the number of anchors per location (i.e. 3x3). + expected_num_anchors = height * width * 3 * 3 + expected_output_keys = set([ + 'rpn_box_predictor_features', 'rpn_features_to_crop', 'image_shape', + 'rpn_box_encodings', 'rpn_objectness_predictions_with_background', + 'anchors']) + expected_output_shapes = { + 'rpn_box_predictor_features': (batch_size, height, width, 512), + 'rpn_features_to_crop': (batch_size, height, width, 3), + 'rpn_box_encodings': (batch_size, expected_num_anchors, 4), + 'rpn_objectness_predictions_with_background': + (batch_size, expected_num_anchors, 2), + 'anchors': (expected_num_anchors, 4) + } + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + prediction_out = sess.run(prediction_dict, + feed_dict={ + preprocessed_inputs: + np.zeros(input_image_shape) + }) + + self.assertEqual(set(prediction_out.keys()), expected_output_keys) + + self.assertAllEqual(prediction_out['image_shape'], input_image_shape) + for output_key, expected_shape in expected_output_shapes.iteritems(): + self.assertAllEqual(prediction_out[output_key].shape, expected_shape) + + # Check that anchors are clipped to window. 
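+
+        # Editorial sketch (not part of the original test, added for
+        # illustration): clipping an anchor [ymin, xmin, ymax, xmax] to the
+        # image window is an elementwise clamp to [0, 0, height, width]; the
+        # assertions below verify exactly that property for every anchor.
+        example_anchor = np.array([-3.0, 2.0, 14.0, 15.0])
+        example_clipped = np.clip(
+            example_anchor, 0, [height, width, height, width])
+        self.assertAllClose(example_clipped, [0.0, 2.0, 10.0, 12.0])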
+ anchors = prediction_out['anchors'] + self.assertTrue(np.all(np.greater_equal(anchors, 0))) + self.assertTrue(np.all(np.less_equal(anchors[:, 0], height))) + self.assertTrue(np.all(np.less_equal(anchors[:, 1], width))) + self.assertTrue(np.all(np.less_equal(anchors[:, 2], height))) + self.assertTrue(np.all(np.less_equal(anchors[:, 3], width))) + + def test_predict_gives_valid_anchors_in_training_mode_first_stage_only(self): + test_graph = tf.Graph() + with test_graph.as_default(): + model = self._build_model( + is_training=True, first_stage_only=True, second_stage_batch_size=2) + batch_size = 2 + height = 10 + width = 12 + input_image_shape = (batch_size, height, width, 3) + preprocessed_inputs = tf.placeholder(dtype=tf.float32, + shape=(batch_size, None, None, 3)) + prediction_dict = model.predict(preprocessed_inputs) + + expected_output_keys = set([ + 'rpn_box_predictor_features', 'rpn_features_to_crop', 'image_shape', + 'rpn_box_encodings', 'rpn_objectness_predictions_with_background', + 'anchors']) + # At training time, anchors that exceed image bounds are pruned. Thus + # the `expected_num_anchors` in the above inference mode test is now + # a strict upper bound on the number of anchors. + num_anchors_strict_upper_bound = height * width * 3 * 3 + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + prediction_out = sess.run(prediction_dict, + feed_dict={ + preprocessed_inputs: + np.zeros(input_image_shape) + }) + + self.assertEqual(set(prediction_out.keys()), expected_output_keys) + self.assertAllEqual(prediction_out['image_shape'], input_image_shape) + + # Check that anchors have less than the upper bound and + # are clipped to window. + anchors = prediction_out['anchors'] + self.assertTrue(len(anchors.shape) == 2 and anchors.shape[1] == 4) + num_anchors_out = anchors.shape[0] + self.assertTrue(num_anchors_out < num_anchors_strict_upper_bound) + + self.assertTrue(np.all(np.greater_equal(anchors, 0))) + self.assertTrue(np.all(np.less_equal(anchors[:, 0], height))) + self.assertTrue(np.all(np.less_equal(anchors[:, 1], width))) + self.assertTrue(np.all(np.less_equal(anchors[:, 2], height))) + self.assertTrue(np.all(np.less_equal(anchors[:, 3], width))) + + self.assertAllEqual(prediction_out['rpn_box_encodings'].shape, + (batch_size, num_anchors_out, 4)) + self.assertAllEqual( + prediction_out['rpn_objectness_predictions_with_background'].shape, + (batch_size, num_anchors_out, 2)) + + def test_predict_gives_correct_shapes_in_inference_mode_both_stages(self): + test_graph = tf.Graph() + with test_graph.as_default(): + model = self._build_model( + is_training=False, first_stage_only=False, second_stage_batch_size=2) + batch_size = 2 + image_size = 10 + image_shape = (batch_size, image_size, image_size, 3) + preprocessed_inputs = tf.zeros(image_shape, dtype=tf.float32) + result_tensor_dict = model.predict(preprocessed_inputs) + expected_num_anchors = image_size * image_size * 3 * 3 + + expected_shapes = { + 'rpn_box_predictor_features': + (2, image_size, image_size, 512), + 'rpn_features_to_crop': (2, image_size, image_size, 3), + 'image_shape': (4,), + 'rpn_box_encodings': (2, expected_num_anchors, 4), + 'rpn_objectness_predictions_with_background': + (2, expected_num_anchors, 2), + 'anchors': (expected_num_anchors, 4), + 'refined_box_encodings': (2 * 8, 2, 4), + 'class_predictions_with_background': (2 * 8, 2 + 1), + 'num_proposals': (2,), + 'proposal_boxes': (2, 8, 4), + } + init_op = tf.global_variables_initializer() + with 
self.test_session() as sess: + sess.run(init_op) + tensor_dict_out = sess.run(result_tensor_dict) + self.assertEqual(set(tensor_dict_out.keys()), + set(expected_shapes.keys())) + for key in expected_shapes: + self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key]) + + def test_predict_gives_correct_shapes_in_train_mode_both_stages(self): + test_graph = tf.Graph() + with test_graph.as_default(): + model = self._build_model( + is_training=True, first_stage_only=False, second_stage_batch_size=7) + batch_size = 2 + image_size = 10 + image_shape = (batch_size, image_size, image_size, 3) + preprocessed_inputs = tf.zeros(image_shape, dtype=tf.float32) + groundtruth_boxes_list = [ + tf.constant([[0, 0, .5, .5], [.5, .5, 1, 1]], dtype=tf.float32), + tf.constant([[0, .5, .5, 1], [.5, 0, 1, .5]], dtype=tf.float32)] + groundtruth_classes_list = [ + tf.constant([[1, 0], [0, 1]], dtype=tf.float32), + tf.constant([[1, 0], [1, 0]], dtype=tf.float32)] + + model.provide_groundtruth(groundtruth_boxes_list, + groundtruth_classes_list) + + result_tensor_dict = model.predict(preprocessed_inputs) + expected_shapes = { + 'rpn_box_predictor_features': + (2, image_size, image_size, 512), + 'rpn_features_to_crop': (2, image_size, image_size, 3), + 'image_shape': (4,), + 'refined_box_encodings': (2 * 7, 2, 4), + 'class_predictions_with_background': (2 * 7, 2 + 1), + 'num_proposals': (2,), + 'proposal_boxes': (2, 7, 4), + } + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + tensor_dict_out = sess.run(result_tensor_dict) + self.assertEqual(set(tensor_dict_out.keys()), + set(expected_shapes.keys()).union(set([ + 'rpn_box_encodings', + 'rpn_objectness_predictions_with_background', + 'anchors']))) + for key in expected_shapes: + self.assertAllEqual(tensor_dict_out[key].shape, expected_shapes[key]) + + anchors_shape_out = tensor_dict_out['anchors'].shape + self.assertEqual(2, len(anchors_shape_out)) + self.assertEqual(4, anchors_shape_out[1]) + num_anchors_out = anchors_shape_out[0] + self.assertAllEqual(tensor_dict_out['rpn_box_encodings'].shape, + (2, num_anchors_out, 4)) + self.assertAllEqual( + tensor_dict_out['rpn_objectness_predictions_with_background'].shape, + (2, num_anchors_out, 2)) + + def test_postprocess_first_stage_only_inference_mode(self): + model = self._build_model( + is_training=False, first_stage_only=True, second_stage_batch_size=6) + batch_size = 2 + anchors = tf.constant( + [[0, 0, 16, 16], + [0, 16, 16, 32], + [16, 0, 32, 16], + [16, 16, 32, 32]], dtype=tf.float32) + rpn_box_encodings = tf.zeros( + [batch_size, anchors.get_shape().as_list()[0], + BOX_CODE_SIZE], dtype=tf.float32) + # use different numbers for the objectness category to break ties in + # order of boxes returned by NMS + rpn_objectness_predictions_with_background = tf.constant([ + [[-10, 13], + [10, -10], + [10, -11], + [-10, 12]], + [[10, -10], + [-10, 13], + [-10, 12], + [10, -11]]], dtype=tf.float32) + rpn_features_to_crop = tf.ones((batch_size, 8, 8, 10), dtype=tf.float32) + image_shape = tf.constant([batch_size, 32, 32, 3], dtype=tf.int32) + proposals = model.postprocess({ + 'rpn_box_encodings': rpn_box_encodings, + 'rpn_objectness_predictions_with_background': + rpn_objectness_predictions_with_background, + 'rpn_features_to_crop': rpn_features_to_crop, + 'anchors': anchors, + 'image_shape': image_shape}) + expected_proposal_boxes = [ + [[0, 0, .5, .5], [.5, .5, 1, 1], [0, .5, .5, 1], [.5, 0, 1.0, .5]] + + 4 * [4 * [0]], + [[0, .5, .5, 1], [.5, 0, 1.0, .5], [0, 0, 
.5, .5], [.5, .5, 1, 1]] + + 4 * [4 * [0]]] + expected_proposal_scores = [[1, 1, 0, 0, 0, 0, 0, 0], + [1, 1, 0, 0, 0, 0, 0, 0]] + expected_num_proposals = [4, 4] + + expected_output_keys = set(['detection_boxes', 'detection_scores', + 'num_detections']) + self.assertEqual(set(proposals.keys()), expected_output_keys) + with self.test_session() as sess: + proposals_out = sess.run(proposals) + self.assertAllClose(proposals_out['detection_boxes'], + expected_proposal_boxes) + self.assertAllClose(proposals_out['detection_scores'], + expected_proposal_scores) + self.assertAllEqual(proposals_out['num_detections'], + expected_num_proposals) + + def test_postprocess_first_stage_only_train_mode(self): + model = self._build_model( + is_training=True, first_stage_only=True, second_stage_batch_size=2) + batch_size = 2 + anchors = tf.constant( + [[0, 0, 16, 16], + [0, 16, 16, 32], + [16, 0, 32, 16], + [16, 16, 32, 32]], dtype=tf.float32) + rpn_box_encodings = tf.zeros( + [batch_size, anchors.get_shape().as_list()[0], + BOX_CODE_SIZE], dtype=tf.float32) + # use different numbers for the objectness category to break ties in + # order of boxes returned by NMS + rpn_objectness_predictions_with_background = tf.constant([ + [[-10, 13], + [-10, 12], + [-10, 11], + [-10, 10]], + [[-10, 13], + [-10, 12], + [-10, 11], + [-10, 10]]], dtype=tf.float32) + rpn_features_to_crop = tf.ones((batch_size, 8, 8, 10), dtype=tf.float32) + image_shape = tf.constant([batch_size, 32, 32, 3], dtype=tf.int32) + groundtruth_boxes_list = [ + tf.constant([[0, 0, .5, .5], [.5, .5, 1, 1]], dtype=tf.float32), + tf.constant([[0, .5, .5, 1], [.5, 0, 1, .5]], dtype=tf.float32)] + groundtruth_classes_list = [tf.constant([[1, 0], [0, 1]], dtype=tf.float32), + tf.constant([[1, 0], [1, 0]], dtype=tf.float32)] + + model.provide_groundtruth(groundtruth_boxes_list, + groundtruth_classes_list) + proposals = model.postprocess({ + 'rpn_box_encodings': rpn_box_encodings, + 'rpn_objectness_predictions_with_background': + rpn_objectness_predictions_with_background, + 'rpn_features_to_crop': rpn_features_to_crop, + 'anchors': anchors, + 'image_shape': image_shape}) + expected_proposal_boxes = [ + [[0, 0, .5, .5], [.5, .5, 1, 1]], [[0, .5, .5, 1], [.5, 0, 1, .5]]] + expected_proposal_scores = [[1, 1], + [1, 1]] + expected_num_proposals = [2, 2] + + expected_output_keys = set(['detection_boxes', 'detection_scores', + 'num_detections']) + self.assertEqual(set(proposals.keys()), expected_output_keys) + + with self.test_session() as sess: + proposals_out = sess.run(proposals) + self.assertAllClose(proposals_out['detection_boxes'], + expected_proposal_boxes) + self.assertAllClose(proposals_out['detection_scores'], + expected_proposal_scores) + self.assertAllEqual(proposals_out['num_detections'], + expected_num_proposals) + + def test_postprocess_second_stage_only_inference_mode(self): + model = self._build_model( + is_training=False, first_stage_only=False, second_stage_batch_size=6) + + batch_size = 2 + total_num_padded_proposals = batch_size * model.max_num_proposals + proposal_boxes = tf.constant( + [[[1, 1, 2, 3], + [0, 0, 1, 1], + [.5, .5, .6, .6], + 4*[0], 4*[0], 4*[0], 4*[0], 4*[0]], + [[2, 3, 6, 8], + [1, 2, 5, 3], + 4*[0], 4*[0], 4*[0], 4*[0], 4*[0], 4*[0]]], dtype=tf.float32) + num_proposals = tf.constant([3, 2], dtype=tf.int32) + refined_box_encodings = tf.zeros( + [total_num_padded_proposals, model.num_classes, 4], dtype=tf.float32) + class_predictions_with_background = tf.ones( + [total_num_padded_proposals, model.num_classes+1], 
dtype=tf.float32) + image_shape = tf.constant([batch_size, 36, 48, 3], dtype=tf.int32) + + detections = model.postprocess({ + 'refined_box_encodings': refined_box_encodings, + 'class_predictions_with_background': class_predictions_with_background, + 'num_proposals': num_proposals, + 'proposal_boxes': proposal_boxes, + 'image_shape': image_shape + }) + with self.test_session() as sess: + detections_out = sess.run(detections) + self.assertAllEqual(detections_out['detection_boxes'].shape, [2, 5, 4]) + self.assertAllClose(detections_out['detection_scores'], + [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]) + self.assertAllClose(detections_out['detection_classes'], + [[0, 0, 0, 1, 1], [0, 0, 1, 1, 0]]) + self.assertAllClose(detections_out['num_detections'], [5, 4]) + + def test_loss_first_stage_only_mode(self): + model = self._build_model( + is_training=True, first_stage_only=True, second_stage_batch_size=6) + batch_size = 2 + anchors = tf.constant( + [[0, 0, 16, 16], + [0, 16, 16, 32], + [16, 0, 32, 16], + [16, 16, 32, 32]], dtype=tf.float32) + + rpn_box_encodings = tf.zeros( + [batch_size, + anchors.get_shape().as_list()[0], + BOX_CODE_SIZE], dtype=tf.float32) + # use different numbers for the objectness category to break ties in + # order of boxes returned by NMS + rpn_objectness_predictions_with_background = tf.constant([ + [[-10, 13], + [10, -10], + [10, -11], + [-10, 12]], + [[10, -10], + [-10, 13], + [-10, 12], + [10, -11]]], dtype=tf.float32) + image_shape = tf.constant([batch_size, 32, 32, 3], dtype=tf.int32) + + groundtruth_boxes_list = [ + tf.constant([[0, 0, .5, .5], [.5, .5, 1, 1]], dtype=tf.float32), + tf.constant([[0, .5, .5, 1], [.5, 0, 1, .5]], dtype=tf.float32)] + groundtruth_classes_list = [tf.constant([[1, 0], [0, 1]], dtype=tf.float32), + tf.constant([[1, 0], [1, 0]], dtype=tf.float32)] + + prediction_dict = { + 'rpn_box_encodings': rpn_box_encodings, + 'rpn_objectness_predictions_with_background': + rpn_objectness_predictions_with_background, + 'image_shape': image_shape, + 'anchors': anchors + } + model.provide_groundtruth(groundtruth_boxes_list, + groundtruth_classes_list) + loss_dict = model.loss(prediction_dict) + with self.test_session() as sess: + loss_dict_out = sess.run(loss_dict) + self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0) + self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0) + self.assertTrue('second_stage_localization_loss' not in loss_dict_out) + self.assertTrue('second_stage_classification_loss' not in loss_dict_out) + + def test_loss_full(self): + model = self._build_model( + is_training=True, first_stage_only=False, second_stage_batch_size=6) + batch_size = 2 + anchors = tf.constant( + [[0, 0, 16, 16], + [0, 16, 16, 32], + [16, 0, 32, 16], + [16, 16, 32, 32]], dtype=tf.float32) + rpn_box_encodings = tf.zeros( + [batch_size, + anchors.get_shape().as_list()[0], + BOX_CODE_SIZE], dtype=tf.float32) + # use different numbers for the objectness category to break ties in + # order of boxes returned by NMS + rpn_objectness_predictions_with_background = tf.constant([ + [[-10, 13], + [10, -10], + [10, -11], + [-10, 12]], + [[10, -10], + [-10, 13], + [-10, 12], + [10, -11]]], dtype=tf.float32) + image_shape = tf.constant([batch_size, 32, 32, 3], dtype=tf.int32) + + num_proposals = tf.constant([6, 6], dtype=tf.int32) + proposal_boxes = tf.constant( + 2 * [[[0, 0, 16, 16], + [0, 16, 16, 32], + [16, 0, 32, 16], + [16, 16, 32, 32], + [0, 0, 16, 16], + [0, 16, 16, 32]]], dtype=tf.float32) + refined_box_encodings = tf.zeros( + (batch_size * 
model.max_num_proposals, + model.num_classes, + BOX_CODE_SIZE), dtype=tf.float32) + class_predictions_with_background = tf.constant( + [[-10, 10, -10], # first image + [10, -10, -10], + [10, -10, -10], + [-10, -10, 10], + [-10, 10, -10], + [10, -10, -10], + [10, -10, -10], # second image + [-10, 10, -10], + [-10, 10, -10], + [10, -10, -10], + [10, -10, -10], + [-10, 10, -10]], dtype=tf.float32) + + groundtruth_boxes_list = [ + tf.constant([[0, 0, .5, .5], [.5, .5, 1, 1]], dtype=tf.float32), + tf.constant([[0, .5, .5, 1], [.5, 0, 1, .5]], dtype=tf.float32)] + groundtruth_classes_list = [tf.constant([[1, 0], [0, 1]], dtype=tf.float32), + tf.constant([[1, 0], [1, 0]], dtype=tf.float32)] + + prediction_dict = { + 'rpn_box_encodings': rpn_box_encodings, + 'rpn_objectness_predictions_with_background': + rpn_objectness_predictions_with_background, + 'image_shape': image_shape, + 'anchors': anchors, + 'refined_box_encodings': refined_box_encodings, + 'class_predictions_with_background': class_predictions_with_background, + 'proposal_boxes': proposal_boxes, + 'num_proposals': num_proposals + } + model.provide_groundtruth(groundtruth_boxes_list, + groundtruth_classes_list) + loss_dict = model.loss(prediction_dict) + + with self.test_session() as sess: + loss_dict_out = sess.run(loss_dict) + self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0) + self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0) + self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0) + self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0) + + def test_loss_full_zero_padded_proposals(self): + model = self._build_model( + is_training=True, first_stage_only=False, second_stage_batch_size=6) + batch_size = 1 + anchors = tf.constant( + [[0, 0, 16, 16], + [0, 16, 16, 32], + [16, 0, 32, 16], + [16, 16, 32, 32]], dtype=tf.float32) + rpn_box_encodings = tf.zeros( + [batch_size, + anchors.get_shape().as_list()[0], + BOX_CODE_SIZE], dtype=tf.float32) + # use different numbers for the objectness category to break ties in + # order of boxes returned by NMS + rpn_objectness_predictions_with_background = tf.constant([ + [[-10, 13], + [10, -10], + [10, -11], + [10, -12]],], dtype=tf.float32) + image_shape = tf.constant([batch_size, 32, 32, 3], dtype=tf.int32) + + # box_classifier_batch_size is 6, but here we assume that the number of + # actual proposals (not counting zero paddings) is fewer (3). 
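+
+    # Editorial sketch (illustration only, not original test logic): with
+    # max_num_proposals = 6 and only 3 valid proposals, the model masks out
+    # the zero-padded slots, analogous to the boolean indicator below.
+    example_padding_indicator = np.arange(6) < 3
+    self.assertAllEqual(example_padding_indicator,
+                        [True, True, True, False, False, False])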
+ num_proposals = tf.constant([3], dtype=tf.int32) + proposal_boxes = tf.constant( + [[[0, 0, 16, 16], + [0, 16, 16, 32], + [16, 0, 32, 16], + [0, 0, 0, 0], # begin paddings + [0, 0, 0, 0], + [0, 0, 0, 0]]], dtype=tf.float32) + + refined_box_encodings = tf.zeros( + (batch_size * model.max_num_proposals, + model.num_classes, + BOX_CODE_SIZE), dtype=tf.float32) + class_predictions_with_background = tf.constant( + [[-10, 10, -10], + [10, -10, -10], + [10, -10, -10], + [0, 0, 0], # begin paddings + [0, 0, 0], + [0, 0, 0]], dtype=tf.float32) + + groundtruth_boxes_list = [ + tf.constant([[0, 0, .5, .5]], dtype=tf.float32)] + groundtruth_classes_list = [tf.constant([[1, 0]], dtype=tf.float32)] + + prediction_dict = { + 'rpn_box_encodings': rpn_box_encodings, + 'rpn_objectness_predictions_with_background': + rpn_objectness_predictions_with_background, + 'image_shape': image_shape, + 'anchors': anchors, + 'refined_box_encodings': refined_box_encodings, + 'class_predictions_with_background': class_predictions_with_background, + 'proposal_boxes': proposal_boxes, + 'num_proposals': num_proposals + } + model.provide_groundtruth(groundtruth_boxes_list, + groundtruth_classes_list) + loss_dict = model.loss(prediction_dict) + + with self.test_session() as sess: + loss_dict_out = sess.run(loss_dict) + self.assertAllClose(loss_dict_out['first_stage_localization_loss'], 0) + self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0) + self.assertAllClose(loss_dict_out['second_stage_localization_loss'], 0) + self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0) + + def test_loss_full_zero_padded_proposals_nonzero_loss_with_two_images(self): + model = self._build_model( + is_training=True, first_stage_only=False, second_stage_batch_size=6) + batch_size = 2 + anchors = tf.constant( + [[0, 0, 16, 16], + [0, 16, 16, 32], + [16, 0, 32, 16], + [16, 16, 32, 32]], dtype=tf.float32) + rpn_box_encodings = tf.zeros( + [batch_size, + anchors.get_shape().as_list()[0], + BOX_CODE_SIZE], dtype=tf.float32) + # use different numbers for the objectness category to break ties in + # order of boxes returned by NMS + rpn_objectness_predictions_with_background = tf.constant( + [[[-10, 13], + [10, -10], + [10, -11], + [10, -12]], + [[-10, 13], + [10, -10], + [10, -11], + [10, -12]]], dtype=tf.float32) + image_shape = tf.constant([batch_size, 32, 32, 3], dtype=tf.int32) + + # box_classifier_batch_size is 6, but here we assume that the number of + # actual proposals (not counting zero paddings) is fewer (3). 
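+
+    # Editorial sketch (illustration only): for num_proposals = [3, 2] and
+    # max_num_proposals = 6, the per-image validity mask produced by
+    # _padded_batched_proposals_indicator mirrors the broadcasted comparison
+    # below; padded rows contribute nothing to the loss.
+    example_batched_indicator = (
+        np.arange(6)[np.newaxis, :] < np.array([[3], [2]]))
+    self.assertAllEqual(example_batched_indicator,
+                        [[True, True, True, False, False, False],
+                         [True, True, False, False, False, False]])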
+ num_proposals = tf.constant([3, 2], dtype=tf.int32) + proposal_boxes = tf.constant( + [[[0, 0, 16, 16], + [0, 16, 16, 32], + [16, 0, 32, 16], + [0, 0, 0, 0], # begin paddings + [0, 0, 0, 0], + [0, 0, 0, 0]], + [[0, 0, 16, 16], + [0, 16, 16, 32], + [0, 0, 0, 0], + [0, 0, 0, 0], # begin paddings + [0, 0, 0, 0], + [0, 0, 0, 0]]], dtype=tf.float32) + + refined_box_encodings = tf.zeros( + (batch_size * model.max_num_proposals, + model.num_classes, + BOX_CODE_SIZE), dtype=tf.float32) + class_predictions_with_background = tf.constant( + [[-10, 10, -10], # first image + [10, -10, -10], + [10, -10, -10], + [0, 0, 0], # begin paddings + [0, 0, 0], + [0, 0, 0], + [-10, -10, 10], # second image + [10, -10, -10], + [0, 0, 0], # begin paddings + [0, 0, 0], + [0, 0, 0], + [0, 0, 0],], dtype=tf.float32) + + # The first groundtruth box is 4/5 of the anchor size in both directions + # experiencing a loss of: + # 2 * SmoothL1(5 * log(4/5)) / num_proposals + # = 2 * (abs(5 * log(1/2)) - .5) / 3 + # The second groundtruth box is identical to the prediction and thus + # experiences zero loss. + # Total average loss is (abs(5 * log(1/2)) - .5) / 3. + groundtruth_boxes_list = [ + tf.constant([[0.05, 0.05, 0.45, 0.45]], dtype=tf.float32), + tf.constant([[0.0, 0.0, 0.5, 0.5]], dtype=tf.float32)] + groundtruth_classes_list = [tf.constant([[1, 0]], dtype=tf.float32), + tf.constant([[0, 1]], dtype=tf.float32)] + exp_loc_loss = (-5 * np.log(.8) - 0.5) / 3.0 + + prediction_dict = { + 'rpn_box_encodings': rpn_box_encodings, + 'rpn_objectness_predictions_with_background': + rpn_objectness_predictions_with_background, + 'image_shape': image_shape, + 'anchors': anchors, + 'refined_box_encodings': refined_box_encodings, + 'class_predictions_with_background': class_predictions_with_background, + 'proposal_boxes': proposal_boxes, + 'num_proposals': num_proposals + } + model.provide_groundtruth(groundtruth_boxes_list, + groundtruth_classes_list) + loss_dict = model.loss(prediction_dict) + + with self.test_session() as sess: + loss_dict_out = sess.run(loss_dict) + self.assertAllClose(loss_dict_out['first_stage_localization_loss'], + exp_loc_loss) + self.assertAllClose(loss_dict_out['first_stage_objectness_loss'], 0) + self.assertAllClose(loss_dict_out['second_stage_localization_loss'], + exp_loc_loss) + self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0) + + def test_loss_with_hard_mining(self): + model = self._build_model(is_training=True, + first_stage_only=False, + second_stage_batch_size=None, + first_stage_max_proposals=6, + hard_mining=True) + batch_size = 1 + anchors = tf.constant( + [[0, 0, 16, 16], + [0, 16, 16, 32], + [16, 0, 32, 16], + [16, 16, 32, 32]], dtype=tf.float32) + rpn_box_encodings = tf.zeros( + [batch_size, + anchors.get_shape().as_list()[0], + BOX_CODE_SIZE], dtype=tf.float32) + # use different numbers for the objectness category to break ties in + # order of boxes returned by NMS + rpn_objectness_predictions_with_background = tf.constant( + [[[-10, 13], + [-10, 12], + [10, -11], + [10, -12]]], dtype=tf.float32) + image_shape = tf.constant([batch_size, 32, 32, 3], dtype=tf.int32) + + # box_classifier_batch_size is 6, but here we assume that the number of + # actual proposals (not counting zero paddings) is fewer (3). 
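+
+    # Editorial sketch (illustration only; assumes the box coder's height and
+    # width scale factor of 5): a groundtruth box that is 4/5 the anchor size
+    # yields encoded size targets of 5 * log(4/5), and since |5 * log(4/5)| > 1
+    # the smooth L1 penalty per coordinate is |x| - 0.5, which is where the
+    # -5 * np.log(.8) - 0.5 terms in exp_loc_loss below come from.
+    self.assertAllClose(np.abs(5 * np.log(4.0 / 5.0)) - 0.5,
+                        -5 * np.log(.8) - 0.5)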
+ num_proposals = tf.constant([3], dtype=tf.int32) + proposal_boxes = tf.constant( + [[[0, 0, 16, 16], + [0, 16, 16, 32], + [16, 0, 32, 16], + [0, 0, 0, 0], # begin paddings + [0, 0, 0, 0], + [0, 0, 0, 0]]], dtype=tf.float32) + + refined_box_encodings = tf.zeros( + (batch_size * model.max_num_proposals, + model.num_classes, + BOX_CODE_SIZE), dtype=tf.float32) + class_predictions_with_background = tf.constant( + [[-10, 10, -10], # first image + [-10, -10, 10], + [10, -10, -10], + [0, 0, 0], # begin paddings + [0, 0, 0], + [0, 0, 0]], dtype=tf.float32) + + # The first groundtruth box is 4/5 of the anchor size in both directions + # experiencing a loss of: + # 2 * SmoothL1(5 * log(4/5)) / num_proposals + # = 2 * (abs(5 * log(1/2)) - .5) / 3 + # The second groundtruth box is 46/50 of the anchor size in both directions + # experiencing a loss of: + # 2 * SmoothL1(5 * log(42/50)) / num_proposals + # = 2 * (.5(5 * log(.92))^2 - .5) / 3. + # Since the first groundtruth box experiences greater loss, and we have + # set num_hard_examples=1 in the HardMiner, the final localization loss + # corresponds to that of the first groundtruth box. + groundtruth_boxes_list = [ + tf.constant([[0.05, 0.05, 0.45, 0.45], + [0.02, 0.52, 0.48, 0.98],], dtype=tf.float32)] + groundtruth_classes_list = [tf.constant([[1, 0], [0, 1]], dtype=tf.float32)] + exp_loc_loss = 2 * (-5 * np.log(.8) - 0.5) / 3.0 + + prediction_dict = { + 'rpn_box_encodings': rpn_box_encodings, + 'rpn_objectness_predictions_with_background': + rpn_objectness_predictions_with_background, + 'image_shape': image_shape, + 'anchors': anchors, + 'refined_box_encodings': refined_box_encodings, + 'class_predictions_with_background': class_predictions_with_background, + 'proposal_boxes': proposal_boxes, + 'num_proposals': num_proposals + } + model.provide_groundtruth(groundtruth_boxes_list, + groundtruth_classes_list) + loss_dict = model.loss(prediction_dict) + + with self.test_session() as sess: + loss_dict_out = sess.run(loss_dict) + self.assertAllClose(loss_dict_out['second_stage_localization_loss'], + exp_loc_loss) + self.assertAllClose(loss_dict_out['second_stage_classification_loss'], 0) + + def test_restore_fn_classification(self): + # Define mock tensorflow classification graph and save variables. + test_graph_classification = tf.Graph() + with test_graph_classification.as_default(): + image = tf.placeholder(dtype=tf.float32, shape=[1, 20, 20, 3]) + with tf.variable_scope('mock_model'): + net = slim.conv2d(image, num_outputs=3, kernel_size=1, scope='layer1') + slim.conv2d(net, num_outputs=3, kernel_size=1, scope='layer2') + + init_op = tf.global_variables_initializer() + saver = tf.train.Saver() + save_path = self.get_temp_dir() + with self.test_session() as sess: + sess.run(init_op) + saved_model_path = saver.save(sess, save_path) + + # Create tensorflow detection graph and load variables from + # classification checkpoint. 
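+
+    # Editorial note (hypothetical usage sketch, not part of the original
+    # test): outside of tests, the callable returned by restore_fn is
+    # typically invoked once on a freshly created session, e.g.
+    #   init_fn = model.restore_fn(checkpoint_path,
+    #                              from_detection_checkpoint=False)
+    #   with tf.Session() as session:
+    #     init_fn(session)
+    # which is the same pattern this test exercises below.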
+ test_graph_detection = tf.Graph() + with test_graph_detection.as_default(): + model = self._build_model( + is_training=False, first_stage_only=False, second_stage_batch_size=6) + + inputs_shape = (2, 20, 20, 3) + inputs = tf.to_float(tf.random_uniform( + inputs_shape, minval=0, maxval=255, dtype=tf.int32)) + preprocessed_inputs = model.preprocess(inputs) + prediction_dict = model.predict(preprocessed_inputs) + model.postprocess(prediction_dict) + restore_fn = model.restore_fn(saved_model_path, + from_detection_checkpoint=False) + with self.test_session() as sess: + restore_fn(sess) + + def test_restore_fn_detection(self): + # Define first detection graph and save variables. + test_graph_detection1 = tf.Graph() + with test_graph_detection1.as_default(): + model = self._build_model( + is_training=False, first_stage_only=False, second_stage_batch_size=6) + inputs_shape = (2, 20, 20, 3) + inputs = tf.to_float(tf.random_uniform( + inputs_shape, minval=0, maxval=255, dtype=tf.int32)) + preprocessed_inputs = model.preprocess(inputs) + prediction_dict = model.predict(preprocessed_inputs) + model.postprocess(prediction_dict) + init_op = tf.global_variables_initializer() + saver = tf.train.Saver() + save_path = self.get_temp_dir() + with self.test_session() as sess: + sess.run(init_op) + saved_model_path = saver.save(sess, save_path) + + # Define second detection graph and restore variables. + test_graph_detection2 = tf.Graph() + with test_graph_detection2.as_default(): + model2 = self._build_model(is_training=False, first_stage_only=False, + second_stage_batch_size=6, num_classes=42) + + inputs_shape2 = (2, 20, 20, 3) + inputs2 = tf.to_float(tf.random_uniform( + inputs_shape2, minval=0, maxval=255, dtype=tf.int32)) + preprocessed_inputs2 = model2.preprocess(inputs2) + prediction_dict2 = model2.predict(preprocessed_inputs2) + model2.postprocess(prediction_dict2) + restore_fn = model2.restore_fn(saved_model_path, + from_detection_checkpoint=True) + with self.test_session() as sess: + restore_fn(sess) + for var in sess.run(tf.report_uninitialized_variables()): + self.assertNotIn(model2.first_stage_feature_extractor_scope, var.name) + self.assertNotIn(model2.second_stage_feature_extractor_scope, + var.name) + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/meta_architectures/rfcn_meta_arch.py b/object_detection/meta_architectures/rfcn_meta_arch.py new file mode 100644 index 0000000000000000000000000000000000000000..7f712ba4d7cf64860b4a875570a320d66eb1d42b --- /dev/null +++ b/object_detection/meta_architectures/rfcn_meta_arch.py @@ -0,0 +1,267 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""R-FCN meta-architecture definition. + +R-FCN: Dai, Jifeng, et al. "R-FCN: Object Detection via Region-based +Fully Convolutional Networks." arXiv preprint arXiv:1605.06409 (2016). 
+ +The R-FCN meta architecture is similar to Faster R-CNN and only differs in the +second stage. Hence this class inherits FasterRCNNMetaArch and overrides only +the `_predict_second_stage` method. + +Similar to Faster R-CNN we allow for two modes: first_stage_only=True and +first_stage_only=False. In the former setting, all of the user facing methods +(e.g., predict, postprocess, loss) can be used as if the model consisted +only of the RPN, returning class agnostic proposals (these can be thought of as +approximate detections with no associated class information). In the latter +setting, proposals are computed, then passed through a second stage +"box classifier" to yield (multi-class) detections. + +Implementations of R-FCN models must define a new FasterRCNNFeatureExtractor and +override three methods: `preprocess`, `_extract_proposal_features` (the first +stage of the model), and `_extract_box_classifier_features` (the second stage of +the model). Optionally, the `restore_fn` method can be overridden. See tests +for an example. + +See notes in the documentation of Faster R-CNN meta-architecture as they all +apply here. +""" +import tensorflow as tf + +from object_detection.core import box_predictor +from object_detection.meta_architectures import faster_rcnn_meta_arch +from object_detection.utils import ops + + +class RFCNMetaArch(faster_rcnn_meta_arch.FasterRCNNMetaArch): + """R-FCN Meta-architecture definition.""" + + def __init__(self, + is_training, + num_classes, + image_resizer_fn, + feature_extractor, + first_stage_only, + first_stage_anchor_generator, + first_stage_atrous_rate, + first_stage_box_predictor_arg_scope, + first_stage_box_predictor_kernel_size, + first_stage_box_predictor_depth, + first_stage_minibatch_size, + first_stage_positive_balance_fraction, + first_stage_nms_score_threshold, + first_stage_nms_iou_threshold, + first_stage_max_proposals, + first_stage_localization_loss_weight, + first_stage_objectness_loss_weight, + second_stage_rfcn_box_predictor, + second_stage_batch_size, + second_stage_balance_fraction, + second_stage_non_max_suppression_fn, + second_stage_score_conversion_fn, + second_stage_localization_loss_weight, + second_stage_classification_loss_weight, + hard_example_miner, + parallel_iterations=16): + """RFCNMetaArch Constructor. + + Args: + is_training: A boolean indicating whether the training version of the + computation graph should be constructed. + num_classes: Number of classes. Note that num_classes *does not* + include the background category, so if groundtruth labels take values + in {0, 1, .., K-1}, num_classes=K (and not K+1, even though the + assigned classification targets can range from {0,... K}). + image_resizer_fn: A callable for image resizing. This callable always + takes a rank-3 image tensor (corresponding to a single image) and + returns a rank-3 image tensor, possibly with new spatial dimensions. + See builders/image_resizer_builder.py. + feature_extractor: A FasterRCNNFeatureExtractor object. + first_stage_only: Whether to construct only the Region Proposal Network + (RPN) part of the model. + first_stage_anchor_generator: An anchor_generator.AnchorGenerator object + (note that currently we only support + grid_anchor_generator.GridAnchorGenerator objects) + first_stage_atrous_rate: A single integer indicating the atrous rate for + the single convolution op which is applied to the `rpn_features_to_crop` + tensor to obtain a tensor to be used for box prediction. 
Some feature + extractors optionally allow for producing feature maps computed at + denser resolutions. The atrous rate is used to compensate for the + denser feature maps by using an effectively larger receptive field. + (This should typically be set to 1). + first_stage_box_predictor_arg_scope: Slim arg_scope for conv2d, + separable_conv2d and fully_connected ops for the RPN box predictor. + first_stage_box_predictor_kernel_size: Kernel size to use for the + convolution op just prior to RPN box predictions. + first_stage_box_predictor_depth: Output depth for the convolution op + just prior to RPN box predictions. + first_stage_minibatch_size: The "batch size" to use for computing the + objectness and location loss of the region proposal network. This + "batch size" refers to the number of anchors selected as contributing + to the loss function for any given image within the image batch and is + only called "batch_size" due to terminology from the Faster R-CNN paper. + first_stage_positive_balance_fraction: Fraction of positive examples + per image for the RPN. The recommended value for Faster RCNN is 0.5. + first_stage_nms_score_threshold: Score threshold for non max suppression + for the Region Proposal Network (RPN). This value is expected to be in + [0, 1] as it is applied directly after a softmax transformation. The + recommended value for Faster R-CNN is 0. + first_stage_nms_iou_threshold: The Intersection Over Union (IOU) threshold + for performing Non-Max Suppression (NMS) on the boxes predicted by the + Region Proposal Network (RPN). + first_stage_max_proposals: Maximum number of boxes to retain after + performing Non-Max Suppression (NMS) on the boxes predicted by the + Region Proposal Network (RPN). + first_stage_localization_loss_weight: A float + first_stage_objectness_loss_weight: A float + second_stage_rfcn_box_predictor: RFCN box predictor to use for + second stage. + second_stage_batch_size: The batch size used for computing the + classification and refined location loss of the box classifier. This + "batch size" refers to the number of proposals selected as contributing + to the loss function for any given image within the image batch and is + only called "batch_size" due to terminology from the Faster R-CNN paper. + second_stage_balance_fraction: Fraction of positive examples to use + per image for the box classifier. The recommended value for Faster RCNN + is 0.25. + second_stage_non_max_suppression_fn: batch_multiclass_non_max_suppression + callable that takes `boxes`, `scores`, optional `clip_window` and + optional (kwarg) `mask` inputs (with all other inputs already set) + and returns a dictionary containing tensors with keys: + `detection_boxes`, `detection_scores`, `detection_classes`, + `num_detections`, and (optionally) `detection_masks`. See + `post_processing.batch_multiclass_non_max_suppression` for the type and + shape of these tensors. + second_stage_score_conversion_fn: Callable elementwise nonlinearity + (that takes tensors as inputs and returns tensors). This is usually + used to convert logits to probabilities. + second_stage_localization_loss_weight: A float + second_stage_classification_loss_weight: A float + hard_example_miner: A losses.HardExampleMiner object (can be None). + parallel_iterations: (Optional) The number of iterations allowed to run + in parallel for calls to tf.map_fn. 
+ Raises: + ValueError: If `second_stage_batch_size` > `first_stage_max_proposals` + ValueError: If first_stage_anchor_generator is not of type + grid_anchor_generator.GridAnchorGenerator. + """ + super(RFCNMetaArch, self).__init__( + is_training, + num_classes, + image_resizer_fn, + feature_extractor, + first_stage_only, + first_stage_anchor_generator, + first_stage_atrous_rate, + first_stage_box_predictor_arg_scope, + first_stage_box_predictor_kernel_size, + first_stage_box_predictor_depth, + first_stage_minibatch_size, + first_stage_positive_balance_fraction, + first_stage_nms_score_threshold, + first_stage_nms_iou_threshold, + first_stage_max_proposals, + first_stage_localization_loss_weight, + first_stage_objectness_loss_weight, + None, # initial_crop_size is not used in R-FCN + None, # maxpool_kernel_size is not use in R-FCN + None, # maxpool_stride is not use in R-FCN + None, # fully_connected_box_predictor is not used in R-FCN. + second_stage_batch_size, + second_stage_balance_fraction, + second_stage_non_max_suppression_fn, + second_stage_score_conversion_fn, + second_stage_localization_loss_weight, + second_stage_classification_loss_weight, + hard_example_miner, + parallel_iterations) + + self._rfcn_box_predictor = second_stage_rfcn_box_predictor + + def _predict_second_stage(self, rpn_box_encodings, + rpn_objectness_predictions_with_background, + rpn_features, + anchors, + image_shape): + """Predicts the output tensors from 2nd stage of FasterRCNN. + + Args: + rpn_box_encodings: 4-D float tensor of shape + [batch_size, num_valid_anchors, self._box_coder.code_size] containing + predicted boxes. + rpn_objectness_predictions_with_background: 2-D float tensor of shape + [batch_size, num_valid_anchors, 2] containing class + predictions (logits) for each of the anchors. Note that this + tensor *includes* background class predictions (at class index 0). + rpn_features: A 4-D float32 tensor with shape + [batch_size, height, width, depth] representing image features from the + RPN. + anchors: 2-D float tensor of shape + [num_anchors, self._box_coder.code_size]. + image_shape: A 1D int32 tensors of size [4] containing the image shape. + + Returns: + prediction_dict: a dictionary holding "raw" prediction tensors: + 1) refined_box_encodings: a 3-D tensor with shape + [total_num_proposals, num_classes, 4] representing predicted + (final) refined box encodings, where + total_num_proposals=batch_size*self._max_num_proposals + 2) class_predictions_with_background: a 3-D tensor with shape + [total_num_proposals, num_classes + 1] containing class + predictions (logits) for each of the anchors, where + total_num_proposals=batch_size*self._max_num_proposals. + Note that this tensor *includes* background class predictions + (at class index 0). + 3) num_proposals: An int32 tensor of shape [batch_size] representing the + number of proposals generated by the RPN. `num_proposals` allows us + to keep track of which entries are to be treated as zero paddings and + which are not since we always pad the number of proposals to be + `self.max_num_proposals` for each image. + 4) proposal_boxes: A float32 tensor of shape + [batch_size, self.max_num_proposals, 4] representing + decoded proposal bounding boxes (in absolute coordinates). 
+ """ + proposal_boxes_normalized, _, num_proposals = self._postprocess_rpn( + rpn_box_encodings, rpn_objectness_predictions_with_background, + anchors, image_shape) + + box_classifier_features = ( + self._feature_extractor.extract_box_classifier_features( + rpn_features, + scope=self.second_stage_feature_extractor_scope)) + + box_predictions = self._rfcn_box_predictor.predict( + box_classifier_features, + num_predictions_per_location=1, + scope=self.second_stage_box_predictor_scope, + proposal_boxes=proposal_boxes_normalized) + refined_box_encodings = tf.squeeze( + box_predictions[box_predictor.BOX_ENCODINGS], axis=1) + class_predictions_with_background = tf.squeeze( + box_predictions[box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND], + axis=1) + + absolute_proposal_boxes = ops.normalized_to_image_coordinates( + proposal_boxes_normalized, image_shape, + parallel_iterations=self._parallel_iterations) + + prediction_dict = { + 'refined_box_encodings': refined_box_encodings, + 'class_predictions_with_background': + class_predictions_with_background, + 'num_proposals': num_proposals, + 'proposal_boxes': absolute_proposal_boxes, + } + return prediction_dict diff --git a/object_detection/meta_architectures/rfcn_meta_arch_test.py b/object_detection/meta_architectures/rfcn_meta_arch_test.py new file mode 100644 index 0000000000000000000000000000000000000000..5a7ad8baa3f0324cb703efe36e5c141c1f749a16 --- /dev/null +++ b/object_detection/meta_architectures/rfcn_meta_arch_test.py @@ -0,0 +1,56 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.meta_architectures.rfcn_meta_arch.""" + +import tensorflow as tf + +from object_detection.meta_architectures import faster_rcnn_meta_arch_test_lib +from object_detection.meta_architectures import rfcn_meta_arch + + +class RFCNMetaArchTest( + faster_rcnn_meta_arch_test_lib.FasterRCNNMetaArchTestBase): + + def _get_second_stage_box_predictor_text_proto(self): + box_predictor_text_proto = """ + rfcn_box_predictor { + conv_hyperparams { + op: CONV + activation: NONE + regularizer { + l2_regularizer { + weight: 0.0005 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + } + """ + return box_predictor_text_proto + + def _get_model(self, box_predictor, **common_kwargs): + return rfcn_meta_arch.RFCNMetaArch( + second_stage_rfcn_box_predictor=box_predictor, **common_kwargs) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/meta_architectures/ssd_meta_arch.py b/object_detection/meta_architectures/ssd_meta_arch.py new file mode 100644 index 0000000000000000000000000000000000000000..c23bd3a24d101a2a63829d921c6ed603d1fd1f49 --- /dev/null +++ b/object_detection/meta_architectures/ssd_meta_arch.py @@ -0,0 +1,594 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""SSD Meta-architecture definition. + +General tensorflow implementation of convolutional Multibox/SSD detection +models. +""" +from abc import abstractmethod + +import re +import tensorflow as tf + +from object_detection.core import box_coder as bcoder +from object_detection.core import box_list +from object_detection.core import box_predictor as bpredictor +from object_detection.core import model +from object_detection.core import standard_fields as fields +from object_detection.core import target_assigner +from object_detection.utils import variables_helper + +slim = tf.contrib.slim + + +class SSDFeatureExtractor(object): + """SSD Feature Extractor definition.""" + + def __init__(self, + depth_multiplier, + min_depth, + conv_hyperparams, + reuse_weights=None): + self._depth_multiplier = depth_multiplier + self._min_depth = min_depth + self._conv_hyperparams = conv_hyperparams + self._reuse_weights = reuse_weights + + @abstractmethod + def preprocess(self, resized_inputs): + """Preprocesses images for feature extraction (minus image resizing). + + Args: + resized_inputs: a [batch, height, width, channels] float tensor + representing a batch of images. + + Returns: + preprocessed_inputs: a [batch, height, width, channels] float tensor + representing a batch of images. + """ + pass + + @abstractmethod + def extract_features(self, preprocessed_inputs): + """Extracts features from preprocessed inputs. + + This function is responsible for extracting feature maps from preprocessed + images. + + Args: + preprocessed_inputs: a [batch, height, width, channels] float tensor + representing a batch of images. + + Returns: + feature_maps: a list of tensors where the ith tensor has shape + [batch, height_i, width_i, depth_i] + """ + pass + + +class SSDMetaArch(model.DetectionModel): + """SSD Meta-architecture definition.""" + + def __init__(self, + is_training, + anchor_generator, + box_predictor, + box_coder, + feature_extractor, + matcher, + region_similarity_calculator, + image_resizer_fn, + non_max_suppression_fn, + score_conversion_fn, + classification_loss, + localization_loss, + classification_loss_weight, + localization_loss_weight, + normalize_loss_by_num_matches, + hard_example_miner, + add_summaries=True): + """SSDMetaArch Constructor. + + TODO: group NMS parameters + score converter into + a class and loss parameters into a class and write config protos for + postprocessing and losses. + + Args: + is_training: A boolean indicating whether the training version of the + computation graph should be constructed. + anchor_generator: an anchor_generator.AnchorGenerator object. + box_predictor: a box_predictor.BoxPredictor object. + box_coder: a box_coder.BoxCoder object. + feature_extractor: a SSDFeatureExtractor object. + matcher: a matcher.Matcher object. + region_similarity_calculator: a + region_similarity_calculator.RegionSimilarityCalculator object. 
+ image_resizer_fn: a callable for image resizing. This callable always + takes a rank-3 image tensor (corresponding to a single image) and + returns a rank-3 image tensor, possibly with new spatial dimensions. + See builders/image_resizer_builder.py. + non_max_suppression_fn: batch_multiclass_non_max_suppression + callable that takes `boxes`, `scores` and optional `clip_window` + inputs (with all other inputs already set) and returns a dictionary + hold tensors with keys: `detection_boxes`, `detection_scores`, + `detection_classes` and `num_detections`. See `post_processing. + batch_multiclass_non_max_suppression` for the type and shape of these + tensors. + score_conversion_fn: callable elementwise nonlinearity (that takes tensors + as inputs and returns tensors). This is usually used to convert logits + to probabilities. + classification_loss: an object_detection.core.losses.Loss object. + localization_loss: a object_detection.core.losses.Loss object. + classification_loss_weight: float + localization_loss_weight: float + normalize_loss_by_num_matches: boolean + hard_example_miner: a losses.HardExampleMiner object (can be None) + add_summaries: boolean (default: True) controlling whether summary ops + should be added to tensorflow graph. + """ + super(SSDMetaArch, self).__init__(num_classes=box_predictor.num_classes) + self._is_training = is_training + + # Needed for fine-tuning from classification checkpoints whose + # variables do not have the feature extractor scope. + self._extract_features_scope = 'FeatureExtractor' + + self._anchor_generator = anchor_generator + self._box_predictor = box_predictor + + self._box_coder = box_coder + self._feature_extractor = feature_extractor + self._matcher = matcher + self._region_similarity_calculator = region_similarity_calculator + + # TODO: handle agnostic mode and positive/negative class weights + unmatched_cls_target = None + unmatched_cls_target = tf.constant([1] + self.num_classes * [0], tf.float32) + self._target_assigner = target_assigner.TargetAssigner( + self._region_similarity_calculator, + self._matcher, + self._box_coder, + positive_class_weight=1.0, + negative_class_weight=1.0, + unmatched_cls_target=unmatched_cls_target) + + self._classification_loss = classification_loss + self._localization_loss = localization_loss + self._classification_loss_weight = classification_loss_weight + self._localization_loss_weight = localization_loss_weight + self._normalize_loss_by_num_matches = normalize_loss_by_num_matches + self._hard_example_miner = hard_example_miner + + self._image_resizer_fn = image_resizer_fn + self._non_max_suppression_fn = non_max_suppression_fn + self._score_conversion_fn = score_conversion_fn + + self._anchors = None + self._add_summaries = add_summaries + + @property + def anchors(self): + if not self._anchors: + raise RuntimeError('anchors have not been constructed yet!') + if not isinstance(self._anchors, box_list.BoxList): + raise RuntimeError('anchors should be a BoxList object, but is not.') + return self._anchors + + def preprocess(self, inputs): + """Feature-extractor specific preprocessing. + + See base class. + + Args: + inputs: a [batch, height_in, width_in, channels] float tensor representing + a batch of images with values between 0 and 255.0. + + Returns: + preprocessed_inputs: a [batch, height_out, width_out, channels] float + tensor representing a batch of images. 
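The `unmatched_cls_target` built in the SSD constructor above is simply a one-hot row selecting the background class (index 0), so anchors that match no groundtruth box are trained toward background. A quick illustration, with `num_classes` chosen arbitrarily:

```python
import tensorflow as tf

num_classes = 3  # arbitrary value, for illustration only
unmatched_cls_target = tf.constant([1] + num_classes * [0], tf.float32)
with tf.Session() as sess:
  print(sess.run(unmatched_cls_target))  # [1. 0. 0. 0.] -> background one-hot
```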
+ Raises: + ValueError: if inputs tensor does not have type tf.float32 + """ + if inputs.dtype is not tf.float32: + raise ValueError('`preprocess` expects a tf.float32 tensor') + with tf.name_scope('Preprocessor'): + # TODO: revisit whether to always use batch size as the number of + # parallel iterations vs allow for dynamic batching. + resized_inputs = tf.map_fn(self._image_resizer_fn, + elems=inputs, + dtype=tf.float32) + return self._feature_extractor.preprocess(resized_inputs) + + def predict(self, preprocessed_inputs): + """Predicts unpostprocessed tensors from input tensor. + + This function takes an input batch of images and runs it through the forward + pass of the network to yield unpostprocessesed predictions. + + A side effect of calling the predict method is that self._anchors is + populated with a box_list.BoxList of anchors. These anchors must be + constructed before the postprocess or loss functions can be called. + + Args: + preprocessed_inputs: a [batch, height, width, channels] image tensor. + + Returns: + prediction_dict: a dictionary holding "raw" prediction tensors: + 1) box_encodings: 4-D float tensor of shape [batch_size, num_anchors, + box_code_dimension] containing predicted boxes. + 2) class_predictions_with_background: 3-D float tensor of shape + [batch_size, num_anchors, num_classes+1] containing class predictions + (logits) for each of the anchors. Note that this tensor *includes* + background class predictions (at class index 0). + 3) feature_maps: a list of tensors where the ith tensor has shape + [batch, height_i, width_i, depth_i]. + """ + with tf.variable_scope(None, self._extract_features_scope, + [preprocessed_inputs]): + feature_maps = self._feature_extractor.extract_features( + preprocessed_inputs) + feature_map_spatial_dims = self._get_feature_map_spatial_dims(feature_maps) + self._anchors = self._anchor_generator.generate(feature_map_spatial_dims) + (box_encodings, class_predictions_with_background + ) = self._add_box_predictions_to_feature_maps(feature_maps) + predictions_dict = { + 'box_encodings': box_encodings, + 'class_predictions_with_background': class_predictions_with_background, + 'feature_maps': feature_maps + } + return predictions_dict + + def _add_box_predictions_to_feature_maps(self, feature_maps): + """Adds box predictors to each feature map and returns concatenated results. + + Args: + feature_maps: a list of tensors where the ith tensor has shape + [batch, height_i, width_i, depth_i] + + Returns: + box_encodings: 4-D float tensor of shape [batch_size, num_anchors, + box_code_dimension] containing predicted boxes. + class_predictions_with_background: 2-D float tensor of shape + [batch_size, num_anchors, num_classes+1] containing class predictions + (logits) for each of the anchors. Note that this tensor *includes* + background class predictions (at class index 0). + + Raises: + RuntimeError: if the number of feature maps extracted via the + extract_features method does not match the length of the + num_anchors_per_locations list that was passed to the constructor. + RuntimeError: if box_encodings from the box_predictor does not have + shape of the form [batch_size, num_anchors, 1, code_size]. 
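Putting `preprocess`, `predict`, `postprocess` and `loss` together, a typical call sequence looks roughly like the sketch below. Here `ssd_model` stands in for an already-constructed `SSDMetaArch` instance (e.g. assembled by the builders or, as in the tests, from mock components); it is a placeholder assumption, not something this module creates for you:

```python
import tensorflow as tf

# `ssd_model` is assumed to be a constructed SSDMetaArch instance (placeholder).
images = tf.random_uniform([2, 300, 300, 3], maxval=255.0, dtype=tf.float32)

preprocessed = ssd_model.preprocess(images)        # resize + extractor-specific preprocessing
prediction_dict = ssd_model.predict(preprocessed)  # side effect: populates ssd_model.anchors

# Inference path:
detections = ssd_model.postprocess(prediction_dict)

# Training path: groundtruth must be provided before loss() is called.
groundtruth_boxes = [tf.constant([[0.1, 0.1, 0.6, 0.6]], tf.float32),
                     tf.constant([[0.2, 0.2, 0.9, 0.9]], tf.float32)]
groundtruth_classes = [tf.constant([[1.0]]), tf.constant([[1.0]])]
ssd_model.provide_groundtruth(groundtruth_boxes, groundtruth_classes)
loss_dict = ssd_model.loss(prediction_dict)
```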
+ """ + num_anchors_per_location_list = ( + self._anchor_generator.num_anchors_per_location()) + if len(feature_maps) != len(num_anchors_per_location_list): + raise RuntimeError('the number of feature maps must match the ' + 'length of self.anchors.NumAnchorsPerLocation().') + box_encodings_list = [] + cls_predictions_with_background_list = [] + for idx, (feature_map, num_anchors_per_location + ) in enumerate(zip(feature_maps, num_anchors_per_location_list)): + box_predictor_scope = 'BoxPredictor_{}'.format(idx) + box_predictions = self._box_predictor.predict(feature_map, + num_anchors_per_location, + box_predictor_scope) + box_encodings = box_predictions[bpredictor.BOX_ENCODINGS] + cls_predictions_with_background = box_predictions[ + bpredictor.CLASS_PREDICTIONS_WITH_BACKGROUND] + + box_encodings_shape = box_encodings.get_shape().as_list() + if len(box_encodings_shape) != 4 or box_encodings_shape[2] != 1: + raise RuntimeError('box_encodings from the box_predictor must be of ' + 'shape `[batch_size, num_anchors, 1, code_size]`; ' + 'actual shape', box_encodings_shape) + box_encodings = tf.squeeze(box_encodings, axis=2) + box_encodings_list.append(box_encodings) + cls_predictions_with_background_list.append( + cls_predictions_with_background) + + num_predictions = sum( + [tf.shape(box_encodings)[1] for box_encodings in box_encodings_list]) + num_anchors = self.anchors.num_boxes() + anchors_assert = tf.assert_equal(num_anchors, num_predictions, [ + 'Mismatch: number of anchors vs number of predictions', num_anchors, + num_predictions + ]) + with tf.control_dependencies([anchors_assert]): + box_encodings = tf.concat(box_encodings_list, 1) + class_predictions_with_background = tf.concat( + cls_predictions_with_background_list, 1) + return box_encodings, class_predictions_with_background + + def _get_feature_map_spatial_dims(self, feature_maps): + """Return list of spatial dimensions for each feature map in a list. + + Args: + feature_maps: a list of tensors where the ith tensor has shape + [batch, height_i, width_i, depth_i]. + + Returns: + a list of pairs (height, width) for each feature map in feature_maps + """ + feature_map_shapes = [ + feature_map.get_shape().as_list() for feature_map in feature_maps + ] + return [(shape[1], shape[2]) for shape in feature_map_shapes] + + def postprocess(self, prediction_dict): + """Converts prediction tensors to final detections. + + This function converts raw predictions tensors to final detection results by + slicing off the background class, decoding box predictions and applying + non max suppression and clipping to the image window. + + See base class for output format conventions. Note also that by default, + scores are to be interpreted as logits, but if a score_conversion_fn is + used, then scores are remapped (and may thus have a different + interpretation). + + Args: + prediction_dict: a dictionary holding prediction tensors with + 1) box_encodings: 4-D float tensor of shape [batch_size, num_anchors, + box_code_dimension] containing predicted boxes. + 2) class_predictions_with_background: 2-D float tensor of shape + [batch_size, num_anchors, num_classes+1] containing class predictions + (logits) for each of the anchors. Note that this tensor *includes* + background class predictions. 
+ + Returns: + detections: a dictionary containing the following fields + detection_boxes: [batch, max_detection, 4] + detection_scores: [batch, max_detections] + detection_classes: [batch, max_detections] + num_detections: [batch] + Raises: + ValueError: if prediction_dict does not contain `box_encodings` or + `class_predictions_with_background` fields. + """ + if ('box_encodings' not in prediction_dict or + 'class_predictions_with_background' not in prediction_dict): + raise ValueError('prediction_dict does not contain expected entries.') + with tf.name_scope('Postprocessor'): + box_encodings = prediction_dict['box_encodings'] + class_predictions = prediction_dict['class_predictions_with_background'] + detection_boxes = bcoder.batch_decode(box_encodings, self._box_coder, + self.anchors) + detection_boxes = tf.expand_dims(detection_boxes, axis=2) + + class_predictions_without_background = tf.slice(class_predictions, + [0, 0, 1], + [-1, -1, -1]) + detection_scores = self._score_conversion_fn( + class_predictions_without_background) + clip_window = tf.constant([0, 0, 1, 1], tf.float32) + detections = self._non_max_suppression_fn(detection_boxes, + detection_scores, + clip_window=clip_window) + return detections + + def loss(self, prediction_dict, scope=None): + """Compute scalar loss tensors with respect to provided groundtruth. + + Calling this function requires that groundtruth tensors have been + provided via the provide_groundtruth function. + + Args: + prediction_dict: a dictionary holding prediction tensors with + 1) box_encodings: 4-D float tensor of shape [batch_size, num_anchors, + box_code_dimension] containing predicted boxes. + 2) class_predictions_with_background: 2-D float tensor of shape + [batch_size, num_anchors, num_classes+1] containing class predictions + (logits) for each of the anchors. Note that this tensor *includes* + background class predictions. + scope: Optional scope name. + + Returns: + a dictionary mapping loss keys (`localization_loss` and + `classification_loss`) to scalar tensors representing corresponding loss + values. 
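As a concrete illustration of the slicing done in `postprocess` above: the background column (class index 0) is dropped before score conversion and NMS. The snippet below shows just that step with toy shapes; the shape values are arbitrary:

```python
import tensorflow as tf

batch_size, num_anchors, num_classes = 2, 4, 3
class_predictions_with_background = tf.zeros(
    [batch_size, num_anchors, num_classes + 1])
# Drop the background column; the result is [batch, num_anchors, num_classes].
class_predictions_without_background = tf.slice(
    class_predictions_with_background, [0, 0, 1], [-1, -1, -1])
with tf.Session() as sess:
  print(sess.run(tf.shape(class_predictions_without_background)))  # [2 4 3]
```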
+ """ + with tf.name_scope(scope, 'Loss', prediction_dict.values()): + (batch_cls_targets, batch_cls_weights, batch_reg_targets, + batch_reg_weights, match_list) = self._assign_targets( + self.groundtruth_lists(fields.BoxListFields.boxes), + self.groundtruth_lists(fields.BoxListFields.classes)) + if self._add_summaries: + self._summarize_input( + self.groundtruth_lists(fields.BoxListFields.boxes), match_list) + num_matches = tf.stack( + [match.num_matched_columns() for match in match_list]) + location_losses = self._localization_loss( + prediction_dict['box_encodings'], + batch_reg_targets, + weights=batch_reg_weights) + cls_losses = self._classification_loss( + prediction_dict['class_predictions_with_background'], + batch_cls_targets, + weights=batch_cls_weights) + + # Optionally apply hard mining on top of loss values + localization_loss = tf.reduce_sum(location_losses) + classification_loss = tf.reduce_sum(cls_losses) + if self._hard_example_miner: + (localization_loss, classification_loss) = self._apply_hard_mining( + location_losses, cls_losses, prediction_dict, match_list) + if self._add_summaries: + self._hard_example_miner.summarize() + + # Optionally normalize by number of positive matches + normalizer = tf.constant(1.0, dtype=tf.float32) + if self._normalize_loss_by_num_matches: + normalizer = tf.maximum(tf.to_float(tf.reduce_sum(num_matches)), 1.0) + + loss_dict = { + 'localization_loss': (self._localization_loss_weight / normalizer) * + localization_loss, + 'classification_loss': (self._classification_loss_weight / + normalizer) * classification_loss + } + return loss_dict + + def _assign_targets(self, groundtruth_boxes_list, groundtruth_classes_list): + """Assign groundtruth targets. + + Adds a background class to each one-hot encoding of groundtruth classes + and uses target assigner to obtain regression and classification targets. + + Args: + groundtruth_boxes_list: a list of 2-D tensors of shape [num_boxes, 4] + containing coordinates of the groundtruth boxes. + Groundtruth boxes are provided in [y_min, x_min, y_max, x_max] + format and assumed to be normalized and clipped + relative to the image window with y_min <= y_max and x_min <= x_max. + groundtruth_classes_list: a list of 2-D one-hot (or k-hot) tensors of + shape [num_boxes, num_classes] containing the class targets with the 0th + index assumed to map to the first non-background class. + + Returns: + batch_cls_targets: a tensor with shape [batch_size, num_anchors, + num_classes], + batch_cls_weights: a tensor with shape [batch_size, num_anchors], + batch_reg_targets: a tensor with shape [batch_size, num_anchors, + box_code_dimension] + batch_reg_weights: a tensor with shape [batch_size, num_anchors], + match_list: a list of matcher.Match objects encoding the match between + anchors and groundtruth boxes for each image of the batch, + with rows of the Match objects corresponding to groundtruth boxes + and columns corresponding to anchors. + """ + groundtruth_boxlists = [ + box_list.BoxList(boxes) for boxes in groundtruth_boxes_list + ] + groundtruth_classes_with_background_list = [ + tf.pad(one_hot_encoding, [[0, 0], [1, 0]], mode='CONSTANT') + for one_hot_encoding in groundtruth_classes_list + ] + return target_assigner.batch_assign_targets( + self._target_assigner, self.anchors, groundtruth_boxlists, + groundtruth_classes_with_background_list) + + def _summarize_input(self, groundtruth_boxes_list, match_list): + """Creates tensorflow summaries for the input boxes and anchors. 
+
+    This function creates four summaries corresponding to the average
+    number (over images in a batch) of (1) groundtruth boxes, (2) anchors
+    marked as positive, (3) anchors marked as negative, and (4) anchors marked
+    as ignored.
+
+    Args:
+      groundtruth_boxes_list: a list of 2-D tensors of shape [num_boxes, 4]
+        containing corners of the groundtruth boxes.
+      match_list: a list of matcher.Match objects encoding the match between
+        anchors and groundtruth boxes for each image of the batch,
+        with rows of the Match objects corresponding to groundtruth boxes
+        and columns corresponding to anchors.
+    """
+    num_boxes_per_image = tf.stack(
+        [tf.shape(x)[0] for x in groundtruth_boxes_list])
+    pos_anchors_per_image = tf.stack(
+        [match.num_matched_columns() for match in match_list])
+    neg_anchors_per_image = tf.stack(
+        [match.num_unmatched_columns() for match in match_list])
+    ignored_anchors_per_image = tf.stack(
+        [match.num_ignored_columns() for match in match_list])
+    tf.summary.scalar('Input/AvgNumGroundtruthBoxesPerImage',
+                      tf.reduce_mean(tf.to_float(num_boxes_per_image)))
+    tf.summary.scalar('Input/AvgNumPositiveAnchorsPerImage',
+                      tf.reduce_mean(tf.to_float(pos_anchors_per_image)))
+    tf.summary.scalar('Input/AvgNumNegativeAnchorsPerImage',
+                      tf.reduce_mean(tf.to_float(neg_anchors_per_image)))
+    tf.summary.scalar('Input/AvgNumIgnoredAnchorsPerImage',
+                      tf.reduce_mean(tf.to_float(ignored_anchors_per_image)))
+
+  def _apply_hard_mining(self, location_losses, cls_losses, prediction_dict,
+                         match_list):
+    """Applies hard mining to anchorwise losses.
+
+    Args:
+      location_losses: Float tensor of shape [batch_size, num_anchors]
+        representing anchorwise location losses.
+      cls_losses: Float tensor of shape [batch_size, num_anchors]
+        representing anchorwise classification losses.
+      prediction_dict: a dictionary holding prediction tensors with
+        1) box_encodings: 4-D float tensor of shape [batch_size, num_anchors,
+          box_code_dimension] containing predicted boxes.
+        2) class_predictions_with_background: 2-D float tensor of shape
+          [batch_size, num_anchors, num_classes+1] containing class predictions
+          (logits) for each of the anchors. Note that this tensor *includes*
+          background class predictions.
+      match_list: a list of matcher.Match objects encoding the match between
+        anchors and groundtruth boxes for each image of the batch,
+        with rows of the Match objects corresponding to groundtruth boxes
+        and columns corresponding to anchors.
+
+    Returns:
+      mined_location_loss: a float scalar with sum of localization losses from
+        selected hard examples.
+      mined_cls_loss: a float scalar with sum of classification losses from
+        selected hard examples.
+ """ + class_pred_shape = [-1, self.anchors.num_boxes_static(), self.num_classes] + class_predictions = tf.reshape( + tf.slice(prediction_dict['class_predictions_with_background'], + [0, 0, 1], class_pred_shape), class_pred_shape) + + decoded_boxes = bcoder.batch_decode(prediction_dict['box_encodings'], + self._box_coder, self.anchors) + decoded_box_tensors_list = tf.unstack(decoded_boxes) + class_prediction_list = tf.unstack(class_predictions) + decoded_boxlist_list = [] + for box_location, box_score in zip(decoded_box_tensors_list, + class_prediction_list): + decoded_boxlist = box_list.BoxList(box_location) + decoded_boxlist.add_field('scores', box_score) + decoded_boxlist_list.append(decoded_boxlist) + return self._hard_example_miner( + location_losses=location_losses, + cls_losses=cls_losses, + decoded_boxlist_list=decoded_boxlist_list, + match_list=match_list) + + def restore_fn(self, checkpoint_path, from_detection_checkpoint=True): + """Return callable for loading a checkpoint into the tensorflow graph. + + Args: + checkpoint_path: path to checkpoint to restore. + from_detection_checkpoint: whether to restore from a full detection + checkpoint (with compatible variable names) or to restore from a + classification checkpoint for initialization prior to training. + + Returns: + a callable which takes a tf.Session as input and loads a checkpoint when + run. + """ + variables_to_restore = {} + for variable in tf.all_variables(): + if variable.op.name.startswith(self._extract_features_scope): + var_name = variable.op.name + if not from_detection_checkpoint: + var_name = ( + re.split('^' + self._extract_features_scope + '/', var_name)[-1]) + variables_to_restore[var_name] = variable + # TODO: Load variables selectively using scopes. + variables_to_restore = ( + variables_helper.get_variables_available_in_checkpoint( + variables_to_restore, checkpoint_path)) + saver = tf.train.Saver(variables_to_restore) + + def restore(sess): + saver.restore(sess, checkpoint_path) + return restore diff --git a/object_detection/meta_architectures/ssd_meta_arch_test.py b/object_detection/meta_architectures/ssd_meta_arch_test.py new file mode 100644 index 0000000000000000000000000000000000000000..8096da9a62018561814c70c7c3650ce434226fde --- /dev/null +++ b/object_detection/meta_architectures/ssd_meta_arch_test.py @@ -0,0 +1,258 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for object_detection.meta_architectures.ssd_meta_arch.""" +import functools +import numpy as np +import tensorflow as tf + +from tensorflow.python.training import saver as tf_saver +from object_detection.core import anchor_generator +from object_detection.core import box_list +from object_detection.core import losses +from object_detection.core import post_processing +from object_detection.core import region_similarity_calculator as sim_calc +from object_detection.meta_architectures import ssd_meta_arch +from object_detection.utils import test_utils + +slim = tf.contrib.slim + + +class FakeSSDFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor): + + def __init__(self): + super(FakeSSDFeatureExtractor, self).__init__( + depth_multiplier=0, min_depth=0, conv_hyperparams=None) + + def preprocess(self, resized_inputs): + return tf.identity(resized_inputs) + + def extract_features(self, preprocessed_inputs): + with tf.variable_scope('mock_model'): + features = slim.conv2d(inputs=preprocessed_inputs, num_outputs=32, + kernel_size=[1, 1], scope='layer1') + return [features] + + +class MockAnchorGenerator2x2(anchor_generator.AnchorGenerator): + """Sets up a simple 2x2 anchor grid on the unit square.""" + + def name_scope(self): + return 'MockAnchorGenerator' + + def num_anchors_per_location(self): + return [1] + + def _generate(self, feature_map_shape_list): + return box_list.BoxList( + tf.constant([[0, 0, .5, .5], + [0, .5, .5, 1], + [.5, 0, 1, .5], + [.5, .5, 1, 1]], tf.float32)) + + +class SsdMetaArchTest(tf.test.TestCase): + + def setUp(self): + """Set up mock SSD model. + + Here we set up a simple mock SSD model that will always predict 4 + detections that happen to always be exactly the anchors that are set up + in the above MockAnchorGenerator. Because we let max_detections=5, + we will also always end up with an extra padded row in the detection + results. + """ + is_training = False + self._num_classes = 1 + mock_anchor_generator = MockAnchorGenerator2x2() + mock_box_predictor = test_utils.MockBoxPredictor( + is_training, self._num_classes) + mock_box_coder = test_utils.MockBoxCoder() + fake_feature_extractor = FakeSSDFeatureExtractor() + mock_matcher = test_utils.MockMatcher() + region_similarity_calculator = sim_calc.IouSimilarity() + + def image_resizer_fn(image): + return tf.identity(image) + + classification_loss = losses.WeightedSigmoidClassificationLoss( + anchorwise_output=True) + localization_loss = losses.WeightedSmoothL1LocalizationLoss( + anchorwise_output=True) + non_max_suppression_fn = functools.partial( + post_processing.batch_multiclass_non_max_suppression, + score_thresh=-20.0, + iou_thresh=1.0, + max_size_per_class=5, + max_total_size=5) + classification_loss_weight = 1.0 + localization_loss_weight = 1.0 + normalize_loss_by_num_matches = False + + # This hard example miner is expected to be a no-op. 
+ hard_example_miner = losses.HardExampleMiner( + num_hard_examples=None, + iou_threshold=1.0) + + self._num_anchors = 4 + self._code_size = 4 + self._model = ssd_meta_arch.SSDMetaArch( + is_training, mock_anchor_generator, mock_box_predictor, mock_box_coder, + fake_feature_extractor, mock_matcher, region_similarity_calculator, + image_resizer_fn, non_max_suppression_fn, tf.identity, + classification_loss, localization_loss, classification_loss_weight, + localization_loss_weight, normalize_loss_by_num_matches, + hard_example_miner) + + def test_predict_results_have_correct_keys_and_shapes(self): + batch_size = 3 + preprocessed_input = tf.random_uniform((batch_size, 2, 2, 3), + dtype=tf.float32) + prediction_dict = self._model.predict(preprocessed_input) + + self.assertTrue('box_encodings' in prediction_dict) + self.assertTrue('class_predictions_with_background' in prediction_dict) + self.assertTrue('feature_maps' in prediction_dict) + + expected_box_encodings_shape_out = ( + batch_size, self._num_anchors, self._code_size) + expected_class_predictions_with_background_shape_out = ( + batch_size, self._num_anchors, self._num_classes+1) + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + prediction_out = sess.run(prediction_dict) + self.assertAllEqual(prediction_out['box_encodings'].shape, + expected_box_encodings_shape_out) + self.assertAllEqual( + prediction_out['class_predictions_with_background'].shape, + expected_class_predictions_with_background_shape_out) + + def test_postprocess_results_are_correct(self): + batch_size = 2 + preprocessed_input = tf.random_uniform((batch_size, 2, 2, 3), + dtype=tf.float32) + prediction_dict = self._model.predict(preprocessed_input) + detections = self._model.postprocess(prediction_dict) + + expected_boxes = np.array([[[0, 0, .5, .5], + [0, .5, .5, 1], + [.5, 0, 1, .5], + [.5, .5, 1, 1], + [0, 0, 0, 0]], + [[0, 0, .5, .5], + [0, .5, .5, 1], + [.5, 0, 1, .5], + [.5, .5, 1, 1], + [0, 0, 0, 0]]]) + expected_scores = np.array([[0, 0, 0, 0, 0], + [0, 0, 0, 0, 0]]) + expected_classes = np.array([[0, 0, 0, 0, 0], + [0, 0, 0, 0, 0]]) + expected_num_detections = np.array([4, 4]) + + self.assertTrue('detection_boxes' in detections) + self.assertTrue('detection_scores' in detections) + self.assertTrue('detection_classes' in detections) + self.assertTrue('num_detections' in detections) + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + detections_out = sess.run(detections) + self.assertAllClose(detections_out['detection_boxes'], expected_boxes) + self.assertAllClose(detections_out['detection_scores'], expected_scores) + self.assertAllClose(detections_out['detection_classes'], expected_classes) + self.assertAllClose(detections_out['num_detections'], + expected_num_detections) + + def test_loss_results_are_correct(self): + batch_size = 2 + preprocessed_input = tf.random_uniform((batch_size, 2, 2, 3), + dtype=tf.float32) + groundtruth_boxes_list = [tf.constant([[0, 0, .5, .5]], dtype=tf.float32), + tf.constant([[0, 0, .5, .5]], dtype=tf.float32)] + groundtruth_classes_list = [tf.constant([[1]], dtype=tf.float32), + tf.constant([[1]], dtype=tf.float32)] + self._model.provide_groundtruth(groundtruth_boxes_list, + groundtruth_classes_list) + prediction_dict = self._model.predict(preprocessed_input) + loss_dict = self._model.loss(prediction_dict) + + self.assertTrue('localization_loss' in loss_dict) + self.assertTrue('classification_loss' in loss_dict) + + 
expected_localization_loss = 0.0 + expected_classification_loss = (batch_size * self._num_anchors + * (self._num_classes+1) * np.log(2.0)) + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + losses_out = sess.run(loss_dict) + + self.assertAllClose(losses_out['localization_loss'], + expected_localization_loss) + self.assertAllClose(losses_out['classification_loss'], + expected_classification_loss) + + def test_restore_fn_detection(self): + init_op = tf.global_variables_initializer() + saver = tf_saver.Saver() + save_path = self.get_temp_dir() + with self.test_session() as sess: + sess.run(init_op) + saved_model_path = saver.save(sess, save_path) + restore_fn = self._model.restore_fn(saved_model_path, + from_detection_checkpoint=True) + restore_fn(sess) + for var in sess.run(tf.report_uninitialized_variables()): + self.assertNotIn('FeatureExtractor', var.name) + + def test_restore_fn_classification(self): + # Define mock tensorflow classification graph and save variables. + test_graph_classification = tf.Graph() + with test_graph_classification.as_default(): + image = tf.placeholder(dtype=tf.float32, shape=[1, 20, 20, 3]) + with tf.variable_scope('mock_model'): + net = slim.conv2d(image, num_outputs=32, kernel_size=1, scope='layer1') + slim.conv2d(net, num_outputs=3, kernel_size=1, scope='layer2') + + init_op = tf.global_variables_initializer() + saver = tf.train.Saver() + save_path = self.get_temp_dir() + with self.test_session() as sess: + sess.run(init_op) + saved_model_path = saver.save(sess, save_path) + + # Create tensorflow detection graph and load variables from + # classification checkpoint. + test_graph_detection = tf.Graph() + with test_graph_detection.as_default(): + inputs_shape = [2, 2, 2, 3] + inputs = tf.to_float(tf.random_uniform( + inputs_shape, minval=0, maxval=255, dtype=tf.int32)) + preprocessed_inputs = self._model.preprocess(inputs) + prediction_dict = self._model.predict(preprocessed_inputs) + self._model.postprocess(prediction_dict) + restore_fn = self._model.restore_fn(saved_model_path, + from_detection_checkpoint=False) + with self.test_session() as sess: + restore_fn(sess) + for var in sess.run(tf.report_uninitialized_variables()): + self.assertNotIn('FeatureExtractor', var.name) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/models/BUILD b/object_detection/models/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..f4af73682eb9412f34a43ef3f55d412d6eee1b95 --- /dev/null +++ b/object_detection/models/BUILD @@ -0,0 +1,135 @@ +# Tensorflow Object Detection API: Models. 
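The `expected_classification_loss` in the SSD loss test above follows from the mock predictor producing all-zero logits: each per-class sigmoid cross-entropy term then equals log(2) whether the target is 0 or 1, and the test sums batch_size * num_anchors * (num_classes + 1) such terms. A quick check of the per-term value:

```python
import numpy as np

def sigmoid_xent(logit, target):
  # Elementwise sigmoid cross-entropy, as used by the weighted sigmoid loss.
  p = 1.0 / (1.0 + np.exp(-logit))
  return -(target * np.log(p) + (1 - target) * np.log(1 - p))

print(np.isclose(sigmoid_xent(0.0, 1.0), np.log(2.0)))  # True
print(np.isclose(sigmoid_xent(0.0, 0.0), np.log(2.0)))  # True
```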
+ +package( + default_visibility = ["//visibility:public"], +) + +licenses(["notice"]) + +# Apache 2.0 + +py_library( + name = "feature_map_generators", + srcs = [ + "feature_map_generators.py", + ], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/utils:ops", + ], +) + +py_test( + name = "feature_map_generators_test", + srcs = [ + "feature_map_generators_test.py", + ], + deps = [ + ":feature_map_generators", + "//tensorflow", + ], +) + +py_library( + name = "ssd_feature_extractor_test", + srcs = [ + "ssd_feature_extractor_test.py", + ], + deps = [ + "//tensorflow", + ], +) + +py_library( + name = "ssd_inception_v2_feature_extractor", + srcs = [ + "ssd_inception_v2_feature_extractor.py", + ], + deps = [ + ":feature_map_generators", + "//tensorflow", + "//tensorflow_models/object_detection/meta_architectures:ssd_meta_arch", + "//tensorflow_models/slim:inception_v2", + ], +) + +py_library( + name = "ssd_mobilenet_v1_feature_extractor", + srcs = ["ssd_mobilenet_v1_feature_extractor.py"], + deps = [ + ":feature_map_generators", + "//tensorflow", + "//tensorflow_models/object_detection/meta_architectures:ssd_meta_arch", + "//tensorflow_models/slim:mobilenet_v1", + ], +) + +py_test( + name = "ssd_inception_v2_feature_extractor_test", + srcs = [ + "ssd_inception_v2_feature_extractor_test.py", + ], + deps = [ + ":ssd_feature_extractor_test", + ":ssd_inception_v2_feature_extractor", + "//tensorflow", + ], +) + +py_test( + name = "ssd_mobilenet_v1_feature_extractor_test", + srcs = ["ssd_mobilenet_v1_feature_extractor_test.py"], + deps = [ + ":ssd_feature_extractor_test", + ":ssd_mobilenet_v1_feature_extractor", + "//tensorflow", + ], +) + +py_library( + name = "faster_rcnn_inception_resnet_v2_feature_extractor", + srcs = [ + "faster_rcnn_inception_resnet_v2_feature_extractor.py", + ], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/meta_architectures:faster_rcnn_meta_arch", + "//tensorflow_models/object_detection/utils:variables_helper", + "//tensorflow_models/slim:inception_resnet_v2", + ], +) + +py_test( + name = "faster_rcnn_inception_resnet_v2_feature_extractor_test", + srcs = [ + "faster_rcnn_inception_resnet_v2_feature_extractor_test.py", + ], + deps = [ + ":faster_rcnn_inception_resnet_v2_feature_extractor", + "//tensorflow", + ], +) + +py_library( + name = "faster_rcnn_resnet_v1_feature_extractor", + srcs = [ + "faster_rcnn_resnet_v1_feature_extractor.py", + ], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/meta_architectures:faster_rcnn_meta_arch", + "//tensorflow_models/slim:resnet_utils", + "//tensorflow_models/slim:resnet_v1", + ], +) + +py_test( + name = "faster_rcnn_resnet_v1_feature_extractor_test", + srcs = [ + "faster_rcnn_resnet_v1_feature_extractor_test.py", + ], + deps = [ + ":faster_rcnn_resnet_v1_feature_extractor", + "//tensorflow", + ], +) diff --git a/object_detection/models/__init__.py b/object_detection/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/object_detection/models/faster_rcnn_inception_resnet_v2_feature_extractor.py b/object_detection/models/faster_rcnn_inception_resnet_v2_feature_extractor.py new file mode 100644 index 0000000000000000000000000000000000000000..f8c86e0c0b8c07d85724ad4fcbfb33a9f921c503 --- /dev/null +++ b/object_detection/models/faster_rcnn_inception_resnet_v2_feature_extractor.py @@ -0,0 +1,216 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Inception Resnet v2 Faster R-CNN implementation. + +See "Inception-v4, Inception-ResNet and the Impact of Residual Connections on +Learning" by Szegedy et al. (https://arxiv.org/abs/1602.07261) +as well as +"Speed/accuracy trade-offs for modern convolutional object detectors" by +Huang et al. (https://arxiv.org/abs/1611.10012) +""" + +import tensorflow as tf + +from object_detection.meta_architectures import faster_rcnn_meta_arch +from object_detection.utils import variables_helper +from nets import inception_resnet_v2 + +slim = tf.contrib.slim + + +class FasterRCNNInceptionResnetV2FeatureExtractor( + faster_rcnn_meta_arch.FasterRCNNFeatureExtractor): + """Faster R-CNN with Inception Resnet v2 feature extractor implementation.""" + + def __init__(self, + is_training, + first_stage_features_stride, + reuse_weights=None, + weight_decay=0.0): + """Constructor. + + Args: + is_training: See base class. + first_stage_features_stride: See base class. + reuse_weights: See base class. + weight_decay: See base class. + + Raises: + ValueError: If `first_stage_features_stride` is not 8 or 16. + """ + if first_stage_features_stride != 8 and first_stage_features_stride != 16: + raise ValueError('`first_stage_features_stride` must be 8 or 16.') + super(FasterRCNNInceptionResnetV2FeatureExtractor, self).__init__( + is_training, first_stage_features_stride, reuse_weights, weight_decay) + + def preprocess(self, resized_inputs): + """Faster R-CNN with Inception Resnet v2 preprocessing. + + Maps pixel values to the range [-1, 1]. + + Args: + resized_inputs: A [batch, height_in, width_in, channels] float32 tensor + representing a batch of images with values between 0 and 255.0. + + Returns: + preprocessed_inputs: A [batch, height_out, width_out, channels] float32 + tensor representing a batch of images. + + """ + return (2.0 / 255.0) * resized_inputs - 1.0 + + def _extract_proposal_features(self, preprocessed_inputs, scope): + """Extracts first stage RPN features. + + Extracts features using the first half of the Inception Resnet v2 network. + We construct the network in `align_feature_maps=True` mode, which means + that all VALID paddings in the network are changed to SAME padding so that + the feature maps are aligned. + + Args: + preprocessed_inputs: A [batch, height, width, channels] float32 tensor + representing a batch of images. + scope: A scope name. + + Returns: + rpn_feature_map: A tensor with shape [batch, height, width, depth] + Raises: + InvalidArgumentError: If the spatial size of `preprocessed_inputs` + (height or width) is less than 33. + ValueError: If the created network is missing the required activation. 
+ """ + if len(preprocessed_inputs.get_shape().as_list()) != 4: + raise ValueError('`preprocessed_inputs` must be 4 dimensional, got a ' + 'tensor of shape %s' % preprocessed_inputs.get_shape()) + + with slim.arg_scope(inception_resnet_v2.inception_resnet_v2_arg_scope( + weight_decay=self._weight_decay)): + # Forces is_training to False to disable batch norm update. + with slim.arg_scope([slim.batch_norm], is_training=False): + with tf.variable_scope('InceptionResnetV2', + reuse=self._reuse_weights) as scope: + rpn_feature_map, _ = ( + inception_resnet_v2.inception_resnet_v2_base( + preprocessed_inputs, final_endpoint='PreAuxLogits', + scope=scope, output_stride=self._first_stage_features_stride, + align_feature_maps=True)) + return rpn_feature_map + + def _extract_box_classifier_features(self, proposal_feature_maps, scope): + """Extracts second stage box classifier features. + + This function reconstructs the "second half" of the Inception ResNet v2 + network after the part defined in `_extract_proposal_features`. + + Args: + proposal_feature_maps: A 4-D float tensor with shape + [batch_size * self.max_num_proposals, crop_height, crop_width, depth] + representing the feature map cropped to each proposal. + scope: A scope name. + + Returns: + proposal_classifier_features: A 4-D float tensor with shape + [batch_size * self.max_num_proposals, height, width, depth] + representing box classifier features for each proposal. + """ + with tf.variable_scope('InceptionResnetV2', reuse=self._reuse_weights): + with slim.arg_scope(inception_resnet_v2.inception_resnet_v2_arg_scope( + weight_decay=self._weight_decay)): + # Forces is_training to False to disable batch norm update. + with slim.arg_scope([slim.batch_norm], is_training=False): + with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], + stride=1, padding='SAME'): + with tf.variable_scope('Mixed_7a'): + with tf.variable_scope('Branch_0'): + tower_conv = slim.conv2d(proposal_feature_maps, + 256, 1, scope='Conv2d_0a_1x1') + tower_conv_1 = slim.conv2d( + tower_conv, 384, 3, stride=2, + padding='VALID', scope='Conv2d_1a_3x3') + with tf.variable_scope('Branch_1'): + tower_conv1 = slim.conv2d( + proposal_feature_maps, 256, 1, scope='Conv2d_0a_1x1') + tower_conv1_1 = slim.conv2d( + tower_conv1, 288, 3, stride=2, + padding='VALID', scope='Conv2d_1a_3x3') + with tf.variable_scope('Branch_2'): + tower_conv2 = slim.conv2d( + proposal_feature_maps, 256, 1, scope='Conv2d_0a_1x1') + tower_conv2_1 = slim.conv2d(tower_conv2, 288, 3, + scope='Conv2d_0b_3x3') + tower_conv2_2 = slim.conv2d( + tower_conv2_1, 320, 3, stride=2, + padding='VALID', scope='Conv2d_1a_3x3') + with tf.variable_scope('Branch_3'): + tower_pool = slim.max_pool2d( + proposal_feature_maps, 3, stride=2, padding='VALID', + scope='MaxPool_1a_3x3') + net = tf.concat( + [tower_conv_1, tower_conv1_1, tower_conv2_2, tower_pool], 3) + net = slim.repeat(net, 9, inception_resnet_v2.block8, scale=0.20) + net = inception_resnet_v2.block8(net, activation_fn=None) + proposal_classifier_features = slim.conv2d( + net, 1536, 1, scope='Conv2d_7b_1x1') + return proposal_classifier_features + + def restore_from_classification_checkpoint_fn( + self, + checkpoint_path, + first_stage_feature_extractor_scope, + second_stage_feature_extractor_scope): + """Returns callable for loading a checkpoint into the tensorflow graph. + + Note that this overrides the default implementation in + faster_rcnn_meta_arch.FasterRCNNFeatureExtractor which does not work for + InceptionResnetV2 checkpoints. 
+ + TODO: revisit whether it's possible to force the `Repeat` namescope as + created in `_extract_box_classifier_features` to start counting at 2 (e.g. + `Repeat_2`) so that the default restore_fn can be used. + + Args: + checkpoint_path: Path to checkpoint to restore. + first_stage_feature_extractor_scope: A scope name for the first stage + feature extractor. + second_stage_feature_extractor_scope: A scope name for the second stage + feature extractor. + + Returns: + a callable which takes a tf.Session as input and loads a checkpoint when + run. + """ + variables_to_restore = {} + for variable in tf.global_variables(): + if variable.op.name.startswith( + first_stage_feature_extractor_scope): + var_name = variable.op.name.replace( + first_stage_feature_extractor_scope + '/', '') + variables_to_restore[var_name] = variable + if variable.op.name.startswith( + second_stage_feature_extractor_scope): + var_name = variable.op.name.replace( + second_stage_feature_extractor_scope + + '/InceptionResnetV2/Repeat', 'InceptionResnetV2/Repeat_2') + var_name = var_name.replace( + second_stage_feature_extractor_scope + '/', '') + variables_to_restore[var_name] = variable + variables_to_restore = ( + variables_helper.get_variables_available_in_checkpoint( + variables_to_restore, checkpoint_path)) + saver = tf.train.Saver(variables_to_restore) + def restore(sess): + saver.restore(sess, checkpoint_path) + return restore diff --git a/object_detection/models/faster_rcnn_inception_resnet_v2_feature_extractor_test.py b/object_detection/models/faster_rcnn_inception_resnet_v2_feature_extractor_test.py new file mode 100644 index 0000000000000000000000000000000000000000..cdb70187c989b2b2b94fb9aa05990039eaad3995 --- /dev/null +++ b/object_detection/models/faster_rcnn_inception_resnet_v2_feature_extractor_test.py @@ -0,0 +1,108 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for models.faster_rcnn_inception_resnet_v2_feature_extractor.""" + +import tensorflow as tf + +from object_detection.models import faster_rcnn_inception_resnet_v2_feature_extractor as frcnn_inc_res + + +class FasterRcnnInceptionResnetV2FeatureExtractorTest(tf.test.TestCase): + + def _build_feature_extractor(self, first_stage_features_stride): + return frcnn_inc_res.FasterRCNNInceptionResnetV2FeatureExtractor( + is_training=False, + first_stage_features_stride=first_stage_features_stride, + reuse_weights=None, + weight_decay=0.0) + + def test_extract_proposal_features_returns_expected_size(self): + feature_extractor = self._build_feature_extractor( + first_stage_features_stride=16) + preprocessed_inputs = tf.random_uniform( + [1, 299, 299, 3], maxval=255, dtype=tf.float32) + rpn_feature_map = feature_extractor.extract_proposal_features( + preprocessed_inputs, scope='TestScope') + features_shape = tf.shape(rpn_feature_map) + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + features_shape_out = sess.run(features_shape) + self.assertAllEqual(features_shape_out, [1, 19, 19, 1088]) + + def test_extract_proposal_features_stride_eight(self): + feature_extractor = self._build_feature_extractor( + first_stage_features_stride=8) + preprocessed_inputs = tf.random_uniform( + [1, 224, 224, 3], maxval=255, dtype=tf.float32) + rpn_feature_map = feature_extractor.extract_proposal_features( + preprocessed_inputs, scope='TestScope') + features_shape = tf.shape(rpn_feature_map) + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + features_shape_out = sess.run(features_shape) + self.assertAllEqual(features_shape_out, [1, 28, 28, 1088]) + + def test_extract_proposal_features_half_size_input(self): + feature_extractor = self._build_feature_extractor( + first_stage_features_stride=16) + preprocessed_inputs = tf.random_uniform( + [1, 112, 112, 3], maxval=255, dtype=tf.float32) + rpn_feature_map = feature_extractor.extract_proposal_features( + preprocessed_inputs, scope='TestScope') + features_shape = tf.shape(rpn_feature_map) + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + features_shape_out = sess.run(features_shape) + self.assertAllEqual(features_shape_out, [1, 7, 7, 1088]) + + def test_extract_proposal_features_dies_on_invalid_stride(self): + with self.assertRaises(ValueError): + self._build_feature_extractor(first_stage_features_stride=99) + + def test_extract_proposal_features_dies_with_incorrect_rank_inputs(self): + feature_extractor = self._build_feature_extractor( + first_stage_features_stride=16) + preprocessed_inputs = tf.random_uniform( + [224, 224, 3], maxval=255, dtype=tf.float32) + with self.assertRaises(ValueError): + feature_extractor.extract_proposal_features( + preprocessed_inputs, scope='TestScope') + + def test_extract_box_classifier_features_returns_expected_size(self): + feature_extractor = self._build_feature_extractor( + first_stage_features_stride=16) + proposal_feature_maps = tf.random_uniform( + [2, 17, 17, 1088], maxval=255, dtype=tf.float32) + proposal_classifier_features = ( + feature_extractor.extract_box_classifier_features( + proposal_feature_maps, scope='TestScope')) + features_shape = tf.shape(proposal_classifier_features) + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) 
+ features_shape_out = sess.run(features_shape) + self.assertAllEqual(features_shape_out, [2, 8, 8, 1536]) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/models/faster_rcnn_resnet_v1_feature_extractor.py b/object_detection/models/faster_rcnn_resnet_v1_feature_extractor.py new file mode 100644 index 0000000000000000000000000000000000000000..ff443ac6da4b5bc235f2ae4bf00319be9ab1e035 --- /dev/null +++ b/object_detection/models/faster_rcnn_resnet_v1_feature_extractor.py @@ -0,0 +1,236 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Resnet V1 Faster R-CNN implementation. + +See "Deep Residual Learning for Image Recognition" by He et al., 2015. +https://arxiv.org/abs/1512.03385 + +Note: this implementation assumes that the classification checkpoint used +to finetune this model is trained using the same configuration as that of +the MSRA provided checkpoints +(see https://github.com/KaimingHe/deep-residual-networks), e.g., with +same preprocessing, batch norm scaling, etc. +""" +import tensorflow as tf + +from object_detection.meta_architectures import faster_rcnn_meta_arch +from nets import resnet_utils +from nets import resnet_v1 + +slim = tf.contrib.slim + + +class FasterRCNNResnetV1FeatureExtractor( + faster_rcnn_meta_arch.FasterRCNNFeatureExtractor): + """Faster R-CNN Resnet V1 feature extractor implementation.""" + + def __init__(self, + architecture, + resnet_model, + is_training, + first_stage_features_stride, + reuse_weights=None, + weight_decay=0.0): + """Constructor. + + Args: + architecture: Architecture name of the Resnet V1 model. + resnet_model: Definition of the Resnet V1 model. + is_training: See base class. + first_stage_features_stride: See base class. + reuse_weights: See base class. + weight_decay: See base class. + + Raises: + ValueError: If `first_stage_features_stride` is not 8 or 16. + """ + if first_stage_features_stride != 8 and first_stage_features_stride != 16: + raise ValueError('`first_stage_features_stride` must be 8 or 16.') + self._architecture = architecture + self._resnet_model = resnet_model + super(FasterRCNNResnetV1FeatureExtractor, self).__init__( + is_training, first_stage_features_stride, reuse_weights, weight_decay) + + def preprocess(self, resized_inputs): + """Faster R-CNN Resnet V1 preprocessing. + + VGG style channel mean subtraction as described here: + https://gist.github.com/ksimonyan/211839e770f7b538e2d8#file-readme-md + + Args: + resized_inputs: A [batch, height_in, width_in, channels] float32 tensor + representing a batch of images with values between 0 and 255.0. + + Returns: + preprocessed_inputs: A [batch, height_out, width_out, channels] float32 + tensor representing a batch of images. 
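+
+    For example (illustrative): an input pixel with value
+    [123.68, 116.779, 103.939] is mapped to [0.0, 0.0, 0.0] by the
+    per-channel mean subtraction below.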
+ + """ + channel_means = [123.68, 116.779, 103.939] + return resized_inputs - [[channel_means]] + + def _extract_proposal_features(self, preprocessed_inputs, scope): + """Extracts first stage RPN features. + + Args: + preprocessed_inputs: A [batch, height, width, channels] float32 tensor + representing a batch of images. + scope: A scope name. + + Returns: + rpn_feature_map: A tensor with shape [batch, height, width, depth] + Raises: + InvalidArgumentError: If the spatial size of `preprocessed_inputs` + (height or width) is less than 33. + ValueError: If the created network is missing the required activation. + """ + if len(preprocessed_inputs.get_shape().as_list()) != 4: + raise ValueError('`preprocessed_inputs` must be 4 dimensional, got a ' + 'tensor of shape %s' % preprocessed_inputs.get_shape()) + shape_assert = tf.Assert( + tf.logical_and( + tf.greater_equal(tf.shape(preprocessed_inputs)[1], 33), + tf.greater_equal(tf.shape(preprocessed_inputs)[2], 33)), + ['image size must at least be 33 in both height and width.']) + + with tf.control_dependencies([shape_assert]): + # Disables batchnorm for fine-tuning with smaller batch sizes. + # TODO: Figure out if it is needed when image batch size is bigger. + with slim.arg_scope( + resnet_utils.resnet_arg_scope( + batch_norm_epsilon=1e-5, + batch_norm_scale=True, + weight_decay=self._weight_decay)): + with tf.variable_scope( + self._architecture, reuse=self._reuse_weights) as var_scope: + _, activations = self._resnet_model( + preprocessed_inputs, + num_classes=None, + is_training=False, + global_pool=False, + output_stride=self._first_stage_features_stride, + spatial_squeeze=False, + scope=var_scope) + + handle = scope + '/%s/block3' % self._architecture + return activations[handle] + + def _extract_box_classifier_features(self, proposal_feature_maps, scope): + """Extracts second stage box classifier features. + + Args: + proposal_feature_maps: A 4-D float tensor with shape + [batch_size * self.max_num_proposals, crop_height, crop_width, depth] + representing the feature map cropped to each proposal. + scope: A scope name (unused). + + Returns: + proposal_classifier_features: A 4-D float tensor with shape + [batch_size * self.max_num_proposals, height, width, depth] + representing box classifier features for each proposal. + """ + with tf.variable_scope(self._architecture, reuse=self._reuse_weights): + with slim.arg_scope( + resnet_utils.resnet_arg_scope( + batch_norm_epsilon=1e-5, + batch_norm_scale=True, + weight_decay=self._weight_decay)): + with slim.arg_scope([slim.batch_norm], is_training=False): + blocks = [ + resnet_utils.Block('block4', resnet_v1.bottleneck, [{ + 'depth': 2048, + 'depth_bottleneck': 512, + 'stride': 1 + }] * 3) + ] + proposal_classifier_features = resnet_utils.stack_blocks_dense( + proposal_feature_maps, blocks) + return proposal_classifier_features + + +class FasterRCNNResnet50FeatureExtractor(FasterRCNNResnetV1FeatureExtractor): + """Faster R-CNN Resnet 50 feature extractor implementation.""" + + def __init__(self, + is_training, + first_stage_features_stride, + reuse_weights=None, + weight_decay=0.0): + """Constructor. + + Args: + is_training: See base class. + first_stage_features_stride: See base class. + reuse_weights: See base class. + weight_decay: See base class. + + Raises: + ValueError: If `first_stage_features_stride` is not 8 or 16, + or if `architecture` is not supported. 
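+
+    Example (an illustrative sketch; the argument values follow the
+    accompanying unit tests):
+
+      feature_extractor = FasterRCNNResnet50FeatureExtractor(
+          is_training=False, first_stage_features_stride=16)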
+ """ + super(FasterRCNNResnet50FeatureExtractor, self).__init__( + 'resnet_v1_50', resnet_v1.resnet_v1_50, is_training, + first_stage_features_stride, reuse_weights, weight_decay) + + +class FasterRCNNResnet101FeatureExtractor(FasterRCNNResnetV1FeatureExtractor): + """Faster R-CNN Resnet 101 feature extractor implementation.""" + + def __init__(self, + is_training, + first_stage_features_stride, + reuse_weights=None, + weight_decay=0.0): + """Constructor. + + Args: + is_training: See base class. + first_stage_features_stride: See base class. + reuse_weights: See base class. + weight_decay: See base class. + + Raises: + ValueError: If `first_stage_features_stride` is not 8 or 16, + or if `architecture` is not supported. + """ + super(FasterRCNNResnet101FeatureExtractor, self).__init__( + 'resnet_v1_101', resnet_v1.resnet_v1_101, is_training, + first_stage_features_stride, reuse_weights, weight_decay) + + +class FasterRCNNResnet152FeatureExtractor(FasterRCNNResnetV1FeatureExtractor): + """Faster R-CNN Resnet 152 feature extractor implementation.""" + + def __init__(self, + is_training, + first_stage_features_stride, + reuse_weights=None, + weight_decay=0.0): + """Constructor. + + Args: + is_training: See base class. + first_stage_features_stride: See base class. + reuse_weights: See base class. + weight_decay: See base class. + + Raises: + ValueError: If `first_stage_features_stride` is not 8 or 16, + or if `architecture` is not supported. + """ + super(FasterRCNNResnet152FeatureExtractor, self).__init__( + 'resnet_v1_152', resnet_v1.resnet_v1_152, is_training, + first_stage_features_stride, reuse_weights, weight_decay) diff --git a/object_detection/models/faster_rcnn_resnet_v1_feature_extractor_test.py b/object_detection/models/faster_rcnn_resnet_v1_feature_extractor_test.py new file mode 100644 index 0000000000000000000000000000000000000000..57ec5793acfdf20c01aa9bfbd109077ee06b7786 --- /dev/null +++ b/object_detection/models/faster_rcnn_resnet_v1_feature_extractor_test.py @@ -0,0 +1,136 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for object_detection.models.faster_rcnn_resnet_v1_feature_extractor.""" + +import numpy as np +import tensorflow as tf + +from object_detection.models import faster_rcnn_resnet_v1_feature_extractor as faster_rcnn_resnet_v1 + + +class FasterRcnnResnetV1FeatureExtractorTest(tf.test.TestCase): + + def _build_feature_extractor(self, + first_stage_features_stride, + architecture='resnet_v1_101'): + feature_extractor_map = { + 'resnet_v1_50': + faster_rcnn_resnet_v1.FasterRCNNResnet50FeatureExtractor, + 'resnet_v1_101': + faster_rcnn_resnet_v1.FasterRCNNResnet101FeatureExtractor, + 'resnet_v1_152': + faster_rcnn_resnet_v1.FasterRCNNResnet152FeatureExtractor + } + return feature_extractor_map[architecture]( + is_training=False, + first_stage_features_stride=first_stage_features_stride, + reuse_weights=None, + weight_decay=0.0) + + def test_extract_proposal_features_returns_expected_size(self): + for architecture in ['resnet_v1_50', 'resnet_v1_101', 'resnet_v1_152']: + feature_extractor = self._build_feature_extractor( + first_stage_features_stride=16, architecture=architecture) + preprocessed_inputs = tf.random_uniform( + [4, 224, 224, 3], maxval=255, dtype=tf.float32) + rpn_feature_map = feature_extractor.extract_proposal_features( + preprocessed_inputs, scope='TestScope') + features_shape = tf.shape(rpn_feature_map) + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + features_shape_out = sess.run(features_shape) + self.assertAllEqual(features_shape_out, [4, 14, 14, 1024]) + + def test_extract_proposal_features_stride_eight(self): + feature_extractor = self._build_feature_extractor( + first_stage_features_stride=8) + preprocessed_inputs = tf.random_uniform( + [4, 224, 224, 3], maxval=255, dtype=tf.float32) + rpn_feature_map = feature_extractor.extract_proposal_features( + preprocessed_inputs, scope='TestScope') + features_shape = tf.shape(rpn_feature_map) + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + features_shape_out = sess.run(features_shape) + self.assertAllEqual(features_shape_out, [4, 28, 28, 1024]) + + def test_extract_proposal_features_half_size_input(self): + feature_extractor = self._build_feature_extractor( + first_stage_features_stride=16) + preprocessed_inputs = tf.random_uniform( + [1, 112, 112, 3], maxval=255, dtype=tf.float32) + rpn_feature_map = feature_extractor.extract_proposal_features( + preprocessed_inputs, scope='TestScope') + features_shape = tf.shape(rpn_feature_map) + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + features_shape_out = sess.run(features_shape) + self.assertAllEqual(features_shape_out, [1, 7, 7, 1024]) + + def test_extract_proposal_features_dies_on_invalid_stride(self): + with self.assertRaises(ValueError): + self._build_feature_extractor(first_stage_features_stride=99) + + def test_extract_proposal_features_dies_on_very_small_images(self): + feature_extractor = self._build_feature_extractor( + first_stage_features_stride=16) + preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3)) + rpn_feature_map = feature_extractor.extract_proposal_features( + preprocessed_inputs, scope='TestScope') + features_shape = tf.shape(rpn_feature_map) + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + with 
self.assertRaises(tf.errors.InvalidArgumentError): + sess.run( + features_shape, + feed_dict={preprocessed_inputs: np.random.rand(4, 32, 32, 3)}) + + def test_extract_proposal_features_dies_with_incorrect_rank_inputs(self): + feature_extractor = self._build_feature_extractor( + first_stage_features_stride=16) + preprocessed_inputs = tf.random_uniform( + [224, 224, 3], maxval=255, dtype=tf.float32) + with self.assertRaises(ValueError): + feature_extractor.extract_proposal_features( + preprocessed_inputs, scope='TestScope') + + def test_extract_box_classifier_features_returns_expected_size(self): + feature_extractor = self._build_feature_extractor( + first_stage_features_stride=16) + proposal_feature_maps = tf.random_uniform( + [3, 7, 7, 1024], maxval=255, dtype=tf.float32) + proposal_classifier_features = ( + feature_extractor.extract_box_classifier_features( + proposal_feature_maps, scope='TestScope')) + features_shape = tf.shape(proposal_classifier_features) + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + features_shape_out = sess.run(features_shape) + self.assertAllEqual(features_shape_out, [3, 7, 7, 2048]) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/models/feature_map_generators.py b/object_detection/models/feature_map_generators.py new file mode 100644 index 0000000000000000000000000000000000000000..44e7dd0a3ff109f60e31c27b9c7e0f5fdc4e555d --- /dev/null +++ b/object_detection/models/feature_map_generators.py @@ -0,0 +1,179 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Functions to generate a list of feature maps based on image features. + +Provides several feature map generators that can be used to build object +detection feature extractors. + +Object detection feature extractors usually are built by stacking two components +- A base feature extractor such as Inception V3 and a feature map generator. +Feature map generators build on the base feature extractors and produce a list +of final feature maps. +""" +import collections +import tensorflow as tf +from object_detection.utils import ops +slim = tf.contrib.slim + + +def get_depth_fn(depth_multiplier, min_depth): + """Builds a callable to compute depth (output channels) of conv filters. + + Args: + depth_multiplier: a multiplier for the nominal depth. + min_depth: a lower bound on the depth of filters. + + Returns: + A callable that takes in a nominal depth and returns the depth to use. + """ + def multiply_depth(depth): + new_depth = int(depth * depth_multiplier) + return max(new_depth, min_depth) + return multiply_depth + + +def multi_resolution_feature_maps(feature_map_layout, depth_multiplier, + min_depth, insert_1x1_conv, image_features): + """Generates multi resolution feature maps from input image features. 
+
+  Generates multi-scale feature maps for detection as in the SSD papers by
+  Liu et al: https://arxiv.org/pdf/1512.02325v2.pdf. See Sec 2.1.
+
+  More specifically, it performs the following two tasks:
+  1) If a layer name is provided in the configuration, returns that layer as a
+     feature map.
+  2) If a layer name is left as an empty string, constructs a new feature map
+     based on the spatial shape and depth configuration. Note that the current
+     implementation only supports generating new layers using convolutions of
+     stride 2, resulting in a spatial resolution reduction by a factor of 2.
+
+  An example of the configuration for Inception V3:
+  {
+    'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
+    'layer_depth': [-1, -1, -1, 512, 256, 128],
+    'anchor_strides': [16, 32, 64, -1, -1, -1]
+  }
+
+  Args:
+    feature_map_layout: Dictionary of specifications for the feature map
+      layouts in the following format (Inception V2/V3 respectively):
+      {
+        'from_layer': ['Mixed_3c', 'Mixed_4c', 'Mixed_5c', '', '', ''],
+        'layer_depth': [-1, -1, -1, 512, 256, 128],
+        'anchor_strides': [16, 32, 64, -1, -1, -1]
+      }
+      or
+      {
+        'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''],
+        'layer_depth': [-1, -1, -1, 512, 256, 128],
+        'anchor_strides': [16, 32, 64, -1, -1, -1]
+      }
+      If 'from_layer' is specified, the specified feature map is directly used
+      as a box predictor layer, and the layer_depth is directly inferred from
+      the feature map (instead of using the provided 'layer_depth' parameter).
+      In this case, our convention is to set 'layer_depth' to -1 for clarity.
+      Otherwise, if 'from_layer' is an empty string, then the box predictor
+      layer will be built from the previous layer using convolution operations.
+      Note that the current implementation only supports generating new layers
+      using convolutions of stride 2 (resulting in a spatial resolution
+      reduction by a factor of 2), and will be extended to a more flexible
+      design. Finally, the optional 'anchor_strides' can be used to specify the
+      anchor stride at each layer where 'from_layer' is specified. Our
+      convention is to set 'anchor_strides' to -1 at the positions where
+      'from_layer' is an empty string; the anchor strides at these layers will
+      be inferred from the previous layer's anchor stride and the current
+      layer's stride length. In the case where 'anchor_strides' is not
+      specified, the anchor strides will default to the image width and height
+      divided by the number of anchors.
+    depth_multiplier: Depth multiplier for convolutional layers.
+    min_depth: Minimum depth for convolutional layers.
+    insert_1x1_conv: A boolean indicating whether an additional 1x1 convolution
+      should be inserted before shrinking the feature map.
+    image_features: A dictionary of handles to activation tensors from the
+      base feature extractor.
+
+  Returns:
+    feature_maps: an OrderedDict mapping keys (feature map names) to
+      tensors where each tensor has shape [batch, height_i, width_i, depth_i].
+
+  Raises:
+    ValueError: if the number of entries in 'from_layer' and
+      'layer_depth' do not match.
+    ValueError: if the generated layer does not have the same resolution
+      as specified.
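+
+  Example usage (an illustrative sketch; the layout and the resulting keys
+  mirror the accompanying unit test for the Inception V2 layout):
+
+    image_features = {
+        'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32),
+        'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32),
+        'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32),
+    }
+    feature_maps = multi_resolution_feature_maps(
+        feature_map_layout={
+            'from_layer': ['Mixed_3c', 'Mixed_4c', 'Mixed_5c', '', '', ''],
+            'layer_depth': [-1, -1, -1, 512, 256, 256],
+        },
+        depth_multiplier=1,
+        min_depth=32,
+        insert_1x1_conv=True,
+        image_features=image_features)
+    # feature_maps is an OrderedDict whose keys are 'Mixed_3c', 'Mixed_4c',
+    # 'Mixed_5c', followed by generated layers such as
+    # 'Mixed_5c_2_Conv2d_3_3x3_s2_512'.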
+ """ + depth_fn = get_depth_fn(depth_multiplier, min_depth) + + feature_map_keys = [] + feature_maps = [] + base_from_layer = '' + feature_map_strides = None + use_depthwise = False + if 'anchor_strides' in feature_map_layout: + feature_map_strides = (feature_map_layout['anchor_strides']) + if 'use_depthwise' in feature_map_layout: + use_depthwise = feature_map_layout['use_depthwise'] + for index, (from_layer, layer_depth) in enumerate( + zip(feature_map_layout['from_layer'], feature_map_layout['layer_depth'])): + if from_layer: + feature_map = image_features[from_layer] + base_from_layer = from_layer + feature_map_keys.append(from_layer) + else: + pre_layer = feature_maps[-1] + intermediate_layer = pre_layer + if insert_1x1_conv: + layer_name = '{}_1_Conv2d_{}_1x1_{}'.format( + base_from_layer, index, depth_fn(layer_depth / 2)) + intermediate_layer = slim.conv2d( + pre_layer, + depth_fn(layer_depth / 2), [1, 1], + padding='SAME', + stride=1, + scope=layer_name) + stride = 2 + layer_name = '{}_2_Conv2d_{}_3x3_s2_{}'.format( + base_from_layer, index, depth_fn(layer_depth)) + if use_depthwise: + feature_map = slim.separable_conv2d( + ops.pad_to_multiple(intermediate_layer, stride), + None, [3, 3], + depth_multiplier=1, + padding='SAME', + stride=stride, + scope=layer_name + '_depthwise') + feature_map = slim.conv2d( + feature_map, + depth_fn(layer_depth), [1, 1], + padding='SAME', + stride=1, + scope=layer_name) + else: + feature_map = slim.conv2d( + ops.pad_to_multiple(intermediate_layer, stride), + depth_fn(layer_depth), [3, 3], + padding='SAME', + stride=stride, + scope=layer_name) + + if (index > 0 and feature_map_strides and + feature_map_strides[index - 1] > 0): + feature_map_strides[index] = ( + stride * feature_map_strides[index - 1]) + feature_map_keys.append(layer_name) + feature_maps.append(feature_map) + return collections.OrderedDict( + [(x, y) for (x, y) in zip(feature_map_keys, feature_maps)]) diff --git a/object_detection/models/feature_map_generators_test.py b/object_detection/models/feature_map_generators_test.py new file mode 100644 index 0000000000000000000000000000000000000000..690723db1b61a8a15cd86a9fd27f189ea5d87d9e --- /dev/null +++ b/object_detection/models/feature_map_generators_test.py @@ -0,0 +1,114 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for feature map generators.""" + +import tensorflow as tf + +from object_detection.models import feature_map_generators + +INCEPTION_V2_LAYOUT = { + 'from_layer': ['Mixed_3c', 'Mixed_4c', 'Mixed_5c', '', '', ''], + 'layer_depth': [-1, -1, -1, 512, 256, 256], + 'anchor_strides': [16, 32, 64, -1, -1, -1], + 'layer_target_norm': [20.0, -1, -1, -1, -1, -1], +} + +INCEPTION_V3_LAYOUT = { + 'from_layer': ['Mixed_5d', 'Mixed_6e', 'Mixed_7c', '', '', ''], + 'layer_depth': [-1, -1, -1, 512, 256, 128], + 'anchor_strides': [16, 32, 64, -1, -1, -1], + 'aspect_ratios': [1.0, 2.0, 1.0/2, 3.0, 1.0/3] +} + + +# TODO: add tests with different anchor strides. +class MultiResolutionFeatureMapGeneratorTest(tf.test.TestCase): + + def test_get_expected_feature_map_shapes_with_inception_v2(self): + image_features = { + 'Mixed_3c': tf.random_uniform([4, 28, 28, 256], dtype=tf.float32), + 'Mixed_4c': tf.random_uniform([4, 14, 14, 576], dtype=tf.float32), + 'Mixed_5c': tf.random_uniform([4, 7, 7, 1024], dtype=tf.float32) + } + feature_maps = feature_map_generators.multi_resolution_feature_maps( + feature_map_layout=INCEPTION_V2_LAYOUT, + depth_multiplier=1, + min_depth=32, + insert_1x1_conv=True, + image_features=image_features) + + expected_feature_map_shapes = { + 'Mixed_3c': (4, 28, 28, 256), + 'Mixed_4c': (4, 14, 14, 576), + 'Mixed_5c': (4, 7, 7, 1024), + 'Mixed_5c_2_Conv2d_3_3x3_s2_512': (4, 4, 4, 512), + 'Mixed_5c_2_Conv2d_4_3x3_s2_256': (4, 2, 2, 256), + 'Mixed_5c_2_Conv2d_5_3x3_s2_256': (4, 1, 1, 256)} + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + out_feature_maps = sess.run(feature_maps) + out_feature_map_shapes = dict( + (key, value.shape) for key, value in out_feature_maps.iteritems()) + self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes) + + def test_get_expected_feature_map_shapes_with_inception_v3(self): + image_features = { + 'Mixed_5d': tf.random_uniform([4, 35, 35, 256], dtype=tf.float32), + 'Mixed_6e': tf.random_uniform([4, 17, 17, 576], dtype=tf.float32), + 'Mixed_7c': tf.random_uniform([4, 8, 8, 1024], dtype=tf.float32) + } + + feature_maps = feature_map_generators.multi_resolution_feature_maps( + feature_map_layout=INCEPTION_V3_LAYOUT, + depth_multiplier=1, + min_depth=32, + insert_1x1_conv=True, + image_features=image_features) + + expected_feature_map_shapes = { + 'Mixed_5d': (4, 35, 35, 256), + 'Mixed_6e': (4, 17, 17, 576), + 'Mixed_7c': (4, 8, 8, 1024), + 'Mixed_7c_2_Conv2d_3_3x3_s2_512': (4, 4, 4, 512), + 'Mixed_7c_2_Conv2d_4_3x3_s2_256': (4, 2, 2, 256), + 'Mixed_7c_2_Conv2d_5_3x3_s2_128': (4, 1, 1, 128)} + + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + out_feature_maps = sess.run(feature_maps) + out_feature_map_shapes = dict( + (key, value.shape) for key, value in out_feature_maps.iteritems()) + self.assertDictEqual(out_feature_map_shapes, expected_feature_map_shapes) + + +class GetDepthFunctionTest(tf.test.TestCase): + + def test_return_min_depth_when_multiplier_is_small(self): + depth_fn = feature_map_generators.get_depth_fn(depth_multiplier=0.5, + min_depth=16) + self.assertEqual(depth_fn(16), 16) + + def test_return_correct_depth_with_multiplier(self): + depth_fn = feature_map_generators.get_depth_fn(depth_multiplier=0.5, + min_depth=16) + self.assertEqual(depth_fn(64), 32) + + +if __name__ == '__main__': + tf.test.main() diff --git 
a/object_detection/models/ssd_feature_extractor_test.py b/object_detection/models/ssd_feature_extractor_test.py new file mode 100644 index 0000000000000000000000000000000000000000..434a4978f5fe169ed76b54875873c9adad0955ec --- /dev/null +++ b/object_detection/models/ssd_feature_extractor_test.py @@ -0,0 +1,96 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Base test class SSDFeatureExtractors.""" + +from abc import abstractmethod + +import numpy as np +import tensorflow as tf + + +class SsdFeatureExtractorTestBase(object): + + def _validate_features_shape(self, + feature_extractor, + preprocessed_inputs, + expected_feature_map_shapes): + """Checks the extracted features are of correct shape. + + Args: + feature_extractor: The feature extractor to test. + preprocessed_inputs: A [batch, height, width, 3] tensor to extract + features with. + expected_feature_map_shapes: The expected shape of the extracted features. + """ + feature_maps = feature_extractor.extract_features(preprocessed_inputs) + feature_map_shapes = [tf.shape(feature_map) for feature_map in feature_maps] + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + feature_map_shapes_out = sess.run(feature_map_shapes) + for shape_out, exp_shape_out in zip( + feature_map_shapes_out, expected_feature_map_shapes): + self.assertAllEqual(shape_out, exp_shape_out) + + @abstractmethod + def _create_feature_extractor(self, depth_multiplier): + """Constructs a new feature extractor. + + Args: + depth_multiplier: float depth multiplier for feature extractor + Returns: + an ssd_meta_arch.SSDFeatureExtractor object. 
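+
+    For example (illustrative), the InceptionV2 subclass of this test base in
+    this change returns
+    `SSDInceptionV2FeatureExtractor(depth_multiplier, min_depth=32,
+    conv_hyperparams={})`.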
+ """ + pass + + def check_extract_features_returns_correct_shape( + self, + image_height, + image_width, + depth_multiplier, + expected_feature_map_shapes_out): + feature_extractor = self._create_feature_extractor(depth_multiplier) + preprocessed_inputs = tf.random_uniform( + [4, image_height, image_width, 3], dtype=tf.float32) + self._validate_features_shape( + feature_extractor, preprocessed_inputs, expected_feature_map_shapes_out) + + def check_extract_features_raises_error_with_invalid_image_size( + self, + image_height, + image_width, + depth_multiplier): + feature_extractor = self._create_feature_extractor(depth_multiplier) + preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3)) + feature_maps = feature_extractor.extract_features(preprocessed_inputs) + test_preprocessed_image = np.random.rand(4, image_height, image_width, 3) + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + with self.assertRaises(tf.errors.InvalidArgumentError): + sess.run(feature_maps, + feed_dict={preprocessed_inputs: test_preprocessed_image}) + + def check_feature_extractor_variables_under_scope(self, + depth_multiplier, + scope_name): + g = tf.Graph() + with g.as_default(): + feature_extractor = self._create_feature_extractor(depth_multiplier) + preprocessed_inputs = tf.placeholder(tf.float32, (4, None, None, 3)) + feature_extractor.extract_features(preprocessed_inputs) + variables = g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES) + for variable in variables: + self.assertTrue(variable.name.startswith(scope_name)) diff --git a/object_detection/models/ssd_inception_v2_feature_extractor.py b/object_detection/models/ssd_inception_v2_feature_extractor.py new file mode 100644 index 0000000000000000000000000000000000000000..2791f4aa0ca7600b95607e682f19929bf89f0b49 --- /dev/null +++ b/object_detection/models/ssd_inception_v2_feature_extractor.py @@ -0,0 +1,99 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""SSDFeatureExtractor for InceptionV2 features.""" +import tensorflow as tf + +from object_detection.meta_architectures import ssd_meta_arch +from object_detection.models import feature_map_generators +from nets import inception_v2 + +slim = tf.contrib.slim + + +class SSDInceptionV2FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor): + """SSD Feature Extractor using InceptionV2 features.""" + + def __init__(self, + depth_multiplier, + min_depth, + conv_hyperparams, + reuse_weights=None): + """InceptionV2 Feature Extractor for SSD Models. + + Args: + depth_multiplier: float depth multiplier for feature extractor. + min_depth: minimum feature extractor depth. + conv_hyperparams: tf slim arg_scope for conv2d and separable_conv2d ops. + reuse_weights: Whether to reuse variables. Default is None. 
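+
+    Example (an illustrative sketch; the accompanying unit tests build the
+    extractor with an empty `conv_hyperparams` dict):
+
+      feature_extractor = SSDInceptionV2FeatureExtractor(
+          depth_multiplier=1.0, min_depth=32, conv_hyperparams={})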
+ """ + super(SSDInceptionV2FeatureExtractor, self).__init__( + depth_multiplier, min_depth, conv_hyperparams, reuse_weights) + + def preprocess(self, resized_inputs): + """SSD preprocessing. + + Maps pixel values to the range [-1, 1]. + + Args: + resized_inputs: a [batch, height, width, channels] float tensor + representing a batch of images. + + Returns: + preprocessed_inputs: a [batch, height, width, channels] float tensor + representing a batch of images. + """ + return (2.0 / 255.0) * resized_inputs - 1.0 + + def extract_features(self, preprocessed_inputs): + """Extract features from preprocessed inputs. + + Args: + preprocessed_inputs: a [batch, height, width, channels] float tensor + representing a batch of images. + + Returns: + feature_maps: a list of tensors where the ith tensor has shape + [batch, height_i, width_i, depth_i] + """ + preprocessed_inputs.get_shape().assert_has_rank(4) + shape_assert = tf.Assert( + tf.logical_and(tf.greater_equal(tf.shape(preprocessed_inputs)[1], 33), + tf.greater_equal(tf.shape(preprocessed_inputs)[2], 33)), + ['image size must at least be 33 in both height and width.']) + + feature_map_layout = { + 'from_layer': ['Mixed_4c', 'Mixed_5c', '', '', '', ''], + 'layer_depth': [-1, -1, 512, 256, 256, 128], + } + + with tf.control_dependencies([shape_assert]): + with slim.arg_scope(self._conv_hyperparams): + with tf.variable_scope('InceptionV2', + reuse=self._reuse_weights) as scope: + _, image_features = inception_v2.inception_v2_base( + preprocessed_inputs, + final_endpoint='Mixed_5c', + min_depth=self._min_depth, + depth_multiplier=self._depth_multiplier, + scope=scope) + feature_maps = feature_map_generators.multi_resolution_feature_maps( + feature_map_layout=feature_map_layout, + depth_multiplier=self._depth_multiplier, + min_depth=self._min_depth, + insert_1x1_conv=True, + image_features=image_features) + + return feature_maps.values() diff --git a/object_detection/models/ssd_inception_v2_feature_extractor_test.py b/object_detection/models/ssd_inception_v2_feature_extractor_test.py new file mode 100644 index 0000000000000000000000000000000000000000..9be9ded6d9494853836d517e2cb5f5232380fca2 --- /dev/null +++ b/object_detection/models/ssd_inception_v2_feature_extractor_test.py @@ -0,0 +1,95 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.models.ssd_inception_v2_feature_extractor.""" +import numpy as np +import tensorflow as tf + +from object_detection.models import ssd_feature_extractor_test +from object_detection.models import ssd_inception_v2_feature_extractor + + +class SsdInceptionV2FeatureExtractorTest( + ssd_feature_extractor_test.SsdFeatureExtractorTestBase, + tf.test.TestCase): + + def _create_feature_extractor(self, depth_multiplier): + """Constructs a SsdInceptionV2FeatureExtractor. 
+ + Args: + depth_multiplier: float depth multiplier for feature extractor + Returns: + an ssd_inception_v2_feature_extractor.SsdInceptionV2FeatureExtractor. + """ + min_depth = 32 + conv_hyperparams = {} + return ssd_inception_v2_feature_extractor.SSDInceptionV2FeatureExtractor( + depth_multiplier, min_depth, conv_hyperparams) + + def test_extract_features_returns_correct_shapes_128(self): + image_height = 128 + image_width = 128 + depth_multiplier = 1.0 + expected_feature_map_shape = [(4, 8, 8, 576), (4, 4, 4, 1024), + (4, 2, 2, 512), (4, 1, 1, 256), + (4, 1, 1, 256), (4, 1, 1, 128)] + self.check_extract_features_returns_correct_shape( + image_height, image_width, depth_multiplier, expected_feature_map_shape) + + def test_extract_features_returns_correct_shapes_299(self): + image_height = 299 + image_width = 299 + depth_multiplier = 1.0 + expected_feature_map_shape = [(4, 19, 19, 576), (4, 10, 10, 1024), + (4, 5, 5, 512), (4, 3, 3, 256), + (4, 2, 2, 256), (4, 1, 1, 128)] + self.check_extract_features_returns_correct_shape( + image_height, image_width, depth_multiplier, expected_feature_map_shape) + + def test_extract_features_returns_correct_shapes_enforcing_min_depth(self): + image_height = 299 + image_width = 299 + depth_multiplier = 0.5**12 + expected_feature_map_shape = [(4, 19, 19, 128), (4, 10, 10, 128), + (4, 5, 5, 32), (4, 3, 3, 32), + (4, 2, 2, 32), (4, 1, 1, 32)] + self.check_extract_features_returns_correct_shape( + image_height, image_width, depth_multiplier, expected_feature_map_shape) + + def test_extract_features_raises_error_with_invalid_image_size(self): + image_height = 32 + image_width = 32 + depth_multiplier = 1.0 + self.check_extract_features_raises_error_with_invalid_image_size( + image_height, image_width, depth_multiplier) + + def test_preprocess_returns_correct_value_range(self): + image_height = 128 + image_width = 128 + depth_multiplier = 1 + test_image = np.random.rand(4, image_height, image_width, 3) + feature_extractor = self._create_feature_extractor(depth_multiplier) + preprocessed_image = feature_extractor.preprocess(test_image) + self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0))) + + def test_variables_only_created_in_scope(self): + depth_multiplier = 1 + scope_name = 'InceptionV2' + self.check_feature_extractor_variables_under_scope(depth_multiplier, + scope_name) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/models/ssd_mobilenet_v1_feature_extractor.py b/object_detection/models/ssd_mobilenet_v1_feature_extractor.py new file mode 100644 index 0000000000000000000000000000000000000000..fa4360c44eb92a4be95c156a79c9155d87a1d65c --- /dev/null +++ b/object_detection/models/ssd_mobilenet_v1_feature_extractor.py @@ -0,0 +1,101 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""SSDFeatureExtractor for MobilenetV1 features.""" + +import tensorflow as tf + +from object_detection.meta_architectures import ssd_meta_arch +from object_detection.models import feature_map_generators +from nets import mobilenet_v1 + +slim = tf.contrib.slim + + +class SSDMobileNetV1FeatureExtractor(ssd_meta_arch.SSDFeatureExtractor): + """SSD Feature Extractor using MobilenetV1 features.""" + + def __init__(self, + depth_multiplier, + min_depth, + conv_hyperparams, + reuse_weights=None): + """MobileNetV1 Feature Extractor for SSD Models. + + Args: + depth_multiplier: float depth multiplier for feature extractor. + min_depth: minimum feature extractor depth. + conv_hyperparams: tf slim arg_scope for conv2d and separable_conv2d ops. + reuse_weights: Whether to reuse variables. Default is None. + """ + super(SSDMobileNetV1FeatureExtractor, self).__init__( + depth_multiplier, min_depth, conv_hyperparams, reuse_weights) + + def preprocess(self, resized_inputs): + """SSD preprocessing. + + Maps pixel values to the range [-1, 1]. + + Args: + resized_inputs: a [batch, height, width, channels] float tensor + representing a batch of images. + + Returns: + preprocessed_inputs: a [batch, height, width, channels] float tensor + representing a batch of images. + """ + return (2.0 / 255.0) * resized_inputs - 1.0 + + def extract_features(self, preprocessed_inputs): + """Extract features from preprocessed inputs. + + Args: + preprocessed_inputs: a [batch, height, width, channels] float tensor + representing a batch of images. + + Returns: + feature_maps: a list of tensors where the ith tensor has shape + [batch, height_i, width_i, depth_i] + """ + preprocessed_inputs.get_shape().assert_has_rank(4) + shape_assert = tf.Assert( + tf.logical_and(tf.greater_equal(tf.shape(preprocessed_inputs)[1], 33), + tf.greater_equal(tf.shape(preprocessed_inputs)[2], 33)), + ['image size must at least be 33 in both height and width.']) + + feature_map_layout = { + 'from_layer': ['Conv2d_11_pointwise', 'Conv2d_13_pointwise', '', '', + '', ''], + 'layer_depth': [-1, -1, 512, 256, 256, 128], + } + + with tf.control_dependencies([shape_assert]): + with slim.arg_scope(self._conv_hyperparams): + with tf.variable_scope('MobilenetV1', + reuse=self._reuse_weights) as scope: + _, image_features = mobilenet_v1.mobilenet_v1_base( + preprocessed_inputs, + final_endpoint='Conv2d_13_pointwise', + min_depth=self._min_depth, + depth_multiplier=self._depth_multiplier, + scope=scope) + feature_maps = feature_map_generators.multi_resolution_feature_maps( + feature_map_layout=feature_map_layout, + depth_multiplier=self._depth_multiplier, + min_depth=self._min_depth, + insert_1x1_conv=True, + image_features=image_features) + + return feature_maps.values() diff --git a/object_detection/models/ssd_mobilenet_v1_feature_extractor_test.py b/object_detection/models/ssd_mobilenet_v1_feature_extractor_test.py new file mode 100644 index 0000000000000000000000000000000000000000..49cd734ab803ec6028b64fd43b1111398144ba04 --- /dev/null +++ b/object_detection/models/ssd_mobilenet_v1_feature_extractor_test.py @@ -0,0 +1,94 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for ssd_mobilenet_v1_feature_extractor.""" +import numpy as np +import tensorflow as tf + +from object_detection.models import ssd_feature_extractor_test +from object_detection.models import ssd_mobilenet_v1_feature_extractor + + +class SsdMobilenetV1FeatureExtractorTest( + ssd_feature_extractor_test.SsdFeatureExtractorTestBase, tf.test.TestCase): + + def _create_feature_extractor(self, depth_multiplier): + """Constructs a new feature extractor. + + Args: + depth_multiplier: float depth multiplier for feature extractor + Returns: + an ssd_meta_arch.SSDFeatureExtractor object. + """ + min_depth = 32 + conv_hyperparams = {} + return ssd_mobilenet_v1_feature_extractor.SSDMobileNetV1FeatureExtractor( + depth_multiplier, min_depth, conv_hyperparams) + + def test_extract_features_returns_correct_shapes_128(self): + image_height = 128 + image_width = 128 + depth_multiplier = 1.0 + expected_feature_map_shape = [(4, 8, 8, 512), (4, 4, 4, 1024), + (4, 2, 2, 512), (4, 1, 1, 256), + (4, 1, 1, 256), (4, 1, 1, 128)] + self.check_extract_features_returns_correct_shape( + image_height, image_width, depth_multiplier, expected_feature_map_shape) + + def test_extract_features_returns_correct_shapes_299(self): + image_height = 299 + image_width = 299 + depth_multiplier = 1.0 + expected_feature_map_shape = [(4, 19, 19, 512), (4, 10, 10, 1024), + (4, 5, 5, 512), (4, 3, 3, 256), + (4, 2, 2, 256), (4, 1, 1, 128)] + self.check_extract_features_returns_correct_shape( + image_height, image_width, depth_multiplier, expected_feature_map_shape) + + def test_extract_features_returns_correct_shapes_enforcing_min_depth(self): + image_height = 299 + image_width = 299 + depth_multiplier = 0.5**12 + expected_feature_map_shape = [(4, 19, 19, 32), (4, 10, 10, 32), + (4, 5, 5, 32), (4, 3, 3, 32), + (4, 2, 2, 32), (4, 1, 1, 32)] + self.check_extract_features_returns_correct_shape( + image_height, image_width, depth_multiplier, expected_feature_map_shape) + + def test_extract_features_raises_error_with_invalid_image_size(self): + image_height = 32 + image_width = 32 + depth_multiplier = 1.0 + self.check_extract_features_raises_error_with_invalid_image_size( + image_height, image_width, depth_multiplier) + + def test_preprocess_returns_correct_value_range(self): + image_height = 128 + image_width = 128 + depth_multiplier = 1 + test_image = np.random.rand(4, image_height, image_width, 3) + feature_extractor = self._create_feature_extractor(depth_multiplier) + preprocessed_image = feature_extractor.preprocess(test_image) + self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0))) + + def test_variables_only_created_in_scope(self): + depth_multiplier = 1 + scope_name = 'MobilenetV1' + self.check_feature_extractor_variables_under_scope(depth_multiplier, + scope_name) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/object_detection_tutorial.ipynb b/object_detection/object_detection_tutorial.ipynb new file mode 100644 index 
0000000000000000000000000000000000000000..31e189916e6f9ea9e693e08e434ea6094544067c --- /dev/null +++ b/object_detection/object_detection_tutorial.ipynb @@ -0,0 +1,313 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Object Detection Demo\n", + "Welcome to the object detection inference walkthrough! This notebook will walk you step by step through the process of using a pre-trained model to detect objects in an image." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "scrolled": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import os\n", + "import six.moves.urllib as urllib\n", + "import sys\n", + "import tarfile\n", + "import tensorflow as tf\n", + "import zipfile\n", + "\n", + "from collections import defaultdict\n", + "from io import StringIO\n", + "from matplotlib import pyplot as plt\n", + "from PIL import Image" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Env setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# This is needed to display the images.\n", + "%matplotlib inline\n", + "\n", + "# This is needed since the notebook is stored in the object_detection folder.\n", + "sys.path.append(\"..\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Object detection imports\n", + "Here are the imports from the object detection module." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "from utils import label_map_util\n", + "\n", + "from utils import visualization_utils as vis_util" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Model preparation " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Variables\n", + "\n", + "Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_CKPT` to point to a new .pb file. \n", + "\n", + "By default we use an \"SSD with Mobilenet\" model here. See the [detection model zoo](g3doc/detection_model_zoo.md) for a list of other models that can be run out-of-the-box with varying speeds and accuracies." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# What model to download.\n", + "MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'\n", + "MODEL_FILE = MODEL_NAME + '.tar.gz'\n", + "DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'\n", + "\n", + "# Path to frozen detection graph. 
This is the actual model that is used for the object detection.\n", + "PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'\n", + "\n", + "# List of the strings that is used to add correct label for each box.\n", + "PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')\n", + "\n", + "NUM_CLASSES = 90" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download Model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "opener = urllib.request.URLopener()\n", + "opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)\n", + "tar_file = tarfile.open(MODEL_FILE)\n", + "for file in tar_file.getmembers():\n", + " file_name = os.path.basename(file.name)\n", + " if 'frozen_inference_graph.pb' in file_name:\n", + " tar_file.extract(file, os.getcwd())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load a (frozen) Tensorflow model into memory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "detection_graph = tf.Graph()\n", + "with detection_graph.as_default():\n", + " od_graph_def = tf.GraphDef()\n", + " with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:\n", + " serialized_graph = fid.read()\n", + " od_graph_def.ParseFromString(serialized_graph)\n", + " tf.import_graph_def(od_graph_def, name='')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Loading label map\n", + "Label maps map indices to category names, so that when our convolution network predicts `5`, we know that this corresponds to `airplane`. Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "label_map = label_map_util.load_labelmap(PATH_TO_LABELS)\n", + "categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)\n", + "category_index = label_map_util.create_category_index(categories)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Helper code" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "def load_image_into_numpy_array(image):\n", + " (im_width, im_height) = image.size\n", + " return np.array(image.getdata()).reshape(\n", + " (im_height, im_width, 3)).astype(np.uint8)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Detection" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# For the sake of simplicity we will use only 2 images:\n", + "# image1.jpg\n", + "# image2.jpg\n", + "# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.\n", + "PATH_TO_TEST_IMAGES_DIR = 'test_images'\n", + "TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 3) ]\n", + "\n", + "# Size, in inches, of the output images.\n", + "IMAGE_SIZE = (12, 8)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "with detection_graph.as_default():\n", + " with tf.Session(graph=detection_graph) as sess:\n", 
+ " for image_path in TEST_IMAGE_PATHS:\n", + " image = Image.open(image_path)\n", + " # the array based representation of the image will be used later in order to prepare the\n", + " # result image with boxes and labels on it.\n", + " image_np = load_image_into_numpy_array(image)\n", + " # Expand dimensions since the model expects images to have shape: [1, None, None, 3]\n", + " image_np_expanded = np.expand_dims(image_np, axis=0)\n", + " image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')\n", + " # Each box represents a part of the image where a particular object was detected.\n", + " boxes = detection_graph.get_tensor_by_name('detection_boxes:0')\n", + " # Each score represent how level of confidence for each of the objects.\n", + " # Score is shown on the result image, together with the class label.\n", + " scores = detection_graph.get_tensor_by_name('detection_scores:0')\n", + " classes = detection_graph.get_tensor_by_name('detection_classes:0')\n", + " num_detections = detection_graph.get_tensor_by_name('num_detections:0')\n", + " # Actual detection.\n", + " (boxes, scores, classes, num_detections) = sess.run(\n", + " [boxes, scores, classes, num_detections],\n", + " feed_dict={image_tensor: image_np_expanded})\n", + " # Visualization of the results of a detection.\n", + " vis_util.visualize_boxes_and_labels_on_image_array(\n", + " image_np,\n", + " np.squeeze(boxes),\n", + " np.squeeze(classes).astype(np.int32),\n", + " np.squeeze(scores),\n", + " category_index,\n", + " use_normalized_coordinates=True,\n", + " line_thickness=8)\n", + " plt.figure(figsize=IMAGE_SIZE)\n", + " plt.imshow(image_np)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 2", + "language": "python", + "name": "python2" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/object_detection/protos/BUILD b/object_detection/protos/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..7ab70ca0fca0b40c8d058d72d7a0a78b043b054b --- /dev/null +++ b/object_detection/protos/BUILD @@ -0,0 +1,329 @@ +# Tensorflow Object Detection API: Configuration protos. 
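+#
+# Each configuration .proto in this package is declared twice: once as a
+# proto_library for the schema itself and once as a py_proto_library (the
+# *_py_pb2 targets) so that the generated Python bindings can be imported by
+# the training and evaluation code.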
+ +package( + default_visibility = ["//visibility:public"], +) + +licenses(["notice"]) + +proto_library( + name = "argmax_matcher_proto", + srcs = ["argmax_matcher.proto"], +) + +py_proto_library( + name = "argmax_matcher_py_pb2", + api_version = 2, + deps = [":argmax_matcher_proto"], +) + +proto_library( + name = "bipartite_matcher_proto", + srcs = ["bipartite_matcher.proto"], +) + +py_proto_library( + name = "bipartite_matcher_py_pb2", + api_version = 2, + deps = [":bipartite_matcher_proto"], +) + +proto_library( + name = "matcher_proto", + srcs = ["matcher.proto"], + deps = [ + ":argmax_matcher_proto", + ":bipartite_matcher_proto", + ], +) + +py_proto_library( + name = "matcher_py_pb2", + api_version = 2, + deps = [":matcher_proto"], +) + +proto_library( + name = "faster_rcnn_box_coder_proto", + srcs = ["faster_rcnn_box_coder.proto"], +) + +py_proto_library( + name = "faster_rcnn_box_coder_py_pb2", + api_version = 2, + deps = [":faster_rcnn_box_coder_proto"], +) + +proto_library( + name = "mean_stddev_box_coder_proto", + srcs = ["mean_stddev_box_coder.proto"], +) + +py_proto_library( + name = "mean_stddev_box_coder_py_pb2", + api_version = 2, + deps = [":mean_stddev_box_coder_proto"], +) + +proto_library( + name = "square_box_coder_proto", + srcs = ["square_box_coder.proto"], +) + +py_proto_library( + name = "square_box_coder_py_pb2", + api_version = 2, + deps = [":square_box_coder_proto"], +) + +proto_library( + name = "box_coder_proto", + srcs = ["box_coder.proto"], + deps = [ + ":faster_rcnn_box_coder_proto", + ":mean_stddev_box_coder_proto", + ":square_box_coder_proto", + ], +) + +py_proto_library( + name = "box_coder_py_pb2", + api_version = 2, + deps = [":box_coder_proto"], +) + +proto_library( + name = "grid_anchor_generator_proto", + srcs = ["grid_anchor_generator.proto"], +) + +py_proto_library( + name = "grid_anchor_generator_py_pb2", + api_version = 2, + deps = [":grid_anchor_generator_proto"], +) + +proto_library( + name = "ssd_anchor_generator_proto", + srcs = ["ssd_anchor_generator.proto"], +) + +py_proto_library( + name = "ssd_anchor_generator_py_pb2", + api_version = 2, + deps = [":ssd_anchor_generator_proto"], +) + +proto_library( + name = "anchor_generator_proto", + srcs = ["anchor_generator.proto"], + deps = [ + ":grid_anchor_generator_proto", + ":ssd_anchor_generator_proto", + ], +) + +py_proto_library( + name = "anchor_generator_py_pb2", + api_version = 2, + deps = [":anchor_generator_proto"], +) + +proto_library( + name = "input_reader_proto", + srcs = ["input_reader.proto"], +) + +py_proto_library( + name = "input_reader_py_pb2", + api_version = 2, + deps = [":input_reader_proto"], +) + +proto_library( + name = "losses_proto", + srcs = ["losses.proto"], +) + +py_proto_library( + name = "losses_py_pb2", + api_version = 2, + deps = [":losses_proto"], +) + +proto_library( + name = "optimizer_proto", + srcs = ["optimizer.proto"], +) + +py_proto_library( + name = "optimizer_py_pb2", + api_version = 2, + deps = [":optimizer_proto"], +) + +proto_library( + name = "post_processing_proto", + srcs = ["post_processing.proto"], +) + +py_proto_library( + name = "post_processing_py_pb2", + api_version = 2, + deps = [":post_processing_proto"], +) + +proto_library( + name = "hyperparams_proto", + srcs = ["hyperparams.proto"], +) + +py_proto_library( + name = "hyperparams_py_pb2", + api_version = 2, + deps = [":hyperparams_proto"], +) + +proto_library( + name = "box_predictor_proto", + srcs = ["box_predictor.proto"], + deps = [":hyperparams_proto"], +) + +py_proto_library( + name 
= "box_predictor_py_pb2", + api_version = 2, + deps = [":box_predictor_proto"], +) + +proto_library( + name = "region_similarity_calculator_proto", + srcs = ["region_similarity_calculator.proto"], + deps = [], +) + +py_proto_library( + name = "region_similarity_calculator_py_pb2", + api_version = 2, + deps = [":region_similarity_calculator_proto"], +) + +proto_library( + name = "preprocessor_proto", + srcs = ["preprocessor.proto"], +) + +py_proto_library( + name = "preprocessor_py_pb2", + api_version = 2, + deps = [":preprocessor_proto"], +) + +proto_library( + name = "train_proto", + srcs = ["train.proto"], + deps = [ + ":optimizer_proto", + ":preprocessor_proto", + ], +) + +py_proto_library( + name = "train_py_pb2", + api_version = 2, + deps = [":train_proto"], +) + +proto_library( + name = "eval_proto", + srcs = ["eval.proto"], +) + +py_proto_library( + name = "eval_py_pb2", + api_version = 2, + deps = [":eval_proto"], +) + +proto_library( + name = "image_resizer_proto", + srcs = ["image_resizer.proto"], +) + +py_proto_library( + name = "image_resizer_py_pb2", + api_version = 2, + deps = [":image_resizer_proto"], +) + +proto_library( + name = "faster_rcnn_proto", + srcs = ["faster_rcnn.proto"], + deps = [ + ":box_predictor_proto", + "//object_detection/protos:anchor_generator_proto", + "//object_detection/protos:hyperparams_proto", + "//object_detection/protos:image_resizer_proto", + "//object_detection/protos:losses_proto", + "//object_detection/protos:post_processing_proto", + ], +) + +proto_library( + name = "ssd_proto", + srcs = ["ssd.proto"], + deps = [ + ":anchor_generator_proto", + ":box_coder_proto", + ":box_predictor_proto", + ":hyperparams_proto", + ":image_resizer_proto", + ":losses_proto", + ":matcher_proto", + ":post_processing_proto", + ":region_similarity_calculator_proto", + ], +) + +proto_library( + name = "model_proto", + srcs = ["model.proto"], + deps = [ + ":faster_rcnn_proto", + ":ssd_proto", + ], +) + +py_proto_library( + name = "model_py_pb2", + api_version = 2, + deps = [":model_proto"], +) + +proto_library( + name = "pipeline_proto", + srcs = ["pipeline.proto"], + deps = [ + ":eval_proto", + ":input_reader_proto", + ":model_proto", + ":train_proto", + ], +) + +py_proto_library( + name = "pipeline_py_pb2", + api_version = 2, + deps = [":pipeline_proto"], +) + +proto_library( + name = "string_int_label_map_proto", + srcs = ["string_int_label_map.proto"], +) + +py_proto_library( + name = "string_int_label_map_py_pb2", + api_version = 2, + deps = [":string_int_label_map_proto"], +) diff --git a/object_detection/protos/__init__.py b/object_detection/protos/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/object_detection/protos/anchor_generator.proto b/object_detection/protos/anchor_generator.proto new file mode 100644 index 0000000000000000000000000000000000000000..4b7b1d62e6304aac8a79bcbc7c61ac57d46341ca --- /dev/null +++ b/object_detection/protos/anchor_generator.proto @@ -0,0 +1,15 @@ +syntax = "proto2"; + +package object_detection.protos; + +import "object_detection/protos/grid_anchor_generator.proto"; +import "object_detection/protos/ssd_anchor_generator.proto"; + +// Configuration proto for the anchor generator to use in the object detection +// pipeline. See core/anchor_generator.py for details. 
+message AnchorGenerator { + oneof anchor_generator_oneof { + GridAnchorGenerator grid_anchor_generator = 1; + SsdAnchorGenerator ssd_anchor_generator = 2; + } +} diff --git a/object_detection/protos/argmax_matcher.proto b/object_detection/protos/argmax_matcher.proto new file mode 100644 index 0000000000000000000000000000000000000000..88c503182eb1f0fbe86d314664c5c6b1d3d1e350 --- /dev/null +++ b/object_detection/protos/argmax_matcher.proto @@ -0,0 +1,25 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Configuration proto for ArgMaxMatcher. See +// matchers/argmax_matcher.py for details. +message ArgMaxMatcher { + // Threshold for positive matches. + optional float matched_threshold = 1 [default = 0.5]; + + // Threshold for negative matches. + optional float unmatched_threshold = 2 [default = 0.5]; + + // Whether to construct ArgMaxMatcher without thresholds. + optional bool ignore_thresholds = 3 [default = false]; + + // If True then negative matches are the ones below the unmatched_threshold, + // whereas ignored matches are in between the matched and umatched + // threshold. If False, then negative matches are in between the matched + // and unmatched threshold, and everything lower than unmatched is ignored. + optional bool negatives_lower_than_unmatched = 4 [default = true]; + + // Whether to ensure each row is matched to at least one column. + optional bool force_match_for_each_row = 5 [default = false]; +} diff --git a/object_detection/protos/bipartite_matcher.proto b/object_detection/protos/bipartite_matcher.proto new file mode 100644 index 0000000000000000000000000000000000000000..7e5a9e5c15a23a2d8d8575d27d4a10a6291b1cee --- /dev/null +++ b/object_detection/protos/bipartite_matcher.proto @@ -0,0 +1,8 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Configuration proto for bipartite matcher. See +// matchers/bipartite_matcher.py for details. +message BipartiteMatcher { +} diff --git a/object_detection/protos/box_coder.proto b/object_detection/protos/box_coder.proto new file mode 100644 index 0000000000000000000000000000000000000000..6b37e8f1937f0ee5afa17947191b4fb7861c7417 --- /dev/null +++ b/object_detection/protos/box_coder.proto @@ -0,0 +1,17 @@ +syntax = "proto2"; + +package object_detection.protos; + +import "object_detection/protos/faster_rcnn_box_coder.proto"; +import "object_detection/protos/mean_stddev_box_coder.proto"; +import "object_detection/protos/square_box_coder.proto"; + +// Configuration proto for the box coder to be used in the object detection +// pipeline. See core/box_coder.py for details. +message BoxCoder { + oneof box_coder_oneof { + FasterRcnnBoxCoder faster_rcnn_box_coder = 1; + MeanStddevBoxCoder mean_stddev_box_coder = 2; + SquareBoxCoder square_box_coder = 3; + } +} diff --git a/object_detection/protos/box_predictor.proto b/object_detection/protos/box_predictor.proto new file mode 100644 index 0000000000000000000000000000000000000000..96c501c0d2e1717cccaec2bc93d045389d0ee9b3 --- /dev/null +++ b/object_detection/protos/box_predictor.proto @@ -0,0 +1,99 @@ +syntax = "proto2"; + +package object_detection.protos; + +import "object_detection/protos/hyperparams.proto"; + + +// Configuration proto for box predictor. See core/box_predictor.py for details. +message BoxPredictor { + oneof box_predictor_oneof { + ConvolutionalBoxPredictor convolutional_box_predictor = 1; + MaskRCNNBoxPredictor mask_rcnn_box_predictor = 2; + RfcnBoxPredictor rfcn_box_predictor = 3; + } +} + +// Configuration proto for Convolutional box predictor. 
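+// Illustrative text-format example (values shown are the defaults declared
+// below, except use_dropout):
+//
+//   box_predictor {
+//     convolutional_box_predictor {
+//       min_depth: 0
+//       max_depth: 0
+//       num_layers_before_predictor: 0
+//       use_dropout: false
+//       dropout_keep_probability: 0.8
+//       kernel_size: 1
+//       box_code_size: 4
+//       apply_sigmoid_to_scores: false
+//       conv_hyperparams {
+//         ...
+//       }
+//     }
+//   }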
+message ConvolutionalBoxPredictor { + // Hyperparameters for convolution ops used in the box predictor. + optional Hyperparams conv_hyperparams = 1; + + // Minumum feature depth prior to predicting box encodings and class + // predictions. + optional int32 min_depth = 2 [default = 0]; + + // Maximum feature depth prior to predicting box encodings and class + // predictions. If max_depth is set to 0, no additional feature map will be + // inserted before location and class predictions. + optional int32 max_depth = 3 [default = 0]; + + // Number of the additional conv layers before the predictor. + optional int32 num_layers_before_predictor = 4 [default = 0]; + + // Whether to use dropout for class prediction. + optional bool use_dropout = 5 [default = true]; + + // Keep probability for dropout + optional float dropout_keep_probability = 6 [default = 0.8]; + + // Size of final convolution kernel. If the spatial resolution of the feature + // map is smaller than the kernel size, then the kernel size is set to + // min(feature_width, feature_height). + optional int32 kernel_size = 7 [default = 1]; + + // Size of the encoding for boxes. + optional int32 box_code_size = 8 [default = 4]; + + // Whether to apply sigmoid to the output of class predictions. + // TODO: Do we need this since we have a post processing module.? + optional bool apply_sigmoid_to_scores = 9 [default = false]; +} + +message MaskRCNNBoxPredictor { + // Hyperparameters for fully connected ops used in the box predictor. + optional Hyperparams fc_hyperparams = 1; + + // Whether to use dropout op prior to the both box and class predictions. + optional bool use_dropout = 2 [default= false]; + + // Keep probability for dropout. This is only used if use_dropout is true. + optional float dropout_keep_probability = 3 [default = 0.5]; + + // Size of the encoding for the boxes. + optional int32 box_code_size = 4 [default = 4]; + + // Hyperparameters for convolution ops used in the box predictor. + optional Hyperparams conv_hyperparams = 5; + + // Whether to predict instance masks inside detection boxes. + optional bool predict_instance_masks = 6 [default = false]; + + // The depth for the first conv2d_transpose op applied to the + // image_features in the mask prediciton branch + optional int32 mask_prediction_conv_depth = 7 [default = 256]; + + // Whether to predict keypoints inside detection boxes. + optional bool predict_keypoints = 8 [default = false]; +} + +message RfcnBoxPredictor { + // Hyperparameters for convolution ops used in the box predictor. + optional Hyperparams conv_hyperparams = 1; + + // Bin sizes for RFCN crops. + optional int32 num_spatial_bins_height = 2 [default = 3]; + + optional int32 num_spatial_bins_width = 3 [default = 3]; + + // Target depth to reduce the input image features to. + optional int32 depth = 4 [default=1024]; + + // Size of the encoding for the boxes. + optional int32 box_code_size = 5 [default = 4]; + + // Size to resize the rfcn crops to. + optional int32 crop_height = 6 [default= 12]; + + optional int32 crop_width = 7 [default=12]; +} diff --git a/object_detection/protos/eval.proto b/object_detection/protos/eval.proto new file mode 100644 index 0000000000000000000000000000000000000000..081b60de117bbed03ed73264380e4409f80388e8 --- /dev/null +++ b/object_detection/protos/eval.proto @@ -0,0 +1,47 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Message for configuring DetectionModel evaluation jobs (eval.py). 
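+//
+// Illustrative example of an eval_config block in a pipeline config:
+//
+//   eval_config: {
+//     num_examples: 5000
+//     eval_interval_secs: 300
+//     max_evals: 10
+//     use_moving_averages: false
+//   }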
+message EvalConfig { + // Number of visualization images to generate. + optional uint32 num_visualizations = 1 [default=10]; + + // Number of examples to process of evaluation. + optional uint32 num_examples = 2 [default=5000]; + + // How often to run evaluation. + optional uint32 eval_interval_secs = 3 [default=300]; + + // Maximum number of times to run evaluation. If set to 0, will run forever. + optional uint32 max_evals = 4 [default=0]; + + // Whether the TensorFlow graph used for evaluation should be saved to disk. + optional bool save_graph = 5 [default=false]; + + // Path to directory to store visualizations in. If empty, visualization + // images are not exported (only shown on Tensorboard). + optional string visualization_export_dir = 6 [default=""]; + + // BNS name of the TensorFlow master. + optional string eval_master = 7 [default=""]; + + // Type of metrics to use for evaluation. Currently supports only Pascal VOC + // detection metrics. + optional string metrics_set = 8 [default="pascal_voc_metrics"]; + + // Path to export detections to COCO compatible JSON format. + optional string export_path = 9 [default='']; + + // Option to not read groundtruth labels and only export detections to + // COCO-compatible JSON file. + optional bool ignore_groundtruth = 10 [default=false]; + + // Use exponential moving averages of variables for evaluation. + // TODO: When this is false make sure the model is constructed + // without moving averages in restore_fn. + optional bool use_moving_averages = 11 [default=false]; + + // Whether to evaluate instance masks. + optional bool eval_instance_masks = 12 [default=false]; +} diff --git a/object_detection/protos/faster_rcnn.proto b/object_detection/protos/faster_rcnn.proto new file mode 100644 index 0000000000000000000000000000000000000000..e2fd5d666cf2d22c84b71ab07e1c2bc88da44fcf --- /dev/null +++ b/object_detection/protos/faster_rcnn.proto @@ -0,0 +1,131 @@ +syntax = "proto2"; + +package object_detection.protos; + +import "object_detection/protos/anchor_generator.proto"; +import "object_detection/protos/box_predictor.proto"; +import "object_detection/protos/hyperparams.proto"; +import "object_detection/protos/image_resizer.proto"; +import "object_detection/protos/losses.proto"; +import "object_detection/protos/post_processing.proto"; + +// Configuration for Faster R-CNN models. +// See meta_architectures/faster_rcnn_meta_arch.py and models/model_builder.py +// +// Naming conventions: +// Faster R-CNN models have two stages: a first stage region proposal network +// (or RPN) and a second stage box classifier. We thus use the prefixes +// `first_stage_` and `second_stage_` to indicate the stage to which each +// parameter pertains when relevant. +message FasterRcnn { + + // Whether to construct only the Region Proposal Network (RPN). + optional bool first_stage_only = 1 [default=false]; + + // Number of classes to predict. + optional int32 num_classes = 3; + + // Image resizer for preprocessing the input image. + optional ImageResizer image_resizer = 4; + + // Feature extractor config. + optional FasterRcnnFeatureExtractor feature_extractor = 5; + + + // (First stage) region proposal network (RPN) parameters. + + // Anchor generator to compute RPN anchors. + optional AnchorGenerator first_stage_anchor_generator = 6; + + // Atrous rate for the convolution op applied to the + // `first_stage_features_to_crop` tensor to obtain box predictions. 
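+  // Setting this to 2, for example, dilates that convolution and enlarges its
+  // receptive field without adding parameters; the default of 1 leaves the op
+  // unchanged.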
+ optional int32 first_stage_atrous_rate = 7 [default=1]; + + // Hyperparameters for the convolutional RPN box predictor. + optional Hyperparams first_stage_box_predictor_conv_hyperparams = 8; + + // Kernel size to use for the convolution op just prior to RPN box + // predictions. + optional int32 first_stage_box_predictor_kernel_size = 9 [default=3]; + + // Output depth for the convolution op just prior to RPN box predictions. + optional int32 first_stage_box_predictor_depth = 10 [default=512]; + + // The batch size to use for computing the first stage objectness and + // location losses. + optional int32 first_stage_minibatch_size = 11 [default=256]; + + // Fraction of positive examples per image for the RPN. + optional float first_stage_positive_balance_fraction = 12 [default=0.5]; + + // Non max suppression score threshold applied to first stage RPN proposals. + optional float first_stage_nms_score_threshold = 13 [default=0.0]; + + // Non max suppression IOU threshold applied to first stage RPN proposals. + optional float first_stage_nms_iou_threshold = 14 [default=0.7]; + + // Maximum number of RPN proposals retained after first stage postprocessing. + optional int32 first_stage_max_proposals = 15 [default=300]; + + // First stage RPN localization loss weight. + optional float first_stage_localization_loss_weight = 16 [default=1.0]; + + // First stage RPN objectness loss weight. + optional float first_stage_objectness_loss_weight = 17 [default=1.0]; + + + // Per-region cropping parameters. + // Note that if a R-FCN model is constructed the per region cropping + // parameters below are ignored. + + // Output size (width and height are set to be the same) of the initial + // bilinear interpolation based cropping during ROI pooling. + optional int32 initial_crop_size = 18; + + // Kernel size of the max pool op on the cropped feature map during + // ROI pooling. + optional int32 maxpool_kernel_size = 19; + + // Stride of the max pool op on the cropped feature map during ROI pooling. + optional int32 maxpool_stride = 20; + + + // (Second stage) box classifier parameters + + // Hyperparameters for the second stage box predictor. If box predictor type + // is set to rfcn_box_predictor, a R-FCN model is constructed, otherwise a + // Faster R-CNN model is constructed. + optional BoxPredictor second_stage_box_predictor = 21; + + // The batch size per image used for computing the classification and refined + // location loss of the box classifier. + // Note that this field is ignored if `hard_example_miner` is configured. + optional int32 second_stage_batch_size = 22 [default=64]; + + // Fraction of positive examples to use per image for the box classifier. + optional float second_stage_balance_fraction = 23 [default=0.25]; + + // Post processing to apply on the second stage box classifier predictions. + // Note: the `score_converter` provided to the FasterRCNNMetaArch constructor + // is taken from this `second_stage_post_processing` proto. + optional PostProcessing second_stage_post_processing = 24; + + // Second stage refined localization loss weight. + optional float second_stage_localization_loss_weight = 25 [default=1.0]; + + // Second stage classification loss weight + optional float second_stage_classification_loss_weight = 26 [default=1.0]; + + // If not left to default, applies hard example mining. 
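+  // Illustrative example:
+  //   hard_example_miner {
+  //     num_hard_examples: 64
+  //     iou_threshold: 0.7
+  //   }
+  // roughly speaking, this keeps only the 64 highest-loss proposals per image
+  // and drops candidates whose IOU with an already selected example is 0.7 or
+  // more.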
+ optional HardExampleMiner hard_example_miner = 27; +} + + +message FasterRcnnFeatureExtractor { + // Type of Faster R-CNN model (e.g., 'faster_rcnn_resnet101'; + // See models/model_builder.py for expected types). + optional string type = 1; + + // Output stride of extracted RPN feature map. + optional int32 first_stage_features_stride = 2 [default=16]; +} diff --git a/object_detection/protos/faster_rcnn_box_coder.proto b/object_detection/protos/faster_rcnn_box_coder.proto new file mode 100644 index 0000000000000000000000000000000000000000..512a20a15099ffde1c8811797786bf8fa14cb692 --- /dev/null +++ b/object_detection/protos/faster_rcnn_box_coder.proto @@ -0,0 +1,17 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Configuration proto for FasterRCNNBoxCoder. See +// box_coders/faster_rcnn_box_coder.py for details. +message FasterRcnnBoxCoder { + // Scale factor for anchor encoded box center. + optional float y_scale = 1 [default = 10.0]; + optional float x_scale = 2 [default = 10.0]; + + // Scale factor for anchor encoded box height. + optional float height_scale = 3 [default = 5.0]; + + // Scale factor for anchor encoded box width. + optional float width_scale = 4 [default = 5.0]; +} diff --git a/object_detection/protos/grid_anchor_generator.proto b/object_detection/protos/grid_anchor_generator.proto new file mode 100644 index 0000000000000000000000000000000000000000..85168f8f58617b98369899dfbd4ccdb732044541 --- /dev/null +++ b/object_detection/protos/grid_anchor_generator.proto @@ -0,0 +1,34 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Configuration proto for GridAnchorGenerator. See +// anchor_generators/grid_anchor_generator.py for details. +message GridAnchorGenerator { + // Anchor height in pixels. + optional int32 height = 1 [default = 256]; + + // Anchor width in pixels. + optional int32 width = 2 [default = 256]; + + // Anchor stride in height dimension in pixels. + optional int32 height_stride = 3 [default = 16]; + + // Anchor stride in width dimension in pixels. + optional int32 width_stride = 4 [default = 16]; + + // Anchor height offset in pixels. + optional int32 height_offset = 5 [default = 0]; + + // Anchor width offset in pixels. + optional int32 width_offset = 6 [default = 0]; + + // At any given location, len(scales) * len(aspect_ratios) anchors are + // generated with all possible combinations of scales and aspect ratios. + + // List of scales for the anchors. + repeated float scales = 7; + + // List of aspect ratios for the anchors. + repeated float aspect_ratios = 8; +} diff --git a/object_detection/protos/hyperparams.proto b/object_detection/protos/hyperparams.proto new file mode 100644 index 0000000000000000000000000000000000000000..b8b9972e6147906890bf1ca8034be828784bbfae --- /dev/null +++ b/object_detection/protos/hyperparams.proto @@ -0,0 +1,103 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Configuration proto for the convolution op hyperparameters to use in the +// object detection pipeline. +message Hyperparams { + + // Operations affected by hyperparameters. + enum Op { + // Convolution, Separable Convolution, Convolution transpose. + CONV = 1; + + // Fully connected + FC = 2; + } + optional Op op = 1 [default = CONV]; + + // Regularizer for the weights of the convolution op. + optional Regularizer regularizer = 2; + + // Initializer for the weights of the convolution op. + optional Initializer initializer = 3; + + // Type of activation to apply after convolution. 
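+  // For example, activation: RELU_6 applies tf.nn.relu6 to the output of each
+  // op configured by this proto.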
+ enum Activation { + // Use None (no activation) + NONE = 0; + + // Use tf.nn.relu + RELU = 1; + + // Use tf.nn.relu6 + RELU_6 = 2; + } + optional Activation activation = 4 [default = RELU]; + + // BatchNorm hyperparameters. If this parameter is NOT set then BatchNorm is + // not applied! + optional BatchNorm batch_norm = 5; +} + +// Proto with one-of field for regularizers. +message Regularizer { + oneof regularizer_oneof { + L1Regularizer l1_regularizer = 1; + L2Regularizer l2_regularizer = 2; + } +} + +// Configuration proto for L1 Regularizer. +// See https://www.tensorflow.org/api_docs/python/tf/contrib/layers/l1_regularizer +message L1Regularizer { + optional float weight = 1 [default = 1.0]; +} + +// Configuration proto for L2 Regularizer. +// See https://www.tensorflow.org/api_docs/python/tf/contrib/layers/l2_regularizer +message L2Regularizer { + optional float weight = 1 [default = 1.0]; +} + +// Proto with one-of field for initializers. +message Initializer { + oneof initializer_oneof { + TruncatedNormalInitializer truncated_normal_initializer = 1; + VarianceScalingInitializer variance_scaling_initializer = 2; + } +} + +// Configuration proto for truncated normal initializer. See +// https://www.tensorflow.org/api_docs/python/tf/truncated_normal_initializer +message TruncatedNormalInitializer { + optional float mean = 1 [default = 0.0]; + optional float stddev = 2 [default = 1.0]; +} + +// Configuration proto for variance scaling initializer. See +// https://www.tensorflow.org/api_docs/python/tf/contrib/layers/ +// variance_scaling_initializer +message VarianceScalingInitializer { + optional float factor = 1 [default = 2.0]; + optional bool uniform = 2 [default = false]; + enum Mode { + FAN_IN = 0; + FAN_OUT = 1; + FAN_AVG = 2; + } + optional Mode mode = 3 [default = FAN_IN]; +} + +// Configuration proto for batch norm to apply after convolution op. See +// https://www.tensorflow.org/api_docs/python/tf/contrib/layers/batch_norm +message BatchNorm { + optional float decay = 1 [default = 0.999]; + optional bool center = 2 [default = true]; + optional bool scale = 3 [default = false]; + optional float epsilon = 4 [default = 0.001]; + // Whether to train the batch norm variables. If this is set to false during + // training, the current value of the batch_norm variables are used for + // forward pass but they are never updated. + optional bool train = 5 [default = true]; +} diff --git a/object_detection/protos/image_resizer.proto b/object_detection/protos/image_resizer.proto new file mode 100644 index 0000000000000000000000000000000000000000..4618add723e41124ba2d02652cbfe9f7963eafc2 --- /dev/null +++ b/object_detection/protos/image_resizer.proto @@ -0,0 +1,32 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Configuration proto for image resizing operations. +// See builders/image_resizer_builder.py for details. +message ImageResizer { + oneof image_resizer_oneof { + KeepAspectRatioResizer keep_aspect_ratio_resizer = 1; + FixedShapeResizer fixed_shape_resizer = 2; + } +} + + +// Configuration proto for image resizer that keeps aspect ratio. +message KeepAspectRatioResizer { + // Desired size of the smaller image dimension in pixels. + optional int32 min_dimension = 1 [default = 600]; + + // Desired size of the larger image dimension in pixels. + optional int32 max_dimension = 2 [default = 1024]; +} + + +// Configuration proto for image resizer that resizes to a fixed shape. +message FixedShapeResizer { + // Desired height of image in pixels. 
+ optional int32 height = 1 [default = 300]; + + // Desired width of image in pixels. + optional int32 width = 2 [default = 300]; +} diff --git a/object_detection/protos/input_reader.proto b/object_detection/protos/input_reader.proto new file mode 100644 index 0000000000000000000000000000000000000000..8956b009eb926864116c7105416ad3d231acad11 --- /dev/null +++ b/object_detection/protos/input_reader.proto @@ -0,0 +1,60 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Configuration proto for defining input readers that generate Object Detection +// Examples from input sources. Input readers are expected to generate a +// dictionary of tensors, with the following fields populated: +// +// 'image': an [image_height, image_width, channels] image tensor that detection +// will be run on. +// 'groundtruth_classes': a [num_boxes] int32 tensor storing the class +// labels of detected boxes in the image. +// 'groundtruth_boxes': a [num_boxes, 4] float tensor storing the coordinates of +// detected boxes in the image. +// 'groundtruth_instance_masks': (Optional), a [num_boxes, image_height, +// image_width] float tensor storing binary mask of the objects in boxes. + +message InputReader { + // Path to StringIntLabelMap pbtxt file specifying the mapping from string + // labels to integer ids. + optional string label_map_path = 1 [default=""]; + + // Whether data should be processed in the order they are read in, or + // shuffled randomly. + optional bool shuffle = 2 [default=true]; + + // Maximum number of records to keep in reader queue. + optional uint32 queue_capacity = 3 [default=2000]; + + // Minimum number of records to keep in reader queue. A large value is needed + // to generate a good random shuffle. + optional uint32 min_after_dequeue = 4 [default=1000]; + + // The number of times a data source is read. If set to zero, the data source + // will be reused indefinitely. + optional uint32 num_epochs = 5 [default=0]; + + // Number of reader instances to create. + optional uint32 num_readers = 6 [default=8]; + + // Whether to load groundtruth instance masks. + optional bool load_instance_masks = 7 [default = false]; + + oneof input_reader { + TFRecordInputReader tf_record_input_reader = 8; + ExternalInputReader external_input_reader = 9; + } +} + +// An input reader that reads TF Example protos from local TFRecord files. +message TFRecordInputReader { + // Path to TFRecordFile. + optional string input_path = 1 [default=""]; +} + +// An externally defined input reader. Users may define an extension to this +// proto to interface their own input readers. +message ExternalInputReader { + extensions 1 to 999; +} diff --git a/object_detection/protos/losses.proto b/object_detection/protos/losses.proto new file mode 100644 index 0000000000000000000000000000000000000000..acd32b1fc06944d329e22c2ebfcfe3e0eb030740 --- /dev/null +++ b/object_detection/protos/losses.proto @@ -0,0 +1,116 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Message for configuring the localization loss, classification loss and hard +// example miner used for training object detection models. See core/losses.py +// for details +message Loss { + // Localization loss to use. + optional LocalizationLoss localization_loss = 1; + + // Classification loss to use. + optional ClassificationLoss classification_loss = 2; + + // If not left to default, applies hard example mining. + optional HardExampleMiner hard_example_miner = 3; + + // Classification loss weight. 
+ optional float classification_weight = 4 [default=1.0]; + + // Localization loss weight. + optional float localization_weight = 5 [default=1.0]; +} + +// Configuration for bounding box localization loss function. +message LocalizationLoss { + oneof localization_loss { + WeightedL2LocalizationLoss weighted_l2 = 1; + WeightedSmoothL1LocalizationLoss weighted_smooth_l1 = 2; + WeightedIOULocalizationLoss weighted_iou = 3; + } +} + +// L2 location loss: 0.5 * ||weight * (a - b)|| ^ 2 +message WeightedL2LocalizationLoss { + // Output loss per anchor. + optional bool anchorwise_output = 1 [default=false]; +} + +// SmoothL1 (Huber) location loss: .5 * x ^ 2 if |x| < 1 else |x| - .5 +message WeightedSmoothL1LocalizationLoss { + // Output loss per anchor. + optional bool anchorwise_output = 1 [default=false]; +} + +// Intersection over union location loss: 1 - IOU +message WeightedIOULocalizationLoss { +} + +// Configuration for class prediction loss function. +message ClassificationLoss { + oneof classification_loss { + WeightedSigmoidClassificationLoss weighted_sigmoid = 1; + WeightedSoftmaxClassificationLoss weighted_softmax = 2; + BootstrappedSigmoidClassificationLoss bootstrapped_sigmoid = 3; + } +} + +// Classification loss using a sigmoid function over class predictions. +message WeightedSigmoidClassificationLoss { + // Output loss per anchor. + optional bool anchorwise_output = 1 [default=false]; +} + +// Classification loss using a softmax function over class predictions. +message WeightedSoftmaxClassificationLoss { + // Output loss per anchor. + optional bool anchorwise_output = 1 [default=false]; +} + +// Classification loss using a sigmoid function over the class prediction with +// the highest prediction score. +message BootstrappedSigmoidClassificationLoss { + // Interpolation weight between 0 and 1. + optional float alpha = 1; + + // Whether hard boot strapping should be used or not. If true, will only use + // one class favored by model. Othewise, will use all predicted class + // probabilities. + optional bool hard_bootstrap = 2 [default=false]; + + // Output loss per anchor. + optional bool anchorwise_output = 3 [default=false]; +} + +// Configuation for hard example miner. +message HardExampleMiner { + // Maximum number of hard examples to be selected per image (prior to + // enforcing max negative to positive ratio constraint). If set to 0, + // all examples obtained after NMS are considered. + optional int32 num_hard_examples = 1 [default=64]; + + // Minimum intersection over union for an example to be discarded during NMS. + optional float iou_threshold = 2 [default=0.7]; + + // Whether to use classification losses ('cls', default), localization losses + // ('loc') or both losses ('both'). In the case of 'both', cls_loss_weight and + // loc_loss_weight are used to compute weighted sum of the two losses. + enum LossType { + BOTH = 0; + CLASSIFICATION = 1; + LOCALIZATION = 2; + } + optional LossType loss_type = 3 [default=BOTH]; + + // Maximum number of negatives to retain for each positive anchor. If + // num_negatives_per_positive is 0 no prespecified negative:positive ratio is + // enforced. + optional int32 max_negatives_per_positive = 4 [default=0]; + + // Minimum number of negative anchors to sample for a given image. Setting + // this to a positive number samples negatives in an image without any + // positive anchors and thus not bias the model towards having at least one + // detection per image. 
+ optional int32 min_negatives_per_image = 5 [default=0]; +} diff --git a/object_detection/protos/matcher.proto b/object_detection/protos/matcher.proto new file mode 100644 index 0000000000000000000000000000000000000000..b47de56c0ea6e0c3ae7bb6ab99a35daa1c987b90 --- /dev/null +++ b/object_detection/protos/matcher.proto @@ -0,0 +1,15 @@ +syntax = "proto2"; + +package object_detection.protos; + +import "object_detection/protos/argmax_matcher.proto"; +import "object_detection/protos/bipartite_matcher.proto"; + +// Configuration proto for the matcher to be used in the object detection +// pipeline. See core/matcher.py for details. +message Matcher { + oneof matcher_oneof { + ArgMaxMatcher argmax_matcher = 1; + BipartiteMatcher bipartite_matcher = 2; + } +} diff --git a/object_detection/protos/mean_stddev_box_coder.proto b/object_detection/protos/mean_stddev_box_coder.proto new file mode 100644 index 0000000000000000000000000000000000000000..597c70cdbbb09c6811650198980fade760f7daa9 --- /dev/null +++ b/object_detection/protos/mean_stddev_box_coder.proto @@ -0,0 +1,8 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Configuration proto for MeanStddevBoxCoder. See +// box_coders/mean_stddev_box_coder.py for details. +message MeanStddevBoxCoder { +} diff --git a/object_detection/protos/model.proto b/object_detection/protos/model.proto new file mode 100644 index 0000000000000000000000000000000000000000..b699c17b52dfa4235baf69213ebc09ed1aa54210 --- /dev/null +++ b/object_detection/protos/model.proto @@ -0,0 +1,14 @@ +syntax = "proto2"; + +package object_detection.protos; + +import "object_detection/protos/faster_rcnn.proto"; +import "object_detection/protos/ssd.proto"; + +// Top level configuration for DetectionModels. +message DetectionModel { + oneof model { + FasterRcnn faster_rcnn = 1; + Ssd ssd = 2; + } +} diff --git a/object_detection/protos/optimizer.proto b/object_detection/protos/optimizer.proto new file mode 100644 index 0000000000000000000000000000000000000000..6ea9f193565ce41ba1955cb7926bb77a8f2f596d --- /dev/null +++ b/object_detection/protos/optimizer.proto @@ -0,0 +1,73 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Messages for configuring the optimizing strategy for training object +// detection models. + +// Top level optimizer message. +message Optimizer { + oneof optimizer { + RMSPropOptimizer rms_prop_optimizer = 1; + MomentumOptimizer momentum_optimizer = 2; + AdamOptimizer adam_optimizer = 3; + } + optional bool use_moving_average = 4 [default=true]; + optional float moving_average_decay = 5 [default=0.9999]; +} + +// Configuration message for the RMSPropOptimizer +// See: https://www.tensorflow.org/api_docs/python/tf/train/RMSPropOptimizer +message RMSPropOptimizer { + optional LearningRate learning_rate = 1; + optional float momentum_optimizer_value = 2 [default=0.9]; + optional float decay = 3 [default=0.9]; + optional float epsilon = 4 [default=1.0]; +} + +// Configuration message for the MomentumOptimizer +// See: https://www.tensorflow.org/api_docs/python/tf/train/MomentumOptimizer +message MomentumOptimizer { + optional LearningRate learning_rate = 1; + optional float momentum_optimizer_value = 2 [default=0.9]; +} + +// Configuration message for the AdamOptimizer +// See: https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer +message AdamOptimizer { + optional LearningRate learning_rate = 1; +} + +// Configuration message for optimizer learning rate. 
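+//
+// Illustrative example of a complete optimizer block using a manually stepped
+// learning rate (the step values are arbitrary):
+//
+//   optimizer {
+//     momentum_optimizer {
+//       learning_rate {
+//         manual_step_learning_rate {
+//           initial_learning_rate: 0.002
+//           schedule { step: 900000 learning_rate: 0.0002 }
+//           schedule { step: 1200000 learning_rate: 0.00002 }
+//         }
+//       }
+//       momentum_optimizer_value: 0.9
+//     }
+//     use_moving_average: false
+//   }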
+message LearningRate { + oneof learning_rate { + ConstantLearningRate constant_learning_rate = 1; + ExponentialDecayLearningRate exponential_decay_learning_rate = 2; + ManualStepLearningRate manual_step_learning_rate = 3; + } +} + +// Configuration message for a constant learning rate. +message ConstantLearningRate { + optional float learning_rate = 1 [default=0.002]; +} + +// Configuration message for an exponentially decaying learning rate. +// See https://www.tensorflow.org/versions/master/api_docs/python/train/ \ +// decaying_the_learning_rate#exponential_decay +message ExponentialDecayLearningRate { + optional float initial_learning_rate = 1 [default=0.002]; + optional uint32 decay_steps = 2 [default=4000000]; + optional float decay_factor = 3 [default=0.95]; + optional bool staircase = 4 [default=true]; +} + +// Configuration message for a manually defined learning rate schedule. +message ManualStepLearningRate { + optional float initial_learning_rate = 1 [default=0.002]; + message LearningRateSchedule { + optional uint32 step = 1; + optional float learning_rate = 2 [default=0.002]; + } + repeated LearningRateSchedule schedule = 2; +} diff --git a/object_detection/protos/pipeline.proto b/object_detection/protos/pipeline.proto new file mode 100644 index 0000000000000000000000000000000000000000..67f4e544948a868666961d636f4fcdb3b7b265a0 --- /dev/null +++ b/object_detection/protos/pipeline.proto @@ -0,0 +1,18 @@ +syntax = "proto2"; + +package object_detection.protos; + +import "object_detection/protos/eval.proto"; +import "object_detection/protos/input_reader.proto"; +import "object_detection/protos/model.proto"; +import "object_detection/protos/train.proto"; + +// Convenience message for configuring a training and eval pipeline. Allows all +// of the pipeline parameters to be configured from one file. +message TrainEvalPipelineConfig { + optional DetectionModel model = 1; + optional TrainConfig train_config = 2; + optional InputReader train_input_reader = 3; + optional EvalConfig eval_config = 4; + optional InputReader eval_input_reader = 5; +} diff --git a/object_detection/protos/post_processing.proto b/object_detection/protos/post_processing.proto new file mode 100644 index 0000000000000000000000000000000000000000..736ac579dabb072eddd3ec961ba9868bf3a2ace6 --- /dev/null +++ b/object_detection/protos/post_processing.proto @@ -0,0 +1,42 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Configuration proto for non-max-suppression operation on a batch of +// detections. +message BatchNonMaxSuppression { + // Scalar threshold for score (low scoring boxes are removed). + optional float score_threshold = 1 [default = 0.0]; + + // Scalar threshold for IOU (boxes that have high IOU overlap + // with previously selected boxes are removed). + optional float iou_threshold = 2 [default = 0.6]; + + // Maximum number of detections to retain per class. + optional int32 max_detections_per_class = 3 [default = 100]; + + // Maximum number of detections to retain across all classes. + optional int32 max_total_detections = 5 [default = 100]; +} + +// Configuration proto for post-processing predicted boxes and +// scores. +message PostProcessing { + // Non max suppression parameters. + optional BatchNonMaxSuppression batch_non_max_suppression = 1; + + // Enum to specify how to convert the detection scores. + enum ScoreConverter { + // Input scores equals output scores. + IDENTITY = 0; + + // Applies a sigmoid on input scores. 
+ SIGMOID = 1; + + // Applies a softmax on input scores + SOFTMAX = 2; + } + + // Score converter to use. + optional ScoreConverter score_converter = 2 [default = IDENTITY]; +} diff --git a/object_detection/protos/preprocessor.proto b/object_detection/protos/preprocessor.proto new file mode 100644 index 0000000000000000000000000000000000000000..0cb338c8d651ac790833732155f7f4fba8796ae5 --- /dev/null +++ b/object_detection/protos/preprocessor.proto @@ -0,0 +1,326 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Message for defining a preprocessing operation on input data. +// See: //object_detection/core/preprocessor.py +message PreprocessingStep { + oneof preprocessing_step { + NormalizeImage normalize_image = 1; + RandomHorizontalFlip random_horizontal_flip = 2; + RandomPixelValueScale random_pixel_value_scale = 3; + RandomImageScale random_image_scale = 4; + RandomRGBtoGray random_rgb_to_gray = 5; + RandomAdjustBrightness random_adjust_brightness = 6; + RandomAdjustContrast random_adjust_contrast = 7; + RandomAdjustHue random_adjust_hue = 8; + RandomAdjustSaturation random_adjust_saturation = 9; + RandomDistortColor random_distort_color = 10; + RandomJitterBoxes random_jitter_boxes = 11; + RandomCropImage random_crop_image = 12; + RandomPadImage random_pad_image = 13; + RandomCropPadImage random_crop_pad_image = 14; + RandomCropToAspectRatio random_crop_to_aspect_ratio = 15; + RandomBlackPatches random_black_patches = 16; + RandomResizeMethod random_resize_method = 17; + ScaleBoxesToPixelCoordinates scale_boxes_to_pixel_coordinates = 18; + ResizeImage resize_image = 19; + SubtractChannelMean subtract_channel_mean = 20; + SSDRandomCrop ssd_random_crop = 21; + SSDRandomCropPad ssd_random_crop_pad = 22; + SSDRandomCropFixedAspectRatio ssd_random_crop_fixed_aspect_ratio = 23; + } +} + +// Normalizes pixel values in an image. +// For every channel in the image, moves the pixel values from the range +// [original_minval, original_maxval] to [target_minval, target_maxval]. +message NormalizeImage { + optional float original_minval = 1; + optional float original_maxval = 2; + optional float target_minval = 3 [default=0]; + optional float target_maxval = 4 [default=1]; +} + +// Randomly horizontally mirrors the image and detections 50% of the time. +message RandomHorizontalFlip { +} + +// Randomly scales the values of all pixels in the image by some constant value +// between [minval, maxval], then clip the value to a range between [0, 1.0]. +message RandomPixelValueScale { + optional float minval = 1 [default=0.9]; + optional float maxval = 2 [default=1.1]; +} + +// Randomly enlarges or shrinks image (keeping aspect ratio). +message RandomImageScale { + optional float min_scale_ratio = 1 [default=0.5]; + optional float max_scale_ratio = 2 [default=2.0]; +} + +// Randomly convert entire image to grey scale. +message RandomRGBtoGray { + optional float probability = 1 [default=0.1]; +} + +// Randomly changes image brightness by up to max_delta. Image outputs will be +// saturated between 0 and 1. +message RandomAdjustBrightness { + optional float max_delta=1 [default=0.2]; +} + +// Randomly scales contract by a value between [min_delta, max_delta]. +message RandomAdjustContrast { + optional float min_delta = 1 [default=0.8]; + optional float max_delta = 2 [default=1.25]; +} + +// Randomly alters hue by a value of up to max_delta. +message RandomAdjustHue { + optional float max_delta = 1 [default=0.02]; +} + +// Randomly changes saturation by a value between [min_delta, max_delta]. 
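+// Illustrative usage: preprocessing steps are listed as
+// data_augmentation_options in the train config, e.g.
+//
+//   data_augmentation_options {
+//     random_adjust_saturation {
+//       min_delta: 0.8
+//       max_delta: 1.25
+//     }
+//   }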
+message RandomAdjustSaturation { + optional float min_delta = 1 [default=0.8]; + optional float max_delta = 2 [default=1.25]; +} + +// Performs a random color distortion. color_orderings should either be 0 or 1. +message RandomDistortColor { + optional int32 color_ordering = 1; +} + +// Randomly jitters corners of boxes in the image determined by ratio. +// ie. If a box is [100, 200] and ratio is 0.02, the corners can move by [1, 4]. +message RandomJitterBoxes { + optional float ratio = 1 [default=0.05]; +} + +// Randomly crops the image and bounding boxes. +message RandomCropImage { + // Cropped image must cover at least one box by this fraction. + optional float min_object_covered = 1 [default=1.0]; + + // Aspect ratio bounds of cropped image. + optional float min_aspect_ratio = 2 [default=0.75]; + optional float max_aspect_ratio = 3 [default=1.33]; + + // Allowed area ratio of cropped image to original image. + optional float min_area = 4 [default=0.1]; + optional float max_area = 5 [default=1.0]; + + // Minimum overlap threshold of cropped boxes to keep in new image. If the + // ratio between a cropped bounding box and the original is less than this + // value, it is removed from the new image. + optional float overlap_thresh = 6 [default=0.3]; + + // Probability of keeping the original image. + optional float random_coef = 7 [default=0.0]; +} + +// Randomly adds padding to the image. +message RandomPadImage { + // Minimum dimensions for padded image. If unset, will use original image + // dimension as a lower bound. + optional float min_image_height = 1; + optional float min_image_width = 2; + + // Maximum dimensions for padded image. If unset, will use double the original + // image dimension as a lower bound. + optional float max_image_height = 3; + optional float max_image_width = 4; + + // Color of the padding. If unset, will pad using average color of the input + // image. + repeated float pad_color = 5; +} + +// Randomly crops an image followed by a random pad. +message RandomCropPadImage { + // Cropping operation must cover at least one box by this fraction. + optional float min_object_covered = 1 [default=1.0]; + + // Aspect ratio bounds of image after cropping operation. + optional float min_aspect_ratio = 2 [default=0.75]; + optional float max_aspect_ratio = 3 [default=1.33]; + + // Allowed area ratio of image after cropping operation. + optional float min_area = 4 [default=0.1]; + optional float max_area = 5 [default=1.0]; + + // Minimum overlap threshold of cropped boxes to keep in new image. If the + // ratio between a cropped bounding box and the original is less than this + // value, it is removed from the new image. + optional float overlap_thresh = 6 [default=0.3]; + + // Probability of keeping the original image during the crop operation. + optional float random_coef = 7 [default=0.0]; + + // Maximum dimensions for padded image. If unset, will use double the original + // image dimension as a lower bound. Both of the following fields should be + // length 2. + repeated float min_padded_size_ratio = 8; + repeated float max_padded_size_ratio = 9; + + // Color of the padding. If unset, will pad using average color of the input + // image. + repeated float pad_color = 10; +} + +// Randomly crops an iamge to a given aspect ratio. +message RandomCropToAspectRatio { + // Aspect ratio. + optional float aspect_ratio = 1 [default=1.0]; + + // Minimum overlap threshold of cropped boxes to keep in new image. 
If the + // ratio between a cropped bounding box and the original is less than this + // value, it is removed from the new image. + optional float overlap_thresh = 2 [default=0.3]; +} + +// Randomly adds black square patches to an image. +message RandomBlackPatches { + // The maximum number of black patches to add. + optional int32 max_black_patches = 1 [default=10]; + + // The probability of a black patch being added to an image. + optional float probability = 2 [default=0.5]; + + // Ratio between the dimension of the black patch to the minimum dimension of + // the image (patch_width = patch_height = min(image_height, image_width)). + optional float size_to_image_ratio = 3 [default=0.1]; +} + +// Randomly resizes the image up to [target_height, target_width]. +message RandomResizeMethod { + optional float target_height = 1; + optional float target_width = 2; +} + +// Scales boxes from normalized coordinates to pixel coordinates. +message ScaleBoxesToPixelCoordinates { +} + +// Resizes images to [new_height, new_width]. +message ResizeImage { + optional int32 new_height = 1; + optional int32 new_width = 2; + enum Method { + AREA=1; + BICUBIC=2; + BILINEAR=3; + NEAREST_NEIGHBOR=4; + } + optional Method method = 3 [default=BILINEAR]; +} + +// Normalizes an image by subtracting a mean from each channel. +message SubtractChannelMean { + // The mean to subtract from each channel. Should be of same dimension of + // channels in the input image. + repeated float means = 1; +} + +message SSDRandomCropOperation { + // Cropped image must cover at least this fraction of one original bounding + // box. + optional float min_object_covered = 1; + + // The aspect ratio of the cropped image must be within the range of + // [min_aspect_ratio, max_aspect_ratio]. + optional float min_aspect_ratio = 2; + optional float max_aspect_ratio = 3; + + // The area of the cropped image must be within the range of + // [min_area, max_area]. + optional float min_area = 4; + optional float max_area = 5; + + // Cropped box area ratio must be above this threhold to be kept. + optional float overlap_thresh = 6; + + // Probability a crop operation is skipped. + optional float random_coef = 7; +} + +// Randomly crops a image according to: +// Liu et al., SSD: Single shot multibox detector. +// This preprocessing step defines multiple SSDRandomCropOperations. Only one +// operation (chosen at random) is actually performed on an image. +message SSDRandomCrop { + repeated SSDRandomCropOperation operations = 1; +} + +message SSDRandomCropPadOperation { + // Cropped image must cover at least this fraction of one original bounding + // box. + optional float min_object_covered = 1; + + // The aspect ratio of the cropped image must be within the range of + // [min_aspect_ratio, max_aspect_ratio]. + optional float min_aspect_ratio = 2; + optional float max_aspect_ratio = 3; + + // The area of the cropped image must be within the range of + // [min_area, max_area]. + optional float min_area = 4; + optional float max_area = 5; + + // Cropped box area ratio must be above this threhold to be kept. + optional float overlap_thresh = 6; + + // Probability a crop operation is skipped. + optional float random_coef = 7; + + // Min ratio of padded image height and width to the input image's height and + // width. Two entries per operation. + repeated float min_padded_size_ratio = 8; + + // Max ratio of padded image height and width to the input image's height and + // width. Two entries per operation. 
+ repeated float max_padded_size_ratio = 9; + + // Padding color. + optional float pad_color_r = 10; + optional float pad_color_g = 11; + optional float pad_color_b = 12; +} + +// Randomly crops and pads an image according to: +// Liu et al., SSD: Single shot multibox detector. +// This preprocessing step defines multiple SSDRandomCropPadOperations. Only one +// operation (chosen at random) is actually performed on an image. +message SSDRandomCropPad { + repeated SSDRandomCropPadOperation operations = 1; +} + +message SSDRandomCropFixedAspectRatioOperation { + // Cropped image must cover at least this fraction of one original bounding + // box. + optional float min_object_covered = 1; + + // The area of the cropped image must be within the range of + // [min_area, max_area]. + optional float min_area = 4; + optional float max_area = 5; + + // Cropped box area ratio must be above this threhold to be kept. + optional float overlap_thresh = 6; + + // Probability a crop operation is skipped. + optional float random_coef = 7; +} + +// Randomly crops a image to a fixed aspect ratio according to: +// Liu et al., SSD: Single shot multibox detector. +// Multiple SSDRandomCropFixedAspectRatioOperations are defined by this +// preprocessing step. Only one operation (chosen at random) is actually +// performed on an image. +message SSDRandomCropFixedAspectRatio { + repeated SSDRandomCropFixedAspectRatioOperation operations = 1; + + // Aspect ratio to crop to. This value is used for all crop operations. + optional float aspect_ratio = 2 [default=1.0]; +} diff --git a/object_detection/protos/region_similarity_calculator.proto b/object_detection/protos/region_similarity_calculator.proto new file mode 100644 index 0000000000000000000000000000000000000000..e82424e2e70c48db3805985f48f0246079df28dc --- /dev/null +++ b/object_detection/protos/region_similarity_calculator.proto @@ -0,0 +1,25 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Configuration proto for region similarity calculators. See +// core/region_similarity_calculator.py for details. +message RegionSimilarityCalculator { + oneof region_similarity { + NegSqDistSimilarity neg_sq_dist_similarity = 1; + IouSimilarity iou_similarity = 2; + IoaSimilarity ioa_similarity = 3; + } +} + +// Configuration for negative squared distance similarity calculator. +message NegSqDistSimilarity { +} + +// Configuration for intersection-over-union (IOU) similarity calculator. +message IouSimilarity { +} + +// Configuration for intersection-over-area (IOA) similarity calculator. +message IoaSimilarity { +} diff --git a/object_detection/protos/square_box_coder.proto b/object_detection/protos/square_box_coder.proto new file mode 100644 index 0000000000000000000000000000000000000000..41575eb42d91271be65d1f18743f747c03249cc9 --- /dev/null +++ b/object_detection/protos/square_box_coder.proto @@ -0,0 +1,14 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Configuration proto for SquareBoxCoder. See +// box_coders/square_box_coder.py for details. +message SquareBoxCoder { + // Scale factor for anchor encoded box center. + optional float y_scale = 1 [default = 10.0]; + optional float x_scale = 2 [default = 10.0]; + + // Scale factor for anchor encoded box length. 
+ optional float length_scale = 3 [default = 5.0]; +} diff --git a/object_detection/protos/ssd.proto b/object_detection/protos/ssd.proto new file mode 100644 index 0000000000000000000000000000000000000000..9eb78c6624dc49d303173d072f5abc70154590fa --- /dev/null +++ b/object_detection/protos/ssd.proto @@ -0,0 +1,65 @@ +syntax = "proto2"; +package object_detection.protos; + +import "object_detection/protos/anchor_generator.proto"; +import "object_detection/protos/box_coder.proto"; +import "object_detection/protos/box_predictor.proto"; +import "object_detection/protos/hyperparams.proto"; +import "object_detection/protos/image_resizer.proto"; +import "object_detection/protos/matcher.proto"; +import "object_detection/protos/losses.proto"; +import "object_detection/protos/post_processing.proto"; +import "object_detection/protos/region_similarity_calculator.proto"; + +// Configuration for Single Shot Detection (SSD) models. +message Ssd { + + // Number of classes to predict. + optional int32 num_classes = 1; + + // Image resizer for preprocessing the input image. + optional ImageResizer image_resizer = 2; + + // Feature extractor config. + optional SsdFeatureExtractor feature_extractor = 3; + + // Box coder to encode the boxes. + optional BoxCoder box_coder = 4; + + // Matcher to match groundtruth with anchors. + optional Matcher matcher = 5; + + // Region similarity calculator to compute similarity of boxes. + optional RegionSimilarityCalculator similarity_calculator = 6; + + // Box predictor to attach to the features. + optional BoxPredictor box_predictor = 7; + + // Anchor generator to compute anchors. + optional AnchorGenerator anchor_generator = 8; + + // Post processing to apply on the predictions. + optional PostProcessing post_processing = 9; + + // Whether to normalize the loss by number of groundtruth boxes that match to + // the anchors. + optional bool normalize_loss_by_num_matches = 10 [default=true]; + + // Loss configuration for training. + optional Loss loss = 11; +} + + +message SsdFeatureExtractor { + // Type of ssd feature extractor. + optional string type = 1; + + // The factor to alter the depth of the channels in the feature extractor. + optional float depth_multiplier = 2 [default=1.0]; + + // Minimum number of the channels in the feature extractor. + optional int32 min_depth = 3 [default=16]; + + // Hyperparameters for the feature extractor. + optional Hyperparams conv_hyperparams = 4; +} diff --git a/object_detection/protos/ssd_anchor_generator.proto b/object_detection/protos/ssd_anchor_generator.proto new file mode 100644 index 0000000000000000000000000000000000000000..15654ace45a0f61396d2feaa56043c921c94c9d7 --- /dev/null +++ b/object_detection/protos/ssd_anchor_generator.proto @@ -0,0 +1,25 @@ +syntax = "proto2"; + +package object_detection.protos; + +// Configuration proto for SSD anchor generator described in +// https://arxiv.org/abs/1512.02325. See +// anchor_generators/multiple_grid_anchor_generator.py for details. +message SsdAnchorGenerator { + // Number of grid layers to create anchors for. + optional int32 num_layers = 1 [default = 6]; + + // Scale of anchors corresponding to finest resolution. + optional float min_scale = 2 [default = 0.2]; + + // Scale of anchors corresponding to coarsest resolution + optional float max_scale = 3 [default = 0.95]; + + // Aspect ratios for anchors at each grid point. 
+ repeated float aspect_ratios = 4; + + // Whether to use the following aspect ratio and scale combination for the + // layer with the finest resolution : (scale=0.1, aspect_ratio=1.0), + // (scale=min_scale, aspect_ration=2.0), (scale=min_scale, aspect_ratio=0.5). + optional bool reduce_boxes_in_lowest_layer = 5 [default = true]; +} diff --git a/object_detection/protos/string_int_label_map.proto b/object_detection/protos/string_int_label_map.proto new file mode 100644 index 0000000000000000000000000000000000000000..0894183bba09f5f28aadf8d0db0fcbbcbe1ab915 --- /dev/null +++ b/object_detection/protos/string_int_label_map.proto @@ -0,0 +1,24 @@ +// Message to store the mapping from class label strings to class id. Datasets +// use string labels to represent classes while the object detection framework +// works with class ids. This message maps them so they can be converted back +// and forth as needed. +syntax = "proto2"; + +package object_detection.protos; + +message StringIntLabelMapItem { + // String name. The most common practice is to set this to a MID or synsets + // id. + optional string name = 1; + + // Integer id that maps to the string name above. Label ids should start from + // 1. + optional int32 id = 2; + + // Human readable string label. + optional string display_name = 3; +}; + +message StringIntLabelMap { + repeated StringIntLabelMapItem item = 1; +}; diff --git a/object_detection/protos/train.proto b/object_detection/protos/train.proto new file mode 100644 index 0000000000000000000000000000000000000000..4f070082a8406f124360c9aeb776cc03a25b0eea --- /dev/null +++ b/object_detection/protos/train.proto @@ -0,0 +1,64 @@ +syntax = "proto2"; + +package object_detection.protos; + +import "object_detection/protos/optimizer.proto"; +import "object_detection/protos/preprocessor.proto"; + +// Message for configuring DetectionModel training jobs (train.py). +message TrainConfig { + // Input queue batch size. + optional uint32 batch_size = 1 [default=32]; + + // Data augmentation options. + repeated PreprocessingStep data_augmentation_options = 2; + + // Whether to synchronize replicas during training. + optional bool sync_replicas = 3 [default=false]; + + // How frequently to keep checkpoints. + optional uint32 keep_checkpoint_every_n_hours = 4 [default=1000]; + + // Optimizer used to train the DetectionModel. + optional Optimizer optimizer = 5; + + // If greater than 0, clips gradients by this value. + optional float gradient_clipping_by_norm = 6 [default=0.0]; + + // Checkpoint to restore variables from. Typically used to load feature + // extractor variables trained outside of object detection. + optional string fine_tune_checkpoint = 7 [default=""]; + + // Specifies if the finetune checkpoint is from an object detection model. + // If from an object detection model, the model being trained should have + // the same parameters with the exception of the num_classes parameter. + // If false, it assumes the checkpoint was a object classification model. + optional bool from_detection_checkpoint = 8 [default=false]; + + // Number of steps to train the DetectionModel for. If 0, will train the model + // indefinitely. + optional uint32 num_steps = 9 [default=0]; + + // Number of training steps between replica startup. + // This flag must be set to 0 if sync_replicas is set to true. + optional float startup_delay_steps = 10 [default=15]; + + // If greater than 0, multiplies the gradient of bias variables by this + // amount. 
+ optional float bias_grad_multiplier = 11 [default=0]; + + // Variables that should not be updated during training. + repeated string freeze_variables = 12; + + // Number of replicas to aggregate before making parameter updates. + optional int32 replicas_to_aggregate = 13 [default=1]; + + // Maximum number of elements to store within a queue. + optional int32 batch_queue_capacity = 14 [default=600]; + + // Number of threads to use for batching. + optional int32 num_batch_queue_threads = 15 [default=8]; + + // Maximum capacity of the queue used to prefetch assembled batches. + optional int32 prefetch_queue_capacity = 16 [default=10]; +} diff --git a/object_detection/samples/cloud/cloud.yml b/object_detection/samples/cloud/cloud.yml new file mode 100644 index 0000000000000000000000000000000000000000..495876a1209b19b3a57e7dfd18a0c2d9e2b7bc0c --- /dev/null +++ b/object_detection/samples/cloud/cloud.yml @@ -0,0 +1,11 @@ +trainingInput: + runtimeVersion: "1.0" + scaleTier: CUSTOM + masterType: standard_gpu + workerCount: 5 + workerType: standard_gpu + parameterServerCount: 3 + parameterServerType: standard + + + diff --git a/object_detection/samples/configs/faster_rcnn_inception_resnet_v2_atrous_pets.config b/object_detection/samples/configs/faster_rcnn_inception_resnet_v2_atrous_pets.config new file mode 100644 index 0000000000000000000000000000000000000000..e27c58e7ee4e5a6b292eff5c35294ee78ddaaa06 --- /dev/null +++ b/object_detection/samples/configs/faster_rcnn_inception_resnet_v2_atrous_pets.config @@ -0,0 +1,138 @@ +# Faster R-CNN with Inception Resnet v2, Atrous version; +# Configured for Oxford-IIIT Pets Dataset. +# Users should configure the fine_tune_checkpoint field in the train config as +# well as the label_map_path and input_path fields in the train_input_reader and +# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that +# should be configured. 
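
The `TrainConfig` message defined in `train.proto` above is normally populated from a text-format protobuf rather than constructed in code. As a minimal sketch of how such a config can be parsed and inspected in Python — assuming the protos have been compiled so that the generated `object_detection.protos.train_pb2` module is importable, as `train.py` later in this change assumes — with field values borrowed from the sample configs below:

```
from google.protobuf import text_format
from object_detection.protos import train_pb2

# A minimal text-format TrainConfig; unset fields fall back to the defaults
# declared in train.proto (e.g. batch_size defaults to 32).
train_config_text = """
batch_size: 24
num_steps: 200000
optimizer {
  rms_prop_optimizer {
    learning_rate {
      exponential_decay_learning_rate {
        initial_learning_rate: 0.004
        decay_steps: 800720
        decay_factor: 0.95
      }
    }
  }
}
data_augmentation_options {
  random_horizontal_flip {
  }
}
"""

train_config = train_pb2.TrainConfig()
text_format.Merge(train_config_text, train_config)

print(train_config.batch_size)                      # 24
print(train_config.keep_checkpoint_every_n_hours)   # default from the proto: 1000
print(len(train_config.data_augmentation_options))  # 1
```
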
+ +model { + faster_rcnn { + num_classes: 37 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 600 + max_dimension: 1024 + } + } + feature_extractor { + type: 'faster_rcnn_inception_resnet_v2' + first_stage_features_stride: 8 + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 8 + width_stride: 8 + } + } + first_stage_atrous_rate: 2 + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 17 + maxpool_kernel_size: 1 + maxpool_stride: 1 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + } +} + +train_config: { + batch_size: 1 + optimizer { + momentum_optimizer: { + learning_rate: { + manual_step_learning_rate { + initial_learning_rate: 0.0003 + schedule { + step: 0 + learning_rate: .0003 + } + schedule { + step: 900000 + learning_rate: .00003 + } + schedule { + step: 1200000 + learning_rate: .000003 + } + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + gradient_clipping_by_norm: 10.0 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt" + from_detection_checkpoint: true + data_augmentation_options { + random_horizontal_flip { + } + } +} + +train_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_train.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" +} + +eval_config: { + num_examples: 2000 +} + +eval_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_val.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" + shuffle: false + num_readers: 1 +} diff --git a/object_detection/samples/configs/faster_rcnn_resnet101_pets.config b/object_detection/samples/configs/faster_rcnn_resnet101_pets.config new file mode 100644 index 0000000000000000000000000000000000000000..e61d5ff7ab768a7ef69f2eb3ccf74903c4a0780f --- /dev/null +++ b/object_detection/samples/configs/faster_rcnn_resnet101_pets.config @@ -0,0 +1,136 @@ +# Faster R-CNN with Resnet-101 (v1) configured for the Oxford-IIIT Pet Dataset. +# Users should configure the fine_tune_checkpoint field in the train config as +# well as the label_map_path and input_path fields in the train_input_reader and +# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that +# should be configured. 
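
The `manual_step_learning_rate` block in the Inception ResNet v2 config above keeps the learning rate piecewise constant: 3e-4 until step 900000, 3e-5 until step 1200000, and 3e-6 afterwards. A plain-Python sketch of that behaviour — just the arithmetic the schedule entries describe, not the library implementation:

```
# (step, learning_rate) pairs taken from the schedule in the config above.
SCHEDULE = [(0, 3e-4), (900000, 3e-5), (1200000, 3e-6)]

def learning_rate_at(global_step, schedule=SCHEDULE):
    """Returns the learning rate in effect at `global_step`.

    The rate attached to the largest schedule step not exceeding
    `global_step` applies, i.e. the schedule is piecewise constant.
    """
    rate = schedule[0][1]
    for step, lr in schedule:
        if global_step >= step:
            rate = lr
    return rate

assert learning_rate_at(10) == 3e-4
assert learning_rate_at(900000) == 3e-5
assert learning_rate_at(2000000) == 3e-6
```
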
+ +model { + faster_rcnn { + num_classes: 37 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 600 + max_dimension: 1024 + } + } + feature_extractor { + type: 'faster_rcnn_resnet101' + first_stage_features_stride: 16 + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + } +} + +train_config: { + batch_size: 1 + optimizer { + momentum_optimizer: { + learning_rate: { + manual_step_learning_rate { + initial_learning_rate: 0.0003 + schedule { + step: 0 + learning_rate: .0003 + } + schedule { + step: 900000 + learning_rate: .00003 + } + schedule { + step: 1200000 + learning_rate: .000003 + } + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + gradient_clipping_by_norm: 10.0 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt" + from_detection_checkpoint: true + data_augmentation_options { + random_horizontal_flip { + } + } +} + +train_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_train.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" +} + +eval_config: { + num_examples: 2000 +} + +eval_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_val.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" + shuffle: false + num_readers: 1 +} diff --git a/object_detection/samples/configs/faster_rcnn_resnet101_voc07.config b/object_detection/samples/configs/faster_rcnn_resnet101_voc07.config new file mode 100644 index 0000000000000000000000000000000000000000..e236224184dde1d171c45df1b144c2e14d6bc217 --- /dev/null +++ b/object_detection/samples/configs/faster_rcnn_resnet101_voc07.config @@ -0,0 +1,137 @@ +# Faster R-CNN with Resnet-101 (v1), configured for Pascal VOC Dataset. +# Users should configure the fine_tune_checkpoint field in the train config as +# well as the label_map_path and input_path fields in the train_input_reader and +# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that +# should be configured. 
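
Every sample config in this directory leaves checkpoint and dataset locations as the literal placeholder `PATH_TO_BE_CONFIGURED`. One way to fill them in, sketched here with purely hypothetical local paths (any string-substitution tool would do equally well; run from `object_detection/samples/configs/`):

```
# Hypothetical paths; substitute your own checkpoint, TFRecord and label map
# locations before training.
replacements = {
    'PATH_TO_BE_CONFIGURED/model.ckpt': '/data/resnet101/model.ckpt',
    'PATH_TO_BE_CONFIGURED/pet_train.record': '/data/pets/pet_train.record',
    'PATH_TO_BE_CONFIGURED/pet_val.record': '/data/pets/pet_val.record',
    'PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt': '/data/pets/pet_label_map.pbtxt',
}

with open('faster_rcnn_resnet101_pets.config') as f:
    config_text = f.read()
for placeholder, path in replacements.items():
    config_text = config_text.replace(placeholder, path)
with open('faster_rcnn_resnet101_pets_local.config', 'w') as f:
    f.write(config_text)
```
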
+ +model { + faster_rcnn { + num_classes: 20 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 600 + max_dimension: 1024 + } + } + feature_extractor { + type: 'faster_rcnn_resnet101' + first_stage_features_stride: 16 + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + } +} + +train_config: { + batch_size: 1 + optimizer { + momentum_optimizer: { + learning_rate: { + manual_step_learning_rate { + initial_learning_rate: 0.0001 + schedule { + step: 0 + learning_rate: .0001 + } + schedule { + step: 500000 + learning_rate: .00001 + } + schedule { + step: 700000 + learning_rate: .000001 + } + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + gradient_clipping_by_norm: 10.0 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt" + from_detection_checkpoint: true + num_steps: 800000 + data_augmentation_options { + random_horizontal_flip { + } + } +} + +train_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pascal_train.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pascal_label_map.pbtxt" +} + +eval_config: { + num_examples: 4952 +} + +eval_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pascal_val.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pascal_label_map.pbtxt" + shuffle: false + num_readers: 1 +} diff --git a/object_detection/samples/configs/faster_rcnn_resnet152_pets.config b/object_detection/samples/configs/faster_rcnn_resnet152_pets.config new file mode 100644 index 0000000000000000000000000000000000000000..8a466ee6d0a16070bb219d3dc71a6536695a8e6c --- /dev/null +++ b/object_detection/samples/configs/faster_rcnn_resnet152_pets.config @@ -0,0 +1,136 @@ +# Faster R-CNN with Resnet-152 (v1), configured for Oxford-IIIT Pets Dataset. +# Users should configure the fine_tune_checkpoint field in the train config as +# well as the label_map_path and input_path fields in the train_input_reader and +# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that +# should be configured. 
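
All of the Faster R-CNN and R-FCN configs here use a `keep_aspect_ratio_resizer` with `min_dimension: 600` and `max_dimension: 1024`. Assuming the usual interpretation of those two fields — scale the shorter side up to 600, unless that would push the longer side past 1024, in which case clamp the longer side instead — the resize arithmetic looks like this sketch:

```
def keep_aspect_ratio_size(height, width, min_dimension=600, max_dimension=1024):
    """Returns the (new_height, new_width) a keep-aspect-ratio resize would use."""
    # Scale so the shorter side reaches min_dimension...
    scale = float(min_dimension) / min(height, width)
    # ...but back off if that would make the longer side exceed max_dimension.
    if scale * max(height, width) > max_dimension:
        scale = float(max_dimension) / max(height, width)
    return int(round(height * scale)), int(round(width * scale))

print(keep_aspect_ratio_size(375, 500))   # shorter side -> 600: (600, 800)
print(keep_aspect_ratio_size(400, 1000))  # longer side clamped to 1024: (410, 1024)
```
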
+ +model { + faster_rcnn { + num_classes: 37 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 600 + max_dimension: 1024 + } + } + feature_extractor { + type: 'faster_rcnn_resnet152' + first_stage_features_stride: 16 + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + } +} + +train_config: { + batch_size: 1 + optimizer { + momentum_optimizer: { + learning_rate: { + manual_step_learning_rate { + initial_learning_rate: 0.0003 + schedule { + step: 0 + learning_rate: .0003 + } + schedule { + step: 900000 + learning_rate: .00003 + } + schedule { + step: 1200000 + learning_rate: .000003 + } + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + gradient_clipping_by_norm: 10.0 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt" + from_detection_checkpoint: true + data_augmentation_options { + random_horizontal_flip { + } + } +} + +train_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_train.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" +} + +eval_config: { + num_examples: 2000 +} + +eval_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_val.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" + shuffle: false + num_readers: 1 +} diff --git a/object_detection/samples/configs/faster_rcnn_resnet50_pets.config b/object_detection/samples/configs/faster_rcnn_resnet50_pets.config new file mode 100644 index 0000000000000000000000000000000000000000..9764844d794890d43149287515fac6829d2df664 --- /dev/null +++ b/object_detection/samples/configs/faster_rcnn_resnet50_pets.config @@ -0,0 +1,136 @@ +# Faster R-CNN with Resnet-50 (v1), configured for Oxford-IIIT Pets Dataset. +# Users should configure the fine_tune_checkpoint field in the train config as +# well as the label_map_path and input_path fields in the train_input_reader and +# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that +# should be configured. 
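
The `grid_anchor_generator` blocks above combine four scales with three aspect ratios, so each feature-map location gets 12 first-stage anchors. As a rough sketch of how a scale/aspect-ratio pair maps to a box shape — assuming the conventional parameterization in which the ratio is width over height and the nominal base anchor size is 256 pixels, neither of which is spelled out in the config itself:

```
import math

SCALES = [0.25, 0.5, 1.0, 2.0]
ASPECT_RATIOS = [0.5, 1.0, 2.0]  # assumed to mean width / height
BASE_SIZE = 256.0                # assumed nominal anchor size (not in the config)

anchors = []
for scale in SCALES:
    for ratio in ASPECT_RATIOS:
        # Area stays at (scale * base)^2; the aspect ratio redistributes it
        # between width and height.
        height = scale * BASE_SIZE / math.sqrt(ratio)
        width = scale * BASE_SIZE * math.sqrt(ratio)
        anchors.append((round(height), round(width)))

print(len(anchors))  # 12 anchors per grid point
print(anchors[:3])   # the 0.25-scale boxes: (91, 45), (64, 64), (45, 91)
```
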
+ +model { + faster_rcnn { + num_classes: 37 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 600 + max_dimension: 1024 + } + } + feature_extractor { + type: 'faster_rcnn_resnet50' + first_stage_features_stride: 16 + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + } +} + +train_config: { + batch_size: 1 + optimizer { + momentum_optimizer: { + learning_rate: { + manual_step_learning_rate { + initial_learning_rate: 0.0003 + schedule { + step: 0 + learning_rate: .0003 + } + schedule { + step: 900000 + learning_rate: .00003 + } + schedule { + step: 1200000 + learning_rate: .000003 + } + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + gradient_clipping_by_norm: 10.0 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt" + from_detection_checkpoint: true + data_augmentation_options { + random_horizontal_flip { + } + } +} + +train_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_train.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" +} + +eval_config: { + num_examples: 2000 +} + +eval_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_val.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" + shuffle: false + num_readers: 1 +} diff --git a/object_detection/samples/configs/rfcn_resnet101_pets.config b/object_detection/samples/configs/rfcn_resnet101_pets.config new file mode 100644 index 0000000000000000000000000000000000000000..5750563ac14702bd4a28732028f33f0f4613c398 --- /dev/null +++ b/object_detection/samples/configs/rfcn_resnet101_pets.config @@ -0,0 +1,133 @@ +# R-FCN with Resnet-101 (v1), configured for Oxford-IIIT Pets Dataset. +# Users should configure the fine_tune_checkpoint field in the train config as +# well as the label_map_path and input_path fields in the train_input_reader and +# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that +# should be configured. 
+ +model { + faster_rcnn { + num_classes: 37 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 600 + max_dimension: 1024 + } + } + feature_extractor { + type: 'faster_rcnn_resnet101' + first_stage_features_stride: 16 + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + second_stage_box_predictor { + rfcn_box_predictor { + conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + crop_height: 18 + crop_width: 18 + num_spatial_bins_height: 3 + num_spatial_bins_width: 3 + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + } +} + +train_config: { + batch_size: 1 + optimizer { + momentum_optimizer: { + learning_rate: { + manual_step_learning_rate { + initial_learning_rate: 0.0003 + schedule { + step: 0 + learning_rate: .0003 + } + schedule { + step: 900000 + learning_rate: .00003 + } + schedule { + step: 1200000 + learning_rate: .000003 + } + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + gradient_clipping_by_norm: 10.0 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt" + from_detection_checkpoint: true + data_augmentation_options { + random_horizontal_flip { + } + } +} + +train_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_train.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" +} + +eval_config: { + num_examples: 2000 +} + +eval_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_val.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" + shuffle: false + num_readers: 1 +} diff --git a/object_detection/samples/configs/ssd_inception_v2_pets.config b/object_detection/samples/configs/ssd_inception_v2_pets.config new file mode 100644 index 0000000000000000000000000000000000000000..fd799b4ca4e9f15ca8a87cf65f8bc5deaa59f497 --- /dev/null +++ b/object_detection/samples/configs/ssd_inception_v2_pets.config @@ -0,0 +1,182 @@ +# SSD with Inception v2 configured for Oxford-IIIT Pets Dataset. +# Users should configure the fine_tune_checkpoint field in the train config as +# well as the label_map_path and input_path fields in the train_input_reader and +# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that +# should be configured. 
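
Every config in this set post-processes detections with `batch_non_max_suppression`, keeping at most `max_detections_per_class` boxes whose scores pass `score_threshold` and suppressing any box that overlaps a higher-scoring box by more than `iou_threshold` (0.6 here). A minimal single-class NumPy sketch of that greedy procedure, written independently of the TensorFlow implementation:

```
import numpy as np

def greedy_nms(boxes, scores, iou_threshold=0.6, score_threshold=0.0,
               max_detections=100):
    """Greedy non-max suppression for a single class.

    boxes: [N, 4] array of [ymin, xmin, ymax, xmax]; scores: [N] array.
    Returns indices of the kept boxes, highest score first.
    """
    keep_mask = scores > score_threshold
    order = np.argsort(-scores[keep_mask])
    candidates = np.flatnonzero(keep_mask)[order]
    area = lambda b: (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    kept = []
    while candidates.size and len(kept) < max_detections:
        current = candidates[0]
        kept.append(current)
        rest = candidates[1:]
        # Intersection-over-union of `current` with the remaining candidates.
        ymin = np.maximum(boxes[current, 0], boxes[rest, 0])
        xmin = np.maximum(boxes[current, 1], boxes[rest, 1])
        ymax = np.minimum(boxes[current, 2], boxes[rest, 2])
        xmax = np.minimum(boxes[current, 3], boxes[rest, 3])
        inter = np.maximum(ymax - ymin, 0) * np.maximum(xmax - xmin, 0)
        iou = inter / (area(boxes[[current]]) + area(boxes[rest]) - inter)
        candidates = rest[iou <= iou_threshold]
    return kept

boxes = np.array([[0., 0., 1., 1.], [0., 0.05, 1., 1.05], [0., 2., 1., 3.]])
scores = np.array([0.9, 0.8, 0.7])
print(greedy_nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 above 0.6 IoU
```
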
+ +model { + ssd { + num_classes: 37 + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + } + } + similarity_calculator { + iou_similarity { + } + } + anchor_generator { + ssd_anchor_generator { + num_layers: 6 + min_scale: 0.2 + max_scale: 0.95 + aspect_ratios: 1.0 + aspect_ratios: 2.0 + aspect_ratios: 0.5 + aspect_ratios: 3.0 + aspect_ratios: 0.3333 + reduce_boxes_in_lowest_layer: true + } + } + image_resizer { + fixed_shape_resizer { + height: 300 + width: 300 + } + } + box_predictor { + convolutional_box_predictor { + min_depth: 0 + max_depth: 0 + num_layers_before_predictor: 0 + use_dropout: false + dropout_keep_probability: 0.8 + kernel_size: 3 + box_code_size: 4 + apply_sigmoid_to_scores: false + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + } + } + } + feature_extractor { + type: 'ssd_inception_v2' + min_depth: 16 + depth_multiplier: 1.0 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + train: true, + scale: true, + center: true, + decay: 0.9997, + epsilon: 0.001, + } + } + } + loss { + classification_loss { + weighted_sigmoid { + anchorwise_output: true + } + } + localization_loss { + weighted_smooth_l1 { + anchorwise_output: true + } + } + hard_example_miner { + num_hard_examples: 3000 + iou_threshold: 0.99 + loss_type: CLASSIFICATION + max_negatives_per_positive: 3 + min_negatives_per_image: 0 + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + batch_size: 24 + optimizer { + rms_prop_optimizer: { + learning_rate: { + exponential_decay_learning_rate { + initial_learning_rate: 0.004 + decay_steps: 800720 + decay_factor: 0.95 + } + } + momentum_optimizer_value: 0.9 + decay: 0.9 + epsilon: 1.0 + } + } + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt" + from_detection_checkpoint: true + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + ssd_random_crop { + } + } +} + +train_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_train.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" +} + +eval_config: { + num_examples: 2000 +} + +eval_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_val.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" + shuffle: false + num_readers: 1 +} diff --git a/object_detection/samples/configs/ssd_mobilenet_v1_pets.config b/object_detection/samples/configs/ssd_mobilenet_v1_pets.config new file mode 100644 index 0000000000000000000000000000000000000000..8aeb73870b6d2a726d2f72b1b1848dd07fac2d5a --- /dev/null +++ b/object_detection/samples/configs/ssd_mobilenet_v1_pets.config @@ -0,0 +1,188 @@ +# SSD with Mobilenet v1, configured for Oxford-IIIT Pets Dataset. 
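
The `ssd_anchor_generator` settings used above (`num_layers: 6`, `min_scale: 0.2`, `max_scale: 0.95`) follow the SSD paper's convention of spacing anchor scales linearly between the two extremes across the feature-map layers, with `reduce_boxes_in_lowest_layer: true` additionally treating the first layer specially as described in `ssd_anchor_generator.proto` earlier in this change. A sketch of the linear spacing only; the per-layer box construction is left to `multiple_grid_anchor_generator.py`:

```
NUM_LAYERS = 6
MIN_SCALE, MAX_SCALE = 0.2, 0.95

# Linear interpolation of scales across layers, as in the SSD paper.
scales = [MIN_SCALE + (MAX_SCALE - MIN_SCALE) * k / (NUM_LAYERS - 1)
          for k in range(NUM_LAYERS)]
print([round(s, 2) for s in scales])  # [0.2, 0.35, 0.5, 0.65, 0.8, 0.95]
```
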
+# Users should configure the fine_tune_checkpoint field in the train config as +# well as the label_map_path and input_path fields in the train_input_reader and +# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that +# should be configured. + +model { + ssd { + num_classes: 37 + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + } + } + similarity_calculator { + iou_similarity { + } + } + anchor_generator { + ssd_anchor_generator { + num_layers: 6 + min_scale: 0.2 + max_scale: 0.95 + aspect_ratios: 1.0 + aspect_ratios: 2.0 + aspect_ratios: 0.5 + aspect_ratios: 3.0 + aspect_ratios: 0.3333 + } + } + image_resizer { + fixed_shape_resizer { + height: 300 + width: 300 + } + } + box_predictor { + convolutional_box_predictor { + min_depth: 0 + max_depth: 0 + num_layers_before_predictor: 0 + use_dropout: false + dropout_keep_probability: 0.8 + kernel_size: 1 + box_code_size: 4 + apply_sigmoid_to_scores: false + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + train: true, + scale: true, + center: true, + decay: 0.9997, + epsilon: 0.001, + } + } + } + } + feature_extractor { + type: 'ssd_mobilenet_v1' + min_depth: 16 + depth_multiplier: 1.0 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + train: true, + scale: true, + center: true, + decay: 0.9997, + epsilon: 0.001, + } + } + } + loss { + classification_loss { + weighted_sigmoid { + anchorwise_output: true + } + } + localization_loss { + weighted_smooth_l1 { + anchorwise_output: true + } + } + hard_example_miner { + num_hard_examples: 3000 + iou_threshold: 0.99 + loss_type: CLASSIFICATION + max_negatives_per_positive: 3 + min_negatives_per_image: 0 + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + batch_size: 24 + optimizer { + rms_prop_optimizer: { + learning_rate: { + exponential_decay_learning_rate { + initial_learning_rate: 0.004 + decay_steps: 800720 + decay_factor: 0.95 + } + } + momentum_optimizer_value: 0.9 + decay: 0.9 + epsilon: 1.0 + } + } + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt" + from_detection_checkpoint: true + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + ssd_random_crop { + } + } +} + +train_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_train.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" +} + +eval_config: { + num_examples: 2000 +} + +eval_input_reader: { + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/pet_val.record" + } + label_map_path: "PATH_TO_BE_CONFIGURED/pet_label_map.pbtxt" + shuffle: false + num_readers: 1 +} diff --git a/object_detection/test_images/image1.jpg b/object_detection/test_images/image1.jpg new file mode 100644 index 
0000000000000000000000000000000000000000..8b20d8af3e195be7f4c212e31102cada9248dcde Binary files /dev/null and b/object_detection/test_images/image1.jpg differ diff --git a/object_detection/test_images/image2.jpg b/object_detection/test_images/image2.jpg new file mode 100644 index 0000000000000000000000000000000000000000..9eb325ac5fc375cb2513380087dd713be9be19d8 Binary files /dev/null and b/object_detection/test_images/image2.jpg differ diff --git a/object_detection/test_images/image_info.txt b/object_detection/test_images/image_info.txt new file mode 100644 index 0000000000000000000000000000000000000000..6f805cbcd27405940398f24f2a1a4538e197e108 --- /dev/null +++ b/object_detection/test_images/image_info.txt @@ -0,0 +1,6 @@ + +Image provenance: +image1.jpg: https://commons.wikimedia.org/wiki/File:Baegle_dwa.jpg +image2.jpg: Michael Miley, + https://www.flickr.com/photos/mike_miley/4678754542/in/photolist-88rQHL-88oBVp-88oC2B-88rS6J-88rSqm-88oBLv-88oBC4 + diff --git a/object_detection/train.py b/object_detection/train.py new file mode 100644 index 0000000000000000000000000000000000000000..f2e823afbee0fd8095df76360840243f8ca83db6 --- /dev/null +++ b/object_detection/train.py @@ -0,0 +1,198 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r"""Training executable for detection models. + +This executable is used to train DetectionModels. There are two ways of +configuring the training job: + +1) A single pipeline_pb2.TrainEvalPipelineConfig configuration file +can be specified by --pipeline_config_path. + +Example usage: + ./train \ + --logtostderr \ + --train_dir=path/to/train_dir \ + --pipeline_config_path=pipeline_config.pbtxt + +2) Three configuration files can be provided: a model_pb2.DetectionModel +configuration file to define what type of DetectionModel is being trained, an +input_reader_pb2.InputReader file to specify what training data will be used and +a train_pb2.TrainConfig file to configure training parameters. 
+ +Example usage: + ./train \ + --logtostderr \ + --train_dir=path/to/train_dir \ + --model_config_path=model_config.pbtxt \ + --train_config_path=train_config.pbtxt \ + --input_config_path=train_input_config.pbtxt +""" + +import functools +import json +import os +import tensorflow as tf + +from google.protobuf import text_format + +from object_detection import trainer +from object_detection.builders import input_reader_builder +from object_detection.builders import model_builder +from object_detection.protos import input_reader_pb2 +from object_detection.protos import model_pb2 +from object_detection.protos import pipeline_pb2 +from object_detection.protos import train_pb2 + +tf.logging.set_verbosity(tf.logging.INFO) + +flags = tf.app.flags +flags.DEFINE_string('master', '', 'BNS name of the TensorFlow master to use.') +flags.DEFINE_integer('task', 0, 'task id') +flags.DEFINE_integer('num_clones', 1, 'Number of clones to deploy per worker.') +flags.DEFINE_boolean('clone_on_cpu', False, + 'Force clones to be deployed on CPU. Note that even if ' + 'set to False (allowing ops to run on gpu), some ops may ' + 'still be run on the CPU if they have no GPU kernel.') +flags.DEFINE_integer('worker_replicas', 1, 'Number of worker+trainer ' + 'replicas.') +flags.DEFINE_integer('ps_tasks', 0, + 'Number of parameter server tasks. If None, does not use ' + 'a parameter server.') +flags.DEFINE_string('train_dir', '', + 'Directory to save the checkpoints and training summaries.') + +flags.DEFINE_string('pipeline_config_path', '', + 'Path to a pipeline_pb2.TrainEvalPipelineConfig config ' + 'file. If provided, other configs are ignored') + +flags.DEFINE_string('train_config_path', '', + 'Path to a train_pb2.TrainConfig config file.') +flags.DEFINE_string('input_config_path', '', + 'Path to an input_reader_pb2.InputReader config file.') +flags.DEFINE_string('model_config_path', '', + 'Path to a model_pb2.DetectionModel config file.') + +FLAGS = flags.FLAGS + + +def get_configs_from_pipeline_file(): + """Reads training configuration from a pipeline_pb2.TrainEvalPipelineConfig. + + Reads training config from file specified by pipeline_config_path flag. + + Returns: + model_config: model_pb2.DetectionModel + train_config: train_pb2.TrainConfig + input_config: input_reader_pb2.InputReader + """ + pipeline_config = pipeline_pb2.TrainEvalPipelineConfig() + with tf.gfile.GFile(FLAGS.pipeline_config_path, 'r') as f: + text_format.Merge(f.read(), pipeline_config) + + model_config = pipeline_config.model + train_config = pipeline_config.train_config + input_config = pipeline_config.train_input_reader + + return model_config, train_config, input_config + + +def get_configs_from_multiple_files(): + """Reads training configuration from multiple config files. 
+ + Reads the training config from the following files: + model_config: Read from --model_config_path + train_config: Read from --train_config_path + input_config: Read from --input_config_path + + Returns: + model_config: model_pb2.DetectionModel + train_config: train_pb2.TrainConfig + input_config: input_reader_pb2.InputReader + """ + train_config = train_pb2.TrainConfig() + with tf.gfile.GFile(FLAGS.train_config_path, 'r') as f: + text_format.Merge(f.read(), train_config) + + model_config = model_pb2.DetectionModel() + with tf.gfile.GFile(FLAGS.model_config_path, 'r') as f: + text_format.Merge(f.read(), model_config) + + input_config = input_reader_pb2.InputReader() + with tf.gfile.GFile(FLAGS.input_config_path, 'r') as f: + text_format.Merge(f.read(), input_config) + + return model_config, train_config, input_config + + +def main(_): + assert FLAGS.train_dir, '`train_dir` is missing.' + if FLAGS.pipeline_config_path: + model_config, train_config, input_config = get_configs_from_pipeline_file() + else: + model_config, train_config, input_config = get_configs_from_multiple_files() + + model_fn = functools.partial( + model_builder.build, + model_config=model_config, + is_training=True) + + create_input_dict_fn = functools.partial( + input_reader_builder.build, input_config) + + env = json.loads(os.environ.get('TF_CONFIG', '{}')) + cluster_data = env.get('cluster', None) + cluster = tf.train.ClusterSpec(cluster_data) if cluster_data else None + task_data = env.get('task', None) or {'type': 'master', 'index': 0} + task_info = type('TaskSpec', (object,), task_data) + + # Parameters for a single worker. + ps_tasks = 0 + worker_replicas = 1 + worker_job_name = 'lonely_worker' + task = 0 + is_chief = True + master = '' + + if cluster_data and 'worker' in cluster_data: + # Number of total worker replicas include "worker"s and the "master". + worker_replicas = len(cluster_data['worker']) + 1 + if cluster_data and 'ps' in cluster_data: + ps_tasks = len(cluster_data['ps']) + + if worker_replicas > 1 and ps_tasks < 1: + raise ValueError('At least 1 ps task is needed for distributed training.') + + if worker_replicas >= 1 and ps_tasks > 0: + # Set up distributed training. + server = tf.train.Server(tf.train.ClusterSpec(cluster), protocol='grpc', + job_name=task_info.type, + task_index=task_info.index) + if task_info.type == 'ps': + server.join() + return + + worker_job_name = '%s/task:%d' % (task_info.type, task_info.index) + task = task_info.index + is_chief = (task_info.type == 'master') + master = server.target + + trainer.train(create_input_dict_fn, model_fn, train_config, master, task, + FLAGS.num_clones, worker_replicas, FLAGS.clone_on_cpu, ps_tasks, + worker_job_name, is_chief, FLAGS.train_dir) + + +if __name__ == '__main__': + tf.app.run() diff --git a/object_detection/trainer.py b/object_detection/trainer.py new file mode 100644 index 0000000000000000000000000000000000000000..1c681e3437f28b6f314eccbe79b2bca9df6e46e3 --- /dev/null +++ b/object_detection/trainer.py @@ -0,0 +1,290 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
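
`main()` in `train.py` above decides between local and distributed training by inspecting the `TF_CONFIG` environment variable: the `cluster` entry lists the `master`, `worker` and `ps` jobs, and the `task` entry names the role of the current process. A hypothetical value for a three-machine setup, shown only to illustrate the JSON shape that code expects:

```
import json
import os

# Hypothetical cluster: one master, one worker, one parameter server.
tf_config = {
    'cluster': {
        'master': ['10.0.0.1:2222'],
        'worker': ['10.0.0.2:2222'],
        'ps': ['10.0.0.3:2222'],
    },
    # Each process sets its own role; this one would act as the chief.
    'task': {'type': 'master', 'index': 0},
}
os.environ['TF_CONFIG'] = json.dumps(tf_config)
```

With a value like this in place, `main()` would compute `worker_replicas` as `len(cluster['worker']) + 1 = 2` and `ps_tasks` as 1, which satisfies its check that distributed training has at least one parameter server.
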
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Detection model trainer. + +This file provides a generic training method that can be used to train a +DetectionModel. +""" + +import functools + +import tensorflow as tf + +from object_detection.builders import optimizer_builder +from object_detection.builders import preprocessor_builder +from object_detection.core import batcher +from object_detection.core import preprocessor +from object_detection.core import standard_fields as fields +from object_detection.utils import ops as util_ops +from object_detection.utils import variables_helper +from deployment import model_deploy + +slim = tf.contrib.slim + + +def _create_input_queue(batch_size_per_clone, create_tensor_dict_fn, + batch_queue_capacity, num_batch_queue_threads, + prefetch_queue_capacity, data_augmentation_options): + """Sets up reader, prefetcher and returns input queue. + + Args: + batch_size_per_clone: batch size to use per clone. + create_tensor_dict_fn: function to create tensor dictionary. + batch_queue_capacity: maximum number of elements to store within a queue. + num_batch_queue_threads: number of threads to use for batching. + prefetch_queue_capacity: maximum capacity of the queue used to prefetch + assembled batches. + data_augmentation_options: a list of tuples, where each tuple contains a + data augmentation function and a dictionary containing arguments and their + values (see preprocessor.py). + + Returns: + input queue: a batcher.BatchQueue object holding enqueued tensor_dicts + (which hold images, boxes and targets). To get a batch of tensor_dicts, + call input_queue.Dequeue(). + """ + tensor_dict = create_tensor_dict_fn() + + tensor_dict[fields.InputDataFields.image] = tf.expand_dims( + tensor_dict[fields.InputDataFields.image], 0) + + images = tensor_dict[fields.InputDataFields.image] + float_images = tf.to_float(images) + tensor_dict[fields.InputDataFields.image] = float_images + + if data_augmentation_options: + tensor_dict = preprocessor.preprocess(tensor_dict, + data_augmentation_options) + + input_queue = batcher.BatchQueue( + tensor_dict, + batch_size=batch_size_per_clone, + batch_queue_capacity=batch_queue_capacity, + num_batch_queue_threads=num_batch_queue_threads, + prefetch_queue_capacity=prefetch_queue_capacity) + return input_queue + + +def _get_inputs(input_queue, num_classes): + """Dequeue batch and construct inputs to object detection model. + + Args: + input_queue: BatchQueue object holding enqueued tensor_dicts. + num_classes: Number of classes. + + Returns: + images: a list of 3-D float tensor of images. + locations_list: a list of tensors of shape [num_boxes, 4] + containing the corners of the groundtruth boxes. + classes_list: a list of padded one-hot tensors containing target classes. + masks_list: a list of 3-D float tensors of shape [num_boxes, image_height, + image_width] containing instance masks for objects if present in the + input_queue. Else returns None. 
+ """ + read_data_list = input_queue.dequeue() + label_id_offset = 1 + def extract_images_and_targets(read_data): + image = read_data[fields.InputDataFields.image] + location_gt = read_data[fields.InputDataFields.groundtruth_boxes] + classes_gt = tf.cast(read_data[fields.InputDataFields.groundtruth_classes], + tf.int32) + classes_gt -= label_id_offset + classes_gt = util_ops.padded_one_hot_encoding(indices=classes_gt, + depth=num_classes, left_pad=0) + masks_gt = read_data.get(fields.InputDataFields.groundtruth_instance_masks) + return image, location_gt, classes_gt, masks_gt + return zip(*map(extract_images_and_targets, read_data_list)) + + +def _create_losses(input_queue, create_model_fn): + """Creates loss function for a DetectionModel. + + Args: + input_queue: BatchQueue object holding enqueued tensor_dicts. + create_model_fn: A function to create the DetectionModel. + """ + detection_model = create_model_fn() + (images, groundtruth_boxes_list, groundtruth_classes_list, + groundtruth_masks_list + ) = _get_inputs(input_queue, detection_model.num_classes) + images = [detection_model.preprocess(image) for image in images] + images = tf.concat(images, 0) + if any(mask is None for mask in groundtruth_masks_list): + groundtruth_masks_list = None + + detection_model.provide_groundtruth(groundtruth_boxes_list, + groundtruth_classes_list, + groundtruth_masks_list) + prediction_dict = detection_model.predict(images) + + losses_dict = detection_model.loss(prediction_dict) + for loss_tensor in losses_dict.values(): + tf.losses.add_loss(loss_tensor) + + +def train(create_tensor_dict_fn, create_model_fn, train_config, master, task, + num_clones, worker_replicas, clone_on_cpu, ps_tasks, worker_job_name, + is_chief, train_dir): + """Training function for detection models. + + Args: + create_tensor_dict_fn: a function to create a tensor input dictionary. + create_model_fn: a function that creates a DetectionModel and generates + losses. + train_config: a train_pb2.TrainConfig protobuf. + master: BNS name of the TensorFlow master to use. + task: The task id of this training instance. + num_clones: The number of clones to run per machine. + worker_replicas: The number of work replicas to train with. + clone_on_cpu: True if clones should be forced to run on CPU. + ps_tasks: Number of parameter server tasks. + worker_job_name: Name of the worker job. + is_chief: Whether this replica is the chief replica. + train_dir: Directory to write checkpoints and training summaries to. + """ + + detection_model = create_model_fn() + data_augmentation_options = [ + preprocessor_builder.build(step) + for step in train_config.data_augmentation_options] + + with tf.Graph().as_default(): + # Build a configuration specifying multi-GPU and multi-replicas. + deploy_config = model_deploy.DeploymentConfig( + num_clones=num_clones, + clone_on_cpu=clone_on_cpu, + replica_id=task, + num_replicas=worker_replicas, + num_ps_tasks=ps_tasks, + worker_job_name=worker_job_name) + + # Place the global step on the device storing the variables. + with tf.device(deploy_config.variables_device()): + global_step = slim.create_global_step() + + with tf.device(deploy_config.inputs_device()): + input_queue = _create_input_queue(train_config.batch_size // num_clones, + create_tensor_dict_fn, + train_config.batch_queue_capacity, + train_config.num_batch_queue_threads, + train_config.prefetch_queue_capacity, + data_augmentation_options) + + # Gather initial summaries. 
+ summaries = set(tf.get_collection(tf.GraphKeys.SUMMARIES)) + global_summaries = set([]) + + model_fn = functools.partial(_create_losses, + create_model_fn=create_model_fn) + clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue]) + first_clone_scope = clones[0].scope + + # Gather update_ops from the first clone. These contain, for example, + # the updates for the batch_norm variables created by model_fn. + update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, first_clone_scope) + + with tf.device(deploy_config.optimizer_device()): + training_optimizer = optimizer_builder.build(train_config.optimizer, + global_summaries) + + sync_optimizer = None + if train_config.sync_replicas: + training_optimizer = tf.SyncReplicasOptimizer( + training_optimizer, + replicas_to_aggregate=train_config.replicas_to_aggregate, + total_num_replicas=train_config.worker_replicas) + sync_optimizer = training_optimizer + + # Create ops required to initialize the model from a given checkpoint. + init_fn = None + if train_config.fine_tune_checkpoint: + init_fn = detection_model.restore_fn( + train_config.fine_tune_checkpoint, + from_detection_checkpoint=train_config.from_detection_checkpoint) + + with tf.device(deploy_config.optimizer_device()): + total_loss, grads_and_vars = model_deploy.optimize_clones( + clones, training_optimizer, regularization_losses=None) + total_loss = tf.check_numerics(total_loss, 'LossTensor is inf or nan.') + + # Optionally multiply bias gradients by train_config.bias_grad_multiplier. + if train_config.bias_grad_multiplier: + biases_regex_list = ['.*/biases'] + grads_and_vars = variables_helper.multiply_gradients_matching_regex( + grads_and_vars, + biases_regex_list, + multiplier=train_config.bias_grad_multiplier) + + # Optionally freeze some layers by setting their gradients to be zero. + if train_config.freeze_variables: + grads_and_vars = variables_helper.freeze_gradients_matching_regex( + grads_and_vars, train_config.freeze_variables) + + # Optionally clip gradients + if train_config.gradient_clipping_by_norm > 0: + with tf.name_scope('clip_grads'): + grads_and_vars = slim.learning.clip_gradient_norms( + grads_and_vars, train_config.gradient_clipping_by_norm) + + # Create gradient updates. + grad_updates = training_optimizer.apply_gradients(grads_and_vars, + global_step=global_step) + update_ops.append(grad_updates) + + update_op = tf.group(*update_ops) + with tf.control_dependencies([update_op]): + train_tensor = tf.identity(total_loss, name='train_op') + + # Add summaries. + for model_var in slim.get_model_variables(): + global_summaries.add(tf.summary.histogram(model_var.op.name, model_var)) + for loss_tensor in tf.losses.get_losses(): + global_summaries.add(tf.summary.scalar(loss_tensor.op.name, loss_tensor)) + global_summaries.add( + tf.summary.scalar('TotalLoss', tf.losses.get_total_loss())) + + # Add the summaries from the first clone. These contain the summaries + # created by model_fn and either optimize_clones() or _gather_clone_loss(). + summaries |= set(tf.get_collection(tf.GraphKeys.SUMMARIES, + first_clone_scope)) + summaries |= global_summaries + + # Merge all summaries together. + summary_op = tf.summary.merge(list(summaries), name='summary_op') + + # Soft placement allows placing on CPU ops without GPU implementation. + session_config = tf.ConfigProto(allow_soft_placement=True, + log_device_placement=False) + + # Save checkpoints regularly. 
+ keep_checkpoint_every_n_hours = train_config.keep_checkpoint_every_n_hours + saver = tf.train.Saver( + keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours) + + slim.learning.train( + train_tensor, + logdir=train_dir, + master=master, + is_chief=is_chief, + session_config=session_config, + startup_delay_steps=train_config.startup_delay_steps, + init_fn=init_fn, + summary_op=summary_op, + number_of_steps=( + train_config.num_steps if train_config.num_steps else None), + save_summaries_secs=120, + sync_optimizer=sync_optimizer, + saver=saver) diff --git a/object_detection/trainer_test.py b/object_detection/trainer_test.py new file mode 100644 index 0000000000000000000000000000000000000000..36e92752a41e97ac065739a45a11400f8a7d5d16 --- /dev/null +++ b/object_detection/trainer_test.py @@ -0,0 +1,205 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.trainer.""" + +import tensorflow as tf + +from google.protobuf import text_format + +from object_detection import trainer +from object_detection.core import losses +from object_detection.core import model +from object_detection.core import standard_fields as fields +from object_detection.protos import train_pb2 + + +NUMBER_OF_CLASSES = 2 + + +def get_input_function(): + """A function to get test inputs. Returns an image with one box.""" + image = tf.random_uniform([32, 32, 3], dtype=tf.float32) + class_label = tf.random_uniform( + [1], minval=0, maxval=NUMBER_OF_CLASSES, dtype=tf.int32) + box_label = tf.random_uniform( + [1, 4], minval=0.4, maxval=0.6, dtype=tf.float32) + + return { + fields.InputDataFields.image: image, + fields.InputDataFields.groundtruth_classes: class_label, + fields.InputDataFields.groundtruth_boxes: box_label + } + + +class FakeDetectionModel(model.DetectionModel): + """A simple (and poor) DetectionModel for use in test.""" + + def __init__(self): + super(FakeDetectionModel, self).__init__(num_classes=NUMBER_OF_CLASSES) + self._classification_loss = losses.WeightedSigmoidClassificationLoss( + anchorwise_output=True) + self._localization_loss = losses.WeightedSmoothL1LocalizationLoss( + anchorwise_output=True) + + def preprocess(self, inputs): + """Input preprocessing, resizes images to 28x28. + + Args: + inputs: a [batch, height_in, width_in, channels] float32 tensor + representing a batch of images with values between 0 and 255.0. + + Returns: + preprocessed_inputs: a [batch, 28, 28, channels] float32 tensor. + """ + return tf.image.resize_images(inputs, [28, 28]) + + def predict(self, preprocessed_inputs): + """Prediction tensors from inputs tensor. + + Args: + preprocessed_inputs: a [batch, 28, 28, channels] float32 tensor. + + Returns: + prediction_dict: a dictionary holding prediction tensors to be + passed to the Loss or Postprocess functions. 
+ """ + flattened_inputs = tf.contrib.layers.flatten(preprocessed_inputs) + class_prediction = tf.contrib.layers.fully_connected( + flattened_inputs, self._num_classes) + box_prediction = tf.contrib.layers.fully_connected(flattened_inputs, 4) + + return { + 'class_predictions_with_background': tf.reshape( + class_prediction, [-1, 1, self._num_classes]), + 'box_encodings': tf.reshape(box_prediction, [-1, 1, 4]) + } + + def postprocess(self, prediction_dict, **params): + """Convert predicted output tensors to final detections. Unused. + + Args: + prediction_dict: a dictionary holding prediction tensors. + **params: Additional keyword arguments for specific implementations of + DetectionModel. + + Returns: + detections: a dictionary with empty fields. + """ + return { + 'detection_boxes': None, + 'detection_scores': None, + 'detection_classes': None, + 'num_detections': None + } + + def loss(self, prediction_dict): + """Compute scalar loss tensors with respect to provided groundtruth. + + Calling this function requires that groundtruth tensors have been + provided via the provide_groundtruth function. + + Args: + prediction_dict: a dictionary holding predicted tensors + + Returns: + a dictionary mapping strings (loss names) to scalar tensors representing + loss values. + """ + batch_reg_targets = tf.stack( + self.groundtruth_lists(fields.BoxListFields.boxes)) + batch_cls_targets = tf.stack( + self.groundtruth_lists(fields.BoxListFields.classes)) + weights = tf.constant( + 1.0, dtype=tf.float32, + shape=[len(self.groundtruth_lists(fields.BoxListFields.boxes)), 1]) + + location_losses = self._localization_loss( + prediction_dict['box_encodings'], batch_reg_targets, + weights=weights) + cls_losses = self._classification_loss( + prediction_dict['class_predictions_with_background'], batch_cls_targets, + weights=weights) + + loss_dict = { + 'localization_loss': tf.reduce_sum(location_losses), + 'classification_loss': tf.reduce_sum(cls_losses), + } + return loss_dict + + def restore_fn(self, checkpoint_path, from_detection_checkpoint=True): + """Return callable for loading a checkpoint into the tensorflow graph. + + Args: + checkpoint_path: path to checkpoint to restore. + from_detection_checkpoint: whether to restore from a full detection + checkpoint (with compatible variable names) or to restore from a + classification checkpoint for initialization prior to training. + + Returns: + a callable which takes a tf.Session and does nothing. 
+ """ + def restore(unused_sess): + return + return restore + + +class TrainerTest(tf.test.TestCase): + + def test_configure_trainer_and_train_two_steps(self): + train_config_text_proto = """ + optimizer { + adam_optimizer { + learning_rate { + constant_learning_rate { + learning_rate: 0.01 + } + } + } + } + data_augmentation_options { + random_adjust_brightness { + max_delta: 0.2 + } + } + data_augmentation_options { + random_adjust_contrast { + min_delta: 0.7 + max_delta: 1.1 + } + } + num_steps: 2 + """ + train_config = train_pb2.TrainConfig() + text_format.Merge(train_config_text_proto, train_config) + + train_dir = self.get_temp_dir() + + trainer.train(create_tensor_dict_fn=get_input_function, + create_model_fn=FakeDetectionModel, + train_config=train_config, + master='', + task=0, + num_clones=1, + worker_replicas=1, + clone_on_cpu=True, + ps_tasks=0, + worker_job_name='worker', + is_chief=True, + train_dir=train_dir) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/BUILD b/object_detection/utils/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..dc71a38c91dcfea08fd0163cd50450a2aeaa9d5c --- /dev/null +++ b/object_detection/utils/BUILD @@ -0,0 +1,287 @@ +# Tensorflow Object Detection API: Utility functions. + +package( + default_visibility = ["//visibility:public"], +) + +licenses(["notice"]) + +# Apache 2.0 + +py_library( + name = "category_util", + srcs = ["category_util.py"], + deps = ["//tensorflow"], +) + +py_library( + name = "dataset_util", + srcs = ["dataset_util.py"], + deps = [ + "//tensorflow", + ], +) + +py_library( + name = "label_map_util", + srcs = ["label_map_util.py"], + deps = [ + "//third_party/py/google/protobuf", + "//tensorflow", + "//tensorflow_models/object_detection/protos:string_int_label_map_py_pb2", + ], +) + +py_library( + name = "learning_schedules", + srcs = ["learning_schedules.py"], + deps = ["//tensorflow"], +) + +py_library( + name = "metrics", + srcs = ["metrics.py"], + deps = ["//third_party/py/numpy"], +) + +py_library( + name = "np_box_list", + srcs = ["np_box_list.py"], + deps = ["//tensorflow"], +) + +py_library( + name = "np_box_list_ops", + srcs = ["np_box_list_ops.py"], + deps = [ + ":np_box_list", + ":np_box_ops", + "//tensorflow", + ], +) + +py_library( + name = "np_box_ops", + srcs = ["np_box_ops.py"], + deps = ["//tensorflow"], +) + +py_library( + name = "object_detection_evaluation", + srcs = ["object_detection_evaluation.py"], + deps = [ + ":metrics", + ":per_image_evaluation", + "//tensorflow", + ], +) + +py_library( + name = "ops", + srcs = ["ops.py"], + deps = [ + ":static_shape", + "//tensorflow", + "//tensorflow_models/object_detection/core:box_list", + "//tensorflow_models/object_detection/core:box_list_ops", + "//tensorflow_models/object_detection/core:standard_fields", + ], +) + +py_library( + name = "per_image_evaluation", + srcs = ["per_image_evaluation.py"], + deps = [ + ":np_box_list", + ":np_box_list_ops", + "//tensorflow", + ], +) + +py_library( + name = "shape_utils", + srcs = ["shape_utils.py"], + deps = ["//tensorflow"], +) + +py_library( + name = "static_shape", + srcs = ["static_shape.py"], + deps = [], +) + +py_library( + name = "test_utils", + srcs = ["test_utils.py"], + deps = [ + "//tensorflow", + "//tensorflow_models/object_detection/core:anchor_generator", + "//tensorflow_models/object_detection/core:box_coder", + "//tensorflow_models/object_detection/core:box_list", + "//tensorflow_models/object_detection/core:box_predictor", + 
"//tensorflow_models/object_detection/core:matcher", + ], +) + +py_library( + name = "variables_helper", + srcs = ["variables_helper.py"], + deps = [ + "//tensorflow", + ], +) + +py_library( + name = "visualization_utils", + srcs = ["visualization_utils.py"], + deps = [ + "//third_party/py/PIL:pil", + "//tensorflow", + ], +) + +py_test( + name = "category_util_test", + srcs = ["category_util_test.py"], + deps = [ + ":category_util", + "//tensorflow", + ], +) + +py_test( + name = "dataset_util_test", + srcs = ["dataset_util_test.py"], + deps = [ + ":dataset_util", + "//tensorflow", + ], +) + +py_test( + name = "label_map_util_test", + srcs = ["label_map_util_test.py"], + deps = [ + ":label_map_util", + "//tensorflow", + ], +) + +py_test( + name = "learning_schedules_test", + srcs = ["learning_schedules_test.py"], + deps = [ + ":learning_schedules", + "//tensorflow", + ], +) + +py_test( + name = "metrics_test", + srcs = ["metrics_test.py"], + deps = [ + ":metrics", + "//tensorflow", + ], +) + +py_test( + name = "np_box_list_test", + srcs = ["np_box_list_test.py"], + deps = [ + ":np_box_list", + "//tensorflow", + ], +) + +py_test( + name = "np_box_list_ops_test", + srcs = ["np_box_list_ops_test.py"], + deps = [ + ":np_box_list", + ":np_box_list_ops", + "//tensorflow", + ], +) + +py_test( + name = "np_box_ops_test", + srcs = ["np_box_ops_test.py"], + deps = [ + ":np_box_ops", + "//tensorflow", + ], +) + +py_test( + name = "object_detection_evaluation_test", + srcs = ["object_detection_evaluation_test.py"], + deps = [ + ":object_detection_evaluation", + "//tensorflow", + ], +) + +py_test( + name = "ops_test", + srcs = ["ops_test.py"], + deps = [ + ":ops", + "//tensorflow", + "//tensorflow_models/object_detection/core:standard_fields", + ], +) + +py_test( + name = "per_image_evaluation_test", + srcs = ["per_image_evaluation_test.py"], + deps = [ + ":per_image_evaluation", + "//tensorflow", + ], +) + +py_test( + name = "shape_utils_test", + srcs = ["shape_utils_test.py"], + deps = [ + ":shape_utils", + "//tensorflow", + ], +) + +py_test( + name = "static_shape_test", + srcs = ["static_shape_test.py"], + deps = [ + ":static_shape", + "//tensorflow", + ], +) + +py_test( + name = "test_utils_test", + srcs = ["test_utils_test.py"], + deps = [ + ":test_utils", + "//tensorflow", + ], +) + +py_test( + name = "variables_helper_test", + srcs = ["variables_helper_test.py"], + deps = [ + ":variables_helper", + "//tensorflow", + ], +) + +py_test( + name = "visualization_utils_test", + srcs = ["visualization_utils_test.py"], + deps = [ + ":visualization_utils", + "//third_party/py/PIL:pil", + ], +) diff --git a/object_detection/utils/__init__.py b/object_detection/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/object_detection/utils/category_util.py b/object_detection/utils/category_util.py new file mode 100644 index 0000000000000000000000000000000000000000..fdd9c1c1c985a6e595089a473b3c02fe89d5a257 --- /dev/null +++ b/object_detection/utils/category_util.py @@ -0,0 +1,72 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Functions for importing/exporting Object Detection categories.""" +import csv + +import tensorflow as tf + + +def load_categories_from_csv_file(csv_path): + """Loads categories from a csv file. + + The CSV file should have one comma delimited numeric category id and string + category name pair per line. For example: + + 0,"cat" + 1,"dog" + 2,"bird" + ... + + Args: + csv_path: Path to the csv file to be parsed into categories. + Returns: + categories: A list of dictionaries representing all possible categories. + The categories will contain an integer 'id' field and a string + 'name' field. + Raises: + ValueError: If the csv file is incorrectly formatted. + """ + categories = [] + + with tf.gfile.Open(csv_path, 'r') as csvfile: + reader = csv.reader(csvfile, delimiter=',', quotechar='"') + for row in reader: + if not row: + continue + + if len(row) != 2: + raise ValueError('Expected 2 fields per row in csv: %s' % ','.join(row)) + + category_id = int(row[0]) + category_name = row[1] + categories.append({'id': category_id, 'name': category_name}) + + return categories + + +def save_categories_to_csv_file(categories, csv_path): + """Saves categories to a csv file. + + Args: + categories: A list of dictionaries representing categories to save to file. + Each category must contain an 'id' and 'name' field. + csv_path: Path to the csv file to be parsed into categories. + """ + categories.sort(key=lambda x: x['id']) + with tf.gfile.Open(csv_path, 'w') as csvfile: + writer = csv.writer(csvfile, delimiter=',', quotechar='"') + for category in categories: + writer.writerow([category['id'], category['name']]) diff --git a/object_detection/utils/category_util_test.py b/object_detection/utils/category_util_test.py new file mode 100644 index 0000000000000000000000000000000000000000..9c99079e1c45d5cf8ce92048e7c3a9267b26ccf3 --- /dev/null +++ b/object_detection/utils/category_util_test.py @@ -0,0 +1,54 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
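A short illustrative round trip through the two helpers above; the output path is arbitrary.

```python
# Round trip through the CSV category helpers defined above.
from object_detection.utils import category_util

categories = [{'id': 1, 'name': 'dog'}, {'id': 0, 'name': 'cat'}]
category_util.save_categories_to_csv_file(categories, '/tmp/categories.csv')

# save_categories_to_csv_file sorts by id, so the reloaded list is id-ordered.
reloaded = category_util.load_categories_from_csv_file('/tmp/categories.csv')
# reloaded == [{'id': 0, 'name': 'cat'}, {'id': 1, 'name': 'dog'}]
```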
+# ============================================================================== + +"""Tests for object_detection.utils.category_util.""" +import os + +import tensorflow as tf + +from object_detection.utils import category_util + + +class EvalUtilTest(tf.test.TestCase): + + def test_load_categories_from_csv_file(self): + csv_data = """ + 0,"cat" + 1,"dog" + 2,"bird" + """.strip(' ') + csv_path = os.path.join(self.get_temp_dir(), 'test.csv') + with tf.gfile.Open(csv_path, 'wb') as f: + f.write(csv_data) + + categories = category_util.load_categories_from_csv_file(csv_path) + self.assertTrue({'id': 0, 'name': 'cat'} in categories) + self.assertTrue({'id': 1, 'name': 'dog'} in categories) + self.assertTrue({'id': 2, 'name': 'bird'} in categories) + + def test_save_categories_to_csv_file(self): + categories = [ + {'id': 0, 'name': 'cat'}, + {'id': 1, 'name': 'dog'}, + {'id': 2, 'name': 'bird'}, + ] + csv_path = os.path.join(self.get_temp_dir(), 'test.csv') + category_util.save_categories_to_csv_file(categories, csv_path) + saved_categories = category_util.load_categories_from_csv_file(csv_path) + self.assertEqual(saved_categories, categories) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/dataset_util.py b/object_detection/utils/dataset_util.py new file mode 100644 index 0000000000000000000000000000000000000000..014a9118d1ad3be636cee7e049e2fe96be6ca4ec --- /dev/null +++ b/object_detection/utils/dataset_util.py @@ -0,0 +1,86 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Utility functions for creating TFRecord data sets.""" + +import tensorflow as tf + + +def int64_feature(value): + return tf.train.Feature(int64_list=tf.train.Int64List(value=[value])) + + +def int64_list_feature(value): + return tf.train.Feature(int64_list=tf.train.Int64List(value=value)) + + +def bytes_feature(value): + return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value])) + + +def bytes_list_feature(value): + return tf.train.Feature(bytes_list=tf.train.BytesList(value=value)) + + +def float_list_feature(value): + return tf.train.Feature(float_list=tf.train.FloatList(value=value)) + + +def read_examples_list(path): + """Read list of training or validation examples. + + The file is assumed to contain a single example per line where the first + token in the line is an identifier that allows us to find the image and + annotation xml for that example. + + For example, the line: + xyz 3 + would allow us to find files xyz.jpg and xyz.xml (the 3 would be ignored). + + Args: + path: absolute path to examples list file. + + Returns: + list of example identifiers (strings). + """ + with tf.gfile.GFile(path) as fid: + lines = fid.readlines() + return [line.strip().split(' ')[0] for line in lines] + + +def recursive_parse_xml_to_dict(xml): + """Recursively parses XML contents to python dict. 
+ + We assume that `object` tags are the only ones that can appear + multiple times at the same level of a tree. + + Args: + xml: xml tree obtained by parsing XML file contents using lxml.etree + + Returns: + Python dictionary holding XML contents. + """ + if not xml: + return {xml.tag: xml.text} + result = {} + for child in xml: + child_result = recursive_parse_xml_to_dict(child) + if child.tag != 'object': + result[child.tag] = child_result[child.tag] + else: + if child.tag not in result: + result[child.tag] = [] + result[child.tag].append(child_result[child.tag]) + return {xml.tag: result} diff --git a/object_detection/utils/dataset_util_test.py b/object_detection/utils/dataset_util_test.py new file mode 100644 index 0000000000000000000000000000000000000000..99cfb2cdfce61d85b18f99142169270b0bf2254a --- /dev/null +++ b/object_detection/utils/dataset_util_test.py @@ -0,0 +1,37 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.utils.dataset_util.""" + +import os +import tensorflow as tf + +from object_detection.utils import dataset_util + + +class DatasetUtilTest(tf.test.TestCase): + + def test_read_examples_list(self): + example_list_data = """example1 1\nexample2 2""" + example_list_path = os.path.join(self.get_temp_dir(), 'examples.txt') + with tf.gfile.Open(example_list_path, 'wb') as f: + f.write(example_list_data) + + examples = dataset_util.read_examples_list(example_list_path) + self.assertListEqual(['example1', 'example2'], examples) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/label_map_util.py b/object_detection/utils/label_map_util.py new file mode 100644 index 0000000000000000000000000000000000000000..a3b3125242d63db2a93345f676b2eb4463efd44e --- /dev/null +++ b/object_detection/utils/label_map_util.py @@ -0,0 +1,126 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Label map utility functions.""" + +import logging + +import tensorflow as tf +from google.protobuf import text_format +from object_detection.protos import string_int_label_map_pb2 + + +def create_category_index(categories): + """Creates dictionary of COCO compatible categories keyed by category id. 
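The TFRecord feature helpers and `recursive_parse_xml_to_dict` above are typically combined when converting XML-style annotations into `tf.train.Example` protos. The sketch below is illustrative only: the XML layout and the feature key names are assumptions, not part of this module.

```python
# Hedged sketch: annotation XML -> dict -> tf.train.Example, using the helpers above.
from lxml import etree
import tensorflow as tf

from object_detection.utils import dataset_util

xml_str = """<annotation>
  <filename>xyz.jpg</filename>
  <object><name>dog</name></object>
  <object><name>cat</name></object>
</annotation>"""

data = dataset_util.recursive_parse_xml_to_dict(
    etree.fromstring(xml_str))['annotation']
# data['filename'] == 'xyz.jpg'; data['object'] is a list with one dict per <object>.

example = tf.train.Example(features=tf.train.Features(feature={
    # Feature keys below are illustrative, not mandated by dataset_util.
    'image/filename': dataset_util.bytes_feature(data['filename'].encode('utf8')),
    'image/object/class/text': dataset_util.bytes_list_feature(
        [obj['name'].encode('utf8') for obj in data['object']]),
}))
```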
+ + Args: + categories: a list of dicts, each of which has the following keys: + 'id': (required) an integer id uniquely identifying this category. + 'name': (required) string representing category name + e.g., 'cat', 'dog', 'pizza'. + + Returns: + category_index: a dict containing the same entries as categories, but keyed + by the 'id' field of each category. + """ + category_index = {} + for cat in categories: + category_index[cat['id']] = cat + return category_index + + +def convert_label_map_to_categories(label_map, + max_num_classes, + use_display_name=True): + """Loads label map proto and returns categories list compatible with eval. + + This function loads a label map and returns a list of dicts, each of which + has the following keys: + 'id': (required) an integer id uniquely identifying this category. + 'name': (required) string representing category name + e.g., 'cat', 'dog', 'pizza'. + We only allow class into the list if its id-label_id_offset is + between 0 (inclusive) and max_num_classes (exclusive). + If there are several items mapping to the same id in the label map, + we will only keep the first one in the categories list. + + Args: + label_map: a StringIntLabelMapProto or None. If None, a default categories + list is created with max_num_classes categories. + max_num_classes: maximum number of (consecutive) label indices to include. + use_display_name: (boolean) choose whether to load 'display_name' field + as category name. If False of if the display_name field does not exist, + uses 'name' field as category names instead. + Returns: + categories: a list of dictionaries representing all possible categories. + """ + categories = [] + list_of_ids_already_added = [] + if not label_map: + label_id_offset = 1 + for class_id in range(max_num_classes): + categories.append({ + 'id': class_id + label_id_offset, + 'name': 'category_{}'.format(class_id + label_id_offset) + }) + return categories + for item in label_map.item: + if not 0 < item.id <= max_num_classes: + logging.info('Ignore item %d since it falls outside of requested ' + 'label range.', item.id) + continue + if use_display_name and item.HasField('display_name'): + name = item.display_name + else: + name = item.name + if item.id not in list_of_ids_already_added: + list_of_ids_already_added.append(item.id) + categories.append({'id': item.id, 'name': name}) + return categories + + +# TODO: double check documentaion. +def load_labelmap(path): + """Loads label map proto. + + Args: + path: path to StringIntLabelMap proto text file. + Returns: + a StringIntLabelMapProto + """ + with tf.gfile.GFile(path, 'r') as fid: + label_map_string = fid.read() + label_map = string_int_label_map_pb2.StringIntLabelMap() + try: + text_format.Merge(label_map_string, label_map) + except text_format.ParseError: + label_map.ParseFromString(label_map_string) + return label_map + + +def get_label_map_dict(label_map_path): + """Reads a label map and returns a dictionary of label names to id. + + Args: + label_map_path: path to label_map. + + Returns: + A dictionary mapping label names to id. 
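Taken together, the helpers above support the common pattern of loading a `.pbtxt` label map, converting it to the categories list used by evaluation, and indexing it by class id. A brief sketch (the path and class count are placeholders):

```python
# Illustrative use of the label map helpers above.
from object_detection.utils import label_map_util

label_map = label_map_util.load_labelmap('/path/to/label_map.pbtxt')  # hypothetical path
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=90, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
# category_index[item_id] -> {'id': item_id, 'name': ...}
```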
+ """ + label_map = load_labelmap(label_map_path) + label_map_dict = {} + for item in label_map.item: + label_map_dict[item.name] = item.id + return label_map_dict diff --git a/object_detection/utils/label_map_util_test.py b/object_detection/utils/label_map_util_test.py new file mode 100644 index 0000000000000000000000000000000000000000..10e0f3ddc925db5ae7b972018d0983c0d3d2a391 --- /dev/null +++ b/object_detection/utils/label_map_util_test.py @@ -0,0 +1,147 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.utils.label_map_util.""" + +import os +import tensorflow as tf + +from google.protobuf import text_format +from object_detection.protos import string_int_label_map_pb2 +from object_detection.utils import label_map_util + + +class LabelMapUtilTest(tf.test.TestCase): + + def _generate_label_map(self, num_classes): + label_map_proto = string_int_label_map_pb2.StringIntLabelMap() + for i in range(1, num_classes + 1): + item = label_map_proto.item.add() + item.id = i + item.name = 'label_' + str(i) + item.display_name = str(i) + return label_map_proto + + def test_get_label_map_dict(self): + label_map_string = """ + item { + id:2 + name:'cat' + } + item { + id:1 + name:'dog' + } + """ + label_map_path = os.path.join(self.get_temp_dir(), 'label_map.pbtxt') + with tf.gfile.Open(label_map_path, 'wb') as f: + f.write(label_map_string) + + label_map_dict = label_map_util.get_label_map_dict(label_map_path) + self.assertEqual(label_map_dict['dog'], 1) + self.assertEqual(label_map_dict['cat'], 2) + + def test_keep_categories_with_unique_id(self): + label_map_proto = string_int_label_map_pb2.StringIntLabelMap() + label_map_string = """ + item { + id:2 + name:'cat' + } + item { + id:1 + name:'child' + } + item { + id:1 + name:'person' + } + item { + id:1 + name:'n00007846' + } + """ + text_format.Merge(label_map_string, label_map_proto) + categories = label_map_util.convert_label_map_to_categories( + label_map_proto, max_num_classes=3) + self.assertListEqual([{ + 'id': 2, + 'name': u'cat' + }, { + 'id': 1, + 'name': u'child' + }], categories) + + def test_convert_label_map_to_categories_no_label_map(self): + categories = label_map_util.convert_label_map_to_categories( + None, max_num_classes=3) + expected_categories_list = [{ + 'name': u'category_1', + 'id': 1 + }, { + 'name': u'category_2', + 'id': 2 + }, { + 'name': u'category_3', + 'id': 3 + }] + self.assertListEqual(expected_categories_list, categories) + + def test_convert_label_map_to_coco_categories(self): + label_map_proto = self._generate_label_map(num_classes=4) + categories = label_map_util.convert_label_map_to_categories( + label_map_proto, max_num_classes=3) + expected_categories_list = [{ + 'name': u'1', + 'id': 1 + }, { + 'name': u'2', + 'id': 2 + }, { + 'name': u'3', + 'id': 3 + }] + self.assertListEqual(expected_categories_list, categories) + + def 
test_convert_label_map_to_coco_categories_with_few_classes(self): + label_map_proto = self._generate_label_map(num_classes=4) + cat_no_offset = label_map_util.convert_label_map_to_categories( + label_map_proto, max_num_classes=2) + expected_categories_list = [{ + 'name': u'1', + 'id': 1 + }, { + 'name': u'2', + 'id': 2 + }] + self.assertListEqual(expected_categories_list, cat_no_offset) + + def test_create_category_index(self): + categories = [{'name': u'1', 'id': 1}, {'name': u'2', 'id': 2}] + category_index = label_map_util.create_category_index(categories) + self.assertDictEqual({ + 1: { + 'name': u'1', + 'id': 1 + }, + 2: { + 'name': u'2', + 'id': 2 + } + }, category_index) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/learning_schedules.py b/object_detection/utils/learning_schedules.py new file mode 100644 index 0000000000000000000000000000000000000000..217b47a71d03047691ea8f4b41bd387491d5c76e --- /dev/null +++ b/object_detection/utils/learning_schedules.py @@ -0,0 +1,103 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Library of common learning rate schedules.""" + +import tensorflow as tf + + +def exponential_decay_with_burnin(global_step, + learning_rate_base, + learning_rate_decay_steps, + learning_rate_decay_factor, + burnin_learning_rate=0.0, + burnin_steps=0): + """Exponential decay schedule with burn-in period. + + In this schedule, learning rate is fixed at burnin_learning_rate + for a fixed period, before transitioning to a regular exponential + decay schedule. + + Args: + global_step: int tensor representing global step. + learning_rate_base: base learning rate. + learning_rate_decay_steps: steps to take between decaying the learning rate. + Note that this includes the number of burn-in steps. + learning_rate_decay_factor: multiplicative factor by which to decay + learning rate. + burnin_learning_rate: initial learning rate during burn-in period. If + 0.0 (which is the default), then the burn-in learning rate is simply + set to learning_rate_base. + burnin_steps: number of steps to use burnin learning rate. + + Returns: + a (scalar) float tensor representing learning rate + """ + if burnin_learning_rate == 0: + burnin_learning_rate = learning_rate_base + post_burnin_learning_rate = tf.train.exponential_decay( + learning_rate_base, + global_step, + learning_rate_decay_steps, + learning_rate_decay_factor, + staircase=True) + return tf.cond( + tf.less(global_step, burnin_steps), + lambda: tf.convert_to_tensor(burnin_learning_rate), + lambda: post_burnin_learning_rate) + + +def manual_stepping(global_step, boundaries, rates): + """Manually stepped learning rate schedule. + + This function provides fine grained control over learning rates. One must + specify a sequence of learning rates as well as a set of integer steps + at which the current learning rate must transition to the next. 
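As a usage sketch for `exponential_decay_with_burnin` above, the snippet below evaluates the schedule for the first few steps; the placeholder/feed pattern and the expected values mirror the test that appears later in this diff.

```python
# Sketch: a burn-in learning rate schedule evaluated step by step.
import tensorflow as tf
from object_detection.utils import learning_schedules

global_step = tf.placeholder(tf.int32, [])
learning_rate = learning_schedules.exponential_decay_with_burnin(
    global_step,
    learning_rate_base=1.0,
    learning_rate_decay_steps=3,
    learning_rate_decay_factor=0.1,
    burnin_learning_rate=0.5,
    burnin_steps=2)

with tf.Session() as sess:
  # Held at 0.5 for two steps, then decays from 1.0 by 10x every three steps:
  # expected rates are [0.5, 0.5, 1.0, 0.1, 0.1, 0.1, 0.01, 0.01].
  rates = [sess.run(learning_rate, feed_dict={global_step: step})
           for step in range(8)]
```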
For example, + if boundaries = [5, 10] and rates = [.1, .01, .001], then the learning + rate returned by this function is .1 for global_step=0,...,4, .01 for + global_step=5...9, and .001 for global_step=10 and onward. + + Args: + global_step: int64 (scalar) tensor representing global step. + boundaries: a list of global steps at which to switch learning + rates. This list is assumed to consist of increasing positive integers. + rates: a list of (float) learning rates corresponding to intervals between + the boundaries. The length of this list must be exactly + len(boundaries) + 1. + + Returns: + a (scalar) float tensor representing learning rate + Raises: + ValueError: if one of the following checks fails: + 1. boundaries is a strictly increasing list of positive integers + 2. len(rates) == len(boundaries) + 1 + """ + if any([b < 0 for b in boundaries]) or any( + [not isinstance(b, int) for b in boundaries]): + raise ValueError('boundaries must be a list of positive integers') + if any([bnext <= b for bnext, b in zip(boundaries[1:], boundaries[:-1])]): + raise ValueError('Entries in boundaries must be strictly increasing.') + if any([not isinstance(r, float) for r in rates]): + raise ValueError('Learning rates must be floats') + if len(rates) != len(boundaries) + 1: + raise ValueError('Number of provided learning rates must exceed ' + 'number of boundary points by exactly 1.') + step_boundaries = tf.constant(boundaries, tf.int64) + learning_rates = tf.constant(rates, tf.float32) + unreached_boundaries = tf.reshape(tf.where( + tf.greater(step_boundaries, global_step)), [-1]) + unreached_boundaries = tf.concat([unreached_boundaries, [len(boundaries)]], 0) + index = tf.reshape(tf.reduce_min(unreached_boundaries), [1]) + return tf.reshape(tf.slice(learning_rates, index, [1]), []) diff --git a/object_detection/utils/learning_schedules_test.py b/object_detection/utils/learning_schedules_test.py new file mode 100644 index 0000000000000000000000000000000000000000..c8e6ce641d0035e887c1471176a876316afa9edc --- /dev/null +++ b/object_detection/utils/learning_schedules_test.py @@ -0,0 +1,59 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for object_detection.utils.learning_schedules.""" +import tensorflow as tf + +from object_detection.utils import learning_schedules + + +class LearningSchedulesTest(tf.test.TestCase): + + def testExponentialDecayWithBurnin(self): + global_step = tf.placeholder(tf.int32, []) + learning_rate_base = 1.0 + learning_rate_decay_steps = 3 + learning_rate_decay_factor = .1 + burnin_learning_rate = .5 + burnin_steps = 2 + exp_rates = [.5, .5, 1, .1, .1, .1, .01, .01] + learning_rate = learning_schedules.exponential_decay_with_burnin( + global_step, learning_rate_base, learning_rate_decay_steps, + learning_rate_decay_factor, burnin_learning_rate, burnin_steps) + with self.test_session() as sess: + output_rates = [] + for input_global_step in range(8): + output_rate = sess.run(learning_rate, + feed_dict={global_step: input_global_step}) + output_rates.append(output_rate) + self.assertAllClose(output_rates, exp_rates) + + def testManualStepping(self): + global_step = tf.placeholder(tf.int64, []) + boundaries = [2, 3, 7] + rates = [1.0, 2.0, 3.0, 4.0] + exp_rates = [1.0, 1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0] + learning_rate = learning_schedules.manual_stepping(global_step, boundaries, + rates) + with self.test_session() as sess: + output_rates = [] + for input_global_step in range(10): + output_rate = sess.run(learning_rate, + feed_dict={global_step: input_global_step}) + output_rates.append(output_rate) + self.assertAllClose(output_rates, exp_rates) + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/metrics.py b/object_detection/utils/metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..cfce1e9ceef070170a4dc141986fae5865b3ab77 --- /dev/null +++ b/object_detection/utils/metrics.py @@ -0,0 +1,145 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Functions for computing metrics like precision, recall, CorLoc and etc.""" +from __future__ import division + +import numpy as np +from six import moves + + +def compute_precision_recall(scores, labels, num_gt): + """Compute precision and recall. + + Args: + scores: A float numpy array representing detection score + labels: A boolean numpy array representing true/false positive labels + num_gt: Number of ground truth instances + + Raises: + ValueError: if the input is not of the correct format + + Returns: + precision: Fraction of positive instances over detected ones. This value is + None if no ground truth labels are present. + recall: Fraction of detected positive instance over all positive instances. + This value is None if no ground truth labels are present. 
+ + """ + if not isinstance( + labels, np.ndarray) or labels.dtype != np.bool or len(labels.shape) != 1: + raise ValueError("labels must be single dimension bool numpy array") + + if not isinstance( + scores, np.ndarray) or len(scores.shape) != 1: + raise ValueError("scores must be single dimension numpy array") + + if num_gt < np.sum(labels): + raise ValueError("Number of true positives must be smaller than num_gt.") + + if len(scores) != len(labels): + raise ValueError("scores and labels must be of the same size.") + + if num_gt == 0: + return None, None + + sorted_indices = np.argsort(scores) + sorted_indices = sorted_indices[::-1] + labels = labels.astype(int) + true_positive_labels = labels[sorted_indices] + false_positive_labels = 1 - true_positive_labels + cum_true_positives = np.cumsum(true_positive_labels) + cum_false_positives = np.cumsum(false_positive_labels) + precision = cum_true_positives.astype(float) / ( + cum_true_positives + cum_false_positives) + recall = cum_true_positives.astype(float) / num_gt + return precision, recall + + +def compute_average_precision(precision, recall): + """Compute Average Precision according to the definition in VOCdevkit. + + Precision is modified to ensure that it does not decrease as recall + decrease. + + Args: + precision: A float [N, 1] numpy array of precisions + recall: A float [N, 1] numpy array of recalls + + Raises: + ValueError: if the input is not of the correct format + + Returns: + average_precison: The area under the precision recall curve. NaN if + precision and recall are None. + + """ + if precision is None: + if recall is not None: + raise ValueError("If precision is None, recall must also be None") + return np.NAN + + if not isinstance(precision, np.ndarray) or not isinstance(recall, + np.ndarray): + raise ValueError("precision and recall must be numpy array") + if precision.dtype != np.float or recall.dtype != np.float: + raise ValueError("input must be float numpy array.") + if len(precision) != len(recall): + raise ValueError("precision and recall must be of the same size.") + if not precision.size: + return 0.0 + if np.amin(precision) < 0 or np.amax(precision) > 1: + raise ValueError("Precision must be in the range of [0, 1].") + if np.amin(recall) < 0 or np.amax(recall) > 1: + raise ValueError("recall must be in the range of [0, 1].") + if not all(recall[i] <= recall[i + 1] for i in moves.range(len(recall) - 1)): + raise ValueError("recall must be a non-decreasing array") + + recall = np.concatenate([[0], recall, [1]]) + precision = np.concatenate([[0], precision, [0]]) + + # Preprocess precision to be a non-decreasing array + for i in range(len(precision) - 2, -1, -1): + precision[i] = np.maximum(precision[i], precision[i + 1]) + + indices = np.where(recall[1:] != recall[:-1])[0] + 1 + average_precision = np.sum( + (recall[indices] - recall[indices - 1]) * precision[indices]) + return average_precision + + +def compute_cor_loc(num_gt_imgs_per_class, + num_images_correctly_detected_per_class): + """Compute CorLoc according to the definition in the following paper. + + https://www.robots.ox.ac.uk/~vgg/rg/papers/deselaers-eccv10.pdf + + Returns nans if there are no ground truth images for a class. 
+ + Args: + num_gt_imgs_per_class: 1D array, representing number of images containing + at least one object instance of a particular class + num_images_correctly_detected_per_class: 1D array, representing number of + images that are correctly detected at least one object instance of a + particular class + + Returns: + corloc_per_class: A float numpy array represents the corloc score of each + class + """ + return np.where( + num_gt_imgs_per_class == 0, + np.nan, + num_images_correctly_detected_per_class / num_gt_imgs_per_class) diff --git a/object_detection/utils/metrics_test.py b/object_detection/utils/metrics_test.py new file mode 100644 index 0000000000000000000000000000000000000000..a2064bbff50686274b055bb058c81982eeefbe67 --- /dev/null +++ b/object_detection/utils/metrics_test.py @@ -0,0 +1,79 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.metrics.""" + +import numpy as np +import tensorflow as tf + +from object_detection.utils import metrics + + +class MetricsTest(tf.test.TestCase): + + def test_compute_cor_loc(self): + num_gt_imgs_per_class = np.array([100, 1, 5, 1, 1], dtype=int) + num_images_correctly_detected_per_class = np.array([10, 0, 1, 0, 0], + dtype=int) + corloc = metrics.compute_cor_loc(num_gt_imgs_per_class, + num_images_correctly_detected_per_class) + expected_corloc = np.array([0.1, 0, 0.2, 0, 0], dtype=float) + self.assertTrue(np.allclose(corloc, expected_corloc)) + + def test_compute_cor_loc_nans(self): + num_gt_imgs_per_class = np.array([100, 0, 0, 1, 1], dtype=int) + num_images_correctly_detected_per_class = np.array([10, 0, 1, 0, 0], + dtype=int) + corloc = metrics.compute_cor_loc(num_gt_imgs_per_class, + num_images_correctly_detected_per_class) + expected_corloc = np.array([0.1, np.nan, np.nan, 0, 0], dtype=float) + self.assertAllClose(corloc, expected_corloc) + + def test_compute_precision_recall(self): + num_gt = 10 + scores = np.array([0.4, 0.3, 0.6, 0.2, 0.7, 0.1], dtype=float) + labels = np.array([0, 1, 1, 0, 0, 1], dtype=bool) + accumulated_tp_count = np.array([0, 1, 1, 2, 2, 3], dtype=float) + expected_precision = accumulated_tp_count / np.array([1, 2, 3, 4, 5, 6]) + expected_recall = accumulated_tp_count / num_gt + precision, recall = metrics.compute_precision_recall(scores, labels, num_gt) + self.assertAllClose(precision, expected_precision) + self.assertAllClose(recall, expected_recall) + + def test_compute_average_precision(self): + precision = np.array([0.8, 0.76, 0.9, 0.65, 0.7, 0.5, 0.55, 0], dtype=float) + recall = np.array([0.3, 0.3, 0.4, 0.4, 0.45, 0.45, 0.5, 0.5], dtype=float) + processed_precision = np.array([0.9, 0.9, 0.9, 0.7, 0.7, 0.55, 0.55, 0], + dtype=float) + recall_interval = np.array([0.3, 0, 0.1, 0, 0.05, 0, 0.05, 0], dtype=float) + expected_mean_ap = np.sum(recall_interval * processed_precision) + mean_ap = metrics.compute_average_precision(precision, recall) 
+ self.assertAlmostEqual(expected_mean_ap, mean_ap) + + def test_compute_precision_recall_and_ap_no_groundtruth(self): + num_gt = 0 + scores = np.array([0.4, 0.3, 0.6, 0.2, 0.7, 0.1], dtype=float) + labels = np.array([0, 0, 0, 0, 0, 0], dtype=bool) + expected_precision = None + expected_recall = None + precision, recall = metrics.compute_precision_recall(scores, labels, num_gt) + self.assertEquals(precision, expected_precision) + self.assertEquals(recall, expected_recall) + ap = metrics.compute_average_precision(precision, recall) + self.assertTrue(np.isnan(ap)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/np_box_list.py b/object_detection/utils/np_box_list.py new file mode 100644 index 0000000000000000000000000000000000000000..13a1fde9032b955f4550b06b3146b4d7e5fd71af --- /dev/null +++ b/object_detection/utils/np_box_list.py @@ -0,0 +1,134 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Numpy BoxList classes and functions.""" + +import numpy as np +from six import moves + + +class BoxList(object): + """Box collection. + + BoxList represents a list of bounding boxes as numpy array, where each + bounding box is represented as a row of 4 numbers, + [y_min, x_min, y_max, x_max]. It is assumed that all bounding boxes within a + given list correspond to a single image. + + Optionally, users can add additional related fields (such as + objectness/classification scores). + """ + + def __init__(self, data): + """Constructs box collection. + + Args: + data: a numpy array of shape [N, 4] representing box coordinates + + Raises: + ValueError: if bbox data is not a numpy array + ValueError: if invalid dimensions for bbox data + """ + if not isinstance(data, np.ndarray): + raise ValueError('data must be a numpy array.') + if len(data.shape) != 2 or data.shape[1] != 4: + raise ValueError('Invalid dimensions for box data.') + if data.dtype != np.float32 and data.dtype != np.float64: + raise ValueError('Invalid data type for box data: float is required.') + if not self._is_valid_boxes(data): + raise ValueError('Invalid box data. data must be a numpy array of ' + 'N*[y_min, x_min, y_max, x_max]') + self.data = {'boxes': data} + + def num_boxes(self): + """Return number of boxes held in collections.""" + return self.data['boxes'].shape[0] + + def get_extra_fields(self): + """Return all non-box fields.""" + return [k for k in self.data.keys() if k != 'boxes'] + + def has_field(self, field): + return field in self.data + + def add_field(self, field, field_data): + """Add data to a specified field. + + Args: + field: a string parameter used to speficy a related field to be accessed. + field_data: a numpy array of [N, ...] representing the data associated + with the field. + Raises: + ValueError: if the field is already exist or the dimension of the field + data does not matches the number of boxes. 
+ """ + if self.has_field(field): + raise ValueError('Field ' + field + 'already exists') + if len(field_data.shape) < 1 or field_data.shape[0] != self.num_boxes(): + raise ValueError('Invalid dimensions for field data') + self.data[field] = field_data + + def get(self): + """Convenience function for accesssing box coordinates. + + Returns: + a numpy array of shape [N, 4] representing box corners + """ + return self.get_field('boxes') + + def get_field(self, field): + """Accesses data associated with the specified field in the box collection. + + Args: + field: a string parameter used to speficy a related field to be accessed. + + Returns: + a numpy 1-d array representing data of an associated field + + Raises: + ValueError: if invalid field + """ + if not self.has_field(field): + raise ValueError('field {} does not exist'.format(field)) + return self.data[field] + + def get_coordinates(self): + """Get corner coordinates of boxes. + + Returns: + a list of 4 1-d numpy arrays [y_min, x_min, y_max, x_max] + """ + box_coordinates = self.get() + y_min = box_coordinates[:, 0] + x_min = box_coordinates[:, 1] + y_max = box_coordinates[:, 2] + x_max = box_coordinates[:, 3] + return [y_min, x_min, y_max, x_max] + + def _is_valid_boxes(self, data): + """Check whether data fullfills the format of N*[ymin, xmin, ymax, xmin]. + + Args: + data: a numpy array of shape [N, 4] representing box coordinates + + Returns: + a boolean indicating whether all ymax of boxes are equal or greater than + ymin, and all xmax of boxes are equal or greater than xmin. + """ + if data.shape[0] > 0: + for i in moves.range(data.shape[0]): + if data[i, 0] > data[i, 2] or data[i, 1] > data[i, 3]: + return False + return True diff --git a/object_detection/utils/np_box_list_ops.py b/object_detection/utils/np_box_list_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..cb9fee8561986692277ddcfef6aae91ccf4dd626 --- /dev/null +++ b/object_detection/utils/np_box_list_ops.py @@ -0,0 +1,555 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Bounding Box List operations for Numpy BoxLists. + +Example box operations that are supported: + * Areas: compute bounding box areas + * IOU: pairwise intersection-over-union scores +""" + +import numpy as np + +from object_detection.utils import np_box_list +from object_detection.utils import np_box_ops + + +class SortOrder(object): + """Enum class for sort order. + + Attributes: + ascend: ascend order. + descend: descend order. + """ + ASCEND = 1 + DESCEND = 2 + + +def area(boxlist): + """Computes area of boxes. + + Args: + boxlist: BoxList holding N boxes + + Returns: + a numpy array with shape [N*1] representing box areas + """ + y_min, x_min, y_max, x_max = boxlist.get_coordinates() + return (y_max - y_min) * (x_max - x_min) + + +def intersection(boxlist1, boxlist2): + """Compute pairwise intersection areas between boxes. 
+ + Args: + boxlist1: BoxList holding N boxes + boxlist2: BoxList holding M boxes + + Returns: + a numpy array with shape [N*M] representing pairwise intersection area + """ + return np_box_ops.intersection(boxlist1.get(), boxlist2.get()) + + +def iou(boxlist1, boxlist2): + """Computes pairwise intersection-over-union between box collections. + + Args: + boxlist1: BoxList holding N boxes + boxlist2: BoxList holding M boxes + + Returns: + a numpy array with shape [N, M] representing pairwise iou scores. + """ + return np_box_ops.iou(boxlist1.get(), boxlist2.get()) + + +def ioa(boxlist1, boxlist2): + """Computes pairwise intersection-over-area between box collections. + + Intersection-over-area (ioa) between two boxes box1 and box2 is defined as + their intersection area over box2's area. Note that ioa is not symmetric, + that is, IOA(box1, box2) != IOA(box2, box1). + + Args: + boxlist1: BoxList holding N boxes + boxlist2: BoxList holding M boxes + + Returns: + a numpy array with shape [N, M] representing pairwise ioa scores. + """ + return np_box_ops.ioa(boxlist1.get(), boxlist2.get()) + + +def gather(boxlist, indices, fields=None): + """Gather boxes from BoxList according to indices and return new BoxList. + + By default, Gather returns boxes corresponding to the input index list, as + well as all additional fields stored in the boxlist (indexing into the + first dimension). However one can optionally only gather from a + subset of fields. + + Args: + boxlist: BoxList holding N boxes + indices: a 1-d numpy array of type int_ + fields: (optional) list of fields to also gather from. If None (default), + all fields are gathered from. Pass an empty fields list to only gather + the box coordinates. + + Returns: + subboxlist: a BoxList corresponding to the subset of the input BoxList + specified by indices + + Raises: + ValueError: if specified field is not contained in boxlist or if the + indices are not of type int_ + """ + if indices.size: + if np.amax(indices) >= boxlist.num_boxes() or np.amin(indices) < 0: + raise ValueError('indices are out of valid range.') + subboxlist = np_box_list.BoxList(boxlist.get()[indices, :]) + if fields is None: + fields = boxlist.get_extra_fields() + for field in fields: + extra_field_data = boxlist.get_field(field) + subboxlist.add_field(field, extra_field_data[indices, ...]) + return subboxlist + + +def sort_by_field(boxlist, field, order=SortOrder.DESCEND): + """Sort boxes and associated fields according to a scalar field. + + A common use case is reordering the boxes according to descending scores. + + Args: + boxlist: BoxList holding N boxes. + field: A BoxList field for sorting and reordering the BoxList. + order: (Optional) 'descend' or 'ascend'. Default is descend. + + Returns: + sorted_boxlist: A sorted BoxList with the field in the specified order. + + Raises: + ValueError: if specified field does not exist or is not of single dimension. + ValueError: if the order is not either descend or ascend. 
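The pairwise overlap ops above take whole `BoxList`s; a short sketch of computing IOU and gathering a subset:

```python
# Sketch: pairwise IOU between two BoxLists, then gathering selected boxes.
import numpy as np
from object_detection.utils import np_box_list
from object_detection.utils import np_box_list_ops

boxlist1 = np_box_list.BoxList(
    np.array([[0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 1.0, 1.0]], dtype=np.float32))
boxlist2 = np_box_list.BoxList(
    np.array([[0.0, 0.0, 1.0, 1.0]], dtype=np.float32))

iou_matrix = np_box_list_ops.iou(boxlist1, boxlist2)   # [[0.25], [0.25]]
first_only = np_box_list_ops.gather(boxlist1, np.array([0]))
```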
+ """ + if not boxlist.has_field(field): + raise ValueError('Field ' + field + ' does not exist') + if len(boxlist.get_field(field).shape) != 1: + raise ValueError('Field ' + field + 'should be single dimension.') + if order != SortOrder.DESCEND and order != SortOrder.ASCEND: + raise ValueError('Invalid sort order') + + field_to_sort = boxlist.get_field(field) + sorted_indices = np.argsort(field_to_sort) + if order == SortOrder.DESCEND: + sorted_indices = sorted_indices[::-1] + return gather(boxlist, sorted_indices) + + +def non_max_suppression(boxlist, + max_output_size=10000, + iou_threshold=1.0, + score_threshold=-10.0): + """Non maximum suppression. + + This op greedily selects a subset of detection bounding boxes, pruning + away boxes that have high IOU (intersection over union) overlap (> thresh) + with already selected boxes. In each iteration, the detected bounding box with + highest score in the available pool is selected. + + Args: + boxlist: BoxList holding N boxes. Must contain a 'scores' field + representing detection scores. All scores belong to the same class. + max_output_size: maximum number of retained boxes + iou_threshold: intersection over union threshold. + score_threshold: minimum score threshold. Remove the boxes with scores + less than this value. Default value is set to -10. A very + low threshold to pass pretty much all the boxes, unless + the user sets a different score threshold. + + Returns: + a BoxList holding M boxes where M <= max_output_size + Raises: + ValueError: if 'scores' field does not exist + ValueError: if threshold is not in [0, 1] + ValueError: if max_output_size < 0 + """ + if not boxlist.has_field('scores'): + raise ValueError('Field scores does not exist') + if iou_threshold < 0. or iou_threshold > 1.0: + raise ValueError('IOU threshold must be in [0, 1]') + if max_output_size < 0: + raise ValueError('max_output_size must be bigger than 0.') + + boxlist = filter_scores_greater_than(boxlist, score_threshold) + if boxlist.num_boxes() == 0: + return boxlist + + boxlist = sort_by_field(boxlist, 'scores') + + # Prevent further computation if NMS is disabled. + if iou_threshold == 1.0: + if boxlist.num_boxes() > max_output_size: + selected_indices = np.arange(max_output_size) + return gather(boxlist, selected_indices) + else: + return boxlist + + boxes = boxlist.get() + num_boxes = boxlist.num_boxes() + # is_index_valid is True only for all remaining valid boxes, + is_index_valid = np.full(num_boxes, 1, dtype=bool) + selected_indices = [] + num_output = 0 + for i in xrange(num_boxes): + if num_output < max_output_size: + if is_index_valid[i]: + num_output += 1 + selected_indices.append(i) + is_index_valid[i] = False + valid_indices = np.where(is_index_valid)[0] + if valid_indices.size == 0: + break + + intersect_over_union = np_box_ops.iou( + np.expand_dims(boxes[i, :], axis=0), boxes[valid_indices, :]) + intersect_over_union = np.squeeze(intersect_over_union, axis=0) + is_index_valid[valid_indices] = np.logical_and( + is_index_valid[valid_indices], + intersect_over_union <= iou_threshold) + return gather(boxlist, np.array(selected_indices)) + + +def multi_class_non_max_suppression(boxlist, score_thresh, iou_thresh, + max_output_size): + """Multi-class version of non maximum suppression. + + This op greedily selects a subset of detection bounding boxes, pruning + away boxes that have high IOU (intersection over union) overlap (> thresh) + with already selected boxes. 
It operates independently for each class for + which scores are provided (via the scores field of the input box_list), + pruning boxes with score less than a provided threshold prior to + applying NMS. + + Args: + boxlist: BoxList holding N boxes. Must contain a 'scores' field + representing detection scores. This scores field is a tensor that can + be 1 dimensional (in the case of a single class) or 2-dimensional, which + which case we assume that it takes the shape [num_boxes, num_classes]. + We further assume that this rank is known statically and that + scores.shape[1] is also known (i.e., the number of classes is fixed + and known at graph construction time). + score_thresh: scalar threshold for score (low scoring boxes are removed). + iou_thresh: scalar threshold for IOU (boxes that that high IOU overlap + with previously selected boxes are removed). + max_output_size: maximum number of retained boxes per class. + + Returns: + a BoxList holding M boxes with a rank-1 scores field representing + corresponding scores for each box with scores sorted in decreasing order + and a rank-1 classes field representing a class label for each box. + Raises: + ValueError: if iou_thresh is not in [0, 1] or if input boxlist does not have + a valid scores field. + """ + if not 0 <= iou_thresh <= 1.0: + raise ValueError('thresh must be between 0 and 1') + if not isinstance(boxlist, np_box_list.BoxList): + raise ValueError('boxlist must be a BoxList') + if not boxlist.has_field('scores'): + raise ValueError('input boxlist must have \'scores\' field') + scores = boxlist.get_field('scores') + if len(scores.shape) == 1: + scores = np.reshape(scores, [-1, 1]) + elif len(scores.shape) == 2: + if scores.shape[1] is None: + raise ValueError('scores field must have statically defined second ' + 'dimension') + else: + raise ValueError('scores field must be of rank 1 or 2') + num_boxes = boxlist.num_boxes() + num_scores = scores.shape[0] + num_classes = scores.shape[1] + + if num_boxes != num_scores: + raise ValueError('Incorrect scores field length: actual vs expected.') + + selected_boxes_list = [] + for class_idx in range(num_classes): + boxlist_and_class_scores = np_box_list.BoxList(boxlist.get()) + class_scores = np.reshape(scores[0:num_scores, class_idx], [-1]) + boxlist_and_class_scores.add_field('scores', class_scores) + boxlist_filt = filter_scores_greater_than(boxlist_and_class_scores, + score_thresh) + nms_result = non_max_suppression(boxlist_filt, + max_output_size=max_output_size, + iou_threshold=iou_thresh, + score_threshold=score_thresh) + nms_result.add_field( + 'classes', np.zeros_like(nms_result.get_field('scores')) + class_idx) + selected_boxes_list.append(nms_result) + selected_boxes = concatenate(selected_boxes_list) + sorted_boxes = sort_by_field(selected_boxes, 'scores') + return sorted_boxes + + +def scale(boxlist, y_scale, x_scale): + """Scale box coordinates in x and y dimensions. 
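For the multi-class variant, the `scores` field is a `[num_boxes, num_classes]` array and NMS runs per column; a sketch assuming two classes:

```python
# Sketch: multi-class NMS with a [num_boxes, num_classes] scores field.
import numpy as np
from object_detection.utils import np_box_list
from object_detection.utils import np_box_list_ops

boxes = np.array([[0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.1, 1.0, 1.1]], dtype=np.float32)
boxlist = np_box_list.BoxList(boxes)
# One score column per class: class 0 favours box 0, class 1 favours box 1.
boxlist.add_field('scores',
                  np.array([[0.9, 0.1],
                            [0.2, 0.8]], dtype=np.float32))

result = np_box_list_ops.multi_class_non_max_suppression(
    boxlist, score_thresh=0.3, iou_thresh=0.5, max_output_size=10)
# result carries rank-1 'scores' and 'classes' fields, sorted by decreasing score.
```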
+ + Args: + boxlist: BoxList holding N boxes + y_scale: float + x_scale: float + + Returns: + boxlist: BoxList holding N boxes + """ + y_min, x_min, y_max, x_max = np.array_split(boxlist.get(), 4, axis=1) + y_min = y_scale * y_min + y_max = y_scale * y_max + x_min = x_scale * x_min + x_max = x_scale * x_max + scaled_boxlist = np_box_list.BoxList(np.hstack([y_min, x_min, y_max, x_max])) + + fields = boxlist.get_extra_fields() + for field in fields: + extra_field_data = boxlist.get_field(field) + scaled_boxlist.add_field(field, extra_field_data) + + return scaled_boxlist + + +def clip_to_window(boxlist, window): + """Clip bounding boxes to a window. + + This op clips input bounding boxes (represented by bounding box + corners) to a window, optionally filtering out boxes that do not + overlap at all with the window. + + Args: + boxlist: BoxList holding M_in boxes + window: a numpy array of shape [4] representing the + [y_min, x_min, y_max, x_max] window to which the op + should clip boxes. + + Returns: + a BoxList holding M_out boxes where M_out <= M_in + """ + y_min, x_min, y_max, x_max = np.array_split(boxlist.get(), 4, axis=1) + win_y_min = window[0] + win_x_min = window[1] + win_y_max = window[2] + win_x_max = window[3] + y_min_clipped = np.fmax(np.fmin(y_min, win_y_max), win_y_min) + y_max_clipped = np.fmax(np.fmin(y_max, win_y_max), win_y_min) + x_min_clipped = np.fmax(np.fmin(x_min, win_x_max), win_x_min) + x_max_clipped = np.fmax(np.fmin(x_max, win_x_max), win_x_min) + clipped = np_box_list.BoxList( + np.hstack([y_min_clipped, x_min_clipped, y_max_clipped, x_max_clipped])) + clipped = _copy_extra_fields(clipped, boxlist) + areas = area(clipped) + nonzero_area_indices = np.reshape(np.nonzero(np.greater(areas, 0.0)), + [-1]).astype(np.int32) + return gather(clipped, nonzero_area_indices) + + +def prune_non_overlapping_boxes(boxlist1, boxlist2, minoverlap=0.0): + """Prunes the boxes in boxlist1 that overlap less than thresh with boxlist2. + + For each box in boxlist1, we want its IOA to be more than minoverlap with + at least one of the boxes in boxlist2. If it does not, we remove it. + + Args: + boxlist1: BoxList holding N boxes. + boxlist2: BoxList holding M boxes. + minoverlap: Minimum required overlap between boxes, to count them as + overlapping. + + Returns: + A pruned boxlist with size [N', 4]. + """ + intersection_over_area = ioa(boxlist2, boxlist1) # [M, N] tensor + intersection_over_area = np.amax(intersection_over_area, axis=0) # [N] tensor + keep_bool = np.greater_equal(intersection_over_area, np.array(minoverlap)) + keep_inds = np.nonzero(keep_bool)[0] + new_boxlist1 = gather(boxlist1, keep_inds) + return new_boxlist1 + + +def prune_outside_window(boxlist, window): + """Prunes bounding boxes that fall outside a given window. + + This function prunes bounding boxes that even partially fall outside the given + window. See also ClipToWindow which only prunes bounding boxes that fall + completely outside the window, and clips any bounding boxes that partially + overflow. + + Args: + boxlist: a BoxList holding M_in boxes. + window: a numpy array of size 4, representing [ymin, xmin, ymax, xmax] + of the window. + + Returns: + pruned_corners: a tensor with shape [M_out, 4] where M_out <= M_in. + valid_indices: a tensor with shape [M_out] indexing the valid bounding boxes + in the input tensor. 
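To make the windowing semantics concrete, here is a small sketch of `clip_to_window` with made-up coordinates; `prune_outside_window`, by contrast, would drop the second box entirely because it is not fully contained in the window:

```
import numpy as np

from object_detection.utils import np_box_list
from object_detection.utils import np_box_list_ops

# One box inside the unit window, one spilling past its edges.
boxlist = np_box_list.BoxList(
    np.array([[0.25, 0.25, 0.75, 0.75],
              [-0.2, 0.3, 0.7, 1.5]], dtype=np.float32))
window = np.array([0.0, 0.0, 1.0, 1.0], dtype=np.float32)

# Coordinates are truncated to the window and zero-area boxes are dropped,
# so the second box becomes [0.0, 0.3, 0.7, 1.0].
clipped = np_box_list_ops.clip_to_window(boxlist, window)
print(clipped.get())
```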
+ """ + + y_min, x_min, y_max, x_max = np.array_split(boxlist.get(), 4, axis=1) + win_y_min = window[0] + win_x_min = window[1] + win_y_max = window[2] + win_x_max = window[3] + coordinate_violations = np.hstack([np.less(y_min, win_y_min), + np.less(x_min, win_x_min), + np.greater(y_max, win_y_max), + np.greater(x_max, win_x_max)]) + valid_indices = np.reshape( + np.where(np.logical_not(np.max(coordinate_violations, axis=1))), [-1]) + return gather(boxlist, valid_indices), valid_indices + + +def concatenate(boxlists, fields=None): + """Concatenate list of BoxLists. + + This op concatenates a list of input BoxLists into a larger BoxList. It also + handles concatenation of BoxList fields as long as the field tensor shapes + are equal except for the first dimension. + + Args: + boxlists: list of BoxList objects + fields: optional list of fields to also concatenate. By default, all + fields from the first BoxList in the list are included in the + concatenation. + + Returns: + a BoxList with number of boxes equal to + sum([boxlist.num_boxes() for boxlist in BoxList]) + Raises: + ValueError: if boxlists is invalid (i.e., is not a list, is empty, or + contains non BoxList objects), or if requested fields are not contained in + all boxlists + """ + if not isinstance(boxlists, list): + raise ValueError('boxlists should be a list') + if not boxlists: + raise ValueError('boxlists should have nonzero length') + for boxlist in boxlists: + if not isinstance(boxlist, np_box_list.BoxList): + raise ValueError('all elements of boxlists should be BoxList objects') + concatenated = np_box_list.BoxList( + np.vstack([boxlist.get() for boxlist in boxlists])) + if fields is None: + fields = boxlists[0].get_extra_fields() + for field in fields: + first_field_shape = boxlists[0].get_field(field).shape + first_field_shape = first_field_shape[1:] + for boxlist in boxlists: + if not boxlist.has_field(field): + raise ValueError('boxlist must contain all requested fields') + field_shape = boxlist.get_field(field).shape + field_shape = field_shape[1:] + if field_shape != first_field_shape: + raise ValueError('field %s must have same shape for all boxlists ' + 'except for the 0th dimension.' % field) + concatenated_field = np.concatenate( + [boxlist.get_field(field) for boxlist in boxlists], axis=0) + concatenated.add_field(field, concatenated_field) + return concatenated + + +def filter_scores_greater_than(boxlist, thresh): + """Filter to keep only boxes with score exceeding a given threshold. + + This op keeps the collection of boxes whose corresponding scores are + greater than the input threshold. + + Args: + boxlist: BoxList holding N boxes. Must contain a 'scores' field + representing detection scores. 
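A short sketch of `concatenate`, showing that extra fields present in every input are concatenated along the first dimension (made-up values):

```
import numpy as np

from object_detection.utils import np_box_list
from object_detection.utils import np_box_list_ops

boxlist1 = np_box_list.BoxList(np.array([[0., 0., 1., 1.]], dtype=np.float32))
boxlist1.add_field('scores', np.array([0.9], dtype=np.float32))

boxlist2 = np_box_list.BoxList(np.array([[0., 0., 2., 2.]], dtype=np.float32))
boxlist2.add_field('scores', np.array([0.5], dtype=np.float32))

# The result holds the boxes of both inputs, in order, with their fields.
merged = np_box_list_ops.concatenate([boxlist1, boxlist2])
print(merged.num_boxes())           # 2
print(merged.get_field('scores'))   # [0.9, 0.5]
```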
+ thresh: scalar threshold + + Returns: + a BoxList holding M boxes where M <= N + + Raises: + ValueError: if boxlist not a BoxList object or if it does not + have a scores field + """ + if not isinstance(boxlist, np_box_list.BoxList): + raise ValueError('boxlist must be a BoxList') + if not boxlist.has_field('scores'): + raise ValueError('input boxlist must have \'scores\' field') + scores = boxlist.get_field('scores') + if len(scores.shape) > 2: + raise ValueError('Scores should have rank 1 or 2') + if len(scores.shape) == 2 and scores.shape[1] != 1: + raise ValueError('Scores should have rank 1 or have shape ' + 'consistent with [None, 1]') + high_score_indices = np.reshape(np.where(np.greater(scores, thresh)), + [-1]).astype(np.int32) + return gather(boxlist, high_score_indices) + + +def change_coordinate_frame(boxlist, window): + """Change coordinate frame of the boxlist to be relative to window's frame. + + Given a window of the form [ymin, xmin, ymax, xmax], + changes bounding box coordinates from boxlist to be relative to this window + (e.g., the min corner maps to (0,0) and the max corner maps to (1,1)). + + An example use case is data augmentation: where we are given groundtruth + boxes (boxlist) and would like to randomly crop the image to some + window (window). In this case we need to change the coordinate frame of + each groundtruth box to be relative to this new window. + + Args: + boxlist: A BoxList object holding N boxes. + window: a size 4 1-D numpy array. + + Returns: + Returns a BoxList object with N boxes. + """ + win_height = window[2] - window[0] + win_width = window[3] - window[1] + boxlist_new = scale( + np_box_list.BoxList(boxlist.get() - + [window[0], window[1], window[0], window[1]]), + 1.0 / win_height, 1.0 / win_width) + _copy_extra_fields(boxlist_new, boxlist) + + return boxlist_new + + +def _copy_extra_fields(boxlist_to_copy_to, boxlist_to_copy_from): + """Copies the extra fields of boxlist_to_copy_from to boxlist_to_copy_to. + + Args: + boxlist_to_copy_to: BoxList to which extra fields are copied. + boxlist_to_copy_from: BoxList from which fields are copied. + + Returns: + boxlist_to_copy_to with extra fields. + """ + for field in boxlist_to_copy_from.get_extra_fields(): + boxlist_to_copy_to.add_field(field, boxlist_to_copy_from.get_field(field)) + return boxlist_to_copy_to + + +def _update_valid_indices_by_removing_high_iou_boxes( + selected_indices, is_index_valid, intersect_over_union, threshold): + max_iou = np.max(intersect_over_union[:, selected_indices], axis=1) + return np.logical_and(is_index_valid, max_iou <= threshold) diff --git a/object_detection/utils/np_box_list_ops_test.py b/object_detection/utils/np_box_list_ops_test.py new file mode 100644 index 0000000000000000000000000000000000000000..24a2cc8cfabd141e60178b017669d1177bb042d7 --- /dev/null +++ b/object_detection/utils/np_box_list_ops_test.py @@ -0,0 +1,414 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
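The random-crop use case described in the `change_coordinate_frame` docstring can be illustrated with a tiny sketch (the crop window below is made up):

```
import numpy as np

from object_detection.utils import np_box_list
from object_detection.utils import np_box_list_ops

# A groundtruth box in normalized image coordinates.
boxlist = np_box_list.BoxList(
    np.array([[0.25, 0.25, 0.75, 0.75]], dtype=np.float32))

# Re-express it relative to a crop covering the top-left quarter of the
# image: the window's min corner maps to (0, 0), its max corner to (1, 1).
window = np.array([0.0, 0.0, 0.5, 0.5], dtype=np.float32)
relative = np_box_list_ops.change_coordinate_frame(boxlist, window)
print(relative.get())   # [[0.5, 0.5, 1.5, 1.5]]
```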
+# ============================================================================== + +"""Tests for object_detection.utils.np_box_list_ops.""" + +import numpy as np +import tensorflow as tf + +from object_detection.utils import np_box_list +from object_detection.utils import np_box_list_ops + + +class AreaRelatedTest(tf.test.TestCase): + + def setUp(self): + boxes1 = np.array([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]], + dtype=float) + boxes2 = np.array([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]], + dtype=float) + self.boxlist1 = np_box_list.BoxList(boxes1) + self.boxlist2 = np_box_list.BoxList(boxes2) + + def test_area(self): + areas = np_box_list_ops.area(self.boxlist1) + expected_areas = np.array([6.0, 5.0], dtype=float) + self.assertAllClose(expected_areas, areas) + + def test_intersection(self): + intersection = np_box_list_ops.intersection(self.boxlist1, self.boxlist2) + expected_intersection = np.array([[2.0, 0.0, 6.0], [1.0, 0.0, 5.0]], + dtype=float) + self.assertAllClose(intersection, expected_intersection) + + def test_iou(self): + iou = np_box_list_ops.iou(self.boxlist1, self.boxlist2) + expected_iou = np.array([[2.0 / 16.0, 0.0, 6.0 / 400.0], + [1.0 / 16.0, 0.0, 5.0 / 400.0]], + dtype=float) + self.assertAllClose(iou, expected_iou) + + def test_ioa(self): + boxlist1 = np_box_list.BoxList( + np.array( + [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75]], dtype= + np.float32)) + boxlist2 = np_box_list.BoxList( + np.array( + [[0.5, 0.25, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]], dtype=np.float32)) + ioa21 = np_box_list_ops.ioa(boxlist2, boxlist1) + expected_ioa21 = np.array([[0.5, 0.0], + [1.0, 1.0]], + dtype=np.float32) + self.assertAllClose(ioa21, expected_ioa21) + + def test_scale(self): + boxlist = np_box_list.BoxList( + np.array( + [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75]], dtype= + np.float32)) + boxlist_scaled = np_box_list_ops.scale(boxlist, 2.0, 3.0) + expected_boxlist_scaled = np_box_list.BoxList( + np.array( + [[0.5, 0.75, 1.5, 2.25], [0.0, 0.0, 1.0, 2.25]], dtype=np.float32)) + self.assertAllClose(expected_boxlist_scaled.get(), boxlist_scaled.get()) + + def test_clip_to_window(self): + boxlist = np_box_list.BoxList( + np.array( + [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75], + [-0.2, -0.3, 0.7, 1.5]], + dtype=np.float32)) + boxlist_clipped = np_box_list_ops.clip_to_window(boxlist, + [0.0, 0.0, 1.0, 1.0]) + expected_boxlist_clipped = np_box_list.BoxList( + np.array( + [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75], + [0.0, 0.0, 0.7, 1.0]], + dtype=np.float32)) + self.assertAllClose(expected_boxlist_clipped.get(), boxlist_clipped.get()) + + def test_prune_outside_window(self): + boxlist = np_box_list.BoxList( + np.array( + [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75], + [-0.2, -0.3, 0.7, 1.5]], + dtype=np.float32)) + boxlist_pruned, _ = np_box_list_ops.prune_outside_window( + boxlist, [0.0, 0.0, 1.0, 1.0]) + expected_boxlist_pruned = np_box_list.BoxList( + np.array( + [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75]], dtype= + np.float32)) + self.assertAllClose(expected_boxlist_pruned.get(), boxlist_pruned.get()) + + def test_concatenate(self): + boxlist1 = np_box_list.BoxList( + np.array( + [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75]], dtype= + np.float32)) + boxlist2 = np_box_list.BoxList( + np.array( + [[0.5, 0.25, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]], dtype=np.float32)) + boxlists = [boxlist1, boxlist2] + boxlist_concatenated = np_box_list_ops.concatenate(boxlists) + boxlist_concatenated_expected = np_box_list.BoxList( + 
np.array( + [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75], + [0.5, 0.25, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]], + dtype=np.float32)) + self.assertAllClose(boxlist_concatenated_expected.get(), + boxlist_concatenated.get()) + + def test_change_coordinate_frame(self): + boxlist = np_box_list.BoxList( + np.array( + [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75]], dtype= + np.float32)) + boxlist_coord = np_box_list_ops.change_coordinate_frame( + boxlist, np.array([0, 0, 0.5, 0.5], dtype=np.float32)) + expected_boxlist_coord = np_box_list.BoxList( + np.array([[0.5, 0.5, 1.5, 1.5], [0, 0, 1.0, 1.5]], dtype=np.float32)) + self.assertAllClose(boxlist_coord.get(), expected_boxlist_coord.get()) + + def test_filter_scores_greater_than(self): + boxlist = np_box_list.BoxList( + np.array( + [[0.25, 0.25, 0.75, 0.75], [0.0, 0.0, 0.5, 0.75]], dtype= + np.float32)) + boxlist.add_field('scores', np.array([0.8, 0.2], np.float32)) + boxlist_greater = np_box_list_ops.filter_scores_greater_than(boxlist, 0.5) + + expected_boxlist_greater = np_box_list.BoxList( + np.array([[0.25, 0.25, 0.75, 0.75]], dtype=np.float32)) + + self.assertAllClose(boxlist_greater.get(), expected_boxlist_greater.get()) + + +class GatherOpsTest(tf.test.TestCase): + + def setUp(self): + boxes = np.array([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]], + dtype=float) + self.boxlist = np_box_list.BoxList(boxes) + self.boxlist.add_field('scores', np.array([0.5, 0.7, 0.9], dtype=float)) + self.boxlist.add_field('labels', + np.array([[0, 0, 0, 1, 0], [0, 1, 0, 0, 0], + [0, 0, 0, 0, 1]], + dtype=int)) + + def test_gather_with_out_of_range_indices(self): + indices = np.array([3, 1], dtype=int) + boxlist = self.boxlist + with self.assertRaises(ValueError): + np_box_list_ops.gather(boxlist, indices) + + def test_gather_with_invalid_multidimensional_indices(self): + indices = np.array([[0, 1], [1, 2]], dtype=int) + boxlist = self.boxlist + with self.assertRaises(ValueError): + np_box_list_ops.gather(boxlist, indices) + + def test_gather_without_fields_specified(self): + indices = np.array([2, 0, 1], dtype=int) + boxlist = self.boxlist + subboxlist = np_box_list_ops.gather(boxlist, indices) + + expected_scores = np.array([0.9, 0.5, 0.7], dtype=float) + self.assertAllClose(expected_scores, subboxlist.get_field('scores')) + + expected_boxes = np.array([[0.0, 0.0, 20.0, 20.0], [3.0, 4.0, 6.0, 8.0], + [14.0, 14.0, 15.0, 15.0]], + dtype=float) + self.assertAllClose(expected_boxes, subboxlist.get()) + + expected_labels = np.array([[0, 0, 0, 0, 1], [0, 0, 0, 1, 0], + [0, 1, 0, 0, 0]], + dtype=int) + self.assertAllClose(expected_labels, subboxlist.get_field('labels')) + + def test_gather_with_invalid_field_specified(self): + indices = np.array([2, 0, 1], dtype=int) + boxlist = self.boxlist + + with self.assertRaises(ValueError): + np_box_list_ops.gather(boxlist, indices, 'labels') + + with self.assertRaises(ValueError): + np_box_list_ops.gather(boxlist, indices, ['objectness']) + + def test_gather_with_fields_specified(self): + indices = np.array([2, 0, 1], dtype=int) + boxlist = self.boxlist + subboxlist = np_box_list_ops.gather(boxlist, indices, ['labels']) + + self.assertFalse(subboxlist.has_field('scores')) + + expected_boxes = np.array([[0.0, 0.0, 20.0, 20.0], [3.0, 4.0, 6.0, 8.0], + [14.0, 14.0, 15.0, 15.0]], + dtype=float) + self.assertAllClose(expected_boxes, subboxlist.get()) + + expected_labels = np.array([[0, 0, 0, 0, 1], [0, 0, 0, 1, 0], + [0, 1, 0, 0, 0]], + dtype=int) + self.assertAllClose(expected_labels, 
subboxlist.get_field('labels')) + + +class SortByFieldTest(tf.test.TestCase): + + def setUp(self): + boxes = np.array([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]], + dtype=float) + self.boxlist = np_box_list.BoxList(boxes) + self.boxlist.add_field('scores', np.array([0.5, 0.9, 0.4], dtype=float)) + self.boxlist.add_field('labels', + np.array([[0, 0, 0, 1, 0], [0, 1, 0, 0, 0], + [0, 0, 0, 0, 1]], + dtype=int)) + + def test_with_invalid_field(self): + with self.assertRaises(ValueError): + np_box_list_ops.sort_by_field(self.boxlist, 'objectness') + with self.assertRaises(ValueError): + np_box_list_ops.sort_by_field(self.boxlist, 'labels') + + def test_with_invalid_sorting_order(self): + with self.assertRaises(ValueError): + np_box_list_ops.sort_by_field(self.boxlist, 'scores', 'Descending') + + def test_with_descending_sorting(self): + sorted_boxlist = np_box_list_ops.sort_by_field(self.boxlist, 'scores') + + expected_boxes = np.array([[14.0, 14.0, 15.0, 15.0], [3.0, 4.0, 6.0, 8.0], + [0.0, 0.0, 20.0, 20.0]], + dtype=float) + self.assertAllClose(expected_boxes, sorted_boxlist.get()) + + expected_scores = np.array([0.9, 0.5, 0.4], dtype=float) + self.assertAllClose(expected_scores, sorted_boxlist.get_field('scores')) + + def test_with_ascending_sorting(self): + sorted_boxlist = np_box_list_ops.sort_by_field( + self.boxlist, 'scores', np_box_list_ops.SortOrder.ASCEND) + + expected_boxes = np.array([[0.0, 0.0, 20.0, 20.0], + [3.0, 4.0, 6.0, 8.0], + [14.0, 14.0, 15.0, 15.0],], + dtype=float) + self.assertAllClose(expected_boxes, sorted_boxlist.get()) + + expected_scores = np.array([0.4, 0.5, 0.9], dtype=float) + self.assertAllClose(expected_scores, sorted_boxlist.get_field('scores')) + + +class NonMaximumSuppressionTest(tf.test.TestCase): + + def setUp(self): + self._boxes = np.array([[0, 0, 1, 1], + [0, 0.1, 1, 1.1], + [0, -0.1, 1, 0.9], + [0, 10, 1, 11], + [0, 10.1, 1, 11.1], + [0, 100, 1, 101]], + dtype=float) + self._boxlist = np_box_list.BoxList(self._boxes) + + def test_with_no_scores_field(self): + boxlist = np_box_list.BoxList(self._boxes) + max_output_size = 3 + iou_threshold = 0.5 + + with self.assertRaises(ValueError): + np_box_list_ops.non_max_suppression( + boxlist, max_output_size, iou_threshold) + + def test_nms_disabled_max_output_size_equals_three(self): + boxlist = np_box_list.BoxList(self._boxes) + boxlist.add_field('scores', + np.array([.9, .75, .6, .95, .2, .3], dtype=float)) + max_output_size = 3 + iou_threshold = 1. 
# No NMS + + expected_boxes = np.array([[0, 10, 1, 11], [0, 0, 1, 1], [0, 0.1, 1, 1.1]], + dtype=float) + nms_boxlist = np_box_list_ops.non_max_suppression( + boxlist, max_output_size, iou_threshold) + self.assertAllClose(nms_boxlist.get(), expected_boxes) + + def test_select_from_three_clusters(self): + boxlist = np_box_list.BoxList(self._boxes) + boxlist.add_field('scores', + np.array([.9, .75, .6, .95, .2, .3], dtype=float)) + max_output_size = 3 + iou_threshold = 0.5 + + expected_boxes = np.array([[0, 10, 1, 11], [0, 0, 1, 1], [0, 100, 1, 101]], + dtype=float) + nms_boxlist = np_box_list_ops.non_max_suppression( + boxlist, max_output_size, iou_threshold) + self.assertAllClose(nms_boxlist.get(), expected_boxes) + + def test_select_at_most_two_from_three_clusters(self): + boxlist = np_box_list.BoxList(self._boxes) + boxlist.add_field('scores', + np.array([.9, .75, .6, .95, .5, .3], dtype=float)) + max_output_size = 2 + iou_threshold = 0.5 + + expected_boxes = np.array([[0, 10, 1, 11], [0, 0, 1, 1]], dtype=float) + nms_boxlist = np_box_list_ops.non_max_suppression( + boxlist, max_output_size, iou_threshold) + self.assertAllClose(nms_boxlist.get(), expected_boxes) + + def test_select_at_most_thirty_from_three_clusters(self): + boxlist = np_box_list.BoxList(self._boxes) + boxlist.add_field('scores', + np.array([.9, .75, .6, .95, .5, .3], dtype=float)) + max_output_size = 30 + iou_threshold = 0.5 + + expected_boxes = np.array([[0, 10, 1, 11], [0, 0, 1, 1], [0, 100, 1, 101]], + dtype=float) + nms_boxlist = np_box_list_ops.non_max_suppression( + boxlist, max_output_size, iou_threshold) + self.assertAllClose(nms_boxlist.get(), expected_boxes) + + def test_select_from_ten_indentical_boxes(self): + boxes = np.array(10 * [[0, 0, 1, 1]], dtype=float) + boxlist = np_box_list.BoxList(boxes) + boxlist.add_field('scores', np.array(10 * [0.8])) + iou_threshold = .5 + max_output_size = 3 + expected_boxes = np.array([[0, 0, 1, 1]], dtype=float) + nms_boxlist = np_box_list_ops.non_max_suppression( + boxlist, max_output_size, iou_threshold) + self.assertAllClose(nms_boxlist.get(), expected_boxes) + + def test_different_iou_threshold(self): + boxes = np.array([[0, 0, 20, 100], [0, 0, 20, 80], [200, 200, 210, 300], + [200, 200, 210, 250]], + dtype=float) + boxlist = np_box_list.BoxList(boxes) + boxlist.add_field('scores', np.array([0.9, 0.8, 0.7, 0.6])) + max_output_size = 4 + + iou_threshold = .4 + expected_boxes = np.array([[0, 0, 20, 100], + [200, 200, 210, 300],], + dtype=float) + nms_boxlist = np_box_list_ops.non_max_suppression( + boxlist, max_output_size, iou_threshold) + self.assertAllClose(nms_boxlist.get(), expected_boxes) + + iou_threshold = .5 + expected_boxes = np.array([[0, 0, 20, 100], [200, 200, 210, 300], + [200, 200, 210, 250]], + dtype=float) + nms_boxlist = np_box_list_ops.non_max_suppression( + boxlist, max_output_size, iou_threshold) + self.assertAllClose(nms_boxlist.get(), expected_boxes) + + iou_threshold = .8 + expected_boxes = np.array([[0, 0, 20, 100], [0, 0, 20, 80], + [200, 200, 210, 300], [200, 200, 210, 250]], + dtype=float) + nms_boxlist = np_box_list_ops.non_max_suppression( + boxlist, max_output_size, iou_threshold) + self.assertAllClose(nms_boxlist.get(), expected_boxes) + + def test_multiclass_nms(self): + boxlist = np_box_list.BoxList( + np.array( + [[0.2, 0.4, 0.8, 0.8], [0.4, 0.2, 0.8, 0.8], [0.6, 0.0, 1.0, 1.0]], + dtype=np.float32)) + scores = np.array([[-0.2, 0.1, 0.5, -0.4, 0.3], + [0.7, -0.7, 0.6, 0.2, -0.9], + [0.4, 0.34, -0.9, 0.2, 0.31]], + dtype=np.float32) + 
boxlist.add_field('scores', scores) + boxlist_clean = np_box_list_ops.multi_class_non_max_suppression( + boxlist, score_thresh=0.25, iou_thresh=0.1, max_output_size=3) + + scores_clean = boxlist_clean.get_field('scores') + classes_clean = boxlist_clean.get_field('classes') + boxes = boxlist_clean.get() + expected_scores = np.array([0.7, 0.6, 0.34, 0.31]) + expected_classes = np.array([0, 2, 1, 4]) + expected_boxes = np.array([[0.4, 0.2, 0.8, 0.8], + [0.4, 0.2, 0.8, 0.8], + [0.6, 0.0, 1.0, 1.0], + [0.6, 0.0, 1.0, 1.0]], + dtype=np.float32) + self.assertAllClose(scores_clean, expected_scores) + self.assertAllClose(classes_clean, expected_classes) + self.assertAllClose(boxes, expected_boxes) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/np_box_list_test.py b/object_detection/utils/np_box_list_test.py new file mode 100644 index 0000000000000000000000000000000000000000..bb0ee5d2887b7a7a958168323a2a3d074c7ee831 --- /dev/null +++ b/object_detection/utils/np_box_list_test.py @@ -0,0 +1,135 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.utils.np_box_list_test.""" + +import numpy as np +import tensorflow as tf + +from object_detection.utils import np_box_list + + +class BoxListTest(tf.test.TestCase): + + def test_invalid_box_data(self): + with self.assertRaises(ValueError): + np_box_list.BoxList([0, 0, 1, 1]) + + with self.assertRaises(ValueError): + np_box_list.BoxList(np.array([[0, 0, 1, 1]], dtype=int)) + + with self.assertRaises(ValueError): + np_box_list.BoxList(np.array([0, 1, 1, 3, 4], dtype=float)) + + with self.assertRaises(ValueError): + np_box_list.BoxList(np.array([[0, 1, 1, 3], [3, 1, 1, 5]], dtype=float)) + + def test_has_field_with_existed_field(self): + boxes = np.array([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]], + dtype=float) + boxlist = np_box_list.BoxList(boxes) + self.assertTrue(boxlist.has_field('boxes')) + + def test_has_field_with_nonexisted_field(self): + boxes = np.array([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]], + dtype=float) + boxlist = np_box_list.BoxList(boxes) + self.assertFalse(boxlist.has_field('scores')) + + def test_get_field_with_existed_field(self): + boxes = np.array([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]], + dtype=float) + boxlist = np_box_list.BoxList(boxes) + self.assertTrue(np.allclose(boxlist.get_field('boxes'), boxes)) + + def test_get_field_with_nonexited_field(self): + boxes = np.array([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]], + dtype=float) + boxlist = np_box_list.BoxList(boxes) + with self.assertRaises(ValueError): + boxlist.get_field('scores') + + +class AddExtraFieldTest(tf.test.TestCase): + + def setUp(self): + boxes = np.array([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]], + 
dtype=float) + self.boxlist = np_box_list.BoxList(boxes) + + def test_add_already_existed_field(self): + with self.assertRaises(ValueError): + self.boxlist.add_field('boxes', np.array([[0, 0, 0, 1, 0]], dtype=float)) + + def test_add_invalid_field_data(self): + with self.assertRaises(ValueError): + self.boxlist.add_field('scores', np.array([0.5, 0.7], dtype=float)) + with self.assertRaises(ValueError): + self.boxlist.add_field('scores', + np.array([0.5, 0.7, 0.9, 0.1], dtype=float)) + + def test_add_single_dimensional_field_data(self): + boxlist = self.boxlist + scores = np.array([0.5, 0.7, 0.9], dtype=float) + boxlist.add_field('scores', scores) + self.assertTrue(np.allclose(scores, self.boxlist.get_field('scores'))) + + def test_add_multi_dimensional_field_data(self): + boxlist = self.boxlist + labels = np.array([[0, 0, 0, 1, 0], [0, 1, 0, 0, 0], [0, 0, 0, 0, 1]], + dtype=int) + boxlist.add_field('labels', labels) + self.assertTrue(np.allclose(labels, self.boxlist.get_field('labels'))) + + def test_get_extra_fields(self): + boxlist = self.boxlist + self.assertSameElements(boxlist.get_extra_fields(), []) + + scores = np.array([0.5, 0.7, 0.9], dtype=float) + boxlist.add_field('scores', scores) + self.assertSameElements(boxlist.get_extra_fields(), ['scores']) + + labels = np.array([[0, 0, 0, 1, 0], [0, 1, 0, 0, 0], [0, 0, 0, 0, 1]], + dtype=int) + boxlist.add_field('labels', labels) + self.assertSameElements(boxlist.get_extra_fields(), ['scores', 'labels']) + + def test_get_coordinates(self): + y_min, x_min, y_max, x_max = self.boxlist.get_coordinates() + + expected_y_min = np.array([3.0, 14.0, 0.0], dtype=float) + expected_x_min = np.array([4.0, 14.0, 0.0], dtype=float) + expected_y_max = np.array([6.0, 15.0, 20.0], dtype=float) + expected_x_max = np.array([8.0, 15.0, 20.0], dtype=float) + + self.assertTrue(np.allclose(y_min, expected_y_min)) + self.assertTrue(np.allclose(x_min, expected_x_min)) + self.assertTrue(np.allclose(y_max, expected_y_max)) + self.assertTrue(np.allclose(x_max, expected_x_max)) + + def test_num_boxes(self): + boxes = np.array([[0., 0., 100., 100.], [10., 30., 50., 70.]], dtype=float) + boxlist = np_box_list.BoxList(boxes) + expected_num_boxes = 2 + self.assertEquals(boxlist.num_boxes(), expected_num_boxes) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/np_box_ops.py b/object_detection/utils/np_box_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..b4b46a75650c0dc06a5cbe3c0751778b1106f9f3 --- /dev/null +++ b/object_detection/utils/np_box_ops.py @@ -0,0 +1,97 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Operations for [N, 4] numpy arrays representing bounding boxes. 
+ +Example box operations that are supported: + * Areas: compute bounding box areas + * IOU: pairwise intersection-over-union scores +""" +import numpy as np + + +def area(boxes): + """Computes area of boxes. + + Args: + boxes: Numpy array with shape [N, 4] holding N boxes + + Returns: + a numpy array with shape [N*1] representing box areas + """ + return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) + + +def intersection(boxes1, boxes2): + """Compute pairwise intersection areas between boxes. + + Args: + boxes1: a numpy array with shape [N, 4] holding N boxes + boxes2: a numpy array with shape [M, 4] holding M boxes + + Returns: + a numpy array with shape [N*M] representing pairwise intersection area + """ + [y_min1, x_min1, y_max1, x_max1] = np.split(boxes1, 4, axis=1) + [y_min2, x_min2, y_max2, x_max2] = np.split(boxes2, 4, axis=1) + + all_pairs_min_ymax = np.minimum(y_max1, np.transpose(y_max2)) + all_pairs_max_ymin = np.maximum(y_min1, np.transpose(y_min2)) + intersect_heights = np.maximum( + np.zeros(all_pairs_max_ymin.shape), + all_pairs_min_ymax - all_pairs_max_ymin) + all_pairs_min_xmax = np.minimum(x_max1, np.transpose(x_max2)) + all_pairs_max_xmin = np.maximum(x_min1, np.transpose(x_min2)) + intersect_widths = np.maximum( + np.zeros(all_pairs_max_xmin.shape), + all_pairs_min_xmax - all_pairs_max_xmin) + return intersect_heights * intersect_widths + + +def iou(boxes1, boxes2): + """Computes pairwise intersection-over-union between box collections. + + Args: + boxes1: a numpy array with shape [N, 4] holding N boxes. + boxes2: a numpy array with shape [M, 4] holding N boxes. + + Returns: + a numpy array with shape [N, M] representing pairwise iou scores. + """ + intersect = intersection(boxes1, boxes2) + area1 = area(boxes1) + area2 = area(boxes2) + union = np.expand_dims(area1, axis=1) + np.expand_dims( + area2, axis=0) - intersect + return intersect / union + + +def ioa(boxes1, boxes2): + """Computes pairwise intersection-over-area between box collections. + + Intersection-over-area (ioa) between two boxes box1 and box2 is defined as + their intersection area over box2's area. Note that ioa is not symmetric, + that is, IOA(box1, box2) != IOA(box2, box1). + + Args: + boxes1: a numpy array with shape [N, 4] holding N boxes. + boxes2: a numpy array with shape [M, 4] holding N boxes. + + Returns: + a numpy array with shape [N, M] representing pairwise ioa scores. + """ + intersect = intersection(boxes1, boxes2) + areas = np.expand_dims(area(boxes2), axis=0) + return intersect / areas diff --git a/object_detection/utils/np_box_ops_test.py b/object_detection/utils/np_box_ops_test.py new file mode 100644 index 0000000000000000000000000000000000000000..730f3d205a1e83ea8efaec6689cf8251e68f46a7 --- /dev/null +++ b/object_detection/utils/np_box_ops_test.py @@ -0,0 +1,68 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
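The pure-numpy helpers in `np_box_ops` are easy to sanity-check by hand; a small worked example with made-up boxes:

```
import numpy as np

from object_detection.utils import np_box_ops

# Two boxes in [y_min, x_min, y_max, x_max] form, each with area 4.
boxes1 = np.array([[0.0, 0.0, 2.0, 2.0]], dtype=float)
boxes2 = np.array([[1.0, 1.0, 3.0, 3.0]], dtype=float)

# They share the 1x1 square [1, 1, 2, 2], so:
#   IOU = 1 / (4 + 4 - 1) = 1/7
#   IOA(boxes1, boxes2) = intersection / area(boxes2) = 1/4
print(np_box_ops.intersection(boxes1, boxes2))  # [[1.]]
print(np_box_ops.iou(boxes1, boxes2))           # [[0.1428...]]
print(np_box_ops.ioa(boxes1, boxes2))           # [[0.25]]
```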
+# ============================================================================== + +"""Tests for object_detection.np_box_ops.""" + +import numpy as np +import tensorflow as tf + +from object_detection.utils import np_box_ops + + +class BoxOpsTests(tf.test.TestCase): + + def setUp(self): + boxes1 = np.array([[4.0, 3.0, 7.0, 5.0], [5.0, 6.0, 10.0, 7.0]], + dtype=float) + boxes2 = np.array([[3.0, 4.0, 6.0, 8.0], [14.0, 14.0, 15.0, 15.0], + [0.0, 0.0, 20.0, 20.0]], + dtype=float) + self.boxes1 = boxes1 + self.boxes2 = boxes2 + + def testArea(self): + areas = np_box_ops.area(self.boxes1) + expected_areas = np.array([6.0, 5.0], dtype=float) + self.assertAllClose(expected_areas, areas) + + def testIntersection(self): + intersection = np_box_ops.intersection(self.boxes1, self.boxes2) + expected_intersection = np.array([[2.0, 0.0, 6.0], [1.0, 0.0, 5.0]], + dtype=float) + self.assertAllClose(intersection, expected_intersection) + + def testIOU(self): + iou = np_box_ops.iou(self.boxes1, self.boxes2) + expected_iou = np.array([[2.0 / 16.0, 0.0, 6.0 / 400.0], + [1.0 / 16.0, 0.0, 5.0 / 400.0]], + dtype=float) + self.assertAllClose(iou, expected_iou) + + def testIOA(self): + boxes1 = np.array([[0.25, 0.25, 0.75, 0.75], + [0.0, 0.0, 0.5, 0.75]], + dtype=np.float32) + boxes2 = np.array([[0.5, 0.25, 1.0, 1.0], + [0.0, 0.0, 1.0, 1.0]], + dtype=np.float32) + ioa21 = np_box_ops.ioa(boxes2, boxes1) + expected_ioa21 = np.array([[0.5, 0.0], + [1.0, 1.0]], + dtype=np.float32) + self.assertAllClose(ioa21, expected_ioa21) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/object_detection_evaluation.py b/object_detection/utils/object_detection_evaluation.py new file mode 100644 index 0000000000000000000000000000000000000000..b2b14844bed4a2c0c4fed7fea0f77f8dbd4082ce --- /dev/null +++ b/object_detection/utils/object_detection_evaluation.py @@ -0,0 +1,233 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""object_detection_evaluation module. + +ObjectDetectionEvaluation is a class which manages ground truth information of a +object detection dataset, and computes frequently used detection metrics such as +Precision, Recall, CorLoc of the provided detection results. +It supports the following operations: +1) Add ground truth information of images sequentially. +2) Add detection result of images sequentially. +3) Evaluate detection metrics on already inserted detection results. +4) Write evaluation result into a pickle file for future processing or + visualization. + +Note: This module operates on numpy boxes and box lists. 
+""" + +import logging +import numpy as np + +from object_detection.utils import metrics +from object_detection.utils import per_image_evaluation + + +class ObjectDetectionEvaluation(object): + """Evaluate Object Detection Result.""" + + def __init__(self, + num_groundtruth_classes, + matching_iou_threshold=0.5, + nms_iou_threshold=1.0, + nms_max_output_boxes=10000): + self.per_image_eval = per_image_evaluation.PerImageEvaluation( + num_groundtruth_classes, matching_iou_threshold, nms_iou_threshold, + nms_max_output_boxes) + self.num_class = num_groundtruth_classes + + self.groundtruth_boxes = {} + self.groundtruth_class_labels = {} + self.groundtruth_is_difficult_list = {} + self.num_gt_instances_per_class = np.zeros(self.num_class, dtype=int) + self.num_gt_imgs_per_class = np.zeros(self.num_class, dtype=int) + + self.detection_keys = set() + self.scores_per_class = [[] for _ in range(self.num_class)] + self.tp_fp_labels_per_class = [[] for _ in range(self.num_class)] + self.num_images_correctly_detected_per_class = np.zeros(self.num_class) + self.average_precision_per_class = np.empty(self.num_class, dtype=float) + self.average_precision_per_class.fill(np.nan) + self.precisions_per_class = [] + self.recalls_per_class = [] + self.corloc_per_class = np.ones(self.num_class, dtype=float) + + def clear_detections(self): + self.detection_keys = {} + self.scores_per_class = [[] for _ in range(self.num_class)] + self.tp_fp_labels_per_class = [[] for _ in range(self.num_class)] + self.num_images_correctly_detected_per_class = np.zeros(self.num_class) + self.average_precision_per_class = np.zeros(self.num_class, dtype=float) + self.precisions_per_class = [] + self.recalls_per_class = [] + self.corloc_per_class = np.ones(self.num_class, dtype=float) + + def add_single_ground_truth_image_info(self, + image_key, + groundtruth_boxes, + groundtruth_class_labels, + groundtruth_is_difficult_list=None): + """Add ground truth info of a single image into the evaluation database. + + Args: + image_key: sha256 key of image content + groundtruth_boxes: A numpy array of shape [M, 4] representing object box + coordinates[y_min, x_min, y_max, x_max] + groundtruth_class_labels: A 1-d numpy array of length M representing class + labels + groundtruth_is_difficult_list: A length M numpy boolean array denoting + whether a ground truth box is a difficult instance or not. To support + the case that no boxes are difficult, it is by default set as None. + """ + if image_key in self.groundtruth_boxes: + logging.warn( + 'image %s has already been added to the ground truth database.', + image_key) + return + + self.groundtruth_boxes[image_key] = groundtruth_boxes + self.groundtruth_class_labels[image_key] = groundtruth_class_labels + if groundtruth_is_difficult_list is None: + num_boxes = groundtruth_boxes.shape[0] + groundtruth_is_difficult_list = np.zeros(num_boxes, dtype=bool) + self.groundtruth_is_difficult_list[ + image_key] = groundtruth_is_difficult_list.astype(dtype=bool) + self._update_ground_truth_statistics(groundtruth_class_labels, + groundtruth_is_difficult_list) + + def add_single_detected_image_info(self, image_key, detected_boxes, + detected_scores, detected_class_labels): + """Add detected result of a single image into the evaluation database. 
+ + Args: + image_key: sha256 key of image content + detected_boxes: A numpy array of shape [N, 4] representing detected box + coordinates[y_min, x_min, y_max, x_max] + detected_scores: A 1-d numpy array of length N representing classification + score + detected_class_labels: A 1-d numpy array of length N representing class + labels + Raises: + ValueError: if detected_boxes, detected_scores and detected_class_labels + do not have the same length. + """ + if (len(detected_boxes) != len(detected_scores) or + len(detected_boxes) != len(detected_class_labels)): + raise ValueError('detected_boxes, detected_scores and ' + 'detected_class_labels should all have same lengths. Got' + '[%d, %d, %d]' % len(detected_boxes), + len(detected_scores), len(detected_class_labels)) + + if image_key in self.detection_keys: + logging.warn( + 'image %s has already been added to the detection result database', + image_key) + return + + self.detection_keys.add(image_key) + if image_key in self.groundtruth_boxes: + groundtruth_boxes = self.groundtruth_boxes[image_key] + groundtruth_class_labels = self.groundtruth_class_labels[image_key] + groundtruth_is_difficult_list = self.groundtruth_is_difficult_list[ + image_key] + else: + groundtruth_boxes = np.empty(shape=[0, 4], dtype=float) + groundtruth_class_labels = np.array([], dtype=int) + groundtruth_is_difficult_list = np.array([], dtype=bool) + scores, tp_fp_labels, is_class_correctly_detected_in_image = ( + self.per_image_eval.compute_object_detection_metrics( + detected_boxes, detected_scores, detected_class_labels, + groundtruth_boxes, groundtruth_class_labels, + groundtruth_is_difficult_list)) + for i in range(self.num_class): + self.scores_per_class[i].append(scores[i]) + self.tp_fp_labels_per_class[i].append(tp_fp_labels[i]) + (self.num_images_correctly_detected_per_class + ) += is_class_correctly_detected_in_image + + def _update_ground_truth_statistics(self, groundtruth_class_labels, + groundtruth_is_difficult_list): + """Update grouth truth statitistics. + + 1. Difficult boxes are ignored when counting the number of ground truth + instances as done in Pascal VOC devkit. + 2. Difficult boxes are treated as normal boxes when computing CorLoc related + statitistics. + + Args: + groundtruth_class_labels: An integer numpy array of length M, + representing M class labels of object instances in ground truth + groundtruth_is_difficult_list: A boolean numpy array of length M denoting + whether a ground truth box is a difficult instance or not + """ + for class_index in range(self.num_class): + num_gt_instances = np.sum(groundtruth_class_labels[ + ~groundtruth_is_difficult_list] == class_index) + self.num_gt_instances_per_class[class_index] += num_gt_instances + if np.any(groundtruth_class_labels == class_index): + self.num_gt_imgs_per_class[class_index] += 1 + + def evaluate(self): + """Compute evaluation result. + + Returns: + average_precision_per_class: float numpy array of average precision for + each class. 
+ mean_ap: mean average precision of all classes, float scalar + precisions_per_class: List of precisions, each precision is a float numpy + array + recalls_per_class: List of recalls, each recall is a float numpy array + corloc_per_class: numpy float array + mean_corloc: Mean CorLoc score for each class, float scalar + """ + if (self.num_gt_instances_per_class == 0).any(): + logging.warn( + 'The following classes have no ground truth examples: %s', + np.squeeze(np.argwhere(self.num_gt_instances_per_class == 0))) + for class_index in range(self.num_class): + if self.num_gt_instances_per_class[class_index] == 0: + continue + scores = np.concatenate(self.scores_per_class[class_index]) + tp_fp_labels = np.concatenate(self.tp_fp_labels_per_class[class_index]) + precision, recall = metrics.compute_precision_recall( + scores, tp_fp_labels, self.num_gt_instances_per_class[class_index]) + self.precisions_per_class.append(precision) + self.recalls_per_class.append(recall) + average_precision = metrics.compute_average_precision(precision, recall) + self.average_precision_per_class[class_index] = average_precision + + self.corloc_per_class = metrics.compute_cor_loc( + self.num_gt_imgs_per_class, + self.num_images_correctly_detected_per_class) + + mean_ap = np.nanmean(self.average_precision_per_class) + mean_corloc = np.nanmean(self.corloc_per_class) + return (self.average_precision_per_class, mean_ap, + self.precisions_per_class, self.recalls_per_class, + self.corloc_per_class, mean_corloc) + + def get_eval_result(self): + return EvalResult(self.average_precision_per_class, + self.precisions_per_class, self.recalls_per_class, + self.corloc_per_class) + + +class EvalResult(object): + + def __init__(self, average_precisions, precisions, recalls, all_corloc): + self.precisions = precisions + self.recalls = recalls + self.all_corloc = all_corloc + self.average_precisions = average_precisions diff --git a/object_detection/utils/object_detection_evaluation_test.py b/object_detection/utils/object_detection_evaluation_test.py new file mode 100644 index 0000000000000000000000000000000000000000..12bfc6b9d2b0e447aca3507f27885dbc45ba617b --- /dev/null +++ b/object_detection/utils/object_detection_evaluation_test.py @@ -0,0 +1,125 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
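The intended calling sequence for `ObjectDetectionEvaluation` is sketched below with made-up boxes and labels; it assumes the `metrics` and `per_image_evaluation` helpers imported by this module are also available on the path (the unit test that follows exercises the same flow in more detail):

```
import numpy as np

from object_detection.utils import object_detection_evaluation

# Two-class evaluation with a single image.
evaluator = object_detection_evaluation.ObjectDetectionEvaluation(
    num_groundtruth_classes=2)

evaluator.add_single_ground_truth_image_info(
    image_key='img1',
    groundtruth_boxes=np.array([[0., 0., 1., 1.]], dtype=float),
    groundtruth_class_labels=np.array([0], dtype=int))

evaluator.add_single_detected_image_info(
    image_key='img1',
    detected_boxes=np.array([[0., 0., 1., 1.]], dtype=float),
    detected_scores=np.array([0.8], dtype=float),
    detected_class_labels=np.array([0], dtype=int))

(average_precisions, mean_ap, precisions, recalls,
 corlocs, mean_corloc) = evaluator.evaluate()
print(mean_ap)  # a perfect single detection should yield a mean AP of 1.0
```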
+# ============================================================================== + +"""Tests for object_detection.utils.object_detection_evaluation.""" + +import numpy as np +import tensorflow as tf + +from object_detection.utils import object_detection_evaluation + + +class ObjectDetectionEvaluationTest(tf.test.TestCase): + + def setUp(self): + num_groundtruth_classes = 3 + self.od_eval = object_detection_evaluation.ObjectDetectionEvaluation( + num_groundtruth_classes) + + image_key1 = "img1" + groundtruth_boxes1 = np.array([[0, 0, 1, 1], [0, 0, 2, 2], [0, 0, 3, 3]], + dtype=float) + groundtruth_class_labels1 = np.array([0, 2, 0], dtype=int) + self.od_eval.add_single_ground_truth_image_info( + image_key1, groundtruth_boxes1, groundtruth_class_labels1) + image_key2 = "img2" + groundtruth_boxes2 = np.array([[10, 10, 11, 11], [500, 500, 510, 510], + [10, 10, 12, 12]], dtype=float) + groundtruth_class_labels2 = np.array([0, 0, 2], dtype=int) + groundtruth_is_difficult_list2 = np.array([False, True, False], dtype=bool) + self.od_eval.add_single_ground_truth_image_info( + image_key2, groundtruth_boxes2, groundtruth_class_labels2, + groundtruth_is_difficult_list2) + image_key3 = "img3" + groundtruth_boxes3 = np.array([[0, 0, 1, 1]], dtype=float) + groundtruth_class_labels3 = np.array([1], dtype=int) + self.od_eval.add_single_ground_truth_image_info( + image_key3, groundtruth_boxes3, groundtruth_class_labels3) + + image_key = "img2" + detected_boxes = np.array( + [[10, 10, 11, 11], [100, 100, 120, 120], [100, 100, 220, 220]], + dtype=float) + detected_class_labels = np.array([0, 0, 2], dtype=int) + detected_scores = np.array([0.7, 0.8, 0.9], dtype=float) + self.od_eval.add_single_detected_image_info( + image_key, detected_boxes, detected_scores, detected_class_labels) + + def test_add_single_ground_truth_image_info(self): + expected_num_gt_instances_per_class = np.array([3, 1, 2], dtype=int) + expected_num_gt_imgs_per_class = np.array([2, 1, 2], dtype=int) + self.assertTrue(np.array_equal(expected_num_gt_instances_per_class, + self.od_eval.num_gt_instances_per_class)) + self.assertTrue(np.array_equal(expected_num_gt_imgs_per_class, + self.od_eval.num_gt_imgs_per_class)) + groundtruth_boxes2 = np.array([[10, 10, 11, 11], [500, 500, 510, 510], + [10, 10, 12, 12]], dtype=float) + self.assertTrue(np.allclose(self.od_eval.groundtruth_boxes["img2"], + groundtruth_boxes2)) + groundtruth_is_difficult_list2 = np.array([False, True, False], dtype=bool) + self.assertTrue(np.allclose( + self.od_eval.groundtruth_is_difficult_list["img2"], + groundtruth_is_difficult_list2)) + groundtruth_class_labels1 = np.array([0, 2, 0], dtype=int) + self.assertTrue(np.array_equal(self.od_eval.groundtruth_class_labels[ + "img1"], groundtruth_class_labels1)) + + def test_add_single_detected_image_info(self): + expected_scores_per_class = [[np.array([0.8, 0.7], dtype=float)], [], + [np.array([0.9], dtype=float)]] + expected_tp_fp_labels_per_class = [[np.array([0, 1], dtype=bool)], [], + [np.array([0], dtype=bool)]] + expected_num_images_correctly_detected_per_class = np.array([0, 0, 0], + dtype=int) + for i in range(self.od_eval.num_class): + for j in range(len(expected_scores_per_class[i])): + self.assertTrue(np.allclose(expected_scores_per_class[i][j], + self.od_eval.scores_per_class[i][j])) + self.assertTrue(np.array_equal(expected_tp_fp_labels_per_class[i][ + j], self.od_eval.tp_fp_labels_per_class[i][j])) + self.assertTrue(np.array_equal( + expected_num_images_correctly_detected_per_class, + 
self.od_eval.num_images_correctly_detected_per_class)) + + def test_evaluate(self): + (average_precision_per_class, mean_ap, precisions_per_class, + recalls_per_class, corloc_per_class, + mean_corloc) = self.od_eval.evaluate() + expected_precisions_per_class = [np.array([0, 0.5], dtype=float), + np.array([], dtype=float), + np.array([0], dtype=float)] + expected_recalls_per_class = [ + np.array([0, 1. / 3.], dtype=float), np.array([], dtype=float), + np.array([0], dtype=float) + ] + expected_average_precision_per_class = np.array([1. / 6., 0, 0], + dtype=float) + expected_corloc_per_class = np.array([0, np.divide(0, 0), 0], dtype=float) + expected_mean_ap = 1. / 18 + expected_mean_corloc = 0.0 + for i in range(self.od_eval.num_class): + self.assertTrue(np.allclose(expected_precisions_per_class[i], + precisions_per_class[i])) + self.assertTrue(np.allclose(expected_recalls_per_class[i], + recalls_per_class[i])) + self.assertTrue(np.allclose(expected_average_precision_per_class, + average_precision_per_class)) + self.assertTrue(np.allclose(expected_corloc_per_class, corloc_per_class)) + self.assertAlmostEqual(expected_mean_ap, mean_ap) + self.assertAlmostEqual(expected_mean_corloc, mean_corloc) + + +if __name__ == "__main__": + tf.test.main() diff --git a/object_detection/utils/ops.py b/object_detection/utils/ops.py new file mode 100644 index 0000000000000000000000000000000000000000..290cd33a8990ccfd9cab46b00715d8e4af0e9386 --- /dev/null +++ b/object_detection/utils/ops.py @@ -0,0 +1,651 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""A module for helper tensorflow ops.""" +import math +import six + +import tensorflow as tf + +from object_detection.core import box_list +from object_detection.core import box_list_ops +from object_detection.core import standard_fields as fields +from object_detection.utils import static_shape + + +def expanded_shape(orig_shape, start_dim, num_dims): + """Inserts multiple ones into a shape vector. + + Inserts an all-1 vector of length num_dims at position start_dim into a shape. + Can be combined with tf.reshape to generalize tf.expand_dims. + + Args: + orig_shape: the shape into which the all-1 vector is added (int32 vector) + start_dim: insertion position (int scalar) + num_dims: length of the inserted all-1 vector (int scalar) + Returns: + An int32 vector of length tf.size(orig_shape) + num_dims. + """ + with tf.name_scope('ExpandedShape'): + start_dim = tf.expand_dims(start_dim, 0) # scalar to rank-1 + before = tf.slice(orig_shape, [0], start_dim) + add_shape = tf.ones(tf.reshape(num_dims, [1]), dtype=tf.int32) + after = tf.slice(orig_shape, start_dim, [-1]) + new_shape = tf.concat([before, add_shape, after], 0) + return new_shape + + +def normalized_to_image_coordinates(normalized_boxes, image_shape, + parallel_iterations=32): + """Converts a batch of boxes from normal to image coordinates. 
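A minimal sketch of how `expanded_shape` generalizes `tf.expand_dims` (TensorFlow 1.x session style; the shape values are made up):

```
import tensorflow as tf

from object_detection.utils import ops

# Insert two singleton dimensions at position 1 of the shape [2, 3]:
# [2, 3] -> [2, 1, 1, 3].
new_shape = ops.expanded_shape(tf.constant([2, 3]), 1, 2)

with tf.Session() as sess:
  print(sess.run(new_shape))  # [2 1 1 3]
```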
+ + Args: + normalized_boxes: a float32 tensor of shape [None, num_boxes, 4] in + normalized coordinates. + image_shape: a float32 tensor of shape [4] containing the image shape. + parallel_iterations: parallelism for the map_fn op. + + Returns: + absolute_boxes: a float32 tensor of shape [None, num_boxes, 4] containg the + boxes in image coordinates. + """ + def _to_absolute_coordinates(normalized_boxes): + return box_list_ops.to_absolute_coordinates( + box_list.BoxList(normalized_boxes), + image_shape[1], image_shape[2], check_range=False).get() + + absolute_boxes = tf.map_fn( + _to_absolute_coordinates, + elems=(normalized_boxes), + dtype=tf.float32, + parallel_iterations=parallel_iterations, + back_prop=True) + return absolute_boxes + + +def meshgrid(x, y): + """Tiles the contents of x and y into a pair of grids. + + Multidimensional analog of numpy.meshgrid, giving the same behavior if x and y + are vectors. Generally, this will give: + + xgrid(i1, ..., i_m, j_1, ..., j_n) = x(j_1, ..., j_n) + ygrid(i1, ..., i_m, j_1, ..., j_n) = y(i_1, ..., i_m) + + Keep in mind that the order of the arguments and outputs is reverse relative + to the order of the indices they go into, done for compatibility with numpy. + The output tensors have the same shapes. Specifically: + + xgrid.get_shape() = y.get_shape().concatenate(x.get_shape()) + ygrid.get_shape() = y.get_shape().concatenate(x.get_shape()) + + Args: + x: A tensor of arbitrary shape and rank. xgrid will contain these values + varying in its last dimensions. + y: A tensor of arbitrary shape and rank. ygrid will contain these values + varying in its first dimensions. + Returns: + A tuple of tensors (xgrid, ygrid). + """ + with tf.name_scope('Meshgrid'): + x = tf.convert_to_tensor(x) + y = tf.convert_to_tensor(y) + x_exp_shape = expanded_shape(tf.shape(x), 0, tf.rank(y)) + y_exp_shape = expanded_shape(tf.shape(y), tf.rank(y), tf.rank(x)) + + xgrid = tf.tile(tf.reshape(x, x_exp_shape), y_exp_shape) + ygrid = tf.tile(tf.reshape(y, y_exp_shape), x_exp_shape) + new_shape = y.get_shape().concatenate(x.get_shape()) + xgrid.set_shape(new_shape) + ygrid.set_shape(new_shape) + + return xgrid, ygrid + + +def pad_to_multiple(tensor, multiple): + """Returns the tensor zero padded to the specified multiple. + + Appends 0s to the end of the first and second dimension (height and width) of + the tensor until both dimensions are a multiple of the input argument + 'multiple'. E.g. given an input tensor of shape [1, 3, 5, 1] and an input + multiple of 4, PadToMultiple will append 0s so that the resulting tensor will + be of shape [1, 4, 8, 1]. + + Args: + tensor: rank 4 float32 tensor, where + tensor -> [batch_size, height, width, channels]. + multiple: the multiple to pad to. + + Returns: + padded_tensor: the tensor zero padded to the specified multiple. 
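And a small sketch of `meshgrid`, which mirrors `numpy.meshgrid` for vector inputs (made-up values):

```
import tensorflow as tf

from object_detection.utils import ops

x = tf.constant([1., 2., 3.])
y = tf.constant([10., 20.])

# Both outputs have shape y.shape + x.shape = [2, 3]: xgrid repeats x along
# the rows, ygrid repeats y along the columns.
xgrid, ygrid = ops.meshgrid(x, y)

with tf.Session() as sess:
  print(sess.run(xgrid))  # [[1. 2. 3.], [1. 2. 3.]]
  print(sess.run(ygrid))  # [[10. 10. 10.], [20. 20. 20.]]
```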
+ """ + tensor_shape = tensor.get_shape() + batch_size = static_shape.get_batch_size(tensor_shape) + tensor_height = static_shape.get_height(tensor_shape) + tensor_width = static_shape.get_width(tensor_shape) + tensor_depth = static_shape.get_depth(tensor_shape) + + if batch_size is None: + batch_size = tf.shape(tensor)[0] + + if tensor_height is None: + tensor_height = tf.shape(tensor)[1] + padded_tensor_height = tf.to_int32( + tf.ceil(tf.to_float(tensor_height) / tf.to_float(multiple))) * multiple + else: + padded_tensor_height = int( + math.ceil(float(tensor_height) / multiple) * multiple) + + if tensor_width is None: + tensor_width = tf.shape(tensor)[2] + padded_tensor_width = tf.to_int32( + tf.ceil(tf.to_float(tensor_width) / tf.to_float(multiple))) * multiple + else: + padded_tensor_width = int( + math.ceil(float(tensor_width) / multiple) * multiple) + + if tensor_depth is None: + tensor_depth = tf.shape(tensor)[3] + + # Use tf.concat instead of tf.pad to preserve static shape + height_pad = tf.zeros([ + batch_size, padded_tensor_height - tensor_height, tensor_width, + tensor_depth + ]) + padded_tensor = tf.concat([tensor, height_pad], 1) + width_pad = tf.zeros([ + batch_size, padded_tensor_height, padded_tensor_width - tensor_width, + tensor_depth + ]) + padded_tensor = tf.concat([padded_tensor, width_pad], 2) + + return padded_tensor + + +def padded_one_hot_encoding(indices, depth, left_pad): + """Returns a zero padded one-hot tensor. + + This function converts a sparse representation of indices (e.g., [4]) to a + zero padded one-hot representation (e.g., [0, 0, 0, 0, 1] with depth = 4 and + left_pad = 1). If `indices` is empty, the result will simply be a tensor of + shape (0, depth + left_pad). If depth = 0, then this function just returns + `None`. + + Args: + indices: an integer tensor of shape [num_indices]. + depth: depth for the one-hot tensor (integer). + left_pad: number of zeros to left pad the one-hot tensor with (integer). + + Returns: + padded_onehot: a tensor with shape (num_indices, depth + left_pad). Returns + `None` if the depth is zero. + + Raises: + ValueError: if `indices` does not have rank 1 or if `left_pad` or `depth are + either negative or non-integers. + + TODO: add runtime checks for depth and indices. + """ + if depth < 0 or not isinstance(depth, (int, long) if six.PY2 else int): + raise ValueError('`depth` must be a non-negative integer.') + if left_pad < 0 or not isinstance(left_pad, (int, long) if six.PY2 else int): + raise ValueError('`left_pad` must be a non-negative integer.') + if depth == 0: + return None + if len(indices.get_shape().as_list()) != 1: + raise ValueError('`indices` must have rank 1') + + def one_hot_and_pad(): + one_hot = tf.cast(tf.one_hot(tf.cast(indices, tf.int64), depth, + on_value=1, off_value=0), tf.float32) + return tf.pad(one_hot, [[0, 0], [left_pad, 0]], mode='CONSTANT') + result = tf.cond(tf.greater(tf.size(indices), 0), one_hot_and_pad, + lambda: tf.zeros((depth + left_pad, 0))) + return tf.reshape(result, [-1, depth + left_pad]) + + +def dense_to_sparse_boxes(dense_locations, dense_num_boxes, num_classes): + """Converts bounding boxes from dense to sparse form. + + Args: + dense_locations: a [max_num_boxes, 4] tensor in which only the first k rows + are valid bounding box location coordinates, where k is the sum of + elements in dense_num_boxes. + dense_num_boxes: a [max_num_classes] tensor indicating the counts of + various bounding box classes e.g. 
[1, 0, 0, 2] means that the first + bounding box is of class 0 and the second and third bounding boxes are + of class 3. The sum of elements in this tensor is the number of valid + bounding boxes. + num_classes: number of classes + + Returns: + box_locations: a [num_boxes, 4] tensor containing only valid bounding + boxes (i.e. the first num_boxes rows of dense_locations) + box_classes: a [num_boxes] tensor containing the classes of each bounding + box (e.g. dense_num_boxes = [1, 0, 0, 2] => box_classes = [0, 3, 3] + """ + + num_valid_boxes = tf.reduce_sum(dense_num_boxes) + box_locations = tf.slice(dense_locations, + tf.constant([0, 0]), tf.stack([num_valid_boxes, 4])) + tiled_classes = [tf.tile([i], tf.expand_dims(dense_num_boxes[i], 0)) + for i in range(num_classes)] + box_classes = tf.concat(tiled_classes, 0) + box_locations.set_shape([None, 4]) + return box_locations, box_classes + + +def indices_to_dense_vector(indices, + size, + indices_value=1., + default_value=0, + dtype=tf.float32): + """Creates dense vector with indices set to specific value and rest to zeros. + + This function exists because it is unclear if it is safe to use + tf.sparse_to_dense(indices, [size], 1, validate_indices=False) + with indices which are not ordered. + This function accepts a dynamic size (e.g. tf.shape(tensor)[0]) + + Args: + indices: 1d Tensor with integer indices which are to be set to + indices_values. + size: scalar with size (integer) of output Tensor. + indices_value: values of elements specified by indices in the output vector + default_value: values of other elements in the output vector. + dtype: data type. + + Returns: + dense 1D Tensor of shape [size] with indices set to indices_values and the + rest set to default_value. + """ + size = tf.to_int32(size) + zeros = tf.ones([size], dtype=dtype) * default_value + values = tf.ones_like(indices, dtype=dtype) * indices_value + + return tf.dynamic_stitch([tf.range(size), tf.to_int32(indices)], + [zeros, values]) + + +def retain_groundtruth(tensor_dict, valid_indices): + """Retains groundtruth by valid indices. + + Args: + tensor_dict: a dictionary of following groundtruth tensors - + fields.InputDataFields.groundtruth_boxes + fields.InputDataFields.groundtruth_classes + fields.InputDataFields.groundtruth_is_crowd + fields.InputDataFields.groundtruth_area + fields.InputDataFields.groundtruth_label_types + fields.InputDataFields.groundtruth_difficult + valid_indices: a tensor with valid indices for the box-level groundtruth. + + Returns: + a dictionary of tensors containing only the groundtruth for valid_indices. + + Raises: + ValueError: If the shape of valid_indices is invalid. + ValueError: field fields.InputDataFields.groundtruth_boxes is + not present in tensor_dict. + """ + input_shape = valid_indices.get_shape().as_list() + if not (len(input_shape) == 1 or + (len(input_shape) == 2 and input_shape[1] == 1)): + raise ValueError('The shape of valid_indices is invalid.') + valid_indices = tf.reshape(valid_indices, [-1]) + valid_dict = {} + if fields.InputDataFields.groundtruth_boxes in tensor_dict: + # Prevents reshape failure when num_boxes is 0. + num_boxes = tf.maximum(tf.shape( + tensor_dict[fields.InputDataFields.groundtruth_boxes])[0], 1) + for key in tensor_dict: + if key in [fields.InputDataFields.groundtruth_boxes, + fields.InputDataFields.groundtruth_classes]: + valid_dict[key] = tf.gather(tensor_dict[key], valid_indices) + # Input decoder returns empty tensor when these fields are not provided. 
+ # Needs to reshape into [num_boxes, -1] for tf.gather() to work. + elif key in [fields.InputDataFields.groundtruth_is_crowd, + fields.InputDataFields.groundtruth_area, + fields.InputDataFields.groundtruth_difficult, + fields.InputDataFields.groundtruth_label_types]: + valid_dict[key] = tf.reshape( + tf.gather(tf.reshape(tensor_dict[key], [num_boxes, -1]), + valid_indices), [-1]) + # Fields that are not associated with boxes. + else: + valid_dict[key] = tensor_dict[key] + else: + raise ValueError('%s not present in input tensor dict.' % ( + fields.InputDataFields.groundtruth_boxes)) + return valid_dict + + +def retain_groundtruth_with_positive_classes(tensor_dict): + """Retains only groundtruth with positive class ids. + + Args: + tensor_dict: a dictionary of following groundtruth tensors - + fields.InputDataFields.groundtruth_boxes + fields.InputDataFields.groundtruth_classes + fields.InputDataFields.groundtruth_is_crowd + fields.InputDataFields.groundtruth_area + fields.InputDataFields.groundtruth_label_types + fields.InputDataFields.groundtruth_difficult + + Returns: + a dictionary of tensors containing only the groundtruth with positive + classes. + + Raises: + ValueError: If groundtruth_classes tensor is not in tensor_dict. + """ + if fields.InputDataFields.groundtruth_classes not in tensor_dict: + raise ValueError('`groundtruth classes` not in tensor_dict.') + keep_indices = tf.where(tf.greater( + tensor_dict[fields.InputDataFields.groundtruth_classes], 0)) + return retain_groundtruth(tensor_dict, keep_indices) + + +def filter_groundtruth_with_nan_box_coordinates(tensor_dict): + """Filters out groundtruth with no bounding boxes. + + Args: + tensor_dict: a dictionary of following groundtruth tensors - + fields.InputDataFields.groundtruth_boxes + fields.InputDataFields.groundtruth_classes + fields.InputDataFields.groundtruth_is_crowd + fields.InputDataFields.groundtruth_area + fields.InputDataFields.groundtruth_label_types + + Returns: + a dictionary of tensors containing only the groundtruth that have bounding + boxes. + """ + groundtruth_boxes = tensor_dict[fields.InputDataFields.groundtruth_boxes] + nan_indicator_vector = tf.greater(tf.reduce_sum(tf.to_int32( + tf.is_nan(groundtruth_boxes)), reduction_indices=[1]), 0) + valid_indicator_vector = tf.logical_not(nan_indicator_vector) + valid_indices = tf.where(valid_indicator_vector) + + return retain_groundtruth(tensor_dict, valid_indices) + + +def normalize_to_target(inputs, + target_norm_value, + dim, + epsilon=1e-7, + trainable=True, + scope='NormalizeToTarget', + summarize=True): + """L2 normalizes the inputs across the specified dimension to a target norm. + + This op implements the L2 Normalization layer introduced in + Liu, Wei, et al. "SSD: Single Shot MultiBox Detector." + and Liu, Wei, Andrew Rabinovich, and Alexander C. Berg. + "Parsenet: Looking wider to see better." and is useful for bringing + activations from multiple layers in a convnet to a standard scale. + + Note that the rank of `inputs` must be known and the dimension to which + normalization is to be applied should be statically defined. + + TODO: Add option to scale by L2 norm of the entire input. + + Args: + inputs: A `Tensor` of arbitrary size. + target_norm_value: A float value that specifies an initial target norm or + a list of floats (whose length must be equal to the depth along the + dimension to be normalized) specifying a per-dimension multiplier + after normalization. + dim: The dimension along which the input is normalized. 
+ epsilon: A small value to add to the inputs to avoid dividing by zero. + trainable: Whether the norm is trainable or not + scope: Optional scope for variable_scope. + summarize: Whether or not to add a tensorflow summary for the op. + + Returns: + The input tensor normalized to the specified target norm. + + Raises: + ValueError: If dim is smaller than the number of dimensions in 'inputs'. + ValueError: If target_norm_value is not a float or a list of floats with + length equal to the depth along the dimension to be normalized. + """ + with tf.variable_scope(scope, 'NormalizeToTarget', [inputs]): + if not inputs.get_shape(): + raise ValueError('The input rank must be known.') + input_shape = inputs.get_shape().as_list() + input_rank = len(input_shape) + if dim < 0 or dim >= input_rank: + raise ValueError( + 'dim must be non-negative but smaller than the input rank.') + if not input_shape[dim]: + raise ValueError('input shape should be statically defined along ' + 'the specified dimension.') + depth = input_shape[dim] + if not (isinstance(target_norm_value, float) or + (isinstance(target_norm_value, list) and + len(target_norm_value) == depth) and + all([isinstance(val, float) for val in target_norm_value])): + raise ValueError('target_norm_value must be a float or a list of floats ' + 'with length equal to the depth along the dimension to ' + 'be normalized.') + if isinstance(target_norm_value, float): + initial_norm = depth * [target_norm_value] + else: + initial_norm = target_norm_value + target_norm = tf.contrib.framework.model_variable( + name='weights', dtype=tf.float32, + initializer=tf.constant(initial_norm, dtype=tf.float32), + trainable=trainable) + if summarize: + mean = tf.reduce_mean(target_norm) + mean = tf.Print(mean, ['NormalizeToTarget:', mean]) + tf.summary.scalar(tf.get_variable_scope().name, mean) + lengths = epsilon + tf.sqrt(tf.reduce_sum(tf.square(inputs), dim, True)) + mult_shape = input_rank*[1] + mult_shape[dim] = depth + return tf.reshape(target_norm, mult_shape) * tf.truediv(inputs, lengths) + + +def position_sensitive_crop_regions(image, + boxes, + box_ind, + crop_size, + num_spatial_bins, + global_pool, + extrapolation_value=None): + """Position-sensitive crop and pool rectangular regions from a feature grid. + + The output crops are split into `spatial_bins_y` vertical bins + and `spatial_bins_x` horizontal bins. For each intersection of a vertical + and a horizontal bin the output values are gathered by performing + `tf.image.crop_and_resize` (bilinear resampling) on a a separate subset of + channels of the image. This reduces `depth` by a factor of + `(spatial_bins_y * spatial_bins_x)`. + + When global_pool is True, this function implements a differentiable version + of position-sensitive RoI pooling used in + [R-FCN detection system](https://arxiv.org/abs/1605.06409). + + When global_pool is False, this function implements a differentiable version + of position-sensitive assembling operation used in + [instance FCN](https://arxiv.org/abs/1603.08678). + + Args: + image: A `Tensor`. Must be one of the following types: `uint8`, `int8`, + `int16`, `int32`, `int64`, `half`, `float32`, `float64`. + A 4-D tensor of shape `[batch, image_height, image_width, depth]`. + Both `image_height` and `image_width` need to be positive. + boxes: A `Tensor` of type `float32`. + A 2-D tensor of shape `[num_boxes, 4]`. The `i`-th row of the tensor + specifies the coordinates of a box in the `box_ind[i]` image and is + specified in normalized coordinates `[y1, x1, y2, x2]`. 
A normalized + coordinate value of `y` is mapped to the image coordinate at + `y * (image_height - 1)`, so as the `[0, 1]` interval of normalized image + height is mapped to `[0, image_height - 1] in image height coordinates. + We do allow y1 > y2, in which case the sampled crop is an up-down flipped + version of the original image. The width dimension is treated similarly. + Normalized coordinates outside the `[0, 1]` range are allowed, in which + case we use `extrapolation_value` to extrapolate the input image values. + box_ind: A `Tensor` of type `int32`. + A 1-D tensor of shape `[num_boxes]` with int32 values in `[0, batch)`. + The value of `box_ind[i]` specifies the image that the `i`-th box refers + to. + crop_size: A list of two integers `[crop_height, crop_width]`. All + cropped image patches are resized to this size. The aspect ratio of the + image content is not preserved. Both `crop_height` and `crop_width` need + to be positive. + num_spatial_bins: A list of two integers `[spatial_bins_y, spatial_bins_x]`. + Represents the number of position-sensitive bins in y and x directions. + Both values should be >= 1. `crop_height` should be divisible by + `spatial_bins_y`, and similarly for width. + The number of image channels should be divisible by + (spatial_bins_y * spatial_bins_x). + Suggested value from R-FCN paper: [3, 3]. + global_pool: A boolean variable. + If True, we perform average global pooling on the features assembled from + the position-sensitive score maps. + If False, we keep the position-pooled features without global pooling + over the spatial coordinates. + Note that using global_pool=True is equivalent to but more efficient than + running the function with global_pool=False and then performing global + average pooling. + extrapolation_value: An optional `float`. Defaults to `0`. + Value used for extrapolation, when applicable. + Returns: + position_sensitive_features: A 4-D tensor of shape + `[num_boxes, K, K, crop_channels]`, + where `crop_channels = depth / (spatial_bins_y * spatial_bins_x)`, + where K = 1 when global_pool is True (Average-pooled cropped regions), + and K = crop_size when global_pool is False. + Raises: + ValueError: Raised in four situations: + `num_spatial_bins` is not >= 1; + `num_spatial_bins` does not divide `crop_size`; + `(spatial_bins_y*spatial_bins_x)` does not divide `depth`; + `bin_crop_size` is not square when global_pool=False due to the + constraint in function space_to_depth. + """ + total_bins = 1 + bin_crop_size = [] + + for (num_bins, crop_dim) in zip(num_spatial_bins, crop_size): + if num_bins < 1: + raise ValueError('num_spatial_bins should be >= 1') + + if crop_dim % num_bins != 0: + raise ValueError('crop_size should be divisible by num_spatial_bins') + + total_bins *= num_bins + bin_crop_size.append(crop_dim // num_bins) + + if not global_pool and bin_crop_size[0] != bin_crop_size[1]: + raise ValueError('Only support square bin crop size for now.') + + ymin, xmin, ymax, xmax = tf.unstack(boxes, axis=1) + spatial_bins_y, spatial_bins_x = num_spatial_bins + + # Split each box into spatial_bins_y * spatial_bins_x bins. 
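+  # Bin (bin_y, bin_x) of a box covers the sub-window
+  #   [ymin + bin_y * step_y, xmin + bin_x * step_x,
+  #    ymin + (bin_y + 1) * step_y, xmin + (bin_x + 1) * step_x]
+  # and is later paired (via the zip below) with its own contiguous slice of
+  # the input channels, which is what makes the crops position-sensitive.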
+ position_sensitive_boxes = [] + for bin_y in range(spatial_bins_y): + step_y = (ymax - ymin) / spatial_bins_y + for bin_x in range(spatial_bins_x): + step_x = (xmax - xmin) / spatial_bins_x + box_coordinates = [ymin + bin_y * step_y, + xmin + bin_x * step_x, + ymin + (bin_y + 1) * step_y, + xmin + (bin_x + 1) * step_x, + ] + position_sensitive_boxes.append(tf.stack(box_coordinates, axis=1)) + + image_splits = tf.split(value=image, num_or_size_splits=total_bins, axis=3) + + image_crops = [] + for (split, box) in zip(image_splits, position_sensitive_boxes): + crop = tf.image.crop_and_resize(split, box, box_ind, bin_crop_size, + extrapolation_value=extrapolation_value) + image_crops.append(crop) + + if global_pool: + # Average over all bins. + position_sensitive_features = tf.add_n(image_crops) / len(image_crops) + # Then average over spatial positions within the bins. + position_sensitive_features = tf.reduce_mean( + position_sensitive_features, [1, 2], keep_dims=True) + else: + # Reorder height/width to depth channel. + block_size = bin_crop_size[0] + if block_size >= 2: + image_crops = [tf.space_to_depth( + crop, block_size=block_size) for crop in image_crops] + + # Pack image_crops so that first dimension is for position-senstive boxes. + position_sensitive_features = tf.stack(image_crops, axis=0) + + # Unroll the position-sensitive boxes to spatial positions. + position_sensitive_features = tf.squeeze( + tf.batch_to_space_nd(position_sensitive_features, + block_shape=[1] + num_spatial_bins, + crops=tf.zeros((3, 2), dtype=tf.int32)), + squeeze_dims=[0]) + + # Reorder back the depth channel. + if block_size >= 2: + position_sensitive_features = tf.depth_to_space( + position_sensitive_features, block_size=block_size) + + return position_sensitive_features + + +def reframe_box_masks_to_image_masks(box_masks, boxes, image_height, + image_width): + """Transforms the box masks back to full image masks. + + Embeds masks in bounding boxes of larger masks whose shapes correspond to + image shape. + + Args: + box_masks: A tf.float32 tensor of size [num_masks, mask_height, mask_width]. + boxes: A tf.float32 tensor of size [num_masks, 4] containing the box + corners. Row i contains [ymin, xmin, ymax, xmax] of the box + corresponding to mask i. Note that the box corners are in + normalized coordinates. + image_height: Image height. The output mask will have the same height as + the image height. + image_width: Image width. The output mask will have the same width as the + image width. + + Returns: + A tf.float32 tensor of size [num_masks, image_height, image_width]. + """ + # TODO: Make this a public function. 
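+  # Implementation note: rather than resizing each mask and pasting it into a
+  # zero image, we express the unit box [0, 0, 1, 1] (the full image) in the
+  # coordinate frame of each entry of `boxes` ("reverse" boxes) and let
+  # tf.image.crop_and_resize resample the mask onto the full image; samples
+  # that fall outside the original box land outside [0, 1] and take the
+  # extrapolation_value of 0.0.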
+ def transform_boxes_relative_to_boxes(boxes, reference_boxes): + boxes = tf.reshape(boxes, [-1, 2, 2]) + min_corner = tf.expand_dims(reference_boxes[:, 0:2], 1) + max_corner = tf.expand_dims(reference_boxes[:, 2:4], 1) + transformed_boxes = (boxes - min_corner) / (max_corner - min_corner) + return tf.reshape(transformed_boxes, [-1, 4]) + + box_masks = tf.expand_dims(box_masks, axis=3) + num_boxes = tf.shape(box_masks)[0] + unit_boxes = tf.concat( + [tf.zeros([num_boxes, 2]), tf.ones([num_boxes, 2])], axis=1) + reverse_boxes = transform_boxes_relative_to_boxes(unit_boxes, boxes) + image_masks = tf.image.crop_and_resize(image=box_masks, + boxes=reverse_boxes, + box_ind=tf.range(num_boxes), + crop_size=[image_height, image_width], + extrapolation_value=0.0) + return tf.squeeze(image_masks, axis=3) diff --git a/object_detection/utils/ops_test.py b/object_detection/utils/ops_test.py new file mode 100644 index 0000000000000000000000000000000000000000..1765c82a24bb0f1e5c58ef94fe3f578c11461192 --- /dev/null +++ b/object_detection/utils/ops_test.py @@ -0,0 +1,1033 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.utils.ops.""" +import numpy as np +import tensorflow as tf + +from object_detection.core import standard_fields as fields +from object_detection.utils import ops + + +class NormalizedToImageCoordinatesTest(tf.test.TestCase): + + def test_normalized_to_image_coordinates(self): + normalized_boxes = tf.placeholder(tf.float32, shape=(None, 1, 4)) + normalized_boxes_np = np.array([[[0.0, 0.0, 1.0, 1.0]], + [[0.5, 0.5, 1.0, 1.0]]]) + image_shape = tf.convert_to_tensor([1, 4, 4, 3], dtype=tf.int32) + absolute_boxes = ops.normalized_to_image_coordinates(normalized_boxes, + image_shape, + parallel_iterations=2) + + expected_boxes = np.array([[[0, 0, 4, 4]], + [[2, 2, 4, 4]]]) + with self.test_session() as sess: + absolute_boxes = sess.run(absolute_boxes, + feed_dict={normalized_boxes: + normalized_boxes_np}) + + self.assertAllEqual(absolute_boxes, expected_boxes) + + +class MeshgridTest(tf.test.TestCase): + + def test_meshgrid_numpy_comparison(self): + """Tests meshgrid op with vectors, for which it should match numpy.""" + x = np.arange(4) + y = np.arange(6) + exp_xgrid, exp_ygrid = np.meshgrid(x, y) + xgrid, ygrid = ops.meshgrid(x, y) + with self.test_session() as sess: + xgrid_output, ygrid_output = sess.run([xgrid, ygrid]) + self.assertAllEqual(xgrid_output, exp_xgrid) + self.assertAllEqual(ygrid_output, exp_ygrid) + + def test_meshgrid_multidimensional(self): + np.random.seed(18) + x = np.random.rand(4, 1, 2).astype(np.float32) + y = np.random.rand(2, 3).astype(np.float32) + + xgrid, ygrid = ops.meshgrid(x, y) + + grid_shape = list(y.shape) + list(x.shape) + self.assertEqual(xgrid.get_shape().as_list(), grid_shape) + self.assertEqual(ygrid.get_shape().as_list(), grid_shape) + with self.test_session() as sess: + 
xgrid_output, ygrid_output = sess.run([xgrid, ygrid]) + + # Check the shape of the output grids + self.assertEqual(xgrid_output.shape, tuple(grid_shape)) + self.assertEqual(ygrid_output.shape, tuple(grid_shape)) + + # Check a few elements + test_elements = [((3, 0, 0), (1, 2)), + ((2, 0, 1), (0, 0)), + ((0, 0, 0), (1, 1))] + for xind, yind in test_elements: + # These are float equality tests, but the meshgrid op should not introduce + # rounding. + self.assertEqual(xgrid_output[yind + xind], x[xind]) + self.assertEqual(ygrid_output[yind + xind], y[yind]) + + +class OpsTestPadToMultiple(tf.test.TestCase): + + def test_zero_padding(self): + tensor = tf.constant([[[[0.], [0.]], [[0.], [0.]]]]) + padded_tensor = ops.pad_to_multiple(tensor, 1) + with self.test_session() as sess: + padded_tensor_out = sess.run(padded_tensor) + self.assertEqual((1, 2, 2, 1), padded_tensor_out.shape) + + def test_no_padding(self): + tensor = tf.constant([[[[0.], [0.]], [[0.], [0.]]]]) + padded_tensor = ops.pad_to_multiple(tensor, 2) + with self.test_session() as sess: + padded_tensor_out = sess.run(padded_tensor) + self.assertEqual((1, 2, 2, 1), padded_tensor_out.shape) + + def test_padding(self): + tensor = tf.constant([[[[0.], [0.]], [[0.], [0.]]]]) + padded_tensor = ops.pad_to_multiple(tensor, 4) + with self.test_session() as sess: + padded_tensor_out = sess.run(padded_tensor) + self.assertEqual((1, 4, 4, 1), padded_tensor_out.shape) + + +class OpsTestPaddedOneHotEncoding(tf.test.TestCase): + + def test_correct_one_hot_tensor_with_no_pad(self): + indices = tf.constant([1, 2, 3, 5]) + one_hot_tensor = ops.padded_one_hot_encoding(indices, depth=6, left_pad=0) + expected_tensor = np.array([[0, 1, 0, 0, 0, 0], + [0, 0, 1, 0, 0, 0], + [0, 0, 0, 1, 0, 0], + [0, 0, 0, 0, 0, 1]], np.float32) + with self.test_session() as sess: + out_one_hot_tensor = sess.run(one_hot_tensor) + self.assertAllClose(out_one_hot_tensor, expected_tensor, rtol=1e-10, + atol=1e-10) + + def test_correct_one_hot_tensor_with_pad_one(self): + indices = tf.constant([1, 2, 3, 5]) + one_hot_tensor = ops.padded_one_hot_encoding(indices, depth=6, left_pad=1) + expected_tensor = np.array([[0, 0, 1, 0, 0, 0, 0], + [0, 0, 0, 1, 0, 0, 0], + [0, 0, 0, 0, 1, 0, 0], + [0, 0, 0, 0, 0, 0, 1]], np.float32) + with self.test_session() as sess: + out_one_hot_tensor = sess.run(one_hot_tensor) + self.assertAllClose(out_one_hot_tensor, expected_tensor, rtol=1e-10, + atol=1e-10) + + def test_correct_one_hot_tensor_with_pad_three(self): + indices = tf.constant([1, 2, 3, 5]) + one_hot_tensor = ops.padded_one_hot_encoding(indices, depth=6, left_pad=3) + expected_tensor = np.array([[0, 0, 0, 0, 1, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 1, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 1, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 1]], np.float32) + with self.test_session() as sess: + out_one_hot_tensor = sess.run(one_hot_tensor) + self.assertAllClose(out_one_hot_tensor, expected_tensor, rtol=1e-10, + atol=1e-10) + + def test_correct_padded_one_hot_tensor_with_empty_indices(self): + depth = 6 + pad = 2 + indices = tf.constant([]) + one_hot_tensor = ops.padded_one_hot_encoding( + indices, depth=depth, left_pad=pad) + expected_tensor = np.zeros((0, depth + pad)) + with self.test_session() as sess: + out_one_hot_tensor = sess.run(one_hot_tensor) + self.assertAllClose(out_one_hot_tensor, expected_tensor, rtol=1e-10, + atol=1e-10) + + def test_return_none_on_zero_depth(self): + indices = tf.constant([1, 2, 3, 4, 5]) + one_hot_tensor = ops.padded_one_hot_encoding(indices, depth=0, left_pad=2) + 
self.assertEqual(one_hot_tensor, None) + + def test_raise_value_error_on_rank_two_input(self): + indices = tf.constant(1.0, shape=(2, 3)) + with self.assertRaises(ValueError): + ops.padded_one_hot_encoding(indices, depth=6, left_pad=2) + + def test_raise_value_error_on_negative_pad(self): + indices = tf.constant(1.0, shape=(2, 3)) + with self.assertRaises(ValueError): + ops.padded_one_hot_encoding(indices, depth=6, left_pad=-1) + + def test_raise_value_error_on_float_pad(self): + indices = tf.constant(1.0, shape=(2, 3)) + with self.assertRaises(ValueError): + ops.padded_one_hot_encoding(indices, depth=6, left_pad=0.1) + + def test_raise_value_error_on_float_depth(self): + indices = tf.constant(1.0, shape=(2, 3)) + with self.assertRaises(ValueError): + ops.padded_one_hot_encoding(indices, depth=0.1, left_pad=2) + + +class OpsDenseToSparseBoxesTest(tf.test.TestCase): + + def test_return_all_boxes_when_all_input_boxes_are_valid(self): + num_classes = 4 + num_valid_boxes = 3 + code_size = 4 + dense_location_placeholder = tf.placeholder(tf.float32, + shape=(num_valid_boxes, + code_size)) + dense_num_boxes_placeholder = tf.placeholder(tf.int32, shape=(num_classes)) + box_locations, box_classes = ops.dense_to_sparse_boxes( + dense_location_placeholder, dense_num_boxes_placeholder, num_classes) + feed_dict = {dense_location_placeholder: np.random.uniform( + size=[num_valid_boxes, code_size]), + dense_num_boxes_placeholder: np.array([1, 0, 0, 2], + dtype=np.int32)} + + expected_box_locations = feed_dict[dense_location_placeholder] + expected_box_classses = np.array([0, 3, 3]) + with self.test_session() as sess: + box_locations, box_classes = sess.run([box_locations, box_classes], + feed_dict=feed_dict) + + self.assertAllClose(box_locations, expected_box_locations, rtol=1e-6, + atol=1e-6) + self.assertAllEqual(box_classes, expected_box_classses) + + def test_return_only_valid_boxes_when_input_contains_invalid_boxes(self): + num_classes = 4 + num_valid_boxes = 3 + num_boxes = 10 + code_size = 4 + + dense_location_placeholder = tf.placeholder(tf.float32, shape=(num_boxes, + code_size)) + dense_num_boxes_placeholder = tf.placeholder(tf.int32, shape=(num_classes)) + box_locations, box_classes = ops.dense_to_sparse_boxes( + dense_location_placeholder, dense_num_boxes_placeholder, num_classes) + feed_dict = {dense_location_placeholder: np.random.uniform( + size=[num_boxes, code_size]), + dense_num_boxes_placeholder: np.array([1, 0, 0, 2], + dtype=np.int32)} + + expected_box_locations = (feed_dict[dense_location_placeholder] + [:num_valid_boxes]) + expected_box_classses = np.array([0, 3, 3]) + with self.test_session() as sess: + box_locations, box_classes = sess.run([box_locations, box_classes], + feed_dict=feed_dict) + + self.assertAllClose(box_locations, expected_box_locations, rtol=1e-6, + atol=1e-6) + self.assertAllEqual(box_classes, expected_box_classses) + + +class OpsTestIndicesToDenseVector(tf.test.TestCase): + + def test_indices_to_dense_vector(self): + size = 10000 + num_indices = np.random.randint(size) + rand_indices = np.random.permutation(np.arange(size))[0:num_indices] + + expected_output = np.zeros(size, dtype=np.float32) + expected_output[rand_indices] = 1. 
+ + tf_rand_indices = tf.constant(rand_indices) + indicator = ops.indices_to_dense_vector(tf_rand_indices, size) + + with self.test_session() as sess: + output = sess.run(indicator) + self.assertAllEqual(output, expected_output) + self.assertEqual(output.dtype, expected_output.dtype) + + def test_indices_to_dense_vector_size_at_inference(self): + size = 5000 + num_indices = 250 + all_indices = np.arange(size) + rand_indices = np.random.permutation(all_indices)[0:num_indices] + + expected_output = np.zeros(size, dtype=np.float32) + expected_output[rand_indices] = 1. + + tf_all_indices = tf.placeholder(tf.int32) + tf_rand_indices = tf.constant(rand_indices) + indicator = ops.indices_to_dense_vector(tf_rand_indices, + tf.shape(tf_all_indices)[0]) + feed_dict = {tf_all_indices: all_indices} + + with self.test_session() as sess: + output = sess.run(indicator, feed_dict=feed_dict) + self.assertAllEqual(output, expected_output) + self.assertEqual(output.dtype, expected_output.dtype) + + def test_indices_to_dense_vector_int(self): + size = 500 + num_indices = 25 + rand_indices = np.random.permutation(np.arange(size))[0:num_indices] + + expected_output = np.zeros(size, dtype=np.int64) + expected_output[rand_indices] = 1 + + tf_rand_indices = tf.constant(rand_indices) + indicator = ops.indices_to_dense_vector( + tf_rand_indices, size, 1, dtype=tf.int64) + + with self.test_session() as sess: + output = sess.run(indicator) + self.assertAllEqual(output, expected_output) + self.assertEqual(output.dtype, expected_output.dtype) + + def test_indices_to_dense_vector_custom_values(self): + size = 100 + num_indices = 10 + rand_indices = np.random.permutation(np.arange(size))[0:num_indices] + indices_value = np.random.rand(1) + default_value = np.random.rand(1) + + expected_output = np.float32(np.ones(size) * default_value) + expected_output[rand_indices] = indices_value + + tf_rand_indices = tf.constant(rand_indices) + indicator = ops.indices_to_dense_vector( + tf_rand_indices, + size, + indices_value=indices_value, + default_value=default_value) + + with self.test_session() as sess: + output = sess.run(indicator) + self.assertAllClose(output, expected_output) + self.assertEqual(output.dtype, expected_output.dtype) + + def test_indices_to_dense_vector_all_indices_as_input(self): + size = 500 + num_indices = 500 + rand_indices = np.random.permutation(np.arange(size))[0:num_indices] + + expected_output = np.ones(size, dtype=np.float32) + + tf_rand_indices = tf.constant(rand_indices) + indicator = ops.indices_to_dense_vector(tf_rand_indices, size) + + with self.test_session() as sess: + output = sess.run(indicator) + self.assertAllEqual(output, expected_output) + self.assertEqual(output.dtype, expected_output.dtype) + + def test_indices_to_dense_vector_empty_indices_as_input(self): + size = 500 + rand_indices = [] + + expected_output = np.zeros(size, dtype=np.float32) + + tf_rand_indices = tf.constant(rand_indices) + indicator = ops.indices_to_dense_vector(tf_rand_indices, size) + + with self.test_session() as sess: + output = sess.run(indicator) + self.assertAllEqual(output, expected_output) + self.assertEqual(output.dtype, expected_output.dtype) + + +class GroundtruthFilterTest(tf.test.TestCase): + + def test_filter_groundtruth(self): + input_image = tf.placeholder(tf.float32, shape=(None, None, 3)) + input_boxes = tf.placeholder(tf.float32, shape=(None, 4)) + input_classes = tf.placeholder(tf.int32, shape=(None,)) + input_is_crowd = tf.placeholder(tf.bool, shape=(None,)) + input_area = 
tf.placeholder(tf.float32, shape=(None,)) + input_difficult = tf.placeholder(tf.float32, shape=(None,)) + input_label_types = tf.placeholder(tf.string, shape=(None,)) + valid_indices = tf.placeholder(tf.int32, shape=(None,)) + input_tensors = { + fields.InputDataFields.image: input_image, + fields.InputDataFields.groundtruth_boxes: input_boxes, + fields.InputDataFields.groundtruth_classes: input_classes, + fields.InputDataFields.groundtruth_is_crowd: input_is_crowd, + fields.InputDataFields.groundtruth_area: input_area, + fields.InputDataFields.groundtruth_difficult: input_difficult, + fields.InputDataFields.groundtruth_label_types: input_label_types + } + output_tensors = ops.retain_groundtruth(input_tensors, valid_indices) + + image_tensor = np.random.rand(224, 224, 3) + feed_dict = { + input_image: image_tensor, + input_boxes: + np.array([[0.2, 0.4, 0.1, 0.8], [0.2, 0.4, 1.0, 0.8]], dtype=np.float), + input_classes: + np.array([1, 2], dtype=np.int32), + input_is_crowd: + np.array([False, True], dtype=np.bool), + input_area: + np.array([32, 48], dtype=np.float32), + input_difficult: + np.array([True, False], dtype=np.bool), + input_label_types: + np.array(['APPROPRIATE', 'INCORRECT'], dtype=np.string_), + valid_indices: + np.array([0], dtype=np.int32) + } + expected_tensors = { + fields.InputDataFields.image: + image_tensor, + fields.InputDataFields.groundtruth_boxes: + [[0.2, 0.4, 0.1, 0.8]], + fields.InputDataFields.groundtruth_classes: + [1], + fields.InputDataFields.groundtruth_is_crowd: + [False], + fields.InputDataFields.groundtruth_area: + [32], + fields.InputDataFields.groundtruth_difficult: + [True], + fields.InputDataFields.groundtruth_label_types: + ['APPROPRIATE'] + } + with self.test_session() as sess: + output_tensors = sess.run(output_tensors, feed_dict=feed_dict) + for key in [fields.InputDataFields.image, + fields.InputDataFields.groundtruth_boxes, + fields.InputDataFields.groundtruth_area]: + self.assertAllClose(expected_tensors[key], output_tensors[key]) + for key in [fields.InputDataFields.groundtruth_classes, + fields.InputDataFields.groundtruth_is_crowd, + fields.InputDataFields.groundtruth_label_types]: + self.assertAllEqual(expected_tensors[key], output_tensors[key]) + + def test_filter_with_missing_fields(self): + input_boxes = tf.placeholder(tf.float32, shape=(None, 4)) + input_classes = tf.placeholder(tf.int32, shape=(None,)) + input_tensors = { + fields.InputDataFields.groundtruth_boxes: input_boxes, + fields.InputDataFields.groundtruth_classes: input_classes + } + valid_indices = tf.placeholder(tf.int32, shape=(None,)) + + feed_dict = { + input_boxes: + np.array([[0.2, 0.4, 0.1, 0.8], [0.2, 0.4, 1.0, 0.8]], dtype=np.float), + input_classes: + np.array([1, 2], dtype=np.int32), + valid_indices: + np.array([0], dtype=np.int32) + } + expected_tensors = { + fields.InputDataFields.groundtruth_boxes: + [[0.2, 0.4, 0.1, 0.8]], + fields.InputDataFields.groundtruth_classes: + [1] + } + + output_tensors = ops.retain_groundtruth(input_tensors, valid_indices) + with self.test_session() as sess: + output_tensors = sess.run(output_tensors, feed_dict=feed_dict) + for key in [fields.InputDataFields.groundtruth_boxes]: + self.assertAllClose(expected_tensors[key], output_tensors[key]) + for key in [fields.InputDataFields.groundtruth_classes]: + self.assertAllEqual(expected_tensors[key], output_tensors[key]) + + def test_filter_with_empty_fields(self): + input_boxes = tf.placeholder(tf.float32, shape=(None, 4)) + input_classes = tf.placeholder(tf.int32, shape=(None,)) + 
input_is_crowd = tf.placeholder(tf.bool, shape=(None,)) + input_area = tf.placeholder(tf.float32, shape=(None,)) + input_difficult = tf.placeholder(tf.float32, shape=(None,)) + valid_indices = tf.placeholder(tf.int32, shape=(None,)) + input_tensors = { + fields.InputDataFields.groundtruth_boxes: input_boxes, + fields.InputDataFields.groundtruth_classes: input_classes, + fields.InputDataFields.groundtruth_is_crowd: input_is_crowd, + fields.InputDataFields.groundtruth_area: input_area, + fields.InputDataFields.groundtruth_difficult: input_difficult + } + output_tensors = ops.retain_groundtruth(input_tensors, valid_indices) + + feed_dict = { + input_boxes: + np.array([[0.2, 0.4, 0.1, 0.8], [0.2, 0.4, 1.0, 0.8]], dtype=np.float), + input_classes: + np.array([1, 2], dtype=np.int32), + input_is_crowd: + np.array([False, True], dtype=np.bool), + input_area: + np.array([], dtype=np.float32), + input_difficult: + np.array([], dtype=np.float32), + valid_indices: + np.array([0], dtype=np.int32) + } + expected_tensors = { + fields.InputDataFields.groundtruth_boxes: + [[0.2, 0.4, 0.1, 0.8]], + fields.InputDataFields.groundtruth_classes: + [1], + fields.InputDataFields.groundtruth_is_crowd: + [False], + fields.InputDataFields.groundtruth_area: + [], + fields.InputDataFields.groundtruth_difficult: + [] + } + with self.test_session() as sess: + output_tensors = sess.run(output_tensors, feed_dict=feed_dict) + for key in [fields.InputDataFields.groundtruth_boxes, + fields.InputDataFields.groundtruth_area]: + self.assertAllClose(expected_tensors[key], output_tensors[key]) + for key in [fields.InputDataFields.groundtruth_classes, + fields.InputDataFields.groundtruth_is_crowd]: + self.assertAllEqual(expected_tensors[key], output_tensors[key]) + + def test_filter_with_empty_groundtruth_boxes(self): + input_boxes = tf.placeholder(tf.float32, shape=(None, 4)) + input_classes = tf.placeholder(tf.int32, shape=(None,)) + input_is_crowd = tf.placeholder(tf.bool, shape=(None,)) + input_area = tf.placeholder(tf.float32, shape=(None,)) + input_difficult = tf.placeholder(tf.float32, shape=(None,)) + valid_indices = tf.placeholder(tf.int32, shape=(None,)) + input_tensors = { + fields.InputDataFields.groundtruth_boxes: input_boxes, + fields.InputDataFields.groundtruth_classes: input_classes, + fields.InputDataFields.groundtruth_is_crowd: input_is_crowd, + fields.InputDataFields.groundtruth_area: input_area, + fields.InputDataFields.groundtruth_difficult: input_difficult + } + output_tensors = ops.retain_groundtruth(input_tensors, valid_indices) + + feed_dict = { + input_boxes: + np.array([], dtype=np.float).reshape(0, 4), + input_classes: + np.array([], dtype=np.int32), + input_is_crowd: + np.array([], dtype=np.bool), + input_area: + np.array([], dtype=np.float32), + input_difficult: + np.array([], dtype=np.float32), + valid_indices: + np.array([], dtype=np.int32) + } + with self.test_session() as sess: + output_tensors = sess.run(output_tensors, feed_dict=feed_dict) + for key in input_tensors: + if key == fields.InputDataFields.groundtruth_boxes: + self.assertAllEqual([0, 4], output_tensors[key].shape) + else: + self.assertAllEqual([0], output_tensors[key].shape) + + +class RetainGroundTruthWithPositiveClasses(tf.test.TestCase): + + def test_filter_groundtruth_with_positive_classes(self): + input_image = tf.placeholder(tf.float32, shape=(None, None, 3)) + input_boxes = tf.placeholder(tf.float32, shape=(None, 4)) + input_classes = tf.placeholder(tf.int32, shape=(None,)) + input_is_crowd = tf.placeholder(tf.bool, 
shape=(None,)) + input_area = tf.placeholder(tf.float32, shape=(None,)) + input_difficult = tf.placeholder(tf.float32, shape=(None,)) + input_label_types = tf.placeholder(tf.string, shape=(None,)) + valid_indices = tf.placeholder(tf.int32, shape=(None,)) + input_tensors = { + fields.InputDataFields.image: input_image, + fields.InputDataFields.groundtruth_boxes: input_boxes, + fields.InputDataFields.groundtruth_classes: input_classes, + fields.InputDataFields.groundtruth_is_crowd: input_is_crowd, + fields.InputDataFields.groundtruth_area: input_area, + fields.InputDataFields.groundtruth_difficult: input_difficult, + fields.InputDataFields.groundtruth_label_types: input_label_types + } + output_tensors = ops.retain_groundtruth_with_positive_classes(input_tensors) + + image_tensor = np.random.rand(224, 224, 3) + feed_dict = { + input_image: image_tensor, + input_boxes: + np.array([[0.2, 0.4, 0.1, 0.8], [0.2, 0.4, 1.0, 0.8]], dtype=np.float), + input_classes: + np.array([1, 0], dtype=np.int32), + input_is_crowd: + np.array([False, True], dtype=np.bool), + input_area: + np.array([32, 48], dtype=np.float32), + input_difficult: + np.array([True, False], dtype=np.bool), + input_label_types: + np.array(['APPROPRIATE', 'INCORRECT'], dtype=np.string_), + valid_indices: + np.array([0], dtype=np.int32) + } + expected_tensors = { + fields.InputDataFields.image: + image_tensor, + fields.InputDataFields.groundtruth_boxes: + [[0.2, 0.4, 0.1, 0.8]], + fields.InputDataFields.groundtruth_classes: + [1], + fields.InputDataFields.groundtruth_is_crowd: + [False], + fields.InputDataFields.groundtruth_area: + [32], + fields.InputDataFields.groundtruth_difficult: + [True], + fields.InputDataFields.groundtruth_label_types: + ['APPROPRIATE'] + } + with self.test_session() as sess: + output_tensors = sess.run(output_tensors, feed_dict=feed_dict) + for key in [fields.InputDataFields.image, + fields.InputDataFields.groundtruth_boxes, + fields.InputDataFields.groundtruth_area]: + self.assertAllClose(expected_tensors[key], output_tensors[key]) + for key in [fields.InputDataFields.groundtruth_classes, + fields.InputDataFields.groundtruth_is_crowd, + fields.InputDataFields.groundtruth_label_types]: + self.assertAllEqual(expected_tensors[key], output_tensors[key]) + + +class GroundtruthFilterWithNanBoxTest(tf.test.TestCase): + + def test_filter_groundtruth_with_nan_box_coordinates(self): + input_tensors = { + fields.InputDataFields.groundtruth_boxes: + [[np.nan, np.nan, np.nan, np.nan], [0.2, 0.4, 0.1, 0.8]], + fields.InputDataFields.groundtruth_classes: + [1, 2], + fields.InputDataFields.groundtruth_is_crowd: + [False, True], + fields.InputDataFields.groundtruth_area: + [100.0, 238.7] + } + + expected_tensors = { + fields.InputDataFields.groundtruth_boxes: + [[0.2, 0.4, 0.1, 0.8]], + fields.InputDataFields.groundtruth_classes: + [2], + fields.InputDataFields.groundtruth_is_crowd: + [True], + fields.InputDataFields.groundtruth_area: + [238.7] + } + + output_tensors = ops.filter_groundtruth_with_nan_box_coordinates( + input_tensors) + with self.test_session() as sess: + output_tensors = sess.run(output_tensors) + for key in [fields.InputDataFields.groundtruth_boxes, + fields.InputDataFields.groundtruth_area]: + self.assertAllClose(expected_tensors[key], output_tensors[key]) + for key in [fields.InputDataFields.groundtruth_classes, + fields.InputDataFields.groundtruth_is_crowd]: + self.assertAllEqual(expected_tensors[key], output_tensors[key]) + + +class OpsTestNormalizeToTarget(tf.test.TestCase): + + def 
test_create_normalize_to_target(self): + inputs = tf.random_uniform([5, 10, 12, 3]) + target_norm_value = 4.0 + dim = 3 + with self.test_session(): + output = ops.normalize_to_target(inputs, target_norm_value, dim) + self.assertEqual(output.op.name, 'NormalizeToTarget/mul') + var_name = tf.contrib.framework.get_variables()[0].name + self.assertEqual(var_name, 'NormalizeToTarget/weights:0') + + def test_invalid_dim(self): + inputs = tf.random_uniform([5, 10, 12, 3]) + target_norm_value = 4.0 + dim = 10 + with self.assertRaisesRegexp( + ValueError, + 'dim must be non-negative but smaller than the input rank.'): + ops.normalize_to_target(inputs, target_norm_value, dim) + + def test_invalid_target_norm_values(self): + inputs = tf.random_uniform([5, 10, 12, 3]) + target_norm_value = [4.0, 4.0] + dim = 3 + with self.assertRaisesRegexp( + ValueError, 'target_norm_value must be a float or a list of floats'): + ops.normalize_to_target(inputs, target_norm_value, dim) + + def test_correct_output_shape(self): + inputs = tf.random_uniform([5, 10, 12, 3]) + target_norm_value = 4.0 + dim = 3 + with self.test_session(): + output = ops.normalize_to_target(inputs, target_norm_value, dim) + self.assertEqual(output.get_shape().as_list(), + inputs.get_shape().as_list()) + + def test_correct_initial_output_values(self): + inputs = tf.constant([[[[3, 4], [7, 24]], + [[5, -12], [-1, 0]]]], tf.float32) + target_norm_value = 10.0 + dim = 3 + expected_output = [[[[30/5.0, 40/5.0], [70/25.0, 240/25.0]], + [[50/13.0, -120/13.0], [-10, 0]]]] + with self.test_session() as sess: + normalized_inputs = ops.normalize_to_target(inputs, target_norm_value, + dim) + sess.run(tf.global_variables_initializer()) + output = normalized_inputs.eval() + self.assertAllClose(output, expected_output) + + def test_multiple_target_norm_values(self): + inputs = tf.constant([[[[3, 4], [7, 24]], + [[5, -12], [-1, 0]]]], tf.float32) + target_norm_value = [10.0, 20.0] + dim = 3 + expected_output = [[[[30/5.0, 80/5.0], [70/25.0, 480/25.0]], + [[50/13.0, -240/13.0], [-10, 0]]]] + with self.test_session() as sess: + normalized_inputs = ops.normalize_to_target(inputs, target_norm_value, + dim) + sess.run(tf.global_variables_initializer()) + output = normalized_inputs.eval() + self.assertAllClose(output, expected_output) + + +class OpsTestPositionSensitiveCropRegions(tf.test.TestCase): + + def test_position_sensitive(self): + num_spatial_bins = [3, 2] + image_shape = [1, 3, 2, 6] + + # First channel is 1's, second channel is 2's, etc. + image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32, + shape=image_shape) + boxes = tf.random_uniform((2, 4)) + box_ind = tf.constant([0, 0], dtype=tf.int32) + + # The result for both boxes should be [[1, 2], [3, 4], [5, 6]] + # before averaging. 
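+    # Global average pooling over those six bin values gives
+    # (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5 for each of the two boxes.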
+ expected_output = np.array([3.5, 3.5]).reshape([2, 1, 1, 1]) + + for crop_size_mult in range(1, 3): + crop_size = [3 * crop_size_mult, 2 * crop_size_mult] + ps_crop_and_pool = ops.position_sensitive_crop_regions( + image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True) + + with self.test_session() as sess: + output = sess.run(ps_crop_and_pool) + self.assertAllClose(output, expected_output) + + def test_position_sensitive_with_equal_channels(self): + num_spatial_bins = [2, 2] + image_shape = [1, 3, 3, 4] + crop_size = [2, 2] + + image = tf.constant(range(1, 3 * 3 + 1), dtype=tf.float32, + shape=[1, 3, 3, 1]) + tiled_image = tf.tile(image, [1, 1, 1, image_shape[3]]) + boxes = tf.random_uniform((3, 4)) + box_ind = tf.constant([0, 0, 0], dtype=tf.int32) + + # All channels are equal so position-sensitive crop and resize should + # work as the usual crop and resize for just one channel. + crop = tf.image.crop_and_resize(image, boxes, box_ind, crop_size) + crop_and_pool = tf.reduce_mean(crop, [1, 2], keep_dims=True) + + ps_crop_and_pool = ops.position_sensitive_crop_regions( + tiled_image, + boxes, + box_ind, + crop_size, + num_spatial_bins, + global_pool=True) + + with self.test_session() as sess: + expected_output, output = sess.run((crop_and_pool, ps_crop_and_pool)) + self.assertAllClose(output, expected_output) + + def test_position_sensitive_with_single_bin(self): + num_spatial_bins = [1, 1] + image_shape = [2, 3, 3, 4] + crop_size = [2, 2] + + image = tf.random_uniform(image_shape) + boxes = tf.random_uniform((6, 4)) + box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32) + + # When a single bin is used, position-sensitive crop and pool should be + # the same as non-position sensitive crop and pool. + crop = tf.image.crop_and_resize(image, boxes, box_ind, crop_size) + crop_and_pool = tf.reduce_mean(crop, [1, 2], keep_dims=True) + + ps_crop_and_pool = ops.position_sensitive_crop_regions( + image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True) + + with self.test_session() as sess: + expected_output, output = sess.run((crop_and_pool, ps_crop_and_pool)) + self.assertAllClose(output, expected_output) + + def test_raise_value_error_on_num_bins_less_than_one(self): + num_spatial_bins = [1, -1] + image_shape = [1, 1, 1, 2] + crop_size = [2, 2] + + image = tf.constant(1, dtype=tf.float32, shape=image_shape) + boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32) + box_ind = tf.constant([0], dtype=tf.int32) + + with self.assertRaisesRegexp(ValueError, 'num_spatial_bins should be >= 1'): + ops.position_sensitive_crop_regions( + image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True) + + def test_raise_value_error_on_non_divisible_crop_size(self): + num_spatial_bins = [2, 3] + image_shape = [1, 1, 1, 6] + crop_size = [3, 2] + + image = tf.constant(1, dtype=tf.float32, shape=image_shape) + boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32) + box_ind = tf.constant([0], dtype=tf.int32) + + with self.assertRaisesRegexp( + ValueError, 'crop_size should be divisible by num_spatial_bins'): + ops.position_sensitive_crop_regions( + image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True) + + def test_raise_value_error_on_non_divisible_num_channels(self): + num_spatial_bins = [2, 2] + image_shape = [1, 1, 1, 5] + crop_size = [2, 2] + + image = tf.constant(1, dtype=tf.float32, shape=image_shape) + boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32) + box_ind = tf.constant([0], dtype=tf.int32) + + with self.assertRaisesRegexp( + ValueError, 
'Dimension size must be evenly divisible by 4 but is 5'): + ops.position_sensitive_crop_regions( + image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True) + + def test_position_sensitive_with_global_pool_false(self): + num_spatial_bins = [3, 2] + image_shape = [1, 3, 2, 6] + num_boxes = 2 + + # First channel is 1's, second channel is 2's, etc. + image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32, + shape=image_shape) + boxes = tf.random_uniform((num_boxes, 4)) + box_ind = tf.constant([0, 0], dtype=tf.int32) + + expected_output = [] + + # Expected output, when crop_size = [3, 2]. + expected_output.append(np.expand_dims( + np.tile(np.array([[1, 2], + [3, 4], + [5, 6]]), (num_boxes, 1, 1)), + axis=-1)) + + # Expected output, when crop_size = [6, 4]. + expected_output.append(np.expand_dims( + np.tile(np.array([[1, 1, 2, 2], + [1, 1, 2, 2], + [3, 3, 4, 4], + [3, 3, 4, 4], + [5, 5, 6, 6], + [5, 5, 6, 6]]), (num_boxes, 1, 1)), + axis=-1)) + + for crop_size_mult in range(1, 3): + crop_size = [3 * crop_size_mult, 2 * crop_size_mult] + ps_crop = ops.position_sensitive_crop_regions( + image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False) + with self.test_session() as sess: + output = sess.run(ps_crop) + + self.assertAllEqual(output, expected_output[crop_size_mult - 1]) + + def test_position_sensitive_with_global_pool_false_and_known_boxes(self): + num_spatial_bins = [2, 2] + image_shape = [2, 2, 2, 4] + crop_size = [2, 2] + + image = tf.constant(range(1, 2 * 2 * 4 + 1) * 2, dtype=tf.float32, + shape=image_shape) + + # First box contains whole image, and second box contains only first row. + boxes = tf.constant(np.array([[0., 0., 1., 1.], + [0., 0., 0.5, 1.]]), dtype=tf.float32) + box_ind = tf.constant([0, 1], dtype=tf.int32) + + expected_output = [] + + # Expected output, when the box containing whole image. + expected_output.append( + np.reshape(np.array([[4, 7], + [10, 13]]), + (1, 2, 2, 1)) + ) + + # Expected output, when the box containing only first row. + expected_output.append( + np.reshape(np.array([[3, 6], + [7, 10]]), + (1, 2, 2, 1)) + ) + expected_output = np.concatenate(expected_output, axis=0) + + ps_crop = ops.position_sensitive_crop_regions( + image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False) + + with self.test_session() as sess: + output = sess.run(ps_crop) + self.assertAllEqual(output, expected_output) + + def test_position_sensitive_with_global_pool_false_and_single_bin(self): + num_spatial_bins = [1, 1] + image_shape = [2, 3, 3, 4] + crop_size = [1, 1] + + image = tf.random_uniform(image_shape) + boxes = tf.random_uniform((6, 4)) + box_ind = tf.constant([0, 0, 0, 1, 1, 1], dtype=tf.int32) + + # Since single_bin is used and crop_size = [1, 1] (i.e., no crop resize), + # the outputs are the same whatever the global_pool value is. + ps_crop_and_pool = ops.position_sensitive_crop_regions( + image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=True) + ps_crop = ops.position_sensitive_crop_regions( + image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False) + + with self.test_session() as sess: + pooled_output, unpooled_output = sess.run((ps_crop_and_pool, ps_crop)) + self.assertAllClose(pooled_output, unpooled_output) + + def test_position_sensitive_with_global_pool_false_and_do_global_pool(self): + num_spatial_bins = [3, 2] + image_shape = [1, 3, 2, 6] + num_boxes = 2 + + # First channel is 1's, second channel is 2's, etc. 
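+    # (Python 2 list repetition: range(1, 7) * 6 yields 36 values, which
+    # reshape to [1, 3, 2, 6] so that channel c holds the constant c + 1.)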
+ image = tf.constant(range(1, 3 * 2 + 1) * 6, dtype=tf.float32, + shape=image_shape) + boxes = tf.random_uniform((num_boxes, 4)) + box_ind = tf.constant([0, 0], dtype=tf.int32) + + expected_output = [] + + # Expected output, when crop_size = [3, 2]. + expected_output.append(np.mean( + np.expand_dims( + np.tile(np.array([[1, 2], + [3, 4], + [5, 6]]), (num_boxes, 1, 1)), + axis=-1), + axis=(1, 2), keepdims=True)) + + # Expected output, when crop_size = [6, 4]. + expected_output.append(np.mean( + np.expand_dims( + np.tile(np.array([[1, 1, 2, 2], + [1, 1, 2, 2], + [3, 3, 4, 4], + [3, 3, 4, 4], + [5, 5, 6, 6], + [5, 5, 6, 6]]), (num_boxes, 1, 1)), + axis=-1), + axis=(1, 2), keepdims=True)) + + for crop_size_mult in range(1, 3): + crop_size = [3 * crop_size_mult, 2 * crop_size_mult] + + # Perform global_pooling after running the function with + # global_pool=False. + ps_crop = ops.position_sensitive_crop_regions( + image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False) + ps_crop_and_pool = tf.reduce_mean( + ps_crop, reduction_indices=(1, 2), keep_dims=True) + + with self.test_session() as sess: + output = sess.run(ps_crop_and_pool) + + self.assertAllEqual(output, expected_output[crop_size_mult - 1]) + + def test_raise_value_error_on_non_square_block_size(self): + num_spatial_bins = [3, 2] + image_shape = [1, 3, 2, 6] + crop_size = [6, 2] + + image = tf.constant(1, dtype=tf.float32, shape=image_shape) + boxes = tf.constant([[0, 0, 1, 1]], dtype=tf.float32) + box_ind = tf.constant([0], dtype=tf.int32) + + with self.assertRaisesRegexp( + ValueError, 'Only support square bin crop size for now.'): + ops.position_sensitive_crop_regions( + image, boxes, box_ind, crop_size, num_spatial_bins, global_pool=False) + + +class ReframeBoxMasksToImageMasksTest(tf.test.TestCase): + + def testZeroImageOnEmptyMask(self): + box_masks = tf.constant([[[0, 0], + [0, 0]]], dtype=tf.float32) + boxes = tf.constant([[0.0, 0.0, 1.0, 1.0]], dtype=tf.float32) + image_masks = ops.reframe_box_masks_to_image_masks(box_masks, boxes, + image_height=4, + image_width=4) + np_expected_image_masks = np.array([[[0, 0, 0, 0], + [0, 0, 0, 0], + [0, 0, 0, 0], + [0, 0, 0, 0]]], dtype=np.float32) + with self.test_session() as sess: + np_image_masks = sess.run(image_masks) + self.assertAllClose(np_image_masks, np_expected_image_masks) + + def testMaskIsCenteredInImageWhenBoxIsCentered(self): + box_masks = tf.constant([[[1, 1], + [1, 1]]], dtype=tf.float32) + boxes = tf.constant([[0.25, 0.25, 0.75, 0.75]], dtype=tf.float32) + image_masks = ops.reframe_box_masks_to_image_masks(box_masks, boxes, + image_height=4, + image_width=4) + np_expected_image_masks = np.array([[[0, 0, 0, 0], + [0, 1, 1, 0], + [0, 1, 1, 0], + [0, 0, 0, 0]]], dtype=np.float32) + with self.test_session() as sess: + np_image_masks = sess.run(image_masks) + self.assertAllClose(np_image_masks, np_expected_image_masks) + + def testMaskOffCenterRemainsOffCenterInImage(self): + box_masks = tf.constant([[[1, 0], + [0, 1]]], dtype=tf.float32) + boxes = tf.constant([[0.25, 0.5, 0.75, 1.0]], dtype=tf.float32) + image_masks = ops.reframe_box_masks_to_image_masks(box_masks, boxes, + image_height=4, + image_width=4) + np_expected_image_masks = np.array([[[0, 0, 0, 0], + [0, 0, 0.6111111, 0.16666669], + [0, 0, 0.3888889, 0.83333337], + [0, 0, 0, 0]]], dtype=np.float32) + with self.test_session() as sess: + np_image_masks = sess.run(image_masks) + self.assertAllClose(np_image_masks, np_expected_image_masks) + + +if __name__ == '__main__': + tf.test.main() diff --git 
a/object_detection/utils/per_image_evaluation.py b/object_detection/utils/per_image_evaluation.py
new file mode 100644
index 0000000000000000000000000000000000000000..ed39afa6f5c2f3d8989febc4f2d3653572ac100d
--- /dev/null
+++ b/object_detection/utils/per_image_evaluation.py
@@ -0,0 +1,260 @@
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Evaluate Object Detection result on a single image.
+
+Annotates each detected result as a true positive or a false positive according
+to a predefined IOU ratio. Non Maximum Suppression is used by default. Multi
+class detection is supported by default.
+"""
+import numpy as np
+
+from object_detection.utils import np_box_list
+from object_detection.utils import np_box_list_ops
+
+
+class PerImageEvaluation(object):
+  """Evaluate detection result of a single image."""
+
+  def __init__(self,
+               num_groundtruth_classes,
+               matching_iou_threshold=0.5,
+               nms_iou_threshold=0.3,
+               nms_max_output_boxes=50):
+    """Initializes PerImageEvaluation with evaluation parameters.
+
+    Args:
+      num_groundtruth_classes: Number of ground truth object classes
+      matching_iou_threshold: A ratio of area intersection to union, which is
+          the threshold to consider whether a detection is a true positive
+          or not
+      nms_iou_threshold: IOU threshold used in Non Maximum Suppression.
+      nms_max_output_boxes: Number of maximum output boxes in NMS.
+    """
+    self.matching_iou_threshold = matching_iou_threshold
+    self.nms_iou_threshold = nms_iou_threshold
+    self.nms_max_output_boxes = nms_max_output_boxes
+    self.num_groundtruth_classes = num_groundtruth_classes
+
+  def compute_object_detection_metrics(self, detected_boxes, detected_scores,
+                                       detected_class_labels, groundtruth_boxes,
+                                       groundtruth_class_labels,
+                                       groundtruth_is_difficult_lists):
+    """Computes Object Detection related metrics from a single image.
+
+    Args:
+      detected_boxes: A float numpy array of shape [N, 4], representing N
+          regions of detected object regions.
+          Each row is of the format [y_min, x_min, y_max, x_max]
+      detected_scores: A float numpy array of shape [N, 1], representing
+          the confidence scores of the detected N object instances.
+      detected_class_labels: An integer numpy array of shape [N, 1],
+          representing the class labels of the detected N object instances.
+      groundtruth_boxes: A float numpy array of shape [M, 4], representing M
+          regions of object instances in ground truth
+      groundtruth_class_labels: An integer numpy array of shape [M, 1],
+          representing M class labels of object instances in ground truth
+      groundtruth_is_difficult_lists: A boolean numpy array of length M denoting
+          whether a ground truth box is a difficult instance or not
+
+    Returns:
+      scores: A list of C float numpy arrays. Each numpy array is of
+          shape [K, 1], representing K scores detected with object class
+          label c
+      tp_fp_labels: A list of C boolean numpy arrays. Each numpy array
+          is of shape [K, 1], representing K True/False positive labels of
+          object instances detected with class label c
+      is_class_correctly_detected_in_image: a numpy integer array of
+          shape [C, 1], indicating whether the corresponding class has at
+          least one instance being correctly detected in the image
+    """
+    detected_boxes, detected_scores, detected_class_labels = (
+        self._remove_invalid_boxes(detected_boxes, detected_scores,
+                                   detected_class_labels))
+    scores, tp_fp_labels = self._compute_tp_fp(
+        detected_boxes, detected_scores, detected_class_labels,
+        groundtruth_boxes, groundtruth_class_labels,
+        groundtruth_is_difficult_lists)
+    is_class_correctly_detected_in_image = self._compute_cor_loc(
+        detected_boxes, detected_scores, detected_class_labels,
+        groundtruth_boxes, groundtruth_class_labels)
+    return scores, tp_fp_labels, is_class_correctly_detected_in_image
+
+  def _compute_cor_loc(self, detected_boxes, detected_scores,
+                       detected_class_labels, groundtruth_boxes,
+                       groundtruth_class_labels):
+    """Computes CorLoc score for object detection result.
+
+    Args:
+      detected_boxes: A float numpy array of shape [N, 4], representing N
+          regions of detected object regions.
+          Each row is of the format [y_min, x_min, y_max, x_max]
+      detected_scores: A float numpy array of shape [N, 1], representing
+          the confidence scores of the detected N object instances.
+      detected_class_labels: An integer numpy array of shape [N, 1],
+          representing the class labels of the detected N object instances.
+      groundtruth_boxes: A float numpy array of shape [M, 4], representing M
+          regions of object instances in ground truth
+      groundtruth_class_labels: An integer numpy array of shape [M, 1],
+          representing M class labels of object instances in ground truth
+
+    Returns:
+      is_class_correctly_detected_in_image: a numpy integer array of
+          shape [C, 1], indicating whether the corresponding class has at
+          least one instance being correctly detected in the image
+    """
+    is_class_correctly_detected_in_image = np.zeros(
+        self.num_groundtruth_classes, dtype=int)
+    for i in range(self.num_groundtruth_classes):
+      gt_boxes_at_ith_class = groundtruth_boxes[
+          groundtruth_class_labels == i, :]
+      detected_boxes_at_ith_class = detected_boxes[
+          detected_class_labels == i, :]
+      detected_scores_at_ith_class = detected_scores[detected_class_labels == i]
+      is_class_correctly_detected_in_image[i] = (
+          self._compute_is_aclass_correctly_detected_in_image(
+              detected_boxes_at_ith_class, detected_scores_at_ith_class,
+              gt_boxes_at_ith_class))
+
+    return is_class_correctly_detected_in_image
+
+  def _compute_is_aclass_correctly_detected_in_image(
+      self, detected_boxes, detected_scores, groundtruth_boxes):
+    """Computes CorLoc score for a single class.
+ + Args: + detected_boxes: A numpy array of shape [N, 4] representing detected box + coordinates + detected_scores: A 1-d numpy array of length N representing classification + score + groundtruth_boxes: A numpy array of shape [M, 4] representing ground truth + box coordinates + + Returns: + is_class_correctly_detected_in_image: An integer 1 or 0 denoting whether a + class is correctly detected in the image or not + """ + if detected_boxes.size > 0: + if groundtruth_boxes.size > 0: + max_score_id = np.argmax(detected_scores) + detected_boxlist = np_box_list.BoxList( + np.expand_dims(detected_boxes[max_score_id, :], axis=0)) + gt_boxlist = np_box_list.BoxList(groundtruth_boxes) + iou = np_box_list_ops.iou(detected_boxlist, gt_boxlist) + if np.max(iou) >= self.matching_iou_threshold: + return 1 + return 0 + + def _compute_tp_fp(self, detected_boxes, detected_scores, + detected_class_labels, groundtruth_boxes, + groundtruth_class_labels, groundtruth_is_difficult_lists): + """Labels true/false positives of detections of an image across all classes. + + Args: + detected_boxes: A float numpy array of shape [N, 4], representing N + regions of detected object regions. + Each row is of the format [y_min, x_min, y_max, x_max] + detected_scores: A float numpy array of shape [N, 1], representing + the confidence scores of the detected N object instances. + detected_class_labels: A integer numpy array of shape [N, 1], repreneting + the class labels of the detected N object instances. + groundtruth_boxes: A float numpy array of shape [M, 4], representing M + regions of object instances in ground truth + groundtruth_class_labels: An integer numpy array of shape [M, 1], + representing M class labels of object instances in ground truth + groundtruth_is_difficult_lists: A boolean numpy array of length M denoting + whether a ground truth box is a difficult instance or not + + Returns: + result_scores: A list of float numpy arrays. Each numpy array is of + shape [K, 1], representing K scores detected with object class + label c + result_tp_fp_labels: A list of boolean numpy array. Each numpy array is of + shape [K, 1], representing K True/False positive label of object + instances detected with class label c + """ + result_scores = [] + result_tp_fp_labels = [] + for i in range(self.num_groundtruth_classes): + gt_boxes_at_ith_class = groundtruth_boxes[(groundtruth_class_labels == i + ), :] + groundtruth_is_difficult_list_at_ith_class = ( + groundtruth_is_difficult_lists[groundtruth_class_labels == i]) + detected_boxes_at_ith_class = detected_boxes[(detected_class_labels == i + ), :] + detected_scores_at_ith_class = detected_scores[detected_class_labels == i] + scores, tp_fp_labels = self._compute_tp_fp_for_single_class( + detected_boxes_at_ith_class, detected_scores_at_ith_class, + gt_boxes_at_ith_class, groundtruth_is_difficult_list_at_ith_class) + result_scores.append(scores) + result_tp_fp_labels.append(tp_fp_labels) + return result_scores, result_tp_fp_labels + + def _remove_invalid_boxes(self, detected_boxes, detected_scores, + detected_class_labels): + valid_indices = np.logical_and(detected_boxes[:, 0] < detected_boxes[:, 2], + detected_boxes[:, 1] < detected_boxes[:, 3]) + return (detected_boxes[valid_indices, :], detected_scores[valid_indices], + detected_class_labels[valid_indices]) + + def _compute_tp_fp_for_single_class(self, detected_boxes, detected_scores, + groundtruth_boxes, + groundtruth_is_difficult_list): + """Labels boxes detected with the same class from the same image as tp/fp. 
+ + Args: + detected_boxes: A numpy array of shape [N, 4] representing detected box + coordinates + detected_scores: A 1-d numpy array of length N representing classification + score + groundtruth_boxes: A numpy array of shape [M, 4] representing ground truth + box coordinates + groundtruth_is_difficult_list: A boolean numpy array of length M denoting + whether a ground truth box is a difficult instance or not + + Returns: + scores: A numpy array representing the detection scores + tp_fp_labels: a boolean numpy array indicating whether a detection is a + true positive. + + """ + if detected_boxes.size == 0: + return np.array([], dtype=float), np.array([], dtype=bool) + detected_boxlist = np_box_list.BoxList(detected_boxes) + detected_boxlist.add_field('scores', detected_scores) + detected_boxlist = np_box_list_ops.non_max_suppression( + detected_boxlist, self.nms_max_output_boxes, self.nms_iou_threshold) + + scores = detected_boxlist.get_field('scores') + + if groundtruth_boxes.size == 0: + return scores, np.zeros(detected_boxlist.num_boxes(), dtype=bool) + gt_boxlist = np_box_list.BoxList(groundtruth_boxes) + + iou = np_box_list_ops.iou(detected_boxlist, gt_boxlist) + max_overlap_gt_ids = np.argmax(iou, axis=1) + is_gt_box_detected = np.zeros(gt_boxlist.num_boxes(), dtype=bool) + tp_fp_labels = np.zeros(detected_boxlist.num_boxes(), dtype=bool) + is_matched_to_difficult_box = np.zeros( + detected_boxlist.num_boxes(), dtype=bool) + for i in range(detected_boxlist.num_boxes()): + gt_id = max_overlap_gt_ids[i] + if iou[i, gt_id] >= self.matching_iou_threshold: + if not groundtruth_is_difficult_list[gt_id]: + if not is_gt_box_detected[gt_id]: + tp_fp_labels[i] = True + is_gt_box_detected[gt_id] = True + else: + is_matched_to_difficult_box[i] = True + return scores[~is_matched_to_difficult_box], tp_fp_labels[ + ~is_matched_to_difficult_box] diff --git a/object_detection/utils/per_image_evaluation_test.py b/object_detection/utils/per_image_evaluation_test.py new file mode 100644 index 0000000000000000000000000000000000000000..8c449f1ac50b9a02ce6f9fd1f36340e4993f5186 --- /dev/null +++ b/object_detection/utils/per_image_evaluation_test.py @@ -0,0 +1,212 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
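# A minimal, hypothetical usage sketch of the PerImageEvaluation class defined
# above; the boxes, scores and labels below are made-up example values.
import numpy as np
from object_detection.utils import per_image_evaluation

evaluator = per_image_evaluation.PerImageEvaluation(
    num_groundtruth_classes=2, matching_iou_threshold=0.5,
    nms_iou_threshold=1.0, nms_max_output_boxes=100)

detected_boxes = np.array([[0, 0, 1, 1], [0, 0, 2, 2]], dtype=float)
detected_scores = np.array([0.9, 0.8], dtype=float)
detected_class_labels = np.array([0, 1], dtype=int)
groundtruth_boxes = np.array([[0, 0, 1, 1]], dtype=float)
groundtruth_class_labels = np.array([0], dtype=int)
groundtruth_is_difficult = np.zeros(1, dtype=bool)

scores, tp_fp_labels, is_class_detected = (
    evaluator.compute_object_detection_metrics(
        detected_boxes, detected_scores, detected_class_labels,
        groundtruth_boxes, groundtruth_class_labels,
        groundtruth_is_difficult))
# scores and tp_fp_labels hold one array per class; is_class_detected is the
# per-class CorLoc indicator: 1 when the highest-scoring detection of a class
# overlaps some ground truth box by at least matching_iou_threshold.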
+# ============================================================================== + +"""Tests for object_detection.utils.per_image_evaluation.""" + +import numpy as np +import tensorflow as tf + +from object_detection.utils import per_image_evaluation + + +class SingleClassTpFpWithDifficultBoxesTest(tf.test.TestCase): + + def setUp(self): + num_groundtruth_classes = 1 + matching_iou_threshold = 0.5 + nms_iou_threshold = 1.0 + nms_max_output_boxes = 10000 + self.eval = per_image_evaluation.PerImageEvaluation( + num_groundtruth_classes, matching_iou_threshold, nms_iou_threshold, + nms_max_output_boxes) + + self.detected_boxes = np.array([[0, 0, 1, 1], [0, 0, 2, 2], [0, 0, 3, 3]], + dtype=float) + self.detected_scores = np.array([0.6, 0.8, 0.5], dtype=float) + self.groundtruth_boxes = np.array([[0, 0, 1, 1], [0, 0, 10, 10]], + dtype=float) + + def test_match_to_not_difficult_box(self): + groundtruth_groundtruth_is_difficult_list = np.array([False, True], + dtype=bool) + scores, tp_fp_labels = self.eval._compute_tp_fp_for_single_class( + self.detected_boxes, self.detected_scores, self.groundtruth_boxes, + groundtruth_groundtruth_is_difficult_list) + expected_scores = np.array([0.8, 0.6, 0.5], dtype=float) + expected_tp_fp_labels = np.array([False, True, False], dtype=bool) + self.assertTrue(np.allclose(expected_scores, scores)) + self.assertTrue(np.allclose(expected_tp_fp_labels, tp_fp_labels)) + + def test_match_to_difficult_box(self): + groundtruth_groundtruth_is_difficult_list = np.array([True, False], + dtype=bool) + scores, tp_fp_labels = self.eval._compute_tp_fp_for_single_class( + self.detected_boxes, self.detected_scores, self.groundtruth_boxes, + groundtruth_groundtruth_is_difficult_list) + expected_scores = np.array([0.8, 0.5], dtype=float) + expected_tp_fp_labels = np.array([False, False], dtype=bool) + self.assertTrue(np.allclose(expected_scores, scores)) + self.assertTrue(np.allclose(expected_tp_fp_labels, tp_fp_labels)) + + +class SingleClassTpFpNoDifficultBoxesTest(tf.test.TestCase): + + def setUp(self): + num_groundtruth_classes = 1 + matching_iou_threshold1 = 0.5 + matching_iou_threshold2 = 0.1 + nms_iou_threshold = 1.0 + nms_max_output_boxes = 10000 + self.eval1 = per_image_evaluation.PerImageEvaluation( + num_groundtruth_classes, matching_iou_threshold1, nms_iou_threshold, + nms_max_output_boxes) + + self.eval2 = per_image_evaluation.PerImageEvaluation( + num_groundtruth_classes, matching_iou_threshold2, nms_iou_threshold, + nms_max_output_boxes) + + self.detected_boxes = np.array([[0, 0, 1, 1], [0, 0, 2, 2], [0, 0, 3, 3]], + dtype=float) + self.detected_scores = np.array([0.6, 0.8, 0.5], dtype=float) + + def test_no_true_positives(self): + groundtruth_boxes = np.array([[100, 100, 105, 105]], dtype=float) + groundtruth_groundtruth_is_difficult_list = np.zeros(1, dtype=bool) + scores, tp_fp_labels = self.eval1._compute_tp_fp_for_single_class( + self.detected_boxes, self.detected_scores, groundtruth_boxes, + groundtruth_groundtruth_is_difficult_list) + expected_scores = np.array([0.8, 0.6, 0.5], dtype=float) + expected_tp_fp_labels = np.array([False, False, False], dtype=bool) + self.assertTrue(np.allclose(expected_scores, scores)) + self.assertTrue(np.allclose(expected_tp_fp_labels, tp_fp_labels)) + + def test_one_true_positives_with_large_iou_threshold(self): + groundtruth_boxes = np.array([[0, 0, 1, 1]], dtype=float) + groundtruth_groundtruth_is_difficult_list = np.zeros(1, dtype=bool) + scores, tp_fp_labels = self.eval1._compute_tp_fp_for_single_class( + 
self.detected_boxes, self.detected_scores, groundtruth_boxes, + groundtruth_groundtruth_is_difficult_list) + expected_scores = np.array([0.8, 0.6, 0.5], dtype=float) + expected_tp_fp_labels = np.array([False, True, False], dtype=bool) + self.assertTrue(np.allclose(expected_scores, scores)) + self.assertTrue(np.allclose(expected_tp_fp_labels, tp_fp_labels)) + + def test_one_true_positives_with_very_small_iou_threshold(self): + groundtruth_boxes = np.array([[0, 0, 1, 1]], dtype=float) + groundtruth_groundtruth_is_difficult_list = np.zeros(1, dtype=bool) + scores, tp_fp_labels = self.eval2._compute_tp_fp_for_single_class( + self.detected_boxes, self.detected_scores, groundtruth_boxes, + groundtruth_groundtruth_is_difficult_list) + expected_scores = np.array([0.8, 0.6, 0.5], dtype=float) + expected_tp_fp_labels = np.array([True, False, False], dtype=bool) + self.assertTrue(np.allclose(expected_scores, scores)) + self.assertTrue(np.allclose(expected_tp_fp_labels, tp_fp_labels)) + + def test_two_true_positives_with_large_iou_threshold(self): + groundtruth_boxes = np.array([[0, 0, 1, 1], [0, 0, 3.5, 3.5]], dtype=float) + groundtruth_groundtruth_is_difficult_list = np.zeros(2, dtype=bool) + scores, tp_fp_labels = self.eval1._compute_tp_fp_for_single_class( + self.detected_boxes, self.detected_scores, groundtruth_boxes, + groundtruth_groundtruth_is_difficult_list) + expected_scores = np.array([0.8, 0.6, 0.5], dtype=float) + expected_tp_fp_labels = np.array([False, True, True], dtype=bool) + self.assertTrue(np.allclose(expected_scores, scores)) + self.assertTrue(np.allclose(expected_tp_fp_labels, tp_fp_labels)) + + +class MultiClassesTpFpTest(tf.test.TestCase): + + def test_tp_fp(self): + num_groundtruth_classes = 3 + matching_iou_threshold = 0.5 + nms_iou_threshold = 1.0 + nms_max_output_boxes = 10000 + eval1 = per_image_evaluation.PerImageEvaluation(num_groundtruth_classes, + matching_iou_threshold, + nms_iou_threshold, + nms_max_output_boxes) + detected_boxes = np.array([[0, 0, 1, 1], [10, 10, 5, 5], [0, 0, 2, 2], + [5, 10, 10, 5], [10, 5, 5, 10], [0, 0, 3, 3]], + dtype=float) + detected_scores = np.array([0.8, 0.1, 0.8, 0.9, 0.7, 0.8], dtype=float) + detected_class_labels = np.array([0, 1, 1, 2, 0, 2], dtype=int) + groundtruth_boxes = np.array([[0, 0, 1, 1], [0, 0, 3.5, 3.5]], dtype=float) + groundtruth_class_labels = np.array([0, 2], dtype=int) + groundtruth_groundtruth_is_difficult_list = np.zeros(2, dtype=float) + scores, tp_fp_labels, _ = eval1.compute_object_detection_metrics( + detected_boxes, detected_scores, detected_class_labels, + groundtruth_boxes, groundtruth_class_labels, + groundtruth_groundtruth_is_difficult_list) + expected_scores = [np.array([0.8], dtype=float)] * 3 + expected_tp_fp_labels = [np.array([True]), np.array([False]), np.array([True + ])] + for i in range(len(expected_scores)): + self.assertTrue(np.allclose(expected_scores[i], scores[i])) + self.assertTrue(np.array_equal(expected_tp_fp_labels[i], tp_fp_labels[i])) + + +class CorLocTest(tf.test.TestCase): + + def test_compute_corloc_with_normal_iou_threshold(self): + num_groundtruth_classes = 3 + matching_iou_threshold = 0.5 + nms_iou_threshold = 1.0 + nms_max_output_boxes = 10000 + eval1 = per_image_evaluation.PerImageEvaluation(num_groundtruth_classes, + matching_iou_threshold, + nms_iou_threshold, + nms_max_output_boxes) + detected_boxes = np.array([[0, 0, 1, 1], [0, 0, 2, 2], [0, 0, 3, 3], + [0, 0, 5, 5]], dtype=float) + detected_scores = np.array([0.9, 0.9, 0.1, 0.9], dtype=float) + detected_class_labels = 
np.array([0, 1, 0, 2], dtype=int) + groundtruth_boxes = np.array([[0, 0, 1, 1], [0, 0, 3, 3], [0, 0, 6, 6]], + dtype=float) + groundtruth_class_labels = np.array([0, 0, 2], dtype=int) + + is_class_correctly_detected_in_image = eval1._compute_cor_loc( + detected_boxes, detected_scores, detected_class_labels, + groundtruth_boxes, groundtruth_class_labels) + expected_result = np.array([1, 0, 1], dtype=int) + self.assertTrue(np.array_equal(expected_result, + is_class_correctly_detected_in_image)) + + def test_compute_corloc_with_very_large_iou_threshold(self): + num_groundtruth_classes = 3 + matching_iou_threshold = 0.9 + nms_iou_threshold = 1.0 + nms_max_output_boxes = 10000 + eval1 = per_image_evaluation.PerImageEvaluation(num_groundtruth_classes, + matching_iou_threshold, + nms_iou_threshold, + nms_max_output_boxes) + detected_boxes = np.array([[0, 0, 1, 1], [0, 0, 2, 2], [0, 0, 3, 3], + [0, 0, 5, 5]], dtype=float) + detected_scores = np.array([0.9, 0.9, 0.1, 0.9], dtype=float) + detected_class_labels = np.array([0, 1, 0, 2], dtype=int) + groundtruth_boxes = np.array([[0, 0, 1, 1], [0, 0, 3, 3], [0, 0, 6, 6]], + dtype=float) + groundtruth_class_labels = np.array([0, 0, 2], dtype=int) + + is_class_correctly_detected_in_image = eval1._compute_cor_loc( + detected_boxes, detected_scores, detected_class_labels, + groundtruth_boxes, groundtruth_class_labels) + expected_result = np.array([1, 0, 0], dtype=int) + self.assertTrue(np.array_equal(expected_result, + is_class_correctly_detected_in_image)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/shape_utils.py b/object_detection/utils/shape_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..6fee6ad08e67cebcccf605f07ca57ae1712d82e2 --- /dev/null +++ b/object_detection/utils/shape_utils.py @@ -0,0 +1,113 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Utils used to manipulate tensor shapes.""" + +import tensorflow as tf + + +def _is_tensor(t): + """Returns a boolean indicating whether the input is a tensor. + + Args: + t: the input to be tested. + + Returns: + a boolean that indicates whether t is a tensor. + """ + return isinstance(t, (tf.Tensor, tf.SparseTensor, tf.Variable)) + + +def _set_dim_0(t, d0): + """Sets the 0-th dimension of the input tensor. + + Args: + t: the input tensor, assuming the rank is at least 1. + d0: an integer indicating the 0-th dimension of the input tensor. + + Returns: + the tensor t with the 0-th dimension set. + """ + t_shape = t.get_shape().as_list() + t_shape[0] = d0 + t.set_shape(t_shape) + return t + + +def pad_tensor(t, length): + """Pads the input tensor with 0s along the first dimension up to the length. + + Args: + t: the input tensor, assuming the rank is at least 1. 
+ length: a tensor of shape [1] or an integer, indicating the first dimension + of the input tensor t after padding, assuming length <= t.shape[0]. + + Returns: + padded_t: the padded tensor, whose first dimension is length. If the length + is an integer, the first dimension of padded_t is set to length + statically. + """ + t_rank = tf.rank(t) + t_shape = tf.shape(t) + t_d0 = t_shape[0] + pad_d0 = tf.expand_dims(length - t_d0, 0) + pad_shape = tf.cond( + tf.greater(t_rank, 1), lambda: tf.concat([pad_d0, t_shape[1:]], 0), + lambda: tf.expand_dims(length - t_d0, 0)) + padded_t = tf.concat([t, tf.zeros(pad_shape, dtype=t.dtype)], 0) + if not _is_tensor(length): + padded_t = _set_dim_0(padded_t, length) + return padded_t + + +def clip_tensor(t, length): + """Clips the input tensor along the first dimension up to the length. + + Args: + t: the input tensor, assuming the rank is at least 1. + length: a tensor of shape [1] or an integer, indicating the first dimension + of the input tensor t after clipping, assuming length <= t.shape[0]. + + Returns: + clipped_t: the clipped tensor, whose first dimension is length. If the + length is an integer, the first dimension of clipped_t is set to length + statically. + """ + clipped_t = tf.gather(t, tf.range(length)) + if not _is_tensor(length): + clipped_t = _set_dim_0(clipped_t, length) + return clipped_t + + +def pad_or_clip_tensor(t, length): + """Pad or clip the input tensor along the first dimension. + + Args: + t: the input tensor, assuming the rank is at least 1. + length: a tensor of shape [1] or an integer, indicating the first dimension + of the input tensor t after processing. + + Returns: + processed_t: the processed tensor, whose first dimension is length. If the + length is an integer, the first dimension of the processed tensor is set + to length statically. + """ + processed_t = tf.cond( + tf.greater(tf.shape(t)[0], length), + lambda: clip_tensor(t, length), + lambda: pad_tensor(t, length)) + if not _is_tensor(length): + processed_t = _set_dim_0(processed_t, length) + return processed_t diff --git a/object_detection/utils/shape_utils_test.py b/object_detection/utils/shape_utils_test.py new file mode 100644 index 0000000000000000000000000000000000000000..b1fa945cbb21840d51090c0cdd53e24931efe0e3 --- /dev/null +++ b/object_detection/utils/shape_utils_test.py @@ -0,0 +1,120 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
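# A small sketch of the intended pad_or_clip_tensor behaviour above, using
# made-up tensors; with an integer `length` the first dimension of the result
# is also set statically.
import tensorflow as tf
from object_detection.utils import shape_utils

short_t = tf.constant([[1, 2], [3, 4]], dtype=tf.int32)          # shape [2, 2]
long_t = tf.constant([[1], [2], [3], [4], [5]], dtype=tf.int32)  # shape [5, 1]

padded = shape_utils.pad_or_clip_tensor(short_t, 3)   # zero-padded to [3, 2]
clipped = shape_utils.pad_or_clip_tensor(long_t, 3)   # clipped to [3, 1]

with tf.Session() as sess:
  print(sess.run(padded))   # [[1 2] [3 4] [0 0]]
  print(sess.run(clipped))  # [[1] [2] [3]]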
+# ============================================================================== + +"""Tests for object_detection.utils.shape_utils.""" + +import tensorflow as tf + +from object_detection.utils import shape_utils + + +class UtilTest(tf.test.TestCase): + + def test_pad_tensor_using_integer_input(self): + t1 = tf.constant([1], dtype=tf.int32) + pad_t1 = shape_utils.pad_tensor(t1, 2) + t2 = tf.constant([[0.1, 0.2]], dtype=tf.float32) + pad_t2 = shape_utils.pad_tensor(t2, 2) + + self.assertEqual(2, pad_t1.get_shape()[0]) + self.assertEqual(2, pad_t2.get_shape()[0]) + + with self.test_session() as sess: + pad_t1_result, pad_t2_result = sess.run([pad_t1, pad_t2]) + self.assertAllEqual([1, 0], pad_t1_result) + self.assertAllClose([[0.1, 0.2], [0, 0]], pad_t2_result) + + def test_pad_tensor_using_tensor_input(self): + t1 = tf.constant([1], dtype=tf.int32) + pad_t1 = shape_utils.pad_tensor(t1, tf.constant(2)) + t2 = tf.constant([[0.1, 0.2]], dtype=tf.float32) + pad_t2 = shape_utils.pad_tensor(t2, tf.constant(2)) + + with self.test_session() as sess: + pad_t1_result, pad_t2_result = sess.run([pad_t1, pad_t2]) + self.assertAllEqual([1, 0], pad_t1_result) + self.assertAllClose([[0.1, 0.2], [0, 0]], pad_t2_result) + + def test_clip_tensor_using_integer_input(self): + t1 = tf.constant([1, 2, 3], dtype=tf.int32) + clip_t1 = shape_utils.clip_tensor(t1, 2) + t2 = tf.constant([[0.1, 0.2], [0.2, 0.4], [0.5, 0.8]], dtype=tf.float32) + clip_t2 = shape_utils.clip_tensor(t2, 2) + + self.assertEqual(2, clip_t1.get_shape()[0]) + self.assertEqual(2, clip_t2.get_shape()[0]) + + with self.test_session() as sess: + clip_t1_result, clip_t2_result = sess.run([clip_t1, clip_t2]) + self.assertAllEqual([1, 2], clip_t1_result) + self.assertAllClose([[0.1, 0.2], [0.2, 0.4]], clip_t2_result) + + def test_clip_tensor_using_tensor_input(self): + t1 = tf.constant([1, 2, 3], dtype=tf.int32) + clip_t1 = shape_utils.clip_tensor(t1, tf.constant(2)) + t2 = tf.constant([[0.1, 0.2], [0.2, 0.4], [0.5, 0.8]], dtype=tf.float32) + clip_t2 = shape_utils.clip_tensor(t2, tf.constant(2)) + + with self.test_session() as sess: + clip_t1_result, clip_t2_result = sess.run([clip_t1, clip_t2]) + self.assertAllEqual([1, 2], clip_t1_result) + self.assertAllClose([[0.1, 0.2], [0.2, 0.4]], clip_t2_result) + + def test_pad_or_clip_tensor_using_integer_input(self): + t1 = tf.constant([1], dtype=tf.int32) + tt1 = shape_utils.pad_or_clip_tensor(t1, 2) + t2 = tf.constant([[0.1, 0.2]], dtype=tf.float32) + tt2 = shape_utils.pad_or_clip_tensor(t2, 2) + + t3 = tf.constant([1, 2, 3], dtype=tf.int32) + tt3 = shape_utils.clip_tensor(t3, 2) + t4 = tf.constant([[0.1, 0.2], [0.2, 0.4], [0.5, 0.8]], dtype=tf.float32) + tt4 = shape_utils.clip_tensor(t4, 2) + + self.assertEqual(2, tt1.get_shape()[0]) + self.assertEqual(2, tt2.get_shape()[0]) + self.assertEqual(2, tt3.get_shape()[0]) + self.assertEqual(2, tt4.get_shape()[0]) + + with self.test_session() as sess: + tt1_result, tt2_result, tt3_result, tt4_result = sess.run( + [tt1, tt2, tt3, tt4]) + self.assertAllEqual([1, 0], tt1_result) + self.assertAllClose([[0.1, 0.2], [0, 0]], tt2_result) + self.assertAllEqual([1, 2], tt3_result) + self.assertAllClose([[0.1, 0.2], [0.2, 0.4]], tt4_result) + + def test_pad_or_clip_tensor_using_tensor_input(self): + t1 = tf.constant([1], dtype=tf.int32) + tt1 = shape_utils.pad_or_clip_tensor(t1, tf.constant(2)) + t2 = tf.constant([[0.1, 0.2]], dtype=tf.float32) + tt2 = shape_utils.pad_or_clip_tensor(t2, tf.constant(2)) + + t3 = tf.constant([1, 2, 3], dtype=tf.int32) + tt3 = 
shape_utils.clip_tensor(t3, tf.constant(2)) + t4 = tf.constant([[0.1, 0.2], [0.2, 0.4], [0.5, 0.8]], dtype=tf.float32) + tt4 = shape_utils.clip_tensor(t4, tf.constant(2)) + + with self.test_session() as sess: + tt1_result, tt2_result, tt3_result, tt4_result = sess.run( + [tt1, tt2, tt3, tt4]) + self.assertAllEqual([1, 0], tt1_result) + self.assertAllClose([[0.1, 0.2], [0, 0]], tt2_result) + self.assertAllEqual([1, 2], tt3_result) + self.assertAllClose([[0.1, 0.2], [0.2, 0.4]], tt4_result) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/static_shape.py b/object_detection/utils/static_shape.py new file mode 100644 index 0000000000000000000000000000000000000000..8e4e522f10f273417ead26ed9e263f90750154e5 --- /dev/null +++ b/object_detection/utils/static_shape.py @@ -0,0 +1,71 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Helper functions to access TensorShape values. + +The rank 4 tensor_shape must be of the form [batch_size, height, width, depth]. +""" + + +def get_batch_size(tensor_shape): + """Returns batch size from the tensor shape. + + Args: + tensor_shape: A rank 4 TensorShape. + + Returns: + An integer representing the batch size of the tensor. + """ + tensor_shape.assert_has_rank(rank=4) + return tensor_shape[0].value + + +def get_height(tensor_shape): + """Returns height from the tensor shape. + + Args: + tensor_shape: A rank 4 TensorShape. + + Returns: + An integer representing the height of the tensor. + """ + tensor_shape.assert_has_rank(rank=4) + return tensor_shape[1].value + + +def get_width(tensor_shape): + """Returns width from the tensor shape. + + Args: + tensor_shape: A rank 4 TensorShape. + + Returns: + An integer representing the width of the tensor. + """ + tensor_shape.assert_has_rank(rank=4) + return tensor_shape[2].value + + +def get_depth(tensor_shape): + """Returns depth from the tensor shape. + + Args: + tensor_shape: A rank 4 TensorShape. + + Returns: + An integer representing the depth of the tensor. + """ + tensor_shape.assert_has_rank(rank=4) + return tensor_shape[3].value diff --git a/object_detection/utils/static_shape_test.py b/object_detection/utils/static_shape_test.py new file mode 100644 index 0000000000000000000000000000000000000000..99307e9322e34a31bc27615429efab673bf114f5 --- /dev/null +++ b/object_detection/utils/static_shape_test.py @@ -0,0 +1,50 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
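# Illustrative sketch of the static shape getters above; the [8, 224, 224, 3]
# shape is an arbitrary example of a [batch, height, width, depth] TensorShape.
import tensorflow as tf
from object_detection.utils import static_shape

image_shape = tf.TensorShape([8, 224, 224, 3])
assert static_shape.get_batch_size(image_shape) == 8
assert static_shape.get_height(image_shape) == 224
assert static_shape.get_width(image_shape) == 224
assert static_shape.get_depth(image_shape) == 3
# A shape of any other rank fails the rank check, e.g.
#   static_shape.get_height(tf.TensorShape([224, 224, 3]))  # raises ValueError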
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.utils.static_shape.""" + +import tensorflow as tf + +from object_detection.utils import static_shape + + +class StaticShapeTest(tf.test.TestCase): + + def test_return_correct_batchSize(self): + tensor_shape = tf.TensorShape(dims=[32, 299, 384, 3]) + self.assertEqual(32, static_shape.get_batch_size(tensor_shape)) + + def test_return_correct_height(self): + tensor_shape = tf.TensorShape(dims=[32, 299, 384, 3]) + self.assertEqual(299, static_shape.get_height(tensor_shape)) + + def test_return_correct_width(self): + tensor_shape = tf.TensorShape(dims=[32, 299, 384, 3]) + self.assertEqual(384, static_shape.get_width(tensor_shape)) + + def test_return_correct_depth(self): + tensor_shape = tf.TensorShape(dims=[32, 299, 384, 3]) + self.assertEqual(3, static_shape.get_depth(tensor_shape)) + + def test_die_on_tensor_shape_with_rank_three(self): + tensor_shape = tf.TensorShape(dims=[32, 299, 384]) + with self.assertRaises(ValueError): + static_shape.get_batch_size(tensor_shape) + static_shape.get_height(tensor_shape) + static_shape.get_width(tensor_shape) + static_shape.get_depth(tensor_shape) + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/test_utils.py b/object_detection/utils/test_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..f4eb8171c4b428ea0049574e604903f8e136e79f --- /dev/null +++ b/object_detection/utils/test_utils.py @@ -0,0 +1,137 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Contains functions which are convenient for unit testing.""" +import numpy as np +import tensorflow as tf + +from object_detection.core import anchor_generator +from object_detection.core import box_coder +from object_detection.core import box_list +from object_detection.core import box_predictor +from object_detection.core import matcher + + +class MockBoxCoder(box_coder.BoxCoder): + """Simple `difference` BoxCoder.""" + + @property + def code_size(self): + return 4 + + def _encode(self, boxes, anchors): + return boxes.get() - anchors.get() + + def _decode(self, rel_codes, anchors): + return box_list.BoxList(rel_codes + anchors.get()) + + +class MockBoxPredictor(box_predictor.BoxPredictor): + """Simple box predictor that ignores inputs and outputs all zeros.""" + + def __init__(self, is_training, num_classes): + super(MockBoxPredictor, self).__init__(is_training, num_classes) + + def _predict(self, image_features, num_predictions_per_location): + batch_size = image_features.get_shape().as_list()[0] + num_anchors = (image_features.get_shape().as_list()[1] + * image_features.get_shape().as_list()[2]) + code_size = 4 + zero = tf.reduce_sum(0 * image_features) + box_encodings = zero + tf.zeros( + (batch_size, num_anchors, 1, code_size), dtype=tf.float32) + class_predictions_with_background = zero + tf.zeros( + (batch_size, num_anchors, self.num_classes + 1), dtype=tf.float32) + return {box_predictor.BOX_ENCODINGS: box_encodings, + box_predictor.CLASS_PREDICTIONS_WITH_BACKGROUND: + class_predictions_with_background} + + +class MockAnchorGenerator(anchor_generator.AnchorGenerator): + """Mock anchor generator.""" + + def name_scope(self): + return 'MockAnchorGenerator' + + def num_anchors_per_location(self): + return [1] + + def _generate(self, feature_map_shape_list): + num_anchors = sum([shape[0] * shape[1] for shape in feature_map_shape_list]) + return box_list.BoxList(tf.zeros((num_anchors, 4), dtype=tf.float32)) + + +class MockMatcher(matcher.Matcher): + """Simple matcher that matches first anchor to first groundtruth box.""" + + def _match(self, similarity_matrix): + return tf.constant([0, -1, -1, -1], dtype=tf.int32) + + +def create_diagonal_gradient_image(height, width, depth): + """Creates pyramid image. Useful for testing. + + For example, pyramid_image(5, 6, 1) looks like: + # [[[ 5. 4. 3. 2. 1. 0.] + # [ 6. 5. 4. 3. 2. 1.] + # [ 7. 6. 5. 4. 3. 2.] + # [ 8. 7. 6. 5. 4. 3.] + # [ 9. 8. 7. 6. 5. 4.]]] + + Args: + height: height of image + width: width of image + depth: depth of image + + Returns: + pyramid image + """ + row = np.arange(height) + col = np.arange(width)[::-1] + image_layer = np.expand_dims(row, 1) + col + image_layer = np.expand_dims(image_layer, 2) + + image = image_layer + for i in range(1, depth): + image = np.concatenate((image, image_layer * pow(10, i)), 2) + + return image.astype(np.float32) + + +def create_random_boxes(num_boxes, max_height, max_width): + """Creates random bounding boxes of specific maximum height and width. + + Args: + num_boxes: number of boxes. + max_height: maximum height of boxes. + max_width: maximum width of boxes. + + Returns: + boxes: numpy array of shape [num_boxes, 4]. Each row is in form + [y_min, x_min, y_max, x_max]. 
+ """ + + y_1 = np.random.uniform(size=(1, num_boxes)) * max_height + y_2 = np.random.uniform(size=(1, num_boxes)) * max_height + x_1 = np.random.uniform(size=(1, num_boxes)) * max_width + x_2 = np.random.uniform(size=(1, num_boxes)) * max_width + + boxes = np.zeros(shape=(num_boxes, 4)) + boxes[:, 0] = np.minimum(y_1, y_2) + boxes[:, 1] = np.minimum(x_1, x_2) + boxes[:, 2] = np.maximum(y_1, y_2) + boxes[:, 3] = np.maximum(x_1, x_2) + + return boxes.astype(np.float32) diff --git a/object_detection/utils/test_utils_test.py b/object_detection/utils/test_utils_test.py new file mode 100644 index 0000000000000000000000000000000000000000..1a4799c699d5cd9abe1a88219f3c4af29087a370 --- /dev/null +++ b/object_detection/utils/test_utils_test.py @@ -0,0 +1,73 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for object_detection.utils.test_utils.""" + +import numpy as np +import tensorflow as tf + +from object_detection.utils import test_utils + + +class TestUtilsTest(tf.test.TestCase): + + def test_diagonal_gradient_image(self): + """Tests if a good pyramid image is created.""" + pyramid_image = test_utils.create_diagonal_gradient_image(3, 4, 2) + + # Test which is easy to understand. + expected_first_channel = np.array([[3, 2, 1, 0], + [4, 3, 2, 1], + [5, 4, 3, 2]], dtype=np.float32) + self.assertAllEqual(np.squeeze(pyramid_image[:, :, 0]), + expected_first_channel) + + # Actual test. + expected_image = np.array([[[3, 30], + [2, 20], + [1, 10], + [0, 0]], + [[4, 40], + [3, 30], + [2, 20], + [1, 10]], + [[5, 50], + [4, 40], + [3, 30], + [2, 20]]], dtype=np.float32) + + self.assertAllEqual(pyramid_image, expected_image) + + def test_random_boxes(self): + """Tests if valid random boxes are created.""" + num_boxes = 1000 + max_height = 3 + max_width = 5 + boxes = test_utils.create_random_boxes(num_boxes, + max_height, + max_width) + + true_column = np.ones(shape=(num_boxes)) == 1 + self.assertAllEqual(boxes[:, 0] < boxes[:, 2], true_column) + self.assertAllEqual(boxes[:, 1] < boxes[:, 3], true_column) + + self.assertTrue(boxes[:, 0].min() >= 0) + self.assertTrue(boxes[:, 1].min() >= 0) + self.assertTrue(boxes[:, 2].max() <= max_height) + self.assertTrue(boxes[:, 3].max() <= max_width) + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/variables_helper.py b/object_detection/utils/variables_helper.py new file mode 100644 index 0000000000000000000000000000000000000000..b27f814f193803d884e428cf3370e9df5b352c87 --- /dev/null +++ b/object_detection/utils/variables_helper.py @@ -0,0 +1,133 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Helper functions for manipulating collections of variables during training. +""" +import logging +import re + +import tensorflow as tf + +slim = tf.contrib.slim + + +# TODO: Consider replacing with tf.contrib.filter_variables in +# tensorflow/contrib/framework/python/ops/variables.py +def filter_variables(variables, filter_regex_list, invert=False): + """Filters out the variables matching the filter_regex. + + Filter out the variables whose name matches the any of the regular + expressions in filter_regex_list and returns the remaining variables. + Optionally, if invert=True, the complement set is returned. + + Args: + variables: a list of tensorflow variables. + filter_regex_list: a list of string regular expressions. + invert: (boolean). If True, returns the complement of the filter set; that + is, all variables matching filter_regex are kept and all others discarded. + + Returns: + a list of filtered variables. + """ + kept_vars = [] + variables_to_ignore_patterns = filter(None, filter_regex_list) + for var in variables: + add = True + for pattern in variables_to_ignore_patterns: + if re.match(pattern, var.op.name): + add = False + break + if add != invert: + kept_vars.append(var) + return kept_vars + + +def multiply_gradients_matching_regex(grads_and_vars, regex_list, multiplier): + """Multiply gradients whose variable names match a regular expression. + + Args: + grads_and_vars: A list of gradient to variable pairs (tuples). + regex_list: A list of string regular expressions. + multiplier: A (float) multiplier to apply to each gradient matching the + regular expression. + + Returns: + grads_and_vars: A list of gradient to variable pairs (tuples). + """ + variables = [pair[1] for pair in grads_and_vars] + matching_vars = filter_variables(variables, regex_list, invert=True) + for var in matching_vars: + logging.info('Applying multiplier %f to variable [%s]', + multiplier, var.op.name) + grad_multipliers = {var: float(multiplier) for var in matching_vars} + return slim.learning.multiply_gradients(grads_and_vars, + grad_multipliers) + + +def freeze_gradients_matching_regex(grads_and_vars, regex_list): + """Freeze gradients whose variable names match a regular expression. + + Args: + grads_and_vars: A list of gradient to variable pairs (tuples). + regex_list: A list of string regular expressions. + + Returns: + grads_and_vars: A list of gradient to variable pairs (tuples) that do not + contain the variables and gradients matching the regex. + """ + variables = [pair[1] for pair in grads_and_vars] + matching_vars = filter_variables(variables, regex_list, invert=True) + kept_grads_and_vars = [pair for pair in grads_and_vars + if pair[1] not in matching_vars] + for var in matching_vars: + logging.info('Freezing variable [%s]', var.op.name) + return kept_grads_and_vars + + +def get_variables_available_in_checkpoint(variables, checkpoint_path): + """Returns the subset of variables available in the checkpoint. 
+ + Inspects given checkpoint and returns the subset of variables that are + available in it. + + TODO: force input and output to be a dictionary. + + Args: + variables: a list or dictionary of variables to find in checkpoint. + checkpoint_path: path to the checkpoint to restore variables from. + + Returns: + A list or dictionary of variables. + Raises: + ValueError: if `variables` is not a list or dict. + """ + if isinstance(variables, list): + variable_names_map = {variable.op.name: variable for variable in variables} + elif isinstance(variables, dict): + variable_names_map = variables + else: + raise ValueError('`variables` is expected to be a list or dict.') + ckpt_reader = tf.train.NewCheckpointReader(checkpoint_path) + ckpt_vars = ckpt_reader.get_variable_to_shape_map().keys() + vars_in_ckpt = {} + for variable_name, variable in sorted(variable_names_map.items()): + if variable_name in ckpt_vars: + vars_in_ckpt[variable_name] = variable + else: + logging.warning('Variable [%s] not available in checkpoint', + variable_name) + if isinstance(variables, list): + return vars_in_ckpt.values() + return vars_in_ckpt diff --git a/object_detection/utils/variables_helper_test.py b/object_detection/utils/variables_helper_test.py new file mode 100644 index 0000000000000000000000000000000000000000..c04b11916a36e63275a39f62f9ea0e7479dd9f42 --- /dev/null +++ b/object_detection/utils/variables_helper_test.py @@ -0,0 +1,185 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
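# A hypothetical sketch of the gradient helpers above; the variable names
# mirror those used in the tests below and the multiplier value is arbitrary.
import tensorflow as tf
from object_detection.utils import variables_helper

weights = tf.Variable(1.0, name='FeatureExtractor/InceptionV3/weights')
biases = tf.Variable(1.0, name='StackProposalGenerator/biases')
grads_and_vars = [(tf.constant(0.5), weights), (tf.constant(0.2), biases)]

# Drop gradients for every variable matching the regex (i.e. freeze them).
trainable = variables_helper.freeze_gradients_matching_regex(
    grads_and_vars, ['FeatureExtractor/.*'])

# Scale the gradients of all bias variables by 2.
scaled = variables_helper.multiply_gradients_matching_regex(
    grads_and_vars, ['.*/biases'], multiplier=2.0)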
+# ============================================================================== + +"""Tests for object_detection.utils.variables_helper.""" +import os + +import tensorflow as tf + +from object_detection.utils import variables_helper + + +class FilterVariablesTest(tf.test.TestCase): + + def _create_variables(self): + return [tf.Variable(1.0, name='FeatureExtractor/InceptionV3/weights'), + tf.Variable(1.0, name='FeatureExtractor/InceptionV3/biases'), + tf.Variable(1.0, name='StackProposalGenerator/weights'), + tf.Variable(1.0, name='StackProposalGenerator/biases')] + + def test_return_all_variables_when_empty_regex(self): + variables = self._create_variables() + out_variables = variables_helper.filter_variables(variables, ['']) + self.assertItemsEqual(out_variables, variables) + + def test_return_variables_which_do_not_match_single_regex(self): + variables = self._create_variables() + out_variables = variables_helper.filter_variables(variables, + ['FeatureExtractor/.*']) + self.assertItemsEqual(out_variables, variables[2:]) + + def test_return_variables_which_do_not_match_any_regex_in_list(self): + variables = self._create_variables() + out_variables = variables_helper.filter_variables(variables, [ + 'FeatureExtractor.*biases', 'StackProposalGenerator.*biases' + ]) + self.assertItemsEqual(out_variables, [variables[0], variables[2]]) + + def test_return_variables_matching_empty_regex_list(self): + variables = self._create_variables() + out_variables = variables_helper.filter_variables( + variables, [''], invert=True) + self.assertItemsEqual(out_variables, []) + + def test_return_variables_matching_some_regex_in_list(self): + variables = self._create_variables() + out_variables = variables_helper.filter_variables( + variables, + ['FeatureExtractor.*biases', 'StackProposalGenerator.*biases'], + invert=True) + self.assertItemsEqual(out_variables, [variables[1], variables[3]]) + + +class MultiplyGradientsMatchingRegexTest(tf.test.TestCase): + + def _create_grads_and_vars(self): + return [(tf.constant(1.0), + tf.Variable(1.0, name='FeatureExtractor/InceptionV3/weights')), + (tf.constant(2.0), + tf.Variable(2.0, name='FeatureExtractor/InceptionV3/biases')), + (tf.constant(3.0), + tf.Variable(3.0, name='StackProposalGenerator/weights')), + (tf.constant(4.0), + tf.Variable(4.0, name='StackProposalGenerator/biases'))] + + def test_multiply_all_feature_extractor_variables(self): + grads_and_vars = self._create_grads_and_vars() + regex_list = ['FeatureExtractor/.*'] + multiplier = 0.0 + grads_and_vars = variables_helper.multiply_gradients_matching_regex( + grads_and_vars, regex_list, multiplier) + exp_output = [(0.0, 1.0), (0.0, 2.0), (3.0, 3.0), (4.0, 4.0)] + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + output = sess.run(grads_and_vars) + self.assertItemsEqual(output, exp_output) + + def test_multiply_all_bias_variables(self): + grads_and_vars = self._create_grads_and_vars() + regex_list = ['.*/biases'] + multiplier = 0.0 + grads_and_vars = variables_helper.multiply_gradients_matching_regex( + grads_and_vars, regex_list, multiplier) + exp_output = [(1.0, 1.0), (0.0, 2.0), (3.0, 3.0), (0.0, 4.0)] + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + output = sess.run(grads_and_vars) + self.assertItemsEqual(output, exp_output) + + +class FreezeGradientsMatchingRegexTest(tf.test.TestCase): + + def _create_grads_and_vars(self): + return [(tf.constant(1.0), + tf.Variable(1.0, 
name='FeatureExtractor/InceptionV3/weights')), + (tf.constant(2.0), + tf.Variable(2.0, name='FeatureExtractor/InceptionV3/biases')), + (tf.constant(3.0), + tf.Variable(3.0, name='StackProposalGenerator/weights')), + (tf.constant(4.0), + tf.Variable(4.0, name='StackProposalGenerator/biases'))] + + def test_freeze_all_feature_extractor_variables(self): + grads_and_vars = self._create_grads_and_vars() + regex_list = ['FeatureExtractor/.*'] + grads_and_vars = variables_helper.freeze_gradients_matching_regex( + grads_and_vars, regex_list) + exp_output = [(3.0, 3.0), (4.0, 4.0)] + init_op = tf.global_variables_initializer() + with self.test_session() as sess: + sess.run(init_op) + output = sess.run(grads_and_vars) + self.assertItemsEqual(output, exp_output) + + +class GetVariablesAvailableInCheckpointTest(tf.test.TestCase): + + def test_return_all_variables_from_checkpoint(self): + variables = [ + tf.Variable(1.0, name='weights'), + tf.Variable(1.0, name='biases') + ] + checkpoint_path = os.path.join(self.get_temp_dir(), 'graph.pb') + init_op = tf.global_variables_initializer() + saver = tf.train.Saver(variables) + with self.test_session() as sess: + sess.run(init_op) + saver.save(sess, checkpoint_path) + out_variables = variables_helper.get_variables_available_in_checkpoint( + variables, checkpoint_path) + self.assertItemsEqual(out_variables, variables) + + def test_return_variables_available_in_checkpoint(self): + checkpoint_path = os.path.join(self.get_temp_dir(), 'graph.pb') + graph1_variables = [ + tf.Variable(1.0, name='weights'), + ] + init_op = tf.global_variables_initializer() + saver = tf.train.Saver(graph1_variables) + with self.test_session() as sess: + sess.run(init_op) + saver.save(sess, checkpoint_path) + + graph2_variables = graph1_variables + [tf.Variable(1.0, name='biases')] + out_variables = variables_helper.get_variables_available_in_checkpoint( + graph2_variables, checkpoint_path) + self.assertItemsEqual(out_variables, graph1_variables) + + def test_return_variables_available_an_checkpoint_with_dict_inputs(self): + checkpoint_path = os.path.join(self.get_temp_dir(), 'graph.pb') + graph1_variables = [ + tf.Variable(1.0, name='ckpt_weights'), + ] + init_op = tf.global_variables_initializer() + saver = tf.train.Saver(graph1_variables) + with self.test_session() as sess: + sess.run(init_op) + saver.save(sess, checkpoint_path) + + graph2_variables_dict = { + 'ckpt_weights': tf.Variable(1.0, name='weights'), + 'ckpt_biases': tf.Variable(1.0, name='biases') + } + out_variables = variables_helper.get_variables_available_in_checkpoint( + graph2_variables_dict, checkpoint_path) + self.assertTrue(isinstance(out_variables, dict)) + self.assertItemsEqual(out_variables.keys(), ['ckpt_weights']) + self.assertTrue(out_variables['ckpt_weights'].op.name == 'weights') + + +if __name__ == '__main__': + tf.test.main() diff --git a/object_detection/utils/visualization_utils.py b/object_detection/utils/visualization_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..1d0802c3669721374d14ba5528eba794b75af39e --- /dev/null +++ b/object_detection/utils/visualization_utils.py @@ -0,0 +1,425 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
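# A self-contained sketch of get_variables_available_in_checkpoint defined
# earlier; the variable names and the temporary checkpoint path are made up.
import os
import tempfile
import tensorflow as tf
from object_detection.utils import variables_helper

weights = tf.Variable(1.0, name='weights')
biases = tf.Variable(2.0, name='biases')  # not written to the checkpoint below
checkpoint_path = os.path.join(tempfile.mkdtemp(), 'model.ckpt')
saver = tf.train.Saver([weights])
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  saver.save(sess, checkpoint_path)

restorable = variables_helper.get_variables_available_in_checkpoint(
    [weights, biases], checkpoint_path)
# `restorable` holds only `weights`; `biases` is logged as unavailable.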
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""A set of functions that are used for visualization. + +These functions often receive an image, perform some visualization on the image. +The functions do not return a value, instead they modify the image itself. + +""" +import collections +import numpy as np +import PIL.Image as Image +import PIL.ImageColor as ImageColor +import PIL.ImageDraw as ImageDraw +import PIL.ImageFont as ImageFont +import six +import tensorflow as tf + + +_TITLE_LEFT_MARGIN = 10 +_TITLE_TOP_MARGIN = 10 +STANDARD_COLORS = [ + 'AliceBlue', 'Chartreuse', 'Aqua', 'Aquamarine', 'Azure', 'Beige', 'Bisque', + 'BlanchedAlmond', 'BlueViolet', 'BurlyWood', 'CadetBlue', 'AntiqueWhite', + 'Chocolate', 'Coral', 'CornflowerBlue', 'Cornsilk', 'Crimson', 'Cyan', + 'DarkCyan', 'DarkGoldenRod', 'DarkGrey', 'DarkKhaki', 'DarkOrange', + 'DarkOrchid', 'DarkSalmon', 'DarkSeaGreen', 'DarkTurquoise', 'DarkViolet', + 'DeepPink', 'DeepSkyBlue', 'DodgerBlue', 'FireBrick', 'FloralWhite', + 'ForestGreen', 'Fuchsia', 'Gainsboro', 'GhostWhite', 'Gold', 'GoldenRod', + 'Salmon', 'Tan', 'HoneyDew', 'HotPink', 'IndianRed', 'Ivory', 'Khaki', + 'Lavender', 'LavenderBlush', 'LawnGreen', 'LemonChiffon', 'LightBlue', + 'LightCoral', 'LightCyan', 'LightGoldenRodYellow', 'LightGray', 'LightGrey', + 'LightGreen', 'LightPink', 'LightSalmon', 'LightSeaGreen', 'LightSkyBlue', + 'LightSlateGray', 'LightSlateGrey', 'LightSteelBlue', 'LightYellow', 'Lime', + 'LimeGreen', 'Linen', 'Magenta', 'MediumAquaMarine', 'MediumOrchid', + 'MediumPurple', 'MediumSeaGreen', 'MediumSlateBlue', 'MediumSpringGreen', + 'MediumTurquoise', 'MediumVioletRed', 'MintCream', 'MistyRose', 'Moccasin', + 'NavajoWhite', 'OldLace', 'Olive', 'OliveDrab', 'Orange', 'OrangeRed', + 'Orchid', 'PaleGoldenRod', 'PaleGreen', 'PaleTurquoise', 'PaleVioletRed', + 'PapayaWhip', 'PeachPuff', 'Peru', 'Pink', 'Plum', 'PowderBlue', 'Purple', + 'Red', 'RosyBrown', 'RoyalBlue', 'SaddleBrown', 'Green', 'SandyBrown', + 'SeaGreen', 'SeaShell', 'Sienna', 'Silver', 'SkyBlue', 'SlateBlue', + 'SlateGray', 'SlateGrey', 'Snow', 'SpringGreen', 'SteelBlue', 'GreenYellow', + 'Teal', 'Thistle', 'Tomato', 'Turquoise', 'Violet', 'Wheat', 'White', + 'WhiteSmoke', 'Yellow', 'YellowGreen' +] + + +def save_image_array_as_png(image, output_path): + """Saves an image (represented as a numpy array) to PNG. + + Args: + image: a numpy array with shape [height, width, 3]. + output_path: path to which image should be written. + """ + image_pil = Image.fromarray(np.uint8(image)).convert('RGB') + with tf.gfile.Open(output_path, 'w') as fid: + image_pil.save(fid, 'PNG') + + +def encode_image_array_as_png_str(image): + """Encodes a numpy array into a PNG string. + + Args: + image: a numpy array with shape [height, width, 3]. + + Returns: + PNG encoded image string. 
+ """ + image_pil = Image.fromarray(np.uint8(image)) + output = six.BytesIO() + image_pil.save(output, format='PNG') + png_string = output.getvalue() + output.close() + return png_string + + +def draw_bounding_box_on_image_array(image, + ymin, + xmin, + ymax, + xmax, + color='red', + thickness=4, + display_str_list=(), + use_normalized_coordinates=True): + """Adds a bounding box to an image (numpy array). + + Args: + image: a numpy array with shape [height, width, 3]. + ymin: ymin of bounding box in normalized coordinates (same below). + xmin: xmin of bounding box. + ymax: ymax of bounding box. + xmax: xmax of bounding box. + color: color to draw bounding box. Default is red. + thickness: line thickness. Default value is 4. + display_str_list: list of strings to display in box + (each to be shown on its own line). + use_normalized_coordinates: If True (default), treat coordinates + ymin, xmin, ymax, xmax as relative to the image. Otherwise treat + coordinates as absolute. + """ + image_pil = Image.fromarray(np.uint8(image)).convert('RGB') + draw_bounding_box_on_image(image_pil, ymin, xmin, ymax, xmax, color, + thickness, display_str_list, + use_normalized_coordinates) + np.copyto(image, np.array(image_pil)) + + +def draw_bounding_box_on_image(image, + ymin, + xmin, + ymax, + xmax, + color='red', + thickness=4, + display_str_list=(), + use_normalized_coordinates=True): + """Adds a bounding box to an image. + + Each string in display_str_list is displayed on a separate line above the + bounding box in black text on a rectangle filled with the input 'color'. + + Args: + image: a PIL.Image object. + ymin: ymin of bounding box. + xmin: xmin of bounding box. + ymax: ymax of bounding box. + xmax: xmax of bounding box. + color: color to draw bounding box. Default is red. + thickness: line thickness. Default value is 4. + display_str_list: list of strings to display in box + (each to be shown on its own line). + use_normalized_coordinates: If True (default), treat coordinates + ymin, xmin, ymax, xmax as relative to the image. Otherwise treat + coordinates as absolute. + """ + draw = ImageDraw.Draw(image) + im_width, im_height = image.size + if use_normalized_coordinates: + (left, right, top, bottom) = (xmin * im_width, xmax * im_width, + ymin * im_height, ymax * im_height) + else: + (left, right, top, bottom) = (xmin, xmax, ymin, ymax) + draw.line([(left, top), (left, bottom), (right, bottom), + (right, top), (left, top)], width=thickness, fill=color) + try: + font = ImageFont.truetype('arial.ttf', 24) + except IOError: + font = ImageFont.load_default() + + text_bottom = top + # Reverse list and print from bottom to top. + for display_str in display_str_list[::-1]: + text_width, text_height = font.getsize(display_str) + margin = np.ceil(0.05 * text_height) + draw.rectangle( + [(left, text_bottom - text_height - 2 * margin), (left + text_width, + text_bottom)], + fill=color) + draw.text( + (left + margin, text_bottom - text_height - margin), + display_str, + fill='black', + font=font) + text_bottom -= text_height - 2 * margin + + +def draw_bounding_boxes_on_image_array(image, + boxes, + color='red', + thickness=4, + display_str_list_list=()): + """Draws bounding boxes on image (numpy array). + + Args: + image: a numpy array object. + boxes: a 2 dimensional numpy array of [N, 4]: (ymin, xmin, ymax, xmax). + The coordinates are in normalized format between [0, 1]. + color: color to draw bounding box. Default is red. + thickness: line thickness. Default value is 4. 
+ display_str_list_list: list of list of strings. + a list of strings for each bounding box. + The reason to pass a list of strings for a + bounding box is that it might contain + multiple labels. + + Raises: + ValueError: if boxes is not a [N, 4] array + """ + image_pil = Image.fromarray(image) + draw_bounding_boxes_on_image(image_pil, boxes, color, thickness, + display_str_list_list) + np.copyto(image, np.array(image_pil)) + + +def draw_bounding_boxes_on_image(image, + boxes, + color='red', + thickness=4, + display_str_list_list=()): + """Draws bounding boxes on image. + + Args: + image: a PIL.Image object. + boxes: a 2 dimensional numpy array of [N, 4]: (ymin, xmin, ymax, xmax). + The coordinates are in normalized format between [0, 1]. + color: color to draw bounding box. Default is red. + thickness: line thickness. Default value is 4. + display_str_list_list: list of list of strings. + a list of strings for each bounding box. + The reason to pass a list of strings for a + bounding box is that it might contain + multiple labels. + + Raises: + ValueError: if boxes is not a [N, 4] array + """ + boxes_shape = boxes.shape + if not boxes_shape: + return + if len(boxes_shape) != 2 or boxes_shape[1] != 4: + raise ValueError('Input must be of size [N, 4]') + for i in range(boxes_shape[0]): + display_str_list = () + if display_str_list_list: + display_str_list = display_str_list_list[i] + draw_bounding_box_on_image(image, boxes[i, 0], boxes[i, 1], boxes[i, 2], + boxes[i, 3], color, thickness, display_str_list) + + +def draw_keypoints_on_image_array(image, + keypoints, + color='red', + radius=2, + use_normalized_coordinates=True): + """Draws keypoints on an image (numpy array). + + Args: + image: a numpy array with shape [height, width, 3]. + keypoints: a numpy array with shape [num_keypoints, 2]. + color: color to draw the keypoints with. Default is red. + radius: keypoint radius. Default value is 2. + use_normalized_coordinates: if True (default), treat keypoint values as + relative to the image. Otherwise treat them as absolute. + """ + image_pil = Image.fromarray(np.uint8(image)).convert('RGB') + draw_keypoints_on_image(image_pil, keypoints, color, radius, + use_normalized_coordinates) + np.copyto(image, np.array(image_pil)) + + +def draw_keypoints_on_image(image, + keypoints, + color='red', + radius=2, + use_normalized_coordinates=True): + """Draws keypoints on an image. + + Args: + image: a PIL.Image object. + keypoints: a numpy array with shape [num_keypoints, 2]. + color: color to draw the keypoints with. Default is red. + radius: keypoint radius. Default value is 2. + use_normalized_coordinates: if True (default), treat keypoint values as + relative to the image. Otherwise treat them as absolute. + """ + draw = ImageDraw.Draw(image) + im_width, im_height = image.size + keypoints_x = [k[1] for k in keypoints] + keypoints_y = [k[0] for k in keypoints] + if use_normalized_coordinates: + keypoints_x = tuple([im_width * x for x in keypoints_x]) + keypoints_y = tuple([im_height * y for y in keypoints_y]) + for keypoint_x, keypoint_y in zip(keypoints_x, keypoints_y): + draw.ellipse([(keypoint_x - radius, keypoint_y - radius), + (keypoint_x + radius, keypoint_y + radius)], + outline=color, fill=color) + + +def draw_mask_on_image_array(image, mask, color='red', alpha=0.7): + """Draws mask on an image. 
+
+  Args:
+    image: uint8 numpy array with shape (img_height, img_width, 3)
+    mask: a float numpy array of shape (img_height, img_width) with
+      values between 0 and 1
+    color: color to draw the mask with. Default is red.
+    alpha: transparency value between 0 and 1. (default: 0.7)
+
+  Raises:
+    ValueError: On incorrect data type for image or masks.
+  """
+  if image.dtype != np.uint8:
+    raise ValueError('`image` not of type np.uint8')
+  if mask.dtype != np.float32:
+    raise ValueError('`mask` not of type np.float32')
+  if np.any(np.logical_or(mask > 1.0, mask < 0.0)):
+    raise ValueError('`mask` elements should be in [0, 1]')
+  rgb = ImageColor.getrgb(color)
+  pil_image = Image.fromarray(image)
+
+  solid_color = np.expand_dims(
+      np.ones_like(mask), axis=2) * np.reshape(list(rgb), [1, 1, 3])
+  pil_solid_color = Image.fromarray(np.uint8(solid_color)).convert('RGBA')
+  pil_mask = Image.fromarray(np.uint8(255.0*alpha*mask)).convert('L')
+  pil_image = Image.composite(pil_solid_color, pil_image, pil_mask)
+  np.copyto(image, np.array(pil_image.convert('RGB')))
+
+
+def visualize_boxes_and_labels_on_image_array(image,
+                                              boxes,
+                                              classes,
+                                              scores,
+                                              category_index,
+                                              instance_masks=None,
+                                              keypoints=None,
+                                              use_normalized_coordinates=False,
+                                              max_boxes_to_draw=20,
+                                              min_score_thresh=.5,
+                                              agnostic_mode=False,
+                                              line_thickness=4):
+  """Overlay labeled boxes on an image with formatted scores and label names.
+
+  This function groups boxes that correspond to the same location
+  and creates a display string for each detection and overlays these
+  on the image. Note that this function modifies the image array in-place
+  and does not return anything.
+
+  Args:
+    image: uint8 numpy array with shape (img_height, img_width, 3)
+    boxes: a numpy array of shape [N, 4]
+    classes: a numpy array of shape [N]
+    scores: a numpy array of shape [N] or None. If scores=None, then
+      this function assumes that the boxes to be plotted are groundtruth
+      boxes and plots all boxes as black with no classes or scores.
+    category_index: a dict containing category dictionaries (each holding
+      category index `id` and category name `name`) keyed by category indices.
+    instance_masks: a numpy array of shape [N, image_height, image_width], can
+      be None
+    keypoints: a numpy array of shape [N, num_keypoints, 2], can
+      be None
+    use_normalized_coordinates: whether boxes is to be interpreted as
+      normalized coordinates or not.
+    max_boxes_to_draw: maximum number of boxes to visualize. If None, draw
+      all boxes.
+    min_score_thresh: minimum score threshold for a box to be visualized
+    agnostic_mode: boolean (default: False) controlling whether to evaluate in
+      class-agnostic mode or not. This mode will display scores but ignore
+      classes.
+    line_thickness: integer (default: 4) controlling line width of the boxes.
+  """
+  # Create a display string (and color) for every box location, group any boxes
+  # that correspond to the same location.
+ box_to_display_str_map = collections.defaultdict(list) + box_to_color_map = collections.defaultdict(str) + box_to_instance_masks_map = {} + box_to_keypoints_map = collections.defaultdict(list) + if not max_boxes_to_draw: + max_boxes_to_draw = boxes.shape[0] + for i in range(min(max_boxes_to_draw, boxes.shape[0])): + if scores is None or scores[i] > min_score_thresh: + box = tuple(boxes[i].tolist()) + if instance_masks is not None: + box_to_instance_masks_map[box] = instance_masks[i] + if keypoints is not None: + box_to_keypoints_map[box].extend(keypoints[i]) + if scores is None: + box_to_color_map[box] = 'black' + else: + if not agnostic_mode: + if classes[i] in category_index.keys(): + class_name = category_index[classes[i]]['name'] + else: + class_name = 'N/A' + display_str = '{}: {}%'.format( + class_name, + int(100*scores[i])) + else: + display_str = 'score: {}%'.format(int(100 * scores[i])) + box_to_display_str_map[box].append(display_str) + if agnostic_mode: + box_to_color_map[box] = 'DarkOrange' + else: + box_to_color_map[box] = STANDARD_COLORS[ + classes[i] % len(STANDARD_COLORS)] + + # Draw all boxes onto image. + for box, color in six.iteritems(box_to_color_map): + ymin, xmin, ymax, xmax = box + if instance_masks is not None: + draw_mask_on_image_array( + image, + box_to_instance_masks_map[box], + color=color + ) + draw_bounding_box_on_image_array( + image, + ymin, + xmin, + ymax, + xmax, + color=color, + thickness=line_thickness, + display_str_list=box_to_display_str_map[box], + use_normalized_coordinates=use_normalized_coordinates) + if keypoints is not None: + draw_keypoints_on_image_array( + image, + box_to_keypoints_map[box], + color=color, + radius=line_thickness / 2, + use_normalized_coordinates=use_normalized_coordinates) diff --git a/object_detection/utils/visualization_utils_test.py b/object_detection/utils/visualization_utils_test.py new file mode 100644 index 0000000000000000000000000000000000000000..809d5f068c2ac4a8ec17f75f0314b84a408bdf51 --- /dev/null +++ b/object_detection/utils/visualization_utils_test.py @@ -0,0 +1,151 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for image.understanding.object_detection.core.visualization_utils. + +Testing with visualization in the following colab: +https://drive.google.com/a/google.com/file/d/0B5HnKS_hMsNARERpU3MtU3I5RFE/view?usp=sharing + +""" + + +import numpy as np +import PIL.Image as Image +import tensorflow as tf + +from object_detection.utils import visualization_utils + + +class VisualizationUtilsTest(tf.test.TestCase): + + def create_colorful_test_image(self): + """This function creates an image that can be used to test vis functions. + + It makes an image composed of four colored rectangles. + + Returns: + colorful test numpy array image. 
+ """ + ch255 = np.full([100, 200, 1], 255, dtype=np.uint8) + ch128 = np.full([100, 200, 1], 128, dtype=np.uint8) + ch0 = np.full([100, 200, 1], 0, dtype=np.uint8) + imr = np.concatenate((ch255, ch128, ch128), axis=2) + img = np.concatenate((ch255, ch255, ch0), axis=2) + imb = np.concatenate((ch255, ch0, ch255), axis=2) + imw = np.concatenate((ch128, ch128, ch128), axis=2) + imu = np.concatenate((imr, img), axis=1) + imd = np.concatenate((imb, imw), axis=1) + image = np.concatenate((imu, imd), axis=0) + return image + + def test_draw_bounding_box_on_image(self): + test_image = self.create_colorful_test_image() + test_image = Image.fromarray(test_image) + width_original, height_original = test_image.size + ymin = 0.25 + ymax = 0.75 + xmin = 0.4 + xmax = 0.6 + + visualization_utils.draw_bounding_box_on_image(test_image, ymin, xmin, ymax, + xmax) + width_final, height_final = test_image.size + + self.assertEqual(width_original, width_final) + self.assertEqual(height_original, height_final) + + def test_draw_bounding_box_on_image_array(self): + test_image = self.create_colorful_test_image() + width_original = test_image.shape[0] + height_original = test_image.shape[1] + ymin = 0.25 + ymax = 0.75 + xmin = 0.4 + xmax = 0.6 + + visualization_utils.draw_bounding_box_on_image_array( + test_image, ymin, xmin, ymax, xmax) + width_final = test_image.shape[0] + height_final = test_image.shape[1] + + self.assertEqual(width_original, width_final) + self.assertEqual(height_original, height_final) + + def test_draw_bounding_boxes_on_image(self): + test_image = self.create_colorful_test_image() + test_image = Image.fromarray(test_image) + width_original, height_original = test_image.size + boxes = np.array([[0.25, 0.75, 0.4, 0.6], + [0.1, 0.1, 0.9, 0.9]]) + + visualization_utils.draw_bounding_boxes_on_image(test_image, boxes) + width_final, height_final = test_image.size + + self.assertEqual(width_original, width_final) + self.assertEqual(height_original, height_final) + + def test_draw_bounding_boxes_on_image_array(self): + test_image = self.create_colorful_test_image() + width_original = test_image.shape[0] + height_original = test_image.shape[1] + boxes = np.array([[0.25, 0.75, 0.4, 0.6], + [0.1, 0.1, 0.9, 0.9]]) + + visualization_utils.draw_bounding_boxes_on_image_array(test_image, boxes) + width_final = test_image.shape[0] + height_final = test_image.shape[1] + + self.assertEqual(width_original, width_final) + self.assertEqual(height_original, height_final) + + def test_draw_keypoints_on_image(self): + test_image = self.create_colorful_test_image() + test_image = Image.fromarray(test_image) + width_original, height_original = test_image.size + keypoints = [[0.25, 0.75], [0.4, 0.6], [0.1, 0.1], [0.9, 0.9]] + + visualization_utils.draw_keypoints_on_image(test_image, keypoints) + width_final, height_final = test_image.size + + self.assertEqual(width_original, width_final) + self.assertEqual(height_original, height_final) + + def test_draw_keypoints_on_image_array(self): + test_image = self.create_colorful_test_image() + width_original = test_image.shape[0] + height_original = test_image.shape[1] + keypoints = [[0.25, 0.75], [0.4, 0.6], [0.1, 0.1], [0.9, 0.9]] + + visualization_utils.draw_keypoints_on_image_array(test_image, keypoints) + width_final = test_image.shape[0] + height_final = test_image.shape[1] + + self.assertEqual(width_original, width_final) + self.assertEqual(height_original, height_final) + + def test_draw_mask_on_image_array(self): + test_image = np.asarray([[[0, 0, 0], [0, 0, 0]], + [[0, 
0, 0], [0, 0, 0]]], dtype=np.uint8) + mask = np.asarray([[0.0, 1.0], + [1.0, 1.0]], dtype=np.float32) + expected_result = np.asarray([[[0, 0, 0], [0, 0, 127]], + [[0, 0, 127], [0, 0, 127]]], dtype=np.uint8) + visualization_utils.draw_mask_on_image_array(test_image, mask, + color='Blue', alpha=.5) + self.assertAllEqual(test_image, expected_result) + + +if __name__ == '__main__': + tf.test.main() diff --git a/real_nvp/real_nvp_multiscale_dataset.py b/real_nvp/real_nvp_multiscale_dataset.py index a89dec8aa73012b41c4367cd1fe743af203dd8f0..d7b32ddfb6ffb71ca444796400fff3ac3a3e38ec 100644 --- a/real_nvp/real_nvp_multiscale_dataset.py +++ b/real_nvp/real_nvp_multiscale_dataset.py @@ -321,8 +321,8 @@ def masked_conv_aff_coupling(input_, mask_in, dim, name, input_=res, dim=channels, name="bn_in", scale=False, train=train, epsilon=1e-4, axes=[0, 1, 2]) res *= 2. - res = tf.concat_v2([res, -res], 3) - res = tf.concat_v2([res, mask], 3) + res = tf.concat([res, -res], 3) + res = tf.concat([res, mask], 3) dim_in = 2. * channels + 1 res = tf.nn.relu(res) res = resnet(input_=res, dim_in=dim_in, dim=dim, @@ -411,8 +411,8 @@ def masked_conv_add_coupling(input_, mask_in, dim, name, input_=res, dim=channels, name="bn_in", scale=False, train=train, epsilon=1e-4, axes=[0, 1, 2]) res *= 2. - res = tf.concat_v2([res, -res], 3) - res = tf.concat_v2([res, mask], 3) + res = tf.concat([res, -res], 3) + res = tf.concat([res, mask], 3) dim_in = 2. * channels + 1 res = tf.nn.relu(res) shift = resnet(input_=res, dim_in=dim_in, dim=dim, dim_out=channels, @@ -501,7 +501,7 @@ def conv_ch_aff_coupling(input_, dim, name, res = batch_norm( input_=res, dim=channels, name="bn_in", scale=False, train=train, epsilon=1e-4, axes=[0, 1, 2]) - res = tf.concat_v2([res, -res], 3) + res = tf.concat([res, -res], 3) dim_in = 2. * channels res = tf.nn.relu(res) res = resnet(input_=res, dim_in=dim_in, dim=dim, dim_out=2 * channels, @@ -551,11 +551,11 @@ def conv_ch_aff_coupling(input_, dim, name, res *= tf.exp(-.5 * log_var) log_diff -= .5 * log_var if change_bottom: - res = tf.concat_v2([input_, res], 3) - log_diff = tf.concat_v2([tf.zeros_like(log_diff), log_diff], 3) + res = tf.concat([input_, res], 3) + log_diff = tf.concat([tf.zeros_like(log_diff), log_diff], 3) else: - res = tf.concat_v2([res, input_], 3) - log_diff = tf.concat_v2([log_diff, tf.zeros_like(log_diff)], 3) + res = tf.concat([res, input_], 3) + log_diff = tf.concat([log_diff, tf.zeros_like(log_diff)], 3) return res, log_diff @@ -582,7 +582,7 @@ def conv_ch_add_coupling(input_, dim, name, res = batch_norm( input_=res, dim=channels, name="bn_in", scale=False, train=train, epsilon=1e-4, axes=[0, 1, 2]) - res = tf.concat_v2([res, -res], 3) + res = tf.concat([res, -res], 3) dim_in = 2. 
* channels res = tf.nn.relu(res) shift = resnet(input_=res, dim_in=dim_in, dim=dim, dim_out=channels, @@ -616,11 +616,11 @@ def conv_ch_add_coupling(input_, dim, name, res *= tf.exp(-.5 * log_var) log_diff -= .5 * log_var if change_bottom: - res = tf.concat_v2([input_, res], 3) - log_diff = tf.concat_v2([tf.zeros_like(log_diff), log_diff], 3) + res = tf.concat([input_, res], 3) + log_diff = tf.concat([tf.zeros_like(log_diff), log_diff], 3) else: - res = tf.concat_v2([res, input_], 3) - log_diff = tf.concat_v2([log_diff, tf.zeros_like(log_diff)], 3) + res = tf.concat([res, input_], 3) + log_diff = tf.concat([log_diff, tf.zeros_like(log_diff)], 3) return res, log_diff @@ -742,9 +742,9 @@ def rec_masked_conv_coupling(input_, hps, scale_idx, n_scale, input_=res_1, hps=hps, scale_idx=scale_idx + 1, n_scale=n_scale, use_batch_norm=use_batch_norm, weight_norm=weight_norm, train=train) - res = tf.concat_v2([res_1, res_2], 3) + res = tf.concat([res_1, res_2], 3) log_diff_1 += inc_log_diff - log_diff = tf.concat_v2([log_diff_1, log_diff_2], 3) + log_diff = tf.concat([log_diff_1, log_diff_2], 3) res = squeeze_2x2_ordered(res, reverse=True) log_diff = squeeze_2x2_ordered(log_diff, reverse=True) else: @@ -805,8 +805,8 @@ def rec_masked_deconv_coupling(input_, hps, scale_idx, n_scale, scale_idx=scale_idx + 1, n_scale=n_scale, use_batch_norm=use_batch_norm, weight_norm=weight_norm, train=train) - res = tf.concat_v2([res_1, res_2], 3) - log_diff = tf.concat_v2([log_diff_1, log_diff_2], 3) + res = tf.concat([res_1, res_2], 3) + log_diff = tf.concat([log_diff_1, log_diff_2], 3) res = squeeze_2x2_ordered(res, reverse=True) log_diff = squeeze_2x2_ordered(log_diff, reverse=True) else: @@ -1018,7 +1018,7 @@ class RealNVP(object): width = tf.cast(width, tf.int32) depth = tf.reshape((features["depth"], tf.int64)[0], [1]) depth = tf.cast(depth, tf.int32) - image = tf.reshape(image, tf.concat_v2([height, width, depth], 0)) + image = tf.reshape(image, tf.concat([height, width, depth], 0)) image = tf.random_crop(image, [64, 64, 3]) if FLAGS.mode == "train": image = tf.image.random_flip_left_right(image) @@ -1309,19 +1309,19 @@ class RealNVP(object): z_compressed = z_lost z_noisy = z_lost for _ in xrange(scale_idx + 1): - z_compressed = tf.concat_v2( + z_compressed = tf.concat( [z_compressed, tf.zeros_like(z_compressed)], 3) z_compressed = squeeze_2x2_ordered( z_compressed, reverse=True) - z_noisy = tf.concat_v2( + z_noisy = tf.concat( [z_noisy, tf.random_normal( z_noisy.get_shape().as_list())], 3) z_noisy = squeeze_2x2_ordered(z_noisy, reverse=True) z_compressed_list.append(z_compressed) z_noisy_list.append(z_noisy) self.z_reduced = z_lost - z_compressed = tf.concat_v2(z_compressed_list, 0) - z_noisy = tf.concat_v2(z_noisy_list, 0) + z_compressed = tf.concat(z_compressed_list, 0) + z_noisy = tf.concat(z_noisy_list, 0) noisy_images, _ = decoder( input_=z_noisy, hps=hps, n_scale=hps.n_scale, use_batch_norm=hps.use_batch_norm, weight_norm=True, diff --git a/resnet/README.md b/resnet/README.md index 4ea8028438803da8fa4fb61d0eed5545f8c05b10..7591b39cb10933f12fb199d37637b4b9b4e33b28 100644 --- a/resnet/README.md +++ b/resnet/README.md @@ -23,7 +23,7 @@ https://arxiv.org/pdf/1605.07146v1.pdf Settings: * Random split 50k training set into 45k/5k train/eval split. -* Pad to 36x36 and random crop. Horizontal flip. Per-image whitenting. +* Pad to 36x36 and random crop. Horizontal flip. Per-image whitening. * Momentum optimizer 0.9. * Learning rate schedule: 0.1 (40k), 0.01 (60k), 0.001 (>60k). * L2 weight decay: 0.002. 
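For readers of the settings list above: the data augmentation it describes (pad to 36x36, random 32x32 crop, horizontal flip, per-image whitening) corresponds roughly to the TF 1.x preprocessing sketched below. This is an illustrative sketch only, not the exact code in `cifar_input.py`; the helper name `augment_cifar_image` is ours.

```python
import tensorflow as tf


def augment_cifar_image(image):
  """Sketch of the CIFAR-10 training augmentation described in the settings.

  Args:
    image: uint8 tensor of shape [32, 32, 3].

  Returns:
    A float32 tensor of shape [32, 32, 3], whitened per image.
  """
  # Zero-pad to 36x36, then take a random 32x32 crop.
  image = tf.image.resize_image_with_crop_or_pad(image, 36, 36)
  image = tf.random_crop(image, [32, 32, 3])
  # Random horizontal flip.
  image = tf.image.random_flip_left_right(image)
  # Per-image whitening (zero mean, unit variance per image).
  image = tf.image.per_image_standardization(image)
  return image
```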
@@ -31,13 +31,9 @@ https://arxiv.org/pdf/1605.07146v1.pdf Results: - ![Precisions](g3doc/cifar_resnet.gif) - - -![Precisions Legends](g3doc/cifar_resnet_legends.gif) - +![Precisions Legends](g3doc/cifar_resnet_legends.gif) CIFAR-10 Model|Best Precision|Steps --------------|--------------|------ @@ -69,40 +65,40 @@ curl -o cifar-100-binary.tar.gz https://www.cs.toronto.edu/~kriz/cifar-100-binar How to run: ```shell -# cd to the your workspace. -# It contains an empty WORKSPACE file, resnet codes and cifar10 dataset. -# Note: User can split 5k from train set for eval set. -ls -R - .: - cifar10 resnet WORKSPACE +# cd to the models repository and run with bash. Expected command output shown. +# The directory should contain an empty WORKSPACE file, the resnet code, and the cifar10 dataset. +# Note: The user can split 5k from train set for eval set. +$ ls -R +.: +cifar10 resnet WORKSPACE - ./cifar10: - data_batch_1.bin data_batch_2.bin data_batch_3.bin data_batch_4.bin - data_batch_5.bin test_batch.bin +./cifar10: +data_batch_1.bin data_batch_2.bin data_batch_3.bin data_batch_4.bin +data_batch_5.bin test_batch.bin - ./resnet: - BUILD cifar_input.py g3doc README.md resnet_main.py resnet_model.py +./resnet: +BUILD cifar_input.py g3doc README.md resnet_main.py resnet_model.py # Build everything for GPU. -bazel build -c opt --config=cuda resnet/... +$ bazel build -c opt --config=cuda resnet/... # Train the model. -bazel-bin/resnet/resnet_main --train_data_path=cifar10/data_batch* \ - --log_root=/tmp/resnet_model \ - --train_dir=/tmp/resnet_model/train \ - --dataset='cifar10' \ - --num_gpus=1 +$ bazel-bin/resnet/resnet_main --train_data_path=cifar10/data_batch* \ + --log_root=/tmp/resnet_model \ + --train_dir=/tmp/resnet_model/train \ + --dataset='cifar10' \ + --num_gpus=1 # While the model is training, you can also check on its progress using tensorboard: -tensorboard --logdir=/tmp/resnet_model +$ tensorboard --logdir=/tmp/resnet_model # Evaluate the model. # Avoid running on the same GPU as the training job at the same time, # otherwise, you might run out of memory. -bazel-bin/resnet/resnet_main --eval_data_path=cifar10/test_batch.bin \ - --log_root=/tmp/resnet_model \ - --eval_dir=/tmp/resnet_model/test \ - --mode=eval \ - --dataset='cifar10' \ - --num_gpus=0 +$ bazel-bin/resnet/resnet_main --eval_data_path=cifar10/test_batch.bin \ + --log_root=/tmp/resnet_model \ + --eval_dir=/tmp/resnet_model/test \ + --mode=eval \ + --dataset='cifar10' \ + --num_gpus=0 ``` diff --git a/resnet/resnet_model.py b/resnet/resnet_model.py index 0690c207afe633f0cc1ede678333b54ff3868507..2be68a132b952847169ba94f9bde07407acb4056 100644 --- a/resnet/resnet_model.py +++ b/resnet/resnet_model.py @@ -85,7 +85,7 @@ class ResNet(object): # comparably good performance. # https://arxiv.org/pdf/1605.07146v1.pdf # filters = [16, 160, 320, 640] - # Update hps.num_residual_units to 9 + # Update hps.num_residual_units to 4 with tf.variable_scope('unit_1_0'): x = res_func(x, filters[0], filters[1], self._stride_arr(strides[0]), @@ -185,7 +185,7 @@ class ResNet(object): trainable=False) tf.summary.histogram(mean.op.name, mean) tf.summary.histogram(variance.op.name, variance) - # elipson used to be 1e-5. Maybe 0.001 solves NaN problem in deeper net. + # epsilon used to be 1e-5. Maybe 0.001 solves NaN problem in deeper net. 
y = tf.nn.batch_normalization( x, mean, variance, beta, gamma, 0.001) y.set_shape(x.get_shape()) diff --git a/setup.py b/setup.py new file mode 100644 index 0000000000000000000000000000000000000000..2ea9812190900aef4daf945247707e8a6ab4ce15 --- /dev/null +++ b/setup.py @@ -0,0 +1,16 @@ +"""Setup script for object_detection.""" + +from setuptools import find_packages +from setuptools import setup + + +REQUIRED_PACKAGES = ['Pillow>=1.0'] + +setup( + name='object_detection', + version='0.1', + install_requires=REQUIRED_PACKAGES, + include_package_data=True, + packages=[p for p in find_packages() if p.startswith('object_detection')], + description='Tensorflow Object Detection Library', +) diff --git a/skip_thoughts/README.md b/skip_thoughts/README.md index ad6c98ec03dcb031354d0fbfe69420eac8c4cec6..cdcffe7c51bb12ca29265580ff8eae54d02c2b7d 100644 --- a/skip_thoughts/README.md +++ b/skip_thoughts/README.md @@ -133,7 +133,8 @@ INPUT_FILES="${HOME}/skip_thoughts/bookcorpus/*.txt" DATA_DIR="${HOME}/skip_thoughts/data" # Build the preprocessing script. -bazel build -c opt skip_thoughts/data/preprocess_dataset +cd tensorflow-models/skip_thoughts +bazel build -c opt //skip_thoughts/data:preprocess_dataset # Run the preprocessing script. bazel-bin/skip_thoughts/data/preprocess_dataset \ @@ -164,7 +165,8 @@ DATA_DIR="${HOME}/skip_thoughts/data" MODEL_DIR="${HOME}/skip_thoughts/model" # Build the model. -bazel build -c opt skip_thoughts/... +cd tensorflow-models/skip_thoughts +bazel build -c opt //skip_thoughts/... # Run the training script. bazel-bin/skip_thoughts/train \ @@ -269,7 +271,8 @@ WORD2VEC_MODEL="${HOME}/skip_thoughts/googlenews/GoogleNews-vectors-negative300. EXP_VOCAB_DIR="${HOME}/skip_thoughts/exp_vocab" # Build the vocabulary expansion script. -bazel build -c opt skip_thoughts/vocabulary_expansion +cd tensorflow-models/skip_thoughts +bazel build -c opt //skip_thoughts:vocabulary_expansion # Run the vocabulary expansion script. bazel-bin/skip_thoughts/vocabulary_expansion \ @@ -285,7 +288,7 @@ bazel-bin/skip_thoughts/vocabulary_expansion \ The model can be evaluated using the benchmark tasks described in the [Skip-Thought Vectors](https://papers.nips.cc/paper/5950-skip-thought-vectors.pdf) -paper. The following tasks are suported (refer to the paper for full details): +paper. The following tasks are supported (refer to the paper for full details): * **SICK** semantic relatedness task. * **MSRP** (Microsoft Research Paraphrase Corpus) paraphrase detection task. @@ -343,7 +346,8 @@ EMBEDDINGS_FILE="${HOME}/skip_thoughts/exp_vocab/embeddings.npy" EVAL_DATA_DIR="${HOME}/skip_thoughts/eval_data" # Build the evaluation script. -bazel build -c opt skip_thoughts/evaluate +cd tensorflow-models/skip_thoughts +bazel build -c opt //skip_thoughts:evaluate # Run the evaluation script. 
bazel-bin/skip_thoughts/evaluate \ diff --git a/slim/BUILD b/slim/BUILD index 77a1ae50353905968c351a5ffa91d891354b645d..bc38704a36b48c467e5cf6ed159b220dd5732764 100644 --- a/slim/BUILD +++ b/slim/BUILD @@ -132,6 +132,7 @@ py_library( ":cifarnet", ":inception", ":lenet", + ":mobilenet_v1", ":overfeat", ":resnet_v1", ":resnet_v2", @@ -269,6 +270,23 @@ py_library( srcs = ["nets/lenet.py"], ) +py_library( + name = "mobilenet_v1", + srcs = ["nets/mobilenet_v1.py"], + srcs_version = "PY2AND3", +) + +py_test( + name = "mobilenet_v1_test", + size = "large", + srcs = ["nets/mobilenet_v1_test.py"], + shard_count = 3, + srcs_version = "PY2AND3", + deps = [ + ":mobilenet_v1", + ], +) + py_library( name = "overfeat", srcs = ["nets/overfeat.py"], @@ -372,3 +390,26 @@ py_binary( ":preprocessing_factory", ], ) + +py_binary( + name = "export_inference_graph", + srcs = ["export_inference_graph.py"], + deps = [ + ":dataset_factory", + ":nets_factory", + ], +) + +py_test( + name = "export_inference_graph_test", + size = "medium", + srcs = ["export_inference_graph_test.py"], + srcs_version = "PY2AND3", + tags = [ + "manual", + ], + deps = [ + ":export_inference_graph", + ":nets_factory", + ], +) diff --git a/slim/README.md b/slim/README.md index bf20a084c668572d0b9b4e5d96ede01f378a92cc..021673631c527ffbb567b0879604185a852637cf 100644 --- a/slim/README.md +++ b/slim/README.md @@ -32,6 +32,8 @@ Maintainers of TF-slim: Training from scratch
Fine tuning to a new task
Evaluating performance
+Exporting Inference Graph
+Troubleshooting
# Installation @@ -178,12 +180,12 @@ image classification dataset. In the table below, we list each model, the corresponding TensorFlow model file, the link to the model checkpoint, and the top 1 and top 5 accuracy (on the imagenet test set). -Note that the VGG and ResNet parameters have been converted from their original +Note that the VGG and ResNet V1 parameters have been converted from their original caffe formats ([here](https://github.com/BVLC/caffe/wiki/Model-Zoo#models-used-by-the-vgg-team-in-ilsvrc-2014) and [here](https://github.com/KaimingHe/deep-residual-networks)), -whereas the Inception parameters have been trained internally at +whereas the Inception and ResNet V2 parameters have been trained internally at Google. Also be aware that these accuracies were computed by evaluating using a single image crop. Some academic papers report higher accuracy by using multiple crops at multiple scales. @@ -194,13 +196,28 @@ Model | TF-Slim File | Checkpoint | Top-1 Accuracy| Top-5 Accuracy | [Inception V2](http://arxiv.org/abs/1502.03167)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_v2.py)|[inception_v2_2016_08_28.tar.gz](http://download.tensorflow.org/models/inception_v2_2016_08_28.tar.gz)|73.9|91.8| [Inception V3](http://arxiv.org/abs/1512.00567)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_v3.py)|[inception_v3_2016_08_28.tar.gz](http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz)|78.0|93.9| [Inception V4](http://arxiv.org/abs/1602.07261)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_v4.py)|[inception_v4_2016_09_09.tar.gz](http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz)|80.2|95.2| -[Inception-ResNet-v2](http://arxiv.org/abs/1602.07261)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_resnet_v2.py)|[inception_resnet_v2.tar.gz](http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz)|80.4|95.3| -[ResNet 50](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_50.tar.gz](http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz)|75.2|92.2| -[ResNet 101](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_101.tar.gz](http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz)|76.4|92.9| -[ResNet 152](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_152.tar.gz](http://download.tensorflow.org/models/resnet_v1_152_2016_08_28.tar.gz)|76.8|93.2| -[VGG 16](http://arxiv.org/abs/1409.1556.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/vgg.py)|[vgg_16.tar.gz](http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz)|71.5|89.8| -[VGG 19](http://arxiv.org/abs/1409.1556.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/vgg.py)|[vgg_19.tar.gz](http://download.tensorflow.org/models/vgg_19_2016_08_28.tar.gz)|71.1|89.8| - +[Inception-ResNet-v2](http://arxiv.org/abs/1602.07261)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_resnet_v2.py)|[inception_resnet_v2_2016_08_30.tar.gz](http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz)|80.4|95.3| +[ResNet V1 
50](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_50_2016_08_28.tar.gz](http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz)|75.2|92.2|
+[ResNet V1 101](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_101_2016_08_28.tar.gz](http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz)|76.4|92.9|
+[ResNet V1 152](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_152_2016_08_28.tar.gz](http://download.tensorflow.org/models/resnet_v1_152_2016_08_28.tar.gz)|76.8|93.2|
+[ResNet V2 50](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_50_2017_04_14.tar.gz](http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz)|75.6|92.8|
+[ResNet V2 101](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_101_2017_04_14.tar.gz](http://download.tensorflow.org/models/resnet_v2_101_2017_04_14.tar.gz)|77.0|93.7|
+[ResNet V2 152](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_152_2017_04_14.tar.gz](http://download.tensorflow.org/models/resnet_v2_152_2017_04_14.tar.gz)|77.8|94.1|
+[ResNet V2 200](https://arxiv.org/abs/1603.05027)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[TBA]()|79.9\*|95.2\*|
+[VGG 16](http://arxiv.org/abs/1409.1556.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/vgg.py)|[vgg_16_2016_08_28.tar.gz](http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz)|71.5|89.8|
+[VGG 19](http://arxiv.org/abs/1409.1556.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/vgg.py)|[vgg_19_2016_08_28.tar.gz](http://download.tensorflow.org/models/vgg_19_2016_08_28.tar.gz)|71.1|89.8|
+[MobileNet_v1_1.0_224](https://arxiv.org/pdf/1704.04861.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.py)|[mobilenet_v1_1.0_224_2017_06_14.tar.gz](http://download.tensorflow.org/models/mobilenet_v1_1.0_224_2017_06_14.tar.gz)|70.7|89.5|
+[MobileNet_v1_0.50_160](https://arxiv.org/pdf/1704.04861.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.py)|[mobilenet_v1_0.50_160_2017_06_14.tar.gz](http://download.tensorflow.org/models/mobilenet_v1_0.50_160_2017_06_14.tar.gz)|59.9|82.5|
+[MobileNet_v1_0.25_128](https://arxiv.org/pdf/1704.04861.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.py)|[mobilenet_v1_0.25_128_2017_06_14.tar.gz](http://download.tensorflow.org/models/mobilenet_v1_0.25_128_2017_06_14.tar.gz)|41.3|66.2|
+
+^ ResNet V2 models use Inception pre-processing and an input image size of 299 (use
+`--preprocessing_name inception --eval_image_size 299` when using
+`eval_image_classifier.py`). Performance numbers for ResNet V2 models are
+reported on the ImageNet validation set.
+
+All 16 MobileNet models reported in the [MobileNet paper](https://arxiv.org/abs/1704.04861) can be found [here](https://github.com/tensorflow/models/tree/master/slim/nets/mobilenet_v1.md).
+
+(\*): Results quoted from the [paper](https://arxiv.org/abs/1603.05027).
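To make the checkpoint table concrete, here is a minimal sketch of restoring one of the listed checkpoints with TF-Slim. It assumes the TF 1.x `tf.contrib.slim` API and the `nets_factory` module in this directory; the checkpoint path matches the freeze_graph example later in this README, and the snippet is illustrative rather than part of the library. For the ResNet V2 rows, note the ^ footnote: build the graph at 299x299 and use Inception preprocessing.

```python
import tensorflow as tf

from nets import nets_factory  # from this slim/ directory

slim = tf.contrib.slim

# Build Inception V3 (1001 classes: ImageNet plus a background class).
network_fn = nets_factory.get_network_fn(
    'inception_v3', num_classes=1001, is_training=False)
image_size = getattr(network_fn, 'default_image_size', 299)

images = tf.placeholder(tf.float32, [None, image_size, image_size, 3])
logits, end_points = network_fn(images)

# Restore the variables from the downloaded checkpoint (path is an example).
init_fn = slim.assign_from_checkpoint_fn(
    '/tmp/checkpoints/inception_v3.ckpt',
    slim.get_model_variables('InceptionV3'))

with tf.Session() as sess:
  init_fn(sess)
  # probabilities = sess.run(end_points['Predictions'], {images: batch})
```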
Here is an example of how to download the Inception V3 checkpoint: @@ -316,8 +333,72 @@ $ python eval_image_classifier.py \ ``` +# Exporting the Inference Graph + + +Saves out a GraphDef containing the architecture of the model. + +To use it with a model name defined by slim, run: + +```shell +$ python export_inference_graph.py \ + --alsologtostderr \ + --model_name=inception_v3 \ + --output_file=/tmp/inception_v3_inf_graph.pb + +$ python export_inference_graph.py \ + --alsologtostderr \ + --model_name=mobilenet_v1 \ + --image_size=224 \ + --output_file=/tmp/mobilenet_v1_224.pb +``` + +## Freezing the exported Graph +If you then want to use the resulting model with your own or pretrained +checkpoints as part of a mobile model, you can run freeze_graph to get a graph +def with the variables inlined as constants using: + +```shell +bazel build tensorflow/python/tools:freeze_graph + +bazel-bin/tensorflow/python/tools/freeze_graph \ + --input_graph=/tmp/inception_v3_inf_graph.pb \ + --input_checkpoint=/tmp/checkpoints/inception_v3.ckpt \ + --input_binary=true --output_graph=/tmp/frozen_inception_v3.pb \ + --output_node_names=InceptionV3/Predictions/Reshape_1 +``` + +The output node names will vary depending on the model, but you can inspect and +estimate them using the summarize_graph tool: + +```shell +bazel build tensorflow/tools/graph_transforms:summarize_graph + +bazel-bin/tensorflow/tools/graph_transforms/summarize_graph \ + --in_graph=/tmp/inception_v3_inf_graph.pb +``` + +## Run label image in C++ + +To run the resulting graph in C++, you can look at the label_image sample code: + +```shell +bazel build tensorflow/examples/label_image:label_image + +bazel-bin/tensorflow/examples/label_image/label_image \ + --image=${HOME}/Pictures/flowers.jpg \ + --input_layer=input \ + --output_layer=InceptionV3/Predictions/Reshape_1 \ + --graph=/tmp/frozen_inception_v3.pb \ + --labels=/tmp/imagenet_slim_labels.txt \ + --input_mean=0 \ + --input_std=255 \ + --logtostderr +``` + # Troubleshooting + #### The model runs out of CPU memory. @@ -344,10 +425,10 @@ following error: ```bash InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1001] rhs shape= [1000] ``` -This is due to the fact that the VGG and ResNet final layers have only 1000 +This is due to the fact that the VGG and ResNet V1 final layers have only 1000 outputs rather than 1001. -To fix this issue, you can set the `--labels_offsets=1` flag. This results in +To fix this issue, you can set the `--labels_offset=1` flag. This results in the ImageNet labels being shifted down by one: @@ -368,4 +449,3 @@ image_preprocessing_fn = preprocessing_factory.get_preprocessing( See [Hardware Specifications](https://github.com/tensorflow/models/tree/master/inception#what-hardware-specification-are-these-hyper-parameters-targeted-for). - diff --git a/slim/WORKSPACE b/slim/WORKSPACE new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/slim/datasets/dataset_utils.py b/slim/datasets/dataset_utils.py index 9c79aadfbcd7c0c741837f818a97b7a154c3b040..6f7a1c207b58be97a1b974980989b3039ffc04ef 100644 --- a/slim/datasets/dataset_utils.py +++ b/slim/datasets/dataset_utils.py @@ -124,7 +124,7 @@ def read_label_file(dataset_dir, filename=LABELS_FILENAME): A map from a label (integer) to class name. 
""" labels_filename = os.path.join(dataset_dir, filename) - with tf.gfile.Open(labels_filename, 'r') as f: + with tf.gfile.Open(labels_filename, 'rb') as f: lines = f.read().decode() lines = lines.split('\n') lines = filter(None, lines) diff --git a/slim/datasets/download_and_convert_cifar10.py b/slim/datasets/download_and_convert_cifar10.py index 2cb787d08effdcc35dd0a2c9ff18c4800d0cf425..0e0abe3c06657c1383f5c71162be98eed3ff10b9 100644 --- a/slim/datasets/download_and_convert_cifar10.py +++ b/slim/datasets/download_and_convert_cifar10.py @@ -26,7 +26,7 @@ from __future__ import absolute_import from __future__ import division from __future__ import print_function -import cPickle +from six.moves import cPickle import os import sys import tarfile @@ -72,14 +72,17 @@ def _add_to_tfrecord(filename, tfrecord_writer, offset=0): Returns: The new offset. """ - with tf.gfile.Open(filename, 'r') as f: - data = cPickle.load(f) + with tf.gfile.Open(filename, 'rb') as f: + if sys.version_info < (3,): + data = cPickle.load(f) + else: + data = cPickle.load(f, encoding='bytes') - images = data['data'] + images = data[b'data'] num_images = images.shape[0] images = images.reshape((num_images, 3, 32, 32)) - labels = data['labels'] + labels = data[b'labels'] with tf.Graph().as_default(): image_placeholder = tf.placeholder(dtype=tf.uint8) @@ -99,7 +102,7 @@ def _add_to_tfrecord(filename, tfrecord_writer, offset=0): feed_dict={image_placeholder: image}) example = dataset_utils.image_to_tfexample( - png_string, 'png', _IMAGE_SIZE, _IMAGE_SIZE, label) + png_string, b'png', _IMAGE_SIZE, _IMAGE_SIZE, label) tfrecord_writer.write(example.SerializeToString()) return offset + num_images diff --git a/slim/datasets/download_and_convert_flowers.py b/slim/datasets/download_and_convert_flowers.py index 347a4df29efc6a4571d25960769bb8661a122907..2c11ead410c91b21bce609bc870ff6263d842f1f 100644 --- a/slim/datasets/download_and_convert_flowers.py +++ b/slim/datasets/download_and_convert_flowers.py @@ -136,14 +136,14 @@ def _convert_dataset(split_name, filenames, class_names_to_ids, dataset_dir): sys.stdout.flush() # Read the filename: - image_data = tf.gfile.FastGFile(filenames[i], 'r').read() + image_data = tf.gfile.FastGFile(filenames[i], 'rb').read() height, width = image_reader.read_image_dims(sess, image_data) class_name = os.path.basename(os.path.dirname(filenames[i])) class_id = class_names_to_ids[class_name] example = dataset_utils.image_to_tfexample( - image_data, 'jpg', height, width, class_id) + image_data, b'jpg', height, width, class_id) tfrecord_writer.write(example.SerializeToString()) sys.stdout.write('\n') diff --git a/slim/deployment/model_deploy.py b/slim/deployment/model_deploy.py index 8855f2aee3f67bebd097d8f440d676001e19f1fe..c6820769dcb60a70436b706c176749e2a64d706b 100644 --- a/slim/deployment/model_deploy.py +++ b/slim/deployment/model_deploy.py @@ -103,8 +103,6 @@ import collections import tensorflow as tf -from tensorflow.python.ops import control_flow_ops - slim = tf.contrib.slim @@ -378,8 +376,8 @@ def deploy(config, update_ops.append(grad_updates) update_op = tf.group(*update_ops) - train_op = control_flow_ops.with_dependencies([update_op], total_loss, - name='train_op') + with tf.control_dependencies([update_op]): + train_op = tf.identity(total_loss, name='train_op') else: clones_losses = [] regularization_losses = tf.get_collection( @@ -594,8 +592,7 @@ class DeploymentConfig(object): if self._clone_on_cpu: device += '/device:CPU:0' else: - if self._num_clones > 1: - device += '/device:GPU:%d' % 
clone_index + device += '/device:GPU:%d' % clone_index return device def clone_scope(self, clone_index): @@ -663,7 +660,7 @@ class DeploymentConfig(object): if op.device: return op.device node_def = op if isinstance(op, tf.NodeDef) else op.node_def - if node_def.op == 'Variable': + if node_def.op.startswith('Variable'): t = self._task self._task = (self._task + 1) % self._tasks d = '%s/task:%d' % (self._device, t) diff --git a/slim/deployment/model_deploy_test.py b/slim/deployment/model_deploy_test.py index 57951db9616bd9334deb557f2a1cf0aac55448cc..48982eda7bafb162e05891ef1618b04b0c28e06a 100644 --- a/slim/deployment/model_deploy_test.py +++ b/slim/deployment/model_deploy_test.py @@ -33,7 +33,7 @@ class DeploymentConfigTest(tf.test.TestCase): self.assertEqual(slim.get_variables(), []) self.assertEqual(deploy_config.caching_device(), None) - self.assertDeviceEqual(deploy_config.clone_device(0), '') + self.assertDeviceEqual(deploy_config.clone_device(0), 'GPU:0') self.assertEqual(deploy_config.clone_scope(0), '') self.assertDeviceEqual(deploy_config.optimizer_device(), 'CPU:0') self.assertDeviceEqual(deploy_config.inputs_device(), 'CPU:0') @@ -65,7 +65,7 @@ class DeploymentConfigTest(tf.test.TestCase): deploy_config = model_deploy.DeploymentConfig(num_clones=1, num_ps_tasks=1) self.assertDeviceEqual(deploy_config.clone_device(0), - '/job:worker') + '/job:worker/device:GPU:0') self.assertEqual(deploy_config.clone_scope(0), '') self.assertDeviceEqual(deploy_config.optimizer_device(), '/job:worker/device:CPU:0') @@ -105,7 +105,7 @@ class DeploymentConfigTest(tf.test.TestCase): num_ps_tasks=2) self.assertDeviceEqual(deploy_config.clone_device(0), - '/job:worker') + '/job:worker/device:GPU:0') self.assertEqual(deploy_config.clone_scope(0), '') self.assertDeviceEqual(deploy_config.optimizer_device(), '/job:worker/device:CPU:0') @@ -201,7 +201,7 @@ class CreatecloneTest(tf.test.TestCase): self.assertEqual(clone.outputs.op.name, 'LogisticClassifier/fully_connected/Sigmoid') self.assertEqual(clone.scope, '') - self.assertDeviceEqual(clone.device, '') + self.assertDeviceEqual(clone.device, 'GPU:0') self.assertEqual(len(slim.losses.get_losses()), 1) update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) self.assertEqual(update_ops, []) @@ -227,7 +227,7 @@ class CreatecloneTest(tf.test.TestCase): self.assertEqual(clone.outputs.op.name, 'BatchNormClassifier/fully_connected/Sigmoid') self.assertEqual(clone.scope, '') - self.assertDeviceEqual(clone.device, '') + self.assertDeviceEqual(clone.device, 'GPU:0') self.assertEqual(len(slim.losses.get_losses()), 1) update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) self.assertEqual(len(update_ops), 2) @@ -278,7 +278,7 @@ class CreatecloneTest(tf.test.TestCase): clone = clones[0] self.assertEqual(clone.outputs.op.name, 'BatchNormClassifier/fully_connected/Sigmoid') - self.assertDeviceEqual(clone.device, '/job:worker') + self.assertDeviceEqual(clone.device, '/job:worker/device:GPU:0') self.assertEqual(clone.scope, '') self.assertEqual(len(slim.get_variables()), 5) for v in slim.get_variables(): @@ -350,7 +350,7 @@ class OptimizeclonesTest(tf.test.TestCase): self.assertEqual(len(grads_and_vars), len(tf.trainable_variables())) self.assertEqual(total_loss.op.name, 'total_loss') for g, v in grads_and_vars: - self.assertDeviceEqual(g.device, '') + self.assertDeviceEqual(g.device, 'GPU:0') self.assertDeviceEqual(v.device, 'CPU:0') def testCreateSingleclone(self): @@ -376,7 +376,7 @@ class OptimizeclonesTest(tf.test.TestCase): self.assertEqual(len(grads_and_vars), 
len(tf.trainable_variables())) self.assertEqual(total_loss.op.name, 'total_loss') for g, v in grads_and_vars: - self.assertDeviceEqual(g.device, '') + self.assertDeviceEqual(g.device, 'GPU:0') self.assertDeviceEqual(v.device, 'CPU:0') def testCreateMulticlone(self): @@ -458,7 +458,7 @@ class OptimizeclonesTest(tf.test.TestCase): self.assertEqual(len(grads_and_vars), len(tf.trainable_variables())) self.assertEqual(total_loss.op.name, 'total_loss') for g, v in grads_and_vars: - self.assertDeviceEqual(g.device, '/job:worker') + self.assertDeviceEqual(g.device, '/job:worker/device:GPU:0') self.assertDeviceEqual(v.device, '/job:ps/task:0/CPU:0') @@ -515,7 +515,7 @@ class DeployTest(tf.test.TestCase): for _ in range(10): sess.run(model.train_op) final_loss = sess.run(model.total_loss) - self.assertLess(final_loss, initial_loss / 10.0) + self.assertLess(final_loss, initial_loss / 5.0) final_mean, final_variance = sess.run([moving_mean, moving_variance]) diff --git a/slim/eval_image_classifier.py b/slim/eval_image_classifier.py index 6a759416651255b9eac564b6c52f8ffb78f31ce7..82d10d91cfbefb8179c847123ee6db24b0e54e43 100644 --- a/slim/eval_image_classifier.py +++ b/slim/eval_image_classifier.py @@ -158,7 +158,7 @@ def main(_): }) # Print the summaries to screen. - for name, value in names_to_values.iteritems(): + for name, value in names_to_values.items(): summary_name = 'eval/%s' % name op = tf.summary.scalar(summary_name, value, collections=[]) op = tf.Print(op, [value], summary_name) diff --git a/slim/export_inference_graph.py b/slim/export_inference_graph.py new file mode 100644 index 0000000000000000000000000000000000000000..13f10ce003caa16feec69587558f8355bb4f4d94 --- /dev/null +++ b/slim/export_inference_graph.py @@ -0,0 +1,122 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +r"""Saves out a GraphDef containing the architecture of the model. 
+ +To use it, run something like this, with a model name defined by slim: + +bazel build tensorflow_models/slim:export_inference_graph +bazel-bin/tensorflow_models/slim/export_inference_graph \ +--model_name=inception_v3 --output_file=/tmp/inception_v3_inf_graph.pb + +If you then want to use the resulting model with your own or pretrained +checkpoints as part of a mobile model, you can run freeze_graph to get a graph +def with the variables inlined as constants using: + +bazel build tensorflow/python/tools:freeze_graph +bazel-bin/tensorflow/python/tools/freeze_graph \ +--input_graph=/tmp/inception_v3_inf_graph.pb \ +--input_checkpoint=/tmp/checkpoints/inception_v3.ckpt \ +--input_binary=true --output_graph=/tmp/frozen_inception_v3.pb \ +--output_node_names=InceptionV3/Predictions/Reshape_1 + +The output node names will vary depending on the model, but you can inspect and +estimate them using the summarize_graph tool: + +bazel build tensorflow/tools/graph_transforms:summarize_graph +bazel-bin/tensorflow/tools/graph_transforms/summarize_graph \ +--in_graph=/tmp/inception_v3_inf_graph.pb + +To run the resulting graph in C++, you can look at the label_image sample code: + +bazel build tensorflow/examples/label_image:label_image +bazel-bin/tensorflow/examples/label_image/label_image \ +--image=${HOME}/Pictures/flowers.jpg \ +--input_layer=input \ +--output_layer=InceptionV3/Predictions/Reshape_1 \ +--graph=/tmp/frozen_inception_v3.pb \ +--labels=/tmp/imagenet_slim_labels.txt \ +--input_mean=0 \ +--input_std=255 \ +--logtostderr + +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import tensorflow as tf + +from tensorflow.python.platform import gfile +from datasets import dataset_factory +from nets import nets_factory + + +slim = tf.contrib.slim + +tf.app.flags.DEFINE_string( + 'model_name', 'inception_v3', 'The name of the architecture to save.') + +tf.app.flags.DEFINE_boolean( + 'is_training', False, + 'Whether to save out a training-focused version of the model.') + +tf.app.flags.DEFINE_integer( + 'default_image_size', 224, + 'The image size to use if the model does not define it.') + +tf.app.flags.DEFINE_string('dataset_name', 'imagenet', + 'The name of the dataset to use with the model.') + +tf.app.flags.DEFINE_integer( + 'labels_offset', 0, + 'An offset for the labels in the dataset. 
This flag is primarily used to ' + 'evaluate the VGG and ResNet architectures which do not use a background ' + 'class for the ImageNet dataset.') + +tf.app.flags.DEFINE_string( + 'output_file', '', 'Where to save the resulting file to.') + +tf.app.flags.DEFINE_string( + 'dataset_dir', '', 'Directory to save intermediate dataset files to') + +FLAGS = tf.app.flags.FLAGS + + +def main(_): + if not FLAGS.output_file: + raise ValueError('You must supply the path to save to with --output_file') + tf.logging.set_verbosity(tf.logging.INFO) + with tf.Graph().as_default() as graph: + dataset = dataset_factory.get_dataset(FLAGS.dataset_name, 'validation', + FLAGS.dataset_dir) + network_fn = nets_factory.get_network_fn( + FLAGS.model_name, + num_classes=(dataset.num_classes - FLAGS.labels_offset), + is_training=FLAGS.is_training) + if hasattr(network_fn, 'default_image_size'): + image_size = network_fn.default_image_size + else: + image_size = FLAGS.default_image_size + placeholder = tf.placeholder(name='input', dtype=tf.float32, + shape=[1, image_size, image_size, 3]) + network_fn(placeholder) + graph_def = graph.as_graph_def() + with gfile.GFile(FLAGS.output_file, 'wb') as f: + f.write(graph_def.SerializeToString()) + + +if __name__ == '__main__': + tf.app.run() diff --git a/slim/export_inference_graph_test.py b/slim/export_inference_graph_test.py new file mode 100644 index 0000000000000000000000000000000000000000..a730e67e583316d031af9d0df0cfbc0ec6b48983 --- /dev/null +++ b/slim/export_inference_graph_test.py @@ -0,0 +1,44 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for export_inference_graph.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os + + +import tensorflow as tf + +from tensorflow.python.platform import gfile +from google3.third_party.tensorflow_models.slim import export_inference_graph + + +class ExportInferenceGraphTest(tf.test.TestCase): + + def testExportInferenceGraph(self): + tmpdir = self.get_temp_dir() + output_file = os.path.join(tmpdir, 'inception_v3.pb') + flags = tf.app.flags.FLAGS + flags.output_file = output_file + flags.model_name = 'inception_v3' + flags.dataset_dir = tmpdir + export_inference_graph.main(None) + self.assertTrue(gfile.Exists(output_file)) + +if __name__ == '__main__': + tf.test.main() diff --git a/slim/nets/inception.py b/slim/nets/inception.py index 806c30bee2a5530445f250724261cd750a9900f5..b69cd2aacbea6dcea849c4e4b39accaa05bd264a 100644 --- a/slim/nets/inception.py +++ b/slim/nets/inception.py @@ -21,6 +21,7 @@ from __future__ import print_function # pylint: disable=unused-import from nets.inception_resnet_v2 import inception_resnet_v2 from nets.inception_resnet_v2 import inception_resnet_v2_arg_scope +from nets.inception_resnet_v2 import inception_resnet_v2_base from nets.inception_v1 import inception_v1 from nets.inception_v1 import inception_v1_arg_scope from nets.inception_v1 import inception_v1_base diff --git a/slim/nets/inception_resnet_v2.py b/slim/nets/inception_resnet_v2.py index b5a54c5b6186c8e9357e478d2d0faf22e6cf979b..ec8387a33c6226c76758031f403debb80447e9c8 100644 --- a/slim/nets/inception_resnet_v2.py +++ b/slim/nets/inception_resnet_v2.py @@ -91,10 +91,187 @@ def block8(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None): return net +def inception_resnet_v2_base(inputs, + final_endpoint='Conv2d_7b_1x1', + output_stride=16, + align_feature_maps=False, + scope=None): + """Inception model from http://arxiv.org/abs/1602.07261. + + Constructs an Inception Resnet v2 network from inputs to the given final + endpoint. This method can construct the network up to the final inception + block Conv2d_7b_1x1. + + Args: + inputs: a tensor of size [batch_size, height, width, channels]. + final_endpoint: specifies the endpoint to construct the network up to. It + can be one of ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3', + 'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3', 'MaxPool_5a_3x3', + 'Mixed_5b', 'Mixed_6a', 'PreAuxLogits', 'Mixed_7a', 'Conv2d_7b_1x1'] + output_stride: A scalar that specifies the requested ratio of input to + output spatial resolution. Only supports 8 and 16. + align_feature_maps: When true, changes all the VALID paddings in the network + to SAME padding so that the feature maps are aligned. + scope: Optional variable_scope. + + Returns: + tensor_out: output tensor corresponding to the final_endpoint. + end_points: a set of activations for external use, for example summaries or + losses. + + Raises: + ValueError: if final_endpoint is not set to one of the predefined values, + or if the output_stride is not 8 or 16, or if the output_stride is 8 and + we request an end point after 'PreAuxLogits'. 
+ """ + if output_stride != 8 and output_stride != 16: + raise ValueError('output_stride must be 8 or 16.') + + padding = 'SAME' if align_feature_maps else 'VALID' + + end_points = {} + + def add_and_check_final(name, net): + end_points[name] = net + return name == final_endpoint + + with tf.variable_scope(scope, 'InceptionResnetV2', [inputs]): + with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], + stride=1, padding='SAME'): + # 149 x 149 x 32 + net = slim.conv2d(inputs, 32, 3, stride=2, padding=padding, + scope='Conv2d_1a_3x3') + if add_and_check_final('Conv2d_1a_3x3', net): return net, end_points + + # 147 x 147 x 32 + net = slim.conv2d(net, 32, 3, padding=padding, + scope='Conv2d_2a_3x3') + if add_and_check_final('Conv2d_2a_3x3', net): return net, end_points + # 147 x 147 x 64 + net = slim.conv2d(net, 64, 3, scope='Conv2d_2b_3x3') + if add_and_check_final('Conv2d_2b_3x3', net): return net, end_points + # 73 x 73 x 64 + net = slim.max_pool2d(net, 3, stride=2, padding=padding, + scope='MaxPool_3a_3x3') + if add_and_check_final('MaxPool_3a_3x3', net): return net, end_points + # 73 x 73 x 80 + net = slim.conv2d(net, 80, 1, padding=padding, + scope='Conv2d_3b_1x1') + if add_and_check_final('Conv2d_3b_1x1', net): return net, end_points + # 71 x 71 x 192 + net = slim.conv2d(net, 192, 3, padding=padding, + scope='Conv2d_4a_3x3') + if add_and_check_final('Conv2d_4a_3x3', net): return net, end_points + # 35 x 35 x 192 + net = slim.max_pool2d(net, 3, stride=2, padding=padding, + scope='MaxPool_5a_3x3') + if add_and_check_final('MaxPool_5a_3x3', net): return net, end_points + + # 35 x 35 x 320 + with tf.variable_scope('Mixed_5b'): + with tf.variable_scope('Branch_0'): + tower_conv = slim.conv2d(net, 96, 1, scope='Conv2d_1x1') + with tf.variable_scope('Branch_1'): + tower_conv1_0 = slim.conv2d(net, 48, 1, scope='Conv2d_0a_1x1') + tower_conv1_1 = slim.conv2d(tower_conv1_0, 64, 5, + scope='Conv2d_0b_5x5') + with tf.variable_scope('Branch_2'): + tower_conv2_0 = slim.conv2d(net, 64, 1, scope='Conv2d_0a_1x1') + tower_conv2_1 = slim.conv2d(tower_conv2_0, 96, 3, + scope='Conv2d_0b_3x3') + tower_conv2_2 = slim.conv2d(tower_conv2_1, 96, 3, + scope='Conv2d_0c_3x3') + with tf.variable_scope('Branch_3'): + tower_pool = slim.avg_pool2d(net, 3, stride=1, padding='SAME', + scope='AvgPool_0a_3x3') + tower_pool_1 = slim.conv2d(tower_pool, 64, 1, + scope='Conv2d_0b_1x1') + net = tf.concat( + [tower_conv, tower_conv1_1, tower_conv2_2, tower_pool_1], 3) + + if add_and_check_final('Mixed_5b', net): return net, end_points + # TODO(alemi): Register intermediate endpoints + net = slim.repeat(net, 10, block35, scale=0.17) + + # 17 x 17 x 1088 if output_stride == 8, + # 33 x 33 x 1088 if output_stride == 16 + use_atrous = output_stride == 8 + + with tf.variable_scope('Mixed_6a'): + with tf.variable_scope('Branch_0'): + tower_conv = slim.conv2d(net, 384, 3, stride=1 if use_atrous else 2, + padding=padding, + scope='Conv2d_1a_3x3') + with tf.variable_scope('Branch_1'): + tower_conv1_0 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') + tower_conv1_1 = slim.conv2d(tower_conv1_0, 256, 3, + scope='Conv2d_0b_3x3') + tower_conv1_2 = slim.conv2d(tower_conv1_1, 384, 3, + stride=1 if use_atrous else 2, + padding=padding, + scope='Conv2d_1a_3x3') + with tf.variable_scope('Branch_2'): + tower_pool = slim.max_pool2d(net, 3, stride=1 if use_atrous else 2, + padding=padding, + scope='MaxPool_1a_3x3') + net = tf.concat([tower_conv, tower_conv1_2, tower_pool], 3) + + if add_and_check_final('Mixed_6a', net): return net, 
end_points + + # TODO(alemi): register intermediate endpoints + with slim.arg_scope([slim.conv2d], rate=2 if use_atrous else 1): + net = slim.repeat(net, 20, block17, scale=0.10) + if add_and_check_final('PreAuxLogits', net): return net, end_points + + if output_stride == 8: + # TODO(gpapan): Properly support output_stride for the rest of the net. + raise ValueError('output_stride==8 is only supported up to the ' + 'PreAuxlogits end_point for now.') + + # 8 x 8 x 2080 + with tf.variable_scope('Mixed_7a'): + with tf.variable_scope('Branch_0'): + tower_conv = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') + tower_conv_1 = slim.conv2d(tower_conv, 384, 3, stride=2, + padding=padding, + scope='Conv2d_1a_3x3') + with tf.variable_scope('Branch_1'): + tower_conv1 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') + tower_conv1_1 = slim.conv2d(tower_conv1, 288, 3, stride=2, + padding=padding, + scope='Conv2d_1a_3x3') + with tf.variable_scope('Branch_2'): + tower_conv2 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') + tower_conv2_1 = slim.conv2d(tower_conv2, 288, 3, + scope='Conv2d_0b_3x3') + tower_conv2_2 = slim.conv2d(tower_conv2_1, 320, 3, stride=2, + padding=padding, + scope='Conv2d_1a_3x3') + with tf.variable_scope('Branch_3'): + tower_pool = slim.max_pool2d(net, 3, stride=2, + padding=padding, + scope='MaxPool_1a_3x3') + net = tf.concat( + [tower_conv_1, tower_conv1_1, tower_conv2_2, tower_pool], 3) + + if add_and_check_final('Mixed_7a', net): return net, end_points + + # TODO(alemi): register intermediate endpoints + net = slim.repeat(net, 9, block8, scale=0.20) + net = block8(net, activation_fn=None) + + # 8 x 8 x 1536 + net = slim.conv2d(net, 1536, 1, scope='Conv2d_7b_1x1') + if add_and_check_final('Conv2d_7b_1x1', net): return net, end_points + + raise ValueError('final_endpoint (%s) not recognized', final_endpoint) + + def inception_resnet_v2(inputs, num_classes=1001, is_training=True, dropout_keep_prob=0.8, reuse=None, - scope='InceptionResnetV2'): + scope='InceptionResnetV2', + create_aux_logits=True): """Creates the Inception Resnet V2 model. Args: @@ -105,6 +282,7 @@ def inception_resnet_v2(inputs, num_classes=1001, is_training=True, reuse: whether or not the network and its variables should be reused. To be able to reuse 'scope' must be given. scope: Optional variable_scope. + create_aux_logits: Whether to include the auxilliary logits. Returns: logits: the logits outputs of the model. 
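Before the rewritten classifier body below, a hedged sketch of how the refactored API fits together; the example itself is illustrative, but every name it uses is introduced by this patch:

```python
# Sketch: the classifier is now a thin head on top of inception_resnet_v2_base.
import tensorflow as tf

from nets import inception

images = tf.random_uniform((5, 299, 299, 3))

# Default build: end_points exposes 'AuxLogits' and the pre-pool activation
# under 'Conv2d_7b_1x1' (the former 'PrePool' key is gone; see the test diff).
logits, end_points = inception.inception_resnet_v2(images, num_classes=1001)
aux_logits = end_points['AuxLogits']    # [5, 1001]
pre_pool = end_points['Conv2d_7b_1x1']  # [5, 8, 8, 1536]

# Leaner build without the auxiliary head, e.g. for inference-only graphs.
# A distinct scope keeps the two towers from colliding in the same graph.
logits_no_aux, end_points_no_aux = inception.inception_resnet_v2(
    images, num_classes=1001, create_aux_logits=False,
    scope='InceptionResnetV2NoAux')
assert 'AuxLogits' not in end_points_no_aux
```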
@@ -112,88 +290,17 @@ def inception_resnet_v2(inputs, num_classes=1001, is_training=True, """ end_points = {} - with tf.variable_scope(scope, 'InceptionResnetV2', [inputs], reuse=reuse): + with tf.variable_scope(scope, 'InceptionResnetV2', [inputs, num_classes], + reuse=reuse) as scope: with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training): - with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], - stride=1, padding='SAME'): - - # 149 x 149 x 32 - net = slim.conv2d(inputs, 32, 3, stride=2, padding='VALID', - scope='Conv2d_1a_3x3') - end_points['Conv2d_1a_3x3'] = net - # 147 x 147 x 32 - net = slim.conv2d(net, 32, 3, padding='VALID', - scope='Conv2d_2a_3x3') - end_points['Conv2d_2a_3x3'] = net - # 147 x 147 x 64 - net = slim.conv2d(net, 64, 3, scope='Conv2d_2b_3x3') - end_points['Conv2d_2b_3x3'] = net - # 73 x 73 x 64 - net = slim.max_pool2d(net, 3, stride=2, padding='VALID', - scope='MaxPool_3a_3x3') - end_points['MaxPool_3a_3x3'] = net - # 73 x 73 x 80 - net = slim.conv2d(net, 80, 1, padding='VALID', - scope='Conv2d_3b_1x1') - end_points['Conv2d_3b_1x1'] = net - # 71 x 71 x 192 - net = slim.conv2d(net, 192, 3, padding='VALID', - scope='Conv2d_4a_3x3') - end_points['Conv2d_4a_3x3'] = net - # 35 x 35 x 192 - net = slim.max_pool2d(net, 3, stride=2, padding='VALID', - scope='MaxPool_5a_3x3') - end_points['MaxPool_5a_3x3'] = net - - # 35 x 35 x 320 - with tf.variable_scope('Mixed_5b'): - with tf.variable_scope('Branch_0'): - tower_conv = slim.conv2d(net, 96, 1, scope='Conv2d_1x1') - with tf.variable_scope('Branch_1'): - tower_conv1_0 = slim.conv2d(net, 48, 1, scope='Conv2d_0a_1x1') - tower_conv1_1 = slim.conv2d(tower_conv1_0, 64, 5, - scope='Conv2d_0b_5x5') - with tf.variable_scope('Branch_2'): - tower_conv2_0 = slim.conv2d(net, 64, 1, scope='Conv2d_0a_1x1') - tower_conv2_1 = slim.conv2d(tower_conv2_0, 96, 3, - scope='Conv2d_0b_3x3') - tower_conv2_2 = slim.conv2d(tower_conv2_1, 96, 3, - scope='Conv2d_0c_3x3') - with tf.variable_scope('Branch_3'): - tower_pool = slim.avg_pool2d(net, 3, stride=1, padding='SAME', - scope='AvgPool_0a_3x3') - tower_pool_1 = slim.conv2d(tower_pool, 64, 1, - scope='Conv2d_0b_1x1') - net = tf.concat(axis=3, values=[tower_conv, tower_conv1_1, - tower_conv2_2, tower_pool_1]) - - end_points['Mixed_5b'] = net - net = slim.repeat(net, 10, block35, scale=0.17) - - # 17 x 17 x 1088 - with tf.variable_scope('Mixed_6a'): - with tf.variable_scope('Branch_0'): - tower_conv = slim.conv2d(net, 384, 3, stride=2, padding='VALID', - scope='Conv2d_1a_3x3') - with tf.variable_scope('Branch_1'): - tower_conv1_0 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') - tower_conv1_1 = slim.conv2d(tower_conv1_0, 256, 3, - scope='Conv2d_0b_3x3') - tower_conv1_2 = slim.conv2d(tower_conv1_1, 384, 3, - stride=2, padding='VALID', - scope='Conv2d_1a_3x3') - with tf.variable_scope('Branch_2'): - tower_pool = slim.max_pool2d(net, 3, stride=2, padding='VALID', - scope='MaxPool_1a_3x3') - net = tf.concat(axis=3, values=[tower_conv, tower_conv1_2, tower_pool]) - - end_points['Mixed_6a'] = net - net = slim.repeat(net, 20, block17, scale=0.10) - # Auxiliary tower + net, end_points = inception_resnet_v2_base(inputs, scope=scope) + + if create_aux_logits: with tf.variable_scope('AuxLogits'): - aux = slim.avg_pool2d(net, 5, stride=3, padding='VALID', + aux = end_points['PreAuxLogits'] + aux = slim.avg_pool2d(aux, 5, stride=3, padding='VALID', scope='Conv2d_1a_3x3') aux = slim.conv2d(aux, 128, 1, scope='Conv2d_1b_1x1') aux = slim.conv2d(aux, 768, aux.get_shape()[1:3], @@ 
-203,49 +310,19 @@ def inception_resnet_v2(inputs, num_classes=1001, is_training=True, scope='Logits') end_points['AuxLogits'] = aux - with tf.variable_scope('Mixed_7a'): - with tf.variable_scope('Branch_0'): - tower_conv = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') - tower_conv_1 = slim.conv2d(tower_conv, 384, 3, stride=2, - padding='VALID', scope='Conv2d_1a_3x3') - with tf.variable_scope('Branch_1'): - tower_conv1 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') - tower_conv1_1 = slim.conv2d(tower_conv1, 288, 3, stride=2, - padding='VALID', scope='Conv2d_1a_3x3') - with tf.variable_scope('Branch_2'): - tower_conv2 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') - tower_conv2_1 = slim.conv2d(tower_conv2, 288, 3, - scope='Conv2d_0b_3x3') - tower_conv2_2 = slim.conv2d(tower_conv2_1, 320, 3, stride=2, - padding='VALID', scope='Conv2d_1a_3x3') - with tf.variable_scope('Branch_3'): - tower_pool = slim.max_pool2d(net, 3, stride=2, padding='VALID', - scope='MaxPool_1a_3x3') - net = tf.concat(axis=3, values=[tower_conv_1, tower_conv1_1, - tower_conv2_2, tower_pool]) - - end_points['Mixed_7a'] = net - - net = slim.repeat(net, 9, block8, scale=0.20) - net = block8(net, activation_fn=None) - - net = slim.conv2d(net, 1536, 1, scope='Conv2d_7b_1x1') - end_points['Conv2d_7b_1x1'] = net - - with tf.variable_scope('Logits'): - end_points['PrePool'] = net - net = slim.avg_pool2d(net, net.get_shape()[1:3], padding='VALID', - scope='AvgPool_1a_8x8') - net = slim.flatten(net) - - net = slim.dropout(net, dropout_keep_prob, is_training=is_training, - scope='Dropout') - - end_points['PreLogitsFlatten'] = net - logits = slim.fully_connected(net, num_classes, activation_fn=None, - scope='Logits') - end_points['Logits'] = logits - end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions') + with tf.variable_scope('Logits'): + net = slim.avg_pool2d(net, net.get_shape()[1:3], padding='VALID', + scope='AvgPool_1a_8x8') + net = slim.flatten(net) + + net = slim.dropout(net, dropout_keep_prob, is_training=is_training, + scope='Dropout') + + end_points['PreLogitsFlatten'] = net + logits = slim.fully_connected(net, num_classes, activation_fn=None, + scope='Logits') + end_points['Logits'] = logits + end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions') return logits, end_points inception_resnet_v2.default_image_size = 299 diff --git a/slim/nets/inception_resnet_v2_test.py b/slim/nets/inception_resnet_v2_test.py index b1560fb0102f8aeed01a4baa3e89f57386c08efe..c369ed9f74b1bf14fb2d45b2210df6239d943177 100644 --- a/slim/nets/inception_resnet_v2_test.py +++ b/slim/nets/inception_resnet_v2_test.py @@ -30,7 +30,26 @@ class InceptionTest(tf.test.TestCase): num_classes = 1000 with self.test_session(): inputs = tf.random_uniform((batch_size, height, width, 3)) - logits, _ = inception.inception_resnet_v2(inputs, num_classes) + logits, endpoints = inception.inception_resnet_v2(inputs, num_classes) + self.assertTrue('AuxLogits' in endpoints) + auxlogits = endpoints['AuxLogits'] + self.assertTrue( + auxlogits.op.name.startswith('InceptionResnetV2/AuxLogits')) + self.assertListEqual(auxlogits.get_shape().as_list(), + [batch_size, num_classes]) + self.assertTrue(logits.op.name.startswith('InceptionResnetV2/Logits')) + self.assertListEqual(logits.get_shape().as_list(), + [batch_size, num_classes]) + + def testBuildWithoutAuxLogits(self): + batch_size = 5 + height, width = 299, 299 + num_classes = 1000 + with self.test_session(): + inputs = tf.random_uniform((batch_size, height, width, 3)) + logits, 
endpoints = inception.inception_resnet_v2(inputs, num_classes, + create_aux_logits=False) + self.assertTrue('AuxLogits' not in endpoints) self.assertTrue(logits.op.name.startswith('InceptionResnetV2/Logits')) self.assertListEqual(logits.get_shape().as_list(), [batch_size, num_classes]) @@ -50,10 +69,120 @@ class InceptionTest(tf.test.TestCase): aux_logits = end_points['AuxLogits'] self.assertListEqual(aux_logits.get_shape().as_list(), [batch_size, num_classes]) - pre_pool = end_points['PrePool'] + pre_pool = end_points['Conv2d_7b_1x1'] self.assertListEqual(pre_pool.get_shape().as_list(), [batch_size, 8, 8, 1536]) + def testBuildBaseNetwork(self): + batch_size = 5 + height, width = 299, 299 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + net, end_points = inception.inception_resnet_v2_base(inputs) + self.assertTrue(net.op.name.startswith('InceptionResnetV2/Conv2d_7b_1x1')) + self.assertListEqual(net.get_shape().as_list(), + [batch_size, 8, 8, 1536]) + expected_endpoints = ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3', + 'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3', + 'MaxPool_5a_3x3', 'Mixed_5b', 'Mixed_6a', + 'PreAuxLogits', 'Mixed_7a', 'Conv2d_7b_1x1'] + self.assertItemsEqual(end_points.keys(), expected_endpoints) + + def testBuildOnlyUptoFinalEndpoint(self): + batch_size = 5 + height, width = 299, 299 + endpoints = ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3', + 'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3', + 'MaxPool_5a_3x3', 'Mixed_5b', 'Mixed_6a', + 'PreAuxLogits', 'Mixed_7a', 'Conv2d_7b_1x1'] + for index, endpoint in enumerate(endpoints): + with tf.Graph().as_default(): + inputs = tf.random_uniform((batch_size, height, width, 3)) + out_tensor, end_points = inception.inception_resnet_v2_base( + inputs, final_endpoint=endpoint) + if endpoint != 'PreAuxLogits': + self.assertTrue(out_tensor.op.name.startswith( + 'InceptionResnetV2/' + endpoint)) + self.assertItemsEqual(endpoints[:index+1], end_points) + + def testBuildAndCheckAllEndPointsUptoPreAuxLogits(self): + batch_size = 5 + height, width = 299, 299 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + _, end_points = inception.inception_resnet_v2_base( + inputs, final_endpoint='PreAuxLogits') + endpoints_shapes = {'Conv2d_1a_3x3': [5, 149, 149, 32], + 'Conv2d_2a_3x3': [5, 147, 147, 32], + 'Conv2d_2b_3x3': [5, 147, 147, 64], + 'MaxPool_3a_3x3': [5, 73, 73, 64], + 'Conv2d_3b_1x1': [5, 73, 73, 80], + 'Conv2d_4a_3x3': [5, 71, 71, 192], + 'MaxPool_5a_3x3': [5, 35, 35, 192], + 'Mixed_5b': [5, 35, 35, 320], + 'Mixed_6a': [5, 17, 17, 1088], + 'PreAuxLogits': [5, 17, 17, 1088] + } + + self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys()) + for endpoint_name in endpoints_shapes: + expected_shape = endpoints_shapes[endpoint_name] + self.assertTrue(endpoint_name in end_points) + self.assertListEqual(end_points[endpoint_name].get_shape().as_list(), + expected_shape) + + def testBuildAndCheckAllEndPointsUptoPreAuxLogitsWithAlignedFeatureMaps(self): + batch_size = 5 + height, width = 299, 299 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + _, end_points = inception.inception_resnet_v2_base( + inputs, final_endpoint='PreAuxLogits', align_feature_maps=True) + endpoints_shapes = {'Conv2d_1a_3x3': [5, 150, 150, 32], + 'Conv2d_2a_3x3': [5, 150, 150, 32], + 'Conv2d_2b_3x3': [5, 150, 150, 64], + 'MaxPool_3a_3x3': [5, 75, 75, 64], + 'Conv2d_3b_1x1': [5, 75, 75, 80], + 'Conv2d_4a_3x3': [5, 75, 75, 192], + 'MaxPool_5a_3x3': [5, 38, 38, 192], + 'Mixed_5b': [5, 38, 38, 320], + 
'Mixed_6a': [5, 19, 19, 1088], + 'PreAuxLogits': [5, 19, 19, 1088] + } + + self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys()) + for endpoint_name in endpoints_shapes: + expected_shape = endpoints_shapes[endpoint_name] + self.assertTrue(endpoint_name in end_points) + self.assertListEqual(end_points[endpoint_name].get_shape().as_list(), + expected_shape) + + def testBuildAndCheckAllEndPointsUptoPreAuxLogitsWithOutputStrideEight(self): + batch_size = 5 + height, width = 299, 299 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + _, end_points = inception.inception_resnet_v2_base( + inputs, final_endpoint='PreAuxLogits', output_stride=8) + endpoints_shapes = {'Conv2d_1a_3x3': [5, 149, 149, 32], + 'Conv2d_2a_3x3': [5, 147, 147, 32], + 'Conv2d_2b_3x3': [5, 147, 147, 64], + 'MaxPool_3a_3x3': [5, 73, 73, 64], + 'Conv2d_3b_1x1': [5, 73, 73, 80], + 'Conv2d_4a_3x3': [5, 71, 71, 192], + 'MaxPool_5a_3x3': [5, 35, 35, 192], + 'Mixed_5b': [5, 35, 35, 320], + 'Mixed_6a': [5, 33, 33, 1088], + 'PreAuxLogits': [5, 33, 33, 1088] + } + + self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys()) + for endpoint_name in endpoints_shapes: + expected_shape = endpoints_shapes[endpoint_name] + self.assertTrue(endpoint_name in end_points) + self.assertListEqual(end_points[endpoint_name].get_shape().as_list(), + expected_shape) + def testVariablesSetDevice(self): batch_size = 5 height, width = 299, 299 @@ -80,7 +209,7 @@ class InceptionTest(tf.test.TestCase): self.assertTrue(logits.op.name.startswith('InceptionResnetV2/Logits')) self.assertListEqual(logits.get_shape().as_list(), [batch_size, num_classes]) - pre_pool = end_points['PrePool'] + pre_pool = end_points['Conv2d_7b_1x1'] self.assertListEqual(pre_pool.get_shape().as_list(), [batch_size, 3, 3, 1536]) diff --git a/slim/nets/inception_v4.py b/slim/nets/inception_v4.py index a03e4127dd133e65b273fece26eebe7001aa105f..b4f07ea70edf69ecac94fad26fb949295a41eac0 100644 --- a/slim/nets/inception_v4.py +++ b/slim/nets/inception_v4.py @@ -223,7 +223,7 @@ def inception_v4_base(inputs, final_endpoint='Mixed_7d', scope=None): # 35 x 35 x 384 # 4 x Inception-A blocks - for idx in xrange(4): + for idx in range(4): block_scope = 'Mixed_5' + chr(ord('b') + idx) net = block_inception_a(net, block_scope) if add_and_check_final(block_scope, net): return net, end_points @@ -235,7 +235,7 @@ def inception_v4_base(inputs, final_endpoint='Mixed_7d', scope=None): # 17 x 17 x 1024 # 7 x Inception-B blocks - for idx in xrange(7): + for idx in range(7): block_scope = 'Mixed_6' + chr(ord('b') + idx) net = block_inception_b(net, block_scope) if add_and_check_final(block_scope, net): return net, end_points @@ -247,7 +247,7 @@ def inception_v4_base(inputs, final_endpoint='Mixed_7d', scope=None): # 8 x 8 x 1536 # 3 x Inception-C blocks - for idx in xrange(3): + for idx in range(3): block_scope = 'Mixed_7' + chr(ord('b') + idx) net = block_inception_c(net, block_scope) if add_and_check_final(block_scope, net): return net, end_points diff --git a/slim/nets/mobilenet_v1.md b/slim/nets/mobilenet_v1.md new file mode 100644 index 0000000000000000000000000000000000000000..342f30561156c812785b3f6191d18884a0701656 --- /dev/null +++ b/slim/nets/mobilenet_v1.md @@ -0,0 +1,47 @@ +# MobileNet_v1 + +[MobileNets](https://arxiv.org/abs/1704.04861) are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases. 
They can be built upon for classification, detection, embeddings and segmentation similar to how other popular large scale models, such as Inception, are used. MobileNets can be run efficiently on mobile devices with [TensorFlow Mobile](https://www.tensorflow.org/mobile/). + +MobileNets trade off between latency, size and accuracy while comparing favorably with popular models from the literature. + +![alt text](mobilenet_v1.png "MobileNet Graph") + +# Pre-trained Models + +Choose the right MobileNet model to fit your latency and size budget. The size of the network in memory and on disk is proportional to the number of parameters. The latency and power usage of the network scales with the number of Multiply-Accumulates (MACs) which measures the number of fused Multiplication and Addition operations. These MobileNet models have been trained on the +[ILSVRC-2012-CLS](http://www.image-net.org/challenges/LSVRC/2012/) +image classification dataset. Accuracies were computed by evaluating using a single image crop. + +Model Checkpoint | Million MACs | Million Parameters | Top-1 Accuracy| Top-5 Accuracy | +:----:|:------------:|:----------:|:-------:|:-------:| +[MobileNet_v1_1.0_224](http://download.tensorflow.org/models/mobilenet_v1_1.0_224_2017_06_14.tar.gz)|569|4.24|70.7|89.5| +[MobileNet_v1_1.0_192](http://download.tensorflow.org/models/mobilenet_v1_1.0_192_2017_06_14.tar.gz)|418|4.24|69.3|88.9| +[MobileNet_v1_1.0_160](http://download.tensorflow.org/models/mobilenet_v1_1.0_160_2017_06_14.tar.gz)|291|4.24|67.2|87.5| +[MobileNet_v1_1.0_128](http://download.tensorflow.org/models/mobilenet_v1_1.0_128_2017_06_14.tar.gz)|186|4.24|64.1|85.3| +[MobileNet_v1_0.75_224](http://download.tensorflow.org/models/mobilenet_v1_0.75_224_2017_06_14.tar.gz)|317|2.59|68.4|88.2| +[MobileNet_v1_0.75_192](http://download.tensorflow.org/models/mobilenet_v1_0.75_192_2017_06_14.tar.gz)|233|2.59|67.4|87.3| +[MobileNet_v1_0.75_160](http://download.tensorflow.org/models/mobilenet_v1_0.75_160_2017_06_14.tar.gz)|162|2.59|65.2|86.1| +[MobileNet_v1_0.75_128](http://download.tensorflow.org/models/mobilenet_v1_0.75_128_2017_06_14.tar.gz)|104|2.59|61.8|83.6| +[MobileNet_v1_0.50_224](http://download.tensorflow.org/models/mobilenet_v1_0.50_224_2017_06_14.tar.gz)|150|1.34|64.0|85.4| +[MobileNet_v1_0.50_192](http://download.tensorflow.org/models/mobilenet_v1_0.50_192_2017_06_14.tar.gz)|110|1.34|62.1|84.0| +[MobileNet_v1_0.50_160](http://download.tensorflow.org/models/mobilenet_v1_0.50_160_2017_06_14.tar.gz)|77|1.34|59.9|82.5| +[MobileNet_v1_0.50_128](http://download.tensorflow.org/models/mobilenet_v1_0.50_128_2017_06_14.tar.gz)|49|1.34|56.2|79.6| +[MobileNet_v1_0.25_224](http://download.tensorflow.org/models/mobilenet_v1_0.25_224_2017_06_14.tar.gz)|41|0.47|50.6|75.0| +[MobileNet_v1_0.25_192](http://download.tensorflow.org/models/mobilenet_v1_0.25_192_2017_06_14.tar.gz)|34|0.47|49.0|73.6| +[MobileNet_v1_0.25_160](http://download.tensorflow.org/models/mobilenet_v1_0.25_160_2017_06_14.tar.gz)|21|0.47|46.0|70.7| +[MobileNet_v1_0.25_128](http://download.tensorflow.org/models/mobilenet_v1_0.25_128_2017_06_14.tar.gz)|14|0.47|41.3|66.2| + + +Here is an example of how to download the MobileNet_v1_1.0_224 checkpoint: + +```shell +$ CHECKPOINT_DIR=/tmp/checkpoints +$ mkdir ${CHECKPOINT_DIR} +$ wget http://download.tensorflow.org/models/mobilenet_v1_1.0_224_2017_06_14.tar.gz +$ tar -xvf mobilenet_v1_1.0_224_2017_06_14.tar.gz +$ mv mobilenet_v1_1.0_224.ckpt.* ${CHECKPOINT_DIR} +$ rm mobilenet_v1_1.0_224_2017_06_14.tar.gz +``` +More information on 
integrating MobileNets into your project can be found at the [TF-Slim Image Classification Library](https://github.com/tensorflow/models/blob/master/slim/README.md). + +To get started running models on-device go to [TensorFlow Mobile](https://www.tensorflow.org/mobile/). diff --git a/slim/nets/mobilenet_v1.png b/slim/nets/mobilenet_v1.png new file mode 100644 index 0000000000000000000000000000000000000000..a458345174a12073a653e26d6747914a4e58e516 Binary files /dev/null and b/slim/nets/mobilenet_v1.png differ diff --git a/slim/nets/mobilenet_v1.py b/slim/nets/mobilenet_v1.py new file mode 100644 index 0000000000000000000000000000000000000000..9b25145f8d48093cf5a92a3415699105353aa038 --- /dev/null +++ b/slim/nets/mobilenet_v1.py @@ -0,0 +1,397 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================= +"""MobileNet v1. + +MobileNet is a general architecture and can be used for multiple use cases. +Depending on the use case, it can use different input layer size and different +head (for example: embeddings, localization and classification). + +As described in https://arxiv.org/abs/1704.04861. + + MobileNets: Efficient Convolutional Neural Networks for + Mobile Vision Applications + Andrew G. 
Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, + Tobias Weyand, Marco Andreetto, Hartwig Adam + +100% Mobilenet V1 (base) with input size 224x224: + +Layer params macs +-------------------------------------------------------------------------------- +MobilenetV1/Conv2d_0/Conv2D: 864 10,838,016 +MobilenetV1/Conv2d_1_depthwise/depthwise: 288 3,612,672 +MobilenetV1/Conv2d_1_pointwise/Conv2D: 2,048 25,690,112 +MobilenetV1/Conv2d_2_depthwise/depthwise: 576 1,806,336 +MobilenetV1/Conv2d_2_pointwise/Conv2D: 8,192 25,690,112 +MobilenetV1/Conv2d_3_depthwise/depthwise: 1,152 3,612,672 +MobilenetV1/Conv2d_3_pointwise/Conv2D: 16,384 51,380,224 +MobilenetV1/Conv2d_4_depthwise/depthwise: 1,152 903,168 +MobilenetV1/Conv2d_4_pointwise/Conv2D: 32,768 25,690,112 +MobilenetV1/Conv2d_5_depthwise/depthwise: 2,304 1,806,336 +MobilenetV1/Conv2d_5_pointwise/Conv2D: 65,536 51,380,224 +MobilenetV1/Conv2d_6_depthwise/depthwise: 2,304 451,584 +MobilenetV1/Conv2d_6_pointwise/Conv2D: 131,072 25,690,112 +MobilenetV1/Conv2d_7_depthwise/depthwise: 4,608 903,168 +MobilenetV1/Conv2d_7_pointwise/Conv2D: 262,144 51,380,224 +MobilenetV1/Conv2d_8_depthwise/depthwise: 4,608 903,168 +MobilenetV1/Conv2d_8_pointwise/Conv2D: 262,144 51,380,224 +MobilenetV1/Conv2d_9_depthwise/depthwise: 4,608 903,168 +MobilenetV1/Conv2d_9_pointwise/Conv2D: 262,144 51,380,224 +MobilenetV1/Conv2d_10_depthwise/depthwise: 4,608 903,168 +MobilenetV1/Conv2d_10_pointwise/Conv2D: 262,144 51,380,224 +MobilenetV1/Conv2d_11_depthwise/depthwise: 4,608 903,168 +MobilenetV1/Conv2d_11_pointwise/Conv2D: 262,144 51,380,224 +MobilenetV1/Conv2d_12_depthwise/depthwise: 4,608 225,792 +MobilenetV1/Conv2d_12_pointwise/Conv2D: 524,288 25,690,112 +MobilenetV1/Conv2d_13_depthwise/depthwise: 9,216 451,584 +MobilenetV1/Conv2d_13_pointwise/Conv2D: 1,048,576 51,380,224 +-------------------------------------------------------------------------------- +Total: 3,185,088 567,716,352 + + +75% Mobilenet V1 (base) with input size 128x128: + +Layer params macs +-------------------------------------------------------------------------------- +MobilenetV1/Conv2d_0/Conv2D: 648 2,654,208 +MobilenetV1/Conv2d_1_depthwise/depthwise: 216 884,736 +MobilenetV1/Conv2d_1_pointwise/Conv2D: 1,152 4,718,592 +MobilenetV1/Conv2d_2_depthwise/depthwise: 432 442,368 +MobilenetV1/Conv2d_2_pointwise/Conv2D: 4,608 4,718,592 +MobilenetV1/Conv2d_3_depthwise/depthwise: 864 884,736 +MobilenetV1/Conv2d_3_pointwise/Conv2D: 9,216 9,437,184 +MobilenetV1/Conv2d_4_depthwise/depthwise: 864 221,184 +MobilenetV1/Conv2d_4_pointwise/Conv2D: 18,432 4,718,592 +MobilenetV1/Conv2d_5_depthwise/depthwise: 1,728 442,368 +MobilenetV1/Conv2d_5_pointwise/Conv2D: 36,864 9,437,184 +MobilenetV1/Conv2d_6_depthwise/depthwise: 1,728 110,592 +MobilenetV1/Conv2d_6_pointwise/Conv2D: 73,728 4,718,592 +MobilenetV1/Conv2d_7_depthwise/depthwise: 3,456 221,184 +MobilenetV1/Conv2d_7_pointwise/Conv2D: 147,456 9,437,184 +MobilenetV1/Conv2d_8_depthwise/depthwise: 3,456 221,184 +MobilenetV1/Conv2d_8_pointwise/Conv2D: 147,456 9,437,184 +MobilenetV1/Conv2d_9_depthwise/depthwise: 3,456 221,184 +MobilenetV1/Conv2d_9_pointwise/Conv2D: 147,456 9,437,184 +MobilenetV1/Conv2d_10_depthwise/depthwise: 3,456 221,184 +MobilenetV1/Conv2d_10_pointwise/Conv2D: 147,456 9,437,184 +MobilenetV1/Conv2d_11_depthwise/depthwise: 3,456 221,184 +MobilenetV1/Conv2d_11_pointwise/Conv2D: 147,456 9,437,184 +MobilenetV1/Conv2d_12_depthwise/depthwise: 3,456 55,296 +MobilenetV1/Conv2d_12_pointwise/Conv2D: 294,912 4,718,592 +MobilenetV1/Conv2d_13_depthwise/depthwise: 
6,912 110,592 +MobilenetV1/Conv2d_13_pointwise/Conv2D: 589,824 9,437,184 +-------------------------------------------------------------------------------- +Total: 1,800,144 106,002,432 + +""" + +# Tensorflow mandates these. +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from collections import namedtuple + +import tensorflow as tf + +slim = tf.contrib.slim + +# Conv and DepthSepConv namedtuple define layers of the MobileNet architecture +# Conv defines 3x3 convolution layers +# DepthSepConv defines 3x3 depthwise convolution followed by 1x1 convolution. +# stride is the stride of the convolution +# depth is the number of channels or filters in a layer +Conv = namedtuple('Conv', ['kernel', 'stride', 'depth']) +DepthSepConv = namedtuple('DepthSepConv', ['kernel', 'stride', 'depth']) + +# _CONV_DEFS specifies the MobileNet body +_CONV_DEFS = [ + Conv(kernel=[3, 3], stride=2, depth=32), + DepthSepConv(kernel=[3, 3], stride=1, depth=64), + DepthSepConv(kernel=[3, 3], stride=2, depth=128), + DepthSepConv(kernel=[3, 3], stride=1, depth=128), + DepthSepConv(kernel=[3, 3], stride=2, depth=256), + DepthSepConv(kernel=[3, 3], stride=1, depth=256), + DepthSepConv(kernel=[3, 3], stride=2, depth=512), + DepthSepConv(kernel=[3, 3], stride=1, depth=512), + DepthSepConv(kernel=[3, 3], stride=1, depth=512), + DepthSepConv(kernel=[3, 3], stride=1, depth=512), + DepthSepConv(kernel=[3, 3], stride=1, depth=512), + DepthSepConv(kernel=[3, 3], stride=1, depth=512), + DepthSepConv(kernel=[3, 3], stride=2, depth=1024), + DepthSepConv(kernel=[3, 3], stride=1, depth=1024) +] + + +def mobilenet_v1_base(inputs, + final_endpoint='Conv2d_13_pointwise', + min_depth=8, + depth_multiplier=1.0, + conv_defs=None, + output_stride=None, + scope=None): + """Mobilenet v1. + + Constructs a Mobilenet v1 network from inputs to the given final endpoint. + + Args: + inputs: a tensor of shape [batch_size, height, width, channels]. + final_endpoint: specifies the endpoint to construct the network up to. It + can be one of ['Conv2d_0', 'Conv2d_1_pointwise', 'Conv2d_2_pointwise', + 'Conv2d_3_pointwise', 'Conv2d_4_pointwise', 'Conv2d_5'_pointwise, + 'Conv2d_6_pointwise', 'Conv2d_7_pointwise', 'Conv2d_8_pointwise', + 'Conv2d_9_pointwise', 'Conv2d_10_pointwise', 'Conv2d_11_pointwise', + 'Conv2d_12_pointwise', 'Conv2d_13_pointwise']. + min_depth: Minimum depth value (number of channels) for all convolution ops. + Enforced when depth_multiplier < 1, and not an active constraint when + depth_multiplier >= 1. + depth_multiplier: Float multiplier for the depth (number of channels) + for all convolution ops. The value must be greater than zero. Typical + usage will be to set this value in (0, 1) to reduce the number of + parameters or computation cost of the model. + conv_defs: A list of ConvDef namedtuples specifying the net architecture. + output_stride: An integer that specifies the requested ratio of input to + output spatial resolution. If not None, then we invoke atrous convolution + if necessary to prevent the network from reducing the spatial resolution + of the activation maps. Allowed values are 8 (accurate fully convolutional + mode), 16 (fast fully convolutional mode), 32 (classification mode). + scope: Optional variable_scope. + + Returns: + tensor_out: output tensor corresponding to the final_endpoint. + end_points: a set of activations for external use, for example summaries or + losses. 
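As a reading aid for the arguments documented above, a minimal sketch of calling the base network directly (mirroring the shape tests added later in this patch):

```python
# Sketch: MobileNet trunk as a stride-16 feature extractor with a thinner
# 0.5 width multiplier. Argument names follow the docstring above.
import tensorflow as tf

from nets import mobilenet_v1

images = tf.random_uniform((1, 224, 224, 3))
net, end_points = mobilenet_v1.mobilenet_v1_base(
    images,
    final_endpoint='Conv2d_13_pointwise',
    depth_multiplier=0.5,
    output_stride=16)
# end_points maps 'Conv2d_0' ... 'Conv2d_13_pointwise' to activations; with
# output_stride=16 the last layers run atrous and stay at 14x14 for 224 inputs.
```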
+ + Raises: + ValueError: if final_endpoint is not set to one of the predefined values, + or depth_multiplier <= 0, or the target output_stride is not + allowed. + """ + depth = lambda d: max(int(d * depth_multiplier), min_depth) + end_points = {} + + # Used to find thinned depths for each layer. + if depth_multiplier <= 0: + raise ValueError('depth_multiplier is not greater than zero.') + + if conv_defs is None: + conv_defs = _CONV_DEFS + + if output_stride is not None and output_stride not in [8, 16, 32]: + raise ValueError('Only allowed output_stride values are 8, 16, 32.') + + with tf.variable_scope(scope, 'MobilenetV1', [inputs]): + with slim.arg_scope([slim.conv2d, slim.separable_conv2d], padding='SAME'): + # The current_stride variable keeps track of the output stride of the + # activations, i.e., the running product of convolution strides up to the + # current network layer. This allows us to invoke atrous convolution + # whenever applying the next convolution would result in the activations + # having output stride larger than the target output_stride. + current_stride = 1 + + # The atrous convolution rate parameter. + rate = 1 + + net = inputs + for i, conv_def in enumerate(conv_defs): + end_point_base = 'Conv2d_%d' % i + + if output_stride is not None and current_stride == output_stride: + # If we have reached the target output_stride, then we need to employ + # atrous convolution with stride=1 and multiply the atrous rate by the + # current unit's stride for use in subsequent layers. + layer_stride = 1 + layer_rate = rate + rate *= conv_def.stride + else: + layer_stride = conv_def.stride + layer_rate = 1 + current_stride *= conv_def.stride + + if isinstance(conv_def, Conv): + end_point = end_point_base + net = slim.conv2d(net, depth(conv_def.depth), conv_def.kernel, + stride=conv_def.stride, + normalizer_fn=slim.batch_norm, + scope=end_point) + end_points[end_point] = net + if end_point == final_endpoint: + return net, end_points + + elif isinstance(conv_def, DepthSepConv): + end_point = end_point_base + '_depthwise' + + # By passing filters=None + # separable_conv2d produces only a depthwise convolution layer + net = slim.separable_conv2d(net, None, conv_def.kernel, + depth_multiplier=1, + stride=layer_stride, + rate=layer_rate, + normalizer_fn=slim.batch_norm, + scope=end_point) + + end_points[end_point] = net + if end_point == final_endpoint: + return net, end_points + + end_point = end_point_base + '_pointwise' + + net = slim.conv2d(net, depth(conv_def.depth), [1, 1], + stride=1, + normalizer_fn=slim.batch_norm, + scope=end_point) + + end_points[end_point] = net + if end_point == final_endpoint: + return net, end_points + else: + raise ValueError('Unknown convolution type %s for layer %d' + % (conv_def.ltype, i)) + raise ValueError('Unknown final endpoint %s' % final_endpoint) + + +def mobilenet_v1(inputs, + num_classes=1000, + dropout_keep_prob=0.999, + is_training=True, + min_depth=8, + depth_multiplier=1.0, + conv_defs=None, + prediction_fn=tf.contrib.layers.softmax, + spatial_squeeze=True, + reuse=None, + scope='MobilenetV1'): + """Mobilenet v1 model for classification. + + Args: + inputs: a tensor of shape [batch_size, height, width, channels]. + num_classes: number of predicted classes. + dropout_keep_prob: the percentage of activation values that are retained. + is_training: whether is training or not. + min_depth: Minimum depth value (number of channels) for all convolution ops. 
+ Enforced when depth_multiplier < 1, and not an active constraint when + depth_multiplier >= 1. + depth_multiplier: Float multiplier for the depth (number of channels) + for all convolution ops. The value must be greater than zero. Typical + usage will be to set this value in (0, 1) to reduce the number of + parameters or computation cost of the model. + conv_defs: A list of ConvDef namedtuples specifying the net architecture. + prediction_fn: a function to get predictions out of logits. + spatial_squeeze: if True, logits is of shape is [B, C], if false logits is + of shape [B, 1, 1, C], where B is batch_size and C is number of classes. + reuse: whether or not the network and its variables should be reused. To be + able to reuse 'scope' must be given. + scope: Optional variable_scope. + + Returns: + logits: the pre-softmax activations, a tensor of size + [batch_size, num_classes] + end_points: a dictionary from components of the network to the corresponding + activation. + + Raises: + ValueError: Input rank is invalid. + """ + input_shape = inputs.get_shape().as_list() + if len(input_shape) != 4: + raise ValueError('Invalid input tensor rank, expected 4, was: %d' % + len(input_shape)) + + with tf.variable_scope(scope, 'MobilenetV1', [inputs, num_classes], + reuse=reuse) as scope: + with slim.arg_scope([slim.batch_norm, slim.dropout], + is_training=is_training): + net, end_points = mobilenet_v1_base(inputs, scope=scope, + min_depth=min_depth, + depth_multiplier=depth_multiplier, + conv_defs=conv_defs) + with tf.variable_scope('Logits'): + kernel_size = _reduced_kernel_size_for_small_input(net, [7, 7]) + net = slim.avg_pool2d(net, kernel_size, padding='VALID', + scope='AvgPool_1a') + end_points['AvgPool_1a'] = net + # 1 x 1 x 1024 + net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b') + logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, + normalizer_fn=None, scope='Conv2d_1c_1x1') + if spatial_squeeze: + logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze') + end_points['Logits'] = logits + if prediction_fn: + end_points['Predictions'] = prediction_fn(logits, scope='Predictions') + return logits, end_points + +mobilenet_v1.default_image_size = 224 + + +def _reduced_kernel_size_for_small_input(input_tensor, kernel_size): + """Define kernel size which is automatically reduced for small input. + + If the shape of the input images is unknown at graph construction time this + function assumes that the input images are large enough. + + Args: + input_tensor: input tensor of size [batch_size, height, width, channels]. + kernel_size: desired kernel size of length 2: [kernel_height, kernel_width] + + Returns: + a tensor with the kernel size. + """ + shape = input_tensor.get_shape().as_list() + if shape[1] is None or shape[2] is None: + kernel_size_out = kernel_size + else: + kernel_size_out = [min(shape[1], kernel_size[0]), + min(shape[2], kernel_size[1])] + return kernel_size_out + + +def mobilenet_v1_arg_scope(is_training=True, + weight_decay=0.00004, + stddev=0.09, + regularize_depthwise=False): + """Defines the default MobilenetV1 arg scope. + + Args: + is_training: Whether or not we're training the model. + weight_decay: The weight decay to use for regularizing the model. + stddev: The standard deviation of the trunctated normal weight initializer. + regularize_depthwise: Whether or not apply regularization on depthwise. + + Returns: + An `arg_scope` to use for the mobilenet v1 model. 
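For completeness, a hedged sketch of using this arg scope together with the classifier defined above (training setup and checkpoint handling are omitted):

```python
# Sketch: build the classifier under the default MobileNet arg scope so the
# weight decay, truncated-normal initializer and batch-norm settings defined
# here apply to every conv and depthwise-separable layer.
import tensorflow as tf

from nets import mobilenet_v1

slim = tf.contrib.slim

images = tf.random_uniform((8, 224, 224, 3))
with slim.arg_scope(mobilenet_v1.mobilenet_v1_arg_scope(is_training=True)):
  logits, end_points = mobilenet_v1.mobilenet_v1(
      images, num_classes=1000, depth_multiplier=0.75, is_training=True)
predictions = end_points['Predictions']  # softmax over the 1000 classes
```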
+ """ + batch_norm_params = { + 'is_training': is_training, + 'center': True, + 'scale': True, + 'decay': 0.9997, + 'epsilon': 0.001, + } + + # Set weight_decay for weights in Conv and DepthSepConv layers. + weights_init = tf.truncated_normal_initializer(stddev=stddev) + regularizer = tf.contrib.layers.l2_regularizer(weight_decay) + if regularize_depthwise: + depthwise_regularizer = regularizer + else: + depthwise_regularizer = None + with slim.arg_scope([slim.conv2d, slim.separable_conv2d], + weights_initializer=weights_init, + activation_fn=tf.nn.relu6, normalizer_fn=slim.batch_norm): + with slim.arg_scope([slim.batch_norm], **batch_norm_params): + with slim.arg_scope([slim.conv2d], weights_regularizer=regularizer): + with slim.arg_scope([slim.separable_conv2d], + weights_regularizer=depthwise_regularizer) as sc: + return sc diff --git a/slim/nets/mobilenet_v1_test.py b/slim/nets/mobilenet_v1_test.py new file mode 100644 index 0000000000000000000000000000000000000000..44e66446baa42f49e164131eb4c1a97b46a9918d --- /dev/null +++ b/slim/nets/mobilenet_v1_test.py @@ -0,0 +1,450 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================= +"""Tests for MobileNet v1.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np +import tensorflow as tf + +from nets import mobilenet_v1 + +slim = tf.contrib.slim + + +class MobilenetV1Test(tf.test.TestCase): + + def testBuildClassificationNetwork(self): + batch_size = 5 + height, width = 224, 224 + num_classes = 1000 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + logits, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes) + self.assertTrue(logits.op.name.startswith('MobilenetV1/Logits')) + self.assertListEqual(logits.get_shape().as_list(), + [batch_size, num_classes]) + self.assertTrue('Predictions' in end_points) + self.assertListEqual(end_points['Predictions'].get_shape().as_list(), + [batch_size, num_classes]) + + def testBuildBaseNetwork(self): + batch_size = 5 + height, width = 224, 224 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + net, end_points = mobilenet_v1.mobilenet_v1_base(inputs) + self.assertTrue(net.op.name.startswith('MobilenetV1/Conv2d_13')) + self.assertListEqual(net.get_shape().as_list(), + [batch_size, 7, 7, 1024]) + expected_endpoints = ['Conv2d_0', + 'Conv2d_1_depthwise', 'Conv2d_1_pointwise', + 'Conv2d_2_depthwise', 'Conv2d_2_pointwise', + 'Conv2d_3_depthwise', 'Conv2d_3_pointwise', + 'Conv2d_4_depthwise', 'Conv2d_4_pointwise', + 'Conv2d_5_depthwise', 'Conv2d_5_pointwise', + 'Conv2d_6_depthwise', 'Conv2d_6_pointwise', + 'Conv2d_7_depthwise', 'Conv2d_7_pointwise', + 'Conv2d_8_depthwise', 'Conv2d_8_pointwise', + 'Conv2d_9_depthwise', 'Conv2d_9_pointwise', + 'Conv2d_10_depthwise', 'Conv2d_10_pointwise', + 'Conv2d_11_depthwise', 'Conv2d_11_pointwise', + 'Conv2d_12_depthwise', 
'Conv2d_12_pointwise', + 'Conv2d_13_depthwise', 'Conv2d_13_pointwise'] + self.assertItemsEqual(end_points.keys(), expected_endpoints) + + def testBuildOnlyUptoFinalEndpoint(self): + batch_size = 5 + height, width = 224, 224 + endpoints = ['Conv2d_0', + 'Conv2d_1_depthwise', 'Conv2d_1_pointwise', + 'Conv2d_2_depthwise', 'Conv2d_2_pointwise', + 'Conv2d_3_depthwise', 'Conv2d_3_pointwise', + 'Conv2d_4_depthwise', 'Conv2d_4_pointwise', + 'Conv2d_5_depthwise', 'Conv2d_5_pointwise', + 'Conv2d_6_depthwise', 'Conv2d_6_pointwise', + 'Conv2d_7_depthwise', 'Conv2d_7_pointwise', + 'Conv2d_8_depthwise', 'Conv2d_8_pointwise', + 'Conv2d_9_depthwise', 'Conv2d_9_pointwise', + 'Conv2d_10_depthwise', 'Conv2d_10_pointwise', + 'Conv2d_11_depthwise', 'Conv2d_11_pointwise', + 'Conv2d_12_depthwise', 'Conv2d_12_pointwise', + 'Conv2d_13_depthwise', 'Conv2d_13_pointwise'] + for index, endpoint in enumerate(endpoints): + with tf.Graph().as_default(): + inputs = tf.random_uniform((batch_size, height, width, 3)) + out_tensor, end_points = mobilenet_v1.mobilenet_v1_base( + inputs, final_endpoint=endpoint) + self.assertTrue(out_tensor.op.name.startswith( + 'MobilenetV1/' + endpoint)) + self.assertItemsEqual(endpoints[:index+1], end_points) + + def testBuildCustomNetworkUsingConvDefs(self): + batch_size = 5 + height, width = 224, 224 + conv_defs = [ + mobilenet_v1.Conv(kernel=[3, 3], stride=2, depth=32), + mobilenet_v1.DepthSepConv(kernel=[3, 3], stride=1, depth=64), + mobilenet_v1.DepthSepConv(kernel=[3, 3], stride=2, depth=128), + mobilenet_v1.DepthSepConv(kernel=[3, 3], stride=1, depth=512) + ] + + inputs = tf.random_uniform((batch_size, height, width, 3)) + net, end_points = mobilenet_v1.mobilenet_v1_base( + inputs, final_endpoint='Conv2d_3_pointwise', conv_defs=conv_defs) + self.assertTrue(net.op.name.startswith('MobilenetV1/Conv2d_3')) + self.assertListEqual(net.get_shape().as_list(), + [batch_size, 56, 56, 512]) + expected_endpoints = ['Conv2d_0', + 'Conv2d_1_depthwise', 'Conv2d_1_pointwise', + 'Conv2d_2_depthwise', 'Conv2d_2_pointwise', + 'Conv2d_3_depthwise', 'Conv2d_3_pointwise'] + self.assertItemsEqual(end_points.keys(), expected_endpoints) + + def testBuildAndCheckAllEndPointsUptoConv2d_13(self): + batch_size = 5 + height, width = 224, 224 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + with slim.arg_scope([slim.conv2d, slim.separable_conv2d], + normalizer_fn=slim.batch_norm): + _, end_points = mobilenet_v1.mobilenet_v1_base( + inputs, final_endpoint='Conv2d_13_pointwise') + endpoints_shapes = {'Conv2d_0': [batch_size, 112, 112, 32], + 'Conv2d_1_depthwise': [batch_size, 112, 112, 32], + 'Conv2d_1_pointwise': [batch_size, 112, 112, 64], + 'Conv2d_2_depthwise': [batch_size, 56, 56, 64], + 'Conv2d_2_pointwise': [batch_size, 56, 56, 128], + 'Conv2d_3_depthwise': [batch_size, 56, 56, 128], + 'Conv2d_3_pointwise': [batch_size, 56, 56, 128], + 'Conv2d_4_depthwise': [batch_size, 28, 28, 128], + 'Conv2d_4_pointwise': [batch_size, 28, 28, 256], + 'Conv2d_5_depthwise': [batch_size, 28, 28, 256], + 'Conv2d_5_pointwise': [batch_size, 28, 28, 256], + 'Conv2d_6_depthwise': [batch_size, 14, 14, 256], + 'Conv2d_6_pointwise': [batch_size, 14, 14, 512], + 'Conv2d_7_depthwise': [batch_size, 14, 14, 512], + 'Conv2d_7_pointwise': [batch_size, 14, 14, 512], + 'Conv2d_8_depthwise': [batch_size, 14, 14, 512], + 'Conv2d_8_pointwise': [batch_size, 14, 14, 512], + 'Conv2d_9_depthwise': [batch_size, 14, 14, 512], + 'Conv2d_9_pointwise': [batch_size, 14, 14, 512], + 'Conv2d_10_depthwise': [batch_size, 14, 14, 512], + 
'Conv2d_10_pointwise': [batch_size, 14, 14, 512], + 'Conv2d_11_depthwise': [batch_size, 14, 14, 512], + 'Conv2d_11_pointwise': [batch_size, 14, 14, 512], + 'Conv2d_12_depthwise': [batch_size, 7, 7, 512], + 'Conv2d_12_pointwise': [batch_size, 7, 7, 1024], + 'Conv2d_13_depthwise': [batch_size, 7, 7, 1024], + 'Conv2d_13_pointwise': [batch_size, 7, 7, 1024]} + self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys()) + for endpoint_name, expected_shape in endpoints_shapes.iteritems(): + self.assertTrue(endpoint_name in end_points) + self.assertListEqual(end_points[endpoint_name].get_shape().as_list(), + expected_shape) + + def testOutputStride16BuildAndCheckAllEndPointsUptoConv2d_13(self): + batch_size = 5 + height, width = 224, 224 + output_stride = 16 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + with slim.arg_scope([slim.conv2d, slim.separable_conv2d], + normalizer_fn=slim.batch_norm): + _, end_points = mobilenet_v1.mobilenet_v1_base( + inputs, output_stride=output_stride, + final_endpoint='Conv2d_13_pointwise') + endpoints_shapes = {'Conv2d_0': [batch_size, 112, 112, 32], + 'Conv2d_1_depthwise': [batch_size, 112, 112, 32], + 'Conv2d_1_pointwise': [batch_size, 112, 112, 64], + 'Conv2d_2_depthwise': [batch_size, 56, 56, 64], + 'Conv2d_2_pointwise': [batch_size, 56, 56, 128], + 'Conv2d_3_depthwise': [batch_size, 56, 56, 128], + 'Conv2d_3_pointwise': [batch_size, 56, 56, 128], + 'Conv2d_4_depthwise': [batch_size, 28, 28, 128], + 'Conv2d_4_pointwise': [batch_size, 28, 28, 256], + 'Conv2d_5_depthwise': [batch_size, 28, 28, 256], + 'Conv2d_5_pointwise': [batch_size, 28, 28, 256], + 'Conv2d_6_depthwise': [batch_size, 14, 14, 256], + 'Conv2d_6_pointwise': [batch_size, 14, 14, 512], + 'Conv2d_7_depthwise': [batch_size, 14, 14, 512], + 'Conv2d_7_pointwise': [batch_size, 14, 14, 512], + 'Conv2d_8_depthwise': [batch_size, 14, 14, 512], + 'Conv2d_8_pointwise': [batch_size, 14, 14, 512], + 'Conv2d_9_depthwise': [batch_size, 14, 14, 512], + 'Conv2d_9_pointwise': [batch_size, 14, 14, 512], + 'Conv2d_10_depthwise': [batch_size, 14, 14, 512], + 'Conv2d_10_pointwise': [batch_size, 14, 14, 512], + 'Conv2d_11_depthwise': [batch_size, 14, 14, 512], + 'Conv2d_11_pointwise': [batch_size, 14, 14, 512], + 'Conv2d_12_depthwise': [batch_size, 14, 14, 512], + 'Conv2d_12_pointwise': [batch_size, 14, 14, 1024], + 'Conv2d_13_depthwise': [batch_size, 14, 14, 1024], + 'Conv2d_13_pointwise': [batch_size, 14, 14, 1024]} + self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys()) + for endpoint_name, expected_shape in endpoints_shapes.iteritems(): + self.assertTrue(endpoint_name in end_points) + self.assertListEqual(end_points[endpoint_name].get_shape().as_list(), + expected_shape) + + def testOutputStride8BuildAndCheckAllEndPointsUptoConv2d_13(self): + batch_size = 5 + height, width = 224, 224 + output_stride = 8 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + with slim.arg_scope([slim.conv2d, slim.separable_conv2d], + normalizer_fn=slim.batch_norm): + _, end_points = mobilenet_v1.mobilenet_v1_base( + inputs, output_stride=output_stride, + final_endpoint='Conv2d_13_pointwise') + endpoints_shapes = {'Conv2d_0': [batch_size, 112, 112, 32], + 'Conv2d_1_depthwise': [batch_size, 112, 112, 32], + 'Conv2d_1_pointwise': [batch_size, 112, 112, 64], + 'Conv2d_2_depthwise': [batch_size, 56, 56, 64], + 'Conv2d_2_pointwise': [batch_size, 56, 56, 128], + 'Conv2d_3_depthwise': [batch_size, 56, 56, 128], + 'Conv2d_3_pointwise': [batch_size, 56, 56, 128], + 'Conv2d_4_depthwise': [batch_size, 
28, 28, 128], + 'Conv2d_4_pointwise': [batch_size, 28, 28, 256], + 'Conv2d_5_depthwise': [batch_size, 28, 28, 256], + 'Conv2d_5_pointwise': [batch_size, 28, 28, 256], + 'Conv2d_6_depthwise': [batch_size, 28, 28, 256], + 'Conv2d_6_pointwise': [batch_size, 28, 28, 512], + 'Conv2d_7_depthwise': [batch_size, 28, 28, 512], + 'Conv2d_7_pointwise': [batch_size, 28, 28, 512], + 'Conv2d_8_depthwise': [batch_size, 28, 28, 512], + 'Conv2d_8_pointwise': [batch_size, 28, 28, 512], + 'Conv2d_9_depthwise': [batch_size, 28, 28, 512], + 'Conv2d_9_pointwise': [batch_size, 28, 28, 512], + 'Conv2d_10_depthwise': [batch_size, 28, 28, 512], + 'Conv2d_10_pointwise': [batch_size, 28, 28, 512], + 'Conv2d_11_depthwise': [batch_size, 28, 28, 512], + 'Conv2d_11_pointwise': [batch_size, 28, 28, 512], + 'Conv2d_12_depthwise': [batch_size, 28, 28, 512], + 'Conv2d_12_pointwise': [batch_size, 28, 28, 1024], + 'Conv2d_13_depthwise': [batch_size, 28, 28, 1024], + 'Conv2d_13_pointwise': [batch_size, 28, 28, 1024]} + self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys()) + for endpoint_name, expected_shape in endpoints_shapes.iteritems(): + self.assertTrue(endpoint_name in end_points) + self.assertListEqual(end_points[endpoint_name].get_shape().as_list(), + expected_shape) + + def testBuildAndCheckAllEndPointsApproximateFaceNet(self): + batch_size = 5 + height, width = 128, 128 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + with slim.arg_scope([slim.conv2d, slim.separable_conv2d], + normalizer_fn=slim.batch_norm): + _, end_points = mobilenet_v1.mobilenet_v1_base( + inputs, final_endpoint='Conv2d_13_pointwise', depth_multiplier=0.75) + # For the Conv2d_0 layer FaceNet has depth=16 + endpoints_shapes = {'Conv2d_0': [batch_size, 64, 64, 24], + 'Conv2d_1_depthwise': [batch_size, 64, 64, 24], + 'Conv2d_1_pointwise': [batch_size, 64, 64, 48], + 'Conv2d_2_depthwise': [batch_size, 32, 32, 48], + 'Conv2d_2_pointwise': [batch_size, 32, 32, 96], + 'Conv2d_3_depthwise': [batch_size, 32, 32, 96], + 'Conv2d_3_pointwise': [batch_size, 32, 32, 96], + 'Conv2d_4_depthwise': [batch_size, 16, 16, 96], + 'Conv2d_4_pointwise': [batch_size, 16, 16, 192], + 'Conv2d_5_depthwise': [batch_size, 16, 16, 192], + 'Conv2d_5_pointwise': [batch_size, 16, 16, 192], + 'Conv2d_6_depthwise': [batch_size, 8, 8, 192], + 'Conv2d_6_pointwise': [batch_size, 8, 8, 384], + 'Conv2d_7_depthwise': [batch_size, 8, 8, 384], + 'Conv2d_7_pointwise': [batch_size, 8, 8, 384], + 'Conv2d_8_depthwise': [batch_size, 8, 8, 384], + 'Conv2d_8_pointwise': [batch_size, 8, 8, 384], + 'Conv2d_9_depthwise': [batch_size, 8, 8, 384], + 'Conv2d_9_pointwise': [batch_size, 8, 8, 384], + 'Conv2d_10_depthwise': [batch_size, 8, 8, 384], + 'Conv2d_10_pointwise': [batch_size, 8, 8, 384], + 'Conv2d_11_depthwise': [batch_size, 8, 8, 384], + 'Conv2d_11_pointwise': [batch_size, 8, 8, 384], + 'Conv2d_12_depthwise': [batch_size, 4, 4, 384], + 'Conv2d_12_pointwise': [batch_size, 4, 4, 768], + 'Conv2d_13_depthwise': [batch_size, 4, 4, 768], + 'Conv2d_13_pointwise': [batch_size, 4, 4, 768]} + self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys()) + for endpoint_name, expected_shape in endpoints_shapes.iteritems(): + self.assertTrue(endpoint_name in end_points) + self.assertListEqual(end_points[endpoint_name].get_shape().as_list(), + expected_shape) + + def testModelHasExpectedNumberOfParameters(self): + batch_size = 5 + height, width = 224, 224 + inputs = tf.random_uniform((batch_size, height, width, 3)) + with slim.arg_scope([slim.conv2d, slim.separable_conv2d], + 
normalizer_fn=slim.batch_norm): + mobilenet_v1.mobilenet_v1_base(inputs) + total_params, _ = slim.model_analyzer.analyze_vars( + slim.get_model_variables()) + self.assertAlmostEqual(3217920L, total_params) + + def testBuildEndPointsWithDepthMultiplierLessThanOne(self): + batch_size = 5 + height, width = 224, 224 + num_classes = 1000 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + _, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes) + + endpoint_keys = [key for key in end_points.keys() if key.startswith('Conv')] + + _, end_points_with_multiplier = mobilenet_v1.mobilenet_v1( + inputs, num_classes, scope='depth_multiplied_net', + depth_multiplier=0.5) + + for key in endpoint_keys: + original_depth = end_points[key].get_shape().as_list()[3] + new_depth = end_points_with_multiplier[key].get_shape().as_list()[3] + self.assertEqual(0.5 * original_depth, new_depth) + + def testBuildEndPointsWithDepthMultiplierGreaterThanOne(self): + batch_size = 5 + height, width = 224, 224 + num_classes = 1000 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + _, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes) + + endpoint_keys = [key for key in end_points.keys() + if key.startswith('Mixed') or key.startswith('Conv')] + + _, end_points_with_multiplier = mobilenet_v1.mobilenet_v1( + inputs, num_classes, scope='depth_multiplied_net', + depth_multiplier=2.0) + + for key in endpoint_keys: + original_depth = end_points[key].get_shape().as_list()[3] + new_depth = end_points_with_multiplier[key].get_shape().as_list()[3] + self.assertEqual(2.0 * original_depth, new_depth) + + def testRaiseValueErrorWithInvalidDepthMultiplier(self): + batch_size = 5 + height, width = 224, 224 + num_classes = 1000 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + with self.assertRaises(ValueError): + _ = mobilenet_v1.mobilenet_v1( + inputs, num_classes, depth_multiplier=-0.1) + with self.assertRaises(ValueError): + _ = mobilenet_v1.mobilenet_v1( + inputs, num_classes, depth_multiplier=0.0) + + def testHalfSizeImages(self): + batch_size = 5 + height, width = 112, 112 + num_classes = 1000 + + inputs = tf.random_uniform((batch_size, height, width, 3)) + logits, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes) + self.assertTrue(logits.op.name.startswith('MobilenetV1/Logits')) + self.assertListEqual(logits.get_shape().as_list(), + [batch_size, num_classes]) + pre_pool = end_points['Conv2d_13_pointwise'] + self.assertListEqual(pre_pool.get_shape().as_list(), + [batch_size, 4, 4, 1024]) + + def testUnknownImageShape(self): + tf.reset_default_graph() + batch_size = 2 + height, width = 224, 224 + num_classes = 1000 + input_np = np.random.uniform(0, 1, (batch_size, height, width, 3)) + with self.test_session() as sess: + inputs = tf.placeholder(tf.float32, shape=(batch_size, None, None, 3)) + logits, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes) + self.assertTrue(logits.op.name.startswith('MobilenetV1/Logits')) + self.assertListEqual(logits.get_shape().as_list(), + [batch_size, num_classes]) + pre_pool = end_points['Conv2d_13_pointwise'] + feed_dict = {inputs: input_np} + tf.global_variables_initializer().run() + pre_pool_out = sess.run(pre_pool, feed_dict=feed_dict) + self.assertListEqual(list(pre_pool_out.shape), [batch_size, 7, 7, 1024]) + + def testUnknowBatchSize(self): + batch_size = 1 + height, width = 224, 224 + num_classes = 1000 + + inputs = tf.placeholder(tf.float32, (None, height, width, 3)) + logits, _ = mobilenet_v1.mobilenet_v1(inputs, 
num_classes) + self.assertTrue(logits.op.name.startswith('MobilenetV1/Logits')) + self.assertListEqual(logits.get_shape().as_list(), + [None, num_classes]) + images = tf.random_uniform((batch_size, height, width, 3)) + + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + output = sess.run(logits, {inputs: images.eval()}) + self.assertEquals(output.shape, (batch_size, num_classes)) + + def testEvaluation(self): + batch_size = 2 + height, width = 224, 224 + num_classes = 1000 + + eval_inputs = tf.random_uniform((batch_size, height, width, 3)) + logits, _ = mobilenet_v1.mobilenet_v1(eval_inputs, num_classes, + is_training=False) + predictions = tf.argmax(logits, 1) + + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + output = sess.run(predictions) + self.assertEquals(output.shape, (batch_size,)) + + def testTrainEvalWithReuse(self): + train_batch_size = 5 + eval_batch_size = 2 + height, width = 150, 150 + num_classes = 1000 + + train_inputs = tf.random_uniform((train_batch_size, height, width, 3)) + mobilenet_v1.mobilenet_v1(train_inputs, num_classes) + eval_inputs = tf.random_uniform((eval_batch_size, height, width, 3)) + logits, _ = mobilenet_v1.mobilenet_v1(eval_inputs, num_classes, + reuse=True) + predictions = tf.argmax(logits, 1) + + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + output = sess.run(predictions) + self.assertEquals(output.shape, (eval_batch_size,)) + + def testLogitsNotSqueezed(self): + num_classes = 25 + images = tf.random_uniform([1, 224, 224, 3]) + logits, _ = mobilenet_v1.mobilenet_v1(images, + num_classes=num_classes, + spatial_squeeze=False) + + with self.test_session() as sess: + tf.global_variables_initializer().run() + logits_out = sess.run(logits) + self.assertListEqual(list(logits_out.shape), [1, 1, 1, num_classes]) + + +if __name__ == '__main__': + tf.test.main() diff --git a/slim/nets/nets_factory.py b/slim/nets/nets_factory.py index b4f71abd1a24354c2a76314d536e3a925916d933..7c0416167d3009a02266809658904cadad57acba 100644 --- a/slim/nets/nets_factory.py +++ b/slim/nets/nets_factory.py @@ -25,6 +25,7 @@ from nets import alexnet from nets import cifarnet from nets import inception from nets import lenet +from nets import mobilenet_v1 from nets import overfeat from nets import resnet_v1 from nets import resnet_v2 @@ -52,6 +53,7 @@ networks_map = {'alexnet_v2': alexnet.alexnet_v2, 'resnet_v2_101': resnet_v2.resnet_v2_101, 'resnet_v2_152': resnet_v2.resnet_v2_152, 'resnet_v2_200': resnet_v2.resnet_v2_200, + 'mobilenet_v1': mobilenet_v1.mobilenet_v1, } arg_scopes_map = {'alexnet_v2': alexnet.alexnet_v2_arg_scope, @@ -75,6 +77,7 @@ arg_scopes_map = {'alexnet_v2': alexnet.alexnet_v2_arg_scope, 'resnet_v2_101': resnet_v2.resnet_arg_scope, 'resnet_v2_152': resnet_v2.resnet_arg_scope, 'resnet_v2_200': resnet_v2.resnet_arg_scope, + 'mobilenet_v1': mobilenet_v1.mobilenet_v1_arg_scope, } diff --git a/slim/nets/nets_factory_test.py b/slim/nets/nets_factory_test.py index 6ac723b6d98833f8eb1ebe02c4552e0cf1d758a1..b4ab1f822c9e85ab41b25e57589479e95377de18 100644 --- a/slim/nets/nets_factory_test.py +++ b/slim/nets/nets_factory_test.py @@ -19,11 +19,12 @@ from __future__ import absolute_import from __future__ import division from __future__ import print_function - import tensorflow as tf from nets import nets_factory +slim = tf.contrib.slim + class NetworksTest(tf.test.TestCase): @@ -42,5 +43,19 @@ class NetworksTest(tf.test.TestCase): 
self.assertEqual(logits.get_shape().as_list()[0], batch_size) self.assertEqual(logits.get_shape().as_list()[-1], num_classes) + def testGetNetworkFnArgScope(self): + batch_size = 5 + num_classes = 10 + net = 'cifarnet' + with self.test_session(use_gpu=True): + net_fn = nets_factory.get_network_fn(net, num_classes) + image_size = getattr(net_fn, 'default_image_size', 224) + with slim.arg_scope([slim.model_variable, slim.variable], + device='/CPU:0'): + inputs = tf.random_uniform((batch_size, image_size, image_size, 3)) + net_fn(inputs) + weights = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, 'CifarNet/conv1')[0] + self.assertDeviceEqual('/CPU:0', weights.device) + if __name__ == '__main__': tf.test.main() diff --git a/slim/nets/resnet_utils.py b/slim/nets/resnet_utils.py index 1e1dd82929f9bb862d239759b5b21767bbeeb779..20d7789a6087ec0fd6b971a176b3fbb0532c8771 100644 --- a/slim/nets/resnet_utils.py +++ b/slim/nets/resnet_utils.py @@ -178,26 +178,16 @@ def stack_blocks_dense(net, blocks, output_stride=None, raise ValueError('The target output_stride cannot be reached.') with tf.variable_scope('unit_%d' % (i + 1), values=[net]): - unit_depth, unit_depth_bottleneck, unit_stride = unit - # If we have reached the target output_stride, then we need to employ # atrous convolution with stride=1 and multiply the atrous rate by the # current unit's stride for use in subsequent layers. if output_stride is not None and current_stride == output_stride: - net = block.unit_fn(net, - depth=unit_depth, - depth_bottleneck=unit_depth_bottleneck, - stride=1, - rate=rate) - rate *= unit_stride + net = block.unit_fn(net, rate=rate, **dict(unit, stride=1)) + rate *= unit.get('stride', 1) else: - net = block.unit_fn(net, - depth=unit_depth, - depth_bottleneck=unit_depth_bottleneck, - stride=unit_stride, - rate=1) - current_stride *= unit_stride + net = block.unit_fn(net, rate=1, **unit) + current_stride *= unit.get('stride', 1) net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net) if output_stride is not None and current_stride != output_stride: diff --git a/slim/nets/resnet_v1.py b/slim/nets/resnet_v1.py index 7e46fd2e1fc8aa3650092e3e7ecb6f819e665819..841e2fb2b5e6b95f1eb28074040231216eb6f5ba 100644 --- a/slim/nets/resnet_v1.py +++ b/slim/nets/resnet_v1.py @@ -119,7 +119,7 @@ def resnet_v1(inputs, global_pool=True, output_stride=None, include_root_block=True, - spatial_squeeze=True, + spatial_squeeze=False, reuse=None, scope=None): """Generator for v1 ResNet models. @@ -161,6 +161,9 @@ def resnet_v1(inputs, max-pooling, if False excludes it. spatial_squeeze: if True, logits is of shape [B, C], if false logits is of shape [B, 1, 1, C], where B is batch_size and C is number of classes. + To use this parameter, the input images must be smaller than 300x300 + pixels, in which case the output logit layer does not contain spatial + information and can be removed. reuse: whether or not the network and its variables should be reused. To be able to reuse 'scope' must be given. scope: Optional variable_scope. @@ -200,37 +203,60 @@ def resnet_v1(inputs, if num_classes is not None: net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='logits') - if spatial_squeeze: - logits = tf.squeeze(net, [1, 2], name='SpatialSqueeze') + if spatial_squeeze: + net = tf.squeeze(net, [1, 2], name='SpatialSqueeze') # Convert end_points_collection into a dictionary of end_points. 
- end_points = slim.utils.convert_collection_to_dict(end_points_collection) + end_points = slim.utils.convert_collection_to_dict( + end_points_collection) if num_classes is not None: - end_points['predictions'] = slim.softmax(logits, scope='predictions') - return logits, end_points + end_points['predictions'] = slim.softmax(net, scope='predictions') + return net, end_points resnet_v1.default_image_size = 224 +def resnet_v1_block(scope, base_depth, num_units, stride): + """Helper function for creating a resnet_v1 bottleneck block. + + Args: + scope: The scope of the block. + base_depth: The depth of the bottleneck layer for each unit. + num_units: The number of units in the block. + stride: The stride of the block, implemented as a stride in the last unit. + All other units have stride=1. + + Returns: + A resnet_v1 bottleneck block. + """ + return resnet_utils.Block(scope, bottleneck, [{ + 'depth': base_depth * 4, + 'depth_bottleneck': base_depth, + 'stride': 1 + }] * (num_units - 1) + [{ + 'depth': base_depth * 4, + 'depth_bottleneck': base_depth, + 'stride': stride + }]) + + def resnet_v1_50(inputs, num_classes=None, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=True, reuse=None, scope='resnet_v1_50'): """ResNet-50 model of [1]. See resnet_v1() for arg and return description.""" blocks = [ - resnet_utils.Block( - 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), - resnet_utils.Block( - 'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]), - resnet_utils.Block( - 'block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]), - resnet_utils.Block( - 'block4', bottleneck, [(2048, 512, 1)] * 3) + resnet_v1_block('block1', base_depth=64, num_units=3, stride=2), + resnet_v1_block('block2', base_depth=128, num_units=4, stride=2), + resnet_v1_block('block3', base_depth=256, num_units=6, stride=2), + resnet_v1_block('block4', base_depth=512, num_units=3, stride=1), ] return resnet_v1(inputs, blocks, num_classes, is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) resnet_v1_50.default_image_size = resnet_v1.default_image_size @@ -239,22 +265,20 @@ def resnet_v1_101(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=True, reuse=None, scope='resnet_v1_101'): """ResNet-101 model of [1]. 
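The `resnet_v1_block` helper above replaces the old `(depth, depth_bottleneck, stride)` tuples with per-unit keyword dictionaries, which is what lets the refactored `stack_blocks_dense` forward each unit as `**unit`. A small sketch of what one block expands to, with the expected output shown as comments derived from the helper's definition:

```
from nets import resnet_v1

# Two stride-1 units followed by one stride-2 unit, each 4x the base depth.
block = resnet_v1.resnet_v1_block('block1', base_depth=64, num_units=3, stride=2)
print(block.scope)  # 'block1'
print(block.args)
# [{'depth': 256, 'depth_bottleneck': 64, 'stride': 1},
#  {'depth': 256, 'depth_bottleneck': 64, 'stride': 1},
#  {'depth': 256, 'depth_bottleneck': 64, 'stride': 2}]
```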
See resnet_v1() for arg and return description.""" blocks = [ - resnet_utils.Block( - 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), - resnet_utils.Block( - 'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]), - resnet_utils.Block( - 'block3', bottleneck, [(1024, 256, 1)] * 22 + [(1024, 256, 2)]), - resnet_utils.Block( - 'block4', bottleneck, [(2048, 512, 1)] * 3) + resnet_v1_block('block1', base_depth=64, num_units=3, stride=2), + resnet_v1_block('block2', base_depth=128, num_units=4, stride=2), + resnet_v1_block('block3', base_depth=256, num_units=23, stride=2), + resnet_v1_block('block4', base_depth=512, num_units=3, stride=1), ] return resnet_v1(inputs, blocks, num_classes, is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) resnet_v1_101.default_image_size = resnet_v1.default_image_size @@ -263,21 +287,20 @@ def resnet_v1_152(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=True, reuse=None, scope='resnet_v1_152'): """ResNet-152 model of [1]. See resnet_v1() for arg and return description.""" blocks = [ - resnet_utils.Block( - 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), - resnet_utils.Block( - 'block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]), - resnet_utils.Block( - 'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]), - resnet_utils.Block( - 'block4', bottleneck, [(2048, 512, 1)] * 3)] + resnet_v1_block('block1', base_depth=64, num_units=3, stride=2), + resnet_v1_block('block2', base_depth=128, num_units=8, stride=2), + resnet_v1_block('block3', base_depth=256, num_units=36, stride=2), + resnet_v1_block('block4', base_depth=512, num_units=3, stride=1), + ] return resnet_v1(inputs, blocks, num_classes, is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) resnet_v1_152.default_image_size = resnet_v1.default_image_size @@ -286,19 +309,18 @@ def resnet_v1_200(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=True, reuse=None, scope='resnet_v1_200'): """ResNet-200 model of [2]. 
See resnet_v1() for arg and return description.""" blocks = [ - resnet_utils.Block( - 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), - resnet_utils.Block( - 'block2', bottleneck, [(512, 128, 1)] * 23 + [(512, 128, 2)]), - resnet_utils.Block( - 'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]), - resnet_utils.Block( - 'block4', bottleneck, [(2048, 512, 1)] * 3)] + resnet_v1_block('block1', base_depth=64, num_units=3, stride=2), + resnet_v1_block('block2', base_depth=128, num_units=24, stride=2), + resnet_v1_block('block3', base_depth=256, num_units=36, stride=2), + resnet_v1_block('block4', base_depth=512, num_units=3, stride=1), + ] return resnet_v1(inputs, blocks, num_classes, is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) resnet_v1_200.default_image_size = resnet_v1.default_image_size diff --git a/slim/nets/resnet_v1_test.py b/slim/nets/resnet_v1_test.py index 5c229a516ba213bdf4edaa775eafae9a6347bcac..6bee51914d5d54b4799cd2bd811592744e2f59ef 100644 --- a/slim/nets/resnet_v1_test.py +++ b/slim/nets/resnet_v1_test.py @@ -156,14 +156,17 @@ class ResnetUtilsTest(tf.test.TestCase): with tf.variable_scope(scope, values=[inputs]): with slim.arg_scope([slim.conv2d], outputs_collections='end_points'): net = resnet_utils.stack_blocks_dense(inputs, blocks, output_stride) - end_points = dict(tf.get_collection('end_points')) + end_points = slim.utils.convert_collection_to_dict('end_points') return net, end_points def testEndPointsV1(self): """Test the end points of a tiny v1 bottleneck network.""" - bottleneck = resnet_v1.bottleneck - blocks = [resnet_utils.Block('block1', bottleneck, [(4, 1, 1), (4, 1, 2)]), - resnet_utils.Block('block2', bottleneck, [(8, 2, 1), (8, 2, 1)])] + blocks = [ + resnet_v1.resnet_v1_block( + 'block1', base_depth=1, num_units=2, stride=2), + resnet_v1.resnet_v1_block( + 'block2', base_depth=2, num_units=2, stride=1), + ] inputs = create_test_input(2, 32, 16, 3) with slim.arg_scope(resnet_utils.resnet_arg_scope()): _, end_points = self._resnet_plain(inputs, blocks, scope='tiny') @@ -189,30 +192,23 @@ class ResnetUtilsTest(tf.test.TestCase): for block in blocks: with tf.variable_scope(block.scope, 'block', [net]): for i, unit in enumerate(block.args): - depth, depth_bottleneck, stride = unit with tf.variable_scope('unit_%d' % (i + 1), values=[net]): - net = block.unit_fn(net, - depth=depth, - depth_bottleneck=depth_bottleneck, - stride=stride, - rate=1) + net = block.unit_fn(net, rate=1, **unit) return net - def _atrousValues(self, bottleneck): + def testAtrousValuesBottleneck(self): """Verify the values of dense feature extraction by atrous convolution. Make sure that dense feature extraction by stack_blocks_dense() followed by subsampling gives identical results to feature extraction at the nominal network output stride using the simple self._stack_blocks_nondense() above. - - Args: - bottleneck: The bottleneck function. 
""" + block = resnet_v1.resnet_v1_block blocks = [ - resnet_utils.Block('block1', bottleneck, [(4, 1, 1), (4, 1, 2)]), - resnet_utils.Block('block2', bottleneck, [(8, 2, 1), (8, 2, 2)]), - resnet_utils.Block('block3', bottleneck, [(16, 4, 1), (16, 4, 2)]), - resnet_utils.Block('block4', bottleneck, [(32, 8, 1), (32, 8, 1)]) + block('block1', base_depth=1, num_units=2, stride=2), + block('block2', base_depth=2, num_units=2, stride=2), + block('block3', base_depth=4, num_units=2, stride=2), + block('block4', base_depth=8, num_units=2, stride=1), ] nominal_stride = 8 @@ -244,9 +240,6 @@ class ResnetUtilsTest(tf.test.TestCase): output, expected = sess.run([output, expected]) self.assertAllClose(output, expected, atol=1e-4, rtol=1e-4) - def testAtrousValuesBottleneck(self): - self._atrousValues(resnet_v1.bottleneck) - class ResnetCompleteNetworkTest(tf.test.TestCase): """Tests with complete small ResNet v1 networks.""" @@ -261,16 +254,13 @@ class ResnetCompleteNetworkTest(tf.test.TestCase): reuse=None, scope='resnet_v1_small'): """A shallow and thin ResNet v1 for faster tests.""" - bottleneck = resnet_v1.bottleneck + block = resnet_v1.resnet_v1_block blocks = [ - resnet_utils.Block( - 'block1', bottleneck, [(4, 1, 1)] * 2 + [(4, 1, 2)]), - resnet_utils.Block( - 'block2', bottleneck, [(8, 2, 1)] * 2 + [(8, 2, 2)]), - resnet_utils.Block( - 'block3', bottleneck, [(16, 4, 1)] * 2 + [(16, 4, 2)]), - resnet_utils.Block( - 'block4', bottleneck, [(32, 8, 1)] * 2)] + block('block1', base_depth=1, num_units=3, stride=2), + block('block2', base_depth=2, num_units=3, stride=2), + block('block3', base_depth=4, num_units=3, stride=2), + block('block4', base_depth=8, num_units=2, stride=1), + ] return resnet_v1.resnet_v1(inputs, blocks, num_classes, is_training=is_training, global_pool=global_pool, diff --git a/slim/nets/resnet_v2.py b/slim/nets/resnet_v2.py index a05eb3e3918a93eecbe3a2ad85d64c5e206a19e7..0951c1edb2eb80c35eeac5f3788b2d10abcc296b 100644 --- a/slim/nets/resnet_v2.py +++ b/slim/nets/resnet_v2.py @@ -25,8 +25,6 @@ introduced by: The key difference of the full preactivation 'v2' variant compared to the 'v1' variant in [1] is the use of batch normalization before every weight layer. -Another difference is that 'v2' ResNets do not include an activation function in -the main pathway. Also see [2; Fig. 4e]. Typical use: @@ -117,7 +115,7 @@ def resnet_v2(inputs, global_pool=True, output_stride=None, include_root_block=True, - spatial_squeeze=True, + spatial_squeeze=False, reuse=None, scope=None): """Generator for v2 (preactivation) ResNet models. @@ -160,6 +158,9 @@ def resnet_v2(inputs, results of an activation-less convolution. spatial_squeeze: if True, logits is of shape [B, C], if false logits is of shape [B, 1, 1, C], where B is batch_size and C is number of classes. + To use this parameter, the input images must be smaller than 300x300 + pixels, in which case the output logit layer does not contain spatial + information and can be removed. reuse: whether or not the network and its variables should be reused. To be able to reuse 'scope' must be given. scope: Optional variable_scope. @@ -209,13 +210,39 @@ def resnet_v2(inputs, if num_classes is not None: net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='logits') - if spatial_squeeze: - logits = tf.squeeze(net, [1, 2], name='SpatialSqueeze') + if spatial_squeeze: + net = tf.squeeze(net, [1, 2], name='SpatialSqueeze') # Convert end_points_collection into a dictionary of end_points. 
- end_points = slim.utils.convert_collection_to_dict(end_points_collection) + end_points = slim.utils.convert_collection_to_dict( + end_points_collection) if num_classes is not None: - end_points['predictions'] = slim.softmax(logits, scope='predictions') - return logits, end_points + end_points['predictions'] = slim.softmax(net, scope='predictions') + return net, end_points +resnet_v2.default_image_size = 224 + + +def resnet_v2_block(scope, base_depth, num_units, stride): + """Helper function for creating a resnet_v2 bottleneck block. + + Args: + scope: The scope of the block. + base_depth: The depth of the bottleneck layer for each unit. + num_units: The number of units in the block. + stride: The stride of the block, implemented as a stride in the last unit. + All other units have stride=1. + + Returns: + A resnet_v2 bottleneck block. + """ + return resnet_utils.Block(scope, bottleneck, [{ + 'depth': base_depth * 4, + 'depth_bottleneck': base_depth, + 'stride': 1 + }] * (num_units - 1) + [{ + 'depth': base_depth * 4, + 'depth_bottleneck': base_depth, + 'stride': stride + }]) resnet_v2.default_image_size = 224 @@ -224,21 +251,20 @@ def resnet_v2_50(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=False, reuse=None, scope='resnet_v2_50'): """ResNet-50 model of [1]. See resnet_v2() for arg and return description.""" blocks = [ - resnet_utils.Block( - 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), - resnet_utils.Block( - 'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]), - resnet_utils.Block( - 'block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]), - resnet_utils.Block( - 'block4', bottleneck, [(2048, 512, 1)] * 3)] + resnet_v2_block('block1', base_depth=64, num_units=3, stride=2), + resnet_v2_block('block2', base_depth=128, num_units=4, stride=2), + resnet_v2_block('block3', base_depth=256, num_units=6, stride=2), + resnet_v2_block('block4', base_depth=512, num_units=3, stride=1), + ] return resnet_v2(inputs, blocks, num_classes, is_training=is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) resnet_v2_50.default_image_size = resnet_v2.default_image_size @@ -247,21 +273,20 @@ def resnet_v2_101(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=False, reuse=None, scope='resnet_v2_101'): """ResNet-101 model of [1]. 
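Note that the `resnet_v2_*` wrappers now default to `spatial_squeeze=False`, so callers receive logits with singleton spatial dimensions and squeeze them explicitly when they want `[batch, num_classes]`. A minimal sketch, assuming 224x224 inputs and a 1001-class head:

```
import tensorflow as tf

from nets import resnet_utils
from nets import resnet_v2

slim = tf.contrib.slim

images = tf.placeholder(tf.float32, [None, 224, 224, 3])
with slim.arg_scope(resnet_utils.resnet_arg_scope()):
  # With spatial_squeeze=False (the new default for the v2 wrappers), the
  # returned net keeps its singleton spatial dimensions: [batch, 1, 1, 1001].
  net, end_points = resnet_v2.resnet_v2_50(images, num_classes=1001,
                                           is_training=False)
logits = tf.squeeze(net, [1, 2], name='SpatialSqueeze')  # [batch, 1001]
```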
See resnet_v2() for arg and return description.""" blocks = [ - resnet_utils.Block( - 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), - resnet_utils.Block( - 'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]), - resnet_utils.Block( - 'block3', bottleneck, [(1024, 256, 1)] * 22 + [(1024, 256, 2)]), - resnet_utils.Block( - 'block4', bottleneck, [(2048, 512, 1)] * 3)] + resnet_v2_block('block1', base_depth=64, num_units=3, stride=2), + resnet_v2_block('block2', base_depth=128, num_units=4, stride=2), + resnet_v2_block('block3', base_depth=256, num_units=23, stride=2), + resnet_v2_block('block4', base_depth=512, num_units=3, stride=1), + ] return resnet_v2(inputs, blocks, num_classes, is_training=is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) resnet_v2_101.default_image_size = resnet_v2.default_image_size @@ -270,21 +295,20 @@ def resnet_v2_152(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=False, reuse=None, scope='resnet_v2_152'): """ResNet-152 model of [1]. See resnet_v2() for arg and return description.""" blocks = [ - resnet_utils.Block( - 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), - resnet_utils.Block( - 'block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]), - resnet_utils.Block( - 'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]), - resnet_utils.Block( - 'block4', bottleneck, [(2048, 512, 1)] * 3)] + resnet_v2_block('block1', base_depth=64, num_units=3, stride=2), + resnet_v2_block('block2', base_depth=128, num_units=8, stride=2), + resnet_v2_block('block3', base_depth=256, num_units=36, stride=2), + resnet_v2_block('block4', base_depth=512, num_units=3, stride=1), + ] return resnet_v2(inputs, blocks, num_classes, is_training=is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) resnet_v2_152.default_image_size = resnet_v2.default_image_size @@ -293,19 +317,18 @@ def resnet_v2_200(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=False, reuse=None, scope='resnet_v2_200'): """ResNet-200 model of [2]. 
See resnet_v2() for arg and return description.""" blocks = [ - resnet_utils.Block( - 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), - resnet_utils.Block( - 'block2', bottleneck, [(512, 128, 1)] * 23 + [(512, 128, 2)]), - resnet_utils.Block( - 'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]), - resnet_utils.Block( - 'block4', bottleneck, [(2048, 512, 1)] * 3)] + resnet_v2_block('block1', base_depth=64, num_units=3, stride=2), + resnet_v2_block('block2', base_depth=128, num_units=24, stride=2), + resnet_v2_block('block3', base_depth=256, num_units=36, stride=2), + resnet_v2_block('block4', base_depth=512, num_units=3, stride=1), + ] return resnet_v2(inputs, blocks, num_classes, is_training=is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) resnet_v2_200.default_image_size = resnet_v2.default_image_size diff --git a/slim/nets/resnet_v2_test.py b/slim/nets/resnet_v2_test.py index 141937d1e3bcee8888cf9f4eae090f3fe96e6580..8efe3387810e2cee932cf7e6dd5e61f3f7e0923e 100644 --- a/slim/nets/resnet_v2_test.py +++ b/slim/nets/resnet_v2_test.py @@ -156,14 +156,17 @@ class ResnetUtilsTest(tf.test.TestCase): with tf.variable_scope(scope, values=[inputs]): with slim.arg_scope([slim.conv2d], outputs_collections='end_points'): net = resnet_utils.stack_blocks_dense(inputs, blocks, output_stride) - end_points = dict(tf.get_collection('end_points')) + end_points = slim.utils.convert_collection_to_dict('end_points') return net, end_points def testEndPointsV2(self): """Test the end points of a tiny v2 bottleneck network.""" - bottleneck = resnet_v2.bottleneck - blocks = [resnet_utils.Block('block1', bottleneck, [(4, 1, 1), (4, 1, 2)]), - resnet_utils.Block('block2', bottleneck, [(8, 2, 1), (8, 2, 1)])] + blocks = [ + resnet_v2.resnet_v2_block( + 'block1', base_depth=1, num_units=2, stride=2), + resnet_v2.resnet_v2_block( + 'block2', base_depth=2, num_units=2, stride=1), + ] inputs = create_test_input(2, 32, 16, 3) with slim.arg_scope(resnet_utils.resnet_arg_scope()): _, end_points = self._resnet_plain(inputs, blocks, scope='tiny') @@ -189,30 +192,23 @@ class ResnetUtilsTest(tf.test.TestCase): for block in blocks: with tf.variable_scope(block.scope, 'block', [net]): for i, unit in enumerate(block.args): - depth, depth_bottleneck, stride = unit with tf.variable_scope('unit_%d' % (i + 1), values=[net]): - net = block.unit_fn(net, - depth=depth, - depth_bottleneck=depth_bottleneck, - stride=stride, - rate=1) + net = block.unit_fn(net, rate=1, **unit) return net - def _atrousValues(self, bottleneck): + def testAtrousValuesBottleneck(self): """Verify the values of dense feature extraction by atrous convolution. Make sure that dense feature extraction by stack_blocks_dense() followed by subsampling gives identical results to feature extraction at the nominal network output stride using the simple self._stack_blocks_nondense() above. - - Args: - bottleneck: The bottleneck function. 
""" + block = resnet_v2.resnet_v2_block blocks = [ - resnet_utils.Block('block1', bottleneck, [(4, 1, 1), (4, 1, 2)]), - resnet_utils.Block('block2', bottleneck, [(8, 2, 1), (8, 2, 2)]), - resnet_utils.Block('block3', bottleneck, [(16, 4, 1), (16, 4, 2)]), - resnet_utils.Block('block4', bottleneck, [(32, 8, 1), (32, 8, 1)]) + block('block1', base_depth=1, num_units=2, stride=2), + block('block2', base_depth=2, num_units=2, stride=2), + block('block3', base_depth=4, num_units=2, stride=2), + block('block4', base_depth=8, num_units=2, stride=1), ] nominal_stride = 8 @@ -244,9 +240,6 @@ class ResnetUtilsTest(tf.test.TestCase): output, expected = sess.run([output, expected]) self.assertAllClose(output, expected, atol=1e-4, rtol=1e-4) - def testAtrousValuesBottleneck(self): - self._atrousValues(resnet_v2.bottleneck) - class ResnetCompleteNetworkTest(tf.test.TestCase): """Tests with complete small ResNet v2 networks.""" @@ -261,16 +254,13 @@ class ResnetCompleteNetworkTest(tf.test.TestCase): reuse=None, scope='resnet_v2_small'): """A shallow and thin ResNet v2 for faster tests.""" - bottleneck = resnet_v2.bottleneck + block = resnet_v2.resnet_v2_block blocks = [ - resnet_utils.Block( - 'block1', bottleneck, [(4, 1, 1)] * 2 + [(4, 1, 2)]), - resnet_utils.Block( - 'block2', bottleneck, [(8, 2, 1)] * 2 + [(8, 2, 2)]), - resnet_utils.Block( - 'block3', bottleneck, [(16, 4, 1)] * 2 + [(16, 4, 2)]), - resnet_utils.Block( - 'block4', bottleneck, [(32, 8, 1)] * 2)] + block('block1', base_depth=1, num_units=3, stride=2), + block('block2', base_depth=2, num_units=3, stride=2), + block('block3', base_depth=4, num_units=3, stride=2), + block('block4', base_depth=8, num_units=2, stride=1), + ] return resnet_v2.resnet_v2(inputs, blocks, num_classes, is_training=is_training, global_pool=global_pool, diff --git a/slim/nets/vgg.py b/slim/nets/vgg.py index 7de2806220917462545bbaefed66d58ea6f1d904..79680702c5efb0383036376619395df8bf340a30 100644 --- a/slim/nets/vgg.py +++ b/slim/nets/vgg.py @@ -68,7 +68,8 @@ def vgg_a(inputs, is_training=True, dropout_keep_prob=0.5, spatial_squeeze=True, - scope='vgg_a'): + scope='vgg_a', + fc_conv_padding='VALID'): """Oxford Net VGG 11-Layers version A Example. Note: All the fully_connected layers have been transformed to conv2d layers. @@ -83,6 +84,11 @@ def vgg_a(inputs, spatial_squeeze: whether or not should squeeze the spatial dimensions of the outputs. Useful to remove unnecessary dimensions for classification. scope: Optional scope for the variables. + fc_conv_padding: the type of padding to use for the fully connected layer + that is implemented as a convolutional layer. Use 'SAME' padding if you + are applying the network in a fully convolutional manner and want to + get a prediction map downsampled by a factor of 32 as an output. Otherwise, + the output prediction map will be (input / 32) - 6 in case of 'VALID' padding. Returns: the last op containing the log predictions and end_points dict. @@ -103,7 +109,7 @@ def vgg_a(inputs, net = slim.repeat(net, 2, slim.conv2d, 512, [3, 3], scope='conv5') net = slim.max_pool2d(net, [2, 2], scope='pool5') # Use conv2d instead of fully_connected layers. 
- net = slim.conv2d(net, 4096, [7, 7], padding='VALID', scope='fc6') + net = slim.conv2d(net, 4096, [7, 7], padding=fc_conv_padding, scope='fc6') net = slim.dropout(net, dropout_keep_prob, is_training=is_training, scope='dropout6') net = slim.conv2d(net, 4096, [1, 1], scope='fc7') @@ -127,7 +133,8 @@ def vgg_16(inputs, is_training=True, dropout_keep_prob=0.5, spatial_squeeze=True, - scope='vgg_16'): + scope='vgg_16', + fc_conv_padding='VALID'): """Oxford Net VGG 16-Layers version D Example. Note: All the fully_connected layers have been transformed to conv2d layers. @@ -142,6 +149,11 @@ def vgg_16(inputs, spatial_squeeze: whether or not should squeeze the spatial dimensions of the outputs. Useful to remove unnecessary dimensions for classification. scope: Optional scope for the variables. + fc_conv_padding: the type of padding to use for the fully connected layer + that is implemented as a convolutional layer. Use 'SAME' padding if you + are applying the network in a fully convolutional manner and want to + get a prediction map downsampled by a factor of 32 as an output. Otherwise, + the output prediction map will be (input / 32) - 6 in case of 'VALID' padding. Returns: the last op containing the log predictions and end_points dict. @@ -162,7 +174,7 @@ def vgg_16(inputs, net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5') net = slim.max_pool2d(net, [2, 2], scope='pool5') # Use conv2d instead of fully_connected layers. - net = slim.conv2d(net, 4096, [7, 7], padding='VALID', scope='fc6') + net = slim.conv2d(net, 4096, [7, 7], padding=fc_conv_padding, scope='fc6') net = slim.dropout(net, dropout_keep_prob, is_training=is_training, scope='dropout6') net = slim.conv2d(net, 4096, [1, 1], scope='fc7') @@ -186,7 +198,8 @@ def vgg_19(inputs, is_training=True, dropout_keep_prob=0.5, spatial_squeeze=True, - scope='vgg_19'): + scope='vgg_19', + fc_conv_padding='VALID'): """Oxford Net VGG 19-Layers version E Example. Note: All the fully_connected layers have been transformed to conv2d layers. @@ -201,6 +214,11 @@ def vgg_19(inputs, spatial_squeeze: whether or not should squeeze the spatial dimensions of the outputs. Useful to remove unnecessary dimensions for classification. scope: Optional scope for the variables. + fc_conv_padding: the type of padding to use for the fully connected layer + that is implemented as a convolutional layer. Use 'SAME' padding if you + are applying the network in a fully convolutional manner and want to + get a prediction map downsampled by a factor of 32 as an output. Otherwise, + the output prediction map will be (input / 32) - 6 in case of 'VALID' padding. Returns: the last op containing the log predictions and end_points dict. @@ -221,7 +239,7 @@ def vgg_19(inputs, net = slim.repeat(net, 4, slim.conv2d, 512, [3, 3], scope='conv5') net = slim.max_pool2d(net, [2, 2], scope='pool5') # Use conv2d instead of fully_connected layers. 
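The new `fc_conv_padding` argument is what makes the VGG nets usable in a fully convolutional manner: with `'SAME'` padding on `fc6`, an input of height H yields an H/32 prediction map, whereas the default `'VALID'` padding gives (H/32) - 6. A minimal sketch, assuming a 512x512 input and an arbitrary 21-class head chosen only for illustration:

```
import tensorflow as tf

from nets import vgg

slim = tf.contrib.slim

# A 512x512 input reaches fc6 as a 16x16 feature map (five 2x2 max pools).
# With fc_conv_padding='SAME' the prediction map stays 16x16 = 512 / 32;
# with the default 'VALID' padding it would shrink to 10x10 = (512 / 32) - 6.
images = tf.placeholder(tf.float32, [1, 512, 512, 3])
with slim.arg_scope(vgg.vgg_arg_scope()):
  net, end_points = vgg.vgg_16(images, num_classes=21, is_training=False,
                               spatial_squeeze=False, fc_conv_padding='SAME')
print(net.get_shape())  # (1, 16, 16, 21)
```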
- net = slim.conv2d(net, 4096, [7, 7], padding='VALID', scope='fc6') + net = slim.conv2d(net, 4096, [7, 7], padding=fc_conv_padding, scope='fc6') net = slim.dropout(net, dropout_keep_prob, is_training=is_training, scope='dropout6') net = slim.conv2d(net, 4096, [1, 1], scope='fc7') diff --git a/slim/preprocessing/inception_preprocessing.py b/slim/preprocessing/inception_preprocessing.py index ca3eba0baa36622f3546e751e940c60d84b63852..b907aab1f4e610844f843ff590904859722ba237 100644 --- a/slim/preprocessing/inception_preprocessing.py +++ b/slim/preprocessing/inception_preprocessing.py @@ -241,7 +241,7 @@ def preprocess_for_eval(image, height, width, If height and width are specified it would output an image with that size by applying resize_bilinear. - If central_fraction is specified it would cropt the central fraction of the + If central_fraction is specified it would crop the central fraction of the input image. Args: diff --git a/slim/preprocessing/preprocessing_factory.py b/slim/preprocessing/preprocessing_factory.py index 35f8645ef92f35fc74e5798fb0a4bf5a09b28730..3ab79a01291559afb668e368034f06d1e5dae6d7 100644 --- a/slim/preprocessing/preprocessing_factory.py +++ b/slim/preprocessing/preprocessing_factory.py @@ -53,12 +53,10 @@ def get_preprocessing(name, is_training=False): 'inception_v4': inception_preprocessing, 'inception_resnet_v2': inception_preprocessing, 'lenet': lenet_preprocessing, + 'mobilenet_v1': inception_preprocessing, 'resnet_v1_50': vgg_preprocessing, 'resnet_v1_101': vgg_preprocessing, 'resnet_v1_152': vgg_preprocessing, - 'resnet_v2_50': vgg_preprocessing, - 'resnet_v2_101': vgg_preprocessing, - 'resnet_v2_152': vgg_preprocessing, 'vgg': vgg_preprocessing, 'vgg_a': vgg_preprocessing, 'vgg_16': vgg_preprocessing, diff --git a/slim/preprocessing/vgg_preprocessing.py b/slim/preprocessing/vgg_preprocessing.py index 1900cae220972f71834b96bfacd32b167b4645ef..c2c92f0a70a1c7b5f15f9232d59373f243eabe62 100644 --- a/slim/preprocessing/vgg_preprocessing.py +++ b/slim/preprocessing/vgg_preprocessing.py @@ -34,8 +34,6 @@ from __future__ import print_function import tensorflow as tf -from tensorflow.python.ops import control_flow_ops - slim = tf.contrib.slim _R_MEAN = 123.68 @@ -71,9 +69,8 @@ def _crop(image, offset_height, offset_width, crop_height, crop_width): rank_assertion = tf.Assert( tf.equal(tf.rank(image), 3), ['Rank of image must be equal to 3.']) - cropped_shape = control_flow_ops.with_dependencies( - [rank_assertion], - tf.stack([crop_height, crop_width, original_shape[2]])) + with tf.control_dependencies([rank_assertion]): + cropped_shape = tf.stack([crop_height, crop_width, original_shape[2]]) size_assertion = tf.Assert( tf.logical_and( @@ -85,9 +82,8 @@ def _crop(image, offset_height, offset_width, crop_height, crop_width): # Use tf.slice instead of crop_to_bounding box as it accepts tensors to # define the crop size. 
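With the `mobilenet_v1` entry added to the preprocessing factory above, MobileNet images go through the same Inception-style preprocessing as the other Inception-family nets. A minimal sketch of pulling a preprocessing function from the factory (the placeholder is illustrative):

```
import tensorflow as tf

from preprocessing import preprocessing_factory

# MobileNet reuses inception_preprocessing, which crops/resizes and scales
# pixel values into the [-1, 1] range expected by the Inception-family nets.
image_preprocessing_fn = preprocessing_factory.get_preprocessing(
    'mobilenet_v1', is_training=False)

raw_image = tf.placeholder(tf.uint8, [None, None, 3])
processed_image = image_preprocessing_fn(raw_image, 224, 224)
# processed_image is a float32 tensor of shape [224, 224, 3].
```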
- image = control_flow_ops.with_dependencies( - [size_assertion], - tf.slice(image, offsets, cropped_shape)) + with tf.control_dependencies([size_assertion]): + image = tf.slice(image, offsets, cropped_shape) return tf.reshape(image, cropped_shape) @@ -126,9 +122,8 @@ def _random_crop(image_list, crop_height, crop_width): image_list[i].name, 3, image_rank]) rank_assertions.append(rank_assert) - image_shape = control_flow_ops.with_dependencies( - [rank_assertions[0]], - tf.shape(image_list[0])) + with tf.control_dependencies([rank_assertions[0]]): + image_shape = tf.shape(image_list[0]) image_height = image_shape[0] image_width = image_shape[1] crop_size_assert = tf.Assert( @@ -142,8 +137,8 @@ def _random_crop(image_list, crop_height, crop_width): for i in range(1, len(image_list)): image = image_list[i] asserts.append(rank_assertions[i]) - shape = control_flow_ops.with_dependencies([rank_assertions[i]], - tf.shape(image)) + with tf.control_dependencies([rank_assertions[i]]): + shape = tf.shape(image) height = shape[0] width = shape[1] @@ -162,10 +157,10 @@ def _random_crop(image_list, crop_height, crop_width): # Use tf.random_uniform and not numpy.random.rand as doing the former would # generate random numbers at graph eval time, unlike the latter which # generates random numbers at graph definition time. - max_offset_height = control_flow_ops.with_dependencies( - asserts, tf.reshape(image_height - crop_height + 1, [])) - max_offset_width = control_flow_ops.with_dependencies( - asserts, tf.reshape(image_width - crop_width + 1, [])) + with tf.control_dependencies(asserts): + max_offset_height = tf.reshape(image_height - crop_height + 1, []) + with tf.control_dependencies(asserts): + max_offset_width = tf.reshape(image_width - crop_width + 1, []) offset_height = tf.random_uniform( [], maxval=max_offset_height, dtype=tf.int32) offset_width = tf.random_uniform( diff --git a/slim/scripts/finetune_inception_v1_on_flowers.sh b/slim/scripts/finetune_inception_v1_on_flowers.sh index 480b46c0991aec159e52b2df4972cf10f7f03cce..d152e367a7a4cc44bbd381beb9b50e1972468942 100644 --- a/slim/scripts/finetune_inception_v1_on_flowers.sh +++ b/slim/scripts/finetune_inception_v1_on_flowers.sh @@ -8,6 +8,7 @@ # Usage: # cd slim # ./slim/scripts/finetune_inception_v1_on_flowers.sh +set -e # Where the pre-trained InceptionV1 checkpoint is saved to. PRETRAINED_CHECKPOINT_DIR=/tmp/checkpoints diff --git a/slim/scripts/finetune_inception_v3_on_flowers.sh b/slim/scripts/finetune_inception_v3_on_flowers.sh index dfcc87ac8734ee5a5007c2e517ccfb1f7e0a50c5..627e42c063c4c0569508e9152bf9fa37c47c17ac 100644 --- a/slim/scripts/finetune_inception_v3_on_flowers.sh +++ b/slim/scripts/finetune_inception_v3_on_flowers.sh @@ -8,6 +8,7 @@ # Usage: # cd slim # ./slim/scripts/finetune_inceptionv3_on_flowers.sh +set -e # Where the pre-trained InceptionV3 checkpoint is saved to. PRETRAINED_CHECKPOINT_DIR=/tmp/checkpoints diff --git a/slim/scripts/finetune_resnet_v1_50_on_flowers.sh b/slim/scripts/finetune_resnet_v1_50_on_flowers.sh index 0465e06b5281c60ff0052d805c51ad3950a390dd..8134dfc3d5bbb516784bb3eec8180e5dbc2fde52 100644 --- a/slim/scripts/finetune_resnet_v1_50_on_flowers.sh +++ b/slim/scripts/finetune_resnet_v1_50_on_flowers.sh @@ -8,6 +8,7 @@ # Usage: # cd slim # ./slim/scripts/finetune_resnet_v1_50_on_flowers.sh +set -e # Where the pre-trained ResNetV1-50 checkpoint is saved to. 
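The `vgg_preprocessing` edits above replace the internal `control_flow_ops.with_dependencies` helper with the public `tf.control_dependencies` context manager; any op created inside the block runs only after the listed assertions. A minimal sketch of the same pattern in isolation:

```
import tensorflow as tf

image = tf.placeholder(tf.float32, [None, None, 3])
rank_assertion = tf.Assert(tf.equal(tf.rank(image), 3),
                           ['Rank of image must be equal to 3.'])
with tf.control_dependencies([rank_assertion]):
  # Ops created inside the block (here tf.shape) only run after the
  # assertion, which is what with_dependencies used to guarantee.
  image_shape = tf.shape(image)
```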
PRETRAINED_CHECKPOINT_DIR=/tmp/checkpoints diff --git a/slim/scripts/train_cifarnet_on_cifar10.sh b/slim/scripts/train_cifarnet_on_cifar10.sh index daefb22e13b576b6731b8c6637135db8ef3acc8d..bee535a7719ef91672d1b9a6220f569d42c103de 100644 --- a/slim/scripts/train_cifarnet_on_cifar10.sh +++ b/slim/scripts/train_cifarnet_on_cifar10.sh @@ -8,6 +8,7 @@ # Usage: # cd slim # ./scripts/train_cifar_net_on_mnist.sh +set -e # Where the checkpoint and logs will be saved to. TRAIN_DIR=/tmp/cifarnet-model diff --git a/slim/scripts/train_lenet_on_mnist.sh b/slim/scripts/train_lenet_on_mnist.sh index 8dbeff2a00a6f76f8528bfa2e0da2cb19327616c..e5371eba52773cdd5c0b5fa4318fd26a395daa6b 100644 --- a/slim/scripts/train_lenet_on_mnist.sh +++ b/slim/scripts/train_lenet_on_mnist.sh @@ -8,6 +8,7 @@ # Usage: # cd slim # ./slim/scripts/train_lenet_on_mnist.sh +set -e # Where the checkpoint and logs will be saved to. TRAIN_DIR=/tmp/lenet-model diff --git a/slim/setup.py b/slim/setup.py new file mode 100644 index 0000000000000000000000000000000000000000..4262a4ee3190bcb6d14391000de4beb75e2ee257 --- /dev/null +++ b/slim/setup.py @@ -0,0 +1,13 @@ +"""Setup script for slim.""" + +from setuptools import find_packages +from setuptools import setup + + +setup( + name='slim', + version='0.1', + include_package_data=True, + packages=find_packages(), + description='tf-slim', +) diff --git a/slim/slim_walkthrough.ipynb b/slim/slim_walkthrough.ipynb index 94bafc47a890322622f3388a7e5705bc40ec1f76..dff43e03bd3184c3877c28eda475f088e89fb6e8 100644 --- a/slim/slim_walkthrough.ipynb +++ b/slim/slim_walkthrough.ipynb @@ -29,11 +29,14 @@ "## Installation and setup\n", "\n", "\n", - "As of 8/28/16, the latest stable release of TF is r0.10, which does not contain the latest version of slim.\n", - "To obtain the latest version of TF-Slim, please install the most recent nightly build of TF\n", - "as explained [here](https://github.com/tensorflow/models/tree/master/slim#installing-latest-version-of-tf-slim).\n", + "Since the stable release of TF 1.0, the latest version of slim has been available as `tf.contrib.slim`.\n", + "To test that your installation is working, execute the following command; it should run without raising any errors.\n", "\n", - "To use TF-Slim for image classification (as we do in this notebook), you also have to install the TF-Slim image models library from [here](https://github.com/tensorflow/models/tree/master/slim). Let's suppose you install this into a directory called TF_MODELS. Then you should change directory to TF_MODELS/slim **before** running this notebook, so that these files are in your python path.\n", + "```\n", + "python -c \"import tensorflow.contrib.slim as slim; eval = slim.evaluation.evaluate_once\"\n", + "```\n", + "\n", + "Although, to use TF-Slim for image classification (as we do in this notebook), you also have to install the TF-Slim image models library from [here](https://github.com/tensorflow/models/tree/master/slim). Let's suppose you install this into a directory called TF_MODELS. Then you should change directory to TF_MODELS/slim **before** running this notebook, so that these files are in your python path.\n", "\n", "To check you've got these two steps to work, just execute the cell below. 
If it complains about unknown modules, restart the notebook after moving to the TF-Slim models directory.\n" ] @@ -42,10 +45,14 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ + "from __future__ import absolute_import\n", + "from __future__ import division\n", + "from __future__ import print_function\n", + "\n", "import matplotlib\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", @@ -57,7 +64,7 @@ "from datasets import dataset_utils\n", "\n", "# Main slim library\n", - "slim = tf.contrib.slim" + "from tensorflow.contrib import slim" ] }, { @@ -143,7 +150,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -156,15 +163,15 @@ " predictions, end_points = regression_model(inputs)\n", "\n", " # Print name and shape of each tensor.\n", - " print \"Layers\"\n", - " for k, v in end_points.iteritems():\n", - " print 'name = {}, shape = {}'.format(v.name, v.get_shape())\n", + " print(\"Layers\")\n", + " for k, v in end_points.items():\n", + " print('name = {}, shape = {}'.format(v.name, v.get_shape()))\n", "\n", " # Print name and shape of parameter nodes (values not yet initialized)\n", - " print \"\\n\"\n", - " print \"Parameters\"\n", + " print(\"\\n\")\n", + " print(\"Parameters\")\n", " for v in slim.get_model_variables():\n", - " print 'name = {}, shape = {}'.format(v.name, v.get_shape())\n" + " print('name = {}, shape = {}'.format(v.name, v.get_shape()))\n" ] }, { @@ -180,7 +187,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -228,7 +235,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -280,7 +287,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -330,7 +337,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -367,7 +374,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -391,7 +398,7 @@ " final_op=names_to_value_nodes.values())\n", "\n", " names_to_values = dict(zip(names_to_value_nodes.keys(), metric_values))\n", - " for key, value in names_to_values.iteritems():\n", + " for key, value in names_to_values.items():\n", " print('%s: %f' % (key, value))" ] }, @@ -441,7 +448,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -468,14 +475,14 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ "from datasets import flowers\n", "import tensorflow as tf\n", "\n", - "slim = tf.contrib.slim\n", + "from tensorflow.contrib import slim\n", "\n", "with tf.Graph().as_default(): \n", " dataset = flowers.get_split('train', flowers_data_dir)\n", @@ -485,7 +492,7 @@ " \n", " with tf.Session() as sess: \n", " with slim.queues.QueueRunners(sess):\n", - " for i in xrange(4):\n", + " for i in range(4):\n", " np_image, np_label = sess.run([image, label])\n", " height, width, _ = np_image.shape\n", " class_name = name = dataset.labels_to_names[np_label]\n", @@ -547,7 +554,7 @@ "cell_type": "code", "execution_count": null, 
"metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -599,14 +606,14 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ "from preprocessing import inception_preprocessing\n", "import tensorflow as tf\n", "\n", - "slim = tf.contrib.slim\n", + "from tensorflow.contrib import slim\n", "\n", "\n", "def load_batch(dataset, batch_size=32, height=299, width=299, is_training=False):\n", @@ -651,7 +658,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -706,7 +713,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -771,7 +778,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -802,26 +809,30 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import os\n", "import tensorflow as tf\n", - "import urllib2\n", + "\n", + "try:\n", + " import urllib2\n", + "except ImportError:\n", + " import urllib.request as urllib\n", "\n", "from datasets import imagenet\n", "from nets import inception\n", "from preprocessing import inception_preprocessing\n", "\n", - "slim = tf.contrib.slim\n", + "from tensorflow.contrib import slim\n", "\n", "image_size = inception.inception_v1.default_image_size\n", "\n", "with tf.Graph().as_default():\n", " url = 'https://upload.wikimedia.org/wikipedia/commons/7/70/EnglishCockerSpaniel_simon.jpg'\n", - " image_string = urllib2.urlopen(url).read()\n", + " image_string = urllib.urlopen(url).read()\n", " image = tf.image.decode_jpeg(image_string, channels=3)\n", " processed_image = inception_preprocessing.preprocess_image(image, image_size, image_size, is_training=False)\n", " processed_images = tf.expand_dims(processed_image, 0)\n", @@ -849,7 +860,7 @@ " names = imagenet.create_readable_names_for_imagenet_labels()\n", " for i in range(5):\n", " index = sorted_inds[i]\n", - " print('Probability %0.2f%% => [%s]' % (probabilities[index], names[index]))" + " print('Probability %0.2f%% => [%s]' % (probabilities[index] * 100, names[index]))" ] }, { @@ -902,19 +913,23 @@ "import numpy as np\n", "import os\n", "import tensorflow as tf\n", - "import urllib2\n", + "\n", + "try:\n", + " import urllib2\n", + "except ImportError:\n", + " import urllib.request as urllib\n", "\n", "from datasets import imagenet\n", "from nets import vgg\n", "from preprocessing import vgg_preprocessing\n", "\n", - "slim = tf.contrib.slim\n", + "from tensorflow.contrib import slim\n", "\n", "image_size = vgg.vgg_16.default_image_size\n", "\n", "with tf.Graph().as_default():\n", " url = 'https://upload.wikimedia.org/wikipedia/commons/d/d9/First_Student_IC_school_bus_202076.jpg'\n", - " image_string = urllib2.urlopen(url).read()\n", + " image_string = urllib.urlopen(url).read()\n", " image = tf.image.decode_jpeg(image_string, channels=3)\n", " processed_image = vgg_preprocessing.preprocess_image(image, image_size, image_size, is_training=False)\n", " processed_images = tf.expand_dims(processed_image, 0)\n", @@ -944,7 +959,7 @@ " for i in range(5):\n", " index = sorted_inds[i]\n", " # Shift the index of a class name by one. 
\n", - " print('Probability %0.2f%% => [%s]' % (probabilities[index], names[index+1]))" + " print('Probability %0.2f%% => [%s]' % (probabilities[index] * 100, names[index+1]))" ] }, { @@ -960,7 +975,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -972,7 +987,7 @@ "from nets import inception\n", "from preprocessing import inception_preprocessing\n", "\n", - "slim = tf.contrib.slim\n", + "from tensorflow.contrib import slim\n", "image_size = inception.inception_v1.default_image_size\n", "\n", "\n", @@ -1043,7 +1058,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": true }, "outputs": [], "source": [ @@ -1052,7 +1067,7 @@ "from datasets import flowers\n", "from nets import inception\n", "\n", - "slim = tf.contrib.slim\n", + "from tensorflow.contrib import slim\n", "\n", "image_size = inception.inception_v1.default_image_size\n", "batch_size = 3\n", @@ -1080,7 +1095,7 @@ " init_fn(sess)\n", " np_probabilities, np_images_raw, np_labels = sess.run([probabilities, images_raw, labels])\n", " \n", - " for i in xrange(batch_size): \n", + " for i in range(batch_size): \n", " image = np_images_raw[i, :, :, :]\n", " true_label = np_labels[i]\n", " predicted_label = np.argmax(np_probabilities[i, :])\n", @@ -1093,27 +1108,36 @@ " plt.axis('off')\n", " plt.show()" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] } ], "metadata": { "kernelspec": { - "display_name": "Python 2", + "display_name": "Python 3", "language": "python", - "name": "python2" + "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", - "version": 2 + "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython2", - "version": "2.7.11" + "pygments_lexer": "ipython3", + "version": "3.6.1" } }, "nbformat": 4, - "nbformat_minor": 0 + "nbformat_minor": 1 } diff --git a/slim/train_image_classifier.py b/slim/train_image_classifier.py old mode 100644 new mode 100755 index 8b7b24488d229c1eab580c30e84ca09edc2580ae..21180edb9e8ff342ed051d024b76c4964850257a --- a/slim/train_image_classifier.py +++ b/slim/train_image_classifier.py @@ -20,7 +20,6 @@ from __future__ import print_function import tensorflow as tf -from tensorflow.python.ops import control_flow_ops from datasets import dataset_factory from deployment import model_deploy from nets import nets_factory @@ -118,8 +117,6 @@ tf.app.flags.DEFINE_float( 'momentum', 0.9, 'The momentum for the MomentumOptimizer and RMSPropOptimizer.') -tf.app.flags.DEFINE_float('rmsprop_momentum', 0.9, 'Momentum.') - tf.app.flags.DEFINE_float('rmsprop_decay', 0.9, 'Decay term for RMSProp.') ####################### @@ -304,7 +301,7 @@ def _configure_optimizer(learning_rate): optimizer = tf.train.RMSPropOptimizer( learning_rate, decay=FLAGS.rmsprop_decay, - momentum=FLAGS.rmsprop_momentum, + momentum=FLAGS.momentum, epsilon=FLAGS.opt_epsilon) elif FLAGS.optimizer == 'sgd': optimizer = tf.train.GradientDescentOptimizer(learning_rate) @@ -312,15 +309,6 @@ def _configure_optimizer(learning_rate): raise ValueError('Optimizer [%s] was not recognized', FLAGS.optimizer) return optimizer - -def _add_variables_summaries(learning_rate): - summaries = [] - for variable in slim.get_model_variables(): - summaries.append(tf.summary.histogram(variable.op.name, variable)) - 
summaries.append(tf.summary.scalar('training/Learning Rate', learning_rate)) - return summaries - - def _get_init_fn(): """Returns a function run by the chief worker to warm-start the training. @@ -462,7 +450,8 @@ def main(_): #################### def clone_fn(batch_queue): """Allows data parallelism by creating multiple clones of network_fn.""" - images, labels = batch_queue.dequeue() + with tf.device(deploy_config.inputs_device()): + images, labels = batch_queue.dequeue() logits, end_points = network_fn(images) ############################# @@ -551,8 +540,8 @@ def main(_): update_ops.append(grad_updates) update_op = tf.group(*update_ops) - train_tensor = control_flow_ops.with_dependencies([update_op], total_loss, - name='train_op') + with tf.control_dependencies([update_op]): + train_tensor = tf.identity(total_loss, name='train_op') # Add the summaries from the first clone. These contain the summaries # created by model_fn and either optimize_clones() or _gather_clone_loss(). diff --git a/street/README.md b/street/README.md index 64ab6b656f7fb49e731c0ac8479b9d3bcb464e78..b63b99b935eaae1df873b67c7afd46acf0252230 100644 --- a/street/README.md +++ b/street/README.md @@ -38,7 +38,7 @@ Avenue des Sapins ## Installing and setting up the STREET model -[Install Tensorflow](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#virtualenv-installation) +[Install Tensorflow](https://www.tensorflow.org/install/) Install numpy: @@ -54,6 +54,10 @@ TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())') g++ -std=c++11 -shared rnn_ops.cc -o rnn_ops.so -fPIC -I $TF_INC -O3 -mavx ``` +(Note: if running on Mac, add `-undefined dynamic_lookup` to your `g++` command. +If you are running a newer version of gcc, you may also need to add +`-D_GLIBCXX_USE_CXX11_ABI=0`.) + Run the unittests: ``` @@ -75,6 +79,7 @@ Note that these datasets are very large. The approximate sizes are: * Validation: 64 files of 40MB each. * Test: 64 files of 50MB each. * Testdata: some smaller data files of a few MB for testing. +* Total: ~158 Gb. Here is a list of the download paths: @@ -95,9 +100,14 @@ https://download.tensorflow.org/data/fsns-20160927/validation/validation-00000-o https://download.tensorflow.org/data/fsns-20160927/validation/validation-00063-of-00064 ``` -The above files need to be downloaded individually, as they are large and -downloads are more likely to succeed with the individual files than with a -single archive containing them all. +All URLs are stored in the text file `python/fsns_urls.txt`, to download them in +parallel: + +``` +aria2c -c -j 20 -i fsns_urls.txt +``` +If you ctrl+c and re-execute the command it will continue the aborted download. + ## Confidence Tests @@ -252,4 +262,3 @@ defines a Tensor Flow graph that can be used to process images of variable sizes to output a 1-dimensional sequence, like a transcription/OCR problem, or a 0-dimensional label, as for image identification problems. For more information see [vgslspecs](g3doc/vgslspecs.md) - diff --git a/street/python/fsns_urls.py b/street/python/fsns_urls.py new file mode 100644 index 0000000000000000000000000000000000000000..bea547b9d57315e81ed69d290370f851b17784e0 --- /dev/null +++ b/street/python/fsns_urls.py @@ -0,0 +1,49 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Creates a text file with URLs to download FSNS dataset using aria2c. + +The FSNS dataset has 640 files and takes 158Gb of the disk space. So it is +highly recommended to use some kind of a download manager to download it. + +Aria2c is a powerful download manager which can download multiple files in +parallel, re-try if encounter an error and continue previously unfinished +downloads. +""" + +import os + +_FSNS_BASE_URL = 'http://download.tensorflow.org/data/fsns-20160927/' +_SHARDS = {'test': 64, 'train': 512, 'validation':64} +_OUTPUT_FILE = "fsns_urls.txt" +_OUTPUT_DIR = "data/fsns" + +def fsns_paths(): + paths = ['charset_size=134.txt'] + for name, shards in _SHARDS.items(): + for i in range(shards): + paths.append('%s/%s-%05d-of-%05d' % (name, name, i, shards)) + return paths + + +if __name__ == "__main__": + with open(_OUTPUT_FILE, "w") as f: + for path in fsns_paths(): + url = _FSNS_BASE_URL + path + dst_path = os.path.join(_OUTPUT_DIR, path) + f.write("%s\n out=%s\n" % (url, dst_path)) + print("To download FSNS dataset execute:") + print("aria2c -c -j 20 -i %s" % _OUTPUT_FILE) + print("The downloaded FSNS dataset will be stored under %s" % _OUTPUT_DIR) diff --git a/street/python/fsns_urls.txt b/street/python/fsns_urls.txt new file mode 100644 index 0000000000000000000000000000000000000000..959ffbd5d432105a2964ef2a4be07d046c7ab026 --- /dev/null +++ b/street/python/fsns_urls.txt @@ -0,0 +1,1282 @@ +http://download.tensorflow.org/data/fsns-20160927/charset_size=134.txt + out=data/fsns/charset_size=134.txt +http://download.tensorflow.org/data/fsns-20160927/test/test-00000-of-00064 + out=data/fsns/test/test-00000-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00001-of-00064 + out=data/fsns/test/test-00001-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00002-of-00064 + out=data/fsns/test/test-00002-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00003-of-00064 + out=data/fsns/test/test-00003-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00004-of-00064 + out=data/fsns/test/test-00004-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00005-of-00064 + out=data/fsns/test/test-00005-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00006-of-00064 + out=data/fsns/test/test-00006-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00007-of-00064 + out=data/fsns/test/test-00007-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00008-of-00064 + out=data/fsns/test/test-00008-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00009-of-00064 + out=data/fsns/test/test-00009-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00010-of-00064 + out=data/fsns/test/test-00010-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00011-of-00064 + out=data/fsns/test/test-00011-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00012-of-00064 + 
out=data/fsns/test/test-00012-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00013-of-00064 + out=data/fsns/test/test-00013-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00014-of-00064 + out=data/fsns/test/test-00014-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00015-of-00064 + out=data/fsns/test/test-00015-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00016-of-00064 + out=data/fsns/test/test-00016-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00017-of-00064 + out=data/fsns/test/test-00017-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00018-of-00064 + out=data/fsns/test/test-00018-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00019-of-00064 + out=data/fsns/test/test-00019-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00020-of-00064 + out=data/fsns/test/test-00020-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00021-of-00064 + out=data/fsns/test/test-00021-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00022-of-00064 + out=data/fsns/test/test-00022-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00023-of-00064 + out=data/fsns/test/test-00023-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00024-of-00064 + out=data/fsns/test/test-00024-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00025-of-00064 + out=data/fsns/test/test-00025-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00026-of-00064 + out=data/fsns/test/test-00026-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00027-of-00064 + out=data/fsns/test/test-00027-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00028-of-00064 + out=data/fsns/test/test-00028-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00029-of-00064 + out=data/fsns/test/test-00029-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00030-of-00064 + out=data/fsns/test/test-00030-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00031-of-00064 + out=data/fsns/test/test-00031-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00032-of-00064 + out=data/fsns/test/test-00032-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00033-of-00064 + out=data/fsns/test/test-00033-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00034-of-00064 + out=data/fsns/test/test-00034-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00035-of-00064 + out=data/fsns/test/test-00035-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00036-of-00064 + out=data/fsns/test/test-00036-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00037-of-00064 + out=data/fsns/test/test-00037-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00038-of-00064 + out=data/fsns/test/test-00038-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00039-of-00064 + out=data/fsns/test/test-00039-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00040-of-00064 + out=data/fsns/test/test-00040-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00041-of-00064 + out=data/fsns/test/test-00041-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00042-of-00064 + out=data/fsns/test/test-00042-of-00064 
+http://download.tensorflow.org/data/fsns-20160927/test/test-00043-of-00064 + out=data/fsns/test/test-00043-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00044-of-00064 + out=data/fsns/test/test-00044-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00045-of-00064 + out=data/fsns/test/test-00045-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00046-of-00064 + out=data/fsns/test/test-00046-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00047-of-00064 + out=data/fsns/test/test-00047-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00048-of-00064 + out=data/fsns/test/test-00048-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00049-of-00064 + out=data/fsns/test/test-00049-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00050-of-00064 + out=data/fsns/test/test-00050-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00051-of-00064 + out=data/fsns/test/test-00051-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00052-of-00064 + out=data/fsns/test/test-00052-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00053-of-00064 + out=data/fsns/test/test-00053-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00054-of-00064 + out=data/fsns/test/test-00054-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00055-of-00064 + out=data/fsns/test/test-00055-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00056-of-00064 + out=data/fsns/test/test-00056-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00057-of-00064 + out=data/fsns/test/test-00057-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00058-of-00064 + out=data/fsns/test/test-00058-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00059-of-00064 + out=data/fsns/test/test-00059-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00060-of-00064 + out=data/fsns/test/test-00060-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00061-of-00064 + out=data/fsns/test/test-00061-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00062-of-00064 + out=data/fsns/test/test-00062-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00063-of-00064 + out=data/fsns/test/test-00063-of-00064 +http://download.tensorflow.org/data/fsns-20160927/train/train-00000-of-00512 + out=data/fsns/train/train-00000-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00001-of-00512 + out=data/fsns/train/train-00001-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00002-of-00512 + out=data/fsns/train/train-00002-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00003-of-00512 + out=data/fsns/train/train-00003-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00004-of-00512 + out=data/fsns/train/train-00004-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00005-of-00512 + out=data/fsns/train/train-00005-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00006-of-00512 + out=data/fsns/train/train-00006-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00007-of-00512 + out=data/fsns/train/train-00007-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00008-of-00512 + out=data/fsns/train/train-00008-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00009-of-00512 + out=data/fsns/train/train-00009-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00010-of-00512 + out=data/fsns/train/train-00010-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00011-of-00512 + out=data/fsns/train/train-00011-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00012-of-00512 + out=data/fsns/train/train-00012-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00013-of-00512 + out=data/fsns/train/train-00013-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00014-of-00512 + out=data/fsns/train/train-00014-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00015-of-00512 + out=data/fsns/train/train-00015-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00016-of-00512 + out=data/fsns/train/train-00016-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00017-of-00512 + out=data/fsns/train/train-00017-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00018-of-00512 + out=data/fsns/train/train-00018-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00019-of-00512 + out=data/fsns/train/train-00019-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00020-of-00512 + out=data/fsns/train/train-00020-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00021-of-00512 + out=data/fsns/train/train-00021-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00022-of-00512 + out=data/fsns/train/train-00022-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00023-of-00512 + out=data/fsns/train/train-00023-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00024-of-00512 + out=data/fsns/train/train-00024-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00025-of-00512 + out=data/fsns/train/train-00025-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00026-of-00512 + out=data/fsns/train/train-00026-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00027-of-00512 + out=data/fsns/train/train-00027-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00028-of-00512 + out=data/fsns/train/train-00028-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00029-of-00512 + out=data/fsns/train/train-00029-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00030-of-00512 + out=data/fsns/train/train-00030-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00031-of-00512 + out=data/fsns/train/train-00031-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00032-of-00512 + out=data/fsns/train/train-00032-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00033-of-00512 + out=data/fsns/train/train-00033-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00034-of-00512 + out=data/fsns/train/train-00034-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00035-of-00512 + out=data/fsns/train/train-00035-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00036-of-00512 + out=data/fsns/train/train-00036-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00037-of-00512 + out=data/fsns/train/train-00037-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00038-of-00512 + out=data/fsns/train/train-00038-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00039-of-00512 + out=data/fsns/train/train-00039-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00040-of-00512 + out=data/fsns/train/train-00040-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00041-of-00512 + out=data/fsns/train/train-00041-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00042-of-00512 + out=data/fsns/train/train-00042-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00043-of-00512 + out=data/fsns/train/train-00043-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00044-of-00512 + out=data/fsns/train/train-00044-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00045-of-00512 + out=data/fsns/train/train-00045-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00046-of-00512 + out=data/fsns/train/train-00046-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00047-of-00512 + out=data/fsns/train/train-00047-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00048-of-00512 + out=data/fsns/train/train-00048-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00049-of-00512 + out=data/fsns/train/train-00049-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00050-of-00512 + out=data/fsns/train/train-00050-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00051-of-00512 + out=data/fsns/train/train-00051-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00052-of-00512 + out=data/fsns/train/train-00052-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00053-of-00512 + out=data/fsns/train/train-00053-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00054-of-00512 + out=data/fsns/train/train-00054-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00055-of-00512 + out=data/fsns/train/train-00055-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00056-of-00512 + out=data/fsns/train/train-00056-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00057-of-00512 + out=data/fsns/train/train-00057-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00058-of-00512 + out=data/fsns/train/train-00058-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00059-of-00512 + out=data/fsns/train/train-00059-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00060-of-00512 + out=data/fsns/train/train-00060-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00061-of-00512 + out=data/fsns/train/train-00061-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00062-of-00512 + out=data/fsns/train/train-00062-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00063-of-00512 + out=data/fsns/train/train-00063-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00064-of-00512 + out=data/fsns/train/train-00064-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00065-of-00512 + out=data/fsns/train/train-00065-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00066-of-00512 + out=data/fsns/train/train-00066-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00067-of-00512 + out=data/fsns/train/train-00067-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00068-of-00512 + out=data/fsns/train/train-00068-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00069-of-00512 + out=data/fsns/train/train-00069-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00070-of-00512 + out=data/fsns/train/train-00070-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00071-of-00512 + out=data/fsns/train/train-00071-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00072-of-00512 + out=data/fsns/train/train-00072-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00073-of-00512 + out=data/fsns/train/train-00073-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00074-of-00512 + out=data/fsns/train/train-00074-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00075-of-00512 + out=data/fsns/train/train-00075-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00076-of-00512 + out=data/fsns/train/train-00076-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00077-of-00512 + out=data/fsns/train/train-00077-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00078-of-00512 + out=data/fsns/train/train-00078-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00079-of-00512 + out=data/fsns/train/train-00079-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00080-of-00512 + out=data/fsns/train/train-00080-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00081-of-00512 + out=data/fsns/train/train-00081-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00082-of-00512 + out=data/fsns/train/train-00082-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00083-of-00512 + out=data/fsns/train/train-00083-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00084-of-00512 + out=data/fsns/train/train-00084-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00085-of-00512 + out=data/fsns/train/train-00085-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00086-of-00512 + out=data/fsns/train/train-00086-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00087-of-00512 + out=data/fsns/train/train-00087-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00088-of-00512 + out=data/fsns/train/train-00088-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00089-of-00512 + out=data/fsns/train/train-00089-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00090-of-00512 + out=data/fsns/train/train-00090-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00091-of-00512 + out=data/fsns/train/train-00091-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00092-of-00512 + out=data/fsns/train/train-00092-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00093-of-00512 + out=data/fsns/train/train-00093-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00094-of-00512 + out=data/fsns/train/train-00094-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00095-of-00512 + out=data/fsns/train/train-00095-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00096-of-00512 + out=data/fsns/train/train-00096-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00097-of-00512 + out=data/fsns/train/train-00097-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00098-of-00512 + out=data/fsns/train/train-00098-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00099-of-00512 + out=data/fsns/train/train-00099-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00100-of-00512 + out=data/fsns/train/train-00100-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00101-of-00512 + out=data/fsns/train/train-00101-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00102-of-00512 + out=data/fsns/train/train-00102-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00103-of-00512 + out=data/fsns/train/train-00103-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00104-of-00512 + out=data/fsns/train/train-00104-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00105-of-00512 + out=data/fsns/train/train-00105-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00106-of-00512 + out=data/fsns/train/train-00106-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00107-of-00512 + out=data/fsns/train/train-00107-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00108-of-00512 + out=data/fsns/train/train-00108-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00109-of-00512 + out=data/fsns/train/train-00109-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00110-of-00512 + out=data/fsns/train/train-00110-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00111-of-00512 + out=data/fsns/train/train-00111-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00112-of-00512 + out=data/fsns/train/train-00112-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00113-of-00512 + out=data/fsns/train/train-00113-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00114-of-00512 + out=data/fsns/train/train-00114-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00115-of-00512 + out=data/fsns/train/train-00115-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00116-of-00512 + out=data/fsns/train/train-00116-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00117-of-00512 + out=data/fsns/train/train-00117-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00118-of-00512 + out=data/fsns/train/train-00118-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00119-of-00512 + out=data/fsns/train/train-00119-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00120-of-00512 + out=data/fsns/train/train-00120-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00121-of-00512 + out=data/fsns/train/train-00121-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00122-of-00512 + out=data/fsns/train/train-00122-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00123-of-00512 + out=data/fsns/train/train-00123-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00124-of-00512 + out=data/fsns/train/train-00124-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00125-of-00512 + out=data/fsns/train/train-00125-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00126-of-00512 + out=data/fsns/train/train-00126-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00127-of-00512 + out=data/fsns/train/train-00127-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00128-of-00512 + out=data/fsns/train/train-00128-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00129-of-00512 + out=data/fsns/train/train-00129-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00130-of-00512 + out=data/fsns/train/train-00130-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00131-of-00512 + out=data/fsns/train/train-00131-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00132-of-00512 + out=data/fsns/train/train-00132-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00133-of-00512 + out=data/fsns/train/train-00133-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00134-of-00512 + out=data/fsns/train/train-00134-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00135-of-00512 + out=data/fsns/train/train-00135-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00136-of-00512 + out=data/fsns/train/train-00136-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00137-of-00512 + out=data/fsns/train/train-00137-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00138-of-00512 + out=data/fsns/train/train-00138-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00139-of-00512 + out=data/fsns/train/train-00139-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00140-of-00512 + out=data/fsns/train/train-00140-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00141-of-00512 + out=data/fsns/train/train-00141-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00142-of-00512 + out=data/fsns/train/train-00142-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00143-of-00512 + out=data/fsns/train/train-00143-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00144-of-00512 + out=data/fsns/train/train-00144-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00145-of-00512 + out=data/fsns/train/train-00145-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00146-of-00512 + out=data/fsns/train/train-00146-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00147-of-00512 + out=data/fsns/train/train-00147-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00148-of-00512 + out=data/fsns/train/train-00148-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00149-of-00512 + out=data/fsns/train/train-00149-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00150-of-00512 + out=data/fsns/train/train-00150-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00151-of-00512 + out=data/fsns/train/train-00151-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00152-of-00512 + out=data/fsns/train/train-00152-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00153-of-00512 + out=data/fsns/train/train-00153-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00154-of-00512 + out=data/fsns/train/train-00154-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00155-of-00512 + out=data/fsns/train/train-00155-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00156-of-00512 + out=data/fsns/train/train-00156-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00157-of-00512 + out=data/fsns/train/train-00157-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00158-of-00512 + out=data/fsns/train/train-00158-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00159-of-00512 + out=data/fsns/train/train-00159-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00160-of-00512 + out=data/fsns/train/train-00160-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00161-of-00512 + out=data/fsns/train/train-00161-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00162-of-00512 + out=data/fsns/train/train-00162-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00163-of-00512 + out=data/fsns/train/train-00163-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00164-of-00512 + out=data/fsns/train/train-00164-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00165-of-00512 + out=data/fsns/train/train-00165-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00166-of-00512 + out=data/fsns/train/train-00166-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00167-of-00512 + out=data/fsns/train/train-00167-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00168-of-00512 + out=data/fsns/train/train-00168-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00169-of-00512 + out=data/fsns/train/train-00169-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00170-of-00512 + out=data/fsns/train/train-00170-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00171-of-00512 + out=data/fsns/train/train-00171-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00172-of-00512 + out=data/fsns/train/train-00172-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00173-of-00512 + out=data/fsns/train/train-00173-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00174-of-00512 + out=data/fsns/train/train-00174-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00175-of-00512 + out=data/fsns/train/train-00175-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00176-of-00512 + out=data/fsns/train/train-00176-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00177-of-00512 + out=data/fsns/train/train-00177-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00178-of-00512 + out=data/fsns/train/train-00178-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00179-of-00512 + out=data/fsns/train/train-00179-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00180-of-00512 + out=data/fsns/train/train-00180-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00181-of-00512 + out=data/fsns/train/train-00181-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00182-of-00512 + out=data/fsns/train/train-00182-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00183-of-00512 + out=data/fsns/train/train-00183-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00184-of-00512 + out=data/fsns/train/train-00184-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00185-of-00512 + out=data/fsns/train/train-00185-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00186-of-00512 + out=data/fsns/train/train-00186-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00187-of-00512 + out=data/fsns/train/train-00187-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00188-of-00512 + out=data/fsns/train/train-00188-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00189-of-00512 + out=data/fsns/train/train-00189-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00190-of-00512 + out=data/fsns/train/train-00190-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00191-of-00512 + out=data/fsns/train/train-00191-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00192-of-00512 + out=data/fsns/train/train-00192-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00193-of-00512 + out=data/fsns/train/train-00193-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00194-of-00512 + out=data/fsns/train/train-00194-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00195-of-00512 + out=data/fsns/train/train-00195-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00196-of-00512 + out=data/fsns/train/train-00196-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00197-of-00512 + out=data/fsns/train/train-00197-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00198-of-00512 + out=data/fsns/train/train-00198-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00199-of-00512 + out=data/fsns/train/train-00199-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00200-of-00512 + out=data/fsns/train/train-00200-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00201-of-00512 + out=data/fsns/train/train-00201-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00202-of-00512 + out=data/fsns/train/train-00202-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00203-of-00512 + out=data/fsns/train/train-00203-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00204-of-00512 + out=data/fsns/train/train-00204-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00205-of-00512 + out=data/fsns/train/train-00205-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00206-of-00512 + out=data/fsns/train/train-00206-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00207-of-00512 + out=data/fsns/train/train-00207-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00208-of-00512 + out=data/fsns/train/train-00208-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00209-of-00512 + out=data/fsns/train/train-00209-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00210-of-00512 + out=data/fsns/train/train-00210-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00211-of-00512 + out=data/fsns/train/train-00211-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00212-of-00512 + out=data/fsns/train/train-00212-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00213-of-00512 + out=data/fsns/train/train-00213-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00214-of-00512 + out=data/fsns/train/train-00214-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00215-of-00512 + out=data/fsns/train/train-00215-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00216-of-00512 + out=data/fsns/train/train-00216-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00217-of-00512 + out=data/fsns/train/train-00217-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00218-of-00512 + out=data/fsns/train/train-00218-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00219-of-00512 + out=data/fsns/train/train-00219-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00220-of-00512 + out=data/fsns/train/train-00220-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00221-of-00512 + out=data/fsns/train/train-00221-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00222-of-00512 + out=data/fsns/train/train-00222-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00223-of-00512 + out=data/fsns/train/train-00223-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00224-of-00512 + out=data/fsns/train/train-00224-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00225-of-00512 + out=data/fsns/train/train-00225-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00226-of-00512 + out=data/fsns/train/train-00226-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00227-of-00512 + out=data/fsns/train/train-00227-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00228-of-00512 + out=data/fsns/train/train-00228-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00229-of-00512 + out=data/fsns/train/train-00229-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00230-of-00512 + out=data/fsns/train/train-00230-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00231-of-00512 + out=data/fsns/train/train-00231-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00232-of-00512 + out=data/fsns/train/train-00232-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00233-of-00512 + out=data/fsns/train/train-00233-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00234-of-00512 + out=data/fsns/train/train-00234-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00235-of-00512 + out=data/fsns/train/train-00235-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00236-of-00512 + out=data/fsns/train/train-00236-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00237-of-00512 + out=data/fsns/train/train-00237-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00238-of-00512 + out=data/fsns/train/train-00238-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00239-of-00512 + out=data/fsns/train/train-00239-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00240-of-00512 + out=data/fsns/train/train-00240-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00241-of-00512 + out=data/fsns/train/train-00241-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00242-of-00512 + out=data/fsns/train/train-00242-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00243-of-00512 + out=data/fsns/train/train-00243-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00244-of-00512 + out=data/fsns/train/train-00244-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00245-of-00512 + out=data/fsns/train/train-00245-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00246-of-00512 + out=data/fsns/train/train-00246-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00247-of-00512 + out=data/fsns/train/train-00247-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00248-of-00512 + out=data/fsns/train/train-00248-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00249-of-00512 + out=data/fsns/train/train-00249-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00250-of-00512 + out=data/fsns/train/train-00250-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00251-of-00512 + out=data/fsns/train/train-00251-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00252-of-00512 + out=data/fsns/train/train-00252-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00253-of-00512 + out=data/fsns/train/train-00253-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00254-of-00512 + out=data/fsns/train/train-00254-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00255-of-00512 + out=data/fsns/train/train-00255-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00256-of-00512 + out=data/fsns/train/train-00256-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00257-of-00512 + out=data/fsns/train/train-00257-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00258-of-00512 + out=data/fsns/train/train-00258-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00259-of-00512 + out=data/fsns/train/train-00259-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00260-of-00512 + out=data/fsns/train/train-00260-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00261-of-00512 + out=data/fsns/train/train-00261-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00262-of-00512 + out=data/fsns/train/train-00262-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00263-of-00512 + out=data/fsns/train/train-00263-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00264-of-00512 + out=data/fsns/train/train-00264-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00265-of-00512 + out=data/fsns/train/train-00265-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00266-of-00512 + out=data/fsns/train/train-00266-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00267-of-00512 + out=data/fsns/train/train-00267-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00268-of-00512 + out=data/fsns/train/train-00268-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00269-of-00512 + out=data/fsns/train/train-00269-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00270-of-00512 + out=data/fsns/train/train-00270-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00271-of-00512 + out=data/fsns/train/train-00271-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00272-of-00512 + out=data/fsns/train/train-00272-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00273-of-00512 + out=data/fsns/train/train-00273-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00274-of-00512 + out=data/fsns/train/train-00274-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00275-of-00512 + out=data/fsns/train/train-00275-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00276-of-00512 + out=data/fsns/train/train-00276-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00277-of-00512 + out=data/fsns/train/train-00277-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00278-of-00512 + out=data/fsns/train/train-00278-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00279-of-00512 + out=data/fsns/train/train-00279-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00280-of-00512 + out=data/fsns/train/train-00280-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00281-of-00512 + out=data/fsns/train/train-00281-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00282-of-00512 + out=data/fsns/train/train-00282-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00283-of-00512 + out=data/fsns/train/train-00283-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00284-of-00512 + out=data/fsns/train/train-00284-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00285-of-00512 + out=data/fsns/train/train-00285-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00286-of-00512 + out=data/fsns/train/train-00286-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00287-of-00512 + out=data/fsns/train/train-00287-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00288-of-00512 + out=data/fsns/train/train-00288-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00289-of-00512 + out=data/fsns/train/train-00289-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00290-of-00512 + out=data/fsns/train/train-00290-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00291-of-00512 + out=data/fsns/train/train-00291-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00292-of-00512 + out=data/fsns/train/train-00292-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00293-of-00512 + out=data/fsns/train/train-00293-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00294-of-00512 + out=data/fsns/train/train-00294-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00295-of-00512 + out=data/fsns/train/train-00295-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00296-of-00512 + out=data/fsns/train/train-00296-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00297-of-00512 + out=data/fsns/train/train-00297-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00298-of-00512 + out=data/fsns/train/train-00298-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00299-of-00512 + out=data/fsns/train/train-00299-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00300-of-00512 + out=data/fsns/train/train-00300-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00301-of-00512 + out=data/fsns/train/train-00301-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00302-of-00512 + out=data/fsns/train/train-00302-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00303-of-00512 + out=data/fsns/train/train-00303-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00304-of-00512 + out=data/fsns/train/train-00304-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00305-of-00512 + out=data/fsns/train/train-00305-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00306-of-00512 + out=data/fsns/train/train-00306-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00307-of-00512 + out=data/fsns/train/train-00307-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00308-of-00512 + out=data/fsns/train/train-00308-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00309-of-00512 + out=data/fsns/train/train-00309-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00310-of-00512 + out=data/fsns/train/train-00310-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00311-of-00512 + out=data/fsns/train/train-00311-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00312-of-00512 + out=data/fsns/train/train-00312-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00313-of-00512 + out=data/fsns/train/train-00313-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00314-of-00512 + out=data/fsns/train/train-00314-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00315-of-00512 + out=data/fsns/train/train-00315-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00316-of-00512 + out=data/fsns/train/train-00316-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00317-of-00512 + out=data/fsns/train/train-00317-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00318-of-00512 + out=data/fsns/train/train-00318-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00319-of-00512 + out=data/fsns/train/train-00319-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00320-of-00512 + out=data/fsns/train/train-00320-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00321-of-00512 + out=data/fsns/train/train-00321-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00322-of-00512 + out=data/fsns/train/train-00322-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00323-of-00512 + out=data/fsns/train/train-00323-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00324-of-00512 + out=data/fsns/train/train-00324-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00325-of-00512 + out=data/fsns/train/train-00325-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00326-of-00512 + out=data/fsns/train/train-00326-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00327-of-00512 + out=data/fsns/train/train-00327-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00328-of-00512 + out=data/fsns/train/train-00328-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00329-of-00512 + out=data/fsns/train/train-00329-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00330-of-00512 + out=data/fsns/train/train-00330-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00331-of-00512 + out=data/fsns/train/train-00331-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00332-of-00512 + out=data/fsns/train/train-00332-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00333-of-00512 + out=data/fsns/train/train-00333-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00334-of-00512 + out=data/fsns/train/train-00334-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00335-of-00512 + out=data/fsns/train/train-00335-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00336-of-00512 + out=data/fsns/train/train-00336-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00337-of-00512 + out=data/fsns/train/train-00337-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00338-of-00512 + out=data/fsns/train/train-00338-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00339-of-00512 + out=data/fsns/train/train-00339-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00340-of-00512 + out=data/fsns/train/train-00340-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00341-of-00512 + out=data/fsns/train/train-00341-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00342-of-00512 + out=data/fsns/train/train-00342-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00343-of-00512 + out=data/fsns/train/train-00343-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00344-of-00512 + out=data/fsns/train/train-00344-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00345-of-00512 + out=data/fsns/train/train-00345-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00346-of-00512 + out=data/fsns/train/train-00346-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00347-of-00512 + out=data/fsns/train/train-00347-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00348-of-00512 + out=data/fsns/train/train-00348-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00349-of-00512 + out=data/fsns/train/train-00349-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00350-of-00512 + out=data/fsns/train/train-00350-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00351-of-00512 + out=data/fsns/train/train-00351-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00352-of-00512 + out=data/fsns/train/train-00352-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00353-of-00512 + out=data/fsns/train/train-00353-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00354-of-00512 + out=data/fsns/train/train-00354-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00355-of-00512 + out=data/fsns/train/train-00355-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00356-of-00512 + out=data/fsns/train/train-00356-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00357-of-00512 + out=data/fsns/train/train-00357-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00358-of-00512 + out=data/fsns/train/train-00358-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00359-of-00512 + out=data/fsns/train/train-00359-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00360-of-00512 + out=data/fsns/train/train-00360-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00361-of-00512 + out=data/fsns/train/train-00361-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00362-of-00512 + out=data/fsns/train/train-00362-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00363-of-00512 + out=data/fsns/train/train-00363-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00364-of-00512 + out=data/fsns/train/train-00364-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00365-of-00512 + out=data/fsns/train/train-00365-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00366-of-00512 + out=data/fsns/train/train-00366-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00367-of-00512 + out=data/fsns/train/train-00367-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00368-of-00512 + out=data/fsns/train/train-00368-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00369-of-00512 + out=data/fsns/train/train-00369-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00370-of-00512 + out=data/fsns/train/train-00370-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00371-of-00512 + out=data/fsns/train/train-00371-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00372-of-00512 + out=data/fsns/train/train-00372-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00373-of-00512 + out=data/fsns/train/train-00373-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00374-of-00512 + out=data/fsns/train/train-00374-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00375-of-00512 + out=data/fsns/train/train-00375-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00376-of-00512 + out=data/fsns/train/train-00376-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00377-of-00512 + out=data/fsns/train/train-00377-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00378-of-00512 + out=data/fsns/train/train-00378-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00379-of-00512 + out=data/fsns/train/train-00379-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00380-of-00512 + out=data/fsns/train/train-00380-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00381-of-00512 + out=data/fsns/train/train-00381-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00382-of-00512 + out=data/fsns/train/train-00382-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00383-of-00512 + out=data/fsns/train/train-00383-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00384-of-00512 + out=data/fsns/train/train-00384-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00385-of-00512 + out=data/fsns/train/train-00385-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00386-of-00512 + out=data/fsns/train/train-00386-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00387-of-00512 + out=data/fsns/train/train-00387-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00388-of-00512 + out=data/fsns/train/train-00388-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00389-of-00512 + out=data/fsns/train/train-00389-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00390-of-00512 + out=data/fsns/train/train-00390-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00391-of-00512 + out=data/fsns/train/train-00391-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00392-of-00512 + out=data/fsns/train/train-00392-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00393-of-00512 + out=data/fsns/train/train-00393-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00394-of-00512 + out=data/fsns/train/train-00394-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00395-of-00512 + out=data/fsns/train/train-00395-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00396-of-00512 + out=data/fsns/train/train-00396-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00397-of-00512 + out=data/fsns/train/train-00397-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00398-of-00512 + out=data/fsns/train/train-00398-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00399-of-00512 + out=data/fsns/train/train-00399-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00400-of-00512 + out=data/fsns/train/train-00400-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00401-of-00512 + out=data/fsns/train/train-00401-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00402-of-00512 + out=data/fsns/train/train-00402-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00403-of-00512 + out=data/fsns/train/train-00403-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00404-of-00512 + out=data/fsns/train/train-00404-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00405-of-00512 + out=data/fsns/train/train-00405-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00406-of-00512 + out=data/fsns/train/train-00406-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00407-of-00512 + out=data/fsns/train/train-00407-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00408-of-00512 + out=data/fsns/train/train-00408-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00409-of-00512 + out=data/fsns/train/train-00409-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00410-of-00512 + out=data/fsns/train/train-00410-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00411-of-00512 + out=data/fsns/train/train-00411-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00412-of-00512 + out=data/fsns/train/train-00412-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00413-of-00512 + out=data/fsns/train/train-00413-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00414-of-00512 + out=data/fsns/train/train-00414-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00415-of-00512 + out=data/fsns/train/train-00415-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00416-of-00512 + out=data/fsns/train/train-00416-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00417-of-00512 + out=data/fsns/train/train-00417-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00418-of-00512 + out=data/fsns/train/train-00418-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00419-of-00512 + out=data/fsns/train/train-00419-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00420-of-00512 + out=data/fsns/train/train-00420-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00421-of-00512 + out=data/fsns/train/train-00421-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00422-of-00512 + out=data/fsns/train/train-00422-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00423-of-00512 + out=data/fsns/train/train-00423-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00424-of-00512 + out=data/fsns/train/train-00424-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00425-of-00512 + out=data/fsns/train/train-00425-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00426-of-00512 + out=data/fsns/train/train-00426-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00427-of-00512 + out=data/fsns/train/train-00427-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00428-of-00512 + out=data/fsns/train/train-00428-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00429-of-00512 + out=data/fsns/train/train-00429-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00430-of-00512 + out=data/fsns/train/train-00430-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00431-of-00512 + out=data/fsns/train/train-00431-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00432-of-00512 + out=data/fsns/train/train-00432-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00433-of-00512 + out=data/fsns/train/train-00433-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00434-of-00512 + out=data/fsns/train/train-00434-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00435-of-00512 + out=data/fsns/train/train-00435-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00436-of-00512 + out=data/fsns/train/train-00436-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00437-of-00512 + out=data/fsns/train/train-00437-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00438-of-00512 + out=data/fsns/train/train-00438-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00439-of-00512 + out=data/fsns/train/train-00439-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00440-of-00512 + out=data/fsns/train/train-00440-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00441-of-00512 + out=data/fsns/train/train-00441-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00442-of-00512 + out=data/fsns/train/train-00442-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00443-of-00512 + out=data/fsns/train/train-00443-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00444-of-00512 + out=data/fsns/train/train-00444-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00445-of-00512 + out=data/fsns/train/train-00445-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00446-of-00512 + out=data/fsns/train/train-00446-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00447-of-00512 + out=data/fsns/train/train-00447-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00448-of-00512 + out=data/fsns/train/train-00448-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00449-of-00512 + out=data/fsns/train/train-00449-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00450-of-00512 + out=data/fsns/train/train-00450-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00451-of-00512 + out=data/fsns/train/train-00451-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00452-of-00512 + out=data/fsns/train/train-00452-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00453-of-00512 + out=data/fsns/train/train-00453-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00454-of-00512 + out=data/fsns/train/train-00454-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00455-of-00512 + out=data/fsns/train/train-00455-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00456-of-00512 + out=data/fsns/train/train-00456-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00457-of-00512 + out=data/fsns/train/train-00457-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00458-of-00512 + out=data/fsns/train/train-00458-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00459-of-00512 + out=data/fsns/train/train-00459-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00460-of-00512 + out=data/fsns/train/train-00460-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00461-of-00512 + out=data/fsns/train/train-00461-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00462-of-00512 + out=data/fsns/train/train-00462-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00463-of-00512 + out=data/fsns/train/train-00463-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00464-of-00512 + out=data/fsns/train/train-00464-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00465-of-00512 + out=data/fsns/train/train-00465-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00466-of-00512 + out=data/fsns/train/train-00466-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00467-of-00512 + out=data/fsns/train/train-00467-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00468-of-00512 + out=data/fsns/train/train-00468-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00469-of-00512 + out=data/fsns/train/train-00469-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00470-of-00512 + out=data/fsns/train/train-00470-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00471-of-00512 + out=data/fsns/train/train-00471-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00472-of-00512 + out=data/fsns/train/train-00472-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00473-of-00512 + out=data/fsns/train/train-00473-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00474-of-00512 + out=data/fsns/train/train-00474-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00475-of-00512 + out=data/fsns/train/train-00475-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00476-of-00512 + out=data/fsns/train/train-00476-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00477-of-00512 + out=data/fsns/train/train-00477-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00478-of-00512 + out=data/fsns/train/train-00478-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00479-of-00512 + out=data/fsns/train/train-00479-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00480-of-00512 + out=data/fsns/train/train-00480-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00481-of-00512 + out=data/fsns/train/train-00481-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00482-of-00512 + out=data/fsns/train/train-00482-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00483-of-00512 + out=data/fsns/train/train-00483-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00484-of-00512 + out=data/fsns/train/train-00484-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00485-of-00512 + out=data/fsns/train/train-00485-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00486-of-00512 + out=data/fsns/train/train-00486-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00487-of-00512 + out=data/fsns/train/train-00487-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00488-of-00512 + out=data/fsns/train/train-00488-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00489-of-00512 + out=data/fsns/train/train-00489-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00490-of-00512 + out=data/fsns/train/train-00490-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00491-of-00512 + out=data/fsns/train/train-00491-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00492-of-00512 + out=data/fsns/train/train-00492-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00493-of-00512 + out=data/fsns/train/train-00493-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00494-of-00512 + out=data/fsns/train/train-00494-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00495-of-00512 + out=data/fsns/train/train-00495-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00496-of-00512 + out=data/fsns/train/train-00496-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00497-of-00512 + out=data/fsns/train/train-00497-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00498-of-00512 + out=data/fsns/train/train-00498-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00499-of-00512 + out=data/fsns/train/train-00499-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00500-of-00512 + out=data/fsns/train/train-00500-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00501-of-00512 + out=data/fsns/train/train-00501-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00502-of-00512 + out=data/fsns/train/train-00502-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00503-of-00512 + out=data/fsns/train/train-00503-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00504-of-00512 + out=data/fsns/train/train-00504-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00505-of-00512 + out=data/fsns/train/train-00505-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00506-of-00512 + out=data/fsns/train/train-00506-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00507-of-00512 + out=data/fsns/train/train-00507-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00508-of-00512 + out=data/fsns/train/train-00508-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00509-of-00512 + out=data/fsns/train/train-00509-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00510-of-00512 + out=data/fsns/train/train-00510-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00511-of-00512 + out=data/fsns/train/train-00511-of-00512 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00000-of-00064 + out=data/fsns/validation/validation-00000-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00001-of-00064 + out=data/fsns/validation/validation-00001-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00002-of-00064 + out=data/fsns/validation/validation-00002-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00003-of-00064 + out=data/fsns/validation/validation-00003-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00004-of-00064 + out=data/fsns/validation/validation-00004-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00005-of-00064 + out=data/fsns/validation/validation-00005-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00006-of-00064 + out=data/fsns/validation/validation-00006-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00007-of-00064 + out=data/fsns/validation/validation-00007-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00008-of-00064 + out=data/fsns/validation/validation-00008-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00009-of-00064 + out=data/fsns/validation/validation-00009-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00010-of-00064 + out=data/fsns/validation/validation-00010-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00011-of-00064 + out=data/fsns/validation/validation-00011-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00012-of-00064 + out=data/fsns/validation/validation-00012-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00013-of-00064 + out=data/fsns/validation/validation-00013-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00014-of-00064 + out=data/fsns/validation/validation-00014-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00015-of-00064 + out=data/fsns/validation/validation-00015-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00016-of-00064 
+ out=data/fsns/validation/validation-00016-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00017-of-00064 + out=data/fsns/validation/validation-00017-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00018-of-00064 + out=data/fsns/validation/validation-00018-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00019-of-00064 + out=data/fsns/validation/validation-00019-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00020-of-00064 + out=data/fsns/validation/validation-00020-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00021-of-00064 + out=data/fsns/validation/validation-00021-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00022-of-00064 + out=data/fsns/validation/validation-00022-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00023-of-00064 + out=data/fsns/validation/validation-00023-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00024-of-00064 + out=data/fsns/validation/validation-00024-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00025-of-00064 + out=data/fsns/validation/validation-00025-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00026-of-00064 + out=data/fsns/validation/validation-00026-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00027-of-00064 + out=data/fsns/validation/validation-00027-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00028-of-00064 + out=data/fsns/validation/validation-00028-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00029-of-00064 + out=data/fsns/validation/validation-00029-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00030-of-00064 + out=data/fsns/validation/validation-00030-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00031-of-00064 + out=data/fsns/validation/validation-00031-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00032-of-00064 + out=data/fsns/validation/validation-00032-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00033-of-00064 + out=data/fsns/validation/validation-00033-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00034-of-00064 + out=data/fsns/validation/validation-00034-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00035-of-00064 + out=data/fsns/validation/validation-00035-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00036-of-00064 + out=data/fsns/validation/validation-00036-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00037-of-00064 + out=data/fsns/validation/validation-00037-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00038-of-00064 + out=data/fsns/validation/validation-00038-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00039-of-00064 + out=data/fsns/validation/validation-00039-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00040-of-00064 + out=data/fsns/validation/validation-00040-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00041-of-00064 + 
out=data/fsns/validation/validation-00041-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00042-of-00064 + out=data/fsns/validation/validation-00042-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00043-of-00064 + out=data/fsns/validation/validation-00043-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00044-of-00064 + out=data/fsns/validation/validation-00044-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00045-of-00064 + out=data/fsns/validation/validation-00045-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00046-of-00064 + out=data/fsns/validation/validation-00046-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00047-of-00064 + out=data/fsns/validation/validation-00047-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00048-of-00064 + out=data/fsns/validation/validation-00048-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00049-of-00064 + out=data/fsns/validation/validation-00049-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00050-of-00064 + out=data/fsns/validation/validation-00050-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00051-of-00064 + out=data/fsns/validation/validation-00051-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00052-of-00064 + out=data/fsns/validation/validation-00052-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00053-of-00064 + out=data/fsns/validation/validation-00053-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00054-of-00064 + out=data/fsns/validation/validation-00054-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00055-of-00064 + out=data/fsns/validation/validation-00055-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00056-of-00064 + out=data/fsns/validation/validation-00056-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00057-of-00064 + out=data/fsns/validation/validation-00057-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00058-of-00064 + out=data/fsns/validation/validation-00058-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00059-of-00064 + out=data/fsns/validation/validation-00059-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00060-of-00064 + out=data/fsns/validation/validation-00060-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00061-of-00064 + out=data/fsns/validation/validation-00061-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00062-of-00064 + out=data/fsns/validation/validation-00062-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00063-of-00064 + out=data/fsns/validation/validation-00063-of-00064 diff --git a/swivel/README.md b/swivel/README.md index fff8cc6f431f210e62cbfe6d1ef6a22524e71a25..ed77c747abcffae9ad462d6105d96540162e51d4 100644 --- a/swivel/README.md +++ b/swivel/README.md @@ -24,7 +24,7 @@ Note that the resulting co-occurrence matrix is very sparse (i.e., contains many zeros) since most words won't have been observed in the context of other words. 
In the case of very rare words, it seems reasonable to assume that you just haven't sampled enough data to spot their co-occurrence yet. On the other hand, -if we've failed to observed to common words co-occuring, it seems likely that +if we've failed to observed two common words co-occuring, it seems likely that they are *anti-correlated*. Swivel attempts to capture this intuition by using both the observed and the @@ -42,6 +42,9 @@ This release includes the following programs. * `swivel.py` is a TensorFlow program that generates embeddings from the co-occurrence statistics. It uses the files created by `prep.py` as input, and generates two text files as output: the row and column embeddings. +* `distributed.sh` is a Bash script that is meant to act as a template for + launching "distributed" Swivel training; i.e., multiple processes that work in + parallel and communicate via a parameter server. * `text2bin.py` combines the row and column vectors generated by Swivel into a flat binary file that can be quickly loaded into memory to perform vector arithmetic. This can also be used to convert embeddings from @@ -174,11 +177,5 @@ mixed case and evaluate them using lower case, things won't work well. # Contact If you have any questions about Swivel, feel free to post to -[swivel-embeddings@googlegroups.com](https://groups.google.com/forum/#!forum/swivel-embeddings) -or contact us directly: - -* Noam Shazeer (`noam@google.com`) -* Ryan Doherty (`portalfire@google.com`) -* Colin Evans (`colinhevans@google.com`) -* Chris Waterson (`waterson@google.com`) +[swivel-embeddings@googlegroups.com](https://groups.google.com/forum/#!forum/swivel-embeddings). diff --git a/swivel/distributed.sh b/swivel/distributed.sh new file mode 100644 index 0000000000000000000000000000000000000000..6aa59f751a8bbd3761a419f5f3242a9d1d5ce5e3 --- /dev/null +++ b/swivel/distributed.sh @@ -0,0 +1,54 @@ +#!/bin/bash +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# This script launches a multi-process version of Swivel on a single machine. +set -e + +# A comma-separated list of parameter server processes. +PS_HOSTS="localhost:4000" + +# A comma-separated list of worker processes. +WORKER_HOSTS="localhost:5000,localhost:5001,localhost:5002,localhost:5003" + +# Where the Swivel training data is located. All processes must be able to read +# from this directory, so it ought to be a network filesystem if you're running +# on multiple servers. +INPUT_BASE_PATH="${HOME}/tmp/swivel/in" + +# Where the output and working directory is located. +OUTPUT_BASE_PATH="${HOME}/tmp/swivel/out" + +# Location of evaluation data, if you want to observe evaluation while training. +EVAL_BASE_PATH="${HOME}/tmp/swivel/eval" + +ARGS="--ps_hosts ${PS_HOSTS} +--worker_hosts ${WORKER_HOSTS} +--input_base_path ${INPUT_BASE_PATH} +--output_base_path ${OUTPUT_BASE_PATH} +--eval_base_path ${EVAL_BASE_PATH}" + +# This configuration is for a two-GPU machine. It starts four worker +# processes, two for each GPU. 
+python swivel.py --job_name ps --task_index 0 ${ARGS} >& /tmp/ps.0 & +python swivel.py --job_name worker --task_index 0 --gpu_device 0 ${ARGS} >& /tmp/worker.0 & +python swivel.py --job_name worker --task_index 1 --gpu_device 1 ${ARGS} >& /tmp/worker.1 & +python swivel.py --job_name worker --task_index 2 --gpu_device 0 ${ARGS} >& /tmp/worker.2 & +python swivel.py --job_name worker --task_index 3 --gpu_device 1 ${ARGS} >& /tmp/worker.3 & + +# Perhaps there is a more clever way to clean up the parameter server once all +# the workers are done. +wait %2 %3 %4 %5 +kill %1 + diff --git a/swivel/swivel.py b/swivel/swivel.py index f9927cd4283f26f254a8a590bc57e0d1bab82bf3..c69660c09c18f54da654ca8a7341559f8b9bcc22 100644 --- a/swivel/swivel.py +++ b/swivel/swivel.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python -# # Copyright 2016 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); @@ -49,366 +47,442 @@ number of epochs. When complete, it will output the trained vectors to a tab-separated file that contains one line per embedding. Row and column embeddings are stored in separate files. +Swivel can be run "stand-alone" or "distributed". The latter involves running +at least one parameter server process, along with one or more worker processes. """ +from __future__ import division from __future__ import print_function + import glob -import math +import itertools import os -import sys -import time -import threading +import random import numpy as np +import scipy.stats import tensorflow as tf -from tensorflow.python.client import device_lib flags = tf.app.flags -flags.DEFINE_string('input_base_path', '/tmp/swivel_data', - 'Directory containing input shards, vocabularies, ' - 'and marginals.') -flags.DEFINE_string('output_base_path', '/tmp/swivel_data', - 'Path where to write the trained embeddings.') -flags.DEFINE_integer('embedding_size', 300, 'Size of the embeddings') -flags.DEFINE_boolean('trainable_bias', False, 'Biases are trainable') -flags.DEFINE_integer('submatrix_rows', 4096, 'Rows in each training submatrix. ' - 'This must match the training data.') -flags.DEFINE_integer('submatrix_cols', 4096, 'Rows in each training submatrix. ' - 'This must match the training data.') -flags.DEFINE_float('loss_multiplier', 1.0 / 4096, - 'constant multiplier on loss.') -flags.DEFINE_float('confidence_exponent', 0.5, - 'Exponent for l2 confidence function') -flags.DEFINE_float('confidence_scale', 0.25, 'Scale for l2 confidence function') -flags.DEFINE_float('confidence_base', 0.1, 'Base for l2 confidence function') -flags.DEFINE_float('learning_rate', 1.0, 'Initial learning rate') -flags.DEFINE_integer('num_concurrent_steps', 2, - 'Number of threads to train with') -flags.DEFINE_integer('num_readers', 4, - 'Number of threads to read the input data and feed it') -flags.DEFINE_float('num_epochs', 40, 'Number epochs to train for') -flags.DEFINE_float('per_process_gpu_memory_fraction', 0, - 'Fraction of GPU memory to use, 0 means allow_growth') -flags.DEFINE_integer('num_gpus', 0, - 'Number of GPUs to use, 0 means all available') +flags.DEFINE_string( + 'input_base_path', '/tmp/swivel_data', + 'Directory containing input shards, vocabularies, and marginals.') +flags.DEFINE_string( + 'output_base_path', '/tmp/swivel_data', + 'Path where to write the trained embeddings.') +flags.DEFINE_string('eval_base_path', '', 'Path to evaluation data') + +# Control for training. 
+flags.DEFINE_float('num_epochs', 40, 'Number epochs to train') +flags.DEFINE_string('hparams', '', 'Model hyper-parameters') + +# Model hyper-parameters. (Move these to tf.HParams once that gets integrated +# into TF from tf.contrib.) +flags.DEFINE_integer( + 'dim', 300, 'Embedding dimensionality') +flags.DEFINE_string( + 'optimizer', 'rmsprop', 'SGD optimizer; either "adagrad" or "rmsprop"') +flags.DEFINE_float( + 'learning_rate', 0.1, 'Optimizer learning rate') +flags.DEFINE_float( + 'momentum', 0.1, 'Optimizer momentum; used with RMSProp') +flags.DEFINE_float( + 'confidence_base', 0.0, 'Base for count weighting') +flags.DEFINE_float( + 'confidence_scale', 1.0, 'Scale for count weighting') +flags.DEFINE_float( + 'confidence_exponent', 0.5, 'Exponent for count weighting') +flags.DEFINE_integer( + 'submatrix_rows', 4096, 'Number of rows in each submatrix') +flags.DEFINE_integer( + 'submatrix_cols', 4096, 'Number of cols in each submatrix') + +# For distributed training. +flags.DEFINE_string( + 'ps_hosts', '', + 'Comma-separated list of parameter server host:port; if empty, run local') +flags.DEFINE_string( + 'worker_hosts', '', 'Comma-separated list of worker host:port') +flags.DEFINE_string( + 'job_name', '', 'The job this process will run, either "ps" or "worker"') +flags.DEFINE_integer( + 'task_index', 0, 'The task index for this process') +flags.DEFINE_integer( + 'gpu_device', 0, 'The GPU device to use.') FLAGS = flags.FLAGS -def log(message, *args, **kwargs): - tf.logging.info(message, *args, **kwargs) +class Model(object): + """A Swivel model.""" + + def __init__(self, input_base_path, hparams): + """Creates a new Swivel model.""" + # Read vocab + self.row_ix_to_word, self.row_word_to_ix = self._read_vocab( + os.path.join(input_base_path, 'row_vocab.txt')) + self.col_ix_to_word, self.col_word_to_ix = self._read_vocab( + os.path.join(input_base_path, 'col_vocab.txt')) + + # Read marginals. + row_sums = self._read_marginals_file( + os.path.join(input_base_path, 'row_sums.txt')) + col_sums = self._read_marginals_file( + os.path.join(input_base_path, 'col_sums.txt')) + + # Construct input tensors. + count_matrix_files = glob.glob( + os.path.join(input_base_path, 'shard-*.pb')) + + global_rows, global_cols, counts = self._count_matrix_input( + count_matrix_files, hparams.submatrix_rows, hparams.submatrix_cols) + + # Create embedding variables. + sigma = 1.0 / np.sqrt(hparams.dim) + self.row_embedding = tf.get_variable( + 'row_embedding', + shape=[len(row_sums), hparams.dim], + initializer=tf.random_normal_initializer(0, sigma), + dtype=tf.float32) + self.col_embedding = tf.get_variable( + 'col_embedding', + shape=[len(col_sums), hparams.dim], + initializer=tf.random_normal_initializer(0, sigma), + dtype=tf.float32) + + matrix_log_sum = np.log(np.sum(row_sums) + 1) + row_bias = tf.constant( + [np.log(x + 1) for x in row_sums], dtype=tf.float32) + col_bias = tf.constant( + [np.log(x + 1) for x in col_sums], dtype=tf.float32) + + # Fetch embeddings. + selected_rows = tf.nn.embedding_lookup(self.row_embedding, global_rows) + selected_cols = tf.nn.embedding_lookup(self.col_embedding, global_cols) + + selected_row_bias = tf.gather(row_bias, global_rows) + selected_col_bias = tf.gather(col_bias, global_cols) + + predictions = tf.matmul(selected_rows, selected_cols, transpose_b=True) + + # These binary masks separate zero from non-zero values. 
+ count_is_nonzero = tf.to_float(tf.cast(counts, tf.bool)) + count_is_zero = 1 - count_is_nonzero + + objectives = count_is_nonzero * tf.log(counts + 1e-30) + objectives -= tf.reshape(selected_row_bias, [-1, 1]) + objectives -= selected_col_bias + objectives += matrix_log_sum + + err = predictions - objectives + + # The confidence function scales the L2 loss based on the raw + # co-occurrence count. + l2_confidence = (hparams.confidence_base + + hparams.confidence_scale * tf.pow( + counts, hparams.confidence_exponent)) + + loss_multiplier = 1 / np.sqrt( + hparams.submatrix_rows * hparams.submatrix_cols) + + l2_loss = loss_multiplier * tf.reduce_sum( + 0.5 * l2_confidence * tf.square(err)) + + sigmoid_loss = loss_multiplier * tf.reduce_sum( + tf.nn.softplus(err) * count_is_zero) + + self.loss_op = l2_loss + sigmoid_loss + + if hparams.optimizer == 'adagrad': + opt = tf.train.AdagradOptimizer(hparams.learning_rate) + elif hparams.optimizer == 'rmsprop': + opt = tf.train.RMSPropOptimizer(hparams.learning_rate, hparams.momentum) + else: + raise ValueError('unknown optimizer "%s"' % hparams.optimizer) + + self.global_step = tf.get_variable( + 'global_step', initializer=0, trainable=False) + + self.train_op = opt.minimize(self.loss_op, global_step=self.global_step) + + # One epoch trains each submatrix once. + self.steps_per_epoch = ( + (len(row_sums) / hparams.submatrix_rows) * + (len(col_sums) / hparams.submatrix_cols)) + + def _read_vocab(self, filename): + """Reads the vocabulary file.""" + with open(filename) as lines: + ix_to_word = [line.strip() for line in lines] + word_to_ix = {word: ix for ix, word in enumerate(ix_to_word)} + return ix_to_word, word_to_ix + + def _read_marginals_file(self, filename): + """Reads text file with one number per line to an array.""" + with open(filename) as lines: + return [float(line.strip()) for line in lines] + + def _count_matrix_input(self, filenames, submatrix_rows, submatrix_cols): + """Creates ops that read submatrix shards from disk.""" + random.shuffle(filenames) + filename_queue = tf.train.string_input_producer(filenames) + reader = tf.WholeFileReader() + _, serialized_example = reader.read(filename_queue) + features = tf.parse_single_example( + serialized_example, + features={ + 'global_row': tf.FixedLenFeature([submatrix_rows], dtype=tf.int64), + 'global_col': tf.FixedLenFeature([submatrix_cols], dtype=tf.int64), + 'sparse_local_row': tf.VarLenFeature(dtype=tf.int64), + 'sparse_local_col': tf.VarLenFeature(dtype=tf.int64), + 'sparse_value': tf.VarLenFeature(dtype=tf.float32) + }) + + global_row = features['global_row'] + global_col = features['global_col'] + + sparse_local_row = features['sparse_local_row'].values + sparse_local_col = features['sparse_local_col'].values + sparse_count = features['sparse_value'].values + + sparse_indices = tf.concat( + axis=1, values=[tf.expand_dims(sparse_local_row, 1), + tf.expand_dims(sparse_local_col, 1)]) + + count = tf.sparse_to_dense(sparse_indices, [submatrix_rows, submatrix_cols], + sparse_count) + + return global_row, global_col, count + + def wordsim_eval_op(self, filename): + """Returns an op that runs an eval on a word similarity dataset. + + The eval dataset is assumed to be tab-separated, one scored word pair per + line. The resulting value is Spearman's rho of the human judgements with + the cosine similarity of the word embeddings. + + Args: + filename: the filename containing the word similarity data. + + Returns: + An operator that will compute Spearman's rho of the current row + embeddings. 
+ """ + with open(filename, 'r') as fh: + tuples = (line.strip().split('\t') for line in fh.read().splitlines()) + word1s, word2s, sims = zip(*tuples) + actuals = map(float, sims) + + v1s_t = tf.nn.embedding_lookup( + self.row_embedding, + [self.row_word_to_ix.get(w, 0) for w in word1s]) + + v2s_t = tf.nn.embedding_lookup( + self.row_embedding, + [self.row_word_to_ix.get(w, 0) for w in word2s]) + + # Compute the predicted word similarity as the cosine similarity between the + # embedding vectors. + preds_t = tf.reduce_sum( + tf.nn.l2_normalize(v1s_t, dim=1) * tf.nn.l2_normalize(v2s_t, dim=1), + axis=1) + + def _op(preds): + rho, _ = scipy.stats.spearmanr(preds, actuals) + return rho + + return tf.py_func(_op, [preds_t], tf.float64) + + def analogy_eval_op(self, filename, max_vocab_size=20000): + """Returns an op that runs an eval on an analogy dataset. + + The eval dataset is assumed to be tab-separated, with four tokens per + line. The first three tokens are query terms, the last is the expected + answer. For each line (e.g., "man king woman queen"), the vectors + corresponding to the query terms are added ("king - man + woman") to produce + a query vector. If the expected answer's vector is the nearest neighbor to + the query vector (not counting any of the query vectors themselves), then + the line is scored as correct. The reported accuracy is the number of + correct rows divided by the total number of rows. Missing terms are + replaced with an arbitrary vector and will almost certainly result in + incorrect answers. + + Note that the results are approximate: for efficiency's sake, only the first + `max_vocab_size` terms are included in the nearest neighbor search. + + Args: + filename: the filename containing the analogy data. + max_vocab_size: the maximum number of tokens to include in the nearest + neighbor search. By default, 20000. + + Returns: + The accuracy on the analogy task. + """ + analogy_ixs = [] + with open(filename, 'r') as lines: + for line in lines: + parts = line.strip().split('\t') + if len(parts) == 4: + analogy_ixs.append([self.row_word_to_ix.get(w, 0) for w in parts]) + + # man:king :: woman:queen => king - man + woman == queen + ix1s, ix2s, ix3s, _ = zip(*analogy_ixs) + v1s_t, v2s_t, v3s_t = ( + tf.nn.l2_normalize( + tf.nn.embedding_lookup(self.row_embedding, ixs), + dim=1) + for ixs in (ix1s, ix2s, ix3s)) + + preds_t = v2s_t - v1s_t + v3s_t + + # Compute the nearest neighbors as the cosine similarity. We only consider + # up to max_vocab_size to avoid a matmul that swamps the machine. + sims_t = tf.matmul( + preds_t, + tf.nn.l2_normalize(self.row_embedding[:max_vocab_size], dim=1), + transpose_b=True) + + # Take the four nearest neighbors, since the eval explicitly discards the + # query terms. 
+ _, preds_ixs_t = tf.nn.top_k(sims_t, 4) + + def _op(preds_ixs): + correct, total = 0, 0 + for pred_ixs, actual_ixs in itertools.izip(preds_ixs, analogy_ixs): + pred_ixs = [ix for ix in pred_ixs if ix not in actual_ixs[:3]] + correct += pred_ixs[0] == actual_ixs[3] + total += 1 + + return correct / total + + return tf.py_func(_op, [preds_ixs_t], tf.float64) + + def _write_tensor(self, vocab_path, output_path, session, embedding): + """Writes tensor to output_path as tsv.""" + embeddings = session.run(embedding) + + with open(output_path, 'w') as out_f: + with open(vocab_path) as vocab_f: + for index, word in enumerate(vocab_f): + word = word.strip() + embedding = embeddings[index] + print('\t'.join([word.strip()] + [str(x) for x in embedding]), + file=out_f) + + def write_embeddings(self, config, session): + """Writes row and column embeddings disk.""" + self._write_tensor( + os.path.join(config.input_base_path, 'row_vocab.txt'), + os.path.join(config.output_base_path, 'row_embedding.tsv'), + session, self.row_embedding) + + self._write_tensor( + os.path.join(config.input_base_path, 'col_vocab.txt'), + os.path.join(config.output_base_path, 'col_embedding.tsv'), + session, self.col_embedding) -def get_available_gpus(): - return [d.name for d in device_lib.list_local_devices() - if d.device_type == 'GPU'] +def main(_): + tf.logging.set_verbosity(tf.logging.INFO) + # If we have ps_hosts, then we'll assume that this is going to be a + # distributed training run. Configure the cluster appropriately. Otherwise, + # we just do everything in-process. + if FLAGS.ps_hosts: + cluster = tf.train.ClusterSpec({ + 'ps': FLAGS.ps_hosts.split(','), + 'worker': FLAGS.worker_hosts.split(','), + }) + + if FLAGS.job_name == 'ps': + # Ignore the GPU if we're the parameter server. This let's the PS run on + # the same machine as a worker. 
+ config = tf.ConfigProto(device_count={'GPU': 0}) + elif FLAGS.job_name == 'worker': + config = tf.ConfigProto(gpu_options=tf.GPUOptions( + visible_device_list='%d' % FLAGS.gpu_device, + allow_growth=True)) + else: + raise ValueError('unknown job name "%s"' % FLAGS.job_name) -def embeddings_with_init(vocab_size, embedding_dim, name): - """Creates and initializes the embedding tensors.""" - return tf.get_variable(name=name, - shape=[vocab_size, embedding_dim], - initializer=tf.random_normal_initializer( - stddev=math.sqrt(1.0 / embedding_dim))) - - -def count_matrix_input(filenames, submatrix_rows, submatrix_cols): - """Reads submatrix shards from disk.""" - filename_queue = tf.train.string_input_producer(filenames) - reader = tf.WholeFileReader() - _, serialized_example = reader.read(filename_queue) - features = tf.parse_single_example( - serialized_example, - features={ - 'global_row': tf.FixedLenFeature([submatrix_rows], dtype=tf.int64), - 'global_col': tf.FixedLenFeature([submatrix_cols], dtype=tf.int64), - 'sparse_local_row': tf.VarLenFeature(dtype=tf.int64), - 'sparse_local_col': tf.VarLenFeature(dtype=tf.int64), - 'sparse_value': tf.VarLenFeature(dtype=tf.float32) - }) - - global_row = features['global_row'] - global_col = features['global_col'] - - sparse_local_row = features['sparse_local_row'].values - sparse_local_col = features['sparse_local_col'].values - sparse_count = features['sparse_value'].values - - sparse_indices = tf.concat(axis=1, values=[tf.expand_dims(sparse_local_row, 1), - tf.expand_dims(sparse_local_col, 1)]) - count = tf.sparse_to_dense(sparse_indices, [submatrix_rows, submatrix_cols], - sparse_count) - - queued_global_row, queued_global_col, queued_count = tf.train.batch( - [global_row, global_col, count], - batch_size=1, - num_threads=FLAGS.num_readers, - capacity=32) - - queued_global_row = tf.reshape(queued_global_row, [submatrix_rows]) - queued_global_col = tf.reshape(queued_global_col, [submatrix_cols]) - queued_count = tf.reshape(queued_count, [submatrix_rows, submatrix_cols]) - - return queued_global_row, queued_global_col, queued_count - - -def read_marginals_file(filename): - """Reads text file with one number per line to an array.""" - with open(filename) as lines: - return [float(line) for line in lines] - - -def write_embedding_tensor_to_disk(vocab_path, output_path, sess, embedding): - """Writes tensor to output_path as tsv""" - # Fetch the embedding values from the model - embeddings = sess.run(embedding) - - with open(output_path, 'w') as out_f: - with open(vocab_path) as vocab_f: - for index, word in enumerate(vocab_f): - word = word.strip() - embedding = embeddings[index] - out_f.write(word + '\t' + '\t'.join([str(x) for x in embedding]) + '\n') - - -def write_embeddings_to_disk(config, model, sess): - """Writes row and column embeddings disk""" - # Row Embedding - row_vocab_path = config.input_base_path + '/row_vocab.txt' - row_embedding_output_path = config.output_base_path + '/row_embedding.tsv' - log('Writing row embeddings to: %s', row_embedding_output_path) - write_embedding_tensor_to_disk(row_vocab_path, row_embedding_output_path, - sess, model.row_embedding) - - # Column Embedding - col_vocab_path = config.input_base_path + '/col_vocab.txt' - col_embedding_output_path = config.output_base_path + '/col_embedding.tsv' - log('Writing column embeddings to: %s', col_embedding_output_path) - write_embedding_tensor_to_disk(col_vocab_path, col_embedding_output_path, - sess, model.col_embedding) - - -class SwivelModel(object): - """Small class 
to gather needed pieces from a Graph being built.""" - - def __init__(self, config): - """Construct graph for dmc.""" - self._config = config - - # Create paths to input data files - log('Reading model from: %s', config.input_base_path) - count_matrix_files = glob.glob(config.input_base_path + '/shard-*.pb') - row_sums_path = config.input_base_path + '/row_sums.txt' - col_sums_path = config.input_base_path + '/col_sums.txt' - - # Read marginals - row_sums = read_marginals_file(row_sums_path) - col_sums = read_marginals_file(col_sums_path) - - self.n_rows = len(row_sums) - self.n_cols = len(col_sums) - log('Matrix dim: (%d,%d) SubMatrix dim: (%d,%d)', - self.n_rows, self.n_cols, config.submatrix_rows, config.submatrix_cols) - self.n_submatrices = (self.n_rows * self.n_cols / - (config.submatrix_rows * config.submatrix_cols)) - log('n_submatrices: %d', self.n_submatrices) - - with tf.device('/cpu:0'): - # ===== CREATE VARIABLES ====== - # Get input - global_row, global_col, count = count_matrix_input( - count_matrix_files, config.submatrix_rows, config.submatrix_cols) - - # Embeddings - self.row_embedding = embeddings_with_init( - embedding_dim=config.embedding_size, - vocab_size=self.n_rows, - name='row_embedding') - self.col_embedding = embeddings_with_init( - embedding_dim=config.embedding_size, - vocab_size=self.n_cols, - name='col_embedding') - tf.summary.histogram('row_emb', self.row_embedding) - tf.summary.histogram('col_emb', self.col_embedding) - - matrix_log_sum = math.log(np.sum(row_sums) + 1) - row_bias_init = [math.log(x + 1) for x in row_sums] - col_bias_init = [math.log(x + 1) for x in col_sums] - self.row_bias = tf.Variable( - row_bias_init, trainable=config.trainable_bias) - self.col_bias = tf.Variable( - col_bias_init, trainable=config.trainable_bias) - tf.summary.histogram('row_bias', self.row_bias) - tf.summary.histogram('col_bias', self.col_bias) - - # Add optimizer - l2_losses = [] - sigmoid_losses = [] - self.global_step = tf.Variable(0, name='global_step') - opt = tf.train.AdagradOptimizer(config.learning_rate) - - all_grads = [] - - devices = ['/gpu:%d' % i for i in range(FLAGS.num_gpus)] \ - if FLAGS.num_gpus > 0 else get_available_gpus() - self.devices_number = len(devices) - with tf.variable_scope(tf.get_variable_scope()): - for dev in devices: - with tf.device(dev): - with tf.name_scope(dev[1:].replace(':', '_')): - # ===== CREATE GRAPH ===== - # Fetch embeddings. - selected_row_embedding = tf.nn.embedding_lookup( - self.row_embedding, global_row) - selected_col_embedding = tf.nn.embedding_lookup( - self.col_embedding, global_col) - - # Fetch biases. - selected_row_bias = tf.nn.embedding_lookup( - [self.row_bias], global_row) - selected_col_bias = tf.nn.embedding_lookup( - [self.col_bias], global_col) - - # Multiply the row and column embeddings to generate predictions. - predictions = tf.matmul( - selected_row_embedding, selected_col_embedding, - transpose_b=True) - - # These binary masks separate zero from non-zero values. - count_is_nonzero = tf.to_float(tf.cast(count, tf.bool)) - count_is_zero = 1 - count_is_nonzero - - objectives = count_is_nonzero * tf.log(count + 1e-30) - objectives -= tf.reshape( - selected_row_bias, [config.submatrix_rows, 1]) - objectives -= selected_col_bias - objectives += matrix_log_sum - - err = predictions - objectives - - # The confidence function scales the L2 loss based on the raw - # co-occurrence count. 
- l2_confidence = (config.confidence_base + - config.confidence_scale * tf.pow( - count, config.confidence_exponent)) - - l2_loss = config.loss_multiplier * tf.reduce_sum( - 0.5 * l2_confidence * err * err * count_is_nonzero) - l2_losses.append(tf.expand_dims(l2_loss, 0)) - - sigmoid_loss = config.loss_multiplier * tf.reduce_sum( - tf.nn.softplus(err) * count_is_zero) - sigmoid_losses.append(tf.expand_dims(sigmoid_loss, 0)) - - loss = l2_loss + sigmoid_loss - grads = opt.compute_gradients(loss) - all_grads.append(grads) - - with tf.device('/cpu:0'): - # ===== MERGE LOSSES ===== - l2_loss = tf.reduce_mean(tf.concat(axis=0, values=l2_losses), 0, - name="l2_loss") - sigmoid_loss = tf.reduce_mean(tf.concat(axis=0, values=sigmoid_losses), 0, - name="sigmoid_loss") - self.loss = l2_loss + sigmoid_loss - average = tf.train.ExponentialMovingAverage(0.8, self.global_step) - loss_average_op = average.apply((self.loss,)) - tf.summary.scalar("l2_loss", l2_loss) - tf.summary.scalar("sigmoid_loss", sigmoid_loss) - tf.summary.scalar("loss", self.loss) - - # Apply the gradients to adjust the shared variables. - apply_gradient_ops = [] - for grads in all_grads: - apply_gradient_ops.append(opt.apply_gradients( - grads, global_step=self.global_step)) - - self.train_op = tf.group(loss_average_op, *apply_gradient_ops) - self.saver = tf.train.Saver(sharded=True) + server = tf.train.Server( + cluster, + job_name=FLAGS.job_name, + task_index=FLAGS.task_index, + config=config) + if FLAGS.job_name == 'ps': + return server.join() -def main(_): - tf.logging.set_verbosity(tf.logging.INFO) - start_time = time.time() + device_setter = tf.train.replica_device_setter( + worker_device='/job:worker/task:%d' % FLAGS.task_index, + cluster=cluster) - # Create the output path. If this fails, it really ought to fail - # now. :) - if not os.path.isdir(FLAGS.output_base_path): - os.makedirs(FLAGS.output_base_path) + else: + server = None + device_setter = tf.train.replica_device_setter(0) - # Create and run model + # Build the graph. with tf.Graph().as_default(): - model = SwivelModel(FLAGS) - - # Create a session for running Ops on the Graph. - gpu_opts = {} - if FLAGS.per_process_gpu_memory_fraction > 0: - gpu_opts["per_process_gpu_memory_fraction"] = \ - FLAGS.per_process_gpu_memory_fraction - else: - gpu_opts["allow_growth"] = True - gpu_options = tf.GPUOptions(**gpu_opts) - sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) - - # Run the Op to initialize the variables. 
- sess.run(tf.global_variables_initializer()) - - # Start feeding input - coord = tf.train.Coordinator() - threads = tf.train.start_queue_runners(sess=sess, coord=coord) - - # Calculate how many steps each thread should run - n_total_steps = int(FLAGS.num_epochs * model.n_rows * model.n_cols) / ( - FLAGS.submatrix_rows * FLAGS.submatrix_cols) - n_steps_per_thread = n_total_steps / ( - FLAGS.num_concurrent_steps * model.devices_number) - n_submatrices_to_train = model.n_submatrices * FLAGS.num_epochs - t0 = [time.time()] - n_steps_between_status_updates = 100 - status_i = [0] - status_lock = threading.Lock() - msg = ('%%%dd/%%d submatrices trained (%%.1f%%%%), %%5.1f submatrices/sec |' - ' loss %%f') % len(str(n_submatrices_to_train)) - - def TrainingFn(): - for _ in range(int(n_steps_per_thread)): - _, global_step, loss = sess.run(( - model.train_op, model.global_step, model.loss)) - - show_status = False - with status_lock: - new_i = global_step // n_steps_between_status_updates - if new_i > status_i[0]: - status_i[0] = new_i - show_status = True - if show_status: - elapsed = float(time.time() - t0[0]) - log(msg, global_step, n_submatrices_to_train, - 100.0 * global_step / n_submatrices_to_train, - n_steps_between_status_updates / elapsed, loss) - t0[0] = time.time() - - # Start training threads - train_threads = [] - for _ in range(FLAGS.num_concurrent_steps): - t = threading.Thread(target=TrainingFn) - train_threads.append(t) - t.start() - - # Wait for threads to finish. - for t in train_threads: - t.join() - - coord.request_stop() - coord.join(threads) - - # Write out vectors - write_embeddings_to_disk(FLAGS, model, sess) - - # Shutdown - sess.close() - log("Elapsed: %s", time.time() - start_time) + with tf.device(device_setter): + model = Model(FLAGS.input_base_path, FLAGS) + + # If an eval path is present, then create eval operators and set up scalar + # summaries to report on the results. Run the evals on the CPU since + # the analogy eval requires a fairly enormous tensor to be allocated to + # do the nearest neighbor search. + if FLAGS.eval_base_path: + wordsim_filenames = glob.glob( + os.path.join(FLAGS.eval_base_path, '*.ws.tab')) + + for filename in wordsim_filenames: + name = os.path.basename(filename).split('.')[0] + with tf.device(tf.DeviceSpec(device_type='CPU')): + op = model.wordsim_eval_op(filename) + tf.summary.scalar(name, op) + + analogy_filenames = glob.glob( + os.path.join(FLAGS.eval_base_path, '*.an.tab')) + + for filename in analogy_filenames: + name = os.path.basename(filename).split('.')[0] + with tf.device(tf.DeviceSpec(device_type='CPU')): + op = model.analogy_eval_op(filename) + tf.summary.scalar(name, op) + + tf.summary.scalar('loss', model.loss_op) + + # Train on, soldier. 
+ supervisor = tf.train.Supervisor( + logdir=FLAGS.output_base_path, + is_chief=(FLAGS.task_index == 0), + save_summaries_secs=60, + recovery_wait_secs=5) + + max_step = FLAGS.num_epochs * model.steps_per_epoch + master = server.target if server else '' + with supervisor.managed_session(master) as session: + local_step = 0 + global_step = session.run(model.global_step) + while not supervisor.should_stop() and global_step < max_step: + global_step, loss, _ = session.run([ + model.global_step, model.loss_op, model.train_op]) + + if not np.isfinite(loss): + raise ValueError('non-finite cost at step %d' % global_step) + + local_step += 1 + if local_step % 10 == 0: + tf.logging.info( + 'local_step=%d global_step=%d loss=%.1f, %.1f%% complete', + local_step, global_step, loss, 100.0 * global_step / max_step) + + if FLAGS.task_index == 0: + supervisor.saver.save( + session, supervisor.save_path, global_step=global_step) + + model.write_embeddings(FLAGS, session) if __name__ == '__main__': diff --git a/syntaxnet/README.md b/syntaxnet/README.md index e76d627a5f4d6e90a5518da386d4640b9f848c2d..779ba2d8dac3cfba1f27a57dbdecad260d97956c 100644 --- a/syntaxnet/README.md +++ b/syntaxnet/README.md @@ -77,7 +77,7 @@ source. You'll need to install: * `brew install swig` on OSX * protocol buffers, with a version supported by TensorFlow: * check your protobuf version with `pip freeze | grep protobuf` - * upgrade to a supported version with `pip install -U protobuf==3.0.0b2` + * upgrade to a supported version with `pip install -U protobuf==3.3.0` * mock, the testing package: * `pip install mock` * asciitree, to draw parse trees on the console for the demo: diff --git a/syntaxnet/dragnn/tools/build_pip_package.py b/syntaxnet/dragnn/tools/build_pip_package.py index be8c285c1244fd2e282ccb2b0aa6fe19c6746bcb..4925dca3626a10712339339ccad8fff146913d5c 100644 --- a/syntaxnet/dragnn/tools/build_pip_package.py +++ b/syntaxnet/dragnn/tools/build_pip_package.py @@ -63,13 +63,12 @@ def main(): # Copy the files. subprocess.check_call([ - "cp", "-r", - "--no-preserve=all", os.path.join(base_dir, "dragnn"), os.path.join( + "cp", "-r", os.path.join(base_dir, "dragnn"), os.path.join( base_dir, "syntaxnet"), tmp_packaging ]) if args.include_tensorflow: subprocess.check_call( - ["cp", "-r", "--no-preserve=all", tensorflow_dir, tmp_packaging]) + ["cp", "-r", tensorflow_dir, tmp_packaging]) shutil.copy( os.path.join(base_dir, "dragnn/tools/oss_setup.py"), os.path.join(tmp_packaging, "setup.py")) diff --git a/syntaxnet/syntaxnet/arc_standard_transitions.cc b/syntaxnet/syntaxnet/arc_standard_transitions.cc index 8feebe1a9020d5ec91c84d7deb054021464e2955..24b94dbfcff33f727734fe607073bf4d50a0fbec 100644 --- a/syntaxnet/syntaxnet/arc_standard_transitions.cc +++ b/syntaxnet/syntaxnet/arc_standard_transitions.cc @@ -269,9 +269,7 @@ class ArcStandardTransitionSystem : public ParserTransitionSystem { void PerformRightArc(ParserState *state, int label) const { DCHECK(IsAllowedRightArc(*state)); int s0 = state->Pop(); - int s1 = state->Pop(); - state->AddArc(s0, s1, label); - state->Push(s1); + state->AddArc(s0, state->Top(), label); } // We are in a deterministic state when we either reached the end of the input diff --git a/textsum/README.md b/textsum/README.md index f7f69ab45279ebdc07ebdff56cb8a67d5567318f..1507a66a10cce92d8bdd02b09052669c55d3af68 100644 --- a/textsum/README.md +++ b/textsum/README.md @@ -16,7 +16,7 @@ The results described below are based on model trained on multi-gpu and multi-machine settings. 
It has been simplified to run on only one machine for open source purpose. -DataSet +Dataset We used the Gigaword dataset described in [Rush et al. A Neural Attention Model for Sentence Summarization](https://arxiv.org/abs/1509.00685). diff --git a/textsum/batch_reader.py b/textsum/batch_reader.py index fb2af1892249eb7c62692b6e495b2e649d272c67..918551b4c2c5698a5640d11918199f2a6ff65d23 100644 --- a/textsum/batch_reader.py +++ b/textsum/batch_reader.py @@ -21,6 +21,7 @@ from threading import Thread import time import numpy as np +import six from six.moves import queue as Queue from six.moves import xrange import tensorflow as tf @@ -133,7 +134,7 @@ class Batcher(object): pad_id = self._vocab.WordToId(data.PAD_TOKEN) input_gen = self._TextGenerator(data.ExampleGen(self._data_path)) while True: - (article, abstract) = input_gen.next() + (article, abstract) = six.next(input_gen) article_sentences = [sent.strip() for sent in data.ToSentences(article, include_token=False)] abstract_sentences = [sent.strip() for sent in @@ -242,7 +243,7 @@ class Batcher(object): def _TextGenerator(self, example_gen): """Generates article and abstract text from tf.Example.""" while True: - e = example_gen.next() + e = six.next(example_gen) try: article_text = self._GetExFeatureText(e, self._article_key) abstract_text = self._GetExFeatureText(e, self._abstract_key) diff --git a/tutorials/image/alexnet/alexnet_benchmark.py b/tutorials/image/alexnet/alexnet_benchmark.py index ed723055ca605b1f1b27494b3a55149ab1d8e25e..39fcb109f0ab173e232e8d11c6c53511a566ea5f 100644 --- a/tutorials/image/alexnet/alexnet_benchmark.py +++ b/tutorials/image/alexnet/alexnet_benchmark.py @@ -74,10 +74,15 @@ def inference(images): parameters += [kernel, biases] # lrn1 - # TODO(shlens, jiayq): Add a GPU version of local response normalization. + with tf.name_scope('lrn1') as scope: + lrn1 = tf.nn.local_response_normalization(conv1, + alpha=1e-4, + beta=0.75, + depth_radius=2, + bias=2.0) # pool1 - pool1 = tf.nn.max_pool(conv1, + pool1 = tf.nn.max_pool(lrn1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', @@ -96,8 +101,16 @@ def inference(images): parameters += [kernel, biases] print_activations(conv2) + # lrn2 + with tf.name_scope('lrn2') as scope: + lrn2 = tf.nn.local_response_normalization(conv2, + alpha=1e-4, + beta=0.75, + depth_radius=2, + bias=2.0) + # pool2 - pool2 = tf.nn.max_pool(conv2, + pool2 = tf.nn.max_pool(lrn2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', diff --git a/tutorials/image/cifar10/cifar10_input.py b/tutorials/image/cifar10/cifar10_input.py index 10c77623d4b7b036de08e3b4a8f221fed7889257..323f2f1139cb645acde783ff633f3ee49bb1873f 100644 --- a/tutorials/image/cifar10/cifar10_input.py +++ b/tutorials/image/cifar10/cifar10_input.py @@ -175,6 +175,8 @@ def distorted_inputs(data_dir, batch_size): # Because these operations are not commutative, consider randomizing # the order their operation. + # NOTE: since per_image_standardization zeros the mean and makes + # the stddev unit, this likely has no effect see tensorflow#1458. 
distorted_image = tf.image.random_brightness(distorted_image, max_delta=63) distorted_image = tf.image.random_contrast(distorted_image, diff --git a/tutorials/image/cifar10/cifar10_multi_gpu_train.py b/tutorials/image/cifar10/cifar10_multi_gpu_train.py index 1c70ad397c53bf7497e76b3406a19cca7234829c..d139f1315673ce40168b7b385dae4825d3a379da 100644 --- a/tutorials/image/cifar10/cifar10_multi_gpu_train.py +++ b/tutorials/image/cifar10/cifar10_multi_gpu_train.py @@ -13,7 +13,7 @@ # limitations under the License. # ============================================================================== -"""A binary to train CIFAR-10 using multiple GPU's with synchronous updates. +"""A binary to train CIFAR-10 using multiple GPUs with synchronous updates. Accuracy: cifar10_multi_gpu_train.py achieves ~86% accuracy after 100K steps (256 @@ -62,17 +62,17 @@ tf.app.flags.DEFINE_boolean('log_device_placement', False, """Whether to log device placement.""") -def tower_loss(scope): +def tower_loss(scope, images, labels): """Calculate the total loss on a single tower running the CIFAR model. Args: scope: unique prefix string identifying the CIFAR tower, e.g. 'tower_0' + images: Images. 4D tensor of shape [batch_size, height, width, 3]. + labels: Labels. 1D tensor of shape [batch_size]. Returns: Tensor of shape [] containing the total loss for a batch of data """ - # Get images and labels for CIFAR-10. - images, labels = cifar10.distorted_inputs() # Build inference Graph. logits = cifar10.inference(images) @@ -160,16 +160,22 @@ def train(): # Create an optimizer that performs gradient descent. opt = tf.train.GradientDescentOptimizer(lr) + # Get images and labels for CIFAR-10. + images, labels = cifar10.distorted_inputs() + batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue( + [images, labels], capacity=2 * FLAGS.num_gpus) # Calculate the gradients for each model tower. tower_grads = [] with tf.variable_scope(tf.get_variable_scope()): for i in xrange(FLAGS.num_gpus): with tf.device('/gpu:%d' % i): with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope: + # Dequeues one batch for the GPU + image_batch, label_batch = batch_queue.dequeue() # Calculate the loss for one tower of the CIFAR model. This function # constructs the entire CIFAR model but shares the variables across # all towers. - loss = tower_loss(scope) + loss = tower_loss(scope, image_batch, label_batch) # Reuse variables for the next tower. tf.get_variable_scope().reuse_variables() diff --git a/tutorials/image/cifar10/cifar10_train.py b/tutorials/image/cifar10/cifar10_train.py index fec64ec2272e1f18f0f6921f372583b481a322b9..cc1dc0d1489a798c7e4cef06ffdd91e0b39592b7 100644 --- a/tutorials/image/cifar10/cifar10_train.py +++ b/tutorials/image/cifar10/cifar10_train.py @@ -62,7 +62,10 @@ def train(): global_step = tf.contrib.framework.get_or_create_global_step() # Get images and labels for CIFAR-10. - images, labels = cifar10.distorted_inputs() + # Force input pipeline to CPU:0 to avoid operations sometimes ending up on + # GPU and resulting in a slow down. + with tf.device('/cpu:0'): + images, labels = cifar10.distorted_inputs() # Build a Graph that computes the logits predictions from the # inference model. 
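The two CIFAR-10 changes above follow a single pattern: build the input pipeline once on the CPU, buffer batches in a prefetch queue, and let each GPU tower dequeue its own batch while sharing the model variables. Below is a minimal sketch of that pattern using the same TF 1.x calls as the patch; the helper names (`build_towers`, `distorted_inputs_fn`, `tower_fn`) and the queue capacity are illustrative, not part of the diff.

```python
import tensorflow as tf

def build_towers(distorted_inputs_fn, tower_fn, num_gpus=2):
  """Builds one shared input pipeline and num_gpus model towers (sketch)."""
  with tf.device('/cpu:0'):
    # Keep preprocessing ops on the CPU so they cannot land on a GPU and
    # stall compute; buffer a couple of batches per tower.
    images, labels = distorted_inputs_fn()
    batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
        [images, labels], capacity=2 * num_gpus)

  tower_losses = []
  with tf.variable_scope(tf.get_variable_scope()):
    for i in range(num_gpus):
      with tf.device('/gpu:%d' % i), tf.name_scope('tower_%d' % i):
        # Each tower dequeues its own pre-fetched batch...
        image_batch, label_batch = batch_queue.dequeue()
        tower_losses.append(tower_fn(image_batch, label_batch))
        # ...but all towers share the same model variables.
        tf.get_variable_scope().reuse_variables()
  return tower_losses
```

The single-GPU `cifar10_train.py` change is the same idea in its simplest form: the `distorted_inputs()` call is just pinned to `/cpu:0`.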
diff --git a/tutorials/rnn/ptb/ptb_word_lm.py b/tutorials/rnn/ptb/ptb_word_lm.py index a130d819f31b6adc1d9a3d244f0bfedcac13c6cf..fccbd41255f2fcbb3fcf19a855d650b0f41d7049 100644 --- a/tutorials/rnn/ptb/ptb_word_lm.py +++ b/tutorials/rnn/ptb/ptb_word_lm.py @@ -157,16 +157,26 @@ class PTBModel(object): (cell_output, state) = cell(inputs[:, time_step, :], state) outputs.append(cell_output) - output = tf.reshape(tf.concat(axis=1, values=outputs), [-1, size]) + output = tf.reshape(tf.stack(axis=1, values=outputs), [-1, size]) softmax_w = tf.get_variable( "softmax_w", [size, vocab_size], dtype=data_type()) softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type()) logits = tf.matmul(output, softmax_w) + softmax_b - loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example( - [logits], - [tf.reshape(input_.targets, [-1])], - [tf.ones([batch_size * num_steps], dtype=data_type())]) - self._cost = cost = tf.reduce_sum(loss) / batch_size + + # Reshape logits to be 3-D tensor for sequence loss + logits = tf.reshape(logits, [batch_size, num_steps, vocab_size]) + + # use the contrib sequence loss and average over the batches + loss = tf.contrib.seq2seq.sequence_loss( + logits, + input_.targets, + tf.ones([batch_size, num_steps], dtype=data_type()), + average_across_timesteps=False, + average_across_batch=True + ) + + # update the cost variables + self._cost = cost = tf.reduce_sum(loss) self._final_state = state if not is_training: diff --git a/tutorials/rnn/ptb/reader.py b/tutorials/rnn/ptb/reader.py index 995b628c0f2c2ac75aeee38bbacbadcd6f29e0df..a14ecc3903c657e6a7d718bf6007f28e0a569d89 100644 --- a/tutorials/rnn/ptb/reader.py +++ b/tutorials/rnn/ptb/reader.py @@ -21,13 +21,17 @@ from __future__ import print_function import collections import os +import sys import tensorflow as tf def _read_words(filename): with tf.gfile.GFile(filename, "r") as f: - return f.read().decode("utf-8").replace("\n", "").split() + if sys.version_info[0] >= 3: + return f.read().replace("\n", "").split() + else: + return f.read().decode("utf-8").replace("\n", "").split() def _build_vocab(filename): diff --git a/tutorials/rnn/translate/__init__.py b/tutorials/rnn/translate/__init__.py index 985a65cb043e55e520bcf80fe166b031699711f2..e3aaab1f437b28a8222f27cec09700219a4b30cd 100644 --- a/tutorials/rnn/translate/__init__.py +++ b/tutorials/rnn/translate/__init__.py @@ -18,5 +18,5 @@ from __future__ import absolute_import from __future__ import division from __future__ import print_function -from . import data_utils -from . import seq2seq_model +import data_utils +import seq2seq_model diff --git a/tutorials/rnn/translate/seq2seq_model.py b/tutorials/rnn/translate/seq2seq_model.py index 7e0cc453f57594106650e8620fb08dd3f8cf84fe..205d3cc23821f444e26cb2251977627d40aac8f3 100644 --- a/tutorials/rnn/translate/seq2seq_model.py +++ b/tutorials/rnn/translate/seq2seq_model.py @@ -25,7 +25,7 @@ import numpy as np from six.moves import xrange # pylint: disable=redefined-builtin import tensorflow as tf -from . import data_utils +import data_utils class Seq2SeqModel(object): diff --git a/video_prediction/README.md b/video_prediction/README.md index 63f85967bb6375de5a719a96e62d2668e2f76689..51cd198c439cbea553fd2b7bf0b22544f7d1b61e 100644 --- a/video_prediction/README.md +++ b/video_prediction/README.md @@ -1,7 +1,6 @@ # Video Prediction with Neural Advection -*A TensorFlow implementation of the models described in [Finn et al. 
(2016)] -(http://arxiv.org/abs/1605.07157).* +*A TensorFlow implementation of the models described in [Unsupervised Learning for Physical Interaction through Video Prediction (Finn et al., 2016)](https://arxiv.org/abs/1605.07157).* This video prediction model, which is optionally conditioned on actions, predicts future video by internally predicting how to transform the last diff --git a/video_prediction/prediction_train.py b/video_prediction/prediction_train.py index 46f88142608cb1f98d6c89bd9898f7ea3bc6cf84..09625bbf1013b654b2c200a5e6848d10f01d1443 100644 --- a/video_prediction/prediction_train.py +++ b/video_prediction/prediction_train.py @@ -204,6 +204,8 @@ def main(unused_argv): # Make training session. sess = tf.InteractiveSession() + sess.run(tf.global_variables_initializer()) + summary_writer = tf.summary.FileWriter( FLAGS.event_log_dir, graph=sess.graph, flush_secs=10) @@ -211,7 +213,6 @@ def main(unused_argv): saver.restore(sess, FLAGS.pretrained_model) tf.train.start_queue_runners(sess) - sess.run(tf.global_variables_initializer()) tf.logging.info('iteration number, cost')
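The `prediction_train.py` hunk above is an ordering fix: variables should be initialized right after the session is created, before any pretrained checkpoint is restored (so the restored weights are not overwritten by a later initialization) and before the queue runners start feeding the graph. A minimal standalone sketch of that ordering follows; the stand-in variable and the empty checkpoint path are illustrative, not taken from the script.

```python
import tensorflow as tf

# A stand-in variable so this sketch builds a non-empty graph.
weights = tf.get_variable(
    'weights', shape=[8], initializer=tf.zeros_initializer())

sess = tf.InteractiveSession()

# 1. Initialize every variable first; a later restore then wins over the
#    freshly initialized values instead of being clobbered by them.
sess.run(tf.global_variables_initializer())

# 2. Optionally restore pretrained weights on top of the initialization.
saver = tf.train.Saver()
pretrained_model = ''  # e.g. a checkpoint path (illustrative)
if pretrained_model:
  saver.restore(sess, pretrained_model)

# 3. Only now start the input queue runners and enter the training loop.
tf.train.start_queue_runners(sess)
```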